Pipeline publication

[…] with the soils of cold environments of Himachal Pradesh, India, we isolated a Paenibacillus strain, IHB B 3415, which produces cellulase at low temperature. This bacterium appears to be most closely related to Paenibacillus borealis KK19T based on a 16S rRNA gene sequence analysis. The genome of Paenibacillus sp. IHB B 3415 was sequenced, given the ability of the organism to grow and produce low-temperature active enzymes.
Whole-genome shotgun sequencing was completed using the Illumina Genome Analyzer IIx in 76-bp paired-read format. A partial flow cell yielded 28,321,562 raw paired-end (PE) reads with 4,304,877,424 bases of raw sequence. The paired reads were quality filtered using the NGS QC toolkit version 2.3 (cutoff read length for high quality [HQ], 70%; cutoff quality score, 20). A total of 23,383,467 (77.57%) of the filtered PE reads were obtained without adaptor/primer contamination and were used for assembly. De novo assembly of the genome data was done using Velvet version 1.2.10. For this data set, the k parameter was varied from 49 to 73 to find the best assembly. At a k-mer size of 51, 89.5% (41,853,238) of the 46,766,934 total reads were aligned. The Paenibacillus sp. IHB B 3415 genome was assembled into 752 contigs totaling 8,350,804 bases (N50 length, 26,240 bases; maximum contig length, 110 kb). We used SSPACE version 3.0 to extend and merge the resulting scaffolds based on read-pair information and short overlaps, reducing the number of scaffolds. GapFiller version 1.1 was used to close the gaps between short scaffolds contained within the large scaffolds by replacing unknown nucleotides (Ns) with true nucleotides based on paired-read information and short overlaps. After gap filling, the assembly consisted of 290 scaffolds totaling 8,437,849 bp (N50 size, 78,530 bp; longest size, 315,231 bp; G+C content, 50.77%).
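The assembly statistics quoted above can be illustrated with a short Python sketch. It is not the authors' code: the `n50` helper and the toy contig lengths are assumptions for illustration only, and the remaining figures are copied from the text to show how the base count and alignment percentage follow from the read counts.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total size."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical toy contig lengths (not from the paper).
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)

# Figures quoted in the text.
raw_pairs = 28_321_562        # raw paired-end reads (pairs)
read_length = 76              # bp per read
raw_bases = raw_pairs * 2 * read_length

filtered_pairs = 23_383_467   # pairs kept after quality filtering
total_reads = filtered_pairs * 2
aligned_reads = 41_853_238    # reads aligned at k = 51
aligned_pct = round(100 * aligned_reads / total_reads, 1)

print(toy_n50, raw_bases, total_reads, aligned_pct)
```

Under these figures, the raw base count equals two 76-bp reads per pair, and the 46,766,934 total reads correspond to twice the number of filtered pairs.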
Annotation conducted on the RAST server using the Glimmer3 option predicted 7,897 protein-coding genes, including 78 RNA genes and 454 predicted SEED subsystem features. A total of 2,868 (37%) features were covered by SEED subsystems, of which 2,737 were nonhypothetical proteins. Annotation of Paenibacillus sp. IHB B 3415 using Prodigal predicted 7,335 coding sequences totaling 7,161,309 bases, which is ~85% of the assembled size of the genome (8,437,849 bases). Further, tRNAscan-SE and RNAmmer predicted 77 noncoding RNAs: 71 tRNAs, 1 pseudo-tRNA, and 3 rRNA types comprising 3 copies of 5S rRNA and one copy each of 16S rRNA and 23S rRNA genes.
Notably, the genome of Paenibacillus sp. IHB B 3415 is currently the second largest genome in the genus after Paenibacillus mucilaginosus in both overall size and the number of genes encoding proteins. In the IHB B 3415 genome, 1,011 genes were assigned to carbohydrate metabolism, 453 to amino acids and derivatives, and 131 to stress response (including 18 and 3 for heat and cold shock, respectively). The genome contains 16 genes predicted for cellulases, corroborating our results for cellulose degradatio […]
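As a rough check on the annotation figures above, the following Python sketch recomputes the coding fraction and the noncoding RNA tally. The variable names are illustrative, and the rRNA breakdown assumes the count is read as 3 copies of 5S rRNA plus one copy each of 16S and 23S rRNA, which is an interpretation of the text rather than something it states outright.

```python
# Figures quoted in the text (Prodigal annotation and final assembly).
coding_bases = 7_161_309
assembly_bases = 8_437_849
coding_pct = round(100 * coding_bases / assembly_bases, 1)

# Noncoding RNA tally; the rRNA copy numbers are an assumed reading.
ncrna_counts = {
    "tRNA": 71,
    "pseudo-tRNA": 1,
    "5S rRNA": 3,
    "16S rRNA": 1,
    "23S rRNA": 1,
}
ncrna_total = sum(ncrna_counts.values())

print(coding_pct, ncrna_total)
```

With this reading, the coding fraction rounds to the ~85% stated in the text and the tally matches the 77 noncoding RNAs reported.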

Pipeline specifications