Computational protocol: Draft Genome Sequence of Bacillus shackletonii LMG 18435T, Isolated from Volcanic Mossy Soil

Protocol publication

[…] The type strain LMG 18435T (=CIP 107762T) of Bacillus shackletonii was isolated from mossy soil taken from the eastern lava flow of northern Candlemas Island, South Sandwich archipelago. Recently, the thermophilic strain K5 of B. shackletonii was isolated from a biotrickling filter and appeared to have a strong capacity for producing polyhydroxybutyrate (up to 2.28 g/L). Because of the application prospects of B. shackletonii and the lack of genomic information for it, its type strain LMG 18435T was selected as one of the subjects of our genome sequencing project on the genomic taxonomy and phylogenomics of Bacillus-like bacteria. Here, we present the first draft genome sequence of B. shackletonii.

The genome sequence of B. shackletonii LMG 18435T was obtained by paired-end sequencing on the Illumina HiSeq 2500 system. Two DNA libraries with insert sizes of 272 and 5,106 bp were constructed and sequenced. Filtering of the 1.29 Gb of raw data yielded 1.23 Gb of clean data, providing approximately 233-fold coverage. The reads were assembled with SOAPdenovo version 1.05, with the key parameter K set to 76. The assembly yielded 24 scaffolds with a total length of 5,297,592 bp and a scaffold N50 of 3,831,633 bp. The average scaffold length was 220,733 bp, and the longest and shortest scaffolds were 3,831,633 bp and 536 bp, respectively. A total of 95.35% of the clean reads could be aligned back to the genome, covering 99.36% of the sequence.

The genome was annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) (http://www.ncbi.nlm.nih.gov/genome/annotation_prok), which utilizes GeneMark, Glimmer, and tRNAscan-SE. A total of 5,522 genes were predicted, including 5,401 coding sequences, 94 tRNAs, and 27 rRNAs. Of these, 3,666 and 2,210 genes were assigned to the COG and KEGG databases, respectively. The average DNA G+C content was 36.71%, in agreement with the reported value of 35.4 mol%. […]
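The assembly statistics above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the published pipeline: the function name n50 and the 22 filler scaffold lengths are hypothetical, since the text reports only the total, longest, and shortest scaffold lengths. It shows that the scaffold N50 must equal the longest scaffold whenever that scaffold alone exceeds half of the assembly, and it recomputes the mean scaffold length, the approximate coverage, and the gene total from the reported figures.

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together span at least half of the total assembly."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

# Values reported in the text.
TOTAL_BP = 5_297_592
N_SCAFFOLDS = 24
LONGEST_BP = 3_831_633
SHORTEST_BP = 536
CLEAN_BASES = 1.23e9  # 1.23 Gb of clean data (rounded in the text)

# Hypothetical scaffold set: the individual lengths are NOT given in the
# text, so the 22 middle scaffolds are filled with near-equal values that
# make the total match. Only the longest and shortest are real.
remainder = TOTAL_BP - LONGEST_BP - SHORTEST_BP
base, extra = divmod(remainder, N_SCAFFOLDS - 2)
filler = [base + 1] * extra + [base] * (N_SCAFFOLDS - 2 - extra)
lengths = [LONGEST_BP, SHORTEST_BP] + filler

scaffold_n50 = n50(lengths)
mean_scaffold_bp = TOTAL_BP / N_SCAFFOLDS
coverage = CLEAN_BASES / TOTAL_BP
gene_total = 5_401 + 94 + 27  # coding sequences + tRNAs + rRNAs

print(f"N50: {scaffold_n50:,} bp")
print(f"Mean scaffold length: {mean_scaffold_bp:,.0f} bp")
print(f"Coverage: {coverage:.1f}x")
print(f"Predicted genes: {gene_total:,}")
```

Because the longest scaffold (3,831,633 bp) is larger than half of the assembly (2,648,796 bp), the N50 does not depend on the filler values chosen here. The coverage computed from the rounded 1.23 Gb figure comes out near 232-fold, slightly below the reported 233-fold, which presumably reflects rounding of the clean-data size.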

Pipeline specifications