Computational protocol: Whole-Genome Sequences of 80 Environmental and Clinical Isolates of Burkholderia pseudomallei

Protocol publication

[…] Burkholderia pseudomallei is the causative agent of melioidosis and is endemic in parts of the tropical world, including northern Australia, Papua New Guinea, and Southeast Asia. Studies of pathogen phylogeny or diversity using whole-genome sequencing have been dominated by Asian strains, for which more genome sequences were available. We report here the whole-genome sequences of 80 B. pseudomallei isolates from both Australian clinical cases and environmental sampling of geographically diverse regions in northern Australia and Papua New Guinea. The genomes will contribute to our understanding of the global diversity of B. pseudomallei.

High-quality, high-molecular-weight genomic DNA was sequenced using a combination of Illumina, 454, and PacBio technologies, depending on the isolate. For isolates with only Illumina short-insert data (100-bp reads, noted as “I”), assemblies were generated with IDBA version 1.1.1. For isolates that also had Roche 454 data (noted as “R”) or Illumina long-insert data (insert sizes 8 to 10 kb, noted as “L”), the libraries were assembled together in Newbler version 2.6 (Roche), and the consensus sequences were computationally shredded into 2-kbp overlapping fake reads (shreds). The raw reads were also assembled in Velvet, and those consensus sequences were computationally shredded into 1.5-kbp overlapping shreds. Draft data from all platforms were assembled together with AllPaths, and Pacific Biosciences data (noted as “P”), when available at 100× coverage or greater, were assembled using HGAP. Consensus sequences from all assemblers were computationally shredded and assembled with a subset of read pairs from the long-insert library using Phrap. The resulting assemblies were manually and computationally improved using Consed and in-house scripts.

For strains MSHR62 and MSHR3997, a 10-kb insert library was sequenced on the Pacific Biosciences platform. The assembly was generated with Celera Assembler version 8.0 by previously described methods. The longest 25× of corrected sequences were assembled, and contigs composed of fewer than 10 sequences were omitted. Contigs were manually merged based on identified end overlaps to obtain the final assembly. The MSHR62 10-kb insert assembly was used to assist in gap closure and correction of the short-read assembly.

For all genomes, annotations were completed at the Los Alamos National Laboratory (LANL) using the Ergatis workflow manager and in-house scripts. Of the 80 B. pseudomallei genomes assembled, nine are at finished quality (<1 error per 100,000 bp), 49 are either noncontiguous finished or improved high-quality draft (IHQD) and available as scaffolded draft assemblies, and 22 assemblies are unscaffolded drafts. […]
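
Two of the steps described above are simple enough to illustrate in code: shredding a consensus sequence into overlapping fixed-length fake reads, and keeping the longest corrected sequences up to a target coverage. The following Python sketch is illustrative only and is not the in-house code used for these genomes. The function names `shred` and `select_longest`, the `overlap` parameter, and the `genome_size` argument are assumptions; the protocol gives the shred lengths (2 kbp and 1.5 kbp) and the 25× cutoff, but not the amount of overlap or how coverage was computed.

```python
from typing import List, Tuple


def shred(consensus: str, shred_len: int, overlap: int) -> List[Tuple[int, str]]:
    """Cut a consensus sequence into overlapping fixed-length pseudo-reads.

    Returns (start_offset, fragment) pairs. The overlap value is an
    assumption: the protocol states only the shred lengths.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("need shred_len > 0 and 0 <= overlap < shred_len")
    if len(consensus) <= shred_len:
        # A contig no longer than one shred is kept whole (or skipped if empty).
        return [(0, consensus)] if consensus else []
    step = shred_len - overlap
    shreds = []
    start = 0
    while start + shred_len < len(consensus):
        shreds.append((start, consensus[start:start + shred_len]))
        start += step
    # Anchor the final shred at the contig end so no bases are dropped.
    last = len(consensus) - shred_len
    shreds.append((last, consensus[last:]))
    return shreds


def select_longest(read_lengths: List[int], genome_size: int,
                   coverage: float) -> List[int]:
    """Keep the longest reads until their summed length reaches
    coverage x genome_size (hypothetical reading of "longest 25x")."""
    target = coverage * genome_size
    kept: List[int] = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target:
            break
        kept.append(length)
        total += length
    return kept
```

For example, `shred(contig, 2000, 1000)` would mimic the 2-kbp shreds made from the Newbler consensus under an assumed 1-kbp overlap, and `select_longest(lengths, genome_size, 25)` would mimic the 25× selection given an estimated genome size.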

Pipeline specifications

Software tools: IDBA, Newbler, Velvet, ALLPATHS-LG, HGAP, Consed, Celera Assembler, Ergatis
Applications: Phylogenetics, WGS analysis
Organisms: Burkholderia pseudomallei
Diseases: Melioidosis