Computational protocol: Use of Multiple Sequencing Technologies To Produce a High-Quality Genome of the Fungus Pseudogymnoascus destructans, the Causative Agent of Bat White-Nose Syndrome

Protocol publication

[…] Since emerging in 2006, white-nose syndrome has spread rapidly across eastern North America from an index site in New York, causing the mortality of millions of bats of numerous species. The pathogen was identified as Pseudogymnoascus destructans, a psychrophilic fungus that infects bats during hibernation, when bat body temperatures decrease to the ambient temperature of their hibernacula. The growth of this fungus on bats and its subsequent invasion of epidermal tissues can cause a cascade of deleterious physiological changes, resulting in high mortality rates. The fungus is widespread in Eurasia, and initial genetic and experimental results suggest that it was recently introduced to North America. Critically missing, however, are detailed genomic analyses of P. destructans to shed light on the origins, evolution, global dispersal, and pathogenicity of this fungus.

Initial genome sequencing efforts for P. destructans used 454 and Illumina platforms; however, unresolved repeat regions in the genome resulted in fragmented draft assemblies. Therefore, to generate a high-quality contiguous assembly of the P. destructans genome, we utilized a combination of data from PacBio SMRT reads, Illumina MiSeq 250-bp paired-end reads (SRR1952982), 8-kb 454 “jumping” libraries (SRP001346), and end sequences from an unbiased random shear bacterial artificial chromosome (BAC) library with an average insert size of 100 kb.

The PacBio reads were assembled with the PacBio SMRT version 2.2 analysis pipeline and yielded 153 contigs. The contigs were then extended and scaffolded with SSPACE version 3.0, using three data sets: Illumina 250-bp paired-end MiSeq reads (quality-trimmed to Q30 with ea-utils version 1.1.2 and error-corrected with Hammer), 454 8-kb jumping reads, and >100-kb BAC library end sequences. Gaps were closed with GapFiller version 1.10. Twenty iterations of SSPACE/GapFiller were followed by Pilon version 1.8 to correct local misalignments and indels in the assembly, resulting in 96 scaffolds. A self-query of the improved scaffolds with the NUCmer application of MUMmer version 3.23 indicated that the smaller scaffolds (<42,633 bp) and a few larger scaffolds were complete or nearly complete duplications of the sequence in other scaffolds. Thirteen scaffolds, which had a >95% identity over 95% of their length with a larger scaffold in the assembly, were removed. The final assembly contains 83 scaffolds, is 35.818201 Mb in size, has an N50 value of 1.168637 Mb, and contains only 7,812 bp of gaps. This is a significant improvement compared to previous draft assemblies of P. destructans (AEFC01: 1,847 scaffolds, 30.6849 Mb, N50 105.158 kb, 2,328,879 bp Ns; AYKP01: 5,304 scaffolds, 30.2827 Mb, N50 17.914 kb, 0 Ns).

The new P. destructans genome contains 38.17% repetitive sequence elements (RepeatModeler version 1.08, RepeatMasker version 4.05, http://www.repeatmasker.org), which required a combination of long-read and large-insert sequencing technologies to scaffold. The nearly complete assembly presented here should serve as a valuable resource to facilitate fungal disease research. […]
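The redundancy criterion and the summary statistics reported above (scaffold count, total size, N50, gap bases) can be illustrated with a minimal Python sketch. This is not the authors' code. The `Hit` record, the function names, and the exact threshold semantics (identity strictly above 95%, coverage of at least 95% of the scaffold length) are assumptions made for illustration; in practice the alignment intervals would be parsed from the aligner's coordinate output, and the hits passed in would be those against one larger scaffold.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record: one alignment of a query scaffold to a larger scaffold."""
    query_start: int  # 1-based, inclusive position on the query scaffold
    query_end: int    # 1-based, inclusive position on the query scaffold
    identity: float   # percent identity of the aligned block


def covered_fraction(hits, query_length, min_identity=95.0):
    """Fraction of the query covered by hits above min_identity, overlaps merged."""
    intervals = sorted(
        (min(h.query_start, h.query_end), max(h.query_start, h.query_end))
        for h in hits
        if h.identity > min_identity
    )
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in intervals:
        start = max(start, current_end + 1)  # skip bases counted earlier
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


def is_redundant(hits, query_length, min_identity=95.0, min_coverage=0.95):
    """Assumed reading of '>95% identity over 95% of their length'."""
    return covered_fraction(hits, query_length, min_identity) >= min_coverage


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one sequence")


def assembly_stats(scaffolds):
    """Summarize a dict mapping scaffold name to sequence string."""
    lengths = [len(seq) for seq in scaffolds.values()]
    return {
        "scaffolds": len(lengths),
        "total_bp": sum(lengths),
        "n50_bp": n50(lengths),
        # Gap bases are counted as N characters, case-insensitively.
        "gap_bp": sum(seq.upper().count("N") for seq in scaffolds.values()),
    }
```

Under these assumptions, a scaffold would be dropped when `is_redundant` returns true for its hits against a larger scaffold, and `assembly_stats` would then be run on the remaining scaffolds to obtain figures of the kind quoted in the text.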

Pipeline specifications

Software tools: ea-utils, SSPACE, GapFiller, Pilon, MUMmer, RepeatModeler, RepeatMasker
Applications: De novo sequencing analysis, Nucleotide sequence alignment
Organisms: Pseudogymnoascus destructans
Diseases: Leukoencephalopathies