Computational protocol: Phylogenetic tree shapes resolve disease transmission patterns

Protocol publication

[…] We applied the classifier to phylogenetic trees derived from two real-world tuberculosis outbreak datasets. Outbreak A was previously published by Gardy et al. [] and is available in the NCBI Sequence Read Archive under accession number SRP002589. This dataset comprises 31 M. tuberculosis isolates collected in British Columbia between 1995 and 2008, sequenced with paired-end 50 bp reads on the Illumina Genome Analyzer II. Outbreak B comprises 33 M. tuberculosis isolates collected in British Columbia between 2006 and 2011, sequenced with paired-end 75 bp reads on the Illumina HiSeq. The outbreak, sequences and single nucleotide polymorphisms (SNPs) are presented in Didelot et al. []. For both datasets, reads were aligned against the reference genome M. tuberculosis CDC1551 (NC_002755) using the Burrows-Wheeler Aligner []. Single nucleotide variants were identified using samtools mpileup [] and filtered to remove any variant positions within 250 bp of each other and any positions for which at least one isolate did not have a genotype quality score of 222. The remaining variants were manually reviewed for accuracy and used to construct a phylogenetic tree for each outbreak as described above. We applied the classification methods to 1000 samples from the posterior distribution of timed phylogenies that BEAST estimated from the WGS data under a birth–death prior. […]
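The two variant filters described above can be illustrated with a minimal Python sketch. It is not the authors' code. The `Variant` class and the `filter_variants` function are hypothetical names introduced here. The sketch also assumes that "within 250 bp" means a distance of at most 250 bp, that the proximity filter is applied to all candidate positions before the quality filter, and that a genotype quality of 222 acts as a minimum threshold.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int   # 1-based position on the reference genome
    gq: list   # genotype quality score for each isolate at this position


def filter_variants(variants, min_spacing=250, required_gq=222):
    """Keep variants that are isolated and well supported in every isolate.

    Assumptions (not stated in the source): distances of exactly
    `min_spacing` count as "within" range, and `required_gq` is a minimum.
    """
    ordered = sorted(variants, key=lambda v: v.pos)
    kept = []
    for i, v in enumerate(ordered):
        # Drop both members of any pair of positions that lie too close together.
        near_prev = i > 0 and v.pos - ordered[i - 1].pos <= min_spacing
        near_next = (i + 1 < len(ordered)
                     and ordered[i + 1].pos - v.pos <= min_spacing)
        if near_prev or near_next:
            continue
        # Drop positions where at least one isolate falls short of the score.
        if any(q < required_gq for q in v.gq):
            continue
        kept.append(v)
    return kept


if __name__ == "__main__":
    # Toy data for three isolates; positions and scores are made up.
    calls = [
        Variant(100, [222, 222, 222]),
        Variant(200, [222, 222, 222]),   # within 250 bp of position 100
        Variant(1000, [222, 222, 222]),
        Variant(2000, [222, 150, 222]),  # one isolate below the required score
        Variant(5000, [222, 222, 222]),
    ]
    print([v.pos for v in filter_variants(calls)])  # prints [1000, 5000]
```

In practice the positions and per-isolate scores would be read from the variant call files produced by the samtools step. Parsing is omitted here to keep the sketch self-contained.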

Pipeline specifications

Software tools: BWA, SAMtools, BEAST
Databases: SRA
Applications: Phylogenetics, WGS analysis