Computational protocol: Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis


Protocol publication

[…] Full details of the bioinformatics workflow are provided in the Supplementary Information. Briefly, reads were trimmed and mapped to the M. bovis reference genome (GenBank accession number BX248333) using BWA. Variants were identified using SAMtools and filtered on base quality, mapping quality, heterozygosity, the proportion of samples with high-quality calls at each site, clustering of variant loci, and location relative to repeat regions of the genome. The resulting variant sites were concatenated for each isolate, giving the genetic sequences used for downstream analyses.

A maximum likelihood phylogeny was generated in PhyML v3.0 under the Jukes-Cantor model of nucleotide substitution, with the M. bovis reference sequence included as the outgroup; statistical support for individual nodes was evaluated with 1000 non-parametric bootstrap replicates. A Bayesian phylogeny was generated in MrBayes under the Jukes-Cantor model, also including the M. bovis reference sequence, and was run for 10⁶ MCMC iterations, by which point the standard deviation of split frequencies was below 0.01. Raw pairwise single nucleotide polymorphism (SNP) differences between sequenced samples were calculated in MEGA 5, using pairwise deletion to handle missing data. Because of the increased level of sampling from 2009 onwards mentioned above, and the low within-breakdown diversity (see Results), further analyses were restricted to one representative sample per herd breakdown. [...]

To quantify the spread of M. bovis across the landscape, continuous phylogeographic models were applied using the Bayesian phylogenetic program BEAST v1.7.4. This analysis was restricted to the VNTR-10 clade containing the majority of isolates (Group 1, see Results), using one representative sample per breakdown.
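The concatenation, pairwise-deletion and subsampling steps can be illustrated with a minimal Python sketch. It is not the MEGA 5 implementation: it assumes per-isolate allele calls have already been filtered and are keyed by reference position, that missing data are coded as `N`, `-` or `?`, and that a herd-breakdown identifier is known for each sample. All names (`concatenate_variant_sites`, `pairwise_snp_differences`, `one_per_breakdown`) are hypothetical, and picking the first sample ID in sorted order is an arbitrary stand-in for whatever selection rule the study applied.

```python
from __future__ import annotations

from itertools import combinations

# Assumed missing-data symbols in the concatenated sequences.
MISSING = frozenset("N-?")


def concatenate_variant_sites(calls: dict[str, dict[int, str]],
                              sites: list[int]) -> dict[str, str]:
    """Join each isolate's allele at every retained variant site, in the
    order given by `sites`; sites without a call become 'N'."""
    return {isolate: "".join(alleles.get(pos, "N") for pos in sites)
            for isolate, alleles in calls.items()}


def pairwise_snp_differences(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Raw SNP counts between every pair of aligned sequences, using
    pairwise deletion: a site missing in either sequence of a pair is
    skipped for that pair only."""
    if len({len(s) for s in seqs.values()}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for a, b in combinations(sorted(seqs), 2):
        result[(a, b)] = sum(
            1 for x, y in zip(seqs[a], seqs[b])
            if x not in MISSING and y not in MISSING and x != y
        )
    return result


def one_per_breakdown(sample_to_breakdown: dict[str, str]) -> list[str]:
    """Keep one sample per herd breakdown (hypothetically, the first
    sample ID in sorted order)."""
    chosen: dict[str, str] = {}
    for sample in sorted(sample_to_breakdown):
        chosen.setdefault(sample_to_breakdown[sample], sample)
    return sorted(chosen.values())


if __name__ == "__main__":
    calls = {"iso1": {101: "A", 205: "C"}, "iso2": {101: "G"}}
    print(concatenate_variant_sites(calls, [101, 205]))
    # {'iso1': 'AC', 'iso2': 'GN'}
    print(pairwise_snp_differences({"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "NCGAT"}))
    # {('iso1', 'iso2'): 1, ('iso1', 'iso3'): 2, ('iso2', 'iso3'): 1}
```

Pairwise deletion drops a site only for the pair of sequences in which it is missing, so different pairs can be compared over different subsets of sites; complete deletion would instead remove such a site from every comparison.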
A strict Brownian model of spatial diffusion was compared with a relaxed model that allows diffusion rates to vary among branches, with branch rates drawn from a Cauchy distribution (see Supplementary Information). A relaxed model with branch rates drawn from a gamma distribution was also tested but failed to converge. Models were run for 5 × 10⁸ iterations and assessed for convergence in Tracer, and model fit was evaluated using log marginal likelihood estimates (MLE) obtained by path sampling and stepping-stone sampling in BEAST. Posterior trees for the best-fitting model were combined to identify and annotate the maximum clade credibility (MCC) tree. Node locations, branch lengths, and branch-specific rates of geographic dispersal were extracted from the MCC tree and evaluated.

Given that the molecular clock rate of M. bovis and other closely related mycobacteria has been shown to be slow and variable, it was uncertain whether these data would contain enough genetic signal to support phylogeographic analyses. To test this, we simulated a homogeneous spatial diffusion process along the MCC phylogeny generated above, with the diffusion rate guided by the empirical estimates, generating spatial coordinates for the sampled sequences under a specified rate of spatial diffusion. We then evaluated, for each of 100 simulations, whether phylogeographic analysis in BEAST (using the settings described above, with the simulated coordinates and the observed sequences and sampling dates as input) could recover the originally specified diffusion rate. […]
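The logic of the simulation check can be sketched in Python on a toy tree. This is not the BEAST analysis: it assumes a fixed, known binary tree with branch lengths in time units, simulates homogeneous bivariate Brownian motion (each coordinate moves by an independent N(0, σ²t) step along a branch of length t), and then re-estimates σ² from tip locations alone using Felsenstein's phylogenetically independent contrasts as a simple stand-in for Bayesian inference. The names `Node`, `balanced_tree`, `simulate_brownian` and `estimate_sigma2`, the 128-tip balanced tree and the true rate of 2.5 are all hypothetical choices for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Node of a rooted binary tree; `length` is the branch length to its parent."""
    length: float = 0.0
    children: list = field(default_factory=list)
    xy: tuple = (0.0, 0.0)  # (x, y) location; only tip locations are used for estimation


def balanced_tree(depth: int, length: float = 1.0) -> Node:
    """Hypothetical toy tree: fully balanced, every branch of equal length."""
    if depth == 0:
        return Node(length=length)
    return Node(length=length,
                children=[balanced_tree(depth - 1, length),
                          balanced_tree(depth - 1, length)])


def simulate_brownian(node: Node, sigma2: float, rng: random.Random,
                      parent_xy: tuple = (0.0, 0.0)) -> None:
    """Homogeneous bivariate Brownian motion: along a branch of length t,
    each coordinate moves by an independent N(0, sigma2 * t) step."""
    sd = math.sqrt(sigma2 * node.length)
    node.xy = (parent_xy[0] + rng.gauss(0.0, sd), parent_xy[1] + rng.gauss(0.0, sd))
    for child in node.children:
        simulate_brownian(child, sigma2, rng, node.xy)


def _contrasts(node: Node):
    """Felsenstein's independent contrasts from tip locations only. Returns
    (estimated location, adjusted branch length, squared standardized contrasts)."""
    if not node.children:
        return node.xy, node.length, []
    left, right = node.children  # assumes a strictly binary tree
    xl, vl, cl = _contrasts(left)
    xr, vr, cr = _contrasts(right)
    v = vl + vr  # assumes the two child branches are not both zero-length
    squared = [(xl[d] - xr[d]) ** 2 / v for d in range(2)]
    # Inverse-variance weighted average of the two child locations.
    xy = tuple((xl[d] * vr + xr[d] * vl) / v for d in range(2))
    return xy, node.length + vl * vr / v, cl + cr + squared


def estimate_sigma2(root: Node) -> float:
    """Mean squared standardized contrast over both coordinates; each
    squared contrast has expectation sigma2 under the Brownian model."""
    _, _, squared = _contrasts(root)
    return sum(squared) / len(squared)


if __name__ == "__main__":
    rng = random.Random(1)
    tree = balanced_tree(7)  # 128 tips
    estimates = []
    for _ in range(100):  # 100 replicate simulations
        simulate_brownian(tree, sigma2=2.5, rng=rng)
        estimates.append(estimate_sigma2(tree))
    print(sum(estimates) / len(estimates))  # expected to lie close to the true rate, 2.5
```

Contrasts give a quick check of whether tip locations on a known tree carry information about the rate; the analysis described above asks the harder question of whether BEAST can recover the rate while also inferring the tree from weakly informative sequence data.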

Pipeline specifications

Software tools BWA, SAMtools, PhyML, MrBayes, MEGA, BEAST
Application Phylogenetics
Organisms Mycobacterium bovis, Bos taurus, Bacteria
Diseases Tuberculosis