Computational protocol: Whole Genome Sequencing Reveals Local Transmission Patterns of Mycobacterium bovis in Sympatric Cattle and Badger Populations

Protocol publication

[…] M. bovis was isolated and confirmed from suspect bovine granulomatous tissue using standard protocols. Confirmed cultures were grown to single colonies on Löwenstein-Jensen (LJ) slopes, and single colonies were amplified for DNA extraction using the standard CTAB and solvent extraction protocol. Extracted DNA was sequenced at the Sir Henry Wellcome Functional Genomics Facility at the University of Glasgow using an Illumina Genome Analyser IIx. Paired-end reads 70 bp in length, separated by an average of about 350 bp, were trimmed from both ends based on Phred quality scores so that every base call in the remaining sequence had an error rate of 0.001 or less (Phred 30 or higher). Reads were mapped to a published UK reference genome (AF2122/97) using the Geneious assembler with the “medium-low sensitivity” option, allowing a maximum of 10% gaps and mismatches per read. The reference sequence belongs to the same spoligotype (SB0140) as VNTR type 10 and shares identical repeat numbers with it at four of the eight loci used for typing. Mapping covered more than 99% of the genome at a depth of at least 1×, with an average read depth of 60–112× across isolates (see Table S1 for full details). Consensus sequences were generated from the mapped contig based on the summed quality score at each position. A cumulative quality score threshold of 60 (corresponding to an error probability of 1 in 1,000,000) was applied to each position so that the accuracy of the final consensus sequence depended on both base quality and read depth, rather than on read depth alone. Positions below this threshold were called as unknown (“N”). Consensus sequences were aligned with Mauve, as implemented within Geneious, assuming collinear genomes and with automatic calculation of the seed weight and of the minimum Locally Collinear Block (LCB) score.
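The Phred arithmetic behind the trimming and consensus thresholds can be illustrated with a short sketch. It is not Geneious's implementation. Two simplifications are assumptions of this example: trimming keeps the longest stretch in which every base is at least Q30, and a consensus position's score is the quality sum of the best-supported base minus the quality sum of all competing bases. The function names (`trim_read`, `consensus_base`, `consensus_sequence`) and the toy data are hypothetical.

```python
from collections import defaultdict

# Phred Q = -10 * log10(P_error): Q30 <=> P = 0.001, Q60 <=> P = 1e-6.
MIN_BASE_Q = 30        # per-base threshold used for read trimming
CONSENSUS_MIN_Q = 60   # cumulative threshold used for consensus calling


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def trim_read(seq, quals, min_q=MIN_BASE_Q):
    """Trim both ends of a read, keeping the longest contiguous stretch in
    which every base has quality >= min_q (error probability <= 0.001).
    ASSUMPTION: a simplified stand-in for the trimming used in the study."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if q < min_q:
            run_start = i + 1          # a low-quality base breaks the current run
        elif i - run_start + 1 > best_len:
            best_start, best_len = run_start, i - run_start + 1
    end = best_start + best_len
    return seq[best_start:end], quals[best_start:end]


def consensus_base(calls, min_q=CONSENSUS_MIN_Q):
    """Call one consensus position from (base, quality) pairs of the reads
    covering it. ASSUMPTION: the position score is the quality sum of the
    best-supported base minus the quality sum of all competing bases."""
    sums = defaultdict(int)
    for base, q in calls:
        sums[base] += q
    if not sums:
        return "N"                     # no coverage at this position
    best = max(sums, key=sums.get)
    score = sums[best] - (sum(sums.values()) - sums[best])
    return best if score >= min_q else "N"


def consensus_sequence(pileup, min_q=CONSENSUS_MIN_Q):
    """pileup: one list of (base, quality) pairs per reference position."""
    return "".join(consensus_base(calls, min_q) for calls in pileup)


# Hypothetical toy data, not from the study.
print(trim_read("ACGTACGT", [10, 35, 35, 20, 35, 35, 35, 12]))  # ('ACG', [35, 35, 35])
print(consensus_sequence([[("A", 30), ("A", 30)],
                          [("G", 30), ("G", 30), ("T", 30)],
                          []]))                                  # ANN
```

The second position in the example shows why a summed-quality threshold depends on both depth and quality. Two Q30 reads supporting G reach 60 on their own, but a conflicting Q30 read lowers the score below the threshold, so the position is called "N".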
Regions that were difficult to align, or that contained more than three consecutive columns of unknown bases or gaps, were removed from the final alignment. Similarly, sites that were polymorphic solely because one or more sequences carried ambiguous base calls were removed; this was the only context in which ambiguities were observed. The final alignment, which still represented 99.2% of the reference genome, therefore contained only dimorphic single-site polymorphisms situated within otherwise invariable regions. After stripping identical sites, a total of 39 SNPs were identified (38 substitutions and 1 deletion; Table S2), of which seven were shared between two or more sequences. All SNPs were examined to confirm their validity before further analysis. Of particular concern was the potential inclusion of spurious SNPs associated with repeat regions of the genome, where mapping may be unreliable. Although four of the SNPs fell in or close to potentially problematic regions, the reliability of the mapping and SNP calling could be confirmed in all four cases (see Supplementary Materials). All SNP calls were supported by at least 38× coverage, with high consistency among reads (usually >95%). The only exception was a SNP at position 221927 (G→A), for which the consensus call was ambiguous in one of the four isolates in which it occurred (Herd5_E_2010: 92× coverage, A: 64%, G: 36%; Table S2). Preliminary analyses further showed that the phylogenetic information provided by this site conflicted with that of the other informative positions, which were in complete agreement with one another. Because these observations raised doubts both about the reliability of scoring this SNP and about the information it provided, the site was removed from the data set.
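The automated part of the alignment filtering can be sketched as follows. Manual exclusion of hard-to-align regions and manual SNP validation are not modeled. Two interpretations are assumptions of this example. First, a column counts as "unknown or gap" if any sequence has N or "-" there. Second, "N" and the other IUPAC ambiguity codes are treated as ambiguous calls. The names `keep_columns` and `call_snps` and the toy alignment are hypothetical.

```python
AMBIGUOUS = set("NRYSWKMBDHV")   # unknown (N) and IUPAC ambiguity codes
MISSING = set("N-")              # unknown base or alignment gap


def keep_columns(alignment, max_run=3):
    """Return indices of columns to keep, dropping every run of more than
    max_run consecutive columns in which any sequence has N or a gap.
    ASSUMPTION: one interpretation of the study's filtering rule."""
    n_cols = len(alignment[0])
    flagged = [any(seq[i] in MISSING for seq in alignment) for i in range(n_cols)]
    keep = [True] * n_cols
    i = 0
    while i < n_cols:
        if not flagged[i]:
            i += 1
            continue
        j = i
        while j < n_cols and flagged[j]:
            j += 1                     # extend the run of flagged columns
        if j - i > max_run:
            keep[i:j] = [False] * (j - i)
        i = j
    return [c for c in range(n_cols) if keep[c]]


def call_snps(alignment, max_run=3):
    """Return (column, states) for every dimorphic site in the filtered
    alignment. Invariant sites and sites variable only because of ambiguity
    codes are dropped; a gap counts as a state, so a single-column deletion
    is kept as a SNP."""
    alignment = [seq.upper() for seq in alignment]
    snps = []
    for col in keep_columns(alignment, max_run):
        resolved = {seq[col] for seq in alignment if seq[col] not in AMBIGUOUS}
        if len(resolved) == 2:
            snps.append((col, tuple(sorted(resolved))))
    return snps


# Hypothetical toy alignment, not data from the study.
aln = ["AAGTACGTTTTTC",
       "AAGTACGT----C",
       "ATGTRCGTTTTTC"]
print(call_snps(aln))   # [(1, ('A', 'T'))]
```

In the toy alignment, column 4 varies only because of the ambiguity code R, so it is dropped. Columns 8 to 11 form a four-column gap run and are also removed, which leaves a single A/T polymorphism.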
The final data set was used to infer a maximum likelihood tree with PhyML under the Jukes-Cantor model, using a heuristic tree search and the reference genome as the outgroup for rooting. All sequence data generated for this project are available from the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under accession number ERP001418. […]
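The Jukes-Cantor (JC69) model used for the tree search is simple enough to state in a few lines. The sketch below shows its transition probabilities and the corresponding pairwise distance. It illustrates the model only: it does not reproduce PhyML's likelihood calculation or heuristic search, and it is not code from the study. The function names are hypothetical, and skipping columns with "N" or gaps in `jc69_distance` is an assumption of this example.

```python
import math


def jc69_transition_prob(same_state, d):
    """JC69 probability that a site ends in a given state after a branch of
    length d (expected substitutions per site): the starting state itself
    if same_state is True, otherwise one specific different state."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if same_state else 0.25 - 0.25 * e


def jc69_distance(seq_a, seq_b):
    """JC69-corrected distance d = -3/4 * ln(1 - 4p/3), where p is the
    proportion of differing sites. ASSUMPTION: columns in which either
    sequence has N or a gap are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf                # saturated: distance is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(round(jc69_distance("AAAA", "AAAT"), 4))   # 0.3041
```

Because JC69 assumes equal base frequencies and a single substitution rate, it has no free parameters apart from branch lengths.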

Pipeline specifications

Software tools Geneious, Mauve, PhyML
Databases ENA
Applications Phylogenetics, Nucleotide sequence alignment
Organisms Mycobacterium bovis, Bos taurus
Diseases Tuberculosis