Computational protocol: Genomic and functional analyses of Mycobacterium tuberculosis strains implicate ald in D-cycloserine resistance

Protocol publication

[…] We used two sets of strains in this study: a set of 337 strains from South Africa (NCBI BioProject accessions PRJNA183624 and PRJNA235615) and a set of 161 strains from China (NCBI Sequence Read Archive (SRA) accession SRA065095). For both sets, reads were mapped onto a reference assembly of H37Rv (GenBank accession CP003248.2) using BWA version 0.5.9. Where read coverage of the reference exceeded 200x, reads were down-sampled using Picard (see URLs) prior to mapping. Variants were identified using Pilon version 1.5 as described. All sites with unambiguous SNPs were used to generate a phylogenetic tree using FastTree version 2.1.3. To assess the amount of unexplained phenotypic resistance in our dataset, we identified genomic polymorphisms known to confer drug resistance in each strain, using variants from current diagnostics supplemented with a literature review.

To identify novel variants conferring drug resistance, we hypothesized that the presence of causal genotypes in bacteria should be both positively correlated with resistance phenotypes and not evolutionarily independent of them. We therefore conducted two tests. The first was a Fisher’s exact test to measure the association between genotype and phenotype. The second was a test for correlated evolution implemented in the software package BayesTraits (see URLs). Briefly, the correlated evolution test calculates the likelihood of two traits evolving dependently and the likelihood of their evolving independently, and then compares the two models with a likelihood ratio test. If the dependent model is significantly more likely than the independent model, the hypothesis of evolutionary independence is rejected. All test results were corrected for multiple comparisons using the Benjamini-Hochberg method for plotting purposes.
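
The statistical steps described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the Fisher's exact test is a plain hypergeometric implementation, and the likelihood ratio test assumes 4 degrees of freedom (the difference between an 8-rate dependent model and a 4-rate independent model of two binary traits), which the protocol text does not state. In practice the two log-likelihoods would be taken from BayesTraits output.

```python
from math import comb, exp, factorial


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be genotype present/absent and columns resistant/susceptible.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def likelihood_ratio_test(lnl_dependent, lnl_independent, df=4):
    """Compare nested models; df=4 is an assumption, not from the protocol."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only handles positive even df")
    stat = max(0.0, 2.0 * (lnl_dependent - lnl_independent))
    half = stat / 2.0
    # Chi-square survival function in closed form for even df.
    p = exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))
    return stat, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For real analyses, an established statistics library would be a safer choice than these hand-written functions; they are kept dependency-free here only so the logic is visible.
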
Once both the Fisher’s exact test and the correlated evolution test had been calculated, their uncorrected p-values were combined into an F-score by taking the harmonic mean of the negative log p-values. Uncorrected values were used in the F-score calculation to provide maximal discrimination between variants with poor p-values.

To compare phenotypic drug resistances to genotypic features, two matrices of these features were constructed from the variant calls of all strains. In the first matrix, rare variants (those at ≤ 2% frequency) were collapsed by gene or intergenic region, while common variants (those at > 2% frequency) remained uncollapsed and were considered independently. In the second matrix, we assessed whether each gene had a loss-of-function mutation (defined as a nonsense mutation, a frameshift, or a large (>30 bp) in-frame insertion or deletion (indel)), regardless of the frequency of the mutation. We then combined the two matrices into a single matrix for association analyses. All analyses were conducted twice: once with all samples, and once after removing strains with known resistance-conferring mutations, in order to amplify the signal from strains with potential for novel resistance genotypes (Results). […]
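
The F-score and the feature matrices described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the input layout (dictionaries mapping a variant or gene to the set of strains carrying it), and the feature labels are hypothetical; the logarithm is assumed to be base 10; and frequency is assumed to mean the fraction of strains carrying a variant.

```python
from math import log10


def f_score(p_fisher, p_correlated):
    """Harmonic mean of the negative log10 p-values of the two tests."""
    for p in (p_fisher, p_correlated):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    a = max(0.0, -log10(p_fisher))
    b = max(0.0, -log10(p_correlated))
    if a + b == 0.0:
        return 0.0
    # The harmonic mean is dominated by the weaker of the two signals.
    return 2.0 * a * b / (a + b)


def build_features(variant_carriers, lof_carriers, n_strains, rare_cutoff=0.02):
    """Combine the two feature matrices into one {feature: strains} mapping.

    variant_carriers: {(locus, variant_id): set of strain names}
    lof_carriers:     {gene: set of strain names with a loss-of-function call}
    """
    features = {}
    for (locus, variant), strains in variant_carriers.items():
        if len(strains) / n_strains <= rare_cutoff:
            # Rare variants are collapsed per gene or intergenic region.
            key = f"{locus}:rare"
        else:
            # Common variants stay as independent features.
            key = f"{locus}:{variant}"
        features.setdefault(key, set()).update(strains)
    for gene, strains in lof_carriers.items():
        # Loss-of-function features ignore variant frequency.
        features.setdefault(f"{gene}:LoF", set()).update(strains)
    return features


# Toy input with made-up locus, variant, and strain names.
toy_variants = {
    ("geneA", "v1"): {"s1"},
    ("geneA", "v2"): {"s2", "s3"},
    ("geneB", "v1"): {"s1", "s2", "s3", "s4", "s5"},
}
toy_lof = {"geneA": {"s2"}}
toy_features = build_features(toy_variants, toy_lof, n_strains=100)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a variant scores highly only when both tests support it, which is consistent with the stated requirement that causal genotypes be both associated with the phenotype and not evolutionarily independent of it.
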

Pipeline specifications

Software tools: BWA, Picard, Pilon, FastTree, BayesTraits
Application: Phylogenetics
Organisms: Mycobacterium tuberculosis
Diseases: Tuberculosis
Chemicals: Alanine