Computational protocol: Evidence for an evolutionary antagonism between Mrr and Type III modification systems

[…] To construct the phylogenetic tree of E. coli and S. enterica strains, all full-length Escherichia and Salmonella 16S rRNA gene sequences were downloaded from the Greengenes database core set in the aligned FASTA format. Next, the E. coli ED1a 16S rRNA gene sequence was identified using NCBI nucleotide BLAST (with E. coli CFT073 16S rRNA as a seed) and added manually to the dataset. Gene sequences that could not be successfully aligned, that originated from partially sequenced genomes, or that were duplicates were removed from the dataset. Next, the dataset was re-aligned using Greengenes, which aligns 16S rRNA gene sequences to 7682-character full-length gene templates. All thresholds were kept at default values. Finally, based on the alignment, the phylogenetic tree was calculated using MEGA 4.0, employing the Minimum Evolution method and assuming a Jukes–Cantor model of nucleotide substitution. Bootstrap values based on 1000 replications are listed as percentages at the branching points. […] Sequences of E. coli Mrr (gi: 127320) and S. typhimurium Mod (gi: 300193) were used as queries for PSI-BLAST searches (E-value ≤ 1e−3) against the nr database, using default parameters and run until convergence. We retrieved 2001 Mrr and 6878 Mod sequences. After removing redundant sequences and sequences from partially sequenced genomes, we performed sequence clustering based on pairwise BLAST similarity scores, using Cluster Analysis of Sequences (CLANS). The clustering was completed at P = 0.012 for Mrr and P = 0.004 for Mod. These P-value cut-offs give a well-resolved separation of multiple distinct clusters. Each cut-off was chosen empirically from the P-value plot for the data set, which shows a histogram of the number of sequences for each E-value below a given value. For example, a cut-off of 0.004 will exclude connections worse than 0.004.
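The effect of a cut-off on cluster membership can be illustrated with a minimal Python sketch. It is not the CLANS implementation: it simply drops every pairwise connection whose P-value is worse (larger) than the cut-off and groups the remaining sequences by connectivity. The function name `clusters_at_cutoff`, the sequence labels, and the P-values in `toy_pairs` are all hypothetical and chosen only for illustration.

```python
def clusters_at_cutoff(pairs, cutoff):
    """Group sequences connected by pairwise P-values at or below `cutoff`.

    `pairs` is an iterable of (seq_a, seq_b, p_value) tuples.
    Illustrative stand-in only; CLANS itself is not reimplemented here.
    """
    graph = {}
    for a, b, p in pairs:
        # Register both sequences so unconnected ones become singletons.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if p <= cutoff:  # connections worse than the cut-off are excluded
            graph[a].add(b)
            graph[b].add(a)

    seen = set()
    clusters = []
    for node in sorted(graph):
        if node in seen:
            continue
        # Depth-first traversal collects one connected component.
        component = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise P-values, not taken from the study.
toy_pairs = [
    ("query", "seqA", 0.001),
    ("seqA", "seqB", 0.003),
    ("seqB", "seqC", 0.010),
    ("seqC", "seqD", 0.0005),
]

strict = clusters_at_cutoff(toy_pairs, 0.004)
loose = clusters_at_cutoff(toy_pairs, 0.012)
print(strict)  # [['query', 'seqA', 'seqB'], ['seqC', 'seqD']]
print(loose)   # [['query', 'seqA', 'seqB', 'seqC', 'seqD']]
```

In this toy data the seqB/seqC connection (P = 0.010) survives the looser cut-off but not the stricter one, so the cluster containing the query changes between the two runs. Repeating the grouping at neighbouring cut-offs and comparing the query's cluster is one simple way to check how sensitive its membership is to the chosen value.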
Clusters are considered robust if small changes in cut-off values do not result in major changes in their content. Next, we extracted the members of the clusters containing the Mrr query sequence (gi: 127320) and the Mod query sequence (gi: 300193) for correlation analysis. In the end, we obtained 75 MrrMG1655 and 211 ModLT2 homologues derived from 272 fully sequenced genomes, and eliminated duplicate species to obtain 45 MrrMG1655 and 156 ModLT2 homologues derived from 192 fully sequenced genomes. The Pearson correlation coefficient r measures the degree to which the values of two variables are linearly related to each other. It is defined as the covariance of the two variables divided by the product of their standard deviations, and was calculated for the MrrMG1655 and ModLT2 families using STATISTICA 8 (StatSoft, Inc., Tulsa, OK, USA). We also determined the probability that the observed correlation is real and not a chance occurrence. The obtained correlation of −0.52 for 45 MrrMG1655 and 156 ModLT2 homologues, derived from 192 fully sequenced genomes, is less than the critical value for df = 190, α = 0.05, two-tailed test, with P < 0.0001. […]
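The definition of Pearson's r given above can be written out in a few lines of Python. This is a sketch under stated assumptions, not the STATISTICA procedure: it assumes each genome is scored 1 or 0 for the presence or absence of a homologue, and the vectors `mrr_present` and `mod_present` are hypothetical toy data. The t statistic uses the standard relation t = r·sqrt(df / (1 − r²)) with df = n − 2, which for 192 genomes gives the df = 190 quoted in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))


# Hypothetical presence (1) / absence (0) calls for eight genomes.
mrr_present = [1, 1, 0, 0, 1, 0, 0, 1]
mod_present = [0, 0, 1, 1, 0, 1, 1, 1]
r_toy = pearson_r(mrr_present, mod_present)

# t statistic implied by the reported r = -0.52 over 192 genomes (df = 190).
t_reported = t_statistic(-0.52, 192)

print(round(r_toy, 4))       # -0.7746
print(round(t_reported, 2))  # -8.39
```

A t statistic of about −8.4 on 190 degrees of freedom lies far in the tail of the t distribution, which is consistent with the reported P < 0.0001.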

Pipeline specifications

Software tools: MEGA, BLASTP, CLANS, Statistica
Databases: Greengenes
Applications: Miscellaneous, Phylogenetics, Proteome data visualization
Organisms: Escherichia coli, Salmonella enterica subsp. enterica serovar Typhimurium str. LT2, Escherichia coli ED1a
Diseases: Drug-Related Side Effects and Adverse Reactions