Computational protocol: Multiple conversion between the genes encoding bacterial class-I release factors

Protocol publication

[…] We retrieved the amino acid (aa) sequences of RF1 and RF2, and the nucleotide (nt) sequences of the 16S and 23S ribosomal RNA (rRNA) genes, for the genomes of 99 taxa belonging to the phylum Bacteroidetes from the GenBank database. The retrieved RF1 and RF2 aa sequences were aligned into a single alignment using MAFFT, followed by manual refinement. After the exclusion of ambiguously aligned positions, 230 aa positions remained in the final RF alignment. The 16S and 23S rRNA nt sequences were aligned separately as described above and then concatenated into a single alignment. The final rRNA alignment includes 3,729 unambiguously aligned nt positions.

The RF alignment was subjected to both maximum-likelihood (ML) and Bayesian phylogenetic analyses. The ML analyses were conducted using RAxML ver. 8.0 under the LG model incorporating among-site rate variation (ASRV) approximated with a discrete gamma distribution with four categories (LG + Γ model). The ML tree was selected from heuristic tree searches initiated from 20 randomized stepwise-addition parsimony trees. In the ML bootstrap analyses (100 replicates), a single tree search was performed per replicate. Bayesian analyses under the LG + Γ model were also conducted using MrBayes 3.2.1. Eight parallel Metropolis-coupled Markov chain Monte Carlo runs, each consisting of one cold and three heated chains with a chain temperature of 0.2, were run for 5,000,000 generations. Log-likelihood scores and trees with branch lengths were sampled every 1,000 generations. The first 1,250,000 generations were excluded as burn-in, and the remaining trees were summarized to obtain Bayesian posterior probabilities.

The rRNA alignment was subjected to both ML and Bayesian phylogenetic analyses as described above, except that nt substitutions were modelled under the general-time-reversible model incorporating ASRV approximated with a discrete gamma distribution with four categories (GTR + Γ model). In the alignment, G + C content varied from 49.2 to 59.5%.
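
As an illustration, per-sequence G + C content of the kind reported above, together with an approximate 95% confidence interval, could be computed along the following lines. This is a minimal Python sketch, not the authors' procedure: the source does not state how the interval was estimated, so the normal-approximation binomial interval, the function names (`gc_interval`, `flag_gc_outliers`), and the rule of flagging a sequence when its whole interval lies above or below the alignment-wide mean are all assumptions.

```python
import math


def gc_interval(seq, z=1.96):
    """Return (gc, lower, upper) for one aligned nucleotide sequence.

    Gaps and ambiguity codes are skipped. The interval is a
    normal-approximation binomial interval (an assumption; the source
    does not say how the 95% interval was estimated).
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    n = len(bases)
    if n == 0:
        raise ValueError("sequence contains no resolved bases")
    gc = sum(b in "GC" for b in bases) / n
    half_width = z * math.sqrt(gc * (1.0 - gc) / n)
    return gc, gc - half_width, gc + half_width


def flag_gc_outliers(alignment):
    """Return (mean_gc, flagged) for a {name: sequence} alignment.

    A sequence is flagged "high" or "low" when its whole interval lies
    above or below the mean G + C content of all sequences
    (hypothetical decision rule).
    """
    stats = {name: gc_interval(seq) for name, seq in alignment.items()}
    mean_gc = sum(gc for gc, _, _ in stats.values()) / len(stats)
    flagged = {}
    for name, (_, lower, upper) in stats.items():
        if lower > mean_gc:
            flagged[name] = "high"
        elif upper < mean_gc:
            flagged[name] = "low"
    return mean_gc, flagged
```

Because gap characters and ambiguity codes are skipped when counting, each interval reflects only the resolved bases of that sequence.
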
The impact of the variation in G + C content across the tree on tree reconstruction was evaluated by the additional ML analyses described below. We estimated the 95% confidence interval of the G + C content for each sequence based on the 3,729 nt positions, and identified the sequences whose G + C content departed significantly from the average G + C content calculated from the 99 sequences. We then modified the original rRNA alignment by removing the sequences with significantly high or low G + C content (note that the rRNA sequences of the members of Bacteroidia possessing 4aa_motif-type RF1 were retained in the second alignment, regardless of their G + C content). The second alignment was subjected to the ML analysis under the GTR + Γ model as described above. In addition, we recoded the four nucleotide characters (A, C, G, and T) in the original alignment into purine (R; A or G) and pyrimidine (Y; C or T), as this ‘RY-coding’ procedure is known to cancel or reduce the artifactual impact of variation in G + C content on tree reconstruction in both empirical and simulated nt data. The resultant ‘RY-coding’ alignment was subjected to the ML analysis with the model of Cavender and Felsenstein for two-state characters, incorporating ASRV approximated with a discrete gamma distribution with four categories. We used RAxML for the ML analyses of the rRNA alignments comprising four nucleotide characters, while PhyML ver. 3.0 was used for the ML analysis of the RY-recoded alignment.

[...] We generated ‘6-pair’ alignments from the RF alignment to survey the potential signal of conversion between the RF1 and RF2 genes. The alignment positions (230 aa positions) in the 6-pair alignments were identical to those in the RF alignment. Each 6-pair alignment contained a pair of 12aa_motif-type RF1 and RF2 sequences and five pairs of 4aa_motif-type RF1 and RF2 sequences, which were sampled from five species in Bacteroidetes.
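
The RY-coding step described earlier in this passage amounts to a per-character substitution, which can be sketched in Python as follows. The function name `ry_recode` and the handling of non-standard characters (U treated as T, gaps kept, any other ambiguity code written as `?`) are assumptions made for illustration; the source does not specify how such characters were treated or how the two states were encoded for PhyML.

```python
def ry_recode(seq):
    """Recode a nucleotide sequence into purine (R) / pyrimidine (Y) states.

    A, G (and an existing R) become R; C, T, U (and an existing Y)
    become Y. Gaps are kept as '-'. Every other character is written as
    '?', i.e. treated as missing data (an assumption of this sketch).
    """
    recoded = []
    for base in seq.upper():
        if base in "AGR":
            recoded.append("R")
        elif base in "CTUY":
            recoded.append("Y")
        elif base == "-":
            recoded.append("-")
        else:
            recoded.append("?")
    return "".join(recoded)


def ry_recode_alignment(alignment):
    """Apply ry_recode to every sequence of a {name: sequence} alignment."""
    return {name: ry_recode(seq) for name, seq in alignment.items()}
```

Since G and A collapse to one state and C and T to the other, two sequences that differ only in their G + C content at transition-type sites become identical after recoding, which is why the procedure dampens compositional signal.
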
The detailed sequence sampling in these alignments is described in Results and Discussion. We preliminarily subjected ‘4-pair,’ ‘8-pair,’ and ‘10-pair’ alignments, which comprised a single pair of 12aa_motif-type RF1 and RF2 plus 3, 7, and 9 pairs of 4aa_motif-type RF1 and RF2, respectively, to the sliding-window (SW) analyses (see below for the details of the SW analyses). The signal of RF1-RF2 gene conversion (GC-signal) in windows 12–15 appeared less conspicuous in the 4-pair alignment-based analysis than in the 6-pair alignment-based analysis (compare the plot in pink with that in green). On the other hand, the 10-pair alignment-based analysis was seemingly more sensitive than the 6-pair alignment-based analysis to ‘non GC-signal’ in the N-terminal region (windows 1–10), which was unrelated to RF1-RF2 gene conversion (compare the plot in blue with that in green). The GC-signal in windows 12–15 from the 8-pair alignment-based analysis appeared as conspicuous as that from the 6-pair alignment-based analysis, whereas the 8-pair analysis was more sensitive to non GC-signal in windows 2–6 than the 6-pair analysis (compare the plot in purple with that in green). Considering the balance between sensitivity to the GC-signal and insensitivity to non GC-signal, we decided to subject 6-pair alignments to the main SW analyses in the current study.

We subjected all 6-pair alignments to the SW analysis. For each window, we calculated the log-likelihood (lnL) of the tree assuming no gene conversion (Tree_global) and that of the tree affected by gene conversion (Tree_conv), and then subtracted the former value from the latter (see Results and Discussion for the details). The statistical significance of the difference between the two lnL values (ΔlnL) was assessed by a parametric bootstrap test.
For each of the 50 6-pair alignments, Seq-Gen version 1.3.3 was used to simulate 50 replicate datasets with 230 aa positions over the ML tree whose topology and branch lengths were inferred from that alignment. Of note, all ML trees inferred from the 230 aa positions of the 50 6-pair alignments recovered the split between RF1 and RF2 sequences. The model parameters for sequence simulation were estimated from the original datasets. We subjected the simulated datasets (2,500 in total) to the SW analysis to obtain the null distribution of the ΔlnL values and set the critical value for a 0.01-level test. We applied RAxML 8.0 with the LG + Γ model for the SW analyses of the original 6-pair alignments. We used the WAG + Γ model for both sequence simulation and the SW analyses of the simulated sequence data, as the LG model is not implemented in Seq-Gen.

The same procedure was applied to the analyses with the 7aa_motif-type RF1 and RF2 sequences of three members of Chloroflexi (see Results and Discussion).

[...] The tertiary structures of RF1 and RF2 of Thermus thermophilus (RCSB Protein Data Bank IDs 3MR8 and 2X9R, respectively), which reside in the ribosome as part of the release complex, were visualized using VMD 1.9.1. In this work, the four domains in RF1/2 are defined as per Korostelev (2011). […]
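
The parametric bootstrap test described above reduces to comparing each observed ΔlnL with a critical value taken from the null distribution of ΔlnL values obtained from the simulated datasets. The Python sketch below illustrates that comparison only. It assumes a one-sided test on the upper tail (large positive ΔlnL favouring the gene-conversion tree) and an empirical-quantile definition of the critical value, neither of which is spelled out in the source; the function names `critical_value` and `significant_windows` are hypothetical, and no values from the study are reproduced.

```python
import math


def critical_value(null_values, alpha=0.01):
    """Empirical upper-tail critical value of a null distribution.

    Returns the smallest null value v such that at most a fraction
    alpha of the null values exceed v (empirical-quantile rule; an
    assumption of this sketch).
    """
    ordered = sorted(null_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("null distribution is empty")
    # Rounding guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round((1.0 - alpha) * n, 9))
    rank = min(max(rank, 1), n)
    return ordered[rank - 1]


def significant_windows(observed, null_values, alpha=0.01):
    """Return 1-based indices of windows whose observed delta-lnL
    exceeds the critical value of the null distribution."""
    threshold = critical_value(null_values, alpha)
    return [i for i, delta in enumerate(observed, start=1) if delta > threshold]
```

With a null distribution of 2,500 simulated values and alpha = 0.01, this rule places the critical value at the 2,475th smallest null ΔlnL, so at most 25 simulated values exceed it.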

Pipeline specifications