Computational protocol: Phylogenetic analyses suggest reverse splicing spread of group I introns in fungal ribosomal DNA

Protocol publication

[…] To provide a framework for understanding group I intron evolution in the fungi, we reconstructed a phylogeny of the Pezizomycotina that included Symbiotaphrina spp. This tree was inferred from a combined DNA data set of nuclear SSU rDNA, LSU rDNA, and RPB2 DNA (2100 nt) from 84 ascomycetes with one basidiomycete as the outgroup. These data are available from GenBank. The combined data set was analyzed using a GTR + Γ + I model of evolution for each of the five data partitions (SSU, LSU, and RPB2 1st, 2nd, and 3rd codon positions). Bayesian analysis (MrBayes V3.0b4 []) was initiated using a random tree from the combined data set, with four chains run simultaneously for 5,000,000 generations and trees sampled every 100 generations. The first 10,000 trees were discarded (burnin) and a majority rule consensus tree was generated from the remaining 40,000 (post burnin) trees. A neighbor-joining analysis was also used to calculate bootstrap support values for nodes in the Bayesian consensus tree. The supra-generic taxon names used in this tree follow []. [...]
The ten group I introns found in Symbiotaphrina spp. were aligned with 179 fungal group I introns at 28 different rDNA sites. The non-Symbiotaphrina introns are published [e.g., []] and available either from GenBank or from the Comparative RNA Web Site []. The introns from the 28 rDNA sites represent the diversity of fungal rDNA group I introns well, although we excluded a small number of introns from other sites that were difficult either to align or to assign unambiguously to an rDNA genic position (e.g., S940, S1049, S1201). The 189 group I introns were aligned through juxtaposition of the secondary structural elements P1–P9 found in nuclear group I introns [,,]. For this procedure we used, wherever possible, existing secondary structures from representatives of different intron insertion sites [e.g., [,,-]] to guide the alignment.
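The sampling arithmetic and the majority-rule summary described for the Bayesian host phylogeny can be sketched as follows. This is a minimal illustration, not the MrBayes implementation: the function names, the toy taxa A to D, and the representation of each tree as a set of clades are assumptions made for the example, and the count assumes exactly one tree is recorded per sampling interval.

```python
from collections import Counter


def posterior_sample_counts(generations, sample_every, burnin_trees):
    """Return (trees sampled in total, trees kept after burn-in)."""
    total = generations // sample_every
    if burnin_trees > total:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return total, total - burnin_trees


def majority_rule_clades(trees, burnin_trees, threshold=0.5):
    """Map each clade found in more than `threshold` of the
    post-burn-in trees to its sample frequency.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    kept = trees[burnin_trees:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    n = len(kept)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Settings reported in the text: 5,000,000 generations, one tree every
# 100 generations, first 10,000 trees discarded.
total, kept = posterior_sample_counts(5_000_000, 100, 10_000)
print(total, kept)  # 50000 40000

# Hypothetical toy sample of five trees; the first is treated as burn-in.
toy_trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
]
support = majority_rule_clades(toy_trees, burnin_trees=1)
for clade, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), freq)
# AB 0.75
# ABC 0.75
```

The frequencies returned for the retained clades play the role of the posterior probabilities reported on the consensus tree; clades below the 50% threshold (AC and ABD in the toy sample) are dropped.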
We did not attempt to include all available fungal group I introns (there are nearly 1200 group I introns in this group [see []]) but, given the taxonomic distribution, sampled the different lineages evenly. Our approach was designed to provide an overall view of fungal group I intron phylogeny and is not expected to detect lateral transfers within intron lineages that would be apparent in detailed analyses of introns at particular rDNA sites and the host phylogeny [e.g., [,,,,,]]. Given the large number of introns and rDNA genic sites to consider, we divided the phylogenetic analyses into increasingly focused data sets. The initial data set of 189 introns was used to gain broad insights into group I intron phylogeny and, in particular, the distribution of the S. buchneri introns within the tree. This tree provided evidence for the vertical evolution and movement of introns. Thereafter, we reduced the data set to a representative group of 116 introns to increase the phylogenetic resolution. Finally, we included the putative reverse splicing candidates in data sets of 51 and 34 sequences. The sequences were pruned approximately uniformly from the trees to retain the diversity of introns at the different rDNA sites. This approach was necessary to gain meaningful insights into group I intron evolution because phylogenetic methods often perform poorly under the conditions encountered here; i.e., when the interrelationships of many divergent lineages must be resolved with a relatively small data set.
A total of 136 aligned positions were selected for the initial phylogenetic analyses (alignment available from DB upon request). For these data, we used two different approaches to infer the phylogeny. First, we used the single parameter Jukes-Cantor (JC) evolutionary model [] with neighbor-joining (NJ) tree reconstruction to estimate a tree.
This "simple" model is potentially useful for large data sets with short (in this case, highly divergent) sequences when multiple parameter estimates are expected to have high associated variances [e.g., [,]]. Under such conditions, the maximum likelihood method may give an incorrect topology []. Branch lengths will, however, be underestimated under the JC model. The JC-NJ tree was inferred using PAUP* (V4.0b10 []) and bootstrap analyses (2000 replications) were done to assess the support for monophyletic groups in the JC-NJ tree. In the second approach, we used the parameter-rich GTR + Γ model ([] i.e., estimated proportion of invariant sites = 0.0147) in a Bayesian inference as described above to calculate posterior probabilities for nodes in the intron tree. In this analysis, the run was started from a random tree and continued for 3,000,000 generations, with trees sampled every 1000th generation. To increase the probability of chain convergence, the first 2,000 trees were discarded as burnin and the remaining 1,000 were used to calculate the posterior probabilities.
Based on the analysis of the 189-sequence data set, we generated a second, reduced intron data set of 116 sequences that maintained the diversity of intron sites in the large data set. A JC-NJ tree was inferred from these data (with bootstrap support values) and Bayesian posterior probabilities were calculated for the tree as described above. In addition, we did an unweighted maximum parsimony (MP) bootstrap analysis of the data. For this method, a heuristic search was used with each of the 2000 bootstrap pseudosamples, and starting trees were obtained using random additions (10 rounds) with tree bisection-reconnection branch swapping.
The 51-sequence data set (138 nt in this alignment) was analyzed with the JC-NJ, MP, and Bayesian methods as described above. In addition, we did a maximum likelihood bootstrap analysis of these data.
In this approach, the gamma value (with 4 rate categories) and the transition/transversion ratio were estimated using PAUP*. Bootstrap analyses (100 replicates) were then done using DNAML (PHYLIP V3.6b []) with 1 random taxon addition and global rearrangements. We also generated a second reduced alignment of 34 introns that included only the catalytic core (66 nt) of the fungal group I introns []. Analysis of the core region alignment allowed us to assess whether the most highly conserved region of these ribozymes resulted in essentially the same tree as when the more variable regions were included. For the core alignment, we used Bayesian inference (as described above) to infer a 50% majority-rule consensus phylogeny from the final 1000 trees in the posterior distribution. In all of these intron phylogenies, the evolutionarily distantly related group IE introns [,] were used to root the subtree of IC introns []. Finally, we used the maximum likelihood-based Shimodaira-Hasegawa statistical test [] to assess the likelihood support for alternative intron topologies. […]
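Two building blocks of the JC-NJ bootstrap analyses described above can be sketched in a few lines: the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, and the resampling of alignment columns with replacement that produces one bootstrap pseudosample. This is an illustrative sketch only; it does not reproduce PAUP* or PHYLIP, and the function names, the toy sequences, the random seed, and the choice to skip any site that is not A, C, G, or T in both sequences are assumptions made for the example.

```python
import math
import random


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguity code are skipped
    (an assumption of this sketch).
    """
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def bootstrap_columns(alignment, rng):
    """Return one pseudosample: columns drawn with replacement,
    same length as the input alignment (dict of name -> sequence)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment, not data from the study.
toy_alignment = {
    "intron_1": "ACGTACGT",
    "intron_2": "ACGTACGA",
    "intron_3": "ACG-ACTA",
}
d = jc_distance(toy_alignment["intron_1"], toy_alignment["intron_2"])
pseudosample = bootstrap_columns(toy_alignment, random.Random(1))
```

In a full analysis, a distance matrix would be computed for each pseudosample, a neighbor-joining tree built from it, and the fraction of replicate trees containing a given group reported as its bootstrap support.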

Pipeline specifications

Software tools: MrBayes, PAUP*, PHYLIP
Databases: CRW
Application: Phylogenetics
Organisms: Saccharomyces cerevisiae