Computational protocol: Evolutionary Roots and Diversification of the Genus Aeromonas

Protocol publication

[…] Phylogenetic reconstruction of all strains was carried out from the concatenated sequences of the mdh and recA genes. For each gene, the translated sequences were aligned using the ClustalW program implemented in MEGA6 and translated back to obtain the nucleotide alignments. Both alignments were concatenated with the DAMBE program (v5.3.10).

Dating and diversification model analyses were performed using two different approaches to obtain one sequence per species. In the first approach, the consensus sequence for each species was obtained from the sequences of all the strains belonging to that species. For species represented by a single strain, the concatenated sequence of that strain was used. The consensus DNA sequences were obtained using the R seqinr package with the majority method option, in which the character with the highest frequency at each position is returned as the consensus character. In the second approach, we generated the species tree from the multiple alignments of each gene as separate data partitions, with several individuals per species, using the starBEAST method, an extension of the BEAST (Bayesian Evolutionary Analysis Sampling Trees) software package.

[...] Bayesian phylogenetic trees were reconstructed from the data sets with the BEAST program (v1.8.1). The model of evolution for each gene was determined using the jModelTest 2 program. The general time-reversible model with discrete gamma distribution and invariant sites (GTR+G+I) was selected as the best-fit model of nucleotide substitution. The Bayesian analyses were performed using a GTR model with four gamma categories, a Yule process of speciation as the tree prior, and an uncorrelated lognormal relaxed-clock model of rate variation, with default values for all other parameters. We performed three independent Markov Chain Monte Carlo (MCMC) runs of 20 million (consensus tree), 50 million (all strains), or 100 million (species tree) generations, sampling every 2,000 (consensus tree) or 5,000 (all strains and species tree) generations.
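The majority-rule consensus step described above can be illustrated with a minimal Python sketch. It is not the seqinr implementation used in the protocol: the function name `majority_consensus` and the toy sequences are hypothetical, and the handling of ties (the first character seen in a column wins) is an assumption that may differ from seqinr's behavior.

```python
from collections import Counter


def majority_consensus(aligned_seqs):
    """Return the most frequent character at each column of an alignment.

    `aligned_seqs` is a list of equal-length strings (one per strain).
    A species with a single strain simply gets its own sequence back.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # Highest-frequency character; on a tie the first one seen in the
        # column is kept (an assumption made for this sketch).
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Toy example: three hypothetical strains of one species.
strains = ["ATGCCA", "ATGACA", "ATGCTA"]
print(majority_consensus(strains))  # ATGCCA
```

In this toy alignment the fourth and fifth columns each contain one minority character, so the consensus keeps the character shared by two of the three strains.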
To assess convergence, posterior distributions of parameter estimates and likelihood scores were visualized with the Tracer program (v1.6.0). Visual inspection of traces within and across runs, together with the effective sample sizes (ESS) of each parameter (>200), allowed us to confirm that the analyses were adequately sampled. The outputs of the three MCMC runs were combined with the LogCombiner program after discarding the initial 20–25% of trees as burn-in, and a maximum clade credibility (MCC) tree was obtained from the combined output with TreeAnnotator (v1.8.1). The MCC tree was visualized with the program FigTree (v1.4.2).

[...] All analyses were performed in the R environment (v3.1.3) using functions implemented in the ape, LASER and TreeSim packages. MCC ultrametric trees (consensus and species tree chronograms) were used after excluding the calibration outgroup.

Standard lineages-through-time (LTT) plots, linear regression analysis, and LTT plots obtained from 1,000 simulated phylogenies with the same size and diversification rate for each set were generated as previously described, to graphically visualize and evaluate the temporal pattern of lineage diversification in Aeromonas. Moreover, we also estimated the theoretical LTT curve, a recently developed method, to assess the fit of our data.

We used the birth–death likelihood (BDL) tests implemented in LASER to detect the temporal pattern of diversification and to estimate the speciation and extinction rates (λ and μ) from the Aeromonas phylogeny. The LTT plot derived from the MCC tree was used to test the null hypothesis of no rate change against the alternative of a variable diversification rate, applying Rabosky's maximum likelihood (ML) approach, the ΔAIC_RC test. This statistic is calculated as ΔAIC_RC = AIC_RC − AIC_RV, where AIC_RC is the Akaike information criterion (AIC) score of the best-fitting rate-constant diversification model and AIC_RV is the AIC score of the best-fitting rate-variable diversification model.
Thus, a positive value of ΔAIC_RC indicates that the data are best approximated by a rate-variable model, while a negative value suggests a rate-constant model of diversification. We tested five different models, two of which were rate-constant (pure-birth or Yule, and birth–death) and three of which were rate-variable (DDL, DDX and Yule 2-rates).

We calculated the gamma (γ) statistic and its significance by simulating 1,000 phylogenies, as described previously. This statistic compares the relative node positions in a phylogeny with those expected under a constant diversification rate model, under which the statistic follows a standard normal distribution. Positive γ values indicate that nodes are closer to the tips than expected under the constant-rate model. When γ is negative, the internal nodes are closer to the root than expected under the constant-rate model, indicating a decrease in diversification through time. In addition, we compared the observed empirical γ value with the distribution of γ values obtained by simulation.

Finally, to detect variations in evolutionary rates through time and among lineages, we used the BAMM (Bayesian Analysis of Macroevolutionary Mixtures) program. All the results and calculations were visualized using the BAMMtools package, from which we obtained a phylogenetic tree with the diversification rates on each branch, as well as the net diversification rates through time. Moreover, we estimated the cumulative probabilities of the number of rate shifts in the phylogeny (models with 0, 1 or several shifts) and the Bayes factor (BF). The BF is the ratio of the posterior probabilities of two models: a model with zero rate shifts and another with at least one diversification shift. The BF is interpreted as evidence in favor of the numerator model that is not worth more than a bare mention (1–3.2), moderate (3.2–10), strong (10–100), or decisive (>100). […]
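The two decision rules described above, the ΔAIC_RC comparison and the Bayes factor scale, can be sketched in Python as follows. This is an illustration only and not the LASER or BAMMtools code used in the protocol. The log-likelihood values are invented for the example, the parameter counts per model are assumptions (1 for pure-birth, 2 for birth–death, DDL and DDX, 3 for Yule 2-rates), and the treatment of values falling exactly on a category boundary is also an assumption.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def delta_aic_rc(rate_constant, rate_variable):
    """AIC of the best rate-constant model minus AIC of the best rate-variable model.

    Each argument maps a model name to a (log_likelihood, n_params) pair.
    A positive result favors a rate-variable model.
    """
    best_rc = min(aic(lnl, k) for lnl, k in rate_constant.values())
    best_rv = min(aic(lnl, k) for lnl, k in rate_variable.values())
    return best_rc - best_rv


def bayes_factor_label(bf):
    """Map a Bayes factor to the evidence categories quoted in the text."""
    if bf > 100:
        return "decisive"
    if bf > 10:
        return "strong"
    if bf > 3.2:
        return "moderate"
    if bf >= 1:
        return "not worth more than a bare mention"
    return "favors the denominator model"


# Hypothetical log-likelihoods, for illustration only.
rate_constant = {"pure-birth": (-10.0, 1), "birth-death": (-9.8, 2)}
rate_variable = {"DDL": (-7.5, 2), "DDX": (-8.0, 2), "Yule 2-rates": (-7.4, 3)}

print(delta_aic_rc(rate_constant, rate_variable))  # 3.0
print(bayes_factor_label(12.5))  # strong
```

With these invented numbers the best rate-constant model is pure-birth (AIC 22) and the best rate-variable model is DDL (AIC 19), so ΔAIC_RC is positive and a rate-variable model would be preferred.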

Pipeline specifications