Computational protocol: A Molecular Genetic Timescale for the Diversification of Autotrophic Stramenopiles (Ochrophyta): Substantive Underestimation of Putative Fossil Ages

Protocol publication

[…] The nuclear-encoded SSU rRNA gene was chosen as the molecular marker for inferring phylogenetic relationships among the major ochrophyte lineages and for estimating the timeframes within which the photosynthetic heterokonts originated and diversified. Using this gene allowed the most extensive taxonomic sampling of the autotrophic stramenopile classes, including the non-photosynthetic oomycetes, which are thought to be the closest living relatives of the ochrophytes. Including the immediate sister taxon is essential for accurately estimating when a given lineage evolved, because it allows both the stem age and the crown age of that lineage to be estimated. In addition to the non-photosynthetic stramenopiles, we used representatives of the dinoflagellates, haptophytes, ‘green plants’, and rhodophytes as outgroups and for calibration purposes. All 135 nuclear-encoded SSU rRNA sequences used in the study were obtained from GenBank (accession numbers are listed in the supplementary information). The software package DAMBE v4.5.55 was used to manage the nucleotide data. The nucleotide sequences were aligned with MAFFT v6 using the L-INS-i strategy and the default parameter settings (scoring matrix: 200PAM / k = 2; gap opening penalty = 1.53; offset value = 0.00). The alignment is available from the corresponding author upon request. [...] To reduce bias in phylogenetic inference, we employed a joint model that accommodates both rate heterogeneity (heterotachy) and pattern heterogeneity, as implemented in the program BayesPhylogenies (available from …). A reversible-jump Markov chain Monte Carlo (rjMCMC) algorithm was used to determine how many distinct patterns of among-site rate variation and how many branch-length parameters (with a maximum of two parameters per branch) were required to describe the empirical data matrix optimally.
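The L-INS-i alignment step above can be scripted. The sketch below is illustrative only: it assumes the `mafft` executable is on the PATH, and the file name `ssu_rrna.fasta` is hypothetical. `--localpair --maxiterate 1000` is MAFFT's documented L-INS-i invocation, and `--op 1.53 --ep 0.0` restate the gap opening penalty and offset given above; the 200PAM / k = 2 nucleotide scoring matrix is MAFFT's default, so it is left implicit.

```python
import shlex
import subprocess


def linsi_command(input_fasta: str, op: float = 1.53, ep: float = 0.0) -> list[str]:
    """Build a MAFFT L-INS-i command line (values mirror the settings reported above)."""
    return [
        "mafft",
        "--localpair",           # L-INS-i: local pairwise alignments
        "--maxiterate", "1000",  # up to 1000 cycles of iterative refinement
        "--op", str(op),         # gap opening penalty
        "--ep", str(ep),         # offset value
        input_fasta,
    ]


def run_alignment(input_fasta: str, output_fasta: str) -> None:
    """Run MAFFT and save the alignment, which MAFFT writes to stdout."""
    with open(output_fasta, "w") as out:
        subprocess.run(linsi_command(input_fasta), stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical input file; only prints the command, does not run MAFFT.
    print(shlex.join(linsi_command("ssu_rrna.fasta")))
```

Keeping command construction separate from execution makes it easy to log the exact parameters used, which matters when an alignment is only "available upon request".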
Besides potentially identifying regions of the tree where phylogenetic reconstruction might be misled (for example, by a high degree of heterotachy), this analysis provided the initial well-resolved tree required to guide the placement of fossil calibrations in the divergence time analyses (below). A General Time Reversible (GTR) model of nucleotide substitution with discretized gamma-distributed rate variation (4 rate categories; Γ4) was employed throughout. This is slightly simpler than the model used for divergence time estimation (GTR + Γ4 + I; below), because the authors of BayesPhylogenies recommended against estimating the proportion of invariant sites. Five independent MCMC analyses (each with one chain run for 10⁶ generations, sampling every 10³ generations) were conducted to approximate the posterior distribution of phylogenetic trees, and the post-burnin samples (burnin set to 10%) from all analyses were combined for parameter summary. Convergence of the MCMC runs was assessed graphically by examining the cumulative posterior and the between-run variation in split frequencies with the online tool AWTY. [...] Divergence time estimation accommodating topological uncertainty was performed with the uncorrelated lognormal relaxed clock model of Drummond et al. under GTR + Γ4 + I, as implemented in BEAST v1.5.3. Unlike most other available relaxed clock methods, this approach does not assume that rates are autocorrelated across the tree in an ancestor-descendant fashion; instead, branch-specific relative rates are drawn from a lognormal distribution whose mean and standard deviation are estimated from the data by MCMC sampling. A birth-death diversification process was used as the prior on node heights. Tree topology and divergence times were estimated simultaneously, although monophyly was enforced for some internal nodes to facilitate the placement of prior age-calibration distributions (see below).
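The uncorrelated lognormal relaxed clock described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not BEAST's implementation: the branch labels and the fixed mean and standard deviation are hypothetical, whereas BEAST estimates both by MCMC. Python's `random.lognormvariate(mu, sigma)` takes the mean and standard deviation of the underlying normal, so the desired arithmetic mean m and standard deviation s of the rates are first converted with σ² = ln(1 + s²/m²) and μ = ln m − σ²/2. Each branch then receives an independent draw, so a branch's rate carries no information about its parent's.

```python
import math
import random


def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Convert the arithmetic mean/SD of a lognormal into (mu, sigma) of its log."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def draw_branch_rates(branches: list[str], mean: float, sd: float,
                      rng: random.Random) -> dict[str, float]:
    """Draw one rate per branch independently (no ancestor-descendant autocorrelation)."""
    mu, sigma = lognormal_params(mean, sd)
    return {b: rng.lognormvariate(mu, sigma) for b in branches}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical branch labels and clock parameters, for illustration only.
    rates = draw_branch_rates(["b1", "b2", "b3", "b4"], mean=1.0, sd=0.5, rng=rng)
    for branch, rate in rates.items():
        print(f"{branch}: {rate:.3f}")
```

In an autocorrelated model, by contrast, each branch's rate would be drawn conditional on the rate of its parent branch.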
Six replicate runs of 10⁷ generations were performed for each analysis, sampling every 5 × 10⁴ generations. Convergence, mixing, and effective sample sizes (ESS) were monitored with Tracer v1.5. Post-burnin samples were combined across runs to summarize parameter estimates. […]
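The ESS check and the pooling of post-burnin samples can be sketched as follows. This is a simplified illustration, not Tracer's exact algorithm: ESS is taken as N divided by an autocorrelation time estimated by summing sample autocorrelations until they first drop to zero or below. The 10% burnin fraction is an assumption borrowed from the BayesPhylogenies analysis, since no burnin is stated for the BEAST runs, and the AR(1) traces are hypothetical stand-ins for logged parameters. With 10⁷ generations sampled every 5 × 10⁴, each run logs roughly 200 samples (one more if the initial state is logged, as assumed here).

```python
import random
from statistics import fmean


def drop_burnin(samples: list[float], fraction: float = 0.1) -> list[float]:
    """Discard the first `fraction` of a run's samples as burnin."""
    return samples[int(len(samples) * fraction):]


def pool_runs(runs: list[list[float]], fraction: float = 0.1) -> list[float]:
    """Concatenate the post-burnin samples of several independent runs."""
    pooled: list[float] = []
    for run in runs:
        pooled.extend(drop_burnin(run, fraction))
    return pooled


def effective_sample_size(x: list[float]) -> float:
    """ESS = N / tau, with tau = 1 + 2 * (sum of lag autocorrelations),
    truncated at the first non-positive autocorrelation (a simplification)."""
    n = len(x)
    mean = fmean(x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        raise ValueError("trace has zero variance; ESS is undefined")
    tau = 1.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def ar1_trace(n: int, phi: float, rng: random.Random) -> list[float]:
    """Hypothetical autocorrelated trace standing in for a logged MCMC parameter."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    runs = [ar1_trace(201, 0.5, rng) for _ in range(6)]  # six replicate runs
    for i, run in enumerate(runs, 1):
        print(f"run {i}: ESS ~ {effective_sample_size(drop_burnin(run)):.0f}")
    print("pooled post-burnin samples:", len(pool_runs(runs)))
```

Strongly autocorrelated traces yield an ESS far below the raw sample count, which is why ESS rather than the number of logged samples is the quantity to monitor.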

Pipeline specifications

Software tools: DAMBE, MAFFT, BayesPhylogenies, AWTY, BEAST
Application: Phylogenetics