Computational protocol: Integration of Bayesian molecular clock methods and fossil based soft bounds reveals early Cenozoic origin of African lacertid lizards

Similar protocols

Protocol publication

[…] Thirty-five species, representing 33 of 41 currently recognized genera, were used to construct a molecular phylogeny for the main lineages of Lacertidae. Partial DNA sequences of 3 mitochondrial genes (12S, 16S and Cytb) and 2 nuclear genes (Rag-1 and C-mos) were retrieved from GenBank. Most lacertid genera are represented by a single species, with the exception of Psammodromus and Mesalina, which are each represented by two. All genes used in this study were not available for some of the species, so that six of the genera (Acanthodactylus, Algyroides, Eremias, Nucras, Parvilacerta and Pedioplanis) are represented by a combination of genes from two congeneric species. For example, the missing Cytb sequence of Acanthodactylus boskianus is substituted by that of A. erythrurus. Such substitutions at the genus level should have no effect on overall tree topology, since we are primarily interested in phylogenetic relationships of higher taxonomic units (i.e. above the generic level). The final data set for Lacertidae consists of 3 individuals from the subfamily Gallotinae (Gallotia + Psammodromus), 15 individuals from Eremiadini corresponding to 14 genera, and 17 individuals from Lacertini each representing a single genus. Three additional species were used as outgroups: the teiid Cnemidophorus tigris, the amphisbaenian Rhineura floridana, and one of two living members of Rhynchocephalia, Sphenodon punctatus, as outgroup to all squamates. GenBank accession numbers for sequence data are listed in Table . Lacertid taxonomy follows Arnold et al. [].Alignments were performed separately for each gene using ClustalW [] and manually corrected in SEAVIEW []. A total of 15–20 base pairs (bp) of 16S that could not be aligned unambiguously were excluded from the analysis. Final gene lengths are 254 bp 16S, 327 bp 12S, 281 bp Cytb, 1012 bp Rag-1 and 375 bp C-mos. To test for incongruence among genes, a partition homogeneity test [] was conducted in PAUP* 4.0b10 []. The test (100 replicates of random addition heuristic search option with tree-bisection-reconnection branch swapping) indicated significant heterogeneity among genes (p = 0.01). However, since a growing number of studies indicate that incongruence tests are not reliable indicators of data set combinability [] and no strongly supported nodes were in conflict with previous studies, genes were concatenated into a multigene data set of 2249 bp. Following a total evidence approach [], the following analyses were conducted on the combined data to maximize the amount of characters and explanatory power of the available data. As a test of our combined approach, we also analyzed partitioned mitochondrial DNA (mtDNA) and nuclear DNA (nDNA) sequences for one of the relaxed clock models (Uncorrelated lognormal with 10% prior probability distributions, described below). These values were then compared to results from the concatenated data set to explore possible biases associated with the different genomes. [...] Divergence dates for Lacertidae were estimated under four different Bayesian molecular clock models. Minimum constraints for five nodes were chosen based on evidence from the fossil record. In a conservative approach, the oldest age of the stratigraphic layer in which a fossil was found was used to represent the earliest occurrence of that lineage, and potential calibrations were limited to fossils that are reliably assigned to extant clades. Calibrated nodes are: (i) Sphenodon punctatus – Cnemidophorus tigris, 228.0 Mya, based on the earliest identified rhynchocephalian from the late Triassic [Carnian; []], and corroborated by the oldest-known fossil squamate, Tikiguania, from the Carnian of India [], (ii) Cnemidophorus tigris – Rhineura floridana, 113.0 Mya, corresponding to the oldest known teiid, Ptilotodon, from the lower Cretaceous [Aptian-Albian; []], (iii) Rhineura floridana – Gallotia galloti, 64.2 Mya, based on the fossil rhineurid Plesiorhineura from the Paleocene [Torrejonian; []], and (iv) Timon lepidus – Dalmatolacerta oxycephala, 5.3 Mya based on the Pliocene "Lacerta ruscinensis" from Roussillon, France, whose fossil remains are indistinguishable from the modern T. lepidus presently living in the same area [].To incorporate uncertainty surrounding fossil calibrations, prior constraints are expressed as probability based distributions. We use a rigid, or "hard", minimum bound, meaning that the true divergence date cannot be younger than the earliest known fossil. The probability that the divergence event occurred above the minimum date declines according to an exponential distribution, such that 95% of the posterior density falls within the range [x - x + 10%] (Figure ). For example, the minimum age constraint for the split between Rhynchocephalia and Squamata is 228 Mya, and the expected posterior estimate is between 228.0 and 239.4. To test the sensitivity of posterior estimates to prior distributions, we also allow expectancy values for calibrated nodes to fall within 20% of the minimum age, so that 95% of the posterior density is between [x - x + 20%]. This allows us to evaluate the influence of the range of soft bounds used for a given data set, irrespective of possible errors in fossil calibration dates.In addition to estimating divergence dates, we evaluate the reliability of our proposed fossil calibrations by systematically removing individual priors and comparing posterior estimates. Specifically, we test the accuracy of the dates proposed for amphisbaenians (Plesiorhineura, 64.2 Mya) and teiids (Ptilotodon, 113.0 Mya) using three different treatments. In the first treatment, both the amphisbaenian and teiid are excluded so that only the oldest date (Rhynchocephalia, 228 Mya) and youngest date (Lacerta ruscinensis, 5.3 Mya) remain. In the second and third treatments, only the amphisbaenian or teiid is removed, respectively. If a calibration is accurate, provided the remaining calibrations are reliable and the data and model are appropriate, the posterior estimate should remain within the prior range even in the absence of the fossil constraint. If the calibration is poor, the posterior should move away from the prior []. This approach also allows us to compare our results with other studies using similar combinations of fossil calibrations to date the origins of squamate groups [see [,]].For nucleotide data, all models are nested in the General Time Reversible model of sequence evolution with a proportion of invariant sites and gamma distributed rate heterogeneity (GTR+I+Γ), as determined by jModelTest 0.1.1 [,]. For each analysis, the MCMC was run for 50,500,000 steps each chain and sampled every 500,000 steps. The first 1,000,000 steps of each run were discarded as burnin. To couple the four parallel chains we used a heating coefficient of 0.3. This resulted in a sample of size of 100 from the posterior distribution, taken from the cold chain.MCMC calculations were performed in the program TreeTime, freely available at []. Within that program, the following models were implemented:MC: Strict molecular clock model [], assumes a fixed rate of evolution along all branches of the tree.CPP: Compound Poisson Process [], in which points of rate change are interspersed along branches. Following each substitution event, the current rate is modified according to a Poisson process with an adaptive intensity, which determines the a priori distribution of the number of changes. Rate modulations are gamma distributed, such that the expectancy value of the product of multiple rate changes is equal to 1.ULN: Uncorrelated lognormal distributed model of Drummond et al. [], in which the evolutionary rate of each branch is independently drawn from a lognormal distribution. There is no autocorrelation of rates between neighbouring branches. Parameters within the model determine the expectancy value and variance of rates. A smaller variance indicates a smaller deviation from the strict molecular clock, since rates of change are similar across branches.DM: Dirichlet model []. The a priori distribution of evolutionary rates at the branches follows a dirichlet distribution. Parameters within the model determine the variance of rates. The smaller the variance, the smaller the deviation from a strict molecular clock. The average evolutionary rate across branches is kept constant, so that only relative differences between rates are considered.As an independent evaluation of our results, we also calculate divergence dates for Lacertidae under the ULN model in BEAST [], an alternative program for Bayesian analysis. Identical model parameters were used in the two programs with the following exceptions: 1) In addition to priors for calibrated nodes, BEAST requires a prior for the distribution of divergence dates, for which we chose the Yule process. 2) BEAST estimates the equilibrium distribution of nucleotides only once at the beginning of the analysis, TreeTime samples these estimates continuously. 3) In BEAST the molecular clock is relaxed by varying molecular rates of the substitution model among branches, for which reason the rates are dependent on the time scale of the tree. TreeTime compresses or stretches the lengths of branches in the tree, given in molecular time units, by rate multipliers with a mean of one.Finally, we test the performance of alternative clock relaxations on our data by computing Bayes factors, a Bayesian alternative to likelihood ratio tests. Bayes factors calculate the ratio of marginal likelihoods between two given models by integrating over all possible parameter values (as opposed to estimating the maximum likelihood for each parameter). In a comparison between models M1 and M2, a Bayes factor >10 on a logarithmic scale indicates that M1 is more strongly supported by the data under consideration than M2 []. A significant advantage of Bayes factors over likelihood ratio tests is that they automatically penalize models with increasing complexity, and thus guard against overfitting. Furthermore, by using the strict molecular clock as a reference, they allow for a general comparison among any number of independent models []. […]

Pipeline specifications

Software tools Clustal W, SeaView, PAUP*, jModelTest, TreeTime, BEAST
Application Phylogenetics