Computational protocol: To move or to evolve: contrasting patterns of intercontinental connectivity and climatic niche evolution in “Terebinthaceae” (Anacardiaceae and Burseraceae)

Protocol publication

[…] Sequence data for assessing the individual phylogenies of Anacardiaceae and Burseraceae have been generated by the current authors using multiple phylogenetic markers (Weeks; Pell; Fine et al.; Weeks et al.; Pell et al.). The published datasets overlapped for three DNA sequence regions: the nuclear ribosomal external transcribed spacer (ETS), the chloroplast trnL intron and trnL-F intergenic spacer (trnL-F region), and the chloroplast rps16 intron. All of these regions have proven alignable across the targeted taxa and useful for investigating phylogeny at the familial and generic levels. These three datasets were expanded with additional taxa for the current study using the amplification and sequencing protocols outlined in the publications referenced above.
Multiple sequence alignment for each locus was carried out in MAFFT v7.0 (Katoh and Standley) with the E-INS-i algorithm. To improve alignment quality, we cleaned the alignments with Gblocks v0.91b (Castresana) using the parameters -b3=4, -b4=10, and -b5=h, as this has been shown to improve subsequent phylogenetic analyses (Talavera and Castresana).
Before phylogenetic inference, we evaluated whether the final concatenated matrix should be partitioned by marker or by any combination of markers, and which nucleotide substitution model should be employed for the final partition scheme. For this analysis, we used the Bayesian Information Criterion as implemented in PartitionFinder (Lanfear et al.) with the greedy algorithm, and we unlinked branch length estimates for each of the substitution models in each partition. The results showed that the matrix should be treated as a single partition evolving under the GTR+I+Gamma model of nucleotide substitution.
[...] The chronogram and divergence times were co-estimated using Markov Chain Monte Carlo (MC²) sampling in BEAST v1.8 (Drummond et al.).
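As a rough illustration of the alignment and cleaning steps described earlier in this section, the sketch below only builds the command lines; it does not run them. The file names and helper functions are hypothetical, the MAFFT options are those its documentation lists for E-INS-i, and the Gblocks sequence-type flag (-t=d) is an assumption, since the protocol reports only -b3, -b4, and -b5.

```python
from pathlib import Path

# The three loci named in the protocol; the file names built from them are hypothetical.
LOCI = ("ETS", "trnL-F", "rps16")


def mafft_einsi_command(unaligned_fasta):
    """Argument list for a MAFFT E-INS-i run (MAFFT prints the alignment to stdout)."""
    # --genafpair with --maxiterate 1000 is the option set MAFFT documents for E-INS-i.
    return ["mafft", "--genafpair", "--maxiterate", "1000", str(unaligned_fasta)]


def gblocks_command(aligned_fasta):
    """Argument list for Gblocks with the -b3/-b4/-b5 settings reported in the protocol."""
    # -t=d (DNA input) is an assumption; the protocol only reports -b3, -b4 and -b5.
    return ["Gblocks", str(aligned_fasta), "-t=d", "-b3=4", "-b4=10", "-b5=h"]


def build_commands(workdir="."):
    """Pair each locus with its (hypothetical) MAFFT and Gblocks command lines."""
    plan = {}
    for locus in LOCI:
        raw = Path(workdir) / f"{locus}.fasta"
        aligned = Path(workdir) / f"{locus}.aligned.fasta"
        plan[locus] = (mafft_einsi_command(raw), gblocks_command(aligned))
    return plan


if __name__ == "__main__":
    for locus, (mafft_cmd, gblocks_cmd) in build_commands().items():
        print(locus, "|", " ".join(mafft_cmd), "|", " ".join(gblocks_cmd))
```

Each list could be handed to subprocess.run; because MAFFT writes the alignment to standard output, its output would need to be redirected to the aligned file before Gblocks is called.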
A birth-death speciation process (Gernhard) was specified as the tree prior, with a death rate parameter sampled from a U(0, 1) prior distribution and a growth rate parameter sampled from a U(0, ∞) prior distribution. Rate heterogeneity among lineages was modeled using an uncorrelated lognormal relaxed molecular clock (Drummond et al.) with a mean sampled from an Exp(10) prior distribution. We used a secondary calibration to set the prior on the age of the root using a N(85, 8) prior distribution; this parameterization accounts for the uncertainty surrounding the age of the Sapindales (Muellner et al.; Magallón and Castillo). We used the six Terebinthaceae fossils (see above) to set priors on six nodes: the most recent common ancestor (MRCA) of Cotinus, the MRCA of Loxopterygium, the MRCA of Anacardium, the MRCA of the Protieae, the MRCA of Commiphora, and the MRCA of Bursera subgenus Elaphrium. Because all of these fossils are fragmentary, it is not possible to be certain that any of them possesses features that would place it in a crown group. Therefore, we took a conservative approach and used them as minimum calibrations of the stem groups (Forest). All of these nodes were parameterized with exponential distributions in which the offset matched the minimum bound set by the fossil age, and the mean was set to be 10% older than this value. Because random starting trees did not satisfy the temporal and topological constraints associated with some fossil calibrations, we used ExaML v1.0.12 (Stamatakis and Aberer) to estimate a maximum likelihood tree, transformed it into a chronogram using penalized likelihood (Sanderson; Paradis), and used it as the starting topology in BEAST. The MC² was run for 6 × 10⁷ generations, sampling every 4 × 10³, with the first 20% of the samples discarded as burn-in.
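The offset exponential calibration described above can be sketched as follows. This is one possible reading, not the authors' BEAST configuration: it assumes that "10% older" means the prior mean sits at 1.1 times the fossil age, so the exponential part has a mean of 0.1 times that age. The function names and the 50 Ma fossil age are hypothetical, because the fossil ages are not given in this excerpt.

```python
import math


def offset_exponential_calibration(fossil_min_age, older_fraction=0.10):
    """Return (offset, exp_mean) for an offset exponential node-age prior.

    Assumption: the prior mean is (1 + older_fraction) * fossil_min_age,
    so the exponential component has mean older_fraction * fossil_min_age.
    """
    offset = fossil_min_age
    exp_mean = older_fraction * fossil_min_age
    return offset, exp_mean


def offset_exponential_pdf(age, offset, exp_mean):
    """Prior density at a node age; zero for ages younger than the fossil."""
    if age < offset:
        return 0.0
    return math.exp(-(age - offset) / exp_mean) / exp_mean


def offset_exponential_quantile(p, offset, exp_mean):
    """Age below which a fraction p (0 <= p < 1) of the prior mass lies."""
    return offset - exp_mean * math.log(1.0 - p)


if __name__ == "__main__":
    # Hypothetical fossil with a minimum age of 50 Ma.
    offset, exp_mean = offset_exponential_calibration(50.0)
    upper = offset_exponential_quantile(0.975, offset, exp_mean)
    print(f"offset={offset} Ma, exponential mean={exp_mean} Ma, 97.5% quantile={upper:.2f} Ma")
```

Under this reading the prior places no mass on ages younger than the fossil and decays quickly for older ages, which matches the use of the fossils as minimum bounds.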
Convergence to stationarity of the MC² sampling was assessed with time-series plots of the likelihood scores and cumulative split frequencies, and by confirming that the estimated effective sample sizes for the chronograms and model parameters were at least 100. Post burn-in chronograms were summarized with a maximum clade credibility tree (MCCT) using median branch lengths.
We carried out diversification analyses in two ways. First, we used BayesRate (Silvestro et al.) to evaluate whether a single birth-death diversification process for the whole Terebinthaceae, or two birth-death diversification processes, one for Anacardiaceae and one for Burseraceae, better explains the accumulation of lineages through time. For this analysis, we used flat priors and clade-specific taxon sampling proportions (P_Anacardiaceae = 0.21, P_Burseraceae = 0.19), unlinked rates between clades, and ran the MC² for 1 × 10⁵ generations, sampling every 1 × 10² and discarding the first 10% as burn-in. For model selection, we used Bayes factors computed from marginal likelihoods estimated by thermodynamic integration. Second, we used BAMM (Rabosky) to automatically detect shifts in the diversification process through time without defining tree partitions a priori. For this analysis, we used 1.0 for the Poisson rate prior, the lambda initial prior, and the extinction rate prior. We included a global taxon sampling proportion P = 0.20. We ran 1 × 10⁷ generations of MC², sampling every 1 × 10³ and discarding the first 10% as burn-in, with two independent runs to assess convergence. […]
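The chain lengths, sampling intervals, and burn-in fractions reported for the BEAST, BayesRate, and BAMM runs imply the following numbers of retained samples. The sketch is plain arithmetic on the values stated in the text; the function and dictionary names are hypothetical, and it assumes the initial state is not logged, which varies between programs and can shift the counts by one.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, retained) sample counts for one chain."""
    total = generations // sample_every
    discarded = round(total * burnin_fraction)
    return total, discarded, total - discarded


# (generations, sampling interval, burn-in fraction) as reported in the protocol.
RUNS = {
    "BEAST": (6 * 10**7, 4 * 10**3, 0.20),
    "BayesRate": (10**5, 10**2, 0.10),
    "BAMM": (10**7, 10**3, 0.10),
}

if __name__ == "__main__":
    for name, settings in RUNS.items():
        total, discarded, retained = retained_samples(*settings)
        print(f"{name}: {total} samples, {discarded} discarded, {retained} retained")
```

For the BEAST run this gives 15,000 logged samples, of which 3,000 are discarded and 12,000 retained; the effective sample size threshold of 100 is therefore judged against these post burn-in samples.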

Pipeline specifications

Software tools: MAFFT, Gblocks, PartitionFinder, BEAST, ExaML
Application: Phylogenetics