Computational protocol: Integrating Fossils, Phylogenies, and Niche Models into Biogeography to Reveal Ancient Evolutionary History: The Case of Hypericum (Hypericaceae)

Protocol publication

[…] To reconstruct ancestral geographic ranges and biogeographic events in Hypericum, we used the chloroplast data set of , which is based on three chloroplast markers (trnL-trnF, trnS-trnG and psbA-trnH). analyzed different concatenated plastid data sets varying in the amount of missing data to evaluate its effect on phylogenetic reconstruction. Of these, we selected the matrix called “Two-markers”, which includes only those specimens represented by at least two of the three chloroplast markers (N = 114). The reduction in missing data yielded a tree with higher resolution and branch support values than the complete data set including all specimens (N = 192), while still recovering most of the morphological variation (92% of traditionally described sections) and all of the geographical variation within the genus (). The missing species mainly belong to the large Brathys group from America, Ascyreia from Asia, and the Hirtella group from the Levant region (; ). Most species within each of these clades are distributed in the same biogeographic region, and all three clades are represented in our phylogeny. Moreover, used the “Two-markers” data set in their biogeographic analysis based only on extant taxa, so using the same data set in this study facilitates comparison with their results. A fourth data set based on the nuclear marker ITS (N = 252) recovered a tree nearly identical to the plastid one (), but it was not used here because the higher variation in substitution rates in nrDNA compared with chloroplast genes () and the possibility of incomplete concerted evolution in ITS () make this marker less reliable for phylogenetic dating. Four outgroup taxa representing the tribes Cratoxyleae and Vismieae of family Hypericaceae were also included in the analysis: Harungana Lamarck, Psorospermum Spach, and Vismia Vand., of the tropical tribe Vismieae (sister to tribe Hypericeae), and Eliea Cambess., representing the tropical tribe Cratoxyleae.
Phylogenetic relationships and absolute lineage divergence times within Hypericum were estimated in BEAST v.1.8.1 (). The “Two-markers” chloroplast data set was analyzed partitioned by gene, that is, applying an individual GTR+G substitution model to each gene, based on the results of , who carried out sensitivity analyses to evaluate the effect of different partitioning strategies on their data. The choice of clock and tree model priors in the BEAST analysis was based on Bayes factor comparisons of harmonic mean estimates of the marginal likelihood of alternative runs with different combinations of settings (e.g., strict clock vs. uncorrelated lognormal relaxed clock (UCLD), Yule vs. birth-death). We repeated the analyses using the path sampling (PS) and stepping-stone (SS) sampling methods, which have been shown to outperform the harmonic mean estimator in terms of reliability and consistency of results (, ). We ran the analysis for 40 million generations, and the power posterior runs used an additional chain length of 4 million generations. The highest marginal likelihood corresponded to a birth-death tree prior with a UCLD molecular clock, but it was not significantly higher than that of the Yule UCLD model (). Given that mixing, effective sample sizes (ESS), and convergence were better under the latter model, we used it in our analysis, with two replicate MCMC searches. As in , two calibration points were used to derive absolute ages: the Late Eocene fossil H. antiquum, which constrains the crown node of Hypericum (= Hypericeae) through a lognormal prior (offset = 33.9 myr, standard deviation (SD) = 0.7), and a secondary calibration point for the root of the tree (crown node of family Hypericaceae), taken from the calibrated clusioid clade phylogeny of , with a normal prior (mean = 65.2 myr, SD = 11) (see online Appendix 2 and Supplementary Fig. S1 in online Appendix 3 for more details). [...]
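The Bayes factor comparison of clock and tree priors can be illustrated with a minimal Python sketch. It assumes that each BEAST run (e.g., a path sampling or stepping-stone analysis) yields one log marginal likelihood estimate; the values below are hypothetical placeholders, not results from this study. Evidence is expressed on the 2 ln BF scale and read with the Kass and Raftery (1995) categories, an assumed yardstick since the exact criterion used in the study is not stated here.

```python
from typing import Dict, Tuple

Model = Tuple[str, str]  # (tree prior, clock model)

# Hypothetical log marginal likelihoods (e.g., from PS or SS runs); illustration only.
LOG_ML: Dict[Model, float] = {
    ("birth-death", "UCLD"): -21010.4,
    ("Yule", "UCLD"): -21011.1,
    ("birth-death", "strict"): -21085.9,
    ("Yule", "strict"): -21086.3,
}


def two_ln_bf(log_ml_a: float, log_ml_b: float) -> float:
    """2 ln BF in favour of model A over model B, from log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery(value: float) -> str:
    """Verbal strength of evidence on the 2 ln BF scale (Kass & Raftery 1995)."""
    if value < 2:
        return "not worth more than a bare mention"
    if value < 6:
        return "positive"
    if value < 10:
        return "strong"
    return "very strong"


# Compare the model with the highest marginal likelihood against every other run.
best = max(LOG_ML, key=LOG_ML.get)
for model, log_ml in LOG_ML.items():
    if model == best:
        continue
    b = two_ln_bf(LOG_ML[best], log_ml)
    print(f"{best} vs {model}: 2lnBF = {b:.1f} ({kass_raftery(b)})")
```

With these placeholder values the birth-death UCLD model ranks first, but its advantage over Yule UCLD (2 ln BF = 1.4) falls in the weakest category; a result of that kind leaves room to prefer the model that mixes better, as described in the text.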
We used the Geographic State Speciation and Extinction model (GeoSSE; ) implemented in the R package diversitree () to investigate whether speciation and extinction rates have historically differed across biogeographic regions in Hypericum. In trait-dependent diversification models such as GeoSSE or BiSSE (), the trait itself (including transitions between character states) influences the birth-death process that generates the speciation times in the phylogeny. GeoSSE extends the binary BiSSE model with a third, polymorphic state for geographic characters, because taxa are often not endemic to one area but present in more than one area/state (). Biogeographic regions were defined as above, except that NT and OC were excluded from the analysis: NT does not include widespread species, whereas Hypericum occurs only marginally in OC. For the selected geographic regions we estimated maximum likelihood (ML) parameters for a full GeoSSE model, in which speciation, extinction and dispersal rates are allowed to differ between areas, as well as for a set of constrained GeoSSE models: equal rates of within-region speciation (sA ∼ sB, sAB ∼ 0), no between-region speciation (sAB ∼ 0), equal dispersal rates between regions (dA ∼ dB), or equal within-region extinction rates (xA ∼ xB). Model fit was compared using likelihood ratio tests and Akaike Information Criterion (AIC) values. The eight parameters estimated in the full GeoSSE model under ML were used as priors for a Bayesian MCMC search. The MCMC chain was run for 10,000 generations, and the first 1000 were discarded as burn-in.
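The likelihood ratio tests and AIC comparison of the full and constrained GeoSSE models can be sketched in standard-library Python. This is not diversitree code: the log-likelihoods are hypothetical placeholders, the full-model parameter count (K_FULL) is an assumption, and each constraint is taken to remove one free parameter per tied or fixed rate. Because only differences in parameter number enter the statistics, the absolute value of K_FULL does not change the p-values or the AIC differences.

```python
import math
from dataclasses import dataclass


@dataclass
class Fit:
    name: str
    loglik: float  # maximized log-likelihood of the model
    n_params: int  # number of free parameters


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability, computed as 1 - P(df/2, stat/2)
    using the series for the regularized lower incomplete gamma function."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def aic(fit: Fit) -> float:
    return 2 * fit.n_params - 2 * fit.loglik


def lrt(full: Fit, constrained: Fit):
    """Likelihood ratio statistic, degrees of freedom and p-value."""
    stat = 2 * (full.loglik - constrained.loglik)
    df = full.n_params - constrained.n_params
    return stat, df, chi2_sf(stat, df)


K_FULL = 7  # assumption: the GeoSSE rates sA, sB, sAB, xA, xB, dA, dB
full = Fit("full", -512.4, K_FULL)  # hypothetical log-likelihoods throughout
constrained = [
    Fit("sA ~ sB, sAB ~ 0", -515.9, K_FULL - 2),
    Fit("sAB ~ 0", -512.6, K_FULL - 1),
    Fit("dA ~ dB", -513.1, K_FULL - 1),
    Fit("xA ~ xB", -512.5, K_FULL - 1),
]

for fit in constrained:
    stat, df, p = lrt(full, fit)
    d_aic = aic(fit) - aic(full)  # negative: constrained model preferred by AIC
    print(f"{fit.name:18s} LR = {stat:5.2f}, df = {df}, p = {p:.3f}, dAIC = {d_aic:+.2f}")
```

With these placeholder numbers, tying sA to sB and fixing sAB at zero costs enough likelihood (LR = 7.0 on 2 df, p ≈ 0.03, higher AIC) to be rejected at the 5% level, whereas the single-rate constraints are retained and slightly favoured by AIC.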
The current version of GeoSSE accounts for random incomplete taxon sampling within areas but can handle only two areas, so we compared each biogeographic region against the pooled values of all other regions; for instance, diversification and dispersal rates were estimated for species distributed in the EP region and compared with those estimated for species in the remaining regions. […]
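Because each GeoSSE run handles only two areas, every species has to be recoded for each focal region. The Python sketch below is illustrative only: it assumes diversitree's GeoSSE state convention (0 = present in both A and B, 1 = endemic to A, 2 = endemic to B), assumes that NT and OC occurrences are simply dropped from each species' range, and uses hypothetical species, placeholder region codes (R1, R2) and made-up species totals for the sampling fractions.

```python
from typing import Dict, List, Optional, Set

EXCLUDED = {"NT", "OC"}  # regions left out of the GeoSSE analysis


def geosse_state(areas: Set[str], focal: str) -> Optional[int]:
    """State of one species when the focal region (A) is compared with all
    other retained regions pooled (B). Assumed coding, following diversitree's
    GeoSSE convention: 0 = A and B, 1 = A only, 2 = B only.
    Returns None when no retained region is left."""
    kept = areas - EXCLUDED
    if not kept:
        return None
    in_a = focal in kept
    in_b = bool(kept - {focal})
    if in_a and in_b:
        return 0
    return 1 if in_a else 2


def code_tips(distribution: Dict[str, Set[str]], focal: str) -> Dict[str, int]:
    """Map each retained species to its GeoSSE state for one focal region."""
    states = {}
    for species, areas in distribution.items():
        state = geosse_state(areas, focal)
        if state is not None:
            states[species] = state
    return states


def sampling_fractions(states: Dict[str, int], known_totals: List[int]) -> List[float]:
    """Fraction of known species that are sampled in each state (0, 1, 2)."""
    sampled = [0, 0, 0]
    for state in states.values():
        sampled[state] += 1
    return [sampled[i] / known_totals[i] for i in range(3)]


# Hypothetical tip ranges; "R1" and "R2" stand in for the other regions.
distribution = {
    "sp1": {"EP"},
    "sp2": {"EP", "R1"},
    "sp3": {"R1", "R2"},
    "sp4": {"NT"},
    "sp5": {"R2", "OC"},
}
states = code_tips(distribution, focal="EP")
print(states)
print(sampling_fractions(states, known_totals=[2, 4, 5]))
```

Repeating `code_tips` with each region as the focal area reproduces the one-region-versus-rest design; the resulting states and sampling fractions would then be passed to diversitree in R, which this sketch does not attempt.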

Pipeline specifications

Software tools: BEAST, Diversitree
Application: Phylogenetics