Computational protocol: The biogeographic origin of a radiation of trees in Madagascar: implications for the assembly of a tropical forest biome

Protocol publication

[…] The Canarieae contains roughly 250 species placed in 11 genera distributed throughout the tropics (Additional file : A1), the internal structure of which is currently being revised []. We sampled 90 of these species: 13 of Boswellia, 50 of Canarium, 13 of Dacryodes, one of Garuga, 11 of Santiria, two of Trattinnickia, and the single species of Triomma (see Additional file : A1 for taxonomic sampling information, A2 for collection and voucher information, and A3 for sequence data obtained from GenBank). This sample encompasses the range of ecological and morphological variation in Canarieae, as well as the major biogeographic regions in which this variation occurs. Most of the sampled species were collected by the authors (S. Federman, A. Downie, D. Daly), and sampling was supplemented with previously published sequences available in GenBank (Additional file : A2 and A3). Outgroups included representatives of seven other Burseraceae genera and two species of Anacardiaceae used in previous phylogenetic studies of Burseraceae [, , ].

We sequenced four molecular markers used in previous phylogenetic analyses of Burseraceae [, , ]: the nuclear ribosomal external transcribed spacer (ETS) and three chloroplast DNA markers, rbcL, rps16, and the trnL-F intergenic spacer. Plant tissues were ground using the MP Biomedicals FastPrep-24 instrument (Santa Ana, CA), and DNA was extracted using the Qiagen DNeasy plant kit (Valencia, CA) following the manufacturer’s protocol. PCR amplification and sequencing conditions followed Weeks et al. []. Sequences were edited using Geneious R7, and all new sequences were deposited in GenBank (Additional file : A2). [...]

Multiple sequence alignment for each locus was carried out using MUSCLE [] in Geneious R7 (Kearse et al. 2012), with each alignment refined by eye. We used PartitionFinder [] to simultaneously infer both the best-fitting nucleotide substitution models and the partitioning scheme.
The candidate pool of potential partitions ranged from a single partition per locus to partitions that divided the protein-coding loci by codon position.

Bayesian phylogenetic analyses were performed using an uncorrelated lognormal relaxed clock in BEAST v1.7.5 [] with two independent analyses, each run for 80 million generations (sampled every 8000 generations). Substitution and clock models were unlinked among partitions, and a birth-death speciation process on branching times was specified as the tree prior for each analysis []. The alignment was partitioned based on the Bayesian Information Criterion (BIC) results inferred in PartitionFinder using the greedy algorithm []. The best-fit scheme contained one partition for ETS, one for the chloroplast coding region (rbcL), and one for the two non-coding chloroplast regions (rps16 and trnL-F). A GTR + I + Γ model of sequence evolution was used for all three partitions []. Convergence between runs and adequacy of the burn-in period were both assessed using Tracer v1.5 []. Adequate sampling of the posterior distribution was diagnosed from effective sample size (ESS) values in Tracer, with ESS values above 200 indicating effective sampling []. We used TreeAnnotator [] to summarize the posterior distribution of trees as a maximum clade credibility tree (MCCT) with median branch lengths.

To time-calibrate the phylogeny, we used three fossil-based prior age calibrations and a secondary calibration based on previous divergence-time estimates []. These calibrations were explained in detail by Fine et al. [], and their phylogenetic placements were determined based on morphological assessments of the fossils relative to living members of the Burseraceae by P. Fine and D. Daly []. Briefly, the youngest fossil-based calibration was based on endocarps attributed to Canarium from Czechoslovakian sediments with an estimated age of 23–29 Ma [].
Because Canarium emerged as non-monophyletic, the Canarium fossil was placed with a lognormal probability prior at the least inclusive node containing all of the Canarium species sampled (node A), following Fine et al. []. The fossil Protocommiphora europea from the Bognor and Sheppey sediments of the London Clay, with an estimated age of 48.6 Ma, can be assigned to either Commiphora or Bursera subgenus Elaphrium [] and was placed with a lognormal probability prior at the most recent common ancestor (MRCA) of Commiphora and Bursera (node B). The fossil Bursericarpum aldwickense, also from the Bognor and Sheppey sediments of the London Clay [, ], was assigned with a lognormal probability prior to the MRCA of Protieae (node C). Following Fine et al. [] and De Nova et al. [], the age of the MRCA of all Burseraceae (node D) was constrained using a secondary calibration: a normally distributed prior with a mean of 64.92 Ma and a standard deviation of 2.35 Ma.

We additionally tested how robust the results of the dating analyses were to uncertainty in the phylogenetic placement of the Canarium fossils. In particular, we were concerned that fossil placements too deep in the phylogeny might provide false support for generally younger divergence times within the Canarieae []. First, we ran a set of analyses incorporating all calibrations except for the Canarium fossil as described by Fine et al. []. Second, since the signature of historical distributions is often eroded in extant taxa [–], we ran a further analysis in which we estimated divergence times using the Canarium endocarp fossil as a tip, keeping all other fossil calibrations the same.
Since we had no additional morphological data to place this fossil more precisely, we allowed it to vary in position along the stem leading to the crown group at node A.

To account for the possibility of strong support for uncertain nodes in the Bayesian analyses [, ], we ran maximum likelihood phylogenetic analyses on the concatenated dataset using 1000 rapid bootstraps in RAxML [], with the same partitions as the Bayesian analysis, and compared the maximum likelihood bootstrap values to the Bayesian posterior probabilities as in []. [...]

We assigned each species sampled in the phylogeny to one or more of the following seven biogeographic areas: Neotropics (NE); Africa (AF); Sundaland and Indochina (IAA); India (IN); Laurasia (LA); Madagascar (MA); and South Pacific (SP; including New Caledonia, Australia, and Papua New Guinea) (Fig. ). These biogeographic areas were delimited on the basis of tropical Asian and African paleogeography [, , ] and the distributions of extant species. We used the R package BioGeoBEARS [] to test the fit of two biogeographic models to our data: (1) the maximum likelihood dispersal-extinction-cladogenesis model (DEC) [, ], and (2) a likelihood version of BayArea [, ]. For both of these models, we additionally compared the fit with a founder-event parameter, J, which describes a speciation event common to island systems where a “jump dispersal” event quickly results in an evolutionarily independent lineage []. Models were compared using AIC scores calculated from each model’s log likelihood (LL). We carried out all analyses on both the fossil-node and fossil-tip calibrated MCCT trees, with the outgroups pruned from the trees prior to analysis.

Fig. 1

We used two approaches to model dispersal based on the paleogeographic history of tropical climates from the Eocene onwards.
First, we incorporated likely terrestrial and short-distance (SD) marine pathways of dispersal through geologic time (hereafter, the SD + terrestrial model), roughly following previously established models of Madagascar-centric historical biogeography as detailed by Yoder and Nowak [] and Buerki et al. []. We compared this model to one that also takes into account paleoclimatic and paleogeographic information to incorporate possible avenues for long-distance marine dispersal (LDD) (hereafter, the LDD + terrestrial model). An advantage of likelihood-based models of biogeography is the ability to partition the temporal history of the clade into time periods with constraints reflecting the climatic and geographic conditions during each period [, ]. We divided our models into three time slices, (1) 56–33.9 Ma, (2) 33.9–16 Ma, and (3) 16 Ma to present, and conditioned dispersal rates based on information detailed in Additional file : A4. To account for uncertainty in topology and branch lengths, we took our best-fitting DEC model and ran a statistical DEC (S-DEC) analysis in RASP [, , ] using 1000 trees randomly sampled from the post-burn-in posterior distribution of our phylogeny. […]
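The time-stratified setup described above can be illustrated with a minimal Python sketch. It is not the authors' code: it only shows how a branch of a dated tree might be cut at the three slice boundaries stated in the text (56, 33.9, and 16 Ma), so that slice-specific dispersal constraints could be applied to each segment. The names `SLICES` and `split_branch` are hypothetical, and the dispersal multipliers themselves (given in Additional file : A4) are not reproduced here.

```python
# Time-slice boundaries in Ma (older bound, younger bound), taken from the text.
SLICES = [(56.0, 33.9), (33.9, 16.0), (16.0, 0.0)]


def split_branch(start_ma: float, end_ma: float) -> dict[int, float]:
    """Split a branch running from start_ma (older) to end_ma (younger)
    into the duration, in Myr, that it spends in each time slice.

    Returns a dict mapping the 1-based slice index to the duration.
    """
    if not (0.0 <= end_ma <= start_ma <= 56.0):
        raise ValueError("expected 0 <= end_ma <= start_ma <= 56 (ages in Ma)")
    segments = {}
    for index, (older, younger) in enumerate(SLICES, start=1):
        # Overlap between the branch interval and the slice interval.
        overlap = min(start_ma, older) - max(end_ma, younger)
        if overlap > 0:
            segments[index] = overlap
    return segments


if __name__ == "__main__":
    # A hypothetical branch from 40 Ma to 10 Ma crosses all three slices.
    for index, duration in split_branch(40.0, 10.0).items():
        print(f"slice {index}: {duration:.1f} Myr")
```

In a real analysis this bookkeeping is handled by the biogeographic software; the sketch is only meant to make the slicing logic concrete.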

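The AIC-based comparison of biogeographic models described in the protocol can be sketched as follows, using AIC = 2k − 2·LL, where k is the number of free parameters. This is an illustration only: the log-likelihood values are hypothetical placeholders, not results from the study, and the parameter counts (two for DEC and the BayArea-like model, three with the added J parameter) are stated here as an assumption about how these models are usually parameterized.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*LL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of free parameters) per model.
# These numbers are placeholders for illustration, NOT values from the study.
models = {
    "DEC": (-120.0, 2),
    "DEC+J": (-112.5, 3),
    "BAYAREALIKE": (-135.0, 2),
    "BAYAREALIKE+J": (-118.0, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)

if __name__ == "__main__":
    # Report each model's AIC and its difference from the best model.
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: AIC = {score:.1f}, delta = {score - scores[best]:.1f}")
```

The extra parameter in a +J model is penalized by 2 AIC units, so it is preferred only if it improves the log likelihood by more than one unit.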
Pipeline specifications

Software tools: Geneious, MUSCLE, PartitionFinder, BEAST, RAxML, RASP
Applications: Phylogenetics, Nucleotide sequence alignment
Diseases: Goiter, Endemic, Pulmonary Fibrosis