Computational protocol: Evolutionary patterns of range size, abundance and species richness in Amazonian angiosperm trees

Protocol publication

[…] We intersected a list of all Neotropical tree genera (…) with a list of Amazonian plant species (…) to generate a list of Amazonian tree species. The dataset additionally includes estimates of range size for all species. We obtained estimates of the total abundance of Amazonian tree species from (…).

We obtained sequences of the rbcL plastid gene for 631 Amazonian angiosperm tree genera (…). Of these, 567 came from GenBank (…) and 64 genera were newly sequenced following the protocols outlined in (…). We obtained sequences of the matK plastid gene from GenBank for 452 of the 631 genera with rbcL data (…). Sequences were aligned using MAFFT (…). We manually deleted the ‘ragged ends’ of the alignment, i.e., terminal regions where most genera were missing data. Preliminary phylogenetic analyses allowed us to exclude GenBank sequences for genera that were placed in a different family from the one to which they are assigned taxonomically. The final alignment can be found in (…).

We estimated a maximum likelihood phylogeny for the genera in RAxML v8.0.0 (…) on the CIPRES web server (…). We used the default settings, including a General Time Reversible (GTR) + Gamma (G) model of sequence evolution with separate models for the rbcL and matK genes (i.e., a partitioned analysis). Sequences of Amborella trichopoda (Amborellaceae) and Nymphaea alba (Nymphaeaceae) were included as outgroups. This phylogeny (see …) was used as the starting tree for simultaneous estimation of topology and divergence times in BEAST v1.8.2 (…). We implemented fossil-based age constraints on 25 nodes distributed across the phylogeny. These used log-normal prior distributions with an offset to impose a hard minimum age (see …). We used a GTR + G model of sequence evolution with separate models for the rbcL and matK genes, an uncorrelated relaxed lognormal clock, and a birth-death model of speciation.
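The fossil calibrations described above place an offset log-normal prior on each calibrated node age. The following is a minimal Python sketch of such a prior. It assumes the prior is parameterised by the mean `mu` and standard deviation `sigma` of log(age - offset). BEAST's own parameter names and options, such as giving the mean in real space, may differ. The function names and calibration values are hypothetical and are not taken from the study. The point illustrated is that the density is zero at or below the offset, which makes the fossil age a hard minimum, while the upper tail stays soft.

```python
import math
from statistics import NormalDist

def offset_lognormal_pdf(age, offset, mu, sigma):
    """Prior density of a calibrated node age, assuming age = offset + X with
    log(X) ~ Normal(mu, sigma); offset is the fossil's minimum age."""
    x = age - offset
    if x <= 0:
        return 0.0  # hard minimum: no prior mass at or below the fossil age
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def offset_lognormal_quantile(p, offset, mu, sigma):
    """Age below which a fraction p of the prior mass lies."""
    return offset + math.exp(mu + sigma * NormalDist().inv_cdf(p))

# Hypothetical calibration: fossil minimum age 50 Ma, mu = 1.0, sigma = 0.5
offset, mu, sigma = 50.0, 1.0, 0.5
print(offset_lognormal_pdf(49.0, offset, mu, sigma))  # 0.0
median = offset_lognormal_quantile(0.5, offset, mu, sigma)
upper95 = offset_lognormal_quantile(0.95, offset, mu, sigma)
print(round(median, 2))  # 52.72
print(upper95)  # soft upper tail: 95% of prior mass lies below this age
```

Under these hypothetical values, the median prior age is the offset plus exp(mu), about 2.7 Ma older than the fossil. Increasing `sigma` shifts prior mass towards much older ages but never allows ages younger than the fossil.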
We conducted several preliminary runs to optimise the tuning and weights of the MCMC operators, following the recommendations generated by the software. Once this optimisation had stabilised, we ran two separate chains of 100 million generations each. The first 50 million generations of each chain were discarded as burn-in, because the posterior probability of the phylogeny did not stabilise until this point. We combined the post-burn-in posterior samples from the two chains and confirmed that effective sample size (ESS) values exceeded 100 for all parameters. We then used phyutility (…) to generate an all-compatible consensus tree from the combined post-burn-in posterior distribution of trees. Node ages on this consensus phylogeny were set to the median age of each node across all posterior trees containing that node, using TreeAnnotator (…).

For each genus in the phylogeny, we calculated the mean range size and mean abundance across all constituent species in the (…) and (…) datasets. Of the 631 genera in the phylogeny, 493 had an abundance estimate for at least one species in (…). We used the number of species per genus in the dataset as an estimate of that genus's species richness in the Amazon. As an alternative estimate, we used the Neotropical species richness estimates for genera in (…), which produced highly similar results. We assessed correlations among these genus-level characteristics using Pearson's correlation coefficients, both for the raw values and for their phylogenetically independent contrasts.

We tested for phylogenetic signal in each genus-level characteristic using Pagel's λ (…). Under Brownian motion, trait values drift randomly over evolutionary time and the phylogenetic relationships of taxa perfectly predict the covariance of their trait values. In that case, the expected value of λ is one.
When the phylogenetic relationships of taxa do not predict the covariance at all, the expected value of λ is zero. We used the Akaike information criterion (AIC) to compare the fit of three models: λ fixed at one, λ fixed at zero, and λ set to its maximum likelihood estimate.

To determine which lineages may be responsible for significant phylogenetic signal in a given characteristic (e.g., mean range size of genera), we used the following approach. We first estimated the ancestral value at each node in the phylogeny using maximum likelihood ancestral state reconstruction (…). We then randomised the trait values across the tips of the phylogeny 1,000 times, reconstructing ancestral values at the nodes each time. For each node, we compared the observed reconstructed value with the values across the randomisations. If the observed value for a node was greater than that in 97.5% of the randomisations, we considered the lineage descending from that node to show significantly high values for the trait. If it was lower than that in 97.5% of the randomisations (i.e., below the 2.5th percentile), we considered the lineage to show significantly low values.

We also assessed whether major clades (Magnoliids, Monocots, Rosids and Asterids) differ in the species richness, mean range size and mean abundance of their constituent genera. For this we used analyses of variance with major clade as the grouping variable, followed by Tukey's tests to determine which clades drive any significant results. All analyses were conducted, and figures constructed, in the R Statistical Software (…) using functions in the ape (…), geiger (…) and phytools (…) packages (see … for code). […]
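Pagel's λ multiplies the off-diagonal elements of the Brownian-motion covariance matrix (the shared root-to-tip path lengths) by λ and leaves each tip's total path length unchanged. Thus λ = 1 recovers Brownian motion and λ = 0 treats taxa as independent. The Python sketch below is not the study's code, and its helper names are hypothetical. It computes the Gaussian profile log-likelihood for a given λ and compares λ = 0, λ = 1 and a grid-search estimate of λ by AIC. It assumes an ultrametric tree supplied directly as its covariance matrix and restricts λ to [0, 1]. In R, phytools' `phylosig(method = "lambda")` and geiger's `fitContinuous(model = "lambda")` provide fuller implementations.

```python
import math

def lambda_transform(V, lam):
    """Pagel's lambda: scale shared path lengths (off-diagonals) by lam,
    keeping each tip's root-to-tip length (diagonal) unchanged."""
    n = len(V)
    return [[V[i][j] if i == j else lam * V[i][j] for j in range(n)]
            for i in range(n)]

def cholesky(M):
    """Lower-triangular L with L L^T = M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve(L, b):
    """Solve (L L^T) y = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (z[i] - sum(L[k][i] * y[k] for k in range(i + 1, n))) / L[i][i]
    return y

def bm_loglik(x, V):
    """Log-likelihood of tip values x under Brownian motion with covariance
    sigma2 * V, with root state and sigma2 set to their ML values."""
    n = len(x)
    L = cholesky(V)
    root = sum(solve(L, x)) / sum(solve(L, [1.0] * n))  # GLS root estimate
    r = [xi - root for xi in x]
    sigma2 = sum(ri * yi for ri, yi in zip(r, solve(L, r))) / n  # ML rate
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)

def aic(loglik, k):
    return 2 * k - 2 * loglik

def compare_lambda(x, V, steps=100):
    """AIC for lambda = 0 and lambda = 1 (2 free parameters each) and for
    lambda estimated on a grid over [0, 1] (3 free parameters)."""
    def ll(lam):
        return bm_loglik(x, lambda_transform(V, lam))
    lam_hat = max((i / steps for i in range(steps + 1)), key=ll)
    scores = {"lambda=0": aic(ll(0.0), 2),
              "lambda=1": aic(ll(1.0), 2),
              "lambda=ML": aic(ll(lam_hat), 3)}
    return lam_hat, scores

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1); V[i][j] = shared path length
V = [[3.0, 2.0, 0.0, 0.0],
     [2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
x = [1.0, 1.2, 3.0, 3.3]  # hypothetical genus-level trait values
lam_hat, scores = compare_lambda(x, V)
print(lam_hat, scores)
```

Because the estimated-λ model carries one extra parameter, it is preferred by AIC only when its likelihood gain outweighs the penalty of 2 AIC units.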
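The node-wise randomisation test described above can be sketched as follows. This Python example is illustrative only. The tree format (nested lists of `(child, branch_length)` pairs), the helper names and the data are hypothetical. The reconstruction is a swapped-in stand-in: a Felsenstein-style pruning estimate that uses only the tips below each node, not the full maximum likelihood ancestral state reconstruction used in the study, which also draws on the rest of the tree. The shuffling of tip values and the 97.5% thresholds follow the text.

```python
import random

# Hypothetical tree format: a tip is a name (str); an internal node is a
# list of (child, branch_length) pairs.

def tips(node):
    """Names of all tips descending from node."""
    if isinstance(node, str):
        return [node]
    return [t for child, _ in node for t in tips(child)]

def internal_nodes(node):
    """All internal nodes, root first (pre-order)."""
    if isinstance(node, str):
        return []
    out = [node]
    for child, _ in node:
        out.extend(internal_nodes(child))
    return out

def prune(node, values):
    """Stand-in reconstruction: Felsenstein-style pruning estimate of a node's
    value from the tips below it. Returns (estimate, extra_variance)."""
    if isinstance(node, str):
        return values[node], 0.0
    weighted, total_w = 0.0, 0.0
    for child, length in node:
        est, extra = prune(child, values)
        w = 1.0 / (length + extra)  # children on shorter paths weigh more
        weighted += w * est
        total_w += w
    return weighted / total_w, 1.0 / total_w

def node_randomisation_test(tree, values, n_perm=1000, seed=1):
    """Label each internal node 'high', 'low' or 'ns' by comparing its
    reconstructed value with reconstructions after shuffling tip values."""
    rng = random.Random(seed)
    nodes = internal_nodes(tree)
    observed = [prune(nd, values)[0] for nd in nodes]
    names = list(values)
    above = [0] * len(nodes)  # randomisations the observed value exceeds
    below = [0] * len(nodes)  # randomisations the observed value falls below
    for _ in range(n_perm):
        shuffled = [values[name] for name in names]
        rng.shuffle(shuffled)
        perm = dict(zip(names, shuffled))
        for i, nd in enumerate(nodes):
            p = prune(nd, perm)[0]
            if observed[i] > p:
                above[i] += 1
            elif observed[i] < p:
                below[i] += 1
    labels = {}
    for i, nd in enumerate(nodes):
        key = tuple(sorted(tips(nd)))
        if above[i] > 0.975 * n_perm:
            labels[key] = "high"
        elif below[i] > 0.975 * n_perm:
            labels[key] = "low"
        else:
            labels[key] = "ns"
    return labels

def toy_clade(prefix):
    """Hypothetical balanced 4-tip clade with unit branch lengths."""
    a, b, c, d = (f"{prefix}{i}" for i in range(1, 5))
    return [([(a, 1.0), (b, 1.0)], 1.0), ([(c, 1.0), (d, 1.0)], 1.0)]

tree = [(toy_clade("h"), 1.0), (toy_clade("m"), 1.0), (toy_clade("l"), 1.0)]
values = {f"{p}{i}": base + i
          for p, base in (("h", 20), ("m", 10), ("l", 0)) for i in range(1, 5)}
labels = node_randomisation_test(tree, values)
print(labels[("h1", "h2", "h3", "h4")])  # high
print(labels[("l1", "l2", "l3", "l4")])  # low
```

In this toy example, the clade carrying the four largest values is flagged as high, because a random draw of four of the twelve values almost never reaches its mean. With equal branch lengths, the root estimate is simply the mean of all tips and does not change under permutation, so the root's label carries no information here.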

Pipeline specifications

Software tools: MAFFT, RAxML, BEAST, phyutility, PHYSIG, phytools
Application: Phylogenetics