Computational protocol: Gymnosperms on the EDGE

Protocol publication

[…] A list of all gymnosperm species was obtained from the Royal Botanic Gardens, Kew online resource “World Checklist of Selected Plant Families”. Available DNA sequence data for gymnosperms for the plastid regions rbcL, matK, rpoC, rps4, and trnL, as well as the nuclear marker PHYP, were obtained from GenBank and downloaded using the data-mining tool SUMAC (data accessed on 3 March 2016). Forty-one angiosperm taxa and fifteen ferns and allies were also included in our analyses as outgroup taxa. Regions were selected based on the level of coverage they achieved, either across gymnosperms as a whole or within particular lineages. Details of the species sampled for each region (including GenBank accession numbers) are listed in Supplementary Table […].

To increase taxonomic coverage, we obtained sequence data for the plastid rbcL exon (ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit) for 129 species, 35 of which were otherwise not represented in the data set. DNA was isolated using a modified version of the 2× CTAB method and subsequently purified on a caesium chloride/ethidium bromide gradient (1.55 g/ml density) to yield material suitable for long-term storage in the DNA & Tissue Collections at the Royal Botanic Gardens, Kew. PCR amplifications were performed using primer combinations from Olmstead and colleagues. PCR reactions were prepared with the ReddyMix PCR Master Mix from ABgene (2.5 mM MgCl2; Epsom, Surrey, UK), supplemented with 1 μl of 0.4% bovine serum albumin and 50 ng of each primer, in a final volume of 25 μl. The amplification programme started with a 2 min initial denaturation at 94 °C, followed by 32 cycles of 1 min denaturation at 94 °C, 1 min annealing at 48 °C and 1.5 min extension at 72 °C, and ended with a final extension of 3 min at 72 °C.
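As a quick arithmetic check on the amplification programme above, the programmed hold times can be summed. This is a minimal Python sketch, not part of the original protocol; `PROGRAMME` and `total_hold_minutes` are illustrative names, and the total deliberately ignores ramping between temperatures, which depends on the thermocycler used.

```python
# Sketch only: sums the programmed hold times of the rbcL PCR programme
# described in the text. Ramp times between temperatures are NOT included.

# (step, hold time in minutes, number of times the step is run)
PROGRAMME = [
    ("initial denaturation, 94 °C", 2.0, 1),
    ("denaturation, 94 °C", 1.0, 32),
    ("annealing, 48 °C", 1.0, 32),
    ("extension, 72 °C", 1.5, 32),
    ("final extension, 72 °C", 3.0, 1),
]


def total_hold_minutes(programme):
    """Total time spent at programmed temperatures, in minutes."""
    return sum(minutes * repeats for _, minutes, repeats in programme)


if __name__ == "__main__":
    print(total_hold_minutes(PROGRAMME))  # 117.0
```

So each run spends just under two hours at programmed temperatures (2 + 32 × 3.5 + 3 = 117 min), with instrument ramp time on top.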
After purification with the NucleoSpin Extract II kit (Macherey-Nagel, Düren, Germany), cycle sequencing reactions were performed in 10 μl volumes using 1 μl of BigDye® Terminator cycle sequencing chemistry (v3.1; ABI; Warrington, Cheshire, UK) and run on an ABI 3730 automated sequencer. Geneious (version 7.1.2) was used to assemble complementary strands and verify base calls.

Sequences of each region were compiled in Geneious (version 7.1.2) and aligned using the MUSCLE algorithm. All partitions were concatenated using an R script (S. Buerki, pers. comm.), and all subsequent analyses were performed on the resulting supermatrix. A phylogenetic tree was reconstructed under the maximum likelihood (ML) criterion as implemented in RAxML (v. 8.2.8) on the CIPRES portal, with 1,000 rapid bootstrap replicates followed by a search for the best-scoring ML tree. The GTRCAT model was used, and all other parameters were left at their default settings. All fifteen ferns and allies were designated as outgroup taxa.

Several attempts to obtain an ultrametric tree using the Bayesian approach implemented in BEAST were unsuccessful. Constraining the topology to the ML tree obtained from RAxML, so that only branch lengths were optimised, was also unsatisfactory. In all cases, the analyses failed to converge on a single solution, and most effective sample size (ESS) values remained below the threshold of 200. We therefore converted the ML tree of gymnosperms into an ultrametric tree using the programme treePL, which implements the penalized likelihood method. The default cross-validation procedure identified 0.1 as the most appropriate smoothing value. A set of 15 calibration points, based on fossils used in previous studies and on molecular estimates from a recent study of cycads, was applied (see Supplementary Table […]).
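The concatenation step used an unpublished R script (S. Buerki, pers. comm.), which is not reproduced here. As an illustration only, the Python sketch below shows the general idea of building a supermatrix: aligned regions are appended taxon by taxon, taxa with no data for a region are padded with a missing-data character (`?` is our assumption), and the first and last column of each partition is recorded. The function name `concatenate`, the input layout and the toy sequences are hypothetical.

```python
# Sketch only: NOT the authors' R script. Missing taxa are padded with "?"
# (an assumption); partition coordinates are 1-based and inclusive.


def concatenate(alignments, missing="?"):
    """Build a supermatrix from per-region alignments.

    alignments: {region: {taxon: aligned_sequence}}, in the desired region order.
    Returns (supermatrix, partitions): supermatrix is {taxon: sequence} and
    partitions is a list of (region, first_column, last_column).
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for region, aln in alignments.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"{region}: expected one alignment width, found {sorted(widths)}")
        width = widths.pop()
        for taxon in taxa:
            # Taxa without data for this region get a block of missing data.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((region, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    toy = {
        "rbcL": {"Ginkgo_biloba": "ATG-", "Wollemia_nobilis": "ATGC"},
        "matK": {"Ginkgo_biloba": "TTA"},
    }
    matrix, parts = concatenate(toy)
    print(matrix)  # {'Ginkgo_biloba': 'ATG-TTA', 'Wollemia_nobilis': 'ATGC???'}
    print(parts)   # [('rbcL', 1, 4), ('matK', 5, 7)]
```

Recording partition boundaries alongside the matrix keeps each marker identifiable after concatenation, even though the text does not state whether the RAxML analysis was partitioned.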
Outgroup taxa were pruned from the tree before ED scores were calculated.

Despite reasonably good species coverage in our phylogenetic analysis (ca. 85%), incomplete sampling could bias EDGE rankings, so we used the following approach to add to our ultrametric tree the 167 species for which no suitable sequence data were available for the markers used here. We used the […] function from the R package phytools with the option “random”, which adds each missing species at a random position within its genus while retaining the ultrametricity of the tree. We repeated this step 100 times to assess how the random position assigned to each species within its genus affects the ED and EDGE values and the resulting EDGE ranks.

ED scores for all gymnosperm species were inferred from the 100 ultrametric trees using the function evol.distinct from the R package picante. The median of the 100 ED values for each species was compiled and used to produce the EDGE scores. Extinction risk assessments were obtained from the IUCN Red List (version 2015.4; accessed on 29 April 2016). These assessments were converted into probabilities of extinction using two transformations: the original logarithmic transformation of Isaac and colleagues (hereafter ISAAC) and the IUCN50 transformation proposed by Mooers and colleagues. EDGE scores were subsequently calculated from the median ED values by implementing the EDGE equations in an R script. Species that were Data Deficient (DD) or Not Evaluated (NE) were scored as Critically Endangered (CR). Threatened species (i.e.
those assigned CR, EN, VU, as well as DD and NE) were ranked by decreasing ED score to provide a classification of conservation priority species that is less dependent on the choice of probability of extinction transformation.

The 100 gymnosperm species with the highest EDGE values under the ISAAC transformation, together with their ED scores, were compared with those of amphibians, mammals and birds (obtained from […]) using boxplots produced in R. We assessed the effect of the probability of extinction transformation (IUCN50 vs. ISAAC) on the overall EDGE species ranking by plotting the difference in species rankings, using the IUCN50 transformation as reference: negative values indicate that the IUCN50 transformation prioritizes a given species over the ISAAC transformation, whereas positive values denote the opposite. Differences in EDGE species rankings were plotted in R, with each species coloured according to its IUCN Red List category. To assess the effect of ED on EDGE species ranking, boxplots of ED values for the species prioritized by each transformation were also produced in R; ranking differences within ±10 were considered equivalent for the boxplots (following a previous study). A figure displaying the dated gymnosperm tree together with EDGE values (inferred using the IUCN50 transformation) and IUCN Red List assessments was produced in R. The Geological Society of America (GSA) geological time scale was used to set the boundaries between geological periods.

To map gymnosperm diversity, data from the World Checklist of Selected Plant Families (accessed 30 August 2016) were matched to level 3 of the Taxonomic Databases Working Group (TDWG) geographical scheme. Data were displayed and processed in ArcGIS 10.1 using the Winkel I projection centred on the date line (180°), and colours were derived from ColorBrewer to give an interpretable and reproducible map.
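The ED and EDGE calculations described above were carried out in R (picante's evol.distinct plus an in-house script that is not reproduced here). The Python sketch below only illustrates the arithmetic, under several assumptions. ED uses the fair-proportion definition, in which each branch length is shared equally among the tips below it (evol.distinct also offers an equal-splits option, and the text does not say which was used). ED is summarised as the median over replicate trees. EDGE under the ISAAC scheme is taken as ln(1 + ED) + GE × ln(2), with GE running from 0 (LC) to 4 (CR) and DD/NE scored as CR. The ranking difference is the IUCN50 rank minus the ISAAC rank, treated as equivalent when within ±10. The IUCN50 scoring itself is not reproduced, and the tree format, function names and toy values are hypothetical.

```python
# Sketch only, not the authors' R code. Assumptions: fair-proportion ED,
# median over replicate trees, ISAAC EDGE formula, DD/NE scored as CR,
# rank difference = IUCN50 rank - ISAAC rank, |difference| <= 10 = equivalent.
import math
from statistics import median

# Hypothetical tree format: (label, branch_length, children); tips have no children.


def count_tips(node):
    _, _, children = node
    return 1 if not children else sum(count_tips(child) for child in children)


def fair_proportion_ed(tree):
    """ED per tip: each branch length is split equally among the tips below it."""
    ed = {}

    def walk(node, accumulated):
        label, length, children = node
        accumulated += length / count_tips(node)
        if not children:
            ed[label] = accumulated
        for child in children:
            walk(child, accumulated)

    walk(tree, 0.0)
    return ed


def median_ed(trees):
    """Median ED per species across replicate trees sharing the same tips."""
    per_tree = [fair_proportion_ed(tree) for tree in trees]
    return {sp: median(ed[sp] for ed in per_tree) for sp in per_tree[0]}


# GE weights of the ISAAC scheme; DD and NE are scored as CR, as in the text.
GE = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4, "DD": 4, "NE": 4}


def edge_isaac(ed, category):
    """EDGE = ln(1 + ED) + GE * ln(2)."""
    return math.log(1.0 + ed) + GE[category] * math.log(2.0)


def rank(scores):
    """Rank species by decreasing score; 1 = top priority."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {sp: position for position, sp in enumerate(ordered, start=1)}


def rank_shift(rank_iucn50, rank_isaac, tolerance=10):
    """Negative: IUCN50 ranks the species higher; positive: ISAAC does."""
    diff = rank_iucn50 - rank_isaac
    if abs(diff) <= tolerance:
        return diff, "equivalent"
    return diff, "IUCN50 prioritises" if diff < 0 else "ISAAC prioritises"


if __name__ == "__main__":
    # Toy ultrametric trees ((A:1,B:1):2,C:3) and ((A:1,C:1):2,B:3).
    t1 = ("root", 0.0, [("AB", 2.0, [("A", 1.0, []), ("B", 1.0, [])]), ("C", 3.0, [])])
    t2 = ("root", 0.0, [("AC", 2.0, [("A", 1.0, []), ("C", 1.0, [])]), ("B", 3.0, [])])
    ed = median_ed([t1, t2, t1])
    print(ed)  # {'A': 2.0, 'B': 2.0, 'C': 3.0}
    status = {"A": "CR", "B": "LC", "C": "DD"}  # hypothetical Red List categories
    edge = {sp: edge_isaac(ed[sp], status[sp]) for sp in ed}
    print(rank(edge))  # {'C': 1, 'A': 2, 'B': 3}
```

Under fair proportion the ED values on one tree sum to its total branch length (7 in the first toy tree), which is a convenient sanity check; in the toy ranking, species C comes first because its DD status is scored as CR.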
To evaluate whether the mapped ranking departs from what is expected by chance, we performed an exact binomial test in R on the top 100 EDGE species under the IUCN50 transformation, assuming that the number of top 100 species in each TDWG level 3 region is expected to be proportional to the region's total number of species (species richness). We repeated the same analysis with the top 100 threatened species ranked by ED. The overall result (all TDWG regions) was not significant, but it was highly significant for some individual TDWG regions, which held either more or fewer species than expected by chance (see Fig. […]). […]
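The regional test was run in R. To show what an exact binomial test computes, the Python sketch below returns the two-sided p-value obtained by summing the probabilities of all outcomes no more likely than the observed count, with a small relative tolerance for floating-point ties; to our understanding this matches the two-sided definition used by R's binom.test. Setting a region's expected proportion to its species richness divided by the total number of gymnosperm species is our reading of the text, and `region_test` and the numbers in the example are hypothetical.

```python
# Sketch only: an exact two-sided binomial test written out in Python.
# The regional set-up (expected proportion = richness / total) is an assumption.
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_test_two_sided(k, n, p, rel_tol=1e-7):
    """Sum the probabilities of all outcomes no more likely than the observed k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    threshold = probs[k] * (1.0 + rel_tol)  # tolerance absorbs floating-point ties
    return min(1.0, sum(q for q in probs if q <= threshold))


def region_test(top_in_region, region_richness, total_species, n_top=100):
    """Hypothetical helper: does a region hold more or fewer of the top-n species
    than its share of all species would predict?"""
    expected_p = region_richness / total_species
    return binom_test_two_sided(top_in_region, n_top, expected_p)


if __name__ == "__main__":
    # Made-up region: 12 of the top 100 species, 50 of 1,000 species overall.
    print(region_test(12, 50, 1000))
```

Because gymnosperm species often occur in several TDWG regions, each region is tested on its own rather than treating the regional counts as one multinomial draw.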

Pipeline specifications

Software tools: SUMAC, Geneious, MUSCLE, RAxML, CIPRES Science Gateway, BEAST, treePL, phytools, picante
Applications: Phylogenetics, nucleotide sequence alignment
Organisms: Ginkgo biloba, Wollemia nobilis