Computational protocol: Body Size, Extinction Risk and Knowledge Bias in New World Snakes

Protocol publication

[…] Our sample was not obtained following a random design but instead comprised the available data. This could have resulted in a taxonomically biased database unsuitable for drawing robust conclusions about New World snakes. We thus first checked whether our data were equally distributed among families, subfamilies and genera by computing the taxonomic distinctiveness metric and comparing it with a random distribution. We found that our sample was more representative (i.e. contained more supraspecific taxa) than expected by chance (Δ+ = 79.445; p<0.001). Hence, we concluded that severe taxonomic biases were unlikely in our data.

Different species share different proportions of evolutionary history; thus, whenever species are the sampling units in parametric statistical analyses (see below), independence among observations cannot always be assumed (i.e. the closer the species in the phylogeny, the more similar their characteristics might be). Therefore, if phylogenetic autocorrelation were found in species' description years, body sizes, or both, it should be taken into account in the analyses. To evaluate the presence of phylogenetic autocorrelation in these variables, we used the most comprehensive phylogenetic hypothesis for reptiles available to date, which comprises 288 (43.3%) of the species in our database. We completed this phylogeny with the remaining species whose genera were already represented in it (271, 40.8%), including them as polytomies, and excluded the remaining species (106, 15.9%) from this analysis.

Subsequently, we generated phylogenetic global Moran's I values and Moran's I correlograms (in this case using five phylogenetic distance classes with equal numbers of observations) for both description years and body sizes to assess levels of phylogenetic autocorrelation in these variables.
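As an illustration of the statistic used above, the following is a minimal, dependency-free Python sketch of global Moran's I computed within a phylogenetic distance class. The original analyses were run in R; the function names, the binary weighting scheme, and the toy distance matrix and body sizes are all assumptions made for this example, not the authors' code or data.

```python
def morans_i(values, weights):
    """Global Moran's I for trait `values` given a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all off-diagonal weights.
    s0 = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    # Weighted cross-products of deviations between species pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n) if i != j)
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def distance_class_weights(dist, lower, upper):
    """Binary weights: 1 where lower < distance <= upper, else 0."""
    n = len(dist)
    return [[1.0 if i != j and lower < dist[i][j] <= upper else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: two clades of two species each.
dist = [[0, 2, 6, 6],
        [2, 0, 6, 6],
        [6, 6, 0, 2],
        [6, 6, 2, 0]]
body_size = [10.0, 12.0, 30.0, 34.0]

near = distance_class_weights(dist, 0, 2)  # sister-species pairs
far = distance_class_weights(dist, 2, 6)   # between-clade pairs
i_near = morans_i(body_size, near)
i_far = morans_i(body_size, far)
```

In this toy case the first distance class yields a positive Moran's I (sister species have similar sizes) and the second a negative one. A correlogram is simply this calculation repeated over successive distance classes; significance testing is not shown here.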
We found virtually no phylogenetic autocorrelation in description years, either globally (global Moran's I = 0.05; p<0.001) or in any distance class, indicating that there is no need to control for phylogenetic relatedness in analyses involving this variable. In contrast, we found significant phylogenetic autocorrelation for body size (global Moran's I = 0.19; p<0.001), particularly in the first distance class, indicating that closely related species tend to have more similar body sizes.

Additionally, we checked whether the lack of phylogenetic autocorrelation in description years and the phylogenetic dependence observed for body sizes were robust to potential effects introduced by the polytomies in our phylogeny (all of them circumscribed to the genus level). We did this by randomly resolving the polytomies 1000 times and computing the abovementioned Moran's I values for each run. No substantial changes were observed in either case (mean global Moran's I ± SD was 0.048±0.003 for description year and 0.187±0.006 for body size). This supported our previous conclusion that closely related species do not show a phylogenetically autocorrelated pattern in their year of description, but do show one in body size (more closely related species tend to have more similar sizes).

Based on this, we used Phylogenetic eigenVector Regression (PVR) to generate a set of variables (phylogenetic eigenvectors) representing the phylogenetic relationships among snake species, and then regressed body size against these variables using an iterative search for the subset of eigenvectors that most reduces autocorrelation in the regression residuals.
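The eigenvector-extraction step of PVR can be sketched as a principal coordinate analysis of the pairwise phylogenetic distance matrix. The Python below is a simplified illustration under stated assumptions: it double-centres the squared distances and recovers only the leading eigenvector by power iteration (a stand-in for a full eigendecomposition), it does not implement the iterative eigenvector selection described above, and the toy distance matrix and all function names are hypothetical.

```python
import math
import random


def double_center(dist):
    """Gower-centre -0.5 * d^2, as in principal coordinate analysis."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # equals column means (matrix is symmetric)
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def leading_eigenvector(mat, iters=500, seed=0):
    """Power iteration: dominant eigenvalue and unit-length eigenvector."""
    rng = random.Random(seed)
    n = len(mat)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


# Hypothetical toy distances: two clades of two species each.
dist = [[0, 2, 6, 6],
        [2, 0, 6, 6],
        [6, 6, 0, 2],
        [6, 6, 2, 0]]
centered = double_center(dist)
eigenvalue, eigenvector = leading_eigenvector(centered)
```

For this toy tree the leading eigenvector separates the two clades: species in the same clade receive scores of the same sign. In a PVR analysis such scores would enter the regression as covariables.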
Two eigenvectors (first and third) were selected by this procedure, and we used them as covariables accounting for phylogenetic autocorrelation in body size in our statistical analyses (see below), thus meeting the statistical assumption of data independence.

We used one-way ANOVA to investigate differences in species' description years among risk categories and, where differences were found, applied planned comparison tests to evaluate a potentially structured sequence of risk categories according to species' description years. Specifically, we asked (1) whether Non-Threatened species, which are expected to be abundant and hence easier to find, were described earlier than Threatened ones (description year for NT>TE); and whether species in these two groups were described earlier than either (2) Data Deficient (TE+NT>DD) or (3) Not-Evaluated species (TE+NT>NE).

Regarding snake body size, we used multiple regression analysis to investigate relationships between species' body size and description years, and then used one-way ANCOVA to check for differences in body size among risk categories. We also applied planned comparison tests to investigate (1) whether Threatened species were larger than Non-Threatened ones (TE>NT), and whether species in these two groups were larger than either (2) Data Deficient (TE+NT>DD) or (3) Not-Evaluated (TE+NT>NE) species.

Mean sizes of Threatened and Non-Threatened species may change in the future as Data Deficient and Not-Evaluated species are assigned to these risk categories. To explore these possibilities, we generated two extreme scenarios: that all Data Deficient or Not-Evaluated species end up being classified as either (4) Threatened (i.e. TE←{DD or NE}>NT) or (5) Non-Threatened (i.e. TE>NT←{DD or NE}).
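The regrouping behind scenarios (4) and (5) can be sketched in a few lines of Python. This is only the bookkeeping step, under assumptions: the body sizes and category labels are invented toy values, the function name is hypothetical, and neither the phylogenetic covariables nor the ANCOVA significance tests are included.

```python
from statistics import mean


def scenario_means(sizes, categories, reassign_to):
    """Mean size of TE and NT after moving all DD and NE species to one of them."""
    merged = [reassign_to if c in ("DD", "NE") else c for c in categories]
    te = [s for s, c in zip(sizes, merged) if c == "TE"]
    nt = [s for s, c in zip(sizes, merged) if c == "NT"]
    return mean(te), mean(nt)


# Hypothetical toy data: body sizes (arbitrary units) and risk categories.
sizes = [120, 150, 60, 80, 40, 50]
categories = ["TE", "TE", "NT", "NT", "DD", "NE"]

# Scenario 4: all DD and NE species become Threatened.
te_mean_4, nt_mean_4 = scenario_means(sizes, categories, "TE")
# Scenario 5: all DD and NE species become Non-Threatened.
te_mean_5, nt_mean_5 = scenario_means(sizes, categories, "NT")
```

Comparing the TE and NT means under each reassignment shows how far the size difference between the two groups could shift if the poorly known species were all classified one way or the other.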
Although not realistic, these two extreme scenarios represent the largest effects that including poorly known groups within a risk category could have on the pattern linking large body size to extinction risk. In all the analyses involving body size, we used the two selected phylogenetic eigenvectors described above as covariables to account for the phylogenetic autocorrelation in this variable.

For all analyses, p-values were computed using simple F-tests as well as a randomization protocol that generated null distributions of body sizes and description years by randomly reshuffling these data across species 1,000 times. For each reshuffled dataset we obtained the F value of the corresponding analysis (i.e. ANOVA or regression), and the resulting 1,000 values were compared with the empirical F value. P-values generated in this way are expected to be more robust against potential biases in the data, but in our case they were qualitatively similar to those given by classical F-tests, so we report only the latter for simplicity. All analyses were performed in R 3.1.0, using the packages ape, phytools and letsR. […]
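The randomization protocol described above can be sketched for the one-way ANOVA case as follows. This is a dependency-free Python illustration rather than the authors' R code: the description years and category labels are invented, the function names are hypothetical, and the p-value convention (counting the observed value among the null values) is an assumption, since the source does not state how the comparison was scored.

```python
import random


def f_statistic(values, groups):
    """One-way ANOVA F: between-group over within-group mean square."""
    labels = sorted(set(groups))
    n = len(values)
    grand = sum(values) / n
    ss_between = 0.0
    ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(values, groups) if lab == g]
        m = sum(vals) / len(vals)
        ss_between += len(vals) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in vals)
    df_between = len(labels) - 1
    df_within = n - len(labels)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(values, groups, n_perm=1000, seed=1):
    """Reshuffle values across species and compare null F values to the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(values, groups)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(shuffled, groups) >= observed:
            hits += 1
    # Assumed convention: include the observed value in the null distribution.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: description years for two risk categories.
years = [1758, 1766, 1801, 1820, 1950, 1962, 1975, 1990]
groups = ["NT"] * 4 + ["DD"] * 4
observed_f, p_value = permutation_p(years, groups)
```

The same reshuffling scheme applies to a regression F value by swapping in a different statistic function; only the ANOVA case is shown here.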

Pipeline specifications

Software tools: APE, Phytools
Application: Phylogenetics