Computational protocol: Playing by the rules? Phenotypic adaptation to temperate environments in an American marsupial

Protocol publication

[…] To determine the relationship between the phenotypic traits and latitude, we carried out Pearson (r) or Spearman (rs) correlation analyses, depending on the normality of the data, in the R statistical framework. Additionally, using the ggplot2 package in R, we plotted the trait values against latitude, fitting the non-linear loess regression function to show graphically the trend of the correlations and the latitudes at which this trend changes. [...] To detect whether spatial autocorrelation was present in our phenotypic data, we calculated Moran’s I for each trait, weighted by the Euclidean distance between points, using the ape package v. 4.1 in R. To identify the best environmental models for predicting phenotypic variation in the Virginia opossum, we ran random forest analyses using the randomForest package v. 4.6-12 in R. Because measurements from museum specimens collected at the same geographic location could increase spatial autocorrelation in our data, we averaged the phenotypic values of those specimens for the random forest analyses. To run the random forest analyses, we used the values of each phenotypic trait as response variables, and the 12 environmental variables and two geographic variables (i.e., latitude and longitude) as predictors. The two geographic variables were included because incorporating geographic information in random forest models allows an evaluation of how much variation in the response is explained by environmental variation as opposed to simple geographic proximity. Decision trees (regression or classification) and random forest methods make no a priori assumptions about the relationship between predictor and response variables, which makes it possible to analyze non-linear relationships with complex interactions.
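The site-averaging and Moran’s I steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' R code. The function names are hypothetical, and the use of inverse Euclidean distance as the spatial weight is an assumption, since the text says only that Moran’s I was weighted by Euclidean distance.

```python
import math
from collections import defaultdict


def average_by_location(records):
    """Average trait values of specimens that share the same (lat, lon).

    `records` is an iterable of (lat, lon, value) tuples (hypothetical layout).
    """
    groups = defaultdict(list)
    for lat, lon, value in records:
        groups[(lat, lon)].append(value)
    return {loc: sum(vals) / len(vals) for loc, vals in groups.items()}


def morans_i(coords, values):
    """Moran's I with inverse-distance weights (assumed weighting scheme).

    I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where w_ij = 1 / d_ij for i != j and S0 is the sum of all weights.
    Coordinates must be unique, so average co-located specimens first.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a point is not its own neighbour
            w = 1.0 / math.dist(coords[i], coords[j])
            weight_total += w
            weighted_cross += w * dev[i] * dev[j]
    return (n / weight_total) * (weighted_cross / sum(d * d for d in dev))
```

Values near the null expectation of -1/(n - 1) suggest no spatial structure, while larger values indicate that nearby sites have similar trait values. Note that Euclidean distance on raw latitude and longitude is only a rough proxy for ground distance.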
Random forest analyses build a large number of decision trees, each of which recursively partitions the data into binary, homogeneous groups by splitting the response variable on the predictor variable that explains most of the remaining variance. The amount of variation in the response variable explained by each predictor is incorporated in the model. Applying a randomized bootstrapping (bagging) method, random forest analysis uses a randomly drawn subset of the data and of the predictor variables to construct each regression tree, and assesses the robustness of the model on the remaining data not included in the tree. If the accuracy of the model decreases appreciably when a variable is left out of the model, that variable is considered an important predictor of the data. We ran 10,000 regression trees for each random forest run, and after each run we eliminated the least important variables in the model and re-ran random forest until we identified the most predictive, least complex models, which were composed of three to five predictors that explained the largest amount of variation for each trait. We compared these combined models (which included environmental and geographic variables) with models composed only of the two geographic variables and with models including environmental variables only. This was done to detect the role that geography or environment alone plays in explaining phenotypic variation in the opossum. Finally, we also tested for spatial autocorrelation in the residuals of the combined models using the Moran’s I statistic, to be confident that spatial autocorrelation did not affect the results of those models. […]
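The loop of fitting, ranking predictors, and dropping the least important one can be sketched as below. This is a simplified stand-in for the randomForest package, not a reproduction of it. Each "tree" is a depth-one regression stump rather than a full tree, and importance is scored as the increase in out-of-bag error after permuting a predictor, which is one common measure; the source does not state which importance measure was used. All function names, variable labels, and the demo data are hypothetical.

```python
import random


def fit_stump(X, y, features):
    """Find the single split (feature, threshold) that minimises squared error."""
    mean_y = sum(y) / len(y)
    # Fallback when no split is possible: predict the mean everywhere.
    best = (float("inf"), features[0], float("inf"), mean_y, mean_y)
    for f in features:
        values = sorted({row[f] for row in X})
        for t in values[:-1]:  # split as "<= t" versus "> t"
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, t, ml, mr)
    return best[1:]


def mse(stump, X, y):
    """Mean squared error of one stump on the given rows."""
    f, t, ml, mr = stump
    return sum(((ml if r[f] <= t else mr) - yi) ** 2 for r, yi in zip(X, y)) / len(y)


def permutation_importance(X, y, n_trees=200, seed=0):
    """Mean increase in out-of-bag MSE when each predictor is permuted."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, p // 3)  # number of predictors tried per tree
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # held-out rows
        if not oob:
            continue
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot],
                          rng.sample(range(p), mtry))
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(stump, X_oob, y_oob)
        for f in range(p):
            col = [r[f] for r in X_oob]
            rng.shuffle(col)  # break the link between predictor f and y
            X_perm = [list(r[:f]) + [v] + list(r[f + 1:]) for r, v in zip(X_oob, col)]
            importance[f] += (mse(stump, X_perm, y_oob) - base) / n_trees
    return importance


def backward_eliminate(X, y, names, keep=3, **kwargs):
    """Refit repeatedly, dropping the least important predictor each round."""
    cols = list(range(len(names)))
    while len(cols) > keep:
        sub = [[row[c] for c in cols] for row in X]
        imp = permutation_importance(sub, y, **kwargs)
        cols.pop(imp.index(min(imp)))
    return [names[c] for c in cols]


# Hypothetical demo data: the response depends only on the first predictor.
demo_rng = random.Random(1)
names = ["env_a", "env_b", "latitude", "longitude"]  # illustrative labels only
X = [[demo_rng.random() for _ in names] for _ in range(60)]
y = [10.0 if row[0] > 0.5 else 0.0 for row in X]
kept = backward_eliminate(X, y, names, keep=2, n_trees=100, seed=0)
```

Running the same elimination on the geographic columns alone, or on the environmental columns alone, would mirror the comparison of geography-only and environment-only models with the combined model.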

Pipeline specifications

Software tools: ggplot2, ape, randomForest
Applications: Miscellaneous, Phylogenetics
Organisms: Didelphis virginiana