Computational protocol: Species distribution models predict temporal but not spatial variation in forest growth

Protocol publication

[…] We built species distributions using the RandomForest ensemble classifier (Breiman, ) implemented by the randomForest package (Liaw & Wiener, ) for the R programming environment (R Development Core Team, ). This ensemble classifier grows multiple classification trees (here, n = 500) from bootstrapped samples of the training data and determines its prediction by majority vote over all developed classification trees (Cutler et al., ). Importance values for the predictor variables were calculated as the frequency with which a particular climate variable contributed to a correct classification. Habitat projections were made (1) for climate data for the 1961–1990 period to analyze associations between projected habitat suitability and growth observed on forest inventory plots, (2) for climate data for individual years from 1901 to 2009 to analyze associations between projected habitat suitability and growth measured as tree‐ring width, and (3) for a recent climate average (1995–2009) and future periods (2020s and 2050s) to infer future trends in growth patterns across Europe.

The RandomForest algorithm was preferred over other algorithms, like maxent, because we had a rather unproblematic dataset with a high number of census records. RandomForest is also generally considered the most powerful implementation of regression tree techniques for accounting for nonlinearity in the species response across climate gradients and for interactions among climate variables.

We also attempted to use the regression tree (rather than classification tree) functionality of RandomForest to model a continuous response variable (i.e., ~50‐year‐old plot volume data) as a function of climate. For the volume models, we built, for each of the studied tree species, a training dataset of approximately 9,000 samples, which were climatically characterized and comprised equal numbers of plots with volume data, absence plots, and random absences. Random absences were only selected for countries without plot‐data availability by using an overlay with tree species distributions from the European Forest Genetic Resources Programme (EUFORGEN, ). We removed pseudo‐absences using a p > .5 threshold for species presence, applied to RandomForest habitat projections based on presence/absence variants of the training datasets. This analysis did not yield acceptable validation statistics, and we briefly report on this negative result for the inventory data‐climate modeling attempt.

The statistical accuracy of the species distribution models for individual tree species was evaluated using the area under the receiver operating characteristic curve (AUC of ROC). The AUC statistic is a common measure of the performance of classification rules; it balances the ability of a model to detect a species when it is present (sensitivity) against its ability to not predict a species when it is absent (specificity) (e.g., Fawcett, ; Fielding & Bell, ). We further report model sensitivity (calculated as TP/(TP + FN), with TP as true positives and FN as false negatives) and model specificity (calculated as TN/(TN + FP), with TN as true negatives and FP as false positives). All ROC and AUC calculations were implemented with the ROCR package (Sing, Sander, Beerenwinkel, & Lengauer, ) for the R programming environment. […]
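
The evaluation statistics described above can be illustrated with a minimal sketch. It is not the authors' code: the protocol used the ROCR package in R, whereas this is plain Python, and the function names, the toy labels and scores, and the 0.5 classification cut-off are all assumptions made for illustration. Sensitivity and specificity follow the TP/(TP + FN) and TN/(TN + FP) definitions given in the text; AUC is computed in its rank-based form, as the probability that a randomly chosen presence receives a higher score than a randomly chosen absence.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for presence (1) / absence (0) labels.

    A plot counts as predicted presence when its score exceeds
    `threshold` (the 0.5 default is an assumption of this sketch).
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_present = score > threshold
        if label == 1 and predicted_present:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): share of true presences that the model detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP): share of true absences that the model rejects."""
    return tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC; tied presence/absence pairs count as one half."""
    presences = [s for y, s in zip(labels, scores) if y == 1]
    absences = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > a else 0.5 if p == a else 0.0
               for p in presences for a in absences)
    return wins / (len(presences) * len(absences))


# Hypothetical toy data: three presence plots and three absence plots.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

tp, fn, tn, fp = confusion_counts(labels, scores)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("AUC:", auc(labels, scores))
```

Unlike sensitivity and specificity, the AUC does not depend on the cut-off, which is why it serves as a threshold-free summary of the trade-off between the two.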

Pipeline specifications

Software tools: randomForest, ROCR
Application: Miscellaneous
Organisms: Picea abies