Computational protocol: Five key attributes can increase marine protected areas performance for small-scale fisheries management

Protocol publication

[…] This data matrix contained 575 cells, of which 3.6% of the attribute values and 6.7% of the outcome values were missing. Missing values were imputed with missForest, an iterative imputation method based on random forests. Because the method relies on the multicollinearity of surrounding cells, data were imputed separately for attributes and for outcomes. This prevented circular reasoning and the introduction of spurious relationships between attributes and outcomes.

The overall management success (OMS) score was computed from the outcomes, including the values imputed by missForest. We assessed the strength and significance of the correlative relationships between the response variables (the OMS and the 3 outcomes) and the 20 attributes (i.e. predictors) by using random forests and the Boruta add-on algorithm. The random forest method was suitable for our dataset because it can cope with small sample sizes, a large number of predictors, complex interactions and highly correlated quantitative and/or qualitative attributes. The strength of the correlative relationship between the outcomes and each attribute was indicated by the relative importance of that attribute to the predictive accuracy of the random forest. The significance of these relationships was assessed with the Boruta algorithm, which tests each attribute by comparing its observed importance score against those of a set of randomly permuted attributes (500 permutations across objects). This yields an inference about each attribute's importance: it is confirmed (importance higher than that of the random probes), rejected (importance lower than that of the random probes), or, when neither decision can be reached, marked as tentative. The set of relevant attributes may contain correlated and redundant variables.
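The confirmed / rejected / tentative logic can be illustrated with a minimal sketch. It is not the authors' code and not the Boruta package: it assumes absolute Pearson correlation as a stand-in importance score (the source uses random forest importance), omits Boruta's multiple-testing correction, and uses hypothetical names (`shadow_test`, `n_iter`, `alpha`) and toy data throughout.

```python
import random
from math import comb, sqrt


def abs_pearson(x, y):
    """Stand-in importance score (assumption): absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def binom_tail_ge(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def shadow_test(attributes, outcome, n_iter=50, alpha=0.01, seed=0):
    """Compare each attribute's score with the best permuted ("shadow") score."""
    rng = random.Random(seed)
    real = {name: abs_pearson(col, outcome) for name, col in attributes.items()}
    hits = dict.fromkeys(attributes, 0)
    for _ in range(n_iter):
        # Shadow attributes: each real column shuffled across objects.
        shadow_scores = []
        for col in attributes.values():
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_pearson(shadow, outcome))
        best_shadow = max(shadow_scores)
        for name in attributes:
            if real[name] > best_shadow:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if binom_tail_ge(h, n_iter) < alpha:
            decisions[name] = "confirmed"   # beats the shadows too often for chance
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:
            decisions[name] = "rejected"    # P(X <= h), by symmetry of p = 0.5
        else:
            decisions[name] = "tentative"
    return decisions


# Toy data (hypothetical): one attribute tracks the outcome exactly, the other
# is symmetric around the centre and therefore uncorrelated with it.
outcome = [float(i) for i in range(20)]
attributes = {
    "signal": [2.0 * v for v in outcome],
    "symmetric_noise": [(v - 9.5) ** 2 for v in outcome],
}
decisions = shadow_test(attributes, outcome)
print(decisions)
```

In this simplified version only the shadow scores vary between iterations, because the stand-in score is deterministic. With random forest importance, the real attributes' scores would also change from run to run.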
Moreover, a correlation between an attribute and the outcomes does not imply a causal relationship; it may arise when both are independently correlated with a third variable.

The random forest algorithm implemented in the R package randomForest has three hyperparameters known to affect RF model predictive accuracy and attribute importance estimates: (1) ntree, the number of trees grown, (2) mtry, the number of attributes randomly sampled as candidates at each split, and (3) nodesize, the minimum size of terminal nodes. It is therefore crucial to tune these three hyperparameters to optimise the RF model. To do so, we followed previously described procedures.

Once the random forests were optimised and the significant, relevant attributes were identified, we assessed collinearity among the eight most relevant attributes by using Kendall’s rank correlation coefficient. P-values under the null hypothesis of no association were obtained by normal approximation with continuity correction.

To characterise the multivariate relationships among the eight attributes that Boruta identified as important (i.e. those detected as significant or tentative), some of which were collinear, we carried out a Factor Analysis of Mixed Data (FAMD) by using the R package FactoMineR. The eight most important attributes detected by Boruta for OMS were used as active variables. The success score was added as a supplementary variable in order to show the direction of the attributes' effects. […]
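The Kendall test described above can be sketched as follows. This is an illustration, not the authors' code: it assumes there are no tied values, so the simple no-ties variance of the statistic S applies, and the function name `kendall_test` and the example scores are hypothetical. Data with ties, which are common for ordinal attributes, would need the tie-corrected variance.

```python
from itertools import combinations
from math import erfc, sqrt


def kendall_test(x, y):
    """Kendall's tau with a continuity-corrected normal approximation.

    Assumes no ties in x or y. Returns (tau, z, two-sided p-value).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    # S = number of concordant pairs minus number of discordant pairs.
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    tau = s / (n * (n - 1) / 2)
    # Variance of S under the null hypothesis of no association (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink |S| by 1 towards zero.
    s_corrected = s - (s > 0) + (s < 0)
    z = s_corrected / sqrt(var_s)
    # Two-sided p-value from the standard normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return tau, z, p


# Hypothetical scores of two attributes across eight sites.
attr_a = [1, 2, 3, 4, 5, 6, 7, 8]
attr_b = [2, 1, 4, 3, 6, 5, 8, 7]
tau, z, p = kendall_test(attr_a, attr_b)
print(f"tau = {tau:.3f}, z = {z:.3f}, p = {p:.4f}")
```

Applied to each pair of the eight retained attributes, a test of this kind flags the pairs whose association is unlikely under independence, which is how collinearity among the attributes was screened.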

Pipeline specifications

Software tools: randomForest, FactoMineR
Application: Miscellaneous
Organisms: Danio rerio
Chemicals: 3-Mercaptopropionic Acid