Computational protocol: Microbes as Engines of Ecosystem Function: When Does Community Structure Enhance Predictions of Ecosystem Processes?


Protocol publication

[…] We conducted an exhaustive literature review to identify studies that measured environmental variables, microbial community structure, and biogeochemical processes across a range of ecosystems (Supplementary Table). We then constructed statistical models for three predictor sets (environmental data, microbial community data, and microbial biomass measurements) and for their additive combinations (Supplementary Figure). Because measurement of microbial biomass (defined as microbial C and/or N content) was not a criterion for dataset selection, only 28 datasets included biomass measurements. To incorporate differences in microbial community structure into our models, we used ordination-based techniques to condense multivariate community data into vectors that we included as predictors (Supplementary Methods). The Shannon diversity index of each sample was also included as a measure of alpha diversity. Collectively, these metrics are referred to as ‘community diversity’ when applied to 16S rRNA genes or PLFAs and as ‘functional diversity’ when applied to functional genes. Functional gene abundances were included as total gene abundance per gram of soil where possible (n = 17); however, five datasets reported gene abundances only as values normalized to 16S rRNA gene abundance.
Within each type of data, the methods used to collect environmental, microbial, and process data, as well as the variables collected, inevitably varied. To minimize potential error introduced by these differences, we re-analyzed the data within each dataset using a multimodel inference approach with the ‘MuMIn’ package in R and then compared the results of these analyses. Multimodel inference is a broad, regression-based, model-averaging statistical approach designed to reduce errors in model selection; it has the advantage of standardizing our approach across studies while accounting for the lack of a priori system-specific knowledge.
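For reference, the Shannon index used as the alpha-diversity predictor is H' = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon (or PLFA, or gene) i within a sample. The Python sketch below computes it for one sample's abundance vector. The function name and the example counts are hypothetical, and the natural logarithm is an assumption (it is also the default base of `diversity()` in vegan); the exact computation used in the study is described only in the Supplementary Methods.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) for one sample.

    `abundances` holds raw counts (or relative abundances) per taxon;
    taxa with zero abundance contribute nothing to the sum.
    """
    total = float(sum(abundances))
    if total <= 0:
        raise ValueError("sample has no observed taxa")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total          # relative abundance of this taxon
            h -= p * math.log(p)   # natural log (assumed base)
    return h


# Hypothetical sample-by-taxon counts (rows = samples).
samples = [
    [25, 25, 25, 25],  # perfectly even community of 4 taxa
    [97, 1, 1, 1],     # community dominated by one taxon
]
for row in samples:
    print(round(shannon_index(row), 3))
```

The even four-taxon sample reaches the maximum for four taxa, ln 4 ≈ 1.386, while the dominated sample scores much lower, which is why the index serves as a single-number summary of evenness and richness alongside the ordination vectors.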
Multimodel inference has been used in other small-scale studies with similar objectives. We validated this approach on our data by comparing MuMIn-derived models for several datasets with expert-built regression models, which yielded comparable results (more details in Supplementary Methods).
Using the ‘dredge’ command in the ‘MuMIn’ package to fit all possible combinations of variables within a predictor set and evaluate their explanatory power for a process rate, we selected, for each predictor set, a set of best-fit models consisting of all models with a ΔAICc no more than four above that of the lowest-AICc model. We generated an averaged final model from this set using Akaike weights, implemented with the ‘model.avg’ command in R (Supplementary Methods). Final models from different predictor sets were compared for statistically significant differences using a ΔAICc threshold of four, which provides a conservative estimate of model improvement. Models were also evaluated for ecologically relevant improvement, defined as an increase in adjusted R² greater than 10% of the environmental model's adjusted R². This criterion was intended to exclude artifacts from datasets in which models that differed statistically (based on AICc) had similar adjusted R² values. Only models that showed both statistical and ecological improvement were considered improved.
Finally, model results were synthesized across studies within ecologically relevant subsets of data. We examined results within the full dataset and within subsets defined by biogeochemical process [C mineralization (referred to here as ‘respiration’), nitrification, denitrification, N mineralization], microbial data type (PLFA, tRFLP, ARISA, DGGE, qPCR, next-generation sequencing), and environment type (natural soil, sediment, agricultural soil) that had sufficient replication (n ≥ 12).
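The selection, averaging, and improvement rules described above can be sketched in Python as follows. This is not the MuMIn implementation: it assumes each candidate model from an all-subsets search has already been fitted and summarized by its maximized log-likelihood, its number of estimated parameters k, and its coefficient estimates (the `FittedModel` container and all function names are hypothetical). AICc is computed as AIC + 2k(k + 1)/(n - k - 1), models within ΔAICc ≤ 4 of the best are retained, Akaike weights are w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), and coefficients are averaged with those weights, treating a predictor absent from a model as having a coefficient of zero (a "full" average; MuMIn also reports a "conditional" average, and which one was used is not stated here). The improvement test reads "a ΔAICc value of four" as the combined model's AICc being at least four units below the environmental model's, which is an interpretation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FittedModel:
    """Summary of one fitted candidate model (hypothetical container)."""
    coefs: Dict[str, float]  # estimates for the predictors included in this model
    log_lik: float           # maximized log-likelihood
    k: int                   # estimated parameters (e.g. intercept, slopes, residual variance)
    n: int                   # number of observations

    def aicc(self) -> float:
        aic = -2.0 * self.log_lik + 2.0 * self.k
        # Small-sample correction; requires n > k + 1.
        return aic + 2.0 * self.k * (self.k + 1) / (self.n - self.k - 1)


def akaike_weights(models: List[FittedModel], delta_max: float = 4.0):
    """Keep models within delta_max AICc of the best; return (model, weight) pairs."""
    scores = [m.aicc() for m in models]
    best = min(scores)
    kept = [(m, s - best) for m, s in zip(models, scores) if s - best <= delta_max]
    raw = [math.exp(-0.5 * delta) for _, delta in kept]
    total = sum(raw)
    return [(m, r / total) for (m, _), r in zip(kept, raw)]


def full_model_average(models: List[FittedModel], delta_max: float = 4.0) -> Dict[str, float]:
    """Akaike-weighted average of coefficients; absent predictors count as zero."""
    averaged: Dict[str, float] = {}
    for model, weight in akaike_weights(models, delta_max):
        for name, beta in model.coefs.items():
            averaged[name] = averaged.get(name, 0.0) + weight * beta
    return averaged


def is_improved(env_aicc: float, env_adj_r2: float,
                new_aicc: float, new_adj_r2: float,
                delta: float = 4.0, rel_gain: float = 0.10) -> bool:
    """Statistical (AICc lower by >= delta) AND ecological (adj. R2 gain > 10%) improvement."""
    statistical = (env_aicc - new_aicc) >= delta
    ecological = (new_adj_r2 - env_adj_r2) > rel_gain * env_adj_r2
    return statistical and ecological


# Hypothetical candidate models for one process rate (n = 30 samples).
candidates = [
    FittedModel({"(Intercept)": 1.0, "pH": 2.0}, log_lik=-10.0, k=3, n=30),
    FittedModel({"(Intercept)": 1.0, "moisture": 1.0}, log_lik=-10.5, k=3, n=30),
    FittedModel({"(Intercept)": 0.5, "temp": 5.0}, log_lik=-20.0, k=3, n=30),
]
print(full_model_average(candidates))
print(is_improved(env_aicc=100.0, env_adj_r2=0.40, new_aicc=95.0, new_adj_r2=0.50))
```

In this toy example the third model lies 20 AICc units above the best and is dropped, so `temp` does not appear in the averaged model; the two remaining models share the weight roughly 0.62 to 0.38.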
We report the mean adjusted R² of models and the increase in adjusted R² as measures of effect size, as well as the percentage of models statistically improved by the addition of microbial data relative to models constructed with environmental parameters alone. Differences in explanatory power among models with environmental, microbial, or combined environmental and microbial data were assessed by comparing the distributions of model adjusted R² values from each predictor set using unpaired, one-sided Mann–Whitney U-tests, a non-parametric test. We also examined correlations between predictor sets to determine the extent to which environmental variables explained variation in microbial community structure and biomass, and the extent to which biomass explained variation in microbial community structure. We analyzed these relationships with redundancy analysis (RDA), using the ‘ordistep’ function in the ‘vegan’ package in R to automate forward model selection, and we report the average R² values of correlations between predictor sets as a measure of effect size (Supplementary Methods). […]
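The one-sided comparison of adjusted R² distributions can be sketched with a Mann–Whitney U statistic computed from mid-ranks. This is a minimal, standard-library-only illustration using the normal approximation with tie and continuity corrections; in practice R's `wilcox.test()` or SciPy's `scipy.stats.mannwhitneyu()` would be used, and these may compute exact p-values for small, tie-free samples, so results can differ slightly. The function names and example R² values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Unpaired one-sided test that values in x tend to exceed values in y.

    Returns (U_x, p) using the normal approximation with tie and
    continuity corrections.
    """
    nx, ny = len(x), len(y)
    if nx == 0 or ny == 0:
        raise ValueError("both samples must be non-empty")
    n = nx + ny
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u_x = sum(ranks[:nx]) - nx * (nx + 1) / 2.0
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = nx * ny / 12.0 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence either way
        return u_x, 1.0
    z = (u_x - nx * ny / 2.0 - 0.5) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail P(Z >= z)
    return u_x, p


# Hypothetical adjusted R2 values from two predictor sets.
env_plus_microbial = [0.52, 0.61, 0.47, 0.70, 0.58]
env_only = [0.41, 0.55, 0.38, 0.49, 0.44]
u, p = mann_whitney_greater(env_plus_microbial, env_only)
print(f"U = {u}, one-sided p = {p:.3f}")
```

Because the test works on ranks, it makes no normality assumption about adjusted R² values, which are bounded and often skewed across datasets.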

Pipeline specifications

Software tools: MuMIn, vegan
Application: Phylogenetics
Chemicals: Carbon, Nitrogen