Computational protocol: A meta-analysis of changes in bacterial and archaeal communities with time

Similar protocols

Protocol publication

[…] The microbial communities in each of the 3431 individual samples were characterized by sequencing a portion of the 16S rRNA gene on either the Illumina (San Diego, CA, USA) or 454 (Branford, CT, USA) platforms. The 16S rRNA gene is widely used for determining the phylogenetic and taxonomic composition of bacterial communities and, in all the cases, the data were derived from PCR amplification of environmental DNA using primer pairs designed to amplify the gene region from all, or nearly all, known bacterial taxa. A closed reference operational taxonomic unit (OTU) picking protocol was applied to each data set separately (). Briefly, OTUs were assigned based on 97% sequence identity to sequences in the Greengenes reference database () preclustered at 97% identity (http://qiime.org/home_static/dataFiles.html). As we used the same reference-based OTU picking strategy for all samples, we could directly compare the relative abundances of taxa across samples. Furthermore, we sub-sampled each individual data set such that all samples from a given data set were compared at an equivalent sequencing depth (). All analyses were performed on the rarefied OTU tables to permit comparisons of patterns in within- and between-community diversity. Our goal was not to quantify the absolute diversity found in any of the samples: this task is difficult, if not impossible, because individual samples may harbor thousands of rare taxa (). However, as recent work has demonstrated (; ), it is not necessary to characterize absolute diversity in order to accurately describe changes in within-sample and between-sample diversity within and between habitat types. Pielou's evenness (), richness (number of OTUs) and Faith's phylogenetic diversity () were used as within-sample (alpha) diversity metrics. Bray–Curtis was used as a taxon-based metric of differences in community composition (beta diversity), and the dissimilarities were calculated from the rarefied OTU tables in R using the vegan package (; ). QIIME (version 1.2.1, () was used for constructing weighted and unweighted UniFrac distances. UniFrac is a commonly used phylogenetic distance metric to assess pairwise dissimilarity in community composition and incorporates information about differences in phylogenetic composition of community members (; ), with weighted UniFrac accounting for differences in the relative abundances of community members. [...] All analyses were performed using the R environment for statistical computing (), with the aid of the vegan and ggplot2 packages (; ). To compare temporal variability in diversity across habitats having inherently different diversities, we calculated the coefficient of variation (CV) in within-sample diversity for each community (Equation ) where σ is the s.d. and μ is the mean.We calculated median absolute deviation (MAD) to compare variability in between-sample diversity (Equation ). Finally, we calculated z-scores of richness to examine the step-wise variability of richness through time, across data sets (Equation ). where χ is the raw value of richness, μ is the mean of the sample, and σ is the s.d. around the mean.In using the coefficient of variation, median absolute deviation and z-scores of richness over time, we compared variability in diversity rather than absolute measures of diversity, which was most appropriate for comparing communities from different habitats that were assessed using different protocols for 16S rRNA short-read sequencing. To assess whether there were patterns in community structure that could be described by time between observations, we related community similarity/distance to time elapsed using Mantel tests with Pearson's correlation on 999 permutations.We evaluated the decay of community similarity over time (time-decay) using the same methods for calculating decay of community similarity over space (distance-decay; ; ). Though assessment of distance-decay is common in the literature, it is less common to assess time-decay. To assess time-decay in microbial communities, we used a similar approach adopted by to the meta-analysis of aquatic community time-decay. A log-linear model was fitted between the change in community structure (assessed by pair-wise similarities or distances, including Bray–Curtis, UniFrac and unweighted UniFrac) and days elapsed. Community dissimilarities were converted to similarities by subtracting from one. Similarities were log-transformed. The slope of the log-linear model is a rate of community change, sometimes referred to as turnover (). For plotting time-decay examples from each biome, we applied lowess smoothing over windows the length of 5% of the total series. Because time-decay can be sensitive to the duration of the study, we also performed a simple analysis of how quickly microbial communities change at temporal scales from 1 week to 1 month, from 1 to 6 months, from 6 months to 1 year, from 1 to 2 years, from 2 to 3 years, from 3 to 4 years, from 4 to 5 years and from 5 to 6 years. These windows encompass the breadth of time series durations included in the meta-analysis. For all pairs of observations within a community's time series, a rate of change was calculated by dividing Bray–Curtis dissimilarity by the time between observations. Next, all pairs of observations were partitioned into the appropriate temporal window (determined by the time between observations) and an average rate of change for each window was calculated. The global average rate of change was summarized across sites from the same habitat (reported in ).STRs for each site were constructed in R by calculating richness using the moving window approach of . This approach involves partitioning a time series into as many window subsets as possible given the number of observations and fitting the STR model (the power-law relationship between time and richness) at each window. For example, a 250-time point series could divided into one 250-point window, two 249-point windows, and so on. The power function, rather than the lognormal function, was used so that our results would be directly comparable with the results from communities of larger organisms reported in . There is some debate regarding which function is most appropriate for describing the STR (; ), but found that the log and power functions produced identical patterns. We compared STR patterns in microbial communities with patterns for communities of larger organisms. However, microbial communities are widely considered to be more diverse than communities of larger organisms, and it remains difficult to make a direct comparison, as discussed at length previously ().There are caveats to the data sets used here that are worth highlighting. First, high-throughput sequencing of 16S rRNA genes can introduce biases in the determination of microbial diversity; these limitations have been described elsewhere (for example, ; ; ; ). Second, because sequencing depth varied across the sample sets (), it is difficult to directly compare the temporal dynamics of individual taxa across the data sets (particularly taxa that are relatively rare). For this reason, we have focused our meta-analyses entirely on the overarching patterns in within- and between-sample diversity, which should be reasonably robust to differences in sequencing depth (); however, we also tested whether inter-biome differences were a byproduct of differences in sequencing depth. Third, we picked OTUs using a closed-reference database protocol to compare data sets generated using different primers that target different variable regions of the 16S rRNA gene (as done in Caporaso et al., 2011). Although this method allowed us to directly compare data sets using the same reference phylogeny, any sequences not matching the well-curated database were discarded from the analyses. Therefore, communities that had better representation of taxa in the database (for example, human-associated communities) had a higher proportion of taxon assignments than, for instance, flower communities, where a larger percentage of the sequences lack a close match to those found in the database. However, we found no relationship between the proportion of sequences that matched the reference database and patterns of temporal variability, suggesting that frequency of matches to the reference database did not bias overall patterns. […]

Pipeline specifications

Software tools QIIME, vegan, UniFrac, Ggplot2
Databases Greengenes
Applications Miscellaneous, Phylogenetics, 16S rRNA-seq analysis
Organisms Homo sapiens