Computational protocol: Do Patterns of Bacterial Diversity along Salinity Gradients Differ from Those Observed for Macroorganisms?

Protocol publication

[…] Kelike Lake and Chaqia Lake, representing low (0.71‰) and high (279.2‰) salinities, respectively, were selected to construct clone libraries for a more detailed analysis of their bacterial community structure. Aliquots of the products from the PCRs and re-PCRs of the two lakes were ligated into the pMD18-T vector (Takara, Dalian, China) and transformed into TOP10 competent cells. Further details of the clone library construction can be found in our previous report. For each primer pair, 72 clones from each environment were randomly selected for sequencing. The insert fragments resulting from the 8f/1492r, 8f/PLA886r and AC517f/AC1165r primer pairs were sequenced with the primers 8f, 8f and AC517f, respectively. Other fragments were sequenced with the M13f primer (5′-GTAAAACGACGGCCAGT-3′).

The clone sequences obtained were manually checked for chimeras using the Ribosomal Database Project II prior to submission to GenBank. Furthermore, the submitted sequences were analyzed in the context of the complete data set to identify putative anomalies using Bellerophon and Mallard with the default settings. Fifty putative chimeric sequences were identified; these were excluded from the subsequent analyses but retained in GenBank with an organism name of ‘uncultured bacterium’ and a note stating ‘putative inter-phylum chimera’.

Sequences were initially aligned with the NAST algorithm from the GreenGenes database. Maximum likelihood trees were constructed with the Dnaml program in the Phylip package. The rate of diversification within each environment was examined using the tree shape statistic γ, which is extraordinarily sensitive to recent diversification rates. We transformed the maximum likelihood trees to ultrametric trees (rooted trees with edge lengths in which all of the leaves are equidistant from the root) by using non-parametric rate smoothing. Under the pure-birth model, the expected value of the gamma statistic is zero.
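As an illustration of how the gamma statistic relates to the node heights of an ultrametric tree, the following is a minimal Python sketch of the standard internode-interval formula. It is not the authors' code (their analyses were run in R with ape and geiger); the function name `gamma_statistic` and the toy node ages are hypothetical, and the input is assumed to be the ages of the internal nodes of a fully bifurcating ultrametric tree, measured backwards from the tips.

```python
import math


def gamma_statistic(node_ages):
    """Gamma statistic from internal-node ages of an ultrametric tree.

    node_ages: ages (time before present) of the n - 1 internal nodes of a
    fully bifurcating ultrametric tree with n tips (assumed input format).
    """
    ages = sorted(node_ages)              # youngest node first
    n = len(ages) + 1                     # number of tips
    if n < 3:
        raise ValueError("gamma needs a tree with at least 3 tips")

    # Internode intervals, youngest first: the first interval is the time
    # during which n lineages exist, the last the time with 2 lineages.
    intervals = [ages[0]] + [b - a for a, b in zip(ages, ages[1:])]
    g = intervals[::-1]                   # g[0] belongs to k = 2 lineages

    # k * g_k for k = 2..n
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    total = sum(weighted)                 # T = sum_{k=2}^{n} k * g_k

    # sum over i = 2..n-1 of the partial sums sum_{k=2}^{i} k * g_k
    running = 0.0
    partial_sum = 0.0
    for w in weighted[:-1]:
        running += w
        partial_sum += running

    mean_partial = partial_sum / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Hypothetical toy trees with three tips:
early = gamma_statistic([9.0, 10.0])   # both splits close to the root
late = gamma_statistic([1.0, 10.0])    # second split close to the tips
```

In this toy input, `early` comes out negative and `late` positive, matching the interpretation of the sign of gamma.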
Negative values of gamma indicate that the phylogeny's internal nodes are closer to the root than expected under the pure-birth model and imply a deceleration in the accumulation of lineages, whereas positive values indicate that the internal nodes are closer to the tips and imply an acceleration in the accumulation of lineages. To provide a global picture of diversification rates, the gamma statistics were calculated for the total sequences and for the sequences obtained with the 8f/1492r primer pair for each of the two lakes. All of the analyses were conducted in the R environment with the packages geiger v1.2-14 and ape v2.3-1.

[...] Two types of data matrices were constructed from the 2005 samples. The bacterial taxon matrices were generated from the presence or absence of DGGE bands. The environmental matrices included 16 environmental variables. All of the environmental variables, except the pH, were log-transformed. The band numbers of the 341f/907r DGGE fingerprints and the sum of the bands from the phylum-specific-primer DGGE fingerprints were used as proxies for the bacterial taxon richness. Principal component analysis (PCA) was performed on the environmental matrix to determine the primary environmental gradient. To investigate the relationship between the environmental factors and the bacterial communities, a canonical correspondence analysis (CCA) was performed using the two matrices. The significance of the relationship between the environmental factors and the bacterial community composition (BCC) was tested with a Monte Carlo permutation test (499 permutations). Nonparametric Spearman correlations between the DGGE band numbers and the environmental factors were calculated for the 2005 samples across the entire salinity range and, separately, across the low-salinity range (salinities <1‰).
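The step from a DGGE presence/absence matrix to a richness proxy and its Spearman correlation with an environmental variable can be sketched as follows. This is a minimal, standard-library Python illustration rather than the R workflow used in the study; the helper names (`average_ranks`, `pearson`, `spearman_rho`) and all numbers are hypothetical, and ties are assumed to receive average ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical presence/absence matrix: one row per sample, one column per band.
bands = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
richness = [sum(row) for row in bands]   # band number per sample
salinity = [0.5, 3.0, 40.0, 250.0]       # made-up salinities, in per mille
rho = spearman_rho(salinity, richness)
```

Because the statistic depends only on ranks, log-transforming the environmental variable beforehand does not change `rho`.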
Richness trends along the investigated salinity gradient were analyzed using a locally weighted scatterplot smoothing regression (LOWESS; span 2/3, degree 1) to detect major nonlinearities or a generalized linear model to detect linearity. Furthermore, we calculated both the phylogenetic diversity and the community taxon richness with maximum likelihood trees from clone sequences under different sequence-similarity cutoffs (95%, 97%, 99%, and 100%). The phylogenetic diversity was described by Faith's PD, the total length of the phylogenetic branches connecting the species within a community. The taxon richness was estimated with the non-parametric estimator Chao1 = S_obs + a^2/(2b), where S_obs is the number of species observed and a and b are the numbers of species observed exactly once and exactly twice, respectively. We then calculated the taxon richness or phylogenetic diversity with a rarefaction analysis repeated 1,000 times. The rarefaction curves were plotted against the number of sequences sampled to compare the phylogenetic or community diversity under different sequence-similarity cutoffs. CCA was performed with the software Canoco v4.53, and the other statistical analyses were conducted in the R environment with the package vegan v1.16-2. […]
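The Chao1 estimator and the rarefaction step can be illustrated with a short standard-library Python sketch. It is not the vegan implementation used in the study: the function names, the clone labels, and the fixed random seed are hypothetical, and the fallback used when no taxon is observed exactly twice (the bias-corrected form a(a - 1)/2) is an assumption that the text does not specify.

```python
import random
from collections import Counter


def chao1(taxon_labels):
    """Chao1 = S_obs + a^2 / (2b) for a list with one taxon label per clone."""
    counts = Counter(taxon_labels)
    s_obs = len(counts)                                  # taxa observed
    a = sum(1 for c in counts.values() if c == 1)        # seen exactly once
    b = sum(1 for c in counts.values() if c == 2)        # seen exactly twice
    if b == 0:
        # Assumption: bias-corrected fallback to avoid division by zero.
        return s_obs + a * (a - 1) / 2
    return s_obs + a * a / (2 * b)


def rarefy(taxon_labels, depth, metric=chao1, n_iter=1000, seed=0):
    """Mean of `metric` over n_iter random subsamples of `depth` clones,
    drawn without replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        total += metric(rng.sample(taxon_labels, depth))
    return total / n_iter


# Hypothetical clone library: one taxon label per sequenced clone
# at a given sequence-similarity cutoff.
clones = ["A"] * 5 + ["B"] * 2 + ["C"] * 2 + ["D", "E", "F", "G"]
estimate = chao1(clones)   # S_obs = 7, a = 4, b = 2

# One point per sampling depth gives a rarefaction curve.
curve = [(depth, rarefy(clones, depth, n_iter=100))
         for depth in range(1, len(clones) + 1)]
```

Passing a different `metric` (for example, a function returning the number of distinct labels, or Faith's PD computed on a tree) yields the corresponding rarefaction curve from the same resampling loop.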

Pipeline specifications

Software tools: PHYLIP, GEIGER, APE, vegan
Application: Phylogenetics
Organisms: Hemisus marmoratus