Computational protocol: Impact of Hydraulic Well Restoration on Native Bacterial Communities in Drinking Water Wells

[…] Initial sequence data processing was performed using mothur software (). Trimmed sequences shorter than 250 bp, with homopolymers longer than 8 bases, and/or with more than one primer mismatch were discarded. Sequences were aligned to the SILVA-compatible alignment database. The mothur-implemented algorithms PyroNoise (minimum flow length = 360, maximum flow length = 720) and Chimera.uchime were used to denoise sequences and remove amplification artefacts. The remaining sequences were binned into operational taxonomic units (OTUs) at a 97% sequence similarity cut-off using the average neighbor clustering algorithm. Phylogenetic trees were then constructed for overall library comparisons using Clearcut. Sequences were classified using the Greengenes training set described by Werner et al. (). Taxon abundances from the forward- and reverse-read datasets were averaged per sample. All sequencing data have been deposited in the NCBI Sequence Read Archive under the BioProject ID PRJNA245507.
Multivariate statistics were performed on a subset of the sequence data comprising all taxa that contributed at least 1% relative abundance in at least one sample. All statistical analyses were conducted using R version 15.2.0 (). The inverse Simpson diversity index (1/α) and the expected OTU richness of rarefied samples were calculated using the vegan package (). The inverse Simpson concentration was interpreted as the effective number of species (), based on the mean frequency of species in an ecosystem. To test the robustness of our community analyses, we performed bootstrap resampling (n = 1,000) followed by repeated diversity calculation. Rarefaction analysis was performed using the rarefy function of vegan to estimate rarefied species richness (Ŝn) () based on the minimum number of sequences among all samples. Data were Hellinger-transformed, and principal component analysis was computed using the prcomp function.
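The diversity measures described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and is not the code used in the study. It assumes the standard definitions: inverse Simpson as 1/Σp², Hurlbert's formula for expected richness in a subsample, and the Hellinger transformation as the square root of relative abundance. The function names, the toy counts, and the fixed random seed are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from math import comb, sqrt


def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), for one sample's OTU counts."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def rarefied_richness(counts, n):
    """Expected number of OTUs in a random subsample of n sequences (Hurlbert)."""
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds the number of sequences")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n sequences belong to other OTUs
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def hellinger(counts):
    """Hellinger transformation: square root of relative abundance."""
    total = sum(counts)
    return [sqrt(c / total) for c in counts]


def bootstrap_inverse_simpson(counts, n_boot=1000, seed=0):
    """Resample sequences with replacement and recompute the index each time."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    total = sum(counts)
    otus = range(len(counts))
    values = []
    for _ in range(n_boot):
        draw = Counter(rng.choices(otus, weights=counts, k=total))
        values.append(inverse_simpson(list(draw.values())))
    return values


# Toy OTU tables for two samples (hypothetical counts)
samples = {"well_a": [50, 30, 20], "well_b": [10, 10, 10, 10]}
depth = min(sum(c) for c in samples.values())  # minimum sequencing depth
for name, counts in samples.items():
    print(name, inverse_simpson(counts), rarefied_richness(counts, depth))
```

Rarefying every sample to the depth of the shallowest one keeps richness estimates comparable across samples with unequal sequencing effort, which is the rationale for using the minimum number of sequences.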
The sequences and constructed phylogenetic trees of the forward and reverse reads were used to estimate β-diversity based on weighted UniFrac values (). […]
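As an illustration of how a weighted UniFrac value combines tree structure with abundances, the following sketch computes the raw (non-normalized) form: the sum over branches of branch length times the absolute difference in the fraction of each community's sequences descending from that branch. This is a toy Python example under stated assumptions and not the implementation used in the study. The tree encoding (a dictionary mapping each node to its parent and branch length), the function name, and the example tree are hypothetical.

```python
def weighted_unifrac(parent, counts_a, counts_b):
    """Raw weighted UniFrac between two communities on a rooted tree.

    parent   : dict mapping node -> (parent_node, branch_length); the root
               has no entry
    counts_a : dict mapping tip name -> sequence count in community A
    counts_b : dict mapping tip name -> sequence count in community B
    """
    fractions = []
    for counts in (counts_a, counts_b):
        total = sum(counts.values())
        frac = {}
        for tip, c in counts.items():
            node = tip
            # Walk from the tip to the root, crediting every branch on the path
            while node in parent:
                frac[node] = frac.get(node, 0.0) + c / total
                node = parent[node][0]
        fractions.append(frac)
    frac_a, frac_b = fractions
    return sum(
        length * abs(frac_a.get(node, 0.0) - frac_b.get(node, 0.0))
        for node, (_, length) in parent.items()
    )


# Toy tree: root -> X (1.0), root -> C (2.0), X -> A (0.5), X -> B (0.5)
tree = {
    "A": ("X", 0.5),
    "B": ("X", 0.5),
    "X": ("root", 1.0),
    "C": ("root", 2.0),
}
distance = weighted_unifrac(tree, {"A": 10}, {"C": 10})
print(distance)
```

Two communities with identical relative abundances give a value of 0. Published implementations often also divide by a normalization factor so that values fall between 0 and 1; that step is omitted here.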

Pipeline specifications

Software tools: mothur, PyroNoise, UCHIME
Applications: Phylogenetics, 16S rRNA-seq analysis