Computational protocol: The Response of a 16S Ribosomal RNA Gene Fragment Amplified Community to Lead, Zinc, and Copper Pollution in a Shanghai Field Trial

Protocol publication

[…] The Quantitative Insights Into Microbial Ecology (QIIME) suite of analysis tools was used to filter and analyze the sequence data. A total of 448,102 sequences were obtained across all 20 plots after removal of the index and primer sequences, quality-control filtering, and rarefaction. All amplicons were annotated using blastn against the NCBI database; for each sequence, all hits with an e-value below a threshold of 10⁻¹⁰ were retained, allowing a large base of potential hits for each query sequence. All potential annotation hits were then filtered using strict criteria of ≥99% identity and ≥99% alignment coverage on any query sequence. For each amplicon, “unknowns” or “uncharacterized” labels were removed when possible and the hit with the highest bit score was selected as the annotation. For ambiguous calls (identical bit score), the lowest shared taxonomic level of all potential hits was used. Amplicon annotations were then treated as OTUs (we borrow the commonly used term OTU for readability; these do not represent consensus sequences or clustered amplicons but annotation-binned sequences). Amplicons without annotation (<99% identity and coverage to known NCBI nt sequences) were binned to existing OTUs using the more forgiving criteria of ≥98% identity and ≥98% alignment coverage. No new OTU construction was permitted in this step, which is intended to increase capture of artificial amplicons produced by sequencing error rather than to establish less confident OTUs. Lastly, in an effort to reduce the typical sparsity encountered in rRNA OTU tables, we flagged OTUs that were present in fewer than four biological replicates and those with a high total count concentrated in a single sample (ratio of an OTU's maximum count in a sample to its total count across all samples <0.75, Supplementary File). The final table of 1,566 OTUs showed similar count depth within all samples (Supplementary File).
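The annotation rule described above (threshold filtering, highest bit score, and a fall-back to the lowest shared taxonomic level on ties) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the `Hit` record, the `annotate` and `shared_lineage` helpers, and the substring test for "unknown"/"uncharacterized" labels are all assumptions introduced here, and the lineage is assumed to be an ordered tuple of ranks from broadest to most specific.

```python
from dataclasses import dataclass

# Hypothetical record for one blastn hit; field names are assumptions.
@dataclass(frozen=True)
class Hit:
    accession: str
    lineage: tuple      # ranks ordered from broadest to most specific
    identity: float     # percent identity
    coverage: float     # percent alignment coverage of the query
    bitscore: float
    evalue: float

UNINFORMATIVE = ("unknown", "uncharacterized")

def shared_lineage(lineages):
    """Return the longest common prefix, i.e. the lowest shared taxonomic level."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)

def is_uninformative(hit):
    """Assumed test: any rank label containing an uninformative keyword."""
    return any(word in rank.lower() for rank in hit.lineage for word in UNINFORMATIVE)

def annotate(hits, min_identity=99.0, min_coverage=99.0, max_evalue=1e-10):
    """Pick an annotation for one amplicon, or None if no hit passes the filters."""
    kept = [h for h in hits
            if h.evalue < max_evalue
            and h.identity >= min_identity
            and h.coverage >= min_coverage]
    if not kept:
        return None
    # Drop uninformative labels "when possible": only if something else remains.
    informative = [h for h in kept if not is_uninformative(h)]
    candidates = informative or kept
    best = max(h.bitscore for h in candidates)
    top = [h for h in candidates if h.bitscore == best]
    # A single top hit returns its full lineage; ties collapse to the shared prefix.
    return shared_lineage([h.lineage for h in top])
```

Under the same assumptions, calling `annotate(hits, min_identity=98.0, min_coverage=98.0)` on amplicons that returned `None` would mimic the more forgiving second pass, with the extra restriction (not shown) that the result must match an OTU that already exists.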
Constrained Analysis of Principal Coordinates (CAP; Figure) ordination was performed based on Bray–Curtis ecological distances using the phyloseq package. Dispersion ellipses were drawn using the veganCovEllipse function from the vegan package in R. Alpha diversity was measured using six different indices [Shannon, inverse Simpson (Figure), Observed, Chao1, Simpson, and Fisher (Supplementary File)] within the phyloseq package. Alpha diversity was compared between the different groups of samples using a t-test. Of the 1,566 constructed OTUs, only the 642 with >10 total normalized counts across all samples were retained, to minimize interference with the standard size factor estimator in the subsequent DESeq2 analysis. Parametric models developed in transcriptomics have been shown to perform well when applied to the pitfalls of microbiome biomarker data (uneven library sizes, sparsity, sample representativeness), so this approach was used to characterize statistically significant differentially abundant OTUs between groups of samples. The univariate DESeq2 method was used to identify differentially abundant OTUs, with a p-value and false discovery rate cut-off (Benjamini–Hochberg) of <0.05 applied after the statistical tests. Rlog data transformation was used (blind = TRUE) as it offers a safer curve of the square root of variance over the mean (Supplementary File). An OTU that has counts in one condition but is not detected in another within a given comparison is not automatically differentially abundant, because its abundance (presence) must still exceed the DESeq2 significance thresholds (population noise); differential abundance is, however, more likely in that case. Similarly, the absence of any counts for an OTU within plots of soil from a single condition does not distinguish between absence of the bacteria and very low abundance (below detection, particularly relevant due to rarefaction), so it is considered only as potential absence throughout the discussion.
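For reference, the quantities named above have simple closed forms. The sketch below is a plain-Python illustration rather than the phyloseq, vegan, or DESeq2 implementation: it computes four of the six alpha-diversity indices (Observed, Shannon with natural logarithm, Simpson as 1 − Σp², and inverse Simpson as 1/Σp²), the Bray–Curtis dissimilarity between two samples, and Benjamini–Hochberg adjusted p-values. Function names are assumptions chosen for this example.

```python
import math

def alpha_diversity(counts):
    """Observed, Shannon, Simpson and inverse Simpson for one sample's OTU counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    props = [c / total for c in counts if c > 0]
    sum_sq = sum(p * p for p in props)
    return {
        "Observed": len(props),
        "Shannon": -sum(p * math.log(p) for p in props),
        "Simpson": 1.0 - sum_sq,
        "InvSimpson": 1.0 / sum_sq,
    }

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same OTUs")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An OTU would then be called differentially abundant at the stated cut-off when its adjusted value is below 0.05. Chao1 and Fisher's alpha are omitted here because they require singleton/doubleton counts and an iterative fit, respectively.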
The utility of any amplicon for distinguishing particular species within a genus, or even genera within a family, is variable. It can therefore be useful to inform bioinformatics decision-making iteratively with the specific biology of the environment, and to acknowledge that variability through comprehensive OTU investigation. OTU unpacking analyses separated sequences into sub-OTUs when no specific annotation accession hit was shared between an OTU’s constituent sequences. For example, sequences with the same annotation label (i.e., a common OTU) but different accession numbers would give rise to two sub-OTUs. The majority of 16S rRNA gene annotation in databases (NCBI nt, SILVA, etc.) derives from in silico sequence comparison, so it has the potential to be distant from bacteria characterized independently of sequencing; therefore, for comparison, phylogenetic trees were built by phylum, including spikes of the closest isolated/cultured relatives based on the aligned 16S rRNA gene fragment from the 16S archaea and bacteria NCBI database. Unique identifiers (accession details) for every spike sequence are available in the Supplementary File. Phylogenetic trees used the Jukes–Cantor model, chosen for the simplicity of its substitution model given the multiple potential levels of selection across the entirety of the 16S rRNA gene, and the neighbor-joining method. Newick phylogenetic trees were visualized using iTOL (Interactive Tree of Life). […]
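As an illustration of the substitution model named above, the Jukes–Cantor distance between two aligned sequences is d = −(3/4)·ln(1 − (4/3)·p), where p is the proportion of differing sites. The sketch below is a minimal, assumption-laden example: the function name is hypothetical, and skipping any column that contains a gap or ambiguous base (pairwise deletion) is a choice made here, not something stated in the protocol. A matrix of such distances is the usual input to neighbor-joining.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: ignore columns where either sequence has a gap or ambiguity code.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    # The correction is undefined once p reaches the saturation point of 3/4.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

Because the correction accounts for multiple substitutions at the same site, the returned distance is always at least as large as the raw proportion of differences.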

Pipeline specifications

Software tools: QIIME, BLASTN, phyloseq, DESeq2, iTOL
Application: Phylogenetics
Organisms: Homo sapiens, Firmicutes, Bacteroidetes
Chemicals: Copper, Zinc