Computational protocol: Polymetallic nodules, sediments, and deep waters in the equatorial North Pacific exhibit highly diverse and distinct bacterial, archaeal, and microeukaryotic communities

Protocol publication

[…] Illumina paired‐end 16S rRNA gene reads were joined using the bioinformatic software fastq‐join (Aronesty, ). Sequences were then processed with QIIME version 1.8.0 (Caporaso, Kuczynski, et al., ), including initial quality filtering and demultiplexing (mapping sequences to samples by barcode). Potentially chimeric sequences were identified using the UCHIME algorithm within the USEARCH package (Edgar, ) and removed from further analysis. Open‐reference OTU picking was performed at a 97% sequence similarity cutoff against the Greengenes rRNA gene database release 13_8 (DeSantis et al., ), using the UCLUST algorithm (Edgar, ), one of the principal clustering algorithms in the QIIME package. OTUs that occurred as absolute singletons or were observed in the extraction and/or PCR blanks were removed from the experimental samples. Taxonomy was assigned based on the Greengenes taxonomy (McDonald et al., ; Werner et al., ) using a UCLUST‐based consensus taxonomy assigner (Bokulich et al., ). A total of 13,835,715 high‐quality sequences were generated, with an average of 101,133 sequences/sample (minimum sequences/sample = 16,753; maximum sequences/sample = 236,702). To account for uneven sampling depth, these data were normalized to 16,000 reads/sample using the script single_rarefaction.py, which randomly subsamples the input OTU table without replacement. This normalized OTU table was used in subsequent analyses unless otherwise specified. The script summarize_otu_by_cat.py in the QIIME package was used to collapse this OTU table by sample type and/or depth when necessary.
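The two table-level steps described above are (1) removing absolute singletons and OTUs seen in blanks and (2) rarefying every sample to a fixed depth without replacement. They can be sketched in plain Python as follows. This is an illustration, not the QIIME implementation, and it rests on three assumptions:

- The OTU-table layout is `{sample_id: {otu_id: count}}`.
- The helper names `filter_otus` and `rarefy` are hypothetical.
- An "absolute singleton" means an OTU with exactly one read across the whole table.
- Samples with fewer reads than the target depth are dropped.

```python
import random
from collections import Counter

# Hypothetical OTU-table layout for this sketch: {sample_id: {otu_id: read_count}}.

def filter_otus(table, blank_ids):
    """Remove OTUs that are absolute singletons (exactly one read in the whole
    table) or that occur in any blank, then drop the blank samples themselves."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)  # sums read counts per OTU across all samples
    in_blanks = {
        otu for blank in blank_ids for otu, n in table.get(blank, {}).items() if n > 0
    }
    keep = {otu for otu, n in totals.items() if n > 1 and otu not in in_blanks}
    return {
        sample: {otu: n for otu, n in counts.items() if otu in keep}
        for sample, counts in table.items()
        if sample not in blank_ids
    }

def rarefy(table, depth, seed=0):
    """Randomly subsample each sample to exactly `depth` reads without replacement.
    Samples with fewer than `depth` reads are dropped (an assumption of this sketch)."""
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in table.items():
        reads = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
        if len(reads) < depth:
            continue
        rarefied[sample] = dict(Counter(rng.sample(reads, depth)))
    return rarefied

# Toy data with hypothetical sample and OTU names.
toy = {
    "nodule_1": {"otuA": 50, "otuB": 30, "otuC": 1},
    "sediment_1": {"otuA": 10, "otuB": 25, "otuD": 40},
    "pcr_blank": {"otuD": 3},
}
clean = filter_otus(toy, {"pcr_blank"})
# otuC is a singleton and otuD appears in the blank, so only otuA and otuB remain.
rare = rarefy(clean, depth=30)
```

Listing one entry per read keeps the without-replacement logic obvious. For samples of a few hundred thousand reads, as in this dataset, that list still fits comfortably in memory.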
For the 16S data, the only exceptions to using this normalized table were two analyses:
- the differential abundance analysis, which used the full dataset;
- the alpha diversity analyses, in which samples were collapsed by sample type (water column, nodules, or sediments) and the dataset was randomly subsampled multiple times at different depths, up to a maximum depth of 2,401,000 sequences, to take maximum advantage of this large dataset.

Illumina 5′ 18S rRNA gene reads were processed similarly to 16S rRNA reads, except that reference‐based OTU picking was performed against the SILVA 119 rRNA gene database (Quast et al., ). Taxonomy was assigned based on the SILVA 119 taxonomy (Yilmaz et al., ) using BLAST (Altschul, Gish, Miller, Myers, & Lipman, ). The resulting OTUs were filtered to exclude 38,523 bacterial OTUs, 23,733 archaeal OTUs, and 3,126 OTUs that could not be identified at the domain level. A total of 54,819 eukaryotic OTUs, comprising 5,353,354 high‐quality sequences, remained, with an average of 45,367 sequences/sample (minimum sequences/sample = 5,450; maximum sequences/sample = 154,747). These data were normalized to 5,400 reads/sample to account for uneven sampling depth. Either this normalized OTU table or a table normalized to relative abundance was used in subsequent analyses, with two exceptions:
- the differential abundance analysis, which used the full dataset;
- the alpha diversity analyses, in which samples were collapsed by sample type (water column, nodules, or sediments) and the dataset was randomly subsampled multiple times at different depths, up to a maximum depth of 100,100 sequences.

Joined, quality‐filtered 16S fastq files and quality‐filtered 5′ 18S fastq files have been deposited in NCBI's Sequence Read Archive under BioProject ID PRJNA281530 (SRA ID SRP057408).

The nodule prokaryotic core microbiome was computed using the script compute_core_microbiome.py within the QIIME package. The Wu et al. dataset was downloaded from NCBI, and OTUs were picked and taxonomy assigned as described for our dataset. OTUs within our core microbiome that matched Greengenes reference sequences were compared to the newly created Wu et al. OTU table to identify reference‐based OTUs present in both datasets. […]

Principal Coordinates Analysis (PCoA) was used to visualize patterns in microbial community structure by sample type within the Clarion-Clipperton Zone (CCZ). Analysis of similarities (ANOSIM; Chapman & Underwood, ) was performed on weighted UniFrac distances for both the 16S and 18S gene sequences, using the compare_categories.py script within the QIIME package. Briefly, unweighted UniFrac measures the distance between two communities as the fraction of a phylogenetic tree's total branch length that leads to taxa found in only one of the two communities. Weighted UniFrac additionally accounts for differences in the relative abundances of taxa in each community (Lozupone, Lladser, Knights, Stombaugh, & Knight, ).

The prokaryotic phylogenetic tree used for UniFrac was built with FastTree (Price, Dehal, & Arkin, ) from representative sequences aligned with PyNAST (Caporaso, Desantis, et al., ), as implemented in the pick_open_reference_otus.py workflow, and is available as Figure S8. Sequences that failed to align were omitted from both the tree and the OTU table. The eukaryotic phylogenetic tree was built similarly from representative sequences aligned with Infernal (Nawrocki, Kolbe, & Eddy, ) and is available as Figure S9.

A heatmap (Figure ) was created using the function heatmap.2 in the R package gplots (R Core Team, ; Warnes et al., ). A Bray–Curtis dissimilarity matrix was computed from an OTU table containing the 10 most abundant OTUs in each habitat. Average linkage hierarchical clustering was then performed on this matrix, and a dendrogram was created using the R package vegan (Oksanen et al., ). Colors were taken from the R package RColorBrewer (Neuwirth, ).
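ANOSIM compares ranked between-group and within-group distances. Its statistic is R = (mean between-group rank − mean within-group rank) / (M/2), where M = n(n − 1)/2 is the number of sample pairs. The sketch below computes R and a label-permutation p-value from any distance matrix.

The study applied ANOSIM to weighted UniFrac distances, which require a phylogenetic tree. To keep the example self-contained, this sketch substitutes Bray–Curtis dissimilarity, the measure the text uses for the heatmap dendrogram. All function names, sample names and counts are hypothetical, and this is not QIIME's own code.

```python
import itertools
import random

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dist, labels):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(itertools.combinations(range(len(labels)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

def anosim(dist, labels, permutations=999, seed=0):
    """Observed R and a one-sided p-value from randomly shuffled group labels."""
    rng = random.Random(seed)
    r_obs = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Toy counts of three OTUs in four hypothetical samples.
samples = {
    "nodule_1": [90, 10, 0],
    "nodule_2": [80, 15, 5],
    "sediment_1": [5, 20, 75],
    "sediment_2": [0, 30, 70],
}
names = list(samples)
dist = [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]
labels = ["nodule", "nodule", "sediment", "sediment"]
r, p = anosim(dist, labels, permutations=99)  # r == 1.0: groups fully separated
```

R approaches 1 when every between-group distance outranks every within-group distance, and falls near 0 when grouping explains nothing. With only four toy samples the permutation p-value cannot be small, so the example demonstrates the mechanics only.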
Average linkage hierarchical clustering was also performed on the full dataset, with similar results: sediments, nodules, and the water column each formed a separate group (Figure S10). To create Figure , differential abundance analysis of count data with shrinkage estimation (DESeq2; Love, Huber, & Anders, ) was performed on the full (non‐rarefied) dataset within the phyloseq package (McMurdie & Holmes, ). Differentially abundant OTUs with a base mean of ≥100 (prokaryotes) or ≥10 (eukaryotes) were reported and visualized using the R package ggplot2 (Wickham, ). […]
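The "base mean" threshold refers to DESeq2's baseMean: the mean, across all samples, of counts divided by per-sample size factors. The sketch below assumes DESeq2's default median-of-ratios estimator for those size factors. For each sample, it is the median, over features with no zero counts, of the ratio of that sample's count to the feature's geometric mean across samples.

The sketch covers only normalization and the threshold filter, not DESeq2's negative binomial model, Wald test or shrinkage. The function names and toy counts are hypothetical. The default estimator fails when every OTU has a zero in at least one sample, which is common in sparse OTU tables; the source does not say how that case was handled here.

```python
import math
import statistics

# Hypothetical layout: {otu_id: [count in sample 0, count in sample 1, ...]}.

def size_factors(counts):
    """Median-of-ratios size factors, computed in log space.
    OTUs with a zero in any sample are skipped (their geometric mean is zero)."""
    n_samples = len(next(iter(counts.values())))
    log_geo_means = {
        otu: sum(math.log(c) for c in row) / n_samples
        for otu, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("every OTU has a zero count in at least one sample")
    return [
        math.exp(statistics.median(
            math.log(counts[otu][j]) - lg for otu, lg in log_geo_means.items()
        ))
        for j in range(n_samples)
    ]

def base_means(counts, factors):
    """Mean of size-factor-normalized counts across all samples, per OTU."""
    return {
        otu: sum(c / s for c, s in zip(row, factors)) / len(factors)
        for otu, row in counts.items()
    }

def otus_above(counts, min_base_mean):
    """OTUs whose baseMean meets the threshold (>=100 or >=10 in the text)."""
    means = base_means(counts, size_factors(counts))
    return sorted(otu for otu, m in means.items() if m >= min_base_mean)

toy = {
    "otuA": [100, 200, 150],
    "otuB": [10, 20, 15],
    "otuC": [0, 5, 2],  # excluded from size-factor estimation because of the zero
}
print(otus_above(toy, 100))  # → ['otuA']
print(otus_above(toy, 10))   # → ['otuA', 'otuB']
```

In this toy table, otuA and otuB have proportional profiles, so both give every sample the same ratio. After normalization, otuA's counts are all about 144.2, which is its baseMean.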

Pipeline specifications

Software tools: ea-utils, QIIME, UCHIME, USEARCH, UCLUST, UniFrac, FastTree, PyNAST, gplots, DESeq2, phyloseq, ggplot2
Applications: Miscellaneous, Phylogenetics, 16S rRNA-seq analysis
Organisms: Bacteria
Chemicals: Cobalt, Copper, Manganese, Nickel