Computational protocol: Deep Sequencing of Subseafloor Eukaryotic rRNA Reveals Active Fungi across Marine Subsurface Provinces

Protocol publication

[…] cDNA amplicons were sequenced on a GS-FLX Titanium 454 sequencer at EnGenCore (University of South Carolina, Columbia, SC), which resulted in ∼37,000 reads. To reduce homopolymer errors inherent to 454 sequencing, the dataset was put through the denoise protocol as described in the QIIME software package using the denoise_wrapper.py command. After denoising, chimeric sequences were identified and removed using ChimeraSlayer with the blast_fragments method in QIIME. The data were subjected to quality score filtering using the split_libraries.py command and clustered at various levels of sequence identity (80%, 85%, 90%, 93%, 95%, 97%) in QIIME using the uclust method of all-to-all pair-wise comparisons via the pick_otus.py command. The QIIME taxonomy classification pipeline was not able to accurately classify the majority of eukaryotic OTUs. Thus, we used Jaguc, a program developed specifically for classification of eukaryotic rRNA sequence data, to classify our sequence reads. Using this approach, 90% of eukaryotic OTUs were classified to genus. OTU tables were created using the make_otu_table.py command in QIIME, and the Jaguc taxonomy for each OTU was appended to this table using a custom perl script developed by the authors for this purpose. This perl script is available from the authors upon request. […] Canonical Correspondence Analysis (CCA) was used to elucidate relationships between eukaryotic community structure and concentrations of dissolved oxygen (O2), nitrate (NO3−), dissolved inorganic carbon (DIC), total organic carbon (TOC), and sulfide. Multi-response Permutation Procedure (MRPP) was used to test for a statistically significant influence of sediment depth, DIC, sulfide, TOC, and oxygen on the observed OTU distributions.
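The step of appending a per-OTU taxonomy to the OTU table can be sketched as follows. This is not the authors' perl script, which is available only on request. It is a minimal Python illustration that assumes a classic tab-delimited QIIME OTU table with a "#OTU ID" header row and a hypothetical two-column, tab-delimited taxonomy file (OTU ID, taxonomy string); the real Jaguc output format may differ. The function name `append_taxonomy` and the "Unclassified" placeholder are likewise illustrative choices.

```python
def append_taxonomy(otu_table_lines, taxonomy_lines, missing="Unclassified"):
    """Return OTU table lines with a trailing taxonomy column.

    Assumes (hypothetically) that each taxonomy line is 'OTU_ID<TAB>taxonomy'.
    """
    # Build an OTU ID -> taxonomy lookup from the two-column mapping.
    taxonomy = {}
    for line in taxonomy_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            taxonomy[fields[0]] = fields[1]

    amended = []
    for line in otu_table_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if line.startswith("#"):
            # Only the column-header row gets the new column name;
            # other comment lines pass through unchanged.
            if fields[0] == "#OTU ID":
                fields.append("taxonomy")
        else:
            # Data row: look up the OTU ID in the first column.
            fields.append(taxonomy.get(fields[0], missing))
        amended.append("\t".join(fields))
    return amended
```

OTUs absent from the taxonomy file receive the placeholder label rather than being dropped, so the amended table keeps the same rows as the input.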
All ordination and multivariate statistical analyses were performed on the TRFLP and pyrosequenced datasets as a whole, as well as on the five major eukaryotic subgroups that dominated our 454 dataset: Metazoa, Viridiplantae, Diatoms, Alveolates, and Fungi. Analyses were performed on sequences affiliated with these groups clustered at 80, 85, 90, 93, and 97% sequence identity thresholds as well as the fungal TRFLP dataset. MRPP and CCA were implemented using the PC-ORD software package (MjM Software Design). Weighted UniFrac analysis was performed in QIIME. Prior to UniFrac and alpha-diversity comparisons, the number of sequences per sample was normalized to that of the sample with the fewest sequences by randomly selecting a subset of sequences from each sample using the multiple_rarefactions.py script in QIIME. […]
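The normalization described above (random subsampling of every sample down to the depth of the smallest sample) can be illustrated with a short Python sketch. It is not the QIIME implementation of multiple_rarefactions.py. The function names, the dictionary layout of the counts, and the fixed seed are assumptions made for illustration.

```python
import random


def rarefy(counts, depth, rng):
    """Draw `depth` sequences without replacement from one sample's OTU counts."""
    # Expand the counts into one entry per sequence, labelled by its OTU.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def rarefy_to_minimum(samples, seed=0):
    """Subsample every sample to the size of the smallest sample.

    `samples` maps sample name -> {OTU ID: sequence count}.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    depth = min(sum(counts.values()) for counts in samples.values())
    return {name: rarefy(counts, depth, rng) for name, counts in samples.items()}
```

Sampling without replacement means that the smallest sample is returned unchanged, while larger samples lose sequences in proportion to their excess depth.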

Pipeline specifications