Computational protocol: Gelatinous plankton is important in the diet of European eel (Anguilla anguilla) larvae in the Sargasso Sea

[…] After sequencing, 18S rRNA gene Illumina sequence reads were assembled, trimmed to 224 nucleotides (their median length before trimming), and de-multiplexed (quality score of Q30) using QIIME v1.9. Singletons were removed and operational taxonomic units (OTUs) were clustered at 99% similarity in USEARCH v8.1.1756 using the UPARSE-OTU algorithm with implicit chimera check. Taxonomy was assigned using the SILVA v. 119 database and BLASTN. The BLASTN criteria were a coverage score >98%, BLASTN homology >95%, and 100% identity over ≥100 bp. The taxonomy of the highest-ranking match was then assigned to the OTU. Where the highest match included several organisms or species with equal similarity over the same number of bp, taxonomy was assigned at the lowest common taxonomic denominator for the group. Remaining ambiguous or unidentified OTUs were aligned to our custom plankton database, first by aligning all sequences in CLUSTAL W and then by placement in a phylogenetic tree constructed in MEGA6. Actinopterygii (fish) OTUs were identified in the same manner. OTUs that occurred only once in the total dataset or that comprised <9 reads in total were excluded, as were A. anguilla sequences and any Actinopterygii OTUs with ≤2% dissimilarity (at most 4 bp difference) to A. anguilla.
16S rRNA gene sequence reads were split into samples by unique custom barcodes. All subsequent steps were performed in CLC Genomics Workbench 9.5.3 using the Microbial Genomics plugin. A maximum of 10,000 reads was used per sample. Reads were merged and trimmed to remove adaptors, low-quality sequence, and short reads. Samples with fewer than 1000 reads were not processed further, and all reads were trimmed to the same length. Merged reads were clustered into OTUs at a 97% identity level against the ARB-SILVA v119 database, removing chimeras in the process. Singletons were removed before further analysis. […]
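The tie-breaking rule for 18S taxonomy assignment (equally good best hits resolved to their lowest common taxonomic denominator) can be illustrated with a minimal Python sketch. This is not the authors' code. The function name `lowest_common_taxonomy`, the representation of a lineage as a list of ranks, and the two example lineages are all assumptions made for illustration.

```python
from typing import List, Sequence


def lowest_common_taxonomy(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the leading ranks shared by every lineage.

    Each lineage is ordered from the broadest rank to the most specific.
    The result stops at the first rank where the lineages disagree.
    """
    if not lineages:
        return []
    common: List[str] = []
    # zip(*lineages) walks the ranks in parallel and stops at the shortest lineage.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


# Hypothetical tied best hits for one OTU (simplified lineages, illustration only).
tied_hits = [
    ["Eukaryota", "Metazoa", "Cnidaria", "Hydrozoa", "Siphonophorae", "Nanomia bijuga"],
    ["Eukaryota", "Metazoa", "Cnidaria", "Hydrozoa", "Siphonophorae", "Agalma elegans"],
]

assigned = lowest_common_taxonomy(tied_hits)
print(" > ".join(assigned))
# prints "Eukaryota > Metazoa > Cnidaria > Hydrozoa > Siphonophorae"
```

In this sketch the two hypothetical hits differ only at species level, so the OTU would be labelled at the deepest rank they share rather than with either species name.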
Statistical analyses were carried out in R. To account for the negative binomial structure of the 18S and 16S rRNA amplicon data, OTU abundances were normalized using DESeq2 1.14.1, and community compositions were analyzed using principal component analysis (PCA). Compositional differences between eel gut contents and marine snow aggregates were tested using generalized linear models (GLMs) in mvabund 3.12. Furthermore, an ANOVA was applied to the GLMs to identify the OTUs that contributed significantly to the differences between gut contents and marine snow aggregates. All reported p-values were adjusted using Holm’s step-down procedure for multiple testing, which controls the family-wise error rate, as implemented in the mvabund package. […]
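To make the multiple-testing adjustment concrete, the following minimal Python sketch applies the classical Holm step-down procedure to a plain list of p-values. It is an illustration only. The analyses in the protocol were run in R, and the resampling-based adjustment inside mvabund differs in detail from this textbook version. The function name `holm_adjust` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjustment, returned in the original input order.

    The i-th smallest p-value (i = 0, 1, ...) is multiplied by (m - i),
    capped at 1, and forced to be at least as large as the adjusted
    value of every smaller p-value.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical per-OTU p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, holm_adjust(raw)):
    print(f"raw p = {p:.3f}, Holm-adjusted p = {q:.3f}")
```

Because the smallest p-value receives the largest multiplier, an OTU is reported as significant only if it survives a correction at least as strict as that applied to every OTU ranked below it.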

Pipeline specifications

Software tools: QIIME, USEARCH, UPARSE, BLASTN, Clustal W, MEGA, CLC Genomics Workbench, DESeq2
Applications: Phylogenetics, 16S rRNA-seq analysis
Organisms: Caenorhabditis elegans, Anguilla anguilla, Drosophila melanogaster