Computational protocol: Upstream Freshwater and Terrestrial Sources Are Differentially Reflected in the Bacterial Community Structure along a Small Arctic River and Its Estuary

Protocol publication

[…] The sequencing data was quality checked using FastQC (Patel and Jain) and read pairs were merged with the paired-end read merger PEAR (Zhang et al.). Only properly merged reads were used for downstream analysis. Merged reads were processed using QIIME version 1.8.0 (Caporaso et al.). Demultiplexing with split_libraries_fastq.py was performed with quality filtering at a Phred threshold of ≥ 20. Chimeric sequences were removed from the demultiplexed data with USEARCH UCHIME reference-based chimera removal, using the Greengenes database from May 2013 as reference (Edgar et al.). The chimera check removed 12.3% of sequences. Operational taxonomic units (OTUs) were subsequently picked at 97% identity using de novo OTU picking, which also includes taxonomy assignment using PyNAST alignment against the Greengenes core set of 16S rRNA sequences (Caporaso et al.). Sequences represented only once in the dataset were removed, which reduced the dataset by 13.7%. Shannon indices (Shannon), Chao1 richness (Chao) and rarefaction plots were computed using alpha_rarefaction.py. Chloroplast sequences were removed and samples were rarefied to the shallowest sample depth of 12,180 sequences per sample with R version 3.1.0 (R Development Core Team) and the R package Vegan (Oksanen et al.). BIOENV analyses were used to assess how well the community structure was explained by environmental variables using non-factorial metadata (Table; Clarke and Ainsworth). For the BIOENV analysis, the Vegan package was used to create distance matrices of environmental data (Euclidean distances) and community composition (Bray-Curtis distances), which were then compared through Spearman's rank correlation coefficients. DOC, TN, temperature, salinity, oxygen saturation, and turbidity were log transformed prior to analysis.
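The BIOENV comparison described above can be sketched in plain Python. This is only an illustrative sketch, not the Vegan implementation used in the study: the function names (`bray_curtis`, `spearman`, `bioenv`) and the toy data are hypothetical, the z-score standardization of environmental variables is an assumption, and any log transformation is assumed to have been applied by the caller beforehand. The idea is to compute Bray-Curtis distances between community profiles, Euclidean distances between environmental profiles for every subset of variables, and keep the subset whose distances have the highest Spearman rank correlation with the community distances.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    den = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / den if den else 0.0


def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise(rows, dist):
    """Flattened upper triangle of the pairwise distance matrix."""
    n = len(rows)
    return [dist(rows[i], rows[j]) for i in range(n) for j in range(i + 1, n)]


def standardize(column):
    """Z-scores (assumption: variables are scaled before distances)."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd if sd else 0.0 for x in column]


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def bioenv(counts, env):
    """Return (best rho, best variable subset) over all non-empty subsets."""
    community = pairwise(counts, bray_curtis)
    scaled = {name: standardize(col) for name, col in env.items()}
    names = sorted(scaled)
    best_rho, best_subset = float("-inf"), ()
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            rows = [[scaled[v][i] for v in subset] for i in range(len(counts))]
            rho = spearman(community, pairwise(rows, euclidean))
            if rho > best_rho:
                best_rho, best_subset = rho, subset
    return best_rho, best_subset


# Toy data (hypothetical): four samples, two OTUs, two environmental variables.
toy_counts = [[10, 0], [8, 2], [3, 7], [0, 10]]
toy_env = {"salinity": [0, 2, 7, 10], "turbidity": [5, 1, 4, 2]}
print(bioenv(toy_counts, toy_env))
```

In the toy data the community turnover follows the salinity gradient exactly, so salinity alone is selected. The exhaustive search over subsets grows exponentially with the number of variables, which is acceptable for a handful of variables such as those listed above.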
Depth was not included in BIOENV analyses restricted to river samples, as depth was constant, and particle data was not included in any analysis involving river samples, as the data was not available. The labdsv package in R was used for non-metric multidimensional scaling (NMDS) and indicator species analysis. Indicator species are here denoted indicator OTUs and are defined as OTUs having a higher abundance at one site compared to other sites, with indicator values d ≥ 0.3 at a significance level of p ≤ 0.05. Indicator values are a product of relative abundance of an OTU in samples from one site (between 0 and 1) and the relative average abundance of that OTU across all sites (Dufrene and Legendre). The indicator species and indicator OTU concepts used in this study are not equivalent to the Indicator Species concept, which refers to species that are markers for certain environmental variables in an ecosystem. NMDS analyses were conducted using Bray-Curtis distance matrices. NMDS stress values are included in Figure 3. […]
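The indicator value calculation can be illustrated with a short Python sketch. It is not the labdsv code used in the study. It assumes the usual reading of the Dufrene and Legendre index, in which the two factors are the mean abundance of an OTU at a site divided by the sum of its mean abundances over all sites, and the fraction of samples at that site in which the OTU occurs. The function names, the permutation scheme, the number of permutations, and the toy data are all hypothetical; only the thresholds d ≥ 0.3 and p ≤ 0.05 come from the text.

```python
import random


def indicator_values(abundances, sites):
    """Indicator value of one OTU for each site.

    abundances: abundance of the OTU in each sample
    sites: site label of each sample (same order)
    """
    groups = {}
    for value, site in zip(abundances, sites):
        groups.setdefault(site, []).append(value)
    mean_abundance = {s: sum(v) / len(v) for s, v in groups.items()}
    frequency = {s: sum(1 for x in v if x > 0) / len(v) for s, v in groups.items()}
    total = sum(mean_abundance.values())
    return {
        s: (mean_abundance[s] / total if total else 0.0) * frequency[s]
        for s in groups
    }


def permutation_p_value(abundances, sites, site, n_perm=999, seed=0):
    """Share of label permutations with an indicator value at least as high
    as the observed one (assumed scheme, not necessarily that of labdsv)."""
    rng = random.Random(seed)
    observed = indicator_values(abundances, sites)[site]
    labels = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if indicator_values(abundances, labels)[site] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_indicator(abundances, sites, site, d_min=0.3, alpha=0.05):
    """Apply the thresholds stated in the text: d >= 0.3 and p <= 0.05."""
    d = indicator_values(abundances, sites)[site]
    return d >= d_min and permutation_p_value(abundances, sites, site) <= alpha


# Toy data (hypothetical): one OTU counted in three river and three estuary samples.
toy_abundances = [10, 12, 8, 0, 0, 1]
toy_sites = ["river", "river", "river", "estuary", "estuary", "estuary"]
print(indicator_values(toy_abundances, toy_sites))
```

With only three samples per site, very few distinct label permutations exist, so the permutation p-value cannot become small; real analyses need more replicates per site for the significance criterion to be meaningful.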

Pipeline specifications

Software tools: FastQC, PEAR, QIIME, USEARCH, UCHIME, PyNAST
Databases: Greengenes
Application: 16S rRNA-seq analysis
Diseases: Xerostomia