Computational protocol: Sorting things out: Assessing effects of unequal specimen biomass on DNA metabarcoding

Protocol publication

[…] All 10 samples (S, M, L, Un, and So for sampling sites P8 and P10) were amplified with the four freshwater macroinvertebrate fusion primer sets BF/BR (Elbrecht & Leese, ). The four primer combinations target a 217‐ to 421‐bp fragment of the cytochrome c oxidase subunit I (COI) gene. Figure gives an overview of sample tagging using fusion primers with inline barcodes. Each PCR consisted of 1× PCR buffer (including 2.5 mmol/L Mg2+), 0.2 mmol/L dNTPs (Thermo Fisher Scientific), 0.5 μmol/L of each primer (Biomers, Ulm, Germany), 0.025 U/μl HotMaster Taq (5Prime, Gaithersburg, MD, USA), 0.5 mg/μl molecular grade BSA (NEB, MA, USA), and 12.5 ng DNA, filled up with HPLC H2O to a total volume of 250 μl. Each 250 μl reaction mix was divided into five wells before the PCR. PCRs were run in a Biometra TAdvanced thermocycler using the following program: 94°C for 3 min; 40 cycles of 94°C for 30 s, 50°C for 30 s, and 65°C for 2 min; and a final extension at 65°C for 5 min. The large reaction volume and BSA were necessary because of PCR inhibitors present in the samples. PCR products were purified and size selected (left-side selection, removing short fragments) using SPRIselect at a 0.8× ratio (Beckman Coulter, CA, USA) and quantified with a Qubit fluorometer (HS Kit, Thermo Fisher Scientific). Samples were pooled in equimolar amounts, and the final library was purified with the MinElute Reaction Cleanup Kit (Qiagen, NL). This was a precaution, because the BSA used in the PCR caused beads to adhere to the tube walls during the SPRIselect PCR clean‐up. Paired‐end sequencing was performed on one lane of an Illumina HiSeq 2500 system with a rapid run 250‐bp PE v2 sequencing kit and a 5% PhiX spike‐in. However, the sequences contained ambiguous bases at two positions because of air bubbles in the flow cell (SRR3399055).
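The reaction recipe above is a standard C1·V1 = C2·V2 dilution problem. Below is a minimal Python sketch of the pipetting volumes for one 250 μl reaction. The stock concentrations (10× buffer, 10 mmol/L dNTPs, 10 μmol/L primers, 5 U/μl Taq) and the 5 ng/μl template concentration are assumptions for illustration, not values given in the protocol. BSA is left out to keep the sketch short.

```python
# Sketch: pipetting volumes for one 250 ul PCR via C1 * V1 = C2 * V2.
# ASSUMPTION: all stock concentrations below are illustrative, not from the protocol.

TOTAL_UL = 250.0  # total reaction volume (from the protocol)
WELLS = 5         # the mix is split into five wells before cycling

# component -> (stock concentration, final concentration); units cancel per row
COMPONENTS = {
    "PCR buffer (x)": (10.0, 1.0),
    "dNTPs (mmol/L)": (10.0, 0.2),
    "forward primer (umol/L)": (10.0, 0.5),
    "reverse primer (umol/L)": (10.0, 0.5),
    "HotMaster Taq (U/ul)": (5.0, 0.025),
}


def mix_volumes(components, total_ul, dna_ng, dna_stock_ng_per_ul):
    """Return volumes in ul for each component, the template, and top-up water."""
    volumes = {
        name: final * total_ul / stock
        for name, (stock, final) in components.items()
    }
    # the template is specified as an absolute amount (ng), not a concentration
    volumes["DNA template"] = dna_ng / dna_stock_ng_per_ul
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    volumes["HPLC water"] = water
    return volumes


if __name__ == "__main__":
    # ASSUMPTION: template DNA extract at 5 ng/ul
    vols = mix_volumes(COMPONENTS, TOTAL_UL, dna_ng=12.5, dna_stock_ng_per_ul=5.0)
    for name, ul in vols.items():
        print(f"{name:26s} {ul:8.2f} ul")
    print(f"per well: {TOTAL_UL / WELLS:.1f} ul")
```

With these assumed stocks, the water top-up comes to 191.25 μl, and an even split gives 50 μl per well.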
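Pooling to equal molarity means converting each Qubit mass concentration into a molar concentration using the fragment length. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The sample labels, concentrations, 500-bp fragment length, and 500 fmol target are hypothetical values chosen for illustration.

```python
# Sketch: equimolar pooling of PCR products from Qubit readings.
# ASSUMPTIONS: 660 g/mol per bp (dsDNA approximation); all sample values hypothetical.

AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA


def ng_per_ul_to_nmol_per_l(conc_ng_per_ul, length_bp):
    """Convert a mass concentration (ng/ul) to molarity (nmol/L)."""
    # ng/ul equals mg/L; dividing by g/mol gives mmol/L, and * 1e6 converts to nmol/L
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def equimolar_volumes(samples, fmol_per_sample):
    """Volume (ul) of each product that contributes fmol_per_sample femtomoles.

    samples maps name -> (Qubit concentration in ng/ul, fragment length in bp).
    1 nmol/L equals 1 fmol/ul, so volume = fmol / (nmol/L).
    """
    return {
        name: fmol_per_sample / ng_per_ul_to_nmol_per_l(conc, length)
        for name, (conc, length) in samples.items()
    }


if __name__ == "__main__":
    # HYPOTHETICAL readings; real library lengths include primer tails and tags
    products = {"P8_S": (33.0, 500), "P8_M": (16.5, 500), "P10_L": (8.25, 500)}
    for name, ul in equimolar_volumes(products, fmol_per_sample=500.0).items():
        print(f"{name}: {ul:.2f} ul")
```

Here, a product at half the concentration of another (same length) needs twice the volume: 5, 10, and 20 μl for the three hypothetical readings.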
Because of the ambiguous bases, the sequencing run was repeated, this time loading the same library onto two lanes at slightly different cluster densities, again with a 5% PhiX spike‐in.
We used the UPARSE pipeline in combination with custom R scripts (Dryad) for data processing (Edgar, ; Fig. ). Reads from both lanes were demultiplexed with an R script, and paired‐end reads were merged using Usearch v8.1.1861 -fastq_mergepairs with -fastq_maxdiffs and -fastq_maxdiffpct 99 (Edgar & Flyvbjerg, ). Primers were removed with Cutadapt version 1.9 using default settings (Martin, ). Sequences were trimmed to the same 217‐bp region amplified by the BF1 + BR1 primer set, and reverse-complemented where necessary, using fastx_truncate and fastx_revcomp. Only sequences 207–227 bp in length were used in further analysis (filtered with Cutadapt). Low-quality sequences were then removed from all samples using fastq_filter with maxee = 1. Sequences from all samples were then pooled, dereplicated (minuniquesize = 3), and clustered into operational taxonomic units (OTUs) at 97% identity using cluster_otus, which includes chimera removal (Edgar, ). The 97% threshold was used to reduce the effect of sequencing errors, which could otherwise generate additional “false” OTUs.
The preprocessed reads (Fig. , step B) of all samples were dereplicated again using derep_fulllength, this time including singletons. The sequences of each sample were matched against the OTUs at a minimum identity of 97% using usearch_global. Because the same library was loaded on both lanes, the hit tables from the two HiSeq lanes represent only sequencing replicates and were therefore combined. Only OTUs with a read abundance above 0.01% in at least one sample were considered in downstream analyses. Within each sample, OTUs with an abundance of 0.01% or less were set to 0% to reduce the number of false-positive OTUs. Taxonomy was assigned to the remaining OTUs using an R script that searched the BOLD and NCBI databases independently.
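The length window and the maxee filter can be illustrated in a few lines of Python. A read's expected number of errors is the sum of its per-base error probabilities, 10^(-Q/10). This is the quantity that USEARCH's fastq_filter compares against maxee. The sketch assumes Phred+33 quality encoding, and its function names are hypothetical. It only illustrates the logic and does not replace the Cutadapt and USEARCH commands used in the protocol.

```python
# Sketch: length window plus expected-error (maxee) filter for merged reads.
# ASSUMPTION: FASTQ qualities use Phred+33 encoding; function names are hypothetical.


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities 10 ** (-Q / 10) for a quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filters(seq, quality, min_len=207, max_len=227, max_ee=1.0):
    """Keep reads of 207-227 bp whose expected errors do not exceed max_ee."""
    if not (min_len <= len(seq) <= max_len):
        return False
    # fastq_filter with maxee discards reads with more than max_ee expected errors
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    good = ("A" * 217, "I" * 217)              # Q40 everywhere: EE about 0.02
    noisy = ("A" * 217, "#" * 2 + "I" * 215)   # two Q2 bases: EE about 1.28
    short = ("A" * 200, "I" * 200)             # outside the length window
    for label, (s, q) in [("good", good), ("noisy", noisy), ("short", short)]:
        print(label, passes_filters(s, q))
```

Even two very low-quality bases push a read over maxee = 1, while a long high-quality read accumulates only a small fraction of one expected error.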
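Dereplication with minuniquesize = 3 collapses identical sequences and discards any unique sequence seen fewer than three times. The second dereplication, which keeps singletons, corresponds to a minimum of 1. Below is a minimal pure-Python sketch. The function name is hypothetical, and unlike USEARCH, the sketch does not write size annotations into FASTA headers.

```python
# Sketch: dereplication with a minimum abundance for unique sequences.
# Hypothetical helper; USEARCH additionally writes ";size=N;" annotations, omitted here.
from collections import Counter


def dereplicate(sequences, min_unique_size=3):
    """Collapse identical sequences; drop uniques seen fewer than min_unique_size times.

    Returns (sequence, count) pairs sorted by decreasing count.
    """
    counts = Counter(sequences)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_unique_size]


if __name__ == "__main__":
    reads = ["ACGT"] * 5 + ["ACGA"] * 3 + ["TCGT"] * 2 + ["GGGG"]
    print(dereplicate(reads))                     # first pass: minuniquesize = 3
    print(dereplicate(reads, min_unique_size=1))  # second pass: singletons kept
```

Discarding rare uniques before clustering keeps sporadic sequencing errors from seeding OTUs. Keeping singletons in the second pass ensures that every read can still be mapped back to an OTU.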
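The lane merging and the 0.01% abundance rule can be sketched as operations on nested dictionaries of read counts (sample, then OTU, then reads). The data layout, function names, and example counts are assumptions for illustration. The protocol itself performed this step with custom R scripts.

```python
# Sketch: merge per-lane hit tables and apply the 0.01 % abundance rule.
# ASSUMPTION: tables are nested dicts {sample: {otu: read_count}}; names are hypothetical.
from collections import defaultdict

THRESHOLD = 0.0001  # 0.01 % expressed as a fraction


def combine_lanes(*lanes):
    """Sum read counts per sample and OTU across sequencing replicates (lanes)."""
    combined = defaultdict(lambda: defaultdict(int))
    for lane in lanes:
        for sample, otus in lane.items():
            for otu, reads in otus.items():
                combined[sample][otu] += reads
    return {sample: dict(otus) for sample, otus in combined.items()}


def relative_abundance(table):
    """Convert read counts to per-sample fractions."""
    rel = {}
    for sample, otus in table.items():
        total = sum(otus.values()) or 1  # guard against samples with zero reads
        rel[sample] = {otu: reads / total for otu, reads in otus.items()}
    return rel


def apply_threshold(rel, threshold=THRESHOLD):
    """Keep OTUs above threshold in >= 1 sample; zero values <= threshold per sample."""
    kept = {otu for otus in rel.values() for otu, f in otus.items() if f > threshold}
    return {
        sample: {otu: (f if f > threshold else 0.0) for otu, f in otus.items() if otu in kept}
        for sample, otus in rel.items()
    }


if __name__ == "__main__":
    lane1 = {"S1": {"OTU1": 9000, "OTU2": 500}, "S2": {"OTU1": 100000, "OTU2": 5}}
    lane2 = {"S1": {"OTU1": 499, "OTU3": 1}}
    print(apply_threshold(relative_abundance(combine_lanes(lane1, lane2))))
```

In this toy example, OTU3 never exceeds 0.01% and is dropped entirely. OTU2 is retained because it passes the threshold in S1, but it is set to 0 in S2, where it falls below 0.01%.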
Conflicting taxonomic assignments were resolved case by case, falling back to a coarser taxonomic level when the correct assignment was not evident. Only OTUs reliably identified as freshwater macroinvertebrates were included in the main analysis. […]
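The protocol resolved conflicting assignments manually, case by case. The sketch below is a rough automated analogue of "falling back to a coarser taxonomic level": it keeps ranks only as long as the BOLD and NCBI lineages agree. This deepest-agreeing-rank rule is a simplification introduced here for illustration, not the authors' procedure, and the example lineages are hypothetical.

```python
# Sketch: fall back to the deepest rank on which two databases agree.
# SIMPLIFICATION: the protocol resolved conflicts manually; this rule is illustrative only.

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def consensus_lineage(bold, ncbi, ranks=RANKS):
    """Return ranks from the top down while both lineages give the same name."""
    consensus = {}
    for rank in ranks:
        a, b = bold.get(rank), ncbi.get(rank)
        if a is None or b is None or a != b:
            break  # first disagreement or missing rank: stop at the coarser level
        consensus[rank] = a
    return consensus


if __name__ == "__main__":
    # HYPOTHETICAL hits for one OTU that agree down to genus only
    base = {"phylum": "Arthropoda", "class": "Insecta", "order": "Ephemeroptera",
            "family": "Baetidae", "genus": "Baetis"}
    bold_hit = {**base, "species": "Baetis rhodani"}
    ncbi_hit = {**base, "species": "Baetis vernus"}
    lineage = consensus_lineage(bold_hit, ncbi_hit)
    print(lineage, "-> assigned at", list(lineage)[-1] if lineage else "no rank")
```

The rule is deliberately conservative: a missing rank in either database is treated like a disagreement. Manual curation, as in the protocol, can still recover assignments that this rule would discard.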

Pipeline specifications

Software tools: UPARSE, USEARCH, cutadapt
Application: COI amplicon (DNA metabarcoding) analysis