Computational protocol: Plant DNA metabarcoding of lake sediments: How does it represent the contemporary vegetation

[…] Initial filtering steps were done using OBITools, following previously published criteria. We then used the ecotag program to assign the sequences to taxa by comparing them against a local taxonomic reference library containing 2445 sequences of 815 arctic and 835 boreal vascular plant taxa; the library also contained 455 bryophytes. We also made comparisons with a second reference library generated by running ecoPCR on the global EMBL database (release r117 from October 2013). Only sequences with a 100% match to a reference sequence were kept. We excluded sequences matching bryophytes, as we did not include them in the vegetation surveys. We used BLAST (Basic Local Alignment Search Tool) (http://www.ncbi.nlm.nih.gov/blast/) to check for potentially wrong assignments of sequences.
When filtering next-generation sequencing data, there is a trade-off between losing true positives (TP, sequences present in the samples and correctly identified) and retaining false positives (FP, sequences that originate from contamination, PCR or sequencing artefacts, or a wrong match to the database). We therefore assessed the number of TP and FP when applying different criteria in the last filtering step. We initially used two spatial levels of comparison with the DNA results: i) data from our vegetation surveys and ii) the regional flora (i.e., species in the counties of Nordland and Troms as listed by the Norwegian Biodiversity Information Centre, http://www.biodiversity.no/). For any lake, both datasets are likely incomplete, as inconspicuous species may be lacking in the regional records and our vegetation surveys did not include the entire catchment area. Nevertheless, the exercise is useful for evaluating how many FPs and TPs are lost by applying different filtering criteria.
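The assignment-level filtering described above (keeping only sequences with a 100% match and excluding bryophytes) can be illustrated with a minimal Python sketch. This is not the OBITools or ecotag interface: the `Assignment` record, its field names, and the example records are hypothetical and only assume that each sequence carries its best identity to the reference library and a flag marking bryophyte matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record for one assigned sequence (not an OBITools type)."""
    sequence_id: str
    taxon: str
    best_identity: float  # 1.0 means a 100% match to a reference sequence
    is_bryophyte: bool


def keep_assignment(a: Assignment) -> bool:
    # Keep only perfect matches that are not bryophytes.
    return a.best_identity >= 1.0 and not a.is_bryophyte


# Made-up example records, for illustration only.
assignments = [
    Assignment("seq1", "taxon_a", 1.0, False),   # perfect match, vascular plant
    Assignment("seq2", "taxon_b", 0.98, False),  # imperfect match
    Assignment("seq3", "taxon_c", 1.0, True),    # perfect match, bryophyte
]

kept = [a for a in assignments if keep_assignment(a)]
kept_ids = [a.sequence_id for a in kept]
```

In this toy input only `seq1` survives, because `seq2` is not a perfect match and `seq3` is a bryophyte.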
We defined true positives as sequences that matched a species recorded in the vegetation surveys at the same lake, being aware that this underestimates the true number, as the vegetation surveys likely missed species. We defined false positives as sequences matching species recorded in neither the vegetation surveys nor the regional flora. We tested the effect of different rules for sequence removal: 1) found as ≤1, ≤5 or ≤10 reads in a PCR repeat, 2) found in ≤1, ≤2 or ≤3 PCR repeats for a lake sample, 3) occurring in more than one of 72 negative control PCR replicates, 4) on average, a higher number of PCR repeats in negative controls than in samples, and 5) on average, a higher number of reads in negative controls than in samples. The filtering criteria that kept the highest overall number of true positives relative to the number of false positives lost were applied to all lakes. These were removing sequences with fewer than 10 reads, sequences found in fewer than 2 PCR repeats in lake samples, and sequences with, on average, a lower number of reads in lake samples than in negative controls. […]
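The final criteria and the TP/FP definitions above can be sketched in Python as follows. This is an illustrative sketch, not the authors' script. It assumes that the 10-read threshold applies per PCR repeat, and that the averages are plain means over all PCR repeats of a lake sample and over all negative control replicates. All function names, labels, and example numbers are hypothetical.

```python
from statistics import mean

MIN_READS = 10    # assumed to apply per PCR repeat
MIN_REPEATS = 2   # minimum number of PCR repeats with a detection


def passes_filters(sample_reads, control_reads,
                   min_reads=MIN_READS, min_repeats=MIN_REPEATS):
    """Return True if a sequence survives the three final criteria.

    sample_reads: read counts per PCR repeat of one lake sample.
    control_reads: read counts per negative control PCR replicate.
    """
    # Criterion 1: repeats with fewer than min_reads reads do not count.
    positive_repeats = [r for r in sample_reads if r >= min_reads]
    # Criterion 2: require detections in at least min_repeats repeats.
    if len(positive_repeats) < min_repeats:
        return False
    # Criterion 3: drop sequences that average fewer reads in the lake
    # sample than in the negative controls (assumed: simple means).
    mean_sample = mean(sample_reads) if sample_reads else 0.0
    mean_control = mean(control_reads) if control_reads else 0.0
    return mean_sample >= mean_control


def classify(taxon, survey_taxa, regional_taxa):
    """Label a detected taxon using the definitions in the text."""
    if taxon in survey_taxa:
        return "TP"             # recorded in the vegetation survey at the lake
    if taxon not in regional_taxa:
        return "FP"             # in neither the survey nor the regional flora
    return "regional_only"      # in the regional flora only: neither TP nor FP


# Made-up example data, for illustration only.
sample = {"taxon_a": [120, 45, 0, 30], "taxon_b": [12, 3, 0, 0], "taxon_c": [15, 11, 0, 0]}
controls = {"taxon_a": [0, 0], "taxon_b": [0, 0], "taxon_c": [40, 20]}
survey_taxa = {"taxon_a"}
regional_taxa = {"taxon_a", "taxon_b"}

kept = {t for t in sample if passes_filters(sample[t], controls[t])}
labels = {t: classify(t, survey_taxa, regional_taxa) for t in sample}
```

With this toy input, `taxon_b` is dropped for having only one repeat with at least 10 reads, and `taxon_c` is dropped because its mean read count is higher in the negative controls than in the sample. Counting the TP and FP labels before and after filtering gives the numbers of true positives kept and false positives lost that the text uses to compare criteria.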

Pipeline specifications

Software tools: OBITools, ecoPCR, BLASTN
Application: qPCR