Pipeline publication

[…] latform. All samples were paired-end sequenced with 100 bp read lengths at the Genomics Core Facility, European Molecular Biology Laboratory. Novel samples have been submitted to NCBI under accession number PRJEB17632.

All samples were processed with the same computational protocol. Reads were quality-filtered and screened against the human genome sequence to remove contamination, as previously described (Zeller et al). Species abundance was calculated using established MOCAT (Kultima et al) protocols for specI clusters (Mende et al). Throughout the manuscript, we used specI clusters at the species level, related via the NCBI taxonomy database, as a taxonomic reference. mOTU abundances were also determined using standard MOCAT procedures (Sunagawa et al), but were used exclusively to estimate species diversity.

For calling genomic variants, all metagenomic sequencing reads were additionally mapped to a reference set of 1,753 genomes (each representative of one specI cluster) (Mende et al), using MOCAT (Kultima et al) with default parameters. Specifically, reads were mapped at 97% identity, and reads mapping to multiple locations (multiple mappers) were discarded. Genome coverage for each specI cluster was computed using qaCompute, yielding estimates of both horizontal and vertical coverage per sample, per genome.

Population SNPs were called using metaSNV (Costea et al), which resulted in 19,221,237 positions over 1,753 genomes.

Subspecies structure was determined through the following steps. First, the set of samples considered for each species was restricted to a high-confidence discovery set (see below) to ensure accurate variant determination. Second, based on these variants, a distance was computed between all samples, and subspecies were determined on this basis. Finally, variants specific to each subspecies (genotyping positions) were computed and used to extend subspecies assignments to new samples or to ones that did not meet the criteria for inclusion in the discovery set.

For avoiding issues caused by coverage variati […]
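To make the two coverage measures mentioned above concrete, the following is a minimal Python sketch, not the qaCompute implementation. It assumes the common definitions: horizontal coverage as the fraction of reference positions covered by at least one read, and vertical coverage as the mean read depth over all reference positions. The function name `coverage_stats` and the per-position depth list are hypothetical, introduced only for illustration.

```python
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int]) -> Tuple[float, float]:
    """Return (horizontal, vertical) coverage for one genome in one sample.

    `depths` is a hypothetical input: the read depth at every position of
    the reference genome, including positions with zero depth.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    genome_length = len(depths)
    # Horizontal coverage (assumed definition): fraction of positions
    # covered by at least one read.
    covered = sum(1 for d in depths if d > 0)
    horizontal = covered / genome_length
    # Vertical coverage (assumed definition): mean depth over all positions.
    vertical = sum(depths) / genome_length
    return horizontal, vertical


# Toy example: a 10-position genome in a single sample.
toy_depths = [0, 0, 3, 5, 4, 0, 2, 2, 0, 4]
horizontal, vertical = coverage_stats(toy_depths)
print(f"horizontal={horizontal:.2f} vertical={vertical:.2f}")
```

Some tools instead average depth over covered positions only, so the definitions used by a given tool should be checked before comparing values across pipelines.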

Pipeline specifications

Software tools: mOTU, MOCAT, qaTools, metaSNV
Databases: NCBI Taxonomy Database