[…] gun) were used to retrieve the genus annotation for the OTU validation set. For the most wanted OTUs, we also relied on the RDP taxonomic annotation from Fodor et al. (2012). Thresholds of 98% identity and 98% alignment coverage were used to define genus-level annotation in the different databases. For the RDP annotation, we considered only genera with a classification confidence ⩾80% and discarded any assignment that disagreed with the information from the other databases.

The metadata and sequences for the most wanted OTUs (priority group, max fraction body habitat, RDP assignment and so on) were downloaded from the HMP website at […]. The sequences and QIIME-processed abundance table for the HMP OTUs were downloaded from […].

The MGS metadata and sequences (genes, proteins, draft genomes) were downloaded from […]. Note that the MGS genomes were reconstructed from metagenomic data using a coabundance clustering approach similar to the one described here. They were further refined through sample-specific reassembly, and the resulting assemblies were validated for completeness and correctness. For more details see […].

We compared the MGS draft genomes to genomes from the NCBI database using the dnadiff tool from the MUMmer package with default parameters.

A functional annotation of the MGS proteins to the eggNOG groups (COG) was performed in two steps. First, the MGS proteins were aligned against the eggNOG 3.0 database with blastp (v.2.2.29+), using an e-value threshold of 1E−5. The first 20 hits for each protein were then aligned pairwise to the MGS protein using Clustal Omega. The sequence identity and coverage were extracted from each alignment.
Finally, the COG was assigned from the hit with the highest sequence identity and coverage.

The PICRUSt COG prediction was performed with picrust-1.0.0 according to the authors' recipe for metagenome prediction.

The reads of the 138 HMP samples were aligned to the MGS gene catalog 3.9M from the paper by […]. The reads were mapped using Bowtie, aligning the first 35 nt of each read, allowing up to three mismatches and using the best-hit option.

Raw count data were preprocessed to remove technical variability due to sequencing depth. We performed a downsampling procedure for each of the 13 […]
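
The genus-level filtering rules given earlier in this section (98% identity, 98% alignment coverage, RDP confidence ⩾80%, and rejection of RDP assignments that conflict with other databases) can be illustrated with a minimal Python sketch. The record type `DatabaseHit`, the function names, and the decision to keep an RDP genus when no other database provides an annotation are assumptions made for illustration; they are not taken from the pipeline actually used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

IDENTITY_MIN = 98.0        # percent identity threshold stated in the text
COVERAGE_MIN = 98.0        # percent alignment coverage threshold stated in the text
RDP_CONFIDENCE_MIN = 0.80  # RDP classification confidence threshold stated in the text


@dataclass
class DatabaseHit:
    """Best alignment of one OTU against one reference database (hypothetical record)."""
    genus: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the OTU sequence covered by the alignment


def genera_from_hits(hits: Iterable[DatabaseHit]) -> Set[str]:
    """Collect the genera whose hits pass both the identity and coverage thresholds."""
    return {
        h.genus
        for h in hits
        if h.identity >= IDENTITY_MIN and h.coverage >= COVERAGE_MIN
    }


def accept_rdp(rdp_genus: str, rdp_confidence: float,
               database_genera: Set[str]) -> Optional[str]:
    """Return the RDP genus if it is confident enough and not contradicted.

    Assumption: when no other database yields a genus, nothing contradicts
    the RDP assignment, so it is kept.
    """
    if rdp_confidence < RDP_CONFIDENCE_MIN:
        return None
    if database_genera and rdp_genus not in database_genera:
        return None
    return rdp_genus
```

For example, an OTU with hits to `Bacteroides` (99.1% identity, 98.5% coverage) and `Prevotella` (97.0% identity, 99.0% coverage) would retain only `Bacteroides`, and an RDP call of `Prevotella` for that OTU would be discarded as conflicting.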

Software tools: QIIME, MUMmer, BLASTP, Clustal Omega
Organisms: Homo sapiens