Computational protocol: Aquatic metagenomes implicate Thaumarchaeota in global cobalamin production

[…] Profile hidden Markov models (profile-HMMs) were used to search the metagenomes for genes of the cobalamin biosynthesis pathway. The models were retrieved from the TIGRFAMs and Pfam databases, with gene selection based on the KEGG pathway map (map00860). A total of 11 genes were selected as cobalamin pathway markers for further analysis because they are broadly distributed throughout the cobalamin biosynthesis pathway: cobA_cysG C-terminal domain (TIGR01469), cobI_cbiL (TIGR01467), cobJ_cbiH (TIGR01466), cobM_cbiF (TIGR01465), cbiT_cobL (TIGR02469), cbiE_cobL (TIGR02467), cbiC_cobH (PF02570.10), cbiA_cobB (TIGR00379), cbiB_cobD (TIGR00380), cbiP_cobQ (TIGR00313), and cobS (TIGR00317). The sequence reads of each metagenome were first processed into open reading frames using FragGeneScan, and the HMM for each of the 11 genes was then used to scan those open reading frames for homologs. The program hmmsearch within HMMER version 3.1b1 was used with default parameters and an E-value threshold of 1 × 10⁻⁶. The use of profile-HMMs based on protein family alignments avoided the bias that homology searches based on single-sequence queries would otherwise have introduced. Tools and taxonomy indices from the Krona package were used to assign taxonomy to recovered hits based on their top BLAST matches in the NCBI RefSeq database (version 60), with an E-value threshold of 1 × 10⁻⁶. […]
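The marker search described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' scripts: the function names and file paths are hypothetical, the only hmmsearch options used are `-E` and `--tblout`, and the parser assumes the standard whitespace-delimited `--tblout` layout, in which the first, third, fourth, and fifth columns hold the target name, query name, query accession, and full-sequence E-value.

```python
from collections import Counter

# Marker genes and model accessions exactly as listed in the protocol text.
MARKERS = {
    "cobA_cysG": "TIGR01469",
    "cobI_cbiL": "TIGR01467",
    "cobJ_cbiH": "TIGR01466",
    "cobM_cbiF": "TIGR01465",
    "cbiT_cobL": "TIGR02469",
    "cbiE_cobL": "TIGR02467",
    "cbiC_cobH": "PF02570.10",
    "cbiA_cobB": "TIGR00379",
    "cbiB_cobD": "TIGR00380",
    "cbiP_cobQ": "TIGR00313",
    "cobS": "TIGR00317",
}

E_VALUE_CUTOFF = 1e-6  # threshold stated in the protocol


def hmmsearch_command(hmm_path, orf_fasta, tblout_path, evalue=E_VALUE_CUTOFF):
    """Build an hmmsearch argument list; all file paths are hypothetical."""
    return ["hmmsearch", "-E", str(evalue), "--tblout", tblout_path,
            hmm_path, orf_fasta]


def parse_tblout(lines, evalue=E_VALUE_CUTOFF):
    """Return (target, query_name, query_accession, evalue) for passing hits.

    Assumes the standard --tblout column order; comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        full_evalue = float(fields[4])  # full-sequence E-value column
        if full_evalue <= evalue:
            hits.append((fields[0], fields[2], fields[3], full_evalue))
    return hits


def count_hits_per_marker(hits):
    """Count passing hits per query model name."""
    return Counter(query_name for _, query_name, _, _ in hits)
```

The command list could be passed to `subprocess.run`, and the resulting table read line by line into `parse_tblout`. Filtering again at parse time is redundant when `-E` is set, but it keeps the cutoff explicit if the table was produced with different reporting options.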

Pipeline specifications

Software tools: FragGeneScan, HMMER, Krona
Databases: Pfam, KEGG PATHWAY
Chemicals: Vitamin B12