Protocol publication

[…] Predictions from both methods are concatenated, and in case of overlapping elements, the shorter one is removed. Identification of tRNAs is performed using tRNAscan-SE 1.23. Ribosomal RNA genes (5S, 16S and 23S) are predicted using hmmsearch against the custom models generated for each type of rRNA in bacteria and archaea. With the exception of tRNA and rRNA, all models from Rfam are used to search the genome sequence. Sequences are first compared with a database containing all the non-coding RNA genes in the Rfam database using BLAST; sequences that have hits to genes belonging to an Rfam model are then searched using the program INFERNAL.

Signal peptides are computed using SignalP, whereas transmembrane helices are computed using TMHMM. Protein-coding genes are predicted using Prodigal; models overlapping with CRISPRs and certain types of RNAs (e.g. rRNAs) are removed.

After a new genome is processed, protein-coding genes are compared with protein families and the proteome of selected publicly available ‘core’ genomes, with product names assigned based on the results of these comparisons. First, protein sequences are compared with COG using RPS-BLAST, with Pfam-A using HMMER 3.0b2 executed inside Sanger’s wrapper script, and with TIGRfam databases using HMMER 3.0; they are then associated with KEGG Orthology (KO) terms using USEARCH. Genomes in IMG are associated with KEGG pathways using the assignment of KO terms to protein-coding genes, while their association with MetaCyc pathways is based on correlating enzyme EC numbers in MetaCyc reactions with EC numbers associated with protein-coding genes via KO terms.

Genes are further characterized using an IMG native collection of generic (protein cluster-independent) functional roles called IMG terms that are defined by their association with generic (organism-independent) functional hierarchies, called IMG pathways.
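The MetaCyc association described above (EC numbers reached from genes via KO terms, matched against EC numbers in pathway reactions) can be illustrated with a minimal sketch. This is not the IMG implementation: the function name, the three mapping tables and all identifiers in the example are hypothetical placeholders, and the rule that a single shared EC number is enough to associate a pathway is an assumption made for illustration.

```python
def pathways_for_genome(gene_to_kos, ko_to_ecs, pathway_to_ecs):
    """Return {pathway: shared EC numbers} for one genome.

    gene_to_kos    : gene id -> KO terms assigned to that gene
    ko_to_ecs      : KO term -> EC numbers linked to that KO term
    pathway_to_ecs : pathway id -> EC numbers of the pathway's reactions
    All three tables are hypothetical inputs for this sketch.
    """
    # Step 1: collect every EC number reachable from a gene via its KO terms.
    genome_ecs = set()
    for kos in gene_to_kos.values():
        for ko in kos:
            genome_ecs.update(ko_to_ecs.get(ko, ()))

    # Step 2: keep pathways sharing at least one EC number with the genome
    # (assumed threshold; the source does not state one).
    hits = {}
    for pathway, ecs in pathway_to_ecs.items():
        shared = genome_ecs & set(ecs)
        if shared:
            hits[pathway] = shared
    return hits


# Toy placeholder identifiers, not real KO, EC or MetaCyc entries.
toy_gene_to_kos = {"gene_1": ["KO_A"], "gene_2": ["KO_B"], "gene_3": []}
toy_ko_to_ecs = {"KO_A": ["EC_1"], "KO_B": ["EC_2", "EC_3"]}
toy_pathway_to_ecs = {"PWY_X": ["EC_1", "EC_9"], "PWY_Y": ["EC_7"]}

toy_hits = pathways_for_genome(toy_gene_to_kos, toy_ko_to_ecs, toy_pathway_to_ecs)
print(toy_hits)
```

Keeping the shared EC numbers in the result, rather than a plain list of pathway names, makes it easy to see which enzymes support each association.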
IMG terms and pathways are specified by domain experts at DOE-JGI as part of the process of annotating specific genomes of interest, and are s […]
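The overlap rule stated earlier in this passage (predictions from two methods are concatenated, and the shorter of two overlapping elements is removed) can be sketched as follows. This is only one plausible reading: the `Element` type, the 1-based inclusive coordinates, the longest-first greedy order and the tie-breaking between equal-length elements are assumptions, not details taken from the pipeline.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """A predicted element; coordinates assumed 1-based and inclusive."""
    start: int
    end: int
    source: str  # hypothetical label for the method that produced it

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlaps(a: Element, b: Element) -> bool:
    """True if the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def merge_predictions(first: List[Element], second: List[Element]) -> List[Element]:
    """Concatenate two prediction sets and drop the shorter of any overlapping pair."""
    # Visit candidates longest first, so an element is discarded only when it
    # overlaps a kept element of equal or greater length. The sort is stable,
    # so on ties the element from `first` wins (an assumption).
    candidates = sorted(list(first) + list(second),
                        key=lambda e: e.length, reverse=True)
    kept: List[Element] = []
    for cand in candidates:
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda e: e.start)


# Toy coordinates for illustration only.
method_a = [Element(100, 500, "A"), Element(900, 1000, "A")]
method_b = [Element(450, 600, "B"), Element(2000, 2300, "B")]
merged = merge_predictions(method_a, method_b)
print(merged)
```

Here the element at 450-600 overlaps the longer one at 100-500 and is dropped, while the non-overlapping elements from both methods are kept.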

Pipeline specifications

Software tools: SignalP, TMHMM, Prodigal, BLASTN, HMMER, USEARCH