A gene prediction method that combines sequencing error models and codon usage in a hidden Markov model to improve the prediction of protein-coding regions in short reads. The performance of FragGeneScan was comparable to that of Glimmer and MetaGene for complete genomes. For short reads, however, FragGeneScan consistently outperformed MetaGene (accuracy improved by ∼62% for reads of 400 bases with 1% sequencing errors, and by ∼18% for error-free reads of 100 bases). When applied to metagenomes, FragGeneScan recovered substantially more genes than MetaGene predicted (>90% of the genes identified by homology search), as well as many novel genes with no homologs in current protein sequence databases.

MGA / MetaGeneAnnotator

A web app and package for gene prediction that precisely predicts all kinds of prokaryotic genes from a single anonymous genomic sequence or a set of such sequences of varying lengths. MetaGeneAnnotator integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model built from the input sequences. MGA can predict genes precisely even on short genomic sequences. Both typical and atypical genes are detected with high sensitivity while maintaining high specificity.

A-GAME / A GAlaxy suite for functional MEtagenomics

Incorporates tools and workflows for the analysis of environmental DNA (eDNA) sequence data. A-GAME is a general bioinformatics workflow management system implemented within Galaxy. The software contains pre-designed workflows that use standard tools for data pre-processing, sequence assembly and annotation, as well as custom utilities dedicated to the analysis of functional metagenomics data. It allows the incorporation of the most widely used bioinformatics tools. A-GAME can be used to build and customize bioinformatics workflows.


TransGeneScan

A software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts, into a hidden Markov model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. TransGeneScan was tested on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) at predicting protein-coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark).

Protein generator

Enables the analysis of unannotated bacterial genomes or contigs. Protein generator uses the coordinates of the start and stop codons of predicted genes to create a personalized protein sequence database for tandem mass spectrometry (MS/MS) ion searches. Alternatively, it can generate a protein sequence database by slicing the genome into fixed-length overlapping fragments and then translating each fragment in all six reading frames. The software is part of the PG Nexus pipeline.
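
The fragment-based approach can be illustrated with a minimal Python sketch. It is not Protein generator's own code: the function names, the fragment length and the overlap are hypothetical, and the standard codon assignments are assumed (bacterial translation table 11 assigns the same amino acids and differs only in its start codons).

```python
from itertools import product

# Standard amino-acid assignments, listed in TCAG codon order (assumption:
# suitable for bacterial sequences; start-codon handling is ignored).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate a sequence in three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def sliced_fragments(genome, length, overlap):
    """Yield (start, fragment) pairs of fixed length with a fixed overlap."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than fragment length")
    for start in range(0, max(len(genome) - overlap, 1), step):
        yield start, genome[start:start + length]


def fragment_database(genome, length, overlap):
    """Yield (header, protein) entries for every fragment and frame."""
    for start, fragment in sliced_fragments(genome, length, overlap):
        for frame, protein in six_frame_translations(fragment).items():
            yield f"frag_{start}_{frame}", protein
```

Each `(header, protein)` pair could be written out as a FASTA record for a search engine. In practice the fragment length and overlap would be chosen so that a peptide spanning a fragment boundary is still contained whole in at least one fragment.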