DNA sequence annotation consists of several successive steps, including locating coding and non-coding sequences, gene prediction, identification of regulatory elements, and functional annotation. Genome annotation pipelines offer suites of tools that facilitate this complex analysis and make workflows reproducible.
A computational method for building a highly reliable set of gene annotations in the absence of experimental data. CEGMA uses profile hidden Markov models to ensure the reliability of the gene structures. This procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages.
A command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Prokka uses parallel processing to decrease running time on multicore computers. The most time-consuming steps are BLAST+ and hmmscan, which both support multiple CPUs natively. However, Prokka is more efficient if it runs multiple single CPU threads on subsets of the data, which it achieves using GNU parallel.
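The "many single-CPU jobs on subsets of the data" strategy described for Prokka can be illustrated with a minimal Python sketch. This is not Prokka's code, which delegates the work to GNU parallel: here the standard-library `ProcessPoolExecutor` stands in for GNU parallel, and `split_into_chunks`, `search_chunk` and `annotate_in_parallel` are hypothetical names. The per-chunk "search" is a placeholder that only reports sequence lengths, where a real pipeline would run a single-threaded BLAST+ or hmmscan job.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def search_chunk(chunk):
    """Hypothetical stand-in for a single-CPU search over one data subset."""
    return [(seq_id, len(seq)) for seq_id, seq in chunk]


def annotate_in_parallel(records, n_workers=4):
    """Run one single-CPU job per chunk and merge the results in input order."""
    chunks = split_into_chunks(records, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(search_chunk, chunks)
        return [hit for chunk_hits in results for hit in chunk_hits]


if __name__ == "__main__":
    # Made-up protein records, for illustration only.
    proteins = [("p1", "MKV"), ("p2", "MSTNPK"), ("p3", "MA"), ("p4", "MLLQ")]
    print(annotate_in_parallel(proteins, n_workers=2))
```

Each worker handles its subset independently, so throughput does not depend on how well the underlying search tool scales across threads within a single process.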
Provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of benchmarking universal single-copy orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria.
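As a rough illustration of how a completeness assessment of this kind is summarized, the sketch below turns counts of expected orthologs into percentages. It assumes the four categories commonly reported in BUSCO summaries (complete single-copy, complete duplicated, fragmented, missing) and formats them in the familiar one-line notation. The function name `busco_summary` and the counts in the usage example are hypothetical, and the code does not perform any ortholog search itself.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format ortholog counts as a one-line completeness summary.

    All four arguments are counts of expected single-copy orthologs:
    found complete once, found complete more than once, found only
    partially, and not found.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one ortholog must be assessed")

    def pct(count):
        return 100.0 * count / total

    complete = single + duplicated
    return (
        f"C:{pct(complete):.1f}%"
        f"[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
    )


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))
```

A high duplicated fraction can point to redundancy in an assembly, while a high missing fraction suggests that part of the expected gene content was not recovered.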
Assists in annotating complete or nearly complete bacterial and archaeal genomes. RAST is a fully automated application that provides high-quality annotations for these genomes across the whole phylogenetic tree. It includes a user interface that allows registered users to make manual changes to their genomes before retrieving them. It was designed to extend annotations to as many protein-encoding genes in as many genomes as possible.
An automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. By combining the best features of the pan-genome approach in highly abundant clades with well-described and well-tested ab initio methods, PGAP provides a flexible and extensible framework for prokaryotic annotation. The pipeline is designed to annotate both complete genomes and draft genomes comprising multiple contigs. PGAP is deeply integrated into NCBI infrastructure and processes. It uses GPipe, a modular software framework developed at NCBI, to execute all annotation tasks: fetching raw and curated data from public repositories (the Sequence and Assembly databases), sequence alignment and model-based gene prediction, and submission of annotated genomic data to public NCBI databases.
Provides a genome annotation tool. GenDB implements multiple user-defined annotation strategies and includes a wide range of metaphors for data navigation during the genome annotation process. This software can be integrated with other programs and data sources to build a platform for systems biology. It relies on a relational database backend and can be linked to other systems via an application programming interface (API).
A portable and easily configurable genome annotation pipeline. MAKER allows smaller eukaryotic and prokaryotic genome projects to independently annotate their genomes and to create genome databases. It identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions and automatically synthesizes these data into gene annotations with evidence-based quality values. MAKER's inputs are minimal and its outputs can be directly loaded into a Generic Model Organism Database (GMOD). They can also be viewed in the Apollo genome browser; this feature of MAKER provides an easy means to annotate, view and edit individual contigs and BACs without the overhead of a database. MAKER is available for download and can be tested online via the MAKER Web Annotation Service (MWAS).