DNA sequence annotation consists of several successive steps, including locating coding and non-coding sequences, gene prediction, identification of regulatory elements, and functional annotation. Genome annotation pipelines offer a suite of tools to facilitate this complex analysis and to make workflows reproducible.
A command-line software tool that fully annotates a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Prokka uses parallel processing to decrease running time on multicore computers. The most time-consuming steps are BLAST+ and hmmscan, both of which support multiple CPUs natively. However, Prokka is more efficient when it runs multiple single-CPU threads on subsets of the data, which it achieves using GNU parallel.
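The chunking idea described above can be illustrated with a minimal Python sketch. This is not Prokka's code: Prokka delegates the work to GNU parallel, whereas this sketch swaps in Python's standard-library thread pool, and `split_into_chunks`, `search_chunk`, and `annotate_in_parallel` are hypothetical names. The worker is a stand-in that only measures sequence length; in a real pipeline each worker would presumably launch one single-CPU external search on its chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    items = list(items)
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when items < n_chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def search_chunk(chunk):
    """Hypothetical single-CPU job: a stand-in for one external search run.

    Here it only reports each sequence's length.
    """
    return [(name, len(seq)) for name, seq in chunk]


def annotate_in_parallel(records, n_jobs):
    """Run one job per chunk and merge the results in input order."""
    chunks = split_into_chunks(records, n_jobs)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = pool.map(search_chunk, chunks)
        return [hit for part in results for hit in part]
```

Calling `annotate_in_parallel(records, n_jobs=4)` on a list of `(name, sequence)` pairs returns one result per record in the original order. Threads are used here only because the assumed real worker would wait on an external process; a CPU-bound Python worker would need a process pool instead.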
Provides several tools for sequence database maintenance and analysis. ARB is a graphically oriented package whose software tools interact directly with one another and with a central database, all controlled through a common graphical interface. It includes software for visualization, sequence editing, profile determination, filtering, phylogenetic treeing, sequence alignment, probe design and evaluation, and data import and export.
Assists in annotating complete or nearly complete bacterial and archaeal genomes. RAST is a fully automated application that provides high-quality annotations for these genomes across the whole phylogenetic tree. It includes a user interface that allows registered users to make manual changes to their genomes before retrieving them. It was designed to extend annotations to as many protein-encoding genes in as many genomes as possible.
An automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. By combining the best features of the pan-genome approach in highly abundant clades with well-described and well-tested ab initio methods, PGAP presents a flexible and extensible framework for prokaryotic annotation needs. The pipeline is designed to annotate both complete genomes and draft genomes comprising multiple contigs. PGAP is deeply integrated into NCBI infrastructure and processes. It uses GPipe, a modular software framework developed at NCBI, to execute all annotation tasks: fetching raw and curated data from public repositories (the Sequence and Assembly databases), sequence alignment and model-based gene prediction, and submission of annotated genomic data to public NCBI databases.
A computational method for building a highly reliable set of gene annotations in the absence of experimental data. CEGMA uses profile hidden Markov models to ensure the reliability of the gene structures. This procedure makes it possible to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even one at the draft stage.
Provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of benchmarking universal single-copy orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria.
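The arithmetic behind such a completeness score can be sketched in a few lines of Python. This is a simplified illustration, not BUSCO's implementation: the function name `completeness_summary` and the lowercase status labels are assumptions made for this example, and it presumes that each expected ortholog has already been classified as found once, found more than once, found only partially, or not found.

```python
from collections import Counter

# Assumed status labels for this sketch (not BUSCO's own identifiers).
CATEGORIES = ("single", "duplicated", "fragmented", "missing")


def completeness_summary(statuses):
    """Return the percentage of expected orthologs in each category.

    `statuses` maps an ortholog identifier to one of CATEGORIES.
    """
    total = len(statuses)
    if total == 0:
        raise ValueError("no expected orthologs supplied")
    counts = Counter(statuses.values())
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown status labels: {sorted(unknown)}")
    # Counter returns 0 for categories that never occur.
    pct = {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}
    # An ortholog counts as complete whether it was found once or several times.
    pct["complete"] = pct["single"] + pct["duplicated"]
    return pct
```

A high "complete" percentage suggests that the assembly or gene set captures most of the expected gene content, while a high "duplicated" share may point to redundancy in the assembly.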
Provides a genome annotation tool. GenDB implements multiple user-defined annotation strategies and includes a wide range of metaphors for data navigation during the genome annotation process. This software can be integrated with other programs and data sources to build a platform for systems biology. It relies on a relational database backend and can be linked to other systems via an application programming interface (API).