

Orione

Allows integrative analysis of next-generation sequencing (NGS) data in microbiology. Orione supports the whole life cycle of microbiology research data, from production and annotation to publication and sharing. It can be used for a variety of microbiological projects, including bacterial resequencing, de novo assembly and microbiome studies. The tool is implemented on the Galaxy web platform.

RAST / Rapid Annotation using Subsystem Technology

Assists in annotating complete or nearly complete bacterial and archaeal genomes. RAST is a fully automated service that provides high-quality genome annotations for these genomes across the whole phylogenetic tree. It includes a user interface that allows registered users to make manual changes to their genomes before retrieving them. It was designed to extend annotations to as many protein-encoding genes in as many genomes as possible.

MicroScope

Supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. MicroScope allows users to analyze microbial (meta)genomes together with the results of any post-genomic experiments (e.g. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces for analyzing genomes and for expert curation of gene functions in a comparative context. MicroScope contains data for more than 6000 microbial genomes, and 14% of the users behind its 2700 personal accounts perform expert annotation at least weekly, helping to improve the quality of microbial genome annotations.

PAGIT / Post-Assembly Genome-Improvement Toolkit

Provides a toolkit for improving the quality of genome assemblies produced by assembly software. PAGIT combines four tools: (i) ABACAS, which orders and orients contigs and estimates the sizes of the gaps between them; (ii) IMAGE, which uses paired-end reads to extend contigs and close gaps within scaffolds; (iii) ICORN, which identifies and corrects small errors in consensus sequences; and (iv) RATT, which transfers annotation. The software was mainly developed to analyze parasite genomes of up to about 300 Mb.
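
The ordering and gap-estimation step attributed to ABACAS can be illustrated with a minimal sketch. It is not ABACAS's code: it assumes each contig has already been aligned to the reference and summarized as a hypothetical `Placement` (1-based, inclusive reference coordinates plus strand), then sorts contigs along the reference and reports the reference distance between neighbours as the gap estimate.

```python
from typing import List, NamedTuple, Tuple


class Placement(NamedTuple):
    """Hypothetical placement of a contig on the reference (1-based, inclusive)."""
    name: str
    ref_start: int
    ref_end: int
    strand: str  # "+" aligns forward, "-" must be reverse-complemented


def order_and_gap(placements: List[Placement]) -> Tuple[List[str], List[int]]:
    """Order contigs along the reference and estimate gaps between neighbours.

    A negative gap means that neighbouring placements overlap on the reference.
    """
    ordered = sorted(placements, key=lambda p: p.ref_start)
    # Prefix reverse-strand contigs with "-" to flag that they need re-orienting.
    layout = [("-" if p.strand == "-" else "") + p.name for p in ordered]
    gaps = [nxt.ref_start - prev.ref_end - 1 for prev, nxt in zip(ordered, ordered[1:])]
    return layout, gaps
```

For contigs placed at 1-1200 (+), 1501-3000 (-) and 3001-4000 (+), the sketch returns the layout `['c1', '-c2', 'c3']` with estimated gaps of 300 and 0 bp.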

RATT / Rapid Annotation Transfer Tool

Helps in genome annotation. RATT transfers annotation entries from a reference sequence to similar target sequences. It can be applied to successive versions of a genome assembly or to genomes of closely related species or strains. The software can also detect differences between the two sequences and generate files for viewing the transferred genome features in Artemis. RATT is part of the PAGIT toolkit.
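
The core idea of annotation transfer, mapping feature coordinates from the reference onto the new sequence through an alignment, can be sketched in a few lines. This is not RATT's implementation: it assumes hypothetical ungapped, same-strand alignment blocks (`Block`) with 0-based coordinates and features given as half-open `[start, end)` intervals, and it simply declines to transfer a feature whose ends fall outside the aligned regions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """Hypothetical ungapped, same-strand alignment block (0-based starts)."""
    ref_start: int
    query_start: int
    length: int


def map_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based reference position to the query, or None if unaligned."""
    for b in blocks:
        if b.ref_start <= pos < b.ref_start + b.length:
            return b.query_start + (pos - b.ref_start)
    return None


def transfer_feature(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Transfer a feature [start, end) if both its first and last bases map.

    Returns the new half-open interval, or None when either end falls outside
    the alignment (such features would need manual review).
    """
    first = map_position(start, blocks)
    last = map_position(end - 1, blocks)
    if first is None or last is None or last < first:
        return None
    return first, last + 1
```

With `blocks = [Block(0, 0, 100), Block(150, 120, 200)]`, a feature at `[200, 260)` maps to `(170, 230)`, whereas one at `[110, 140)` lies in the unaligned region and is not transferred.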

GAMOLA / Global Annotation of Multiplexed On-site bLasted DNA-sequences

Allows microbiologists to work with and curate draft and completed genomes. GAMOLA offers a completely local genome annotation solution that tracks open reading frame (ORF) designations and BLAST results across new sequence versions. It also designs oligonucleotides for polymerase chain reaction (PCR) products to be spotted on whole-genome microarrays, with an optional BLASTN analysis to detect potential mispriming.

FARAO / Flexible All-Round Annotation Organizer

A set of software tools that provides a highly configurable way to handle sequence annotation and coverage information. FARAO 1) integrates annotation and coverage information for the same sequence set; 2) scales to millions of sequences and features; 3) can filter sequences according to whether their annotations satisfy user-given criteria; 4) handles annotations produced by a range of bioinformatics tools; and 5) provides a flexible interface for writing custom parsers for virtually any format not supported out of the box.
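
Points 1) and 3) can be pictured with a small sketch: join annotation and coverage tables on sequence ID, then keep the sequences that satisfy a user-supplied predicate. The record layout (`SeqRecord`, annotation dictionaries with `tool`/`hit` keys) is hypothetical and is not FARAO's data model or parser interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SeqRecord:
    """Hypothetical per-sequence record joining annotations and coverage."""
    seq_id: str
    coverage: float = 0.0
    annotations: List[Dict[str, str]] = field(default_factory=list)


def integrate(annotations: Dict[str, List[Dict[str, str]]],
              coverage: Dict[str, float]) -> Dict[str, SeqRecord]:
    """Merge annotation and coverage tables keyed by the same sequence IDs."""
    return {
        seq_id: SeqRecord(seq_id, coverage.get(seq_id, 0.0), annotations.get(seq_id, []))
        for seq_id in set(annotations) | set(coverage)
    }


def select(records: Dict[str, SeqRecord],
           keep: Callable[[SeqRecord], bool]) -> List[str]:
    """Return the sorted IDs of sequences whose record satisfies the criterion."""
    return sorted(r.seq_id for r in records.values() if keep(r))
```

A criterion such as `lambda r: r.coverage >= 10 and any(a["tool"] == "pfam" for a in r.annotations)` keeps only well-covered sequences that carry a Pfam hit; sequences missing from either table still appear, with zero coverage or no annotations.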

ConsPred

A prokaryotic genome annotation framework that performs intrinsic gene prediction, homology searches, and prediction of non-coding genes and CRISPR repeats, and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation, and avoids conflicting predictions. Parameters controlling the annotation process are configurable by the user. ConsPred is primarily useful for annotating finished genome sequences or high-quality draft genomes (e.g., for submission to public databases) and for re-annotating genomes in comparative and functional genomics projects on prokaryotes.

DanQ

A hybrid convolutional and bidirectional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ that improves predictions. DanQ improves considerably on other models across several metrics. For some regulatory markers, DanQ achieves more than a 50% relative improvement in the area under the precision-recall curve compared with related models.
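
The convolution stage can be pictured as sliding filters, much like position weight matrices, along a one-hot encoded sequence. The pure-Python sketch below uses one hand-set filter (a hypothetical exact-`TATA` detector, not a learned DanQ weight) followed by ReLU and max pooling. In DanQ, the pooled outputs of many learned filters are then passed to the bidirectional LSTM, which this sketch omits.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as one-hot vectors in A, C, G, T order; other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide one filter (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        score = bias + sum(kernel[j][k] * x[i + j][k]
                           for j in range(width) for k in range(4))
        out.append(max(0.0, score))
    return out


def max_pool(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a shorter final window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical hand-set filter: +1 for each base matching "TATA", -1 otherwise.
TATA_FILTER = [[1.0 if b == m else -1.0 for b in BASES] for m in "TATA"]
```

With `bias=-3.0`, only an exact `TATA` window scores above zero, so scanning `"GGTATAGGCC"` and pooling in windows of 3 yields `[1.0, 0.0, 0.0]`.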

Genix

Provides more reliable annotation of both protein-coding and non-coding genes. Genix is an automated pipeline for bacterial genome annotation. The server processes annotation jobs one at a time, so the time needed to annotate a genome depends not only on the sequence and on the parameters chosen for generating the protein database, but also on server load. In a comparison on four reference genomes, Genix produced results closer to the reference annotations than other annotation tools did, with fewer false-positive proteins and fewer missing functionally annotated proteins. These results suggest that Genix is a user-friendly way to obtain refined, high-quality annotations.

Companion / COMprehensive Parasite ANnotatION

A web server providing parasite genome annotation as a service using a reference-based approach. Companion delivers a readily usable annotation of features in the target genome in a variety of formats, including those required for submission to public databases. It also implements several features that highlight differences in gene content between the reference and the new assembly, such as identification of orthologous clusters, species-specific singleton genes, and core genes that are present in a larger set of reference species but missing from the new assembly. To help recognize misassemblies or rearrangements, it also provides a high-level visualization of sequence matches.

SLALOM / StatisticaL Analysis of Locus Overlap Method

Compares multiple sequence annotations of continuous sequence elements (CSEs) in a group of sequences. SLALOM lets users choose the logic for handling overlaps and duplicates within individual CSE annotations, assigning matches between annotated CSEs, treating missing values, aggregating sequences, picking appropriate measures and leveling values to answer the question of interest. The tool helps detect and manage statistical pitfalls by making users aware of them.
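
One of the choices listed above, how to treat overlaps and duplicates inside a single annotation before comparing two annotations, can be made concrete. The sketch merges overlapping, touching or duplicate intervals within each annotation and then computes a residue-level Jaccard index between the two. It is an illustration of one possible measure, not SLALOM's implementation; intervals are assumed to be half-open `[start, end)` positions on the same sequence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) positions on one sequence


def merge(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping, touching or duplicate intervals within one annotation."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals: List[Interval]) -> int:
    """Number of residues covered by merged intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Total overlap between two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def residue_jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Residue-level Jaccard index between two annotations of the same sequence."""
    a, b = merge(a), merge(b)
    inter = intersection_length(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 1.0
```

For annotation A = `[(0, 10), (5, 15), (5, 15)]` (merged to `[(0, 15)]`) and annotation B = `[(10, 20)]`, the shared residues are 10-15, giving a Jaccard index of 5 / 20 = 0.25. Counting duplicates separately instead of merging them would give a different answer, which is exactly the kind of choice the tool asks users to make explicitly.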

ANNOgesic

Used for the analysis of bacterial RNA-Seq data. ANNOgesic is a modular, user-friendly annotation pipeline that integrates different types of data, such as dRNA-Seq and RNA-Seq generated after transcript fragmentation, and generates high-quality genome annotations. It can detect a wide range of genomic features, including genes, coding DNA sequences (CDSs), tRNAs, rRNAs, transcription start sites (TSSs), processing sites (PSs), transcripts, terminators, untranslated regions (UTRs), sRNAs, small open reading frames (sORFs), circular RNAs, CRISPR-related RNAs, riboswitches and RNA thermometers.

DeNoGAP / De-Novo Genome Analysis Pipeline

Performs reference-assisted and de novo gene prediction, assignment of homologous protein families, ortholog prediction, functional annotation, and pan-genome analysis. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of large numbers of genomes. It scales linearly because homology assignment is based on iteratively refined hidden Markov models. The pipeline offers tools and algorithms for annotating and analyzing both completed and draft genome sequences.

lncRNA-screen

A comprehensive pipeline for computationally screening putative long non-coding RNA (lncRNA) transcripts across large multimodal datasets. The main objective of lncRNA-screen is to facilitate the computational discovery of lncRNA candidates for further examination in functional experiments. lncRNA-screen provides a fully automated, easy-to-run pipeline that performs data download, RNA-seq alignment, assembly, quality assessment, transcript filtering, novel lncRNA identification, coding potential estimation, expression level quantification, integration of histone mark enrichment profiles, differential expression analysis, annotation with other types of segmented data (copy number variations (CNVs), single nucleotide polymorphisms (SNPs), Hi-C, etc.) and visualization. Importantly, lncRNA-screen generates an interactive report summarizing all relevant lncRNA features, including genome browser snapshots and lncRNA-mRNA interactions based on Hi-C data. In summary, lncRNA-screen provides a comprehensive solution for lncRNA discovery and an intuitive interactive report for identifying promising lncRNA candidates.

VAX / Variant Annotator eXtras

A scalable method that uses the plugin capability of the Ensembl Variant Effect Predictor (VEP) to enrich its basic set of variant annotations with additional data on genes, function, conservation, expression, diseases, pathways and protein structure, together with an extensible framework for easily adding further custom data sets. VAX consists of a locally installed MySQL database system that hosts the Ensembl database and the custom data used by the annotator; local installations of the Ensembl Perl API and VEP; and a library of custom VEP plugins.


BlastKOALA

Assigns K numbers (KEGG Orthology identifiers) to the user's sequence data by BLAST searches against a nonredundant set of KEGG GENES. KOALA (KEGG Orthology And Links Annotation) is KEGG's internal annotation tool for K number assignment of KEGG GENES using SSEARCH computation. Annotate Sequence in KEGG Mapper and Pathogen Checker in KEGG Pathogen are special interfaces to this server and can be run in interactive mode. BlastKOALA is suitable for annotating fully sequenced genomes.

CaGe / Cancer Gene annotation system

Provides access to information on cancer genes, mutations, pathways and associated annotations. CaGe is a web-accessible cancer genome annotation system built on several cancer gene databases of reported cancer-causing genes and associated cancer pathways. It offers cancer gene annotation and cancer pathway annotation functions, as well as browsing of cancer genes and pathways. The database can be useful for identifying cancer-causing mutations and genes in cancer genomics studies based on high-throughput genomic technologies (HGT).

GC-Specific MAKER

Predicts new and improved gene models; the method and its biological significance were assessed in Oryza sativa. GC-specific MAKER is an annotation protocol that uses genes with high and low GC content as separate training sets to derive two versions of the SNAP and AUGUSTUS hidden Markov models (HMMs), tuned to accurately predict high-GC and low-GC genes, respectively. The method should interest anyone working on structural annotation of genomes with bimodal GC content, and it is likely to improve the annotation of any genome.
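
The first step of the protocol, splitting training genes into high-GC and low-GC sets before training separate SNAP and AUGUSTUS models, can be sketched as follows. The threshold is a user choice (for a bimodal genome, for instance, the trough between the two GC modes); the gene dictionary and the threshold of 0.5 used below are hypothetical, and the model training itself is not shown.

```python
from typing import Dict, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among A/C/G/T bases; ambiguous bases are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def split_by_gc(genes: Dict[str, str], threshold: float) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition gene sequences into (high-GC, low-GC) training sets.

    Genes with GC content at or above `threshold` go to the high-GC set.
    """
    high = {gid: s for gid, s in genes.items() if gc_content(s) >= threshold}
    low = {gid: s for gid, s in genes.items() if gc_content(s) < threshold}
    return high, low
```

For genes `g1 = "GCGCGCAT"` (GC 0.75), `g2 = "ATATATGC"` (GC 0.25) and `g3 = "GGCCNNAT"` (GC 4/6, with `N` ignored), a threshold of 0.5 puts `g1` and `g3` in the high-GC set and `g2` in the low-GC set.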

AnnBuilder

Provides parsers to process annotation data from LocusLink, the Gene Ontology Consortium and the Human Genome Project, and can be extended to new data sources via user-defined parsers. AnnBuilder is an R package for assembling genomic annotation data; it differs from other existing systems in that it gives users unlimited ability to assemble data from sources of their choice. AnnBuilder can play a variety of roles. An institution or lab can use it to assemble data from various sources into a format better suited to the specific analyses being performed. Data providers such as Gene Expression Omnibus and NetAffx could also use AnnBuilder to help assemble the data they present.

NPACT / N-Profile Analysis Computational Tool

A computational and graphical representation tool for gene identification and sequence annotation. NPACT identifies sequence segments of any length that show statistically significant three-base compositional periodicity and are associated with ORF structures. NPACT produces graphical representations that allow uninterrupted, genome-wide visual comparison of compositional profiles, pre-annotated genes and three-base-periodic sequence segments with ‘Newly Identified ORFs’, enabling frame analysis on a genomic scale.
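
A simple way to expose three-base compositional periodicity is to compute GC content separately for the first, second and third position of each triplet in sliding windows. In many coding regions, third-position GC differs markedly from the first two. The sketch below computes such positional profiles; it is not NPACT's statistic or significance test, and the window and step sizes are illustrative parameters.

```python
from typing import List, Tuple


def positional_gc(seq: str, window: int, step: int) -> List[Tuple[int, float, float, float]]:
    """GC fraction at triplet positions 1, 2 and 3 in sliding windows.

    Returns (window_start, gc1, gc2, gc3) for each window. Triplet phase is
    counted from the start of the whole sequence (position p has phase p % 3),
    so profiles from different windows stay in the same frame.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        counts = [[0, 0] for _ in range(3)]  # [gc, total] for each phase
        for p in range(start, start + window):
            phase = p % 3
            counts[phase][1] += 1
            if seq[p] in "GC":
                counts[phase][0] += 1
        gc1, gc2, gc3 = (gc / total if total else 0.0 for gc, total in counts)
        rows.append((start, gc1, gc2, gc3))
    return rows
```

On a toy sequence such as `"AAG" * 6` with 9-base windows and a step of 9, every window shows `gc1 = gc2 = 0.0` and `gc3 = 1.0`, an extreme case of the phase-dependent composition that periodicity-based methods look for.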

BASys / Bacterial Annotation System

A web server that performs automated, in-depth annotation of bacterial genomic (chromosomal and plasmid) sequences. BASys uses more than 30 programs to determine nearly 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, reactions and pathways. BASys can generate its textual annotations and images in approximately 16 hours for an average bacterial chromosome (5 megabases, 5000 genes), or approximately 350 coding regions per hour.

Dintor

Provides a set of over 30 tools to assist researchers in exploring genomics and proteomics datasets. Dintor covers a wide range of frequently required functions, from gene identifier conversion and orthology mapping, through functional annotation of proteins and genetic variants, to candidate gene prioritization and Gene Ontology-based gene set enrichment analysis. A major advantage is its ability to handle multiple versions of tool-associated datasets consistently, which helps researchers deliver reproducible results.

FLAN / FLu ANnotation

Automatically annotates genomes of influenza A and B viruses based on existing protein sequences in GenBank. For each genome segment, a set of sample protein sequences is maintained on the server. A dedicated global protein-to-nucleotide alignment tool was designed to accurately annotate spliced genes and mature peptides of influenza viruses. The translated product of the best alignment to a sample protein sequence is used as the predicted protein encoded by the input sequence.

AMIGene / Annotation of MIcrobial Genes

An application for automatically identifying the most likely coding sequences (CDSs) in a large contig or a complete bacterial genome sequence. AMIGene first constructs Markov models that fit the input genomic data (i.e. the gene model), then combines well-known gene-finding methods with a heuristic approach to select the most likely CDSs. The web interface allows the user to select one or several gene models to apply to the input sequence and to view the list of predicted CDSs graphically or download it in text format.
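
As a generic illustration of how a Markov gene model scores a candidate sequence, the sketch below trains an order-k Markov chain on coding examples and another on background sequence, and reports the log-likelihood ratio between them. This is not AMIGene's model: real gene finders typically use codon-position-dependent (three-periodic) Markov chains, which this simplified, position-independent version omits. The training sequences shown below are hypothetical toy data.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable

ALPHABET = "ACGT"


class MarkovModel:
    """Order-k Markov chain over DNA with add-one (Laplace) smoothing."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def train(self, seqs: Iterable[str]) -> "MarkovModel":
        """Count next-base occurrences after each k-base context."""
        for seq in seqs:
            seq = seq.upper()
            for i in range(self.k, len(seq)):
                ctx, nxt = seq[i - self.k:i], seq[i]
                if nxt in ALPHABET and all(c in ALPHABET for c in ctx):
                    self.counts[ctx][nxt] += 1
        return self

    def log_prob(self, seq: str) -> float:
        """Log-probability of seq after its first k bases (used as context)."""
        seq = seq.upper()
        total = 0.0
        for i in range(self.k, len(seq)):
            ctx, nxt = seq[i - self.k:i], seq[i]
            row = self.counts.get(ctx, {})  # .get does not create new entries
            total += math.log((row.get(nxt, 0) + 1) / (sum(row.values()) + len(ALPHABET)))
        return total


def log_odds(seq: str, coding: MarkovModel, background: MarkovModel) -> float:
    """Positive scores favour the coding model over the background model."""
    return coding.log_prob(seq) - background.log_prob(seq)
```

With a coding model trained on `"ATG" * 10` and a background model trained on `"ACGT" * 10` (both with `k=2`), the sequence `"ATGATGATG"` receives a positive score and `"ACGTACGTA"` a negative one.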

MaGe / Magnifying Genomes

A microbial genome annotation system based on a relational database containing information on bacterial genomes, together with a web interface for carrying out genome annotation projects. The system allows the annotation of a genome to begin early in the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes, enhanced by re-annotation of coding genes using accurate gene models; (ii) integration of results from a wide range of bioinformatics methods, including exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways; and (iii) an advanced web interface that allows multiple users to refine the automatic assignment of gene product functions.

MICheck / MIcrobial genome Checker

Enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes. The web interface makes it easy to investigate the MICheck results, i.e. inaccurate or missed gene annotations: for each coding DNA sequence annotation or predicted frameshift, a graphical representation shows its genomic context, using the coding potential (curves) and the annotations of neighbouring genes.

PGAP / Prokaryotic Genome Annotation Pipeline

An automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. By combining the best features of the pan-genome approach for highly abundant clades with well-described and well-tested ab initio methods, PGAP now offers a flexible and extensible framework for prokaryotic annotation needs. The PGAP pipeline is designed to annotate both complete genomes and draft genomes comprising multiple contigs. PGAP is deeply integrated into NCBI infrastructure and processes and uses a modular software framework, GPipe, developed at NCBI, to execute all annotation tasks: from fetching raw and curated data from public repositories (the Sequence and Assembly databases), through sequence alignment and model-based gene prediction, to submission of annotated genomic data to public NCBI databases.

BUSCO / Benchmarking Universal Single-Copy Orthologs

Provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of benchmarking universal single-copy orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria.
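
The completeness measures are usually reported as percentages of Complete (split into Single-copy and Duplicated), Fragmented and Missing BUSCOs out of the n BUSCOs searched. The sketch below computes a summary in that C[S,D], F, M, n notation. Its input, `(busco_id, status)` pairs with the status labels shown, is a hypothetical simplification and not a parser for BUSCO's own output files; because a duplicated BUSCO may be listed once per copy, statuses are first collapsed to one per ID.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def busco_summary(rows: Iterable[Tuple[str, str]]) -> str:
    """Summarise per-BUSCO statuses as C[S,D], F and M percentages out of n.

    Status labels assumed: "Complete" (single-copy), "Duplicated",
    "Fragmented" and "Missing".
    """
    status_by_id: Dict[str, str] = {}
    for busco_id, status in rows:
        status_by_id[busco_id] = status  # duplicated copies collapse to one entry
    n = len(status_by_id)
    counts = Counter(status_by_id.values())
    s, d = counts["Complete"], counts["Duplicated"]
    f, m = counts["Fragmented"], counts["Missing"]

    def pct(x: int) -> float:
        return 100.0 * x / n if n else 0.0

    return (f"C:{pct(s + d):.1f}%[S:{pct(s):.1f}%,D:{pct(d):.1f}%],"
            f"F:{pct(f):.1f}%,M:{pct(m):.1f}%,n:{n}")
```

For five BUSCOs, two single-copy, one duplicated (listed twice), one fragmented and one missing, the function returns `C:60.0%[S:40.0%,D:20.0%],F:20.0%,M:20.0%,n:5`.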

EuGene-PP

A fully automated pipeline for structural annotation of prokaryotic genomes that integrates protein similarities, statistical information and any oriented (strand-specific) expression information (RNA-seq or tiling arrays), supplied in a variety of file formats, to produce a qualitatively enriched annotation that includes not only coding regions but also (possibly antisense) non-coding genes and transcription start sites. EuGene-PP exploits a variety of information sources, in most common formats, to produce an annotation comparable to a curated semi-automated structural annotation, especially for ncRNA genes, which are still difficult to predict.


Prokka

A command-line software tool that fully annotates a draft bacterial genome in about 10 minutes on a typical desktop computer. It produces standards-compliant output files for further analysis or for viewing in genome browsers. Prokka uses parallel processing to reduce running time on multicore computers. The most time-consuming steps are BLAST+ and hmmscan, both of which natively support multiple CPUs. However, Prokka is more efficient when it runs multiple single-CPU jobs on subsets of the data, which it achieves using GNU parallel.
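
Running many single-CPU jobs on subsets of the data pays off only if the subsets take similar time. Prokka delegates splitting and scheduling to GNU parallel; the sketch below is an independent, hypothetical illustration of one way to form balanced batches, using sequence length as a stand-in for runtime and the greedy longest-processing-time heuristic (longest sequences first, each into the currently lightest batch).

```python
import heapq
from typing import Dict, List


def balance_chunks(seq_lengths: Dict[str, int], n_jobs: int) -> List[List[str]]:
    """Greedily split sequences into n_jobs batches of similar total length.

    Sequences are placed longest first, each into the batch with the smallest
    current total, so that single-CPU jobs finish at roughly the same time.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    heap = [(0, i) for i in range(n_jobs)]  # (total length, batch index)
    batches: List[List[str]] = [[] for _ in range(n_jobs)]
    for seq_id, length in sorted(seq_lengths.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        batches[idx].append(seq_id)
        heapq.heappush(heap, (load + length, idx))
    return batches
```

For sequences of length 100, 80, 60, 40 and 20 split into two jobs, the batches are `['a', 'd', 'e']` (160 bases) and `['b', 'c']` (140 bases). Each batch could then be handed to a separate single-threaded worker, for example via `concurrent.futures.ProcessPoolExecutor`.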