Accurate gene structure prediction plays a fundamental role in functional annotation of genes. The main focus of gene prediction methods is to find patterns in long DNA sequences that indicate the presence of genes.
Builds transcriptomes from RNA-seq data. Trinity is standalone software composed of three main components: (i) Inchworm, which first generates transcript contigs; (ii) Chrysalis, which clusters them and constructs a complete de Bruijn graph for each cluster; and (iii) Butterfly, which processes the individual graphs in parallel to reconstruct the transcript sequences.
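The de Bruijn graph idea mentioned for Chrysalis and Butterfly can be illustrated with a minimal sketch. This is not Trinity's implementation: the function names `de_bruijn` and `walk_unbranched`, the toy reads, and the choice of k = 4 are all assumptions made for illustration, and real assemblers also handle sequencing errors, coverage, and branching paths.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unbranched(graph, start):
    """Follow edges while the current node has exactly one successor.

    Returns the sequence spelled by the path. Stops at a branch, a dead end,
    or a node that was already visited (to avoid looping forever).
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Hypothetical overlapping reads drawn from one short transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

In this toy case every node has a single successor, so one walk recovers the whole transcript; branching nodes are where alternative isoforms or errors would have to be resolved.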
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It supports the creation and release of software in an open source spirit and integrates tools for sequence analysis into a seamless whole. It is free of charge and open source.
A software tool for gene-structure prediction. N-SCAN can model the phylogenetic relationships between the aligned genome sequences, context dependent substitution rates, and insertions and deletions. An implementation of N-SCAN was created and used to generate predictions for the entire human genome and the genome of the fruit fly Drosophila melanogaster.
A gene finder based on a Generalized Hidden Markov Model (GHMM). Although the gene finder conforms to the overall mathematical framework of a GHMM, additionally it incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. It also utilizes Interpolated Markov Models for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes introns of each phase, intergenic regions, and four types of exons (initial, internal, final, and single).
Determines full exonic structures of vertebrate genes in anonymous DNA sequences. GeneID can analyze chromosome-size sequences in a few minutes on a standard workstation. It permits a detailed analysis of gene features in genomic sequences. This tool can print an exhaustive list of all the sequence signals and exons predicted along the query sequence. It predicts and scores splice sites and start and stop codons using position weight matrices (PWMs).
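Scoring a signal with a position weight matrix can be sketched as follows. This is a generic log-odds PWM, not GeneID's code or parameters: the aligned donor-like sites, the uniform background of 0.25, the pseudocount, and the function names are assumptions chosen for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned sites of equal length."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm

def score_site(pwm, window):
    """Sum the per-position log-odds scores for one candidate window."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_site(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [(score_site(pwm, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)

# Hypothetical training sites resembling splice donor signals.
donor_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
pwm = build_pwm(donor_sites)
score, offset = best_site(pwm, "TTTCAGGTAAGTCCC")
print(offset, round(score, 2))
```

A gene finder would typically keep every window above a score cutoff rather than only the best one, and pass those candidate signals on to exon assembly.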
A gene finding system designed to automate the process of predicting gene structure from multiple sources of evidence, with results that often match the performance of human curators. JIGSAW computes the relative weight of different lines of evidence using statistics generated from a training set, and then combines the evidence using dynamic programming.
Allows prediction of genes in a target genome sequence using the sequence of a second informant or reference genome. SGP2 combines calculation of a pairwise alignment and processing of sequence and alignment files. It is based on DNA or amino acid pairwise alignments. This tool can be useful for validation of gene-structure annotations. It can be used for homologous sequence pairs from vertebrates and from plants.
Predicts gene structure using similar protein sequences. GeneWise is heavily used by the Ensembl annotation system. It was developed from a principled combination of hidden Markov models (HMMs). GeneWise can provide accurate and complete gene structures when used with the correct evidence.
Predicts genes in eukaryotic genomic sequences. AUGUSTUS is based on the evaluation of hints to potentially protein-coding regions by means of a Generalized Hidden Markov Model (GHMM) that takes both intrinsic and extrinsic information into account. This software models protein families by block profiles, where a block corresponds to an ungapped and highly conserved section of multiple sequence alignments (MSA).
An automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology based methods. Combining the best features of the pan-genome approach in highly abundant clades with well-described and well-tested ab initio methods, PGAP now presents a flexible and extensible framework for prokaryotic annotation needs. The PGAP pipeline is designed to annotate both complete genomes and draft genomes comprising multiple contigs. PGAP is deeply integrated into NCBI infrastructure and processes, and uses a modular software framework, GPipe, developed at NCBI for execution of all annotation tasks, from fetching of raw and curated data from public repositories (the Sequence and Assembly databases) through sequence alignment and model-based gene prediction, to submission of annotated genomic data to public NCBI databases.
An open integrative gene finder for eukaryotic and prokaryotic genomes. Compared to most existing gene finders, EuGene is characterized by its ability to simply integrate arbitrary sources of information in its prediction process, including RNA-Seq, protein similarities, homologies and various statistical sources of information.
A gene identification algorithm that combines exon-intron and splice signal models with similarity to known protein sequences in an integrated model. GenomeScan can accurately identify the exon-intron structures of genes in finished or draft human genome sequence with a low rate of false-positives.
Provides sensitivity in identifying existing genes. Prodigal is a gene-finding program for annotating either draft or finished microbial genome sequence. It was developed to predict translation initiation sites more accurately and to minimize the number of false positive predictions. This method can be useful for automated microbial annotation pipelines.
A eukaryotic gene finder that can use multiple sources of evidence to improve prediction accuracy. ExonHunter is based on hidden Markov models that allow the use of a variety of additional sources of information (ESTs, proteins, genome-genome comparison, repeats, and more).
Avoids spurious mutations while being able to report real mutations, both on simulated and real data. CESAR is a hidden Markov model (HMM) based method that enhances the utility of genome alignments for comparative gene annotation by (i) being significantly faster and more memory efficient, which allows routine application without large computer resources, (ii) improving the ability to identify distal splice site shifts, which increases the accuracy of gene annotation, and (iii) providing a new gene mode that is able to detect complete intron deletions and can be used to annotate entire genes instead of individual exons.
Identifies complete exon/intron structures of genes in genomic DNA. GENSCAN uses a homogeneous fifth order Markov model of noncoding regions and a three periodic (inhomogeneous) fifth order Markov model of coding regions. Features of the program include the capacity to predict multiple genes in a sequence, to deal with partial as well as complete genes, and to predict consistent sets of genes occurring on either or both DNA strands.
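The contrast between a homogeneous Markov model and a three-periodic one can be sketched in a few lines. This is an illustrative toy, not GENSCAN's implementation: it uses order 2 instead of order 5 so that tiny training strings suffice, and the training sequences, pseudocount, and function names are assumptions.

```python
import math

def train(seqs, order, period):
    """Count next-base frequencies per context, with one table per phase.

    period=1 gives a homogeneous model; period=3 gives a three-periodic
    model in which the table used depends on the codon position.
    """
    tables = [dict() for _ in range(period)]
    for seq in seqs:
        for i in range(order, len(seq)):
            ctx_counts = tables[i % period].setdefault(seq[i - order:i], {})
            ctx_counts[seq[i]] = ctx_counts.get(seq[i], 0) + 1
    return tables

def log_likelihood(seq, tables, order, pseudo=1.0):
    """Log-probability of seq under the model, with add-one style smoothing."""
    period = len(tables)
    total = 0.0
    for i in range(order, len(seq)):
        ctx_counts = tables[i % period].get(seq[i - order:i], {})
        numerator = ctx_counts.get(seq[i], 0) + pseudo
        denominator = sum(ctx_counts.values()) + 4 * pseudo
        total += math.log(numerator / denominator)
    return total

def coding_score(seq, coding, noncoding, order):
    """Log-likelihood ratio: positive values favor the coding model."""
    return log_likelihood(seq, coding, order) - log_likelihood(seq, noncoding, order)

ORDER = 2  # assumption for the toy data; the description above refers to order 5
coding = train(["ATGGCTGCTGCTGCTGCTGCTTAA"], ORDER, period=3)
noncoding = train(["ATATATATTTTTAAAATATATTTT"], ORDER, period=1)

exon_like = coding_score("GCTGCTGCTGCTGCT", coding, noncoding, ORDER)
intergenic_like = coding_score("TATATATATATATAT", coding, noncoding, ORDER)
print(exon_like > 0, intergenic_like < 0)
```

The query is assumed to start in the same codon phase as the training data; a real gene finder evaluates all phases and embeds these scores in a larger probabilistic model of gene structure.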
An ab initio gene prediction program that finds genes in prokaryotic genomes in all six reading frames. The methodology follows a physico-chemical approach and has been validated on 372 prokaryotic genomes.
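Scanning all six reading frames (three per strand) can be sketched as below. This only illustrates the frame enumeration with a plain ATG-to-stop open reading frame search; it does not implement the physico-chemical scoring described above, and the function names, the `min_codons` cutoff, and the toy sequence are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for ATG...stop ORFs in one frame.

    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop, including the ATG.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None

def six_frame_orfs(seq, min_codons=2):
    """Collect ORFs from frames 0-2 on both strands.

    Coordinates for the "-" strand are relative to the reverse complement.
    """
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_codons):
                hits.append((strand, frame, start, end))
    return hits

hits = six_frame_orfs("CCATGAAATTTTAGGG")
print(hits)
```

Each reported ORF is only a candidate; a gene finder then has to decide which candidates are real genes, which is where a scoring method such as the one described above comes in.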
Annotates gene structure. EVM leverages ab initio gene predictions and sequence homologies to generate weighted consensus gene predictions. It exploits high quality evidence where available and provides consensus gene structure prediction accuracy that approaches that of manual annotation. This tool computes connecting exons, introns, and intergenic regions across the complete genome sequence to report consensus gene structures.
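The idea of weighting evidence sources to reach a consensus can be sketched with a simple per-base vote. This is a deliberate simplification and not EVM's algorithm: the source names, the weights, the 0.5 threshold, and the function name are all assumptions, and the intervals within each source are assumed not to overlap.

```python
def weighted_consensus(length, evidence, weights, threshold=0.5):
    """Return exon intervals supported by more than `threshold` of total weight.

    evidence maps a source name to half-open (start, end) exon intervals;
    weights maps the same source names to a numeric weight.
    """
    score = [0.0] * length
    total = sum(weights[source] for source in evidence)
    for source, intervals in evidence.items():
        for start, end in intervals:
            for pos in range(start, end):
                score[pos] += weights[source]

    # Collapse consecutive supported positions into intervals.
    exons, begin = [], None
    for pos in range(length + 1):
        inside = pos < length and score[pos] / total > threshold
        if inside and begin is None:
            begin = pos
        elif not inside and begin is not None:
            exons.append((begin, pos))
            begin = None
    return exons

# Hypothetical evidence on a 20-base region.
evidence = {
    "ab_initio": [(2, 8), (12, 18)],
    "protein": [(3, 8)],
    "transcript": [(3, 9), (12, 17)],
}
weights = {"ab_initio": 1, "protein": 5, "transcript": 10}
consensus = weighted_consensus(20, evidence, weights)
print(consensus)
```

With these weights the transcript alignments dominate, so the consensus follows their exon boundaries even where the ab initio prediction disagrees.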
Allows automated improvement of gene structures in Arabidopsis thaliana. PASA has been used in eukaryotic genome annotation projects such as rice, Aspergillus species, Plasmodium falciparum, Schistosoma mansoni, Aedes aegypti, mouse, and human, among others. This tool is able to recognize and organize splicing variations supported by the transcript alignments. It can clean the transcripts, validate perfect alignments, or proceed to automatic genome annotation.