More and more genomes are being sequenced, and to keep up with the pace of sequencing projects, automated annotation techniques are required. One of the most challenging problems in genome annotation is the identification of the core promoter.
A 5' terminal exon and promoter prediction program. FirstEF consists of different discriminant functions structured as a decision tree. The probabilistic models are optimized to find potential first donor sites and CpG-related and non-CpG-related promoter regions based on discriminant analysis. For every potential first donor site (GT) and an upstream promoter region, FirstEF decides whether or not the intermediate region can be a potential first exon, based on a set of quadratic discriminant functions.
A program for large-scale human promoter mapping using CpG islands. CpGpromoter is based on the results of discriminant analysis between promoter-associated CpG islands and non-associated ones. It enables efficient mapping of human promoters at 2 kb resolution, provided there is a CpG island inside an interval (-500...+1,500) around a transcription start site.
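The CpG island signal that tools of this kind build on can be illustrated with a short sketch. This is not CpGpromoter's discriminant analysis; it only computes the GC fraction and the CpG observed/expected ratio of a sequence. The function names and the default thresholds (length of at least 200 bp, GC fraction of at least 0.5, observed/expected ratio of at least 0.6, following the commonly cited Gardiner-Garden and Frommer criteria) are assumptions made for illustration.

```python
def cpg_stats(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Observed/expected ratio: (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply illustrative (assumed) thresholds to a candidate window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

In practice such a test would be applied to sliding windows along a chromosome, and the windows that pass would then be classified as promoter-associated or not by a separate model.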
Predicts transcription start sites of vertebrate Pol II promoters in DNA sequences. Promoter has been developed as an evolution of simulated transcription factors that interact with sequences in promoter regions. It builds on principles that are common to neural networks and genetic algorithms.
Mines for regulons, promoters and transcription factor binding sites (TFBSs) in sequenced bacterial genomes. PePPER uses an approach in which all available information on prokaryotic regulons and TFBSs is used to identify regulons in any query bacterium. The software allows uploading of unannotated data, which is then processed automatically. It can be used to pinpoint a wide range of putative regulons and their cognate TFBSs in any bacterial genome on the basis of existing knowledge.
Facilitates extraction of user specified regions around the transcription start site of all annotated human, mouse or rat genes. The transcription start sites have been identified computationally by considering alignments of a large number of partial and full-length mRNA sequences to genomic DNA, with provision for alternative promoters.
Predicts human promoters. FPROM is based on linear discriminant functions combining characteristics that describe functional motifs and oligonucleotide composition of potential start positions. It can automatically identify protein-coding genes, pseudogenes and promoters in eukaryotic genomes. This tool is able to recognize 80% of TATA promoter sequences with one false positive prediction per 2,000 base pairs.
Detects intergenic miRNA transcription start sites (TSSs). microTSS relies on deeply sequenced RNA-Seq data. It integrates RNA-Seq data by creating ‘islands’ of transcription upstream of intergenic pre-miRNAs. This tool scores each candidate TSS and derives the final predictions by employing a combination of three independent support vector machine (SVM) models. It eases the annotation of pri-miRNAs and non-coding transcripts in general.
A program for annotating miRNA promoters in human, as well as other species. PROmiRNA uses deepCAGE data from the FANTOM4 Consortium and integrates CAGE tag counts and other promoter features, such as CpG content, conservation and TATA box affinity, to score the potential of a candidate region to be a promoter. Given a list of genomic regions of interest, in the form of a GFF file, PROmiRNA returns the most probable promoter locations, together with the posterior probabilities calculated by the model.
A promoter prediction program which applies a boosting technique with stumps to select important small-scale as well as large-scale features. CoreBoost improves greatly on locating transcription start sites. The current version has more than 30% sensitivity and positive predictive value at 50 bp resolution.
σ54 promoters are unique to prokaryotic genomes and are responsible for transcribing carbon- and nitrogen-related genes. The web server iPro54-PseKNC was developed to predict σ54 promoters in prokaryotes.
Recognizes promoters accurately. PromH provides transcription start site (TSS) positions and the locations of known regulatory elements around them. It considers several conservation features: the conservation levels of regions (1) around the TSS and (2) to the right of the TSS (40 bp), (3) the average conservation level of regulatory motifs located to the left of the TSS and, for TATA promoters, (4) the conservation level around the TATA box.
Enables the user to customize the procedure to a specific problem. MADAP is a flexible one-dimensional clustering tool for the inference of promoters from mRNA 5′ end profiles obtained from the mapping of full-length cDNAs to the genome sequence. Internally, it uses normal distributions, and it was designed to model non-contiguous distributions of any shape. MADAP is versatile enough to interpret data from any source in terms of a finite number of clusters characterized by center positions, volume and extension.
A kernel-based method that accurately recognizes human transcription start sites. Efficient training and evaluation techniques using suffix tries made it possible to apply support vector machines that would otherwise be too computationally expensive. ARTS finds about 35% true positives at a false positive rate of 1/1000, where the other methods find only about half as many (18%).
Predicts putative promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of σ factors in E. coli (σ70, σ38, σ32, σ28 and σ24). bTSSfinder achieves higher accuracy (MCC=0.86, F1-score=0.93) than the next best currently available tool (MCC=0.59, F1-score=0.79) and covers multiple classes of promoters.
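For readers unfamiliar with the two accuracy measures quoted above, the following sketch shows how the F1-score and the Matthews correlation coefficient (MCC) are computed from a confusion matrix. The function name and the example counts are hypothetical; they are not taken from the bTSSfinder evaluation.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1-score, MCC) from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical counts, for illustration only.
f1, mcc = f1_and_mcc(tp=40, fp=10, tn=40, fn=10)
```

Unlike the F1-score, MCC also takes true negatives into account, which is why the two values can differ noticeably on the same predictions.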
A promoter prediction program based on unsupervised clustering of physical properties of DNA. ProSOM can clearly distinguish between the structural profiles of promoter sequences and other genomic sequences by using self-organizing maps.
Allows users to identify promoter regions in a genome using computational models. EP3 can be applied to identify the promoter of known genes or to detect the start of any transcription event. It assists users in determining three scores: (1) number of predictions; (2) how many of these predictions are correct; and (3) how many genes are discovered by the predictions.
Offers a method for evaluating promoter prediction programs (PPPs). pppBenchmark can both facilitate benchmarking studies and estimate the quality of PPP predictions. It employs a set of two bin-based and two distance-based protocols that can also be used if different types of data are available. It was applied to compare more than 15 academic programs on predictions encompassing a complete genome.
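The general idea of a distance-based evaluation can be sketched as follows. This is not pppBenchmark's actual protocol: the function names, the `max_dist` parameter and the scoring rule are assumptions chosen for illustration. A prediction counts as correct if it lies within `max_dist` bp of an annotated TSS, and an annotated TSS counts as found if any prediction lies within `max_dist` bp of it.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, x):
    """Distance from x to the closest value in a non-empty sorted list."""
    i = bisect_left(sorted_positions, x)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(x - p) for p in candidates)


def distance_based_scores(predicted, annotated, max_dist=500):
    """Return (fraction of predictions that are correct,
    fraction of annotated TSSs that are found)."""
    pred = sorted(predicted)
    ann = sorted(annotated)
    if not pred or not ann:
        return 0.0, 0.0
    correct = sum(1 for p in pred if nearest_distance(ann, p) <= max_dist)
    found = sum(1 for a in ann if nearest_distance(pred, a) <= max_dist)
    return correct / len(pred), found / len(ann)


# Hypothetical coordinates on a single strand of one chromosome.
precision, recall = distance_based_scores(
    predicted=[120, 5000, 990], annotated=[100, 1000, 9000], max_dist=50
)
```

A real benchmark would also need to handle strands and multiple chromosomes, and the choice of `max_dist` strongly affects the reported scores.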
Locates the positions of promoters. This method considers not only the local distribution of pentamers but also positional CpG island features. PromoterExplorer was tested on large-scale DNA sequences from different databases, including the EPD, DBTSS, GenBank and human chromosome 22. Experimental results show that consistent and promising performance can be achieved.
A method for finding regulatory modules in eukaryotic species using phylogenetic data. The user inputs two files: an alignment file for the sequences of interest, and an XML file of parameters and model specification. The user specifies the phylogenetic tree, parameter values for the transition/transversion bias κ, the background base frequencies, the PWM for each transcription factor binding site (TFBS), the sliding window size and the step size.
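To show what a PWM and background base frequencies contribute to such an analysis, here is a minimal sketch of log-odds scoring of a single sequence against one PWM. It deliberately ignores the phylogenetic tree and the module-level sliding window, so it is not the method described above. The function names, the toy TATA-like matrix and the uniform background are assumptions made for illustration.

```python
import math


def log_odds_matrix(pwm, background):
    """Convert per-position base probabilities to log2 odds vs. background.
    All probabilities must be non-zero (e.g. after adding pseudocounts)."""
    return [{b: math.log2(col[b] / background[b]) for b in "ACGT"}
            for col in pwm]


def scan(seq, pwm, background):
    """Score every start position of seq; return a list of (position, score)."""
    lod = log_odds_matrix(pwm, background)
    width = len(lod)
    return [
        (i, sum(lod[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]


# Toy 4-column matrix favouring T, A, T, A (hypothetical values).
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
hits = scan("GGTATAGG", pwm, background)
best_position, best_score = max(hits, key=lambda h: h[1])
```

Positions with a high log-odds score look more like the motif than like background sequence; a module finder would then combine such scores across several PWMs within each window.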
A Java GUI with multiple graphical representations ('Views') of enhancer alignments that displays motifs, as IUPAC consensus sequences or position frequency matrices, in the context of phylogenetic conservation to facilitate cis-regulatory element discovery. Thresholds of phylogenetic conservation and motif stringency can be altered dynamically to facilitate detailed analysis of enhancer architecture. Views can be exported to vector graphics programs to generate high-quality figures for publication. Twine can be extended via Java plugins to manipulate alignments and analyze sequences.