DNA motif finding software tools | Genome annotation
De novo motif search is a frequently applied bioinformatics procedure to identify and prioritize recurrent elements in sequence sets for biological investigation, such as those derived from high-throughput differential expression experiments. Several algorithms have been developed to perform motif search, employing widely different approaches and often giving divergent results.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community and supports the creation and release of software in an open-source spirit. It integrates a range of sequence analysis tools into a seamless whole. It is free of charge and open source.
Searches for motifs in DNA or RNA sequences that occur more frequently than would be expected by chance, using a variation of the expectation maximization (EM) algorithm. As a background (null) model it uses up to a second-order Markov model of background sequence. Optionally, Improbizer constructs a Gaussian model of motif placement, so that motifs that occur in similar positions in the input sequences are more likely to be found.
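To illustrate what a second-order Markov background model is, here is a minimal sketch that estimates the probability of each base given the two preceding bases and scores a sequence under that model. It is not Improbizer's code; the function names `train_markov2` and `log_prob` and the pseudocount value are assumptions made for this example.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"


def train_markov2(sequences, pseudocount=1.0):
    """Estimate P(next base | previous two bases) from background sequences.

    Hypothetical helper for illustration; windows containing non-ACGT
    characters are skipped.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(2, len(seq)):
            context, base = seq[i - 2:i], seq[i]
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for a in ALPHABET:
        for b in ALPHABET:
            context = a + b
            total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
            model[context] = {
                base: (counts[context][base] + pseudocount) / total
                for base in ALPHABET
            }
    return model


def log_prob(seq, model):
    """Log-probability of seq[2:] given its first two bases (ACGT only)."""
    seq = seq.upper()
    return sum(math.log(model[seq[i - 2:i]][seq[i]]) for i in range(2, len(seq)))
```

A candidate motif occurrence whose log-probability under such a background model is low is harder to explain by chance, which is the intuition behind comparing a motif model against a null model.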
A modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations.
Detects motifs in large-scale chromatin immunoprecipitation (ChIP) data. Trawler can be run in two ways: (i) as a standalone version providing a pipeline that generates position weight matrices (PWMs) from the extraction and clustering of over-represented motifs; and (ii) as a web application that accepts sequences in either FASTA or BED format, ranks predicted motifs by conservation score and produces a set of background sequences.
A program that discovers transcription factor binding site motifs in nucleotide sequences. DME identifies motifs, represented as position weight matrices, that are overrepresented in one set of sequences relative to another set. The ability to directly optimize relative overrepresentation is a unique feature of DME, making DME an ideal tool for analyzing promoters of transcripts found to have differential expression in a particular context. The optimization procedure is based on an enumerative algorithm that is guaranteed to identify optimal motifs from a discrete space of matrices with a specific lower bound on information content. This strategy scales very well with the number and length of the sequences used, and is well-suited to analyzing very large data sets.
A framework for studying regulatory elements in a genome. Currently focusing on patterns involved in transcriptional regulation, CREAD includes efficient tools for performing fundamental tasks in motif discovery and regulatory sequence analysis. CREAD also includes code libraries to facilitate the implementation of new tools. In addition to fundamental tools, CREAD includes an implementation of the MARS machine learning algorithm, and a Suffix Tree implementation designed for repeated searching of large amounts of sequence data using position-weight matrices, a common representation for transcription-factor binding-sites.
Determines the total affinity of a sequence for a given transcription factor, thus removing the need for a threshold value. TRAP ranks all promoter sequences of a genome on the basis of their overall affinity for that factor. It can serve to estimate the most enriched factor in a given sequence, the sequences with the highest affinity for a factor of interest, or the binding sites of a factor affected by given single nucleotide polymorphisms (SNPs).
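The idea of threshold-free scoring can be sketched as follows: instead of counting only windows that exceed a cutoff, every window contributes to a single sequence-level score. This is a simplified illustration that sums likelihood ratios against a uniform background; it does not reproduce TRAP's own binding model or parameters, and the name `total_affinity`, the matrix layout and the example matrix are assumptions.

```python
def total_affinity(seq, pwm, background=0.25):
    """Sum the motif/background likelihood ratio over every window.

    pwm is a list of dicts mapping base -> probability, one per motif
    position. Only the forward strand is scanned in this sketch.
    """
    width = len(pwm)
    seq = seq.upper()
    total = 0.0
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset in range(width):
            ratio *= pwm[offset].get(seq[start + offset], 0.0) / background
        total += ratio
    return total


# Hypothetical 3-column matrix favouring the site "ACG".
EXAMPLE_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def rank_by_affinity(named_sequences, pwm):
    """Return sequence names ordered from highest to lowest total score."""
    return sorted(
        named_sequences,
        key=lambda name: total_affinity(named_sequences[name], pwm),
        reverse=True,
    )
```

Calling `rank_by_affinity({"p1": "TTACGTT", "p2": "TTTTTTT"}, EXAMPLE_PWM)` orders the sequences without any per-site cutoff, so several weak sites can together outrank a single moderate one.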
Discovers novel, ungapped motifs (recurring, fixed-length patterns) in nucleotide or protein sequences. MEME splits variable-length patterns into two or more separate motifs. MEME is part of the MEME Suite online platform.
Investigates biological patterns. PatScan uses an expressive pattern language to detect predetermined DNA and protein sequence patterns. Users can search for repeats, hairpins, stem loops or pseudoknots. The application can be run from a command-line interface by researchers with advanced skills or through a simplified web interface with a drag-and-drop system.
Performs promoter analysis. Melina is a web server that highlights potential DNA motifs in promoter regions. The software enables users to run at most four out of five external algorithms (Consensus, MEME, Gibbs sampler, MDscan, and Weeder) with user-specified parameter values to avoid missing important motifs. Moreover, a weight matrix can be built from a predicted motif and applied to upstream sequences of several typical genomes or to public motif databases to detect similar motifs.
Outperforms other leading motif finding algorithms in a number of synthetic models. Moreover, it can be shown that in some previously studied motif models, MULTIPROFILER is capable of pushing the performance envelope to its theoretical limits.
Allows users to predict specific classes of functional elements and cis-regulatory modules. ESPERR is a computational method that creates a reduced representation of the data, removing noise while keeping the signals useful for characterizing a class of functional elements. The application can also discriminate regulatory regions from neutral sites thanks to a Regulatory Potential (RP) score.
Supplies an ensemble algorithm for regulatory site motif discovery. EMD is composed of five algorithms, each run several times independently. Their results are combined essentially by majority. The software aims to improve the sensitivity, specificity and accuracy of the prediction.
Identifies motifs (made of IUPAC symbols) that occur unusually often in a given set of sequences. More specifically, YMF enumerates all motifs in the search space and is guaranteed to produce those motifs with the greatest z-scores. The simple web form asks the user to supply the regulatory input sequences in FASTA format.
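The enumerate-and-score approach can be sketched in a few lines. This is a deliberately simplified illustration, not YMF's method: it enumerates plain A/C/G/T k-mers rather than IUPAC motifs, assumes independent and equally likely bases, and uses a binomial approximation for the variance that ignores overlapping occurrences. The function name `kmer_zscores` is an assumption.

```python
from itertools import product
import math


def kmer_zscores(sequences, k, base_freq=0.25):
    """Z-score of every k-mer's occurrence count under a uniform background.

    Simplifying assumptions: i.i.d. bases, forward strand only, and a
    binomial variance that ignores overlaps between occurrences.
    """
    sequences = [s.upper() for s in sequences]
    positions = sum(max(len(s) - k + 1, 0) for s in sequences)
    p = base_freq ** k
    mean = positions * p
    sd = math.sqrt(positions * p * (1 - p))
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        observed = sum(
            1
            for s in sequences
            for i in range(len(s) - k + 1)
            if s[i:i + k] == kmer
        )
        scores[kmer] = (observed - mean) / sd if sd > 0 else 0.0
    return scores
```

Because every k-mer in the search space is scored, the highest-scoring one is found by construction, for example with `max(scores, key=scores.get)`. The cost grows as 4^k, which is why exhaustive enumeration is practical only for short motifs.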
Allows users to predict distant regulatory elements in higher eukaryotic genomes. DiRE provides computational means to investigate regulatory features of any user-submitted dataset of genes. It helps improve the functional annotation of the human and other genomes by providing candidate distant regulatory elements (REs) responsible for specific biological functions.
A program for discovering functional motifs shared by a set of nucleotide sequences. Examples of functional motifs include transcription factor binding sites, mRNA splicing control elements, signals for mRNA 3'-cleavage and polyadenylation, and anything else you can dream of. GLAM attempts to find these motifs by obtaining the best possible gapless, multiple alignment of segments of the sequences.
Discovers short nucleotide or peptide sequences and patterns using Arabidopsis thaliana sequence datasets hosted in The Arabidopsis Information Resource (TAIR). PatMatch is a web application that searches for patterns of up to 20 residues and supports both exact and approximate sequence matches. It also provides options for specifying the number of hits to return and the strand on which the search is run.
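As a rough illustration of exact versus approximate pattern matching with strand and hit-count options, the sketch below scans a DNA sequence and allows a fixed number of substitutions. It is not PatMatch's implementation: only mismatches are handled (no insertions or deletions), and the function and parameter names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_matches(seq, pattern, max_mismatches=0, both_strands=True, max_hits=None):
    """Return (start, strand, mismatches) for each window close to the pattern.

    Reverse-strand hits are found by matching the reverse complement of the
    pattern against the forward sequence; start is a 0-based forward offset.
    """
    seq, pattern = seq.upper(), pattern.upper()
    targets = [("+", pattern)]
    if both_strands:
        targets.append(("-", reverse_complement(pattern)))
    hits = []
    for strand, pat in targets:
        for start in range(len(seq) - len(pat) + 1):
            window = seq[start:start + len(pat)]
            mismatches = sum(a != b for a, b in zip(window, pat))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    hits.sort()
    return hits if max_hits is None else hits[:max_hits]
```

With `max_mismatches=0` the search is exact; raising it trades specificity for sensitivity, and `max_hits` simply truncates the position-sorted result list.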
Finds conserved sequence motifs within coding regions. SiteSifter is based on the assumption that DNA sequences with a regulatory function should be evolutionarily conserved at the nucleotide sequence level over and above any conservation required to maintain the amino acid sequence of the encoded proteins. The software scores each instance of a motif on the basis of the chance that its constituent codons are conserved over and above that required for amino acid conservation.
A de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences, their P-values for enrichment, conservation and localization, their sequence contexts and coordinates.
Provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species.