De novo motif discovery software tools | ChIP sequencing data analysis
De novo motif discovery is a difficult computational task. Historically, dedicated algorithms always reported a high percentage of false positives. Their performance did not improve considerably even after they adapted to handle large amounts of chromatin immunoprecipitation sequencing (ChIP-Seq) data.
Provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters.
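The sequence scanning step that tools such as MAST and FIMO perform can be illustrated with a minimal sketch. It is not the MEME Suite implementation: the pseudocount, the uniform background, the score threshold and the function names `log_odds_matrix` and `scan` are all assumptions made for illustration, and the real tools add statistical significance estimates that are omitted here.

```python
from math import log2

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudo=1.0):
    """Turn per-position base counts into log-odds scores.

    counts: list of dicts mapping base -> observed count, one dict per motif column.
    background and pseudo are illustrative assumptions (uniform background).
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudo
        matrix.append({
            b: log2(((column.get(b, 0) + pseudo) / total) / background)
            for b in BASES
        })
    return matrix

def scan(seq, matrix, threshold):
    """Slide the matrix along seq and report windows scoring >= threshold.

    Assumes seq contains only A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(matrix[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, seq[i:i + width], score))
    return hits

# Toy motif "TGA" built from hypothetical counts
toy_counts = [{"T": 10}, {"G": 10}, {"A": 10}]
toy_matrix = log_odds_matrix(toy_counts)
toy_hits = scan("ACTGAC", toy_matrix, threshold=3.0)
```

In this toy case only the window starting at offset 2 passes the threshold, because every other window scores negatively at all three positions.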
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for working with ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
A computational method that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates.
Belongs to the motif enumeration family of motif discovery tools, in which occurrences of motifs in the query sequences are counted and, in this case, compared to a pre-calculated set of genome-specific background motifs. This avoids having to construct a background set of sequences, which is no easy task. Weeder was initially used to identify common motifs in defined promoter regions, but evolved to handle first ChIP-chip and then ChIP-seq data.
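The enumeration idea can be sketched in a few lines: count every k-mer in the query sequences and compare its frequency with a pre-computed background table. This is a deliberately simplified illustration, not Weeder's algorithm (which, for instance, tolerates mismatches). The background table, the uniform fallback frequency and the function names are hypothetical.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def rank_by_enrichment(query_seqs, background_freq, k):
    """Rank k-mers by observed frequency divided by background frequency.

    background_freq: dict of k-mer -> expected frequency (assumed pre-computed).
    K-mers missing from the table fall back to a uniform 0.25**k (an assumption).
    """
    counts = count_kmers(query_seqs, k)
    total = sum(counts.values())
    uniform = 0.25 ** k
    scores = {
        kmer: (n / total) / background_freq.get(kmer, uniform)
        for kmer, n in counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy query set and a hypothetical one-entry background table
query = ["ACGTGACC", "TTCGTGAA", "GCGTGATT"]
background = {"CGTGA": 0.001}
ranked = rank_by_enrichment(query, background, k=5)
```

Here `CGTGA` occurs once in each of the three toy sequences and comes out on top of the ranking. Because the background frequencies are looked up rather than estimated, no background sequence set has to be assembled.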
Discovers DNA motifs in protein binding microarray (PBM) data. kmerHMM is a computational pipeline for PBM motif discovery in which hidden Markov models (HMMs) are trained to model DNA motifs, and belief propagation is used to elucidate multiple motif models from each trained HMM. The software models the dependence between adjacent nucleotide positions and can also deduce multiple binding modes for a given transcription factor (TF).
Provides a kernel density estimator-based package for the analysis of massively parallel sequencing data from chromatin immunoprecipitations. QuEST can search for de novo motifs in ChIP-Seq experiments and perform gene ontology (GO) analysis on ChIP-Seq data obtained from the NCBI Short Read Archive (SRA). This software is based on realistic statistical modeling of ChIP-Seq data.
Finds over-represented conserved transcription factor binding sites (TFBS) and binding site combinations in DNA sequences of co-expressed genes or sequences generated from high-throughput methods. oPOSSUM enables researchers interested in the study of gene regulatory networks to identify TFs that may be acting in a biological context. The software features a panel of approaches to regulatory sequence analysis, including Single-Site Analysis (SSA) and anchored Combination-Site Analysis (aCSA).
A computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. RSAT peak-motifs relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes.
Offers a Bayesian approach to motif discovery. BAMM!motif is an application that uses Bayesian Markov Models (BaMMs) to make its predictions. It consists of four distinct modules, which allow users, among other things, to: (i) investigate nucleotide sequences to determine higher-order motifs; (ii) explore model repositories, with a feature for searching given motifs against a pre-computed database; and (iii) detect motif occurrences in sequences.
A web-based tool for analyzing motifs in large DNA or RNA data sets. MEME-ChIP can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP is part of the MEME Suite online platform.
Identifies and analyzes regulatory DNA motifs. DMINDA is a motif analysis web server that provides six motif analysis functions: (i) motif finding; (ii) motif scanning; (iii) motif comparison; (iv) motif co-occurrence analysis; (v) motif prediction by phylogenetic footprinting (namely MP3); and (vi) regulon prediction. The software can benefit the genomic research community in general and prokaryotic genome researchers in particular.
Searches for enrichment of motifs in large datasets of DNA, RNA or protein sequences. DRIMust is a web application whose algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for enumeration of motif candidates. The software combines search on large ranked lists with P-value assessment for the detected motifs. It can detect long motifs and motifs over large alphabets.
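The minimum hypergeometric (mHG) statistic mentioned above can be illustrated as follows. For a ranked list in which each sequence is labeled 1 (contains the candidate motif) or 0, the score is the smallest hypergeometric tail probability over all prefixes of the list. This sketch is not DRIMust's code: the function names are assumptions, suffix-tree enumeration is omitted, and the raw mHG score still needs a correction for having tried every cutoff before it can be read as a P-value.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when drawing n items from N, of which B are successes."""
    upper = min(n, B)
    numerator = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return numerator / comb(N, n)

def mhg_score(labels):
    """Return (minimum tail probability, prefix length where it occurs).

    labels: ranked list of 0/1 values, best-ranked sequence first.
    """
    N, B = len(labels), sum(labels)
    best_p, best_n, hits = 1.0, 0, 0
    for n, label in enumerate(labels, start=1):
        hits += label
        p = hypergeom_tail(N, B, n, hits)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

# Toy ranked list: the motif occurs only in the three top-ranked sequences
toy_score, toy_cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0])
```

For the toy list the minimum is reached at the prefix of length 3, where all three motif-containing sequences have been seen; the tail probability there is 1/56.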
A tool suite designed to aid in analysis of next-generation sequencing (NGS) data. kmer-SVM uses a support vector machine (SVM) with kmer sequence features to identify predictive combinations of short transcription factor binding sites which determine the tissue specificity of the original NGS assay. Information gained from kmer-SVM can be used as an additional source of confidence in genomic experiments by recovering known binding sites, and can also reveal novel sequence features and possible cooperative mechanisms to be tested experimentally.
Identifies known or user-provided motifs that show a significant preference for particular locations in your nucleotide sequences. CentriMo can also show if the local enrichment is significant relative to control sequences. It is part of the MEME Suite online platform.
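One simple way to test for a positional preference is a binomial test on the number of sequences whose best motif site falls inside a central window. The sketch below only illustrates that idea under a uniform null model. The function names, the window definition and the null probability are assumptions, and it is not CentriMo's implementation.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def central_enrichment(offsets, seq_len, window):
    """Test whether best sites cluster near the sequence centre.

    offsets: distance of each sequence's best site from its centre (one per sequence).
    seq_len: common sequence length; window: width of the central region.
    Under the assumed null, a site lands in the window with probability window / seq_len.
    """
    hits = sum(1 for d in offsets if abs(d) <= window / 2)
    p_null = min(1.0, window / seq_len)
    return hits, binom_tail(len(offsets), hits, p_null)

# Toy example: four of five best sites lie within 5 bp of the centre
toy_hits, toy_p = central_enrichment([0, 2, -3, 40, 1], seq_len=100, window=10)
```

With these toy values four sites fall in the window and the tail probability is 0.00046. Running the same test on control sequences would show whether the enrichment is specific to the primary set.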
Finds sequence motifs in higher eukaryotes. CompareProspector takes advantage of comparative genomics information. It employs a Gibbs sampling method to search for motifs in the input sequences, biasing the search toward conserved regions by integrating sequence conservation into the posterior probability in the sampling process. The software identifies regulatory elements using information from both intraspecies pattern enrichment and interspecies sequence conservation.
Predicts transcription factor function. PRISM combines genome-wide conserved binding site prediction with transcription factor and binding site function prediction. The software offers an interface to explore its predictions from the perspective of transcription factors, biological roles, target genes, or target binding sites/regions. It integrates with GREAT and the UCSC Genome Browser.
Allows users to predict cell type-specific effects of single nucleotide polymorphisms (SNPs) on chromatin activity. OrbWeaver can identify trans-acting elements driving cellular differences in chromatin accessibility and can determine the effects of genetic variation in a cell type-specific manner. Moreover, it helps users detect transcription factors (TFs) with known cell type-specific effects and can ascertain the direction of effect of cell type-specific chromatin accessibility quantitative trait loci (caQTLs).
Preserves inter-position dependencies and includes flanking k-mers. KSM represents a motif as a set of aligned k-mers that are over-represented at transcription factor (TF) binding sites. This tool can be used for predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles.
A computational pipeline that extracts transcription factor binding motifs from ChIP-seq data when no reference genome is available. denovochipseq combines de novo assembly with statistical tests, enabling motif discovery without the use of a reference genome. Its performance was validated using human and mouse data. Analysis of fly data indicates that denovochipseq outperforms alignment-based methods that utilize closely related species.
Approximates expectation–maximization (EM) using suffix trees. STEME is a method that can search for motifs of different widths, although its probabilistic model only considers one width at a time. It was designed to be used on the type of large data sets typically generated by modern biological experiments. This method is an extension of the model used by MEME, another motif finder.