Differential gene expression is the fundamental mechanism underlying animal development and cell differentiation. However, comprehensively and accurately identifying the DNA sequences required to regulate gene expression, namely cis-regulatory modules (CRMs), remains a challenge. Three major features, singly or in combination, are used to predict CRMs: clusters of transcription factor binding site motifs, non-coding DNA under evolutionary constraint, and biochemical marks associated with CRMs, such as histone modifications and protein occupancy.
Recognizes biological modulators of complex gene regulation, including combinatorial regulation, in which multiple transcription factors and modulators are functionally related. GIMLET can measure all types of dependency, including non-monotonic and non-linear relationships, between random vectors of arbitrary dimension. This tool can find genetic alterations and functional perturbagens that modulate transcription factor activities.
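To make the idea of a dependency measure that handles non-monotonic relationships between vectors of arbitrary dimension concrete, here is a minimal sketch of distance correlation, one well-known statistic with that property. The description above does not state which statistic GIMLET computes, so this is an illustration of the property only, not the tool's implementation; the function names and the toy data are assumptions.

```python
import math

def _centered_distances(points):
    """Double-centered pairwise Euclidean distance matrix for a list of vectors."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation between two paired lists of vectors (tuples)."""
    n = len(xs)
    a, b = _centered_distances(xs), _centered_distances(ys)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def pearson(xs, ys):
    """Pearson correlation for two paired lists of scalars, for comparison."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

# Toy data (hypothetical): y = x^2 is fully determined by x but not monotonic.
x = [i / 10 for i in range(-10, 11)]
y = [v * v for v in x]
print("Pearson:", pearson(x, y))
print("Distance correlation:",
      distance_correlation([(v,) for v in x], [(v,) for v in y]))
```

On this symmetric parabola the Pearson correlation is essentially zero, while the distance correlation is clearly positive, which is the kind of dependency a linear measure misses.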
A leading method for binding site cluster detection that determines the significance of observed sites while correcting for local compositional bias of sequences. MSCAN is highly flexible, applying any set of input binding models to the analysis of a user-specified sequence. From the user's perspective, a key feature of the system is that no reference data sets of regulatory sequences from co-regulated genes are required to train the algorithm. The output from MSCAN consists of an ordered list of sequence segments that contain potential regulatory modules.
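The kind of output described above, an ordered list of segments that may contain regulatory modules, can be sketched with a simple sliding-window scan. This is a simplified illustration and not MSCAN's algorithm: it assumes each predicted site already has a p-value, it sums -log10(p) per window, and it does not correct for local compositional bias. The function name, window size, and step are hypothetical.

```python
import math

def rank_windows(hits, seq_len, window=200, step=50):
    """Rank fixed-size windows by the combined evidence of the sites they contain.

    hits: list of (position, p_value) pairs for predicted binding sites,
          with 0 < p_value <= 1 (assumed to be computed elsewhere).
    Returns (start, end, score) tuples, best-scoring first.
    """
    segments = []
    last_start = max(seq_len - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        # Sum -log10(p) over all sites whose position falls in [start, end).
        score = sum(-math.log10(p) for pos, p in hits if start <= pos < end)
        if score > 0:
            segments.append((start, end, score))
    segments.sort(key=lambda seg: seg[2], reverse=True)
    return segments

# Hypothetical site predictions on a 1000 bp sequence.
example_hits = [(120, 1e-4), (150, 1e-3), (180, 1e-5), (900, 1e-2)]
for start, end, score in rank_windows(example_hits, seq_len=1000)[:3]:
    print(start, end, round(score, 2))
```

Windows holding several strong sites rise to the top of the list, while an isolated weak site scores low.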
Uses a motif-based hidden Markov model to scan for clusters of motifs. Its key features include a scoring scheme based on p-values and a method for calibrating the resulting scores to obtain statistical confidence estimates. The new version of MCAST offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs.
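As a rough illustration of how p-values can be calibrated into false discovery rate estimates, the sketch below applies the standard Benjamini-Hochberg procedure to a list of p-values. The description above does not specify MCAST's own FDR estimation method, so this is a generic stand-in; the function name and the example p-values are assumptions.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

# Hypothetical p-values for four candidate motif clusters.
cluster_pvalues = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(cluster_pvalues, bh_qvalues(cluster_pvalues)):
    print(p, round(q, 4))
```

Reporting candidates whose q-value falls below a chosen threshold keeps the expected fraction of false positives among the reported clusters at or below that threshold, under the assumptions of the procedure.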
A hidden Markov model-based method for detecting regulatory regions in DNA sequences by searching for clusters of cis-elements. Cister is available through a website with a straightforward form to enter queries and an intuitive graphical display of the results. There is also a downloadable version that can be run on the command line.
Infers modules, transcription factor binding sites (TFBSs), and motif patterns from their joint posterior distribution. CisModule is built on a hierarchical mixture (HMx) model used within a Bayesian approach. It captures the spatial correlation between binding sites for a set of transcription factors (TFs). This tool exploits the colocalization of TFBSs to improve de novo motif identification.
A Java application for the rapid discovery of significant cis-regulatory elements from sets of coexpressed or coregulated genes. Biologists can automatically (i) retrieve genes and intergenic regions, (ii) identify putative regulatory regions, (iii) score sequences for known transcription factor binding sites, (iv) identify candidate motifs for unknown binding sites, and (v) detect those statistically over-represented sites that are characteristic for a gene set.
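Step (v) above, detecting sites that are statistically over-represented in a gene set, is commonly framed as a hypergeometric test. The sketch below shows that general idea only; the description does not say which test the application uses, and the function name, parameter names, and example counts are assumptions.

```python
from math import comb

def enrichment_pvalue(hits_in_set, set_size, hits_in_genome, genome_size):
    """Upper-tail hypergeometric p-value for site over-representation.

    Probability of seeing at least `hits_in_set` genes with the site when
    drawing `set_size` genes at random from `genome_size` genes, of which
    `hits_in_genome` carry the site.
    """
    upper = min(set_size, hits_in_genome)
    total = comb(genome_size, set_size)
    favourable = sum(
        comb(hits_in_genome, i) * comb(genome_size - hits_in_genome, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return favourable / total

# Hypothetical counts: 12 of 40 coexpressed genes carry the site,
# versus 300 of 6000 genes genome-wide.
print(enrichment_pvalue(12, 40, 300, 6000))
```

A small p-value suggests the site occurs in the gene set more often than expected by chance; when many sites are tested, the p-values would still need a multiple-testing correction.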