Differential gene expression is the fundamental mechanism underlying animal development and cell differentiation. However, it remains a challenge to identify, comprehensively and accurately, the DNA sequences required to regulate gene expression, namely cis-regulatory modules (CRMs). Three major features, either singly or in combination, are used to predict CRMs: clusters of transcription factor binding site motifs; non-coding DNA that is under evolutionary constraint; and biochemical marks associated with CRMs, such as histone modifications and protein occupancy.
Allows retrieval and analysis of promoters. Gene2Promoter provides access to the promoter sequences of all genes annotated in the available genomes. For a given list of input genes, the software reports each gene's chromosomal location and a short description, together with its promoters and alternative transcripts. The output can then be used for comparative analysis.
Detects cis-regulatory modules (CRMs). CREME searches for CRMs that are abundant in the promoter regions of putatively co-regulated genes. It recognizes combinations of transcription factor binding sites (TFBSs) that tend to co-occur in close proximity in the promoter regions of a specific set of genes. The search is limited to the promoter regions of RefSeq genes. CREME has been replaced by the DiRE tool (Distant Regulatory Elements).
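The idea of TFBS combinations that co-occur in close proximity across a gene set can be illustrated with a minimal sketch. This is not CREME's algorithm or scoring; the function name, the input layout (gene mapped to a list of factor/position pairs), the 50 bp default gap and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(promoter_sites, max_gap=50):
    """Count, for each pair of distinct factors, the number of promoters
    in which the two factors have sites within max_gap bp of each other.

    promoter_sites: dict mapping gene name -> list of (factor, position).
    This layout is hypothetical, chosen only for the example.
    """
    counts = Counter()
    for sites in promoter_sites.values():
        pairs_here = set()  # count each pair at most once per promoter
        for (tf_a, pos_a), (tf_b, pos_b) in combinations(sites, 2):
            if tf_a != tf_b and abs(pos_a - pos_b) <= max_gap:
                pairs_here.add(tuple(sorted((tf_a, tf_b))))
        counts.update(pairs_here)
    return counts


# Toy input (invented): three promoters with predicted sites.
toy_sites = {
    "gene1": [("TF_A", 100), ("TF_B", 120), ("TF_C", 400)],
    "gene2": [("TF_B", 30), ("TF_A", 60)],
    "gene3": [("TF_A", 10), ("TF_B", 300)],
}
pair_counts = cooccurring_pairs(toy_sites)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

A real analysis would additionally compare these counts against a background set of promoters to decide which combinations are over-represented; that step is omitted here.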
A leading method for binding site cluster detection that determines the significance of observed sites while correcting for local compositional bias of sequences. MSCAN is highly flexible, applying any set of input binding models to the analysis of a user-specified sequence. From the user's perspective, a key feature of the system is that no reference data sets of regulatory sequences from co-regulated genes are required to train the algorithm. The output from MSCAN consists of an ordered list of sequence segments that contain potential regulatory modules.
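To make the notion of "significance of observed sites" in a window concrete, the following sketch scores fixed windows by a Poisson tail probability. It is a simplified stand-in and not MSCAN's actual statistic: the single background rate per base pair, the non-overlapping windows and all names are assumptions, and a composition-aware method would instead estimate the expected hit rate locally.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def rank_windows(hit_positions, seq_len, window=50, rate_per_bp=0.01):
    """Return (start, n_hits, p_value) for non-overlapping windows,
    ordered from most to least significant.

    rate_per_bp is an assumed background rate of chance motif matches.
    """
    lam = rate_per_bp * window  # expected hits per window under the null
    results = []
    for start in range(0, seq_len - window + 1, window):
        n_hits = sum(start <= p < start + window for p in hit_positions)
        results.append((start, n_hits, poisson_tail(n_hits, lam)))
    return sorted(results, key=lambda r: r[2])


# Toy input (invented): four predicted sites in a 400 bp sequence.
ranked = rank_windows([10, 12, 15, 300], seq_len=400)
for start, n_hits, p in ranked[:3]:
    print(start, n_hits, round(p, 4))
```

The ordered output mirrors the kind of ranked list of candidate segments described above, with the densest window listed first.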
A hidden Markov model based method for detecting regulatory regions in DNA sequences, by searching for clusters of cis-elements. Cister is available through a website with a straightforward form to enter queries, and an intuitive graphical display of the results. There is also a downloadable version that can be run on the command line.
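As an illustration of how a hidden Markov model can pick out clusters of cis-elements, here is a toy two-state model (background versus cluster) decoded with the Viterbi algorithm. It is a sketch under stated assumptions, not Cister's model: the observations are reduced to a per-position flag (1 if a motif match starts there, 0 otherwise), and every probability below is invented.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most probable state path for a discrete HMM, computed in log space.

    start[s]: initial probability of state s
    trans[p][s]: probability of moving from state p to state s
    emit[s][o]: probability that state s emits symbol o
    """
    n = len(start)
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda p: prev[p] + math.log(trans[p][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# State 0 = background, state 1 = cluster. All values are assumptions.
start = [0.9, 0.1]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.95, 0.05],  # background rarely shows a motif match
        [0.40, 0.60]]  # cluster positions often do
obs = [0] * 5 + [1, 1, 0, 1, 1] + [0] * 5
path = viterbi(obs, start, trans, emit)
print("".join(str(s) for s in path))
```

Because leaving a state is penalized, the isolated 0 inside the run of matches stays within the cluster state instead of splitting it, which is the behaviour that makes HMMs attractive for this task.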
Uses a motif-based hidden Markov model to scan for clusters of motifs. Its key features include a scoring scheme based on p-values and a method for calibrating the resulting scores to obtain statistical confidence estimates. The new version of MCAST offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. The authors demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs.
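For readers unfamiliar with confidence estimates based on false discovery rate, the sketch below converts a list of p-values into q-values with the Benjamini-Hochberg procedure. This is a standard generic method swapped in for illustration; it is not claimed to be the estimator MCAST uses, and the function name and example p-values are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented p-values for four candidate clusters.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(q, 4) for q in qvals])
```

Reporting the candidates whose q-value falls below a chosen threshold (for example 0.05) bounds the expected fraction of false positives among them, which is the sense in which such scores act as confidence estimates.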
Discovers cis-regulatory modules (CRMs) and their component motifs simultaneously in groups of orthologous sequences from multiple species. Compared with alignment-based motif discovery methods such as PhyME and PhyloGibbs, this approach has two unique features: (i) module information is incorporated through a hidden Markov model; (ii) the multiple alignments of orthologous sequences are dynamically updated, so that the uncertainty in the alignments is taken into account.