

Provides a useful tool for motif identification and analysis complementary to the existing tools. BoBro is a method that (i) can reliably identify statistically significant cis-regulatory motifs at a genome scale; (ii) provides a reliable way for optimizing the sequence-similarity cutoff in genome-scale motif scanning; (iii) has a reliable capability to compare and cluster motifs and (iv) can identify transcription factors (TFs) that may jointly regulate genes through identification of the co-occurrences of their cis-regulatory motifs.


A tool developed for DNase I hypersensitive sites (DHSs) identification. Popera identifies DHSs by applying the kernel density estimation algorithm. All the DHSs identified from various tissues and developmental stages were merged to create a unified DHS file; DHSs were then assigned unique ID tags. Normalized scores of unified DHSs were calculated for each unique tissue and developmental stage.
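As an illustration of the kernel density estimation idea mentioned above, the following minimal sketch smooths a set of read positions with a Gaussian kernel and reports the intervals where the density stays above a cutoff. It is not Popera's implementation: the function names, the Gaussian kernel, the bandwidth, the threshold and the toy positions are all assumptions made for this example.

```python
import math

def gaussian_kde_density(positions, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `positions` at each grid point."""
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        density.append(norm * total)
    return density

def call_regions(grid, density, threshold):
    """Return (start, end) grid intervals where the density is at or above threshold."""
    regions = []
    start = None
    for i, d in enumerate(density):
        if d >= threshold and start is None:
            start = grid[i]                      # a new candidate region opens here
        elif d < threshold and start is not None:
            regions.append((start, grid[i - 1]))  # region closed at the previous point
            start = None
    if start is not None:
        regions.append((start, grid[-1]))
    return regions

# Hypothetical toy data: two clusters of read positions on a 600 bp stretch.
positions = [100, 105, 110, 112, 500, 505, 508]
grid = list(range(0, 601, 10))
density = gaussian_kde_density(positions, grid, bandwidth=20.0)
regions = call_regions(grid, density, threshold=0.004)
print(regions)
```

With these toy settings the two clusters of positions come out as two separate candidate regions; on real data the bandwidth and threshold would have to be chosen for the assay and sequencing depth.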

CONCISE / CONvolutional neural networks for CIS-regulatory Elements

Provides a spline transformation implemented as a Keras layer. CONCISE is a Python package that (i) pre-processes sequence-related data by converting a list of sequences into a one-hot-encoded NumPy array or tokens, (ii) specifies a Keras model with additional modules, (iii) tunes hyper-parameters and provides convenience functions for working with the hyperopt package, (iv) interprets the model and (v) shares and re-uses models.
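The one-hot encoding step in (i) can be sketched in plain Python as follows. This is not the CONCISE API: the function name `one_hot_encode`, the A/C/G/T column order and the all-zero treatment of unknown bases are assumptions chosen for illustration.

```python
ALPHABET = "ACGT"  # assumed column order for this sketch

def one_hot_encode(sequences, alphabet=ALPHABET):
    """Encode each sequence as a list of 0/1 rows, one row per base."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for seq in sequences:
        rows = []
        for base in seq.upper():
            row = [0] * len(alphabet)
            if base in index:          # unknown bases such as N stay all-zero
                row[index[base]] = 1
            rows.append(row)
        encoded.append(rows)
    return encoded

print(one_hot_encode(["ACGT", "aacn"]))
```

For sequences of equal length, the nested list can be passed to `numpy.array` to obtain an array of shape (number of sequences, sequence length, 4).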

MCAST / Motif Cluster Alignment and Search Tool

Uses a motif-based hidden Markov model to scan for clusters of motifs. Its key features include a scoring scheme based on p-values and a method for calibrating the resulting scores to obtain statistical confidence estimates. The new version of MCAST offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST's statistical confidence estimates and the utility of epigenomic priors in identifying CRMs.
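To give a sense of how p-values can be turned into false-discovery-rate confidence estimates, here is a minimal sketch of the generic Benjamini-Hochberg adjustment. The description above does not specify MCAST's own estimation procedure, so this is a swapped-in standard technique, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values are monotone in the p-value ranking.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical p-values for four candidate motif clusters.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(q)
```

Candidates whose q-value falls below a chosen level (for example 0.05) would then be reported at that estimated false discovery rate.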

INSECT / IN-silico SEarch for Co-occurring Transcription factors

Analyzes genomic sequence data for in silico prediction and analysis of cis-regulatory modules (CRMs). INSECT is a web server that allows a complete and flexible analysis of the predicted co-regulating transcription factors (TFs) and transcription factor binding sites (TFBSs). The software integrates many different search options and additional results such as automatic retrieval of regulatory sequences from Ensembl, phylogenetic footprinting, nucleosome occupancy calculations, and gene ontology (GO) information.


A probabilistic modeling method for predicting cis-regulatory modules (CRMs) that builds a more powerful CRM discovery model based on an HSMM (hidden semi-Markov model). SMCis characterizes the regulatory structure of CRMs and effectively models dependencies between motifs at a higher level of abstraction based on segments rather than nucleotides. SMCis has the following advantages: the level of abstraction at sequence segments rather than single nucleotides makes the model representations more natural, and we can build an individual model for each type of segment (corresponding to the states of the HSMM model).

EDCC / Exploration of Distinctive CREs and CRMs

Examines cis-regulatory modules (CRMs) and cis-regulatory elements (CREs). EDCC evaluates positional preferences of the single CREs in relation to each other and to the transcriptional start site. This tool makes use of three initial data sets: gene expression data, promoter sequences of the respective genes, and a list of CREs and CRMs described by the user. It correlates candidate CREs and CRMs with gene expression patterns and compares them with the expression pattern of all genes.


Identifies conserved transcription factor binding sites (TFBSs) in sequence alignments from multiple related species. MONKEY extends probabilistic models of binding specificity to multiple species with probabilistic models of evolution. This tool is useful for comparative sequence analysis, can handle relatively large numbers of related species, and enables the examination of several important questions in comparative genomics. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.


A leading method for binding site cluster detection that determines the significance of observed sites while correcting for local compositional bias of sequences. MSCAN is highly flexible, applying any set of input binding models to the analysis of a user-specified sequence. From the user's perspective, a key feature of the system is that no reference data sets of regulatory sequences from co-regulated genes are required to train the algorithm. The output from MSCAN consists of an ordered list of sequence segments that contain potential regulatory modules.


A flexible Hidden Markov Model (HMM) framework capable of predicting nuclear hormone receptor (NHR) binding sites in genomic sequences. NHR-scan supports Viterbi and Forward algorithm scoring, user-defined thresholds and multiple output formats. For convenience, gene sequences can be retrieved from the EnsEMBL database by user-supplied genome coordinates for any species with EnsEMBL annotations. Because several research groups are focused on particular NRs and may have supplemental data, the option to directly modify the HMM emission and transition parameters is available. The NHR-scan model is implemented in a web interface that is freely available to academic researchers.
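The Viterbi scoring mentioned above can be illustrated with a minimal log-space decoder for a toy two-state HMM. This is not the NHR-scan model: the state names, the start, transition and emission probabilities, and the test sequences are hypothetical values chosen only to show the algorithm. The sketch assumes a non-empty sequence over A, C, G, T and no zero probabilities.

```python
import math

def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `sequence` (log-space Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][sequence[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(sequence)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][sequence[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for t in range(len(sequence) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy parameters: a uniform background state and a G/T-rich "site" state.
STATES = ("bg", "site")
START = {"bg": 0.9, "site": 0.1}
TRANS = {"bg": {"bg": 0.9, "site": 0.1}, "site": {"bg": 0.2, "site": 0.8}}
EMIT = {
    "bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "site": {"A": 0.05, "C": 0.05, "G": 0.45, "T": 0.45},
}

print(viterbi("GGGGGGGG", STATES, START, TRANS, EMIT))
```

Editing the entries of `TRANS` and `EMIT` in this sketch is the toy analogue of modifying the HMM emission and transition parameters directly.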

CORECLUST / COnservative REgulatory CLUster STructure

Predicts cis-regulatory modules (CRMs) based on known position weight matrices (PWMs). CORECLUST uses an HMM-based technique to predict CRMs given known motifs for a set of system-specific transcription factors (TFs). Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a model describing conserved rules of relative location of TF binding sites (CRM structure). The constructed model may be used for CRM prediction, as well as for the investigation of the regulatory grammar of the system of interest.
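As background on how a PWM is typically used to locate candidate binding sites, the following sketch converts base counts into log-odds scores and scans a sequence with them. It shows only generic PWM scanning, not CORECLUST's HMM-based CRM model. The function names, the uniform background, the pseudocount, the toy TATA-like counts and the threshold are assumptions for this example, and the sequence is assumed to contain only A, C, G and T.

```python
import math

def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((offset, score))
    return hits

# Hypothetical counts for a 4 bp TATA-like motif built from 8 aligned sites.
counts = [{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}]
pwm = log_odds_matrix(counts)
hits = scan("GGTATAGG", pwm, threshold=5.0)
print(hits)
```

The offsets returned by such a scan are the kind of per-motif site predictions that a module-level model can then combine.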

CREAM / Clustering of genomic REgions Analysis Method

Provides a systematic approach for identifying Clusters Of cis-Regulatory Elements (COREs). CREAM is an unsupervised machine learning approach that takes into account the distribution of distances between cis-regulatory elements (CREs) in a given biological sample. It can offer a research model for identifying personalized therapeutics in clinical cancer settings. This method can be used to further characterize cis-regulatory landscapes of cells.

DECRES / DEep learning for identifying Cis-Regulatory ElementS

Based on the Deep Learning Tutorials and Theano. DECRES was developed for the identification of CREs. This tool contains a supervised deep model, a feedforward neural network (also known as a multilayer perceptron or MLP), for the detection of regulatory regions. A utility module for classification also includes normalization methods, class and feature pre-processing functions, post-processing functions, and visualizations of classification results.

CompMoby / Comparative MobyDick

Identifies de novo cis-regulatory elements functioning at the transcriptional as well as the post-transcriptional level in metazoans. CompMoby is a general cis-regulatory motif discovery algorithm that can be used to identify motifs enriched in intergenic and 3' untranslated region (UTR) sequences. It integrates species-specific and evolutionary conservation information and formats the output files to systematically identify over-represented putative transcription factor binding sites (TFBSs) in upstream sequences.

PRECISE / Prediction of REgulatory CIS-acting Elements

Predicts cis-acting elements. PRECISE scans the promoter regions of a user-selected set of genes to identify motifs that are likely to be involved in gene regulation. The workflow has two parts: (1) the creation of a reference set, which is needed for significance assessment and has to be done only once per organism, and (2) the scanning of a selected set of promoter sequences for motifs that appear with high frequency, which can be repeated many times with different inputs and settings.


Identifies cis-regulatory modules (CRMs) independent of the information about the transcription factors that regulate a target gene. CisPlusFinder may provide an accurate tool for biologists to uncover the regulation of individual genes by allowing the user to tailor the set of informant species to the local substitution rate and set parameters appropriate to the complexity of regulation of the target gene. Applied to a benchmark dataset of CRMs involved in early Drosophila development, CisPlusFinder predicts more annotated CRMs than all other methods tested.

COMET / Cluster Of Motifs E-value Tool

A method to predict the regulation of genes by transcription factors. COMET finds statistically significant clusters of motifs in a DNA sequence. The Web version of COMET allows users to select motifs from a small library or to enter matrices directly. COMET performs comparably with two alternative state-of-the-art techniques, which are more complex and lack E-value calculations. This statistical method helps clarify the major bottleneck in the hard problem of detecting cis-regulatory regions, which is that many known enhancers do not contain very significant clusters of the motif types searched. Thus, discovery of additional signals that belong to these regulatory regions will be the key to future progress.

DHC-MEGE / DNase Hypersensitivity Connectivity Motif Enrichment in GeneExpression

A program for the identification of enriched motifs in genes found to be differentially expressed in a microarray or RNA-seq experiment. DHC-MEGE is able to identify distal elements such as enhancers, which are often overlooked with standard promoter motif analysis. This program uses lists of up- and down-regulated gene symbols for motif analysis. It also requires a DNase Hypersensitivity (DH) connectivity map, which describes the interaction between a promoter and a distal element together with the connection correlation coefficient.


Discovers cis-regulatory modules (CRMs) and their component motifs simultaneously in groups of orthologous sequences from multiple species. Compared to alignment-based motif discovery methods such as PhyME and PhyloGibbs, our approach has two unique features: (i) we consider module information through a hidden Markov model; (ii) the multiple alignments of orthologous sequences are dynamically updated, so that the uncertainty in the alignments is taken into account.

Genome Surveyor

Predicts transcription factor (TF) binding targets and cis-regulatory modules (CRMs), based on motifs representing experimentally determined DNA binding specificities. Genome Surveyor displays genome browser tracks that profile matches to individual motifs or user-selected combinations of motifs, based on sequence information from a single genome or a combination of genomes. It also provides tracks for supervised CRM prediction, driven by a user-selected subset of known CRMs from the REDfly database. Users select the type of binding site profiles that will be used for the search. Next, they may choose to scan the entire genome or provide a list of genomic loci where the search will be performed.