FIRE / Finding Informative Regulatory Elements

Infers motifs from gene expression data. FIRE allows the discovery of DNA and RNA motifs that show generic dependency with diverse aspects of gene expression. It takes into account differential expression observed in single microarray experiments, cluster indices associated with co-expressed genes, expression phases in a periodic time-series, spatial patterns of gene expression from in situ hybridization experiments, and classification of enhancers based on genetic/biochemical data.

Ori-Finder 1

Identifies replication origins (oriCs) in bacterial genomes. Ori-Finder 1 is an online system based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, the distribution of DnaA boxes, and the occurrence of genes frequently close to oriCs. The program can also handle unannotated sequences by integrating the gene-finding program ZCURVE 1.02. Users can define their own DnaA boxes or origin recognition box (ORB) elements.
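
As an illustration of the base composition asymmetry analysis mentioned above, here is a minimal Python sketch of the standard Z-curve transform and the G-C disparity curve derived from it. It is not Ori-Finder's code: the function names `z_curve` and `gc_disparity` are hypothetical, and the sketch assumes the usual Z-curve definitions (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding).

```python
def z_curve(seq):
    """Return the cumulative Z-curve components (x, y, z) of a DNA string.

    x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
    y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
    z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    where A_n, C_n, G_n, T_n count each base in the first n positions.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base in counts:  # ambiguous bases such as N leave the counts unchanged
            counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        xs.append((a + g) - (c + t))
        ys.append((a + c) - (g + t))
        zs.append((a + t) - (c + g))
    return xs, ys, zs


def gc_disparity(seq):
    """Cumulative excess of G over C, obtained as (x_n - y_n) / 2."""
    xs, ys, _ = z_curve(seq)
    # x - y equals 2 * (G_n - C_n), so the integer division is exact.
    return [(x - y) // 2 for x, y in zip(xs, ys)]
```

In many bacterial genomes an extremum of such a disparity curve lies near the replication origin, so `d = gc_disparity(genome)` followed by `d.index(min(d))` gives a rough candidate position. A real predictor would combine this with DnaA box locations and nearby genes, as the description above indicates.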

SCOPE / Suite for Computational identification Of Promoter Elements

An ensemble of programs aimed at identifying novel cis-regulatory elements from groups of upstream sequences. The SCOPE motif finder uses three programs behind the scenes to identify different kinds of motifs: BEAM identifies non-degenerate motifs (e.g. ACGTGC), PRISM identifies degenerate motifs (e.g. AWCGRYH), and SPACER identifies bipartite motifs (e.g. ACCNNNNNNNNNGTT). All parameters are set automatically to find the optimal motif length and degree of degeneracy in the reported motifs.
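
The three motif classes mentioned above are all written in the IUPAC nucleotide alphabet. The following Python sketch shows how such a motif can be located in a sequence by translating it to a regular expression. It only illustrates the notation and does not reproduce BEAM, PRISM, or SPACER; the helper name `find_motif` is hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_motif(motif, seq):
    """Return 0-based start positions of all matches, including overlaps."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=(" + body + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For example, `find_motif("AWCGRYH", seq)` handles a degenerate motif, and `find_motif("ACCNNNNNNNNNGTT", seq)` handles a bipartite motif, because each `N` simply matches any base in the spacer.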


iROS-gPseKNC

Incorporates dinucleotide position-specific propensity information into the general pseudo nucleotide composition and uses a random forest classifier. iROS-gPseKNC can be used to identify replication origin sites based on DNA sequence information alone. Its authors report better sensitivity, specificity, overall accuracy, and stability than the best previously existing method. Users can obtain results without needing to work through the underlying mathematics.

MARS / Motif Assessment and Ranking Suite

A web server hosting a suite of tools that make position weight matrix (PWM) motif evaluation and ranking techniques accessible. MARS is supported by a database of benchmark data and PWM models. This database includes the corresponding experimental ChIP-seq and protein binding microarray (PBM) data obtained from the ENCODE and UniPROBE databases, respectively. The tools implemented in MARS include a data-independent consistency-based motif assessment and ranking (CB-MAR) method, which is based on the idea that ‘correct motifs’ are more similar to each other while incorrect motifs will differ from each other, and scoring- and classification-based algorithms, which rank binding models by their ability to discriminate sequences known to contain binding sites from those without.

FIDDLE / Flexible Integration of Data with Deep LEarning

A framework that learns a unified rich representation by exploiting synergistic interactions within and across datasets. FIDDLE comprises ConvNet modules for individual datasets and combines them under a common scaffold to form a unified data representation for dataset inference. The FIDDLE representation can be used to infer a particular dataset. Given the cost of genome-wide datasets and the widespread need for them, the framework has broad potential applications. All that is required for predicting a dataset is to constrain the model by aligning the representations of inputs within a context specified by the data to be predicted. Different representations can be learned by specifying different contexts. For example, FIDDLE could be asked to predict transcription factor binding sites using the same input datasets. In this case, the model would learn a different unified representation through the new constraints. These representations can then be used in a number of flexible ways, such as to infer causality between genomic features or to transfer the relationship to another domain, such as another species or cell type where the target dataset is not available.

DeFCoM / Detecting Footprints Containing Motifs

A supervised-learning-based footprint prediction framework. DeFCoM was designed to capture variation in DNaseI signal within active footprints and unbound motif sites to enhance footprint classification accuracy, a consideration unaccounted for in previous footprinters. From a set of motif sites labeled as active or inactive for a given transcription factor in a given cell type or experimental condition, a Support Vector Machine (SVM) classifier is trained on features derived from DNase-seq data from the same cell type for each motif site. DeFCoM captures the complexity of the data when necessary with the Radial Basis Function (RBF) kernel, and avoids over-fitting, a common problem in supervised learning, by choosing the linear kernel when that complexity is lacking.

PC-TraFF / Potentially Collaborating Transcription Factor finder

Detects interactions between homotypic and heterotypic transcription factor (TF) pairs using pointwise mutual information (PMI). PC-TraFF builds all possible transcription factor binding site (TFBS) pairs and calculates their weighted pointwise mutual information scores. It applies the average product correction, which reduces the effect of false positive TFBSs. This tool is able to predict additional pairs that are likely to play a critical role in the gene regulatory network (GRN).
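
To make the pointwise mutual information score concrete, here is a minimal Python sketch of unweighted PMI for heterotypic TF pairs. It is a simplification under stated assumptions and not PC-TraFF's implementation: probabilities are estimated as the fraction of sequences containing a site for each TF, the weighting and average product correction are omitted, and the function name `pmi_scores` and the input layout are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(site_sets):
    """Compute PMI for every TF pair that co-occurs in at least one sequence.

    site_sets: one collection of TF names per sequence, listing the TFs
    with a predicted binding site in that sequence.
    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) )
    """
    n = len(site_sets)
    single = Counter()
    pair = Counter()
    for tfs in site_sets:
        tfs = set(tfs)  # presence/absence only; homotypic pairs are not modelled
        single.update(tfs)
        pair.update(combinations(sorted(tfs), 2))
    scores = {}
    for (a, b), n_ab in pair.items():
        p_ab = n_ab / n
        scores[(a, b)] = math.log2(p_ab / ((single[a] / n) * (single[b] / n)))
    return scores
```

A positive score means the two TFs co-occur more often than expected if their sites were independent, a score near zero means roughly independent occurrence, and a negative score means they co-occur less often than expected.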


qPMS7

Tackles the quorum planted motif search (qPMS) problem on real data as well as challenging instances. qPMS7 is a framework for transcription factor binding site (TFBS) discovery that was developed for the qPMS problem and tested on DNA as well as protein sequences. The method combines an extension of the algorithm qPMSPrune with the core idea of the algorithm PMS5. The algorithm is also implemented in a web app. The reported results show qPMS7 outperforming qPMSPruneI and qPMSPrune in all tested cases.

iFORM / incorporating Find Occurrence of Regulatory Motifs

Analyses DNA sequences with transcription factor (TF) motifs described as position weight matrices (PWMs). iFORM achieves higher accuracy and sensitivity by integrating five classical motif discovery programs using Fisher’s combined probability test. iFORM provides accurate results on a variety of data from the ENCODE Project and the NIH Roadmap Epigenomics Project, and it demonstrates its utility in further elucidating the individual roles of functional elements in the mechanisms of transcriptional regulation and human disease.
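
Fisher's combined probability test, mentioned above, merges k independent p-values into one statistic, X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The following Python sketch shows the calculation only. How iFORM obtains and pairs the per-program p-values is not described here, so the function name `fisher_combined` and any inputs are hypothetical.

```python
import math


def fisher_combined(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    and combined_p is the chi-square upper tail with 2k degrees of freedom.
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total
```

Calling `fisher_combined` on a list of five p-values, one per scanning program for the same candidate site, would yield a single combined p-value. The method assumes the individual tests are independent, which is only an approximation when the programs share the same PWM and sequence.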


Sequence2Vec

Models the transcription factor binding affinity landscape. Sequence2Vec combines the strength of probabilistic graphical models, feature space embedding, and deep learning. It represents DNA binding sequences as a hidden Markov model (HMM). This tool works with a wide range of in vivo and in vitro data sets by providing a generic recipe to deal with many structured data sets in computational biology, such as protein sequences, drug molecules, or even molecular dynamic trajectories.

MOODS / Motif Occurrence Detection Suite

Performs motif matching, including support for high-order PWMs and variants in sequences. MOODS is a suite of algorithms for matching position weight matrices (PWMs) against DNA sequences, featuring advanced matrix matching algorithms that can be used to scan hundreds of matrices against chromosome-sized sequences in a few seconds. MOODS has been designed with integration into large-scale Python workflows in mind, but can also be used as a stand-alone analysis tool.
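
For readers unfamiliar with PWM matching, the following Python sketch shows the basic task: scoring every window of a sequence against a log-odds matrix and reporting windows at or above a threshold. This is a naive scan written for illustration and does not use the MOODS API or its accelerated algorithms. The function names, the pseudocount, and the uniform background are assumptions.

```python
import math


def log_odds_pwm(counts, pseudocount=1.0, background=None):
    """Turn per-position base counts into a log2-odds matrix.

    counts: list of dicts, one per motif position, mapping base to count.
    """
    bg = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / bg[b])
            for b in "ACGT"
        })
    return pwm


def scan(pwm, seq, threshold):
    """Return (position, score) for each window scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(ch not in "ACGT" for ch in window):
            continue  # skip windows containing ambiguous bases
        score = sum(col[ch] for col, ch in zip(pwm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

This scan costs time proportional to sequence length times motif width for each matrix. Dedicated tools avoid much of that work, which is what makes scanning hundreds of matrices against chromosome-sized sequences practical.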


Supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in Oryza sativa and Zea mays.


regshape

A transcription factor (TF)-generalized classifier based on local DNA shape parameters that improves PWM-based transcription factor binding site (TFBS) prediction. regshape predicts whether a short (8–32 bp) DNA sequence from the noncoding genome is a TFBS for any TF, or whether it is a non-binding-site sequence. This generic classifier is based on a novel procedure for extracting sequence length-independent features from bp-level DNA shape parameters within the binding site.

TRAP / Transcription factor Affinity Prediction

Determines the total affinity of a sequence for a given transcription factor, thus removing the need for a threshold value. TRAP ranks all promoter sequences of a genome on the basis of their overall affinity for that factor. It can be used to identify the factor with the highest affinity for a given sequence, the sequences with the highest affinity for a factor of interest, or the binding sites of a factor that are affected by given single nucleotide polymorphisms (SNPs).

Melina II

A web-based tool for promoter analysis. Melina II shows potential DNA motifs in promoter regions with a combination of several available programs, Consensus, MEME, Gibbs sampler, MDscan and Weeder, as well as several parameter settings. It allows running a maximum of four programs simultaneously, and comparing their results with graphical representations. In addition, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, S. cerevisiae, E. coli, B. subtilis or A. thaliana) or to public motif databases (JASPAR or DBTBS) in order to find similar motifs.


XXmotif

A de novo motif discovery method that is able to directly optimize the statistical significance of PWMs. XXmotif can also score conservation and positional clustering of motifs. The XXmotif server provides (i) a list of significantly overrepresented motif PWMs with web logos and E-values; (ii) a graph with color-coded boxes indicating the positions of selected motifs in the input sequences; (iii) a histogram of the overall positional distribution for selected motifs and (iv) a page for each motif with all significant motif occurrences, their P-values for enrichment, conservation and localization, their sequence contexts and coordinates.

TELS / Transcribed Enhancer Landscape Search

Identifies predictive short motif signatures of transcribed enhancers (TrEn). TELS is a machine-learning algorithm that applies logistic regression (LR) coupled with dimensionality reduction techniques to identify systematically the most informative combinations of short sequence motifs of TrEn in the human genome. The software first identifies candidate combinations of sequence motifs that characterize the class of interest and then assesses, for every candidate combination of motifs, its significance.


ExactSearch

Facilitates searching for DNA motifs in custom sequences or in the proximal promoters or 3′ UTRs of 50 genome-sequenced plant species using a suffix-tree algorithm. ExactSearch is a user-friendly web-based server for efficiently finding already known degenerate motif sequences in a large number of plant proximal gene sequences. To use it, users need only a web browser and access to an email account.

Ori-Finder 2

Predicts replication origins (oriCs) in archaeal genomes. Ori-Finder 2 is a web server that uses an integrated method to automatically predict the replication origins in archaeal genomes, including disparity analysis using the Z-curve method, the distribution of origin recognition boxes (ORBs) identified with the FIMO tool, and the occurrence of genes frequently close to replication origins. The software can also analyze unannotated complete genomes using two embedded gene-finding programs, Zcurve and Glimmer, for gene identification.