Enhancers are short regulatory DNA elements. They can be bound by proteins (activators) to activate transcription of a gene, and hence play a critical role in promoting gene transcription in eukaryotes. With the avalanche of DNA sequences generated in the post-genomic age, developing computational methods that identify enhancers in a timely manner from extremely complicated DNA sequences is a challenging task.
An integrative genomics method for the prediction of regulatory features and cis-regulatory modules in Human, Mouse, and Fly. i-cisTarget enables users: (i) to detect transcription factor motifs in a set of peaks (e.g. differentially active peaks based on H3K27ac ChIP-seq between two conditions) or co-expressed genes; (ii) to detect over-represented in vivo features (histone modifications, TF ChIP-seq, DHS, FAIRE) for gene signatures or peaks, where these regulatory features help to improve motif discovery and candidate target gene prediction; and (iii) to dissect a set of co-expressed genes into direct target genes of different transcription factor motifs or ChIP-seq tracks. Some of the key features of i-cisTarget are: (i) over-represented motifs are predicted in the set of co-expressed genes, using entire intergenic and intronic sequences; (ii) 10 vertebrate species are used for motif scoring in the Human and Mouse versions, and 12 Drosophila species are used in the Drosophila version.
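To make the idea of an "over-represented" motif in a set of co-expressed genes concrete, the sketch below computes a one-sided hypergeometric p-value. This is a generic enrichment test swapped in for illustration; it is not i-cisTarget's own scoring procedure, and the function name and parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_motif, set_size, hits):
    """Probability of seeing at least `hits` motif-containing genes when
    `set_size` genes are drawn without replacement from `total_genes`,
    of which `genes_with_motif` carry the motif.

    Hypothetical helper: a textbook hypergeometric tail, not the
    i-cisTarget algorithm.
    """
    upper = min(genes_with_motif, set_size)
    denominator = comb(total_genes, set_size)
    tail = sum(
        comb(genes_with_motif, x) * comb(total_genes - genes_with_motif, set_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / denominator


# Toy numbers (made up): 10 genes in the genome, 4 carry the motif,
# and 2 of the 3 co-expressed genes carry it.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A small p-value would suggest that the motif occurs in the gene set more often than expected by chance; with real data, correction for testing many motifs would also be needed.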
An integrated method for predicting enhancer targets. Leveraging abundant omics data, we develop multiple features and integrate them probabilistically to make robust predictions of enhancer–promoter (EP) pairs. The selected features are based on our current understanding of enhancer structure, function, and evolution. Using both computational and experimental validations, we show IM-PET significantly outperforms state-of-the-art methods.
Detects functional sequence elements in regulatory DNA. gkmSVM is trained to detect regulatory DNA elements active in a cell type by using chromatin-accessible DNA sequences as the positive set and a set of GC-matched inaccessible DNA sequences as the negative set. It can then score and predict the accessibility of any DNA sequence. The software contains several sequence kernels, such as kmer-SVM, the mismatch kernel and the wildcard kernel.
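The GC-matched negative set described above can be illustrated with a minimal sketch. The greedy matching strategy, the function names and the toy sequences are all assumptions made for illustration; this is not the procedure implemented in gkmSVM.

```python
def gc_content(seq):
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pick_gc_matched(positives, candidates):
    """For each positive sequence, greedily pick the unused candidate
    whose GC content is closest (hypothetical helper, illustration only).
    """
    remaining = list(candidates)
    negatives = []
    for pos in positives:
        if not remaining:
            break  # ran out of candidate negatives
        target = gc_content(pos)
        best = min(remaining, key=lambda s: abs(gc_content(s) - target))
        remaining.remove(best)
        negatives.append(best)
    return negatives


# Toy sequences (made up): accessible regions and inaccessible candidates.
positives = ["GCGCGCAT", "ATATATGC"]
candidates = ["AAAATTTT", "GGCCGGAA", "ATATGCAT", "CCCCGGGG"]
negatives = pick_gc_matched(positives, candidates)
```

Matching GC content keeps a classifier from separating the two sets on base composition alone, so that it has to rely on the sequence features of interest.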
A random-forest-based algorithm that integrates histone modification profiles to identify enhancers, and that has been used to identify enhancers in a number of cell types. RFECS not only leads to more accurate and precise prediction of enhancers than previous methods, but also helps identify the most informative and robust set of three chromatin marks for enhancer prediction.
Allows users to determine enhancers based on chromatin features in different cell types. EP-DNN consists of a deep neural network-based global enhancer prediction algorithm. It enables researchers to detect enhancers in two distinct cell types, namely the human embryonic stem cell type (H1) and a differentiated primary lung fibroblast cell line (IMR90).
Allows the seamless integration of feature data from a variety of experimental techniques and biological contexts that have previously been used individually to predict enhancers. One motivation for developing EnhancerFinder was to explore whether combining previously successful approaches to enhancer prediction would improve performance. EnhancerFinder's integration of diverse types of data from different cellular contexts significantly improves prediction of validated enhancers over approaches based on a single context or type of data. Applying EnhancerFinder to the entire human genome allowed us to predict more than 80,000 developmental enhancers, with tissue-specific predictions for brain, limb, and heart.
A tool suite designed to aid in analysis of next-generation sequencing (NGS) data. kmer-SVM uses a support vector machine (SVM) with kmer sequence features to identify predictive combinations of short transcription factor binding sites which determine the tissue specificity of the original NGS assay. Information gained from kmer-SVM can be used as an additional source of confidence in genomic experiments by recovering known binding sites, and can also reveal novel sequence features and possible cooperative mechanisms to be tested experimentally.
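As a rough illustration of k-mer sequence features of the kind an SVM could be trained on, the sketch below counts k-mers in a sequence and compares two sequences with a linear kernel. Collapsing each k-mer with its reverse complement is an assumption made here so that the features do not depend on strand; the function names are hypothetical and this is not kmer-SVM's code.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one key
    (the lexicographically smaller of the two)."""
    return min(kmer, reverse_complement(kmer))


def kmer_features(seq, k):
    """Count canonical k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def linear_kernel(a, b):
    """Dot product of two k-mer count vectors."""
    return sum(count * b[kmer] for kmer, count in a.items())


features = kmer_features("ACGTT", 2)
similarity = linear_kernel(features, kmer_features("AACGT", 2))
```

In this toy example "AACGT" is the reverse complement of "ACGTT", so both sequences yield the same feature vector.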