De novo motif discovery software tools | ChIP sequencing data analysis

De novo motif discovery is a difficult computational task. Historically, dedicated algorithms always reported a high percentage of false positives. Their performance did not improve considerably even after they adapted to handle large amounts of chromatin immunoprecipitation sequencing (ChIP-Seq) data.

Source text:
(Lihu and Holban, 2015) A review of ensemble methods for de novo motif discovery in ChIP-Seq data. Brief Bioinform.

Weeder
Falls into the motif enumeration family of motif discovery tools, in which occurrences of motifs in the query sequences are counted and, in this case, compared to a pre-calculated set of genome-specific background motifs. This has the benefit of not having to construct a background set of sequences (no easy task). Weeder was initially used to identify common motifs in defined promoter regions, but evolved to consider first ChIP-chip and then ChIP-seq data.
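The enumeration idea can be illustrated with a minimal Python sketch. It is not Weeder's actual algorithm (which, among other things, tolerates mismatches); it only counts exact k-mers in the query sequences and compares their observed frequency with a pre-calculated background frequency. The function names, the `background_freq` dictionary and the `pseudocount` value are hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def count_kmers(sequences, k):
    """Count every exact k-mer occurrence across the query sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment_scores(sequences, background_freq, k, pseudocount=1e-6):
    """Log2 ratio of observed k-mer frequency to a pre-calculated background.

    background_freq maps k-mer -> expected frequency (an assumed input format);
    k-mers absent from the background fall back to a small pseudocount.
    """
    counts = count_kmers(sequences, k)
    total = sum(counts.values())
    scores = {}
    for kmer, n in counts.items():
        observed = n / total
        expected = background_freq.get(kmer, pseudocount)
        scores[kmer] = log2(observed / expected)
    return scores


if __name__ == "__main__":
    queries = ["ACGTACGT", "TTACGTAA"]
    background = {"ACGT": 0.075}  # made-up background frequency
    ranked = sorted(enrichment_scores(queries, background, 4).items(),
                    key=lambda item: item[1], reverse=True)
    for kmer, score in ranked:
        print(kmer, round(score, 2))
```

Because the background is looked up rather than built from a control sequence set, the sketch mirrors the benefit described above: no background sequences have to be constructed at run time.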
HOMER / Hypergeometric Optimization of Motif EnRichment
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for working with ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, differential features, enrichment and expression.
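The "hypergeometric" in the tool's name refers to a standard enrichment statistic. The Python sketch below is not HOMER code; it only shows, under assumed variable names, how an upper-tail hypergeometric probability can be computed for a motif found in `k` of `n` target sequences when `K` of all `N` sequences (targets plus background) contain it.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of sequences (targets + background)
    K: sequences containing the motif
    n: number of target sequences
    k: target sequences containing the motif
    """
    denom = comb(N, n)
    upper = min(K, n)
    # comb() returns 0 for impossible draws, so no extra lower bound is needed.
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numer / denom


if __name__ == "__main__":
    # Made-up counts: 1000 sequences, 100 with the motif, 50 targets, 20 hits.
    print(hypergeom_upper_tail(1000, 100, 50, 20))
```

A small probability indicates that the motif occurs in the target set more often than expected by chance given its overall frequency.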
MEME Suite
Provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters.
kmerHMM
Discovers DNA motifs in protein binding microarray (PBM) data. kmerHMM is a computational pipeline for PBM motif discovery in which hidden Markov models (HMMs) are trained to model DNA motifs, and belief propagation is used to elucidate multiple motif models from each trained HMM. The software models the dependence between adjacent nucleotide positions and can also deduce multiple binding modes for a given transcription factor (TF).
EXTREME
Identifies DNA-binding motifs in ChIP-Seq and DNase-Seq data. EXTREME is an online implementation of the MEME algorithm that uses the online expectation-maximization (EM) algorithm to discover motifs closely matching motifs discovered by MEME. The software can be useful for thorough motif discovery in large datasets. It can discover multiple motifs in DNase-Seq data and can be employed for understanding transcriptional regulation.
oPOSSUM
Finds over-represented conserved transcription factor binding sites (TFBS) and binding site combinations in DNA sequences of co-expressed genes or sequences generated from high-throughput methods. oPOSSUM enables researchers interested in the study of gene regulatory networks to identify TFs that may be acting in a biological context. The software features a panel of approaches to regulatory sequence analysis, including Single-Site Analysis (SSA) and anchored Combination-Site Analysis (aCSA).
BaMM!motif / Bayesian Markov Model motif discovery
Offers a Bayesian approach to motif discovery. BaMM!motif is an application that exploits Bayesian Markov Models (BaMMs) to perform its predictions. It consists of four distinct modules allowing users to: (i) investigate nucleotide sequences to determine high-order motifs; (ii) explore model repositories with a feature for searching given motifs against a pre-computed database; and (iii) detect motif occurrences in sequences.
MEME-ChIP
A web-based tool for analyzing motifs in large DNA or RNA data sets. MEME-ChIP can analyze peak regions identified by ChIP-seq, cross-linking sites identified by CLIP-seq and related assays, as well as sets of genomic regions selected using other criteria. MEME-ChIP performs de novo motif discovery, motif enrichment analysis, motif location analysis and motif clustering, providing a comprehensive picture of the DNA or RNA motifs that are enriched in the input sequences. MEME-ChIP is part of the MEME Suite online platform.
DMINDA / DNA Motif Identification aND Analyses
Identifies and analyses regulatory DNA motifs. DMINDA is a motif analysis web server that contains six motif analysis functions: (i) motif finding; (ii) motif scanning; (iii) motif comparison; (iv) motif co-occurrence analysis; (v) motif prediction by phylogenetic footprinting (namely MP3); and (vi) regulon prediction. The software can benefit the genomic research community in general and prokaryotic genome researchers in particular.
kmer-SVM
A tool suite designed to aid in analysis of next-generation sequencing (NGS) data. kmer-SVM uses a support vector machine (SVM) with kmer sequence features to identify predictive combinations of short transcription factor binding sites which determine the tissue specificity of the original NGS assay. Information gained from kmer-SVM can be used as an additional source of confidence in genomic experiments by recovering known binding sites, and can also reveal novel sequence features and possible cooperative mechanisms to be tested experimentally.
CompareProspector
Finds sequence motifs in higher eukaryotes. CompareProspector takes advantage of comparative genomics information. It employs a Gibbs sampling method to search for motifs in the input sequences, biasing the search toward conserved regions by integrating sequence conservation into the posterior probability in the sampling process. The software identifies regulatory elements using information from both intraspecies pattern enrichment and interspecies sequence conservation.
Octopus-toolkit
Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.
MICSA / Motif Identification for ChIP-Seq Analysis
Provides peak identification in ChIP-Seq data. MICSA uses an approach combining knowledge about DNA fragment coverage in ChIP and control experiments along with motif discovery. This application is able to automatically identify overrepresented motifs in a single run, as well as to use motif occurrence probabilities to enhance the result set returned. It can also be used on medium quality datasets with low average DNA fragment coverage.
SIOMICS / Systematic Identification Of Motifs In ChIP-Seq data
Enables de novo identification of motifs and transcription factor binding sites (TFBSs) from all peak regions of a ChIP-seq experiment. SIOMICS is a computational approach that does not depend on limited information of known motifs and simultaneously considers multiple motifs. The software was tested on both simulated and experimental data and identified motifs of more known cofactors and more shared motifs in the experimental data.
Discrover / DISCRiminative and discOVER
Discovers sequence motifs of protein binding-site patterns in nucleic acid sequences. Discrover is based on a discriminative learning method and on a Hidden Markov Model (HMM). It allows users to train all parameters with one objective function, or to train only the motif emissions and leave the other parameters unmodified. The tool's analyses of ChIP-Seq data appear to be stringent and robust, as indicated by the similarity of multiply discovered motifs or the high proportion of motifs recovered.
RCADE / Recognition Code-Assisted Discovery of regulatory Elements
A software tool for motif discovery from Cys2His2 zinc finger (C2H2-ZF) ChIP-seq data. RCADE combines predictions from a DNA recognition code of C2H2-ZFs with ChIP-seq data to identify models that represent the genuine DNA binding preferences of C2H2-ZF proteins. RCADE is able to identify generalizable binding models even from peaks that are exclusively located within the repeat regions of the genome, where state-of-the-art motif finding approaches largely fail.
LocalMotif
Discovers motifs localized relative to a biological landmark in long regulatory sequences. LocalMotif has two modules: (1) a core module that discovers prominent non-redundant motifs, and (2) a refinement module that fine-tunes these motifs. It combines three different scoring functions that individually describe three different characteristics of a motif: the relative entropy score (RES), the over-representation score (ORS), and the spatial confinement score (SCS). The interval predictions made by the software provide biologically useful information about transcription factor (TF)-TF interactions.
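Of the three scores named above, the relative entropy score is the easiest to illustrate. The Python sketch below computes the textbook relative entropy (Kullback-Leibler divergence, in bits) of a position frequency matrix against a background distribution. LocalMotif's exact definition of RES may differ; the function name and the input layout (one dictionary of base frequencies per motif position) are assumptions made for this example.

```python
from math import log2


def relative_entropy(pfm, background):
    """Sum over positions and bases of f * log2(f / q).

    pfm: list of dicts, one per motif position, mapping base -> frequency
    background: dict mapping base -> background frequency
    """
    total = 0.0
    for column in pfm:
        for base, f in column.items():
            if f > 0:  # zero-frequency terms contribute nothing
                total += f * log2(f / background[base])
    return total


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    motif = [
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved position
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative position
    ]
    print(relative_entropy(motif, uniform))
```

A conserved position contributes up to 2 bits against a uniform DNA background, while a position that matches the background contributes nothing.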
KeBABS / Kernel-Based Analysis of Biological Sequences
Provides functionality for kernel based analysis of biological sequences via support vector machine (SVM) based methods. Biological sequences include DNA, RNA, and amino acid (AA) sequences. Sequence kernels define similarity measures between sequences. The package implements some of the most important kernels for sequence analysis in a very flexible and efficient way and extends the standard position-independent functionality of these kernels in a novel way to take the position of patterns in the sequences into account for the similarity measure.
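As an illustration of what a position-independent sequence kernel computes, the Python sketch below implements a plain spectrum kernel: the similarity of two sequences is the dot product of their k-mer count vectors, optionally normalized to the range 0 to 1. This is not the KeBABS API (KeBABS is an R package); the function names and the `normalize` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k, normalize=True):
    """Dot product of the k-mer count vectors of sequences a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(n * cb[kmer] for kmer, n in ca.items() if kmer in cb)
    if not normalize:
        return dot
    # Cosine-style normalization so that identical sequences score 1.0.
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(spectrum_kernel("ACGTACGT", "TTACGTAA", 3))
```

Because only k-mer counts enter the computation, the positions of the patterns are ignored; a position-aware kernel of the kind described above would additionally weight matches by where they occur.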
LedPred
A flexible support vector machine (SVM) workflow that predicts new regulatory sequences based on the annotation of known cis-regulatory modules (CRMs), which are associated with a large variety of feature types. The workflow is composed of five main steps: 1) numerical mapping of features to sequences, 2) tuning the SVM parameters, 3) feature selection, 4) model creation and evaluation and 5) scoring of new sequences. LedPred is provided as an R/Bioconductor package connected to an online server to avoid installation of non-R software. Due to the heterogeneous CRM feature integration, LedPred excels at the prediction of regulatory sequences in Drosophila and mouse datasets compared to similar SVM-based software.
MotifHyades
Provides a probabilistic model for de novo DNA motif pair discovery on paired sequences. MotifHyades is more accurate than the previous ad hoc computational pipeline for DNA motif pair discovery. In particular, its de novo nature enables the discovery of novel motif pairs in the rapidly growing chromatin interaction and genome segmentation datasets. In addition, MotifHyades was applied to discover thousands of DNA motif pairs with higher gold standard motif matching ratio, higher DNase accessibility and higher evolutionary conservation than the previous ones in the human K562 cell line.
PAD / Proximal And Distal
Characterizes the co-localization of DNA binding proteins at proximal and distal regions, using chromatin immunoprecipitation data for 104 DNA binding proteins in embryonic stem cell (ESC) lines. PAD was applied to characterize protein co-localization at proximal and distal regions using this large compendium of ESC-specific protein binding profiles. The tool revealed extensive co-localization of BRG1 and CHD7 at distal but not proximal regions. It also reveals the co-dependency of BRG1 and CHD7 at distal regions in regulating expression of their common target genes in ESC.
TFFM
Models and predicts transcription factor binding sites (TFBSs). TFFM is an HMM-based framework which is flexible and supports dinucleotide composition analysis and variable lengths for prediction of TFBSs. It was used to construct transcription factor flexible models (TFFMs) from ChIP-seq data sets and to predict TFBSs within DNA sequences. The framework allows researchers to deeply analyze the features of TF-DNA binding interaction by looking at local dinucleotide dependencies captured by the TFFMs and represented by the logos.