HOMER / Hypergeometric Optimization of Motif EnRichment

Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for working with ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.

Octopus-toolkit

Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.

cERMIT / conserved Evidence-Ranked Motif Identification Tool

Allows motif identification. cERMIT is designed to analyze current large genomic regulatory datasets such as those from ChIP-chip or ChIP-seq experiments. The software makes use of the complete data without the need to pre-define or infer thresholds. It can take different data as evidence for regulatory interactions, and can optionally utilize orthologous sequences from related species to restrict the search to co-occurring motifs.

SSA / Submodular Selection of Assays

Chooses a diverse panel of genomic assays using methods from submodular optimization. SSA serves as a model for how submodular optimization can be applied to other discrete problems in biology. The method is computationally efficient, yields high-quality panels according to several quality measures, and is mathematically optimal under some assumptions. It can be used partway through the investigation of a cell type, when several assays are already available, to determine the most informative next experiments to perform.
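
As a rough illustration of the kind of submodular selection described here, the sketch below greedily maximizes a facility-location objective over a pairwise assay-similarity matrix. The objective, the function names and the toy matrix are assumptions made for illustration and are not taken from SSA itself. The `already_chosen` argument mirrors the case where some assays have already been performed.

```python
def facility_location(selected, similarity):
    """Sum, over all assays, of the best similarity to any selected assay."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in similarity)


def greedy_select(similarity, k, already_chosen=()):
    """Greedily add k assays, each time taking the largest marginal gain."""
    chosen = list(already_chosen)
    remaining = [j for j in range(len(similarity)) if j not in chosen]
    for _ in range(min(k, len(remaining))):
        base = facility_location(chosen, similarity)
        best = max(
            remaining,
            key=lambda j: facility_location(chosen + [j], similarity) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical similarity matrix: assays 0-1 form one group, 2-4 another.
S = [
    [1.0, 0.75, 0.0, 0.0, 0.0],
    [0.75, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 1.0, 0.25],
    [0.0, 0.0, 0.5, 0.25, 1.0],
]

print(greedy_select(S, 2))                      # [2, 0]
print(greedy_select(S, 1, already_chosen=[0]))  # [0, 2]
```

With this toy matrix the greedy pass picks one assay from each group rather than two near-duplicates, which is the diversity property a submodular objective is meant to reward.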

PeaKDEck

A peak-calling algorithm written in Perl, mainly intended for identifying peaks in mapped DNaseI-seq data. PeaKDEck also includes a set of utilities for processing and manipulating these data. It selects a threshold read density for peak calling by constructing a probability distribution of background read density scores using kernel density estimation. PeaKDEck can also be used for similar methods, such as chromatin immunoprecipitation sequencing and FAIRE-seq, by applying a user-defined offset to calculated genomic positions. Compared with other peak callers, it is especially useful when the signal-to-noise ratio is low.
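
The thresholding idea can be sketched as follows: fit a Gaussian kernel density estimate to background read density scores, then find the score above which the estimated background probability falls to a chosen level. This is a minimal Python sketch, not PeaKDEck's Perl implementation. The function names, the Silverman-style default bandwidth and the bisection search are all assumptions.

```python
import math
import statistics


def kde_cdf(x, samples, bandwidth):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each sample."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - s) / scale)) for s in samples) / len(samples)


def kde_threshold(background, p_value=0.01, bandwidth=None):
    """Smallest score whose upper-tail probability under the KDE is <= p_value."""
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption, not PeaKDEck's choice).
        bandwidth = 1.06 * statistics.stdev(background) * len(background) ** -0.2
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    lo = min(background) - 10.0 * bandwidth  # tail probability is ~1 here
    hi = max(background) + 10.0 * bandwidth  # tail probability is ~0 here
    for _ in range(60):  # bisection on the monotone tail probability
        mid = (lo + hi) / 2.0
        if 1.0 - kde_cdf(mid, background, bandwidth) > p_value:
            lo = mid
        else:
            hi = mid
    return hi


# Made-up background scores, for illustration only.
background = [float(i % 10) for i in range(200)]
threshold = kde_threshold(background, p_value=0.01)
print(threshold > max(background))  # the cutoff lies above every background score
```

Windows whose read density exceeds the returned threshold would then be candidate peaks. A stricter `p_value` pushes the threshold higher.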

MuSERA / Multiple Sample Enriched Region Assessment

A broadly useful standalone tool for both interactive and batch analysis of combined evidence from enriched regions (ERs) in multiple ChIP-seq or DNase-seq replicates. Besides rigorously combining sample replicates to increase statistical significance of detected ERs, it also provides quantitative evaluations and graphical features to assess the biological relevance of each determined ER set within its genomic context; they include genomic annotation of determined ERs, nearest ER distance distribution, global correlation assessment of ERs and an integrated genome browser.

DNase2TF

An efficient footprint detection program. DNase2TF searches for relatively protected regions within DNase I hypersensitive sites and generates a set of footprint candidates at a preset FDR threshold. Starting with an empirical set of candidate regions based on raw cut counts, the algorithm proceeds by iterating two basic steps: i) assessing the significance of cut depletion for all regions in the current set and ii) deciding whether to merge the two closest neighboring regions for improved significance of depletion. DNase2TF provides a marked gain in computational speed, scanning DHSs in a mammalian genome for footprint candidates in minutes.

DNaseR

Enables the identification of protein binding footprints in DNase I hypersensitive sites sequencing (DNase-seq) data. The cumulative Skellam distribution function (package 'skellam') is used to detect significant normalized count differences of opposite sign on each DNA strand, in order to determine the flanks of the protein-binding footprint. Preprocessing of the mapped reads is recommended before running DNaseR (e.g., quality checking and removal of sequence-specific bias).
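
For readers unfamiliar with the Skellam distribution, it describes the difference of two independent Poisson counts. The sketch below computes its cumulative distribution function in plain Python and derives tail probabilities for an observed count difference. It is a stand-in for the R 'skellam' package named above, not DNaseR's code. The function names, the truncation length `terms` and the example means are assumptions.

```python
import math


def poisson_pmf(n, mu):
    """P(N = n) for N ~ Poisson(mu) with mu > 0, computed in log space."""
    if n < 0:
        return 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def skellam_cdf(k, mu1, mu2, terms=200):
    """P(N1 - N2 <= k) for independent N1 ~ Poisson(mu1), N2 ~ Poisson(mu2).

    Uses N1 - N2 <= k  <=>  N1 <= N2 + k and sums over N2 = 0..terms-1,
    so `terms` should be well above both means.
    """
    total = 0.0
    cdf_n1 = 0.0  # running value of P(N1 <= j + k)
    next_n = 0
    for j in range(terms):
        while next_n <= j + k:
            cdf_n1 += poisson_pmf(next_n, mu1)
            next_n += 1
        total += poisson_pmf(j, mu2) * cdf_n1
    return min(total, 1.0)


def difference_tails(diff, mu1, mu2):
    """Lower and upper tail probabilities of an observed count difference."""
    lower = skellam_cdf(diff, mu1, mu2)            # P(K <= diff)
    upper = 1.0 - skellam_cdf(diff - 1, mu1, mu2)  # P(K >= diff)
    return lower, upper


# Hypothetical example: an observed difference of +9 when both means are 4.
lower, upper = difference_tails(9, 4.0, 4.0)
print(upper < 0.01)  # a large positive difference is unlikely by chance
```

A small upper tail on one strand together with a small lower tail on the other would correspond to the "differences of opposite sign" that the description mentions.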

Romulus

A computational method for identifying individual transcription factor (TF) binding sites from genome sequence information and cell-type–specific experimental data, such as DNase-seq. Romulus combines the strengths of previous approaches, and improves robustness by reducing the number of free parameters in the model by an order of magnitude. We show that Romulus significantly outperforms existing methods across three sources of DNase-seq data, by assessing the performance of these tools against ChIP-seq profiles. The difference was particularly significant when applied to binding site prediction for low-information-content motifs.


A mixture modeling framework to train multinomial based footprint models and assign footprint likelihood scores to each candidate binding site. The modeling approach was also able to detect variation in the consensus motifs that transcription factors (TFs) bind to. Finally, cell type specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.

Deopen / Deep openness prediction network

Learns regulatory sequence code and predicts chromatin accessibility at the whole genome level. Deopen is able to achieve state-of-the-art performance in the chromatin accessibility classification problem. It recovers continuous degree of chromatin accessibility for an input sequence, and fills the gap of predicting DNA accessibility signals in continuous values. This tool is based on a deep convolutional neural network (CNN) and a typical three-layer feed forward network.

SeqPlots

Visualizes next-generation sequencing (NGS) signals and sequence motif densities along genomic features using average plots and heatmaps. It can also calculate sequence motif density profiles from a reference genome. SeqPlots is useful both for exploratory data analyses and for preparing replicable, publication-quality plots. Other features of the software include collaboration and data sharing capabilities, as well as the ability to store pre-calculated result matrices that combine many sequencing experiments and in-silico generated tracks with multiple different features.

DeFCoM / Detecting Footprints Containing Motifs

A supervised learning based footprint prediction framework. DeFCoM was designed to capture variation in DNaseI signal within active footprints and unbound motif sites to enhance footprint classification accuracy, a consideration unaccounted for in previous footprinters. From a set of motif sites labeled as active or inactive for a given transcription factor in a cell experimental condition, the Support Vector Machine (SVM) classifier is trained on features that are derived from DNase-seq data from the same cell type for each motif site. This allows DeFCoM to capture the complexity of the data when necessary with the Radial Basis Function (RBF) kernel, while avoiding over-fitting, a common problem in supervised learning, by choosing the linear kernel when that complexity is lacking.

CENTIPEDE

Applies a hierarchical Bayesian mixture model to infer regions of the genome that are bound by particular transcription factors (TFs). CENTIPEDE starts by identifying a set of candidate binding sites (e.g., sites that match a certain position weight matrix (PWM)), and then aims to classify the sites according to whether each site is bound or not bound by a TF. It is an unsupervised learning algorithm that discriminates between two different types of motif instances using as much relevant information as possible.

msCentipede

An algorithm for accurately inferring transcription factor binding sites using chromatin accessibility data (DNase-seq, ATAC-seq). The hierarchical multiscale model underlying msCentipede identifies factor-bound genomic sites by using patterns in DNA cleavage resulting from the action of nucleases in open chromatin regions (regions typically bound by transcription factors). msCentipede, a generalization of the CENTIPEDE model, accounts for heterogeneity in the DNA cleavage patterns around sites bound by transcription factors.

Sasquatch

Identifies and quantifies footprints of the effects of noncoding variants on transcription factor (TF) binding. Sasquatch provides a relatively simple and yet informative approach, requiring only a single DNase-seq data set from the appropriate cell type. It can use data from any genotype to assess variants that are appropriate to that cell type. It can employ publicly available data of any reasonable depth and quality, generated by any of the existing DNase-seq protocols, including low-input DNase-seq protocols.

CIPHER

Automates the processing and analysis of several commonly used next-generation sequencing (NGS) datasets, including ChIP-seq, RNA-seq, Global Run On sequencing (GRO-seq), micrococcal nuclease footprint sequencing (MNase-seq), DNase hypersensitivity sequencing (DNase-seq), and assay for transposase-accessible chromatin using sequencing (ATAC-seq) datasets. CIPHER provides an analysis mode that accomplishes complex bioinformatics tasks such as enhancer prediction. It also supplies functions to integrate various NGS datasets.

LR-DNase / logistic regression DNase

A logistic regression model. LR-DNase predicts binding sites for a specific transcription factor (TF) using seven features derived from DNase-seq and genomic sequence. We calculate the area under the precision-recall curve at a false discovery rate cutoff of 0.5 for the LR-DNase model, a number of logistic regression models with fewer features, and several existing state-of-the-art TF binding prediction methods. The LR-DNase model outperforms existing unsupervised and supervised methods. Additionally, for many TFs, a model that uses only two features, DNase-seq reads and motif score, is sufficient to match the performance of the best existing methods.
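
To make the two-feature case concrete, the sketch below fits a logistic regression by batch gradient descent on two standardized features per candidate site, standing in for DNase-seq read count and motif score. It is not the LR-DNase implementation. The function names, hyperparameters and the six toy sites are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Predicted probability that a site with features x is bound."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy, made-up sites: (standardized DNase-seq reads, standardized motif score).
X = [(1.0, 0.5), (1.5, 1.0), (0.8, -0.2),      # labeled bound
     (-1.0, -0.5), (-1.2, 0.3), (-0.9, -1.0)]  # labeled unbound
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X, y)
print(predict_proba((1.2, 0.4), weights, bias) > 0.5)  # accessible, good motif
```

In practice the labels would come from an independent assay such as ChIP-seq, and the model would be evaluated on held-out sites rather than on its training data.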

Mocap

Integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. Mocap uses sequence-derived genomic features and one chromatin accessibility experiment per cell type to profile TFCT-specific binding activities. The tool aims to help reveal the mechanistic complexity of mammalian gene regulation and chart the mammalian regulatory landscape spanning multi-lineage differentiation.

HINT / Hmm-based IdeNtification of Tf footprints

A method based on hidden Markov models that integrates DNase I hypersensitivity and histone modification occupancy for the detection of open chromatin regions and active binding sites. We have created a framework that includes treatment of genomic signals, model training and genome-wide application. In a comparative analysis, our method obtained a good trade-off between sensitivity and specificity, and better area-under-the-curve statistics than competing methods. Moreover, our technique does not require further training or sequence information to generate binding location predictions. Therefore, the method can be easily applied to new cell types and allows flexible downstream analysis such as de novo motif finding.

MARGE / Mutation Analysis for Regulatory Genomic Elements

Investigates ChIP-seq, ATAC-seq, DNase I Hypersensitivity or other next generation sequencing (NGS) assays. MARGE recognizes DNA binding motifs that potentially affect transcription factor (TF) binding using traditional de-novo motif analysis on genomic sequence for each polymorphic allele. It serves to find combinations of collaborating transcription factors. This tool contains visualization software to interpret the results.