Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for working with ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
Allows motif identification. cERMIT is designed to analyze current large genomic regulatory datasets such as those from ChIP-chip or ChIP-seq experiments. The software makes use of the complete data without the need to pre-define or infer thresholds. It can take different data as evidence for regulatory interactions, and can optionally utilize orthologous sequences from related species to restrict the search to co-occurring motifs.
Identifies footprints and quantifies the effects of noncoding variants on transcription factor (TF) binding. Sasquatch provides a relatively simple and yet informative approach, requiring only a single DNase-seq data set from the appropriate cell type. It can use data from any genotype to assess variants that are appropriate to that cell type. It can employ publicly available data of any reasonable depth and quality, generated by any of the existing DNase-seq protocols, including low-input DNase-seq protocols.
Chooses a diverse panel of genomic assays using methods from submodular optimization. SSA serves as a model for how submodular optimization can be applied to other discrete problems in biology. The method is computationally efficient, produces high-quality panels according to several quality measures, and is mathematically optimal under some assumptions. It can be used partway through the investigation of a cell type, when several assays are already available, to determine the most informative next experiments to perform.
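The panel-selection idea can be illustrated with a greedy maximizer of a facility-location objective, a standard monotone submodular function. This is only a sketch: the description above does not say which objective SSA uses, and the similarity matrix, the function names and the `preselected` argument (standing in for assays that are already available) are all hypothetical.

```python
from typing import List, Sequence


def facility_location(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Sum, over all assays, of each assay's best similarity to a selected assay."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_panel(sim: List[List[float]], k: int,
                 preselected: Sequence[int] = ()) -> List[int]:
    """Greedily grow a panel to size k, starting from assays already performed.

    sim[i][j] is an assumed pairwise similarity between assays i and j.
    """
    chosen = list(preselected)
    while len(chosen) < k:
        candidates = [j for j in range(len(sim)) if j not in chosen]
        if not candidates:
            break
        # Pick the assay whose addition gives the largest objective value.
        best = max(candidates,
                   key=lambda j: facility_location(chosen + [j], sim))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy similarity matrix: assays 0/1 are alike, assays 2/3 are alike.
    toy_sim = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(greedy_panel(toy_sim, k=2))
    print(greedy_panel(toy_sim, k=2, preselected=[2]))
```

With this toy matrix the greedy step favors one assay from each similar pair, which is the sense in which the resulting panel is diverse.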
Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.
Predicts transcription factor binding sites from epigenetic data by combining sets of open-chromatin regions with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. It computes transcription factor affinities and uses open-chromatin or histone-mark signal intensity as quantitative measures of transcription factor (TF) binding strength. The tool outperforms several competing approaches.
Integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. Mocap uses sequence-derived genomic features and one chromatin accessibility experiment per cell type to profile TFCT-specific binding activities. The tool aims to help reveal the mechanistic complexity of mammalian gene regulation and chart the mammalian regulatory landscape spanning multi-lineage differentiation.
Allows detection of collaborative transcription factor pairs. MMARGE consists of a suite of software tools to analyze ChIP-seq, ATAC-seq, DNase I hypersensitivity or other next generation sequencing (NGS) assays where genotyping or DNA sequence data are available. To run, the tool needs two types of data: (1) genetic variation, and (2) high-throughput sequencing data (ChIP-seq, ATAC-seq, DNaseI-seq).
Corrects for batch effects (from multiple confounding variables) and library depth. ImpulseDE2 is a differential expression algorithm for longitudinal count data sets that arise in sequencing experiments such as RNA-seq, ChIP-seq, ATAC-seq and DNaseI-seq. The method is based on a negative binomial noise model with dispersion trend smoothing by DESeq2 and uses the impulse model to constrain the mean expression trajectory of each gene.
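To give a sense of what an impulse-shaped mean trajectory looks like, the sketch below evaluates a commonly used double-sigmoid parameterization: a baseline level `h0`, a peak level `h1`, a final level `h2`, two transition times `t1` and `t2`, and a shared slope `beta`. The exact form used by ImpulseDE2 is not given in the description above, so the formula and all parameter names here are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def impulse(t: float, h0: float, h1: float, h2: float,
            t1: float, t2: float, beta: float) -> float:
    """Assumed impulse curve: h0 -> h1 around t1, then h1 -> h2 around t2.

    h1 must be non-zero because the product of the two sigmoid terms
    is rescaled by 1 / h1.
    """
    rise = h0 + (h1 - h0) * sigmoid(beta * (t - t1))
    fall = h2 + (h1 - h2) * sigmoid(-beta * (t - t2))
    return rise * fall / h1


if __name__ == "__main__":
    # Hypothetical gene: starts near 10, peaks near 100, settles near 20.
    for t in range(0, 13, 2):
        print(t, round(impulse(t, 10, 100, 20, t1=2, t2=8, beta=5), 2))
```

Constraining each gene's mean to such a curve leaves only six parameters to fit per gene, regardless of how many time points were sampled.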
A broadly useful standalone tool for both interactive and batch analysis of combined evidence from enriched regions (ERs) in multiple ChIP-seq or DNase-seq replicates. Besides rigorously combining sample replicates to increase the statistical significance of detected ERs, it also provides quantitative evaluations and graphical features to assess the biological relevance of each determined ER set within its genomic context. These include genomic annotation of determined ERs, nearest ER distance distribution, global correlation assessment of ERs and an integrated genome browser.
An R package and associated R Shiny web app for genome-wide differential analysis of regulatory elements using chromatin accessibility data, such as DNase-seq or ATAC-seq. Users can run the workflow either in the R console or by launching the embedded web application. The workflow guides users from chromatin accessibility alignment and peak data to interpretable results on regulatory elements (REs) and pathways of interest.
A supervised learning based footprint prediction framework. DeFCoM was designed to capture variation in DNaseI signal within active footprints and unbound motif sites to enhance footprint classification accuracy, a consideration unaccounted for in previous footprinters. From a set of motif sites labeled as active or inactive for a given transcription factor in a cell experimental condition, the Support Vector Machine (SVM) classifier is trained on features that are derived from DNase-seq data from the same cell type for each motif site. This allows DeFCoM to capture the complexity of the data when necessary with the Radial Basis Function (RBF) kernel, while avoiding over-fitting, a common problem in supervised learning, by choosing the linear kernel when that complexity is lacking.
Learns regulatory sequence code and predicts chromatin accessibility at the whole-genome level. Deopen achieves state-of-the-art performance in the chromatin accessibility classification problem. It recovers a continuous degree of chromatin accessibility for an input sequence, filling the gap in predicting DNA accessibility signals as continuous values. The tool is based on a deep convolutional neural network (CNN) and a typical three-layer feed-forward network.
Provides a pipeline for executing chromatin profiling assays. ATAC2GRN gathers optimized ATAC-seq and DNase1-seq pipelines for accurate genome regulatory network (GRN) inference. The software helps maximize ChIP recovery for transcription factor occupancy assessment. The project has three main parts: one that generates figures, one written in bash and Snakemake that implements the pipelines, and one that estimates how well the pipelines recapitulate ChIP-seq.
Automates quality control and data analysis for ChIP-seq and DNase-seq data. ChiLin generates comprehensive quality control reports that include comparisons with historical data derived from over 23,677 public ChIP-seq and DNase-seq samples (11,265 datasets) from eight literature-based classified categories. ChiLin can therefore rapidly process batches of ChIP-seq datasets in an automated manner and produce detailed QC reports.
Automates the processing and analysis of several commonly used next generation sequencing (NGS) datasets, including ChIP-seq, RNA-seq, global run-on sequencing (GRO-seq), micrococcal nuclease footprint sequencing (MNase-seq), DNase hypersensitivity sequencing (DNase-seq), and assay for transposase-accessible chromatin using sequencing (ATAC-seq) datasets. CIPHER provides an analysis mode that accomplishes complex bioinformatics tasks such as enhancer prediction. It also supplies functions to integrate various NGS datasets.
Enables optimal processing of datasets with different enrichment patterns. Epimetheus is a quantile-based multi-profile normalization tool. Users can exclude specific genomic regions, such as repetitive elements or any other genomic locations for which artefactual enrichments might be expected. The Epimetheus pipeline involves four main steps: (i) processing of the raw alignment data, (ii) generation of read count intensity (RCI) matrices, (iii) computation of two successive levels of normalization (quantile and Z-score) and (iv) generation of outputs and plots.
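The two normalization levels named in step (iii) can be sketched generically as follows. This is textbook quantile normalization followed by a per-profile Z-score, written with the Python standard library only; it is not taken from the Epimetheus code, and the function names, the tie handling and the toy read counts are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def quantile_normalize(profiles: List[List[float]]) -> List[List[float]]:
    """Give every profile the same value distribution.

    profiles holds one list of read count intensities per sample, all
    covering the same genomic bins. Ties are broken by bin order.
    """
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all profiles must cover the same bins")
    sorted_profiles = [sorted(p) for p in profiles]
    # Reference distribution: mean across samples at each rank.
    reference = [mean(sp[r] for sp in sorted_profiles) for r in range(n)]
    normalized = []
    for p in profiles:
        order = sorted(range(n), key=lambda i: p[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def zscore(values: List[float]) -> List[float]:
    """Center a profile on its mean and scale by its standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Two hypothetical samples measured over the same three bins.
    rci = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
    qn = quantile_normalize(rci)
    print(qn)
    print([zscore(p) for p in qn])
```

After the quantile step every sample shares one value distribution, so the subsequent Z-scores are directly comparable across profiles.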
Visualizes next-generation sequencing (NGS) signals and sequence motif densities along genomic features using average plots and heatmaps. It can also calculate sequence motif density profiles from a reference genome. SeqPlots is useful both for exploratory data analyses and for preparing replicable, publication-quality plots. Other features of the software include collaboration and data sharing capabilities, as well as the ability to store pre-calculated result matrices that combine many sequencing experiments and in-silico generated tracks with multiple different features.
Determines the functional effect of genetic variants on local chromatin accessibility. circuitSNPs computes the log-odds difference between a “reference prediction” and an “alternate prediction”. It takes the specific regulatory context into account by considering all the footprints in the region when assessing the neural network output. This tool can be useful for tissue-specific tasks. It enables users to capture biologically relevant information.
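As a small worked example of the log-odds difference mentioned above, the sketch below compares two predicted probabilities, one for the reference allele and one for the alternate allele. The sign convention (alternate minus reference), the clipping constant and the function names are assumptions; the description does not specify them.

```python
import math


def logit(p: float, eps: float = 1e-9) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def log_odds_difference(p_ref: float, p_alt: float) -> float:
    """Assumed convention: positive when the alternate allele is
    predicted to be more accessible than the reference allele."""
    return logit(p_alt) - logit(p_ref)


if __name__ == "__main__":
    # Hypothetical model outputs for one variant.
    print(log_odds_difference(p_ref=0.80, p_alt=0.35))
```

Working on the log-odds scale keeps a shift from 0.98 to 0.99 from being treated the same as a shift from 0.50 to 0.51, which a plain probability difference would do.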
A logistic regression model. LR-DNase predicts binding sites for a specific transcription factor (TF) using seven features derived from DNase-seq and genomic sequence. The authors calculate the area under the precision-recall curve at a false discovery rate cutoff of 0.5 for the LR-DNase model, for a number of logistic regression models with fewer features, and for several existing state-of-the-art TF binding prediction methods. The LR-DNase model outperforms existing unsupervised and supervised methods. Additionally, for many TFs, a model that uses only two features, DNase-seq reads and motif score, is sufficient to match the performance of the best existing methods.
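The two-feature variant described above can be illustrated with a minimal logistic regression fitted by batch gradient descent. This is a toy sketch, not the LR-DNase implementation: the training data are invented, both features are assumed to be pre-scaled to the range 0 to 1, and the function names, learning rate and epoch count are arbitrary choices.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of binding; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Invented motif sites: [scaled DNase-seq reads, scaled motif score].
    X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
    y = [0, 0, 1, 1]  # 1 = bound, 0 = unbound
    weights = fit_logistic(X, y)
    print(predict(weights, [0.9, 0.9]), predict(weights, [0.1, 0.1]))
```

A real evaluation would score held-out motif sites and summarize them with a precision-recall curve, as the description indicates.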