Genome-wide association studies have revealed that most disease-associated single nucleotide polymorphisms (SNPs) are located in regulatory regions within introns or in regions between genes. Regulatory SNPs (rSNPs) are SNPs that affect gene regulation by changing the binding affinity of transcription factors (TFs) for genomic sequences. Identifying potential rSNPs is crucial for understanding disease mechanisms.
A sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) that encodes cell type-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic contexts and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. deltaSVM provides a powerful computational approach to systematically identify functional regulatory variants.
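The idea of scoring a variant by the change in a k-mer-based classifier score can be illustrated with a minimal sketch. It assumes a plain (ungapped) k-mer weight table, whereas gkm-SVM uses gapped k-mers and weights learned from cell type-specific training data. The function name `delta_svm` and the toy weights are hypothetical and not part of the published software.

```python
from typing import Dict


def delta_svm(ref_seq: str, alt_seq: str, weights: Dict[str, float], k: int) -> float:
    """Return the summed k-mer weight of the alternate allele minus the reference.

    Both sequences are the same genomic window carrying either allele.
    With a window of length 2k-1 centred on the variant, every k-mer
    in the window overlaps the variant position.
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("reference and alternate windows must have equal length")

    def window_score(seq: str) -> float:
        # k-mers missing from the table contribute a weight of zero.
        return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))

    return window_score(alt_seq) - window_score(ref_seq)


# Hypothetical 3-mer weights; real weights would come from a trained model.
TOY_WEIGHTS = {"ACG": 1.5, "CGT": 0.5, "GTA": -0.25, "ACT": -1.0, "CTT": 0.25}

# G>T substitution at the centre of a 5-bp window (2k-1 with k = 3).
score = delta_svm("ACGTA", "ACTTA", TOY_WEIGHTS, k=3)
print(score)  # -2.5
```

A negative value in this toy setting would suggest that the alternate allele reduces the predicted regulatory activity relative to the reference.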
Informs users about single nucleotide polymorphism (SNP)-related regulatory elements in humans. rSNPBase provides annotation for SNPs, including related regulatory elements and regulatory element-target gene pairs (E–G pairs). The tool is organized around two main features, an rSNP search and a network search, both of which provide information about element-gene pairs, extended annotations or related elements for specific SNPs, and SNP-related graphic networks.
Determines the total affinity of a sequence for a given transcription factor, thus removing the need for a threshold value. TRAP ranks all promoter sequences of a genome on the basis of their overall affinity for that factor. It can serve to estimate the factor most enriched in a given sequence, the sequences with the highest affinity for a factor of interest, or the binding sites of a factor affected by given single nucleotide polymorphisms (SNPs).
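A threshold-free total affinity can be sketched as a sum of per-window binding scores over the whole sequence. The example below is a simplification under stated assumptions: it uses the ratio of motif probability to a uniform background as the per-window score and scans the forward strand only, rather than reproducing TRAP's own binding model. The function name and the two-position matrix are hypothetical.

```python
from typing import Dict, List

# One probability dictionary per motif position (hypothetical toy matrix).
Pwm = List[Dict[str, float]]

TOY_PWM: Pwm = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]


def total_affinity(seq: str, pwm: Pwm, background: float = 0.25) -> float:
    """Sum the motif-to-background likelihood ratio over every window.

    No window is discarded, so weak sites also contribute and no score
    threshold has to be chosen.
    """
    width = len(pwm)
    total = 0.0
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for offset, column in enumerate(pwm):
            ratio *= column[seq[start + offset]] / background
        total += ratio
    return total


# Comparing two alleles of the same sequence shows how a SNP shifts the total.
print(total_affinity("AGA", TOY_PWM))  # 4.25
print(total_affinity("ATA", TOY_PWM))  # 2.25
```

Ranking many sequences by this total, or comparing the totals of two alleles, mirrors the kinds of questions described above without committing to a cutoff for what counts as a binding site.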
A tool for exploring annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE projects, sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies. HaploReg is designed for researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation.
An automated framework for the statistical analysis and interpretation of the functional impact of SNP sets using regulatory datasets from the ENCODE, Roadmap Epigenomics and other projects. GenomeRunner prioritizes regulatory datasets most significantly enriched in SNP sets and visualizes the most significant enrichments, thus suggesting regulatory mechanisms that may be altered by them. In addition to prioritizing SNP set-specific regulatory enrichments (functional impact), GenomeRunner implements three novel approaches: 1) regulatory similarity analysis, aimed at identifying groups of SNP sets having similar functional impact; 2) differential regulatory analysis, developed to identify functional impact specific for a group of SNP sets; and 3) cell type regulatory enrichment analysis, designed to identify cell type specificity of the functional impact.
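One common way to ask whether a SNP set is enriched in a regulatory dataset is a one-sided hypergeometric test, sketched below. This is an illustrative assumption and not necessarily the statistic GenomeRunner applies; the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_total: int, n_annotated: int, n_set: int, n_overlap: int) -> float:
    """Return P(X >= n_overlap) under the hypergeometric distribution.

    n_total     -- background SNPs considered
    n_annotated -- background SNPs that overlap the regulatory dataset
    n_set       -- SNPs in the set of interest (drawn from the background)
    n_overlap   -- SNPs of the set that overlap the regulatory dataset
    """
    upper = min(n_annotated, n_set)
    favourable = sum(
        comb(n_annotated, x) * comb(n_total - n_annotated, n_set - x)
        for x in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_total, n_set)


# Toy counts: 10 background SNPs, 4 annotated, a set of 3 with 2 overlapping.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_value, 4))  # 0.3333
```

Running such a test for each regulatory dataset and sorting by p-value (with multiple-testing correction) gives a simple prioritization of datasets per SNP set.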
A user-friendly tool for annotating cancer mutations in cis-regulatory regions of DNA. OncoCis integrates publicly available genome-wide chromatin accessibility and histone modification profiles, obtained from ENCODE and the Human Epigenome Atlas and representing a wide range of cancer types, to identify mutations that occur within potential cis-regulatory regions. The use of cell type-specific information and gene expression can significantly reduce the number of candidate cis-regulatory mutations compared with existing tools designed for the annotation of cis-regulatory SNPs.
Identifies and quantifies footprints of the effects of noncoding variants on transcription factor (TF) binding. Sasquatch provides a relatively simple and yet informative approach, requiring only a single DNase-seq data set from the appropriate cell type. It can use data from any genotype to assess variants that are appropriate to that cell type. It can employ publicly available data of any reasonable depth and quality, generated by any of the existing DNase-seq protocols, including low-input DNase-seq protocols.
Preserves inter-position dependencies and includes flanking k-mers. KSM consists of a set of aligned k-mers that are over-represented at transcription factor (TF) binding sites. It can be used for predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles.
Quantifies the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations, and direct TF binding targets) that are neglected by previous methods. BayesPI-BAR is a useful tool for detecting functional driver mutations in the noncoding part of the genome and for exploring the massive genome-wide sequence data that are constantly generated by large consortia, such as the International Cancer Genome Consortium and The Cancer Genome Atlas.
A software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The FIMO algorithm identifies all individual motif occurrences and is the method of choice for scanning genomes. Its output can be uploaded to the UCSC genome browser for viewing. FIMO is part of the MEME Suite online platform.
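Scanning a sequence with a position-specific scoring matrix can be sketched as follows. The example reports every window whose log-odds score reaches a cutoff; it is a minimal illustration that assumes a uniform background and a raw score cutoff, and it omits the p-value and q-value computation that FIMO performs. The function name and the toy matrix are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical two-position probability matrix with no zero entries.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]


def scan_log_odds(
    seq: str,
    pwm: List[Dict[str, float]],
    min_score: float,
    background: float = 0.25,
) -> List[Tuple[int, str, float]]:
    """Return (start, site, score) for each window scoring at least min_score."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        site = seq[start:start + width]
        # Log-odds score: sum of log2(motif probability / background probability).
        score = sum(math.log2(column[base] / background) for column, base in zip(pwm, site))
        if score >= min_score:
            hits.append((start, site, score))
    return hits


for start, site, score in scan_log_odds("TAGAG", TOY_PWM, min_score=1.0):
    print(start, site, score)
```

Start positions are 0-based here; a genome-scale scan would also cover the reverse strand and convert positions to genomic coordinates before export to a browser track.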
An R package for predicting the disruptiveness of single nucleotide polymorphisms on transcription factor binding sites. motifbreakR allows the biologist to judge whether the sequence surrounding a polymorphism or mutation is a good match, and how much information is gained or lost in one allele of the polymorphism or mutation relative to the other. motifbreakR is flexible, giving a choice of algorithms for interrogating genomes with motifs from many public sources. It can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design.
A computationally efficient R package for identifying rSNPs in silico. atSNP implements an importance sampling algorithm coupled with a first-order Markov model for the background nucleotide sequences to test the significance of affinity scores and SNP-driven changes in these scores. Application of atSNP with >20K SNPs indicates that atSNP is the only available tool for such a large-scale task. atSNP provides user-friendly output in the form of both tables and composite logo plots for visualizing SNP-motif interactions.
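The first-order Markov background model mentioned above can be sketched in a few lines: each nucleotide's probability depends only on the preceding nucleotide. The example estimates transition probabilities from training sequences and scores a sequence under that background. It does not reproduce atSNP's importance sampling step; the function names, the pseudocount, and the uniform probability for the first base are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable

Transitions = Dict[str, Dict[str, float]]


def fit_first_order_markov(
    seqs: Iterable[str], alphabet: str = "ACGT", pseudocount: float = 1.0
) -> Transitions:
    """Estimate P(current base | previous base) from dinucleotide counts."""
    counts = {a: {b: pseudocount for b in alphabet} for a in alphabet}
    for seq in seqs:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in alphabet}
        for a in alphabet
    }


def log_prob(seq: str, transitions: Transitions, initial: float = 0.25) -> float:
    """Log-probability of seq; the first base is assumed uniformly distributed."""
    total = math.log(initial)
    for prev, cur in zip(seq, seq[1:]):
        total += math.log(transitions[prev][cur])
    return total


background = fit_first_order_markov(["ACACAC"])
print(background["A"]["C"])  # 4/7, about 0.571
print(log_prob("AC", background))
```

A background of this kind supplies the null distribution against which motif affinity scores, and allele-driven changes in those scores, can be tested for significance.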
A computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.
Analyzes the local epigenetic neighborhood of a set of single nucleotide polymorphisms (SNPs). SNPhood is a package that allows users to exploit next-generation sequencing (NGS) data by offering a means to (i) discover allelic bias across regions of interest (ROI); (ii) browse and view genotype-dependent binding patterns; and (iii) make genotype-dependent comparisons and groupings of binding patterns across ROI and samples.
Facilitates rapid design of massively parallel reporter assay (MPRA) experiments. MPRAnator allows systematic design of MPRA experiments for the investigation of the effects of single nucleotide polymorphisms (SNPs) and motifs on regulatory sequences. MPRAnator provides support for four different types of investigations. The MPRA Motif design tool can be used to systematically generate synthetic sequences with single motifs or combinations of motifs placed at preselected positions. The MPRA SNP design tool can be used to examine the regulatory effects of single SNPs or combinations of SNPs for every provided sequence. The PWM Seq-Gen tool performs probabilistic realizations of position weight matrices (PWMs) or generates all the corresponding k-mer motifs exceeding a probability threshold. The Transmutation tool allows for the design of different types of negative controls for MPRA experiments.
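Generating all k-mers whose probability under a position weight matrix exceeds a threshold can be sketched by brute-force enumeration. This is a minimal illustration of the idea, not MPRAnator's implementation; the function name and the toy matrix are hypothetical, and exhaustive enumeration is only practical for short motifs because the number of candidates grows as 4^k.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical two-position position weight matrix (probabilities per base).
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.125, "C": 0.125, "G": 0.5, "T": 0.25},
]


def kmers_above_threshold(
    pwm: List[Dict[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Return (k-mer, probability) pairs with probability strictly above threshold."""
    results = []
    for bases in product("ACGT", repeat=len(pwm)):
        prob = 1.0
        for column, base in zip(pwm, bases):
            prob *= column[base]
        if prob > threshold:
            results.append(("".join(bases), prob))
    # Most probable first; ties are broken alphabetically for stable output.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


for kmer, prob in kmers_above_threshold(TOY_PWM, threshold=0.1):
    print(kmer, prob)
# AG 0.25
# AT 0.125
# CG 0.125
```

The resulting k-mers could then be placed into synthetic sequences at chosen positions, in the spirit of the motif design tool described above.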
Models massively parallel reporter assay (MPRA) experiments. MPRA design tools provides a collection of barcoded oligonucleotides containing the reference and alternate alleles of variants of interest along with the surrounding genomic sequence. It allows users to vary the number of barcodes per allele and the activity variance to analyze the estimated effect on statistical power. This tool acquires genomic context from the hg38 reference genome.
Identifies cis-regulatory mutations in a cancer sample, and can also filter, annotate and prioritize non-coding variants based on their putative effect on the underlying 'personal' gene regulatory network. The concept behind µ-cisTarget is to simultaneously identify “personalized” candidate master regulators for a given cancer sample, based on the gene expression profile of the sample, and prioritize single nucleotide variants (SNVs) and insertions/deletions (INDELs) in the non-coding genome of the sample by their likelihood of generating de novo binding sites for any of these master regulators.
Searches for putative regulatory genetic variation in a gene of interest. SNPs (from dbSNP and user-defined) are analyzed for overlap with potential transcription factor binding sites (TFBS) and by phylogenetic footprinting using UCSC phastCons scores from multiple alignments of 8 vertebrate genomes.
Applies a topic model to systematically discover regulatory modules using a large compendium of in vivo transcription factor (TF) binding data. RMD is a program based on Hierarchical Dirichlet Processes, a Bayesian nonparametric topic model that automatically determines the number of modules based on the complexity of the observed data. For instance, this tool is able to decompose complex binding regions into a combination of specific modules.
Provides a method to distinguish regulatory single nucleotide polymorphisms (rSNPs) in the human genome from the massive number of background SNPs. rSNPdect is a MATLAB package and a computational method for rSNP identification. It can achieve better prediction results than rSNP-MAPPER and is-rSNP. This method can be helpful for studies of regulatory variations and, in particular, their roles in diseases.