Predict transcription factor binding from DNase footprints with Sasquatch
Predicting the impact of regulatory sequence variation on transcription factor (TF) binding is an important challenge, as the vast majority of disease-associated SNPs are found in the non-coding genome (Maurano et al., 2015; Rockman & Kruglyak, 2006). Most existing approaches rely on large catalogs of cell-type- and TF-specific functional annotations. As only a minority of TFs are well characterized (Vaquerizas, Kummerfeld, Teichmann, & Luscombe, 2009), identifying the relevant factors and probing them in the appropriate cell types represents a major limitation of TF-centric approaches.
With this in mind, Ron Schwessinger and colleagues from the University of Oxford have developed the Sasquatch tool, which uses DNase footprinting data to estimate and visualize the effects of non-coding variants on TF binding. Here, they talk about the features and benefits of their tool.
Sasquatch – predicting TF binding from average, k-mer based DNase footprints
DNase I preferentially cuts the genome in accessible regions, which are associated with regulatory function. By mapping only the exact cut sites instead of entire fragments, DNase-seq can reveal protein occupancy at base-pair resolution. Sasquatch analyses DNase footprints to comprehensively determine any k-mer’s potential for cell-type-specific TF binding in the context of open chromatin, and how this may be changed by sequence variants. Sasquatch is an unbiased approach, independent of known TF binding sites and motifs, and requires only a single DNase-seq dataset per cell type.
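To illustrate what mapping cut sites rather than whole fragments means, the minimal Python sketch below tallies only the 5' end of each aligned read. The read layout (start, end, strand) and the function name `count_cut_sites` are assumptions made for illustration; this is not Sasquatch's own pre-processing code.

```python
from collections import Counter


def count_cut_sites(reads):
    """Tally DNase cut positions from aligned reads.

    Each read is (start, end, strand) in 0-based, half-open coordinates
    (an assumed layout). The cut is approximated by the read's 5' end:
    `start` for '+' reads and `end - 1` for '-' reads.
    """
    cuts = Counter()
    for start, end, strand in reads:
        cut_pos = start if strand == "+" else end - 1
        cuts[cut_pos] += 1
    return cuts


# Three toy reads of different lengths that all report the same cut position.
reads = [(100, 136, "+"), (100, 150, "+"), (70, 101, "-")]
cuts = count_cut_sites(reads)
```

Because only one coordinate per read is kept, the resulting counts depend on where DNase I cut and not on fragment length.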
Probe TF binding potential
Querying a k-mer from the tissue repository retrieves the relative DNase cut profile over a 250 bp window around that k-mer. By automatically detecting shoulders and footprints in the profile, Sasquatch quantifies the Shoulder-to-Footprint Ratio (SFR) and thus the average protein occupancy of that k-mer within open chromatin. The SFR is cell-type specific: tissue-specific TFs yield a footprint and a high SFR only in relevant cell types, while housekeeping TFs score consistently high SFRs.
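The idea behind a shoulder-to-footprint ratio can be sketched in a few lines of Python. The sketch assumes the ratio is the mean cut frequency in the two shoulders divided by the mean cut frequency in the footprint, and that the region boundaries are supplied by hand. Sasquatch detects those boundaries automatically, and its exact SFR definition is given in Schwessinger et al. (2017), so the function below is an illustrative stand-in only.

```python
def shoulder_to_footprint_ratio(profile, footprint, left_shoulder, right_shoulder):
    """Mean shoulder signal divided by mean footprint signal.

    `profile` is a list of relative cut frequencies. Each region is a
    (start, end) pair of half-open indices into `profile`. The regions are
    hand-picked here (an assumption); they are not detected automatically.
    """
    def mean(region):
        start, end = region
        values = profile[start:end]
        return sum(values) / len(values)

    shoulder_mean = (mean(left_shoulder) + mean(right_shoulder)) / 2
    return shoulder_mean / mean(footprint)


# Toy profile: elevated cutting on both sides of a protected central dip.
profile = [4, 4, 8, 8, 2, 2, 2, 2, 8, 8, 4, 4]
sfr = shoulder_to_footprint_ratio(
    profile, footprint=(4, 8), left_shoulder=(2, 4), right_shoulder=(8, 10)
)
```

In this toy profile the shoulders average 8 and the footprint averages 2, so the ratio is 4. A flat profile with no protection would give a ratio close to 1.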
Predict impact of sequence variation
Comparing the footprint profiles of a reference and a variant sequence yields a total and a relative damage score for a particular variant. To do so, Sasquatch compares the SFRs using a sliding-window approach. Variants with high damage scores are predicted to strongly alter TF binding potential in open chromatin in a cell-type-specific manner.
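A sliding-window comparison of this kind might look like the following Python sketch. It enumerates every k-mer that overlaps the variant position in the reference and in the variant sequence, looks up an SFR for each, and sums the differences. The helper names, the toy SFR table, the default SFR of 1.0 for unlisted k-mers and the plain sum are all assumptions for illustration. How Sasquatch actually aggregates the windows into total and relative damage is described in Schwessinger et al. (2017).

```python
def kmers_overlapping(seq, pos, k):
    """Return all k-mers of `seq` that cover the 0-based position `pos`."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(first, last + 1)]


def windowed_sfr_differences(ref_seq, pos, alt_base, k, sfr_table, default=1.0):
    """Compare reference and variant SFRs window by window.

    `sfr_table` maps k-mer -> SFR (hypothetical values). K-mers missing from
    the table get `default`, i.e. no footprint (an assumption).
    """
    var_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_kmers = kmers_overlapping(ref_seq, pos, k)
    var_kmers = kmers_overlapping(var_seq, pos, k)
    return [
        sfr_table.get(r, default) - sfr_table.get(v, default)
        for r, v in zip(ref_kmers, var_kmers)
    ]


# Toy example: a G>T change at position 2 destroys the footprinted 3-mer "CGT".
toy_sfr = {"CGT": 3.0}
per_window = windowed_sfr_differences("ACGTA", pos=2, alt_base="T", k=3, sfr_table=toy_sfr)
summed_difference = sum(per_window)
```

Positive differences indicate that the variant removes a footprinted k-mer; negative differences would indicate that it creates one.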
Prioritize non-coding sequence variants
By querying sequence variants in batch mode, Sasquatch can quickly prioritize thousands of variants for their potential to alter TF binding. Importantly, Sasquatch assumes an open-chromatin context. Filtering variants for location in potential open chromatin is therefore advised when dealing with many variants.
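The recommended pre-filtering step can be approximated with a simple interval lookup before any variants are submitted. The Python sketch below assumes open-chromatin peaks are available as non-overlapping (chromosome, start, end) tuples in 0-based, half-open coordinates, for example merged DNase-seq peak calls. The function names and the toy data are hypothetical and not part of Sasquatch.

```python
from bisect import bisect_right


def build_peak_index(peaks):
    """Group peaks by chromosome, sorted by start (peaks must not overlap)."""
    index = {}
    for chrom, start, end in sorted(peaks):
        index.setdefault(chrom, []).append((start, end))
    return index


def in_open_chromatin(index, chrom, pos):
    """True if the 0-based position falls inside a peak on that chromosome."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos, then check its end.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 80)]
variants = [("chr1", 150), ("chr1", 300), ("chr2", 79), ("chr2", 80), ("chr3", 10)]

index = build_peak_index(peaks)
kept = [v for v in variants if in_open_chromatin(index, *v)]
```

Only the variants in `kept` would then be passed on for batch scoring.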
In silico mutate entire regions
By predicting the damage score of every possible base substitution at every base pair, Sasquatch can create in silico mutation plots. Peaks of high damage predict the location of likely binding sites within open chromatin that can be damaged by sequence variants. By analyzing entire genomic elements, Sasquatch can help to characterize regulatory elements.
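The scan over every substitution at every position can be sketched as a short loop in Python. The example takes a scoring function as an argument, so any damage predictor can be plugged in. The `toy_score` function here is a made-up stand-in that flags positions inside a "GATA" site and is not Sasquatch's scoring. Summarizing each position by its maximum over the three substitutions is likewise a choice made for this sketch.

```python
def in_silico_mutation_profile(seq, score_variant):
    """For each position, return the highest score over all base substitutions.

    `score_variant(seq, pos, alt_base)` is any callable returning a damage
    score for replacing the base at `pos` with `alt_base`.
    """
    profile = []
    for pos, ref_base in enumerate(seq):
        scores = [score_variant(seq, pos, alt) for alt in "ACGT" if alt != ref_base]
        profile.append(max(scores))
    return profile


def toy_score(seq, pos, alt_base):
    """Hypothetical scorer: damage 1.0 if the position lies in a 'GATA' site."""
    site = seq.find("GATA")
    return 1.0 if site != -1 and site <= pos < site + 4 else 0.0


profile = in_silico_mutation_profile("CCGATACC", toy_score)
```

Plotting `profile` against position would give a block of high values over the toy binding site, analogous to the peaks of high damage described above.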
We implemented Sasquatch as a web tool for fast and straightforward use (http://apps.molbiol.ox.ac.uk/sasquatch/cgi-bin/foot.cgi). To allow more flexible and customized use, we also made it available as an R implementation (https://github.com/Hughes-Genome-Group/sasquatch). We pre-processed all human ENCODE DNase data to supply a large repository of cell types for your analysis. Custom DNase-seq data can easily be pre-processed and run in our R implementation.
Maurano, M. T. et al. (2015). Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nature Genetics.
Rockman, M. V., & Kruglyak, L. (2006). Genetics of global gene expression. Nature Reviews Genetics.
Schwessinger, R. et al. (2017). Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints. Genome Research.
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A., & Luscombe, N. M. (2009). A census of human transcription factors: function, expression and evolution. Nature Reviews Genetics.