Detecting in vivo transcription factor (TF) binding is important for understanding gene regulatory circuits. ChIP-seq is a powerful technique for empirically defining TF binding in vivo. However, the large number of distinct TFs makes profiling all of them genome-wide labor-intensive and costly. Algorithms for in silico prediction of TF binding have been developed, based mostly on histone modification or DNase I hypersensitivity data in conjunction with DNA motifs and other genomic features.
We present Methylphet, a computational method and software package that predicts TF binding from a combination of methylation profiles and genomic features. The approach rests on the observation that in vivo TF binding events often co-occur with altered methylation levels. Methods for in silico prediction of TF binding from epigenetic data have been proposed before, but they mostly rely on histone ChIP-seq or DNase-seq data. Our method instead exploits methylation data, which are much easier to collect experimentally. Methylphet accepts 5mC, 5hmC and CH methylation profiles, individually or in combination. In the example shipped with the package, training on about 7000 candidate sites and predicting on about 10 000 candidate sites with both CG and CH information takes less than one minute on a MacBook Pro laptop with a 2.7 GHz i7 CPU and 16 GB RAM.
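To make the general idea concrete (summarize methylation around each candidate site, combine it with a genomic feature, and fit a classifier on labeled sites), the following is a minimal sketch under stated assumptions. It is not Methylphet's implementation or API. The input format `(position, methylated_reads, total_reads)`, the ±100 bp window, the motif score in [0, 1], and the helper names `window_methylation`, `site_features`, `train_logistic` and `predict` are all hypothetical. Plain logistic regression is a swapped-in classifier chosen only because it is short to write without external libraries.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical input: one call per cytosine as (position, methylated_reads, total_reads).
MethCall = Tuple[int, int, int]


def window_methylation(calls: Sequence[MethCall], center: int, half_width: int = 100) -> float:
    """Pooled methylation level (methylated / total reads) within +/- half_width bp.

    Returns 0.5 (uninformative) when no covered cytosine falls in the window.
    """
    meth = total = 0
    for pos, m, t in calls:
        if abs(pos - center) <= half_width:
            meth += m
            total += t
    return meth / total if total > 0 else 0.5


def site_features(calls_cg, calls_ch, center: int, motif_score: float) -> List[float]:
    """Assumed feature vector per candidate site: CG level, CH level, motif score."""
    return [window_methylation(calls_cg, center), window_methylation(calls_ch, center), motif_score]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the logistic loss; returns [bias, w1, ..., wk]."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float]) -> float:
    """Predicted probability that a candidate site is bound."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy data (invented for illustration): bound sites (label 1) lie in hypomethylated windows.
cg = [(100, 1, 10), (150, 0, 8), (5000, 9, 10), (5050, 10, 10), (9000, 2, 10), (12000, 8, 10)]
ch = [(120, 0, 10), (5020, 1, 10), (9010, 0, 10), (12010, 1, 10)]
train_sites = [(120, 0.9, 1), (5020, 0.8, 0), (9000, 0.7, 1), (12000, 0.6, 0)]

X = [site_features(cg, ch, center, score) for center, score, _ in train_sites]
y = [label for _, _, label in train_sites]
w = train_logistic(X, y)

p_low = predict(w, [0.1, 0.0, 0.8])   # hypomethylated candidate
p_high = predict(w, [0.9, 0.1, 0.8])  # hypermethylated candidate
```

On this toy data the fitted CG weight comes out negative, so the hypomethylated candidate receives the higher binding probability, mirroring the observation that binding often co-occurs with altered methylation. A real pipeline would use many more sites, richer genomic features, and a stronger classifier.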