Improve your differential gene expression analysis with EPEE
Differential gene expression analysis is very common. The standard paradigm, commonly known as differential expression (DE) analysis, is to quantify the significance of differences in individual gene expression values. One opportunity to improve the widely used DE analysis is to incorporate known gene regulation relationships. Without regulatory knowledge, DE methods cannot discover perturbation or regulation events caused by post-transcriptional and/or translational mechanisms. To address this challenge, Murat Cobanoglu and his colleagues from the Lyda Hill Department of Bioinformatics have developed EPEE. Here, they present their tool and its main features.
To analyze differential gene expression data more accurately, we need algorithms that account for differential regulation (DR). However, most existing DR methods do not strictly integrate existing knowledge of transcriptional regulation networks. Some DR methods use no regulatory network, while others create a restricted network from gene expression alone, ignoring other regulatory cues.
The few methods that integrate regulatory networks assess each regulator individually. This is a problem because, unless all regulatory activity is inferred simultaneously, the activity of a single regulator will also be attributed to other regulators that share its targets.
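The misattribution caused by target overlap can be illustrated with a toy example. The sketch below is not EPEE's code: the network, the gene and TF names, the simulated log fold changes, and both scoring functions are hypothetical and chosen only to make the effect visible. Two TFs share half of their targets, and only one of them is actually perturbed.

```python
# Toy network (hypothetical): two TFs with overlapping target sets.
targets = {
    "TF_A": {"g1", "g2", "g3", "g4"},
    "TF_B": {"g3", "g4", "g5", "g6"},
}
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]

# Assumed ground truth: only TF_A changes activity between conditions.
true_activity = {"TF_A": 2.0, "TF_B": 0.0}

# Simulated log fold change of each gene: the sum of the activity
# changes of the TFs that regulate it (a deliberately simple model).
lfc = {
    g: sum(a for tf, a in true_activity.items() if g in targets[tf])
    for g in genes
}


def individual_scores(targets, lfc):
    """Score each TF on its own: mean log fold change of its targets."""
    return {tf: sum(lfc[g] for g in tg) / len(tg) for tf, tg in targets.items()}


def joint_scores(targets, lfc, genes):
    """Score both TFs at once by least squares on the shared network.

    Solves the 2x2 normal equations directly, so it only handles
    exactly two TFs; a real method would need a general solver.
    """
    tf1, tf2 = list(targets)
    x1 = [1.0 if g in targets[tf1] else 0.0 for g in genes]
    x2 = [1.0 if g in targets[tf2] else 0.0 for g in genes]
    y = [lfc[g] for g in genes]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    b1 = sum(a * v for a, v in zip(x1, y))
    b2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return {
        tf1: (s22 * b1 - s12 * b2) / det,
        tf2: (s11 * b2 - s12 * b1) / det,
    }


individual = individual_scores(targets, lfc)
joint = joint_scores(targets, lfc, genes)
print("individual:", individual)
print("joint:     ", joint)
```

Scored individually, TF_B looks perturbed because two of its four targets are driven by TF_A. Fitting both TFs at the same time assigns the whole effect to TF_A.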
To address these challenges, we have started to develop the Effector and Perturbation Estimation Engine (EPEE). EPEE uses context-aware regulatory networks and infers the activity of all transcription factors (TFs) simultaneously using structured sparsity. We enable context-awareness by integrating FANTOM-derived networks, and we use structured sparsity for joint inference of all TF activity.
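To give a feel for what structured sparsity means in practice, here is a minimal sketch of group soft-thresholding, the shrinkage step behind the group lasso, which is one common structured-sparsity penalty. This is an illustration under assumptions, not EPEE's implementation: the text above does not specify the exact penalty or grouping, so the per-TF grouping, the coefficient values, and the threshold `lam` are all hypothetical.

```python
import math


def group_soft_threshold(weights, lam):
    """Proximal operator of lam * ||weights||_2 (the group lasso penalty).

    The whole group is shrunk toward zero together; if its Euclidean
    norm does not exceed lam, every coefficient in it becomes zero.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= lam:
        return [0.0 for _ in weights]
    scale = 1.0 - lam / norm
    return [scale * w for w in weights]


# Hypothetical coefficient groups, one per TF (for example, the weights
# on that TF's edges to its target genes). Values are made up.
groups = {
    "TF_A": [3.0, 4.0],   # strong group, norm 5.0
    "TF_B": [0.3, 0.4],   # weak group, norm about 0.5
}
lam = 1.0  # assumed regularization strength

shrunk = {tf: group_soft_threshold(w, lam) for tf, w in groups.items()}
print(shrunk)
```

The weak group is removed as a unit while the strong group is only shrunk. This is the sense in which the sparsity is "structured": coefficients are selected or dropped in groups that follow the network, not one at a time.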
Validation of the method
We used four separate datasets from two different domains of biology (immunology and cancer) that have completely different characteristics. The immunology dataset enables us to test our method under normal transcriptional regulation conditions, whereas cancer represents pathological regulation due to genomic instability. The immunology dataset has five samples per condition, which is what a standard experimental group can feasibly collect, whereas the cancer dataset (TCGA) has about 500 samples, which is the scale of data that well-funded consortia can produce. The immunology data were collected from highly purified and homogeneous samples, whereas the cancer data consist of (notoriously) impure samples. In all these settings, our method outperforms existing alternatives. Here we provide the results on Th2 differentiation as an example (Figure 2); the rest of the results, as well as further details, are available in our preprint.
Viren Amin and Murat Can Cobanoglu (2018). Accurate differential analysis of transcription factor activity from gene expression. bioRxiv.