Computational protocol: Characteristics of functional enrichment and gene expression level of human putative transcriptional target genes

Protocol publication

[…] The functional enrichments of the target genes of a TFBS and its corresponding transcription factor were examined using GO-Elite v1.2.5 with the p-value threshold set to 1. After the GO-Elite analyses, a false discovery rate (FDR) test was performed with a q-value threshold of 10⁻³ to correct for multiple comparisons across the thousands of groups of transcriptional target genes in each cell type and condition. For examining functional enrichments of highly or lowly expressed genes independent of transcriptional target genes, the p-value threshold was set to 0.01 or 0.05 to confirm that the results did not change significantly. UCSC gene IDs were converted into RefSeq IDs prior to the GO-Elite analyses. GO-Elite uses 10 databases for identifying functional enrichments: (1) Gene Ontology, (2) Disease Ontology, (3) Pathway Commons, (4) GO Slim, (5) WikiPathways, (6) KEGG, (7) Transcription factor to target genes, (8) microRNA to target genes, (9) InterPro and UniProt functional regions (Domains), and (10) Cellular biomarkers (BioMarkers). To calculate the normalized numbers of functional enrichments of target genes, the numbers of functional enrichments were divided by the total number of target genes in each cell type and condition and multiplied by 10⁵. In tables showing the numbers of functional enrichments in the 10 databases, heat maps were plotted according to Z-scores calculated from the numbers of functional enrichments of each database using in-house Excel VBA scripts.
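As a minimal sketch of the arithmetic in this paragraph, the Python below computes q-values, the normalized enrichment count (count divided by the total number of target genes, times a scale factor of 10^5), and Z-scores over a list of counts. The protocol names neither the FDR procedure nor the standard-deviation convention, so Benjamini-Hochberg adjustment and the population standard deviation are assumptions here. The function names are hypothetical and are not part of GO-Elite or the authors' Excel VBA scripts.

```python
from statistics import mean, pstdev


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method, not stated in the protocol)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that
    # q-values are monotone in the rank of the p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def normalized_enrichment_count(n_enrichments, n_target_genes, scale=10**5):
    """Number of enrichments per target gene, multiplied by the scale factor."""
    return n_enrichments / n_target_genes * scale


def zscores(counts):
    """Z-scores of a list of counts (population SD is an assumption)."""
    mu = mean(counts)
    sd = pstdev(counts)
    if sd == 0:
        return [0.0] * len(counts)
    return [(c - mu) / sd for c in counts]
```

Under these assumptions, an annotation would be retained when its q-value is at most 1e-3, and the retained counts per database would then be normalized and converted to Z-scores for the heat maps.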
In the comparisons of the normalized numbers of functional enrichments of target genes between cell types and conditions, if the number of a functional annotation in a cell type or condition was two times larger than that in the other cell type or condition, the functional annotation was considered more enriched in that cell type or condition.

To investigate whether the normalized numbers of functional enrichments of transcriptional target genes correlate with the prediction of target genes, a subset of the target genes was replaced with randomly selected genes with high expression levels (top 30% of expression level), and the functional enrichments of the resulting target genes were examined. First, 5%, 10%, 20%, 40%, and 60% of the target genes were replaced with randomly selected highly expressed genes in monocytes, CD4+ T cells, and CD20+ B cells. Second, as another randomization of target genes, numbers of genes equal to 5%, 10%, 20%, 40%, and 60% of the target genes were selected randomly from the highly expressed genes and added to the original target genes, and the functional enrichments of the resulting target genes were examined. All analyses were repeated three times to estimate standard errors (Fig. , Additional file : Figure S1, S2, and S6). The same analysis was performed using DNase-DGF data and ChIP-seq data of 19 TFs in H1-hESC. Transcriptional target genes were predicted from promoters (Additional file : Figure S7).

[...] CTCF ChIP-seq data for CD14+ monocytes (GSM1003508_hg19_wgEncodeBroadHistoneMonocd14ro1746CtcfPk.broadPeak.gz), CD4+ T cells (SRR001460.bam), CD20+ B cells (GSM1003474_hg19_wgEncodeBroadHistoneCd20CtcfPk.broadPeak.gz), H1-hESC (wgEncodeAwgTfbsUtaH1hescCtcfUniPk.narrowPeak.gz), iPSC (GSE96477), HUVEC (wgEncodeAwgTfbsUwHuvecCtcfUniPk.narrowPeak.gz), IMR90 (wgEncodeAwgTfbsSydhImr90CtcfbIggrabUniPktfbsf.narrowPeak.gz), MCF-7 (wgEncodeAwgTfbsUwMcf7CtcfUniPktfbsf.narrowPeak.gz), and HMEC (wgEncodeAwgTfbsUwHmecCtcfUniPktfbsf.narrowPeak.gz) were used.
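The two randomization schemes for target genes described earlier (swapping in, or adding, 5% to 60% highly expressed genes, repeated three times) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, the random draws are assumed to exclude genes already in the target set, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of repeats.

```python
import random
from math import sqrt
from statistics import stdev


def replace_fraction(target_genes, high_expr_genes, fraction, rng):
    """Scheme 1: swap a fraction of the targets for random highly expressed genes."""
    targets = list(target_genes)
    target_set = set(targets)
    n_swap = round(len(targets) * fraction)
    # Assumption: replacements are drawn from genes not already in the target set.
    pool = [g for g in high_expr_genes if g not in target_set]
    kept = rng.sample(targets, len(targets) - n_swap)
    return kept + rng.sample(pool, n_swap)


def add_fraction(target_genes, high_expr_genes, fraction, rng):
    """Scheme 2: keep all targets and add the same number of random highly expressed genes."""
    targets = list(target_genes)
    target_set = set(targets)
    n_add = round(len(targets) * fraction)
    pool = [g for g in high_expr_genes if g not in target_set]
    return targets + rng.sample(pool, n_add)


def standard_error(values):
    """Standard error of the mean over repeated runs (for example, three repeats)."""
    return stdev(values) / sqrt(len(values))
```

A run for one cell type would call each function with fractions 0.05, 0.10, 0.20, 0.40, and 0.60, pass each resulting gene list to the enrichment analysis, and summarize the three repeats with `standard_error`. Passing a seeded `random.Random` instance as `rng` keeps the draws reproducible.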
SRR001460.bam was sorted and indexed with SAMtools and converted into a BED file using bamToBed of BEDTools. ChIP-seq peaks were predicted by SICER-rb.sh of SICER with the optional parameters ‘hg19 1 200 150 0.74 200 100’. Extended regions for enhancer-promoter association (association rule 4) were shortened at the genomic locations of the CTCF-binding sites that were closest to a transcriptional start site, and transcriptional target genes were predicted from the shortened enhancer regions using TFBS. Furthermore, promoter and extended regions for enhancer-promoter association (association rule 4) were shortened at the genomic locations of CTCF-binding sites in forward–reverse orientation. When CTCF-binding sites in the forward or reverse orientation occurred consecutively several times in the genome sequence, the outermost forward–reverse pair of CTCF-binding sites was selected. […]
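The shortening of an extended region at the CTCF-binding sites closest to a transcriptional start site (TSS) can be illustrated with the sketch below. It reflects one reading of the description, taken here as an assumption: on each side of the TSS, the region is cut at the nearest CTCF site that lies inside it. Coordinates are plain integers on a single chromosome, CTCF orientation is ignored, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def shorten_region_at_ctcf(region_start, region_end, tss, ctcf_positions):
    """Cut [region_start, region_end] at the CTCF sites nearest to the TSS on each side."""
    sites = sorted(ctcf_positions)
    new_start, new_end = region_start, region_end

    # Nearest site strictly to the left of the TSS.
    i = bisect_left(sites, tss)
    if i > 0 and sites[i - 1] > region_start:
        new_start = sites[i - 1]

    # Nearest site strictly to the right of the TSS.
    j = bisect_right(sites, tss)
    if j < len(sites) and sites[j] < region_end:
        new_end = sites[j]

    return new_start, new_end
```

For a region 0 to 1000 with a TSS at 500 and CTCF sites at 100, 400, 700, and 900, the function returns (400, 700). Sites outside the region leave the corresponding boundary unchanged, and a site located exactly at the TSS is ignored.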

Pipeline specifications

Software tools GO-Elite, WikiPathways, BroadPeak, SAMtools, BEDTools, SICER
Databases Pathway Commons, KEGG, HUVEC
Application ChIP-seq analysis
Organisms Homo sapiens