Computational protocol: Genome wide binding studies reveal DNA binding specificity mechanisms and functional interplay amongst Forkhead transcription factors

[…] For ChIP-seq analyses, the first 50 bp of the raw reads were mapped to hg18 using Bowtie with default settings, except that the ‘-m 1’ option was specified so that only reads mapping uniquely to the genome were kept. The number of reads was: 8 231 411 (FOXK2), 26 174 803 (FOXO3), 31 867 045 (FOXJ3) and 24 792 957 (H3K18ac). Binding regions were identified by peak calling with MACS 1.4.2 and HOMER version 4.7, with the fragment size set to 200 bp. Both programs were run with default thresholds (MACS: P < 1e−5, with all returned peaks having an FDR < 5.67%; HOMER: P < 1e−4 and a 4-fold enrichment over input). Peaks identified by both peak callers were retained, and the MACS coordinates were used for downstream analysis. Peaks sharing at least 1 bp were considered overlapping, and the mergePeak functionality from HOMER was used in this case. Enriched motifs were identified with HOMER using the 200 bp spanning the summit of each transcription factor binding region.

HOMER was used to associate binding peaks with genes and to find the distance between two peaks. A peak was assigned to a gene only if its summit lay between −5 kb and +2 kb of the gene's transcriptional start site. HOMER was also used to identify gene ontology (GO) categories associated with the different categories of genes whose regulatory regions are bound by FOX proteins.

To identify base preferences flanking the core Forkhead motif GTAAACA, the sequences located up to 5 bp upstream and downstream of these motifs were extracted. GTAAACA motifs found in the 200 bp regions flanking FOXK2 and FOXO3 summits were compared with the same number of GTAAACA motifs taken from the whole genome. The number of A, C, G and T at each flanking position was counted, the process was repeated 1000 times on the whole genome (simulation), and the average and standard deviation of the A, C, G and T counts at each position were calculated. If the occurrence of a particular base at a flanking position was more than 13 standard deviations away from the simulation average, it was considered significant. […]
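The flanking-base comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`find_sites`, `flank_counts`, `significant_bases`) are hypothetical, motifs are searched on the forward strand only, background sites are drawn by simple random sampling without replacement, and the 13 standard deviation cutoff is applied in both directions; the text does not specify any of these choices, so all of them are assumptions.

```python
import random
from statistics import mean, pstdev

CORE = "GTAAACA"   # core Forkhead motif from the text
FLANK = 5          # bases inspected on each side of the core
BASES = "ACGT"

def find_sites(sequence):
    """Return each CORE occurrence with FLANK bases on both sides.
    Assumption: forward strand only; occurrences too close to either end are skipped."""
    sites = []
    start = sequence.find(CORE, FLANK)
    while start != -1 and start + len(CORE) + FLANK <= len(sequence):
        sites.append(sequence[start - FLANK:start + len(CORE) + FLANK])
        start = sequence.find(CORE, start + 1)
    return sites

def flank_counts(sites):
    """Count A, C, G and T at each of the 2 * FLANK flanking positions."""
    counts = [dict.fromkeys(BASES, 0) for _ in range(2 * FLANK)]
    for site in sites:
        flank = site[:FLANK] + site[FLANK + len(CORE):]
        for pos, base in enumerate(flank):
            if base in counts[pos]:      # ignore N and other symbols
                counts[pos][base] += 1
    return counts

def significant_bases(peak_sites, genome_sites, n_sim=1000, threshold=13.0, seed=0):
    """Compare observed flank counts with n_sim equally sized genome samples.
    Returns (offset, base, observed, sim_mean, sim_sd) for each base whose count
    lies more than `threshold` standard deviations from the simulation mean."""
    rng = random.Random(seed)
    observed = flank_counts(peak_sites)
    sims = [flank_counts(rng.sample(genome_sites, len(peak_sites)))
            for _ in range(n_sim)]
    hits = []
    for pos in range(2 * FLANK):
        # Offsets run -5..-1 upstream and +1..+5 downstream of the core.
        offset = pos - FLANK if pos < FLANK else pos - FLANK + 1
        for base in BASES:
            values = [sim[pos][base] for sim in sims]
            mu, sd = mean(values), pstdev(values)
            if sd > 0 and abs(observed[pos][base] - mu) > threshold * sd:
                hits.append((offset, base, observed[pos][base], mu, sd))
    return hits
```

Here `peak_sites` would hold the motif instances extracted from the 200 bp summit regions and `genome_sites` those found genome-wide, both as 17-character strings (5 bp, the 7 bp core, 5 bp). `genome_sites` must be a list at least as long as `peak_sites`.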

Pipeline specifications

Software tools: Bowtie, HOMER
Application: ChIP-seq analysis