Computational protocol: A diverse epigenetic landscape at human exons with implication for expression

[…] Reads from the RNA-Seq datasets for IMR90 cells and B cells were separately aligned to the human transcriptome with Tophat, and transcript-specific expression scores (FPKMs) were assigned by Cufflinks, using the UCSC hg18 transcriptome as a reference. Exon-specific expression scores were assigned by summing the FPKMs of the transcripts in which the exon was included. Because we use constitutive exons throughout the study, these expression values are consistent with gene expression. Sets of highly expressed and lowly expressed exons were defined as the exons in the top and bottom 20th percentile of expression, respectively, restricted to exons with FPKM > 0. For cassette exons, the inclusion rate of each exon was defined as (exon-specific expression score)/(gene expression score), where the gene expression score is the sum of expression scores over all transcripts that overlap at some region.

We computed empirical P-values for observed Pearson correlations by randomly permuting the values in one of the two sets 10,000 times and recording the number of times the Pearson correlation for a random permutation was greater than or equal to the observed correlation. [...]

H3 ChIP-Seq data (GEO accession number: GSM1135044, control condition dataset) were downloaded. The fastq files were downloaded from the EBI website and aligned with Bowtie using the default parameters and requiring unique matches. If multiple reads mapped to the same position on the ‘+’ strand, or to the same stop position on the ‘−’ strand, only a single read was retained. At each genomic location the number of overlapping reads was recorded and then smoothed by assigning to each position the average value across an 18 bp window centered at that position.

Regions of enhanced DNA accessibility in IMR90 were obtained from the NIH Epigenome Roadmap Project, as made available in Rajagopal et al. Predicted enhancers for IMR90, obtained with the RFECS algorithm on the 24 histone modifications available from the NIH Epigenome Roadmap Project, were extracted from Rajagopal et al. Enhancer regions were defined using a window of −0.5 to +0.5 kb. […]
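
The empirical P-value procedure for Pearson correlations described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson` and `permutation_pvalue` are hypothetical, the test is taken to be one-sided as the text states (permuted correlation greater than or equal to the observed one), and dividing the recorded count by the number of permutations to obtain the P-value is an assumption, since the text only says the count was recorded.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Return (observed r, empirical one-sided P-value).

    One of the two sets (y) is shuffled n_perm times; we count how often
    the permuted correlation is >= the observed correlation.
    Assumption: P-value = count / n_perm.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

With 10,000 permutations the smallest non-zero P-value this scheme can report is 1/10,000, so a result of 0 is best read as P < 10^-4.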

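The ChIP-Seq read processing described above (keeping one read per anchor position, counting overlapping reads, and smoothing over an 18 bp window) can be sketched as follows. This is a hedged illustration under several assumptions that are not stated in the text: reads are given as 0-based half-open `(start, end, strand)` tuples on a single chromosome, the even 18 bp window is split as 9 positions upstream and 8 downstream of the center, and windows are truncated at the chromosome ends. The function names are hypothetical.

```python
def dedup_reads(reads):
    """Keep one read per start ('+' strand) or stop ('-' strand) position."""
    seen = set()
    kept = []
    for start, end, strand in reads:
        anchor = start if strand == "+" else end
        key = (strand, anchor)
        if key not in seen:
            seen.add(key)
            kept.append((start, end, strand))
    return kept


def coverage(reads, length):
    """Number of reads overlapping each position of a region of given length."""
    diff = [0] * (length + 1)
    for start, end, _ in reads:
        # Clamp to the region so out-of-range reads cannot index past the array.
        diff[min(max(start, 0), length)] += 1
        diff[min(max(end, 0), length)] -= 1
    cov = []
    running = 0
    for i in range(length):
        running += diff[i]
        cov.append(running)
    return cov


def smooth(cov, window=18):
    """Replace each value by the mean over a window centered at the position.

    Assumption: for an even window, window // 2 positions are taken upstream
    and the remainder downstream; windows are truncated at the edges.
    """
    left = window // 2
    right = window - left - 1
    prefix = [0]
    for value in cov:
        prefix.append(prefix[-1] + value)
    n = len(cov)
    smoothed = []
    for i in range(n):
        lo = max(i - left, 0)
        hi = min(i + right + 1, n)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed
```

A typical call chain would be `smooth(coverage(dedup_reads(reads), chrom_length))`, where `chrom_length` is a placeholder for the length of the region being profiled.
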
Pipeline specifications

Software tools: Bowtie, RFECS
Application: ChIP-seq analysis
Organisms: Homo sapiens