Computational protocol: Bioinformatic analysis of cis-regulatory interactions between progesterone and estrogen receptors in breast cancer

Protocol publication

[…] PR data were taken from a previously published study and ERα data were obtained from the ENCODE project. PR data were generated by treating T47D cells with the progestin ORG2058 for 45 min, followed by PR-specific chromatin immunoprecipitation and deep sequencing (ChIP-Seq). Gertz et al. studied ERα binding sites after treatment with estradiol (E2), genistein (GEN) and bisphenol A (BPA) and concluded that, compared to E2, GEN and BPA treatment results in fewer ERα binding sites and less change in gene expression. We selected the E2-treated dataset for our study. Both datasets consisted of 36 base pair reads sequenced on the Illumina platform: the PR data were generated on an Illumina Genome Analyzer IIx, while the ERα libraries were sequenced on an Illumina HiSeq 2000. The data used in this study were derived from peer-reviewed publications, suggesting that they are of acceptable quality. In addition, we ran standard quality control checks before re-analysing the raw data. The two studies used different genome assemblies and different tools to align the reads and to call the peaks. Therefore, to remove any resulting bias, we re-analysed the raw ERα and PR data. We mapped the raw reads to the GRCh37/hg19 assembly using Bowtie version 2. The aligned replicates were merged using Picard tools, and Model-based Analysis of ChIP-Seq (MACS) version 1.4.2 was run with default settings to identify PR and ERα binding regions in the two datasets. Regions with a false discovery rate (FDR) greater than 5% were removed.

We performed motif analysis using the HOMER software. HOMER employs a differential motif discovery algorithm that compares two sets of sequences and quantifies consensus motifs that are enriched in one set relative to the other. HOMER automatically generates an appropriate background sequence set matched for GC content to avoid bias from CpG islands.
HOMER was designed specifically for analysing DNA regulatory elements in ChIP-Seq experiments and has been used in a number of high-impact publications.

Overlapping features were studied in BiSA. BiSA is a bioinformatics database resource that can be run on Windows as a personal resource or web-based under Galaxy as a collaborative tool. BiSA is pre-populated with published transcription factor and histone modification datasets and allows investigators to run a number of overlapping and non-overlapping genomic region analyses using their own datasets, or against the pre-loaded Knowledge Base. Overlapping features can be visualised as a Venn diagram, and binding regions of interest can also be annotated with nearby genes. BiSA also provides a graphical interface for assessing the statistical significance of the observed overlap between two genomic region datasets by implementing the IntervalStats tool. The tool calculates a p-value for each peak region by comparing a region from the query dataset to all regions in a reference dataset. The analysis is restricted to regions that lie within a domain dataset, which can be the whole genome or a set of possible interval locations such as promoter proximal regions. From the p-values calculated by IntervalStats, BiSA derives a summary statistic that we refer to as the Overlap Correlation Value (OCV). The OCV is the fraction of regions in the query dataset with a p-value below a specified threshold. It ranges from 0 to 1, and values closer to 1 indicate a more significant overlap between the two datasets. In BiSA, we set the threshold p-value to 0.05 and used several domains, such as the whole genome and promoter proximal regions, for this analysis.

We also investigated the spatial correlation between the whole datasets, that is, whether their regions lie closer to each other than expected by chance, using Binary Interval Search (BITS) and GenometriCorr.
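The Overlap Correlation Value described above reduces to a simple fraction once the per-region p-values are available. The following is a minimal sketch, not BiSA's actual code: the function name `overlap_correlation_value` and the example p-values are hypothetical, and the p-values are assumed to have already been computed per query region by the overlap significance tool.

```python
from typing import Sequence


def overlap_correlation_value(p_values: Sequence[float], threshold: float = 0.05) -> float:
    """Fraction of query regions whose p-value is below `threshold`.

    `p_values` holds one p-value per query region (assumed to be
    precomputed); the name and signature are illustrative only.
    """
    if len(p_values) == 0:
        raise ValueError("at least one query region is required")
    # Count regions with a p-value strictly less than the threshold.
    significant = sum(1 for p in p_values if p < threshold)
    return significant / len(p_values)


# Made-up p-values for six query regions (illustration only).
example_p_values = [0.001, 0.20, 0.03, 0.60, 0.049, 0.05]
print(f"OCV = {overlap_correlation_value(example_p_values):.2f}")
```

Because the comparison is strict, a region with a p-value of exactly 0.05 does not count towards the OCV in this sketch; whether BiSA treats the boundary the same way is not stated in the text.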
BITS implements a Monte Carlo simulation that compares the observed overlap between the two region sets with the overlap obtained for randomised regions. GenometriCorr treats one genomic region set as the reference and the other as the query, and provides four asymmetric pair-wise statistical tests: (i) relative distance, also called local correlation, (ii) absolute distance, (iii) the Jaccard statistic and (iv) the projection test. In the local correlation test, the significance of the relative distance between the genomic regions is measured by a Kolmogorov–Smirnov test. In the absolute distance test, the significance of the base pair distance between the regions is measured by a permutation test. The Jaccard statistic is the ratio of the number of intersecting base pairs to the number of base pairs in the union of the two sets. The projection test counts the query region centre points that fall within reference regions and assesses the deviation of this count from the null expectation with a binomial test. We performed 10,000 simulations for the BITS and GenometriCorr statistical tests.

We performed functional annotation of ERα-PR common cis-regulatory regions using GREAT (Genomic Regions Enrichment of Annotations Tool). GREAT incorporates annotations from 20 ontologies covering gene ontology, phenotype data, human disease pathways, gene expression, regulatory motifs and gene families. We ran GREAT with its default settings. A region was considered to have a proximal association with a gene if it was within 5 kb upstream or 1 kb downstream of the transcription start site (TSS). Regions outside this window, up to 1,000 kb from the TSS or until the proximal region of the next gene, were considered to have a distal association. […]
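The proximal and distal association rule described above can be illustrated with a short sketch. This is a simplified illustration, not GREAT's implementation: the function names are hypothetical, a region is assumed to be represented by a single position (for example its midpoint), and the truncation of the distal extension at a neighbouring gene's proximal region is deliberately left out.

```python
PROXIMAL_UPSTREAM = 5_000      # 5 kb upstream of the TSS
PROXIMAL_DOWNSTREAM = 1_000    # 1 kb downstream of the TSS
MAX_EXTENSION = 1_000_000      # 1,000 kb distal limit


def signed_distance_to_tss(position: int, tss: int, strand: str) -> int:
    """Distance from the TSS along the direction of transcription.

    Negative values are upstream, positive values are downstream.
    """
    offset = position - tss
    return offset if strand == "+" else -offset


def classify_association(position: int, tss: int, strand: str) -> str:
    """Return 'proximal', 'distal' or 'none' for a region position.

    Simplification (assumption): the neighbouring gene's proximal
    region, which would cut the distal extension short, is ignored.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    distance = signed_distance_to_tss(position, tss, strand)
    if -PROXIMAL_UPSTREAM <= distance <= PROXIMAL_DOWNSTREAM:
        return "proximal"
    if abs(distance) <= MAX_EXTENSION:
        return "distal"
    return "none"


# The same genomic position is upstream on '+' but downstream on '-'.
print(classify_association(96_000, 100_000, "+"))
print(classify_association(96_000, 100_000, "-"))
```

The strand matters because "upstream" is defined relative to the direction of transcription: a position 4 kb to the left of the TSS falls inside the 5 kb upstream window for a plus-strand gene but lies 4 kb downstream, beyond the 1 kb limit, for a minus-strand gene.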

Pipeline specifications

Software tools: Bowtie, Picard, HOMER, BiSA, Galaxy, GenometriCorr
Application: ChIP-seq analysis
Diseases: Breast Neoplasms
Chemicals: Estrogens, Progesterone