Computational protocol: Enrichment of schizophrenia heritability in both neuronal and glia cell regulatory elements

Protocol publication

[…] H3K4me3 chromatin immunoprecipitation (ChIP)-sequencing data from human embryonic stem cell-derived neurons were generated by the Epigenomics Roadmap. Preprocessed broad peak calls for H3K4me3 were downloaded from http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/. Peaks with a negative log10 P-value of 5 or greater were retained for further analysis, as described in the original MACS protocol.

H3K4me3 ChIP-sequencing data from NeuN-positive and NeuN-negative sorted human post-mortem brain nuclei (GEO accession numbers: GSM529964, GSM529969, GSM529971 and GSM529972) were downloaded from the Sequence Read Archive (SRA; GEO series accession number: GSE21172). These data correspond to ChIP experiments performed on two donor samples, for which both NeuN-positive and NeuN-negative sorted cell populations were available. SRA files were converted to fastq files using fastq-dump, part of the SRA toolkit available from NCBI (http://www.ncbi.nlm.nih.gov/Traces/sra). Fastq files were then processed as previously described. Briefly, data were aligned using bowtie2, then converted to bam, sorted and indexed using SAMtools. The resulting bam files for the two biological replicates of each cell type were merged using BEDTools to produce single consolidated read sets for the NeuN-positive and NeuN-negative cell types. All four files (two biological replicates for each cell type) were also merged to produce an additional read set that mimicked data generated from non-sorted post-mortem tissue, hereafter called the ‘reconstituted’ brain data set. Peaks were called as previously described using broad peak calling in MACS 2.0 with broad-cutoff = 0.1 and q-value = 0.01.

Aligned H3K4me3 ChIP-seq reads generated from astrocytes were downloaded from GEO (GSM733747) along with matched input samples (GSM733678). Peaks were called for each replicate as described above.
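
The P-value filter applied to the peak calls can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the standard broadPeak column layout, in which the eighth tab-separated column holds the negative log10 P-value, and the function name and example records are hypothetical.

```python
def filter_broadpeak_by_pvalue(lines, min_neg_log10_p=5.0):
    """Keep broadPeak records whose -log10 P-value (column 8) is >= the threshold.

    Assumes tab-separated broadPeak records:
    chrom, start, end, name, score, strand, signalValue, pValue, qValue.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and optional header/track lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if float(fields[7]) >= min_neg_log10_p:
            kept.append(line)
    return kept


# Hypothetical records for illustration only (not real peak calls).
example = [
    "chr1\t1000\t2000\tpeak_1\t500\t.\t8.2\t12.4\t10.1",
    "chr1\t5000\t5600\tpeak_2\t120\t.\t2.1\t3.7\t1.9",
    "chr2\t300\t900\tpeak_3\t300\t.\t4.5\t5.0\t3.2",
]
kept = filter_broadpeak_by_pvalue(example)
print(len(kept))
```

The comparison is inclusive, matching the "5 or greater" wording, so a peak with a value of exactly 5.0 is retained.
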
Peaks with a negative log10 P-value of 5 or greater were retained, and the resulting peak files were intersected using BEDTools to produce the peak set used in further analysis.

H3K4me3 and input ChIP-sequencing data from human adipocyte cells were downloaded from GEO series GSE41629 and processed as described for the astrocytes, with the exception that these data consisted of only a single replicate. This peak set was used as a negative control in the analysis to ensure that results were specific to the brain cell types of interest.

Cell-type-specific peaks were extracted by intersecting the appropriate processed bed files (e.g. the NeuN-positive peak file and the NeuN-negative peak file) using BEDTools. [...]

Heritability was partitioned using LD score regression following previously described methodology. Consistent with previous partitioned heritability studies, all peaks were extended by ±500 base pairs. LD score files were made for each specific annotation of interest using the open source software available here: https://github.com/bulik/ldsc/wiki. The major histocompatibility complex (MHC) region was removed from the schizophrenia GWAS summary statistics because of the LD structure of the region, as was done in the original analysis. Each annotation was added to the baseline model independently, creating 12 separate models, as was done in the original analysis. The baseline model includes 24 non-cell-specific annotations that cover a range of DNA features, such as coding, three prime untranslated region, promoter, intronic, H3K4me1 marks, H3K4me3 marks, H3K9ac marks, H3K27ac marks, DNase I hypersensitivity sites, chromHMM and Segway predictions, regions conserved in mammals, super-enhancers, and FANTOM5 enhancers (please see ref. for more information about the baseline model). Z-scores were extracted from each model for the annotation of interest and were used to calculate P-values, hereafter called z-score P-values.
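
Two of the steps above are simple enough to sketch directly: the ±500 base pair peak extension and the conversion of a z-score into a P-value. The sketch below is illustrative only. The function names are hypothetical, and the use of a one-sided (upper-tail) normal P-value is an assumption, since the text does not state whether the test was one- or two-sided.

```python
import math


def extend_peak(chrom, start, end, flank=500):
    """Extend a BED-style interval by `flank` bp on each side.

    The start is clipped at 0. The end is not clipped to the chromosome
    length here, because that would need a chromosome sizes table.
    """
    return chrom, max(0, start - flank), end + flank


def z_to_p_one_sided(z):
    """Upper-tail P-value of a standard normal z-score, P(Z >= z).

    Assumption: one-sided test for a positive coefficient.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(extend_peak("chr1", 200, 1200))
print(z_to_p_one_sided(0.0))
```

In practice the extension would usually be done on whole bed files with a genome-aware tool so that intervals are also clipped at chromosome ends.
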
Additionally, we performed a conditional partitioned heritability analysis. For this, we ran one model that included both the NeuN-positive and NeuN-negative H3K4me3 annotations along with the baseline, and another model that included the human stem cell-derived neuron and cultured primary human astrocyte H3K4me3 annotations along with the baseline.

We report the enrichment of schizophrenia heritability within each cell-type-specific functional category, because enrichment is easier than a z-score to interpret in terms of the relationship between schizophrenia heritability and cell-type-specific functional categories. However, we report the z-score P-value alongside these enrichments instead of the enrichment P-value, because the z-score P-value controls for the overlap of cell-type-specific annotations with other functional categories included in the full baseline model or with annotations included in a conditional partitioned heritability analysis, whereas the enrichment P-value does not.

In total, we performed 12 tests of enrichment and two conditional tests, for a total of 14 tests. Using Bonferroni’s correction for multiple testing, P-values below 3.57 × 10⁻³ were considered to have withstood correction for multiple testing. […]
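
The multiple-testing threshold quoted above can be reproduced with a short calculation. This sketch assumes a family-wise significance level of 0.05, which is not stated explicitly in the text but is consistent with the reported threshold for 14 tests. The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 12 enrichment tests plus 2 conditional tests, as described in the text.
n_tests = 12 + 2
# Assumption: family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, n_tests)
print(f"{threshold:.2e}")  # 3.57e-03
```
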

Pipeline specifications

Software tools: LDSC, ChromHMM
Application: GWAS
Organisms: Homo sapiens