Computational protocol: Polygenic pleiotropy and potential causal relationships between educational attainment, neurobiological profile, and positive psychotic symptoms

Protocol publication

[…] Genomic DNA from blood samples was extracted by standard procedures at the Massachusetts General Hospital Center for Genomic Medicine. Genotyping was performed at the Broad Institute using the Illumina Infinium OmniExpress array (Illumina Inc., San Diego, CA, USA). The quality control (QC) procedures have been described elsewhere. Briefly, we excluded 9 individuals with discordant sex information, a missing genotype rate >5%, a heterozygosity rate >3 SD from the mean, shared identity by descent (IBD) >0.125, or non-European ancestry based on principal component analyses. We removed ~45,000 SNPs that were on the X or Y chromosome or had a minor allele frequency <0.05, a call rate <98%, or P < 1 × 10^−6 for deviation from Hardy–Weinberg equilibrium. The QC steps were carried out with PLINK and resulted in a total of 374 subjects with genotype data on 664,907 autosomal SNPs.

We then performed genotype imputation, using the phased haplotypes from the 1000 Genomes Project dataset as the reference panel. Prephasing and imputation were done with SHAPEIT and IMPUTE2, respectively. The imputation was performed with the default parameters of the software. The final imputed dataset consisted of 9.7 million autosomal SNPs. […]

We used GWAS summary statistics for SCZ and BPD from the Psychiatric Genomics Consortium (PGC), educational attainment (college completion) from the Social Science Genetic Association Consortium (SSGAC), and childhood intelligence from the Childhood Intelligence Consortium (CHIC) as the discovery datasets to derive genome-wide PRSs for each of the above psychiatric or cognitive phenotypes in the study sample. The SCZ discovery sample consisted of 46 non-overlapping case–control samples (33,356 cases and 43,724 controls) and 3 family-based samples (1396 parent affected–offspring trios). The BPD discovery sample included 11 case–control samples (7481 cases and 9250 controls).
The college completion discovery sample was combined from 42 GWAS samples (22,475 college and 78,594 non-college), and 95.8% of the individuals were older than 30 years. The childhood intelligence discovery sample consisted of 6 cohorts with a total of 12,411 children aged 6 to 18 years. All subjects in the discovery samples were of European ancestry. There were no overlapping individuals between these discovery samples and our study sample.

To retain only independent association signals from these discovery GWAS, we applied a linkage disequilibrium (LD) clumping procedure to each discovery dataset, in which we retained the SNP with the smallest P value in each 250 kb window and removed all SNPs in LD (r² > 0.1) with it. We also excluded the major histocompatibility complex region between 26 and 33 Mb on chromosome 6 when calculating the PRSs, because of the complex haplotype and LD structure in this region. For each psychiatric or cognitive phenotype, we used five association P value thresholds (PTs: 0.001, 0.01, 0.05, 0.1, and 0.5) to select index SNPs from the clumped independent SNPs for calculating the PRSs. For each individual, we calculated the PRS for each psychiatric or cognitive phenotype by summing the risk allele counts of the index SNPs, weighted by the log of their association odds ratios (for SCZ, BPD, and college completion) or the beta coefficients (for childhood intelligence) estimated from the discovery GWAS results.

We used PRSice v1.23 to calculate the PRSs and test the association between each PRS and the globally impaired ERP group. Associations were tested using logistic regression models that included the top 3 principal components (PCs) of ancestry from the EIGENSTRAT analysis as covariates. We adjusted for only the first 3 PCs because the 4th PC added very little (<2%) to the total explained variance. Wald test P values and Nagelkerke's R² values are reported.
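The MHC exclusion, clumping, thresholding, and scoring steps described above can be illustrated with a minimal Python sketch. This is not the PRSice implementation: the SNP identifiers, P values, odds ratios, LD values, and genotype below are hypothetical toy data, the function names are invented for illustration, and the 250 kb window is assumed to mean "within 250 kb of the index SNP".

```python
import math

# Hypothetical toy summary statistics:
# SNP id -> (chromosome, position in bp, P value, odds ratio)
SUMSTATS = {
    "rs1": ("1", 1_000_000, 1e-5, 1.20),
    "rs2": ("1", 1_100_000, 1e-3, 1.10),   # in LD with rs1, so clumping drops it
    "rs3": ("1", 5_000_000, 0.03, 0.90),
    "rs4": ("6", 30_000_000, 1e-6, 1.50),  # inside the excluded MHC region
}

# Hypothetical pairwise LD; pairs not listed are treated as r2 = 0.
LD_R2 = {frozenset(("rs1", "rs2")): 0.8}


def r2(a, b):
    """Look up the (symmetric) LD r2 between two SNPs."""
    return LD_R2.get(frozenset((a, b)), 0.0)


def drop_mhc(stats, chrom="6", start=26_000_000, end=33_000_000):
    """Remove SNPs located in chr6:26-33 Mb."""
    return {s: v for s, v in stats.items()
            if not (v[0] == chrom and start <= v[1] <= end)}


def clump(stats, window_bp=250_000, r2_max=0.1):
    """Greedy clumping: walk SNPs from smallest to largest P value, keep each
    surviving SNP as an index SNP, and drop nearby SNPs in LD with it."""
    kept, dropped = [], set()
    for sid in sorted(stats, key=lambda s: stats[s][2]):
        if sid in dropped:
            continue
        kept.append(sid)
        chrom, pos = stats[sid][:2]
        for other, rec in stats.items():
            if other == sid or other in dropped or other in kept:
                continue
            near = rec[0] == chrom and abs(rec[1] - pos) <= window_bp
            if near and r2(sid, other) > r2_max:
                dropped.add(other)
    return kept


def prs(genotype, stats, index_snps, p_threshold):
    """Sum of allele counts weighted by log(odds ratio), restricted to index
    SNPs passing the P value threshold. Counts refer to the allele that the
    odds ratio describes."""
    return sum(genotype[s] * math.log(stats[s][3])
               for s in index_snps if stats[s][2] <= p_threshold)


stats = drop_mhc(SUMSTATS)
index_snps = clump(stats)
genotype = {"rs1": 2, "rs2": 1, "rs3": 1, "rs4": 2}  # hypothetical allele counts
scores = {pt: prs(genotype, stats, index_snps, pt)
          for pt in (0.001, 0.01, 0.05, 0.1, 0.5)}
print(index_snps, scores)
```

For a quantitative discovery trait such as childhood intelligence, the same sum would use the GWAS beta coefficient directly in place of `math.log(odds_ratio)`.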
We performed the above PRS association analyses on the entire study sample and then repeated the same analyses on the case-only subsample. We used POLYGENESCORE software in R to calculate statistical power for the association between each PRS and the globally impaired ERP (see Supplementary Methods). […]
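As a rough illustration of the Nagelkerke's R² statistic reported for the PRS associations, the sketch below computes it from model log-likelihoods. The log-likelihood values are hypothetical, the function names are invented for illustration, and reporting the PRS contribution as the difference between the full model (PRS plus PCs) and the covariates-only model is an assumed convention that the protocol text does not spell out.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R2 from the log-likelihoods of an intercept-only model
    (ll_null) and a fitted logistic model (ll_model) on n observations."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)  # maximum of Cox-Snell R2
    return cox_snell / upper_bound


def incremental_r2(ll_null, ll_covariates, ll_full, n):
    """R2 attributable to the PRS: full model (PRS + PCs) minus PCs only.
    This differencing is an assumption made for illustration."""
    return (nagelkerke_r2(ll_null, ll_full, n)
            - nagelkerke_r2(ll_null, ll_covariates, n))


# Hypothetical log-likelihoods; n = 374 is the post-QC sample size from the text.
n = 374
ll_null, ll_cov, ll_full = -250.0, -247.0, -240.0
print(round(incremental_r2(ll_null, ll_cov, ll_full, n), 4))
```

Dividing by the upper bound rescales the Cox–Snell R² so that a perfectly fitting model reaches 1, which is why this variant is commonly preferred for binary outcomes.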

Pipeline specifications

Software tools: PLINK, SHAPEIT, IMPUTE, PRSice
Application: GWAS
Organisms: Homo sapiens
Diseases: Psychoses, Substance-Induced