Computational protocol: Largest GWAS of PTSD (N=20 070) yields genetic overlap with schizophrenia and sex differences in heritability

Protocol publication

[…] Eleven groups contributed data for analysis: nine contributed individual-level data, and two could share only single-nucleotide polymorphism (SNP)-level summary statistics. These two groups implemented the quality control (QC) and GWAS analyses outlined below before sharing summary statistics. Individual-level genotypes are necessary for genome-wide complex trait analysis (GCTA) and polygenic risk scoring (PRS); hence, these two data sets are not included in the polygenic analyses. For all data sets, the first analytical step was ancestry assignment. As described below, this yielded 19 ancestry-assigned data sets for analysis across the 11 contributing studies. Next, QC procedures and imputation were performed, and then single-SNP analyses (GWAS and meta-analyses) and polygenic analyses (GCTA, PRS and linkage disequilibrium (LD) score regression (LDSC)) were completed. [...]
SNPweights software was used to assign ancestry. It was applied to uncleaned data because Hardy–Weinberg equilibrium filters can remove ancestry-informative SNPs in samples with mixed ancestry background, so applying such filters before ancestry assignment is not compatible with this protocol. Ancestry assignment is separate from controlling for ancestry analytically, which is described below. Briefly, SNPweights uses external genomic reference panels to derive ancestry-informative weights for individual SNPs. We used the four-group continental ancestry panel to quantify each participant’s proportions of African (YRI), European (CEU), Asian (ASI) and Native American (NAT) ancestry. The specific cutoffs implemented in this report for assigning ancestry, which were selected after inspecting principal components (PC) plots and adjusting the cutoffs, were as follows. European American (EA) was defined as ⩾90% European ancestry. African American (AA) was defined as ⩾90% for the combination of African and European ancestry, and also <3% Asian and Native American ancestry.
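As a rough illustration, the cutoff-based assignment can be sketched in Python. The EA and AA rules come from the text above, and the LA rule from the paragraph that follows. The function name, the order in which the categories are tested, and the reading of "<3% Asian and Native American ancestry" as a separate 3% limit on each component are assumptions made for this sketch; it is not the SNPweights software, which supplies the ancestry proportions used as input.

```python
from typing import Optional


def assign_ancestry(yri: float, ceu: float, asi: float, nat: float) -> Optional[str]:
    """Map SNPweights-style ancestry proportions (0 to 1) to a group label.

    Returns "EA", "AA", "LA", or None for individuals who fall outside
    all three categories (excluded from analysis in the protocol).
    """
    # EA: at least 90% European ancestry.
    if ceu >= 0.90:
        return "EA"
    # AA: African + European at least 90%, with Asian and Native American
    # each below 3% (assumption: the 3% limit applies to each separately).
    if yri + ceu >= 0.90 and asi < 0.03 and nat < 0.03:
        return "AA"
    # LA (two-way admixture): European + Native American at least 85%,
    # African below 10%, Asian below 3%.
    if ceu + nat >= 0.85 and yri < 0.10 and asi < 0.03:
        return "LA"
    # Assumption: categories are tested in the order EA, AA, LA, so an
    # individual meeting more than one rule receives the first match.
    return None


if __name__ == "__main__":
    examples = {
        "mostly European": (0.02, 0.95, 0.01, 0.02),
        "African/European": (0.80, 0.18, 0.01, 0.01),
        "European/Native American": (0.05, 0.50, 0.01, 0.44),
        "four-way mixture": (0.30, 0.30, 0.20, 0.20),
    }
    for label, proportions in examples.items():
        print(label, "->", assign_ancestry(*proportions))
```

The South African data sets are not covered by this sketch, since the protocol assigns them by PCA and visual inspection rather than by SNPweights cutoffs.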
Individuals who self-identify as Latino or Hispanic (LA) in the United States may have predominantly European ancestry and may have relatively recent admixture with Native American ancestry or with both Native American and African ancestry. In this report we focused on LA individuals with two-way admixture; the cutoffs used were ⩾85% for the combination of European and Native American ancestry, <10% African ancestry and <3% Asian ancestry. With the exception of the South African data (see below), individuals not falling into the EA, AA or LA categories were excluded from analysis (on the basis of low case and/or control numbers).
Data contributed from South Africa were handled separately because the African ancestry panel used in SNPweights (YRI) is a West African sample. For the South African sample, the typical GWAS method of conducting principal components analysis (PCA), followed by visual inspection of the data, was used. In doing so, we identified two populations, which are the two South African data sets in this report. For comparison of PGC-PTSD ancestry to previously published meta-analyses of psychiatric phenotypes see . Recent admixture is evident for AA and LA participants in this study (). [...]
QC procedures were performed sequentially on each of the 19 data subsets as follows. Monomorphic SNPs and SNPs with missingness >0.05 were removed, and individuals with missingness >0.02 were removed. Individuals with heterozygosity exceeding 0.2 in absolute value and individuals failing sex checks were removed. SNPs with missingness >0.02 were then removed (this more stringent SNP missingness filter was applied after the individual-level filters). SNPs with differential missingness between cases and controls >0.02 were removed. SNPs failing Hardy–Weinberg equilibrium in controls (P<1 × 10⁻⁶) or in cases (P<1 × 10⁻¹⁰) were removed. All analyses were performed using second-generation PLINK.
PCA was performed within each data set and then across all data sets using FastPCA.
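The sequential QC thresholds described above can be summarized as three predicates, sketched here in Python. The class and function names are hypothetical, and the sketch assumes that the per-SNP and per-individual statistics have already been computed (for example by PLINK) and are recomputed between stages. It only encodes the cutoffs and does not reproduce the PLINK-based pipeline itself. The heterozygosity value is treated as a signed deviation compared against 0.2 in absolute value, which is an interpretation of the text.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    monomorphic: bool
    missing_rate: float               # fraction of individuals without a call
    case_control_missing_diff: float  # missingness in cases minus controls
    hwe_p_controls: float
    hwe_p_cases: float


@dataclass
class SampleStats:
    missing_rate: float  # fraction of SNPs without a call
    het: float           # heterozygosity deviation (assumed to be signed)
    sex_check_ok: bool


def snp_passes_initial(snp: SnpStats) -> bool:
    """Stage 1: drop monomorphic SNPs and SNPs with missingness > 0.05."""
    return not snp.monomorphic and snp.missing_rate <= 0.05


def sample_passes(sample: SampleStats) -> bool:
    """Stage 2: drop individuals with missingness > 0.02, |het| > 0.2,
    or a failed sex check."""
    return (
        sample.missing_rate <= 0.02
        and abs(sample.het) <= 0.2
        and sample.sex_check_ok
    )


def snp_passes_final(snp: SnpStats) -> bool:
    """Stage 3: stricter SNP filters, applied after the individual-level
    filters, using statistics recomputed on the retained individuals."""
    return (
        snp.missing_rate <= 0.02
        and abs(snp.case_control_missing_diff) <= 0.02
        and snp.hwe_p_controls >= 1e-6
        and snp.hwe_p_cases >= 1e-10
    )
```

Keeping the stages as separate functions mirrors the order in the text: the lenient SNP filter runs first so that poorly genotyped SNPs do not inflate per-individual missingness, and the stricter SNP filter runs only after low-quality individuals have been removed.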
PCA was conducted on high-quality SNPs with low LD that passed the following filters: directly genotyped in all data sets; minor allele frequency (MAF) >0.05; Hardy–Weinberg equilibrium P>1 × 10⁻⁴; not strand ambiguous (i.e., no A/T or G/C SNPs); not in a high-LD region (MHC chr6:25–35 Mb, chr8 inversion chr8:7–13 Mb); and r² between SNPs <0.2 (i.e., the PLINK option ‘--indep-pairwise 200 100 0.2’, applied twice). Within each data set, scatterplots of PCs were visually examined and outliers removed. This process was repeated until cases and controls appeared evenly interspersed across all PC pairs.
Imputation to the 1000 Genomes phase 1 reference was performed within the PGC pipeline using SHAPEIT for phasing and IMPUTE2 for imputation. Imputation was performed with a chunk size of 3 Mb and default parameters on the full set of 2186 phased haplotypes (August 2012, 30 069 288 variants, release ‘v3.macGT1’). Samples were then combined (within ancestry groups) for relatedness testing and calculation of PC covariates. The same filters as above were employed, and we removed one individual from each pair of related or duplicate individuals (pi-hat >0.2), preferentially retaining cases. [...]
Single-variant analysis (GWAS within each of the 19 data subsets) was performed on dosage data using an additive model in PLINK, with the first 10 PCs as covariates. Fixed-effects meta-analysis was performed using METAL with inverse-variance weighting. Plotting was performed in R. Analyses were completed both with study-specific PCs and with PCs computed within each ancestry group (‘generic’ PCs for AA, EA and LA, and separately for the two South African data sets), with similar results. The final results shown in the Manhattan plot, QQ plot, top-hits tables and online use study-specific PCs.
Gene and pathway analyses were completed using MAGMA (Multi-marker Analysis of GenoMic Annotation) with default parameters as described in the manual (version 1.06).
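For readers unfamiliar with the fixed-effects, inverse-variance-weighted meta-analysis used for the single-variant results, the calculation for one SNP can be sketched as follows. This is a generic textbook version written in Python for illustration. The function name and the input format (per-study effect estimates and standard errors for one SNP, aligned to the same effect allele) are assumptions, and the sketch does not reproduce METAL or its allele-alignment and filtering logic.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Combine per-study effect sizes for one SNP under a fixed-effects model.

    Each study is weighted by 1 / SE^2. Returns the pooled effect, its
    standard error, the Z score, and a two-sided P-value from the normal
    distribution.
    """
    if len(betas) != len(ses) or len(betas) == 0:
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical effect estimates for one SNP in three data subsets.
    beta, se, z, p = inverse_variance_meta([0.10, 0.30, 0.20], [0.10, 0.10, 0.20])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.3f} p={p:.3g}")
```

Because the weights depend only on the standard errors, larger and more precisely estimated data subsets dominate the pooled estimate, which is the intended behavior of a fixed-effects analysis.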
Gene and pathway analyses were conducted on the 16 data sets with individual-level genotype data (versus SNP-level P-values) so that we could control for ancestry using PCs. This was particularly important for the AA, LA and South African samples, because the external sequence reference data needed for the summary-statistics option are poorer for these populations. Thus, we performed gene-based analyses on each of the 16 data sets with genotype data and PCs, followed by pathway analyses on the gene-level results (as per MAGMA procedures). For completeness, we then also applied the summary-statistics method to the three remaining data sets, for which raw data were not available, and meta-analyzed these (at the gene level, per MAGMA procedures) with the other 16 data sets. Like de Leeuw et al., we used the MSigDB Canonical Pathways because this list contains a wide variety of gene sets drawn from different gene-set databases, thus providing results that are not overly dependent on the choice of a narrow set of gene sets. […]

Pipeline specifications

Software tools: GCTA, LDSC, SNPWEIGHTS, PLINK, SHAPEIT, IMPUTE, MAGMA
Applications: Population genetic analysis, GWAS
Organisms: Homo sapiens
Diseases: Genetic Diseases, Inborn