Computational protocol: Identification of genetic modifiers of age-at-onset for familial Parkinson’s disease

Protocol publication

[…] NGRC subjects were genotyped using Illumina HumanOmni1-Quad_v1-0_B BeadChips (Illumina, San Diego, CA, USA) and the Illumina Infinium II assay protocol. Technical genotyping quality-control criteria have been described in detail elsewhere. The array genotyping call rate was 99.92% and the reproducibility rate was ≥99.99%. Subjects who had been inadvertently enrolled twice or who showed cryptic relatedness (PI-HAT > 0.15) were excluded. SNPs were excluded if the minor allele frequency (MAF) was < 0.01, the call rate was < 99%, the Hardy-Weinberg equilibrium (HWE) test gave P < 1E-6, the MAF difference between males and females was > 0.15, or the test for a difference in missing rate between PD cases and controls gave P < 1E-5. A total of 811,597 SNPs passed quality-control measures (genotype and phenotype data for NGRC are available on dbGaP; http://www.ncbi.nlm.nih.gov/gap, accession number phs000196.v2.p1).
Principal component analysis (PCA) was conducted with HelixTree (http://www.goldenhelix.com) using a pruned subset of 104,064 SNPs, as described previously. No association was detected between principal components (PCs) 1-4 and age-at-onset in all PD (P-values for PC 1-4 = 0.09, 0.15, 0.81, 0.99), in familial PD (P = 0.21, 0.57, 0.73, 0.66), or in non-familial PD (P = 0.21, 0.19, 0.80, 0.95). The GWAS was therefore carried out without adjustment for PCs. We nevertheless re-examined the significant findings with PC1 and PC2 included in the model; the results were similar, and slightly more significant, after correction for PCs.
Imputation was conducted using IMPUTE v2.2.2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) and the 1000 Genomes Phase I integrated variant set release v3. Imputed SNPs with info score < 0.9 or MAF < 0.01 were excluded. A total of 6.4 million imputed SNPs passed quality control. In sum, the GWAS included 7.2 million SNPs (0.8 million genotyped and 6.4 million imputed). Three of the four signals that reached P < 5E-8 were imputed. Because these variants had low frequencies and the quality of imputation for uncommon variants is unclear, we directly genotyped a subset of the samples to check the imputed genotypes.
For TPM1, 29 heterozygotes and 53 common homozygotes, as predicted by imputation, were genotyped (no rare homozygotes were observed); genotyping results were 98% concordant with the imputed genotypes. For TRPS1, 1 rare homozygote, 28 heterozygotes, and 53 common homozygotes, as predicted by imputation, were genotyped; genotyping results were 99% concordant with the imputed genotypes. For KLHDC1, 29 heterozygotes and 53 common homozygotes, as predicted by imputation, were genotyped (no rare homozygotes were observed); genotyping results were 100% concordant with the imputed genotypes.
Replication samples were all directly genotyped from genomic DNA using Sequenom iPLEX (Sequenom, San Diego, CA, USA) and TaqMan assays (Life Technologies, Grand Island, NY, USA). None were imputed. Primers are available on request.
[...] Discovery: The GWAS was conducted using Cox regression survival analysis, in which age-at-onset was treated as a quantitative trait and an additive genetic model was used for SNP genotypes: [Survival(Age-at-onset, PD status) ∼ SNP]. Using the Cox method, dosages (from 0 to 2 copies) of the minor allele of each SNP were compared, age-for-age, for the hazard of developing PD. Survival was measured as disease-free lifespan, from birth to age-at-onset. A hazard ratio (HR) and a P-value were calculated for each SNP under the additive model. The significance threshold was set at P < 5E-8. The “survival” package in R (http://www.r-project.org/) was used for Cox regression. Manhattan plots were generated using Haploview v4.2. QQ plots were generated using R. Genomic inflation factors (λ) were calculated using the “GenABEL” package version 1.8-0 in R. The effect size on age-at-onset was estimated as the difference in mean age-at-onset (β) using linear regression: [Age-at-onset ∼ SNP]. Linear regression was performed in ProbABEL v0.1-9d (http://www.genabel.org/packages/ProbABEL).
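The discovery model [Survival(Age-at-onset, PD status) ∼ SNP] can be illustrated with a minimal sketch. The study itself used the “survival” package in R; the code below is not that implementation. It is a standard-library Python illustration of a single-covariate Cox fit by Newton-Raphson, assuming the Breslow handling of tied event times and a Wald test for the P-value (the source states neither choice). The function name `cox_additive` and the toy ages, event indicators, and dosages are hypothetical and carry no relation to the study data.

```python
import math
from statistics import NormalDist


def cox_additive(times, events, dosages, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model, Survival(time, event) ~ dosage.

    times   : age at onset (event) or age at last observation (censored)
    events  : 1 if PD onset was observed, 0 if censored
    dosages : minor-allele dosage per subject (0 to 2), additive coding
    Returns (beta, se); exp(beta) is the per-allele hazard ratio.
    Ties are handled with the Breslow approximation (an assumption here).
    """
    n = len(times)
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log partial likelihood
        info = 0.0  # negative second derivative (observed information)
        for i in range(n):
            if not events[i]:
                continue
            s0 = s1 = s2 = 0.0
            # Risk set: everyone still disease-free at this onset age.
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * dosages[j])
                    s0 += w
                    s1 += w * dosages[j]
                    s2 += w * dosages[j] ** 2
            mean = s1 / s0
            grad += dosages[i] - mean
            info += s2 / s0 - mean ** 2
        if info <= 0.0:
            raise ValueError("no variation in dosage within the risk sets")
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated at the last iterate.
    return beta, 1.0 / math.sqrt(info)


# Hypothetical toy data: ten subjects, two of them censored.
ages = [45, 50, 52, 58, 60, 63, 66, 70, 72, 75]
status = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
snp = [1, 2, 0, 1, 0, 1, 0, 0, 1, 0]

beta, se = cox_additive(ages, status, snp)
hr = math.exp(beta)
z = beta / se
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided Wald test
print(f"HR per minor allele = {hr:.2f}, P = {p_value:.3f}")
```

A hazard ratio above 1 means that each additional copy of the minor allele raises the age-for-age hazard of onset, which corresponds to earlier onset. In a genome-wide scan the same fit is repeated for every SNP and each P-value is compared with the 5E-8 threshold.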
Replication testing: SNPs that generated P < 5E-8 in discovery were genotyped in all replication samples (familial and non-familial). Replication samples were stratified by family history for statistical testing. For each SNP, we tested the following hypotheses in replication: (a) the SNP is associated with age-at-onset in familial PD, with the minor allele associated with earlier onset, and (b) the SNP is not associated with age-at-onset in non-familial PD. Each SNP was tested in each replication dataset individually using Cox regression in R, followed by meta-analysis of the replication datasets using the “meta” package version 3.2-1 in R. For datasets with 6 or fewer observations, Firth’s penalized estimation was used to improve the precision of the Cox estimates. Datasets with zero observations (lacking the rare allele) were excluded from the Cox and linear regressions but were included in the Kaplan-Meier analysis. The effect size on age-at-onset was calculated for each dataset separately using linear regression in R, and then for all datasets combined using the “meta” package in R. Meta-analysis forest plots were generated using the “meta” package in R. Moving average plots (MAP) of allele frequencies were generated using a previously described algorithm implemented in the “freqMAP” package in R. Kaplan-Meier survival plots were generated and log-rank tests were performed using the “survival” package in R.
Power: The study was designed as a GWAS for common variants; the discovery of uncommon variants was unexpected. A post-hoc power calculation for the GWAS suggested that we had only ∼1% power to detect variants with the frequencies and effect sizes that we actually detected. The replication datasets had >80% power to detect the discovery signals at P = 0.05, assuming no heterogeneity across datasets. The PS program was used for power calculations (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). [...]
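The pooling of per-dataset Cox estimates in the replication meta-analysis can be sketched as follows. The source says only that the “meta” package in R was used; it does not state whether a fixed-effect or random-effects model was reported. This standard-library Python sketch therefore assumes fixed-effect inverse-variance weighting of log hazard ratios. The function name `pool_fixed_effect` and the three example estimates are hypothetical and are not results from the study.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-dataset estimates.

    estimates  : per-dataset effect estimates (e.g. log hazard ratios)
    std_errors : their standard errors
    Returns (pooled, pooled_se, p_value, q), where q is Cochran's Q
    heterogeneity statistic (len(estimates) - 1 degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, p_value, q


# Hypothetical log hazard ratios and standard errors from three datasets.
log_hrs = [0.9, 0.6, 1.2]
ses = [0.4, 0.5, 0.6]

pooled, pooled_se, p_value, q = pool_fixed_effect(log_hrs, ses)
print(f"pooled HR = {math.exp(pooled):.2f}, P = {p_value:.4f}, Q = {q:.2f}")
```

Datasets with smaller standard errors receive proportionally more weight, so the pooled estimate always lies between the smallest and largest per-dataset estimates. A fixed-effect model matches the “no heterogeneity across datasets” assumption stated for the power calculation, but that correspondence is an inference and not something the source confirms.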
We used LocusZoom version 1.1 (http://locuszoom.sph.umich.edu/locuszoom/) to visualize the location and LD of the top association peaks. We examined Roadmap Epigenomics (via http://genomebrowser.wustl.edu) and ENCODE (via http://genome.ucsc.edu/index.html) annotations of putative regulatory elements in the regions of our associated signals. We searched the eQTL and mQTL databases Genevar (https://www.sanger.ac.uk/resources/software/genevar/), eqtl (http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/), SCAN (http://www.scandb.org/newinterface/about.html), and BRAINEAC (http://www.braineac.org) for eQTL or mQTL association results for the associated variants, but the variants were not found in any of these databases, likely because of their low frequencies. […]

Pipeline specifications

Software tools: HelixTree, IMPUTE, Haploview, GenABEL, LocusZoom, Genevar, eqtl.uchicago.edu, GBrowse
Databases: dbGaP
Application: GWAS
Diseases: Brain Neoplasms, Dementia, Parkinson Disease, Neurodegenerative Diseases