Computational protocol: Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies

Protocol publication

[…] We used prespecified criteria for quality control to filter cases and controls from the 12 cohorts. Because contributing sites had used different genotyping platforms, we did imputation to infer genotypes for common genetic variants that were not directly genotyped, allowing us to combine results across sites. Each of the five sites imputed their study datasets according to a standardised protocol. This protocol used IMPUTE2 to infer and impute haplotypes, with the 1000 Genomes Phase I (interim) June, 2011, reference panel.

Investigators at each site did a linear mixed-model association analysis for each of their datasets with FaSTLMM (version 1.09). This analysis uses linear regression, including a polygenic term designed to account for the contributions of population stratification and causal variants aside from the one being tested. Although we were assessing a binary trait, we used linear regression (rather than logistic regression) because we expected effect sizes to be small. We did this analysis separately for each of the preselected phenotypic categories of epilepsy (all epilepsy, genetic generalised epilepsy, and focal epilepsy). Sex was included as a covariate.

We did a fixed-effects meta-analysis with METAL (version generic-metal-2011-03-25). Because almost all epilepsy cases were of European descent, we chose a fixed-effects model to optimise power. Single nucleotide polymorphisms showing significant amounts of heterogeneity (p<0·05) were removed before application of the fixed-effects analysis. We applied genomic correction to the association analysis results for each dataset before combining for meta-analysis. These steps were done separately for each of the three phenotypic tests.

We set our genome-wide threshold for statistical significance at 1·66 × 10⁻⁸, representing an empirical Bonferroni correction of the 5 × 10⁻⁸ genome-wide significance threshold for three tests.
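The meta-analysis steps described above (genomic correction per dataset, a heterogeneity filter at p<0·05, and a fixed-effects combination) can be illustrated with a minimal sketch. It is not METAL and does not reproduce METAL's code. It assumes inverse-variance weighting and assumes that genomic correction is applied by inflating each standard error by the square root of the inflation factor; the function names and the input numbers are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_lambda(z_scores):
    """Inflation factor: observed median chi-square over its expected median."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def fixed_effects(betas, ses, lambdas, het_alpha=0.05):
    """Combine one variant's per-dataset estimates (assumed inverse-variance scheme)."""
    # Assumed genomic correction: inflate each SE by sqrt(lambda) when lambda > 1.
    adj = [se * math.sqrt(max(lam, 1.0)) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / s ** 2 for s in adj]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q heterogeneity statistic, chi-square with k - 1 df.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    het_p = chi2_sf(q, len(betas) - 1) if len(betas) > 1 else 1.0
    return {"beta": beta, "se": se, "p": p, "het_p": het_p,
            "keep": het_p >= het_alpha}


# Hypothetical per-dataset estimates for a single variant.
result = fixed_effects([0.10, 0.12, 0.08], [0.05, 0.04, 0.06], [1.02, 1.00, 1.05])
```

A variant whose `keep` flag is false would be dropped under the heterogeneity rule stated in the text.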
We regarded signals with p values between 1·66 × 10⁻⁸ and 5 × 10⁻⁷ as suggestive evidence of association.

We calculated the proportion of phenotypic variance a variant must explain (heritability) for the detection power to be at least 80%. We used variance explained on the liability scale, for which we assumed a point prevalence of 0·5% for all epilepsy, 0·2% for genetic generalised epilepsy, and 0·3% for focal epilepsy. The required heritability was 0·07% or greater for all epilepsy, 0·17% or greater for genetic generalised epilepsy, and 0·10% or greater for focal epilepsy.

In addition to the main association analysis, we did logistic regression for variants in a 1 megabase window centred on each variant that showed suggestive evidence of association (p<5 × 10⁻⁷) from any of the three meta-analyses (all epilepsy, genetic generalised epilepsy, or focal epilepsy). The purpose of this analysis was technical validation and to estimate odds ratios (ORs). We analysed the dosage data, including sex and the first 20 principal components, with PLINK (version 1.07), and then combined the results from each site again with a fixed-effect meta-analysis.

Conditional analysis was done with FaSTLMM (version 2.0) on variants in the same regions as those defined for the logistic regression. The purpose of the conditional analysis was to establish whether any other genetic variants in the region were associated with the disease phenotype, independent of the strongest signal from that region. We conditioned on the most significant variants within each of the three regions. Sex was included as a covariate in the conditional analysis.
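The significance bands and the 1 megabase follow-up window described above can be written out as a short sketch. The threshold values are taken from the text; the function names, the handling of values exactly on a boundary, and the use of 1-based base-pair coordinates are assumptions made for illustration.

```python
# Thresholds as stated in the text. Note that 5e-8 / 3 is about 1.67e-8;
# the text reports the threshold as 1.66e-8, and that value is used here.
GENOME_WIDE = 1.66e-8
SUGGESTIVE = 5e-7
WINDOW_BP = 1_000_000  # 1 megabase window centred on the index variant


def classify(p):
    """Place a meta-analysis p value into one of the bands used in the text."""
    if p < GENOME_WIDE:
        return "genome-wide significant"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


def window(index_position):
    """Return (start, end) of the window centred on the index variant.

    Assumes 1-based coordinates, so the start is clipped at 1.
    """
    half = WINDOW_BP // 2
    return max(1, index_position - half), index_position + half


def in_window(position, index_position):
    """True if a variant lies inside the window around the index variant."""
    start, end = window(index_position)
    return start <= position <= end
```

For example, `window(2_000_000)` spans positions 1,500,000 to 2,500,000, and any variant with `in_window(...)` true would be carried into the logistic regression and conditional analyses.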
We applied Bonferroni correction to control for multiple testing in the conditional analysis and set the threshold for significance at 5 × 10⁻⁶ (each 1 megabase region contained approximately 10 000 single nucleotide polymorphisms).

To assess the accuracy of the imputation across regions showing signals satisfying genome-wide significance, we did genotyping in a subset of patients included in the meta-analysis and compared hard genotypes with imputation dosage files. We selected a subset of individuals to represent each of the three broad ethnic origins included in our analysis (ie, European ancestry, African-American, and Asian). Genotyping was done with TaqMan (Life Technologies, Carlsbad, CA, USA) for rs28498976, Sanger sequencing for rs6732655, and Kasper KASP (LGC Genomics, Hoddesdon, Hertfordshire, UK) for rs2947349, because differences in sequence context required specific genotyping platforms for each single nucleotide polymorphism.

We did enrichment analysis with the interval-based enrichment analysis tool as integrated in the package INRICH (version 1.0). Briefly, INRICH takes a set of independent, nominally associated genomic intervals and tests for enrichment of predefined gene sets with permutation. We analysed variants with p values less than 1 × 10⁻⁵ and defined the interval around index single nucleotide polymorphisms with an r² threshold of 0·2. Gene sets as defined by gene ontology pathways were tested for enrichment. […]
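The interval definition used as input to the enrichment analysis (variants with p<1 × 10⁻⁵, intervals bounded by variants with r² of at least 0·2 to the index variant) can be sketched as follows. This is an illustration only and not INRICH's or PLINK's implementation: the data layout, the function names, and the variant identifiers are hypothetical, and the r² values are assumed to have been computed elsewhere.

```python
P_CUTOFF = 1e-5   # variants below this p value were analysed (from the text)
R2_CUTOFF = 0.2   # r² threshold defining the interval (from the text)


def select_index_candidates(variants):
    """Return IDs of variants whose p value passes the analysis cutoff.

    `variants` maps a variant ID to a (chromosome, position, p value) tuple.
    """
    return [snp for snp, (_, _, p) in variants.items() if p < P_CUTOFF]


def build_interval(index_snp, variants, r2_with_index):
    """Return (chromosome, start, end) spanning variants in LD with the index.

    `r2_with_index` maps a variant ID to its r² with the index variant.
    """
    chrom, index_pos, _ = variants[index_snp]
    positions = [index_pos]
    for snp, r2 in r2_with_index.items():
        snp_chrom, pos, _ = variants[snp]
        if snp_chrom == chrom and r2 >= R2_CUTOFF:
            positions.append(pos)
    return chrom, min(positions), max(positions)


# Hypothetical variants and r² values, for illustration only.
variants = {
    "idx": (2, 1000, 1e-6),
    "a": (2, 900, 1e-3),
    "b": (2, 1200, 1e-4),
    "c": (2, 5000, 0.5),
}
r2 = {"a": 0.30, "b": 0.25, "c": 0.05}
interval = build_interval("idx", variants, r2)
```

The resulting intervals would then be passed, together with the gene-set definitions, to the permutation-based enrichment test, which this sketch does not attempt to reproduce.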

Pipeline specifications

Software tools: IMPUTE, PLINK, INRICH
Application: GWAS
Diseases: Epilepsy; Epilepsies, Partial; Nervous System Diseases; Genetic Diseases, Inborn