Computational protocol: Assessing the genetic overlap between BMI and cognitive function


Protocol publication

[…] Data for the same-sample GCTA–GREML analysis, the phenotypic correlation analysis and the independent cohort for the polygenic prediction analysis came from Generation Scotland: the Scottish Family Health Study, a population-based, family-structured cohort that sampled over 24 000 people in Scotland between 2006 and 2011. The study was set up for family-based genetic epidemiology research into health outcomes that are highly prevalent in Scotland, including coronary heart disease, stroke, cancer, chronic obstructive pulmonary disease, diabetes and mental illness. The sampling frame of the study focused on 7953 probands aged 35–65 years who were registered with participating general practitioners in five regional centres: Glasgow, Tayside, Ayrshire, Arran and the North-East of Scotland. The probands were invited to participate through the patient lists of the participating general practices; in the United Kingdom, ~96% of the population is registered with a general practitioner. Up to three generations of the probands' relatives were then recruited. There was no ascertainment bias towards a particular disease or health condition. A full description of the cohort has been given previously, and at www.generationscotland.org.

Cognitive function data for the independent-samples bivariate GCTA genetic correlation analysis came from Generation Scotland. Open-access data from dbGaP for the Gene Environment Association Studies initiative (GENEVA) project (comprising three studies, total n=14 347: Atherosclerosis Risk in the Community, the Nurses' Health Study and the Health Professionals' Follow-up Study) and the Health and Retirement Study (n=8652) were used for the BMI analysis. Their dbGaP accession numbers are phs000090.v1.p1 (Atherosclerosis Risk in the Community), phs000091.v2.p1 (GENEVA-T2D) and phs000428.v1.p1 (Health and Retirement Study).
A summary description of the three cohorts, and details of the quality control of the genotyped data and of imputation, can be found elsewhere.

For the LDSC genetic correlation analysis, summary data from the largest GWASs to date for cognitive function and BMI were used. [...]

All phenotypic data analyses were conducted on the unrelated members of the Generation Scotland cohort who had genome-wide genotyping data available (n=6815). To determine the association between cognitive function and BMI, a linear model was fitted with BMI as the dependent variable and the general cognitive factor as the independent variable; age and sex were included as covariates.

Residuals for general cognitive function and BMI, adjusted for age, sex and population stratification, were computed by linear regression; a conservative number (14) of ancestry components was included. The residuals were carried forward to genome-wide complex trait analysis (GCTA-GREML) to estimate the proportion of variation in each variable explained by common SNPs. The univariate GCTA-GREML estimates for general cognitive function have been reported previously.

Three methods were used to estimate the genetic correlation between BMI and general cognitive function. First, bivariate GCTA-GREML was run in Generation Scotland, where the phenotypic and genotypic information came from the same unrelated individuals. This approach estimates the extent to which genetic similarities between individuals correlate with their phenotypic similarities. However, the relatively small sample size for this analysis (and the correspondingly large s.e.) resulted in an imprecise estimate. Second, bivariate GCTA-GREML was applied to cognitive data from Generation Scotland and BMI data on American adults from four publicly available data sets. This approach estimates the genetic correlation through SNP and phenotypic similarities across the independent samples. Third, summary GWAS output from Davies et al. and Locke et al. was used to estimate the genetic correlation via the LDSC regression method.
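The covariate adjustment described above can be sketched as follows. This is a minimal illustration, assuming plain ordinary least squares of each phenotype on an intercept plus age, sex and 14 ancestry components; the function name `residualize` and all of the simulated data are hypothetical and do not reproduce the authors' R code.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS regression on an intercept plus the covariates."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Simulated stand-in data (hypothetical, not Generation Scotland values).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(35, 65, n)
sex = rng.integers(0, 2, n).astype(float)
pcs = rng.normal(size=(n, 14))  # hypothetical 14 ancestry components
covariates = np.column_stack([age, sex, pcs])
bmi = 20 + 0.1 * age + 1.0 * sex + pcs @ rng.normal(scale=0.5, size=14) + rng.normal(size=n)

bmi_resid = residualize(bmi, covariates)
# OLS residuals are orthogonal to every covariate (and to the intercept).
print(np.abs(covariates.T @ bmi_resid).max())
```

Passing residuals rather than raw phenotypes to a downstream GREML model removes the linear effects of these covariates before the variance components are estimated.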
This method does not require raw genotype or phenotype data, nor is it affected by any overlap of individuals between the two GWASs. Briefly, the approach exploits linkage disequilibrium (LD) structure: SNPs in regions of high LD tag a greater part of the genome than those in regions of low LD, so a SNP's association with a phenotype reflects both its own contribution and that of the surrounding SNPs in LD with it. In the bivariate setting, the expected product of the two traits' association statistics (z-scores) for a SNP can be expressed as an intercept term, which absorbs any sample overlap, plus a term that contains the genetic covariance between the traits and does not depend on sample overlap in the input GWAS data.

A polygenic score for general cognitive function was calculated using data from a GWAS of general cognitive function (n=48 462) to which Generation Scotland did not contribute. The greatest proportion of variance in general cognitive function (1.27%, P=1.5 × 10⁻¹⁷) was explained by a predictor built from SNPs with a P-value <0.5; we use the same predictor here. For a brief summary of polygenic risk scoring, please see .

A polygenic score for BMI was created using summary data from a recent meta-analysis of 339 224 individuals, which did not include Generation Scotland. The greatest proportion of phenotypic variance in BMI is explained by a predictor that contains a subset of all HapMap 3 SNPs; we applied this predictor to our data.

Linear regression models were used to assess the relationship between the phenotypes and the polygenic scores, controlling for age, sex and population stratification (the first 14 principal components (PCs)).
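The LDSC relationship described above can be made concrete. In the cross-trait LD score regression model, for SNP j, E[z1j · z2j] = (√(N1 · N2) · ρg / M) · ℓj + ρ · Ns / √(N1 · N2), where ℓj is the SNP's LD score, M the number of SNPs, N1 and N2 the two GWAS sample sizes, ρg the genetic covariance, Ns the number of overlapping individuals and ρ their phenotypic correlation. The slope therefore carries the genetic covariance, while the intercept absorbs sample overlap; dividing ρg by the square root of the product of the two SNP heritabilities gives the genetic correlation. The sketch below is a simplified, unweighted OLS version under those assumptions: the function names are hypothetical, the inputs are made up, and it omits the regression weights and block-jackknife standard errors that the LDSC software's `--rg` mode uses.

```python
import numpy as np


def cross_trait_ldsc(z1, z2, ld_scores, n1, n2, m):
    """Unweighted OLS sketch of cross-trait LD score regression.

    Regresses the per-SNP product z1*z2 on LD scores and converts the
    slope into a genetic covariance; the intercept absorbs sample overlap.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld), ld])
    (intercept, slope), *_ = np.linalg.lstsq(design, z1 * z2, rcond=None)
    # slope = sqrt(N1 * N2) * rho_g / M  =>  rho_g = slope * M / sqrt(N1 * N2)
    gcov = slope * m / np.sqrt(n1 * n2)
    return gcov, intercept


def genetic_correlation(gcov, h2_trait1, h2_trait2):
    """Scale a genetic covariance by the two SNP heritabilities."""
    return gcov / np.sqrt(h2_trait1 * h2_trait2)


# Noise-free consistency check with made-up inputs (not study data).
ld = np.linspace(1.0, 200.0, 1000)
n1 = n2 = 100_000
m = 1_000_000
true_gcov, true_intercept = 0.1, 0.05
expected_product = true_intercept + np.sqrt(n1 * n2) * true_gcov / m * ld
z = np.sqrt(expected_product)  # z1 = z2 = z makes z1*z2 equal the expectation exactly
gcov_hat, intercept_hat = cross_trait_ldsc(z, z, ld, n1, n2, m)
rg_hat = genetic_correlation(gcov_hat, 0.2, 0.5)  # hypothetical SNP heritabilities
print(gcov_hat, intercept_hat, rg_hat)
```

With real data the z-score products are noisy and strongly heteroscedastic, which is why a weighted fit and a jackknife over SNP blocks matter in practice.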
The polygenic scores were pre-adjusted for age, sex and the 14 PCs, and the residuals were used in the main models. Using the results from the polygenic prediction analysis, we can provide estimates of the genetic correlation between cognitive function and BMI based on their theoretical relationships.

The GWAS output from the general cognitive function and BMI studies was merged to identify SNPs common to both analyses. SNPs reaching a suggestive threshold of P<5 × 10⁻⁵ in both studies were carried forward as potential polygenic variants that are important for individual differences in both traits. The total number of observed hits was compared with the number expected under the assumption that the two traits are independent.

Analyses were carried out in R. The polygenic risk scores were created using PLINK. […]
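A polygenic score of the kind described above (SNPs with P<0.5, weighted by their GWAS effect sizes), its pre-adjustment for covariates and the subsequent regression can be sketched as follows. This is a simplified illustration, not the PLINK procedure the authors used: it assumes genotype dosages already aligned to the GWAS effect alleles, ignores missing genotypes, and does no SNP pruning or clumping. The function names and all data are hypothetical.

```python
import numpy as np


def polygenic_score(dosages, betas, pvalues, p_threshold=0.5):
    """Effect-size-weighted sum of effect-allele dosages over SNPs with P < p_threshold.

    dosages: (n_people, n_snps) counts (0-2) of the GWAS effect allele, assumed
    aligned to the summary statistics and in the same SNP order.
    """
    keep = np.asarray(pvalues) < p_threshold
    return np.asarray(dosages, dtype=float)[:, keep] @ np.asarray(betas, dtype=float)[keep]


def ols_fit(y, X):
    """Coefficients and residuals of OLS of y on an intercept plus X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef, y - X1 @ coef


def r_squared(y, X):
    _, resid = ols_fit(y, X)
    return 1.0 - resid.var() / y.var()


# Simulated stand-in data (hypothetical, not study data).
rng = np.random.default_rng(1)
n, n_snps = 1000, 200
dosages = rng.binomial(2, 0.3, size=(n, n_snps))
betas = rng.normal(0, 0.05, n_snps)  # hypothetical GWAS effect sizes
pvalues = rng.uniform(size=n_snps)   # hypothetical GWAS P-values
covs = np.column_stack([rng.uniform(35, 65, n), rng.integers(0, 2, n), rng.normal(size=(n, 14))])
phenotype = dosages @ betas + 0.02 * covs[:, 0] + rng.normal(size=n)

score = polygenic_score(dosages, betas, pvalues)
_, score_resid = ols_fit(score, covs)  # pre-adjust the score for age, sex and 14 PCs
full = np.column_stack([covs, score_resid])
coef, _ = ols_fit(phenotype, full)
score_beta = coef[-1]
delta_r2 = r_squared(phenotype, full) - r_squared(phenotype, covs)  # variance explained by the score
print(score_beta, delta_r2)
```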
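The observed-versus-expected comparison of suggestive hits can be sketched as below. The source does not state how the expected number or its significance was computed; this sketch assumes that, under independence, the expected overlap is k1 · k2 / m (with k1 and k2 the numbers of suggestive hits in each GWAS among the m shared SNPs), and it swaps in a hypergeometric upper-tail probability as one plausible test. The function name and the P-values are hypothetical.

```python
from math import comb


def suggestive_overlap(p1, p2, alpha=5e-5):
    """Observed vs expected SNPs below alpha in both GWASs, assuming independence.

    p1, p2: dicts mapping SNP identifier -> P-value from each GWAS.
    Returns (observed, expected, hypergeometric upper-tail P).
    """
    shared = p1.keys() & p2.keys()  # SNPs common to both analyses
    hits1 = {snp for snp in shared if p1[snp] < alpha}
    hits2 = {snp for snp in shared if p2[snp] < alpha}
    m, k1, k2 = len(shared), len(hits1), len(hits2)
    observed = len(hits1 & hits2)
    expected = k1 * k2 / m  # (k1 / m) * (k2 / m) * m
    # P(overlap >= observed) if the k2 hits were a random draw from the m SNPs
    tail = sum(comb(k1, i) * comb(m - k1, k2 - i) for i in range(observed, min(k1, k2) + 1))
    return observed, expected, tail / comb(m, k2)


# Toy example with made-up P-values (rs11 appears only in the first GWAS).
gwas_cog = {"rs1": 1e-6, "rs2": 2e-7, "rs3": 1e-5, "rs4": 0.3, "rs5": 0.8,
            "rs6": 0.5, "rs7": 0.04, "rs8": 0.6, "rs9": 0.9, "rs10": 0.2, "rs11": 1e-9}
gwas_bmi = {"rs1": 3e-6, "rs2": 4e-5, "rs3": 0.7, "rs4": 1e-8, "rs5": 0.1,
            "rs6": 0.05, "rs7": 0.6, "rs8": 0.3, "rs9": 0.4, "rs10": 0.9}
print(suggestive_overlap(gwas_cog, gwas_bmi))
```

Because neighbouring SNPs in LD tend to reach the threshold together, SNP-level hits are not independent draws; a test like this one is therefore anti-conservative unless hits are first reduced to approximately independent loci.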

Pipeline specifications

Software tools: GCTA, LDSC, PLINK
Databases: dbGaP
Application: GWAS