Computational protocol: Gender Differences in Genetic Risk Profiles for Cardiovascular Disease

Protocol publication

[…] For each gene, we aimed to genotype a set of variants that would capture the common variation present in the gene, as well as variants that have been previously associated with CVD or related traits. For the majority of the genes, haplotype-tagging single nucleotide polymorphism (SNP) variants were selected from the SeattleSNPs database (http://pga.gs.washington.edu/). The SeattleSNPs project has resequenced the genes using 24 Centre d'Etude du Polymorphisme Humain DNA samples, and tag SNPs have been selected using LDSelect, an algorithm that is based on the linkage disequilibrium (LD) statistic r². We selected tag SNPs from each multi-SNP bin with a frequency >10%. For genes that were not included in the SeattleSNPs sequencing project, we selected variants from public databases (Celera, dbSNP) at approximately 5 kb distance from one another, giving priority to variants with known frequency information. Once HapMap phase I data were available, we selected additional variants to better capture the common variation in these genes. More detailed information about gene cladistics, sequence, and haplotype structure was available for the apolipoprotein E (APOE), lactase (LCT), and lipin 1 (LPIN1) genes, and here variant selection was based on previously published sequencing and haplotype analysis. A full list of the variants selected for study and successfully genotyped (see below) is provided in the original publication. [...] Genotype frequencies in sub-cohort individuals were tested for deviation from Hardy-Weinberg equilibrium (HWE) using Pearson's chi-square test statistic, with 1 degree of freedom for bi-allelic variants and 3 degrees of freedom for three-allelic variants, applying a threshold of p<0.01. For variants in which one of the genotype groups had fewer than 5 individuals, HWE was tested using an exact test. Allele segregation within trio families was analyzed with the PedCheck program.
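The HWE check for a bi-allelic variant can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `hwe_chi_square` and `needs_exact_test` are hypothetical, and the sketch assumes a polymorphic variant (both alleles observed). The p-value uses the closed form for a chi-square distribution with 1 degree of freedom.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of HWE for a bi-allelic variant (1 df).

    Hypothetical helper: takes the three observed genotype counts and
    returns (chi-square statistic, p-value). Assumes both alleles occur.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def needs_exact_test(n_aa, n_ab, n_bb, min_count=5):
    """True when any genotype group has fewer than min_count individuals."""
    return min(n_aa, n_ab, n_bb) < min_count

chi2, p_value = hwe_chi_square(40, 20, 40)
deviates = p_value < 0.01  # threshold stated in the protocol
```

A variant for which `needs_exact_test` returns true would be routed to an exact test instead, which this sketch does not implement.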
Pair-wise LD between the variants in each gene, haplotype frequencies, and haplotype tags were assessed with Haploview software version 3.32. For variants in high LD with each other (r²>0.95), only one of the results is shown.

Time-to-event analysis was used to assess whether any of the tested allelic variants has an effect on the incidence of CHD, ischemic stroke, or CVD. The effects of individual variants under recessive, dominant and multiplicative models were tested using the proportional hazards regression model. The case-cohort design was taken into account by applying a modification of the Prentice weighting, in which the non-case sub-cohort members and sub-cohort cases before their events were weighted with the inverses of their individual inclusion probabilities to account for the over-sampling of cases. Estimation of model parameters and standard errors was carried out in the R statistical environment, using the coxph function of the survival package and its robust variance estimator. We adjusted for classic CVD risk factors: smoking, high density lipoprotein-cholesterol (HDL-C), non-HDL-cholesterol, history of diabetes, BMI, and hypertension, as well as geographic region (western Finland, northern Finland, and eastern Finland) and cohort (and gender for the combined analysis of women and men). Age was used in the models as the time scale. We fitted two types of models. In the first model, men and women from both cohorts were analyzed jointly, as described above. In the second model, we carried out a test for genotype-gender interaction, defined as a departure from the multiplicative, dominant or recessive model, using similar regression models and testing the null hypothesis of equality of genotype effect parameters between men and women. We report results in which the variant genotype-specific p-value is ≤0.01 for either men or women.
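The three genetic models named above differ only in how a genotype is turned into a regression covariate. The sketch below shows one conventional coding, assuming genotypes are stored as the number of copies (0, 1 or 2) of the allele under test. The function name `encode_genotype` is hypothetical, and the weighted Cox regression itself (fitted in R with coxph in the protocol) is not reproduced here.

```python
def encode_genotype(allele_count, model):
    """Code a genotype as a covariate under a given genetic model.

    allele_count: copies of the tested allele carried (0, 1 or 2).
    model: 'multiplicative' (per-allele effect on the hazard ratio),
           'dominant' (one copy is enough), or
           'recessive' (two copies are required).
    """
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "multiplicative":
        return allele_count
    if model == "dominant":
        return 1 if allele_count >= 1 else 0
    if model == "recessive":
        return 1 if allele_count == 2 else 0
    raise ValueError("unknown model: " + model)

# Covariate values for the three genotypes under each model
codings = {
    m: [encode_genotype(g, m) for g in (0, 1, 2)]
    for m in ("multiplicative", "dominant", "recessive")
}
```

Under this coding, the exponentiated regression coefficient is a per-allele hazard ratio in the multiplicative model and a carrier-versus-non-carrier hazard ratio in the other two.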
We verified that these results do not stem from a single cohort by testing the null hypothesis of equality of genotype effect parameters between the FINRISK-92 and FINRISK-97 cohorts, using a similar regression model. For variants that conferred a risk at p<0.05 for CHD, we also studied the association in prevalent CHD cases (documented or self-reported myocardial infarction or unstable angina pectoris at baseline), using healthy sub-cohort subjects as controls. The analysis of prevalent cases was carried out using logistic regression, again with inverse sampling probability weighting, using age, cohort and geographic region as covariates, plus gender for the combined analysis of men and women. Analysis of haplotype effects was done for two variants of the F12 gene that were not in very high LD with each other and were both associated at p<0.01 with CHD and CVD. Haplotype analysis used an additive model in a weighted Cox proportional hazards regression: the common haplotype (containing the ‘non-risk’ alleles) served as the reference, and an additive effect was modeled for each of the other haplotypes. The weighting scheme and covariates were the same as in the single variant analysis, and the model was fitted with the PHREG procedure in SAS version 9.1.3 SP4. Haplotype uncertainty was taken into account using multiple imputation: a sample of haplotypes was obtained using Phase 2.1.1 software, and the analysis was repeated for each sampled haplotype pair.

Additionally, we tested whether allelic variants were associated with quantitative traits measured at baseline in sub-cohort individuals without a history of CVD. The lipid variables studied were serum total cholesterol, HDL-C, triglycerides, and low density lipoprotein-cholesterol (LDL-C). LDL-C was calculated from measured values of total cholesterol, HDL-C and triglycerides using Friedewald's formula, excluding individuals with a triglyceride value >4.0 mmol/l.
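The LDL-C calculation can be illustrated with a short sketch. The protocol does not print the formula. The version below is the standard mmol/l form of Friedewald's formula (triglycerides divided by 2.2), and the function name `friedewald_ldl` is hypothetical.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, tg_limit=4.0):
    """Estimate LDL-C (mmol/l) with Friedewald's formula.

    All inputs are in mmol/l. Returns None when triglycerides exceed
    tg_limit, mirroring the exclusion rule stated in the protocol.
    """
    if triglycerides > tg_limit:
        return None  # individual excluded from the LDL-C analysis
    # mmol/l form: LDL-C = total cholesterol - HDL-C - triglycerides / 2.2
    return total_chol - hdl - triglycerides / 2.2

ldl = friedewald_ldl(5.5, 1.3, 1.1)       # about 3.7 mmol/l
excluded = friedewald_ldl(6.0, 1.0, 4.5)  # None: triglycerides > 4.0 mmol/l
```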
Additional variables studied were mean blood pressure (average of systolic and diastolic blood pressure, each value based on two subsequent measurements), high sensitivity CRP, BMI, and WHR. Association of the variants with baseline measurements was tested using standard linear regression, employing additive, dominant, and recessive models, while adjusting for cohort, age, geographic region, and gender. Tests for genotype-gender interaction, defined as a departure from the additive, dominant or recessive model, were carried out using similar regression models and testing the null hypothesis of equality of genotype effect parameters between men and women. Individuals using lipid-lowering medication were excluded from the analyses of lipid variables, and individuals using drugs for hypertension were excluded from the analysis of blood pressure. We used logarithmic transformation for CRP and triglycerides. We verified that the results reported do not stem from a single cohort by testing the null hypothesis of equality of genotype effect parameters between the FINRISK-92 and FINRISK-97 cohorts, using a similar regression model.

For genes in which two or more variants (not in perfect LD) were associated at p<0.01 with a given quantitative trait, we also performed haplotype analysis to discern which allelic haplotype might be contributing to variation in the trait. Haplotype-tagging variants were identified with the Haploview software version 3.32 using default settings. Analyses with the haplotype-tagging variants were performed with the haplo.stats package of the R statistical software, using the function haplo.glm with an additive model and adjusting for age, cohort, geographic region and gender. The haplo.glm function estimates haplotype frequencies with the EM algorithm and calculates a linear regression coefficient and p-value for each haplotype, comparing each haplotype to a base haplotype, defined as the most common haplotype.
Rare haplotypes (frequency <0.05) were combined with the base haplotype for this analysis. The global p-value for haplotype effect coefficients was calculated for the null hypothesis of no effect for any haplotype.

For the initial analyses of the FINRISK-92 case-cohort alone, time-to-event analyses and quantitative trait analyses were done as previously described, analyzing women and men both separately and together. We did not perform formal gender-genotype interaction analysis or haplotype analysis at this stage.

In reporting the findings, we used a cut-off value of 0.01 for the p-values and reported uncorrected p-values. The cut-off value of 0.01 corresponds to posterior odds of 6:1 for a finding being a true signal when we expect to see two signals among the 27 independent genes and our power is 70% (see The Wellcome Trust Case-control Consortium's 2007 paper for details). The effect of multiple testing was addressed with standard Q-Q plots for the individual test statistics and with false discovery rate (FDR) analysis. The tail-area FDR statistic for a group of tests can be interpreted as the expected proportion of null results given the observed test statistics. The analysis was carried out using the R package “fdrtool”. The method used for power simulations is described in more detail elsewhere. The reported results are for both cohorts combined, for tests of the null hypothesis of no genotype effects (or no genotype-gender interaction) at the 1% significance level. While simulating genotype-gender interaction, we assumed no genotype effects for men while varying the effect for women. […]
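The 6:1 posterior odds quoted for the 0.01 cut-off can be checked with a short calculation. The sketch assumes the usual relation posterior odds = prior odds × power / significance level, and it reads "two signals among the 27 independent genes" as prior odds of 2:25. Both the reading and the function name `posterior_odds` are assumptions made for illustration.

```python
def posterior_odds(n_true, n_tests, power, alpha):
    """Posterior odds that a result significant at level alpha is a true signal.

    n_true: number of true signals expected among n_tests independent tests.
    power:  probability of detecting a true signal at level alpha.
    alpha:  significance threshold (false-positive rate under the null).
    """
    prior_odds = n_true / (n_tests - n_true)
    return prior_odds * power / alpha

# Values stated in the protocol: 2 signals, 27 genes, 70% power, p <= 0.01
odds = posterior_odds(n_true=2, n_tests=27, power=0.70, alpha=0.01)
```

With these inputs the result is about 5.6, which rounds to the 6:1 stated in the text.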

Pipeline specifications

Software tools: ldSelect, PedCheck, Haploview, haplo.stats, fdrtool
Databases: dbSNP
Application: GWAS
Organisms: Homo sapiens
Diseases: Cardiovascular Diseases, Cerebral Infarction, Coronary Disease