Computational protocol: Genome-Wide Association Study for Femoral Neck Bone Geometry

Protocol publication

[…] Genomic DNA was extracted from whole human blood using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA) following the kit protocol. Genotyping with the Affymetrix Mapping 250K Nsp and Affymetrix Mapping 250K Sty arrays was performed using the standard protocol recommended by the manufacturer. Fluorescence intensities were quantitated using the Affymetrix Array Scanner 3000 7G. Data management and analyses were performed using the Affymetrix GeneChip Operating System. Genotype calls were determined from the fluorescence intensities using both the dynamic modeling (DM) algorithm with a P-value setting of 0.33 and the Bayesian robust linear model with Mahalanobis distance (B-RLMM) algorithm. DM calls were used for quality control, whereas B-RLMM calls were used for all subsequent data analyses. B-RLMM clustering was performed with 94 samples per cluster. Following an Affymetrix guideline, we set a minimum DM call rate of 93% per sample, computed over all single-nucleotide polymorphisms (SNPs) on the two arrays (250K Nsp and 250K Sty). In total, 997 subjects with at least one array (Nsp or Sty) reaching a 93% call rate were retained. Because bone geometry phenotypes were missing for some of the 997 subjects, the effective sample size for the GWAS was 987 for both buckling ratio (BR) and cortical thickness (CT). The average call rate for the 987 analyzed subjects exceeded 95%. Of the initial set of 500,568 SNPs, we discarded 32,961 SNPs with a SNP-wise call rate less than 95%, 36,965 SNPs whose genotype frequencies deviated from Hardy-Weinberg equilibrium (HWE, P < .001), and 51,323 SNPs with a minor allele frequency (MAF) less than 1%. The final SNP set for the GWAS scan therefore contained 379,319 SNPs, an average genomic marker spacing of approximately 7.9 kb. [...]
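The SNP-level filters above can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline (they worked with the Affymetrix software): the function name `snp_qc`, the 0/1/2 coding with `None` for a missing call, and applying the filters in the order listed in the text are assumptions. The HWE check is a one-degree-of-freedom chi-square goodness-of-fit test, and the final lines reproduce the marker arithmetic, assuming a genome length of roughly 3.0 Gb for the spacing estimate.

```python
import math


def snp_qc(genotypes, min_call_rate=0.95, hwe_alpha=0.001, min_maf=0.01):
    """Apply call-rate, HWE and MAF filters to one SNP (hypothetical helper).

    genotypes: one entry per subject, coded 0/1/2 (copies of allele B),
    or None for a missing call. Returns (passed, reason).
    """
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_call_rate:
        return False, "call_rate"

    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]  # AA, AB, BB
    p_b = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of allele B
    p_a = 1.0 - p_b

    # HWE: chi-square goodness of fit with 1 df; P = erfc(sqrt(chi2 / 2)).
    expected = [n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    if math.erfc(math.sqrt(chi2 / 2)) < hwe_alpha:
        return False, "hwe"

    if min(p_a, p_b) < min_maf:
        return False, "maf"
    return True, "pass"


# Marker accounting from the text.
TOTAL_SNPS = 500_568
REMOVED = {"call_rate": 32_961, "hwe": 36_965, "maf": 51_323}
final_snps = TOTAL_SNPS - sum(REMOVED.values())

GENOME_BP = 3.0e9  # assumed approximate genome length, not stated in the text
spacing_kb = GENOME_BP / final_snps / 1000

print(final_snps, round(spacing_kb, 1))
```

Applied to each SNP in turn, the three default thresholds correspond to the 95% call rate, HWE P < .001, and 1% MAF cut-offs described above; the subtraction confirms that the three exclusion counts account exactly for the drop from 500,568 to 379,319 SNPs.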
In all four samples (i.e., the Caucasian GWAS, Caucasian bone geometry, Chinese bone geometry, and Chinese hip fracture cohorts), HWE was assessed by chi-square tests, and SNPs that deviated from HWE were excluded from further analyses. For the stage 1 Caucasian GWAS sample, age, age², sex, age-by-sex and age²-by-sex interactions, height, and weight were tested for association with BR and CT using stepwise regression. Significant (P ≤ .05) terms were included as covariates to adjust the raw BR and CT values for subsequent analyses: age, height, weight, sex, and age² × sex for BR, and age, weight, and age² for CT. The residuals from a linear model adjusting for the significant covariates were used as the traits in all follow-up analyses. To minimize spurious associations owing to potential population stratification, we used the EIGENSTRAT software for the GWAS association analyses. EIGENSTRAT infers axes of genetic ancestry and corrects for them while testing association between phenotypes and genotypes. We chose EIGENSTRAT because it handles quantitative traits appropriately while maintaining sufficient power and robustness. In these analyses, SNPs were coded as 0, 1, and 2 to represent the AA, AB, and BB genotypes. Multiple testing is a difficult issue in GWAS. Because the Bonferroni correction is considered overly conservative given the extensive linkage disequilibrium (LD) among markers, we adopted the GWAS significance threshold of approximately 4.2 × 10⁻⁷ proposed by Lencz and colleagues. The gene-wise approach used to derive this threshold takes into account recent estimates of the total number of genes in the human genome. The LD patterns of the most significant gene were analyzed and plotted using the Haploview program (…haploview/).
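The stage 1 analysis chain (covariate residuals, then an ancestry-corrected additive test) can be sketched in plain Python. This sketch assumes the ancestry axes (top principal components of the genotype matrix) were already computed, for example by EIGENSTRAT itself; `residualize`, `eigenstrat_like_test`, and the toy data are hypothetical. The statistic is assumed here to follow the published EIGENSTRAT test: genotypes and trait are both adjusted by regressing out the K axes, and (N − K − 1)·r² of the adjusted values is referred to a chi-square distribution with 1 df. The `trait` argument is assumed to be the covariate-adjusted residual, which `residualize` can also produce from raw BR and hypothetical covariate columns such as age, height, and weight.

```python
import math

GWAS_THRESHOLD = 4.2e-7  # genome-wide threshold quoted in the text


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def residualize(y, covariates):
    """OLS residuals of y on an intercept plus the given covariate columns."""
    rows = [[1.0] + [float(col[i]) for col in covariates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * rj for bj, rj in zip(beta, r)) for r, yi in zip(rows, y)]


def corr(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def eigenstrat_like_test(genotypes, trait, axes):
    """Adjust genotype and trait for K ancestry axes; return (chi2, P) with 1 df."""
    g_adj = residualize(genotypes, axes)
    y_adj = residualize(trait, axes)
    r = corr(g_adj, y_adj)
    chi2 = (len(trait) - len(axes) - 1) * r * r
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data: 20 subjects, one ancestry axis.
geno = [i % 3 for i in range(20)]          # 0/1/2 = AA/AB/BB
axis = [(-1) ** i for i in range(20)]      # stand-in for a top principal component
trait = [g + 0.5 * a for g, a in zip(geno, axis)]
chi2, p = eigenstrat_like_test(geno, trait, [axis])
print(round(chi2, 6), p < GWAS_THRESHOLD)
```

In this toy example the trait is the genotype signal plus an exact multiple of the ancestry axis, so the adjustment removes the axis entirely, the adjusted correlation is 1, and the statistic equals N − K − 1 = 18 (P ≈ 2.2 × 10⁻⁵), which would not reach the 4.2 × 10⁻⁷ threshold.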
Focused association analyses of selected SNPs and other miscellaneous statistical analyses were performed using SAS (SAS Institute, Inc., Cary, NC, USA) and Minitab (Minitab, Inc., State College, PA, USA), including descriptive statistics, multiple regression analysis to screen significant covariates for BR and CT, and tests of the normality of the adjusted BR and CT data. For stage 2 replication analyses of the significant gene, significant parameters (P < .05) among age, age², sex, age-by-sex and age²-by-sex interactions, height, and weight were used as covariates to adjust the raw BR and CT values. Significant covariates were screened separately for each study cohort using stepwise regression; because the effects of these confounding factors on the traits differed among cohorts, different covariates were applied. For the unrelated Caucasian bone geometry sample (n = 1488), the covariates were age, sex, age × sex, weight, and height for BR, and weight, age², age² × sex, and age × sex for CT. For the unrelated Chinese bone geometry sample (n = 2118), the covariates were age², weight, height, and age² × sex for BR, and weight, age × sex, age, and sex for CT. The residuals from a linear regression model adjusting for the significant covariates were used as the traits in the association analyses in HelixTree (…). SNPs were coded numerically as 0 for AA, 1 for AB, and 2 for BB. In the replication stage, to account for multiple testing, the SNPSpD method (…) was used to infer the effective number of independent tests (Meff), and the overall significance level was set at 0.05/Meff. The P values from the Chinese and Caucasian bone geometry cohorts were combined using Fisher's method to quantify the overall evidence for association with bone geometry variation. No cohort or geography adjustment was performed because the tests in the Caucasian and Chinese samples are independent and share the same null hypothesis.
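The two stage 2 multiple-testing devices can be sketched as below. `nyholt_meff` implements the Nyholt formula Meff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the SNP-by-SNP LD correlation matrix and Var is the sample variance; whether SNPSpD was run with exactly this formula is an assumption, and the eigenvalues are taken as input rather than computed here. `fisher_combine` applies Fisher's method, X = −2 Σ ln pᵢ, which follows a chi-square distribution with 2k df under the null hypothesis. All names and the P values in the usage lines are hypothetical.

```python
import math
import statistics


def nyholt_meff(eigenvalues):
    """Effective number of independent tests from LD-matrix eigenvalues.

    Meff = 1 + (M - 1) * (1 - Var(lambda) / M), using the sample variance,
    so M independent SNPs give M and M perfectly correlated SNPs give 1.
    """
    m = len(eigenvalues)
    return 1 + (m - 1) * (1 - statistics.variance(eigenvalues) / m)


def fisher_combine(pvalues):
    """Fisher's method: returns (X, combined P) with X ~ chi-square(2k) under H0.

    For even df the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


r = 0.5  # hypothetical LD correlation between two SNPs in the replicated gene
meff = nyholt_meff([1 + r, 1 - r])  # eigenvalues of [[1, r], [r, 1]]
alpha = 0.05 / meff

p_caucasian, p_chinese = 0.01, 0.04  # hypothetical per-cohort P values
x, p_combined = fisher_combine([p_caucasian, p_chinese])
print(meff, alpha, p_combined)
```

With r = 0.5 the two SNPs count as 1.75 effective tests, so the per-test level relaxes from 0.025 (Bonferroni) to about 0.0286. Fisher's method assumes the combined tests are independent, which matches the justification given above for pooling the Caucasian and Chinese P values.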
In the Chinese hip fracture sample, genotype distributions were compared between fracture and nonfracture subjects in the pooled sample of males and females using logistic regression models with sex, age, height, and weight as covariates. To detect population stratification that may lead to spurious association results, we used the software Structure 2.2 (…) to investigate potential substructure in our samples. Structure 2.2 uses a Markov chain Monte Carlo (MCMC) algorithm to cluster individuals into cryptic subpopulations on the basis of multilocus genotype data. Using this software, we performed independent analyses assuming K = 2 population strata, using 200 unlinked markers in the unrelated Caucasian GWAS cohort and 1000 unlinked markers in the Chinese hip fracture cohort. […]
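The fracture versus nonfracture comparison can be sketched as a logistic regression fitted by Newton-Raphson, with a Wald test on the additive SNP term. This is a hypothetical standard-library illustration (the text does not say which program fitted these models); `logistic_snp_test` and the toy counts are invented, and real use would pass sex, age, height, and weight as covariate columns. It assumes the data are not perfectly separated, since otherwise the maximum-likelihood estimates diverge.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _score_and_info(beta, rows, y):
    """Score vector and Fisher information of the logistic log-likelihood."""
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
        w = mu * (1.0 - mu)
        for i in range(k):
            score[i] += (yi - mu) * r[i]
            for j in range(k):
                info[i][j] += w * r[i] * r[j]
    return score, info


def logistic_snp_test(y, snp, covariates, max_iter=50, tol=1e-10):
    """Wald test of b1 in logit P(y = 1) = b0 + b1 * snp + covariate terms.

    y: 1 = fracture, 0 = nonfracture; snp: 0/1/2 coding; covariates: columns.
    Returns (b1, standard error, per-allele odds ratio, two-sided P value).
    """
    rows = [[1.0, float(snp[i])] + [float(c[i]) for c in covariates]
            for i in range(len(y))]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        score, info = _score_and_info(beta, rows, y)
        step = solve(info, score)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_info(beta, rows, y)  # information at the final estimate
    e1 = [0.0] * len(beta)
    e1[1] = 1.0
    se = math.sqrt(solve(info, e1)[1])  # sqrt of the [1, 1] entry of info^-1
    z = beta[1] / se
    return beta[1], se, math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2))


# Hypothetical toy counts: genotype 0 has 30 controls and 10 cases,
# genotype 1 has 20 controls and 20 cases; no covariates.
y = [0] * 30 + [1] * 10 + [0] * 20 + [1] * 20
snp = [0] * 40 + [1] * 40
b1, se, odds_ratio, p = logistic_snp_test(y, snp, [])
print(b1, se, odds_ratio, p)
```

With a single 0/1 predictor and no covariates the model is saturated, so the fitted b1 equals the sample log odds ratio (here ln 3) and its standard error equals √(1/30 + 1/10 + 1/20 + 1/20), which makes the toy case easy to check by hand.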

Pipeline specifications

Software tools: RLMM, Haploview, HelixTree
Applications: SNP array data analysis, GWAS
Organisms: Homo sapiens
Diseases: Hip Fractures, Muscular Diseases