Computational protocol: Geographic Differences in Genetic Susceptibility to IgA Nephropathy: GWAS Replication Study and Geospatial Risk Analysis

Protocol publication

[…] The primary association analyses were performed using PLINK version 1.07. Similar to the original GWAS, we selected a standard 1-df Cochran-Armitage trend test as the primary association test. We also estimated the per-allele odds ratios and 95% confidence intervals for all tested SNPs within each individual cohort. The results across multiple cohorts were combined using an inverse variance-weighted method under a fixed-effects model (PLINK), as well as using a random-effects model as proposed by Han and Eskin (METASOFT). We also tested for heterogeneity across cohorts by performing a formal Cochran's Q heterogeneity test as well as by estimating the heterogeneity index (I²). […]

Each study participant was scored for the number of risk alleles and the distributions of protective alleles were compared between cohorts of different ethnicity. Only individuals with complete genotype information at the 7 scored loci (14 alleles) were included in this analysis. The distributions were analyzed separately for cases and controls. A χ² goodness-of-fit test was used to derive p-values for comparison of distributions. Because relatively few individuals fell in the tails of the distributions, the tails were binned into single-bin categories for the purpose of statistical testing, to achieve expected cell counts >5.

To confirm the results of conditional analyses and refine the genetic risk score proposed in the original GWAS, we subjected the genotype data from the entire cohort to a stepwise regression algorithm that selects significant covariates for the best predictive regression model based on the Bayesian Information Criterion (the step function, R version 2.10). At model entry, we included all 12 genotyped SNPs, all 21 tested interactions, as well as cohort membership as a fixed covariate.
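The fixed-effects combination and the heterogeneity statistics mentioned in the association analysis can be illustrated with a minimal sketch. This is not the PLINK or METASOFT implementation: the function name `fixed_effects_meta` is hypothetical, the inputs are assumed to be per-cohort log odds ratios with their standard errors, and the numbers in the usage example are invented for illustration only. The random-effects model of Han and Eskin is not reproduced here.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse variance-weighted fixed-effects combination of per-cohort
    log odds ratios, with Cochran's Q and the I² heterogeneity index.

    betas: per-cohort log odds ratios (assumed input format)
    ses:   matching standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I² = max(0, (Q - df) / Q), expressed as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "q": q,
        "df": df,
        "i2": i2,
    }


if __name__ == "__main__":
    # Invented per-cohort estimates, not values from the study.
    result = fixed_effects_meta([-0.30, -0.22, -0.41], [0.08, 0.10, 0.12])
    print(result)
```

Under the null hypothesis of no heterogeneity, Q is referred to a χ² distribution with `df` degrees of freedom; the p-value lookup is omitted to keep the sketch dependency-free.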
Consistent with the results of our conditional analysis, the stepwise algorithm retained only the 7 SNPs exhibiting an independent effect along with the rs6677604*rs2412971 interaction term. All other terms were automatically dropped from the regression model.

The risk score was calculated as a weighted sum of the number of protective alleles at each locus multiplied by the log of the OR for each of the individual loci from the final fully adjusted model. Only individuals with non-missing genotypes for all 14 alleles were included in this analysis. The risk score was standardized across all populations using a z-score transformation; the standardized score thus represented the distance between the raw score and the population mean in units of standard deviation. The percentage of the total variance in disease state explained by the risk score was estimated by Nagelkerke's pseudo R² from the logistic regression model with the risk score as a quantitative predictor and disease state as an outcome. The C-statistic was estimated as the area under the receiver operating characteristic curve provided by the above logistic model. These analyses were carried out with SPSS Statistics version 19.0. […]
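The weighted-sum risk score and its z-score standardization can be sketched as follows. This is an illustration under stated assumptions, not the authors' SPSS workflow: the function names and the placeholder locus names `snp_a` and `snp_b` are hypothetical, the weights are invented rather than taken from the fitted model, the interaction term is left out for brevity, and the use of the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def raw_risk_score(allele_counts, log_ors):
    """Weighted sum of protective-allele counts (0, 1 or 2 per locus),
    each multiplied by the log odds ratio for that locus.

    Returns None when any scored locus has a missing genotype, mirroring
    the complete-genotype requirement described in the text.
    """
    if any(allele_counts.get(snp) is None for snp in log_ors):
        return None
    return sum(allele_counts[snp] * log_ors[snp] for snp in log_ors)


def standardize(scores):
    """z-score transformation: distance from the mean in SD units.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption, not something stated in the source.
    """
    mu = mean(scores)
    sd = stdev(scores)
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Invented weights and genotypes for two placeholder loci.
    weights = {"snp_a": -0.25, "snp_b": -0.15}
    people = [
        {"snp_a": 0, "snp_b": 1},
        {"snp_a": 2, "snp_b": 1},
        {"snp_a": 1, "snp_b": None},   # incomplete genotype: excluded
        {"snp_a": 1, "snp_b": 2},
    ]
    raw = [raw_risk_score(p, weights) for p in people]
    complete = [r for r in raw if r is not None]
    print(standardize(complete))
```

The standardized scores would then enter a logistic regression as a single quantitative predictor, from which Nagelkerke's pseudo R² and the C-statistic are obtained; that step is left to a statistics package.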

Pipeline specifications

Software tools: PLINK, METASOFT, SPSS
Applications: Miscellaneous, GWAS
Diseases: Diabetes Mellitus, Type 1; Kidney Diseases; Multiple Sclerosis; Inflammatory Bowel Diseases; Renal Insufficiency