Computational protocol: Differential Effects of MYH9 and APOL1 Risk Variants on FRMD3 Association with Diabetic ESRD in African Americans

Protocol publication

[…] Genotyping on the Affymetrix Genome-Wide Human SNP Array 6.0 in the discovery sample of 966 AA cases with T2DM-ESRD and 1032 non-diabetic, non-nephropathy controls was completed at the Center for Inherited Disease Research (CIDR) using DNA extracted from peripheral blood. DNA from cases and controls was approximately balanced on each 96-well master plate. A fingerprinting set of 96 SNPs was independently genotyped in all samples, and the results were compared with the corresponding SNPs on the Affymetrix array to confirm sample identity. Genotypes were called using Birdseed version 2; APT 1.10.0, grouping samples by DNA plate to determine the genotype cluster boundaries. The minimum SNP call rate for an individual was 98.4%. Forty-six blind duplicates were genotyped and had a concordance rate of 99.59%. Cryptic relatedness was identified from the estimated identity-by-descent (IBD) statistics as implemented in PLINK. There were two unexpected duplicate pairs and 54 unexpected first-degree relative pairs. One member of each pair was removed according to the following rules: 1) retain T2DM-ESRD cases over non-diabetic, non-nephropathy controls, and 2) if case/control status was congruent, retain the individual with the most complete phenotype data. One individual had a self-reported gender inconsistent with X chromosome genotype data, and one had an inbreeding coefficient (F-statistic) more than 4 standard deviations from the mean; both were excluded. The results are based on the remaining 952 T2DM-ESRD cases and 988 non-diabetic, non-nephropathy controls. Replication samples were recruited under ascertainment criteria identical to those of the discovery samples. FRMD3 SNPs were genotyped using the iPLEX™ Sequenom MassARRAY platform for replication. Genotyping efficiency was >95%, and 45 blind duplicates were included to ensure genotyping accuracy. FSGS and HIVAN cases and controls were genotyped using TaqMan assays available from ABI Biosystems (Foster City, CA). [...] 
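The two rules for removing one member of each unexpected duplicate or first-degree relative pair can be sketched as follows. This is a minimal illustration, not the authors' code: the `Sample` record, the `phenotype_fields` completeness count, and the tie-breaking behaviour (drop the second member when both are equally complete) are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Sample:
    sample_id: str
    is_case: bool          # True = T2DM-ESRD case, False = control
    phenotype_fields: int  # hypothetical count of non-missing phenotype fields


def sample_to_remove(a: Sample, b: Sample) -> Sample:
    """Return the member of a related or duplicated pair that should be dropped."""
    # Rule 1: retain T2DM-ESRD cases over controls.
    if a.is_case != b.is_case:
        return b if a.is_case else a
    # Rule 2: same case/control status, so retain the sample with the more
    # complete phenotype data. Ties drop `b` (an arbitrary assumption).
    return b if a.phenotype_fields >= b.phenotype_fields else a


def prune(pairs: Iterable[Tuple[Sample, Sample]]) -> Set[str]:
    """Collect the IDs to exclude across all flagged pairs."""
    return {sample_to_remove(a, b).sample_id for a, b in pairs}


if __name__ == "__main__":
    flagged = [
        (Sample("S1", True, 4), Sample("S2", False, 9)),   # case vs control
        (Sample("S3", False, 7), Sample("S4", False, 5)),  # two controls
    ]
    print(sorted(prune(flagged)))
```

Applying rule 1 before rule 2 means a case is always kept over a control, even if the control has the more complete phenotype record.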
Each SNP was tested for departure from Hardy-Weinberg Equilibrium (HWE) expectations using a chi-square goodness-of-fit test. Primary inference for this conditional/interaction GWAS was based on SNPs with <5% missing genotypes, no differential missingness between cases and controls, HWE p-value >1E−4 in cases and >1E−2 in controls, and minor allele frequency (MAF) >0.05 in the entire sample. A total of 832,357 SNPs met these criteria. However, SNPs that did not meet these criteria were secondarily examined for association, with consideration given to potential corroborating evidence of association at flanking SNPs, especially for SNPs with some evidence of HWE departure. The average sample call rate was 99.16% for all autosomal SNPs.
A principal components analysis (PCA) was computed on the 832,357 SNPs to estimate the primary sources of genetic variation, including potential admixture. One principal component (PC) was retained, and it correlated highly (r² = 0.87) with previously computed admixture estimates based on 70 ancestry informative markers (AIMs) using the program FRAPPE. The same set of AIMs was genotyped in the replication sample, and admixture estimates were computed using FRAPPE. As described below, the GWAS association analyses adjusted for the first PC, and the replication study and combined analyses adjusted for admixture estimates.
Since not all individuals homozygous for APOL1 risk variants and/or the MYH9 E1 risk haplotype develop nephropathy, the probability of developing ESRD may depend on non-genetic factors and on other genetic factors interacting with the known chromosome 22 (c22) risk variants. Thus, a series of complementary logistic regression analyses were computed using the program SNPGWA. The analyses were restricted to SNPs with minor allele frequencies >0.10. 
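The HWE chi-square goodness-of-fit test and the primary SNP inclusion criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline actually used: the function names are hypothetical, the test is the textbook 1-degree-of-freedom chi-square on genotype counts, and differential missingness is passed in as a precomputed flag because the source does not state how it was assessed.

```python
import math


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square goodness-of-fit p-value (1 df) for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_primary_qc(missing_rate: float,
                      differential_missingness: bool,
                      hwe_p_cases: float,
                      hwe_p_controls: float,
                      maf: float) -> bool:
    """Apply the thresholds quoted in the text for the primary-inference SNP set."""
    return (missing_rate < 0.05
            and not differential_missingness
            and hwe_p_cases > 1e-4
            and hwe_p_controls > 1e-2
            and maf > 0.05)


if __name__ == "__main__":
    p_cases = hwe_chisq_pvalue(420, 430, 102)
    p_controls = hwe_chisq_pvalue(440, 445, 103)
    print(passes_primary_qc(0.01, False, p_cases, p_controls, 0.33))
```

The genotype counts in the usage lines are made up for illustration. SNPs failing this filter were not discarded in the study; per the text they were examined secondarily.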
The primary inference for the following analyses used the additive genetic model for the SNP, provided there was no evidence of departure from the additive genetic model (additive model lack-of-fit test p-value >5E−2). If the lack-of-fit test for the additive model was significant, the minimum p-value across the dominant, additive and recessive models is reported. In addition, additive genetic models required at least ten individuals homozygous for the minor allele, and recessive models required at least 30 individuals homozygous for the minor allele.
The primary analysis consisted of a case-only test for an interaction between homozygosity for the MYH9 E1 haplotype or APOL1 risk variants (G1/G1; G2/G2; G1/G2) and individual SNPs across the genome. Specifically, a logistic regression model was computed in cases, where the binary outcome was homozygosity for APOL1 risk SNPs or MYH9 E1 haplotypes (versus not homozygous) and the independent variables (covariates) were age, gender, the first PC to account for admixture, and the SNP. The case-only analysis makes the strong assumption that the SNP being tested and homozygosity for the c22 variants are independent under the null hypothesis of no interaction. If this assumption is met, the case-only analysis can have considerably more statistical power than the corresponding classic case-control interaction model. To make the inference as robust to this assumption as possible, the test was restricted to SNPs not on c22; by Mendel's Law of Independent Assortment, chromosomes are inherited independently, and therefore the independence assumption is met. This assumption was further examined by testing for the interaction in the control sample.
As an aid to interpreting the case-only interaction analysis, the corresponding classic two-locus logistic regression interaction model was computed. 
In the classic two-locus interaction model, T2DM-ESRD status was the outcome, and the predictor variables (covariates) were age, gender, PC, the SNP, an indicator variable for two APOL1 risk variants or MYH9 E1 haplotype homozygosity, and the centered cross-product of the SNP and the indicator for c22 risk variant homozygosity. Here we mean the standard logistic regression model for two predictor variables (say $X_1$ and $X_2$) with their interaction term, a centered cross-product (e.g., $Z$) to reduce collinearity/correlation among the variables. Specifically, we would write this model as $\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 Z$, where $Y$ is T2DM-ESRD status, $X_1$ is the SNP, $X_2$ is the indicator variable for the APOL1/MYH9 haplotype (see below), and $Z$ is the centered cross-product defined as $Z = (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)$. The variable $Z$ is defined in this way to reduce the collinearity or correlation among the predictors for better estimation properties. The indicator variable is a binary variable that codes an individual as either 0 or 1, depending on the characteristic of interest. Here, the indicator variable was 1 if the person was homozygous for the APOL1/MYH9 haplotype (easily determinable, as it is a recessive model and phase is unambiguous) and 0 if they were not homozygous for these risk haplotypes. This binary (0, 1) variable was included in the logistic regression model. For the case-only analysis, this indicator variable was the outcome in the logistic regression analysis; for the classic two-locus interaction logistic regression models, it was one of the predictor variables.
Subsequent analyses were stratified by homozygosity at the MYH9 E1 haplotype and APOL1 risk variants. A logistic regression model was computed in individuals homozygous for c22 variants, where T2DM-ESRD status was the outcome and the independent variables (covariates) in the model included age, gender, the first PC and the SNP of interest. The analysis was repeated for individuals not homozygous for c22 variants, and the test for homogeneity of the odds ratio was computed. 
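The centered cross-product used as the interaction term in the two-locus model can be illustrated with a short sketch. The helper names and the toy genotype data are assumptions introduced here, and the code only builds the interaction term and compares its correlation with the SNP; it does not fit the logistic regression itself.

```python
import math
from typing import List, Sequence


def centered_cross_product(x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Interaction term: (x1 - mean(x1)) * (x2 - mean(x2)), per individual."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    return [(a - m1) * (b - m2) for a, b in zip(x1, x2)]


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation, used here only to compare collinearity."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    # Toy data: additive SNP coding (0/1/2 minor alleles) and the 0/1
    # indicator for homozygosity at the c22 risk variants.
    snp = [0, 1, 2, 0, 1, 2]
    risk_homozygous = [0, 0, 0, 1, 1, 1]

    raw = [a * b for a, b in zip(snp, risk_homozygous)]
    z = centered_cross_product(snp, risk_homozygous)

    print("corr(raw product, SNP):     ", round(pearson(raw, snp), 3))
    print("corr(centered product, SNP):", round(pearson(z, snp), 3))
```

In this balanced toy example the uncentered product is clearly correlated with the SNP, while the centered product is uncorrelated with it. With real, unbalanced data the correlation is typically reduced rather than removed.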
Analyses in the replication cohorts paralleled those in the discovery cohort.
To determine whether associated SNPs from analyses contrasting individuals with T2DM-ESRD to those without diabetes were DN-associated or T2DM-associated, allele frequencies in AA with T2DM lacking nephropathy were compared with those in the combined T2DM-ESRD case groups and the combined non-diabetic, non-nephropathy control groups.
Assuming a recessive model for the MYH9 and APOL1 risk variants with a main effect OR = 1.5 and a haplotype frequency of 0.64, and an additive genetic model for the FRMD3 SNPs with no main effect (OR = 1.0) and a minor allele frequency of 0.32, then with a type 1 error rate of α = 1−10, we have 0.50 power to detect an interaction OR = 2.05 and 0.80 power to detect an OR = 2.34. […]

Pipeline specifications

Software tools: PLINK, frappe
Applications: Population genetic analysis, GWAS
Diseases: Diabetes Mellitus, Diabetic Nephropathies, Kidney Diseases, Kidney Failure, Chronic
Chemicals: Nucleotides