Computational protocol: Abundant local interactions in the 4p16.1 region suggest functional mechanisms underlying SLC2A9 associations with human serum uric acid

Protocol publication

[…] This study was approved by the institutional review board of the West of Scotland Research Ethics Service of the NHS in the UK. The GWAS data of the ARIC and FHS study cohorts were provided by the NIH Database of Genotypes and Phenotypes via specific Data Use Certifications issued by the Data Access Committee of the National Heart, Lung and Blood Institute. Both study cohorts have been described in detail elsewhere. Only individuals of European ancestry from the two cohorts were included in this study. Both ARIC and FHS were approved by the corresponding local ethics committees and obtained written informed consent from the study participants. ARIC was genotyped with the Affymetrix 6.0 SNP chip and the FHS cohort with the Affymetrix 500K and Affymetrix 50K SNP chips.

A common protocol was used to perform quality control of the genotype data in both cohorts using the GenABEL package implemented in R (http://www.r-project.org/), with thresholds of 97% for individual call rate, 95% for SNP call rate, 2% for minor allele frequency, 1.0E−10 for the P-value for deviation from Hardy–Weinberg equilibrium and 0.01 for the false discovery rate for unacceptably high individual heterozygosity.

SUA in ARIC was corrected for sex, age, body mass index (BMI), serum creatinine, hypertension treatment and sample centre. SUA in FHS was corrected for sex, age, BMI, creatinine, hypertension treatment, renal disease status and generation (SUA in generation 2 and generation 3 samples was measured at their second and first visit, respectively). To control for relatedness, individuals who were outliers on the first three principal components computed from the identity-by-state matrix constructed using GenABEL were removed. In addition, subjects younger than 18 years, with BMI >50, or with creatinine more than 3 SD from the population mean were removed from the study.
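The quality-control and exclusion criteria above can be expressed as simple filters. The following is a minimal sketch in Python, not the GenABEL implementation used in the study: the class and function names (`SnpStats`, `Sample`, `snp_passes_qc`, `filter_samples`) are hypothetical, the thresholds are assumed to be inclusive, and the creatinine mean and SD are assumed to be computed on the samples passed in. The heterozygosity and principal-component outlier checks are omitted.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List

# Thresholds quoted in the protocol text.
MIN_SAMPLE_CALL_RATE = 0.97
MIN_SNP_CALL_RATE = 0.95
MIN_MAF = 0.02
MIN_HWE_P = 1.0e-10


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container)."""
    name: str
    call_rate: float
    maf: float
    hwe_p: float


@dataclass
class Sample:
    """Per-individual values needed for the exclusions (hypothetical container)."""
    sample_id: str
    call_rate: float
    age: float
    bmi: float
    creatinine: float


def snp_passes_qc(snp: SnpStats) -> bool:
    """Keep a SNP only if it meets the call-rate, MAF and HWE thresholds."""
    return (
        snp.call_rate >= MIN_SNP_CALL_RATE
        and snp.maf >= MIN_MAF
        and snp.hwe_p >= MIN_HWE_P
    )


def filter_samples(samples: List[Sample]) -> List[Sample]:
    """Drop low-call-rate samples, minors, BMI > 50 and creatinine outliers (> 3 SD)."""
    creatinine = [s.creatinine for s in samples]
    mu, sd = mean(creatinine), stdev(creatinine)
    return [
        s
        for s in samples
        if s.call_rate >= MIN_SAMPLE_CALL_RATE
        and s.age >= 18
        and s.bmi <= 50
        and abs(s.creatinine - mu) <= 3 * sd
    ]
```

In practice the order of the steps matters (for example, whether the creatinine SD is computed before or after other exclusions); the source does not specify this, so the sketch simply uses all samples it is given.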
After quality control, 9172 (4884 females) and 5538 (2951 females) samples and 514 662 and 410 947 autosomal SNPs were analysed in ARIC and FHS, respectively (Supplementary Material, Table S1).

Genome scans were performed for each cohort as follows:
(a) the identity-by-state matrix was reconstructed and the first ten principal components were calculated and stored;
(b) SUA was adjusted for the corresponding covariates and normalized using the GenABEL rntransform function, then adjusted for polygenic effects and the first ten principal components to account for relatedness using the mixed model-based polygenic function, from which the polygenic heritability was computed (Supplementary Material, Table S1); the resultant environmental residuals (i.e. pgresidualY) were used as the actual trait values for association tests;
(c) conventional GWAS analyses (i.e. assuming additive effects only) were performed using the GenABEL mmscore function, and the consensus threshold (P = 5.0E−08) was used to identify marginal SNPs;
(d) full pairwise genome scans were performed using BiForce, which uses bitwise data structures and advanced algorithms to allow high-throughput detection of epistasis.

Genome-wide significance thresholds were derived from the Bonferroni adjustment for the actual number of tests, as previously described. With 514 662 SNPs and 166 marginal SNPs identified in ARIC (Supplementary Material, Table S2), the thresholds were 3.8E−13 (P = 0.05/(514662 × (514662 − 1)/2)) for SNP pairs identified from the full pairwise genome scan and 5.9E−10 (P = 0.05/((514662 − 1) × 166)) for SNP pairs involving at least one marginal SNP. We adopted the threshold of 1.0E−05 for local interactions, derived previously based on permutation.

For simplicity, significant epistatic SNP pairs were tested for replication in FHS at the SNP level only, i.e. a replication of an epistatic pair was claimed only if both SNPs were genotyped in FHS and Pint < 0.05 in FHS.
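The Bonferroni arithmetic above can be checked directly. The sketch below, in Python, reproduces the two ARIC thresholds from the SNP counts given in the text and encodes the SNP-level replication rule; the function names and the `fhs_pint` dictionary are hypothetical and introduced only for illustration.

```python
from typing import Dict, Set, Tuple


def pairwise_threshold(n_snps: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold for a full pairwise scan: alpha / (n * (n - 1) / 2)."""
    n_tests = n_snps * (n_snps - 1) // 2
    return alpha / n_tests


def marginal_pair_threshold(n_snps: int, n_marginal: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold for pairs involving a marginal SNP: alpha / ((n - 1) * m)."""
    return alpha / ((n_snps - 1) * n_marginal)


def replicated(
    pair: Tuple[str, str],
    fhs_genotyped: Set[str],
    fhs_pint: Dict[Tuple[str, str], float],
    alpha: float = 0.05,
) -> bool:
    """SNP-level replication: both SNPs genotyped in FHS and Pint < alpha there."""
    snp1, snp2 = pair
    if snp1 not in fhs_genotyped or snp2 not in fhs_genotyped:
        return False
    return fhs_pint.get(pair, 1.0) < alpha


if __name__ == "__main__":
    # SNP counts for ARIC as stated in the text.
    print(f"{pairwise_threshold(514662):.1e}")            # 3.8e-13
    print(f"{marginal_pair_threshold(514662, 166):.1e}")  # 5.9e-10
```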
Conditional tests were carried out by fitting one or multiple marginal SNPs as fixed effects in the background and then testing each of the other SNPs or SNP pairs individually in the same way(s) as in the genome scans; a SNP or SNP pair was considered statistically independent if the conditional P/Pint < 0.05. The forward selection approach was used when multiple independent associations were available in the conditional tests: the most associated SNP or SNP pair (i.e. with the lowest conditional P/Pint) was selected and fitted into the background, the remaining SNPs or SNP pairs were tested, and the process was repeated until no more significant conditional associations were found. Variance explained was calculated using the polygenic function with marginal SNPs or SNP pairs fitted as fixed effects.

We imputed the 4p16.1 region (from 9900 to 10400 kb) based on 9172 samples and 260 typed SNPs in ARIC using IMPUTE2 and the 1000 Genomes Project reference panel (phase 1 integrated variant set v3). We used SNPTEST (v2.5) to test associations of 2610 imputed SNPs (minor allele frequency >0.01) with the same SUA trait in the frequentist additive model using genotype dosages. We used PLINK2 (https://www.cog-genomics.org/plink2/) to take the best genotypes of the imputed SNPs and then performed forward selection and conditional tests in R as described earlier.

GWAS marginal SNPs and genome-wide significant epistatic SNPs within the 4p16.1 region were analysed for enrichment of ENCODE cell-type-specific enhancers using the online tool HaploReg (http://compbio.mit.edu/HaploReg), which tests enrichment against a rigorously defined genomic background (i.e. all the SNPs genotyped), with LD information (r² > 0.8) from the 1000 Genomes Project and a background set of Affymetrix 6.0 SNPs. ANNOVAR and the UCSC genome browser were used for functional annotation of SNPs within the region to identify regulatory signals associated with these loci. Enlight (http://enlight.usc.edu) was used to visually inspect the relationship between LD and regulatory signals. […]
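The forward selection procedure used in the conditional tests can be sketched as a simple loop. This is an illustrative outline in Python, not the authors' R code: `forward_select` and `toy_conditional_p` are hypothetical names, and the caller is assumed to supply a function that returns the conditional P (or Pint) of a candidate given the SNPs or SNP pairs already fitted in the background.

```python
from typing import Callable, List, Sequence


def forward_select(
    candidates: Sequence[str],
    conditional_p: Callable[[str, List[str]], float],
    background: Sequence[str] = (),
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly add the candidate with the lowest conditional P to the background.

    Stops when no remaining candidate has a conditional P below alpha.
    Returns the candidates added, in order of selection.
    """
    fitted = list(background)
    remaining = [c for c in candidates if c not in fitted]
    added: List[str] = []
    while remaining:
        # Test every remaining candidate against the current background.
        pvals = {c: conditional_p(c, fitted) for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        fitted.append(best)
        added.append(best)
        remaining.remove(best)
    return added


def toy_conditional_p(candidate: str, fitted: List[str]) -> float:
    """Made-up P-values: snpB is only associated through its LD with snpA."""
    if candidate == "snpA":
        return 1.0e-6
    if candidate == "snpB":
        return 0.6 if "snpA" in fitted else 0.01
    return 0.03  # snpC: independent, modest signal


if __name__ == "__main__":
    print(forward_select(["snpA", "snpB", "snpC"], toy_conditional_p))
```

In the toy example, snpA is selected first, snpB loses its signal once snpA is in the background, and snpC is retained as an independent association.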

Pipeline specifications

Software tools: GenABEL, IMPUTE, SNPTEST, PLINK, HaploReg, ANNOVAR
Databases: UCSC Genome Browser
Applications: GWAS, Genome data visualization
Diseases: Atherosclerosis
Chemicals: Lead, Potassium, Uric Acid