Computational protocol: Genomewide meta‐analysis identifies loci associated with IGF‐I and IGFBP‐3 levels with impact on age‐related traits

Protocol publication

[…] From each cohort's result file, monomorphic SNPs as well as SNPs with an imputation quality below 0.3 were excluded prior to the meta‐analysis. All study‐specific GWAS results were corrected by the genomic inflation factor λGC if it exceeded 1. Because of assay‐based differences in both effect sizes and variances of IGF‐I and IGFBP‐3 measurements across cohorts, a sample size‐weighted z‐score‐based meta‐analysis implemented in METAL (Willer et al., ) was conducted, and the meta‐analysis P‐values were corrected for genomic inflation. After meta‐analysis, SNPs with a MAF ≤1% were removed from subsequent analyses.
Our multistage design had two GWAS stages (stages 1 and 2) and an additional stage (stage 3) with de novo genotyping data (N = 3364 individuals) to confirm novel loci. After the stage 1 GWAS, all 19 lead SNPs across all traits with P < 10⁻⁶ were taken forward to stage 2. All IGF‐I lead SNPs of novel loci that had a combined stage 1 and stage 2 P < 10⁻⁸ (except GCKR) were selected for de novo replication in an additional cohort. An overview of the design and the significantly associated loci at each stage is provided in Fig. S6 (Supporting information). Details on SNP selection and quality control are given in Appendix S1 (Supporting information). Regional association plots were generated using LocusZoom (Pruim et al., ).
[...] To define the lead SNP of each locus, the association results of a GWAS stage with P‐values <1 × 10⁻⁵ were grouped based on the LD structure of the HapMap release 28 CEU dataset using PLINK (settings r² > 0.01, 1 Mb distance) (Purcell et al., ). Due to the strong association of the IGFBP3 locus with IGFBP‐3, only one lead SNP was selected for this locus, regardless of the several grouped results.
The analysis of secondary signals in the NUBP2 locus was performed using the software GCTA (Yang et al., ) with the genotypes of the SHIP cohort as a reference, and was verified by an analysis using the genotypes of the NHS/HPFS cohorts as a reference. [...]
Genomewide gene‐based tests, which account for both gene length and LD between SNPs, were performed with VEGAS 0.8.27 (Versatile Gene‐Based Association Study) (Liu et al., ) using SNP P‐value results from the overall meta‐analyses. SNPs were allocated to one or more autosomal genes using gene boundaries ±50 kb. We performed 1 × 10⁷ permutations and defined a gene‐based P‐value <1 × 10⁻⁶ as gene‐based genomewide significant.
[...] Plasma proteome data were obtained as described in Appendix S1 (Supporting information) using liquid chromatography–mass spectrometry (LC‐MS). The Mascot search algorithm (in‐house Mascot server v2.3.2; Matrix Science, London, GB) was used to match the generated peak lists with a human FASTA‐formatted database containing 20 268 unique sequence entries (reviewed human database, release of October 2011). Prior to data analyses, all peptide intensity values were log10‐transformed and median–median‐normalized. Association analyses between peptides and serum IGF‐I and IGFBP‐3 levels were performed by linear regression, adjusted for age, sex, and the MS processing batch. Associations of a SNP with the peptides were tested by linear regression, adjusted for age, sex, and the first four principal components of a peptide‐level‐based principal component analysis. Protein intensities used for analyses were obtained by averaging the corresponding peptide intensities that passed the QC filter, and were substituted for the peptide intensities in the association model. All measured peptides that passed QC and that belonged to proteins encoded by genes located within a 500‐kb vicinity of our lead SNPs were selected for association analyses. The assignment of protein names (UniProt identifiers) to the corresponding genes was performed using the DAVID gene conversion tool (http://david.abcc.ncifcrf.gov/). Finally, after QC, the following proteins measured in 197 SHIP‐TREND samples were available: ALS, CC121, IBP3, and RT34. [...]
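The peptide preprocessing and the peptide‐to‐protein averaging described above can be sketched as follows. The excerpt does not spell out the exact median–median procedure, so this sketch assumes one common reading: after the log10 transform, each sample is shifted so that its median equals the median of all sample medians. The function names and the nested‐dictionary data layout are hypothetical.

```python
import math
from statistics import mean, median


def log10_median_normalize(intensities):
    """Log10-transform peptide intensities and align sample medians.

    `intensities` maps sample id -> {peptide id -> raw intensity (> 0)}.
    Assumption: every sample is shifted so that its median log-intensity
    equals the median of all per-sample medians.
    """
    logged = {
        sample: {pep: math.log10(value) for pep, value in peptides.items()}
        for sample, peptides in intensities.items()
    }
    sample_medians = {s: median(p.values()) for s, p in logged.items()}
    target = median(sample_medians.values())
    return {
        sample: {pep: v - sample_medians[sample] + target for pep, v in peptides.items()}
        for sample, peptides in logged.items()
    }


def protein_intensity(sample_values, protein_peptides, passed_qc):
    """Average the normalized intensities of a protein's QC-passing peptides.

    Returns None when no peptide of the protein passed QC in this sample.
    """
    kept = [
        sample_values[pep]
        for pep in protein_peptides
        if pep in passed_qc and pep in sample_values
    ]
    return mean(kept) if kept else None
```

The resulting per‐sample protein values would then take the place of peptide values as the outcome in the covariate‐adjusted linear regression.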
ENCODE and Roadmap data were assessed using HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) and RegulomeDB (http://regulomedb.org/). Statistical analysis of individual RegulomeDB scores was performed using Monte Carlo sampling of 10 SNPs (the size of our ‘observed data’ pool). RegulomeDB assigns scores to SNP loci based on the presence of histone marks, predicted and experimentally validated transcription factor binding, DNase hypersensitivity, and other evidence for regulatory function. Scores range from 1 to 7, with lower scores indicating stronger evidence for the presence of a regulatory element. For the purpose of this analysis, score subcategories (1a, 1b, etc.) were merged. A multinomial test was performed to compare the observed distribution statistically with the background distribution. The LD plot in Fig. B was generated using Haploview 4.2 with genetic data downloaded from version 3, Release 2, using the genomic region Chr1:109590000‐109630000, and analysis panel CEU + TSI. Histone mark and DNase tracks in Fig. C were downloaded from UCSC Genome Browser. […]
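One way to combine Monte Carlo sampling with a multinomial comparison of score distributions is sketched below. The excerpt does not give the test statistic or how the background pool was built, so both are assumptions here: the statistic is the multinomial probability of the observed score counts under the background proportions, and the P‐value is the share of random draws that are at most as probable as the observed pool. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def merge_subcategory(score):
    """Collapse RegulomeDB subcategories such as '1a' or '2b' to the integer 1 or 2."""
    return int(str(score)[0])


def multinomial_log_prob(counts, probs):
    """Log-probability of a count vector under a multinomial distribution."""
    log_p = math.lgamma(sum(counts) + 1)
    for count, prob in zip(counts, probs):
        if count == 0:
            continue
        if prob == 0.0:
            return float("-inf")  # category never seen in the background
        log_p += count * math.log(prob) - math.lgamma(count + 1)
    return log_p


def monte_carlo_multinomial_p(observed, background, n_draws=10000, seed=0):
    """Monte Carlo P-value comparing observed scores with a background pool.

    Draws pools of len(observed) scores from `background` with replacement
    and counts how often a draw is no more probable than the observed pool.
    """
    categories = sorted(set(background) | set(observed))
    bg_counts = Counter(background)
    probs = [bg_counts[c] / len(background) for c in categories]

    def log_prob(scores):
        counts = Counter(scores)
        return multinomial_log_prob([counts[c] for c in categories], probs)

    observed_lp = log_prob(observed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.choices(background, k=len(observed))
        if log_prob(draw) <= observed_lp + 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_draws + 1)
```

With 10 observed SNPs, `observed` would hold their 10 merged scores, and `background` a list of merged scores for whatever reference SNP set is chosen.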

Pipeline specifications

Software tools: Mascot Server, DAVID
Application: MS-based untargeted proteomics
Organisms: Homo sapiens
Diseases: Cardiovascular Diseases; Diabetes Mellitus; Neoplasms; Neoplasms, Adipose Tissue