Computational protocol: Common variation near ROBO2 is associated with expressive vocabulary in infancy

Protocol publication

[…] Within each cohort, expressive vocabulary scores were adjusted for age, age squared, sex and the most significant ancestry-informative principal components, and then rank-transformed to normality to facilitate comparison across studies and instruments. The association between each SNP and the expressive vocabulary score was assessed within each cohort by linear regression of the rank-transformed score on allele dosage, assuming an additive genetic model.

In the discovery cohort, the genome-wide association analysis for each phase was carried out in MACH2QTL on 2,449,665 imputed or genotyped SNPs. SNPs with a minor allele frequency below 0.01 or with poor imputation accuracy (MACH R² ≤ 0.3) were excluded before analysis, and all statistics were corrected by genomic control. All independent SNPs from the early- and later-phase GWAS with P < 10⁻⁴ (85 and 50 SNPs, respectively) were selected for follow-up in additional cohorts. Independent SNPs were identified by linkage disequilibrium (LD)-based clumping in PLINK: proxy SNPs within ±500 kb of each index SNP and in LD with it at r² > 0.3 (HapMap II CEU, release 22) were removed.

All analyses in the follow-up samples were carried out in silico using MACH2QTL or SNPTEST. For the selected SNPs, estimates from the discovery cohort (genomic-control corrected) and the follow-up cohort(s) were combined by fixed-effects inverse-variance meta-analysis (R package ‘rmeta’), and overall heterogeneity was tested with Cochran’s Q test. Signals reaching the genome-wide significance threshold of P < 2.5 × 10⁻⁸ (accounting for the two GWAS analyses) were considered robust evidence of association.

An empirical approach (bootstrapping with 10,000 replicates) was used to obtain interpretable genetic effect sizes, with basic 95% bootstrap confidence intervals, for the reported SNPs in the discovery cohort.
For this, we used a linear model of z-standardized expressive vocabulary scores on allele dosage, adjusted for age, age squared, sex and the most significant ancestry-informative principal components. The local departmental server of the School of Social and Community Medicine at the University of Bristol was used for data exchange and storage.

A sensitivity analysis in ALSPAC, using locally imputed genotypes on chromosome 3 (1000 Genomes reference), was performed in MACH2QTL as a linear regression of the rank-transformed expressive vocabulary score on allele dosage, assuming an additive genetic model. […]
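As a minimal sketch of the per-cohort model described above (not a reimplementation of MACH2QTL), the Python below residualizes a score on age, age squared, sex and principal components, applies a rank-based inverse normal transformation to the residuals, and regresses the result on allele dosage (0 to 2 copies, additive model). All function names are hypothetical. The Blom offset c = 3/8 is an assumed choice, since the protocol does not say which rank-to-normal formula was used, and the P-value uses a normal approximation to the t distribution, which is only reasonable for large cohorts.

```python
import math
from statistics import NormalDist


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def rank_inverse_normal(values, c=3 / 8):
    """Map values to normal quantiles of their ranks; tied values share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def adjusted_score(score, age, sex, pcs):
    """Residualize on intercept, age, age^2, sex and PCs, then rank-transform to normality."""
    X = [[1.0, a, a * a, s, *pc] for a, s, pc in zip(age, sex, pcs)]
    b = ols(X, score)
    resid = [y - sum(bj * xj for bj, xj in zip(b, row)) for row, y in zip(X, score)]
    return rank_inverse_normal(resid)


def additive_test(y, dosage):
    """Regress y on allele dosage (0-2 copies); return beta, SE and a two-sided P."""
    n = len(y)
    mx, my = sum(dosage) / n, sum(y) / n
    sxx = sum((d - mx) ** 2 for d in dosage)
    if sxx == 0:
        raise ValueError("allele dosage is constant; SNP is uninformative")
    beta = sum((d - mx) * (v - my) for d, v in zip(dosage, y)) / sxx
    rss = sum((v - my - beta * (d - mx)) ** 2 for d, v in zip(dosage, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Normal approximation to the t distribution (adequate for large n).
    p = 2 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p
```

Residualizing first and testing dosage afterwards mirrors the two-step description in the protocol; including dosage and covariates in a single model would give slightly different estimates.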
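Genomic control, applied above to the discovery statistics, can be sketched as follows: the inflation factor λ is the median of the 1-df Wald chi-square statistics divided by the expected median of a χ²₁ variable (about 0.4549), and each statistic is divided by λ, which is equivalent to inflating each standard error by √λ. Flooring λ at 1 is a common convention assumed here, not something the protocol states, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square variable with 1 df: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_control(betas, ses):
    """Return lambda_GC and GC-corrected (beta, SE, P) tuples for each SNP's Wald test."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    # Assumption: lambda is floored at 1, so statistics are never made larger.
    lam = max(1.0, median(chi2) / CHI2_1DF_MEDIAN)
    corrected = []
    for b, s, x in zip(betas, ses, chi2):
        z = math.sqrt(x / lam)              # corrected statistic on the |z| scale
        p = 2 * _STD_NORMAL.cdf(-z)         # two-sided P-value
        corrected.append((b, s * math.sqrt(lam), p))
    return lam, corrected
```

In a real GWAS λ is computed over all SNPs passing QC (here, the MAF and imputation-quality filters), not over the handful of SNPs taken forward.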
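The discovery/follow-up combination is standard fixed-effects inverse-variance weighting. The sketch below does it in plain Python rather than with the R ‘rmeta’ package used in the protocol: each cohort's estimate is weighted by w = 1/SE², the pooled SE is 1/√(Σw), and Cochran's Q = Σw(β − β̂)² is compared with a χ² distribution on k − 1 degrees of freedom. The threshold 2.5 × 10⁻⁸ equals the conventional 5 × 10⁻⁸ divided by the two GWAS analyses. Function names are hypothetical, and the chi-square tail uses a closed-form recurrence valid for integer degrees of freedom.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 2.5e-8  # 5e-8 divided by the two GWAS (early and later phase)


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df % 2:
        sf, k = 2 * NormalDist().cdf(-math.sqrt(x)), 1
    else:
        sf, k = math.exp(-x / 2), 2
    while k < df:
        # Recurrence: P(X_{k+2} > x) = P(X_k > x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis plus Cochran's Q heterogeneity test."""
    w = [1 / s ** 2 for s in ses]
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)
    se = math.sqrt(1 / sum(w))
    p = 2 * NormalDist().cdf(-abs(beta / se))
    q = sum(wi * (bi - beta) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    p_het = chi2_sf(q, df) if df > 0 else float("nan")
    return {"beta": beta, "se": se, "p": p, "q": q, "p_het": p_het,
            "genome_wide": p < GENOME_WIDE_ALPHA}
```

For example, two cohorts with β = 1 and β = 3 and equal SEs of 1 pool to β̂ = 2 with Q = 2 on 1 df.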
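The basic bootstrap interval mentioned above is [2θ̂ − q(0.975), 2θ̂ − q(0.025)], where θ̂ is the full-sample estimate and q(·) are quantiles of the bootstrap estimates. The sketch below z-standardizes the score, resamples individuals with replacement and refits the per-allele slope, so effects are in SD units per allele. For brevity it omits the covariates (age, age squared, sex, principal components) that the protocol included in this model, so it illustrates the resampling scheme only; the function names, seed and percentile interpolation are assumptions.

```python
import math
import random


def z_standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def slope(y, dosage):
    """Least-squares per-allele slope of y on dosage; None if dosage is constant."""
    if max(dosage) == min(dosage):
        return None
    n = len(y)
    mx, my = sum(dosage) / n, sum(y) / n
    sxx = sum((d - mx) ** 2 for d in dosage)
    return sum((d - mx) * (v - my) for d, v in zip(dosage, y)) / sxx


def percentile(sorted_vals, q):
    """Linearly interpolated q-quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def basic_bootstrap_ci(score, dosage, n_boot=10_000, level=0.95, seed=1):
    """Full-sample slope and basic bootstrap CI (2*est - q_hi, 2*est - q_lo)."""
    y = z_standardize(score)
    est = slope(y, dosage)
    rng = random.Random(seed)
    n = len(y)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample individuals with replacement
        b = slope([y[i] for i in idx], [dosage[i] for i in idx])
        if b is not None:  # skip a resample in which every dosage is identical
            boot.append(b)
    boot.sort()
    alpha = 1 - level
    q_lo, q_hi = percentile(boot, alpha / 2), percentile(boot, 1 - alpha / 2)
    return est, 2 * est - q_hi, 2 * est - q_lo
```

The basic interval reflects the bootstrap quantiles around θ̂ instead of using them directly (the percentile interval), which corrects for bias in the bootstrap distribution under the usual pivot assumption.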

Pipeline specifications

Software tools: PLINK, SNPTEST, rmeta
Application: GWAS