Computational protocol: Performance of risk prediction for inflammatory bowel disease based on genotyping platform and genomic risk score method

Protocol publication

[…] We applied four different methods for whole-genome marker-enabled prediction. Genetic profile risk scores (GPRS) were constructed using the effects of all SNPs estimated from single-marker association analyses using PLINK []. An alternative to GPRS is genomic best linear unbiased prediction (GBLUP []), which is based on a mixed linear model that regresses phenotypes on all SNPs jointly. For GBLUP we used the MTG2 software [, ]. The third method applied elastic net regularization (EN) using the glmnet package [] in R []. The EN method was recently applied by Wei et al. [] for risk prediction of CD and UC using the IIBDGC iChip data. When applying EN, we first performed a single-SNP association analysis using PLINK and then restricted the model space to the 8000 most significant SNPs, followed by 10-fold cross-validation to choose the optimal EN tuning parameter. We also applied BayesR [, ], a Bayesian hierarchical method that models SNP effects as a mixture of normal distributions. To be able to fit the BayesR model to the large datasets in this study, we developed a more efficient algorithm, implemented in a newer version of the BayesR software. Prior assumptions and MCMC parameters for BayesR were as described in []. For the case-control data, a generalised linear model with a logit link function was used for GPRS and EN, whereas a linear mixed model was used for GBLUP and BayesR.

We also tried to apply Bayesian Sparse Linear Mixed Models [] but, using GEMMA v0.94, encountered a run-time error (segmentation fault) for the datasets with more than 20,000 individuals. Another method we investigated was the multiBLUP method developed by Speed and Balding [], which extends the GBLUP method to several variance components and was reported to increase prediction accuracy for CD in the Wellcome Trust Case Control Consortium dataset []. However, using Adaptive multiBLUP implemented in LDAK v4.9, we observed that prediction accuracy was generally lower than that of GBLUP for the same training sets (Additional file : Figure S1). Such behaviour is unexpected, as the GBLUP model can be considered the ‘baseline’ model of multiBLUP. We therefore do not report multiBLUP results in the main text. […]
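As an illustration of two steps described above, the following minimal Python sketch shows (a) an additive profile score built from per-SNP effects and (b) a pre-selection of the most significant SNPs, as done before the EN fit. It is a sketch under stated assumptions, not the authors' pipeline: the excerpt does not give the scoring formula, so the usual weighted sum of allele counts is assumed, and the function names, SNP identifiers and numbers are hypothetical toy values. The EN fit and its 10-fold cross-validation are not reproduced here.

```python
from typing import Dict, List, Sequence


def top_k_snps(p_values: Dict[str, float], k: int) -> List[str]:
    """Return the IDs of the k SNPs with the smallest association p-values.

    Ties are broken by SNP ID so the result is deterministic.
    """
    ranked = sorted(p_values, key=lambda snp: (p_values[snp], snp))
    return ranked[:k]


def profile_score(allele_counts: Dict[str, int],
                  effects: Dict[str, float],
                  snps: Sequence[str]) -> float:
    """Assumed additive score: sum of allele count (0, 1 or 2) times effect."""
    return sum(allele_counts[s] * effects[s] for s in snps)


# Toy, made-up inputs (NOT data from the study).
effects = {"rs1": 0.20, "rs2": -0.10, "rs3": 0.05}  # per-allele log odds ratios
p_values = {"rs1": 1e-8, "rs2": 3e-4, "rs3": 0.40}  # single-SNP p-values
person = {"rs1": 2, "rs2": 1, "rs3": 0}             # effect-allele counts

# GPRS-style score over all SNPs.
score_all = profile_score(person, effects, sorted(effects))

# Pre-selection by significance (the source keeps 8000 SNPs; k=2 here).
selected = top_k_snps(p_values, k=2)
score_selected = profile_score(person, effects, selected)

print(selected, round(score_all, 3), round(score_selected, 3))
```

In practice the effects and p-values would come from the single-marker association output, and the selected SNP set would be passed on to the penalised regression rather than scored directly.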

Pipeline specifications

Software tools: PLINK, MTG2, GEMMA, MultiBLUP
Application: GWAS
Organisms: Homo sapiens