Computational protocol: Mixed-effects models for GAW18 longitudinal blood pressure data

Protocol publication

[…] To test the association between a given SNP and the longitudinal phenotype, we fit the mixed-effects model

$$y_i = \mu_i + x_i^e\beta_e + x_i^g\beta_g + u_i + e_i, \qquad i = 1,\dots,n, \tag{1}$$

where $y_i = (y_{i1},\dots,y_{in_i})^T$ is the $n_i \times 1$ phenotype vector of individual $i$; $\mu_i = \mu\mathbf{1}_{n_i}$, with $\mu$ the grand mean and $\mathbf{1}_{n_i}$ the $n_i \times 1$ vector of ones; $x_i^e$ is the design matrix for the nongenetic covariates (e.g., time) and $\beta_e$ is the vector of associated nongenetic effects; and $x_i^g$ is the numerically coded genotype of individual $i$, with $\beta_g$ the corresponding SNP effect. We assume a random effect $u_i \sim N(0, \sigma_g^2 K_i)$, where $K_i$ is an $n_i \times n_i$ matrix, and a random error $e_i \sim N(0, \sigma_e^2 I_{n_i})$. The SNP effect is tested, $H_0\colon \beta_g = 0$ versus $H_1\colon \beta_g \neq 0$, with a likelihood ratio test.

For GWAS or WGS data this test must be repeated for a large number of SNPs, which becomes computationally intensive if the $K_i$ are treated as unknowns and estimated jointly with the fixed effects. EMMA [] is an efficient algorithm originally developed for GWAS data in which samples are potentially structured; it models the structure effect through a similarity matrix. The R package implementing EMMA can either estimate the similarity matrix from genotype data or accept any similarity matrix supplied by the user. We adapt EMMA for our purpose by supplying the block-diagonal similarity matrix $K = \mathrm{diag}(\hat{K}_1, \hat{K}_2, \dots, \hat{K}_n)$, where the $\hat{K}_i$ are the correlation matrices estimated from model (1) with $\beta_g$ set to 0. Estimating the $K_i$ this way is not new; it is also used in EMMAX [], a fast version of EMMA. These estimates should be reasonable unless some SNPs have large effects, which is rare for most complex traits. [...]

To further identify SNPs that interact with each other and with nongenetic factors such as time, we consider the mixed-effects model

$$y_i = \mu_i + x_i^e\beta_e + x_i^g\beta_g + x_i^{gg}\beta_{gg} + x_i^{ge}\beta_{ge} + u_i + e_i = \mu_i + x_i\beta + u_i + e_i, \qquad i = 1,\dots,n, \tag{2}$$

where $x_i = (x_i^e, x_i^g, x_i^{gg}, x_i^{ge})$ is the design matrix for the nongenetic factors, the $p$ putative SNPs, the two-way interactions between the $p$ SNPs (a total of $p(p-1)/2$ terms), and other selected SNP-environment interactions (for the GAW18 data, we consider $p$ SNP-age interactions), and $\beta = (\beta_e^T, \beta_g^T, \beta_{gg}^T, \beta_{ge}^T)^T$ is the vector of all fixed effects. $\mu_i$ is defined as in model (1), and the random effects $u_i$ and $e_i$ follow the same distributions as in model (1).

Model (2) includes the effects of all putative SNPs, so the number of such effects can be large. To identify SNPs associated with the trait of interest, we use a Bayesian variable selection procedure with a set of latent binary indicators $\gamma_k$ ($k = 1,\dots,q$) that mark which of the $q$ genetic effects (main genetic effects, epistatic effects, and/or SNP-by-environment interactions) are associated ($\gamma_k = 1$) or not associated ($\gamma_k = 0$) with the trait. As in model (1), we assume that $K_i$ is known. We apply the Cholesky decomposition $K_i = M_i M_i^T$, where $M_i$ is the $n_i \times n_i$ lower-triangular Cholesky factor of $K_i$. Model (2) can then be reparameterized as

$$y_i = \mu_i + x_i\beta + \sigma_g M_i b_i + e_i, \qquad b_i = (b_{i1},\dots,b_{in_i})^T \sim N(0, I_{n_i}),$$

since $\sigma_g M_i b_i$ has covariance $\sigma_g^2 M_i M_i^T = \sigma_g^2 K_i$ and therefore the same distribution as $u_i$. We use the same prior distributions for $\mu$, $\beta$, $\gamma = (\gamma_1,\dots,\gamma_q)^T$, and $\sigma_e^2$ as in Yi et al []. For $\sigma_g$ we use the prior $N^+(m_{g0}, s_{g0}^2)$, where $N^+(\mu_0, \sigma_0^2)$ denotes a normal density with mean $\mu_0$ and variance $\sigma_0^2$ truncated to positive values, and both $m_{g0}$ and $s_{g0}^2$ are prespecified hyperparameters. The proposed method has been implemented on top of the widely used R package R/qtlbim [] for analyzing these GAW18 longitudinal data. […]
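
The single-SNP test of model (1) can be sketched in Python as follows. This is not the authors' EMMA-based R code: it makes the EMMAX-style simplification of holding both variance components fixed at (hypothetical) null-model values, whereas EMMA itself re-estimates them for each SNP. With $V = \sigma_g^2 K + \sigma_e^2 I$ fixed, $\log|V|$ cancels in the likelihood ratio, so $-2\log\Lambda$ is the drop in the GLS residual quadratic form, compared against $\chi^2_1$. The AR(1) form of `ar1_corr` is only a placeholder for the estimated $\hat{K}_i$, and all function names and simulated values are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag, cholesky, solve_triangular
from scipy.stats import chi2


def ar1_corr(n_obs, rho):
    """Hypothetical stand-in for an estimated correlation matrix K_i (AR(1) form)."""
    idx = np.arange(n_obs)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def whitened_rss(y, X, L):
    """GLS residual quadratic form r' V^{-1} r, computed by whitening with V = L L'."""
    y_w = solve_triangular(L, y, lower=True)
    X_w = solve_triangular(L, X, lower=True)
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)   # OLS on whitened data = GLS
    resid = y_w - X_w @ beta
    return float(resid @ resid)


def snp_lrt(y, X_null, g, K, sigma_g2, sigma_e2):
    """LRT of H0: beta_g = 0 with V = sigma_g2*K + sigma_e2*I held fixed (EMMAX-style)."""
    V = sigma_g2 * K + sigma_e2 * np.eye(len(y))
    L = cholesky(V, lower=True)
    rss0 = whitened_rss(y, X_null, L)
    rss1 = whitened_rss(y, np.column_stack([X_null, g]), L)
    stat = rss0 - rss1            # equals -2 log(LR) because log|V| cancels when V is fixed
    return stat, chi2.sf(stat, df=1)


# Simulated example: 50 individuals with 4 visits each (all values hypothetical).
rng = np.random.default_rng(1)
n_ind, n_obs = 50, 4
sigma_g2, sigma_e2 = 4.0, 4.0
K = block_diag(*[ar1_corr(n_obs, 0.6) for _ in range(n_ind)])   # K = diag(K_1, ..., K_n)
time = np.tile(np.arange(n_obs, dtype=float), n_ind)
geno = np.repeat(rng.integers(0, 3, n_ind).astype(float), n_obs)  # genotype repeated over visits
X_null = np.column_stack([np.ones(n_ind * n_obs), time])          # grand mean + time
u = np.sqrt(sigma_g2) * (cholesky(K, lower=True) @ rng.standard_normal(n_ind * n_obs))
e = np.sqrt(sigma_e2) * rng.standard_normal(n_ind * n_obs)
y = 120.0 + 0.5 * time + 2.0 * geno + u + e

stat, p_value = snp_lrt(y, X_null, geno, K, sigma_g2, sigma_e2)
print(f"LRT = {stat:.2f}, p = {p_value:.3g}")
```

Because the block-diagonal $K$ is fixed across SNPs, a real genome scan would factor $V$ once and reuse `L` for every SNP; recomputing it inside `snp_lrt` just keeps the sketch short.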
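
If the $q$ genetic effects in model (2) are exactly the $p$ main effects, the $p(p-1)/2$ pairwise SNP interactions and the $p$ SNP-age interactions used for GAW18, then $q = 2p + p(p-1)/2$; for example, $p = 10$ gives $q = 10 + 45 + 10 = 65$ indicators $\gamma_k$. The sketch below assembles such a design matrix. The helper `model2_design` is hypothetical, and coding each epistatic term as the product of two numerically coded genotypes is an assumption, since the text does not state the interaction coding.

```python
from itertools import combinations

import numpy as np


def model2_design(x_e, G, age):
    """Hypothetical helper assembling x = (x^e, x^g, x^gg, x^ge) for model (2).

    x_e : (N, r) nongenetic covariates, one row per observation (no intercept; mu is separate)
    G   : (N, p) numerically coded genotypes of the p putative SNPs
    age : (N,)   age at each observation (the environment variable for GAW18)
    Returns the stacked design matrix and q, the number of genetic effects.
    """
    n_rows, p = G.shape
    pairs = list(combinations(range(p), 2))                    # p(p-1)/2 SNP pairs
    # Assumption: epistasis coded as the product of the two coded genotypes.
    x_gg = (np.column_stack([G[:, j] * G[:, k] for j, k in pairs])
            if pairs else np.empty((n_rows, 0)))
    x_ge = G * age[:, None]                                    # p SNP-by-age terms
    q = p + x_gg.shape[1] + x_ge.shape[1]                      # one gamma_k per genetic effect
    return np.hstack([x_e, G, x_gg, x_ge]), q


rng = np.random.default_rng(0)
N, p = 8, 4
x_e = np.arange(N, dtype=float)[:, None]                       # time as the only covariate
G = rng.integers(0, 3, size=(N, p)).astype(float)
age = rng.uniform(40, 70, size=N)
X, q = model2_design(x_e, G, age)
print(X.shape, q)   # (8, 15) 14
```

With $p = 4$ the helper returns $q = 4 + 6 + 4 = 14$ genetic columns, plus the one nongenetic column, which illustrates how quickly the pairwise terms dominate as $p$ grows.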
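
The Cholesky reparameterization can be checked numerically: drawing $b_i \sim N(0, I_{n_i})$ and forming $\sigma_g M_i b_i$ should reproduce the covariance $\sigma_g^2 K_i$ of $u_i$. In the sketch below, the exchangeable $K_i$, the value of $\sigma_g$, and the function name `draw_random_effects` are all hypothetical choices made for illustration.

```python
import numpy as np


def draw_random_effects(K_i, sigma_g, n_draws, rng):
    """Draw u_i = sigma_g * M_i b_i, b_i ~ N(0, I), where K_i = M_i M_i^T (one draw per row)."""
    M_i = np.linalg.cholesky(K_i)                     # lower-triangular Cholesky factor
    B = rng.standard_normal((n_draws, K_i.shape[0]))  # each row is one b_i
    return sigma_g * B @ M_i.T, M_i                   # row r holds (M_i b_r)^T scaled by sigma_g


# Hypothetical K_i: exchangeable correlation 0.5 over n_i = 4 visits.
n_i, sigma_g = 4, 1.5
K_i = 0.5 * np.ones((n_i, n_i)) + 0.5 * np.eye(n_i)
rng = np.random.default_rng(42)
U, M_i = draw_random_effects(K_i, sigma_g, 200_000, rng)
emp_cov = U.T @ U / U.shape[0]                        # mean is zero by construction
print(np.round(emp_cov, 2))                           # close to sigma_g**2 * K_i
```

Working with $b_i$ instead of $u_i$ means the sampler only ever handles independent standard normals, while $\sigma_g$ enters as a single scale parameter that can carry its own prior.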
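
The positive truncated normal prior on $\sigma_g$ maps directly onto `scipy.stats.truncnorm`, provided two details are respected: SciPy's truncation bounds are given on the standardized scale, and its `scale` argument is a standard deviation, whereas $s_{g0}^2$ is a variance. The hyperparameter values below are hypothetical, and this is only a sketch of the density, not of how the R/qtlbim-based sampler updates $\sigma_g$.

```python
import numpy as np
from scipy.stats import norm, truncnorm


def sigma_g_prior(m_g0, s2_g0):
    """Frozen N+(m_g0, s2_g0): a N(m_g0, s2_g0) density truncated to (0, inf)."""
    sd = np.sqrt(s2_g0)                  # s2_g0 is a variance; scipy wants a scale (SD)
    a = (0.0 - m_g0) / sd                # truncnorm bounds are on the standardized scale
    return truncnorm(a, np.inf, loc=m_g0, scale=sd)


prior = sigma_g_prior(m_g0=1.0, s2_g0=4.0)   # hypothetical hyperparameter values
draws = prior.rvs(size=5, random_state=0)
print(draws)                                  # all strictly positive
print(prior.logpdf(-0.1))                     # -inf: no prior mass below zero
```

For positive values the frozen density equals the untruncated normal density divided by its mass above zero, so `prior.logpdf` could serve as the log-prior term in, for example, a Metropolis-Hastings update of $\sigma_g$.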

Pipeline specifications

Software tools: EMMAX, QTLBIM
Applications: WGS analysis, GWAS
Diseases: Substance-Related Disorders