Computational protocol: JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects

Protocol publication

[…] For invertibility of the X′X plug‐in estimate, we require a full rank genotype matrix X from the reference panel. When working with a large number of SNPs, it is convenient to split the covariates into B blocks, $X_b$, between which we can assume independence – e.g. if the blocks correspond to LD blocks – and within each of which $X_b'X_b$ is invertible. Decomposing the likelihood into a product, the posterior for γ in Equation becomes:

$$p(\gamma \mid z_L) \propto p(\gamma) \prod_{b=1}^{B} p\left(L_b^{-1} z_b \mid \gamma_b\right),$$

where $z_b$ and $\gamma_b$ are the summary statistic and model sub‐vectors, respectively, corresponding to SNPs in block b, and $L_b$ is derived from the Cholesky decomposition corresponding to $X_b$.

We suggest partitioning large genomic regions of interest into approximately independent blocks via the haplotype block recognition algorithm first proposed by Gabriel et al. []. Implementations exist in the widely used software Haploview [Barrett et al., ] and PLINK [Purcell et al., ]. Researchers may also wish to consider the recent and substantially more efficient MIG++ implementation [Taliun et al., ]. Some experimentation with the haplotype block recognition parameters will be required to ensure that each block of the resulting partition corresponds to a full rank genotype matrix in the reference data. Inherently, no block can contain more SNPs than the reference sample size, since the rank of a block's genotype matrix cannot exceed the number of reference individuals. In our fine‐mapping case study, after LD pruning for a maximum correlation of 95%, all LD blocks of interest were fewer than 100 SNPs long, and full rank genotype matrices were readily available from our reference sample of size 2,674.

As larger reference samples such as the UK10K are fast becoming available, our method will be applicable to denser correlation structures and larger blocks. This will, however, inherently require the inversion of increasingly large X′X matrices.
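The per‐block full rank check and Cholesky transformation described above can be sketched as follows. This is a minimal illustration, not the JAM implementation: the function names (`gram`, `cholesky`, `forward_solve`, `whiten_blocks`) are hypothetical, the genotype columns are assumed to be mean‐centred already, and the convention X′X = LL′ with L lower triangular is an assumption, since the text only states that each block's L is "derived from" the Cholesky decomposition. Pure Python is used to keep the sketch self‐contained; a numerical library would be the natural choice in practice.

```python
import math


def gram(x):
    """Return X'X for a block given as a list of rows (individuals x SNPs)."""
    p = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(p)]
            for i in range(p)]


def cholesky(a, tol=1e-10):
    """Return lower-triangular L with L L' = a.

    Raises ValueError when a pivot is not clearly positive, which is how a
    rank-deficient block (non-invertible X'X) shows up here.
    """
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("block is not full rank")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low, z):
    """Solve L w = z by forward substitution, i.e. return w = L^{-1} z."""
    w = []
    for i, row in enumerate(low):
        w.append((z[i] - sum(row[k] * w[k] for k in range(i))) / row[i])
    return w


def whiten_blocks(x_blocks, z_blocks):
    """Apply L_b^{-1} to z_b independently for every block b."""
    return [forward_solve(cholesky(gram(x)), z)
            for x, z in zip(x_blocks, z_blocks)]


if __name__ == "__main__":
    # Toy reference data: two blocks, three individuals, centred columns.
    block_1 = [[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    block_2 = [[1.0], [0.0], [-1.0]]
    print(whiten_blocks([block_1, block_2], [[2.0, 1.0], [3.0]]))
```

A block whose SNP columns are linearly dependent (for example, two perfectly correlated SNPs) makes `cholesky` raise `ValueError`, which signals that the block parameters or the LD pruning threshold need adjusting before the partition can be used.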
Recently, the spectral decomposition has been leveraged in high‐dimensional linear mixed models in genomics to reduce the computational cost of inverting large correlation matrices from cubic to linear complexity [Canela‐Xandri et al., ; Kang et al., ; Speed and Balding, ]. In the future, we plan to implement the same decomposition in JAM, readying the algorithm for scalability to larger LD blocks. […]
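As a hedged illustration of why a spectral decomposition makes repeated inversions cheap, the standard identity from the linear mixed model setting is shown below. It is a generic sketch, not a description of how JAM will implement the idea. The symbols K (a symmetric correlation matrix), U, Λ and δ are notation introduced here for illustration and do not appear in the original text.

$$K = U \Lambda U^{\top}, \quad U^{\top} U = I \;\Longrightarrow\; (K + \delta I)^{-1} = U\,(\Lambda + \delta I)^{-1}\,U^{\top}, \qquad \delta > 0.$$

The decomposition is computed once at cubic cost. Thereafter Λ + δI is diagonal, so inverting it for each new value of δ takes time linear in the dimension, provided the data have been rotated by U′ beforehand.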

Pipeline specifications

Software tools: Haploview, PLINK
Databases: UK10K
Application: GWAS
Organisms: Homo sapiens
Chemicals: Glucose