Computational protocol: Multivariate Analysis of Anthropometric Traits Using Summary Statistics of Genome-Wide Association Studies from GIANT Consortium

Protocol publication

[…] We applied the CPASSOC package developed by Zhu et al. to combine association evidence from both sexes for height, BMI and WHRadjBMI. CPASSOC integrates association evidence from the summary statistics of multiple traits: it uses summary-level data from single SNP-trait associations in GWASs to detect variants that are associated with at least one trait. Analyzing multiple phenotypes jointly improves statistical power, and the method requires only GWAS summary statistics. CPASSOC provides two statistics, $S_{Hom}$ and $S_{Het}$. $S_{Hom}$ is similar to the fixed-effect meta-analysis method but accounts for the correlation of summary statistics among cohorts induced by potentially overlapping or related samples.

In brief, assume we have GWAS summary statistics from $J$ cohorts with $K$ phenotypic traits. In each cohort, the single SNP-trait association was analyzed for each trait separately. Let $T_{jk}$ be the summary statistic for a SNP in the $j$th cohort and $k$th trait, and let $T = (T_{11}, \dots, T_{J1}, \dots, T_{1K}, \dots, T_{JK})^T$ be the vector of test statistics for testing the association of the SNP with the $K$ traits. We used the Wald test statistic $T_{jk} = \hat{\beta}_{jk} / \hat{s}_{jk}$, where $\hat{\beta}_{jk}$ and $\hat{s}_{jk}$ are the estimated coefficient and its standard error for the $k$th trait in the $j$th cohort. $S_{Hom}$ is then defined as

$$S_{Hom} = \frac{e^T (RW)^{-1} T \left( e^T (RW)^{-1} T \right)^T}{e^T (WRW)^{-1} e}, \tag{1}$$

which follows a $\chi^2$ distribution with one degree of freedom. Here $e^T = (1, \dots, 1)$ has length $J \times K$, $R$ is the correlation matrix of the test statistics, and $W$ is a diagonal matrix of weights for the individual test statistics. We used the sample sizes for the weights, i.e., $w_{jk} = n_j$ for the sample size $n_j$ of the $j$th cohort.

To define $S_{Het}$, we first let

$$S(\tau) = \frac{e^T \left( R(\tau) W(\tau) \right)^{-1} T(\tau) \left( e^T \left( R(\tau) W(\tau) \right)^{-1} T(\tau) \right)^T}{e^T W(\tau)^{-1} R(\tau)^{-1} W(\tau)^{-1} e},$$

where $T(\tau)$ is the sub-vector of $T$ satisfying $|T_{jk}| > \tau$ for a given $\tau > 0$, $R(\tau)$ is the corresponding sub-matrix of the correlation matrix $R$, and $W(\tau)$ is the corresponding diagonal sub-matrix of $W$. Here we let $w_{jk} = n_j \times \mathrm{sign}(T_{jk})$.
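As a rough illustration of Eq. (1), the sketch below evaluates the formula literally with NumPy and converts the result to a P value from the one-degree-of-freedom chi-square distribution. It is not taken from the CPASSOC package, which may parameterize the weights differently. The function names `wald_z`, `s_hom` and `chi2_1df_pvalue` are hypothetical, and the weights are passed in by the caller rather than fixed.

```python
import math

import numpy as np


def wald_z(beta, se):
    """Wald statistics T_jk = beta_hat_jk / s_hat_jk, element-wise."""
    return np.asarray(beta, dtype=float) / np.asarray(se, dtype=float)


def s_hom(t, r, w):
    """Literal evaluation of Eq. (1) (illustrative, not the CPASSOC code).

    t : length-m vector of Wald statistics (m = J * K)
    r : m x m correlation matrix R of the statistics
    w : length-m vector holding the diagonal of W
    """
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    u = 1.0 / np.asarray(w, dtype=float)  # u = W^{-1} e
    # e^T (RW)^{-1} T = u^T R^{-1} T, because (RW)^{-1} = W^{-1} R^{-1}
    num = float(u @ np.linalg.solve(r, t)) ** 2
    # e^T (WRW)^{-1} e = u^T R^{-1} u
    den = float(u @ np.linalg.solve(r, u))
    return num / den


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Made-up numbers for two summary statistics, for illustration only.
    t = wald_z([0.10, 0.12], [0.05, 0.04])
    r = [[1.0, 0.2], [0.2, 1.0]]
    stat = s_hom(t, r, w=[1000.0, 1000.0])
    print(stat, chi2_1df_pvalue(stat))
```

Using `np.linalg.solve` instead of forming the inverse of `R` explicitly is a numerical-stability choice of this sketch. When all weights are equal they cancel, and the statistic reduces to the squared sum of the test statistics divided by the sum of the entries of `R`.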
Then the test statistic is $S_{Het} = \max_{\tau > 0} S(\tau)$. $S_{Het}$ is an extension of $S_{Hom}$ that can improve power when the genetic effect sizes vary across traits. Its asymptotic null distribution is not a standard distribution, but it can be obtained through simulation or approximated by an estimated beta distribution.

We first applied both $S_{Hom}$ and $S_{Het}$ to combine sex-specific summary statistics for each of the three traits and compared the results with those from conventional meta-analysis of the same discovery-phase data in the GIANT consortium studies. We next applied both $S_{Hom}$ and $S_{Het}$ to combine all the sex-specific summary statistics of the three traits: height, BMI and WHRadjBMI. We hypothesized that meta-analyzing multiple traits would allow us to identify additional variants that are likely to be missed by conventional single-trait meta-analyses.

To perform the CPASSOC analysis, a correlation matrix is required to account for correlation among phenotypes and for correlation induced by overlapping or related samples from different cohorts. Zhu et al. suggested using a set of SNPs in linkage equilibrium to estimate the correlation coefficients. We selected the SNP set based on the linkage disequilibrium (LD) pattern in the ARIC European American (EA) data (downloaded from dbGaP). In brief, the ARIC EA cohort includes 9,707 individuals with approximately 840,000 SNPs genotyped on the Affymetrix Array 6.0. We first applied pairwise LD pruning with an $r^2$ threshold of 0.2 using the software PLINK. SNPs with large effect sizes may represent true associations and consequently may inflate the correlation among summary statistics. Therefore, we removed SNPs whose summary-statistic Z scores were greater than 1.96 or less than -1.96. The final SNP sets for correlation estimation include 81,322 SNPs for height, 82,012 SNPs for BMI, and 81,130 SNPs for WHRadjBMI.
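The two computational steps in this part of the protocol, estimating the correlation matrix from null-like SNPs and maximizing $S(\tau)$ over $\tau$, can be sketched as follows. This is a minimal NumPy illustration, not the CPASSOC implementation. The function names are hypothetical, the input is assumed to be already LD-pruned, and dropping a SNP when any column exceeds the Z-score cutoff is an assumption of this sketch, since the text does not say how the filter is applied across columns.

```python
import numpy as np


def estimate_correlation(z, z_max=1.96):
    """Correlation of summary statistics from LD-pruned, null-like SNPs.

    z : (n_snps, m) array with one column of Z scores per summary-statistic set.
    Assumption of this sketch: a SNP is dropped if |Z| > z_max in ANY column.
    """
    z = np.asarray(z, dtype=float)
    keep = np.all(np.abs(z) <= z_max, axis=1)
    return np.corrcoef(z[keep], rowvar=False)


def s_het(t, r, n):
    """Maximum of S(tau) over tau > 0, with weights w_jk = n_j * sign(T_jk)."""
    t = np.asarray(t, dtype=float)
    r = np.asarray(r, dtype=float)
    n = np.asarray(n, dtype=float)
    abs_t = np.abs(t)
    best = 0.0
    # S(tau) changes only when tau crosses some |T_jk|, so it is enough to
    # evaluate the subsets {|T_jk| >= a} for each distinct nonzero value a.
    for a in np.unique(abs_t[abs_t > 0]):
        idx = abs_t >= a
        u = 1.0 / (n[idx] * np.sign(t[idx]))  # W(tau)^{-1} e
        r_sub = r[np.ix_(idx, idx)]           # R(tau)
        num = float(u @ np.linalg.solve(r_sub, t[idx])) ** 2
        den = float(u @ np.linalg.solve(r_sub, u))
        best = max(best, num / den)
    return best


def empirical_pvalue(s_obs, r, n, n_sim=10_000, seed=0):
    """Monte Carlo P value: draw T ~ N(0, R) under the null, recompute S_Het."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    draws = rng.multivariate_normal(np.zeros(r.shape[0]), r, size=n_sim)
    null = np.array([s_het(row, r, n) for row in draws])
    # The add-one correction keeps the estimate away from exactly zero.
    return (1 + np.sum(null >= s_obs)) / (n_sim + 1)
```

A plain Monte Carlo estimate cannot resolve P values smaller than about 1/(n_sim + 1), so reaching thresholds near $5 \times 10^{-8}$ by simulation alone would need a very large number of draws. This is one practical reason to fit an approximating distribution to the simulated null instead.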
We chose the common set of SNPs, across both sexes and the three traits, that can be mapped to dbSNP human Build 142 to perform the CPASSOC analyses. The numbers of SNPs used in this study are presented in […].

We reported loci that reached genome-wide significance ($P < 5 \times 10^{-8}$) by CPASSOC from sex-specific data but not by sex-combined conventional meta-analysis of the same discovery-phase samples. We applied the same significance level, $P = 5 \times 10^{-8}$, as in GWAS because CPASSOC performs the same number of tests even though multiple traits are analyzed. For each SNP reaching $P < 5 \times 10^{-8}$ by either $S_{Hom}$ or $S_{Het}$, we examined the region within 500 kb on each side of the SNP. The SNP was considered to be identified only by CPASSOC if no SNP in this 1.0 Mb region was genome-wide significant in the conventional meta-analysis of the discovery-phase data and the SNP was not in LD with the index SNPs of the GIANT studies. We performed meta-analysis by combining male and female data for each trait separately, as well as by combining all three traits and both sexes. […]
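The locus-filtering rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline. The `Assoc` record, the function `cpassoc_only` and the `in_ld_with_index` argument are hypothetical names. LD with the GIANT index SNPs is assumed to have been determined elsewhere and is passed in as a set of SNP identifiers.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8
FLANK_BP = 500_000  # 500 kb on each side, i.e. a 1.0 Mb window


@dataclass(frozen=True)
class Assoc:
    snp: str    # SNP identifier
    chrom: str  # chromosome label
    pos: int    # base-pair position
    p: float    # association P value


def cpassoc_only(cpassoc_hits, conventional, in_ld_with_index=frozenset()):
    """Return CPASSOC-significant SNPs with no conventional signal nearby.

    cpassoc_hits     : Assoc records; p is assumed to be the smaller of the
                       S_Hom and S_Het P values for that SNP
    conventional     : Assoc records from the conventional meta-analysis
    in_ld_with_index : identifiers already flagged by the caller as being in
                       LD with a GIANT index SNP (LD is not computed here)
    """
    sig_conv = [c for c in conventional if c.p < GENOME_WIDE_P]
    novel = []
    for hit in cpassoc_hits:
        if hit.p >= GENOME_WIDE_P or hit.snp in in_ld_with_index:
            continue
        nearby = any(
            c.chrom == hit.chrom and abs(c.pos - hit.pos) <= FLANK_BP
            for c in sig_conv
        )
        if not nearby:
            novel.append(hit)
    return novel
```

The linear scan over significant conventional SNPs is adequate for a modest number of hits. For genome-wide inputs, sorting positions per chromosome and using binary search would be a natural refinement.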

Pipeline specifications

Software tools: CPASSOC, PLINK
Databases: dbGaP
Application: GWAS