Computational protocol: Exome-wide association study of plasma lipids in >300,000 individuals

Protocol publication

[…] Each contributing cohort analyzed the ancestries within the cohort separately, and studies that collected samples by case/control status analyzed cases separately from controls. We performed both single-variant and gene-level association tests. In the association analysis, we obtained residuals after controlling for sex, age, age² and up to 4 principal components as covariates. Studies that had related samples analyzed the association using linear mixed models, with relatedness estimated from genome-wide SNPs or from pedigrees.

From each study, we collected single-variant score statistics and their covariance matrix for variants in sliding windows across the genome. Summary association test statistics were generated using RAREMETALWORKER or RVTESTS. Using the summary association statistics collected from each study, we performed meta-analysis of single-variant association tests using the Mantel-Haenszel test and constructed burden, SKAT and variable threshold (VT) tests using the approach by Liu et al. For burden and SKAT, we used minor allele frequency thresholds of 1% and 5%; for VT, we applied a minor allele frequency threshold of 5%. In the SKAT test, variants are weighted according to their minor allele frequencies, using the beta kernel β(1, 25).

Using the covariance matrices between single-variant association statistics, we were also able to perform conditional association analyses centrally, which distinguish genuine signals from “shadows” of known loci. Details of the methods can be found in Liu et al.

We centrally performed quality control for the data. We aligned study-reported reference and alternative alleles with the alleles reported in the NHLBI Exome Sequencing Project and removed mislabelled variant sites that could be strand-ambiguous. For variant sites in each study, we removed variants that had a call rate < 0.9 or a Hardy-Weinberg P value < 1 × 10⁻⁷.
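The per-study variant filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `VariantQC` record layout, the function names and the example variant IDs are hypothetical. Only the two thresholds (call rate 0.9, Hardy-Weinberg P value 1 × 10⁻⁷) come from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the protocol text.
MIN_CALL_RATE = 0.9
MIN_HWE_P = 1e-7


@dataclass
class VariantQC:
    """Per-study QC summary for one variant (hypothetical record layout)."""
    variant_id: str
    call_rate: float  # fraction of samples with a non-missing genotype
    hwe_p: float      # Hardy-Weinberg equilibrium test P value


def passes_qc(v: VariantQC) -> bool:
    """Keep a variant unless its call rate is < 0.9 or its HWE P value is < 1e-7."""
    return v.call_rate >= MIN_CALL_RATE and v.hwe_p >= MIN_HWE_P


def filter_variants(variants):
    """Return only the variants that pass both QC criteria."""
    return [v for v in variants if passes_qc(v)]


# Made-up example records, for illustration only.
example = [
    VariantQC("var_ok", call_rate=0.99, hwe_p=0.42),
    VariantQC("var_low_call_rate", call_rate=0.85, hwe_p=0.42),
    VariantQC("var_hwe_fail", call_rate=0.99, hwe_p=1e-9),
]
kept_ids = [v.variant_id for v in filter_variants(example)]
print(kept_ids)
```

The allele alignment against the NHLBI Exome Sequencing Project and the removal of strand-ambiguous sites are separate steps and are not covered by this sketch.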
Finally, as additional checks, for each study we visually inspected a scatter plot of variant allele frequencies against the frequencies from ethnicity-matched populations in the 1000 Genomes Project, and made sure that the strand and allele labels were well calibrated between studies.

Single-variant associations with P < 2.1 × 10⁻⁷ (0.05/242,289 variants analyzed) and gene-based associations with P < 4.2 × 10⁻⁷ (0.05/[20,000 genes × 6 tests]) were considered significant. Novel loci were defined as those not within 1 megabase of a known lipid GWAS SNP. Additionally, where a locus extended beyond 1 megabase, linkage disequilibrium information was used to determine which SNPs were independent. All novel loci reported in this manuscript are > 1 megabase from any previously reported locus and independent of it (r² < 0.2 was required for variants within 3 megabases). […]
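The two significance thresholds are Bonferroni corrections, and the distance part of the novelty definition is a simple window check. The sketch below reproduces the arithmetic and illustrates the 1-megabase rule. The `is_novel` helper and the positions passed to it are hypothetical, and the linkage disequilibrium criterion (r² < 0.2 within 3 megabases) is deliberately left out because it requires genotype or reference-panel data.

```python
ALPHA = 0.05
N_VARIANTS = 242_289           # single variants analyzed (from the text)
N_GENES = 20_000               # approximate gene count used in the text
N_GENE_TESTS = 6               # gene-based tests per gene (from the text)
NOVELTY_WINDOW_BP = 1_000_000  # 1 megabase

# Bonferroni-corrected significance thresholds.
single_variant_threshold = ALPHA / N_VARIANTS
gene_threshold = ALPHA / (N_GENES * N_GENE_TESTS)

print(f"{single_variant_threshold:.1e}")  # 2.1e-07
print(f"{gene_threshold:.1e}")            # 4.2e-07


def is_novel(chrom, pos, known_snps, window_bp=NOVELTY_WINDOW_BP):
    """Distance criterion only: True if no known SNP on the same
    chromosome lies within window_bp of the candidate position.

    known_snps is an iterable of (chromosome, position) pairs
    (hypothetical input format).
    """
    return all(
        k_chrom != chrom or abs(pos - k_pos) > window_bp
        for k_chrom, k_pos in known_snps
    )


# Made-up coordinates, for illustration only.
known = [("1", 3_000_000)]
print(is_novel("1", 5_000_000, known))  # True: 2 Mb away
print(is_novel("1", 3_500_000, known))  # False: 0.5 Mb away
```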

Pipeline specifications

Applications: WES analysis; GWAS
Organisms: Homo sapiens; Mus musculus
Diseases: Coronary Artery Disease; Diabetes Mellitus, Type 2; Macular Degeneration; beta-Thalassemia
Chemicals: Cholesterol