Computational protocol: Rare variant analysis of blood pressure phenotypes in the Genetic Analysis Workshop 18 whole genome sequencing data using sequence kernel association test

Protocol publication

[…] GAW18 provided whole genome sequencing (WGS) data and longitudinal phenotype data for related individuals of Mexican American heritage. Because SKAT can only handle unrelated individuals, we used both sequence and phenotype data on a subset of 142 unrelated individuals. Using PLINK [], we extracted the data and created input files for SKAT. We used the first-visit measurements of diastolic blood pressure (DBP) and systolic blood pressure (SBP) from all 200 simulated replicates, together with the most likely genotypes based on sequencing (geno.csv files). As covariates for both DBP and SBP, we used age, sex, blood pressure medication, and smoking. [...]

We applied SKAT to all 200 replicates of the simulated DBP and SBP phenotypes for 2,652,577 SNPs located in genes, spanning all odd-numbered chromosomes provided by GAW18. Because our analysis was based on 142 unrelated individuals, we did not use the minor allele frequency (MAF) provided by GAW18 (which is based on 959 related individuals); instead, we computed MAF from these 142 individuals using PLINK []. We also constructed a data set restricted to low-frequency SNPs (those with our computed MAF < 0.05).

For a continuous trait Y, SKAT uses the linear model Y_i = γ_0 + γ_1 X_i + β G_i, with genotype values G_i and covariates X_i for subject i. As described by Wu et al [], SKAT assumes that the genetic effect β_j of an individual variant j follows an arbitrary distribution with mean 0 and variance w_j τ, where τ is a variance component and w_j is a prespecified weight for variant j. SKAT further sets the weights through the Beta density function evaluated at the MAF of each variant, Beta(MAF_j; a1, a2). Weighting with a1 = a2 = 1 weights all variants equally regardless of their MAF, which was shown to be equivalent to the C-alpha test of Neale et al []. Weighting with a1 = a2 = 0.5 gives the same weights as those used by Madsen and Browning [].
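As a rough illustration of the MAF recomputation and MAF-based weighting described above, the following Python sketch computes a MAF from 0/1/2 genotype counts, applies the MAF < 0.05 filter, and evaluates the Beta density at that MAF for three (a1, a2) settings. It is not the authors' PLINK or SKAT code: the function names, the WEIGHT_SCHEMES dictionary, and the toy genotypes are assumptions made for illustration, and the (1, 25) entry is the SKAT default setting. Whether the density value enters the model as w_j or as its square root follows the convention of the SKAT paper and is not settled here.

```python
import math

# Hypothetical labels for (a1, a2) settings; not names used by SKAT itself.
WEIGHT_SCHEMES = {
    "equal": (1.0, 1.0),
    "madsen_browning": (0.5, 0.5),
    "skat_default": (1.0, 25.0),
}

def minor_allele_frequency(genotypes):
    """MAF from per-subject allele counts (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this variant")
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1.0 - allele_freq)

def is_low_frequency(maf, threshold=0.05):
    """Low-frequency filter as stated in the text: MAF < 0.05."""
    return maf < threshold

def beta_density(x, a1, a2):
    """Density of Beta(a1, a2) at x, valid for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    log_kernel = (a1 - 1.0) * math.log(x) + (a2 - 1.0) * math.log(1.0 - x)
    return math.exp(log_norm + log_kernel)

if __name__ == "__main__":
    # Toy variant (made up): 2 copies of the counted allele among 25 called subjects.
    toy_genotypes = [1, 1] + [0] * 23 + [None]
    maf = minor_allele_frequency(toy_genotypes)
    print(f"MAF = {maf:.3f}, low frequency: {is_low_frequency(maf)}")
    for name, (a1, a2) in WEIGHT_SCHEMES.items():
        print(f"{name}: {beta_density(maf, a1, a2):.3f}")
```

Computing the density on the log scale with math.lgamma keeps the sketch free of third-party dependencies; in practice a statistics library's Beta density would serve the same purpose.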
The default linear weighting in SKAT uses a1 = 1 and a2 = 25, which puts more weight on rare variants than Madsen-Browning (M-B) weighting, as shown in Figure .

For both data sets (one with all SNPs and another with only low-frequency SNPs), we ran SKAT using these 3 weighting schemes: equal weighting, Madsen-Browning weighting, and SKAT default linear weighting. For each scenario, we obtained p-values for the 10,922 genes in each of the 200 replicates. To evaluate performance, we computed power (true-positive rate) and type I error (false-positive rate) at the 0.05 significance level. For each gene, we computed the proportion of the 200 replicates with p < 0.05. Overall power was the average of these proportions across all causal genes, and type I error was the average of these proportions across all null genes. […]
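The power and type I error calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the gene labels, and the p-values in the demo are hypothetical, and the sketch assumes the per-gene p-values are already available as one list per gene with one entry per replicate.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    return sum(p < alpha for p in p_values) / len(p_values)

def _mean(values):
    # NaN signals that a gene group (causal or null) was empty.
    return sum(values) / len(values) if values else math.nan

def power_and_type1_error(p_by_gene, causal_genes, alpha=0.05):
    """Average per-gene rejection rates over causal genes (power)
    and over all remaining, null genes (type I error)."""
    causal_rates, null_rates = [], []
    for gene, p_values in p_by_gene.items():
        rate = rejection_rate(p_values, alpha)
        if gene in causal_genes:
            causal_rates.append(rate)
        else:
            null_rates.append(rate)
    return _mean(causal_rates), _mean(null_rates)

if __name__ == "__main__":
    # Made-up p-values for 4 replicates; the study used 200 replicates per gene.
    demo_p = {
        "GENE_A": [0.01, 0.20, 0.03, 0.04],  # treated as causal in this demo
        "GENE_B": [0.90, 0.50, 0.04, 0.30],  # treated as null in this demo
    }
    power, type1 = power_and_type1_error(demo_p, causal_genes={"GENE_A"})
    print(f"power = {power:.3f}, type I error = {type1:.3f}")
```

Averaging per-gene rates, rather than pooling all tests, mirrors the two-step description in the text: a rate per gene first, then a mean across causal or null genes.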

Pipeline specifications

Software tools: PLINK, SKAT
Applications: WGS analysis, GWAS