Computational protocol: Association Testing of Clustered Rare Causal Variants in Case-Control Studies

Protocol publication

[…] To simulate realistic human genomic structure, we used the Cosi program, which is based on a coalescent model. We generated 100 data sets, each containing 10,000 chromosomes spanning a 1 Mb region. The chromosomes were generated according to the linkage disequilibrium patterns of the HapMap CEU (Utah residents with ancestry from northern and western Europe) samples. For each data set, we randomly selected a ∼20 kb region. We considered two situations: (I) clustered causal variants: 20 rare causal variants were clustered within a ∼6 kb region; (II) non-clustered causal variants: 20 rare causal variants were approximately equally spaced across the whole ∼20 kb region. The 20 causal variants were assumed to be (I) all protective; (II) 15 protective and 5 deleterious; (III) 10 protective and 10 deleterious; (IV) 5 protective and 15 deleterious; or (V) all deleterious. The population attributable risk (PAR) of each causal variant was set to 0%, 0.2%, 0.4%, 0.6%, 0.8%, or 1%.
Given the PAR and the minor allele frequency (MAF) of the jth causal variant, its genotype relative risk (GRR) was derived from these two quantities […], together with an indicator function that is 1 if the jth causal variant is protective and 0 otherwise. The genotypes of a subject were formed by two chromosomes randomly drawn from the pool of 10,000 chromosomes. For a subject carrying a given pair of chromosomes, disease status was generated from a penetrance model […] defined in terms of the baseline penetrance (set at 1%) and the minor alleles carried at each causal site. Chromosome pairs were randomly drawn from the chromosome pool with replacement until 500 cases and 500 controls were recruited. [...]
We compared CLUSTER with IL-K, KERNEL, SKAT, WS, and VT. Single-nucleotide polymorphisms with MAF > 5% in the combined sample of cases and controls were first removed from the analyses. The per-site P-values of individual variants were obtained as mid P-values from Fisher's exact test. The user-specified maximum distance was fixed at 20 kb throughout this work.
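The case-control sampling step described above (drawing chromosome pairs with replacement until 500 cases and 500 controls are recruited) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the tuple encoding of chromosomes (1 = minor allele at a causal site), and in particular `toy_penetrance` with its GRR values are assumptions made for the example. The penetrance formula is not recoverable from this excerpt, so the sampler takes it as a caller-supplied function.

```python
import random


def sample_case_control(pool, penetrance, n_cases=500, n_controls=500, rng=None):
    """Draw chromosome pairs with replacement until both quotas are filled.

    pool       : list of chromosomes (here: tuples of 0/1 minor-allele indicators)
    penetrance : callable (h1, h2) -> probability of disease (supplied by caller)
    """
    rng = rng or random.Random()
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Two independent draws from the pool, i.e. sampling with replacement.
        h1, h2 = rng.choice(pool), rng.choice(pool)
        affected = rng.random() < penetrance(h1, h2)
        # Keep the subject only if the matching group still has room.
        if affected and len(cases) < n_cases:
            cases.append((h1, h2))
        elif not affected and len(controls) < n_controls:
            controls.append((h1, h2))
    return cases, controls


def toy_penetrance(h1, h2, baseline=0.01, grr=(3.0, 0.5)):
    """HYPOTHETICAL multiplicative model, used only to exercise the sampler:
    baseline * product over sites of grr_j ** (minor-allele count at site j).
    The GRR values are made up; they are not taken from the protocol."""
    risk = baseline
    for j, g in enumerate(grr):
        risk *= g ** (h1[j] + h2[j])
    return min(risk, 1.0)


if __name__ == "__main__":
    demo_pool = [(0, 0)] * 90 + [(1, 0)] * 5 + [(0, 1)] * 5
    cases, controls = sample_case_control(
        demo_pool, toy_penetrance, n_cases=20, n_controls=20, rng=random.Random(1)
    )
    print(len(cases), len(controls))
```

With a baseline penetrance of 1%, most draws are unaffected, so the control quota fills quickly and the loop then runs until enough cases accumulate; surplus subjects are simply discarded.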
IL-K and KERNEL were implemented with the R package “vclust”. The maximum window size considered by IL-K was set at 50% of the total region length, ∼10 kb, as suggested by Ionita-Laza et al. When performing KERNEL, the tri-weight kernel was used as the distance measure between any two variants, because this is the default setting in the R package “vclust”. For a fair comparison, CLUSTER was implemented with the same tri-weight distance measure. The candidate truncation thresholds considered in CLUSTER were 0.10, 0.11, 0.12, …, 0.20. These are suitable P-value truncation thresholds for rare variant association testing.
Two burden tests, WS and VT, were implemented with the R script by Price et al. (http://genetics.bwh.harvard.edu/rare_variants/). As a representative non-burden test, SKAT was also included in the comparisons. SKAT was implemented with the R package “SKAT”. The weight given to the jth variant site was computed from its MAF with the default weight function in the package “SKAT”. Note that the SKAT compared here is the test that optimally combines the burden tests and the original SKAT proposed by Wu et al.
The P-values of CLUSTER, IL-K, KERNEL, WS, and VT were obtained with 10,000 permutations when evaluating type-I error rates and with 1,000 permutations when evaluating power. For SKAT, we used the default Davies method in the package “SKAT” to compute P-values. […]
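The permutation P-values mentioned above could be computed along the following lines. This is a generic sketch under stated assumptions, not the implementation of CLUSTER or any of the compared methods: `burden_difference` is a hypothetical stand-in statistic, and the `(hits + 1) / (n_perm + 1)` estimator is one common convention that the excerpt does not confirm.

```python
import random


def permutation_p_value(statistic, genotypes, labels, n_perm=1000, rng=None):
    """Permutation P-value for a statistic where larger means more extreme.

    Case/control labels are shuffled while genotypes stay fixed, which
    preserves the linkage disequilibrium structure among variants.
    """
    rng = rng or random.Random()
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            hits += 1
    # ASSUMPTION: add-one estimator, so the P-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def burden_difference(genotypes, labels):
    """HYPOTHETICAL toy statistic: absolute difference in mean rare-allele
    count between cases (label 1) and controls (label 0)."""
    case = [g for g, y in zip(genotypes, labels) if y == 1]
    ctrl = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))


if __name__ == "__main__":
    genotypes = [2, 2, 2, 2, 0, 0, 0, 0]  # rare-allele counts per subject
    labels = [1, 1, 1, 1, 0, 0, 0, 0]     # 1 = case, 0 = control
    p = permutation_p_value(
        burden_difference, genotypes, labels, n_perm=999, rng=random.Random(0)
    )
    print(p)
```

Using more permutations for type-I error evaluation than for power, as the protocol does, fits this kind of estimator: the smallest attainable P-value is 1 / (n_perm + 1), so finer resolution in the tail requires more permutations.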

Pipeline specifications

Software tools: Cosi, SKAT
Applications: Population genetic analysis, GWAS