Computational protocol: Pathway Analysis Using Genome-Wide Association Study Data for Coronary Restenosis – A Potential Role for the PARVB Gene

Protocol publication

[…] Analyses were performed using PLINK, GRASS, and ALIGATOR software. We briefly describe all three methods.

The set-based test of PLINK evaluates the joint effect of all genetic variation that fulfils the test constraints within the set of genes of the pathway of interest. First, a single-SNP analysis of all SNPs within the pathway set is performed. Subsequently, a mean SNP statistic is calculated from the single-SNP statistics of a maximum number of independent SNPs with a p-value <0.2. SNPs are considered independent when the LD, expressed as R², is <0.5. Of the SNPs that are in LD with R² >0.5, the SNP with the lowest p-value in the single-SNP analysis is selected. This analysis is repeated 10,000 times in simulated datasets with permutation of the phenotype. An empirical p-value for the SNP set is computed from the number of simulated datasets in which the test statistic of the SNP set exceeds that of the original SNP set.

GRASS calculates “eigenSNPs” for each gene in the pathway set by summarizing the variation of a gene using principal component analysis. Subsequently, one or more of these “eigenSNPs” per gene are selected using regularized logistic regression to calculate a test statistic for each pathway set. This analysis is repeated 10,000 times in simulated datasets with permutation of the phenotype. The p-value per pathway SNP set is calculated by comparing the test statistic of the original pathway SNP set with the test statistics of the simulated pathway SNP sets.

ALIGATOR (Association LIst Go AnnoTatOR) tests gene sets for enrichment of genes that contain significant SNPs. Enrichment is defined as a gene set containing a larger number of significant genes than expected by chance. Replicate gene lists of the same length as the original are generated by randomly sampling SNPs (thus correcting for variable gene size). The lists are used to obtain p-values for enrichment for each gene set (by comparing the number of significant genes observed on the actual gene list to that observed on each replicate list), to correct these for testing multiple non-independent categories, and to test whether the number of significantly enriched categories is higher than expected. ALIGATOR uses data from all the SNPs tested in a gene and corrects for the variable numbers of SNPs per gene. Each gene is counted once regardless of how many significant SNPs it contains, thus eliminating the influence of LD between SNPs within genes. We used a p-value cutoff of <0.005 for SNPs, 5000 replicate gene lists, and 1000 permutations as parameters to run ALIGATOR. Pathways were included when 2 or more individual genes contained significantly associated SNPs.

Considering the exploratory nature of the analysis and the considerable overlap between the pathways, pathways associated with restenosis at p<0.01 were considered worth exploring further. Since it has been suggested that PLINK, GRASS, and ALIGATOR provide complementary information, we proceeded with the secondary analysis when a pathway was associated with restenosis with p<0.01 in at least one of the analyses. The pathways meeting this criterion were explored in more detail by fine-mapping, first of the genes within those pathways and second of the individual SNPs. For these secondary analyses, as well as during the replication stage, p<0.05 was considered significant. When applying a strict Bonferroni correction for the 54 tested pathways and three different tests, the threshold for statistical significance was set at p<0.0003 (= 0.05/(54*3)). […]
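
The set-based procedure described for PLINK (single-SNP tests, LD pruning of SNPs with p<0.2, a mean statistic, and phenotype permutation) can be illustrated with a minimal Python sketch. This is not PLINK's implementation: the function names are hypothetical, the single-SNP statistic is a trend statistic (n times the squared genotype-phenotype correlation) used as a stand-in for PLINK's own association test, `max_snps=5` is an illustrative cap because the text does not state the maximum, and the empirical p-value uses the common (exceed + 1)/(permutations + 1) convention, which is an assumption here.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def snp_stat_and_p(genotype, phenotype):
    """Trend statistic n * r^2 with its chi-square (1 df) p-value (assumed stand-in test)."""
    stat = len(genotype) * pearson_r(genotype, phenotype) ** 2
    return stat, math.erfc(math.sqrt(stat / 2.0))


def set_statistic(snps, phenotype, ld_r2, p_max=0.2, r2_max=0.5, max_snps=5):
    """Mean statistic over up to max_snps mutually independent SNPs with p < p_max."""
    results = [snp_stat_and_p(g, phenotype) for g in snps]
    order = sorted(range(len(snps)), key=lambda j: results[j][1])
    selected = []
    for j in order:  # walk SNPs from the lowest p-value upwards
        if results[j][1] >= p_max or len(selected) >= max_snps:
            break
        # keep a SNP only if it is not in LD (r^2 >= r2_max) with a better SNP
        if all(ld_r2[j][k] < r2_max for k in selected):
            selected.append(j)
    if not selected:
        return 0.0
    return sum(results[j][0] for j in selected) / len(selected)


def set_based_test(snps, phenotype, n_perm=10_000, seed=0, **kwargs):
    """Return (observed set statistic, empirical p-value from phenotype permutation)."""
    rng = random.Random(seed)
    ld_r2 = [[pearson_r(a, b) ** 2 for b in snps] for a in snps]
    observed = set_statistic(snps, phenotype, ld_r2, **kwargs)
    permuted = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break the genotype-phenotype link
        if set_statistic(snps, permuted, ld_r2, **kwargs) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `snps` is a list with one genotype vector (0/1/2 allele counts) per SNP and `phenotype` is a 0/1 vector over the same individuals. The SNP selection is repeated inside every permutation, so the permuted statistics are built under the same rules as the observed one.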

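The thresholds stated in the protocol (p<0.01 in at least one method for follow-up, and a Bonferroni-corrected p<0.0003 for 54 pathways and three tests) can be checked with a short Python sketch. The function names, the category labels, and the p-values in the usage note are hypothetical and serve only to illustrate the decision rule.

```python
N_PATHWAYS = 54        # pathways tested (from the protocol)
N_METHODS = 3          # PLINK, GRASS, ALIGATOR
ALPHA = 0.05
EXPLORATORY_P = 0.01   # follow-up threshold used in the protocol


def bonferroni_threshold(alpha=ALPHA, n_pathways=N_PATHWAYS, n_methods=N_METHODS):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / (n_pathways * n_methods)


def classify_pathway(p_values):
    """Classify a pathway from its per-method p-values (dict: method -> p)."""
    best = min(p_values.values())
    if best < bonferroni_threshold():
        return "bonferroni_significant"
    if best < EXPLORATORY_P:
        return "secondary_analysis"   # p<0.01 in at least one method
    return "not_followed_up"
```

For example, `bonferroni_threshold()` evaluates 0.05/162, which is approximately 0.00031, and a pathway with hypothetical p-values `{"PLINK": 0.2, "GRASS": 0.004, "ALIGATOR": 0.5}` would be classified as `"secondary_analysis"`.
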
Pipeline specifications

Software tools: PLINK, ALIGATOR
Application: GWAS
Diseases: Coronary Restenosis
Chemicals: Vitamin D