Computational protocol: Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases

Protocol publication

[…] The eQTL study of monocytes comprises 414 (unstimulated), 367 (IFN24), 261 (LPS2) and 322 (LPS24) volunteers of European ancestry recruited in the Oxfordshire area (gene expression data: http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2232; genotyping data: http://www.ebi.ac.uk/ega/EGAD00010000144, http://www.ebi.ac.uk/ega/EGAD00010000520). The eQTL study of B cells consists of 288 volunteers from the same area, approximately two-thirds of whom overlap with the subjects in the monocyte study (gene expression data: http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-945; genotyping data: http://www.ebi.ac.uk/ega/EGAD00010000144). We included the 413 (unstimulated), 366 (IFN24), 260 (LPS2) and 321 (LPS24) subjects with a per-sample call rate of >0.92 and autosomal heterozygosity of >0.3225.

Samples of the Cardiogenics monocyte eQTL study (https://www.ebi.ac.uk/ega/studies/EGAS00001000411) were of European descent and were recruited in five centres. We restricted the analysis to 396 non-diabetic subjects for whom both genetic and monocyte expression data were available. After filtering on per-sample call rate (=1) and autosomal heterozygosity (≥0.335 and ≤0.346), 391 subjects remained for further analyses.

Samples from the Fairfax studies were genotyped using the Illumina HumanOmniExpress-12v1.0 BeadChip. Samples from the Cardiogenics study were genotyped using Human 610 Quad Custom arrays. To provide matching dense genotypes in the eQTL samples, we imputed genotypes for all the disease-associated regions against the 1000 Genomes Phase 1 haplotypes using IMPUTE2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html). [...]

Colocalisation analyses were conducted using the R package coloc (http://cran.r-project.org/web/packages/coloc). The method requires summary statistics for each SNP: either P-values and MAFs, or estimated genotype effects and their standard errors.
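As a rough illustration of how either form of summary statistic can feed a colocalisation test, the sketch below converts each into a per-SNP log approximate Bayes factor using Wakefield's approximation, the approach the coloc package is generally understood to take. It is written in Python rather than R purely for illustration. The function names, the prior standard deviations (0.15 and 0.2) and the case_fraction parameter are assumptions and are not taken from the protocol.

```python
import math
from statistics import NormalDist


def log_abf(z, variance, prior_sd):
    """Wakefield log approximate Bayes factor for one SNP.

    z        : Z-score of the effect estimate
    variance : variance of the effect estimate
    prior_sd : assumed standard deviation of the prior on the true effect
    """
    r = prior_sd ** 2 / (prior_sd ** 2 + variance)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def log_abf_from_estimate(beta, se, prior_sd=0.15):
    """Input form 1: estimated effect and its standard error (e.g. eQTL data).

    prior_sd=0.15 is an illustrative assumption for a quantitative trait.
    """
    return log_abf(beta / se, se * se, prior_sd)


def log_abf_from_pvalue(p, maf, n, case_fraction, prior_sd=0.2):
    """Input form 2: P-value and MAF (e.g. case-control disease data).

    The variance of the log odds ratio is approximated from the MAF,
    the total sample size n and the fraction of samples that are cases.
    prior_sd=0.2 is an illustrative assumption for a case-control trait.
    """
    # Two-sided P-value to |Z|; working in the lower tail keeps precision
    # for very small P-values.
    z = -NormalDist().inv_cdf(p / 2.0)
    variance = 1.0 / (
        2.0 * n * maf * (1.0 - maf) * case_fraction * (1.0 - case_fraction)
    )
    return log_abf(z, variance, prior_sd)
```

A positive value favours association at that SNP and a negative value favours no association. Only the absolute Z-score is recoverable from a P-value, which is sufficient here because the Bayes factor depends on the square of Z.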
As genotypes were available for the eQTL data, we were able to compute the estimated effects and standard errors (snp.rhs.estimates, package snpStats, http://www.bioconductor.org). For the ten diseases studied, their P-values, together with MAFs from 1000 Genomes, were used as input.

The assumptions of the colocalisation analyses are as follows:

1. The pair of traits for colocalisation are from independent studies. Our tests for colocalisation between gene expression (eQTL) and the autoimmune diseases satisfy this assumption.

2. At most one SNP is causal for each trait in each test region. Under this assumption, we have five hypotheses: association with neither trait (H0); with one trait only (one of the ten diseases, H1, or monocyte/B cell expression, H2); and with both traits (at two independent SNPs, H3, or at one shared SNP, H4). It is possible that more than one SNP is causal for a trait in a test region. However, in the probable case that one effect is stronger than another, the method effectively tests colocalisation of the strongest effect(s).

3. The probability that a given SNP is causal for a trait is independent of the probability that any other given SNP in the region is causal. This is perhaps counter-intuitive: we might expect the probability that a given SNP is causal to increase when another SNP is known to be causal. However, because we condition on exactly one or zero causal variants, these higher-order probabilities need never be used. This allows the probability of each hypothesis to be expressed as shown in Equations (1)–(5) below. Note that this independence may not hold for association, as the associations of two or more SNPs with a trait will not be independent of each other if the SNPs are in high LD. However, it is plausible for causality.

4. Causal SNPs are either directly genotyped, very well tagged or well imputed. In the high-density regions of the ImmunoChip, imputation of common variants has >95% efficiency, justifying this assumption.

Under the above-mentioned assumptions, we set our models as follows. Suppose we have Q SNPs in a test region. Let p0, p1, p2 and p12 denote the prior probabilities of an SNP being causal for neither trait, for disease only, for gene expression in monocytes/B cells only, and for both, respectively. These four prior probabilities must sum to 1 because they comprise all possible outcomes. […]
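The source's own equations for the five hypotheses are not reproduced in this excerpt. The sketch below gives one reading that is consistent with the assumptions stated above: hypothesis priors obtained by counting SNP configurations, and posteriors obtained by combining per-SNP log approximate Bayes factors in the manner commonly described for coloc. It is written in Python for illustration only. All function names and the default prior values (1e-4, 1e-4, 1e-5) are assumptions, not values taken from the protocol.

```python
import math


def logsumexp(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference underflows."""
    x = math.exp(b - a)
    return a + math.log1p(-x) if x < 1.0 else float("-inf")


def hypothesis_priors(q, p1, p2, p12):
    """Prior of each hypothesis for a region of q SNPs (assumed form).

    Each SNP is independently 'neither', 'trait 1 only', 'trait 2 only'
    or 'both', and only configurations with at most one causal SNP per
    trait are counted, so the five values need not sum to 1.
    """
    p0 = 1.0 - p1 - p2 - p12
    return {
        "H0": p0 ** q,
        "H1": q * p1 * p0 ** (q - 1),
        "H2": q * p2 * p0 ** (q - 1),
        "H3": q * (q - 1) * p1 * p2 * p0 ** (q - 2),
        "H4": q * p12 * p0 ** (q - 1),
    }


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probability of H0..H4 from per-SNP log Bayes factors.

    labf1, labf2: log approximate Bayes factors for trait 1 and trait 2,
    one entry per SNP in the same order. Priors are taken relative to H0
    with p0 approximated as 1.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need two equal-length lists with at least 2 SNPs")
    sum1 = logsumexp(labf1)
    sum2 = logsumexp(labf2)
    sum12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + sum1,
        "H2": math.log(p2) + sum2,
        # All SNP pairs minus the pairs that use the same SNP twice.
        "H3": math.log(p1) + math.log(p2) + logdiffexp(sum1 + sum2, sum12),
        "H4": math.log(p12) + sum12,
    }
    denom = logsumexp(list(log_h.values()))
    return {h: math.exp(v - denom) for h, v in log_h.items()}
```

For example, `coloc_posteriors([10, 0, 0], [0, 10, 0])` places more weight on H3 than on H4 because the strongest signals sit at different SNPs, whereas moving both signals onto the first SNP shifts the weight towards H4.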

Pipeline specifications

Software tools: IMPUTE, COLOC, snpStats
Application: GWAS
Diseases: Diabetes Mellitus, Narcolepsy