Computational protocol: Susceptibility to tuberculosis is associated with variants in the ASAP1 gene encoding a regulator of dendritic cell migration

Protocol publication

[…] We extracted genomic DNA from whole blood of the participating subjects using a standard chloroform/proteinase K protocol. We checked DNA quality using 1% agarose gel electrophoresis, determined DNA concentration using a PicoGreen assay, and then normalized the concentration for genotyping.

In the GWAS, genotyping was done using the Affymetrix Genome-Wide Human SNP Array 6.0. Genotypes were called in 5,914 TB cases and 6,022 controls using Birdseed. Individuals were excluded if they had more than 2% missing genotype data or showed an excess of heterozygous genotypes (±3.5 standard deviations). For each pair of individuals, we calculated identity-by-state (IBS) and excluded samples with IBS > 80% as likely duplicates or close relatives. Finally, IBS was also calculated between each sample from this study and 1,397 samples from the International HapMap project. The IBS relationships were converted to distances and projected onto two axes by multidimensional scaling. We removed non-European ancestry outliers based on these projections. After removing these individuals, we excluded SNPs with a call rate of less than 98%, a Hardy-Weinberg P-value < 10⁻⁶ (in controls), a difference in per-SNP missing rate between cases and controls > 0.02, or a minor allele frequency of less than 1%. In total, 799 individuals and 175,385 SNPs were excluded, leaving 707,452 SNPs in 5,530 cases and 5,607 controls for analysis, which we call Set 1. Principal component analysis was performed across autosomal SNPs within Set 1. We computed the first 10 principal components and included the top 4 in our downstream analysis to account for population structure remaining in our dataset.

We imputed SNPs from the 1000 Genomes Phase I (interim) release into our Set 1 samples using IMPUTE2. Imputed SNPs were excluded if the imputation quality score r² was < 0.5 and minor allele frequency was < 1%.
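
The SNP-level exclusion criteria listed above (call rate, Hardy-Weinberg equilibrium in controls, differential missingness, and minor allele frequency) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `GenotypeCounts` class and the function names are hypothetical, and the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation as an assumption, since the text does not say which test was used. Only the thresholds come from the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeCounts:
    """Genotype counts for one SNP in one group (hypothetical container)."""
    aa: int       # homozygous for allele A
    ab: int       # heterozygous
    bb: int       # homozygous for allele B
    missing: int  # samples with no call

    @property
    def called(self) -> int:
        return self.aa + self.ab + self.bb

    @property
    def total(self) -> int:
        return self.called + self.missing


def missing_rate(c: GenotypeCounts) -> float:
    return c.missing / c.total


def minor_allele_frequency(c: GenotypeCounts) -> float:
    freq_a = (2 * c.aa + c.ab) / (2 * c.called)
    return min(freq_a, 1.0 - freq_a)


def hwe_chisq_pvalue(c: GenotypeCounts) -> float:
    """Chi-square (1 df) Hardy-Weinberg test; an assumed stand-in for the
    unspecified test in the text."""
    n = c.called
    if n == 0:
        return 1.0
    p = (2 * c.aa + c.ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is uninformative
    chi2 = sum((o - e) ** 2 / e for o, e in zip((c.aa, c.ab, c.bb), expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(cases: GenotypeCounts, controls: GenotypeCounts) -> bool:
    """Apply the four SNP filters with the thresholds quoted in the text."""
    combined = GenotypeCounts(
        cases.aa + controls.aa, cases.ab + controls.ab,
        cases.bb + controls.bb, cases.missing + controls.missing,
    )
    if 1.0 - missing_rate(combined) < 0.98:       # call rate below 98%
        return False
    if hwe_chisq_pvalue(controls) < 1e-6:         # HWE tested in controls only
        return False
    if abs(missing_rate(cases) - missing_rate(controls)) > 0.02:
        return False
    if minor_allele_frequency(combined) < 0.01:   # MAF below 1%
        return False
    return True
```

Testing Hardy-Weinberg equilibrium in controls only, as the text specifies, avoids discarding SNPs whose genotype distribution in cases is distorted by a true disease association.
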
After filtering, 7,614,862 SNPs were left for further association analyses.

TB association was tested with logistic regression implemented in SNPTEST on genotype likelihoods from imputation, with 4 PCs as covariates. The genomic control inflation factor λGC for SNPs before and after imputation was 1.10, indicating that we have successfully controlled for any residual population structure between cases and controls.

SNPs from the 1000 Genomes Phase I (interim) release were imputed into Ghanaian and Gambian subjects using IMPUTE2, and TB association of the seven ASAP1 SNPs was tested with logistic regression implemented in SNPTEST.

We genotyped seven ASAP1 SNPs using custom TaqMan assays and the 7900HT system from Applied Biosystems. We visually checked all genotype clusters, assigned calls and extracted genotypes using SDS 2.3 software. TB association was tested in Stata 11 using logistic regression, with the city of sample origin (St. Petersburg or Samara) as a covariate.

To conduct Bayesian fine-mapping, we calculated marginal likelihoods for each SNP and Bayes Factors using a normal prior with mean 0 and variance 0.2. Posterior probabilities were assigned using the relative contribution of each SNP's Bayes Factor to the sum of Bayes Factors across all seven SNPs.

We analyzed the seven ASAP1 SNPs in the Ghanaian and Gambian TB GWAS datasets that were published previously. The Gambian dataset is available in the European Genome-phenome Archive (accession number EGAS00000000027). Both the Ghanaian and Gambian datasets underwent quality control by filtering out individuals with discordant sex information, elevated missing data rates (>2%), outlying heterozygosity rates (±3.5 standard deviations), duplicated or related individuals with IBS > 80%, and divergent ancestry based on principal component projections using the HapMap dataset. In total, 101 (13 cases and 88 controls) and 316 (192 cases and 124 controls) individuals were excluded from the Ghanaian and Gambian datasets, respectively.
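
The posterior calculation in the Bayesian fine-mapping step described above (each SNP's Bayes Factor divided by the sum over all SNPs) can be illustrated with a short Python sketch. The prior (normal, mean 0, variance 0.2) and the normalisation come from the text. The Bayes Factor formula here is an assumption: it uses Wakefield's approximate Bayes Factor computed from an effect estimate and its standard error, which may differ from the marginal-likelihood calculation the authors actually performed. The SNP labels and effect values in the example are hypothetical placeholders, not results from the study.

```python
import math

PRIOR_VARIANCE = 0.2  # variance of the normal prior on the log odds ratio (from the text)


def log_bayes_factor(beta: float, se: float,
                     prior_variance: float = PRIOR_VARIANCE) -> float:
    """Log approximate Bayes Factor (alternative vs. null) for one SNP.

    Assumption: Wakefield's approximation, with beta the estimated log odds
    ratio and se its standard error.
    """
    v = se * se
    z_squared = (beta / se) ** 2
    shrinkage = prior_variance / (v + prior_variance)
    return 0.5 * math.log(v / (v + prior_variance)) + 0.5 * z_squared * shrinkage


def posterior_probabilities(effects: dict) -> dict:
    """Map each SNP to BF_i / sum_j BF_j, computed in log space for stability.

    `effects` maps a SNP label to a (beta, se) tuple.
    """
    log_bfs = {snp: log_bayes_factor(b, s) for snp, (b, s) in effects.items()}
    largest = max(log_bfs.values())
    weights = {snp: math.exp(lbf - largest) for snp, lbf in log_bfs.items()}
    total = sum(weights.values())
    return {snp: w / total for snp, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical effect estimates, for illustration only.
    example = {"snp_a": (0.25, 0.05), "snp_b": (0.10, 0.05), "snp_c": (0.0, 0.05)}
    for snp, prob in posterior_probabilities(example).items():
        print(f"{snp}: {prob:.4f}")
```

Dividing by the sum of Bayes Factors implicitly assumes that exactly one of the SNPs considered is causal and that all are equally likely a priori.
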
After individual quality control, we imputed the seven ASAP1 SNPs from the 1000 Genomes Phase I (interim) release into Ghanaian and Gambian samples using IMPUTE2. [...]

We analyzed genome-wide expression data from DCs isolated from 65 Caucasian subjects. These DCs were either non-infected or infected in vitro for 18 hours with M. tuberculosis and were studied using Illumina HT-12 expression arrays. The raw data can be obtained from the GEO database (accession numbers GSE34588 and GSE34151). For our analysis, we used a cleaned dataset provided by L. Barreiro (personal communication) in which expression levels were regressed on the first 5 principal components for the data from non-infected cells and the first 8 principal components for the data from infected cells. We used the R package snpStats (D. Clayton, snpStats: snpMatrix and XSnpMatrix classes and methods, R package version 1.12.10) to fit generalized linear models, with SNP genotypes as predictor variables and expression levels of ASAP1 as responses. We report P-values for the two-tailed test where the null hypothesis was that the true value of the generalized linear model coefficient is zero. We imputed genotypes of the ASAP1 SNPs rs2033059, rs17285138, rs1469288, rs12680942 and rs1017281 with the MaCH and minimac software packages using genotype data from 1000 Genomes. […]
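
The expression analysis described above tests whether the coefficient relating SNP genotype to ASAP1 expression is zero. The Python sketch below shows one way such a test can be set up for a single SNP. It is not the snpStats implementation: it assumes a Gaussian model with an intercept and uses an asymptotic score-type statistic (n times the squared genotype-expression correlation, referred to a chi-square distribution with 1 degree of freedom). The function name and the input layout are hypothetical.

```python
import math


def eqtl_score_test(dosages, expression):
    """Test the null hypothesis that genotype has no linear effect on expression.

    dosages    -- allele dosage per subject (0, 1, 2, or fractional if imputed)
    expression -- expression level per subject, in the same order
    Returns (slope, chi2, p_value); the test is two-sided by construction.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched inputs with at least 3 subjects")

    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")

    slope = sxy / sxx                       # least-squares coefficient
    chi2 = n * sxy * sxy / (sxx * syy)      # score statistic: n * r^2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square (1 df) upper tail
    return slope, chi2, p_value


if __name__ == "__main__":
    # Toy data, for illustration only.
    slope, chi2, p = eqtl_score_test([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
    print(f"slope={slope:.3f} chi2={chi2:.3f} p={p:.4f}")
```

With about 65 subjects, as in the dataset described, the chi-square approximation is only asymptotic, so small P-values from a sketch like this should be read with some caution.
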

Pipeline specifications

Software tools: IMPUTE, SNPTEST, snpStats, minimac
Application: GWAS
Diseases: Tuberculosis; Tuberculosis, Pulmonary