Computational protocol: A Novel, Functional and Replicable Risk Gene Region for Alcohol Dependence Identified by Genome Wide Association Study

Protocol publication

[…] CNV370 beadchip has only one-sixth of markers overlapping with the Human1M beadchip. To test whether the risk markers identified in AAs and EAs (Human1M) could be replicated in Australians (CNV370), we imputed the genotype data in Australians to fill in the missing markers and then performed association tests. First, we pre-phased the original genotype data in a 5 Mb region around the risk genes of interest in Australians. Second, we used the 1000 Genomes Project and HapMap 3 CEU datasets as reference panels to impute the missing genotypes in this 5 Mb region with the program IMPUTE2. This program uses a Markov Chain Monte Carlo (MCMC) algorithm to derive full posterior probabilities of the genotypes of each SNP (burnin = 10, iteration = 30, k = 80 and Ne = 11,500). If the probability of one of the three genotypes of a SNP was over the threshold of 0.95, the genotype of this SNP was expressed as the corresponding allele pair for the subsequent association analysis; otherwise, it was treated as missing. For SNPs that were directly genotyped, we used the direct genotypes rather than the imputed data. The imputed genotype data in Australians were checked for Mendelian errors with the program PEDCHECK.

[...] Before statistical analysis, we cleaned the phenotype data first and then the genotype data. This cleaning process yielded 805,814 SNPs in EAs, 895,714 SNPs in AAs and 300,839 SNPs in Australians. [Detailed cleaning steps were described previously].

Genome-wide association tests in AA discovery sample: The allele and genotype frequencies were compared between cases and controls in AAs using genome-wide logistic regression analysis implemented in the program PLINK. Diagnosis served as the dependent variable, alleles or genotypes served as the independent variables, and ancestry proportions (to control for admixture effects), sex, and age served as covariates. Ancestry proportions of each individual were estimated from 3,172 completely independent markers.
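
The 0.95 calling rule applied to the imputed Australian genotypes can be illustrated with a minimal sketch. This is not the authors' code: the function names `hard_call` and `final_genotype` are hypothetical, the three probabilities are assumed to be ordered as homozygous first allele, heterozygous, homozygous second allele, and "over the threshold" is read as strictly greater than 0.95.

```python
from typing import Optional, Sequence

CALL_THRESHOLD = 0.95  # threshold stated in the protocol


def hard_call(probs: Sequence[float], allele_a: str, allele_b: str,
              threshold: float = CALL_THRESHOLD) -> Optional[str]:
    """Convert three posterior genotype probabilities into an allele pair.

    Returns a string such as "AG", or None when no genotype exceeds the
    threshold (the genotype is then treated as missing).
    Assumed probability order: (aa, ab, bb).
    """
    if len(probs) != 3:
        raise ValueError("expected three genotype probabilities")
    genotypes = (allele_a + allele_a, allele_a + allele_b, allele_b + allele_b)
    best = max(range(3), key=lambda i: probs[i])
    # Assumption: "over the threshold" means strictly greater than it.
    return genotypes[best] if probs[best] > threshold else None


def final_genotype(direct: Optional[str], probs: Sequence[float],
                   allele_a: str, allele_b: str) -> Optional[str]:
    """Prefer a directly genotyped call; fall back to the imputed hard call."""
    if direct is not None:
        return direct
    return hard_call(probs, allele_a, allele_b)
```

For example, `hard_call([0.01, 0.96, 0.03], "A", "G")` yields the heterozygous pair, while a flatter distribution such as `[0.5, 0.3, 0.2]` yields a missing genotype.
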
The top-ranked SNPs (p < 10⁻⁴) were also tested by Fisher's exact tests without controlling for admixture effects. The p-values derived from these analyses are illustrated in and the top 5 SNPs are listed in .

Association tests in the primary EA replication sample: Associations between the above top-ranked SNPs (p < 10⁻⁴) and alcohol dependence were tested in EAs using logistic regression analysis (with ancestry proportions, sex and age as covariates) and Fisher's exact test (without covariates), to identify risk genes (i.e., Plant HomeoDomain (PHD) finger protein 3 gene - protein tyrosine phosphatase type IVA gene, member 1 (PHF3-PTP4A1) here) that were enriched with replicable markers. Then, associations between alcohol dependence and all nominally significant SNPs (p < 0.05 in AAs) in PHF3-PTP4A1 were retested in EAs. The associations that were replicated across AAs and EAs are shown in and . Meta-analysis was performed to derive the combined p values across AAs and EAs.

Family-based association tests in the secondary Australian family replication sample: Associations between alcohol dependence and the replicable risk SNPs in PHF3-PTP4A1 identified across AAs and EAs were retested in Australians using a family-based association test implemented in PLINK. Meta-analysis was performed to derive the combined p values across EAs and Australians. […]
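
The Fisher's exact tests applied to the top-ranked SNPs can be sketched on a 2x2 table of allele counts in cases and controls. The protocol does not state whether the tests were allelic or genotypic, or one- or two-sided, so the version below assumes a two-sided allelic test; the function name and the example counts are hypothetical, and this is not the software the authors ran.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are taken as cases and controls, columns as the two alleles.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Made-up allele counts: minor/major alleles in cases, then in controls.
p_value = fisher_exact_two_sided(120, 480, 80, 520)
```

Because this test takes no covariates, it does not adjust for admixture, which matches its use in the protocol as a check alongside the covariate-adjusted logistic regression.
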

Pipeline specifications

Software tools: IMPUTE, PedCheck, PLINK
Application: GWAS
Diseases: Alcoholism