Computational protocol: Inter-population variability of DEFA3 gene absence: correlation with haplotype structure and population variability

Protocol publication

[…] The UCSC Genome Browser served as the main source of genomic sequence, using the human genome assembly hg17. The region analysed was a 150 kb contig spanning 6,760,001 bp to 6,910,000 bp of chromosome 8p23.1 (hg17 coordinates). The sequence was repeat-masked and aligned against itself using PipMaker. The size, orientation and structure of segmental duplications can be interpreted from the PIP and dot-plot output generated by PipMaker. Multiple sequence alignments and phylogenetic tree construction were carried out using the ClustalW program. [...]

A between-group chi-square test was performed to compare the proportion of DEFA3 absence in different human populations. Genotyping data from the HapMap public database were used to test the hypothesis of association between genetic polymorphisms and DEFA3 absence using logistic regression models. Odds ratios (OR) and 95% confidence intervals (95% CI) were calculated for each genotype compared with the homozygote for the major allele (the allele with greater frequency among individuals lacking the DEFA3 allele). Analyses were initially done under a codominant inheritance model (three genotypes separated). Then, simplified models were fitted: a dominant model (heterozygotes grouped with homozygotes for the minor allele), a recessive model (heterozygotes grouped with homozygotes for the major allele), an overdominant model (the two homozygotes grouped) and a log-additive model (a score was assigned counting the number of minor alleles: the homozygote for the major allele was given score 0, the heterozygote score 1, and the homozygote for the minor allele score 2). The model with the lowest Akaike information criterion (minus twice the log likelihood of the model plus twice the number of parameters in the model) was the recessive one, and it was selected for an easy summary of the results. P values were derived from likelihood ratio tests, and a two-sided significance level of 5% was used for the analyses.
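The genotype groupings behind the inheritance models described above, and the comparison of models by Akaike information criterion, can be illustrated with a minimal Python sketch. It is not the SNPassoc implementation: it relies on the fact that a logistic regression with a single categorical predictor reproduces the observed proportion in each group, so the log likelihood can be computed directly from counts. The log-additive model needs an iterative fit and is left out. All names (`GROUPINGS`, `grouped_log_likelihood`, `aic`, `rank_models`) and the counts in `example_counts` are hypothetical and chosen only for illustration.

```python
import math

# Genotype is coded as the number of minor alleles carried (0, 1 or 2).
# Each model maps the three genotypes onto the groups it distinguishes.
GROUPINGS = {
    "codominant":   {0: "AA", 1: "Aa", 2: "aa"},
    "dominant":     {0: "AA", 1: "Aa+aa", 2: "Aa+aa"},
    "recessive":    {0: "AA+Aa", 1: "AA+Aa", 2: "aa"},
    "overdominant": {0: "AA+aa", 1: "Aa", 2: "AA+aa"},
}


def grouped_log_likelihood(counts, grouping):
    """Return (log likelihood, number of parameters) for one grouping.

    counts maps genotype -> (n with DEFA3 absent, n with DEFA3 present).
    """
    pooled = {}
    for genotype, (absent, present) in counts.items():
        label = grouping[genotype]
        a, p = pooled.get(label, (0, 0))
        pooled[label] = (a + absent, p + present)

    log_lik = 0.0
    for absent, present in pooled.values():
        n = absent + present
        for k in (absent, present):
            if k > 0:  # a zero count contributes nothing to the sum
                log_lik += k * math.log(k / n)
    # One parameter per group: an intercept plus one dummy per extra group.
    return log_lik, len(pooled)


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def rank_models(counts):
    """Return (model name, AIC) pairs sorted from lowest to highest AIC."""
    scored = []
    for name, grouping in GROUPINGS.items():
        log_lik, k = grouped_log_likelihood(counts, grouping)
        scored.append((name, aic(log_lik, k)))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical counts, not data from the study.
example_counts = {0: (10, 40), 1: (12, 38), 2: (20, 10)}

if __name__ == "__main__":
    for name, value in rank_models(example_counts):
        print(f"{name:13s} AIC = {value:.2f}")
```

The codominant model always fits at least as well as the grouped models because they are nested within it; the AIC penalty on its extra parameter is what allows a simpler model such as the recessive one to be preferred.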
All these analyses were performed using the SNPassoc R package. Haploblocks were constructed using the Haploview program. Haplotypes were reconstructed using the expectation-maximization (EM) algorithm implemented in the haplo.stats R package. The OR and 95% CI were estimated using a generalized linear regression framework that incorporates haplotype phase uncertainty by inferring a probability matrix of haplotype likelihoods, also implemented in the haplo.stats library. […]
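To make the EM step concrete, the following is a minimal Python sketch of haplotype frequency estimation for two biallelic SNPs, where the only phase-ambiguous genotype is the double heterozygote. It assumes Hardy-Weinberg proportions and is a simplified stand-in, not the haplo.stats code, which handles more loci and passes the resulting phase probabilities on to the regression step. The function names and the input coding (number of minor alleles at each SNP) are assumptions made for this example.

```python
# The four possible haplotypes over two SNPs (1 = minor allele).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(g1, g2):
    """Unordered haplotype pairs consistent with an unphased genotype.

    g1 and g2 are the minor-allele counts (0, 1 or 2) at SNP 1 and SNP 2.
    """
    pairs = []
    for i, h in enumerate(HAPS):
        for k in HAPS[i:]:
            if h[0] + k[0] == g1 and h[1] + k[1] == g2:
                pairs.append((h, k))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-SNP genotypes."""
    freqs = {h: 0.25 for h in HAPS}  # start from equal frequencies
    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        expected = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            # Hardy-Weinberg weight: p^2 for identical haplotypes, 2pq otherwise.
            weights = [(1 if h == k else 2) * freqs[h] * freqs[k]
                       for h, k in pairs]
            total = sum(weights)
            for (h, k), w in zip(pairs, weights):
                expected[h] += w / total
                expected[k] += w / total
        # M-step: new frequencies are expected counts over 2N chromosomes.
        new_freqs = {h: expected[h] / (2 * len(genotypes)) for h in HAPS}
        change = max(abs(new_freqs[h] - freqs[h]) for h in HAPS)
        freqs = new_freqs
        if change < tol:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical genotypes; (1, 1) is the phase-ambiguous double heterozygote.
    sample = [(0, 0), (2, 2), (1, 1), (1, 0), (0, 0)]
    for hap, freq in em_haplotype_freqs(sample).items():
        print(hap, round(freq, 4))
```

The per-individual weights computed in the E-step are the kind of phase probabilities that a regression framework can use to account for haplotype uncertainty, instead of assigning each individual a single most likely haplotype pair.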

Pipeline specifications

Software tools: PipMaker, Clustal W, SNPassoc, Haploview, haplo.stats
Databases: UCSC Genome Browser
Applications: Phylogenetics, GWAS, Genome data visualization
Organisms: Homo sapiens