Computational protocol: Differential Natural Selection of Human Zinc Transporter Genes between African and Non-African Populations

Protocol publication

[…] To evaluate whether the genetic variance among the four continental regions differs significantly from the genetic variance among populations within each region, AMOVA was carried out using Arlequin. To investigate global differentiation among ZTGs, we calculated the weighted average FST (WA-FST) over multiple sites from the haplotypes of each gene across all 14 populations. Genetic differentiation between populations at each locus was measured with the unbiased estimator of FST of Weir and Hill, implemented in a Python script. We determined empirical cutoffs for the top 1% and 5% of genome-wide signals; loci or genes with an FST value greater than these cutoffs were considered highly differentiated SNPs (selected SNPs) or genes. The resulting top 1% and top 5% cutoffs of genome-wide locus-specific FST were 0.183 and 0.092, respectively, with a genome-wide average of 0.017. [...] Two approaches, the integrated haplotype score (iHS) and the composite likelihood ratio (CLR) test, were used to detect signals of recent positive selection. Because of the hitchhiking effect, positive selection can drive a selected allele to high frequency so rapidly that recombination does not have time to break down the haplotype carrying it, resulting in a long haplotype at high frequency. The iHS test is based on this long-haplotype pattern, a distinctive signature that is not expected under neutral drift, and it has been shown to have enough power to identify recent, incomplete sweeps. Standardized iHS scores were calculated for every SNP with a minor allele frequency >5% using the R package rehh.
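The empirical FST cutoffs described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names `empirical_cutoff` and `flag_outliers`, the rank-based definition of the cutoff, and the toy FST values are all assumptions made for illustration, and the per-locus FST values are taken as already computed.

```python
def empirical_cutoff(values, top_fraction):
    """Return a cutoff such that roughly `top_fraction` of the values
    lie strictly above it (rank-based; an assumed definition)."""
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    # Number of loci that should fall in the top tail (at least one).
    n_top = max(1, round(n * top_fraction))
    if n_top >= n:
        raise ValueError("not enough values for the requested fraction")
    # Largest value that is NOT part of the top tail.
    return ordered[n - n_top - 1]


def flag_outliers(fst_by_locus, cutoff):
    """Return the sorted names of loci whose FST is greater than the cutoff."""
    return sorted(locus for locus, fst in fst_by_locus.items() if fst > cutoff)


if __name__ == "__main__":
    # Toy, made-up genome-wide FST values keyed by hypothetical SNP names.
    genome_wide = {f"snp{i}": i / 1000 for i in range(200)}
    cut_5 = empirical_cutoff(genome_wide.values(), 0.05)
    cut_1 = empirical_cutoff(genome_wide.values(), 0.01)
    print("top 5% cutoff:", cut_5)
    print("top 1% cutoff:", cut_1)
    print("loci above the 1% cutoff:", flag_outliers(genome_wide, cut_1))
```

Tied FST values at the cutoff are excluded by the strict comparison, which matches the "greater than the cutoffs" wording but can leave slightly fewer than the nominal fraction of loci flagged.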
For every gene in each population, we screened the iHS value of each locus and inferred a positive selection signal if 7 or more loci had |iHS| at or above the top 5% genome-wide cutoff within any contiguous 50-SNP bin of the gene region. The CLR test is a model-based statistic that computes the likelihood ratio of a selective sweep by comparing the spatial distribution of allele frequencies in a given window with the frequency spectrum of a null distribution, such as that of all autosomal regions. In this study, the SweepFinder program was used to carry out the calculation. For the CLR test, we calculated the standardized CLR score of each population across all autosomal regions and used the value corresponding to an empirical P-value of 0.05 as the cutoff for detecting a natural selection signal at a given ZTG. [...] The transmembrane helices and topology of ZTGs were predicted using HMMTOP and visualized with TeXtopo. PolyPhen-2 was used to predict how amino acid variants might change the function of the peptides encoded by ZTGs. […]
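The iHS screening rule stated above (7 or more extreme loci within any 50-SNP bin) could be sketched in Python as follows. This is an illustrative reading, not the authors' code: the function name `has_ihs_signal` is hypothetical, the "bin" is assumed to be a sliding window of 50 consecutive SNPs, and a gene region with fewer than 50 SNPs is assumed to be treated as a single window. The genome-wide top 5% |iHS| threshold is passed in as a precomputed number.

```python
def has_ihs_signal(ihs_scores, threshold, window=50, min_hits=7):
    """Return True if any run of `window` consecutive SNPs contains at
    least `min_hits` SNPs with |iHS| >= threshold.

    ihs_scores: standardized iHS values for the SNPs of one gene region
                in one population, in genomic order.
    threshold:  genome-wide top 5% |iHS| cutoff (computed elsewhere).
    """
    # 1 marks an extreme SNP, 0 marks everything else.
    hits = [1 if abs(score) >= threshold else 0 for score in ihs_scores]

    # Assumption: regions shorter than the window are one single bin.
    if len(hits) <= window:
        return sum(hits) >= min_hits

    # Sliding-window count, updated in O(1) per step.
    count = sum(hits[:window])
    if count >= min_hits:
        return True
    for i in range(window, len(hits)):
        count += hits[i] - hits[i - window]
        if count >= min_hits:
            return True
    return False


if __name__ == "__main__":
    # Toy, made-up scores: 7 clustered extreme SNPs in a 100-SNP region.
    scores = [0.1] * 100
    scores[10:17] = [2.5, -2.6, 2.4, 3.0, -2.8, 2.2, 2.9]
    print(has_ihs_signal(scores, threshold=2.0))
```

If the original analysis used fixed, non-overlapping bins instead of a sliding window, the loop would step by `window` rather than by one SNP, and some clusters that straddle a bin boundary would no longer be counted.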

Pipeline specifications

Software tools: Arlequin, rehh, SweepFinder
Application: Population genetic analysis
Chemicals: Zinc