Computational protocol: Genetic polymorphism and evolutionary differentiation of Eastern Chinese Han: a comprehensive and comparative analysis on KIRs

Protocol publication

[…] The observed carrier frequencies (OFs) of the KIR genes were determined from the number of positive typing reactions. Assuming Hardy-Weinberg equilibrium, we calculated the estimated gene frequencies (GFs) using the formula $GF = 1 - (1 - OF)^{1/2}$, where OF is the observed carrier frequency of the KIR gene across all individuals.
The R package “pheatmap” (https://cran.r-project.org/web/packages/pheatmap/index.html), run under R version 3.2.5 (https://www.r-project.org/), was used to draw a heatmap of Eastern Chinese Han and 58 other populations with complete genotyping data for 16 KIR genes (excluding KIR1D): Jilin Han, Shaanxi Han, Shenzhen Han, Sichuan Han, Xinjiang Han, Yunnan Han and the complete list of 52 populations in HGDP-CEPH distributed around the world. The heatmap was constructed with a hierarchical clustering algorithm based on Euclidean distance. The hierarchical clustering model generally refers to the assumption that irreducible correlation functions are described by the hierarchical relations $\xi_n = Q_n \xi_2^{n-1}$, where $\xi_n$ is the nth-order correlation function and $Q_n$ is a constant.
The D statistic from the widely used “genetics” package (https://cran.r-project.org/web/packages/genetics/index.html) was used for linkage disequilibrium (LD) analysis (KIR genes with OF = 100% were excluded). The sign of the coefficient D indicates whether two loci are associated in the same or in opposite directions. For KIR genes specifically, complete positive LD (D = 1) indicates that both loci are present or absent together. Conversely, complete negative LD (D = −1) means that only one of the two loci is present.
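The Hardy-Weinberg estimate and the sign convention for D described above can be illustrated with a short Python sketch. It is not the authors' R workflow. The function names `gene_frequency` and `carrier_ld` are hypothetical, and the LD calculation is a simplified carrier-level version (presence/absence per individual, normalized as D') that is assumed here for illustration only; the “genetics” package works on genotype objects and its estimator may differ. The GF formula follows from treating gene presence as dominant, so the fraction of non-carriers is (1 − GF)² = 1 − OF.

```python
import math


def gene_frequency(of):
    """Estimate gene frequency from carrier frequency: GF = 1 - sqrt(1 - OF)."""
    if not 0.0 <= of <= 1.0:
        raise ValueError("OF must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - of)


def carrier_ld(carriers_a, carriers_b):
    """Return (D, D') for two loci from per-individual presence/absence calls.

    Assumption: each list holds 1/0 (or True/False) per individual, in the
    same individual order. This is a carrier-level simplification.
    """
    n = len(carriers_a)
    if n == 0 or n != len(carriers_b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(carriers_a) / n
    p_b = sum(carriers_b) / n
    p_ab = sum(1 for a, b in zip(carriers_a, carriers_b) if a and b) / n
    d = p_ab - p_a * p_b
    # Normalize by the largest |D| attainable at these marginal frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # A locus carried by everyone (or no one) gives d_max == 0: LD undefined.
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    print(gene_frequency(0.75))
    print(carrier_ld([1, 1, 0, 0], [1, 1, 0, 0]))
    print(carrier_ld([1, 1, 0, 0], [0, 0, 1, 1]))
```

In this sketch, two loci that always occur together give D' = 1 and two mutually exclusive loci give D' = −1. A locus carried by every individual leaves no room for association, which is consistent with the exclusion of KIR genes whose OF is 100%.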
Using the observed carrier frequencies of 13 variable KIR genes (KIR-2DL1, 2DL2, 2DL3, 2DL4, 2DL5, 3DL1, 3DL2, 3DL3, 2DS1, 2DS2, 2DS3, 2DS5 and 3DS1) from the above-mentioned 59 populations and 3 other Han populations (Shanghai Han, Hong Kong Han and Singapore Han), Dispan software was used to generate the DA genetic distances and their significance levels without correction. By its estimation formula, DA directly measures the genetic association between any 2 populations, whereas FST is a relative measure of genetic differentiation given the total genetic variation present in the population. In terms of accuracy, the DA genetic distance is commonly employed in KIR studies because Nei's DA distance has been shown to have the highest probability of recovering the correct branching pattern of a phylogenetic tree.
From the distance matrix, we drew a Neighbor-Joining (N-J) tree and assessed its reliability with the interior branch test in MEGA version 6.0. The Neighbor-Joining algorithm is a simplified version of the Minimum Evolution (ME) method and does not require the assumption of a constant rate of evolution mentioned for the hierarchical clustering algorithm. N-J tree reconstruction starts from a star-like tree with no hierarchical structure, under the initial assumption that there is no clustering of OTUs (operational taxonomic units).
To validate the genetic relationships among the studied populations, we drew a multidimensional scaling (MDS) plot using the “MASS” package (http://www.r-tutor.com/category/r-packages/mass). A chi-square test was conducted in SPSS 22.0 to assess distribution differences between Eastern Chinese Han and Jilin Han (Northeast China), Shaanxi Han, Xinjiang Han (Northwest China), Yunnan Han, Sichuan Han (Southwest China), Shenzhen Han, Hong Kong Han (Southeast China), Singapore Han (overseas Chinese), and Shanghai Han. […]
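As a rough illustration of the DA distance mentioned above, the Python sketch below applies Nei's formula, DA = 1 − (1/r) Σ_j Σ_i sqrt(x_ij · y_ij), where r is the number of loci and x_ij, y_ij are the frequencies of allele i at locus j in the two populations. How the carrier data were encoded for Dispan is not stated in the protocol. Treating each KIR gene as a two-state locus (present or absent) is an assumption made here, and the helper names `presence_absence_freqs` and `nei_da` are hypothetical.

```python
import math


def presence_absence_freqs(gf):
    """Encode one KIR locus as [freq(present), freq(absent)] (assumed encoding)."""
    if not 0.0 <= gf <= 1.0:
        raise ValueError("gene frequency must lie in [0, 1]")
    return [gf, 1.0 - gf]


def nei_da(pop_x, pop_y):
    """Nei's DA distance between two populations.

    pop_x, pop_y: one list of allele frequencies per locus, with loci and
    alleles in the same order in both populations.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must share the same non-empty locus set")
    total = 0.0
    for freqs_x, freqs_y in zip(pop_x, pop_y):
        if len(freqs_x) != len(freqs_y):
            raise ValueError("allele lists must match at every locus")
        total += sum(math.sqrt(x * y) for x, y in zip(freqs_x, freqs_y))
    return 1.0 - total / len(pop_x)


if __name__ == "__main__":
    # Made-up gene frequencies for two loci, for demonstration only.
    pop_a = [presence_absence_freqs(g) for g in (0.25, 0.60)]
    pop_b = [presence_absence_freqs(g) for g in (0.30, 0.55)]
    print(nei_da(pop_a, pop_b))
```

The distance is 0 for populations with identical frequencies at every locus and 1 when they share no alleles at any locus. A pairwise matrix of such values is the kind of input a Neighbor-Joining or MDS step would take.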

Pipeline specifications

Software tools PHeatmaps, MEGA
Databases HGDP-CEPH
Applications Phylogenetics, Transcriptome data visualization
Organisms Homo sapiens
Diseases Immune System Diseases