Computational protocol: Highly conserved extended haplotypes of the major histocompatibility complex and their relationship to multiple sclerosis susceptibility

[…] The phasing of alleles at each of five HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRB1 and HLA-DQB1) was accomplished using a previously published probabilistic phasing algorithm [, ]. Phased SNP haplotypes were constructed using a previously published probabilistic method [, ] at sliding windows of 2 to 15 SNPs throughout the 1 Mb span surrounding the Class II region of the DRB1 gene. The SNP window of the most significant MS-associated SNP haplotype was carried forward as a haplotype locus, that is, a multi-allelic gene to be phased with the five classic HLA genes. As discussed earlier, this haplotype locus consisted of 11 phased SNPs surrounding the HLA-DRB1 gene (). The accuracy of the phasing was confirmed using SHAPEIT2 [–], with better than 99% correspondence between methods. Phasing was accomplished by determining the probability of each possible combination and assigning the phasing to the most likely combination. At times, however, several haplotype pairs were compatible with the observed genotype and had very similar posterior probabilities, so the method could designate one pair even though the alternatives were nearly as likely. Such a situation did occur, but rarely. Thus, for the HLA-A~HLA-C~HLA-B~HLA-DRB1~HLA-DQB1 haplotypes, 98% of the designations had a posterior probability greater than 0.5, 92% had a posterior probability greater than 0.6, and 85% had a posterior probability greater than 0.7. For the Class II haplotypes (HLA-DRB1~HLA-DQB1~SNP), the corresponding percentages were 100%, 99.997%, and 99.95%. [...] We used principal components (PC) analysis excluding MHC SNPs (Eigensoft) to correct for population stratification within the WTCCC cohort []. There was evidence of considerable population structure in the WTCCC data.
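The phasing rule described above (score every haplotype pair compatible with a genotype, then assign the most probable one) can be illustrated with a minimal sketch. This is not the published algorithm cited in the text: it assumes that population haplotype frequencies are already known and that Hardy-Weinberg proportions hold, and the function names, allele labels and frequencies are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele_a, allele_b) tuples, one tuple per locus.
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on the first haplotype.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = tuple(g[c] for g, c in zip(genotype, choice))
        h2 = tuple(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def phase(genotype, hap_freq):
    """Return (best_pair, posterior_of_best, all_posteriors).

    Assumption: hap_freq maps haplotype tuples to known population
    frequencies and pairs occur in Hardy-Weinberg proportions.
    """
    weights = {}
    for h1, h2 in compatible_pairs(genotype):
        f1, f2 = hap_freq.get(h1, 0.0), hap_freq.get(h2, 0.0)
        # A pair of two different haplotypes can arise in two orders.
        weights[(h1, h2)] = f1 * f2 * (1 if h1 == h2 else 2)
    total = sum(weights.values())
    if total == 0:
        return None, 0.0, {}
    post = {pair: w / total for pair, w in weights.items()}
    best = max(post, key=post.get)
    return best, post[best], post


def fraction_above(posteriors, threshold):
    """Share of phasing designations whose posterior exceeds the threshold."""
    return sum(p > threshold for p in posteriors) / len(posteriors)


# Hypothetical two-locus example with made-up frequencies.
example_freq = {
    ("A1", "B1"): 0.4,
    ("A2", "B2"): 0.3,
    ("A1", "B2"): 0.2,
    ("A2", "B1"): 0.1,
}
best, posterior, _ = phase([("A1", "A2"), ("B1", "B2")], example_freq)
```

Applying `fraction_above` to the best-pair posteriors of all individuals at thresholds 0.5, 0.6 and 0.7 would give a summary of the kind reported above. A best-pair posterior close to 0.5 flags the ambiguous case in which two compatible pairs are almost equally likely.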
An analysis of variance test carried out between cases and controls demonstrated a significant difference for most of the first 10 PCs (which accounted for 84% of the population stratification). None of the other PCs differed significantly between cases and controls, nor did PC4 or PC10. The potential impact of this population structure on our findings was assessed by including these 10 PCs in the final regression equation. […]
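The per-component comparison and covariate adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it computes only the one-way ANOVA F statistic for a single PC (a p-value would additionally require the F distribution, for example from a statistics library), it interprets "accounted for" as the share of the summed eigenvalues, and all function names are hypothetical.

```python
from statistics import fmean


def anova_f(cases, controls):
    """One-way ANOVA F statistic for one PC, cases versus controls.

    With two groups the degrees of freedom are (1, n - 2).
    """
    groups = [cases, controls]
    n = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def variance_explained(eigenvalues, k):
    """Fraction of the summed eigenvalues captured by the k largest ones."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def design_row(haplotype_dosage, pcs, k=10):
    """One row of a regression design matrix: intercept, dosage, first k PCs."""
    return [1.0, float(haplotype_dosage)] + [float(v) for v in pcs[:k]]
```

Here `anova_f` would be called once per PC on the case and control scores, and `design_row` shows one way the first 10 PCs might enter the final regression as covariates alongside the haplotype term.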

Applications Population genetic analysis, GWAS
Organisms Homo sapiens
Diseases Multiple Sclerosis