## Similar protocols

## Protocol publication

[…] We tested for signatures of historical positive selection in MHC using the one‐tailed Z‐test and likelihood codon‐based approaches. In the one‐tailed Z‐test, the selection parameter ω quantifies the ratio of nonsynonymous (d
N) to synonymous (d
S) substitutions, where ω = d
N
/d
S >1 indicates the effect of positive selection. We calculated the ratio of d
N and d
S per site, using the modified Nei–Gojobori method with the Jukes–Cantor correction for multiple substitutions in **MEGA** v6.6 (Tamura, Stecher, Peterson, Filipski, & Kumar, ). We calculated each of the above statistics for all codons, peptide‐binding regions (PBR) sites and non‐PBR sites separately, because PBR associated with pathogen binding are expected to be under strong selection. We identified putative PBR sites based on human MHC II molecular structure (Brown et al., ; Stern et al., ). Codon‐based maximum‐likelihood methods of balancing selection were implemented in **Codeml** in PAML (Yang, ). We tested six models allowing for different selection intensities among sites: M0 (one ratio ω), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (nearly neutral with beta distribution approximating ω variation) and M8 (positive selection with beta distribution approximating ω variation). We used likelihood ratio tests (LRT) to compare three nested models: M0 versus M3, M1a versus M2a, M7 versus M8; and determine the best fit to our data for presence of positive selection in models M3, M2a and M8. Positively selected sites were identified by Bayes empirical Bayes procedure (BEB) for models M2a and M8. Additionally, we conducted the fixed effects likelihood (FEL), random effects likelihood (REL) and the mixed effects model of evolution (MEME) tests implemented in the **HyPhy** software (hosted at **Datamonkey**: http://www.datamonkey.org/). We checked for signatures of recombination using the genetic algorithm recombination detection method using the Datamonkey website. [...] Copy number variation of MHC loci within individuals occurs in numerous species (e.g. Sepil, Lachish, Hinks, & Sheldon, ; Zagalska‐Neubauer et al., ), complicating the assignment of amplified alleles to specific loci (Lighten et al., ). It became clear during our analysis that one to four alleles occurred within individual badgers, indicating the presence of at least two MHC DRB exon 2‐like loci. Therefore, we could not assign alleles to specific loci, requiring different measures of MHC diversity. At the population level, we estimated the mean number of pairwise differences (k) for each population in **ARLEQUIN** v3.11 (Excoffier, Laval, & Schneider, ). We also calculated the relative frequency of MHC alleles by counting the number of individuals carrying a particular allele divided by the total number of alleles in each population (Ekblom et al., ). At the individual level, we calculated MHC individual diversity as the number of alleles per individual divided by the maximum number of alleles found within individuals in the total data set (n = 4). We tested for significant differences in MHC individual diversity among populations using a type II ANOVA followed by pairwise comparisons (Tukey's HSD, family‐wise α = .05). We tested the correlation between microsatellite allelic richness and MHC individual diversity using Pearson product moment correlations in R (R Core Team, ). [...] Significant departures from Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) for each sampling location were examined using probability test in **GENEPOP** v.4.2 (Rousset, ). For each location, we estimated observed (H
o) and expected heterozygosity (H
e) using **GENALEX** v.6.5 (Peakall & Smouse, ), inbreeding coefficient (F
IS) in FSTAT v.2.9.3.2 (Goudet, ) and allelic (A
r) and private allelic richness (P
ar) adjusted for unequal sample sizes by rarefaction in HP‐RARE v.1.0 (Kalinowski, ). We tested for significant differences in neutral genetic diversity (A
r
, H
o) between small, peripheral populations (TO and ON) and nonperipheral populations using 1,000 permutations in FSTAT v.2.9.3.2 (Goudet, ). We estimated N
e using the software LDNe (Do et al., ) by selecting the linkage disequilibrium and molecular co‐ancestry methods. Confidence intervals were estimated by jackknife resampling with 1,000 iterations. [...] Pairwise MHC F
ST distances between all sampling regions pairs were calculated using the Jukes–Cantor distance model in ARLEQUIN by entering the nucleotide MHC sequences and the number of individuals per sequence and population as haplotype data. For microsatellites, F
ST was calculated using the number of different alleles. Statistical significance (p < .05) of F
ST values was estimated by 1,000 randomizations. We used partial Mantel correlations to test the effect of isolation by geographical distance (IBD) on MHC genetic distances, while controlling for the genetic differentiation at neutral loci. Significance of Mantel correlation coefficients was tested by permuting observations 1,000 times using the R library vegan (Oksanen et al., ).To facilitate comparison between the two types of markers, we treated MHC and microsatellite data as dominant markers with alleles coded in a binary form (presence 1/absent 0) (Herdegen et al., ; Kyle et al., ; Nadachowska‐Brzyska, Zieliński, Radwan, & Babik, ). Population genetic structure based on the binary‐encoded data was analysed with Bayesian clustering analyses in STRUCTURE v2.3 (Pritchard, Stephens, & Donnelly, 2000) allowing for genetic admixture and correlated allele frequencies for 200,000 burn‐in steps followed by 400,000 postburn MCMC iterations. We tested scenarios ranging from 1 to 8 clusters (k), with 10 iterations at each value of k. We compared models with and without the LOCPRIOR function, which includes the sampling regions of origin in the analysis. The most likely number of k‐clusters was chosen by compiling runs using **STRUCTURE** HARVESTER v.0692 (Earl & vonHoldt, ), and assessing the increase in P
r (X|K) (Pritchard et al., ) and using the ad hoc ∆K method (Evanno, Regnaut, & Goudet, ). Individual membership probabilities of the inferred k‐clusters from 10 independent replicates were averaged using **CLUMPP** v.1.1.2 (Jakobsson & Rosenberg, ), and clusters were visualized using **DISTRUCT** v.1.1 (Rosenberg, ). We performed analyses of molecular variance (AMOVA), by partitioning the genetic variance among the eight sampling regions and the three subspecies groups. Significance of AMOVA components was tested with 1,000 permutations in GENALEX (Peakall & Smouse, ). To evaluate the influence of neutral processes on population genetic differentiation at MHC, we performed a co‐inertia analysis (CoA) to assess the joint structure of MHC and microsatellite loci. CoA is a multivariate method that assesses the covariance structure between data sets having the same observations (Dray & Dufour, ). This multivariate method is not limited to population‐pair comparisons such as F
ST and does not rely on mutational, HWE and LD equilibrium assumptions (Jombart, ). For each binary‐matrix, we performed a factorial PCA using sampling regions as predefined groups, and the two most important PCA components of each marker were input for CoA using the ade4 R package (Dray & Dufour, ). We tested the significance of the co‐relationship between matrices by comparing the CoA estimated from the empirical data set with the CoA distribution estimated after 1,000 bootstraps. […]

## Pipeline specifications

Software tools | MEGA-V, PAML, HyPhy, Datamonkey, Arlequin, Genepop, GenAlEx, Structure Harvester, CLUMPP, DISTRUCT |
---|---|

Applications | Population genetic analysis, GWAS |