## Protocol publication

[…] Deviations from Hardy–Weinberg equilibrium (HWE) within subpopulations and genotypic linkage disequilibrium (LD) between pairs of loci were tested using the Markov chain method implemented in **Genepop** v.4. We corrected for multiple testing with the false discovery rate (FDR) approach implemented in the Qvalue package of R.

Some loci displayed significant heterozygote deficiencies in several subpopulations (see ). In these loci, some null genotypes were found. We used Micro-checker v.2.2.3 to test whether heterozygote deficiencies could be accounted for by the existence of null alleles. We then used Freena (available from www.montpellier.inra.fr/URLB) to assess the need to correct for null alleles.

The genetic diversity of each subpopulation was assessed by calculating Nei's unbiased genetic diversity (HS), allelic richness (r, which evaluates the number of alleles independently of sample size, calculated for a minimum of 7 individuals by the rarefaction procedure implemented in Fstat v.2.9.3.2), and FIS (tested for significance with Fstat). Effective population size (NE) was estimated with Ldne. The method used is based on linkage disequilibrium and assumes random associations of alleles at different loci. Only alleles with a frequency ≥0.02 were used, to minimize possible bias. We investigated whether plague epizootics had left detectable traces on genetic diversity and population size by assessing the relationship between r, HS, FIS or NE and plague seroprevalence in each subpopulation within areas, using nonparametric Spearman's rank correlation analyses in SAS v. 9.3. We compared r, HS, and FIS between areas with Fstat (10,000 permutations), and NE using nonparametric Kruskal-Wallis tests in SAS.

Genetic structure was examined in several ways within the four study areas. Bayesian clustering analyses were first performed with Structure v.2.3.3. This approach is based on an explicit evolutionary model for genetic variation and makes statistical inferences on the basis of individual data, to estimate the number of genetic clusters in each area and assign individuals to the various clusters. We used the ΔK method to infer the number of genetic groups per area. All analyses were performed with an admixture model and correlated allele frequencies. We performed 10 independent runs for each K value (from 1 to n+1, n being the number of villages sampled in each area). Each run included 50,000 burn-in iterations followed by 500,000 iterations. We also checked that a single mode was obtained in the results of the 10 Structure runs for each K value, by using the Greedy algorithm implemented in **Clumpp** v.1.1.2.

We also estimated FST values for each pair of subpopulations within each area, using Fstat. We generated 95% confidence intervals (CIs) for the mean FST per area by bootstrap resampling across loci, and we then used Fstat (10,000 repetitions) to compare mean FST.
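
The ΔK criterion mentioned above for choosing the number of clusters can be illustrated with a short sketch. It assumes the commonly used form of the statistic, in which the absolute second-order difference of the mean ln P(D) across replicate runs is divided by the standard deviation of ln P(D) at that K. The function name `delta_k`, the input layout, and the toy values are hypothetical and are not taken from the protocol or from Structure itself.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute a Delta-K value for each interior K.

    lnp_by_k: dict mapping K (int) to a list of ln P(D) values,
              one per replicate run (hypothetical input layout).
    Returns a dict mapping K to Delta-K. The smallest and largest K
    are skipped because the second-order difference needs K-1 and K+1.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    sd_l = {k: stdev(lnp_by_k[k]) for k in ks}  # needs >= 2 runs per K

    result = {}
    for k in ks:
        if (k - 1) not in mean_l or (k + 1) not in mean_l:
            continue  # no neighbours on both sides
        if sd_l[k] == 0:
            continue  # Delta-K is undefined when the runs are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_diff / sd_l[k]
    return result


if __name__ == "__main__":
    # Toy ln P(D) values for illustration only (not real results).
    toy = {
        1: [-1000.0, -1002.0],
        2: [-900.0, -902.0],
        3: [-890.0, -892.0],
        4: [-888.0, -890.0],
    }
    scores = delta_k(toy)
    best_k = max(scores, key=scores.get)
    print(scores, "best K:", best_k)
```

Because the statistic relies on both neighbours, it cannot be evaluated at K = 1 or at the largest K tested, which is one reason to run K up to n+1 rather than stopping at n.
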
We assessed whether villages explained population genetic structure within a given area by performing an AMOVA (analysis of molecular variance) with **Arlequin** v.2.000, using the locus-by-locus option. The variance components were tested by randomization (1,000 permutations). At the finest spatial scale, we used G-based (log-likelihood ratio) randomization tests to evaluate the effect of habitat on genetic structure, with Fstat. These analyses were carried out with pairs of subpopulations corresponding to different habitats within villages (Betafo: 4 pairs; Mandoto: 4 pairs; Ambositra: 3 pairs; Moramanga: 4 pairs). Independent tests of pairwise genetic differentiation were combined by the generalized binomial procedure implemented in Multitest v.1-2.

We analyzed isolation by distance (IBD) by regressing pairwise estimates of FST/(1 - FST) against the logarithm of the Euclidean geographic distances between trap sites. Under a model of isolation by distance, genetic distance between subpopulations is expected to increase with geographic distance. Mantel tests were performed to test the correlation between matrices of genetic differentiation and geographic distance in Genepop (10,000 permutations), excluding intra-village comparisons. The spatial pattern of genetic variation was also investigated by spatial autocorrelation analyses of mean genetic relatedness between pairs of individuals. These analyses complemented standard tests of isolation by distance, as spatial autocorrelation can occur at very fine scales, below the level of the area. Moreover, genetic relatedness provides a more contemporary picture of population genetic structure than the integrative FST. Spatial autocorrelation analyses were performed for each area with **Spagedi** v.1.2, and the relatedness coefficient rxy was calculated for each distance class. Genotypic data for more than 398 pairs were included in each distance class. The null hypothesis of random genetic structure was rejected if the correlation coefficient exceeded the limits of the 95% confidence interval, as determined from 10,000 permutations.

Finally, we investigated the relationship between plague seroprevalence distribution and genetic structure in rat subpopulations while statistically controlling for the effect of the Euclidean geographic distance. This was done with partial Mantel tests performed in Fstat (10,000 permutations), using pairwise absolute differences between seroprevalence levels and pairwise estimates of FST between subpopulations. A positive correlation would suggest a strong influence of rat dispersal on plague distribution. […]
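
The isolation-by-distance test described above can be illustrated with a minimal sketch of a simple Mantel permutation test. It transforms pairwise FST to FST/(1 - FST), takes the natural logarithm of geographic distance, and permutes subpopulation labels to obtain a one-sided p-value. Several points are assumptions rather than the protocol's method: the function names are hypothetical, Pearson's correlation is used as the test statistic (Genepop may permute a different statistic), intra-village comparisons are not excluded, and the partial Mantel test that controls for distance is not implemented here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order=None):
    """Flatten the strict upper triangle, optionally after relabelling
    rows and columns with the same permutation `order`."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_ibd(fst, dist, permutations=10000, seed=0):
    """One-sided Mantel test of FST/(1 - FST) against log distance.

    fst, dist: symmetric square matrices (lists of lists) with zero
               diagonals; off-diagonal distances must be > 0.
    Returns (observed correlation, permutation p-value).
    """
    n = len(fst)
    # Linearised genetic distance; the diagonal is never read.
    gen = [[0.0 if i == j else fst[i][j] / (1.0 - fst[i][j])
            for j in range(n)] for i in range(n)]
    geo = [[0.0 if i == j else math.log(dist[i][j])
            for j in range(n)] for i in range(n)]

    x = upper_triangle(geo)
    r_obs = pearson(x, upper_triangle(gen))

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)  # relabel subpopulations in one matrix only
        if pearson(x, upper_triangle(gen, perm)) >= r_obs:
            hits += 1
    # The +1 terms count the observed arrangement as one permutation.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value
```

Permuting rows and columns together, rather than shuffling individual matrix cells, preserves the dependence among pairwise values that share a subpopulation, which is the reason a Mantel test is used here instead of an ordinary correlation test.
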

## Pipeline specifications

| Software tools | Genepop, CLUMPP, Arlequin, SPAGeDi |
|---|---|
| Applications | Phylogenetics, Population genetic analysis |
| Organisms | Rattus norvegicus, Yersinia pestis, Rattus rattus, Homo sapiens |