## Protocol publication

[…] 15 polymorphic microsatellite loci previously described in the literature – were evaluated for their use in population genetics analyses of G. pallidipes using multiplex PCR as described in . Loci were combined into multiplex reactions with the help of Multiplex Manager v1.2, in an analysis of 2 million iterations with a primer complementarity threshold of 7 and a minimum distance of 26 bp between loci of the same dye colour. The multiplex reactions were then fine-tuned by hand. After a validation step (fully described in ), we ended up using 9 microsatellite loci in two multiplex PCRs. Multiplex reaction α contained loci GmmK06, GmmC17, GpC10b, GpC101, GpB115 and GpCAG133. Multiplex reaction β contained loci GmmA06, GpA19a and GpC26. This resulted in a primer complementarity threshold of 6 within multiplex reactions and a minimum distance of 58 bp between loci of the same dye colour.

Multiplex PCRs were carried out in a total volume of 10 µl containing 2 µl of template DNA solution, 1X Qiagen Multiplex PCR mix and 0.2 µM of each primer, except for locus GpC10b (0.3 µM of each primer). Forward PCR primers 5′-labelled with a fluorescent dye were used so that the PCR products could be detected on an automated DNA sequencer. The cycling conditions for both multiplex PCRs were 95°C for 15 min; 25 cycles of 94°C for 30 s, 55°C for 90 s and 72°C for 60 s; and 60°C for 30 min.

For each reaction, 1 µl of a 1/20 or 1/30 dilution of the multiplex PCR products was analysed by electrophoresis in combination with the GeneScan-500 LIZ size standard (Applied Biosystems) by DNA Sequencing & Services (MRCPPU, College of Life Sciences, University of Dundee, Scotland, www.dnaseq.co.uk) using Applied Biosystems Big-Dye Ver 3.1 chemistry on an Applied Biosystems model 3730 automated capillary DNA sequencer. The size of the amplified microsatellites was estimated using GeneMarker v2.2.0 (SoftGenetics).
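
The reaction composition given above (10 µl total, 2 µl template, 1X master mix, 0.2 µM or 0.3 µM of each primer) can be turned into per-reaction pipetting volumes with simple C1·V1 = C2·V2 arithmetic. The sketch below is only an illustration: the 2X master-mix stock and the 10 µM primer working stocks are assumptions that are not stated in the protocol, and `reaction_volumes` is a hypothetical helper name.

```python
# Sketch: per-reaction volumes for the two multiplex PCRs.
# ASSUMPTIONS (not in the protocol): the master mix is a 2X stock and every
# primer is kept as a 10 µM working stock. Adjust both to your own reagents.

TOTAL_UL = 10.0          # total reaction volume (from the protocol)
TEMPLATE_UL = 2.0        # template DNA solution (from the protocol)
MIX_FINAL_X = 1.0        # final master-mix concentration (from the protocol)
MIX_STOCK_X = 2.0        # ASSUMED master-mix stock concentration
PRIMER_STOCK_UM = 10.0   # ASSUMED primer working-stock concentration (µM)

MULTIPLEXES = {
    "alpha": ["GmmK06", "GmmC17", "GpC10b", "GpC101", "GpB115", "GpCAG133"],
    "beta": ["GmmA06", "GpA19a", "GpC26"],
}
DEFAULT_PRIMER_UM = 0.2                    # final concentration of each primer
PRIMER_UM_OVERRIDES = {"GpC10b": 0.3}      # locus with a higher concentration


def reaction_volumes(loci):
    """Return the volumes (µl) needed for one reaction with the given loci."""
    mix_ul = TOTAL_UL * MIX_FINAL_X / MIX_STOCK_X
    each_primer_ul = {}
    for locus in loci:
        final_um = PRIMER_UM_OVERRIDES.get(locus, DEFAULT_PRIMER_UM)
        # C1*V1 = C2*V2, applied to the forward and reverse primer alike.
        each_primer_ul[locus] = TOTAL_UL * final_um / PRIMER_STOCK_UM
    primers_total_ul = 2 * sum(each_primer_ul.values())  # forward + reverse
    water_ul = TOTAL_UL - mix_ul - TEMPLATE_UL - primers_total_ul
    if water_ul < -1e-9:
        raise ValueError("components exceed the total reaction volume")
    return {
        "master_mix": mix_ul,
        "template": TEMPLATE_UL,
        "each_primer": each_primer_ul,
        "water": max(water_ul, 0.0),
    }


for name, loci in MULTIPLEXES.items():
    vols = reaction_volumes(loci)
    print(f"{name}: mix {vols['master_mix']:.1f} µl, "
          f"template {vols['template']:.1f} µl, water {vols['water']:.1f} µl")
```

Under these assumed stock concentrations, reaction α leaves 0.4 µl for water and reaction β leaves 1.8 µl; different stocks would change those figures.
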
The Excel macro Autobin v0.9 was then used on the raw data set of amplified microsatellite sizes to automatically detect relevant gaps in size and help delimit allele “bins”. The allele “bins” defined using Autobin were then used within GeneMarker to bin the alleles automatically. Each peak was then checked manually. Microsatellite data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.bt612

[...] Genetic variation within samples was assessed using the mean number of alleles per locus (Na) and the mean expected heterozygosity (H), computed with Geneclass 2 ver. 2.0.h. The coefficient of inbreeding FIS was estimated with Genepop on the Web. For comparisons of Na values between samples, allelic richness (AR) was estimated on the basis of minimum sample size with Fstat 2.9.3.2. The significance of differences in AR and H between samples was assessed with the nonparametric Friedman and Wilcoxon signed-rank tests (with the locus as a repetition unit). Deviation from Hardy–Weinberg equilibrium (HWE) was assessed with the probability test approach, using Genepop on the Web.

[...] Two approaches were used. In the first, we calculated the mean multilocus individual assignment likelihood of each IAEA sample i to each sample of possible source populations s (Li→s) with Geneclass 2 ver. 2.0.g. For each IAEA sample, the most probable source population was then identified as that with both the highest Li→s value and the lowest FST value with the source population considered.

The second method allowed the concomitant assignment of individuals and inference of potential admixture. This clustering approach, implemented in Structure 2.3.4, was used to evaluate the contribution of the Rukomeshi and Busia populations to the current IAEA colony. Individual multilocus genotypes were used to infer clusters of individuals within which deviations from HWE and linkage disequilibria are minimized. The microsatellite data were converted from Genepop to Structure format using the software Create v.1.37. Ten replicate runs were performed for each prior value of the number of clusters (K), set between 1 and 5, with a burn-in of 2×10⁵ iterations followed by 10⁶ iterations. The admixture model of ancestry and the correlated allele frequencies model were used, and no a priori account was taken of the origin (Busia, Rukomeshi or IAEA) of each individual, i.e. individuals were clustered only on the basis of their multilocus genotypes. Default values were maintained for all other parameters. K was estimated both as the value leading to the highest likelihood of the data, P(X|K), and with the ΔK statistic of Evanno et al., computed with Structure Harvester Web v0.6.93.

[...] We applied an Approximate Bayesian Computation (ABC) approach to infer the demographic history of the G. pallidipes IAEA colony and field populations under study. Microsatellite data were combined with prior information on the history and demography of those populations. Analyses were performed with Diyabc v 1.0.4.46. Briefly, in an ABC analysis, summary statistics of each simulated dataset are recorded, together with the label of the scenario used for the simulation. Euclidean distances between each simulated dataset and the observed dataset are computed.
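
As a rough illustration of the distance step just described, the sketch below computes a Euclidean distance between each simulated dataset and the observed one, keeps the closest fraction of simulations, and reports the share of each scenario among them. It is not Diyabc's implementation: the scaling of each statistic by its standard deviation, the function name `abc_rejection` and the toy data are all assumptions made for the example, and the proportion-based estimate is a cruder stand-in for the regression-based estimates used in the analyses.

```python
import numpy as np


def abc_rejection(sim_stats, sim_scenarios, obs_stats, keep_fraction=0.01):
    """Keep the simulations closest to the observed summary statistics.

    sim_stats     : (n_sim, n_stats) simulated summary statistics
    sim_scenarios : (n_sim,) label of the scenario used for each simulation
    obs_stats     : (n_stats,) observed summary statistics
    """
    sim_stats = np.asarray(sim_stats, dtype=float)
    sim_scenarios = np.asarray(sim_scenarios)
    obs_stats = np.asarray(obs_stats, dtype=float)

    # ASSUMED normalisation: scale each statistic by its standard deviation
    # across simulations so that no single statistic dominates the distance.
    scale = sim_stats.std(axis=0)
    scale[scale == 0] = 1.0
    diffs = (sim_stats - obs_stats) / scale
    distances = np.sqrt((diffs ** 2).sum(axis=1))

    n_keep = max(1, int(round(keep_fraction * len(distances))))
    closest = np.argsort(distances)[:n_keep]

    # Crude estimate: share of each scenario among the retained simulations.
    labels, counts = np.unique(sim_scenarios[closest], return_counts=True)
    posterior = dict(zip(labels.tolist(), (counts / n_keep).tolist()))
    return distances, closest, posterior


# Toy data (hypothetical): two scenarios, three summary statistics each.
rng = np.random.default_rng(0)
sim_stats = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 3)),   # simulations under scenario 1
    rng.normal(5.0, 1.0, size=(500, 3)),   # simulations under scenario 2
])
sim_scenarios = np.array([1] * 500 + [2] * 500)
_, _, posterior = abc_rejection(sim_stats, sim_scenarios, obs_stats=[0.0, 0.0, 0.0])
print(posterior)
```
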
These distances are then used to estimate the posterior probabilities of the scenarios and the posterior probability distributions of the parameters. In each of the three analyses described below and in , 10⁶ datasets were simulated for each competing scenario using parameter values drawn from prior distributions and assuming equiprobability of each scenario a priori. The simulated datasets had the same characteristics (number of samples, individuals and loci, and characteristics of the microsatellite loci) as the observed dataset.

Genetic variation was summarised, for each population and each population pair, using a set of summary statistics traditionally used in ABC: mean number of alleles, mean gene diversity, mean allele size variance and mean M index across loci, pairwise FST, mean individual assignment log-likelihoods of individuals from population i assigned to population j (Li→j) and the maximum likelihood estimates of admixture proportions. In analyses 1 and 2, four summary statistics were used, while there were 54 in analysis 3.

In analysis 1, we focused on the Busia population in order to model its demographic history correctly when analysing the history of the IAEA colony. This is important because the Busia population may have experienced a genetic bottleneck, due either to tsetse control or to the destruction of tsetse habitat associated with the increase of the human population, between the foundation of the IAEA colony (1975) and the sampling of the Busia population. If such a bottleneck occurred, it must be taken into account when making inferences about the demographic history of the IAEA population.

In analysis 2, we focused on the demography of the Rukomeshi population between the establishment of the IAEA colony and the sampling of the Rukomeshi flies in 2006. Unlike for the Busia population, there is no record of any tsetse control programme in the Rukomeshi area between 1975 and 2006. However, a field trial of a tsetse control technique was carried out in Rukomeshi in 1991 and could have temporarily decreased the size of the G. pallidipes population.

The IAEA colony demography and origin were examined in analysis 3, taking into account the scenarios selected in analyses 1 and 2. The IAEA colony was considered to originate from a single source, Busia or Rukomeshi, or from an admixture between both. Each of those three scenarios was considered with or without the possibility of a bottleneck associated with the laboratory establishment of the IAEA colony, giving a total of 6 competing scenarios.
The analyses were performed using parameter values drawn from the prior distributions described in .

For all the ABC analyses performed, posterior probabilities of the competing scenarios were estimated by polychotomous logistic regression on the 1% of simulated datasets closest to the observed dataset. The selected scenario was the one with the highest posterior probability and a 95% confidence interval not overlapping that of the second highest probability. The posterior distributions of the demographic parameters were estimated under the selected scenario using a local linear regression on the 1% of simulated datasets with the smallest Euclidean distances to the observed dataset. The median of a posterior distribution was taken as the point estimate of a parameter.

ABC analyses were performed on simulated pseudo-observed datasets (PODs) to evaluate the ability of our ABC analysis 3 to select the true scenario. For each of the 6 scenarios of ABC analysis 3, 100 PODs were simulated using parameter values drawn from probability distributions identical to the prior distributions. Each POD had the same characteristics (number of samples, individuals and loci) as the observed dataset. For scenario selection, the procedures described above (summary statistics, Euclidean distances, posterior probability estimation) were applied to each POD. Because the scenario used to generate each POD is known, applying ABC analysis 3 to the PODs allows the type I and type II errors of these analyses to be estimated. Type I error corresponds to the proportion of PODs for which a scenario is excluded by the ABC analysis although it is actually the true scenario (the one used to generate the PODs). Type II error corresponds to the proportion of PODs for which a scenario is selected although it is not the true one.
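
The two error definitions above can be expressed as a short tally over POD results. The sketch below assumes one reading of the denominators: type I error is computed over the PODs simulated under a scenario, and type II error over the PODs simulated under all other scenarios. The function name `error_rates` and the use of `None` for a POD where no scenario was selected are choices made for this example only.

```python
def error_rates(true_scenarios, selected_scenarios):
    """Type I and type II error per scenario, estimated from POD results.

    true_scenarios     : scenario used to simulate each POD
    selected_scenarios : scenario selected by the ABC analysis for each POD
                         (None when no scenario was selected)
    """
    pairs = list(zip(true_scenarios, selected_scenarios))
    rates = {}
    for s in sorted(set(true_scenarios)):
        from_s = [sel for true, sel in pairs if true == s]
        from_others = [sel for true, sel in pairs if true != s]
        # Type I: s generated the POD but was not selected.
        type_1 = sum(sel != s for sel in from_s) / len(from_s)
        # Type II: s was selected although another scenario generated the POD.
        type_2 = (sum(sel == s for sel in from_others) / len(from_others)
                  if from_others else float("nan"))
        rates[s] = {"type_I": type_1, "type_II": type_2}
    return rates


# Hypothetical results for 8 PODs simulated under two scenarios.
true = [1, 1, 1, 1, 2, 2, 2, 2]
selected = [1, 1, 1, 2, 2, 2, 1, None]
print(error_rates(true, selected))
```
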
Low type II error indicates that the results are reliable even when the type I error is large.

Using the “model checking” option in Diyabc, we evaluated the ability of the selected scenario and of the posterior distributions of its parameters to generate simulated data similar to the observed dataset. The procedure was carried out by simulating 10⁴ PODs using the scenario selected in ABC analysis 3 and parameter values drawn from the posterior distributions of the parameters. The distributions of summary statistics for those 10⁴ PODs were then compared with the observed summary statistics. To reduce the bias introduced by using the same set of summary statistics for the ABC analysis and the model checking, we added the following summary statistics to the 54 used previously: the shared allele distance and the (δμ)² distance. In this way, 66 summary statistics were used in the “model checking”. The combination of the selected scenario and its parameter posterior distributions would be considered inadequate if many observed summary statistics fell outside the distribution of the corresponding summary statistics from the 10⁴ PODs. […]
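
One simple way to picture the comparison made during model checking is to ask, for each summary statistic, where the observed value falls within the values simulated from the posterior. The sketch below does this with a tail proportion; the 5% two-sided cut-off, the function name `model_check` and the toy numbers are assumptions for illustration and do not reproduce Diyabc's own output.

```python
import numpy as np


def model_check(sim_stats, obs_stats, tail=0.05):
    """Locate each observed statistic within its simulated distribution.

    sim_stats : (n_pods, n_stats) statistics of PODs simulated from the posterior
    obs_stats : (n_stats,) observed summary statistics
    tail      : ASSUMED two-sided cut-off used to flag outlying statistics
    """
    sim_stats = np.asarray(sim_stats, dtype=float)
    obs_stats = np.asarray(obs_stats, dtype=float)
    # Proportion of simulated values lying below the observed value.
    prop_below = (sim_stats < obs_stats).mean(axis=0)
    # Flag statistics whose observed value sits in either extreme tail.
    outlying = (prop_below < tail / 2) | (prop_below > 1 - tail / 2)
    return prop_below, outlying


# Toy data (hypothetical): 100 PODs, two summary statistics.
sim = np.column_stack([np.arange(100), np.arange(100)])
prop_below, outlying = model_check(sim, obs_stats=[50, 1000])
print(prop_below, outlying)
```

A scenario would look doubtful under this sketch when many statistics are flagged, which mirrors the criterion stated above.
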

## Pipeline specifications

| Software tools | GeneMarker, GeneClass, Genepop, Structure Harvester, DIYABC |
|---|---|
| Application | Population genetic analysis |
| Organisms | Drosophila melanogaster, Homo sapiens |
| Diseases | Infertility, Trypanosomiasis |