Computational protocol: Combining phylogenetic and demographic inferences to assess the origin of the genetic diversity in an isolated wolf population

Protocol publication

[…] We used Polymerase Chain Reaction (PCR) to amplify a fragment of the left peripheral and central domains of the mtDNA control region (CR) with primers WDLOOPL and H519 [], and fragments of the mtDNA ATP6, COIII and ND4 genes with the primer pairs described in [] (For8049-Rev8501, For8255-Rev8891, For10104-Rev10647 and For11093-Rev11741). All amplifications were performed in 10 μL reactions containing 20–40 ng/μL DNA, 1X PCR buffer with 2.5 mM Mg²⁺, 0.3 μM of primer mix (forward and reverse) and 0.25 units of Taq polymerase (5 PRIME Inc., Gaithersburg, USA). The thermal profile consisted of an initial denaturation at 94°C for 2 minutes, followed by 45 cycles of denaturation at 94°C for 15 seconds, annealing at 55°C for 15 seconds and extension at 72°C for 30 seconds, with a final extension at 72°C for 10 minutes. Amplicons were purified with ExoSAP-IT (Affymetrix, Inc., Cleveland, Ohio, USA) and sequenced in both directions on an ABI 3130XL automated DNA sequencer (Applied Biosystems). Sequences were visually corrected in SeqScape 2.5 and aligned in Geneious 7.1 (Biomatters Ltd., Auckland, New Zealand), which was also used to resolve alignment ambiguities, mainly caused by indels in the mtDNA CR. The four mtDNA regions were concatenated into a multi-fragment alignment of 2164 bp. Identical haplotypes were collapsed in DnaSP 5.10.01 [], taking the indels into account; DnaSP was also used to estimate haplotype (H) and nucleotide (π) diversity [] for each of the four regions and for the concatenated sequences. [...] In addition to the 210 new mtDNA sequences (), we downloaded from GenBank the homologous sequences of 18 extant wolves, four ancient canids and 322 dogs (). We aligned these sequences and constructed Neighbor-Joining (NJ) [], Maximum Likelihood (ML, heuristic search) [] and Bayesian (BT) [] phylogenetic trees, rooted with a coyote (Canis latrans) sequence as outgroup (GenBank accession number DQ480509).
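The haplotype collapsing and the two diversity indices mentioned above can be sketched with their standard estimators: H = n/(n − 1) · (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences, and π, the mean number of pairwise differences per site. The Python sketch below is a minimal illustration, not DnaSP's implementation. The function names and the missing-data symbols are hypothetical, and the gap handling for π (complete deletion of any site with a gap or ambiguity in any sequence) is an assumption. DnaSP offers its own gap options, so its values may differ. Gaps are kept as a character state when collapsing haplotypes, so sequences that differ only by an indel stay distinct.

```python
from collections import Counter
from itertools import combinations

GAP_OR_AMBIGUOUS = set("-N?")  # assumed missing-data symbols (hypothetical choice)


def collapse_haplotypes(seqs):
    """Map each distinct aligned sequence to its count; gaps are kept as a state,
    so sequences that differ only by an indel remain separate haplotypes."""
    return Counter(seqs)


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity H = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    freqs = [count / n for count in collapse_haplotypes(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi), using only sites with no
    gap or ambiguity in any sequence (complete deletion, an assumption)."""
    usable = [i for i in range(len(seqs[0]))
              if not any(s[i] in GAP_OR_AMBIGUOUS for s in seqs)]
    pairs = list(combinations(seqs, 2))
    differences = sum(a[i] != b[i] for a, b in pairs for i in usable)
    return differences / (len(pairs) * len(usable))


if __name__ == "__main__":
    # Toy alignment; with real data each entry would be one individual's
    # concatenated fragments (CR + ATP6 + COIII + ND4), already aligned.
    aln = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(len(collapse_haplotypes(aln)), haplotype_diversity(aln), nucleotide_diversity(aln))
```

Running the same functions on each region separately and on the concatenated alignment mirrors the per-region and combined estimates described in the text.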
NJ and ML phylogenetic analyses were performed in PAUP* 4.0 beta []; the BT was computed in MrBayes 3.2 []. For the NJ and ML analyses, we selected the best-fit evolutionary model with the modeltest option [] and the Akaike Information Criterion []. Internode support was obtained from 1000 bootstrap replicates [] for the NJ trees and from 100 bootstrap replicates for the ML trees, using the faststep search in PAUP*. The best-fit evolutionary model for the Bayesian analyses was identified with PartitionFinder []. MrBayes 3.2 was run for 2 × 10⁶ generations, sampling every 100 generations, with one cold and three heated MCMC chains (temperature = 0.45); the first 10% of the trees were discarded as burn-in []. We used Tracer 1.6 (http://tree.bio.ed.ac.uk/software/tracer) to check the convergence of the parameters in the Bayesian analyses. The proportion of invariable sites (I), the gamma (γ) distribution parameters and the transition/transversion (Ti/Tv) ratios for the NJ, ML and BT analyses were estimated in Geneious. [...] We used BEAST 2.4.2 [] to estimate the time to the most recent common ancestor (TMRCA) of the main mtDNA clades identified in the phylogenetic trees of the concatenated sequences, including all wolf and dog haplotypes. We applied a strict molecular clock with the coalescent extended Bayesian skyline model [], using the ages of the four ancient canid sequences as priors (; ). Model parameters and trees were sampled every 10 000 iterations over a total of 100 000 000 iterations in two independent MCMC chains, and the first 10% of the iterations were discarded as burn-in. We used Tracer to check for MCMC convergence. Once the two independent runs had converged on the posterior distributions and reached stationarity, we combined the sampled trees into a single tree file with LogCombiner 2.4.2 (burn-in = 10%).
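The MCMC settings above translate into concrete numbers. MrBayes heats chains incrementally by default, giving chain i (i = 0 for the cold chain) an inverse temperature βᵢ = 1/(1 + temp · i). The number of retained samples follows from the run length, the sampling interval and the burn-in fraction. The Python sketch below computes both. The function names are hypothetical. The sketch also ignores the extra sample that MCMC programs typically log at the starting state, so file line counts may differ by one per run.

```python
def incremental_heats(nchains, temp):
    """Inverse temperatures beta_i = 1 / (1 + temp * i); chain 0 is the cold chain."""
    return [1.0 / (1.0 + temp * i) for i in range(nchains)]


def retained_samples(n_iter, sample_every, burnin_frac, n_runs=1):
    """Samples kept after discarding burn-in, pooled over independent runs.

    Ignores the sample logged at the starting state (iteration 0).
    """
    per_run = n_iter // sample_every
    burnin = round(per_run * burnin_frac)
    return (per_run - burnin) * n_runs


if __name__ == "__main__":
    # MrBayes: one cold + three heated chains, temp = 0.45
    print([round(b, 3) for b in incremental_heats(4, 0.45)])  # [1.0, 0.69, 0.526, 0.426]
    # MrBayes: 2 x 10^6 generations, sampled every 100, 10% burn-in (per run)
    print(retained_samples(2_000_000, 100, 0.10))  # 18000
    # BEAST: 10^8 iterations, sampled every 10 000, 10% burn-in, two chains pooled
    print(retained_samples(100_000_000, 10_000, 0.10, n_runs=2))  # 18000
```

With a temperature of 0.45, the hottest of the four chains samples a posterior flattened to roughly the 0.43 power. This shows how strongly the heated chains are encouraged to move between peaks.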
With TreeAnnotator 2.4.2, we summarized the information from the sample of trees into a single final tree, which was visualized in FigTree 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/). LogCombiner and TreeAnnotator are part of the BEAST package. [...] We used the microsatellite data to run Approximate Bayesian Computation (ABC) simulations [] in DIYABC 2.1.0 [] to model plausible demographic scenarios and to estimate divergence times (in generations) among the wolf populations sampled from European countries, corresponding to the clusters identified by Structure. We selected three wolf population samples for the full ABC simulation scenarios: WIT (pop1), WIBP (pop2) and WDIN (pop3), excluding any sample with possible traces of dog admixture (). Samples from WBALK, WCARP and WBALT were not used in the simulations because these populations are still connected with one another and with unsampled wolf populations in eastern Europe [,]. According to Pilot et al. [] and Fan et al. [], the southern European populations diverged at nearly the same time, and their effective sizes have steadily decreased over the last tens of thousands of years. We therefore tested four demographic scenarios (), assuming that the three populations split either simultaneously (scenarios 1 and 2) or sequentially (scenarios 3 and 4), and that they either passed through a bottleneck (scenarios 2 and 4) or did not (scenarios 1 and 3). We ran 6 × 10⁶ simulations for each scenario, using uniform prior distributions for the effective population size and time parameters and the default mutation settings. We selected the following summary statistics, computed over all microsatellite loci: a) one-sample statistics: mean number of alleles, mean genetic diversity and mean size variance; b) two-sample statistics: mean number of alleles, mean genetic diversity, FST and shared allele distance (). Scenarios were compared by estimating their posterior probabilities with the logistic regression method in DIYABC, using 1% of the simulated datasets.
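The one-sample summary statistics listed above can be illustrated for a single population sample. The Python sketch below assumes each locus is given as a flat list of allele sizes, with two entries per diploid individual, in repeat units. For each locus it computes the number of alleles, the unbiased gene diversity n/(n − 1) · (1 − Σ pᵢ²) and the variance of allele size, then averages them over loci. This is not DIYABC's code. The input layout and function names are hypothetical, and the use of the sample variance (n − 1 denominator) is an assumption. The two-sample statistics (FST, shared allele distance) are not covered.

```python
from collections import Counter
from statistics import mean, variance


def locus_summary(alleles):
    """Number of alleles, unbiased gene diversity and allele-size variance at one locus.

    `alleles` lists every allele size observed in the sample at this locus
    (two entries per diploid individual); sizes are assumed to be in repeat units.
    """
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    gene_diversity = n / (n - 1) * (1 - sum(p * p for p in freqs))
    return len(freqs), gene_diversity, variance(alleles)  # sample variance (assumption)


def one_sample_summaries(sample):
    """Average per-locus statistics over loci; `sample` maps locus name -> allele sizes."""
    per_locus = [locus_summary(alleles) for alleles in sample.values()]
    return {
        "mean_n_alleles": mean(s[0] for s in per_locus),
        "mean_gene_diversity": mean(s[1] for s in per_locus),
        "mean_size_variance": mean(s[2] for s in per_locus),
    }


if __name__ == "__main__":
    # Hypothetical two-locus sample of two diploid individuals
    toy = {"locus_A": [10, 11, 11, 12], "locus_B": [20, 20, 20, 20]}
    print(one_sample_summaries(toy))
```

In an ABC run, the same statistics are computed for the observed data and for every simulated dataset. This is what makes the two comparable in the rejection and regression steps.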
For the best models, the posterior distributions of the parameters were estimated with a logit-transformed linear regression on the 1% of simulated datasets closest to the observed data. Confidence in the scenario choice was evaluated by comparing the observed and simulated summary statistics. Finally, the goodness-of-fit of the posterior parameters of the best-performing scenarios was tested with the model checking option using default settings, and significance was assessed after Bonferroni correction for multiple testing []. […]
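The regression step described above can be illustrated with the standard local linear regression adjustment used in ABC. The Python/NumPy sketch below handles a single parameter with a uniform prior on [lo, hi]. It standardizes the summary statistics, keeps the 1% of simulations closest to the observed statistics and weights them with an Epanechnikov kernel. It then regresses the logit-transformed parameter on the statistics and shifts each accepted value to the observed statistics before back-transforming. This is a sketch under those assumptions, not DIYABC's implementation: its distance scaling, kernel and multi-parameter handling may differ. The function name and the toy simulator in the usage lines are hypothetical.

```python
import numpy as np


def abc_logit_regression(theta, stats, obs, lo, hi, tol=0.01):
    """Rejection ABC followed by a logit-scale local linear regression adjustment.

    theta: (n,) parameter values drawn from a Uniform(lo, hi) prior.
    stats: (n, k) simulated summary statistics; obs: (k,) observed statistics.
    Returns the adjusted posterior sample and its Epanechnikov weights.
    """
    # Standardize each statistic so all contribute comparably to the distance
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.sqrt((((stats - obs) / scale) ** 2).sum(axis=1))
    # Keep the `tol` fraction of simulations closest to the observed data
    n_keep = max(int(round(tol * len(theta))), 2)
    idx = np.argsort(dist)[:n_keep]
    w = 1.0 - (dist[idx] / dist[idx].max()) ** 2  # Epanechnikov kernel weights
    # Logit-transform the bounded parameter onto the real line
    p = np.clip((theta[idx] - lo) / (hi - lo), 1e-9, 1 - 1e-9)
    y = np.log(p / (1 - p))
    # Weighted least squares of the transformed parameter on (stats - obs)
    centered = stats[idx] - obs
    X = np.column_stack([np.ones(n_keep), centered])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    # Regression adjustment: project each accepted value onto the observed statistics
    y_adj = y - centered @ beta[1:]
    # Back-transform to the original parameter scale
    return lo + (hi - lo) / (1 + np.exp(-y_adj)), w


if __name__ == "__main__":
    # Toy model (hypothetical): one statistic = parameter + Gaussian noise
    rng = np.random.default_rng(1)
    theta = rng.uniform(0, 10, 100_000)
    stats = (theta + rng.normal(0, 0.5, theta.size))[:, None]
    post, w = abc_logit_regression(theta, stats, np.array([4.0]), 0.0, 10.0)
    print(post.size, np.average(post, weights=w))
```

The logit transform keeps every adjusted value inside the prior bounds. An untransformed linear adjustment can push values outside the bounds when the accepted simulations sit near a prior edge.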
