Computational protocol: Intervarietal and intravarietal genetic structure in Douglas-fir: nuclear SSRs bring novel insights into past population demographic processes, phylogeography, and intervarietal hybridization

Similar protocols

Protocol publication

[…] Standard descriptive genetic diversity statistics was estimated at both the locus and the population levels. Microsatellite loci of both varieties were characterized for the number of alleles (Na), mean number of alleles (A), effective number of alleles (Ne), observed heterozygosity (HO), expected heterozygosity (HE), inbreeding coefficient (FIS), and allelic richness (AS). They were corrected for sample size by rarefraction, with the software FSTAT v. (Goudet , ) and the GenAlEx v. 6.5 software by Peakall and Smouse (, ). At the population level, the descriptive statistics included calculations of allelic richness (AS) using the approach of Szpiech et al. (), and estimations of expected heterozygosity (HE), observed heterozygosity (HO), and inbreeding coefficient (FIS) using the GenAlEx v. 6.5 software (Peakall and Smouse , ). Finally, the Hardy–Weinberg (HW) exact tests were carried out for both levels (population and locus) to estimate the heterozygote deficiency with the software GENEPOP v. 4.1.4, using the default values (Raymond and Rousset ; Rousset ). [...] The genetic structure was analyzed at two levels, at the intervarietal level, and at the intravarietal level. For the former, analyses of genetic structure were run on the entire dataset including both varieties, whereas for the latter all analyses were run on data subsets representing populations either of the coastal variety or the Rocky Mountain variety. The populations of the whole dataset were divided into two variety subsets (K = 2) using Bayesian admixture analysis in STRUCTURE v.2.3.4. (Pritchard et al. ; Falush et al. , ). Twenty replicates of a run with a burn-in period of 50,000 and Markov chain Monte Carlo (MCMC) iterations of 100,000 were carried out based on the admixture model with correlated allele frequencies (Falush et al. , ). These selected adjustments for the burn-in time and MCMC iterations resulted from tests optimizing the trade-off between precision and simulation time. STRUCTURE HARVESTER (Earl and von Holdt ) was used to generate an input file for the software CLUMPP v.1.1.2 (Jakobsson and Rosenberg ), which was used to estimate the mean variety-membership coefficient (Qvar) of each individual and population. An individual/population was declared as coastal with Qvar (membership coefficient) > 0.9, Rocky Mountain with Qvar < 0.1 and intervarietal admixed with 0.9 > Qvar > 0.1.The genetic population structure of both varieties was inferred using two different clustering approaches (Bayesian analysis of STRUCTURE and principal coordinate analysis [PCoa]). The Bayesian STRUCTURE analysis was run on both variety subsets separately using identical settings as described before. In addition, each subset was cleaned from possible intervarietal-admixed individuals (0.9 > Qvar > 0.1) as well as from individuals of the opposite variety when present. Each K was run 20 times. As ΔK (the real number of clusters) detects the uppermost level of genetic structure where several hierarchical levels exist (Evanno et al. ), it was rational to carry out a hierarchical STRUCTURE analysis within both variety subsets as these represent populations originating from different refugia and might also represent populations mixed between two refugia in the zone of contact. Thus, the size of each variety subsets was reduced step by step according to the significance of the results as inferred with hierarchical analyses of the ΔK method (Evanno et al. ). Means of estimated intravarietal membership coefficient (Qintra) for each individual and population of the 20 runs per K were computed with the software CLUMPP v.1.1.2 using the Greedy option and random input orders (Jakobsson and Rosenberg ). The means were then plotted using DISTRUCT v.1.1 (Rosenberg ). The program STRUCTURE HARVESTER (Earl and von Holdt ) was used to estimate ΔK within the analyzed dataset. Subsequently, each population was assigned to the cluster for which its inferred individual and population ancestry (Qintra) was the highest. This assignment was also checked visually on DISTRUCT plots in order to evaluate the estimated Qintra value of populations and their composition. Although running STRUCTURE on a population level underestimates the actual number of clusters within the dataset, we preferred this strict and controllable procedure over reshuffling the individuals among clusters, which might overestimate the number of existing clusters and cause assignment problems especially for populations resulting from gene flow between refugia. If Qintra for one or more populations within the studied dataset was 0.2 < Qintra < 0.8, the procedure was additionally rerun in STRUCTURE under the locprior using the ΔK value recommended by Evanno (Evanno et al. ). The remaining settings were identical to those described previously. The use of the locprior option is of significance when detecting genetic structures at lower levels of divergence, as expected for the coastal variety (Li and Adams ; Krutovsky et al. ) with no false positives if there is no structure present (Hubisz et al. ). We did not attempt to force individual populations to belong to a particular cluster as we expected that some of them might be of admixed nature because of refugia-mixed and/or variety-mixed origin. Therefore, whenever the locprior option led to identical results (in the assignment of populations to a cluster and/or their admixed nature) to those without this option, the assignment of populations was accepted. Populations with 0.2 < Qintra < 0.8 were assigned to reflect cluster-mixed populations.Subsequently, pairwise Jost’s D values among all estimated STRUCTURE clusters were calculated and tested for significance using permutations in GeneAlEx 6.5 (Peakall and Smouse , ). Jost’s D (Jost ) is more precise in quantifying differentiation when applying highly polymorphic markers such as SSRs (Meirmans and Hedrick ).As a second approach, population clustering and the relations between populations were calculated with a pairwise codominant genotypic distance followed by a covariance standardized PCoa using GeneAlEx 6.5 (Peakall and Smouse , ). This clustering was performed for the whole dataset as well as for the two variety subsets separately using both cleaned and not cleaned datasets from intervarietal-admixed individuals and from individuals of the opposite variety. The analyses with and without the cleaned data allow estimating the influence of intervarietal-admixed individuals on the clustering. In addition to this, we tested the clustering of populations represented by intervarietal-admixed individuals only. To do so, we run PCoa with the cleaned data of the entire whole set supplemented by “new” populations. These “new” populations were built using the intervarietal-admixed individuals (with 0.4 < Qvar < 0.6). In more detail, intervarietal-admixed individuals of R18, R21, and R39 were used to assemble such new populations a18, a21, and a39. [...] Before we estimated the demographic history of genetic clusters revealed by preceding PCoa and STRUCTURE analyses, the divergence time of both coastal variety and Rocky Mountain variety was estimated. We did it by applying Approximate Bayesian Computation (ABC) analysis in DIYABC, version 2.0 (Cornuet et al. ) to all nuSSR loci. The intervarietal divergence was estimated using populations of the coastal (R01-R11, R16, R30, ) and Rocky Mountain (R22-R26, ) variety. These populations overlap with areas where Miocene and Pleistocene fossils of Douglas-fir and a refugium for each variety characterized by cpDNA were detected (Gugger et al. ). Thereafter, in order to clarify possible refugial origin of the discovered genetic clusters, different putative historical scenarios were tested and dated following the standard protocol of DIYABC. We were aware of the fact that a distinct genetic cluster does not immediately represent a refugial population. The analyzed populations of each cluster are shown in . In all runs, 1,000,000 datasets were simulated for each scenario. A total of 1% of the simulated datasets most similar to the observed data were used to estimate the relative posterior probability (with 95% confidence intervals [CIs]) of each scenario via logistic regression and posterior parameter distributions according to the most likely scenario. Because DIYABC version 2.0 selects the optimized rate from a given mutation range, we assigned mutation rates of nuSSRs to vary between 10−4 and 10−2 and thus to correspond to a general mutation rate for nuSSRs (Miah et al. ). All intervarietal-admixed individuals were omitted from these analyses. Following parameters were estimated: divergence time (t), effective population size of each population (N1, N2), and effective population size of ancestor (Na). As t was expressed in the number of generations, we used an approximate average generation time of 100 years to convert it into years, following Gugger et al. (, ). [...] In order to assess whether intervarietal F1 genotypes or backcrosses exist within analyzed populations of the hybrid zone in BC (Canada), all intervarietal-admixed individuals with 0.2 < Qvar < 0.8 as estimated in STRUCTURE analysis (K = 2) from studied populations in BC have been selected. Subsequently, values of the intervarietal heterozygosity (IH) and maximum likelihood hybrid index (HI) were estimated following the approach as described by Lexer et al. (). In more detail, we tested whether genetically admixed intervarietal individuals, previously identified by the STRUCTURE analysis, were more compatible with the simulated F1 individuals (maximum IH and intermediate HI), or represent rather recombinant hybrids and introgressants of subsequent generations (reduced IH and HI values moving toward values of the coastal or Rocky Mountain variety). In the analysis, genotypes of 500 F1 individuals were simulated with the HYBRIDLAB software v.1.1 (Nielsen et al. ) using 32 randomly selected individuals from a coastal variety subset with Qvar > 0.99, and 32 individuals of the Rocky Mountain variety subset with Qvar < 0.01. Then, calculations of IH and HI, including 95% CIs, were carried out and plotted for simulated and observed intervarietal-admixed individuals and parentals with R package Introgress (Gompert and Buerkle ) following Gompert and Buerkle (). […]

Pipeline specifications