Computational protocol: Post-glacial phylogeography and evolution of a wide-ranging highly-exploited keystone forest tree, eastern white pine (Pinus strobus) in North America: single refugium, multiple routes

Protocol publication

[…] Genetic diversity parameters of individual populations for the nuclear and chloroplast microsatellites were determined using GenAlEx 6. Number of alleles per locus (AN), number of private alleles (AP), and observed and expected heterozygosity (HO and HE) were calculated for the nuclear markers. Number of alleles per locus (AN), Shannon’s Information Index (I), haplotype diversity (H), and unbiased haplotype diversity (uH) were calculated for the chloroplast markers. We also estimated the effective number of alleles (AE) per locus, rarefaction-based allelic richness (AR), and inbreeding index (FIS) for the nuclear microsatellites using FSTAT v2.9.3.2. Departures from Hardy-Weinberg equilibrium were examined. We tested for non-random association of alleles at different nuclear loci using linkage disequilibrium tests in FSTAT v2.9.3.2 and Arlequin v3.5.1.2. We calculated the correlation between latitude and the genetic diversity indices of populations to test for a decline in genetic diversity from south to north, an expected signature of founder effects along the recolonization route(s). [...]

Inter-population genetic differentiation for nuclear microsatellites was determined from F-statistics using GenAlEx 6 and from AMOVA using Arlequin v3.5. GST and RST/NST among populations were calculated from the chloroplast markers using 1000 permutations in PermutCpSSR v2.0.

The population genetic structure resulting from natural barriers and human activities was examined using two Bayesian model-based clustering approaches. First, STRUCTURE was used to examine the range-wide population structure, based on the 12 nuclear microsatellites, under the assumption that sample locality has no significant role in population structure. STRUCTURE works by grouping individuals into K clusters such that conformity to Hardy-Weinberg equilibrium is maximized within clusters.
By varying K across several runs and inspecting the resulting probabilities for each value, one can infer the number of groups that best captures the variation present in the data. We performed multiple runs of STRUCTURE to test K values ranging from 1 to 33, over 50 replications, using the admixture model and correlated allele frequencies options, with a burn-in length of 10^5 and 10^5 MCMC replications for each run. To facilitate the selection of the best K value, we used STRUCTURE HARVESTER, an online application that uses the Evanno et al. technique for assessing and visualizing likelihood values across multiple values of K and detecting the number of genetic groups that best fits the data.

Due to the large variation in geographical distances among the locations of the sampled populations, we sought to disentangle any artifactual population structure signal caused by populations in close proximity. We did this by performing a second Bayesian population structure analysis using BAPS v5.3. Unlike STRUCTURE, BAPS provides the user with an option to integrate spatial coordinates into the prior assumptions. We also employed BAPS to examine the population structure as defined by the chloroplast markers, using the haplotype data. Both the nuclear and chloroplast datasets were analyzed for a maximum of 33 spatial cluster groups with the population mixture option.

Regions where abrupt genetic differentiation exists over relatively small geographic distances can indicate boundaries of population groups and genetic barriers in a species range, perhaps where distinct phylogeographic lineages meet. We used Barrier v2.2 to identify genetic barriers and boundaries of population groups for the nuclear and chloroplast microsatellite data, using both the multi-locus pairwise FST matrix and the individual-locus pairwise FST matrices to determine the number of loci that support any inferred barriers. [...]
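
The ΔK criterion used above to choose K from replicate STRUCTURE runs can be illustrated with a minimal Python sketch. It assumes the variant of the Evanno et al. statistic that takes the absolute second-order difference of the mean Ln P(D) across replicates and divides it by the sample standard deviation of Ln P(D) at that K. The function name `evanno_delta_k`, the dictionary layout, and the toy likelihood values are assumptions made for illustration; they are not output from this study or the STRUCTURE HARVESTER code.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} from {K: [Ln P(D) of each replicate run]}.

    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    It is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # no neighbour on one side, so no second difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # identical replicates: the ratio is undefined
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd
    return delta

# Toy Ln P(D) values for three replicate runs at K = 1..4 (made up).
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -893.0, -887.0],
    4: [-885.0, -889.0, -881.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

In this toy input the likelihood rises sharply from K = 1 to K = 2 and then levels off, so ΔK peaks at K = 2. Because ΔK cannot be evaluated at the lowest K tested, it cannot by itself support K = 1.
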
Although we observed varying levels of population differentiation and a significant magnitude of population structure among eastern white pine populations, signals of past phylogeographic patterns were present in nearly all analyses (e.g. regional clustering in low K-value STRUCTURE runs, Barrier analysis). To disentangle these patterns from the present population structure, we examined the geographic distribution patterns of chloroplast haplotypes and tested various phylogeographic hypotheses using Approximate Bayesian Computation (ABC) analysis.

We first examined the composition and geographic distribution of chloroplast haplotypes to infer genetic lineages and post-glacial northward migration of eastern white pine. The geographic distribution of the haplotype data was visualized using PhyloGeoViz. The distribution of these haplotypes across the species’ range was combined with previous information on fossil pollen occurrence to formulate possible recolonization scenarios, including possible routes and divergence times.

We used DIYABC v2.0.3 to test competing hypothetical scenarios regarding phylogeography and population divergence in eastern white pine on a range-wide scale. The hypotheses were constructed primarily to test the order (from south to north) and time of divergence of the population groups, as well as the possibility of population admixture after divergence. For the ABC simulations, we analyzed the nuclear and chloroplast marker data separately. We hypothesized four groups of populations (lineages) based on the signals from the STRUCTURE, BAPS and Barrier analyses and the geographical distribution of chloroplast haplotypes (see Results). These groups were as follows: Western, Central, Eastern and Southern (Additional file : Table S1). First, we compared the competing scenarios of population divergence without admixture and then with population admixture (Additional file : Figure S1).
Information on the parameters and their prior distributions used in the analysis is provided in Additional file : Table S1. We then compared the best scenarios from the without-admixture and with-admixture analyses. We simulated one million data sets for each of the scenarios, and four million data sets for the comparison between the two best scenarios (approximately two million each). The population divergence scenarios differed in the order of population divergence and in the number and time of demographic expansion events. The population admixture scenarios were developed based on both the chloroplast haplotype distribution and the best scenario from the with- and without-admixture comparison.

We performed a logistic regression to estimate the posterior probability of each scenario, using between 0.1 % and 1 % of the simulated data sets closest to our real data set. The 95 % credibility intervals for the posterior probabilities were computed through the limiting distribution of the maximum likelihood estimators. Once the most likely scenario was identified, we used a linear regression analysis to estimate the posterior distributions of parameters under this scenario. We chose the 1 % of the simulated data sets closest to our real data for this regression, after applying a logit transformation to the parameter values. To evaluate the goodness-of-fit of the estimation procedure, we performed a model checking computation by generating 10,000 pseudo-observed data sets with parameter values drawn from the posterior distribution given the most likely scenario. […]
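
The idea behind the scenario comparison above can be sketched in Python. The sketch simulates data under each scenario, keeps the small fraction of simulations closest to the observed data, and reads scenario support from that retained set. It uses the simpler "direct" estimate (the share of each scenario among the retained simulations) rather than the logistic regression described in the protocol. The one-statistic toy model, the scenario labels "A" and "B", the function names, and all numeric values are hypothetical and have nothing to do with the DIYABC coalescent simulator or the pine data.

```python
import random

def simulate_summary(scenario, rng):
    """Toy simulator: draw a parameter from the scenario's prior,
    then return one noisy summary statistic (hypothetical model)."""
    if scenario == "A":
        theta = rng.uniform(0.0, 5.0)
    else:
        theta = rng.uniform(5.0, 10.0)
    return rng.gauss(theta, 1.0)

def direct_posterior(observed, n_sims=20000, keep_fraction=0.01, seed=1):
    """Estimate scenario posterior probabilities by ABC rejection."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        scenario = rng.choice(["A", "B"])  # equal prior on scenarios
        distance = abs(simulate_summary(scenario, rng) - observed)
        sims.append((distance, scenario))
    sims.sort(key=lambda pair: pair[0])
    n_keep = max(1, round(n_sims * keep_fraction))
    kept = sims[:n_keep]  # the simulations closest to the observation
    counts = {"A": 0, "B": 0}
    for _, scenario in kept:
        counts[scenario] += 1
    return {s: c / len(kept) for s, c in counts.items()}

posterior = direct_posterior(observed=2.0)
print(posterior)
```

With an observed statistic of 2.0, nearly all retained simulations come from scenario "A", whose prior covers that value. The logistic regression step in DIYABC refines this kind of estimate by modelling how scenario membership varies with distance to the observed summary statistics.
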

Pipeline specifications

Software tools: GenAlEx, Arlequin, Structure Harvester, BAPS, PhyloGeoViz, DIYABC
Applications: Phylogenetics, Population genetic analysis
Organisms: Pinus strobus