## Similar protocols

## Protocol publication

[…] Since forest tree species usually have extensive geographic ranges, exhibit extensive gene flow and have low levels of population stratification [], we investigated whether the genetic variability due to non-random mating in our population was caused solely by isolation-by-distance (IBD), reflecting the large geographical distribution of our sample (cf. []), or also by natural barriers causing local genetic clusters. We performed spatial principal component analysis (sPCA) by using the “spca” function implemented in the R package “**adegenet**” [] which is a spatially explicit multivariate analysis accounting for spatial autocorrelation processes and patterns of genetic variation. A K-nearest neighbours method with K = 10 was used as connection network. Positional information for each genotype were transformed into Universal Transverse Mercator (UTM) coordinates using “convUL” in the R package “PBSmapping” []. Due to the occurrence of multiple genotypes with identical geographical coordinates (i.e. trees collected at the same latitude/longitude), we randomly selected a single genotype representing a geographical region (out of the total 140 locations). Eigenvalues for principal components from sPCA provided a cumulative picture about contributing factors, including the genetic variance and the spatial autocorrelation (through Moran’s I, see below). Large positive eigenvalues reflect the importance of the proportion of the genetic variance along with a strong positive autocorrelation in the global pattern (i.e. IBD), while large negative eigenvalues indicate the importance of the proportion of the genetic variance along with negative autocorrelation indicating the existence of discrete local genetic clusters.We used the "global.test" and "local.test" functions in the "adegenet" package to infer the statistical significance of each type of genetic structure. These functions are based on a spectral decomposition of the connection matrix into Moran's eigenvector map and test for association of those eigenvectors from Moran's eigenvector map with Moran's I []. To investigate gene dispersal, we employed a Moran I test for spatial autocorrelation ([,]). Moran’s I coefficients were investigated in 200 km spatial lags and the analysis was performed using “moran.test” in the “**spdep**” R package []. Moran’s I coefficients were estimated as follows:
I=n∑i=1n∑j=1nwij*∑i=1n∑j=1nwij(xi−x¯)(xj−x¯)∑i=1n(xi−x¯)2(1)
where n is the number of populations (i.e. unique geographical locations), w
ij is weight set at 0 or 1 depending on whether populations are considered neighbours in each 200 km lag test, x
i is the allele frequency in the ith population, and x- is the allele frequency across all populations. [...] We tested phenotypic characteristics in P. trichocarpa for their adaptive potential (). For Q
ST−F
ST comparisons, Q
ST values among the identified climate-related population groups were first estimated for each trait following [] and [], respectively.The narrow-sense Q
ST was estimated by computing the variance components using the ‘animal model approach’ [] following:
y=Xβ+Zp+Za+e(2)
where β is a vector of fixed effects (intercept), p and a are vectors of random climate cluster and individual tree additive genetic effects, X and Z are incidence matrices assigning fixed and random effects to measurements in vector y, the cluster effects are following p~N(0,σp2) where σp2 is the cluster variance, individual tree additive effects are following a~N(0,σa2G) where σa2 is the additive genetic variance and G is the realized relationship matrix [], using 29,354 SNPs estimated in R package “**synbreed**” [] as follows:
G=ZZ′2∑p(1−p)(3)
where Z is M-P, with M the marker matrix with genotypes recoded into 0, 1 and 2 for the reference homozygote allele, the heterozygote and the alternative homozygote allele, respectively, and with P the vector of doubled allele frequency; e is the vector of random residual effects following e~N(0,σe2I) where σe2 is the residual variance and I is the identity matrix. The narrow sense Q
ST was estimated as follows:
QST=σ^p2(σ^p2+2σ^a2)(4)
where σ^p2 and σ^a2 are the estimates of cluster and additive genetic variance representing among- and within-group trait variances attributable to additive effects.The measurements of all ecology and disease traits using clonal ramets (i.e. replication) enable estimating broad-sense Q
ST directly without the use of any relationship matrix, while narrow-sense Q
ST estimation was based on variance components estimated in the mixed linear model considering the realized relationship matrix [] as in . The model is identical to where the variance components for broad-sense Q
ST were estimated in the model considering a as the vector of clonal genotypic values following a~N(0,σa2I) where σa2 is the total genetic variance (including both additive and non-additive component) and e as the vector of ramet within clone effects following e~N(0,σe2I). Then, the computed Q
ST values for each trait were compared to the average population differentiation estimate (F
ST) strictly based on neutral markers (see below) allowing inferences about trait evolution based on selection or genetic drift (neutral trait), [].
Narrow-sense heritability (h
2) was based on variance components estimated in the mixed model as follows:
y=Xβ+Za+e(5)
where β is the vector of fixed effects (intercept and cluster) and a is the random vector of additive genetic effects following the description of . The narrow-sense heritability was estimated as follows:
h^2=σ^a2σ^a2+σ^e2(6)
where σ^a2 and σ^e2 are estimates of additive genetic and residual variance, respectively. The phenotypic Q
ST (i.e. P
ST) ([,]) was estimated as follows:
PST=σ^p2(σ^p2+2h^2σ^e2)(7)
where σ^p2 and σ^e2 are estimates of cluster and residual variance representing among- and within-population variances, respectively, and h^2 is the heritability estimated according to []. The variance components were estimated in ASReml software [] using the mixed linear model following:
y=Xβ+Zp+e(8)
where β is the vector of fixed effects (intercept) and p is the vector of random cluster effects, the effect of individuals within cluster is found within the error variance. [...] To identify SNPs putatively under selection and also associated with adaptive traits ([,,]), we performed: 1) F
ST outlier analysis (using Fdist2) employing the same climate clusters as for Q
ST analysis, 2) unsupervised spatial ancestral analysis (SPA), and 3) SPA with climate as a covariate. Additionally, we compared our results with F
ST outlier analysis (using Fdist2 and **BayeScan**) that were reported in [] using 25 topographic units separated by watershed barriers within the geographic area from Central Oregon, USA (44.3°N) to northern BC, Canada (59.6°N)).
F
ST values for SNPs were calculated among the four climate clusters (for definition and calculation, see above). We implemented the Fdist2 program within the LOSITAN project [] for SNP F
ST outlier detection. Fdist2 compares the distribution of F
ST values of sampled loci to the modeled neutral expectation of F
ST distribution using coalescent simulations []. We employed the infinite alleles mutation model (as we investigated SNPs), a subsample size of 50, and ran 200k simulations. F
ST values conditioned on heterozygosity and outside the 99% confidence interval were considered candidate outliers.Since P. trichocarpa populations have known structure related to IBD ([] and this study), we applied spatial ancestral analysis (SPA), a logistic regression-based approach [], to detect SNPs with sharp allelic frequency changes across geographical space (implying candidates under selection). The unsupervised learning approach (using only genomic data) was employed to obtain SPA statistics. In addition, we tested SPA including the first two principal components (PCs) based on climate variables (explaining 91% of the variance) as covariates to determine individuals’ location based on allele frequencies related to MAT, NFFD, and MAP climate components.We investigated correlations between the outlier SNPs (based on climate clusters) and the environmental variables that defined the established climatic clusters (). Subpopulation averages for MAT, NFFD, and MAP were tested for correlations with SNP allele frequencies employing multiple univariate logistic regression models with the spatial analysis method (SAM; []). The significance of correlations was assessed using three independent statistical tests (likelihood ratio and two Wald tests) implemented in SAM and applying an initial 95% confidence interval for the statistical tests. We used the Bonferroni correction method (α = 0.05) for multiple testing resulting in p<6.887052*10−5 for 726 tested models (242 alleles, three variables). Only those correlations that remained significant after Bonferroni correction for each of the three test statistics (i.e. the likelihood ratio and the two Wald tests) were retained.Finally, we compared observed Q
ST values with the simulated distribution of Q
ST-F
ST values for a neutral trait using previously provided R scripts []. In brief, a range of possible demographic scenarios was tested simulating the distribution of Q
ST values based on mean F
ST for neutral markers and mean Q
ST for neutral traits ([,]). For a neutral trait, the expected Q
ST was estimated based on σ^p2 (i.e., measured within-population variance; see above) and σ^a2 (i.e., expected between-population variance) given in . The distribution of σp2 values was based on σa2 and the observed F
ST values of 29,233 SNPs present (total number reduced by removing outliers) within the simulated neutral envelope of F
ST values (F
ST outlier analysis) with Q
ST replaced by the F
ST in . P-values were obtained by testing whether the null hypothesis that the estimated narrow-sense Q
ST for each tested trait is statistically equal to the expected Q
ST for a neutral trait []. […]

## Pipeline specifications

Software tools | adegenet, Spdep, synbreed, BayeScan |
---|---|

Applications | Miscellaneous, Population genetic analysis |

Organisms | Populus trichocarpa |