Computational protocol: Genetic differentiation and inferred dynamics of a hybrid zone between Northern Spotted Owls (Strix occidentalis caurina) and California Spotted Owls (S. o. occidentalis) in northern California

Similar protocols

Protocol publication

[…] We used Arlequin version 3.5 (Excoffier & Lischer, ) to quantify genetic diversity among sampling locations for the purposes of comparing results for the new sample sets (NSO contact zone and CSO) to results obtained from prior analyses (locations 1–18 in Haig, Mullins, & Forsman, ; locations A through O of Funk et al., ). For the mtDNA data, we calculated haplotype diversity (H), nucleotide diversity (π), and number of unique haplotypes detected (A) for each location. For the microsatellite data, we calculated expected heterozygosity (H e) and the average number of alleles per locus (A).We used the AMOVA procedure (Excoffier, Smouse, & Quattro, ) to quantify genetic differentiation patterns among all pairwise combinations of sampling locations for each data set. For the microsatellite data, all sampling locations illustrated in Figure a were used, treating the sets of CSO samples and NSO contact zone samples as sampling locations in addition to locations A through O from Funk et al. (). For the mtDNA data, analyses were performed using the number of nucleotide differences between haplotypes as the distance measure and included all sampling locations illustrated in Figure b, again treating the CSO and NSO contact zone samples as separate sampling locations. In all analyses, the significance of pairwise ΦST (mtDNA) or FST (microsatellites) values were obtained using 5,000 randomization replicates. Interpretation of the pairwise ΦST and F ST matrices was difficult due to the large numbers of pairwise comparisons involved in each data set. We therefore used MEGA version 7.020 (Kumar, Stecher, & Tamura, ) to generate neighbor‐joining (NJ) trees (Saitou & Nei, ) and visualize the pairwise differentiation matrices derived for each data set. In NJ analyses, negative values of ΦST or FST between locations were treated as zeros.Genetic structure patterns from the microsatellite data were also identified using STRUCTURE version 2.3.4 (Pritchard, Stephens, & Donnelly, ). Analyses were performed using all genotyped NSO and CSO individuals and were based on ten replicates each of assumed values of K (number of genetic clusters) ranging from 1 to 5. Analysis parameters included use of the admixture and correlated allele frequency models, as recommended by Falush, Stephens, and Pritchard (), along with 2 × 106 burnin steps and 107 Markov Chain Monte Carlo steps. We used the ΔK method of Evanno, Regnaut, and Goudet () to infer the most likely number of genetic clusters associated with the data and likewise used the program CLUMPP () to combine results across replicates for the inferred K value.Phylogenetic structure among unique mtDNA sequence haplotypes was investigated using two approaches. Bayesian phylogenetic analyses (Yang & Rannala, ) were based on the program MRBAYES version 3.2.6 (Ronquist et al., ). Analyses were performed using the HKY+I nucleotide substitution model as identified using jModelTest2 (Darriba, Taboada, Doallo, & Posada, ). MRBAYES analyses were implemented using four runs with the following parameters: 5,000 sampled trees recorded every 2,000 generations with the initial 1,000 trees discarded as burnin. The standard deviation of splits frequencies was used to determine whether convergence among runs was occurring. Maximum‐likelihood analyses were implemented using RAxML version 8.2.4 (Stamatakis, ) assuming an invariant sites model, with node support for the best tree assessed using 1,000 bootstrapping replicates. Resulting trees were visualized using MEGA 7.020 (Kumar et al., ). [...] Patterns of hybridization and introgression were investigated for the microsatellite data using the Bayesian approach of Anderson and Thompson () as implemented in the program NEWHYBRIDS (https://github.com/eriqande/newhybrids; accessed 21 December 2016). We used this analysis to obtain the posterior probability that each individual fell into one of six different categories: pure CSO, pure NSO, F1 hybrid, F2 hybrid, CSO backcross, and NSO backcross. Five replicate analyses were performed with unique random number seeds, with each replicate implemented using 5 × 105 Markov chain Monte Carlo steps recorded after an initial 5 × 104 burnin steps and Jeffrey's priors on π and θ. After analyses were completed, we identified the category with the highest average posterior probability for each individual and considered an individual's hybrid status as “unknown” if the highest posterior probability was <0.5.In the case of NSO samples, we quantified the spatial distribution of detected hybrids and pure individuals by generating a 5 × 2 contingency table (five detected NSO classifications × two locations: the NSO contact zone vs. the remainder of the NSO range). The primary goal of this analysis was to determine whether hybrid individuals were overrepresented in the NSO contact zone relative to other NSO regions. The contingency table was analyzed using the “fisher.test” function in R version 3.3.2 (R Development Core Team ) to determine whether difference in the frequencies of hybrid and nonhybrid individuals existed between geographic areas. CSO samples were also analyzed to determine whether spatial variation in the locations of hybrid versus nonhybrid individuals existed. In this case, we evaluated the hypothesis that nonpure CSO individuals (i.e., CSO individuals classified as F2, NSO, or unknown) were identified at more northern locations of the CSO range closer to the NSO contact zone. This analysis was implemented via logistic regression with the “glm” function in R using sample latitude as the independent variable and individual sample classification (pure CSO vs. different nonpure CSO categories) as the response variable.Our data set included CSO samples that were collected in either 1996 (n = 23; samples from Funk et al., ) or 2012–2015 (n = 104), thereby permitting us to evaluate the hypothesis that there has been no change in the frequencies of different hybridization categories during this period. The analysis was performed by constructing a 2 × 3 contingency table that summarized the number of putative CSO samples determined to be either pure CSO, pure NSO, or of hybrid/unknown ancestry for each year. We tested for independence of categories across years using the “fisher.test” function in R. To visualize these results, and also to identify possible differences in locations of hybrids and nonhybrids that may occur between older (1996) and newer (2012–2015) CSO samples, we generated plots that displayed the latitudinal position of pure CSO versus non‐CSO samples from each time period.The relative support for five different gene flow models were quantified for mtDNA and microsatellite data sets using the Bayesian inference framework implemented in program MIGRATE‐N version 3.6.6 (Beerli, ). Given our interest in understanding gene flow between NSO and CSO, we restricted this analysis to our NSO samples from the contact zone and the complete set of CSO samples in order to minimize the total number of estimated parameters and maintain focus on the most geographically proximate sets of samples from the two subspecies (Figure ). The five gene flow models included (1) the full migration model (asymmetric migration between populations), (2) migration only from NSO to CSO, (3) migration only from CSO to NSO, (4) symmetric (equal) migration between subspecies, and (5) no gene flow. For the microsatellite data, analyses under each gene flow model were performed assuming the Brownian motion mutational model, uniform priors on θ ranging from 0 to 30 (δ = 3; 1,500 bins), and uniform priors on M ranging from 0 to 100 (δ = 10; 1,500 bins). The Markov Chain Monte Carlo search strategy for each analysis involved recording 2,000 states sampled every 100 steps after an initial burnin of 2 × 105 steps. Twenty concurrent replicates were performed, each based on a static heating scheme with a swapping interval of 1 and 10 chains with temperatures of 1, 1.12, 1.28, 1.49, 1.79, 2.22, 2.94, 4.35, 8.33, and 106. For the mtDNA data, we used the DNA sequence model with a transition/transversion ratio of 6.12 as estimated using the program MEGA (Kumar et al., ), uniform priors on θ ranging from 0 to 0.1 (δ = 0.01; 1,500 bins), and uniform priors on M ranging from 0 to 10,000 (δ = 1,000; 1.500 bins). Search strategies for analyses were based on 2,000 recorded steps sampled every 100 iterations and 10 concurrent chains, with each chain implementing static heating as described for the microsatellite data. After analyses, we used the reported marginal likelihoods for each model (log‐probability for the mtDNA, Bezier approximations for microsatellites) to compute Bayes factors as per Beerli and Palczewski () and determine the relative support of each model relative to the model with the highest likelihood. […]

Pipeline specifications

Software tools Arlequin, MEGA, CLUMPP, MrBayes, jModelTest, RAxML, NewHybrids
Applications Phylogenetics, Population genetic analysis
Organisms Orussus occidentalis