Computational protocol: The oceanic concordance of phylogeography and biogeography: a case study in Notochthamalus

Protocol publication

[…] We sequenced COI as in Laughlin et al. (). Individuals were assigned to the northern or southern lineage using phylogenetic approaches generated with previously collected data [ZL]. Evaluation of synapomorphies separating the two primary clades led to the development of an SpeI restriction assay that assigns individuals to the correct clade with ~99% accuracy (as in Wares and Castañeda ); this assay was used in processing specimens from the Chiloé region and was confirmed by sequencing a subsample of new individuals (Ewers‐Saucedo et al. ).

Genotyping of SNPs proceeded with an Illumina GoldenGate array, along with the assessment of genotypic error rate as detailed in Zakas et al. (). These SNPs were originally developed from specimens collected in north‐central Chile and were selected from a large number of potential loci based on ease of specific oligo development, a mix of criteria related to the classification of gene region, and polymorphism in our original sample from northern Chile (Zakas et al. ). Critically, this last criterion is a form of ascertainment bias in our SNP data: all loci are polymorphic in individuals that are considered “northern,” and all loci are biallelic (because of array genotyping); therefore, no SNPs analyzed here are diagnostic for the two lineages of N. scabrosus.

Single nucleotide polymorphisms were explored for outlier behavior relative to neutral expectations for a given global allele frequency using BayeScan (Foll and Gaggiotti ) under default analysis settings. Given the island‐model assumptions of this inference, our goal is not to suggest candidate loci associated with environmental transition but merely to recognize the loci that contribute the most to the overall signal. We therefore focus on identifying loci that exhibit extreme differentiation as noted above; our threshold was a 0.99 probability of non‐neutral evolution.
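As a small illustration of what the 0.99 threshold implies, the sketch below converts a posterior probability into posterior odds and flags loci at or above the cutoff. It is a minimal sketch only: the function names, locus names, and probability values are hypothetical and do not come from the original analysis or from BayeScan's own output files.

```python
THRESHOLD = 0.99  # posterior probability cutoff stated in the protocol


def posterior_odds(prob):
    """Posterior odds in favour of the non-neutral model: p / (1 - p)."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must lie in [0, 1)")
    return prob / (1.0 - prob)


def flag_outliers(probs, threshold=THRESHOLD):
    """Return the sorted names of loci whose probability meets the cutoff."""
    return sorted(name for name, p in probs.items() if p >= threshold)


# Hypothetical per-locus probabilities, invented for illustration.
example_probs = {"snp_01": 0.12, "snp_02": 0.995, "snp_03": 0.989, "snp_04": 0.99}
outliers = flag_outliers(example_probs)
```

A probability of 0.99 corresponds to posterior odds of about 99 to 1, which is why a locus such as the hypothetical `snp_03` (0.989) falls just short of the cutoff.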
In addition, we evaluated cytonuclear disequilibrium between SNPs and the individual mitotype using CNDd (Asmussen and Basten ). Bonferroni‐corrected exact tests of association between allele and mitotype, as well as of genotype‐by‐mitotype interactions, were performed. This test was run on all populations in which the northern and southern mitotypes are sympatric, and as a focused analysis on coastal populations between 40 and 42.5°S latitude, where the transition of genetic lineages is most pronounced (see “”). Subsequent analyses were run twice: once including all data and once excluding the loci that exhibit outlier or disequilibrium behavior. [...]

The SNP data were analyzed to generate hypotheses regarding geographic structure. Loci with >50% missing data across all individuals were excluded from the analysis. Individuals missing data from more than 25% of the remaining loci were excluded as well.

First, to test for the number of genetically identifiable groups, data were analyzed with STRUCTURE (Pritchard et al. ) for up to k = 5 genetic populations. We know from ZL that k = 1 can be rejected; k = 2 and k = 3 are relevant to the biogeographic structure of coastal Chile, and hypotheses of larger numbers of populations (k > 3) could suggest isolation‐by‐distance or as‐yet unrecognized population structure across the larger domain sampled in this study. Each population structure value (k) was explored with five replicate random‐seed analyses with 25,000 steps for statistical burn‐in and 250,000 steps for inference, using an admixture model without location as a prior. This approach was repeated under a no‐admixture model for contrast. Results were used to establish the most informative value of k using the delta‐k method (Evanno et al. ) as implemented in STRUCTUREHARVESTER (Earl and Vonholdt ).

Genetically divergent clusters can also be identified using discriminant analysis of principal components (DAPC; Jombart et al. ).
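The two missing‐data filters described above (loci first, then individuals scored against the remaining loci) could be applied as in the following minimal sketch. The function name, the dictionary layout, and the toy genotypes are hypothetical and are not taken from the original pipeline; only the 50% and 25% cutoffs come from the text.

```python
def filter_missing(genotypes, max_locus_missing=0.50, max_indiv_missing=0.25):
    """Apply the two-step missing-data filter.

    genotypes maps an individual ID to a list of genotype calls, one per
    locus, with None marking a missing call (hypothetical layout).
    Returns (filtered genotypes, indices of retained loci).
    """
    n_ind = len(genotypes)
    n_loci = len(next(iter(genotypes.values())))

    # Step 1: drop loci missing in more than 50% of all individuals.
    kept_loci = []
    for j in range(n_loci):
        n_missing = sum(1 for calls in genotypes.values() if calls[j] is None)
        if n_missing / n_ind <= max_locus_missing:
            kept_loci.append(j)

    # Step 2: drop individuals missing more than 25% of the remaining loci.
    kept = {}
    for ind, calls in genotypes.items():
        n_missing = sum(1 for j in kept_loci if calls[j] is None)
        if kept_loci and n_missing / len(kept_loci) <= max_indiv_missing:
            kept[ind] = [calls[j] for j in kept_loci]
    return kept, kept_loci


# Toy data, invented for illustration: 4 individuals scored at 5 loci.
toy = {
    "ind_a": ["AA", "AG", None, "GG", "AA"],
    "ind_b": ["AG", "AG", None, "GG", "AG"],
    "ind_c": ["AA", None, None, None, "AA"],
    "ind_d": ["AA", "GG", "CT", "AG", "AA"],
}
kept, kept_loci = filter_missing(toy)
```

The order of the two steps matters: in the toy data, the third locus is removed first, and `ind_c` is then excluded because it lacks calls at two of the four remaining loci.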
In contrast to the approach using STRUCTURE, no population genetic model (e.g., Hardy–Weinberg, admixture) is assumed by DAPC, as implemented in the R package adegenet (Jombart ). Here, missing data were replaced with the mean frequencies of corresponding alleles, as suggested in Jombart et al. (). The optimal number of clusters (k) was chosen using Ward's clustering method based on BIC summary statistics (the “diffNgroup” selection criterion). Conclusions about population structure were weighted toward the values of k obtained by both analytical approaches. The DAPC results were used to evaluate the sensitivity of individual assignment to one of k clusters.

Traditional pairwise G_ST calculations were made under the infinite‐sites model using GenAlEx v6.5 (Peakall and Smouse ) with 1000 data‐by‐location permutations for statistical significance. Given prior information on Notochthamalus [ZL] and the range of collection sites, pairwise G_ST values were calculated across all individuals as well as for genotypic data partitioned by mitotype (northern or southern). Mitotype and inferred nuclear genome identity are strongly correlated (see “”). These partitions of “northern” and “southern” diversity were again evaluated for pairwise site differentiation; these values of within‐lineage G_ST were used to test for isolation by distance using the Mantel test as implemented in GenAlEx.

In addition to these approaches, signal of hybridization among lineages was assessed using NewHybrids (Anderson and Thompson ) with 5000 burn‐in iterations and 100,000 MCMC replicates to determine the Bayesian posterior probability that an individual represents one or another parental stock or a hybrid class including F1, F2, or backcross. As sites of lineage sympatry generally harbor intermediate frequencies of both lineages, a uniform prior (no declared parental stock individuals) was used for mixing probabilities as in Wares et al. ().
This approach, relying on expected transitions in allele and genotype frequencies, requires neither diagnostic markers nor reference genotype classes. […]

Pipeline specifications

Software tools: BayeScan, adegenet, GenAlEx, NewHybrids
Application: Population genetic analysis
Organisms: Notochthamalus scabrosus