Computational protocol: Detection of outlier loci and their utility for fisheries management

Protocol publication

[…] We conducted a literature search for EST-linked microsatellite markers described for Salmo or Oncorhynchus species and selected 99 for subsequent testing. In addition, 11 389 ESTs described for O. nerka were identified from GenBank (date of inspection 01/08/2008), examined for microsatellites containing uninterrupted dinucleotide repeats using Tandem Repeats Finder, and subsequently cross-referenced with the Consortium for Genomic Research on All Salmonids Project (cGRASP) to target those with known functional annotations. Using this approach, polymerase chain reaction (PCR) primers were designed for an additional 125 EST-linked microsatellites using Primer3. Lastly, we targeted 19 putatively neutral, non-EST-linked microsatellite loci developed specifically for Oncorhynchus spp. and used in previously published population genetic analyses of Pacific salmonids. In total, we tested 243 loci comprising 224 EST-linked microsatellites and 19 putatively neutral, non-EST-linked microsatellites. […]

Outlier loci were detected using three different approaches. In all cases, the input data set was split by ecotype. First, a coalescent-based simulation approach was used to identify outlier loci displaying unusually high or low values of FST by comparing observed FST values with values expected under neutrality, as implemented in the LOSITAN Selection Workbench. We performed an initial run with 50 000 simulations and all loci, using the mean FST across all loci as a preliminary estimate of the mean neutral FST. A more accurate estimate of the mean neutral FST was obtained following the first run by excluding all loci lying outside the 99% confidence interval, as their differentiation could be the result of selection rather than neutral evolution. This refined estimate was used for a final set of 50 000 simulations over all loci.
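The two-step refinement of the mean neutral FST described above can be sketched as follows. This is only an illustration of the trimming logic, not of LOSITAN itself: the locus names, FST values, and confidence bounds are hypothetical, the per-locus 99% bounds are assumed to come from the first simulation run, and the use of a simple unweighted mean is an assumption.

```python
from statistics import mean

def refine_neutral_fst(loci):
    """Recompute the mean FST after dropping loci outside their 99% interval.

    `loci` maps a locus name to (observed_fst, ci_low, ci_high), where the
    bounds are assumed to be taken from the first simulation run.
    """
    retained = {
        name: fst
        for name, (fst, ci_low, ci_high) in loci.items()
        if ci_low <= fst <= ci_high
    }
    excluded = sorted(set(loci) - set(retained))
    return mean(retained.values()), excluded

# Hypothetical first-run results (not data from the study).
loci = {
    "locus_A": (0.05, 0.00, 0.12),
    "locus_B": (0.07, 0.00, 0.13),
    "locus_C": (0.40, 0.00, 0.15),  # lies above its upper bound
    "locus_D": (0.06, 0.01, 0.12),
}

# Preliminary estimate: mean over all loci, candidate outliers included.
preliminary = mean(fst for fst, _, _ in loci.values())

# Refined estimate: mean over loci inside the interval only.
refined, excluded = refine_neutral_fst(loci)
print(f"preliminary={preliminary:.3f} refined={refined:.3f} excluded={excluded}")
```

In this toy example the single high-FST locus inflates the preliminary mean, which is why the refined value would be the one passed to the final set of simulations.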
Second, we employed an approach that investigates outliers in a pairwise fashion based on population-specific F-statistics. The coalescent simulations were performed with DetSel 1.0. Null distributions were generated using the following parameters: population size before the split N0 = 500; mutation rate μ = 0.0001 and 0.00001; ancestral population size Ne = 500, 1000 and 10 000; time since bottleneck T0 = 50, 100 and 1000; and time since population split t = 100. Outliers were determined based on an empirical P value for each locus at the 95% and 99% levels using two-dimensional arrays of 50 × 50 square cells. Lastly, we used a Bayesian simulation-based test that has been further refined and implemented in the software BayeScan 2.0. We based our analyses on 10 pilot runs, each consisting of 5000 iterations, followed by 100 000 iterations with a burn-in of 50 000 iterations. […]

The data set was screened for null alleles using Micro-Checker. Allelic diversity and observed (HO) and expected (HE) heterozygosity were calculated at each locus for each ecotype and spawning locality using Arlequin 3.11. Deviation from Hardy–Weinberg (H-W) equilibrium was assessed using exact tests based on the Markov chain method as implemented in Genepop 3.3 (1000 dememorization steps, 1000 batches and 10 000 iterations per batch). Linkage disequilibrium was investigated for all pairs of loci using Genepop 3.3. Type I error rates for tests of linkage disequilibrium and departure from H-W expectations were corrected for multiple comparisons using the sequential Bonferroni procedure. The hierarchical organization of genetic variation was assessed using an analysis of molecular variance (AMOVA) based on FST comparisons within and among reproductive ecotypes as implemented in Arlequin 3.11.
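The sequential Bonferroni correction applied to the linkage disequilibrium and H-W tests above can be sketched as a step-down procedure. This is a minimal illustration with hypothetical P values and an assumed table-wide α of 0.05; the function name is ours, and the source does not state the α level used.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Step-down (Holm-type) sequential Bonferroni test.

    Returns a list of booleans, in the input order, marking which tests
    remain significant at the table-wide level `alpha`.
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all larger P values are non-significant as well
    return significant

# Hypothetical P values from four tests (not data from the study).
p_values = [0.001, 0.04, 0.03, 0.012]
result = sequential_bonferroni(p_values)
print(result)
```

With these four values the thresholds are 0.0125, 0.0167, 0.025 and 0.05, so only the two smallest P values stay significant; the procedure stops at the first test that fails its threshold.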
Likewise, genetic differentiation between spawning localities was estimated as pairwise FST, with 95% confidence intervals obtained by bootstrapping over loci, all as implemented in Arlequin 3.11.

Whether geographically separated spawning sites and ecotypes correspond to discrete genetic units was further tested using the Bayesian method implemented in STRUCTURE 2.3.3. Run length was set to 1 000 000 Markov chain Monte Carlo (MCMC) replicates after a burn-in period of 500 000, under an admixture model with correlated allele frequencies and the LOCPRIOR option. The LOCPRIOR option uses sampling locations as prior information to assist clustering in data sets where the signal of structure is relatively weak. The most likely number of clusters inferred from the different data sets was determined using the ΔK approach, by varying the number of clusters K from 1 to 10 with 20 iterations per value of K. […]
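The ΔK calculation referred to above can be sketched as follows. The sketch assumes the common formulation in which the absolute second-order difference of the mean log-likelihood across replicate runs is divided by the standard deviation of the log-likelihood at that K. The log-likelihood values are hypothetical, and only four values of K with three runs each are shown to keep the example short.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps K to the list of log-likelihoods from replicate runs.
    Delta K(K) = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), with L the mean
    across runs. A zero standard deviation would raise ZeroDivisionError.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result

# Hypothetical log-likelihoods (not results from the study).
lnp_by_k = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4480, -4485, -4475],
    4: [-4470, -4490, -4450],
}

dk = delta_k(lnp_by_k)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that ΔK is undefined at the smallest and largest K tested, so a range of K = 1 to 10 yields ΔK values for K = 2 to 9 only.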

Pipeline specifications

Software tools: BayeScan, Arlequin, Genepop
Application: Population genetic analysis
Organisms: Hemisus marmoratus, Oncorhynchus nerka