Computational protocol: Genetic Structure of Capelin (Mallotus villosus) in the Northwest Atlantic Ocean

Protocol publication

[…] We conducted preliminary analyses on the 45 samples to assess linkage disequilibrium (LD) and to determine whether temporal samples from each location could be pooled. Details of those analyses are presented as Supplementary Material (including a Table). Following the pooling of similar within-location collections sampled in different years, the new set of 18 samples from the 15 locations was re-evaluated following the methods described in the Supplementary Material. Genepop version 4.0.10 was used to calculate exact tests of departure from Hardy-Weinberg equilibrium (HWE).

Comparative measures of genetic diversity for each sample were calculated for each locus and, where applicable, over all loci: the number of alleles (N_A), allelic richness (R_S) and the number of private alleles (P_S) for a standard sample size of 38 individuals, observed (H_O) and expected (H_E) heterozygosity, and Weir and Cockerham's inbreeding coefficient (F_IS), using Genetix version 4.05.2 and Adze version 1.0. The number of alleles for a standard sample size was calculated by rarefaction in Adze; allele frequencies for each locus were produced with Genetix. Weir and Cockerham's F-statistics (θ) were also re-calculated to examine pair-wise genetic differentiation and to test the null hypothesis of panmixia. Samples with θ values above the upper 95% confidence limit were considered to be highly differentiated.

The Præbel et al. study of genetic variation among NEA capelin used assignment tests performed with Geneclass version 2.0 to evaluate the fidelity of individuals to their group of origin. We undertook the same Bayesian individual assignment analyses, following Rannala and Mountain; details are provided as Supplementary Material.

Fixation indices systematically underestimate genetic differentiation when evaluated with highly polymorphic markers.
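
The underestimation can be illustrated with a small worked sketch. It uses Nei's G_ST rather than the Weir and Cockerham θ used in this study (a simplification for brevity), and the populations and allele frequencies are made up: two populations that share no alleles at all still yield a low G_ST once within-population heterozygosity is high.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one {allele: frequency} dict."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(pops):
    """Nei's G_ST = (H_T - H_S) / H_T for equally weighted populations.

    pops is a list of {allele: frequency} dicts, one per population.
    """
    n = len(pops)
    # H_S: mean within-population heterozygosity.
    h_s = sum(heterozygosity(p) for p in pops) / n
    # H_T: heterozygosity of the pooled (averaged) allele frequencies.
    alleles = set().union(*pops)
    pooled = {a: sum(p.get(a, 0.0) for p in pops) / n for a in alleles}
    h_t = heterozygosity(pooled)
    return (h_t - h_s) / h_t


def disjoint_pair(k):
    """Two hypothetical populations, each with k equally frequent private alleles."""
    pop_a = {f"a{i}": 1.0 / k for i in range(k)}
    pop_b = {f"b{i}": 1.0 / k for i in range(k)}
    return [pop_a, pop_b]


# Illustrative only: the populations never share an allele, whatever k is.
for k in (1, 2, 10, 50):
    print(k, round(nei_gst(disjoint_pair(k)), 4))
```

For these made-up inputs G_ST equals 1/(2k - 1), so it shrinks towards zero as the number of alleles per population grows, even though the two populations are completely differentiated in every case.
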
We used the harmonic mean of Jost's D_est as a measure of heterozygosity-based relative differentiation of allele frequencies among samples (actual differentiation), adjusted for sample size. The D_est parameter is based on allele identities and estimates allelic differentiation by partitioning heterozygosity into within- and among-population components. This improves the estimates of population genetic divergence by not confounding within-group heterozygosity with divergence. D_est is not rooted in F-statistic-based metrics and performs consistently under different modelled levels of haplotype diversity and genetic distance between populations. It is also considered preferable for estimating genetic differentiation from high-diversity loci (such as those used here) relative to F_ST or G_ST, which may underestimate genetic divergence. Both D_est and F-statistics weight the common alleles more heavily than the rare alleles.

D_est was calculated with the online program Smogd version 1.2.5, separately for each locus and as the harmonic mean across loci. Bootstrapped 95% confidence intervals were calculated by resampling individuals over 1,000 iterations. To test for the effect of the rare alleles in our data on D_est we used the program SPADE (updated 2009), which allows for allele frequency input and hence manipulation. We estimated D_est for each locus after removing all alleles with an average frequency across samples of less than 0.05. For Mvi9 and Mvi16, no allele reached an average frequency of 0.05, so 0.01 was used as the threshold instead. These levels have been used to define rare allele thresholds for examining the effect of rare alleles on estimates of effective population size.

Jost's D_est pair-wise distance matrix was imported into Primer version 6.1.5 (PRIMER-E Ltd, Plymouth, UK).
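
The heterozygosity partitioning behind Jost's D, and the harmonic mean across loci, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Smogd or SPADE implementation: it computes the uncorrected D = [(H_T - H_S) / (1 - H_S)] x [n / (n - 1)] from population allele frequencies and omits the sample-size corrections that define the estimator D_est. The function names and the example frequencies are hypothetical.

```python
from statistics import harmonic_mean


def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one {allele: frequency} dict."""
    return 1.0 - sum(p * p for p in freqs.values())


def jost_d(pops):
    """Uncorrected Jost's D for one locus (no sample-size correction).

    pops is a list of {allele: frequency} dicts, one per population,
    with all populations weighted equally.
    """
    n = len(pops)
    # Within-population component: mean heterozygosity across populations.
    h_s = sum(heterozygosity(p) for p in pops) / n
    # Total component: heterozygosity of the averaged allele frequencies.
    alleles = set().union(*pops)
    pooled = {a: sum(p.get(a, 0.0) for p in pops) / n for a in alleles}
    h_t = heterozygosity(pooled)
    return ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))


def d_across_loci(loci):
    """Harmonic mean of the per-locus D values (all values must be positive)."""
    return harmonic_mean([jost_d(pops) for pops in loci])


# Made-up frequencies for two loci in two populations.
locus_1 = [{"A": 0.6, "B": 0.4}, {"A": 0.4, "B": 0.6}]  # mild differentiation
locus_2 = [{"C": 1.0}, {"D": 1.0}]                      # no shared alleles
print(jost_d(locus_1), jost_d(locus_2), d_across_loci([locus_1, locus_2]))
```

Because the harmonic mean is pulled towards the smallest values, a single weakly differentiated locus lowers the multilocus figure more than it would under an arithmetic mean.
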
Samples within this matrix were visualized on a non-metric multidimensional scaling (nMDS) plot, which has no axis units, based on Euclidean distances, such that the ranked differences among samples were preserved. The degree of correspondence between the distances among points implied by the nMDS scatterplot and the Jost D_est matrix input was evaluated using Kruskal's stress formula 1 as implemented in Primer. A complete linkage hierarchical cluster analysis was performed on this matrix. In complete linkage hierarchical clustering, the distance between two clusters is the largest pair-wise distance between their members, and the two clusters with the smallest such distance are merged in each step. This method was chosen as it is sensitive to outliers.

The nMDS ordination considers the genetic distance between all possible pairs of samples and highlights those that are most differentiated from the others in Euclidean space. The expression of population differentiation in geographic space and the location of “barriers”, or areas of reduced gene flow among neighbouring samples, were also assessed using Barrier version 2.2. Geographical latitude and longitude co-ordinates, by sample, were used along with the genetic data (Jost's D_est, excluding temporally differentiated samples from the same location) to generate a connectivity network of genetic distances based on Delaunay triangulation and Voronoï tessellation. Automatically generated virtual points prevented long edges from forming in the Delaunay triangles. Monmonier's Maximum Difference algorithm was then used to identify putative genetic barriers across the geographic landscape. In this analysis the user specifies the number of barriers; the first is positioned along the axis of greatest genetic differentiation, with subsequent barriers decreasing in order of importance and representing diminishing levels of differentiation.

We computed barriers using a genetic matrix based on the harmonic mean of D_est to represent genetic differentiation among capelin from the 15 locations.
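
The complete-linkage merge rule used for the cluster analysis described above can be sketched in a few lines. This is a plain-Python illustration, not the Primer implementation; the sample labels and distances are hypothetical.

```python
def complete_linkage(dist, labels):
    """Agglomerative clustering with the complete-linkage rule.

    dist is a symmetric matrix (list of lists) of pair-wise distances.
    Returns the merge history as (cluster_a, cluster_b, merge_distance).
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster-to-cluster distance = largest member-to-member distance.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((
            tuple(labels[k] for k in clusters[i]),
            tuple(labels[k] for k in clusters[j]),
            d,
        ))
        # Replace the two merged clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical pair-wise distances among four samples; S4 is the distant one.
labels = ["S1", "S2", "S3", "S4"]
dist = [
    [0.00, 0.01, 0.05, 0.20],
    [0.01, 0.00, 0.04, 0.22],
    [0.05, 0.04, 0.00, 0.18],
    [0.20, 0.22, 0.18, 0.00],
]
for a, b, d in complete_linkage(dist, labels):
    print(a, b, d)
```

In this made-up matrix S4 joins last, at the largest of its distances to the other samples (0.22) rather than the smallest (0.18), which shows how a single distant sample raises the merge height under this rule.
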
We ran the analyses both excluding the three temporally distinct samples (2005CC, 2005SV, 2004LL) and with the CC, SV and LL samples replaced by the temporally distinct samples, to assess the stability of the results. Barrier robustness was evaluated by examining “consensus barriers”, that is, the number of loci supporting the individual sections of the barriers constructed from single-locus matrices of D_est, analyzed as above in the multiple-matrix mode of Barrier. We sequentially considered both the consensus support for each barrier and the genetic distance between sample pairs at those barriers. We concluded our assessment (rejecting additional barriers) when a barrier was supported by fewer than four of the six loci (67%) in any segment over its length and/or when the average D_est over all segments forming the barrier between adjacent samples connected through the oceanic features was less than 0.01.

Isolation by distance (IbD) describes the tendency of individuals to mate with others that are nearby rather than with individuals from some distant population; if present, it can cause genetic diversity to be clinal over the sample area. Mantel tests were used to test for congruence between genetic (Jost's D_est) and geographic (shortest sea route; km) distances among the 15 locations (excluding temporally differentiated samples) with the web-service program Isolation By Distance (Ibdws). Ibdws was also used to calculate the slope and intercept of the IbD relation using reduced major axis regression.

We further investigated the genetic population structure of capelin using the Bayesian clustering program Structure (version 2.3.4); details can be found in the Supplementary Material. […]
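
The Mantel test and reduced major axis (RMA) regression used for the isolation-by-distance analysis above can be sketched as follows. This is a generic permutation version written for illustration, not the Ibdws implementation; the function names, the number of permutations, the seed and the example distances are all assumptions.

```python
import math
import random
from statistics import mean, stdev


def upper_triangle(m, order=None):
    """Flatten the above-diagonal entries, optionally with samples reordered."""
    n = len(m)
    idx = list(range(n)) if order is None else order
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=1):
    """One-sided Mantel test: shuffle the sample labels of the genetic matrix."""
    rng = random.Random(seed)
    x = upper_triangle(geo)
    r_obs = pearson(x, upper_triangle(gen))
    order = list(range(len(gen)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # rows and columns are permuted together
        if pearson(x, upper_triangle(gen, order)) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


def rma(x, y):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    slope = math.copysign(stdev(y) / stdev(x), pearson(x, y))
    return slope, mean(y) - slope * mean(x)


# Hypothetical sea-route positions (km) of five samples along a coast.
km = [0, 100, 300, 700, 1500]
geo = [[abs(a - b) for b in km] for a in km]
# Made-up genetic distances that rise linearly with geographic distance.
gen = [[0.0 if i == j else 0.01 + 0.00002 * geo[i][j] for j in range(5)]
       for i in range(5)]

r, p = mantel(geo, gen)
slope, intercept = rma(upper_triangle(geo), upper_triangle(gen))
print(r, p, slope, intercept)
```

Permuting whole samples, rather than individual matrix cells, keeps the dependence among the pair-wise distances intact, which is why a Mantel test is used here instead of an ordinary correlation test.
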

Pipeline specifications

Software tools: Genepop, GeneClass, IBDWS
Application: Population genetic analysis
Organisms: Mallotus villosus, Danio rerio