Computational protocol: Understanding the Spatial Scale of Genetic Connectivity at Sea: Unique Insights from a Land Fish and a Meta-Analysis

Protocol publication

[…] The 120 mitochondrial ATPase 6 and 8 sequences were aligned using Geneious v.5.6 (Biomatters, http://www.geneious.com), and genealogical relationships among individuals were investigated using the coalescent-based approach in TCS. Sequence diversity was estimated as haplotypic diversity and nucleotide diversity per population in Arlequin 3.5.1.2.

Demographic or selection history of the entire mitochondrial dataset was assessed by computing a mismatch distribution in Arlequin. Mismatch analysis tests for the agreement of the data with a model of demographic expansion. Fu’s FS test of demographic history or selective neutrality was also employed to assess the signal of expansion in the data set. In the event of demographic expansion or directional selection, large negative FS values are generally observed. We also assessed the demographic history of A. arnoldorum on Guam with a Bayesian Skyline Plot (BSP) modelled in BEAST v1.7.2 using the mitochondrial ATPase 6 and 8 sequence data. A BSP is the posterior distribution of the effective population size through time, generated using a standard Markov Chain Monte Carlo (MCMC) sampling procedure assuming a single panmictic population. For the analysis, we specified a strict molecular clock with a fixed mutation rate of 1.4% per million years and a GTR model of sequence evolution. These parameters were chosen because systematic rate heterogeneity is not expected in intraspecific data. The number of grouped individuals was set to five, and two analyses were run for 100 million generations, sampling every 1000 generations. We combined the independent runs, and all effective sample sizes (ESS) were >200. Tracer v1.5 was then used to analyse the runs and generate the skyline plots.

[...] For the mitochondrial data set, pairwise population genetic structure was calculated as ΦST, and the degree of population structure was explored with a hierarchical analysis of molecular variance (AMOVA) in Arlequin.
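As an illustration of the summary statistics named above, the following minimal Python sketch computes haplotype diversity, nucleotide diversity and the observed mismatch distribution from aligned sequences. It is not Arlequin's implementation: it assumes equal-length, gap-free sequences, counts raw site differences without a substitution model, and uses a hypothetical toy alignment rather than the study data. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences, expressed per site."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs) / len(seqs[0])


def mismatch_distribution(seqs):
    """Observed number of sequence pairs that differ at 0, 1, 2, ... sites."""
    return dict(sorted(Counter(pairwise_differences(seqs)).items()))


# Hypothetical toy alignment (not data from the study).
toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ACCTTC"]

h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
mismatch = mismatch_distribution(toy)
```

A unimodal observed mismatch distribution is the pattern usually compared against the expansion model; the model fitting itself is left to Arlequin and is not attempted here.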
Isolation by distance (IBD) was investigated using a Mantel permutation test of the association between genetic distance (ΦST) and geographic distance, measured either as direct (Euclidean) or as coastal distance, in Arlequin.

For the microsatellite dataset, the 17 microsatellite loci were tested for departures from Hardy-Weinberg equilibrium (HW) in Arlequin, and linkage disequilibrium was assessed using Genepop. Microchecker was then used to determine whether any observed departures from HW at each locality were due to null alleles, allele dropout or allele stuttering. The extent of inbreeding was also estimated using the IIM (individual inbreeding model) approach with 10,000 iterations implemented in INEst. This method discriminates between heterozygote deficits due to null alleles and deficits due to other causes such as inbreeding. It allows the calculation of unbiased estimates of a multilocus average inbreeding coefficient (FIS) in the presence of null alleles at proportions (pn). We estimated genetic diversity at each locality as number of alleles per locus, allelic richness, and Wright’s inbreeding coefficient (FIS) using the software FSTAT, and expected and observed heterozygosity using Arlequin.

Pairwise genetic differentiation (FST) of microsatellites among populations was estimated and tested for significance with 10,000 permutations using Arlequin. In addition, we calculated G’ST_est and Dest using SMOGD v.1.2.5, and their correlation with FST was tested using a linear regression. We also calculated Shannon’s information index of population subdivision (SHUA), which is thought to provide another robust estimation of genetic exchange in addition to FST, for pairwise population comparisons in GenAlEx.

Structure v2.3.4 was used to identify the presence of populations or genetic clusters in A. arnoldorum on Guam based on microsatellite data.
The most likely value of K, the number of clusters, was determined by plotting the mean natural log (Ln) probability of the data versus K over multiple runs and the change in K (∆K) following Evanno et al., with 1,000,000 MCMC repetitions and a burn-in of 10,000 iterations. In each case, prior population information was not used, and correlated allele frequencies and admixed populations were assumed. Mantel permutation tests were also used with the microsatellite data to test for the association between genetic distance (FST) and direct and coastal distance (IBD) in Arlequin. Spatial autocorrelation analysis as calculated in GenAlEx was then used to identify the scale of spatial genotypic structure among A. arnoldorum populations around Guam. The autocorrelation coefficient of multilocus microsatellite genotypes (r) was calculated for individuals sampled in the same locality (distance class 0) and among individuals separated across a range of distances from 0 to 100 km evaluated at 5 km increments. Our data were tested against the null hypothesis of randomly distributed genotypes, with 999 permutations and 999 bootstrap replicates.

[...] Next, we simulated genetic differentiation under a range of dispersal scenarios and compared these results with our microsatellite data. To do this, we used IBDSim v.2 to simulate genotypic data for multiple unlinked loci under a general isolation-by-distance model. IBDSim is based on a backward-in-time coalescent method that enables the generation of large data sets using complex demographic scenarios. For our simulations, we constructed a 100 km × 0.5 km matrix that was representative of the entire intertidal area between the two most distant sample sites on Guam (Pago to Adelup Point). The distance between these sites set the outer spatial limits of our matrix. The matrix was composed of 50,000 grid squares, each 10 m × 10 m in area.
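The Mantel permutation tests used above for isolation by distance can be sketched as follows. This is a generic one-tailed Mantel test written for illustration, not the Arlequin routine; the site positions, the distance values, the function names and the default of 999 permutations are all assumptions, and the toy matrices are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=0):
    """One-tailed Mantel test between two symmetric distance matrices.

    Rows and columns of `gen` are permuted together while `geo` stays fixed.
    Returns the observed r and the proportion of arrangements with r >= r_obs.
    """
    n = len(gen)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement is counted as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        r = pearson([gen[order[i]][order[j]] for i, j in pairs], geo_vec)
        if r >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical sites along a coastline (km) and toy genetic distances
# that increase with geographic distance plus a small fixed perturbation.
positions_km = [0, 10, 25, 40, 60]
n_sites = len(positions_km)
geo = [[abs(a - b) for b in positions_km] for a in positions_km]
gen = [[0.0 if i == j else 0.001 * geo[i][j] + 0.002 * ((i + j) % 2)
        for j in range(n_sites)] for i in range(n_sites)]

r_obs, p_value = mantel(gen, geo)
```

Swapping `geo` for a matrix of coastal distances gives the second comparison described in the text; the test itself is unchanged.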
In each simulation, we populated the matrix with 10, 20, 50, 100, 500 or 1000 larval fish per grid square, which corresponds to densities of 0.1, 0.2, 0.5, 1, 5 and 10 larvae per m², respectively. These densities were chosen as input parameters based on empirical estimates of the total adult density of A. arnoldorum obtained for five of the six sampling locations by another study conducted a month after the collection of tissues for the current study. The empirical estimates ranged from 1.3 to 9.3 individuals per m² (average 4.8/m²). Our simulations therefore provide an assessment of genetic differentiation across a reasonable range of population densities (although we acknowledge that the density of larvae and adults might differ in reality).

For each simulated population density, we used input parameters that closely matched those of our empirical dataset. These included 17 microsatellite loci under a strict stepwise mutation model (SMM) using a mean mutation rate of 0.001. To this we applied six different dispersal distributions (named in the IBDSim Manual as ‘0’, ‘2’, ‘3’, ‘6’, ‘7’, and ‘9’) to model various degrees of dispersal around the intertidal matrix. These dispersal distributions have similar total emigration rates and differ mostly in their ‘shape of dispersal’, characterised by the mean squared parent-offspring dispersal distance (σ²). For our simulated matrices representing a range of dispersal scenarios, the default values defined by IBDSim for dispersal distributions correspond to mean squared parent-offspring dispersal distances of 10 m, 40 m, 100 m, 200 m and 1000 m. These distances can be interpreted as the average squared axial distance that offspring of a common ancestor will become separated per generation.
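The per-cell stocking levels given above convert to densities by dividing by the area of one 10 m × 10 m grid square. A short sketch of that arithmetic, which also checks that the simulated densities bracket the reported empirical range; the variable names are chosen here for illustration only.

```python
# Values taken from the text; names are illustrative.
CELL_SIDE_M = 10.0
cell_area_m2 = CELL_SIDE_M ** 2  # 100 m² per grid square

fish_per_cell = [10, 20, 50, 100, 500, 1000]
densities = [count / cell_area_m2 for count in fish_per_cell]  # larvae per m²

# Empirical adult densities reported in the text (individuals per m²).
empirical_min, empirical_max = 1.3, 9.3

# True when the simulated range spans the whole empirical range.
brackets_empirical = (min(densities) <= empirical_min
                      and max(densities) >= empirical_max)
```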
These mean squared parent-offspring dispersal distances are paired with different combinations of M and n, which control the maximum dispersal rate per generation and the kurtosis (a measure of shape) of the dispersal distribution per generation, respectively (see IBDSim Manual). For each simulation, the maximum possible dispersal distance was capped at 100 km (i.e., the largest distance possible in the matrix), which is also a realistic value assuming Lagrangian dispersal and a one-month larval phase (Platt and Ord, unpublished data). The boundary of the matrix was set to ‘absorbing’, in which individuals that emigrate out of the lattice are lost (i.e., swept out to sea). All simulations used a truncated Pareto distribution that allows for high dispersal rates, as expected in the marine environment, and is characterised by high kurtosis, which is often observed in biologically realistic functions. This distribution assumes a high probability of dispersal per generation over a relatively small distance, and a decreasing probability for greater distances. We sampled fish from the simulated lattice at 100 evenly distributed locations (each population 1 km apart). Ten replicate analyses were conducted for each simulation combination. We then used Genepop version 4.0.10 to calculate global FST between the simulated populations and compared this with the global FST from our empirical data. The simulated FST values were approximately normally distributed, and we subsequently used the standard deviation of FST values to calculate where 99% of values would theoretically lie in a normal distribution (i.e., z = ±2.576) to provide a “99% percentile” for FST values at each density. […]
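The normal-approximation interval described above amounts to mean ± 2.576 standard deviations of the replicate FST values. A minimal sketch of that calculation follows; it assumes the interval is centred on the replicate mean and uses the sample (n − 1) standard deviation, neither of which is stated in the text, and the ten FST values are hypothetical placeholders rather than simulation output.

```python
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% critical value of the standard normal


def normal_interval(values, z=Z_99):
    """Interval expected to contain 99% of values under a normal approximation."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1); an assumption
    return centre - z * spread, centre + z * spread


def within_interval(observed, values, z=Z_99):
    """True when an observed value falls inside the interval from `values`."""
    lower, upper = normal_interval(values, z)
    return lower <= observed <= upper


# Hypothetical global FST values from ten replicate simulations.
simulated_fst = [0.010, 0.012, 0.011, 0.009, 0.013,
                 0.010, 0.011, 0.012, 0.010, 0.012]

lower, upper = normal_interval(simulated_fst)
```

Calling `within_interval` with an empirical global FST and the replicates for one density and dispersal combination indicates whether that scenario is compatible with the observed differentiation under this approximation.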

Pipeline specifications

Software tools: Geneious, Arlequin, BEAST, Genepop, GenAlEx, IBDSim
Application: Population genetic analysis
Organisms: Danio rerio, Drosophila melanogaster