Computational protocol: Structure and phylogeography of two tropical predators, spinner (Stenella longirostris) and pantropical spotted (S. attenuata) dolphins, from SNP data

Protocol publication

[…] Skin samples were collected from spotted dolphins and spinner dolphins via biopsy dart on research cruises, from specimens taken as bycatch in the tuna purse-seine fishery, or from stranded or beachcast individuals. Biopsies were collected in accordance with the best practices of the Society for Marine Mammalogy, the permissible sampling techniques stated on the project-specific Marine Mammal Protection Act (MMPA) permit, and internal IACUC-approved methods.

Spinner dolphin samples collected on research cruises in the eastern tropical Pacific (ETP) were assigned to a population based on the external morphology of the majority of animals in the school. This approach was taken because: (i) these often-large groups (greater than 1000 individuals) contained individuals exhibiting a range of morphology, and only after observing the group for some time could observers classify it to population; (ii) researchers collecting biopsies from dolphins near the bow of the research vessel found it very difficult to confidently classify fast-swimming individuals at sea; and (iii) there is significant overlap in range, so geography was not a reliable predictor of population identity. Samples were selected from areas where the eastern and whitebelly types are known to overlap, as well as from outside the overlap region. The most experienced observers on the research cruise assessed the type of the majority of the school prior to sampling, but there was probably some error involved. Unfortunately, there is no way to measure the accuracy of each sampling event. Spotted dolphin samples were assigned to subspecies and populations based on the geographical location of the sampling site.
In areas where the two ETP subspecies overlap, spotted dolphin samples collected from research cruises were assigned to populations based on external morphology.

Tissue samples were preserved in salt-saturated 20% DMSO or 70% ethanol and stored frozen at −20°C, or frozen at −80°C with no preservative. DNA was extracted using silica-based filter membranes (Qiagen, Valencia, CA) or by NaCl precipitation. DNA was quantified using PicoGreen fluorescence assays (Quant-iT Kit, Invitrogen, Carlsbad, CA) on a Tecan Genios microplate reader (Tecan Group Ltd, Switzerland). DNA quality was assessed by electrophoresis in 1% agarose gel; only high-molecular-weight extracts were used.

Sequencing libraries were constructed using a ‘genotyping-by-sequencing’ protocol as previously described, using the PstI enzyme. Library preparation and multiplexed sequencing on an Illumina HiSeq 2000/2500 (100 bp, single-end reads) were completed at the Cornell University Institute of Biotechnology's Genomic Diversity Facility. [...]

Per-population heterozygosity was calculated using the strataG package in R. We then estimated differentiation (FST) for each pairwise combination of populations. Point estimates and permutation tests (1000 repetitions) were generated using the strataG package in R.

We also directly tested hypotheses of population differentiation using multivariate analyses, specifically the discriminant analysis of principal components (DAPC) in the R package adegenet. DAPC calculates principal components and then estimates a centroid and measures the variance for predefined populations. The discriminant analysis tests the probability of each individual falling in the space of each of the populations based on the ‘geometric space’ created by the centroid and variation.
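
As an illustration only, the logic of a permutation test on pairwise FST, as described above, can be sketched in Python. This is not the strataG implementation: the simplified estimator (HT − HS)/HT, the 0/1/2 coding of genotypes as alternate-allele counts, the function names and the random seed are all assumptions introduced for the example.

```python
import random


def allele_freq(genos, locus):
    """Alternate-allele frequency at one locus (genotypes coded 0, 1 or 2)."""
    return sum(g[locus] for g in genos) / (2 * len(genos))


def fst(pop_a, pop_b):
    """Simplified FST = (HT - HS) / HT, computed as a ratio of sums over loci."""
    num = den = 0.0
    for locus in range(len(pop_a[0])):
        pa, pb = allele_freq(pop_a, locus), allele_freq(pop_b, locus)
        p_bar = (pa + pb) / 2
        h_t = 2 * p_bar * (1 - p_bar)                      # total heterozygosity
        h_s = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def permutation_test(pop_a, pop_b, n_perm=1000, seed=1):
    """Shuffle individuals between the two populations and recompute FST."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = pop_a + pop_b          # new list, so the inputs are not modified
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two populations fixed for different alleles at 3 SNPs.
pop_1 = [[0, 0, 0] for _ in range(6)]
pop_2 = [[2, 2, 2] for _ in range(6)]
estimate, p_value = permutation_test(pop_1, pop_2, n_perm=1000)
print(estimate, p_value)
```

With real data, each pairwise combination of populations would be passed through `permutation_test` in turn; the 1000 repetitions match the number stated in the text.
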
Before conducting the DAPC analyses, we examined the cumulative variance explained by the eigenvalues for the full range of principal components.

Because of the size and variability of these datasets, spurious ad hoc solutions might be found. These include, but are not limited to, over-fitting (i.e. using too many principal components and thus resulting in large and unstable differences between populations). To assess whether over-fitting was occurring, we calculated alpha-scores for each population and each dataset overall, simulated in adegenet (simulated 10 times).

To complete the DAPC, we then constructed synthetic discriminant functions that represent linear combinations of the allelic data with the largest between-group variance and the smallest within-group variance. In all analyses, we kept only the first three eigenvalues, as they represented the vast majority of the information. Finally, we plotted the first two discriminant functions as two-dimensional scatters in R. [...]

Phylogeographic analyses were performed using SNAPP, a Markov chain Monte Carlo (MCMC) sampler for bi-allelic data used to infer phylogenetic trees. Because of the high number of SNP loci for each individual and because phylogenetic analyses of large datasets are computationally intensive, the sample sizes for these analyses were reduced. Two samples were chosen at random from each putative population for spinner dolphins, and between one and seven were taken for spotted dolphins because of the smaller number of populations. Sample details are listed in electronic supplementary material, tables S1 and S2. Given the differences between populations (based on FST and DAPC), we did not replicate these analyses with different samples selected from each population.

SNAPP was implemented in the software package BEAST 2. Prior to the analyses, datasets were converted using custom R scripts from the strataG format (gtype) to nexus format, input into BEAUti (v. 2.3.1) and exported as .xml files. Forward and reverse mutation rates were estimated and chains were sampled every 1000 iterations. The coalescence rate was sampled throughout the MCMC. All other settings followed the defaults given in BEAUti.

SNAPP log files were read into Tracer (v. 1.6.1) to evaluate the convergence of the MCMC analyses. This included assessing the overall quality of the analyses, as inferred from the trends and variance of the Bayesian posterior estimates and the effective sample size (ESS), and estimating the number of chains to remove as burn-in.

We used DensiTree (v. 2.01) to visualize and qualitatively analyse phylogeographic relationships and uncertainty using multiple trees. DensiTree displays the frequency of topologies as the colour of the trees presented. The most popular topologies are blue, the second most popular topologies are red and other topologies are green. TreeAnnotator (v. 2.3.1) was used to produce a consensus tree for the SNAPP analysis for each dataset. Burn-in for TreeAnnotator and DensiTree was set at 10%. We limited the posterior probability calculation for each node in the maximum clade credibility tree to those with greater than 0.5 posterior probability. Common ancestor heights were used for all consensus tree node heights. Finally, the consensus tree topology, posterior probability for each node, and theta for each branch were visualized in FigTree (v. 1.4.2).

Although there are methods for inferring the location of the root based on calculating the posterior probability of the root location, these options are not presently available for SNP data. Moreover, using outgroups in SNAPP can create long branches which make parameter estimation difficult and violate the assumptions of the Yule prior used in SNAPP (R. Bouckaert 2014, Personal Communication). Midpoint rooting often results in the root being assigned to the longest branch, which is the case in both of our species.
This may not, however, reflect reality in terms of ancestry. Thus, we have tried to avoid making inferences dependent on the location of the root. It is also possible that phylogeographic patterns in nuDNA trees could represent shared ancestry, admixture (i.e. genetic exchange between populations) or a combination of both. […]
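
The burn-in and posterior-support settings described for the consensus trees can be illustrated with a toy sketch in Python. It is not how TreeAnnotator is implemented and it does not build a maximum clade credibility tree. It only shows, under simplifying assumptions, what discarding 10% burn-in and keeping clades with posterior probability above 0.5 mean. Representing each sampled tree as a set of clades (frozensets of tip labels), the tip labels A to D and all function names are hypothetical.

```python
from collections import Counter


def drop_burn_in(samples, percent=10):
    """Discard the first `percent` of the MCMC samples as burn-in."""
    n_drop = len(samples) * percent // 100
    return samples[n_drop:]


def clade_support(trees):
    """Posterior probability of a clade = share of sampled trees containing it."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))
    return {clade: n / len(trees) for clade, n in counts.items()}


def supported_clades(trees, burn_in_percent=10, threshold=0.5):
    """Clades whose support after burn-in removal exceeds the threshold."""
    kept = drop_burn_in(trees, burn_in_percent)
    return {c: p for c, p in clade_support(kept).items() if p > threshold}


# Hypothetical posterior sample of 20 trees over four tips A, B, C and D.
tree_x = {frozenset("AB"), frozenset("CD")}
tree_y = {frozenset("AC"), frozenset("BD")}
posterior_sample = [tree_y] * 2 + [tree_x] * 15 + [tree_y] * 3

support = supported_clades(posterior_sample)
for clade, prob in support.items():
    print(sorted(clade), round(prob, 3))
```

In this toy sample the first two trees fall in the burn-in, so support is computed from the remaining 18 trees, and only clades present in more than half of them are reported.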

Pipeline specifications

Software tools strataG, adegenet, SNAPP, BEAST, PhyloNet, ScaleHD, Tracer, DensiTree, TreeAnnotator, FigTree
Applications Phylogenetics, Population genetic analysis, GBS analysis, GWAS
Organisms Stenella longirostris, Stenella attenuata