Computational protocol: Population structure and phylogeography of the Gentoo Penguin (Pygoscelis papua) across the Scotia Arc

Protocol publication

[…] Micro‐checker (Van Oosterhout et al.) was used to test for genotyping errors resulting from null alleles, large allele dropout, and stutter. Standard indices of genetic variability, including observed and expected heterozygosities (H_O and H_E, respectively) and number of alleles, were quantified for each colony at each locus using Arlequin v3.5.1.2 (Excoffier et al.). Linkage disequilibrium was tested using likelihood ratio tests with 10,000 permutations (Slatkin and Excoffier). Expectations for Hardy–Weinberg equilibrium were estimated for each locus and for all loci using exact tests with 1,000,000 steps (Guo and Thompson).

For microsatellites, Arlequin was used to estimate pairwise F_ST values (Weir and Cockerham), and we used the SGoF+ method (Carvajal‐Rodriguez and de Uña‐Alvarez) to correct for multiple hypothesis testing, using the modal method for π_0 estimation and a significance level of 0.05. Arlequin was used to calculate a global F_ST using analysis of molecular variance (AMOVA). Hierarchical F‐statistics were then calculated to search for genetic structure and find the population grouping that maximized the among‐group variation (F_CT) and minimized the variation among populations within groups (F_SC) (Excoffier et al.). Significance of both overall and pairwise F_ST values was computed using 1,000,000 permutations. The frequency of null alleles was estimated according to Brookfield (Brookfield), and FreeNA (Chapuis and Estoup) was used to determine whether null alleles were biasing estimates of population differentiation.

For the mtDNA, we calculated standard molecular diversity indices and pairwise Φ_ST values in Arlequin. Molecular diversity measures and molecular distances were calculated with the Tamura and Nei substitution model and a gamma distribution (with α = 0.066) for rate heterogeneity among sites, as calculated in jModelTest 0.1.1 (Guindon and Gascuel; Posada).
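As an illustration of the microsatellite diversity indices and the null‐allele estimate described above, the following is a minimal Python sketch for a single locus. It is not the Arlequin or Micro‐checker implementation. It assumes that H_E is computed as Nei's unbiased gene diversity and that "according to Brookfield" refers to Brookfield's first estimator, r = (H_E − H_O) / (1 + H_E); the protocol does not state which estimator was used. The function names and the genotype data are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (H_O, H_E) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    H_E is Nei's unbiased gene diversity: 2n/(2n-1) * (1 - sum(p_i^2)).
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies from the 2n gene copies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    copies = 2 * n
    sum_p2 = sum((c / copies) ** 2 for c in counts.values())
    h_exp = (copies / (copies - 1)) * (1.0 - sum_p2)
    return h_obs, h_exp


def brookfield_null_frequency(h_obs, h_exp):
    """Assumed estimator (Brookfield's first): r = (H_E - H_O) / (1 + H_E)."""
    return (h_exp - h_obs) / (1.0 + h_exp)


# Hypothetical genotypes at one locus (allele sizes in bp), not data from the study.
genotypes = [(120, 120), (120, 124), (124, 124), (120, 120), (124, 128), (120, 120)]
h_obs, h_exp = heterozygosity(genotypes)
null_freq = brookfield_null_frequency(h_obs, h_exp)
print(f"H_O={h_obs:.3f}  H_E={h_exp:.3f}  null allele frequency={null_freq:.3f}")
```

A heterozygote deficit (H_O below H_E) gives a positive estimate; whether such a deficit actually reflects null alleles rather than, for example, population substructure is what the FreeNA comparison in the protocol is meant to assess.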
Pairwise Φ_ST values were calculated between all colonies and significance was determined using 10,000 permutations of haplotypes between colonies.

[...] To test for isolation by distance among microsatellite loci, the shortest geographic distance by sea was calculated using Google Earth Pro (Google, Version 7.1.5.1557), and linearized estimates of F_ST were tested for correlation with distance using Mantel's test (Smouse et al.) in R with the vegan package (Oksanen et al.). Statistical significance of correlation coefficients was estimated using 10,000 permutations.

To test for isolation by distance for mitochondrial data, the correlation between these same geographic distances and pairwise Φ_ST values was calculated using Mantel's test with 10,000 permutations in Arlequin.

[...] We explored two approaches to derive population structure from multilocus microsatellite data. First, population structure was analyzed using STRUCTURE (Pritchard et al.). We compared analyses that assumed correlated and uncorrelated allele frequencies, both with and without treating sampling locations as a priori information (Pritchard et al.). For admixture model conditions, α was allowed to vary. The program was run with a burn‐in of 10,000 iterations, followed by 1,000,000 MCMC steps. Each value of K (number of populations) between 1 and 14 was run 10 times, and significance was calculated from the posterior probabilities (Pritchard et al.; Evanno et al.; Falush et al.). The most likely value of K was determined using the delta K values from Structure Harvester (Earl and Vonholdt).

Secondly, to visualize population assignment in a spatial context, we used the GENELAND package within R (Guillot et al.; Guillot). This program incorporates GPS data for each individual (set for each breeding colony sampled) and multilocus genotype data to estimate the number of populations and the geographic boundaries between the inferred clusters.
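The isolation‐by‐distance test described above can be sketched in plain Python as follows. This is not the vegan or Arlequin code used in the protocol. It assumes the common linearization F_ST / (1 − F_ST), a Pearson correlation, and a one‐sided permutation p‐value of the form (hits + 1) / (permutations + 1). The function names and both matrices are hypothetical.

```python
import random
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally relabelling rows and columns by `order`."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def linearize_fst(fst):
    """Assumed linearization: F_ST / (1 - F_ST), applied element-wise."""
    return [[f / (1.0 - f) for f in row] for row in fst]


def mantel_test(x, y, permutations=10_000, seed=1):
    """Return (r, p): Mantel correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    x_vec = upper_triangle(x)
    r_obs = pearson(x_vec, upper_triangle(y))
    order = list(range(len(y)))
    hits = 0
    for _ in range(permutations):
        # Permute colony labels of one matrix, rows and columns together.
        rng.shuffle(order)
        if pearson(x_vec, upper_triangle(y, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical sea distances (km) and pairwise F_ST for four colonies.
sea_km = [[0, 1400, 2300, 2600],
          [1400, 0, 1000, 1500],
          [2300, 1000, 0, 900],
          [2600, 1500, 900, 0]]
fst = [[0.00, 0.12, 0.15, 0.16],
       [0.12, 0.00, 0.02, 0.03],
       [0.15, 0.02, 0.00, 0.01],
       [0.16, 0.03, 0.01, 0.00]]

r, p = mantel_test(sea_km, linearize_fst(fst), permutations=10_000)
print(f"Mantel r={r:.3f}, p={p:.4f}")
```

Rows and columns are permuted together because the pairwise entries of a distance matrix are not independent observations; shuffling individual cells would give an invalid null distribution.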
We set the number of populations from 1 to 14, varying the initial population (prior) from 1 to 14 for 1,000,000 MCMC iterations using the spatial model, testing both the correlated and uncorrelated allele frequency models.

In addition, in order to verify the presence of any confounding signal from subspecies differentiation, to test for hierarchical population structure, and to detect fine‐scale structure in a highly sampled geographic area, all analyses were repeated for the 10 Falkland Island colonies alone, and for colonies south of the Polar Front.

[...] We estimated the ancestral locations of Gentoo penguins using a Bayesian discrete phylogeographic approach (Lemey et al.) with BEAST v1.8.1 (Drummond et al.). We used the mtDNA data (HVR1 region, 320 bp) for 259 penguins. To select an appropriate model of nucleotide substitution, jModelTest v2.1.6 was used (Darriba et al.). We evaluated the likelihood scores for 24 substitution models, and then used the Bayesian information criterion to select the model. There were two models in the 95% confidence interval (K80 + I + G and HKY + I + G), and we used the HKY + I + G in subsequent Bayesian phylogeographic analyses. We assigned each penguin to one of five island populations: Bird Island, South Georgia (n = 38); Falklands (n = 101); King George, South Shetland Islands (n = 41); Port Lockroy, Antarctic Peninsula (n = 37); and Signy Island, South Orkney Islands (n = 42). We modeled island location as a discrete trait using a symmetric substitution model with the Bayesian stochastic search variable selection (BSSVS) procedure, and we reconstructed ancestral states for all ancestors. We set the clock model for the mtDNA data to a strict molecular clock.
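The model‐selection step above (ranking substitution models by the Bayesian information criterion and reporting those inside a 95% confidence set) can be illustrated with a short Python sketch. It is not jModelTest output. It assumes BIC = k·ln(n) − 2·lnL with the alignment length (320 bp, from the protocol) as the sample size n, and it builds the confidence set from cumulative BIC weights. The log‐likelihoods are hypothetical placeholders chosen only to make the example run, and the parameter counts cover substitution‐model parameters only.

```python
from math import exp, log


def bic(log_likelihood, n_params, sample_size):
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * log(sample_size) - 2.0 * log_likelihood


def bic_weights(scores):
    """Convert BIC scores to normalized weights, exp(-delta/2) / sum."""
    best = min(scores.values())
    relative = {model: exp(-(score - best) / 2.0) for model, score in scores.items()}
    total = sum(relative.values())
    return {model: value / total for model, value in relative.items()}


def confidence_set(weights, level=0.95):
    """Models, best first, whose cumulative weight first reaches `level`."""
    chosen, cumulative = [], 0.0
    for model, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen


ALIGNMENT_LENGTH = 320  # HVR1 alignment length given in the protocol

# Hypothetical (lnL, free substitution-model parameters) per model.
# Branch lengths are omitted: they add the same constant to every model's BIC.
fits = {
    "JC": (-1560.0, 0),
    "K80+I+G": (-1502.0, 3),
    "HKY+I+G": (-1494.0, 6),
}

scores = {m: bic(lnl, k, ALIGNMENT_LENGTH) for m, (lnl, k) in fits.items()}
weights = bic_weights(scores)
selected = confidence_set(weights)
print(selected)
```

With these placeholder likelihoods, two models share nearly all of the weight, which mirrors the situation described in the protocol where two models fell inside the 95% interval and one was then chosen for the downstream analysis.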
We used a coalescent tree prior with constant population size and used a normally distributed prior for the mtDNA clock rate with a mean of 0.55 and standard deviation of 0.15, based on previous calculations for the mitochondrial mutation rate in the sister species Pygoscelis adeliae (Millar et al.). As our focus is the tree topology and the locations of ancestral populations, and not the time to the most recent common ancestor, we show a single mutation rate. However, see Clucas et al. for a greater discussion on the node ages assessed using multiple rates. The prior for locations used the approximate continuous time Markov chain rate reference prior (Ferreira and Suchard). We ran the analysis for 10 million generations, sampling states every 10,000 steps. We repeated the analysis four times, checked for convergence in Tracer (Rambaut et al.), and then combined the four runs using LogCombiner. We obtained a maximum clade credibility tree (MCC tree) using Tree Annotator v1.8.1.

[...] Finally, we used the microsatellite data to assign individuals to populations to determine whether there were any recent migrants within the populations that we had sampled. Assignment tests were run in Genodive v2.0b27 (Meirmans & Van Tienderen). Allele frequencies that were found to be equal to zero were replaced with 0.005; 50,000 permutations of the Monte Carlo test were performed to determine the null distribution of likelihood values and the significance threshold was chosen to be 0.002 (Paetkau et al.). The test statistic used was the Home Likelihood (Lh), as we had not sampled all possible source locations for migrants. […]
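The assignment test above can be illustrated with a simplified Python sketch of a home‐likelihood calculation. It is not Genodive's implementation. It assumes Hardy–Weinberg genotype probabilities within the home population, independent loci, and a null distribution built by drawing alleles at random from the home population's allele frequencies, which is a simplification of the Monte Carlo procedure cited in the protocol. Only the zero‐frequency replacement (0.005), the number of permutations (50,000), and the threshold (0.002) come from the protocol; all names and data are hypothetical.

```python
import random
from math import log

ZERO_REPLACEMENT = 0.005  # replacement for zero allele frequencies (from the protocol)
ALPHA = 0.002             # significance threshold (from the protocol)


def home_log_likelihood(genotype, freqs):
    """Log-likelihood of a multilocus genotype in its home population.

    genotype: list of (allele_a, allele_b), one tuple per locus.
    freqs: list of {allele: frequency} dicts, one per locus.
    """
    total = 0.0
    for (a, b), locus in zip(genotype, freqs):
        # Missing or zero-frequency alleles get the replacement value.
        pa = locus.get(a, 0.0) or ZERO_REPLACEMENT
        pb = locus.get(b, 0.0) or ZERO_REPLACEMENT
        total += log(pa * pa if a == b else 2.0 * pa * pb)
    return total


def simulate_genotype(freqs, rng):
    """Draw two alleles per locus at random from the home allele frequencies."""
    genotype = []
    for locus in freqs:
        alleles = list(locus)
        a, b = rng.choices(alleles, weights=[locus[x] for x in alleles], k=2)
        genotype.append((a, b))
    return genotype


def home_likelihood_p_value(genotype, freqs, permutations=50_000, seed=1):
    """Share of simulated resident genotypes that are at least as unlikely."""
    rng = random.Random(seed)
    observed = home_log_likelihood(genotype, freqs)
    as_low = sum(
        home_log_likelihood(simulate_genotype(freqs, rng), freqs) <= observed
        for _ in range(permutations)
    )
    return as_low / permutations


# Hypothetical allele frequencies at two loci in one colony.
home_freqs = [{120: 0.6, 124: 0.4}, {200: 0.7, 204: 0.3}]
resident = [(120, 124), (200, 200)]
outlier = [(128, 128), (208, 208)]  # carries alleles absent from the colony

# Fewer permutations than the protocol's 50,000, to keep the example quick.
p_resident = home_likelihood_p_value(resident, home_freqs, permutations=5_000)
p_outlier = home_likelihood_p_value(outlier, home_freqs, permutations=5_000)
print(p_resident, p_outlier, p_outlier < ALPHA)
```

In this sketch an individual whose p‐value falls below the threshold would be flagged as a possible migrant. Using only the home population's frequencies matches the rationale given in the protocol: the statistic does not require that every potential source population has been sampled.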

Pipeline specifications

Software tools Arlequin, jModelTest, vegan, Structure Harvester, BEAST, Genodive
Applications Phylogenetics, Population genetic analysis
Organisms Pygoscelis papua