Computational protocol: Population genetic structure of Bellamya aeruginosa (Mollusca: Gastropoda: Viviparidae) in China: weak divergence across large geographic distances

Protocol publication

[…] The resulting 116 sequences were aligned with Clustal X (Thompson et al.). Variable sites, parsimony-informative sites, the number of haplotypes, haplotype diversity (h), and nucleotide diversity (π) were calculated with DNASP v. 5 (Librado and Rozas). ARLEQUIN v. 3.5 (Excoffier and Lischer) was used to test for neutrality with Tajima's D (Tajima) and Fu's FS (Fu) statistics. An analysis of molecular variance (AMOVA) was conducted in ARLEQUIN to partition the genetic variance within and among populations. To estimate variance components among river systems, three groups were defined: the Yellow River system (NS and BY), the Plateau Lake group (EH and DC), and the Yangtze River system (all other populations). Pairwise ΦST values were calculated to assess population differentiation. We generated a statistical parsimony haplotype network with a 95% connection limit in TCS v. 1.2.1 (Clement et al.).

We also used the COI data to generate Bayesian skyline plots to explore the demographic history of each population separately and of the combined data set. This method reconstructs past population demography and plots female effective population size (Ne) over time. The appropriate substitution model was determined as HKY+I+G with jModeltest v. 2 (Darriba et al.). We created input files for each population and for all populations combined with BEAUti v. 1.7.4 (part of the BEAST package). Analyses were run in BEAST v. 1.7.4 (Drummond et al.). We used a strict clock with a published substitution rate of 1.32% per million years, estimated for invertebrate COI under the HKY model (Wilke et al.). Each analysis was run for 10 million generations, sampling every 10,000 generations. The Bayesian skyline plots were subsequently generated with Tracer v. 1.5 (Rambaut and Drummond).

We tested for isolation by linear geographic distance across all data.
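
Isolation by distance is usually tested by correlating a matrix of pairwise genetic distances with a matrix of pairwise geographic distances and assessing significance by permutation (a Mantel test). The sketch below illustrates that idea in plain Python. It is not the code of the IBD program; the function names, the one-sided test, and the toy matrices are assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling sites by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=10_000, seed=1):
    """One-sided Mantel test: returns (observed r, permutation P-value).

    Rows and columns of the genetic matrix are permuted together, which keeps
    the dependence structure among pairwise distances intact.
    """
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(upper_triangle(genetic, order), geo) >= r_obs:
            hits += 1
    # Add one to numerator and denominator so the observed value counts as a permutation.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example (hypothetical data): five sites along a line, with genetic
# distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 3.0, 6.0, 10.0]
geo_matrix = [[abs(a - b) for b in positions] for a in positions]
gen_matrix = [[0.01 * d for d in row] for row in geo_matrix]
r, p = mantel_test(gen_matrix, geo_matrix, permutations=999)
print(f"r = {r:.3f}, P = {p:.4f}")
```

In practice the genetic matrix would hold pairwise ΦST or FST values (possibly transformed), and the geographic matrix would hold either linear or stream distances.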
As sampling at different geographic scales may produce artificial isolation-by-distance (IBD) patterns, we also tested for isolation by linear distance and by stream distance within the Yangtze River only, using a Mantel test with 10,000 randomizations as implemented in IBD v. 1.52 (Bohonak). Stream distances between sites were measured along the streams using ArcMap v. 10 (ESRI). [...]

All samples (N = 277) were genotyped at seven microsatellite loci (Table S1). PCR amplifications were performed in a total volume of 10 μL containing 1× PCR buffer (100 mmol/L Tris–HCl, 500 mmol/L KCl), 20–40 ng genomic DNA, 0.5 mmol/L of each primer, 200 mmol/L of each dNTP, 1.5 mmol/L MgCl2, and 0.25 U of Taq DNA polymerase (TaKaRa, Dalian, China). Thermal cycling was performed on a TProfessional Thermocycler (Biometra, Göttingen, Germany) under the following conditions: 94°C for 5 min; 35 cycles of 94°C for 45 sec, annealing at 50–55°C (depending on the marker; details in Table S1) for 30 sec, and 72°C for 30 sec; and a final extension at 72°C for 10 min. Forward primers were 5′-labeled with HEX, ROX, or TAMRA (Table S1). The sizes of the fluorescently labeled PCR products were estimated against an internal size marker (LIZ-500) on an ABI PRISM 3700 sequencer (Applied Biosystems). Fragment lengths were scored using STRAND v. 2.3.48 (UC Davis Veterinary Genetics Laboratory, Davis, CA).

The resulting data were first inspected with MICRO-CHECKER v. 2.2.3 (Van Oosterhout et al.) to test for unexpected mutation steps, large gaps, unusually sized alleles, and the presence of null alleles. The number of alleles (NA), the expected (HE) and observed (HO) heterozygosities, FST, and the inbreeding coefficient FIS (Weir and Cockerham) were estimated with MSA (MICROSATELLITE ANALYSER) v. 3.15 (Dieringer and Schlötterer). GENEPOP v. 4.0.10 (Rousset) was used to measure heterozygote deficiency or excess and to assess deviations from Hardy–Weinberg equilibrium (HWE).
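
To make the per-locus summary statistics concrete, the following minimal sketch computes the number of alleles, observed heterozygosity, and Nei's unbiased expected heterozygosity for one locus in one population. The inbreeding coefficient is shown in its simplest form, 1 - HO/HE, which is a simplification and not the Weir and Cockerham estimator used in the study. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarise one microsatellite locus in one population.

    `genotypes` is a list of (allele1, allele2) tuples for diploid
    individuals; individuals with missing data must be removed beforehand.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two genotyped individuals")
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # number of gene copies sampled
    sum_p2 = sum((count / total) ** 2 for count in allele_counts.values())
    # Nei's unbiased gene diversity: small-sample correction 2n / (2n - 1).
    h_exp = (total / (total - 1)) * (1.0 - sum_p2)
    h_obs = sum(a != b for a, b in genotypes) / n
    # Simplified inbreeding coefficient (assumption: not Weir and Cockerham).
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return {"n_alleles": len(allele_counts), "h_exp": h_exp,
            "h_obs": h_obs, "f_is": f_is}


# Toy genotypes (allele sizes in base pairs, hypothetical).
example = [(100, 100), (100, 104), (104, 104), (100, 104)]
print(locus_summary(example))
```

A positive value of this coefficient indicates fewer heterozygotes than expected, which is the pattern that null alleles or inbreeding would produce.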
P-values were corrected for multiple comparisons by applying a sequential Bonferroni correction (Rice). We tested for recent bottleneck events using BOTTLENECK v. 1.2.02 (Piry et al.) under the two-phased model (TPM), with the proportion of stepwise mutations set to 95% and the variance set to 15. The significance of deviations was tested using the Wilcoxon signed-rank test with 1000 iterations. Further, we used the microsatellite data to estimate effective population sizes (Ne) with LDNe as implemented in NeEstimator v. 2 (Do et al.), considering results for lowest allowed allele frequencies of 0.02 and 0.01.

Pairwise migration rates between the 12 populations were estimated using a maximum-likelihood coalescent approach implemented in MIGRATE v. 3.0 (Beerli and Felsenstein); we estimated Θ and M (immigration rate/mutation rate) based on FST values. We ran 10 short chains with a total of 10,000 genealogy samples and three long chains with 1,000,000 samples, following a burn-in of 10,000 samples; three independent runs were performed. As MIGRATE has been suggested to be vulnerable to violations of the assumption of stable population sizes, we also used the Bayesian approach implemented in BAYESASS 3.0 (Wilson and Rannala) to infer migration rates. The program uses genotypic data and Markov chain Monte Carlo (MCMC) sampling to infer recent patterns of gene flow. We performed five independent analyses. Each run comprised 35,000,000 MCMC iterations, with a burn-in of 3,500,000 iterations, a sampling interval of 3500 iterations, and a random seed. To ensure sufficient mixing of the MCMC chains and to improve coverage of the probability space, we adjusted the acceptance rates for the estimated allele frequencies and inbreeding coefficients by increasing the mixing parameters for the allele frequencies (ΔA) to 0.50 and for the inbreeding coefficient (ΔF) to 0.80.
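
As a quick check on these run settings, the number of posterior samples retained per run follows directly from the chain length, burn-in, and sampling interval. The helper below is a hypothetical illustration of that arithmetic; the exact count written by a given program may differ by one depending on how it counts the first and last samples.

```python
def retained_samples(iterations, burn_in, interval):
    """Approximate number of samples kept after discarding the burn-in."""
    if interval <= 0 or not 0 <= burn_in <= iterations:
        raise ValueError("inconsistent MCMC settings")
    return (iterations - burn_in) // interval


# Settings stated in the text for each of the five migration-rate runs.
run_settings = {"iterations": 35_000_000, "burn_in": 3_500_000, "interval": 3_500}

kept = retained_samples(**run_settings)
burn_in_fraction = run_settings["burn_in"] / run_settings["iterations"]
print(f"{kept} samples retained per run; burn-in is {burn_in_fraction:.0%} of the chain")
```

With these values each run keeps about 9,000 samples after a burn-in of 10% of the chain.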
Mixing and convergence of the MCMC chains were assessed visually using TRACER. From the five independent runs, we chose the run with the lowest Bayesian deviance, calculated from the logProb values using the R function (R Development Core Team) provided by Faubet et al. and as suggested by Meirmans.

An analysis of molecular variance was performed with ARLEQUIN to partition the genetic variance among and within populations, as for the mtDNA data. We also tested for isolation by linear geographic distance and stream distance (see mtDNA data) using Mantel tests with 10,000 randomizations as implemented in IBD v. 1.52 (Bohonak). Population genetic structure was further analyzed using the Bayesian clustering algorithm implemented in STRUCTURE v. 2.3 (Pritchard et al.), which assigns individuals to K clusters. The number of genetic clusters (K) was assessed assuming an admixture ancestry model with correlated allele frequencies; 20 independent runs were performed, each with 500,000 MCMC repetitions and a burn-in of 150,000 iterations. We tested K from 1 to 13 and used the ad hoc statistic ΔK (Evanno et al.) to determine the most likely value of K. Run outputs were aligned with CLUMPP v. 1.1.2 (Jakobsson and Rosenberg), and the aligned data were visualized in DISTRUCT (Rosenberg). […]
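
The ΔK statistic mentioned above compares the second-order change in the mean log-likelihood, ln P(D|K), with its spread across replicate runs. The sketch below is one common way to evaluate it from run means and is offered only as an illustration; the function name and the log-likelihood values are hypothetical, and ΔK is undefined for the smallest and largest K tested.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps each K to the list of ln P(D|K) values from replicate
    runs (at least two runs per K are needed for a standard deviation).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # boundary values of K have no second difference
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined when replicate runs are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / sd
    return delta


# Toy ln P(D|K) values for two replicate runs per K (hypothetical).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -895.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "most likely K:", best_k)
```

Because the statistic needs both neighbours, testing K from 1 to 13 yields ΔK values for K = 2 to 12 only.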
