Computational protocol: A New Phylogeographic Pattern of Endemic Bufo bankorensis in Taiwan Island Is Attributed to the Genetic Variation of Populations

Similar protocols

Protocol publication

[…] Partial sequences of the mtDNA D-loop gene were aligned using the program Clustal X v1.81 and optimized manually. Firstly, 80 mtDNA D-loop (348 bp) haplotypes of this study with 38 sequences of B. gargarizans in various regions of mainland China were retrieved from NCBI GenBank (, ). One closely related species Bufo tibetanus as an outgroup was also obtained from GenBank (UX878885.1). The haplotype genealogy of B. bankorensis was separately reconstructed by maximum likelihood (ML) tree, neighbor-joining (NJ) tree and Bayesian analysis using PhyML 3.0 , MEGA v5.0 and MrBayes v3.1.2 . Neighbor-joining tree nodes and branch lengths were statistically tested using a bootstrap method of 10,000 replicates and an interior branch test, respectively. The jModelTest program was used to determine the most appropriate model for the analyses using the Akaike Information Criterion (AIC). Markov Chain Monte Carlo (MCMC) simulations were run for 5,000,000 generations with trees sampled every 1000 generations. Then Bayesian posterior probabilities were estimated after omitting the initial 1,000,000 generations. We sampled a tree every 100 generations and calculated a consensus topology for 7500 trees by omitting the first 2500 trees.The second dataset was limited to the samples collected of this study. Partial sequences (564 bp) of 279 individuals of B. bankorensis from 27 populations in Taiwan and 4 individuals of B. gargarizans from Shanghai (mainland China) were analyzed. The numbers of haplotypes (N) and the values of haplotype diversity (h; ) and nucleotide diversity (π; ) were calculated using the DnaSP v5.0 software (). The numbers of mutations between DNA haplotypes calculated in pairwise comparisons with MEGA v5.0 were employed to construct a minimum spanning network with the aid of MINSPNET . Two measures of population differentiation, G ST , which only considers haplotype frequencies, and N ST , which considers similarities between haplotypes in addition to their frequencies, were compared to infer phylogeographic structure. A greater N ST means that more closely related haplotypes occur in the same population, indicating the existence of phylogeographic structure at this scale . The existence of phylogeographic structure was tested following Pons and Petit by calculating two measures of genetic differentiation: G ST and N ST. The global N ST (0.792) was significantly higher than the global G ST (0.312), providing evidence of the existence of phylogeographic structure because closely related haplotypes would be detected more frequently than that are less closely related in the same area . The determination of G ST and N ST was carried out using DnaSP v5.0 . [...] To analyze the population demographic history of B. bankorensis, Tajima's D statistic and Fu's Fs test of neutrality, the frequency distribution of pairwise differences between mtDNA haplotypes (mismatch distribution), and Bayesian skyline plots (BSP) were examined. The significances of Fu's Fs and Tajima's D values were evaluated using the coalescent algorithm in DnaSP v5.0 . Evidence of population expansion within the lineages was obtained by determining mismatch distributions, implemented in DnaSP v5.0 . This method is based on the premise that compared with a constant population size, population growth or decline leaves a distinctive signature in DNA sequences. A smooth and often unimodal pattern of the mismatch distribution (i.e., a large number of closely related haplotypes, indicating non-equilibrium conditions) reflects population expansion, whereas for stationary populations, the distribution is ragged and often multimodal (i.e., neutral, under equilibrium conditions) . Furthermore, Harpending's raggedness index (Rg) and the sum of squared deviations (SSD) were calculated using Arlequin v3.5 to test whether the sequence data deviated significantly from the expectations of a population expansion model. Finally, for each BSP, the appropriate model of nucleotide substitution was determined using jModelTest. Genealogies and model parameters for each lineage were sampled every 1000th iteration for 20 million generations under a strict molecular clock with uniformly distributed priors and a pre burn-in of 2000. The effective sample size (ESS) for each of the Bayesian skyline analyses was greater than 200, suggesting that the 50 million generations were sufficient to estimate the demographic history for each lineage. This coalescent-based approach estimates the posterior distribution for the effective population size at intervals along a phylogeny, thereby allowing the inference of population fluctuations over time. These analyses were run for 1,000,000 generations, discarding the burn-in period. Plots for each analysis were drawn using Tracer v1.5 . [...] The program Arlequin v3.5 was employed to estimate the F ST values and their statistical significance between population pairs, i.e., the significance of population differentiation, with the following settings: 1000 permutations for significance and 10,000 steps in the Markov chain. When multiple comparisons were performed, the P values were adjusted using the sequential Bonferroni procedure . A hierarchical analysis of molecular variance (AMOVA) was conducted using Arlequin v3.5 to evaluate the most probable population configuration and geographic subdivision. The samples collected from different localities were grouped into 2 and 5 groups, according to the different geographic hierarchies that matched the geographic proximity. Samples collected from different localities were grouped together as follows to investigate the potential effects of various geographic barriers: (1) in two independent western and eastern groups (1–19; 20–27), which were primarily divided by the CMRs; and (2) in five groups (1–5; 6–12; 13–18; 19; 20–27) according to the major biogeographic zones in Taiwan based on the freshwater fish fauna and other studies in frogs . The Spatial Analysis of Molecular Variance (SAMOVA) 1.0 program implements a simulated annealing procedure to define groups of geographically homogeneous populations that maximize the proportion of total genetic variance due to differences among population groups (F CT). These analyses were performed based on 1000 simulated annealing steps, and comparisons with the maximum indicators of differentiation (F CT values) were conducted when the program was instructed to identify K  =  2–10 partitions for each sampling area within each analysis. The Mantel test was performed using the computer program Alleles in Space (AIS) to identify the correlations between the genetic and geographical distances among populations. Statistical significance was tested using 10,000 random permutations. Geographical distances were calculated based on the latitude and longitude of each population's collection location (). Finally, the genetic landscape shape of B. bankorensis was calculated using AIS. Using geographic data (projected in Universal Transverse Mercator) and the genetic distance scores calculated in AIS, a three-dimensional landscape, with genetic distance indicated by the z dimension was created. The output 3-dimensional values were projected onto a map of Taiwan Island to perform comparisons between the topographic and simulated genetic landscape.Asymmetrical gene flow was observed among three regions: the western region, including localities 1–18; the eastern region, including localities 20–27; and the southern region, including locality 19. To estimate contemporary and historical gene flow among the three regions, maximum likelihood inference was performed using MIGRATE software, v 3.2.6 , . The MIGRATE analyses were conducted using a full migration model (θ and M were estimated jointly from the data), which was compared with a restricted model (θ was averaged, and M was symmetrical between populations). The MIGRATE analyses were run five times to generate replicates using ten short chains (sampling 10,000 trees) and three long chains (sampling 100,000 trees), with a burn-in period of 10,000 trees, using an adaptive heating scheme. A migration matrix model with unequal population sizes and different migration rates was assumed . [...] The software BEAST v1.7.5 was used to estimate the time to the most recent common ancestor (TMRCA) of the major mitochondrial lineages under an MCMC Bayesian approach by first dataset. Based on multiple fossil data and sequence comparisons, a divergence rate of approximately 3.50% per million years , was estimated for the D-loop of Bufo mtDNA, and absolute TMRCA values were subsequently obtained , . All analyses were performed using the HKY model of nucleotide substitution for the best-fitting model using jModelTest. Two independent Monte Carlo Markov chains were run for 100 million generations, with sampling every 10,000 generations and 10% burn-in of the posterior samples. The effective sampling size (ESS) parameter was found to exceed 200, which suggests acceptable mixing and sufficient sampling. The analysis was run five times to test the stability and convergence of the MCMC chains in plots of posterior log likelihoods in Tracer v1.5 . The posterior samples from all runs were combined and analyzed in Tracer v1.5 to obtain mean estimates and the 95% highest posterior densities (HPD) of the TMRCA values. […]

Pipeline specifications

Software tools Clustal W, PhyML, MEGA-V, MrBayes, jModelTest, DnaSP, Arlequin, SAMOVA, AIS, BEAST
Applications Phylogenetics, Population genetic analysis, GWAS
Organisms Bufo gargarizans