Computational protocol: Genetic diversity in the endangered terrestrial orchid Cypripedium japonicum in East Asia: Insights into population history and implications for conservation

Protocol publication

[…] Visible, clear DNA bands obtained from each ISSR and SCoT primer were scored as absent (0) or present (1). Only bands with lengths of 200–1,200 bp (ISSR) and 400–1,500 bp (SCoT) were scored. Using the program Popgene 1.32, and assuming that populations were in Hardy–Weinberg equilibrium, we estimated the following genetic diversity parameters: percentage of polymorphic bands (PPB), observed number of alleles (Na), number of “private” alleles (PA), average effective number of alleles per locus (Ne), Nei’s gene diversity index (HE), and Shannon’s information index (SI).

To assess the correspondence of genetic diversity between the two datasets (ISSR vs. SCoT), we performed Wilcoxon signed-rank tests on 16 population pairs (population ZP was excluded) for five within-population genetic variation parameters. We further conducted a Spearman’s rank correlation analysis between ISSR and SCoT for each genetic measure; the higher the correlation between ISSR and SCoT for a given measure, the smaller the differences in the measurements between the two datasets. Thus, this correlation analysis can be viewed as an indirect way to check the suitability of the SCoT markers for surveying genetic variability in C. japonicum.

Although we found significant differences in the distribution of values of HE and SI (Mann-Whitney U-test, also called the Wilcoxon rank-sum test, for both ISSR and SCoT, P = 0.000), the rank order of the values of those two genetic parameters was nearly the same (Spearman’s rank correlation analysis for ISSR and SCoT, RS2 = 0.995 and 0.985, respectively, P = 0.000), suggesting that using either HE or SI would be appropriate. We used a Mann-Whitney U-test to assess the significance of differences in HE (a summary statistic of within-population genetic variation) between the pair of regions with the highest and the lowest estimates in China.
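
As a rough illustration of the diversity estimates listed at the start of this protocol, the sketch below derives PPB, Ne, HE, and SI for one population from a 0/1 band matrix. It assumes the usual dominant-marker shortcut under Hardy–Weinberg equilibrium (the null-allele frequency q is the square root of the band-absent frequency). The function name `diversity_stats` and the toy matrix are hypothetical, and Popgene's own output may differ in details such as sample-size corrections.

```python
import math


def diversity_stats(bands):
    """Summarise a 0/1 band matrix (rows = individuals, columns = loci).

    Assumption: each band is a dominant, biallelic locus in Hardy-Weinberg
    equilibrium, so band-absent individuals are recessive homozygotes.
    """
    n_ind = len(bands)
    n_loci = len(bands[0])
    polymorphic = 0
    ne_sum = he_sum = si_sum = 0.0

    for locus in range(n_loci):
        absent = sum(1 for individual in bands if individual[locus] == 0)
        q = math.sqrt(absent / n_ind)  # null ("band absent") allele frequency
        p = 1.0 - q                    # band-present allele frequency

        # A locus counts as polymorphic when both band states are observed.
        if 0 < absent < n_ind:
            polymorphic += 1

        homozygosity = p * p + q * q
        ne_sum += 1.0 / homozygosity   # effective number of alleles
        he_sum += 1.0 - homozygosity   # Nei's gene diversity
        si_sum += -sum(x * math.log(x) for x in (p, q) if x > 0)  # Shannon

    return {
        "PPB": 100.0 * polymorphic / n_loci,
        "Ne": ne_sum / n_loci,
        "HE": he_sum / n_loci,
        "SI": si_sum / n_loci,
    }


# Hypothetical toy data: 4 individuals scored at 3 loci.
toy_population = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(diversity_stats(toy_population))
```

Running the function once per population and per marker system (ISSR, SCoT) would give the paired values that the rank-based tests described above compare.
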
In addition, we conducted a Mann-Whitney U-test to determine whether HE differed significantly between the pair of countries with the highest and the lowest estimates in East Asia.

To estimate the degree of population genetic differentiation, we calculated total genetic diversity (HT), genetic diversity within populations (HS), and genetic differentiation among populations (GST) using Popgene. As the combination of dominant markers and small sample sizes could artificially inflate estimates of genetic structure (e.g., GST), we estimated GST values with and without the four populations with small sample sizes [n < 10; LS (3 individuals), DB (5), Ha (7), and JF (8)] to see how much these small populations influenced our statistics. To test for the influence of individuals within populations, populations within regions (countries), and regions on the observed genetic variation, we conducted a hierarchical analysis of molecular variance (AMOVA) using GenAlEx 6.5. To determine the relative importance of gene flow and random genetic drift (RGD) at the regional scale, we conducted a correlation analysis between pairwise genetic distances and linear geographic distances (km) and ran a Mantel test for isolation by distance (IBD; 999 permutations) using GenAlEx. A significant positive linear relationship suggests that populations are at regional equilibrium between gene flow and RGD. As reproductive barriers would exist between Chinese and Japanese populations due to different chromosome numbers (2n = 22 in China; 2n = 20 in Japan) and geographical isolation, we could be dealing with two cryptic species or subspecies. To avoid such potential “phylogenetic” biases in our genetic results, we performed further analyses of GST and IBD in a hierarchical fashion: within each country separately and then with all possible combinations of country pairs (i.e., “China vs. Korea”, “China vs. Japan”, and “Korea vs.
Japan”).

We conducted an unweighted pair-group method with arithmetic means (UPGMA) cluster analysis based on Nei’s unbiased genetic distances between populations using the Tfpga 1.3 program. Bootstrap values for nodes were estimated based on 999 replications. As a complementary analysis, we performed a principal coordinate analysis (PCoA) with GenAlEx, on the basis of Nei’s genetic distances, to investigate the relationships among the populations; the first two principal coordinates were used to visualize the dispersion of accessions in two dimensions.

To explore population structure in more depth and detect genetic admixture, we used the Bayesian clustering program Structure 2.3.4 to analyze the combined dataset of the two marker types (16 populations; ZP was excluded because it lacked SCoT data). Posterior probabilities of the data for each K were obtained for K = 1 to K = 18 clusters using the Admixture Model. Fifteen runs were completed for each K, with a Markov Chain Monte Carlo (MCMC) of 200,000 iterations, following a burn-in period of 1,000,000 iterations. We inferred the most likely value of K using the ΔK statistic, with the aid of Structure Harvester. Since the ΔK method tends to identify K = 2 as the top level of hierarchical structure, we combined it with the method of selecting the smallest K after the log probability of data [ln Pr(X|K)] values reached a plateau. The programs Clumpp 1.1.2 and Distruct 1.1 were used to combine the results of the 15 replicates of the best K and to visualize the results produced by Clumpp, respectively. […]
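
The ΔK step can be sketched as follows. This is a minimal illustration, not Structure Harvester's code: it assumes ΔK is taken as the absolute second-order difference of the mean ln Pr(X|K) across replicate runs, divided by the sample standard deviation of ln Pr(X|K) at that K. The function name `delta_k` and the log-probability values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of ln Pr(X|K) values from replicate runs.
    Assumption: sample standard deviation across replicates is used.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second-order difference is undefined at the ends
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])
        if spread > 0:  # identical replicates would give a division by zero
            result[k] = second_diff / spread
    return result


# Hypothetical ln Pr(X|K) values from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -892.0, -888.0],
    4: [-888.0, -886.0, -890.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because ΔK cannot be evaluated at the smallest or largest K tested, inspecting where the mean ln Pr(X|K) levels off remains a useful cross-check, as the protocol notes.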

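The isolation-by-distance test described in the protocol can be illustrated with a simple permutation-based Mantel test. This is a sketch under stated assumptions rather than GenAlEx's implementation: it uses Pearson's correlation on the upper triangles of the two distance matrices, permutes population labels in the genetic matrix, and reports a one-tailed P value. The names `mantel`, `upper_triangle`, and `pearson`, and the toy distances, are hypothetical.

```python
import math
import random


def upper_triangle(matrix, order):
    """Return above-diagonal entries with rows and columns taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=0):
    """One-tailed Mantel test for a positive matrix correlation."""
    n = len(genetic)
    identity = list(range(n))
    geo_vec = upper_triangle(geographic, identity)
    r_obs = pearson(upper_triangle(genetic, identity), geo_vec)

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if pearson(upper_triangle(genetic, order), geo_vec) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical example: four populations along a transect (km).
positions = [0.0, 120.0, 300.0, 650.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.001 * d for d in row] for row in geo]  # perfectly distance-dependent
r, p = mantel(gen, geo)
print(r, p)
```

With only a handful of populations the number of distinct permutations is small, so the attainable P values are coarse; this is one reason to interpret within-country tests on few populations cautiously.
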
Pipeline specifications

Software tools: POPGENE, GenAlEx, TFPGA, Structure Harvester, CLUMPP, DISTRUCT
Applications: Phylogenetics, Population genetic analysis
Organisms: Cypripedium japonicum