Computational protocol: Evidence for Introduction Bottleneck and Extensive Inter-Gene Pool (Mesoamerica x Andes) Hybridization in the European Common Bean (Phaseolus vulgaris L.) Germplasm

Protocol publication

[…] For each nuSSR locus, the total number of alleles detected, the gene diversity or unbiased expected heterozygosity (He []), and the polymorphism information content (PIC) were calculated using the program Power Marker 3.25 [].

The genetic diversity within continents (America and Europe), within the two gene pools (Andean and Mesoamerican), and within gene pools within continents (America Andean, America Mesoamerican, Europe Andean, and Europe Mesoamerican) was evaluated in terms of the number of alleles per locus (Na), the Shannon diversity index (I), and the gene diversity or unbiased expected heterozygosity (He []). The number of private alleles was also computed, using a threshold frequency of 5% to reduce the effects of sampling error []. All these indices were calculated using GenAlEx 6 []. As the number of alleles in a sample is highly dependent on the sample size, we also computed the allelic richness (Rs) using the generalized rarefaction method as implemented in HP-RARE []. HP-RARE uses the rarefaction approach of Kalinowski [] to trim unequal numbers of accessions to the same standardized sample size, equal to the smallest sample size across the populations. The relative loss of diversity in terms of alleles (ΔRs) and genetic diversity (ΔHe) was calculated according to Vigouroux et al. []. Differences between populations in the gene diversity estimates were assessed for significance using Wilcoxon’s signed-rank test as implemented in the software StatistiXL.

To further investigate the genetic relationships between all pairs of accessions, an individual-by-individual (N x N) genetic distance matrix was computed and subsequently used as input for principal coordinate analysis (PCoA) in the GenAlEx 6 program [].

Pairwise FST metrics were calculated in GenAlEx 6 to estimate the divergence between groups according to the formula of Weir and Cockerham [].
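The per-locus diversity statistics named above (Na, He, I, PIC) can be illustrated with a minimal sketch. It uses the standard textbook formulas applied to a flat list of allele calls at one locus; the function name `locus_diversity` and the input layout are assumptions made for illustration, and the results are not guaranteed to match the output of Power Marker or GenAlEx, which may apply different sample-size corrections.

```python
import math
from collections import Counter


def locus_diversity(alleles):
    """Summarize one locus from a flat list of sampled allele calls.

    `alleles` holds one entry per sampled gene copy (hypothetical input
    layout, not the file format of any of the programs cited in the text).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")

    # Allele frequencies at this locus.
    freqs = [count / n for count in Counter(alleles).values()]
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p4 = sum(p ** 4 for p in freqs)

    return {
        # Number of distinct alleles observed.
        "Na": len(freqs),
        # Unbiased expected heterozygosity: n/(n-1) * (1 - sum p_i^2).
        "He": (n / (n - 1)) * (1.0 - sum_p2),
        # Shannon diversity index: -sum p_i ln p_i.
        "I": -sum(p * math.log(p) for p in freqs),
        # PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2,
        # using sum_{i != j} p_i^2 p_j^2 = (sum p_i^2)^2 - sum p_i^4.
        "PIC": 1.0 - sum_p2 - (sum_p2 ** 2 - sum_p4),
    }
```

For example, `locus_diversity([150, 150, 152, 156])` would summarize a locus at which four gene copies carry three distinct allele sizes (the sizes are made-up values).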
The value of FST varies from zero to one; when FST = 0, the groups are identical, while when FST = 1, they are completely differentiated, with different alleles fixed in each group.

To assess the distribution of genetic variation in the nuSSR dataset, a hierarchical analysis of molecular variance (AMOVA) was also performed, using GenAlEx 6 []. This analysis allowed the total nuSSR variation to be partitioned into within-group and among-group variance components, and provided measures of intergroup genetic distance as the proportion of the total nuSSR variation residing between any two groups (Phi statistics []). Genetic variation was partitioned into three levels: between continents (America and Europe), between gene pools (Mesoamerican and Andean) within continent, and within gene pool within continent. The significance of the variance components and of the differentiation statistics was tested by nonparametric randomization tests using 10,000 permutations.

A Bayesian clustering approach, implemented in STRUCTURE 2.2 [], was adopted first to assess the number of meaningful populations (K) and second to identify putative inter-gene pool hybrids within our collections, with no “a priori” information other than nuSSR genotype data. The STRUCTURE program was run with the number of populations (K) set from one to ten. Twenty independent simulations were performed for each K setting using the admixture model, each with a burn-in period of 5,000 and 50,000 Markov chain Monte Carlo (MCMC) repetitions. To determine the optimal number of clusters, STRUCTURE HARVESTER [], available at http: //taylor0.biology., was used to calculate the ΔK statistical test [], in combination with the likelihoods (posterior probabilities) of each preset K. For each K, the simulation with the highest likelihood was used to assign accessions to populations. Following the recommendation of Pritchard et al. [], and previous analyses in common bean [,], accessions with a population membership coefficient lower than 0.8 were identified as putative hybrids. A STRUCTURE graphical bar plot of membership coefficients was generated using Microsoft Excel.

Hybridization between the Mesoamerican and Andean gene pools in Europe and America was also investigated by combining the information provided by chloroplast (cpSSR) and nuclear (phaseolin, Pv-shatterproof1) markers with the Bayesian assignments based on nuSSRs. Genotypes were classified as hybrids if the chloroplast marker or either of the two nuclear markers did not agree with the STRUCTURE gene pool assignment (e.g. a genotype was attributed to the Andean gene pool but had the Mesoamerican “S” phaseolin type). To validate the levels of genetic admixture in the common bean, we then compared our results with hybrid identification according to [], in which hybrids are identified as recombinants for chloroplast (cpSSR) and nuclear (phaseolin, Pv-shatterproof1) markers. In this approach, if the patterns of variation in the chloroplast and nuclear markers were concordant (i.e. an individual was attributed to the Andean gene pool, or to the Mesoamerican gene pool, by all of these marker types), the accession was classified as “pure”. Conversely, accessions with a mismatch between their chloroplast and nuclear polymorphisms were classified as putative hybrids. […]
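The two hybrid-identification rules described above can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout, and the label strings are hypothetical, and the only values taken from the text are the 0.8 membership threshold and the three marker types (cpSSR, phaseolin, Pv-shatterproof1).

```python
from typing import Dict


def structure_assignment(q: Dict[str, float], threshold: float = 0.8) -> str:
    """Gene pool with the largest membership coefficient, or 'hybrid'.

    An accession whose highest membership coefficient is lower than the
    threshold is treated as a putative hybrid.
    """
    pool, best = max(q.items(), key=lambda item: item[1])
    return pool if best >= threshold else "hybrid"


def combined_call(q: Dict[str, float], markers: Dict[str, str],
                  threshold: float = 0.8) -> str:
    """First rule: nuSSR-based assignment checked against the other markers.

    `markers` maps each marker name to the gene pool it indicates
    (hypothetical layout). Any disagreement with the STRUCTURE
    assignment yields 'hybrid'.
    """
    pool = structure_assignment(q, threshold)
    if pool == "hybrid":
        return "hybrid"
    if any(marker_pool != pool for marker_pool in markers.values()):
        return "hybrid"
    return pool


def marker_only_call(markers: Dict[str, str]) -> str:
    """Second rule: chloroplast and nuclear markers alone.

    Concordant markers give a 'pure' accession, reported here by its
    gene pool name; any mismatch gives 'hybrid'.
    """
    pools = set(markers.values())
    return pools.pop() if len(pools) == 1 else "hybrid"
```

A usage example with made-up values: an accession with `q = {"Andean": 0.95, "Mesoamerican": 0.05}` and markers `{"cpSSR": "Andean", "phaseolin": "Mesoamerican", "Pv-shatterproof1": "Andean"}` is called a hybrid by both `combined_call` and `marker_only_call`, because the phaseolin type disagrees with the other evidence.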

Pipeline specifications

Software tools: GenAlEx, Structure Harvester
Application: Population genetic analysis
Organisms: Phaseolus vulgaris