Computational protocol: Evaluation of Genetic Diversity and Development of a Core Collection of Wild Rice (Oryza rufipogon Griff.) Populations in China

Protocol publication

[…] Phenotypic and genotypic data were used to estimate genetic diversity and to construct the core collection. Dendrograms based on SSR markers and on morphological traits were generated with NTSYSpc version 2.10 and SPSS 17.0, respectively. Box plots showing the frequency distribution of ten quantitative traits across the eight populations were drawn with SigmaPlot version 12.5. SigmaPlot 12.5 was also used to plot the qualitative traits (leaf color, flowering rate, and growth habit) against latitude, to assess whether the relationship was linear. A Mantel test in NTSYSpc version 2.10 was used to test for correlation between the percentage of accessions exhibiting a particular qualitative trait (for instance, dark green leaf color, growth habit, or flowering) and geographical location.

Packing diagrams, which illustrate the genetic diversity detected in each population at the SSR loci, were created in Excel. The number of polymorphic alleles, the total number of alleles, the percentage of polymorphic bands (PPB), the average number of alleles (na), and Shannon's information index (I) were estimated using POPGENE. The effective number of alleles at each locus (ne) is also a commonly used parameter for evaluating the genetic diversity of populations. PPB, na, ne, and I were used to analyze the molecular data; these parameters reflect population genetic diversity and support conservation strategies. POPGENE was also used to estimate the total genetic diversity across all wild rice populations (Ht), the mean genetic diversity within populations (Hs), the coefficient of gene differentiation (Gst; the proportion of the total genetic diversity that resides among populations), and gene flow (Nm).
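The differentiation and gene-flow parameters named above can be illustrated with a minimal Python sketch. It assumes the standard definitions ne = 1/Σp², Gst = (Ht − Hs)/Ht, and Nm = 0.5(1 − Gst)/Gst; the protocol itself only states that POPGENE was used, so the function names and the input values below are hypothetical and serve only to show the arithmetic.

```python
def effective_alleles(freqs):
    """Effective number of alleles at one locus, assumed as ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def gst(ht, hs):
    """Coefficient of gene differentiation, assumed as Gst = (Ht - Hs) / Ht."""
    return (ht - hs) / ht


def gene_flow(gst_value):
    """Gene flow estimated from Gst, assumed as Nm = 0.5 * (1 - Gst) / Gst."""
    return 0.5 * (1.0 - gst_value) / gst_value


# Hypothetical inputs, not values from the study.
example_freqs = [0.5, 0.3, 0.2]   # allele frequencies at one SSR locus
example_ht, example_hs = 0.40, 0.30

ne = effective_alleles(example_freqs)
g = gst(example_ht, example_hs)
nm = gene_flow(g)
print(f"ne = {ne:.3f}, Gst = {g:.3f}, Nm = {nm:.3f}")
```

Under these assumed definitions a low Gst yields a high Nm, which is why the two values are usually reported together.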
Shannon's information index (I) was calculated as $I = -\sum_{i=1}^{S} P_i \ln P_i$, where $P_i = n/N$, n is the number of accessions in a population, N is the total number of accessions in all populations, ln is the natural logarithm, and S is the number of populations; the index accounts for the abundance or richness of a species. Nei's gene diversity (h) was calculated as $h = 2n\,(1 - \sum p_i^2)/(2n - 1)$, where n is the number of populations sampled and $p_i$ is the allele frequency at a given locus, to estimate the average genetic diversity per locus within an individual population. The polymorphism information content, $\mathrm{PIC} = 1 - \sum_{i=1}^{l} P_i^2 - \sum_{i=1}^{l-1} \sum_{j=i+1}^{l} 2 P_i^2 P_j^2$, where $P_i$ and $P_j$ are the population frequencies of the i-th and j-th alleles and l is the number of alleles, was estimated using PICcale software (Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, 2007). PIC measures a marker's ability to detect polymorphisms in a population, based on the number of alleles detected and their frequency distribution; hence, it provides an estimate of the discriminating power of a marker.

To assess variation in the morphological traits and in the molecular data, analysis of variance (ANOVA) and analysis of molecular variance (AMOVA) were performed using SPSS 17.0 and ARLEQUIN, respectively. The percentage of variation among and within populations was estimated using AMOVA. Nei's genetic distance and genetic identity, which are useful indexes of similarity or difference among populations, were estimated using POPGENE. The morphological and molecular data were combined to analyze genetic diversity using SPSS 17.0. Principal component analysis (PCA) implemented in SPSS 17.0 was used to analyze the relationships between populations. [...] Three strategies were used to select wild rice accessions for long-term conservation.
(1) Strategy 1 was based on the magnitude of genetic distance, an estimate of the genetic divergence between populations within a species. Distances were determined from two separate phylogenetic trees (one constructed from phenotypic data and one from genotypic data), and accessions with high genetic distance were selected.
(2) In Strategy 2, accessions were grouped by hierarchical cluster analysis based on genetic distance.
(3) In Strategy 3, the core collection was selected from different groups of clusters, and the sampling proportion of the core collection was 10–20% of a population.

A core collection was developed using QGAStation 2.0 (http://ibi.zju.edu.cn/software/qga/), and Venn analysis (http://bioinfogp.cnb.csic.es/tools/venny/) was performed to compare this core collection with the one based on genetic distance. Furthermore, genetic diversity parameters, including number of alleles, na, ne, h, I, and PIC, were used to estimate the representativeness of the core collections. These parameters were compared between each population and its core collection using Student's t-test (χ²). If the p-value was greater than 0.05, the difference between the core collection and the entire population was considered non-significant. PCA was used to identify the major sources of variation between the accessions selected as the core collection and the eight populations. If the core sets were distributed uniformly in the PCA distribution graph, the core collections were considered to maintain a high level of genetic diversity and to be representative of the whole germplasm. […]
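As a rough illustration of how per-locus diversity parameters can be compared between a population and its core collection, the sketch below computes Nei's gene diversity and PIC from the formulas given in the text, plus the share of alleles that a core subset retains. The study used POPGENE, PICcale, and QGAStation for these steps; the function names, the allele-retention measure, and the toy allele lists are assumptions made for this example only.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each distinct allele in a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def nei_gene_diversity(freqs, n):
    """h = 2n (1 - sum p_i^2) / (2n - 1); n is the sample-size term of the formula."""
    return 2 * n * (1 - sum(p * p for p in freqs)) / (2 * n - 1)


def pic(freqs):
    """PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    sq = [p * p for p in freqs]
    cross = sum(2 * sq[i] * sq[j]
                for i in range(len(sq))
                for j in range(i + 1, len(sq)))
    return 1 - sum(sq) - cross


def allele_retention(whole, core):
    """Fraction of the alleles seen in the whole set that also occur in the core set."""
    return len(set(core) & set(whole)) / len(set(whole))


# Toy allele calls at a single SSR locus (hypothetical, not study data).
whole_population = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
core_collection = ["a", "a", "b"]

whole_freqs = allele_frequencies(whole_population)
print("h (whole):", round(nei_gene_diversity(whole_freqs, n=10), 4))
print("PIC (whole):", round(pic(whole_freqs), 4))
print("PIC (core):", round(pic(allele_frequencies(core_collection)), 4))
print("alleles retained:", round(allele_retention(whole_population, core_collection), 4))
```

In practice these values would be computed per locus across all loci, and the two resulting sets of values would then be the inputs to the significance test described above.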

Pipeline specifications

Software tools: NTSYSpc, SigmaPlot, POPGENE, Arlequin, VENNY
Applications: Miscellaneous, Phylogenetics, Population genetic analysis
Organisms: Oryza rufipogon, Oryza sativa