Computational protocol: A set of multiplex panels of microsatellite markers for rapid molecular characterization of rice accessions

Protocol publication

[…] As mentioned before, most of the analyzed accessions had previously been classified as japonica rice. To confirm this premise, pairwise genetic distances among the 548 accessions were estimated and used to classify the accessions according to their indica or japonica genetic background. Genetic distance values were based on the ratio between the sum of the proportions of common alleles between two accessions (Ps) for all loci and twice the number of tested loci, and were obtained as -ln(Ps) on the web-based Genetic Distance Calculator. The genetic distance diagonal matrix was submitted to clustering analysis following the Neighbor-Joining method, and a genetic distance dendrogram was built using the NTSYSpc version 2.10z software. In addition, bootstrap analysis of the data was performed to estimate the relative probability of inclusion of each accession in the japonica or indica subspecies. The distribution of allelic frequencies for each subspecies (indica and japonica) was taken as the reference "baseline populations". The relative probability of inclusion was estimated using the Whichrun software. The likelihood that an individual accession comes from one of the source populations (indica or japonica) is presumed to be equal to the Hardy-Weinberg frequency of its specific genotype at each locus in each respective source population.
Based on the results of the genetic distance and clustering analysis, the accessions classified as japonica rice were used to evaluate the performance of the three marker panels in comparison to previously reported multiplex marker analyses, using the program PowerMarker v.3.23. Estimates of allele number, observed heterozygosity (Ho), gene diversity under Hardy-Weinberg equilibrium (HWE) and polymorphism information content (PIC) were calculated.
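
The shared-allele distance described above can be sketched in a few lines. This is only an illustration, not the Genetic Distance Calculator's own code: it assumes diploid genotypes stored as pairs of allele sizes, no missing data, and that Ps is the number of shared alleles summed over loci divided by twice the number of loci. The function names are hypothetical.

```python
import math
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Count the alleles (0, 1 or 2) that two diploid genotypes share at one locus."""
    # Multiset intersection, so a homozygote (120, 120) shares only one
    # allele with a heterozygote (120, 124).
    common = Counter(genotype_a) & Counter(genotype_b)
    return sum(common.values())


def shared_allele_distance(accession_a, accession_b):
    """Return -ln(Ps) for two accessions genotyped at the same loci.

    Each accession is a list of (allele1, allele2) tuples, one per locus,
    in the same locus order. Missing data are not handled (assumption).
    """
    if len(accession_a) != len(accession_b) or not accession_a:
        raise ValueError("accessions must share the same non-empty locus list")
    n_loci = len(accession_a)
    total_shared = sum(
        shared_alleles(a, b) for a, b in zip(accession_a, accession_b)
    )
    ps = total_shared / (2 * n_loci)
    # No shared alleles at any locus: the distance is unbounded.
    return math.inf if ps == 0 else -math.log(ps)
```

For example, two accessions that share three of four alleles across two loci have Ps = 0.75, so `shared_allele_distance([(120, 120), (200, 204)], [(120, 124), (200, 204)])` evaluates `-ln(0.75)`. A full pairwise matrix for clustering would call this function on every pair of accessions.
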
Fisher's exact test was applied to individual marker loci to test conformity to HWE expectations. Expected gene diversity was calculated with the unbiased estimator obtained by multiplying the sample expected heterozygosity $1 - \sum_i p_i^2$ by the factor $2n/(2n - 1)$, where $p_i$ is the frequency of the ith allele at each locus and $n$ is the number of analyzed samples. A database of allelic frequencies for all loci was established using PowerMarker v.3.23. The combined efficiency of the panels for questions regarding line discrimination, seed contamination or hybrid origin (paternity analysis) was estimated by parameters such as matching probability and power of exclusion (PE). The matching probability, or probability of identical genotypes, defined as $PI = \sum_i p_i^4 + \sum_i \sum_{j>i} (2 p_i p_j)^2$, was estimated for each selected locus individually and then for all loci at once. The power of exclusion, the probability of excluding a random individual from the population as a potential parent of an offspring based on the genotype of one parent and offspring, was calculated as $PE = \sum_i p_i (1 - p_i)^2 - \frac{1}{2} \sum p_i^2 p_j^2$.
The genetic structure of the germplasm collection was analyzed by contrasting an a priori model of population structure, based on the clusters defined by the genetic distance analysis, with a model with no a priori information, using the software Structure version 2.1. Genetic distance and cluster analysis were initially used as a reference to detect possible signs of structuring and to suggest the potential composition of subpopulations. For comparison purposes, the analyses were performed both on the complete set of 548 accessions and on the set of 485 japonica accessions, using a burn-in period of 20,000 followed by a run length of 200,000 in the model-based program Structure. Five independent runs were performed for each K (the number of groups inferred by Structure), with K values ranging from 1 to 15.
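
The gene diversity and matching probability formulas given earlier in this section translate directly into code. The sketch below is not PowerMarker's implementation: the function names are hypothetical, allele frequencies are assumed to be supplied per locus and to sum to 1, and the combined matching probability is taken as the product of the per-locus values, which assumes independent loci.

```python
from itertools import combinations
from math import prod


def unbiased_gene_diversity(freqs, n):
    """Expected heterozygosity (1 - sum p_i^2) scaled by 2n / (2n - 1).

    freqs: allele frequencies at one locus; n: number of sampled individuals.
    """
    sample_he = 1 - sum(p * p for p in freqs)
    return (2 * n) / (2 * n - 1) * sample_he


def probability_of_identity(freqs):
    """PI = sum_i p_i^4 + sum_{i<j} (2 p_i p_j)^2 at one locus."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum(
        (2 * p_i * p_j) ** 2 for p_i, p_j in combinations(freqs, 2)
    )
    return homozygotes + heterozygotes


def combined_probability_of_identity(freqs_by_locus):
    """Multiply per-locus PI values (assumes loci are independent)."""
    return prod(probability_of_identity(f) for f in freqs_by_locus)
```

With two equally frequent alleles at a locus, the chance that two random individuals share a genotype is 0.375; adding a second, identical and independent locus lowers it to about 0.14, which is why panels of many polymorphic markers discriminate lines so effectively.
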
The model choice criterion to detect the most probable value of K was ΔK, an ad hoc quantity related to the second-order change of the log probability of the data with respect to the number of clusters inferred by Structure. An accession was included in a particular cluster inferred by the program if at least 70% of its genome, as measured by its membership coefficient (ranging from 0 to 1), was estimated to belong to that cluster. Overall FST values for the inferred clusters were calculated using PowerMarker. The correlation between clusters defined by Structure and clusters defined by genetic distance analysis followed by Neighbor-Joining grouping was estimated by Pearson's correlation coefficient.
The extent of genetic differentiation among groups, as defined a priori by the genetic distance and clustering analysis, was also estimated under the premises of the infinite allele model (FST) and under the stepwise mutation model (RST). Analysis of molecular variance (AMOVA) was also employed to evaluate the level of substructuring of the collection using the program PowerMarker.
The majority of the accessions (Additional file ) were collected in five major geographic regions of Brazil (Northern Region, Northeastern Region, Southeastern Region, Southern Region, Mid-Western Region), and a few originated in other countries (International accessions). The correlation between geographic origin and FST values of the collection was analyzed by Pearson's correlation coefficient. […]
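
The ΔK criterion and the 70% membership rule described above can be illustrated as follows. This is a hedged sketch rather than the authors' procedure: it assumes ΔK is computed as the absolute second difference of the mean log probability across runs divided by the standard deviation of the runs at that K, which is one common formulation. The function names and input layout are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute ΔK for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of ln P(D) values from independent runs.
    Assumed formulation: |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd L(K).
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # second difference is undefined at the ends
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # ΔK is undefined when all runs agree exactly
        second_diff = abs(
            mean(lnp_by_k[k + 1]) - 2 * mean(lnp_by_k[k]) + mean(lnp_by_k[k - 1])
        )
        result[k] = second_diff / spread
    return result


def assign_cluster(membership, threshold=0.70):
    """Return the index of the cluster holding at least `threshold` of the
    membership coefficient, or None if the accession counts as admixed."""
    best = max(range(len(membership)), key=membership.__getitem__)
    return best if membership[best] >= threshold else None
```

The most probable K is then the key with the largest value, for instance `max(dk, key=dk.get)` on the dictionary returned by `delta_k`. Note that ΔK cannot be evaluated at the smallest or largest K tested, so a scan from 1 to 15 yields values for K = 2 to 14 only.
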

Pipeline specifications

Software tools: NTSYSpc, PowerMarker
Application: Population genetic analysis
Organisms: Oryza sativa, Oryza sativa Indica Group