Computational protocol: Genetic diversity and structure in hill rice (Oryza sativa L.) landraces from the North-Eastern Himalayas of India

Protocol publication

[…] Summary statistics, including the mean, standard deviation (SD), minimum and maximum values, and coefficient of variation (CV), were calculated for each trait. Trait means were standardized as Z scores for further analyses. Analysis of variance (ANOVA) was used to test, for each trait, whether variation between hill rice cultivar groups was significant. Principal component analysis (PCA) was performed on the correlation matrix of the data to identify the variables contributing most to the total phenotypic variation among the accessions. PCA is one of the simplest and most widely used multivariate methods for visualizing how accessions group together, with the component loadings showing which traits drive that grouping. Ward’s hierarchical clustering was used to examine the relationships among the rice accessions based on the phenotypic data. All of these analyses were performed in IBM SPSS Statistics version 20.0 [].

[…] Population structure of the 64 hill and 15 control rice accessions was examined using the Bayesian model-based approach implemented in STRUCTURE V2.3.4 []. Values of K (the number of clusters) from 1 to 8 were evaluated. Five replicate runs were performed for each K, each with a burn-in period of 5,000 iterations, a run length of 50,000 iterations, and a model allowing for admixture and correlated allele frequencies. The Structure Harvester programme was used to determine the final K value(s) based on both LnP(D) and Evanno’s ΔK []. Ten further runs at K = 2–4 were then performed with a burn-in period of 10,000 and a run length of 100,000. Each accession was assigned to one of the subpopulations (K = 1 to 4) according to its estimated membership coefficients from the run with the lowest likelihood value.

[…] The average number of alleles per locus (AN), major allele frequency (MAF), gene diversity (He), observed heterozygosity (Ho), and polymorphism information content (PIC) were calculated using PowerMarker V3.25 [].
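Two of the calculations above can be sketched in Python: Evanno’s ΔK as tabulated by Structure Harvester from replicate LnP(D) values, and the per-locus diversity statistics reported from PowerMarker. This is a minimal sketch under stated assumptions, not the code of either program. It assumes consecutive integer K values with at least two replicate runs each, and diploid genotypes given as allele pairs. It uses the uncorrected textbook formulas for He and PIC; PowerMarker may apply sample-size corrections, so its values need not match exactly. The function names and input layout are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno's Delta K from replicate STRUCTURE runs (illustrative sketch).

    lnp_by_k: hypothetical layout mapping each K (consecutive integers)
    to the LnP(D) values of its replicate runs (at least two per K).
    Returns {K: Delta K} for every K with a neighbour on both sides.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in ks[1:-1]:
        if k - 1 not in mean_l or k + 1 not in mean_l:
            raise ValueError("K values must be consecutive integers")
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, using the mean LnP(D) per K
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        # Scale by the standard deviation of LnP(D) across replicates at this K
        delta[k] = second_diff / stdev(lnp_by_k[k])
    return delta


def locus_stats(genotypes):
    """Number of alleles, MAF, He, Ho and PIC for one locus (uncorrected).

    genotypes: list of (allele1, allele2) tuples; None marks missing data.
    At least one non-missing genotype is assumed.
    """
    called = [g for g in genotypes if g is not None]
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    # Allele frequencies, largest first, so p[0] is the major allele frequency
    p = sorted((c / total for c in counts.values()), reverse=True)
    he = 1 - sum(x * x for x in p)  # gene diversity (expected heterozygosity)
    # Botstein et al. (1980) PIC: also subtract 2 * p_i^2 * p_j^2 for allele pairs i < j
    pic = he - sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    # Observed heterozygosity: share of called genotypes carrying two different alleles
    ho = sum(1 for a, b in called if a != b) / len(called)
    return {"n_alleles": len(p), "MAF": p[0], "He": he, "Ho": ho, "PIC": pic}
```

Taking the K with the largest ΔK is the usual reading of Evanno’s method, and averaging the per-locus values over all loci gives AN, He, Ho and PIC as whole-dataset means.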
Average allelic richness (Rs) and Wright’s fixation index (Fst) were calculated using FSTAT V2.9.3.2 []. Molecular variance among subpopulations and among accessions within subpopulations was estimated by analysis of molecular variance (AMOVA) in GenAlEx V6.5 []. Separate analyses were conducted with the rice accessions grouped by district, by farmers’ classification, and by STRUCTURE subpopulation. Differences in allele counts among groups with unequal numbers of accessions were tested in R [] using FPTestR, the R version of the FPTest method reported by Fu et al. []. The FPTestR package (communicated for publication) was kindly provided by Dr Yong-Bi Fu, Plant Genetic Resources of Canada, Agriculture and Agri-Food Canada. The allele frequency data from PowerMarker were exported in binary (1/0) format for analysis with NTSYS-pc V2.2 []. A neighbour-joining (NJ) cluster diagram was constructed with the NJOIN sub-programme from a genetic dissimilarity matrix calculated in the SIMGEND sub-programme using the Nei72 coefficient. To summarize the patterns of variation in the multi-locus dataset, principal coordinate analysis (PCoA) was performed in GenAlEx on the genetic distance matrix among the accessions.

[…] Cluster analysis was conducted on the hill and reference accessions to determine how the hill rices group relative to the genetically defined groups of rice (indica, aus, aromatic, tropical japonica and temperate japonica). The reference set also included some ‘admixture’ accessions. For clustering, a pairwise genetic distance matrix was calculated in PowerMarker using the Cavalli-Sforza and Edwards (C.S.) chord distance []. The NJ method was used for phylogenetic reconstruction, and the unrooted NJ tree was visualized using Dendroscope V3 []. Pairwise Fst and allelic differences (FPTest) between the groups were calculated as described above. […]
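The two distance measures named above, Nei’s (1972) standard distance and the chord distance, can be sketched in Python to show what the dissimilarity matrices contain. This is an illustration under stated assumptions, not a reproduction of NTSYS-pc’s SIMGEND or of PowerMarker. Allele frequencies are assumed to be supplied per locus as dictionaries, in the same locus order for both units (for a single inbred accession, its allele has frequency 1). The chord distance uses one common formulation, $D_C = \frac{2}{\pi r}\sum_j \sqrt{2\left(1 - \sum_i \sqrt{x_{ij} y_{ij}}\right)}$ with $r$ loci, which PowerMarker may scale differently. The function names are hypothetical.

```python
import math


def nei_1972_distance(x, y):
    """Nei's (1972) standard genetic distance D = -ln(I) (illustrative sketch).

    x, y: lists with one dict per locus mapping allele -> frequency
    (frequencies at each locus sum to 1; same locus order in x and y).
    """
    n_loci = len(x)
    # J_X, J_Y: mean over loci of the summed squared allele frequencies
    jx = sum(sum(f * f for f in lx.values()) for lx in x) / n_loci
    jy = sum(sum(f * f for f in ly.values()) for ly in y) / n_loci
    # J_XY: mean over loci of the summed products of shared-allele frequencies
    jxy = sum(
        sum(f * ly.get(allele, 0.0) for allele, f in lx.items())
        for lx, ly in zip(x, y)
    ) / n_loci
    # Genetic identity I; clamp at 1 to absorb floating-point rounding
    identity = min(1.0, jxy / math.sqrt(jx * jy))
    # No shared alleles at any locus gives I = 0, i.e. an infinite distance
    return math.inf if identity == 0 else -math.log(identity)


def cs_chord_distance(x, y):
    """Cavalli-Sforza and Edwards chord distance, averaged over loci (sketch)."""
    total = 0.0
    for lx, ly in zip(x, y):
        overlap = sum(math.sqrt(f * ly.get(allele, 0.0)) for allele, f in lx.items())
        # max() guards against a tiny negative value from rounding when overlap ~ 1
        total += math.sqrt(max(0.0, 2.0 * (1.0 - overlap)))
    return 2.0 / (math.pi * len(x)) * total
```

Computing either function for every pair of accessions or groups yields the kind of pairwise distance matrix that the NJ and PCoA steps above take as input.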

Pipeline specifications