Computational protocol: Marker-Trait Association for Biomass Yield of Potential Bio-fuel Feedstock Miscanthus sinensis from Southwest China

Protocol publication

[…] The alleles of the molecular markers were scored manually across the population as band presence (1) or absence (0), and each band was treated as an independent character regardless of its intensity. A presence/absence data matrix was constructed to analyze genetic diversity and population structure. The discriminatory power of the different primers was evaluated by the polymorphic information content (PIC), calculated as PIC_i = 2f_i(1 − f_i) (Roldan-Ruiz et al., ). Here, PIC_i is the polymorphic information content of marker i, f_i is the frequency of the amplified allele (band present), and 1 − f_i is the frequency of the null allele.

Population structure (Q) of the 138 M. sinensis individuals was assessed using the model-based clustering approach implemented in STRUCTURE v2.3.4 (Pritchard et al., ) with the “admixture model,” a burn-in period of 100,000 iterations, and 100,000 Markov Chain Monte Carlo (MCMC) replications after burn-in. For each value of K, 20 independent runs of STRUCTURE were performed, with the number of clusters (K) varying from 1 to 10. Maximum likelihood and delta K (ΔK) tests were used to determine the optimum number of subgroups (Evanno et al., ). For clustering analysis, the similarity coefficients were used to construct an unweighted pair group method with arithmetic means (UPGMA) dendrogram using the sequential agglomerative hierarchical and nested clustering (SAHN) module in NTSYS-pc version 2.10. Analysis of molecular variance (AMOVA) was used to calculate variation among and within populations using GenAlEx ver. 6.41 (Peakall and Smouse, ).

All genetic diversity indices were calculated using PopGen32 v.1.31, assuming Hardy-Weinberg equilibrium; genetic diversity was evaluated with two parameters: Nei's () gene diversity (H) and Shannon's Information Index of Diversity (I).
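The PIC formula above can be illustrated with a minimal sketch. It assumes the band scores are stored as a list of rows (one per individual) holding 0/1 values per marker, with no missing data. The function name `pic_per_marker` and the toy matrix are hypothetical and are not taken from the protocol.

```python
def pic_per_marker(matrix):
    """Return PIC_i = 2 * f_i * (1 - f_i) for each marker (column).

    matrix: list of rows, one per individual; each row is a list of
    0/1 band scores (1 = band present, 0 = band absent).
    Assumes no missing data (an assumption of this sketch).
    """
    n_individuals = len(matrix)
    n_markers = len(matrix[0])
    pics = []
    for j in range(n_markers):
        # f is the frequency of the amplified allele (band present).
        f = sum(row[j] for row in matrix) / n_individuals
        pics.append(2 * f * (1 - f))
    return pics


# Toy data: 4 individuals x 3 markers (illustrative values only).
scores = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
pic = pic_per_marker(scores)
print(pic)
```

Under this formula PIC reaches its maximum of 0.5 when a band is present in exactly half of the individuals, and falls to 0 for a band that is present in all or none of them.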
The total gene diversity (HT) was partitioned into gene diversity within populations (HS) and gene diversity among populations (DST), according to the equation HT = HS + DST. The genetic differentiation coefficient (GST) was calculated as the ratio DST/HT and was used to measure population differentiation. Gene flow was calculated as Nm = 0.5(1 − GST)/GST to estimate the level of gene drift among the populations (Slatkin and Barton, ). [...] Markers with a minor allele frequency below 5% were removed to reduce false-positive associations. Relative kinship (K) among samples was calculated with TASSEL 2.1. Marker-trait association analysis was conducted with TASSEL 2.1 to reveal associations between the traits of interest and marker alleles, using the General Linear Model (GLM) and Mixed Linear Model (MLM) procedures (Bradbury et al., ) to control for population structure and relative kinship. Four models were compared using quantile-quantile (QQ) plots to identify the one that best fit the biomass-related traits for association mapping in the M. sinensis populations: the simple linear model, the Q model (population structure results generated by STRUCTURE, included as fixed effects), the K model (relative kinship results generated by TASSEL, included as fixed effects), and the Q+K model. Two thresholds for significant associations were tested in our study. First, the significance threshold for associations between loci and traits was set at P < 0.001. Second, the Bonferroni correction for multiple testing (P < 0.05/934 ≈ 5.35 × 10⁻⁵) was performed based on q-value using false discovery rate (FDR, αc = 0.05). The phenotypic variation explained by a single associated marker (R²) indicated the fixed marker effects. [...] Genome-wide prediction was carried out using the R package rrBLUP (Endelman, ) with ridge regression.
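The differentiation statistics and the Bonferroni threshold quoted above reduce to simple arithmetic, sketched below. The function names and the HT and HS inputs are hypothetical illustrations, not values from the study. Only the 0.05/934 threshold comes from the text.

```python
def differentiation(ht, hs):
    """Compute DST, GST and Nm from total (HT) and within-population (HS)
    gene diversity, following HT = HS + DST, GST = DST / HT and
    Nm = 0.5 * (1 - GST) / GST.

    Raises ZeroDivisionError when HT == 0 or GST == 0 (no differentiation).
    """
    dst = ht - hs
    gst = dst / ht
    nm = 0.5 * (1 - gst) / gst
    return dst, gst, nm


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical diversity values, chosen only to make the arithmetic visible.
dst, gst, nm = differentiation(ht=0.30, hs=0.24)
print(f"DST={dst:.3f}  GST={gst:.3f}  Nm={nm:.3f}")

# Threshold from the text: alpha = 0.05 across 934 tests.
threshold = bonferroni_threshold(0.05, 934)
print(f"Bonferroni threshold = {threshold:.3e}")
```

With these hypothetical inputs GST is 0.2 and Nm is 2.0, and the threshold agrees with the value of about 5.35 × 10⁻⁵ given in the text.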
The average correlation between the phenotypic values predicted from marker data and the original phenotypic values from the field trial was used as the criterion of genomic prediction accuracy. The accuracy (Pearson's correlation coefficient) was calculated with the recommended 10-fold cross-validation, repeated 100 times (Slavov et al., ). The adjusted prediction accuracy was calculated by dividing the accuracy by the square root of the broad-sense heritability (h²), where h² was calculated using PROC MIXED (SAS Institute, Version 9.1, Cary, NC, USA). The h² was calculated as follows: h² = σ_g² / (σ_g² + σ_e²/re + σ_ge²/e), where σ_g², σ_e², and σ_ge² represent Type III SS (sums of squares) for genotype (G), environment (E), and G × E, respectively. The “e” is the degree of freedom of environment and “re” is the degree of freedom of G × E. […]
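The accuracy adjustment described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS or rrBLUP workflow. The function names and all numeric inputs are hypothetical, and the heritability function simply mirrors the formula as written in the text. The cross-validation and ridge regression steps are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def heritability(var_g, var_e, var_ge, e, re):
    """h2 = var_g / (var_g + var_e / re + var_ge / e), as written in the text."""
    return var_g / (var_g + var_e / re + var_ge / e)


def adjusted_accuracy(accuracy, h2):
    """Prediction accuracy divided by the square root of heritability."""
    return accuracy / math.sqrt(h2)


# Hypothetical predicted vs. observed phenotypes for one validation fold.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Hypothetical variance components and divisors (not from the study).
h2 = heritability(var_g=4.0, var_e=2.0, var_ge=1.0, e=2, re=4)
adjusted = adjusted_accuracy(0.4, h2)
print(f"r={r:.3f}  h2={h2:.3f}  adjusted accuracy={adjusted:.3f}")
```

In a full analysis the accuracy passed to `adjusted_accuracy` would be the mean Pearson correlation over all folds and repetitions rather than a single hand-picked value.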

Pipeline specifications

Software tools: GenAlEx, TASSEL, rrBLUP
Applications: Population genetic analysis, GWAS
Organisms: Miscanthus sinensis, Macropus giganteus