Computational protocol: An experimental validation of genomic selection in octoploid strawberry

Protocol publication

[…] In order to evaluate the performance of GS, six contrasting methods were assessed in this study: Bayes B, Bayes C, Bayesian Ridge Regression (BRR), Bayesian LASSO (BL), Bayesian Reproducing Kernel Hilbert Spaces (RKHS) and Genomic BLUP (GBLUP). The Bayesian methods address the problem of a small number of observations ($n$) relative to a large number of parameters ($p$) to be estimated ($n \ll p$). The R package BGLR, run in R version 3.1.1 (ref. ), was used to obtain genomic predictions for Bayes B, Bayes C, BL, BRR and RKHS, using as response variables the adjusted phenotypic mean values obtained previously ().

The Bayesian models for continuous variables are represented by the equation $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where $\mathbf{y}$ is the vector of adjusted phenotypic observations, $\mu$ is an intercept, $\boldsymbol{\beta}$ is the vector of marker effects associated with the columns of the marker incidence matrix $\mathbf{X}$, and $\mathbf{e}$ is the vector of residual effects. These models differ in the prior assigned to the marker effects, which determines the type of shrinkage or variable selection imposed on the estimates. Bayes B and Bayes C use a mixture of two prior densities: a point of mass at zero combined with a scaled-t slab in Bayes B and with a Gaussian slab in Bayes C. In BRR, a Gaussian prior density applies similar shrinkage to all marker effects, whereas in BL a double-exponential prior density causes effect-size-dependent shrinkage of the estimates. In this study, each prior density was defined using the default hyper-parameters presented in Pérez and de los Campos. In preliminary work, other hyper-parameter values were evaluated, but they did not produce any important differences from the recommended default values.

BGLR also implements RKHS regression, which was proposed for the prediction of genetic values by Gianola and has been applied to complex traits in wheat breeding and in animal breeding.
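To make the difference between the prior families concrete, the Python sketch below draws marker effects from each shape: a Gaussian (as in BRR), a double exponential (as in BL), and the two spike-and-slab mixtures, with a point mass at zero and either a Gaussian or a scaled-t slab (the forms used by Bayes B and Bayes C). The helper `draw_marker_effects` and the hyper-parameter values (`pi`, `scale`, `df`, `lam`) are hypothetical and chosen for illustration; they are not BGLR's defaults or its API.

```python
import numpy as np

def draw_marker_effects(prior, p, rng, pi=0.05, scale=1.0, df=5.0, lam=1.0):
    """Draw p marker effects from one prior shape (hypothetical helper).

    Hyper-parameters are illustrative only; they are not BGLR's defaults.
    """
    if prior == "gaussian":
        # One Gaussian prior for every effect: similar shrinkage for all (BRR-like).
        return rng.normal(0.0, scale, size=p)
    if prior == "laplace":
        # Double-exponential prior: induces effect-size-dependent shrinkage (BL-like).
        return rng.laplace(0.0, 1.0 / lam, size=p)
    # Spike-and-slab: an effect is drawn from the slab with probability pi,
    # otherwise it is exactly zero (the point of mass at zero).
    nonzero = rng.random(p) < pi
    if prior == "spike_gaussian":
        slab = rng.normal(0.0, scale, size=p)
    elif prior == "spike_t":
        slab = scale * rng.standard_t(df, size=p)  # heavier-tailed scaled-t slab
    else:
        raise ValueError(f"unknown prior: {prior!r}")
    return np.where(nonzero, slab, 0.0)

rng = np.random.default_rng(0)
for name in ("gaussian", "laplace", "spike_gaussian", "spike_t"):
    b = draw_marker_effects(name, 10_000, rng)
    print(f"{name:15s} zeros={np.mean(b == 0):.3f} max|b|={np.abs(b).max():.2f}")
```

Only the spike-and-slab draws contain exact zeros (about `1 - pi` of them), which is why those priors perform variable selection while the Gaussian and double-exponential priors only shrink.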
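For BRR, the effect of the Gaussian prior can be seen in a simplified case: if the residual variance $\sigma_e^2$ and the marker-effect variance $\sigma_\beta^2$ were known, the posterior mean of the marker effects would be a ridge-regression solution with penalty $\lambda = \sigma_e^2/\sigma_\beta^2$. BGLR instead samples these variances by MCMC, so the sketch below (a hypothetical helper that also assumes a flat prior on the intercept) only illustrates the shrinkage and does not reproduce BGLR's estimates.

```python
import numpy as np

def brr_posterior_mean(X, y, var_e, var_b):
    """Posterior mean of (mu, beta) for y = 1*mu + X @ beta + e (hypothetical helper).

    Assumes beta ~ N(0, var_b * I), e ~ N(0, var_e * I), a flat prior on mu
    and known variances, so the posterior mean solves ridge-type equations.
    """
    n, p = X.shape
    W = np.column_stack([np.ones(n), X])             # intercept column + markers
    lam = var_e / var_b                              # shrinkage ratio
    penalty = np.diag(np.r_[0.0, np.full(p, lam)])   # intercept is not shrunk
    sol = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    return sol[0], sol[1:]

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(40, 200)).astype(float)  # n < p, SNPs coded 0/1/2
y = 5.0 + X[:, :3] @ np.array([1.0, -0.5, 0.8]) + rng.normal(size=40)
for var_b in (0.001, 0.1, 1.0):
    mu, beta = brr_posterior_mean(X, y, var_e=1.0, var_b=var_b)
    print(f"var_b={var_b:<6} |beta|={np.linalg.norm(beta):.3f}")
```

A smaller `var_b` means a larger `lam` and therefore stronger shrinkage of every marker effect toward zero, so the norm of `beta` grows as `var_b` increases.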
RKHS regression uses a function, or reproducing kernel (RK), that maps ‘marker genotypes $K(\mathbf{m}_i, \mathbf{m}_{i'})$ from an input set onto a real line that must satisfy $\sum_i \sum_{i'} \alpha_i \alpha_{i'} K(\mathbf{m}_i, \mathbf{m}_{i'}) \geq 0$ for any non-null sequence of coefficients $\alpha_i$’. In this method, the observed numerator relationship matrix ($\mathbf{A}_g$) is replaced by the kernel matrix ($\mathbf{K}$), computed with a Gaussian kernel evaluated on the squared Euclidean distance between the marker genotypes of each pair of individuals. The bandwidth parameter $h$ and the scale and degrees-of-freedom parameters of the residual-variance prior were set to their default values.

In GBLUP, an observed numerator relationship matrix, $\mathbf{A}_g$, was obtained using all 17 479 markers with no imputation of missing values. This matrix was calculated using the equations described by Yang et al.; its inverse was then obtained after bending, because the matrix had non-positive eigenvalues. Marker quality control and the generation of $\mathbf{A}_g$ and its inverse were all performed with the software GenoMatrix. Based on the adjusted phenotypic values, this matrix was then used to fit a simple animal (individual) model of the form $\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{a} + \mathbf{e}$, where $\mathbf{Z}$ is the incidence matrix linking observations to individuals, $\mathbf{a} \sim MVN(\mathbf{0}, \sigma_a^2 \mathbf{A}_g)$ is the vector of additive genetic effects and $\mathbf{e} \sim MVN(\mathbf{0}, \sigma_e^2 \mathbf{I})$ is the vector of residuals. Model fitting for GBLUP and for the pedigree-based analysis (PBLUP), together with the generation of genomic predictions for each genotype, was performed with ASReml-R in R version 3.1.1.

[...] JoinMap 4.1 was used to create a high-density linkage map of 14 332 SNP markers using an FL_08-10 × 12.115-10 mapping population of 165 progeny (Verma et al., unpublished data). Because of stringent mapping parameters, and to minimize gaps between SNPs, a few linkage groups (LGs) were subdivided into two groups. The orientation and subgenome specificity of each LG were assigned according to van Dijk et al.
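The Gaussian kernel used in RKHS regression can be illustrated with a small Python sketch: squared Euclidean distances $d_{ii'}^2$ between the marker-genotype rows of two individuals are turned into $K_{ii'} = \exp(-h\, d_{ii'}^2 / \bar{d}^2)$, where $\bar{d}^2$ is the mean squared distance, and the resulting matrix satisfies the non-negativity condition required of a reproducing kernel. Scaling by the mean distance, the value $h = 0.5$ and the helper name `gaussian_kernel` are assumptions for illustration, not the settings used in this study.

```python
import numpy as np

def gaussian_kernel(M, h=0.5):
    """Gaussian kernel between rows of M (individuals x markers); hypothetical helper."""
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (M @ M.T)  # squared Euclidean distances
    d2 = np.maximum(d2, 0.0)      # clip tiny negative round-off
    np.fill_diagonal(d2, 0.0)     # each individual is at distance 0 from itself
    d2 = d2 / d2.mean()           # scale distances by their mean (assumed convention)
    return np.exp(-h * d2)

rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(25, 300)).astype(float)  # toy 0/1/2 genotypes
K = gaussian_kernel(M, h=0.5)
alpha = rng.normal(size=25)
print("min eigenvalue:", np.linalg.eigvalsh(K).min())
print("alpha' K alpha:", alpha @ K @ alpha)
```

Larger values of `h` make the kernel decay faster with genetic distance, so only very similar individuals share information.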
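The relationship matrix for GBLUP can be sketched in Python following the form of the Yang et al. estimator, assuming SNPs coded 0/1/2 and missing calls left as `NaN` (no imputation), so each pair of individuals is averaged over the markers observed in both. The bending step here is a simple assumption (eigenvalues below a small threshold are raised to that threshold before inversion); the exact procedure used by GenoMatrix is not described in the text, and both helper names are hypothetical.

```python
import numpy as np

def yang_grm(G):
    """Genomic relationship matrix in the style of Yang et al. (hypothetical helper).

    G: individuals x markers, coded 0/1/2, with np.nan for missing genotypes.
    """
    p = np.nanmean(G, axis=0) / 2.0                 # allele frequency per marker
    keep = (p > 0) & (p < 1)                        # drop monomorphic markers
    G, p = G[:, keep], p[keep]
    obs = ~np.isnan(G)
    var = 2.0 * p * (1.0 - p)
    Z = np.where(obs, (G - 2.0 * p) / np.sqrt(var), 0.0)  # 0 where missing
    n_pair = obs.astype(float) @ obs.T.astype(float)      # markers seen in both
    A = (Z @ Z.T) / n_pair                                 # off-diagonal elements
    Gf = np.where(obs, G, 0.0)
    d = np.where(obs, (Gf ** 2 - (1.0 + 2.0 * p) * Gf + 2.0 * p ** 2) / var, 0.0)
    np.fill_diagonal(A, 1.0 + d.sum(axis=1) / obs.sum(axis=1))  # Yang et al. diagonal
    return A

def bend(A, eps=1e-4):
    """Floor eigenvalues at eps so the matrix becomes positive definite (assumed method)."""
    w, V = np.linalg.eigh(A)
    B = (V * np.maximum(w, eps)) @ V.T
    return 0.5 * (B + B.T)                          # remove round-off asymmetry

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(30, 500)).astype(float)
G[rng.random(G.shape) < 0.05] = np.nan              # ~5% missing calls
A = yang_grm(G)
A_inv = np.linalg.inv(bend(A))
print("min eigenvalue before bending:", np.linalg.eigvalsh(A).min())
```

Because off-diagonal elements are averaged over different marker subsets when calls are missing, the matrix is no longer guaranteed to be positive semi-definite, which is the situation that makes bending necessary before inversion.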
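Given $\mathbf{A}_g^{-1}$ and values for $\sigma_a^2$ and $\sigma_e^2$, predictions from the animal model $\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{a} + \mathbf{e}$ can be obtained by solving Henderson's mixed-model equations. ASReml-R estimates the variance components by REML; the Python sketch below simply assumes them known, and `gblup_mme` is a hypothetical helper. The usage checks the solution against the equivalent direct BLUP formula.

```python
import numpy as np

def gblup_mme(y, Z, A_inv, var_a, var_e):
    """Solve Henderson's mixed-model equations for y = 1*mu + Z a + e (hypothetical helper).

    Variance components are treated as known (REML estimation is not shown).
    """
    n = len(y)
    X = np.ones((n, 1))                  # intercept design matrix
    lam = var_e / var_a
    C = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + lam * A_inv]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(C, rhs)
    return sol[0], sol[1:]               # mu_hat and a_hat (genomic predictions)

rng = np.random.default_rng(0)
n = 20
M = rng.normal(size=(n, 50))
A = M @ M.T / 50 + 0.1 * np.eye(n)       # toy positive-definite relationship matrix
Z = np.eye(n)                            # one record per individual
y = 10.0 + rng.normal(size=n)
mu_hat, a_hat = gblup_mme(y, Z, np.linalg.inv(A), var_a=0.5, var_e=1.0)

# Equivalent direct form: a_hat = var_a * A Z' V^-1 (y - 1 mu_gls)
V = 0.5 * Z @ A @ Z.T + 1.0 * np.eye(n)
Vi = np.linalg.inv(V)
one = np.ones(n)
mu_gls = (one @ Vi @ y) / (one @ Vi @ one)
a_direct = 0.5 * A @ Z.T @ Vi @ (y - mu_gls)
```

The mixed-model equations only need $\mathbf{A}_g^{-1}$, which is why an invertible (bent) relationship matrix is required; the direct form needs $\mathbf{V}^{-1}$, an $n \times n$ inverse, instead.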
To evaluate the contribution of LD and genetic relationships to the accuracy and efficiency of the GS methods, LD parameters were calculated for the entire T2/2013 and T2/2014 population sets. The SNPs common to the mapped markers and to the 17 479 SNPs chosen for the genomic analysis were extracted. This common set of 4841 SNPs, distributed across 28 LGs, was analyzed for the extent of LD. Pairwise LD ($r^2$) within each LG was estimated using the R package LDheatmap. The genomic relationship matrix derived from each LG was estimated with GenoMatrix and used to estimate LD corrected for relatedness ($r_V^2$) with the R package LDcorSV. […]
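Pairwise LD and a relatedness-corrected counterpart can be sketched as follows. The first helper computes $r^2$ as the squared correlation between SNP genotype codes across individuals, a common genotype-based approximation; LDheatmap's own estimator may differ depending on the input type. The second applies the idea behind $r_V^2$, whitening the genotypes with $\mathbf{V}^{-1/2}$ (where $\mathbf{V}$ is a positive-definite relationship matrix) before computing $r^2$; it is a simplified illustration, not LDcorSV's exact computation, and both function names are hypothetical.

```python
import numpy as np

def pairwise_r2(G):
    """Squared correlation between SNP columns of G (individuals x SNPs, coded 0/1/2)."""
    return np.corrcoef(G, rowvar=False) ** 2

def pairwise_r2_v(G, V):
    """r^2 after whitening genotypes with V^(-1/2); V must be positive definite."""
    w, U = np.linalg.eigh(V)
    V_inv_sqrt = (U / np.sqrt(w)) @ U.T   # U diag(w^-1/2) U'
    return pairwise_r2(V_inv_sqrt @ G)

rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(50, 6)).astype(float)
G[:, 5] = G[:, 0]                         # SNP 5 duplicates SNP 0: perfect LD
R = pairwise_r2(G)
print(np.round(R, 2))
```

Monomorphic SNPs have zero variance and give `NaN` correlations, so they would need to be filtered out first. With $\mathbf{V} = \mathbf{I}$ (unrelated individuals) the corrected measure reduces to the ordinary $r^2$.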

Pipeline specifications

Software tools: BGLR, GenoMatrix, LDheatmap
Applications: Miscellaneous, GWAS