Computational protocol: Effectiveness of genomic prediction on milk flow traits in dairy cattle

Protocol publication

[…] Genomic predictions were first computed using a multi-step approach []. First, a traditional evaluation of milking traits was performed. Using a six-trait animal model similar to the model described in Gray et al. [], traditional breeding values and parent averages for the genotyped bulls were predicted for TMT, AT, TP, DT, MMF and AVGF with BLUP methodology as implemented in ASREML []. Genotyped animals were then split into a training and a validation dataset according to the reliability of their BLUP EBV. Genotyped sires with a reliability higher than 0.50 for TP and higher than 0.60 for the remaining traits were included in the training dataset. In effect, this separated older sires from young sires with fewer progeny: older sires were assigned to the training dataset and the remaining younger animals to the validation dataset.
Pseudo-phenotypes (dEBV) were obtained as deregressed EBV, free of the effects of the parent average and adjusted for the number of daughters contributing to the EBV, and served as the data vector used in genomic predictions [].
In the second step, the dEBV were analyzed to predict genomic EBV, using a genomic relationship matrix in place of the traditional numerator relationship matrix in the mixed model equations []; this approach is referred to as GBLUP.
The genomic relationship matrix was constructed as $\mathbf{G} = \mathbf{Z}\mathbf{Z}' / \left(2\sum_i p_i(1-p_i)\right)$, where $p_i$ is the frequency of marker $i$, estimated from all genotyped sires, and $\mathbf{Z}$ is the matrix of marker codes (0/1/2) adjusted so that the mean of each SNP across genotypes is 0, by subtracting $\mathbf{P}$, a matrix of allele frequencies expressed as a difference from 0.5 and multiplied by 2, such that column $i$ of $\mathbf{P}$ is $2(p_i - 0.5)$ []. With GBLUP, breeding values were predicted for animals in the validation dataset through $\mathbf{G}^{-1}$.
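The two GBLUP ingredients above can be sketched in Python with only the standard library: building $\mathbf{G}$ from 0/1/2 codes, then predicting validation animals from training dEBV. Several assumptions apply. Each SNP column is centered by $2p_i$, which is the same as recoding to −1/0/1 and subtracting $\mathbf{P}$. Residual weights are taken as equal, and the variance ratio `alpha` is treated as known. All function names are hypothetical. Instead of the mixed model equations with $\mathbf{G}^{-1}$ that ASREML solves, the sketch uses the equivalent form $\hat{\mathbf{g}}_v = \mathbf{G}_{vt}\mathbf{V}^{-1}(\mathbf{y} - \mathbf{1}\hat\mu)$ with $\mathbf{V} = \mathbf{G}_{tt} + \alpha\mathbf{I}$. This form gives the same BLUP and never inverts $\mathbf{G}$, which is often singular.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def genomic_relationship(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """G = ZZ' / (2 * sum_i p_i (1 - p_i)) from 0/1/2 genotype codes."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP, from all genotyped animals.
    p = [sum(row[i] for row in genotypes) / (2.0 * n) for i in range(m)]
    # Centre each SNP column: x - 2p equals (x - 1) - 2(p - 0.5).
    z = [[row[i] - 2.0 * p[i] for i in range(m)] for row in genotypes]
    scale = 2.0 * sum(pi * (1.0 - pi) for pi in p)
    return [[sum(a * b for a, b in zip(zj, zk)) / scale for zk in z] for zj in z]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gblup_predict(g: Matrix, debv: List[float], train: List[int],
                  valid: List[int], alpha: float) -> List[float]:
    """BLUP of genomic values for `valid` animals from training dEBV.

    Model: y = 1*mu + g + e, var(g) = G*sg2, var(e) = I*se2 (equal residual
    weights assumed here); alpha = se2 / sg2 is treated as known.
    """
    v = [[g[i][j] + (alpha if i == j else 0.0) for j in train] for i in train]
    v_inv_1 = solve(v, [1.0] * len(train))
    v_inv_y = solve(v, debv)
    mu = sum(v_inv_y) / sum(v_inv_1)  # GLS estimate of the overall mean
    v_inv_r = [a - mu * b for a, b in zip(v_inv_y, v_inv_1)]  # V^-1 (y - 1 mu)
    return [sum(g[k][t] * w for t, w in zip(train, v_inv_r)) for k in valid]


if __name__ == "__main__":
    geno = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # hypothetical data
    G = genomic_relationship(geno)
    print(gblup_predict(G, [1.0, -0.5, 0.2], train=[0, 1, 2], valid=[3], alpha=1.0))
```

The allele frequencies come from the same animals, so every column of $\mathbf{Z}$ sums to zero and the entries of $\mathbf{G}$ also sum to zero. This gives a quick sanity check on the construction.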
Using ASREML software [], records from the training population entered the mixed model equations as the y vector, and the equations were solved for the predicted breeding values.
Two non-linear prediction approaches, BayesA and Bayesian LASSO (Least Absolute Shrinkage and Selection Operator), were used to estimate a genetic variance component for each marker, which allows a non-normal prior distribution of marker effects. A comparison of these two methods is given in Cleveland et al. [].
The general structure of the models in matrix form was $\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where $\mathbf{y}$ is the vector of deregressed breeding values for TMT, AT, TP, DT, MMF and AVGF; $\mu$ is the overall mean; $\boldsymbol{\beta}$ is the vector of additive effects for each marker; $\mathbf{X}$ is the matrix of genotypes coded as the number of copies of an arbitrary allele (0, 1 and 2) for each SNP; and $\mathbf{e}$ is the vector of residuals, assumed normal with variances weighted as outlined by Garrick et al. []. The constant $c$, the proportion of genetic variance not accounted for by the markers, was set at 0.4 after an exploratory analysis (data not shown).
A flat prior, $p(\mu) \propto 1$, was assigned to $\mu$, while the prior distribution for $\sigma^2_e$ was assumed to be inverted chi-square with 4 degrees of freedom and an expectation equal to the value used in the traditional BLUP evaluation with individual cow records.
The remaining prior structure was $\beta_i \sim N(0, \sigma^2_{g_i})$ for the $i$th SNP, with $\sigma^2_{g_i} \sim \text{inv-}\chi^2(\sigma^2_{g_i} \mid \nu, s^2)$ for the BayesA approach and $\sigma^2_{g_i} \sim \text{Exp}(\sigma^2_{g_i} \mid 2/\lambda^2)$ for the LASSO approach.
In the current analysis, a straightforward generalization of the BayesA method was applied, in which the scale parameter $s^2$ and the degrees of freedom $\nu$ were treated as unknown and estimated from the data []. They were assigned a uniform prior on the interval (0, 1] for $\nu$ and a uniform prior on (0, Q] for $s$, with Q = 100. At each round of the Gibbs sampler, samples of $s^2$ were obtained from $\text{Gamma}(s^2 \mid \sigma^2_{g_i}, \nu)$.
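The residual weighting is cited above but not written out. What follows is a hedged sketch that assumes the weighting formula of Garrick et al. as it is commonly stated: $w_j = (1 - h^2) / \left[(c + (1 - r_j^2)/r_j^2)\,h^2\right]$. Here $r_j^2$ is the reliability of the deregressed information with the parent average removed, and $c = 0.4$ as in the text. Residual $j$ then has variance $\sigma^2_e / w_j$. The function name and the heritability in the example are hypothetical and are not values from this study.

```python
def garrick_weight(r2: float, h2: float, c: float = 0.4) -> float:
    """Residual weight for one deregressed EBV (sketch of Garrick et al.).

    r2: reliability of the deregressed information (parent average removed).
    h2: heritability of the trait (hypothetical in the example below).
    c:  share of genetic variance not explained by markers (0.4 in the text).
    The residual variance of record j is then sigma_e^2 / w_j.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return (1.0 - h2) / ((c + (1.0 - r2) / r2) * h2)


if __name__ == "__main__":
    for r2 in (0.5, 0.7, 0.9, 0.99):
        print(r2, round(garrick_weight(r2, h2=0.3), 3))
```

Weights grow with reliability, so dEBV of sires with many daughters pull harder on the marker effects. A larger $c$ caps the weight that even perfectly reliable records can receive.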
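The BayesA model above can be illustrated with a minimal single-site Gibbs sampler. It is written in Python with the standard library only. The authors worked in R, and this is not their code. The sketch makes four simplifying assumptions:
- $\nu$ and $s^2$ are held fixed, whereas the text samples them.
- Residual $j$ has variance $\sigma^2_e / w_j$.
- The prior on $\sigma^2_e$ is inverted chi-square with 4 degrees of freedom, scaled so that its mean equals `prior_var_e`. With 4 degrees of freedom the mean is $2S^2$, so $S^2$ = `prior_var_e`/2.
- All names and default values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def draw_scaled_inv_chi2(rng: random.Random, df: float, ss: float) -> float:
    """Return ss / chi2(df), i.e. a scaled inv-chi2(df, ss/df) variate."""
    return ss / rng.gammavariate(df / 2.0, 2.0)


def bayes_a(y: Sequence[float], x: Sequence[Sequence[float]], w: Sequence[float],
            nu: float = 4.0, s2: float = 0.01, prior_var_e: float = 1.0,
            n_iter: int = 2000, burn_in: int = 500,
            seed: int = 1) -> Tuple[float, List[float]]:
    """Single-site Gibbs sampler for y = 1 mu + X beta + e (BayesA sketch).

    nu and s2 are held fixed here; residual j has variance var_e / w[j].
    Returns posterior means of mu and beta.
    """
    rng = random.Random(seed)
    n, m = len(y), len(x[0])
    nu_e = 4.0                                   # prior df for var_e
    s2_e = prior_var_e * (nu_e - 2.0) / nu_e     # so E[var_e] = prior_var_e
    mu, beta, var_g, var_e = sum(y) / n, [0.0] * m, [s2] * m, prior_var_e
    resid = [yj - mu for yj in y]                # y - 1 mu - X beta
    xwx = [sum(w[j] * x[j][i] ** 2 for j in range(n)) for i in range(m)]
    sum_w = sum(w)
    mu_mean, beta_mean, kept = 0.0, [0.0] * m, 0
    for it in range(n_iter):
        # mu | rest: flat prior, weighted mean of y - X beta.
        partial = [r + mu for r in resid]
        mu = rng.gauss(sum(wj * r for wj, r in zip(w, partial)) / sum_w,
                       (var_e / sum_w) ** 0.5)
        resid = [r - mu for r in partial]
        for i in range(m):
            # beta_i | rest: normal, shrunk by var_e / var_g[i].
            old = beta[i]
            rhs = sum(w[j] * x[j][i] * (resid[j] + x[j][i] * old) for j in range(n))
            lhs = xwx[i] + var_e / var_g[i]
            beta[i] = rng.gauss(rhs / lhs, (var_e / lhs) ** 0.5)
            delta = beta[i] - old
            resid = [resid[j] - x[j][i] * delta for j in range(n)]
            # var_g[i] | beta_i: scaled inv-chi2(nu + 1, (nu*s2 + beta_i^2)/(nu + 1)).
            var_g[i] = draw_scaled_inv_chi2(rng, nu + 1.0, nu * s2 + beta[i] ** 2)
        # var_e | rest: scaled inv-chi2 with nu_e + n degrees of freedom.
        sse = sum(wj * r * r for wj, r in zip(w, resid))
        var_e = draw_scaled_inv_chi2(rng, nu_e + n, sse + nu_e * s2_e)
        if it >= burn_in:
            kept += 1
            mu_mean += (mu - mu_mean) / kept
            beta_mean = [b + (bi - b) / kept for b, bi in zip(beta_mean, beta)]
    return mu_mean, beta_mean


if __name__ == "__main__":
    sim = random.Random(7)  # simulated toy data, not study data
    geno = [[sim.randint(0, 2) for _ in range(5)] for _ in range(60)]
    y = [0.5 + row[0] + sim.gauss(0.0, 0.1) for row in geno]
    mu_hat, beta_hat = bayes_a(y, geno, [1.0] * 60)
    print(round(mu_hat, 2), [round(b, 2) for b in beta_hat])
```

Each marker gets its own $\sigma^2_{g_i}$, so large effects are shrunk much less than effects near zero. That marker-specific shrinkage is the non-normal (t-like) marginal prior that distinguishes BayesA from GBLUP.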
Because the full conditional distribution of $\nu$ does not have a closed form, $\nu$ was sampled at each round of the sampler with a Metropolis step from $p(\nu \mid \mathbf{y}, \mu, \boldsymbol{\beta}, \sigma^2_{g_i}, \sigma^2_e)$. The pseudo-code and a summary of posterior results for the scale and degrees of freedom for this step are provided as additional data [see Additional file ].
The $\lambda$ parameter in the LASSO approach was assigned a gamma prior distribution, Gamma(0.05, 1.0), so that the prior of $\lambda$ was essentially uniform over a wide range of values []. A Gibbs sampling algorithm was implemented in R to obtain samples from the joint posterior distribution [].
Marker effect estimates were obtained using the above models within the training population and were then applied to the validation dataset to predict genomic breeding values. Assuming a completely additive model, the products of each animal's genotype codes and the estimated marker effects were summed across the entire genome to obtain the direct genomic values (DGV). Genome-enhanced breeding values (GEBV) were obtained by combining DGV and parent averages, as outlined by Saatchi et al. []. […]
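Given the other parameters, $\nu$ and $s^2$ enter the model only through the priors of the $\sigma^2_{g_i}$. Their full conditionals therefore depend on $\mathbf{y}$, $\mu$, $\boldsymbol\beta$ and $\sigma^2_e$ only via those variances. The sketch below implements two hyperparameter updates:
- A random-walk Metropolis step for $\nu$ under the uniform prior on (0, 1].
- A Gamma draw for $s^2$. It is derived from the uniform prior on $s$ over (0, Q], which implies $p(s^2) \propto (s^2)^{-1/2}$. The result is $\text{Gamma}(\text{shape} = m\nu/2 + 1/2,\ \text{rate} = (\nu/2)\sum_i 1/\sigma^2_{g_i})$, truncated at $Q^2$.

This is our derivation under the stated priors, not the pseudo-code in the Additional file. The step size, function names and example variances are hypothetical.

```python
import math
import random
from typing import Sequence


def log_cond_nu(nu: float, s2: float, var_g: Sequence[float]) -> float:
    """Log full conditional of nu (up to a constant); uniform prior on (0, 1]."""
    if not 0.0 < nu <= 1.0:
        return -math.inf
    half = nu / 2.0
    total = 0.0
    for v in var_g:
        # Log density of scaled inv-chi2(nu, s2) evaluated at v.
        total += (half * math.log(half) - math.lgamma(half) + half * math.log(s2)
                  - (half + 1.0) * math.log(v) - nu * s2 / (2.0 * v))
    return total


def metropolis_nu(rng: random.Random, nu: float, s2: float,
                  var_g: Sequence[float], step: float = 0.1) -> float:
    """One random-walk Metropolis update; proposals outside (0, 1] are rejected."""
    proposal = nu + rng.gauss(0.0, step)
    log_ratio = log_cond_nu(proposal, s2, var_g) - log_cond_nu(nu, s2, var_g)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return nu


def gibbs_s2(rng: random.Random, nu: float, var_g: Sequence[float],
             q: float = 100.0) -> float:
    """Draw s2 | nu, var_g from a Gamma truncated at q**2 (by rejection)."""
    shape = len(var_g) * nu / 2.0 + 0.5
    rate = nu / 2.0 * sum(1.0 / v for v in var_g)
    while True:
        draw = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        if draw <= q * q:
            return draw


if __name__ == "__main__":
    rng = random.Random(3)
    var_g = [0.02, 0.005, 0.3, 0.01, 0.001]  # hypothetical current draws
    nu, s2 = 0.5, 0.01
    for _ in range(1000):
        s2 = gibbs_s2(rng, nu, var_g)
        nu = metropolis_nu(rng, nu, s2, var_g)
    print(round(nu, 3), round(s2, 5))
```

Inside a full sampler these two updates would run once per round, after the marker variances have been refreshed.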
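Under the additive model, the DGV of validation animal $j$ is $\sum_i x_{ji}\hat\beta_i$. The short sketch below implements exactly that, with a hypothetical function name and toy inputs. It does not reproduce the combination of DGV and parent averages into GEBV (Saatchi et al.), because the weights are not given here.

```python
from typing import List, Sequence


def direct_genomic_values(genotypes: Sequence[Sequence[int]],
                          marker_effects: Sequence[float]) -> List[float]:
    """DGV_j = sum_i x_ji * beta_i, with x coded 0/1/2 as in the training model."""
    out = []
    for row in genotypes:
        if len(row) != len(marker_effects):
            raise ValueError("genotype row and marker effects differ in length")
        out.append(sum(x * b for x, b in zip(row, marker_effects)))
    return out


if __name__ == "__main__":
    # Hypothetical validation genotypes and estimated marker effects.
    print(direct_genomic_values([[0, 1, 2], [2, 0, 1]], [0.5, -1.0, 0.25]))
    # prints [-0.5, 1.25]
```

The genotypes must be coded with the same counted allele as in training. Flipping the coding of a SNP changes the sign of its contribution.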

Pipeline specifications

Software tools: ASREML, Lasso
Application: GWAS
Organisms: Bos taurus