Computational protocol: Large-effect pleiotropic or closely linked QTL segregate within and across ten US cattle breeds

Protocol publication

[…] In this study, all 54,555 SNP markers were considered simultaneously as predictors of the response variables in order to estimate partial SNP effects. Marker effects were estimated with the “Bayes-B” method [], which fits a mixture model in which non-zero SNP effects are drawn from distributions with marker-specific variances and a known fraction of markers (π) have no effect on the trait. For each trait, the following model was fit: $y = Xb + Zu + e$, where y is the vector of observations (i.e., DEBV); b is the vector of fixed effects, which comprised only the population mean because DEBV are free of systematic environmental effects such as herds, years and seasons of data origin; u is a vector of random marker substitution effects, where element j of u has a non-zero effect (with probability 1 - π) or an effect equal to zero (with probability π), as described by []; X and Z are design matrices that relate observations to the fixed and marker effects, respectively, with each element of Z representing an allelic state (i.e., the centered number of B alleles from the Illumina A/B calling system); and e is the vector of random residuals ~ $N(0, D\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and D is a diagonal matrix whose inverse elements are the weights described by []. The DEBV for maternal traits (calving ease and weaning weight maternal) were derived from EBV reported by the respective breed associations, as for the other traits. The parameter π was set to 0.99 for all analyses. MCMC methods with 41,000 iterations were used to obtain samples of marker effects and variances after discarding the first 1,000 samples to allow for burn-in.
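To make the structure of this model concrete, the following is a minimal simulation sketch, not the GenSel implementation. It draws marker effects from a Bayes-B-style mixture prior and generates observations as y = Xb + Zu + e with weighted residuals. The function names, the toy dimensions, and the hyperparameters `nu` and `scale` of the scaled inverse chi-square prior are assumptions chosen for illustration; only π = 0.99 and the centered B-allele coding come from the text.

```python
import numpy as np

def center_genotypes(b_counts):
    """Center B-allele counts (0/1/2) so every column of Z has mean zero."""
    return b_counts - b_counts.mean(axis=0)

def draw_bayesb_effects(n_snps, pi, nu, scale, rng):
    """Draw marker effects from a Bayes-B-style prior.

    With probability pi an effect is exactly zero; otherwise it is normal
    with a marker-specific variance drawn from a scaled inverse chi-square
    distribution (nu and scale are illustrative hyperparameters).
    """
    included = rng.random(n_snps) >= pi
    variances = nu * scale / rng.chisquare(nu, size=n_snps)
    effects = np.where(included, rng.normal(0.0, np.sqrt(variances)), 0.0)
    return effects, included

def simulate_debv(b_counts, effects, mean, residual_var, weights, rng):
    """Simulate y = 1*mean + Z u + e, with Var(e_i) = residual_var / w_i.

    This matches e ~ N(0, D * residual_var) when D is diagonal with
    elements 1 / w_i.
    """
    z = center_genotypes(b_counts)
    e = rng.normal(0.0, np.sqrt(residual_var / weights))
    return mean + z @ effects + e

# Toy example (all sizes and values are hypothetical).
rng = np.random.default_rng(0)
b_counts = rng.integers(0, 3, size=(200, 5000)).astype(float)
effects, included = draw_bayesb_effects(5000, pi=0.99, nu=4.0, scale=0.01, rng=rng)
weights = rng.uniform(0.5, 2.0, size=200)
y = simulate_debv(b_counts, effects, mean=0.0, residual_var=1.0,
                  weights=weights, rng=rng)
```

With π = 0.99, only about 1% of markers receive a non-zero effect in any one draw, which is why the prior favors a sparse set of large-effect loci.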
The estimates of genetic and residual variances used to construct priors for the genetic and residual scale parameters of the Bayes-B analysis [] were obtained from preliminary analyses with Bayes-C (π = 0.95) [], which is less sensitive to prior assumptions than Bayes-B.

For each 40th iteration of the post burn-in chain (1,000 samples in total), sampled values for the effects of the SNPs within each 1-Mb window were used to compute samples of the direct genomic breeding value (DGV) of every animal for that window (by multiplying the number of copies of B alleles by the sample of the corresponding SNP effect, and summing these values over all marker loci located within the 1-Mb window). The variance of window DGV across all animals within the breed was then used to obtain a sample of the additive genetic variance for that window. The percentage of additive genetic variance explained by each 1-Mb window was calculated as the ratio of the window variance for that sample to the whole-genome additive genetic variance sampled in the same iteration. Any 1-Mb window for which the posterior mean percentage of additive genetic variance explained was ≥ 1% (~25-fold greater than the expected value of 0.04% for each of 2,677 1-Mb windows genome-wide, assuming a polygenic model in which all genomic regions explain the same amount of variance) was selected as a window containing (or defining) a large-effect QTL. Those QTL that were associated with at least two traits in more than one breed were considered to be pleiotropic. The posterior probability of inclusion (PPI) for a given window, which is the proportion of samples in which at least one SNP from that window was included in the model with a non-zero effect, was used for significance testing [].
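The window-level calculations described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not GenSel code: it assumes the thinned MCMC samples of SNP effects are available as an array, uses centered genotypes (centering does not change the variance across animals), and takes the whole-genome additive variance in each sample to be the variance of the whole-genome DGV. All function and variable names are hypothetical.

```python
import numpy as np

def window_variance_summary(z, effect_samples, window_ids):
    """Summarize per-window genetic variance from saved MCMC samples.

    z              : (n_animals, n_snps) centered B-allele counts
    effect_samples : (n_samples, n_snps) sampled SNP effects (0 = excluded)
    window_ids     : (n_snps,) integer 1-Mb window index of each SNP
    Returns the window labels, the posterior mean percentage of variance
    explained per window, and the PPI per window.
    """
    n_samples = effect_samples.shape[0]
    windows = np.unique(window_ids)
    pct = np.zeros((n_samples, windows.size))
    included = np.zeros((n_samples, windows.size), dtype=bool)
    for s in range(n_samples):
        u = effect_samples[s]
        total_var = np.var(z @ u)  # whole-genome DGV variance, this sample
        for k, w in enumerate(windows):
            cols = window_ids == w
            window_dgv = z[:, cols] @ u[cols]  # window DGV of every animal
            if total_var > 0:
                pct[s, k] = 100.0 * np.var(window_dgv) / total_var
            # Window counts as included if any of its SNPs is non-zero.
            included[s, k] = np.any(u[cols] != 0)
    return windows, pct.mean(axis=0), included.mean(axis=0)

# Toy data: 3 windows of 10 SNPs; a single SNP in window 0 is always fitted.
rng = np.random.default_rng(1)
n_animals, n_snps, n_samples = 100, 30, 50
z = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
z -= z.mean(axis=0)
window_ids = np.repeat(np.arange(3), 10)
effect_samples = np.zeros((n_samples, n_snps))
effect_samples[:, 4] = rng.normal(0.5, 0.05, size=n_samples)

windows, mean_pct, ppi = window_variance_summary(z, effect_samples, window_ids)
large_effect = windows[mean_pct >= 1.0]  # the >= 1% threshold from the text

# Expected share per window if all 2,677 windows explained equal variance.
expected_pct = 100.0 / 2677
```

The last line reproduces the arithmetic behind the threshold: 100 / 2,677 is roughly 0.04%, so a 1% window is about 25 times the equal-share expectation.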
All analyses were performed using GenSel software [].

Individual 1-Mb windows that explained the largest proportions of additive genetic variation were visualized in GBrowse [] for detailed inspection of the chromosomal region containing the 1-Mb window. The overlapping QTL for a given 1-Mb window were obtained from cattle QTLdb (http://www.animalgenome.org/cgi-bin/gbrowse/bovine/). The SNP with the highest PPI within a given 1-Mb QTL window was selected as the lead SNP, or most strongly associated SNP, for that QTL window. Further information for strongly associated SNPs was obtained from the NCBI dbSNP (http://www.ncbi.nlm.nih.gov/snp/) and Ensembl (http://www.ensembl.org/) databases. Gene searches were performed for these genomic regions using the NCBI gene database (http://www.ncbi.nlm.nih.gov/gene/). The sfdp algorithm from Graphviz software was used to draw the QTL network []. GO term enrichment analysis was performed for gene sets located within the pleiotropic QTL (those with available GO ID, Additional file ) against all known genes in the GO database using the web tool g:Profiler (http://biit.cs.ut.ee/gprofiler/) [] with a Bonferroni-corrected p-value cutoff of 0.05. Only GO terms from the biological process category and KEGG pathway enrichments were retained. […]
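As a sketch of how a QTL network might be prepared for the sfdp layout, the snippet below writes QTL-to-trait associations as Graphviz DOT text. The source does not describe the actual input format or node styling, so the function name, the node shapes, and the placeholder labels (`window_A`, `trait_1`, and so on) are assumptions rather than results from the study.

```python
def qtl_network_dot(associations):
    """Build Graphviz DOT text for an undirected QTL-trait network.

    associations: iterable of (qtl_label, trait_label) pairs.
    QTL windows are drawn as boxes and traits as ellipses.
    """
    pairs = sorted(set(associations))
    lines = ["graph qtl_network {", "  node [shape=ellipse];"]
    for qtl in sorted({q for q, _ in pairs}):
        lines.append(f'  "{qtl}" [shape=box];')
    for qtl, trait in pairs:
        lines.append(f'  "{qtl}" -- "{trait}";')
    lines.append("}")
    return "\n".join(lines)

# Placeholder associations (hypothetical labels, not findings).
dot = qtl_network_dot([
    ("window_A", "trait_1"),
    ("window_A", "trait_2"),
    ("window_B", "trait_2"),
])

with open("qtl_network.dot", "w") as handle:
    handle.write(dot + "\n")

# Lay out and render with Graphviz from a shell:
#   sfdp -Tpng qtl_network.dot -o qtl_network.png
```

In such a graph, a window connected to two or more traits stands out as a hub, which is one way to spot candidate pleiotropic QTL visually.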

Pipeline specifications

Software tools: GBrowse, Graphviz, g:Profiler
Databases: dbSNP
Application: Population genetic analysis
Organisms: Bos taurus