Computational protocol: Genome-Wide Association Studies Identify Candidate Genes for Coat Color and Mohair Traits in the Iranian Markhoz Goat

Protocol publication

[…] All animals were assessed for data quality and completeness. For each fleece trait, statistical analysis was performed using PROC GLM in SAS Studio University Edition (SAS Institute Inc., Cary, NC, United States) to determine the trait's relationship to sex, age (1 to 7 years), dam’s age (2 to 8 years), type of birth (single, twin, or triplet), and color (black, brown, or white), so that relevant variables could subsequently be included as covariates in the GWAS. The specific number of animals used in each GWAS is denoted in Table and varies based on the number of animals with quality trait information and the GWAS model chosen. We also computed least squares means and used the Tukey method to compare males and females for yearling fleece weight and to compare coat colors (black, brown, and white) for fiber volume. PCA was performed on seven of the fiber traits, for animals with complete records across all fiber traits, using JMP Pro 12 (SAS Institute Inc., Cary, NC, United States). The correlation matrix was used because the quantitative measures of the traits vary widely in scale. PCA generated single quantitative variables combining the seven mohair traits; principal components were retained for interpretation and analysis if their eigenvalue was greater than 1.0.

[...] Multiple genome-wide tests were performed for coat color and mohair traits (Table) using Golden Helix SVS v8.3.4 (Golden Helix, Bozeman, MT, United States). The coat color GWAS included 220 individuals (179 females and 41 males), and the mohair trait GWAS included 138 individuals (115 females and 23 males). Quantitative or case-control associations were tested in an Efficient Mixed Model Linear analysis (EMMAX), which corrects for remaining population structure and relatedness by including a genomic relationship matrix as a random effect in the model.
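The PCA step described above (correlation matrix, components kept when the eigenvalue exceeds 1.0) can be sketched as follows. This is a minimal illustration in NumPy, not the JMP Pro procedure the authors ran; the function name `pca_on_correlation`, the simulated trait matrix, and the choice of 100 animals are all assumptions made for the example.

```python
import numpy as np

def pca_on_correlation(traits):
    """PCA on the correlation matrix of an (n_animals, n_traits) array.

    Returns all eigenvalues (largest first) and the scores for the
    components whose eigenvalue is greater than 1.0.
    """
    X = np.asarray(traits, dtype=float)
    # Standardizing each trait makes PCA on the covariance of Z
    # equivalent to PCA on the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                      # retention rule from the text
    scores = Z @ eigvecs[:, keep]             # one column per retained PC
    return eigvals, scores

# Simulated data only: 100 hypothetical animals, 7 traits sharing one factor.
rng = np.random.default_rng(0)
shared = rng.normal(size=(100, 1))
traits = shared * rng.uniform(0.5, 2.0, size=7) + rng.normal(size=(100, 7))
eigvals, scores = pca_on_correlation(traits)
```

Each retained column of `scores` is a single quantitative variable per animal that could be used as a GWAS phenotype in place of the individual traits.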
Coat color GWAS were performed in a case-control study design comparing the identified coat color to all other coat colors combined: brown compared to black and white, black compared to brown and white, and white compared to black and brown. Additional GWAS with smaller sample sizes compared single coat colors to one another (e.g., brown compared to black). Variation in brown coat color was considered, but small sample size and insufficient differentiation between color variations precluded an association analysis. Association studies with the covariates of sex (male or female), age (1–7 years old), dam’s age (2–8 years old), and type of birth (single or twins) were considered in additive, dominant, and recessive inheritance models. Quantile–quantile (QQ) plots were used to determine the model of best fit for each trait. Quantitative measures were used in the GWAS for true fiber, grease percentages, and yearling fleece weight. For the remaining mohair traits, in which no statistically significant loci were observed using a quantitative variable, a case/control model was then applied, using a threshold based on the median or quartile values to compare the animals showing the greatest degree of phenotypic variation within the group. For traits that did not surpass an adjusted Bonferroni significance cutoff or an adjusted false discovery rate (FDR) significance cutoff of 0.05, we investigated significance using adaptive permutation for the model of best fit in PLINK v1.9. Adaptive permutation evaluates the genomic dataset more quickly because it drops SNPs that show no evidence of association from further permutations, while continuing to permute associated SNPs up to the set threshold. Adaptive permutation output provides both the number of permutations achieved and the corresponding P-value.
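The idea behind adaptive permutation can be illustrated with a simplified, single-SNP sketch. This is not PLINK's implementation: the test statistic (absolute difference in mean genotype between cases and controls), the stopping rule (a normal-approximation lower bound on the empirical P-value), and every parameter name here (`p_cutoff`, `z`, `max_perms`, and so on) are assumptions chosen to keep the example short.

```python
import math
import random

def mean_difference(genotypes, labels):
    """Absolute difference in mean genotype between cases and controls."""
    cases = [g for g, is_case in zip(genotypes, labels) if is_case]
    controls = [g for g, is_case in zip(genotypes, labels) if not is_case]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def adaptive_permutation(genotypes, labels, min_perms=5, max_perms=2000,
                         p_cutoff=0.05, z=3.89, seed=0):
    """Permute case/control labels, stopping early for clearly null SNPs.

    Returns (empirical P-value, number of permutations performed).
    """
    rng = random.Random(seed)
    observed = mean_difference(genotypes, labels)
    shuffled = list(labels)
    hits = 0       # permutations at least as extreme as the observed data
    n_perms = 0
    while n_perms < max_perms:
        rng.shuffle(shuffled)
        if mean_difference(genotypes, shuffled) >= observed:
            hits += 1
        n_perms += 1
        if n_perms >= min_perms:
            p_hat = (hits + 1) / (n_perms + 1)
            # Hypothetical pruning rule: stop once even the lower
            # confidence bound on the P-value is above the cutoff.
            lower = p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n_perms)
            if lower > p_cutoff:
                break
    return (hits + 1) / (n_perms + 1), n_perms

# A SNP with identical genotype means in cases and controls is dropped early;
# a SNP that perfectly separates the groups is permuted to the maximum.
p_null, n_null = adaptive_permutation(
    [0, 1, 2, 0, 1, 2], [True, True, True, False, False, False])
p_assoc, n_assoc = adaptive_permutation(
    [2] * 20 + [0] * 20, [True] * 20 + [False] * 20)
```

The two return values mirror the output described above: a P-value together with the number of permutations actually performed, which is small for unassociated SNPs and large for associated ones.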
Parameters used in the adaptive permutation testing were a minimum of five and a maximum of 1,000,000 permutations to determine significance, a confidence interval of 0.0001, an alpha threshold of 0, an intercept interval for pruning of 1, and a slope interval for pruning of 0.001. Linkage disequilibrium (LD) structure and haplotypes were examined between associated markers using HAPLOVIEW v4.2 to assist in candidate gene identification. Haplotype blocks were defined using the algorithm from . Putative candidate gene(s) within one million base pairs upstream or downstream of significantly associated SNPs, or within their LD blocks, were identified based on the GCF_001704415.1 (ARS1) assembly in Genome Data Viewer at the National Center for Biotechnology Information (NCBI). […]
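The one-megabase candidate-gene window described above amounts to an interval-overlap query. The sketch below assumes a gene annotation is already available as a list of records; the `Gene` type, the function `genes_near_snp`, the gene names, and all coordinates are placeholders for illustration, not values from the ARS1 annotation, and the LD-block criterion is not modeled.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate, inclusive

def genes_near_snp(snp_chrom: str, snp_pos: int, genes: List[Gene],
                   flank: int = 1_000_000) -> List[str]:
    """Names of genes overlapping [snp_pos - flank, snp_pos + flank]."""
    window_start = max(1, snp_pos - flank)
    window_end = snp_pos + flank
    return [
        g.name for g in genes
        if g.chrom == snp_chrom
        and g.start <= window_end    # gene begins before the window ends
        and g.end >= window_start    # gene ends after the window begins
    ]

# Placeholder annotation: names and positions are invented for the example.
annotation = [
    Gene("GENE_A", "13", 4_200_000, 4_250_000),   # about 0.8 Mb upstream
    Gene("GENE_B", "13", 5_900_000, 6_100_000),   # straddles the window edge
    Gene("GENE_C", "13", 7_500_000, 7_600_000),   # outside the window
    Gene("GENE_D", "6", 5_000_000, 5_010_000),    # different chromosome
]
candidates = genes_near_snp("13", 5_000_000, annotation)
```

A gene counts as a candidate if any part of it falls inside the window, so `GENE_B` is reported even though it extends past the one-megabase boundary.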

Pipeline specifications

Software tools: JMP Pro, EMMAX, PLINK, Haploview
Databases: GDV
Applications: Miscellaneous, GWAS
Organisms: Capra hircus
Diseases: Retinal Telangiectasis