Computational protocol: A Multi Breed Genome Wide Association Analysis for Canine Hypothyroidism Identifies a Shared Major Risk Locus on CFA12

Protocol publication

[…] The initial genetic datasets comprised 165 GS, 74 HV and 92 RR individuals (). All were genotyped with the Illumina 170k CanineHD BeadChip (Illumina, San Diego, CA, USA) at the same technology platform. All SNP positions are given according to the dog CanFam3.1 assembly []. Quality control (QC) of the genotyping data was carried out for each breed separately using R v3.0.2 [] and GenABEL v1.8-0 []. First, an individual-based QC step identified potential duplicate samples and samples with gender discrepancies. Second, a marker-based QC step filtered the total set of SNPs with the following thresholds:
- minor allele frequency (MAF) of 0.05 in all breeds;
- SNP and individual call rates of 95% in all breeds;
- Hardy-Weinberg equilibrium (HWE) p-values of 1 × 10⁻⁵ in GS, 1 × 10⁻⁸ in HV and 1 × 10⁻³ in RR, with an HWE false discovery rate of 0.2 in all breeds.

Each breed dataset was also checked for correlation between disease status and gender. Fisher's exact test and the phi coefficient were used to evaluate, respectively, the statistical significance and the magnitude of the correlation between these two dichotomous variables [, ].

[...] A GWA analysis was performed on the quality-controlled SNP datasets of the GS, HV and RR breeds separately. All analytical steps were carried out using R v3.0.2 [] and GenABEL v1.8-0 []. In every breed, a genomic kinship matrix weighted by allele frequencies was computed from 2,000 randomly selected autosomal markers. In all breeds, we then fitted a standard linear mixed model using GenABEL's polygenic_hglm function, which relies on the hglm package v2.0-8 [], with the genomic kinship matrix included as a random effect. The mixed model approach accounts for both population structure and relatedness []. The breed-specific genomic kinship matrices were also used to project the genetic distances between individuals onto a plane by multidimensional scaling (MDS) for plotting.
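The marker-based QC filters above can be illustrated with a short Python sketch. It assumes genotypes coded as allele dosages (2 = AA, 1 = AB, 0 = BB, None = missing call), and the function names are hypothetical. The HWE check is a simple 1-df chi-square goodness-of-fit test used as a stand-in, not necessarily the test GenABEL applies. The HWE false-discovery-rate step and the per-individual call-rate filter are omitted. The study itself ran these filters in GenABEL.

```python
import math
from typing import Optional, Sequence


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p_a = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    p_b = 1.0 - p_a
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is undefined, so do not reject
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def marker_passes_qc(calls: Sequence[Optional[int]], min_maf: float = 0.05,
                     min_call_rate: float = 0.95, hwe_p_level: float = 1e-5) -> bool:
    """Hypothetical per-SNP filter: call rate, then MAF, then HWE (defaults use the GS thresholds).

    calls: allele dosage per dog (2 = AA, 1 = AB, 0 = BB), None for a missing call.
    """
    called = [g for g in calls if g is not None]
    if not called or len(called) / len(calls) < min_call_rate:
        return False
    n_aa, n_ab, n_bb = called.count(2), called.count(1), called.count(0)
    p_a = (2 * n_aa + n_ab) / (2 * len(called))
    if min(p_a, 1.0 - p_a) < min_maf:
        return False
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= hwe_p_level
```

With the breed-specific thresholds, the HV filter would be called as `marker_passes_qc(calls, hwe_p_level=1e-8)` and the RR filter as `marker_passes_qc(calls, hwe_p_level=1e-3)`.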
The HV samples had different geographic origins, so we tested whether this could have introduced structure into the population. For this purpose, we followed an approach suggested by Tengvall and colleagues []. Briefly, we used K-means clustering to assign individuals to a predefined number of subpopulations. The number of clusters (here, subpopulations) was set to K = 2 using a so-called scree test on the within-cluster sum of squares as a function of K (for details see []). We then fitted a linear mixed model with subpopulation as a fixed effect and genomic kinship as a random effect.

Statistical significance thresholds were evaluated in two ways:
(a) empirical genome-wide significance levels (Pgenome), obtained from 1,000 permutations of the mixed model residuals (the residualY element returned by the GenABEL polygenic function);
(b) 95% confidence intervals of the empirical SNP distributions (CI95), as proposed by Karlsson and colleagues [].

Permuting the mixed model residuals maintained the connection between phenotypes and fixed effects [], so that only the significance of the genetic effects was evaluated. For each single-breed GWA study, a quantile-quantile (QQ) plot was produced in R v3.0.2 and a Manhattan plot was generated with the R package qqman []. In each breed separately, the independence of the signal was verified by an association analysis conditioned on the genotype of the most significantly associated SNP (top SNP). Each breed-specific associated locus was defined by the SNPs on CFA12 in pairwise linkage disequilibrium (LD; r² ≥ 0.7) with that breed's top SNP.

[...] A GWA meta-analysis of the three independent datasets (breeds) was carried out using MetABEL v0.2.0 [], an R package, run under R v3.0.2 [].
MetABEL assumes that the shared allelic effect is the same in each dataset. Under this assumption, it performs a fixed-effects meta-analysis in which each study is weighted by the inverse of its squared standard error, in order to maximise the power of discovery []. We then created three plots, as described above:
- an MDS plot displaying the samples of the three breeds as subpopulations;
- a QQ plot showing how far the observed association statistics deviate from their null distribution;
- a Manhattan plot showing the genome-wide association signals.

[...] The minimal risk haplotype shared across breeds was identified within the locus associated in the meta-analysis. First, genotypes in the shared associated region were imputed (where missing) and phased into haplotypes in each breed separately using fastPHASE []. At this stage the phenotype of each individual was used as a covariate, to avoid predicting spurious haplotypes. Next, the risk haplotypes in cases and the non-risk haplotypes in controls were identified from the genotype at the meta-analysis top SNP. Starting from the meta-analysis top SNP and walking both upstream and downstream, we identified the SNP positions at which the risk haplotype was broken by a recombination event. At such positions, both alternative alleles were present on both the risk and the non-risk haplotypes. This was done separately for each breed, and the minimal risk haplotype shared across breeds was then defined.

Two SNPs tagging the associated risk haplotype across breeds were tested for association with the phenotype, as haplotypes using Pearson's chi-squared test and as genotypes using Fisher's exact test [, ]. […]
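The inverse-variance fixed-effects model that MetABEL applies can be sketched for a single SNP as follows. This is an illustrative Python version of the weighting scheme, not MetABEL's code. The function name is hypothetical, and the sketch assumes the per-breed effect estimates (betas) are all expressed for the same effect allele.

```python
import math
from typing import Sequence, Tuple


def fixed_effects_meta(betas: Sequence[float],
                       ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effects meta-analysis of one SNP across studies.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # each study weighted by 1 / SE^2
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    return pooled_beta, pooled_se, p_value
```

Because each weight is 1/SE², the breed with the most precise estimate dominates the pooled effect. The pooled standard error is never larger than the smallest per-breed standard error, which is where the gain in power comes from.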

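The outward walk used to delimit the risk haplotype in each breed, followed by intersection across breeds, can be sketched in Python. The sketch makes four assumptions:
- the haplotypes have already been phased (e.g. by fastPHASE);
- they are equal-length allele strings over the same ordered SNPs in every breed;
- they have been split into risk and non-risk sets by their allele at the top SNP;
- the breakpoint definition is read literally: both alleles present among the risk haplotypes and among the non-risk haplotypes.

The function names are hypothetical.

```python
from typing import Sequence, Tuple


def risk_haplotype_span(risk: Sequence[str], non_risk: Sequence[str],
                        top: int) -> Tuple[int, int]:
    """Return (first, last) SNP indices of the unbroken risk-haplotype segment in one breed.

    Walks upstream and downstream from the top SNP and stops just before the first
    position that counts as a recombination breakpoint.
    """
    def is_break(i: int) -> bool:
        # Breakpoint (literal reading): both alleles occur on risk AND on non-risk haplotypes.
        return len({h[i] for h in risk}) > 1 and len({h[i] for h in non_risk}) > 1

    first, last = top, top
    while first > 0 and not is_break(first - 1):
        first -= 1
    while last < len(risk[0]) - 1 and not is_break(last + 1):
        last += 1
    return first, last


def shared_span(spans: Sequence[Tuple[int, int]]) -> Tuple[int, int]:
    """Intersect per-breed spans (same SNP indexing) into the minimal shared risk haplotype."""
    first = max(s[0] for s in spans)
    last = min(s[1] for s in spans)
    if first > last:
        raise ValueError("per-breed risk-haplotype spans do not overlap")
    return first, last
```

The per-breed segments are intersected rather than combined into a union. This yields the shortest stretch of the risk haplotype that no breed has broken.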
Pipeline specifications

Software tools: GenABEL, qqman, MetABEL, fastPHASE
Application: GWAS
Organisms: Canis lupus familiaris, Homo sapiens
Diseases: Endocrine System Diseases, Hypothyroidism