Computational protocol: Comparison of linear mixed model analysis and genealogy-based haplotype clustering with a Bayesian approach for association mapping in a pedigreed population

Protocol publication

[…] The association between each SNP and the phenotype was assessed by a linear mixed model analysis (MMA), using the DMU software. The model was as follows:
$$y = \mathbf{1}\mu + Xg + Zu + e$$
where $y$ is the vector of 2,000 phenotypes, $\mathbf{1}$ is a vector of 1s of length 2,000, $\mu$ is the general mean, $g$ is the additive effect of the SNP, and $X$ is a vector of genotypic indicators (0, 1, or 2) associating records to the marker effect. $u$ is the random polygenic effect with the normal distribution $N(0, A\sigma_u^2)$, where $A$ is the additive relationship matrix and $\sigma_u^2$ is the polygenic variance. $Z$ is an incidence matrix relating phenotypes to the corresponding random polygenic effect, and $e$ is a vector of random environmental deviates with the normal distribution $N(0, I\sigma_e^2)$, where $\sigma_e^2$ is the error variance and $I$ is the identity matrix. Testing was done using a Wald test against the null hypothesis $H_0: g = 0$. The significance threshold was determined using a Bonferroni correction for the number of markers tested, to obtain an experiment-wise P-value of 0.05. [...]
The SNP genotype data were phased using the software fastPHASE. The haplotypes were 4 SNPs long and were tested for association in windows sliding from SNP to SNP. The model for testing the association between the haplotypes at position $j$ and the phenotype can be written in scalar form as follows:
$$y_i = \mu + u_i + q_{h_{mi}} + q_{h_{pi}} + e_i$$
where $y_i$ is the phenotype of animal $i$, $\mu$ is the population mean, $u_i$ is the random polygenic effect, $q_{h_{mi}}$ and $q_{h_{pi}}$ are the random effects of the maternal and paternal haplotypes carried by individual $i$, and $e_i$ is the random residual effect as defined for MMA. The haplotype effects $q$ were assumed to be normally distributed with mean zero and variance $I\sigma_h^2$ (assuming equal variance for paternal and maternal haplotypes).
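The sliding-window bookkeeping described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the toy phased data are hypothetical, and the window is assumed to advance one SNP at a time, which is one reading of "from SNP to SNP". It lists the windows and, for each window, the maternal and paternal haplotypes that each animal carries (the levels $h_{mi}$ and $h_{pi}$ of the random haplotype effect).

```python
WINDOW = 4  # haplotype length in SNPs, as stated in the protocol


def sliding_windows(n_snps, width=WINDOW):
    """Return (start, stop) index pairs, assuming a step of one SNP."""
    return [(s, s + width) for s in range(n_snps - width + 1)]


def haplotypes_in_window(phased, start, stop):
    """Map animal id -> (maternal haplotype, paternal haplotype) in the window."""
    return {animal: (mat[start:stop], pat[start:stop])
            for animal, (mat, pat) in phased.items()}


# Hypothetical phased data: animal -> (maternal, paternal) allele strings.
phased = {
    "animal_1": ("010110", "110010"),
    "animal_2": ("010010", "010110"),
}

windows = sliding_windows(n_snps=6)
for start, stop in windows:
    carried = haplotypes_in_window(phased, start, stop)
    # Distinct haplotypes in the window are the levels of the random effect q.
    levels = sorted({h for pair in carried.values() for h in pair})
    print(start, stop, carried, levels)
```

Each window would then be fitted as a separate model in variance-component software such as DMU; that step is not reproduced here.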
The significance of the haplotype substitution effect was assessed with a likelihood ratio test comparing the RHM model with a null model containing the mean, polygenic effect, and random error terms but no haplotype effects. The analysis was performed using the DMU software package. The significance threshold was fixed at the genome-wide 5% level after Bonferroni correction, and the mid-points of significant haplotypes were taken as the putative QTL positions. [...]
The efficiency of GENMIX for association mapping was described by Sahana et al. In contrast to regular genome-wide association studies, where phenotypic differences are associated either with single markers or with groups of markers organized into haplo-groups in a non-stratified fashion, here phenotypes were associated using a hierarchical approach. Both the grouping of markers into haplo-groups and the clustering of observed haplotypes were based on local genealogies. This method identifies the widest possible region surrounding a marker that allows construction of a genealogy forming a bifurcating tree without either recurrent mutation or recombination; in other words, the region satisfies the four-gamete condition of Hudson and Kaplan. Each bifurcation in the binary tree corresponds to one bi-allelic marker. Splitting the tree at the top generates two clusters of haplotypes. Splitting the tree at any other node generates three clusters: one above the split point and two corresponding to the two branches below. For the analyses presented in this paper we split the tree at the top (one set of two clusters), at the second level (two sets of three clusters), and at the third level (four sets of three clusters).
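To make the four-gamete condition concrete, here is a small sketch. It is not Blossoc's algorithm: the greedy left-then-right extension, the function names, and the toy haplotypes are all assumptions made for illustration. Two bi-allelic markers are compatible with a tree free of recombination and recurrent mutation when at most three of the four possible allele combinations (00, 01, 10, 11) are observed across the haplotypes.

```python
from itertools import combinations


def four_gamete_compatible(haplotypes, a, b):
    """True if markers a and b show at most three of the four possible gametes."""
    gametes = {(h[a], h[b]) for h in haplotypes}
    return len(gametes) < 4


def region_is_compatible(haplotypes, start, stop):
    """True if every marker pair in [start, stop) passes the four-gamete test."""
    return all(four_gamete_compatible(haplotypes, a, b)
               for a, b in combinations(range(start, stop), 2))


def widest_compatible_region(haplotypes, center):
    """Greedily grow [start, stop) around `center` (an assumed strategy)."""
    n_markers = len(haplotypes[0])
    start, stop = center, center + 1
    extended = True
    while extended:
        extended = False
        if start > 0 and region_is_compatible(haplotypes, start - 1, stop):
            start -= 1
            extended = True
        if stop < n_markers and region_is_compatible(haplotypes, start, stop + 1):
            stop += 1
            extended = True
    return start, stop


# Hypothetical phased haplotypes over five bi-allelic markers.
haps = ["00000", "10001", "11000", "00101", "00111", "11001"]
region = widest_compatible_region(haps, center=2)
print(region)
```

In this toy data markers 0 to 3 are pairwise compatible, while marker 4 shows all four gametes with marker 0, so the region around marker 2 stops before marker 4.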
Each clustering of haplotypes was included in turn as a random effect in the model for analysis:
$$y_i = \mu + u_i + q_{h_{1i}} + q_{h_{2i}} + e_i$$
where $y_i$ is the phenotype of individual $i$, $\mu$ is the population mean, $u_i$ is as described above for the MMA, $q_{h_{1i}}$ and $q_{h_{2i}}$ are the two haplotype effects of individual $i$, where $h_{1i}$ and $h_{2i}$ can take values 11, 12, 13, 22, 23, and 33, with $\mathrm{Var}(q_{11}, q_{12}, q_{13}, q_{22}, q_{23}, q_{33}) = I\sigma_h^2$, $\sigma_h^2$ is the haplotype variance, and $e_i$ is a random residual as defined for MMA. The local genealogies were constructed using the software Blossoc (http://www.daimi.au.dk/~mailund/Blossoc/), and the variance component analysis was carried out using the software DMU. The significance of the SNP association was tested using a likelihood ratio test, and the significance threshold was fixed at the genome-wide 5% level after Bonferroni correction for multiple testing over the total number of markers. […]
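The testing step can be illustrated with a short sketch. The log-likelihoods and the marker count below are hypothetical placeholders, not values from the study. The reference distribution is also an assumption: the excerpt does not state it, and the sketch uses a chi-square with one degree of freedom, although variance components tested on the boundary are sometimes compared against a mixture distribution instead.

```python
import math


def lrt_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_null)


def lrt_pvalue(statistic):
    """Upper tail of a chi-square with 1 df (assumed reference distribution)."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))


def bonferroni_threshold(alpha, n_markers):
    """Per-test threshold that keeps the genome-wide error rate at alpha."""
    return alpha / n_markers


# Hypothetical values for one marker; in practice the log-likelihoods
# would come from the fitted full and null models.
loglik_full, loglik_null = -2950.0, -2962.5
n_markers = 5000  # placeholder, the marker count is not given in the excerpt

stat = lrt_statistic(loglik_full, loglik_null)
p_value = lrt_pvalue(stat)
threshold = bonferroni_threshold(0.05, n_markers)
print(stat, p_value, threshold, p_value < threshold)
```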

Pipeline specifications

Software tools: DMU, fastPHASE, Blossoc
Application: GWAS
Diseases: Disease Models, Animal