Computational protocol: Generalized Admixture Mapping for Complex Traits

Protocol publication

[…] We carried out simulation studies to assess the performance of GLEAM in terms of type I error rate and power under various scenarios. We compared it with the Bayesian likelihood ratio (BLR) method, which is implemented in the software ANCESTRYMAP (http://genepath.med.harvard.edu/∼reich/Software.htm), and with the regularized regression methods Lasso and elastic net. GLEAM and ANCESTRYMAP use slightly different HMMs to impute the local ancestries, whereas the regularized regression methods require the local ancestries to be given. Because of these differences, we assumed that the true local ancestries were given and focused on evaluating the ability to localize susceptibility loci rather than to estimate local ancestries. Our simulations were based on empirical local ancestry data for 1001 African-American subjects from the HPHB Study, with 1296 AIM loci measured across the genome. The MATLAB code for simulating and analyzing the data is included in a Supporting Information folder online.

We started by investigating the type I error rates for local ancestries that were scattered across different regions of the genome and in linkage equilibrium. Under this scenario, a falsely localized AIM locus would lie in a region remote from the true disease-causing locus, which leads to a false positive finding. We first randomly sampled 1000 AIM loci with replacement from the 1296 AIM loci for 1000 subjects. At each AIM locus, we simulated the local ancestries, measured by the number of alleles from the African ancestral population, from their maximum a posteriori (MAP) frequency estimates under the assumption of Hardy-Weinberg equilibrium.
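The sampling step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB code: the function name `simulate_local_ancestry` is hypothetical, and `placeholder_freqs` is a made-up stand-in for the 1296 empirical MAP frequency estimates, which are not reproduced here. Under Hardy-Weinberg equilibrium, the number of African-ancestry alleles at a locus with frequency p is taken to be Binomial(2, p).

```python
import random


def simulate_local_ancestry(freqs, n_subjects=1000, n_loci=1000, seed=0):
    """Draw independent local ancestries (0, 1 or 2 African alleles) under HWE.

    freqs: per-locus African-ancestry frequencies (hypothetical input here).
    Returns the sampled locus indices and an n_subjects x n_loci table.
    """
    rng = random.Random(seed)
    # Sample loci with replacement from the available AIM loci.
    loci = [rng.randrange(len(freqs)) for _ in range(n_loci)]
    # Each of the two alleles is African with probability p, independently,
    # so the allele count follows Binomial(2, p).
    ancestry = [
        [(rng.random() < freqs[j]) + (rng.random() < freqs[j]) for j in loci]
        for _ in range(n_subjects)
    ]
    return loci, ancestry


# ASSUMPTION: placeholder frequencies, not the empirical HPHB estimates.
placeholder_rng = random.Random(1)
placeholder_freqs = [placeholder_rng.uniform(0.6, 0.9) for _ in range(1296)]

# Small sizes keep the demo fast; the text uses 1000 subjects and 1000 loci.
loci, ancestry = simulate_local_ancestry(placeholder_freqs, n_subjects=50, n_loci=20)
```

Because every locus is drawn independently, the simulated ancestries are in linkage equilibrium by construction, which is the property this null scenario relies on.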
Ten sets of trait data were then generated so that we could assess the type I error rates at the genome-wide threshold level (e.g., α = 10⁻⁴), using the following null models: for continuous traits, y_i = αE_i + ε_i, and for binary traits, logit{Prob(y_i = 1)} = αE_i, where the continuous risk covariate E_i and the measurement error ε_i followed standard normal distributions. We considered two situations: α = 0, in the absence of a covariate effect, and α = 1, in the presence of a covariate effect.

We next examined power under the single-locus alternative models. We simulated 100 sets of traits. Each set included 1000 subjects and one disease-associated local ancestry whose location was randomly sampled from 259 AIM loci, at which the proportion of African ancestral population ranged from 0.8321 to 0.8817 and was in the top 20% among the 1296 AIM loci. Given the local ancestry S_i, and with the continuous covariate E_i and measurement error ε_i generated as in the null model, continuous traits were simulated from y_i = αE_i + βS_i + ε_i and binary traits from logit{Prob(y_i = 1)} = αE_i + βS_i. Under both models, β was specified as β = c × (proportion of African ancestral population), which reflected the a priori observation that a locus with a larger proportion of the high-risk ancestral (here African American) population usually demonstrated a stronger association with the traits. For continuous traits, we chose the effect size multiplier c as 0.2, 0.25, 0.3, 0.35, and 0.4, with the largest possible effect size equal to 0.3527. Similarly, we chose c as 0.4, 0.5, 0.6, 0.7, and 0.8 for binary traits, with the largest possible odds ratio equal to 1.8537.

We further considered a multilocus alternative model in which two local ancestries were associated with the traits and admixture linkage disequilibrium was present.
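A minimal Python sketch of the null and single-locus trait models may make the construction concrete. It is an illustration only: the function name `simulate_traits` is hypothetical, the causal-locus ancestries are drawn from Binomial(2, p) as a stand-in for the empirical data, and sharing one covariate E_i between the continuous and binary traits is an assumption of this sketch rather than something the text states. Setting c = 0 recovers the null model.

```python
import math
import random


def simulate_traits(ancestry, c, p_african, alpha=1.0, seed=0):
    """Simulate continuous and binary traits for one causal locus.

    ancestry:  list of S_i values (0, 1 or 2) at the causal locus.
    c:         effect size multiplier; c = 0 gives the null model.
    p_african: proportion of African ancestry at the locus, so beta = c * p.
    """
    rng = random.Random(seed)
    beta = c * p_african
    covariate, continuous, binary = [], [], []
    for s in ancestry:
        e = rng.gauss(0.0, 1.0)    # risk covariate E_i ~ N(0, 1)
        eps = rng.gauss(0.0, 1.0)  # measurement error eps_i ~ N(0, 1)
        eta = alpha * e + beta * s  # linear predictor shared by both models
        covariate.append(e)
        continuous.append(eta + eps)
        # Inverse logit turns the linear predictor into Prob(y_i = 1).
        prob = 1.0 / (1.0 + math.exp(-eta))
        binary.append(1 if rng.random() < prob else 0)
    return beta, covariate, continuous, binary


# ASSUMPTION: HWE draws stand in for the empirical causal-locus ancestries.
demo_rng = random.Random(1)
p_african = 0.8817  # upper end of the range quoted in the text
causal_ancestry = [
    (demo_rng.random() < p_african) + (demo_rng.random() < p_african)
    for _ in range(1000)
]
beta, covariate, y_cont, y_bin = simulate_traits(
    causal_ancestry, c=0.4, p_african=p_african
)
```

With c = 0.4 and a proportion of 0.8817, β = 0.4 × 0.8817 ≈ 0.3527, which matches the largest continuous-trait effect size quoted above.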
To do so, we generated an artificial chromosome for 1000 subjects, composed of two pieces from chromosome 1 and chromosome 4 with lengths of 139.50 Mb and 114.88 Mb, respectively, based on empirical local ancestry data from the HPHB Study. Each chromosome piece contained 51 loci, and in the middle of each piece was one locus whose proportion of African ancestral population was among the highest of all 1296 AIM loci. In the simulations, these two loci were assumed to be associated with the traits. We generated 100 sets each of continuous and binary traits, simulated as in the single-locus alternative model except that two local ancestries were involved, with both effect size multipliers c set to 0.7 for continuous traits and 0.35 for binary traits.

The simulated datasets were analyzed with GLEAM and the BLR method. Because the BLR method was primarily developed for binary traits, continuous traits had to be transformed into binary ones before it could be applied, for example by defining the subjects with the top 20% of trait values as cases and those with the bottom 20% as controls. […]
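The case/control transformation for the BLR method can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `dichotomize_extremes` is hypothetical, and leaving the middle 60% of subjects unlabeled (and so excluded from the BLR analysis) is an assumption inferred from the description above. Ties at the cut points are broken by sort order.

```python
def dichotomize_extremes(traits, fraction=0.2):
    """Label the top `fraction` of trait values as cases (1) and the bottom
    `fraction` as controls (0); everyone in between gets None."""
    n = len(traits)
    k = int(round(n * fraction))  # number of subjects in each tail
    # Subject indices ordered from smallest to largest trait value.
    order = sorted(range(n), key=lambda i: traits[i])
    labels = [None] * n
    for i in order[:k]:
        labels[i] = 0  # controls: bottom tail
    for i in order[n - k:]:
        labels[i] = 1  # cases: top tail
    return labels


# Toy input: ten subjects with already-sorted trait values.
toy_traits = [float(i) for i in range(10)]
toy_labels = dichotomize_extremes(toy_traits)
# toy_labels == [0, 0, None, None, None, None, None, None, 1, 1]
```

One consequence worth noting is that, with 1000 subjects and a 20% cut, only 400 subjects would enter the dichotomized analysis, whereas a method that handles continuous traits directly can use all 1000.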

Pipeline specifications

Software tools: ANCESTRYMAP, GenePath
Application: Population genetic analysis
Organisms: Homo sapiens