Computational protocol: Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits

Protocol publication

[…] HCHS/SOL participants self-identified as primarily associated with one of six background groups: Central American, Cuban, Dominican, Mexican, Puerto Rican, and South American. Based on these groups, “genetic analysis groups” were constructed as described in Conomos et al. (2016) []. The genetic analysis groups largely overlap the self-identified background groups, but were constructed to be more genetically homogeneous, as determined using PC analysis, and to assign a group to individuals who self-identified as having “more than one” or “other” background. Of these groups, the Mainland groups (Mexican, Central American, and South American) have a high proportion of Amerindian ancestry, with Mexicans generally having the highest, and the Caribbean groups (Cuban, Dominican, and Puerto Rican) have a high proportion of African ancestry, with Dominicans generally having the highest. In contrast, Cubans generally have the lowest proportion of Amerindian ancestry, and Mexicans the lowest proportion of African ancestry. Distributions of admixture proportions, as well as PC plots and additional information about the construction of the groupings, are provided in [].
Local ancestry inference has been reported in Browning et al. (2016) []. It was performed using RFMix [] based on the genotype data and a reference panel derived from the Human Genome Diversity Project (HGDP) [] and the 1000 Genomes Project []. This resulted in 14,815 local ancestry intervals (LAIs), each spanning tens to hundreds of thousands of base pairs. For each interval and each participant, we obtained the likely number of copies of the interval (0, 1, or 2) that were inherited from a European, African, or Amerindian ancestor. [...] After discovering genome-wide significant LAI associations (at the admixture mapping level), we attempted to fine-map these intervals to detect genotype associations that account for the signals.
To determine whether a genotype association explains an admixture mapping signal, it suffices to include the genotype count as a covariate in the same regression analysis as the local ancestry count, and to check whether the local ancestry association becomes less statistically significant.
Each local ancestry interval corresponds to possibly thousands of imputed and hundreds of genotyped variants in the HCHS/SOL data set, many of which may not appear to be significantly associated with the trait (e.g. p-value > 10^-7). Based on the power comparison provided in [], admixture mapping may be much more powerful than association mapping when the allele frequency of the causal variant differs substantially between the compared ancestries. Therefore, we searched for genotype association loci by filtering variants in the interval according to their association p-values and, in some cases, the difference in effect allele frequencies (EAFs) between the two genetic analysis groups with the highest and lowest proportions of the ancestry of interest. Thus, if an admixture mapping association was discovered in a specific interval by comparing counts of Amerindian ancestry to other ancestries, we considered the EAFs, which are easily calculated, in the Mexican and Cuban genetic analysis groups for variants in the interval. This is a practical alternative to the more interesting quantity, the difference between EAFs in the ancestral populations, which is computationally too intensive to calculate for all variants in the LAI.
For relatively high p-values in association testing (e.g. ∼0.001), we required a difference in EAFs between the Mexican and Cuban groups. For relatively low p-values (e.g. ∼10^-6), we imposed no such restriction, because (1) the number of such variants is low, so stronger filtering is not required, and (2) an admixture mapping association may arise from differences in the variant effect between the ancestral groups, even when the EAF is the same.
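The conditioning check described at the start of this passage can be illustrated with a toy simulation. The sketch below is not the study's analysis: it uses ordinary least squares in place of the mixed model, the data are simulated, and all names (`ols_t_stats`, `ancestry`, `genotype`, `trait`) are hypothetical. It assumes a single causal variant whose allele frequency rises with the local ancestry count, so that ancestry appears associated with the trait until the genotype count is added as a covariate.

```python
import numpy as np


def ols_t_stats(y, covariates):
    """Fit y ~ intercept + covariates by OLS and return one t-statistic per column."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se


# Simulated toy data (assumption: not HCHS/SOL data).
rng = np.random.default_rng(0)
n = 2000
ancestry = rng.integers(0, 3, size=n)        # local ancestry count: 0, 1 or 2
allele_freq = 0.1 + 0.35 * ancestry          # effect allele is more common with more ancestry copies
genotype = rng.binomial(2, allele_freq)      # genotype count: 0, 1 or 2
trait = 1.0 * genotype + rng.normal(size=n)  # only the genotype affects the trait

# Index 0 is the intercept, index 1 is the local ancestry count.
t_marginal = ols_t_stats(trait, [ancestry])[1]
t_conditional = ols_t_stats(trait, [ancestry, genotype])[1]

print(f"ancestry t-statistic, marginal model:    {t_marginal:.2f}")
print(f"ancestry t-statistic, conditional model: {t_conditional:.2f}")
```

Under these assumptions the t-statistic for the local ancestry count is large in the marginal model and much closer to zero in the conditional model, which is the pattern taken as evidence that the variant accounts for the admixture mapping signal.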
In practice, we took a step-wise approach in which we relaxed the required p-value by factors of 10 (10^-6, …, 10^-2), where for the more relaxed p-value thresholds (10^-3, 10^-2) we also filtered by decreasing differences in EAFs (0.2, 0.15, 0.1, 0.05) between the two relevant genetic analysis groups. Finally, when searching for variants in the association analysis results, we again utilized the heterogeneity of the HCHS/SOL cohort by searching the results from the meta-analysis of all genetic analysis groups, the meta-analysis of only the Caribbean groups, and the meta-analysis of only the Mainland groups reported in []. This was done because differences in the causal variant’s effect sizes between ancestral populations can cause an admixture mapping association, and in such settings, a variant association may be observed in only one of the Mainland/Caribbean groups due to the differences in admixture proportions between them.
Upon generating a list of potential variants associated with the trait, we pruned them to obtain a lead variant (the variant with the lowest p-value in association testing) from each set of correlated variants, defined as a set in which each variant has a Pearson correlation of at least 0.4 with at least one other member of the set. We then applied the admixture mapping model while adjusting for these variants in the mixed model (the conditional model), and also calculated the ancestry-specific EAFs of these variants using the ASAFE software []. […]
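The step-wise filtering and pruning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, the function names, the example variants, and the pairing of p-value thresholds with EAF-difference cutoffs in `schedule` are all hypothetical. The search is assumed to stop at the first step that yields candidates, and two variants are treated as belonging to the same correlated set when the absolute Pearson correlation of their genotype counts reaches 0.4.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Variant:
    name: str
    p_value: float   # association-test p-value
    eaf_high: float  # EAF in the group with the highest proportion of the ancestry
    eaf_low: float   # EAF in the group with the lowest proportion of the ancestry
    dosages: list    # genotype counts, one per participant


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def stepwise_candidates(variants, schedule):
    """Return the variants passing the first (max p-value, min EAF difference) step with any hits."""
    for p_max, min_diff in schedule:
        hits = [v for v in variants
                if v.p_value <= p_max and abs(v.eaf_high - v.eaf_low) >= min_diff]
        if hits:
            return hits
    return []


def lead_variants(candidates, r_min=0.4):
    """Group variants linked by |r| >= r_min and keep the lowest p-value per group."""
    groups = []
    for v in candidates:
        linked, unlinked = [], []
        for g in groups:
            is_linked = any(abs(pearson(v.dosages, u.dosages)) >= r_min for u in g)
            (linked if is_linked else unlinked).append(g)
        # Merge the new variant with every group it is linked to.
        groups = unlinked + [[v] + [u for g in linked for u in g]]
    leads = [min(g, key=lambda u: u.p_value) for g in groups]
    return sorted(leads, key=lambda u: u.p_value)


# Hypothetical schedule: no EAF filter at strict thresholds, EAF filter once relaxed.
schedule = [(1e-6, 0.0), (1e-5, 0.0), (1e-4, 0.0), (1e-3, 0.2), (1e-2, 0.1)]

# Hypothetical toy variants with genotype counts for six participants.
variants = [
    Variant("rsA", 2e-4, 0.45, 0.20, [0, 1, 2, 0, 1, 2]),
    Variant("rsB", 8e-4, 0.40, 0.18, [0, 2, 2, 0, 0, 2]),
    Variant("rsC", 5e-4, 0.50, 0.25, [1, 1, 0, 1, 1, 2]),
    Variant("rsD", 5e-4, 0.30, 0.28, [2, 1, 0, 2, 1, 0]),
]

candidates = stepwise_candidates(variants, schedule)
leads = lead_variants(candidates)
print("candidates:", [v.name for v in candidates])
print("lead variants:", [v.name for v in leads])
```

In this toy input, rsD is dropped by the EAF-difference filter, and rsB falls into the same correlated set as rsA, which has the lower p-value. The lead variants returned by such a procedure would then be the covariates added to the conditional model.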

Pipeline specifications

Software tools: ADMIXTURE, RFMix, ASAFE
Databases: HGDP
Application: Population genetic analysis
Organisms: Homo sapiens
Chemicals: Vitamin D