Computational protocol: Genomic Insights into the Ancestry and Demographic History of South America

Protocol publication

[…] To characterize the ancestral components of South American Latino individuals from Colombia, Ecuador, Peru, Chile, and Argentina, we applied unsupervised clustering models and principal components analysis to genotype data from ancestral and admixed populations. This data set contains 436 admixed South American individuals together with 204 European individuals from the POPRES study, 50 Yoruban and 50 Han Chinese individuals from the 1000 Genomes Project, and 493 unmasked Native American individuals from Reich et al. 2012. The South American individuals showed varying proportions of European, Native American and, to a lesser extent, West African ancestry in PCA space, supporting the notion of a broad range of global ancestry patterns throughout South America. We observed some dispersion of Native American individuals away from the main ancestral cluster due to the presence of European admixture.

We then ran clustering models for K = 2 through K = 15 ancestral populations with ADMIXTURE on a total of 1,233 individuals. Cross-validation errors for the ADMIXTURE analysis are shown in . The minimum CV error was observed at K = 13. When clustering is performed assuming K = 4 ancestral populations, the algorithm separates the individuals into four major continental clusters. Average continental ancestry proportions for each of the admixed populations are shown in . As expected from historical records and previous results from other Latino populations in the Caribbean and Mexico, South American Latino individuals show a mixture of European, Native American, and African ancestry. However, some populations, especially those in Peru, Chile, and Argentina, tend to have a smaller proportion of African ancestry than seen in Latino populations in the Caribbean (p < 2.2 × 10^−16, Wilcoxon test), as also observed in previous analyses. We find significant differences in global ancestry proportions between countries within South America.
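The selection of K by minimum cross-validation error described above can be sketched in a few lines of Python. This is only an illustration, not the authors' script: it assumes ADMIXTURE was run with its `--cv` option, which prints one summary line of the form `CV error (K=4): 0.53210` per run. The helper names `parse_cv_errors` and `best_k` are hypothetical, and the log excerpt is made up for the example; it does not contain the study's values.

```python
import re

# Summary line printed by ADMIXTURE when run with --cv,
# e.g. "CV error (K=4): 0.53210".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV summary line found in log_text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the smallest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Made-up log excerpt for illustration only (not the study's values).
example_log = """\
CV error (K=2): 0.6100
CV error (K=3): 0.5800
CV error (K=4): 0.5750
"""

errors = parse_cv_errors(example_log)
print("K with minimum CV error:", best_k(errors))
```

In practice the log text would be the concatenated output of the runs for K = 2 through K = 15. The minimum-CV value of K is a guide rather than a rule, which is why the text also interprets the coarser K = 4 solution.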
The Peruvian individuals tend to have a higher proportion of Native American ancestry than individuals from any of the other South American populations (Tukey HSD test, p < 0.001 vs. Argentina, Chile, Colombia, and Ecuador). We observed multiple Peruvian individuals with more than 25% East Asian ancestry, which is not surprising given the large Asian migrations to Peru, especially during the 19th and early 20th centuries, when laborers from Guangdong (formerly Canton) province in China were brought to the country. Peru opened its borders to Asian immigration in 1849, and it is estimated that over 87,000 Chinese individuals entered Peru between 1859 and 1874. This East Asian ancestry component is also seen in the Northern Amerindian individuals. These individuals are from Eskimo, Aleut, and Na-Dene populations, and the observed clustering is consistent with the hypothesis of multiple waves of gene flow from Asia to America suggested by a previous study. At higher values of K in ADMIXTURE, these individuals are assigned to their own component, indicating a unique ancestry component that is separate from the East Asian cluster.

The Argentinian population has a significantly higher proportion of European ancestry than the Peruvian, Chilean, and Ecuadorian populations (Tukey HSD test, p = 0.018 vs. Chile, p = 0.129 vs. Colombia, p < 0.001 vs. Peru and Ecuador), with some individuals having close to 100% European ancestry. Even so, there is a large range of ancestry proportions among individuals from Argentina, consistent with previous results based on a small number of ancestry informative markers and blood group antigens. This variance is most likely a result of the contrasting histories of different Argentinian regions. For example, the original Spanish settlers of Argentina came through the Pacific/Andean region.
However, as Argentina developed, individuals from Spain and Southern Europe settled throughout the coastal regions on the Atlantic. We also observed a small number of Argentinian individuals with relatively high amounts of African ancestry, whereas the rest of the individuals have a very low African ancestry component. This diversity is reflected in the large range in ancestry proportions seen within Argentina and is consistent with previous studies.

At higher values of K (K = 13), we observed significant substructure in both the Native American and European populations. The North-South gradient among European populations is strongly correlated with the latitude of each country’s capital (p < 2.2 × 10^−16, linear regression), with a southern European component (light blue) most prominent in Spain, Portugal, Italy, and Greece. Most of the admixed Latino individuals in the sample have a high proportion of this southern European component, suggesting that the Europeans involved in admixture events in South America are from the Iberian Peninsula and Mediterranean Europe. This observation is consistent with historical migration patterns and maintained cultural influence. On the other hand, the primary cluster of Native ancestry is reflective of the local indigenous diversity. We find that a component of the Native American ancestry in the Peruvian samples is shared with local Andean native groups, such as Quechua and Aymara, and that of Colombians is more closely shared with the Southern and Central Amerindian groups (K = 13).
In contrast, we see that the Native American component in Argentina and Chile is shared between components from Central/Southern Native American and Andean Native American groups, showing a wider range of ancestral origins that we explore below in further analyses (K = 13).

Sex-biased ancestry is an important feature of many Latin American populations and has been described thoroughly in previous studies. European migrants to the Americas were mainly male, especially during the earlier years of colonization. This has resulted in increased Amerindian ancestry on the X-chromosome when compared to the autosomes. After excluding admixed males from the analysis, we had admixed individuals from only four populations: Argentina, Chile, Colombia, and Peru. We compared ADMIXTURE estimates at K = 3 of autosomal and X-chromosomal ancestry. We find an increase in Native American ancestry on the X-chromosome compared to the autosomes (Wilcoxon p < 0.001). This suggests that the admixture process involved predominantly European males and Amerindian females.

[...] Previous work has performed ASPCA ancestry analyses using both trio-phased and population-phased data. Here, we show that these analyses give similar results for trio-phased and population-phased samples. Trio phasing generally produces more accurate haplotypes than population phasing. This could affect the results of ancestry deconvolution methods that rely upon long-range phasing information, such as ASPCA and Tracts. However, we find no significant difference between trio and population phasing results when using RFMix’s phase correction feature. In the paper describing the RFMix algorithm, Maples et al. demonstrate that the RFMix phase correction feature produces highly accurate long-range haplotypes in admixed populations even when population phasing was performed.
To assess the differences between trio-phased and population-phased data, we compared ASPCA and Tracts results from the 1000 Genomes Peruvian and Colombian individuals between the different phasing approaches. For ASPCA, we find that the population and trio-based methods return similar results for both the Native American and European ancestry. To assess the effect of phasing on the IBD analysis, we compared the results of IBD tract length analysis of trio-phased and population-phased samples. The trio-phased IBD analysis finds an increased number of IBD tracts; however, the proportion of European, Native American, and African tracts is very similar, as is the length distribution of the tracts. We find the IBD tracts of each population have a Spearman correlation of 0.995 for the European tracts, 0.996 for the Native American IBD tracts, and 0.952 for African IBD tracts. Thus, we find no evidence of systematic bias in the IBD analysis due to the population phasing. For the Tracts analysis, the trio-phased data give a slightly more recent onset of admixture than the population-phased data in the Peruvians and an identical onset in the Colombians. Specifically, we estimate 10 generations to the onset of admixture in the population-phased 1000 Genomes Peruvians vs. 9 generations for the same individuals when trio-phased. For the 1000 Genomes Colombians, we estimate 12 generations for both the population-phased and trio-phased data. Therefore, the admixture onset times calculated here may be slightly biased towards overestimating the time since the onset of admixture. These tests indicate that using population-phased samples in combination with RFMix’s phase correction abilities in our ancestry analysis pipeline introduces little bias to the results. […]
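The Spearman comparison of trio-phased and population-phased IBD tracts can be illustrated with a small, dependency-free sketch. The excerpt does not say exactly which quantities were correlated, so this example assumes the inputs are counts of IBD tracts per length bin under each phasing scheme. The function names are hypothetical and the counts are invented for illustration; they are not the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented tract counts per length bin (illustration only).
trio_counts = [120, 80, 40, 15, 5]
pop_counts = [100, 70, 35, 12, 6]
print("Spearman rho:", spearman(trio_counts, pop_counts))
```

A rank correlation fits here because trio phasing finds more tracts overall: Spearman's coefficient ignores that difference in scale and asks only whether the two length distributions rise and fall together.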

Pipeline specifications

Software tools: ADMIXTURE, RFMix
Application: Population genetic analysis
Organisms: Homo sapiens