Computational protocol: Demographic Inference Using Spectral Methods on SNP Data, with an Analysis of the Human Out-of-Africa Expansion

Protocol publication

[…] The observable object that we aim to reproduce theoretically is the allele-frequency spectrum (AFS). In this section, we review the definition of the AFS in a multipopulation context and how one can approximate it with numerical solutions of diffusion-based models that use truncated expansions by orthogonal polynomials.

The joint AFS is defined as a $K$-dimensional matrix built from the allele counts observed in a sample of individuals from $K$ different populations. Each value in the matrix is an expected number (in the case of an AFS calculated under a theoretical model) or an observed number (in the case of data) of diallelic polymorphisms that fall into a particular frequency class. We denote as $n_{i_1,i_2,\ldots,i_K}$ an entry of the observed joint AFS that specifies the number of SNPs whose derived state occurs $i_1 \in [0, C_1]$ times in the first population, $i_2 \in [0, C_2]$ times in the second population, etc. Here, $C_a$ is the total number of chromosomes sampled from the $a$th population ($a = 1, \ldots, K$). For simplicity of notation, we assume $C_a = C$ for all $a$ throughout this article. While $n_{i_1,i_2,\ldots,i_K}$ denotes an entry of the empirical AFS, we denote as $f_{i_1,i_2,\ldots,i_K}(\theta)$ the analogous entry of the theoretical AFS.

The AFS can be seen as an object derived from the distribution of population allele frequencies $\phi(x)$ on $[0, 1]^K$. In particular, if the derived allele frequencies of a SNP taken at random consist of a vector $\{x_a\}_{a=1}^{K}$, where $x_a$ is the frequency of the SNP in population $a$, independently and identically distributed with respect to the distribution $\phi(x)$, the AFS consists of a finite sample of population alleles as defined in Equation 1. In our model-based approach, $\phi(x)$ is interpreted as a present-time density that has been shaped by a historical Wright–Fisher process on a population tree specified by the parameters $\theta$. We denote the resulting model-dependent joint density by $\phi(x|\theta)$.
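The relation between population frequencies and the joint AFS can be illustrated with a short Python sketch. It is not taken from the protocol: Equation 1 is not reproduced in this excerpt, so the code assumes the standard reading in which each population contributes a binomial sample of $C$ chromosomes at the frequency $x_a$. The function names `joint_afs` and `sample_counts` are hypothetical, and the Beta-distributed frequencies are only a stand-in for $\phi(x)$.

```python
import numpy as np

def joint_afs(counts, C):
    """Tabulate a joint AFS from derived-allele counts.

    counts: (n_snps, K) integers, each in [0, C].
    Returns a K-dimensional array with C + 1 bins per axis, where
    entry [i_1, ..., i_K] is the number of SNPs with those counts.
    """
    counts = np.asarray(counts)
    K = counts.shape[1]
    afs = np.zeros((C + 1,) * K, dtype=np.int64)
    # np.add.at accumulates correctly when the same cell is hit repeatedly.
    np.add.at(afs, tuple(counts.T), 1)
    return afs

def sample_counts(freqs, C, rng):
    """Assumed sampling step: draw C chromosomes per population per SNP."""
    return rng.binomial(C, freqs)

rng = np.random.default_rng(0)
# Hypothetical stand-in for phi(x): 1000 SNPs in K = 2 populations.
freqs = rng.beta(0.5, 0.5, size=(1000, 2))
counts = sample_counts(freqs, C=20, rng=rng)
afs = joint_afs(counts, C=20)   # shape (21, 21), one SNP per tabulated count
```

Here `afs[i1, i2]` plays the role of $n_{i_1,i_2}$. Replacing the stand-in frequencies with draws from a model density $\phi(x|\theta)$ would give a Monte Carlo estimate of the theoretical entries $f_{i_1,i_2}(\theta)$, up to normalization.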
The parameters $\theta$ depend on the particular model and usually involve effective population sizes, migration rates, splitting times, admixture coefficients, population growth rates, etc. In the diffusion approximation to multipopulation Wright–Fisher processes exchanging migrants, the time evolution of $\phi(x, t)$ obeys a partial differential equation (PDE) of the type

$$\frac{\partial}{\partial t}\phi(x,t) = \sum_{a,b}\left[\frac{1}{2}\frac{\partial^2}{\partial x_a\,\partial x_b}\left(\frac{\delta_{ab}\,x_a(1-x_a)}{2N_{e,a}(t)}\,\phi(x,t)\right) - \frac{\partial}{\partial x_a}\Big(m_{ab}(t)\,(x_b-x_a)\,\phi(x,t)\Big)\right] + \rho(x,t). \tag{3}$$

Here, $\{N_{e,a}(t)\}_{a=1}^{K}$ denotes the effective population sizes, $\{m_{ab}(t)\}_{a,b=1}^{K}$ denotes the fraction of chromosomes that population $a$ receives from $b$, and the nonhomogeneous term $\rho(x, t)$ describes the total incoming/outgoing flow of SNPs per generation into the $K$-cube from different boundary components of the $K$-cube and from de novo mutations. These boundary conditions are treated in more detail later.

[...] We simulated data under seven different demographic histories for two and three populations to compare different approaches to forward diffusions. We generated 50,000 population allele frequencies using Monte Carlo simulations in each of the seven demographic scenarios. We sampled 20 chromosomes in the scenarios that involved three populations and 50 chromosomes in the scenarios that involved two populations. For each demographic scenario, we computed the AFS with our polynomial-based approach (MultiPop) and with the finite-difference method (∂a∂i), and compared each AFS with the AFS computed with Monte Carlo simulations. The different models and parameters used in the simulations are described below. […]
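To make the Monte Carlo reference concrete, the following Python sketch simulates a discrete Wright–Fisher process whose per-generation mean change is $\sum_b m_{ab}(x_b - x_a)$ and whose sampling variance is $x_a(1-x_a)/(2N_{e,a})$, matching the drift and diffusion terms of Equation 3. It is an illustration under stated assumptions, not the authors' simulation code and not MultiPop or ∂a∂i: the population sizes, migration rate, number of generations, and initial frequencies are hypothetical, the function names are invented here, and the source term $\rho(x, t)$ (new mutations and boundary flow) is omitted, so SNPs that fix or are lost simply stay at the boundary.

```python
import numpy as np

def wright_fisher_step(x, two_N, m, rng):
    """Advance frequencies by one generation.

    x:     (n_snps, K) derived-allele frequencies.
    two_N: (K,) number of chromosomes per population (2 * N_e).
    m:     (K, K) matrix, m[a, b] = fraction population a receives from b.
    """
    # Migration: x_a + sum_b m[a, b] * (x_b - x_a).
    inflow = x @ m.T
    x_mig = x + inflow - x * m.sum(axis=1)
    x_mig = np.clip(x_mig, 0.0, 1.0)  # guard against floating-point round-off
    # Drift: binomial resampling of 2N chromosomes in each population.
    return rng.binomial(two_N, x_mig) / two_N

def simulate_frequencies(x0, two_N, m, generations, rng):
    """Iterate the one-generation step from initial frequencies x0."""
    x = np.array(x0, dtype=float)
    for _ in range(generations):
        x = wright_fisher_step(x, two_N, m, rng)
    return x

def sampled_afs(x, C, rng):
    """Sample C chromosomes per population and tabulate the joint AFS."""
    counts = rng.binomial(C, x)
    afs = np.zeros((C + 1,) * x.shape[1], dtype=np.int64)
    np.add.at(afs, tuple(counts.T), 1)
    return afs

rng = np.random.default_rng(1)
n_snps, C = 50_000, 50                    # sample sizes quoted in the text
two_N = np.array([2_000, 2_000])          # hypothetical population sizes
m = np.array([[0.0, 1e-3], [1e-3, 0.0]])  # hypothetical symmetric migration
# Hypothetical split model: both populations start at the ancestral frequency.
ancestral = rng.uniform(0.05, 0.95, size=n_snps)
x0 = np.column_stack([ancestral, ancestral])
x_now = simulate_frequencies(x0, two_N, m, generations=100, rng=rng)
mc_afs = sampled_afs(x_now, C, rng)       # shape (51, 51)
```

An AFS obtained from a numerical solution of the PDE under the same parameters could then be compared entry by entry with `mc_afs`, which is the kind of comparison the passage describes.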

Pipeline specifications

Applications: Population genetic analysis, GWAS
Organisms: Homo sapiens
Diseases: Tooth Migration