Computational protocol: On Detecting Selective Sweeps Using Single Genomes

Protocol publication

[…] All simulations were performed using the program SFS_CODE (Hernandez, ), a generalized Wright–Fisher forward population genetic simulator for finite-site mutation models with selection, recombination, and demography. The program and documentation are available for download at: http://sfscode.sourceforge.net/SFS_CODE/SFS_CODE_home/SFS_CODE_home.html

The parameters used are human-specific and rescaled for computational efficiency. The mutation rate (μ) is 2.35 × 10^-8 per site per generation, assuming a human–chimp divergence of ~1.13%, a divergence time of ~6 mya, and a generation time of 25 years (Gutenkunst et al., ). For expediency, the simulated effective population size, Ne, is 500; to account for the estimated Ne = 10,000 for humans, a rescaling factor of 20 was used. Thus, θ = 4Neμ = 0.00094 for all simulations. Similarly, the scaled recombination rate is ρ = 4Nr = 0.00074 (Nielsen et al., ). The selection coefficient, s, is evaluated at 0.1, 0.01, and 0.001. All simulations were conducted both in the presence and absence of recombination. Additionally, recurrent positive and negative selection were modeled with a fraction of 0.01, 0.001, or 0.0001 of sites under selection.

Population bottlenecks are modeled as follows: a population of constant size N is reduced to size Nb at time tb (in units of 4N generations) in the past and then increases exponentially back to size N. Bottlenecks are simulated for various times since the reduction (tb = 0.1, 0.54, and 1, in units of 4N generations) and severities (0.02, 0.1, and 0.722).

Enard et al. () chose the size of the test region (L) depending on the level of heterozygosity across the genome, and thus it varies by species. We simulate a test region of 100 kb with adjacent 2000 kb genomic flanking regions. For this test region, a ratio rl is calculated, which is the ratio of polymorphism to divergence for the L region.
A similar ratio rg is calculated for the adjacent genomic region (G), and the final ratio is then Robs = rl/rg. If Robs is less than 1, there is said to be a local reduction in heterozygosity. Similarly, a ratio R is computed for 5,000 additional windows of size q that are randomly sampled within G, but at a distance of at least five times q from L. The Robs for the test region is ranked among the R values for the adjacent regions. K is the proportion of random windows with R lower than Robs. K values < 0.05 are considered statistically significant and thus lead to rejection of the null model (i.e., are consistent with positive selection).

Simulations under models of positive selection were performed to characterize the true positive rate, while a variety of simulations under alternative models characterize the false positive rate. These two measures thus describe the performance of the K-statistic. […]
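
The ranking procedure described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the coordinate layout (left flank, then L, then right flank), the choice q = 100 kb, the polymorphism and divergence counts, and the placeholder R values for the random windows are all assumptions made for illustration. Only the definitions of Robs and K, the 5,000 windows, the region sizes, and the 5q exclusion distance come from the text.

```python
import random


def pd_ratio(n_polymorphic, n_divergent):
    """Ratio of polymorphism to divergence for one region."""
    if n_divergent <= 0:
        raise ValueError("divergence count must be positive")
    return n_polymorphic / n_divergent


def sample_window_starts(flank_len, test_len, q, n_windows, rng):
    """Rejection-sample start positions of windows of size q inside G.

    Assumed layout (in bp): [left flank][test region L][right flank].
    A window is kept only if it lies at least 5*q away from L.
    """
    if flank_len < 6 * q:
        raise ValueError("flank too short for a window 5*q away from L")
    l_start = flank_len
    l_end = flank_len + test_len
    total = 2 * flank_len + test_len
    starts = []
    while len(starts) < n_windows:
        s = rng.randrange(0, total - q + 1)
        left_ok = s + q <= l_start - 5 * q   # window ends 5q before L
        right_ok = s >= l_end + 5 * q        # window starts 5q after L
        if left_ok or right_ok:
            starts.append(s)
    return starts


def k_statistic(r_obs, window_ratios):
    """Proportion of random windows whose R is lower than Robs."""
    lower = sum(1 for r in window_ratios if r < r_obs)
    return lower / len(window_ratios)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical counts; real values would come from simulated data.
    r_l = pd_ratio(10, 100)      # test region L
    r_g = pd_ratio(400, 2000)    # adjacent genomic region G
    r_obs = r_l / r_g
    starts = sample_window_starts(2_000_000, 100_000, 100_000, 5000, rng)
    # Placeholder R values; in practice each R would be computed from the
    # polymorphism and divergence observed in the corresponding window.
    window_ratios = [rng.uniform(0.4, 1.6) for _ in starts]
    k = k_statistic(r_obs, window_ratios)
    print(f"Robs = {r_obs:.2f}, K = {k:.4f}, significant = {k < 0.05}")
```

Rejection sampling is used here only because it keeps the exclusion rule easy to read; sampling directly from the two allowed intervals would be equivalent.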
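
The rescaled simulation parameters quoted in the protocol can be checked with a few lines of arithmetic. The sketch below assumes the standard relations θ = 4Neμ and ρ = 4Ner, and the usual approximation that pairwise divergence is about 2μT generations. The function names are hypothetical, and the per-site recombination rate is a value derived here under the assumption that N in 4Nr refers to Ne = 10,000; it is not stated in the source.

```python
MU = 2.35e-8        # mutation rate per site per generation (from the text)
NE_HUMAN = 10_000   # estimated human effective population size
NE_SIM = 500        # simulated effective population size
RHO = 0.00074       # scaled recombination rate, 4Nr (from the text)


def rescaling_factor(ne_target, ne_sim):
    """Factor by which per-generation rates are scaled up in simulation."""
    return ne_target / ne_sim


def theta(ne, mu):
    """Population-scaled mutation rate per site, theta = 4 * Ne * mu."""
    return 4 * ne * mu


def per_site_rate(scaled_rate, ne):
    """Invert a 4*Ne-scaled rate back to a per-generation rate."""
    return scaled_rate / (4 * ne)


def mu_from_divergence(divergence, years, generation_time):
    """Assumed relation: divergence = 2 * mu * (number of generations)."""
    generations = years / generation_time
    return divergence / (2 * generations)


if __name__ == "__main__":
    factor = rescaling_factor(NE_HUMAN, NE_SIM)
    mu_sim = MU * factor  # mutation rate used with the smaller population
    print(f"rescaling factor       = {factor:g}")
    print(f"theta (human scale)    = {theta(NE_HUMAN, MU):.5f}")
    print(f"theta (simulation)     = {theta(NE_SIM, mu_sim):.5f}")
    print(f"implied r per site     = {per_site_rate(RHO, NE_HUMAN):.3g}")
    print(f"mu from divergence     = {mu_from_divergence(0.0113, 6e6, 25):.3g}")
```

Because θ depends only on the product of Ne and μ, shrinking the population by a factor of 20 while raising the mutation rate by the same factor leaves θ unchanged, which is the point of the rescaling.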

Pipeline specifications

Software tools: SFS_CODE, FPG
Application: Population genetic analysis
Organisms: Homo sapiens, Pan troglodytes, Pongo pygmaeus