Computational protocol: Influence of mutation rate on estimators of genetic differentiation - lessons from Arabidopsis thaliana

Similar protocols

Protocol publication

[…] In order to investigate the behaviour of FST, F'ST, ΦST and D under high mutation rates, computer simulations using EasyPop 1.8 [] were performed. The simulation scheme was set to 10 populations with 500 individuals each, 20 freely recombining loci and random mating hermaphrodites. All loci were set initially as polymorphic by allowing EasyPop to generate the genotypes in the first generation. Populations followed an island model of migration. Migration rates (probability that a given individual will migrate in each generation), m ranged from 0.1 to 0.00001 and mutation rates (probability that a given allele will mutate in each generation), μ from 0.00001 to 0.01. In order to simulate microsatellite loci we first examined a pure single step mutation model. Then we relaxed this assumption by using a mixed mutation model in which the loci followed a single step mutation model but with the probability of 0.2 to mutate to any state. The number of possible allelic states was set to 30. The effect of self-fertilisation was examined by doing simulations with proportion of self-fertilisation set to 0.9. Simulations were run for 2000 generations. In our simulations this was enough for FST to reach equilibrium. To simulate a realistic sampling situation, 30 individuals were finally sampled from each population for parameter estimation. Each simulation was repeated 5 times for a given set of parameter values. For each simulated dataset we calculated the estimators of genetic differentiation as described below.Next we examined how mutation rate at underlying QTL affects QST. We used quantiNEMO [], with the same settings as described above for neutral markers with the following exceptions: the number of QTL underlying the variation in the quantitative trait was 10 and there were 21 possible allelic states for each QTL. We used the random mutation model in quantiNEMO for the QTL alleles, in this model, allelic effects are drawn from a normal distribution, all effects are additive in our simulations. We also ran the simulations using the incremental mutation model, where the allelic effect of a new mutation resembles its ancestor. Variance of allelic effects was set to 0.1. The simulated quantitative trait was neutral and did not have any effect on fitness. The simulation was started at a state where all loci were monomorphic, the number of generations was 4000. The time to reach equilibrium was longer for low migration and mutation rates and in these cases number of generations was 6000. This is longer than for neutral markers because simulations had to be started from a monomorphic state; otherwise distribution of allelic effects becomes unrealistic. Variance components for QST were estimated from genotypic values, which are returned by quantiNEMO as output, using R-scripts written by IK. The statistical model was a mixed-effect model with populations as random factors; REML-estimates of variance components were used. This was done in order to calculate QST also in the presence of self-fertilisation, which quantiNEMO does not calculate as a standard output. QST was estimated from the equation [], where is the between populations genetic variance component, is the within populations genetic variance component and FIS is reduction of heterozygosity within individuals due to inbreeding. Our method of estimating QST gives the same results as the standard output of quantiNEMO when mating is random (results not shown). For each parameter set, 50 replicates of simulations with a single quantitative trait were run.We also performed coalescent simulations to investigate the effect of different marker types on FST calculations. We investigated DNA haplotypes (these would be derived by re-sequencing short fragments, one locus is one fragment), independent single SNP markers and microsatellite markers following a single step mutation model. All coalescent simulations were performed using the program ms []. We simulated an island model of population structure with 10 populations, 20 individuals were sampled from each population. For sequence haplotypes and microsatellites 30 independent loci were simulated, for SNP markers we simulated 100 independent SNPs. For single SNPs and haplotypes, multiple hits were not permitted. The microsatellite mutation model was implemented via R-script. In the program ms migration and mutation rate are expressed in terms of effective population size, 4Nm and 4Nμ respectively. We set up the simulations so that the effective population size was 1000 for each population and then parameters m and μ ranged from 0.0001 to 0.1 for m and 0.00001 to 0.001 for μ. Each simulation was repeated 5 times for each parameter combination. [...] All statistical analyses were done using the statistical environment R [] unless otherwise stated. Methods not implemented by R-packages were implemented via R-scripts written by IK and are available upon request.Measure of genetic diversity, Nei's gene diversity (Hs) was calculated using FSTAT 2.9.3 []. The microsatellite population mutation rate, θ, is the product of effective population size and mutation rate at a locus was calculated following equation 15 of Kimmel et al. []. The performance of this summary statistic based method has been shown to be comparable to likelihood-based methods []. θ was calculated for each locus within each region. For SNP data the minor allele frequency was calculated for each locus in each region.FST was estimated according to Weir & Cockerham [] for microsatellites and SNP markers, using the R-package "hierfstat" []. All other genetic differentiation methods were implemented via R-scripts written by IK. For microsatellites the standardised genetic differentiation measure, F'ST [], was estimated using the maximised variance component method of Meirmans []. In order to take the distance between the microsatellite alleles or sequence haplotypes into account [] we estimated ΦST using the method of Michalakis and Excoffier []. Differentiation indices between regions were calculated in a hierarchical setting, taking into account the partition of variation between populations within regions []. Confidence intervals for different measures of genetic differentiation were generated by bootstrapping over loci. An estimator for D was calculated following Jost [].The expected FST was calculated for the forward and coalescent simulations using the relationship between FST and coalescence times in the island model. The expected FST was calculated as [], where = average coalescence time of alleles from different populations and = average coalescence time of two alleles from the same population. For an island model of population structure, and , where N = population size, d = number of populations and m = migration rate []. This provides us with an analytical estimate of the expected value of FST. We used this as a baseline when comparing the different estimators.To check whether ΦST >FST for the microsatellite loci, we used a permutation test []; if ΦST >FST stepwise mutations may contribute to genetic differentiation, thereby providing one explanation for the difference. The test was implemented in the program SPAGeDi 1.2 []. This is done by permuting microsatellite allele sizes among allelic states to test if stepwise mutations contribute to genetic differentiation. […]

Pipeline specifications

Software tools EASYPOP, quantiNemo, SPAGeDi
Applications Phylogenetics, Population genetic analysis
Organisms Arabidopsis thaliana