Computational protocol: Reconstructing the origin and dispersal patterns of village chickens across East Africa: insights from autosomal markers

Protocol publication

[…] Allelic diversity (total number of alleles, mean number of alleles (MNA), allelic richness, polymorphic information content (PIC) and effective number of alleles) and genetic diversity (expected (He) and observed (Ho) heterozygosity) were estimated from allele frequencies with FSTAT (Goudet), the Microsatellite toolkit (Park) and POPGENE 1.32 (Yeh et al.). Total genetic variation of the populations (FIT) was partitioned into within-population (FIS) and among-population (FST) components following Weir & Cockerham. For each locus–population combination, in both the global data set and the population groupings, we tested for deviations from Hardy–Weinberg equilibrium (HWE) using Fisher's exact test with Bonferroni correction in GENEPOP 3.4 (Raymond & Rousset). Exact P-values were estimated with the Markov chain algorithm using 10 000 dememorization steps, 500 batches and 5000 iterations per batch.

We used the Bayesian clustering algorithm implemented in STRUCTURE 2.3.3 (Pritchard et al.; Falush et al.) to infer population structure and to assign individuals and populations to genetic clusters. For this analysis, the number of clusters (K) was allowed to vary from 1 to 15, with a burn-in of 50 000 followed by 100 000 Markov chain Monte Carlo (MCMC) iterations. Ten runs were carried out for each K under each of four scenarios: (i) populations admixed and allele frequencies correlated; (ii) populations admixed and allele frequencies independent; (iii) populations not admixed but allele frequencies correlated; and (iv) populations not admixed and allele frequencies independent. To estimate the optimal K, we used three approaches. First, we used the best log-likelihood score, i.e. the one resulting in the highest percentage of membership coefficient (q) to each cluster (Pritchard et al.). 
Second, we calculated ΔK = m(|L″(K)|)/s[L(K)], where L(K) is the log-likelihood of the data for a given K, L′(K) = L(K) − L(K − 1), L″(K) = L′(K + 1) − L′(K), m is the mean across runs and s the standard deviation across runs, and plotted ΔK against K; the optimal number of clusters is the K with the largest ΔK, i.e. the largest second-order change in L(K) between successive values of K (Evanno et al.). Third, we followed the suggestion of Pritchard et al. that, for real-world data in which identifying the correct K is not always straightforward, the best choice of K is the one that reveals a biologically meaningful genetic structure. DISTRUCT (Rosenberg) was used to display the STRUCTURE results graphically.

To aid the interpretation of the STRUCTURE results, and thus correctly infer the underlying genetic structure, we also used the Factorial Correspondence Analysis (FCA) implemented in GENETIX 4.05 (Belkhir et al.) and the Principal Coordinate Analysis (PCA) implemented in the ADE4 package (Dray & Dufour) in R (R Development Core Team). FCA portrays the relationships between individuals or populations by finding the linear combinations of allele frequencies that best summarize them. PCA, on the other hand, clusters individuals using proportional data derived from allele frequencies. By comparing the clustering solutions generated by STRUCTURE, FCA and PCA, we defined the clusters of village chickens used in subsequent population genetic analyses.

The possible influence of single loci on the genetic structure revealed by STRUCTURE, FCA and PCA was assessed using Multiple Co-inertia Analysis (MCoA) (Chessel & Hanafi), also implemented in the ADE4 package (Dray & Dufour) in R (R Development Core Team). MCoA identifies features common to the single-marker analyses, generates a consensus reference structure from the simultaneous analysis of all markers, and makes it possible to compare the population structure obtained from each marker with that reference. 
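The per-locus diversity statistics listed at the start of this section follow standard formulas applied to allele frequencies. As a rough illustration only (not the FSTAT, Microsatellite toolkit or POPGENE code), the Python sketch below assumes diploid genotypes stored as pairs of allele sizes; the function names are hypothetical, He is Nei's gene diversity with an optional 2n/(2n − 1) small-sample correction, and PIC uses the usual formula 1 − Σpᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ².

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from (allele1, allele2) genotype pairs."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Ho: proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs, n_individuals=None):
    """He = 1 - sum(p_i^2); optionally scaled by 2n / (2n - 1) for n diploids."""
    he = 1.0 - sum(p * p for p in freqs.values())
    if n_individuals:
        he *= 2 * n_individuals / (2 * n_individuals - 1)
    return he


def effective_number_of_alleles(freqs):
    """Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i < j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    pair_term = sum(2 * x * x * y * y for x, y in combinations(p, 2))
    return 1.0 - homozygosity - pair_term


if __name__ == "__main__":
    # Hypothetical genotypes (allele sizes in bp) for four birds at one locus.
    locus = [(150, 152), (150, 150), (152, 154), (150, 154)]
    f = allele_frequencies(locus)
    print(f, observed_heterozygosity(locus), expected_heterozygosity(f, len(locus)))
```

Allelic richness, which requires rarefaction to a common sample size, is not covered by this sketch.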
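Evanno's ΔK can be computed from the ln P(D) values of the replicate STRUCTURE runs (ten per K in this protocol). The following is a minimal Python sketch, assuming the log-likelihoods have already been collected into a dictionary keyed by K; it applies the first and second differences to the mean L(K) across runs and divides by the standard deviation of L(K) across runs, so ΔK is defined only for K values with neighbours on both sides. The function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al.) from replicate STRUCTURE runs.

    lnp_by_k maps each K to a list of ln P(D) values (at least 2 runs per K).
    Returns {K: Delta K} for every K whose neighbours K - 1 and K + 1 were run.
    """
    mean_l = {k: mean(runs) for k, runs in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # L''(K) needs both neighbouring K values
        l_prime_k = mean_l[k] - mean_l[k - 1]           # L'(K)
        l_prime_next = mean_l[k + 1] - mean_l[k]        # L'(K + 1)
        l_double_prime = abs(l_prime_next - l_prime_k)  # |L''(K)|
        sd = stdev(lnp_by_k[k])                         # s[L(K)]
        delta_k[k] = l_double_prime / sd if sd > 0 else float("inf")
    return delta_k


if __name__ == "__main__":
    # Hypothetical ln P(D) values, three runs per K.
    runs = {
        1: [-5000, -5002, -4998],
        2: [-4600, -4610, -4590],
        3: [-4500, -4520, -4480],
        4: [-4490, -4530, -4450],
    }
    dk = evanno_delta_k(runs)
    print(dk, "best K:", max(dk, key=dk.get))
```

For these hypothetical values ΔK is 30 at K = 2 and 4.5 at K = 3, so the peak falls at K = 2. Pairing individual runs across K instead of using the means would be an alternative reading of m(|L″(K)|).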
Using MCoA, we estimated the typological value (Tv) of each marker, i.e. its contribution to the construction of the reference typology, which equals the variance of the marker (Var) multiplied by its congruence with the consensus, Cos² (i.e. the correlation between the scores of the individual locus table and the synthetic variable of the same rank): Tv = Var × Cos² (Laloë et al.).

The demographic history of the populations was investigated by assessing whether East African village chicken populations are at mutation–drift equilibrium (MDE). We searched for signals of population expansion or contraction using four statistical approaches. Using the program Bottleneck (Cornuet & Luikart), we first carried out the T2-test under the modified two-phase mutation model (TPM) of microsatellite evolution (Garza & Williamson) and, second, the mode-shift indicator test, a qualitative descriptor of the allele frequency distribution. The T2-test detects recent bottlenecks on the principle that a reduction in effective population size leads to an exponential decay in heterozygosity and in allele numbers at polymorphic loci, and that the loss of allelic diversity is faster and more pronounced than the decline in heterozygosity (Cornuet & Luikart). The mode-shift indicator test reveals a bottleneck at some point in a population's history when the allele frequency distribution deviates from the expected L shape. The TPM parameters were set so that 88% of mutations followed the stepwise mutation model and 12% followed a multistep model with a variance of nine (Di Rienzo et al.). Significant departures from MDE within and across populations were tested with the one-tailed Wilcoxon test. Third and fourth, we tested for MDE using the intra-locus kurtosis test (k-test) and the inter-locus variance test (g-test) (Reich & Goldstein; Reich et al.). 
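The mode-shift indicator can be illustrated with a few lines of code. This Python sketch assumes allele frequencies for each locus of one population have already been estimated; it pools all alleles into ten frequency classes of width 0.1 and flags a mode shift when any higher class holds more alleles than the lowest class. The bin-boundary convention (a frequency of exactly 0.1 falls in the second class) and the function names are assumptions, not Bottleneck's documented behaviour.

```python
def frequency_class_counts(freqs_by_locus, n_classes=10):
    """Pool allele frequencies from all loci into n_classes equal-width classes.

    Assumed convention: class i covers [i/n, (i+1)/n), up to floating-point
    rounding at the boundaries; a frequency of exactly 1.0 goes in the top
    class and alleles with frequency 0 are ignored.
    """
    counts = [0] * n_classes
    for freqs in freqs_by_locus:
        for p in freqs:
            if p <= 0:
                continue
            counts[min(int(p * n_classes), n_classes - 1)] += 1
    return counts


def mode_shift(counts):
    """True if any higher class holds more alleles than the lowest class."""
    return any(c > counts[0] for c in counts[1:])


if __name__ == "__main__":
    # Hypothetical allele frequencies for three loci in one population.
    loci = [[0.05, 0.05, 0.6, 0.3], [0.02, 0.08, 0.9], [0.5, 0.45, 0.05]]
    counts = frequency_class_counts(loci)
    print(counts, "mode shift:", mode_shift(counts))
```

An L-shaped distribution (most alleles in the lowest class) gives no mode shift; a distribution whose mode lies in an intermediate class would be flagged.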
The k-test is based on the observation that allele size distributions in expanding populations differ from those in demographically stable populations. In expanding populations, the kurtosis (k), or the combination of the variance and kurtosis (Reich et al.), of the allele size distribution is positive. The test is a binomial test of the number of positive k-values, under the expectation of an almost equal probability (P = 0.515) of negative and positive k-values. The g-test, on the other hand, compares the observed and expected variation, across loci, of the variance in allele size. In stable populations this variance differs widely among loci, whereas in expanding populations it is much more even. A low value of g, i.e. little among-locus variation in the variance of allele sizes, may therefore be taken as evidence of expansion; for inference we used the cut-off values given in Reich et al. (page 455). Both the k- and g-tests were performed with the Excel macro ‘kgtests’ (Bilgin) in Microsoft Excel®.

As a livestock species closely associated with human activities and societies, the domestic chicken may have a genetic structure shaped by genetic improvement through crossbreeding with commercial stocks, and by past migration and geographic dispersal. To investigate whether any of the genetic clusters revealed by STRUCTURE, FCA and PCA were influenced by introgression from commercial breeds, the 112 genotyped individuals from four commercial breeds were included in a separate STRUCTURE analysis together with all the indigenous birds, using the same parameters and settings as in the analysis of the indigenous chickens alone. Individual village chickens with a membership coefficient (q-value) above 0.2 for the commercial cluster were regarded as influenced by commercial breeds. 
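The binomial step of the k-test can be sketched as follows, assuming the per-locus k-values have already been computed (for example with ‘kgtests’) and that P = 0.515 is the null probability of a positive k at a locus. The code only counts loci with positive k and returns both exact binomial tail probabilities, leaving the choice of tail to the direction of the hypothesis being tested; the function names are hypothetical.

```python
from math import comb


def binomial_tails(successes, trials, p):
    """Exact P(X <= successes) and P(X >= successes) for X ~ Binomial(trials, p)."""
    pmf = [comb(trials, i) * p**i * (1 - p) ** (trials - i) for i in range(trials + 1)]
    return sum(pmf[: successes + 1]), sum(pmf[successes:])


def k_test(k_values, p_positive=0.515):
    """Count loci with positive k; return (count, (lower tail, upper tail)).

    p_positive is assumed to be the null probability of a positive k per locus.
    """
    n_positive = sum(1 for k in k_values if k > 0)
    return n_positive, binomial_tails(n_positive, len(k_values), p_positive)


if __name__ == "__main__":
    # Hypothetical k-values for ten loci.
    ks = [-0.8, -0.3, 0.4, -1.1, -0.2, -0.6, 0.1, -0.9, -0.4, -0.5]
    print(k_test(ks))
```

With two positive values out of ten loci, the lower tail is about 0.045, i.e. an unusually small number of positive k-values under the assumed null.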
Non-random associations between genetic differentiation, measured as FST/(1 − FST) (Rousset), and geographic distance in kilometres were tested using IBDWS 3.05 (…). Geographic distances between populations were calculated with the MapCrow Travel Distance Calculator (…) as the distance between the most central towns of each sampling location. To investigate the ecological specificity of the genetic clusters generated by STRUCTURE, FCA and PCA, as an indirect indicator of adaptation to different ecological zones (eco-zones), we tested whether any genetic cluster was associated with any of the eco-zones spanning the study area (…, Supporting information). The magnitude and significance of the correlations between genetic clusters and eco-zones were evaluated using Kendall's tau and Spearman's rho.

Locus-specific FST values across populations were used to test the hypothesis of diversifying selection acting at each locus. We used two approaches: the FDIST2 outlier test (Beaumont & Nichols) implemented in LOSITAN (Antao et al.) and the Bayesian approach implemented in BayeScan (Foll & Gaggiotti). We chose these two methods because they have the lowest type I and type II error rates (Narum & Hess). For FDIST2, we ran 100 000 simulations with a cut-off probability of 0.99. For BayeScan, we set the prior odds for the neutral model to 10 with a false discovery rate (FDR) of 0.05, and retained 550 000 MCMC iterations to ensure convergence of the posterior distributions with minimal autocorrelation of the MCMC chain. We focused only on outlier loci suggested to be under diversifying (positive) selection, although both methods can also detect outlier loci with significantly low FST values indicative of balancing selection. 
This restriction reflects the fact that microsatellite loci with high mutation rates may show significantly low FST values independently of any balancing selection (Beaumont). The analysis was performed for each cluster generated by STRUCTURE, FCA and PCA. […]
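The isolation-by-distance test described earlier in this section compares a matrix of FST/(1 − FST) values with a matrix of geographic distances. The following is a generic sketch of a one-tailed Mantel permutation test on Pearson's r, not the IBDWS implementation; the matrix layout (square, symmetric lists of lists), the permutation count and the function names are assumptions for illustration.

```python
import random
from math import sqrt


def rousset_transform(fst):
    """Rousset's linearised differentiation FST / (1 - FST)."""
    return fst / (1.0 - fst)


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, geographic, permutations=9999, seed=1):
    """One-tailed Mantel test for a positive association of two square matrices."""
    identity = list(range(len(genetic)))
    geo = _upper_triangle(geographic, identity)
    r_obs = _pearson(_upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute population labels of the genetic matrix
        if _pearson(_upper_triangle(genetic, order), geo) >= r_obs:
            at_least_as_large += 1
    return r_obs, at_least_as_large / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical sampling sites along a transect (km) and pairwise FST values.
    km = [0, 100, 250, 400, 600]
    geo = [[abs(a - b) for b in km] for a in km]
    gen = [[rousset_transform(0.0002 * abs(a - b)) for b in km] for a in km]
    print(mantel_test(gen, geo, permutations=999))
```

Permuting whole rows and columns together keeps each population's set of pairwise values intact, which is why the Mantel test is preferred over treating the pairwise entries as independent observations.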

Pipeline specifications

Software tools: POPGENE, Genepop, DISTRUCT, IBDWS, BayeScan
Application: Population genetic analysis
Organisms: Gallus gallus, Homo sapiens