Computational protocol: On the Origin and Spread of the Scab Disease of Apple: Out of Central Asia


Protocol publication

[…] We collected 1,273 fungal strains of V. inaequalis from M. ×domestica at 28 locations representing seven regions on five continents: Central Asia (Xinjiang Province of China, Iran, Azerbaijan), Europe (France, Sweden, Spain), North Africa (Morocco), South Africa, North America (Canada, USA), South America (Brazil), and Australasia (New Zealand). All samples came from single orchards, except the Canadian sample, which originated from several locations and various host cultivars. In each orchard, infected leaves were sampled randomly, and only one leaf was collected per apple tree.
Host resistance can induce selective and/or demographic sweeps in fungal populations, leading to lineages that are highly divergent from populations found on susceptible cultivars. To avoid confounding geographic structure with possible associations between host cultivars and fungal genotypes, (i) we sampled cultivars with no known effective resistance; (ii) we minimized the total number of cultivars sampled by focusing as much as possible on the commercially leading cultivars Fuji, Royal Gala, and Golden Delicious; and (iii) we collected samples from different cultivars at several locations and checked for the absence of associations between host and V. inaequalis genotypes by calculating pairwise φST between samples (an analog of Wright's fixation index FST) using GenAlEx. As pairwise φST values were low or not significantly different from zero, samples from the same location were pooled for all subsequent analyses, except the samples from Mechraâ Bel Ksiri in Morocco. We obtained a total of 29 samples. […] The number of haplotypes was calculated using Arlequin 3.00 and was used to quantify the clonal fraction. We treated multilocus haplotypes that occurred more than once as clones.
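The pairwise φST check described above can be illustrated with a small AMOVA-style calculation for two samples of haploid multilocus haplotypes. This is a minimal sketch, not GenAlEx's implementation: it assumes that the squared distance between two isolates is the number of loci at which they carry different alleles, uses the variance-component formulas of the AMOVA framework (Excoffier et al. 1992), and derives a p-value by permuting individuals between the two samples. The function names are illustrative.

```python
import random
from itertools import combinations


def squared_distance(h1, h2):
    """Assumed haploid distance: number of loci at which two haplotypes differ (used as delta^2)."""
    return sum(a != b for a, b in zip(h1, h2))


def _ssd(haplotypes):
    # Sum of squared distances over unordered pairs, divided by the number of individuals
    return sum(squared_distance(a, b) for a, b in combinations(haplotypes, 2)) / len(haplotypes)


def phi_st(pop1, pop2):
    """AMOVA-style PhiST = var_among / (var_among + var_within) for two non-empty samples.

    Requires at least three individuals in total (so that the within-sample df is > 0).
    """
    groups = [list(pop1), list(pop2)]
    pooled = groups[0] + groups[1]
    n_total, n_groups = len(pooled), len(groups)
    ssd_within = sum(_ssd(g) for g in groups)
    ssd_among = _ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    # Average sample-size coefficient n0 from the AMOVA variance-component expectations
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total > 0 else 0.0


def phi_st_test(pop1, pop2, n_perm=999, seed=1):
    """Observed PhiST and a permutation p-value (individuals shuffled between the two samples)."""
    rng = random.Random(seed)
    observed = phi_st(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    k = len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if phi_st(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Like other variance-component estimators, φST computed this way can be slightly negative when two samples are no more different than random subsets of the pooled data.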
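The clonal fraction and the clone correction applied before the later analyses can be sketched as follows. The code assumes each isolate is stored as a tuple of alleles (one per locus) with no missing data, and it uses the common definition clonal fraction = 1 − (number of distinct multilocus haplotypes / number of isolates). The source does not state the exact formula, so treat that definition, and the function names, as assumptions.

```python
# Minimal sketch: clonal fraction and clone correction for haploid multilocus data.
# Assumption: clonal fraction = 1 - (distinct haplotypes / isolates).


def clonal_fraction(haplotypes):
    """Return 1 - G/N, where G is the number of distinct multilocus haplotypes in a sample."""
    if not haplotypes:
        raise ValueError("sample contains no isolates")
    return 1.0 - len(set(haplotypes)) / len(haplotypes)


def clone_correct(samples):
    """Keep one copy of each multilocus haplotype per sample (the first occurrence).

    A haplotype shared by two samples is kept once in each of them.
    """
    # dict.fromkeys preserves insertion order, so the first copy of each haplotype is retained
    return {name: list(dict.fromkeys(haps)) for name, haps in samples.items()}


if __name__ == "__main__":
    # Hypothetical toy data: two samples, three loci
    samples = {
        "orchard_A": [(101, 210, 95), (101, 210, 95), (103, 212, 95)],
        "orchard_B": [(101, 210, 95)],
    }
    print(clonal_fraction(samples["orchard_A"]))
    print(clone_correct(samples))
```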
For all subsequent analyses, we used a clone-corrected data set in which each multilocus haplotype was represented only once in each sample.
Expected heterozygosity, allelic richness, and unique allele richness were computed using scripts written in MATLAB (The MathWorks, Natick, Massachusetts). Unique allele richness is the number of alleles that are unique to a given sample relative to all other samples, averaged across loci. To account for differences in sample size, samples were standardized to a uniform size equal to that of the smallest sample (South Africa: 12 individuals) using random draws with replacement (nonparametric bootstrapping). For each sample, expected heterozygosity, allelic richness, and unique allele richness were calculated as the average over 100 bootstrap replicates. We examined correlations between these variability indices and geographical distance, calculated as the arc distance along the Earth's surface from the easternmost Chinese sample. Because the variables tested may not be normally distributed, all correlations were tested nonparametrically using Spearman's rank correlation coefficient (rs) in GraphPad (GraphPad Software Inc., San Diego, California).
Associations of alleles among loci were examined in each sample using the index of association (IA), a generalized measure of multilocus linkage disequilibrium. The null hypothesis of random association of alleles (IA = 0), consistent with random mating, was tested using the program Multilocus by comparing the observed value of the statistic with the values obtained from 1,000 randomizations that simulate recombination. […]
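A sketch of the sample-size standardization: a sample is resampled with replacement down to 12 individuals (the size of the South African sample), and the three indices are averaged over 100 replicates. The original computations used MATLAB scripts that are not shown; this Python version assumes Nei's unbiased gene diversity as the expected-heterozygosity estimator and counts an allele as unique when it is absent from the full (not resampled) data of every other sample. Both choices, and the function names, are assumptions.

```python
import random
from statistics import mean


def expected_heterozygosity(alleles):
    """Assumed estimator: unbiased gene diversity n/(n-1) * (1 - sum p_i^2) at one locus."""
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [alleles.count(a) / n for a in set(alleles)]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def standardized_diversity(samples, target, size=12, n_boot=100, seed=1):
    """Bootstrap-standardized He, allelic richness and unique allele richness for one sample.

    samples maps sample name -> list of haplotypes (tuples, one allele per locus).
    """
    rng = random.Random(seed)
    n_loci = len(samples[target][0])
    # Alleles found in any other sample, pooled per locus (assumption: full samples are used)
    others = [set() for _ in range(n_loci)]
    for name, haps in samples.items():
        if name != target:
            for h in haps:
                for locus, allele in enumerate(h):
                    others[locus].add(allele)
    he, ar, uar = [], [], []
    for _ in range(n_boot):
        draw = rng.choices(samples[target], k=size)  # random draws with replacement
        loci = [[h[j] for h in draw] for j in range(n_loci)]
        he.append(mean(expected_heterozygosity(a) for a in loci))
        ar.append(mean(len(set(a)) for a in loci))
        uar.append(mean(len(set(a) - others[j]) for j, a in enumerate(loci)))
    return mean(he), mean(ar), mean(uar)
```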
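The correlation analysis needs two ingredients: the arc distance of each sample from a reference point and a rank correlation that handles ties. The sketch below assumes a spherical Earth of radius 6,371 km (haversine formula) and computes Spearman's coefficient as the Pearson correlation of average ranks. It returns the coefficient only, whereas the study obtained significance levels from GraphPad; the function names are illustrative.

```python
import math
from statistics import mean


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Arc distance on a spherical Earth via the haversine formula (radius is an assumption)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In use, the distance of every sample from the reference sample would be computed with `great_circle_km` and passed to `spearman_rho` together with one of the variability indices.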
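The index of association and its randomization test can be sketched for haploid data as follows. The code uses the usual definition IA = VO/VE − 1, where VO is the observed variance, over all pairs of individuals, of the number of loci at which the pair differs, and VE is the variance expected if loci were independent. The null distribution is built by shuffling alleles independently among individuals at each locus, which breaks associations between loci while keeping allele frequencies. The variance conventions and p-value formula used by Multilocus may differ from this sketch.

```python
import random
from itertools import combinations


def index_of_association(haplotypes):
    """IA = V_O / V_E - 1 for haploid multilocus haplotypes (tuples, one allele per locus)."""
    n_loci = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    n_pairs = len(pairs)
    distances = []
    locus_diffs = [0] * n_loci  # number of pairs differing at each locus
    for a, b in pairs:
        d = 0
        for j in range(n_loci):
            if a[j] != b[j]:
                d += 1
                locus_diffs[j] += 1
        distances.append(d)
    mean_d = sum(distances) / n_pairs
    v_obs = sum((d - mean_d) ** 2 for d in distances) / n_pairs
    # Expected variance if loci are independent: sum of per-locus Bernoulli variances
    h = [c / n_pairs for c in locus_diffs]
    v_exp = sum(hj * (1 - hj) for hj in h)
    if v_exp == 0:
        raise ValueError("all loci are monomorphic")
    return v_obs / v_exp - 1


def ia_randomization_test(haplotypes, n_rand=1000, seed=1):
    """Observed IA and the fraction of randomized data sets with IA >= observed."""
    rng = random.Random(seed)
    observed = index_of_association(haplotypes)
    columns = [list(col) for col in zip(*haplotypes)]
    hits = 0
    for _ in range(n_rand):
        for col in columns:
            rng.shuffle(col)  # reshuffle alleles among individuals at each locus independently
        if index_of_association(list(zip(*columns))) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_rand + 1)
```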
We used four different methods to determine the optimal number of populations present in our data set, to assess the level of differentiation, to infer the geographic ancestral relationships among these populations, and to identify recently founded populations, as these are expected to cluster with their source population.
First, we calculated principal coordinates from Cavalli-Sforza and Edwards' chord distances among samples. The chord distance matrix was built using the Microsatellite analyzer (MSA 4.00) software, and the principal coordinate analysis was performed in GenAlEx.
Second, we used the Bayesian clustering algorithm implemented in Structure 2.1. This method uses a Markov chain Monte Carlo (MCMC) approach to cluster individuals into K distinct populations so as to minimize Hardy-Weinberg disequilibrium and gametic phase disequilibrium between loci within groups. The model allowed for admixed ancestry and correlated allele frequencies. Uniform priors were assumed, and the MCMC scheme was run for 500,000 iterations after an initial burn-in of 50,000 iterations. We ran Structure for K ranging from 1 to 13, with at least six repetitions for each value of K to check that likelihood values converged; convergence of the MCMC could not be achieved for K values higher than 13. The number of populations that best represents the observed data under the implemented model was determined by maximizing the estimated log likelihood of the data across values of K.
Third, we used the Bayesian clustering algorithm implemented in Baps 4 to identify the optimal number K of partitions among groups of samples. In contrast to the individual-based algorithm applied in Structure, we used the group-level option in Baps, so that clusters are formed by assembling whole samples. Baps 4 relies on stochastic optimization to infer the posterior mode of the genetic structure.
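The chord distance matrix that feeds the principal coordinate analysis can be computed as below. The sketch uses the multilocus form of the Cavalli-Sforza and Edwards distance given by Takezaki and Nei (1996), D_C = 2/(πr) Σ_j sqrt(2(1 − Σ_i sqrt(x_ij y_ij))), where x_ij and y_ij are the frequencies of allele i at locus j in the two samples and r is the number of loci. Whether MSA 4.00 applies exactly this scaling is an assumption, and the ordination step itself (done in GenAlEx) is not reproduced.

```python
import math
from collections import Counter


def allele_frequencies(haplotypes):
    """Per-locus allele frequency dictionaries for one sample of haploid haplotypes."""
    freqs = []
    for j in range(len(haplotypes[0])):
        counts = Counter(h[j] for h in haplotypes)
        total = sum(counts.values())
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs


def chord_distance(freqs_x, freqs_y):
    """Cavalli-Sforza & Edwards chord distance averaged over r loci (Takezaki & Nei form)."""
    r = len(freqs_x)
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        cos_theta = sum(math.sqrt(p * fy.get(a, 0.0)) for a, p in fx.items())
        # max() guards against tiny negative values caused by rounding when cos_theta ~ 1
        total += math.sqrt(max(0.0, 2.0 * (1.0 - cos_theta)))
    return 2.0 / (math.pi * r) * total


def chord_matrix(samples):
    """Symmetric matrix of pairwise chord distances between named samples."""
    names = list(samples)
    freqs = {n: allele_frequencies(samples[n]) for n in names}
    return names, [[chord_distance(freqs[a], freqs[b]) for b in names] for a in names]
```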
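Choosing K from replicate Structure runs amounts to comparing the estimated log likelihoods across values of K. A minimal sketch, assuming the replicate values have already been read from the Structure output into a dictionary (parsing is not shown) and that replicates are summarized by their mean: K is chosen by the highest mean, and the per-K standard deviation gives a rough view of whether the replicates agreed. The function name is illustrative.

```python
from statistics import mean, stdev


def choose_k(ln_likelihoods):
    """Pick the K with the highest mean estimated log likelihood across replicate runs.

    ln_likelihoods maps K -> list of log-likelihood values from independent runs.
    Returns the chosen K and, for every K, (mean, standard deviation) of the replicates.
    """
    summary = {}
    for k, values in ln_likelihoods.items():
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[k] = (mean(values), spread)
    best = max(summary, key=lambda k: summary[k][0])
    return best, summary
```

A large standard deviation at a given K is a hint that the MCMC runs for that K did not converge to the same solution.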
The program was run for K ranging from 1 to 29, with five replicates for each value of K, to check that the stochastic optimization algorithm had not converged to different solutions in separate runs. The goodness of fit of the clustering solutions was compared using the natural logarithm of the marginal likelihood of the data. We also used Baps to perform an admixture analysis estimating individual ancestry coefficients with respect to the inferred clusters of samples. For this analysis, we used 1,000 iterations to estimate the admixture coefficients of individuals and 200 reference individuals from each cluster, and we repeated the admixture analysis 50 times per individual.
Fourth, we used GeneClass 2.0 to assign individuals to regional groups of samples. The probability that each individual originated from each area was calculated using the Bayesian criterion described by Rannala and Mountain, with 1,000 individuals simulated per regional group of samples following the method of Paetkau et al. Each individual was assigned to the regional group with the highest probability of being its source. […]
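The assignment step can be sketched for haploid multilocus data as follows. For each regional group, allele probabilities use the Rannala and Mountain Bayesian estimate (n_a + 1/k)/(n + 1), which for haploid data corresponds to a Dirichlet prior with parameters 1/k at a locus with k alleles. Individuals are then simulated from those probabilities, and the group's probability is the fraction of simulated individuals whose likelihood is no greater than that of the individual being assigned. This is a simplified stand-in for the Paetkau et al. procedure implemented in GeneClass 2.0, not its actual code; it assumes no missing data, and the function names are hypothetical.

```python
import math
import random


def _predictive(haplotypes, alleles):
    """Per-locus predictive allele probabilities (n_a + 1/k) / (n + 1) for one group."""
    preds = []
    for j, locus_alleles in enumerate(alleles):
        k, n = len(locus_alleles), len(haplotypes)
        counts = {a: 0 for a in locus_alleles}
        for h in haplotypes:
            counts[h[j]] += 1
        preds.append({a: (counts[a] + 1.0 / k) / (n + 1.0) for a in locus_alleles})
    return preds


def _log_lik(haplotype, preds):
    # Log-likelihood of a haplotype, treating loci as independent
    return sum(math.log(p[a]) for p, a in zip(preds, haplotype))


def assign(haplotype, groups, n_sim=1000, seed=1):
    """Return (best group, {group: probability}) for one individual.

    groups maps group name -> list of reference haplotypes (tuples, one allele per locus).
    """
    rng = random.Random(seed)
    # Alleles seen at each locus in any reference group, plus those of the test individual
    alleles = [sorted({h[j] for hs in groups.values() for h in hs} | {haplotype[j]})
               for j in range(len(haplotype))]
    probs = {}
    for name, hs in groups.items():
        preds = _predictive(hs, alleles)
        observed = _log_lik(haplotype, preds)
        hits = 0
        for _ in range(n_sim):
            sim = [rng.choices(list(p), weights=list(p.values()))[0] for p in preds]
            if _log_lik(sim, preds) <= observed:
                hits += 1
        probs[name] = hits / n_sim
    return max(probs, key=probs.get), probs
```

As in the text, the individual is assigned to whichever group returns the highest probability.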

Pipeline specifications

Software tools: GenAlEx, Arlequin, GeneClass
Application: Population genetic analysis
Organisms: Malus domestica