Computational protocol: Genomic diversity and macroecology of the crop wild relatives of domesticated pea

Protocol publication

[…] Genomic DNA used for DArTseq analysis was isolated from single-plant samples. The DArTseq methodology requires high-molecular-weight DNA, typically obtained only from fresh material, whereas the ITS and trnSG regions were PCR amplified and sequenced, so herbarium samples could be used for them. PCR reactions were performed using primers for the ITS and trnSG regions. PCR products were treated with Exonuclease-Alkaline Phosphatase (Thermo Scientific) and sequenced (BigDye Terminator v3.1 kit) at Macrogen. Haplotype network analysis was performed with PopART using a median-joining algorithm. [...] Bayesian model-based clustering was performed using STRUCTURE, which has been widely used on cultivated and wild pea germplasm. Population structure was assessed using 161 accessions (P. sativum subsp. elatius and P. fulvum) with 66,910 polymorphic markers to infer genetic structure and to define the number of clusters, using STRUCTURE version 2.3.4. The number of presumed populations (K) was evaluated from 3 to 16. The length of the burn-in period was set to 10,000, after which 200,000 iterations of the Markov chain Monte Carlo (MCMC) sampler were used for data collection. We ran 4 replicate MCMC chains for each value of K and evaluated the posterior likelihoods using the ad hoc delta K method. Principal component analysis was performed using the eigen function of R software (R Core Team) after applying a normalization technique. Spatial autocorrelation analysis using SPAGeDi was performed to assess the relationship between individual genetic identities and their geographic distance. We selected samples from Turkey and the Near East only, in order to exclude the influence of seas and prohibitively large distances. Ritland's kinship coefficient was employed to quantify average pairwise genetic identity across 20 distance groups, each containing 200 pairwise comparisons.
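The delta K step of the STRUCTURE analysis described above can be illustrated with a short sketch. It is written in Python rather than the R used elsewhere in this protocol, and it follows one common reading of the delta K statistic: the absolute second-order difference of the mean log-likelihood L(K), divided by the standard deviation of L(K) across replicate runs. The function name `delta_k`, the input layout and the toy numbers are assumptions for illustration and are not taken from the source.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to a list of replicate log-likelihoods
    (one value per independent STRUCTURE run). Hypothetical input layout.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) not in means or (k + 1) not in means:
            continue  # delta K is undefined at the ends of the tested range
        # Absolute second-order rate of change of the mean log-likelihood.
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])  # sample SD across replicates of this K
        if spread == 0:
            continue  # identical replicates: the ratio is undefined
        result[k] = second_diff / spread
    return result


if __name__ == "__main__":
    # Toy values only; four replicates per K, as in the protocol.
    toy = {
        3: [-1000.0, -1002.0, -998.0, -1000.0],
        4: [-900.0, -901.0, -899.0, -900.0],
        5: [-890.0, -892.0, -888.0, -890.0],
    }
    for k, value in delta_k(toy).items():
        print(k, round(value, 2))
```

With K tested from 3 to 16, this yields values for K = 4 to 15 only, because the statistic needs a neighbour on each side. The K with the largest delta K is usually taken as the most strongly supported number of clusters.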
Randomization testing with 100 permutations was conducted to assess whether individual kinship values differed from expectations. The first 15 pairwise comparisons with the highest kinship coefficient from two potentially interesting distance groups, with mean distances of 617 km and 888 km, were depicted using Google Maps (https://maps.google.com/). Pairwise estimation of population Fst was done using the hierfstat package in R. The heterozygosity of the detected SNPs within the DArTseq dataset was calculated as the percentage of heterozygous loci per individual. Furthermore, the heterozygosity of putative interspecies hybrids was calculated for sets of SNPs associated (P-value < 5 × 10⁻⁸) with the respective parental species. To visualize the diversity and structure of the individual samples in a complementary way, an unrooted split decomposition tree was rendered from the unfiltered DArTsilico data containing 187,298 binary characters using SplitsTree. [...] Using the location data for 409 P. sativum subsp. elatius and 106 P. fulvum accessions (Table ), the potential climatic niches were modelled using Maxent version 3.3.3k. Samples that had been removed earlier as duplicates, misidentified or otherwise inappropriate, as well as those with dubious or inaccurate coordinates, were not included in the modelling. A threshold of 50 km was used as the maximum accepted distance, and the validation was carried out using freely available scripts (http://www.movable-type.co.uk/scripts/latlong.html). All rejected sites were omitted from the analyses, and validation tests were applied. The environmental predictors used (19 bioclimatic variables) were from www.worldclim.org.
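The 50 km coordinate check mentioned above can be sketched with the haversine great-circle formula, which is among the calculations provided by the cited latitude/longitude scripts. The source does not state which two points were compared, so the example assumes a reported collection coordinate and a reference point for the stated locality. The function names, the mean Earth radius of 6371 km and the sample coordinates are likewise illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(reported, reference, max_km=50.0):
    """True if the reported (lat, lon) lies within max_km of the reference (lat, lon)."""
    return haversine_km(*reported, *reference) <= max_km


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    reported = (37.0, 35.0)
    reference = (37.0, 35.3)
    print(round(haversine_km(*reported, *reference), 1))
    print(within_threshold(reported, reference))
```

Sites failing such a check would be the "rejected sites" that the protocol omits from the modelling.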
The potential niches of the species were projected in past (Last Glacial Maximum, LGM ~22,000 ybp, http://worldclim.org) and future climatic conditions, following in the latter case the Representative Concentration Pathway (RCP) 6.0 scenario using bioclimatic data created by the Global Climate Model CCSM (Community Climate System Model) 4.0. In order to assess the importance of niche differences between the three species, we performed pairwise niche similarity tests. These tests compare the “observed” niche overlap of the species in question with the “expected” overlap based on the species’ environmental backgrounds. The “observed” overlap, calculated using the metrics D and I, refers to the overlap of the species’ potential niches as estimated by Maxent. The “expected” overlap results from substituting the species’ occurrence points with random points from their backgrounds and calculating D and I for the resulting species/background pair. This random substitution process is iterated a set number of times (100 in our case) in order to obtain a statistical distribution for the two overlap metrics, against which the “observed” values are tested. The background for each species was derived from its actual occurrence points using a Gaussian filter. Niche similarity tests were performed in ENMTools version 1.4.3. Niche diversity among species, as well as among their genotypic groups, was investigated using Shannon’s index of diversity. Typically, this index is expressed as $H' = -\sum_{i=1}^{R} p_i \ln p_i$, where $H'$ is Shannon’s diversity index, and $p_i$ is the proportion of individuals (or cover) of the $i$th species in the dataset of interest. In our case, $p_i$ is the probability of occurrence of the $i$th species, and thus $H'$ can be calculated on a per-cell basis.
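A per-cell calculation of this kind can be sketched as follows. This is a minimal Python illustration and not the authors' custom R script. It assumes the standard form of Shannon's index with a leading minus sign, uses the occurrence probabilities directly as $p_i$ (as the text describes, without rescaling them to sum to 1), and treats a zero probability as contributing nothing. The grid layout (one nested list per taxon, `None` for no-data cells) and the function name are hypothetical.

```python
import math


def shannon_per_cell(layers):
    """Return a grid of H' values computed cell by cell.

    layers: list of equally shaped 2-D grids (lists of rows), one per taxon
    or haplotype, holding occurrence probabilities in [0, 1] or None for
    cells without data. Hypothetical input layout.
    """
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [layer[r][c] for layer in layers]
            if any(v is None for v in values):
                row.append(None)  # propagate no-data cells
                continue
            # Terms with p == 0 are skipped, taking 0 * ln(0) as 0.
            row.append(-sum(p * math.log(p) for p in values if p > 0))
        result.append(row)
    return result


if __name__ == "__main__":
    # Two toy taxa on a 1 x 3 grid; values are illustrative only.
    taxon_a = [[0.5, 0.0, None]]
    taxon_b = [[0.5, 1.0, 0.2]]
    print(shannon_per_cell([taxon_a, taxon_b]))
```

Cells where several taxa have intermediate probabilities score highest, while cells dominated by a single near-certain taxon score close to zero.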
The index was calculated separately for the species, using the modelling results of each taxon, and for the cpDNA haplotypes found during the genetic analysis, using the modelling results of each haplotype. Our quantitative analysis is one of the first to apply Shannon’s diversity index with probabilities of Maxent output to a niche modelling approach. The index was calculated for each cell of the study area using a custom R script. For the manipulation and plotting of spatial data, as well as for the creation of figures, the packages sp, SDMTools and plotrix were employed. […]
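Returning to the niche similarity tests described earlier, the two overlap metrics can be sketched in a few lines. The example assumes the commonly used definitions, namely Schoener's D and the Hellinger-distance-based I, both computed on suitability surfaces rescaled to sum to 1. It is not ENMTools' own code, and the function name and flat-list input are illustrative assumptions.

```python
import math


def _normalise(values):
    """Rescale non-negative suitability scores so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("suitability scores must have a positive sum")
    return [v / total for v in values]


def niche_overlap(suit_x, suit_y):
    """Return (D, I) for two suitability surfaces given as flat lists of cells.

    D = 1 - 0.5 * sum(|px - py|)
    I = 1 - 0.5 * sum((sqrt(px) - sqrt(py)) ** 2)
    Both range from 0 (no overlap) to 1 (identical surfaces).
    """
    if len(suit_x) != len(suit_y):
        raise ValueError("both surfaces must cover the same cells")
    px, py = _normalise(suit_x), _normalise(suit_y)
    d = 1 - 0.5 * sum(abs(a - b) for a, b in zip(px, py))
    i = 1 - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(px, py))
    return d, i


if __name__ == "__main__":
    # Toy surfaces over four grid cells; values are illustrative only.
    print(niche_overlap([0.9, 0.6, 0.1, 0.0], [0.2, 0.7, 0.8, 0.1]))
```

In a background test of the kind described, the same function would be applied to each of the 100 randomised species/background pairs to build the null distribution against which the observed D and I are compared.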

Pipeline specifications

Software tools: PopART, SplitsTree, SDMTools
Applications: Miscellaneous, Phylogenetics, Population genetic analysis
Organisms: Pisum sativum, Homo sapiens