Computational protocol: The First Molecular Identification of an Olive Collection Applying Standard Simple Sequence Repeats and Novel Expressed Sequence Tag Markers

[…] Number of alleles per locus (Na), number of effective alleles (Ne), Shannon’s information index (I), observed (Ho) and expected heterozygosity (He), and fixation index (F) were calculated at each locus for the novel and standard SSRs using GenAlEx 6.501 (). Pairwise relatedness was estimated for the standard and OLEST SSR markers in GenAlEx 6.501 to quantify allelic similarity for codominant data, using the LRM estimator (mean, multiplied by 2 to give a maximum of 1.00). The software FreeNA () was applied to detect the presence of possible null alleles (Fnull), to determine the genetic uniqueness of each accession, and to quantify redundancy. Polymorphic information content (PIC) was calculated for each microsatellite locus using CERVUS v.3.0 (). We calculated the probabilities of identity for unrelated individuals [P(ID)] at each locus and across loci, as described by , using GenAlEx for both OLEST and standard SSR markers. Cumulative P(ID) was calculated after ranking the loci by PIC value from high to low. A P(ID) below 0.001 was used as the criterion for estimating the minimum number of loci required for individual identification in the study species ().
A model-based Bayesian clustering method was applied to infer the genetic structure of the 59 cultivars and to define the number of clusters (gene pools) in the dataset using STRUCTURE v.2.3 (). The analysis was run on the same sample set separately for OLEST and standard SSRs. Tests were based on an admixture model with independent allele frequencies. No prior information was used to define clusters. Independent runs were performed with the number of clusters (K) set from 1 to 10. Each run comprised a burn-in of 100,000 iterations followed by 100,000 Markov chain Monte Carlo (MCMC) replicates.
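As an illustration of the per-locus statistics and the cumulative P(ID) criterion described in this protocol, the following Python sketch recomputes them from raw genotypes. It is not the GenAlEx or CERVUS implementation: the function names `locus_stats` and `loci_needed` and the toy genotypes are hypothetical, and the formulas are the standard textbook ones (He = 1 − Σp², F = (He − Ho)/He, P(ID) = 2(Σp²)² − Σp⁴), assumed here without small-sample bias correction or missing-data handling.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Per-locus statistics from diploid genotypes given as (allele, allele) tuples.

    Assumes no missing data and at least one genotype (illustrative only).
    """
    n = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    freqs = [c / (2 * n) for c in counts.values()]

    hom = sum(p ** 2 for p in freqs)             # expected homozygosity
    ho = sum(a != b for a, b in genotypes) / n   # observed heterozygosity
    he = 1.0 - hom                               # expected heterozygosity, no bias correction
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(freqs),                          # number of alleles
        "Ne": 1.0 / hom,                           # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),  # Shannon's information index
        "Ho": ho,
        "He": he,
        "F": (he - ho) / he if he > 0 else float("nan"),
        "PIC": he - cross,
        "PID": 2 * hom ** 2 - sum(p ** 4 for p in freqs),
    }


def loci_needed(stats_by_locus, threshold=0.001):
    """Rank loci by PIC (high to low) and multiply P(ID) until it drops below threshold.

    Returns (number of loci needed or None, list of (locus, cumulative P(ID))).
    """
    ranked = sorted(stats_by_locus, key=lambda name: stats_by_locus[name]["PIC"], reverse=True)
    cumulative = 1.0
    trace = []
    for name in ranked:
        cumulative *= stats_by_locus[name]["PID"]
        trace.append((name, cumulative))
        if cumulative < threshold:
            return len(trace), trace
    return None, trace


# Toy locus with two alleles (sizes are made up for illustration).
toy_stats = locus_stats([(100, 102), (100, 102), (100, 100), (102, 102)])
```

Multiplying per-locus P(ID) values assumes the loci are independent, which is the usual assumption behind a cumulative P(ID).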
The ad hoc statistic ΔK, based on the rate of change in the log probability of data between successive K values, as described by , was calculated with the Structure Harvester v.0.9.93 website () and used to estimate the most likely number of clusters (K). To verify how the cultivars in the Perugia collection are distributed among the previously observed Mediterranean groups (), their profiles at ten standard SSRs were analyzed together with those of the 281 most widely cultivated Mediterranean cultivars from the CNR-IBBR database, using the same STRUCTURE parameters. Data for the 281 cultivars have already been published (, ; ). […]
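The ΔK calculation can be sketched in a few lines of Python. This is a minimal illustration, not the Structure Harvester code: the function name `evanno_delta_k` and the log-probability values are hypothetical, and the number of replicate runs per K is an assumption, since the protocol does not state it. ΔK is taken here as the absolute second-order difference of the mean log probability, divided by the standard deviation of the log probability across replicates at that K, so it is only defined for K values that have a neighbor on both sides.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta-K from replicate log-probability values.

    lnp_by_k maps each K to a list of Ln P(D) values, one per replicate run
    (at least two replicates per K are needed for a standard deviation).
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # undefined at the smallest and largest K tested
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        delta_k[k] = second_diff / sd
    return delta_k


# Made-up Ln P(D) values for two replicate runs at K = 1..4.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -894.0],
    4: [-885.0, -891.0],
}
toy_delta_k = evanno_delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

Because ΔK cannot be evaluated at the lowest K tested, it cannot by itself support K = 1; the raw log probabilities are worth inspecting alongside it.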

Pipeline specifications