Computational protocol: Genetic structure and ecogeographical adaptation in wild barley (Hordeum chilense Roemer et Schultes) as revealed by microsatellite markers

Protocol publication

[…] The geographic data (altitude, latitude and longitude) of 76 accessions were available [], and these data were projected using the DIVA-GIS software. The study area lay between 28°15' and 38°42' South latitude and between 70°18' and 73°24' West longitude. Altitude at the sites varied widely, from sea level to high mountains (> 2000 m). Because only one accession was available for some provinces, the accessions were grouped into eight zones along Chile, from North to South, in some cases combining several neighboring provinces with similar ecological characteristics. Ecological data (rainfall of the wettest and driest months and mean temperature of the warmest and coldest months) were obtained for each site using DIVA-GIS. The ecological regions were described following the bioclimatic classification of DiCastri and Hajek []. [...]
Summary statistics, including the number of alleles per locus, polymorphism information content (PIC) values and gene diversity, were determined using PowerMarker version 3.25 []. An unrooted neighbor-joining (NJ) tree was constructed from Nei's genetic distance []. One thousand matrices were obtained by bootstrapping, and the consensus tree was built with the Consense program of the Phylip package (version 3.66) []. The dendrogram was visualized using TreeView 1.6.6 []. We performed a Mantel test [] of the correlation between Nei's genetic distance and the natural (Napierian) logarithm of the geographic distances, using the ade4 package in R (version 2.10.1; R Development Core Team 2008) [].
A Bayesian model-based analysis for inference of population structure was performed using the program Structure (version 2.2) [] to estimate the number of groups (K) represented by all sampled individuals and the individual admixture proportions.
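As an illustration of the per-locus summary statistics mentioned above, the following minimal sketch computes gene diversity and PIC from allele calls at a single locus. It assumes the textbook formulas, gene diversity as 1 minus the sum of squared allele frequencies and PIC as defined by Botstein et al.; PowerMarker may apply sample-size corrections that are not reproduced here. The function names and the example allele sizes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Return {allele: relative frequency} for a list of allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def gene_diversity(freqs):
    """Uncorrected gene diversity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical allele calls (fragment sizes in bp) at one microsatellite locus.
example_alleles = [152, 152, 154, 154, 154, 156, 156, 158]
example_freqs = allele_frequencies(example_alleles)
print("alleles:", len(example_freqs))
print("gene diversity:", round(gene_diversity(example_freqs), 4))
print("PIC:", round(pic(example_freqs), 4))
```

PIC never exceeds gene diversity, because the cross term it subtracts is non-negative; the gap narrows as the number of alleles grows.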
Structure assumes a model with K populations (where K may be unknown), each characterized by a set of allele frequencies at each locus. Individuals in the sample are probabilistically assigned to one population, or jointly to two or more populations if their genotypes indicate that they are admixed. The number of clusters was inferred using 20 independent runs, each with a burn-in of 100,000 iterations followed by 100,000 further iterations, under the admixture ancestry model with correlated allele frequencies and with K ranging from 1 to 10. We followed the procedure of Evanno et al. [] to better detect the real number of clusters determined by Structure. The clusteredness index [] was also calculated from the Q matrix of Structure; it equals 1 when every individual is assigned completely to a single cluster and 0 when every individual is assigned equally to all clusters. Each individual's membership coefficients sum to 1 across clusters.
The Distruct 1.1 software [] was used to represent the estimated population structure graphically, according to geographic proximity, ecological region and agronomical data. Each individual was represented by a thick line partitioned into K colored segments, representing the individual's estimated membership fractions in the K clusters.
The genetic structure of the population was also inferred with the Geneland package [], implemented in R. Geneland uses geographic coordinates and does not assume admixture, whereas Structure does not use geographic coordinates and does assume admixture. We carried out five independent runs using independent allele frequencies with 100,000 iterations, sampling every 100th observation from the Markov chain, with K allowed to range from 1 to 10. The run with the highest likelihood was post-processed to obtain the posterior mode of population membership.
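The two quantities derived from the Structure output above, the Evanno ΔK statistic and the clusteredness index, can be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes that ΔK is computed from the mean ln P(D) over replicate runs at each K, divided by the standard deviation at that K, and that clusteredness is the average over individuals of sqrt(K/(K-1) * sum_k (q_ik - 1/K)^2). The function names and the log-likelihood values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """lnp_by_k maps K to the list of ln P(D) values from replicate runs.

    Returns {K: delta K} for every K that has both neighbours K-1 and K+1.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in mean_l and k + 1 in mean_l:
            second_diff = abs(mean_l[k + 1] - 2.0 * mean_l[k] + mean_l[k - 1])
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # delta K is undefined when replicate runs are identical
                delta[k] = second_diff / sd
    return delta


def clusteredness(q_matrix):
    """q_matrix: one row per individual, K membership coefficients per row."""
    k = len(q_matrix[0])
    if k < 2:
        raise ValueError("clusteredness needs at least two clusters")
    total = 0.0
    for row in q_matrix:
        total += sqrt(k / (k - 1) * sum((q - 1.0 / k) ** 2 for q in row))
    return total / len(q_matrix)


# Hypothetical ln P(D) values, three replicate runs per K.
example_lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -902.0, -898.0],
    3: [-700.0, -701.0, -699.0],
    4: [-690.0, -691.0, -689.0],
    5: [-685.0, -687.0, -683.0],
}
example_delta = evanno_delta_k(example_lnp)
best_k = max(example_delta, key=example_delta.get)
print("delta K:", example_delta, "best K:", best_k)
print("clusteredness:", clusteredness([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]))
```

ΔK cannot be evaluated at the smallest or largest K tested, since it needs both neighbouring values; with K from 1 to 10 it is defined for K = 2 to 9.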
The genetic differentiation among the genetic groups inferred by Structure was estimated by hierarchical analysis of molecular variance (AMOVA), implemented in the Arlequin 3.0 software [].
We used the Lositan software [] to identify outlier loci with excessively high or low Fst compared to neutral expectations. The basic rationale is that (i) loci influenced by directional (also called adaptive or positive) selection will show a larger genetic differentiation than neutral loci; and that (ii) loci that have been subject to balancing (also called negative or purifying) selection will show a lower genetic differentiation. Thus, the method consists of identifying loci whose Fst coefficients are "significantly" different from those expected under neutral theory (the so-called outlier loci). To avoid false positives caused by population structure, Fst was calculated for the groups inferred by Structure. The significance level chosen was 0.001, which corresponds to a statistical significance level of 0.05 after applying a standard Bonferroni correction.
The association of alleles of outlier loci with ecogeographical factors was tested by linear regression analyses, using SPSS version 17.0.0 (SPSS, Chicago, IL, USA). Alleles with frequencies below 5% were excluded. The alleles of each locus were entered as dependent variables and the ecogeographical factors as independent variables. Significance was calculated for each model, which included only one allele, with the significance threshold set at 0.05 and a Bonferroni correction applied as described above.
Values of the environmental variables were first standardized, and the Euclidean distance between samples was computed using SPSS. The correlation between genetic distance and environmental distance in the collection was calculated with the Mantel test.
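The last step above (standardize the environmental variables, compute Euclidean distances, then correlate them with genetic distances by a Mantel test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS or ade4 procedure itself. It assumes z-scores based on the sample standard deviation and a one-sided permutation test in which the rows and columns of one matrix are shuffled together. All function names and data values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # assumes no variable is constant
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between samples (rows)."""
    n = len(rows)
    return [[sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
             for j in range(n)] for i in range(n)]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after relabelling the samples."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, one-sided p) for the correlation of two distance matrices."""
    rng = random.Random(seed)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    hits = 1  # the observed arrangement counts as one permutation
    n = len(d1)
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical environmental values for five sites (three variables each).
env_rows = [[120.0, 5.0, 24.0], [110.0, 8.0, 23.0], [300.0, 20.0, 19.0],
            [340.0, 25.0, 18.0], [600.0, 40.0, 15.0]]
# Hypothetical symmetric genetic distance matrix for the same five sites.
gen_dist = [[0.00, 0.10, 0.35, 0.40, 0.55],
            [0.10, 0.00, 0.30, 0.38, 0.50],
            [0.35, 0.30, 0.00, 0.15, 0.32],
            [0.40, 0.38, 0.15, 0.00, 0.20],
            [0.55, 0.50, 0.32, 0.20, 0.00]]
env_dist = euclidean_matrix(zscore_columns(env_rows))
r, p = mantel(gen_dist, env_dist)
print("Mantel r:", round(r, 3), "p:", p)
```

Shuffling whole samples rather than individual matrix cells keeps the dependence among distances that share a sample, which is the reason a Mantel test is used instead of an ordinary correlation test.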
In addition, a principal component analysis (PCA) was computed from the environmental values, and the samples were plotted according to their genetic structure group. The Spearman rank correlation was used to assess differences in the mean number of alleles and in ecogeographic variables among the inferred groups. […]
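The Spearman rank correlation named above can be sketched as the Pearson correlation of the ranks, with tied values given their average rank. This is a minimal illustration; the function names and the per-group values (mean number of alleles and mean altitude) are hypothetical and not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for four inferred groups.
mean_alleles = [3.1, 4.0, 4.6, 5.2]
mean_altitude_m = [150.0, 420.0, 900.0, 1800.0]
print("Spearman rho:", spearman_rho(mean_alleles, mean_altitude_m))
```

Because only ranks enter the calculation, the result is unchanged by any monotonic transformation of either variable, such as taking the logarithm of altitude.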

Pipeline specifications

Software tools: DIVA-GIS, PowerMarker, TreeViewX, DISTRUCT, Arlequin
Applications: Phylogenetics, Population genetic analysis
Organisms: Hordeum vulgare, Triticum aestivum