Computational protocol: Investigation of genetic diversity and population structure of common wheat cultivars in northern China using DArT markers

Protocol publication

[…] The polymorphism information content (PIC) value of each DArT marker was calculated using the formula PIC = 1 − ∑ Pi², where Pi is the proportion of the population carrying the ith allele []. Nei's genetic diversity, defined as the probability that two haplotypes chosen at random from the sample are different, was estimated with the formula [,].

A binary matrix was produced from the DArT data by scoring fragments as 1 or 0 for the presence or absence of a specific marker allele, respectively. Because DArT markers were scored as dominant, no attempt was made to identify loci harboring heterogeneous or heterozygous alleles. Consistent 0/1 data matrices were used as input for the genetic diversity and population structure analyses.

NTSYSpc (version 2.0) was used to perform principal-coordinates analysis (PCoA) on a genetic similarity matrix based on the Jaccard genetic similarity index (sij) []. The Jaccard coefficient is an asymmetric measure for binary variables: only presences carry information, so shared absences are ignored. It is computed as sij = p/(p + q + r), where p = number of bands present in both individuals (i and j), q = number of bands present in i and absent in j, and r = number of bands present in j and absent in i. PCoA is similar to the more familiar principal-components analysis (PCA), but whereas PCA is based on Euclidean coordinates, PCoA can decompose any multidimensional distance matrix. NTSYSpc was also used to construct an unweighted pair-group method with arithmetic mean (UPGMA) dendrogram.

Linkage disequilibrium analysis was performed to investigate differences between the two groups. Each classified group was defined as a "locus," with the cultivars (lines) in one group scored as "0" and those in the other group scored as "1."
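As an illustration of the PIC and Jaccard formulas given above, the following is a minimal Python sketch, not the NTSYSpc implementation. The function names (`pic`, `pic_per_marker`, `jaccard`, `similarity_matrix`) and the toy matrix are hypothetical. It assumes a complete 0/1 matrix with cultivars as rows and markers as columns, and it treats presence and absence of a dominant marker as the two alleles when computing PIC.

```python
def pic(scores):
    """PIC = 1 - sum(P_i^2) for one marker scored 0/1 across cultivars.

    Assumption: presence (1) and absence (0) are treated as the two alleles.
    """
    n = len(scores)
    p_present = sum(scores) / n
    p_absent = 1.0 - p_present
    return 1.0 - (p_present ** 2 + p_absent ** 2)


def pic_per_marker(matrix):
    """PIC for every marker (column) of a cultivar-by-marker 0/1 matrix."""
    return [pic(list(column)) for column in zip(*matrix)]


def jaccard(a, b):
    """Jaccard similarity s_ij = p / (p + q + r) between two 0/1 profiles."""
    p = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # present in both
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # present in i only
    r = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # present in j only
    denom = p + q + r
    # Assumption: similarity is defined as 0.0 when neither profile has a band.
    return p / denom if denom else 0.0


def similarity_matrix(matrix):
    """Pairwise Jaccard similarities between all cultivars (rows)."""
    return [[jaccard(row_i, row_j) for row_j in matrix] for row_i in matrix]


# Hypothetical toy data: 3 cultivars (rows) x 4 DArT markers (columns).
toy = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    print(pic_per_marker(toy))
    for row in similarity_matrix(toy):
        print(row)
```

A matrix such as the one returned by `similarity_matrix` is the kind of input that a PCoA or UPGMA routine would then take; those steps are not reproduced here.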
Linkage disequilibrium (LD) between pairs of polymorphic loci was evaluated using the software package TASSEL 1.9.4. LD was estimated as the squared allele frequency correlation (r²), which measures the correlation between a pair of variables [].

In addition, analysis of molecular variance (AMOVA) was used to estimate the genetic structure among groups and subgroups of cultivars. This method operates on a matrix of distances between samples to measure the genetic structure of the population from which the samples were drawn. It was carried out using ARLEQUIN v3.11 [] to estimate genetic variance components and to partition the total variance within subgroups, among subgroups, and among groups. The significance of the variance components was tested using 1000 permutations. The fixation index (FST), a measure of population differentiation and genetic distance based on genetic polymorphism data, was also computed. […]
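The r² statistic described above can be sketched for two 0/1 vectors, for example the group "locus" (0/1 group membership) and a single DArT marker. This is a minimal illustration, not TASSEL's implementation. The function name `r_squared` and the example vectors are hypothetical, and the sketch assumes that each cultivar contributes one haplotype and that there are no missing scores.

```python
import math


def r_squared(x, y):
    """Squared allele frequency correlation between two 0/1 loci.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB,
    where pA and pB are the frequencies of the "1" state at each locus and
    pAB is the frequency of cultivars scored "1" at both loci.
    """
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined when either locus is monomorphic.
        return math.nan
    return d * d / denom


# Hypothetical example: group membership of six cultivars and one marker.
group = [0, 0, 0, 1, 1, 1]
marker = [0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(r_squared(group, marker))
```

Under these assumptions, r² equals the squared Pearson correlation of the two 0/1 vectors, so a marker that perfectly tracks group membership gives r² = 1.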

Pipeline specifications

Software tools: NTSYSpc, TASSEL, Arlequin
Application: Population genetic analysis
Organisms: Triticum aestivum