Computational protocol: Genetic Vulnerability and the Relationship of Commercial Germplasms of Maize in Brazil with the Nested Association Mapping Parents

Protocol publication

[…] For each SNP marker, the following estimates were generated for the Brazilian commercial hybrid population and for the combined population (hybrids and NAM parents) using the PowerMarker 3.25 software []:

Gene Diversity, $D_l$:

$$D_l = 1 - \sum_{i=1}^{k} p_{li}^2 \qquad (1)$$

in which $p_{li}$ is the frequency of the i-th allele of the l-th SNP.

Minor Allele Frequency, $MAF_l$:

$$MAF_l = \frac{Q_l + R_l}{2 \times (P_l + Q_l + R_l)} \qquad (2)$$

in which $Q_l$ is the number of heterozygous genotypes at the l-th SNP, $P_l$ is the number of genotypes homozygous for the i-th allele at the l-th SNP, and $R_l$ is the number of genotypes homozygous for the alternative, lower-frequency allele at the l-th SNP.

Heterozygosity, $H_l$:

$$H_l = \frac{Q_l}{P_l + Q_l + R_l} \qquad (3)$$

with $Q_l$, $P_l$ and $R_l$ as defined for Eq. (2).

Polymorphism Information Content, $PIC_l$:

$$PIC_l = 1 - \sum_{i=1}^{k} p_{li}^2 - \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} 2 p_{li}^2 p_{lj}^2 \qquad (4)$$

in which $p_{li}$ and $p_{lj}$ are the frequencies of the i-th and j-th alleles, respectively, at the l-th SNP.

From the allelic frequencies, the genetic distances were estimated as described by Nei et al. []:

$$D_A = \frac{1}{r} \sum_{j=1}^{r} \left( 1 - \sum_{i=1}^{m_j} \sqrt{x_{ij} y_{ij}} \right) \qquad (5)$$

in which $x_{ij}$ and $y_{ij}$ are the frequencies of the i-th allele at the j-th SNP in populations X and Y, respectively, $m_j$ is the number of alleles at the j-th SNP, and $r$ is the number of loci studied.

Subsequently, a dendrogram was built with the unweighted pair group method with arithmetic mean (UPGMA), with node support assessed by bootstrap (2,000 iterations). The bootstrap analyses were performed using the PowerMarker 3.25 software, and the node consistency dendrogram was generated using the MEGA 5.1 software [].

The effective population size ($N_e$) of the Brazilian commercial hybrids was estimated using the following equation:

$$N_e = \frac{1}{1 + F_{st}} \qquad (6)$$

and the $N_e$ of the hybrid progenitors was estimated using:

$$N_e = \frac{1}{2 F_{st}} \qquad (7)$$

in which $F_{st}$ is the within-population inbreeding coefficient of the commercial germplasm. The latter was estimated as described by Resende et al. [] using the genomic kinship matrix (G).
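As a minimal illustration of the per-SNP statistics in Eqs. (1), (3) and (4) and of the genetic distance in Eq. (5), the following Python sketch works from genotype counts at a single biallelic SNP. It is not the PowerMarker implementation. The function names and the example counts are hypothetical. Allele frequencies are obtained here by plain allele counting (each heterozygote contributes one copy of the minor allele, each minor homozygote two), which is an assumption of this sketch rather than a transcription of Eq. (2). The distance uses the square-root form of Nei's DA, which is likewise assumed to be the form intended.

```python
from itertools import combinations
from math import sqrt


def allele_freqs(n_major_hom, n_het, n_minor_hom):
    """Allele frequencies [major, minor] of a biallelic SNP by allele counting.

    Assumption of this sketch: heterozygotes carry one minor-allele copy,
    minor homozygotes carry two.
    """
    n = n_major_hom + n_het + n_minor_hom
    p_minor = (n_het + 2 * n_minor_hom) / (2 * n)
    return [1.0 - p_minor, p_minor]


def gene_diversity(freqs):
    """Eq. (1): one minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def heterozygosity(n_major_hom, n_het, n_minor_hom):
    """Eq. (3): proportion of heterozygous genotypes at the SNP."""
    return n_het / (n_major_hom + n_het + n_minor_hom)


def pic(freqs):
    """Eq. (4): gene diversity minus the pairwise term 2 * p_i^2 * p_j^2."""
    cross = sum(2 * pi ** 2 * pj ** 2 for pi, pj in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


def nei_da(x, y):
    """Eq. (5), square-root form (assumed): x and y are lists of loci,
    each locus being a list of allele frequencies in the same allele order."""
    r = len(x)
    per_locus = (1.0 - sum(sqrt(a * b) for a, b in zip(xj, yj))
                 for xj, yj in zip(x, y))
    return sum(per_locus) / r


# Hypothetical counts: 6 major homozygotes, 3 heterozygotes, 1 minor homozygote.
freqs = allele_freqs(6, 3, 1)
print(freqs, gene_diversity(freqs), heterozygosity(6, 3, 1), pic(freqs))
```

Two populations with identical allele frequencies give a distance of zero under this form, and two populations fixed for different alleles at every locus give a distance of one.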
G is estimated by:

$$G = \frac{WW'}{\sum_{i=1}^{n} 2 p_i q_i} \qquad (8)$$

in which W is the incidence matrix for the fixed effects of alleles at biallelic markers, and $p_i$ and $q_i$ are the allelic frequencies of the biallelic marker at the i-th locus. $F_{st}$ was then obtained as the mean of the diagonal of the G matrix minus one.

The private alleles of each company were identified with the Convert 1.31 software []. Private alleles of a given locus were considered to be those present only in the given population and absent in all others. The positions of the private alleles and the available information on the metabolic and biological functions of the respective genes were recorded using the Gramene platform [].

To quantify the kinship between the germplasms of the market-share companies, Pearson correlation analysis was performed between the companies using the allele frequencies of the SNPs.

To identify the population structure of the elite Brazilian germplasms of maize breeding companies (Population one) and their relationship with the NAM parents (Population two), two population structure analyses were performed using the Structure 2.3.4 software []. In the first, only the genotypes of Population one were considered, while in the second the genotypes of both Populations one and two were considered. The following parameters were used in generating the structures: admixture was modelled between the ancestral populations, with the number of ancestral populations (K) ranging from 1 to 10 and 10 repetitions for each K.
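The genomic kinship matrix of Eq. (8), the derived Fst, and the two Ne estimators of Eqs. (6) and (7) can be illustrated numerically with the short Python sketch below. It is a sketch under stated assumptions, not the authors' code: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele, W is assumed to be that coding centered by twice the allele frequency, and Eqs. (6) and (7) are read as 1/(1 + Fst) and 1/(2 Fst). All function names and the toy genotypes are hypothetical.

```python
def genomic_kinship(genotypes):
    """G = WW' / sum(2 * p_i * q_i) for individuals coded 0/1/2 per SNP.

    Assumption of this sketch: W is the allele-count matrix centered by
    2 * p_i, with p_i estimated from the same sample.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    p = [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]
    denom = sum(2 * pj * (1 - pj) for pj in p)
    w = [[ind[j] - 2 * p[j] for j in range(n_snp)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(wi, wk)) / denom for wk in w]
            for wi in w]


def fst_from_g(g):
    """Mean of the diagonal of G minus one."""
    return sum(g[i][i] for i in range(len(g))) / len(g) - 1.0


def ne_hybrids(fst):
    """Eq. (6) as read here: 1 / (1 + Fst). Undefined at Fst == -1."""
    return 1.0 / (1.0 + fst)


def ne_progenitors(fst):
    """Eq. (7) as read here: 1 / (2 * Fst). Undefined at Fst == 0."""
    return 1.0 / (2.0 * fst)


# Hypothetical toy data: two fully homozygous individuals, two SNPs.
g = genomic_kinship([[0, 2], [2, 0]])
fst = fst_from_g(g)
print(g, fst, ne_hybrids(fst), ne_progenitors(fst))
```

In this toy case both individuals are completely homozygous, so the diagonal of G averages two and the resulting Fst is one, as expected for fully inbred material.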
Each run used one million Markov Chain Monte Carlo (MCMC) iterations, with the first 500,000 iterations treated as burn-in.

To estimate the number of ancestral populations that best fit the structure of the genotypes, the ΔK criterion was used []:

$$\Delta K = \frac{m\left[\,\left| L(K+1) - 2L(K) + L(K-1) \right|\,\right]}{\sigma\left[ L(K) \right]} \qquad (9)$$

in which $L(K)$ is the mean, over the MCMC steps, of the natural logarithm of the probability of the data minus half its variance for K populations, as estimated by the Structure Harvester app [], and $m[\cdot]$ and $\sigma[\cdot]$ denote the mean and the standard deviation across the replicate runs at a given K.

Given the results of the population structure, the contribution of each hybrid to the population was determined from its average association probability ($Q_{ik}$), where $Q_{ik}$ is the estimated proportion of the genome of the i-th genotype derived from the k-th ancestral population []. From the $Q_{ik}$ values, a bar chart was built for each genotype representing the relationship of the i-th genotype with the k-th ancestral population. Genotypes were assigned to the k-th group or population when $Q_{ik}$ exceeded 60%. Genotypes that did not fit in any population were considered to belong to the mixed population, as previously described []. […]
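The ΔK criterion of Eq. (9) and the 60% assignment rule can be sketched in a few lines of Python. This is a hedged illustration rather than the Structure Harvester code: it takes the absolute second difference of the mean L(K) across replicate runs and divides it by the sample standard deviation of L(K), which is one reading of Eq. (9). The function names, the dictionary layout and the example log-likelihoods are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to the list of L(K) values from the replicate runs.
    Assumption of this sketch: the second difference is taken on the
    per-K means, and the sample (n - 1) standard deviation is used.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # Delta K is undefined at the smallest and largest K
        second_diff = abs(mean(lnp_by_k[k + 1])
                          - 2 * mean(lnp_by_k[k])
                          + mean(lnp_by_k[k - 1]))
        result[k] = second_diff / stdev(lnp_by_k[k])
    return result


def assign_group(q_row, threshold=0.60):
    """Zero-based index of the ancestral population whose Q exceeds the
    threshold, or None when the genotype falls in the mixed population."""
    best = max(range(len(q_row)), key=lambda k: q_row[k])
    return best if q_row[best] > threshold else None


# Hypothetical L(K) values, two replicate runs per K.
lnp = {1: [-1000, -1002], 2: [-900, -902], 3: [-880, -884], 4: [-870, -878]}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
print(assign_group([0.7, 0.2, 0.1]), assign_group([0.5, 0.3, 0.2]))
```

With K ranging from 1 to 10, as in the runs described above, ΔK would be defined only for K = 2 to 9, since the criterion needs both neighbouring values of K.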

Pipeline specifications

Software tools: PowerMarker, MEGA, kinship, Structure Harvester
Databases: Gramene
Application: Population genetic analysis
Organisms: Zea mays