Computational protocol: Mapping a male-fertility restoration locus for the A4 cytoplasmic-genic male-sterility system in pearl millet using a genotyping-by-sequencing-based linkage map

[…] Genomic DNA was extracted from dried young leaves of individual F2 plants and their parental lines using the DNeasy Plant Mini Kit (Qiagen Inc., Valencia, CA). The quality and quantity of the extracted DNA were checked by HindIII digestion and gel analysis. Fifty-μL aliquots of each of 196 DNA samples (190 F2 individuals and six parents), each containing > 10 ng μL−1, were sent in three 96-well deep-well plates to the Genomic Diversity Facility at Cornell University in Ithaca, New York, for GBS analysis. The remaining space in the plates was filled with further pearl millet samples from our project. Each 96-well plate contained one randomly positioned blank.

GBS libraries were prepared and analyzed at the Genomic Diversity Facility at Cornell University according to Elshire et al. [], using the restriction enzyme PstI, and were sequenced at 96-plex on the Illumina HiSeq 2000 with single-end reads.

The raw GBS data files (FASTQ) were processed to SNP calls using the GBS version 2 pipeline of TASSEL 5 (version 5.2.28) []. The sequenced tags were aligned to the pearl millet reference genomic sequence provided by the Pearl Millet Genome Sequencing Consortium [], using the Burrows-Wheeler Alignment Tool (BWA) []. [...] High-quality SNPs were called using TASSEL 5. SNPs with more than 20% missing data, a minor allele frequency below 40%, or a heterozygous call in one or both parents were filtered out. Genotypes (plants) showing > 50% missing data were removed. After this filtering, the remaining 2445 SNPs were imputed using the FSFHap algorithm [] implemented in TASSEL 5.

Chi-square tests were performed on each marker against the expected 1:2:1 (A:H:B) genotypic segregation ratio to assess segregation distortion. Only 29 SNPs showed significant segregation distortion at the 5% level after a Bonferroni correction for multiple tests.
These SNPs were discarded.

The genetic map was constructed using the MSTmap algorithm [] implemented in the R package ASMap [, ]. A total of 73 SNP markers were assigned to outlying linkage groups (LGs) containing very few SNPs and were discarded. The numbering of LGs followed the genome sequence, which corresponds to the numbering of the consensus map published by Rajaram et al. []. The map length was re-estimated using the Lander-Green algorithm in the software package R/qtl with the Haldane mapping function. The genetic map of 2343 markers contained many redundant (co-segregating) markers, which were excluded; the final linkage map was therefore based on 460 markers. […]
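
The per-marker segregation test described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the genotype counts, and the choice of Python are assumptions made for illustration. It tests each marker against the 1:2:1 (A:H:B) ratio with two degrees of freedom and flags markers whose p-value falls below the Bonferroni-corrected 5% threshold.

```python
import math

def segregation_chi2(n_a, n_h, n_b):
    """Chi-square statistic and p-value for a 1:2:1 (A:H:B) F2 ratio."""
    n = n_a + n_h + n_b
    if n == 0:
        raise ValueError("marker has no called genotypes")
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    observed = (n_a, n_h, n_b)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom, for which the
    # chi-square survival function reduces to exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

def distorted_markers(counts, alpha=0.05):
    """Names of markers significant after a Bonferroni correction."""
    threshold = alpha / len(counts)  # one test per marker
    return [name for name, (a, h, b) in counts.items()
            if segregation_chi2(a, h, b)[1] < threshold]

# Hypothetical genotype counts (A, H, B); not data from the study.
example_counts = {
    "snp_1": (47, 95, 48),   # close to 1:2:1
    "snp_2": (90, 80, 20),   # strong excess of the A genotype
}
flagged = distorted_markers(example_counts)  # only "snp_2" is flagged
```

With the 2445 imputed SNPs reported above, the corrected per-marker threshold in such a sketch would be 0.05 / 2445.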

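The Haldane mapping function chosen for the map-length re-estimation converts a recombination fraction r into a map distance d (in cM) as d = -50 ln(1 - 2r), assuming no crossover interference. The short Python sketch below shows only this conversion and its inverse; it is not the Lander-Green re-estimation itself, and the function names are illustrative assumptions.

```python
import math

def haldane_cm(r):
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r(d_cm):
    """Recombination fraction for a map distance d_cm in cM (inverse)."""
    if d_cm < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

# A recombination fraction of 0.10 corresponds to roughly 11.16 cM.
distance = haldane_cm(0.10)
```

Because the function is additive over adjacent intervals only in the absence of interference, distances obtained this way are typically somewhat longer than those from the Kosambi function for the same recombination fractions.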
