Computational protocol: Resolution of Genetic Map Expansion Caused by Excess Heterozygosity in Plant Recombinant Inbred Populations

Protocol publication

[…] Calculations of the genotype frequencies for proportions of maintained heterozygosity, h, other than 0.5 were implemented in C within a fork of the R/qtl v1.28.19 code base. Specifically, we used the golden section search algorithm as implemented in the R/qtl BCsFt tools to estimate recombination fractions given genotype data for a marker pair. Map distances were calculated using the Haldane mapping function given the recombination fractions estimated from the golden section search.

The source code is available on GitHub as a forked R/qtl repository at https://github.com/MulletLab/qtl. The hetexp branch contains the new functions, including est.rf.exHet(), which can be called from R in the same way as the existing est.rf() but with a heterozygosity term, h, passed to it. The est.rf.exHet() function can also estimate h on the basis of H for each linkage group. Example usage can be found at https://github.com/MulletLab/exHet_Supplement.

Genotypes for a 200-cM linkage group genotyped for 1000 individuals at 1000 markers were simulated under the derived heterozygosity model both (i) without errors or missing data and (ii) with 1% errors and 5% missing data. The code used to generate the datasets, the simulated datasets, and their respective results can be found at https://github.com/MulletLab/exHet_Supplement. [...]

The sorghum recombinant inbred mapping population, BTx623 × IS3620C, was made available by the USDA-ARS Plant Genetic Resource and Conservation Unit, Griffin, GA. These F7–9 individuals were planted in fields in College Station, TX, in the summer of 2013. DNA was extracted from leaf tissue of 10−12 plants from seed stock of each RIL and prepared by digital genotyping with the restriction endonuclease NgoMIV. The digital genotyping templates were sequenced on an Illumina HiSeq 2500 with 72 (or fewer) samples per lane.

Genotypes were generated from the sequenced reads of the recombinant inbred lines and their parents, BTx623 and IS3620C.
The sequence reads were delivered already sorted by sample barcode and were checked for restriction sites using awk; where applicable, preprocessing was parallelized using GNU parallel. Reads were aligned to the sorghum reference genome (Sbi1) with BWA mem (v0.7.5a). Aligned reads were realigned around indels with IndelRealigner, using the Genome Analysis Toolkit (GATK v3.1-1) and the Queue framework; individual GVCFs were generated using the HaplotypeCaller; and joint genotyping was performed using GenotypeGVCFs. Variants were hard filtered using VariantFiltration under the following criteria: DP < 10; QD < 5.0; MQ < 30.0; MQRankSum < −10.0; BaseQRankSum < −10.0. The remaining variants were filtered to keep only biallelic variants for which the two parents, BTx623 and IS3620C, were each homozygous for different alleles, and to keep only variants that were genotyped with a GQ score ≥ 20 in ≥ 25% of the samples. For these genotypes, the median depth of reads that passed the HaplotypeCaller's internal quality control metrics (i.e., the median sample-level DP annotation) was 17 reads. Genotypes with a GQ score < 20 were set to missing, and those remaining were screened for tight double recombinations occurring within 2 kbp; genotypes involved in a tight double recombination were set to missing. These variants and genotypes were used as the initial input for genetic map construction in R/qtl. […]
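
The genotype-level cleanup described above (setting calls with GQ < 20 to missing, then masking tight double recombinations within 2 kbp) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. The function names, the "A"/"H"/"B" genotype coding, and the exact definition of a tight double recombination are assumptions: here a run of calls is masked when the calls flanking it on either side agree with each other and lie no more than 2 kbp apart, and only the interior run is set to missing.

```python
from typing import List, Optional

GQ_MIN = 20          # GQ threshold stated in the protocol
MAX_SPAN_BP = 2000   # "within 2 kbp" as stated in the protocol

Call = Optional[str]  # hypothetical coding: "A", "H", "B", or None for missing


def mask_low_gq(calls: List[Call], gqs: List[Optional[int]],
                gq_min: int = GQ_MIN) -> List[Call]:
    """Set every call whose GQ is absent or below gq_min to missing."""
    return [c if (c is not None and gq is not None and gq >= gq_min) else None
            for c, gq in zip(calls, gqs)]


def mask_tight_double_recombinants(positions: List[int], calls: List[Call],
                                   max_span_bp: int = MAX_SPAN_BP) -> List[Call]:
    """Mask runs of calls that imply two crossovers within max_span_bp.

    Assumed definition: a run is masked when the nearest non-missing calls on
    both sides carry the same genotype and are at most max_span_bp apart.
    Inputs describe one individual on one chromosome, sorted by position.
    """
    # Indices of non-missing calls, in positional order.
    idx = [i for i, c in enumerate(calls) if c is not None]

    # Group consecutive identical calls into runs of [first, last] offsets in idx.
    runs: List[List[int]] = []
    for k, i in enumerate(idx):
        if runs and calls[idx[runs[-1][1]]] == calls[i]:
            runs[-1][1] = k
        else:
            runs.append([k, k])

    out = list(calls)
    for r in range(1, len(runs) - 1):
        left = idx[runs[r - 1][1]]    # last call before the run
        right = idx[runs[r + 1][0]]   # first call after the run
        flanks_agree = calls[left] == calls[right]
        if flanks_agree and positions[right] - positions[left] <= max_span_bp:
            for k in range(runs[r][0], runs[r][1] + 1):
                out[idx[k]] = None
    return out
```

Applying `mask_low_gq` first and `mask_tight_double_recombinants` second mirrors the order given in the text; whether the flanking calls themselves should also be set to missing is not specified in the protocol, so this sketch leaves them in place.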

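The two-point estimation step in the protocol (a golden section search over the recombination fraction, followed by the Haldane mapping function) can be illustrated with a minimal Python sketch. The likelihood used here is a plain backcross binomial, chosen only because it is short and has a known optimum; it is a stand-in, not the excess-heterozygosity model and not the R/qtl BCsFt implementation. All function names are hypothetical.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-9, max_iter=200):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    c = hi - INV_PHI * (hi - lo)
    d = lo + INV_PHI * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        if fc > fd:
            # Maximum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - INV_PHI * (hi - lo)
            fc = f(c)
        else:
            # Maximum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + INV_PHI * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


def backcross_loglik(r, n_recomb, n_total):
    """Stand-in log-likelihood: n_recomb recombinants among n_total meioses."""
    return n_recomb * math.log(r) + (n_total - n_recomb) * math.log(1.0 - r)


def estimate_rf(n_recomb, n_total, eps=1e-6):
    """Estimate r on (0, 0.5), keeping clear of the log singularities."""
    return golden_section_max(
        lambda r: backcross_loglik(r, n_recomb, n_total), eps, 0.5 - eps)


def haldane_cm(r):
    """Haldane mapping function: map distance in cM for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)
```

For example, `estimate_rf(20, 200)` converges to roughly 0.1, and `haldane_cm(0.1)` gives about 11.16 cM. In the protocol itself the function being maximized would be the likelihood of the marker-pair genotypes under the heterozygosity term h, which this sketch does not attempt to reproduce.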
Pipeline specifications

Software tools: R/qtl, BWA, GATK
Application: WGS analysis
Organisms: Sorghum bicolor