Computational protocol: Fine Mapping the Branching Habit Trait in Cultivated Peanut by Combining Bulked Segregant Analysis and High Throughput Sequencing

Protocol publication

[…] Bulked segregant analysis was performed on the F3 families that were found to be homozygous for the spreading or bunch growth habit. In total, 52 completely bunch and 47 completely spreading families were sampled. Young leaves were collected from all 16 individuals in each family. Within each phenotypic group (spreading or bunch), the tissues from all families were bulked for RNA extraction. Working at the RNA level was preferable to working at the genomic DNA level because the peanut genome is large and relatively complex; it also facilitated the detection of candidate genes. A 400 mg sample of each ground tissue bulk was used for RNA extraction by the hot-borate method, as described by Brand and Hovav (). The total RNA was used to prepare two RNA-Seq libraries with the TruSeq RNA Sample Preparation Kit v2 (Illumina), following the manufacturer's protocol as described previously (Gupta et al., ). Libraries were validated using the D1000 DNA ScreenTape on a TapeStation 2200 (Agilent). The RNA-Seq libraries were sequenced on an Illumina HiSeq™ 2000 (single lane) at the sequencing center of the Technion in Haifa, Israel.
Data analyses followed the general guidelines for bulked segregant analysis using next-generation sequencing (Magwene et al., ) and the specific guidelines for polyploids (Trick et al., ), with several modifications. Raw reads were cleaned with the FASTX Toolkit ( index.htm) in two steps: (1) trimming read-end nucleotides with quality scores < 30 using fastq_quality_trimmer, and (2) removing reads in which fewer than 70% of the bases had a quality score ≥ 30 using fastq_quality_filter. The cleaned sequences were mapped against the tetraploid (4X) peanut transcript assembly reference and against two Arachis diploid genomes (A. duranensis and A. ipaensis; Bertioli et al., ) using the Bowtie2 aligner (Langmead and Salzberg, ).
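The two read-cleaning rules described above can be illustrated with a minimal Python sketch. This is not the FASTX Toolkit and not the authors' code; it only restates the logic of the two steps. It assumes Phred+33 quality encoding and trimming from the 3' end of the read only, neither of which is stated in the protocol, and all function names are hypothetical.

```python
# Illustrative sketch of the two cleaning rules; not the FASTX Toolkit itself.
# Assumptions (not stated in the protocol): Phred+33 quality encoding,
# trimming from the 3' end of the read only.


def decode_phred33(quality_string):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_read_end(sequence, qualities, threshold=30):
    """Drop bases from the 3' end while their quality is below `threshold`."""
    end = len(qualities)
    while end > 0 and qualities[end - 1] < threshold:
        end -= 1
    return sequence[:end], qualities[:end]


def passes_filter(qualities, min_quality=30, min_percent=70):
    """Keep a read only if at least `min_percent` % of bases reach `min_quality`."""
    if not qualities:
        return False
    good = sum(1 for q in qualities if q >= min_quality)
    return 100 * good >= min_percent * len(qualities)


def clean_read(sequence, quality_string):
    """Apply trimming, then filtering; return the trimmed read or None."""
    seq, quals = trim_read_end(sequence, decode_phred33(quality_string))
    return (seq, quals) if passes_filter(quals) else None


if __name__ == "__main__":
    # A read whose last three bases have low quality is trimmed, then kept.
    print(clean_read("ACGTACGTAC", "IIIIIII###"))
```

A read that still has too many low-quality bases after trimming is returned as `None`, which mirrors discarding it from the cleaned data set.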
The Genome Analysis Toolkit (GATK) UnifiedGenotyper, version 2.5.2 (McKenna et al., ; DePristo et al., ), was used to detect SNPs, giving one SNP set per bulk. A custom Perl script was used to derive the symmetric difference of the two SNP sets, that is, the SNPs present in exactly one of them. Polymorphisms between homologous genomes generate the same doubled code in both bulks and should therefore be common to both SNP sets. In contrast, differences between cv. Hanoch and cv. Harari (varietal-specific SNPs) should generate doubled code in only one bulk and therefore be unique to the corresponding SNP set. In this manner, ~13,000 varietal-specific SNPs were retrieved between the two bulks. These SNPs were further filtered to retain those with more than 50 reads per SNP, a GATK quality value > 100 and a bulk frequency ratio (BFR) > 3. In addition, genes with SNP densities higher than 5 SNPs/kb were eliminated to avoid SNPs that may arise from paralogues. […]
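The set logic and the filters described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Perl script: the `Snp` record, the `(gene, pos)` key used to decide whether two calls are "the same SNP", and the `gene_length_bp` lookup are all hypothetical, and the sketch assumes that SNP density is computed on the SNPs that survive the depth, quality and BFR thresholds, which the protocol does not specify.

```python
# Illustrative sketch only; not the authors' Perl script.
from collections import Counter, namedtuple

# Hypothetical record: one SNP call on a reference transcript.
Snp = namedtuple("Snp", ["gene", "pos", "depth", "qual", "bfr"])


def varietal_specific(calls_a, calls_b):
    """Symmetric difference: SNPs called in exactly one of the two bulks.

    Each argument maps a (gene, pos) key to an Snp record (assumed keying).
    """
    unique_keys = set(calls_a) ^ set(calls_b)
    merged = {**calls_a, **calls_b}
    return [merged[key] for key in sorted(unique_keys)]


def passes_thresholds(snp, min_depth=50, min_qual=100, min_bfr=3):
    """Thresholds from the text: reads > 50, GATK quality > 100, BFR > 3."""
    return snp.depth > min_depth and snp.qual > min_qual and snp.bfr > min_bfr


def drop_dense_genes(snps, gene_length_bp, max_per_kb=5):
    """Remove every SNP on genes carrying more than `max_per_kb` SNPs per kb."""
    counts = Counter(snp.gene for snp in snps)
    dense = {
        gene for gene, n in counts.items()
        if n / (gene_length_bp[gene] / 1000) > max_per_kb
    }
    return [snp for snp in snps if snp.gene not in dense]


def candidate_snps(calls_a, calls_b, gene_length_bp):
    """Chain the three steps: set difference, thresholds, density filter."""
    kept = [s for s in varietal_specific(calls_a, calls_b) if passes_thresholds(s)]
    return drop_dense_genes(kept, gene_length_bp)
```

Calling `candidate_snps(spreading_calls, bunch_calls, gene_length_bp)` with one dictionary of calls per bulk would return the varietal-specific SNPs that pass all filters. Keying on position alone treats a site as shared even if the two bulks carry different alleles there; a stricter key could include the called genotype.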

Pipeline specifications

Software tools: FASTX-Toolkit, Bowtie2, GATK
Databases: PeanutBase