Computational protocol: First‐generation HapMap in Cajanus spp. reveals untapped variations in parental lines of mapping populations

Protocol publication

[…] To reduce sequencing errors, paired‐end sequencing reads were trimmed and filtered with sickle version 1.200 (https://github.com/najoshi/sickle). Duplicate reads were removed first; low‐quality reads (Phred score <30), sequences shorter than 100 nucleotides and sequences containing ‘N’ were then removed using the in‐house QC pipeline NGS‐QCbox (Katta et al.). After cleaning, the filtered reads were mapped onto the reference genome with bowtie2 v2.2.4 (Langmead and Salzberg) using default options. Reads that mapped to more than one position or did not map at all were filtered out, so that only reads with a unique alignment to the reference genome were retained in the BAM files. The BAM files were then processed for variant (SNP and InDel) calling using the Genome Analysis Toolkit suite (McKenna et al.), with a minimum read depth of five reads per accession. Using an in‐house Perl script, the distribution of the identified variants along the genome was analysed in contiguous windows of 100 kb. Identified SNPs were additionally classified as homozygous or heterozygous on the basis of mismatch frequencies; a SNP was considered heterozygous when the reads aligned at its position contained both reference and alternate bases. InDels were identified within the size range of 1–48 bp. Accession‐specific variants (SNPs and InDels) were reported only if the variant call was present in one particular accession and the reference allele was present in the remaining accessions. CNVs were identified using CNVnator with an e‐value of 1e-05 (Abyzov et al.). To detect false positives among the CNVs, raw reads from the Asha (ICPL 87119) genotype were aligned to the draft assembly (Asha). Identified CNVs present in genes with length ≥1 kb were then reported. The frequencies of variants (SNPs and InDels) and CNVs across the categorized genotypes were then plotted using Circos (Krzywinski et al.). [...]
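Two of the steps described above can be sketched briefly: the tally of variants in contiguous 100 kb windows and the rule for accession‐specific variants. The in‐house Perl script is not shown in the source, so the Python below is only an illustration under stated assumptions. The function names, the (chromosome, position) input tuples and the "ref"/"alt"/None genotype encoding are all hypothetical. A missing call is assumed to disqualify a variant, because the text requires the reference allele to be present in the remaining accessions.

```python
from collections import Counter

WINDOW_SIZE = 100_000  # contiguous 100 kb windows, as stated in the text


def window_counts(variants, window_size=WINDOW_SIZE):
    """Count variants per (chromosome, window index).

    `variants` is an iterable of (chromosome, 1-based position) tuples
    (a hypothetical input format). Window 0 covers positions
    1..window_size, window 1 covers the next block, and so on.
    """
    counts = Counter()
    for chrom, pos in variants:
        counts[(chrom, (pos - 1) // window_size)] += 1
    return counts


def accession_specific(genotypes):
    """Return the single accession carrying the variant, or None.

    `genotypes` maps accession name -> "ref", "alt" or None (no call);
    this encoding is an assumption made for the sketch. A site qualifies
    only if exactly one accession is "alt" and every other accession
    has an explicit "ref" call.
    """
    carriers = [name for name, call in genotypes.items() if call == "alt"]
    others_are_ref = all(
        call == "ref" for call in genotypes.values() if call != "alt"
    )
    if len(carriers) == 1 and others_are_ref:
        return carriers[0]
    return None
```

For example, `window_counts([("LG1", 99_999), ("LG1", 100_001)])` places the two positions in windows 0 and 1 of the hypothetical chromosome `LG1`, and `accession_specific({"A": "alt", "B": "ref"})` returns `"A"`.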
Identified SNPs and InDels were annotated according to their genomic locations as intergenic, intronic or exonic using SnpEff (Cingolani et al.). The variants were further categorized as synonymous, nonsynonymous, start codon loss, stop codon gain, frameshift, etc. Variant effects were classified by impact as high, moderate, low or modifier. Generic feature format files holding the positions of the variants were constructed by aligning the sequences against the reference genome. Accession‐specific variants (SNPs and InDels) located in exonic regions were functionally annotated for each accession using the UniProtKB database, and GO terms were assigned accordingly (Huntley et al.). Further, the impact of accession‐specific variants on various biological pathways was examined using the Biological Networks Gene Ontology tool (BiNGO; Maere et al.). The phylogenetic tree was constructed using the DNAML program of the PHYLIP package, with 1000 bootstraps and otherwise default parameters of the SNPhylo program (Lee et al.). MEGA4 (Tamura et al.) was used to visualize the phylogenetic tree. […]
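The impact classification described above can be illustrated with a small tally over annotated variants. This is a minimal sketch and not the authors' procedure. It assumes the annotations are in the pipe‐delimited `ANN` INFO field written by recent SnpEff releases, where the third subfield is the impact; the version used in the study may have written a different field. It also takes only the first annotation listed for each variant. The function name and the example strings are hypothetical.

```python
from collections import Counter

# The four impact levels named in the text, as SnpEff spells them.
IMPACTS = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def impact_counts(info_fields):
    """Tally the impact of the first ANN annotation per variant.

    `info_fields` is an iterable of VCF INFO strings. Assumes the
    SnpEff ANN layout Allele|Annotation|Impact|Gene_Name|...;
    variants without a recognizable ANN entry are skipped.
    """
    counts = Counter()
    for info in info_fields:
        for entry in info.split(";"):
            if entry.startswith("ANN="):
                first_annotation = entry[len("ANN="):].split(",")[0]
                subfields = first_annotation.split("|")
                if len(subfields) > 2 and subfields[2] in IMPACTS:
                    counts[subfields[2]] += 1
                break  # at most one ANN entry per INFO string
    return counts


# Made-up INFO strings, for illustration only.
example = [
    "DP=12;ANN=A|missense_variant|MODERATE|geneX|geneX|transcript|t1",
    "DP=8;ANN=T|stop_gained|HIGH|geneY|geneY|transcript|t2,"
    "T|upstream_gene_variant|MODIFIER|geneZ|geneZ|transcript|t3",
    "DP=20",
]
counts = impact_counts(example)
```

Counting only the first annotation keeps one impact per variant; a variant that overlaps several transcripts would otherwise be counted more than once.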

Pipeline specifications