Computational protocol: Optimized Method of Extracting Rice Chloroplast DNA for High Quality Plastome Resequencing and de Novo Assembly

Protocol publication

[…] The sequenced paired-end reads from the purified cpDNA and three sets of whole-genome sequencing reads (total DNA [tDNA] 1–3) downloaded from public databases were trimmed in Trimmomatic v. 0.33 with the following parameters: SLIDINGWINDOW:8:20; TRAILING:20; MINLEN:90 (tDNA 1), 100 (purified cpDNA and tDNA 2), or 76 (tDNA 3). The processed reads were aligned to the rice plastid reference genome (X15901.1) or to a combined plastid (X15901.1)-mitochondrial (BA000029) reference genome using the BWA-MEM v. 0.7.15 algorithm with default parameters. PCR duplicates in the BAM files were marked with Picard tools v. 1.68, and reads were then locally realigned around indels with GATK (Genome Analysis Toolkit) IndelRealigner. To estimate cpDNA purity, we extracted unaligned reads from the BAM files with SAMtools and realigned them to the rice mitochondrial reference genome (BA000029). Reads that remained unmapped were extracted again and realigned to the rice nuclear reference genome (IRGSP-1.0). The coverage depth of each genome was calculated from the number and length of high-quality reads in a 250-nt sliding window. To calculate allele frequency at individual plastid genome positions, we generated wig files describing the base (A/C/G/T) content in a 1-nt sliding window from the BAM files with igvtools v. 2.2. After removing data adjacent to indels because of their low reliability, we calculated the first and second allele frequencies and the coverage depths from the wig files with a custom Perl script and visualized them in 3D scatter plots using the scatterplot3d v. 0.3-37 package in R. For variant calling, we used SAMtools mpileup v. 1.4.1 with default parameters and GATK HaplotypeCaller v. 3.6 with the ‘-ploidy 1’ parameter to compare SNPs and small indels called from the BAM files. We filtered out heterozygous and low-quality variants (QUAL < 20) from the SAMtools calls, and low-quality variants (QUAL < 20) from the GATK calls. […]
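The two per-position and per-window quantities described above can be illustrated with a minimal Python sketch. This is not the authors' custom Perl script: the function names, the input shapes, the use of non-overlapping windows, and the rule that a read is credited to the window containing its start position are all assumptions made for illustration.

```python
WINDOW = 250  # nt, the window size stated in the protocol


def window_depth(reads, genome_length, window=WINDOW):
    """Mean coverage depth per window from (start, length) read tuples.

    Assumptions (not from the protocol): windows do not overlap, starts
    are 0-based, and each read is credited entirely to the window that
    contains its start position.
    """
    n_windows = (genome_length + window - 1) // window
    bases = [0] * n_windows
    for start, length in reads:
        bases[start // window] += length
    depths = []
    for i, total in enumerate(bases):
        # The last window may be shorter than the nominal window size.
        size = min(window, genome_length - i * window)
        depths.append(total / size)
    return depths


def allele_frequencies(counts):
    """Return (first_freq, second_freq, depth) for one genome position.

    counts maps each base (A/C/G/T) to the number of reads supporting it,
    as could be read from a per-base wig track.
    """
    depth = sum(counts.values())
    if depth == 0:
        return 0.0, 0.0, 0
    # Pad with a zero so a position with a single observed base still works.
    ranked = sorted(counts.values(), reverse=True) + [0]
    return ranked[0] / depth, ranked[1] / depth, depth
```

Each (first frequency, second frequency, depth) triple corresponds to one point in the 3D scatter plot; positions near indels would be dropped before this step.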
PCR duplicates were removed from paired-end reads using a k-mer-based method implemented in a Perl script. For total DNA, paired reads aligned to the plastid genome were extracted from the BAM files to enrich for plastid reads, and PCR duplicates were then filtered out. Contigs were assembled from these reads in SOAPdenovo2 with various sets of k-mer parameters (Supplementary Table). After scaffolds shorter than 500 bp were filtered out, the remaining sequences were compared against the plastid reference genome with NCBI BLAST 2. Alignment results and detected SNPs/indels were visualized with Circos. […]
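The protocol does not say how the k-mer-based duplicate filter works internally, so the Python sketch below shows only one plausible reading: two read pairs count as duplicates when the leading k-mers of both mates are identical. The key definition, the default `k=20`, and the function names are hypothetical. The scaffold length filter follows the 500 bp threshold given in the text.

```python
def dedup_pairs(pairs, k=20):
    """Keep the first read pair seen for each (read 1 k-mer, read 2 k-mer) key.

    pairs is an iterable of (read1_sequence, read2_sequence) strings.
    Assumption: the key is the first k bases of each mate; the original
    Perl script may define duplicates differently.
    """
    seen = set()
    kept = []
    for r1, r2 in pairs:
        key = (r1[:k], r2[:k])
        if key not in seen:
            seen.add(key)
            kept.append((r1, r2))
    return kept


def filter_scaffolds(scaffolds, min_len=500):
    """Drop scaffolds shorter than min_len bp.

    scaffolds maps scaffold name to sequence.
    """
    return {name: seq for name, seq in scaffolds.items() if len(seq) >= min_len}
```

Keeping the first pair per key preserves input order, so the output stays comparable with the input when checking how many pairs were discarded.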

Pipeline specifications

Software tools: Trimmomatic, BWA, Picard, GATK, SAMtools, IGV, BLASTN, Circos
Databases: IRGSP
Applications: WGS analysis, Genome data visualization
Organisms: Oryza sativa
Chemicals: Nitrogen, Sucrose