Computational protocol: Mutation tendency of mutator Plasmodium berghei with proofreading-deficient DNA polymerase δ

[…] The paired-end reads were mapped with BWA version 0.7.5a to the reference genome sequence downloaded from the Wellcome Trust Sanger Institute, and duplicate sequences were removed using Picard tools version 1.96. Local realignment around indel sites and base quality recalibration were conducted using the Genome Analysis Toolkit (GATK) version 2.7.4. Subsequently, variant calling based on the alignments of clones was performed using two pipelines: one with VarScan2 version 2.3.6 and the other with GATK. The alignments were converted into mpileup files using SAMtools. Base substitutions and indels were called from the mpileup files with VarScan2 using the following conditions: coverage of ≥10 reads; a mutation frequency of ≥80% for base substitutions and ≥50% for indels; and a minimum PHRED quality score of 20. Base substitutions and indels were also called by GATK UnifiedGenotyper using a minimum PHRED quality score of 20. Detected variants were confirmed by visual inspection of the mapping data with the Integrative Genomics Viewer (IGV). We validated all indels, all base substitutions causing nonsense mutations, and all base substitutions detected using either of the two pipelines by Sanger sequencing. We also sequenced the ancestral clones (WT0, Mut0, and Md0) of each passage line so that only the mutations accumulated during serial passages were identified; variants found in the ancestral clones were excluded from the subsequent analysis. The detected variants were annotated using SnpEff version 3.6c. To summarise the distribution of mutations, a Circos plot was generated using the Circos program.

The detection of base substitutions based on the alignments of populations was performed using LoFreq version 2.0.0 with the default settings. Visual inspection and annotation of detected base substitutions were performed as described above.

[...] All of the statistical tests in this study were conducted in R. P < 0.05 was considered statistically significant.
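The VarScan2 calling conditions and the exclusion of ancestral variants described in the variant-calling text can be illustrated with a short sketch. The Python below is not the authors' pipeline: the `Call` record, its field names, and both helper functions are hypothetical. The PHRED cutoff of 20 is assumed to have been applied upstream when reads were counted, so it is not modelled here.

```python
from dataclasses import dataclass

# Thresholds taken from the protocol text.
MIN_DEPTH = 10
MIN_FREQ = {"substitution": 0.80, "indel": 0.50}


@dataclass(frozen=True)
class Call:
    """Hypothetical record for one candidate variant in one clone."""
    chrom: str
    pos: int
    ref: str
    alt: str
    kind: str        # "substitution" or "indel"
    depth: int       # reads covering the site
    alt_reads: int   # reads supporting the variant allele

    @property
    def key(self):
        # Identity used to match a clone's variant against its ancestor.
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_thresholds(call):
    """Apply the coverage and mutation-frequency conditions."""
    if call.depth < MIN_DEPTH:
        return False
    return call.alt_reads / call.depth >= MIN_FREQ[call.kind]


def accumulated_mutations(clone_calls, ancestor_calls):
    """Keep calls that pass the thresholds and are absent from the ancestor."""
    ancestral = {c.key for c in ancestor_calls}
    return [c for c in clone_calls
            if passes_thresholds(c) and c.key not in ancestral]
```

Here `accumulated_mutations(clone_calls, ancestor_calls)` would be run once per passaged clone against the calls from its ancestral clone (for example WT0), so that only variants that arose during serial passage remain.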
Graphs were generated using the ggplot2 R package. Detrended correspondence analysis (DCA) was performed using the vegan R package based on the expression data for 4896 protein-coding genes, which we obtained after excluding genes with low expression levels (reads per kilobase of transcript per million mapped reads, RPKM < 1) at all stages. […]
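The expression filter applied before the DCA can be made concrete with a minimal sketch. It uses the standard RPKM definition and keeps a gene unless its RPKM is below 1 at every stage. The function names, the dictionary layout, and the gene identifiers in the usage note are assumptions for illustration. The authors worked in R, so this Python is not their code.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def drop_low_expression(expression, threshold=1.0):
    """Remove genes whose RPKM is below `threshold` at all stages.

    `expression` maps a gene identifier to a list of RPKM values,
    one value per stage (a hypothetical layout).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if any(v >= threshold for v in values)
    }
```

For example, a gene with stage values `[0.1, 3.0, 0.0]` is retained because one stage reaches the threshold, whereas `[0.2, 0.5, 0.9]` is dropped.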

Software tools: BWA, Picard, GATK, VarScan, SAMtools, IGV, SnpEff, Circos, LoFreq, ggplot2, vegan
Applications: Miscellaneous, Genome data visualization
Organisms: Plasmodium berghei, Rattus norvegicus, Toxoplasma gondii, Mus musculus
Diseases: Malaria