Computational protocol: Evolution of the Transmission-Blocking Vaccine Candidates Pvs28 and Pvs25 in Plasmodium vivax: Geographic Differentiation and Evidence of Positive Selection

Protocol publication

[…] For both genes, p28 and p25, the nucleotide sequences of P. vivax and its closely related non-human primate (NHP) malaria species were aligned independently using the MUSCLE algorithm [] implemented in SeaView4 [] on translated sequences, followed by visual inspection and manual editing. The protein domains (signal peptide, EGF-like domains and GPI anchor) were assigned in the alignments following the description of Saxena et al. 2006 []. For pvs28, the low complexity regions (LCRs) were excluded from the polymorphism and phylogenetic analyses; they were studied separately, as defined by Rich et al. 1997 [].
We estimated polymorphism by gene and by domain within each Plasmodium species using the population statistics π (the average number of substitutions between any two sequences), the number of segregating (polymorphic) sites (S), and haplotype diversity (Hd). Polymorphism was also explored by computing Tajima’s D statistic []. The distribution of genetic diversity across the p28 and p25 gene sequences was described by calculating π in a sliding window of 50 base pairs (bp) with a step size of 10 sites. The statistic was calculated in each window, assigned to the nucleotide at the window midpoint, and plotted against nucleotide position. All these calculations were performed using DnaSP v5.10.01 [].
Evidence of natural selection was explored by estimating the average number of synonymous substitutions per synonymous site (dS) and non-synonymous substitutions per non-synonymous site (dN) between pairs of sequences under the Nei-Gojobori method [] with the Jukes-Cantor correction, as implemented in MEGA6 []. The difference between dS and dN and its standard error were estimated by bootstrap with 1,000 pseudo-replications, and the difference was tested with a two-tailed codon-based Z-test, as described in Nei and Kumar 2000 [].
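The sliding-window calculation of π described above can be illustrated with a minimal Python sketch. This is not DnaSP's implementation: the function names are hypothetical, sites with gaps or ambiguous bases are simply skipped in each pairwise comparison, and π is expressed per site by dividing by the window length, which is a simplifying assumption.

```python
from itertools import combinations

VALID = set("ACGT")

def pairwise_diffs(a, b):
    """Count sites where two aligned sequences differ, ignoring gaps/ambiguities."""
    return sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diffs(a, b) for a, b in pairs)
    return total / (len(pairs) * length)

def sliding_pi(seqs, window=50, step=10):
    """Return (midpoint, pi) tuples; midpoint is a 1-based alignment position."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        midpoint = start + window // 2  # window covers positions start+1 .. start+window
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results

# Toy alignment (hypothetical data, for illustration only)
toy = ["A" * 70, "A" * 69 + "T", "A" * 68 + "TT"]
for position, pi in sliding_pi(toy):
    print(position, round(pi, 4))
```

The (midpoint, π) pairs can then be plotted against nucleotide position, as the protocol does with the DnaSP output.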
Under the neutral model, synonymous substitutions accumulate faster than non-synonymous ones because they do not affect parasite fitness and/or because purifying selection is expected to act against non-synonymous substitutions (dS≥dN). Conversely, if positive selection is maintaining polymorphism, a higher incidence of non-synonymous substitutions is expected (dS<dN). [...] HyPhy, which uses flexible, but not overly parameter-rich, rate distributions [] and allows both dS and dN to vary across sites independently. REL allows for tests of selection at a single codon site while taking into consideration rate variation across synonymous sites. It is often considered the only method for inferring selection from low-divergence alignments such as pvs28 and pvs25.
Evidence for natural selection was also explored in P. vivax using the McDonald & Kreitman (MK) test, which compares the intra- and inter-specific numbers of synonymous and non-synonymous changes []. In this analysis we compared P. vivax with its closely related NHP parasites P. cynomolgi, P. inui and P. knowlesi for both the p28 and p25 genes. Significance was assessed using Fisher’s exact test for the 2 x 2 contingency table, as implemented in DnaSP.
[...] To study the genetic relationships among worldwide haplotypes, a median-joining network was estimated for a set of 284 cosmopolitan sequences of pvs28 and 325 of pvs25 using Network v4.6.1.0 (Fluxus Technologies 2011). Transversions were weighted equally to transitions, and the epsilon parameter was set to 0 with only one round of star contraction, which collapses star-like structures in the network into single nodes. The total number of sites included in these analyses, excluding gaps or missing data, was 547 out of 744 for pvs28 and 558 out of 660 for pvs25.
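The MK comparison described above reduces to a 2 x 2 table of fixed versus polymorphic and synonymous versus non-synonymous changes. The sketch below shows one way to compute a two-sided Fisher's exact p-value and the neutrality index from such a table using only the Python standard library. The function names and the counts are hypothetical, and DnaSP's own counting of changes is not reproduced here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

def mk_test(ds, dn, ps, pn):
    """MK test from fixed (ds, dn) and polymorphic (ps, pn) change counts.

    Returns (p_value, neutrality_index); the index is None when undefined.
    """
    p_value = fisher_exact_two_sided(ds, dn, ps, pn)
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None
    return p_value, ni

# Hypothetical counts, for illustration only
p_value, ni = mk_test(ds=10, dn=5, ps=10, pn=10)
print(p_value, ni)
```

A neutrality index below 1 points to an excess of fixed non-synonymous changes, and a value above 1 to an excess of non-synonymous polymorphism.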
In addition, we used DnaSP to estimate the fixation index (FST) based on haplotype frequencies among these geographical regions.
To investigate whether intragenic recombination generates allelic diversity in the P. vivax ookinete genes, the genetic algorithm for recombination detection (GARD) was used to screen both alignments for recombination breakpoints, as implemented in Datamonkey (http://www.datamonkey.org/) [,]. Default parameters for the detection of recombination breakpoints and donor-recipient pairs were used, with a significance cut-off of 0.05.
[...] The evolutionary relationships among the p28 and p25 genes in Plasmodium spp. were investigated using Bayesian methods implemented in MrBayes v3.2 with the default priors []. A General Time-Reversible model (GTR+I+Γ) was used for both p28 and p25 because, as estimated by MEGA6, it had the lowest likelihood value and the fewest parameters among the models that best fit the data. For both phylogenies (p28 and p25), two independent chains were sampled every 200 generations in runs lasting 6 × 10⁶ Markov Chain Monte Carlo steps; after convergence was reached, we discarded 50% of the sample as the ‘burn-in’ period. Convergence is considered reached when the potential scale reduction factor is between 1.00 and 1.02 and the average standard deviation of the posterior probability is below 0.01 [].
Additionally, the adaptive branch-site random effects likelihood (aBSREL) approach [], implemented in Datamonkey, was run to detect evidence of episodic positive selection on all branches of both phylogenies (p28 and p25). It allows for different Ka/Ks ratios among sites and branches. We performed a likelihood ratio test (LRT) comparing the null model (ω = 1) against the alternative, in which the branch was undergoing some form of selection (ω ≠ 1).
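The likelihood ratio test mentioned above compares the log-likelihoods of two nested models. As a rough sketch only, the statistic and an approximate p-value can be computed as below, assuming a plain chi-square distribution with one degree of freedom. Datamonkey's aBSREL may use a different reference distribution and applies its own multiple-testing correction across branches; the function name and log-likelihood values here are hypothetical.

```python
from math import erfc, sqrt

def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (LRT statistic, p-value), assuming a chi-square with 1 df.

    lnl_null: log-likelihood of the null model (omega = 1)
    lnl_alt:  log-likelihood of the alternative model (omega free)
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against small negative values from numerical noise
    # Survival function of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-998.0)
print(stat, p_value)
```

When many branches are tested, as in an all-branch aBSREL run, the per-branch p-values would additionally need correction for multiple comparisons.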
In addition, we used BUSTED, also implemented in Datamonkey, which identifies gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate []. In these analyses we selected the branches of both human malaria parasites, P. vivax and P. falciparum, because BUSTED requires a pre-specified subset of lineages. […]

Pipeline specifications

Software tools: MUSCLE, SeaView, DnaSP, MEGA, HyPhy, GARD, Datamonkey, MrBayes
Applications: Phylogenetics, Population genetic analysis, Nucleotide sequence alignment
Organisms: Plasmodium vivax, Homo sapiens, Toxoplasma gondii, Plasmodium cynomolgi
Diseases: Malaria