Computational protocol: Phylogeography of Japanese Encephalitis Virus: Genotype Is Associated with Climate

Protocol publication

[…] All available sequences for the E gene of JEV isolates were retrieved from GenBank in July 2011. The initial JEV E gene dataset was pruned of sequences representing non-wild-type virus isolates, duplicate isolates, and isolates lacking information on the date and country of collection. The pruned dataset consisted of 489 sequences. The E gene sequences of two JEV isolates (M859/Cambodia/1967/Mosquito and KE-93-83), obtained from the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA) at the University of Texas Medical Branch (UTMB), were determined for this study using previously described methods.

Recombination can invalidate the results of coalescent analyses. Therefore, the nucleotide sequence alignment was analyzed for potential recombination events using the RDP, GENECONV, Chimaera, MaxChi and Bootscan methods implemented in RDP3 v Beta 41. Common program settings were to treat sequences as linear, require phylogenetic evidence, refine breakpoints and check alignment consistency; all method-specific settings remained at their default values. The highest acceptable p-value was set at 0.05 after Bonferroni correction for multiple comparisons. Potential recombination events were those identified by at least two methods. The breakpoint positions and recombinant sequence inferred for each potential recombination event were manually confirmed using the phylogenetic and recombination signal analysis features in RDP3. The K82P01 and K91P55 sequences were confirmed as recombinants.
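The consensus rule described above (a Bonferroni-corrected cutoff of 0.05 and support from at least two detection methods) can be sketched as a simple filter. This is only an illustration, not RDP3's implementation: the event names, the p-values and the number of comparisons are invented placeholders, and dividing α by the number of comparisons is an assumption about how the correction is applied.

```python
ALPHA = 0.05  # highest acceptable p-value stated in the protocol


def supported_events(events, alpha=ALPHA, n_comparisons=1, min_methods=2):
    """Keep events whose corrected p-value passes in at least `min_methods` methods.

    `events` maps an event name to {method name: uncorrected p-value}.
    `n_comparisons` is a hypothetical count used for the Bonferroni correction.
    """
    threshold = alpha / n_comparisons  # Bonferroni-corrected per-test cutoff
    kept = []
    for name, pvalues in events.items():
        methods = sorted(m for m, p in pvalues.items() if p <= threshold)
        if len(methods) >= min_methods:
            kept.append((name, methods))
    return kept


# Made-up candidate events, NOT results from the study.
example_events = {
    "event_A": {"RDP": 1e-6, "GENECONV": 2e-5, "MaxChi": 0.2},
    "event_B": {"Chimaera": 1e-4},
    "event_C": {"RDP": 0.01, "Bootscan": 0.02},
}

for name, methods in supported_events(example_events, n_comparisons=100):
    print(name, "supported by", ", ".join(methods))
```

Events that pass this filter would still need the manual confirmation of breakpoints described in the text.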
Isolates K82P01 and K91P55 were not available from the WRCEVA at UTMB for re-sequencing; therefore, the two corresponding sequences were removed from the dataset, leaving a final dataset of 487 sequences.

To make an initial identification of the genotype of each JEV E gene sequence, neighbor-joining (NJ) and maximum-likelihood (ML) phylogenies were generated using SeaView v 4.2.12 and PhyML v 3.0 (on the South of France bioinformatics platform), respectively.

The final dataset of 487 JEV E gene sequences included information on the year, host and country of collection of the corresponding virus isolates. Sequences derived from isolates collected north of the Tropic of Cancer (23.5°N) were classified as temperate, while those derived from isolates collected south of the Tropic of Cancer were classified as tropical. The climate corresponding to five Taiwanese sequences could not be ascertained, and these sequences were therefore excluded from the climate phylogeographic analysis described below.

[...] To estimate the date and location of the most recent common ancestor (MRCA) of the five genotypes and the overall rate of molecular evolution, time-scaled Bayesian phylogenies (country and climate) were inferred from the JEV E gene sequence dataset using a Bayesian Markov chain Monte Carlo (MCMC) method implemented in BEAST v 1.6.1.

An SRD06 nucleotide substitution model, a relaxed uncorrelated exponential molecular clock and a piecewise-constant Bayesian skyline demographic model with 20 coalescent-interval groups were used in all analyses. The relaxed uncorrelated exponential clock was found to best fit the data when Bayes factors (BF) were calculated in Tracer v 1.5.1 to compare the relative fit of strict and relaxed molecular clock models; each BF was expressed as the natural logarithm of the ratio of the marginal likelihoods of the competing models.
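As a rough illustration of the model comparison just described, the log Bayes factor is the difference between the log marginal likelihoods of two models. The sketch below estimates each marginal likelihood with the plain harmonic mean of the sampled likelihoods. It is not Tracer's code, and the log-likelihood samples are invented placeholders rather than values from the study.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_likelihood_hme(log_likelihoods):
    """Harmonic mean estimate: 1/ML is approximated by the mean of 1/L_i."""
    n = len(log_likelihoods)
    # log(mean(1/L)) = logsumexp(-logL) - log(n); negate to get log(ML)
    return math.log(n) - log_sum_exp([-ll for ll in log_likelihoods])


def log_bayes_factor(log_ml_a, log_ml_b):
    """ln BF = ln(ML_a / ML_b); positive values favour model a."""
    return log_ml_a - log_ml_b


# Hypothetical posterior log-likelihood samples for two clock models.
strict_clock = [-5012.3, -5010.8, -5011.5]
relaxed_clock = [-5001.2, -5003.4, -5002.0]

ln_bf = log_bayes_factor(
    log_marginal_likelihood_hme(relaxed_clock),
    log_marginal_likelihood_hme(strict_clock),
)
print("ln BF (relaxed vs strict):", round(ln_bf, 2))
```

Because the estimate is dominated by the lowest sampled likelihoods, it can be unstable, which is one reason to treat rankings based on it with caution.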
The good fit of this relaxed clock model to the data has recently been shown to be an artifact of the harmonic mean estimator. However, preliminary analyses showed that the choice of relaxed molecular clock model had little effect on the results.

To infer the probable geographic origin of the MRCA of the JEV genotypes, the BEAST input files (country and climate) created in BEAUti v 1.6.1 were edited to include the Bayesian stochastic search variable selection procedure.

The Bioportal at the University of Oslo was used to execute the MCMC analyses for a total of 600 million generations. This total was reached by using LogCombiner v 1.6.1 to combine 12 independent runs of 50 million generations each (sampled every 1,000th state) in order to attain convergence, which was assessed by examining the trace and effective sample size statistics for each model parameter in Tracer v 1.5. TreeAnnotator v 1.6.1 was used to summarize the posterior tree distribution and annotate both country and climate maximum clade credibility (MCC) phylogenies, which were viewed in FigTree v 1.3.1. Each node of the Bayesian MCC phylogenies was annotated with its posterior probability (PP) value, the estimated median date of the MRCA with the corresponding 95% highest posterior density (HPD) interval, and state PP values for each plausible geographic location of origin (country and climate). In addition, BOA v 1.15 implemented in R v 2.15.1 was used to calculate a 50% HPD interval for the date of the root of the phylogeny.

Maps showing the distributions of sequences according to sampling location (country and climate) were created using GIMP v 2.6.12 from a blank map of Asia.

[...] To test the null hypothesis of no association between genotype and climate, a Fisher's exact test was performed at α = 0.05 (IBM SPSS Statistics v 20). Post-hoc analyses were then performed to determine which cell(s) in the table of genotype versus climate contributed most to the statistically significant result of the Fisher's exact test.
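A cell-wise post-hoc analysis of a contingency table is commonly based on adjusted standardized residuals, which behave approximately like z-scores under the null hypothesis. The following is a minimal sketch of that calculation, not the SPSS procedure. The counts are invented placeholders rather than the study's data, and multiplying each p-value by the number of cells is an assumption about how a Bonferroni correction might be applied.

```python
import math


def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: rows are genotypes, columns are (temperate, tropical).
genotypes = ["GI-a", "GI-b", "GII", "GIII"]
counts = [[5, 40], [120, 30], [2, 25], [200, 60]]

residuals = adjusted_residuals(counts)
n_cells = len(counts) * len(counts[0])
for genotype, row in zip(genotypes, residuals):
    for climate, z in zip(("temperate", "tropical"), row):
        p_adjusted = min(1.0, two_sided_p(z) * n_cells)  # assumed Bonferroni step
        print(f"{genotype:5s} {climate:9s} z = {z:7.2f}  adjusted p = {p_adjusted:.3g}")
```

Cells with a large absolute residual are the ones that depart most from the counts expected under independence.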
Adjusted standardized residuals (z-scores) were calculated and the Bonferroni method was used to correct for multiple comparisons. The adjusted standardized residuals were then compared against the critical z-value (±1.96) for α = 0.05 (IBM SPSS Statistics v 20). Only GI-a, GI-b, GII and GIII were considered in these analyses, as the dataset included only three sequences each for GIV and GV.

The null hypothesis of no phylogeny-trait association was further evaluated at α = 0.05 using the association index (AI), parsimony score (PS), unique fraction (UniFrac), nearest taxa (NT), net relatedness (NR), phylogenetic diversity (PD) and maximum exclusive single-state clade size (MC) statistics, calculated in Befi-BaTS v 0.1.1 from the posterior set of trees generated by BEAST.

[...] Nonsynonymous substitutions involved in the phylogenetic divergence of the five genotypes of JEV were identified within the E protein alignment. The E gene alignment was evaluated for statistically significant evidence of positive selection (ratio of nonsynonymous to synonymous nucleotide substitutions [dN/dS] > 1; p < 0.05) using the single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL) and internal FEL (IFEL) methods available on the Datamonkey webserver. All analyses of positive selection used an NJ phylogeny and the reversible nucleotide substitution model. Evidence of directional selection within the E protein alignment was evaluated using the directional evolution of protein sequences (DEPS) method implemented in HyPhy v 2.0. The DEPS method used a Bayesian phylogeny and the Jones, Taylor, Thornton (JTT) amino acid substitution model to test for statistically significant shifts in amino acid residue frequencies (p < 0.05) and/or a statistically significant large number of substitutions toward a particular residue (BF > 100). […]
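Of the phylogeny-trait association statistics listed above, the parsimony score is the simplest to illustrate. The sketch below assumes that PS is the minimum number of trait-state changes on a tree, counted with Fitch parsimony. It is not Befi-BaTS code: the trees are toy examples, and neither the averaging over posterior trees nor the null distribution from randomized tip traits is reproduced.

```python
def fitch(node):
    """Return (possible state set, number of changes) for a nested-tuple binary tree.

    Tips are trait strings such as "temperate"; internal nodes are 2-tuples.
    """
    if isinstance(node, str):
        return {node}, 0
    left_set, left_changes = fitch(node[0])
    right_set, right_changes = fitch(node[1])
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed at this node.
        return common, left_changes + right_changes
    # The children disagree: keep both options and count one state change.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree):
    """Minimum number of trait-state changes needed to explain the tip traits."""
    return fitch(tree)[1]


# Toy trees (hypothetical): traits clustered by clade versus interleaved.
clustered = ((("temperate", "temperate"), "temperate"),
             (("tropical", "tropical"), "tropical"))
interleaved = ((("temperate", "tropical"), "temperate"),
               (("tropical", "temperate"), "tropical"))

print("clustered PS:", parsimony_score(clustered))
print("interleaved PS:", parsimony_score(interleaved))
```

A lower score indicates that the trait is more strongly structured by the phylogeny, which is the pattern the association tests look for.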

Pipeline specifications

Software tools: SeaView, PhyML, BEAST, FigTree, SPSS, Datamonkey, HyPhy
Applications: Miscellaneous, Phylogenetics, Population genetic analysis
Organisms: Japanese encephalitis virus, Sus scrofa
Diseases: Encephalitis; Encephalitis, Arbovirus