Computational protocol: Increasing airline travel may facilitate co-circulation of multiple dengue virus serotypes in Asia

Protocol publication

[…] Maximum phylogenetic information would be obtained by studying whole DENV genomes. However, the available DENV whole genomes from Asia have insufficient coverage through time and space for reliable analysis, and most available sequences comprise partial or complete E gene coding sequences. To generate a data set with both acceptable phylogenetic diversity and spatiotemporal sampling, we used DENV E gene sequences in subsequent analyses. DENV (DENV-1 to DENV-3) envelope (E) gene sequences with known collection dates and sampling locations in Asia were collected from GenBank. DENV-4 was not included in this study because too few samples were available (only 64 sequences from 11 countries). The remaining strains comprised a total of 2,202 sequences sampled between 1956 and 2015, from 20 distinct countries or geographic regions. Sequences were grouped by serotype and aligned separately using MAFFT. Recombination was inspected using the methods implemented in RDP3 and SimPlot. After removing duplicate and recombinant strains, the final data set contained 1,272 DENV-1 sequences, 628 DENV-2 sequences and 302 DENV-3 sequences. In the complete sequence data set, some countries, such as Vietnam, Cambodia, Thailand, and Singapore, were over-represented. To control for possible bias from uneven sampling, we randomly subsampled the complete sequence data sets by location and sampling time. At most 10 sequences were sampled per country and per year to create a more equitable spatiotemporal sampling distribution. After subsampling, the total number of sequences analyzed here was 327 for DENV-1, 357 for DENV-2, and 202 for DENV-3, sampled over a total of 59 years. Details of the sequences in each data set, including the year of isolation, sampling location, and accession number, are provided in the Supplementary Information. [...] 
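The subsampling rule described above (at most 10 sequences per country and per year) can be illustrated with a minimal sketch. It is not the authors' script: the function name `subsample_by_country_year`, the `(accession, country, year)` record layout, and the fixed random seed are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def subsample_by_country_year(records, max_per_group=10, seed=0):
    """Randomly keep at most `max_per_group` records per (country, year).

    `records` is an iterable of (accession, country, year) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the records by sampling location and sampling year.
    groups = defaultdict(list)
    for record in records:
        _accession, country, year = record
        groups[(country, year)].append(record)

    kept = []
    for key in sorted(groups):  # sorted for a deterministic output order
        members = groups[key]
        if len(members) > max_per_group:
            # Over-represented group: draw without replacement.
            members = rng.sample(members, max_per_group)
        kept.extend(members)
    return kept
```

Groups that already hold 10 or fewer sequences pass through unchanged, so sparsely sampled countries and years are not thinned further; only over-represented country-year combinations are reduced.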
For each subsampled serotype data set, we first estimated the correlation between root-to-tip genetic divergence and sequence sampling dates, using TempEst. This preliminary analysis indicated a good temporal signal for all serotypes. To reconstruct past population dynamics, we used a coalescent-based Gaussian Markov random field (GMRF) method with the time-aware smoothing parameter, as implemented in BEAST v1.8.2. A GTR+I+Γ nucleotide substitution model and an uncorrelated lognormal relaxed molecular clock model were used, with the prior distribution for the evolutionary rate parameter set to a Γ distribution with shape = 0.001 and scale = 1000. The BEAGLE library was used to accelerate computation. For each serotype, three independent analyses of 150 million generations were performed, sampling parameters and trees every 15,000 generations. Analyses were combined after the removal of a burn-in of 10–20% of the samples and were checked visually in Tracer v1.5. […]
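The temporal-signal check described above rests on an ordinary least-squares regression of root-to-tip divergence against sampling date. The sketch below shows only that regression step and is not TempEst itself. It assumes the root-to-tip distances have already been computed from a rooted tree, and the function name `root_to_tip_regression` is hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared, x_intercept). Under a clock-like
    signal the slope approximates the substitution rate and the x-intercept
    approximates the date of the root.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need two or more paired observations")

    mean_x = sum(dates) / n
    mean_y = sum(distances) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared Pearson correlation; defined as 0 when distances do not vary.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    # Date at which the fitted divergence is zero (undefined for zero slope).
    x_intercept = -intercept / slope if slope != 0 else None
    return slope, intercept, r_squared, x_intercept
```

A positive slope with a high R² is the usual informal indication that the data carry enough temporal signal for molecular clock analysis; the threshold for "good" is a judgment call that this sketch does not encode.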

Pipeline specifications

Software tools: MAFFT, SimPlot, TempEst, BEAST
Application: Phylogenetics