Computational protocol: Re-visiting the evolution, dispersal and epidemiology of Zika virus in Asia


Protocol publication

[…] Partial and complete ZIKV genome sequences were retrieved from NCBI GenBank (www.ncbi.nlm.nih.gov/genbank/), focusing on Asian-lineage isolates. Initially, all complete genomes available as of October 2017 were retrieved, and sequences collected from the Pacific islands and the Americas were subsampled to reduce the data load. Phylogenetically informative partial sequences from the Indian–Southeast Asian (SEA) region were then added to the dataset. These sequences were used to construct two curated open reading-frame datasets (sequence information is available in Supplementary Table ): (i) one of 89 sequences, comprising 5 of the African lineage and 84 of the Asian lineage; and (ii) one of the 84 Asian-lineage sequences only. The datasets were aligned with MAFFT v.7.266, keeping the reading frame consistent with amino-acid positions, and were visualized and edited in AliView. All subsequent analyses used a general time-reversible (GTR) nucleotide substitution model with four gamma-distributed rate categories and a proportion of invariant sites, as selected by jModelTest v.2.

To analyze the phylogeny of the 84 Asian-lineage strains, including the novel Indian sequence, a Bayesian phylogenetic tree was computed with MrBayes v.3.2.6 on the dataset that includes the 5 African-lineage strains as an outgroup. Two parallel runs of four Metropolis-coupled chains each were run for 5 million Markov chain Monte Carlo (MCMC) generations under the selected nucleotide substitution model with default flat Dirichlet priors, sampling every 1,000 generations; the first 25% of samples were discarded as burn-in before a consensus tree was computed.
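The text does not say how the reading frame was kept consistent during alignment. One common approach, assumed here, is to align the translated amino-acid sequences and then thread each original coding sequence back onto its aligned protein, codon by codon. The sketch below shows only that back-translation step; the function name `back_translate` is hypothetical, and each input is assumed to be a gap-free, in-frame ORF without a stop codon.

```python
def back_translate(aligned_protein: str, nucleotides: str) -> str:
    """Thread an ungapped coding sequence onto its aligned protein sequence.

    Each amino-acid column consumes one codon (3 nt) and each gap '-' becomes
    '---', so nucleotide columns stay in frame with amino-acid positions.
    """
    ungapped_len = sum(1 for aa in aligned_protein if aa != "-")
    if len(nucleotides) != 3 * ungapped_len:
        raise ValueError("nucleotide length must be 3x the ungapped protein length")

    # Split the ORF into consecutive codons.
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]

    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")  # protein gap -> codon-sized nucleotide gap
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)
```

For example, `back_translate("M-K", "ATGAAA")` gives `"ATG---AAA"`: the single alignment gap becomes one codon-length gap, so downstream codon positions stay aligned.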
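The MrBayes settings described above can be written as a NEXUS command block. The Python helper below is a minimal sketch that assembles such a block: GTR with invariant sites and four gamma categories maps to `nst=6 rates=invgamma ngammacat=4`, and priors are left at the MrBayes defaults. The outgroup name is a placeholder (MrBayes's `outgroup` command takes a single taxon), and `retained_samples` is a hypothetical helper for the approximate number of post-burn-in samples.

```python
def mrbayes_block(outgroup: str, ngen: int = 5_000_000, samplefreq: int = 1000,
                  burninfrac: float = 0.25) -> str:
    """Return a MrBayes NEXUS block mirroring the settings in the text.

    `outgroup` is a placeholder taxon name; priors stay at MrBayes defaults.
    """
    return "\n".join([
        "begin mrbayes;",
        f"  outgroup {outgroup};",
        "  lset nst=6 rates=invgamma ngammacat=4;",  # GTR + I + 4 gamma categories
        f"  mcmc ngen={ngen} nruns=2 nchains=4 samplefreq={samplefreq};",
        f"  sump burninfrac={burninfrac};",  # parameter summary after burn-in
        f"  sumt burninfrac={burninfrac};",  # consensus tree after burn-in
        "end;",
    ])


def retained_samples(ngen: int, samplefreq: int, burninfrac: float,
                     nruns: int = 2) -> int:
    """Approximate post-burn-in samples pooled over runs.

    Ignores the extra sample logged at the starting state.
    """
    per_run = ngen // samplefreq
    kept_per_run = per_run - int(per_run * burninfrac)
    return kept_per_run * nruns


print(mrbayes_block("African_outgroup_placeholder"))
```

With the settings in the text, each run logs about 5,000 samples, of which about 3,750 remain after the 25% burn-in, or roughly 7,500 across both runs.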
Because only partial sequences of the Indian isolate were available, its phylogenetic placement was also examined by constructing three separate trees based on the capsid (453 bp), envelope (773 bp), or NS2b/NS3 (1393 bp) sequence alignments.

To estimate evolutionary rates and the time to the most recent common ancestor (TMRCA) of the Asian lineage, BEAST v.1.8.3 was employed. First, the temporal structure of the data was assessed with TempEst v.1.5.1 (http://tree.bio.ed.ac.uk/software/tempest/) together with MM-type robust regression. Next, path-sampling and stepping-stone model testing was performed; based on Bayes factors, a strict molecular clock with a non-informative continuous-time Markov chain (CTMC) prior and a Bayesian skyline coalescent tree prior with a piecewise-constant demographic model best suited the dataset. Based on the resulting Bayesian phylogenetic tree, which included African ZIKV sequences as the outgroup, all Asian ZIKV lineages except the 1966 Malaysian sequence were defined as a monophyletic clade. The robustness of the dating was also evaluated by repeating the analysis without the Indian sequence. Using the models and parameters selected by the model test, two analyses were run in parallel for 100 million MCMC generations each, sampling every 10,000 generations. Convergence of the two runs was assessed with Tracer v.1.6 (http://tree.bio.ed.ac.uk/software/tracer/). Tree and log files were combined with LogCombiner (BEAST package), and after the first 10 million MCMC generations of each run were discarded, a maximum clade credibility (MCC) tree was computed with TreeAnnotator (BEAST package). The resulting tree was visualized and edited in FigTree v.1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/). All computations were run on the CIPRES computational cluster. […]
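TempEst assesses temporal signal by regressing each tip's root-to-tip distance against its sampling date: the slope approximates the substitution rate and the x-intercept gives a rough root date. The sketch below uses ordinary least squares for simplicity, not the MM-type robust regression named in the text. The function name is hypothetical and the toy dates and distances in the usage note are invented for illustration.

```python
from statistics import fmean


def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, tmrca, r_squared): the slope is the substitution rate
    (substitutions/site/year) and the x-intercept is a crude root-age
    (TMRCA) estimate. Assumes the dates vary and the slope is non-zero.
    """
    mx, my = fmean(dates), fmean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    syy = sum((y - my) ** 2 for y in distances)

    rate = sxy / sxx                  # slope of the regression line
    intercept = my - rate * mx        # distance predicted at date 0
    tmrca = -intercept / rate         # date at which predicted distance is 0
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, tmrca, r_squared
```

For example, `root_to_tip_regression([2000, 2010, 2020], [0.01, 0.02, 0.03])` returns a rate of about 0.001 substitutions/site/year, an x-intercept near 1990, and an R² of about 1. A robust (MM-type) fit would down-weight outlying tips rather than treat all residuals equally.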
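Path-sampling and stepping-stone analyses each yield a log marginal likelihood per candidate model, and models are compared through log Bayes factors, that is, differences in log marginal likelihood. The sketch below ranks models and labels 2 ln BF on the widely used Kass & Raftery (1995) scale. The function names, model names and likelihood values are hypothetical, and the source does not state which evidence threshold was applied.

```python
from __future__ import annotations


def two_ln_bayes_factor(ln_ml_a: float, ln_ml_b: float) -> float:
    """2 * ln(BF) of model A over model B from their log marginal likelihoods."""
    return 2.0 * (ln_ml_a - ln_ml_b)


def kass_raftery_label(two_ln_bf: float) -> str:
    """Verbal evidence category for 2 ln(BF) on the Kass & Raftery scale."""
    if two_ln_bf < 0:
        return "favours the second model"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


def rank_models(ln_ml: dict[str, float]) -> list[str]:
    """Model names sorted from highest to lowest log marginal likelihood."""
    return sorted(ln_ml, key=ln_ml.get, reverse=True)


# Hypothetical stepping-stone estimates for three candidate models.
ln_ml = {"strict_skyline": -25000.0, "ucld_skyline": -25004.0,
         "strict_constant": -25010.0}
best, runner_up = rank_models(ln_ml)[:2]
print(best, kass_raftery_label(two_ln_bayes_factor(ln_ml[best], ln_ml[runner_up])))
```

With these invented values the strict clock with a skyline prior ranks first, and 2 ln BF = 8 against the runner-up falls in the "strong" category.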
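Tracer judges convergence partly through the effective sample size (ESS) of each logged parameter, which discounts the raw number of samples by the chain's autocorrelation. The function below is a simplified estimator that adds up lag autocorrelations until the first non-positive one. It illustrates the idea and is not Tracer's exact algorithm; the name `effective_sample_size` is hypothetical.

```python
from statistics import fmean


def effective_sample_size(trace):
    """Simplified ESS: N / tau, with tau = 1 + 2 * sum of positive-lag autocorrelations.

    The sum stops at the first lag whose autocorrelation is <= 0.
    """
    n = len(trace)
    mu = fmean(trace)
    dev = [x - mu for x in trace]
    var = sum(d * d for d in dev) / n
    if var == 0:
        raise ValueError("trace has zero variance; ESS is undefined")

    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau
```

An alternating trace such as `[1.0, -1.0] * 50` has negative lag-1 autocorrelation, so its ESS equals its length (100), whereas a slowly drifting trace of the same length yields a much smaller ESS.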
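LogCombiner pools the post-burn-in samples of parallel runs, and TreeAnnotator's maximum clade credibility tree is the sampled tree whose clades have the highest product of posterior clade frequencies. The sketch below reproduces both steps on topologies only (no branch lengths or node heights), representing each tree as a set of clades. `combine_runs`, `mcc_tree` and the toy trees are hypothetical, and each run is assumed to be a list in which sample i was logged at state i × sample_every.

```python
import math
from collections import Counter

Clade = frozenset  # a clade is the set of tip labels below an internal node
Tree = frozenset   # a rooted topology is represented by its set of clades


def combine_runs(runs, burnin_states, sample_every):
    """Drop samples logged before `burnin_states` in each run, then pool them."""
    skip = burnin_states // sample_every
    pooled = []
    for run in runs:
        pooled.extend(run[skip:])
    return pooled


def mcc_tree(trees):
    """Return the sampled tree maximizing the summed log posterior clade frequency."""
    n = len(trees)
    # Posterior frequency of each clade across all pooled samples.
    freq = Counter(clade for tree in trees for clade in tree)

    def score(tree):
        return sum(math.log(freq[c] / n) for c in tree)

    return max(trees, key=score)


# Toy topologies on tips A-D: ((A,B),(C,D)) and (((A,B),C),D).
t1 = Tree({Clade("AB"), Clade("CD")})
t2 = Tree({Clade("AB"), Clade("ABC")})
pooled = combine_runs([[t2, t1, t1], [t1, t2]], burnin_states=10, sample_every=10)
best = mcc_tree(pooled)
```

With the settings in the text (a burn-in of 10 million states and sampling every 10,000), `combine_runs` would drop the first 1,000 samples of each run before pooling.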

Pipeline specifications

Software tools: MAFFT, AliView, jModelTest, MrBayes, BEAST, TempEst, FigTree
Application: Phylogenetics
Organisms: Zika virus, Human poliovirus 1 Mahoney