Computational protocol: Genetic characterization of the complete genome of a highly divergent simian T-lymphotropic virus (STLV) type 3 from a wild Cercopithecus mona monkey


Protocol publication

[…] Comparison of the full-length, gap-stripped PTLV-3 genomes was performed with the SimPlot program (version 3.5.1), with STLV-3d(Cmo8699AB) as the query sequence, using the F84 (ML) model and a transition/transversion ratio of 2.0 []. RNA secondary structure of the LTR region was predicted using the mfold web server [] found at . Splice acceptor (sa) and splice donor (sd) sites were predicted using the NetGene2 program available at the web server []. ORFs were identified and analyzed using the ORF Finder program available at . Percent nucleotide divergence was calculated using the DNASTAR MegAlign 7.2 software (). Two datasets were used for phylogenetic analysis. To investigate the phylogenetic relationships among PTLV, the first dataset included tax sequences from complete PTLV genomes available at GenBank and the new STLV-3 tax sequences from Cmo8699AB, Cni7867AB, Cni3034, Cni3038, and Lal9859 obtained in the current study. For further phylogenetic resolution of STLV-3d among PTLV, a larger dataset was used that included concatenated gag, pol, env, and tax sequences from complete PTLV genomes available at GenBank and the complete genome of STLV-3d(Cmo8699AB) determined here. Sequences were aligned using the Clustal W program, followed by manual editing and removal of indels. Nucleotide substitution saturation was assessed with pairwise transition and transversion versus divergence plots in the DAMBE program []. Unequal nucleotide composition was measured using the TREE-PUZZLE program []. Nucleotide substitution models and parameters were estimated from the edited Clustal W sequence alignments using Modeltest v3.7 [].
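The gap stripping, percent divergence, and transition/transversion counts that feed a saturation plot can be sketched in a few lines. This is a minimal illustration, not the SimPlot, MegAlign, or DAMBE implementation: it uses the uncorrected p-distance (not the F84 model used in SimPlot), skips any site where either sequence has a non-ACGT character, and the function names and toy alignment are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

BASES = set("ACGT")
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def strip_gap_columns(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every column in which any sequence carries a gap ('-')."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def pairwise_stats(a: str, b: str) -> tuple[float, int, int]:
    """Return (percent divergence, transitions, transversions) for two aligned sequences.

    Percent divergence is the uncorrected p-distance x 100, computed only over
    sites where both sequences carry an unambiguous base (A, C, G, T).
    """
    ts = tv = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in BASES or y not in BASES:
            continue  # skip ambiguity codes and any remaining gaps
        compared += 1
        if x == y:
            continue
        if frozenset((x, y)) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * (ts + tv) / compared, ts, tv


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be the edited Clustal W output.
    toy = {"s1": "ACGT-ACGTA", "s2": "ACGTTATGTA", "s3": "GCGTAACGTC"}
    stripped = strip_gap_columns(toy)
    for (n1, seq1), (n2, seq2) in combinations(stripped.items(), 2):
        pct, n_ts, n_tv = pairwise_stats(seq1, seq2)
        print(f"{n1} vs {n2}: {pct:.2f}% divergence, ts={n_ts}, tv={n_tv}")
```

Plotting the transition and transversion counts for every sequence pair against a model-corrected distance gives the kind of plot described above; one common reading is that transitions levelling off while distance keeps increasing points to saturation.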
A variant of the general time reversible (GTR) model with six substitution rate categories (rA ↔ C = 2.62, rA ↔ G = 13.07, rA ↔ T = 2.79, rC ↔ G = 2.26, rC ↔ T = 4.54, rG ↔ T = 1), gamma-distributed rate heterogeneity (α = 0.7071), and an estimated proportion of invariable sites (0.3436) was found to best fit the tax-only alignments. The best model for the concatenated gag-pol-env-tax alignment was GTR+G, with six substitution rate categories (rA ↔ C = 2.53, rA ↔ G = 11.47, rA ↔ T = 2.58, rC ↔ G = 2.15, rC ↔ T = 4.3, rG ↔ T = 1) and gamma-distributed rate heterogeneity (α = 0.366). Phylogenetic trees were inferred by Bayesian analysis implemented in the BEAST software package [] and by maximum likelihood (ML) using the PhyML program available online at the web server []. Support for the branching order of the ML-inferred trees was evaluated using 500 bootstrap replicates. Two independent BEAST runs were performed, consisting of 10 million and 100 million Markov chain Monte Carlo (MCMC) generations for the tax-only and PTLV concatemer alignments, respectively, sampling every 1,000 generations, with an uncorrelated log-normal relaxed molecular clock and a burn-in of 100,000 to 1 million generations. Both a constant-size coalescent and a Yule speciation process were used as tree priors to infer the viral tree topologies. Convergence of the MCMC was assessed by calculating the effective sample size (ESS) of the runs using the program Tracer (v1.4; ). All parameter estimates had ESS values greater than 300. The tree with the maximum product of the posterior clade probabilities (maximum clade credibility tree) was chosen from the posterior distribution of 9,001 sampled trees (after discarding the first 1,000 sampled trees as burn-in) with the program TreeAnnotator version 1.4.6, included in the BEAST software package [].
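To make the reported GTR exchangeabilities concrete, the sketch below assembles them into a normalized instantaneous rate matrix Q, with Q_ij = r_ij π_j for i ≠ j, diagonal entries chosen so each row sums to zero, and Q scaled so the expected substitution rate at equilibrium is one per site. The equilibrium base frequencies π are not reported in the text, so equal frequencies are used purely as a placeholder assumption; the gamma (α) and invariable-site components, which rescale rates across sites, are not modeled. This illustrates the standard GTR parameterization and is not code from Modeltest, PhyML, or BEAST.

```python
from __future__ import annotations

import numpy as np

BASES = "ACGT"

# Exchangeabilities reported above (relative to rG<->T = 1).
TAX_RATES = {"AC": 2.62, "AG": 13.07, "AT": 2.79, "CG": 2.26, "CT": 4.54, "GT": 1.0}
CONCAT_RATES = {"AC": 2.53, "AG": 11.47, "AT": 2.58, "CG": 2.15, "CT": 4.3, "GT": 1.0}


def gtr_rate_matrix(rates: dict[str, float], freqs) -> np.ndarray:
    """Build a GTR rate matrix Q (rows/cols in A, C, G, T order), scaled to mean rate 1."""
    freqs = np.asarray(freqs, dtype=float)
    q = np.zeros((4, 4))
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                key = x + y if x + y in rates else y + x  # exchangeabilities are symmetric
                q[i, j] = rates[key] * freqs[j]
    np.fill_diagonal(q, -q.sum(axis=1))  # each row now sums to zero
    mean_rate = -np.dot(freqs, np.diag(q))  # expected substitutions per unit time
    return q / mean_rate


if __name__ == "__main__":
    # ASSUMPTION: equal base frequencies; the study's estimated frequencies are not given.
    pi = np.full(4, 0.25)
    for label, rates in (("tax", TAX_RATES), ("gag-pol-env-tax", CONCAT_RATES)):
        print(label)
        print(np.round(gtr_rate_matrix(rates, pi), 3))
```

Under the placeholder frequencies, the A↔G and C↔T (transition) entries are the largest in their rows, mirroring the transition bias in both fitted rate sets.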
Trees were viewed and edited using FigTree v1.1.2. Divergence dates for the most recent common ancestor (MRCA) of STLV-3d(Cmo8699AB) were estimated from both the tax-only and the concatenated gag-pol-env-tax alignments by Bayesian inference with a relaxed molecular clock in the BEAST program. The PTLV evolutionary rate assumed a global molecular clock model and was estimated according to the formula: evolutionary rate (r) = branch length (bl)/divergence time (t) []. The calibration dates were obtained from well-established genetic and archaeological evidence for the timing of the migration of the ancestors of indigenous Melanesians and Australians from Southeast Asia [,,,]. The PTLV evolutionary rate was estimated using divergence times of 40,000-60,000 years ago (ya) for the Melanesian HTLV-1 lineage (HTLV-1mel) and 15,000-30,000 ya for the most recent common ancestor of the HTLV-2a/HTLV-2b Native American strains as strong priors in a Bayesian MCMC relaxed molecular clock method implemented in the BEAST software package []. The use of two calibration points has previously been shown to provide more reliable estimates of PTLV substitution rates than a single calibration date [,]. The upper and lower divergence times estimated from anthropological data defined the bounds of a strong uniform prior distribution, from which the MCMC sampler drew candidate divergence times for the corresponding node in the tree. […]
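The rate formula r = bl/t and the uniform calibration prior can be illustrated with a small Monte Carlo sketch: draw a calibration age uniformly between the lower and upper bounds, convert it to a rate, and use that rate to date another node via t = bl/r. This assumes a single calibration and a strict clock (one rate for the whole tree), whereas the analysis above combined two calibrations in BEAST under a relaxed clock; the function names and branch lengths below are hypothetical placeholders, not values from the study.

```python
import random


def sample_dates(bl_calib: float, t_min: float, t_max: float,
                 bl_target: float, n: int = 10_000, seed: int = 1):
    """Propagate a uniform calibration prior through r = bl / t.

    bl_calib:     branch length (substitutions/site) of the calibrated node
    t_min, t_max: uniform prior bounds for its age, in years
    bl_target:    branch length of the node to be dated
    Returns (rates, dates) as lists of length n.
    """
    rng = random.Random(seed)
    rates, dates = [], []
    for _ in range(n):
        t = rng.uniform(t_min, t_max)  # sample a calibration age from the uniform prior
        r = bl_calib / t               # r = bl / t, in substitutions/site/year
        rates.append(r)
        dates.append(bl_target / r)    # t = bl / r for the target node
    return rates, dates


def central_interval(values, mass: float = 0.95):
    """Equal-tailed interval containing `mass` of the sampled values."""
    s = sorted(values)
    k = int(len(s) * (1.0 - mass) / 2.0)
    return s[k], s[len(s) - 1 - k]


if __name__ == "__main__":
    # HYPOTHETICAL branch lengths; 40,000-60,000 ya is the HTLV-1mel bound quoted above.
    rates, dates = sample_dates(bl_calib=0.05, t_min=40_000, t_max=60_000, bl_target=0.2)
    print("rate 95% interval:", central_interval(rates))
    print("target age 95% interval (ya):", central_interval(dates))
```

Under these simplifications the target age is just the calibration age multiplied by bl_target/bl_calib, so its interval mirrors the prior; in the actual BEAST analysis, the two calibration priors, the relaxed clock, and the sequence data jointly shape the posterior.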

Pipeline specifications

Software tools: SimPlot, Mfold, NetGene2, Open Reading Frame Finder, Clustal W, DAMBE, TREE-PUZZLE, Modeltest, BEAST, PhyML, FigTree
Applications: Phylogenetics, WGS analysis
Organisms: Human T-lymphotropic virus 4, Simian T-lymphotropic virus 3, Human T-cell leukemia virus type I, Simian T-lymphotropic virus 1, Homo sapiens