Computational protocol: First Dating of a Recombination Event in Mammalian Tick-Borne Flaviviruses

Protocol publication

[…] Alignments were generated from GenBank sequences retrieved in January 2011, aligned using MUSCLE, then rechecked and improved manually in the UTR regions. Sequences were numbered from the start of the ORFs using Neudoerfl (U27495) as reference. Details on the included sequences are provided in .
ALN1 contains 41 complete nucleotide sequences of Tick-borne encephalitis virus and three out-groups selected among LGTV and OHFV. This initial alignment was scanned for recombination events and then downsampled to an alignment (ALN2) of 28 complete sequences with known collection dates (from 1937 to 2008), after deleting the out-groups and strains with unusual sampling locations. UTRs and gap columns were deleted. ALN2 was further partitioned by individual genes, resulting in alignments ALN2_C, ALN2_PrM, ALN2_E, ALN2_NS1, ALN2_NS2A, ALN2_NS2B, ALN2_NS3, ALN2_NS4A, ALN2_NS4B and ALN2_NS5. Next, we produced ALN3 from ALN2 by deleting the E gene and the region of NS3 identified as a possible recombinant fragment. Finally, E_161 was compiled from the 161 longest E sequences available in GenBank (1033 to 1491 nt in length) with known sampling dates (from 1931 to 2008). [...]
An analysis of the entire species (ALN1) was conducted with split networks using the neighbor-net method. Evolutionary distances were estimated using maximum likelihood (ML) with GTR+Γ4+I as the best-fit substitution model, as determined by MODELTEST v.3.7 according to the Akaike Information Criterion.
Several methods in the RDP3beta36 package were used to extract recombination signal from ALN1, because inspection of the split network had established the possibility of recombination within the species (see results). All analyses were carried out with Bonferroni correction (P-value < 0.05), and signals reported by more than one method were retained. RDP, GENECONV, BootScan, MaxChi, Chimaera, and SiScan were used to screen the alignment.
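The retention rule stated above (Bonferroni-corrected significance, and support from more than one method) can be illustrated with a minimal sketch. This is not RDP3's internal procedure: the function names, the data layout, the example p-values and the number of tests are all hypothetical, chosen only to show the logic.

```python
# Hypothetical sketch: names, data layout and p-values are illustrative only.
# This is NOT output from RDP3, nor a description of how RDP3 works internally.

ALPHA = 0.05      # significance threshold stated in the protocol
MIN_METHODS = 2   # "signals reported by more than one method were retained"


def bonferroni_significant(p_value, n_tests, alpha=ALPHA):
    """Return True if an uncorrected p-value passes the Bonferroni threshold."""
    return p_value < alpha / n_tests


def retained_events(events, n_tests, min_methods=MIN_METHODS):
    """Keep events supported by at least `min_methods` detection methods.

    `events` maps an event label to {method name: uncorrected p-value}.
    Returns {event label: sorted list of supporting methods}.
    """
    kept = {}
    for label, pvals in events.items():
        supporting = sorted(
            method
            for method, p in pvals.items()
            if bonferroni_significant(p, n_tests)
        )
        if len(supporting) >= min_methods:
            kept[label] = supporting
    return kept


# Made-up candidate events and p-values (assumption, for illustration).
example = {
    "event_A": {"RDP": 1e-8, "GENECONV": 2e-6, "MaxChi": 0.2},
    "event_B": {"BootScan": 1e-7, "SiScan": 0.01},
}
result = retained_events(example, n_tests=100)
print(result)
```

In this toy input, only the event supported by two methods after correction is kept; the number of tests (100) is an arbitrary assumption.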
For this initial phase, the following settings were modified to balance sensitivity and statistical significance: RDP: window size 25, detect recombination between sequences sharing 90% to 100% identity; GENECONV: G-scale 5; BootScan: window size 100, NJ trees, 200 bootstrap replicates, cutoff percentage of 95%, and the Jin and Nei 1990 model; Chimaera: 40 variable sites per window; SiScan: window size 80, slow exhaustive scan. As all methods detected significant recombinant signals in the NS3 gene, the dataset was further evaluated for phylogenetic evidence of recombination based on an alignment of NS3 sequences derived from ALN1. [...] For the phylogenetic analysis, the NS3 partitions 5′ and 3′ of the putative recombinant fragment were concatenated. Trees were inferred separately for the recombinant region alone and for the concatenated region.
Maximum likelihood analyses were performed with RAxML VI-HPC v.2.2. via the RAxML Web server. The proportion of invariable sites and the number of bootstrap runs were determined automatically.
Bayesian phylogenetic trees were constructed with a GTR+I+G nucleotide substitution model for the concatenated alignment of NS3 and a GTR+G model for the recombinant partition. Model selection was based on the corrected Akaike information criterion in MrAic. For each alignment, two separate analyses were run simultaneously with MrBayes v.3.2-cvs (source code accessed with CVS 22 January 2009) for 5,000,000 generations using the default settings for priors and MCMC proposals. Trees were sampled every 1000th generation, and the standard deviation of split frequencies was below 0.01 at the end of each analysis. For all Bayesian analyses (i.e. MrBayes and BEAST), mixing of the MCMC chains and the effective sample size (ESS) of each parameter estimate were investigated using Tracer v.1.5, which showed convergence and ESS values larger than 200 for each summary statistic.
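To make the ESS criterion concrete, the sketch below estimates an effective sample size from a single parameter trace using a common textbook estimator: N divided by one plus twice the sum of the leading positive autocorrelations. This is an assumption made for illustration and is not claimed to be the estimator implemented in Tracer; the function name and the simulated traces are hypothetical.

```python
import random

MIN_ESS = 200  # threshold reported in the protocol


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of leading positive autocorrelations).

    Generic estimator used for illustration; not necessarily Tracer's.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace has zero variance")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Simulated traces (assumption): independent draws vs. a strongly
# autocorrelated AR(1) series standing in for a poorly mixing chain.
rng = random.Random(0)
iid_trace = [rng.gauss(0, 1) for _ in range(2000)]
ar_trace = [0.0]
for _ in range(1999):
    ar_trace.append(0.9 * ar_trace[-1] + rng.gauss(0, 1))

for name, trace in (("iid", iid_trace), ("AR(1)", ar_trace)):
    ess = effective_sample_size(trace)
    print(name, round(ess), "OK" if ess > MIN_ESS else "below threshold")
```

The autocorrelated trace yields a much smaller ESS than the independent one for the same number of samples, which is why ESS rather than raw sample count is checked.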
For both MrBayes analyses, the first 2500 trees were discarded as burn-in and the 7500 remaining trees were summarized in a majority-rule consensus tree.
For each of the two partitions, we tested alternative topological placements for the putative recombinant strain. Constraining the topology in ML analyses yielded likelihoods for alternative placements that were compared with the likelihood of the best ML tree using the approximately unbiased (AU) test in CONSEL. For this step, ML analyses were performed with PAUP* v.4.0b10, and best trees were sought by heuristic searches (10 random addition replicates, TBR branch swapping, MulTrees in effect).
Throughout the study, node support was estimated by nonparametric bootstrap (BS, bootstrap support) in ML and with multiple samples from the posterior distribution (PP, posterior probability) in BI. […]
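As an illustration of the nonparametric bootstrap mentioned above, the following sketch builds one pseudo-replicate by resampling alignment columns with replacement. It covers only the resampling step; tree inference on each replicate and the counting of clades across replicates are left to the phylogenetic software. The function name and the toy alignment are hypothetical and unrelated to the study's data.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate).

    `alignment` maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # Draw `length` column indices with replacement; the same indices are
    # applied to every sequence so that columns stay intact.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


# Toy alignment (assumption): three made-up sequences of eight sites.
toy = {
    "seq1": "ACGTACGT",
    "seq2": "ACGTTCGT",
    "seq3": "ACCTACGA",
}
replicate = bootstrap_replicate(toy, random.Random(1))
for name, seq in replicate.items():
    print(name, seq)
```

Bootstrap support for a node is then the fraction of replicate trees in which the corresponding clade is recovered.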

Pipeline specifications

Software tools: MUSCLE; ModelTest-NG; RAxML; MrBayes; BEAST; CONSEL
Applications: Phylogenetics; Nucleotide sequence alignment
Organisms: Homo sapiens; Louping ill virus
Diseases: Animal Diseases; Encephalitis; Encephalitis, Tick-Borne; Hemorrhagic Fever, American; Tick-Borne Diseases