Computational protocol: Tobamoviruses have probably co-diverged with their eudicotyledonous hosts for at least 110 million years

Protocol publication

[…] Sequences were edited using BioEdit and MS Word. Gene sequences were sorted, checked, and grouped, and duplicates were removed, using the neighbour-joining (NJ) facility in ClustalX; they were then aligned using MAFFT with its ‘iterative refinement using local pairwise alignment’ option. Wasabi mosaic virus sequences were so close to the youcai mosaic virus population that they were treated as part of it. The Accession Codes of the 185 sequences in the resulting alignment are given in Supplementary Data Table 1; a few tobamoviruses are represented by single sequences, and the 67 sequences of TMV are the most for a single species. The genomic sequences were separated into their constituent ORFs (i.e. the separate genes for the replicase, movement protein, and coat protein (CP), including any overlapping regions of each gene). These were aligned, using the encoded amino acids as a guide, by the TranslatorX server (http://translatorx.co.uk) with its MAFFT option. The ORFs were analysed both separately and in combinations, but mostly as the complete concatenated ORFs (concats) of each genome. The concats were tested for the presence of phylogenetic anomalies using the full suite of options in RDP4. Sites in the aligned protein sequences with more than five gaps were removed using POSORT, a DOS command line program available at http://192.55.98.146/_resources/e-texts/README-POSORT.pdf. Consensus protein sequences were derived for each of the species using the ‘Create Consensus Sequence’ function in BioEdit. This function examines each site in the species alignments and determines whether any amino acid at that site is present at or above the selected frequency and, if one is, records that amino acid in the consensus. When more than one type of amino acid reaches the selected frequency, one of those possibilities is chosen at random and recorded in the consensus sequence; if none reaches the selected frequency, a gap is recorded. [...]
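The gap filtering and consensus rules described above can be sketched in a few lines of Python. This is a minimal illustration only, not the POSORT or BioEdit implementation: the function names `drop_gappy_columns` and `consensus`, the `-` gap character, and the choice to count gaps in the frequency denominator are all assumptions made for the example.

```python
import random
from collections import Counter

GAP = "-"  # assumed gap character


def drop_gappy_columns(seqs, max_gaps=5):
    """Remove alignment columns that contain more than max_gaps gaps.

    Hypothetical stand-in for the POSORT step; seqs must be equal-length strings.
    """
    keep = [i for i in range(len(seqs[0]))
            if sum(s[i] == GAP for s in seqs) <= max_gaps]
    return ["".join(s[i] for i in keep) for s in seqs]


def consensus(seqs, threshold=1.0, rng=None):
    """Threshold consensus following the rule described in the text.

    A residue is recorded if its frequency at a site is at or above
    `threshold`; ties are broken at random; otherwise a gap is recorded.
    Assumption: gaps count towards the column total.
    """
    rng = rng or random.Random(0)  # seeded so the sketch is reproducible
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        winners = sorted(res for res, n in counts.items()
                         if res != GAP and n / len(column) >= threshold)
        out.append(rng.choice(winners) if winners else GAP)
    return "".join(out)


# Toy alignment (invented data, for illustration only)
toy = ["MKV-", "MKI-", "MRV-"]
strict = consensus(toy, threshold=1.0)    # only invariant sites survive
relaxed = consensus(toy, threshold=0.6)   # majority residues are kept
```

With a threshold of 1.0 only sites that are identical in every sequence contribute a residue, which corresponds to the 100% consensus sequences used later for the BEAST analysis.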
Models for ML analysis were chosen using TOPALi and the ProtTest server at http://darwin.uvigo.es; the best-fit models were found to be GTR+Γ4+I for nucleotide sequences and LG+Γ4+I for amino acid sequences. Phylogenetic trees were inferred by maximum likelihood (ML) using PhyML 3.0, and the support for their topologies was assessed using the log-likelihood of the trees and the SH-support for their branches. For comparison, the relationships of the 29 tobamoviruses were also calculated from their 100% consensus concatenated amino acid sequences using BEAST v1.8.2 and associated programs; the resulting Maximum Clade Credibility (MCC) tree was from a 750,000 cycle analysis using LG+Γ4+I, a log normal relaxed clock, and constant population size, and the summary tree came from 939 trees obtained from a stable trace after removal of a 250,000 cycle burn-in. The ML and MCC trees were compared by the PATRISTIC method. Trees were drawn using FigTree Version 1.3 (http://tree.bio.ed.ac.uk/software/figtree/), and pairs of trees were compared using PATRISTIC. […]
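Comparing two trees through their patristic distances (the summed branch lengths along the path joining two tips) can be illustrated with the sketch below. It is not the PATRISTIC program itself: the child-to-parent dictionary format, the function names, the use of a Pearson correlation as the summary, and the toy trees are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def path_to_root(tree, node):
    """Map node and each of its ancestors to their distance from node.

    tree maps child -> (parent, branch_length); the root has no entry.
    """
    dist, path = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        path[parent] = dist
        node = parent
    return path


def patristic(tree, a, b):
    """Sum of branch lengths on the path between tips a and b."""
    pa, pb = path_to_root(tree, a), path_to_root(tree, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimises the combined distance.
    return min(pa[n] + pb[n] for n in pa if n in pb)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def compare_trees(tree1, tree2, tips):
    """Correlate the pairwise patristic distances of two trees with the same tips."""
    pairs = list(combinations(sorted(tips), 2))
    d1 = [patristic(tree1, a, b) for a, b in pairs]
    d2 = [patristic(tree2, a, b) for a, b in pairs]
    return pearson(d1, d2)


# Toy trees with invented branch lengths; the second is the first rescaled.
tree_ml = {"A": ("n1", 0.1), "B": ("n1", 0.2), "n1": ("root", 0.3), "C": ("root", 0.5)}
tree_mcc = {k: (p, 2 * length) for k, (p, length) in tree_ml.items()}
r = compare_trees(tree_ml, tree_mcc, ["A", "B", "C"])
```

A correlation close to 1 indicates that the two trees place the tips at proportionally similar distances even when their branch lengths are on different scales, as is the case for an ML tree and a clock-based MCC tree.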

Pipeline specifications

Software tools: BioEdit, Clustal W, MAFFT, TranslatorX, TOPALi, ProtTest, PhyML, BEAST, PATRISTIC, FigTree
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Viruses, Human poliovirus 1 Mahoney