Computational protocol: Reconstruction of the Evolutionary Dynamics of A(H3N2) Influenza Viruses Circulating in Italy from 2004 to 2012

Protocol publication

[…] The clock-like signal of the dataset was investigated using Path-O-Gen software (freely available), which analyses the correlation between sampling time and the root-to-tip distances of a tree constructed without assuming a molecular clock. [...] The sequences were aligned using CLUSTALW (integrated within the BioEdit sequence editor by Tom Hall, 2001). The best-fitting nucleotide substitution model was estimated using jModelTest, which selected a GTR model with gamma-distributed rates among sites.
The phylogenetic tree, model parameters, evolutionary rates and population growth were co-estimated using a Bayesian Markov chain Monte Carlo (MCMC) method implemented in the BEAST package v.1.74. A strict clock and an uncorrelated log-normal relaxed clock model were both implemented under a GTR + G substitution model. The best-fitting model was selected using the Bayes factor (BF, calculated from marginal likelihoods) implemented in BEAST. Only values of 2lnBF ≥ 6 were considered significant. A less restrictive Bayesian skyline plot (BSP, a non-parametric piecewise-constant model) was used as the coalescent prior.
Two independent MCMC chains were run for 30 million generations (with sampling every 3,000th generation), and were combined using LogCombiner 1.74, which is included in the BEAST package. Convergence was assessed on the basis of the effective sample size (ESS) after a 10% burn-in using Tracer software version 1.5. Only ESS values of ≥200 were accepted. Uncertainty in the estimates was indicated by 95% highest posterior density (95% HPD) intervals. The obtained trees were summarised using the TreeAnnotator program included in the BEAST package, and the tree with the maximum product of posterior probabilities (maximum clade credibility, MCC) after a 10% burn-in was displayed using FigTree version 1.3.1. [...]
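The root-to-tip analysis described for Path-O-Gen can be illustrated with a minimal sketch: an ordinary least-squares regression of root-to-tip distance on sampling date, where the slope approximates the evolutionary rate and the x-intercept approximates the date of the root. This is not the Path-O-Gen implementation; the function name `root_to_tip_regression` and the example dates and distances are hypothetical and chosen only to show the calculation.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance (subs/site) on sampling date (years).

    Returns the slope (rate estimate), the intercept, R^2 (strength of the
    clock-like signal) and the x-intercept (estimated date of the root).
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three (date, distance) pairs of equal length")

    mean_x, mean_y = mean(dates), mean(distances)
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "rate": slope,
        "intercept": intercept,
        "r_squared": sxy ** 2 / (sxx * syy),
        # Date at which the fitted line crosses zero distance.
        "root_date": -intercept / slope if slope != 0 else None,
    }


# Hypothetical tips: decimal sampling years and root-to-tip distances.
example_dates = [2004.0, 2006.0, 2008.0, 2010.0, 2012.0]
example_distances = [0.016, 0.024, 0.032, 0.040, 0.048]
fit = root_to_tip_regression(example_dates, example_distances)
print(fit)
```

A high R^2 with a positive slope suggests that the dataset carries enough temporal signal to justify a dated (clock-based) analysis.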
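The model-selection rule above (2lnBF ≥ 6) can be written out as a short sketch. It assumes that the log marginal likelihoods of the two clock models have already been obtained from BEAST; the function names, the "undecided" outcome for non-significant comparisons, and the example values are assumptions made for illustration and do not come from the protocol.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2lnBF for model A over model B from their log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def select_clock_model(log_ml_strict, log_ml_relaxed, threshold=6.0):
    """Pick the clock model supported at 2lnBF >= threshold.

    Returning "undecided" when neither model reaches the threshold is an
    assumption of this sketch, not a rule stated in the protocol.
    """
    statistic = two_ln_bayes_factor(log_ml_relaxed, log_ml_strict)
    if statistic >= threshold:
        return "relaxed"
    if statistic <= -threshold:
        return "strict"
    return "undecided"


# Hypothetical log marginal likelihoods, for illustration only.
example_strict = -5230.4
example_relaxed = -5221.9
print(two_ln_bayes_factor(example_relaxed, example_strict))
print(select_clock_model(example_strict, example_relaxed))
```

Because the statistic is a difference of log marginal likelihoods, its sign indicates which model is favoured and its magnitude is compared against the threshold.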
Tests for positive selection were conducted on the Datamonkey server using the single-likelihood ancestor counting (SLAC), fixed-effects likelihood (FEL), internal branch fixed-effects likelihood (IFEL), mixed effects model of evolution (MEME), and fast unconstrained Bayesian approximation (FUBAR) methods, and the dN/dS ratios were calculated using the SLAC and FEL codon-based maximum likelihood approaches. SLAC counts the number of non-synonymous changes per non-synonymous site (dN) and tests whether it is significantly different from the number of synonymous changes per synonymous site (dS). FEL estimates the ratio of non-synonymous to synonymous changes for each site in an alignment. The IFEL method is similar to FEL, but tests for site-by-site selection only along the internal branches of the phylogeny. In order to avoid an excessive false-positive rate, sites with SLAC, FEL, IFEL and MEME p-values of <0.1 and a FUBAR posterior probability of >0.90 were accepted as candidates for selection. The property informed model of evolution (PRIME) was designed to take into account the biochemical properties of the amino acids: it uses the same conceptual framework as FEL and MEME but, unlike them, allows the non-synonymous substitution rate β to depend on which residues are being exchanged as well as on the site in question. The positively selected sites were superimposed on the HA structure of strain A/Aichi/2/68(H3N2) (PDB code 3VUN) using the PyMOL Molecular Graphics System, version 1.3 (Schrödinger, LLC). […]
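The acceptance thresholds above can be applied to per-site results with a short filter. The sketch below assumes that the Datamonkey output has already been parsed into a dictionary keyed by codon position. The protocol does not state how many methods must agree, so the `min_methods` parameter is an assumption, and the codon positions and values are hypothetical.

```python
P_VALUE_METHODS = ("SLAC", "FEL", "IFEL", "MEME")
P_VALUE_CUTOFF = 0.1            # p-value must be below this cutoff
FUBAR_POSTERIOR_CUTOFF = 0.90   # posterior probability must exceed this cutoff


def supporting_methods(site_result):
    """List the methods whose threshold is met for one codon site."""
    support = [
        method
        for method in P_VALUE_METHODS
        if method in site_result and site_result[method] < P_VALUE_CUTOFF
    ]
    if site_result.get("FUBAR", 0.0) > FUBAR_POSTERIOR_CUTOFF:
        support.append("FUBAR")
    return support


def candidate_sites(results, min_methods=1):
    """Map each codon supported by at least `min_methods` methods to those methods."""
    candidates = {}
    for codon in sorted(results):
        support = supporting_methods(results[codon])
        if len(support) >= min_methods:
            candidates[codon] = support
    return candidates


# Hypothetical per-site results: p-values for SLAC/FEL/IFEL/MEME and the
# FUBAR posterior probability of positive selection.
example_results = {
    45: {"SLAC": 0.32, "FEL": 0.08, "IFEL": 0.21, "MEME": 0.04, "FUBAR": 0.93},
    138: {"SLAC": 0.50, "FEL": 0.40, "IFEL": 0.45, "MEME": 0.09, "FUBAR": 0.60},
    212: {"SLAC": 0.70, "FEL": 0.55, "IFEL": 0.62, "MEME": 0.48, "FUBAR": 0.35},
}
print(candidate_sites(example_results))
print(candidate_sites(example_results, min_methods=2))
```

Raising `min_methods` gives a more conservative list by requiring agreement between several methods before a site is treated as a candidate.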

Pipeline specifications

Software tools: TempEst, Clustal W, BioEdit, jModelTest, BEAST, FigTree, Datamonkey, PyMOL
Applications: Phylogenetics, Population genetic analysis
Diseases: Encephalitis, Arbovirus, Infection, HIV Infections