Computational protocol: Stochastic Processes Are Key Determinants of Short-Term Evolution in Influenza A Virus

Protocol publication

[…] Sequence alignments were manually constructed for the major coding regions of each segment, focusing on the HA (1,698 bp) and NA (1,407 bp). An alignment of the concatenated six internal gene segments was also constructed (9,636 bp), as these are expected to exhibit evolutionary patterns different from those of HA and NA. To place the New York State viruses in a global context, 48 unique HA gene sequences and 140 unique NA gene sequences from other human and swine influenza A viruses sampled worldwide from 1997–2005 were compiled from GenBank, giving total datasets of 466 and 553 sequences for the HA and NA, respectively.
Phylogenetic trees were inferred for the HA, NA, and concatenated internal genes using the maximum likelihood (ML) method available in PAUP*. In each case the Hasegawa-Kishino-Yano (HKY85) + I + Γ4 model of nucleotide substitution was employed, with the transition–transversion ratio, the proportion of invariant sites (I), and the gamma distribution of among-site rate variation with four rate categories (Γ4) estimated from the empirical data (parameter values available from the authors on request). Because of the very large size of all datasets, the nearest-neighbor-interchange branch-swapping method was employed. To assess the robustness of individual nodes on the phylogenetic tree, we performed a bootstrap resampling analysis (1,000 replications) using the neighbor-joining procedure but incorporating the ML substitution model.
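The bootstrap step described above can be illustrated with a minimal sketch. It is not the PAUP* procedure: it only shows the core idea of resampling alignment columns with replacement and asking how often a grouping recurs. The toy alignment, the function names, and the use of a plain p-distance in place of ML-model distances and neighbor-joining are all assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # The same sampled column indices are applied to every sequence,
    # so each replicate column is an intact column of the original alignment.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical toy alignment; real datasets here had hundreds of sequences.
toy = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC", "C": "TCGTTCGAAC"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]

# Fraction of replicates in which A is strictly closer to B than to C,
# a crude stand-in for the bootstrap support of an (A, B) grouping.
support = sum(
    p_distance(r["A"], r["B"]) < p_distance(r["A"], r["C"]) for r in replicates
) / len(replicates)
print(f"support for (A, B): {support:.3f}")
```

In a real analysis each replicate would be passed to a tree-building method, and support would be the fraction of replicate trees containing a given node.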
Independent entries of A(H3N2) viruses into New York State were determined by identifying viral isolates that were separated from the others circulating in that season (1) by viruses sampled from localities outside of New York State, and (2) by exceptionally long branch lengths.
Rates of nucleotide substitution and the age of the most recent common ancestor (MRCA) were estimated using a Bayesian Markov chain Monte Carlo (MCMC) method available in the BEAST package, which considers the distribution of branch lengths among viruses sampled at different times (day of sampling). Uncertainty in the estimates is reflected in the 95% highest posterior density (HPD) values. This analysis employed the HKY85 substitution model, assuming exponential population growth and a relaxed (uncorrelated exponential) molecular clock, which consistently best fit the data. In all cases, chains were run until convergence was achieved.
We used the MacClade program to determine those amino acid changes in the HA gene that occurred within each clade and among clades, particularly those at (1) 131 amino acid positions in five antigenic regions of the HA1 domain, and (2) 18 antigenic sites in the HA1 domain that have previously been proposed to experience positive selection. Site-specific selection pressures in HA (New York State viruses alone) were measured as the ratio of nonsynonymous (dN) to synonymous (dS) substitutions per site, estimated using the single likelihood ancestor counting (SLAC; all sequences per season) and fixed effects likelihood (FEL; maximum of 50 randomly sampled sequences per season) methods, both incorporating the general reversible substitution (REV) model with phylogenetic trees inferred using the neighbor-joining method available at the Datamonkey facility. This analysis was undertaken on the intraseasonal data and on an interseason dataset comprising a random sample of three isolates from each clade and all singletons (n = 52 isolates). […]
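The distinction between nonsynonymous and synonymous change that underlies dN/dS can be sketched as follows. This is a naive pairwise count of codon differences under the standard genetic code. It is not SLAC or FEL, which work on a phylogeny and normalize by the number of synonymous and nonsynonymous sites. The function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(seq_a, seq_b):
    """Count codon differences between two aligned, in-frame coding sequences.

    Returns raw counts of synonymous and nonsynonymous codon differences.
    Codons containing gaps or ambiguity codes are counted as skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no information about change
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            counts["skipped"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical 3-codon example: AAA->AGA changes Lys to Arg (nonsynonymous),
# CTG->CTA leaves Leu unchanged (synonymous).
example = classify_codon_changes("ATGAAACTG", "ATGAGACTA")
print(example)
```

Raw counts like these overstate or understate selection when codon usage is uneven, which is one reason site-wise likelihood methods are used in practice.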

Pipeline specifications

Software tools: MacClade, Datamonkey
Applications: Phylogenetics, Population genetic analysis
Organisms: Influenza A virus, Homo sapiens
Diseases: Influenza, Human; HIV Infections