Computational protocol: Evolutionary Dynamics and Emergence of Panzootic H5N1 Influenza Viruses

Protocol publication

[…] To estimate the times of divergence, a total of nine full-length datasets were analyzed: two Eurasian datasets for the H5-HA and N1-NA, and seven Asian datasets, one for each of the influenza gene segments except the NA gene. To examine changes in genetic diversity during the evolution of the Gs/GD lineage, we constructed Bayesian skyline plots using a modified Asian HA dataset. This modified dataset consisted of only HA genes of viruses isolated from chicken (n = 54), duck (n = 52), goose (n = 15), pheasant (n = 1) and guinea fowl (n = 1) in China.

To estimate the rates of nucleotide substitution and times to the most recent common ancestor (TMRCAs), we used a Bayesian Markov chain Monte Carlo (MCMC) method as implemented in the program BEAST v1.4.7. Each gene was analyzed with the codon-based SRD06 nucleotide substitution model. For each analysis the Bayesian skyline coalescent model was used. Three clock models were compared statistically for each dataset using a Bayes factor test in Tracer v1.4: the strict clock, which assumes a single evolutionary rate along all branches, and the uncorrelated lognormal relaxed (UCLD) clock and uncorrelated exponential relaxed (UCED) clock, which allow evolutionary rates to vary among branches according to lognormal and exponential distributions, respectively. The Bayes factor test of clock models showed that the UCED clock was most appropriate for all datasets other than PB2 and PB1, for which the UCLD clock most appropriately described the data. For each dataset, three to five independent Bayesian MCMC runs were conducted for 10–20 million generations, sampled to produce 10,000 trees. Convergence of the runs was confirmed using Tracer v1.4, and effective sample size values of >500 indicated a sufficient level of sampling. The results of the multiple runs were then combined using LogCombiner v1.4.7.
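The convergence check and run-combining steps described above can be illustrated with a minimal sketch. It is not the algorithm used by Tracer or LogCombiner: the function names are hypothetical, the effective sample size (ESS) is estimated with a simple autocorrelation sum truncated at the first non-positive lag, and the 10% burn-in default is an assumption chosen only for illustration.

```python
def discard_burnin(samples, fraction=0.1):
    """Drop the first `fraction` of a trace (assumed burn-in proportion)."""
    start = int(len(samples) * fraction)
    return samples[start:]


def effective_sample_size(samples):
    """Estimate ESS as n / tau, where tau = 1 + 2 * sum of positive
    autocorrelations up to the first non-positive lag.
    This is a simple estimator, not Tracer's exact method."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)  # a constant trace has no autocorrelation to estimate
    tau = 1.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break  # truncate the sum at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau


def combine_runs(runs, burnin_fraction=0.1):
    """Concatenate independent runs after removing burn-in from each."""
    combined = []
    for run in runs:
        combined.extend(discard_burnin(run, burnin_fraction))
    return combined


def is_well_sampled(samples, threshold=500):
    """Apply the ESS > 500 criterion stated in the protocol."""
    return effective_sample_size(samples) > threshold
```

In this sketch each run would be a list of sampled parameter values (for example, the substitution rate column of a BEAST log file); `combine_runs` would be applied to the three to five runs per dataset, and `is_well_sampled` to the combined trace.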
Mean evolutionary rates and divergence times were calculated using TreeAnnotator v1.4.7 and TreeStat v1.1 after the removal of an appropriate burn-in (10–20% in most cases), and phylogenetic trees were visualized with FigTree v1.1.2.

Finally, to evaluate whether the TMRCAs of the gene segments of a given genotype differed significantly, the TMRCA of each gene segment was compared with those of the remaining genes of the genotype using a Bayes factor test. The test was calculated as follows. For a given genotype and any pair of gene segments (e.g. PB2 and PB1), the posterior odds, that is, the probability of PB2 being older than PB1 divided by the probability of PB1 being older than PB2 given the data (tree estimates of TMRCA), were multiplied by the inverse of the corresponding prior odds (the prior probability of PB1 being older than PB2 divided by the prior probability of PB2 being older than PB1). This was calculated for each Bayesian MCMC run. [...]

To determine selection pressures on the HA of Gs/GD-like H5N1 viruses in poultry, the modified Asian H5-HA dataset was analyzed using the single-likelihood ancestor counting (SLAC) and genetic algorithm (GA) methods available in Datamonkey and HyPhy. The SLAC method calculates global and site-specific ratios of nonsynonymous (dN) to synonymous (dS) nucleotide substitution rates (ω = dN/dS) based on the BEAST-generated phylogenetic tree and the best-fit nucleotide substitution model. The GA method assigns four ω classes to each lineage in search of the model of lineage-specific evolution that best fits the data. The probability (≥95%) of ω being >1 along a specific lineage was calculated from the model-averaged probability across all models rather than inferred from the single best-fitting model. This approach does not require any a priori hypothesis of lineage-specific evolution. […]
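The Bayes factor test for TMRCA ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, TMRCAs are assumed to be expressed as calendar dates (so an older ancestor has a smaller value), the posterior and prior samples for the two genes are assumed to be independent so that every pair of samples is compared, and ties are counted for neither gene.

```python
from bisect import bisect_left, bisect_right


def older_counts(tmrca_a, tmrca_b):
    """Count sample pairs in which gene A is older than gene B, and vice versa.
    Assumes TMRCAs are calendar dates: older means a smaller value."""
    sorted_b = sorted(tmrca_b)
    # For each A sample, the number of B samples that are strictly later (A older).
    a_older = sum(len(sorted_b) - bisect_right(sorted_b, a) for a in tmrca_a)
    # For each A sample, the number of B samples that are strictly earlier (B older).
    b_older = sum(bisect_left(sorted_b, a) for a in tmrca_a)
    return a_older, b_older


def tmrca_bayes_factor(post_a, post_b, prior_a, prior_b):
    """Bayes factor in favour of gene A being older than gene B:
    posterior odds (A older : B older) times inverse prior odds (B older : A older)."""
    post_a_older, post_b_older = older_counts(post_a, post_b)
    prior_a_older, prior_b_older = older_counts(prior_a, prior_b)
    if post_b_older == 0 or prior_a_older == 0:
        raise ValueError("odds undefined: a count in the denominator is zero")
    posterior_odds = post_a_older / post_b_older
    inverse_prior_odds = prior_b_older / prior_a_older
    return posterior_odds * inverse_prior_odds
```

Here `post_a` and `post_b` would hold TMRCA samples for two segments (for example PB2 and PB1) drawn from the posterior, and `prior_a` and `prior_b` the corresponding samples from an analysis that samples from the prior only. Because the pairwise counts share the same denominator, the ratio of counts equals the ratio of probabilities.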

Pipeline specifications

Software tools: BEAST, TreeStat, FigTree, Datamonkey, HyPhy
Applications: Phylogenetics, Population genetic analysis
Organisms: Homo sapiens