Computational protocol: The European and Japanese outbreaks of H5N8 derive from a single source population providing evidence for the dispersal along the long distance bird migratory flyways

Protocol publication

[…] The complete set of available H5N8 nucleotide sequences was downloaded from the NCBI Influenza Virus Resource and GISAID. The search was restricted to complete sequences of H5N8 within the NCBI Influenza Virus Resource.

All of the sequences were aligned with MUSCLE v3.8.31. Manual inspection and editing of the sequences was carried out using MEGA 6.06. During manual editing, the 5′ end of each sequence was trimmed to remove the untranslated region, so that all sequences begin at the start codon. Sequences with missing nucleotides were removed; this included 3 partial Chinese duck sequences. The 3′ end of the nucleotide sequences was not edited, as influenza uses a variety of stop codons that are sometimes repeated. Tip dates were assigned according to the year of collection.

A subset of the sequences containing only those from 2014 was created for the detailed analysis of the hemagglutinin and neuraminidase segments; this allows the divergence dates from the Korean outbreak to be calculated in more detail. For these calculations, tip dates were given in months before December 2014. This avoids the problem of missing data in earlier sequences, for which the month of collection might not be available.

Bayesian coalescent trees were calculated for all the different segments and the subsets using BEAST 2.1.3. The model assumed exponential population growth, and tip dates were set from the sequence collection dates. The Hasegawa–Kishino–Yano (HKY) nucleotide substitution model was used with a strict molecular clock, because the strict clock gave the best effective sample size when compared to other clock models, which gave similar parameter values but also showed significant auto-correlation. The use of the HKY model in preference to Tamura-Nei was also supported by analysis using Model-Test.
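The manual editing and tip-date steps described above can be illustrated with a short Python sketch. It is not the authors' procedure (they edited the alignment by hand in MEGA); the function names are hypothetical, and it assumes that the first ATG in a sequence is the start codon and that any character other than A, C, G or T counts as a missing nucleotide.

```python
from datetime import date

VALID_BASES = set("ACGT")


def trim_to_start_codon(seq):
    """Drop the 5' untranslated region by cutting at the first ATG.

    Assumption: the first ATG is the true start codon. Returns None
    when no ATG is present.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    return seq[start:] if start >= 0 else None


def has_missing_nucleotides(seq):
    """True if the sequence contains anything other than A, C, G or T."""
    return any(base not in VALID_BASES for base in seq)


def clean(records):
    """Trim every sequence and discard the incomplete ones.

    `records` maps a sequence name to its nucleotide string.
    """
    kept = {}
    for name, seq in records.items():
        trimmed = trim_to_start_codon(seq)
        if trimmed is None or has_missing_nucleotides(trimmed):
            continue
        kept[name] = trimmed
    return kept


def months_before(collected, reference=date(2014, 12, 1)):
    """Tip date expressed as whole months before December 2014."""
    return (reference.year - collected.year) * 12 + (reference.month - collected.month)
```

For example, `months_before(date(2014, 1, 20))` gives 11, so a sequence collected in January 2014 would be assigned a tip date of 11 months before the reference point.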
All the final simulations were performed as a single run with a minimum of 10 million iterations, except for the PA segment, which needed 20 million, the PB2 segment, which needed 30 million, and the NS segment, which needed 40 million iterations to achieve a suitable level of chain sampling. A burn-in period of 10% was used for all samples.

Analysis of the Bayesian coalescent output was carried out using Tracer 1.6.0. All simulations were run until the effective sample sizes for all of the parameters in the model were over 200. The maximum clade credibility trees were calculated using TreeAnnotator 2.1.2 along with the median node heights, and the final tree diagrams were generated using FigTree 1.4.2.

All of the available H5 hemagglutinin subunits and N8 neuraminidase subunits were downloaded from the NCBI Influenza Virus Resource and GISAID. These data were used to carry out a complete phylogenetic analysis for each of the segments. The sequences were initially aligned using MAFFT. FastTree 2.1 was used to create an approximate maximum likelihood tree for all of the sequences. The resulting trees were visualised and annotated with FigTree 1.4.2.

All of the XML files used to calculate the trees for each of the segments and for the 2014 hemagglutinin and neuraminidase trees are available as , as are the tree files they produce. The tree files for the complete H5 and N8 segment analysis are also available in and .

The locations of the H5N8 cases were taken from the EMPRES Global Animal Disease Information System (EMPRES-i), and information about the original reports was sourced from the Avian Flu Diary Blog. The map was created using Google Maps and is available from: […]
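The 10% burn-in and the effective sample size (ESS) threshold of 200 described above can be illustrated with a minimal Python sketch. The authors used Tracer for this check; the estimator below is a common textbook one (chain length divided by the integrated autocorrelation time, truncated at the first non-positive autocorrelation) and is not claimed to match Tracer's calculation exactly. All function names are hypothetical.

```python
def discard_burn_in(samples, fraction=0.10):
    """Drop the first `fraction` of the chain as burn-in."""
    start = int(len(samples) * fraction)
    return samples[start:]


def effective_sample_size(samples):
    """Estimate ESS as n / tau, with tau the integrated autocorrelation time.

    The sum over lags stops at the first non-positive autocorrelation.
    This is O(n^2), so thin long chains before calling it.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)
    tau = 1.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


def all_converged(traces, threshold=200.0, burn_in=0.10):
    """True when every parameter trace has ESS above the threshold after burn-in.

    `traces` maps a parameter name to its list of sampled values.
    """
    return all(
        effective_sample_size(discard_burn_in(trace, burn_in)) > threshold
        for trace in traces.values()
    )
```

A strongly auto-correlated trace yields an ESS well below the number of samples, which is the behaviour the text reports for the alternative clock models.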

Pipeline specifications

Software tools: MUSCLE, MEGA, BEAST, FigTree, MAFFT, FastTree
Databases: NCBI Influenza Virus Resource
Applications: Phylogenetics, Nucleotide sequence alignment
Diseases: Influenza in Birds