Computational protocol: Prospective of Genomics in Revealing Transmission, Reassortment and Evolution of Wildlife-Borne Avian Influenza A (H5N1) Viruses

Protocol publication

[…] Phylogenetic analysis of viral sequences is a standard way to help delineate the molecular epidemiology of outbreak events, including those involving HPAI H5N1 viruses. However, the large number of sequences to be analyzed can cause problems. As of May 16th, 2011, 602 virus genomes, 3,586 HA genes and 2,629 NA genes of the H5N1 subtype were available in GenBank. With recent advances in sequencing technology, decreases in sequencing cost and the increased use of sequencing for epidemiological purposes, these datasets will become increasingly large. A traditional phylogenetic analysis then becomes extremely difficult because of computational demands or problems with visualizing the outputs. Although fast and parallelized programs for constructing phylogenetic trees exist, such as PhyML and RAxML, it is still hard to analyze a tree derived from a large dataset.

Proteotyping has been proposed to study the evolution of type A influenza. Proteotyping is similar to genotyping at the DNA level and is able to capture the amino acid variations of the viruses. It was used to analyze 2,196 AI virus genes and 169 complete virus genomes. However, because computer programs that facilitate the proteotyping process are lacking and there is no appropriate standard for defining proteotypes, this method has not been widely accepted by virologists thus far.

An alternative approach to phylogenetic analysis is to carry out “ordination” using a dimension reduction technique such as multidimensional scaling (MDS), which is able to deal with very large numbers of sequences in a short period of time. MDS has been used to visualize antigenic variation in human influenza A, subtype H3N2 viruses. Principal co-ordinates analysis (PCOORD) is equivalent to MDS when the distances are Euclidean.
PCOORD has been used by us to study the phylogenetic diversity of influenza A viruses, and the accompanying software has also been used to analyze HBV and HCV sequence variation. However, although PCOORD and other MDS techniques can analyze a large number of sequences and yield information on their major groupings (similar to lineages in a phylogenetic tree), they fail to reveal the ancestor-descendant relationships that can be easily seen in a phylogenetic tree.

Therefore, none of the methods currently available is suitable for analyzing a large number of viral sequences. This has proven problematic not only in the molecular epidemiology of influenza, but also in that of HIV, HBV, HCV and measles, for example, where thousands of sequences are available. To resolve this problem, novel mathematical techniques should be introduced and better computer programs should be developed.

In addition, several online or stand-alone computer programs have been developed to facilitate the genotyping and identification of reassortant influenza viruses. Among them, FluGenome, the method proposed by Suzuki (2010), GiRaF and FluReF are based on phylogenetic trees but employ different methods to determine the topological differences among the trees constructed from each genomic segment, while the quantitative genotyping algorithm developed by Wan et al. (2007) is not phylogeny-based. Although these methods are robust and efficient, they have not been extensively applied to date, for the following reasons. Firstly, phylogeny-based programs are often time-consuming and require substantial computing power, and therefore cannot deal with large datasets, although GiRaF is reported to be able to perform a large-scale analysis. Secondly, although each method generates consistent outputs on its own, the results from different methods are not directly comparable. For example, the authors of FluGenome analyzed ca. 2300 complete genomes of type A influenza and identified 156 unique genotypes. However, Wan et al. (2007) identified 107 niches among only 283 complete genomes of H5N1 AI viruses. In addition, GiRaF was used to analyze 1101 whole-genome sequences of non-human H5N1 influenza viruses and identified 18 reassortment events. Therefore, results obtained using these programs are not always consistent, which complicates their interpretation. Lastly, the widely accepted nomenclature system for HPAI H5N1 is not employed by any of these computer programs. […]
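
To illustrate the ordination approach discussed in the excerpt, the following is a minimal sketch of principal co-ordinates analysis (classical MDS) applied to a pairwise distance matrix. It is not the authors' software. The choice of the uncorrected p-distance, the function names (p_distance, distance_matrix, pcoa) and the toy sequences are all assumptions made for illustration only.

```python
import numpy as np

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: the uncorrected p-distance is used here only as a simple
    stand-in; the excerpt does not say which distance was used.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = p_distance(seqs[i], seqs[j])
    return d

def pcoa(d, k=2):
    """Classical MDS: embed n items in k dimensions from a distance matrix."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centered matrix
    vals, vecs = np.linalg.eigh(gram)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                # indices of the k largest
    vals = np.clip(vals[top], 0.0, None)            # non-Euclidean input can give negatives
    return vecs[:, top] * np.sqrt(vals)

# Hypothetical toy alignment: two groups of closely related sequences.
toy_seqs = [
    "ATGGAGAAAATAGTG",
    "ATGGAGAAAATAGTA",
    "ATGGAGAAGATAGTA",
    "ATGCAAACTATTCTG",
    "ATGCAAACTATTCTA",
]
coords = pcoa(distance_matrix(toy_seqs), k=2)
for seq, (x, y) in zip(toy_seqs, coords):
    print(f"{seq}  {x:+.3f}  {y:+.3f}")
```

In this toy case the first axis separates the two groups of sequences, which corresponds to the "major groupings" mentioned in the text. As the excerpt notes, the coordinates carry no information about ancestor-descendant relationships.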

Pipeline specifications

Software tools: PhyML, RAxML, GiRaF
Databases: FluGenome
Applications: Phylogenetics, WGS analysis
Organisms: Viruses, Homo sapiens, Human poliovirus 1 Mahoney, Sus scrofa
Diseases: Infection, Influenza, Human