Computational protocol: Determining the Phylogenetic and Phylogeographic Origin of Highly Pathogenic Avian Influenza (H7N3) in Mexico

Protocol publication

[…] The complete genomes of three H7N3 outbreak strains from Mexico (A/chicken/Jalisco/CPA1/2012; A/chicken/Jalisco/12283/2012; A/Mexico/InDRE7218/2012) and all previously published influenza A virus sequences of North American lineage (complete genomes only) were downloaded from GenBank on 1 March 2013. Sequences of each gene segment were aligned using MUSCLE v3.5. Maximum likelihood (ML) phylogenetic trees for each segment were generated using RAxML v7.04, each employing a GTR GAMMA substitution model with 500 bootstrap replicates. We established a full-genome dataset composed of the same 2343 North American strains for each segment. Because the HA and NA segments are extremely divergent between subtypes, we used all available H7 and N3 sequences to generate the initial HA and NA trees, respectively. Because the six internal segments are subject to extensive reassortment, we instead constructed large ML trees for the internal segments to identify strains related to the outbreak strains. Background sequences for further study were selected from the clades closest to the novel H7N3 HPAI viruses on the ML tree of each segment.

The final dataset of 427 AIV strains collected over a 12-year period (2001 to 2012) is displayed in […]. This table indicates which segments were selected for analysis for each strain. For the majority of strains (n = 289), only one segment is selected, while for the others more than one segment is included. There are 131 HA (H7) sequences included in the analysis based on their relationship to the Mexico H7N3 strain (alignment length 1698 nt); other H7 sequences included in the joint analysis are related to the outbreak strain in other segments. For the other segments the distribution is as follows (alignment length in parentheses): NA (N3), n = 100 (1410 nt); PB2, n = 86 (2277 nt); PB1, n = 67 (2271 nt); PA, n = 89 (2148 nt); NP, n = 79 (1494 nt); MP, n = 39 (982 nt); and NS, n = 42 (838 nt).
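The strain-by-segment selection table can be summarized with a simple tally. The per-segment counts reported above sum to 633 segment sequences drawn from 427 strains, of which 289 contribute a single segment and therefore 138 contribute more than one. The sketch below shows how such per-segment counts and single/multi-segment totals could be derived from a strain-to-segments mapping; the function name and the toy table are hypothetical, since the real table is not reproduced here.

```python
from collections import Counter

# Segment names as used in the text (MP = matrix protein segment).
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS")


def summarize_selection(selection: dict[str, set[str]]) -> tuple[Counter, int, int]:
    """Tally a hypothetical strain -> selected-segments table.

    Returns (sequences per segment, strains with exactly one segment,
    strains with more than one segment).
    """
    per_segment: Counter = Counter()
    single = multi = 0
    for strain, segments in selection.items():
        unknown = set(segments) - set(SEGMENTS)
        if unknown:
            raise ValueError(f"{strain}: unknown segment(s) {sorted(unknown)}")
        if not segments:
            raise ValueError(f"{strain}: no segment selected")
        per_segment.update(segments)  # one sequence per selected segment
        if len(segments) == 1:
            single += 1
        else:
            multi += 1
    return per_segment, single, multi


if __name__ == "__main__":
    # Hypothetical toy table; strain labels are placeholders, not real isolates.
    toy = {
        "strain_1": {"HA"},
        "strain_2": {"NA"},
        "strain_3": {"HA", "NA", "PB2"},
        "strain_4": {"PB2"},
    }
    counts, n_single, n_multi = summarize_selection(toy)
    for seg in SEGMENTS:
        print(seg, counts[seg])
    print("single-segment strains:", n_single, "multi-segment strains:", n_multi)
```

Applied to the full table, the per-segment counts would correspond to the n values listed above, and the single-segment total to the 289 strains mentioned in the text.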
The trait information (host order, host species, location, flyway, state, and subtype) for these background AIV sequences is also provided in […]. [...]

To estimate the origin in time and space of the HPAI H7N3 outbreak strain in Mexico, models in BEAST v1.7.3 were applied independently to each gene segment (each segment has a different number of AIV sequences). Combinations of substitution models (general time-reversible (GTR) + Γ-distributed site-to-site rate variation, and SRD06), clock models (strict and uncorrelated relaxed lognormal), and population size models (constant size, exponential, and skyride) were compared using Bayes factor tests. The best-fitting model, a GTR + Γ substitution model with an uncorrelated lognormal relaxed molecular clock and a constant-population coalescent prior over the phylogenies, was selected. Parameters were estimated using the Bayesian Markov chain Monte Carlo (MCMC) approach implemented in BEAST. MCMC chains were run for 100 million states, sampled every 10,000 states, with the first 10% discarded as burn-in. MCMC convergence and the effective sample sizes (ESS) of parameter estimates were evaluated using Tracer 1.5. Maximum clade credibility (MCC) trees were summarized using TreeAnnotator and visualized using FigTree v1.4.0.

A graphical representation of the origin of HPAI H7N3 Mexico was obtained by spatial reconstruction in a Bayesian framework. The SPREAD application was used to convert the estimated divergence times and the spatially annotated, time-scaled phylogeny (in which each location is associated with a particular latitude and longitude) into a spatiotemporal representation of viral movement. The mapped objects were exported to Keyhole Markup Language (KML) files and then visualized with the geographic information system software ArcGIS. The map source is OpenStreetMap. […]
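The MCMC settings above imply some simple arithmetic: 100 million states sampled every 10,000 states yields about 10,000 logged samples, and discarding the first 10% as burn-in leaves about 9,000 per parameter. The sketch below reproduces that arithmetic and estimates ESS with a basic autocorrelation-truncation estimator, ESS = N / (1 + 2 Σ ρ_k), stopping at the first non-positive autocorrelation. This estimator is a stand-in chosen for illustration, not Tracer's exact implementation, and the AR(1) toy trace and all function names are hypothetical.

```python
import random
from typing import Sequence


def retained_samples(chain_length: int, sample_every: int, burnin_percent: int) -> int:
    """Logged samples left after burn-in (ignores any sample logged at state 0)."""
    logged = chain_length // sample_every
    return logged - logged * burnin_percent // 100


def discard_burnin(samples: Sequence[float], burnin_percent: int = 10) -> list[float]:
    """Drop the first `burnin_percent` percent of logged samples."""
    n_burn = len(samples) * burnin_percent // 100
    return list(samples[n_burn:])


def effective_sample_size(trace: Sequence[float]) -> float:
    """ESS = N / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first non-positive autocorrelation (simple heuristic, not Tracer's code)."""
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float("nan")  # constant trace: ESS is undefined
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def ar1_chain(n: int, phi: float, seed: int) -> list[float]:
    """Hypothetical autocorrelated toy trace standing in for a logged parameter."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    print(retained_samples(100_000_000, 10_000, 10))  # 9000
    trace = discard_burnin(ar1_chain(10_000, 0.9, seed=1), 10)
    print(len(trace), round(effective_sample_size(trace)))
```

A strongly autocorrelated trace yields an ESS far below the number of retained samples, which is why ESS, rather than chain length alone, is the usual convergence diagnostic.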

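The spatial step relies on associating each sampling location with a latitude and longitude before export to KML. One detail that is easy to get wrong is that KML writes coordinates as longitude,latitude[,altitude]. The sketch below builds a minimal KML document with one placemark per location using Python's standard library; SPREAD generates its own KML, so this only illustrates the coordinate convention, and the function name and example location are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def locations_to_kml(locations: dict[str, tuple[float, float]]) -> str:
    """Build a minimal KML document with one Placemark per location.

    Input pairs are (latitude, longitude); KML stores them as
    "longitude,latitude,altitude", so the order is swapped here.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, (lat, lon) in locations.items():
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"{name}: coordinates out of range ({lat}, {lon})")
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical location with approximate coordinates, for illustration only.
    print(locations_to_kml({"site_A": (20.7, -103.3)}))
```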
Pipeline specifications