Computational protocol: New Avian Paramyxoviruses Type I Strains Identified in Africa Provide New Outcomes for Phylogeny Reconstruction and Genotype Classification

Protocol publication

[…] The forward and reverse sequences were aligned using Vector NTI software version 10.3 (Invitrogen, Europe). The validated open reading frames of the different genes were aligned with others retrieved from GenBank using the CLUSTAL W program of the MEGA5 software suite. The sequence of a class I NDV isolate was used as an outgroup. Full-length genomes, including the genomes from the Malian and Madagascar strains, were aligned using MUSCLE, which is a faster and more reliable method than CLUSTAL W for larger sets of long sequences. From the full genomes, individual and concatenated genes were generated using MEGA5. Putative recombination events in the sequence alignment of available NDV genomes and the new Malian and Madagascar genomes studied here were investigated with the RDP suite (versions RDP3 Alpha 44 and RDP4 Beta 4.16). This software applies multiple recombination signal detection methods, including CHIMAERA, Maximum Chi2, RDP, RECSCAN, and GENECONV. All recombinant sequences were discarded before analysis. Phylogenetic reconstructions were first carried out by the Maximum Likelihood method implemented in Treefinder (version of March 2011). The data sets consisted of the full genome sequences (13740 nt, n = 110), the six individual genes from these full genomes, the complete F genes (1653 nt, n = 741), the complete HN genes (1713 nt, n = 323), and the partial coding sequences (445 and 372 nt) of the F gene (for n = 796 and 1921, respectively). All these multiple sequence alignments are provided in . The best model of nucleotide substitution for each data set was selected using the uncorrected and corrected Akaike Information Criterion, the Hannan and Quinn performance-based decision theory, and the Bayesian Information Criterion of Treefinder version March 2010. 
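The model-selection step above can be illustrated with a minimal sketch of how the standard information criteria are computed from a fitted model's log-likelihood. This is not Treefinder's implementation: the function names, the two candidate models, their log-likelihoods, and their parameter counts are all hypothetical values chosen for illustration, and the Hannan-Quinn formula is the textbook form of that criterion. In practice the parameter count would also include branch lengths.

```python
import math


def information_criteria(log_likelihood, n_params, n_sites):
    """Return AIC, AICc, BIC and Hannan-Quinn scores (lower is better)."""
    k, n = n_params, n_sites
    aic = 2 * k - 2 * log_likelihood
    # Small-sample correction added to AIC.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    hq = 2 * k * math.log(math.log(n)) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQ": hq}


def best_model(candidates, n_sites):
    """Pick the lowest-scoring model under each criterion.

    candidates maps a model name to (log_likelihood, n_params).
    """
    scores = {
        name: information_criteria(lnl, k, n_sites)
        for name, (lnl, k) in candidates.items()
    }
    return {
        crit: min(scores, key=lambda name: scores[name][crit])
        for crit in ("AIC", "AICc", "BIC", "HQ")
    }


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, not values from the study.
    candidates = {"HKY+G": (-52100.0, 5), "GTR+G+I": (-51900.0, 10)}
    # 1653 nt is the length of the complete F gene alignment.
    print(best_model(candidates, n_sites=1653))
```

When all criteria agree on one model, as they do for these made-up numbers, that model is a natural consensus choice; when they disagree, the more heavily penalising criteria (BIC, Hannan-Quinn) tend to favour the simpler model.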
A General Time Reversible (GTR) model with a discrete gamma distribution (+G) with 5 classes, allowing for invariant sites (+I), was the consensus substitution model proposed for the different data sets and was used for all Maximum Likelihood analyses. Bayesian inference was also performed with a GTR model using MrBayes 3.2.2 [38]. All Bayesian reconstructions were initially set at 100,000,000 generations with a sampling frequency on the chains of 1/1,000 (targeted sample size = 100,000 trees). All priors were left at their defaults, except the evolutionary model and the branch length. For the latter, an inverse gamma Dirichlet prior was selected to avoid overestimation of branch lengths by MrBayes. For all reconstructions, two runs were launched in parallel with two chains (one heated, one cold). A convergence rule between the two runs was set at a standard deviation of split frequencies lower than 0.01, at which point the reconstruction stops. Alternatively, when the standard deviation of split frequencies fluctuated stationarily above 0.01 for several consecutive days, very long reconstructions were stopped manually. To assess convergence, the effective sample sizes (ESS) for posterior probabilities and the Potential Scale Reduction Factor (PSRF) were checked for all reconstructions: the validation criteria for all parameters were an average ESS > 200 and a PSRF between 0.99 and 1.01 (as the chains converge, the variances within and between the runs become similar and the PSRF approaches 1.0). The numbers of trees used for the generation of the consensus tree and the values of the two convergence parameters are indicated in the results. Final trees were laid out using FigTree software, version 1.4.0. Topologies of the Maximum Likelihood and Bayesian consensus trees obtained for the different genes and the full genomes (n = 110 sequences in the different data sets) were compared in Treefinder using the Shimodaira and Hasegawa test. The best representation was then selected. 
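To make the PSRF criterion above concrete, the following sketch computes the classic Gelman-Rubin potential scale reduction factor for one parameter sampled in several independent runs, and then applies the ESS > 200 and 0.99 to 1.01 thresholds stated in the text. It is an illustration under assumptions: the function names are hypothetical, the ESS value is taken as given rather than computed, and MrBayes may differ in the details of its own PSRF calculation.

```python
import math
from statistics import mean, variance


def psrf(runs):
    """Gelman-Rubin potential scale reduction factor for one parameter.

    runs is a list of equally long lists of post-burn-in samples,
    one list per independent run. Within-run variance must be non-zero.
    """
    m = len(runs)
    n = len(runs[0])
    if m < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least two runs of equal length >= 2")
    run_means = [mean(r) for r in runs]
    within = mean([variance(r) for r in runs])   # W: mean within-run variance
    between = n * variance(run_means)            # B: between-run variance
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


def passes_convergence(psrf_value, ess, min_ess=200, lo=0.99, hi=1.01):
    """Apply the thresholds from the text: ESS > 200 and PSRF in [0.99, 1.01]."""
    return ess > min_ess and lo <= psrf_value <= hi


if __name__ == "__main__":
    # Target sample size implied by the settings in the text.
    generations, sample_every = 100_000_000, 1_000
    print(generations // sample_every)  # number of sampled trees per run
```

Runs that explore the same distribution have nearly identical means, so the between-run term vanishes and the ratio tends to 1; runs stuck in different regions inflate the between-run term and push the PSRF well above 1.01.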
For the F gene, comparisons were performed using different phylogenetic methods, including the Maximum Likelihood, Neighbor Joining, and Maximum Parsimony methods from MEGA5 and the Bayesian approach for phylogenetic reconstruction. Branch support values were obtained using nonparametric bootstrapping with 1,000 resamplings for the first three phylogenetic methods, and the posterior probabilities for the Bayesian approach were estimated on 16,806 samples with a burn-in phase covering the first 25% of tree samples. The best tree for the F gene was determined by calculating the minimum branch length distance (K tree score) between the phylogenetic trees with the Ktreedist program. The complete F gene data set was also used to calculate the mean evolutionary distances within and between clusters. The pairwise distance matrix was generated by Treefinder and analyzed in Excel. In addition to the evolutionary distance, nucleic and amino acid signatures specific for the different clusters were sought in the multiple alignments of the F gene and protein. New genotypes and sub-genotypes were assigned according to the criteria described by Diel et al. with the following modifications:
(i) new genotypes and sub-genotypes were assigned on at least three independent isolates without a direct epidemiologic link in the phylogenetic trees generated for the complete F gene sequence and confirmed by at least two HN gene sequences, using both the Maximum Likelihood and Bayesian methods and the optimal nucleotide model (GTR +G +I, Γ5), as determined by Treefinder;
(ii) the mean distance between genotypes and sub-genotypes, as determined by the distance matrix from the complete F gene sequence generated by the Maximum Composite Likelihood model of Treefinder, was higher than or equal to 0.100 and 0.03, respectively;
(iii) sub-genotypes were included into a monophyletic genotype and were identified by unique amino or nucleic acid signatures, as described in the results section. […]
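The distance criterion above can be sketched as follows: given a symmetric pairwise distance matrix and a cluster label per sequence, compute the mean within-cluster and between-cluster distances and compare the latter with the 0.100 and 0.03 thresholds. The authors did this step in Excel; the function names, the four-sequence matrix, and the labels below are hypothetical, and the sketch checks only the distance threshold, not the isolate-count, monophyly, or signature criteria.

```python
from itertools import combinations
from statistics import mean


def mean_within(dist, labels, cluster):
    """Mean pairwise distance among sequences of one cluster."""
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    return mean([dist[i][j] for i, j in combinations(idx, 2)])


def mean_between(dist, labels, cluster_a, cluster_b):
    """Mean pairwise distance between sequences of two different clusters."""
    idx_a = [i for i, lab in enumerate(labels) if lab == cluster_a]
    idx_b = [i for i, lab in enumerate(labels) if lab == cluster_b]
    return mean([dist[i][j] for i in idx_a for j in idx_b])


def separation_level(mean_distance, genotype_min=0.100, subgenotype_min=0.03):
    """Classify a mean between-cluster distance using the stated thresholds."""
    if mean_distance >= genotype_min:
        return "genotype"
    if mean_distance >= subgenotype_min:
        return "sub-genotype"
    return "same cluster"


if __name__ == "__main__":
    # Hypothetical distances for four sequences in two clusters.
    labels = ["A", "A", "B", "B"]
    dist = [
        [0.00, 0.02, 0.12, 0.11],
        [0.02, 0.00, 0.13, 0.12],
        [0.12, 0.13, 0.00, 0.01],
        [0.11, 0.12, 0.01, 0.00],
    ]
    d = mean_between(dist, labels, "A", "B")
    print(mean_within(dist, labels, "A"), d, separation_level(d))
```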

Pipeline specifications

Software tools: Clustal W, MEGA, MUSCLE, MrBayes, FigTree
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Avian avulavirus 1
Diseases: Newcastle Disease