Computational protocol: Out of Africa: A Molecular Perspective on the Introduction of Yellow Fever Virus into the Americas

[…] Maximum likelihood phylogenetic trees were inferred for the YFV prM/E sequence data set (670 nt) under a variety of nucleotide substitution models in PAUP* [], including (a) codon-specific substitution rates and (b) the GTR+I+Γ4 model, in which the rate of each substitution type under the general time-reversible model (GTR), the proportion of invariant sites (I), and the shape parameter of a gamma distribution with four rate categories (Γ4) were estimated from the data. The GTR+I+Γ4 substitution model was also used as the basis for estimating trees with Bayesian Markov chain Monte Carlo approaches implemented in MrBayes [] and BEAST []. The final tree presented is the maximum a posteriori (MAP) tree estimated in BEAST (chain length of 25 million, sampling every 1,000), with tip times corresponding to the year of virus sampling.

To test the competing hypotheses of the “recent” and “ancient” origin of YFV in the Americas, we compared the likelihood of the maximum likelihood tree (“recent origin”) with that of a model tree in which both the African and American lineages were monophyletic (“ancient origin”) using a Shimodaira–Hasegawa test [].

Rates of nucleotide substitution, the age of the most recent common ancestor (MRCA), and demographic histories were estimated for the whole data set and for each geographic subset using models that allow for rate variation among lineages under a relaxed (uncorrelated exponential) molecular clock [] implemented in BEAST []. Four population dynamic models were investigated: constant population size, exponential population growth, logistic growth, and expansion growth. To confirm the age of the MRCA of all the YFV sequences analyzed, we also used the piecewise Bayesian skyline plot [], as this possesses the least constrained coalescent prior. Akaike's information criterion was used to determine the best-fit model, with uncertainty in parameter estimates reflected in the 95% highest posterior density (HPD) values, and all chains were run for sufficient time to ensure convergence.
All estimates were again based on the GTR+I+Γ4 model of nucleotide substitution.

Mean and site-specific selection pressures acting on YFV were measured as the ratio of nonsynonymous (dN) to synonymous (dS) substitutions per site, estimated using the single likelihood ancestor counting (SLAC; all sequences) and random effects likelihood (REL; maximum of 50 sequences) methods. Both methods incorporate the GTR model, with phylogenetic trees inferred using the neighbor-joining method, and are available at the Datamonkey facility []. […]
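To make the dN/dS quantity concrete, the following minimal Python sketch computes the ratio from raw counts of substitutions and sites. It is only an illustration of the definition, not the SLAC or REL procedure run at Datamonkey: it applies no correction for multiple hits and does no ancestral reconstruction. The function name `dn_ds` and the example counts are hypothetical and are not taken from the study.

```python
import math


def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return (dN, dS, dN/dS) from raw substitution and site counts.

    dN = nonsynonymous substitutions per nonsynonymous site
    dS = synonymous substitutions per synonymous site
    This is the uncorrected per-site ratio, for illustration only.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        # The ratio is undefined when no synonymous change is observed.
        omega = math.nan if dn == 0 else math.inf
    else:
        omega = dn / ds
    return dn, ds, omega


# Hypothetical counts, chosen only to show the arithmetic.
dn, ds, omega = dn_ds(nonsyn_subs=3, nonsyn_sites=300, syn_subs=10, syn_sites=100)
print(f"dN={dn:.4f} dS={ds:.4f} dN/dS={omega:.4f}")
```

A ratio below 1 is conventionally read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection.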

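The comparison of the four population dynamic models by Akaike's information criterion, described earlier, can be sketched as follows. AIC is defined as 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The log-likelihoods and parameter counts below are placeholders invented for illustration (they count demographic parameters only) and are not results from the study; the names `aic` and `rank_by_aic` are likewise hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models):
    """Rank models by AIC.

    models maps a model name to (log_likelihood, n_params).
    Returns a list of (name, aic, delta_aic), best (lowest AIC) first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda row: row[1])


# Placeholder values only: NOT estimates from the YFV data set.
placeholder_models = {
    "constant": (-1000.0, 1),
    "exponential": (-995.0, 2),
    "logistic": (-994.5, 3),
    "expansion": (-994.8, 3),
}

for name, score, delta in rank_by_aic(placeholder_models):
    print(f"{name:12s} AIC={score:8.1f} dAIC={delta:5.1f}")
```

The penalty term 2k means that a more heavily parameterized model, such as logistic growth, is preferred only if it improves the log-likelihood enough to offset its extra parameters.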
