Computational protocol: The reconstructed ancestral subunit a functions as both V-ATPase isoforms Vph1p and Stv1p in Saccharomyces cerevisiae

Protocol publication

[…] Our method for reconstructing ancestral protein sequences generally follows a previously described strategy, with specific details as follows. The methodology involves 1) collecting amino acid sequences of the query protein from modern species, 2) inferring a phylogenetic history describing the relationships among those species, and 3) using a maximum likelihood (ML) method to predict the most probable ancestral state of the protein sequence of interest (the experimental strategy has been reviewed elsewhere). Briefly, an amino acid residue is predicted (with a certain statistical confidence) to be present within an ancestral sequence based on its proportion of occurrence within the sequences of modern species and their specific phylogenetic relationship.

GenBank was queried for all fungal V-ATPase subunit a protein sequences; 68 homologous isoforms were returned (Supplemental Table S1). We also retrieved the nonfungal protein sequences for subunit a in D. discoideum and A. thaliana to use as a phylogenetic outgroup. Sequences were aligned using PRANK, version 0.081202.

Phylogenetic inference was performed using the ML criterion to optimize a probabilistic model of amino acid substitution. The best-fitting model for our sequence alignment is the Whelan–Goldman matrix with gamma-distributed rate variation (+G) and a proportion of invariant sites (+I), according to the Akaike information criterion as implemented in ProtTest. We used PhyML, version 3.0, to infer the ML topology, branch lengths, and model parameters. The tree topology was optimized using the best result from nearest-neighbor interchange and subtree pruning and regrafting (using PhyML's implementation). All other model parameters were optimized using the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm, which we implemented as an in-house C extension to PhyML.
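
The model-selection step above ranks candidate substitution models by the Akaike information criterion (AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood). The following is a minimal sketch of that ranking, not ProtTest's code: the function names are illustrative, and the log-likelihoods and parameter counts are invented placeholders, not values from this study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits for illustration only; these numbers are NOT from the paper.
example_fits = {
    "WAG": (-25010.0, 139),
    "WAG+G": (-24500.0, 140),
    "WAG+I+G": (-24480.0, 141),
}

for name, (lnl, k) in example_fits.items():
    print(f"{name}: AIC = {aic(lnl, k):.1f}")
print("Best model by AIC:", best_model(example_fits))
```

Each extra parameter (+G, +I) must improve the log-likelihood by more than one unit to lower the AIC, which is why a richer model is not automatically preferred.
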
Phylogenetic branch support was calculated as the approximate likelihood ratio based on a Shimodaira-Hasegawa–like procedure.

ML ancestral states were reconstructed at each site for all ancestral nodes in the ML phylogeny using a set of Python scripts called Lazarus, which wraps PAML, version 4.1. Lazarus parsimoniously placed ancestral gap characters according to Fitch's algorithm. We characterized the overall support for the reconstructed Anc.a sequence by binning the posterior probability of the ML state at each ancestral site into 5%-sized bins and then counting the proportion of total sites within each bin (Supplemental Figure S2). […]
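
The support summary described above (5%-sized bins of per-site posterior probabilities, reported as proportions of all sites) can be sketched as follows. This is an illustrative helper, not part of Lazarus or PAML; the function name, the convention of placing a probability of exactly 1.0 in the top bin, and the example values are all assumptions.

```python
from collections import Counter


def bin_posterior_probabilities(pps, bin_pct=5):
    """Return the proportion of sites falling in each bin of width `bin_pct` percent.

    Bin i covers [i * bin_pct, (i + 1) * bin_pct) percent; the last bin also
    includes 100%. Probabilities are rounded to 4 decimal places before
    binning so that values such as 0.95 land on the intended bin boundary
    despite floating-point error.
    """
    if not pps:
        raise ValueError("need at least one posterior probability")
    if 100 % bin_pct != 0:
        raise ValueError("bin_pct must divide 100 evenly")
    n_bins = 100 // bin_pct
    counts = Counter()
    for pp in pps:
        if not 0.0 <= pp <= 1.0:
            raise ValueError(f"posterior probability out of range: {pp}")
        scaled = int(round(pp * 10000))  # probability in units of 0.01%
        idx = min(scaled // (bin_pct * 100), n_bins - 1)
        counts[idx] += 1
    return [counts[i] / len(pps) for i in range(n_bins)]


# Made-up posterior probabilities for eight ancestral sites (illustration only).
example_pps = [0.99, 1.0, 0.95, 0.62, 0.5, 0.97, 0.33, 0.91]
proportions = bin_posterior_probabilities(example_pps)
for i, p in enumerate(proportions):
    if p > 0:
        print(f"{i * 5:3d}-{(i + 1) * 5:3d}%: {p:.3f}")
```

A well-supported ancestral sequence shows most of its sites in the top bin; sites in lower bins are the natural candidates for testing alternative reconstructions.
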

Pipeline specifications

Software tools: ProtTest, PhyML, PAML
Application: Phylogenetics
Organisms: Saccharomyces cerevisiae, Homo sapiens