Computational protocol: Structure-Related Differences between Cytochrome Oxidase I Proteins in a Stable Heteroplasmic Mitochondrial System

Protocol publication

[…] To select representative sequences for analysis, 44 bivalve F and M COI amino acid sequences were aligned and a maximum likelihood (ML) tree was constructed (, online). One species with both F and M genomes was chosen within each of the three orders, Mytiloida, Veneroida, and Unionoida. No part of the lineages separating the F and M COI sequences through their common ancestor within each of the chosen species overlaps with the corresponding lineages separating F and M genomes in the other species. In other words, the F and M genomes form monophyletic groups within each species (). This approach allows statistically independent comparisons of the F and M genomes, in line with a proposal for the comparative method (). Sequences from the Mytilus edulis group (M. edulis, M. trossulus, and M. galloprovincialis) were not chosen because of the complication of role reversals and rearrangements (); Mytilus californianus was chosen instead. Musculista senhousia was added as a fourth species because it also satisfies the above criterion of nonoverlapping lineages, and its F and M COI sequences are sufficiently diverged to merit inclusion. The eight sequences used in the analysis, with NCBI accession numbers, are M. californianus (Cal) (F: ACV65353, M: ACV65365), M. senhousia (Sen) (F: ACY00212, M: ACY00224), Pyganodon grandis (Gra) (F: ACQ91058, M: ACQ91071), and Ruditapes philippinarum (Phi) (F: BAB83795, M: BAB83782).
The eight bivalve sequences were aligned with a sequence from a known bovine structure (PDB ID: 1V54, ) using T-Coffee (). Terminal regions and a few internal sites with gaps were removed from the alignment. The terminal regions comprise on average 22 amino acids per sequence and are similar in length and amino acid sequence between genomes within species. The resulting alignment is 501 amino acid sites long, 13 sites fewer than the number of amino acids in 1V54 COI.
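The trimming step described above (removing alignment columns that contain gaps) can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the function name `strip_gapped_columns` and the toy residues are hypothetical, and the real alignment was produced with T-Coffee and trimmed by the authors in a way the text does not specify in detail.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Keep only alignment columns in which no sequence has a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    (Hypothetical helper for illustration; not part of T-Coffee.)
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns where every sequence has a residue.
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if not any(char in gap_chars for char in column)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment with made-up residues (NOT the real COI data).
toy = {
    "bovine_1V54": "MFINRWLFST",
    "Cal_F":       "--MNRWLYST",
    "Cal_M":       "-MMNRWLFS-",
}
trimmed = strip_gapped_columns(toy)
for name, seq in trimmed.items():
    print(name, seq)
```

Dropping a column whenever any sequence has a gap removes both the unaligned terminal regions and gapped internal sites in one pass, so every remaining site can be compared across all sequences and mapped onto the reference structure.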
To reconstruct predicted ancestral sequences of the F and M COI sequences within each species, the alignment without gaps was submitted to the FastML server, which implements ML algorithms for this purpose (). [...] To assign the site-specific attributes, the bivalve protein sequences were partitioned into different spatial regions called “structural environments,” which a priori are expected to reflect functional differences. Because the structures of the bivalve proteins are unknown, structural environment was estimated for COI using the known bovine structure 1V54. The program JOY () was used for this purpose, partitioning COI into categories for 11 structural environments. For illustration, consider an environment relating to main chain to main chain hydrogen bonding. Using a known or estimated protein structure, JOY would classify each amino acid residue as belonging to category “True” if it participated in such bonding and category “False” if it did not.
We have given abbreviated names to these JOY environments, which, with a brief description in brackets, are SecStrucPhi (secondary structure and phi angle), SolAcc (solvent accessibility), HmainCO (hydrogen bond from side chain to main chain CO group), HmainNH (hydrogen bond from side chain to main chain NH group), Hotherhet (hydrogen bond to other side chain or nonstandard residue), Hhet (hydrogen bond to nonstandard residue), covhet (covalent bond to nonstandard residue), Hmamide (main chain to main chain hydrogen bond involving NH of specified residue), HmCO (main chain to main chain hydrogen bond involving CO of specified residue), DSSP (secondary structure using DSSP algorithm), and PosPhi (positive phi angle). The environment categories for SecStrucPhi and DSSP refer to participation in different types of secondary structure such as helix or coil.
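To make the idea of category assignment concrete, the sketch below tallies, for one True/False environment, how many aligned sites differ between an F and an M sequence in each category. It is a simplified illustration under stated assumptions: the function name `differences_by_category`, the toy sequences, and the toy category vector are all hypothetical, and the published analysis works with reconstructed ancestral sequences rather than this direct pairwise comparison.

```python
from collections import Counter


def differences_by_category(seq_f, seq_m, categories):
    """Return {category: (differing_sites, total_sites)} for two aligned sequences.

    `categories` holds one label per alignment site, e.g. the True/False
    assignment that a JOY-style environment gives to the reference structure.
    """
    if not (len(seq_f) == len(seq_m) == len(categories)):
        raise ValueError("sequences and category labels must have equal length")
    differing = Counter()
    total = Counter()
    for res_f, res_m, category in zip(seq_f, seq_m, categories):
        total[category] += 1
        if res_f != res_m:
            differing[category] += 1
    return {category: (differing[category], total[category]) for category in total}


# Toy data (made-up residues and labels, not real JOY output).
toy_f = "MNRWLYST"
toy_m = "MNKWLFSA"
toy_hmamide = [True, True, False, True, True, False, False, False]

result = differences_by_category(toy_f, toy_m, toy_hmamide)
for category, (n_diff, n_sites) in sorted(result.items(), reverse=True):
    print(f"category={category}: {n_diff} of {n_sites} sites differ")
```

Because the category labels come from the bovine reference structure, the same label vector can be reused for every bivalve sequence once the alignment has been trimmed to the sites shared with 1V54.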
Further information on the environments is given in , online.
The COI protein was additionally partitioned into categories based on four environments that potentially relate to selective constraint and purifying selection. The first of these is total amino acid diversity per site in the sample of eight sequences. The second and third relate to evolutionary conservation values derived using the programs ConSurf () and FuncPatch (). As an exploratory approach, root mean square distance values for atomic distances within superimposed COI structures including 1V54 were also calculated as the fourth constraint environment (see S1, online), on the basis that higher constraint may be related to lower molecular distances in the superimposed structures. Contrasting categories based on these four environments did not reveal the significant differences that are reported below for the JOY structural environments, so these constraint environments are not considered further.
A dendrogram showing clustering of the JOY structural environments, based on the category assignments over the 501 sites of 1V54 that align with the bivalve sequences, is given in , with correlation values between environments in , online. Some clustering is expected, for example, of SecStrucPhi and DSSP, which both reflect secondary structure. For 7 of the 11 structural environments, JOY gave only small numbers of sites differing between F and M COI proteins in one of the contrasting categories; preliminary analysis of these was not informative, so they were excluded from further analysis. The remaining four structural environments, SecStrucPhi, SolAcc, Hmamide, and HmCO, are the focus of further investigation. To illustrate the spatial location of category regions, these are marked on an image of COI for three of these environments, Hmamide and HmCO in and SolAcc in .
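The correlation between two True/False environments mentioned above can be illustrated with the phi coefficient, which is the Pearson correlation applied to two binary vectors. This is an assumption made for the sketch: the text does not state which correlation measure was used, and the function name `phi_coefficient` and the toy vectors are hypothetical.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length sequences of booleans."""
    if len(x) != len(y):
        raise ValueError("both environments must cover the same sites")
    # Cells of the 2x2 table of joint category assignments.
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("an environment with a single category has no correlation")
    return (n11 * n00 - n10 * n01) / denominator


# Toy category assignments per site (made up, not real JOY output).
toy_hmamide = [True, True, False, True, True, False, False, False]
toy_hmco = [True, True, True, True, False, False, False, False]

phi = phi_coefficient(toy_hmamide, toy_hmco)
print(f"phi = {phi:.2f}")
```

A matrix of such pairwise values, converted to distances (for example 1 minus the correlation), is one common input for hierarchical clustering of the kind a dendrogram displays.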
Hmamide and HmCO are shown together because they share some sites, with a significant but not high correlation (, online). These three environments feature most prominently in the later analysis. The Hmamide and HmCO True category residues are located in a hydrophobic region within the membrane. The False category residues are located toward the aqueous intermembrane space and mitochondrial matrix. These hydrophilic regions have a greater number of side-chain oxygen and nitrogen atoms available for hydrogen bonding with other molecules than the True category regions (). The SolAcc True residues are on the external parts of the protein, whereas the False residues are buried. Confirmation that JOY is effective in generating categories with differences that potentially reflect structure and function was obtained by comparing the amino acid distributions between categories for the pooled data set of eight bivalve sequences (, online). Chi-square contingency analysis reveals highly significant differences in amino acid distribution between categories for the four environments tested. […]
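The contingency analysis described above can be sketched as follows: amino acid counts are pooled across sequences and tabulated by category, and the Pearson chi-square statistic is computed from observed and expected counts. The helper names `amino_acid_table` and `chi_square_contingency` and the toy data are hypothetical, and the sketch stops at the statistic and degrees of freedom; it makes no claim about the software or settings the authors used.

```python
from collections import Counter


def amino_acid_table(sequences, categories):
    """Pool residues over sequences and count them per True/False category."""
    counts = {True: Counter(), False: Counter()}
    for seq in sequences:
        if len(seq) != len(categories):
            raise ValueError("each sequence needs one category label per site")
        for residue, category in zip(seq, categories):
            counts[category][residue] += 1
    residues = sorted(set(counts[True]) | set(counts[False]))
    table = [[counts[cat][res] for res in residues] for cat in (True, False)]
    return residues, table


def chi_square_contingency(table):
    """Return (Pearson chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("rows and columns must have nonzero totals")
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Toy pooled data (made-up residues and labels, not the real COI data).
toy_sequences = ["LLLLKKKK", "LLLIKKKR"]
toy_solacc = [False, False, False, False, True, True, True, True]

residues, table = amino_acid_table(toy_sequences, toy_solacc)
statistic, dof = chi_square_contingency(table)
print(residues, table)
print(f"chi-square = {statistic:.2f}, dof = {dof}")
```

With real data, sparse cells (rare amino acids in a small category) weaken the chi-square approximation, so grouping residues or using an exact or permutation test may be preferable; a p-value can be obtained from the statistic and degrees of freedom with a standard routine such as `scipy.stats.chi2_contingency`.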

Pipeline specifications

Software tools: T-Coffee, FastML, JOY, ConSurf, FuncPatch
Applications: Phylogenetics, Protein structure analysis
Organisms: Homo sapiens
Chemicals: Adenosine Triphosphate, Alanine, Amino Acids