Computational protocol: A three-dimensional topology of complex I inferred from evolutionary correlations

Protocol publication

[…] We included 38 permanent subunits and seven assembly factors of human complex I for which a sufficient number of sequences were available. We collected homologous sequences from the nr database using PSI-BLAST (default parameters). Multiple queries from different species were used whenever PSI-BLAST failed to find known homologs [see Additional file]. For A6, B9, A12, and AF2, orthologous groups were manually identified in neighbor-joining trees constructed with identity matrices and correcting for multiple substitutions. Species overlap between the partitions was used to divide the trees into separate orthologous groups. The remaining subunits were treated by a different protocol. First, to ensure a separation of the paralogs NADH2, 4, and 5, we built a set of trusted orthologs of NADH2, 4, and 5 from those sequences that had the best bi-directional hit with the human query using PSI-BLAST. From these seed sequence sets we computed three HMM profiles and assigned each remaining homolog to the orthologous group whose profile it aligned to best.

For all sequence sets, we selected a single ortholog per species: the sequence with the highest NEEDLE score in a pairwise alignment to the human query (default parameters), and/or a sequence chosen manually based on multiple alignments (MAFFT, CLUSTALW, HMMER, HHSEARCH). The kinetoplastida were excluded from our analysis due to their high level of sequence divergence. To obtain high-quality alignments, we aligned all sequence sets with CLUSTALW and manually fixed misalignments. The manually curated alignments are provided in Additional file.

Next, we filtered alignment columns with BMGE (-m BLOSUM30 -g 0.50 -b 4), removed sequences that had more than 33% gaps, and restricted the alignments to those species for which we found at least eight subunits of the complex. Of the 43 alignments, 39 had more than 75 sequences, and no alignment had fewer than 44 sequences.
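The two sequence-level filters described above (dropping sequences with more than 33% gaps, then keeping only species represented in at least eight subunits) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the nested-dictionary layout, and the choice to count subunits per species after the gap filter are all assumptions.

```python
def gap_fraction(seq):
    """Fraction of positions in an aligned sequence that are gaps ('-')."""
    return seq.count("-") / len(seq) if seq else 1.0


def filter_alignments(alignments, max_gap_fraction=0.33, min_subunits=8):
    """Apply the gap filter and the species-coverage filter.

    `alignments` is a hypothetical layout: {subunit: {species: aligned_sequence}}.
    """
    # Step 1: drop sequences whose gap fraction exceeds the threshold.
    kept = {
        subunit: {sp: s for sp, s in seqs.items()
                  if gap_fraction(s) <= max_gap_fraction}
        for subunit, seqs in alignments.items()
    }

    # Step 2: count in how many subunit alignments each species still appears
    # (assumption: counting happens after the gap filter).
    counts = {}
    for seqs in kept.values():
        for sp in seqs:
            counts[sp] = counts.get(sp, 0) + 1
    allowed = {sp for sp, n in counts.items() if n >= min_subunits}

    # Step 3: restrict every alignment to the species that pass.
    return {
        subunit: {sp: s for sp, s in seqs.items() if sp in allowed}
        for subunit, seqs in kept.items()
    }
```

With the defaults, a species survives only if it has an acceptable sequence in at least eight subunit alignments; lowering `min_subunits` is convenient for trying the function on toy data.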
Finally, we calculated phylogenetic trees using RAXML (Version 7.2.6, PROTGAMMAMTREV for NADH1/2/3/4/4L/5/6, otherwise PROTGAMMAJTT; 4 rate categories) [see Additional file]. A single tree was calculated for the concatenated alignment of NADH3, 4L, and 6. [...]

The structure of complex I in T. thermophilus served as our reference (PDB:3M9S). We approximated the mass centers of the subunits as the average of the x, y, and z coordinates of their Cβ atoms (Cα for glycine). The evolutionary configuration was fitted by rotation and isometric scaling on the bacterial configuration using generalized Procrustes analysis as implemented in the function GPA of the R package FactoMineR (Version 1.14). We quantified the difference between the bacterial configuration T and the evolutionary configuration C of n subunits by their root mean square deviation (rmsd)

$$\mathrm{rmsd}(T, C) = \sqrt{\frac{\sum_{i=1}^{n} |t_i - c_i|^2}{n}} \tag{3}$$

where |t_i - c_i| is the distance between the bacterial and predicted center of the i-th subunit. […]
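The mass-center approximation and the rmsd of equation (3) can be illustrated with a short, standard-library-only sketch. It assumes the two configurations have already been superimposed (the Procrustes fit itself is not reproduced here), and the function names and the tuple-based coordinate layout are illustrative assumptions rather than part of the original protocol.

```python
import math


def mass_center(coords):
    """Approximate a subunit's mass center as the mean x, y, z of its atoms.

    `coords` is a list of (x, y, z) tuples, e.g. the Cβ atoms of one subunit
    (Cα for glycine), as described in the text.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one atom")
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def rmsd(T, C):
    """Root mean square deviation between two matched sets of centers.

    T and C are equal-length lists of (x, y, z) tuples; entry i of each list
    must refer to the same subunit. Assumes C was already fitted onto T.
    """
    if len(T) != len(C) or not T:
        raise ValueError("T and C must be non-empty and of equal length")
    # Sum of squared Euclidean distances |t_i - c_i|^2 over all subunits.
    squared = sum(
        sum((t[k] - c[k]) ** 2 for k in range(3)) for t, c in zip(T, C)
    )
    return math.sqrt(squared / len(T))
```

For example, two subunits whose centers are displaced by 5 and 0 units give an rmsd of sqrt(25 / 2), since the squared distances are averaged over n before the root is taken.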

Pipeline specifications

Software tools: BLASTP, MAFFT, Clustal W, HMMER, HHSearch, BMGE, RAxML, FactoMineR
Applications: Miscellaneous, Phylogenetics, Amino acid sequence alignment
Organisms: Thermus thermophilus
Chemicals: NAD, Ubiquinone