Protocol publication

[…] In the recent past, there have been several computational efforts to identify residues in contact by analyzing correlated substitution patterns in a multiple sequence alignment of a protein. DCA, PSICOV, GREMLIN, SCA and EVfold analyze co-variation matrix data from a multiple sequence alignment to deduce residues in contact. The methods rank residue pairs based on a co-variation or correlation score specific to each method. The top-ranked pairs, typically the top L/2 pairs, where L is the length of the protein sequence, are predicted to be in contact. The methods perform well when the multiple sequence alignment is large, that is, when it contains more than 5L sequences. DgkA (121 residues per protomer) exhibits large sequence diversity (4175 sequences in the multiple sequence alignment). Putative contact predictions by the above methods were analyzed by calculating sidechain-sidechain centroid distances between the predicted pairs using both X-ray and NMR structures of DgkA. Some high-scoring, co-varying pairs predicted by DCA, GREMLIN, PSICOV and EVfold were found to be true contacts (centroid-centroid distance <7 Å) when mapped onto the crystal structure. A few high-scoring pairs were either far apart in the X-ray structure (predictions by PSICOV) or were in proximity when analyzed with the NMR structure (predictions by GREMLIN, PSICOV and EVfold). Overall, however, the sequence co-variation data are more consistent with the X-ray structure, in agreement with conclusions from suppressor mutagenesis. Of the six contacts identified from our suppressor analyses, three (62–41, 67–104, 68–100) were predicted in the top L/2 co-varying pairs by GREMLIN, PSICOV and EVfold; only 67–104 was predicted by DCA, and none were predicted by SCA.
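The evaluation described above (keep the top L/2 scored pairs, then call a pair a contact when the side-chain centroids lie within 7 Å) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the dictionary-based input format and the toy coordinates are all assumptions made for the example. Only the thresholds (L/2, 5L, 7 Å) and the DgkA and CcdB alignment sizes come from the text.

```python
import math


def msa_is_deep_enough(n_sequences, length, factor=5):
    """Rule of thumb quoted in the text: more than factor * L sequences."""
    return n_sequences > factor * length


def top_pairs(scores, length):
    """Return the L/2 highest-scoring residue pairs from a {(i, j): score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: length // 2]


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))


def evaluate_predictions(scores, side_chains, length, cutoff=7.0):
    """Label each top-L/2 pair as a contact if its centroids lie within cutoff (in Å)."""
    results = []
    for i, j in top_pairs(scores, length):
        distance = math.dist(centroid(side_chains[i]), centroid(side_chains[j]))
        results.append(((i, j), distance < cutoff))
    return results


# Alignment sizes and protomer lengths quoted in the text.
print("DgkA alignment deep enough:", msa_is_deep_enough(4175, 121))
print("CcdB alignment deep enough:", msa_is_deep_enough(350, 101))

# Toy scores and side-chain coordinates, invented purely for illustration.
toy_scores = {(1, 3): 0.9, (2, 4): 0.7, (1, 4): 0.2, (2, 3): 0.1}
toy_side_chains = {
    1: [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    2: [(0.0, 10.0, 0.0)],
    3: [(4.0, 0.0, 0.0)],
    4: [(0.0, 30.0, 0.0)],
}
print(evaluate_predictions(toy_scores, toy_side_chains, length=4))
```

In practice the scores would be parsed from the output of the chosen co-variation method and the coordinates from a structure file. Passing a single Cα coordinate per residue turns the same function into a Cα-Cα distance check, with whatever cutoff is appropriate for that measure.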
Taken together, these comparisons suggest that natural sequence co-variation and suppressor mutagenesis can provide complementary information.

[Figure 7 (DOI: 10.7554/eLife.09532.022); Figure 7, figure supplement 1 (DOI: 10.7554/eLife.09532.023)]

The co-variation prediction methods use a multiple sequence alignment as input, so the predictions are not specific to the side-chain identities of the residues present at the predicted contact positions in the sequence of interest. We therefore also analyzed the predictions by calculating the Cα-Cα distances between the predicted pairs using both X-ray and NMR structures. No side-chain information is involved in these calculations. The results were similar to those obtained with sidechain-sidechain centroid distances. Co-variation prediction becomes increasingly challenging for proteins with very few homologs. CcdB (101 residues per protomer) has only 350 sequences (<5L, where L is the length of the protein) in the multiple sequence alignment. Therefore, co-variation predictions for CcdB were not included.

[...] There is considerable interest in accurate prediction of mutational effects on the free energy of folding. We therefore examined whether ΔΔG calculations could be used to rationalize the identity of the experimentally observed local suppressors. To this end, the difference in stability between each (parent inactive mutant, suppressor) pair and the corresponding parent inactive mutant was calculated for CcdB mutants. This quantity, $\Delta\Delta G_{\text{folding}} = \Delta G_{\text{folding}}^{\text{double mutant}} - \Delta G_{\text{folding}}^{\text{parent inactive mutant}}$, was calculated using Rosetta v3.3. Putative proximal suppressors were considered to arise at residues within 7 Å (sidechain–sidechain centroid distance) of the parent inactive mutant. Many stabilizing substitutions were predicted ($\Delta\Delta G_{\text{folding}} < 0$). However, amongst the six experimentally identified stable compensatory pairs, only L36A/M63L (−3.7 kcal/mol) was predicted to be stable.
The remaining five contact pairs were predicted to be either marginally stable or unstable. Several other mutations besides the experimentally determined ones were predicted to be stabilizing, for example V5F/L16G, V18W/I90A, V20F/I90A, L36A/V54I and L83S/V18I. These might be present in the earlier rounds of sorting but are lost in later rounds due to stringent sort conditions. A marked bias for aromatic substitutions was observed in the predictions (substitutions underlined in magenta), though such aromatic substitutions were not observed experimentally. Aromatic substitutions are rigid and were found to overpack the cavity created by the parent inactive mutants in the models generated using Rosetta. Further, several of the mutations that were computationally predicted to be highly stabilizing are unlikely to be so, as they are not complementary in size to the original parent inactive mutant, for example L36A/W61F, V5F/L16Y, V18W/I90F and V20F/I90F. If aromatic substitutions are excluded, Rosetta predictions using ΔΔG values are in reasonable qualitative agreement with experiment.

[Figure 8 (DOI: 10.7554/eLife.09532.024); Figure 8, figure supplement 1 (DOI: 10.7554/eLife.09532.025)]

A similar analysis was done using FoldX. However, these predictions were in poorer agreement with the experimental results than those of Rosetta. Thus, in addition to their use in protein structure prediction, results from such suppressor analyses can also be used to benchmark and improve computational approaches to predict mutational effects on protein stability. […]
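The ΔΔG bookkeeping described above can be sketched as follows: subtract the folding free energy of the parent inactive mutant from that of each double mutant, keep candidates with ΔΔG < 0, and optionally drop aromatic substitutions. This is a minimal sketch under stated assumptions, not the Rosetta or FoldX protocol itself. The function names are hypothetical, the ΔG values are invented placeholders rather than results from the paper, and treating F, W and Y as the aromatic set is an assumption based on the examples in the text. The ΔG values themselves would come from a stability predictor run separately.

```python
# Assumed aromatic set, inferred from the examples in the text (F, W, Y).
AROMATIC = {"F", "W", "Y"}


def ddg_folding(dg_double_mutant, dg_parent_mutant):
    """ddG_folding = dG_folding(double mutant) - dG_folding(parent inactive mutant)."""
    return dg_double_mutant - dg_parent_mutant


def stabilizing_suppressors(dg_parent, dg_doubles, exclude_aromatic=False):
    """Return (suppressor, ddG) pairs with ddG < 0, most stabilizing first.

    dg_doubles maps a suppressor mutation such as "M63L" to the predicted
    dG of the (parent inactive mutant, suppressor) double mutant.
    """
    hits = []
    for suppressor, dg_double in dg_doubles.items():
        new_residue = suppressor[-1]  # last letter is the substituted residue
        if exclude_aromatic and new_residue in AROMATIC:
            continue
        ddg = ddg_folding(dg_double, dg_parent)
        if ddg < 0:
            hits.append((suppressor, ddg))
    return sorted(hits, key=lambda item: item[1])


# Placeholder energies in kcal/mol, invented for illustration only.
toy_parent_dg = -10.0
toy_double_dgs = {"M63L": -12.5, "W61F": -14.0, "V54I": -9.0}

print(stabilizing_suppressors(toy_parent_dg, toy_double_dgs))
print(stabilizing_suppressors(toy_parent_dg, toy_double_dgs, exclude_aromatic=True))
```

Keeping the aromatic filter as an explicit flag makes it easy to compare the predicted set with and without aromatic substitutions, which is the comparison the text reports.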

Pipeline specifications

Software tools: PSICOV, GREMLIN, FoldX
Application: Protein structure analysis
Organisms: Escherichia coli, Dipturus trachyderma