Computational protocol: Accurate Prediction of Ligand Affinities for a Proton-Dependent Oligopeptide Transporter

Protocol publication

[…] To determine which computational methods best predict the binding of di- and tripeptides to PepTSt, we needed a test set of peptides and a selection of computational methods to validate. For the test set, we chose seven dipeptides and one tripeptide (triAla) for which experimental transport data were available. Crystal structures of one dipeptide (AlaPhe) and one tripeptide (triAla) bound to PepTSt are known (PDB: 4D2C and 4D2D, respectively). The pose of the other six dipeptides was assumed to be the same as that of AlaPhe.

We then selected a range of computational methods for calculating binding free energies, to be validated against the test set. The methods can be categorized by the amount of computational resource each requires (panel D). At the low end is the structure-based scoring function found in AutoDock Vina. Next we chose three endpoint methods: the linear interaction energy (LIE), molecular mechanics generalized Born surface area (MMGBSA), and molecular mechanics Poisson-Boltzmann surface area (MMPBSA). All of these require some molecular dynamics (MD) simulation and hence are more expensive. Finally, we selected a theoretically exact method, thermodynamic integration (TI), to calculate differences in binding free energies (ΔΔG) and so refine the other predictions.

Experimental binding data for PepTSt, and for POT transporters in general, remain scarce, and the standard method for estimating affinities is to perform competition transport assays and measure the half-maximal inhibitory concentration (IC50). Unlike certain enzymes, however, transporters have more complicated kinetics, such that the relationship between IC50 and ΔG is unclear.
We therefore compared the predicted and experimental data qualitatively using Spearman's correlation coefficient, ρ, which assesses the ability of each computational approach to reproduce the ranking of substrates based on experimental IC50 values.

Since it does not require any MD simulation, the scoring function is the fastest method for estimating binding affinities. Our results with AutoDock Vina (panel A), however, show that it is also the least accurate (ρ = 0.43). This is not surprising, as AutoDock Vina uses a simplified scoring function. Although not tested in this study, other scoring functions may produce better predictions for peptide transporters, as no single docking program performs best across all protein families. In addition, AutoDock Vina does not account for conformational sampling of the ligand or of the binding-site residues, as it uses only one snapshot of the protein-ligand complex for its calculation. This might be improved by using multiple conformers of the complex, for example from MD simulations, as has been done with several other membrane transporters. As the binding of every peptide in the test set was modeled using the same structure, AutoDock Vina predicted very similar ΔG values for all of them. For each dipeptide, the range of ΔG values predicted for the nine poses generated is small (∼0.5 kcal/mol), although the score for the pose most similar to the crystal structure or homology model is not always the highest. We therefore conclude that AutoDock Vina does not accurately predict peptide-binding affinities for PepTSt.

Encouragingly, all three endpoint free energy methods ranked the peptide test set well (panel B) compared with the experimental data (ρ ≈ 0.7). The predicted ΔG values for the eight peptides span a wider range than the docking scores, allowing us to better distinguish the well-transported peptides (PhePhe, AlaPhe, AlaAla, and AlaTyr) from the poorly transported ones (triAla and GluGlu).
We assume that this increase in accuracy results primarily from using an ensemble of conformations generated during the MD simulation, which accounts for conformational sampling of the ligand and the protein. As endpoint methods require only simulations of the bound and unbound states, the computational cost of each calculation is relatively modest, and they are therefore suitable candidates for a high-throughput workflow to differentiate high- from low-affinity peptides.

Upon closer examination, however, we found that the endpoint methods ranked peptides with similar IC50 values poorly; for example, the ρ value of the MMPBSA method for hydrophobic dipeptides with IC50 ≤ 100 μM is 0.0, i.e., no better than random. We hypothesized that the more rigorous method, TI, might improve the ranking of AlaPhe, AlaAla, AlaTyr, and PhePhe by calculating the change in ΔG (ΔΔG) when the amino acids in AlaAla were mutated into either Phe or Tyr. These values were subsequently added to the results of the endpoint methods. With this step, we improved the prediction and reproduced the exact experimental ranking (ρ = 1.0) (panel C). We acknowledge, however, that with so few data points the apparently higher correlation with the experimental data may be artifactual.

To quantify and compare the resources required by each prediction method, computational usage in single-CPU hours (CPUh) was estimated from the performance of GROMACS for MD simulation on an Intel quad-core Xeon processor (panel D). Unsurprisingly, the more computational effort a method requires, the more accurate its predictions become. The endpoint methods offer a good compromise between performance and cost.
We therefore conclude that it is most efficient to adopt a hybrid approach: using endpoint methods to broadly classify the ligands into high- and low-affinity substrates and then applying TI where necessary to further refine specific predictions. […]
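The hybrid approach can be sketched as a two-stage procedure. The sketch below rests on stated assumptions: the class and function names, the ΔG cutoff, and all numbers are hypothetical placeholders, and "adding" a TI result is interpreted here as anchoring on the endpoint ΔG of the reference ligand and adding the TI ΔΔG for the mutation into each target. The excerpt does not spell out the exact bookkeeping, so treat this as one plausible reading.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    name: str
    endpoint_dg: float  # kcal/mol, from an endpoint method (e.g. MMPBSA)


def classify(ligands, cutoff):
    """Stage 1: split ligands into high- and low-affinity sets by endpoint dG.

    `cutoff` is a hypothetical threshold in kcal/mol; more negative means
    tighter predicted binding.
    """
    high = [lig for lig in ligands if lig.endpoint_dg <= cutoff]
    low = [lig for lig in ligands if lig.endpoint_dg > cutoff]
    return high, low


def refine_with_ti(reference, ti_ddg):
    """Stage 2: dG(target) = dG_endpoint(reference) + ddG_TI(reference -> target).

    `ti_ddg` maps each target ligand name to its TI-derived ddG relative to
    the reference ligand (assumed interpretation, see lead-in).
    """
    refined = {reference.name: reference.endpoint_dg}
    for target, ddg in ti_ddg.items():
        refined[target] = reference.endpoint_dg + ddg
    return refined


# Placeholder inputs (NOT from the study).
ligands = [
    Ligand("lig_ref", -8.0),
    Ligand("lig_b", -8.1),
    Ligand("lig_c", -7.9),
    Ligand("lig_d", -3.0),
]
high, low = classify(ligands, cutoff=-6.0)

# Only the closely spaced high-affinity ligands are passed on to the
# expensive TI step; the values below stand in for its output.
refined = refine_with_ti(ligands[0], {"lig_b": -1.5, "lig_c": 0.5})
ranking = sorted(refined, key=refined.get)  # tightest binder first
```

The design choice mirrors the cost argument in the text: the cheap endpoint pass runs on every ligand, and TI runs only on the subset whose endpoint scores are too close to rank reliably.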

Pipeline specifications

Software tools: AutoDock Vina, GROMACS
Databases: C-It
Application: Protein interaction analysis