Computational protocol: Dynamic clustering threshold reduces conformer ensemble size while maintaining a biologically relevant ensemble

Similar protocols

Protocol publication

[…] The workflow employed in this study is illustrated in Fig. . The bound conformations of the ligands were obtained from their co-crystallized complexes in the Protein Data Bank (PDB). These structures had been employed previously to investigate the relative energies of the bound conformations of drug-like molecules []. Two structures (1RO9 and 3CPA) were removed from the list, because of questionable B-factors []. In addition, eight pairs of identical ligands that crystallized in dissimilar conformations in different receptors were included. A subset of 65 ligands was selected to represent the entire range of rotatable bonds reported for some drug-like compounds (Tables , ) []. Hydrogen atoms were added to the crystal structures of the ligands employing the AddH tool of Chimera version 1.4.1 [], and each ligand inspected visually for structural consistency. The positions of hydrogen atoms were optimized, while the heavy atoms were fixed utilizing the default minimization criteria in Chimera version 1.4.1 (100 steps, stepsize 0.02 Å, update interval 10, Gasteiger charges). Finally, each structure was minimized in the Molecular Operating Environment version 2008.10 (MOE) program [] employing heavy-atom positional constraints that are related to atomic B-factors and the temperatures at which the crystals were solved. This treatment of the dataset was performed to take into consideration high B-factors that can lead to inaccurate fitting of ligand atoms. Details of this approach have been published elsewhere []. Briefly, this method takes advantage of the notion that atoms with low B-factors have well-resolved electron densities, therefore, their positions are well-defined by the experimental coordinates and may not require further adjustments. However, high B-factors indicate high atomic mobility and positional uncertainties. Thus, in minimizing the bioactive structures the positional constraints are higher on atoms with well-defined atomic coordinates compared to those with poorly-defined coordinates. As a result atoms with low B-factors would be relatively stationary, while the positions of atoms with high B-factors would move presumably to their optimal positions. (In this study superposing pre- and post-minimized bioactive conformations did not lead to any significant changes in compound geometry, see the Results section and Table S1). Additional factors, such as protein environment, explicit solvent effects, etc. are not considered in this process.Fig. 1Next, the computational 3D models were built from scratch, and minimized employing the MMFF94x force field and default parameters in MOE. Four conformer sets were generated from these initial conformers. The first set of conformers (omega) was generated utilizing default OMEGA version 2.3.1 parameters except for the following: rms = 0.4; ewindow = 25.0 kcal/mol, maxconfs = 500; searchff = mmff94s_noestat. The initial number of conformers generated was 50,000, specified via the maxconfgen parameter. The rms parameter sets a lower limit for filtering similar conformers; maxconfs determines the final number of conformers to be retained from the initial ensemble requested via maxconfgen; the searchff specifies the force field employed to compute internal energies during conformer search; and ewindow sets an upper limit for retaining the generated conformers. The incomplete force field, mmff94s_noestat, was employed in order to neglect intramolecular gas-phase interactions that could lead to collapsed conformations, given that bound conformers are generally more extended than unbound conformers [, ]. The OMEGA parameters were employed because they have been shown to be optimal in terms of reproducing the bound conformations of ligands []. In addition, the maxconfs limit of 500 was set because the clustering algorithm scales as O(n2), see below. The OMEGA-generated conformers were translated to the same coordinate frame of reference as the bioactive conformer employing a rigid-body superposition with the ROCS version 2.3.1 program []. The second set (nmrclust) was generated by clustering the OMEGA-generated conformers of each molecule using the NMRCLUST algorithm in the Chimera command line interface, which employs the Kelley penalty function [] to determine an optimal number of clusters. Utilizing the NMRCLUST algorithm avoids subjective inputs of pre-defined intra-cluster cut-offs or spreads, by selecting the number of clusters that minimizes a penalty function during hierarchical clustering of an RMS distance matrix, D(i, j) employing the average-linkage method. The average-linkage method performed best for this type of studies compared to single or complete linkage []. For each hierarchy a penalty function is determined using the number of clusters and the average spread of the clusters. The hierarchy that gives the minimum value of the penalty function is selected to represent the optimum number of clusters for the conformer ensemble. Briefly, a distance matrix consisting of heavy-atom pairwise RMSDs for an ensemble of structures is generated. Next, hierarchical clustering is performed with the matrix using the average-linkage method:for clusters m and n with X and Y members, respectively, and dist(i, j) the RMS between the superimposed i and j from m and n, respectively [].In the course of the clustering, the average spread is determined at each stage using the spreads determined by: []for cluster m containing N members, with conformers i and k; by definition, clusters that contain only one member (singletons or N = 1) are excluded in the calculation of the spread. The average spread is computed by: []where i is a given hierarchy, and cnumi the number of clusters at that hierarchy. The average spreads are then normalized with values between one and (NT − 1), whereby NT is the total number of structures in the ensemble as follows: []Max(AvSp) and Min(AvSp) denote the maximum and minimum average spreads, respectively, in the set across all the stages of the clustering. This results in equal weights in the average spreads and number of clusters in a penalty function that is computed as the sum of the normalized average spread at a given hierarchy and the corresponding number of clusters (including singletons). The penalty scores are then stored as a function of the number of clusters and the average normalized spreads: []The number of clusters that corresponds to the minimum penalty score defines the cut-off for the ensemble. This cut-off represents the stage wherein the clusters are as highly populated as possible, while concurrently minimizing the spread. After this analysis, a structure closest to the centroid of each cluster is selected as the representative structure. The third set of structures (rms) was generated by altering the value of the rms parameter in OMEGA in order to obtain a comparable number of conformers as the number of representative structures identified by NMRCLUST. Finally, the fourth set of structures (rms_avg) was generated by partitioning the dataset in terms of number of rotors: low, having between one and four rotatable bonds; medium, possessing between five and nine rotatable bonds; and high, with ten to 15 rotatable bonds. The rms-filtering cutoffs employed in set three for the compounds in each category were averaged and employed to generate conformers for each molecule in the rms_avg set.The RMSDs between the computer-generated structures and their bioactive conformations were computed for each multi-model file, utilizing the g_rms module of GROMACS [], and the RMSD statistics (average, standard deviation, minimum and maximum values) were extracted (see Table ). Perl scripts were written for the automation of the conformer generation, ROCS overlays, and RMSD analyses procedures. […]

Pipeline specifications

Software tools ROCS, GROMACS
Application Drug design