Computational protocol: Structural alphabets derived from attractors in conformational space

Protocol publication

[…] The suitability of the proposed Structural Alphabet for analysing protein dynamics was further tested by investigating both the robustness of the fragments to small fluctuations and their ability to describe conformational transitions. To limit the computational effort, the analysis was performed on a set of 24 proteins. The conformational space of each protein was explored with the tCONCOORD method [-]. This method provides a more accurate model than the Gaussian Network Model (GNM), because it uses an all-atom representation of the system and allows anharmonic atomic motions, yet it remains simpler and faster than Molecular Dynamics (MD) simulations. In tCONCOORD, ensembles of structures are generated so that they fulfil a set of distance constraints between atom pairs. The permitted distance intervals are determined from the distances found in the starting structure and from the type of interaction (e.g. covalent bonds, hydrogen bonds, salt bridges or hydrophobic interactions), with tighter tolerances for stronger interactions. All the contacts in the original structure are preserved, except for 'under-wrapped' hydrogen bonds [,], which are considered unstable because they are not sufficiently shielded from the environment by hydrophobic groups. It has been shown that detecting unprotected hydrogen bonds, together with a calibrated definition of the distance constraints, allows conformational transitions to be predicted [].
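The constraint idea can be illustrated with a minimal Python sketch, assuming NumPy. It only builds distance intervals around the reference distances and checks whether a conformation satisfies them; it does not reproduce tCONCOORD's structure-generation procedure. The tolerance values, the `build_constraints` and `satisfies` helpers and the `under_wrapped` exclusion set are hypothetical illustrations, not tCONCOORD parameters.

```python
import numpy as np

# Hypothetical tolerances (nm) per interaction type: stronger interactions
# get tighter intervals. Illustrative values only, NOT tCONCOORD's.
TOLERANCE = {
    "covalent": 0.002,
    "hbond": 0.02,
    "salt_bridge": 0.04,
    "hydrophobic": 0.08,
}

def build_constraints(coords, pairs, under_wrapped=frozenset()):
    """Return (i, j, lower, upper) intervals centred on reference distances.

    pairs: iterable of (i, j, kind); pairs listed in under_wrapped
    (hypothetical set of unstable H-bonds) are dropped, i.e. not preserved.
    """
    constraints = []
    for i, j, kind in pairs:
        if (i, j) in under_wrapped:
            continue
        d0 = float(np.linalg.norm(coords[i] - coords[j]))
        tol = TOLERANCE[kind]
        constraints.append((i, j, d0 - tol, d0 + tol))
    return constraints

def satisfies(coords, constraints):
    """True if every constrained atom pair lies inside its permitted interval."""
    for i, j, lo, hi in constraints:
        d = float(np.linalg.norm(coords[i] - coords[j]))
        if not (lo <= d <= hi):
            return False
    return True
```

For example, with three atoms at x = 0, 0.153 and 0.5 nm and the pairs `[(0, 1, "covalent"), (1, 2, "hbond"), (0, 2, "hydrophobic")]`, the reference itself satisfies all intervals, while shifting the third atom by 0.05 nm breaks the tight hydrogen-bond interval but not the looser hydrophobic one.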
Moreover, although the molecular description is less accurate than that provided by the force fields generally used in MD simulations and the solvent is not represented explicitly, the collective motions and the overall RMSF profiles extracted from tCONCOORD ensembles have been found to be in good agreement with both MD and experimental results [,,,].
The test set of 24 proteins was extracted from a larger dataset of proteins annotated in the PiSite database [], which the authors are currently using to study the role of flexibility in protein-protein interactions. To avoid biases due to the over-representation of particular secondary structure types or folds, the first four SCOP classes (α, β, α/β and α + β) were equally represented and each fold was included only once (see Table ). Moreover, within each group of six proteins belonging to the same SCOP class, the proteins were chosen so that both the total number of residues and the ratio between structured (H, G, I, E and B in the DSSP [] dictionary) and unstructured (T, S and unassigned) regions covered a wide range (see Table ). After full protonation and energy minimisation with the GROMACS 3.3.3 package [] and the OPLS-AA force field [], tCONCOORD ensembles of 500 structures were generated for each protein.
A 'per-fragment' flexibility profile was obtained for each protein of N residues by calculating the Cα RMSFs over N-3 sliding windows of 4 residues. Roto-translational motion was removed by least-squares superposition of the fragment in each frame onto the reference starting structure. The value assigned to each window was the quadratic mean of the RMSFs of the Cα atoms in the fragment.
For comparison, the structures in each protein's ensemble were encoded into structural strings using both the local and the global fit procedures described previously (see "Assessment by fit quality").
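The per-fragment flexibility profile can be sketched in Python with NumPy, under stated assumptions: Cα coordinates are given as an `(n_frames, N, 3)` array, the least-squares fit is done with the Kabsch algorithm, and each atom's RMSF is taken around its ensemble-average position after fitting. The function names `superpose` and `per_fragment_rmsf` are hypothetical, and this is not the authors' code.

```python
import numpy as np

def superpose(mobile, target):
    """Least-squares (Kabsch) fit of mobile onto target; both are (n, 3) arrays."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    p, q = mobile - mc, target - tc
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so that r is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ r.T + tc

def per_fragment_rmsf(ensemble, reference, window=4):
    """Quadratic-mean Cα RMSF for each of the N - window + 1 sliding fragments.

    ensemble: (n_frames, N, 3) Cα coordinates; reference: (N, 3) starting structure.
    """
    n_res = reference.shape[0]
    profile = np.empty(n_res - window + 1)
    for w in range(n_res - window + 1):
        ref_frag = reference[w:w + window]
        # Remove roto-translation: fit each frame's fragment onto the reference.
        fitted = np.stack([superpose(frame[w:w + window], ref_frag)
                           for frame in ensemble])
        # Per-atom RMSF around the ensemble-average position (assumption).
        mean_pos = fitted.mean(axis=0)
        rmsf = np.sqrt(((fitted - mean_pos) ** 2).sum(axis=2).mean(axis=0))
        # Quadratic mean (root mean square) of the per-atom RMSFs in the window.
        profile[w] = np.sqrt(np.mean(rmsf ** 2))
    return profile
```

With `window=4` the profile has N-3 entries, one per sliding fragment. An ensemble made only of rigidly rotated and translated copies of the reference yields a profile of (numerically) zero, because the superposition removes all of that motion.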
Unlike the encoding of a single structure, in an ensemble a given fragment position can generally be described by different letters. The letter variability per fragment was evaluated through the Shannon entropy []:
$$H_i = -\sum_{j=1}^{k} p_{ij} \log p_{ij} \qquad (9)$$
where $p_{ij}$ is the fraction of structures in which fragment $i$ was encoded by letter $j$ and $k$ is the total number of letters in the alphabet. […]
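A minimal Python sketch of the per-fragment entropy, assuming the ensemble has already been encoded into equal-length structural strings (one character per fragment). The natural logarithm is an assumption, since the base only rescales the values; letters that never occur at a position contribute nothing, following the usual convention 0 log 0 = 0. The helper name `fragment_entropy` is hypothetical.

```python
import math
from collections import Counter

def fragment_entropy(strings):
    """Per-position Shannon entropy (natural log) of letters across an ensemble.

    strings: list of equal-length structural strings, one per ensemble member.
    """
    n_structs = len(strings)
    length = len(strings[0])
    assert all(len(s) == length for s in strings), "strings must have equal length"
    entropies = []
    for i in range(length):
        # Count how often each letter encodes fragment i across the ensemble.
        counts = Counter(s[i] for s in strings)
        h = 0.0
        for c in counts.values():
            p = c / n_structs  # p_ij: fraction of structures with letter j at i
            h -= p * math.log(p)
        entropies.append(h)
    return entropies
```

For the toy ensemble `["ABCD", "ABCE", "ABDE", "ABDF"]`, the first two positions are fully conserved (entropy 0), the third splits evenly between two letters (ln 2), and the fourth gives 1.5 ln 2. The maximum possible value at a position is log k, reached when all k letters are equally frequent.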

Pipeline specifications

Software tools: tCONCOORD, GROMACS
Application: Protein structure analysis