Computational protocol: Hypothetical in silico model of the early-stage intermediate in protein folding

[…] The testing subset of 250 protein chains was chosen randomly from a nonredundant set of protein structures from the PDB. The nonredundant teaching set was selected from the PDB on the basis of data obtained in December 2011, using the BLASTClust tool to retain protein sequences with sequence identity not higher than 95 %. The testing subset constitutes 1 % of the whole nonredundant database of proteins. The teaching set did not contain any proteins belonging to the testing subset. [...] Convergence of both algorithms (step back and step forward) was assessed using Chi-square testing as well as analysis of the RR (relative risk) [], OR (odds ratio) [] and D (distance) [] parameters. Discrepancies between various structural models can be explained by analyzing the dependencies between correct (or incorrect) simulation results and the involvement of individual residues in interactions with external molecules (e.g., ligands or other proteins). A sample table which expresses these dependencies is shown below (Table ). External interaction is defined as the engagement of a particular residue in the complexation of a ligand (protein, ion, nucleic acid). This identification is based on PDBsum standards (distance criterion: distance below 4 Å) []. A Chi-square test was applied to assess the dependencies listed above. Statistically significant values of the Chi-square statistic (p < 0.05) indicate dependencies, which are treated as effects of external interactions upon the conformation of a given amino acid. All relevant calculations were performed using the Statistica package []. […]
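The dependency analysis described above can be illustrated with a minimal sketch in Python. It is not the Statistica procedure used in the original work. It assumes a 2x2 contingency table whose rows are "residue engaged in external interaction" (yes/no) and whose columns are "conformation predicted incorrectly / correctly"; this layout, the function names and the counts in the usage example are hypothetical. The sketch computes the Pearson Chi-square statistic without continuity correction, its p-value for one degree of freedom, and the RR and OR parameters. The D (distance) parameter is omitted because its definition is not given in this excerpt.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def relative_risk(a, b, c, d):
    """Risk of an incorrect prediction in row 1 divided by the risk in row 2."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of an incorrect prediction in row 1 divided by the odds in row 2."""
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study):
    #                        incorrect  correct
    # external interaction       30        70
    # no external interaction    20       180
    table = (30, 70, 20, 180)
    chi2, p = chi_square_2x2(*table)
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}, dependent at 0.05: {p < 0.05}")
    print(f"RR = {relative_risk(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With this layout, an RR or OR above 1 together with p < 0.05 would suggest that residues engaged in external interactions are more often predicted incorrectly. Zero counts in the second row or the off-diagonal cells make RR or OR undefined and would need separate handling.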

Pipeline specifications

Software tools: BLASTClust, Statistica
Databases: PDBsum
Applications: Miscellaneous, Protein sequence analysis