## Protocol publication

[…] A set of 43 evolutionary and structural parameters, presented below, was used to characterize an interface residue. All of them are calculated using only the structure of the protein to which the residue belongs.

Amino acid type (x1, x2): we used two indexes derived from the AAindex database to represent the 20 standard amino acid types. These two indexes summarize a collection of more than 400 indexes describing biochemical properties of the 20 standard amino acids. Unlike the equidistant 20-bit code commonly used to encode amino acid type, the more similar two amino acids are, the closer they lie in the space defined by (x1, x2). In particular, the two indexes are strongly correlated with residue size and hydrophobicity on the one hand, and with residue preference for being in a loop or strand on the other.

Evolutionary profile (x3, ..., x22): first, we used BLAST with the BLOSUM62 substitution matrix and an expect value of 0.1 to search the Swissprot/Uniprot knowledgebase release 9.6 for similar protein sequences. The sequences in the BLAST result were then filtered according to the HSSP threshold to keep only homologous sequences. Two protein sequences in the original data set did not pass this filter, which required at least five homologous sequences. Consequently, only 233 interface residues were considered in our experiment. Next, we used **ClustalW**, with the BLOSUM substitution matrix series, gapopen = 3.0 and gapext = 0.1, to build the final multiple sequence alignment (MSA) from the resulting set of homologous sequences. Each member of the profile corresponds to the percentage of one amino acid type present in the MSA.

Conservation score (x23): the residue conservation score was calculated using the same MSA used for extracting the evolutionary profile parameters.
The residue conservation score corresponds to evolutionary pressure and was calculated with the software **rate4site**, which uses the phylogenetic tree built from the MSA and an underlying stochastic process to estimate residue conservation rates by maximum likelihood.

Surface area and solvation energy (x24, ..., x34): both solvent accessible surface area (SAS) and molecular surface (MS) were calculated with the program Volbl, included in the Alpha Shapes software package, using a probe radius of 1.4 Å and the set of atom radii provided in the package. Relative solvent accessible surface area (rSAS) was calculated from the SAS by using previously reported SAS values for each residue in the extended state (Ala-X-Ala). Solvation energy per atom, in cal/mol·Å², was calculated with four different sets of atomic solvation parameters (ASP). Additive contributions were assumed, so that for each set of ASP the absolute solvation energy per residue was obtained by summing the corresponding solvation energies per atom. In addition, the corresponding solvation energy weighted by ASA was calculated for each set of ASP.

Geometry (x35, ..., x41): to describe the geometry of each surface residue, we considered the set of atoms composed of the residue's surface-exposed atoms and all surface atoms within 10 Å of any of them. From the coordinates of the atoms in this set, seven geometric parameters were calculated as follows. Gaussian and mean curvatures, as well as the corresponding principal curvatures, were calculated through an osculating quadric, following a previously reported method. From those calculations, curvedness and shape index were also calculated, as previously proposed.
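
The relationship between the curvature-based parameters can be sketched in a few lines of Python. The source does not give the formulas, so this sketch assumes the commonly used definitions: Gaussian curvature as the product of the principal curvatures, mean curvature as their average, curvedness as the root mean square of the two, and a shape index scaled to [-1, 1]. The function name `shape_descriptors` and the sign convention of the shape index are assumptions made for illustration, and the osculating quadric fit that yields the principal curvatures is not shown.

```python
import math


def shape_descriptors(k1, k2):
    """Derive surface-shape descriptors from two principal curvatures."""
    k_max, k_min = max(k1, k2), min(k1, k2)
    gaussian = k_max * k_min
    mean = (k_max + k_min) / 2.0
    curvedness = math.sqrt((k_max ** 2 + k_min ** 2) / 2.0)
    if k_max == 0.0 and k_min == 0.0:
        # The shape index is undefined on a perfectly flat patch.
        shape_index = None
    else:
        # Assumed sign convention: +1 when both curvatures are equal and
        # positive, -1 when equal and negative, 0 for a symmetric saddle.
        shape_index = 2.0 * math.atan2(k_max + k_min, k_max - k_min) / math.pi
    return {
        "gaussian": gaussian,
        "mean": mean,
        "curvedness": curvedness,
        "shape_index": shape_index,
    }
```

Under these definitions, curvedness measures how strongly the patch is bent regardless of shape, while the shape index describes the kind of shape (cup, saddle, cap) regardless of scale.
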
Finally, the index of planarity, defined as the reciprocal of the root mean square (rms) deviation of a set of atoms from the least-squares plane through them, was calculated.

Dihedral angles (x42, x43): the software **Stride** was used to calculate the ϕ and ψ dihedral angles of each surface residue.

[...] Most parameters used for classification were calculated with algorithms available as public domain software. Solvation energy and surface shape parameters were calculated in Python with the Bio.PDB module of the **Biopython** package. MATLAB 7.0 was used for data analysis and for plotting ROC and (Precision, Recall) vs. Threshold curves.

The classifier was implemented with the **LibSVM** software using the radial basis kernel. To select the regularization parameter, C, and the kernel parameter, γ, we used a grid search procedure, as suggested in the LibSVM manual. This resulted in the following parameters for the SVM classifiers: C = 0.03125 and γ = 0.0078125 (that is, 2⁻⁵ and 2⁻⁷).

To estimate the performance measures (AUC, Precision, Recall and F-Measure) and the corresponding graphs (ROC and Precision, Recall, F vs. Threshold curves), we used a stratified 5-fold cross-validation procedure, that is, the usual 5-fold cross-validation in which the original proportion between classes is maintained in each partition. This procedure was repeated 100 times, with the data set randomly partitioned each time, and we report the average result over the 100 repetitions. In addition, each time the SVM classifier was trained, we linearly scaled each of the 43 parameters in the training set to the range [-1, 1] and used the same scale mapping to scale the data in the testing set. […]
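
The evaluation procedure described above (stratified partitions, with the scaling learned on the training part only) could look roughly like the following sketch. It uses only the Python standard library and is not the authors' code: the names `stratified_folds`, `fit_scaler` and `cross_validation_splits` are hypothetical, the SVM training step itself is omitted, and mapping constant columns to the midpoint of the range is an assumption.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def fit_scaler(train_rows, low=-1.0, high=1.0):
    """Learn per-column min/max on the training rows; return a row scaler."""
    columns = list(zip(*train_rows))
    mins = [min(column) for column in columns]
    maxs = [max(column) for column in columns]

    def scale(row):
        scaled = []
        for value, mn, mx in zip(row, mins, maxs):
            if mx == mn:
                # Constant column: map to the midpoint (assumed handling).
                scaled.append((low + high) / 2.0)
            else:
                scaled.append(low + (high - low) * (value - mn) / (mx - mn))
        return scaled

    return scale


def cross_validation_splits(rows, labels, k=5, seed=0):
    """Yield (train_x, train_y, test_x, test_y), scaled with train statistics."""
    for test_fold in stratified_folds(labels, k, seed):
        held_out = set(test_fold)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        scale = fit_scaler([rows[i] for i in train_idx])
        yield (
            [scale(rows[i]) for i in train_idx],
            [labels[i] for i in train_idx],
            [scale(rows[i]) for i in test_fold],
            [labels[i] for i in test_fold],
        )
```

Calling `cross_validation_splits` with 100 different `seed` values and averaging the resulting scores would mirror the repeated procedure. Because the scaler is fitted on the training rows alone, test values can fall slightly outside [-1, 1], which is expected when the same mapping is reused.
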

## Pipeline specifications

| Software tools | Clustal W, Rate4Site, STRIDE, Biopython, LIBSVM |
|---|---|
| Databases | UniProtKB, AAindex, HSSP |
| Applications | Miscellaneous, Phylogenetics |