Computational protocol: Wiggle—Predicting Functionally Flexible Regions from Primary Sequence

Protocol publication

[…] Several protein disorder predictors were compared with Wiggle and Wiggle200 predictions to illustrate that these predictors identify different targets. Disorder predictors differ widely in their approaches, but their targets are generally based on high temperature factors or missing residues in crystal structures. PONDR is a disorder predictor trained on fractional composition and hydropathy. DISOPRED uses the PSI-BLAST matrix as input to an SVM to detect disorder, while DisEMBL is a neural network trained to predict coils, hot coils, and disorder. RONN uses a bio-basis function neural network to take advantage of information embedded in homologous proteins. GlobPlot and FoldIndex are simpler algorithms that use, respectively, a running propensity for protein disorder and an index that classifies residues based on hydrophobicity and net charge. IUPRED uses pairwise interaction potentials observed in globular proteins to make assignments for each residue. Finally, NORSP assesses regions based on low-confidence predictions for secondary structural elements.

Some overlap is expected with disorder predictions because FFRs may be disordered depending on the conformational state of the protein. Otherwise, we expect little correlation, since disorder predictors generally aim to identify structural disorder and regions with a low propensity to form an ordered unit. Potential functional roles were not considered in their design, although examination of positively classified sequences has suggested that these regions are important for protein-protein recognition. With the exception of the arc repressor, where predictor results exhibited significant overlap, Wiggle and Wiggle200 were found to target regions that were not otherwise identified by disorder predictors.

For arc repressor (1BAZ), disorder predictors positively classified the terminal ends, although some failed to identify them altogether.
The hinge region connecting the two helices is not fully identified by most disorder predictors.

While the Wiggle predictors did not identify all residues involved in recognition at the major groove for PVUII endonuclease (3PVI), they identified the minor groove recognition loop, the catalytic loop, and the magnesium ion coordinating residues. Current disorder prediction tools largely failed to identify these regions. The disorder predictors that successfully identified at least one of them are based on an index separating hydrophobicity and net charge (FoldIndex and GlobPlot) or on the use of homology information (RONN).

Most disorder predictors failed to identify all glycosylation sites on erythropoietin (1EER), with the exception of DisEMBL, which had the most overlap in predictions with Wiggle. The structure of erythropoietin is entirely helical, and DisEMBL has been designed to predict coils with high B factors. The glycine kink was also missed by most disorder predictors except for DisEMBL and FoldIndex.

We also compare the performance of the predictors in identifying FFRs as defined by the FF score. Two test sets were used: TESTALL and TEST200, containing randomly selected chains from the training dataset for all proteins and for proteins up to 200 residues long, respectively. These test sets were used during one of the cross-validation runs from which the Wiggle predictors were created; therefore, the performance results reflect unseen cases for Wiggle. The results show that DISOPRED identified FFRs with the highest accuracy for both test sets (TESTALL: 78.48%, TEST200: 75.20%). However, DISOPRED failed to identify most FFRs, as indicated by its poor recall (TESTALL: 11.54%, TEST200: 12.89%). The predictor is therefore poor at identifying FFRs: it classifies most residues as non-FFR, despite having a high precision.
We observed earlier that the residue pool is imbalanced, with the FF score identifying about 20% of the residues as located in an FFR.

We report the performance of Wiggle on TESTALL and of Wiggle200 on TEST200. The Wiggle predictors outperformed the disorder predictors in overall performance on both test sets when comparing precision and recall values. These results are expected, since the predictors were all trained to identify a different target property of proteins. Our predictors were designed to identify regions of flexibility with functional importance, unlike the other predictors, which target highly disordered regions. The comparison is an important demonstration that the target regions identified are different. It is not intended to measure or assess the ability of the Wiggle predictors to identify protein disorder. That our test cases are actually solved structures implies some level of order for the regions to be identified. […]
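
The interplay between accuracy, precision, and recall on an imbalanced residue pool can be illustrated with a minimal Python sketch. The function name `residue_metrics` and the toy labels are assumptions made for illustration only; they are not part of Wiggle or of any of the predictors above, and the numbers are not taken from the reported results. The toy data mimic a pool in which 20% of residues lie in an FFR and a conservative predictor flags only a handful of residues.

```python
from typing import Dict, Sequence


def residue_metrics(truth: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Per-residue accuracy, precision, and recall for binary FFR labels.

    truth[i] is True if residue i lies in an FFR (e.g. according to the FF score);
    pred[i] is True if the predictor flags residue i.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: fraction of flagged residues that are true FFR residues.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: fraction of true FFR residues that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy chain: 100 residues, the first 20 (20%) are FFR residues.
toy_truth = [i < 20 for i in range(100)]
# Hypothetical conservative predictor: flags two true FFR residues and one non-FFR residue.
toy_pred = [i in (0, 1, 20) for i in range(100)]

toy_metrics = residue_metrics(toy_truth, toy_pred)
print(toy_metrics)
```

With these toy labels the predictor is correct on 81 of 100 residues while recovering only 2 of the 20 FFR residues. This is why accuracy alone is uninformative when about 80% of residues are non-FFR, and why the comparison relies on precision and recall.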

Pipeline specifications