Computational protocol: A Comprehensive Survey of the Roles of Highly Disordered Proteins in Type 2 Diabetes

Protocol publication

[…] We investigated all 34 proteins from the KEGG-generated network of T2DM-related proteins (see and ). Special attention was paid to ten T2DM proteins predicted to have high disorder content, i.e., proteins whose content of disordered residues predicted by PONDR® FIT (CDRFIT) exceeded 30%. These proteins are: pancreatic β-cell-specific transcriptional activator, also known as V-MAF musculoaponeurotic fibrosarcoma oncogene homolog A or transcription factor MAFA (UniProt ID: Q8NHW3, CDRFIT: 73.2%); insulin receptor substrates IRS1, IRS2, and IRS4 (IRS1: UniProt ID: P35568, CDRFIT: 70.0%; IRS2: UniProt ID: Q9Y4H2, CDRFIT: 75.6%; IRS4: UniProt ID: O14654, CDRFIT: 64.4%); pancreatic and duodenal homeobox 1 protein (PDX1, UniProt ID: P52945, CDRFIT: 60.4%); phosphoinositide 3-kinase regulatory subunits 2 and 5 (PIK3R2: UniProt ID: O00459, CDRFIT: 34.1%; PIK3R5: UniProt ID: Q8WYR1, CDRFIT: 36.7%); suppressors of cytokine signaling SoCS1 and SoCS3 (SoCS1: UniProt ID: O15524, CDRFIT: 34.6%; SoCS3: UniProt ID: O14543, CDRFIT: 32%); and adiponectin (UniProt ID: Q15848, CDRFIT: 38.93%). We also analyzed insulin (UniProt ID: P01308, CDRFIT: 16.3%) and the insulin receptor (UniProt ID: P06213, CDRFIT: 14.03%) in more detail, because of the key role of these proteins in T2DM pathogenesis. [...]

To analyze the per-residue disorder propensity of the T2DM-related proteins, we used several computational tools that predict the per-residue disorder propensities of a query protein: PONDR® FIT [], PONDR® VLXT [], and PONDR® VSL2 [,,], together with the PONDR® VL3 predictor, which is highly accurate in finding long intrinsically disordered protein regions (IDPRs) []. PONDR® FIT is a consensus artificial neural network (ANN) prediction method [] that was developed by combining the outputs of several individual disorder predictors, including PONDR® VLXT, PONDR® VL3, PONDR® VSL2, IUPred [], FoldIndex [], and TopIDP [].
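The 30% selection rule can be illustrated with a minimal Python sketch. The CDRFIT values are copied from the list above; the dictionary labels, the function names, and the 0.5 per-residue threshold used in `disorder_content` are assumptions made for illustration, not part of the published pipeline.

```python
# CDR_FIT values (percent) copied from the protocol text.
# The keys are informal labels chosen for this sketch.
CDR_FIT = {
    "MAFA": 73.2,
    "IRS1": 70.0,
    "IRS2": 75.6,
    "IRS4": 64.4,
    "PDX1": 60.4,
    "PIK3R2": 34.1,
    "PIK3R5": 36.7,
    "SoCS1": 34.6,
    "SoCS3": 32.0,
    "adiponectin": 38.93,
    "insulin": 16.3,
    "insulin receptor": 14.03,
}

CUTOFF_PERCENT = 30.0  # "high disorder content" cut-off stated in the text


def disorder_content(scores, threshold=0.5):
    """Percentage of residues whose disorder score exceeds `threshold`.

    The 0.5 threshold is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("empty disorder profile")
    return 100.0 * sum(s > threshold for s in scores) / len(scores)


def highly_disordered(cdr_by_protein, cutoff=CUTOFF_PERCENT):
    """Names of proteins whose CDR exceeds `cutoff`, most disordered first."""
    selected = (name for name, cdr in cdr_by_protein.items() if cdr > cutoff)
    return sorted(selected, key=cdr_by_protein.get, reverse=True)


if __name__ == "__main__":
    for name in highly_disordered(CDR_FIT):
        print(f"{name}: {CDR_FIT[name]}%")
```

Applied to the values above, the filter keeps the ten highly disordered proteins and leaves out insulin and the insulin receptor, which fall below the cut-off.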
PONDR® VLXT has significant advantages in finding potential binding sites, though it may underestimate the occurrence of long disordered regions in proteins [,]. PONDR® VSL2 is one of the more accurate stand-alone disorder predictors for analyzing proteins containing both ordered and disordered regions [,,]. PONDR® VL3 is sensitive to long IDPRs and is therefore better suited to wholly disordered proteins. We also used the IUPred web server, which allows characterization of both short and long IDPRs in query proteins []. For each protein, an average disorder score was first obtained from each predictor, and these predictor-specific averages were then averaged again to generate a per-protein intrinsic disorder score. The use of a consensus for evaluating intrinsic disorder is motivated by empirical observations that this approach usually increases predictive performance compared with the use of a single predictor [,,]. The outputs of all these per-residue disorder predictors are real numbers between 0 and 1, where 1 is the ideal prediction of disorder and 0 is the ideal prediction of order. Residues/regions with disorder scores above 0.5 are considered disordered, those with scores ranging from 0.25 to 0.5 are considered flexible, and those with scores below 0.25 are considered ordered.

In addition to these per-residue predictors of intrinsic disorder, we utilized binary disorder predictors that evaluate the predisposition of a query protein to be ordered or disordered as a whole. The outputs of two of these tools, the charge-hydropathy (CH) plot [,] and the cumulative distribution function (CDF) plot [,], were combined to generate the CH-CDF plot [,,].
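The per-protein averaging and the three-class reading of per-residue scores can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names and the toy profiles are hypothetical, and real input would be the per-residue output of each predictor.

```python
from statistics import mean


def classify_residue(score):
    """Map a disorder score in [0, 1] to the classes used in the text.

    Above 0.5: disordered; 0.25 to 0.5: flexible; below 0.25: ordered.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > 0.5:
        return "disordered"
    if score >= 0.25:
        return "flexible"
    return "ordered"


def per_protein_consensus(profiles):
    """Average of the predictor-specific average scores (mean of means).

    `profiles` maps a predictor name to its per-residue scores for one protein.
    """
    if not profiles:
        raise ValueError("no predictor profiles supplied")
    return mean(mean(scores) for scores in profiles.values())


if __name__ == "__main__":
    # Toy profiles for a four-residue protein; the numbers are made up.
    toy = {
        "predictor_A": [0.10, 0.30, 0.60, 0.80],
        "predictor_B": [0.20, 0.40, 0.70, 0.90],
    }
    print(per_protein_consensus(toy))
    print([classify_residue(s) for s in toy["predictor_A"]])
```

Averaging per predictor first gives every predictor equal weight in the final score, which matters if profiles of unequal length (for example, after trimming) are ever combined.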
In the CH-CDF plot, the coordinates of a query protein are calculated as follows: the Y-coordinate corresponds to the distance of the point representing this protein in the CH-plot from the boundary (ΔCH), whereas the X-coordinate is the average distance of the respective CDF curve from the CDF boundary (ΔCDF). In the resulting CH-CDF plot, positive and negative Y-values correspond to proteins predicted by the CH-plot to be extended or compact, respectively. Positive and negative X-values correspond to proteins predicted by CDF analysis to be ordered or intrinsically disordered, respectively. The CH-CDF phase space provides specific expectations for the disorder status of a protein, depending on its position within the four quadrants. Here, the upper-right quadrant Q1 contains proteins predicted to be disordered by the CH-plot but ordered by CDF; the lower-right quadrant Q2 is occupied by ordered proteins; the lower-left quadrant Q3 includes proteins that are predicted to be disordered by CDF but compact by the CH-plot (i.e., native molten globules or hybrid proteins containing comparable quantities of order and disorder); and the upper-left quadrant Q4 contains proteins with extended disorder, such as native coils and native pre-molten globules [].

To analyze the consensus intrinsic disorder and to find disorder-based interaction sites, known as molecular recognition features (MoRFs), the MobiDB database [,], the ANCHOR algorithm [,], and the MoRFchibi system [] were used. The MobiDB database combines different data sources related to protein disorder into a consensus annotation, and was used to analyze the consensus intrinsic disorder. The database incorporates data from X-ray/NMR structures and multiple intrinsic disorder predictors to evaluate the possible disordered segments of a given protein of interest [,]. The ANCHOR algorithm (available online) is used to predict protein binding regions that are disordered in isolation but can undergo a disorder-to-order transition upon binding.
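The quadrant assignment in the CH-CDF plot described above can be sketched in Python. The function below takes ΔCDF (X) and ΔCH (Y) as already computed inputs; its name, the returned labels, and the treatment of points lying exactly on a boundary (counted as positive) are assumptions of this sketch.

```python
def ch_cdf_quadrant(delta_cdf, delta_ch):
    """Assign a protein to a CH-CDF quadrant from its (X, Y) coordinates.

    X = delta_cdf: positive means ordered by CDF, negative means disordered.
    Y = delta_ch: positive means extended by CH-plot, negative means compact.
    Points exactly on a boundary are treated as positive (an assumption).
    """
    right = delta_cdf >= 0
    upper = delta_ch >= 0
    if right and upper:
        return "Q1"  # disordered by CH-plot but ordered by CDF
    if right:
        return "Q2"  # ordered by both
    if not upper:
        return "Q3"  # disordered by CDF but compact by CH-plot
    return "Q4"      # extended disorder by both


if __name__ == "__main__":
    # Made-up coordinates, one per quadrant.
    for x, y in [(0.2, 0.1), (0.2, -0.1), (-0.2, -0.1), (-0.2, 0.1)]:
        print((x, y), ch_cdf_quadrant(x, y))
```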
The ANCHOR algorithm captures segments within disordered regions that cannot form stable intrachain interactions to fold on their own, but are likely to gain stabilizing energy by interacting with a globular protein partner [,].

The use of disorder predictors to find potential protein binding sites is based on the observation that sharp dips of order within predicted disordered regions can indicate the presence of short, loosely structured binding regions that undergo disorder-to-order transitions on interaction with specific binding partners. MoRFs are short, potentially ordered segments within longer disordered regions that bind to globular protein domains and undergo a disorder-to-order transition. These disorder-based binding sites are categorized into three types: α-MoRFs (form α-helices upon binding), β-MoRFs (form β-strands), and ι-MoRFs (form irregular structures). The MoRFchibi system contains three MoRF predictors: MoRFCHiBi, a basic predictor best suited as a component in other applications; MoRFCHiBi_Light, ideal for high-throughput predictions; and MoRFCHiBi_Web, slower than the other two but best for highly accurate predictions []. We used a cut-off value of around 0.7, and regions with more than four residues above this cut-off were identified as MoRFs.

To provide more information on the presence of functional disordered regions in T2DM-related proteins, we also utilized the D2P2 internet database [] (available online), which is a community database of pre-computed disorder predictions. D2P2 combines the outputs of PONDR® VLXT, IUPred, PONDR® VSL2B [,], PrDOS [], ESpritz [], and PV2 [] to show the disorder predisposition of a query protein. It is further enhanced by information on curated sites of various posttranslational modifications and on the location of predicted disorder-based potential binding sites.

Finally, STRING (Search Tool for the Retrieval of Interacting Genes) databases were used to assess the interactivity of T2DM-related proteins.
STRING is an online resource that provides both experimental and predicted interaction information for query proteins []. […]
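The MoRF-calling rule stated earlier (a cut-off of around 0.7, with more than four residues above it) can be sketched in Python. The text does not say whether the residues must be consecutive; this sketch assumes they must, and the function name and the 1-based inclusive output format are likewise illustrative choices.

```python
def call_morfs(scores, cutoff=0.7, min_len=5):
    """Return (start, end) pairs, 1-based and inclusive, for candidate MoRFs.

    A candidate is a run of consecutive residues (an assumption) whose
    propensity scores are above `cutoff` and whose length is at least
    `min_len`, i.e. more than four residues with the defaults.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos  # a new run begins here
        else:
            if start is not None and pos - start >= min_len:
                regions.append((start, pos - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    # Made-up propensity profile: one run of five and one run of four.
    profile = [0.1] * 3 + [0.8] * 5 + [0.2] * 2 + [0.9] * 4 + [0.3]
    print(call_morfs(profile))
```

With this toy profile only the five-residue run is reported; the four-residue run is discarded because it does not contain more than four residues above the cut-off.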

Pipeline specifications

Software tools: PONDR-FIT, VSL, IUPred, FoldIndex, MoRFCHiBi, PrDOS, ESpritz
Databases: MobiDB, KEGG
Application: Protein structure analysis
Organisms: Homo sapiens
Diseases: Diabetes Mellitus; Hyperglycemia; Sleep Disorders, Intrinsic