Computational protocol: iPTMnet: an integrated resource for protein post-translational modification network discovery

Protocol publication

[…] PRO constitutes a crucial part of the iPTMnet dataset because it places PTM site information in the context of proteoforms, i.e. it shows the combinations of PTMs that have been observed in a given protein. This feature of representing experimentally validated combinations of PTMs is unique. PRO uses a hierarchical representation (family→gene→sequence→modification) that depicts the relationships of a protein to its parent class and to its child isoforms and proteoforms. The PRO hierarchy also connects the specific proteoform of a protein across taxa when a similar modification of the conserved protein is experimentally observed in multiple organisms. Moreover, PRO provides abundant expert-curated PTM enzyme-substrate and PTM-dependent PPI relations.

All PRO data are stored in a separate database, only part of which is consumed by iPTMnet. Descriptive information for each PRO entry, such as name, definition and label, is pulled directly from the PRO database. PTM, PPI, and PRO hierarchical relationships are extracted and reformatted before being imported into iPTMnet, as follows: (i) for organism-specific PTM proteoforms (PRO category: organism-modification), we extract the evidence, the reference UniProtKB AC and the PTM sites, along with their corresponding PTM types (represented by a PSI-MOD or UniCarbKB identifier), from the definition of the term, and we extract PTM enzyme information from the PRO comment line; (ii) proteoform-dependent PPI information is extracted from the PRO annotation file, where annotations with evidence_code ‘IPI’ (Inferred from physical interaction) or under the branch of ‘protein binding’ are extracted together with the documented interaction partner; and (iii) two types of hierarchical relations are extracted: is_a (parent-child relationship) and intersection_of (usually used to connect a proteoform with the organism it is found in).

[...] We integrate text mining results, ontology and PTM database information into the iPTMnet knowledgebase.
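
As an illustration of how the is_a and intersection_of relations described above might be pulled from an ontology file, the following is a minimal sketch that assumes the PRO terms are available as OBO-style text stanzas. The function name `extract_relations`, the sample stanzas and the `PR:EXAMPLE…` identifiers are hypothetical placeholders, not real PRO entries, and this is not the authors' pipeline.

```python
from typing import List, Tuple

# Hypothetical OBO-style stanzas; the identifiers are placeholders, not real PRO terms.
SAMPLE_OBO = """\
[Term]
id: PR:EXAMPLE0002
name: example protein phosphorylated form (human)
is_a: PR:EXAMPLE0001 ! example protein phosphorylated form
intersection_of: PR:EXAMPLE0001 ! example protein phosphorylated form
intersection_of: only_in_taxon NCBITaxon:9606 ! Homo sapiens

[Term]
id: PR:EXAMPLE0001
name: example protein phosphorylated form
is_a: PR:EXAMPLE0000 ! example protein
"""


def extract_relations(obo_text: str) -> List[Tuple[str, str, str]]:
    """Return (term_id, relation, target) triples for is_a and intersection_of tags."""
    relations = []
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current_id = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, _, value = line.partition(":")
        tag = tag.strip()
        # Drop the trailing "! human-readable name" comment, if present.
        value = value.split(" ! ")[0].strip()
        if tag == "id":
            current_id = value
        elif tag in ("is_a", "intersection_of") and current_id:
            relations.append((current_id, tag, value))
    return relations


if __name__ == "__main__":
    for triple in extract_relations(SAMPLE_OBO):
        print(triple)
```

Keeping the relations as flat triples makes them easy to reformat for loading into relational tables; a production parser would also need to handle escaped characters, trailing modifiers and obsolete terms, which this sketch ignores.
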
Implemented in Oracle (12c release 1), the database is designed as a dimensional model to support data retrieval and visualization. We perform integrity checks to ensure the quality of both the text mining results and the database information. We verify kinase information by checking that each kinase's UniProtKB entry is annotated with the keyword ‘kinase’; the same approach is used for other PTM enzyme types. We confirm that the assigned PTM types conform to their known residue types. PTM sites are further checked to ensure that the PTM residue is found at the expected position in the UniProtKB protein sequence of the substrate.

The front-end website is built using Django (www.djangoproject.com), a high-level Python web framework, which links the underlying system components to support interactive scientific queries and visualization of PTM data. The website provides two types of visualization: the Cytoscape network view, implemented with the Cytoscape.js (version 2.4.2) graph theory library, and the multiple sequence alignment view. Multiple sequence alignments are performed using MUSCLE. […]
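
The two site-level integrity checks described above (PTM type versus residue type, and residue versus sequence position) can be sketched as follows. This is a minimal illustration under stated assumptions: the `ALLOWED_RESIDUES` table is a deliberately simplified, hypothetical subset of PTM types, the function name `check_ptm_site` is invented for this example, and the sequence is a toy string rather than a real UniProtKB entry.

```python
from typing import List

# Assumption: a simplified, illustrative mapping of PTM type to permitted residues.
# The table actually used by iPTMnet is not given in the text.
ALLOWED_RESIDUES = {
    "phosphorylation": {"S", "T", "Y"},
    "acetylation": {"K"},
    "n-glycosylation": {"N"},
}


def check_ptm_site(sequence: str, position: int, residue: str, ptm_type: str) -> List[str]:
    """Return a list of integrity problems for one PTM site (empty if none are found).

    `position` is 1-based, following the usual convention for protein sequences.
    """
    problems = []

    # Check 1: the PTM type must be compatible with the annotated residue.
    allowed = ALLOWED_RESIDUES.get(ptm_type.lower())
    if allowed is None:
        problems.append(f"unknown PTM type: {ptm_type}")
    elif residue not in allowed:
        problems.append(f"{ptm_type} is not expected on residue {residue}")

    # Check 2: the annotated residue must occur at the stated sequence position.
    if not 1 <= position <= len(sequence):
        problems.append(f"position {position} is outside the sequence (length {len(sequence)})")
    elif sequence[position - 1] != residue:
        problems.append(
            f"expected {residue} at position {position}, found {sequence[position - 1]}"
        )

    return problems


if __name__ == "__main__":
    toy_sequence = "MKSTAYNK"  # toy sequence, not a real protein
    print(check_ptm_site(toy_sequence, 3, "S", "phosphorylation"))
    print(check_ptm_site(toy_sequence, 5, "S", "phosphorylation"))
```

Returning a list of problems rather than a boolean lets a site that fails both checks be reported once with both reasons, which is convenient when reviewing records flagged during loading.
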

Pipeline specifications