Computational protocol: Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation

Protocol publication

[…] A cornerstone of the InterPro GO annotation protocol is that curators annotate an InterPro entry rather than an individual sequence; this is the key difference between InterPro GO annotations and those provided by manual annotation efforts. GO terms are assigned by a curator to an InterPro entry based on the common characteristics of the protein set matched by the signatures belonging to that entry. All InterPro2GO annotations carry the GO evidence code ‘Inferred from Electronic Annotation’ (IEA), indicating that the GO annotations are the result of an automated prediction pipeline and have not been individually reviewed by curators. An individual sequence therefore inherits an InterPro GO term if it matches the signatures within the InterPro entry when searched against them.

[...] A flowchart illustrating the InterPro curator protocol is presented in Figure 1. When annotating an InterPro entry, a curator first identifies those UniProtKB/Swiss-Prot (i.e. reviewed) sequences matched by the entry that have been experimentally characterized. Based on this information, the curator considers whether each of the GO terms that could potentially be applied is valid for the remaining proteins in the match set. This is done by evaluating alignments of the sequences and the experimental evidence in the literature. The UniProtKB/Swiss-Prot GO terms should be applicable to at least 95% of reviewed proteins in the entry. This cut-off sets a stringent standard for evidence yet provides enough flexibility to accommodate the predictive nature of the signatures used in creating InterPro entries. More stringent requirements would result in the loss of a large number of valid InterPro2GO mappings. InterPro GO coverage as of InterPro v34.0 is detailed in .
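The 95% cut-off and the inheritance rule described above can be illustrated with a minimal Python sketch. It is not the InterPro pipeline: the function names, the input dictionaries and the identifiers are hypothetical, and counting existing annotations is only a rough proxy for the curator's judgement of whether a term is applicable, which in the protocol rests on alignments and the literature.

```python
from collections import Counter

COVERAGE_CUTOFF = 0.95  # cut-off stated in the protocol text


def candidate_terms(reviewed_annotations, cutoff=COVERAGE_CUTOFF):
    """Return GO terms annotated to at least `cutoff` of the reviewed proteins.

    `reviewed_annotations` (hypothetical input) maps a reviewed protein
    accession to the set of GO terms annotated to that protein.
    """
    total = len(reviewed_annotations)
    if total == 0:
        return set()
    # Count, for each term, how many proteins carry it.
    counts = Counter(
        term for terms in reviewed_annotations.values() for term in set(terms)
    )
    return {term for term, n in counts.items() if n / total >= cutoff}


def inherited_terms(matched_entries, entry_to_go):
    """Return the GO terms a sequence inherits from the entries it matches.

    `entry_to_go` (hypothetical input) maps an InterPro entry accession to
    the set of GO terms curated for that entry.
    """
    terms = set()
    for entry in matched_entries:
        terms |= entry_to_go.get(entry, set())
    return terms
```

A term returned by `candidate_terms` would still need curator review before being attached to an entry; `inherited_terms` only mirrors the statement that a sequence picks up the terms of every entry whose signatures it matches.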
If the UniProtKB/Swiss-Prot GO terms are too specific to be attached to an entire InterPro entry, the InterPro curator can choose a related but more general GO term that is nonetheless still applicable to the full set of sequences. If no GO term exists to describe the function, creation of an appropriate term is requested from the GO Consortium. If there is no experimental evidence to confirm a function, process or location term that can be applied to all sequences in the entry, then no GO term is applied.

While UniProtKB/Swiss-Prot annotations are used as a starting point, we are not limited to these terms: unreviewed proteins in UniProtKB/TrEMBL are included for consideration if there is sufficient experimental evidence in support of a particular GO term. Similarly, if a curator identifies a function, process or location in the literature that is applicable to the entire InterPro entry protein match set but is not currently annotated to any individual sequence by UniProtKB, the appropriate term is added to the entry. GO annotations by TIGRFAMs, HAMAP and PANTHER keywords are also considered for annotation, and are reviewed by a curator before inclusion. Once GO terms have been chosen, the InterPro abstract is updated with references to the literature supporting the annotation. With the exception of conserved sites (where there is an implicit lack of experimental evidence detailing involvement in functions, locations or processes), the above protocol currently applies to all InterPro entry types; however, some changes (detailed below) now occur for domains.

InterPro GO annotations are available to the community primarily in two forms: users may query a sequence or sequences using InterProScan, or browse and download mappings at the InterPro website. InterPro GO annotations are also available at a sequence level via UniProt-GOA. […]
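The fallback to a more general GO term can be sketched in the same spirit. The example below assumes a toy parent map standing in for the GO graph and hypothetical term labels; it propagates each protein's terms to their ancestors, keeps the terms shared by every protein, and discards any shared term that is merely an ancestor of another shared term. It illustrates the idea only and is not the curators' actual procedure.

```python
def ancestors(term, parents):
    """Return `term` together with all of its ancestors in the parent map."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific_shared_terms(protein_terms, parents):
    """Return the most specific terms that apply to every protein.

    `protein_terms` (hypothetical input) maps a protein accession to its
    annotated terms; `parents` maps a term to its direct parent terms.
    """
    # Propagate every annotation up the graph, one closure per protein.
    closures = [
        set().union(*(ancestors(t, parents) for t in terms))
        for terms in protein_terms.values()
    ]
    if not closures:
        return set()
    shared = set.intersection(*closures)
    # Drop shared terms that are proper ancestors of another shared term.
    redundant = set()
    for term in shared:
        redundant |= ancestors(term, parents) - {term}
    return shared - redundant
```

With two proteins annotated to different children of the same parent, the function returns that parent; if any protein has no terms at all, it returns an empty set, which corresponds to the case where no GO term is applied.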

Pipeline specifications

Software tools: InterPro, InterPro2GO, InterProScan
Databases: UniProt-GOA
Application: Protein sequence analysis
Diseases: Genetic Diseases, Inborn