Computational protocol: Construction of biological networks from unstructured information based on a semi-automated curation workflow

Protocol publication

[…] Recognizing relevant biological terms in a text is fundamental for semantic retrieval and extraction of relationships. The pipeline, which was based on the Unstructured Information Management Architecture (UIMA), integrated and combined linguistic algorithms for named entity recognition (NER) and relationship extraction. A text document collection given as input to the pipeline can be output as XML-based BEL (XBEL) documents, an XML version of the Biological Expression Language (BEL).

For dictionary-based NER, the pipeline features ProMiner, which was shown to be efficient in BioCreative NER assessments, and integrates several dictionaries that have been optimized for use in systems biology. In evaluation, ProMiner NER for gene/protein names achieved F-scores of 0.79 for human and 0.8 for mouse. The NER performances of a number of dictionaries tested in the current workflow for relationship extraction of protein function inhibitors are shown in […].

The Turku Event Extraction System (TEES), a support vector machine-based text mining system developed at the University of Turku, Finland, was integrated for the extraction of events and relations from natural language text sources. The NER performance of TEES was tested within the biomedical text mining (BioNLP shared task) assessments and reached an overall F-score of 53.3%. Relationships detected by TEES are output as ‘positive regulation’ and ‘negative regulation’ and are mapped and translated to BEL via a BEL converter within the UIMA pipeline. Further details about the conversion process have been described by Fluck et al.

The BEL converter creates BEL statements and generates two documents (XBEL and XMI) as output. The XBEL document facilitates transformation of the statements into an assembled knowledge network model. It contains evidence, statements and annotations (e.g. namespaces, experimental context and publication details) to fully describe the BEL statements. The XMI document contains information that cannot be coded in BEL but is relevant for the curation interface (e.g. text location of the recognized entities and alternative namespaces). […]
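
The mapping step from TEES regulation events to BEL statements can be illustrated with a minimal Python sketch. It is not the converter described by Fluck et al.: the `Entity` and `Event` classes, the `event_to_bel` function and the choice of `increases`/`decreases` as target relationships are assumptions made for illustration, and the gene names in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from TEES event types to BEL relationship keywords.
# The rules used by the pipeline's actual BEL converter are not reproduced here.
EVENT_TO_BEL = {
    "positive regulation": "increases",
    "negative regulation": "decreases",
}


@dataclass(frozen=True)
class Entity:
    """A recognized entity, normalized to a namespace identifier."""
    namespace: str        # e.g. "HGNC" (human) or "MGI" (mouse)
    name: str
    function: str = "p"   # BEL abundance function; "p" is protein abundance

    def to_bel(self) -> str:
        return f"{self.function}({self.namespace}:{self.name})"


@dataclass(frozen=True)
class Event:
    """A simplified regulation event: the cause acts on the theme."""
    event_type: str
    cause: Entity
    theme: Entity


def event_to_bel(event: Event) -> Optional[str]:
    """Return a BEL statement string, or None for unmapped event types."""
    relation = EVENT_TO_BEL.get(event.event_type.lower())
    if relation is None:
        return None
    return f"{event.cause.to_bel()} {relation} {event.theme.to_bel()}"


if __name__ == "__main__":
    example = Event(
        "negative regulation",
        cause=Entity("HGNC", "TIMP1"),
        theme=Entity("HGNC", "MMP9"),
    )
    print(event_to_bel(example))  # p(HGNC:TIMP1) decreases p(HGNC:MMP9)
```

Event types without an entry in the mapping return `None` rather than a guessed relationship, which leaves such cases to the manual curation step.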

Pipeline specifications

Software tools: ProMiner, BioCreative
Application: Information extraction
Organisms: Mus musculus, Homo sapiens
Diseases: Vascular Diseases, Atherosclerosis