BioCreative / Critical Assessment of Information Extraction systems in Biology

Promotes the development of biomedical text mining applications. BioCreative works closely with biocurators to understand their various curation workflows, the Text Mining (TM) tools they use and their major needs. One of the aims of the BioCreAtIvE challenge is to determine the state of the art for a given task in biomedical text mining. This can be achieved if a considerable number of participants from the relevant community take part and the results of each system are evaluated by domain experts using well-defined evaluation metrics. To address the barriers to using TM in biocuration, BioCreative has been conducting user requirements analysis and user-based evaluations, and fostering the development of standards for TM tool reuse and integration.
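BioCreative's exact scoring criteria differ from task to task, so the following Python sketch only illustrates the general idea of precision, recall and F1 computed against a gold standard. It assumes exact-match scoring and represents each annotation as a hypothetical `(document_id, start, end, label)` tuple.

```python
from typing import Set, Tuple

# Hypothetical annotation representation: (document_id, start, end, label).
Mention = Tuple[str, int, int, str]


def evaluate(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("d1", 0, 4, "Gene"), ("d1", 10, 17, "Gene"), ("d2", 5, 9, "Chemical")}
predicted = {("d1", 0, 4, "Gene"), ("d2", 5, 9, "Chemical"), ("d2", 20, 25, "Gene")}
p, r, f = evaluate(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Real shared tasks often add relaxed (overlap-based) matching or per-document averaging; those variants are not shown here.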

Bio-SCoRes

A modular framework for coreference resolution in biomedical text. Bio-SCoRes incorporates a variety of coreference types and their mentions, and allows fine-grained specification of resolution strategies for distinct coreference type-mention pairs. Bio-SCoRes follows a pipeline architecture consisting of several mandatory and optional steps: linguistic pre-processing, domain-specific pre-processing, configuring resolution strategies, coreferential mention detection and post-processing. Experiments on several types of biomedical corpora demonstrated the extensibility of the architecture and the ease of adapting it.
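A minimal Python sketch of how a pipeline with mandatory and optional steps might be wired together. The step names mirror the list above, but which steps are marked optional here, and the hypothetical `make_step` placeholder that only records its own name, are assumptions for illustration; this is not Bio-SCoRes code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Document:
    text: str
    # Names of the steps that have processed this document, in order.
    trace: List[str] = field(default_factory=list)


Step = Callable[[Document], Document]


def make_step(name: str) -> Step:
    """Hypothetical placeholder step: it only records that it ran."""
    def step(doc: Document) -> Document:
        doc.trace.append(name)
        return doc
    return step


# (name, step, is_mandatory). Which steps are optional is an assumption.
STEPS: List[Tuple[str, Step, bool]] = [
    ("linguistic_preprocessing", make_step("linguistic_preprocessing"), True),
    ("domain_preprocessing", make_step("domain_preprocessing"), False),
    ("configure_strategies", make_step("configure_strategies"), True),
    ("mention_detection", make_step("mention_detection"), True),
    ("postprocessing", make_step("postprocessing"), False),
]


def run_pipeline(doc: Document, enabled_optional: Set[str]) -> Document:
    """Run every mandatory step, plus the enabled optional steps, in order."""
    for name, step, mandatory in STEPS:
        if mandatory or name in enabled_optional:
            doc = step(doc)
    return doc


doc = run_pipeline(Document("The kinase binds ATP. It is then phosphorylated."), {"postprocessing"})
print(doc.trace)
```

Keeping the step list as data makes it easy to swap in a domain-specific component without touching the runner, which is the kind of extensibility the description emphasizes.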

becas / Biomedical Concept Annotation System

An API and web-based tool for biomedical concept identification, designed to address limitations of existing concept annotation tools. MEDLINE abstracts or free text can be annotated directly in the web interface, where identified concepts are enriched with links to reference databases. Through its customizable widget, becas can also augment external web pages with concept highlighting. Furthermore, all text-processing and annotation features are available through an HTTP REST API, allowing integration into any text-processing pipeline.


Get annotations for biomedical text with concepts from the ontologies. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. To generate annotations, enter the text in the box and press the submit button. The system matches words in the text to ontology terms by exact string comparison (a "direct" match) between the text and ontology term names, synonyms and IDs.
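The "direct" match described above can be sketched in Python as a literal, word-bounded search for every dictionary string in the text. The mini-dictionary and the `EX:` concept IDs are hypothetical, the comparison is case-sensitive as a literal reading of "exact", and this is not the Annotator's actual implementation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-dictionary: surface string (term name, synonym or ID) -> concept ID.
DICTIONARY: Dict[str, str] = {
    "melanoma": "EX:0001",
    "malignant melanoma": "EX:0002",
    "skin": "EX:0003",
}


def direct_match(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched string, concept ID) for every exact, word-bounded match."""
    hits = []
    for surface, concept_id in dictionary.items():
        # re.escape turns the term into a literal pattern; \b keeps matches on word boundaries.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text):
            hits.append((m.start(), m.end(), m.group(0), concept_id))
    return sorted(hits)


hits = direct_match("malignant melanoma of the skin; melanoma cells.", DICTIONARY)
for hit in hits:
    print(hit)
```

This sketch reports overlapping matches (here, "melanoma" inside "malignant melanoma") as separate hits rather than keeping only the longest one.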

CRAFT / Colorado Richly Annotated Full-Text

Collects full-text biomedical journal articles. CRAFT is a manually annotated corpus covering all coreferential phenomena of identity and apposition. It also marks all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology.

BioSWR

Provides the bioinformatics community with annotated Web service descriptions in diverse formats. BioSWR is a Web services registry that provides standard Resource Description Framework (RDF)-based Web service descriptions alongside traditional Web Services Description Language (WSDL)-based ones. The registry offers a Web-based interface for Web service registration, querying and annotation, and is also accessible programmatically via a Representational State Transfer (REST) API or the SPARQL Protocol and RDF Query Language (SPARQL).

NCBO Annotator+

Allows users to perform clinical text annotation. NCBO Annotator+ is a web application offering several functions, such as scoring, context detection (negation, experiencer, temporality) and coarse-grained concept recognition (with Unified Medical Language System (UMLS) Semantic Groups). It relies on a dictionary of biomedical terms drawn from about 600 semantic resources (notably all of UMLS and all Open Biomedical Ontologies (OBO) Library ontologies).
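Negation detection in clinical text is commonly done with NegEx-style trigger rules. The Python sketch below is a heavily simplified illustration of that idea, not NCBO Annotator+'s implementation: the trigger list and the five-token window are assumptions, and scope terminators (such as "but") are ignored.

```python
import re

# Small, hypothetical trigger list; NegEx/ConText-style systems use much larger curated lists.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for", "no evidence of"]


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if a negation trigger occurs within `window` tokens before `concept`."""
    tokens = re.findall(r"\w+", sentence.lower())
    concept_tokens = re.findall(r"\w+", concept.lower())
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            # Pad with spaces so triggers only match whole tokens.
            preceding = " " + " ".join(tokens[max(0, i - window):i]) + " "
            if any(f" {trigger} " in preceding for trigger in NEGATION_TRIGGERS):
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))
print(is_negated("Chest pain radiating to the left arm.", "chest pain"))
```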

iSimp / A Sentence Simplification System for Biomedical Text

Provides simplified text to enhance the performance of natural language processing (NLP) systems and text mining (TM) applications. iSimp marks simplified sentences in a corpus file, along with annotations of the simplification constructs in the original sentence. It uses shallow parsing and recursive transition networks to detect simplification constructs. The tool detects six types of simplification constructs: coordination, relative clause, apposition, introductory phrase, subordinate clause and parenthetical element.


A platform for Biomedical Text Mining (BioTM) that aims to carry advances effectively across three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are: the ability to process abstracts and full texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows users to correct annotations; and a Text Mining Module supporting dataset preparation and algorithm evaluation.
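The tokenisation, stopword removal and lexicon-based annotation steps listed above can be sketched in Python as follows. The stopword set, the lexicon and the greedy longest-match strategy are assumptions chosen for illustration, not the platform's own components.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical stopword list and lexicon (term -> semantic type).
STOPWORDS = {"the", "of", "and", "in", "a", "an", "is", "was", "to"}
LEXICON: Dict[str, str] = {
    "brca1": "Gene",
    "tamoxifen": "Chemical",
    "breast cancer": "Disease",
}


def tokenise(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]


def annotate(tokens: List[str], lexicon: Dict[str, str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams (up to max_len tokens) in the lexicon."""
    annotations = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                annotations.append((candidate, lexicon[candidate]))
                i += n
                break
        else:  # no lexicon entry starts at this token
            i += 1
    return annotations


text = "Tamoxifen is used in the treatment of BRCA1-associated breast cancer."
result = annotate(remove_stopwords(tokenise(text)), LEXICON)
print(result)
```

Removing stopwords before lexicon lookup keeps the example short, but it would prevent matching lexicon entries that themselves contain stopwords (for example, a term containing "of").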

Argo

A workbench with a rich graphical user interface for building text-mining solutions for biocuration. Central to Argo are customizable workflows that users compose by arranging available elementary analytics into task-specific processing units. The built-in manual annotation editor is the most frequently used biocuration tool of the workbench, as it allows users to create annotations directly in text, as well as modify or delete annotations created by automatic processing components.

SimSem / Similarity and Semantic

Allows semantic disambiguation via approximate string matching. SimSem exploits collections of strings such as dictionaries, using LibLinear as its machine-learning component and SimString for fast approximate string matching. It performs semantic category disambiguation (SCD), that is, the assignment of the appropriate semantic category to a text span. The tool is applicable to manual annotation support tasks and can be used as a high-recall component in text processing pipelines.
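SimString retrieves strings whose similarity to a query (for example, cosine similarity over character n-grams) reaches a threshold, using an index to avoid comparing the query against every entry. The Python sketch below shows only the similarity measure, with a brute-force scan instead of SimString's index; the trigram size, the threshold and the categorized dictionary are assumptions, and SimSem's LibLinear classifier is omitted.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical dictionary: string -> semantic category.
DICTIONARY: Dict[str, str] = {
    "interleukin-2": "Protein",
    "interleukin 2 receptor": "Protein",
    "insulin": "Protein",
    "leukemia": "Disease",
}


def char_ngrams(s: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lower-cased string, padded with '$' at both ends."""
    padded = "$" * (n - 1) + s.lower() + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a: Set[str], b: Set[str]) -> float:
    """Set-based cosine similarity: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def approximate_lookup(query: str, dictionary: Dict[str, str],
                       threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """Return (entry, category, score) for entries at or above the threshold, best first."""
    q = char_ngrams(query)
    hits = []
    for entry, category in dictionary.items():
        score = cosine(q, char_ngrams(entry))
        if score >= threshold:
            hits.append((entry, category, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


hits = approximate_lookup("interleukin 2", DICTIONARY)
for entry, category, score in hits:
    print(f"{entry} ({category}): {score:.3f}")
```

In a setup like SimSem's, scores of this kind would serve as features for a classifier rather than as the final decision.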

SAPIENTA / Semantic Annotation of Papers: Interface & ENrichment Tool Automated

Automatically annotates full scientific articles with categories from the first layer of the Core Scientific Concept (CoreSC) scheme. SAPIENTA was trained using supervised machine learning with sequence labelling, employing conditional random fields (CRFs). The software can build extractive summaries of full papers in chemistry and biochemistry. The tool recognizes and characterizes discourse structure in the scientific literature.

TIES / Text Information Extraction System

Identifies, annotates, and indexes clinical documents. TIES is a natural language processing (NLP) pipeline and clinical document search engine. It supports tissue ordering and acquisition, building of Tissue Microarrays (TMAs), and integration with tissue banks and honest brokers. The tool provides a collaborative work space that enables research teams to work on queries and case sets together, even across institutions with separate TIES installations.

RightField

Adds ontology term selection to Excel spreadsheets. RightField lets users specify a range of allowed terms from a chosen ontology (subclasses, individuals or combinations of these). The resulting spreadsheet presents these terms to users as a simple drop-down list. The tool lets users import existing Excel spreadsheets or generate new ones from scratch. It enables scientists to annotate their data consistently without having to explore and understand the numerous standards and ontologies available to them, and without changing their normal practice.
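RightField restricts spreadsheet cells to terms drawn from part of an ontology. The Python sketch below computes such an allowed-term set as the transitive subclasses of a chosen class and flags cells that fall outside it. The mini-ontology is hypothetical, individuals are not modelled, and the sketch does not reproduce how RightField embeds drop-down lists in Excel.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: child class -> parent class (single inheritance for simplicity).
SUBCLASS_OF: Dict[str, str] = {
    "cell": "material entity",
    "neuron": "cell",
    "motor neuron": "neuron",
    "hepatocyte": "cell",
    "organ": "material entity",
    "liver": "organ",
}


def descendants(root: str, subclass_of: Dict[str, str]) -> Set[str]:
    """All classes below `root`, found transitively with a depth-first walk."""
    children: Dict[str, List[str]] = {}
    for child, parent in subclass_of.items():
        children.setdefault(parent, []).append(child)
    result: Set[str] = set()
    stack = [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def invalid_cells(column: List[str], allowed: Set[str]) -> List[int]:
    """Return the row indices whose value is not an allowed term."""
    return [i for i, value in enumerate(column) if value not in allowed]


allowed = descendants("cell", SUBCLASS_OF)
print(sorted(allowed))
print(invalid_cells(["neuron", "liver", "hepatocyte"], allowed))
```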

Djeen / Database for Joomla's Extensible Engine

Allows the management of projects associated with heterogeneous data types while enforcing annotation integrity and minimum information. Djeen is a Research Information Management System (RIMS) for collaborative projects. It is a user-friendly application designed to streamline collaborative data storage and annotation. Its deliberately simple database model is compatible with most technologies and allows heterogeneous data to be stored and managed within the same system. Advanced permissions are managed through different roles, and templates support Minimum Information (MI) compliance.

Medical Treebank

A handbook of domain-customized syntactic parsing guidelines, developed through iterative annotation and adjudication between two institutions (Kaiser Permanente and Vanderbilt University). The guidelines incorporate special considerations for handling ill-formed sentences, which are common in clinical text. Medical Treebank (currently containing 1100 sentences) is the first treebank to apply Foster's computationally verified approach to annotating ungrammatical clinical sentences.

GoGene

Helps users quickly find interpretations of results from high-throughput experiments together with relevant literature, or simply scan the literature for discussed genes. GoGene aims to provide the most recent and most complete facts about genes and can rank them by novelty and importance. It accepts keywords, gene lists, gene sequences and protein sequences as input, and supports searching for genes in PubMed and EntrezGene and via BLAST. GoGene achieves a high recall of 75%, and orthologous gene pairs can be distinguished from non-orthologous pairs purely on the basis of text-mined annotations.

MILANO / Microarray Literature-based Annotation

Allows annotation of lists of genes derived from microarray results with user-defined terms. MILANO expands gene names to include all their informative synonyms while filtering out gene symbols that are likely to be less informative as literature search terms. It supports searching two literature databases, GeneRIF and Medline (through PubMed), allowing retrieval of both quick and comprehensive results. Compared with similar tools, MILANO's two major advances are this informative synonym expansion and its access to the GeneRIF database, which provides short summaries of curated articles relevant to known genes.
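A minimal Python sketch of synonym expansion with filtering, followed by assembly of a Boolean literature query that pairs a gene with a user-defined term. The synonym table, the informativeness rules (minimum length, not purely numeric, not a common English word) and the query syntax are all assumptions for illustration, not MILANO's actual rules.

```python
from typing import Dict, List

# Hypothetical synonym table and common-word list.
SYNONYMS: Dict[str, List[str]] = {
    "TP53": ["p53", "tumor protein p53", "LFS1", "TRP53"],
    "CAT": ["catalase", "CAT"],
}
COMMON_WORDS = {"cat", "was", "set", "can", "for", "not"}


def is_informative(term: str) -> bool:
    """Assumed rule: at least 3 characters, not purely numeric, not a common English word."""
    t = term.lower()
    return len(t) >= 3 and not t.isdigit() and t not in COMMON_WORDS


def expand(gene: str) -> List[str]:
    """Gene name plus its informative synonyms, de-duplicated case-insensitively."""
    seen = set()
    result = []
    for term in [gene] + SYNONYMS.get(gene, []):
        if is_informative(term) and term.lower() not in seen:
            seen.add(term.lower())
            result.append(term)
    return result


def build_query(gene: str, user_term: str) -> str:
    """OR together the expanded gene names and AND them with the user-defined term."""
    terms = expand(gene)
    if not terms:
        raise ValueError(f"no informative names for {gene!r}")
    names = " OR ".join(f'"{t}"' for t in terms)
    return f'({names}) AND "{user_term}"'


print(expand("TP53"))
print(build_query("CAT", "oxidative stress"))
```

Here the symbol "CAT" is dropped because it collides with an ordinary English word, while its full name "catalase" is kept.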