CRAFT specifications


Unique identifier: OMICS_20821
Alternative name: Colorado Richly Annotated Full-Text
Restrictions to use: None
Community driven: No
Data access: File download, Browse
User data submission: Not allowed
Version: 2.0
Content license: Creative Commons Attribution 3.0 license (CC BY)
Maintained: Yes


  • Michael Bada
  • Kevin Cohen

[…] udy included five contemporary annotators - Whatizit, MetaMap, Neji, Cocoa, and BANNER, which were compared on three manually annotated corpora of biomedical publications, namely NCBI Disease corpus, CRAFT, and AnEM (see Table ). Evaluation on the CRAFT corpus considered 6 different biomedical entity types (e.g. species, cell, cellular component, gene and proteins), while on the other two corpora […]


[…] al Process (F-measure 0.42) and Molecular Function (F-measure 0.14) were much more difficult to recognize in text. Campos et al. present a framework called Neji and compare it against Whatizit on the CRAFT corpus []; they find similar best performance (Cellular Component 0.70, Biological Process/Molecular Function 0.35). Other work explored the impact of case sensitivity and information gain on co […]


[…] le the first two phases were represented by pastoral and early village farming communities, the mature Harappan settlements were highly urbanized with several organized cities, developed material and craft culture having trans-Asiatic trading to regions as distant as Arabia and Mesopotamia. The late Harappan phase witnessed large scale deurbanization, population decrease, abandonment of many estab […]


[…] tifiers (CUIs). Gurulingappa et al. annotated mentions of diseases and adverse events and their corresponding UMLS CUIs, in a set of 4272 sentences from Medline abstracts describing case reports. The Colorado Richly Annotated Full-Text corpus consists of 97 full-text biomedical articles with concept annotations from nine ontologies and terminologies, including Chemical Entities of Biological Inter […]


[…] sed (e.g., with stemming), and how terms are matched to text (e.g., via case-insensitive matching or with flexible word order). It has been demonstrated to achieve state of the art performance on the CRAFT corpus for a range of corpora, depending on what parameter settings are used []. cTAKES [] from Mayo Clinic consists of a staged pipeline of modules that are both statistical and rule-based. The […]


[…] tween genes, proteins, complexes, or families, except for Genia and the Bacteria Gene Interaction corpora. Corpora whose annotations are mapped to identifiers in a database, e.g., EntrezGene, such as CRAFT and OSIRIS, allow their use for the development of gene/protein normalization tools. Finally, the high number of corpora available for gene/protein corpora is due to the importance of these ent […]


CRAFT institution(s)
Computational Bioscience Program, University of Colorado School of Medicine, Denver, CO, USA; Department of Linguistics, University of Colorado, Boulder, CO, USA; School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, Australia
CRAFT funding source(s)
Supported by National Institutes of Health grants G08 LM009639, 3G08 LM009639-02S1 (ARRA), 2R01LM009254, and R01LM008111; the Australian Research Council through Discovery Project grant DP150101550; the Australian Federal and Victorian State governments; the Australian Research Council through the ICT Centre of Excellence program, National ICT Australia (NICTA); and, in part, the DARPA “Big Mechanism” program, BAA 14-14, under contract W911NF-14-C-0109 with the Army Research Office (ARO).

