Maps biomedical text to the Unified Medical Language System (UMLS) Metathesaurus or, equivalently, discovers Metathesaurus concepts referred to in text. MetaMap breaks the text into phrases and, for each phrase, returns the mapping options ranked by mapping strength. It is meant for applications that emphasize processing speed and ease of use. The tool is modular and suited to local use thanks to its Java implementation. It allows the user to use customized dictionaries and either focus on a specific domain or provide broad coverage of text types and semantic types.
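The phrase-then-rank idea described for MetaMap can be illustrated with a minimal Python sketch. This is not MetaMap's algorithm or API: the concept identifiers, the toy dictionary, the crude phrase splitter and the Jaccard-overlap score are all assumptions made for illustration only.

```python
import re

# Hypothetical toy dictionary: concept id -> concept name (not real UMLS CUIs).
TOY_CONCEPTS = {
    "X001": "heart attack",
    "X002": "heart",
    "X003": "panic attack",
}


def split_phrases(text):
    """Crude phrase splitter: break on punctuation and a few function words."""
    parts = re.split(r"[,.;:]|\b(?:and|or|with|of)\b", text.lower())
    return [p.strip() for p in parts if p.strip()]


def rank_candidates(phrase, concepts=TOY_CONCEPTS):
    """Score each concept by Jaccard token overlap and sort strongest first."""
    phrase_tokens = set(phrase.split())
    scored = []
    for cid, name in concepts.items():
        name_tokens = set(name.split())
        overlap = len(phrase_tokens & name_tokens)
        if overlap:
            score = overlap / len(phrase_tokens | name_tokens)
            scored.append((score, cid, name))
    # Highest score first; ties broken by concept id for determinism.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


def map_text(text):
    """Return, for every phrase, its ranked list of (score, id, name)."""
    return {phrase: rank_candidates(phrase) for phrase in split_phrases(text)}


if __name__ == "__main__":
    for phrase, candidates in map_text("Patient had a heart attack, and chest pain.").items():
        print(phrase, "->", candidates)
```

A phrase with no overlapping dictionary entry simply gets an empty candidate list, which is where a customized dictionary would extend coverage in this sketch.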
Uses MALDI-TOF mass spectra fingerprints to track bacteria, yeasts and fungi, including those grown on agar plates, in a given sample. SARAMIS offers an identification system that relies on an updated database, allowing users to investigate colonies, including at the sub-species level.
Analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts. If you need to extract information from biomedical documents, this tagger might be a useful preprocessing tool.
Provides the key types of annotation for a single set of sentences, expressing complex relationships between both physical and abstract entities. BioInfer is a public resource providing an annotated corpus of biomedical English, aimed at supporting the development of information extraction (IE) systems and their components in the biomedical domain. This corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation.
An open source software tool for molecular biology text mining. At its core is a machine learning system using conditional random fields with a variety of orthographic and contextual features. The latest version is 1.5, which has an intuitive graphical interface and includes two modules for tagging entities (e.g. protein and cell line) trained on standard corpora, for which performance is roughly state of the art.
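To make "orthographic and contextual features" concrete, here is a small Python sketch of the kind of per-token feature dictionary that is typically fed to a conditional random field. The feature names, the window size and the helper functions are illustrative assumptions, not the tool's actual feature set, and no CRF training is performed.

```python
def orthographic_features(token):
    """Surface-form cues that often signal gene, protein or cell-line names."""
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "suffix3": token[-3:].lower(),
    }


def token_features(tokens, i, window=1):
    """Orthographic features of token i plus the lowercased neighbours."""
    feats = {"w0." + k: v for k, v in orthographic_features(tokens[i]).items()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"w{offset:+d}.lower"] = tokens[j].lower()
        else:
            # Mark positions that fall outside the sentence.
            feats[f"w{offset:+d}.pad"] = True
    return feats


def sentence_features(tokens):
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["IL-2", "activates", "T", "cells"]):
        print(feats)
```

A sequence labeller would pair each of these dictionaries with a tag such as protein or cell line during training; that step is omitted here.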
Provides a part-of-speech tagger trained on the MEDLINE corpus. MedPost accepts text for tagging in either native MEDLINE format or XML, both available as save options in PubMed. It is based on a stochastic tagger that employs a hidden Markov model (HMM). The tagger is able to achieve high accuracy by using the contextual information in the HMM to resolve ambiguities.
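How an HMM uses context to resolve a tagging ambiguity can be shown with a short Viterbi decoder in Python. The three-tag set and every probability below are made-up toy values, not MedPost's tag set or trained parameters; the `unk` fallback for unseen words is likewise an assumption of this sketch.

```python
import math


def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence under a first-order HMM."""

    def emit(state, word):
        # Unseen (state, word) pairs get a small fallback probability.
        return math.log(emit_p[state].get(word, unk))

    # Each column maps state -> (best log score, best previous state).
    best = [{s: (math.log(start_p[s]) + emit(s, tokens[0]), None) for s in states}]
    for word in tokens[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + emit(s, word), prev)
        best.append(column)

    # Backtrace from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model (hypothetical numbers): "result" can be a noun or a verb.
STATES = ["DT", "NN", "VB"]
START = {"DT": 0.5, "NN": 0.4, "VB": 0.1}
TRANS = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT = {
    "DT": {"the": 0.9},
    "NN": {"result": 0.3, "mutations": 0.5},
    "VB": {"result": 0.5},
}

if __name__ == "__main__":
    print(viterbi(["the", "result"], STATES, START, TRANS, EMIT))
    print(viterbi(["mutations", "result"], STATES, START, TRANS, EMIT))
```

With these toy numbers, "result" is tagged as a noun after a determiner and as a verb after a noun, even though its emission probabilities alone favour the verb reading in both cases.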
A UIMA-based combination system for clinical record concept annotation. ACCCA can annotate clinical concepts with 3 concept types: problem, treatment, and test. Currently it combines 6 systems (ABNER 1.5, Lingpipe 3.8, OpenNLP Chunker 2.1, JNET 2.3, Peregrine 2009, StanfordNER 1.1).