
gnparser / Global Names Parser

Parses scientific names of any complexity. gnparser identifies which combinations of the most atomic parts of a name-string represent words or dates. Because it is implemented with a Parsing Expression Grammar (PEG), developers can define the rules that describe the general structure of target strings. The tool breaks taxon names into their semantic elements and can be used to generate normalized names automatically. gnparser aims to fully cover biodiversity’s test suite.
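
The idea of splitting a name-string into semantic elements can be sketched in a few lines of Python. This is only an illustration: it uses a single regular expression rather than a PEG, it handles only simple uninomials and binomials, and the field names (genus, species, authorship, year, canonical) and the function parse_name are assumptions made for this example, not gnparser's actual grammar or output format.

```python
import re

# Toy pattern, NOT gnparser's grammar: a capitalized genus, an optional
# lowercase species epithet, optional authorship, optional four-digit year.
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<species>[a-z]+))?"
    r"(?:\s+(?P<authorship>[A-Z][^\d,]*?))?"
    r"(?:,?\s*(?P<year>\d{4}))?$"
)


def parse_name(name_string):
    """Return a dict of name elements, or None if the string does not fit."""
    match = NAME_RE.match(name_string.strip())
    if match is None:
        return None
    # Keep only the elements that were actually present in the string.
    parts = {key: value for key, value in match.groupdict().items() if value}
    # The normalized (canonical) form drops authorship and year.
    parts["canonical"] = " ".join(
        p for p in (parts.get("genus"), parts.get("species")) if p
    )
    return parts


if __name__ == "__main__":
    print(parse_name("Homo sapiens Linnaeus, 1758"))
```

A real grammar would also have to cover hybrids, infraspecific ranks, basionym authorship in parentheses and similar cases, which is where a rule-based PEG becomes easier to maintain than one large pattern.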

MicroPIE / Microbial Phenomics Information Extractor

A natural language processing application for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. MicroPIE uses a robust supervised classification algorithm to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. Evaluation against a hand-generated gold standard matrix showed that MicroPIE performed well on over half of designated characters and achieved an overall accuracy of 79.0% and overall performance that was significantly better than the performance of undergraduate microbiology students.

OrganismTagger

A hybrid rule-based/machine learning system to extract organism mentions from the literature. OrganismTagger includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID.
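
The normalization step can be illustrated with a minimal sketch of one sub-problem: resolving an abbreviated binomial such as "E. coli" to a full name mentioned elsewhere in the same text. The function resolve_abbreviations, the known_names list and the matching rule are assumptions made for this example; they are not OrganismTagger's actual rules or API, and the grounding to a database ID is left out.

```python
import re


def resolve_abbreviations(text, known_names):
    """Map abbreviated binomials (e.g. 'E. coli') to full names found in text.

    known_names is a hypothetical list of full binomial names; only names
    that appear verbatim in the text are used as expansion candidates.
    """
    seen = [name for name in known_names if name in text]
    resolved = []
    # An abbreviated binomial: one capital letter, a period, then an epithet.
    for match in re.finditer(r"\b([A-Z])\.\s+([a-z]+)\b", text):
        initial, epithet = match.groups()
        candidates = [
            name
            for name in seen
            if name.startswith(initial)
            and len(name.split()) == 2
            and name.split()[1] == epithet
        ]
        # Expand only when exactly one full name fits; otherwise stay silent.
        if len(candidates) == 1:
            resolved.append((match.group(), candidates[0]))
    return resolved


if __name__ == "__main__":
    sample = "Escherichia coli was cultured overnight. E. coli grew rapidly."
    print(resolve_abbreviations(sample, ["Escherichia coli", "Bacillus subtilis"]))
```

Refusing to expand when several candidates fit keeps the sketch conservative; a production system would need additional evidence to choose between them.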

SR4GN

An open source tool for species recognition and disambiguation in biomedical text. In addition to the species detection function found in existing tools, SR4GN is optimized for the gene normalization task: it is designed to link detected species with the corresponding gene mentions in a document. SR4GN achieves 85.42% accuracy and compares favorably with other state-of-the-art techniques in benchmark experiments. SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications.

LINNAEUS

An open source, stand-alone software system that recognizes and normalizes species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against the authors' manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level.
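
A dictionary-based matcher of this kind can be sketched with a token-level trie, which behaves as a small deterministic automaton: each token read moves to exactly one next state, and states that end a dictionary entry carry an identifier. The sketch below is an illustration under stated assumptions, not LINNAEUS's implementation: the functions build_trie and find_mentions, the "$id" marker key and the tiny dictionary are invented for this example, and no disambiguation heuristics are included. The identifiers in the usage example are the NCBI Taxonomy IDs for the listed species.

```python
import re


def build_trie(dictionary):
    """Build a token-level trie from {name: identifier}.

    The key "$id" marks an accepting state; it cannot clash with a token
    because tokens consist of word characters only.
    """
    root = {}
    for name, species_id in dictionary.items():
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node["$id"] = species_id
    return root


def find_mentions(text, trie):
    """Scan text left to right, returning the longest match at each position."""
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    mentions = []
    i = 0
    while i < len(tokens):
        node, best, j = trie, None, i
        # Follow transitions while the next token is a valid edge.
        while j < len(tokens) and tokens[j][0] in node:
            node = node[tokens[j][0]]
            j += 1
            if "$id" in node:
                best = (j, node["$id"])  # remember the longest accepted match
        if best is not None:
            end_index, species_id = best
            start, end = tokens[i][1], tokens[end_index - 1][2]
            mentions.append((text[start:end], species_id))
            i = end_index
        else:
            i += 1
    return mentions


if __name__ == "__main__":
    names = {"Homo sapiens": 9606, "human": 9606,
             "Mus musculus": 10090, "Escherichia coli": 562}
    sentence = "Mice (Mus musculus) and human cells were infected with Escherichia coli."
    print(find_mentions(sentence, build_trie(names)))
```

Matching time grows with the length of the text rather than the size of the dictionary, which is why automaton-based lookups scale to very large name lists.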

uBioRSS

Obsolete
A 'taxonomically intelligent' service customized for the biological sciences. uBioRSS aggregates syndicated content from academic publishers and science news feeds, and then uses a taxonomic named entity recognition algorithm to identify and index taxonomic names within those data streams. The resulting name index is cross-referenced to current global taxonomic datasets to provide context for browsing the publications by taxonomic group. This process, called taxonomic indexing, draws upon services developed specifically for biological sciences, collectively referred to as 'taxonomic intelligence'. Such value-added enhancements can provide biologists with accelerated and improved access to current biological content.

NetiNeti / Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing

Obsolete
Finds scientific names in literature from various domains, such as biomedicine and biodiversity. NetiNeti can retrieve names that contain Optical Character Recognition (OCR) errors and other variations. It employs probabilistic machine learning methods and builds a classifier from both the structural features of a candidate string and its contextual information. The tool can thus estimate the probability of a label given a candidate string and its context.