A free service that tags gene, protein, and small-molecule names in any web page within a few seconds. Clicking on a tagged term opens a small popup showing summary information, as shown below. Reflect can be installed as a plugin for Firefox or Internet Explorer, or can be used by entering a URL in the field above.
Maps biomedical text to the Unified Medical Language System (UMLS) Metathesaurus or, equivalently, discovers Metathesaurus concepts referred to in text. MetaMap breaks the text into phrases and then, for each phrase, returns the mapping options ranked according to the strength of mapping. It is meant for applications that emphasize processing speed and ease of use. The tool is modular for local use thanks to its Java implementation. It allows the user to use customized dictionaries and to either focus on a specific domain or provide broad coverage of text types and semantic types.
Permits users to annotate entities by using a graphical web-based user interface called BRAT. NeuroNER performs named-entity recognition (NER) and offers the following advantages: (i) it exploits the state-of-the-art prediction capabilities of neural networks, and (ii) it supports the creation or modification of annotations for a new or existing group.
A web-based text mining tool that extracts and incorporates comprehensive knowledge about E3s (ubiquitin-protein ligases) and their underlying mechanisms. E3Miner integrates available E3 data not only from the published literature but also from biological databases, using natural language processing techniques.
Incorporates several text mining and information extraction components. LimTox is an online text mining application that automatically extracts toxicology-relevant information from text, with special emphasis on drug-induced adverse hepatobiliary reactions. It was implemented to facilitate a more targeted retrieval of hepatotoxicity-relevant information. The system can be used as a topic-specific search engine.
Provides a package for fitting topic models. topicmodels is an R package that includes an interface to the C code for Latent Dirichlet Allocation (LDA) models and Correlated Topic Models (CTM). It builds on and complements the text mining functionality already provided by the package tm. Users can extend the methods and supply their own fit functions via the method argument.
An open source software tool for molecular biology text mining. At its core is a machine learning system using conditional random fields with a variety of orthographic and contextual features. The latest version is 1.5, which has an intuitive graphical interface and includes two modules for tagging entities (e.g. protein and cell line) trained on standard corpora, for which performance is roughly state of the art.
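As a rough illustration of the orthographic and contextual features mentioned above, the following Python sketch computes a few token-level features of the kind commonly fed to a conditional random field. The function name `token_features` and the specific feature set are assumptions made for illustration; they are not the tool's actual feature templates.

```python
import re


def token_features(tokens, i):
    """Return illustrative features for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    feats = {
        # Orthographic features: describe the surface form of the token itself.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "mixes_letters_and_digits": bool(re.search(r"[A-Za-z]", tok))
        and bool(re.search(r"\d", tok)),
        "suffix3": tok[-3:].lower(),
    }
    # Contextual features: describe the neighbouring tokens.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


if __name__ == "__main__":
    sentence = ["The", "IL-2", "gene", "is", "expressed"]
    print(token_features(sentence, 1))
```

In a real sequence labeller, one such feature dictionary would be produced per token and passed to the CRF training routine together with the gold entity labels.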
Provides a part-of-speech tagger trained on the MEDLINE corpus. MedPost accepts text for tagging in either native MEDLINE format or XML, both available as save options in PubMed. It is based on a stochastic tagger that employs a hidden Markov model (HMM). The tagger is able to achieve high accuracy by using the contextual information in the HMM to resolve ambiguities.
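To show how an HMM can use context to resolve an ambiguous word, here is a minimal Viterbi decoding sketch in Python. The tag set, the probabilities, and the names `TAGS`, `START_P`, `TRANS_P`, `EMIT_P`, and `viterbi` are all invented for illustration; this is not MedPost's model or code.

```python
import math

# Toy first-order HMM; every number below is made up for illustration.
TAGS = ["DT", "NN", "VB"]
START_P = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VB": 0.05},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "NN": 0.4, "VB": 0.1},
}
EMIT_P = {
    "DT": {"the": 1.0},
    "NN": {"cells": 0.5, "stain": 0.5},
    "VB": {"stain": 0.6, "express": 0.4},
}


def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for words under the given HMM."""
    # best[i][t]: log-probability of the best path that ends in tag t at position i.
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], unk))
             for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(emit_p[t].get(words[i], unk))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(["the", "stain"], TAGS, START_P, TRANS_P, EMIT_P))
    print(viterbi(["cells", "stain"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy parameters the ambiguous word "stain" is tagged as a noun after a determiner and as a verb after a noun, which is the kind of context-driven disambiguation the entry describes.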
Implements the alpha-closed frequent subtree method. The Glycan Miner Tool was able to extract a significant pattern from glycan array data. This was confirmed by a viral infection experiment on cells with modified glycans on the cell surface. It is also used to analyze the glycan array data of influenza viruses to find novel glycan structures, other than sialic acid (SA), that may be involved in viral infection.
A free and user-friendly text annotation tool designed to assist in carrying out the main biocuration tasks and to provide labelled data for the development of text mining systems. MyMiner allows easy classification and labelling of textual data according to user-specified classes as well as predefined biological entities.
Allows different types of sentence extraction. BioIE employs predefined categories of interest relating to proteins and custom extraction around different entities and concepts, together with statistical feedback on the source and extracted text. It uses five predefined categories of interest relating to proteins: structure, function, diseases and therapeutic compounds, localization, and familial relationships.
Builds protein reports from related entries in Swiss-Prot. METIS employs data in the Swiss-Prot entries to find relevant literature, or to find search terms with which to seek it out. It reduces the time required to find and read relevant literature. This tool is able to extract pertinent sentences from the biomedical literature.
A text-mining software tool that integrates several state-of-the-art entity tagging systems (DNorm, GNormPlus, SR4GN, tmChem, and tmVar) and offers a batch-processing mode able to process arbitrary text input (e.g., scholarly publications, patents, and medical records) in multiple formats (e.g., BioC). We support multiple standards to make our service interoperable and allow simpler integration with other text-processing pipelines. To maximize scalability, we have pre-processed all PubMed articles, and use a computer cluster for processing large requests of arbitrary text.
A web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering, and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request, and optimal detection of genes, proteins, molecular functions, pathways, and biological processes within the retrieved documents.
A learning algorithm for unsupervised feature extraction, specifically designed for analysing noisy and high-dimensional datasets. KODAMA consists of two main steps: (i) each sample is randomly assigned to a different class; (ii) the cross-validated accuracy is then maximized through an iterative procedure that swaps the class labels.
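The two steps above can be sketched as follows. This is a simplified reading under stated assumptions, not the KODAMA implementation: it uses leave-one-out 1-nearest-neighbour classification as the cross-validated classifier (a swapped-in choice), interprets "swapping" as replacing the labels of some misclassified samples with their predicted labels, and keeps a change only if accuracy does not drop. The names `kodama_like`, `loo_1nn_predictions`, and `accuracy` are hypothetical.

```python
import random


def loo_1nn_predictions(points, labels):
    """Leave-one-out 1-nearest-neighbour prediction for every sample."""
    preds = []
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        preds.append(labels[nearest])
    return preds


def accuracy(labels, preds):
    """Fraction of samples whose predicted label equals the current label."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def kodama_like(points, n_iter=50, seed=0):
    """Illustrative label-swapping loop; returns (labels, cross-validated accuracy)."""
    rng = random.Random(seed)
    # Step (i): every sample starts in its own, randomly assigned class.
    labels = list(range(len(points)))
    rng.shuffle(labels)
    best_acc = accuracy(labels, loo_1nn_predictions(points, labels))
    # Step (ii): iteratively relabel misclassified samples, keeping a change
    # only when the cross-validated accuracy does not decrease.
    for _ in range(n_iter):
        preds = loo_1nn_predictions(points, labels)
        wrong = [i for i, (l, p) in enumerate(zip(labels, preds)) if l != p]
        if not wrong:
            break
        candidate = labels[:]
        for i in rng.sample(wrong, max(1, len(wrong) // 2)):
            candidate[i] = preds[i]
        acc = accuracy(candidate, loo_1nn_predictions(points, candidate))
        if acc >= best_acc:
            labels, best_acc = candidate, acc
    return labels, best_acc


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kodama_like(data))
```

Because each sample starts in its own class, the initial cross-validated accuracy is zero; accuracy can only rise as neighbouring samples come to share labels, which is how structure in the data emerges without supervision.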
Cell line recognition and normalization system, supporting corpora and tagged documents. The aim is to create corpora that are suitable for training and evaluating machine learning systems to recognize and normalize established cell line names from text. We created two manually annotated corpora, Gellus and CLL. Gellus is suitable for training any machine learning system to recognize cell line name mentions, while CLL is for evaluating systems in recognizing the Cellosaurus cell line names.
Uses both character embedding and word embedding for biomedical named entity recognition (NER) tasks. GRAM-CNN is an end-to-end model that extracts local information between a target word and its neighbors and requires no task-specific resources or handcrafted features. The software can theoretically be applied to a wide range of BioNER tasks. The approach was evaluated on three biomedical datasets.
Generates dense vector representations of biological entities, their corresponding ontology-based annotations, and the derived ontology structure. Onto2Vec provides a generic method which couples neural and symbolic approaches. This application can apply feature learning to arbitrary OWL axioms in biomedical ontologies. Users can also download an updated version of the software, called OPA2Vec.
Provides a neural multi-task learning approach for biomedical named entity recognition. LM-LSTM-CRF uses char-level neural models for biomedical named entity recognition (BioNER). The software trains different BioNER models on datasets with different entity types while sharing parameters across these models. It can assist scientists in exploiting knowledge in biomedical literature in a systematic and unbiased way.