

Used to create and distribute code, software, and data for applying natural language processing techniques to biomedical texts. The aim of BioNLP is to provide algorithms and knowledge-based tools for analyzing and interpreting high-throughput molecular biology data, and for extracting information from and managing the biomedical literature. The major goal of this resource was to explore the integration of concept recognition into biomedical information extraction systems.

DiMeX / eXtraction of Mutation association to Diseases

A text-mining system for extracting mutation-disease associations. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. It also includes a separate component that extracts mutation mentions from text and associates them with genes. The results indicate that DiMeX outperforms existing mutation-disease association tools, addressing the low precision from which most approaches suffer. DiMeX was applied to a large set of MEDLINE abstracts to extract mutation-disease associations, as well as other relevant information, including patient/cohort size and population data. The authors conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
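DiMeX's own modules rely on syntactic and semantic patterns that are not spelled out here. As a much simpler illustration of the mutation-to-gene step only, the sketch below links each one-letter protein point mutation in a sentence to the nearest gene mention. The gene lexicon, the regular expression, and the nearest-mention rule are all assumptions made for illustration; this is not DiMeX's method.

```python
import re
from typing import Dict, Optional

# Hypothetical gene lexicon; a real system would use a curated gene dictionary or a tagger.
GENE_LEXICON = {"BRCA1", "TP53", "CFTR"}

# Simplified one-letter protein point mutation pattern, e.g. "R175H" (assumption).
AA = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b[{AA}]\d+[{AA}]\b")
TOKEN_RE = re.compile(r"\b\w+\b")


def associate_mutations_with_genes(sentence: str) -> Dict[str, Optional[str]]:
    """Map each mutation mention to the nearest gene mention in the same sentence."""
    genes = [(m.start(), m.group()) for m in TOKEN_RE.finditer(sentence)
             if m.group() in GENE_LEXICON]
    links: Dict[str, Optional[str]] = {}
    for mut in MUTATION_RE.finditer(sentence):
        if genes:
            # Pick the gene whose start offset is closest (in characters) to the mutation.
            _, nearest = min(genes, key=lambda g: abs(g[0] - mut.start()))
            links[mut.group()] = nearest
        else:
            # No gene mentioned in this sentence: leave the mutation unassigned.
            links[mut.group()] = None
    return links
```

A proximity heuristic like this is cheap but easily fooled by long sentences listing several genes, which is one reason pattern-based systems aim for higher precision.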

MRRAD / Multilingual Radiology Research Articles Dataset

Provides a corpus of Portuguese research articles on radiology, together with human, automatic, and semi-automatic translations into English. For each article, MRRAD contains the original Portuguese document, a human translation (HT), two alternative machine translations (MT), and a machine translation with post-editing (MT + PE). The corpus can be used to study the efficacy of translation solutions for biomedical text, particularly in the field of radiology.
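Each MRRAD entry bundles one source text with four English versions. A minimal record type for working with such entries might look like this; the class name, field names, and version keys are hypothetical and do not reflect MRRAD's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MRRADArticle:
    """One corpus entry (hypothetical layout): a Portuguese source and its English versions."""
    article_id: str
    original_pt: str                 # original Portuguese document
    human_translation: str           # HT
    machine_translations: List[str] = field(default_factory=list)  # the two alternative MT outputs
    mt_post_edited: str = ""         # MT + PE

    def versions(self) -> Dict[str, str]:
        """Return every English version, keyed by how it was produced."""
        out = {"HT": self.human_translation}
        for i, mt in enumerate(self.machine_translations, start=1):
            out[f"MT{i}"] = mt
        out["MT+PE"] = self.mt_post_edited
        return out
```

Keying versions by production method makes it straightforward to score each MT variant, or the post-edited output, against the human translation.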

TEPAPA / Text-based Exploratory Pattern Analyser for Prognosticator and Associator discovery

Identifies key clinicopathologic factors that differentiate subgroups of head and neck squamous cell carcinoma (HNSCC) patients by human papilloma virus (HPV) status. TEPAPA is an unbiased feature-learning pipeline that delivers "white-box" interpretable results to researchers for rapid hypothesis generation. It combines semantic-free natural language processing (NLP) methods, pattern search, and a "pattern-wide association study" to capture conserved patterns in electronic medical record (EMR) text that are associated with clinical outcomes of interest.
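A "pattern-wide association study" can be pictured as testing every text pattern for over-representation among patients with the outcome of interest. The sketch below assumes word bigrams as the patterns and a one-sided Fisher's exact test as the statistic; both are stand-ins, since the description does not state TEPAPA's exact pattern definition or test, and a real scan would also need to correct for multiple testing.

```python
from math import comb
from typing import List, Set, Tuple


def ngrams(text: str, n: int = 2) -> Set[str]:
    """Semantic-free patterns: lower-cased word n-grams, with no dictionary or ontology."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for enrichment in the 2x2 table [[a, b], [c, d]].

    a = pattern present & outcome positive, b = present & negative,
    c = absent & positive,                  d = absent & negative.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    # Hypergeometric tail: tables with at least `a` pattern-positive outcome cases.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def pattern_wide_scan(records: List[str], outcomes: List[bool],
                      n: int = 2) -> List[Tuple[str, float]]:
    """Test every pattern for association with the outcome; return (pattern, p) sorted by p."""
    pattern_sets = [ngrams(r, n) for r in records]
    all_patterns = set().union(*pattern_sets)
    positives = sum(outcomes)
    results = []
    for p in all_patterns:
        a = sum(1 for s, y in zip(pattern_sets, outcomes) if p in s and y)
        b = sum(1 for s, y in zip(pattern_sets, outcomes) if p in s and not y)
        c = positives - a
        d = len(outcomes) - positives - b
        results.append((p, fisher_greater(a, b, c, d)))
    # Ties broken alphabetically so the ranking is deterministic.
    return sorted(results, key=lambda x: (x[1], x[0]))
```

Because every candidate pattern is tested and ranked, the output stays "white-box": each hit is a literal text fragment a clinician can inspect directly.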

hivmut / HIV Mutation

A database of mutagenesis and mutation information on Human Immunodeficiency Virus (HIV). Hivmut describes the phenotypes of 7,608 unique mutations at 2,520 sites in the HIV proteome, derived from the analysis of 120,899 papers. The mutation information for each protein is organised in a residue-centric manner, and each residue is linked to the relevant experimental literature. Because HIV remains a major global health burden, extensive effort to maximise the efficiency of HIV research is warranted; the HIV mutation browser provides a valuable new resource for the research community.
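A residue-centric layout means that mutations are keyed by protein and residue position, so each site gathers all of its variants together with their supporting papers. A minimal in-memory version could look like this; the record fields and the placeholder identifiers are hypothetical, not hivmut's schema.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class MutationRecord(NamedTuple):
    protein: str      # protein name, e.g. "RT"
    position: int     # residue number within that protein
    wild_type: str    # one-letter wild-type amino acid
    mutant: str       # one-letter mutant amino acid
    phenotype: str    # phenotype summary
    pmid: str         # identifier of the supporting paper


def index_by_residue(records: List[MutationRecord]) -> Dict[Tuple[str, int], List[MutationRecord]]:
    """Group mutations so each (protein, residue) key links to all of its literature."""
    index: Dict[Tuple[str, int], List[MutationRecord]] = defaultdict(list)
    for rec in records:
        index[(rec.protein, rec.position)].append(rec)
    return dict(index)
```

Keying on the residue rather than on individual mutations lets a browser show every substitution reported at a site, with its literature, on a single page.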


An open-source, rule-based system for extracting point mutation mentions from text. On blind test data, MutationFinder achieves nearly perfect precision and markedly improved recall over a baseline. MutationFinder, a high-quality gold-standard data set, and a scoring script for mutation extraction systems have been made publicly available. Implementations, source code, and unit tests are available in Python, Perl, and Java. MutationFinder can be used as a stand-alone script or imported by other applications.
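MutationFinder's rules are considerably more extensive than a couple of patterns, but the core idea of rule-based point mutation extraction can be sketched with two regular expressions: one for the one-letter form (`A123T`) and one for the three-letter form (`Ala123Thr`), both normalized to one-letter codes. This is a simplified illustration, not MutationFinder's implementation or API; the function name is hypothetical.

```python
import re
from typing import List, Tuple

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)  # alternation of the 20 three-letter codes

# One-letter form, e.g. "A123T" (case-sensitive to limit false positives).
ONE_LETTER_RE = re.compile(rf"\b([{ONE}])(\d+)([{ONE}])\b")
# Three-letter form, e.g. "Ala123Thr" (residue names matched case-insensitively).
THREE_LETTER_RE = re.compile(rf"\b({THREE})(\d+)({THREE})\b", re.IGNORECASE)


def extract_point_mutations(text: str) -> List[Tuple[str, int, str]]:
    """Return (wild_type, position, mutant) triples in one-letter code, in text order."""
    hits = []
    for m in ONE_LETTER_RE.finditer(text):
        hits.append((m.start(), (m.group(1), int(m.group(2)), m.group(3))))
    for m in THREE_LETTER_RE.finditer(text):
        wt = THREE_TO_ONE[m.group(1).upper()]
        mt = THREE_TO_ONE[m.group(3).upper()]
        hits.append((m.start(), (wt, int(m.group(2)), mt)))
    # Sort by character offset so both forms come back in reading order.
    return [triple for _, triple in sorted(hits)]
```

Normalizing both surface forms to the same triple is what allows extracted mentions to be compared directly against a gold standard.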


A text-mining system that extracts protein mutation-disease associations from MEDLINE abstracts by incorporating discourse-level analysis. Evaluated on a benchmark data set derived from curated database records, MutD achieves an F-measure of 64.3% in reconstructing the protein mutation-disease associations in those records. The discourse-level analysis component of MutD contributed a gain of more than 10% in F-measure compared with sentence-level association extraction.
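The F-measure is the harmonic mean of precision P and recall R, F = 2PR / (P + R), computed here over predicted versus curated mutation-disease pairs. A minimal sketch, assuming exact matching of (mutation, disease) pairs, which may differ from MutD's actual evaluation criteria; the function name is hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (protein mutation, disease)


def prf(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and balanced F-measure of predicted pairs against curated records."""
    tp = len(predicted & gold)  # pairs found both by the system and in the curated records
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For example, if 2 of 3 predicted pairs are correct and 2 of 4 curated pairs are recovered, then P = 2/3, R = 1/2, and F = 4/7 ≈ 0.571.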


Automatically identifies PubMed abstracts that contain information on the impact of a protein-level mutation on the stability or activity of a given enzyme. To query EnzyMiner, choose an enzyme from the list and specify whether you are interested in disease-related or non-disease-related abstracts. For disease-related abstracts, the mutation list and direct links to the abstracts are displayed. For non-disease-related abstracts, the mutation list is shown and the abstracts are additionally categorized into two groups, which indicate whether the mutation affects the enzyme's stability or its function. If your target enzyme is not in the list, enter its name in the query box.
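The query flow described above (filter by enzyme and by the disease flag, then split non-disease abstracts by effect) can be sketched as follows. The `Abstract` record, its fields, and the `"stability"`/`"activity"` labels are hypothetical; EnzyMiner is a web service, and this does not reproduce its internals or its classifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Abstract:
    pmid: str
    enzyme: str
    mutations: List[str]
    disease_related: bool
    effect: Optional[str] = None  # "stability" or "activity"; set only for non-disease abstracts


def query(abstracts: List[Abstract], enzyme: str,
          disease_related: bool) -> Dict[str, List[Abstract]]:
    """Filter by enzyme and disease flag, then group non-disease hits by mutation effect."""
    hits = [a for a in abstracts
            if a.enzyme.lower() == enzyme.lower() and a.disease_related == disease_related]
    if disease_related:
        # Disease-related results are listed as-is (mutations plus links).
        return {"all": hits}
    groups: Dict[str, List[Abstract]] = {"stability": [], "activity": []}
    for a in hits:
        # Abstracts without an effect label are left out of both groups.
        if a.effect in groups:
            groups[a.effect].append(a)
    return groups
```

Matching the enzyme name case-insensitively is a design choice for this sketch, so that a free-text query such as "Lipase" still finds abstracts labelled "lipase".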