Allows users to explore PubMed search results with the Gene Ontology (GO), a hierarchically structured vocabulary for molecular biology. GoPubMed provides the following benefits. First, it gives an overview of the literature by categorizing abstracts according to the GO, allowing users to navigate quickly through the abstracts by category. Second, GoPubMed automatically shows general ontology terms related to the original query, which often do not appear directly in the abstract. Third, it enables users to verify its classification, because GO terms are highlighted in the abstracts and each term is labelled with an accuracy percentage. Fourth, it shows definitions of GO terms without the need for further lookup.


A framework for the automatic extraction of gene-disease relations from biomedical literature. DTMiner takes large biomedical literature repositories as input, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. DTMiner incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts; association detection, which extracts and evaluates features from gene-disease pairs; and ranking algorithms that estimate how closely the pairs are related.


Get annotations for biomedical text with concepts from the ontologies. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. To generate annotations for text, simply enter text in the box and press the submit button. The system matches words in the text to terms in ontologies by doing an exact string comparison (a “direct” match) between the text and ontology term names, synonyms, and IDs.
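The "direct" matching described above can be illustrated with a minimal Python sketch. It is not the Annotator's actual implementation: the dictionary entries, the placeholder concept IDs, the three-word window, and the case-insensitive comparison are all assumptions made for illustration.

```python
import re

# Hypothetical mini-dictionary: term name or synonym (lower-cased) -> concept ID.
# The IDs are placeholders, not real ontology identifiers.
DICTIONARY = {
    "melanoma": "EX:0001",
    "skin cancer": "EX:0002",
    "braf": "EX:0003",
}


def annotate(text, dictionary=DICTIONARY, max_words=3):
    """Return (start, end, matched_text, concept_id) for exact dictionary hits."""
    # Character offsets of every word in the text.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations = []
    for i in range(len(tokens)):
        # Try word windows of length 1..max_words starting at token i.
        for n in range(1, max_words + 1):
            if i + n > len(tokens):
                break
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = text[start:end]
            # Assumption: comparison ignores case; otherwise the match is exact.
            concept = dictionary.get(candidate.lower())
            if concept is not None:
                annotations.append((start, end, candidate, concept))
    return annotations


for hit in annotate("BRAF mutations in skin cancer and melanoma"):
    print(hit)
```

A real service would load terms, synonyms, and IDs from the ontologies rather than from a hard-coded table.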


A syntax convolutional neural network (SCNN) based drug-drug interaction (DDI) extraction approach. In this approach, a novel word embedding (syntax word embedding) is proposed to exploit the syntactic information of a sentence. The syntax word embedding is then extended with position and part-of-speech (POS) features to introduce position and POS information. In addition, an auto-encoder is employed to transform sparse bag-of-words feature vectors into dense real-valued feature vectors before they are combined with the convolutional features. Finally, their combination is passed to a softmax layer to learn the DDI classifier. Experimental results on the DDIExtraction 2013 corpus show that our method achieves an F-score of 0.686, which is superior to those of the state-of-the-art methods.

@MInter / Automated Text-mining of Microbial Interactions

A machine-learning-based text-mining system for automated identification and extraction of microbial interaction data. @MInter complements manual annotation and curation by identifying potentially informative texts, which accelerates the pace of data curation. It comprises two core components: a crawler for the acquisition of article data and a classifier for the identification of abstracts containing interaction information. @MInter can be useful for mining large datasets such as the unstructured text compendiums of scientific literature.

PIE the search / Protein Interaction information Extraction the search

Provides articles containing protein-protein interaction (PPI) information for biologists, baseline system performance for bio-text mining researchers, and a compact PubMed-search environment for PubMed users. For easy user access, PIE the search provides a PubMed-like search environment, but the output is a list of articles prioritized by PPI confidence scores. Because PPI-related articles are ranked highly, researchers can more easily find up-to-date PPI information that cannot be found in manually curated PPI databases.

AND-HCV / Associative Network Discovery-HCV

A system for automated reconstruction of molecular genetic interaction networks. ANDSystem provides the reconstruction and analysis of semantic associative networks, i.e., networks describing interactions between molecular and genetic objects associated with certain biological processes, phenotypic traits, diseases, etc. For automated knowledge extraction, ANDSystem uses shallow parsing technology based on semantic templates. AND-HCV contains data on the involvement of viral proteins in the regulation of expression and activity of human genes and proteins. It also includes a visualization tool that provides a graphical representation of associative networks and a convenient interface for navigating the database.

GLAD4U / Gene List Automatically Derived For You

Creates expert candidate gene lists. GLAD4U is a web-based gene retrieval and prioritization tool. It uses the NCBI eSearch API to find publications related to a user query and relies on the gene-to-publication link table to identify genes from the retrieved publications. It also provides additional functionalities, such as sending queries to WebGestalt for functional enrichment analysis and providing a direct link to visualize interactions among the protein products of the genes based on the Cytoscape Web utility.
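The gene identification step can be sketched as a lookup against a gene-to-publication table. The snippet below is only an illustration under stated assumptions: the PMIDs and table rows are made up, the eSearch call is replaced by a fixed list of PMIDs, and ranking by raw publication count is a stand-in, since GLAD4U's actual prioritization score is not described here.

```python
from collections import Counter


def rank_genes(pmids, gene2pubmed):
    """Rank genes by how many of the retrieved publications are linked to them.

    pmids: iterable of publication IDs (e.g. the result of a literature search).
    gene2pubmed: iterable of (gene, pmid) rows from a gene-to-publication table.
    """
    retrieved = set(pmids)
    counts = Counter()
    # Deduplicate rows so a repeated (gene, pmid) link is counted only once.
    for gene, pmid in set(gene2pubmed):
        if pmid in retrieved:
            counts[gene] += 1
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data for illustration only.
retrieved_pmids = [101, 102, 103]
link_table = [("TP53", 101), ("TP53", 102), ("BRCA1", 102), ("EGFR", 999)]
print(rank_genes(retrieved_pmids, link_table))
```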

ANDSystem / Associative Network Discovery System

A tool developed for the reconstruction of molecular genetic networks. ANDSystem is based on automated text- and data-mining techniques. It provides a detailed description of the various types of interactions between genes, proteins, microRNAs, metabolites, cellular components, pathways and diseases, taking into account the specificity of cell lines and organisms. Although the accuracy of ANDSystem is comparable to that of other well-known text-mining tools, such as Pathway Studio and STRING, it outperforms them in the number of interaction types it can identify.


Visualizes the association between proteins and diseases, based on text mining data processed from scientific literature. TIN-X provides an interactive visualization, ranking, and prioritization platform for scientists interested in exploring potentially novel drug targets and examining the relationship between diseases, disease categories, proteins and protein classes, using automated text mining of biomedical literature. Although it cannot replace expert human readers and curators, it is increasingly clear that automated bibliometry is essential given the accelerating pace and volume of publications.

IBRel / Identifying Biomedical Relations

Assists in evaluating miRNA-gene relation extraction. IBRel is a method for extracting biomedical relations from texts using only existing resources, accompanied by a dataset of miRNA-gene relations that were automatically extracted and manually validated. It is based on the sparse multi-instance learning algorithm, which is trained on an automatically generated corpus of 4,000 documents related to miRNAs. This method does not require a manually annotated corpus.


A web-based NCBI-PubMed search application that can analyze articles for selected biomedical verbs and give users relational information such as subject, object, location, manner, and time. After receiving a keyword query, BWS retrieves matching PubMed abstracts and lists them along with snippets in order of relevance to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form.

crowd cid relex

A crowdsourcing approach to extracting chemical-disease relations (CDRs) from PubMed abstracts in the context of the BioCreative V community-wide biomedical text mining challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition.
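The aggregation rule and the reported score can be reproduced with a short Python sketch. The relation names and votes below are invented for illustration, and the F-score is assumed to be the balanced F1 (harmonic mean of precision and recall), which is consistent with the reported figures.

```python
def aggregate_votes(judgments, threshold=4):
    """Keep relations whose number of positive votes reaches the threshold.

    judgments: dict mapping a relation to a list of booleans, one per worker.
    """
    return {rel for rel, votes in judgments.items() if sum(votes) >= threshold}


def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical judgments from five workers per candidate relation.
judgments = {
    ("chemical_A", "disease_X"): [True, True, True, True, False],   # 4 votes: kept
    ("chemical_B", "disease_Y"): [True, True, True, False, False],  # 3 votes: dropped
}
print(aggregate_votes(judgments))

# Precision and recall as reported for the official evaluation set.
print(round(f_score(0.475, 0.540), 3))
```

Raising or lowering `threshold` trades precision against recall, which is the main design choice in this kind of voting scheme.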

DiMeX / eXtraction of Mutation association to Diseases

A text mining system for mutation-disease association extraction. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. The results indicate that our system outperforms existing mutation-disease association tools, addressing the low precision that affects most approaches. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. We conclude that this high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.


A resource that aims to be the most comprehensive freely available database of disease–gene associations. DISEASES is based on open-source text-mining software that recognizes diseases and human genes in text and extracts disease–gene associations. We integrate the associations extracted through automatic text mining with evidence from databases with permissive licenses, namely manually curated associations from Genetics Home Reference (GHR) and UniProt Knowledgebase (UniProtKB), GWAS results from DistiLD, and mutation data from Catalog of Somatic Mutations in Cancer (COSMIC).


Compiles information from a broad selection of resources and limits display of the information to user-selected areas of interest. ToxReporter is a Perl-based web application that uses a MySQL database to streamline this process by categorizing information from public and proprietary domains into predefined safety categories according to a customizable lexicon. It also uses a scoring system based on relative counts of red flags to rank all genes by the amount of information pertaining to each safety issue.

CD-REST / Chemical Disease Relation Extraction SysTem

An end-to-end system for extracting chemical-induced disease relations in biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that classifies both sentence-level and document-level candidate drug-disease pairs using support vector machines. Our system achieved the best performance on the chemical-induced disease relation extraction subtask in the BioCreative V CDR Track, demonstrating the effectiveness of our proposed machine learning-based approaches for automatic extraction of chemical-induced disease relations in biomedical literature. The CD-REST system provides web services via HTTP POST requests.

eFIP / extracting Functional Impact of Phosphorylation

A tool to support article selection and information extraction of functional impact of phosphorylated proteins. The current version focuses on protein-protein interactions (PPIs) as functional impact. In eFIP, PPIs refer to interactions between protein elements, including protein complexes and classes of proteins. Impact is defined as any direct relation between protein phosphorylation and PPI. The relation could be positive (phosphorylation of A increases binding to B), negative (when phosphorylated A dissociates from B) or neutral (phosphorylated A binds B).


A PPI search system that automatically extracts and visualizes protein-protein interactions (PPIs) from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook shows the interaction relationships among the proteins in three-dimensional space, based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the number of proteins in a semantic class that interact with the query protein, and count the number of articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface.
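The topological parameters mentioned above (nodes, edges, connected components) can be computed from a list of extracted interactions. The following Python sketch uses a small union-find structure; it assumes undirected interactions, and the protein names are placeholders rather than PPLook output.

```python
def topology(edges):
    """Return (n_nodes, n_edges, n_components) for an undirected edge list."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    unique_edges = set()
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if a != b:
            # (a, b) and (b, a) are the same undirected interaction.
            unique_edges.add(frozenset((a, b)))
            parent[root_a] = root_b
    components = {find(node) for node in list(parent)}
    return len(parent), len(unique_edges), len(components)


# Hypothetical interactions for illustration only.
interactions = [("A", "B"), ("B", "C"), ("B", "A"), ("D", "E")]
print(topology(interactions))
```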


Extracts UMLS concepts from biomedical texts such as scientific paper abstracts, experiment descriptions or medical notes. It can be used to automatically curate and annotate biomedical literature, or to index large document databases in order to improve searches or discover relationships between documents. Recognizing specific biomedical concepts in free text is an increasingly important process, and Biolabeler focuses on this task to help human and computer annotators be more precise, improving the quality of the huge biomedical text databases that bioinformaticians and biologists have to deal with nowadays.


A web-based text mining tool to extract human PPIs from biomedical literature. PPInterFinder uses co-occurrences of relation keywords with protein names to extract information on PPIs from MEDLINE abstracts, and consists of three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence against a set of 11 specific patterns based on the syntactic nature of the PPI pair.
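The co-occurrence idea behind the first two phases can be sketched in a few lines of Python. This is a deliberate simplification, not PPInterFinder's method: it uses plain token lookup instead of a Tregex parser, the keyword and protein dictionaries are hypothetical, and the final matching against the 11 syntactic patterns is not modelled.

```python
import re
from itertools import combinations

# Hypothetical dictionaries for illustration only.
RELATION_KEYWORDS = {"binds", "interacts", "phosphorylates"}
PROTEIN_NAMES = {"BRCA1", "BARD1", "TP53", "MDM2"}


def candidate_pairs(sentence):
    """Return (protein_a, keyword, protein_b) candidates for one sentence.

    A candidate is produced only when the sentence contains a relation
    keyword together with at least two distinct protein names.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    keywords = [t.lower() for t in tokens if t.lower() in RELATION_KEYWORDS]
    if not keywords:
        return []
    proteins = []
    for token in tokens:
        if token in PROTEIN_NAMES and token not in proteins:
            proteins.append(token)
    # Pair every two distinct proteins with the first keyword found.
    return [(a, keywords[0], b) for a, b in combinations(proteins, 2)]


print(candidate_pairs("BRCA1 interacts with BARD1 in the nucleus."))
```

Candidates produced this way would still need the rule-based filtering and pattern matching described above before being reported as interactions.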

CPNM / Context-specific Protein Network Miner

Derives context-specific protein interaction networks (PINs) in real time from the PubMed database, based on a set of user-input keywords and an enhanced PubMed query system. CPNM provides a tool for biologists to explore PINs and reports enriched information on protein interactions (with type and directionality) and their network topology, with summary statistics that can be explored via a user-friendly interface. The networks are generated from the current version of the PubMed database.