Relation extraction software tools | Biomedical text mining

Rapidly evolving sequencing technologies (Zhang et al., 2011; Capriotti et al., 2012) have led to a dramatic rise in the number of published articles reporting associations between genomic variations and diseases. An estimated 10,000 or more articles mentioning such associations are published each year (Burger et al., 2014). Manually collecting this information is both expensive and time-consuming. To assist manual curation, several text-mining (TM) efforts have been attempted. However, most of these efforts are limited to identifying mutation mentions only. The majority use regular expressions to detect mutations, although some, like tmVar (Wei et al., 2013) and VTag (McDonald et al., 2004), use conditional random fields (CRFs), and SETH (Thomas et al., 2014) implements an Extended Backus-Naur Form (EBNF) grammar. Only a few of these efforts extend the mutation detection method to associate the mutation with a disease phenotype. Most of these are search-based TM tools that do not automatically extract the mutation-disease relationships expressed in articles.

Source text:
(Mahmood et al., 2016) DiMeX: A Text Mining System for Mutation-Disease Association Extraction. PLoS One.
(Zhang et al., 2011) The impact of next-generation sequencing on genomics. J Genet Genomics.
(Capriotti et al., 2012) Bioinformatics for personal genome interpretation. Brief Bioinform.
(Burger et al., 2014) Hybrid curation of gene-mutation relations combining automated extraction and crowdsourcing. Database 2014.
(Wei et al., 2013) tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics.
(McDonald et al., 2004) An entity tagger for recognizing acquired genomic variations in cancer literature. Bioinformatics.
(Thomas et al., 2014) SETH: SNP Extraction Tool for Human Variations.
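The regular-expression approach to mutation detection mentioned above can be illustrated with a minimal sketch. The pattern below is an assumption made for illustration only: it covers simple protein substitutions written with one-letter amino acid codes (such as V600E or p.G12D) and is not the pattern used by any of the cited tools, which handle many more notations.

```python
import re

# One-letter codes of the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pattern: optional "p." prefix, wild-type residue,
# position, mutant residue (e.g. V600E, p.G12D).
PROTEIN_SUB = re.compile(rf"\b(?:p\.)?([{AA}])(\d+)([{AA}])\b")

def find_substitutions(text):
    """Return (wild_type, position, mutant) tuples for simple substitutions."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in PROTEIN_SUB.finditer(text)
    ]

sample = "The BRAF V600E mutation and p.G12D in KRAS were both observed."
print(find_substitutions(sample))
```

A pattern this simple will miss three-letter notations, DNA-level changes and free-text descriptions, which is one reason CRF- and grammar-based taggers were developed.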

1 - 44 of 44 results
BWS
A web-based NCBI PubMed search application that analyzes articles for selected biomedical verbs and gives users relational information, such as subject, object, location, manner and time. After receiving a keyword query, BWS retrieves matching PubMed abstracts and lists them, along with snippets, in order of relevance to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form.
PIE the search / Protein Interaction information Extraction the search
Provides protein-protein interaction (PPI) articles for biologists, baseline system performance for bio-text mining researchers and a compact PubMed-search environment for PubMed users. For easy user access, PIE the search provides a PubMed-like search environment, but the output is a list of articles prioritized by PPI confidence scores. Because PPI-related articles are ranked highly, researchers can more easily find up-to-date PPI information that is not yet available in manually curated PPI databases.
A syntax convolutional neural network (SCNN) based drug-drug interaction (DDI) extraction approach. In this approach, a novel word embedding (syntax word embedding) is proposed to exploit the syntactic information of a sentence. The syntax word embedding is then extended with position and part-of-speech (POS) features. In addition, an auto-encoder is employed to transform sparse bag-of-words feature vectors into dense real-valued feature vectors before they are combined with the convolutional features. Finally, the combined features are passed to a softmax layer to learn the DDI classifier. Experimental results on the DDIExtraction 2013 corpus show that the method achieves an F-score of 0.686, which the authors report as superior to the state-of-the-art methods at the time.
@MInter / Automated Text-mining of Microbial Interactions
A machine-learning-based text-mining system for automated identification and extraction of microbial interaction data. @MInter complements manual annotation and curation by identifying potentially informative texts, to accelerate the pace of data curation. It comprises two core components: a crawler for the acquisition of article data and a classifier for the identification of abstracts containing interaction information. @MInter can be useful for mining large datasets such as unstructured text compendiums of scientific literature.
CIIPro / Chemical In vitro-In vivo Profiling
A package to link chemical features and in vitro biological data with targeted in vivo biological activity. The CIIPro portal can automatically extract in vitro biological data from public resources for user-supplied compounds, and identify the most similar compounds based on their optimized bioprofiles. Compared with existing hybrid approaches, the CIIPro portal provides a new read-across strategy to deal with missing and biased data when using public data sources.
TIN-X
Visualizes the association between proteins and diseases, based on text-mining data processed from scientific literature. TIN-X provides an interactive visualization, ranking and prioritization platform for scientists interested in exploring potentially novel drug targets and in examining the relationships between diseases, disease categories, proteins and protein classes, using automated text mining of biomedical literature. It cannot replace expert human readers and curators, yet it is increasingly clear that automated bibliometry is essential given the accelerating pace and volume of publications.
DTMiner
A framework for the automatic extraction of gene-disease relations from biomedical literature. DTMiner takes large biomedical literature repositories as input, identifies credible relationships between diseases and genes, and presents possible genes related to a given disease and possible diseases related to a given gene. DTMiner incorporates named entity recognition (NER), which identifies occurrences of genes and diseases in texts; association detection, which extracts and evaluates features from gene-disease pairs; and ranking algorithms that estimate how closely the pairs are related.
eFIP / extracting Functional Impact of Phosphorylation
A tool to support article selection and information extraction of functional impact of phosphorylated proteins. The current version focuses on protein-protein interactions (PPIs) as functional impact. In eFIP, PPIs refer to interactions between protein elements, including protein complexes and classes of proteins. Impact is defined as any direct relation between protein phosphorylation and PPI. The relation could be positive (phosphorylation of A increases binding to B), negative (when phosphorylated A dissociates from B) or neutral (phosphorylated A binds B).
GLAD4U / Gene List Automatically Derived For You
Creates expert candidate gene lists. GLAD4U is a web-based gene retrieval and prioritization tool. It uses the NCBI eSearch API to find publications related to a user query and relies on the gene-to-publication link table to identify genes from the retrieved publications. It also provides additional functionality, such as sending queries to WebGestalt for functional enrichment analysis and a direct link for visualizing interactions among the protein products of the genes with the Cytoscape Web utility.
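As a rough illustration of the publication-retrieval step described for GLAD4U, the sketch below builds a request URL for the NCBI E-utilities eSearch endpoint. It only constructs the URL and makes no network call; the function name, the default result limit and the example query are assumptions for illustration, not GLAD4U's actual code.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities eSearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(query, max_results=20):
    """Build a PubMed eSearch URL for a free-text query (hypothetical helper)."""
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": query,         # user query
        "retmax": max_results, # maximum number of PMIDs to return
        "retmode": "json",     # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"

print(build_esearch_url("breast cancer"))
```

The returned PubMed identifiers would then be joined against a gene-to-publication table to obtain candidate genes.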
eGARD / extracting Genomic Anomalies association with Response to Drugs
Proposes a text-mining method to highlight interactions between genomic anomalies and drug response within MEDLINE abstracts. eGARD detects lexico-syntactic dependency structures in scientific literature with the aim of assisting users in curation. The application considers several types of anomaly, including substitutions, duplications, insertions, deletions, gene copy number variations, structural variants, and gene and protein expression changes.
IBRel / Identifying Biomedical Relations
Supports miRNA-gene relation extraction and its evaluation. IBRel comprises a method for extracting biomedical relations from text using only existing resources, and a dataset of miRNA-gene relations that were automatically extracted and manually validated. It is based on the sparse multi-instance learning algorithm, which is trained on an automatically generated corpus of 4,000 documents related to miRNAs. The method does not require a manually annotated corpus.
LGscore
Highlights disease-related genes. LGscore is an approach that combines literature and Google search data to improve its predictions. The method relies on a text-mining process that starts from PubMed abstracts linked to the diseases of interest to establish a first gene network. This network is then enriched with information extracted from Google Search to refine the predictions. It was tested on five different diseases, including diabetes, colon cancer and lung cancer.
Annotates biomedical text with concepts from ontologies. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. To generate annotations, simply enter text in the box and press the submit button. The system matches words in the text to terms in ontologies by exact string comparison (a “direct” match) between the text and ontology term names, synonyms and IDs.
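The exact-match ("direct") lookup described above can be sketched as follows. This is a minimal illustration and not the Annotator service's implementation: the term dictionary, the concept identifiers and the three-word phrase limit are all made-up assumptions.

```python
import re

# Hypothetical mini-dictionary: lowercased term or synonym -> concept ID.
TERMS = {
    "melanoma": "CONCEPT:0001",
    "skin cancer": "CONCEPT:0002",
    "braf": "CONCEPT:0003",
}

def annotate(text, terms=TERMS, max_words=3):
    """Return (start, end, matched_text, concept_id) for exact dictionary hits."""
    tokens = [(m.group(0), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations = []
    for i in range(len(tokens)):
        # Try longer phrases first, then shorter ones, starting at token i.
        for n in range(max_words, 0, -1):
            if i + n > len(tokens):
                continue
            phrase = " ".join(tok[0] for tok in tokens[i:i + n]).lower()
            if phrase in terms:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append((start, end, text[start:end], terms[phrase]))
    return annotations

for hit in annotate("BRAF mutations in skin cancer and melanoma"):
    print(hit)
```

Exact matching is fast and predictable, but it cannot recognize spelling variants or inflected forms unless they are listed as synonyms in the dictionary.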
AND-HCV / Associative Network Discovery-HCV
A system for automated reconstruction of molecular genetic interaction networks. ANDSystem provides the reconstruction and analysis of semantic associative networks, i.e., networks describing interactions between molecular and genetic objects associated with certain biological processes, phenotypic traits, diseases, etc. For automated knowledge extraction, ANDSystem uses shallow parsing technology based on semantic templates. It contains data on the involvement of viral proteins in the regulation of expression and activity of human genes and proteins. An interrelation and visualization tool provides a graphic representation of the associative networks and a convenient interface for navigating the database.
PPLook
A PPI search system that automatically extracts and visualizes protein-protein interactions (PPIs) from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges and connected components. The visualization component of PPLook shows the interaction relationships among the proteins in three-dimensional space, based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the proteins of a semantic class that interact with the query protein, and count the articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface.
QTM / QTLTableMiner++
Detects and semantically annotates results obtained from quantitative trait locus (QTL) mapping experiments. QTM extracts articles in a syntactically interoperable format (XML). The software filters candidate trait tables out of all the tables in an article. It uses the Apache Solr search platform to extract and annotate biological entities in these statements with domain-specific ontologies. The tool aims to provide QTL information in machine-readable and semantically interoperable formats.
ANDSystem / Associative Network Discovery System
A tool developed for the reconstruction of molecular genetic networks. ANDSystem is based on automated text- and data-mining techniques. It provides a detailed description of the various types of interactions between genes, proteins, microRNAs, metabolites, cellular components, pathways and diseases, taking into account the specificity of cell lines and organisms. Although the accuracy of ANDSystem is comparable to that of other well-known text-mining tools, such as Pathway Studio and STRING, it can identify a larger number of interaction types.
miRiaD / microRNAs in association with Disease
Automates the investigation of potential associations between microRNAs (miRs) and diseases in scientific literature. miRiaD is a web platform based on an approach that aims to reconstruct the links between a miR and a disease of interest in a text by extracting information from MEDLINE abstracts. The application aims to detect relationships between (i) a miR and its aspect, (ii) a disease and its aspect and (iii) a miR entity and a disease entity.
PPInterFinder
A web-based text-mining tool to extract human PPIs from biomedical literature. PPInterFinder uses co-occurrences of relation keywords with protein names to extract information on PPIs from MEDLINE abstracts, and works in three phases. First, it identifies the relation keyword using a parser with Tregex and a relation keyword dictionary. Next, it automatically identifies the candidate PPI pairs with a set of rules related to PPI recognition. Finally, it extracts the relations by matching the sentence against a set of 11 specific patterns based on the syntactic nature of the PPI pair.
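The keyword co-occurrence idea behind the first two phases can be sketched very simply. The code below is a deliberately simplified stand-in and not PPInterFinder's method: it uses plain sentence splitting and dictionary lookup instead of Tregex parsing and the 11 syntactic patterns, and both dictionaries are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical dictionaries, for illustration only.
PROTEINS = {"TP53", "MDM2", "BRCA1"}
RELATION_KEYWORDS = {"binds", "interacts", "phosphorylates"}

def candidate_ppis(abstract):
    """Return (protein_a, protein_b, keyword) for sentences in which at least
    two known proteins co-occur with a relation keyword."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        words = re.findall(r"[A-Za-z0-9]+", sentence)
        proteins = sorted({w for w in words if w in PROTEINS})
        keywords = sorted({w.lower() for w in words if w.lower() in RELATION_KEYWORDS})
        if keywords and len(proteins) >= 2:
            for a, b in combinations(proteins, 2):
                candidates.append((a, b, keywords[0]))
    return candidates

print(candidate_ppis("MDM2 binds TP53 in vitro. BRCA1 was also studied."))
```

Co-occurrence alone yields many false positives (negated or speculative statements, for instance), which is why a final pattern-matching phase is needed to confirm each candidate pair.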
CD-REST / Chemical Disease Relation Extraction SysTem
An end-to-end system for extracting chemical-induced disease relations from biomedical literature. CD-REST consists of two main components: (1) a chemical and disease named entity recognition and normalization module, which employs the Conditional Random Fields algorithm for entity recognition and a Vector Space Model-based approach for normalization; and (2) a relation extraction module that classifies both sentence-level and document-level candidate drug-disease pairs with support vector machines. According to its authors, the system achieved the best performance on the chemical-induced disease relation extraction subtask of the BioCreative V CDR Track, demonstrating the effectiveness of machine learning-based approaches for automatic extraction of chemical-induced disease relations from biomedical literature. The CD-REST system provides web services via HTTP POST requests.
crowd cid relex
A crowdsourcing approach to extracting chemical-disease relations (CDRs) from PubMed abstracts in the context of the BioCreative V community-wide biomedical text mining challenge. Five non-expert workers on the CrowdFlower platform were shown each potential chemical-induced disease relation highlighted in the original source text and asked to make binary judgments about whether the text supported the relation. Worker responses were aggregated through voting, and relations receiving four or more votes were predicted as true. On the official evaluation dataset of 500 PubMed abstracts, the crowd attained a 0.505 F-score (0.475 precision, 0.540 recall), with a maximum theoretical recall of 0.751 due to errors with named entity recognition.
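The vote aggregation and the reported F-score can be reproduced with a short sketch. The relation identifiers and vote lists below are invented for illustration; only the threshold of four votes out of five and the precision and recall values come from the description above.

```python
def aggregate_votes(votes, threshold=4):
    """Predict a relation as true when it receives at least `threshold` yes-votes.

    `votes` maps a relation ID to a list of binary worker judgments (1 = supported).
    """
    return {relation: sum(judgments) >= threshold for relation, judgments in votes.items()}

def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical judgments from five workers per candidate relation.
votes = {
    "chemical_A-disease_X": [1, 1, 1, 1, 0],  # four yes-votes: predicted true
    "chemical_B-disease_Y": [1, 1, 0, 0, 1],  # three yes-votes: predicted false
}
print(aggregate_votes(votes))
print(round(f_score(0.475, 0.540), 3))
```

With the reported precision of 0.475 and recall of 0.540, the harmonic mean is 2 × 0.475 × 0.540 / 1.015 ≈ 0.505, which matches the stated F-score.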
ToxReporter
Compiles information from a broad selection of resources and limits the display of that information to user-selected areas of interest. ToxReporter is a Perl-based web application that uses a MySQL database to streamline this process by categorizing information derived from public and proprietary domains into predefined safety categories according to a customizable lexicon. It also uses a scoring system based on relative counts of red flags to rank all genes by the amount of information pertaining to each safety issue.
Biolabeler
Extracts UMLS concepts from biomedical texts such as scientific paper abstracts, experiment descriptions or medical notes. It can be used to automatically curate and annotate biomedical literature, or to index large document databases in order to improve searches or discover relationships between documents. Recognizing specific biomedical concepts in free text is an increasingly important task, and Biolabeler focuses on it to help human and computer annotators be more precise, improving the quality of the huge biomedical text databases that bioinformaticians and biologists now have to deal with.
CPNM / Context-specific Protein Network Miner
Derives context-specific protein interaction networks (PINs) in real time from the current version of the PubMed database, based on a set of user-supplied keywords and an enhanced PubMed query system. CPNM provides a tool for biologists to explore PINs and reports enriched information on protein interactions (with type and directionality) and on network topology, with summary statistics that can be explored via a user-friendly interface.
1 - 4 of 4 results
DISEASES
A resource which aims to be the most comprehensive freely available database of disease–gene associations. DISEASES is based on an open-source text-mining software that recognizes diseases and human genes in text and extracts disease–gene associations. We integrate the associations extracted through automatic text mining with evidence from databases with permissive licenses, namely manually curated associations from Genetics Home Reference (GHR) and UniProt Knowledgebase (UniProtKB), GWAS results from DistiLD, and mutation data from Catalog of Somatic Mutations in Cancer (COSMIC).