A network of co-occurring genes and proteins extends through the scientific literature, touching on phenotypes, pathologies and gene function. The iHOP system shows that distant medical and biological concepts can be related by surprisingly few intermediate genes; the shortest path between any two genes involves, on average, only four steps.
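The shortest-path observation for iHOP can be illustrated with a breadth-first search over a gene co-occurrence graph. This is a minimal sketch, not iHOP's implementation: the graph, the gene labels (`GENE_A` and so on) and the function name `shortest_path_length` are all hypothetical.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# Hypothetical undirected co-occurrence graph: an edge means two genes
# are mentioned together in at least one sentence (illustrative data only).
toy_graph = {
    "GENE_A": {"GENE_B"},
    "GENE_B": {"GENE_A", "GENE_C"},
    "GENE_C": {"GENE_B", "GENE_D"},
    "GENE_D": {"GENE_C"},
    "GENE_E": set(),
}
```

Here `shortest_path_length(toy_graph, "GENE_A", "GENE_D")` walks three edges, while an isolated gene such as `GENE_E` yields `None`.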
A library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. GNAT can be used as a component to be integrated with other text-mining systems, as a framework to add user-specific extensions, and as an efficient stand-alone application for the identification of gene and protein names for data analysis.
A web-based NCBI-PubMed search application that analyzes articles for selected biomedical verbs and gives users relational information such as subject, object, location, manner and time. After receiving a keyword query, BWS retrieves matching PubMed abstracts and lists them, with snippets, in order of relevance to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form.
A search tool that integrates different sources of information with the aim of retrieving literature about sequence variation of a gene. In addition, OSIRIS provides a method to link a dbSNP entry with the articles referring to it. OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
A document-level gene normalization software for full-text articles. GeneTUKit employs both local context surrounding gene mentions and global context from the whole full-text document. It can normalize genes of different species simultaneously.
Extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. text2genome links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, thereby allowing researchers to use the power of modern genome browsers to access and analyze publications in the context of genomic data. System performance of the tool is related to the number of predictions made per paper.
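The first stage described for text2genome, pulling DNA sequences out of article text, can be approximated with a regular expression. The sketch below is not the tool's actual extraction logic: it assumes a sequence is an unbroken run of nucleotide letters of at least `min_len` characters, and both the function name `extract_dna` and the default threshold of 15 are hypothetical.

```python
import re

def extract_dna(text, min_len=15):
    """Return upper-cased runs of A/C/G/T/N that are at least min_len long."""
    # Assumption: sequences appear as contiguous letter runs; real articles
    # often split them with spaces, digits or line breaks, which this ignores.
    pattern = re.compile(r"\b[ACGTN]{%d,}\b" % min_len, re.IGNORECASE)
    return [match.group(0).upper() for match in pattern.finditer(text)]

sample = "The forward primer 5'-ATGGCCAAGTTGACCAGTGC-3' was used to tag the cat gene."
sequences = extract_dna(sample)
```

Short words that happen to consist of nucleotide letters, such as "tag" or "cat", fall below the length threshold and are ignored. Mapping the extracted sequences to a genome would require an alignment step that is outside the scope of this sketch.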
A highly competitive system for gene name normalization, which obtains an F-measure performance of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, thus being on a par with the best system on that task. GeNo tackles the complex gene normalization problem by employing a carefully crafted suite of symbolic and statistical methods, and by fully relying on publicly available software and data resources, including extensive background knowledge based on semantic profiling.
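The reported F-measure for GeNo follows from its precision and recall, assuming it is the balanced F-measure (the harmonic mean of the two, with beta = 1). The helper `f_measure` below is a hypothetical name used only for this check.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator

# Precision 87.8% and recall 85.0%, as quoted for the BioCreAtIvE-II test set.
geno_f = round(100 * f_measure(0.878, 0.850), 1)
print(geno_f)  # 86.4
```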
A hybrid method integrating a machine-learning model with a pattern identification strategy to identify the individual components of each composite mention. SimConcept achieves high performance in identifying and resolving composite mentions for three key biological entities: genes (90.42% in F-measure), diseases (86.47% in F-measure), and chemicals (86.05% in F-measure). SimConcept is the first text mining tool to systematically handle many types of composite mentions. It could be useful to assist the bioconcept normalization task.
A gene normalization system specifically tailored for plant species. The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. The pGenN website enables users to search gene normalization information by keywords, a list of PMIDs, or UniProt ACs in the database. The results (gene names and corresponding UniProt ACs) are displayed in sortable tables with text evidence and can be downloaded for further research.
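The three steps listed for pGenN can be sketched as a toy pipeline. This is an illustration under strong assumptions, not pGenN's code: the dictionary, the gene names, the accession strings (`AC_OSA_0001` and so on) and the species cue words are all placeholders, and species assignment is reduced to counting cue words.

```python
import re

# Hypothetical dictionary: gene name -> {species: accession}. Placeholders only.
GENE_DICT = {
    "abc1": {"Arabidopsis thaliana": "AC_ATH_0001", "Oryza sativa": "AC_OSA_0001"},
    "xyz2": {"Oryza sativa": "AC_OSA_0002"},
}
# Hypothetical cue words that hint at a species.
SPECIES_CUES = {"arabidopsis": "Arabidopsis thaliana", "rice": "Oryza sativa"}

def detect_mentions(text):
    """Step 1: dictionary-based mention detection on single tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [token for token in tokens if token.lower() in GENE_DICT]

def assign_species(text):
    """Step 2: pick the species whose cue words occur most often."""
    lowered = text.lower()
    counts = {}
    for cue, species in SPECIES_CUES.items():
        occurrences = lowered.count(cue)
        if occurrences:
            counts[species] = counts.get(species, 0) + occurrences
    return max(counts, key=counts.get) if counts else None

def normalize(text):
    """Step 3: map each mention to the accession for the assigned species."""
    species = assign_species(text)
    results = {}
    for mention in detect_mentions(text):
        accession = GENE_DICT[mention.lower()].get(species)
        if accession:
            results[mention] = accession
    return results

example = normalize("In rice, ABC1 and XYZ2 are induced by drought.")
```

When no species cue is found, the sketch returns no identifiers rather than guessing, which mirrors the idea that normalization is only attempted within an assigned species.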
Identifies potential name occurrences in biomedical text and associates protein and gene database identifiers with the detected matches. ProMiner follows a rule-based approach and its search algorithm is geared towards recognition of multi-word names. It can be adapted to the characteristics of each organism using parameter settings and customized dictionary curation. The tool is able to obtain a high level of performance based on the classification of synonyms into several search classes.
Handles both gene mention and identifier detection. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. It compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset.
Analyzes PubMed abstracts. pubmed.mineR is a program that uses several existing functions from other R packages to enable text mining. Its features include extraction of terms and their contexts, gene recognition, and detection of associations between terms and between genes, including cross-associations, as well as searching for key evidence supporting those associations.
A named entity recognition system intended primarily for biomedical text. BANNER uses conditional random fields as the primary recognition engine and includes a wide survey of the best techniques described in recent literature.
An information extraction method for various natural language processing tasks, including supervised named entity recognition and relationship extraction from biomedical documents. rainbow-nlp is based on distributional semantic similarity over Gene Ontology (GO) terms. With a focus on gene functions, it includes two subtasks: (i) retrieving GO evidence sentences for relevant genes and (ii) predicting GO terms for relevant genes. The main advantage of using an unsupervised open-IE technique is that it can easily be generalized and applied to similar relation extraction problems.
Computes similarities between an input text and already curated instances contained in a knowledge base to infer Gene Ontology (GO) concepts. The main limitation of GOCat, observed by reviewers and acknowledged in its authors' papers, is the difficulty of integrating it into a curation workflow: GOCat is reported to propose more accurate GO concepts, but because these concepts are inferred from the whole abstract, curators still have to locate the function in the publication and link the correct GO concept to a gene product.
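The similarity-based inference described for GOCat resembles a nearest-neighbour classifier over curated texts. The sketch below swaps in a plain bag-of-words cosine similarity and score-weighted voting, which may differ from what GOCat actually uses; the knowledge base entries and the concept labels (`CONCEPT_KINASE_ACTIVITY` and so on) are hypothetical placeholders, not real GO identifiers.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def infer_concepts(text, knowledge_base, k=2):
    """Rank concepts by summed similarity over the k most similar curated texts."""
    query = bag_of_words(text)
    scored = sorted(
        ((cosine(query, bag_of_words(doc)), labels) for doc, labels in knowledge_base),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for score, labels in scored[:k]:
        for label in labels:
            votes[label] += score
    return [label for label, _ in votes.most_common()]

# Hypothetical curated instances: (text, concepts assigned by curators).
toy_kb = [
    ("kinase phosphorylates substrate protein", ["CONCEPT_KINASE_ACTIVITY"]),
    ("protein kinase activity in signalling", ["CONCEPT_KINASE_ACTIVITY", "CONCEPT_SIGNALLING"]),
    ("membrane transport of glucose", ["CONCEPT_TRANSPORT"]),
]
ranked = infer_concepts("a novel kinase phosphorylates its substrate", toy_kb)
```

Because the concepts are voted on from whole curated texts, the output says nothing about where in the input the function is described, which is the workflow limitation noted above.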
Offers a genome viewer dedicated to exploring patenting activity around sequences of interest. PatSeq Analyzer is a web application, part of the PatSeq toolkit, that allows users to (i) investigate specific patent sequences, refine mapping positions, and compare patenting trends, using either recorded patent sequences mapped onto five different genomes or personal files; and (ii) search by gene or SEQ ID to analyze or predict related patenting activity.
Investigates biological sequences in patent documents. PatSeq Explorer allows users to explore patent-disclosed sequences on the genome of a specific organism and to determine linkages between sequences and phenotypes. The application contains patent sequences mapped onto five different genomes, including soybean and mouse. It offers multiple features for searching (by keywords, inventors or classification) and filtering (by years or sequence length) to help users highlight patenting trends. It is part of the PatSeq toolkit.
Enables the rapid identification of specific gene families of interest in related species. OrthoRBH streamlines the collection of homologs prior to downstream molecular evolutionary analysis. The efficacy of the program is demonstrated with the identification of the 13-member PYR/PYL/RCAR gene family in Hordeum vulgare using Oryza sativa query sequences. OrthoRBH is not recommended in situations where sequence homology is very low, for instance, identifying homologous sequences between plants and animals.
Analyzes scientific documents to find interactions between genes/proteins in order to reconstruct molecular networks. Biblio-MetReS relies on a central database with the genomes and gene annotations of more than 1000 organisms. The application accesses this repository of gene names and functions when the user chooses an organism of interest.