A network of co-occurring genes and proteins extends through the scientific literature, touching on phenotypes, pathologies and gene function. The iHOP system shows that distant medical and biological concepts can be related by surprisingly few intermediate genes; the shortest path between any two genes involves, on average, only four steps.
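The "four steps" figure is an average shortest-path length over a gene co-occurrence network. As a minimal sketch, assuming a hypothetical toy network with placeholder genes G1 to G6 (not iHOP data or iHOP's implementation), breadth-first search finds the shortest path between two genes, and averaging over all connected pairs gives the network-wide figure:

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: an edge means the two placeholder genes
# were mentioned together (co-occurred) in at least one sentence.
EDGES = [
    ("G1", "G2"),
    ("G2", "G3"),
    ("G3", "G4"),
    ("G4", "G5"),
    ("G1", "G6"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search: number of steps from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_shortest_path(graph):
    """Mean shortest-path length over all pairs of genes that are connected."""
    lengths = []
    for a, b in combinations(sorted(graph), 2):
        dist = shortest_path_length(graph, a, b)
        if dist is not None:
            lengths.append(dist)
    return sum(lengths) / len(lengths)
```

On this path-shaped toy network, G1 and G5 are four steps apart, while the average over all 15 gene pairs is 35/15 (about 2.33).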
A library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. GNAT can be integrated as a component into other text-mining systems, used as a framework for user-specific extensions, or run as an efficient stand-alone application for identifying gene and protein names for data analysis.
A web-based NCBI PubMed search application that analyzes articles for selected biomedical verbs and gives users relational information such as subject, object, location, manner and time. After receiving a keyword query, BWS retrieves matching PubMed abstracts and lists them with snippets, ranked by relevance to protein-protein interactions. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The results can be viewed in the abstract text or in table form.
Document-level gene normalization software for full-text articles. GeneTUKit uses both the local context surrounding gene mentions and the global context of the whole full-text document. It can normalize genes from different species simultaneously.
Extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. text2genome links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, allowing researchers to use modern genome browsers to access and analyze publications in the context of genomic data. The tool's performance varies with the number of predictions made per paper.
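A minimal sketch of the first stage, finding candidate DNA sequences in free text, assuming that any uninterrupted run of at least 15 nucleotide letters (A, C, G, T) is a candidate. The threshold and pattern are illustrative assumptions, not text2genome's actual rules, and mapping the candidates onto a genome (for example with a sequence aligner) is out of scope here:

```python
import re

# Assumption: a candidate sequence is an uninterrupted run of at least
# MIN_LENGTH nucleotide letters bounded by non-word characters.
MIN_LENGTH = 15
DNA_PATTERN = re.compile(r"\b[ACGTacgt]{%d,}\b" % MIN_LENGTH)


def extract_dna_sequences(text):
    """Return candidate DNA sequences found in free text, upper-cased."""
    return [match.group(0).upper() for match in DNA_PATTERN.finditer(text)]
```

Short words made of the same letters, such as "CAT" or "GATTACA", fall below the length threshold and are ignored.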
A highly competitive system for gene name normalization, which obtains an F-measure of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, putting it on a par with the best system on that task. GeNo tackles the complex gene normalization problem by employing a carefully crafted suite of symbolic and statistical methods, and by relying entirely on publicly available software and data resources, including extensive background knowledge based on semantic profiling.
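The F-measure quoted here is the balanced F-score, the harmonic mean of precision and recall, F = 2PR / (P + R). A quick check reproduces the reported figure from the stated precision and recall:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for GeNo on the BioCreAtIvE-II test set.
score = f_measure(0.878, 0.850)
print(f"{score:.1%}")  # 86.4%
```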
A hybrid method that integrates a machine-learning model with a pattern identification strategy to identify the individual components of each composite mention. SimConcept achieves high performance in identifying and resolving composite mentions for three key biological entities: genes (90.42% F-measure), diseases (86.47% F-measure) and chemicals (86.05% F-measure). SimConcept is the first text-mining tool to systematically handle many types of composite mentions, and it could be useful for assisting bioconcept normalization.
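A composite mention packs several entities into one string, such as "CDK4/6" for CDK4 and CDK6. The sketch below handles only one hypothetical pattern type (a shared letter stem with slash-separated numeric suffixes) using a regular expression; it is not SimConcept's hybrid model, which also uses machine learning to cover many more mention types:

```python
import re

# Hypothetical single rule: a letter stem followed by alternative numeric
# suffixes separated by slashes, e.g. "CDK4/6" or "IL-2/4/6".
COMPOSITE = re.compile(r"^(?P<stem>[A-Za-z][A-Za-z-]*)(?P<first>\d+)(?P<rest>(?:/\d+)+)$")


def resolve_composite(mention):
    """Split a composite mention into its components; other mentions pass through."""
    match = COMPOSITE.match(mention)
    if match is None:
        return [mention]
    stem = match.group("stem")
    # "/4/6" splits into ["", "4", "6"]; drop the empty leading piece.
    suffixes = [match.group("first")] + match.group("rest").split("/")[1:]
    return [stem + suffix for suffix in suffixes]
```

For example, resolve_composite("IL-2/4/6") yields IL-2, IL-4 and IL-6, while a simple mention such as "TP53" is returned unchanged.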
A gene normalization system tailored specifically to plant species. The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. The pGenN website lets users search the database for gene normalization information by keyword, by a list of PMIDs, or by UniProt ACs. The results (gene names and corresponding UniProt ACs) are displayed in sortable tables with supporting text evidence and can be downloaded for further research.
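The three steps can be sketched as a small pipeline. Everything here is hypothetical: the two-gene lexicon, the species cues, the placeholder accession values (not real UniProt ACs), and in particular the "nearest preceding species mention" heuristic used for species assignment, which stands in for pGenN's actual method:

```python
import re

# Hypothetical resources; accession values are placeholders, not real UniProt ACs.
GENE_LEXICON = {"FLC", "FT"}
SPECIES_CUES = {"arabidopsis": "Arabidopsis thaliana", "rice": "Oryza sativa"}
ACCESSIONS = {
    ("Arabidopsis thaliana", "FLC"): "AC_PLACEHOLDER_1",
    ("Arabidopsis thaliana", "FT"): "AC_PLACEHOLDER_2",
    ("Oryza sativa", "FT"): "AC_PLACEHOLDER_3",
}


def detect_mentions(tokens):
    """Step 1: dictionary-based gene mention detection (exact token match)."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in GENE_LEXICON]


def assign_species(tokens, index):
    """Step 2: assumed heuristic, the nearest species cue before the mention."""
    for tok in reversed(tokens[:index]):
        species = SPECIES_CUES.get(tok.lower())
        if species is not None:
            return species
    return None


def normalize(species, name):
    """Step 3: intra-species normalization via a per-species lookup table."""
    return ACCESSIONS.get((species, name))


def run_pipeline(text):
    """Return (gene name, assigned species, accession or None) per mention."""
    tokens = re.findall(r"\w+", text)
    results = []
    for index, name in detect_mentions(tokens):
        species = assign_species(tokens, index)
        results.append((name, species, normalize(species, name)))
    return results
```

Because normalization is done within the assigned species, the same symbol ("FT" here) can resolve to different identifiers in different parts of one text.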
Identifies potential name occurrences in biomedical text and associates protein and gene database identifiers with the detected matches. ProMiner follows a rule-based approach, and its search algorithm is geared towards recognizing multi-word names. It can be adapted to the characteristics of each organism through parameter settings and customized dictionary curation. The tool achieves high performance by classifying synonyms into several search classes.
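A minimal sketch of dictionary matching with synonym search classes, assuming just two hypothetical classes: "strict" synonyms (short, ambiguous forms such as acronyms) must match case-sensitively, while "relaxed" synonyms (longer names) match case-insensitively. A greedy longest-match scan lets multi-word names take precedence over their parts. The dictionary entries, class names and identifiers are illustrative and do not reproduce ProMiner's rules:

```python
import re

# Hypothetical dictionary: synonym -> (identifier, search class).
#   "strict":  case-sensitive match, for short or ambiguous synonyms
#   "relaxed": case-insensitive match, for longer, more specific names
SYNONYMS = {
    "tumor necrosis factor": ("GENE_A", "relaxed"),
    "TNF": ("GENE_A", "strict"),
    "interleukin 6": ("GENE_B", "relaxed"),
}
MAX_WORDS = max(len(s.split()) for s in SYNONYMS)


def lookup(words):
    """Return the identifier for a word sequence, honouring its search class."""
    phrase = " ".join(words)
    for synonym, (identifier, search_class) in SYNONYMS.items():
        if search_class == "strict" and phrase == synonym:
            return identifier
        if search_class == "relaxed" and phrase.lower() == synonym.lower():
            return identifier
    return None


def find_names(text):
    """Greedy longest-match scan, so multi-word names beat their sub-parts."""
    tokens = re.findall(r"\w+", text)
    matches, i = [], 0
    while i < len(tokens):
        for size in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            identifier = lookup(tokens[i:i + size])
            if identifier is not None:
                matches.append((" ".join(tokens[i:i + size]), identifier))
                i += size
                break
        else:  # no dictionary entry starts at this token
            i += 1
    return matches
```

With these classes, "TNF" is found but the lowercase "tnf" is not, while "Tumor necrosis factor" matches regardless of case.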
Handles both gene mention and identifier detection. GNormPlus integrates several advanced text-mining techniques, including SimConcept for resolving composite gene names. It compares favorably to other state-of-the-art methods when evaluated on two widely used public benchmarking datasets, achieving 86.7% F1-score on the BioCreative II Gene Normalization task dataset and 50.1% F1-score on the BioCreative III Gene Normalization task dataset.
Aims to analyze PubMed abstracts. pubmed.mineR is a program that uses several existing functions from other R packages to enable text mining. Its many features include term extraction with context, gene recognition, associations between terms and between genes (including cross-associations), and searching for key evidence supporting those associations.
A named entity recognition system intended primarily for biomedical text. BANNER uses conditional random fields as its primary recognition engine and incorporates a broad range of the best techniques described in the recent literature.
Information extraction methods for various natural language processing tasks, including supervised named entity recognition and relation extraction from biomedical documents. rainbow-nlp is based on distributional semantic similarity over Gene Ontology (GO) terms. With a focus on gene functions, it includes two subtasks: (i) retrieving GO evidence sentences for relevant genes and (ii) predicting GO terms for relevant genes. The main advantage of using an unsupervised open-IE technique is that it can easily be generalized and applied to similar relation extraction problems.
Computes similarities between an input text and already curated instances contained in a knowledge base to infer Gene Ontology (GO) concepts. The main limitation of GOCat, noted both by reviewers and in the authors' own papers, was the difficulty of integrating it into a curation workflow: although GOCat is reported to propose more accurate GO concepts, these concepts are inferred from the whole abstract, so curators still have to locate the function in the publication and link the correct GO concept to a gene product.
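The underlying idea, inferring GO concepts from the most similar already-curated texts, can be sketched as a k-nearest-neighbour classifier over bag-of-words vectors with cosine similarity. The knowledge base, tokenization and scoring below are hypothetical stand-ins chosen for brevity, not GOCat's actual similarity model:

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: already curated texts with their GO concepts.
KNOWLEDGE_BASE = [
    ("caspase activation triggers apoptosis in tumour cells", {"apoptotic process"}),
    ("the protein repairs double strand breaks in DNA", {"DNA repair"}),
    ("loss of the kinase blocks apoptosis and cell death", {"apoptotic process"}),
]


def vectorize(text):
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def infer_go_concepts(text, k=2):
    """Rank GO concepts by the summed similarity of the k most similar curated texts."""
    query = vectorize(text)
    neighbours = sorted(
        ((cosine(query, vectorize(doc)), concepts) for doc, concepts in KNOWLEDGE_BASE),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = Counter()
    for similarity, concepts in neighbours:
        for concept in concepts:
            scores[concept] += similarity
    return [concept for concept, score in scores.most_common() if score > 0]
```

Like the limitation described above, this approach scores the input text as a whole and does not indicate where in the text a function is stated.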
Offers a genome viewer dedicated to exploring the patenting activity around sequences of interest. PatSeq Analyzer is a web application, part of the PatSeq toolkit, that allows users to (i) investigate specific patent sequences, refine mapping positions, and compare patenting trends, using either recorded patent sequences mapped onto five different genomes or their own files; and (ii) search by gene or SEQ ID to analyze or predict the patenting activity related to it.
Investigates biological sequences in patent documents. PatSeq Explorer allows users to explore patent-disclosed sequences on the genome of a specific organism and to determine linkages between sequences and phenotypes. The application contains patent sequences mapped onto five different genomes, such as soybean and mouse. It includes multiple features for searching (by keyword, inventor or classification) and filtering (by year or sequence length) to help users highlight patenting trends. It is part of the PatSeq toolkit.
Enables the rapid identification of specific gene families of interest in related species. OrthoRBH streamlines the collection of homologs prior to downstream molecular evolutionary analysis. The efficacy of the program is demonstrated by the identification of the 13-member PYR/PYL/RCAR gene family in Hordeum vulgare using Oryza sativa query sequences. OrthoRBH is not recommended when sequence homology is very low, for instance, when identifying homologous sequences between plants and animals.
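The reciprocal best hit (RBH) criterion pairs two genes from different species when each is the other's highest-scoring match. A minimal sketch, assuming a hypothetical table of precomputed similarity scores (for example BLAST bit scores) and, for simplicity, the same score in both search directions, which real forward and reverse searches need not give:

```python
# Hypothetical similarity scores between genes of species A and species B,
# treated as symmetric for simplicity.
SCORES = {
    ("A1", "B1"): 250.0,
    ("A1", "B2"): 120.0,
    ("A2", "B2"): 90.0,
    ("A3", "B2"): 200.0,
}


def best_hits(scores, forward=True):
    """Map each gene to its highest-scoring partner in the other species."""
    best = {}
    for (a, b), score in scores.items():
        source, target = (a, b) if forward else (b, a)
        if source not in best or score > best[source][1]:
            best[source] = (target, score)
    return {gene: hit for gene, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Keep (a, b) only if b is a's best hit and a is b's best hit."""
    forward = best_hits(scores, forward=True)
    backward = best_hits(scores, forward=False)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)
```

Here A2's best hit is B2, but B2's best hit is A3, so only (A1, B1) and (A3, B2) survive as reciprocal pairs.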
An integrative method to handle the three issues of the gene normalization (GN) task. GenNorm uses three modules: the gene name recognition (GNR) module, the species assignment (SA) module and the species-specific gene normalization (SGN) module. It makes full use of gene and species information in context and of a gene/species thesaurus.
Analyzes scientific documents to find interactions between genes/proteins in order to reconstruct molecular networks. Biblio-MetReS relies on a central database containing the genomes and gene annotations of more than 1000 organisms. The application accesses this repository of gene names and functions when the user chooses an organism of interest.
Performs high-speed gene information discovery. GIS is a system with two modules: (i) gene information screening provides information about biological functions, associated diseases and related genes for a queried gene and (ii) gene–gene relation extraction extracts the gene–gene relations described in abstracts and estimates whether the relation between a pair of genes is positive, cooperative, or negative.
Provides several advanced functionalities in addition to the standard browsing capability of the official Gene Ontology (GO) browsing tool. DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing orientation of the tree. DynGO is generally applicable to any data set where the records are annotated with GO terms.
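Semantic retrieval of genes with similar GO annotations can be illustrated with a simple set-overlap measure. This sketch assumes hypothetical annotations and uses the Jaccard index with an arbitrary 0.4 threshold; DynGO's own similarity measure is not described here, and measures that exploit the GO hierarchy would generally be more informative than plain term overlap:

```python
# Hypothetical annotations: gene -> set of GO terms (labels used for readability).
ANNOTATIONS = {
    "geneA": {"kinase activity", "cell cycle", "nucleus"},
    "geneB": {"kinase activity", "cell cycle", "cytoplasm"},
    "geneC": {"DNA repair", "nucleus"},
}


def jaccard(a, b):
    """Shared terms divided by all terms used by either gene."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_genes(query, annotations, threshold=0.4):
    """Return other genes whose GO term sets overlap the query's, best first."""
    target = annotations[query]
    hits = [
        (gene, jaccard(target, terms))
        for gene, terms in annotations.items()
        if gene != query
    ]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)
```

For geneA, geneB shares two of four distinct terms (0.5) and is returned; geneC shares one of four (0.25) and falls below the threshold.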
Enables the detection of gene patterns in an environmental context. MetaMine offers a targeted, knowledge-driven system to detect gene patterns for subsequent correlation with environmental information: (i) the system is meant to confirm existing biological knowledge about genes involved in specific processes or pathways, and (ii) the approach has the potential to detect genes of as yet unknown function that are functionally linked to specific habitat parameters.
An indexing engine or tagger: a piece of software that recognizes concepts in human-readable text, based on a database (thesaurus) of known terms. Multi-word terms are correctly recognized. If a term can represent multiple concepts, Peregrine will attempt to disambiguate it.
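One way to disambiguate a term that maps to several concepts is to keep a reading only when another synonym of the same concept also appears in the text. The sketch below assumes that heuristic, a hypothetical two-concept thesaurus, and single-word terms only; it illustrates the idea and is not Peregrine's actual disambiguation procedure:

```python
import re

# Hypothetical thesaurus: concept ID -> synonyms. "cat" is ambiguous.
THESAURUS = {
    "C_catalase": {"cat", "catalase"},
    "C_cat_animal": {"cat", "feline"},
}


def index_terms(thesaurus):
    """Invert the thesaurus: term -> set of concepts it can denote."""
    term_to_concepts = {}
    for concept, synonyms in thesaurus.items():
        for synonym in synonyms:
            term_to_concepts.setdefault(synonym, set()).add(concept)
    return term_to_concepts


def tag(text, thesaurus):
    """Map each recognized single-word term in the text to one concept."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    term_to_concepts = index_terms(thesaurus)
    found = {}
    for word in words:
        concepts = term_to_concepts.get(word, set())
        if len(concepts) == 1:
            found[word] = next(iter(concepts))
        elif len(concepts) > 1:
            # Assumed heuristic: keep a reading only if another synonym
            # of that concept also occurs somewhere in the text.
            supported = [c for c in concepts if (thesaurus[c] - {word}) & words]
            if len(supported) == 1:
                found[word] = supported[0]
    return found
```

In "Catalase activity (CAT) was measured.", "cat" resolves to the enzyme because "catalase" also occurs; in "The cat sat." it stays unresolved and is left untagged.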