Offers a significant advance in the automated extraction of biological knowledge from sets of genes or abstracts. Martini is an easy-to-use tool that allows end-users to compare two gene sets using a sensitive, keyword-based method. It is based on keywords extracted from Medline abstracts and supports a much wider range of species than comparable tools. Martini is designed to be fast and easy to use, providing a quick first insight into the functional difference between two gene sets.
A publicly available application to search XML-formatted MEDLINE data in a complete, object-relational schema implemented in Oracle XML DB. An advantage offered by botXminer is that it can generate quantitative results with certain queries that are not feasible through the Entrez-PubMed interface. After retrieving citations associated with user-supplied search terms, MEDLINE fields (title, abstract, journal, MeSH and chemical) and terms (MeSH qualifiers and descriptors, keywords, author, gene symbol and chemical), botXminer groups these citations and displays them as tabulated or graphic results.
A web-based NCBI-PubMed search application, which can analyze articles for selected biomedical verbs and give users relational information, such as subject, object, location, manner, time, etc. After receiving keyword query input, BWS retrieves matching PubMed abstracts and lists them along with snippets by order of relevancy to protein-protein interaction. Users can then select articles for further analysis, and BWS will find and mark up biomedical relations in the text. The analysis results can be viewed in the abstract text or in table form.
A search tool that integrates different sources of information with the aim of retrieving literature about sequence variation of a gene. In addition, OSIRIS provides a method to link a dbSNP entry with the articles referring to it. OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases.
Allows users to enter queries to find MeSH terms closely related to the queries. Meshable relies on co-occurrence of text words and MeSH terms to find keywords that are related to each MeSH term. A query is then matched with the keywords for MeSH terms, and candidate MeSH terms are ranked based on their relatedness to the query. The experimental results show that our method achieves the best performance among several term extraction approaches in terms of topic coherence. Moreover, the interface can be effectively used to find full names of abbreviations and to disambiguate user queries.
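The co-occurrence idea behind this kind of MeSH term suggestion can be illustrated with a minimal sketch. It is not Meshable's actual implementation: the toy documents, the raw-count scoring and the function names `build_keyword_profiles` and `rank_mesh_terms` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def build_keyword_profiles(docs):
    """Count, per MeSH term, how many documents pair it with each text word."""
    profiles = defaultdict(Counter)
    for text, mesh_terms in docs:
        words = set(text.lower().split())  # count each word once per document
        for term in mesh_terms:
            profiles[term].update(words)
    return profiles

def rank_mesh_terms(query, profiles):
    """Rank MeSH terms by summed co-occurrence counts of the query words."""
    query_words = query.lower().split()
    scores = {
        term: sum(counts[w] for w in query_words)
        for term, counts in profiles.items()
    }
    # Keep only terms with some evidence; break ties alphabetically.
    return sorted((t for t, s in scores.items() if s > 0),
                  key=lambda t: (-scores[t], t))

# Hypothetical toy corpus: (text, MeSH terms assigned to that text).
docs = [
    ("insulin resistance in obese mice", ["Insulin Resistance", "Obesity"]),
    ("insulin signaling and glucose uptake", ["Insulin Resistance"]),
    ("tumor growth in mice", ["Neoplasms"]),
]
profiles = build_keyword_profiles(docs)
ranked = rank_mesh_terms("insulin glucose", profiles)
print(ranked)
```

A real system would normalize the counts (for example with a relatedness measure that discounts very frequent words) rather than summing raw co-occurrences.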
BDSS / Big Data Smart Socket
A data transfer tool that abstracts scientific data repositories. BDSS consists of three parts: a metadata repository, the BDSS transfer client and an integration as a Galaxy data transfer tool. BDSS can take a data file manifest, look up an alternate host in a curated database and select an optimal data transfer method for the source and destination computers. BDSS allows a researcher, who may be unaware of available technologies, to perform faster transfers by asking for data in familiar ways. BDSS is unique in that it is tied to a curated inventory of file data transfer nodes, networks, and data transfer methods, and allows scientists to take advantage of advances in data transfer technologies even when they are unaware of them.
A web-based tool that gives a clear and handy overview of the available bibliography corresponding to the user input. Users provide as input a gene list (expressed by gene names or IDs from EntrezGene), a context of study (expressed by keywords), and optionally a date range. From this input, GeneValorization provides a matrix containing the number of publications with co-occurrences of gene names and keywords for the given date range. Gene synonyms and MeSH terms are leveraged when searching for publications. Graphics are automatically generated to assess the relative importance of genes within various contexts. Links to publications and other databases offering information on genes and keywords are also available.
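The gene-by-keyword publication matrix described above can be sketched on a toy corpus. This is only an illustration under stated assumptions: the abstracts are invented, matching is naive case-insensitive substring search, and the function name `cooccurrence_matrix` is hypothetical. The real tool queries the literature and also uses gene synonyms and MeSH terms, which this sketch omits.

```python
def cooccurrence_matrix(abstracts, genes, keywords):
    """Count abstracts that mention both a gene and a keyword."""
    matrix = {g: {k: 0 for k in keywords} for g in genes}
    for text in abstracts:
        lowered = text.lower()
        for gene in genes:
            if gene.lower() not in lowered:
                continue  # gene absent, so no co-occurrence in this abstract
            for keyword in keywords:
                if keyword.lower() in lowered:
                    matrix[gene][keyword] += 1
    return matrix

# Hypothetical toy abstracts standing in for retrieved publications.
abstracts = [
    "BRCA1 mutations are linked to breast cancer risk.",
    "TP53 is frequently mutated in breast cancer and lung cancer.",
    "BRCA1 interacts with TP53 during DNA repair.",
]
genes = ["BRCA1", "TP53"]
keywords = ["breast cancer", "DNA repair"]

matrix = cooccurrence_matrix(abstracts, genes, keywords)
for gene, row in matrix.items():
    print(gene, row)
```

Each cell holds a publication count, so rows can be compared directly to see which genes are most often discussed in a given context.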
eFIP / extracting Functional Impact of Phosphorylation
A tool to support article selection and information extraction of functional impact of phosphorylated proteins. The current version focuses on protein-protein interactions (PPIs) as functional impact. In eFIP, PPIs refer to interactions between protein elements, including protein complexes and classes of proteins. Impact is defined as any direct relation between protein phosphorylation and PPI. The relation could be positive (phosphorylation of A increases binding to B), negative (when phosphorylated A dissociates from B) or neutral (phosphorylated A binds B).
SEACOIN / Search Explore Analyze COnnect INspire
Allows users to search biomedical literature. SEACOIN is an interactive tool whose features include hypothesis generation based on both open discovery and closed discovery models and extractive summarization of top-ranked abstracts that are associated with the query. It also contains other features such as: (1) usage of a controlled vocabulary to limit the k-ary trees to genes and more; (2) incorporation of text preprocessing steps; and (3) a “history” feature to store previously queried computed co-occurrence networks.
Consists of a modular document processing pipeline with extensible components for natural language processing that achieve state-of-the-art performance for POS tagging and chemical named entity recognition. ChemDataExtractor is able to automatically extract chemical information from scientific documents, facilitating the creation of massive chemical databases with minimal time and effort. The system provides a table processor for extraction of tabulated experimental properties and document-level processing algorithms to resolve data interdependencies and produce unified chemical records that incorporate information from multiple document domains.
A sentence-based search tool for accessing published research articles related to a set of genes and concepts (keywords). Input may consist of several hundred genes and any number of concepts. Concepts can be simple keywords or more sophisticated ontology terms. Ferret may be used to efficiently perform a broad scan of the bioscience literature for information about genes of interest. Searches are conducted against PubMed, so Ferret is as current as PubMed. Ferret processes retrieved PubMed documents to find sentences for each gene-concept pair and also for each gene-gene pair. It is species agnostic in its function.
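Sentence-level pairing of genes and concepts can be sketched as follows. This is a minimal illustration, not Ferret's code: the regex-based sentence splitter, the whole-word matching and the names `split_sentences`, `mentions` and `pair_sentences` are assumptions, and the input text is invented.

```python
import re
from itertools import combinations

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def mentions(term, sentence):
    """Case-insensitive whole-word match of a term in a sentence."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None

def pair_sentences(text, genes, concepts):
    """Map each gene-concept and gene-gene pair to the sentences containing both."""
    hits = {}
    for sentence in split_sentences(text):
        present_genes = [g for g in genes if mentions(g, sentence)]
        present_concepts = [c for c in concepts if mentions(c, sentence)]
        for gene in present_genes:
            for concept in present_concepts:
                hits.setdefault((gene, concept), []).append(sentence)
        for gene_a, gene_b in combinations(present_genes, 2):
            hits.setdefault((gene_a, gene_b), []).append(sentence)
    return hits

# Hypothetical abstract text.
text = ("BRCA1 is involved in apoptosis. "
        "TP53 and BRCA1 cooperate in DNA repair. "
        "Apoptosis was not observed.")
hits = pair_sentences(text, ["BRCA1", "TP53"], ["apoptosis"])
for pair, sentences in hits.items():
    print(pair, sentences)
```

Working at the sentence level keeps the evidence for each pair short and readable, at the cost of missing relations that span several sentences.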
TIES / Text Information Extraction System
Identifies, annotates, and indexes clinical documents. TIES is a natural language processing (NLP) pipeline and clinical document search engine. It supports tissue ordering and acquisition, building of Tissue Microarrays (TMAs), and integration with tissue banks and honest brokers. The tool provides a collaborative work space that enables research teams to work on queries and case sets together, even across institutions with separate TIES installations.
A semantic search engine for the life sciences. SeMedico is capable of document retrieval and text mining search. In its document retrieval mode, SeMedico is a search engine which is oriented towards a specialized scientific domain of discourse, viz. the life sciences (biology, medicine, chemistry, etc.). In its text mining mode, SeMedico serves as a tool for the detection and further exploration of hypotheses for which preliminary evidence is found in the document collection.
Automates the download of genomic sequences. Genomepy is a simple software package that contains both command-line tools and a Python application programming interface (API). Supported genome providers include UCSC, NCBI and Ensembl. Downloaded genome sequences can be soft- or hard-masked and specific chromosomes or scaffolds can be either included or excluded based on regular expressions. Genomepy is free and open source software and can be installed through standard package managers.
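Regular-expression filtering of chromosomes and scaffolds can be illustrated with the standard library alone. The sketch below does not use genomepy's API; the function name `filter_sequences`, its `invert` parameter and the sequence names are assumptions chosen to show the include/exclude idea.

```python
import re

def filter_sequences(names, pattern, invert=False):
    """Keep names matching the regex, or names not matching it if invert=True."""
    regex = re.compile(pattern)
    return [name for name in names if bool(regex.search(name)) != invert]

# Hypothetical sequence names in the style of a UCSC assembly.
names = [
    "chr1", "chr2", "chrX", "chrM",
    "chrUn_KI270302v1", "chr1_KI270706v1_random",
]

# Include only numbered and sex chromosomes.
main_chromosomes = filter_sequences(names, r"^chr[0-9XY]+$")

# Exclude unplaced and random scaffolds.
without_scaffolds = filter_sequences(names, r"random|Un", invert=True)

print(main_chromosomes)
print(without_scaffolds)
```

Anchoring the pattern with `^` and `$` matters for inclusion filters: without the anchors, `chr1` would also match `chr1_KI270706v1_random`.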
Retrieves sequence entries from flatfile databases and files. entret reads one or more complete sequence entries from a database or a file and writes them to a text file. Optionally, only the first sequence from the input stream can be retrieved. The complete entry, including the heading annotation (documentation), is retrieved and written without any attempt to alter, reformat or interpret the data.
ID GeneQuest / Intellectual Disability GeneQuest
Allows users to retrieve information from the Intellectual Disability Gene Database on gene name, accession number, genome location, neighboring genes, function and associated diseases. ID GeneQuest is part of the ID gene database that is designed to provide integrated information on known and candidate ID genes, and their protein features, protein interactions and associated pathways. The goal is to aid both basic science and clinical researchers in new ID gene knowledge discovery and to facilitate hypothesis generation in the molecular basis of ID. The sequence and annotation data displayed in the ID Gene Database are freely available to the general public.
Evaluates the significance of co-citation for two types of queries: (i) query gene set with any predefined/manually-curated gene set and (ii) query gene set with any user-defined free term set. CoCiter evaluates the significance of co-citation for any gene set from the 8,077,952 genes in the National Center for Biotechnology Information (NCBI) Entrez gene database, by using a text mining approach against the up-to-date Medical Literature Analysis and Retrieval System Online (MEDLINE) literature database. It provides a flexible and more precise approach to analyzing gene set functions, compared with the traditional function enrichment analysis.
Provides free access to the full text of books and documents in life sciences and health care. Bookshelf is a full-text resource built and maintained by the National Center for Biotechnology Information (NCBI) within the National Library of Medicine (NLM). It includes textbooks, monographs, health reports, documentation, website content and databases. The database aims to (i) advance science and improve health care through the collection, exchange and dissemination of books and related documents and (ii) provide a permanent stable archive for the collection.
SemMedDB / Semantic Medline Database
A database of semantic predications extracted from titles and abstracts of PubMed citations by SemRep, a rule-based semantic interpreter. Semantic predications are drawn from the Unified Medical Language System (UMLS) knowledge sources; the subject and object pair corresponds to Metathesaurus concepts, and the predicate to a relation type in an extended version of the Semantic Network. SemMedDB can be used as a knowledge resource to assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support.
A searchable database of biomedical negated sentences. BioNOT can be used to extract such negated events in order to fill the gap created by the absence of text mining applications. This database incorporates 58 million negated sentences, extracted from three sources: abstracts of articles indexed by PubMed, full text of articles in the PubMed Central Open Access subset, and full text of articles published by Elsevier. After evaluating negated sentences for autism, Alzheimer's disease, and Parkinson's disease, we found that for many genes thought to be relevant by experts there is biomedical evidence suggesting the opposite.
EVEX / EVent EXtraction
Generates homology-based hypotheses as well as novel, indirect associations between genes and proteins such as coregulators. EVEX provides access to relevant information and related biomolecular entities of a gene or pair of genes of interest, from PubMed abstracts. These text mining results were generated by a state-of-the-art event extraction system and enriched with gene family associations and abstract generalizations. The EVEX resource locates relevant literature on phosphorylation, regulation targets, binding partners, and several other biomolecular events and assigns confidence values to these events.
BELTracker / Biological Expression Language Tracker
A database for retrieving evidence sentences from PubMed abstracts and full-text articles available at PubMed Central. BELTracker uses a combination of multiple approaches based on traditional information retrieval, machine learning, and heuristics to accomplish this task. This database comprises three main components: (i) translation of a given BEL statement to an information retrieval (IR) query, (ii) retrieval of relevant PubMed citations and (iii) finding and ranking the evidence sentences in those citations.
Deja vu
Publicly presents cases of highly similar citations in Medline. Deja vu is a database of duplicate publications, as identified using a number of different techniques, with the principal one being text similarity comparisons. This resource includes: (i) a streamlined process to update the database on a daily basis, (ii) a more collaborative approach for recruitment and qualification of topical experts as volunteer curators for specific publication areas, and (iii) methods to better address the question most often asked by authors.
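A text similarity comparison between two citations can be sketched with Jaccard similarity over word shingles. The description above does not say which similarity measure Deja vu uses, so this is a generic stand-in rather than the database's method; the function names `shingles` and `jaccard`, the shingle size and the example titles are all assumptions.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

# Hypothetical citation titles.
title_a = "Aspirin reduces the risk of stroke in elderly patients"
title_b = "Aspirin reduces the risk of stroke in older patients"
title_c = "Zebrafish fin regeneration requires retinoic acid signaling"

similar_score = jaccard(title_a, title_b)
unrelated_score = jaccard(title_a, title_c)
print(round(similar_score, 3), unrelated_score)
```

Pairs scoring above a chosen threshold would be candidates for manual review, since high textual overlap alone does not establish that a publication is a duplicate.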
YIF / Yale Image Finder
Indexes text found inside biomedical images. YIF offers more comprehensive research results by searching over text that may not be present in the image caption, and offers the ability to find related images and associated papers by directly comparing image content. YIF’s analysis identifies image text elements, and subjects them to optical character recognition (OCR). We believe that searching over image text opens up new avenues for fruitful research in biomedical information retrieval.
