Identifies text associations between a wide range of biomedical entities. PolySearch is an online text-mining system that supports queries of the type ‘Given X, find all associated Ys’, with X and Y drawn from more than 20 types of biomedical subject areas such as human diseases, genes, single nucleotide polymorphisms (SNPs), proteins, drugs, organs, tissues, positive and negative health effects, drug actions, Gene Ontology terms, MeSH terms, and biological and chemical taxonomies. It also generates, ranks and annotates associative candidates, with statistics and highlighted key sentences.
Navigates the complex world of gene and gene product identifiers. MatchMiner provides two primary functions: (i) LookUp, that translates an input list of gene identifiers into a matching output list of identifiers of a different type, and (ii) Merge that combines two separate lists of either the same or different types of identifiers into one list that details all one-to-one, one-to-many, and many-to-many relationships between corresponding gene identifiers in the two lists.
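The Merge function described above can be illustrated with a minimal sketch. It assumes a hypothetical lookup table `to_gene` that resolves each identifier to a gene symbol; the function name, the table and the identifiers are illustrative only and are not part of MatchMiner.

```python
from collections import defaultdict


def merge_identifier_lists(list_a, list_b, to_gene):
    """Group identifiers from two lists by the gene they resolve to.

    `to_gene` is a hypothetical identifier -> gene symbol table; a real
    tool would consult curated databases instead.
    """
    by_gene = defaultdict(lambda: ([], []))
    for ident in list_a:
        if ident in to_gene:
            by_gene[to_gene[ident]][0].append(ident)
    for ident in list_b:
        if ident in to_gene:
            by_gene[to_gene[ident]][1].append(ident)

    rows = []
    for gene, (ids_a, ids_b) in sorted(by_gene.items()):
        if not ids_a or not ids_b:
            continue  # the gene has no counterpart in the other list
        if len(ids_a) == 1 and len(ids_b) == 1:
            kind = "one-to-one"
        elif len(ids_a) > 1 and len(ids_b) > 1:
            kind = "many-to-many"
        else:
            kind = "one-to-many"  # either direction
        rows.append((gene, ids_a, ids_b, kind))
    return rows


# Made-up identifiers, for illustration only.
table = {"acc_1": "GENE1", "prot_1": "GENE1",
         "acc_2": "GENE2", "acc_3": "GENE2", "prot_2": "GENE2"}
for row in merge_identifier_lists(["acc_1", "acc_2", "acc_3"],
                                  ["prot_1", "prot_2"], table):
    print(row)
```

Grouping by the resolved gene rather than comparing identifiers pairwise keeps the relationship type explicit for every gene that appears in both lists.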
A web-based tool that extracts several types of relationships returned by PubMed queries and maps them into networks, allowing for graphical visualization, textual navigation, and topological analysis. PubNet supports the creation of complex networks derived from the contents of individual citations, such as genes, proteins, Protein Data Bank (PDB) IDs, Medical Subject Headings (MeSH) terms, and authors. This feature allows one to, for example, examine a literature-derived network of genes based on functional similarity.
A system designed to integrate a pathway visualizer, text mining systems and annotation tools into a seamless environment. This enables biologists to move freely between parts of a pathway and relevant sections of articles, as well as to identify relevant papers from large text bases. PathText integrates three knowledge sources indispensable for systems biology, i.e. (i) external databases such as SwissProt, Entrez Gene, Flybase, HUGO, etc., (ii) text databases such as MEDLINE and full papers, and (iii) pathways as organized interpretations of biological facts.
Recognizes, annotates and translates biomedical entities (e.g., genes, proteins, drugs and diseases) from texts into networks for knowledge discovery. HiPub visualizes texts as interactive biomedical entity networks and provides an interactive user interface for network exploration, enrichment analyses and link-outs to external databases. The HiPub system consists of two components: the Chrome browser extension and the Biomedical NER server, which stores all the PubMed abstracts in local storage for rapid and efficient text processing.
A desktop software application which retrieves all microRNA:mRNA functional pairs represented by an experimentally derived set of genes. Sigterms computes, for each microRNA, an enrichment statistic for overrepresentation of predicted targets within the gene set, which could help to implicate roles for specific microRNAs and microRNA-regulated genes in the system under study. Currently, the software supports searching of results from PicTar, TargetScan, and miRanda algorithms.
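The entry above does not say which enrichment statistic is computed. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for overrepresentation analysis, not necessarily the one Sigterms uses), the p-value for one microRNA could be computed as follows. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, targets, gene_set_size, overlap):
    """Return P(X >= overlap) for a hypergeometric draw.

    universe      -- number of genes considered overall
    targets       -- predicted targets of the microRNA in the universe
    gene_set_size -- size of the experimentally derived gene set
    overlap       -- predicted targets observed inside the gene set
    """
    max_overlap = min(targets, gene_set_size)
    total = comb(universe, gene_set_size)
    tail = sum(
        comb(targets, k) * comb(universe - targets, gene_set_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers: 4 predicted targets among 10 genes, 2 of them in a set of 3.
print(hypergeom_enrichment_p(universe=10, targets=4, gene_set_size=3, overlap=2))
```

A small p-value suggests that the microRNA's predicted targets appear in the gene set more often than expected by chance. In practice the test would be repeated for every microRNA and corrected for multiple testing.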
Determines the functional coherence of gene sets by performing latent semantic analysis of Medline abstracts. A Latent Semantic Indexing (LSI) model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.
e!DAL / electronic data archive library
Publishes and shares research data. e!DAL’s main features are version tracking, metadata management, information retrieval, registration of persistent identifiers (DOI), an embedded HTTP(S) server for public data access, access as a network file system, and a scalable storage backend. Packaged as the e!DAL server, all required API components are compiled into an executable archive, which can be run on any platform to operate one's own data publication infrastructure.
FAUN / Feature Annotation Using Nonnegative matrix factorization
A Web-based bioinformatics software environment that facilitates both the discovery and the classification of functional relationships among genes. FAUN not only helps researchers use the biomedical literature efficiently, but also provides utilities for knowledge discovery. This Web-based software environment may be useful for the validation and analysis of functional associations in gene subsets identified by high-throughput experiments.
Utilizes simple yet effective linguistic features to extract relations with maximum entropy models. The tool addresses the chemical-induced disease (CID) relation extraction task, which consisted of two subtasks: (i) disease named entity recognition and normalization (DNER), the primary step for automatic Chemical Disease Relation (CDR) extraction; and (ii) CID relation extraction. For the second subtask, participants were provided with the same raw text as for DNER and asked to return a ranked list of chemical and disease entity pairs, with normalized concept identifiers, for which a CID relation was asserted in the abstract.
PatSeq Analyzer
Offers a genome viewer dedicated to exploring the patenting activity around sequences of interest. PatSeq Analyzer is a web application, part of the PatSeq toolkit, that allows users to: (i) investigate specific patent sequences, refine mapping positions, and compare patenting trends, using either recorded patent sequences mapped onto five different genomes or personal files; and (ii) search by gene or SEQ ID to analyze or predict related patenting activity.
PatSeq Explorer
Investigates biological sequences in patent documents. PatSeq Explorer allows users to explore patent-disclosed sequences on the genome of a specific organism and to determine linkages between sequences and phenotypes. The application contains patent sequences mapped onto five different genomes, including soybean and mouse. It includes multiple features for searching (by keywords, inventors or classification) and filtering (by years or sequence length) that help users highlight patenting trends. It is part of the PatSeq toolkit.
MELODI / Mining Enriched Literature Objects to Derive Intermediates
Identifies mechanistic pathways between any two biomedical concepts. MELODI can generate hypotheses for further investigation. It identifies enriched overlapping objects which have been assigned to scientific literature and uses these to derive intermediate mechanisms. The tool includes an enrichment step, whereby the frequencies of terms within a set of articles are compared with the background frequencies in the whole database.
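The enrichment-and-overlap idea described above can be sketched as follows. This is a simplification: it scores each term by a plain frequency ratio against the background, whereas the tool itself may rely on a statistical test, and every name and number below is hypothetical.

```python
def enriched_terms(set_counts, set_size, background_counts, background_size,
                   min_fold=2.0):
    """Return {term: fold} for terms over-represented in an article set.

    fold = (frequency in the article set) / (frequency in the whole database)
    """
    enriched = {}
    for term, count in set_counts.items():
        background = background_counts.get(term, 0)
        if background == 0:
            continue  # no background frequency to compare against
        fold = (count / set_size) / (background / background_size)
        if fold >= min_fold:
            enriched[term] = fold
    return enriched


def shared_intermediates(enriched_a, enriched_b):
    """Terms enriched for both concepts are candidate intermediates."""
    return sorted(set(enriched_a) & set(enriched_b))


# Toy counts: number of articles annotated with each term.
background = {"inflammation": 100, "insulin": 50, "protein": 800}
concept_a = enriched_terms({"inflammation": 5, "protein": 8}, 10, background, 1000)
concept_b = enriched_terms({"inflammation": 6, "insulin": 4}, 20, background, 1000)
print(shared_intermediates(concept_a, concept_b))
```

Terms that are merely common everywhere (here "protein") are filtered out because their frequency in the article set matches the background.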
FastLink / Fast Probabilistic Record Linkage
Implements a Fellegi-Sunter probabilistic record linkage model allowing for missing data and the inclusion of auxiliary information. FastLink allows users to conduct a merge of two datasets under the Fellegi-Sunter model using the Expectation-Maximization algorithm. The software also includes tools to prepare, adjust and summarize data merges. It can be used to merge data sets with millions of records in a reasonable amount of time on a laptop.
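As a rough illustration of the model named above, the sketch below fits a Fellegi-Sunter mixture by Expectation-Maximization over agreement patterns. It assumes binary agreement per field, conditional independence between fields, and missing comparisons (`None`) that are simply skipped. It is a toy written for this listing, not FastLink's own code or interface.

```python
def fellegi_sunter_em(patterns, n_iter=50, lam=0.1, m_init=0.9, u_init=0.1):
    """Fit match proportion `lam` and per-field agreement probabilities.

    patterns -- one tuple per record pair: 1 (agree), 0 (disagree), None (missing)
    m[k]     -- P(field k agrees | pair is a match)
    u[k]     -- P(field k agrees | pair is a non-match)
    """
    eps = 1e-6  # keeps probabilities away from exactly 0 or 1
    n_fields = len(patterns[0])
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    weights = []
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        weights = []
        for pattern in patterns:
            p_match, p_non = lam, 1.0 - lam
            for k, agrees in enumerate(pattern):
                if agrees is None:
                    continue  # a missing comparison carries no information
                p_match *= m[k] if agrees else 1.0 - m[k]
                p_non *= u[k] if agrees else 1.0 - u[k]
            weights.append(p_match / (p_match + p_non))
        # M-step: re-estimate the parameters from the weighted pairs.
        lam = min(max(sum(weights) / len(weights), eps), 1.0 - eps)
        for k in range(n_fields):
            observed = [(w, p[k]) for w, p in zip(weights, patterns)
                        if p[k] is not None]
            if not observed:
                continue  # field never observed: keep the current estimate
            m_new = sum(w for w, g in observed if g) / sum(w for w, _ in observed)
            u_new = (sum(1.0 - w for w, g in observed if g)
                     / sum(1.0 - w for w, _ in observed))
            m[k] = min(max(m_new, eps), 1.0 - eps)
            u[k] = min(max(u_new, eps), 1.0 - eps)
    # `weights` holds the posteriors computed in the final E-step.
    return lam, m, u, weights


# Toy comparison data over three fields.
pairs = [(1, 1, 1)] * 5 + [(0, 0, 0)] * 45 + [(1, 0, 0)] * 5 + [(1, 1, None)]
lam, m, u, weights = fellegi_sunter_em(pairs)
print(round(lam, 3), round(weights[0], 3), round(weights[5], 3))
```

Pairs whose posterior match probability exceeds a chosen threshold would then be declared links. Choosing that threshold is outside the scope of this sketch.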
Mines the PubMed database to identify relationships between genes, proteins or any keywords entered by the user. Chilibot provides three ways to execute searches for genes, proteins or keywords: one for relationships between two terms, one for relationships among many terms, and one for relationships between two lists of terms. It displays relationships as a graph and can generate hypotheses and edit synonyms collected from several genomic/proteomic databases.
Disimweb / The disease similarity browser
Enables the user to obtain the similarity measure between over 28.5 million pairs of diseases. Disimweb is a fully interactive browser built on a method that summarises the information available in biomedical literature databases. Connections to the OMIM, MeSH and UniProtKB databases are also provided. The data and source code used to generate the similarity scores, as well as the website itself, are available for download from the same website.
PatSeq Data
Compiles data about patents disclosing genetic sequences. PatSeq Data is built around data collected from national patent offices, public sequence listing repositories and intellectual property organizations. Searches can be made by jurisdiction, document type, sequence type or location. The database provides additional statistics based on criteria such as document type, sequence type or data source, as well as information about the public availability of sequence listings in the corresponding patent office. It is part of the PatSeq toolkit.
PubDNA Finder
Allows users to perform advanced searches on nucleic acid sequences. PubDNA Finder is an online repository linking PubMed Central manuscripts to the different genetic sequences appearing in them. It extends the search capabilities provided by PubMed Central by allowing researchers to: (1) retrieve all articles containing the genetic sequences specified by the user; (2) retrieve all the sequences appearing in the manuscripts matching a keyword-based query; and (3) find all articles matching a keyword-based query and containing the sequences specified by the user.
