Permits users to establish a foundation for systematic reasoning over the intercellular network. iX is a comprehensive high-resolution knowledgebase of directional intercellular interactions that was text-mined from all available PubMed abstracts across a broad range of disease conditions. Interactions captured by this method include (i) direct cytokine binding or secretion events and (ii) more distant, indirect influencing relations, scored and filtered to emphasize precision.
A publicly available, multidisciplinary, cancer-focused knowledgebase developed to support cancer translational research and drug discovery. canSAR integrates genomic, protein, pharmacological, drug and chemical data with structural biology, protein networks and druggability data. canSAR is widely used to rapidly access information and help interpret experimental data in a translational and drug discovery context.
Provides experimentally determined kinetic data for protein-protein, protein-RNA, protein-DNA, protein-ligand, RNA-ligand and DNA-ligand binding or reaction events described in the literature. KDBI contains information about each binding or reaction event, the participating molecules (name, synonyms, molecular formula, classification, SWISS-PROT AC or CAS number), the binding or reaction equation, kinetic data and related references.
Provides a corpus of scientific texts used for BioCreative, a competition in which participants are given well-defined text-mining or information extraction tasks in the biological domain. The BC2GM-corpus consists mainly of the training and testing corpora from BioCreative I; the testing corpus for the current task consists of an additional 5,000 sentences that were held 'in reserve'.
Aims to collate any relevant data pertaining to any PE2-4 protein. Missing ProteinPedia allows users to define, summarize and discuss all available data for the so-called missing proteins, emphasizing why they may currently be difficult to observe with standard proteomics mass spectrometry (MS) and antibody-based techniques. It supports the generation of high-confidence MS evidence for as many PE2-4 proteins as possible.
Provides a collection of definitions for higher-level attributes. Genome Properties ascribes a higher-level attribute to a species when a sufficient set of molecular markers is detected in its genome, or else reported jointly absent. It can help to summarize high-level traits of species biology from genome analysis. The method helps users understand the metabolic context when investigating the roles of other proteins from the same species.
Provides information on over 190 published online cellular pathway and network databases. Pathguide is organised around approximate categories and a database can be stored in multiple categories if it contains multiple data types. The main categories for databases are: Protein–protein interaction (PPI), Metabolic pathway, Signaling pathway, Gene regulation network (GRN), Genetic pathway and Pathway diagram. The database includes a number of protein-sequence databases that store pathway information as secondary information.
A freely available web resource, listing expertise and molecular resource capabilities of research centres and biotechnology companies. The database is designed for researchers who require information on how to make best use of valuable biospecimens from biobanks and other sample collections, focusing on the choice of analytical techniques and the demands they make on the type of samples, pre-analytical sample preparation and amounts needed.
Provides comprehensive information about miRNAs affecting drug therapies. mTD is a database that provides the sentences describing drug-miRNA interactions, with links to the corresponding publications for each drug-miRNA association. This resource can help users better understand the mechanisms underlying drug actions and design more efficient drug combinations, such as combinations of miRNA inhibitors and drugs.
Provides clinically relevant somatic mutations. DIRECT enables a genetically informed approach to cancer medicine by providing clinicians access to tumor gene therapy-response information based on individual patient data published in the literature. It catalogs only patients with non-small cell lung cancers (NSCLCs) and published individual patient data on EGF receptor (EGFR) mutations.
Contains data on the frequency of inherited human disorders. FIDD is useful in a clinical context, in medical research, for epidemiological studies, and for the planning of genetic services. It can be queried by the body system mainly affected by the disorder, the name of the disease or condition, its Online Mendelian Inheritance in Man (OMIM) number or the population surveyed. The database covers information on over 200 conditions.
A database of mutagenesis and mutation information on Human Immunodeficiency Virus (HIV). Hivmut describes the phenotypes of 7,608 unique mutations at 2,520 sites in the HIV proteome, resulting from the analysis of 120,899 papers. The mutation information for each protein is organised in a residue-centric manner and each residue is linked to the relevant experimental literature. The importance of HIV as a global health burden calls for extensive effort to maximise the efficiency of HIV research. The HIV mutation browser provides a valuable new resource for the research community.
An information network launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources, such as data, software tools, materials, services, projects and organizations. dkNET functions as a “search engine for data”, searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. dkNET provides an ideal environment for illustrating some of the opportunities and challenges associated with organizing and searching the new types of digital research objects, including dynamic databases, produced in support of biomedical research.
Summarizes literature-curated information on levodopa-induced dyskinesia (LID) genetics. LIDPD is a publicly available web resource that investigates the LID-associated genes in a broader functional context through analysis of their relationships based on network models. It uses different underlying networks and allows users to incorporate data into the analyses. The core of LIDPD is a set of Panel genes that have been associated with LID in the scientific literature. Different Highlight datasets provide additional gene annotations or associations with Parkinson's disease.
An online database that centralizes aging research projects across numerous research disciplines. IARP integrates information on research grants, peer-reviewed publications, and issued patent applications from multiple sources. Additionally, the database uses flexible project classification mechanisms and tools for analyzing project associations and trends. This system enables scientists to search the centralized project database, to classify and categorize aging projects, and to analyze the funding aspects across multiple research disciplines. The IARP is designed to improve the allocation and prioritization of scarce research funding, to reduce project overlap, and to improve scientific collaboration, thereby accelerating scientific and medical progress in a rapidly growing area of research.
A text mining system for mutation-disease association extraction. DiMeX consists of a series of natural language processing modules that preprocess input text and apply syntactic and semantic patterns to extract mutation-disease associations. DiMeX includes a separate component that extracts mutation mentions in text and associates them with genes. Reported results indicate that the system outperforms existing mutation-disease association tools, addressing the low precision from which most approaches suffer. DiMeX was applied to a large set of abstracts from Medline to extract mutation-disease associations, as well as other relevant information including patient/cohort size and population data. This high-throughput text-mining approach has the potential to significantly assist researchers and curators in enriching mutation databases.
Delivers a large healthcare dataset expressed in the open electronic health record (openEHR) formalism. ORBA was constructed on the basis of nation-wide information on pseudo-anonymised patients from healthcare settings distributed across the twenty-seven Brazilian Federation Units. It is composed of more than 150 million records with information from approximately 55.47 million hospitalisation authorisations and 7.75 million patients.
Allows researchers to extract phenotypic information from the PhenoCHF corpus. PhenoCHF-corpus is annotated with several types of phenotypic information related to congestive heart failure and integrates heterogeneous resources, namely electronic health records (EHR) and scientific literature. These annotations are limited to a single disease. This corpus aims to support the identification of phenotypic information in unstructured texts via the development of machine learning-based techniques.
Provides a portal to healthspan-associated pathways. Healthspan Pathways allows users to access network graphs of health-associated genes and their interactions. This data source supports export of the different graphs and permits users to display graphs on human genetic association and caloric restriction, graphs on Caenorhabditis elegans, and the human/Caenorhabditis elegans pathway overlap.
Collects experimentally supported associations between biomolecules (miRNAs, metabolites, small molecules/drugs) and metabolic diseases. BioM2MetDisease provides detailed information, including biomolecule and metabolic disease names, species, experimental techniques, experimental tissue and dysregulation patterns, to describe how a biomolecule is related to a metabolic disease. It also offers the PubChem Compound Identifier (CID) for metabolites and small molecules/drugs, the miRBase accession for miRNAs, and the Disease Ontology (DO) identifier and International Classification of Diseases (ICD-10) code for metabolic diseases.
A knowledge base of spinal cord injury (SCI) biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources.
Compiles data about patents disclosing genetic sequences. PatSeq Data is built around data collected from national patent offices, public sequence listing repositories or intellectual property organizations. Searches can be made by jurisdiction, document type, sequence type or location. The database provides additional statistics based on criteria such as document type, sequence type or data source, as well as information about the public availability of sequence listings in the corresponding patent office. It is part of the PatSeq toolkit.
Permits the automatic extraction of mentions of miRNAs, species, genes/proteins, diseases and relations from scientific literature. This manually annotated tool distinguishes two types of miRNA mentions, namely Specific (with numerical identifiers) and Non-Specific (without numerical identifiers). It uses information about miRNAs and miRNA relations based on published scientific literature. This tool provides the basis for building a knowledge-based approach to model regulatory networks for the identification of deregulated miRNAs and genes/proteins.
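The distinction between Specific and Non-Specific miRNA mentions can be illustrated with a minimal sketch. The regular expressions and the function name `classify_mentions` below are assumptions made for illustration only; they are not the patterns used by the tool described above, which also handles species, genes/proteins, diseases and relations.

```python
import re

# Hypothetical pattern for Specific mentions: a name carrying a numerical
# identifier, e.g. "miR-21", "hsa-let-7a", "miR-155-5p".
SPECIFIC = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let)-?\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE
)
# Hypothetical pattern for Non-Specific mentions: generic terms without a number.
NON_SPECIFIC = re.compile(r"\b(?:miRNAs?|microRNAs?)\b", re.IGNORECASE)


def classify_mentions(text):
    """Return (mention, label) pairs in order of appearance in the text."""
    found = [(m.start(), m.group(0), "Specific") for m in SPECIFIC.finditer(text)]
    found += [
        (m.start(), m.group(0), "Non-Specific") for m in NON_SPECIFIC.finditer(text)
    ]
    # Sorting by start offset restores reading order across both patterns.
    return [(mention, label) for _, mention, label in sorted(found)]


if __name__ == "__main__":
    sentence = "Several miRNAs, including miR-21 and hsa-let-7a, were deregulated."
    for mention, label in classify_mentions(sentence):
        print(f"{mention}\t{label}")
```

A rule-based pass of this kind only approximates manual annotation; real miRNA nomenclature has more variants than these two patterns cover.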
Provides a corpus of Portuguese research articles about Radiology together with human, automatic and semi-automatic translations into English. MRRAD contains, for each article, the original Portuguese document, the human translation (HT), two alternative machine translations (MT) and an MT + PE (post-editing) translation. The corpus can be used to study the efficacy of translation solutions in biomedical text, particularly in the field of Radiology.
Offers a search tool allowing users to retrieve patent documents that report one or more sequences from a specific species. PatSeq Text provides a resource of over 440,000 patents that can be investigated by keyword, organism of interest or a combination of both criteria. In addition, the application includes a functionality giving direct access to sequences related to 24 species of interest. It is part of the PatSeq toolkit.
Provides a resource of known drivers, oncogenes and tumor suppressors in a wide variety of cancer types. CancerMine employs a text mining approach that permits users to discern complicated descriptions of cancer gene roles with a high level of precision. This method can also extract other types of biological knowledge with only minor changes. Data are accessible through a web viewer or as a downloadable file.
Contains the full results from all published PGC studies. The results files of the PGC Database are available along with the LD-pruned version suitable for polygenic profile scoring. The purpose of the Psychiatric Genomics Consortium (PGC) is to unite investigators around the world to conduct meta- and mega-analyses of genome-wide genomic data for psychiatric disorders. The PGC includes over 800 investigators from 38 countries; it represents the largest consortium and the largest biological experiment in the history of psychiatry.
Provides a biomedical text-mining and relationship-exploring knowledgebase. DESHCV is a comprehensive Hepatitis C Virus (HCV) online web resource. Users can retrieve information using specified concepts, keywords and phrases. It also offers the option of viewing the retrieved abstracts with or without embedded tagged concepts. It can lead to the identification of possible new discoveries and augments efforts in the search for diagnostic or even therapeutic targets.
Provides users with programs for collecting and processing cancer registry data. Registry Plus is a suite supplying software that can be used separately or together for routine or special data collection. This suite aims to assist the implementation of the National Program of Cancer Registries (NPCR). All the programs provided are compliant with US standards.
Provides information on autism. AutismData offers access to articles, books, and other media covering practical approaches and accounts of autism and Asperger syndrome through a library catalogue. Furthermore, it provides links to specialist databases (PubMed, Research Autism, ERIC, Social Care Online) and general databases (British Library catalogues, Google Scholar, Directory of Open Access Journals).
Provides information about cell lines. Cellosaurus is composed of immortalized cell lines, naturally immortal cell lines, finite life cell lines when those are distributed and used widely, vertebrate cell lines with an emphasis on human, mouse and rat cell lines, and invertebrate cell lines. It contains more than 100 000 cell lines, representing over 550 species. The database furnishes data such as synonyms, cross-references and references to publications, databases or ontologies.
Accommodates Standard Operating Procedures (SOP). BRD contains peer-reviewed primary and review articles as well as SOPs in the field of human Biospecimen Science. It is organized in a hierarchy system consisting of two tiers: (1) SOPs, established protocols; and (2) Biospecimen Evidence-based Practices (BEBP), procedural guidelines developed using literature evidence. The database is searchable by keywords or by curated fields such as source organization, tier, applicable biospecimens, and topic.
Summarizes medical records into an electronic report of cancer diagnosis and treatment. Abstract Plus is a cancer data collection tool that supports the abstraction of all data items in national standard datasets. The database provides an electronic abstract in the format of the North American Association of Central Cancer Registries data exchange layout. It was developed by the Centers for Disease Control and Prevention (CDC).
Automates the detection of mutations and the extraction of mutation–gene pairs. The result is a database of such pairs. MEMA identified 24 351 singleton mutations in conjunction with a HUGO gene name out of 16 728 abstracts.
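The idea of detecting mutation mentions and pairing them with gene names can be sketched as follows. This is a minimal illustration under stated assumptions, not MEMA's actual method: the regular expression covers only single-letter amino-acid substitutions such as "V600E", the gene list is a caller-supplied set of symbols, and the function name `extract_pairs` is hypothetical.

```python
import re

# Assumption: only single-letter amino-acid substitutions (e.g. "V600E") are
# recognised; the source does not specify which mutation formats are covered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def extract_pairs(sentence, gene_names):
    """Pair each mutation mention with each known gene symbol in the sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    genes = sorted(g for g in gene_names if g in tokens)
    mutations = [m.group(0) for m in MUTATION.finditer(sentence)]
    # Sentence-level co-occurrence is a deliberately simple pairing heuristic.
    return [(mutation, gene) for mutation in mutations for gene in genes]


if __name__ == "__main__":
    known_genes = {"BRAF", "EGFR"}  # illustrative gene symbols
    print(extract_pairs("The V600E mutation in BRAF was detected.", known_genes))
```

Co-occurrence pairing over-generates when a sentence names several genes, so a production system would need additional disambiguation.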
Assists users in assessing text extraction from biomedical literature figures. DeTEXT contains over 500 typical biomedical literature figures from about 300 full-text articles randomly chosen from PubMed Central. The database provides annotation guidelines and tools, and makes text detection and word recognition evaluation protocols available to users. It constitutes an image dataset for biomedical literature figure detection, recognition, and retrieval that can be used as a benchmark dataset.
Lists more than 2000 commonly used biological databases. MetaBase (MB) is a wiki-based database of biological databases. Users can search, query or browse the data by category and can also contribute, update and maintain the data. Each database in MB is described in a semi-structured way using forms and templates. Entries carry data for various fields and allow a free-text description of the resource.
A specialized, public resource specifically aimed at researchers investigating reproductive toxicity. Its primary goal is to enable researchers to efficiently and rapidly query the human reproductive toxicity literature in an innovative manner, on a deeper level than individual articles, and from differing gene/protein-, metabolite/enzyme-, biological-, chemical/toxin-, disease- and human anatomical-centric perspectives.
Gathers information about stem cell research. SyStemCell is a manually curated database which contains data extracted from more than 280 PubMed publications. It aims to provide a basic platform to extract relationships suggested by the multisource data. Moreover, it includes an analysis tool to mine potential relationships among different regulation levels and possibly formulate new hypotheses.
Records core pulmonary hypertension (PH)-related genes. PAHKB is a literature-based PAH genetic resource that contains about 340 human PH-related genes (coding and non-coding genes) curated from over one thousand PubMed abstracts. It also identifies critical signaling pathways that may be relevant to pulmonary arterial hypertension (PAH) pathogenesis. The database provides users with a text query, sequence search, gene ranking, browsing using functional analysis results, highlighted pathway map, and curated mouse model.