Contains chemical structures and biological properties of molecules, including small molecules and siRNA reagents. PubChem consists of three interconnected databases: Substance, BioAssay and Compound. The database also provides a suite of web-based bioactivity analysis tools that allow users to download and search individual test results, compare biological activity data from multiple screenings, examine target selectivity and explore structure–activity relationships for compounds of interest.
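PubChem's programmatic interface, PUG REST, exposes the three databases as the input domains `compound`, `substance` and `assay`. The following is a minimal sketch, assuming the `<domain>/<namespace>/<identifiers>/<operation>/<output>` URL layout described in the PUG REST documentation. The helper name `pug_rest_url` is hypothetical, and the code only builds URLs. It does not send requests.

```python
from urllib.parse import quote

# Base URL of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# PUG REST input domains for the Compound, Substance and BioAssay databases.
PUG_DOMAINS = ("compound", "substance", "assay")


def pug_rest_url(domain: str, namespace: str, identifiers: str,
                 operation: str, output: str = "JSON") -> str:
    """Build (but do not send) a PUG REST request URL.

    Hypothetical helper; the layout is
    <base>/<domain>/<namespace>/<identifiers>/<operation>/<output>.
    """
    if domain not in PUG_DOMAINS:
        raise ValueError(f"unknown PUG REST domain: {domain!r}")
    # Percent-encode identifiers (e.g. compound names) but keep commas,
    # which separate multiple identifiers in a single request.
    encoded_ids = quote(identifiers, safe=",")
    return "/".join([PUG_REST_BASE, domain, namespace, encoded_ids, operation, output])


if __name__ == "__main__":
    # Molecular formula and weight for CID 2244 (aspirin).
    print(pug_rest_url("compound", "cid", "2244",
                       "property/MolecularFormula,MolecularWeight"))
```

The returned string can be fetched with any HTTP client. Keeping URL construction separate from the network call makes the request easy to inspect and test.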

ChEBI / Chemical Entities of Biological Interest

A database and ontology containing information about chemical entities of biological interest. ChEBI currently includes over 46 000 entries, each of which is classified within the ontology and assigned multiple annotations, including (where relevant) a chemical structure, database cross-references, synonyms and literature citations. Programmatic access has been improved by the introduction of libChEBI, a library available in Java, Python and MATLAB. Two new tools have also been added: BiNChE, an analysis tool, and OntoQuery, a query tool for the ontology.

ChEMBL

A well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services, which exposes significantly more data from the underlying database and introduces new functionality.
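The following is a minimal sketch of how requests to the ChEMBL web services can be formed. It assumes the `https://www.ebi.ac.uk/chembl/api/data/<resource>` URL layout, the `format` query parameter and Django-style filter lookups such as `pref_name__iexact`, so check these against the current ChEMBL API documentation. The helper `chembl_url` is hypothetical and only builds URLs.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL of the ChEMBL web services (REST API).
CHEMBL_API_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_url(resource: str, chembl_id: Optional[str] = None,
               fmt: str = "json", **filters: str) -> str:
    """Build (but do not send) a ChEMBL web-services URL.

    resource:  e.g. "molecule" or "target"
    chembl_id: fetch a single record, e.g. "CHEMBL25"
    filters:   field lookups passed through as query parameters
    """
    path = f"{CHEMBL_API_BASE}/{resource}"
    if chembl_id is not None:
        path += f"/{chembl_id}"
    # The response format is requested via the "format" query parameter;
    # any filters follow it in the query string.
    params = {"format": fmt, **filters}
    return f"{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(chembl_url("molecule", "CHEMBL25"))  # aspirin
    print(chembl_url("molecule", pref_name__iexact="aspirin"))
```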

DrugBank

Gathers detailed information on drugs, drug targets, drug actions and drug interactions. DrugBank is a web resource that contains information about FDA-approved drugs as well as experimental drugs going through the FDA approval process. The database also includes pharmaco-omic data covering the influence of drugs on metabolite, gene expression and protein expression levels, data on investigational drug clinical trials and drug repurposing trials, and thousands of up-to-date images of approved drugs.

DTP / Anti-cancer Agent Mechanism Database

Provides services and resources to the academic and private-sector research communities worldwide to facilitate the discovery and development of new cancer therapeutic agents. Services available directly from DTP include: (i) the NCI-60 human cancer cell line screen, (ii) the Molecular Target Program, and (iii) materials for research, namely tumor cells, chemicals, natural products and biological samples. The Anti-cancer Agent Mechanism Database is a set of 122 compounds with anti-cancer activity and reasonably well-known mechanisms of action. The list of compounds was assembled as a training set for neural network analysis of drug mechanism of action.

SMPDB / Small Molecule Pathway DataBase

An interactive, visual database containing more than 618 small molecule pathways found in humans. More than 70% of these pathways (>433) are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in metabolomics, transcriptomics, proteomics and systems biology. It is able to do so, in part, by providing exquisitely detailed, fully searchable, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways.

ChemBank

A public, Web-based informatics environment. ChemBank stores and makes freely available data derived from small molecules and small-molecule screens and has resources for relating and studying these data. Currently, ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays performed at the Broad Institute screening center. Web-based analysis tools are available within ChemBank to study the relationships between small molecules, cell measurements, and cell states.

CTRP / Cancer Therapeutics Response Portal

Links genetic, lineage, and other cellular features of cancer cell lines to small-molecule sensitivity with the goal of accelerating discovery of patient-matched cancer therapeutics. CTRP hosts an 'Informer Set' of 481 small-molecule probes and drugs that selectively target distinct nodes in cell circuitry and that collectively modulate a broad array of cell processes. The CTRP is a living resource for the biomedical research community that can be mined to develop insights into small-molecule mechanisms of action and novel therapeutic hypotheses, and to support future discovery of drugs matched to patients based on predictive biomarkers.

Pharos

A multimodal web interface that presents the data from the Target Central Resource Database (TCRD), which collates many heterogeneous gene and protein datasets. Pharos serves as an entry point into the druggable genome. TCRD integrates a wide array of knowledge and data types about genes, proteins and small molecules collected and processed from numerous resources. It includes text-mined bibliometric associations and statistics from the biomedical and patent literature, mRNA and protein expression data, disease and phenotype associations, bioactivity data, drug-target interactions, and processed datasets about the functions of genes and proteins from 66 resources organized into 114 datasets imported from the Harmonizome. The Pharos application provides easy access to all data types collected. Given the complexity of the data surrounding any target, efficient and intuitive visualization has been a high priority, enabling users to quickly navigate and summarize search results and rapidly identify patterns.

SuperDRUG

Offers information on pharmaceutical ingredients. SuperDRUG provides annotated drugs with regulatory details, chemical structures (2D and 3D), dosage, biological targets, physicochemical properties, external identifiers, side effects and pharmacokinetic data. It enables a comparison of 2D and 3D similarity between drugs of different indication classes, elucidating structural reasons for adverse effects that might be missed when only 2D resemblance is considered.

eMolFrag

Extracts molecular fragments, classified as bricks and linkers, from small-molecule datasets. eMolFrag uses the fragments to construct targeted libraries for virtual screening. It stores the connectivity information for the extracted building blocks to help generate new series of chemically feasible compounds. It is optimized to work with eSynth, a recently developed molecular synthesis algorithm, and can also be integrated into other cheminformatics toolkits that use chemical fragments.

DrugCentral

Integrates structure, bioactivity, regulatory, pharmacologic action and indication data for active pharmaceutical ingredients approved by the FDA and other regulatory agencies. DrugCentral covers active ingredients with pharmaceutical formulations, indexing drugs and drug label annotations, and complements similar resources available online. At the molecular level, DrugCentral bridges drug-target interactions with pharmacological action and indications. The integration with FDA drug labels enables text-mining applications for drug adverse events and clinical trial information.

Exposome-Explorer

Provides biomarkers of exposure to environmental risk factors for diseases. Exposome-Explorer contains detailed information on the nature of biomarkers, populations and subjects in which biomarkers have been measured, samples analysed, methods used for biomarker analyses, concentrations in biospecimens, correlations with external exposure measurements, and biological reproducibility over time. This information can be used by epidemiologists and clinicians to compare the performance and field of application of various biomarkers and to identify the specific biomarkers or panels of biomarkers that are most useful for biomonitoring or disease etiology studies.
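The correlations between biomarker concentrations and external exposure measurements can be illustrated with the Pearson coefficient. This is a minimal sketch: Pearson's r is one common choice, and individual studies in the database may report other coefficients, such as Spearman's. The measurements in the usage example are hypothetical and not taken from Exposome-Explorer.

```python
from math import sqrt
from typing import Sequence


def pearson_r(biomarker: Sequence[float], exposure: Sequence[float]) -> float:
    """Pearson correlation between biomarker concentrations and an external
    exposure measure (e.g. questionnaire-based intake) in the same subjects."""
    n = len(biomarker)
    if n != len(exposure) or n < 2:
        raise ValueError("need two paired sequences with at least two values")
    mean_b = sum(biomarker) / n
    mean_e = sum(exposure) / n
    # Sums of cross-products and squares; the 1/n factors cancel in the ratio.
    s_be = sum((b - mean_b) * (e - mean_e) for b, e in zip(biomarker, exposure))
    s_bb = sum((b - mean_b) ** 2 for b in biomarker)
    s_ee = sum((e - mean_e) ** 2 for e in exposure)
    if s_bb == 0 or s_ee == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_be / sqrt(s_bb * s_ee)


if __name__ == "__main__":
    # Hypothetical paired measurements for five subjects (not real data).
    urinary_marker = [1.2, 2.3, 2.9, 4.1, 5.0]
    reported_intake = [10.0, 18.0, 25.0, 31.0, 42.0]
    print(round(pearson_r(urinary_marker, reported_intake), 3))
```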

WITHDRAWN

A resource for withdrawn and discontinued drugs. WITHDRAWN contains not only information on drug withdrawals and associated adverse drug reactions but also drug-target interactions and genetic variations of the protein targets. The drug-target interaction information is placed in biological context through pathway enrichment. An illustrative case study shows that linking drugs, targets and SNPs may help explain the underlying mechanisms of toxicity. The knowledge in the database can improve insight into drug-target interactions in a toxicological context and provide a rationale for further off-target profiling and enhanced pharmacogenetic studies in different populations.
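Pathway enrichment of a drug's targets can be sketched with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. This is an assumption: the description does not say which statistic WITHDRAWN uses. The function name and the background size in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes: int, n_pathway: int,
                       n_targets: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_genes:   size of the background (all genes considered)
    n_pathway: background genes annotated to the pathway
    n_targets: drug targets drawn from the background
    n_overlap: drug targets that fall in the pathway
    """
    if not (0 <= n_pathway <= n_genes and 0 <= n_targets <= n_genes):
        raise ValueError("pathway and target sets must fit in the background")
    upper = min(n_pathway, n_targets)
    if not 0 <= n_overlap <= upper:
        raise ValueError("overlap must be between 0 and min(pathway, targets)")
    total = comb(n_genes, n_targets)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(n_pathway, k) * comb(n_genes - n_pathway, n_targets - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical: 20 000 background genes, a 100-gene pathway,
    # 10 drug targets of which 3 lie in the pathway.
    print(enrichment_p_value(20_000, 100, 10, 3))
```

Exact integer arithmetic via `math.comb` avoids overflow and underflow for realistic background sizes. The final division is correctly rounded by Python.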

SureChEMBL

A publicly available large-scale resource containing compounds extracted from the full text, images and attachments of patent documents. The data are extracted from the patent literature daily by an automated text- and image-mining pipeline. SureChEMBL provides access to a previously unavailable, open and timely set of annotated compound-patent associations, complemented by combined structure- and keyword-based search across the compound repository and the patent document corpus. Given the wealth of knowledge hidden in patent documents, analysis of SureChEMBL data has immediate applications in drug discovery, medicinal chemistry and other commercial areas of chemical science. The SureChEMBL database contains more than 17 million distinct compounds extracted from more than 14 million patent documents, spanning 1970 to the present.

NutriChem

Allows users to explore the medicinal value of diet and elucidate the synergistic effects of natural bioactive compounds on disease phenotypes. NutriChem contains food-compound pairs linking plant-based foods to phytochemicals, as well as food-disease associations between plant-based foods and diseases. It was generated by text mining 21 million MEDLINE abstracts. A future update of NutriChem may incorporate confidence scores based on the availability of support from the literature or patient records.

SuperSweet / Sweetening Agents Database

Compiles information on natural and artificial sweetening agents. SuperSweet includes properties of sweetening agents such as 3D structure, origin, sweetness, approval status and calories, and provides hypotheses on their binding to the sweet taste receptor. It contains more than 8000 carbohydrates, proteins, D-amino acids and artificial (synthesized) sweeteners, retrieved from the literature and pre-existing data sources such as PubChem and the Protein Data Bank (PDB).

NPC / NCGC Pharmaceutical Collection

A comprehensive, publicly accessible collection of approved and investigational drugs for high-throughput screening. NPC provides a valuable resource both for validating new models of disease and for better understanding the molecular basis of disease pathology and intervention. It has already generated several useful probes for studying a diverse cross-section of biology, including novel targets and pathways. NCGC provides access to its set of approved drugs and bioactives through the Therapeutics for Rare and Neglected Diseases (TRND) program and as part of the compound collection for the Tox21 initiative, a collaborative toxicity-screening effort among several government agencies.

PROBELIST

Provides a database of chemical tool compounds for targets, drawn from the medicinal chemistry literature and patents. PROBELIST was developed to help researchers make the best use of these chemicals and easily locate useful functional inhibitors for a given biological pathway. The data set was annotated in a target-centric way with KEGG pathways, Gene Ontology (GO) terms (molecular function (MF), biological process (BP) and cellular component (CC)), Reactome pathways, Entrez Gene IDs and UniProt IDs for each protein target.

BARD / BioAssay Research Database

A public database and suite of tools developed to provide access to bioassay data produced by the NIH Molecular Libraries Program (MLP). Data from 631 MLP projects were migrated to a new structured vocabulary designed to capture bioassay data in a formalized manner, with particular emphasis placed on the description of assay protocols. New data can be submitted to BARD with a user-friendly set of tools that assist in the creation of appropriately formatted datasets and assay definitions.

CFam

CFam was developed to hierarchically cluster drugs, bioactive molecules, human metabolites, natural products, patented agents and other molecules into functional families, superfamilies and classes of structurally similar compounds, based on literature-reported high, intermediate and remote similarity levels. Compounds were represented by molecular fingerprints, and molecular similarity was measured by the Tanimoto coefficient. The functional seeds of CFam families were derived from hierarchically clustered drugs, bioactive molecules, human metabolites, natural products and patented agents, and were used to characterize families and cluster compounds into families, superfamilies and classes.
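The Tanimoto coefficient can be sketched as follows. The sketch assumes each fingerprint is stored as the set of indices of its "on" bits, and does not specify how CFam generates its fingerprints. `assign_to_seed` is a hypothetical helper that attaches a compound to its most similar family seed. It is not CFam's hierarchical clustering procedure, and its threshold is illustrative rather than one of CFam's reported similarity levels.

```python
from typing import AbstractSet, Dict, Optional


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of
    on-bit indices: size of the intersection divided by size of the union."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return len(fp_a & fp_b) / union


def assign_to_seed(fp: AbstractSet[int], seeds: Dict[str, AbstractSet[int]],
                   threshold: float) -> Optional[str]:
    """Hypothetical helper: return the most similar seed if its Tanimoto
    similarity reaches `threshold`, otherwise None."""
    best_name: Optional[str] = None
    best_sim = -1.0
    for name, seed_fp in seeds.items():
        sim = tanimoto(fp, seed_fp)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


if __name__ == "__main__":
    seeds = {"seed_1": {1, 2, 3}, "seed_2": {7, 8}}  # hypothetical seeds
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    print(assign_to_seed({1, 2, 3, 4}, seeds, threshold=0.7))
```

Running the helper with several thresholds, from strictest to loosest, would give nested groupings. This is one simple way to mimic family, superfamily and class levels.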

T3DB / Toxin and Toxin-Target Database

A resource specifically designed to capture information about the toxic exposome. The T3DB focuses on providing mechanisms of toxicity and target proteins for each toxin. This dual nature, in which toxin and toxin-target records are interactively linked in both directions, distinguishes the T3DB from existing databases. It is also fully searchable and supports extensive text, sequence, chemical structure and relational query searches.
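Two-way linking between toxin and target records can be sketched as a many-to-many index kept in both directions. This is an illustration of the idea only, not T3DB's schema. The class name and the record IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Set


class ToxinTargetIndex:
    """Many-to-many toxin/target links stored in both directions, so a toxin
    record can list its targets and a target record can list its toxins."""

    def __init__(self) -> None:
        self._targets_by_toxin: DefaultDict[str, Set[str]] = defaultdict(set)
        self._toxins_by_target: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, toxin_id: str, target_id: str) -> None:
        # Update both maps together so the two directions never disagree.
        self._targets_by_toxin[toxin_id].add(target_id)
        self._toxins_by_target[target_id].add(toxin_id)

    def targets_of(self, toxin_id: str) -> FrozenSet[str]:
        return frozenset(self._targets_by_toxin.get(toxin_id, ()))

    def toxins_of(self, target_id: str) -> FrozenSet[str]:
        return frozenset(self._toxins_by_target.get(target_id, ()))


if __name__ == "__main__":
    index = ToxinTargetIndex()
    index.link("toxin_A", "target_1")  # hypothetical IDs
    index.link("toxin_A", "target_2")
    index.link("toxin_B", "target_1")
    print(sorted(index.targets_of("toxin_A")))
    print(sorted(index.toxins_of("target_1")))
```

Storing both maps costs extra memory but makes lookups in either direction a single dictionary access. A single `link` method keeps the two maps consistent.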

cBinderDB

Provides information on covalent binding compound structures, chemotypes, targets, covalent binding types and other biological properties. cBinderDB is a covalent binder database. The covalent binding targets are annotated with biological functions, protein families and domains, gene information, modulators and receptor-ligand complex structures. The data were collected from scientific publications by combining text mining with manual inspection.

ASDB / Annotated Scaffold Database

Allows exploration of scaffolds or chemical probes for pharmaceutical innovation and chemical biology studies. The scaffolds in ASDB were derived from public databases, including ChEMBL, DrugBank and TCMSP, using a scaffold-based classification approach. Each scaffold was assigned an InChIKey as its unique identifier, along with energy-minimized 3D conformations and other calculated properties. Each scaffold is also associated with drugs, natural products, drug targets and medical indications. The database can be queried through text or structure search tools.

SM-TF / Small Molecule-Transcription Factor

A structural database of small molecule-transcription factor complexes. The 3D structures of the co-bound small molecules and the corresponding binding sites on transcription factors (TFs) are provided, serving as a valuable resource for structure-based drug design targeting TFs. Currently, the SM-TF database contains 934 entries covering 176 TFs from a variety of species. The database is further divided into subsets by species. The entries are linked to the UniProt database and other sequence-based TF databases. In addition, druggable human TFs and the corresponding approved drugs are linked to DrugBank.

ChemAgora

Gives access to a collection of chemical properties and related links. ChemAgora compiles data sets available through the OECD eChemPortal together with a selection of public repositories in the toxicological field. It is composed of two search modules: (i) the first panel lets users query the application by submitting an InChIKey, an InChIKey skeleton, a name or a CAS Registry Number; and (ii) the second panel lets users identify substances by drawing a structure directly in the platform.
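The difference between a full InChIKey query and an InChIKey skeleton query can be sketched by parsing the key's fixed layout. This sketch assumes "InChIKey skeleton" means the first 14-character block of a standard InChIKey, which hashes the connectivity layer. Only the layout is handled, not the hashing. The same parsing applies to the InChIKey identifiers used by ASDB. All function names are hypothetical.

```python
import re
from typing import NamedTuple

# Layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    skeleton: str     # first block: hash of the connectivity layer
    layers_hash: str  # hash of the remaining layers (stereo, isotopes, ...)
    standard: bool    # 'S' flag = standard InChIKey, 'N' = non-standard
    version: str      # InChI version character ('A' for version 1)
    protonation: str  # 'N' = neutral; other letters encode protonation changes


def parse_inchikey(key: str) -> InChIKeyParts:
    match = _INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, layers, flag, version, protonation = match.groups()
    return InChIKeyParts(skeleton, layers, flag == "S", version, protonation)


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True if two keys share the first block, i.e. the same connectivity,
    even when stereochemistry or isotopic layers differ."""
    return parse_inchikey(key_a).skeleton == parse_inchikey(key_b).skeleton


if __name__ == "__main__":
    print(parse_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # aspirin
```

A skeleton search therefore matches stereoisomers and isotopologues that a full-key search would treat as different substances.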