Compound identifier conversion software tools | Drug discovery data analysis
Metabolomic publications and databases use different database identifiers, or even trivial names, which prevents queries across databases or between studies. The best way to annotate metabolites is by chemical structure, encoded as an International Chemical Identifier (InChI) or InChIKey.
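As a minimal sketch of why structure-based keys help, the Python snippet below groups identifiers from different sources under a shared InChIKey. It assumes the standard 27-character InChIKey layout (14 letters, a hyphen, 10 letters, a hyphen, 1 letter). The function names `is_inchikey` and `group_by_inchikey` and the sample records are illustrative and do not come from any of the tools listed here.

```python
import re
from collections import defaultdict

# Layout check only: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
# This does not verify that the key corresponds to a real structure.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(value):
    """Return True if value has the layout of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


def group_by_inchikey(records):
    """Group (source, identifier, inchikey) records by InChIKey.

    Returns a dict mapping each InChIKey to {source: identifier}.
    """
    grouped = defaultdict(dict)
    for source, identifier, inchikey in records:
        if not is_inchikey(inchikey):
            raise ValueError(f"not an InChIKey: {inchikey!r}")
        grouped[inchikey][source] = identifier
    return dict(grouped)


# Illustrative sample records (caffeine and aspirin).
records = [
    ("PubChem", "2519", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
    ("ChEBI", "CHEBI:27732", "RYYVLZVUVIJVGH-UHFFFAOYSA-N"),
    ("PubChem", "2244", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),
]
links = group_by_inchikey(records)
```

Records that share a key end up in the same entry, so a query against one source can be followed to the others without comparing names.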
A freely available compound identifier mapping service on the internet, designed to optimize the efficiency with which structure-based hyperlinks may be built and maintained between chemistry-based resources. In the past, the creation and maintenance of such links at EMBL-EBI, where several chemistry-based resources exist, has required independent efforts by each of the separate teams. These efforts were complicated by the different data models, release schedules, and differing business rules for compound normalization and identifier nomenclature that exist across the organization. UniChem, a large-scale, non-redundant database of Standard InChIs with pointers between these structures and chemical identifiers from all the separate chemistry resources, was developed as a means of efficiently sharing the maintenance overhead of creating these links.
The Open PHACTS discovery platform was developed to reduce barriers to drug discovery in industry, academia and small businesses. The integrated pharmacological data can be accessed via the Open PHACTS API, the Open PHACTS Explorer or apps.
A web-based Chemical Translation Service that performs batch conversions of the most common compound identifiers, including CAS, ChEBI, compound formulas, Human Metabolome Database (HMDB), InChI, InChIKey, IUPAC name, KEGG, LipidMaps, PubChem CID and SID, SMILES and chemical synonym names. The software was implemented in Groovy and Java, the web frontend in Grails, and the database is PostgreSQL.
Arranges chemical names. NLM-CHEMSORT generates two sort keys from each chemical name: a primary key of over 80 alphabetic characters and a secondary key of 16 alphanumeric characters. Because the keys are generated de novo, they add no permanent-storage cost, and sorting on them yields logically ordered sequences of chemical names. The algorithm is suitable for sorting chemical names from smaller files such as the Toxicology Data Bank.
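The two-level key idea can be sketched in Python as follows. This is not the NLM-CHEMSORT algorithm, whose key-generation rules are not given here. It assumes, purely for illustration, that the primary key is the letters of the name and the secondary key is the leading alphanumeric characters including locants. The names `sort_keys` and `chem_sort` and both width parameters are hypothetical.

```python
def sort_keys(name, primary_width=80, secondary_width=16):
    """Build a (primary, secondary) sort key for a chemical name.

    Assumed rules, for illustration only:
    - primary: ASCII letters of the name, uppercased, truncated
    - secondary: letters and digits in original order, truncated
    """
    upper = name.upper()
    primary = "".join(ch for ch in upper if "A" <= ch <= "Z")[:primary_width]
    secondary = "".join(ch for ch in upper if ch.isalnum())[:secondary_width]
    return primary, secondary


def chem_sort(names):
    """Sort names by primary key, breaking ties with the secondary key."""
    return sorted(names, key=sort_keys)


names = ["2-butanol", "acetone", "1-butanol", "butanal"]
ordered = chem_sort(names)
# ordered == ["acetone", "butanal", "1-butanol", "2-butanol"]
```

A plain string sort would place both butanols before "acetone" because digits sort ahead of letters. With the two-level key the locants only break ties between names that share the same letters. The keys are computed at sort time and never stored.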
Enables its users to match either a single name of a chemical compound or a whole list of names against reference databases, even when the notations differ. The matching is done solely on the name (string) of each compound, without identifying the exact chemical structure of the molecule described. An input name is normalized to a unique name form by a set of transformation rules. These rules include, among others, reordering of substituent descriptions in the name and replacement of synonymous name constituents (e.g. equivalent trivial names), as well as simpler rules dealing with different spellings, spaces, hyphens, etc. The resulting normalized term does not necessarily represent a valid or even systematic notation for a given compound; it is intended only for matching two names normalized by the same methods.
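A minimal Python sketch of this kind of name-based matching is shown below. It covers only the simpler rules (case, spelling variants, synonymous constituents, spaces and hyphens) and leaves out substituent reordering. The rule tables and the function names `normalize_name` and `match_names` are hypothetical and are not taken from the tool itself.

```python
import re

# Tiny illustrative rule tables; a real rule set would be far larger.
SPELLING_VARIANTS = {"sulph": "sulf", "aluminium": "aluminum"}
SYNONYMS = {"ethanoic": "acetic", "methanoic": "formic"}

# Separators dropped from the normalized form.
SEPARATORS = re.compile(r"[\s\-,()\[\]']")


def normalize_name(name):
    """Reduce a chemical name to a matching key (not a valid name)."""
    key = name.lower()
    for variant, canonical in SPELLING_VARIANTS.items():
        key = key.replace(variant, canonical)
    for synonym, canonical in SYNONYMS.items():
        key = key.replace(synonym, canonical)
    return SEPARATORS.sub("", key)


def match_names(queries, reference_names):
    """Map each query name to a reference name with the same key, or None."""
    index = {}
    for ref in reference_names:
        index.setdefault(normalize_name(ref), ref)
    return {query: index.get(normalize_name(query)) for query in queries}


matches = match_names(
    ["Sulphuric Acid", "ethanoic-acid", "benzene"],
    ["sulfuric acid", "acetic acid"],
)
```

Here "Sulphuric Acid" and "ethanoic-acid" resolve to the reference entries "sulfuric acid" and "acetic acid", while "benzene" has no match and maps to `None`. Names that differ only in locant position, such as "2-propanol" and "propan-2-ol", would not match in this sketch because reordering is omitted.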