Allows intuitive hierarchical structuring, visualization and analysis of complex structure and bioactivity data. Scaffold Hunter reads data, for example from biochemical screens, extracts chemically meaningful compound scaffolds (that is, all carbo- and heterocyclic rings, their aliphatic linker bonds and atoms attached via a double bond) and iteratively removes one ring at a time from the larger child scaffolds to generate smaller ‘parent’ scaffolds according to a set of chemistry- and medicinal chemistry–derived rules. Scaffold Hunter originated in drug discovery, which is still one of its main application areas, and has evolved into a reusable open-source platform for a wider range of applications.
A network biology-based computational platform designed to integrate transcriptomes, interactomes and gene ontologies to identify phenotype-specific subnetworks. NetDecoder is based on a network flow algorithm formulated as a minimum-cost flow optimization problem that identifies and prioritizes paths and key regulators within disease-specific subnetworks. NetDecoder is designed to capture molecular switches and infer disease-specific networks, helping to clarify pathways and identify key regulators that contribute to a disease phenotype.
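NetDecoder casts subnetwork discovery as a minimum-cost flow problem. To show what such a solver computes, here is a minimal textbook sketch, not NetDecoder's implementation: successive shortest augmenting paths, with Bellman-Ford finding the cheapest path in the residual graph. The toy graph, capacities and costs are hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch (assumption): generic min-cost flow, not NetDecoder's own code.
class MinCostFlow:
    """Successive shortest augmenting paths; Bellman-Ford tolerates negative residual costs."""

    def __init__(self, n: int) -> None:
        self.n = n
        # graph[u] holds edges as [to, residual capacity, cost, index of reverse edge in graph[to]]
        self.graph: List[List[list]] = [[] for _ in range(n)]

    def add_edge(self, u: int, v: int, cap: int, cost: float) -> None:
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s: int, t: int, max_flow: int) -> Tuple[int, float]:
        flow, total_cost = 0, 0.0
        while flow < max_flow:
            dist = [float("inf")] * self.n
            prev: List[Tuple[int, int]] = [(-1, -1)] * self.n  # (previous node, edge index)
            dist[s] = 0.0
            for _ in range(self.n - 1):  # Bellman-Ford relaxation rounds
                updated = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, cost, _rev) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v] = dist[u] + cost
                            prev[v] = (u, i)
                            updated = True
                if not updated:
                    break
            if dist[t] == float("inf"):
                break  # sink no longer reachable: maximum feasible flow reached
            # Bottleneck capacity along the cheapest path
            push, v = max_flow - flow, t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            # Augment: reduce forward residuals, increase reverse residuals
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost

# Toy network: node 0 is the source, node 3 the sink
mcf = MinCostFlow(4)
for u, v, cap, cost in [(0, 1, 2, 1), (0, 2, 1, 2), (1, 2, 1, 1), (1, 3, 1, 3), (2, 3, 2, 1)]:
    mcf.add_edge(u, v, cap, cost)
flow, cost = mcf.solve(0, 3, max_flow=3)
```

On this toy network all three units must be routed, and the cheapest way to do so costs 10.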
Prepares and completes whole-organism screening at high-throughput rates. ARQiv-HTS includes functions that fall into two categories: those applied to 'Pre-screening Assay Optimization' and those applied to 'Compound Analysis'. The functions allow the user to calculate background signal, determine sample size, run quality control tests, perform virtual experiments to simulate compound efficacy and, finally, perform compound analysis during iterative drug screen cycles. The ARQiv-HTS platform is adaptable to almost any reporter-based assay designed to evaluate the effects of chemical compounds in living small-animal models. ARQiv-HTS thus enables large-scale whole-organism drug discovery for a variety of model species and from numerous disease-oriented perspectives.
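ARQiv-HTS runs quality control tests during assay optimization, but the description does not name the statistic. A widely used HTS quality metric, assumed here purely for illustration, is the Z'-factor, Z' = 1 − 3(σ₊ + σ₋)/|μ₊ − μ₋|, computed from positive- and negative-control wells; values of 0.5 or above are conventionally taken to indicate an excellent assay. The control readings are hypothetical.

```python
import statistics
from typing import Sequence

# Illustrative sketch (assumption): Z'-factor as a generic assay QC metric, not ARQiv-HTS's code.
def z_prime(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg| (sample standard deviations)."""
    sd_p = statistics.stdev(positive)
    sd_n = statistics.stdev(negative)
    separation = abs(statistics.mean(positive) - statistics.mean(negative))
    return 1.0 - 3.0 * (sd_p + sd_n) / separation

# Hypothetical reporter signals from control wells
pos = [100.0, 102.0, 98.0, 101.0, 99.0]
neg = [10.0, 11.0, 9.0, 10.0, 10.0]
z = z_prime(pos, neg)
print(f"Z' = {z:.3f}")
```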
Constructs, optimizes and applies computational models of cellular processes. CoPIA is a program that combines model construction in terms of nonlinear differential equations, combinatorial intervention, molecular observation at multiple points, optimization of model parameters with simplicity constraints and experimental validation. The software can be used to build reasonably accurate quantitative predictors of pathway responses to combinatorial drug perturbation in MCF7 cells.
Allows intuitive, interactive organization and exploration of small-molecule datasets on current desktop computer hardware. Mona is a generic, simple and interactive compound browser and manager. The package contains two separate algorithms for clustering molecule sets. First, clustering by properties uses bins to group molecules with similar property values. Second, clustering by similarity calculates fingerprints for all molecules and groups them with the k-medoids algorithm, which requires quadratic time.
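Both clustering modes can be illustrated generically. The sketch below is an assumption-laden illustration, not Mona's code: it bins molecules by a single property value and clusters toy fingerprints (sets of on-bits) with a simple alternating k-medoids using Tanimoto distance. Precomputing all pairwise distances is what makes the approach quadratic in the number of molecules.

```python
from typing import Dict, List, Sequence, Set

# Illustrative sketch (assumption): generic versions of the two clustering modes, not Mona's code.

def bin_by_property(values: Dict[str, float], width: float) -> Dict[int, List[str]]:
    """Property clustering: molecules whose value falls in the same fixed-width bin share a cluster."""
    bins: Dict[int, List[str]] = {}
    for mol_id, value in values.items():
        bins.setdefault(int(value // width), []).append(mol_id)
    return bins

def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def k_medoids(fps: Sequence[Set[int]], k: int, max_iter: int = 20) -> List[int]:
    """Alternating k-medoids; returns, for each molecule, the index of its cluster medoid."""
    n = len(fps)
    # The full pairwise distance matrix is what makes this quadratic in n
    d = [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # naive initialization: the first k molecules

    def assign(meds: List[int]) -> List[int]:
        return [min(meds, key=lambda m: d[i][m]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for m in medoids:
            members = [i for i in range(n) if labels[i] == m] or [m]
            # The new medoid minimizes the summed distance to its cluster members
            new_medoids.append(min(members, key=lambda c: sum(d[c][j] for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return assign(medoids)

# Toy on-bit sets standing in for real fingerprints
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {8, 9, 10}]
labels = k_medoids(fps, k=2)
bins = bin_by_property({"m1": 120.0, "m2": 135.0, "m3": 260.0}, width=100.0)
```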
Provides an open-source library for drug discovery. DeepChem offers an open-source toolchain for deep learning in drug discovery. The software contains open-source graph-convolutional primitives and iterative-refinement long short-term memory (LSTM) models. It also implements multitask deep networks that can facilitate the broad adoption of deep networks in commercial drug discovery.
Computes a class of entropy parameters. 3D-MEDNEs offers a physically meaningful 3D generalization of the concept of molecular negentropy by means of stochastic models. The method essentially codifies electronic structural information. It is based on a simple model of the intramolecular movement of electrons, treated as a Markov chain.
Allows users to break down small molecules into chemically meaningful fragments and to analyze the resulting fragment distribution. molBLOCKS consists of two command-line programs: (1) “fragment”, which reads user-defined rules specifying the bonds to break or uses default rule sets; and (2) “analyze”, which collects statistics on fragment frequency, clusters fragments using a user-defined similarity threshold based on a fingerprint representation of each fragment, and selects a representative fragment for each cluster.
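The clustering step of the “analyze” program can be approximated as follows. This is a hypothetical sketch, not molBLOCKS' implementation: fragments come with precomputed toy fingerprints (sets of on-bits), frequencies are counted with `collections.Counter`, and a greedy leader algorithm assigns each fragment to the first representative whose Tanimoto similarity meets the user-defined threshold. Visiting fragments from most to least frequent makes the most frequent fragment of each cluster its representative; molBLOCKS may select representatives differently.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# Illustrative sketch (assumption): leader clustering of fragment fingerprints, not molBLOCKS' code.

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union

def cluster_fragments(fragments: List[str], fps: Dict[str, FrozenSet[int]],
                      threshold: float) -> Dict[str, List[str]]:
    """Return {representative: members}; the most frequent fragment seeds each cluster."""
    counts = Counter(fragments)  # fragment frequency statistics
    clusters: Dict[str, List[str]] = {}
    for frag, _count in counts.most_common():
        for rep in clusters:
            if tanimoto(fps[frag], fps[rep]) >= threshold:
                clusters[rep].append(frag)
                break
        else:  # no existing representative is similar enough
            clusters[frag] = [frag]
    return clusters

# Toy on-bit sets standing in for real fragment fingerprints
fps = {
    "c1ccccc1": frozenset({1, 2, 3, 4}),
    "c1ccncc1": frozenset({1, 2, 3, 5}),
    "C1CCCCC1": frozenset({6, 7, 8}),
}
fragments = ["c1ccccc1", "c1ccncc1", "c1ccccc1", "C1CCCCC1"]
clusters = cluster_fragments(fragments, fps, threshold=0.5)
```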
Provides knowledge-based estimation of the permeability of potential anti-mycobacterial compounds. MycPermCheck is an online tool based on a chemoinformatic data-mining approach. It can be applied to drug-like compounds with a molecular weight below 500 Da. The tool supports the selection of permeable anti-mycobacterial compounds.
Analyses energy landscapes from a transition state theory perspective. MEPSA identifies all barriers and minima of a landscape at once and describes maxima edge profiles. It detects the lowest-energy path connecting two minima and generates transition state theory diagrams along these paths. The tool can save most of the generated data to easily parseable text files.
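A lowest-energy path between two minima can be defined as the path whose highest point (the barrier) is as low as possible. One standard way to compute such a minimax path on a gridded landscape, sketched below as an assumption rather than MEPSA's own algorithm, is a Dijkstra-style search in which a path's cost is the maximum energy encountered along it. The 2D grid, the 4-neighbour connectivity and the energies are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Illustrative sketch (assumption): minimax (lowest-barrier) path search, not MEPSA's code.
Cell = Tuple[int, int]

def minimax_path(grid: List[List[float]], start: Cell, goal: Cell) -> Tuple[float, List[Cell]]:
    """Return (barrier height, path) minimizing the maximum energy along the path."""
    rows, cols = len(grid), len(grid[0])
    best: Dict[Cell, float] = {start: grid[start[0]][start[1]]}
    prev: Dict[Cell, Optional[Cell]] = {start: None}
    heap = [(best[start], start)]
    while heap:
        barrier, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if barrier > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                cand = max(barrier, grid[nr][nc])  # highest energy seen so far on this route
                if cand < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = cand
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (cand, (nr, nc)))
    path: List[Cell] = []
    node: Optional[Cell] = goal
    while node is not None:  # walk back from goal to start
        path.append(node)
        node = prev[node]
    return best[goal], path[::-1]

# Hypothetical landscape with minima at (0, 0) and (0, 2)
landscape = [
    [0.0, 5.0, 1.0],
    [2.0, 9.0, 3.0],
    [1.0, 2.0, 0.5],
]
barrier, path = minimax_path(landscape, (0, 0), (0, 2))
```

Here the detour through the bottom row avoids both the direct barrier of 5.0 and the central peak, so the lowest barrier is 3.0.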
Creates the 3D structure of DNA from a DNA sequence along with an intercalation site, and docks a ligand at that site. Intercalate is a state-of-the-art, robust and dedicated drug-DNA intercalation methodology. It also efficiently predicts the best binding mode of the ligand through automated binding free energy estimation. The Intercalate methodology has the potential to identify new ligand molecules that intercalate into DNA non-covalently. A freely accessible web server based on the methodology is also available; it provides the datasets, all structures used in developing the methodology and a usage guide.
Allows extraction of data from the ‘Open Pharmacological Space’ (OPS) developed by the Open PHACTS project. Collector is an application that extracts series of compounds from OPS together with the results of pharmacological/toxicological experiments. The software applies customizable curation filters and produces series of compounds in a format well suited for developing quantitative structure-activity relationship (QSAR) models. It was used in the eTOX project to develop QSAR models against targets considered of toxicological interest (anti-targets).
Assists in predicting side effects of drug pairs. Decagon is a general graph convolutional neural network designed to operate on a large multimodal graph where nodes can be connected through a large number of different relation types. It infers a prediction model that can identify side effects of pairs of drugs. This application predicts an association between a side effect and a co-prescribed drug pair to identify side effects that cannot be attributed to either drug alone.
Offers a method dedicated to drug classification. KRL is a machine learning approach focused on personalized drug recommendation. The application directly predicts the top-ranking drugs for a given cell line. It was tested in a simulated clinical scenario on a dataset of cancer cell lines using three-fold cross-validation.
Predicts Anatomical Therapeutic Chemical (ATC) classes. iATC-mHyb was established by hybridizing the iATC-mISF method with the powerful iATC-mDO sub-predictor. iATC-mHyb outperforms the best existing ATC predictor in all five metrics used to assess the quality of multi-label predictors, particularly the “absolute true” rate and the “absolute false” rate, the two most difficult-to-improve indexes. This multi-label predictor can achieve an absolute false rate below 3%.
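The “absolute true” and “absolute false” rates are two of the five metrics commonly used for multi-label predictors (alongside aiming, coverage and accuracy). With N samples, M possible labels, true label set L_k and predicted set L*_k, absolute true is the fraction of samples whose predicted set exactly matches the true set, and absolute false averages (|L_k ∪ L*_k| − |L_k ∩ L*_k|)/M. The minimal sketch below computes both for hypothetical predictions, taking M = 14 for the 14 first-level ATC groups; it is not the iATC-mHyb code.

```python
from typing import List, Set

# Illustrative sketch (assumption): standard multi-label metrics on made-up ATC label sets.

def absolute_true(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true label set."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)

def absolute_false(true_sets: List[Set[str]], pred_sets: List[Set[str]], m_labels: int) -> float:
    """Mean fraction of the M labels that are wrongly predicted (missed or spurious)."""
    total = sum(len(t | p) - len(t & p) for t, p in zip(true_sets, pred_sets))
    return total / (len(true_sets) * m_labels)

truth = [{"A", "C"}, {"N"}, {"J", "L"}]
preds = [{"A", "C"}, {"N", "R"}, {"J"}]
at = absolute_true(truth, preds)
af = absolute_false(truth, preds, m_labels=14)
```

Only the first sample is predicted exactly, and the other two each have one wrong label, so absolute true is 1/3 and absolute false is 2/(3 × 14).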
Provides an industry-standardised method, based on sound scientific principles, to predict the purging of potential mutagenic impurities (PMIs) in a synthetic route. Mirabilis can act as the corporate repository of synthetic schemes, reactions and purge factors, facilitating internal collaboration and retention of corporate knowledge. The predicted purge factors are derived from knowledge of the impurity class as well as other factors that affect purging.
Allows users to find the maximum common subgraph (MCS) in small molecules. SMSD uses a combination of algorithms to search for the MCS and filters the results in a chemically relevant manner. The tool calculates the MCS between two molecules by combining VF+ Lib, MCS+ and the Chemistry Development Kit (CDK)-based MCS algorithm. Before performing the MCS search, it checks whether two molecules are identical or dissimilar based on their atom and bond counts.
Serves as an open-source library for molecular fingerprinting with a focus on machine learning and data mining applications. jCompoundMapper is a library for the decomposition of chemical graphs based on the open-source Chemistry Development Kit toolkit. It provides several options such as search depth, distance cut-offs, and atom and pharmacophore typing. It can also combine, compare or export fingerprints into several formats.
Supports batch generation of quantitative structure-activity relationship (QSAR) models. AZOrange is based on a simple generalized consensus model that combines the predictions of the tool’s learners by averaging or by majority vote. The tool uses customized, high-performance, state-of-the-art machine learning algorithms and offers a way to automatically build QSAR pipelines. AZOrange can transform data formats, scale descriptor values where appropriate, accommodate missing values and select stopping criteria.
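The two combination rules of a consensus model are easy to state in code. The minimal sketch below averages regression outputs and takes a per-sample majority vote for class labels; the three learners' predictions are hypothetical, and this is not AZOrange's API.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence

# Illustrative sketch (assumption): generic consensus rules, not AZOrange's implementation.

def consensus_regression(predictions: Sequence[Sequence[float]]) -> List[float]:
    """predictions[learner][sample] -> per-sample mean across learners."""
    return [mean(vals) for vals in zip(*predictions)]

def consensus_classification(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Per-sample majority vote; ties go to the label seen first among the learners."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Hypothetical outputs of three learners for two compounds
reg = consensus_regression([[5.1, 6.0], [4.9, 6.4], [5.0, 5.9]])
cls = consensus_classification([["active", "inactive"], ["active", "active"], ["inactive", "inactive"]])
```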
Performs automated rule-based taxonomic classification of chemical compounds. ClassyFire is a web-accessible computer program that allows automated rule-based structural classification of all known chemical entities. The software is built around a chemical taxonomy along with a fully annotated chemical ontology (ChemOnt) and a Chemical Classification Dictionary. The ClassyFire API allows users to access the web server programmatically to submit queries and retrieve classification results as well as entity-related properties.
An integrative data mining tool for drug discovery using aggregate web services. WENDI aims to build a full picture of the potential biological activities of a chemical compound by aggregating data from web services that represent multiple diverse sources (including predictive models, databases and journal articles). WENDI allows the identification of corroborating or conflicting information: for instance, a compound might be predicted active in a breast cancer cell line, similar compounds might be active in a breast cancer-related PubChem BioAssay, or the compound might co-occur in a paper abstract with a breast cancer-related gene.
An easy-to-use, readily interpretable algorithm and tool that can assist scientists in navigating a complex scientific and informational landscape. In particular, Badapple is designed for rapid detection of promiscuity patterns in HTS data using public bioassay evidence. Badapple can also be trained with additional data to detect novel patterns based on an entirely different chemical library. Compound promiscuity is generally undesirable but must be understood in light of polypharmacology and systems chemical biology. Badapple scores can indicate patterns of either true or artefactual promiscuity, either of which can help steer an experimental research project away from “false trails”.
Represents the diversity of chemical libraries while simultaneously considering multiple molecular representations. Consensus diversity plots (CDPs) help the user compare and classify data sets using diversity metrics. They highlight what differs between two libraries and determine which one is more diverse, without requiring the sets to be the same size. The user can customize the metrics used in the plot to suit their objectives.
Compares alternative descriptor subsets. VIDEAN is an interactive visual analytics application that combines statistical methods with interactive visualizations to help choose a set of descriptors for predicting a target property. Coordinated visual representations capture different relationships and interactions among descriptors, target properties and candidate descriptor subsets. Domain expertise can also be incorporated into the feature selection process through interactive visual exploration of the data.
Computes and studies atomic charges which respond to changes in molecular conformation and chemical environment. ACC is based on the electronegativity equalization method (EEM). It implements interactive 3D visualization of the molecules based on atomic charges. This tool is useful for statistical analysis and comparison of the results. It can handle any type of molecular system, regardless of size and chemical complexity.
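In the electronegativity equalization method, each atom's effective electronegativity is χ_i = A_i + B_i q_i + κ Σ_{j≠i} q_j / R_ij, and the charges are found by requiring every χ_i to equal a common molecular electronegativity χ̄ while the charges sum to the total molecular charge Q. That gives n + 1 linear equations in q_1…q_n and χ̄. The sketch below solves this system with plain Gaussian elimination; the A, B and κ values and the geometry are hypothetical placeholders, not ACC's fitted parameter sets.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch (assumption): textbook EEM with made-up parameters, not ACC's parameter sets.

def solve_linear(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve m @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def eem_charges(coords: Sequence[Tuple[float, float, float]], A: Sequence[float],
                B: Sequence[float], kappa: float, total_charge: float = 0.0) -> Tuple[List[float], float]:
    """Return (atomic charges q, equalized electronegativity chi_bar)."""
    n = len(coords)
    m = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        # Row i: B_i q_i + kappa * sum_{j != i} q_j / R_ij - chi_bar = -A_i
        for j in range(n):
            m[i][j] = B[i] if i == j else kappa / math.dist(coords[i], coords[j])
        m[i][n] = -1.0
        rhs[i] = -A[i]
    m[n] = [1.0] * n + [0.0]  # last row: charges sum to the total molecular charge
    rhs[n] = total_charge
    sol = solve_linear(m, rhs)
    return sol[:n], sol[n]

# Hypothetical diatomic: atom 0 is less electronegative (smaller A) than atom 1
q, chi_bar = eem_charges([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], A=[2.0, 3.0], B=[4.0, 4.0], kappa=0.5)
```

For this toy molecule, atom 0 ends up with a charge of +1/7 and atom 1 with -1/7, with χ̄ = 2.5: electron density shifts toward the more electronegative atom.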
A Java application that aids handling and storage of information about small-molecule compounds. With cApp, the user can appraise compounds with respect to their physico-chemical properties and present structural information together with calculated or measured properties. Structures can be provided by the user in the form of SMILES, InChI, structure-data files (SDF) or added via the embedded chemical editor.
Enumerates hypothetical modular natural product structures. LEMONS is an extensible method that modifies their monomer composition or tailoring reactions and compares the original and modified structures using 2D molecular fingerprints. Because the original and modified scaffolds originate from the same in silico assembly line, a true match between them can be defined, from which the proportion of correct matches between original and modified structures is derived for each fingerprint. It can be useful in chemical space exploration and microbial genome mining.
Assists users in predicting the chemical structures of individual components in a mixture. CSCCP includes two components: (1) an iterative dynamic programming algorithm and (2) a database compiled from a large collection of natural products. It combines scaffolds drawn from the scaffold relationship database to formulate chemical structures that match target molecular weights.
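The exact dynamic programming recurrence used by CSCCP is not described here. As an assumption, the sketch below treats the combinatorial core as a 0/1 subset-sum over integer nominal masses and enumerates every set of distinct scaffolds whose masses add up to a target molecular weight. Scaffold names and masses are hypothetical, and a real structure assembler would also need to account for atoms lost when fragments are joined.

```python
from typing import Dict, List, Tuple

# Illustrative sketch (assumption): 0/1 subset-sum DP, not CSCCP's published algorithm.

def scaffold_combinations(masses: Dict[str, int], target: int) -> List[Tuple[str, ...]]:
    """All combinations of distinct scaffolds whose nominal masses sum exactly to target."""
    reachable: Dict[int, List[Tuple[str, ...]]] = {0: [()]}  # total mass -> combinations
    for name, mass in masses.items():
        # Snapshot keys and lists so each scaffold is used at most once per combination
        snapshot = [(w, list(combos)) for w, combos in reachable.items()]
        for w, combos in snapshot:
            if w + mass <= target:
                reachable.setdefault(w + mass, []).extend(c + (name,) for c in combos)
    return reachable.get(target, [])

# Hypothetical nominal masses (Da)
masses = {"scaffold_A": 78, "scaffold_B": 79, "scaffold_C": 85, "scaffold_D": 68}
hits = scaffold_combinations(masses, target=146)
```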
Generates bioactive conformers of drug-like molecules. ConfGen is built on the infrastructure of the general molecular modeling program MacroModel. It provides access to multiple all-atom force fields, redundant conformer elimination and multiprocessor computing. The tool offers several ways to limit the number of ring system conformations sampled: an upper limit, a maximum number of lowest-energy ring conformations per ring system, and a maximum overall number of ring conformations.
Enables the rapid calculation of a large and diverse set of descriptors encoding 2D chemical structure information. Mold2 easily and quickly calculates molecular descriptors with no missing values, which are a common problem with most existing commercial systems. The descriptors calculated by this tool were compared with those from commercial software packages using information entropy analysis, analysis of correlations between descriptors, and Decision Forest classification on several reported data sets.
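Information entropy analysis measures how much information a descriptor carries across a compound set. One common approach, assumed here, bins each descriptor's values and computes the Shannon entropy of the bin distribution: a descriptor that takes the same value for every compound carries 0 bits, while one spread evenly over many bins carries more. The bin count and values below are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

# Illustrative sketch (assumption): equal-width binning entropy; Mold2's exact analysis may differ.

def descriptor_entropy(values: Sequence[float], n_bins: int = 10) -> float:
    """Shannon entropy (bits) of a descriptor's value distribution after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / n_bins
    # min(...) places the maximum value in the last bin instead of an extra one
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

constant = descriptor_entropy([1.0, 1.0, 1.0, 1.0])
spread = descriptor_entropy([0.0, 1.0, 2.0, 3.0], n_bins=4)
```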
Facilitates cheminformatics analyses. Researchers can use Chembench for curation, visualization, analysis and modelling of chemogenomics data. It is useful in chemical genomics, drug discovery, computational toxicology and other fields. The tool contains robust model builders as well as property and activity predictors. It offers virtual libraries of available chemicals with predicted biological and drug-like properties, and tools specialised for chemical library design.
Assists in classifying 5-HT1A ligand functionality. LiCABEDS is a generic ligand classification algorithm developed for the prediction of categorical ligand properties. It minimizes training error by iteratively adding more “learners” to the classifier ensemble. The implementation includes automated model training, cross-validation and prediction.
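Iteratively adding learners to an ensemble to drive down training error is the idea behind boosting. As an assumption, the sketch below uses discrete AdaBoost with one-bit decision stumps on binary fingerprints; it illustrates the technique and is not LiCABEDS' code. The fingerprints and the agonist/antagonist labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Illustrative sketch (assumption): discrete AdaBoost with decision stumps, not LiCABEDS' code.
Stump = Tuple[int, int, float]  # (feature index, polarity, alpha)

def train_adaboost(X: Sequence[Sequence[int]], y: Sequence[int], rounds: int) -> List[Stump]:
    """Labels y are +1/-1 and features are 0/1 fingerprint bits."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble: List[Stump] = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            for polarity in (1, -1):
                # The stump predicts `polarity` when bit f is set, `-polarity` otherwise
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((f, polarity, alpha))
        # Re-weight: misclassified samples gain weight for the next learner
        w = [wi * math.exp(-alpha * yi * (polarity if xi[f] else -polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble: List[Stump], x: Sequence[int]) -> int:
    score = sum(alpha * (polarity if x[f] else -polarity) for f, polarity, alpha in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 3-bit fingerprints; bit 0 separates agonists (+1) from antagonists (-1)
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
y = [1, 1, -1, -1, 1, -1]
model = train_adaboost(X, y, rounds=3)
```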
Computes alignment-free molecular descriptors. xMap is based on MaP, a three-dimensional (3D) descriptor tool. The algorithm handles the fourth dimension (4D) by using an ensemble of conformers generated by conformational searches. It works through a five-step procedure, and the most important descriptor variables are determined with chemometric regression tools. It can also display the derived quantitative structure-activity relationships.
Provides an atom pair contribution (APC) model. The APC method can predict the formation enthalpies of organic molecules in the gas phase via its APC additivity scheme. The algorithm is based on increments associated with pairs of bonded and geminal atoms, along with 15 structural corrections. It also comes with a large body of experimental and theoretical data, compiled for this work, to validate the model.
Evaluates synthetic complexity based directly on published reaction data. SCScore quantifies molecular complexity by relating implicit knowledge of compounds' starting materials to the expected number of reaction steps required to make a target molecule. The software rates the synthetic complexity of a molecule on a scale from 1 to 5. It can assist in assessing virtual screening pipelines and de novo molecular design.
Builds, validates and deploys quantitative structure-activity relationship (QSAR) models. AutoQSAR is based on machine learning methods and encodes current QSAR best practices in an automated workflow. It constructs predictive models for both continuous and categorical data. The tool can facilitate automatic updating of QSAR models as more endpoint data become available. It offers a way to explore descriptor and fitting methodology space.
Calculates molecular descriptors and fingerprints. PaDEL-descriptor was developed in Java and consists of a library and an interface component. It currently calculates 797 descriptors (663 1D and 2D descriptors, and 134 3D descriptors) and 10 types of fingerprints. These descriptors and fingerprints are calculated mainly using the Chemistry Development Kit. Additional descriptors and fingerprints were added, including atom type electrotopological state descriptors, McGowan volume, molecular linear free energy relation descriptors, ring counts, counts of chemical substructures identified by Laggner, and binary fingerprints and counts of chemical substructures identified by Klekota and Roth. PaDEL-descriptor is free and open source, has both a graphical user interface (GUI) and a command-line interface, works on all major platforms (Windows, Linux, macOS), supports more than 90 different molecular file formats, and is multithreaded.
A publicly available application for quantitative structure-activity relationship (QSAR) analysis. DemQSAR integrates the open-source CDK library to compute various molecular descriptors and fingerprints, so QSAR models can be built without any additional software. Only two-dimensional structures of the molecules have to be provided. CDK features can easily be complemented by external feature sets computed by other software. DemQSAR incorporates two state-of-the-art feature selection strategies: embedded Lasso and recursive feature elimination. A standalone DemQSAR Java application for building models of any user-defined property, as well as a web interface for predicting human volume of distribution (VDss) and human clearance (CL), are available on the DemPRED webpage.
Predicts whether a compound is a pan- or specific inhibitor of the Bcl-2 and Bcl-xL targets. APPLE is a web server that classifies the input compound according to its core and then predicts its IC50 using the model generated for that family. The software can also identify the specificity of compounds toward the antiapoptotic proteins Bcl-2 and Bcl-xL, and finally reports the activity and specificity of a given compound. It can be useful for identifying novel specific inhibitors of Bcl-2 family pro-survival proteins.
An integrated drug discovery software suite. MOE can track design ideas and ligand modifications with property models, produce correlation plots to visualize structure–property–activity relationships, and visualize hydrophobic and charged protein surfaces to study aggregation-prone regions. It can also automatically align and superpose antibody structures using the MOE Project protocol, generate and search advanced antibody queries with the Project Search application, and build full-length Ig structures, including bispecifics, with the Antibody Homology Modeler.
Investigates ensembles of protein structures for ensemble-based drug discovery. Polyphony parses PDB and sequence alignment files and can calculate and compare properties of the protein structures. This tool is useful for manipulating structural alignments of proteins. It aims to identify fundamental mechanisms in the makeup of protein machines.
Assists in the statistical analysis and visualization of combinatorial gene-gene and gene-drug interaction screens. TOPS permits users to plot, filter, import and analyze data from double perturbation screens. It incorporates statistical models designed for the analysis of pairwise interactions among larger gene/drug sets. The tool can analyze any type of data, as long as the data can be reduced to a “perturbation A”, “perturbation B”, “score” format.
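To make the three-column format concrete, the sketch below reads (perturbation A, perturbation B, score) triples and computes a simple multiplicative-model interaction score: the expected double-perturbation score is the product of the two single-perturbation scores, and the deviation from that expectation flags synergy or buffering. This model is an illustrative assumption and not necessarily the statistics TOPS uses. The `ctrl` label for an unperturbed partner and the viability-like scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative sketch (assumption): multiplicative interaction model, not necessarily TOPS' statistics.
Triple = Tuple[str, str, float]

def interaction_scores(rows: List[Triple], control: str = "ctrl") -> Dict[Tuple[str, str], float]:
    """Observed double-perturbation score minus the product of the two single scores."""
    single: Dict[str, float] = {}
    double: Dict[Tuple[str, str], float] = {}
    for pert_a, pert_b, score in rows:
        if pert_b == control:
            single[pert_a] = score
        elif pert_a == control:
            single[pert_b] = score
        else:
            double[(pert_a, pert_b)] = score
    return {(a, b): obs - single[a] * single[b] for (a, b), obs in double.items()}

# Hypothetical viability scores (1.0 = unperturbed growth)
rows = [("geneX", "ctrl", 0.8), ("ctrl", "drugY", 0.5), ("geneX", "drugY", 0.1)]
scores = interaction_scores(rows)
```

The negative score for ("geneX", "drugY") means the combination reduces viability more than the multiplicative expectation of 0.8 × 0.5 = 0.4 predicts.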
Allows users to perform electrophysiology simulations. ActionPotential is an open-source portal, also available as standalone software, intended to evaluate the performance of different models and to define suitable contexts of use. It has two main functions: (i) it gathers data from a cardiac ion channel screening panel and defines expectations of the likely total effect in multiple situations; and (ii) it determines the QT liability of compounds and can help design new experiments.
Helps learn about the fraction of patients who benefit from a new treatment using randomized trial data. The work behind fraction-who-benefit (i) proves that the plug-in estimator of the bounds can be inconsistent if support restrictions are made on the joint distribution of the potential outcomes; (ii) develops the first consistent estimator for this case; and (iii) applies this estimator to a randomized trial of a medical treatment to determine whether the estimates can be informative. The estimator is computed using linear programming, allowing fast implementation.
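Why only bounds? For a binary outcome (1 = good), a randomized trial identifies the two marginal success rates p1 = P(Y(1) = 1) and p0 = P(Y(0) = 1) but not their joint distribution, so the fraction who benefit, θ = P(Y(1) = 1, Y(0) = 0), is only partially identified. Without support restrictions, the sharp Fréchet–Hoeffding bounds are max(0, p1 − p0) ≤ θ ≤ min(p1, 1 − p0). The sketch below computes plug-in versions of these bounds from hypothetical arm-level counts. It covers only this simple unrestricted case, not the support-restricted setting or the linear-programming estimator the tool implements.

```python
from typing import Tuple

# Illustrative sketch (assumption): unrestricted binary-outcome bounds, not the tool's LP estimator.
def benefit_bounds(success_treated: int, n_treated: int,
                   success_control: int, n_control: int) -> Tuple[float, float]:
    """Plug-in Frechet-Hoeffding bounds on P(Y(1)=1, Y(0)=0)."""
    p1 = success_treated / n_treated  # success rate in the treatment arm
    p0 = success_control / n_control  # success rate in the control arm
    return max(0.0, p1 - p0), min(p1, 1.0 - p0)

# Hypothetical trial: 60/100 successes on treatment, 40/100 on control
lower, upper = benefit_bounds(60, 100, 40, 100)
```

In this example, the data are consistent with anywhere from 20% to 60% of patients benefiting.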
Provides an interactive semantic web explorer. PharmaTrek is designed for researchers in multitarget pharmacology to address complex queries in a simple and intuitive manner. With this tool, users can submit complex multitarget queries to the single largest open pharmacology space available to date. Results can be visualized in a unique interactive manner that supports informed decisions on the original multitarget queries.
Predicts apicoplast inhibitors against the Plasmodium parasite. CAPi classifies compounds as inhibitors or non-inhibitors of the Plasmodium apicoplast. Five species of Plasmodium can infect and be spread by humans. Most deaths are caused by P. falciparum, because P. vivax, P. ovale and P. malariae generally cause a milder form of malaria. Malaria parasites contain apicoplasts, organelles usually found in plants, complete with their own genomes.
Allows users to mine data for quantitative structure–activity relationship and quantitative structure–property relationship (QSAR/QSPR) studies. Autoweka is standalone software based on support vector machines and artificial neural networks, intended to facilitate the development of predictive QSAR/QSPR data mining models. Users can also download additional scripts to generate plots and graphs.
Provides a method for factorization of heterogeneous data. Macau is a Bayesian factorization application that can handle tensors, multiple relations, and entity and relation features. It scales to millions of entity instances, hundreds of millions of observations, and sparse entity features with millions of dimensions. It also learns multiple tasks (such as predictions for multiple drugs) simultaneously and uncovers common features that can benefit each individual learning task.
Predicts the permeability of chemical compounds in the human colorectal carcinoma cell line Caco-2. The algorithm provides an in silico model based on a deep neural network (DNN), built on 663 structurally diverse chemical compounds with high-level features extracted from 209 molecular descriptors. The model was optimized using stochastic gradient descent (SGD) with momentum and a minibatch size of 100.
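The optimizer's update rule can be shown on a much smaller model. The sketch below fits a one-feature linear regression on synthetic data with minibatch SGD and classical momentum, v ← μv − η∇L and w ← w + v, over shuffled minibatches of 100. The learning rate, momentum coefficient and data are hypothetical, and this is not the published DNN or its training code.

```python
import random
from typing import List, Tuple

# Illustrative sketch (assumption): SGD with momentum on a toy linear model, not the published DNN.
def sgd_momentum(data: List[Tuple[float, float]], lr: float = 0.05, mu: float = 0.9,
                 batch_size: int = 100, epochs: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minibatch SGD with classical momentum on mean squared error."""
    rng = random.Random(seed)
    w = b = 0.0
    vw = vb = 0.0  # velocity terms
    data = data[:]
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients of mean squared error over the minibatch
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            vw = mu * vw - lr * gw
            vb = mu * vb - lr * gb
            w += vw
            b += vb
    return w, b

# Synthetic data from y = 3x + 1 plus small Gaussian noise
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(1000)]
points = [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in xs]
w, b = sgd_momentum(points)
```

The velocity term accumulates past gradients, which speeds progress along consistent directions and damps oscillation from noisy minibatch estimates.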