Computational protocol: The Interplay between QSAR/QSPR Studies and Partial Order Ranking and Formal Concept Analyses


Protocol publication

[…] In its simplest form, QSAR/QSPR modeling can be expressed as the development of correlations between a given physico-chemical property or biological activity (endpoint), $P$, and a set of parameters (descriptors), $D_i$, that are inherent characteristics of the compounds under investigation:

$$P = f(D_i) \tag{1}$$

The properties (endpoints) $P$ that have been subjected to QSAR/QSPR modeling comprise physico-chemical properties as well as biological activities, both in the environment and in human beings.

In general, models that describe or calculate key properties of chemical compounds take into account three types of inherent molecular characteristics: structural, electronic and hydrophobic. Depending on the actual model, few or many of these descriptors may be taken into account. Thus, Eq. (1) can be rewritten as

$$P = f(D_{\text{structural}}, D_{\text{electronic}}, D_{\text{hydrophobic}}, D_x) + e \tag{2}$$

The descriptors reflecting structural characteristics may, e.g., be elements of the actual composition and the 3-dimensional configuration of the molecule, whereas descriptors reflecting electronic characteristics may, e.g., be HOMO/LUMO energies, charge densities, dipole moments, etc. The descriptors reflecting hydrophobic characteristics are related to the distribution of the compound between a hydrophobic (biological) phase and an aqueous phase. A fourth type of characteristics, $D_x$ in Eq. (2), accounts for possible underlying characteristics, known or unknown, such as environmental or experimental parameters, e.g., temperature, salt content, etc. The data are often associated with a certain amount of systematic and non-quantifiable variability in combination with uncertainties. These unknown variations are expressed as “noise”. Thus, the parameter $e$ accounts for possible noise in the system, i.e., the variation in the property that cannot be explained by the model.

In the studies presented in the present review paper, a series of freely available QSAR/QSPR models has been applied.
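To make Eq. (2) concrete, the sketch below fits the simplest possible choice of $f$: a linear combination of descriptors plus an intercept, estimated by ordinary least squares, with the residuals playing the role of the noise term $e$. This is an illustration under stated assumptions only; the linear form, the function names (`fit_linear_qsar`, `predict`, `solve`) and all descriptor and endpoint values are hypothetical, and the studies discussed here relied on pre-built tools such as the EPI Suite rather than on self-fitted models.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_qsar(descriptors: Sequence[Sequence[float]],
                    endpoint: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] for P = b0 + sum_j b_j * D_j + e (least squares)."""
    x = [[1.0] + list(row) for row in descriptors]  # prepend intercept column
    k = len(x[0])
    # Normal equations: (X^T X) b = X^T P
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xtp = [sum(row[i] * p for row, p in zip(x, endpoint)) for i in range(k)]
    return solve(xtx, xtp)


def predict(coef: Sequence[float], row: Sequence[float]) -> float:
    """Model estimate of P for one compound's descriptor vector."""
    return coef[0] + sum(b * d for b, d in zip(coef[1:], row))


if __name__ == "__main__":
    # Hypothetical compounds: [D_hydrophobic (log Kow), D_electronic (E_LUMO, eV)]
    d = [[1.0, -0.5], [2.0, -0.3], [3.0, -0.8], [4.0, -0.1], [5.0, -0.6]]
    p = [1.2, 1.9, 3.1, 3.6, 4.8]  # hypothetical endpoint values
    coef = fit_linear_qsar(d, p)
    residuals = [y - predict(coef, row) for row, y in zip(d, p)]  # the noise term e
    print("coefficients:", [round(c, 3) for c in coef])
    print("residuals e:", [round(r, 3) for r in residuals])
```

Because an intercept is included, the least-squares residuals sum to zero; any systematic structure left in them would point to descriptors (for example a $D_x$ such as temperature) that the model does not capture.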
Thus, physico-chemical data, environmental persistence and environmental toxicities have been obtained using the EPI Suite []. The interaction with the human organism has been elucidated through absorption, distribution, metabolism and excretion (ADME) data derived by ADME Boxes [], and the human toxicological effects by ToxBoxes [] and by PASS (Prediction of Activity Spectra for Substances) []. [...] Toxicities of the investigated substances have been obtained using ECOSAR [], which calculates the toxicity of chemicals discharged into water. ECOSAR calculates both acute (short-term) and chronic (long-term or delayed) toxicities, the calculations being based on the octanol–water partition coefficient (log $K_{OW}$). ECOSAR can run independently or as an integrated part of the EPI Suite. It returns the acute as well as the chronic toxicities of the substance under investigation to fish (both freshwater and saltwater), water fleas (daphnids) and green algae. In some cases, other effects, e.g., toxicity to earthworms, are also returned. The acute toxicities are calculated as LC50 values. [...]

The theory of partial order ranking is presented elsewhere [,]. In brief, partial order ranking is a simple principle, which a priori includes “≤” as the only mathematical relation. If a system is considered that can be described by a series of descriptors $p_i$, a given site A, characterized by the descriptors $p_i(A)$, can be compared to another site B, characterized by the descriptors $p_i(B)$, through a descriptor-by-descriptor comparison. Thus, site A will be ranked higher than site B, i.e., B ≤ A, if at least one descriptor for A is higher than the corresponding descriptor for B and no descriptor for A is lower than the corresponding descriptor for B. If, on the other hand, $p_i(A) > p_i(B)$ for descriptor $i$ and $p_j(A) < p_j(B)$ for descriptor $j$, A and B will be denoted incomparable.
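The pairwise comparison rule just described can be written down in a few lines. The following is a minimal sketch, not the code used in WHasse or any other ranking program; the function name `compare`, its return labels and the example descriptor values are hypothetical.

```python
from typing import Sequence


def compare(a: Sequence[float], b: Sequence[float]) -> str:
    """Compare two objects A and B described by the same descriptors p_i.

    Returns:
      'A>B'          if no p_i(A) is lower and at least one is higher (B <= A)
      'B>A'          for the converse case (A <= B)
      'equal'        if p_i(A) == p_i(B) for all i (A = B)
      'incomparable' if A is higher on some descriptor and lower on another
    """
    if len(a) != len(b):
        raise ValueError("objects must be described by the same number of descriptors")
    a_higher = any(x > y for x, y in zip(a, b))  # some p_i(A) > p_i(B)
    b_higher = any(y > x for x, y in zip(a, b))  # some p_j(A) < p_j(B)
    if a_higher and b_higher:
        return "incomparable"
    if a_higher:
        return "A>B"
    if b_higher:
        return "B>A"
    return "equal"


if __name__ == "__main__":
    # Hypothetical two-descriptor objects
    print(compare([3.0, 2.0], [1.0, 2.0]))
    print(compare([3.0, 1.0], [1.0, 2.0]))
    print(compare([1.0, 2.0], [1.0, 2.0]))
```

The first call ranks A above B (higher on the first descriptor, equal on the second), the second yields incomparability (A is higher on the first descriptor but lower on the second), and the third yields equivalence.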
Obviously, if all descriptors for A are equal to the corresponding descriptors for B, i.e., $p_i(B) = p_i(A)$ for all $i$, the two sites will have identical rank and will be considered equivalent, i.e., A = B. In mathematical terms, the ranking can be expressed as

$$B \le A \iff p_i(B) \le p_i(A) \quad \text{for all } i$$

It further follows that if A ≥ B and B ≥ C, then A ≥ C. If no rank can be established between A and B, these sites are denoted incomparable, i.e., they cannot be assigned a mutual order. Therefore, partial order ranking (POR) is an ideal tool to handle incommensurable attributes.

In partial order ranking, in contrast to standard multidimensional statistical analysis, no assumptions are made about either linearity or distribution properties. In this sense, partial order ranking can be considered a non-parametric method. Furthermore, no descriptor is given preference over the others. However, due to the simple mathematics outlined above, it must be emphasized that the method a priori is rather sensitive to noise, since even minor fluctuations in the descriptor values may lead to incomparability or reversed ordering.

A key requirement is that all descriptors have identical orientation, i.e., “high” and “low” must carry the same meaning for every descriptor. As a consequence, it may be necessary to multiply some descriptors by −1 in order to achieve identical directions. Bioaccumulation and toxicity serve as an example. In the case of bioaccumulation, the higher the value, the more a chemical substance tends to bioaccumulate and thus the more problematic the substance, whereas in the case of toxicity (e.g., LC50), the lower the value, the more toxic the substance. Thus, to secure identical directions of the two descriptors, one of them, e.g., the toxicity values, has to be multiplied by −1. Consequently, for both bioaccumulation and toxicity, higher values will now correspond to more problematic substances. The graphical representation of the partial ordering is often given in a so-called Hasse diagram [–].
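The orientation step and the resulting order relation can be sketched as follows. This is a minimal illustration under assumptions: the substances S1 to S4, their bioaccumulation factors and LC50 values (mg/L) are invented, and the helper names `orient`, `at_least` and `is_transitive` are hypothetical. The LC50 column is multiplied by −1 so that, as in the text, a higher value means a more problematic substance for both descriptors.

```python
from typing import Dict, List, Sequence, Set


def orient(data: Dict[str, List[float]], flip: Set[int]) -> Dict[str, List[float]]:
    """Multiply the descriptors whose index is in `flip` by -1 so that, for every
    descriptor, a higher value means a more problematic substance."""
    return {name: [-v if i in flip else v for i, v in enumerate(vals)]
            for name, vals in data.items()}


def at_least(a: Sequence[float], b: Sequence[float]) -> bool:
    """A >= B: p_i(B) <= p_i(A) for all i."""
    return all(x >= y for x, y in zip(a, b))


def is_transitive(data: Dict[str, List[float]]) -> bool:
    """Check that A >= B and B >= C imply A >= C for every triple of objects."""
    names = list(data)
    return all(at_least(data[a], data[c])
               for a in names for b in names for c in names
               if at_least(data[a], data[b]) and at_least(data[b], data[c]))


if __name__ == "__main__":
    # Hypothetical substances: [bioaccumulation factor, LC50 in mg/L]
    raw = {"S1": [100.0, 5.0], "S2": [50.0, 10.0],
           "S3": [200.0, 1.0], "S4": [300.0, 20.0]}
    oriented = orient(raw, flip={1})  # a low LC50 means high toxicity
    for a in oriented:
        for b in oriented:
            if a != b and at_least(oriented[a], oriented[b]):
                print(f"{b} <= {a}")
    print("transitive:", is_transitive(oriented))
```

With these invented values S3 ranks above S1 and S2, and S1 above S2, whereas S4 (highest bioaccumulation but least toxic) is incomparable with every other substance, which is exactly the situation a single aggregated index would hide.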
In practice, the partial order rankings are performed using the WHasse software []. Alternatives to WHasse are DART (Decision Analysis by Ranking Techniques), which comprises different kinds of order ranking methods, roughly classified as total and partial order ranking methods [], and the PyHasse software currently being developed by R. Brüggemann []. […]
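Programs of this kind draw the Hasse diagram from the cover relations of the partial order: a line connects an upper object to a lower one only if the upper object ranks strictly higher and no third object lies strictly between them. The sketch below is a naive illustration of that idea, not the algorithm implemented in WHasse, DART or PyHasse; the function names and the already oriented data are hypothetical, and, for simplicity, equivalent objects are kept as separate nodes instead of being merged into one.

```python
from typing import Dict, List, Sequence, Tuple


def at_least(a: Sequence[float], b: Sequence[float]) -> bool:
    """A >= B: p_i(B) <= p_i(A) for all i."""
    return all(x >= y for x, y in zip(a, b))


def strictly_above(a: Sequence[float], b: Sequence[float]) -> bool:
    """B < A: A >= B and the two descriptor vectors are not identical."""
    return at_least(a, b) and list(a) != list(b)


def hasse_edges(data: Dict[str, Sequence[float]]) -> List[Tuple[str, str]]:
    """Return (upper, lower) pairs drawn as lines in the Hasse diagram."""
    names = list(data)
    edges = []
    for up in names:
        for lo in names:
            if not strictly_above(data[up], data[lo]):
                continue
            # Skip the pair if some third object lies strictly between them.
            between = any(strictly_above(data[up], data[m]) and
                          strictly_above(data[m], data[lo])
                          for m in names if m not in (up, lo))
            if not between:
                edges.append((up, lo))
    return edges


def levels(data: Dict[str, Sequence[float]]) -> Dict[str, int]:
    """Level 1 = maximal objects; otherwise 1 + the deepest level of any object above."""
    memo: Dict[str, int] = {}

    def level(x: str) -> int:
        if x not in memo:
            above = [y for y in data if strictly_above(data[y], data[x])]
            memo[x] = 1 + max((level(y) for y in above), default=0)
        return memo[x]

    return {x: level(x) for x in data}


if __name__ == "__main__":
    # Hypothetical, already oriented descriptors (higher = more problematic)
    oriented = {"S1": [100.0, -5.0], "S2": [50.0, -10.0], "S3": [200.0, -1.0],
                "S4": [300.0, -20.0], "S5": [40.0, -12.0]}
    print("edges:", hasse_edges(oriented))
    print("levels:", levels(oriented))
```

For these invented data the diagram is the chain S3 → S1 → S2 → S5 plus the isolated object S4 at the top level; the implied relation S3 > S2 is not drawn, because it follows from transitivity.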

Pipeline specifications

Software tools: EPI Suite, PASS, ECOSAR, DART
Application: Drug design
Organisms: Homo sapiens