nitrogen would be 5 valence electrons minus 2 hydrogens, or 3. Various additional modifications are applied to further differentiate atoms and define their environments within the molecule.
Excellent regression equations using topological indices have been obtained. A problem is interpreting what they mean. Is it lipophilicity, steric bulk, or electronic terms that define activity? The topological indices can be correlated with all of these common physicochemical descriptors. Another problem is that it is difficult to use the equation to decide what molecular modifications can be made to enhance activity further, again because of ambiguities in physicochemical interpretation. Should the medicinal chemist increase or decrease lipophilicity at a particular location on the molecule? Should specific substituents be increased or decreased? On the other hand, topological indices can be very valuable in classification schemes that are described later in this chapter. They do describe the structure in terms of rings, branching, flexibility, etc.
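To make the idea concrete, here is a minimal sketch of one of the simplest topological indices, the Randić branching (connectivity) index, computed over a hydrogen-suppressed carbon skeleton. The graph encoding and molecules are illustrative only; production software uses far more elaborate index families.

```python
def randic_index(adjacency):
    """Randic branching index: sum of 1/sqrt(deg(i) * deg(j))
    over every bond (i, j) of the hydrogen-suppressed graph."""
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    total = 0.0
    for atom, nbrs in adjacency.items():
        for nbr in nbrs:
            if atom < nbr:  # count each bond only once
                total += (degree[atom] * degree[nbr]) ** -0.5
    return total

# n-Butane: a linear chain C1-C2-C3-C4
butane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Isobutane: central carbon C2 bonded to three methyl carbons
isobutane = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(round(randic_index(butane), 3))     # 1.914
print(round(randic_index(isobutane), 3))  # 1.732
```

The index captures branching (the more branched isomer scores lower), yet nothing in the number itself says whether the underlying effect on activity is steric, lipophilic, or electronic, which is exactly the interpretation problem described above.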
Besides regression analysis, other statistical techniques are used in drug design. These fall under the classification of multivariate statistics and include discriminant analysis, principal component analysis, and pattern recognition. The latter can consist of a mixture of statistical and nonstatistical methodologies. The goal usually is to ascertain which physicochemical parameters and structural attributes contribute to a class or type of biological activity. The chemicals are then classified into groupings such as carcinogenic/noncarcinogenic, sweet/bitter, active/inactive, and depressant/stimulant.
The term multivariate is used because of the wide variety and number of independent or descriptor variables that may be used. The same physicochemical parameters seen in QSAR analyses are used, but in addition, the software breaks the molecule down into substructures. These structural fragments also become variables. Typical substructures include carbonyls, enones, conjugation, rings of different sizes and types, N-substitution patterns, and aliphatic substitution patterns such as 1,3- or 1,2-disubstitution. The end result is that even a moderate-size molecule typical of most drugs can generate 50 to 100 variables.
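A crude sketch of how substructure fragments become descriptor variables is to count pattern occurrences in a SMILES string. Real programs use proper subgraph matching; the substring patterns below are a deliberately naive stand-in, applied here to aspirin.

```python
def fragment_descriptors(smiles, patterns):
    """Return a fragment-count vector: occurrences of each
    substring pattern in the SMILES string."""
    return {name: smiles.count(pat) for name, pat in patterns.items()}

# Naive substring patterns standing in for true substructure queries
patterns = {"carbonyl": "C(=O)", "benzene_ring": "c1ccccc1", "nitrogen": "N"}

aspirin = "CC(=O)Oc1ccccc1C(=O)O"  # acetylsalicylic acid
print(fragment_descriptors(aspirin, patterns))
# {'carbonyl': 2, 'benzene_ring': 1, 'nitrogen': 0}
```

Each fragment count becomes one more column in the descriptor table, which is how even a modest molecule quickly accumulates dozens of variables.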
The technique is to develop a large set of chemicals well characterized in terms of the biological activity that is going to be predicted. This is known as the training set. Ideally, it should contain hundreds, if not thousands, of compounds, divided into active and inactive types. In reality, sets smaller than 100 are studied. Most of these investigations are retrospective ones in which the investigator locates large data sets from several sources. This means that the biological testing likely followed different protocols. That is why classification techniques tend to avoid using continuous variables such as ED50, LD50, and MIC. Instead, arbitrary end points such as active or inactive, stimulant or depressant, sweet or sour, are used.
Once the training set is established, the multivariate technique is carried out. The algorithms are designed to group the underlying commonalities and select the variables that have the greatest influence on biological activity. The predictive ability is then tested with a test set of compounds that have been put through the same biological tests used for the training set. For the classification model to be valid, the investigator must select data sets whose results are not intuitively obvious and could not be classified by a trained medicinal chemist. Properly done, classification methods can identify structural and physicochemical descriptors that can be powerful predictors and determinants of biological activity.
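The train-then-test workflow can be sketched with a toy one-nearest-neighbor classifier. The two-component descriptor vectors and activity labels below are invented for illustration; any real study would use the 50 to 100 variables described above.

```python
import math

def classify(unknown, training_set):
    """Label an unknown compound with the class of its nearest
    training compound (Euclidean distance in descriptor space)."""
    nearest = min(training_set,
                  key=lambda c: math.dist(unknown, c["descriptors"]))
    return nearest["label"]

# Training set: invented (lipophilicity-like, fragment-count-like) descriptors
training = [
    {"descriptors": (2.1, 3), "label": "active"},
    {"descriptors": (1.8, 4), "label": "active"},
    {"descriptors": (0.2, 1), "label": "inactive"},
    {"descriptors": (0.5, 0), "label": "inactive"},
]

# Held-out test set with known outcomes, as described above
test_set = [((2.0, 3), "active"), ((0.3, 1), "inactive")]
correct = sum(classify(x, training) == y for x, y in test_set)
print(f"test-set accuracy: {correct / len(test_set):.0%}")  # 100%
```

Note that the end points are categorical (active/inactive) rather than continuous, matching the arbitrary end points classification studies favor.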
There are several examples of successful applications of this technique.8 One study consisted of a diverse group of 140 tranquilizers and 79 sedatives subjected to a two-way classification study (tranquilizers vs. sedatives). The ring types included phenothiazines, indoles, benzodiazepines, barbiturates, diphenylmethanes, and various heterocyclics. Sixty-nine descriptors were used initially to characterize the molecules. Eleven of these descriptors were crucial to the classification, 54 had intermediate use and depended on the composition of the training set, and 4 were of little use. The overall range of prediction accuracy was 88% to 92%. The results with the 54 descriptors indicate an important limitation when large numbers of descriptors are used. The inclusion or exclusion of descriptors and parameters can depend on the composition of the training set. The training set must be representative of the population of chemicals that are going to be evaluated. Repeating the study on different randomly selected training sets is important.
Classification techniques lend themselves to studies lacking quantitative data. An interesting classification problem involved olfactory stimulants, in which the goal was to select chemicals that had a musk odor. A group of 300 unique compounds was selected from a group of odorants that included 60 musk odorants plus 49 camphor, 44 floral, 32 ethereal, 41 mint, 51 pungent, and 23 putrid odorants. Initially, 68 descriptors were evaluated. Depending on the approach, the number of descriptors was reduced to 11 to 16, consisting mostly of bond types. Using this small number, the 60 musk odorants could be selected from the remaining 240 compounds, with an accuracy of 95% to 97%.
The use of classification techniques in medicinal chemistry has matured over years of general use. The types of descriptors have expanded to spatial measurements in 3D space similar to those used in 3D-QSAR (see discussion that follows). Increasingly, databases of existing compounds are scanned for molecules that possess what appear to be the desired parameters. If the scan is successful, compounds that are predicted to be active provide the starting point for synthesizing new compounds for testing. One can see parallels between searching chemical databases and screening plant, animal, and microbial sources for new compounds. Although the statistical and pattern recognition methodologies have been in use for a very long time, considerable research into their proper use and further testing of their predictive power are still needed. The goal of scanning databases of already synthesized compounds to select compounds for pharmacological evaluation will require considerable additional development of the various multivariate techniques.
This chapter is limited to fairly simple computational techniques using readily available, low-cost software. The QSAR approach, including classification methods, has at its disposal literally thousands of different descriptors, each with its advocates, and many computational approaches, ranging from the previously discussed linear regression to neural networks, decision trees, and support vector machines. Thus, it is fair to ask if drug discoveries have been made with these computational techniques. The answer is in the ambiguous yes-and-no category, depending on who is asked. There is general agreement
that QSAR provided the foundation to better understand the relationship between chemical space and pharmacological space. Consider these two pairs of active molecules. For classification purposes, acetylcholine and nicotine are nicotinic agonists, and dopamine and pergolide are dopaminergic agonists. Yet by the various measures of similarity and their descriptors, fingerprints, and fragments, the members of each pair come up as dissimilar. It is true that the pharmacological profiles of acetylcholine-nicotine and dopamine-pergolide vary so much that nicotine is not used as a nicotinic agonist and pergolide is falling into disuse.9,10 Without being trite, computational drug design techniques are not going to replace the medicinal chemist who has an open, inquiring mind.
Has QSAR Been Successful?
The answer to this question depends on what the expectations are.11 In their original development, it was hoped that QSAR equations would lead to commercially successful drugs. This has not occurred. Over the years, methodologies to develop these equations have changed. In the early development of QSAR equations, all of the compounds in a data set were used, followed by randomly holding out compounds to see if the equation changed. If there are enough compounds, it is now more common to begin with a training set and evaluate its validity with a test set. This has led to recommendations on the most reliable statistical measurements of validity.12-14
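One of the simpler validity statistics computed on a held-out test set is the coefficient of determination between observed and predicted activities. The sketch below uses invented pIC50-style values; the recommendations cited above involve this and several related q2-type measures.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

observed  = [5.2, 6.1, 4.8, 7.0, 5.9]  # invented test-set activities
predicted = [5.0, 6.3, 5.1, 6.8, 5.7]  # from a hypothetical QSAR equation
print(round(r_squared(observed, predicted), 3))  # 0.914
```

A value near 1 on the test set, not the training set, is what supports claims of predictive validity.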
Paralleling the evolution in statistical measures of validity, deriving these equations has led to the development of a wide variety of descriptors, ranging from descriptive and physicochemical to topological. What can be frustrating is that the quality of predictions depends on the descriptor set.15 It must be remembered that most descriptors are only as good as the algorithms used to calculate them. Further, it can be difficult to interpret exactly what the descriptors are measuring in chemical space. QSAR equations must explain physical reality if predictions for future compounds are to be made.16
COMPUTER-AIDED DRUG DESIGN: NEWER METHODS
Because powerful computers, high-resolution computer graphics, and applicable software have reached the desktop, computational drug design methods are widely used in both industrial and academic environments. Through the use of computer graphics, structures of organic molecules can be entered into a computer and manipulated in many ways. Computational chemistry methods are used to calculate molecular properties and generate pharmacophore hypotheses. Once a pharmacophore hypothesis has been developed, structural databases (commercial, corporate, and/or public) of 3D structures can be searched rapidly for hits (i.e., existing compounds that are available with the required functional groups and permissible spatial orientations as defined by the search query). It has become popular to carry out in silico (computer as opposed to biological) screening of candidate drug-receptor interactions, known as virtual high-throughput screening (vHTS), for future development. The realistic goal of vHTS is to identify potential lead compounds. The drug-receptor fit and predicted physicochemical properties are used to score and rank compounds according to penalty functions and information filters (molecular weight, number of hydrogen bonds, hydrophobicity, etc.). Although medicinal chemists have always been aware of absorption, distribution, metabolism, elimination, and toxicity (ADMET or ADME/Tox), in recent years, a much more focused approach addresses these issues in the early design stages. Increased efforts to develop computer-based absorption, distribution, metabolism, and elimination (ADME) models are being pursued aggressively. Many of the predictive ADME models use QSAR methods described earlier in this chapter. In general, understanding which chemical space descriptors are critical for druglike molecules helps provide insight into the design of chemical libraries for biological evaluation.
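The information filters mentioned above can be as simple as counting violations of druglikeness limits. The sketch below uses thresholds patterned on Lipinski's widely cited rule of five; the compound records are invented, and real vHTS pipelines combine many such filters with docking-based scores.

```python
def passes_rule_of_five(compound):
    """Accept a compound with at most one rule-of-five violation."""
    violations = sum([
        compound["mol_weight"] > 500,
        compound["logP"] > 5,
        compound["h_donors"] > 5,      # hydrogen-bond donors
        compound["h_acceptors"] > 10,  # hydrogen-bond acceptors
    ])
    return violations <= 1

# Invented library records, as might come from a database scan
library = [
    {"name": "hit_A", "mol_weight": 342.0, "logP": 2.1,
     "h_donors": 2, "h_acceptors": 5},
    {"name": "hit_B", "mol_weight": 712.0, "logP": 6.3,
     "h_donors": 6, "h_acceptors": 12},
]

survivors = [c["name"] for c in library if passes_rule_of_five(c)]
print(survivors)  # ['hit_A']
```

Filtering before docking keeps the expensive scoring steps focused on compounds with a realistic chance of becoming druglike leads.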
Today's computers and software give the medicinal chemist the ability to design a molecule on the basis of an estimated fit onto a receptor, or to give it spatial characteristics similar to those found in the prototypical lead compound. Of course, this assumes that the molecular structure of the receptor is known in enough detail for a reasonable estimation of its 3D shape. When the geometry of the active site is well understood, databases containing the 3D coordinates of their chemicals can be searched rapidly by computer programs that select candidates likely to fit the active site. As shown later, there have been some dramatic successes with this approach, but first one must have an understanding of ligand (drug)-receptor interactions and conformational analysis.
Keep in mind that a biological response is produced by the interaction of a drug with a functional or organized group of molecules. This interaction would be expected to take place by using the same bonding forces as are involved when simple molecules interact. These, together with typical examples, are collected in Table 2.9.
TABLE 2.9 Types of Chemical Bonds
Reinforced ionic
Ionic
Hydrogen
Ion-dipole
Dipole-dipole
van der Waals
Hydrophobic