Introduction

Given the seemingly ever-increasing throughput in both molecular synthesis (high-throughput organic synthesis, HTOS) and biological testing (high-throughput screening, HTS), which allows rapid screening of even the large compound pools of major pharmaceutical organizations, a naive observer might guess that computational methods for assessing molecular similarity/dissimilarity are of decreasing importance in lead finding and optimization.

For several reasons, the opposite is true:

- There are still many interesting biological targets which are not amenable to HTS. For medium and low throughput assays however, compound selection by diversity considerations or by similarity to given lead compounds usually is the method of choice. For targets whose 3D structures are available either from X-ray crystallography, high-resolution nuclear magnetic resonance spectroscopy (NMR) or homology modeling, virtual screening techniques like computational docking or 3D database searches might be other options.

* To whom correspondence should be addressed. E-mail: [email protected]

- Prior to synthesis, a virtual library containing even billions of compounds can be constructed in the computer. In order to design and synthesize 'real' libraries around a given lead structure (lead-optimization libraries) or for use in broad screening (screening libraries), one may select compounds from the virtual library applying either similarity or diversity criteria.

- Similarly, one might consider external compound sources (e.g. vendors' catalogues) in order to fill 'diversity voids' in the corporate database or to purchase compounds similar to a lead structure.

- Since high-throughput compound screening is typically done in a highly automated fashion and at a single concentration, the biological data might be noisy, thus giving rise to a considerable number of both false positives and false negatives. Whereas false positives typically will be detected in secondary assays, false negatives might get lost in the further hit-to-lead process. One way to uncover at least some of those is to rescreen all compounds which have some similarity to a primary HTS hit, no matter whether they have passed the first HTS hit threshold or not.

- Since many promising drug candidates fail because of insufficient ADME (absorption, distribution, metabolism and excretion) properties, there is increasing interest in measuring these properties in an HTS mode or in predicting them from similarities to well-characterized compounds.
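The diversity-based selection mentioned in the list above can be illustrated with a greedy MaxMin picker, one common way to choose a spread-out subset from a larger (virtual or vendor) library. This is only a minimal sketch: the two-dimensional descriptor vectors, the Euclidean distance and the function names `euclidean` and `maxmin_pick` are assumptions made for illustration and are not taken from the methods discussed in this article.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_pick(descriptors, n_pick, seed_index=0):
    """Greedy MaxMin selection (illustrative, hypothetical helper).

    Starting from one seed compound, repeatedly add the compound whose
    distance to its nearest already-picked compound is largest.
    Returns the indices of the picked compounds in pick order.
    """
    n = len(descriptors)
    picked = [seed_index]
    picked_set = {seed_index}
    # Distance from every compound to its nearest picked compound so far.
    min_dist = [euclidean(d, descriptors[seed_index]) for d in descriptors]

    while len(picked) < min(n_pick, n):
        candidates = [i for i in range(n) if i not in picked_set]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        picked_set.add(nxt)
        # Update nearest-picked distances with the newly added compound.
        for i, d in enumerate(descriptors):
            min_dist[i] = min(min_dist[i], euclidean(d, descriptors[nxt]))
    return picked


# Toy library: made-up descriptor vectors forming three loose groups.
library = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 4.0), (5.1, 5.0)]
selected = maxmin_pick(library, 3)
print(selected)
```

On the toy data the picker takes one compound from each group and skips the near-duplicates. For real libraries, both the descriptors and the distance measure would have to be chosen to suit the property of interest.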

All these applications of molecular (dis-)similarity rely solely on the similar property principle [1], which states that there has to be a strong correlation between the molecular descriptors used to define similarity and the properties one would like to predict (e.g. binding affinity, physicochemical or pharmacokinetic properties).

In recent years, many researchers have investigated the predictive power of various molecular similarity descriptors [2-5]. Most of these analyses came to the conclusion that topological 2D descriptors (e.g. path fingerprints from DAYLIGHT [6] or MDL's MOLSKEYS [7]) are useful in this respect and that they are to some extent superior to 3D descriptors which take into account the three-dimensional features of molecules.
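For binary 2D fingerprints of the kind mentioned above, similarity between two molecules is commonly quantified with the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number of bits set in either. The sketch below ranks a small library against a query on that basis. The fingerprints are made-up sets of on-bit indices, not output of the DAYLIGHT or MDL software, and the names `tanimoto` and `rank_by_similarity` as well as the threshold value are assumptions for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen here: two empty fingerprints count as dissimilar.
        return 0.0
    return len(fp_a & fp_b) / union


def rank_by_similarity(query_fp, library, threshold=0.0):
    """Return (name, score) pairs with score >= threshold, most similar first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of the bits that are on.
query = {1, 4, 7, 9}
library = {
    "cpd_A": {1, 4, 7, 9, 12},  # shares 4 of 5 distinct bits with the query
    "cpd_B": {2, 3, 5},         # shares no bits with the query
    "cpd_C": {1, 4, 8},         # shares 2 of 5 distinct bits with the query
}
ranked = rank_by_similarity(query, library, threshold=0.3)
print(ranked)
```

The same ranking step could serve the rescreening scenario described earlier, where compounds similar to a primary HTS hit are retested regardless of their first-pass result. A suitable threshold depends on the fingerprint type and the data set.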

One possible explanation for these surprising results can be found in the nature of the data sets used in the investigations mentioned above: typically, in a medicinal chemistry program, many compounds with the same scaffold are synthesized around a given lead structure, and towards the end of such a program only minor modifications are made to the most active compounds. This is an ideal situation for 2D similarity measures: many active compounds share most of their 2D features. The 3D methods, however, can sometimes be very sensitive to these small changes. This would explain why such compound sets are misclassified by 3D measures.

Although for many applications clustering of compounds of the same structural class is desired, often one would like to cross the boundaries of structural scaffolds in order to find 'surprising' similarities of compounds sharing the same biological or physicochemical properties. This is particularly interesting for instance when a patent-protected competitor's compound is used as the search query or when unwanted side effects can be associated with a structural class.

In order to achieve such an 'island-hopping' situation, the respective molecular descriptors have to be derived at a higher level of abstraction than the 2D structural keys. One of the most promising ideas in this respect, which was first described in the early 1990s, is the use of so-called 'affinity fingerprints'. In the following paragraphs, we will give a brief review of this field. In addition, we will describe a recently developed extension, called Flexsim-S. Finally, we will compare results obtained by our virtual affinity fingerprint methods to those obtained by employing some popular 2D descriptors.
