Superpositioning molecules is a specific problem addressed in the more general field of molecular similarity research. The problem of assessing the similarity of molecules, taking their 3D structure into account, can be stated as follows: Given a set of molecules binding to the same location on a molecular target, (a) determine their conformations and relative orientation as dictated by their common binding mode and (b) find a similarity metric that prioritizes this configuration over all others. Clearly, this problem has several aspects that may be considered to a greater or lesser extent.

First, in general, several molecules have to be considered simultaneously. However, many methods only perform pairwise comparisons. This is a problem, since the multiple alignment may provide a plausible match of the structures even though none of the implied pairwise matches does [1,2]. Second, molecules must be considered as flexible objects that can usually adopt millions of different shapes, even if only a limited energy range is permitted. Frequently, conformational models are used that allow conformers to be enumerated. Third, six degrees of freedom have to be considered with regard to the relative orientation of the molecules. Here, a multitude of restrictions have been proposed to cut down the size of the search space. One example is to discretize the problem into a matching problem over a limited number of atoms, groups or chemical features. Another is to assume that the true superposition lies in some local minimum of a similarity function and to perform local searches starting from a set of initial orientations. In addition, the search space is sometimes restricted to a cubic grid. Finally, the similarity metric should ideally be a smooth function that has its global optimum at the ideal configuration described above. However, even in docking studies, where the target structure is given, no single energy function exists that meets these standards. Therefore, similarity measures are usually used that approximate the desired behavior, i.e., that arrive at a local optimum in a configuration close to the desired geometry, at least for a test set of examples. 2D methods neglect the geometric issue entirely and simply aim at a measure that favors molecules with similar chemical groups in similar locations of the 2D formula.
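To make the six rigid-body degrees of freedom concrete, the following minimal sketch (assuming NumPy) parameterizes a relative orientation by a rotation vector (three parameters) and a translation (three parameters) and applies it to atomic coordinates with Rodrigues' rotation formula. The function names and the convention of rotating about the centroid are illustrative assumptions, not part of any method discussed here; a local search of the kind mentioned above would vary these six numbers to optimize a similarity score.

```python
import numpy as np


def rotation_matrix(rotvec):
    """Rodrigues' formula: rotation vector (unit axis * angle in radians) -> 3x3 matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)  # no rotation
    k = rotvec / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def apply_pose(coords, pose):
    """Apply a 6-DOF pose (rx, ry, rz, tx, ty, tz) to an (n, 3) coordinate array.

    Hypothetical convention: rotate about the molecule's centroid, then translate.
    """
    coords = np.asarray(coords, dtype=float)
    pose = np.asarray(pose, dtype=float)
    R = rotation_matrix(pose[:3])
    centroid = coords.mean(axis=0)
    return (coords - centroid) @ R.T + centroid + pose[3:]


# Usage: rotate a two-atom "molecule" by 90 degrees about z, then shift it by 5 along z.
mol = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
moved = apply_pose(mol, [0.0, 0.0, np.pi / 2, 0.0, 0.0, 5.0])
# moved is approximately [[0, 1, 5], [0, -1, 5]]
```

Because the rotation is applied about the centroid, the three rotational and three translational parameters stay roughly decoupled, which tends to make local searches over them better behaved.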

In the following, we mention landmark contributions to molecular superpositioning and molecular similarity research over the past two decades.

Around 1979, Crippen and co-workers laid the groundwork for distance geometry [3,4]. This technique describes molecules by their interatomic distances or valid distance ranges. Triangle and higher-order inequalities are used to narrow down these ranges as much as possible. The so-called embedding problem then has to be solved to find 3D coordinates for all atoms that obey the distance constraints; this is usually done with randomized approaches. Distance geometry has proven to be a useful tool for docking, pharmacophore detection and receptor modeling.
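The range-narrowing step can be illustrated with a Floyd-Warshall-style triangle smoothing pass, a common way to apply the triangle inequality to matrices of lower and upper distance bounds. The plain-Python sketch below is an assumption-laden illustration: it handles only the triangle inequality (not higher-order inequalities or the embedding step), and the function name is hypothetical.

```python
def triangle_smooth(lower, upper, tol=1e-9):
    """Tighten symmetric distance bounds with the triangle inequality.

    lower, upper: n x n nested lists with zeros on the diagonal.
    Returns tightened copies (L, U); raises ValueError if some lower bound
    exceeds its upper bound, i.e. the constraints are geometrically inconsistent.
    """
    n = len(upper)
    L = [list(map(float, row)) for row in lower]
    U = [list(map(float, row)) for row in upper]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Upper bound: d_ij <= d_ik + d_kj.
                if U[i][j] > U[i][k] + U[k][j]:
                    U[i][j] = U[i][k] + U[k][j]
                # Lower bound: d_ij >= d_ik - d_kj, with d_ik >= L_ik and d_kj <= U_kj.
                if L[i][j] < L[i][k] - U[k][j]:
                    L[i][j] = L[i][k] - U[k][j]
                # ... and symmetrically d_ij >= d_kj - d_ik.
                if L[i][j] < L[k][j] - U[i][k]:
                    L[i][j] = L[k][j] - U[i][k]
                if L[i][j] > U[i][j] + tol:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return L, U


# Usage: atoms 0-1 are 5-6 apart and atoms 1-2 at most 1 apart, so 0-2 must lie in [4, 7].
L, U = triangle_smooth([[0, 5, 0], [5, 0, 0], [0, 0, 0]],
                       [[0, 6, 100], [6, 0, 1], [100, 1, 0]])
```

Row and column k never change during pass k (the diagonal is zero), which is why a single Floyd-Warshall-style sweep can update the matrices in place.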

At about the same time, the active analogue approach was laid out by Marshall and co-workers [5]. This method starts with a putative set of key pharmacophoric groups and their intermolecular correspondence in a set of active compounds. The key idea is then to enumerate all possible conformers and to store all resulting inter-feature distances in an array indexed by these distances. These so-called distance maps of several molecules can simply be intersected to obtain those conformations that result in a consistent pharmacophoric pattern across all molecules.
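A minimal sketch of the distance-map idea, assuming each molecule's conformers are already enumerated and reduced to coordinates of the same ordered list of pharmacophoric features. Instead of a dense multidimensional array, the hypothetical functions below store occupied cells as binned distance tuples mapped to the conformers that produce them, so intersecting maps becomes a set intersection. The bin width is an arbitrary choice, and a practical implementation would also need to tolerate distances falling just across a bin boundary.

```python
import itertools
import math


def distance_map(conformers, bin_width=0.5):
    """Map binned inter-feature distance tuples to the conformers that produce them.

    conformers: list of conformers, each a list of (x, y, z) feature positions
    in the same feature order for every molecule.
    """
    cells = {}
    for idx, features in enumerate(conformers):
        key = tuple(
            int(math.dist(a, b) // bin_width)
            for a, b in itertools.combinations(features, 2)
        )
        cells.setdefault(key, []).append(idx)
    return cells


def consistent_cells(maps):
    """Cells occupied by every molecule, i.e. the intersection of the distance maps."""
    common = set(maps[0])
    for m in maps[1:]:
        common &= set(m)
    return common


# Usage: two conformers of molecule A, one of molecule B, three features each.
mol_a = [[(0, 0, 0), (3, 0, 0), (0, 4, 0)],
         [(0, 0, 0), (6, 0, 0), (0, 4, 0)]]
mol_b = [[(1, 1, 1), (4, 1, 1), (1, 5, 1)]]
maps = [distance_map(mol_a), distance_map(mol_b)]
common = consistent_cells(maps)                     # {(6, 8, 10)}
conformers_of_a = [maps[0][cell] for cell in common]  # [[0]]
```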

For the same objective, clique detection methods have been employed as well [6,7]. In a so-called distance compatibility graph, all distance- and type-compatible atom pairs in a specific reference structure and any conformer of all other considered structures are represented by nodes. Nodes are connected if they share an atom in each of the represented structures. A clique in this graph represents a matching of type-compatible atoms across the set of molecules; large cliques thus represent potentially interesting pharmacophores. The method algorithmically enumerates such cliques.
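The sketch below illustrates clique-based matching in the widely used pairwise formulation: each node pairs a type-compatible atom of molecule A with one of molecule B, and two nodes are joined when the corresponding intramolecular distances agree within a tolerance. This is a simplification that may differ in detail from the multi-molecule graph described above. The largest clique is found with the Bron-Kerbosch algorithm with pivoting; atom types, the tolerance and all function names are illustrative assumptions.

```python
import itertools
import math


def compatibility_graph(atoms_a, atoms_b, tol=0.5):
    """Build a correspondence graph between two molecules.

    atoms_a, atoms_b: lists of (atom_type, (x, y, z)).
    Nodes are type-compatible atom pairs (i, j); an edge joins two nodes whose
    intramolecular distances in A and B agree within tol.
    """
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(atoms_a)
        for j, (type_b, _) in enumerate(atoms_b)
        if type_a == type_b
    ]
    adj = {node: set() for node in nodes}
    for (i1, j1), (i2, j2) in itertools.combinations(nodes, 2):
        if i1 == i2 or j1 == j2:
            continue  # an atom may be matched only once
        d_a = math.dist(atoms_a[i1][1], atoms_a[i2][1])
        d_b = math.dist(atoms_b[j1][1], atoms_b[j2][1])
        if abs(d_a - d_b) <= tol:
            adj[(i1, j1)].add((i2, j2))
            adj[(i2, j2)].add((i1, j1))
    return adj


def maximum_clique(adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential in the worst case)."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = set(r)  # r is a maximal clique
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best
```

Each clique corresponds to a set of atom correspondences whose pairwise distances are mutually consistent, which is exactly what a subsequent RMS fit needs as input.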

Geometric hashing is a powerful method for finding three-dimensional objects in complex scenes. It originates from computer vision [8] and has been widely used to find matches between functional groups of molecules that are to be either docked or superposed. The key element of these techniques is a hash table that stores a rotationally invariant representation of one molecule and can be queried with structural features of the other molecule [9].
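A minimal geometric hashing sketch, assuming NumPy and bare 3D point sets (for example, feature positions) without chemical types. Every ordered triplet of non-collinear points defines a local frame; coordinates expressed in that frame do not change under rotation or translation, so they are quantized and used as hash keys. A query picks one basis triplet in the second structure and lets each of its points vote for the stored (structure, basis) entries that share its key. Bin size, function names and the choice to include the basis points themselves are assumptions; practical implementations also try several query bases and handle points near bin boundaries.

```python
import itertools
from collections import Counter, defaultdict

import numpy as np


def local_frame(p1, p2, p3):
    """Orthonormal frame: origin p1, x-axis towards p2, z-axis normal to the plane (p1, p2, p3)."""
    x = (p2 - p1) / np.linalg.norm(p2 - p1)
    z = np.cross(x, p3 - p1)
    norm_z = np.linalg.norm(z)
    if norm_z < 1e-8:
        return None  # collinear points do not define a frame
    z = z / norm_z
    y = np.cross(z, x)
    return p1, np.vstack([x, y, z])  # rows are the frame axes


def invariant_keys(points, basis, bin_size):
    """Quantized coordinates of all points in the frame of the basis triplet."""
    frame = local_frame(*(points[i] for i in basis))
    if frame is None:
        return None
    origin, axes = frame
    local = (points - origin) @ axes.T  # rotation- and translation-invariant
    return [tuple(int(v) for v in np.round(c / bin_size)) for c in local]


def build_table(structures, bin_size=1.0):
    """Hash every point under every ordered basis triplet of every stored structure."""
    table = defaultdict(list)
    for name, pts in structures.items():
        pts = np.asarray(pts, dtype=float)
        for basis in itertools.permutations(range(len(pts)), 3):
            keys = invariant_keys(pts, basis, bin_size)
            if keys is None:
                continue
            for key in keys:
                table[key].append((name, basis))
    return table


def vote(table, query, query_basis, bin_size=1.0):
    """Count, per stored (structure, basis), how many query points hit its keys."""
    query = np.asarray(query, dtype=float)
    votes = Counter()
    for key in invariant_keys(query, query_basis, bin_size) or []:
        for entry in table.get(key, []):
            votes[entry] += 1
    return votes
```

A (structure, basis) entry with many votes indicates a candidate correspondence between the two basis triplets, from which a superposition can then be computed.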

Given a matching of the pharmacophoric features of the molecules, an RMS fit yields the superposition of the structures with the smallest sum of squared distances between the matched features [10]. The directed tweak technique extends the RMS fit to account for molecular flexibility. Because analytical derivatives of the objective function are used, the underlying optimization problem can be solved extremely fast [11].
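For the rigid case, the least-squares superposition of matched features has a closed-form solution. One standard way to compute it is the Kabsch algorithm via a singular value decomposition, sketched below with NumPy. It covers only the rigid RMS fit, not the flexible directed tweak procedure, and the function name is hypothetical.

```python
import numpy as np


def rms_fit(mobile, target):
    """Least-squares rigid superposition (Kabsch algorithm).

    mobile, target: (n, 3) arrays of matched feature positions, row i of one
    corresponding to row i of the other. Returns (R, t, rmsd) such that
    mobile @ R.T + t is as close as possible to target in the least-squares sense.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_c, q_c = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_c).T @ (Q - q_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_c - R @ p_c
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - Q) ** 2, axis=1))))
    return R, t, rmsd
```

Centering both point sets first separates the translation (difference of centroids) from the rotation, which the SVD then determines; the sign correction prevents the fit from returning a mirror image.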

Other optimization-based approaches aim at maximizing some kind of overlap volume. The classical similarity measures proposed by Carbó [12] and Hodgkin [13] are standard objective functions in such methods. Conformational sampling and the generation of reasonable starting points for the optimization are the critical ingredients of an optimization-based approach. Many kinds of optimization techniques, including gradient-based methods, genetic algorithms and simulated annealing, have been tried in order to arrive at reasonable optima quickly.
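Both indices compare the overlap integral of two molecular property fields with their self-overlaps: Carbó's index divides the cross term by the geometric mean of the self-overlaps, Hodgkin's by their arithmetic mean. The sketch below assumes, purely for illustration, that each molecule's field is a sum of identical spherical Gaussians centered on the atoms (the original indices were formulated for general fields such as electron density or electrostatic potential); this makes the overlap integrals available in closed form. Function names and the Gaussian width are assumptions.

```python
import numpy as np


def gaussian_overlap(coords_a, coords_b, alpha=1.0):
    """Overlap integral of two fields, each a sum of exp(-alpha * |r - r_i|^2) over atoms.

    For two such Gaussians a distance d apart, the integral over all space is
    (pi / (2 * alpha)) ** 1.5 * exp(-alpha * d**2 / 2).
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    return float(np.sum((np.pi / (2.0 * alpha)) ** 1.5 * np.exp(-alpha * d2 / 2.0)))


def carbo_index(a, b, alpha=1.0):
    """Cross overlap divided by the geometric mean of the self-overlaps."""
    ab = gaussian_overlap(a, b, alpha)
    return ab / np.sqrt(gaussian_overlap(a, a, alpha) * gaussian_overlap(b, b, alpha))


def hodgkin_index(a, b, alpha=1.0):
    """Cross overlap divided by the arithmetic mean of the self-overlaps."""
    ab = gaussian_overlap(a, b, alpha)
    return 2.0 * ab / (gaussian_overlap(a, a, alpha) + gaussian_overlap(b, b, alpha))
```

Both indices equal 1 for identical, identically placed fields and drop towards 0 as the molecules are pulled apart; an optimizer of the kind described above would maximize them over the relative pose. Since a geometric mean never exceeds the arithmetic mean, the Hodgkin index is never larger than the Carbó index and, unlike it, also penalizes differences in field magnitude.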

Virtual database screening aims at detecting active molecules in a large collection of compounds. The compounds may be given either explicitly (e.g., a corporate database) or implicitly as a combinatorial library with core fragments and substituent lists. Traditionally, database screening has been the domain of topology- or descriptor-based methods, which compute similarity rapidly at the expense of limited accuracy and difficult interpretability of the results. Superposition methods have been employed for the screening task as well [14,15]. Usually, stepwise filtering protocols were used to keep computational demands low [16,17].
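A stepwise filtering protocol can be sketched as a cascade: a cheap comparison discards most compounds, and only the survivors are passed to an expensive scoring function. In the sketch below, fingerprints are assumed to be sets of "on" bit positions compared with the Tanimoto coefficient, and `expensive_score` is a hypothetical placeholder for any superposition-based score; the cutoff and result size are arbitrary.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def stepwise_screen(query_fp, database, expensive_score, cutoff=0.5, top_n=10):
    """Two-stage screen: Tanimoto prefilter, then expensive scoring of the survivors.

    database: dict mapping compound id -> (fingerprint, structure).
    expensive_score: hypothetical callable taking a structure, returning a float
    (higher is better).
    Returns up to top_n compound ids, best first.
    """
    # Stage 1: cheap filter; only compounds at or above the similarity cutoff survive.
    survivors = [
        (cid, structure)
        for cid, (fp, structure) in database.items()
        if tanimoto(query_fp, fp) >= cutoff
    ]
    # Stage 2: the expensive score is computed for the survivors only.
    scored = sorted(
        ((expensive_score(structure), cid) for cid, structure in survivors),
        reverse=True,
    )
    return [cid for _, cid in scored[:top_n]]
```

The design trades recall for speed: any active compound rejected by the first stage is never seen by the accurate second stage, so the cutoff has to be chosen with that loss in mind.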

Reviews of early work on molecular similarity can be found in References 18-20. An outstanding collection of articles and reviews on pharmacophore elucidation can be found in Reference 21. Drug design applications of molecular similarity and molecular complementarity concepts are described in Reference 22. Molecular similarity measures are reviewed in Reference 23. Excellent reviews on molecular superpositioning can be found in References 24-28. A recent review of descriptor-based methods is given in Reference 29. A review of the structural alignment of molecules, highlighting the most recent methodological developments, can be found in Reference 30.

Our group has contributed several methods for different variants of the molecular superpositioning problem. Initially, these were designed for preparing structures for 3D-QSAR analysis. However, they turned out to be powerful enough to screen large datasets. In particular, recent enhancements to the software and a two-step filtering protocol, which applies a rough but fast superpositioning technique first and an expensive but accurate alignment tool afterwards, made our methods applicable to virtual database screening.

In the following sections, we briefly review the approaches to molecular superpositioning that we have contributed in recent years and detail the recent enhancements we implemented. We then describe two ways of evaluating superpositioning methods and the performance we achieved with regard to these criteria. First, we compare the geometries produced by our method to crystal data. Second, we describe the filtering protocol we designed and show the effectiveness of our methods in a virtual database screening scenario. To keep the presentation concise, we restrict ourselves to presenting the results for one of the five example cases we investigated. However, the results for the other example cases usually look quite similar, and the entire set of charts can be inspected on our web page.

Our presentation concludes with an outlook on future work.
