## Methods

We developed two software tools for molecular superpositioning: RigFit, a method for the rigid-body superposition of molecules, and FlexS, which flexibly superimposes a test molecule onto a rigid reference structure. Since the former is a special case of the latter problem, the whole system is called FlexS [15]. The flexible superpositioning strategy comprises three steps: the selection of a number of base fragments, the placement of the base fragments, and an incremental construction that completes the partial superpositions generated by the base placement step. The procedure is illustrated by the left branch of the flowchart in Figure 1. The rigid-body superpositioning comprises a numerical optimization procedure (see the right branch of the flowchart). It turns out that this method is also well suited for the superpositioning of molecular fragments. Therefore, we employ RigFit in FlexS in two places: for the rigid-body superpositioning of whole molecules and as one of three alternatives for the base placement step during flexible superpositioning. FlexS provides a scripting language that allows convenient control of the superpositioning task. It also permits batch processing, such as looping over series of compounds or invoking RigFit with multiple conformers that have been generated separately. The runtime of the flexible superpositioning is in the range of a few minutes, whereas rigid-body superpositioning takes a couple of seconds on a current workstation or PC (such as a single-processor SUN-Ultra-30 workstation with 128 MB of main memory and a 296 MHz clock speed).

The rigid-body superpositioning in RigFit implements an optimization procedure that uses the Hodgkin index [13] as the objective function and sets of Gaussian functions to model a variety of physico-chemical properties. The method employs basic techniques from crystallography that speed up the process significantly. The basic concepts have been described previously by Klebe et al. [31] for the modeling part and Nissink et al. [32] for the crystallographic techniques. The resulting RigFit approach has several interesting characteristics. First, moving to Fourier space, which can be done analytically, provides easy access to a translation-invariant description of the molecules. Hence the first step in a RigFit optimization is to determine local optima for the rotation of the molecules, starting from a number of orientations. The second step is the translational optimization, which can also be performed quite efficiently in Fourier space. Finally, the approximate solutions are post-optimized using the original Hodgkin index and considering all six degrees of freedom. Figure 2 illustrates how this method compares with a traditional overlap optimization procedure. Though the procedure looks more complex, it reduces the necessary number of local optimizations and allows for much faster three-dimensional optimizations in the intermediate steps. Since the translational optimization is especially fast, large numbers of start translations are permitted, which is the prerequisite for successfully placing molecular fragments. For the algorithmic details see Reference 1.
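To make the objective function concrete, the following is a minimal sketch of the Hodgkin index evaluated on Gaussian molecular descriptions. It is not the RigFit implementation: it works in real space only, handles a single property, and assumes spherical Gaussians of the form h·exp(-alpha·|r - c|²) with one shared exponent `alpha`. The tuple layout `(x, y, z, height)` and all function names are illustrative assumptions.

```python
import math

def gaussian_overlap(a, b, alpha):
    """Analytic overlap integral of two spherical Gaussians.

    Each Gaussian is a tuple (x, y, z, height) and has the form
    height * exp(-alpha * |r - center|^2); alpha is shared (assumption).
    """
    d2 = sum((p - q) ** 2 for p, q in zip(a[:3], b[:3]))
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    return a[3] * b[3] * prefactor * math.exp(-0.5 * alpha * d2)

def overlap(mol_a, mol_b, alpha):
    """Sum of all pairwise Gaussian overlaps between two molecules."""
    return sum(gaussian_overlap(a, b, alpha) for a in mol_a for b in mol_b)

def hodgkin_index(mol_a, mol_b, alpha=0.5):
    """Hodgkin index: 2 * O_AB / (O_AA + O_BB)."""
    o_ab = overlap(mol_a, mol_b, alpha)
    o_aa = overlap(mol_a, mol_a, alpha)
    o_bb = overlap(mol_b, mol_b, alpha)
    return 2.0 * o_ab / (o_aa + o_bb)

# Toy example: a two-Gaussian "molecule" and a copy shifted by 1 along x.
reference = [(0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 1.0)]
shifted = [(x + 1.0, y, z, h) for x, y, z, h in reference]
print(hodgkin_index(reference, reference))  # identical descriptions give 1
print(hodgkin_index(reference, shifted))    # smaller than 1 after the shift
```

Each evaluation loops over all pairs of Gaussians, so the cost of a single index grows with the product of the two description sizes.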

Figure 1. Flow chart of a single superposition of a test ligand onto a reference structure with FlexS. The user decides between flexible fitting (left branch) and rigid-body superpositioning (right branch). The rigid-body superpositioning has been described elsewhere in detail as the RigFit procedure [1]. RigFit is used in FlexS in two places (indicated by the dashed line). In any case FlexS produces a set of reasonable placements of the test ligand.

Figure 2. The traditional approach to rigid-body superpositioning with local optimization techniques is shown on the left-hand side. Here, n × m local optimizations with six degrees of freedom have to be performed. In contrast, the RigFit procedure is illustrated on the right-hand side. It also starts with n start rotations; however, these are optimized independently and result in k < n local optima. Then k × m translational optimizations are performed that result in r ≪ k × m local optima. These again are post-optimized and result in a comparatively small set of 5 solutions, just as the traditional approach. Hence, fewer local optimizations are performed at three different steps of the protocol. In addition, as symbolized by the size of the shaded rectangles, these optimizations (either dealing with only three degrees of freedom or comprising only a few steps of a post-optimization) can be carried out quite fast.

The flexible superpositioning part of FlexS relies heavily on the matching of H-bonding partners in both molecules. However, neither the interacting atoms nor specific site points are matched. Instead, those regions in space that describe the preferred position of protein counter atoms are required to intersect in order to form a valid matching. The H-bonding geometries as well as directional hydrophobic interactions are modeled by point sets in space. A combinatorial optimization procedure enumerates triangles of such interaction points on the base fragment and searches for compatible triangles in a hash table for the reference structure. Each such match of two triangles defines a placement of the base fragment on the reference structure. If the triangle matching procedure fails, the slower RigFit procedure is invoked to place the fragment. In this way a number of base fragments, each with a set of low-energy conformers, is placed in a number of plausible orientations on the reference structure. Subsequently, the test ligand is incrementally built up by adding the remaining fragments in a stepwise fashion. Each fragment contains exactly one rotatable bond or a whole ring system. Upon adding a fragment, each of its preferred torsions or ring conformers is tried in turn. Thus, in each step, a number of partial placements is extended to a larger number of partial placements of molecular entities that are larger by one fragment. In order to avoid the inherent combinatorial explosion of alternative solutions, all partial placements are scored in each step and only the top-ranking solutions are carried to the next step. See Figure 3 for an illustration of the tree-like combinatorial optimization scheme. For the technical details of this algorithm we refer to Reference 33.
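The triangle lookup used for base placement can be pictured with a small geometric-hashing sketch. This is an assumption-laden illustration rather than the FlexS code: interaction points are plain 3D coordinates without interaction types, a triangle is keyed by its sorted, binned side lengths, and the names `triangle_key`, `build_triangle_table`, `find_compatible`, and `bin_width` are hypothetical.

```python
import itertools
import math
from collections import defaultdict

def triangle_key(p, q, r, bin_width):
    """Hash key: the three side lengths, sorted and binned."""
    sides = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    return tuple(int(s // bin_width) for s in sides)

def build_triangle_table(points, bin_width=0.5):
    """Store every triangle of reference interaction points under its key."""
    table = defaultdict(list)
    for i, j, k in itertools.combinations(range(len(points)), 3):
        key = triangle_key(points[i], points[j], points[k], bin_width)
        table[key].append((i, j, k))
    return table

def find_compatible(table, fragment_points, bin_width=0.5):
    """Return (fragment triangle, reference triangle) pairs sharing a key."""
    matches = []
    for i, j, k in itertools.combinations(range(len(fragment_points)), 3):
        key = triangle_key(fragment_points[i], fragment_points[j],
                           fragment_points[k], bin_width)
        for ref_triangle in table.get(key, []):
            matches.append(((i, j, k), ref_triangle))
    return matches

# Reference points containing a 3-4-5 triangle, and a translated copy of
# that triangle playing the role of the base fragment.
reference_points = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (10, 10, 10)]
fragment_points = [(1, 1, 1), (4, 1, 1), (1, 5, 1)]
table = build_triangle_table(reference_points)
print(find_compatible(table, fragment_points))
```

Two details are left out on purpose. Side lengths close to a bin boundary can fall into neighboring bins, so a practical version would also probe adjacent keys. The key also does not record which vertex corresponds to which, so each match still needs a vertex assignment before a transformation can be computed from it.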

[Figure 3 diagram; bottom row: complete placements sorted by score]

Figure 3. Flow chart of the combinatorial placement procedure in FlexS, which explores the tree-structured configuration space permitted by the underlying discrete modeling of the superpositioning problem. Dots symbolize placements of the test ligand (partial placements for all but the leaves of the tree). White dots indicate high-scoring placements that are shuffled to the left by the intermediate sorting steps and form the roots of the next level of the evaluated portion of the configuration tree. The triangles, starting from each root, symbolize that multiple solutions are obtained upon adding a new fragment in multiple conformations to an existing partial placement.
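The extend, score, and prune cycle of the incremental construction can be sketched as a simple beam search. The sketch below is a toy under stated assumptions: a placement is just a tuple of chosen torsion angles, and `extend`, `score`, and the beam width `keep` are hypothetical stand-ins for the actual placement and scoring machinery described in Reference 33.

```python
def incremental_construction(base_placements, fragments, extend, score, keep=3):
    """Grow partial placements fragment by fragment, keeping the best ones.

    base_placements: partial placements produced by the base placement step
    fragments:       one list of candidate conformations per remaining fragment
    extend(p, c):    returns placement p extended by conformation c
    score(p):        higher is better
    keep:            number of top-ranking placements carried to the next step
    """
    partial = list(base_placements)
    for conformations in fragments:
        # Each partial placement spawns one child per conformation.
        extended = [extend(p, c) for p in partial for c in conformations]
        # Intermediate sorting step: best placements first, the rest dropped.
        extended.sort(key=score, reverse=True)
        partial = extended[:keep]
    return partial

# Toy problem: choose three torsion angles that reproduce a target triple.
target = (60, 180, -60)

def extend(placement, torsion):
    return placement + (torsion,)

def score(placement):
    return -sum(abs(a - b) for a, b in zip(placement, target))

fragments = [[-60, 60, 180]] * 3
best = incremental_construction([()], fragments, extend, score, keep=2)
print(best[0])
```

With `keep=2` at most six candidates are scored per step in this toy, whereas an exhaustive enumeration would visit all 27 leaves. The price is that a placement pruned early cannot be recovered later.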

An important recent enhancement to the flexible superpositioning has been the extension of the algorithm to handle multiple base fragments to start with. These can be selected manually, automatically by a heuristic procedure, or on the basis of a common substructure. The latter is of special interest if combinatorial libraries are to be superimposed that contain a limited number of substructures suitable as base fragments. In combination with the manual definition of placements for these fragments, a significant speed-up is also achieved, since the base placement step usually takes up to two thirds of the runtime of the entire procedure.

The automated selection of base fragments is critical for processing large numbers of molecules in a virtual database screening experiment. We employ a simple heuristic that loops over all fragments of a limited size and scores them by three important and, to some extent, opposing aspects: the number of potential interacting groups, the number of conformers, and the volume spanned by the fragment. Since these are at the same time the dominating factors for the superpositioning method, it is likely that the high-scoring fragments include reasonable ones to start with.
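A heuristic of this kind could look like the following sketch. The text names the three criteria but gives neither a formula nor weights, so everything numeric here is an assumption made for illustration: the linear form of `fragment_score`, the sign of each term (more interacting groups and more volume favored, more conformers penalized), the weights, and the size limit `max_atoms`.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    name: str
    n_atoms: int
    n_interacting_groups: int
    n_conformers: int
    volume: float  # arbitrary units

def fragment_score(f, w_groups=1.0, w_conformers=0.5, w_volume=0.01):
    """Hypothetical linear score; weights and signs are assumptions."""
    return (w_groups * f.n_interacting_groups
            - w_conformers * f.n_conformers
            + w_volume * f.volume)

def select_base_fragments(fragments, max_atoms=12, top=2):
    """Loop over fragments of limited size and return the best-scoring ones."""
    candidates = [f for f in fragments if f.n_atoms <= max_atoms]
    return sorted(candidates, key=fragment_score, reverse=True)[:top]

fragments = [
    Fragment("A", n_atoms=10, n_interacting_groups=3, n_conformers=2, volume=100.0),
    Fragment("B", n_atoms=8, n_interacting_groups=1, n_conformers=1, volume=50.0),
    Fragment("C", n_atoms=20, n_interacting_groups=5, n_conformers=1, volume=200.0),
    Fragment("D", n_atoms=6, n_interacting_groups=2, n_conformers=6, volume=60.0),
]
print([f.name for f in select_base_fragments(fragments)])
```

Returning several top-ranked fragments rather than a single winner fits the multiple-base-fragment extension: a weak first choice can still be compensated by the alternatives.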

Another enhancement of the software is the ability to merge multiple Gaussian representations into a single one and to use such a composite model as the Gaussian representation of the reference molecule. The different sets of Gaussians may originate from different compounds, from different conformers of one compound, or may even be artificially constructed. In this way the reference compound may, in fact, represent a whole set of compounds, conformational uncertainty may be appropriately described, and, e.g., excluded volumes may be incorporated into the molecule to be fitted onto.

Since the number of terms to be evaluated during Gaussian overlap calculation grows approximately with the square of the number of atoms, we reduce the number of Gaussians as follows. We evaluate all pairwise distances between the Gaussians. Each distance that lies below a certain threshold results in an edge in a corresponding distance graph that contains a node for each Gaussian. The distance graph is not necessarily connected. The aim is to merge terminal nodes and cliques in such a graph. Figure 4 illustrates an example situation and the nodes to be merged.
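The construction of the distance graph can be sketched as follows. This is an illustrative sketch, not the procedure of Reference 34: it only builds the graph and provides two helper checks, one for terminal nodes (taken here to mean nodes with exactly one neighbor, which is an assumption) and one for whether a given node set forms a clique. The function names and the example threshold are hypothetical.

```python
import itertools
import math

def distance_graph(centers, threshold):
    """Adjacency sets: an edge joins two Gaussians closer than threshold."""
    adjacency = {i: set() for i in range(len(centers))}
    for i, j in itertools.combinations(range(len(centers)), 2):
        if math.dist(centers[i], centers[j]) < threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

def terminal_nodes(adjacency):
    """Nodes with exactly one neighbor (assumed meaning of 'terminal')."""
    return sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)

def is_clique(adjacency, nodes):
    """True if every pair of the given nodes is connected by an edge."""
    return all(b in adjacency[a] for a, b in itertools.combinations(nodes, 2))

# Three mutually close centers, one center attached to a single neighbor,
# and one isolated center: the graph is not connected.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.8, 0.0),
           (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
graph = distance_graph(centers, threshold=1.2)
print(terminal_nodes(graph))
print(is_clique(graph, [0, 1, 2]))
```

The pairwise loop is itself quadratic in the number of Gaussians, but it runs once per molecule, whereas the overlap sum it helps to shrink is evaluated in every optimization step.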

We achieve this goal using a two-pass procedure, first determining the Gaussians to be merged and subsequently merging them. The technical details and implementation can be found in Reference 34. To determine the merged Gaussian representation, three parameters have to be optimized: height, width, and position. The width of the Gaussians is set to a constant in all our calculations. Optimizing the position would require solving a separate, expensive optimization problem. Instead, we simply use the center of gravity of the Gaussians to be merged. Given the widths and positions, the heights are optimized in order to minimize the difference between the old and the reduced Gaussian description.
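For a single group of Gaussians, the merge step can be written in closed form under a few simplifying assumptions, sketched below. The assumptions are: spherical Gaussians h·exp(-alpha·|r - c|²) with one fixed exponent `alpha`, an unweighted center of gravity, and a height fitted for this group in isolation by least squares on the integrated squared difference. The actual procedure (Reference 34) may weight the center or fit all heights jointly; `merge_group` is a hypothetical name.

```python
import math

def merge_group(gaussians, alpha):
    """Replace a group of Gaussians (x, y, z, height) by a single one.

    Position: unweighted center of gravity of the group (assumption).
    Width:    the shared, fixed exponent alpha.
    Height:   minimizes the integral of (h*G_c - sum_i h_i*G_i)^2, which
              for equal widths gives h = sum_i h_i * exp(-alpha * d_i^2 / 2),
              with d_i the distance of Gaussian i from the new center.
    """
    n = len(gaussians)
    center = tuple(sum(g[i] for g in gaussians) / n for i in range(3))
    height = sum(
        g[3] * math.exp(-0.5 * alpha * math.dist(g[:3], center) ** 2)
        for g in gaussians
    )
    return center + (height,)

# Two unit-height Gaussians two length units apart, merged into one.
group = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0)]
print(merge_group(group, alpha=0.5))
```

The height formula follows from the analytic overlap of two equal-width Gaussians, which is proportional to exp(-alpha·d²/2). The common prefactor cancels in the least-squares solution, so coincident Gaussians simply add their heights.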
