## Results

The recursive combinatorial docking algorithm is best suited to cases where one group of the library exhibits characteristic binding properties to the protein. Although this is not always the case, it occurs frequently in structure-based drug design, for example when a small molecule is already known to bind to a specific group of the protein, as in metalloproteinases, or to a specificity pocket, as in thrombin, biotin, or DHFR.

### Data sets

In order to test the method, we created three different libraries for thrombin and DHFR. The first library, called the benzamidine library, is very small and contains a core and one R-group. The core is either para- or meta-substituted benzamidine, and there are 46 instances of the R-group, making 92 molecules in total. The R-group instances differ in size from 1 to 39 atoms and contain up to two ring systems and up to eight rotatable bonds. The molecules, including binding data for thrombin, were collected by Böhm and Klebe [26].

COMBINATORIAL_DOCKING(protein R, combilib L) % initiates recursive combinatorial docking calculation

1 S ← define R-group build-up order;

2 foreach instance m ∈ core of L do

4 recursive_rgroup_placement(R, m, P, tail(S)); od;

RECURSIVE_RGROUP_PLACEMENT(protein R, molecule m, placements P, build-up order S)

% extends m sequentially by the instances of an R-group, calculates placements, % and adds further R-groups recursively

2 evaluate placements P;

4 foreach instance r ∈ head(S) do

5 m ← m extended by R-group instance r;

8 delete placements P'; remove R-group instance r from m; od;

Figure 6. The recursive combinatorial docking algorithm: after placing the core molecules, the recursive traversal through the library is started. In each call, all instances of one R-group are added and the incremental construction algorithm is applied to the extended molecule.
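The recursive traversal described in Figure 6 can be illustrated with a minimal Python sketch. It is not the FLEXX implementation: a "placement" is reduced to a single score, the functions `toy_place_core` and `toy_extend` are hypothetical stand-ins for core placement and incremental construction, and only complete molecules are recorded. The point it shows is that the placements of a partial molecule are computed once and reused for every instance of the next R-group.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Placements = List[float]      # toy model: one score per placement
Molecule = Tuple[str, ...]    # core followed by the attached R-group instances


def combinatorial_docking(
    cores: Sequence[str],
    rgroups: Dict[str, Sequence[str]],
    order: Sequence[str],
    place_core: Callable[[str], Placements],
    extend: Callable[[Placements, str], Placements],
) -> Dict[Molecule, float]:
    """Place every core, then grow each one R-group by R-group."""
    best: Dict[Molecule, float] = {}
    for core in cores:
        _place_rgroups((core,), place_core(core), list(order), rgroups, extend, best)
    return best


def _place_rgroups(molecule, placements, order, rgroups, extend, best) -> None:
    if not placements:        # nothing survived: prune this whole branch
        return
    if not order:             # all R-groups attached: record the best score
        best[molecule] = min(placements)
        return
    head, tail = order[0], order[1:]
    for instance in rgroups[head]:
        # The parent's placements are reused for every instance of `head`.
        _place_rgroups(molecule + (instance,), extend(placements, instance),
                       tail, rgroups, extend, best)


# Hypothetical stand-ins: a core starts at score 0, and each attached
# fragment lowers every score by 0.1 per character of its name.
def toy_place_core(core: str) -> Placements:
    return [0.0]


def toy_extend(placements: Placements, fragment: str) -> Placements:
    return [score - 0.1 * len(fragment) for score in placements]


scores = combinatorial_docking(
    cores=["para", "meta"],
    rgroups={"R1": ["a", "bb"], "R2": ["c"]},
    order=["R1", "R2"],
    place_core=toy_place_core,
    extend=toy_extend,
)
```

With two cores, two instances of `R1`, and one instance of `R2`, the traversal visits four complete molecules while each core is placed only once.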

The second library is called the pyridine library, taken from [27]. It contains a single core molecule and three R-groups as shown in Figure 7a. The R-groups contain 23, 24, and 7 instances making 3864 molecules in total. The library contains molecules with up to 94 atoms, four ring systems, and 21 rotatable bonds.

The third library is called the UGI-160 library and is taken from [28]. The library is based on the Ugi reaction and contains a core with two instances, due to the stereo center, and four R-groups (see Figure 7b). Unfortunately, not all R-group instances of the library are published, so we used the (2 × 2 × 2 × 10 × 2) = 160 molecule sublibrary of the 320 000 molecule library listed in the publication. We artificially enlarged this library to a 2 × 10 × 10 × 10 × 10 library, named UGI-20000, by creating additional R-group instances.
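The library sizes quoted above follow directly from multiplying the number of core instances by the number of instances of each R-group. The short check below uses the counts given in the text; the dictionary layout is only an illustration.

```python
from math import prod

# Instance counts per library: core instances first, then each R-group.
libraries = {
    "benzamidine": [2, 46],          # para/meta core, one R-group
    "pyridine": [1, 23, 24, 7],      # single core, three R-groups
    "UGI-160": [2, 2, 2, 10, 2],
    "UGI-20000": [2, 10, 10, 10, 10],
}

sizes = {name: prod(counts) for name, counts in libraries.items()}
for name, size in sizes.items():
    print(f"{name}: {size} molecules")
```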

All core and R-group instances were preprocessed with Sybyl [29] as follows: correct Sybyl atom and bond types as well as formal charges were assigned, hydrogens were added, and 3D structures were generated and energy-minimized. Finally, we marked X- and R-atoms.

### Docking the benzamidine library into thrombin

For this first experiment, a 3D structure of thrombin was taken from PDB [30] entry 1dwd, which contains the complex between human α-thrombin and NAPAP [31]. The active site was defined to contain all atoms located within 6.5 Å of any NAPAP atom; all water molecules were removed.
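The distance-based active-site definition can be sketched as follows. This is an illustration only: the function name `select_active_site`, the `(name, coordinates)` atom representation, and the toy coordinates are assumptions, and no real PDB parsing is attempted.

```python
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[str, Coord]


def select_active_site(
    protein_atoms: Sequence[Atom],
    ligand_coords: Sequence[Coord],
    cutoff: float = 6.5,
) -> List[Atom]:
    """Return the protein atoms within `cutoff` Å of at least one ligand atom."""
    return [
        (name, xyz)
        for name, xyz in protein_atoms
        if any(dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_coords)
    ]


# Toy coordinates in Å (made up for illustration, not taken from 1dwd).
protein = [("A", (0.0, 0.0, 0.0)), ("B", (6.0, 0.0, 0.0)), ("C", (7.0, 0.0, 0.0))]
ligand = [(0.0, 0.0, 0.0)]

site = select_active_site(protein, ligand)
```

Here atoms `A` and `B` fall inside the 6.5 Å cutoff and atom `C` does not.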

Because there is only one R-group, no build-up order has to be selected. Comparing a sequential docking run (docking each library molecule independently) with a combinatorial run shows only minor differences in the results (correlation coefficient: 0.9), see Figure 8a. This could be expected since the benzamidine unit plays a dominant role in the binding process and is docked first in the combinatorial run. Also, there is no choice concerning the R-group order. Differences in the calculations result from the fact that, in the sequential run, the algorithm selects a set of base fragments distributed over the whole ligand while, in the combinatorial run, the base fragments are limited to the core molecules, i.e. benzamidine in this case.

Comparing the calculated docking scores with the experimental binding affinities yields a correlation of 0.58 (see Figure 8b). It should be noted that, in this case, all molecules bind to thrombin and span a relatively small range of binding affinities.
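The text does not state which correlation measure was used. Assuming the usual Pearson coefficient, the comparison of two score lists (sequential versus combinatorial, or calculated versus experimental) could be computed as in this sketch; the function name is illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A value close to 1 means that the two runs rank and score the molecules almost identically; the coefficient says nothing about a constant offset between the two score scales.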

### Docking the pyridine library into DHFR

The 3D structure of DHFR was taken from PDB entry 4dfr, which contains the complex between E. coli DHFR and methotrexate [32]. The active site was defined to contain all atoms within 6.5 Å of methotrexate. All waters except HOH A 403 and 405 were removed. These two waters were kept because they are known to be highly conserved. Although they form hydrogen bonds to the pyridine unit, they are not necessary to determine the correct orientation of the pyridine core with FLEXX.

In this library, the three R-groups are located very close to each other, which makes it a good test case for studying the influence of the build-up order. We performed the docking calculations with three different orders: R3 - R4 - R5, R5 - R4 - R3, and R4 - R3 - R5. While the results for the first two orders are again very similar to the sequential run (correlations of 0.81 and 0.87), the correlation drops to 0.74 in the third case. The plot (see Figure 9) reveals that some molecules cannot be docked correctly in the combinatorial run. The reason might be that, in these cases, the instances of R4 prefer orientations in which they later overlap with R3 or R5. Although this library might be an extreme case because of the relatively small distance between the R-atoms at the core, the example shows that dependencies between R-groups have to be taken into account in combinatorial docking calculations.

### Docking the UGI libraries into thrombin

For docking the UGI libraries we used the same 3D structure of thrombin from 1dwd that we used for docking the benzamidine library. An inspection of the library shows that R-group 3 is designed for binding into the S1 pocket of thrombin (see Figure 10). We therefore used our core switching functionality in order to start the recursive combinatorial docking calculation with R-group 3 instead of the core.

For this experiment, neither structures nor experimental binding affinities were available to us. Weber et al. [28] presented the best 4-component compound found during their optimization procedure (see Figure 11); the stereoisomer is not specified. Within the results of docking the sublibrary, we found the highest-ranking stereoisomer (R-configuration) of this molecule at rank 2 with a predicted score of -40 kJ mol⁻¹. This is in good agreement with the experimental value of -38 kJ mol⁻¹.

In the sequential docking run, this molecule was found only at rank 38, with a suboptimal placement and a score of -35 kJ mol⁻¹. In this application, the combinatorial algorithm produced a better result than the sequential algorithm. The reason for this is that, in the combinatorial run, knowledge of the docking problem is entered by specifying the part of the molecule that is docked first. The search space is therefore limited such that the space of low-energy conformations can be searched more exhaustively.

Figure 9. Correlation between sequential and combinatorial docking of the pyridine library into DHFR, shown for two different build-up orders.

Figure 10. Compounds used for R-group 3 from the UGI library, taken from [28].

In the UGI-20000 library, the molecule was found at rank 1592, which corresponds to the top 15.9% of the database when taking into account that the library contains two stereoisomers of each compound.
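One reading of the 15.9% figure, which is an assumption and not spelled out in the text, is that the rank is related to the number of distinct compounds, that is, half of the 20 000 library entries because each compound appears as two stereoisomers. The arithmetic under that reading is:

```python
rank = 1592
library_size = 20_000                      # UGI-20000, both stereoisomers included
distinct_compounds = library_size // 2     # assumption: count each pair once

percent_of_all_entries = 100 * rank / library_size
percent_of_distinct = 100 * rank / distinct_compounds

print(f"{percent_of_all_entries:.1f}% of all entries")
print(f"{percent_of_distinct:.1f}% of distinct compounds")
```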

### Computing times

The main advantage of using the recursive combinatorial docking algorithm versus the sequential algorithm is a reduction of computing time. Computing times for all sequential and combinatorial runs are given in Table 1; calculations were performed on a 300 MHz SUN UltraSPARC workstation. The UGI libraries have been docked with the combinatorial algorithm only.

Table 1. Computing times

| Library | Molecules | Sequential: total | Sequential: per mol. | Combinatorial: order | Combinatorial: total | Combinatorial: per mol. |
| --- | --- | --- | --- | --- | --- | --- |
| Benzamidine | 92 | 5 h 4 min | 3:18 min | | 11 min | |
| Pyridine | 3864 | 3 d ᵃ | 6:42 min | 4-3-5 | 4 h 13 min | 3.9 s |
| UGI-160 | 160 | | | | | |
| UGI-20000 | 20000 | | | | | |

Calculations were performed on a SUN Ultra-30 workstation with a 300 MHz processor. ᵃ Performed on a cluster of six SUN Ultra-5 workstations.

Docking the benzamidine library sequentially took about 3:18 min per molecule on average. The combinatorial run took 11 min in total, which is 7.2 s per molecule on average. The combinatorial algorithm is therefore faster than the sequential algorithm by a factor of 27.5.
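The speed-up for the benzamidine library can be retraced from the numbers in the text. The sketch below only redoes this arithmetic; the helper `to_seconds` is introduced for illustration.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a min:s duration to seconds."""
    return 60 * minutes + seconds


n_molecules = 92
sequential_per_mol = to_seconds(3, 18)                  # 3:18 min per molecule
combinatorial_per_mol = to_seconds(11) / n_molecules    # 11 min for the whole library

# The text rounds the per-molecule time to 7.2 s before taking the ratio.
speedup_rounded = sequential_per_mol / round(combinatorial_per_mol, 1)
speedup_unrounded = sequential_per_mol / combinatorial_per_mol

print(f"{combinatorial_per_mol:.1f} s per molecule, speed-up {speedup_rounded:.1f}")
```

Without the intermediate rounding the ratio is about 27.6, so the quoted factor is insensitive to how the per-molecule time is rounded.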

For the sequential run of the pyridine library, a parallel version of FLEXX was used, running on a SUN workstation cluster with six CPUs. This run took about 3 days of real time, which is about 6:42 min per molecule on average (this is a real-time measurement and should be taken as a rough worst-case estimate, since other jobs may have been running on the cluster at the same time). For the combinatorial run, the average computing time per molecule is between 3.9 and 7.9 s, depending on the build-up order of the library. Taking into account the different processor speeds and the fact that not all of the CPU time was available, the speed-up lies approximately between 25 and 50.

For the UGI libraries, no sequential run was performed. The combinatorial run of the UGI-20000 library took 4.6 s per molecule. The two UGI libraries also demonstrate that the calculation time per molecule drops with the size of the library.

For the pyridine library, computing times for different build-up orders are given. In this case, taking the R-group with fewer instances (R5) first lowers the performance. Considering a small example with only two R-groups, say x and y, it can be shown that the average docking time per instance determines the most time-efficient build-up order: Let $n_x$, $n_y$ be the number of instances in R-groups x and y, and $t_x$, $t_y$ be the computing times for placing all instances of the respective R-group. Then the total time for build-up order x-y is $T_{xy} = t_x + n_x t_y$, implying that $T_{xy} < T_{yx}$ if $t_x/(n_x - 1) > t_y/(n_y - 1)$.
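The condition on the build-up order can be derived in a few steps. The derivation assumes, as the two-group example does, that the time $t_y$ for placing all instances of y does not depend on which instance of x is already attached, and that $n_x, n_y > 1$. With order x-y, all instances of x are placed once and, for each of the $n_x$ extended molecules, all instances of y are placed; order y-x is symmetric.

$$
\begin{aligned}
T_{xy} &= t_x + n_x t_y, \qquad T_{yx} = t_y + n_y t_x, \\
T_{xy} < T_{yx}
&\iff t_x + n_x t_y < t_y + n_y t_x \\
&\iff (n_x - 1)\, t_y < (n_y - 1)\, t_x \\
&\iff \frac{t_y}{n_y - 1} < \frac{t_x}{n_x - 1}.
\end{aligned}
$$

The R-group with the larger ratio $t/(n - 1)$ should therefore be placed first, which is consistent with the observation that starting with a small R-group is not automatically the fastest choice.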
