## Methods

FlexXc is a toolbox for docking calculations on combinatorial libraries providing functionality for handling libraries, for docking single fragments, combinations of fragments, or whole molecules from the library, and for browsing through combinatorial docking results. The reason for this design is that, depending on the size of the library and its structure (the number of R-groups and how they are connected), different docking algorithms might be useful.

In the following, we will first present the model for combinatorial libraries underlying FlexXc. Then, we will focus on the data structures developed for fast online generation of library molecules. In the third part, we will intro-

a be

Figure 1. Model of the structure of a combinatorial library: Tree representation of a library with an explicit core (a); instance of the core and R-group 1 with the X- and ^-atoms explicitly marked (b); resulting partial molecule after linking a core and an R-group 1 instance (c).

a be

Figure 1. Model of the structure of a combinatorial library: Tree representation of a library with an explicit core (a); instance of the core and R-group 1 with the X- and ^-atoms explicitly marked (b); resulting partial molecule after linking a core and an R-group 1 instance (c).

duce one of our docking methods, called the recursive combinatorial docking algorithm.

### A model for combinatorial libraries

In FlexXc a combinatorial library is represented by a rooted tree structure, the so-called library tree (see Figure 1). Each node of the tree stands for a building block, either the core or an R-group, the core is the root of the tree. For each building block including the core, several alternative molecule fragments are allowed; we will call them instances of the R-group or core in the following. The edges of the tree encode the rules for building the molecules ofthe library. We assume that atoms involved in forming a bond between two building blocks are explicitly marked by the user. We call the unique atom with which an instance is linked to the core or the previous R-group the X-atom and the atoms to which further R-groups can be added the ^-atoms (see Figure 1).

This model has some limitations: First of all, a structure of the library and the individual R-group instances are static. This means that the synthesis aspect of library design is separated from the docking process and no true de novo library design is possible. However, as soon as the chemical reactions as the basis of the library are given, this model is very flexible since various core templates and a large number of alternatives for each R-group can be preselected.

Due to the tree structure, the model does not handle ring closures between different R-groups. In those cases where the synthesized ring differs only in the ligands connected to the ring, the library tree can be defined by taking the ring as a single R-group. Note that the model is able to handle R-groups connected to R-groups.

Finally, the model has an explicit core which may be handled differently in the docking algorithms. Because we are able to handle a set of core instances, there is no real difference between a core and an R-group within the model. Therefore, each R-group can also be defined as the core. In fact, we have implemented a simple operation to change the role of the core and an R-group called core switching (see Figure 2) which will be used in one of our test cases later on.

### Data structures for combinatorial libraries

For the development of data structures for combinatorial libraries, we have the following design objectives: First, the library is handled in its closed form, i.e. it is represented by its R-group instances rather than by an enumerated list of molecules. This allows us to handle large libraries in main memory and computing time for generating library molecules is spent only if the molecules are needed for calculation. Second, the data structure must allow for an efficient construction of the library molecules including the handling of all physico-chemical data which has to be assigned for docking calculations. Third, the data structure has to allow an efficient transfer of docking results from one (partial) molecule of the library to another.

In order to explain how we reach these objectives, we give a rough description of the basic data structures within FlexX:

A molecule is stored by a set of cross-linked lists. Each list contains information about specific objects like atoms, bonds, or ring systems. Each object is labeled with the physico-chemical data assigned during the data structure initialization like interaction types and geometries assigned to atoms, low energy torsion angles assigned to bonds, and low energy ring conformations assigned to ring systems.

Figure 3. FlexX hierarchical data structures: molecules are represented as object lists. A docking solution (placement) contains two pointers (matches and conformation) into tree structures, storing paired interactions and the molecule conformation hierarchically. Both trees contain references back to the molecule.

Figure 3. FlexX hierarchical data structures: molecules are represented as object lists. A docking solution (placement) contains two pointers (matches and conformation) into tree structures, storing paired interactions and the molecule conformation hierarchically. Both trees contain references back to the molecule.

In a docking calculation, two hierarchical data structures are constructed during the incremental construction of the complex. The first data structure contains information about paired interactions and refers to atoms, the second contains information about the constructed conformation and refers to bonds and ring systems. The data structures are hierarchical in the sense that placements with a common ancestor in the incremental construction also share common parts in these two data structures. Figure 3 gives a graphical representation of this design. The advantage of these data structures is that they share common information which does not need to be copied during the incremental construction at the sacrifice ofhaving to maintain a complex linked data structure during the calculation.

If we want to transfer placement information from one library molecule to another having a part in common, we have to update all links referring from the hierarchical placement information to the molecule. This time-consuming and complicated process can be avoided only if the common parts of the two molecules are identical, i.e. are located at the same place in physical memory. We therefore use the following scheme which builds molecules from the core and R-group instances directly, avoiding copying ofmolecule data:

Each instance of a core or R-group is handled internally like a complete molecule with additional information on the connecting atoms. Two operations are developed which allow the 'virtual synthesis' of the library molecules. The extend operation adds an R-group instance to either the core or a partially synthesized library molecule. First, all object lists are linked

and the connecting X- and R-atom are removed from the atom list. In the bond and ring system list, the X-atom is replaced by the atom adjacent to the R-atom and the R-atom is replaced by the atom adjacent to the X-atom (see also Figure lb). A new bond is formed between the replacing atoms and physico-chemical data (torsion angle information) is attached to it.

The remove operation performs the inverse of the extend operation, i.e. the created bond is deleted, the X- and R-atoms are inserted and reintroduced into the bond and ring system lists, the object lists are unlinked. So far, the remove operation is only able to remove the R-group instance added last.

Based on this data structure, a combinatorial library molecule can be considered as a stack of selected R-group instances (see Figure 4). With each extend operation, we select an R-group and an instance of this R-group and put it on top of the stack. With the remove operation we are able to remove R-group instances from the top of the stack. We therefore go from one molecule of the library to another by removing some R-group instances and adding different instances. The R-group instances not changed during this process describe the part which is in common between the two molecules. The placement information calculated for this part can be used for the new molecule without modification.

### The recursive combinatorial docking algorithm

Based on the data structure described above, several docking algorithms can be easily implemented like docking of individual R-groups, core-R-group combinations, or sequential docking of the library molecules enumerated during the docking calculation. Here, we present a different method based on recursively adding and removing R-group instances.

The input of the algorithm is a three-dimensional protein structure, a library tree, and an ordering of the R-groups, called the sequential build-up order S. S defines in which order the R-groups will be added to the molecules during the docking process. S always starts with the core, it then contains

Figure 5. The enumeration process of the library can be considered as a tree once a build-up order is defined for the R-groups (shown on the left). Each tree level corresponds to adding an R-group, the leaves represent the individual library molecules. The numbers within the tree nodes identify the instance of the R-group added.

Figure 5. The enumeration process of the library can be considered as a tree once a build-up order is defined for the R-groups (shown on the left). Each tree level corresponds to adding an R-group, the leaves represent the individual library molecules. The numbers within the tree nodes identify the instance of the R-group added.

the R-groups in an order such that an R-group is always linked to one of the previous R-groups in S or to the core. Therefore, every prefix of S defines a connected partial molecule. Currently, defining the sequential build-up order is not automated and has to be done by the user. We will discuss its influence to the docking calculation in the Results section.

In the first phase, the core instances are docked to the protein. If the instances are small, only the base placement algorithm of FlexX [25] is used, otherwise the whole docking procedure consisting of automatic base selection [23], base placement, and complex construction [22] is used. Afterwards, a set of about 200 to 600 energetically favorable orientations for each core instance is available.

In the second phase, we subsequently dock the library molecules using the incremental construction algorithm. Given the sequential build-up order, all library molecules can be considered as the leaves of a large construction tree: On the first level of the tree, there is a node for each instance of the core. On the i-th level, there is a child node for each instance of the i-th R-group in S for each node of the (i - 1)-st level (see Figure 5). With the recursive combinatorial docking algorithm, we perform a preorder traversal ofthis construction tree: The first instance of each R-group is added until we reach the first molecule of the library (the left-most leave of the construction tree). Then all molecules are built traversing the tree depth-first and from left to right.

The input of the recursive procedure is the partially constructed molecule m, the selection of placements P calculated for m, and the partial sequence S' ofthe build-up order sequence S containing R-groups which have not been added so far (the postfix of S). In order to describe the algorithm, we use the following terms: head(S') is the first element of S'; tail(S') is S' except of the first element head( S').

If S' is empty, m is a complete library molecule and the placements in P are stored or evaluated further. Otherwise, the R-group head(S') is added in the current recursive call, For all instances r of head(S') the following steps have to be performed:

First the current molecule m is extended by r using the extend operation of the combinatorial library data structure explained above. Then, the incremental construction algorithm is executed to extend the placements in set P by the added instance r of the molecule to the new placement set P'. Next, the recursive procedure is called again with the extended molecule, the newly calculated P', and the build-up sequence tail(S') skipping the first R-group of S'. When the recursive call is finished, two clean-up operations are necessary. First, the placement set P' is deleted, then the R-group instance r is released from the molecule again. An outline of the recursive algorithm is given in Figure 6.

In order to initiate the recursive calculation, the algorithm is called for each core instance with the placements from the first phase and the build-up sequence without the core, tail( S). For further evaluation, the highest scoring placements for each library molecule are stored to disk. For this task, we use a priority queue of files which allows to always keep exactly the docking results for the k highest scoring library molecules (where k is user defined).

## Post a comment