Results and discussion

Flexsim-S

Three different approaches to virtual affinity fingerprints have been developed in our group. In our previous work, we employed the docking programs DOCK and FlexX to dock test ligands into a reference panel of binding pockets taken from the PDB. The first of these approaches, DOCKSIM, used a rigid docking algorithm and a panel of eight binding pockets selected more or less arbitrarily. To optimize the size and composition of the reference panel and to introduce flexible docking, we then developed the Flexsim-X method. Finally, in this article we present an extension, Flexsim-S, which uses the molecular superposition program FlexS to generate virtual affinity fingerprints. To optimize its reference panel, we applied the same genetic algorithm-based optimization protocol described in detail for Flexsim-X [14]. The optimum, measured as mean sample hit rate, was obtained with 44 out of 100 reference molecules.
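The panel optimization can be pictured as a search over subsets of the 100 candidate reference molecules. The sketch below is a generic genetic algorithm for subset selection, not the published Flexsim protocol: the bit-mask encoding, binary tournament selection, one-point crossover, mutation rate and population settings are all assumptions. `evolve_panel` and `toy_fitness` are hypothetical names, and `toy_fitness` is only a stand-in for the mean sample hit rate, which in practice would be computed from the fingerprints restricted to the selected panel.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # one bit per candidate reference molecule (1 = in panel)


def evolve_panel(n_refs: int,
                 fitness: Callable[[Sequence[int]], float],
                 pop_size: int = 30,
                 generations: int = 60,
                 mutation_rate: float = 0.02,
                 seed: int = 0) -> Mask:
    """Generic GA: binary tournament selection, one-point crossover,
    bit-flip mutation and elitism. Returns the best mask found."""
    rng = random.Random(seed)
    pop: List[Mask] = [[rng.randint(0, 1) for _ in range(n_refs)]
                       for _ in range(pop_size)]
    best = max(pop, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]

        def pick() -> Mask:
            # Binary tournament: the fitter of two random individuals wins.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        children: List[Mask] = [list(best)]  # elitism: carry the best over
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_refs)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with prob. mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


# Hypothetical stand-in for the mean sample hit rate: 44 of 100 candidates
# are secretly "informative" (+1 each); every other selected one costs 0.5.
INFORMATIVE = set(random.Random(42).sample(range(100), 44))


def toy_fitness(mask: Sequence[int]) -> float:
    return sum(1.0 if i in INFORMATIVE else -0.5
               for i, bit in enumerate(mask) if bit)


panel = evolve_panel(100, toy_fitness)
print(sum(panel), toy_fitness(panel))
```

Elitism guarantees that the best panel found never gets worse from one generation to the next; with a real fitness function, each evaluation would involve recomputing hit rates over the whole test set, which is the costly part.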

Descriptor validation and comparison

To validate our methods and to place them in context with some popular 2D descriptors (DAYLIGHT fingerprints, ISIS MOLSKEYS, FTREES and Tripos' Molecular Holograms), we set out to answer two questions:

First, how well do our affinity fingerprints classify compounds according to their activity classes? Second, do these methods yield hit lists that are similar or complementary to those of the existing 2D approaches?

To answer the first question, we used a standard ligand test set of approximately 1000 compounds. As described in the Methods section above, mean sample hit rates were calculated to gauge the predictive power of each descriptor set. Note that a meaningless descriptor (e.g. one equivalent to a random assignment of activity classes) would yield a mean sample hit rate of about 9% for this data set. Any figure above that therefore indicates enrichment, i.e. classification better than random.
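Under one plausible reading of the hit-rate definition, the calculation and its random baseline can be sketched as follows. We assume here (the Methods section may define it differently) that the k nearest neighbors of each compound under a descriptor distance are retrieved, and the fraction sharing the query's activity class is averaged over all compounds. If neighbors were drawn at random, the expected value would be the sum over classes of n_c(n_c - 1) / (N(N - 1)), where n_c is the size of class c and N the size of the data set. The function names and the precomputed distance-matrix input are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def mean_sample_hit_rate(dist: Sequence[Sequence[float]],
                         labels: Sequence[Hashable],
                         k: int) -> float:
    """Assumed definition: average over all queries of the fraction of the
    k nearest neighbours (query excluded) that share the query's class."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: dist[i][j])[:k]
        hits = sum(1 for j in nearest if labels[j] == labels[i])
        total += hits / k
    return total / n


def random_baseline(labels: Sequence[Hashable]) -> float:
    """Expected hit rate if neighbours were drawn at random:
    sum_c n_c * (n_c - 1) / (N * (N - 1))."""
    n = len(labels)
    counts = Counter(labels)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Toy data: two well-separated classes on a line (hypothetical).
xs = [0, 1, 2, 3, 4, 100, 101, 102, 103, 104]
labels = ["a"] * 5 + ["b"] * 5
dist = [[abs(x - y) for y in xs] for x in xs]
print(mean_sample_hit_rate(dist, labels, k=3))  # 1.0
print(random_baseline(labels))
```

As a purely hypothetical illustration (not the actual class composition of our test set), eleven equally sized classes of 90 compounds each give a random baseline of roughly 9%.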

Results are given in Figure 1, with the descriptors sorted by decreasing mean sample hit rate. With values between 57 and 71%, all four 2D methods outperform our affinity fingerprints. Nevertheless, the optimized Flexsim-X and Flexsim-S panels (50 and 48%, respectively) perform only slightly worse than, e.g., DAYLIGHT fingerprints or Tripos' Holograms.

These findings agree with results from other researchers, who observed that 2D descriptors outperform 3D approaches in many cases. As discussed in the Introduction, we believe this is partly due to the strong bias towards 2D similarity in many validation sets, including ours. Taking this into account, we consider it encouraging that our methods approach the 2D results so closely.

Compared with our older DOCKSIM method (mean sample hit rate of 26%), Flexsim-X and Flexsim-S show a marked improvement in overall performance. Several factors might account for this:

- DOCKSIM employs the rigid docking algorithm of DOCK 3.5, whereas Flexsim-X allows flexible adaptation of the ligands.

- The docking algorithms and the scoring functions of DOCK and FlexX differ considerably.

- For Flexsim-X and Flexsim-S, we carefully optimized the reference panel compositions, yielding 41 and 44 members, respectively. For DOCKSIM, only eight binding pockets were selected, more or less arbitrarily.

Figure 1. Mean sample hit rates for the 2D descriptors (dark grey) and the virtual affinity fingerprints (light grey).

We believe that the last of these factors, the optimization of the reference panel, contributes most to the improvement. To verify or refute this hypothesis, we plan to run the same reference panel optimization procedure with both DOCK 3.5 and the latest DOCK version 4.0 [25], which allows flexible docking.

As discussed in the Introduction and demonstrated in our previous work on virtual affinity fingerprints, these descriptors can find 'surprising' similarities amongst molecules from the same activity class, i.e. similarities that are obvious neither to a 'chemist's eye' nor to the 2D fingerprint methods described above. To examine this in more detail, we systematically calculated the hit list overlaps, based on the 20 nearest neighbors of each compound, for every descriptor combination (see Methods section above).

Figure 2. Result of the descriptor comparison (expressed as mean % overlap) for each descriptor combination (black: 2D vs 2D descriptors; grey: 2D vs 3D descriptors; white: 3D vs 3D descriptors).

Results are given in Figure 2 in terms of mean percentage of overlap.
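The overlap statistic can be sketched in the same way. Assuming that the overlap for one query is the number of compounds shared by its two 20-nearest-neighbor lists divided by 20, averaged over all queries and expressed as a percentage, a minimal version taking two precomputed distance matrices might look like this. `nearest`, `mean_overlap` and the toy helper `line_dist` are hypothetical names, not part of the original method description.

```python
from typing import List, Sequence, Set


def nearest(dist_row: Sequence[float], i: int, k: int) -> Set[int]:
    """Indices of the k compounds closest to query i (query excluded)."""
    others = [j for j in range(len(dist_row)) if j != i]
    return set(sorted(others, key=lambda j: dist_row[j])[:k])


def mean_overlap(dist_a: Sequence[Sequence[float]],
                 dist_b: Sequence[Sequence[float]],
                 k: int = 20) -> float:
    """Mean percentage of neighbours shared by two descriptors' hit lists."""
    n = len(dist_a)
    total = 0.0
    for i in range(n):
        shared = nearest(dist_a[i], i, k) & nearest(dist_b[i], i, k)
        total += len(shared) / k
    return 100.0 * total / n


def line_dist(xs: Sequence[float]) -> List[List[float]]:
    """Hypothetical toy descriptor: absolute distance on a line."""
    return [[abs(x - y) for y in xs] for x in xs]


da = line_dist([0, 1, 3])
db = line_dist([0, 2, 3])
print(mean_overlap(da, da, k=1))  # 100.0
print(mean_overlap(da, db, k=1))
```

The measure is symmetric in the two descriptors, so a single value per descriptor pair suffices; with a realistic data set, `k` would be 20 as in the text.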

2D vs 2D

The overlap percentage ranges from 47% (MOLSKEYS vs Holograms) up to 63% (MOLSKEYS vs DAYLIGHT fingerprints).

2D vs 3D

Even the highest overlap value of 40% (Flexsim-S vs FTREES) is lower than the lowest overlap in the 2D vs 2D comparison. The other values range from 17 to 33%.

3D vs 3D

The overlap values for our three different virtual affinity fingerprint methods are all around 20%.

The following conclusions may be drawn:

- Despite the differences in how the 2D descriptors are calculated, they all yield similar hit lists, leading to high overlap values.

- The 2D vs 3D results show that hit lists obtained by virtual affinity fingerprints are truly complementary to those obtained by standard 2D methods. This complementarity is particularly remarkable given that both the 2D and the optimized 3D descriptors enrich compounds from the same activity class to a similar extent.

- Most surprising to us was the finding that the overlap values amongst the three affinity fingerprint methods (3D vs 3D) are all very low. Each approach seems to have its special characteristics, which again raises hopes for complementarity in similarity search or activity prediction results.
