Discussion and conclusions

In this study, distance-dependent pair preferences and SAS-dependent singlet preferences are derived from crystallographically determined protein-ligand complexes. The scoring function DrugScore incorporates both terms and shows very promising results. It discriminates satisfactorily between well-docked ligand binding modes (rmsd < 2.0 Å) and largely deviating ones generated by the docking tool FlexX. This is demonstrated for two test sets comprising 91 and 68 complexes, respectively. A substantial improvement of 35% is achieved compared to the original FlexX scoring.
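The 2.0 Å criterion can be illustrated with a minimal sketch. It assumes that both poses are given in the same protein coordinate frame with identical atom ordering, and it applies no superposition or symmetry correction; the function names and the three-atom coordinates are hypothetical and not taken from DrugScore or FlexX.

```python
import numpy as np

WELL_DOCKED_RMSD = 2.0  # Å, the threshold quoted in the text


def rmsd(pose, reference):
    """Rmsd between two poses given in the same frame, same atom order."""
    pose = np.asarray(pose, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if pose.shape != reference.shape:
        raise ValueError("poses must have identical atom counts and ordering")
    # Mean squared atomic displacement, then square root.
    return float(np.sqrt(np.mean(np.sum((pose - reference) ** 2, axis=1))))


def is_well_docked(pose, reference, threshold=WELL_DOCKED_RMSD):
    """True if the pose deviates from the reference by less than the threshold."""
    return rmsd(pose, reference) < threshold


# Hypothetical three-atom ligand: crystal pose and two rigidly shifted poses.
crystal = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
near_pose = crystal + np.array([1.0, 0.0, 0.0])  # shifted by 1 Å
far_pose = crystal + np.array([3.0, 0.0, 0.0])   # shifted by 3 Å
```

With these toy coordinates, `near_pose` counts as well-docked and `far_pose` does not; a scoring function is then judged by whether it ranks poses of the first kind best.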

DrugScore's ability to predict binding affinities is assessed by correlating experimentally determined pKi values with the computed scores. Both protein-ligand complexes taken from the PDB and sets of docked ligands were investigated. Compared to currently applied scoring functions, DrugScore yields lower standard deviations.

Most remarkably, the composite picture of spherical pair potentials in DrugScore implicitly contains information about the directionality of interactions. It has the predictive power to suggest the positions in space that are most favorable for particular ligand atom types.

Knowledge-based approaches are assumed to be general since they implicitly incorporate even those effects that are not yet fully understood. Because of its mean-field character, the conversion of structural database information into statistical preferences accounts for entropy effects arising from cooperativity and changes in solvation. Moreover, less frequently populated states receive lower statistical preferences, which implicitly penalizes computer-generated artifacts. Additionally, since no explicit training set is used, in contrast to the derivation of, e.g., regression-based scoring functions, our scoring function should be generally applicable.
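How rarely populated states come to be penalized can be sketched with a generic inverse-Boltzmann conversion of observed frequencies into preferences. This is only an illustration of the principle: the normalization, reference state and binning actually used for DrugScore are not given in this section, and the function name, pseudocount and counts below are assumptions.

```python
import numpy as np


def statistical_preference(observed_counts, reference_counts, pseudocount=1e-6):
    """Generic inverse-Boltzmann preference per bin: -ln(p_observed / p_reference).

    Bins populated less often than the reference receive positive
    (unfavorable) values; over-populated bins receive negative values.
    """
    observed = np.asarray(observed_counts, dtype=float)
    reference = np.asarray(reference_counts, dtype=float)
    p_observed = observed / observed.sum()
    p_reference = reference / reference.sum()
    # The pseudocount (an assumption of this sketch) avoids log(0) for empty bins.
    return -np.log((p_observed + pseudocount) / (p_reference + pseudocount))


# Hypothetical counts of one atom-pair type in three distance bins,
# compared with a uniform reference distribution.
observed = [10, 60, 30]
reference = [100, 100, 100]
preferences = statistical_preference(observed, reference)
```

In this toy case the over-populated middle bin receives a negative (favorable) value, while the sparsely populated first bin receives the largest positive value.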

Hydrogen atoms are not explicitly considered in our scoring function. Most complexes in the PDB either lack hydrogen atoms or contain only hydrogen atoms placed according to force-field assumptions. However, the positions of polar hydrogen atoms in particular depend strongly on their molecular environment. Changes in the electrostatic field of a protein upon ligand binding might result in substantial pKa shifts of ionizable groups. Consequently, defining protonation states a priori, e.g. during a docking experiment, is by no means straightforward. Although at first glance neglecting H-atom positions appears to imply a loss of information about the directionality of polar interactions, the composite consideration of manifold pair preferences in a compact molecular environment recovers these features (Figure 10).

By visualizing calculated hot spots, we demonstrated that this directionality in protein-ligand interactions is implicitly included in our distance-dependent pair potentials. Similar considerations about anisotropic interactions resulting from the summation of individually isotropic contributions led to the correct description of, e.g., directional hydrogen bonds in force fields that take into account only Lennard-Jones-type and electrostatic interactions [60]. We believe that this important property of our approach can be attributed to the comparatively short upper limit of 6 Å considered during the compilation of our potentials. While binding affinity is largely determined by the amount of buried non-polar surface, the specificity of ligand binding is mainly attributed to directional interactions such as hydrogen bonds [48].
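The way a sum of individually isotropic terms becomes directional can be sketched with a toy model. The pair potential below is a hypothetical well with its minimum at 3 Å and is not a DrugScore potential; only the 6 Å cutoff is taken from the text, and the two-atom "protein" and all names are assumptions.

```python
import numpy as np


def toy_pair_potential(r, r0=3.0, cutoff=6.0):
    """Hypothetical isotropic well with minimum -1 at r0, zero beyond the cutoff."""
    r = np.maximum(r, 0.5)  # clamp to avoid division by zero at atom centres
    x = (r0 / r) ** 6
    return np.where(r <= cutoff, x * x - 2.0 * x, 0.0)


def field(points, protein_atoms, r0=3.0, cutoff=6.0):
    """Sum the isotropic pair terms of all protein atoms at each probe point."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    atoms = np.asarray(protein_atoms, dtype=float)
    distances = np.linalg.norm(points[:, None, :] - atoms[None, :, :], axis=-1)
    return toy_pair_potential(distances, r0, cutoff).sum(axis=1)


# Two hypothetical protein atoms, 3 Å apart.
protein = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])

# Two probe positions, both exactly 3 Å from the first atom.
bridging = [1.5, np.sqrt(6.75), 0.0]  # also 3 Å from the second atom
one_sided = [0.0, 0.0, 3.0]           # about 4.24 Å from the second atom

scores = field([bridging, one_sided], protein)
```

Both probe positions lie on the same sphere around the first atom, yet the bridging position scores lower (more favorable). The summed field around that atom is therefore anisotropic even though each term depends on distance alone; evaluating `field` on a grid and contouring its minima would give a crude analogue of the hot spots discussed in the text.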

The graphical display of a knowledge-based scoring function in terms of 'hot spots' in a binding pocket suggests further applications. Highlighting the regions of a binding site where a particular type of ligand atom appears to be most favorable allows them to be used as an interactive design tool. Nota bene, these 'hot spots' do not simply represent regions of favorable energy but also include entropic contributions. Additionally, this information should be used in a docking tool to drive the initial ligand placement.

For comparison, data in the Cambridge Structural Database (CSD) [61] can be used to derive statistical preferences of intermolecular interactions [57]. These make it possible to develop a scoring function that discriminates between different computer-generated crystal packings [62]. Furthermore, the additional data call for a more sophisticated consideration of atom types. However, mixing data from the PDB with those from the CSD involves the following fundamental problem: protein-ligand complexes are usually crystallized from water, whereas the overwhelming majority of small organic molecules are crystallized from organic solvents. As a consequence, for a quantitative correlation as anticipated in our analysis, the influence of the hydrophobic effect is expected to be smaller in the data derived from the CSD than in those from the PDB. This influence was first recognized by Verdonk et al. during the development of SuperStar [38].

The recent studies of Verkhivker et al. [28], Muegge et al. [33] and Mitchell et al. [35] use a similar formalism to derive potentials. These approaches are difficult to compare, since hardly any of these studies examines how well the potentials discriminate the native pose from other binding modes. In our opinion, this is the most crucial prerequisite for estimating binding affinities in virtual screening. A scoring function shown to operate satisfactorily on crystal structures does not necessarily handle computer-generated, often artificial and incorrect, binding modes equally well.

To compare different scoring functions with respect to the prediction of absolute binding affinities, one has to remember that R² values (but not the standard deviations) depend heavily on the composition of the data sets considered. Accordingly, we compared DrugScore's performance with that of other scoring functions in terms of the standard deviations achieved. Nevertheless, the trends observed in Figure 8 are similarly reflected once the R² values are computed, since identical data sets were used for comparison. Notably, except for the mixed set ('others'), DrugScore performs better than the other available knowledge-based approaches. Presumably, for virtual screening applications, it is more important to correctly predict the binding affinities of different ligands with respect to one selected protein than to correctly rank mixed sets of various protein-ligand complexes. In this study, the standard deviation for the docked thrombin and trypsin inhibitors falls below one log unit. This demonstrates DrugScore's power to predict binding energies for computer-generated ligand geometries. Nevertheless, since the measured pKi values are assumed to carry an even smaller error, the experimental accuracy is not yet matched. The larger standard deviations of 1.5 to 1.7 log units in the case of the docked thermolysin inhibitors must be seen in light of the presumably much larger scatter in the experimental data, since these data were collected from several different sources with differing assay conditions. In addition, owing to the increasing conformational complexity of the ligands, FlexX does not detect reasonable geometries in all of the cases. Scoring unlikely geometries, however, cannot be expected to predict affinity reliably.
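The different sensitivity of R² and the standard deviation to data-set composition can be made concrete with a small numerical sketch. The data are hypothetical and not taken from this study: two sets of eight idealized scores (already on the pKi scale) receive exactly the same error pattern of ±0.5 log units and differ only in the pKi range they span.

```python
import numpy as np


def fit_statistics(score, pki):
    """Least-squares line pKi = a * score + b; returns (R^2, standard deviation)."""
    score = np.asarray(score, dtype=float)
    pki = np.asarray(pki, dtype=float)
    slope, intercept = np.polyfit(score, pki, 1)
    residuals = pki - (slope * score + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((pki - pki.mean()) ** 2))
    sd = float(np.sqrt(ss_res / (len(pki) - 2)))  # two fitted parameters
    return 1.0 - ss_res / ss_tot, sd


# Identical error pattern for both sets; it is orthogonal to a constant and to
# a linear trend, so the fitted line is unaffected by it.
noise = 0.5 * np.array([1, -1, -1, 1, 1, -1, -1, 1])

narrow_scores = np.linspace(6.0, 7.0, 8)   # affinities spanning 1 log unit
wide_scores = np.linspace(3.0, 10.0, 8)    # affinities spanning 7 log units

r2_narrow, sd_narrow = fit_statistics(narrow_scores, narrow_scores + noise)
r2_wide, sd_wide = fit_statistics(wide_scores, wide_scores + noise)
```

Both sets give the same standard deviation (about 0.58 log units), whereas R² rises from 0.30 for the narrow set to about 0.95 for the wide one. Comparisons based on R² are thus meaningful only when, as here, identical data sets are used.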

Currently, DrugScore is being implemented in FlexX. We expect this to improve the incremental ligand build-up and placement procedure, mainly by reducing the number of generated solutions. With respect to the affinity predictions, we expect further enhancement once water molecules are included in our considerations.
