## Results

Statistical preferences of ligand-protein atom pairs

Correlation functions for pairs of ligand-protein atoms were derived by counting their occurrence frequencies at discrete distances. Subsequently, statistical preferences were computed using Equation 1. In the following, we will focus on some illustrative examples, depicted in Figures 1 and 2, out of 289 possible atom-pair combinations. In all cases, the first atom-type index i is attributed to a ligand atom, the second j to a protein or cofactor atom.

The derived preferences can be divided into two main classes: the first contains interactions between polar and charged atoms and exhibits pronounced minima at distances between 2.5 and 3.0 A. They correspond to hydrogen-bonds and salt-bridges (Figure 1). The second class comprises non-polar interactions and displays broader minima which reveal favorable interactions at distances >3.5 A (Figure 2).

Going from O.3-O.3 to O.3-O.co2 and O.co2-N.p13, the minima corresponding to the shell of next neighbors fall into a decreasing distance range,

thus exhibiting favorable interactions at lower distances. Contacts between the above-mentioned atom-types can be assigned to a 'normal' hydrogen-bond, a polar charge-assisted interaction and a salt-bridge [48]. Expressed in terms of statistical preferences (Equation 1), an ideal O.3-O.3 interaction is 2.5 times less favorable than a similar O.3-O.co2 interaction. For the O.co2-N.pl3 atom pair one has to take into account that in a bidentate salt bridge between carboxylate and amidinium/guanidinium this interaction is counted twice.

For nonpolar contacts (Figure 2), the C.ar-C.ar interaction shows a slightly more structured preference compared to the (2.3-C.3 interaction and the minimum of the former resides at a shorter distance of 3.7 A, in agreement with the well-known aromatic-aromatic interactions [49]. In contrast, C.3-C.3 interactions do not show any distinct preference of the atom pair distribution over the entire distance range of 4 to 6 A of favorable interactions. This is clearly in agreement with the well-known fact that the latter type of interaction hardly exhibits any directional preferences.

Figure 3. Statistical preferences for ligand atoms of type C.3 and O.co2 as calculated from the distribution functions for solvent accessibility of both atom types for complexed and separated state from the protein according to Equation 2. The number of cubes (#cubes) is an approximate measure for the solvent accessibility; zero cubes refer to complete burial.

Figure 3. Statistical preferences for ligand atoms of type C.3 and O.co2 as calculated from the distribution functions for solvent accessibility of both atom types for complexed and separated state from the protein according to Equation 2. The number of cubes (#cubes) is an approximate measure for the solvent accessibility; zero cubes refer to complete burial.

Table 1. Results for scoring multiple docking solutions of 91 protein-ligand complexes generated by FlexX and DrugScore

% of complexes with solutions exhibiting rmsd of the crystal structure

Table 1. Results for scoring multiple docking solutions of 91 protein-ligand complexes generated by FlexX and DrugScore

% of complexes with solutions exhibiting rmsd of the crystal structure

<1.0A |
<1.5 A |
<2.0A |
>2.0A | |

All ranks- |
65 |
76 |
84 |
16 |

1st rankb | ||||

FlexX |
20 |
37 |
54 |
46 |

DrugScore |
39 |
66 |
73 |
27 |

Improvement |
95 |
78 |
35 |
-41 |

a All solutions of each docking experiment for the 91 complexes are considered. This number expresses the portion of all complexes for which at least one solution with the given rmsd was computed by FlexX.

b Only the ligand geometry scored to be on the 1st rank by either FlexX or DrugScore is considered. The numbers are related to the ones in the first line.

c Calculated by (%DrugScore - %FlexX)/%FlexX.

a All solutions of each docking experiment for the 91 complexes are considered. This number expresses the portion of all complexes for which at least one solution with the given rmsd was computed by FlexX.

b Only the ligand geometry scored to be on the 1st rank by either FlexX or DrugScore is considered. The numbers are related to the ones in the first line.

c Calculated by (%DrugScore - %FlexX)/%FlexX.

SAS-dependent preferences for solvent-exposure of ligand and protein atoms

To incorporate solvent effects we derived a SAS-dependent singlet preference either for ligand and protein atoms according to Equation 2. Figure 3 shows the statistical preferences for a C.3 or O.co2 ligand atom to be exposed in the protein-bound state compared to the unbound state.

Except for very small surface portions of remaining solvent exposed at the binding site, complete burial of C.3 atoms in protein-ligand complexes is strongly favorable for complex formation.

A completely different, although quite reasonable behavior is observed for O.co2-type atoms. In the complexed state, the distribution results as a compromise between several effects. On the one side, burial of O.co2 atoms diminishes the SAS. On the other side, surface portions contacting polar protein atoms are not considered as being buried. Hence, 'partial' burial for O.co2 results as best compromise for the complexed state. Expressed in terms of preferences, a 'partial' burial favors complex formation in contrast to a complete burial.

'Scoring scoring functions '

To assess the quality of a generated docking solution in terms of its deviation from the experimental structure, we define those ligand poses as 'well docked' that deviate by not more than 2.0 A, root-mean-square deviation (rmsd) from the crystal structure. Visual inspection of a number of test cases showed that within this limit the generated solutions resemble the native binding mode. (For detailed statistics on ligand poses with rmsd <1.0 A, and <1.5 A, from the crystal structure, see Table 1 .)

To test the discriminatory power of a scoring function in rendering prominent well-docked solutions from obvious artifacts, we checked how often 'well-docked' solutions correspond to the best rank. This criterion meets the requirement for virtual screening of large databases: due to the immense data flow a detailed investigation of only the best hits will be feasible and has to focus on the best ranked solutions. Assuming that the total statistical preference renders prominent the native protein-ligand pose, the experimental solutions should be ranked best. While applying this most stringent criterion one has to keep in mind that crystal structures of proteins are determined to limited resolution. Accordingly, we alleviate our criterion and allow also solutions deviating by less than 2.0 A from the experimentally given structure to be ranked better than the crystal structure.

The precision of predicted binding affinities is limited by the accuracy of the experimentally determined ones. For data obtained from different laboratories, an error of one order of magnitude with respect to the binding constant is assumed [50,51] whereas a mean error of perhaps 20% is found for sets of related ligands binding to the same protein measured in one laboratory by the same person in an assay with unchanged conditions.

Correlation of the calculated scores versus rmsd of the crystal structure

Total preferences were calculated (Equation 3) for docking solutions obtained by FlexX. In Figure 4 the correlations of calculated scores versus the rmsd from the crystal structure for four protein-ligand complexes are displayed; the individual scores were normalized to fall between 0 and 100.

In all cases, the crystal structure (rmsd = 0) corresponds to the best score. For lbbp, llst and 2ada, the native geometry is well separated from any computed solution.

Figure 4. Correlations of scores, normalized to values between 0 and 100, versus the rmsd from the crystal structure for the protein-ligand complexes lbbp, 11st, lrbp, and 2ada are depicted. The percentage of nonpolar (i.e. carbon) ligand atoms varies from 52% for 2ada to 95% for 1rbp, the number of rotatable bonds from 1 (1rbp, 2ada) to 6 (1bbp) and the number of solutions generated by FlexX from 99 (1rbp) to 289 (llst). Favorable ligand geometries correspond to low scores on the ordinate.

Figure 4. Correlations of scores, normalized to values between 0 and 100, versus the rmsd from the crystal structure for the protein-ligand complexes lbbp, 11st, lrbp, and 2ada are depicted. The percentage of nonpolar (i.e. carbon) ligand atoms varies from 52% for 2ada to 95% for 1rbp, the number of rotatable bonds from 1 (1rbp, 2ada) to 6 (1bbp) and the number of solutions generated by FlexX from 99 (1rbp) to 289 (llst). Favorable ligand geometries correspond to low scores on the ordinate.

Validation of the approach using two test sets of 91 and 68 protein-ligand complexes

The successful reproduction of some test cases indicates the scope of the method; however, to rigorously validate our method we studied two sets of protein-ligand complexes.

Both test sets are taken from the sample used to validate FlexX [52]. The considered ligands cover a broad range of chemical diversity. In the first case (91 complexes), FlexX has already recognized a generated solution with rmsd <2.0 A on the first rank in 54% of the cases. For the remaining 46%, FlexX also generates a geometry with rmsd <2.0 A, however, not ranked as best solution.

The second test set contains 68 additional complexes. Out of these, for 28 cases a computed solution with an rmsd <2.0 A, is found by FlexX on rank 1.

Figure 5. Accumulated number of complexes as a function ofthe rmsd from the crystal structure found for ligand poses on rank 1 scored by FlexX (O) and DrugScore (A), respectively. The number of complexes with the best geometry found on any rank by FlexX is depicted as solid squares.

For 38 remaining cases, FlexX did not generate a pose with rmsd <2.0 A, using default settings. This second set was not involved in the development and parameter adjustment and served as a validation set for the scoring function.

All complexes used for validating have been excluded from the database used to derive the probability distributions and statistical preferences.

Recognition of'well-docked' solutions

Figure 5 summarizes the accumulated number of complexes plotted versus the rmsd with respect to the crystal structure of the best ranked ligand pose either determined by FlexX or by Drugscore. In addition, the FlexX solution with the smallest rmsd disregarding its actual rank is plotted. It gives an idea how well an ideal scoring function could perform and shows how often FlexX generates well-docked solutions for the present data set.

Apparently, the new scoring function performs significantly better than the one implemented in FlexX. Table 1 gives the percentages of cases found on rank 1 with an rmsd of <1.0, <1.5, <2.0 and >2.0 A, compared to the best approximating geometry found on any rank. With respect to identifying well-

150-

cc tr

100-

100-

Figure 6. The rank of the crystal structure calculated by DrugScore among all decoy geometries generated by FlexX for each of the 91 protein-ligand complexes of the first test set is shown,

### Complex ID

Figure 6. The rank of the crystal structure calculated by DrugScore among all decoy geometries generated by FlexX for each of the 91 protein-ligand complexes of the first test set is shown, docked solutions on rank 1, FlexX succeeds in 54% of the cases whereas DrugScore detects 73%.

For the second test set (68 complexes), out ofthe 30 satisfactorily docked solutions, FlexX and DrugScore are equally successful (93 and 92%, respectively).

### Recognition of the crystal structures

As mentioned above, assuming that the crystallographically determined structures are 'optimal' solutions, they should obtain the best scoring compared to any computer-generated ligand pose. As shown in Figure 6, this is actually the case for 54% of the examples of the first test set (91 complexes). If we alleviate the criterion to well-docked solutions, in 7 1% of the cases the crystal structure or a close-by solution is ranked best by Drugscore. Applying the scoring function to the second test set, 65% of all crystal structures are found on rank 1. Here the alleviated conditions are even fulfilled in 90% of all cases.

### Prediction of binding affinities

To assess the predictive power of DrugScore to estimate binding affinities, a scoring value is calculated (Equation 3) for different sets of protein-ligand complexes taken from literature [15,16,50]. The obtained values are scaled versus the experimental results (Equation 4). and the squared regression coefficient, the standard deviation and the maximal deviation are determined.

Two different tests are performed for reason of comparison. DrugScore is applied to complexes of known X-ray structure and binding affinities. Additionally, to asses how well the near-native binding modes are recognized and their affinities are predicted, a series of ligands is docked with FlexX.

Predicting binding affinities for data sets of PDB crystal structures Data sets of PDB protein-ligand complexes are taken from Eldridge et al. [50] and Böhm [ 15,161 comprising serine proteases, metalloproteases, L-arabinose binding proteins, endothiapepsins and mixed samples composed of different proteins. Similar data have recently been used by Muegge and Martin [33] to compare their scoring function to SCORE1 [15] and SMoG score [31]. Table 2 and Figure 7 summarize the statistics on the performance of DrugScore on six different data sets.

The set of serine proteases comprises 16 trypsin and thrombin protein-ligand complexes (Figure 7a). The best correlation with R2 = 0.86 is achieved for this test set. Examples of low-binding affinities (pKt's <2) (lbra, 1tni, 1tnj, 1tnk, 1 tnl) are predicted to bind tighter by about 1.5 orders of magnitude.

The second set of metalloproteases contains 15 complexes of carboxypepti-dase A, thermolysin, and collagenase (Figure 7b) (R2 = 0.70). The largest deviation is obtained for 6tmn with 3.3 orders of magnitude (see also section on 'Visualization of 'hot spots' in protein binding pockets').

The third set comprises 11 endothiapepsin complexes. A weak correlation of R2 = 0.30 is found. Nevertheless, the weakest binding ligand (PD125754 in 1eed) as well as the strongest one (H-261 in 2er7) are correctly recognized compared to examples with intermediate affinity (pKt's between 6 and 8).

Set no. 4 consists of 9 arabinose-binding protein-ligand complexes. Since each of the nine crystal structures contains two epimers of the sugars, for both of them scoring values are calculated separately. However, the computed values for each pair of epimers are nearly identical. Thus, even though different amounts of a - or ß -form of each sugar have been present during experimental affinity determinations, due to the mutarotation equilibrium this fact has no consequences for our computer experiment. The correlation obtained is the worst of all data sets with R2 = 0.22, although the standard deviation only amounts to 0.75 log units. Including the crystallographically determined water molecules 309, 310 and 311 does not yield any improvement. These

Table 2. Statistical parameters of the correlations between experimentally determined binding affinities andthose calculated by DrugScore

Data set No. of pKi R2 SDa MDa log(-cs)b complexes range

Serine proteases |
16 |
7 |
0.86 |
0.95 |
1.50 |
-1.55 |

Metalloproteases |
15 |
10 |
0.70 |
1.53 |
3.32 |
-1.49 |

Endothiapepsins |
17 |
4 |
0.30 |
0.94 |
1.67 |
-1.86 |

Arabinose-binding proteins |
18 (9)c |
3 |
0.22 |
0.75 |
1.30 |
-1.23 |

'Others'd |
17 |
8 |
0.43 |
1.85 |
3.39 |
-1.53 |

Bohm1998e |
71 |
13 |
0.33 |
2.21 |
7.22 |
-1.48 |

Bohm1998 (I)ff |
49 |
10 |
0.44 |
1.79 |
4.52 |
-1.44 |

Bohm1998 (II)g |
46 |
10 |
0.56 |
1.53 |
4.36 |
-1.44 |

'Mixed set'h |
55 |
10 |
0.44 |
1.80 |
4.27 |
-1.43 |

Thrombin/trypsin inhibitors' |
64 |
5 |
0.48 |
0.71 |
1.25 |
-1.59 |

Thermolysin1 |
61 |
10 |
0.35 |
1.70 |
4.00 |
-1.59 |

Thermolysin (I)k |
43 |
10 |
0.43 |
1.68 |
3.90 |
-1.58 |

Thermolysin (II)1 |
15 |
5 |
0.36 |
1.53 |
3.00 |
-1.58 |

Thermolysin (III)m |
14 |
5 |
0.50 |
1.39 |
3.27 |
-1.60 |

a The standard deviation (SD) and the maximum deviation (MD) are given in units of pKi. b cs is calculated as described in the text.

c Each epimer found in the crystal structure of 9 arabinose-binding proteins is treated separately.

d This set contains complexes of the combined training and test sets of Bohm's work [15] that were not found in the four data sets above and show a resolution of less than or equal to 2.5 À.

e Only protein-ligand complexes found in the PDB are taken from the training and test set fof Bohm'swork[16].

f Subset of Bohm1998 that contains only complexes that show a resolution less than or equal to 2.5 À, and comprise ligands with less than or equal to 40 non-hydrogen atoms. g Subset of Bohm1998(I) excluding 3 outliers: 1cil, lsbp, 1tnk. See text for explanation. h Subset of both validation sets of DrugScore for recognition of near-native geometries for which inhibition constants were found in literature. Binding affinities were calculated for the best ranked ligand geometries generated by FlexX of each ligand.

i Set of related inhibitors of thrombin and trypsin, taken from [54] for which ligand geometries are generated by FlexX via docking the compounds into the protein structure taken from Obst et al. [55] or from lpph.

j Sixty-one thermolysin inhibitors taken from the training set reported in [56] for which ligand geometries are generated by FlexX via docking the compounds into the protein structure ltlp.

k Subset of 'Thermolysin' containing only those ligands for which a reasonable geometry was generated by FlexX.

1 Fifteen thermolysin inhibitors taken from the test set in [56] for which ligand geometries were generated as described (j).

m Subset of Thermolysin(II) excluding the outlier ZGPLNH2.

Figure 7. Correlation plots of experimentally determined pK values versus calculated ones by DrugScore for four different sets of X-ray protein-ligand complexes as defined in the text: serine proteases (a), metallo-proteases (b), 'others' (c), and Bohml998(I/II) (d). For the latter, the three outliers 1cil, lsbp, 1tnk are highlighted (circles). Together with the ideal correlation line, deviations by ±1 log units are depicted.

Figure 7. Correlation plots of experimentally determined pK values versus calculated ones by DrugScore for four different sets of X-ray protein-ligand complexes as defined in the text: serine proteases (a), metallo-proteases (b), 'others' (c), and Bohml998(I/II) (d). For the latter, the three outliers 1cil, lsbp, 1tnk are highlighted (circles). Together with the ideal correlation line, deviations by ±1 log units are depicted.

waters are considered to contribute to the specificity of the protein towards the binding of L-arabinose, D-fucose and D-galactose [53].

Data set 5 ('others') comprises 17 protein-ligand complexes formed with different proteins. These data were also used by Muegge and Martin (Figure 7c). Including cofactors as parts of the protein, we obtain a correlation of R2 = 0.43. Obviously, three examples are predicted too high: the renin-complex lrne as well as the HIV-proteases 4hvp and 4phv with 51, 54 and

46 non-H atoms in the ligands, respectively. Since Drugscore has been parameterized only for ligands composed of less than 50 non-hydrogen atoms, which is considered to be an upper limit for the size of drug-like molecules, information concerning ligands with a size close to or beyond this limit could be regarded as incompletely covered in our analysis.

Finally, a composed data set has been selected considering the X-ray data used by Bohm in his SCORE2 study [16] (Figure 7d). Any modeled complexes used by Bohm have been discarded. This set (Bohm1998) of 71 complexes yields a correlation of R2 = 0.33. Excluding all complexes with a resolution >2.5 A or possessing ligands with more than 40 non-H atoms (vide supra) improved the correlation to R2 = 0.44 (Bohm1998 (I)). Three remaining outliers can be explained: in lsbp the ligand is a sulfate, thus falling beyond the lower limit of 6 non-hydrogen atoms used to parameterize DrugScore. In lcil (inhibitor ETS bound to carbonic anhydrase II), the sulfon-amide nitrogen bound to the zinc ion is deprotonated and yields a strong ionic interaction to the metal. DrugScore treats this interaction as a contact between a normal amide nitrogen and a zinc, thus underestimating its contribution to the binding. Finally, the ligand in ltnk shows quite unusual intramolecular dimensions supposedly resulting from some deficiencies in the final refinement process. Removing these three entries results in a significantly improved regression with R2 = 0.56 and SD = 1.53 (Bohm1998 (11)).

The performance of DrugScore is compared to other scoring functions in terms of some statistical descriptors determined for the first five test sets (data for PMFScore, SCORE1, SMoGScore taken from [33]; for ProteusScore taken from [50]). Figure 8 depicts the standard deviation for each of these test sets; for arabinose-binding proteins, Muegge and Martin report a value of 69.7 in the case of SCORE1 while there are no results given by Eldridge et al. for the 'others' test set. Compared to both knowledge-based scoring functions (PMFScore and SMoGScore), DrugScore performs better, resulting in a lower standard deviation for all four test sets considering one particular protein target. For the mixed data set ('others') PMFScore performs better. Comparing the regression-based scoring functions (SCORE1 and Pro-teusScore) with respect to Drugscore, the latter is superior to SCORE1 in all reported cases while ProteusScore obviously performs better in the case of the arabinose-binding proteins.

Predicting binding affinities for ligand geometries generated with FlexX The examples used so far are all based on experimentally determined binding geometries. Obviously, the scoring function performs satisfactorily for experimentally given geometries. However, in virtual screening, affinity predictions are attempted based on modeled binding modes. This is in partic-

Figure 8. Comparison of different currently developed scoring functions with DrugScore's performance to predict binding affinities in terms of their standard deviation. Five sets of X-ray protein-ligand complexes are used as defined in the text: serine proteases, metallo-proteases, endothiapepsins, arabinose-binding proteins, 'others'. Data for PMFScore, SCORE1, and SMoGScore are taken from Muegge and Martin [33], for ProteusScore from Eldridge et al. [50] (except for test set 'others'). In the case of arabinose-binding proteins, scored by SCORE1, a value of 69.7 is reported [33].

Figure 8. Comparison of different currently developed scoring functions with DrugScore's performance to predict binding affinities in terms of their standard deviation. Five sets of X-ray protein-ligand complexes are used as defined in the text: serine proteases, metallo-proteases, endothiapepsins, arabinose-binding proteins, 'others'. Data for PMFScore, SCORE1, and SMoGScore are taken from Muegge and Martin [33], for ProteusScore from Eldridge et al. [50] (except for test set 'others'). In the case of arabinose-binding proteins, scored by SCORE1, a value of 69.7 is reported [33].

ular the case if a docking tool is applied. Here, the scoring function has to render prominent the most likely geometry from a large sample of generated geometries.

To test the performance of DrugScore with respect to virtual screening, we calculated scoring values for ligand binding geometries generated by FlexX for (a) a sample of 55 complexes taken from the combined validation sets for DrugScore to render prominent well-docked solutions, (b) a congeneric series of 32 inhibitors [54] docked into thrombin [55] or trypsin (lpph), as well as (c) a series of 61 and 15 thermolysin inhibitors (taken from the training and test sets of [56], respectively) docked into the protein structure ltlp.

For the sample (a), a standard deviation of 1.8 log units and a squared correlation coefficient of 0.44 (data not shown) is achieved. For comparison,

## Post a comment