Introduction

The process of finding novel leads for a new target is one of the most important steps in a drug development program. Today two complementary strategies are followed: experimental high-throughput screening and computational methods exploiting structural information of the protein binding site [1-4]. The latter approaches try to predict, e.g. via docking, the actual binding mode of a ligand at the binding site [5,6].

Several of the published docking procedures are fast enough to serve the purpose and suggest solutions approximating the native pose in up to 80% of the cases [7-9]. Nevertheless, the pose closest to the experimental situation is often not ranked as the energetically most favorable one within a set of decoy geometries, which indicates deficiencies in the applied ranking schemes [10]. Consequently, we embarked on the development of a new scoring function.
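The ranking problem described above can be made concrete with a minimal sketch. It assumes poses are given as (score, coordinates) pairs in the protein frame, that a lower score means a better rank, and that "near-native" means an RMSD of at most 2.0 Å from the crystal pose; the threshold, the function names, and the toy coordinates are illustrative assumptions, not taken from any of the cited docking programs.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered atom lists (no superposition,
    since docked poses already share the protein's coordinate frame)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

def rank_of_near_native(poses, native, threshold=2.0):
    """Return the 1-based rank of the best-scored pose within `threshold`
    of the native pose, or None if no pose qualifies.
    `poses` is a list of (score, coords); lower score = better (assumption)."""
    ranked = sorted(poses, key=lambda pose: pose[0])
    for rank, (_score, coords) in enumerate(ranked, start=1):
        if rmsd(coords, native) <= threshold:
            return rank
    return None

# Toy data (hypothetical): a two-atom "ligand" and two decoy poses.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
poses = [
    (-30.0, [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)]),  # best score, far from native
    (-25.0, [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]),  # worse score, near-native
]
rank = rank_of_near_native(poses, native)
```

In this toy case the near-native pose is generated but ranked second, which is the situation a better scoring function is meant to correct.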

Ideally, binding affinity would be derived from statistical thermodynamics, resulting in a master equation that considers all contributing effects [11,12]. Although theoretically most convincing, elaborate methods such as free energy perturbation (FEP) or thermodynamic integration (TI) are computationally too demanding for the application described above [13].

Instead, the partitioning of binding affinity into several additive terms or descriptors is a widely accepted assumption for the development of empirical regression-based scoring functions [14]. Usually a number of empirically derived contributions are fitted to a data set of experimental observations [15-19]. Approaches such as VALIDATE [20] are based on the ideas of QSAR. These approaches achieve a precision of about 1.5 orders of magnitude in predicting K [15,20]. However, any regression analysis suffers from the fact that its conclusions can only be as precise and generally valid as the underlying data are complete, that is, to the extent that the data cover all contributing and discriminating effects in protein-ligand complexes. The same arguments hold for penalty filters developed to discard computer-generated artifacts from a list of favorable ligand poses [21].
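As an illustration of the regression idea, the following sketch fits an additive score by ordinary least squares. It is a generic example, not the procedure of any cited scoring function: the three descriptors (hydrogen-bond count, buried lipophilic surface, rotatable bonds), all numerical values, and the function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical descriptor matrix: one row per complex, one column per term
# (hydrogen-bond count, buried lipophilic surface in A^2, rotatable bonds).
X = np.array([
    [3.0, 120.0, 4.0],
    [1.0, 200.0, 2.0],
    [5.0,  80.0, 7.0],
    [2.0, 150.0, 3.0],
    [4.0, 100.0, 5.0],
])
# Hypothetical experimental affinities in log units (made up for illustration).
y = np.array([6.1, 6.8, 5.9, 6.4, 6.2])

def fit_additive_score(X, y):
    """Least-squares fit of y ~ c0 + sum_k c_k * X[:, k]; returns [c0, c1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

def predict(coeffs, X):
    """Evaluate the additive score for each row of X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coeffs

coeffs = fit_additive_score(X, y)
pred = predict(coeffs, X)
# Root-mean-square error in log units on the training data.
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

With only five complexes and four fitted coefficients, the training error says little about predictive power; this is the data-coverage limitation noted above.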

We decided to follow an alternative way to develop a scoring function based on empirical knowledge. During its development we decided not to assign protonation states to the various atom types, assuming that the derived statistical preferences implicitly reflect these influences. Furthermore, any binding feature not in agreement with the most frequently observed contact preferences will likely be penalized due to its minor occurrence.

Knowledge-based potentials have been applied to rank different solutions of the protein-folding problem [22-24]. Up to now, this approach has been applied to only five case studies for the ranking of different protein-ligand complexes. Except for a single test case in the most recently published work [25], none of these, however, engaged in identifying near-native poses of one ligand with respect to one protein.

Wallqvist et al. [26,27] classified the surfaces of buried ligand atoms found in 38 complexes and developed a model to predict the Gibbs free energy of binding based on these observed atom-atom preferences. From an analysis of 10 HIV protease inhibitor complexes, they approximated the free energy of binding to within ± 1.5 kcal/mol.

Using a dataset of 30 HIV-1, HIV-2, and SIV proteases, Verkhivker et al. [28] compiled a distance-dependent knowledge-based pair-potential which was then combined with hydrophobicity [29] and conformational entropy scales [30].

DeWitte and Shakhnovich [31] used a sample of 126 structures from the PDB [32] to develop a set of 'interatomic interaction free energies' for a variety of atom types.

Muegge and Martin [33] explored structural information of known protein-ligand complexes from the PDB and derived distance-dependent Helmholtz free interaction energies of protein-ligand atom pairs. Using 77 protein-ligand complexes for validation, the calculated score achieves a standard deviation from the observed binding affinities of 1.8 log K units. The scoring function was further evaluated by docking weak-binding ligands to the FK506-binding protein [34].
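The general principle behind such distance-dependent pair potentials can be sketched with an inverse Boltzmann relation, W(r) = -ln[p_pair(r) / p_ref(r)], in units of kT. The code below is a generic illustration under stated assumptions and not the formulation of any cited work: the bin width, the cutoff, the pseudocount, the choice of reference distribution, and all distances are hypothetical.

```python
import math

def distance_histogram(distances, r_max, bin_width):
    """Normalized histogram of distances in [0, r_max)."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            counts[min(int(d / bin_width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no distances fall inside the cutoff")
    return [c / total for c in counts]

def pair_potential(pair_distances, reference_distances,
                   r_max=6.0, bin_width=1.0, pseudocount=1e-6):
    """Inverse-Boltzmann potential per distance bin, in units of kT.
    Negative values mark distances observed more often than in the reference."""
    p_pair = distance_histogram(pair_distances, r_max, bin_width)
    p_ref = distance_histogram(reference_distances, r_max, bin_width)
    # The pseudocount (an assumption) avoids log(0) in empty bins.
    return [-math.log((p + pseudocount) / (q + pseudocount))
            for p, q in zip(p_pair, p_ref)]

# Hypothetical contact distances (in A) for one ligand-protein atom-type pair,
# clustered around a hydrogen-bond-like distance.
pair = [2.7, 2.8, 2.9, 3.1, 2.6, 4.5]
# Hypothetical reference: distances pooled over all atom-type pairs.
reference = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]

W = pair_potential(pair, reference)
```

Here the 2-3 Å bin receives a favorable (negative) value because four of the six observed contacts fall into it, whereas bins without any observation are strongly penalized.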

Mitchell et al. [25,35] developed a potential of mean force at the atomic level using high-resolution X-ray structures from the PDB, considering 820 possible atom-atom pairs. The ability to identify low-energy binding modes among decoy conformations was tested only for heparin binding to bFGF (PDB code: 1bfc). While the crystal structure was ranked lowest, the best-scored of the generated structures deviates largely from the experimental situation. For a test set of 90 different PDB complexes, a squared correlation coefficient of at most 0.55 with respect to binding energies was achieved.

In the present article, we describe the development of a new scoring function (called DrugScore) to predict protein-ligand interactions (see also [36]). It is based on the vast structural knowledge stored in the entire PDB, retrievable using ReLiBase [37]. Knowledge-based probabilities, well adjusted to describe specific short-range distances between ligand and protein functional groups, are combined with terms considering the solvent-accessible-surface portions of both partners that become buried upon binding. For the first time, knowledge-based probabilities are used to discriminate and predict ligand-binding modes for 159 protein-ligand complexes. Multiple solutions, generated for these examples by FlexX, have been re-ranked to obtain a significantly improved scoring with respect to their deviation from the native pose. In addition, the power of this approach to predict binding affinities is tested by analyzing sets of crystallographically determined protein-ligand complexes and protein-ligand geometries generated by FlexX. Compared to the results of commonly applied scoring functions, DrugScore yields convincingly better predictions. Finally, the implicit consideration of directionality effects in intermolecular interactions, described by the distance-dependent part of the function, is demonstrated by visual inspection and a statistical evaluation. The results are encouraging compared to the recently published SuperStar method [38].
