Methods

The screening tool SLIDE

SLIDE (for 'Screening for Ligands by Induced-fit Docking') can screen databases of 3D structures of over 100 000 small organic molecules, typically within hours to a day, on an ordinary desktop workstation. It has also been used for screening 185 000 peptides, which are more flexible, within a few days [7,48]. SLIDE uses multi-level hashing, mean-field theory, and an empirically tuned scoring function to efficiently recognize infeasible compounds, dock the most promising ligand candidates, and produce a ranked list of some 100 potential ligands for a given protein target.

Representing the binding site

The binding site of the protein is described by a template of favorable interaction points above its surface, onto which ligand atoms are mapped during the search. A template includes four different types of points:

• Hydrogen-bond donor point. During screening, SLIDE can place a hydrogen-bond donor of the ligand onto this point, which is determined to be within favorable hydrogen-bonding distance of a protein hydrogen-bond acceptor.

• Hydrogen-bond acceptor point. Each acceptor point is within favorable hydrogen-bonding distance of a protein hydrogen-bond donor.

• Hydrogen-bond donor/acceptor point. This is within hydrogen-bonding distance of both a hydrogen-bond acceptor and a donor of the protein, so either a ligand hydrogen-bond donor or acceptor can be placed here, or a group that can accept and donate at the same time (e.g., hydroxyl oxygen).

• Hydrophobic interaction center. These points are placed above a hydrophobic surface patch of the protein, and are matched by the centers of the most hydrophobic ligand groups, hydrocarbon rings.

The template can be automatically generated based on the ligand-free structure of the protein, which reduces bias towards known ligands. For automatic template generation, the binding site is filled with random points that are 2.5 to 5.0 Å from a protein atom. To determine favorable hydrogen-bonding positions, each of these points is checked for donors or acceptors in the protein within a distance of 2.5 to 3.5 Å; for protein hydrogen-bond donors, the angle between the donor, the donated hydrogen, and the probe point is also taken into account, and must be larger than 120°. Hydrophobic points are located between 3.5 and 5.0 Å from the nearest protein atom. For these points, the average hydrophilicity of all protein atoms within 5.0 Å is below 0.1, indicating a hydrophobic site (based on the values provided in Reference 49). All points of the same type are then clustered using complete-linkage clustering [50] to yield a computationally tractable number of template points (typically up to 200).

Similarly, interaction points in each potential ligand are defined as those that can act as hydrogen-bond acceptors, donors, acceptors and/or donors (e.g., hydroxyl oxygen atoms), or hydrophobic centers. The latter are defined as the centers of hydrocarbon rings with 6 or fewer carbon atoms (e.g., cyclohexane or benzene rings). Hydrogen-bond donors or acceptors in the ligand candidates are identified for oxygen, nitrogen, sulfur, and halogen atoms based on the molecular orbital type, valency, and presence of hydrogen atoms in SYBYL mol2 format files prepared for each molecule in the ligand database. The interaction points in ligand candidates are mapped onto points in the binding site template having the same chemistry.
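The distance and angle criteria for assigning a type to a random probe point can be sketched as follows. This is a minimal illustration rather than SLIDE's actual code: the function names, the input layout (lists of coordinates), and the order in which the tests are applied are all assumptions; only the numeric thresholds come from the text above.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b formed by a-b-c, in degrees."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def classify_probe(probe, protein_donors, protein_acceptors,
                   nearest_atom_dist, mean_hydrophilicity):
    """Return the template point type for a probe point, or None.

    protein_donors: list of (donor_xyz, hydrogen_xyz) pairs (hypothetical layout).
    protein_acceptors: list of acceptor coordinates.
    nearest_atom_dist: distance from the probe to the nearest protein atom.
    mean_hydrophilicity: average hydrophilicity of protein atoms within 5.0 A.
    """
    # A protein acceptor in range means a ligand donor could sit here.
    near_acceptor = any(2.5 <= math.dist(probe, acc) <= 3.5
                        for acc in protein_acceptors)
    # A protein donor in range must also satisfy the donor-hydrogen-probe angle.
    near_donor = any(2.5 <= math.dist(probe, don) <= 3.5
                     and angle_deg(don, hyd, probe) > 120.0
                     for don, hyd in protein_donors)
    if near_acceptor and near_donor:
        return "donor/acceptor"
    if near_acceptor:
        return "donor"
    if near_donor:
        return "acceptor"
    if 3.5 <= nearest_atom_dist <= 5.0 and mean_hydrophilicity < 0.1:
        return "hydrophobic"
    return None
```

Points of each type would then be clustered (complete linkage in the text) to reduce their number; that step is not shown here.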

Alternatively, the template can be defined based on interaction patterns observed in complexes with known ligands, biasing the search towards ligands with similar interaction patterns, similar to pharmacophore-based screens. For either the 'unbiased', automatically generated templates, or templates designed based on known ligand binding, special key interaction points that must be matched by the ligand can also be included. This is useful to ensure that a certain part of the binding site is covered, or that a docked ligand makes particular interactions. Beyond the template, which governs the selection of complementary ligands, the binding site of the protein is represented by a shell of surface residues and water molecules likely to mediate protein-ligand interactions.

During the ligand search, all triangles of hydrogen-bond and hydrophobic interaction points in the screened molecules are mapped exhaustively onto triangles of template points with compatible geometry and chemistry, and such a mapping serves as a basis for docking a molecule into the binding site. A multi-level hashing approach is used to directly access all template triangles with feasible chemistry and geometry for a given set of three interaction centers in the ligand. Before the search, all possible template triangles are generated from the set of binding-site template points, and are indexed via four levels of hash (indexing) tables. The indices in these hash tables are based on the chemistry (H-bond donor/acceptor or hydrophobic) of the three triangle points, on the perimeter of the triangle, and then on the longest and the shortest side for each of the indexed template triangles. By using these four properties for a given triplet of interaction centers in a ligand candidate, all template triangles with compatible geometry and chemistry can be directly and very efficiently accessed. For feasible matches between each ligand triangle and template triangle, the geometrically best mapping is computed, which is then used to transform the ligand triangle onto the corresponding template points by applying a least-squares fit superposition. When including key points in the template, only those triangles that include at least one of these key interaction centers are indexed in the hash tables.
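The triangle indexing described above can be illustrated with a small sketch. It is not SLIDE's implementation: the four nested hash levels are flattened into a single tuple key (chemistry, perimeter bin, longest-side bin, shortest-side bin), the bin width of 0.5 Å is a hypothetical value, and chemistry is matched by exact type, so the compatibility of donor/acceptor points with either donors or acceptors and any tolerance across bin boundaries are left out.

```python
import math
from collections import defaultdict
from itertools import combinations

BIN_WIDTH = 0.5  # hypothetical bin width in Angstroms


def triangle_key(types, coords, bin_width=BIN_WIDTH):
    """Key built from chemistry, perimeter, longest side, and shortest side."""
    sides = sorted(math.dist(a, b) for a, b in combinations(coords, 2))
    chemistry = tuple(sorted(types))
    return (chemistry,
            int(sum(sides) // bin_width),   # perimeter bin
            int(sides[2] // bin_width),     # longest-side bin
            int(sides[0] // bin_width))     # shortest-side bin


def build_template_index(points):
    """Index every triangle of template points; points is a list of (type, xyz)."""
    index = defaultdict(list)
    for tri in combinations(range(len(points)), 3):
        types = [points[i][0] for i in tri]
        coords = [points[i][1] for i in tri]
        index[triangle_key(types, coords)].append(tri)
    return index


def matching_template_triangles(index, ligand_types, ligand_coords):
    """Look up template triangles sharing the ligand triangle's key."""
    return index.get(triangle_key(ligand_types, ligand_coords), [])
```

Because the index is built once before the search, each ligand triplet costs one key computation and one dictionary lookup, which is the point of the hashing scheme.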

Docking the anchor fragment

The matched ligand interaction centers define the anchor fragment, which is the part of the molecule containing the three interaction centers. To maintain the distances between these matched points, all flexible bonds within this anchor fragment are rigidified. All chemically and geometrically feasible anchor fragments are then exhaustively tested in each ligand candidate for their ability to match triangles within the protein template. Collisions of the anchor fragment with protein main-chain atoms are resolved by iterative translations of the fragment as a rigid body. For this, a global translation vector is used to shift the anchor fragment the minimal amount necessary to resolve all collisions [5]. If all main-chain collisions can be resolved, the remaining atoms of the ligand are added to the anchor fragment in the conformation found for the molecule in the database. These atoms outside the anchor fragment are considered flexible, such that all single bonds in these parts can be rotated later, to resolve collisions with protein atoms. At this point we retain only those ligand dockings with at least 50% of their carbon atoms buried against the protein in order to keep only those dockings with good shape complementarity and minimal exposure of hydrophobic atoms to solvent; our analysis of 89 known protein-ligand complexes [51] showed they all met this criterion [7].

Modeling induced complementarity

Induced fit between the two molecules is modeled by resolving any collisions of their flexible parts by directed rotations of single bonds in either the ligand or side chains of the protein. This follows the paradigm that in most cases the two molecules will move as little as possible in order to be shape-complementary. There are typically several rotations that will resolve an intermolecular collision, and an approach based on mean-field theory [32,52,53] is used to decide which rotations to use to improve the shape complementarity.

For each pairwise intermolecular collision, the bonds in each molecule that can resolve the collision are identified. They are stored in a system together with the corresponding minimum rotation angle and the number of non-hydrogen atoms that will be displaced by the rotation. These two values provide the basis for a force measuring the cost of the rotation. A probability is assigned to each rotation, and all rotations that can be used to resolve one particular collision are initialized with equal probabilities. During several cycles of the mean-field optimization, these probabilities are updated and converge to higher values for those rotations that represent a globally optimal choice. When applying these rotations, a maximal number of collisions is resolved with minimal conformational changes in both molecules, without bias to one or the other; details of the mathematics of this procedure are provided in Reference 7.

In each cycle of the mean-field optimization process, a mean force is computed for each rotation in the system, which is based on the force associated with this rotation and its correlations with other rotations in the system. The probabilities for all rotations in the system are updated at the end of each cycle, taking into account the mean forces of alternative rotations for the same collision. We run 10 cycles of the optimization, after which the probabilities have typically converged to define a near-optimal set of rotations. All feasible rotations are applied in the order provided by the computed probabilities. Since it is likely that not all collisions can be resolved and that new collisions might have emerged, the mean-field optimization process is iterated up to 10 times. Intramolecular collisions are also tolerated, since it is assumed that they will be resolved in a future iteration. The result of the mean-field optimization process is either the exclusion of a molecule as infeasible, if collisions cannot be resolved, or a shape-complementary docking of the two molecules.

Considering binding-site solvation

In order to not bias the search towards known ligands, we typically use the binding site from a ligand-free crystal structure of the target protein for screening. Water molecules are often observed in these crystal structures, and SLIDE can consider tightly bound waters when docking potential ligands. The current approach is to either translate a water molecule, if it overlaps with a ligand atom after docking the ligand into the binding site, or to displace it. A bound water molecule is only displaced if its collisions cannot be resolved by iterative translations, which are computed by summing the translation vectors that resolve each collision between the water molecule and a protein or ligand atom. SLIDE considers a penalty term for each displaced water when scoring a complex, and only displacements by non-polar ligand atoms are penalized.
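The translate-or-displace decision for a bound water can be sketched as below. This is an illustration under stated assumptions, not SLIDE's code: the function name, the collision threshold `min_dist` of 2.5 Å, and the cap of 10 translation steps are hypothetical; only the idea of summing per-collision translation vectors and displacing the water when this fails is taken from the text.

```python
import math


def resolve_water(water, atoms, min_dist=2.5, max_steps=10):
    """Try to translate a water out of all collisions.

    water: (x, y, z) of the water oxygen.
    atoms: coordinates of nearby protein and ligand atoms.
    Returns the new position, or None if the water has to be displaced.
    """
    pos = list(water)
    for _ in range(max_steps + 1):
        shift = [0.0, 0.0, 0.0]
        collided = False
        for atom in atoms:
            d = math.dist(pos, atom)
            if d < min_dist:
                collided = True
                if d == 0.0:
                    return None  # no direction to move away from the atom
                # Vector that pushes the water just out of this collision.
                scale = (min_dist - d) / d
                for k in range(3):
                    shift[k] += (pos[k] - atom[k]) * scale
        if not collided:
            return tuple(pos)
        # Apply the sum of all per-collision translation vectors.
        pos = [p + s for p, s in zip(pos, shift)]
    return None  # collisions persist: displace the water
```

A water squeezed between two atoms on opposite sides receives translation vectors that cancel, so it never escapes and is reported as displaced.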

To select which protein-bound water molecules to include in the screening and docking, we use a knowledge-based approach to determine those waters likely to be conserved upon ligand binding and to fix a penalty for their displacement. The tool Consolv [8], a k-nearest-neighbor classifier, is used to predict which binding-site waters will be conserved and which will be displaced upon ligand binding. Consolv's prediction is based on several features that describe the favorability of the local environment of a water molecule, and its knowledge base is a set of 5542 water molecules taken from 30 independently solved protein structures. Prior to screening, we remove all waters that are predicted to be displaced, and for the remaining waters we use Consolv's prediction confidence to scale the penalties for their displacement. To compute the penalty, we count the number of hydrogen bonds that are lost by displacing this water and scale this number by Consolv's prediction confidence (between 50 and 100%).

Scoring a potential ligand

Whenever a collision-free complex is generated, a score is assigned to the ligand based on the number of intermolecular hydrogen bonds and the hydrophobic complementarity of its interface with the protein. If not provided in the protein or ligand structure, the position of the shared hydrogen in each intermolecular hydrogen bond is computed. This position is well-defined for all but the terminal hydrogens in lysine and hydroxyl side chains; for these cases we choose the optimal hydrogen position subject to bonding constraints. All hydrogen bonds with a donor-acceptor distance up to 3.5 Å and a donor-hydrogen-acceptor angle larger than 120° contribute equally to the score. If water molecules are included in the interface, all water-mediated hydrogen bonds are also counted. Intra-protein hydrogen bonds that were broken due to the rotation of a protein side chain, or hydrogen bonds to waters that were displaced upon ligand docking, lower the overall hydrogen-bond count by the number of lost hydrogen bonds. Note that this does not penalize the displacement of a water molecule by a polar ligand atom that preserves the hydrogen bond to the protein. The number of hydrogen bonds lost by displacing water molecules is weighted by Consolv's prediction confidence of their displacement. The final intermolecular hydrogen-bond score between protein P and ligand L, reflecting loss in intra-protein and water-mediated hydrogen bonds, is HBONDS(P,L).
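The bookkeeping behind HBONDS(P,L) can be sketched as follows. The function names and input layout are hypothetical, and the Consolv confidence is assumed to be expressed as a fraction between 0.5 and 1.0; the distance and angle cutoffs and the subtraction of lost hydrogen bonds follow the description above.

```python
def is_hydrogen_bond(donor_acceptor_dist, dha_angle_deg):
    """Geometric test: distance up to 3.5 A, donor-hydrogen-acceptor angle > 120 degrees."""
    return donor_acceptor_dist <= 3.5 and dha_angle_deg > 120.0


def hbonds_score(candidate_contacts, broken_intra_protein, displaced_waters):
    """Hydrogen-bond term for one docked complex.

    candidate_contacts: (distance, angle) pairs for intermolecular and
        water-mediated donor/acceptor contacts.
    broken_intra_protein: number of intra-protein hydrogen bonds lost to
        side-chain rotations.
    displaced_waters: (lost_hbonds, confidence) pairs for waters displaced
        by non-polar ligand atoms; confidence is assumed to lie in [0.5, 1.0].
    """
    formed = sum(1 for dist, angle in candidate_contacts
                 if is_hydrogen_bond(dist, angle))
    water_penalty = sum(lost * conf for lost, conf in displaced_waters)
    return formed - broken_intra_protein - water_penalty
```

For example, two qualifying hydrogen bonds, one broken intra-protein bond, and one displaced water that held two hydrogen bonds at 75% confidence give 2 - 1 - 1.5 = -0.5.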

For computing the hydrophobic complementarity value, atomic hydrophilicity values were taken from a statistical survey of hydration of the different atom types in 56 protein structures [49] (hydrophobicity values for protein atoms were taken from Table II and values for ligand atoms from Table III in Reference 49). The contribution of a single ligand atom is based on the comparison of its hydrophobicity value with the average hydrophobicity of the surrounding protein surface atoms [7]. Given the hydrophobicity h(a) of an atom a, with h(a) ∈ [0, 635] calculated as the average number of hydrations per 1000 occurrences of that atom type (Table II in Reference 49), a value of 0 represents a maximally hydrophobic atom, 635 is maximally hydrophilic, and 317 is intermediate. The hydrophobic complementarity of the contact surface between protein P and ligand L, HPHOB(P,L), is computed as a sum of per-atom contributions over the ligand atoms l_i.

This sum considers only the hydrophobic contribution of ligand atoms l_i, that is, atoms with h(l_i) no larger than 317, since values larger than 317 refer to hydrophilic atoms. The hydrophobicity h(P_i) of the protein neighborhood P_i for a single ligand atom l_i is defined as the average hydrophobic contribution of all protein atoms p_j within a distance of 4.0 Å of the ligand atom l_i:

$$h(P_i) = \frac{1}{m_i} \sum_{j=1}^{m_i} h(p_j)$$

where m_i is the number of protein atoms p_j within 4.0 Å of l_i.

The denominator in each term of the sum describing the hydrophobic score, HPHOB(P,L), is always greater than or equal to 32, which is 10% of the maximum score for a single ligand atom. This ensures that the overall HPHOB(P,L) score is not dominated by a few contacts with very small differences between protein and ligand hydrophobicity.

The scoring function SCORE(P,L) for a collision-free complex is a linear combination of the hydrophobic and hydrogen-bond terms:

$$\mathrm{SCORE}(P,L) = A \cdot \mathrm{HPHOB}(P,L) + B \cdot \mathrm{HBONDS}(P,L)$$

where A and B denote the weights of the two terms. The relative contribution of these terms was tuned for best fit to the experimentally determined affinities of 89 protein-ligand complexes [51], giving a weight ratio of 1.3:1.0 for the hydrogen-bond term relative to the hydrophobic term (B:A = 1.3:1.0).
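The pieces of the scoring function can be put together in a short sketch. Several parts are assumptions and should not be read as SLIDE's published formula: the exact HPHOB expression is not reproduced in this text, so each hydrophobic ligand atom is assumed here to contribute (317 - h(l_i)) divided by the larger of 32 and |h(P_i) - h(l_i)|, which is only one form consistent with the description (a per-atom comparison of ligand and protein hydrophobicity, a denominator of at least 32). The absolute weights 1.0 and 1.3 are likewise assumed; the text gives only their ratio. All function names are hypothetical.

```python
H_INTERMEDIATE = 317   # midpoint of the 0..635 hydrophobicity scale
MIN_DENOMINATOR = 32   # lower bound on the denominator stated in the text


def neighborhood_hydrophobicity(protein_values):
    """h(P_i): mean h(p_j) over protein atoms within 4.0 A of the ligand atom."""
    return sum(protein_values) / len(protein_values)


def hphob(ligand_atoms):
    """Assumed form of HPHOB(P,L).

    ligand_atoms: list of (h_ligand, [h of protein atoms within 4.0 A]).
    """
    total = 0.0
    for h_l, neighbors in ligand_atoms:
        # Skip hydrophilic ligand atoms and atoms without protein contacts.
        if h_l > H_INTERMEDIATE or not neighbors:
            continue
        h_p = neighborhood_hydrophobicity(neighbors)
        # ASSUMPTION: numerator and difference-based denominator are illustrative.
        total += (H_INTERMEDIATE - h_l) / max(MIN_DENOMINATOR, abs(h_p - h_l))
    return total


def score(hphob_value, hbonds_value, w_hphob=1.0, w_hbonds=1.3):
    """Linear combination with the 1.3:1.0 hydrogen-bond to hydrophobic ratio."""
    return w_hphob * hphob_value + w_hbonds * hbonds_value
```

Bounding the denominator from below caps the contribution of any single ligand atom, which matches the stated purpose of preventing a few near-perfect contacts from dominating the total.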
