Application

Since we do not have access to real activity data from an assay, we simulate a real-world application of our virtual screening methods by hiding a set of known active molecules in a larger database of compounds of unknown activity. The test is to recover as many known actives as possible within a small, fixed fraction of the highest-ranking candidates. To do so, we use a single active compound, or at most a small set of known actives, as bait to fish for the other active compounds.

Traditionally, performance for this kind of approach is measured with enrichment values or enrichment plots, which show quantitatively how many more active compounds are found in a fraction of the processed database than in a random selection. The enrichment is defined as

$$E_A(p) = \frac{N_a(p) / (p\,N)}{N_A / N} = \frac{N_a(p)}{p\,N_A},$$

where $N_a(p)$ denotes the number of known active molecules among the top-ranking fraction $p$ of compounds of the database (hit-rate), $N_A$ is the number of active molecules in the entire database, and $N$ is the size of the screened database.
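As a minimal sketch of how such an enrichment value could be computed, assume the usual definition in which the hit rate among the top-ranking fraction p is divided by the hit rate of the whole database, with p given as a fraction between 0 and 1. The function name `enrichment` and the boolean-list input are illustrative choices, not part of the original software.

```python
from typing import Sequence


def enrichment(ranked_is_active: Sequence[bool], p: float) -> float:
    """Enrichment for a ranked list (best-scoring compound first).

    ranked_is_active[i] is True if the compound on rank i + 1 is a known active.
    Assumption: p is a fraction in (0, 1], not a percentage.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    n_total = len(ranked_is_active)        # N: size of the screened database
    n_actives = sum(ranked_is_active)      # N_A: actives in the whole database
    if n_actives == 0:
        raise ValueError("the database contains no known actives")
    n_top = max(1, round(p * n_total))     # number of compounds in the top fraction
    hits = sum(ranked_is_active[:n_top])   # N_a(p): actives in the top fraction
    # Hit rate in the top fraction divided by the hit rate of a random selection.
    return (hits / n_top) / (n_actives / n_total)


# Toy ranking: 100 compounds, 10 actives, 5 of them among the first 10 ranks.
toy_ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 85
print(enrichment(toy_ranking, 0.10))
```

An enrichment of 1 corresponds to random selection; larger values mean the actives are concentrated near the top of the ranking.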

Sometimes it is also useful to scale the enrichment by the maximum possible enrichment $\mathrm{opt}(E_A(p))$, which allows a fair comparison across different datasets or datasets of varying size:

$$\frac{E_A(p)}{\mathrm{opt}(E_A(p))} = \frac{N_a(p)}{\min(p\,N,\; N_A)}.$$

Note that this normalization simply reduces the enrichment to the fraction found of those actives that could at most be found among the top-ranking fraction $p$.
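The normalization can be illustrated with a short sketch. It assumes that the best possible outcome is a ranking with all actives at the very top, so that at most min(pN, N_A) actives can appear in the top fraction p; the function name `normalized_enrichment` is hypothetical.

```python
from typing import Sequence


def normalized_enrichment(ranked_is_active: Sequence[bool], p: float) -> float:
    """Fraction of the actives found among those that could at most be found.

    Assumption: p is a fraction in (0, 1], and the optimum places every
    active ahead of every inactive compound.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_actives == 0:
        raise ValueError("the database contains no known actives")
    n_top = max(1, round(p * n_total))
    hits = sum(ranked_is_active[:n_top])
    # The top fraction cannot hold more actives than it has slots,
    # nor more actives than exist in the database.
    best_possible = min(n_top, n_actives)
    return hits / best_possible
```

The value always lies between 0 and 1, which is what makes curves from databases of different size directly comparable.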

In order to give a visual impression of the extent to which the actives condense in the upper portion of the database, we provide a bar-code-like visualization of the distribution of the actives after screening in Figure 10. Finally, we compare the performance and the hit lists of our method to those of a standard 2D approach, namely the DAYLIGHT fingerprint software [37], in order to show that the actives picked by either method differ significantly.

The experiments were carried out on datasets of three different sizes (named small-size, medium-size, and large-size). A large number of training evaluations were performed on training sets of 100 compounds randomly selected from a dataset collected by Briem [38] (the small-size case). Each of these training sets contains 10 active and 90 inactive compounds. Briem's collection comprises the following activity classes: 136 PAF antagonists (PAF), 114 HMG-CoA inhibitors (HMG), 40 ACE inhibitors (ACE), 49 thromboxane A2 receptor antagonists (TXA2), 52 5HT3 receptor ligands (5HT3), and 581 randomly selected compounds from the MDDR database [39]. All actives are known to bind to their target in the nanomolar range.

We considered each activity class separately. Each medium-size test case was composed of the actives of one activity class, with the remaining structures of Briem's dataset serving as inactives; that is, each of the five test sets contains roughly 1000 molecules in total. The large-size test databases consist of all actives of a specific class, supplemented by the whole NCI database as inactive compounds [40]. With a random hit chance of approximately 1/1000, datasets of this size closely mimic real-world applications.

In the first set of experiments, carried out on the training datasets, we applied FlexS in the following iterated protocol to evaluate how well the Gaussian merging procedure enriches the information content of the reference structure. (1) We start with each of the 10 active molecules in a training set, in turn, and determine the superposition score of the remaining 99 molecules. The solution lists, ranked by score, are analyzed to obtain 10 enrichment curves, the mean of which is displayed as a single curve in Figure 6 for the HMG example case. (2) The active molecule on rank x (x to be chosen) of a solution list is taken in its generated position and merged (by means of its Gaussian representation) with the reference compound to which it was aligned. The composite model then replaces the original reference in another execution of step 1. The protocol is illustrated in Figure 5.

Figure 5. Flow chart of the automated procedure applied with six iterations to obtain enriched Gaussian representations for the reference compounds.
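The control flow of the iterated protocol described above can be sketched as follows. This is only an outline under stated assumptions: `screen` and `merge` are hypothetical callables standing in for the superposition run and the Gaussian merging step, higher scores are taken to be better, and "rank x" is read as the x-th best-ranked known active. None of these names belong to the FlexS interface.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple

# (compound id, superposition score, generated placement); all hypothetical.
Hit = Tuple[Hashable, float, object]


def iterate_reference(
    reference: object,
    database: Sequence[Hashable],
    actives: Set[Hashable],
    screen: Callable[[object, Sequence[Hashable]], List[Hit]],
    merge: Callable[[object, object], object],
    rank_x: int = 3,
    n_iterations: int = 6,
) -> Tuple[object, List[List[Hashable]]]:
    """Repeat screening (step 1) and merging (step 2) n_iterations times."""
    rankings: List[List[Hashable]] = []
    for _ in range(n_iterations):
        # Step 1: superimpose every database compound onto the current
        # reference and rank by score (assumption: higher is better).
        hits = sorted(screen(reference, database), key=lambda h: h[1], reverse=True)
        rankings.append([compound for compound, _, _ in hits])
        # Step 2: take the x-th ranked known active in its generated
        # placement and merge it into the reference model.
        active_hits = [h for h in hits if h[0] in actives]
        if len(active_hits) < rank_x:
            break  # not enough actives left to continue
        placement = active_hits[rank_x - 1][2]
        reference = merge(reference, placement)
    return reference, rankings
```

Each ranking returned here could be turned into one enrichment curve per iteration. The sketch does not prevent the same active from being merged twice; whether the original procedure excludes already merged compounds is not stated in the text.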

The choice of the rank x of the active compound to be merged poses a tradeoff between close analogues (top ranking), with quite reliable placements, and remote analogues (lower rank), with less reliable placements but a higher probability of belonging to a different chemical class. We tested taking placements on rank 3 and rank 10, with quite similar performance for both choices. Figure 6 shows the average enrichment charts displaying $E_A(p)$ for the first 10 percentiles on the HMG example cases. The chart on the left-hand side displays the results on the training set; the right-hand side shows the results on Briem's entire dataset. As a point of reference, the enrichments achieved with the DAYLIGHT fingerprint software are included as the first of the series of curves displayed. Note that similar results are achieved on both dataset sizes. The reference enrichment protocol also performs similarly across the different activity classes.

Most enrichment curves increase significantly from step to step, indicating that the information content of the reference model is indeed increased by merging the Gaussian descriptors. This also implies that, to some extent, meaningful alignments have been generated for the molecules; otherwise the merging would have been expected to blur the reference model rather than reinforce it.

Also, Figure 6 exemplifies our experience that the results achieved on a smaller dataset convincingly carry over to a larger one; that is, an active compound that fishes many other actives out of the smaller set is also a good bait in the larger application. This is an important result, since it allows for performing several runs and extensive testing on a subset at low computational cost, followed by the real and presumably expensive applications using a carefully selected set of reference compounds.

Figure 6. Normalized enrichments on the first 10 percentiles of the training set (left panel, "HMG training") and Briem's entire set (right panel, "HMG Briem"), both for the HMG example. As a reference, the respective enrichments obtained using the DAYLIGHT software are added in front of the other curves, which show steps 1 to 5.

Of course, some bias may be present by chance in a small subset and may mislead the choice of a potent reference structure. However, in our experience, taking not just a single structure but the three best-performing structures on the training set was sufficient to improve results consistently. We also found that fusing the results of these individual runs consistently enhances the performance. The fusion operators we took into account were the minimum, the maximum, and the mean of the individual ranks. Technically, applying the fusion operator f to a compound that has been aligned to three different references on ranks r1, r2, and r3, respectively, means determining the rank r of the compound by r = f(r1, r2, r3). The newly determined rank is used to reorder the sequence of the database. Figure 7 shows the enrichment plots obtained by applying the different operators to the HMG example case. Usually, the min operator performs best. Note that the fused ranking may even be superior to the individual rankings.
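The rank fusion described above can be sketched in a few lines. The sketch assumes that every compound appears in each of the individual rankings and that ties in the fused rank are broken by the order of first appearance; the text does not specify the tie-breaking rule, and the function name `fuse_rankings` is illustrative.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence


def fuse_rankings(
    rankings: Sequence[Sequence[Hashable]],
    operator: Callable[[List[int]], float] = min,
) -> List[Hashable]:
    """Reorder compounds by r = f(r1, r2, ...), with f = min, max, or mean."""
    ranks: Dict[Hashable, List[int]] = {}
    for ranking in rankings:
        # Ranks are 1-based: the best-scoring compound has rank 1.
        for position, compound in enumerate(ranking, start=1):
            ranks.setdefault(compound, []).append(position)
    fused = {compound: operator(r) for compound, r in ranks.items()}
    # sorted() is stable, so tied compounds keep their first-appearance order.
    return sorted(fused, key=lambda compound: fused[compound])


# Three toy rankings of the same four compounds.
runs = [["a", "b", "c", "d"], ["d", "c", "b", "a"], ["b", "a", "d", "c"]]
print(fuse_rankings(runs, min))   # → ['a', 'b', 'd', 'c']
print(fuse_rankings(runs, max))   # → ['b', 'a', 'c', 'd']
print(fuse_rankings(runs, mean))  # → ['b', 'a', 'd', 'c']
```

With the min operator a compound only needs to rank highly against one of the references to stay near the top, which may be one reason why it tends to perform best when the references cover different aspects of the activity class.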

We made the important observation that the hits obtained with our tools are usually quite dissimilar from those obtained by a 2D method. For comparison we again applied the DAYLIGHT software. Figure 8 shows the average overlap among the top 10 candidates that we observed on the training datasets across all activity classes. The important point about this result is that 2D similarities are usually much less surprising to a chemist than a 3D similarity between molecules that are topologically only remotely similar.
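A comparison of two hit lists of this kind amounts to a set intersection. The following sketch, with the hypothetical name `top_k_overlap`, counts the known actives among the top k candidates of each method and the actives common to both lists; it assumes both rankings are ordered best first.

```python
from typing import Hashable, Sequence, Set, Tuple


def top_k_overlap(
    ranking_a: Sequence[Hashable],
    ranking_b: Sequence[Hashable],
    actives: Set[Hashable],
    k: int = 10,
) -> Tuple[int, int, int]:
    """Return (hits of method A, hits of method B, hits common to both)."""
    hits_a = set(ranking_a[:k]) & actives
    hits_b = set(ranking_b[:k]) & actives
    return len(hits_a), len(hits_b), len(hits_a & hits_b)
```

A small common count relative to the individual counts indicates that the two methods retrieve largely different actives, so their hit lists complement each other.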

Our final tests were performed using both of our methods in a two-step filtering protocol. With the preceding experiments we determined the best performing fragments in Briem's dataset for the different activity classes.

Figure 7. Enrichment charts for FlexS (bottom) and RigFit (top) on the training datasets of the HMG example. The curves (differentiated by symbol) show the behaviour of the three fusion operators we tried. Additionally, the individual enrichments are shown as error bars. The min curve in the bottom chart is clearly on top and therefore superior to the individual results.

Figure 8. Average number of top 10 hits found by FlexS and DAYLIGHT, respectively, across all training datasets. The average fraction of common hits is indicated by a different shading.

Figure 9. Enrichments on the first 10 percentiles of the NCI database (dashed) and on the top 5000 high-scoring molecules of the filtering step (solid), both for the HMG example. In the filtering step, RigFit was used to shrink the NCI database to 5000 molecules. The enrichment in the second step was achieved with FlexS.

These fragments have been used to screen the entire NCI database [40]. For a complete search with a single fragment, this RigFit experiment took roughly 30 h of computing time. The dashed curve in Figure 9 shows the enrichment achieved in the HMG example case. Out of 114 active molecules, 53 were pushed into the top 5000 ranking molecules, i.e., into the top 4% of the database. These top 5000 candidate structures were then screened using FlexS, again with the reference compounds that performed best on the smaller datasets. For a complete search with a single reference compound, this experiment took about 150 h of CPU time. It was performed in parallel with our daily computing routine, distributing the computing task overnight across the available unused hardware in our institute. The solid curve in Figure 9 shows the achieved enrichment. The bar charts in Figure 10 give an impression of how the distribution of actives looks with these enrichments as compared to a random scattering of the actives. Again, similar results have been obtained across all datasets.

Figure 10. The bar charts qualitatively show the compression of the actives into the top-scoring fraction of the database. The left-hand side shows the RigFit experiment on the NCI database with 120 000 structures and randomly scattered actives in the beginning (leftmost). The top 5000 ranking compounds contain 53 of the 114 HMG actives after screening (second from left) and are taken as input for the FlexS screening, again randomly scattered (second from right). The resulting ranking of the actives is indicated in the rightmost column.
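The two-step filtering protocol described above can be outlined as a fast pre-filter followed by a slower rescoring of the survivors. In this sketch `coarse_score` and `fine_score` are hypothetical stand-ins for the RigFit and FlexS scoring runs, and higher scores are assumed to be better; neither function corresponds to an actual interface of those programs.

```python
from typing import Callable, Hashable, List, Sequence


def two_step_screen(
    database: Sequence[Hashable],
    coarse_score: Callable[[Hashable], float],
    fine_score: Callable[[Hashable], float],
    keep: int = 5000,
) -> List[Hashable]:
    """Rank with a cheap method, keep the top candidates, rescore them."""
    # Step 1: the fast method ranks the whole database and acts as a filter.
    shortlist = sorted(database, key=coarse_score, reverse=True)[:keep]
    # Step 2: the expensive method is applied only to the shortlist.
    return sorted(shortlist, key=fine_score, reverse=True)
```

The design trades recall for computing time: actives discarded in the first step cannot be recovered in the second, so the size of the shortlist bounds the number of actives the final ranking can contain.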
