World Drug Index dataset

After the potential of data fusion had been demonstrated on a small dataset, the next set of experiments used a file of structures and associated broad-class bioactivity data from the World Drug Index (WDI) database [28]. Three different types of similarity measure were used here, based on 2D fragment occurrence data, on 3D geometric information and on molecular fields. The 2D rankings were obtained using the UNITY fingerprints mentioned previously, while the 3D rankings were obtained using the atom-mapping measure described by Pepperrell et al. [29]. This measure uses inter-atomic distance information to identify pairs of atoms, one in the target structure and one in the current database structure, that are surrounded by similar patterns of atoms; these initial atomic equivalences are then used to construct an approximate mapping of the target structure onto the database structure. The field-based rankings were obtained using the FBSS (field-based similarity searching) program described by Drayton et al. [30], in which a target structure is aligned with a database structure by means of their steric, hydrophobic and electrostatic fields. The particular version of the program used here considered all three types of field in the generation of an alignment, and hence in the resulting similarity score (this corresponds to the 'All' search of Drayton et al. [30]).

Ten target structures were chosen that had been used previously by Kearsley et al. in their studies of WDI-based similarity searching [17]. The similarity searches were performed on datasets of approximately 3600 structures, each containing the molecules in the activity class of the target structure together with an additional 3500 randomly selected WDI molecules. The data available for fusing comprised three sets of rankings (one for each of the original similarity measures) for each of the 10 targets, with the effectiveness of each search being measured by the number of molecules in the top-50 rank positions that had the same activity as the target; other performance measures for this dataset are discussed by Ginn [23]. Table 4 lists the numbers of actives identified in the original and fused searches for each of the 10 target structures. The results are similar to those obtained with the cellular-uptake dataset: while the fused results are not always as good as the best individual result, they provide a generally high, and thus robust, level of effectiveness, whereas the best original measure varies from target to target. This is particularly clear if one inspects the mean actives and mean ranks at the bottom of the table, where it will be seen that SUM would again seem to be the fusion rule of choice.
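The fusion and evaluation steps described above can be illustrated with a minimal sketch. It assumes that each similarity measure supplies a rank position for every molecule (1 = most similar) and that the MAX, MIN and SUM rules are applied to those rank positions, with a smaller fused value placing a molecule nearer the top of the fused list. The function names `fuse_rankings` and `actives_in_top`, the tie-breaking by molecule identifier and the five-molecule toy data are all hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, List, Sequence, Set


def fuse_rankings(rankings: Sequence[Dict[str, int]],
                  rule: Callable[[List[int]], int]) -> List[str]:
    """Fuse several rank lists (1 = most similar) into one ordering.

    `rule` is applied to the rank positions a molecule receives from the
    individual measures; the built-ins max, min and sum stand in for the
    MAX, MIN and SUM fusion rules (an assumption of this sketch).
    """
    molecules = rankings[0].keys()
    fused = {m: rule([r[m] for r in rankings]) for m in molecules}
    # Smaller fused value = nearer the top; ties are broken by identifier
    # purely to make the toy example deterministic.
    return sorted(fused, key=lambda m: (fused[m], m))


def actives_in_top(order: List[str], actives: Set[str], k: int = 50) -> int:
    """Count molecules in the top-k positions that share the target's activity."""
    return sum(1 for m in order[:k] if m in actives)


# Toy rankings from three hypothetical measures for five molecules.
rank_2d = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
rank_3d = {"a": 3, "b": 1, "c": 5, "d": 2, "e": 4}
rank_fbss = {"a": 2, "b": 5, "c": 1, "d": 4, "e": 3}
all_ranks = [rank_2d, rank_3d, rank_fbss]
actives = {"a", "d"}

for name, rule in (("MAX", max), ("MIN", min), ("SUM", sum)):
    order = fuse_rankings(all_ranks, rule)
    print(name, order, actives_in_top(order, actives, k=2))
```

In the experiments reported here the cut-off would be k = 50 and each ranking would cover the roughly 3600 structures of a dataset; the toy example uses k = 2 only to keep the output short.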

Table 4. The number of actives found in the top-50 rank positions for searches in the WDI database for the original similarity methods (columns 3D, 2D and FBSS) and after data fusion (columns MAX, MIN and SUM). The bold underlined numbers indicate a fused result at least as good as the best original similarity measure for that target structure.

Target            3D     2D     FBSS   MAX    MIN    SUM
Apomorphine       15     23     14     24*    16     26*
Captopril         23     34     12     26     27     31
Cycliramine       43     31     36     43*    42     45*
Diazepam          27     27     15     23     23     22
Diethylstilb'ol   44     33     34     42     38     42
Fenoterol         19     33     17     28     29     31
Gabaxadol         6      2      6      5      6*     5
Morphine          20     28     16     19     24     16
RS86              0      8      5      10*    6      14*
Serotonin         13     19     13     13     20*    15
Mean actives      21.0   23.8   16.8   23.3   23.1   24.7
Mean rank         3.60   3.05   5.15   3.40   3.05   2.75

* Bold underlined entry: a fused result at least as good as the best original similarity measure for that target structure.
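The two summary rows of Table 4 can be recomputed from the per-target counts, as in the short sketch below. It assumes that the mean rank is obtained by ranking the six methods within each target (1 = most actives), with tied counts sharing the average of the positions they occupy; this tie-handling convention is an assumption, although it does reproduce the printed values. The helper name `average_ranks` is hypothetical.

```python
METHODS = ["3D", "2D", "FBSS", "MAX", "MIN", "SUM"]

# Actives in the top-50 positions, in METHODS order, as listed in Table 4.
TABLE4 = {
    "Apomorphine": [15, 23, 14, 24, 16, 26],
    "Captopril": [23, 34, 12, 26, 27, 31],
    "Cycliramine": [43, 31, 36, 43, 42, 45],
    "Diazepam": [27, 27, 15, 23, 23, 22],
    "Diethylstilb'ol": [44, 33, 34, 42, 38, 42],
    "Fenoterol": [19, 33, 17, 28, 29, 31],
    "Gabaxadol": [6, 2, 6, 5, 6, 5],
    "Morphine": [20, 28, 16, 19, 24, 16],
    "RS86": [0, 8, 5, 10, 6, 14],
    "Serotonin": [13, 19, 13, 13, 20, 15],
}


def average_ranks(scores):
    """Rank scores in descending order (1 = best); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


rows = list(TABLE4.values())
mean_actives = [sum(col) / len(col) for col in zip(*rows)]
mean_ranks = [sum(col) / len(col) for col in zip(*(average_ranks(r) for r in rows))]

for name, actives, rank in zip(METHODS, mean_actives, mean_ranks):
    print(f"{name:5s} mean actives = {actives:.1f}  mean rank = {rank:.2f}")
```

Under these assumptions the script gives mean actives of 21.0, 23.8, 16.8, 23.3, 23.1 and 24.7 and mean ranks of 3.60, 3.05, 5.15, 3.40, 3.05 and 2.75 for 3D, 2D, FBSS, MAX, MIN and SUM respectively, in agreement with the last two rows of the table.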
