## Cellular-uptake dataset

### The dataset

These experiments involved a set of 136 biological dyes that are used to stain cells so as to visualise various organelles, specifically the lysosomes (L), the mitochondria (M), and the nucleus (N). These three broad activity classes were subdivided into eight mechanism-specific subclasses [24] and each molecule in the dataset was allocated an 8-bit activity bit-string in which the i-th bit was switched on if the molecule exhibited the i-th activity. Three different types of descriptor were used to characterise the molecules in this dataset: 2D fragments, 3D fragments and physical properties. The 2D fragment descriptors used here were the fingerprints produced by Barnard Chemical Information Limited (BCI) [25], while the 3D fingerprints were based on the NBN non-bonded torsion angle descriptor developed by Bath et al. [26]. The physical property descriptor comprised three standardised properties for each molecule: the logarithm of the octanol/water partition coefficient, the net electric charge, and the number of bonds included in the delocalised electron system of the molecule [24]. Each molecule in the dataset was considered as a target for similarity searching using each of the three similarity measures, with the similarity between a pair of molecules being calculated using the Tanimoto coefficient (the simple binary form of this coefficient for the 2D and 3D fingerprint measures and the generalised, non-binary form for the physical property measure) [5]. The three rankings for each target structure were fused using the SUM, MIN and MAX fusion algorithms defined previously.
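
The two forms of the Tanimoto coefficient used here can be sketched as follows. This is a minimal illustration of the standard formulae only; the descriptor pipelines themselves (BCI fingerprints, NBN descriptors) are not reproduced, and the function names are ours:

```python
def tanimoto_binary(a: set, b: set) -> float:
    """Binary Tanimoto: c / (x + y - c), where x and y are the numbers of
    bits set in each fingerprint and c the number of bits set in both.
    Fingerprints are represented here as sets of on-bit indices."""
    c = len(a & b)
    return c / (len(a) + len(b) - c)

def tanimoto_general(a: list, b: list) -> float:
    """Generalised (non-binary) Tanimoto for real-valued descriptor
    vectors, e.g. the standardised physical-property triples."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)
```

For example, two fingerprints sharing two of their three on-bits give `tanimoto_binary({0, 3, 5}, {3, 5, 7})` = 2 / (3 + 3 - 2) = 0.5.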

### Comparison of ranks and similarities

An inspection of Scheme 1 shows that Step 2 of the basic fusion procedure involves the rank positions for each database structure, rather than the similarity scores that are output by the similarity measure. At first sight, the former might seem the less intuitively reasonable approach, as it involves a loss of information when compared with the use of scores. However, there are two factors associated with the use of similarity scores that lessen their attractiveness. Firstly, as researchers are more likely to be concerned with some number of nearest neighbours of the target structure, rather than with those items that lie above some threshold of similarity, it seems logical to consider the rank positions of the items irrespective of their similarity scores. Secondly, and more importantly, even though different similarity measures may have the same range of scores (such as zero to unity for the binary version of the Tanimoto coefficient [5]), the distributions within those ranges may not be directly comparable, with the possibility of biasing the fusion rule in much the same way as unstandardised numeric data can affect the results of a multivariate analysis.
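
The rank-based fusion rules can be sketched as below. The exact definitions are those given earlier in the paper; this is a plausible minimal reading in which the fused value for each structure is the sum, minimum or maximum of its rank positions across the measures, with structures then re-ranked in ascending order of that value:

```python
def fuse(rankings: list, rule: str) -> list:
    """Fuse several rankings of the same database structures.

    rankings: list of dicts mapping structure id -> rank position
    (1 = most similar to the target).  rule: "SUM", "MIN" or "MAX",
    applied to each structure's rank positions across the measures.
    Returns the ids re-ranked in ascending order of the fused value
    (lower is better; ties broken arbitrarily by id)."""
    combine = {"SUM": sum, "MIN": min, "MAX": max}[rule]
    fused = {s: combine(r[s] for r in rankings) for s in rankings[0]}
    return sorted(fused, key=lambda s: (fused[s], s))
```

For example, with two measures ranking structures a, b and c as `{"a": 1, "b": 2, "c": 3}` and `{"a": 3, "b": 1, "c": 2}`, SUM gives fused values 4, 3 and 5, so b moves to the top of the fused ranking.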

We have compared the distributions of scores for each similarity method at each rank n, using the Kolmogorov-Smirnov test, which provides a simple and direct way of testing whether two distributions differ in any way, e.g., in location, dispersion or skewness [27]. If the distributions of the similarity scores for two original similarity measures are significantly different, then it would be unwise to fuse them without applying some form of standardisation procedure (i.e., the use of rank positions in the present context). Figure 1 shows plots of the mean similarity scores (averaged over all 136 target structures) at each rank position, n (1 ≤ n ≤ 100). The figure shows that while the 2D and 3D scores are distributed similarly, the physical property scores exhibit a markedly different distribution. Focusing upon the important top parts of the rankings, pairs of the distributions were compared for n = 1-10 using the Kolmogorov-Smirnov test: these tests showed that the distribution of scores for the physical properties measure was significantly different (p < 0.01) from those for both 2D and 3D for n = 1-10, and that the distribution of scores for 2D was significantly different from that for 3D for n = 1-4. We hence conclude that the distributions of similarity scores can be very different, even if they have the same ranges, thus supporting our use of ranks as the input to the various fusion rules studied here. Similar results were obtained [23] in a comparable study of the EVA and 2D rankings of the Starlist dataset mentioned previously.

Figure 1. Plots of mean score against rank for the three types of original (i.e., unfused) similarity measure for the cellular-uptake dataset.
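
The two-sample Kolmogorov-Smirnov comparison can be sketched as follows. This computes only the KS statistic (the largest gap between the two empirical cumulative distribution functions), not the p values reported in the text, which require the associated significance tables or a library routine:

```python
from bisect import bisect_right

def ks_statistic(xs: list, ys: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:  # the empirical CDFs can only jump at sample points
        d = max(d, abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m))
    return d
```

In practice one would collect, for a fixed rank position n, the scores of the rank-n structure over all 136 searches under each measure, and compare those two samples; identical samples give a statistic of 0 and fully separated samples give 1.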

### Fusion results

Having established the appropriateness of rank-based fusion, the main experiments were evaluated in two ways. In the first, a count was made of the molecules ranked in the top ten positions that belonged to the same activity subclass as the target structure. These counts were then averaged over each of the eight subclasses, with the results shown in Table 2, where L1-4 (lysosomes), M1-2 (mitochondria) and N1-2 (nucleus) denote the eight activity subclasses identified in the dataset. The bold elements denote fusion results that perform at least as well as the best individual similarity measure. It will be seen that the best similarity measure, in terms of actives being highly ranked, varies across activity subclasses; however, the results demonstrate that both SUM and MAX are, overall, to be preferred to the individual results.

Table 2. The mean number of actives in the top-10 rank positions for each activity class in the cellular-uptake dataset for the original similarity methods (columns 2D, Phys and 3D) and after data fusion (columns MAX, MIN and SUM). The bold numbers indicate a fused result at least as good as the best original similarity measure for that activity class

| Activity | 2D | Phys | 3D | MAX | MIN | SUM |
|---|---|---|---|---|---|---|
| L1 | 1.40 | 3.05 | 1.12 | 2.96 | 2.02 | **3.25** |
| L2 | 2.14 | 3.50 | 5.93 | 4.36 | 3.57 | 5.00 |
| L3 | 5.53 | 6.35 | 3.69 | 6.00 | 5.81 | 6.16 |
| L4 | 5.33 | 4.44 | 4.06 | 5.17 | 4.67 | **5.50** |
| M1 | 2.29 | 6.17 | 2.50 | 4.96 | 4.08 | 5.04 |
| M2 | 6.48 | 5.00 | 5.52 | **6.52** | 5.86 | 6.17 |
| N1 | 2.71 | 4.43 | 2.71 | 3.19 | 3.29 | 3.81 |
| N2 | 3.67 | 4.19 | 3.67 | 4.00 | **4.24** | 4.00 |
| Mean actives | 3.99 | 4.64 | 3.65 | **4.65** | 4.19 | **4.93** |
| Mean rank | 4.63 | 2.88 | 5.00 | 2.94 | 3.50 | 2.01 |

SUM also does well if one ranks the measures for each search, rather than using the actual numbers of actives retrieved (which vary considerably from one search to another). For example, in the first row of Table 2, SUM identifies the most actives and is given rank 1, Phys identifies the next highest number of actives and is given rank 2, and so on down to 3D, which identifies the smallest number of actives and is thus given rank 6. The mean ranks obtained in this way, averaged across the eight activity subclasses, are listed in the bottom row of the table and demonstrate clearly the effectiveness of the SUM fusion rule with this dataset.
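
The per-search ranking of the measures can be sketched as below, a minimal version that assigns rank 1 to the method retrieving the most actives (the source does not describe tie handling, so none is attempted here):

```python
def rank_methods(actives: dict) -> dict:
    """Rank the similarity methods for one search: the method
    retrieving the most actives gets rank 1, the next rank 2, and so
    on down to the worst method."""
    ordered = sorted(actives, key=lambda m: -actives[m])
    return {m: r for r, m in enumerate(ordered, start=1)}

# The L1 row of Table 2:
row_l1 = {"2D": 1.40, "Phys": 3.05, "3D": 1.12,
          "MAX": 2.96, "MIN": 2.02, "SUM": 3.25}
```

Applied to the L1 row, this reproduces the ordering described in the text: SUM gets rank 1, Phys rank 2, and 3D, with the smallest count, rank 6. Averaging these ranks over the eight subclasses gives the bottom row of Table 2.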

The second set of analyses employed the Hamming distance [5] between the activity bit-strings of the target structure and a database structure, i.e., the number of times that the two bit-strings differ. For example, if the target is active for subclasses L1, M1 and N1, then a Hamming distance of 0 between a database structure and the target indicates that the former is also active in subclasses L1, M1 and N1, and only in those classes, and would thus be a most appropriate hit for that target molecule. Figure 2 shows the mean Hamming distance for each similarity measure across all 136 target structures at rank n (1 ≤ n ≤ 10), and it can be seen that the SUM and MAX fusion algorithms give results that are consistently better (i.e., a smaller mean Hamming distance) than those from any of the individual similarity methods.

Figure 2. The mean Hamming distance at each rank n, 1 ≤ n ≤ 10.

A pairwise comparison of similarity methods was carried out using the Wilcoxon Matched-Pairs Signed-Ranks test [27]. Specifically, the test was used to compare the Hamming distances for each fusion rule with each of the original similarity methods, target by target, and thus to indicate whether the two methods being compared are significantly different. Table 3 shows the p values for n = 1-10. It can be seen that SUM is significantly better than each of the three original similarity methods for all values of n, with 28 out of the 30 sets of comparisons being highly significant (p < 0.01). MAX also performs well, but MIN is noticeably inferior to the other two fusion rules for this dataset.

Table 3. p values from the Wilcoxon test comparing each fusion rule (MAX, MIN and SUM) with each of the original similarity methods (2D, 3D and Phys) at rank n, 1 ≤ n ≤ 10

| n | MAX vs 2D | MAX vs 3D | MAX vs Phys | MIN vs 2D | MIN vs 3D | MIN vs Phys | SUM vs 2D | SUM vs 3D | SUM vs Phys |
|---|---|---|---|---|---|---|---|---|---|
| 1 | <0.01 | <0.01 | <0.01 | 0.18 | 0.84 | 0.34 | <0.05 | <0.01 | <0.01 |
| 2 | <0.01 | <0.01 | <0.01 | 0.24 | 0.53 | 0.58 | <0.01 | <0.01 | <0.01 |
| 3 | <0.01 | <0.01 | <0.01 | 0.79 | 0.43 | 0.54 | <0.01 | <0.01 | <0.01 |
| 4 | <0.01 | <0.01 | <0.01 | 0.43 | 0.43 | 0.33 | <0.01 | <0.01 | <0.01 |
| 5 | <0.01 | <0.01 | <0.01 | 0.58 | 0.42 | 0.38 | <0.01 | <0.01 | <0.01 |
| 6 | <0.01 | <0.01 | 0.43 | 0.42 | 0.20 | 0.60 | <0.01 | <0.01 | <0.01 |
| 7 | <0.01 | <0.01 | 0.07 | 0.32 | <0.05 | 0.53 | <0.01 | <0.01 | <0.01 |
| 8 | <0.01 | <0.01 | 0.11 | 0.38 | <0.05 | 0.43 | <0.01 | <0.01 | <0.05 |
| 9 | <0.01 | <0.01 | 0.21 | <0.05 | <0.01 | 0.43 | <0.01 | <0.01 | <0.05 |
| 10 | <0.01 | <0.01 | 0.36 | <0.01 | <0.01 | 0.28 | <0.01 | <0.01 | <0.05 |
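
The Hamming distance between activity bit-strings is straightforward to compute; a minimal sketch, with the bit-string layout (L1-4, M1-2, N1-2 in order) assumed for the example:

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length activity bit-strings:
    the number of positions at which they differ."""
    if len(a) != len(b):
        raise ValueError("bit-strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

For a target active in L1, M1 and N1 only, the 8-bit string under the assumed layout would be `"10001010"`; a database structure with the identical string is at distance 0, while one that is additionally active in N2 (`"10001011"`) is at distance 1.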

Taken together, these results show that the fused similarity measures can, in some cases at least, enable better predictions to be made of the cell-staining activities of the molecules than can the original measures, with SUM appearing to perform best of the three fusion rules tested here. When we take account of the rather variable performance of the individual similarity measures from one activity to another, it can be concluded that SUM-based fusion provides an effective way of generating a reliable single ranking with respect to both a single activity and the activity classes as a whole.
