Introduction

Measures of inter-molecular similarity play an important role in drug- and pesticide-discovery programmes, being used for both database searching [1] and structure-activity studies [2]. Many different types of similarity measure have been described in the literature (see, e.g., [3-5]), but the great majority of published studies have considered only a single type of measure; in many cases, indeed, the description of a new type of similarity measure forms the principal focus of the publication. Even where this is not the case, multiple measures have typically been employed only as the input to a comparative study that seeks to identify the 'best' measure, using some quantitative performance criterion. As an example, an early study in our laboratory [6] compared 36 different similarity measures by means of simulated leave-one-out property prediction, and concluded that the Tanimoto coefficient was the most appropriate of those tested for measuring the resemblances between pairs of fragment bit-strings. Such comparisons, of which there are many in the literature, are limited in that they assume, normally implicitly, that there is some specific type of structural feature, weighting scheme or other characteristic that is uniquely well suited to describing the type(s) of biological activity being sought in a similarity search. This assumption cannot be expected to be generally valid, given the multi-faceted nature of biological activities, and this article therefore investigates the use of data fusion [7] for combining multiple similarity measures.

* To whom correspondence should be addressed. E-mail: [email protected].
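
For readers unfamiliar with the Tanimoto coefficient mentioned above, the following minimal Python sketch illustrates its standard definition for two bit-strings: the number of bits set in both, divided by the number of bits set in either. The function name `tanimoto`, the encoding of each bit-string as a Python integer and the value returned for two empty bit-strings are assumptions made for illustration; they are not taken from the study cited as [6].

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fragment bit-strings held as integers.

    Each set bit is assumed to mark the presence of one structural fragment.
    """
    common = bin(a & b).count("1")  # fragments present in both molecules
    either = bin(a | b).count("1")  # fragments present in at least one
    # Returning 0.0 when neither molecule sets any bit is an assumed convention.
    return common / either if either else 0.0


# Hypothetical six-bit fingerprints, for illustration only.
mol_a = 0b101101
mol_b = 0b100111
similarity = tanimoto(mol_a, mol_b)  # 3 shared bits out of 5 set overall
```

Here the two hypothetical fingerprints share three set bits out of five set in total, so the coefficient is 3/5.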
