Selection of reference molecules

A set of 100 reference molecules was extracted randomly from the MACCS Drug Data Report (MDDR), provided by MDL Information Systems Inc. (San Leandro, CA). Special attention was paid to avoiding any overlap of the reference set with the ligand test set.

3D structures for the reference set were generated by the Corina program. Hydrogens and partial charges, according to the Gasteiger-Marsili method [21], were added within the SYBYL molecular modeling package (Tripos Inc., St. Louis, MO).

A list of the respective MDDR ID-codes can be obtained from the authors upon request.

Selection of ligand test set

For all evaluations, we used the same test set of 957 ligands extracted from the MDDR. The set contains 134 PAF antagonists, 49 5-HT3 antagonists, 49 TXA2 antagonists, 40 ACE inhibitors and 111 HMG-CoA reductase inhibitors (383 active compounds in total). An additional 574 compounds, none of which belongs to any of the five activity classes, were selected randomly from the MDDR database. Since the test set was already used, with some slight modifications, for the development of Flexsim-X [14] and DOCKSIM [5], further details may be found in the respective papers and will not be repeated here.

Superimposing of ligand test set

Each ligand in the test set was superimposed rigidly onto each reference molecule, using the RigFit option of the program FlexS. Details of the FlexS and RigFit algorithms are described in References 18 and 22. A fingerprint vector for each test ligand was constructed from the pseudo fitting energies calculated by FlexS ('FlexS scores').

Genetic algorithm-based optimization of molecules in the reference panel

To select optimal combinations of reference molecules, we applied the genetic algorithm (GA)-based approach exactly as described in our Flexsim-X paper [14]; the procedure is not repeated here. As the fitness function we used the mean sample hit rate described in the 'Success measure' paragraph below. The optimum was achieved with the following GA parameters (see Reference 14 for a detailed discussion):

Mutation rate:

Crossover rate:

Replacement: uniform

Selection: tournament

Tournament size: 2

Population size: 200

Number of generations: 400
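To make the listed settings concrete, the following is a minimal sketch of a GA with tournament selection (tournament size 2), a population of 200 and 400 generations. It is not the procedure of Reference 14. The bit-mask encoding, the uniform crossover operator, the reading of 'uniform' replacement as overwriting a randomly chosen population member, the values of crossover_rate and mutation_rate, and the toy_fitness function (a stand-in for the mean sample hit rate) are all assumptions made for illustration.

```python
import random

def tournament(population, fitness, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)

def optimize_panel(n_refs, fitness, population_size=200, generations=400,
                   tournament_size=2, crossover_rate=0.8, mutation_rate=0.01,
                   seed=0):
    """Evolve a bit mask over the reference molecules (1 = keep in the panel).

    crossover_rate and mutation_rate are placeholder values, NOT the
    settings used in the study.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_refs)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        for _ in range(population_size):
            mother = tournament(population, fitness, rng, tournament_size)
            father = tournament(population, fitness, rng, tournament_size)
            if rng.random() < crossover_rate:
                # Uniform crossover: each bit is taken from either parent.
                child = [rng.choice(pair) for pair in zip(mother, father)]
            else:
                child = list(mother)
            # Flip each bit independently with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            # Assumed reading of 'uniform' replacement: overwrite a random member.
            population[rng.randrange(population_size)] = child
            best = max(best, child, key=fitness)
    return best

def toy_fitness(mask):
    """Hypothetical stand-in for the mean hit rate: rewards the first five bits."""
    return sum(mask[:5]) - sum(mask[5:])

# Small run for illustration; the study used the larger settings listed above.
best_mask = optimize_panel(10, toy_fitness, population_size=20, generations=5)
```

In a real application the fitness function would rebuild the score matrix from the selected reference molecules and return the mean hit rate, which is far more expensive than the toy function used here.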

Descriptor comparison

Generation of 2D descriptors

(1) DAYLIGHT fingerprints [6] were generated at fixed length (1024 bits) without folding.

(2) Within the ISIS database system, 166 predefined fragment-based keys are available (ISIS MOLSKEYS) [7].

These keys were extracted from the MDDR database and converted into a binary vector for each test ligand.

(3) Feature trees.

A feature tree is a more abstract 2D molecular descriptor than a fingerprint. Instead of counting atoms or fragments, a tree is constructed in which each node represents a set of chemical features. Details of the method are given in Reference 23.

(4) Tripos' Molecular Holograms [24].

Molecular holograms can be regarded as an extended form of 2D fingerprints. Instead of recording only the presence or absence of particular fragments, holograms keep a count of the number of times each fragment occurs. In addition, branched and cyclic fragments are considered separately. Molecular holograms were introduced by Tripos Inc. as part of the Hologram QSAR (HQSAR) module within SYBYL.
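The difference between a presence/absence fingerprint and a count-based vector can be illustrated with a short sketch. The fragment strings and the vocabulary are hypothetical, and the sketch leaves out how fragments are enumerated and mapped to vector positions, as well as the separate treatment of branched and cyclic fragments; it is not the HQSAR implementation.

```python
from collections import Counter

def binary_fingerprint(fragments, vocabulary):
    """1 if a fragment occurs at least once in the molecule, else 0."""
    present = set(fragments)
    return [1 if frag in present else 0 for frag in vocabulary]

def count_vector(fragments, vocabulary):
    """Number of occurrences of each fragment in the molecule."""
    counts = Counter(fragments)
    return [counts[frag] for frag in vocabulary]

# Hypothetical fragment labels for one molecule (illustration only).
vocabulary = ["C-C", "C=O", "C-N"]
fragments = ["C-C", "C-C", "C=O"]

bits = binary_fingerprint(fragments, vocabulary)
counts = count_vector(fragments, vocabulary)
```

Two molecules that contain the same fragments in different numbers have identical binary fingerprints but different count vectors.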

Generation of 3D descriptors

The score matrices for DOCKSIM and Flexsim-X were constructed as described in References 5 and 14. For DOCKSIM we used DOCK Version 3.5 with standard force field scoring.

The Flexsim-S method is described above.

Generation of Euclidean distance matrices

For all 2D and 3D descriptor matrices, Euclidean distances were calculated between the descriptor vectors of every pair of test ligands.
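A minimal sketch of this step, assuming each ligand is represented by a numeric vector of fixed length (a bit vector for the 2D descriptors, a row of scores against the reference panel for the 3D descriptors). The function name and the toy vectors are illustrative and are not data from this study.

```python
import math

def euclidean_distance_matrix(descriptors):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `descriptors` is a list of equal-length numeric vectors, one per ligand.
    """
    n = len(descriptors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(descriptors[i], descriptors[j])))
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical toy descriptors for three ligands.
toy = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
matrix = euclidean_distance_matrix(toy)
```

For the 957 ligands of the test set, the resulting matrix has 957 x 957 entries, of which only the upper triangle needs to be computed.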

Success measure

For each of the 383 ligands from the five activity classes in the ligand test set, the 10 most similar compounds (i.e. the 10 nearest neighbors) among the remaining compounds of the whole test set were determined. An individual 'hit rate' for each compound was computed as the fraction of nearest neighbors belonging to the same activity class as the query compound itself. Finally, the individual hit rates of all 383 ligands were averaged to give the mean hit rate. Enrichment is achieved when the nearest-neighbor hit rate is higher than the fraction of the particular activity class in the whole data set.
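The success measure can be sketched as follows, assuming a precomputed distance matrix and one activity-class label per ligand, with None marking the randomly selected compounds. The function name, the toy data and the handling of ties (resolved by list order) are assumptions of this sketch.

```python
def mean_hit_rate(dist, classes, k=10):
    """Mean nearest-neighbor hit rate over all compounds with an activity class.

    dist    : square matrix of pairwise distances between test ligands
    classes : activity-class label per ligand, or None for compounds
              that belong to no activity class
    k       : number of nearest neighbors inspected per query
    """
    rates = []
    for q, q_class in enumerate(classes):
        if q_class is None:
            continue  # only compounds from the activity classes act as queries
        others = [j for j in range(len(classes)) if j != q]
        neighbors = sorted(others, key=lambda j: dist[q][j])[:k]
        hits = sum(1 for j in neighbors if classes[j] == q_class)
        rates.append(hits / len(neighbors))
    return sum(rates) / len(rates)

# Hypothetical toy data: six compounds placed on a line, two classes.
positions = [0, 1, 2, 10, 11, 12]
classes = ["A", "A", None, "B", "B", None]
dist = [[abs(x - y) for y in positions] for x in positions]
rate = mean_hit_rate(dist, classes, k=2)
```

A hit rate is compared with the share of the class in the data set: PAF antagonists, for example, make up 134 of the 957 test ligands (about 14%), so a hit rate above that value indicates enrichment for this class.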

Calculation of overlap

For each compound from the five activity classes and each descriptor method, 20 nearest neighbors were determined. Subsequently, for each possible pair of descriptors, individual percentages of overlap were computed according to Equation 1 (e.g. a compound with 15 out of 20 nearest neighbors in common for a particular descriptor combination would have an individual overlap value of 75%).

$$\text{Individual \% overlap} = \frac{\text{Number of nearest neighbors in common}}{\text{Number of nearest neighbors considered}} \cdot 100 \qquad (1)$$

Finally, the mean percentage of overlap for each descriptor combination is calculated.
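The overlap calculation can be sketched as follows, assuming one distance matrix per descriptor method over the same ordered set of compounds. The function names and the toy data are illustrative, and ties between equally distant neighbors are resolved by list order, which is an assumption of this sketch.

```python
def nearest_neighbors(dist, query, k=20):
    """Indices of the k compounds closest to `query`, excluding the query."""
    others = [j for j in range(len(dist)) if j != query]
    return set(sorted(others, key=lambda j: dist[query][j])[:k])

def individual_overlap(dist_a, dist_b, query, k=20):
    """Percentage of the k nearest neighbors shared by two descriptors."""
    common = (nearest_neighbors(dist_a, query, k)
              & nearest_neighbors(dist_b, query, k))
    return 100.0 * len(common) / k

def mean_overlap(dist_a, dist_b, queries, k=20):
    """Mean individual overlap over all query compounds."""
    values = [individual_overlap(dist_a, dist_b, q, k) for q in queries]
    return sum(values) / len(values)

# Hypothetical toy data: five compounds seen by two descriptors.
pos_a = [0, 1, 2, 3, 4]
pos_b = [0, 1, 5, 2, 6]
dist_a = [[abs(x - y) for y in pos_a] for x in pos_a]
dist_b = [[abs(x - y) for y in pos_b] for x in pos_b]
overlap = mean_overlap(dist_a, dist_b, queries=[0, 1], k=2)
```

In the toy data each query shares one of its two nearest neighbors between the two descriptors, giving an overlap of 50%; with k = 20 and 15 shared neighbors the same formula gives 75%.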
