Crop protection score

Data

To extend the drug-likeness approach described above to crop protection compounds, an equivalent of the World Drug Index for the crop protection field would be desirable. Interestingly, no such database with chemical structures in a computer-readable format is available. Therefore, two databases assembled in-house at BASF were used: 986 crop protection compounds that are on the market or under development (CP), and 1203 compounds from recent crop protection patents (PAT). The structural overlap between these two databases is less than 1% (data not shown). Both databases were pre-processed with the same procedure as the drug data in Reference 4. The CP database (986 compounds), together with 1000 'non-crop protection' compounds from the ACD [14], served as the training set. The CP database can be regarded as the currently available world of crop protection compounds, whereas the PAT database, consisting of 1203 mostly newly patented compounds, can be regarded as the future of crop protection chemistry. The PAT database was used as the test set.
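The data set-up can be summarized in a few lines of code. The sketch below is a minimal illustration, assuming each compound is represented by a canonical structure identifier (for example a canonical SMILES string) so that identical structures compare equal as strings. The helper names `overlap_fraction` and `build_training_set`, the labeling convention (1 = crop protection, 0 = non-crop protection) and the toy identifiers are hypothetical, and the text does not state how the reported overlap was actually computed.

```python
from typing import Iterable, List, Tuple


def overlap_fraction(db_a: Iterable[str], db_b: Iterable[str]) -> float:
    """Fraction of the smaller database whose identifiers also occur in the other.

    Assumption of this sketch: entries are canonical structure identifiers,
    so shared structures can be found by plain set intersection.
    """
    set_a, set_b = set(db_a), set(db_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 0.0
    return len(set_a & set_b) / smaller


def build_training_set(cp: Iterable[str], acd: Iterable[str]) -> List[Tuple[str, int]]:
    """Label crop protection compounds 1 and ACD 'non-crop protection' compounds 0."""
    return [(c, 1) for c in cp] + [(c, 0) for c in acd]


# Hypothetical toy identifiers standing in for the in-house databases.
cp_toy = ["CCO", "c1ccccc1Cl", "CC(=O)N"]
pat_toy = ["CCN", "c1ccccc1Cl", "CCCC", "OCCO"]
acd_toy = ["CCCCCC", "O=C=O"]

training = build_training_set(cp_toy, acd_toy)  # CP + ACD -> training set
test_set = pat_toy                              # PAT -> held-out test set
print(len(training), overlap_fraction(cp_toy, pat_toy))
```

Measuring overlap relative to the smaller set is one choice among several; a Tanimoto-similarity cut-off instead of exact identity would be another.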

Results

The neural network was trained in the same way as the drug filter described above [4]. The results are shown in Figure 4.
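The network architecture and training procedure are described in [4] and are not reproduced here. Purely to illustrate the idea (a descriptor vector goes in, a score between 0 and 1 comes out), the following sketch trains a one-hidden-layer feed-forward network with stochastic gradient descent on a cross-entropy loss. The three-component descriptor vectors, the hidden-layer size, the learning rate and the epoch count are assumptions chosen for a toy example, not the settings used in [4] or for the crop protection filter.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Clamp the argument to avoid overflow in math.exp.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class ScoringNet:
    """Descriptor vector -> one hidden sigmoid layer -> score in (0, 1)."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def score(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def train(self, data: Sequence[Tuple[Sequence[float], int]],
              epochs: int = 2000, lr: float = 0.5) -> None:
        """Plain stochastic gradient descent on the cross-entropy loss."""
        for _ in range(epochs):
            for x, target in data:
                hidden, out = self._forward(x)
                # Sigmoid output + cross-entropy loss: dL/dz_out = out - target.
                d_out = out - target
                for j, h in enumerate(hidden):
                    # Back-propagate through unit j before its outgoing weight changes.
                    d_hid = d_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * d_out * h
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                    self.b1[j] -= lr * d_hid
                self.b2 -= lr * d_out


# Hypothetical scaled descriptor vectors: label 1 = crop protection, 0 = not.
toy_data = [
    ([1.0, 0.0, 0.2], 1), ([0.9, 0.1, 0.1], 1), ([0.8, 0.0, 0.3], 1),
    ([0.0, 1.0, 0.2], 0), ([0.1, 0.9, 0.1], 0), ([0.2, 0.8, 0.3], 0),
]
net = ScoringNet(n_in=3)
net.train(toy_data)
print([round(net.score(x), 2) for x, _ in toy_data])
```

After training, compounds of the positive class should receive scores near 1 and the others scores near 0, which is what produces the two separated distributions in a plot such as Figure 4.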

The vast majority of the CP compounds (91%) lie on the right-hand side of the diagram; these are the correctly classified crop protection compounds. The majority of the non-crop protection compounds (71%) lie on the left-hand side. Thus, the new filter is able to distinguish between compounds that are suited for crop protection purposes and those that are not. To assess the predictivity of the approach, the second crop protection database, PAT, representing the future of crop protection chemistry, was passed through the trained neural network. Figure 5 shows the score distribution for this dataset. Again, most of these 1203 compounds (69%) were classified correctly.
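Reading such percentages off a score distribution amounts to counting how many scores fall on each side of a cut-off, and drawing the distribution amounts to binning the scores. The sketch assumes scores in [0, 1] and a cut-off of 0.5 between the left-hand and right-hand sides of the diagram; neither the cut-off nor the bin width is stated in the text, and the helper names and score lists are hypothetical toy values.

```python
from typing import List, Sequence


def fraction_at_or_above(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of scores on the right-hand side (at or above the cut-off)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def fraction_below(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of scores on the left-hand side (below the cut-off)."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < threshold) / len(scores)


def relative_histogram(scores: Sequence[float], n_bins: int = 10) -> List[float]:
    """Relative frequency of scores in equal-width bins over [0, 1].

    Dividing by the dataset size lets sets of different size
    (e.g. 986 CP vs 1000 ACD compounds) share one axis.
    """
    counts = [0] * n_bins
    for s in scores:
        # A score of exactly 1.0 is put into the last bin.
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    total = len(scores) or 1
    return [c / total for c in counts]


# Hypothetical toy scores.
cp_toy = [0.92, 0.81, 0.73, 0.35]
acd_toy = [0.12, 0.27, 0.64, 0.33]
print(fraction_at_or_above(cp_toy), fraction_below(acd_toy))
print(relative_histogram(cp_toy, n_bins=5))
```

For the CP set the "correct" fraction is the right-hand share; for the ACD set it is the left-hand share, which is why two separate helpers are used.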

Cross-validation

In addition, the two filters for drugs and crop protection compounds were cross-validated by applying the crop protection network to the World Drug Index and the drug-likeness network to the crop protection compounds. The crop protection filter classified 67% of the WDI compounds as not suited for crop protection. Conversely, the drug filter classified 77% of the compounds in the two crop protection databases (CP and PAT, 2189 compounds) as non-drugs.

Figure 4. Distribution of crop protection scores in the CP (solid line) and ACD (dashed line) datasets.
Figure 5. Crop protection score distribution in the PAT dataset (solid line). For comparison, the distributions for the training data (Figure 4) are shown with thin dashed lines.
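The cross-validation amounts to scoring each filter's "foreign" data and counting rejections. A minimal sketch, assuming both trained networks are available as functions that map a compound identifier to a score in [0, 1] and that a score below 0.5 means "rejected". The function names, the 0.5 cut-off and the lookup-table stand-ins for the networks are hypothetical.

```python
from typing import Callable, Dict, Iterable

ScoreFn = Callable[[str], float]


def fraction_rejected(score_fn: ScoreFn, compounds: Iterable[str],
                      threshold: float = 0.5) -> float:
    """Fraction of compounds that a filter scores below the cut-off."""
    scores = [score_fn(c) for c in compounds]
    if not scores:
        return 0.0
    return sum(1 for s in scores if s < threshold) / len(scores)


def cross_apply(crop_filter: ScoreFn, drug_filter: ScoreFn,
                wdi: Iterable[str], cp: Iterable[str], pat: Iterable[str],
                threshold: float = 0.5) -> Dict[str, float]:
    # CP and PAT are pooled before applying the drug filter
    # (986 + 1203 = 2189 compounds in the text).
    pooled = list(cp) + list(pat)
    return {
        "wdi_not_crop_protection": fraction_rejected(crop_filter, wdi, threshold),
        "cp_pat_non_drugs": fraction_rejected(drug_filter, pooled, threshold),
    }


# Hypothetical lookup tables standing in for the two trained networks.
crop_scores = {"d1": 0.2, "d2": 0.7, "d3": 0.1}
drug_scores = {"c1": 0.3, "c2": 0.8, "p1": 0.4, "p2": 0.1}

result = cross_apply(lambda c: crop_scores[c], lambda c: drug_scores[c],
                     wdi=["d1", "d2", "d3"], cp=["c1", "c2"], pat=["p1", "p2"])
print(result)
```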
