Reaction prediction

It is becoming increasingly clear that combinatorial chemistry requires a deeper understanding of the reactions used to synthesize a library. It is common knowledge that organic reactions do not always give the expected products, that they may give by-products, that yields may be low, and that reactions proceed at different rates. In the early phases of the development of combinatorial chemistry these points did not always receive enough attention; they are now gaining in importance.

For many years we have been concerned with the modeling of chemical reactivity [11-15]. To support these investigations we have developed the EROS system (Elaboration of Reactions for Organic Synthesis) [16-19]. Recently, we developed a new version of EROS that has a wide range of applications, from modeling laboratory reactions through studying the degradation of chemicals in the environment, all the way to simulating mass spectra [19]. A detailed description of this new version is given elsewhere [19,20]; here, we only outline its application to combinatorial chemistry. The wide range of applications has been made possible by the introduction of novel, clearly defined concepts, of which only the essentials needed to show the use of EROS in combinatorial chemistry are presented here. In order to model a chemical reaction, details of the experimental set-up have to be given: the number of reactors and the number of phases, the mode of a reaction, and the kinetic set-up have to be specified.

These notions are explained as follows. A reactor is a place where reactions are run at the same time. A reactor may consist of several compartments that are separated from each other but communicate with each other; these are the phases. Phases may differ in their reaction conditions, e.g., in the solvent, and communicate through the interface. The mode of a reaction specifies how molecules react with each other: each starting material may react with every other molecule (mode = mix), the kinetics may be mono- or pseudo-monomolecular (mode = monomolecular), or no reaction may occur at all (mode = inert). The mode can be used in conjunction with the rate constants of the individual reaction steps given in the rule file (vide infra) to set up a system of differential equations for the kinetics of the system. This system of equations can be solved by the Gear algorithm [21], the Runge-Kutta method [22], or the Runge-Kutta-Merson method [23].
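As a rough illustration of how a mode and a rate constant translate into a system of differential equations, the following sketch integrates a single bimolecular step A + B -> P, as would arise under mode = mix, with the classical fourth-order Runge-Kutta method. The rate constant, the concentrations, and all function names are hypothetical and are not taken from EROS. For stiff systems a Gear-type method would be the more appropriate choice.

```python
from typing import Callable, List

State = List[float]


def rate_equations(k: float) -> Callable[[float, State], State]:
    """Return d[A]/dt, d[B]/dt, d[P]/dt for the step A + B -> P."""
    def f(t: float, y: State) -> State:
        a, b, _p = y
        r = k * a * b  # second-order rate law
        return [-r, -r, r]
    return f


def rk4_step(f: Callable[[float, State], State], t: float, y: State, h: float) -> State:
    """Advance the state y by one classical Runge-Kutta step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f: Callable[[float, State], State], y0: State,
              t_end: float, n_steps: int) -> State:
    """Integrate from t = 0 to t_end with a fixed step size."""
    t, y = 0.0, list(y0)
    h = t_end / n_steps
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


# Hypothetical values: k in L mol^-1 s^-1, concentrations in mol L^-1.
k = 0.5
a0 = 1.0
final = integrate(rate_equations(k), [a0, a0, 0.0], t_end=10.0, n_steps=1000)
```

For equal initial concentrations the result can be checked against the analytical solution [A] = [A]0 / (1 + k [A]0 t).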

The use and benefits of the concepts introduced above are illustrated with a variety of examples elsewhere [19] and will be indicated here for the example of a combinatorial chemistry experiment.

EROS aspires to be applicable to the entire range of organic chemistry. That is to say, the design of EROS is such that any type of organic reaction under arbitrary conditions can, in principle, be modeled. However, the knowledge presently available to EROS allows in-depth modeling of only a few reactions. It is presently up to a system engineer or computer chemist to develop models for a specific reaction type of interest in close collaboration with an organic chemist. Mechanisms for developing such a knowledge base for EROS are available.

In the design of EROS a clear-cut separation of the system proper from the knowledge base has been made. The knowledge of which reactions EROS can be applied to is contained in a separate file of rules on reaction types (Figure 4). This modular set-up allows easy addition of rules for new reaction types, thereby extending the scope of EROS.

EROS considers reactions as bond-breaking, bond-making, and electron-shifting procedures, much in the same way as an organic chemist might draw curved arrows to specify a reaction mechanism. A reaction rule specifies which bond and electron shifting pattern should be used. It can further place restrictions on the types of atoms (A, B, etc.) and bonds (single, double, etc.) to which this bond rearrangement is applied. Furthermore, a reaction rule can provide mechanisms for the evaluation of a reaction type. Such an evaluation might range from the calculation of heats of reaction [24,25], through explicit mathematical functions to model chemical reactivity [11], all the way to the calculation of absolute rate constants as exemplified for the hydrolysis of amides [26].

Figure 4. Basic set-up of the EROS system: the system core interacts with the routines for calculating physicochemical effects and the external knowledge base. (Diagram labels: generation and evaluation of reactions and products; rules for generation and evaluation; calculation of physicochemical properties.)

Figure 5. General reaction scheme for the breaking of two bonds followed by the formation of two new bonds; constraints for A, B, C and D can be applied according to atom type, atom and bond properties.
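The bond rearrangement of Figure 5 can be sketched in a few lines. This is only a minimal illustration under simplifying assumptions, not the EROS rule format: a structure is reduced to a set of unordered atom pairs, bond orders and atom-type constraints are ignored, and the atom labels and function names are hypothetical.

```python
from typing import FrozenSet, Set

Bond = FrozenSet[str]


def bond(x: str, y: str) -> Bond:
    """An unordered bond between two labeled atoms (bond order is ignored)."""
    return frozenset((x, y))


def apply_rule(bonds: Set[Bond], a: str, b: str, c: str, d: str) -> Set[Bond]:
    """Break A-B and C-D, then form A-C and B-D."""
    broken = {bond(a, b), bond(c, d)}
    if not broken <= bonds:
        raise ValueError("rule does not match: a bond to be broken is missing")
    return (bonds - broken) | {bond(a, c), bond(b, d)}


# Hypothetical atom labels for an acid chloride (C1-Cl) and an amine (N-H).
reactants = {bond("C1", "O1"), bond("C1", "Cl"), bond("C1", "R"),
             bond("N", "H"), bond("N", "R'")}
products = apply_rule(reactants, a="C1", b="Cl", c="N", d="H")
```

With A = C1, B = Cl, C = N, and D = H, the rule yields the amide bond C1-N and hydrogen chloride. A fuller rule would additionally test the atom and bond constraints before the rearrangement is applied.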

In this endeavor, the evaluation of a reaction can incorporate a variety of physicochemical effects that are automatically calculated by PETRA for any structure generated in an EROS run. These methods calculate partial atomic charges, $q_\sigma$ and $q_\pi$ [4,6], quantitative measures of the inductive effect, $\chi_\sigma$ [5], the resonance effect, $\chi_\pi$ [6], and the polarizability effect, $\alpha$ [7], as well as heats of formation [25]. The more experimental evidence is available for a certain reaction type, the better an evaluation can be derived and incorporated into a reaction rule.

A variety of software tools have been developed for extracting knowledge on a reaction type from a series of reaction instances. These comprise statistical methods [12] as well as neural networks [15,27,28], using both unsupervised and supervised learning. One such approach will be detailed in the next section.

First, however, we will show how the concepts of reactors, phases, and modes allow the handling of a set of reactions from a parallel synthesis or a combinatorial chemistry experiment. For such an experiment, a single reactor has to be defined because all reactions are performed at the same time. The set of all starting materials is analyzed for predefined substructures (e.g., acid chlorides), and all starting materials having a given substructure are stored in one phase. Thus, one initially has to specify as many phases as there are different types of starting materials.

We will illustrate the application of EROS to combinatorial chemistry with the parallel synthesis of amides from acid chlorides and amines. In this case, one needs two phases for storing the two types of starting materials (Figure 6). The mode of the first phase that stores all acid chlorides is set to 'inert' because no reactions are to be performed between these starting materials. The mode of the second phase, storing the amines, is set to 'interface'. This has the effect that one amine after the other is taken from this phase and each one is individually reacted with each separate acid chloride (at the 'interface'). Thus, all combinations of amines with acid chlorides are produced.

Figure 7. 15 acid chlorides react with 15 amines. All possible 225 amides were generated.

In the given example 15 acid chlorides were reacted with 15 amines giving 225 amides (Figure 7).
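The pairing of the two phases described above amounts to a full enumeration of all amine and acid chloride combinations. The following sketch shows this with placeholder names; the `Phase` class and the function `react_at_interface` are hypothetical illustrations and not part of EROS, and a real run would hold chemical structures rather than strings.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List, Tuple


@dataclass
class Phase:
    """A hypothetical container for one type of starting material."""
    name: str
    mode: str  # e.g. 'inert' or 'interface'
    molecules: List[str] = field(default_factory=list)


# Placeholder names standing in for the 15 + 15 starting materials.
acid_chlorides = Phase("acid chlorides", "inert",
                       [f"acid_chloride_{i}" for i in range(1, 16)])
amines = Phase("amines", "interface",
               [f"amine_{j}" for j in range(1, 16)])


def react_at_interface(moving: Phase, resting: Phase) -> List[Tuple[str, str]]:
    """Pair every molecule of the 'interface' phase with every molecule of the other phase."""
    return list(product(moving.molecules, resting.molecules))


amides = react_at_interface(amines, acid_chlorides)
print(len(amides))  # 225
```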

When several reaction steps are performed, more reactors are needed. In a previous publication [19] it has been shown how the Ellman synthesis [29] of 1,4-benzodiazepines from 2-aminobenzophenones, amino acids, and alkylating agents can be modeled by a sequence of five phases.

The reaction rules contain procedures for the evaluation of a reaction type. These models are derived by learning from a series of individual reactions much in the same way as chemists have learned their rules on chemical reactivity from a set of individual reactions.

The more observations on reactions are available and the better the experimental data, the more detailed and better a model can be developed. Clearly, if data on the kinetics of a series of reactions are available, an in-depth evaluation of a reaction type can be derived. We will show this in the following for a quantitative evaluation of the hydrolysis of amides.

When less detailed information is available, such as reaction yields, only a more cursory model can be obtained. At the lower end of the scale we only have information on reactions that have been observed, such as those stored in a reaction database. How knowledge on a reaction type can be obtained even in such a case will be shown in the next section on learning from reaction databases.

Figure 8. The numbering of atoms at the reaction center and all physicochemical descriptors as used in Equation 1.

But let us first address the case where kinetic data are available. Some time ago, we performed an in-depth analysis of the hydrolysis of amides, both under basic conditions and under acid catalysis [26]. Here, we concentrate only on amide hydrolysis under basic conditions.

Pseudo-first-order rate constants for the hydrolysis of 13 benzamides, 18 anilides, and five phenylureas were gathered from the literature, standardized to unit concentration of OH$^-$, and converted into free energies of activation, $\Delta G^{\ddagger}$. Then, a set of descriptors of electronic effects at the reaction center was calculated by the PETRA package for each individual reaction. The dataset was split into a training set of 33 reactions and a test set of five reactions. Stepwise multilinear regression analysis on the training set provided Equation 1 for the quantitative modeling of this reaction type.

$\Delta G^{\ddagger} = 107.6 - 198.1\,\Delta q_{\pi}(2\text{-}3) + 1.39$ RVD + UOR+p-o   (1)   (in kJ mol$^{-1}$)

$\Delta q_{\pi}(i\text{-}j)$ gives the difference in the $\pi$ charge of the bond between atoms i and j, whereas $R^{+}(i\text{-}j)$ is a measure of the stabilization of a positive charge through the resonance effect on atom i obtained on heterolysis of the bond.

Figure 8 shows the numbering of atoms at the reaction center and identifies the physicochemical descriptors as used in Equation 1.

Equation 1 modeled the free energies of activation of the reactions in the training set with a correlation coefficient, r, of 0.958 and a standard error, s, of 3.3 kJ mol$^{-1}$ (0.58 log k units) in a range of 27 kJ mol$^{-1}$. Application of Equation 1 to the five reactions of the test set showed a standard error of 1.8 kJ mol$^{-1}$ (0.31 log k units).
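The relation between free energies of activation, rate constants, and log k units can be sketched as follows. The text does not state how the conversion was carried out, so the Eyring equation and a temperature of 298.15 K are assumptions made here for illustration only, and the function names are hypothetical.

```python
import math

R = 8.314462618     # gas constant, J mol^-1 K^-1
K_B = 1.380649e-23  # Boltzmann constant, J K^-1
H = 6.62607015e-34  # Planck constant, J s
T = 298.15          # assumed temperature in K (not stated in the text)


def rate_constant(delta_g_kj: float, temperature: float = T) -> float:
    """Eyring equation: rate constant (s^-1) from delta G of activation (kJ mol^-1)."""
    return (K_B * temperature / H) * math.exp(-delta_g_kj * 1000.0 / (R * temperature))


def delta_g_activation(k: float, temperature: float = T) -> float:
    """Inverse Eyring equation: delta G of activation (kJ mol^-1) from k (s^-1)."""
    return -R * temperature * math.log(k * H / (K_B * temperature)) / 1000.0


def kj_to_log_k_units(error_kj: float, temperature: float = T) -> float:
    """Express an error in delta G (kJ mol^-1) in units of log10(k)."""
    return error_kj * 1000.0 / (math.log(10) * R * temperature)


train_error = kj_to_log_k_units(3.3)  # about 0.578
test_error = kj_to_log_k_units(1.8)   # about 0.315
```

Under these assumptions one log k unit corresponds to about 5.71 kJ mol$^{-1}$, so the standard errors of 3.3 and 1.8 kJ mol$^{-1}$ come out at about 0.578 and 0.315 log k units, in line with the quoted 0.58 and 0.31.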

This equation was incorporated into the rule base of EROS and allowed the calculation of absolute rate constants for the hydrolysis of a wide range of benzamides, anilides, and phenylureas [26]. It was shown that it could be generalized to the hydrolysis of benzoylphenylureas such as those constituting important agrochemicals. Clearly, kinetic data sufficient for such an in-depth analysis are available for only a few reaction types. In other cases, other data, preferably quantitative, are necessary for deriving models for the evaluation of reaction types. We place much hope in data on the yields of a series of reactions run under identical or systematically varied reaction conditions. We challenge readers to provide us with such data so that we can derive a model for their reaction type.

When not even yields for reactions run under comparable conditions are given, another approach for deriving knowledge on reactions has to be chosen. This is the theme of the following section.
