
[Figure 35, panels (a) to (e). Surviving labels: N-acetylamino sugar, hexose, pentose; R. toruloides, F. capsuligenum; axis: factor score F1 (ticks -2 to 1).]

Figure 35. Pyrolysis mass spectrum of the yeast Rhodosporidium toruloides together with some factor spectra obtained on comparison with spectra of the other yeast species Saccharomyces cerevisiae and Filobasidium capsuligenum. The complex yeast spectra (e.g. Figure (a)) contain various sub-sets of fragment peaks attributable to main groups of biochemical components, viz. proteins (marker peaks at m/z 34, 48, 69, 83, 92, 94, 108, 117, 131), hexoses (m/z 55, 58, 68, 72, 74, 82, 84, 85, 96, 98, 102, 110, 112, 126, 144), pentoses (m/z 55, 58, 60, 68, 70, 72, 82, 84, 85, 86, 96, 98, 114) and N-acetylamino sugars (m/z 59, 73, 83, 97, 109, 123, 125, 137). Figure (b) shows the first factor (unrotated); the positive part represents a protein sub-pattern and the negative part a mixed pattern of mainly carbohydrate fragment peaks. Figure (c) shows the negative part of the first factor after rotation of the feature space over 60°, and represents a pentose sub-pattern. Figure (d) shows the positive part of the second factor in the 60° rotation configuration and represents the strongly correlated hexose and N-acetylhexosamine sub-patterns which cannot be further separated. The plot of the unrotated factor scores for factors 1 and 2 is given in Figure (e). The inner product of a spectrum and a "component factor" gives the rotated factor scores; F. capsuligenum is relatively rich in pentose components, S. cerevisiae in protein and R. toruloides in hexoses and N-acetylamino sugars. It should be noted that this type of plot in general gives a better impression of dissimilarities between spectra in terms of chemical components than do non-linear maps such as that presented in Figure 31.
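The rotation and scoring steps described in the caption can be illustrated with a minimal sketch. It assumes two orthonormal factor spectra defined over the same m/z channels, a plane rotation by a chosen angle, and a factor score computed as the inner product of a (mean-centred) spectrum with a factor. The four-channel vectors, the function names and the sign convention of the rotation are hypothetical and are not taken from the original analysis.

```python
import math


def inner_product(u, v):
    """Sum of channel-wise products of two equal-length intensity vectors."""
    return sum(a * b for a, b in zip(u, v))


def rotate_factor_pair(f1, f2, degrees):
    """Rotate two factor spectra within the plane they span.

    The sign convention (counter-clockwise) is an assumption of this sketch.
    """
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    g1 = [c * a + s * b for a, b in zip(f1, f2)]
    g2 = [-s * a + c * b for a, b in zip(f1, f2)]
    return g1, g2


def positive_part(factor):
    """Keep only the positive loadings, e.g. to display one sub-pattern."""
    return [max(x, 0.0) for x in factor]


def negative_part(factor):
    """Keep only the negative loadings, returned as positive intensities."""
    return [max(-x, 0.0) for x in factor]


# Hypothetical toy data: two orthonormal factors over four m/z channels
# and one mean-centred spectrum. None of these values come from Figure 35.
f1 = [0.5, 0.5, -0.5, -0.5]
f2 = [0.5, -0.5, 0.5, -0.5]
spectrum = [1.0, -0.2, 0.4, -1.2]

g1, g2 = rotate_factor_pair(f1, f2, 60.0)

unrotated_scores = (inner_product(spectrum, f1), inner_product(spectrum, f2))
rotated_scores = (inner_product(spectrum, g1), inner_product(spectrum, g2))

print("unrotated scores:", unrotated_scores)
print("rotated scores:  ", rotated_scores)
print("negative part of rotated factor 1:", negative_part(g1))
```

Because the rotation is orthogonal, the rotated factors stay orthonormal and the rotated scores are simply the unrotated scores turned through the same angle; only the chemical interpretability of the axes changes.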

The applicability of several other important multivariate analysis techniques in Py-MS, e.g. principal component analysis or discriminant analysis, remains to be further demonstrated. Principal component analysis (ref. 132) is basically a data reduction technique, but as such it has some serious disadvantages for Py-MS applications. First, the transformation from feature space to principal component space (Karhunen-Loève transformation) is a full multivariate procedure requiring considerable computer time. Thus, in most applications, little or no gain in time or efficiency can be expected by reducing the data in this way. Secondly, the familiar m/z values are replaced by principal component values which do not necessarily have any chemical meaning. In factor analysis, an attempt is made to obtain factors which correspond directly to components of the sample, and special adaptations [Varimax rotation (ref. 138)] exist to accommodate situations where the components in the sample are not mutually independent, as will be the rule rather than the exception in complex organic samples. Principal components, however, are mutually independent [orthogonality principle (ref. 132)].
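The transformation to principal component space and the orthogonality of the resulting components can be shown on a small scale. The sketch below is an illustration only: it extracts the two leading principal components of a covariance matrix by power iteration with deflation, which is a stand-in chosen for brevity and not the procedure of ref. 132. The five three-channel "spectra" and all function names are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-channel mean from every spectrum."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (channels x channels) of centred spectra."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    p = len(matrix)
    start = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    norm = math.sqrt(dot(start, start))
    v = [x / norm for x in start]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return dot(v, mat_vec(matrix, v)), v


def deflate(matrix, eigenvalue, eigenvector):
    """Remove an already extracted component from the covariance matrix."""
    p = len(matrix)
    return [[matrix[i][j] - eigenvalue * eigenvector[i] * eigenvector[j]
             for j in range(p)] for i in range(p)]


# Hypothetical toy data: five spectra, three m/z channels.
spectra = [[10, 2, 5], [8, 4, 5], [6, 6, 6], [4, 8, 5], [2, 10, 4]]

cov = covariance(mean_center(spectra))
lam1, pc1 = leading_eigenpair(cov)
lam2, pc2 = leading_eigenpair(deflate(cov, lam1, pc1))

print("variance along PC1 and PC2:", lam1, lam2)
print("inner product of PC1 and PC2:", dot(pc1, pc2))
```

The inner product of the two components is zero to within rounding, which is the orthogonality constraint mentioned above; the loadings themselves mix all channels and need not resemble any single chemical component.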

Perhaps the most useful application of principal component analysis in Py-MS is to arrive at an estimate of the intrinsic ("true") dimensionality of the data set or, in chemical terms, the number of chemical components which vary between the samples of the set (ref. 144). Knowledge of this intrinsic dimensionality, which may be much lower than the number of mass peaks per spectrum, is important for assessing the suitability of the data set for supervised learning procedures, such as the linear learning machine (LLM) approach. When using discriminant analysis techniques, such as LLM, a reliable differentiation between two or more classes of spectra can be obtained only if the number of spectra per class is at least three times the intrinsic dimensionality of the data set (ref. 145). For instance, when trying to establish a reliable discriminant function for two or more classes of pyrograms reflecting differences in twenty chemical components, the training set of pyrograms of known class should consist of at least sixty pyrograms for each of the classes considered. If this condition is not fulfilled, the discriminant function calculated will show poor classification performance. This requirement strongly limits the application of LLM and other discriminant analysis techniques to the classification of pyrolysis mass spectra of complex biomaterials. As discussed in Section 6.5, non-supervised techniques such as Kruskal's non-linear mapping method can be extremely useful in visualising differentiation tendencies in relatively small data sets which do not satisfy the basic requirements for discriminant techniques. A different, highly promising approach to the analysis of relatively small data sets has been developed by Wold and Sjöström (ref. 146). This approach, the so-called SIMCA technique, was recently applied to Py-GC data (ref. 147) but has not yet been tested with Curie-point Py-MS data. The SIMCA program can be obtained from Infometrix (ref. 130).
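The sample-size requirement can be checked with simple arithmetic once an estimate of the intrinsic dimensionality is available. The sketch below makes two assumptions that are not in the text: the dimensionality is estimated as the number of principal components needed to reach a chosen fraction of the total variance (the 0.95 cut-off is arbitrary, and ref. 144 may use a different criterion), and the class names and sizes are invented. Only the factor of three and the twenty-component example come from the passage above.

```python
def intrinsic_dimensionality(eigenvalues, variance_fraction=0.95):
    """Number of leading components needed to reach the variance fraction.

    This cumulative-variance criterion is an assumption of the sketch,
    not necessarily the estimator used in the cited work.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= variance_fraction * total:
            return count
    return len(ordered)


def min_spectra_per_class(dimensionality, factor=3):
    """Rule from the text: at least three spectra per class per dimension."""
    return factor * dimensionality


def training_set_is_adequate(class_sizes, dimensionality, factor=3):
    """True only if every class meets the minimum number of spectra."""
    needed = min_spectra_per_class(dimensionality, factor)
    return all(size >= needed for size in class_sizes.values())


# The worked example from the text: twenty varying chemical components.
print(min_spectra_per_class(20))

# Hypothetical eigenvalue spectrum and class sizes, for illustration only.
eigenvalues = [5.0, 3.0, 1.5, 0.3, 0.2]
dim = intrinsic_dimensionality(eigenvalues, variance_fraction=0.9)
class_sizes = {"class_A": 12, "class_B": 7}
print(dim, training_set_is_adequate(class_sizes, dim))
```

A check of this kind makes it explicit why small Py-MS data sets of complex biomaterials rarely qualify for discriminant techniques: the required number of spectra per class grows linearly with the number of independently varying components.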

