Figure 28. Three-dimensional representation of two hypothetical mass spectral patterns with features X, Y and Z, showing Euclidean distance and city block distance.

All metric similarity coefficients in two- or three-dimensional space can be visualised provided that suitable linear or non-linear axes are chosen. For a mathematical description of the various similarity coefficients used in multivariate statistical analysis, the reader is referred to specialised textbooks (refs. 132, 133) or to the SPSS User Manual (ref. 128).
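The two distances shown in Figure 28 can be illustrated with a minimal sketch. The feature values below are hypothetical and chosen only to make the arithmetic easy to follow. The function names are illustrative and are not taken from the cited references or from SPSS.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two patterns in feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block_distance(a, b):
    """Sum of absolute differences along each feature axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical intensities for features X, Y and Z (not real spectra).
pattern_a = (1.0, 2.0, 3.0)
pattern_b = (4.0, 6.0, 3.0)

d_euclid = euclidean_distance(pattern_a, pattern_b)
d_city = city_block_distance(pattern_a, pattern_b)

print(f"Euclidean distance:  {d_euclid}")
print(f"City block distance: {d_city}")
```

For these values the per-axis differences are 3, 4 and 0, so the Euclidean distance is 5 and the city block distance is 7. The city block distance is never smaller than the Euclidean distance between the same two points.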

In our experience, the modified Euclidean distance algorithm described by Eshuis et al. (ref. 45) and the closely related chi-squared algorithm provide the most satisfactory results for pyrolysis mass spectra, provided that the proper feature scaling techniques are applied first. Again referring to the DMD data, Table 6 shows the distance matrix obtained for the Py-MS spectra when using the 40 mass peaks with the highest "characteristicity" values and employing the chi-squared formula. The diagonal of the matrix shows the distance values between duplicate spectra of the same sample. Examination of these values allows poorly reproducing spectra to be detected; these may then be deleted and the matrix recalculated. If no obvious outliers remain among the duplicate spectra, the matrix can be substantially reduced by listing only the distance values between the average spectra of the samples, as illustrated in Table 7. Close scrutiny of the values in Table 7 enables a crude mental picture to be formed of the relative positions of the average spectra in multi-dimensional space. Evidently, the spectra of the non-dystrophic controls form a small cluster, with the spectra of the dystrophic patients moving progressively away from this cluster with increasing age of the patient. To form such mental pictures becomes increasingly difficult, if not impossible, with
