Data Preprocessing

Each of the six muscle tissue samples was analysed in quadruplicate, thus generating 24 spectra in total. Representative spectra obtained by pyrolysing 10 µg of muscle tissue on filaments with a Curie-point temperature of 610°C and using electron impact ionisation at 14 eV electron energy are shown in Figure 25 for control C and patient F.

Visual examination of the spectra shows obvious similarities (note the characteristic ion cluster at m/z 91 to m/z 100) and differences (compare m/z 67) between the two spectra. The first question to be addressed is whether these differences are characteristic of the samples or due to variations introduced by the analytical procedure. Assuming that these differences indeed prove to be characteristic of the samples, the next question is whether they are correlated with involvement in the disease or are due to intra- and interindividual variations in the samples. Before such questions can be addressed by making numerical comparisons between the spectra, all ion intensity measurements have to be properly scaled. Two types of scaling operations are normally performed, namely pattern (= spectrum) scaling, more often referred to as "normalisation", and feature (= mass peak) scaling. Both types of scaling will be discussed in some detail since previous texts have paid little attention to these procedures, which are crucial to the outcome of the numerical evaluation procedure.

6.2.1. Pattern scaling

Pattern scaling (normalisation) is performed to compensate for variations in the overall ion intensity caused by phenomena not relevant to the analytical problem, such as differences in sample size or changes in instrument sensitivity. The most direct, often satisfactory way to normalise mass spectra for numerical purposes is to express peak heights as percentage total ion intensity. This procedure works better as the number of peaks increases and as the variation in individual peak heights decreases. The main problem in using this procedure is the occurrence of very large peaks, especially when these peaks exhibit a high degree of intra- and/or inter-sample deviation. If such a peak happens to be unusually high in a given spectrum, then all other peaks will be given low relative intensity values, which may considerably confuse further quantitative and qualitative comparisons between the spectra. A simple solution to this problem is to exempt all peaks larger than a certain percentage of total signal intensity, in one or more of the spectra compared, from the normalisation procedure. Once the remaining peaks have been normalised to 100% of the residual total ion intensity, the initially omitted peaks can be scaled accordingly, thus bringing the sum of the relative ion intensities to a value greater than 100%. The obvious shortcoming of this procedure is, however, that elimination of the largest peaks is at best a very rough way of eliminating potential sources of strong variation. Some large peaks may be very stable and contribute little to intra-sample (inner) or inter-sample (outer) deviation, whereas a relatively small peak may be responsible for most of the total (pooled) deviation in the system. An alternative procedure would therefore be to exempt all peaks exhibiting more than a certain amount of inner and/or outer deviation, depending on the analytical problem. However, this requires a knowledge of these deviations, which can be calculated accurately only after adequate normalisation. Therefore, this approach requires iterative calculation of normalisation coefficients and variances while removing peaks with high deviation values until no more peaks can be found with deviation values above a certain level.
Apart from being demanding on computer time, this iterative procedure needs to be used judiciously when high levels of inner or outer deviation exist in the data, otherwise all but a few small peaks might be removed, and the normalisation might be based on poor signal statistics.

Figure 25. Pyrolysis mass spectra of muscle tissue from a control subject (C) and a dystrophic patient (F). For clinical data see Table 4. Conditions: sample 10 µg; Tc 610°C; Eel 14 eV. Arrows indicate increased intensities as compared to the other spectrum. Horizontal axis: m/z, 20 to 140.
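
The simple exemption scheme described above (normalise on the residual ion intensity, then scale the exempted peaks by the same factor) might be sketched as follows. This is a minimal illustration in Python and not the implementation used for the data in this chapter; the function names `normalise_percent` and `large_peaks`, and the representation of a spectrum as a dictionary from m/z to raw intensity, are assumptions made for the example.

```python
from typing import Dict, Iterable, Set

# Hypothetical representation: m/z value -> raw ion intensity
Spectrum = Dict[int, float]


def normalise_percent(spectrum: Spectrum, exempt: Iterable[int] = ()) -> Spectrum:
    """Scale a spectrum so that the non-exempt peaks sum to 100 %.

    Exempt peaks are left out of the reference sum but are multiplied by
    the same factor, so the overall total may exceed 100 %.
    """
    exempt_set: Set[int] = set(exempt)
    residual = sum(v for mz, v in spectrum.items() if mz not in exempt_set)
    if residual <= 0:
        raise ValueError("no signal left in the non-exempt peaks")
    factor = 100.0 / residual
    return {mz: v * factor for mz, v in spectrum.items()}


def large_peaks(spectra: Iterable[Spectrum], threshold_pct: float) -> Set[int]:
    """Return the m/z values that exceed threshold_pct of the total ion
    intensity in at least one of the spectra compared."""
    flagged: Set[int] = set()
    for spectrum in spectra:
        for mz, pct in normalise_percent(spectrum).items():
            if pct > threshold_pct:
                flagged.add(mz)
    return flagged
```

With two invented spectra that differ only in one dominant peak, plain percentage scaling depresses every other peak in the spectrum where that peak is high, whereas exempting the peak found by `large_peaks` gives the remaining peaks identical relative intensities in both spectra.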

For the muscle biopsy data a compromise solution was adopted. Preliminary pattern scaling was performed on the basis of percentage ion intensity using all peaks. Subsequently, inner and outer deviation values were calculated for all peaks. Those peaks showing more than 5% relative ion intensity and/or an inner or outer deviation larger than 1% of total deviation were removed from the second and final pattern scaling operation. This resulted in the temporary removal of m/z 17, 34, 43, and 67.
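
The two-pass compromise can be sketched in the same spirit. The text does not define how inner and outer deviation were computed, so the sketch below makes explicit assumptions: inner deviation is taken as the mean of the replicate variances of a peak, outer deviation as the variance of the sample means, "total deviation" as the sum of the respective quantity over all peaks, and the 5% size criterion is applied to every individual spectrum. All function names are hypothetical.

```python
from statistics import mean, variance
from typing import Dict, List, Set, Tuple

# Hypothetical layout: sample id -> replicate spectra, each a list of
# intensities over the same m/z axis
Replicates = Dict[str, List[List[float]]]


def scale(spectrum: List[float], exempt: Set[int]) -> List[float]:
    """Scale so that the non-exempt peaks sum to 100 %."""
    residual = sum(v for j, v in enumerate(spectrum) if j not in exempt)
    if residual <= 0:
        raise ValueError("no signal left in the non-exempt peaks")
    return [100.0 * v / residual for v in spectrum]


def inner_outer_variance(data: Replicates) -> Tuple[List[float], List[float]]:
    """Per-peak inner (within-sample) and outer (between-sample) variance.

    Assumed definitions, see the lead-in. Needs at least two samples
    with at least two replicates each.
    """
    n_peaks = len(next(iter(data.values()))[0])
    inner: List[float] = []
    outer: List[float] = []
    for j in range(n_peaks):
        per_sample = [[rep[j] for rep in reps] for reps in data.values()]
        inner.append(mean([variance(vals) for vals in per_sample]))
        outer.append(variance([mean(vals) for vals in per_sample]))
    return inner, outer


def two_pass_scaling(
    data: Replicates, max_pct: float = 5.0, max_share: float = 0.01
) -> Tuple[Replicates, Set[int]]:
    """Preliminary scaling on all peaks, then final scaling without the
    peaks that are too large or too variable."""
    # Pass 1: percentage of total ion intensity using all peaks
    first = {s: [scale(rep, set()) for rep in reps] for s, reps in data.items()}
    inner, outer = inner_outer_variance(first)
    total_inner, total_outer = sum(inner), sum(outer)

    exempt: Set[int] = set()
    for j in range(len(inner)):
        too_large = any(rep[j] > max_pct for reps in first.values() for rep in reps)
        too_variable = (inner[j] > max_share * total_inner
                        or outer[j] > max_share * total_outer)
        if too_large or too_variable:
            exempt.add(j)

    # Pass 2: final scaling on the residual ion intensity only
    second = {s: [scale(rep, exempt) for rep in reps] for s, reps in data.items()}
    return second, exempt
```

The default thresholds mirror the 5% and 1% values quoted above and only make sense for spectra with many peaks; on a toy data set with a handful of peaks they would exempt everything, which is the failure mode the preceding discussion warns about.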

It should be noted that the choice of the normalisation procedure and of most of the following numerical procedures depends largely on the operator's analytical philosophy. If, for example, the pyrolysis mass spectra of polystyrene and polytetrafluoroethylene, which show peaks only at m/z 104 and 100 respectively (see Figure 1), are to be compared numerically, then the iterative normalisation procedure will fail completely, as both peaks are removed and no group of stable peaks common to both spectra is left to provide an internal reference for pattern scaling. Even if this problem is ignored by using sample weight or total ion current as a normalisation reference, further problems may be encountered when actually comparing the spectra numerically, depending on the type of similarity coefficient used. In general it can be stated that the more similar the spectra being compared, the more straightforward a numerical comparison is likely to be. The more dissimilar the spectra, the greater is the difficulty in deciding which numerical procedures to use.

6.2.2. Feature scaling

Feature scaling, i.e. scaling the relative intensity values of different mass peaks, is performed to compensate for differences in relative peak intensities caused by phenomena not relevant to the analytical problem. For example, when trying to differentiate muscle biopsy samples by comparing pyrolysis mass spectra, the largest peaks do not necessarily contain the most information relevant to the problem. However, if no feature scaling is performed, these large peak values will dominate the numerical comparison between spectra with most types of algorithms.

Because it is preferable to base a classification or differentiation of muscle biopsy samples on the most reliable and characteristic features rather than on the largest, some form of feature scaling is indicated. Since multiple analyses of each sample are made, intra-sample standard deviations for each peak can be calculated and peak intensities expressed in units of standard deviation. The more duplicate analyses are available, the more effectively this procedure removes intra-sample "noise" due to instrumental sources. If multiple samples had been available from each individual, "noise" from intra-individual variations in the composition of the sample could have been eliminated in the same way. An example of this scaling for the DMD data is given in Table 5, in which column (a) gives the total ion intensity for various peaks and column (b) the intra-sample deviation. Division of (a) by (b) gives the peak intensities in terms of standard deviation units. While this very common procedure brings out the most reliable features, these features are not necessarily the most characteristic features for obtaining a clear differentiation between the samples.
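
The division of (a) by (b) can be illustrated with a short sketch. It assumes that the intra-sample standard deviation of a peak is pooled over all samples as the square root of the mean replicate variance, which the text does not state explicitly; the function names and the numbers used with them are invented for illustration and are not taken from Table 5.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List

# Hypothetical layout: sample id -> replicate spectra, each a list of
# normalised intensities over the same m/z axis
Replicates = Dict[str, List[List[float]]]


def intra_sample_sd(data: Replicates) -> List[float]:
    """Pooled intra-sample standard deviation per peak (assumed definition:
    square root of the mean replicate variance over all samples)."""
    n_peaks = len(next(iter(data.values()))[0])
    pooled: List[float] = []
    for j in range(n_peaks):
        variances = [variance([rep[j] for rep in reps]) for reps in data.values()]
        pooled.append(sqrt(mean(variances)))
    return pooled


def feature_scale(spectrum: List[float], sd: List[float]) -> List[float]:
    """Express each peak intensity in units of its intra-sample standard deviation."""
    if any(s <= 0 for s in sd):
        raise ValueError("a peak with zero deviation cannot be scaled this way")
    return [v / s for v, s in zip(spectrum, sd)]
```

A small but highly reproducible peak can outweigh a large but noisy one after this scaling, which is the intended effect; as noted above, reproducibility alone does not guarantee that the peak discriminates between samples.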
