Univariate Statistical Analysis

Feature scaling factors are primarily calculated for subsequent multivariate statistical analysis of the data. However, these factors also provide a primitive, but often effective, means of data reduction and/or feature selection. Data reduction for further numerical analysis procedures may be achieved by considering only the n mass peaks with the highest "characteristicity" values or the highest reproducibility values, where n may have any value from unity to the total number of mass peaks measured. Scaling factors can also help in choosing the features to be displayed as one-, two- or three-dimensional plots. Alternatively, selected features may be used for further univariate statistical analysis, e.g. T-tests for the significance of observed trends in the intensity distributions.
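The selection of the n highest-scoring mass peaks described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name select_top_features and the score values are hypothetical and are not taken from the DMD data set; any scaling factor (for instance "characteristicity" or reproducibility) could be supplied as the score.

```python
def select_top_features(scores, n):
    """Return the n feature labels with the highest scores, best first.

    scores maps a feature label (here an m/z value) to its scaling factor.
    n may range from 1 to the total number of features.
    """
    if not 1 <= n <= len(scores):
        raise ValueError("n must lie between 1 and the number of features")
    # Sort the labels by descending score and keep the first n.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Hypothetical scaling factors keyed by m/z; illustrative values only.
example_scores = {32: 0.4, 67: 3.1, 81: 2.8, 101: 0.6, 110: 1.2}
top_two = select_top_features(example_scores, 2)
print(top_two)
```

The two labels returned here would then be the natural candidates for a two-dimensional scatter plot or for a univariate test.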

As an example of two-dimensional feature plots, Figure 26 shows a scatter plot of the intensities of the two masses with the highest "characteristicity" values, m/z 67 and m/z 81, for the DMD data. Both masses increase in intensity as involvement of the disease progresses. In fact, T-test values show a significant separation between patient and control samples in spite of the small number of samples involved. At the same time, however, this two-dimensional plot exposes the basic shortcoming of univariate statistical analysis techniques: the strong correlation between the two mass signals is completely ignored. The existence of this correlation is confirmed by the high value of the calculated correlation coefficient (0.91). In contrast, the scatter plot of m/z 32 versus m/z 101 shown in Figure 27 reveals no strong correlation, either with the degree of involvement in the disease or between the two masses. The intensity distribution of a+b


Figure 26. Scatter plot of the ion intensities at m/z 67 versus m/z 81 for muscle biopsy samples (Table 4), showing a strong increase in the dystrophic muscle (D,E,F) and a high degree of correlation (correlation coefficient 0.91) between m/z 67 and 81. T-tests for probability of chance occurrence of separation between normals and dystrophic case F yield p values less than 0.01 for both m/z 67 and m/z 81.
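The two quantities quoted in the caption, a correlation coefficient between two mass signals and a T-test on one mass between two groups, can be computed as in the sketch below. Several points are assumptions: the text does not say which form of the T-test was used, so Welch's unequal-variance version is shown; the function names and all intensity values are hypothetical and do not reproduce the values behind Figure 26. Only the test statistic and degrees of freedom are computed; the p value would be read from a t distribution (for example with scipy.stats.ttest_ind and equal_var=False).

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Assumption: unequal-variance form; the source does not specify
    which T-test variant was applied.
    """
    va = variance(a) / len(a)  # squared standard error of mean, group a
    vb = variance(b) / len(b)  # squared standard error of mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical intensities (arbitrary units); illustrative values only.
mz67 = [1.0, 1.2, 1.1, 2.0, 2.4, 2.9]
mz81 = [0.8, 1.1, 0.9, 1.7, 2.2, 2.5]
r = pearson_r(mz67, mz81)

normals = [1.0, 1.2, 1.1]
dystrophic = [2.7, 2.9, 3.1]
t, df = welch_t(normals, dystrophic)
print(f"r = {r:.2f}, t = {t:.2f}, df = {df:.1f}")
```

Note that the T-test treats each mass on its own, so a high correlation coefficient between the two masses is exactly the information a univariate test cannot use.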
