## Multivariate Statistical Analysis

Instead of considering the intensity distribution at each individual mass value separately, as in univariate statistical analysis, multivariate statistical analysis techniques consider n mass distributions simultaneously in n-dimensional space. Whereas it is relatively easy to visualize two- or even three-dimensional scatter plots obtained by plotting the intensity distributions at two or three mass values simultaneously, direct visualization of distributions in higher-dimensional space is, of course, impossible. However, as long as the higher-dimensional space is metric, distances between points are calculated by the same rules as in two- or three-dimensional space. Consider, therefore, the situation shown in Figure 28, where each of the two points represents the result of a measurement involving the same three parameters, x, y, and z. In terms of Py-MS data, this corresponds to two mass spectra containing masses x, y, and z, with the ion intensities for those masses plotted along the corresponding axes. Since the values found for each of these parameters differ between the two measurements, the points occupy different positions in three-dimensional space. The more the measured values differ, the farther apart the points lie, and vice versa. Thus, the distance between the points is a numerical expression of the degree of similarity between the multiparameter measurements: the smaller the distance, the more similar the measurements.

The most direct way of measuring the distance between the points is to calculate the distance along a straight line through both points, the so-called Euclidean distance. A seemingly more complicated way is to sum the distances covered when taking the shortest route from one point to the other while following only lines parallel to one of the three axes, the so-called city block distance (see Figure 28). The attractive property of this distance measure is that it involves only simple additions and subtractions. Frequently, quadratic and other non-linear functions (e.g., correlation coefficients, standard deviations, chi-squared coefficients, or generalized Mahalanobis distances) are also used to derive distance values (ref. 132). All of these functions are generally referred to as "similarity coefficients".
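The two distance measures described above can be illustrated with a minimal sketch. The function names and the intensity values below are hypothetical and chosen only for illustration; each spectrum is assumed to be a list of ion intensities at the same masses, in the same order.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two spectra given as intensity lists."""
    if len(a) != len(b):
        raise ValueError("spectra must cover the same mass values")
    # Square the intensity difference at each mass, sum, take the root.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def city_block_distance(a, b):
    """Sum of absolute intensity differences along each axis."""
    if len(a) != len(b):
        raise ValueError("spectra must cover the same mass values")
    # Only subtractions and additions are needed.
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


# Hypothetical ion intensities at three masses x, y, z (illustrative only).
spectrum_1 = [1.0, 2.0, 3.0]
spectrum_2 = [4.0, 6.0, 3.0]

print(euclidean_distance(spectrum_1, spectrum_2))   # 5.0
print(city_block_distance(spectrum_1, spectrum_2))  # 7.0
```

Because both functions iterate over all coordinates, the same code applies unchanged to spectra with any number n of mass values, which reflects the point that distances in higher-dimensional metric space follow the same rules as in two or three dimensions.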