Statistical Approaches

High-throughput "omics" technologies such as microarray-based expression profiling generate massive amounts of data, and new statistical methods are continually being developed to analyze them. The wide range of statistical techniques now available, when used with proper precautions, provides a relatively quick and inexpensive way to validate biomarker candidates. Trial and error with different statistical methods is especially common during the early stages of biomarker development. In the exploratory phase of a biomarker project, computational and mathematical techniques such as multivariate analysis or machine learning are often used to detect differences between treatment and control groups or between patients with different clinical presentations [18].
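As a rough illustration of an exploratory comparison between treatment and control groups, the sketch below ranks genes by a per-gene Welch t-statistic. It is one simple univariate option, not the multivariate or machine-learning methods cited above, and everything in it is assumed for illustration: the data are synthetic, and the names `welch_t`, `treated`, and `control`, the group sizes, and the size of the simulated shift are hypothetical.

```python
import numpy as np

def welch_t(treated, control):
    """Per-gene Welch t-statistic; rows are samples, columns are genes."""
    mean_diff = treated.mean(axis=0) - control.mean(axis=0)
    # Standard error without assuming equal variances in the two groups.
    se = np.sqrt(treated.var(axis=0, ddof=1) / treated.shape[0]
                 + control.var(axis=0, ddof=1) / control.shape[0])
    return mean_diff / se

# Synthetic expression matrix (assumption: 15 samples per group, 200 genes).
rng = np.random.default_rng(0)
n_genes = 200
control = rng.normal(0.0, 1.0, size=(15, n_genes))
treated = rng.normal(0.0, 1.0, size=(15, n_genes))
treated[:, :5] += 4.0  # hypothetical: only the first five genes truly differ

t_stats = welch_t(treated, control)
ranking = np.argsort(-np.abs(t_stats))  # most distinctive genes first
top5 = ranking[:5]
```

With thousands of genes tested at once, a ranking of this kind would normally be followed by a multiple-testing correction before any candidate is carried forward.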

Statistically distinctive genomic biomarkers identified by one method during the exploratory phase may subsequently be analyzed with a different technique. Similarly, given the same set of samples and expression measurements, the data set can be permuted before an analysis is repeated. Agreement between the results generated by different methods may ultimately increase confidence in a biomarker. This is especially useful during the internal and external phases of validation, when greater statistical freedom is exercised for the purpose of identifying or ranking biomarkers. Ideally, by the time a panel of biomarkers is selected for use in a clinical trial, the specific algorithms and statistical approaches should be established. A recent FDA presentation suggested that the expression patterns or algorithms should be developed and confirmed in at least two independent data sets (a training set and a test set, respectively) [12].
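The permutation step and the training/test confirmation described above might be sketched as follows. This is a minimal example under stated assumptions, not the procedure from the cited FDA presentation: the expression values are synthetic, the function names `mean_diff` and `permutation_p_value` are hypothetical, and the threshold `alpha`, the group sizes, and the simulated effect size are arbitrary choices.

```python
import numpy as np

def mean_diff(values, labels):
    """Difference in mean expression between group 1 and group 0."""
    return values[labels == 1].mean() - values[labels == 0].mean()

def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = np.random.default_rng(seed)
    observed = abs(mean_diff(values, labels))
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)  # break the label/value link
        if abs(mean_diff(values, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Two independent synthetic data sets for one candidate biomarker
# (assumption: 15 controls and 15 treated samples in each set).
labels = np.repeat([0, 1], 15)
train_rng = np.random.default_rng(1)
test_rng = np.random.default_rng(2)
train_values = train_rng.normal(0.0, 1.0, size=30) + 2.5 * labels
test_values = test_rng.normal(0.0, 1.0, size=30) + 2.5 * labels

alpha = 0.05  # hypothetical significance threshold
p_train = permutation_p_value(train_values, labels, seed=10)
p_test = permutation_p_value(test_values, labels, seed=11)

# The candidate counts as confirmed only if it holds up in both sets.
confirmed = p_train < alpha and p_test < alpha
```

The statistic and threshold are fixed before the test set is examined, so the second data set serves as a confirmation rather than a second round of exploration.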

