Estimates Of Predictive Accuracy In Developmental Studies

Developmental studies are analogous to phase 2 clinical trials. They should include an indication of whether the pharmacogenomic classifier is promising and worthy of phase 3 evaluation. There are special problems, however, in evaluating whether classifiers based on high-dimensional genomic or proteomic assays are promising. The difficulty derives from the fact that the number of candidate features available for use in the classifier is much larger than the number of cases available for analysis. In such situations, it is always possible to find classifiers that accurately classify the data on which they were developed, even if there is no relationship between expression of any of the genes and outcome (5). Consequently, even in developmental studies, some kind of validation on data not used for developing the model is necessary. This "internal validation" is usually accomplished either by splitting the data into two portions, one used for training the model and the other for testing it, or by some form of cross-validation based on repeated model development and testing on random data partitions. This internal validation should not, however, be confused with external, truly independent validation of the classifier.

The most straightforward method for estimating prediction accuracy is the split-sample method of partitioning the set of samples into a training set and a test set. Rosenwald et al. (11) used this approach successfully in their international study of prognostic prediction for large B-cell lymphoma. They used two-thirds of their samples as a training set. Multiple predictors were studied on the training set. Only when the collaborators of that study agreed on a single, fully specified prediction model did they access the test set for the first time. On the test set there was no adjustment of the model or fitting of parameters; the samples in the test set were used solely to evaluate the predictions of the model that had been completely specified using only the training data. In addition to estimating the overall error rate on the test set, one can also estimate other important operating characteristics of the classifier, such as sensitivity, specificity, and positive and negative predictive values.
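
The following is a minimal sketch of this kind of split-sample evaluation, not the Rosenwald et al. analysis itself. The data matrix, the two-thirds/one-third split, and the choice of a penalized logistic model are illustrative assumptions; the essential point is that the model is fit only on the training set and the held-out test set is used exactly once.

```python
# Split-sample validation sketch (hypothetical simulated data, scikit-learn).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5000))      # 120 samples, 5000 candidate features
y = rng.integers(0, 2, size=120)      # binary class labels

# Two-thirds training, one-third held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, stratify=y, random_state=0)

# The classifier is fully specified using the training data only...
model = LogisticRegression(penalty="l2", max_iter=1000).fit(X_train, y_train)

# ...and the test set is touched exactly once, to evaluate its predictions.
y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
error_rate = (fp + fn) / len(y_test)
sensitivity = tp / (tp + fn)          # operating characteristics, assuming
specificity = tn / (tn + fp)          # both classes appear among predictions
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
```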

The split-sample method is often used with so few samples in the test set, however, that the validation is almost meaningless. One can evaluate the adequacy of the size of the test set by computing the statistical significance of the classification error rate on the test set or by computing a confidence interval for the test set error rate. Because the test set is separate from the training set, the number of errors on the test set has a binomial distribution.
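
Because the test-set error count is binomial, the adequacy of the test-set size can be judged with standard binomial calculations. The counts below are hypothetical; the sketch uses SciPy's exact binomial routines to form a Clopper-Pearson confidence interval for the error rate and to ask whether an uninformative classifier (true error rate 0.5) could plausibly have done as well.

```python
# Binomial assessment of a test-set error rate (hypothetical counts).
from scipy.stats import binom, binomtest

n_test, n_errors = 40, 8              # assumed test-set size and error count

# Exact (Clopper-Pearson) 95% confidence interval for the true error rate.
ci = binomtest(n_errors, n_test).proportion_ci(confidence_level=0.95)
print(f"observed error rate {n_errors / n_test:.2f}, "
      f"95% CI ({ci.low:.2f}, {ci.high:.2f})")

# One-sided significance: probability of so few errors if the classifier
# were uninformative (error rate 0.5).
p_value = binom.cdf(n_errors, n_test, 0.5)
print(f"P(<= {n_errors} errors | error rate 0.5) = {p_value:.4f}")
```

A wide confidence interval is a direct signal that the test set is too small for the validation to be informative.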

Michiels et al. (12) suggested that multiple training-test partitions be used, rather than just one. The split-sample approach is mostly useful, however, when one does not have a well-defined algorithm for developing the classifier. When there is a single training set-test set partition, one can use biological insight on the training set to develop a classifier and then test that classifier on the test set. With multiple training-test partitions, however, that type of flexible approach to model development cannot be used. If one has an algorithm for classifier development, it is generally better to use one of the cross-validation or bootstrap resampling approaches to estimating error rate (see below) because the split-sample approach does not provide as efficient a use of the available data (13).
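
A sketch of the multiple-partition idea follows, under the assumption that classifier development is fully algorithmic; the particular pipeline (univariate gene selection followed by a nearest-centroid rule) and the number of partitions are illustrative choices, not those of Michiels et al.

```python
# Repeated training-test partitions with an algorithmic classifier
# (hypothetical simulated data, scikit-learn).
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2000))
y = rng.integers(0, 2, size=100)

# The entire development algorithm is re-run from scratch on every partition.
classifier = make_pipeline(SelectKBest(f_classif, k=50), NearestCentroid())

splitter = StratifiedShuffleSplit(n_splits=20, test_size=1/3, random_state=1)
error_rates = []
for train_idx, test_idx in splitter.split(X, y):
    classifier.fit(X[train_idx], y[train_idx])
    error_rates.append(np.mean(classifier.predict(X[test_idx]) != y[test_idx]))

print(f"mean error over {len(error_rates)} partitions: {np.mean(error_rates):.2f}")
```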

Cross-validation is an alternative to the split-sample method of estimating prediction accuracy (5). Molinaro et al. (13) describe and evaluate many variants of cross-validation and bootstrap resampling for classification problems in which the number of candidate predictors vastly exceeds the number of cases. The cross-validated prediction error is an estimate of the prediction error associated with applying the model-building algorithm to the entire dataset.
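
As a minimal sketch, assuming again a fully algorithmic development procedure (here a hypothetical pipeline of univariate gene selection followed by a linear discriminant-style classifier), the complete algorithm is re-applied within each fold and the pooled fold errors estimate its prediction error.

```python
# Cross-validated error estimate for a model-building algorithm
# (hypothetical simulated data, scikit-learn).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3000))
y = rng.integers(0, 2, size=80)

# The "algorithm" whose performance is being estimated: gene selection plus
# classifier fitting, repeated in full within every training fold.
algorithm = make_pipeline(SelectKBest(f_classif, k=20),
                          LinearDiscriminantAnalysis())

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
accuracy = cross_val_score(algorithm, X, y, cv=cv)
print(f"cross-validated error rate: {1 - accuracy.mean():.2f}")
```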

A commonly used but invalid estimate is the re-substitution estimate: all of the samples are used to develop a model, the class of each sample is then predicted with that model, and the predicted class labels are compared with the true class labels to tally the errors. It is well known that the re-substitution estimate of error is highly biased for small data sets, and the simulation of Simon et al. (14) confirmed this: 98.2% of the simulated data sets resulted in zero misclassifications even though there was no true underlying difference between the two groups.
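
The following sketch illustrates the same phenomenon in the spirit of that simulation, though with arbitrary sample sizes and a hypothetical selection-plus-nearest-centroid rule rather than the exact settings of Simon et al.: on purely random high-dimensional data, developing the classifier and then "testing" it on the same samples typically yields zero apparent errors.

```python
# Re-substitution estimate on data with no true class difference
# (hypothetical simulated data, scikit-learn).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 6000))       # 20 samples, 6000 genes, pure noise
y = np.array([0] * 10 + [1] * 10)     # arbitrary class labels

model = make_pipeline(SelectKBest(f_classif, k=10), NearestCentroid())
model.fit(X, y)                        # develop the classifier on all samples

resub_errors = np.sum(model.predict(X) != y)   # ...then "test" on the same samples
print(f"re-substitution errors: {resub_errors} of {len(y)}")   # typically 0
```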

Simon et al. (14) also showed that cross-validating the prediction rule after selecting differentially expressed genes from the full data set does little to correct the bias of the re-substitution estimator: 90.2% of simulated data sets with no true relationship between expression data and class still resulted in zero misclassifications. When feature selection was also re-done in each cross-validated training set, however, appropriate estimates of misclassification error were obtained; the median estimated misclassification rate was approximately 50%.
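
A sketch of this contrast, again with hypothetical simulated noise data and an illustrative selection-plus-nearest-centroid rule rather than the exact simulation of Simon et al.: "partial" cross-validation selects the genes once from the full data set and remains badly optimistic, whereas "full" cross-validation repeats the gene selection within every fold and returns an error estimate near the true value of 50%.

```python
# Partial versus full cross-validation when genes are selected from the data
# (hypothetical simulated data with no true signal, scikit-learn).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 6000))
y = np.array([0] * 20 + [1] * 20)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=4)

# Partial CV: genes are chosen using ALL samples, before cross-validation.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)
partial = 1 - cross_val_score(NearestCentroid(), X_selected, y, cv=cv).mean()

# Full CV: gene selection is part of the pipeline, so it is re-done per fold.
pipeline = make_pipeline(SelectKBest(f_classif, k=10), NearestCentroid())
full = 1 - cross_val_score(pipeline, X, y, cv=cv).mean()

print(f"partial CV error ~{partial:.2f} (optimistic), "
      f"full CV error ~{full:.2f} (near the true 0.50)")
```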
