What to Do

Samuel Taylor Coleridge once said that literary reviewers are usually people who would have been poets, historians, and biographers if they could; having tried their talents at one or the other and failed, they turn critics. As in other areas of life, so with the issue of measurement error: it is much easier to be a critic than to do something well oneself. What should one actually do? This is a very good question indeed, and, feeling a little uncomfortable under Coleridge's gaze from the sentence above, I shall attempt to come up with some useful suggestions.

First, the appropriate approach is likely to vary with the situation, and the most crucial consideration when choosing an approach to dealing with measurement error is the potential risk that the erroneous measurement poses to the end user of the drug, device, or process under study. Experiments and/or data interpretation should then be designed to err on the side of caution, if risk is an issue.

Other key determinants may include practicalities such as the resources and time available to approach the problem, and questions of the clinical or physiological relevance of the potential error margin, thresholds, or measurement ranges of interest. If, for example, a 10- to 100-fold increase in biomarker expression would be required for clinical relevance, and the maximum estimated measurement error is less than threefold, perhaps there is no issue. The discussion above, concerning whether significant differences are clinically relevant, also applies to the relevance of measurement errors.

If measurement error is of a magnitude that could be clinically relevant, its quantification becomes more important. Perhaps because of the status quo of ignoring measurement error quantification in biomedical methods (for which, after all, error quantification is much more challenging than for less complex models), this is not a widespread practice in the drug development industry. One way of acknowledging this error while avoiding complex quantification might be to use the maximum allowable measurement error, already defined routinely for the purposes of acceptance criteria, as a threshold below which differences would either not be considered to reflect anything but error, or would be more carefully scrutinized in some predefined way. In other words, if you have determined that up to a doubling in endpoint values between a pretreatment and a posttreatment sample could be attributed solely to compounded measurement error rather than to treatment per se, you might decide not to attribute effects to treatment unless they surpass this measurement error threshold. Of course, the fact that a method has been validated against across-the-board standards, such as having better than 15% or 20% total error, does not mean that it actually produces results this far from the nominal values of the reference standards, but merely that it may do so at times, and that results of assays in which these limits are exceeded will be rejected from the study. Error quantification may well indicate much less error than the maximum allowed for assay acceptance, so it may be worth putting in the time to determine the actual error rather than simply assuming that the worst possible error has always occurred. Actual error might be revealed by mining previously amassed quality control data and/or on a per-batch basis using internal standards, for example. In these cases it is as simple as recording the proportional values obtained for accuracy rather than simply noting a pass or fail.
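
The threshold logic described above can be sketched in a few lines of code. This is only an illustration: the twofold threshold follows the doubling example in the text, but the sample values and the function name are hypothetical.

```python
# Sketch: screening pre/post results against a predefined fold-change
# threshold attributable to compounded measurement error alone.
# The 2-fold threshold and all numeric values are illustrative.

ERROR_THRESHOLD_FOLD = 2.0  # max fold change that error alone could produce

def exceeds_error_threshold(pre: float, post: float) -> bool:
    """Return True only if the pre/post fold change surpasses what
    compounded measurement error alone could plausibly explain."""
    fold_change = max(pre, post) / min(pre, post)
    return fold_change > ERROR_THRESHOLD_FOLD

# A 1.8-fold rise stays inside the error band; a 3.2-fold rise does not.
print(exceeds_error_threshold(100.0, 180.0))  # False
print(exceeds_error_threshold(100.0, 320.0))  # True
```

Results that fail this screen would not be attributed to treatment, or would be flagged for the predefined closer scrutiny mentioned above.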

In cases where biomarker analysis involves more complex sources of error than one or a few analytical assays (such as epidemiological studies or meta-analyses to qualify particular biomarkers for their intended use), more complex uncertainty quantification is required. Various such methods are available, including Bayesian [13] and Monte Carlo simulations [14]. Bayesian approaches, which stem from an alternative view of probability compared with the more common frequentist inferential statistics, are touched on further later. Monte Carlo simulation is the brute-force type of computation: it executes a huge number of possible permutations using random draws from all the potential inputs, so as to produce a final probability function (made up of all the output values from the various random inputs) that comprises the compound error from all the unknown inputs. For a simple example, consider the film Groundhog Day, in which a man wakes every morning to find that it is the same day as the one before. The film consists of this man living through his interminably repeated day, making different choices as the fixed events of the day occur; you can think of him as being stuck inside a Monte Carlo simulation (a simple one, since his own behavioral choices were the only independent variable being randomly modified; the range of outcomes were consequences of his varied choices of that one variable). In questions of compound error, of course, there would be many more variables, all being sampled randomly over and over again. One advantage of choosing a Monte Carlo approach to resolving complex error sources, should your experimental paradigm present them, is that user-friendly software is readily available, owing to its common use for risk analysis in financial and engineering applications.
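
A toy version of such a simulation takes only a few lines. This is a sketch under assumed conditions, not any published method: suppose a reported endpoint is a post/pre concentration ratio, and each of the two measurements carries an assumed 10% multiplicative error; repeated random draws then show the spread that compound error alone produces in the ratio.

```python
# Sketch of a Monte Carlo compound-error estimate; all values hypothetical.
import random
import statistics

random.seed(1)

TRUE_PRE, TRUE_POST = 100.0, 150.0  # assumed true concentrations
CV = 0.10                           # assumed 10% error per measurement

def noisy(value: float) -> float:
    """One random draw of the measured value given multiplicative error."""
    return value * random.gauss(1.0, CV)

# Each trial redraws both inputs at random, like one of the repeated days.
ratios = sorted(noisy(TRUE_POST) / noisy(TRUE_PRE) for _ in range(100_000))
lo, hi = ratios[2_500], ratios[97_500]  # empirical central 95% band
print(f"median ratio {statistics.median(ratios):.2f}, "
      f"95% band {lo:.2f} to {hi:.2f}")
```

The resulting distribution of output ratios is the "final probability function" described above: even with a true 1.5-fold change, compound error alone spreads the observed ratio over a band roughly from 1.1 to 2.0 under these assumptions.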

Once measurement error has been quantified, by whatever means, error distributions could be used to express uncertainty limits around results, analogous to the random error-related confidence intervals common in statistical analyses. These would help to clarify how much of the apparent difference between sets of results might be an artifact of the measuring tool(s) rather than an effect of the treatment that one wishes to measure. In many cases, such as the reporter gene model example above, a literature search will reveal readily applicable statistical models for specific types of assay that incorporate compound sources of error into the calculation of the statistical significance of differences [15].

Phillips and Lapole suggest a much simpler approach that partially quantifies method error uncertainty without any complicated analysis [16]. Like Pablo Picasso, who once said that it took him his entire life to learn how to paint like a child, these authors advocate a return to earlier ways: in this case, the grade-school lesson of rounding figures. Rounding can adjust the measurement scale to reflect the limitations imposed by measurement error. In their words: "Rounding does not create imprecision—the imprecision exists even if we do not accurately report it." Thus, even though the measuring instrument may spit out values with several decimal places, samples with results in the range of 100 for a test demonstrating 2% inaccuracy (i.e., 98 to 102) might better be reported rounded to the nearest 5 rather than to the nearest 1, let alone to the decimal places reported by the instrumentation.
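
This rounding rule is easy to make mechanical. The sketch below follows the 2%-inaccuracy example from the text, with a step of 5; the function name and sample readings are hypothetical.

```python
# Sketch: rounding results to a step size that matches the assay's
# known error, per the text's example (2% inaccuracy near 100 -> step 5).

def round_to_step(value: float, step: float = 5.0) -> float:
    """Round a measured value to the nearest multiple of `step`."""
    return round(value / step) * step

# Instrument readouts with spurious decimal places collapse onto the
# coarser scale that the error actually supports:
print(round_to_step(101.437))  # 100.0
print(round_to_step(98.2))     # 100.0
print(round_to_step(102.6))    # 105.0
```

All three readouts above fall within the 98-to-102 band that a 2% error could produce around 100, and the first two are duly reported as the same value.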

Apart from attempting to quantify the error, contain it within a wider measurement scale, or set thresholds of interest that exceed it, as discussed above, we should also not forget to implement strategies to reduce it, where possible. For instance, internal controls such as reference standards or paired samples can reduce the impact of the between-assay component of measurement error.
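
As a minimal sketch of the reference-standard idea, one common tactic is to rescale each batch so that its internal standard lands on its assigned value, which cancels a shared batch-wide drift. All readings and the nominal value below are hypothetical.

```python
# Sketch: normalizing each assay batch to an internal reference standard
# to damp between-assay (batch-to-batch) variability. Values illustrative.

NOMINAL_REFERENCE = 50.0  # assigned value of the internal standard

def normalize_batch(samples, reference_reading):
    """Rescale a batch so its internal standard reads the nominal value."""
    factor = NOMINAL_REFERENCE / reference_reading
    return [s * factor for s in samples]

# Batch 2 ran 20% high across the board; normalization removes the drift.
batch1 = normalize_batch([80.0, 120.0], reference_reading=50.0)
batch2 = normalize_batch([96.0, 144.0], reference_reading=60.0)
print([round(x, 6) for x in batch1])  # [80.0, 120.0]
print([round(x, 6) for x in batch2])  # [80.0, 120.0]
```

Note that this only addresses systematic, batch-wide drift; within-batch random error is untouched, which is where paired samples and replication come in.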

For noncontinuous data, misclassification, rather than numerical deviation, is the end effect of measurement error. In this case, a suggested approach to limiting the negative impact of measurement error is to assume that error will happen and to design studies with higher power from the beginning. Statistical tests for noncontinuous data are already inherently less powerful than those for continuous data, and a lower initial power level in the experimental design can further magnify the impact of measurement error. For example, Gordon and Finch illustrate how the same misclassification error rate causes much more power loss in a study that has lower power to start with than in one that has higher power to start with: given exactly the same error rate, if the study is designed to have 99% power, it ends up with 95.5% after the error is taken into account, whereas if it is designed to have 80% power, it ends up with only 65.4% [17]. Specifying higher power in the experimental design therefore buys robustness against the obscuring effects of misclassification error. In real terms, designing an experiment with higher power most often translates into recruiting more subjects; these authors advocate this "paying the cost up front" strategy.
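
The same phenomenon can be reproduced with a small simulation. This is a sketch with hypothetical group sizes, response rates, and a 10% flip rate, not the figures of Gordon and Finch: a two-proportion z-test is run many times with and without randomly flipped classifications, and the empirical power is compared for a smaller and a larger design.

```python
# Sketch: the same misclassification rate erodes power more in a
# lower-powered design. All parameters are hypothetical.
import math
import random

random.seed(7)
Z_CRIT = 1.959964  # two-sided 5% critical value

def draw(p, n, flip):
    """Count of 'positive' subjects, each label flipped at rate `flip`."""
    k = 0
    for _ in range(n):
        positive = random.random() < p
        if random.random() < flip:  # misclassification event
            positive = not positive
        k += positive
    return k

def power(p1, p2, n, flip, sims=2000):
    """Monte Carlo power of a two-proportion z-test with n per group."""
    rejections = 0
    for _ in range(sims):
        x1, x2 = draw(p1, n, flip), draw(p2, n, flip)
        pbar = (x1 + x2) / (2 * n)
        se = math.sqrt(pbar * (1 - pbar) * 2 / n)
        if se > 0 and abs(x1 / n - x2 / n) / se > Z_CRIT:
            rejections += 1
    return rejections / sims

results = {}
for n in (60, 150):  # lower- vs. higher-powered design
    clean, noisy = power(0.2, 0.4, n, 0.0), power(0.2, 0.4, n, 0.10)
    results[n] = (clean, noisy)
    print(f"n={n}: power {clean:.2f} -> {noisy:.2f} with 10% flips")
```

Under these assumptions the larger design loses noticeably less power to the identical flip rate than the smaller one, which is the "pay the cost up front" argument in miniature.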
