## Introduction

Learning how to apply statistical analyses ... may take a lifetime of trial (and sometimes error), as it has done in the author's case. There is no evidence that biomedical investigators of the present generation are on a steeper learning curve. [1]

Sitting in the Drill Hall at the University of Toronto with row upon row of shifting, sighing, and coughing co-sufferers, cold fingers clutching one of my two well-sharpened HB pencils while I carefully write out each step of my standard deviation calculation. Statistics 101 Christmas exam. One finger rasping down rows and across columns of tables, p-values, degrees of freedom: these are the memories I share with countless scientists who have studied statistics in preparation for their research careers. Although I left that course and others with respectable marks and a seemingly clear understanding of the null hypothesis and the normal curve, I found myself, some years later, scratching my head as I considered a sheet whose bland columns of numbers belied the painstaking hours spent injecting, isolating, pipetting, incubating, and centrifuging, from which they had been distilled. Looked at one way, it seemed to suggest one thing, but was that the correct approach? Looked at another way, the picture was different. The conclusion to be drawn would influence the direction that the entire project would take.

Biomarkers in Drug Development: A Handbook of Practice, Application, and Strategy, Edited by Michael R. Bleavins, Claudio Carini, Malle Jurima-Romet, and Ramin Rahbari. Copyright © 2010 John Wiley & Sons, Inc.

How to apply the right statistical test correctly to a particular research question, given the many choices, assumptions, and caveats, is not always obvious; indeed "gross misunderstandings of the purpose and functions of statistical analysis are apparent in applications to research grant-giving bodies and ethics committees, in manuscripts submitted to journals and sometimes in published papers" [1]. In fact, there seems to be a growing field of forensic biostatistics, pointing out data analysis errors that have crept into respected publications [2-4]. This is not surprising, considering the advent of genomics, proteomics, and other "-omic"-style research, in which the ability to produce huge quantities of complex data is outstripping the common biologist's understanding of the mathematical issues concerning its interpretation.

Scientists focused on biomarker research in the drug development industry face an additional set of challenges, where the pressures of the business world to use less time and fewer resources, to focus only on the immediate goal, and to quickly release positive findings to investors or other onlookers are often at odds with the expert, thorough, and careful consideration that the more challenging research problems may require. Understandably, biomedical scientists who are often highly expert in their scientific discipline are not necessarily so in the mathematical issues underlying analysis of their data, and many are also without the luxury of a biostatistician with whom to consult at times of indecision. Although some lucky others may feel that there is no need to master the field when they can simply rely on in-house expertise, it is important to recognize that a working knowledge of the factors affecting data analysis facilitates better communication between the statistician and researcher, which, in turn, generally leads to better research design. Furthermore, the question of how to interpret data wisely is not limited to applying the correct statistical test in the correct manner, but also includes understanding the limits and pitfalls of the technical procedures used to generate the data. In this regard, biostatisticians, too, face challenges, such as being unfamiliar with or intimidated by the rapidly evolving and complex technical methods used in the lab. As so well voiced by Ransohoff [5], "how well do biologists ... appreciate the nature or seriousness of problems that can be introduced by methods of handling data ...? How well do epidemiologists or biostatisticians understand technical details of specimen collection, handling, and analysis in a way they can use to anticipate and manage specific sources of bias?"

In addition to an understanding of both technical and mathematical issues, there is a third element important to teasing out the elusive meaning of the data, which I like to think of as the HP factor (Hercule Poirot, that is). Born of intellect and experience, modeled by a higher-order view, a wider vision of extraneous factors that may play roles in the outcome, infused with a measure of skepticism and questioning of assumptions, this factor may more commonly be known as judgment.

Now, I sincerely hope that I am not marked, Rushdie-like, by the statisticians' guild for this comment, since this softer science is undoubtedly in the realm of the subjective, and as such, in danger of being classified in the same bin as bias, data selection, and the bubonic plague. To my mind, however, judgment is critical to the success of most endeavors, including data evaluation and statistical analysis. Even though the rules of application of such analyses may be straightforward (in our day, in fact, completed for us by a simple twitch of an index finger on a mouse), understanding potential errors in our data and/or determining which of many analytical possibilities best fits the situation presented by our own experimental paradigm and data set are not, and this is where judgment often comes in.

There is certainly a large mass of statistical reference material already published and available to answer questions that most biologists might have; however, most texts and articles focused on statistical analysis issues are written by, well ... statisticians. This is a good thing, of course, but sometimes brings with it the negative corollary of a specialized dialect and viewpoint. Joe or Josephine Biologist may find these, at best, difficult to adapt to their own needs, or, at worst, frankly unintelligible.

This chapter is addressed to fellow Joes and Josephines. It is written in plain language by a nonstatistician who has often grappled at the interface where experimental results are transduced through an intermediate state as digits, before manifesting themselves, newly metamorphosed, into conclusions that flutter their wings right into our collective scientific knowledge database. Whereas some basic statistics explanations may embellish its text, this is by no means intended to serve as a statistics manual, nor to explore the many intricacies of the field; in fact, statisticians may well be appalled by its gaps and oversimplifications. The publisher, on the other hand, is likely to be pleased at the paucity of equations, since it is common knowledge in the industry that book sales drop with each equation in a text. This chapter addresses data analysis issues pertinent to biomarker research in drug development. More specifically, it is intended to clarify some common misconceptions or confusion concerning which situations are appropriate for alternative choices of statistical analysis and to integrate statistical issues with nonstatistical issues that also affect our data interpretation.