Statistical Considerations

Generally speaking, identification of a Pgx marker requires association of that marker with phenotype data from the clinical study. Phenotype data may be binary (positive/negative for the given response) or may have other characteristics. Often continuous phenotype data are converted to binary or ordinal format to enable analyses; however, for some types of data, correlation of markers with the response end points may be a useful exploratory approach. One important consideration is the assay itself; some clinical assays that seem to yield "continuous" data values are validated only for use as binary assays (positive/negative based on a threshold) and may not provide accurate data across a wide concentration range.
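As a minimal sketch of the dichotomization step described above, the following Python example converts a continuous phenotype to binary calls and tests its association with a binary marker using a two-sided Fisher exact test. The patient values, the threshold of 50, and all function names are hypothetical and chosen only for illustration; the source does not prescribe a particular test.

```python
from math import comb

def dichotomize(values, threshold):
    """Convert continuous phenotype values to binary calls (True = positive)."""
    return [v >= threshold for v in values]

def contingency_table(marker, response):
    """Count a 2x2 table: (marker+/resp+, marker+/resp-, marker-/resp+, marker-/resp-)."""
    a = sum(1 for m, r in zip(marker, response) if m and r)
    b = sum(1 for m, r in zip(marker, response) if m and not r)
    c = sum(1 for m, r in zip(marker, response) if not m and r)
    d = sum(1 for m, r in zip(marker, response) if not m and not r)
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables with the same margins that are
    no more likely than the observed table (hypergeometric distribution)."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical data: marker status and a continuous response score per patient.
marker = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
score = [62, 71, 55, 80, 30, 20, 35, 58, 41, 12]

response = dichotomize(score, threshold=50)   # assumed cut-off
table = contingency_table(marker, response)
p_value = fisher_exact_two_sided(*table)
```

Note that the result depends on the chosen threshold, which is one reason the validated range of the underlying assay matters.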

Whether a small group of candidate markers is selected or a whole-genome dataset is used, the search for putative classifiers should follow an exploratory/confirmatory paradigm. A marker or multivariate classifier that is identified in one sample set should be tested independently in a different sample set (24). Bootstrap techniques repeatedly resample a single sample set to show how sensitive the derived model is to changes in the input data. This type of internal validation does not constitute an adequate test of the model, because the classifier is still identified from the composition of the initial sample set and may be affected by sampling bias. Thus, a major hurdle for decision-making based on Pgx association data is the independent replication of results (25). Robust association markers should derive from a sufficient number of samples to provide at least two datasets for analysis, ideally from independently collected samples of the general population. To provide multiple datasets during drug development, Pgx activities must be included early in the program.
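The distinction between bootstrap resampling and independent replication can be illustrated with a short Python sketch. It selects the top marker in a discovery set, measures how often that choice survives bootstrap resampling of the same set, and then checks the marker in a separate replication set. The marker names, patient data, and the use of a simple responder-rate difference as the effect measure are all assumptions made for illustration.

```python
import random

def response_rate_difference(genotypes, responses):
    """Responder rate in marker carriers minus the rate in non-carriers."""
    carriers = [r for g, r in zip(genotypes, responses) if g]
    others = [r for g, r in zip(genotypes, responses) if not g]
    if not carriers or not others:
        return 0.0  # effect is undefined when one group is empty
    return sum(carriers) / len(carriers) - sum(others) / len(others)

def top_marker(data, responses):
    """Name of the marker with the largest absolute rate difference."""
    return max(data, key=lambda name: abs(response_rate_difference(data[name], responses)))

def bootstrap_selection_frequency(data, responses, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which each marker is ranked first."""
    rng = random.Random(seed)
    n = len(responses)
    counts = {name: 0 for name in data}
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        resampled = {name: [col[i] for i in idx] for name, col in data.items()}
        counts[top_marker(resampled, [responses[i] for i in idx])] += 1
    return {name: c / n_resamples for name, c in counts.items()}

# Hypothetical discovery set (exploratory stage).
discovery = {"snp_A": [1, 1, 1, 1, 0, 0, 0, 0, 0, 1],
             "snp_B": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]}
discovery_resp = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

# Hypothetical, independently collected replication set (confirmatory stage).
replication = {"snp_A": [1, 1, 0, 0, 1, 1, 0, 0]}
replication_resp = [1, 0, 1, 0, 1, 0, 1, 0]

candidate = top_marker(discovery, discovery_resp)
discovery_effect = response_rate_difference(discovery[candidate], discovery_resp)
stability = bootstrap_selection_frequency(discovery, discovery_resp)
replication_effect = response_rate_difference(replication[candidate], replication_resp)
```

In this constructed example the candidate shows a sizeable effect in the discovery set but none in the replication set. The bootstrap frequencies describe only how stable the selection is within the discovery data; they cannot reveal that the discovery sample itself was unrepresentative.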

In practice, the number of samples required increases with the number of variables in the testing algorithm. For candidate Pgx markers, it is possible to generate relevant association data from fairly small numbers of treated patients (i.e., fewer than 100), whereas whole-genome association algorithms should include much larger sample numbers (200-300 per group). In whole-genome analyses, corrections for multiple testing must be used to reduce the number of associations found by chance alone.
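To make the scale of the multiple-testing problem concrete, the sketch below computes the number of associations expected by chance at an uncorrected threshold and the per-test threshold under a Bonferroni correction. Bonferroni is used here only as one familiar example of such a correction; the marker counts (5 candidates, 500,000 SNPs) and the significance level of 0.05 are assumed values, not figures from the source.

```python
def expected_chance_hits(alpha, n_tests):
    """Expected number of false-positive associations if no marker is truly associated."""
    return alpha * n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

alpha = 0.05              # assumed overall significance level
n_candidate = 5           # hypothetical candidate-marker panel
n_genome_wide = 500_000   # hypothetical whole-genome SNP panel

for label, n in [("candidate panel", n_candidate), ("whole genome", n_genome_wide)]:
    print(label,
          "expected chance hits:", expected_chance_hits(alpha, n),
          "Bonferroni threshold:", bonferroni_threshold(alpha, n))
```

The much smaller per-test threshold for the whole-genome panel is one way to see why such analyses need larger sample numbers than candidate-marker studies.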

Methods to adjust for multiple comparisons are usually applied to whole-genome scans for SNP association studies, but there is controversy about using very low significance thresholds because truly associated markers may not be identified. In exploratory studies, a less stringent threshold may lead to the identification of several functionally related markers, such as upstream and downstream markers of a certain pathway. This can lend mechanistic strength to the association, although it is not sufficient to confirm a marker or pathway in the absence of replication in an independent dataset.

If candidate markers have been previously identified (through literature or exploratory studies), a possible tactic is to perform an initial analysis using a predetermined set of candidate markers, with appropriate correction for false discovery rate based on the number of markers to be tested. The whole-genome analysis for non-hypothesis-based associations may follow, with a more stringent significance threshold to limit the false discovery rate.
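The candidate-marker stage of this tactic might be sketched as follows, using the Benjamini-Hochberg step-up procedure as one common way to control the false discovery rate. The source does not name a specific procedure, so this choice, the marker names, the p-values, and the FDR level of 0.05 are all assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return marker names declared significant at false discovery rate q.

    p_values maps marker name -> unadjusted p-value. The procedure finds the
    largest rank k with p_(k) <= q * k / m and accepts the k smallest p-values.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= q * rank / m:
            cutoff_rank = rank
    return [name for name, _ in ranked[:cutoff_rank]]

# Hypothetical p-values for a predetermined panel of five candidate markers.
candidate_p = {
    "cand_1": 0.004,
    "cand_2": 0.015,
    "cand_3": 0.028,
    "cand_4": 0.2,
    "cand_5": 0.6,
}

significant = benjamini_hochberg(candidate_p, q=0.05)

# For comparison: a Bonferroni cut-off of 0.05 / 5 on the same panel.
bonferroni_hits = [name for name, p in candidate_p.items() if p < 0.05 / len(candidate_p)]
```

Because the correction depends on the number of markers tested, restricting the first analysis to a small predetermined panel leaves more power to detect each candidate than a whole-genome scan would.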

Gene expression profiles are generally analyzed for hypothesis generation; however, multivariate classifiers such as gene expression signatures, once identified, should be tested with the same stringency as other candidate diagnostics (25). Methods for incorporating composite markers to identify patient subsets on the basis of gene expression have been proposed (26,27). Few diagnostics based on multivariate assays have been developed to guide patient treatment, and none has yet been codeveloped for launch with a protein therapeutic. Nonetheless, the results of gene expression studies may provide insight into the mechanism of action of a novel drug or target and may identify a subset of candidate markers for further analysis.
