What is evidence-based medicine?

Evidence-based medicine (EBM) is an approach to patient care that promotes the collection, interpretation, and integration of valid, important and applicable patient-reported, clinician-observed, and research-derived evidence. The best available evidence, moderated by patient circumstances and preferences, is applied to improve the quality of clinical judgment.

The best available evidence is based on well-designed, randomized, double-blind, controlled trials (RCTs) that have been diligently carried out (Table 3.1). RCTs are not always feasible, e.g. if the condition is very rare. Pharmacologic interventions are easier to study in RCTs than, for example, invasive interventions, for which designing the control group is challenging. The problem of the placebo effect (or expectation) is particularly acute with invasive treatments, and deciding how invasive a control treatment can be is an ethical concern.

It is important to differentiate between lack of evidence (no controlled trials have been performed) and evidence for lack of effect (there is enough evidence to indicate that the treatment is not effective). Another question is whether the evidence is valid regarding an individual patient. This important question will be discussed later.

Evidence-Based Chronic Pain Management. Edited by C. Stannard, E. Kalso and J. Ballantyne. © 2010 Blackwell Publishing.

Table 3.1 Type and strength of efficacy evidence

I Strong evidence from at least one systematic review of multiple well-designed randomized controlled trials

II Strong evidence from at least one properly designed randomized controlled trial of appropriate size

III Evidence from well-designed trials without randomization, single group pre-post, cohort, time series or matched case-control studies

IV Evidence from well-designed nonexperimental studies from more than one center or research group

V Opinions of respected authorities, based on clinical evidence, descriptive studies or reports of expert committees

Four levels of scientific evidence for the effectiveness of a certain intervention on a certain condition.

Level A Strong research-based evidence provided by generally consistent findings in multiple high-quality RCTs

Level B Moderate research-based evidence provided by generally consistent findings in one high-quality RCT plus one or more low-quality RCTs, or generally consistent findings in multiple low-quality RCTs

Level C Limited or conflicting research-based evidence provided by one RCT (either high or low quality) or inconsistent findings in multiple RCTs

Level D No research-based evidence, i.e. no RCTs

Randomization is important to minimize selection bias, as inadequate concealment of treatment allocation overestimates the treatment effect by 41% [1] and nonrandomized studies can give wrong answers [2]. Each patient should have the same probability of being included in each study group and the allocation should be concealed. Randomization should be performed by someone who has no direct relationship to the study participants, using tables of random numbers or numbers generated by computers.

Lack of double blinding will overestimate the treatment effect by roughly 17% [1] and this can lead to completely different answers, as with acupuncture in back pain [3]. Double blinding is achieved if at least the study subject and those making the observations are unaware of the treatment. Patients and observers can decode blinding because of adverse effects (and informed consents). Blinding can be tested by asking the participants which treatment they thought was given.

The control group is important as it indicates what the natural course of the disease is and/or how the new treatment compares with an established treatment. Figure 3.1 shows what effects different control groups can have. Patients with painful diabetic polyneuropathy showed a large "placebo" response. This could indicate that the patients either expected a large effect (the way the study was run enhanced the therapeutic effect of the treatment given) or the tendency for clinical improvement was greater in this group of neuropathic pain patients.

Active control = natural course + interaction + expectation + actual effect

Placebo treatment = natural course + interaction + expectation that there will be an effect

Visits without treatment = natural course + doctor/nurse and patient interaction

Natural course

Waiting list = natural course - negativity as nothing is being done

Figure 3.1 Different components of the "placebo" effect in different control groups.

An ideal protocol should include an inactive control (placebo), an active control (a gold standard if such exists), and the study drug in more than one dose. This means that several groups and large numbers of patients need to be recruited. Thus the size of the trial may be compromised and the study will lack power to show any difference. Studies to demonstrate unequivocally that there is no difference have to be very large, many times greater than standard analgesic trials. This is why an inactive control is important.

Quantitative systematic reviews or meta-analyses

According to the Dictionary of Evidence-Based Medicine [4], meta-analysis refers to the systematic quantitative pooling of available evidence on a particular research question with the use of appropriate statistical methods. As such, it forms part of many systematic reviews. In the context of drug efficacy, clinical trial evidence is sought systematically and the relevant efficacy data extracted. The data are then pooled using suitable weights such as sample variance or sample size. The pooled estimate of efficacy is then presented with the appropriate confidence bounds to define its precision.

Various statistical methods can be applied. The results of a meta-analysis are usually presented graphically with confidence interval (typically 95%) estimates for the individual as well as the pooled estimates of effect. Figure 3.2 shows the effect in individual studies and pooled effect of perioperative ketamine on the amount of morphine consumed in the ketamine versus placebo groups. In a cumulative meta-analysis the trials are arranged sequentially in order of publication date to provide a pooled estimate for the first two trials and then to update it with each subsequent trial [5].
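The pooling step described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes fixed-effect, inverse-variance weighting of mean differences (one of several possible weighting schemes), the function names are hypothetical, and the trial values are invented for the example rather than taken from any review.

```python
import math


def pooled_mean_difference(trials):
    """Fixed-effect, inverse-variance pooling of mean differences.

    Each trial is a tuple:
    (mean_treatment, sd_treatment, n_treatment,
     mean_control, sd_control, n_control).
    Returns (pooled difference, lower 95% bound, upper 95% bound).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mean_t, sd_t, n_t, mean_c, sd_c, n_c in trials:
        difference = mean_t - mean_c
        # Variance of the difference between two independent means.
        variance = sd_t ** 2 / n_t + sd_c ** 2 / n_c
        weight = 1.0 / variance
        weighted_sum += weight * difference
        weight_total += weight
    pooled = weighted_sum / weight_total
    standard_error = math.sqrt(1.0 / weight_total)
    return pooled, pooled - 1.96 * standard_error, pooled + 1.96 * standard_error


def cumulative_pooling(trials_in_publication_order):
    """Pooled estimate for the first two trials, updated with each later trial."""
    return [
        pooled_mean_difference(trials_in_publication_order[:k])
        for k in range(2, len(trials_in_publication_order) + 1)
    ]


# Hypothetical trials (not real data), listed in order of publication.
hypothetical_trials = [
    (30.0, 10.0, 20, 45.0, 12.0, 20),
    (28.0, 15.0, 25, 40.0, 18.0, 25),
    (50.0, 20.0, 30, 55.0, 22.0, 30),
]

for estimate, lower, upper in cumulative_pooling(hypothetical_trials):
    print(f"pooled difference {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Trials with smaller variance receive larger weights, so a large, precise trial moves the pooled estimate more than a small one.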

The most "user-friendly" measure is the number needed to treat (NNT), a term used to define the reciprocal of the risk or rate difference. In a comparative study of two treatments A (analgesic) and P (placebo), suppose that the numbers of patients having at least 50% less pain after receiving treatments A and P are 80/100 and 60/100 respectively. Then the difference in the rate of 50% pain relief is equal to 20/100. The reciprocal of this value, 5, is the NNT. This is interpreted as "on average, five patients need to be treated with treatment A for one more patient to achieve at least 50% pain relief than would be the case if they received treatment P."

The formula to calculate NNT:

NNT = 1 / [(A_improved / A_total) - (P_improved / P_total)]
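As a small worked illustration of the formula, the sketch below reproduces the 80/100 versus 60/100 example from the text. The function and parameter names are hypothetical, chosen only to mirror the symbols in the formula.

```python
def number_needed_to_treat(a_improved, a_total, p_improved, p_total):
    """Reciprocal of the difference in response rates between A and P."""
    rate_a = a_improved / a_total   # proportion improved on treatment A
    rate_p = p_improved / p_total   # proportion improved on placebo P
    difference = rate_a - rate_p
    if difference == 0:
        raise ValueError("NNT is undefined when the two rates are equal")
    # A negative result means that more patients improved on P than on A.
    return 1.0 / difference


nnt = number_needed_to_treat(80, 100, 60, 100)
print(round(nnt, 1))  # 5.0
```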

Review: Peri-operative ketamine for acute post-operative pain
Comparison: 01 Peri-operative ketamine vs control
Outcome: 01 Morphine (PCA) consumption over 24 h

Study or sub-category    Treatment, mean (SD)    Control, mean (SD)
Roytblat 1993            29.50 (7.50)            48.70 (13.00)
Javery 1996              25.82 (16.40)           51.10 (20.80)
Stubhaug 1997            64.50 (22.60)           68.00 (30.00)
Ilkjaer 1998             28.00 (21.00)           36.00 (23.00)
Adriaenssens 1999        19.40 (10.70)           30.70 (15.90)
Menigaux 2000 post       24.20 (17.80)           49.70 (24.10)
Menigaux 2000 pre        28.20 (18.40)           49.70 (24.10)
Guignard 2002            42.70 (16.30)           64.90 (27.00)
Jaksch 2002              44.10 (45.23)           40.23 (17.16)
Guillou 2003             37.00 (24.00)           48.00 (22.00)
Snijdelaar 2004          32.15 (18.59)           50.42 (24.70)

Test for heterogeneity: Chi² = 13.67, df = 10 (P = 0.19)
Test for overall effect: Z = 8.42 (P < 0.00001)
Effect measure: WMD (fixed), 95% CI. Horizontal axis: favors treatment (left), favors control (right).

Figure 3.2 Meta-analysis of the 24-h consumption of morphine via patient-controlled analgesia as an outcome for the efficacy of perioperative ketamine vs. placebo. Reproduced from Bell et al. [5].

NNTs are "easy" to understand and to compare across studies. It is important that those who calculate and use NNTs understand both the problems and benefits when applying them. Pharmacologic studies in acute pain relief offer the highest quality data for meta-analyses in pain research (see Chapter 2). NNT is treatment specific. It describes the difference between active treatment and control in achieving a certain clinical outcome.

The NNTs shown in Figure 3.3 are based on randomized, placebo-controlled studies in which the baseline pain intensity was at least moderate. The pain-relieving effect of a single dose of the studied drug and placebo is assessed over 4-6 hours postoperatively. If rescue medication is given during this period, the last value before rescue is used for the remaining time points. The area under the time-analgesic effect curve for pain relief (TOTPAR) from time point 0 to 6 hours is calculated. The calculation of NNTs requires dichotomous data. In this case, the endpoint for improvement is at least 50% pain relief, meaning that the TOTPAR shows that pain has decreased by at least 50% from the baseline pain intensity.
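The steps above (carrying the last value forward after rescue, computing the area under the curve, and dichotomizing) might be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: the trapezoidal rule for the area, a categorical pain relief scale running from 0 to 4, and a responder defined as reaching at least half of the maximum possible TOTPAR. All function names are hypothetical.

```python
def totpar(times_h, relief_scores, rescue_time_h=None):
    """Area under the pain relief versus time curve (trapezoidal rule).

    Observations at or after the rescue time are replaced by the last
    score recorded before rescue.
    """
    scores = list(relief_scores)
    if rescue_time_h is not None:
        last_before_rescue = None
        for i, t in enumerate(times_h):
            if t < rescue_time_h:
                last_before_rescue = scores[i]
            elif last_before_rescue is not None:
                scores[i] = last_before_rescue
    area = 0.0
    for i in range(1, len(times_h)):
        width = times_h[i] - times_h[i - 1]
        area += width * (scores[i] + scores[i - 1]) / 2.0
    return area


def is_responder(times_h, relief_scores, max_score=4, rescue_time_h=None):
    """True if TOTPAR reaches at least 50% of the maximum possible TOTPAR."""
    max_totpar = max_score * (times_h[-1] - times_h[0])
    return totpar(times_h, relief_scores, rescue_time_h) >= 0.5 * max_totpar


# Hypothetical patient: pain relief scored at 0, 1, 2, 4 and 6 hours.
times = [0, 1, 2, 4, 6]
scores = [0, 2, 3, 3, 2]
print(totpar(times, scores))        # 14.5
print(is_responder(times, scores))  # True (14.5 is at least half of 24)
```

Counting responders in this way in each treatment group yields the dichotomous data from which an NNT can be calculated.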

As all the data for the analgesic league table shown in Figure 3.3 are based on single dose studies in acute postoperative pain over a period of 6 hours, all conclusions should be restricted within these limits. A similar TOTPAR can be produced by a very effective but short-lasting analgesic and a less effective analgesic that has a longer duration of action. The time to onset of analgesia is not shown, so analgesics with slow onset but long duration of action or those with fast onset and fast offset may seem to underperform.

Figure 3.3 shows that nonsteroidal anti-inflammatory drugs (NSAIDs) compare well with opioids and that increasing the dose will improve the effectiveness of both NSAIDs and opioids. Higher doses will increase the risk for adverse effects. These are very different for the two groups of drugs. Figure 3.3 also shows that combination analgesics are effective. The combination analgesic is more effective than the opioid component alone. Two examples are paracetamol versus paracetamol plus codeine [6] and tramadol versus tramadol plus paracetamol [7]. This is important when trying to minimize adverse effects.

Analgesic and dose                               Number of patients
Codeine 60 mg                                    1305
Dihydrocodeine 30 mg                             194
Tramadol 50 mg                                   770
Dextropropoxyphene HCl 65 mg                     440
Ketorolac (intramuscular) 10 mg                  142
Paracetamol 500 mg                               649
Tramadol 75 mg                                   563
Paracetamol 300 mg/Codeine 30 mg                 442
Paracetamol 600/650 mg                           1167
Aspirin 650/Codeine 60 mg                        598
Tramadol 100 mg                                  882
Paracetamol 1000 mg                              2283
Aspirin 600/650 mg                               5061
Dextropropoxyphene 65 mg/Paracetamol 650 mg      963
Aspirin 1000 mg                                  716
Ketorolac (intramuscular) 30 mg                  359
Paracetamol 600 or 650 mg/Codeine 60 mg          816
Morphine 10 mg (intramuscular)                   946
Tramadol 150 mg                                  561
Pethidine (intramuscular) 100 mg                 364
Ibuprofen 400 mg                                 2898
Ketorolac (oral) 10 mg                           790
Aspirin 1200 mg                                  279
Diclofenac 50 mg                                 636
Paracetamol 1000 mg/Codeine 60 mg                127
Ketorolac (intramuscular) 60 mg                  116
Diclofenac 100 mg                                308

Figure 3.3 Oxford League Table of Analgesic Efficacy: NNT for at least 50% pain relief in patients with moderate to severe postoperative pain over 4-6 hours. Information was from randomized, double-blind, placebo-controlled trials. All doses oral except where indicated. The lower the NNT, the more effective the analgesic.

The number of patients included in the calculation of the NNT is important. The number of patients required in each group for a clinically relevant NNT (NNT within ± 0.5 of the true value) depends on the experimental event rate (EER = the proportion of patients given the active drug experiencing at least 50% pain relief). Based on single-dose acute pain analgesic trials in over 5000 patients, the control event rate (CER = the proportion of patients experiencing at least 50% pain relief with placebo) is roughly 16%. Most common analgesics have EERs in the range of 40-60%. The group size required for a probability of 0.95 that the NNT is within ± 0.5 of its true value would be >500 if the EER is 40% and about 180 if the EER is 60% [8].
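The dependence of NNT precision on group size can be explored with a simple simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the method used in [8]: it draws binomial outcomes for two equal-sized groups and counts how often the observed NNT falls within ± 0.5 of the true NNT. The function name, the default number of simulations and the seed are all hypothetical choices.

```python
import random


def probability_nnt_within_half(eer, cer, group_size, simulations=500, seed=0):
    """Estimate the chance that an observed NNT lies within 0.5 of the true NNT."""
    if eer <= cer:
        raise ValueError("this sketch assumes the EER is greater than the CER")
    rng = random.Random(seed)
    true_nnt = 1.0 / (eer - cer)
    hits = 0
    for _ in range(simulations):
        # Simulate the number of responders in each group.
        active = sum(rng.random() < eer for _ in range(group_size))
        placebo = sum(rng.random() < cer for _ in range(group_size))
        difference = (active - placebo) / group_size
        if difference <= 0:
            continue  # NNT undefined or negative: counted as a miss
        if abs(1.0 / difference - true_nnt) <= 0.5:
            hits += 1
    return hits / simulations


for size in (20, 100, 500):
    estimate = probability_nnt_within_half(eer=0.6, cer=0.16, group_size=size)
    print(size, estimate)
```

Because the NNT is a reciprocal, small errors in the observed rate difference translate into large errors in the NNT when the true difference is small, which is why lower EERs call for larger groups.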

The L'Abbé plot displays individual trial results so that the reader can easily identify which of the trials show benefits in favor of the test treatment and which do not (Fig. 3.4). The two axes of the plot represent the response of interest (e.g. percentage of patients having at least 50% pain relief) for the two treatment groups. Identical scales are chosen for each group's response (y axis for the test treatment, e.g. ibuprofen, and x axis for the control treatment, e.g. placebo) and the plane subdivided into two equal areas separated by a 45° diagonal line of equality.

Trials which show results in favor of the test treatment fall in the region above the diagonal while those which favor the control treatment fall below the diagonal. The symbol (circle) chosen to represent the individual trial may be sized to reflect the sample size or inverse variance of the estimate and hence the weight which should be attached to each of the trials [9].
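To make the geometry of the plot concrete, the sketch below converts trial summaries into plot coordinates and reports on which side of the line of equality each trial falls. The trial numbers are hypothetical, the function name is an assumption, and total sample size is used as the weight, which is one of the two options mentioned above.

```python
def labbe_points(trials):
    """Turn trial summaries into L'Abbé plot points.

    Each trial is (control_responders, control_total,
                   treatment_responders, treatment_total).
    """
    points = []
    for c_resp, c_total, t_resp, t_total in trials:
        x = 100.0 * c_resp / c_total   # percent responding with control
        y = 100.0 * t_resp / t_total   # percent responding with test treatment
        if y > x:
            region = "above diagonal: favors test treatment"
        elif y < x:
            region = "below diagonal: favors control"
        else:
            region = "on the line of equality"
        points.append({
            "x": x,
            "y": y,
            "weight": c_total + t_total,  # could be used to size the symbol
            "region": region,
        })
    return points


# Hypothetical trials (not real data).
example_trials = [
    (8, 50, 28, 50),
    (10, 40, 9, 40),
    (5, 20, 5, 20),
]

for point in labbe_points(example_trials):
    print(point)
```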

Qualitative systematic reviews

The Dictionary of Evidence-Based Medicine [4] defines a systematic review as a review of a particular subject undertaken in such a systematic way that the risk of bias is reduced. The review objectives are defined precisely and formal and explicit methods are used to retrieve the available evidence as comprehensively as possible. Inclusion and exclusion criteria for studies are defined. In the evaluation of medical interventions, outcomes to be used for efficacy or safety are identified and the relevant data extracted using explicit methods. Appropriate statistical methods are used for pooling any suitable quantitative data (meta-analysis) to provide an estimate of efficacy or safety and the clinical significance of the results discussed.

Figure 3.4 Ibuprofen 400 mg vs. placebo. Each point represents one trial with the proportion of patients achieving at least 50% pain relief on the study drug plotted on the y-axis, and the proportion of patients achieving the same endpoint with placebo on the x-axis. The drugs were given for postoperative pain when the pain was at least moderate in severity. All circles above the line of equality indicate that ibuprofen was more effective than placebo. Modified from McQuay HJ, Moore RA. Oral ibuprofen and diclofenac in postoperative pain. In: An evidence-based resource for pain relief. Oxford University Press, Oxford: 1998.

Axes: percent with at least 50% pain relief with ibuprofen (y) against percent with at least 50% pain relief with placebo (x). Placebo average 16%.

It is often not possible to combine (pool) data, resulting in a qualitative rather than a quantitative systematic review. Combining data is not possible if:

• no quantitative information is available in the component trials of the review

• trials had different clinical outcomes

• patients were followed for different lengths of time

• the data are continuous rather than dichotomous, which may make them difficult to combine.

Narrative (nonsystematic) reviews are important as they are often used as a source for references. They can easily be biased, however, as both inclusions and conclusions may be determined by the author's own opinion rather than by systematic methodology. Setting criteria for inclusion, assessing quality and vote counting, i.e. determining how many studies show that the intervention works or does not work, requires at least three authors. Vote counting can lead to wrong results if more weight is not given to studies of higher quality and validity.

Quality and validity

Quality scales (Table 3.2) score trials for randomization, double blinding and description of withdrawals and drop-outs. A trial must be of a certain quality to be included in a review. Nonrandomized and randomized studies can show completely different results. A review of transcutaneous electrical nerve stimulation (TENS) for postoperative pain relief [2] analyzed 17 randomized and 19 nonrandomized studies. Seventeen of the 19 nonrandomized studies showed that TENS was more effective than placebo, while 15 of the 17 randomized studies showed that it was no more effective than placebo.

Nonblinded studies may also overestimate treatment effects. A review of acupuncture for back pain [3] included both blinded and nonblinded studies. The blinded studies showed that 57% of patients improved with acupuncture and 50% with control. The five nonblinded studies, however, showed a significant difference from control as 67% improved with acupuncture and only 37% with control.

In general, studies with a low quality score (Table 3.2) show greater effects of treatment than higher quality studies. A systematic review analyzed 50 trials with 2394 patients for the effectiveness of acupuncture in chronic pain [10]. Most high-quality studies showed either no benefit or that acupuncture was worse than control. From 40% to 50% of the low-quality trials showed acupuncture to be better than control.

Table 3.2 Quality scoring. From Jadad et al. [45]

• Appropriate 1

- no -1

Withdrawals described?

A high quality score, however, does not necessarily mean that a trial is adequately designed to answer the question it posed. The issue of validity is thus different from that of quality. An analgesic trial with a high quality score would not be valid if the trial investigated patients with insufficient baseline pain to show an analgesic effect. Adequate baseline pain intensity [6] and adequate numbers of patients in each group [8] are two of the most important inclusion criteria based on assessment of validity (Table 3.3). In pre-emptive studies where the analgesic is given before the pain appears, it is not possible to assess baseline pain and new methodologic approaches need to be developed [11]. This is particularly important considering the current interest in preventing acute pain becoming chronic (see Chapter 16).

It is essential that the authors are familiar with the clinical setting in order to appreciate the specific questions of validity. Assessment of validity may require tailor-made criteria for different settings, e.g. in dental [12] or back problems [13]. Valid outcomes should also be considered carefully. Simple pain intensity or pain relief scales may not be the most appropriate outcomes in chronic pain, particularly if they are used as the only measures. Several interventions may improve the quality of life, physical functioning or coping strategies of the patients with little effect on pain itself.

Table 3.3 Oxford Pain Validity Scale (OPVS). From Smith et al. [46]

Item Score

Blinding

• The trial was convincingly double blind 6

• The trial was convincingly single blind or unconvincingly double blind 3

• The trial was not blind/blinding is unclear 0

Size of trial groups

Outcomes

• The paper included results for at least one pre-hoc desirable outcome, and used it appropriately 2

• No results for any of the pre-hoc desirable outcomes/a pre-hoc desirable outcome was used inappropriately 0

Baseline pain and internal sensitivity

• For all treatment groups, there was enough baseline pain to detect a difference between baseline and post-treatment levels/the trial demonstrated sensitivity 1

• For all treatment groups, baseline levels were insufficient to be able to measure a change following the intervention/baseline levels could not be assessed/internal sensitivity was not demonstrated 0

Data analysis

a. Definition of outcomes

• The paper defined the relevant outcomes clearly 1

• The paper failed to define the outcomes clearly 0

b. Data presentation: location and dispersion

• The paper presented mean data ± SD/dichotomous outcomes/median + range/sufficient data to enable extraction of any of these 1

• The paper presented none of the above 0

c. Statistical testing

• Appropriate statistical test with correction for multiple tests where relevant was used 1

• Inappropriate statistical test and/or multiple testing without correction/no statistics were used 0

d. Handling of drop-outs

• The drop-out rate was either <10%, or was >10% and includes an ITT analysis in which drop-outs were included appropriately 1

• The drop-out rate was >10% and drop-outs were not included in the analysis/it is not possible to calculate the drop-out rate from the data presented in the paper 0

The maximum total score is 16

Systematic reviews do not carry quality control labels apart from Cochrane reviews that have been approved by editors of the Cochrane Collaboration.

The following list of quality control checks has been suggested by Oxman & Guyatt [14].

• Were the questions and methods stated clearly?

• Were the search methods used to locate relevant studies comprehensive?

• Were explicit methods used to determine which articles to include in the review?

• Was the methodologic quality of the primary studies assessed?

• Were the selection and assessment of the primary studies reproducible and free from bias?

• Were differences in individual study results explained adequately?

• Were the results of the primary studies combined appropriately?

• Were the reviewers' conclusions supported by the data cited?

A major concern regarding validity is also the relevance of the RCT for current practice. Medicine develops rapidly and therefore studies performed 10-20 years apart from each other are hardly comparable.

Systematic reviews do not compete with original research; rather, they complement each other. The greatest benefit of systematic reviews is the lessons they have taught us about trial methodology. They provide a means of quality control over clinical trials and help us to develop and apply better research methodology and to produce more reliable data. The Consolidated Standards of Reporting Trials (CONSORT) Statement was first published in 1996 and was revised in 2001 to improve the quality of reports of parallel group randomized trials [16]. It has since been extended to cluster randomized trials [17], noninferiority and equivalence trials [18], herbal interventions [19] and nonpharmacologic treatments [20]. Most high impact factor medical journals currently endorse the CONSORT statement [21]. Standards for improving the quality of reporting of meta-analyses of RCTs were published in 1999 [22].
