Critical appraisal: Randomized-controlled trials for Drug Therapy

Nancy J. Lee, PharmD, BCPS
Research fellow, Drug Effectiveness Review Project
Oregon Evidence-based Practice Center
Oregon Health and Science University

To receive 1.0 AMA PRA Category 1 Credit™ you must review this section and answer CME questions at the end.
Release date: January 2009
Expiration date: January 2012

Attachments
• The attachments tab in the upper right hand corner contains documents that supplement the presentation
• Handouts of slides and a glossary of terms can be found under this tab and are available to print out for your use
• URLs to online resources are also available

Program funding
This work was made possible by a grant from the state Attorney General Consumer and Prescriber Education Program, which is funded by the multistate settlement of consumer fraud claims regarding the marketing of the prescription drug Neurontin®.

Continuing education sponsors
The following activity is jointly sponsored by The University of Texas Southwestern Medical Center and the Federation of State Medical Boards Research and Education Foundation.

CME information
Program Speaker/Author:
Nancy J. Lee, PharmD, BCPS
Research fellow, Oregon Health and Science University, Oregon Evidence-based Practice Center, Drug Effectiveness Review Project

Course Director:
Barbara S. Schneidman, MD, MPH
Federation of State Medical Boards Research and Education Foundation, Secretary
Federation of State Medical Boards, Interim President and Chief Executive Officer

Program Directors:
David Pass, MD
Director, Health Resources Commission, Oregon Office for Health Policy and Research
Dean Haxby, PharmD
Associate Professor of Pharmacy Practice, Oregon State University College of Pharmacy
Daniel Hartung, PharmD, MPH
Assistant Professor of Pharmacy Practice, Oregon State University College of Pharmacy

Target Audience: This educational activity is intended for members of committees involved with medication use policies and for health care professionals who are involved with medication prescribing.

Educational Objectives: Upon completion of this activity, participants should be able to:
• recognize differences between critical appraisal and quality assessment / internal validity;
• review the general steps involved in critical appraisal;
• discuss various components of internal validity;
• review statistical concepts important in critical appraisal;
• recognize the importance of clinical insight and experience in critical appraisal.

CME policies
Accreditation: This activity has been planned and implemented in accordance with the Essential Areas & Policies of the Accreditation Council for Continuing Medical Education through the joint sponsorship of The University of Texas Southwestern Medical Center and the Federation of State Medical Boards Research and Education Foundation. The University of Texas Southwestern Medical Center is accredited by the ACCME to provide continuing medical education for physicians.
Credit Designation: The University of Texas Southwestern Medical Center designates this educational activity for a maximum of 1.0 AMA PRA Category 1 Credit™. Physicians should only claim credit commensurate with the extent of their participation in the activity.
Conflict of Interest: It is the policy of UT Southwestern Medical Center that participants in CME activities should be made aware of any affiliation or financial interest that may affect the author’s presentation. Each author has completed and signed a conflict of interest statement. The faculty members’ relationships will be disclosed in the course material.
Discussion of Off-Label Use: Because this course is meant to educate physicians about what is currently in use and what may be available in the future, “off-label” use may be discussed. Authors have been requested to inform the audience when off-label use is discussed.

DISCLOSURE TO PARTICIPANTS
It is the policy of the CME Office at The University of Texas Southwestern Medical Center to ensure balance, independence, objectivity, and scientific rigor in all directly or jointly sponsored educational activities. Program directors and speakers have completed and signed a conflict of interest statement disclosing a financial or other relationship with a commercial interest related directly or indirectly to the program. Information and opinion offered by the speakers represent their viewpoints. Conclusions drawn by the audience should be derived from careful consideration of all available scientific information. Products may be discussed in treatment outside current approved labeling.

FINANCIAL RELATIONSHIP DISCLOSURE
Faculty                          Type of Relationship/Name of Commercial Interest(s)
David Pass, MD                   None
Dean Haxby, PharmD               Employment/CareOregon
Daniel Hartung, PharmD, MPH      None
Nancy Lee, PharmD, BCPS          None
Barbara S. Schneidman, MD, MPH   None

Learning objectives
• Recognize difference between overall critical appraisal of evidence and quality assessment / internal validity
  – What is it and why is it necessary?
• Review general steps involved in critical appraisal
• Discuss various components of internal validity
• Review statistical concepts important in critical appraisal
• Recognize importance of clinical insight and experience in critical appraisal

What is critical appraisal?
• Process of examining research evidence to evaluate its validity, results, and relevance before making an informed decision
• It’s not an exact science and it won’t give us the “right” answers
• Foundation to practicing thoughtful “evidence-based” or “evidence-informed” medicine
Hill, et al. Bandolier Volume 3 (2). http://www.evidence-based-medicine.co.uk

Why is it necessary?
• Not all publications are equally convincing or reliable (even if published in a reputable journal).
  – Incorrect interpretation
  – Fraud and misrepresentation of results
  – Data dredging
  – Data dumping
• Sometimes clinical experience and theory based on pathophysiology can be misleading.
• Systematically examining the literature increases our confidence in our strengths and sheds light on areas of weakness
Turner, et al. NEJM 2008;358:252-60

Benefits and challenges
• Benefits
  – Encourages objective assessment of the literature
  – Helps recognize the breadth and depth of the evidence base in a particular topic area
• Challenges
  – Time intensive at first
  – Generates more questions than answers
  – May highlight a lack of good evidence, which makes decision making challenging

Critical appraisal requires that you ask yourself:
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?

1. Is this article even relevant?
• Should I read it?
  – Title
  – Abstract
  – Introduction
    • Does this study identify a gap in the evidence?
    • Were the objectives clear and focused?
• Should I continue?
  – Methods
    • Were inclusion and exclusion criteria clearly stated?
    • Are the outcomes patient-oriented or surrogate markers for long-term health outcomes?
    • How long was the study? (study duration)

2. Is this valid?
• INTERNAL validity or study quality
  – Were the design, methods, and conduct of the study likely to have prevented or minimized bias in such a way that I can trust the findings?
• With the information provided, could I reproduce this study and observe similar findings?
“Trust no one unless you have eaten much salt with him.” -Cicero
“Skepticism is the chastity of the intellect…” -George Santayana

A few words about “quality”
• Means different things to different people
• In the context of this module, “quality” refers to methodologic quality or…
  – Study quality = quality assessment = internal validity
  – NOT the same as quality of reporting
    • Remember: “not reported” ≠ “not performed”
    • Sometimes difficult to differentiate between the two
• Subjective process; may use dual review

Threats to INTERNAL validity (study quality)
Selection bias, Performance bias, Detection bias, Attrition bias
A. Randomization
B. Allocation concealment
C. Blinding
D. Attrition
E. Statistical analysis
F. Other
   i. Post-randomization exclusions
   ii. Crossovers

Is this valid?: INTERNAL validity
A. Randomization
  – Adequate (unbiased): computerized random number generator, random number table
  – Inadequate (biased): by hospital number, date of birth, alternate assignment
B. Allocation concealment
  – Adequate (unbiased): interactive voice response system; sealed, opaque envelopes that are coded and handled by a third party (centralized or pharmacy-controlled)
  – Inadequate (biased): serially numbered envelopes (even sealed opaque envelopes can be subject to manipulation), open lists

Were both treatment groups fairly balanced?
Example
Primary outcome: cardiovascular events or deaths

                              Tolbutamide, N=204    Placebo, N=205
                              (% of subjects)       (% of subjects)
Age >55                       50                    41
Digitalis use                 8                     5
Angina                        7                     5
ECG abnormality               4                     3
Total chol >300 mg/dL         15                    9
Fasting glucose >110 mg/dL    72                    64
Hypertension                  30                    37

Adapted from Elwood.
Critical appraisal of Epi studies and Clin trials 1998

Is this valid?: INTERNAL validity
C. Blinding
  – Single-blind, double-blind, triple-blind, open-label, double-dummy
    • Who was blinded?
    • Was blinding maintained?
  – Is blinding essential or possible in every situation?
    • Important when outcome measures involve some subjectivity
    • May be less important when outcome measure is death

Therapy administered in the comparator arm should be as “identical” as possible to the therapy administered in the treatment arm
Examples
• Aspirin 1 gram vs. placebo post-MI
  – Double-blind
  – Risk of bleeding?
• Ascorbic acid 1 gram vs. placebo for the common cold
  – Double-blind
  – Taste difference?
• Esomeprazole vs. omeprazole for erosive esophagitis
  – Double-dummy
  – Appearance?

Is this valid?: INTERNAL validity
D. Attrition
  – Was the total number of participants who withdrew reported for each group?
  – Were reasons for withdrawal provided?
    • Includes: adverse events, lost to follow-up, protocol violation, or lack of efficacy

Is this valid?: INTERNAL validity
• Commonly reported methods of analysis
  – Intention-to-treat (ITT)
    • Always verify the numbers yourself
    • Practical issue: depending on the reason behind the missing data, may allow for <3-5% difference in baseline ITT numbers.
  – Other popular approaches: last observation carried forward (LOCF), as-treated or per-protocol analyses, data not imputed, mixed modeling, etc.

Is this valid?: INTERNAL validity
E. Statistical analysis
  – Was the method appropriate?
  – What is the potential for type I or type II error?
    • Adequate power?
    • False positive
    • False negative
  – Selective analysis of data and selective reporting
    • Example: calculating a statistically significant p-value for A1c at week 26 instead of week 52

Is this valid?: INTERNAL validity
F. Other: post-randomization exclusions, crossovers, contamination?
  • Were any groups of participants excluded during the course of study?
  • Why? Was this significant?

Is this valid? For Harms
• Apply similar concepts and also ask:
  – How were harms monitored?
    • Active or passive methods?
  – Who assessed the harms?
    • Study investigator or third party?
  – When and how often were the assessments conducted?
    • Face-to-face or over the phone?
• Various terms used
  – Safety = fading out (except with FDA)
  – Adverse effect = undesirable outcome with reasonable causal association
  – Adverse event = undesirable outcome with unknown causal association
  – Tolerability = ability or willingness to tolerate unpleasant drug-related events without serious or permanent sequelae
Chou, et al. J Clin Epi 2008. Sept 25 (Epub ahead of print, in press)

Tools for assessing internal validity
• There are > 25 different scales and tools for assessing the internal quality of a trial
  – Jadad scale
  – Chalmers scale
  – Cochrane Risk of Bias tool
  – DERP method
    • Adapted from US Preventive Services Task Force (USPSTF) and National Health Service Centre for Reviews and Dissemination (UK)

Example: Jadad scale
Item                                                                      Score
1. Was the study described as randomized?                                 0 or 1
2. Was the method used to generate the sequence of randomization
   described and was it appropriate (ie, computer generated, table of
   random numbers)?                                                       0 or 1
3. Was the study described as double-blind?                               0 or 1
4. Was the method of double-blinding described and was it appropriate?    0 or 1
5. Was there a description of withdrawals and dropouts?                   0 or 1
Deduct 1 point if the method used to generate the sequence of
   randomization was described but inappropriate (ie, allocated
   alternately or according to date of birth or hospital number)          0 or -1
Deduct 1 point if the study was described as double-blind but the
   method of blinding was inappropriate (ie, comparison of tablet vs.
   injection without a double dummy)                                      0 or -1
Scoring range: 0-5
Poor quality: <3

Example: Cochrane “Risk of bias” tool
http://www.ohg.cochrane.org/forms/Risk%20of%20bias%20assessment%20tool.pdf

Example: DERP method
[Table not recoverable from the slide; only the column header “Author” survives.]

Example
The Center for Evidence-based Medicine.
Oxford. http://www.cebm.net

1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?

3. Are the results reliable?
• Were all the results reported?
  – Was there evidence of selective outcome reporting?
  – How were the results reported? And are they easy to read or determine?
• How large is the treatment effect?
  – Relative risk, relative risk reduction, odds ratio, absolute risk reduction, number needed to treat
• How precise is the estimate of the effect?
  – How narrow or wide is the confidence interval?
  – Where does the point estimate fall?
“Do not put your faith in what statistics say until you have carefully considered what they do not say.” -William Watt

• Brief overview of:
  – Relative measures
    • Relative risk
    • Odds ratio
  – Absolute measures
    • Absolute risk
    • Number needed to treat
  – P-value
  – Confidence intervals

Is this reliable?: Interpreting data
• Relative risk (RR) = event rate or risk ratio
    RR = (Risk in treatment arm) / (Risk in control arm)
  – RR = 1 (no difference)
  – RR < 1 (treatment lowers the risk of the outcome)
  – RR > 1 (treatment increases the risk of the outcome)
• Relative risk reduction (RRR)
    RRR = (1 - RR) x 100

Limitations of risk ratios
• Study 1 (outcome): death from any cause
  – Treatment: 1%
  – Placebo: 2%
  – RR = 0.50 and RRR = 50%
• Study 2 (outcome): death from any cause
  – Treatment: 25%
  – Placebo: 50%
  – RR = 0.50 and RRR = 50%

Is this reliable?: Interpreting data
• Odds for an event within a single group, where p is the probability of the event
    Odds = (Probability of an event occurring) / (Probability of an event NOT occurring) = p / (1 - p)
• Odds ratio (OR) compares the odds across groups, where p and q are the event probabilities in groups A and B
    OR = (Odds of an event occurring in group A) / (Odds of an event occurring in group B) = [p / (1 - p)] / [q / (1 - q)]
  – OR = 1 (no difference)
  – OR < 1 (lowers the odds of experiencing the outcome)
  – OR > 1 (increases the odds of experiencing the outcome)
Zhang J, et al. JAMA 1998; 280 (19):1690-1. Figure shown in this slide is from this article.
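
The relative measures defined above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides: the function names are illustrative, and the inputs are the two hypothetical studies from the "Limitations of risk ratios" slide. It shows that both studies share the same RR and RRR despite very different baseline risks, and that the odds ratio stays close to the RR only when the event is rare.

```python
def relative_risk(risk_treatment, risk_control):
    """RR = risk in treatment arm / risk in control arm."""
    return risk_treatment / risk_control


def relative_risk_reduction(rr):
    """RRR = (1 - RR) x 100, expressed as a percentage."""
    return (1 - rr) * 100


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(p, q):
    """OR = odds in group A (probability p) / odds in group B (probability q)."""
    return odds(p) / odds(q)


# (treatment risk, placebo risk) taken from the slide's two example studies
studies = {"Study 1": (0.01, 0.02), "Study 2": (0.25, 0.50)}

for name, (treated, placebo) in studies.items():
    rr = relative_risk(treated, placebo)
    rrr = relative_risk_reduction(rr)
    o_r = odds_ratio(treated, placebo)
    print(f"{name}: RR = {rr:.2f}, RRR = {rrr:.0f}%, OR = {o_r:.2f}")
```

In Study 1, where deaths are rare, the OR is almost equal to the RR. In Study 2, where half of the placebo group dies, the OR moves further from 1 than the RR does, so reading it as if it were a risk ratio would overstate the effect.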
Is this reliable?: Interpreting data
• Absolute risk reduction (ARR) or risk difference
    ARR = Risk on control - Risk on treatment
• Number needed to treat (NNT) to benefit or harm
    NNT = 1 / ARR when ARR is a proportion (equivalently, 100 / ARR when ARR is a percentage)
  – Should include: duration of follow-up and the control group event rate

Example
New Antiplatelet for Patients with Acute Myocardial Infarction
Conclusion: The new antiplatelet medication (MiHaart®) for acute myocardial infarction is more effective than placebo with a 25% reduction in mortality after 60 days of treatment.

Mortality (# events)    Placebo (N=1000)    MiHaart® (N=1000)
Composite               250                 187.5
High risk patients      200                 150
Low risk patients       50                  37.5

This example is based on a fictional medication (MiHaart).

Crunching the numbers
Risk
  MiHaart = 187.5/1000 = 0.1875
  Placebo = 250/1000 = 0.250
Absolute risk (rate)
  MiHaart = 18.75%
  Placebo = 25.0%
Relative risk
  MiHaart / Placebo = 0.1875 / 0.250 = 0.75
Absolute risk reduction
  25.0% - 18.75% = 6.25%
Relative risk reduction
  (1 - 0.75) x 100 = 25%

Crunching the numbers
                      RR      RRR     ARR      NNT
Composite             0.75    25%     6.25%    16
High risk patients    0.75    25%     5%       20
Low risk patients     0.75    25%     1.25%    80

RR and RRR tend to give an inflated impression of the magnitude of effect compared with the ARR, but to determine whether any of these values is a good point estimate of the effect on mortality, we must evaluate the confidence interval in which it lies.

Is this reliable?: Interpreting data
• How can we determine if the “point estimate” is a good reflection of the “true” value?
  – The utility of the P-value
  – Role of the confidence interval (CI)

The P-value
• Used to measure statistical significance in epidemiology.
• By convention, the significance level (α) is typically set at 0.05, which assumes that a result expected to occur by chance fewer than 1 time in 20 is unlikely to be due to random chance alone.
• The smaller the p-value, the less likely it is that the “point estimate” was due to random chance.
• Does not provide the possible range of the true differences of the “point estimate”

The problem with P-values
• P-values do not give indication of:
  – Treatment effect size
  – Precision of estimate
  – Direction of effect
• Degrades data measures into dichotomous judgments
  – Significant (P<0.05)
  – Not Significant (P>0.05; P=NS)
• Does not protect against Type I or Type II errors
  – Non-significant P-value = “Negative Trial”
Absence of evidence is NOT evidence of absence

The confidence interval
• More useful than P-value in evaluating results
• Provides a range of possible values for the “true” treatment value
  – Width of a CI is a function of sample size
• Can be calculated for means, medians, proportions, odds ratios, relative risks, NNT.
  – 95% = most commonly calculated
  – Can go to http://www.openepi.com to help you calculate confidence intervals for treatment measures

Same example
New Antiplatelet for Patients with Acute Myocardial Infarction
Conclusion: The new antiplatelet medication (MiHaart®) for acute myocardial infarction is more effective than placebo with a 25% reduction in mortality after 60 days of treatment.

                      ARR      95% CI (%)     NNT             95% CI
Composite             6.25%    2.6 to 9.9     16 (benefit)    10 to 38.5
High risk patients    5%       1.7 to 8.3     20 (benefit)    12 to 59
Low risk patients     1.25%    -0.5 to 3.0    80              33 (benefit) to ∞ and 200 (harm) to ∞

Confidence intervals and trials that appear “negative”
• Swedish Cooperative Stroke Study (N=505)
  – Aspirin = 9% nonfatal stroke
  – Placebo = 7% nonfatal stroke
  – Risk difference = -2%
  – 95% CI (-7% to 3%)
Guyatt G, et al. CMAJ 1995; 152 (2):169-73.

Trials that appear “negative”
• Statistically: no difference
• Clinically: ?
[Figure: number line of the risk difference with 0 at the center, PLACEBO on one side and ASPIRIN on the other; the 95% CI runs from -7% to 3%, and a 5% mark is shown. Regions labeled “Definitely negative” and “Inadequate sample size”.]

Confidence intervals and trials that appear “positive”
• Enalapril in LV Dysfunction, SOLVD (N=1285)
  – Enalapril = 47.7% died or worsening HF
  – Placebo = 57.3% died or worsening HF
  – Risk difference = 9.6% (~10%)
  – 95% CI (6% to 14%)
Guyatt G. CMAJ 1995; 152(2): 169-73.
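
The absolute measures and their confidence intervals can be sketched in the same way. The block below is an illustration under stated assumptions, not the method used by the presenter: it applies a simple Wald (normal-approximation) interval to the risk difference, with z = 1.96 for 95% coverage, and the function names are hypothetical. The inputs are the fictional MiHaart event counts, with 1000 patients per arm as the denominator for every row, as the slide's ARR values imply.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumed method: Wald)


def risk_difference_ci(events_control, n_control, events_treated, n_treated, z=Z_95):
    """Return (ARR, lower, upper) as proportions using a Wald interval."""
    risk_c = events_control / n_control
    risk_t = events_treated / n_treated
    arr = risk_c - risk_t  # ARR = risk on control - risk on treatment
    se = math.sqrt(
        risk_c * (1 - risk_c) / n_control + risk_t * (1 - risk_t) / n_treated
    )
    return arr, arr - z * se, arr + z * se


def nnt_from_arr(arr):
    """NNT is the reciprocal of the ARR expressed as a proportion."""
    return 1 / arr


# (placebo events, MiHaart events) per 1000 patients, from the fictional example
rows = {
    "Composite": (250, 187.5),
    "High risk patients": (200, 150),
    "Low risk patients": (50, 37.5),
}

for label, (placebo_events, drug_events) in rows.items():
    arr, lower, upper = risk_difference_ci(placebo_events, 1000, drug_events, 1000)
    note = " (CI includes 0: benefit not established)" if lower < 0 < upper else ""
    print(
        f"{label}: ARR {arr:.2%}, 95% CI {lower:.1%} to {upper:.1%}, "
        f"NNT {nnt_from_arr(arr):.0f}{note}"
    )
```

Under this approximation the ARR intervals agree with the table in the "Same example" slide after rounding. When the ARR interval includes zero, as for the low risk patients, the NNT interval is no longer a single range: it runs from a finite NNT for benefit through infinity to a finite NNT for harm, which is why the slide reports it in two parts.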
Trials that appear “positive”
• Statistically: there is a difference
• Clinically: ?
  – Let’s say the clinically relevant lower bound is 7% = inadequate sample size
[Figure: number line of the risk difference from 0 (PLACEBO) toward ENALAPRIL; the 95% CI runs from 6% to 14%. Region labeled “Definitely relevant”.]

1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and meaningful?
5. Is this applicable or generalizable?

4. Making sense of it all: Is this important and meaningful?
• Do the results make sense?
• Do the results provide anything new?
  – Do the results confirm a prior conclusion?
• Will this change my practice?
• Do the concluding remarks match the results?

5. Is this applicable or generalizable?
• EXTERNAL validity = term phasing out
  – New terms: APPLICABILITY or GENERALIZABILITY
  – Was enough information regarding population (eligibility criteria), interventions, outcomes, study design, and setting reported such that I can apply the results to my patients or generalize the findings to a broader population?

5. Is this applicable or generalizable?
• Population
  – Recruitment methods?
  – Disease severity or duration of illness?
  – Run-in periods?
• Interventions
  – Study medication naïve?
  – Dose, duration, other allowed interventions, adherence?
  – Level of training for those who assessed intervention?
• Outcomes
  – Long term health outcomes relevant to patients?
  – Intermediate (or surrogate) markers used?
• Setting
  – Specialty setting or general setting (in- or outpatient)?
  – Country?

Summary
1. Is this relevant?
2. Is this valid?
3. Is this reliable?
4. Is this important and significant?
5. Is this applicable or generalizable?
• Developing a critical appraisal skill set is important for providing quality care
  – Brings awareness of current medical practices and highlights areas where more research is needed

Acknowledgements
• Attorney General Consumer and Prescriber Education Program
• Members of the technical advisory committee of this grant
• Office for Oregon Health Policy and Research
• The University of Texas Southwestern Medical Center
• The Federation of State Medical Boards Research and Education Foundation

CME instructions
• Please complete the survey, CME questions, and program evaluation after this slide
• Don’t forget to click the finish button at the end of the CME questions
• You should be directly linked to a CME form which you will need to fill out and fax, email, or mail in order to receive credit hours