From Clinical Trial Evidence to Practice Guidelines
Lost in Translation
Sanjay Kaul, MD, FACC
George A. Diamond, MD, FACC
Division of Cardiology, Cedars-Sinai Medical Center and Geffen School of Medicine at UCLA, Los Angeles, California

Complexity of American Strategy in Afghanistan
“When we understand that slide, we’ll have won the war” - General McChrystal

Complexity of Evidence-Based Medicine
Lost in a jungle of evidence, we need a compass

Evidence to Guidelines
Lost in Translation
Key Issues for Discussion
• Establish the scientific evidence
- Appraise and synthesize the evidence
• Elucidate the clinical context
- Clinical importance vs. statistical significance
- Clinically relevant weighted outcomes
• Encourage optimal processes of care
- Quality initiatives
- Reimbursement initiatives

Appraisal of Evidence
Design and Methods
Quality
• Important limitations
- Study design or execution (bias): lack of randomization, lack of concealment, ITT principle violated, inadequate blinding, loss to follow-up, early stopping for benefit
- Inconsistency of results
- Indirectness of results
- Imprecision
- Publication bias

Quality
• Special strengths
- Randomized, controlled, prospective, double-blind trials
- Large, consistent, and precise treatment effect: RR<0.5 or >2.0 (large); RR<0.2 or >5.0 (very large)
- Minimal confounding & bias

Synthesis of Evidence
ACC/AHA Clinical Practice Guidelines
• Class I (“Useful & Effective”): Benefit >>> risk (highly recommended)
• Class II (“Conflicting Evidence”)
- IIa: Benefit >> risk (reasonably recommended)
- IIb: Benefit ≥ risk (may be considered)
• Class III (“Not useful/effective, may be harmful”): No benefit/harm (not recommended)
• Level A: Multiple randomized clinical trials
• Level B: Single randomized trial or nonrandomized studies
• Level C: Consensus opinion, case studies, or standard of care

Evidence to Guidelines
Lost in Translation
Self-evident Truths
• Does empirical evidence trump expert opinion?
Scientific Evidence Underlying The ACC/AHA Clinical Practice Guidelines
Level of Evidence A and Level of Evidence C, by guideline (Tricoci P et al. JAMA 2009)

Guideline | Level of Evidence A (%) | Level of Evidence C (%)
AF | 11.7 | 58.6
Heart failure | 26.4 | 54.3
PAD | 15.3 | 25.1
STEMI | 13.5 | 47.2
Perioperative | 12 | 32
Secondary prevention | 22.9 | 8.3
Stable angina | 6.4 | 54.5
SV arrhythmia | 6.1 | 56.5
UA/NSTEMI | 23.6 | 29.6
Valvular disease | 0.3 (1/320) | 70.6
VA/SCD | 9.7 | 58.5
PCI | 11 | 47.8
CABG | 19 | 20
Pacemaker | 4.9 | 58.2
Radionuclide imaging | 4.9 | 26.3

ACC/AHA Clinical Practice Guidelines
Paucity of High-Quality Evidence
• Class I: Benefit >>> risk (highly recommended)
• Class II
- IIa: Benefit >> risk (reasonably recommended)
- IIb: Benefit ≥ risk (may be considered)
• Class III: Risk ≥ benefit (not recommended)
• Level A (multiple randomized clinical trials): 19% based on high-level evidence
• Level B (single randomized or nonrandomized studies)
• Level C (consensus opinion, case studies, or standard of care): 48% of guidelines are based on level C evidence (“codification of expert opinion”)
• 41% of guidelines are based on Class II recommendations (“uncertain evidence”)
• Recommendations are largely developed from lower levels of evidence or expert opinion. “Exercise caution when considering recommendations not supported by solid trial evidence”
Tricoci P et al. JAMA 2009

Scientific Evidence Underlying The ACC/AHA Clinical Practice Guidelines
Caveat Emptor, Caveat Lector
“…it seems unlikely that substantial change will occur because many guideline developers seem set in their ways. If all that can be produced are biased, minimally applicable consensus statements, perhaps guidelines should be avoided completely.
Unless there is evidence of appropriate changes in the guideline process, clinicians and policy makers must reject calls for adherence to guidelines. Physicians would be better off making clinical decisions based on valid primary data”
Shaneyfelt and Centor, JAMA 2009

Guidelines that are driven by scientifically documented, high-quality evidence are more likely to be accepted by the stakeholders, thereby reducing the variability in care and improving the quality and cost of care

2009 ACC/AHA Focused Updates for STEMI/PCI
Paucity of High-Quality Evidence
• Class I: Benefit >>> risk (highly recommended)
• Class II
- IIa: Benefit >> risk (reasonably recommended)
- IIb: Benefit ≥ risk (may be considered)
• Class III: Risk ≥ benefit (not recommended)
• Level A (multiple randomized clinical trials): 12% based on high-level evidence
• Level B (single randomized trial or nonrandomized studies)
• Level C (consensus opinion, case studies, or standard of care): 44% of guidelines are based on level C evidence (“filtered expert opinion”)
• 50% of guidelines are based on Class II recommendations (“conflicting evidence”)
Kushner FG, Hand M et al. 2009 Focused Updates, JACC/Circulation 2009

The Laws of Diminishing Objectivity in the Interpretation of Evidence
• vehemence ∝ evidence^-1
• vehemence ∝ eminence^2
Peter McCulloch, The Lancet, 2004;363;9004

Evidence to Guidelines
Lost in Translation
Key Issues for Discussion
• Establish the scientific evidence
- Appraise and synthesize the evidence
• Elucidate the clinical context
- Clinical importance vs. statistical significance
- Clinically relevant weighted outcomes
• Optimal processes of care
- Quality initiatives
- Reimbursement initiatives

ACC/AHA Clinical Practice Guidelines
Metrics for Assessing Strength of Evidence
• Effect size
- Absolute risk difference (NNT or NNH)
- Relative risk difference: risk ratio, odds ratio, hazard ratio
• Statistical certainty/precision
- Hypothesis testing (P value)
- Estimation (confidence interval)
• ?
Clinical importance
- Little or no explicit guidance

Disconnect Between Statistical Significance and Clinical Importance
P value ∝ 1 / (Effect size × Sample size)
Statistical significance ≠ clinical importance!

Statistical Significance vs. Clinical Importance
GP IIb/IIIa Inhibitors in UA/NSTEMI
Death / MI at 30 days

Trial | N | Placebo (%) | IIb/IIIa (%)
GUSTO IV | 7800 | 8.0 | 8.7
PRISM | 3232 | 7.0 | 5.7
PRISM-Plus | 1570 | 11.9 | 8.7
PURSUIT | 9461 | 15.7 | 14.2
PARAGON A | 1513 | 11.7 | 10.3
PARAGON B | 5169 | 11.4 | 10.5
POOLED | 28,745 | 12.5 | 11.3

Pooled risk ratio (95% CI): 0.91 (0.86, 0.99), P=0.015; Breslow-Day homogeneity P=0.339
ARR = 1.2%, RRR = 9%
[Figure: forest plot of risk ratio and 95% CI, log axis from 0.1 (better) to 10 (worse)]
Boersma et al, Lancet 2002;359:189-198.

What Does a P(ee) Value of 0.05 Mean?
• ‘Fisherian’ P value of 0.05 is arbitrary and originally based on n=30!
• Always demand a P value of <0.001 for a sample size > 200 as strong evidence against the null hypothesis of zero difference (Al Feinstein)
“The plain fact is that in 1925 Ronald Fisher gave scientists a mathematical machine for turning baloney into breakthroughs, and flukes into funding.” Robert Matthews

Disconnect Between Statistical Significance and Clinical Importance
P value ∝ 1 / (Effect size × Sample size)
Lack of statistical significance ≠ lack of clinical importance!

Statistical Significance vs. Clinical Importance
Unfractionated Heparin in UA/NSTEMI
Death/MI

Trial | N | ASA+UFH | ASA
Theroux | 243 | 1.6% | 3.3%
RISC | 399 | 1.4% | 3.7%
ATACS | 214 | 3.8% | 8.3%
Holdright | 285 | 27.3% | 30.5%
Cohen 1990 | 69 | 0% | 3%
Gurfinkel | 143 | 5.7% | 9.6%
Overall | 1353 | 7.9% | 10.4%

Overall risk ratio (95% CI): 0.67 (0.44-1.02), P=0.06
ARR = 2.5%, RRR = 33%
[Figure: forest plot of risk ratio and 95% CI, log axis from 0.1 (ASA+UFH better) to 10 (ASA better)]
Oler A et al, JAMA 1996;276:811-15

Statistical Significance vs.
Clinical Importance
• MDD (minimum detectable difference, “d”)
- The “minimum difference” the study is powered to detect
- Utilized for sample size estimation
- May or may not reflect a clinically important difference
• MCID (minimum clinically important difference)
- The “minimum acceptable difference” to change the behavior of the clinician, patient, payer or policy maker, given the side effects, costs and inconveniences of therapeutic interventions

Guideline Criteria for Clinical Importance
Impact of Outcome, Harm, and Cost on MCID

Factor | Small MCID (toward 0% RRR) → Large MCID (toward 50% RRR)
Outcome severity | Mortality → Irreversible morbidity → Reversible morbidity → Surrogate endpoint
Harm | Very low → Low → Moderate → High
Cost | Very low → Low → Moderate → High

Statistical Significance vs. Clinical Importance
MCID Threshold for UA/NSTEMI ACS
“In ACS, a relative reduction of 15% in recurrent clinical events has recently been considered clinically important (GUSTO I); this level is far below the perceived threshold that drove the sample size calculations for clinical trials just a decade ago. As we develop more incrementally beneficial therapies, it is likely that the minimally important clinical difference will become even smaller.”
Califf and DeMets, Circulation. 2002;106:1015

Statistical Significance vs. Clinical Importance
Strength of Evidence
• A: Statistically not significant, clinically not important
• B: Statistically not significant, may be clinically important
• C: Statistically significant, not clinically important
• D: Statistically significant, may be clinically important
• E: Statistically significant, clinically important
MCID = minimal clinically important difference = 15% RRD
[Figure: confidence intervals A-E plotted on a risk ratio (95% CI) axis marked at 0.85 (MCID) and 1.0]
Sackett, D

Statistical Significance vs.
Clinical Importance
Class I, LOE A Recommendations for UA/NSTEMI
Impact on Death or MI

Intervention | Control (%) | Rx (%) | Summary risk ratio (95% CI) | P value | NNT (95% CI) | Interpretation of confidence interval (MCID = 15% RRR)
Aspirin (N=2,856) | 12.8 | 5.5 | 0.43 (0.33-0.56) | <0.01 | 14 (11-19) | Statistically significant and clinically important (E)
UFH (N=1,353) | 10.4 | 7.9 | 0.67 (0.44-1.02) | 0.06 | 44 (∞-18) | Statistically not significant, may be clinically important (B)
Enoxaparin (Early invasive) | 12.8 | 12.1 | 0.96 (0.88-1.05) | 0.35 | 171 (∞-59) | Statistically not significant, clinically not important (A)
Clopidogrel (CURE) | 11.4 | 9.3 | 0.82 (0.74-0.92) | <0.01 | 54 (35-120) | Statistically significant, may be clinically important (D)
GP IIb/IIIa (Early invasive) | 14.5 | 11.8 | 0.81 (0.70-0.94) | 0.007 | 37 (21-139) | Statistically significant, may be clinically important (D)

Aspirin is the only intervention listed as a performance measure!

Evidence to Guidelines
Lost in Translation
Key Issues for Discussion
• Establish the scientific evidence
- Appraise and synthesize the evidence
• Elucidate the clinical context
- Clinical importance vs.
statistical significance
- Clinically relevant weighted outcomes
• Encourage optimal processes of care
- Quality initiatives
- Reimbursement initiatives

Endpoints in Cardiovascular Clinical Trials
MACE vs MICE
• Major Adverse Cardiac Events: “hard” but infrequent
- Death
- Cardiac arrest
- Large MI
- Disabling stroke
- Emergency CABG
• Minor Inconvenient Cardiac Events: “soft” but prevalent
- Silent CK/Tn release
- Restenosis
- Reintervention
- Recurrent angina
- Rehospitalization
- Groin hematoma

Cardioprotective Effects of Antihistamines
Means to an End or an End to Means
[Figure: % patients, placebo vs. antihistamine, for death, recurrent MI, and itching; p < 0.05]

Cardioprotective Effects of Stenting
Clinical Outcomes at 1 Year in Stent PAMI

Endpoint | Stent (%) | PTCA (%) | P value
D/MI/Stroke/TVR (composite) | 16.9 | 24.8 | < 0.005
Death | 5.6 | 3.1 | 0.07
MI | 2.9 | 2.5 | 0.7
Stroke | 0.5 | 0.5 | 0.83
TVR | 10.7 | 20.9 | < 0.0005

Group sizes shown in the figure legends: Stent (N=452) and PTCA (N=448); Stent (N=449) and PTCA (N=444)
Benefit driven by the “least robust” but the “most prevalent” component

Validity of the Composite Endpoint
• Components should be of comparable frequency
• Components should be of comparable clinical importance
• Components should be comparably responsive to therapy
Montori VM et al.
Br Med J 2005; 330:594-596

Cardioprotective Effects of Stenting
Validity of the Composite Endpoint in Stent PAMI

Endpoint | OR (95% CI) | Stent | PTCA
Death | 1.81 (0.93-3.53) | 5.6% | 3.1%
MI | 1.17 (0.52-2.65) | 2.9% | 2.5%
Stroke | 0.99 (0.14-7.05) | 0.5% | 0.5%
TVR | 0.45 (0.31-0.66) | 10.7% | 20.9%
Composite analysis | 0.62 (0.45-0.86) | 16.9% | 24.8%

Cochran’s Q = 14.64, heterogeneity P = 0.002, I² = 80% (46-92%)
[Figure: forest plot of odds ratios, axis from 0.00 to 4.00]
Composite: Variable gradient in clinical importance, frequency and treatment effect across components

Cardioprotective Effects of Stenting
Weighted Analysis of Composite Endpoint
[Figure: global P value (0.00-1.00) plotted against the weight of TVR (0.00-1.00), with weights Death = 1, MI = 1, Stroke = 1]
Composite endpoint becomes significant at a TVR weight of >0.7!

ACC/AHA Guideline Recommendations
Prasugrel During Primary PCI for STEMI
• Class I: Benefit >>> risk (highly recommended)
• Class II
- IIa: Benefit >> risk (reasonably recommended)
- IIb: Benefit ≥ risk (may be considered)
• Class III: Risk ≥ benefit (not recommended)
• Level A: Multiple randomized clinical trials
• Level B: Single randomized trial or nonrandomized studies
• Level C: Consensus opinion, case studies, or standard of care
Recommendations shown in the grid:
- 60 mg prasugrel load ASAP, 10 mg daily x 12m
- Withhold prasugrel 7 days prior to CABG or surgery
- Pts with h/o TIA or stroke; active bleeding
Kushner FG, Hand M et al.
2009 Focused Updates, JACC/Circulation 2009

Benefit-Risk Balance in TRITON (All ACS Cohort)
1000 Patients Treated with prasugrel instead of clopidogrel
Prasugrel vs Clopidogrel
Benefit
• 24 endpoints prevented
- 3 CV deaths
- 0 strokes
- 21 nonfatal MIs
- 4 PPMIs
- 17 MI events
- 13 clinically relevant MIs
Risk
• 30 excess TIMI bleeds
- 2 bleeding deaths
- 3 TIMI Major bleeds
- 5 TIMI Minor bleeds
- 20 TIMI Minimal bleeds
or
• 29 excess moderate/severe bleeds
- 2 bleeding deaths
- 9 transfusions
- 6 nonfatal serious bleeds
- 12 nonfatal moderate bleeds
or
• 17 excess serious bleeds
• ? 3-6 excess cancer (1 cancer death)

Judgments about Strength of Recommendation
Prasugrel for Patients with ACS Undergoing PCI

Factors | Comments
Balance between desirable and undesirable effects | “The net benefits are uncertain”
Quality of the evidence | “Quality of the evidence is high.”
Patient values and preferences | “All patients and care providers would not accept efficacy-safety trade-off.” Alternatives available.
Costs (resource use) | “The cost is high for treatment for long duration.”

Does the evidence favor Class I (benefit >>> risk) recommendation for prasugrel?

Evidence to Guidelines
Lost in Translation
Key Issues for Discussion
• Establish the scientific evidence
- Appraise and synthesize the evidence
• Elucidate the clinical context
- Clinical importance vs.
statistical significance
- Clinically relevant weighted outcomes
• Encourage optimal processes of care
- Quality initiatives
- Reimbursement initiatives

Quality Matters
Linking Guidelines Adherence and Mortality
[Figure: % in-hospital mortality, adjusted and unadjusted, by hospital composite quality quartile (<=25%, 25-50%, 50-75%, >=75%); bar values shown: 5.95, 6.31, 5.16, 5.06, 4.97, 4.63, 4.16, 4.15]
Every 10% increase in guidelines adherence was associated with a 10% decrease in mortality (OR=0.90, 95% CI: 0.84-0.97)
Peterson et al, JAMA 2006;295:1863-1912

GRACE: Outcome Measures over Time
Changes in Clinical Outcomes for NSTE ACS Patients

Outcome | Jul-Dec 1999 | Jul-Dec 2005 | P value
Death | 2.9% | 2.2% | p = .02 for linear trend
Heart failure | 13.0% | 6.1% | p <.001

Sample sizes shown under the bars: n = 2213 and n = 1566; n = 2228 and n = 1564
• Risk-adjusted hospital deaths declined by 0.7 percentage points (95% CI, -1.7 to 0.3) in NSTE ACS patients.
• The rate of congestive heart failure and pulmonary edema decreased by 6.5% (95% CI, -8.4 to -4.7).
Fox et al. JAMA. 2007; 297:1892-1900

ACC Improvement Initiatives
Continuous Quality Improvement
Translating Science into Practice
• PLAN: Guidelines/Standards
- Guidelines
- AUC / PM
• DO: Implementation - “Bridge”
- Quality Practice Assessment
- Clinical Decision Support
- Operation Management Tools
• STUDY: Measurement
- NCDR
• ACT: Improvement
- D2B
- H2H
- FOCUS
• Education and Training

The Role of Evidence-Based Guidelines in Improving Clinical Practice
Turbocharging the Guidelines
• Fuel: High-quality evidence
• Boost: Implementation
• Design, process evaluation

2007 ACC/AHA Guideline Recommendations
Acute Coronary Syndromes
• Number of recommendations: >250
• Number of pages: 157
• Number of figures: 21
• Number of tables: 26
• Last update: 2002
• Writing committee members: 15
• Reviewers: 40 (6 different layers from evaluation to publication)
• Conflict-of-interest disclosure
- Writing committee members: 14/15
- Reviewers: 30/40
J Am Coll Cardiol 2007; DOI:10.1016/j.jacc.2007.02.028

Evidence to Guidelines
Framework for Refinement
• Quality
-
Rigorous and standardized methodology (GRADE)
- Emphasize clinical importance over statistical significance
- Transparent and explicit benefit-risk assessment
• Efficiency
- User-friendly and parsimonious (avoid the 160 page report)
• Timeliness
- Keep pace with advances (annual updates)
• Dissemination
- Direct clinical relevance (at point of care via EMR)
- Guide and inform clinical practice (performance measures)
- Financial incentives (evidence-based reimbursement)

Evidence to Guidelines
Framework for Refinement
• Firewall between systematic review & guideline development
• Multidisciplinary guideline developers: methodologists, clinical content experts, patient representatives
• Avoid LOE C recommendations (best suited as “advisories”)
• Minimize conflicts of interest (COI) for writers/reviewers
• “Zero tolerance” COI policy for chairs
• PIs of guideline-relevant trials should only serve as advisors

Evidence-Based Medicine
ACC Improvement Initiatives
• Turbocharging guidelines (18 currently available, 9 in development, 6 being updated)
• Transform and transfer guidelines at the point of care
- Just in time strategies (Vivisimo, Cardio Compass)
• Appropriate use criteria (Noninvasive imaging, CABG/PCI)
• Quality initiatives (D2B, H2H, FOCUS)
• Registries
- NCDR (CathPCI, ICD, CARE, ACTION-GWG, IMPACT, PINNACLE)
• Physician incentives (PQRI, ACO)
• Patient involvement (CardioSmart)

Framework for Increased Adherence to Clinical Practice Guidelines and to EBM
• Treat as “guides”, not “rules”
• Patient-specific, not disease-specific
• Pragmatic/assistive, not prescriptive/directive
• Flexible and adapted to local practice
• Based on empirical high-quality evidence, not “codified” or “filtered” expert clinical opinion
• Drive the standard of care, not be driven by it
• Inform clinical judgment, not replace it

“Evidence-Based” Not “Evidence-Bound”
Three Key Dimensions
• Scientific evidence
• Patient preference
• Clinical judgment

Complexity of Evidence-Based Guidelines
Illusion of understanding? Illusion of control?
"Yes, I have tricks in my pocket, I have things up my sleeve. But I am the opposite of a stage magician. He gives you illusion that has the appearance of truth. I give you truth in the pleasant disguise of illusion."
Tennessee Williams (The Glass Menagerie)

Caveats in Interpretation of Meta-analysis
“Although it challenges logic that one could obtain new accurate information from the quantitative integration of a number of very diverse studies, the numerous meta-analyses published speak for themselves. Used in the proper setting, I think they can make a valuable contribution. The job of the Journal will be to ensure that those published are in this setting and are methodologically sound.”
Anthony N DeMaria, MD, Editor-in-Chief, JACC
J Am Coll Cardiol, 2008; 52:237-238
Has the Journal lived up to its ideals?

Turbocharging the Guidelines
ACC Improvement Initiatives
• Separate systematic review from guideline development
• Multidisciplinary guideline developers: methodologists, clinical content experts, patient representatives
• Avoid LOE C recommendations (best suited as “advisories”)
• Minimize conflicts of interest (COI) for writers/reviewers
• “Zero tolerance” COI policy for chairs
• PIs of guideline-relevant trials should only serve as advisors

Fee for Service or Fee for Appropriate Service?
What is the Gold Standard?
Do we practice evidence-based medicine or reimbursement-based medicine?
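The slides that follow propose a sliding-scale, appropriateness-based reimbursement for PCI in stable CAD. As a rough sketch of that arithmetic, the Python below recomputes the totals from the figures quoted on those slides ($20K per PCI, 500,000 PCIs per year, 80% symptomatic, 50% stress-tested of whom 50% are ischemic, 15% on OMT). The function and variable names are hypothetical, the three criteria are treated as statistically independent (an assumption of this sketch that happens to reproduce the slide's patient counts), and reading the "reduction in caseload" as the share of procedures that would receive no payment is likewise an assumption.

```python
from fractions import Fraction
from itertools import product

BASE_PAYMENT = 20_000   # dollars per PCI (figure quoted on the slides)
ANNUAL_PCIS = 500_000   # stable-CAD PCIs per year (figure quoted on the slides)

# Share of patients meeting each appropriateness criterion (from the slides).
# Treating the criteria as independent is an assumption of this sketch.
P_MET = (
    Fraction(80, 100),                      # ischemic symptoms present
    Fraction(50, 100) * Fraction(50, 100),  # stress test done AND ischemic
    Fraction(15, 100),                      # optimal medical therapy
)

# Payment multiplier by number of criteria met (scores 0, 1/3, 2/3, 1).
MULTIPLIER = {
    0: Fraction(0),         # 100% discount
    1: Fraction(40, 100),   # 60% discount
    2: Fraction(80, 100),   # 20% discount
    3: Fraction(120, 100),  # 20% reward
}


def payment(criteria_met: int) -> Fraction:
    """Dollars paid for one PCI that meets `criteria_met` of the 3 criteria."""
    return BASE_PAYMENT * MULTIPLIER[criteria_met]


def reimbursement_table():
    """Return (flags, patients, payment, total) for all 8 criterion patterns."""
    rows = []
    for flags in product((True, False), repeat=3):
        share = Fraction(1)
        for p, met in zip(P_MET, flags):
            share *= p if met else 1 - p
        patients = share * ANNUAL_PCIS
        pay = payment(sum(flags))
        rows.append((flags, patients, pay, patients * pay))
    return rows


def summary():
    """Aggregate the table into the headline figures."""
    rows = reimbursement_table()
    total = sum(row[3] for row in rows)
    unpaid = sum(row[1] for row in rows if row[2] == 0)
    baseline = BASE_PAYMENT * ANNUAL_PCIS
    return {
        "total_reimbursement": total,
        "reimbursement_reduction": 1 - total / baseline,
        "unreimbursed_share": unpaid / ANNUAL_PCIS,
    }


if __name__ == "__main__":
    for flags, patients, pay, total in reimbursement_table():
        print(flags, int(patients), int(pay), float(total) / 1e9)
    print({key: float(value) for key, value in summary().items()})
```

Exact fractions are used instead of floats so that the patient counts and dollar totals can be compared with the slide figures without rounding noise.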
PCI for Stable CAD
Disconnect Between Policy and Practice
• Number of PCIs performed for stable CAD
- ~500,000/yr at cost of $20K per PCI ($10 billion)
• Appropriateness of PCI
- Presence of ischemic symptoms
- Objective evidence of ischemia by stress testing
- Failed trial of optimal medical therapy and lifestyle Rx
• The real-world practice
- 20% of pts referred for PCI are asymptomatic (ACC-NCDR)
- 30-50% of pts have not had a stress test (Topol / Lin et al.); untold (?60-70%) number of stress tests are “negative”
- 30% of pts not taking anti-ischemic meds (Samuels et al.)

Evidence-Based Reimbursement for Stable CAD
A Financial Incentive for Health Care Reform
• Reimbursement for PCI
- $20K per PCI
• Score based on appropriateness
- Presence of ischemic symptoms (1/3)
- Objective evidence of ischemia by stress testing (1/3)
- Failed trial of optimal medical therapy and lifestyle Rx (1/3)
• Sliding-scale reimbursement
- 20% reward for a score of 1
- 20% discount for a score of 2/3
- 60% discount for a score of 1/3
- 100% discount for a score of 0
Diamond and Kaul, Circulation Cardiovasc Qual Outcomes 2009; Archives of Int Med 2009

Evidence-Based Reimbursement for Stable CAD
A Financial Incentive for Health Care Reform

Symptoms | Stress test | Treatment | Score | Payment | Patients | Reimbursement
+ | + | + | 1 | $24K | 15,000 | $0.36B
+ | + | - | 2/3 | $16K | 85,000 | $1.36B
+ | - | + | 2/3 | $16K | 45,000 | $0.72B
- | + | + | 2/3 | $16K | 3,750 | $0.06B
+ | - | - | 1/3 | $8K | 255,000 | $2.04B
- | + | - | 1/3 | $8K | 21,250 | $0.17B
- | - | + | 1/3 | $8K | 11,250 | $0.09B
- | - | - | 0 | 0 | 63,750 | $0B
Total | | | | | 500,000 | $4.8B

• 80% of patients complain of ischemic symptoms
• 50% of patients undergo stress testing; 50% of these are ischemic
• 15% of patients are receiving OMT
13% reduction in caseload and 52% reduction in reimbursement

Statistical Significance vs.
Clinical Importance
Class I, LOE A Recommendations for UA/NSTEMI
Impact on Death or MI

Intervention | Control (%) | Rx (%) | Summary risk ratio (95% CI) | P value | NNT (95% CI) | Interpretation of confidence interval (MCID = 15% RRR)
Aspirin (N=2,856) | 12.8 | 5.5 | 0.43 (0.33-0.56) | <0.01 | 14 (11-19) | Statistically significant and clinically important (E)
UFH (N=1,353) | 10.4 | 7.9 | 0.67 (0.44-1.02) | 0.06 | 44 (∞-18) | Statistically not significant, may be clinically important (B)
Enoxaparin (Early invasive) | 12.8 | 12.1 | 0.96 (0.88-1.05) | 0.35 | 171 (∞-59) | Statistically not significant, clinically not important (A)
Clopidogrel (CURE) | 11.4 | 9.3 | 0.82 (0.74-0.92) | <0.01 | 54 (35-120) | Statistically significant, may be clinically important (D)
GP IIb/IIIa (Early invasive) | 14.5 | 11.8 | 0.81 (0.70-0.94) | 0.007 | 37 (21-139) | Statistically significant, may be clinically important (D)
Early statin (A-to-Z) | 12.4 | 11.1 | 0.89 (0.74-1.07) | 0.21 | 87 (∞-33) | Statistically not significant, may be clinically important (B)
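
The A-E interpretation used in the table above compares each 95% confidence interval with two reference lines: a risk ratio of 1.0 (no effect) and the MCID of 15% RRR (risk ratio 0.85). A minimal Python sketch of that rule follows. The function name `classify_ci`, the handling of intervals that land exactly on a boundary, and the extra "harm" label for intervals lying entirely above 1.0 are assumptions of this sketch and do not come from the slides.

```python
# Category labels as worded on the slides.
LABELS = {
    "A": "Statistically not significant, clinically not important",
    "B": "Statistically not significant, may be clinically important",
    "C": "Statistically significant, not clinically important",
    "D": "Statistically significant, may be clinically important",
    "E": "Statistically significant, clinically important",
}


def classify_ci(lower: float, upper: float, mcid_rr: float = 0.85) -> str:
    """Classify a risk-ratio 95% CI against RR = 1.0 and the MCID threshold.

    Risk ratios below 1.0 favor treatment. Boundary handling (<= vs <) is an
    assumption of this sketch.
    """
    if not 0 < lower <= upper:
        raise ValueError("expected 0 < lower <= upper")
    if lower > 1.0:
        return "harm"  # whole interval above 1.0; outside the A-E scheme
    if upper < 1.0:
        # Interval excludes 1.0: statistically significant benefit.
        if upper <= mcid_rr:
            return "E"  # whole interval beyond the MCID
        if lower <= mcid_rr:
            return "D"  # interval straddles the MCID
        return "C"      # whole interval between the MCID and 1.0
    # Interval includes 1.0: not statistically significant.
    return "B" if lower <= mcid_rr else "A"


# Confidence intervals copied from the table above.
INTERVENTIONS = {
    "Aspirin": (0.33, 0.56),
    "UFH": (0.44, 1.02),
    "Enoxaparin (early invasive)": (0.88, 1.05),
    "Clopidogrel (CURE)": (0.74, 0.92),
    "GP IIb/IIIa (early invasive)": (0.70, 0.94),
    "Early statin (A-to-Z)": (0.74, 1.07),
}

if __name__ == "__main__":
    for name, (lower, upper) in INTERVENTIONS.items():
        category = classify_ci(lower, upper)
        print(f"{name}: {category} - {LABELS[category]}")
```

Applied to the six intervals in the table, the rule returns the same letters as the interpretation column. The pooled GP IIb/IIIa interval quoted earlier in the deck, 0.86 to 0.99, falls in category C under the same rule.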