Assessment Methodology: Lessons from OMERACT Meetings

Vibeke Strand, MD
Biopharmaceutical Consultant
Adjunct Clinical Professor, Division of Immunology, Stanford University

OMERACT: Outcome Measures in Rheumatology Clinical Trials
• I: 1992: Rheumatoid Arthritis Clinical Trials
• II: 1994: Adverse Events → Establishment of Registries; Health Related Quality of Life; Economic Evaluations
• III: 1996: Osteoarthritis; Osteoporosis; Psychosocial Measures
• IV: 1998: Longitudinal Observational Studies; RA Response Criteria / Imaging; Ankylosing Spondylitis → ASAS; Systemic Lupus Erythematosus
• V: 2000: MCID; Economics: Cost Effectiveness; Imaging: Radiography and MRI
• VI: 2002: Economic Evaluations; Imaging

What is OMERACT?
• Data driven process to define outcome measures to be used in RCTs and LOS for each clinical indication
• Domains derived from the “Ds”: Discomfort, Disability, Dollar cost, Death
• Literature reviews, data available from LOS and RCTs:
  • Validity of currently defined instruments to assess outcome
  • “Data mining” to better understand clinical response
  • Correlation of patient reported responses with other outcome measures
  • Definition of “minimally clinically important improvement” = MCID

What is OMERACT?
• Presentation of evidence and development of consensus at each conference:
  Representatives from: Academia, Clinical Investigators, Regulatory Agencies, Sponsors, Clinical Rheumatologists
• Goal: To Develop Recommendations for:
  • “Core Set” of minimum number of domains / outcome measures assessed in RCTs and LOS
  • Working agenda identifying ‘need’ to focus future work
• Previous OMERACT Recommendations have been ratified by WHO / ILAR in RA, OA, SLE, including HRQOL and Economic evaluations

The OMERACT ‘Umbrella’
• RHEUMATOID ARTHRITIS: EULAR, ACR
• JRA: PRCSG
• SLE: SLICC, EULAR
• OSTEOARTHRITIS: OARSI
• ANKYLOSING SPONDYLITIS: ASAS
• PAIN: IMMPACT

The OMERACT Filter
• TRUTH: Face, content, construct and criterion validity
  Is the measure truthful? Does it measure what is intended?
• DISCRIMINATION: Reliability and sensitivity to change
  Does the measure discriminate between situations [states] of interest?
• FEASIBILITY: Can the outcome easily be measured given constraints of time, money and interpretability?
Boers et al: JRheum 1998: 25: 198-9

Rheumatoid Arthritis: OMERACT I, 1992
• RCTs available, but data limited
• Only a few included a measure of physical function
• General ‘belief’ that none had demonstrated convincing efficacy
• “Paper patients” derived from actual RCT data
• → [healthy] arguments regarding changes reported
• Clear disagreement about importance of MD Global assessments
• Participants ranked patient reported physical function and SJC highest when assessing efficacy
• Facilitated recognition that ‘perception’ of benefit is variable

ACR Response Criteria
• Defined and Ratified after OMERACT I
  Data driven nominal group process
• Based on Paulus criteria and statistical analyses of CSSRD and MTX RCTs best differentiating active therapy from placebo
• Require ≥20% improvement in 5 of 7 measures:
  • Tender Joint Count and Swollen Joint Count
  • and 3 of the following 5:
    MD Global
    Physical function: HAQ
    Pain by VAS
    Patient Global
    ESR and/or CRP

EULAR Response Definition
                     Decrease in DAS28
DAS28 Score          >1.2        >0.6 to ≤1.2    ≤0.6
≤3.2                 Good        Moderate        None
>3.2 and ≤5.1        Moderate    Moderate        None
>5.1                 Moderate    None            None
Discriminant function analysis of patients w/active; inactive RA
Disease activity state determined by treatment changes
Van Gestel et al. Arth Rheum 1996; 26:705-11

As Demonstrated in RA, Responder Analyses Have Face and Content Validity
• Allow assessment of multiple domains
• Facilitate comparison of efficacy across:
  • Products
  • Heterogeneous populations, and
  • Disease indications
• May lead to tiered approach to label indications
• Precedent: ACR Responder Index in RA
  DAS28 both confirms active disease at baseline and ‘clinical responses’
  Additional data by x-ray and HRQOL

Rheumatoid Arthritis: Later Efforts
• Demonstrated that ‘generic’ measures of HRQOL are sensitive to change in RA RCTs
• Identified ‘MCID’ for HAQ and SF-36, facilitating:
  • Comparisons across products, disease populations
  • Economic evaluations
• Helped to show impact of ‘Rheumatic Diseases’ to WHO
  • In this Bone and Joint Decade
  • Identified importance of Rheumatic Diseases relative to CV, DM, HTN, OP
  • [Hopefully] → allocation of more resources to identify and treat Rheumatic Diseases

Minimum Clinically Important Differences [MCID]
• Degree of improvement
  • Perceptible to patients = clinically important / meaningful
  • Defined by patient query, Delphi technique
    OMERACT: 33-36% improvement; 18% > placebo
• Confirmed by statistical correlations with patient global assessments in RCTs in RA and OA
• Determination of proportion of patients with clinically important improvement provides a more interpretable result with direct clinical implications

Minimum Clinically Important Differences [MCID]
Instrument    Score Range     Direction of Scoring    MCID              Literature
HAQ DI        0-3             –                       0.22              1-4
SF-36         0 - 100         +                       5 - 10 points     2, 5-7
PCS/MCS       mean 50 ± 10    +                       2.5 - 5 points
1. Guzman et al. Arth Rheum. 1996; 39:5208
2. Kosinski et al. Arth Rheum. 2000; 43:1478-87
3. Redelmeier et al. Arch Intern Med. 1993; 153:1337-42
4. Wells et al. J Rheumatol. 1993; 20:557-60
5. Kosinski et al. Arth Rheum. 2000; 43:S140
6. Samsa et al. Pharmacoeconomics. 1999; 15:141-155
7. Thumboo et al. J Rheumatol. 1999; 26:97-102.
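
The ACR and EULAR response definitions on the earlier slides are decision rules, so they can be sketched in a few lines of Python. This is an illustrative sketch only, not a validated implementation. The function names and dictionary keys (`tjc`, `sjc`, `md_global`, `haq`, `pain_vas`, `patient_global`, `acute_phase`) are hypothetical, every measure is assumed to be scored so that a lower value is better, and the EULAR cells follow the standard DAS28 response table.

```python
# Hypothetical field names; every measure is assumed scored so that lower = better.
JOINT_COUNTS = ("tjc", "sjc")  # tender and swollen joint counts
OTHER_FIVE = ("md_global", "haq", "pain_vas", "patient_global", "acute_phase")


def pct_improvement(baseline, followup):
    """Percent improvement from baseline (positive means the patient improved)."""
    if baseline <= 0:
        return 0.0  # no room to improve; treated as no improvement (an assumption)
    return 100.0 * (baseline - followup) / baseline


def acr_responder(baseline, followup, threshold=20.0):
    """ACR-style responder: both joint counts and at least 3 of the
    other 5 measures improve by at least `threshold` percent."""
    def improved(key):
        return pct_improvement(baseline[key], followup[key]) >= threshold

    joints_ok = all(improved(key) for key in JOINT_COUNTS)
    others_ok = sum(improved(key) for key in OTHER_FIVE) >= 3
    return joints_ok and others_ok


def eular_response(das28_baseline, das28_now):
    """EULAR response category from the DAS28 decrease and the DAS28 attained."""
    decrease = das28_baseline - das28_now
    if decrease <= 0.6:
        return "none"
    if decrease > 1.2:
        return "good" if das28_now <= 3.2 else "moderate"
    # Decrease is >0.6 and <=1.2 here.
    return "moderate" if das28_now <= 5.1 else "none"


# Made-up patient values, for illustration only.
visit_0 = {"tjc": 20, "sjc": 15, "md_global": 60, "haq": 1.5,
           "pain_vas": 70, "patient_global": 65, "acute_phase": 40}
visit_1 = {"tjc": 12, "sjc": 10, "md_global": 40, "haq": 1.4,
           "pain_vas": 50, "patient_global": 45, "acute_phase": 38}

print(acr_responder(visit_0, visit_1))   # joint counts plus 3 of 5 improve by >=20%
print(eular_response(6.0, 3.0))          # large decrease, low DAS28 attained
```

Passing `threshold=50.0` or `threshold=70.0` gives the corresponding stricter responder definitions under the same assumptions.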
Health Assessment Questionnaire (HAQ)
• Widely accepted, validated, rheumatology-specific instrument to assess physical function in RA
• Gold Standard: OMERACT/FDA Guidance
• 20 questions covering 8 types of activities:
  Dressing + Grooming; Arising; Eating; Walking; Hygiene; Reaching; Gripping; Activities of Daily Living
• HAQ Disability Index (HAQ DI)
  • Scores the worst items within each of the eight scales
  • Based on use of aids and devices

Mean Improvement in HAQ Disability Index: Year-2 Cohorts at 24 Months
[Figure: mean change from baseline in HAQ DI for LEF, MTX and SSZ in US301, MN301/303/305 and MN302/304, with % achieving MCID shown per group (69% to 86%). *LEF vs MTX; p=0.01.]

ATTRACT: HAQ Disability Index Mean Improvement through Week 102
[Figure: mean improvement for MTX + Placebo and infliximab 3 mg/kg q8w, 3 mg/kg q4w, 10 mg/kg q8w, 10 mg/kg q4w. p-value vs. MTX + Placebo < 0.001 for all infliximab groups.]

ERA: Mean Change in HAQ DI at Month 12
[Figure: mean change from baseline for MTX and ETN; baseline HAQ DI 1.6 in both groups; bar labels -0.70 and -0.80; MCID line shown.]
Kosinski et al. AJMC. 2002;8:231-240

Anakinra+MTX: Mean Changes in HAQ DI at Weeks 24 and 52
[Figure: mean change from baseline for Placebo+MTX (baseline 1.38) and Anakinra+MTX (baseline 1.43) at 24 and 52 weeks; bar labels -0.15, -0.18, -0.29, -0.28; MCID line shown.]
Fleishman et al. Arth Rheum. 2002;46:S574.

DE019: Adalimumab+MTX: Mean Changes in HAQ DI at Weeks 24 and 52
[Figure: mean change from baseline for Placebo (BL 1.48), Adalimumab 20 mg weekly (BL 1.45) and Adalimumab 40 mg eow (BL 1.44) at 24 and 52 weeks; bar labels -0.24, -0.25, -0.6, -0.56, -0.61, -0.59; MCID line shown.]
Keystone E. Arthritis & Rheum 2002; 46(9) suppl.

ASPIRE RCT: Mean Changes in HAQ DI from Weeks 30 to 54
[Figure: mean change from baseline, weeks 30-54, for MTX, MTX+INF 3 mg/kg and MTX+INF 6 mg/kg; baseline HAQ DI 1.5 in each group; bar labels in source order -0.75, -0.78, -0.79; % achieving MCID in source order 65, 76, 76; MCID line shown.]
Smolen et al. Ann Rheum Ds 2003;62:S64

TEMPO RCT: Mean Changes in HAQ DI at Week 52
[Figure: mean change from baseline for MTX, ETN and MTX+ETN; baseline HAQ DI 1.7, 1.7, 1.8; bar labels in source order -0.61, -0.66, -0.97; MCID line shown.]

SF-36: Short Form 36 Health Survey
• Validated, widely used generic measure of HRQOL
• 8 Domains:
  • Scored 0 - 100; age, sex adjusted rates
• 2 Summary Scores
  • Physical Component: PCS
    – Measures how decrements in physical function affect day to day activities
    – Impact of physical impairment/disability on HRQOL
  • Mental Component: MCS
    – Impact of mental affect, symptoms of pain on HRQOL
  • Normative based scoring (Mean: 50, SD: 10)

SF-36 Two-Component Model
[Figure: the eight domains (Physical Function, Role Physical, Bodily Pain, General Health, Vitality, Social Function, Role Emotion, Mental Health) contributing to the Physical Component and Mental Component.]

US301: Baseline SF-36 Scores, US Norms vs US301 Population
[Figure: domain scores (0-100) for US Norms (A/S Adjusted) and the US301 study population across Physical Function, Role Physical, Bodily Pain, General Health Perception, Vitality, Social Function, Role Emotion, Mental Health.]

US301: Mean Improvement in SF-36: Year-2 Cohorts, Leflunomide and Methotrexate
[Figure: mean domain scores (higher = better) for LEF at 24 months (n = 93) and MTX at 24 months (n = 89), compared with US Norms (A/S Adjusted) and the Year-2 cohort baseline, across the eight SF-36 domains.]

DE019: Adalimumab+MTX: Mean Changes in SF-36 Scores
[Figure: mean change from baseline by SF-36 domain for Placebo and Adalimumab (40 mg) QOW; MCID line shown.]
Keystone E. Arthritis & Rheum 2002; 46 suppl.
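
Several of the figures above report the percentage of patients achieving MCID. A minimal sketch of that calculation follows, assuming per-patient change scores are available. The thresholds come from the MCID table earlier in the talk (HAQ DI 0.22 with improvement as a negative change; the lower bound of 5 points for SF-36 domains and 2.5 points for PCS/MCS). The names `MCID`, `achieved_mcid` and `proportion_achieving_mcid` are hypothetical, and the change scores in the example are made up.

```python
# MCID thresholds taken from the MCID slide; a negative threshold means
# that improvement is a decrease in the score (as for HAQ DI).
MCID = {
    "haq_di": -0.22,       # HAQ Disability Index, 0-3 scale
    "sf36_domain": 5.0,    # SF-36 domain score, 0-100 scale (lower bound of 5-10)
    "sf36_summary": 2.5,   # PCS / MCS (lower bound of 2.5-5)
}


def achieved_mcid(change, instrument):
    """True if a single patient's change from baseline meets the MCID."""
    threshold = MCID[instrument]
    if threshold < 0:
        return change <= threshold
    return change >= threshold


def proportion_achieving_mcid(changes, instrument):
    """Fraction of patients whose change from baseline meets the MCID."""
    if not changes:
        raise ValueError("no change scores supplied")
    responders = sum(achieved_mcid(change, instrument) for change in changes)
    return responders / len(changes)


# Made-up HAQ DI change scores for four hypothetical patients.
haq_changes = [-0.50, -0.10, -0.25, 0.00]
print(proportion_achieving_mcid(haq_changes, "haq_di"))
```

Reporting the responder proportion alongside the mean change is what makes results comparable across agents and trials, since a mean can exceed the MCID while many individual patients do not.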
Leflunomide and Methotrexate: Mean Changes in SF-36 PCS, Year-2 Cohort (US301)
[Figure: mean PCS scores (higher = improved) at BL, 12 M and 24 M for LEF (93) and MTX (97); reference lines at US Norm (50) and 2 SDs below US Norm; bar labels 42.7, 41.7, 38.6, 38.8, 30.9, 30.2.]

Etanercept and Methotrexate: Mean Changes in SF-36 PCS at 12 Months (ERA)
[Figure: mean PCS scores (higher = improved) at BL and 12M for ETN 25mg (193) and MTX (199); reference lines at US Norm (50) and 2 SDs below US Norm; bar labels 38.7, 28.0, 38.8, 29.2.]
Kosinski et al. AJMC. 2002;8:231-240.

Infliximab: Median Improvement in SF-36 PCS at Month 24 (ATTRACT)
[Figure: median (IQR) improvement; baseline 23.9–30.8; bar labels 6.8, 6.9, 6.7, 4.6, 2.8.]
Group                       p-value vs. placebo
MTX + Placebo (n=88)
3 mg/kg q 8 wks (n=86)      0.011
3 mg/kg q 4 wks (n=86)      <0.001
10 mg/kg q 8 wks (n=87)     <0.001
10 mg/kg q 4 wks (n=81)     <0.001
Kavanaugh et al. Arth Rheum. 2000;43:S147.

Anakinra+MTX: Mean Improvement in SF-36 PCS at Month 12
[Figure: baseline PCS 29.9 (PL) and 28.8 (Active).]
Fleishman et al. Arth Rheum. 2002;46:S574.

Correlation Between HAQ and SF-36
Reference      Study                 Scales     Correlation
Ruta 1         -                     PCS        -0.77
Talamo 2       -                     PF         -0.72
Kavanaugh 3    Infliximab/ATTRACT    PCS, PF    -0.51, -0.54
Kosinski 4     Etanercept/ERA        PCS, PF    -0.60, -0.61
Lubeck 5       Etanercept/RAPOLO     PCS, PF    -0.79, -0.82
Strand 6       Leflunomide/US301     PCS, PF    -0.60, -0.74
1. Ruta et al. Br J Rheum. 1998;37:425-436.
2. Talamo et al. Br J Rheum. 1997;36:463-469.
3. Kavanaugh et al. A&R. 2000;43:S147.
4. Kosinski et al. Medical Care. 1999;37:MS23-39.
5. Lubeck et al. Value in Health. 2001;4:MS2,163.
6. Strand et al. A&R. 2001;44:S187.
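
The PCS figures above are read against the "US Norm 50" and "2 SDs below US Norm" reference lines, which follow from the normative scale stated on the SF-36 slide (mean 50, SD 10). The sketch below only converts between a norm-based score and its distance from the norm in SD units. It does not reproduce the actual PCS/MCS scoring algorithm, which combines the eight domains with weights not given in this talk. The function names are hypothetical.

```python
NORM_MEAN = 50.0  # normative mean stated on the SF-36 slide
NORM_SD = 10.0    # normative standard deviation stated on the SF-36 slide


def sd_units_from_norm(score, mean=NORM_MEAN, sd=NORM_SD):
    """Distance of a norm-based score from the norm, in SD units
    (negative = below the norm)."""
    return (score - mean) / sd


def norm_based_score(sd_units, mean=NORM_MEAN, sd=NORM_SD):
    """Inverse transform: SD units relative to the norm back to the 50/10 scale."""
    return mean + sd * sd_units


# The '2 SDs below US Norm' reference line sits at a score of 30.
print(norm_based_score(-2.0))
# A PCS of 42.7 (one of the bar labels above) is about 0.73 SD below the norm.
print(round(sd_units_from_norm(42.7), 2))
```

On this scale the PCS/MCS MCID of 2.5 to 5 points corresponds to a quarter to half of a standard deviation.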
MCID Values Are Consistent in RCTs in RA
• Improvements in HAQ DI and SF-36 in RA with newly approved therapies are statistically significant; more importantly, CLINICALLY MEANINGFUL
• MCID values are consistent across agents and patient populations
  • Disease specific [‘relevant’] measure: HAQ
  • Generic measure: SF-36
• Improvements in disease specific measures are highly correlated with generic measures

MCID Workshop: Identifying Candidate Measures to Define ‘Low Disease Activity State’
• Pain
• Function
• Inflammation
• Health Related Quality of life
• Structure damage
• Toxicity
• Co-morbidity
• Fatigue

Osteoarthritis
• OMERACT III: 1996
• Candidate instruments to assess:
  • Pain
  • Stiffness
  • Physical Function
• Limited data from RCTs; treatments offering only symptomatic benefit
• Identification of a ‘Core Set’ of 4 Domains as a foundation for future work
• Research Agenda: Identification of ‘Disease Control’, ‘Biologic Markers’ of Response

Western Ontario and McMaster Universities (WOMAC) Osteoarthritis Index
• Self-administered questionnaire
• Developed querying patients with hip or knee OA
• Reflects physical activities most affected by symptoms, disease manifestations
• Composite score based on 24 questions; subscores:
  • Pain (5 questions)
  • Joint stiffness (2 questions)
  • Physical function (17 questions)
• Scored by 0 - 4 Likert or 0 - 10 cm VAS scales
• Improvement = negative change

[Figure: OA domains placed by % voting for inclusion: Biologic Markers, HRQOL / Utility, Inflammation, Pain, Physical Function, Patient Global, Imaging (≥1 yr), Stiffness, MD Global, Other (e.g., performance based, flares, time to surgery, analgesic count); percentages shown: 90%, 36%, 8%.]
% Voting for inclusion    Placement      Consequence
≥ 90%                     INNER Core     CORE SET
≥30% - <90%               MIDDLE Core    HRQOL/Utility (Strongly Recommended)
0% - <30%                 OUTER Core     OPTIONAL

Outcome Measures in OA: OARSI Guidelines
OMERACT Core Set and ‘Strongly Recommended’
• Pain: WOMAC pain / stiffness subscales
  Differentiating pain from stiffness
• Physical function: WOMAC physical function subscale
• Patient Global Assessment: How to phrase question?
  Signal joint
  In all the ways arthritis affects you, how are you doing today?
  Transition question
• HRQOL/Utilities:
  WOMAC Composite Score
  SF-36
  EQ5D / Utilities
• MD Global Assessment

WOMAC Scores in OA RCTs: Identifying MCID
• MCID in WOMAC composite score, Likert scale:
  • Anchored to Patient Global Assessment
  • 12 wk pivotal OA RCTs with Celecoxib: 10.1 [0 – 89]
  • Pain, Stiffness, Physical Fxn: 2.1, 1.2, 6.5 [0 – 20] [0 – 8] [0 – 61]
  Zhao et al. Pharmacother 1999;19:1269-78
• MCID in WOMAC VAS:
  • Anchored to Patient Response to Rx [0-4 Likert scale]
  • 6 wk RCTs OA hip, knee; Rofecoxib v Ibuprofen v PL:
  • Pain, Stiffness, Physical Fxn: 9.7, 10, 9.3 mm, VAS
  • 11 mm VAS for Patient Global Assessment
  Ehrich et al: JRheum 2000;27: 2635-2641

Improvement in WOMAC Composite Scores at Week 12: Pivotal OA RCTs, Celecoxib
[Figure: improvement by trial (CT20: knee, CT21: knee, CT54: hip) for Placebo, Cel 50, Cel 100, Cel 200 and Nap 500; MCID = 10.1 (SE=0.4); * P <.05 v placebo.]
Zhao et al Pharmacother 1999;19:1269-78

WOMAC Physical Function Subscale, knee or hip OA at 12 months: Pivotal RCT, Rofecoxib
[Figure: mean change (mm) by week for Rofecoxib 12.5 mg, Rofecoxib 25 mg and Diclofenac 150 mg; MCID = 9.3; mean baseline = 69.6 mm; R = randomization; P < 0.05 for all groups, treatment response compared with baseline.]
Cannon GW, et al. Arthritis Rheum. 2000;43:978–987.

SF-36 in Osteoarthritis RCTs
• Truth or Validity
  • Domains, especially Bodily Pain, discriminated differences / changes in symptoms over time
  • Closer correlation with patient assessed outcomes
• Feasibility or Reliability
  Ware et al: A+R 1996; 39:S90
  • Ceiling effects minimal; floor effects for RP and RE domains
  Ware et al: A+R 1996; 39:S90
  • Able to detect effects of arthritis in community sample
• Discrimination or Responsiveness
  Hill et al: JRheum 1999; 26:2029-35
  • In longitudinal tests, BP domain and PCS summary score most responsive, even within 2-6 weeks
  Bellamy et al, A+R 2000; S221
  • Valid and responsive measure of TKR, esp long term
  Brooks et al, A+R 1997; 40:S110
  • Short term treatment → significant improvement in MCS
  Ehrich et al: JRheum 2000;27: 2635-2641

Mean Improvement in SF-36: All Rofecoxib v Normative Data, US Population
[Figure: improvement by domain (PF, RP, PAIN, GHP, VITAL, SOC, RE, MH) for Rofecoxib, compared with US Norms: difference between ages 45-54 and 55-64 in the US population. Ware et al 1993.]

Change in SF-36 Scores at Week 12: OA of knee, Pivotal Trial with Celecoxib
[Figure: change by domain (PF, RP, BP, GH, VT, SF, RE, MH) for Placebo, Cel 50, Cel 100, Cel 200 and Nap 500; * p < .05 v placebo.]

Use of WOMAC and SF-36 in RCTs of OA
Conclusions Based on the COX-2 Experience
• WOMAC Questionnaire reflects clinical improvement consistent with other patient assessed measures
  • Proved valid, reliable and sensitive to change
  • Pain and stiffness subscales reflect symptoms
  • Physical function subscale dominates composite score
• WOMAC Composite score is a disease specific measure of HRQOL
  • Correlates closely with improvements reported by generic SF-36
• Based on MCID calculations, Likert and VAS versions similarly sensitive to change

OMERACT 4 SLE Module 1998: Goal
• To develop consensus on required outcome domains to be assessed in clinical trials in SLE
• Paucity of data from Randomized Controlled Trials [RCTs]; most evidence derived from Longitudinal Observational Studies [LOS]
Strand et al: J Rheum 1999; 26: 490-497
Smolen et al: J Rheum 1999; 26: 504-507

Disease Activity Indices: BILAG, ECLAM, LAI, SLAM, SLEDAI
• Good evidence for validity, discrimination, feasibility in published cohort [LOS] studies
• Changes in one index correlated with others
• Recommendation to use index of choice
  – Computer generation of all 5 indices facilitates:
    • Clinical research efforts: SLICC ESCICIT EURO-LUPUS
    • Exchange of information: interested parties biotech / pharma
• Some limitations when used as primary outcome measures in RCTs; ongoing efforts to improve

SF-36: Sensitive to Change in LOS in SLE
• Baseline domain scores low in SLE
  – v. age/gender matched norms for Canada, Norway, UK, US
  – v. serious medical problems (IDDM, CAD)
  Gladman et al: J Rheum 1995; 23:1953-5
• In cohort studies reflects changes in disease activity measures
  – disease activity in PF, BP, GHP
  – disease activity SF-36 domain scores, esp. PF
  Gordon et al: A+R 1997; 40:S112
  Gladman et al: Clin Exp Rheum 1995; 14:305-8
  Stoll et al: J Rheum 1997; 24:309-13 and 1608-14
  Fortin et al: Lupus 1998; 7:101-7
• Decrements in multiple domains correlate with increased disease activity and damage
  Abu-Shakra et al J Rheum 1999; 26:306-9
  Thumboo et al J Rheum 1999; 26:97-102
  Wang et al J Rheum 2001; 28:525-32
  – Immunosuppressive use
  – ESRD
  Rood et al J Rheum 2000; 27:2057-9
  Vu, Escalante J Rheum 1999; 26:2595-2601

Domains Recommended by OMERACT 4
• Disease activity:
  Disease Activity Scores: SLEDAI, BILAG, ECLAM, SELENA SLEDAI, SLAM-R
  Definitions of Active Nephritis by U/A, 24 hour CCr, proteinuria, ‘Renal flare’
  ‘Major SLE Flare’
• Damage:
  ACR/SLICC Damage Index
  End Stage Renal Disease [ESRD]
  Doubling of Serum Creatinine
  Chronicity Index on Biopsy
  Bone loss due to disease activity and/or corticosteroids
• HRQOL: SF-36
[Should also include: Adverse events; Economic costs including health utilities]
As reviewed in Schiffenbauer et al: EBM Treatment of SLE; BJR: in press

Ankylosing Spondylitis: ASAS
• A successful and relevant example
• To be discussed by Robert Landewe, Juergen Braun

Systemic Sclerosis Workshop: OMERACT 6
Absence of data:
  Few ‘failed’ RCTs
  Limited information from LOS
Assessment by organ system involvement
• Renal
• Cardio-pulmonary
• Muscle
• HRQOL
• Skin
• GI

OMERACT 7
May 12-16, 2004
Asilomar, California
• Module:
  RA: Definition of Low Disease Activity
• Module Updates:
  Imaging in Ankylosing Spondylitis [ASAS]
  Working Group on Safety
• Workshops:
  Outcome Measures in Psoriatic Arthritis
  Outcome Measures in Fibromyalgia
  Outcome Measures in Gout
  The Patient Perspective in Outcome Measures