Assessment Methodology: Lessons from OMERACT Meetings
Vibeke Strand, MD
Biopharmaceutical Consultant
Adjunct Clinical Professor,
Division of Immunology, Stanford University
OMERACT: Outcome Measures in Rheumatology Clinical Trials
• I (1992): Rheumatoid Arthritis Clinical Trials
• II (1994): Adverse Events → Establishment of Registries; Health Related Quality of Life; Economic Evaluations
• III (1996): Osteoarthritis; Osteoporosis; Psychosocial Measures
• IV (1998): Longitudinal Observational Studies; RA Response Criteria / Imaging; Ankylosing Spondylitis → ASAS; Systemic Lupus Erythematosus
• V (2000): MCID; Economics: Cost Effectiveness; Imaging: Radiography and MRI
• VI (2002): Economic Evaluations; Imaging
What is OMERACT?
• Data-driven process to define the outcome measures to be used in randomized controlled trials (RCTs) and longitudinal observational studies (LOS) for each clinical indication
• Domains derived from the “Ds”:
Discomfort
Disability
Dollar cost
Death
• Literature reviews and data available from LOS and RCTs, used to examine:
  • Validity of currently defined instruments to assess outcome
  • “Data mining” to better understand clinical response
  • Correlation of patient reported responses with other outcome measures
  • Definition of the “minimum clinically important difference” (MCID)
What is OMERACT?
• Presentation of evidence and development of consensus at
each conference:
Representatives from: Academia, Clinical Investigators,
Regulatory Agencies, Sponsors,
Clinical Rheumatologists
• Goal: To Develop Recommendations for:
• “Core Set”: the minimum number of domains/outcome measures to be assessed in RCTs and LOS
• Research agenda identifying needs on which to focus future work
• Previous OMERACT Recommendations have been ratified by
WHO / ILAR in RA, OA, SLE, including HRQOL and
Economic evaluations
The OMERACT ‘Umbrella’
• Rheumatoid arthritis: EULAR, ACR
• JRA: PRCSG
• SLE: SLICC, EULAR
• Osteoarthritis: OARSI
• Ankylosing spondylitis: ASAS
• Pain: IMMPACT
The OMERACT Filter
• TRUTH:
Face, content, construct and criterion validity
Is the measure truthful?
Does it measure what is intended?
• DISCRIMINATION:
Reliability and sensitivity to change
Does the measure discriminate between situations
[states] of interest?
• FEASIBILITY:
Can the outcome easily be measured given constraints
of time, money and interpretability?
Boers et al: J Rheum 1998; 25:198-9
Rheumatoid Arthritis: OMERACT I, 1992
• RCTs available, but data limited
• Only a few included a measure of physical function
• General ‘belief’ that none had demonstrated convincing efficacy
• “Paper patients” derived from actual RCT data
  • → [healthy] arguments regarding the changes reported
• Clear disagreement about the importance of MD global assessments
• Participants ranked patient reported physical function and swollen joint count (SJC) highest when assessing efficacy
• Facilitated recognition that the ‘perception’ of benefit is variable
ACR Response Criteria
• Defined and ratified after OMERACT I by a data-driven nominal group process
• Based on the Paulus criteria and on statistical analyses of CSSRD and MTX RCTs identifying the measures that best differentiated active therapy from placebo
• Require ≥20% improvement in 5 of 7 measures:
  • Tender joint count and swollen joint count
  • and 3 of the following 5:
    MD global
    Physical function: HAQ
    Pain by VAS
    Patient global
    ESR and/or CRP
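The ACR definition is a counting rule, so it can be checked mechanically. The Python sketch below assumes that every core-set measure is recorded so that a higher value means worse disease, that percentage improvement is calculated relative to baseline, and that a single ESR or CRP value is supplied. The class and function names are hypothetical, and the `level` parameter (default 0.20) is added only to make the 20% threshold explicit.

```python
from dataclasses import dataclass, fields


@dataclass
class CoreSet:
    """ACR core-set measures; for every field a higher value means worse disease."""
    tender_joint_count: float
    swollen_joint_count: float
    md_global: float
    haq_di: float
    pain_vas: float
    patient_global: float
    acute_phase_reactant: float  # ESR or CRP; this sketch takes a single value


def fractional_improvement(baseline: float, follow_up: float) -> float:
    """Improvement relative to baseline; a zero baseline counts as 'not improved' (assumption)."""
    if baseline <= 0:
        return 0.0
    return (baseline - follow_up) / baseline


def acr_response(baseline: CoreSet, follow_up: CoreSet, level: float = 0.20) -> bool:
    """True if both joint counts and at least 3 of the other 5 measures improve by >= level."""
    def improved(name: str) -> bool:
        return fractional_improvement(getattr(baseline, name), getattr(follow_up, name)) >= level

    joint_counts = ("tender_joint_count", "swollen_joint_count")
    others = [f.name for f in fields(CoreSet) if f.name not in joint_counts]
    return all(improved(n) for n in joint_counts) and sum(improved(n) for n in others) >= 3


# Hypothetical patient: both joint counts and 3 of the 5 other measures improve by >= 20%.
before = CoreSet(20, 15, 60, 1.5, 65, 70, 40)
after = CoreSet(12, 9, 40, 1.3, 45, 50, 38)
print(acr_response(before, after))  # True
```

In this example HAQ (about 13%) and the acute phase reactant (5%) miss the threshold, but MD global, pain and patient global reach it, which is enough alongside both joint counts.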
EULAR Response Definition
                          Decrease in DAS28
DAS28 score attained      >1.2        >0.6 to ≤1.2    ≤0.6
≤3.2                      Good        Moderate        None
>3.2 and ≤5.1             Moderate    Moderate        None
>5.1                      Moderate    None            None
Derived by discriminant function analysis of patients with active vs. inactive RA
Disease activity state determined by treatment changes
Van Gestel et al. Arth Rheum 1996; 26:705-11
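Because the EULAR definition combines the size of the DAS28 decrease with the DAS28 value attained, it reduces to a small decision rule. This sketch takes DAS28 values as inputs (it does not compute DAS28 from joint counts, ESR and patient global). The function name is hypothetical, and rounding the decrease to two decimals is an assumption added to avoid floating-point artifacts at the 0.6 and 1.2 cut-offs.

```python
def eular_response(das28_baseline: float, das28_follow_up: float) -> str:
    """Return 'good', 'moderate' or 'none' following the EULAR response table."""
    # Rounding (an assumption of this sketch) stops 4.4 - 3.2 from becoming 1.2000000000000002.
    decrease = round(das28_baseline - das28_follow_up, 2)
    if decrease > 1.2:
        return "good" if das28_follow_up <= 3.2 else "moderate"
    if decrease > 0.6:
        return "moderate" if das28_follow_up <= 5.1 else "none"
    return "none"


print(eular_response(6.0, 3.0))  # good
print(eular_response(4.4, 3.2))  # moderate (decrease of exactly 1.2)
```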
As Demonstrated in RA, Responder Analyses
Have Face and Content Validity
• Allow assessment of multiple domains
• Facilitate comparison of efficacy across:
• Products
• Heterogeneous populations, and
• Disease indications
• May lead to tiered approach to label indications
• Precedent: ACR Responder Index in RA
DAS28 confirms both active disease at baseline and ‘clinical responses’
Additional data from x-ray and HRQOL measures
Rheumatoid Arthritis: Later Efforts
• Demonstrated that ‘generic’ measures of HRQOL are sensitive to change in RA RCTs
• Identified ‘MCID’ for HAQ and SF-36, facilitating:
  • Comparisons across products and disease populations
  • Economic evaluations
• Helped to show the impact of ‘Rheumatic Diseases’ to WHO
  • In this Bone and Joint Decade
  • Identified the importance of rheumatic diseases relative to CV, DM, HTN, OP, etc.
  • [Hopefully] → allocation of more resources to identify and treat rheumatic diseases
Minimum Clinically Important Differences
[MCID]
• Degree of improvement perceptible to patients = clinically important/meaningful
• Defined by patient query and Delphi technique
  OMERACT: 33-36% improvement; 18% greater than placebo
• Confirmed by statistical correlations with patient global assessments in RCTs in RA and OA
• Determining the proportion of patients with clinically important improvement gives a more interpretable result with direct clinical implications
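Reporting the proportion of patients reaching the MCID needs only each patient's change from baseline, the MCID, and the direction of scoring. A minimal sketch, assuming that a change exactly equal to the MCID counts as a clinically important improvement; the function name and the example changes are hypothetical, and 0.22 is the HAQ DI MCID cited in this deck.

```python
from collections.abc import Sequence


def proportion_achieving_mcid(changes: Sequence[float], mcid: float,
                              lower_is_better: bool = True) -> float:
    """Fraction of patients whose change from baseline is an improvement of at least one MCID.

    For HAQ DI (lower = better) an improvement is a negative change; for SF-36
    (higher = better) it is a positive change.
    """
    if not changes:
        raise ValueError("no patients")
    improvements = [-c if lower_is_better else c for c in changes]
    return sum(1 for imp in improvements if imp >= mcid) / len(changes)


# Hypothetical HAQ DI changes for five patients, MCID = 0.22
print(proportion_achieving_mcid([-0.5, -0.25, -0.1, 0.0, -0.3], 0.22))  # 0.6
```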
Minimum Clinically Important Differences [MCID]
Score            Range           Direction of scoring    MCID            Literature
HAQ DI           0-3             - (lower is better)     0.22            1-4
SF-36 domains    0-100           + (higher is better)    5-10 points     2, 5-7
PCS/MCS          mean 50 ± 10    + (higher is better)    2.5-5 points
1. Guzman et al. Arth Rheum. 1996; 39:5208
2. Kosinski et al. Arth Rheum. 2000; 43:1478-87
3. Redelmeier et al. Arch Intern Med. 1993; 153:1337-42
4. Wells et al. J Rheumatol. 1993; 20:557-60
5. Kosinski et al. Arth Rheum. 2000; 43:S140
6. Samsa et al. Pharmacoeconomics. 1999; 15:141-155
7. Thumboo et al. J Rheumatol. 1999; 26:97-102.
Health Assessment Questionnaire (HAQ)
• Widely accepted, validated, rheumatology-specific instrument to assess physical function in RA
• Gold standard: OMERACT/FDA Guidance
• 20 questions covering 8 types of activities:
  Dressing and grooming; arising; eating; walking; hygiene; reaching; gripping; activities of daily living
• HAQ Disability Index (HAQ DI)
  • Scores the worst item within each of the eight categories
  • Adjusted for use of aids and devices
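The two HAQ DI rules above (worst item per category; adjustment for aids and devices) can be sketched as follows. The details beyond those two rules are assumptions drawn from the commonly used Stanford HAQ scoring conventions rather than from this slide: items scored 0 to 3, a category score raised to at least 2 when aids, devices or help are used, at least 6 of 8 categories required, and the index taken as the mean of the category scores. The category keys and function name are hypothetical.

```python
from collections.abc import Collection, Mapping, Sequence

# Eight HAQ categories; these short keys are labels chosen for this sketch.
CATEGORIES = ("dressing", "arising", "eating", "walking",
              "hygiene", "reach", "grip", "activities")


def haq_di(items: Mapping[str, Sequence[int]], aids: Collection[str] = ()) -> float:
    """HAQ Disability Index from item scores 0 (no difficulty) to 3 (unable to do)."""
    category_scores = []
    for category in CATEGORIES:
        answered = items.get(category, ())
        if not answered:
            continue  # unanswered category is left out of the mean
        if any(not 0 <= s <= 3 for s in answered):
            raise ValueError(f"item score out of range in {category!r}")
        score = max(answered)  # worst (highest) item in the category
        if category in aids:
            score = max(score, 2)  # assumed aids/devices adjustment
        category_scores.append(score)
    if len(category_scores) < 6:
        raise ValueError("at least 6 of 8 categories must be scored")
    return sum(category_scores) / len(category_scores)


# Hypothetical responses; aids are used for walking and reaching.
example = {"dressing": [1, 2], "arising": [1], "eating": [0, 1, 1], "walking": [2, 1],
           "hygiene": [1, 0, 1], "reach": [0, 0], "grip": [1, 1, 0], "activities": [2, 1, 0]}
print(haq_di(example, aids={"walking", "reach"}))  # 1.5
```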
Mean Improvement in HAQ Disability Index: Year-2 Cohorts at 24 Months
[Figure: mean change from baseline in HAQ DI (negative = improvement) for LEF, MTX and SSZ in studies US301, MN301/303/305 and MN302/304; 69% to 86% of patients in each treatment group achieved the MCID; *LEF vs MTX, p=0.01]
ATTRACT: HAQ Disability Index, Mean Improvement through Week 102
[Figure: mean improvement in HAQ DI for MTX + placebo and for infliximab 3 mg/kg q8w, 3 mg/kg q4w, 10 mg/kg q8w, 10 mg/kg q4w and all infliximab; p-values vs. MTX + placebo < 0.001]
ERA: Mean Change in HAQ DI at Month 12
[Figure: mean change from baseline in HAQ DI for MTX and ETN (baseline HAQ DI 1.6 in both arms), shown against the MCID; group means of -0.70 and -0.80]
Kosinski et al. AJMC. 2002;8:231-240
Mean Changes in HAQ DI at Weeks 24 and 52: Anakinra+MTX
[Figure: mean change from baseline in HAQ DI at weeks 24 and 52 for placebo + MTX (baseline 1.38) and anakinra + MTX (baseline 1.43), shown against the MCID]
Fleishman et al. Arth Rheum. 2002;46:S574.
Mean Changes in HAQ DI at Weeks 24 and 52: DE019, Adalimumab+MTX
[Figure: mean change from baseline in HAQ DI at weeks 24 and 52 for placebo (baseline 1.48), adalimumab 20 mg weekly (baseline 1.45) and adalimumab 40 mg every other week (baseline 1.44), shown against the MCID]
Keystone E. Arthritis & Rheum 2002; 46(9) suppl.
Mean Changes in HAQ DI from Weeks 30 to 54: ASPIRE RCT
Baseline HAQ DI 1.5 in all arms; mean change from baseline over weeks 30-54 (% achieving MCID):
• MTX: -0.75 (65%)
• MTX + INF 3 mg/kg: -0.78 (76%)
• MTX + INF 6 mg/kg: -0.79 (76%)
Smolen et al. Ann Rheum Ds 2003;62:S64
Mean Changes in HAQ DI at Week 52: TEMPO RCT
[Figure: mean change from baseline in HAQ DI for MTX, ETN and MTX+ETN (baseline HAQ DI 1.7, 1.7 and 1.8), shown against the MCID; group means ranged from -0.61 to -0.97]
SF-36: Short Form 36 Health Survey
• Validated, widely used generic measure of HRQOL
• 8 domains: Physical Function, Role Physical, Bodily Pain, General Health, Vitality, Social Function, Role Emotional, Mental Health
• Each domain scored 0-100; compared with age- and sex-adjusted norms
• 2 summary scores:
  • Physical Component Summary: PCS
    – Measures how decrements in physical function affect day-to-day activities
    – Impact of physical impairment/disability on HRQOL
  • Mental Component Summary: MCS
    – Impact of affect (mood) and symptoms of pain on HRQOL
  • Norm-based scoring (mean: 50, SD: 10)
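Norm-based scoring rescales a score so that the reference population has mean 50 and SD 10, which is why a PCS near 30 can be read as 2 SDs below the US norm. A minimal sketch of that linear transformation for a single score; the function names are hypothetical, the norm mean and SD in the usage lines are placeholders rather than published US norms, and the weights that combine the eight domain z-scores into PCS/MCS are not shown.

```python
def norm_based_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a 0-100 score to a norm-based score (reference population mean 50, SD 10)."""
    z = (raw - norm_mean) / norm_sd
    return 50.0 + 10.0 * z


def sds_from_norm(norm_based: float) -> float:
    """SDs above (+) or below (-) the reference mean for a norm-based score."""
    return (norm_based - 50.0) / 10.0


# Placeholder norms (mean 84, SD 23) for illustration only.
print(norm_based_score(38.0, 84.0, 23.0))  # 30.0
print(round(sds_from_norm(30.9), 2))  # -1.91
```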
SF-36 Two-Component Model
• Physical component: Physical Function, Role Physical, Bodily Pain, General Health
• Mental component: Vitality, Social Function, Role Emotional, Mental Health
US301: Baseline SF-36 Scores, US Norms vs US301 Population
[Figure: baseline mean scores (0-100) in the eight SF-36 domains for the US301 study population compared with age/sex-adjusted US norms]
US301: Mean Improvement in SF-36, Year-2 Cohorts, Leflunomide and Methotrexate
[Figure: mean scores (0-100, higher = better) in the eight SF-36 domains at baseline and 24 months for LEF (n = 93) and MTX (n = 89), compared with age/sex-adjusted US norms]
Mean Changes in SF-36 Scores: DE019, Adalimumab+MTX
[Figure: mean change from baseline in SF-36 domain scores for placebo vs. adalimumab 40 mg every other week (QOW), shown against the MCID]
Keystone E. Arthritis & Rheum 2002; 46 suppl.
Leflunomide and Methotrexate: Mean Changes in SF-36 PCS, Year-2 Cohort (US301)
[Figure: mean PCS at baseline, 12 and 24 months for LEF (n = 93) and MTX (n = 97); baseline means of 30.9 and 30.2 were about 2 SDs below the US norm of 50, and follow-up means ranged from 38.6 to 42.7]
Etanercept and Methotrexate: Mean Changes in SF-36 PCS at 12 Months (ERA)
[Figure: mean PCS at baseline and 12 months for ETN 25 mg (n = 193) and MTX (n = 199); baseline means of 28.0 and 29.2 (about 2 SDs below the US norm of 50) rose to 38.7 and 38.8]
Kosinski et al. AJMC. 2002;8:231-240.
Infliximab: Median Improvement in SF-36 PCS at Month 24 (ATTRACT)
[Figure: median (IQR) improvement in PCS (baseline 23.9-30.8) for MTX + placebo (n = 88) and infliximab 3 mg/kg q8wks (n = 86; p = 0.011 vs. placebo), 3 mg/kg q4wks (n = 86), 10 mg/kg q8wks (n = 87) and 10 mg/kg q4wks (n = 81) (each p < 0.001 vs. placebo)]
Kavanaugh et al. Arth Rheum. 2000;43:S147.
Anakinra+MTX: Mean Improvement in SF-36 PCS at Month 12
Baseline PCS: 29.9 (placebo), 28.8 (active)
Fleishman et al. Arth Rheum. 2002;46:S574.
Correlation Between HAQ and SF-36
Reference        Study                  Scale   Correlation
Ruta (1)         -                      PCS     -0.77
Talamo (2)       -                      PF      -0.72
Kavanaugh (3)    Infliximab/ATTRACT     PCS     -0.51
                                        PF      -0.54
Kosinski (4)     Etanercept/ERA         PCS     -0.60
                                        PF      -0.61
Lubeck (5)       Etanercept/RAPOLO      PCS     -0.79
                                        PF      -0.82
Strand (6)       Leflunomide/US301      PCS     -0.60
                                        PF      -0.74
1. Ruta et al. Br J Rheum. 1998;37:425-436.
2. Talamo et al. Br J Rheum. 1997;36:463-469.
3. Kavanaugh et al. A&R. 2000;43:S147.
4. Kosinski et al. Medical Care. 1999;37:MS23-39.
5. Lubeck et al. Value in Health. 2001;4:MS2,163.
6. Strand et al. A&R. 2001;44:S187.
MCID Values Are Consistent in RCTs in RA
• Improvements in HAQ DI and SF-36 in RA with newly approved therapies are statistically significant and, more importantly, CLINICALLY MEANINGFUL
• MCID values are consistent across agents and patient populations
  • Disease specific [‘relevant’] measure: HAQ
  • Generic measure: SF-36
• Improvements in the disease-specific measure are highly correlated with those in the generic measure
MCID Workshop: Identifying Candidate Measures
to Define ‘Low Disease Activity State’
• Pain
• Function
• Inflammation
• Health Related Quality of Life
• Structural damage
• Toxicity
• Co-morbidity
• Fatigue
Osteoarthritis
• OMERACT III: 1996
• Candidate instruments to assess:
• Pain
• Stiffness
• Physical Function
• Limited data from RCTs;
treatments offering only symptomatic benefit
• Identification of a ‘Core Set’ of 4 Domains as a
foundation for future work
• Research Agenda: Identification of ‘Disease Control’,
‘Biologic Markers’ of Response
Western Ontario and McMaster Universities
(WOMAC) Osteoarthritis Index
• Self-administered questionnaire
• Developed by querying patients with hip or knee OA
• Reflects the physical activities most affected by symptoms and disease manifestations
• Composite score based on 24 questions; subscores:
• Pain (5 questions)
• Joint stiffness (2 questions)
• Physical function (17 questions)
• Scored on 0-4 Likert or 0-10 cm VAS scales
• Improvement = negative change
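Likert scoring of the WOMAC reduces to summing items within each subscale. This sketch assumes the 5/2/17 item split and 0-4 item scores described above and returns the subscale sums plus the composite; the subscale keys and function name are hypothetical, and the VAS version is not covered.

```python
from collections.abc import Mapping, Sequence

# Item counts per subscale as listed above; Likert items scored 0 (none) to 4 (extreme).
SUBSCALE_ITEMS = {"pain": 5, "stiffness": 2, "physical_function": 17}


def womac_likert(responses: Mapping[str, Sequence[int]]) -> dict[str, int]:
    """Sum Likert items into subscale scores plus a composite (higher = worse)."""
    scores: dict[str, int] = {}
    for subscale, n_items in SUBSCALE_ITEMS.items():
        items = responses[subscale]
        if len(items) != n_items or any(not 0 <= x <= 4 for x in items):
            raise ValueError(f"{subscale}: expected {n_items} items scored 0-4")
        scores[subscale] = sum(items)
    scores["composite"] = sum(scores.values())  # only the three subscales at this point
    return scores


# Hypothetical patient at baseline and follow-up
baseline = womac_likert({"pain": [2] * 5, "stiffness": [2, 3], "physical_function": [2] * 17})
follow_up = womac_likert({"pain": [1] * 5, "stiffness": [1, 2], "physical_function": [1] * 17})
print(follow_up["composite"] - baseline["composite"])  # -24 (negative change = improvement)
```

Because 17 of the 24 items belong to the physical function subscale, that subscale dominates the composite.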
[Figure: OMERACT III OA outcome domains placed by % voting for inclusion]
Placement (% voting for inclusion) → Consequence
• Inner core (≥90%) → Core set: Pain; Physical function; Patient global; Imaging (≥1 yr)
• Middle core (≥30% to <90%) → Strongly recommended: HRQOL/Utility
• Outer core (0% to <30%) → Optional
Other domains shown: Stiffness; MD global; Inflammation; Biologic markers; Other, e.g., performance based measures, flares, time to surgery, analgesic count
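The placement rule in the legend maps the percentage of participants voting for a domain to its consequence. A minimal sketch of that mapping; the function name is hypothetical, and the boundaries follow the legend (≥90%, ≥30% to <90%, <30%).

```python
def core_set_placement(percent_voting: float) -> tuple[str, str]:
    """Map % of participants voting for a domain to (placement, consequence)."""
    if not 0 <= percent_voting <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_voting >= 90:
        return ("inner core", "core set")
    if percent_voting >= 30:
        return ("middle core", "strongly recommended")
    return ("outer core", "optional")


print(core_set_placement(90))  # ('inner core', 'core set')
print(core_set_placement(36))  # ('middle core', 'strongly recommended')
```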
Outcome Measures in OA: OARSI Guidelines
OMERACT Core Set and ‘Strongly Recommended’
• Pain: WOMAC pain / stiffness subscales
  – Differentiating pain from stiffness
• Physical function: WOMAC physical function subscale
• Patient global assessment: how to phrase the question?
  – Signal joint
  – “In all the ways arthritis affects you, how are you doing today?”
  – Transition question
• HRQOL/Utilities:
  – WOMAC composite score
  – SF-36
  – EQ-5D / utilities
• MD global assessment
WOMAC Scores in OA RCTs: Identifying MCID
• MCID in WOMAC composite score, Likert scale:
  • Anchored to patient global assessment
  • 12 wk pivotal OA RCTs with celecoxib: 10.1 [range 0-89]
  • Pain, stiffness, physical function: 2.1 [0-20], 1.2 [0-8], 6.5 [0-61]
Zhao et al. Pharmacother 1999;19:1269-78
• MCID in WOMAC VAS:
  • Anchored to patient response to treatment [0-4 Likert scale]
  • 6 wk RCTs in OA of hip and knee; rofecoxib vs. ibuprofen vs. placebo:
  • Pain, stiffness, physical function: 9.7, 10 and 9.3 mm VAS
  • 11 mm VAS for patient global assessment
Ehrich et al: JRheum 2000;27: 2635-2641
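One common anchor-based way to estimate an MCID is to take the mean change among patients who rate themselves as minimally improved on an external anchor, such as a patient global assessment. The sketch below illustrates that general approach only; it is not necessarily the exact method used by Zhao et al. or Ehrich et al., and the function name, category label and data are hypothetical.

```python
from statistics import mean


def anchor_based_mcid(changes, anchor_ratings, minimal_category):
    """Mean improvement among patients whose anchor rating equals `minimal_category`.

    `changes` are follow-up minus baseline WOMAC scores, so improvement is negative;
    the sign is flipped so the MCID is reported as a positive number.
    """
    if len(changes) != len(anchor_ratings):
        raise ValueError("changes and anchor ratings must be paired")
    selected = [c for c, a in zip(changes, anchor_ratings) if a == minimal_category]
    if not selected:
        raise ValueError(f"no patients rated {minimal_category!r}")
    return -mean(selected)


# Hypothetical WOMAC changes and anchor ratings for five patients
changes = [-12, -9, -3, 0, -20]
ratings = ["slightly better", "slightly better", "same", "same", "much better"]
print(anchor_based_mcid(changes, ratings, "slightly better"))  # 10.5
```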
Improvement in WOMAC Composite Scores at Week 12: Pivotal OA RCTs, Celecoxib
[Figure: mean improvement in WOMAC composite score at week 12 with placebo, Cel 50, Cel 100, Cel 200 and Nap 500 in CT20 (knee), CT21 (knee) and CT54 (hip); MCID = 10.1 (SE = 0.4); * P < .05 vs. placebo]
Zhao et al Pharmacother 1999;19:1269-78
WOMAC Physical Function Subscale, Knee or Hip OA at 12 Months: Pivotal RCT, Rofecoxib
[Figure: mean change (mm) in WOMAC physical function subscale from randomization (R) through weeks 2, 4, 8, 12, 26, 39 and 52 with rofecoxib 12.5 mg, rofecoxib 25 mg and diclofenac 150 mg; mean baseline = 69.6 mm; MCID = 9.3; P < 0.05 for all groups, treatment response compared with baseline]
Cannon GW, et al. Arthritis Rheum. 2000;43:978–987.
SF-36 in Osteoarthritis RCTs
• Truth or Validity
  • Domains, especially Bodily Pain, discriminated differences/changes in symptoms over time
  • Closer correlation with patient assessed outcomes
  Ware et al: A+R 1996; 39:S90
• Feasibility or Reliability
  • Ceiling effects minimal; floor effects for RP and RE domains
  Ware et al: A+R 1996; 39:S90
  • Able to detect effects of arthritis in a community sample
  Hill et al: JRheum 1999; 26:2029-35
• Discrimination or Responsiveness
  • In longitudinal tests, BP domain and PCS summary score most responsive, even within 2-6 weeks
  Bellamy et al, A+R 2000; S221
  • Valid and responsive measure in total knee replacement (TKR), especially long term
  Brooks et al, A+R 1997; 40:S110
  • Short-term treatment → significant improvement in MCS
  Ehrich et al: JRheum 2000;27: 2635-2641
Mean Improvement in SF-36: All Rofecoxib v Normative Data, US Population
[Figure: mean improvement in SF-36 domain scores (PF, RP, PAIN, GHP, VITAL, SOC, RE, MH) with rofecoxib compared with the difference in US population norms between ages 45-54 and 55-64 (Ware et al 1993)]
Change in SF-36 Scores at Week 12: OA of Knee, Pivotal Trial with Celecoxib
[Figure: change in SF-36 domain scores (PF, RP, BP, GH, VT, SF, RE, MH) with placebo, Cel 50, Cel 100, Cel 200 and Nap 500; * p < .05 vs. placebo]
Use of WOMAC and SF-36 in RCTs of OA
Conclusions Based on the COX-2 Experience
• WOMAC Questionnaire reflects clinical improvement
consistent with other patient assessed measures
• Proved valid, reliable and sensitive to change
• Pain and stiffness subscales reflect symptoms
• Physical function subscale dominates composite score
• WOMAC Composite score is a disease specific measure
of HRQOL
• Correlates closely with improvements reported by the generic SF-36
• Based on MCID calculations, Likert and VAS versions
similarly sensitive to change
OMERACT 4 SLE Module 1998: Goal
• To develop consensus on required outcome domains
to be assessed in clinical trials in SLE
• Paucity of data from Randomized Controlled Trials [RCTs];
Most evidence derived from Longitudinal Observational
Studies [LOS]
Strand et al: J Rheum 1999; 26: 490-497
Smolen et al: J Rheum 1999; 26: 504-507
Disease Activity Indices
BILAG, ECLAM, LAI, SLAM, SLEDAI
• Good evidence for validity, discrimination, feasibility
in published cohort [LOS] studies
• Changes in one index correlated with others
• Recommendation to use index of choice
  – Computer generation of all 5 indices facilitates:
    • Clinical research efforts: SLICC, ESCICIT, EURO-LUPUS
    • Exchange of information: interested parties, biotech / pharma
• Some limitations when used as primary outcome
measures in RCTs; ongoing efforts to improve
SF-36: Sensitive to Change in LOS in SLE
• Baseline domain scores low in SLE
– v. age/gender matched norms for Canada, Norway, UK, US
– v. serious medical problems (IDDM, CAD)
Gladman et al: J Rheum 1995; 23:1953-5
• In cohort studies reflects changes in disease activity measures
–  disease activity  in PF, BP, GHP
– disease activity  SF-36 domain scores, esp. PF
Gordon et al: A+R 1997; 40:S112 Gladman et al: Clin Exp Rheum 1995; 14:305-8
Stoll et al: J Rheum 1997; 24:309-13 and 1608-14 Fortin et al: Lupus 1998; 7:101-7
• Decrements in multiple domains correlate with increased disease
activity and damage
Abu-Shakra et al J Rheum 1999; 26:306-9
Thumboo et al J Rheum 1999; 26:97-102
Wang et al J Rheum 2001; 28:525-32
– Immunosuppressive use
– ESRD
Rood et al J Rheum 2000; 27:2057-9
Vu, Escalante J Rheum 1999; 26:2595-2601
Domains Recommended by OMERACT 4
• Disease activity:
  – Disease activity scores: SLEDAI, BILAG, ECLAM, SELENA SLEDAI, SLAM-R
  – Definitions of active nephritis by U/A, 24 hour CCr, proteinuria; “renal flare”
  – “Major SLE flare”
• Damage:
  – ACR/SLICC Damage Index
  – End stage renal disease [ESRD]
  – Doubling of serum creatinine
  – Chronicity index on biopsy
  – Bone loss due to disease activity and/or corticosteroids
• HRQOL: SF-36
[Should also include: adverse events; economic costs including health utilities]
As reviewed in Schiffenbauer et al: EBM Treatment of SLE; BJR: in press
Ankylosing Spondylitis: ASAS
• A successful and relevant example
• To be discussed by
Robert Landewe
Juergen Braun
Systemic Sclerosis Workshop: OMERACT 6
Absence of data: Few ‘failed’ RCTs
Limited information from LOS
Assessment by organ system involvement
• Renal
• Cardio-pulmonary
• Muscle
• HRQOL
• Skin
• GI
OMERACT 7
May 12-16, 2004
Asilomar, California
• Module: RA: Definition of Low Disease Activity
• Module Updates:
Imaging in Ankylosing Spondylitis [ASAS]
Working Group on Safety
• Workshops:
Outcome Measures in Psoriatic Arthritis
Outcome Measures in Fibromyalgia
Outcome Measures in Gout
The Patient Perspective in Outcome Measures