PROMIS DEVELOPMENT METHODS, ANALYSES AND APPLICATIONS

Dennis A. Revicki, Ph.D.
Center for Health Outcomes Research, United BioSource Corporation, Bethesda, Maryland, USA

Presented at the Patient-Reported Outcomes Measurement Information System (PROMIS): A Resource for Clinical & Health Services Research, Academy Health Annual Research Meeting, Orlando, Florida, June 3, 2007

OVERVIEW
• Development of PROMIS item banks
• Psychometric analysis of item bank data
• Clinical and health services research applications

GOAL FOR PROMIS
• Improve assessment of self-reported symptoms and domains of health-related quality of life for application across a wide range of chronic diseases
• Develop and test a large bank of items for measuring patient-reported outcomes (PROs)
• Develop computer-adaptive testing (CAT) for efficient assessment of PROs
• Create a publicly available, flexible, and sustainable system that gives researchers access to item banks and CAT tools

PROMIS DOMAIN HIERARCHY
[Diagram: Self-reported Health divided into Physical Health, Mental Health, and Social Health]
Physical Health
• Function/Disability
  – Upper Extremities: grip, buttons, etc. (dexterity)
  – Lower Extremities: walking, arising, etc. (mobility)
  – Central: neck and back (twisting, bending, etc.)
  – Activities: IADL (e.g., errands)
• Symptoms
  – Pain
  – Fatigue
  – Sleep/Wake Function**
  – Sexual Function
  – Other
Mental Health
• Emotional Distress
  – Anxiety
  – Depression
  – Anger/Aggression
  – Substance Abuse
  – Negative Impacts of Illness
• Cognitive Function
• Positive Psychological Functioning
  – Positive Impacts of Illness
  – Meaning and Coherence (spirituality)
  – Mastery and Control (self-efficacy)
  – Subjective Well-Being (positive affect)
Social Health
• Role Participation
  – Performance
  – Satisfaction
• Social Support
Other labels in the diagram: Satisfaction (repeated under several domains); Self Concept, Stress Response, Spirituality/Meaning, Social Impact

[Flow diagram: item bank development]
Items from Instruments A, B, and C, plus New Items
→ Item Pool
→ Content Expert Review, Focus Groups, Cognitive Testing, Secondary Data Analysis
→ Questionnaire administered to large representative sample
→ Item Response Theory (IRT) analysis (plots of Probability of Response and Information against Theta)
→ Item Bank (IRT-calibrated items reviewed for reliability, validity, and sensitivity)
→ Short Form Instruments and CAT

ITEM BANKS
An item bank comprises a large collection of items measuring a single domain, e.g., pain.
[Diagram: Pain Item Bank (Item 1, Item 2, Item 3, ..., Item n) spread along a continuum: no pain, mild pain, moderate pain, severe pain, extreme pain]
These items are reviewed by experts, patients, and methodologists to make sure:
• Item phrasing is clear and understandable for those with low literacy
• Item content is related to pain assessment and appropriate for the target population
• Item adds precision for measuring different levels of pain

STEPS FOR PROMIS ITEM BANKS
Step  Property                            Analysis                Criteria
1     Item Development                    Qualitative Review      Focus groups and cognitive interviews
2     Skewness                            Frequency Analysis      < 95% response in one category
3     Unidimensionality                   CFA                     > .60 factor loading
4     Local Independence                  Residual Correlations   < .10 residual correlation
5     IRT Analysis                        Item Response Curves    Monotonic
6     Differential Item Functioning       Regression              R² < .03 DIF
7     Item Parameter Stability            Exclusion of Items      ?
8     Item Fit                            Fit Tests               p > .05 chi-square test
9     Evaluation                          Simulation Studies      -

ITEM RESPONSE THEORY MODELS
IRT models enable reliable and precise measurement of PROs
• Fewer items needed for equal precision
  – Makes assessment briefer
• More precision gained by adding items
  – Reducing error and sample size requirements
• Error is understood at the individual level
  – Allowing practical individual assessment

WHICH RANGE OF MEASUREMENT?
[Figure: item information (0 to 10) plotted against theta (-4.00 to 2.00, from Disability to Physical Function) for five items: climb up several stairs, heavy work around the house, usual physical activities, sit on the edge of the bed, strenuous activities]
Item stem "Does your health now limit you in ...": 5 = Not at all, 4 = Very little, 3 = Somewhat, 2 = Quite a lot, 1 = Cannot do
Item stem "Are you able to ...": 5 = Without any difficulty, 4 = With a little difficulty, 3 = With some difficulty, 2 = With much difficulty, 1 = Unable to do

PEOPLE AND ITEMS DISTRIBUTED ON THE SAME METRIC: FATIGUE
[Figure: people and items plotted on one scale centered at 0.0. People range from more fatigue to less fatigue; items range from more likely to less likely to be endorsed. A ceiling effect is marked.]

BANK PRECISION LEVEL ALONG THE PAIN CONTINUUM
[Figure: distribution of pain scores on a 0 to 100 scale (axis labels: Severe pain, Minimal/no pain) overlaid with the bank's standard error (SE) curve. Annotations: average self-reported pain = 60.43 (scaled score = 2.46); SE = 0.5 at 10.7 (scaled score = -4.50) and at 71.8 (scaled score = 4.05); 0.3% and 28.3%. Response options: Very much, Quite a bit, Somewhat, A little bit, Not at all.]

THE ADVANTAGES OF SHORT-FORMS DEVELOPED FROM PROMIS ITEM BANKS
• Select a set of items that are matched to the severity level of the target population.
• All scales built from the same item bank are linked on a similar metric.
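
The idea of matching short-form items to a target severity level can be illustrated with a small sketch. It assumes a dichotomous two-parameter logistic (2PL) model for simplicity; PROMIS items have several response categories, and the slides do not say which selection rule was used. The item names and parameters below are hypothetical, not values from any PROMIS bank.

```python
import math

def prob_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical bank: item name -> (discrimination a, location b)
BANK = {
    "item_1": (1.5, -2.0),
    "item_2": (2.0, -1.0),
    "item_3": (1.8, 0.0),
    "item_4": (2.2, 1.0),
    "item_5": (1.6, 2.0),
    "item_6": (1.0, 0.5),
}

def select_short_form(bank, target_theta, n_items):
    """Pick the n_items that are most informative at the target severity."""
    ranked = sorted(
        bank,
        key=lambda name: item_information(target_theta, *bank[name]),
        reverse=True,
    )
    return ranked[:n_items]

def standard_error(bank, items, theta):
    """Approximate SE at theta: 1 / sqrt(sum of item information)."""
    total = sum(item_information(theta, *bank[name]) for name in items)
    return 1.0 / math.sqrt(total)

if __name__ == "__main__":
    form = select_short_form(BANK, target_theta=1.0, n_items=2)
    print(form, round(standard_error(BANK, form, 1.0), 3))
```

Because every item is calibrated on the same theta scale, short forms built for different target levels still yield scores on one metric. Their precision differs along the continuum, which is why the standard error is evaluated at a specific theta.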

FATIGUE MEASURE AND STANDARD ERROR COMPARISON BY TEST LENGTH
[Figure: standard error (0.0 to 1.0) plotted against fatigue measure (-4 to 4) for five instruments: 5-item CAT, 10-item CAT, 72-item bank, 6-item short form (SF), 13-item scale]

THE ADVANTAGES OF CAT-BASED ASSESSMENT
1. CAT provides an accurate estimate of a person's score with the minimal number of questions.
   • Questions are selected to match the health status of the respondent.
2. CAT minimizes floor and ceiling effects.
   • People near the top or bottom of a scale will receive items that are designed to assess their health status.

[Figure sequence: CAT example drawing on the Item Bank (Validated & IRT-Calibrated Emotional Distress Items). Each panel plots a curve from 0.0 to 1.0 over the Emotional Distress scale, labeled from -3 (severe) through 0 (moderate) to 3 (very low).]
Response options for each item: All of the time, Most of the time, Some of the time, Little of the time, None of the time
1. How often did you feel nervous? Response: Some of the time
2. How often did you feel hopeless? Response: Some of the time
3. How often did you feel worthless? Response: Little of the time
Target in on emotional distress score

CLINICAL AND HEALTH SERVICES RESEARCH APPLICATIONS
• Brief, psychometrically sound short-form or CAT instruments
  – Pain, fatigue, physical function, emotional distress, social activities/function
• Efficient collection of health outcomes data in clinical trials
  – Comparing health interventions and strategies
  – Comparing pharmaceutical treatments
• Monitoring the health outcomes of populations
  – Health plan members
  – Medicare beneficiaries
  – US general population (i.e., MEPS)

TREATMENT COMPARISONS AND EFFECT SIZE ESTIMATES FOR BASELINE TO ENDPOINT CHANGES FOR DEPRESSION SEVERITY SCALES FOR PAROXETINE AND PLACEBO GROUPS
                Least Square Mean Change
Score           Paroxetine    Placebo     F-Value   P-Value   Effect Size
HDRS Total      -11.4407      -8.375      7.45      0.007     0.43
MADRS Total     -13.617       -8.793      11.93     0.001     0.54
DS-1 T-Score    -17.333       -12.171     8.73      0.004     0.46
DS-2 T-Score    -21.919       -13.690     14.57     0.0002    0.59
DS-3 T-Score    -23.135       -14.234     16.09     0.0001    0.63
a. Sample size: Paroxetine N = 98; Placebo N = 99
b. Sample size: Paroxetine N = 82; Placebo N = 85

SUMMARY AND CONCLUSION
• PROMIS item banks, short-form measures and CAT will enable the efficient and psychometrically sound assessment of health outcomes
• PROMIS item banks, instruments and software will be in the public domain
  – PROMIS Health Organization
  – Not-for-profit organization for management and dissemination of PROMIS products
• Development of PROMIS item banks and instruments is ongoing
  – Preliminary measurement systems available late 2007
• Health outcome measures may assist patients, their families, clinicians, and other health care decision-makers in understanding the outcomes of health care interventions and treatment
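
The adaptive procedure shown in the CAT slides (ask the most informative item, update the score estimate, stop once the estimate is precise enough) can be sketched roughly as follows. This is a minimal illustration, not the PROMIS CAT engine: it assumes dichotomous 2PL items, a grid-based expected a posteriori (EAP) estimate with a standard normal prior, and a stopping rule based on a hypothetical SE target. The item names and parameters are invented for the example.

```python
import math

GRID = [i / 10.0 for i in range(-40, 41)]  # theta grid from -4.0 to 4.0

# Hypothetical bank: item name -> (discrimination a, location b)
BANK = {
    "item_1": (1.5, -2.0),
    "item_2": (2.0, -1.0),
    "item_3": (1.8, 0.0),
    "item_4": (2.2, 1.0),
    "item_5": (1.6, 2.0),
    "item_6": (1.0, 0.5),
}

def prob_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at trait level theta."""
    p = prob_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap_estimate(responses, bank):
    """Posterior mean and SD of theta on GRID under a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for name, endorsed in responses.items():
            p = prob_2pl(theta, *bank[name])
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, max_items=5, se_target=0.4):
    """Administer items adaptively; `answer` maps an item name to True/False."""
    responses = {}
    theta, se = eap_estimate(responses, bank)
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        remaining = [name for name in bank if name not in responses]
        # Choose the unused item that is most informative at the current estimate.
        next_item = max(remaining, key=lambda n: item_information(theta, *bank[n]))
        responses[next_item] = answer(next_item)
        theta, se = eap_estimate(responses, bank)
    return theta, se, list(responses)

if __name__ == "__main__":
    # Simulated respondent who endorses every item located below 1.0.
    theta, se, asked = run_cat(BANK, lambda name: BANK[name][1] < 1.0)
    print(asked, round(theta, 2), round(se, 2))
```

Each answer narrows the posterior, so the next question is drawn from the part of the bank nearest the respondent's current estimate. That is how a CAT avoids asking items far from the person's level.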