PROMIS DEVELOPMENT METHODS, ANALYSES AND APPLICATIONS

Dennis A. Revicki, Ph.D.
Center for Health Outcomes Research, United BioSource Corporation, Bethesda, Maryland, USA

Presented at the Patient-Reported Outcomes Measurement Information System (PROMIS): A Resource for Clinical & Health Services Research, AcademyHealth Annual Research Meeting, Orlando, Florida, June 3, 2007

OVERVIEW
• Development of PROMIS item banks
• Psychometric analysis of item bank data
• Clinical and health services research applications

GOAL FOR PROMIS
• Improve assessment of self-reported symptoms and domains of health-related quality of life for application across a wide range of chronic diseases
• Develop and test a large bank of items for measuring PROs
• Develop computer-adaptive testing (CAT) for efficient assessment of PROs
• Create a publicly available, flexible, and sustainable system allowing researchers to access item banks and CAT tools

PROMIS DOMAIN HIERARCHY
Self-reported Health
  Physical Health
    Function/Disability
      Upper Extremities: grip, buttons, etc (dexterity)
      Lower Extremities: walking, arising, etc (mobility)
      Central: neck and back (twisting, bending, etc)
      Activities: IADL (e.g. errands)
    Symptoms
      Pain
      Fatigue
      Sleep/Wake Function**
      Sexual Function
      Other
    Satisfaction
  Mental Health
    Emotional Distress: Anxiety; Depression; Anger/Aggression; Substance Abuse
    Cognitive Function
    Positive Psychological Functioning: Meaning and Coherence (spirituality); Mastery and Control (self-efficacy); Subjective Well-Being (positive affect)
    Negative Impacts of Illness; Positive Impacts of Illness (• Self Concept • Stress Response • Spirituality/Meaning • Social Impact)
    Satisfaction
  Social Health
    Role Participation: Performance; Satisfaction
    Social Support
    Satisfaction

ITEM BANKS
An item bank comprises a large collection of items measuring a single domain, e.g., pain…
[Figure: item bank development flow]
  Items from Instrument A, Items from Instrument B, Items from Instrument C, New Items
  -> Item Pool
  -> Content Expert Review, Focus Groups, Cognitive Testing, Secondary Data Analysis
  -> Questionnaire administered to large representative sample
  -> Item Response Theory (IRT)
  -> Item Bank (IRT-calibrated items reviewed for reliability, validity, and sensitivity)
  -> Short Form Instruments, CAT
[Figure: IRT curves, probability of response (0.0 to 1.0) and information (0.0 to 2.5) plotted against theta (-3 to 3)]

Pain Item Bank
[Figure: Item 1 through Item n placed along a continuum from no pain through mild pain, moderate pain, and severe pain to extreme pain]
These items are reviewed by experts, patients, and methodologists to make sure:
• Item phrasing is clear and understandable for those with low literacy
• Item content is related to pain assessment and appropriate for target population
• Item adds precision for measuring different levels of pain

STEPS FOR PROMIS ITEM BANKS
Step                             Analysis                   Criteria
Item Development                 1 Qualitative Review       Focus groups and cognitive interviews
Skewness                         2 Frequency Analysis       < 95% response in one category
Unidimensionality                3 CFA                      > .60 factor loading
Local Independence               4 Residual Correlations    < .10 residual correlation
IRT Analysis                     5 Item Response Curves     monotonic
Differential Item Functioning    6 Regression               R² < .03 DIF
Item Parameter Stability         7 Exclusion of Items       ?
Item Fit Evaluation              8 Fit Tests                p > .05 chi-square test
                                 9 Simulation Studies

ITEM RESPONSE THEORY MODELS
IRT models enable reliable and precise measurement of PROs
• Fewer items needed for equal precision
  – Makes assessment briefer
• More precision gained by adding items
  – Reducing error and sample size requirements
• Error is understood at the individual level
  – Allowing practical individual assessment

PEOPLE AND ITEMS DISTRIBUTED ON THE SAME METRIC: FATIGUE
[Figure: people and items located on a common theta scale (-4.00 to 2.00). People range from more fatigue to less fatigue; items range from more likely to be endorsed to less likely to be endorsed.]

WHICH RANGE OF MEASUREMENT?
Physical Function
"Are you able to …" (5 = Without any difficulty; 4 = With a little difficulty; 3 = With some difficulty; 2 = With much difficulty; 1 = Unable to do)
"Does your health now limit you in ..." (5 = Not at all; 4 = Very little; 3 = Somewhat; 2 = Quite a lot; 1 = Cannot do)
Item content: climb up several stairs; heavy work around the house; usual physical activities; sit on the edge of the bed; strenuous activities
[Figure: item information plotted against theta on the disability continuum, with a ceiling effect marked]

BANK PRECISION LEVEL ALONG THE PAIN CONTINUUM
[Figure: distribution of self-reported pain on a 0 to 100 scale with ends labeled minimal/no pain and severe pain, overlaid with the standard error (SE) of the bank. Annotations: average self-reported pain = 60.43 (scaled score = 2.46); SE = 0.5 at 10.7 (scaled score = -4.50) and at 71.8 (scaled score = 4.05); percentages shown: 0.3% and 28.3%.]

THE ADVANTAGES OF SHORT-FORMS DEVELOPED FROM PROMIS ITEM BANKS
• Select a set of items that are matched to the severity level of the target population.
• All scales built from the same item bank are linked on a similar metric.
[Response scale shown: Very much; Quite a bit; Somewhat; A little bit; Not at all]

THE ADVANTAGES OF CAT-BASED ASSESSMENT
1. Provide an accurate estimate of a person's score with the minimal number of questions.
   • Questions are selected to match the health status of the respondent.
2. CAT minimizes floor and ceiling effects.
   • People near the top or bottom of a scale will receive items that are designed to assess their health status.

FATIGUE MEASURE AND STANDARD ERROR COMPARISON BY TEST LENGTH
[Figure: standard error (0.0 to 1.0) plotted against fatigue measure (-4 to 4) for five instruments: 5 Item CAT, 10 Item CAT, 72 Item Bank, 6 Item SF, 13 Item Scale]
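The CAT logic described above (pick the question that best matches the respondent's current estimate, re-estimate, stop when the standard error is small enough) can be sketched as follows. This is a minimal illustration and not the PROMIS software. It assumes a graded response model and an expected a posteriori (EAP) score with a standard normal prior, neither of which the slides specify. The item bank parameters and all function names (run_cat, item_information, and so on) are hypothetical.

```python
import math

# Hypothetical 11-item bank: (discrimination a, four ordered thresholds)
# for five response categories scored 0..4. Values are illustrative only.
BANK = [(2.0, [loc + d for d in (-1.5, -0.5, 0.5, 1.5)])
        for loc in (-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5)]

GRID = [g / 20.0 for g in range(-80, 81)]  # theta grid from -4 to 4


def cumulative(theta, a, thresholds):
    """P(response >= k) for k = 0..K under an assumed graded response model."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta, a, thresholds):
    """Probability of each response category at theta."""
    cum = cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def item_information(theta, a, thresholds):
    """Fisher information the item contributes at theta."""
    cum = cumulative(theta, a, thresholds)
    slope = [a * p * (1.0 - p) for p in cum]  # derivative of each cumulative curve
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        if p > 1e-12:
            info += (slope[k] - slope[k + 1]) ** 2 / p
    return info


def eap(responses, bank):
    """Posterior mean and SD of theta on GRID (standard normal prior assumed)."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for idx, cat in responses:
            w *= category_probs(t, *bank[idx])[cat]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, answer, max_items=10, se_target=0.3):
    """Administer items until the SE target or the item limit is reached."""
    responses, theta, se = [], 0.0, 1.0
    while len(responses) < min(max_items, len(bank)) and se > se_target:
        used = {idx for idx, _ in responses}
        # Pick the unused item that is most informative at the current estimate.
        nxt = max((i for i in range(len(bank)) if i not in used),
                  key=lambda i: item_information(theta, *bank[i]))
        responses.append((nxt, answer(nxt)))
        theta, se = eap(responses, bank)
    return theta, se, [idx for idx, _ in responses]


def simulated_answer(true_theta, bank=BANK):
    """Deterministic respondent: always gives the most probable category."""
    def answer(idx):
        probs = category_probs(true_theta, *bank[idx])
        return probs.index(max(probs))
    return answer


for n in (5, 10):
    theta, se, items = run_cat(BANK, simulated_answer(1.2), max_items=n, se_target=0.0)
    print(f"{n}-item CAT: theta = {theta:.2f}, SE = {se:.2f}, items = {items}")
```

Setting se_target to 0.0 forces a fixed test length, which mirrors the test-length comparison on the fatigue slide. With a nonzero target the loop stops as soon as the posterior SD falls to the target, so respondents whose level is well covered by the bank answer fewer questions.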
[Figure series: CAT demonstration with the Emotional Distress Item Bank (validated and IRT-calibrated emotional distress items). Each panel has a vertical axis from 0.0 to 1.0 and a horizontal emotional distress axis from -3 to 3, labeled severe, high, moderate, low, and very low. The panels step through three items in turn: "How often did you feel nervous?", "How often did you feel hopeless?", and "How often did you feel worthless?". Each item is shown first with all five response options (All of the time; Most of the time; Some of the time; Little of the time; None of the time) and then with the selected response; "Some of the time" is selected for the nervous and hopeless items.]
[Figure, final panels: "How often did you feel worthless?" with "Little of the time" selected, on the same Emotional Distress Item Bank axes. Caption: Target in on emotional distress score.]

CLINICAL AND HEALTH SERVICES RESEARCH APPLICATIONS
• Brief, psychometrically sound short-form or CAT instruments
  – Pain, fatigue, physical function, emotional distress, social activities/function
• Efficient collection of health outcomes data in clinical trials
  – Comparing health interventions and strategies
  – Comparing pharmaceutical treatments
• Monitoring the health outcomes of populations
  – Health plan members
  – Medicare beneficiaries
  – US general population (i.e., MEPS)

TREATMENT COMPARISONS AND EFFECT SIZE ESTIMATES FOR BASELINE TO ENDPOINT CHANGES FOR DEPRESSION SEVERITY SCALES FOR PAROXETINE AND PLACEBO GROUPS
                    Least Square Mean Change
Score               Paroxetine    Placebo     F-Value    P-Value    Effect Size
HDRS Total          -11.4407      -8.375      7.45       0.007      0.43
MADRS Total         -13.617       -8.793      11.93      0.001      0.54
DSDS-1 T-Score      -17.333       -12.171     8.73       0.004      0.46
DSDS-2 T-Score      -21.919       -13.690     14.57      0.0002     0.59
DSDS-3 T-Score      -23.135       -14.234     16.09      0.0001     0.63
a. Sample size: Paroxetine N = 98; Placebo N = 99
b. Sample size: Paroxetine N = 82; Placebo N = 85

SUMMARY AND CONCLUSION
• PROMIS item banks, short-form measures and CAT will enable the efficient and psychometrically sound assessment of health outcomes
• PROMIS item banks, instruments and software will be in the public domain
  – PROMIS Health Organization
  – Not-for-profit organization for management and dissemination of PROMIS products
• Development of PROMIS item banks and instruments is ongoing
  – Preliminary measurement systems available late 2007
• Health outcome measures may assist patients, their families, clinicians, and other health care decision-makers in understanding the outcomes of health care interventions and treatment
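The treatment-comparison table reports effect sizes without stating how they were computed. One common convention, assumed here and not confirmed by the slides, recovers a standardized mean difference from a two-group F statistic with one numerator degree of freedom: d = sqrt(F) * sqrt(1/n1 + 1/n2). The sketch below applies that conversion to the tabled F values. The function name d_from_f is hypothetical. If the F tests come from a covariate-adjusted model, the conversion is only approximate.

```python
import math


def d_from_f(f_value, n1, n2):
    """Standardized mean difference implied by a two-group, 1-df F statistic.

    Assumes F = t**2 from an independent two-sample comparison with pooled SD.
    """
    return math.sqrt(f_value) * math.sqrt(1.0 / n1 + 1.0 / n2)


# F values copied from the table. Which footnote (a or b) applies to each
# row is not recoverable from the slide, so both sample sizes are shown.
F_VALUES = {
    "HDRS Total": 7.45,
    "MADRS Total": 11.93,
    "DSDS-1 T-Score": 8.73,
    "DSDS-2 T-Score": 14.57,
    "DSDS-3 T-Score": 16.09,
}

for scale, f in F_VALUES.items():
    d_a = d_from_f(f, 98, 99)  # footnote a sample sizes
    d_b = d_from_f(f, 82, 85)  # footnote b sample sizes
    print(f"{scale}: d = {d_a:.2f} (N = 98/99), d = {d_b:.2f} (N = 82/85)")
```

Comparing the printed values with the Effect Size column is a rough consistency check, not a reconstruction of the original analysis.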