Public Reporting of Long Term Care Quality: The US Experience
Vincent Mor, Ph.D.
Brown University

Background
• Long history of scandals regarding long term care quality, particularly nursing homes
• While preference for and supply of “community based” alternatives have grown in US, all acknowledge residentially based long term care must be part of any system
• Home Health less scrutinized but many worry about care adequacy since hard to inspect

Background
• Institute of Medicine Report in 1986 served as basis for nursing home reform also adopted by home care
• Uniform Resident Assessment Instrument created in 1991 and became the basis for the creation of performance measures designed to stimulate quality competition through public reporting
• Home Health Outcome and Assessment Information Set (OASIS) emerged independently

Background (cont.)
• Using RAI, Nursing Home Quality Measures tested, revised and published as “Nursing Home Compare” since 2002
• More recent efforts to create composite measure incorporating Inspection results, Staffing Levels and Quality Measures have been widely promulgated
• Home Health Quality Measures developed, tested and published as Home Health Compare since 2004

Purpose
• Summarize US Experience with Development of Long Term Care Quality Measures
• Review Conceptual and Technical Issues Facing the Construction of Long Term Care Quality Measures
• Review Literature on Effects of Public Reporting of Quality Measures in Long Term Care

The Nursing Home Resident Assessment Instrument (RAI)
• 1986 Institute of Medicine Report on Nursing Home Quality Recommended a Uniform RAI to Guide Care Planning: the Minimum Data Set (MDS)
• OBRA ‘87 Contained Nursing Home Reform Act Including RAI Requirement
• A 300 Item, Multi-Dimensional RAI Tested for 2 Years
• Mandated Implementation in 1991

Clinical Planning Basis of the MDS
• Assessment Profile in Given Domain “Triggers” Potential “Risk” Status
• Resident Assessment Protocol Reviewed to Determine Presence of Problem or High Risk of Problem
• Care Planning and Treatment Directed to the Problem
• Data Quality Contingent upon conduct of Clinical Care Planning Process

MDS Background
• MDS Version 2.0 Introduced in 1996
• Admission, Short Term and Quarterly Reassessments done on all Residents
• Inter-State Variation with some requiring additional data
• Since 1998 all MDS records are computerized and submitted to Centers for Medicare & Medicaid Services (CMS)

MDS: Putting Practice into Research

CMS Quality Measures
“The quality measures, developed under CMS contract to Abt Associates and a research team led by Drs. John Morris and Vince Mor, have been validated and are based on the best research currently available. These quality measures meet four criteria. They are important to consumers, are accurate (reliable, valid and risk adjusted), can be used to show ways in which facilities are different from one another, and can be influenced by the provision of high quality care by nursing home staff.” (CMS Web Site)

CMS Quality Measures - Long Term
• Percent of Long-Stay Residents Given Influenza Vaccination During the Flu Season
• Percent of Long-Stay Residents Given Pneumococcal Vaccination
• Percent of Residents Whose Need for Help With Daily Activities Has Increased
• Percent of Residents Who Have Moderate to Severe Pain
• Percent of High-Risk Residents Who Have Pressure Sores
• Percent of Low-Risk Residents Who Have Pressure Sores
• Percent of Residents Who Were Physically Restrained
• Percent of Residents Who are More Depressed or Anxious (Looks back 30 days)
• Percent of Low-Risk Residents Who Lose Control of Their Bowels or Bladder
• Percent of Residents Who Have/Had a Catheter Inserted and Left in Their Bladder
• Percent of Residents Who Spent Most of Their Time in Bed or in a Chair
• Percent of Residents Whose Ability to Move in and Around Their Room Got Worse
• Percent of Residents with a Urinary Tract Infection (Looks back 30 days)
• Percent of Residents Who Lose Too Much Weight (Looks back 30 days)

Physical Functioning, October/December 2009

State      ADL Worse   Bed Bound   Move Worse   Decline in ROM
National   14.9%       4.7%        14.7%        6.6%
AK         17.4%       5.3%        14.6%        10.9%
AL         11.6%       6.5%        11.7%        5.4%
AR         14.2%       4.2%        12.6%        5.5%
AZ         13.9%       4.9%        14.6%        5.5%
CA         10.4%       7.1%        11.6%        6.2%
CO         15.3%       3.0%        14.9%        6.4%
CT         15.4%       2.5%        16.3%        4.8%
DC         13.4%       1.7%        13.2%        5.4%
DE         13.8%       4.1%        15.5%        8.1%
FL         12.7%       4.6%        12.5%        5.2%

Psychotropic Drug Use, October/December 2009

State      AntiPsychotics Overall   AntiPsychotics LOW Risk   Anti-Anxiety Agents
National   18.6%                    15.6%                     23.1%
AK         11.2%                    4.7%                      21.5%
AL         15.9%                    14.0%                     27.2%
AR         17.9%                    15.6%                     21.1%
AZ         19.2%                    15.8%                     21.5%
CA         16.8%                    14.0%                     20.4%
CO         18.6%                    15.1%                     18.1%
CT         23.7%                    21.2%                     22.7%
DC         13.6%                    12.4%                     13.4%
DE         20.2%                    17.8%                     23.3%
FL         12.2%                    10.1%                     27.5%

CMS Quality Measures - Short Stay
• Percent of Short-Stay Residents Given Influenza Vaccination During the Flu Season
• Percent of Short-Stay Residents Who Were Assessed and Given Pneumococcal Vaccination
• Percent of Short-Stay Residents With Delirium
• Percent of Short-Stay Residents in Moderate to Severe Pain
• Percent of Short-Stay Residents With Pressure Sores

Home Health Quality Measurement
• OASIS began as a cooperative effort between home health agencies and researchers to develop simple “outcome” measures to track patients’ rate of improvement while in care
• University of Colorado researchers worked with large Visiting Nurse Services to develop and test
• CMS then funded multiple large demonstrations to implement the tool and use for quality measurement and case-mix reimbursement

Outcome Based Quality Improvement
• Distinct measures of change in patient functioning, resolution of symptoms and ability to manage independently collected at the start and “end” of care (OR every 60 days)
• Most Medicare home health is short term
• Measures tested and revised with extensive case mix adjustment to allow for comparison across agencies and states

Risk-adjusted Home Health Outcome Report for Improvement of Activities of Daily Living
EXAMPLE: Percent of Patients in Home Health Care whose ability
to [Groom, Bathe, Dress Upper and Dress Lower Body] themselves improves between start of care and discharge

CMS OASIS Report - 2009 Rates of Improvement in ADL

State                  Grooming   Upper Dressing   Lower Dressing   Bathing
Alabama                71.0%      73.0%            75.0%            68.0%
Alaska                 67.0%      66.0%            56.0%            59.0%
Arizona                67.0%      69.0%            67.0%            64.0%
Arkansas               68.0%      69.0%            70.0%            65.0%
California             71.0%      72.0%            70.0%            67.0%
Colorado               70.0%      71.0%            70.0%            64.0%
Connecticut            69.0%      69.0%            68.0%            62.0%
Delaware               68.0%      69.0%            70.0%            62.0%
District of Columbia   75.0%      79.0%            79.0%            72.0%
Florida                69.0%      69.0%            68.0%            66.0%
Georgia                71.0%      73.0%            74.0%            67.0%

Risk-adjusted Home Health Outcome Report for Utilization Outcomes
• Percent of patients who have received emergency care prior to or at the time of discharge from home health care.
• Percent of patients who are discharged from home health care and remain in the community.
• Percent of patients who are admitted to an acute care hospital for at least 24 hours while a home health care patient.

Risk-adjusted Home Health Outcome Report

State                  Any Emergent Care   Discharged to Community   Acute Care Hospital
Alabama                23.0%               64.0%                     33.0%
Alaska                 20.0%               71.0%                     25.0%
Arizona                25.0%               67.0%                     29.0%
Arkansas               24.0%               64.0%                     32.0%
California             18.0%               72.0%                     25.0%
Colorado               23.0%               69.0%                     26.0%
Connecticut            27.0%               65.0%                     32.0%
Delaware               20.0%               70.0%                     26.0%
District of Columbia   21.0%               71.0%                     26.0%
Florida                18.0%               70.0%                     26.0%
Georgia                22.0%               68.0%                     29.0%

Conceptual Issues Inherent in Applying Quality Indicators
• Requires “shared” interpretation of Quality
• Assumes all Providers have same goals
• Assumes Measured Quality Domains are Important
• Indicators are NOT Quality per se, BUT often used as evidence in and of themselves
• Assumes Facilities Accountable for most of the variation in the Indicator (e.g.
outcomes)
• Assumes Facilities Know how to Change Practice

Technical Issues That Can Compromise Validity of QI’s
• Reliability & Validity of the data
• Multi-dimensionality of Quality & Indicators
• Stability of Estimates Sensitive to Sample Size
• Ranks can Overestimate Differences
• Patient Level Risk Adjustment Complex
• Differences in Assessment Practices Influence QI Scores & Comparisons

Reliability Studies: NH
• 219 of 462 (47.4%) facilities approached chose to participate in full study (52.4% for HB and 45.6% for non-HB)
• Non-participants were more likely to be for-profit, less well staffed and with more regulatory deficiencies
• 5758 patients (ave. 27.5/facility) included in reliability analyses; 119 patients assessed twice by research nurses
• Patients resemble traditional US nursing home patient

Reliability of “Gold Standard” Nurses
• Of 100 items, only 3 didn’t reach Kappa >.4
• 50%+ items had Kappa >.75
• Pct. Agreement high even for ordinal items with variance

Item         % Agree   Kappa
DNR          91%       .83
Memory       88%       .63
Decisions    97%       .89
Understood   96%       .82
Understand   96%       .80
Fears        97%       .76
Wander       99%       .85
Walk         95%       .86
Pain Fx.     93%       .78

Reliability of Facility RNs to “Gold Standard”
• Of the 100 data items 28 had Kappa <.4 and 15 had Kappa >.75
• Worst Kappa items were rare binary items like “end stage”, didn’t use toilet, recurrent lung aspirations, etc.
• ADLs and other Functioning items had Kappa values above .75

Reliability of Constructed Quality Indicators: NH
• Quality Indicators are composites of several RAI items; a definition of the denominator and of the conditions required to meet the QI definition
• The inter-rater reliability of a QI is a function of the reliability of all the component items defining the algorithm

Prevalence and Inter-Rater Agreement and Reliability of Selected Facility Quality Indicators [N=209 homes]

QI                                             Facility Avg.   SD of QI    Ave Kappa for   % Agree Resch &   QI Kappa
                                               QI Prev rate    Prev rate   Items in QI     facility RNs
Behavior Problems (High & Low Risk Combined)   .20             .10         .71             89.8              .61
Little/no activities                           .12             .12         .28             65.3              .23
Catheterized                                   .07             .05         .71             92.5              .67
Incontinence                                   .62             .13         .88             91.4              .78
Urinary Tract Infection                        .08             .05         .53             89.1              .45
Tube Feeding                                   .08             .05         .73             98.1              .83
Inadequate Pain Management                     .11             .08         .85             86.5              .87

Facility QI Reliability Variation: Bladder/Bowel Incontinence
[Figure: histogram of facility-level kappa for Bladder/Bowel Incontinence (High and Low Risk); Mean = .78, Std. Dev = .21, N = 209]

Facility QI Reliability Variation: Inadequate Pain Management
[Figure: histogram of facility-level kappa for Inadequate Pain Management; Mean = .50, Std. Dev = .30, N = 209]

Reliability Studies: Home Health
• Fewer inter-rater reliability studies of OASIS
• More expensive to send two nurses at separate times on the same day to do the same assessment
• Largest Reliability Study done as part of research to develop case-mix reimbursement system
• ADL and other function items yield high levels of reliability; symptoms achieve “ok” reliability

Selected Inter-Rater Reliability Results from OASIS test

Signs & Symptoms                        Sample Size   Percent Agreement   Kappa
1. Diarrhea                             304           93.4%               0.44
2. Difficulty urinating or >=3x/night   304           91.5%               0.45
3. Fever                                304           96.7%               0.63
4. Vomiting                             304           97.4%               0.49
5. Chest Pain                           304           95.4%               0.51
6. Constipation in 4 of last 7 days     304           92.1%               0.53
7. Dizziness or lightheadedness         304           89.1%               0.46
8. Edema                                304           81.3%               0.50
9. Delusions                            304           99.0%               0.66
10. Hallucinations                      304           98.4%               0.44

OASIS Reliability Results: Function

Variable                                                                                   Sample Size   Percent Agreement   Kappa
Grooming: Current ability to tend to personal hygiene needs                                304           74.7%               0.83
Dressing: Current ability to dress upper body with or without dressing aids                304           71.1%               0.83
Dressing: Current ability to dress lower body with or without dressing aids                304           77.0%               0.85
Bathing: Current ability to wash entire body                                               304           64.8%               0.80
Toileting: Current ability to get to and from the toilet or bedside commode                304           82.6%               0.86
Transferring: Current ability to move from bed to chair, on/off toilet or commode, tub, …  304           74.3%               0.88
Ambulation/Locomotion: Current ability to safely walk, use a wheelchair…                   304           77.6%               0.87

Validity of the Data & Measures
• Validity of the data shown by the extent to which items and measures behave as expected relative to “gold standard” variables or “hard” outcomes
• Compared MDS diagnoses to Hospital discharge diagnoses
• Looked at MDS predictors of survival
• Related MDS measures to research scales

MDS vs. CMS Hospital diagnoses
Neurological
• Cerebrovascular disorders (ICD-9: 432, 434, 436, 437): PPV = 0.73
• Parkinson’s disease (ICD-9: 332): PPV = 0.86
• Alzheimer’s disease (ICD-9: 331): PPV = 0.68
• Brain degeneration (ICD-9: 331.0, 331.2, 331.7, 331.9): PPV = 0.84

One Year Survival by Gender & Cognition Level
[Figure: survival curves by gender and cognition level; legend entries include Women (CPS 2-4) and Men (CPS 0-1); x-axis: Months]

Survival Time by CHESS Score and Age
[Figure: Percent Died (y-axis, 0.00 to 0.70) by CHESS Score/Age Group (x-axis); series: <1 year, 1-2 years, 2-3 years]

Construct Validity: Cognitive Performance Scale & Correlates
• Cognitive Performance Scale (CPS) Derived from 5 MDS Items
• Strong (>.85) Correlation with MMSE
• High Kappa with Global Deterioration Scale (.76)
• Percent Patients with Dementia Increases as CPS Declines
• MDS Communication Correlated (.85) with MMSE
• ADL, CPS, Symptoms & Select Diagnoses Related to Survival

Sample Size and QI Stability
• Providers and Consumers want QI to reflect not just what WAS but what WILL BE; SO QI stability is desired
• QI must be based upon minimum # observations
• Correlation between quarters among QIs varies
• Correlation among prevalence based QIs is high because same individuals assessed each quarter
• Correlation between quarters among incidence and change based QIs lower and VERY sensitive to sample size

Residents’ Expected Rates of Change on Quality Indicators
• Over 90 day period 77.1% of residents still in facility do not change on ADL, 14.7% decline and 8.2% improve.
• Over 12 months 58% of residents in home don’t change and 30.2% decline.
• Similar pattern for Communication, Cognition and individual ADL items
• Means that rates of decline are low and many residents are needed to estimate a home’s rate of ADL decline with confidence.
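To make the sample-size point concrete, the following is a minimal sketch, not taken from the presentation, of how uncertain a facility's estimated 90-day ADL decline rate is at different numbers of assessed residents. It assumes the 14.7% decline rate quoted above applies to every facility and uses the standard Wilson score interval for a binomial proportion; the function name `wilson_interval` and the resident counts are illustrative assumptions.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# 90-day ADL decline rate quoted in the slide above.
DECLINE_RATE = 0.147

# Hypothetical numbers of residents contributing to a facility's estimate.
for n in (10, 25, 50, 100, 200):
    lo, hi = wilson_interval(DECLINE_RATE, n)
    print(f"n={n:4d}  interval: {lo:.3f} to {hi:.3f}  (width {hi - lo:.3f})")
```

The interval narrows only slowly as residents are added, which is one way to see why change-based indicators in small homes are unstable from quarter to quarter.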
Estimated Sample Size for Change

Facility Size   Number Residents   Residents Expected   20th Pctile Expected   80th Pctile Expected
                Decline Estimate   to Decline           Residents Declining    Residents Declining
20 Beds         12                 1                    <1                     1
30 Beds         16                 1                    <1                     3
50 Beds         28                 2                    1                      4
80 Beds         45                 4                    2                      5
100 Beds        56                 5                    2                      7
150 Beds        83                 7                    4                      11
200 Beds        117                9                    6                      14

Long Term Predictability of Quality
[Figure: Facility QI Trend: Incidence of Late-Loss ADL Worsening (Stratified by Quality at Baseline); y-axis: 0.000 to 0.250; x-axis: quarters 1999Q3 to 2004Q4; series: Best-quality, Good-quality, Mixed-quality and Worst-quality at baseline]

Quality Fluctuation: Seasonality
[Figure 2. Quarterly ADL Decline in Nursing Home Residents & Flu Mortality Rates in 122 CDC Monitored Cities: 2000-2005; axes: ADL Decline and Flu Deaths/100,000 Population; x-axis: quarters 2000Q1 to 2005Q4]

Transforming QI Scores into Ranks
• Many QI score distributions are skewed; many facilities with little or no problem and few facilities with many residents experiencing the problem.
• Median facility might be very similar to the “best” (the one with fewest problems)
• Transforming to ranks means saying there is a difference between the 10th and 40th percentile when there is little difference

Pressure Ulcer Prevalence Facility Distribution: Meaning of Ranks

Variability in Ranking Distributions
[Figure: two panels, “Anti-psychotics: Median Ranks” and “Persistent Pain: Median Ranks”; y-axis: median rank (0 to 600) with 80% confidence intervals; x-axis: 598 facilities]

Complexity of Determining Appropriate Risk Adjustment
• Risk Factors May not be Measured Independent of the Provider (tx) Effect
• Potential for Over Adjustment as Great as Under Adjustment
• How to Adjust for Socio-Economic Differences Known to Affect Health Behavior or Clinical Characteristics (e.g. PU not “seen” on African American NH pts until at Stage 2 OR Pain Harder to “see” in Cognitively Impaired & Oldest pts)

Risk Adjustment Complexity

Why Adjust QIs
Facilities should be compared on ‘level playing field’, acknowledging differences in:
• Types of residents admitted
• Ability to ameliorate clinical characteristics thought to predispose to poor outcomes irrespective of care quality
• Variability in measurement acumen of assessors

Average Admission Prevalence of Pressure Ulcers Across All States, 1999
[Figure: bar chart by state, listed from Louisiana to Wyoming; x-axis: Admission Prevalence of Pressure Ulcers in 1999 (0.0 to .3)]

Hospitalization Rate in a 6-Month Period in 2000 Among Long-Stay NH Residents (Who Spent 90+ Days in the Facility)
[Figure: bar chart by state, listed from LA to UT; y-axis: % Re-Hospitalized]
Source: MDS 2000; Medicare inpatient claims 2000.

Multi-dimensionality of QIs
• Consumers want to know “Best” nursing home & Regulators want to know where to focus their survey energies & Purchasers want to buy best.
• If Quality is multi-dimensional no such thing as the “best”; most valuable dimension is a preference and will be individualized
• Combining QIs that aren’t highly correlated may mask differences between facilities on important individual QIs

Does Poor Performance on One Measure Mean NF is Poor?
• Average Correlation Among QIs is Low
• Anti-Psychotics and Restraints Correlated .04
• What is a “Good” Home if QIs not Related?
• Can Performance Measures Help Pick Good Homes?
• Are Some Measures More Meaningful?
• Should Users of Performance Measures Select the Measures they Value Most?
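One way to see what near-zero correlation among indicators implies is a small simulation. This is an illustrative sketch rather than an analysis from the presentation: it assumes the indicators are fully independent and treats each facility's percentile on each indicator as an independent uniform draw. The function name `share_bottom_on_all` is hypothetical.

```python
import random


def share_bottom_on_all(n_facilities, n_indicators, cutoff=0.5, seed=0):
    """Share of simulated facilities falling below `cutoff` (as a percentile)
    on every one of `n_indicators` mutually independent indicators."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_facilities):
        # Independent uniform percentile for each indicator (an assumption).
        if all(rng.random() < cutoff for _ in range(n_indicators)):
            count += 1
    return count / n_facilities


# With three independent indicators the expected share in the bottom half
# on all three is 0.5 ** 3 = 0.125, and (1/3) ** 3 for the bottom third.
print(share_bottom_on_all(100_000, 3, cutoff=0.5))
print(share_bottom_on_all(100_000, 3, cutoff=1 / 3))
```

Under independence, only about one facility in eight would look poor on all three indicators at once, so a single "good" or "poor" label depends heavily on which measures are chosen.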
Summary Results of Factoring

Functional Decline:
• Worsening Bladder
• Worsening Bowel
• ADL Decline
• Mobility Decline
• Cognitive Decline
• Communication Decline

Mood/Behavior:
• Poor Mood State [Prevalence]
• Worsening Mood
• Poor Mood w/o Anti-depressants
• Behavior Problems [Prevalence]
• Worsening Behavior
• Worsening Relationships

Pressure Ulcers:
• Worsening Pressure Ulcers
• Prevalent Pressure Ulcer (Hi Risk)
• Prevalent Pressure Ulcer (Lo Risk)

Treatment & Condition [No Factor]:
• Prevalent Catheter
• Prevalent Restraint
• Prevalent Anti-Hypnotic Use
• Prevalent Anti-Psychotic Use
• Weight Loss
• Falls
• Worsening Pain

Regression Modeling Results
• Results of relating each QI to all others revealed very low R² for all Treatment & Conditions
• While R² higher for QIs within other factors, many conceptually unrelated QIs found to weakly predict other QIs
• Many QIs “load” on (are related to) multiple factors
• QI “type” (e.g. prevalence, longitudinal, change) as influential as QI content in factor
• Factor structure sensitive to which QIs included
• Many QIs totally uncorrelated with others

Provisional Test of Combining Unlike Quality Indicators
• Use 1999 MDS 2.0 from OH, NY & CA
• Create Risk and Admission adjusted QIs for Pressure Ulcers, Anti-Psychotic Use and Pain
• Correlate Measures: PU & Pain <.05; PU & Anti-Psych = -.15; Pain & Anti-Psych = .16
• Only 13% of facilities in bottom half on all 3 QIs; 5% if use bottom third on all measures

Public Reporting of Quality
• NURSING HOME COMPARE allows consumers and advocates to identify facilities in their geographic area and to select using a “Five Star” global rating OR based upon global domains OR specific measures.
• HOME HEALTH COMPARE allows consumers and advocates to identify agencies in geographic area and presents detail of many different Quality Indicators

Effect of Public Reporting
• Research to date all done on nursing homes
• Broken down by market served; long stay residential vs. short stay, post-acute
• First wave of studies surveyed administrators to find out how they were responding
• More recent studies use MDS data to examine changes in outcomes and admission patterns

Facility Response to Reporting
• Castle (2005) initially found Administrators were skeptical and unconvinced that reporting mattered
• Zinn & colleagues (2005) surveyed leaders and found they were aware of their scores and those of closest competitors; concluded reporting spurred quality improvement
• Castle (2007) concurred in a separate survey that more competitive markets affected response

Reporting Improve Quality?
• Werner & colleagues (2009) found significant improvement in BOTH measured and unmeasured quality measures following public reporting, BUT general improvement trend
• Mukamel et al. (2007) looked carefully at initial response relative to prior quality patterns and also found improvement on most but not all measures
• Werner et al. (2010) also found improvement in post-acute quality scores

Reporting Alter Admissions?
• Werner & colleagues have examined whether facilities with worse quality scores in competitive markets manifest reductions in admissions
• Very complicated; must infer from the data why someone is entering a facility; should affect those entering to stay more, but hard to know who
• However, evidence suggests small but significant changes in referral patterns favoring better quality

Summary
• Public Reporting of long term care providers’ quality performance is possible
• All measures are flawed, but no more than acute and ambulatory care
• Pre-requisite is to have uniform data collected with relevant clinical detail AND should be able to be audited with penalties to minimize bad data

Summary (cont.)
• Constructing quality measures can be complex
• Sample size, seasonality, risk adjustment are all important to assure the “fairness” of the system
• Like case mix reimbursement, don’t want incentives for providers to limit access to sickest
• Still at the infancy of understanding how consumers & advocates use the data

Issues for the Future
• Preferable to have common items, measures and metrics across different types of long term care options, technically AND for consumers
• Challenge of Creating Composite Scores consumers want that are technically less sensitive than domain specific measures
• However, Movement to “Pay for Performance” requires we develop a solution