Assessing the quality and appropriateness of care

Case reviews are frequently carried out for quality assurance, negligence claims, investigations of adverse outcomes, and disciplinary proceedings. Despite its popularity, the process has severe limitations. Reviewing cases requires an understanding of the science of assessing the quality of care. First we need to define quality of care; the most comprehensive definition is that of Donabedian1: ‘Quality of care is a conceptual entity represented by the entire continuum from process to outcome and not by either one independently’2

Ideally, assessment of quality of care is based on an audit of the outcomes in a group practice. These may be compared to an expected standard based on the literature or on comparable data from other groups. In either case, since case mix (the type of patient seen) will vary, outcome data should be risk-adjusted to allow for this. Since the outcome may be just as much a function of risk factors such as age and comorbidity (the presence of other medical conditions; see earlier discussion) as of process (the care provided), it is important to determine which factors are responsible for the observed outcomes, and which are modifiable. Alternatively, risk-adjusted outcomes may be examined within a group practice to identify obvious outliers, from which lessons may be learnt and practice modified. This applies equally to exceptionally bad and exceptionally good outcomes. Finally, subsets of patients may be examined to see whether certain groups appear to benefit more or less, which may lead to modifications in practice. In contrast, negligence or malpractice claims and disciplinary proceedings work backwards from known adverse outcomes to examine the process for the appropriateness of the care delivered, an approach much more susceptible to bias.

Limitations

Even when trained reviewers are used, assessments of the appropriateness or quality of care show consistently low reliability.3,4,5,6,7,8,9,10,11 Agreement between reviewers examining the same cases can be expressed as inter-rater reliability coefficients or kappa scores (illustrated in the sketch at the end of this section). While these are often at acceptable levels for agreement about specific outcomes, reliability is much lower for process.12,13,14,15,16,17 However, reliability is improved where there are accepted evidence-based guidelines (Hofer 2004, Naylor 1998). (For further discussion, see Measurement: Outcomes or Process? below.)

In a recent review, Naylor18 critiques the reliance placed on profiles of clinical performance, driven by a ‘watchword’ of greater assessment and accountability in health care.19 As he states, performance measurement remains ‘a very inexact science’ and ‘clinical outcomes are relatively insensitive to process-of-care problems’. He demonstrates the limitations of even risk-adjusted data, owing to confounding by multiple factors: without a ‘level playing field’, ‘unfair comparison of disparate patient populations’ produces misleading data and impressions. He stresses that any assessment of clinical performance must take into account four critical elements: process of care, clinical outcome, patient perception and patient satisfaction. Finally, he emphasizes the importance of considering the impact of performance profiles in terms of damaged ‘professional and institutional reputations and morale’. Theoretically there are also potential harms to patients, particularly if such assessments lead to excessive risk aversion.
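The two quantitative ideas above, chance-corrected agreement (kappa) and risk adjustment, can be made concrete with a small sketch. The Python fragment below is purely illustrative: the reviewer ratings and patient risk estimates are invented for this document, and in a real application the expected risks would come from a validated multivariable risk model rather than the ad hoc numbers used here.

```python
# Illustrative only: all data are invented, not drawn from any study cited here.
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected if each rater judged independently at their own base rates.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def observed_expected_ratio(outcomes, expected_risks):
    """Crude risk adjustment: observed adverse events divided by the number
    expected from each patient's pre-treatment risk (an O/E ratio)."""
    return sum(outcomes) / sum(expected_risks)

# Two reviewers classify ten charts as appropriate ('app') or substandard ('sub').
a = ['app', 'app', 'sub', 'app', 'sub', 'app', 'app', 'sub', 'app', 'app']
b = ['app', 'sub', 'sub', 'app', 'app', 'app', 'app', 'sub', 'app', 'sub']
# Raw agreement is 70%, yet the chance-corrected kappa is only about 0.35.
print(f"kappa = {cohen_kappa(a, b):.2f}")

# Two deaths among five patients whose (hypothetical) model-based risks sum
# to 1.5 give an O/E ratio above 1, i.e. worse than expected, assuming the
# risk model itself can be trusted.
deaths = [1, 0, 0, 1, 0]
risks = [0.5, 0.2, 0.3, 0.4, 0.1]
print(f"O/E = {observed_expected_ratio(deaths, risks):.2f}")  # 1.33
```

The gap between 70% raw agreement and a kappa of 0.35 illustrates the point made in the text: apparently respectable levels of agreement can shrink sharply once agreement expected by chance is removed, and an O/E ratio is only as trustworthy as the risk model behind it.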
Outcome Bias and Related Factors

Of even more concern is the effect of knowledge of outcomes in biasing assessments of the level of care.20 Outcome bias is the relationship between the known severity of an outcome and the harshness with which people judge the alleged perpetrators of that outcome. The desire to allocate blame, and the tendency to render it in proportion to the outcome, is a natural human fallibility; in recent years it has been driven by the defensiveness resulting from the escalating application of tort law21 and dramatic press coverage of the estimated incidence and prevalence of medical error.22,23 This represents a significant retreat from an earlier position that ‘to seek absolute safety is to advocate therapeutic nihilism’.24 It has been suggested that this fulfils a need to annul wrong and lessen hurt (Merry and McCall Smith 2003).

Knowledge of outcomes changes both the harshness of implicit judgement and the willingness to render judgement. Furthermore, the specific nature of the outcome influences assessments. When trained reviewers are given cases in which minor adverse outcomes have been altered to more severe ones, the assessment of the quality of care changes sharply.25,26 Caplan et al. found that, for identical descriptions of the care provided, changing the outcomes given to reviewers increased the proportion of cases assessed as inappropriate by 14 percentage points (from 19% to 33% of the total) and decreased judgements of appropriate care by 31 percentage points (from 67% to 36% of the total, almost halved), and in some cases by as much as 61%.27 Judgement of care is thus highly dependent on context and ‘hindsight knowledge’ rather than on the actual care provided; this has been a consistent observation28,29,30,31,32 in many fields of human behaviour and has a sound basis in psychology.33,34,35 At best, retrospective chart reviews remain a ‘blunt instrument’36 with a built-in systematic bias towards negativity.

Process and outcome are at best loosely linked in complex and dynamic settings. Not all bad process results in adverse outcomes, and bad outcomes ‘occur despite good decisions, [including]…complete and thorough consideration of all of the available information, goals, and contributing factors’ (Edwards 1984). Even in the investigation of adverse outcomes, an adaptive learning framework demonstrates that the key ‘is not to focus on where people went wrong but to understand how their decisions and actions made sense at the time’ (Henriksen 2003). Henriksen also points out how conducive this pervasive bias is, in processes such as morbidity and mortality conferences (see above), to meting out ‘a fair share of shame and blame’, associated with ‘delusional clarity and important lessons unlearned’. This can be further compounded by another shortcoming, attribution error37:

“Rather than giving careful consideration to the full range of situational, organizational, and system related factors that are present when adverse events happen, humans serving in a supervisory or evaluative capacity tend to make dispositional attributions and view the mishap as evidence of an inherent character flaw or defect on the part of the individual involved.”

Other psychological factors operating in these circumstances include the strong sense of personal responsibility imbued in medical hierarchies, and notions of moral meaning and just reciprocation, in which consequences and causes of comparable magnitude are sought.
In a competitive medical culture, personal responsibility runs deep and successful outcomes are more highly valued than correct decision processes. Related to outcome bias is hindsight bias38, in which people exaggerate the extent to which individuals could have predicted the consequences of their actions.39,40,41 Alternatively, those with prior knowledge of the outcome believe falsely that they would have predicted it. As Arkes and colleagues point out, the natural human tendency is to try to make sense of a train of events rather than to analyze the data independently or reconstruct the events in context.42 Physicians are greatly susceptible to bias in their judgements, and hindsight bias is a compelling influence, leading to overconfidence in their own abilities and an inadequate appreciation of the difficulties of making the original decision. Hindsight bias works through ‘creeping determinism’, whereby outcome information is unconsciously integrated into knowledge of prior events.43 This is reinforced by secondary gain: the assessor appears more knowledgeable and intelligent, and maintains esteem by being intolerant of ambiguity.

Often forgotten in assessing quality of care is the ‘mirror image’ problem. An over-emphasis on, and misunderstanding of, the Hippocratic dictum ‘do no harm’ causes the harms of not acting to be overlooked in the equation.44 The risks of all feasible alternatives must be considered in aiding patient decision making.

Memory

Another major limitation in evaluative processes such as this is the reliance on memory. Henriksen and others45,46 point out that human memory is not only frail (limitations in the reproductive function of memory) but subject to systematic bias, so-called reconstructive memory47. In this process, recalled fragments are ordered and filled in with constructions that appear to make sense. Two factors conducive to systematic reconstruction are the passage of time and reliance on those not directly involved. Other factors include the way questions are framed48, assimilation to the familiar, and outcome bias.

Adaptive Learning

When these factors are placed in the broader context of adaptive learning49, we find evidence of motivational factors such as the desire to maintain public esteem or to exert more orderly control over one's environment.

Incorporating Patient Preferences

In defining quality of care, Brook50 identifies two components that, as he says, ‘are important to people’: care of high technical quality, and the recognition that ‘all patients wish to be treated in a humane and culturally appropriate manner and be invited to participate fully in deciding about their therapy’. Put simply, individuals’ value systems shape their choices. Naylor asks ‘What is appropriate care?’51, noting increased public concern about accountability with shrinking resources, and a seemingly endless series of studies revealing inexplicably wide variations in what physicians do and how they do it. However, he notes that the biggest problem is defining and measuring quality of care. It has been said that the cornerstone of good care is sound clinical judgement, but the difficulty lies in judging clinical judgement.
Naylor offers this definition: ‘an ability to determine what interventions (including calculated inactivity) will lead consistently to net benefits for individual patients.’ While this might seem straightforward, even the most rigorous tests of it give rise to concern: for instance, different panels of experts rated the same elective hysterectomies as inappropriate with as much as 100% variation between panels.52 Studies of this sort are widespread.53 In pondering these failures of agreement, he points out that a major failing of chart reviews, and of other forms of review, is that they ignore patients’ preferences: “Surely the key to more appropriate utilization is much greater patient involvement in decision making, not post hoc audits based on a few clinicians’ judgments”. Acknowledging that there are many ‘grey zone’ (Naylor 1995) areas of clinical practice, he attempted to answer his original question:

“What is appropriate care? The answer will often be that it depends. It depends on which clinicians are asked, where they live and work, what weight is given to different types of evidence and end points, whether one considers the preferences of patients and families, the level of resources in a given health system, and the prevailing values of both the system and the society in which it operates….[some] will undoubtedly dislike this ambiguity. Clinicians, however, may welcome it as affirmation that the art of medicine is unlikely to be managed away for many years to come.”

In an earlier paper (Naylor 1995) he characterised clinical medicine as ‘a few things we know, a few things we think we know (but probably don’t), and lots of things we don’t know at all’. Although lack of evidence characterises much of what we do, ‘paralytic indecisiveness’ is surprisingly uncommon, because we have become adept at making decisions under uncertainty and at confusing ‘personal opinion with evidence, or personal ignorance with genuine scientific uncertainty’. What is black and white in the abstract rapidly becomes grey when applied to meeting individual patients’ needs. This has been discussed in a separate document, Clinical Judgement: Decision Making Preferences. Put simply, the quality or appropriateness of care cannot be assessed in isolation from the individual patient and their family.

Measurement: Outcomes or Process?

Hofer et al. (2000) define the outcomes of interest as being at three levels: preventable morbidity, mortality and patient satisfaction. Essentially missing from most procedures such as the one discussed in this document is attention to overall, as opposed to individual, outcomes from which observers work backwards. As Brook and colleagues (2000) state: “Process measures are only as good as the evidence that associates them with improved outcomes. Process assessments produce the harshest judgment of the quality of care.” In one of the original studies by this RAND group, only 4 of 300 patient records examined were deemed to reflect appropriate care, yet in only 25% of cases could it be determined whether the outcome could have been improved by changes to the process, despite 98% of patients having received ‘inappropriate’ care.54 They also note that, for many proposed measures, it is extraordinarily difficult to determine what actually transpired. Just because something, such as a discussion, is not documented in a chart does not mean it did not take place. Assessments of process must therefore be reliably linked to events, desirable or undesirable (Brook 1973).
In addition, they question the equity or fairness of quality assessments that compare providers in the absence of well-documented reliability and validity, and they stress the importance of patient input to any assessment. Quality assessment remains largely a research tool. In particular, they noted that despite increasing emphasis on meeting patients’ expectations and on patient satisfaction as outcomes, chart reviews are unlikely to reflect this component.

Summary

Lack of epidemiological robustness and rigour, as in most of the error literature (Hofer 2000), severely hampers assessment of the quality and appropriateness of care. There are wide variations in how physicians practise, without these necessarily adversely affecting outcomes. Patients’ own values and preferences strongly influence both the process and the outcome of care, and physicians consistently fail either to estimate this accurately or to take it into account. Judgements on care are subject to severe bias, reflecting wisdom after the event and individual prejudices.

Understanding the problems inherent in assessing quality of care, with its limitations, lack of reliability and validity, and inherent negative bias, is an essential initial step in undertaking any judgement as to the quality of care delivered.

Dr Michael Goodyear, Department of Medicine, Dalhousie University, Halifax, Nova Scotia, Canada. March 2005.

References

1 Donabedian A. Quality assessment and assurance: unity of purpose, diversity of means. Inquiry. 1988;25:173-92.
2 Hofer TP, Kerr EA, Hayward RA. What is an error? Effective Clinical Practice. 2000 Nov.
3 Koran LM. The reliability of clinical methods, data and judgments (first of two parts). N Engl J Med. 1975 Sep 25;293(13):642-6.
4 Koran LM. The reliability of clinical methods, data and judgments (second of two parts). N Engl J Med. 1975 Oct 2;293(14):695-701.
5 Dubois RW, Brook RH. Preventable deaths: who, how often, and why? Ann Intern Med. 1988 Oct 1;109(7):582-9.
6 Brennan TA, Localio RJ, Laird NL. Reliability and validity of judgments concerning adverse events suffered by hospitalized patients. Med Care. 1989 Dec;27(12):1148-58.
7 Thomas EJ, Lipsitz SR, Studdert DM, Brennan TA. The reliability of medical record review for estimating adverse event rates. Ann Intern Med. 2002 Jun 4;136(11):812-6. Erratum in: Ann Intern Med 2002 Jul 16;137(2):147.
8 Localio AR, Weaver SL, Landis JR, Lawthers AG, Brennan TA, Hebert L, Sharp TJ. Identifying adverse events caused by medical care: degree of physician agreement in a retrospective chart review. Ann Intern Med. 1996 Sep 15;125(6):457-64.
9 Hayward RA, McMahon LF Jr, Bernard AM. Evaluating the care of general medicine inpatients: how good is implicit review? Ann Intern Med. 1993 Apr 1;118(7):550-6.
10 Hofer TP, Asch SM, Hayward RA, Rubenstein LV, Hogan MM, Adams J, Kerr EA. Profiling quality of care: is there a role for peer review? BMC Health Serv Res. 2004 May 19;4(1):9.
11 Goldman RL. The reliability of peer assessments of quality of care. JAMA 267: 958-960, 1992.
12 Brook RH, Appel FA. Quality-of-care assessment: choosing a method for peer review. N Engl J Med. 1973 Jun 21;288(25):1323-9.
13 Nobrega FT, Morrow GW, Smoldt RK, Offord KP. Quality assessment in hypertension: analysis of process and outcome methods. N Engl J Med. 1977 Jan 20;296(3):145-8.
14 Fessel WJ, Van Brunt EE. Assessing quality of care from the medical record. N Engl J Med. 1972 Jan 20;286(3):134-8.
15 Brook RH. Quality: can we measure it? N Engl J Med. 1977 Jan 20;296(3):170-2.
16 Schroeder SA, Kabcenell AI. Do bad outcomes mean substandard care? JAMA. 1991 Apr 17;265(15):1995.
17 Smith MA, Atherly AJ, Kane RL, Pacala JT. Peer review of the quality of care: reliability and sources of variability for outcome and process assessments. JAMA 278: 1573-1578, 1997.
18 Naylor CD. Public profiling of clinical performance. JAMA 287(10): 1323-1325, 2002.
19 Relman AS. Assessment and accountability: the third revolution in medical care. N Engl J Med 319: 1220-1222, 1988.
20 Henriksen K, Kaplan H. Hindsight bias, outcome knowledge and adaptive learning. Qual Saf Health Care. 2003 Dec;12 Suppl 2:ii46-50.
21 Richards C. Who’s to blame for bad luck? Chicago Sun Times July 9 2003: 51.
22 CNN. Physicians divided: report of medical-error deaths on target or exaggerated? CNN.com July 5 2000.
23 Wiley D. Mistakes that kill. Macleans.ca August 13 2001.
24 Mills DH. Medical insurance feasibility study: a technical study. West J Med 128: 360-5, 1978.
25 Merry A, McCall Smith A. Errors, medicine, and the law. Cambridge University Press, 2003. pp 162-4, 194-8.
26 LaBine SJ, LaBine G. Determination of negligence and the hindsight bias. Law Hum Behav 20: 501-516, 1996.
27 Caplan RA, Posner KL, Cheney FW. Effect of outcome on physician judgments of appropriateness of care. JAMA. 1991 Apr 17;265(15):1957-60.
28 Brennan TA, Leape LL, Laird NM, Hebert L, Localio AR, Lawthers AG, Newhouse JP, Weiler PC, Hiatt HH. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med. 1991 Feb 7;324(6):370-6.
29 Brennan TA, Sox CM, Burstin HR. Relation between negligent adverse events and the outcomes of medical-malpractice litigation. N Engl J Med 335: 1963-1967, 1996.
30 Berlin L. Malpractice issues in radiology: outcome bias. AJR 183: 557-560, 2004.
31 Morris JA Jr, Carillo Y, Jenkins JM et al. Surgical adverse events, risk management, and malpractice outcome: morbidity and mortality review is not enough. Ann Surg 237: 844-852, 2003.
32 Blendon RJ, DesRoches CM, Brodie M et al. Views of practising physicians and the public on medical errors. N Engl J Med 347: 1933-1940, 2002.
33 Hoch SJ, Loewenstein GF. Outcome feedback: hindsight and information. J Exp Psychol 15: 605-619, 1989.
34 Reason J. Human Error. Cambridge University Press, 1990.
35 Edwards W. How to make good decisions. Acta Psychol 56: 5-27, 1984.
36 Neale G, Woloshynowych M. Retrospective case record review: a blunt instrument that needs sharpening. Qual Saf Health Care. 2003 Feb;12(1):2-3.
37 Nisbett R, Ross L. Human inference: strategies and shortcomings of social judgments. Prentice-Hall, NY, 1980.
38 Berlin L. Malpractice issues in radiology: hindsight bias. AJR 175: 597-601, 2000.
39 Baron J, Hershey JC. Outcome bias in decision evaluation. J Personality Soc Psychol 54: 569-79, 1988.
40 Russo JE, Schoemaker PJH. Decision traps: the ten barriers to brilliant decision-making and how to overcome them. Simon & Schuster, NY, 1989.
41 Hawkins SA, Hastie R. Hindsight: biased judgements of past events after the outcomes are known. Psychol Bull 107: 311-327, 1990.
42 Arkes HR, Wortmann RL, Saville PD, Harkness AR. Hindsight bias among physicians weighing the likelihood of diagnoses. J Appl Psychol 66: 252-254, 1981.
43 Renfrew DI, Franken EA, Berbaum KS, Weigelt FH, Abu-Yousef MM. Error in radiology: classification and lessons in 182 cases presented at a problem case conference. Radiology 183: 145-50, 1992.
44 Bogardus ST, Holmboe E, Jekel JF. Perils, pitfalls and possibilities in talking about medical risk. JAMA 281(11): 1037-1041, 1999.
45 Redelmeier DA, Schull MJ, Hux JE et al. Problems for clinical judgement: 1. Eliciting an insightful history of present illness. CMAJ 164(5): 647-51, 2001.
46 Redelmeier DA, Schull MJ, Hux JE et al. Problems for clinical judgement: 2. Obtaining a reliable past medical history. CMAJ 164(6): 809-13, 2001.
47 Bartlett FC. Remembering: an experimental and social study. Cambridge University Press, 1932.
48 Loftus EF. The eyewitness on trial. Trial. October: 31-3, 1980.
49 Campbell JD, Tesser A. Motivational interpretations of hindsight bias: an individual difference analysis. J Personality 51: 605-20, 1983.
50 Brook RH, McGlynn EA, Shekelle PG. Defining and measuring quality of care: a perspective from US researchers. Int J Qual Health Care 12(4): 281-295, 2000.
51 Naylor CD. What is appropriate care? N Engl J Med. 1998 Jun 25;338(26):1918-20.
52 Shekelle PG, Kahan JP, Bernstein SJ, Leape LL, Kamberg CJ, Park RE. The reproducibility of a method to identify the overuse and underuse of medical procedures. N Engl J Med 1998;338:1888-1895.
53 Naylor CD. Grey zones of clinical practice: some limits to evidence-based medicine. Lancet 1995;345:840-842.
54 Brook RH, McGlynn EA, Cleary PD. Quality of health care. Part 2: measuring quality of care. N Engl J Med 335: 966-70, 1996.