Clinical Significance Editorial 2005

Clinical significance of patient-reported questionnaire data:
Another step toward consensus
Jeff A. Sloan, Ph.D.1
David Cella, Ph.D.2
Ron D. Hays, Ph.D.3,4
1. Mayo Clinic, Rochester, MN
2. Northwestern University and Evanston Northwestern Healthcare, Evanston, IL
3. University of California Los Angeles, Los Angeles, CA
4. RAND, Santa Monica, CA
Introduction
Much has been written and discussed in recent years on the “clinical significance” of
quality of life (QOL) differences. In this issue, Yost et al1 provide an example of how to
go about estimating minimally important differences when comparing groups of patients
for the Functional Assessment of Cancer Therapy – Colorectal (FACT-C). The article
discusses the application and limitations of various approaches to obtaining estimates for
minimally important effects. It also provides an opportunity to discuss the differences, or
lack thereof, between the “competing” estimation methods. The hope is that in doing so,
a common ground can be found that refocuses research efforts in this arena on the
dissemination of practical application of QOL assessments. That is the purpose of this
accompanying editorial.
Why do we need to define clinical significance?
Drawing from the seminal early summary of Lydick and Epstein2, Yost et al dichotomize
the various methods for estimating a minimal clinically significant difference: anchor-based
and distribution-based. Anchor-based methods link changes in the QOL measure to
other important variables (anchors) while distribution-based approaches link the changes
in QOL to the probability distribution of scores from statistical theory. Guyatt et al.3
provide a detailed summary of the various approaches used to date.
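The distinction between the two families can be made concrete with a minimal sketch. The function names, the anchor category "a little better," and the reliability value used in the test data are illustrative assumptions and are not taken from Yost et al. The standard error of measurement, computed as SD × sqrt(1 − reliability), is included as one commonly used distribution-based quantity.

```python
from math import sqrt
from statistics import mean, stdev


def distribution_based_estimates(baseline_scores, reliability):
    """Distribution-based estimates derived only from the spread of scores.

    Returns fractions of the sample SD and the standard error of
    measurement (SEM = SD * sqrt(1 - reliability)).
    """
    sd = stdev(baseline_scores)  # sample standard deviation (n - 1 denominator)
    return {
        "third_sd": sd / 3,
        "half_sd": sd / 2,
        "sem": sd * sqrt(1 - reliability),
    }


def anchor_based_estimate(changes, anchor_ratings, minimal_category="a little better"):
    """Anchor-based estimate: mean QOL change among patients whose external
    anchor rating falls in the (hypothetical) minimal-change category."""
    selected = [c for c, a in zip(changes, anchor_ratings) if a == minimal_category]
    if not selected:
        raise ValueError("no patients in the minimal-change anchor category")
    return mean(selected)
```

When the two kinds of estimate land close together for the same instrument, that agreement is itself evidence that the estimate is stable.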
Some clinical colleagues who are interested in understanding the meaningfulness of
patient-reported data have expressed frustration with the lack of clear guidelines on how
to interpret their data.4,5,6 QOL researchers have spent great efforts to convince them that
QOL assessments are an important and vital part of modern medicine.7 By the early
1990s, a myriad of new QOL assessments had entered the scientific literature.
Clinicians became more familiar with the concept of including QOL in clinical trials and
began asking practical questions regarding their usage.8 If QOL is an important clinical
outcome, then how does one interpret the results? How does one apply the results from
clinical trials comparing groups to the individual patient in the clinic?9 Can clinical
pathways include these tools? Clinicians began asking the same questions they would
about any other clinical measure, such as blood pressure or a component of a complete
blood culture analysis.
Foremost among these questions were “what do these scores mean?” and “how do we tell
when a change in a QOL score is important?” Jaeschke and Juniper were among the first
to tackle the concept of clinical significance while evaluating an asthma questionnaire10.
What followed was a series of disconnected attempts in the literature to define clinical
significance.3,4,11 This led to concerns that QOL assessments were not “hard science” and
therefore could not be trusted in the same way that laboratory assays were.12 The
combination of a lack of familiarity with self-reported endpoints and an unclear message
from the QOL “industry” led to clinical research articles questioning the value of QOL
assessments.13-17 One might say we were being told to “put up or shut up.” While this
may seem harsh, it is reasonable for clinicians to expect QOL assessments to show a
scientific integrity equal to that of other clinical indicators so that they may be incorporated into the
gestalt of medical practice.
A clinical significance Tower of Babel
QOL researchers have admirably taken up the task of defining clinical
significance. Numerous authors have discussed the various approaches and potential
definitions.3,11,18,19,20 Articles began to appear suggesting cutoff points for numerous
QOL assessments. The inherent problem in attempting to find a definition of clinical
significance for each tool in each clinical population raised the question of practicality.
Several authors attempted to find a unifying theme underlying the various
approaches.3,11,18,21,22
Recently, Norman et al23 presented an analysis that indicated a “remarkable universality”
among estimates of clinical significance that centered around roughly ½ times the
standard deviation (½ SD) of the QOL measure involved. Sloan19,24 used an analogy of
“worms, ducks, and elephants” to classify effect sizes as “small, moderate or large” and
suggested that the ½ SD was indeed a point of convergence among the various
approaches that would represent a clinically significant effect size. A consensus would
seem to be evolving.
Not all voices were in accord. Some authors questioned the simplicity of a single estimate
for clinical significance,25 pointing out that a range of scores might be more appropriate
across different applications and/or different populations. Perhaps this debate26 can be
resolved by considering the simple, single value as a guideline rather than a rule. Further
research on any clinical “rule of thumb” is informative, and may modify a guideline, but
a common starting point can facilitate progress. A clinical example in oncology might be
useful. For decades oncologists have considered 50% tumor shrinkage to be a meaningful
response to therapy. This is used in oncology despite the fact that a 50% tumor shrinkage
has more meaning in some tumors than others, and within a given tumor type,
meaningfulness of tumor response depends on many factors and can have many different
values. If a patient realizes 40% shrinkage, this may in the individual case be meaningful
(e.g., symptom reduction), but today’s convention would not deem this as a response.
Nevertheless, tumor response rate is enabled in clinical trials because of the acceptance of
this imperfect convention. Perhaps symptom and quality of life response rates could be
similarly encouraged in such a way to facilitate new research.
A further confusion in this field has arisen regarding the term “minimal.” Numerous
alternative terms including “minimal important difference,” “clinically important
difference,” “minimum clinically important difference,” “clinically significant
difference” and other word combinations appeared in the literature. Some authors
suggested that the ½ SD may be “too large to be minimal” and that perhaps 1/4 or 1/3 SD
was more representative of a “minimal difference.”27 Indeed, Cohen’s widely used rules
of thumb for interpreting the magnitude of differences offer 0.50 SD as a “medium”
effect size and 0.20 SD as a “small” effect. Some have suggested that “minimal”
differences might be closer to Cohen’s small effect size than to a medium effect,24,28
while others have found “medium” effects the same size as “minimally important
differences.”10,23,24 Whether or not the different definitions were supportable
methodologically or intuitively was unclear.
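For readers less familiar with standardized effect sizes, the following is a minimal sketch of how a between-group difference is expressed in SD units and labeled with Cohen's rules of thumb. The 0.20 and 0.50 cut points are those cited above; the 0.80 "large" cut point is Cohen's conventional third threshold and is added here for completeness. The function names and the pooled-SD formulation are illustrative choices, not a method prescribed by any of the cited authors.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohen_label(d):
    """Label the magnitude of a standardized effect using Cohen's rules of thumb."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

On this scale, a ½ SD difference sits exactly at the "medium" boundary, while 1/4 or 1/3 SD falls in the "small" range, which is the crux of the disagreement over the word "minimal."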
The sum total of this discourse was that the basic question of interpreting QOL
assessments remained unanswered. QOL researchers need to recognize that if clinical
colleagues perceive that the psychometricians cannot agree among themselves on basic
scientific issues then this can feed the opinion that perhaps QOL is indeed too soft a
science to receive the same priority as traditional clinical parameters.
Can we agree?
In fact, the differences may not be as stark as one might think. As Yost et al1
demonstrate, it is possible to carry out a detailed analysis of clinical significance for a
single tool and derive estimates using the various approaches. The strength of the
scientific method is inherent in the fact that all the approaches give reasonably similar
answers. This is an important point, because if the various methods indicated wildly
variable results, we would have to question the underlying process of obtaining QOL
scores. Statistical theory tells us that ultimately all roads converge asymptotically if they
are indeed assessing the same construct. Individual variability across specific samples is
to be expected, characterized and then incorporated into the analysis rather than cited as a
weakness.
The differences among the various definitions can be resolved in part with a convergence
of the nomenclature involved, especially with respect to the term “minimum.” Although
many authors have asked patients how much of a difference might be clinically
important, few have incorporated the concept of a “minimum” in a statistical sense. Many
have asked the question more in terms of what might be an important difference and then
attached the term “minimum”. Perhaps the best way to resolve these grammatical
differences is to remove the term “minimum” entirely and just talk in terms of a clinically
meaningful or clinically significant effect. If these terms became the common vernacular,
it would be more palatable to our consumers, the clinicians, and remove us from the
myriad of acronyms associated with the term clinical significance.
A unifying clarification?
We hope we have clarified these issues to some extent. More importantly, we hope that
the science can move beyond psychometric minutiae to clinical necessity so that our
clinical colleagues may eventually come to accept QOL assessments as enthusiastically
as we do, enabling patient voices to be heard in the decision-making equation. Towards
these ends, we propose the following guidelines:
1) The method used to obtain an estimate of clinical significance should be
scientifically supportable.
2) The ½ SD is a conservative estimate of an effect size that is likely to be clinically
meaningful. An effect size greater than ½ SD is not likely to be one that can be
ignored. In the absence of other information, the ½ SD is a reasonable and
scientifically supportable estimate of a meaningful effect.
3) Effect sizes below ½ SD, supported by data regarding the specific characteristics
of a particular QOL assessment or application, may also be meaningful. The
minimally important difference may be below ½ SD in such cases.
4) If feasible, multiple approaches to estimating a tool’s clinically meaningful effect
size in multiple patient groups are helpful in assessing the variability of the
estimates. However, the lack of multiple approaches with multiple groups should
not preemptively restrict application of information gained to date.
The four points are intended as guidelines, not rules. Perhaps we can all agree on these
starting points.
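As one possible reading of guidelines 2 and 3, the sketch below screens an observed difference against ½ SD and, where available, against a smaller tool-specific estimate. The function name, its parameters, and the wording of the returned labels are hypothetical. In keeping with the spirit of the guidelines, it is meant as a starting point for interpretation rather than a decision rule.

```python
def interpret_difference(observed_diff, sd, tool_specific_mid=None):
    """Screen an observed QOL difference against the 1/2 SD guideline.

    observed_diff     -- difference or change in QOL score (same units as sd)
    sd                -- standard deviation of the QOL measure
    tool_specific_mid -- optional smaller estimate supported by data for this
                         particular instrument or application (guideline 3)
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    size = abs(observed_diff)
    if size >= 0.5 * sd:
        # Guideline 2: an effect of this size is unlikely to be ignorable.
        return "likely clinically meaningful (at or above 1/2 SD)"
    if tool_specific_mid is not None and size >= tool_specific_mid:
        # Guideline 3: smaller effects may be meaningful when supported by data.
        return "may be meaningful (below 1/2 SD, at or above tool-specific estimate)"
    return "not established as meaningful on this evidence"
```

For example, with an SD of 15 points, an 8-point difference clears the ½ SD guideline, whereas a 5-point difference would need a supporting tool-specific estimate to be called meaningful.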
The encouraging message is that the evidence to date suggests that all approaches of
estimating clinical significance converge more than they diverge. This tenet is important
to keep in mind when one is engrossed in the morass of psychometric anomalies. While
all of us in the QOL “industry” enjoy discovering and describing the sources of
variability, the “noise,” in QOL assessment, users of these instruments require practical
guidance on “the signal” to apply and use them in the clinical world. We hope this
editorial has helped clarify when the signal can be considered important.
References
1. Yost KJ, Cella D, Chawla A, Holmgren E, Eton DT, Ayanian JZ, West DW.
Minimally important differences were estimated for the Functional Assessment of Cancer
Therapy – Colorectal (FACT-C) instrument using a combination of distribution- and
anchor-based approaches.
2. Lydick E, Epstein RS. Interpretation of quality of life changes. Qual Life Res. 2:221-226, 1993.
3. Guyatt GH, Osoba D, Wu AW, Wyrwich KW, Norman GR, Clinical Significance
Consensus Meeting Group. Methods to Explain the Clinical Significance of Health
Status Measures. Mayo Clin Proc 371-383, 2002.
4. Moser DK. Psychosocial factors and their association with clinical outcomes in
patients with heart failure: why clinicians do not seem to care. Eur J Cardiovasc Nurs.
Oct;1(3):183-8, 2002.
5. Unruh ML, Weisbord SD, Kimmel PL. Health-related quality of life in nephrology
research and clinical practice. Semin Dial. Mar-Apr;18(2):82-90, 2005.
6. Morreim EH. The impossibility and necessity of quality of life research. Bioethics
6:218-32, 1992.
7. Osoba D, Bezjak A, Brundage M, Zee B, Tu D, Pater J; Quality of Life Committee of
the NCIC CTG. Analysis and interpretation of health-related quality-of-life data from
clinical trials: basic approach of The National Cancer Institute of Canada Clinical Trials
Group. Eur J Cancer. 41(2):280-7, 2005.
8. Ayanian JZ, Chrischilles EA, Wallace RB, Fletcher RH, Fouad MN, Kiefe CI,
Harrington DP, Weeks JC, Kahn KL, Malin JL, Lipscomb J, Potosky AL, Provenzale
DT, Sandler RS, van Ryn M, West DW. Understanding Cancer Treatment and
Outcomes: The Cancer Care Outcomes Research and Surveillance Consortium. Journal
of Clinical Oncology, Comments and Controversies, Vol 22(15):August 1, 2004.
9. Hays, R. D., Brodsky, M., Johnston, M. F., Spritzer, K. L., & Hui, K. Evaluating the
statistical significance of health-related quality of life change in individual patients.
Evaluation and the Health Professions, 28, 160-171, 2005.
10. Jaeschke R, Singer J, Guyatt GH. Measurements of health status: ascertaining the
minimal clinically important difference. Cont Clin Trials 10:407-415, 1989.
11. Osoba D. A Taxonomy of the Uses of Health-Related Quality-of-Life Instruments in
Cancer Care and the Clinical Meaningfulness of the Results. Med. Care.
40(6)(Supplement):III-31-III-38, 2002.
12. Frost MH, Sloan JA. Quality of Life Measurements: A soft outcome-or is it? The
American Journal of Managed Care 8(18):S574-S579, 2002.
13. Movsas B. Scott C. Quality-of-life trials in lung cancer: past achievements and future
challenges. Hematology - Oncology Clinics of North America. 18(1):161-86, 2004.
14. Bottomley A. Vanvoorden V. Flechtner H. Therasse P. EORTC Quality of Life
Group EORTC Data Center. The challenges and achievements involved in implementing
Quality of Life research in cancer clinical trials. European Journal of Cancer. 39(3):275-85, 2003.
15. Giesler RB. Williams SD. Opportunities and challenges: assessing quality of life in
clinical trials. Journal of the National Cancer Institute. 90(20):1498-9, 1998.
16. Baars RM. van der Pal SM. Koopman HM. Wit JM. Clinicians' perspective on quality
of life assessment in paediatric clinical practice. Acta Paediatrica. 93(10):1356-62, 2004.
17. Welsh M. Parkinson's disease and quality of life: issues and challenges beyond motor
symptoms. Neurologic Clinics. 22(3 Suppl):S141-8, 2004.
18. Wyrwich KW. Minimal Important Difference Thresholds and the Standard Error of
Measurement: Is There a Connection? Journal of Biopharmaceutical Statistics 14(1):97-110, 2004.
19. Sloan J, Symonds T, Vargas-Chanes D, Fridley B. Practical guidelines for assessing
the clinical significance of health-related quality of life changes within clinical trials.
Drug Information Journal 37:23-31, 2003.
20. Tugwell P, Boers M, Brooks PM, Simon L, Strand CV. OMERACT 5:
International Consensus Conference on Outcome Measures in Rheumatology: Minimal
Clinically Important Difference Module. J Rheumatol 28:395-460, 2001.
21. Sloan JA. Assessing the Minimally Clinically Significant Difference: Scientific
Considerations, Challenges and Solutions. Journal of Chronic Obstructive Pulmonary
Disease 2:57-62, 2005.
22. Tugwell P, Guyatt G. Generating measurements, especially for quality of life,
in Clinical Epidemiology, Third Edition. R. Brian Haynes, David L. Sackett,
Gordon H. Guyatt, Peter Tugwell. Lippincott, Wilkins and Williams, New York (in
press).
23. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related
quality of life. The remarkable universality of a half a standard deviation. Medical Care
41(5):582-592, 2003.
24. Sloan JA, Vargas-Chanes D, Kamath CC, Sargent DJ, Novotny PJ, Atherton P,
Allmer C, Fridley BL, Frost MH, Loprinzi CL. Detecting worms, ducks and elephants: A
simple approach for defining clinically relevant effects in quality-of-life measures. J
Cancer Integrative Medicine 1(1):41-47, 2003.
25. Hays RD, Woolley JM. The Concept of Clinically Meaningful Difference in
Health-Related Quality-of-Life Research. How Meaningful is it? Pharmacoeconomics 419-423,
2000.
26. Norman GR, Sloan JA, Wyrwich KW. The truly remarkable universality of half a
standard deviation: confirmation through another look. Expert Review of
Pharmacoeconomics and Outcomes Research 4(5): 515 – 519, 2004.
27. Cella DF, Zagari MJ, Vandoros C, Gagnon DD, Huntz HJ, Nortier JWR. Epoetin
alfa treatment results in clinically significant improvements in quality of life (QOL) in
anemic cancer patients when referenced to the general population. J Clin Oncol
21(2):366-373, 2003.
28. Farivar SS, Liu H, Hays RD. Another look at the half standard deviation estimate of
the minimally important difference in health-related quality of life scores. Expert Rev.
Pharmacoeconomics Outcomes Res 4(5):521-529, 2004.