Appendix G ASSESSING CLINICAL SIGNIFICANCE STATISTICALLY

Appendix Outline:
INTRODUCTION
STATISTICALLY RELIABLE CHANGE
NORMATIVE COMPARISONS
STATISTICALLY RELIABLE CHANGE PLUS RECOVERY
INTRODUCTION
As promised in Chapter 16, this appendix will provide some additional statistical
information to help you include a statistical approach for assessing clinical significance
when writing proposals or interpreting findings of completed studies. You may recall that
clinical significance refers not only to the meaningfulness and practical value of the
overall findings of a study, but also to the meaningfulness and practical value of the
benefits of an intervention for each individual recipient of the evaluated intervention.
Significant differences between group means, for example, don’t tell us how many
individual clients in each group experienced clinically significant improvements. Three
statistical approaches have been proposed for measuring clinical significance among
individual clients: 1) statistically reliable change; 2) normative comparisons; and 3)
statistically reliable change plus recovery. Let’s begin with the statistically reliable
change approach.
STATISTICALLY RELIABLE CHANGE
Ogles, Lunnen, and Bonesteel (2001) provide an overview of the different
statistical methods that have been developed using the statistically reliable change
approach. Each method applies a formula separately to every client to assess whether
their amount of change from pretest to posttest can be attributed to measurement error.
Consequently, this approach can be applied only when both pretests and posttests are
Appendix G May 2005 Evidence-Based
612
used. Other limitations of this approach are discussed in Chapter 16. At this point we
will examine the most frequently used statistical formula for assessing the reliability of
individual change scores: the Jacobson and Truax (1991) Reliable Change Index (RCI).
The formula is as follows:
\[
\mathrm{RCI} = \frac{X_{\text{post}} - X_{\text{pre}}}{S_{\text{diff}}}
\]
where
$X_{\text{pre}}$ = the pretest score
$X_{\text{post}}$ = the posttest score
$S_{\text{diff}}$ = the standard error of the difference between the two test scores
To calculate the standard error of the difference between the two test scores, we
apply the denominator from the t-test formula, which we first encountered in Chapter 12.
Here it is again:
\[
S_{\text{diff}} = \left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)
\]
where
$s_1^2$ is the variance of the pretest scores (the square of the standard
deviation of the pretest scores; see Chapter 6 to be reminded about
calculating the standard deviation).
$s_2^2$ is the variance of the posttest scores (the squared standard deviation
of the posttest scores).
$N_1$ is the size of the first sample.
$N_2$ is the size of the second sample.
Let’s calculate an RCI for each of two experimental group clients in an imaginary
pretest-posttest experiment that found statistically significant results on a scale measuring
well-being. Let’s assume that the experimental group and control group each had a
sample size of 50 and that the pretest and posttest standard deviations were both 10. Next,
let’s imagine that one client, Thelma, improved from a pretest score of 40 to a posttest
score of 50. Thelma’s RCI would be as follows:
\[
\begin{aligned}
\mathrm{RCI} &= \frac{50 - 40}{\left(\dfrac{50 \times 10^2 + 50 \times 10^2}{50 + 50 - 2}\right)\left(\dfrac{50 + 50}{50 \times 50}\right)} \\[6pt]
&= \frac{10}{\left(\dfrac{50 \times 100 + 50 \times 100}{50 + 50 - 2}\right)\left(\dfrac{100}{2500}\right)} \\[6pt]
&= \frac{10}{\left(\dfrac{5000 + 5000}{50 + 50 - 2}\right)\left(\dfrac{1}{25}\right)} \\[6pt]
&= \frac{10}{\left(\dfrac{10000}{98}\right)\left(\dfrac{1}{25}\right)} \\[6pt]
&= \frac{10}{(102)(.04)} \\[6pt]
&= \frac{10}{4.08} \\[6pt]
&= 2.45
\end{aligned}
\]
Thus, Thelma’s RCI is 2.45. Let’s assume that a second imaginary client, Louise,
improved from a pretest score of 40 to a posttest score of 46. In calculating her RCI, the
denominator will be the same, but the numerator will be 6 (because 46 minus 40 equals
6). Thus, Louise’s RCI would be 6/4.08 = 1.47.
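
The two calculations above can be sketched in a few lines of Python. This is only an illustration: the function names `s_diff` and `rci` are hypothetical, and `s_diff` reproduces the arithmetic exactly as it appears in the worked example (the product of the two fractions, with no square root taken). Whether that matches the t-test denominator as presented in Chapter 12 is not something this sketch verifies.

```python
def s_diff(n1, sd1, n2, sd2):
    """Denominator of the RCI, computed the way the worked example computes it.

    Assumption: the two fractions are multiplied and no square root is applied,
    mirroring the printed arithmetic (102 x .04 = 4.08).
    """
    variance_term = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    size_term = (n1 + n2) / (n1 * n2)
    return variance_term * size_term


def rci(pre, post, denom):
    """Reliable Change Index: change score divided by the denominator."""
    return (post - pre) / denom


# Values from the imaginary study: two samples of 50, both SDs equal to 10
denom = s_diff(50, 10, 50, 10)
thelma = rci(40, 50, denom)
louise = rci(40, 46, denom)

print(round(denom, 2))   # 4.08
print(round(thelma, 2))  # 2.45
print(round(louise, 2))  # 1.47
```

Because the denominator depends only on the sample sizes and standard deviations, it is computed once and reused for every client.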
So, what do these two RCIs mean? To answer that question, we should recall that
the standard error represents one standard deviation in a normal distribution. We should
also recall (from Chapter 7) that only 5 percent of the normal curve falls at least 1.96
standard deviations above or below the mean. Therefore, an RCI of at least 1.96 is needed
to rule out the plausibility of measurement error as an explanation for the improvement
and thus deem the amount of change to be statistically reliable. Using this approach to
clinical significance, each client who achieves an RCI of at least 1.96 is considered to
have made a clinically significant amount of improvement. Thus, Thelma, with her RCI
of 2.45, made clinically significant improvement, but Louise, with her RCI of 1.47, fell
short of clinical significance. When reporting the results for an entire sample, we would
indicate the proportion of clients in each group who made a clinically significant amount
of improvement based on their RCI scores.
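
As a rough sketch of that reporting step, the snippet below flags each client whose RCI is at least 1.96 and returns the proportion for a group. The function name and the list of score pairs are hypothetical. Only the first two pairs (Thelma and Louise) and the denominator of 4.08 come from the example above.

```python
RCI_THRESHOLD = 1.96  # cutoff used in this appendix


def proportion_reliably_changed(score_pairs, denom, threshold=RCI_THRESHOLD):
    """Share of clients whose RCI is at least the threshold.

    score_pairs: list of (pretest, posttest) tuples, higher scores = better.
    denom: the RCI denominator, computed once for the whole study.
    """
    flags = [(post - pre) / denom >= threshold for pre, post in score_pairs]
    return sum(flags) / len(flags)


# Hypothetical experimental-group scores; the first two are Thelma and Louise
experimental = [(40, 50), (40, 46), (35, 44), (42, 45)]
share = proportion_reliably_changed(experimental, denom=4.08)
print(share)  # 0.5
```

The same function would be applied to the control group so that the two proportions can be reported side by side.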
NORMATIVE COMPARISONS
As mentioned in Chapter 16, the second statistical approach to clinical
significance emphasizes recovery and the use of normative comparisons. A nonstatistical application of this approach is to conduct diagnostic interviews before and after
treatment to see if clients who meet the criteria for a particular disorder before treatment
no longer meet those criteria after treatment.
A statistical application suggested by Jacobson, Follette and Revenstorf (1984)
requires the use of a standardized measurement instrument that has had norms established
for populations with and without a particular disorder. Clinical significance would be
attained by clients who improve from pretest scores closer to the mean of the clinical
population (those with the disorder) to posttest scores that are closer to the mean of the
normal population than to the mean of the clinical population.
Suppose previous large scale studies have established that the mean on a
depression scale is 70 for people in treatment for depression, with a standard deviation of
10, while the mean and standard deviation of people not in treatment for depression are
40 and 10. Figure 1 displays the two distributions. A person who scores 50 on the scale is
one standard deviation above the normal population mean, but two standard deviations
below the clinical population mean. Thus, that person is more likely to be part of the
normal population than the clinical one. Conversely, a person who scores 60 on the scale is one
standard deviation below the clinical population mean, but two standard deviations above
the normal population mean. Thus, that person is more likely to be part of the clinical
population than the normal one.
What about a person who scores 55? That person has an equal probability of
belonging to either population, because they are at a midpoint of 1.5 standard deviations
away from either mean. One criterion for clinical significance using this approach,
therefore, would be for the client’s posttest score to improve from the clinical side of that
midpoint to the normal side of it. Then they would have moved from a pretest score
indicating that they were more likely to belong to the clinical population to a posttest
score indicating that they are more likely to belong to the normal population.
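
The midpoint reasoning can be checked with a short Python sketch. The function names are hypothetical. The cutoff formula is simply the score that lies the same number of standard deviations from both means, which reduces to the ordinary midpoint when the two standard deviations are equal, as they are in this example. The sketch assumes a scale, like the depression scale here, on which lower scores are better.

```python
def z_distance(score, mean, sd):
    """Distance of a score from a mean, in standard deviation units."""
    return abs(score - mean) / sd


def equal_distance_cutoff(clinical_mean, clinical_sd, normal_mean, normal_sd):
    """Score that is equally many SDs from the clinical and normal means."""
    return (normal_sd * clinical_mean + clinical_sd * normal_mean) / (
        clinical_sd + normal_sd
    )


def crossed_to_normal_side(pre, post, cutoff):
    """True if the pretest is on the clinical side and the posttest is on the
    normal side of the cutoff (lower scores are better on this scale)."""
    return pre > cutoff and post < cutoff


cutoff = equal_distance_cutoff(70, 10, 40, 10)
print(cutoff)                      # 55.0
print(z_distance(cutoff, 70, 10))  # 1.5
print(z_distance(cutoff, 40, 10))  # 1.5
print(crossed_to_normal_side(pre=68, post=50, cutoff=cutoff))  # True
```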
Alternative criteria can be used, though you may deem them too lenient or too
stringent. One that may be too stringent is to require a posttest score that is at least two
standard deviations better than the clinical mean, which – for our imaginary study –
would be 50 or lower in Figure 1. On the lenient side, we may require a posttest score of
only one standard deviation better than the clinical mean (a score of 60 in Figure 1), since
we know that only 16 percent of the normal curve is that much better than the mean (as
discussed in Chapter 7). Although there is no firm rule about which cutoff point to use,
the midpoint (a score of 55 in Figure 1) may be the best compromise: it is neither too
lenient nor too stringent, and choosing either the more stringent or the more lenient
alternative is a somewhat arbitrary decision.
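
To compare the three alternatives side by side, the following Python sketch computes each cutoff for the imaginary depression scale and checks a hypothetical posttest score of 58 against them. It also recomputes the share of a normal curve lying at least one standard deviation below the mean, using the standard library's `math.erf`. The variable names and the example score are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


clinical_mean, normal_mean, sd = 70, 40, 10

# Lower scores are better on this scale, so "better than the clinical mean"
# means below it.
cutoffs = {
    "stringent": clinical_mean - 2 * sd,            # 50
    "midpoint": (clinical_mean + normal_mean) / 2,  # 55.0
    "lenient": clinical_mean - 1 * sd,              # 60
}

posttest = 58  # hypothetical client
passes = {name: posttest <= value for name, value in cutoffs.items()}
print(passes)  # {'stringent': False, 'midpoint': False, 'lenient': True}

# Share of a normal curve at least one SD below its mean (roughly 16 percent)
print(round(normal_cdf(-1.0), 3))  # 0.159
```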
The normative approach can be used with group means as well as with individual
client scores. When means are used instead of individual scores, we would apply the
chosen cutoff alternative to the posttest mean of the experimental group receiving the
intervention rather than to individual client scores. When individual scores are used
instead of the group mean, we would (as with the RCI) report the proportion of clients
who improved past the chosen cutoff point for clinical significance.
Insert Figure 1 about here.
STATISTICALLY RELIABLE CHANGE PLUS RECOVERY
This third approach is recommended as the most likely to impress favorably those
proposal reviewers who want to see a statistical approach to clinical significance. To
have clinically significant improvement, clients would have to meet two criteria. First,
they would need an RCI (from pretest to posttest) of at least 1.96. Second, their posttest
score would have to pass the selected normative comparison cutoff point, such as being
on the normal side of the midpoint between the two distributions. Thus, to use this
approach, you simply apply the statistical procedures discussed above both for
statistically reliable change and for recovery using normative comparisons. Then you can
report the proportion of clients who meet both criteria. You could also report the
proportion meeting only the statistically reliable change criterion and the proportion
meeting only the recovery criterion. Reporting all three of these proportions enables
readers to use the criterion they might prefer (if any) in judging the clinical significance
of your findings.
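
A minimal sketch of that three-way report follows. Everything in it is hypothetical: the function name, the client scores, and the reuse of 4.08 as the RCI denominator. It assumes a depression-type scale on which lower scores are better, so improvement is measured as pretest minus posttest and recovery means a posttest score below the midpoint cutoff of 55.

```python
def summarize(clients, denom, cutoff, rci_threshold=1.96):
    """Proportions meeting each criterion and both criteria.

    clients: list of (pretest, posttest) tuples, lower scores = better.
    """
    n = len(clients)
    reliable = [(pre - post) / denom >= rci_threshold for pre, post in clients]
    recovered = [post < cutoff for _, post in clients]
    both = [r and c for r, c in zip(reliable, recovered)]
    return {
        "reliable_change": sum(reliable) / n,
        "recovery": sum(recovered) / n,
        "both": sum(both) / n,
    }


# Hypothetical (pretest, posttest) depression scores
clients = [(72, 50), (58, 53), (80, 65), (66, 64)]
result = summarize(clients, denom=4.08, cutoff=55)
print(result)  # {'reliable_change': 0.5, 'recovery': 0.5, 'both': 0.25}
```

In this made-up group, half the clients changed reliably and half ended on the normal side of the cutoff, but only one in four did both, which shows why the combined criterion is the most stringent of the three.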
If you are dubious about using any of the foregoing three statistical methods to
judge clinical significance, you have lots of company. Each is controversial and makes
some problematic assumptions, as discussed in Chapter 16. These assumptions are
particularly problematic in many practice evaluations. For some evaluation studies, none
of the approaches may be applicable. Although the third approach, which combines the
first two, may be the one most preferred by some reviewers, requiring both statistically
reliable change and recovery is the most stringent criterion and is least likely to be
applicable to evaluations in social work and other human services. Consequently, if you
use this approach (and even if you use one of the less stringent approaches), you should
let your readers know that you are aware of its controversies and limitations. Some
reviewers may not favor your chosen approach, and some may reject the entire notion of
using any of these statistical approaches to clinical significance. So you don’t want to
appear unaware of those issues or unlikely to express appropriate cautions when reporting
the results of the approach you choose.
Figure 1. Hypothetical Distributions of Depression Scale Scores for a Normal and a
Clinical Population
[Figure not reproduced: two distributions labeled "Normal Population" and "Clinical
Population". Marked scores: 40, 50, 55, 60, 70. Annotations: "Clinically Significant
Post-Test Scores (more likely to belong to normal population than to clinical
population)" and "Clinically Significant Improvement".]