Power and Effect Size

Dr. Tom Kuczek
Purdue University
Power of a Statistical test
Power is the probability that the test detects a true difference in
means under a given set of circumstances. Power,
denoted 1-β, depends on:
 n = the sample size in each group
 α = the probability of a Type I error
 σ = the standard deviation of the experimental error
 |µ1-µ2| = the absolute difference between the means of the two
treatment groups
What affects Power of a Statistical test?
Power increases under the following conditions:
 n increases (may be costly)
 α increases (but at the cost of more Type I errors, which is
why α ≤ 0.05 is recommended)
 σ decreases (will be discussed shortly)
 |µ1-µ2| increases (not an option if the treatments are
fixed)
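These dependencies can be illustrated numerically. Below is a minimal Python sketch, using the two-sided two-sample z-test as a stand-in for the t-test (a normal approximation, reasonable for moderate n) with hypothetical parameter values:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_ppf(p):
    """Standard normal quantile by bisection (sufficient for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(n, alpha, sigma, delta):
    """Approximate power of a two-sided two-sample z-test with n units
    per group, where delta = |mu1 - mu2| and sigma is the error SD."""
    z = norm_ppf(1.0 - alpha / 2.0)        # critical value
    ncp = delta / (sigma * sqrt(2.0 / n))  # shift of the test statistic
    return (1.0 - norm_cdf(z - ncp)) + norm_cdf(-z - ncp)

# Each change below moves power in the direction listed above:
base = power(n=16, alpha=0.05, sigma=1.0, delta=0.5)
print(round(base, 3))
print(power(32, 0.05, 1.0, 0.5) > base)   # larger n
print(power(16, 0.10, 1.0, 0.5) > base)   # larger alpha
print(power(16, 0.05, 0.5, 0.5) > base)   # smaller sigma
print(power(16, 0.05, 1.0, 1.0) > base)   # larger |mu1 - mu2|
```

Note that at delta = 0 the function returns exactly α, as it should: with no true difference, "rejecting" happens only by Type I error.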
Experimental Error
Experimental error, quantified by σ, actually has a
number of components including:
 The variability in experimental units
 The variability of experimental conditions
 The variability inherent in the measurement system
 Nuisance variables, generally unobserved
Design issues related to increasing Power
 Increasing the sample size n always works, but may not
be feasible, either due to financial constraints or
physical constraints on the experiment (such as a
limited amount of test material).
 If n and α are fixed, then the Power of the t-test
depends only on the ratio |µ1-µ2|/σ; as this ratio
increases, so does the Power of the test.
 If the method of applying the treatments is fixed then
|µ1-µ2| is likely fixed. If n is also fixed, then the only
option is to somehow reduce the experimental error σ.
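That power depends only on the ratio |µ1-µ2|/σ (for fixed n and α) can be checked by simulation. The sketch below assumes a z-test with known σ and made-up parameter values; two settings with very different means and SDs but the same ratio give essentially the same rejection rate:

```python
import random
from statistics import fmean

def empirical_power(n, sigma, delta, reps=4000, z_crit=1.96, seed=1):
    """Simulate two-sample z-tests (known sigma) and return the
    fraction of trials in which H0: mu1 = mu2 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        g1 = [rng.gauss(0.0, sigma) for _ in range(n)]
        g2 = [rng.gauss(delta, sigma) for _ in range(n)]
        z = (fmean(g2) - fmean(g1)) / (sigma * (2.0 / n) ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# Same ratio |mu1 - mu2| / sigma = 0.5 in both settings:
p_a = empirical_power(n=16, sigma=1.0, delta=0.5, seed=1)
p_b = empirical_power(n=16, sigma=4.0, delta=2.0, seed=2)
print(p_a, p_b)  # nearly equal: power depends on the ratio, not the pair
```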
How to reduce experimental error?
 The best way is to get a better measurement system with
lower variability. This also improves any estimates of
treatment effects. (example: qt-PCR vs. Microarray)
 Another way is to make the experimental conditions as
tightly controlled as possible. This may negatively affect
inference from the experiment (if one wishes to claim that
the treatment works better under a variety of conditions).
 Reducing variability in experimental units also negatively
affects inference, especially in clinical trials, since one
would like to say the treatment works best on as many
subjects as possible.
Statistical vs. Practical Significance
If |µ1-µ2| > 0, then for a large enough n the null
hypothesis H0 will almost certainly be rejected, giving
Statistical Significance. The problem is that if |µ1-µ2| is
“small”, say near zero, there may be no practical
difference between the responses to the two
treatments. This complicates recommending which
treatment to apply to a population: the treatment
with the “better” mean may not make economic sense
if there is no practical difference.
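A quick illustration with the normal-approximation power formula (hypothetical numbers): a difference of only 0.05 standard deviations is of doubtful practical importance, yet with a huge n it is detected almost surely.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approx_power(n, delta_over_sigma, z_crit=1.96):
    """Power of a two-sided two-sample z-test at alpha ~ 0.05,
    with n units per group and standardized difference delta/sigma."""
    ncp = delta_over_sigma * sqrt(n / 2.0)
    return (1.0 - norm_cdf(z_crit - ncp)) + norm_cdf(-z_crit - ncp)

# A trivially small standardized difference of 0.05:
for n in (100, 10_000, 1_000_000):
    print(n, round(approx_power(n, 0.05), 3))
```

The rejection probability climbs from near α toward 1 as n grows, even though the underlying difference never stops being practically negligible.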
Be careful about Clinical Trials
There is a fine point here. With regard to clinical trials,
a subject may respond better to one treatment than
another. The ideal thing in this situation is to test the
subject with both treatments at different time points.
This will be addressed later in the semester when we
discuss Crossover Designs.
How to address Practical Significance in reporting
results of an experiment
 In certain areas of Science, especially Biomedical
Research, this is addressed by reporting the Effect
Size.
 The Effect Size is a way to quantify practical
significance for a population.
 The Effect Size, usually denoted ES (or d) in the
literature, is defined by ES = |µ1-µ2|/σ.
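In a real experiment ES must be estimated from data; a common plug-in estimate (Cohen's d) replaces the means by sample means and σ by the pooled standard deviation. A minimal sketch with made-up data:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(x, y):
    """Sample estimate of ES = |mu1 - mu2| / sigma, using the pooled SD."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return abs(mean(x) - mean(y)) / sqrt(sp2)

# Hypothetical responses from two treatment groups:
control   = [4.1, 5.0, 4.6, 4.4, 5.2, 4.8]
treatment = [5.0, 5.6, 5.3, 4.9, 5.9, 5.5]
print(round(cohens_d(control, treatment), 2))
```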
Historically…
The most cited paper in the clinical literature I have
seen is:
Kraemer, H.C. (1992). Reporting the Size of Effects in
Research Studies to Facilitate Assessment of Practical
or Clinical Significance. Psychoneuroendocrinology,
Vol. 17, No. 6, pp. 527–536.
(Very readable too, even for those who are not into
clinical trials.)
Suggested guidelines for reporting mean
differences in Clinical Data
 ES~0.2 is “small”.
 ES~0.5 is “moderate”.
 ES~0.8 is “large”.
 The reasoning is based upon the shift in mean of the
Normal Distribution for the treatment group relative
to the Control group.
 Please note that while this is common terminology for
the Clinical literature, the terminology can vary in
other areas of Science or Engineering (though I don’t
know of other standard references).
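One way to make the "shift in mean" reasoning concrete (an illustration, not part of the guidelines themselves): if both groups are normal with common σ, the probability that a randomly chosen treated subject outscores a randomly chosen control subject is Φ(ES/√2), since the difference of the two responses is N(|µ1-µ2|, 2σ²).

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_treated_beats_control(es):
    """P(X_treat > X_control) for two independent normal groups with
    common sigma and standardized mean difference es."""
    return norm_cdf(es / sqrt(2.0))

for label, es in [("small", 0.2), ("moderate", 0.5), ("large", 0.8)]:
    print(label, round(prob_treated_beats_control(es), 3))
```

Under this normality assumption, "small" ES means the treated subject wins only slightly more than half the time, while "large" ES pushes the probability above 70%.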
Multiple experimental trials
 Data from more than one trial can be combined. One
example is a Randomized Complete Block Design
(RCBD) which will be covered later.
 An increasingly popular technique for estimating an
“overall” ES across multiple trials is Meta-Analysis,
which can weight the individual Effect Sizes by the
sample sizes used in the different experiments. (It is
an advanced topic we will not cover in this course;
a number of books in print treat it.)
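As a sketch of only the weighting idea (real meta-analysis uses inverse-variance weights and more careful models), here is a sample-size-weighted average of per-trial effect sizes, with made-up numbers:

```python
def weighted_es(effect_sizes, sample_sizes):
    """Sample-size-weighted average of per-trial effect sizes."""
    total_n = sum(sample_sizes)
    return sum(es * n for es, n in zip(effect_sizes, sample_sizes)) / total_n

# Hypothetical trials: per-trial ES and total sample size
es_vals = [0.30, 0.55, 0.42]
n_vals  = [40, 120, 80]
print(round(weighted_es(es_vals, n_vals), 3))
```

The larger trials pull the overall estimate toward their own effect sizes, which is the intuition behind weighting by sample size.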