Dr. Tom Kuczek
Purdue University

Power of a Statistical Test

Power is the probability of detecting a difference in means under a given set of circumstances. Power, denoted 1 - β, depends on:
- n = the sample size in each group
- α = the probability of a Type I error
- σ = the standard deviation of the experimental error
- |µ1 - µ2| = the absolute difference between the means of the two treatment groups

What affects the Power of a Statistical Test?

Power increases when:
- n increases (may be costly)
- α increases (one reason α ≤ 0.05 is recommended)
- σ decreases (discussed shortly)
- |µ1 - µ2| increases (not an option if the treatments are fixed)

Experimental Error

Experimental error, quantified by σ, has a number of components, including:
- the variability in experimental units
- the variability of experimental conditions
- the variability inherent in the measurement system
- nuisance variables, which are generally unobserved

Design Issues Related to Increasing Power

Increasing the sample size n always works, but it may not be feasible, whether due to financial constraints or physical constraints on the experiment (such as a limited amount of test material). If n and α are fixed, then the power of the t-test depends only on the ratio |µ1 - µ2|/σ; as this ratio increases, so does the power of the test. If the method of applying the treatments is fixed, then |µ1 - µ2| is likely fixed as well. If n is also fixed, the only remaining option is to reduce the experimental error σ.

How to Reduce Experimental Error

The best way is to use a better measurement system with lower variability; this also improves any estimates of treatment effects (example: qt-PCR vs. Microarray). Another way is to control the experimental conditions as tightly as possible, though this may narrow the inference that can be drawn from the experiment (if one wishes to claim that the treatment works well under a variety of conditions). Reducing the variability in the experimental units narrows inference in the same way, which matters especially in clinical trials, where one would like to say the treatment works on as broad a range of subjects as possible.

Statistical vs. Practical Significance

If |µ1 - µ2| > 0, then for a large enough n the null hypothesis H0 will almost certainly be rejected, giving statistical significance. The problem is that if |µ1 - µ2| is "small", say near zero, there may be no practical difference between the responses to the two treatments. This matters when recommending which treatment to apply to a population: the treatment with the "better" mean may not make economic sense if there is no practical difference.

Be Careful About Clinical Trials

There is a fine point here. In a clinical trial, an individual subject may respond better to one treatment than to another. The ideal approach in that situation is to test the subject with both treatments at different time points. This will be addressed later in the semester when we discuss Crossover Designs.

How to Address Practical Significance in Reporting Results of an Experiment

In certain areas of Science, especially Biomedical Research, practical significance is addressed by reporting the Effect Size, a way to quantify practical significance for a population. The Effect Size, usually denoted ES (or d) in the literature, is defined by

ES = |µ1 - µ2| / σ.

Historically…

The most cited paper in the clinical literature I have seen is: Kraemer, H.C., "Reporting the Size of Effects in Research Studies to Facilitate Assessment of Practical or Clinical Significance," Psychoneuroendocrinology, Vol. 17, No. 6, pp. 527-536, 1992. (It is very readable too, even for those who are not into clinical trials.)
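To tie the two threads of this handout together, here is a minimal sketch (my own illustration, not from the lecture) of how power depends on n, α, and the effect size ES = |µ1 - µ2|/σ. It uses the usual normal approximation to the two-sided, two-sample test with n subjects per group; the function name and example values are assumptions chosen for illustration.

```python
from scipy.stats import norm

def approx_power(es, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test with n per group,
    where es = |mu1 - mu2| / sigma (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)   # two-sided critical value
    shift = es * (n / 2) ** 0.5        # standardized shift of the test statistic
    # Probability the test statistic lands in either rejection region:
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Power rises with the effect size (here at n = 50 per group, alpha = 0.05):
for es in (0.2, 0.5, 0.8):
    print(es, round(approx_power(es, n=50), 2))
```

At n = 50 per group, the loop shows power climbing from roughly 0.17 to about 0.98 as ES moves from 0.2 to 0.8, which is exactly the |µ1 - µ2|/σ dependence described above: with n and α held fixed, only this ratio matters.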
Suggested Guidelines for Reporting Mean Differences in Clinical Data

- ES ≈ 0.2 is "small"
- ES ≈ 0.5 is "moderate"
- ES ≈ 0.8 is "large"

The reasoning is based on the shift in the mean of the Normal distribution for the treatment group relative to the Control group. Note that while this terminology is common in the clinical literature, it can vary in other areas of Science or Engineering (though I do not know of other standard references).

Multiple Experimental Trials

Data from more than one trial can be combined. One example is the Randomized Complete Block Design (RCBD), which will be covered later. An increasingly popular technique for estimating an "overall" ES from multiple trials is called Meta-Analysis; it can weight the individual Effect Sizes by the sample sizes used in the different experiments. (It is an advanced topic we will not cover in this course, and there are a number of books in print which treat it.)
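As a closing sketch (again my own illustration, not course material), here is one way to estimate ES from data using a pooled standard deviation, and to combine several trials with a simple sample-size weighting in the spirit of the meta-analysis idea above. Real meta-analyses typically use more refined weights, so treat this purely as a toy version.

```python
import numpy as np

def effect_size(x, y):
    """Sample ES: |mean difference| divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return abs(np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

def overall_es(trials):
    """Sample-size-weighted average of per-trial effect sizes (a toy scheme)."""
    sizes = np.array([len(x) + len(y) for x, y in trials])
    ess = np.array([effect_size(x, y) for x, y in trials])
    return float(np.sum(sizes * ess) / np.sum(sizes))

# Three simulated trials with true means 0 and 1 and sigma = 2 (true ES = 0.5).
rng = np.random.default_rng(0)
trials = [(rng.normal(0.0, 2.0, n), rng.normal(1.0, 2.0, n)) for n in (20, 40, 80)]
print(round(overall_es(trials), 2))  # should land near the true ES of 0.5
```

Note how the largest trial (n = 80 per group) contributes the most to the combined estimate, which is the weighting behavior described in the paragraph above.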