CHAPTER EIGHT
Confidence Intervals, Effect Size, and Statistical Power

NOTE TO INSTRUCTORS
Students tend to think that if a result is statistically significant, the story is over and that is all anyone needs to know. In other words, they frequently confuse "statistically significant" with "meaningful." This chapter will help students recognize that this is not always the case. In addition to using the discussion questions and classroom exercises, present examples of studies that demonstrate a significant but not very meaningful difference between groups. It is also important to break students of the habit of using phrases such as "very significant": although students might be tempted to describe an effect that way, emphasize that they should report effect sizes instead.

OUTLINE OF RESOURCES
I. Confidence Intervals
   Discussion Question 8-1
   Classroom Activity 8-1: Understanding Confidence Intervals
   Discussion Question 8-2
II. Effect Size and prep
   Discussion Question 8-3
   Discussion Question 8-4
III. Next Steps: prep
IV. Statistical Power
   Discussion Question 8-5
   Discussion Question 8-6
   Classroom Activity 8-2: Working with Confidence Intervals and Effect Size
V. Next Steps: Meta-Analysis
   Discussion Question 8-7
   Classroom Activity 8-3: Analyzing Meta-Analyses
   Additional Reading
   Online Resources
VI. Handouts
   Handout 8-1: Classroom Activity: Working with Confidence Intervals and Effect Size
   Handout 8-2: Analyzing Meta-Analyses

CHAPTER GUIDE
I. Confidence Intervals
1. A point estimate is a summary statistic from a sample that is just one number as an estimate of the population parameter.
2. Instead of using a point estimate, it is wiser to use an interval estimate, which is based on a sample statistic and provides a range of plausible values for the population parameter.
> Discussion Question 8-1
What is the difference between a point estimate and an interval estimate?
Your students' answers should include:
- A point estimate is a summary statistic from a sample that is just one number as an estimate of the population parameter. Point estimates are useful for gauging central tendency, but by themselves can be misleading.
- An interval estimate is based on a sample statistic and provides a range of plausible values for the population parameter. Interval estimates are frequently used in media reports, particularly when reporting political polls.

Classroom Activity 8-1
Understanding Confidence Intervals
The following Web site provides a nice applet to help your students understand confidence intervals: http://www.ruf.rice.edu/~lane/stat_sim/conf_interval/index.html
The applet simulates sampling from a population with a known mean and standard deviation and allows you to control the sample size, providing a graphical display of the resulting confidence intervals.
3. A confidence interval is an interval estimate, based on the sample statistic, that would include the population mean a certain percentage of the time were we to sample from the same population repeatedly.
4. With a 95% confidence level, we expect the interval to include the mean 95% of the time that we conduct the same study.
5. To calculate a confidence interval with a z test, we first draw a normal curve that has the sample mean in the center.
6. We then indicate the bounds of the confidence interval on either end and write the percentages under each segment of the curve.
7. Next, we look up the z statistics for the lower and upper ends of the confidence interval in the z table.
8. We then convert the z statistics to raw means for the lower and upper ends of the confidence interval. To do so, we first calculate the standard error as our measure of spread, using the formula σM = σ/√N.
Then, with this standard error and the sample mean, we can calculate the raw means at the lower and upper ends of the confidence interval. For the lower end we use the formula MLower = –z(σM) + MSample. For the upper end, we use the formula MUpper = z(σM) + MSample.
9. Lastly, we should check our answer to ensure that each end of the confidence interval is exactly the same distance from the sample mean.

> Discussion Question 8-2
How would you calculate a confidence interval with a z test?
Your students' answers should include:
To calculate a confidence interval with a z test:
- Draw a normal curve with the sample mean in the center.
- Indicate the bounds of the confidence interval on either end and write the percentages under each segment of the curve.
- Look up the z statistics for the lower and upper ends of the confidence interval in the z table.
- Convert the z statistics to raw means for the lower and upper ends of the confidence interval. For the lower end, use the formula MLower = –z(σM) + MSample. For the upper end, use the formula MUpper = z(σM) + MSample.
- Lastly, check the answer to ensure that each end of the confidence interval is exactly the same distance from the sample mean.

II. Effect Size and prep
1. Increasing the sample size can increase the test statistic during hypothesis testing. In other words, it becomes progressively easier to declare statistical significance as we increase the sample size.
2. An effect size indicates the size of a difference and is unaffected by sample size.
3. Effect size tells us how much two populations do not overlap. Two populations overlap less if their means are farther apart or if the variation within each population is smaller.

> Discussion Question 8-3
What is an effect size, and why would reporting it be useful?
Your students' answers should include:
An effect size is a measure of the degree to which groups differ in the population on the dependent variable.
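The confidence-interval procedure above can be summarized in a short, self-contained Python sketch. The function name and the example numbers (a sample of 25 scores with mean 105 from a population with standard deviation 15) are hypothetical illustrations, not values from the text:

```python
import math

def z_confidence_interval(sample_mean, pop_sd, n, z_crit=1.96):
    """Confidence interval around a sample mean using the z distribution.

    z_crit = 1.96 is the critical z statistic for a 95% confidence level.
    """
    # Standard error as the measure of spread: sigma_M = sigma / sqrt(N)
    se = pop_sd / math.sqrt(n)
    # Convert the critical z statistics to raw means:
    # M_lower = -z(sigma_M) + M_sample;  M_upper = z(sigma_M) + M_sample
    lower = -z_crit * se + sample_mean
    upper = z_crit * se + sample_mean
    # Check: both ends should be equidistant from the sample mean
    assert abs((sample_mean - lower) - (upper - sample_mean)) < 1e-9
    return lower, upper

# Hypothetical example: N = 25, sample mean 105, population SD 15
lower, upper = z_confidence_interval(105, 15, 25)
print(round(lower, 2), round(upper, 2))  # 99.12 110.88
```

The final assert mirrors step 9 of the procedure: if the two ends are not equidistant from the sample mean, an arithmetic error was made somewhere.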
It is useful to report the effect size because it provides a standardized value of the degree to which two populations do not overlap and addresses the relative importance and generalizability of the sample statistics.
3. Cohen's d is a measure of effect size that assesses the difference between two means in terms of standard deviation, not standard error.
4. The formula for Cohen's d for a z distribution is: d = (M – μ)/σ.
5. A d of .2 is considered a small effect size, a d of .5 is considered a medium effect size, and a d of .8 is considered a large effect size.
6. The sign of the effect size does not matter.

> Discussion Question 8-4
Imagine you obtain an effect size of –0.3. How would you interpret this number?
Your students' answers should include:
An effect size of –0.3 would be interpreted as a small effect size; the sign indicates only the direction of the difference.

III. Next Steps: prep
1. Another method we can use in hypothesis testing is prep, the probability of replicating an effect given a particular population and sample size. It is interpreted as "This effect will replicate 100(prep)% of the time."
2. To calculate prep, we first find the specific p value associated with our test statistic.
3. Next, using Excel, we enter into a cell the formula =NORMSDIST(NORMSINV(1-P)/SQRT(2)), substituting the actual p value for P.

IV. Statistical Power
1. Statistical power is a measure of our ability to reject the null hypothesis given that the null hypothesis is false. In other words, it is the probability that we will not make a Type II error: the probability that we will reject the null hypothesis when we should reject it.
2. Statistical power is a probability ranging from 0.00 to 1.00. Historically, statisticians have used a power of .80 as the minimum for conducting a study.
3. There are three steps to calculating statistical power.
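The effect-size and prep calculations from the sections above can be sketched with Python's standard library (the statistics module's NormalDist, Python 3.8+). The p_rep function is the spreadsheet formula translated into code; all example values are hypothetical:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, SD 1

def cohens_d(sample_mean, pop_mean, pop_sd):
    """Cohen's d for a z distribution: d = (M - mu) / sigma,
    the difference between means in standard-deviation units."""
    return (sample_mean - pop_mean) / pop_sd

def p_rep(p):
    """Probability of replicating an effect, from a p value:
    the =NORMSDIST(NORMSINV(1-P)/SQRT(2)) calculation above."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - p) / 2 ** 0.5)

# Hypothetical examples
print(cohens_d(107.5, 100, 15))  # 0.5 -- a medium effect
print(round(p_rep(0.05), 2))     # about 0.88
```

Note that cohens_d(92.5, 100, 15) would return -0.5: the magnitude, not the sign, determines whether the effect is small, medium, or large.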
In the first step, we determine the information needed to calculate statistical power: the population mean, the population standard deviation, the hypothesized mean for the sample, the sample size, and the standard error based on this sample size.
4. In step two, we calculate the critical value in terms of the z distribution and in terms of the raw mean so that statistical power can be calculated.
5. In step three, we calculate the statistical power: the percentage of the distribution of means for population 2 (the distribution centered on the hypothesized sample mean) that falls above the critical value.

> Discussion Question 8-5
What is statistical power, and how would you calculate it?
Your students' answers should include:
Statistical power is the probability of rejecting the null hypothesis when it is false. You calculate statistical power in three steps. First, determine the characteristics of the two populations. Next, calculate the raw mean that serves as the critical value, or cutoff. Finally, determine the percentage of the distribution of means for population 2 that falls above that cutoff.
6. There are five ways that we can increase the power of a statistical test. First, we can increase alpha. Second, we can turn a two-tailed hypothesis into a one-tailed hypothesis. Third, we can increase N. Fourth, we can exaggerate the levels of the independent variable. Lastly, we can decrease the standard deviation.

> Discussion Question 8-6
What are ways that you could increase statistical power?
Your students' answers should include:
Five ways that you could increase statistical power are:
- Adopt a more lenient alpha level.
- Use a one-tailed test in place of a two-tailed test.
- Increase the size of the sample.
- Exaggerate the levels of the independent variable.
- Decrease the standard deviation.

Classroom Activity 8-2
Working with Confidence Intervals and Effect Size
For this activity, you will need to have the class take a sample IQ test.
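The three power steps above can be sketched in Python (again using NormalDist, Python 3.8+). The function assumes a one-tailed test with population 2's mean above population 1's, and the example numbers are hypothetical:

```python
import math
from statistics import NormalDist

def statistical_power(pop_mean, pop_sd, hyp_mean, n, alpha=0.05):
    """One-tailed z-test power: the probability of rejecting the
    null hypothesis given that the null hypothesis is false."""
    # Step 1: gather the needed information, including the standard error
    se = pop_sd / math.sqrt(n)
    # Step 2: critical value in z terms, then converted to a raw mean
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    raw_cutoff = pop_mean + z_crit * se
    # Step 3: percentage of population 2's distribution of means
    # (centered on the hypothesized sample mean) above the cutoff
    return 1 - NormalDist(hyp_mean, se).cdf(raw_cutoff)

# Hypothetical example: population mean 100, SD 15,
# hypothesized sample mean 105, N = 36
print(round(statistical_power(100, 15, 105, 36), 2))  # 0.64
```

The sketch also illustrates two of the five ways to increase power: raising alpha or raising n (which shrinks the standard error) both increase the returned probability.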
You can find many examples of abbreviated IQ tests online (www.iqtest.com is one such site). Have students anonymously submit their scores and compare the class data to data for the general population (population mean = 100, population standard deviation = 15). Using these data:
- Have students calculate the confidence interval for the analysis.
- Have students calculate the effect size.
Use Handout 8-1, found at the end of this chapter, to complete the activity.

V. Next Steps: Meta-Analysis
1. A meta-analysis is a study that involves the calculation of a mean effect size from the individual effect sizes of many studies.
2. A meta-analysis can provide added statistical power by considering many studies at once. In addition, a meta-analysis can help to resolve debates fueled by contradictory research findings.

> Discussion Question 8-7
What is a meta-analysis, and why is it useful?
Your students' answers should include:
A meta-analysis is a study that involves the calculation of a mean effect size from the individual effect sizes of many studies. It is useful because it considers many studies at once, adding statistical power, and helps to resolve debates fueled by contradictory research findings.
3. The first step in a meta-analysis is to choose the topic and make a list of criteria for deciding which studies will be included.
4. The next step is to gather every study that can be found on the topic and calculate an effect size for each one.
5. Lastly, we calculate statistics: ideally, summary statistics, a hypothesis test, a confidence interval, and a visual display of the effect sizes.
6. A file-drawer analysis is a statistical calculation, performed after a meta-analysis, of the number of studies with null results that would have to exist for the mean effect size to no longer be statistically significant.

Classroom Activity 8-3
Analyzing Meta-Analyses
Directions: In this activity, have students find a meta-analysis within the psychological literature.
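The core computation of a simple meta-analysis, the mean effect size, is just an average. The five Cohen's d values below are hypothetical; note that real meta-analyses typically weight each study's effect size (for example, by sample size) rather than averaging them equally:

```python
def mean_effect_size(effect_sizes):
    """Unweighted mean effect size across studies: the summary
    statistic at the heart of a simple meta-analysis."""
    return sum(effect_sizes) / len(effect_sizes)

# Hypothetical Cohen's d values gathered from five studies on one topic
study_ds = [0.20, 0.45, 0.10, 0.60, 0.35]
print(round(mean_effect_size(study_ds), 2))  # 0.34
```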
You may want to point them in the right direction by suggesting journals that typically publish meta-analyses, such as Psychological Bulletin, Personality and Social Psychology Review, or Journal of Applied Psychology. Once students have found their meta-analysis, they should answer the questions in Handout 8-2.

Additional Reading
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum. This is arguably the definitive source for power analysis. It clearly lays out procedural guidelines for determining power that are useful in many types of research design.
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London, Series A, 236, 333–380. This is considered the seminal paper on confidence intervals.
Rosenthal, R. (1994). Parametric measures of effect size. In Cooper, H., & Hedges, L. V. (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage Foundation. A very readable account of many of the techniques for calculating effect sizes, with background information about these techniques and how to interpret them.

Online Resources
This is an excellent Web site with numerous statistical demonstrations that you can run in your classroom to help explain the concepts concretely: http://onlinestatbook.com/. Here you will find demonstrations of effect size, goodness of fit, and power.
Math World is an excellent and extensive resource site, providing background information and succinct explanations for all of the statistical concepts covered in the textbook and beyond: http://mathworld.wolfram.com/topics/ProbabilityandStatistics.html

PLEASE NOTE: Due to formatting, the Handouts are only available in Adobe PDF®.