effect size indicator

RMTD 404 Lecture 8

Power

Recall what you learned about statistical errors in Chapter 4:
• Type I Error: Finding a difference when there is no true difference in the populations (i.e., incorrectly rejecting a true null hypothesis), designated by α.
• Type II Error: Not finding a difference when there is a true difference in the populations (i.e., incorrectly retaining a false null hypothesis), designated by β.

Power is the probability of finding a difference when there is a true difference in the populations (i.e., correctly rejecting a false null hypothesis), designated 1 − β.

Factors affecting power

There are four key factors that influence the power of a statistical test:
1. The alpha (α) that a researcher chooses
2. The magnitude of the true population difference (effect size)
3. The sample size
4. The statistical test used

Let's try some of these in R (http://homepages.luc.edu/~rwill5/code.html)

Alpha's influence on power

A small alpha (α) makes the critical value more extreme so that less of the alternative distribution is allocated to the rejection region. Hence, we have less power with smaller alphas.

[Figure panels: Alpha = .10; Alpha = .05]

A larger α makes the critical value less extreme so that more of the alternative distribution is allocated to the rejection region. Hence, we have more power with larger alphas.

Effect size's influence on power

A small effect size makes the critical value more extreme on the alternative distribution so that less of that distribution's area is allocated to the rejection region. Hence, we have less power with smaller effect sizes.

A larger effect size makes the critical value less extreme on the alternative distribution so that more of that distribution's area is allocated to the rejection region. Hence, we have more power with larger effect sizes.

Sample size's influence on power

A small sample size makes the critical value more extreme on the alternative distribution so that less of that distribution's area is allocated to the rejection region.
Hence, we have less power with smaller sample sizes.

A larger sample size makes the critical value less extreme on the alternative distribution so that more of that distribution's area is allocated to the rejection region. Hence, we have more power with larger sample sizes.

Influence of sample size & variance on power

Recall that the central limit theorem defines the standard error of the mean as

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{N}}$$

Hence, as sample size increases, the size of the standard error of the mean decreases. As the sample size decreases, the size of the standard error of the mean increases. Similarly, as $\sigma_X$ decreases, the standard error of the mean also decreases, indicating that effects are easier to detect in more homogeneous populations.

Influence of statistical test on power

One last note: different statistical tests provide different levels of power, all other things being equal. The increase in power results from assumptions that are made about the data being analyzed. Of course, if these assumptions are invalid, then the decision that you make based on a hypothesis test may also be invalid.

Estimating Sample Size

A good reason to perform power analysis is that these computations allow you to estimate the sample size that you would need to detect what you believe is a meaningful effect size.

An important component of the power analysis computation is the effect size indicator. You must specify the size of the effect you wish to detect in order to determine the sample size that you must use. However, the previous equations have suggested that you must have your data in hand in order to get the effect size indicator. Hence, you need an estimate of the effect size in order to perform power and sample size computations.

So how many people do I need?
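Before answering that question, the influence of α, effect size, and sample size described so far can be checked numerically. The following is a minimal Python sketch, not the course's R code. It assumes a two-tailed one-sample z-test with known σ, so that the distance between the null and alternative means, in standard-error units, is d√N. The function name `approx_power` and the example values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z-test (assumed setting).

    d     : effect size in population standard deviation units
    n     : sample size
    alpha : Type I error rate chosen by the researcher
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-tailed critical value
    shift = abs(d) * sqrt(n)              # mean of the alternative, in standard errors
    # Area of the alternative distribution that falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Vary one factor at a time around an illustrative baseline (d = .5, n = 25).
print("baseline        :", round(approx_power(0.5, 25, alpha=0.05), 3))
print("larger alpha    :", round(approx_power(0.5, 25, alpha=0.10), 3))
print("larger effect   :", round(approx_power(0.8, 25, alpha=0.05), 3))
print("larger sample   :", round(approx_power(0.5, 50, alpha=0.05), 3))
```

Each change (larger α, larger effect size, larger sample) should raise the printed power relative to the baseline, which is the pattern the preceding slides describe.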
We will use my favorite table thus far, the Power/Delta Table, with $\delta = d\sqrt{n}$.

Sources of effect size estimation

There are three ways to estimate the size of the effect that you'll want to detect:
• Prior Research: You can estimate the effect size from prior studies that give the necessary statistics. This will allow you to detect effect sizes similar to those found by other researchers in similar studies.
• Professional Judgment: Based on your own experiences, you may be able to identify an effect size that is substantively interesting. This will allow you to detect effect sizes that have real-world meaning, based on your experiences.
• Convention: You can also use Cohen's rule of thumb (e.g., small = .20, medium = .50, large = .80). This approach is probably only advisable when you don't have enough information to perform the estimation using either of the previous approaches.

An effect size indicator for t-tests

One measure of the magnitude of an effect, an effect size indicator, depicts the magnitude of the effect scaled in population standard deviation units.

Parameter version:

$$d = \frac{\mu_X - \mu_0}{\sigma_X}$$

If we want to estimate d from observed data, then we can transform the equation to the statistic version:

$$d = \frac{\bar{X} - \mu_0}{s_X}$$

A rule of thumb for interpreting d is that:
• d = .20 is a small effect size
• d = .50 is a medium effect size
• d = .80 is a large effect size

A visual depiction of d

[Figure: overlapping distributions. d = .20, overlap = 85%; d = .50, overlap = 66%; d = .80, overlap = 53%; d = 1.10, overlap = 41%]

Another Effect Size Indicator for t-tests

A similar measure of the magnitude of an effect is the squared point-biserial correlation, which is similar to the measures of association that we discussed in the context of the chi-square test. Rather than depicting the magnitude of the effect on the population standard deviation scale (as is the case for d), the squared point-biserial correlation indicates the proportion of shared variance between the independent and dependent variable.
$$r_{pb}^2 = \frac{t_{observed}^2}{t_{observed}^2 + df}$$

A rule of thumb for interpreting $r_{pb}^2$ is that:
• $r_{pb}^2$ = .01 is a small effect size
• $r_{pb}^2$ = .06 is a medium effect size
• $r_{pb}^2$ = .14 is a large effect size

Reporting effect size indicators

To provide a more informative substantive interpretation, we would report and interpret the effect size indicators. So, we might say something like the following.

• The difference in means for students in Program A (31.21) and Program B (37.86) is too large to be accounted for by sampling error, t(15) = 2.23, p = .02. In addition, this effect size is quite large ($r_{pb}^2 = .25$), indicating that the observed difference is not an artifact of a large sample size.
• Scores of females (M = 3.56) were higher than the scores of males (M = 2.21), and this difference was statistically significant and the effect size was moderate, t(20) = 4.41, p < .0001, d = .55.
• There was a statistically significant difference between the mean ratings of husbands and wives ($\bar{D} = 2.31$, p < .01), but the effect size indicated that this difference is probably trivial ($r_{pb}^2 = .003$).

Effect size calculations: One-sample t-test

For the one-sample t-test, d is estimated as:

$$d = \frac{\bar{X} - \mu_0}{s_X}$$

We can interpret the observed value using Cohen's rule-of-thumb criteria, with values of .8 indicating large effect sizes.

The d index is very important in planning a study, because you need to specify a meaningful effect size that you'd like to detect in order to determine the sample size required to detect that difference.

As you saw earlier, power depends on both the effect size and the sample size. We use the statistic δ (delta) = d[f(n)] to represent this combination, where the particular function of n is defined differently for each individual test.

For the one-sample t-test, the function of n is $\sqrt{n}$. Specifically,

$$\delta = d\sqrt{n}$$

Given δ as defined here, we can determine the power of the one-sample t test from the table of power on p.678.
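The table lookup can also be approximated in code. The sketch below is an assumption, not the textbook's stated method: it treats the test statistic as approximately normal, so that two-tailed power is about Φ(δ − z), where z is the two-tailed critical value for the chosen α, and it ignores the negligible area in the opposite tail. The function names are illustrative.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_from_delta(delta, alpha=0.05):
    """Approximate two-tailed power for a given delta (normal approximation)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(delta - z_crit)


def delta_for_power(power, alpha=0.05):
    """Approximate delta needed to reach a target power (inverse of the above)."""
    return _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)


for delta in (1.0, 2.0, 2.5, 2.8, 3.5):
    print(f"delta = {delta:.2f}  power ~ {power_from_delta(delta):.2f}")

print(f"delta needed for power .80 ~ {delta_for_power(0.80):.2f}")
```

With α = .05 this approximation gives a power of about .80 at δ = 2.8 and about .71 at δ = 2.5, in line with the tabled values quoted in the worked examples of this lecture. For small samples an exact noncentral t computation would differ slightly.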
Back to the example we had for the one-sample t-test: The mean GRE score of 300 students in the School of Education at LUC is 565, and the standard deviation equals 75. We know the mean of the GRE test-taker population is 500. Thus, $\bar{X} = 565$, $\mu_0 = 500$, and $s_X = 75$.

$$d = \frac{\bar{X} - \mu_0}{s_X} = \frac{565 - 500}{75} = 0.87$$

Then

$$\delta = d\sqrt{n} = 0.87 \times \sqrt{300} = 15.07$$

From the Appendix Power table, for δ = 15.07 with α = 0.05, the power is beyond 0.99. This means that, if the true effect is this large, we have more than a 99% chance of rejecting the null hypothesis. The chance of making a Type II error is less than 1%.

Sometimes the researcher is interested in knowing how large a sample is needed in order to obtain a certain power. For example, a researcher wants to set power at .80 when she thinks (based on previous experience or literature) the effect size of her study is around d = 0.20.

According to the Appendix Power table, for power = .80 and α = 0.05, δ must equal 2.80. Since we have δ and d, we can simply solve $\delta = d\sqrt{n}$ for n:

$$n = \left(\frac{\delta}{d}\right)^2 = \left(\frac{2.80}{0.20}\right)^2 = 196$$

Therefore, if the researcher wants to have an 80% chance of rejecting the null hypothesis when the effect size is 0.2, she will have to use a random sample of 196 participants.

[Example] The literature shows that the mean influence score of peer pressure is 520 with a standard deviation of 80. An investigator would like to show that a minor change in conditions will produce scores with a mean of only 500. He plans to run a t test to compare his sample mean with a population mean of 520.

Effect size:

$$d = \frac{500 - 520}{80} = -0.25$$

If the sample size is 100, δ (using the absolute value of d) is:

$$\delta = |d|\sqrt{n} = 0.25 \times \sqrt{100} = 2.5$$

Checking the Appendix Power table, the power = .71.

What sample sizes would be needed to raise power to .70, .80, and .90?

(1) To have power = .70 with α = .05, δ is close to 2.50.
δ = 2.40, power = 0.67
δ = 2.50, power = 0.71
You can use interpolation:

$$\frac{2.5 - \delta}{2.5 - 2.4} = \frac{0.71 - 0.70}{0.71 - 0.67}, \quad \text{so } \delta = 2.475$$

To still detect d = −0.25 with δ = 2.475:

$$n = \left(\frac{\delta}{d}\right)^2 = \left(\frac{2.475}{0.25}\right)^2 = 98.01 \approx 99 \text{ (round up)}$$

(2) To have power = .80 with α = .05, δ is close to 2.8.

$$n = \left(\frac{\delta}{d}\right)^2 = \left(\frac{2.8}{0.25}\right)^2 = 125.44 \approx 126$$

(3) To have power = .90 with α = .05, δ is between 3.20 and 3.30.
δ = 3.20, power = 0.89
δ = 3.30, power = 0.91
Use interpolation:

$$\frac{3.30 - \delta}{3.30 - 3.20} = \frac{0.91 - 0.90}{0.91 - 0.89}, \quad \text{so } \delta = 3.25$$

To still detect d = −0.25 with δ = 3.25:

$$n = \left(\frac{\delta}{d}\right)^2 = \left(\frac{3.25}{0.25}\right)^2 = 169$$

Effect Sizes: Two Independent-Samples

The effect size index for the two independent sample t-test is defined as follows.

$$d = \frac{\mu_1 - \mu_2}{\sigma}, \quad \text{estimated by} \quad d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}$$

$s_{pooled}$ is defined as the common standard deviation (recall that we typically assume that the variances are equal). For equal-sized samples,

$$s_{pooled} = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}$$

$s_{X_1}$ and $s_{X_2}$ can be known from the population, estimated based on prior research, or estimated from the data.

In the case of unequal sample sizes, we pool the variance as we do when computing the t-test. Recall what the pooled variance does: it estimates the population variance, weighting each sample variance by its sample size. Hence, the pooled variance is an estimate of the population variance that weights each case in the study equally. So, we can rewrite d for the unequal sample size case as follows.

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$$

And we need to calculate δ to find the power. The δ for the two-sample case is defined as

$$\delta = d\sqrt{\frac{n}{2}}$$

and we also need to know n when it's not the same for the two groups.

When we deal with power for the t-test with unequal sample sizes, we need a single value of n to work with the power tables, so we need to combine the sample sizes from the two groups. The formula for the effective sample size is based on the harmonic mean.
$$n_h = \frac{2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \frac{2 n_1 n_2}{n_1 + n_2}$$

Note that when we have unequal sample sizes, we need more participants to achieve the same level of power as a study in which sample sizes are equal (a balanced study). Consider the following two ways of dividing 100 participants into two groups.

$$n_h = \frac{2 n_1 n_2}{n_1 + n_2} = \frac{2(40)(60)}{40 + 60} = 48$$

compared to n = 50 in balanced studies. In this case, 100 people in the unbalanced design have power equivalent to a balanced study with only 96 people.

What's the point? Balance your samples when possible.

Let's calculate the power of the two independent samples t-test that was shown on p.13 in the t-test slide set.

IV: Teacher's happiness (0 = low happiness; 1 = high happiness)
DV: Student's achievement

Group Statistics: Follow-up Reading std score

COMPOSITE SEX    N     Mean      Std. Deviation   Std. Error Mean
MALE             117   50.4083   10.37854         .95950
FEMALE           138   51.3812   9.03615          .76921

Effect size:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}, \qquad s_{pooled} = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}$$

Solving $\delta = d\sqrt{n/2}$ for n gives us

$$n = 2\left(\frac{\delta}{d}\right)^2$$

which is the sample size per group. With unequal sample sizes, n is replaced by the harmonic mean $n_h$.

d = (51.3812 − 50.4083) / sqrt((10.3785² + 9.0362²)/2) = 0.9729 / 9.73 = .10
n_h = (2 × 117 × 138) / (117 + 138) = 126.6 (truncated to 126)
δ = .1 × sqrt(126/2) = 0.79

Summary

One-Sample T-Test (effect size, delta for the power estimate, and sample size):

$$d = \frac{\bar{X} - \mu_0}{s_X}, \qquad \delta = d\sqrt{n}, \qquad n = \left(\frac{\delta}{d}\right)^2$$

Two-Independent Samples Test (effect size, delta for the power estimate, and per-group sample size):

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{pooled}}, \qquad s_{pooled} = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}, \qquad \delta = d\sqrt{\frac{n}{2}}, \qquad n = 2\left(\frac{\delta}{d}\right)^2$$

Effect Sizes: Matched-Samples

The d index for the matched sample t-test is defined as:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_{X_1 - X_2}}$$

$s_{X_1 - X_2}$ is the standard deviation of the difference scores.

A problem arises: To calculate $s_{X_1 - X_2}$, we need to know the correlation between X1 and X2.
According to the variance sum law:

$$\sigma_{X_1 - X_2}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 - 2\rho\,\sigma_{X_1}\sigma_{X_2}$$

To solve the problem, we make the general assumption of homogeneity of variance:

$$\sigma_{X_1}^2 = \sigma_{X_2}^2 = \sigma^2$$

So the variance sum law can be revised:

$$\sigma_{X_1 - X_2}^2 = \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1 - \rho)$$

So

$$\sigma_{X_1 - X_2} = \sigma\sqrt{2(1 - \rho)}$$

The statistic form:

$$s_{X_1 - X_2} = s\sqrt{2(1 - r)}$$

Then we have to come up with the best guess of the correlation between X1 and X2 to calculate $\sigma_{X_1 - X_2}$. And δ is defined as

$$\delta = d\sqrt{n}$$
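The δ and sample-size formulas for the three designs in this lecture can be collected into a few small helper functions. This is a hedged Python sketch rather than part of the original slides. The function names are made up for illustration, the required-n helper rounds up to the next whole participant, and the matched-samples inputs (s = 10, r = .5, mean difference = 3, n = 25) are hypothetical values chosen only to exercise the formula.

```python
from math import ceil, sqrt


def harmonic_n(n1, n2):
    """Effective per-group sample size for unequal groups (harmonic mean)."""
    return 2 * n1 * n2 / (n1 + n2)


def delta_one_sample(d, n):
    """delta = |d| * sqrt(n); also used for matched samples (n = number of pairs)."""
    return abs(d) * sqrt(n)


def delta_two_sample(d, n1, n2):
    """delta = |d| * sqrt(n_h / 2), with n_h the harmonic mean of n1 and n2."""
    return abs(d) * sqrt(harmonic_n(n1, n2) / 2)


def n_one_sample(delta, d):
    """Smallest n with d * sqrt(n) >= delta; round() guards against float noise."""
    return ceil(round((delta / d) ** 2, 8))


def n_per_group_two_sample(delta, d):
    """Smallest per-group n with d * sqrt(n / 2) >= delta."""
    return ceil(round(2 * (delta / d) ** 2, 8))


def sd_of_differences(s, r):
    """s * sqrt(2 * (1 - r)), assuming equal variances in both measurements."""
    return s * sqrt(2 * (1 - r))


# Values taken from the worked examples in the lecture.
print(delta_one_sample(-0.25, 100))   # peer-pressure example: 2.5
print(n_one_sample(2.80, 0.20))       # power .80 with d = .20: 196
print(n_one_sample(2.475, 0.25))      # power .70 with d = .25: 99
print(harmonic_n(40, 60))             # unbalanced 40/60 split: 48.0

# Hypothetical matched-samples inputs (not from the lecture).
s_diff = sd_of_differences(s=10, r=0.5)
d_matched = 3 / s_diff
print(delta_one_sample(d_matched, 25))
```

The resulting δ can then be looked up in the power table in the same way as in the one-sample examples.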
