Using and Reporting Measures of Effect Size

Roger E. Kirk
Department of Psychology & Neuroscience
Baylor University

Three Categories of Measures of Effect Magnitude

1. Measures of effect size (typically, standardized mean differences)
2. Measures of strength of association
3. A large category of other kinds of measures

Four Purposes of Measures of Effect Magnitude

1. Estimate the sample size required to achieve an acceptable power
2. Integrate the results of empirical research studies in meta-analyses
3. Supplement the information provided by null hypothesis significance tests
4. Determine whether research results are practically significant

Four Criticisms of Null Hypothesis Significance Testing

1. Answers the wrong question

   What we want to know is the probability that the null hypothesis is true, given our data: $p(H_0 \mid D)$.

   Null hypothesis significance testing tells us the probability of obtaining our data or more extreme data if the null hypothesis is true: $p(D \mid H_0)$.

2. Is a trivial exercise

   According to John Tukey, “the effects of A and B are always different—in some decimal place—for any A and B. Thus asking ‘Are the effects different?’ is foolish.”

   According to Bruce Thompson, “Statistical testing becomes a tautological search for enough participants to achieve statistical significance. If we fail to reject, it is only because we have been too lazy to drag in enough participants.”

3. Requires us to make a dichotomous decision from a continuum of uncertainty

   The adoption of .05 as the dividing point between significance and nonsignificance is quite arbitrary.

4. Does not address the question of whether results are important, valuable, or useful: that is, their practical significance.

Three Basic Questions that Researchers Want to Answer from Their Research

1. Is an observed effect real or should it be attributed to chance?
2. If the effect is real, how large is it?
3. Is the effect large enough to be useful?

Recommendation of the APA Publication Manual

“Because confidence intervals combine information on location and precision and can often be used to infer significance levels, they are, in general the best reporting strategy . . . Multiple degree-of-freedom indicators are often less useful than effect-size indicators that decompose multiple degree-of-freedom tests into one degree-of-freedom effects . . .”

Effect Size

\[ \text{Cohen's } d = \frac{|\mu_E - \mu_C|}{\sigma} \]

Three ways to estimate $\sigma$

\[ \text{Cohen } d = \frac{|\bar{Y}_E - \bar{Y}_C|}{\hat{\sigma}_{pooled}}, \qquad \hat{\sigma}_{pooled} = \sqrt{\frac{(n_E - 1)\hat{\sigma}_E^2 + (n_C - 1)\hat{\sigma}_C^2}{(n_E - 1) + (n_C - 1)}} \]

\[ \text{Glass } g' = \frac{|\bar{Y}_E - \bar{Y}_C|}{\hat{\sigma}_C}, \qquad \hat{\sigma}_C \text{ is the standard deviation of the control group} \]

\[ \text{Hedges } g = \frac{|\bar{Y}_E - \bar{Y}_C|}{\hat{\sigma}_{pooled}}, \qquad \hat{\sigma}_{pooled} = \sqrt{\frac{(n_1 - 1)\hat{\sigma}_1^2 + \cdots + (n_p - 1)\hat{\sigma}_p^2}{(n_1 - 1) + \cdots + (n_p - 1)}} \]

Guidelines for Interpreting d

d = 0.2 is a small effect
d = 0.5 is a medium effect
d = 0.8 is a large effect

Strength of Association

\[ \omega^2,\ \rho_I = \frac{\sigma^2_{treatment}}{\sigma^2_{error} + \sigma^2_{treatment}} \]

Sample estimators of omega squared and the intraclass correlation

\[ \hat{\omega}^2 = \frac{SS_{treat} - (df_{treat})\,MS_{error}}{SS_{total} + MS_{error}} \]

\[ \hat{\rho}_I = \frac{MS_{treat} - MS_{error}}{MS_{treat} + (n - 1)\,MS_{error}} \]

Guidelines for Interpreting Omega Squared

$\omega^2$ = .010 is a small association
$\omega^2$ = .059 is a medium association
$\omega^2$ = .138 is a large association

Measures of Effect Magnitude

Effect Size
   Cohen $d$, $f$, $g$, $h$, $q$, $w$
   Glass $g'$
   Hedges $g$
   Mahalanobis $D$
   Mean1 – Mean2
   Mdn1 – Mdn2
   Mode1 – Mode2
   Rosenthal $\Pi$
   Tang $\phi$
   Thompson $d^*$
   Wilcox $\hat{\Lambda}_{Mdn,\sigma_b}$

Strength of Association
   $r$, $r_{pb}$, $r^2$, $R$, $R^2$, $\eta$, $\eta^2$, $\eta^2_{mult}$, $\phi$, $\phi^2$
   Chambers $r_e$
   Cohen $f^2$
   Contingency coefficient $C$
   Cramér $V$
   Fisher $Z$
   Friedman $r_m$
   Goodman $\lambda$, $\gamma$
   $\hat{\omega}^2$, $\hat{\rho}_I$
   Herzberg $R^2$
   Kelley $\varepsilon^2$

Other Measures
   Absolute risk reduction ARR
   Cliff $p$
   Cohen $U_1$, $U_2$, $U_3$
   Shift function
   Dunlap $CL_R$
   Grissom $PS$
   Logit $d'$
   McGraw & Wong $CL$
   Odds ratio $\omega$
   Preece ratio of success
   Probit $d'$
   Relative risk RR
   Sánchez-Meca $d_{Cox}$

More Measures of Effect Magnitude

Effect Size
   Wilcox & Muska $\hat{Q}_{0.632}$

Strength of Association
   Kendall $W$
   Lord $R^2$
   Olejnik $\hat{\omega}^2_G$, $\hat{\eta}^2_G$
   $r_{equivalent}$
   $r_{alerting}$
   $r_{contrast}$
   $r_{effect\ size}$
   Tatsuoka $\hat{\omega}^2_{mult\ c}$
   Wherry $R^2$

Other Measures
   Rosenthal & Rubin BESD
   Rosenthal & Rubin $ES_{counternull}$
   Wilcoxon $\lambda$

Two Ways to Estimate the Denominator of Cohen’s d

\[ d = \frac{|\mu_E - \mu_C|}{\sigma} \]

(1)
\[ \hat{\sigma}_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\hat{\sigma}^2_{pooled}}{n_1} + \frac{\hat{\sigma}^2_{pooled}}{n_2}} \]

(2)
\[ \hat{\sigma}_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{\frac{\hat{\sigma}^2_{pooled}}{n_1} + \frac{\hat{\sigma}^2_{pooled}}{n_2} - 2r\left(\frac{\hat{\sigma}_{pooled}}{\sqrt{n_1}}\right)\left(\frac{\hat{\sigma}_{pooled}}{\sqrt{n_2}}\right)} \]

Effect of the Unreliability of the Dependent Variable, Y, on the Proportion of Explained Variance

$r_{XY}$ cannot exceed $(r_{XX'})^{1/2}(r_{YY'})^{1/2}$, where $r_{XX'}$ and $r_{YY'}$ are the reliabilities of X and Y.

Double-Blind Study of 22,071 Male Physicians

Aspirin group: $p_A$ = .01259
Placebo group: $p_P$ = .02166
$p_A - p_P$ = .01259 – .02166 = –.009
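
The aspirin and placebo proportions above can also be expressed with three of the "other measures" listed earlier: absolute risk reduction, relative risk, and the odds ratio. The sketch below is illustrative only. It assumes the two proportions are event risks in each group (the slide does not say what they measure), and the function name `risk_measures` is hypothetical, not something from the presentation.

```python
def risk_measures(p_treatment, p_control):
    """Compare two group proportions, treated here as event risks (an assumption)."""
    # Signed difference, as on the slide: p_A - p_P
    risk_difference = p_treatment - p_control
    # Absolute risk reduction: how much lower the treatment-group risk is
    absolute_risk_reduction = p_control - p_treatment
    # Relative risk: ratio of the two risks
    relative_risk = p_treatment / p_control
    # Odds ratio: ratio of the odds p / (1 - p) in each group
    odds_treatment = p_treatment / (1.0 - p_treatment)
    odds_control = p_control / (1.0 - p_control)
    odds_ratio = odds_treatment / odds_control
    return {
        "risk_difference": risk_difference,
        "absolute_risk_reduction": absolute_risk_reduction,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
    }


# Proportions taken from the slide: aspirin .01259, placebo .02166
measures = risk_measures(0.01259, 0.02166)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The same data look very different depending on the measure chosen: the difference in proportions is under one percentage point, while the ratio-based measures describe a much larger relative change. That contrast bears directly on the question of whether an effect is large enough to be useful.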