Lecture 3: Null Hypothesis Significance Testing Continued
Laura McAvinue
School of Psychology
Trinity College Dublin

Previous Lectures
• Inferential Statistics
  – Sample → Population
• Null Hypothesis Significance Testing
  – Proceeds in a series of steps
  – Allows us to assess the statistical significance of our results
  – To reject or accept the H0 on the basis of the p value

Previous Lectures
• Misleading nature of statistical significance
  – Results can be labelled as
    • ‘Statistically significant’
    • ‘Not statistically significant’
  – People interpret results in a cut and dried fashion
    • ‘Statistically significant result means there is a true effect in the population’
    • ‘Non-significant result means there is no true effect’

Previous Lectures
• NHST is not so straightforward
• Statistical significance is affected by
  – One or two tailed test
  – Significance level / α / probability of Type I error
  – Power / probability of Type II error
  – Sample size
• These factors must be considered
  – Research evaluation
  – Research planning

Research Evaluation
• A result is statistically significant
  – Implies a true effect exists in the population
  – But is this effect clinically significant?
    • How big is the effect?
    • Real world relevance?
    • Recall that a large enough sample size will make a small effect statistically significant

Research Evaluation
• A result is not statistically significant
  – Implies a true effect does not exist in the population
  – Power
    • Did the study have enough power to identify an effect as statistically significant even if a true effect existed?
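The point that a large enough sample makes a small effect statistically significant can be illustrated with a rough sketch. It assumes two equal-sized groups with equal standard deviations and uses a normal (z) approximation rather than the t distribution; the function names and the standardised difference of 0.2 are hypothetical choices made for illustration only.

```python
import math


def two_sample_z(std_diff, n_per_group):
    """z statistic for a standardised mean difference between two
    equal-sized groups (normal approximation, equal SDs assumed)."""
    return std_diff * math.sqrt(n_per_group / 2)


def two_tailed_p(z):
    """Two-tailed p value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative value only: a small standardised difference between group means.
small_effect = 0.2

for n in (10, 100, 1000):
    z = two_sample_z(small_effect, n)
    print(f"n per group = {n:4d}   z = {z:5.2f}   p = {two_tailed_p(z):.4f}")
```

The effect is the same size in every row; only the sample size changes, and with it the p value.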
Research Planning
• Power
  – Require enough power to obtain statistically significant results if a true effect exists
• Sample Size
  – Obtain an adequate sample size

Effect Size
• NHST
  – Enables us to say whether or not a true effect exists in the population
• Effect Size
  – Provides an estimate of the size of this true effect
  – A measure of the degree to which the H0 is false
  – A measure of the discrepancy between H0 and H1

[Figure: two pairs of distributions centred on $\mu_0$ and $\mu_1$. Small ES: $\mu_0 - \mu_1$ is small. Large ES: $\mu_0 - \mu_1$ is large.]

Effect Size
• There is a different effect size measure for each statistical test
• The difference between two independent group means
  – Cohen’s d: $d = \frac{\mu_1 - \mu_0}{\sigma}$
  – Standardised difference
  – Express the difference between the means in terms of the standard deviation

Effect Size
• To calculate Cohen’s d for a study in which you compared two groups
  $d = \frac{\text{Mean}_{\text{treat}} - \text{Mean}_{\text{control}}}{\text{SD}_{\text{control}}}$
• For example, I compared the effects of an exercise regime and a control regime on physical fitness (rated /20) in two groups and obtained the following results…

Effect Size
• Mean rating in exercise group was 17 (SD = 10)
• Mean rating in control group was 11 (SD = 10)
• Cohen’s d was $d = \frac{17 - 11}{10} = .6$
• The exercise group had a mean rating .6 SDs higher than the control group
• You can use Cohen’s d to compare studies that have used different measures

Comparing Studies
• Four studies examined the effect of cognitive behavioural therapy on self-esteem but each study used a different scale to assess self-esteem.
• Calculate the effect size for each of the following studies
• Which study found the greatest effect?

Study   Treatment Group Mean   Control Group Mean   Mean Difference   SD    d
A       17                     11                   6                 10
B       225                    215                  10                100
C       12                     9                    3                 9
D       31                     23                   8                 20
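The Cohen's d calculation for the four studies can be sketched in a few lines. This is a minimal illustration, assuming the SD column is the standard deviation used as the denominator; the function and variable names are hypothetical.

```python
def cohens_d(mean_treat, mean_control, sd):
    """Standardised difference between two group means."""
    return (mean_treat - mean_control) / sd


# (treatment mean, control mean, SD) for each study, taken from the table.
studies = {
    "A": (17, 11, 10),
    "B": (225, 215, 100),
    "C": (12, 9, 9),
    "D": (31, 23, 20),
}

effect_sizes = {name: cohens_d(*values) for name, values in studies.items()}

for name, d in effect_sizes.items():
    print(f"Study {name}: d = {d:.2f}")

# The study with the largest standardised difference.
largest = max(effect_sizes, key=effect_sizes.get)
print(f"Greatest effect: Study {largest}")
```

Because each difference is divided by its own scale's standard deviation, the four studies become comparable even though their raw mean differences are not.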
Study   Treatment Group Mean   Control Group Mean   Mean Difference   SD    d
A       17                     11                   6                 10    .6
B       225                    215                  10                100   .1
C       12                     9                    3                 9     .33
D       31                     23                   8                 20    .4

What is a big Effect Size?
• Cohen’s (1992) rules of thumb
• For independent t-tests comparing two means…

             Small   Medium   Large
Cohen’s d    .2      .5       .8

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Research Evaluation
• A statistically significant result
  – Is it clinically significant?
  – Real world relevance?
  – Effect Size
• A non-significant result
  – No true effect?
  – Lack of power?

Calculating Power
• Recall that power is determined by a number of factors
• To calculate the power of an experiment you need to know
  – One or two-tailed test
  – Significance level
  – Sample size
  – Effect size
• You calculate the power of an experiment to identify a certain effect size as statistically significant, using a one/two-tailed test with a certain α level and a certain sample size

Example: The effects of therapy on depression

                             Analysis 1   Analysis 2
Size of sample               20           200
Therapy mean score           5.5          5.5
Therapy standard deviation   3.03         2.89
Control mean score           6.3          6.3
Control standard deviation   2.75         2.62
Mean difference              -.8          -.8
t statistic                  -.618        -2.051
df                           18           198
p-value                      .54          .042

                     Study 1                         Study 2
Test                 Independent samples t-test      Independent samples t-test
One or two-tailed    Two-tailed                      Two-tailed
Significance level   .05                             .05
Size of each group   10                              100
Effect size          |5.5 – 6.3| / 2.75 = .29 ≈ .3   |5.5 – 6.3| / 2.62 = .305 ≈ .3
Power                .1                              .56

• Study 1: 10% chance of finding an ES of .3 as statistically significant at p < .05 using a two-tailed test
• Study 2: 56% chance of finding an ES of .3 as statistically significant at p < .05 using a two-tailed test
• The difference in power for these two studies was due to sample size

Power
• Computer programmes can calculate power
  – http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
  – Free download of gpower3 package
• Research planning
  – Rather than computing power post hoc, best to plan to have adequate power to obtain statistically significant results if H0 is false and a true effect exists
  – Convention
    • Aim for power of .8
    • 80% chance of obtaining significant results if H0 is false
    • .2 probability of Type II error
    • 1 : 4 ratio of Type I (.05) to Type II (.2) errors

Power & Sample Size
• Main avenue for increasing power
  – Increase sample size
• Common question
  – How big a sample do I need?
• Answer depends
  – The power you want to have
  – Significance level you set
  – Effect size you expect to obtain
  – Statistical test you are running
  – One or two tailed prediction

Power & Sample Size
• The Real Question
  – “What sample size do I need to have power of ____ to detect an ES of ____ as being statistically significant at ____ level, when doing a ____ statistical test and making a ____-tailed prediction?”
• Most of the gaps are easy to complete
  – Power = .8
  – α = .05
  – Test = depends on experimental design
  – Prediction = depends on theory
  – ES = ?
• Need to estimate effect size

Estimate Effect Size
• Pilot Study
  – Do analysis on small group to give idea of results
• Previous Research
  – Calculate ES in previously published studies
• Theory
  – Based on theory or understanding of research area, estimate the ES or the smallest ES that would be of interest
• Cohen’s Standards
  – Would you like to detect a small, medium or large effect?
  – Difference between two groups
  – Small (.2), Medium (.5), Large (.8)

Power & Sample Size
• Once you have decided on the following
  – Statistical test, prediction, Power, and ES
• You can calculate necessary sample size in two ways
  – Computer package, such as gpower3
  – Cohen’s tables
• Let’s try an example
  – Turn to the handout showing Cohen’s table of required sample size
    • (note that this table refers to two-tailed predictions)

Calculating Required Sample Size
• I would like to investigate the difference between clinically anxious and normal people in relation to performance on an attention task
• “How many people do I need in each group to have power of .8 to detect a large ES as being statistically significant at .05 level, when doing an independent samples t-test and making a two-tailed prediction?”

Cohen’s Table
N for small, medium, and large ES at power = 0.80 for α = .01, .05 and .10

             α = 0.01          α = 0.05          α = 0.10
             Sm    Med   Lg    Sm    Med   Lg    Sm    Med   Lg
mean diff    586   95    38    393   64    26    310   50    20

• We need 26 people in each group to have a power of 0.80 to detect a large ES as statistically significant at the 0.05 level

Some more practice!
             α = 0.01          α = 0.05          α = 0.10
             Sm    Med   Lg    Sm    Med   Lg    Sm    Med   Lg
mean diff    586   95    38    393   64    26    310   50    20

– For a two group independent t-test, how many people do I need in each group to detect…
  • Large ES as statistically significant at .10 level    _________
  • Large ES as statistically significant at .05 level    _________
  • Large ES as statistically significant at .01 level    _________
  • Medium ES as statistically significant at .01 level   _________
  • Small ES as statistically significant at .01 level    _________
– The smaller the alpha level, the _______________ the sample size required to detect a given difference as being statistically significant
– The smaller the ES, the _______________ the sample size required to detect a given difference as being statistically significant

Summary
• Factors affecting Statistical Significance
• Research Evaluation
• Effect size
• Power Calculations
• Research Planning
• Sample Size Calculations
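The power and sample size calculations covered in this lecture can be approximated in a short sketch. It is not how gpower3 or Cohen's tables compute their values: it assumes two equal-sized independent groups, a two-tailed test, and a normal approximation in place of the t distribution, so its sample sizes tend to come out slightly below the tabled ones. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power to detect effect size d
    with n_per_group people in each of two independent groups."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)
    # Probability of landing in either rejection region when the true effect is d.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate number of people needed in each group
    (normal approximation, rounded up)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / abs(d)) ** 2)


# Power for the therapy example: ES of .3 with 10 or 100 people per group.
for n in (10, 100):
    print(f"d = .3, n per group = {n:3d}: power = {approx_power(0.3, n):.2f}")

# Approximate n per group at power = .80 for Cohen's small, medium and large d.
for alpha in (0.01, 0.05, 0.10):
    sizes = [approx_n_per_group(d, alpha=alpha) for d in (0.2, 0.5, 0.8)]
    print(f"alpha = {alpha:.2f}: Sm, Med, Lg = {sizes}")
```

Comparing the printed sample sizes with the handout gives a quick check on the table lookups; where the two disagree by a person or two, the tabled value is the one to use.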