Inferential Statistics II

Confounding Variations
• Anticipating the direction of the change in mean scores
• The similarity of the samples
• Random selection
• Sample size
• Multiple simultaneous t-tests

Direction
[Figure: sampling distribution with the anticipated mean shift and a 5% (.05) region in one tail]
In some cases it can be assumed that the difference between mean scores will represent a positive shift. When we give a lesson, we expect test scores to rise from pretest to posttest.

Direction
[Figure: sampling distribution with the anticipated mean shift and a 5% (.05) region in one tail]
In some cases it can be assumed that the difference between mean scores will represent a negative shift. If we do conflict management training, we would anticipate that the number of conflicts would be reduced.

Direction
[Figure: sampling distribution with the direction unknown and a 5% (.05) region shown at each end]
What happens if you can't anticipate which way the mean will shift? Will canceling intermural sports affect achievement positively or negatively?

Direction
[Figure: sampling distribution with p < .05 split into 2.5% (.025) at each end]
The means that would be considered significant must be split between both ends of the sampling distribution: a two-tailed test of significance.
It is the researcher's job to demonstrate whether a significance test should be one-tailed or two-tailed.

EZAnalyze Results Report - Paired T-Test of Pretest with Posttest

                    Pretest    Posttest
Mean:               74.611     82.611
Std. Dev.:          13.349     11.850

N Pairs:            36
Mean Difference:    -8.000
SE of Diff.:        2.936
Eta Squared:        .171
T-Score:            2.724
P:                  .010

• EZAnalyze always reports two-tailed results.
• To compute a one-tailed result, divide the p value in half (this applies when the difference is in the anticipated direction).

Using EZAnalyze for t-Tests

Similarity of Samples
• Paired: Significance is easier to demonstrate if the two samples include exactly the same individuals. The random error that comes from the respondents being different is gone.
• Independent Samples: Significance is more difficult to demonstrate if the two groups are dissimilar. Random error that appears because the respondents are different has to be accounted for.

Random Selection
• With random sampling, all members of the population to which you wish to generalize have an equal chance of being in the sample.
• Scientific studies use true random sampling, which is also called probability sampling (simple and stratified).
• When the members of a population do not all have an equal chance of being in the sample, it is called nonprobability sampling (samples of convenience).
• If your sample is random, you have to carefully explain how you made it that way (methods section).
• If the sample isn't random, you have to work hard at showing that your sample is not dissimilar from the population (methods section).

Random Error
• Random normal variation in groups.
• Outside of the researcher's control.
• Inferential statistics deals with random error really well.
• That is why groups should be formed randomly.
• We will figure out how to deal with nonrandom error when we talk about validity.

Dealing with Non-Random Samples
• Carefully explain how the sample was formed in the methods section.
• Carefully describe important elements of the context of the study that support the idea (or not) that the sample is like the population.
• Carefully explain how the analysis of data will be done.
• List the possible effects of the sampling procedures in the limitations section of the conclusions.
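Returning to the t-test choices above (paired versus independent samples, one-tailed versus two-tailed), the following is a minimal sketch in Python using only the standard library. The scores are hypothetical and the function names are illustrative; none of this comes from EZAnalyze. It computes the t statistic both ways for the same numbers and applies the halving rule to a two-tailed p value.

```python
import math
import statistics


def paired_t(pre, post):
    """t statistic for a paired t-test (post minus pre, same individuals)."""
    diffs = [b - a for a, b in zip(pre, post)]
    se_of_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se_of_diff


def independent_t(group_a, group_b):
    """Welch t statistic for two independent samples (b minus a)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se


def one_tailed_p(two_tailed_p, observed_diff, expected_sign):
    """Convert a two-tailed p to one-tailed.

    Halve p when the observed difference has the anticipated sign;
    otherwise the one-tailed p is 1 - p/2.
    """
    if observed_diff * expected_sign > 0:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2


# Hypothetical pretest and posttest scores for six students.
pre = [60, 72, 85, 90, 55, 78]
post = [66, 75, 91, 93, 62, 84]

t_paired = paired_t(pre, post)        # treats the rows as the same students
t_indep = independent_t(pre, post)    # treats the rows as unrelated groups

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")

# EZAnalyze-style report: mean difference -8.000, two-tailed p = .010,
# and a drop in (pretest minus posttest) was anticipated.
print(one_tailed_p(0.010, -8.0, expected_sign=-1))
```

With these numbers the paired statistic is much larger than the independent one, because the student-to-student variation drops out of the paired differences. Turning either statistic into a p value needs a t table or a statistics package.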
Sample Sizes
[Figure: population distribution compared with sampling distributions of the mean for n=30, n=100, and n=1000]
As the sample size increases, the Sampling Distribution of the Mean gets narrower. The standard error gets numerically smaller.

Effect Size (Practical Significance)
• With large samples it is possible that significant differences will appear from very small mean differences.
• When statistical significance appears, practical significance can be reported by showing the mean difference in units of standard deviation, not standard error (remember z scores).
• The simplest calculation is to determine the distance between the two mean scores and divide by the average standard deviation (Cohen's d).
• Effect sizes over .5 are considered substantial.

Effect Size: Practical Significance
How many standard deviations is the new mean from the first mean?
An effect size of .2 is weak; .5 is moderate; .8 is strong.

Practical Significance
The difference of the means in units of standard deviation

Table 1
Mean Scores on Johnson Problem Solving Inventory for Students With and Without Conflict Resolution Training

          Pre-Test (N = 36)    Post-Test (N = 36)
Mean      74.61                82.61*
SD        13.35                11.85
* = p < .01

Difference in means: 74.61 - 82.61 = -8
Average standard deviation: (13.35 + 11.85)/2 = 12.6
Practical significance: -8/12.6 = -.63

Only report practical significance if the mean differences are statistically significant to begin with.
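The Table 1 effect size arithmetic can be reproduced with a short Python sketch. The function names are illustrative, the "negligible" label for values under .2 is an assumption added here, and the average-SD formula is the simple version described on the slides rather than a pooled standard deviation.

```python
def effect_size(mean_1, sd_1, mean_2, sd_2):
    """Mean difference in units of the average standard deviation."""
    average_sd = (sd_1 + sd_2) / 2
    return (mean_1 - mean_2) / average_sd


def describe(d):
    """Label an effect size using the .2 / .5 / .8 guideline."""
    size = abs(d)  # the sign only shows the direction of the shift
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "negligible"  # assumed label; the slides do not name this range


# Values from Table 1: pretest first, posttest second.
d = effect_size(74.61, 13.35, 82.61, 11.85)
print(round(d, 2), describe(d))  # -0.63 moderate
```

The negative sign comes from subtracting the posttest mean from the pretest mean; the size of the shift is about .63 standard deviations.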
Research Design and Analysis

Research Design
• Groups by Treatment (Independent Variable)
• Data Gathering Events (Dependent Variable)

Did direct instruction improve students' ability to recall math facts?
• Independent: DI to 4th Grade Class
• Dependent: Pre-Test (group data), Post-Test (group data)
• Analysis: t-test, paired if possible

Do students who receive DI achieve better than those who don't?
• Independent: DI to 4th Grade Class; Non-DI to different 4th Grade Class
• Dependent: Test (group data for each class)
• Analysis: t-test, independent samples

Do students who receive DI for math facts retain learning over the summer?
• Independent: DI to 4th Grade Class
• Dependent: Pre-Test (group data), Post-Test (group data), Post-Post-Test (group data)
• Analysis: Repeated Measures

Which instructional strategy works better for teaching math facts?
• Independent: DI; Cooperative; Inquiry
• Dependent: Test (group data for each strategy)
• Analysis: Single Factor

Multiple Tests Simultaneously
• When multiple (more than two) groups are to be compared on the same measure, it is not appropriate to test each pair separately. The comparisons are not independent.
• Use an Analysis of Variance (ANOVA).
• An ANOVA only tells if significant differences exist between at least two groups. It does not tell which group pairs differ. A post hoc analysis is necessary to figure out which group differences are significant.
• Download OWM Data from the site.

ANOVA in EZAnalyze
• Single factor compares different groups on a single measure.
• Repeated measures compares a single group on multiple uses of a single measure.
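To make the single factor ANOVA less of a black box, here is a minimal Python sketch of the F ratio for three groups. The scores are hypothetical, the function name is illustrative, and this is not how EZAnalyze is implemented; it only shows the between-groups versus within-groups comparison that the test is built on.

```python
import itertools
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single factor ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = statistics.mean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of the individual scores around their own group mean.
    ss_within = sum(
        (score - statistics.mean(group)) ** 2
        for group in groups
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical math-facts scores for three instructional strategies.
scores = {
    "DI": [8, 9, 7, 10],
    "Cooperative": [6, 7, 5, 6],
    "Inquiry": [7, 8, 6, 7],
}

f_ratio, df_between, df_within = one_way_anova_f(list(scores.values()))
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")

# The group pairs a post hoc analysis would have to examine.
pairs = list(itertools.combinations(scores, 2))
print(pairs)
```

The F ratio alone says only that at least two groups differ; the list of pairs at the end is what a post hoc test then works through. Converting F to a p value needs an F table or a statistics package.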
[Figure: EZAnalyze output showing the significance of the whole ANOVA and the post hoc comparisons of pre, post, and delayed post]

ANOVA Post Hoc Tests
• Use a Tukey HSD (honestly significant difference) to compute multiple mean differences.
• Accurate with groups of equal size.
• Conservative with unequal variance.
• Estimate by doing multiple t-tests.

Factorial ANOVA
Two independent variables simultaneously. One dependent variable.
Did direct instruction improve students' ability to recall math facts?

                                        Girls         Boys
DI to 4th Grade Class                   group data    group data
Non-DI to different 4th Grade Class     group data    group data

Factorial ANOVA
• You will need three columns in Excel.
• The first will be the respondent number.
• The second will indicate which of the four (or more) groups a score represents. In our case this is DI Girls, DI Boys, Non-DI Girls, and Non-DI Boys.
• The third column will have the score for each individual.
• Use a single factor ANOVA.
• If significant, you will have 6 comparisons to examine post hoc.

Things to remember…
• You have to figure out which t-test to use by judging the similarity of the groups.
• Decide if your comparison should be one-tailed or two-tailed.
• If you are comparing more than two groups simultaneously, you have to use an ANOVA, not a t-test.
• Compute effect sizes, particularly if the groups are large.
• Be random when you can.

Exercise
• Go to the Variable Exercise sheet on the Web site.
• Identify the independent and dependent variables for all of the studies.
• Pick one of the studies. Design a study following the prompts on the page.

Excel Again
• Download the data set called Reading Data.
• Students were asked about the amount of time they spent each week reading online, reading for pleasure (not online), and reading for homework. Is there a significant difference among those reported times?

Being Wrong
[Figure: sampling distribution with the test group mean in the 5% (.05) region]
• We say that a result occurring randomly less than 5% of the time is so unlikely that it isn't random. But when there is no real difference, that statement will be wrong 5% of the time.
• Type 1 Error: Saying it is not random when it was. (A false positive)

Being Wrong
[Figure: sampling distribution with the test group mean outside the 5% (.05) region]
• We say that a result occurring randomly more than 5% of the time is too likely, so chance is the best explanation. But sometimes real differences occur even though they look like chance.
• Type 2 Error: Saying it is random when it was not. (A false negative)

Reducing Being Wrong
• Reduce Type 1 errors by lowering the alpha level or using more conservative calculations.
• Reduce Type 2 errors by increasing the sample size.
• Reduce all errors by improving the study design (validity).
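Both kinds of error can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not part of the course materials: it uses a z-test with a known population standard deviation of 1 (so the two-tailed .05 cutoff is 1.96) instead of a t-test, and the sample sizes, the shift of .5 standard deviations, and the function names are all hypothetical.

```python
import math
import random
import statistics

Z_CRITICAL = 1.96  # two-tailed cutoff for alpha = .05 when the SD is known


def rejects_null(n, true_shift, rng, sd=1.0):
    """Draw two groups of size n and test whether their means differ."""
    group_a = [rng.gauss(0.0, sd) for _ in range(n)]
    group_b = [rng.gauss(true_shift, sd) for _ in range(n)]
    standard_error = sd * math.sqrt(2 / n)
    z = (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error
    return abs(z) > Z_CRITICAL


def rejection_rate(n, true_shift, trials=4000, seed=1):
    """Share of simulated studies that come out 'significant'."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(rejects_null(n, true_shift, rng) for _ in range(trials))
    return hits / trials


# No real difference: every 'significant' result is a Type 1 error.
type_1_rate = rejection_rate(n=30, true_shift=0.0)

# A real difference of .5 SD: every non-significant result is a Type 2 error.
power_small = rejection_rate(n=10, true_shift=0.5)
power_large = rejection_rate(n=50, true_shift=0.5)

print(f"Type 1 error rate with no real difference: {type_1_rate:.3f}")
print(f"Type 2 error rate with n=10 per group: {1 - power_small:.3f}")
print(f"Type 2 error rate with n=50 per group: {1 - power_large:.3f}")
```

The first rate should land near .05, which is the alpha level doing what it promises. The second and third show the Type 2 error rate falling as the sample size grows.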