Turner Answer Key for Chapter Six Revised 4 2 2015
Using statistics in small-scale language education research: Focus on non-parametric data

Chapter Six Practice Problems Answer Key

Study A: The coordinator of a private school in Japan that offers a test preparation class for the Test of English for International Communication (TOEIC) was committed to learning how beneficial the school’s training course might be. All of the people who attend TOEIC preparation classes at the school complete two practice tests before they take the official TOEIC test. In April, 42 people who had completed the test preparation course took the test when it was offered. The coordinator received the official mean for all of the people who took the test in April. The mean and standard deviation for the 42 people who had completed TOEIC preparation and took the test in April are presented below. Did the 42 students who had completed the course perform better on the TOEIC than the population of April test takers? (Descriptive statistics are fabricated.)

All Japanese test takers: mean = 774
People who completed his training + 2 practice tests: mean = 820, s = 21, n = 42

Follow the steps in statistical reasoning to determine whether people who completed his program scored significantly better than test takers who did not. When you report the outcome and make your conclusions in Step 10, please keep in mind the design of the study, which is ex post facto.

Step 1. State hypotheses
H0: There is no statistically significant difference between the mean of the population of test takers and the mean of the people who completed the special training with practice tests.
H1: The mean of the population of test takers is significantly higher than the mean of the people who completed the special training with practice tests.
H2: The mean of the population of test takers is significantly lower than the mean of the people who completed the special training with practice tests.

Step 2. Set alpha
alpha = .01

Step 3.
Identify the appropriate statistic for the analysis
Case I t-test

Step 4. Collect the data
(means and standard deviation of the sample presented above)

Step 5. Check the assumptions
The data are normally distributed in the population (yes; the TOEIC is a norm-referenced test designed to yield a normal distribution of scores).
The sample is a subset of the population (yes).

Step 6. Calculate the observed value of the statistic

$$t_{observed} = \frac{\mu - \bar{X}}{s_x / \sqrt{n_x}} = \frac{774 - 820}{21 / \sqrt{42}} = \frac{-46}{21 / 6.48} = \frac{-46}{3.24} = -14.20$$

Step 7. Use df and alpha to find the critical value of t.
df for the Case I t-test is the number of people in the small group (the sample) minus one (n_x - 1), so df = 41. I used df = 40 in the chart of critical values on pp. 104-105, so tcritical = 2.7045.

Step 8. I compare the absolute value of tobserved to tcritical in this step. I must remove the negative sign from the tobserved value.
14.20 is greater than 2.7045.
So, following the rules in statistical logic, I reject the null hypothesis. I accept the appropriate alternative hypothesis (the one that states that the population mean is less than the mean of the sample, H2) and make the probability statement.

Step 9. I can be 99% certain that the mean of the population of test takers is significantly lower than the mean of the people who completed the special training with practice tests.

Step 10. I interpret meaningfulness in this step.
The researcher can be confident that the students who took the test preparation class performed significantly better than the population of test takers (tobserved = 14.20, df = 41, alpha = .01), and the difference is quite strong (effect size = .91); however, the research design is ex post facto, so the findings do not support a causal statement.
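The arithmetic in Steps 6 through 8 can be checked with a short R sketch. This is only an illustration built from the summary statistics given for Study A; the variable names are hypothetical, the subtraction follows the order used in this key (population mean minus sample mean), and qt() is used with the exact df of 41 rather than the df = 40 row of the printed chart, so the critical value differs slightly from 2.7045.

```r
# Summary statistics reported for Study A (fabricated data)
mu   <- 774   # mean of all April test takers
xbar <- 820   # mean of the 42 course completers
s    <- 21    # sample standard deviation
n    <- 42    # sample size

# Case I t: difference between the means divided by the standard error
se         <- s / sqrt(n)
t_observed <- (mu - xbar) / se   # negative, as in Step 6

# Two-tailed critical value at alpha = .01 with the exact df
df         <- n - 1
alpha      <- 0.01
t_critical <- qt(1 - alpha / 2, df)

# Step 8: compare the absolute observed value to the critical value
reject_null <- abs(t_observed) > t_critical

print(round(c(t_observed = t_observed, t_critical = t_critical), 4))
print(reject_null)
```

Using the exact df makes the test very slightly less conservative than the chart lookup, but the decision is the same either way.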
(That is, the researcher can be confident that there is a statistically significant difference, but the researcher cannot assert that the difference is due to the learners’ participation in the test preparation course!)

Effect size

$$\sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{14.20^2}{14.20^2 + 41}} = \sqrt{\frac{201.64}{201.64 + 41}} = \sqrt{\frac{201.64}{242.64}} = \sqrt{.83} = .91$$

Study B: A teacher wanted to know whether students would benefit from completing and discussing a practice final test before taking the final test itself. She designed two equivalent forms of her final test and distributed Form A to her students one week before the date of the final exam so they could complete the practice test as homework. She reviewed the test with the students during the class meeting two days before the final test administration date. The students completed Form B as the final exam. She compared the students’ scores on Form A to the scores on Form B to determine whether there was a statistically significant difference in the students' performance on the two test forms. Her students' scores on Form A and Form B are presented in the chart. Fill in the descriptive statistics and follow the steps in statistical logic to determine whether the students performed differently on the two test administrations.

                     Score on A        Score on B
Student 1            69                68
Student 2            70                70
Student 3            75                72
Student 4            76                73
Student 5            76                72
Student 6            78                73
Student 7            78                74
Student 8            80                75
Student 9            81                74
Student 10           81                76
Student 11           90                80
Student 12           89                81
Student 13           80                79
mean                 78.69             74.38
median               78                74
mode                 76, 78, 80, 81    72, 73, 74
standard deviation   6.101702          3.819652
range                21                13

Follow the steps in statistical reasoning to determine if there was a statistically significant difference between participants’ performance on the two tests.
When you report the outcome and make your conclusions in Step 10, keep in mind the design of the study, which is pre-experimental.

Step 1. State hypotheses
H0: There is no statistically significant difference between the mean of Test Form A and Test Form B.
H1: There is a statistically significant positive difference between the mean of Test Form A and Test Form B (the mean for Test Form A is significantly higher than the mean for Test Form B).
H2: There is a statistically significant negative difference between the mean of Test Form A and Test Form B (the mean for Test Form A is significantly lower than the mean for Test Form B).

Step 2. Set alpha
alpha = .01

Step 3. Identify the appropriate statistic for the analysis
Case II Paired Samples t-test

Step 4. Collect the data
(presented in chart with descriptive statistics, R commands below)

> scorea = c(69, 70, 75, 76, 76, 78, 78, 80, 81, 81, 90, 89, 80)
> scoreb = c(68, 70, 72, 73, 72, 73, 74, 75, 74, 76, 80, 81, 79)
> summary(scorea)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  69.00   76.00   78.00   78.69   81.00   90.00
> sd(scorea)
[1] 6.101702
> subset(table(scorea), (table(scorea)==max(table(scorea))))
scorea
76 78 80 81
 2  2  2  2
> 90-69
[1] 21
> summary(scoreb)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  68.00   72.00   74.00   74.38   76.00   81.00
> sd(scoreb)
[1] 3.819652
> 81-68
[1] 13
> subset(table(scoreb), (table(scoreb)==max(table(scoreb))))
scoreb
72 73 74
 2  2  2

Step 5. Check the assumptions
Both sets of data are normally distributed. Review histograms and interpret Shapiro Wilk. (It appears that I can be reasonably certain the data are normally distributed).
> hist(scorea, col = "light green")
> hist(scoreb, col = "light blue")
> shapiro.test(scorea)

        Shapiro-Wilk normality test

data:  scorea
W = 0.9334, p-value = 0.3767

> shapiro.test(scoreb)

        Shapiro-Wilk normality test

data:  scoreb
W = 0.9558, p-value = 0.6877

Step 6. Calculate the observed value of the statistic
I used R.

> t.test(scorea, scoreb, paired = TRUE)

        Paired t-test

data:  scorea and scoreb
t = 5.4137, df = 12, p-value = 0.0001566
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.574014 6.041370
sample estimates:
mean of the differences
               4.307692

Step 7.
Using the critical value approach: Determine tcritical using df and alpha
The formula for df for the Case II Paired Samples t-test is (npairs - 1), so df = 13 - 1 = 12.
The chart in Chapter Five on pp. 104-105 gives tcritical as 3.0545 (df = 12, α = .01).
Using the exact probability approach:
I simply retrieve the exact probability from the R output; exact p = 0.0001566.

Step 8.
Using the critical value approach: Compare tobserved to tcritical
5.4137 is greater than 3.0545, so reject the null hypothesis and accept the appropriate alternative.
Using the exact probability approach: Compare exact probability to alpha
0.0001566 is less than alpha, so reject the null hypothesis and accept the appropriate alternative.

Step 9. Make probability statement
I can be 99% certain that there is a statistically significant positive difference between the mean of Test Form A and Test Form B (the mean for Test Form A is significantly higher than the mean for Test Form B).

Step 10.
Interpret meaningfulness
The teacher/researcher can be confident that students do better on the practice test than the actual final, so she concludes that having her students complete and discuss a practice final test is not beneficial (tobserved = 5.4137, df = 12, p < .01, effect size = .84). (Incidentally, I don’t agree with her interpretation, but that’s what she thinks on the basis of this fabricated data!)

Effect size

$$\sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{5.4137^2}{5.4137^2 + 12}} = \sqrt{\frac{29.3081}{29.3081 + 12}} = \sqrt{\frac{29.3081}{41.3081}} = \sqrt{.71} = .84$$

Study C: The researcher for a large school district is investigating whether 5th grade children whose parents receive coaching in how to help their children with homework achieve a greater degree of learning than children whose parents haven’t been coached. The researcher randomly selected 80 5th grade children to participate in the study. The parents of 40 of the 5th grade children, randomly selected from the 80 that had been randomly selected, were invited to participate in 6 hours of coaching on how to help their children do homework assignments. After these sessions, throughout the term, the parents participated in follow-up sessions during which they received additional tips on how to help their children. The parents also turned in biweekly surveys that helped the researcher verify that the parents had been following the advice they had received. The parents of the other group of 40 children received the usual reports on their children's progress in school. At the end of the term, all of the children took a state test intended to measure students’ learning. The test is designed to yield normally distributed scores. The researcher compared the performance of the children whose parents were coached to the performance of children whose parents weren’t coached.
The data are located in the resource section of the companion website.

1. What is the independent variable and what are its levels?
Coaching for the children’s parents, with two levels: coached (+) and not coached (-)

2. What is the dependent variable?
Students’ learning

3. Identify the major category of research design and your reasons for your choice.
There is a treatment (the innovation of coaching students’ parents in how to assist their children with their homework). This research takes place in a large school district, so the design may be true experimental. If the school district is not sufficiently large to be considered a population, the design is pre-experimental.

Now follow the 10 steps in statistical logic to determine whether the children whose parents were coached performed significantly better on the state test than the children whose parents weren't coached. The descriptive statistics for the two groups are presented below. The (fabricated) children's scores on the state test were sent by email if you want to use R. (I did the calculations using R, and followed the procedure described in Chapter Six for separating the coached from the uncoached to determine the descriptive statistics for the two groups and check the assumptions. See my R commands inserted in the steps below.)

Parents Coached: mean = 85.25, s = 5.77, n = 40
Parents Un-coached: mean = 80.30, s = 7.85, n = 40

Step 1. State hypotheses
H0: There is no statistically significant difference in the learning of 5th graders whose parents received coaching on helping with homework and those whose parents did not receive coaching.
H1: There is a statistically significant positive difference in the learning of 5th graders whose parents received coaching on helping with homework and those whose parents did not receive coaching.
H2: There is a statistically significant negative difference in the learning of 5th graders whose parents received coaching on helping with homework and those whose parents did not receive coaching.

Step 2.
Set alpha
alpha = .01

Step 3. Identify the appropriate statistic for the analysis
Case II Independent Samples t-test

Step 4. Collect the data
I imported the data set using this command:
> data.coaching.problem = read.csv(file.choose(), header = TRUE)
I viewed the data set using this command:
> View(data.coaching.problem)
Here’s the dataset; the coached group is “1” and uncoached group is “2” (I think). All dependent variable values for all of the participants are in one column (with the heading, score). This way of formatting the spreadsheet is the approach typically used by researchers.

Student   coached   score
1         1         76
2         1         76
3         1         76
4         1         77
5         1         77
6         1         79
7         1         79
8         1         80
9         1         80
10        1         80
11        1         81
12        1         81
13        1         82
14        1         82
15        1         84
16        1         84
17        1         84
18        1         84
19        1         84
20        1         85
21        1         85
22        1         86
23        1         86
24        1         87
25        1         87
26        1         87
27        1         88
28        1         88
29        1         88
30        1         89
31        1         89
32        1         91
33        1         91
34        1         92
35        1         92
36        1         92
37        1         93
38        1         94
39        1         96
40        1         98
41        2         70
42        2         70
43        2         70
44        2         70
45        2         70

Step 5. Check the assumptions
The assumptions are:
1) The independent variable is nominal and has only two levels.
2) There are different participants in the two groups.
3) The dependent variable is interval or interval-like.
   Make and interpret a histogram of each group’s data.
   Calculate and interpret the Shapiro Wilk statistic for each group.
4) The groups are exactly the same size, or the variances (standard deviation squared) of the groups are approximately equal.
   Verify that groups are same size
   OR
   Calculate the Levene Test statistic to verify that variances are approximately equal
   OR
   Use R (and Welch’s formula, which corrects for violation) to calculate tobserved.
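Assumption 4 mentions Welch’s formula. As a rough sketch of what that correction does, the R lines below compute the Welch t and its adjusted degrees of freedom from the rounded group summaries reported for this study (means 85.25 and 80.30, s = 5.77 and 7.85, n = 40 each). The variable names are illustrative only, and because the standard deviations are rounded, the results only approximate what t.test returns on the raw scores.

```r
# Rounded group summaries reported for Study C (fabricated data)
m1 <- 85.25; s1 <- 5.77; n1 <- 40   # parents coached
m2 <- 80.30; s2 <- 7.85; n2 <- 40   # parents un-coached

# Squared standard error contributed by each group
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t: the variances are not pooled
t_welch <- (m1 - m2) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Unadjusted df for comparison: (n1 - 1) + (n2 - 1)
df_unadjusted <- (n1 - 1) + (n2 - 1)

print(round(c(t = t_welch, df_welch = df_welch, df_unadjusted = df_unadjusted), 3))
```

When the two standard deviations differ, the Welch df falls below the unadjusted df, which makes the test somewhat more conservative.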
I need to split the complete dataset to check assumptions 3 & 4 and calculate the descriptive statistics (which are typically reported!). I follow the steps in Chapter Six, Box 6.1 to split the complete dataset, making a separate dataset for the coached group and the uncoached group, and I use the length command to see how many people are in each group (yes, each group has 40 participants).

> coached.data = subset(data.coaching.problem, data.coaching.problem$coached=="1")
[Note that I enter a name for the coached data (coached.data), then enter the subset command. The name of the complete dataset is given next, then the name of the column in that dataset that includes the independent variable values; I tell R to make a dataset called coached.data which includes only the people who are in Group 1.]

> uncoached.data = subset(data.coaching.problem, data.coaching.problem$coached=="2")
[Now I make a dataset called uncoached.data]

> length(coached.data$score)
[1] 40
> length(uncoached.data$score)
[1] 40
> summary(coached.data$score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  76.00   80.75   85.00   85.25   89.00   98.00
> sd(coached.data$score)
[1] 5.767949
> 98-76
[1] 22
> subset(table(coached.data$score), (table(coached.data$score)==max(table(coached.data$score))))
84
 5
> summary(uncoached.data$score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   70.0    73.0    79.0    80.3    87.0    93.0
> sd(uncoached.data$score)
[1] 7.845299
> subset(table(uncoached.data$score), (table(uncoached.data$score)==max(table(uncoached.data$score))))
70 73
 5  5
> 93-70
[1] 23

Here are the descriptive statistics.
                     Coached (1)   Uncoached (2)
mean                 85.25         80.30
median               85            79
mode                 84            70, 73
standard deviation   5.767949      7.845299
range                22            23

I then made histograms and calculated the Shapiro Wilk statistic for each group.

> par(mfrow = c(1,2))
> hist(coached.data$score, col = "red")
> hist(uncoached.data$score, col = "purple")
> shapiro.test(coached.data$score)

        Shapiro-Wilk normality test

data:  coached.data$score
W = 0.9735, p-value = 0.4608

> shapiro.test(uncoached.data$score)

        Shapiro-Wilk normality test

data:  uncoached.data$score
W = 0.9093, p-value = 0.003602

The histogram and the Shapiro Wilk statistic indicate that the data for the uncoached group are probably not normally distributed, so the Case II Independent Samples t-test SHOULD NOT be used (the non-parametric Wilcoxon Rank Sum Test is appropriate instead), but I’ll go ahead and calculate the Independent Samples t-test statistic, so we can see the outcome, and because the details of the Wilcoxon statistics are presented in the next chapter!

I don’t need to calculate the Levene Test statistic because the groups are the same size (and because R uses the Welch formula for calculating the Case II Independent Samples t-test, which corrects for any difference between the standard deviations of the two groups).

Step 6. Calculate tobserved.
There are several ways to enter the data for the t-test.
I can do it using the two separate groups, like this:
> t.test(coached.data$score, uncoached.data$score)
Or I can use the complete data set like this:
> t.test(score ~ coached, data = data.coaching.problem)
[Note that the name of the dependent variable is first inside the parentheses; then a tilde and the name of the independent variable; then data = and the name of the complete data set.]

        Welch Two Sample t-test

data:  score by coached
t = 3.2151, df = 71.628, p-value = 0.001957
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.880539 8.019461
sample estimates:
mean in group 1 mean in group 2
          85.25           80.30

Step 7.
Using the critical value approach: Determine tcritical using df and alpha
The formula for df for the Case II Independent Samples t-test is (n1 - 1) + (n2 - 1), so df = 78.
I used df = 70 from the chart in Chapter Five on pp. 104-105, so tcritical = 2.6479.
Using the exact probability approach:
I simply retrieve the exact probability from the R output, so exact p = 0.001957.

Step 8.
Using the critical value approach: Compare tobserved to tcritical
3.2151 is greater than 2.6479, so reject the null hypothesis and accept the alternative, H1.
Using the exact probability approach: Compare exact probability to alpha
0.001957 is less than alpha, so reject the null hypothesis and accept the alternative, H1.

Step 9. Make the appropriate probability statement
I can be 99% certain that there is a statistically significant positive difference in the learning of 5th graders whose parents received coaching on helping with homework and those whose parents did not receive coaching.

Step 10. Interpret meaningfulness.
On the basis of these (fabricated) data, we can be confident that students whose parents are coached on how to help their children with their homework achieve a higher level of learning than students whose parents do not receive this coaching (tobserved = 3.2151, df = 71.628*, p < .01). The effect size (.355) indicates that the difference is strong.

*Note that I used R to calculate the observed value of t, which used the Welch formula and corrects for the difference in the variances of the two groups. This correction is reflected in the degrees of freedom.

Effect size calculation:

$$\sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{3.2151^2}{3.2151^2 + 71.628}} = \sqrt{\frac{10.3369}{10.3369 + 71.628}} = \sqrt{\frac{10.3369}{81.9649}} = \sqrt{.126} = .355$$
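The effect size arithmetic used in all three studies can be collected into one small R helper. This is a minimal sketch: the function name effect_size is hypothetical (it is not from the chapter or from base R), and the t and df values are the ones reported in Step 6 and Step 7 of each study.

```r
# Hypothetical helper: square root of t^2 / (t^2 + df)
effect_size <- function(t, df) {
  sqrt(t^2 / (t^2 + df))
}

# Observed t and df as reported for each study
study_a <- effect_size(14.20, 41)       # Case I t-test
study_b <- effect_size(5.4137, 12)      # Case II paired samples t-test
study_c <- effect_size(3.2151, 71.628)  # Case II independent samples t-test (Welch df)

print(round(c(A = study_a, B = study_b, C = study_c), 3))
```

Because t is squared, the sign of tobserved does not matter, so the absolute value from Step 8 and the signed value from Step 6 give the same effect size.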