Using statistics in small-scale language education research
Jean Turner
© Taylor & Francis 2014

Calculate and report descriptive statistics.
Create and review a histogram.*
Calculate and interpret the Shapiro–Wilk statistic.
*a.k.a. frequency distribution

Student #   Score     Student #   Score
1st         4         12th        13
2nd         5         13th        13
3rd         7         14th        13
4th         8         15th        14
5th         8         16th        14
6th         9         17th        14
7th         9         18th        15
8th         10        19th        15
9th         10        20th        15
10th        10        21st        15
11th        13

Mean = 11.14286
Median = 13
Mode = 13 and 15
Range = 11 points
Standard deviation = 3.43927

The descriptive statistics give a sense of ...
◦ central tendency
◦ dispersion
The histogram gives a sense of ...
◦ the general shape of the distribution
◦ the possibility of outlier scores

In Parametric Statistics Land ...
◦ Researchers believe their data will match the normal distribution model.
The hypothesis that one of these researchers would propose is:
◦ Null hypothesis: The data are (probably) normally distributed.

How likely is it that the scores are normally distributed?
The Shapiro–Wilk statistic tests that hypothesis!

Enter the data.
> mydata = c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

Calculate descriptive statistics. (Remember how?)
> summary(mydata)
> subset(table(mydata), table(mydata) == max(table(mydata)))
> sd(mydata)
> max(mydata) - min(mydata)

Make a histogram.
> hist(mydata, col = "orange", breaks = 10)

Calculate the Shapiro–Wilk statistic.
> shapiro.test(mydata)

        Shapiro–Wilk normality test

data:  mydata
W = 0.9002, p-value = 0.03527

The observed value of the Shapiro–Wilk statistic is:
W = 0.9002
The probability of a W this low or lower, if the null hypothesis is true, is:
p-value = 0.03527

What does this mean? Are the data probably normally distributed or not?

For the Shapiro–Wilk statistic:
◦ If p is more than .05, we do not reject the null hypothesis: the data are consistent with a normal distribution. (In other words, there is no evidence against the null hypothesis at the .05 level.)
◦ If p is less than .05, we reject the null hypothesis at the .05 level: the data are probably not normally distributed. (In other words, the null hypothesis is probably false.)

Oh, p = 0.03527 is less than .05.
◦ The null hypothesis is probably not true.
◦ I reject it at the .05 level!
◦ The data are probably not normally distributed.

Check homework practice problem #19 from Chapter Two.
The null hypothesis: The data are (probably) normally distributed.
Enter the data.
> spanish.vocab = c(41, 33, 32, 29, 27, 27, 26, 24, 19, 19, 18, 17, 14)

> shapiro.test(spanish.vocab)

        Shapiro–Wilk normality test

data:  spanish.vocab
W = 0.958, p-value = 0.7225

The observed value of the Shapiro–Wilk statistic is:
W = 0.958
The probability of a W this low or lower, if the null hypothesis is true, is:
p-value = 0.7225

I'm reminding myself ...
For the Shapiro–Wilk statistic:
◦ If p is more than .05, we do not reject the null hypothesis: the data are consistent with a normal distribution. (In other words, there is no evidence against the null hypothesis at the .05 level.)
◦ If p is less than .05, we reject the null hypothesis at the .05 level: the data are probably not normally distributed. (That is, the null hypothesis is probably false.)

For the Spanish data, p = .7225, which is greater than .05.
◦ I cannot reject the null hypothesis at the .05 level.
◦ The data are consistent with a normal distribution.
◦ I can treat the data as (probably) normally distributed.
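
The steps in these slides (descriptive statistics, then the Shapiro–Wilk test, then a decision at the .05 level) can be gathered into one short R sketch. This is only an illustration, not something from the slides: the helper name `describe_scores`, its `alpha` argument, and the names of the list elements it returns are all made up for this example. It uses only base R functions (`mean`, `median`, `table`, `sd`, `shapiro.test`).

```r
# Hypothetical helper (not from the slides): collects the descriptive
# statistics and the Shapiro-Wilk result for one set of scores.
describe_scores <- function(scores, alpha = 0.05) {
  counts <- table(scores)
  # Every score that occurs the maximum number of times is a mode.
  modes <- as.numeric(names(counts)[counts == max(counts)])
  sw <- shapiro.test(scores)
  list(
    mean = mean(scores),
    median = median(scores),
    mode = modes,
    range = max(scores) - min(scores),
    sd = sd(scores),                  # sample standard deviation (n - 1)
    W = unname(sw$statistic),
    p.value = sw$p.value,
    # TRUE means: reject the null hypothesis of normality at level alpha.
    reject.null = sw$p.value < alpha
  )
}

# The two data sets entered in the slides.
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10, 13, 13, 13, 13,
            14, 14, 14, 15, 15, 15, 15)
spanish.vocab <- c(41, 33, 32, 29, 27, 27, 26, 24, 19, 19, 18, 17, 14)

result.mydata <- describe_scores(mydata)
result.spanish <- describe_scores(spanish.vocab)

# Print a compact overview of each result.
str(result.mydata)
str(result.spanish)
```

Returning a list keeps every value available by name, so `result.mydata$reject.null` answers the "normally distributed or not?" question directly, while `result.mydata$p.value` can still be reported alongside W.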