Shapiro–Wilk Statistic

Using statistics in small-scale language education research
Jean Turner
© Taylor & Francis 2014

Calculate and report descriptive statistics.

Create and review a histogram.*

Calculate and interpret the Shapiro–Wilk statistic.
*a.k.a. frequency distribution
Student #   Score     Student #   Score
1st         4         12th        13
2nd         5         13th        13
3rd         7         14th        13
4th         8         15th        14
5th         8         16th        14
6th         9         17th        14
7th         9         18th        15
8th         10        19th        15
9th         10        20th        15
10th        10        21st        15
11th        13

Mean = 11.14286

Median = 13

Mode = 13 and 15

Range = 11 points

Standard deviation = 3.43927
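To see where the standard deviation comes from, here is a minimal sketch that builds it step by step from the 21 scores in the table. It assumes the sample formula (dividing by n - 1), which is the formula R's built-in sd() uses. The names deviations, sample_variance, and by_hand_sd are only illustrative.

```r
# Scores of the 21 students from the table above
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
            13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

n <- length(mydata)                    # number of scores
deviations <- mydata - mean(mydata)    # distance of each score from the mean

# Assumption: sample variance, so divide by n - 1 rather than n
sample_variance <- sum(deviations^2) / (n - 1)
by_hand_sd <- sqrt(sample_variance)

by_hand_sd     # standard deviation built step by step
sd(mydata)     # R's built-in function, for comparison
```

The two values should agree, which is a quick way to check that the data were entered correctly.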

The descriptive statistics give a sense of ...
◦ central tendency
◦ dispersion

The histogram gives a sense of...
◦ the general shape of the distribution
◦ the possibility of outlier scores

In Parametric Statistics Land...
◦ Researchers believe their data will match the normal
distribution model.

The hypothesis that one of these researchers would
propose is:
◦ Null hypothesis: The data are (probably) normally
distributed.
How likely is it that the scores are normally distributed?
The Shapiro–Wilk statistic tests that hypothesis!

Enter the data.
>mydata = c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

Calculate descriptive statistics. (Remember how?)
>summary (mydata)
>subset (table (mydata), table(mydata)==max (table(mydata)))
>sd (mydata)
>max (mydata) - min (mydata)

Make a histogram.
>hist (mydata, col = "orange", breaks = 10)
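If it helps to read the histogram as numbers, the sketch below asks hist() for the bin information without drawing the plot. It assumes the same 21 scores as above. The name h is only illustrative.

```r
# Scores of the 21 students
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
            13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

# plot = FALSE returns the bin information instead of drawing the histogram
h <- hist(mydata, breaks = 10, plot = FALSE)

h$breaks   # bin boundaries R chose (breaks = 10 is treated as a suggestion)
h$counts   # how many scores fall in each bin
```

Bins with a count of zero that separate one or two scores from the rest are a hint that those scores may be outliers.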

Calculate the Shapiro–Wilk statistic.
>shapiro.test (mydata)
Shapiro–Wilk normality test
data: mydata
W = 0.9002, p-value = 0.03527
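The printed output can also be stored and picked apart, which is handy when the values need to go into a report. This is a minimal sketch using the same data. The names result, w_value, and p_value are only illustrative.

```r
# Scores of the 21 students
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
            13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

# shapiro.test() returns a list-like object; store it instead of only printing it
result <- shapiro.test(mydata)

w_value <- unname(result$statistic)   # the observed W
p_value <- result$p.value             # the p-value

round(w_value, 4)
round(p_value, 5)
```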

The observed value of the Shapiro–Wilk statistic is:
W = 0.9002

The probability of a W this low or lower, if the data really came from a normal distribution, is:
p-value = 0.03527
What does this mean? Are the data
probably normally distributed or not?

For the Shapiro–Wilk statistic:
◦ If p is more than .05, we do not reject the null hypothesis. The data are
consistent with a normal distribution. (This does not prove that they are
normal.)
◦ If p is less than .05, we reject the null hypothesis at the .05 level.
The data are probably not normally distributed.

Oh, p = 0.03527 is less than .05.
◦ The null hypothesis is probably not true.
◦ I can reject it at the .05 level of significance!
◦ The data are probably not normally distributed.
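The decision rule can be sketched as a small function. check_normality is a hypothetical helper written for illustration (it is not part of R), and the cutoff of .05 is the one used on these slides.

```r
# Hypothetical helper, not part of R: applies the .05 decision rule
check_normality <- function(scores, alpha = 0.05) {
  p <- shapiro.test(scores)$p.value
  if (p < alpha) {
    "Reject the null hypothesis: the data are probably not normally distributed."
  } else {
    "Do not reject the null hypothesis: the data are consistent with a normal distribution."
  }
}

# Scores of the 21 students
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
            13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)

check_normality(mydata)
```

Keeping alpha as an argument makes it easy to try a stricter cutoff, such as .01, without rewriting the function.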

Check homework practice problem #19 from Chapter
Two.
The null hypothesis: The data are (probably) normally distributed.

Enter the data.
>spanish.vocab = c(41, 33, 32, 29, 27, 27, 26, 24, 19,
19, 18, 17, 14)

>shapiro.test (spanish.vocab)
Shapiro–Wilk normality test
data: spanish.vocab
W = 0.958, p-value = 0.7225

The observed value of the Shapiro–Wilk statistic is:
W = 0.958

The probability of a W this low or lower, if the data really came from a normal distribution, is:
p-value = 0.7225
I’m reminding myself…

For the Shapiro–Wilk statistic:
◦ If p is more than .05, we do not reject the null hypothesis. The data are
consistent with a normal distribution. (This does not prove that they are
normal.)
◦ If p is less than .05, we reject the null hypothesis at the .05 level.
The data are probably not normally distributed.

For the Spanish data, p = .7225, which is greater than
.05.
◦ The null hypothesis is not rejected.
◦ I don’t have enough evidence to say the data are not normal.
◦ The data probably are normally distributed.
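To compare the two examples side by side, the sketch below runs the test on both sets of scores and collects W and p in one small table. The names datasets and results are only illustrative.

```r
# The two sets of scores used in these slides
mydata <- c(4, 5, 7, 8, 8, 9, 9, 10, 10, 10,
            13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 15)
spanish.vocab <- c(41, 33, 32, 29, 27, 27, 26, 24, 19,
                   19, 18, 17, 14)

datasets <- list(mydata = mydata, spanish.vocab = spanish.vocab)

# Run shapiro.test() on each set and keep only W and the p-value
results <- sapply(datasets, function(scores) {
  test <- shapiro.test(scores)
  c(W = unname(test$statistic), p = test$p.value)
})

round(results, 4)        # one column per data set
results["p", ] < 0.05    # TRUE means the null hypothesis is rejected
```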