Turner
Answer Key for Chapter Eight
Using statistics in small-scale language education research: Focus on non-parametric data
Chapter Eight
Practice Problem Answer Key
You can retrieve the data for this problem from the Companion Website. Download the dataset
and save it on your computer as an Excel document in the comma-separated values format. Note
that there are three columns of information in the dataset, but for this analysis, the independent
variable values are in the column labeled “class” and the dependent variable values are in the
column labeled “posttest.”
Step 1. State hypotheses
H0: There is no statistically significant difference in students’ language learning (as
measured by their performance on the CELT) depending on the type of instruction they
received.
H1: There is a statistically significant difference in students’ language learning (as
measured by their performance on the CELT) depending on the type of instruction they
received.
Step 2. Set alpha
alpha = .01
Step 3. Identify the appropriate statistic for the analysis:
I propose to use a 1-way, between-groups ANOVA to analyze the data because:
• the independent variable has more than 2 levels;
• the dependent variable is interval or interval-like and the data are normally distributed in
the population and in the groups defined by the levels of the independent variable;
• the groups defined by the levels of the independent variable consist of different
individuals;
• I want to know whether there is a difference in the performance of the groups defined by
the independent variable;
• I am working in the parametric paradigm: my participants were randomly selected from
a population and I wish to generalize conclusions from my study to that population;
• there is a minimum of 5 (or 10) participants in each of the groups defined by the levels of
the independent variable; and
• the number of participants in each of the groups defined by the independent variable is
exactly the same OR the variances of the groups are approximately equal.
Step 4. Collect the data.
Retrieve the dataset from the Companion Website at www.routledge.com/cw/turner and save it
on your computer in the comma-separated values format before you import it into R. (You may
get an error message if you don’t save the dataset in the comma-separated values format before
importing it into R!)
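As a rough illustration of what a comma-separated values file looks like to R, the sketch below writes a few rows to a temporary file and reads them back with an explicit path instead of file.choose(). The rows are invented for illustration and are not the Companion Website data; the column name "id" is an assumption, since only "class" and "posttest" are named in this answer key.

```r
# Hypothetical rows for illustration only; NOT the Companion Website data.
# Only "class" and "posttest" are named in the answer key; "id" is an
# assumed name for the third column.
demo.rows <- data.frame(
  id       = 1:4,
  class    = c(1, 2, 3, 4),
  posttest = c(52, 47, 61, 58)
)

# Write the rows to a temporary comma-separated values file
csv.path <- tempfile(fileext = ".csv")
write.csv(demo.rows, csv.path, row.names = FALSE)

# Import the file with read.csv(), giving the path explicitly
imported <- read.csv(csv.path, header = TRUE)

# Confirm the header names and the number of cases
names(imported)
nrow(imported)
```

An explicit path makes the import repeatable in a script, whereas file.choose() opens a dialog each time.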
Step 5. Check the assumptions
• the independent variable has more than 2 levels. Yes. The independent variable has 4
levels. The type of instruction is indicated by the number 1, 2, 3, or 4 in the first column
of the dataset, which has the label class.
• the dependent variable is interval or interval-like and the data are normally distributed in
the population and in the groups defined by the levels of the independent variable. Yes.
The data are CELT scores. The CELT is designed to yield normally distributed scores
when it is administered to its target audience. I’ll make histograms and calculate the
Shapiro-Wilk statistic for each of the groups to verify that the scores are probably
normally distributed.
• the groups defined by the levels of the independent variable consist of different
individuals. Yes. The four levels of the independent variable are represented by the four
separate classes.
• I want to know whether there is a difference in the performance of the groups defined by
the independent variable. Yes.
• I am working in the parametric paradigm: my participants were randomly selected from
a population and I wish to generalize conclusions from my study to that population. Yes.
• there is a minimum of 5 (or 10) participants in each of the groups defined by the levels of
the independent variable. Yes, there are 12 people in each class.
• the number of participants in each of the groups defined by the independent variable is
exactly the same OR the variances of the groups are approximately equal. Yes. There is
exactly the same number of participants in each of the levels of the independent variable.
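The last two checks (group size and equality of variances) can also be done in R. The following is a minimal sketch on made-up data; demo.data and its scores are hypothetical stand-ins, not the Companion Website dataset.

```r
# Hypothetical illustration data: 4 classes of 12 scores each.
# These numbers are invented; they are NOT the Companion Website dataset.
set.seed(1)
demo.data <- data.frame(
  class    = rep(1:4, each = 12),
  posttest = round(rnorm(48, mean = 50, sd = 8))
)

# Number of participants in each level of the independent variable
group.sizes <- table(demo.data$class)
group.sizes

# Variance of the dependent variable within each group
group.variances <- tapply(demo.data$posttest, demo.data$class, var)
group.variances

# TRUE when every group has the same number of participants
equal.n <- length(unique(as.vector(group.sizes))) == 1
equal.n
```

With the real dataset, the same calls would be applied to practice.problem.data once it has been imported.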
Step 6. Calculate the observed value of the statistic
I use R to calculate the observed value of F and the exact probability of the observed value. I
located the dataset on the Companion Website (http://www.routledge.com/cw/turner) and saved
the dataset on my computer as a comma-separated values Excel spreadsheet. (If you enter the
data directly into R, see pages 232-233 for appropriate R commands.)
> practice.problem.data = read.csv(file.choose(), header=T) [I imported the dataset]
> View (practice.problem.data) [I viewed the dataset to check the header names, the number of
cases, etc]
[Create a subset of the data for each level of the independent variable & view each of the
subsets]
> Class1 = subset(practice.problem.data, practice.problem.data$class=="1")
> View (Class1)
> Class2 = subset(practice.problem.data, practice.problem.data$class=="2")
> View (Class2)
> Class3 = subset(practice.problem.data, practice.problem.data$class=="3")
> View (Class3)
> Class4 = subset(practice.problem.data, practice.problem.data$class=="4")
> View (Class4)
[check the posttest scores for each of the classes to verify that the scores are probably normally
distributed]
> shapiro.test(Class1$posttest)
Shapiro-Wilk normality test
data: Class1$posttest
W = 0.9835, p-value = 0.9939
> shapiro.test(Class2$posttest)
Shapiro-Wilk normality test
data: Class2$posttest
W = 0.9221, p-value = 0.3034
> shapiro.test(Class3$posttest)
Shapiro-Wilk normality test
data: Class3$posttest
W = 0.9151, p-value = 0.2476
> shapiro.test(Class4$posttest)
Shapiro-Wilk normality test
data: Class4$posttest
W = 0.9411, p-value = 0.5121
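The four Shapiro-Wilk tests above can also be run in one pass and collected into a single table. This is only a sketch on invented scores (demo.data is hypothetical, not the Companion Website dataset); it uses split() in place of the four subset() calls.

```r
# Hypothetical illustration data; NOT the Companion Website dataset.
set.seed(2)
demo.data <- data.frame(
  class    = rep(1:4, each = 12),
  posttest = rnorm(48, mean = 50, sd = 8)
)

# Split the posttest scores into one vector per class
scores.by.class <- split(demo.data$posttest, demo.data$class)

# Run the Shapiro-Wilk test on each class
sw.results <- lapply(scores.by.class, shapiro.test)

# Collect the W statistic and p-value for each class in one table
sw.table <- data.frame(
  class   = names(sw.results),
  W       = sapply(sw.results, function(r) unname(r$statistic)),
  p.value = sapply(sw.results, function(r) r$p.value)
)
sw.table
```

A p-value above alpha for a class gives no evidence against normality for that class, which is how the four results above are read.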
> par(mfrow = c(2,2)) [create a space for 4 histograms]
[create a histogram of each of the sets of posttest scores & review them to confirm likelihood of
the scores being normally distributed]
> hist(Class1$posttest, breaks = 10, col = "plum")
> hist(Class2$posttest, breaks = 10, col = "deep pink 1")
> hist(Class3$posttest, breaks = 10, col = "orange")
> hist(Class4$posttest, breaks = 10, col = "dodger blue 3")
> study.model = aov(posttest~class, data = practice.problem.data) [Command to carry out the
analysis and ‘save’ information from the steps in the analysis]
> study.model
[Command requesting some of the information from the steps]
Call:
   aov(formula = posttest ~ class, data = practice.problem.data)

Terms:
                   class Residuals
Sum of Squares  617.6042 2584.8750
Deg. of Freedom        1        46

Residual standard error: 7.496195
> res = aov(posttest~class, data = practice.problem.data) [fit the same model again and save it
as res]
> summary(res) [request that the information from the analysis be reported in a Source Table; the
Total row and the (between) and (within) labels in the Source Table are my additions]
                    Df  Sum Sq  Mean Sq  F value    Pr(>F)
class (between)      1   617.6   617.60   10.991  0.001792 **
Residuals (within)  46  2584.9    56.19
Total               47  3202.5
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
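Note that the Source Table reports 1 degree of freedom for class, which is what aov() gives when the grouping column is stored as a number; with k = 4 levels treated as categories, the between-groups degrees of freedom would be k - 1 = 3. The sketch below shows the difference on invented data (demo.data and class.f are hypothetical names, and the scores are not the Companion Website dataset), so it says nothing about what the F value for the real data would be.

```r
# Hypothetical illustration data; NOT the Companion Website dataset.
set.seed(3)
demo.data <- data.frame(
  class    = rep(1:4, each = 12),
  posttest = rnorm(48, mean = 50, sd = 8)
)

# class stored as a number: aov() fits a single slope, so Df = 1
numeric.model <- aov(posttest ~ class, data = demo.data)
numeric.df <- summary(numeric.model)[[1]]$Df

# class converted to a factor: aov() compares 4 group means, so Df = k - 1 = 3
demo.data$class.f <- factor(demo.data$class)
factor.model <- aov(posttest ~ class.f, data = demo.data)
factor.df <- summary(factor.model)[[1]]$Df

numeric.df  # 1 46
factor.df   # 3 44
```

Checking the Df column against k - 1 is a quick way to confirm that R has treated the independent variable as intended.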
> pairwise.t.test(practice.problem.data$posttest, practice.problem.data$class, paired =F,
p.adjust.method = "bonferroni") [request that all pairwise comparisons be made using the
Bonferroni correction technique so the comparisons that show a statistically significant difference
are identified.]
Pairwise comparisons using t tests with pooled SD
data: practice.problem.data$posttest and practice.problem.data$class
  1      2      3
2 0.0137 -      -
3 0.0624 1.0000 -
4 0.0026 1.0000 1.0000
P value adjustment method: bonferroni
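To see what the Bonferroni adjustment does, it may help to apply it by hand. With 4 classes there are 6 pairwise comparisons, and each unadjusted p-value is multiplied by 6 and capped at 1. The unadjusted p-values in this sketch are invented for illustration; they are not taken from the analysis above.

```r
# Number of pairwise comparisons among 4 classes
n.comparisons <- choose(4, 2)
n.comparisons  # 6

# Hypothetical unadjusted p-values, one per comparison (invented numbers)
raw.p <- c(0.0004, 0.002, 0.01, 0.2, 0.5, 0.9)

# Bonferroni adjustment using base R
adjusted.p <- p.adjust(raw.p, method = "bonferroni")

# The same adjustment by hand: multiply by the number of comparisons, cap at 1
manual.p <- pmin(1, raw.p * n.comparisons)

adjusted.p
all.equal(adjusted.p, manual.p)
```

The cap at 1 is why several entries in a Bonferroni-adjusted table can read exactly 1.0000.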
Step 7. Calculate the exact probability of the statistic
I simply retrieve the exact probability from the R output, so exact p = .001792
Step 8. Compare the exact probability to alpha
The rules for interpreting exact probability are:
If exact probability ≥ alpha → accept the null hypothesis
If exact probability < alpha → reject the null hypothesis
Exact p is less than alpha, which was set at .01, so reject the null hypothesis and accept the
alternative hypothesis. A post hoc analysis of the data must be performed to identify where the
significant differences are. The Bonferroni method is used to identify which pairs show a
significant difference; only Class 1 and Class 4 show a significant difference with alpha set at
.01.
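The decision rule and the post hoc reading can be written out as a short check. The sketch below uses the alpha, the exact p, and the Bonferroni-adjusted p-values reported in this answer key; the variable names are only illustrative.

```r
# Values reported in this answer key
alpha   <- 0.01
exact.p <- 0.001792

# Decision rule from Step 8
decision <- if (exact.p < alpha) "reject H0" else "accept H0"
decision

# Bonferroni-adjusted p-values from the pairwise comparison output
posthoc.p <- c(
  "1 vs 2" = 0.0137, "1 vs 3" = 0.0624, "1 vs 4" = 0.0026,
  "2 vs 3" = 1.0000, "2 vs 4" = 1.0000, "3 vs 4" = 1.0000
)

# Pairs whose adjusted p-value is below alpha
significant.pairs <- names(posthoc.p)[posthoc.p < alpha]
significant.pairs
```

Only the Class 1 to Class 4 comparison falls below .01, which matches the conclusion stated above.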
Step 9. Make the probability statement
There is 99% certainty of a statistically significant difference in students’ language
learning (as measured by their performance on the CELT) depending on the type of
instruction they received.
Step 10. Interpret the meaningfulness
There are two avenues for interpreting meaningfulness: 1) with reference to the research
question, and 2) by calculating effect size.
Effect size for the entire analysis can be determined by calculating omega². The values of SSB
(Sums of Squares Between), MSW (Mean Sums of Squares within), and SST (Sums of Squares
Total) can be retrieved from the Source Table; k is the number of levels of the independent
variable.
\[
\omega^2 = \frac{\text{SSB} - (k - 1)\,\text{MSW}}{\text{SST} + \text{MSW}}
= \frac{617.6 - (4 - 1)(56.19)}{3202.5 + 56.19}
= \frac{617.6 - (3)(56.19)}{3258.69}
= \frac{617.6 - 168.57}{3258.69}
= \frac{449.03}{3258.69}
= .138
\]
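The same arithmetic can be checked in R. This is a small sketch that plugs the Source Table values into the omega-squared formula; omega.squared is a helper name introduced here for illustration, not a built-in R function.

```r
# Helper introduced for illustration (not a built-in R function):
# omega squared = (SSB - (k - 1) * MSW) / (SST + MSW)
omega.squared <- function(ssb, sst, msw, k) {
  (ssb - (k - 1) * msw) / (sst + msw)
}

# Values from the Source Table; k is the number of levels of the
# independent variable
result <- omega.squared(ssb = 617.6, sst = 3202.5, msw = 56.19, k = 4)

round(result, 3)  # 0.138
```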
The results of the analysis could be summarized like this:
More information about the types of instruction received by the four different classes is needed
to provide a meaningful interpretation of the analysis; however, we note that with 99% certainty
there is a statistically significant difference among the four classes (F = 10.991, p = .001792).
The post hoc Bonferroni analysis indicates that the comparison of Class 1 to Class 4 shows a
statistically significant difference (p = .0026), though the effect size (omega² = .138) is weak.
[Alternatively, the effect size could be calculated for each of the pairwise comparisons, Class 1
to Class 2, Class 1 to Class 3, Class 1 to Class 4, Class 2 to Class 3, Class 2 to Class 4, and Class
3 to Class 4. See page 240 in the text for guidance.]