Turner Answer Key for Chapter Eleven 5 2015
Using statistics in small-scale language education research: Focus on non-parametric data
Chapter Eleven
Practice Problems Answer Key
1. I chose to find the correlation between the performance of students on the oral debate and their written report. I
believe the written report is closely related to the topic of the debate.
Research question: Is there a statistically significant relationship between a learner’s performance in a debate and
the quality of the individual’s written work when writing on the debate topic? I’ve assigned the role of independent
variable to performance on the debate and the role of dependent variable to quality of written report. Now I follow
the steps in statistical logic.
Step 1: State hypotheses
H0: There is no statistically significant correlation between a learner’s
performance in an oral debate and the quality of the individual’s written report on the debate topic.
H1: There is a statistically significant correlation between a learner’s performance in an oral debate and the
quality of the individual’s written report on the debate topic.
Step 2. Set alpha
alpha = .01
Step 3. Identify the appropriate statistic for the analysis
I propose to analyze the data using Spearman rho or Kendall’s tau because:
1) the independent variable data is collected using a tool that yields rankable ordinal data;
2) the dependent variable data is collected using a tool that yields rankable ordinal data;
3) each observation is independent of the others;
4) if there are tied scores within a variable, use Kendall’s tau.
Step 4. Collect the data.
The table presents all three variables—in this example I find the correlation between performance in the debate and
performance on the written report.
Data table
Participant  debate  brief  report
1            99      94     98
2            78      80     86
3            88      92     93
4            97      96     95
5            93      94     94
6            94      92     98
7            91      90     90
8            95      94     94
9            98      96     95
10           79      83     84
11           85      83     89
12           94      92     95
13           95      92     95
14           94      96     95
15           90      94     92
16           92      94     87
17           94      92     93
18           94      90     92
19           89      76     77
20           97      90     89
21           94      90     89
22           88      90     89
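If you would rather type the scores in than import a file, the table can be entered as a data frame. This is only a sketch: the object name scores is my own choice for illustration (the commands in Step 6 call the imported object data), and the two summary calls are there so the entries can be checked against the descriptive statistics reported in Step 6.

```r
# Scores typed in from the data table. 'scores' is an illustrative name,
# not the object name used in the original R commands.
scores <- data.frame(
  participant = 1:22,
  debate = c(99, 78, 88, 97, 93, 94, 91, 95, 98, 79, 85,
             94, 95, 94, 90, 92, 94, 94, 89, 97, 94, 88),
  brief  = c(94, 80, 92, 96, 94, 92, 90, 94, 96, 83, 83,
             92, 92, 96, 94, 94, 92, 90, 76, 90, 90, 90),
  report = c(98, 86, 93, 95, 94, 98, 90, 94, 95, 84, 89,
             95, 95, 95, 92, 87, 93, 92, 77, 89, 89, 89)
)

# Compare these with the summary output shown in Step 6
summary(scores$debate)
summary(scores$report)
```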
Step 5. Check the assumptions
I proposed to analyze the data using Spearman rho or Kendall’s tau, so I need to check the following assumptions and
also examine the data to see if there are tied scores.
1) the independent variable data is collected using a tool that yields rankable ordinal data;
2) the dependent variable data is collected using a tool that yields rankable ordinal data;
3) each observation is independent of the others.
From examining the data in the Table, I see that the outcomes are rankable so assumptions 1 and 2 are met. The
debate and the written report are separate assignments and the scoring rubrics for the debate and the written report
are entirely distinct so the third assumption is met, too. There are some tied data within each of the two variables, so
I’ll use Kendall’s tau.
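Checking for ties by eye is easy to get wrong with 22 scores, so here is one way the check might be done in R. The helper has_ties is a hypothetical name of my own, not a built-in function; anyDuplicated and table are base R.

```r
# Debate and report scores, typed in from the data table
debate <- c(99, 78, 88, 97, 93, 94, 91, 95, 98, 79, 85,
            94, 95, 94, 90, 92, 94, 94, 89, 97, 94, 88)
report <- c(98, 86, 93, 95, 94, 98, 90, 94, 95, 84, 89,
            95, 95, 95, 92, 87, 93, 92, 77, 89, 89, 89)

# Hypothetical helper: TRUE if any score occurs more than once.
# anyDuplicated() returns 0 when there are no repeated values.
has_ties <- function(x) anyDuplicated(x) > 0

has_ties(debate)
has_ties(report)

# Show which scores are tied and how often each occurs
counts <- table(debate)
counts[counts > 1]
```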
Step 6. Calculate the observed value of the statistic
I calculated descriptive statistics and, just for fun, calculated the Shapiro-Wilk statistic for each variable,
made histograms for each, and made a scatterplot before calculating the observed value of Kendall’s tau. The R
commands I used are presented below.
> data = read.csv(file.choose(), header =T)
> View (data)
> summary (data$debate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
78.00 89.25 94.00 91.73 94.75 99.00
> summary (data$report)
Min. 1st Qu. Median Mean 3rd Qu. Max.
77.00 89.00 92.50 91.32 95.00 98.00
> sd(data$debate)
[1] 5.504819
> sd(data$report)
[1] 4.912437
> par(mfrow = c(1,2))
> hist(data$debate, col = "deep sky blue", breaks =10)
> hist (data$report, col = "dark salmon", breaks = 10)
> shapiro.test (data$debate)
Shapiro-Wilk normality test
data: data$debate
W = 0.8778, p-value = 0.01095
> shapiro.test (data$report)
Shapiro-Wilk normality test
data: data$report
W = 0.9064, p-value = 0.03991
> plot(data$debate, data$report)
> cor.test (data$debate, data$report, method = "kendall", exact =F)
Kendall's rank correlation tau
data: data$debate and data$report
z = 3.4082, p-value = 0.0006538
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.55661
Step 7. Calculate the exact probability of the statistic
I simply retrieve the exact probability from the R output, so exact p = 0.0006538
Step 8. Compare the exact probability to alpha
The rules for interpreting exact probability are:
If exact probability ≥ alpha → accept the null hypothesis
If exact probability < alpha → reject the null hypothesis
The exact probability, p = 0.0006538, is less than alpha, .01, so reject the null hypothesis and accept the alternative
hypothesis.
H1: There is a statistically significant correlation between a learner’s performance in an oral debate and the
quality of the individual’s written report on the debate topic.
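The decision rule in Step 8 can be written as a small function. This is a sketch only; decide is a hypothetical helper of my own, and its wording follows the rule as stated above.

```r
# Hypothetical helper that applies the Step 8 rule:
# exact probability < alpha  -> reject the null hypothesis
# exact probability >= alpha -> accept the null hypothesis
decide <- function(p, alpha) {
  if (p < alpha) "reject the null hypothesis" else "accept the null hypothesis"
}

# Exact probability from the Kendall's tau output, alpha from Step 2
decide(0.0006538, 0.01)
```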
Step 9. Make the probability statement
We can be 99% certain that there is a statistically significant correlation between a learner’s performance in
an oral debate and the quality of the individual’s written report on the debate topic.
Step 10. Interpret the meaningfulness
There are two avenues for interpreting meaningfulness: 1) with reference to the research question, and 2) by
calculating effect size.
We discovered that there is a statistically significant relationship between learners’ performance in a debate
and the quality of their written work when writing on the debate topic (Kendall’s tau = 0.55661; p =
0.0006538). Effect size is not typically calculated for Kendall’s tau, though Anglim (2012, cited in Turner,
2014) indicates that the squared value of Kendall’s tau can be reported and interpreted as shared variance
(shared variance = .31).
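The squaring step is simple arithmetic. As a sketch in R, with object names of my own choosing:

```r
# Kendall's tau as reported in the cor.test output
tau <- 0.55661

# Squared tau, interpreted here as shared variance
shared_variance <- tau^2
round(shared_variance, 2)
```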
2. Derek Yiu (2011) agreed to share his data from his research project. Here’s part of his abstract.
Are you a blabbermouth? A mixed-method study of personality and oral classroom participation
Ho Yin Yiu (Derek)
Monterey Institute of International Studies
Abstract
In light of recent research on the role of personality in language learning, this study investigates the constructs of
extroversion and introversion and their relationship to participation in the language classroom. Specifically, I
explore the relationship between students’ self-perception of their own introversion and extroversion, and their self-reported oral classroom participation.
Quantitative data were collected by means of a questionnaire. Participants included 42 native speakers of English
who have studied, or were studying, a foreign language.
For the explanation below I imported the dataset from the Companion Website. I give a summary of the R
commands I used in Step 6 below.
Step 1: State hypotheses
H0: There is no statistically significant correlation between language learners’ perception of their degree of
introversion/extroversion and their self-reported amount of participation in language classes.
H1: There is a statistically significant correlation between language learners’ perception of their degree of
introversion/extroversion and their self-reported amount of participation in language classes.
Step 2. Set alpha
alpha = .05, because the research is exploratory.
Step 3. Identify the appropriate statistic for the analysis
I propose to analyze the data using one of the correlation statistics, Pearson’s r, Spearman rho, or Kendall’s tau.
Derek’s data collection tool may yield normally distributed data—if the data are normally distributed I’ll use
Pearson’s r. If they are not normally distributed, I’ll use Spearman rho or Kendall’s tau depending on whether there
are tied scores within the data.
1) the independent variable data is collected using a tool that yields data which may be normally
distributed;
2) the dependent variable data is collected using a tool that yields data which may be normally distributed;
3) each observation is independent of the others;
4) the relationship between the two variables is linear;
5) if there are tied scores within a variable, I will use Kendall’s tau.
Step 4. Collect the data.
The data can be retrieved from the Companion Website or entered directly into R from the table presented in the
problem. If you import the dataset, note that the column with the header vert includes the introversion-extroversion
scores and the column with the header part includes the self-reported participation scores.
Step 5. Check the assumptions
1) the independent variable data is collected using a tool that yields data which may be normally
distributed;
2) the dependent variable data is collected using a tool that yields data which may be normally distributed;
3) each observation is independent from the others;
4) the relationship between the two variables is linear.
The histograms and the outcomes of the Shapiro-Wilk analyses indicate that the data for each variable approximate a
normal distribution, so assumptions 1 and 2 are met (the R commands are presented in Step 6 below). The two tools
used to collect the data (the introversion/extroversion survey and the self-reported participation survey) are
completely distinct from one another, so the 3rd assumption is met. The scatterplot shows that the relationship
between the two variables is linear (see the scatterplot below), so the 4th assumption is met, too.
Step 6. Calculate the observed value of the statistic
The R commands I used to calculate the descriptive statistics, make histograms, make a scatterplot, and calculate the
correlation are presented below.
> derek.data = read.csv(file.choose(), header =T)
> View (derek.data)
> summary (derek.data$vert)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.000 4.215 5.335 5.423 6.453 8.070
> summary(derek.data$part)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.33 5.00 5.83 5.73 6.50 9.00
> sd(derek.data$vert)
[1] 1.390546
> sd(derek.data$part)
[1] 1.567391
> par(mfrow = c(1,2))
> hist(derek.data$vert, col = "medium spring green", breaks = 10)
> hist(derek.data$part, col = "light slate blue", breaks = 10)
> shapiro.test(derek.data$vert)
Shapiro-Wilk normality test
data: derek.data$vert
W = 0.9644, p-value = 0.2121
> shapiro.test(derek.data$part)
Shapiro-Wilk normality test
data: derek.data$part
W = 0.9702, p-value = 0.3345
> plot(derek.data$vert, derek.data$part)
> cor.test(derek.data$vert, derek.data$part)
Pearson's product-moment correlation
data: derek.data$vert and derek.data$part
t = 21.7879, df = 40, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9270025 0.9786407
sample estimates:
cor
0.9603577
Step 7. Calculate the exact probability of the statistic
I simply retrieve the exact probability from the R output; p-value < 2.2e-16 (which is .00000000000000022)
Step 8. Compare the exact probability to alpha
The rules for interpreting exact probability are:
If exact probability ≥ alpha → accept the null hypothesis
If exact probability < alpha → reject the null hypothesis
The exact probability, p < 2.2e-16, is less than alpha, .05, so reject the null hypothesis and accept the
alternative hypothesis.
H1: There is a statistically significant correlation between language learners’ perception of their degree of
introversion/extroversion and their self-reported amount of participation in language classes.
Step 9. Make the probability statement
We can be 95% certain that there is a statistically significant correlation between language learners’
perception of their degree of introversion/extroversion and their self-reported amount of participation in
language classes.
Step 10. Interpret the meaningfulness
There are two avenues for interpreting meaningfulness: 1) with reference to the research question, and 2) by
calculating effect size.
We can be 95% certain that there is a statistically significant correlation between language learners’
perception of their degree of introversion/extroversion and their self-reported amount of participation in
language classes (Pearson’s r = .96; p < 2.2e-16). The effect size is strong (r² = .92).
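The values reported above can be pulled out of the object that cor.test returns instead of being copied from the printed output. The sketch below uses a few made-up scores (not Derek Yiu's data), and the names x, y, ct, r, r2 and p are my own.

```r
# Made-up scores for illustration only; these are NOT the study's data.
x <- c(3.1, 4.0, 4.8, 5.5, 6.2, 7.4)
y <- c(3.5, 4.1, 5.2, 5.6, 6.9, 7.8)

ct <- cor.test(x, y)        # Pearson's r is the default method
r  <- unname(ct$estimate)   # the correlation coefficient
r2 <- r^2                   # effect size (shared variance)
p  <- ct$p.value            # the exact probability

round(c(r = r, r2 = r2, p = p), 4)
```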
Because the null hypothesis was rejected, one can estimate (or predict) someone’s (self-reported) level of class
participation based on that person’s self-reported introversion/extroversion. The following discussion is an
explanation of how to calculate estimated or predicted y using R. The predictor variable is the x-variable; the
variable to be predicted is the y-variable. The order in which you place the variables in an R command is important!
In this example, the dependent variable (y) is self-reported participation. The independent variable (x) is
introversion/extroversion.
The lm command derives from the idea of “linear modeling”. Given that there’s a statistically significant linear
relationship between introversion/extroversion and self-reported participation in class, you can estimate what a
person’s level of participation would be given that individual’s reported introversion/extroversion. You can see this
in the scatterplot (below): someone who has an introversion/extroversion score of about 5.25 is likely to have
participation between 4.75 or so and 6 or so. Linear modeling allows calculation of the “estimated” or predicted
value of Y, which is represented as $\hat{Y}$. Calculating the predicted value is more precise than “eye-balling” the
scatterplot. The formula for $\hat{Y}$ is $\hat{Y} = \bar{Y} + b(X - \bar{X})$, where b is the slope of the line. R uses a
different formula, though both the formula for calculating $\hat{Y}$ with a calculator and the formula R uses are
derived from the same principle.
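A short derivation may help show why the calculator formula and the intercept-and-slope form R works with agree. Here a stands for the intercept and b for the slope; rearranging the calculator formula gives the form that R's two coefficients fit into.

```latex
\begin{align*}
\hat{Y} &= \bar{Y} + b\,(X - \bar{X}) \\
        &= (\bar{Y} - b\,\bar{X}) + b\,X \\
        &= a + b\,X, \qquad \text{where } a = \bar{Y} - b\,\bar{X}.
\end{align*}
```

As a rough check with the rounded values reported in this problem (mean of y = 5.73, mean of x = 5.423, b = 1.0824929), a = 5.73 - (1.0824929)(5.423), which is about -0.14 and agrees with the intercept R reports up to rounding.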
The commands for doing linear regression are presented below, each preceded by its explanation.

The lm command indicates that I want to predict a value for y (participation) from an individual’s performance on
x (introversion/extroversion).
> lm(derek.data$part ~ derek.data$vert)

Assigning the output of lm to res saves the results of doing linear modeling of y on x so that they can be retrieved
later.
> res = lm(derek.data$part ~ derek.data$vert)

Typing ‘res’ gives you the results of the linear modeling, two coefficients: beta 1 (the intercept) and beta 2 (the
slope). The output looks like this:
> res
Call:
lm(formula = derek.data$part ~ derek.data$vert)
Coefficients:
(Intercept) derek.data$vert
-0.141 1.082

Make a scatterplot again. Then add the regression line (intercept ‘a’, slope ‘b’) using the abline command.
> plot(derek.data$vert, derek.data$part)
> abline(res)

This command recalls the “betas”, which are needed in the formula R uses to estimate (predict) y.
> betas = coef(res)

Typing “betas” presents the values.
> betas
(Intercept) derek.data$vert
-0.1409579 1.0824929

This command calculates the estimated value of y for x = 5.25. (The formula says: Get the beta values. Then
multiply beta 1 (the intercept) by 1 and multiply beta 2 (the slope) by the value of x that we want to predict from
(5.25), and then add these two products.) The sum is the estimated (predicted) level of self-reported participation
for a person who has a self-reported introversion/extroversion score of 5.25.
> sum(betas * c(1, 5.25))
[1] 5.54213
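R's predict function offers another route to the same estimate. The sketch below uses a few made-up scores (not Derek Yiu's data), and the names toy, fit, by_hand and by_predict are my own. Note one difference from the commands above: predict can only be given a new x value when the model is fitted with the data argument and bare column names, not with the derek.data$vert form.

```r
# Made-up scores for illustration only; these are NOT the study's data.
toy <- data.frame(
  vert = c(3.0, 4.2, 5.3, 5.4, 6.5, 8.1),
  part = c(2.9, 4.6, 5.5, 5.9, 6.8, 8.7)
)

# Fit y (part) on x (vert), naming the columns through the data argument
fit <- lm(part ~ vert, data = toy)

# Estimate by hand, as in the answer key: intercept * 1 + slope * 5.25
betas   <- coef(fit)
by_hand <- sum(betas * c(1, 5.25))

# The same estimate from predict(), given a new value of vert
by_predict <- predict(fit, newdata = data.frame(vert = 5.25))

all.equal(by_hand, unname(by_predict))
```

Both routes use the same intercept and slope, so they should agree; predict is convenient when estimates are needed for several x values at once.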

To calculate a confidence interval for $\hat{Y}$, you can follow the description in the chapter for calculating the standard
error of the estimate, SEE.
The standard deviation for y, participation, is 1.567391 and the mean is 5.73. The correlation is .96. The formula for
SEE is:
$$SEE = S_y \sqrt{1 - r_{xy}^2} = 1.567391\sqrt{1 - .96^2} = 1.567391\sqrt{1 - .92} = 1.567391\sqrt{.08} = (1.567391)(.2828) = .44 \text{ points}$$
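The same calculation can be done in R. This is a sketch using the values reported earlier in this problem; the object names are my own. It uses the unrounded correlation from the cor.test output, so the intermediate figures differ slightly from the hand calculation.

```r
s_y <- 1.567391    # standard deviation of participation (y)
r   <- 0.9603577   # Pearson's r from the cor.test output

# Standard error of the estimate: SEE = S_y * sqrt(1 - r^2)
see <- s_y * sqrt(1 - r^2)
round(see, 2)
```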

To determine the 68% confidence band for $\hat{Y}$ (which is 5.54213), find the low end of the confidence band by
subtracting SEE from $\hat{Y}$ and find the high end of the confidence band by adding SEE to $\hat{Y}$. [($\hat{Y}$ - SEE) for the
lower boundary and ($\hat{Y}$ + SEE) for the upper boundary.]
I can be 68% confident that a person whose self-reported level of introversion/extroversion is 5.25 will have a self-reported level of participation between 5.10 and 5.98.
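The band can also be computed in R by applying the rule stated above: subtract SEE from the predicted value for the lower boundary and add it for the upper boundary. This is a sketch; the object names are my own.

```r
y_hat <- 5.54213   # predicted participation for x = 5.25
see   <- 0.44      # standard error of the estimate, from the hand calculation

# 68% confidence band: predicted value minus and plus one SEE
band <- c(lower = y_hat - see, upper = y_hat + see)
round(band, 2)
```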
Bibliography
Yiu, D. (2011). Are you a blabbermouth? A mixed-method study of personality and oral classroom participation.
(Unpublished paper, Graduate School of Translation, Interpretation, and Language Education, Monterey Institute of
International Studies, Monterey, CA.)