Chapter Ten: Practice Problem Key


Turner, J. Using statistics in small-scale language education research: Focus on non-parametric data


I follow the ten steps in statistical logic in the response that follows. I also demonstrate how to do simple linear regression using R.

Step 1. State hypotheses

H0: There is no statistically significant correlation between teachers’ perceived importance of a learning strategy and their reported frequency of use of that strategy.

H1: There is a statistically significant correlation between teachers’ perceived importance of a learning strategy and their reported frequency of use of that strategy.

Step 2. Set alpha

alpha = .01

Step 3. Identify the appropriate statistic for the analysis

I propose to analyze the data using Pearson’s r because the participants are a sample drawn from a population and the researchers want to generalize the findings to that population. Also, the data collection tools are likely to yield normally distributed data. The assumptions of Pearson’s r are:

1) the independent variable data is collected using a tool that yields data which may be normally distributed;

2) the dependent variable data is collected using a tool that yields data which may be normally distributed;

3) each observation is independent of the others; and

4) the relationship between the two variables is linear.

Step 4. Collect the data.

I’ve entered the data into R.

Step 5. Check the assumptions

1) the independent variable data is collected using a tool that yields data which may be normally distributed;

2) the dependent variable data is collected using a tool that yields data which may be normally distributed;

3) each observation is independent of the others;

4) the relationship between the two variables is linear.

The histograms and the outcomes of the Shapiro-Wilk analyses indicate that the data for each variable approximate a normal distribution, so assumptions 1 and 2 are met. The third assumption is met because the two tools used to collect the data (the perceived importance measure and the frequency of use measure) are completely independent of one another; that is, no items appear on both tools and the participants completed them without any collaboration. The scatterplot shows that the relationship between the two variables is linear, so assumption 4 is met (see the scatterplot below).


Step 6. Calculate the observed value of the statistic

The R commands I used to enter the data, check the assumptions, and calculate the observed value of Pearson’s r are presented below.

> imp = c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03, 4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)

> use = c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

> summary (imp)

Min. 1st Qu. Median Mean 3rd Qu. Max.

2.800 3.660 3.970 3.838 4.160 4.570

> summary (use)

Min. 1st Qu. Median Mean 3rd Qu. Max.

25.00 34.50 37.00 36.19 40.00 44.00

> sd (imp)

[1] 0.4721189

> sd(use)

[1] 5.269013

> par(mfrow = c(1,2))

> hist(imp, col = "orange red", breaks =10)

> hist(use, col = "plum", breaks =10)

> plot(imp, use)

> shapiro.test(imp)

Shapiro-Wilk normality test

data: imp
W = 0.9376, p-value = 0.3205

> shapiro.test(use)

Shapiro-Wilk normality test

data: use
W = 0.8936, p-value = 0.06349


> cor.test(imp,use)

Pearson's product-moment correlation

data: imp and use
t = 12.7121, df = 14, p-value = 4.448e-09 [.00000000444]
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8839517 0.9860926
sample estimates:
cor
0.9593079
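To show where the values in the cor.test output come from, the following is a minimal sketch that recomputes Pearson’s r from its definition and then converts it to the t statistic with n - 2 degrees of freedom. The variable names r_manual and t_obs are my own labels for illustration; they are not part of the original commands.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

n <- length(imp)

# Pearson's r: sum of cross-products of deviations divided by the
# square root of the product of the two sums of squared deviations
dev_imp <- imp - mean(imp)
dev_use <- use - mean(use)
r_manual <- sum(dev_imp * dev_use) / sqrt(sum(dev_imp^2) * sum(dev_use^2))

# t statistic for testing r against zero, with df = n - 2
t_obs <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)

r_manual
t_obs
```

The two values should agree with the cor and t entries that cor.test reports.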


Step 7. Calculate the exact probability of the statistic

I simply retrieve the exact probability from the R output: exact p = .00000000444 (reported by R as p-value = 4.448e-09).

Step 8. Compare the exact probability to alpha

The rules for interpreting exact probability are:

If exact probability ≥ alpha → accept the null hypothesis

If exact probability < alpha → reject the null hypothesis

The exact probability, p = .00000000444, is less than alpha, .01, so reject the null hypothesis and accept the alternative hypothesis.

H1: There is a statistically significant correlation between teachers’ perceived importance of a learning strategy and their reported frequency of use of that strategy.
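Steps 7 and 8 can also be carried out in R rather than read off the printed output. This is a sketch only: the object names ct, alpha, p_exact, and decision are my own, and the decision wording simply mirrors the interpretation rules listed above.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

alpha <- .01                 # set in Step 2

# cor.test returns a list; the exact probability is stored in p.value
ct <- cor.test(imp, use)
p_exact <- ct$p.value

# Apply the rules for interpreting exact probability
decision <- if (p_exact < alpha) {
  "reject the null hypothesis"
} else {
  "accept the null hypothesis"
}

format(p_exact, scientific = FALSE)
decision
```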

Step 9. Make the probability statement

Because alpha was set at .01, we can be 99% certain that there is a statistically significant correlation between teachers’ perceived importance of a learning strategy and their reported frequency of use of that strategy.

Step 10. Interpret the meaningfulness

There are two avenues for interpreting meaningfulness: 1) with reference to the research question, and 2) by calculating effect size.

We can be 99% certain that there is a statistically significant correlation between teachers’ perceived importance of a learning strategy and their reported frequency of use of that strategy (Pearson’s r = .96; p = .00000000444). The effect size is strong (r² = .92).
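The effect size can be checked directly in R by squaring the correlation. The sketch below also compares it with the R-squared value stored in the summary of a linear model, which for a single predictor should be the same quantity; r_squared and model_r_squared are names I chose for illustration.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

# Effect size: the proportion of variance the two variables share
r_squared <- cor(imp, use)^2

# The same quantity as reported for a simple linear regression
model_r_squared <- summary(lm(use ~ imp))$r.squared

round(r_squared, 2)
round(model_r_squared, 2)
```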

********

Once you’ve explored the data and established statistical significance by rejecting the null hypothesis, you can estimate (or predict) a teacher’s likely frequency of use of a strategy based on that person’s rating of the importance of the strategy. The estimation (or prediction) is based on the correlation found between the independent variable (x) and the dependent variable (y). The steps for estimating what y would be for a specific value of x are described in the chapter; the following discussion explains how to calculate estimated or predicted y using R.

The predictor variable is the x-variable (perceived importance of a strategy); the variable to be predicted is the y-variable (self-reported use). The perceived importance score of interest is 3.20. What would the self-reported frequency of use be for a person whose perceived importance score is 3.20?

The order in which you place the variables in an R command is important: in lm(use ~ imp), the variable to be predicted (use) comes before the tilde and the predictor (imp) comes after it. The lm command derives from the idea of “linear modeling”. Given that there’s a statistically significant linear relationship between perceived importance of a strategy and self-reported use, a person’s reported level of use of a strategy can be estimated from that individual’s perceived importance of the strategy.

The relationship between the two variables can be seen in the scatterplot below. Linear modeling allows calculation of the “estimated” or predicted value of Y, which is represented as $\hat{Y}$. Calculating the predicted value is more precise than “eye-balling” the scatterplot.

The formula for $\hat{Y}$ is

$$\hat{Y} = \bar{Y} + b(X - \bar{X})$$

where b is the slope of the line. R uses a different formula, though the formula for calculating $\hat{Y}$ with a calculator and the formula R uses are derived from the same principle.
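As a check on the calculator formula, here is a minimal sketch that applies it in R for X = 3.20. It assumes the usual identity that the slope equals r multiplied by the ratio of the two standard deviations; the names b, a, x, and y_hat are mine. Rearranging the formula gives an intercept-plus-slope form, which is the form of the coefficients that lm reports.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

# Slope of the line: b = r * (s_y / s_x)
b <- cor(imp, use) * sd(use) / sd(imp)

# Calculator formula: Y-hat = mean of Y + b * (X - mean of X)
x <- 3.20
y_hat <- mean(use) + b * (x - mean(imp))

# Rearranged form: Y-hat = a + b * X, where a = mean of Y - b * mean of X
a <- mean(use) - b * mean(imp)
y_hat_intercept_form <- a + b * x

c(slope = b, intercept = a)
y_hat
```

Both forms give the same predicted value, because the second is only an algebraic rearrangement of the first.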

> lm(use~imp)

Call: lm(formula = use ~ imp)

Coefficients:

(Intercept) imp

-4.904 10.706

> res=lm(use~imp)


> res

Call: lm(formula = use ~ imp)

Coefficients:

(Intercept) imp

-4.904 10.706

> plot(imp,use)

> abline(res)

> betas = coef(res)

> betas

(Intercept) imp

-4.904288 10.706214

> sum(betas*c(1, 3.20))

[1] 29.3556
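An alternative to multiplying the coefficients by hand is R’s predict function, which takes the fitted model and a data frame of new predictor values. This is a sketch rather than the method used above; the data frame name new_teacher is hypothetical, and its column must be named imp so that it matches the predictor in the model formula.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

res <- lm(use ~ imp)

# One row per person to be predicted; the column name must match
# the predictor name used in the formula
new_teacher <- data.frame(imp = 3.20)

y_hat <- predict(res, newdata = new_teacher)
y_hat
```

Several scores can be predicted at once by supplying more values, for example data.frame(imp = c(3.20, 3.80, 4.40)).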

The estimated/predicted frequency of use of a strategy for a person who has a perceived importance score of 3.20 is 29.36. A confidence band for this estimate can be calculated using the formula for the standard error of the estimate (SEE). The standard deviation of y was calculated above (s_y = 5.269013), as was the correlation of the two variables (r = 0.9593079).

The value of r² is .92.

$$\mathrm{SEE} = s_y\sqrt{1 - r^2} = 5.269013\sqrt{.08} = (5.269013)(.28) = 1.48$$

One can be 68% confident that the reported frequency of use for a person who has a perceived importance score of 3.20 will be within +/- 1.48 points of the estimated value of y, 29.3556; that is, between 27.88 and 30.84.
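The SEE and the 68% band can also be computed in R. The sketch below uses the same formula but carries full precision instead of the rounded values .08 and .28, so its results may differ from the hand calculation in the second decimal place. The names see, lower, and upper are mine.

```r
# Data as entered above
imp <- c(3.75, 4.57, 3.67, 3.32, 3.94, 4.22, 3.02, 4.03,
         4.02, 4, 4.28, 4.25, 3.77, 3.63, 4.14, 2.8)
use <- c(37, 44, 36, 33, 38, 41, 25, 37, 38, 35, 40, 40, 36, 33, 41, 25)

r <- cor(imp, use)

# Standard error of the estimate: s_y * sqrt(1 - r^2)
see <- sd(use) * sqrt(1 - r^2)

# Predicted y for a perceived importance score of 3.20
res <- lm(use ~ imp)
y_hat <- sum(coef(res) * c(1, 3.20))

# 68% band: one SEE on either side of the predicted value
lower <- y_hat - see
upper <- y_hat + see

round(c(SEE = see, lower = lower, upper = upper), 2)
```

Note that the residual standard error printed by summary(res) divides by n - 2 rather than n - 1, so it will be slightly larger than the SEE computed with this formula.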
