EDF 6486 Research Methods in Education: Experimental Design

EDF 6486
Advanced Analysis in Educational Research
Homework Due March 5, 2012
Solutions
Hinkle, et al.
1.
Client  Communication (X)  Satisfaction (Y)    Client  Communication (X)  Satisfaction (Y)
   1           39                 29              14           39                 31
   2           27                 32              15           43                 25
   3           30                 25              16           37                 24
   4           52                 19              17           48                 32
   5           30                 30              18           30                 35
   6           42                 21              19           42                 22
   7           35                 24              20           38                 30
   8           32                 33              21           42                 25
   9           29                 27              22           25                 33
  10           33                 31              23           36                 31
  11           46                 23              24           49                 23
  12           43                 27              25           42                 27
  13           55                 33

n = 25    ΣX = 962    ΣX² = 38536    ΣY = 692    ΣY² = 19622    ΣXY = 26267
a. & c.
[Figure: scatterplot of Satisfaction (vertical axis, 0 to 40) against Communication
(horizontal axis, 0 to 60), with the regression line from part c drawn through the points.]
b. The slope of the regression line (b) is found:
\[
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}
  = \frac{25(26267) - (962)(692)}{25(38536) - (962)^2}
  = \frac{656675 - 665704}{963400 - 925444}
  = \frac{-9029}{37956} = -.238
\]
The Y-intercept (a) is found:
\[
a = \frac{\sum Y - b\sum X}{n}
  = \frac{692 - (-.238)(962)}{25}
  = \frac{692 - (-228.96)}{25}
  = \frac{920.96}{25} = 36.84
\]
So, the regression equation for predicting marital satisfaction scores (Y) from
communication score (X) is:
\[
\hat{Y} = -.238X + 36.84
\]
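The raw-score formulas for b and a can be checked with a few lines of Python. This is a minimal sketch that uses only the sums reported with the data table; the function name `fit_from_sums` is an illustrative choice, not something from the textbook. Because the slope is not rounded to -.238 before the intercept is computed, the intercept may differ from the hand calculation in the second decimal place.

```python
def fit_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope and intercept computed from raw-score sums."""
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


# Sums reported for the 25 clients in question 1.
b, a = fit_from_sums(n=25, sum_x=962, sum_y=692, sum_xy=26267, sum_x2=38536)
print(f"b = {b:.3f}, a = {a:.2f}")
```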
c. We can determine two points that are on the regression line. The first is (0, 36.84),
since the value of $\hat{Y}$ when X is zero (the Y-intercept) is equal to a. When X = 10,
the value of $\hat{Y}$ is $-.238(10) + 36.84 = -2.38 + 36.84 = 34.46$, which gives the
second point (10, 34.46). We plot these two points on the scatterplot from part a and
connect them. This gives us the regression line.
d. Using the regression equation, we would predict that a client who has a
communication score of 43 has a marital satisfaction score as shown below.
\[
\hat{Y} = -.238X + 36.84 = -.238(43) + 36.84 = -10.234 + 36.84 = 26.61
\]
e. The standard error of estimate can be found using:
\[
s_{Y \cdot X} = s_Y \sqrt{1 - r^2}\, \sqrt{\frac{n-1}{n-2}} \qquad \text{(Formula 6.11)}
\]
$s_Y$ is the standard deviation of the criterion variable. In this case it is the marital
satisfaction variable. We know that
\[
s_Y = \sqrt{\frac{\sum Y^2 - \left(\sum Y\right)^2 / n}{n-1}}
\]
so
\[
s_Y = \sqrt{\frac{19622 - (692)^2/25}{25-1}}
    = \sqrt{\frac{19622 - 478864/25}{24}}
    = \sqrt{\frac{19622 - 19154.56}{24}}
    = \sqrt{\frac{467.44}{24}} = \sqrt{19.48} = 4.41
\]
Now, we can find the standard deviation of the communication variable (X) using the
formula
\[
s_X = \sqrt{\frac{\sum X^2 - \left(\sum X\right)^2 / n}{n-1}}
\]
so,
\[
s_X = \sqrt{\frac{38536 - (962)^2/25}{25-1}}
    = \sqrt{\frac{38536 - 925444/25}{24}}
    = \sqrt{\frac{38536 - 37017.76}{24}}
    = \sqrt{\frac{1518.24}{24}} = \sqrt{63.26} = 7.95
\]
We can find the correlation between communication and satisfaction in marriage by using
the formula
\[
r = b_{Y \cdot X}\left(\frac{s_X}{s_Y}\right) = -.238\left(\frac{7.95}{4.41}\right) = -.238(1.80) = -.428
\]
Getting back to Formula 6.11, we can calculate
\[
s_{Y \cdot X} = s_Y \sqrt{1 - r^2}\, \sqrt{\frac{n-1}{n-2}}
  = 4.41 \sqrt{1 - (-.428)^2}\, \sqrt{\frac{24}{23}}
  = 4.41 \sqrt{1 - .18}\, \sqrt{1.04}
  = 4.41 \sqrt{.82}\,(1.02) = 4.41(.91)(1.02) = 4.09
\]
Using Formula 6.12 we obtain
\[
s_{Y \cdot X} = s_Y \sqrt{1 - r^2}
  = 4.41 \sqrt{1 - (-.428)^2}
  = 4.41 \sqrt{1 - .18} = 4.41 \sqrt{.82} = 4.41(.91) = 4.01
\]
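The chain of calculations in part e (standard deviations from sums, r from the slope, then the standard error of estimate by either formula) can be sketched in Python as below. The function names and the `small_sample` flag are illustrative assumptions; only the formulas and the sums come from the solution. Since the code carries full precision instead of rounding at every step, its results can differ from the hand-rounded values by a few hundredths.

```python
import math


def sd_from_sums(n, sum_v, sum_v2):
    """Sample standard deviation from a variable's sum and sum of squares."""
    return math.sqrt((sum_v2 - sum_v ** 2 / n) / (n - 1))


def std_error_of_estimate(s_y, r, n, small_sample=True):
    """Formula 6.11 when small_sample is True, Formula 6.12 otherwise."""
    see = s_y * math.sqrt(1 - r ** 2)
    if small_sample:
        see *= math.sqrt((n - 1) / (n - 2))
    return see


n = 25
s_y = sd_from_sums(n, 692, 19622)   # satisfaction (Y)
s_x = sd_from_sums(n, 962, 38536)   # communication (X)
b = -0.238                          # slope from part b
r = b * s_x / s_y                   # r = b * (s_X / s_Y)

see_611 = std_error_of_estimate(s_y, r, n)
see_612 = std_error_of_estimate(s_y, r, n, small_sample=False)
print(f"s_Y = {s_y:.2f}, s_X = {s_x:.2f}, r = {r:.3f}")
print(f"Formula 6.11: {see_611:.2f}, Formula 6.12: {see_612:.2f}")
```

Formula 6.11 always gives the larger value, because the factor under the second radical, (n - 1)/(n - 2), is greater than 1.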








f. If clients have communication scores of 28, we would predict that their marital
satisfaction scores would be:
\[
\hat{Y} = bX + a = -.238(28) + 36.84 = -6.66 + 36.84 = 30.18
\]
If the mean of the conditional distribution is 30.18 and the standard error of estimate is
4.01, the z-score that is equivalent to a score of 25 is found by
\[
z = \frac{Y - \hat{Y}}{s_{Y \cdot X}} = \frac{25 - 30.18}{4.01} = \frac{-5.18}{4.01} = -1.29
\]
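[Figure: normal curve for the conditional distribution of predicted Y. A score of 25
corresponds to z = -1.29 and the mean of 30.18 to z = 0. The area between them is .4015,
the area above the mean is .5000, and the total area above 25 is .9015.]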
The figure above shows us that the percentage of clients with communication scores of 28
who will have marital satisfaction scores greater than 25 is 90.15%.
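The table lookup in part f can be reproduced with Python's standard library. This sketch assumes, as the solution does, that scores at a given X are normally distributed around the predicted value with a standard deviation equal to the standard error of estimate; the function name `proportion_above` is illustrative. The z-score is not rounded to two decimals here, so the result can differ slightly from the tabled area.

```python
from statistics import NormalDist


def proportion_above(cutoff, predicted, std_error_est):
    """P(Y > cutoff) in a normal conditional distribution centered on the prediction."""
    return 1 - NormalDist(mu=predicted, sigma=std_error_est).cdf(cutoff)


# Values from part f: predicted score 30.18, standard error of estimate 4.01.
p = proportion_above(cutoff=25, predicted=30.18, std_error_est=4.01)
print(f"{p:.2%}")
```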
g. If the communication score is 33, the predicted marital satisfaction score is
\[
\hat{Y} = bX + a = -.238(33) + 36.84 = -7.85 + 36.84 = 28.99
\]

The 95% confidence interval for the predicted value of Y is $\hat{Y} \pm t_{CV} \cdot s_{\hat{Y}}$.
The number of degrees of freedom is n-2. Remember that $s_{\hat{Y}}$ is the standard error of
predicted scores, which can be calculated:
\[
\begin{aligned}
s_{\hat{Y}} &= s_{Y \cdot X} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)s_X^2}}
             = 4.01 \sqrt{1 + \frac{1}{25} + \frac{(33 - 38.48)^2}{24(7.95)^2}} \\
            &= 4.01 \sqrt{1 + .04 + \frac{(-5.48)^2}{24(63.26)}}
             = 4.01 \sqrt{1.04 + \frac{30.03}{1518.24}} \\
            &= 4.01 \sqrt{1.04 + .02} = 4.01 \sqrt{1.06} = 4.01(1.03) = 4.13
\end{aligned}
\]
and therefore, the 95% confidence interval for the predicted marital satisfaction score,
given a communication score of 33, is:
\[
\hat{Y} \pm t_{CV} \cdot s_{\hat{Y}} = 28.99 \pm 2.069(4.13) = 28.99 \pm 8.54
\]
where 2.069 is the two-tailed critical value of t at α = .05 for n-2 = 23 degrees of freedom.
So, the upper limit of CI95 is 28.99 + 8.54 = 37.53
and the lower limit of CI95 is 28.99 - 8.54 = 20.45.
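The interval in part g can be sketched as a small Python function. The function name and argument names are illustrative assumptions. The critical value is passed in rather than computed, and the 2.069 used below is the tabled two-tailed value at α = .05 for 23 degrees of freedom (the same value used in the significance tests in parts h and i).

```python
import math


def prediction_interval(x, y_hat, s_yx, n, x_bar, s_x, t_cv):
    """Return (s_pred, lower, upper) for a predicted score at X = x."""
    s_pred = s_yx * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ((n - 1) * s_x ** 2))
    half_width = t_cv * s_pred
    return s_pred, y_hat - half_width, y_hat + half_width


s_pred, lower, upper = prediction_interval(
    x=33, y_hat=28.99, s_yx=4.01, n=25, x_bar=962 / 25, s_x=7.95,
    t_cv=2.069,  # assumed tabled value: two-tailed, alpha = .05, df = 23
)
print(f"s_pred = {s_pred:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The term (X - X̄)² under the radical means the interval widens as the communication score moves away from the mean of 38.48.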
h. From part e., we know that r = -.428. We can test the null hypothesis that ρ = 0 using
the formula $t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with n-2 degrees of freedom. For our null hypothesis
\[
t = r\sqrt{\frac{n-2}{1-r^2}}
  = -.428\sqrt{\frac{25-2}{1-(-.428)^2}}
  = -.428\sqrt{\frac{23}{1-.183}}
  = -.428\sqrt{\frac{23}{.817}}
  = -.428\sqrt{28.15} = -.428(5.31) = -2.273
\]
At α = .05 for a two tailed test at 23 degrees of freedom, the critical value of t is ±2.069.
Since our value of t is beyond this value, we will reject the null hypothesis and conclude
that the correlation between communication and marital satisfaction is not zero. That is,
there is a significant correlation between these two variables.
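The test of ρ = 0 can be sketched as follows. The function name is an illustrative assumption, and the critical value is taken from the t table quoted above instead of being computed. Intermediate values are not rounded, so the statistic may differ from the hand calculation in the third decimal place.

```python
import math


def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = t_for_correlation(r=-0.428, n=25)
t_cv = 2.069  # tabled two-tailed critical value, alpha = .05, df = 23
reject_null = abs(t) > t_cv
print(f"t = {t:.3f}, reject H0: {reject_null}")
```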
i. From part b., we know that b = -.238. We can test the null hypothesis that β = 0 using
the formula
\[
t = \frac{b}{s_{Y \cdot X} / \sqrt{SS_X}}
\]
with n-2 degrees of freedom.
We know that $SS_X = (n-1)s_X^2 = 24(7.95)^2 = 24(63.20) = 1516.80$.
So,
\[
t = \frac{-.238 - 0}{4.01 / \sqrt{1516.80}} = \frac{-.238}{4.01 / 38.95} = \frac{-.238}{.1030} = -2.31
\]
At α = .05 for a two-tailed test at 23 degrees of freedom, the critical value of t is ±2.069.
Since our value of t is beyond this value, we will reject the null hypothesis and conclude
that the regression coefficient (β) of the regression line for predicting marital satisfaction
from communication skills is not zero and that a knowledge of communication skills will
enhance prediction of marital satisfaction scores.
Note that the value of t obtained here is the same (allowing for rounding error) as the one
obtained in testing the null hypothesis ρ = 0 in part h of this question.
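The agreement between the two tests is algebraic, not a coincidence. A short derivation, assuming the standard error of estimate is taken from Formula 6.11 and substituting $b = r(s_Y / s_X)$ and $SS_X = (n-1)s_X^2$:

\[
t = \frac{b}{s_{Y \cdot X} / \sqrt{SS_X}}
  = \frac{r \dfrac{s_Y}{s_X} \cdot s_X \sqrt{n-1}}
         {s_Y \sqrt{1 - r^2}\, \sqrt{\dfrac{n-1}{n-2}}}
  = r \sqrt{\frac{n-2}{1 - r^2}}
\]

The identity is exact only with the Formula 6.11 value of the standard error of estimate (4.09 here), which gives a t of about -2.27. Using the Formula 6.12 value (4.01) makes the absolute value of t slightly larger, which accounts for most of the difference between the two results.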
3.
a.
\[
b = r\left(\frac{s_Y}{s_X}\right) = .67\left(\frac{4.2}{3.6}\right) = .67(1.17) = .784
\]
\[
a = \frac{\sum Y - b\sum X}{n}
  = \frac{1854 - .784(1782)}{120}
  = \frac{1854 - 1397.09}{120}
  = \frac{456.91}{120} = 3.81
\]
So, the regression equation for predicting income from level of education is
\[
\hat{Y} = .784X + 3.81
\]
b. The predicted income of a person with 13.5 years of education is found using:
\[
\hat{Y} = .784X + 3.81 = .784(13.5) + 3.81 = 10.58 + 3.81 = 14.39
\]
or $14,390.
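When only summary statistics are available, as in this question, the line can be built from r and the two standard deviations. The sketch below uses the values given in the problem (r = .67, s_X = 3.6, s_Y = 4.2, ΣX = 1782, ΣY = 1854, n = 120); the function name is an illustrative assumption. The ratio 4.2/3.6 is not rounded to 1.17 first, so b and a come out slightly different from the hand-rounded .784 and 3.81, while the prediction agrees to the nearest ten dollars.

```python
def line_from_correlation(r, s_x, s_y, sum_x, sum_y, n):
    """Slope via b = r * (s_Y / s_X); intercept via a = (sum Y - b * sum X) / n."""
    slope = r * s_y / s_x
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


b, a = line_from_correlation(r=0.67, s_x=3.6, s_y=4.2, sum_x=1782, sum_y=1854, n=120)
income = b * 13.5 + a  # predicted income, in thousands of dollars
print(f"b = {b:.3f}, a = {a:.2f}, predicted income = {income:.2f}")
```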
c.
\[
s_{Y \cdot X} = s_Y \sqrt{1 - r^2}\, \sqrt{\frac{n-1}{n-2}} \qquad \text{(Formula 6.11)}
\]
so,
\[
s_{Y \cdot X} = 4.2 \sqrt{1 - (.67)^2}\, \sqrt{\frac{120-1}{120-2}}
  = 4.2 \sqrt{1 - .449}\, \sqrt{\frac{119}{118}}
  = 4.2 \sqrt{.551}\, \sqrt{1.01} = 4.2(.742)(1.00) = 3.12
\]
Alternatively, $s_{Y \cdot X} = s_Y \sqrt{1 - r^2}$ (Formula 6.12), so
\[
s_{Y \cdot X} = 4.2 \sqrt{1 - (.67)^2} = 4.2 \sqrt{1 - .449} = 4.2 \sqrt{.551} = 4.2(.742) = 3.12
\]

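d. We can predict the income of those with 15 years of education as follows.
\[
\hat{Y} = .784X + 3.81 = .784(15) + 3.81 = 11.76 + 3.81 = 15.57
\]
The z-score corresponding to a value of 18.5 thousand dollars is
\[
z = \frac{Y - \hat{Y}}{s_{Y \cdot X}} = \frac{18.5 - 15.57}{3.12} = \frac{2.93}{3.12} = .94
\]
[Figure: normal curve for the conditional distribution of Income. An income of 15.57
corresponds to z = 0 and an income of 18.50 to z = .94. The area above z = .94 is .1736.]
We can conclude that 17.36% of the members of the population with 15 years of education
have annual incomes greater than 18.5 thousand dollars.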
e. If the shopper indicates he has 16 years of education, the predicted income is
\[
\hat{Y} = bX + a = .784(16) + 3.81 = 12.54 + 3.81 = 16.35
\]

The 95% confidence interval for the predicted value of Y is $\hat{Y} \pm t_{CV} \cdot s_{\hat{Y}}$.
The number of degrees of freedom is n-2. Remember that $s_{\hat{Y}}$ is the standard error of
predicted scores, which can be calculated:
\[
\begin{aligned}
s_{\hat{Y}} &= s_{Y \cdot X} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)s_X^2}}
             = 3.12 \sqrt{1 + \frac{1}{120} + \frac{(16 - 14.85)^2}{119(3.6)^2}} \\
            &= 3.12 \sqrt{1 + .008 + \frac{(1.15)^2}{119(12.96)}}
             = 3.12 \sqrt{1.008 + \frac{1.32}{1542.24}} \\
            &= 3.12 \sqrt{1.008 + .001} = 3.12 \sqrt{1.009} = 3.12(1.004) = 3.13
\end{aligned}
\]
and therefore, the 95% confidence interval for the predicted income, given a level of
education of 16 years, is:
\[
\hat{Y} \pm t_{CV} \cdot s_{\hat{Y}} = 16.35 \pm 1.98(3.13) = 16.35 \pm 6.20
\]
So, the upper limit of CI95 is 16.35 + 6.20 = 22.55
and the lower limit of CI95 is 16.35 - 6.20 = 10.15.
f. We will test the null hypothesis that ρ = 0 using the formula
$t = r\sqrt{\dfrac{n-2}{1-r^2}}$ with n-2 degrees of freedom. So,
\[
t = .67\sqrt{\frac{120-2}{1-(.67)^2}}
  = .67\sqrt{\frac{118}{1-.45}}
  = .67\sqrt{\frac{118}{.55}}
  = .67\sqrt{214.55} = .67(14.65) = 9.82
\]
At α = .05 for a two-tailed test at 118 degrees of freedom, the critical value of t is ±1.980.
Since our value of t is beyond this value, we will reject the null hypothesis and conclude
that the correlation between level of education and income is not zero. That is,
there is a significant correlation between these two variables.
g. From part a., we know that b = .784. We can test the null hypothesis that β = 0 using
the formula
\[
t = \frac{b}{s_{Y \cdot X} / \sqrt{SS_X}}
\]
with n-2 degrees of freedom.
We know that $SS_X = (n-1)s_X^2 = 119(3.6)^2 = 119(12.96) = 1542.24$.
So,
\[
t = \frac{.784 - 0}{3.12 / \sqrt{1542.24}} = \frac{.784}{3.12 / 39.27} = \frac{.784}{.08} = 9.8
\]
At α = .05 for a two-tailed test at 118 degrees of freedom, the critical value of t is ±1.980.
Since our value of t is beyond this value, we will reject the null hypothesis and conclude
that the regression coefficient (β) of the regression line for predicting income from
education is not zero and that knowledge of level of education will enhance prediction of
income.
Note that the value of t obtained here is the same (allowing for rounding error) as the one
obtained in testing the null hypothesis ρ = 0 in part f of this question.
Green, et al.
Lesson 32, Questions 5-6
We first load the database for this lesson (Lesson 32 Exercise File 2) into the data view
window of the SPSS system. The first portion of the file looks like this.
5. In order to carry out a bivariate linear regression we click on the Analyze menu at the
top of the data view screen and pull down the menu. Click on the Regression submenu
and on the first choice (Linear...) in that submenu. The choice should look like the one
shown on the next page.
This should give you the Linear Regression dialog box shown below.
Since we wish to predict the number of publications for each professor, the variable
num_pubs is the dependent variable. And since we wish to predict this value from the
work ethic of the professors, that score, (work_eth) is the independent variable. Place
these variables in the appropriate windows of the dialog box. The box should now look
like the one shown below.
Since we will want to plot the scatterplot for predicted and residual scores in Question 6,
click on the Plots button to obtain the dialog box shown below.
To obtain the desired scatterplot, move the variable containing the z-score of the residual
(the difference between the predicted number of publications and the actual number for each
professor), ZRESID, into the Y window. Then, move the z-score of the predicted number of
publications for each professor into the X window as shown on the next page.
Now, click the Continue button on the Linear Regression: Plots dialog box to obtain the
original Linear Regression dialog box. Click on the OK button in this dialog box to obtain
your output.
a. The output table below is the result of a significance test that assesses the
predictability of the number of publications from the work ethic.
ANOVA(b)
Model 1       Sum of Squares    df    Mean Square    F         Sig.
Regression    1922.444           1    1922.444       21.032    .000(a)
Residual      4387.556          48      91.407
Total         6310.000          49
a. Predictors: (Constant), Work Ethic
b. Dependent Variable: Number of publications
This ANOVA tests the null hypothesis H0: R = 0, that is, that the number of publications
cannot be predicted from work ethic. The alternative hypothesis is Ha: R ≠ 0, that is,
that there is some correlation between the two variables and that one can be predicted
from the other. We see that the significance level (Sig.) is less than .001, so an F this
large would be very unlikely if the null hypothesis were true. Therefore, we will reject the
null hypothesis and conclude that there is a degree of predictability of number of
publications from work ethic for these professors.
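The entries in the ANOVA table are related by simple arithmetic, which can be checked with a short Python sketch. The function name is an illustrative assumption; the sums of squares and degrees of freedom are copied from the SPSS output. The last quantity, the regression sum of squares divided by the total, is the proportion of variance explained, which SPSS reports as R Square in its Model Summary table.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Mean squares, F ratio, and R squared from the rows of a regression ANOVA table."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    f_ratio = ms_regression / ms_residual
    r_squared = ss_regression / (ss_regression + ss_residual)
    return ms_regression, ms_residual, f_ratio, r_squared


ms_reg, ms_res, f_ratio, r_squared = anova_summary(1922.444, 4387.556, 1, 48)
print(f"MS residual = {ms_res:.3f}, F = {f_ratio:.3f}, R squared = {r_squared:.3f}")
```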
b. The table below gives us the values of the regression equation.
Coefficients(a)
               Unstandardized Coefficients    Standardized Coefficients
Model 1        B          Std. Error          Beta                         t         Sig.
(Constant)     -2.963     2.823                                            -1.050    .299
Work Ethic       .450      .098               .552                          4.586    .000
a. Dependent Variable: Number of publications

The regression equation will be in the form $\hat{Y} = bX + a$. In this case we see that the
value of b, the regression coefficient and the slope of the regression line, is .450. The
value of a, the Y-intercept of the regression line, is -2.963. Note, however, that the
significance level of the t-test for a (testing the null hypothesis that a = 0) is .299.
This is above .05, so we cannot reject the null hypothesis. Therefore, we will assume that
the value of a is zero. So, the regression equation is $\hat{Y} = .450X$.
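Each t in the Coefficients table is the estimate divided by its standard error, which the sketch below reproduces from the rounded values printed in the output. The function name is an illustrative assumption. Because SPSS works from unrounded estimates, the recomputed slope t can differ from the printed 4.586 in the second decimal place. In a regression with a single predictor, the square of the slope's t equals the F in the ANOVA table, which the last line checks using the printed values.

```python
def coefficient_t(estimate, std_error):
    """t statistic for H0: coefficient = 0 (estimate divided by its standard error)."""
    return estimate / std_error


t_constant = coefficient_t(-2.963, 2.823)  # intercept row
t_slope = coefficient_t(0.450, 0.098)      # Work Ethic row
print(f"t (constant) = {t_constant:.3f}, t (slope) = {t_slope:.3f}")

# With one predictor, the slope's t squared matches the ANOVA F (21.032).
print(f"4.586 squared = {4.586 ** 2:.3f}")
```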
c. The output table below shows the correlation between number of publications and
work ethic.
Model Summary(b)
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .552(a)    .305        .290                 9.561
a. Predictors: (Constant), Work Ethic
b. Dependent Variable: Number of publications
We see that this correlation is .552.
6. We can see the scatterplot of the predicted and residual scores below.
[Figure: scatterplot, Dependent Variable: Number of publications. Vertical axis: Regression
Standardized Residual (-2 to 4). Horizontal axis: Regression Standardized Predicted Value
(-2 to 2).]
Note that for predicted values with standardized scores (z-scores) of less than zero (the
mean score) the residuals are about the same and clustered very closely together. Above
z = 0, however, the residuals are spread over many values. This scatterplot lacks
homoscedasticity, which means that the standard errors for the predictions are lower for
lower predicted values and higher for higher predicted values.