Graphical analysis of multivariate data
Choose the columns containing marks on the final exam (final), marks on the assignment
(assignment) and marks on the midterm exam (midterm). Then make the following choices:
Graphs -> Legacy Dialogs -> Scatter/Dot
Choose Matrix Scatter and press the button labelled Define. Choose as Matrix
Variables the three variables final, assignment and midterm and press OK.
The result will be the following array of scatter-plots:
[Figure: scatter-plot matrix of Mark on final exam, Mark on assignment and Mark on midterm test]
We see a linear pattern between Mark on midterm test and Mark on final exam, but not
between Mark on assignment and Mark on final exam. This suggests that only the mark on
the midterm test will be significant in predicting the mark on the final exam.
The absence of any pattern in the plot of Mark on midterm test against Mark on assignment
suggests that these two variables are at most weakly correlated, so we expect no problem
with multicollinearity in the statistical analyses that follow.
Statistical analyses
Choose the columns containing marks on the final exam (final), marks on the assignment
(assignment) and marks on the midterm exam (midterm). Then make the following choices:
Analyze -> Regression -> Linear
Choose as Dependent variable Mark on final exam, and as Independent variables
Mark on assignment and Mark on midterm test.
Press the button labelled Statistics and check the following boxes:
Regression coefficients
Confidence intervals
Covariance matrix
Model fit
Descriptives
Press the button labelled Continue, to leave the Statistics window. To analyse the data as
specified by the commands given above, press Ok and SPSS will produce the following
outputs.
You should pay attention to the highlighted numbers below.
Descriptive Statistics
                        Mean      Std. Deviation   N
Mark on final exam      35,4333   7,43562          30
Mark on assignment      14,7000   3,49532          30
Mark on midterm test    17,6000   5,74516          30
Correlations
                                             Mark on      Mark on      Mark on
                                             final exam   assignment   midterm test
Pearson Correlation   Mark on final exam     1,000        ,180         ,869
                      Mark on assignment     ,180         1,000        ,104
                      Mark on midterm test   ,869         ,104         1,000
Sig. (1-tailed)       Mark on final exam     .            ,170         ,000
                      Mark on assignment     ,170         .            ,293
                      Mark on midterm test   ,000         ,293         .
N                     Mark on final exam     30           30           30
                      Mark on assignment     30           30           30
                      Mark on midterm test   30           30           30
We see that the correlation between Mark on final exam and Mark on midterm test is high
(0.869) and significant (p-value < 0.001; SPSS displays it as ,000), just as we saw in the
plot-matrix above. The correlation between Mark on final exam and Mark on assignment is
weak (0.180) and not significant (p-value = 0.170), suggesting that the assignment mark will
not be significant in predicting the mark on the final exam.
Furthermore, there seems to be no linear relationship between the two "x-variables" Mark on
midterm test and Mark on assignment, since their correlation is weak (0.104) and not
significant (p-value = 0.293), just as we saw in the plot-matrix.
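The three pairwise correlations already determine how much of the variation in the final mark the two predictors can explain together. A minimal sketch in Python, assuming the standard two-predictor formula for R squared and the usual definition of the variance inflation factor (VIF); the variable names are illustrative and the inputs are the rounded correlations from the table, so the results are approximate:

```python
# Rounded Pearson correlations copied from the Correlations table.
r_final_assignment = 0.180
r_final_midterm = 0.869
r_assignment_midterm = 0.104

# R squared of a regression on two predictors, from the pairwise correlations.
r_squared = (
    r_final_assignment**2
    + r_final_midterm**2
    - 2 * r_final_assignment * r_final_midterm * r_assignment_midterm
) / (1 - r_assignment_midterm**2)

# Variance inflation factor; with only two predictors it is the same for both.
vif = 1 / (1 - r_assignment_midterm**2)

print(f"R squared = {r_squared:.3f}")
print(f"VIF       = {vif:.3f}")
```

The R squared obtained this way should be close to the R Square that SPSS reports for the fitted model, and a VIF close to 1 supports the expectation that multicollinearity is not a concern for these data.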
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       ,873(a)   ,763       ,745                3,75245
a Predictors: (Constant), Mark on midterm test, Mark on assignment
b Dependent Variable: Mark on final exam
The coefficient of determination is quite high (0.763): the multiple linear regression
model $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$ accounts for about 76 percent of the
variation in the final exam marks, which suggests that it is appropriate for these data.
Here $Y_i$ denotes the mark person number i received on the final exam, $X_{1i}$ the mark
person number i received on the assignment and $X_{2i}$ the mark person number i received
on the midterm exam.
The standard error of the estimate is also quite low (3.75) compared with the standard
deviation of the final exam marks (7.44), indicating that the overall differences between
the observed marks on the final exam and those predicted by the linear model are small.
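The Adjusted R Square in the Model Summary corrects R Square for the number of predictors in the model. A small sketch, assuming the usual adjustment formula and using the rounded R Square from the table (the variable names are illustrative):

```python
n = 30             # number of students in the data set
k = 2              # number of predictors (assignment and midterm)
r_squared = 0.763  # rounded R Square from the Model Summary table

# Adjusted R squared penalises R squared for the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
```

With only two predictors and 30 observations the penalty is small, which is why the adjusted value stays close to the unadjusted one.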
ANOVA(b)
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   1223,184         2    611,592       43,434   ,000(a)
    Residual     380,183          27   14,081
    Total        1603,367         29
a Predictors: (Constant), Mark on midterm test, Mark on assignment
b Dependent Variable: Mark on final exam
The regression sum of squares (1223) is a considerable part of the total sum of squares
(1603); more specifically it constitutes 76.3 percent of the total variation in the marks on the
final exam; this is the coefficient of determination. The value of the F-statistic is high
(43.434) and significant (p-value < 0.001), indicating that we can reject the null hypothesis
that $b_1 = b_2 = 0$. Consequently at least one of the two slope coefficients is different
from zero. In order to determine which of the two variables has a significant effect on
predicting the final mark, we carry out individual t-tests, using the confidence intervals
computed for each coefficient.
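The entries of the ANOVA table are linked by simple arithmetic, which can be checked directly. A short sketch using the sums of squares and degrees of freedom from the table (the variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression = 1223.184
ss_residual = 380.183
df_regression = 2
df_residual = 27

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total           # coefficient of determination

ms_regression = ss_regression / df_regression  # mean square for regression
ms_residual = ss_residual / df_residual        # mean square for residuals
f_statistic = ms_regression / ms_residual

# The standard error of the estimate is the square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"Total SS          = {ss_total:.3f}")
print(f"R squared         = {r_squared:.3f}")
print(f"F                 = {f_statistic:.3f}")
print(f"Std. error of est = {std_error_estimate:.5f}")
```

The p-value of the F-statistic is not computed here, since it requires the F distribution with 2 and 27 degrees of freedom, which the Python standard library does not provide.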
Coefficients(a)
                             Unstandardized        Standardized                     95% Confidence
                             Coefficients          Coefficients                     Interval for B
Model                        B        Std. Error   Beta           t       Sig.     Lower Bound   Upper Bound
1   (Constant)               13,009   3,528                       3,688   ,001     5,771         20,248
    Mark on assignment       ,194     ,200         ,091           ,968    ,342     -,217         ,605
    Mark on midterm test     1,112    ,122         ,859           9,120   ,000     ,862          1,362
a Dependent Variable: Mark on final exam
The confidence interval for the coefficient $b_1$ (Mark on assignment) contains the number
zero, indicating that we cannot reject the hypothesis $b_1 = 0$ at the 0.05 level of significance.
The mark a person receives on the assignment is therefore not (statistically) significant in
predicting the mark received on the final exam; this is consistent with our previous
observations in the scatterplot-matrix, where we did not detect any pattern between marks on
the assignment and marks on the final exam. On the other hand, the mark on the midterm exam is
significant since the confidence interval for $b_2$ does not contain zero, so we can reject the
hypothesis that $b_2 = 0$ at the 0.05 level of significance.
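The t-statistics and confidence intervals can be reproduced from the estimates and their standard errors. A minimal sketch, assuming the critical value 2.0518 (the 97.5th percentile of the t distribution with 27 degrees of freedom, taken from a table) and using the rounded values from the Coefficients table; the function and variable names are illustrative:

```python
# Critical value of t with 27 degrees of freedom for a 95% interval
# (an assumption taken from a t-table, not from the SPSS output).
T_CRITICAL = 2.0518

# (estimate B, standard error) copied from the Coefficients table.
coefficients = {
    "(Constant)": (13.009, 3.528),
    "Mark on assignment": (0.194, 0.200),
    "Mark on midterm test": (1.112, 0.122),
}


def summarize(estimate, std_error, t_critical=T_CRITICAL):
    """Return the t-statistic, the interval bounds and a significance flag."""
    t_value = estimate / std_error
    half_width = t_critical * std_error
    lower = estimate - half_width
    upper = estimate + half_width
    # Significant at the 0.05 level when zero lies outside the interval.
    significant = not (lower <= 0.0 <= upper)
    return t_value, lower, upper, significant


results = {name: summarize(b, se) for name, (b, se) in coefficients.items()}

for name, (t_value, lower, upper, significant) in results.items():
    print(f"{name:22s} t = {t_value:6.3f}  CI = [{lower:7.3f}, {upper:7.3f}]  "
          f"significant: {significant}")
```

Because the inputs are rounded to three decimals, the results can differ from the SPSS output in the last digit.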
We conclude by stating that the following model, with the coefficients rounded from the
table above, seems to fit the data well:
Mark on final exam = 13 + 1.1 × Mark on midterm exam.
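As a rough check on a model that uses the midterm mark alone, the least-squares line for a single predictor can be recovered from the summary statistics reported earlier. The sketch below assumes the standard simple-regression formulas (slope = r × s_y / s_x, intercept = mean_y − slope × mean_x); its numbers are derived from the rounded descriptive statistics and correlations, not taken from an SPSS run, and the names are illustrative:

```python
# Summary statistics copied from the Descriptive Statistics and Correlations tables.
r_final_midterm = 0.869
mean_final, sd_final = 35.4333, 7.43562
mean_midterm, sd_midterm = 17.6000, 5.74516

# Least-squares line for a regression of the final mark on the midterm mark only.
slope = r_final_midterm * sd_final / sd_midterm
intercept = mean_final - slope * mean_midterm


def predict_final(midterm_mark):
    """Predicted final exam mark from the midterm-only line."""
    return intercept + slope * midterm_mark


print(f"Mark on final exam = {intercept:.2f} + {slope:.3f} * Mark on midterm exam")
print(f"Prediction at the mean midterm mark: {predict_final(mean_midterm):.2f}")
```

Under these assumptions the refitted intercept comes out somewhat higher than the constant in the two-predictor model, because dropping the assignment mark shifts its average contribution into the intercept.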