Computer lab 2: Simple linear regression – model validation and matrix representation
A simple linear regression model expresses the response as a linear function of an explanatory variable x plus a random error ε. The construction of confidence and prediction intervals is based on the assumption that all the error terms are statistically independent and N(0, σ)-distributed for some σ > 0. Furthermore, it is assumed that the error terms are independent of the x-variables. The soundness of these assumptions can be examined by investigating the model residuals, i.e. the differences between the observed and the fitted (predicted) response values.
Matrix representations of regression models have the advantage that they enable statistical
inference from models involving two or more explanatory variables.
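For reference, and in the notation of Kutner et al., the simple linear regression model and the matrix form used in Assignment 2 can be written as

    Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad
    E(\varepsilon_i) = 0, \quad \mathrm{Var}(\varepsilon_i) = \sigma^2,

    \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
    \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},

where Y is the n×1 vector of responses, X is the n×2 matrix with a column of ones and the x-values, β = (β0, β1)’, and b contains the least-squares estimates of the intercept and the slope.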
Learning objectives
After reading the recommended text and completing the computer lab the student shall be able to:
• To investigate and formally test whether or not a simple linear regression model can be regarded as a correct model of a given data set.
• To write a simple linear regression model in matrix form and to employ matrix operations to estimate the model parameters.
Recommended reading
Chapters 3–5 in Kutner et al.
Assignment 1: Model validation using residual plots
Consider the data set in exercise 1.20 in the textbook and carry out the following:
Use Minitab 15 (Stat → Regression → Regression) to investigate the relationship between the total number of minutes spent by the service person and the number of copiers serviced. Make all the different residual plots that are offered. Also plot the residuals against the number of copiers serviced. (A sketch that reproduces these plots outside Minitab, as a cross-check, is given after the questions below.)
Which conclusions can be drawn from the five residual plots? Is there any evidence of:
i. Outliers
ii. Trends in observation order
iii. Non-constant variance
iv. Relationship between the error terms and the levels of the explanatory variable
How does the plot of residuals against fitted values differ from the plot of residuals against the levels
of the explanatory variable?
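If you want to double-check the Minitab output, the sketch below computes the same residuals and plots with Python (NumPy, SciPy and Matplotlib). The file name copier_maintenance.txt and the column order are assumptions; adjust them to however you have stored the data from exercise 1.20.

    # Optional cross-check of the Minitab residual analysis (a sketch only).
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Hypothetical file name: two whitespace-separated columns with the total
    # minutes spent in the first column and the copiers serviced in the second.
    data = np.loadtxt("copier_maintenance.txt")
    y, x = data[:, 0], data[:, 1]

    # Least-squares fit y = b0 + b1*x; np.polyfit returns (slope, intercept).
    b1, b0 = np.polyfit(x, y, 1)
    fitted = b0 + b1 * x
    residuals = y - fitted

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))

    # Normal probability plot of the residuals
    stats.probplot(residuals, dist="norm", plot=axes[0, 0])

    # Residuals versus fitted values
    axes[0, 1].scatter(fitted, residuals)
    axes[0, 1].axhline(0, linestyle="--")
    axes[0, 1].set(xlabel="Fitted value", ylabel="Residual")

    # Histogram of the residuals
    axes[1, 0].hist(residuals)
    axes[1, 0].set(xlabel="Residual", ylabel="Frequency")

    # Residuals versus observation order
    axes[1, 1].plot(residuals, marker="o")
    axes[1, 1].axhline(0, linestyle="--")
    axes[1, 1].set(xlabel="Observation order", ylabel="Residual")

    # Residuals versus the levels of the explanatory variable
    plt.figure()
    plt.scatter(x, residuals)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Number of copiers serviced")
    plt.ylabel("Residual")
    plt.show()

The first four panels correspond to the residual plots offered by Minitab; the last figure is the additional plot of the residuals against the number of copiers serviced.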
Assignment 2: Matrix representation of regression models
Let us first examine the matrix operations offered in Minitab 15 (Calc → Matrices).
Small matrices can easily be entered from the keyboard. Start by clicking Editor → Enable
commands. Then, click Read, enter the number of rows and columns and the name of the matrix,
and click OK. (This is equivalent to typing commands like read 4 3 m1 in the session window.) Finally
enter the matrix elements row by row. Check that your matrix has been correctly entered by typing
the command prin m1 in the session window or by using the menu commands Manip → Display Data.
In the following, we use these matrix operations to compute the parameter estimates of a simple linear regression model. (A numerical sketch of steps b)–f) is given after the list below as a cross-check.)
a) Consider the data set from exercise 1.21 in the textbook and create a 10×2 matrix X that has ones in the first column and the levels of the explanatory variable in the second column.
b) Transpose the X-matrix. Compute and print the matrix (vector) X’Y, where Y is the 10×1 matrix (vector) of response values.
c) Compute and print the matrix (X’X)⁻¹.
d) Estimate the intercept and slope parameters of the regression model by computing b = (X’X)⁻¹X’Y.
e) Estimate the variance of the error terms by computing MSE = e’e/(n − 2), where e is the 10×1 matrix (vector) of residuals and n = 10.
f) Finally, estimate the covariance matrix of b by computing MSE·(X’X)⁻¹. Try to explain why the estimates of the intercept and the slope are correlated.
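If you want to double-check the Minitab results, the following is a minimal NumPy sketch of steps b)–f). The file name airfreight.txt and the column order are assumptions; adjust them to however you have stored the data from exercise 1.21.

    # Cross-check of Assignment 2, steps b)-f), with NumPy (a sketch only).
    import numpy as np

    # Hypothetical file name: two whitespace-separated columns with the
    # response Y in the first column and the explanatory variable x in the second.
    data = np.loadtxt("airfreight.txt")
    Y = data[:, 0].reshape(-1, 1)            # 10x1 response vector
    x = data[:, 1]
    n = len(x)

    # a) Design matrix X: a column of ones and the levels of x (10x2)
    X = np.column_stack([np.ones(n), x])

    # b) X'Y
    XtY = X.T @ Y

    # c) (X'X)^(-1)
    XtX_inv = np.linalg.inv(X.T @ X)

    # d) Parameter estimates b = (X'X)^(-1) X'Y  (intercept, slope)
    b = XtX_inv @ XtY

    # e) Residuals and the error-variance estimate MSE = e'e / (n - 2)
    e = Y - X @ b
    MSE = (e.T @ e).item() / (n - 2)

    # f) Estimated covariance matrix of b: MSE * (X'X)^(-1)
    cov_b = MSE * XtX_inv

    print("b =\n", b)
    print("MSE =", MSE)
    print("cov(b) =\n", cov_b)

Comparing b, MSE and cov_b with the matrices obtained in Minitab is a quick way to verify steps b)–f); in particular, the off-diagonal element of cov_b is the estimated covariance between the intercept and slope estimates.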
To hand in
Answers to Assignment 1 and to parts c), d), e) and f) of Assignment 2.
The lab report should be handed in no later than 5 days after the scheduled computer lab. Use Lisam
(lisam.liu.se) for handing in the assignments.