Computer exercise 1: Simple linear regression

732G18/732G21/732A22 Linear statistical models
Department of Computer and Information Science
Computer lab 2: Simple linear regression – model
validation and matrix representation
A simple linear regression model is composed of a linear function of an explanatory
variable x, and a random error ε. The construction of confidence and prediction intervals
is based on the assumption that all the error terms are statistically independent and N(0, σ²)
for some σ > 0. Furthermore, it is assumed that the error terms and the x-variables are
independent. The soundness of these assumptions can be examined by investigating the
model residuals, i.e. the differences between observed and predicted response values.
Matrix representations of regression models have the advantage that they enable
statistical inference from models involving two or more explanatory variables.
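As a small illustration of what is meant by residuals, the following Python sketch fits a simple linear regression by least squares and computes the differences between observed and fitted values. The lab itself is carried out in Minitab; the data and variable names below are made up for illustration and are not taken from the textbook.

```python
# Illustrative data only (not from the textbook exercises).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of the slope (b1) and the intercept (b0).
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted (predicted) values and residuals e_i = y_i - fitted_i.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("b0 =", b0, "b1 =", b1)
print("residuals =", residuals)
```

These residuals are the quantities that the residual plots in Assignment 1 display.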
Learning objectives
After reading the recommended text and completing the computer lab, the student shall be
able to:

investigate and formally test whether or not a simple linear regression model
can be regarded as a correct model of a given data set;

write a simple linear regression model in matrix form and employ matrix
operations to estimate the model parameters.
Recommended reading
Chapters 3–5 in Kutner et al.
Assignment 1: Model validation using residual plots
Consider the data set in exercise 1.19 in the textbook and carry out the following:
Use Minitab 15 (Stat → Regression → Regression) to investigate the relationship
between a student’s grade point average (GPA) and American College Testing (ACT)
score. Make all the different residual plots that are offered. Also plot the residuals against
the ACT scores.
Which conclusions can be drawn from the five residual plots? Is there any evidence of:
(i) Outliers
(ii) Trends in observation order
(iii) Non-constant variance
(iv) Relationship between the error terms and the levels of the explanatory variable
How does the plot of residuals against fitted values differ from the plot of residuals
against the levels of the explanatory variable?
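Residual plots are judged by eye, but a numeric complement can sometimes help when looking for outliers. The Python sketch below scales each residual by the square root of the MSE (often called semistudentized residuals). The function name, the data (with one deliberately shifted observation) and the cut-off of 2 are all assumptions chosen for illustration; they are not part of the assignment and not a formal test.

```python
import math

def semistudentized_residuals(x, y):
    """Return residuals from a least-squares line, divided by sqrt(MSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)
    return [e / math.sqrt(mse) for e in residuals]

# Made-up data: roughly y = 2x, with the seventh observation shifted upwards.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.0, 4.1, 5.9, 8.0, 10.1, 11.9, 22.0, 16.1, 17.9, 20.0]

scaled = semistudentized_residuals(x, y)

# Hypothetical rule of thumb: flag observations with |scaled residual| > 2.
flagged = [i for i, e in enumerate(scaled) if abs(e) > 2.0]
print("flagged observation indices:", flagged)
```

Any observation flagged in this way should still be inspected in the residual plots before conclusions are drawn.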
Assignment 2: Model validation for experiments involving
repeated measures
Consider the data set in Exercise 1.22 in the textbook. As can be seen, there are four
observations of plastic hardness at each of four predetermined levels of the time elapsed
since the manufacturing. This implies that we can regard the data as four independent
samples of size four, or sixteen pairs of data where the explanatory variable is varied at
four levels.
a) Undertake a regression analysis of all sixteen pairs of data and compute the
residual (error) sum of squares (SSE) and the residual (error) mean square
(MSE).
b) Regard the data as four samples of size four, and compute the residual sum of
squares for each sample. Then compute a total SSE by adding the four
residual sums of squares. Also compute the mean square error for each sample
and a pooled MSE value by taking the average of the residual mean squares
for the four samples.
c) Compare the SSE values in assignments 2a and b. Does the regression always
produce the largest SSE?
d) Compare the MSE values in assignments 2a and b. Does the regression always
produce the smallest MSE?
e) Do the results in 2c and d indicate that a linear regression model is adequate?
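The computations in 2a and 2b can be sketched in Python as follows. The data are made up (four replicates at each of four levels) and are not the textbook data. It is assumed here that the "residual sum of squares for each sample" means the sum of squared deviations from that sample's own mean, with 4 - 1 = 3 degrees of freedom per sample.

```python
from collections import defaultdict

# Made-up data: four replicates at each of four x-levels (not the textbook data).
x = [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4
y = [10.2, 9.8, 10.5, 9.9,
     12.1, 11.7, 12.4, 12.0,
     13.8, 14.3, 14.0, 13.7,
     16.2, 15.9, 16.4, 15.8]
n = len(y)

# 2a: regression on all sixteen pairs.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
sse_reg = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse_reg = sse_reg / (n - 2)

# 2b: treat the data as four separate samples, one per x-level.
groups = defaultdict(list)
for xi, yi in zip(x, y):
    groups[xi].append(yi)

def ss_about_mean(values):
    """Sum of squared deviations from the sample mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

sse_by_group = {level: ss_about_mean(v) for level, v in groups.items()}
mse_by_group = {level: sse_by_group[level] / (len(v) - 1)
                for level, v in groups.items()}
sse_pooled = sum(sse_by_group.values())
mse_pooled = sum(mse_by_group.values()) / len(mse_by_group)

print("regression: SSE =", sse_reg, " MSE =", mse_reg)
print("by sample:  SSE =", sse_pooled, " pooled MSE =", mse_pooled)
```

Running the same computation on the textbook data gives the values to compare in 2c and 2d.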
Assignment 3: Matrix representation of regression models
Let us first examine the matrix operations offered in MINITAB 15 (Calc → Matrices).
Small matrices can easily be entered from the keyboard. Start by clicking Editor →
Enable commands. Then, click Read, enter the number of rows and columns and the
name of the matrix, and click OK. (This is equivalent to typing commands like read 4 3
m1 in the session window.) Finally enter the matrix elements row by row. Check that
your matrix has been correctly entered by typing the command prin m1 in the session
window or using the toolbar commands Manip → Display Data.
a) Enter a 4×3 matrix and call it m1. Enter a 3×2 matrix and call it m2. Then use
Matrices → Arithmetic to compute the product m3 of m1 and m2. Note that
you may also compute this product by typing the command mult m1 m2 m3 in
the session window. Display the matrix m3 and check that the matrix
operation has been carried out as you expected.
b) Investigate how you can compute the transpose of a matrix.
c) Investigate how you can compute the inverse m2 of a square matrix m1,
and check that the product of m1 and m2 is the identity matrix.
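For comparison with the Minitab commands, the same three operations (product, transpose, inverse) can be sketched in Python with NumPy. This is only an illustration outside Minitab; the matrix entries are arbitrary, and the square matrix is named a here (rather than reusing m1) because a 4×3 matrix has no inverse.

```python
import numpy as np

# Arbitrary example matrices: m1 is 4x3 and m2 is 3x2.
m1 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0],
               [10.0, 11.0, 12.0]])
m2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [2.0, 3.0]])

# Matrix product: (4x3) times (3x2) gives a 4x2 matrix.
m3 = m1 @ m2

# Transpose: rows and columns are swapped, so m3_t is 2x4.
m3_t = m3.T

# Inverse of a square, non-singular matrix, followed by a check that
# the product with the original is the identity matrix (up to rounding).
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
a_inv = np.linalg.inv(a)
is_identity = np.allclose(a @ a_inv, np.eye(2))

print(m3)
print(m3_t.shape)
print(is_identity)
```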
In the following, we use the cited matrix operations to compute parameter estimates in a
simple linear regression model.
d) Consider the data set from assignment 2 and create a 16×2 matrix X that has
ones in the first column and the levels of the explanatory variable in the
second column.
e) Transpose the X-matrix. Compute and print the matrix (vector) X’Y, where Y
is the 16×1 matrix (vector) of response values.
f) Compute and print the matrix (X’X)⁻¹.
g) Estimate the intercept and slope parameters of the regression model by
computing b = (X’X)⁻¹ X’Y. Did you get the same result as in assignment 2?
h) Estimate the variance of the error terms by computing MSE = e’e/(n-2), where
e is the 16×1 matrix (vector) of residuals and n = 16.
i) Finally, estimate the covariance matrix of b by computing MSE (X’X)⁻¹. Try
to explain why the estimates of the intercept and the slope are correlated.
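Steps d) to i) can be sketched in Python with NumPy as follows. The data are made up (sixteen observations at four levels) and stand in for the textbook data; the variable names are chosen for illustration only.

```python
import numpy as np

# Made-up data: four replicates at each of four x-levels (not the textbook data).
x = np.repeat([1.0, 2.0, 3.0, 4.0], 4)
y = np.array([10.2, 9.8, 10.5, 9.9,
              12.1, 11.7, 12.4, 12.0,
              13.8, 14.3, 14.0, 13.7,
              16.2, 15.9, 16.4, 15.8])
n = len(y)

# d) Design matrix: a column of ones and a column with the x-levels.
X = np.column_stack([np.ones(n), x])

# e) X'Y
XtY = X.T @ y

# f) (X'X)^(-1)
XtX_inv = np.linalg.inv(X.T @ X)

# g) b = (X'X)^(-1) X'Y, giving the intercept b[0] and the slope b[1].
b = XtX_inv @ XtY

# h) Residual vector e and MSE = e'e / (n - 2).
e = y - X @ b
mse = (e @ e) / (n - 2)

# i) Estimated covariance matrix of b.
cov_b = mse * XtX_inv

print("b =", b)
print("MSE =", mse)
print("cov(b) =")
print(cov_b)
```

For a data set of this size, explicitly inverting X’X mirrors the formula in the assignment; for larger or ill-conditioned problems a solver such as np.linalg.lstsq is usually preferred.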
To hand in
Answers to the highlighted (yellow colour) questions/assignments,
no later than Thursday 11 September.