Computer exercise 1: Simple linear regression

732G18/732G21/732A22 Linear statistical models
Department of Computer and Information Science

Computer lab 2: Simple linear regression – model validation and matrix representation

A simple linear regression model is composed of a linear function of an explanatory variable x and a random error ε. The construction of confidence and prediction intervals is based on the assumption that all the error terms are statistically independent and N(0; σ) for some σ > 0. Furthermore, it is assumed that the error terms and the x-variables are independent. The soundness of these assumptions can be examined by investigating the model residuals, i.e. the differences between observed and predicted response values. Matrix representations of regression models have the advantage that they enable statistical inference for models involving two or more explanatory variables.

Learning objectives

After reading the recommended text and completing the computer lab, the student shall be able to:
• investigate and formally test whether a simple linear regression model can be regarded as a correct model of a given data set;
• write a simple linear regression model in matrix form and employ matrix operations to estimate the model parameters.

Recommended reading

Chapters 3–5 in Kutner et al.

Assignment 1: Model validation using residual plots

Consider the data set in Exercise 1.19 in the textbook and carry out the following:

Use Minitab 15 (Stat > Regression > Regression) to investigate the relationship between a student’s grade point average (GPA) and American College Testing (ACT) score. Make all the different residual plots that are offered. Also plot the residuals against the ACT scores. Which conclusions can be drawn from the five residual plots?
Is there any evidence of:
(i) Outliers
(ii) Trends in observation order
(iii) Non-constant variance
(iv) Relationship between the error terms and the levels of the explanatory variable

How does the plot of residuals against fitted values differ from the plot of residuals against the levels of the explanatory variable?

Assignment 2: Model validation for experiments involving repeated measures

Consider the data set in Exercise 1.22 in the textbook. As can be seen, there are four observations of plastic hardness at each of four predetermined levels of the time elapsed since manufacturing. This implies that we can regard the data either as four independent samples of size four, or as sixteen pairs of data where the explanatory variable is varied at four levels.

a) Undertake a regression analysis of all sixteen pairs of data and compute the residual (error) sum of squares (SSE) and the residual (error) mean square (MSE).
b) Regard the data as four samples of size four, and compute the residual sum of squares for each sample. Then compute a total SSE by adding the four residual sums of squares. Also compute the mean square error for each sample and a pooled MSE value by taking the average of the residual mean squares for the four samples.
c) Compare the SSE values in assignments 2a and b. Does the regression always produce the largest SSE?
d) Compare the MSE values in assignments 2a and b. Does the regression always produce the smallest MSE?
e) Do the results in 2c and d indicate that a linear regression model is adequate?

Assignment 3: Matrix representation of regression models

Let us first examine the matrix operations offered in Minitab 15 (Calc > Matrices). Small matrices can easily be entered from the keyboard. Start by clicking Editor > Enable commands. Then click Read, enter the number of rows and columns and the name of the matrix, and click OK.
(This is equivalent to typing commands like read 4 3 m1 in the session window.) Finally, enter the matrix elements row by row. Check that your matrix has been correctly entered by typing the command prin m1 in the session window or by using the toolbar commands Manip > Display Data.

a) Enter a 4 × 3 matrix and call it m1. Enter a 3 × 2 matrix and call it m2. Then use Matrices > Arithmetic to compute the product m3 of m1 and m2. Note that you may also compute this product by typing the command mult m1 m2 m3 in the session window. Display the matrix m3 and check that the matrix operation has been carried out as you expected.
b) Investigate how you can compute the transpose of a matrix.
c) Investigate how you can compute the inverse m2 of a square matrix m1, and check that the product of m1 and m2 is the identity matrix.

In the following, we use the matrix operations above to compute parameter estimates in a simple linear regression model.

d) Consider the data set from assignment 2 and create a 16 × 2 matrix X that has ones in the first column and the levels of the explanatory variable in the second column.
e) Transpose the X-matrix.
f) Compute and print the matrix (vector) X’Y, where Y is the 16 × 1 matrix (vector) of response values.
g) Compute and print the matrix (X’X)⁻¹.
h) Estimate the intercept and slope parameters of the regression model by computing b = (X’X)⁻¹ X’Y. Did you get the same result as in assignment 2?
i) Estimate the variance of the error terms by computing MSE = e’e/(n − 2), where e is the 16 × 1 matrix (vector) of residuals and n = 16. Finally, estimate the covariance matrix of b by computing MSE (X’X)⁻¹. Try to explain why the estimates of the intercept and the slope are correlated.

Hand in answers to the highlighted (yellow colour) questions/assignments no later than Thursday 11 September.
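
As a cross-check outside Minitab, the matrix computations in Assignment 3 d) to i) can be sketched in Python with NumPy. This is only an illustrative sketch: the function name fit_simple_regression and the four data pairs are made up for the example and are not the Exercise 1.22 data.

```python
import numpy as np


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x using explicit matrix operations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Design matrix: ones in the first column, x levels in the second.
    X = np.column_stack([np.ones(n), x])

    xtx_inv = np.linalg.inv(X.T @ X)  # (X'X)^-1
    xty = X.T @ y                     # X'Y
    b = xtx_inv @ xty                 # b = (X'X)^-1 X'Y

    e = y - X @ b                     # residual vector
    mse = (e @ e) / (n - 2)           # MSE = e'e / (n - 2)
    cov_b = mse * xtx_inv             # estimated covariance matrix of b
    return b, mse, cov_b


# Hypothetical toy data, NOT the textbook data set.
x_demo = [1, 2, 3, 4]
y_demo = [1, 3, 2, 4]
b, mse, cov_b = fit_simple_regression(x_demo, y_demo)
print("b =", b)
print("MSE =", mse)
print("cov(b) =")
print(cov_b)
```

The explicit inverse is used here only to mirror the steps of the lab; for larger problems, np.linalg.solve or np.linalg.lstsq is usually the numerically safer choice.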

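The two ways of computing SSE and MSE in Assignment 2 a) and b) can likewise be sketched in Python with NumPy. The function names and the six data pairs below are hypothetical and chosen only to keep the arithmetic small. Averaging the per-sample mean squares, as the assignment describes, gives the pooled value only because all samples have the same size.

```python
import numpy as np


def regression_sse(x, y):
    """SSE and MSE (n - 2 degrees of freedom) from a straight-line fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    resid = y - (intercept + slope * x)
    sse = float(resid @ resid)
    return sse, sse / (x.size - 2)


def within_sample_sse(x, y):
    """Total within-sample SSE and the average of the per-sample mean squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sse_parts, ms_parts = [], []
    for level in np.unique(x):
        sample = y[x == level]              # all responses at this x level
        ss = float(np.sum((sample - sample.mean()) ** 2))
        sse_parts.append(ss)
        ms_parts.append(ss / (sample.size - 1))
    # The plain average of mean squares assumes equally sized samples.
    return sum(sse_parts), float(np.mean(ms_parts))


# Hypothetical toy data: three x levels with two observations each.
x_demo = [1, 1, 2, 2, 3, 3]
y_demo = [1, 3, 2, 4, 6, 8]
sse_reg, mse_reg = regression_sse(x_demo, y_demo)
sse_within, mse_pooled = within_sample_sse(x_demo, y_demo)
print("Regression:     SSE =", sse_reg, " MSE =", mse_reg)
print("Within samples: SSE =", sse_within, " pooled MSE =", mse_pooled)
```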