14.5.5 Correlation and Regression with Minitab During this tutorial you will learn how to use Minitab to investigate the association between two continuous variables and how to describe it graphically 5.5.1 In this practical session you will analyse the some bivariate data. To enter the data: File / Open Worksheet / Merlin1 (saved in the ANOVA worksheet, 14.5.1) In order to see what this file contains: Open the Information window; Window / Info. This window gives details of the data stored in the file merlin4.mtw which you are going to analyse. We shall give each of the cases a salary, in £’000, and then see if it is associated with their age. In the first empty column: name the variable Salary, in £’000, and type the following in one column: (Work down these columns one after the other) 38.1 18.7 42.3 25.9 53.6 37.6 38.9 42.8 60.1 60.7 75.2 10.9 23.2 19.6 15.5 20.7 40.2 28.5 22.9 47.5 15.8 35.9 25.3 63.2 19.8 31.3 59.3 33.8 24.5 32.0 19.7 8.5 15.9 39.3 14.5 10.2 15.6 28.5 37.3 32.9 23.9 8.6 31.7 14.1 20.3 19.8 28.2 25.3 17.3 33.5 13.7 6.4 35.3 12.5 37.8 32.9 9.8 15.2 8.7 38.4 The variables of interest in this practical session are the continuous variables SALARY and AGE. We shall investigate the relationship between the Salaries earned by the employees of Merlin and their ages. 5.5.2 Save revised worksheet as Merlin5.mtw 5.5.3 Produce a Scatterplot to see if there appears to be a relationship? Graph / Scatterplot / Simple Select SALARY for Y and AGE for X Scatterplot of Salary vs Age 80 70 60 Salary 50 40 30 20 10 0 20 30 40 50 60 70 Age 1 Examine the plot. Does it suggest a linear relationship? Yes but weak . . . . 5.5.4 Calculate the correlation coefficient. Stat / Basic Statistics / Correlation Select SALARY and AGE as the variables. Correlations: Age, Salary Pearson correlation of Age and Salary = 0.398 P-Value = 0.002 What is the value of the correlation coefficient? . . . . . . . . . . . . . . . . . 0.398 . . . . . What is the probability of it being zero? Is this significant at 5%? . . . . . . . . . . . . . . . . . . . 0.002. . . . . . . . . . . . . . . . . . . . . . . . Yes. . . . . . . . If the p value is less than 0.05 the correlation coefficient is significant at the 5% level of significance. 5.5.5 Find the regression equation: Stat / Regression / Regression Select SALARY for Response, AGE for Predictors. Regression Analysis: Salary versus Age The regression equation is Salary = 7.73 + 0.554 Age Write down the regression equation Salary = 7.73 + 0.554 Age The last output was for the default setting. Minitab can calculate and store the fitted values and the standardised residuals for each observation. Edit / Edit Last Dialog The last dialogue box reopens. Under Storage select Residuals, Standardised residuals and Fits. OK and check that three new columns have been added to your worksheet. 5.5.6 Save this altered version of your file at this stage under a new name Merlin6 File / Save Worksheet as Merlin6 5.5.7 Investigate the Fitted values: Graph / Scatterplot Select FITS1 as the Y-variable and AGE as the X-variable. This produces a straight line as all the fitted values lie on the regression line. 2 To fit this line to a scatterplot requires a graphics plot: Graph / Scatterplot / With regression SALARY against AGE as before. Scatterplot of FITS1 vs Age 45 40 FITS1 35 30 25 20 20 30 40 50 60 70 60 70 Age Scatterplot of Salary vs Age 80 70 60 Salary 50 40 30 20 10 0 20 30 40 50 Age 5.5.8 To print this graph which will not appear in your Session file: File / Print Graph 5.5.9 To predict a Y-value for a given X-value, a 40 year old: Stat / Regression / Regression Select SALARY and AGE as before and then select Options. Type 40 in the Prediction intervals for new observations. New Obs 1 Fit 29.91 SE Fit 1.87 95% CI (26.16, 33.66) 95% PI (1.20, 58.62) Values of Predictors for New Observations Obs Age 1 40.0 3 The output gives the predicted value for y when x is 40 with a confidence interval and prediction interval for this predicted y-value. What is the predicted salary of a 40 year old employee? . . . .£29 900 5.5.10To produce the regression line with its confidence interval and prediction interval: Stat / Regression / Fitted Line Plot / Options: Display Confidence and Prediction bands. Select SALARY and AGE as before. Print this window as before. Fitted Line Plot Salary = 7.728 + 0.5545 Age 80 Regression 95% CI 95% PI Salary 60 S R-Sq R-Sq(adj) 14.2202 15.9% 14.4% 40 20 0 20 30 40 Age 50 60 70 5.5.11 Print your session and/or graphs if required. 4