Correlation and regression in Minitab with output Word Document

14.5.5
Correlation and Regression with Minitab
During this tutorial you will learn how to use Minitab to investigate the association
between two continuous variables and how to describe it graphically.
5.5.1 In this practical session you will analyse some bivariate data.
To enter the data:
File / Open Worksheet / Merlin1 (saved in the ANOVA worksheet, 14.5.1)
To see what this file contains, open the Information window: Window / Info.
This window gives details of the data stored in the file merlin4.mtw which you are
going to analyse.
We shall give each of the cases a salary, in £’000, and then see if it is associated with
their age.
In the first empty column: name the variable Salary, in £’000, and type the following
in one column: (Work down these columns one after the other)
38.1
18.7
42.3
25.9
53.6
37.6
38.9
42.8
60.1
60.7
75.2
10.9
23.2
19.6
15.5
20.7
40.2
28.5
22.9
47.5
15.8
35.9
25.3
63.2
19.8
31.3
59.3
33.8
24.5
32.0
19.7
8.5
15.9
39.3
14.5
10.2
15.6
28.5
37.3
32.9
23.9
8.6
31.7
14.1
20.3
19.8
28.2
25.3
17.3
33.5
13.7
6.4
35.3
12.5
37.8
32.9
9.8
15.2
8.7
38.4
The variables of interest in this practical session are the continuous variables
SALARY and AGE. We shall investigate the relationship between the Salaries
earned by the employees of Merlin and their ages.
5.5.2 Save revised worksheet as Merlin5.mtw
5.5.3 Produce a scatterplot to see whether there appears to be a relationship.
Graph / Scatterplot / Simple: select SALARY for Y and AGE for X.
[Figure: Scatterplot of Salary vs Age. Y-axis: Salary (0 to 80); X-axis: Age (20 to 70).]

Examine the plot. Does it suggest a linear relationship?
Yes, but it is weak.
5.5.4 Calculate the correlation coefficient.
Stat / Basic Statistics / Correlation: select SALARY and AGE as the variables.
Correlations: Age, Salary
Pearson correlation of Age and Salary = 0.398
P-Value = 0.002

What is the value of the correlation coefficient? 0.398

What is the p-value, i.e. the probability of obtaining a sample correlation at least
this far from zero if the population correlation were zero? 0.002

Is this significant at 5%? Yes

If the p-value is less than 0.05, the correlation coefficient is significantly different
from zero at the 5% level of significance.
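
The p-value above can be checked by hand from r alone. The following is a minimal Python sketch, not part of the Minitab session. It assumes n = 60 cases (the number of salaries entered in 5.5.1) and uses the standard test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The helper names t_pdf and two_sided_p are illustrative only, and the tail area is found by simple numerical integration rather than by any Minitab routine.

```python
import math

r = 0.398   # Pearson correlation reported by Minitab
n = 60      # ASSUMPTION: one case per salary value entered in 5.5.1
df = n - 2  # degrees of freedom for the test of zero correlation

# Test statistic for H0: population correlation = 0
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the density on [0, |t|] with Simpson's rule."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - area)       # both tails


p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f} on {df} df, two-sided p = {p_value:.4f}")
print("Significant at 5%:", p_value < 0.05)
```

Under these assumptions the computed p-value should round to the 0.002 that Minitab reports.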
5.5.5 Find the regression equation:
Stat / Regression / Regression: select SALARY for Response, AGE for Predictors.
Regression Analysis: Salary versus Age
The regression equation is
Salary = 7.73 + 0.554 Age

Write down the regression equation
Salary = 7.73 + 0.554 Age
The last output was for the default setting. Minitab can calculate and store the fitted
values and the standardised residuals for each observation.
Edit / Edit Last Dialog reopens the last dialogue box.
Under Storage select Residuals, Standardised residuals and Fits.
Click OK and check that three new columns have been added to your worksheet.
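
To make clear what the three stored columns contain, here is a minimal Python sketch of a least-squares fit with its fitted values, residuals and standardised residuals. The ages and salaries in it are hypothetical toy values, not the Merlin worksheet (the AGE column is not listed in this handout), so the numbers will not match the Minitab output. The standardised residual is taken to be the residual divided by s * sqrt(1 - leverage), which is the usual definition and is assumed here to correspond to Minitab's stored column.

```python
import math

# HYPOTHETICAL toy data for illustration only (not the Merlin worksheet)
age = [25, 32, 41, 48, 56, 63]
salary = [18.0, 30.5, 27.0, 41.0, 35.5, 52.0]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(salary) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, salary))

# Least-squares estimates of the line salary = intercept + slope * age
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Fits: points on the regression line; residuals: observed minus fitted
fits = [intercept + slope * x for x in age]
residuals = [y - f for y, f in zip(salary, fits)]

# Residual standard deviation (reported as S by Minitab) on n - 2 df
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each point, then standardised residuals
leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in age]
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

print(f"salary = {intercept:.3f} + {slope:.4f} age")
for x, y, f, e, r in zip(age, salary, fits, residuals, std_resid):
    print(f"age {x:2d}  salary {y:5.1f}  fit {f:6.2f}  resid {e:6.2f}  std resid {r:5.2f}")
```

The residuals of a least-squares line always sum to zero, which is a quick check on any stored residual column.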
5.5.6 Save this altered version of your file at this stage under a new name Merlin6
File / Save Worksheet as Merlin6
5.5.7 Investigate the Fitted values:
Graph / Scatterplot: select FITS1 as the Y-variable and AGE as the X-variable.
This produces a straight line, as all the fitted values lie on the regression line.
To draw this line on the scatterplot of the data, use a scatterplot with a regression line:
Graph / Scatterplot / With regression: select SALARY against AGE as before.
[Figure: Scatterplot of FITS1 vs Age. Y-axis: FITS1 (20 to 45); X-axis: Age (20 to 70).]
[Figure: Scatterplot of Salary vs Age with regression line. Y-axis: Salary (0 to 80); X-axis: Age (20 to 70).]
5.5.8 To print this graph, which will not appear in your Session file:
File / Print Graph
5.5.9 To predict a Y-value for a given X-value, for example for a 40 year old:
Stat / Regression / Regression
Select SALARY and AGE as before and then select Options.
Type 40 in the box "Prediction intervals for new observations".
New Obs    Fit   SE Fit       95% CI            95% PI
      1  29.91     1.87   (26.16, 33.66)   (1.20, 58.62)

Values of Predictors for New Observations
Obs   Age
  1  40.0
The output gives the predicted value for y when x is 40 with a confidence interval and
prediction interval for this predicted y-value.

What is the predicted salary of a 40 year old employee?
About £29,900
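
The fit and both intervals can be reproduced approximately from figures printed in this handout. The sketch below is a hedged check in Python, not Minitab's own calculation: it uses the coefficients 7.728 and 0.5545 and the residual standard deviation S = 14.2202 shown with the fitted line plot in 5.5.10, the SE Fit of 1.87 from the output above, and a t critical value of 2.0017, which assumes 58 degrees of freedom (n = 60 cases).

```python
import math

b0, b1 = 7.728, 0.5545   # regression coefficients from the fitted line plot
s = 14.2202              # residual standard deviation S from the same output
se_fit = 1.87            # SE Fit reported for Age = 40
t_crit = 2.0017          # ASSUMPTION: 97.5% point of t on 58 df (n = 60)

age_new = 40
fit = b0 + b1 * age_new  # predicted salary in thousands of pounds

# Confidence interval: uncertainty in the MEAN salary of 40 year olds
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# Prediction interval: uncertainty for ONE new 40 year old,
# which adds the scatter of individuals about the line
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"Fit    = {fit:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

Small differences in the second decimal place from the Minitab figures are expected, because the inputs used here are already rounded. The prediction interval is much wider than the confidence interval because an individual salary varies far more than the mean salary at a given age.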
5.5.10 To produce the regression line with its confidence interval and prediction interval:
Stat / Regression / Fitted Line Plot. Select SALARY and AGE as before, then under
Options choose Display confidence and prediction bands. Print this window as before.
[Figure: Fitted Line Plot, Salary = 7.728 + 0.5545 Age, showing the regression line with 95% CI and 95% PI bands. Y-axis: Salary (0 to 80); X-axis: Age (20 to 70).]
S = 14.2202    R-Sq = 15.9%    R-Sq(adj) = 14.4%
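
The R-Sq values on this plot tie back to the correlation coefficient found in 5.5.4. The short Python sketch below illustrates the link under the assumption of n = 60 cases; in simple linear regression R-Sq is the square of r, and the adjusted value applies the usual correction 1 - (1 - R-Sq)(n - 1)/(n - 2).

```python
r = 0.398              # Pearson correlation from 5.5.4
n = 60                 # ASSUMPTION: number of cases in the worksheet
r_sq_reported = 0.159  # R-Sq from the fitted line plot, as a proportion

# With one predictor, R-Sq is the squared correlation
r_sq = r ** 2

# Adjusted R-Sq penalises for the one predictor in the model
r_sq_adj = 1 - (1 - r_sq_reported) * (n - 1) / (n - 2)

print(f"r squared      = {100 * r_sq:.2f}%  (reported R-Sq: 15.9%)")
print(f"adjusted R-Sq  = {100 * r_sq_adj:.2f}%  (reported: 14.4%)")
```

The small gap between r squared and the reported 15.9% comes from r having been rounded to three decimal places. Either way, age accounts for only about 16% of the variation in salary, which agrees with the weak linear relationship seen in the scatterplot.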
5.5.11 Print your session and/or graphs if required.