How to interprete the minitab output of a regression analysis:

advertisement
How to interpret a minitab output of a regression analysis:
Step I:
Model: From the description of the problem, it says that this a time series data where the weight of soap
depends on the number of days it had been used. Thus dependent variable(y) is weight of the soap and
independent variable is the number of days (x).
We wish to fit a liner model Y = α + βx
Step II:
The following scatter diagram shows that
1. there is a inverse relationship between x and y, that is as the number of days increase, weight of the
soap decreases.
2. We see a distinct liner trend among the data points supporting our model in step I.
Scatterplot of Weight vs Day
140
120
Weight
100
80
60
40
20
0
0
5
10
15
20
25
Day
3. Pearson correlation of Day and Weight = -0.998
P-Value = 0.000. This tells us that the sample estimates of Pearson correlation of Day and
Weight is -0.998 based on 14 observations. When test for significance, a low p-value rejects
the null that rho=0 and we conclude that the sample estimates just did not come from the
noise. There is a meaningful linear relationship between the two variables.
Step III & IV:
Estimates and evaluation:
We estimate the model using least square method. The computation from the minitab is as follows:
The regression equation is
Weight = 123 - 5.57 Day
Interpretation: the line intersects y axis at 123 with a slope of -5.57. that is on the
day=0, weight is 123gm and for each increase in a day, the weight of the soap decreases
on the average by 5.57 grams.
Predictor
Constant
Day
Coef
123.141
-5.5748
SE Coef
1.382
0.1068
T
89.09
-52.19
P
0.000
0.000
Interpretation: the sample estimates of alpha and beta are 123.141 and -5.57
respectively. The corresponding test statistics are 89.09 and -52.10 indicating that these
are too large values of t-statististics and lie on the extreme ends of t-curve.
Thus we reject the null hypothesis of alpha =o and beta=o. And conclude that the beta
and alpha play a significant role in the regression model.
S = 2.94921
R-Sq = 99.5%
R-Sq(adj) = 99.5%
Interpretation: the standard deviation of the error terms is 2.94. A 99.5% R-sqadj
indicates that when ever we observe a variation in the value of y, 99.5% of it is due to
the model (or due to change in x) and only .5% is due error or some unexplained factor.
That is this data fits well to the linear model.
Analysis of Variance
Source
Regression
Residual Error
Total
DF
1
13
14
SS
23694
113
23807
MS
23694
9
F
2724.11
P
0.000
Interpretation: In this case ANOVA tests the hypothesis that beta=0. In fact F is
nothing but T-square. A low p-value suggest that beta plays a significant role in the
model, this is just reassurance of the t-test.
Unusual Observations
Obs
10
15
Day
12.0
22.0
Weight
50.000
6.000
Fit
56.244
0.496
SE Fit
0.772
1.418
Residual
-6.244
5.504
St Resid
-2.19R
2.13R
R denotes an observation with a large standardized residual.
Interpretation: the observation number 10 and number 15 are outliers. We need to go
back and review what happened on those days , either soap is used too much or too less.
To improve the model, we would like to delete those observation and recompute the
line.
Step 5:
Checking the validity of the assumptions:
We made the assumptions that the all the error terms are identically and independently
normally distributed with mean 0 and common variance sigma –square.
Residual Plots for Weight
Normal Probability Plot of the Residuals
Residuals Versus the Fitted Values
99
5.0
Residual
Percent
90
50
10
1
0.0
-2.5
-5.0
-5.0
-2.5
0.0
Residual
2.5
5.0
0
Histogram of the Residuals
5.0
4.5
2.5
3.0
1.5
0.0
30
60
Fitted Value
90
120
Residuals Versus the Order of the Data
6.0
Residual
Frequency
2.5
0.0
-2.5
-5.0
-6
-4
-2
0
2
Residual
4
6
1
2
3
4
5
6
7
8
9 10 11 12 13 14 15
Observation Order
Interpretation:
1. the graph on top left checks the assumption of normality of error terms. In this
case we see that most of the points are clustered around blue line indication that
the error terms are approximately normal. Thus our assumption of normality is
valid.
2. The graph on top right plots the error terms against the fitted values. There are
approximately half of them are above and half are below the zero line indicating
that our assumption of error terms having mean zero is valid.
3. On the same graph we see the clear cyclic pattern among the error terms
indicating that they are violating the assumption of independence of error. Error
terms are not independent. May be there is another factor present in this
example which we need to find out.
4. The bottom left graph again re-emphasizes the normality assumption. Though
our sample size is just 15.
5. The bottom right graph is also important in this case because data is a time
series and order of the data is important. A clear cyclic pattern indicates that
error terms are dependent on the time variable.
Step VI:
Although the beta is significant and R sq adj is very high indicating that model is a very
good fit to the data, there is violation of assumption of independence indicate that there
is some other factor which is playing role behind the screen and we may have to study it
further.
Step VII:
Let us estimate the value of y and interpret it
Say for x = 14 we find and interval for the average value of y
y-hat = 123 - 5.57 * 14 = 45.02
that is we expect that on the average the expected value of weight on the 14th day
approx 45 grams.
98% confidence interval:
45.02 ± t * .8441 = 45.02± 2.326*.8441= (43.0565, 46.9635)
We are 98% confidant that that on the 14 th day the weight of the soap on the average
lies between 43 grams and 47 grams approx.
98% prediction interval:
45.02± 2.326* 3.1163 = (41.9036, 48.1363)
We are 98% confidant that on the 14th day the predicted value of the weight of the
soap lies between 42 grams and 48 grams approx.
140
Fitted Line Plot
Weight = 123.1 - 5.575 Day
120
100
Regression
95% C I
95% PI
S
R-Sq
R-Sq(adj)
80
2.94921
99.5%
99.5%
60
40
20
0
0
5
10
15
20
Day
(optional)For those who want to improve upon the model
Quadratic fitting: compare the s-value and Rsq adj value with last model.
25
140
Fitted Line Plot
Weight = 127.3 - 6.744 Day
+ 0.05063 Day**2
120
Regression
95% CI
95% PI
100
S
R-Sq
R-Sq(adj)
60
40
20
0
0
5
10
15
20
25
Day
Validation of assumptions in quadratic fitting:
Residual Plots for Weight
Normal Probability Plot of the Residuals
Residuals Versus the Fitted Values
99
2
Residual
Percent
90
50
10
1
-5.0
-2.5
0.0
Residual
2.5
0
-2
-4
5.0
Histogram of the Residuals
30
60
90
Fitted Value
120
2
Residual
3
2
1
0
0
Residuals Versus the Order of the Data
4
Frequency
Weight
80
1.95599
99.8%
99.8%
-4
-3
-2
-1
0
Residual
1
2
3
0
-2
-4
1
2
3
4
5 6 7 8 9 10 11 12 13 14 15
Observation Order
Download