How to interpret a minitab output of a regression analysis: Step I: Model: From the description of the problem, it says that this a time series data where the weight of soap depends on the number of days it had been used. Thus dependent variable(y) is weight of the soap and independent variable is the number of days (x). We wish to fit a liner model Y = α + βx Step II: The following scatter diagram shows that 1. there is a inverse relationship between x and y, that is as the number of days increase, weight of the soap decreases. 2. We see a distinct liner trend among the data points supporting our model in step I. Scatterplot of Weight vs Day 140 120 Weight 100 80 60 40 20 0 0 5 10 15 20 25 Day 3. Pearson correlation of Day and Weight = -0.998 P-Value = 0.000. This tells us that the sample estimates of Pearson correlation of Day and Weight is -0.998 based on 14 observations. When test for significance, a low p-value rejects the null that rho=0 and we conclude that the sample estimates just did not come from the noise. There is a meaningful linear relationship between the two variables. Step III & IV: Estimates and evaluation: We estimate the model using least square method. The computation from the minitab is as follows: The regression equation is Weight = 123 - 5.57 Day Interpretation: the line intersects y axis at 123 with a slope of -5.57. that is on the day=0, weight is 123gm and for each increase in a day, the weight of the soap decreases on the average by 5.57 grams. Predictor Constant Day Coef 123.141 -5.5748 SE Coef 1.382 0.1068 T 89.09 -52.19 P 0.000 0.000 Interpretation: the sample estimates of alpha and beta are 123.141 and -5.57 respectively. The corresponding test statistics are 89.09 and -52.10 indicating that these are too large values of t-statististics and lie on the extreme ends of t-curve. Thus we reject the null hypothesis of alpha =o and beta=o. And conclude that the beta and alpha play a significant role in the regression model. S = 2.94921 R-Sq = 99.5% R-Sq(adj) = 99.5% Interpretation: the standard deviation of the error terms is 2.94. A 99.5% R-sqadj indicates that when ever we observe a variation in the value of y, 99.5% of it is due to the model (or due to change in x) and only .5% is due error or some unexplained factor. That is this data fits well to the linear model. Analysis of Variance Source Regression Residual Error Total DF 1 13 14 SS 23694 113 23807 MS 23694 9 F 2724.11 P 0.000 Interpretation: In this case ANOVA tests the hypothesis that beta=0. In fact F is nothing but T-square. A low p-value suggest that beta plays a significant role in the model, this is just reassurance of the t-test. Unusual Observations Obs 10 15 Day 12.0 22.0 Weight 50.000 6.000 Fit 56.244 0.496 SE Fit 0.772 1.418 Residual -6.244 5.504 St Resid -2.19R 2.13R R denotes an observation with a large standardized residual. Interpretation: the observation number 10 and number 15 are outliers. We need to go back and review what happened on those days , either soap is used too much or too less. To improve the model, we would like to delete those observation and recompute the line. Step 5: Checking the validity of the assumptions: We made the assumptions that the all the error terms are identically and independently normally distributed with mean 0 and common variance sigma –square. Residual Plots for Weight Normal Probability Plot of the Residuals Residuals Versus the Fitted Values 99 5.0 Residual Percent 90 50 10 1 0.0 -2.5 -5.0 -5.0 -2.5 0.0 Residual 2.5 5.0 0 Histogram of the Residuals 5.0 4.5 2.5 3.0 1.5 0.0 30 60 Fitted Value 90 120 Residuals Versus the Order of the Data 6.0 Residual Frequency 2.5 0.0 -2.5 -5.0 -6 -4 -2 0 2 Residual 4 6 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Observation Order Interpretation: 1. the graph on top left checks the assumption of normality of error terms. In this case we see that most of the points are clustered around blue line indication that the error terms are approximately normal. Thus our assumption of normality is valid. 2. The graph on top right plots the error terms against the fitted values. There are approximately half of them are above and half are below the zero line indicating that our assumption of error terms having mean zero is valid. 3. On the same graph we see the clear cyclic pattern among the error terms indicating that they are violating the assumption of independence of error. Error terms are not independent. May be there is another factor present in this example which we need to find out. 4. The bottom left graph again re-emphasizes the normality assumption. Though our sample size is just 15. 5. The bottom right graph is also important in this case because data is a time series and order of the data is important. A clear cyclic pattern indicates that error terms are dependent on the time variable. Step VI: Although the beta is significant and R sq adj is very high indicating that model is a very good fit to the data, there is violation of assumption of independence indicate that there is some other factor which is playing role behind the screen and we may have to study it further. Step VII: Let us estimate the value of y and interpret it Say for x = 14 we find and interval for the average value of y y-hat = 123 - 5.57 * 14 = 45.02 that is we expect that on the average the expected value of weight on the 14th day approx 45 grams. 98% confidence interval: 45.02 ± t * .8441 = 45.02± 2.326*.8441= (43.0565, 46.9635) We are 98% confidant that that on the 14 th day the weight of the soap on the average lies between 43 grams and 47 grams approx. 98% prediction interval: 45.02± 2.326* 3.1163 = (41.9036, 48.1363) We are 98% confidant that on the 14th day the predicted value of the weight of the soap lies between 42 grams and 48 grams approx. 140 Fitted Line Plot Weight = 123.1 - 5.575 Day 120 100 Regression 95% C I 95% PI S R-Sq R-Sq(adj) 80 2.94921 99.5% 99.5% 60 40 20 0 0 5 10 15 20 Day (optional)For those who want to improve upon the model Quadratic fitting: compare the s-value and Rsq adj value with last model. 25 140 Fitted Line Plot Weight = 127.3 - 6.744 Day + 0.05063 Day**2 120 Regression 95% CI 95% PI 100 S R-Sq R-Sq(adj) 60 40 20 0 0 5 10 15 20 25 Day Validation of assumptions in quadratic fitting: Residual Plots for Weight Normal Probability Plot of the Residuals Residuals Versus the Fitted Values 99 2 Residual Percent 90 50 10 1 -5.0 -2.5 0.0 Residual 2.5 0 -2 -4 5.0 Histogram of the Residuals 30 60 90 Fitted Value 120 2 Residual 3 2 1 0 0 Residuals Versus the Order of the Data 4 Frequency Weight 80 1.95599 99.8% 99.8% -4 -3 -2 -1 0 Residual 1 2 3 0 -2 -4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Observation Order