MINITAB EXAMPLE Multiple Regression 252regex2 9/21/99 Explanation: The data set has already been prepared and stored as famdat.mtw. Column 1 (C1) is labeled ‘Y’, C2 is labeled ‘X’, C3 is labeled ‘RESID’ and C4 is labeled ‘PRED’. Two additional independent variables were added, ‘W’ in C5 and ‘H’ in C6. C1, C2, C5 and C6 contain the data and C3 and C4 are blank. The data set is retrieved and printed out. Then the command for simple regression is repeated from the previous use of this data set. In the command, the words ‘on 1’ indicate that there is only one independent variable.(In later regressions we use ‘on 2’ and ‘on 3’ to indicate more independent variables.) The equation of the regression line ( Y b0 b1 X , where b0 0.833 and b1 0.667 ) is printed out. This is followed by a short table that repeats, in the ‘Coef’ column, the coefficients b0 and b1 . The quantities in the ‘Stdev’ column are the two standard deviations sb 0 and sb . In the ‘t-ratio’ column, are the two ratios 1 H10 : 0 0 H 20 : 1 0 b 0 b 0 and t 1 , which are used to test and . Finally, t 0 sb0 sb1 H 21 : 1 0 H11 : 0 0 in the ‘p’ column are the p-values for the two null hypotheses. If we assume that .05 , since the first p-value is 0.117, which is above the significance level, we do not reject the null hypothesis for b0 , and thus say that the intercept is not significant at the 5% significance level. Similarly, since the second p-value is below the significance level, we reject the null hypothesis for b1 and say that the slope is significant. ’s’ is the standard error se , which is used in computing t-tests and confidence intervals. ‘R-sq’ is, of course, R 2 . A graph of ‘PRED’(which is the same as Ŷ ) against Y is now shown. If we had a perfect prediction, this graph would be a 45 degree line. This graph is repeated after each regression to give us a visual idea of our progress, although better measures are R 2 and R 2 adjusted for degrees of freedom. (The symbol for R 2 adjusted for degrees of freedom is Rk2 n 1R 2 k , where k is the n k 1 number of independent variables.) In each regression, the t-ratio provides a test of the null hypothesis that the corresponding coefficient is not significantly different from zero. Note that in the second regression ( Y against X and W )all the p-values are less than 5%, indicating that the coefficients are all significant at the 5% significance level. However, in the final regression ( Y against X , W and H ), the p-values for W and H are above 5%, indicating that the coefficients are not significant at the 5% level. All of the analyses of variance show p-values below 5% indicating that the independent variables are generally related to the dependent variable. The last regression is not satisfactory because of the two insignificant coefficients and most researchers would prefer the second regression. Minitab Output: Worksheet size: 100000 cells MTB > RETR 'C:\MINITAB\FAMDAT.MTW'. Retrieving worksheet from file: C:\MINITAB\FAMDAT.MTW Worksheet was saved on 4/ 1/1998 MTB > print 'Y''X''W''H' Data Display Row Y X W H 1 2 3 4 5 6 7 8 9 10 0 2 1 3 1 3 4 2 1 2 0 1 2 1 0 3 4 2 2 1 1 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 1 0 MTB > regress 'Y' on 1 'X''resid''pred' Regression Analysis The regression equation is Y = 0.833 + 0.667 X Predictor Constant X Coef 0.8333 0.6667 s = 0.9014 Stdev 0.4751 0.2375 R-sq = 49.6% t-ratio 1.75 2.81 p 0.117 0.023 R-sq(adj) = 43.3% Analysis of Variance SOURCE Regression Error Total DF 1 8 9 SS 6.4000 6.5000 12.9000 MS 6.4000 0.8125 F 7.88 p 0.023 MTB > plot 'pred'*'Y' PRED 3 2 1 0 1 2 Y 3 4 MTB > regress 'Y' on 2 'X''W''resid''pred' Regression Analysis The regression equation is Y = 1.45 + 0.628 X - 1.40 W Predictor Constant X W Coef 1.4535 0.6279 -1.3953 s = 0.5139 Stdev 0.3086 0.1357 0.3325 R-sq = 85.7% t-ratio 4.71 4.63 -4.20 p 0.000 0.000 0.004 R-sq(adj) = 81.6% Analysis of Variance SOURCE Regression Error Total DF 2 7 9 SS 11.0512 1.8488 12.9000 SOURCE X W DF 1 1 SEQ SS 6.4000 4.6512 MS 5.5256 0.2641 F 20.92 p 0.001 MTB > plot 'pred'*'Y' 4 PRED 3 2 1 0 0 1 2 Y MTB > regress 'Y' on 3 'X''W''H''resid''pred' Regression Analysis 3 4 The regression equation is Y = 1.51 + 0.595 X - 0.698 W - 0.937 H Predictor Constant X W H Coef 1.5079 0.5952 -0.6984 -0.9365 s = 0.4484 Stdev 0.2709 0.1198 0.4860 0.5239 R-sq = 90.6% t-ratio 5.57 4.97 -1.44 -1.79 p 0.001 0.003 0.201 0.124 R-sq(adj) = 86.0% Analysis of Variance SOURCE Regression Error Total DF 3 6 9 SS 11.6937 1.2063 12.9000 SOURCE X W H DF 1 1 1 SEQ SS 6.4000 4.6512 0.6425 MS 3.8979 0.2011 Unusual Observations Obs. X Y 4 1.00 3.000 8 2.00 2.000 Fit 2.103 2.000 F 19.39 Stdev.Fit 0.200 0.448 p 0.002 Residual 0.897 0.000 R denotes an obs. with a large st. resid. X denotes an obs. whose X value gives it large influence. MTB > plot 'pred'*'Y' 4 PRED 3 2 1 0 0 1 2 Y MTB > Stop. 3 4 St.Resid 2.23R * X