Solutions to Midterm (STA 4234, October 7, 2013)

The outputs are provided by Minitab.

1.

Regression Analysis: purity versus hydro

The regression equation is
purity = 77.9 + 11.8 hydro

Predictor     Coef  SE Coef      T      P
Constant    77.863    4.199  18.54  0.000
hydro       11.801    3.485   3.39  0.003

S = 3.59656   R-Sq = 38.9%   R-Sq(adj) = 35.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  148.31  148.31  11.47  0.003
Residual Error  18  232.83   12.94
Total           19  381.15

Unusual Observations

Obs  hydro  purity     Fit  SE Fit  Residual  St Resid
 18   0.99  96.850  89.546   1.047     7.304     2.12R

R denotes an observation with a large standardized residual.

Predicted Values for New Observations

New Obs     Fit  SE Fit            95% CI            95% PI
      1  89.664   1.025  (87.510, 91.818)  (81.807, 97.521)

Values of Predictors for New Observations

New Obs  hydro
      1   1.00

(a) The regression equation is purity = 77.9 + 11.8 hydro.

(b) From the analysis of variance table, F = 11.47 with p = 0.003. The null hypothesis $H_0: \beta_1 = 0$ is rejected, and we conclude that there is a linear relationship between purity and the percentage of hydrocarbons.

(c) R-Sq = 38.9%, so about 38.9% of the variability in purity is explained by the hydrocarbon percentage.

(d) A 95% confidence interval on the slope parameter $\beta_1$ is $11.801 \pm t_{0.025,18}(3.485) = 11.801 \pm 2.101(3.485) = (4.48, 19.12)$.

(e) A 95% confidence interval on the mean purity when the hydrocarbon percentage is 1.00 is (87.510, 91.818).

(f) A 95% prediction interval on oxygen purity when the hydrocarbon percentage is 1.00 is (81.807, 97.521).

2.

(a) Since the estimated slope is positive, the correlation is $r = +\sqrt{R^2} = \sqrt{0.389} = 0.624$.

(b) With n = 20, the statistic for testing $H_0: \rho = 0$, namely $t_0 = r\sqrt{n-2}/\sqrt{1-r^2}$, is the same as the t statistic for testing $H_0: \beta_1 = 0$: t = 3.39 with p = 0.003. We reject $H_0: \rho = 0$ and conclude that the correlation is different from zero.

(c) Using Fisher's z-transformation with $n - 3 = 17$, a 95% confidence interval for $\rho$ is
$\left(\tanh\left[\operatorname{arctanh}(0.624) - \frac{1.96}{\sqrt{17}}\right],\ \tanh\left[\operatorname{arctanh}(0.624) + \frac{1.96}{\sqrt{17}}\right]\right) = (0.251, 0.836)$.

(d) For testing $H_0: \rho = 0.6$, the test statistic is $Z_0 = \left(\operatorname{arctanh}(0.624) - \operatorname{arctanh}(0.6)\right)\sqrt{17} = 0.15825$. Since the rejection region is $|Z_0| > Z_{\alpha/2} = 1.96$, we fail to reject $H_0$.

3.
Regression Analysis: y versus x1, x2, x3, x4, x5

The regression equation is
y = 52.1 + 0.0556 x1 + 0.282 x2 + 0.125 x3 - 0.000 x4 - 16.1 x5

Predictor     Coef  SE Coef       T      P
Constant     52.08    18.89    2.76  0.020
x1         0.05556  0.02987    1.86  0.093
x2         0.28214  0.05761    4.90  0.001
x3          0.1250   0.4033    0.31  0.763
x4         -0.0000   0.2016   -0.00  1.000
x5         -16.065    1.456  -11.03  0.000

S = 8.06536   R-Sq = 93.7%   R-Sq(adj) = 90.6%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       5   9712.5  1942.5  29.86  0.000
Residual Error  10    650.5    65.0
Total           15  10363.0

Source  DF  Seq SS
x1       1   225.0
x2       1  1560.2
x3       1     6.2
x4       1     0.0
x5       1  7921.0

Regression Analysis: y versus x2, x5

The regression equation is
y = 80.1 + 0.282 x2 - 16.1 x5

Predictor     Coef  SE Coef       T      P
Constant    80.135    5.691   14.08  0.000
x2         0.28214  0.05883    4.80  0.000
x5         -16.065    1.487  -10.81  0.000

S = 8.23571   R-Sq = 91.5%   R-Sq(adj) = 90.2%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   9481.3  4740.6  69.89  0.000
Residual Error  13    881.7    67.8
Total           15  10363.0

Source  DF  Seq SS
x2       1  1560.2
x5       1  7921.0

Unusual Observations

Obs    x2      y    Fit  SE Fit  Residual  St Resid
  7  95.0  71.00  86.38    3.57    -15.38    -2.07R

R denotes an observation with a large standardized residual.

(a) The desired multiple regression model is y = 52.1 + 0.0556 x1 + 0.282 x2 + 0.125 x3 - 0.000 x4 - 16.1 x5.

(b) From the analysis of variance table for this model, F = 29.86 with p = 0.000. We therefore reject $H_0: \beta_1 = \beta_2 = \cdots = \beta_5 = 0$ and conclude that the regression is significant.

(c) Only x2 and x5 appear to contribute to the model (t = 4.90 with p = 0.001, and t = -11.03 with p = 0.000). For the other regressors, x1, x3 and x4, the t values are small and the p values large (0.093, 0.763 and 1.000), so they are not significant at the 0.05 level.

(d) For the model in part (a), R-Sq = 93.7% and R-Sq(adj) = 90.6%.
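For the model with only temperature (x2) and particle size (x5), R-Sq = 91.5% and R-Sq(adj) = 90.2%. Dropping x1, x3 and x4 lowers R-Sq only slightly (R-Sq can never increase when regressors are removed), and the adjusted values are basically the same, so the two-regressor model fits about as well as the full model.

(e) For the model in part (a), with $t_{0.025,10} = 2.228$, a 95% confidence interval for the regression coefficient for temperature is $(0.282 - 2.228(0.05761),\ 0.282 + 2.228(0.05761)) = (0.154, 0.410)$. For the model with only temperature and particle size, with $t_{0.025,13} = 2.160$, a 95% confidence interval is $(0.282 - 2.160(0.05883),\ 0.282 + 2.160(0.05883)) = (0.155, 0.409)$. These two intervals are almost the same.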
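As a check on the hand calculations in Problems 1 through 3, the following Python sketch recomputes the confidence intervals, the Fisher z-transformation results and the R-Sq values from the Minitab summaries. It is an illustrative sketch, not part of the original solutions: the helper function names are hypothetical, and the tabled critical values used in the solutions (2.101, 2.228, 2.160 and 1.96) are hard-coded so that only the Python standard library is needed.

```python
import math

# Hypothetical helper names; the critical values passed in below are the
# tabled values used in the solutions, not computed quantiles.


def t_interval(estimate, se, t_crit):
    """Two-sided interval: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return (estimate - half_width, estimate + half_width)


def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate CI for rho: tanh(arctanh(r) -/+ z_crit / sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return (math.tanh(z - half_width), math.tanh(z + half_width))


def fisher_z_statistic(r, rho0, n):
    """Z0 = (arctanh(r) - arctanh(rho0)) * sqrt(n - 3) for H0: rho = rho0."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


def correlation_t(r, n):
    """t0 = r * sqrt(n - 2) / sqrt(1 - r^2) for H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """R-Sq(adj) = 1 - MSE / (SST / (n - 1))."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


# Problem 1(d): slope interval with t_{0.025,18} = 2.101
slope_ci = t_interval(11.801, 3.485, 2.101)

# Problem 2: n = 20 observations (total DF = 19)
r = math.sqrt(148.31 / 381.15)  # positive root because the slope is positive
t0 = correlation_t(r, 20)       # should agree with the slope t statistic
rho_ci = fisher_z_interval(0.624, 20)
z0 = fisher_z_statistic(0.624, 0.6, 20)

# Problem 3(d): R-Sq and R-Sq(adj) from the two ANOVA tables
full_r2 = 9712.5 / 10363.0
full_adj = adjusted_r_squared(650.5, 10, 10363.0, 15)
reduced_r2 = 9481.3 / 10363.0
reduced_adj = adjusted_r_squared(881.7, 13, 10363.0, 15)

# Problem 3(e): intervals for the x2 (temperature) coefficient
x2_ci_full = t_interval(0.28214, 0.05761, 2.228)     # t_{0.025,10}
x2_ci_reduced = t_interval(0.28214, 0.05883, 2.160)  # t_{0.025,13}

print("1(d) slope CI: (%.2f, %.2f)" % slope_ci)
print("2(a) r = %.3f, 2(b) t0 = %.2f" % (r, t0))
print("2(c) rho CI: (%.3f, %.3f)" % rho_ci)
print("2(d) Z0 = %.5f" % z0)
print("3(d) full R-Sq = %.1f%%, adj = %.1f%%" % (100 * full_r2, 100 * full_adj))
print("3(d) reduced R-Sq = %.1f%%, adj = %.1f%%" % (100 * reduced_r2, 100 * reduced_adj))
print("3(e) full-model CI: (%.3f, %.3f)" % x2_ci_full)
print("3(e) reduced-model CI: (%.3f, %.3f)" % x2_ci_reduced)
```

Running the script prints one line per part; for example, the 2(c) line gives the Fisher z interval (0.251, 0.836).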