STATISTICS 401B, Fall 2014
Laboratory Assignment 8
Due Thursday, December 04, 2014

1. Experience with a certain type of plastic indicates that a relation exists between the hardness (measured in Brinell units) of items molded from the plastic ($y$) and the elapsed time since termination of the molding process ($x$). In a study to examine this relationship, sixteen batches of the plastic were made, and from each batch one test item was molded. Each test item was randomly assigned to one of four predetermined time levels, and the hardness was measured after the assigned time had elapsed. The results are shown below:

   x, Elapsed Time (hours)    y, Hardness
   16                         199, 205, 196, 200
   24                         218, 220, 215, 223
   32                         237, 234, 235, 230
   40                         250, 248, 253, 245

Answer the following questions. For parts a) to d), do all computations by hand using these quantities computed from the data: $\sum y = 3608$, $\sum y^2 = 819{,}008$, $\sum x = 448$, $\sum x^2 = 13{,}824$, $\sum xy = 103{,}616$.

a) Plot the data in a $y$ vs. $x$ scatter plot. Does it appear that a simple linear regression model would be a good fit?

b) Use the LS method to fit a simple linear regression model. What is your prediction equation?

c) Construct an analysis of variance table for the regression. Use the F-ratio to perform a test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ using $\alpha = .05$.

d) Compute a lack of fit test for this model and report the results in an ANOVA table where SSLack and SSPexp are shown as a partition of SSE, as demonstrated in the class example. What is your conclusion from this test? Use $\alpha = .05$.

For the rest of the problem you may use a JMP analysis using the data table plastic.jmp. Attach the output to your solution.

e) Use JMP to obtain the answers to parts a) to d).

f) Compute the predicted values and residuals. Obtain plots of the residuals against elapsed time and against the predicted values. Do these two plots suggest any inadequacies of this model? Explain why you reached your conclusion.

2. The data in the table at the end of this assignment are from a study of the efficiency of a plant that produces nitric acid by the oxidation of ammonia. Data were obtained from 21 independent days of operation of the plant. The variables are:

   $y$: the response, equal to 10 times the percentage of the ammonia entering the plant that escapes from the absorption column unabsorbed. This is an inverse measure of the overall efficiency of the plant.
   $x_1$: air flow, which represents the rate of operation of the plant; the nitric oxides produced are absorbed in a countercurrent absorption tower.
   $x_2$: the cooling water inlet temperature. This is the temperature of the cooling water that circulates through the coils in the absorption tower.
   $x_3$: the acid concentration, recorded as $10 \times (\text{concentration of the circulating acid} - 5)$.

Use JMP to perform a multiple regression analysis of these data using the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$.

Part I

The following are the $X'X$ matrix, the $X'y$ vector, $y'y$, and the inverse of the $X'X$ matrix, respectively, computed for these data:

\[
X'X = \begin{pmatrix}
21 & 1269 & 443 & 1812 \\
1269 & 78365 & 27223 & 109988 \\
443 & 27223 & 9545 & 38357 \\
1812 & 109988 & 38357 & 156924
\end{pmatrix},
\qquad
X'y = \begin{pmatrix} 368 \\ 23953 \\ 8326 \\ 32189 \end{pmatrix},
\qquad
y'y = 8518
\]

\[
(X'X)^{-1} = \begin{pmatrix}
13.452727 & 0.0273387 & -0.0619611 & -0.159355 \\
0.0273387 & 0.00172887 & -0.00347079 & -0.00067908 \\
-0.0619611 & -0.00347079 & 0.0128754 & 9.95952 \times 10^{-7} \\
-0.159355 & -0.00067908 & 9.95952 \times 10^{-7} & 0.00232217
\end{pmatrix}
\]

Use these to perform the following calculations using matrix algebra. You may recalculate $(X'X)^{-1}$ if you want more accuracy in your answers. The files X.txt, XPX+XPy.txt and XPXI.txt are available at the website. Show work.

(a) Construct the normal equations.

(b) Calculate the vector $\hat{\beta}$ using $\hat{\beta} = (X'X)^{-1} X'y$.

(c) Calculate $s^2$ using $SSE = y'y - \hat{\beta}' X'y$.

(d) Calculate the standard errors $s_{\hat{\beta}_1}$, $s_{\hat{\beta}_2}$, and $s_{\hat{\beta}_3}$ of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$, respectively.

(e) Calculate the predicted values $\hat{y}$ using the fact that $\hat{y} = X\hat{\beta}$.

Part II

Write your answers to the following questions on separate pages, extracting numbers from a JMP analysis of the data using the data table ammonia.jmp. No hand calculations are needed for this part.

(a) Report $\hat{\beta}_0$, $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$.

(b) Report $s^2$, $s_{\hat{\beta}_0}$, $s_{\hat{\beta}_1}$, $s_{\hat{\beta}_2}$, and $s_{\hat{\beta}_3}$.

(c) Report 95% confidence intervals for $\beta_1$, $\beta_2$, and $\beta_3$, respectively.

(d) Construct an analysis of variance table for the above regression. Report the coefficient of determination.

(e) Use the F-test statistic for testing $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ vs. $H_a$: at least one $\beta$ is not zero, and report the p-value for the test. State your decision.

(f) Use the t-test statistic for testing $H_0: \beta_3 = 0$ vs. $H_a: \beta_3 \neq 0$, and report the p-value for the test. State your decision.

(g) Obtain a 95% confidence interval for $\beta_3$. Use this interval to test $H_0: \beta_3 = 0$ vs. $H_a: \beta_3 \neq 0$. What is the $\alpha$ level of this test?

(h) Obtain a 95% confidence interval for the mean percentage of converted ammonia for the population of plants with $x_1 = 65$, $x_2 = 21$, $x_3 = 87$. Describe in words what this interval tells you.

(i) Obtain a 95% prediction interval for the percentage of converted ammonia $y$ of a plant with $x_1 = 65$, $x_2 = 21$, $x_3 = 87$. Describe in words what this interval tells you.

(j) Obtain plots of the residuals vs. the predicted values, $x_1$, $x_2$, and $x_3$, respectively. Is any pattern of the types discussed in class observed in these plots? Give your interpretation.

(k) Obtain a normal probability plot of the studentized residuals. State the model assumption that you can examine using this plot. Does this assumption appear to be plausible here?

(l) Fit the regression model $y = \beta_0 + \beta_1 x_1 + \epsilon$ to the above data. Use values from the resulting output and the previous output to construct an F-statistic to test $H_0: \beta_2 = \beta_3 = 0$ vs. $H_a$: $\beta_2$ and/or $\beta_3 \neq 0$. Perform the test at $\alpha = .05$.

(m) Compare the fit of the model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$ with that of the full model and discuss reasons why the smaller model is comparable to the full model.

Data for Problem 2:

   x1    x2    x3     y
   80    27    89    42
   80    27    88    37
   75    25    90    37
   62    24    87    28
   62    22    87    18
   62    23    87    18
   62    24    93    19
   62    24    93    20
   58    23    87    15
   58    18    80    14
   58    18    89    14
   58    17    88    13
   58    18    82    11
   58    19    93    12
   50    18    89     8
   50    18    86     7
   50    19    72     8
   50    19    79     8
   50    20    80     9
   56    20    82    15
   70    20    91    15
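As a sanity check on the Part I quantities of Problem 2, the following Python sketch rebuilds $X'X$, $X'y$, and $y'y$ directly from the 21 observations listed above. It is an illustration only and not part of the assignment: the names `x1`, `x2`, `x3`, `y`, `X`, `xtx`, `xty`, `yty` and the two helper functions are assumptions introduced here, and plain Python lists stand in for JMP or the files X.txt and XPX+XPy.txt.

```python
# Illustrative sketch only: recompute X'X, X'y and y'y for Problem 2
# from the raw data. All names below are hypothetical, not from the assignment.

x1 = [80, 80, 75, 62, 62, 62, 62, 62, 58, 58, 58, 58, 58, 58,
      50, 50, 50, 50, 50, 56, 70]
x2 = [27, 27, 25, 24, 22, 23, 24, 24, 23, 18, 18, 17, 18, 19,
      18, 18, 19, 19, 20, 20, 20]
x3 = [89, 88, 90, 87, 87, 87, 93, 93, 87, 80, 89, 88, 82, 93,
      89, 86, 72, 79, 80, 82, 91]
y = [42, 37, 37, 28, 18, 18, 19, 20, 15, 14, 14, 13, 11, 12,
     8, 7, 8, 8, 9, 15, 15]

# Design matrix: a leading column of ones for the intercept, then x1, x2, x3.
X = [[1, a, b, c] for a, b, c in zip(x1, x2, x3)]


def cross_product_matrix(design):
    """Return X'X: entry (i, j) is the sum over rows of column i times column j."""
    p = len(design[0])
    return [[sum(row[i] * row[j] for row in design) for j in range(p)]
            for i in range(p)]


def cross_product_vector(design, response):
    """Return X'y: entry i is the sum over rows of column i times the response."""
    p = len(design[0])
    return [sum(row[i] * r for row, r in zip(design, response))
            for i in range(p)]


xtx = cross_product_matrix(X)
xty = cross_product_vector(X, y)
yty = sum(v * v for v in y)

for row in xtx:
    print(row)
print(xty)   # [368, 23953, 8326, 32189]
print(yty)   # 8518
```

The printed values agree with the $X'X$, $X'y$, and $y'y$ given in Part I, so the matrix calculations in parts (a) to (e) can start from either the supplied quantities or the raw data.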