STA 6207 – Exam 3 – Fall 2013

PRINT Name _____________________

Unless stated otherwise, all Questions are based on the following 2 regression models, where SIMPLE REGRESSION refers to the case where p=1, and X is of full column rank (no linear dependencies among predictor variables).

Model 1: $Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1,\ldots,n, \quad \varepsilon_i \sim NID(0, \sigma^2)$

Model 2: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \quad \mathbf{X}: n \times p', \quad \boldsymbol{\beta}: p' \times 1, \quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$

_______________________________________________________________________________________

CONDUCT ALL TESTS AT $\alpha = 0.05$ SIGNIFICANCE LEVEL.

[Plot 1 – Q.1.: residuals versus fitted values]

[Plot 2 – Q.3.: residuals versus fitted values]

Q.1. A regression model is fit, relating time to complete a task (Y, in minutes) to nationality of the team ($X_1 = 1$ if US, 0 if non-US) and complexity ($X_2$, on a TACOM scale) for nuclear power plant operators. The model fit is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$

ANOVA
              df    SS         MS         F     Significance F
Regression     3    3234355    1078118    79    0.0000
Residual      66     903715      13693
Total         69    4138069

INV(X'X)
 0.7998   -0.7998   -0.1410    0.1410
-0.7998    1.5996    0.1410   -0.2820
-0.1410    0.1410    0.0258   -0.0258
 0.1410   -0.2820   -0.0258    0.0515

              Coefficients   Standard Error   t Stat   P-value
Intercept     -406.1         104.7            -3.88    0.0002
Nation        -386.0         148.0            -2.61    0.0112
Complexity     117.2          18.8             6.24    0.0000
N*C            108.0          26.6             4.06    0.0001

p.1.a. Test whether the slopes (with respect to complexity scores) are equivalent for US and non-US power plants.

H0: ____________________ HA: __________________ Test Stat: _____________________ P-Value: _________________

p.1.b. Give the estimated mean time to complete a task with complexity of $X_2 = 5$ for US and non-US plants.

US: ______________________________________ non-US: ______________________________________

p.1.c. Compute a 95% Confidence Interval for the difference of the means estimated in p.1.b.

p.1.d. A plot of residuals versus fitted values (see Plot 1) clearly displays non-constant error variance.
We want to fit a power model, relating the variance to the mean:

$$\sigma_i^2 \propto \mu_i^{\theta} \quad \Rightarrow \quad \ln \sigma_i^2 = \alpha^* + \theta \ln \mu_i$$

Regressing $\ln e_i^2$ on $\ln \hat{Y}_i$ gives $\hat{\theta} = 0.75$.

Based on Bartlett's method, suggest a power transformation to approximately stabilize the error variance.

$$\sigma^2 = \Omega(\mu) \quad \Rightarrow \quad f(\mu) = \int \frac{1}{[\Omega(\mu)]^{1/2}} \, d\mu$$

Q.2. A study was conducted to compare growth rates of Thai and Commercial Tilapia. The authors fit separate models for the two strains, with samples of 20 fish of each type selected at 5 time points and weighed (then sacrificed). Thus 100 fish of each strain were measured. The model fit by unweighted nonlinear least squares was (where Y=Weight, and X=Age, in days):

$$Y_i = \mu_i + \varepsilon_i = \beta_0 e^{\beta_1 X_i} + \varepsilon_i$$

p.2.a. How would you interpret $\beta_0$ and $e^{\beta_1}$?

p.2.b. Obtain $\dfrac{\partial \mu_i}{\partial \beta_0}$ and $\dfrac{\partial \mu_i}{\partial \beta_1}$.

p.2.c. The estimates and standard errors for $\hat{\beta}_0$ and $\hat{\beta}_1$ are given in the following table for each strain. Obtain 95% CIs for all model parameters.

Strain        Parameter   Beta0      Beta1
Thai          Estimate    30.5771    0.0132
Thai          StdError     3.2614    0.0008
Commercial    Estimate    21.6209    0.0168
Commercial    StdError     2.6555    0.0010

$\beta_{0T}$: ________________ $\beta_{1T}$: ________________ $\beta_{0C}$: ________________ $\beta_{1C}$: ________________

p.2.d. As a crude test of equality of Strains, based on the confidence intervals, do you reject the null hypothesis $H_0: \beta_{0T} = \beta_{0C}, \ \beta_{1T} = \beta_{1C}$? Note that their estimates are independent as models were fit separately.

Yes / No

p.2.e. At what age (if any) do the two strains' curves cross?

Q.3. A study related the spread in shotgun pellets (Y, sqrt(AREA)) to distance shot (X, in feet) for a particular shotgun and bullet type (different from the in-class example).

p.3.a. Based on the residuals versus fitted plot (Plot 2), which assumption(s), if any, is/are violated?

i) Linearity ii) Equal Variances iii) Independence

p.3.b.
We wish to fit an Estimated Weighted Least Squares Regression of the following model:

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i, \qquad x_i = X_i - \bar{X}$$

with weights $w_i = 1/s_j^2$, where $s_j$ is the SD of Y for the distance group of shot i.

The data are given below; give the form of W used in estimating $\hat{\boldsymbol{\beta}}_W = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1} \mathbf{X}'\mathbf{W}\mathbf{Y}$.

X            10       20       30       40       50
x           -20      -10        0       10       20
x^2         400      100        0      100      400
y1          3.01     5.57     8.09    10.81    16.07
y2          3.02     5.00     6.80    10.19    14.90
y3          3.29     5.42     7.95    13.01    17.47
y4          3.00     5.73     8.62    11.17    14.21
y5          3.20     5.29     8.41    11.33    13.13
y6          3.11     5.10     8.62     9.35    11.93
SD(grp)     0.119    0.278    0.685    1.232    1.996

      _____ I_6    _____ I_6    _____ I_6    _____ I_6    _____ I_6
      _____ I_6    _____ I_6    _____ I_6    _____ I_6    _____ I_6
W =   _____ I_6    _____ I_6    _____ I_6    _____ I_6    _____ I_6
      _____ I_6    _____ I_6    _____ I_6    _____ I_6    _____ I_6
      _____ I_6    _____ I_6    _____ I_6    _____ I_6    _____ I_6

p.3.c. From the results below, complete the following table. (SSE* based on transformed residuals)

X'WX
    519.5807125      -9180.684522      178240.8398
  -9180.684522      178240.8398      -3451225.692
 178240.8398      -3451225.692      68848673.11

INV(X'WX)
 0.02146079    0.00100725   -0.00000507
 0.00100725    0.00023816    0.00000933
-0.00000507    0.00000933    0.00000050

X'WY
   1899.813
 -29592.3
 580927.4

SSE* = 25.09147082        s^2 = ________

Beta_W       SE(B_W)
8.020371     ________
0.286385     ________
0.00203      ________

Q.4. For the simple regression model (scalar form):

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1,\ldots,n, \quad \varepsilon_i \sim NID(0, \sigma^2)$$

we get:

$$\hat{\beta}_1 = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{S_{XX}} Y_i, \qquad \bar{Y} = \sum_{i=1}^{n} \frac{1}{n} Y_i, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

p.4.a. Derive $E(\hat{\beta}_1)$, $E(\bar{Y})$, $E(\hat{\beta}_0)$.

p.4.b. Derive $V(\hat{\beta}_1)$, $V(\bar{Y})$, $Cov(\hat{\beta}_1, \bar{Y})$, $V(\hat{\beta}_0)$, $Cov(\hat{\beta}_1, \hat{\beta}_0)$.

Q.5. A simple linear regression model is fit, based on n=5 individuals. The data and the projection matrix are given below:

X           Y      P
1    2       8      0.344    0.303   -0.129    0.262    0.221
1    4      12      0.303    0.274   -0.035    0.244    0.215
1   25      24     -0.129   -0.035    0.953    0.059    0.153
1    6      14      0.262    0.244    0.059    0.226    0.209
1    8      18      0.221    0.215    0.153    0.209    0.203

p.5.a. Give the leverage values for each observation. Do any exceed twice the average of the leverage values?

Obs 1 ______________ Obs 2 ______________ Obs 3 ______________ Obs 4 ______________ Obs 5 ______________

p.5.b. Give $\hat{\boldsymbol{\beta}}$, based on the following results:

X'X             X'Y      INV(X'X)
 5     45        76       0.438   -0.026
45    745       892      -0.026    0.003

p.5.c. Compute SSE and Se (Note: Y'Y = 1304, Y'PY = 1282.45)

p.5.d. The following table contains the fitted values with and without each observation, the residual standard deviation when that observation was not included in the regression, and the regression coefficients when that observation was not included in the regression.

Obs             1        2        3        4        5
Y-hat         10.92    12.14    24.99    13.36    14.59
Y-hat(-i)     12.45    12.19    45.00     9.46    13.72
S_i            2.07     3.28     0.63     3.24     1.86
beta0_i       11.41     9.76     5.00     9.46     8.72
beta1_i        0.52     0.61     1.60     0.62     0.62

p.5.d.i. Compute DFFITS for the fifth observation

p.5.d.ii. Compute DFBETAS0 for the first observation

p.5.d.iii. Compute DFBETAS1 for the third observation
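
The quantities requested in Q.5 can be cross-checked numerically. The following is a minimal Python sketch, assuming NumPy is available; it is not part of the exam, and the helper name `deletion_diagnostics` is hypothetical. It assumes the usual definitions DFFITS_i = (Y-hat_i - Y-hat_i(-i)) / (S_i sqrt(h_ii)) and DFBETAS_j(i) = (beta_j - beta_j(-i)) / (S_i sqrt(c_jj)), where h_ii is the i-th diagonal element of P and c_jj is the j-th diagonal element of INV(X'X). Because it works from the raw data rather than the rounded table entries, results may differ slightly from hand calculations.

```python
import numpy as np

# Data from Q.5 (n = 5): intercept column plus one predictor.
X = np.array([[1, 2], [1, 4], [1, 25], [1, 6], [1, 8]], dtype=float)
Y = np.array([8, 12, 24, 14, 18], dtype=float)
n, p_prime = X.shape

# Full-data fit: INV(X'X), coefficient estimates, projection matrix P.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
P = X @ XtX_inv @ X.T

# Leverage values are the diagonal of P; flag those above twice the average.
leverage = np.diag(P)
high_leverage = leverage > 2 * leverage.mean()

# SSE = Y'Y - Y'PY, and the residual standard deviation Se.
sse = float(Y @ Y - Y @ P @ Y)
s_e = np.sqrt(sse / (n - p_prime))


def deletion_diagnostics(i):
    """Return (DFFITS_i, DFBETAS vector) for 0-based observation index i.

    Hypothetical helper: refits the model without observation i.
    """
    keep = np.arange(n) != i
    Xi, Yi = X[keep], Y[keep]
    beta_i = np.linalg.solve(Xi.T @ Xi, Xi.T @ Yi)
    res_i = Yi - Xi @ beta_i
    s_i = np.sqrt(res_i @ res_i / (len(Yi) - p_prime))
    dffits = (X[i] @ beta_hat - X[i] @ beta_i) / (s_i * np.sqrt(leverage[i]))
    dfbetas = (beta_hat - beta_i) / (s_i * np.sqrt(np.diag(XtX_inv)))
    return dffits, dfbetas


print("beta_hat:", beta_hat)
print("leverage:", leverage, "exceeds 2x average:", high_leverage)
print("SSE:", sse, "Se:", s_e)
print("DFFITS, obs 5:", deletion_diagnostics(4)[0])
print("DFBETAS0, obs 1:", deletion_diagnostics(0)[1][0])
print("DFBETAS1, obs 3:", deletion_diagnostics(2)[1][1])
```

Refitting without each observation is the most direct way to get the deleted-case quantities for a data set this small; the tabled Y-hat(-i), S_i, beta0_i and beta1_i values play the same role in a hand calculation.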