Stat 328 Final Exam (Regression)
Summer 2002
Professor Vardeman

This exam concerns the analysis of 1990 salary data for n = 30 offensive backs in the NFL. (This is part of the larger data set that serves as the basis of your Lab #6.) Attached to this exam are a number of JMP reports for these data. Use them in answering the questions on this exam.

As on Lab #6, the variables available for modeling were:

salary          1990 season salary
draft           round in the player draft in which the player was selected
yrs_exp         years of NFL experience for the player
played          the number of regular season games played in 1989
started         the number of regular season games started in 1989
citypop         the population of the city in which the player's team is located

Vardeman used these variables and created several more:

log10salary     the base 10 logarithm of salary
1/draft         the reciprocal of draft
percentstarted  the ratio started/played

To begin, consider the problem of modeling a salary variable in terms of only a draft position variable.

a) Consider the plots of salary vs draft and log10salary vs draft. What about these suggests that (as far as using standard statistical methodology is concerned) log10salary is probably a better "y" than salary?

Ultimately, Vardeman decided to use 1/draft instead of draft in an SLR analysis for log10salary. So until further notice, consider an SLR analysis using the model

$\mathrm{log10salary} = \beta_0 + \beta_1 (1/\mathrm{draft}) + \epsilon$

b) What fraction of the raw variability in log10salary is accounted for using 1/draft as a predictor variable?

c) Give and interpret a p-value for testing $H_0: \beta_1 = 0$. Say exactly where you found this on the printout.

p-value:
where:
interpretation:

d) Notice that if draft is large, 1/draft is near 0. So $\beta_0$ might be interpreted as a mean log10salary for a high draft number (or perhaps even undrafted) offensive back. Give 95% confidence limits for this.
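The arithmetic behind parts b), d), and e) uses only numbers printed on the attached "Bivariate Fit of Log10Salary By 1/Draft" report. A minimal Python sketch follows, assuming the 0.975 quantile of t with 28 df is taken from a standard t table as 2.048 and hard-coded so that only the standard library is needed; the helper name `t_interval` is hypothetical.

```python
# Sketch: SLR arithmetic for log10salary = b0 + b1*(1/draft), using values
# copied from the JMP "Bivariate Fit of Log10Salary By 1/Draft" report.

# ANOVA sums of squares from the SLR printout
ss_model = 0.7458282
ss_total = 2.7999814

# Parameter estimates and standard errors from the SLR printout
b0, se_b0 = 5.3593619, 0.078818
b1, se_b1 = 0.4602566, 0.14435

# 0.975 quantile of t with n - 2 = 28 df, from a t table (assumed value,
# hard-coded so no statistics library is required)
T_28 = 2.048


def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# b) fraction of raw variability accounted for: R^2 = SS(Model) / SS(C. Total)
r_squared = ss_model / ss_total

# d) 95% limits for beta_0
beta0_limits = t_interval(b0, se_b0, T_28)

# e) mean at 1/draft = 1 minus mean at 1/draft = 0.5 is 0.5 * beta_1,
#    whose standard error is 0.5 * se(b1)
diff_limits = t_interval(0.5 * b1, 0.5 * se_b1, T_28)

print(f"R^2 = {r_squared:.4f}")
print(f"beta_0 limits: {beta0_limits[0]:.4f} to {beta0_limits[1]:.4f}")
print(f"difference limits: {diff_limits[0]:.4f} to {diff_limits[1]:.4f}")
```

The computed $R^2$ should agree with the RSquare line of the same report, which is a quick check that the right sums of squares were used.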
e) What does the SLR model used in parts b) through d) give as the difference in mean log10salary values for 1st and 2nd round draft picks? (Note that these are the cases 1/draft = 1 and 1/draft = .5.) Give 95% confidence limits for this difference in means.

f) A particular offensive back not included in this data set is a former first round draft pick and was offered a $315,000 contract for 1990. (The base 10 logarithm of 315,000 is about 5.5.) On the basis of draft position alone, did this person have a good case that the offer was too low? Explain carefully.

Now consider MLR analyses of log10salary. Notice that printouts are available for two different multiple linear regressions. The first is a regression on draft, yrs_exp, played, started, citypop, 1/draft, and percentstarted. The second is a regression on only yrs_exp, 1/draft, and percentstarted.

g) Give 95% confidence limits for the standard deviation of log10salary when all of draft, yrs_exp, played, started, citypop, 1/draft, and percentstarted are held fixed.

h) What on the MLR printouts suggests that it may be feasible to model log10salary using fewer than 7 predictors?

i) There is a decrease in $R^2$ if one moves from the 7 variable regression to the 3 variable regression. Give an appropriate F value, degrees of freedom, and approximate p-value to attach to the decrease.

F:
d.f.: ____ , ____
p-value:

Henceforth consider the 3 variable regression. Besides the raw data, the JMP data table at the end of the printout has summaries of that fit. Notice that although n = 30 cases were used in the fitting, the table includes some values for an additional (31st) case.

j) According to this model, what increase in mean log10salary accompanies a 1 year increase in NFL experience, if draft position and percentage of games started are held fixed? Give 95% confidence limits.

k) Dropping which of the 3 predictors would cause the biggest decrease in $R^2$? How do you know?

variable:
reasoning:

l) Player 30 has a large “hat” value.
What about his values of yrs_exp, 1/draft, and percentstarted makes this qualitatively plausible/expected?

m) Considering both "x" and "y" variables, which player among the 30 in the data set was the "most influential" in terms of fitting the 3 variable model? Explain.

n) Make 95% prediction limits for the log10salary of player 31 (based on the 3 variable model!).

o) Player 31’s actual salary was $75,000. Does your answer to n) provide solid statistical evidence that his salary (for unknown reasons) was atypical? Explain.

[Plot: Bivariate Fit of SALARY By DRAFT (SALARY vs DRAFT)]

[Plot: Bivariate Fit of Log10Salary By DRAFT (Log10Salary vs DRAFT)]

Bivariate Fit of Log10Salary By 1/Draft

[Plot: Log10Salary vs 1/Draft, with Linear Fit line]

Linear Fit
Log10Salary = 5.3593619 + 0.4602566 1/Draft

Summary of Fit
RSquare                       0.266369
RSquare Adj                   0.240168
Root Mean Square Error        0.270855
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       1        0.7458282      0.745828   10.1663     0.0035
Error      28        2.0541531      0.073363
C. Total   29        2.7999814

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   5.3593619   0.078818      68.00     <.0001
1/Draft     0.4602566   0.14435        3.19     0.0035

Response Log10Salary (7 variable regression)

Whole Model

[Plot: Actual by Predicted Plot (Log10Salary Actual vs Log10Salary Predicted); P<.0001 RSq=0.74 RMSE=0.1826]

Summary of Fit
RSquare                       0.73799
RSquare Adj                   0.654623
Root Mean Square Error        0.18261
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       7        2.0663570      0.295194    8.8523     <.0001
Error      22        0.7336244      0.033347
C. Total   29        2.7999814

Parameter Estimates
Term             Estimate     Std Error   t Ratio   Prob>|t|
Intercept        5.2192484    0.196691      26.54     <.0001
DRAFT           -0.019696     0.014513      -1.36     0.1885
YRS_EXP          0.0610721    0.014186       4.31     0.0003
PLAYED          -0.007996     0.012826      -0.62     0.5394
STARTED         -0.020133     0.02552       -0.79     0.4386
CITYPOP          4.201e-10    6.651e-9       0.06     0.9502
1/Draft          0.2107898    0.174268       1.21     0.2393
PercentStarted   0.5894741    0.365152       1.61     0.1207

Effect Tests
Source           Nparm   DF   Sum of Squares   F Ratio   Prob > F
DRAFT                1    1       0.06141617    1.8418     0.1885
YRS_EXP              1    1       0.61807668   18.5349     0.0003
PLAYED               1    1       0.01295946    0.3886     0.5394
STARTED              1    1       0.02075528    0.6224     0.4386
CITYPOP              1    1       0.00013304    0.0040     0.9502
1/Draft              1    1       0.04878823    1.4631     0.2393
PercentStarted       1    1       0.08690275    2.6060     0.1207

[Plot: Residual by Predicted Plot (Log10Salary Residual vs Log10Salary Predicted)]

Press   1.306899633

Response Log10Salary (3 variable regression)

Whole Model

[Plot: Actual by Predicted Plot (Log10Salary Actual vs Log10Salary Predicted); P<.0001 RSq=0.68 RMSE=0.1851]

Summary of Fit
RSquare                       0.681794
RSquare Adj                   0.645078
Root Mean Square Error        0.185116
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3        1.9090107      0.636337   18.5694     <.0001
Error      26        0.8909707      0.034268
C. Total   29        2.7999814

Parameter Estimates
Term             Estimate    Std Error   t Ratio   Prob>|t|
Intercept        5.018818    0.082817      60.60     <.0001
YRS_EXP          0.05123     0.012582       4.07     0.0004
1/Draft          0.3931939   0.109373       3.59     0.0013
PercentStarted   0.252371    0.094825       2.66     0.0132

Effect Tests
Source           Nparm   DF   Sum of Squares   F Ratio   Prob > F
YRS_EXP              1    1       0.56810982   16.5784     0.0004
1/Draft              1    1       0.44287812   12.9239     0.0013
PercentStarted       1    1       0.24273030    7.0833     0.0132

[Plot: Residual by Predicted Plot (Log10Salary Residual vs Log10Salary Predicted)]

Press   1.2376170106

[Plot: Leverage Plot for YRS_EXP (Log10Salary Leverage Residuals vs YRS_EXP Leverage), P=0.0004]
[Plot: Leverage Plot for 1/Draft (Log10Salary Leverage Residuals vs 1/Draft Leverage), P=0.0013]
[Plot: Leverage Plot for PercentStarted (Log10Salary Leverage Residuals vs PercentStarted Leverage), P=0.0132]

JMP data table: raw variables and Log10Salary (rows 1 to 30 were used in fitting; "." marks a missing value)

Row    SALARY  DRAFT  YRS_EXP  PLAYED  STARTED   CITYPOP  Log10Salary
  1    236000      1        2      16       16   2737000  5.372912
  2    250000      6        5      16        5   2737000  5.39794001
  3    185000     10        4      16       16   4620000  5.26717173
  4    165000     13        2       6        0   4620000  5.21748394
  5    250000      1        3      16        4  13770000  5.39794001
  6    300000     11        7      16       13  13770000  5.47712125
  7    300000      3        7      11        8   2388000  5.47712125
  8   1000000      8       10      14       12   2388000  6
  9    225000     13        5      11        7   1307000  5.35218252
 10    475000      7        6      16       15   1307000  5.67669361
 11    425000      3        7      16        1  18120000  5.62838893
 12    310000      8        6      16        0  18120000  5.49136169
 13    287500      4        4      13       10  18120000  5.45863785
 14    700000      1        5      16       15   5963000  5.84509804
 15   1275000      2        6      16       16   5963000  6.10551018
 16    185000     12        5      15        1   2030000  5.26717173
 17    700000      1        2       2        1   2030000  5.84509804
 18    325000      4        6      16        1   2030000  5.51188336
 19    155000      3        2       7        0   6042000  5.1903317
 20    500000      3        2       8        6   1995000  5.69897
 21    204000      2        2      13        1   1995000  5.30963017
 22   1366700      1        4      14       14   1995000  6.13567319
 23    160000      3        2      14        0   1176000  5.20411998
 24   1050000      1       10      16       14   1728000  6.0211893
 25     98000      7        2      11        0   1728000  4.99122608
 26    370000      3        2      10        1   3641000  5.56820172
 27    450000      2        6      16        8   1237000  5.65321251
 28    195000      2        1       1        0   1575000  5.29003461
 29   1500000      1        8      16       16   3001000  6.17609126
 30    420000      8       13      14        0   4110000  5.62324929
 31         .      3        2      14       12   3766000  .
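The derived variables defined at the start of the exam (log10salary, 1/draft, percentstarted) can be recomputed from the raw columns of the data table. A minimal Python sketch for a few rows follows; the values are copied from the table, and the dictionary `rows` and the helper `derived` are hypothetical names used only for illustration.

```python
import math

# (SALARY, DRAFT, PLAYED, STARTED) for a few rows of the JMP data table;
# salary is None for row 31, whose salary is missing in the table.
rows = {
    1: (236000, 1, 16, 16),
    2: (250000, 6, 16, 5),
    30: (420000, 8, 14, 0),
    31: (None, 3, 14, 12),
}


def derived(salary, draft, played, started):
    """Return (log10salary, 1/draft, percentstarted) as defined in the exam."""
    log10salary = math.log10(salary) if salary is not None else None
    return log10salary, 1 / draft, started / played


for row, values in rows.items():
    log10salary, inv_draft, pct_started = derived(*values)
    print(row, log10salary, round(inv_draft, 8), round(pct_started, 8))

# Part o) compares player 31's actual $75,000 salary on the log10 scale
print(round(math.log10(75000), 4))
```

Note that percentstarted is undefined for a player with played = 0; no row in this table has that, so the sketch does not guard against it.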
JMP data table (continued): derived predictors and the 3 variable fit (Pred Formula, PredSE, and Residual are for Log10Salary)

Row  1/Draft     PercentStarted  Pred Formula  PredSE      Residual
  1  1           1               5.76684297    0.0797461   -0.393931
  2  0.16666667  0.3125          5.41936644    0.04254526  -0.0214264
  3  0.1         1               5.51542853    0.08378588  -0.2482568
  4  0.07692308  0               5.15152375    0.06378724   0.0659602
  5  1           0.25            5.62879476    0.07899839  -0.2308548
  6  0.09090909  0.8125          5.6182246     0.06498982  -0.1411033
  7  0.33333333  0.72727273      5.69203544    0.04702261  -0.2149142
  8  0.125       0.85714286      5.79658562    0.0791616    0.20341438
  9  0.07692308  0.63636364      5.46581359    0.05725635  -0.1136311
 10  0.14285714  0.9375          5.61896659    0.0690754    0.05772702
 11  0.33333333  0.0625          5.52426609    0.06084868   0.10412284
 12  0.125       0               5.37534746    0.05831465   0.11601423
 13  0.25        0.76923077      5.51616815    0.05610435  -0.0575303
 14  1           0.9375          5.9047599     0.06859241  -0.0596619
 15  0.5         1               5.77516617    0.05617189   0.33034401
 16  0.08333333  0.06666667      5.32455907    0.05324596  -0.0573873
 17  1           0.5             5.64065747    0.07283157   0.20444057
 18  0.25        0.0625          5.44026989    0.05328167   0.07161347
 19  0.33333333  0               5.2523427     0.05764531  -0.062011
 20  0.33333333  0.75            5.44162095    0.06487557   0.25734906
 21  0.5         0.07692308      5.33728817    0.0565259   -0.027658
 22  1           1               5.86930305    0.07139383   0.26637015
 23  0.33333333  0               5.2523427     0.05764531  -0.0482227
 24  1           0.875           6.14513691    0.09476325  -0.1239476
 25  0.14285714  0               5.17744862    0.06104746  -0.1862225
 26  0.33333333  0.1             5.2775798     0.05344746   0.29062193
 27  0.5         0.5             5.64898067    0.03784954   0.00423184
 28  0.5         0               5.26664498    0.06589555   0.02338963
 29  1           1               6.07422321    0.07982346   0.10186805
 30  0.125       0               5.73395774    0.12196501  -0.1107084
 31  0.33333333  0.85714286      5.4686607     0.07184125   .

JMP data table (continued): 3 variable fit diagnostics for Log10Salary

Row  Studentized Resid  h            Cook's D Influence
  1  -2.3580369         0.18557898   0.31675319
  2  -0.1189293         0.05282168   0.0001972
  3  -1.5039507         0.20485737   0.14568464
  4   0.37956273        0.11873469   0.00485264
  5  -1.3789475         0.18211531   0.10584976
  6  -0.814058          0.12325387   0.02329042
  7  -1.2003387         0.06452431   0.02484497
  8   1.21559992        0.18286857   0.08267391
  9  -0.6454872         0.09566591   0.01101904
 10   0.33611855        0.13923765   0.00456876
 11   0.59556611        0.10804689   0.01074163
 12   0.66032921        0.09923509   0.01200922
 13  -0.3261175         0.09185505   0.00268928
 14  -0.3469934         0.13729732   0.00479053
 15   1.872823          0.09207634   0.08892671
 16  -0.3236856         0.08273385   0.00236252
 17   1.20126911        0.15479225   0.06607031
 18   0.40395064        0.08284486   0.00368485
 19  -0.352511          0.09697011   0.00333596
 20   1.48434007        0.12282091   0.07712424
 21  -0.1569025         0.09324056   0.00063287
 22   1.55958733        0.14874118   0.10625002
 23  -0.2741294         0.09697011   0.00201738
 24  -0.7794364         0.26205343   0.05393445
 25  -1.0655858         0.10875399   0.03463893
 26   1.63977487        0.08336123   0.06113281
 27   0.02335378        0.04180528   0.00000595
 28   0.13520724        0.12671329   0.00066314
 29   0.60990831        0.18593925   0.02124141
 30  -0.7949912         0.43409068   0.12119878
 31   .                 .            .
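Parts g), i), j), k), m), and n) reduce to short computations on numbers printed in the JMP reports and data table. The Python sketch below carries out that arithmetic under stated assumptions: the t and chi-square critical values are hard-coded from standard tables rather than computed, the F-test p-value uses an exact binomial identity that holds only because both F degrees of freedom (4 and 22) are even, and the helper names `f_upper_tail_even_df` and `cooks_d` are hypothetical. It treats the table's PredSE column as the standard error of the estimated mean; one can check this, since (PredSE / RMSE)² reproduces the h column (for row 1, (0.0797461 / 0.185116)² ≈ 0.1856).

```python
import math

# Numbers copied from the attached JMP reports and data table. The critical
# values are hard-coded from standard tables (an assumption) so that only the
# standard library is needed.
T_26 = 2.056          # 0.975 quantile of t with 26 df
CHI2_22_LO = 10.982   # 0.025 quantile of chi-square with 22 df
CHI2_22_HI = 36.781   # 0.975 quantile of chi-square with 22 df

SSE_7, DF_7 = 0.7336244, 22   # 7 variable fit: error SS and df
SSE_3, DF_3 = 0.8909707, 26   # 3 variable fit: error SS and df
RMSE_3 = 0.185116             # 3 variable fit: root mean square error
SST = 2.7999814               # corrected total SS (same for both fits)


def f_upper_tail_even_df(f, d1, d2):
    """P(F > f) for F with (d1, d2) df, valid only when d1 and d2 are even.

    With a = d1/2, b = d2/2 and x = d1*f / (d1*f + d2), the F cdf is the
    regularized incomplete beta I_x(a, b) = P(Binomial(a + b - 1, x) >= a),
    so the upper tail is P(Binomial(a + b - 1, x) <= a - 1).
    """
    if d1 % 2 or d2 % 2:
        raise ValueError("closed form needs even degrees of freedom")
    a, b = d1 // 2, d2 // 2
    x = d1 * f / (d1 * f + d2)
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a))


def cooks_d(r, h, p=4):
    """Cook's D from a studentized residual r and leverage h (p parameters)."""
    return r**2 / p * h / (1 - h)


# g) 95% limits for sigma in the 7 variable model
sigma_limits = (math.sqrt(SSE_7 / CHI2_22_HI), math.sqrt(SSE_7 / CHI2_22_LO))

# i) partial F for dropping 4 predictors (7 variables -> 3 variables)
df_num = DF_3 - DF_7
f_stat = ((SSE_3 - SSE_7) / df_num) / (SSE_7 / DF_7)
p_value = f_upper_tail_even_df(f_stat, df_num, DF_7)

# j) 95% limits for the YRS_EXP coefficient in the 3 variable model
b_yrs, se_yrs = 0.05123, 0.012582
yrs_limits = (b_yrs - T_26 * se_yrs, b_yrs + T_26 * se_yrs)

# k) R^2 lost by dropping each predictor = its Effect Tests SS / SST
effect_ss = {"YRS_EXP": 0.56810982, "1/Draft": 0.44287812,
             "PercentStarted": 0.24273030}
r2_drop = {name: ss / SST for name, ss in effect_ss.items()}

# m) Cook's D recomputed for rows 1 and 30 of the data table
cook_row1 = cooks_d(-2.3580369, 0.18557898)
cook_row30 = cooks_d(-0.7949912, 0.43409068)

# n) 95% prediction limits for row 31; PredSE is the SE of the fitted mean,
#    so the error variance RMSE^2 is added back for a new observation
y_hat31, pred_se31 = 5.4686607, 0.07184125
se_new = math.sqrt(RMSE_3**2 + pred_se31**2)
pred_limits = (y_hat31 - T_26 * se_new, y_hat31 + T_26 * se_new)

print("g) sigma limits:", [round(v, 4) for v in sigma_limits])
print(f"i) F = {f_stat:.3f} on ({df_num}, {DF_7}) df, p = {p_value:.3f}")
print("j) YRS_EXP limits:", [round(v, 4) for v in yrs_limits])
print("k) R^2 drop:", {k: round(v, 4) for k, v in r2_drop.items()})
print(f"m) Cook's D: row 1 = {cook_row1:.4f}, row 30 = {cook_row30:.4f}")
print("n) prediction limits:", [round(v, 4) for v in pred_limits])
print("   log10(75000) =", round(math.log10(75000), 4))
```

Recomputing Cook's D from the studentized residual and h columns is a useful check that the three diagnostic columns have been read in the right order.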