Stat 328 Final Exam (Regression)
Summer 2002
Professor Vardeman
This exam concerns the analysis of 1990 salary data for n = 30 offensive backs in the NFL. (This is a part of the
larger data set that serves as the basis of your Lab #6.) Attached to this exam are a number of JMP reports for these
data. Use them in answering the questions on this exam.
As on Lab #6, the variables available for modeling were:
salary     1990 season salary
draft      round in the player draft when the player was selected
yrs_exp    years of NFL experience for the player
played     the number of regular season games played in 1989
started    the number of regular season games started in 1989
citypop    the population of the city in which the player's team is located
Vardeman used these variables and created several more:
log10salary       the base 10 logarithm of salary
1/draft           the reciprocal of draft
percentstarted    the ratio started/played
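As a small illustration of the three created variables, the sketch below recomputes them for one player. The function name is hypothetical (it is not part of JMP or of the lab materials), and it assumes played is greater than 0 so that the ratio is defined.

```python
import math

def derived_variables(salary, draft, played, started):
    """Return (log10salary, 1/draft, percentstarted) for one player.

    Hypothetical helper for illustration; assumes played > 0.
    """
    log10salary = math.log10(salary)    # base 10 logarithm of salary
    inv_draft = 1.0 / draft             # reciprocal of draft round
    percent_started = started / played  # fraction of games played that were started
    return log10salary, inv_draft, percent_started

# Player 2 in the attached data table: salary 250000, draft round 6,
# 16 games played, 5 games started.
print(derived_variables(250000, 6, 16, 5))
```

The three numbers printed should agree with the Log10Salary, 1/Draft, and PercentStarted entries for row 2 of the JMP data table.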
To begin, consider the problem of modeling a salary variable in terms of only a draft position
variable.
a) Consider the plots of salary vs draft and log10salary vs draft. What about these suggests that (as far as using
standard statistical methodology is concerned) log10salary is probably a better "y" than salary?
Ultimately, Vardeman decided to use 1/draft instead of draft in an SLR analysis for
log10salary. So until further notice, consider an SLR analysis using the model
log10salary = β₀ + β₁ (1/draft) + ε
b) What fraction of raw variability in log10salary is accounted for using 1/draft as a predictor variable?
c) Give and interpret a p-value for testing H₀: β₁ = 0. Say exactly where you found this on the printout.
p-value:
where:
interpretation:
d) Notice that if draft is large, 1/draft is near 0. So β₀ might be interpreted as a mean log10salary for a high draft
number (or perhaps even undrafted) offensive back. Give 95% confidence limits for this.
e) What does the SLR model above give as the difference in mean log10salary values for 1st and
2nd round draft picks? (Note that these are the cases 1/draft = 1 and 1/draft = .5.) Give 95% confidence
limits for this difference in means.
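The arithmetic behind limits of the kind asked for in parts d) and e) can be sketched as follows. This is only one way to organize the calculation, using the usual "estimate ± t × standard error" form. The estimates and standard errors are copied from the Parameter Estimates table of the SLR printout; the t quantile (28 error degrees of freedom) is a standard table value typed in by hand and is an assumption, since it does not appear on the printout. The helper name is hypothetical.

```python
# Upper 2.5% point of the t distribution with 28 df, from a standard t table
# (assumed value; it is not printed on the JMP report).
T_975_DF28 = 2.048

def t_limits(estimate, std_error, t_crit):
    """Two-sided limits of the form estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# Values from the SLR Parameter Estimates table.
b0, se_b0 = 5.3593619, 0.078818
b1, se_b1 = 0.4602566, 0.14435

# Part d): limits for the intercept.
beta0_limits = t_limits(b0, se_b0, T_975_DF28)

# Part e): the difference in means between 1/draft = 1 and 1/draft = .5 is
# (1 - .5) times the slope, so both the estimate and its standard error scale by .5.
diff_limits = t_limits(0.5 * b1, 0.5 * se_b1, T_975_DF28)

print(beta0_limits)
print(diff_limits)
```

Because the difference in e) is a constant multiple of the slope, no covariance between the intercept and slope estimates is needed for it.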
f) A particular offensive back not included in this data set is a former first round draft pick and was offered a
$315,000 contract for 1990. (The base 10 logarithm of 315,000 is about 5.5.) On the basis of draft position
alone, did this person have a good case that the offer was too low? Explain carefully.
Now consider MLR analyses of log10salary. Notice that printouts are available for two different
multiple linear regressions. The first is a regression on draft, yrs_exp, played, started, citypop,
1/draft, and percentstarted. The second is a regression on only yrs_exp, 1/draft, and percentstarted.
g) Give 95% confidence limits for the standard deviation of log10salary when all of draft, yrs_exp, played,
started, citypop, 1/draft, and percentstarted are held fixed.
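For a question like g), one standard approach bases the limits on the chi-square distribution of the error sum of squares. The sketch below assumes the usual interval from s·sqrt(df/χ²_upper) to s·sqrt(df/χ²_lower). The root mean square error and error degrees of freedom come from the 7 predictor printout; the two chi-square quantiles (22 df) are standard table values entered by hand and are assumptions, not JMP output.

```python
import math

# Chi-square quantiles for 22 df from a standard table (assumed values).
CHI2_LOWER_025_DF22 = 10.982   # lower 2.5% point
CHI2_UPPER_975_DF22 = 36.781   # upper 2.5% point

def sigma_limits(s, df, chi2_lower, chi2_upper):
    """Confidence limits for sigma based on s and its error degrees of freedom."""
    return s * math.sqrt(df / chi2_upper), s * math.sqrt(df / chi2_lower)

# Root Mean Square Error and error DF from the 7 predictor regression.
lower, upper = sigma_limits(0.18261, 22, CHI2_LOWER_025_DF22, CHI2_UPPER_975_DF22)
print(lower, upper)
```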
h) What on the MLR printouts suggests that it may be feasible to model log10salary using fewer than 7
predictors?
i) There is a decrease in R² if one moves from the 7 variable regression to the 3 variable regression. Give an
appropriate F value, degrees of freedom, and approximate p-value to attach to the decrease.
F:
d.f.:        ,
p-value:
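The F statistic for comparing a full and a reduced regression is usually built from the two error sums of squares. The sketch below applies that standard partial F form to the Error lines of the two Analysis of Variance tables; the function name is hypothetical, and the p-value is left to an F table (or software) rather than computed here.

```python
def partial_f(sse_reduced, sse_full, n_dropped, df_error_full):
    """Partial F statistic for dropping n_dropped predictors from the full model."""
    numerator = (sse_reduced - sse_full) / n_dropped
    denominator = sse_full / df_error_full
    return numerator / denominator

# Error sums of squares from the two Analysis of Variance tables:
# 0.8909707 (3 predictors, 26 error df) and 0.7336244 (7 predictors, 22 error df).
n_dropped = 7 - 3
f_stat = partial_f(0.8909707, 0.7336244, n_dropped, 22)
print(f_stat, (n_dropped, 22))
```

The statistic is referred to an F distribution with numerator degrees of freedom equal to the number of dropped predictors and denominator degrees of freedom equal to the error degrees of freedom of the full model.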
Henceforth consider the 3 variable regression. Besides the raw data, the JMP data table at the end
of the printout has summaries of that fit. Notice that although n = 30 cases were used in the fitting,
the table includes some values for an additional (31st) case.
j) According to this model, what increase in mean log10salary accompanies a 1 year increase in NFL experience,
if draft position and percentage of games started are held fixed? Give 95% confidence limits.
k) Dropping which of the 3 predictors would cause the biggest decrease in R²? How do you know?
variable:
reasoning:
l) Player 30 has a large “hat” value. What about his values of yrs_exp, 1/draft, and percentstarted makes this
qualitatively plausible/expected?
m) Considering both “x” and “y” variables, which player among the 30 in the data set was the “most influential” in
terms of fitting the 3 variable model? Explain.
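Influence measures such as Cook's D combine a "y" component (the studentized residual) with an "x" component (the hat value). As a hedged sketch, the function below uses the common form D = r²·h / (p·(1 − h)), where r is the internally studentized residual, h the hat value, and p the number of fitted coefficients (4 here: intercept plus 3 slopes). The function name is hypothetical, and the three cases are copied from the JMP data table only as examples.

```python
def cooks_distance(studentized_resid, hat_value, n_coefficients):
    """Cook's D from an internally studentized residual and a hat value."""
    return studentized_resid ** 2 * hat_value / (n_coefficients * (1.0 - hat_value))

N_COEFFICIENTS = 4  # intercept, YRS_EXP, 1/Draft, PercentStarted

# (studentized residual, hat value) for players 1, 24, and 30 from the data table.
cases = {
    1: (-2.3580369, 0.18557898),
    24: (-0.7794364, 0.26205343),
    30: (-0.7949912, 0.43409068),
}

for player, (r, h) in cases.items():
    print(player, cooks_distance(r, h, N_COEFFICIENTS))
```

The values computed this way should land close to the Cook's D column of the table, which shows how a moderate hat value paired with a large residual can outweigh a large hat value paired with a modest residual.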
n) Make 95% prediction limits for the log10salary of player 31 (based on the 3 variable model!).
o) Player 31’s actual salary was $75,000. Does your answer to n) provide solid statistical evidence that his salary
(for unknown reasons) was atypical? Explain.
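Prediction limits of the kind requested in n) are commonly formed as ŷ ± t·sqrt(s² + (standard error of the fitted mean)²). The sketch below follows that form using the row 31 entries of the data table (Pred Formula and PredSE) and the root mean square error of the 3 variable fit. The t quantile for 26 error degrees of freedom is a standard table value entered by hand and is an assumption, not JMP output.

```python
import math

# Upper 2.5% point of the t distribution with 26 df (assumed table value).
T_975_DF26 = 2.056

y_hat = 5.4686607     # Pred Formula Log10Salary, row 31
se_fit = 0.07184125   # PredSE Log10Salary, row 31
rmse = 0.185116       # Root Mean Square Error, 3 variable fit

# Standard error for predicting one new response at these predictor values.
se_pred = math.sqrt(rmse ** 2 + se_fit ** 2)
lower = y_hat - T_975_DF26 * se_pred
upper = y_hat + T_975_DF26 * se_pred

# Player 31's actual salary on the log10 scale, for comparison in part o).
observed = math.log10(75000)
print(lower, upper, observed)
```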
Bivariate Fit of SALARY By DRAFT
[Scatterplot: SALARY (vertical axis, 0 to 1500000) versus DRAFT (horizontal axis, 0 to 14)]

Bivariate Fit of Log10Salary By DRAFT
[Scatterplot: Log10Salary (vertical axis, 4.75 to 6.25) versus DRAFT (horizontal axis, 0 to 14)]

Bivariate Fit of Log10Salary By 1/Draft
[Scatterplot with fitted line: Log10Salary (vertical axis, 4.75 to 6.5) versus 1/Draft (horizontal axis, 0 to 1)]

Linear Fit
Log10Salary = 5.3593619 + 0.4602566 (1/Draft)

Summary of Fit
RSquare                       0.266369
RSquare Adj                   0.240168
Root Mean Square Error        0.270855
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model       1     0.7458282         0.745828       10.1663    0.0035
Error       28    2.0541531         0.073363
C. Total    29    2.7999814

Parameter Estimates
Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    5.3593619    0.078818     68.00      <.0001
1/Draft      0.4602566    0.14435      3.19       0.0035

Response Log10Salary
Whole Model
Actual by Predicted Plot
[Plot: Log10Salary Actual versus Log10Salary Predicted; P<.0001, RSq=0.74, RMSE=0.1826]

Summary of Fit
RSquare                       0.73799
RSquare Adj                   0.654623
Root Mean Square Error        0.18261
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model       7     2.0663570         0.295194       8.8523     <.0001
Error       22    0.7336244         0.033347
C. Total    29    2.7999814

Parameter Estimates
Term              Estimate     Std Error    t Ratio    Prob>|t|
Intercept         5.2192484    0.196691     26.54      <.0001
DRAFT             -0.019696    0.014513     -1.36      0.1885
YRS_EXP           0.0610721    0.014186     4.31       0.0003
PLAYED            -0.007996    0.012826     -0.62      0.5394
STARTED           -0.020133    0.02552      -0.79      0.4386
CITYPOP           4.201e-10    6.651e-9     0.06       0.9502
1/Draft           0.2107898    0.174268     1.21       0.2393
PercentStarted    0.5894741    0.365152     1.61       0.1207

Effect Tests
Source            Nparm    DF    Sum of Squares    F Ratio    Prob > F
DRAFT             1        1     0.06141617        1.8418     0.1885
YRS_EXP           1        1     0.61807668        18.5349    0.0003
PLAYED            1        1     0.01295946        0.3886     0.5394
STARTED           1        1     0.02075528        0.6224     0.4386
CITYPOP           1        1     0.00013304        0.0040     0.9502
1/Draft           1        1     0.04878823        1.4631     0.2393
PercentStarted    1        1     0.08690275        2.6060     0.1207

Residual by Predicted Plot
[Plot: Log10Salary Residual versus Log10Salary Predicted]

Press    1.306899633

Response Log10Salary
Whole Model
Actual by Predicted Plot
[Plot: Log10Salary Actual versus Log10Salary Predicted; P<.0001, RSq=0.68, RMSE=0.1851]

Summary of Fit
RSquare                       0.681794
RSquare Adj                   0.645078
Root Mean Square Error        0.185116
Mean of Response              5.555055
Observations (or Sum Wgts)    30

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Model       3     1.9090107         0.636337       18.5694    <.0001
Error       26    0.8909707         0.034268
C. Total    29    2.7999814

Parameter Estimates
Term              Estimate     Std Error    t Ratio    Prob>|t|
Intercept         5.018818     0.082817     60.60      <.0001
YRS_EXP           0.05123      0.012582     4.07       0.0004
1/Draft           0.3931939    0.109373     3.59       0.0013
PercentStarted    0.252371     0.094825     2.66       0.0132

Effect Tests
Source            Nparm    DF    Sum of Squares    F Ratio    Prob > F
YRS_EXP           1        1     0.56810982        16.5784    0.0004
1/Draft           1        1     0.44287812        12.9239    0.0013
PercentStarted    1        1     0.24273030        7.0833     0.0132

Residual by Predicted Plot
[Plot: Log10Salary Residual versus Log10Salary Predicted]

Press    1.2376170106

YRS_EXP
Leverage Plot
[Plot: Log10Salary Leverage Residuals versus YRS_EXP Leverage, P=0.0004]

1/Draft
Leverage Plot
[Plot: Log10Salary Leverage Residuals versus 1/Draft Leverage, P=0.0013]

PercentStarted
Leverage Plot
[Plot: Log10Salary Leverage Residuals versus PercentStarted Leverage, P=0.0132]

Rows  SALARY   DRAFT  YRS_EXP  PLAYED  STARTED  CITYPOP   Log10Salary  1/Draft     PercentStarted
1     236000   1      2        16      16       2737000   5.372912     1           1
2     250000   6      5        16      5        2737000   5.39794001   0.16666667  0.3125
3     185000   10     4        16      16       4620000   5.26717173   0.1         1
4     165000   13     2        6       0        4620000   5.21748394   0.07692308  0
5     250000   1      3        16      4        13770000  5.39794001   1           0.25
6     300000   11     7        16      13       13770000  5.47712125   0.09090909  0.8125
7     300000   3      7        11      8        2388000   5.47712125   0.33333333  0.72727273
8     1000000  8      10       14      12       2388000   6            0.125       0.85714286
9     225000   13     5        11      7        1307000   5.35218252   0.07692308  0.63636364
10    475000   7      6        16      15       1307000   5.67669361   0.14285714  0.9375
11    425000   3      7        16      1        18120000  5.62838893   0.33333333  0.0625
12    310000   8      6        16      0        18120000  5.49136169   0.125       0
13    287500   4      4        13      10       18120000  5.45863785   0.25        0.76923077
14    700000   1      5        16      15       5963000   5.84509804   1           0.9375
15    1275000  2      6        16      16       5963000   6.10551018   0.5         1
16    185000   12     5        15      1        2030000   5.26717173   0.08333333  0.06666667
17    700000   1      2        2       1        2030000   5.84509804   1           0.5
18    325000   4      6        16      1        2030000   5.51188336   0.25        0.0625
19    155000   3      2        7       0        6042000   5.1903317    0.33333333  0
20    500000   3      2        8       6        1995000   5.69897      0.33333333  0.75
21    204000   2      2        13      1        1995000   5.30963017   0.5         0.07692308
22    1366700  1      4        14      14       1995000   6.13567319   1           1
23    160000   3      2        14      0        1176000   5.20411998   0.33333333  0
24    1050000  1      10       16      14       1728000   6.0211893    1           0.875
25    98000    7      2        11      0        1728000   4.99122608   0.14285714  0
26    370000   3      2        10      1        3641000   5.56820172   0.33333333  0.1
27    450000   2      6        16      8        1237000   5.65321251   0.5         0.5
28    195000   2      1        1       0        1575000   5.29003461   0.5         0
29    1500000  1      8        16      16       3001000   6.17609126   1           1
30    420000   8      13       14      0        4110000   5.62324929   0.125       0
31    .        3      2        14      12       3766000   .            0.33333333  0.85714286

Rows  Pred Formula Log10Salary  PredSE Log10Salary  Residual Log10Salary
1     5.76684297                0.0797461           -0.393931
2     5.41936644                0.04254526          -0.0214264
3     5.51542853                0.08378588          -0.2482568
4     5.15152375                0.06378724          0.0659602
5     5.62879476                0.07899839          -0.2308548
6     5.6182246                 0.06498982          -0.1411033
7     5.69203544                0.04702261          -0.2149142
8     5.79658562                0.0791616           0.20341438
9     5.46581359                0.05725635          -0.1136311
10    5.61896659                0.0690754           0.05772702
11    5.52426609                0.06084868          0.10412284
12    5.37534746                0.05831465          0.11601423
13    5.51616815                0.05610435          -0.0575303
14    5.9047599                 0.06859241          -0.0596619
15    5.77516617                0.05617189          0.33034401
16    5.32455907                0.05324596          -0.0573873
17    5.64065747                0.07283157          0.20444057
18    5.44026989                0.05328167          0.07161347
19    5.2523427                 0.05764531          -0.062011
20    5.44162095                0.06487557          0.25734906
21    5.33728817                0.0565259           -0.027658
22    5.86930305                0.07139383          0.26637015
23    5.2523427                 0.05764531          -0.0482227
24    6.14513691                0.09476325          -0.1239476
25    5.17744862                0.06104746          -0.1862225
26    5.2775798                 0.05344746          0.29062193
27    5.64898067                0.03784954          0.00423184
28    5.26664498                0.06589555          0.02338963
29    6.07422321                0.07982346          0.10186805
30    5.73395774                0.12196501          -0.1107084
31    5.4686607                 0.07184125          .

Rows  Studentized Resid Log10Salary  h Log10Salary  Cook's D Influence Log10Salary
1     -2.3580369                     0.18557898     0.31675319
2     -0.1189293                     0.05282168     0.0001972
3     -1.5039507                     0.20485737     0.14568464
4     0.37956273                     0.11873469     0.00485264
5     -1.3789475                     0.18211531     0.10584976
6     -0.814058                      0.12325387     0.02329042
7     -1.2003387                     0.06452431     0.02484497
8     1.21559992                     0.18286857     0.08267391
9     -0.6454872                     0.09566591     0.01101904
10    0.33611855                     0.13923765     0.00456876
11    0.59556611                     0.10804689     0.01074163
12    0.66032921                     0.09923509     0.01200922
13    -0.3261175                     0.09185505     0.00268928
14    -0.3469934                     0.13729732     0.00479053
15    1.872823                       0.09207634     0.08892671
16    -0.3236856                     0.08273385     0.00236252
17    1.20126911                     0.15479225     0.06607031
18    0.40395064                     0.08284486     0.00368485
19    -0.352511                      0.09697011     0.00333596
20    1.48434007                     0.12282091     0.07712424
21    -0.1569025                     0.09324056     0.00063287
22    1.55958733                     0.14874118     0.10625002
23    -0.2741294                     0.09697011     0.00201738
24    -0.7794364                     0.26205343     0.05393445
25    -1.0655858                     0.10875399     0.03463893
26    1.63977487                     0.08336123     0.06113281
27    0.02335378                     0.04180528     0.00000595
28    0.13520724                     0.12671329     0.00066314
29    0.60990831                     0.18593925     0.02124141
30    -0.7949912                     0.43409068     0.12119878
31    .                              .              .