Simple Linear Regression NBA 2013/14 Player Heights and Weights

Data Description / Model
• Heights (X) and Weights (Y) for 505 NBA Players in the 2013/14 Season.
• Other Variables included in the Dataset: Age, Position
• Simple Linear Regression Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
• Model Assumptions:
    • $\varepsilon \sim N(0, \sigma^2)$
    • Errors are independent
    • Error variance ($\sigma^2$) is constant
    • Relationship between Y and X is linear
    • No important (available) predictors have been omitted
Weight (Y) vs Height (X) - 2013/2014 NBA Players
[Figure: scatterplot with Weight (lbs) on the vertical axis and Height (inches) on the horizontal axis]
Regression Calculations
$$n = 505 \qquad \bar{x} = 79.06535 \qquad \bar{y} = 220.6733$$
$$S_{xx} = \sum (x - \bar{x})^2 = 6012.844 \qquad S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 38065.78 \qquad S_{yy} = \sum (y - \bar{y})^2 = 357767.1$$
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}} = 357767.1 - \frac{(38065.78)^2}{6012.844} = 116782.3$$
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{38065.78}{6012.844} = 6.330745$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 220.6733 - 6.330745(79.06535) = -279.869$$
$$\hat{y}_i = -279.869 + 6.330745 x_i$$
$$s_e^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} = \frac{SSE}{n - 2} = \frac{116782.3}{505 - 2} = 232.1716 \quad \Rightarrow \quad s_e = \sqrt{232.1716} = 15.23718$$
$$SE_{\hat{\beta}_1} = s_e \sqrt{\frac{1}{S_{xx}}} = 15.23718 \sqrt{\frac{1}{6012.844}} = 0.196501$$
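The calculations on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes only the summary statistics printed on the slide (the raw player data are not reproduced here); the variable names are illustrative.

```python
import math

# Summary statistics copied from the slide (not recomputed from raw data)
n = 505
x_bar, y_bar = 79.06535, 220.6733
S_xx, S_xy, S_yy = 6012.844, 38065.78, 357767.1

b1 = S_xy / S_xx                   # slope estimate
b0 = y_bar - b1 * x_bar            # intercept estimate
SSE = S_yy - S_xy**2 / S_xx        # error (residual) sum of squares
s2_e = SSE / (n - 2)               # estimate of the error variance
s_e = math.sqrt(s2_e)              # residual standard error
se_b1 = s_e * math.sqrt(1 / S_xx)  # standard error of the slope

print(f"b1 = {b1:.6f}, b0 = {b0:.3f}")
print(f"SSE = {SSE:.1f}, s_e = {s_e:.5f}, SE(b1) = {se_b1:.6f}")
```

Because the inputs are rounded to the digits shown on the slide, the last printed digit can differ slightly from the slide values.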
Inference Concerning β1
From the regression calculations:
$$n = 505 \qquad \bar{x} = 79.06535 \qquad \bar{y} = 220.6733$$
$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{38065.78}{6012.844} = 6.330745 \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = -279.869$$
$$SE_{\hat{\beta}_1} = s_e \sqrt{\frac{1}{S_{xx}}} = 15.23718 \sqrt{\frac{1}{6012.844}} = 0.196501$$
Test of $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$
Test Statistic:
$$t_{obs} = \frac{\hat{\beta}_1}{SE_{\hat{\beta}_1}} = \frac{6.330745}{0.196501} = 32.21738$$
Rejection Region: $|t_{obs}| \ge t_{.025, 505-2} = 1.965$
P-value: $2P(t_{n-2} \ge |t_{obs}|) = 2P(t_{505-2} \ge 32.21738) \approx .0000$
95% Confidence Interval for $\beta_1$:
$$\hat{\beta}_1 \pm t_{.025, n-2} \, SE_{\hat{\beta}_1} = 6.330745 \pm 1.965(0.196501) = 6.330745 \pm 0.386124 = (5.944621, \; 6.71687)$$
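EXCEL Output and Inference for β0
            Coefficients  Standard Error    t Stat   P-value  Lower 95%  Upper 95%
Intercept       -279.869         15.5512  -17.9966  2.89E-56   -310.423   -249.316
Height          6.330745        0.196501  32.21738  2.2E-124   5.944682   6.716809

$$\hat{\beta}_0 = -279.869$$
$$SE_{\hat{\beta}_0} = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}} = 15.23718 \sqrt{\frac{1}{505} + \frac{(79.06534653)^2}{6012.843564}} = 15.5512$$
Testing $H_0: \beta_0 = 0$ vs $H_A: \beta_0 \ne 0$
Test Statistic:
$$t_{obs} = \frac{\hat{\beta}_0}{SE_{\hat{\beta}_0}} = \frac{-279.869}{15.5512} = -17.9966$$
95% CI for $\beta_0$:
$$-279.869 \pm 1.965(15.5512) = -279.869 \pm 30.55811 = (-310.427, \; -249.311)$$
The hand-computed limits differ slightly from the Excel limits because the critical value is rounded to 1.965 here, while Excel uses the unrounded value.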
Estimating Mean and Predicting New Response at x=x*
$$n = 505 \qquad \bar{x} = 79.06535 \qquad \bar{y} = 220.6733 \qquad S_{xx} = 6012.844$$
$$\hat{\beta}_0 = -279.869 \qquad \hat{\beta}_1 = 6.330745 \qquad \hat{y}_i = -279.869 + 6.330745 x_i \qquad s_e = \sqrt{232.1716} = 15.23718$$
Estimating the Mean Response at x = 76": $\mu_Y = E(Y \mid x = 76) = \beta_0 + \beta_1(76)$
$$\hat{\mu}_Y = \hat{\beta}_0 + \hat{\beta}_1(76) = -279.869 + 6.330745(76) = 201.2673$$
$$SE_{\hat{\mu}_Y} = s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} = 15.23718 \sqrt{\frac{1}{505} + \frac{(76 - 79.06535)^2}{6012.844}} = 15.23718 \sqrt{0.003543} = 0.906953$$
95% CI for $\mu_Y = E(Y \mid x = 76)$:
$$201.2673 \pm 1.965(0.906953) = (199.4852, \; 203.0495)$$
Predicting a New Player's Weight with x = 76":
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1(76) = -279.869 + 6.330745(76) = 201.2673$$
$$SE_{\hat{y}} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} = 15.23718 \sqrt{1 + \frac{1}{505} + \frac{(76 - 79.06535)^2}{6012.844}} = 15.23718 \sqrt{1.003543} = 15.26415$$
95% Prediction Interval for $y_{76}$:
$$201.2673 \pm 1.965(15.26415) = (171.2733, \; 231.2614)$$
Weight vs Height - Data, Fitted Values, CI for Mean, PI for Individuals
[Figure: Weight plotted against Height with series Weight, Y-hat, CI_LB, CI_UB, PI_LB, PI_UB]
Coefficients of Correlation and Determination
$$r_{yx} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{38065.78}{\sqrt{6012.843564 \,(357767.1)}} = 0.820719235$$
Note that while the intercept and slope depend on the units (e.g. inches vs centimetres, pounds vs kilograms), the correlation coefficient does not.
$$r^2 = (r_{yx})^2 = (0.820719235)^2 = 0.67358$$
Alternatively:
$$r^2 = \frac{TSS - SSE}{TSS} = \frac{S_{yy} - SSE}{S_{yy}} = \frac{357767.1 - 116782.3}{357767.1} = 0.67358$$
Approximately 2/3 (67.4%) of the variation in weight is "explained" by the regression of weight on height.
Testing $H_0: \rho_{yx} = 0$ vs $H_A: \rho_{yx} \ne 0$
Test Statistic:
$$t_{obs} = r_{yx} \sqrt{\frac{n - 2}{1 - r_{yx}^2}} = 0.820719235 \sqrt{\frac{505 - 2}{1 - 0.67358}} = 32.21738$$
P-value $\approx .0000$
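A short sketch of the correlation calculations, assuming the sums of squares printed on the slides (names are illustrative). It also shows that the t statistic for the correlation agrees with the t statistic for the slope, 32.21738.

```python
import math

# Sums of squares and cross-products quoted on the slides
n = 505
S_xx, S_xy, S_yy = 6012.843564, 38065.78, 357767.1
SSE = 116782.3

r = S_xy / math.sqrt(S_xx * S_yy)            # correlation coefficient
r2 = r**2                                    # coefficient of determination
r2_alt = (S_yy - SSE) / S_yy                 # same quantity via (TSS - SSE) / TSS
t_obs = r * math.sqrt((n - 2) / (1 - r2))    # test statistic for H0: rho = 0

print(f"r = {r:.6f}, r^2 = {r2:.5f} (alt: {r2_alt:.5f}), t_obs = {t_obs:.3f}")
```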
Analysis of Variance and F-Test
Total (Corrected) Sum of Squares:
$$TSS = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = 357767.1 \qquad DF_T = n - 1 = 505 - 1 = 504$$
Error (aka Residual) Sum of Squares:
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 116782.3 \qquad DF_E = n - 2 = 505 - 2 = 503$$
Regression (aka Model) Sum of Squares:
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = 240984.8 \qquad DF_R = 1$$
F-Test for Slope Coefficient: $H_0: \beta_1 = 0$ vs $H_A: \beta_1 \ne 0$
Test Statistic:
$$F_{obs} = \frac{MSR}{MSE} = \frac{SSR / DF_R}{SSE / DF_E} = \frac{240984.8 / 1}{116782.3 / 503} = 1037.96$$
Rejection Region: $F_{obs} \ge F_{\alpha; DF_R, DF_E} = F_{.05; 1, 503} = 3.860$
P-value $= P(F_{DF_R, DF_E} \ge F_{obs}) = P(F_{1,503} \ge 1037.96) \approx .0000$

ANOVA
             df        SS        MS        F  Significance F
Regression    1  240984.8  240984.8  1037.96        2.2E-124
Residual    503  116782.3  232.1716
Total       504  357767.1
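The ANOVA table follows directly from TSS and SSE. The sketch below assumes the slide's sums of squares and takes the critical value 3.860 from the slide; names are illustrative. In simple linear regression the F statistic equals the square of the slope's t statistic, which the last line illustrates.

```python
n = 505
TSS, SSE = 357767.1, 116782.3    # sums of squares from the slide

SSR = TSS - SSE                  # regression sum of squares
df_r, df_e = 1, n - 2            # regression and error degrees of freedom
MSR, MSE = SSR / df_r, SSE / df_e
F_obs = MSR / MSE
F_crit = 3.860                   # F_{.05; 1, 503} as printed on the slide

print(f"SSR = {SSR:.1f}, MSE = {MSE:.4f}, F_obs = {F_obs:.2f}")
print(f"reject H0: {F_obs >= F_crit}")
print(f"t_obs squared = {32.21738**2:.2f}")
```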
Graphical Representation of Analysis of Variance
[Figure: Weight plotted against Height with series Weight, Y-hat, Y-bar]
Linearity of Regression
F-Test for Lack-of-Fit ($n_j$ observations at $c$ distinct levels of "X")
$$H_0: E(Y_i) = \beta_0 + \beta_1 X_i \qquad H_A: E(Y_i) = \mu_i \ne \beta_0 + \beta_1 X_i$$
Compute the fitted value $\hat{Y}_j$ and the sample mean $\bar{Y}_j$ for each distinct X level.
Lack-of-Fit:
$$SS(LF) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (\bar{Y}_j - \hat{Y}_j)^2 \qquad df_{LF} = c - 2$$
Pure Error:
$$SS(PE) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 \qquad df_{PE} = n - c$$
Test Statistic:
$$F_{LOF} = \frac{SS(LF) / (c - 2)}{SS(PE) / (n - c)} = \frac{MS(LF)}{MS(PE)} \overset{H_0}{\sim} F_{c-2, \, n-c}$$
Equivalently, with the linear regression as the reduced model (R) and the model with a separate mean at each X level as the full model (F):
$$F_{LOF} = \frac{\left[ SSE - SS(PE) \right] / \left[ (n - 2) - (n - c) \right]}{SS(PE) / (n - c)} = \frac{\left[ SSE(R) - SSE(F) \right] / (df_R - df_F)}{SSE(F) / df_F}$$
Reject $H_0$ if $F_{LOF} \ge F(1 - \alpha; \, c - 2, \, n - c)$
Computing Strategy:
1) For each group (j), compute:
$$\bar{Y}_j = \frac{\sum_{i=1}^{n_j} Y_{ij}}{n_j} \qquad s_j^2 = \begin{cases} \dfrac{\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2}{n_j - 1} & n_j > 1 \\ 0 & \text{otherwise} \end{cases} \qquad \hat{Y}_j = \hat{\beta}_0 + \hat{\beta}_1 X_j$$
2) $$SS(LF) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (\bar{Y}_j - \hat{Y}_j)^2 = \sum_{j=1}^{c} n_j (\bar{Y}_j - \hat{Y}_j)^2$$
3) $$SS(PE) = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 = \sum_{j=1}^{c} (n_j - 1) s_j^2$$
4) $$F_{LOF} = \frac{SS(LF) / (c - 2)}{SS(PE) / (n - c)} = \frac{MS(LF)}{MS(PE)} \overset{H_0}{\sim} F_{c-2, \, n-c}$$
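The computing strategy above can be sketched in Python for raw (x, y) data. `lack_of_fit` is a hypothetical helper name, and the six data points are made-up toy values, not the NBA data; the sketch fits the line by least squares, groups responses by distinct X level, and accumulates SS(LF) and SS(PE).

```python
from collections import defaultdict

def lack_of_fit(x, y):
    """Return (SS_LF, SS_PE, F_LOF, df_LF, df_PE) for a simple linear fit of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    S_xx = sum((xi - x_bar) ** 2 for xi in x)
    S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = S_xy / S_xx                 # least-squares slope
    b0 = y_bar - b1 * x_bar          # least-squares intercept

    groups = defaultdict(list)       # responses grouped by distinct X level
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    c = len(groups)

    ss_lf = ss_pe = 0.0
    for xj, ys in groups.items():
        nj = len(ys)
        mean_j = sum(ys) / nj        # group sample mean
        fit_j = b0 + b1 * xj         # fitted value at this X level
        ss_lf += nj * (mean_j - fit_j) ** 2
        ss_pe += sum((yi - mean_j) ** 2 for yi in ys)  # equals (nj - 1) * s_j^2

    df_lf, df_pe = c - 2, n - c
    f_lof = (ss_lf / df_lf) / (ss_pe / df_pe)
    return ss_lf, ss_pe, f_lof, df_lf, df_pe

# Hypothetical toy data: 3 distinct X levels with 2 replicates each
x = [1, 1, 2, 2, 3, 3]
y = [1.0, 2.0, 2.5, 3.5, 3.0, 5.0]
ss_lf, ss_pe, f_lof, df_lf, df_pe = lack_of_fit(x, y)
print(f"SS(LF) = {ss_lf:.4f}, SS(PE) = {ss_pe:.4f}, F_LOF = {f_lof:.4f}")
```

The helper needs at least three distinct X levels and at least one level with replicates; otherwise one of the degrees of freedom is zero and the ratio is undefined.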
Height and Weight Data: n = 505, c = 18 Groups
Height    n    Mean     SD   Y-hat      SSLF       SSPE        SSE
69        2  182.50   3.54  156.95   1305.39      12.50    1317.89
71        4  175.75  15.52  169.61    150.62     722.75     873.37
72       13  181.00  13.00  175.94    332.27    2028.00    2360.27
73       16  186.13  12.09  182.28    237.15    2191.75    2428.90
74       21  183.33   9.26  188.61    583.79    1716.67    2300.45
75       41  193.71  11.58  194.94     61.96    5360.49    5422.44
76       32  200.84  11.96  201.27      5.74    4434.22    4439.96
77       31  204.13  10.70  207.60    373.06    3433.48    3806.55
78       43  211.00  12.83  213.93    368.86    6912.00    7280.86
79       49  221.35  18.70  220.26     57.94   16781.10   16839.04
80       46  227.33  15.13  226.59     24.90   10300.11   10325.01
81       67  232.49  19.63  232.92     12.30   25430.75   25443.05
82       53  241.49  14.79  239.25    265.64   11369.25   11634.88
83       44  245.66  17.55  245.58      0.26   13241.89   13242.14
84       34  254.62  14.70  251.91    248.66    7128.03    7376.69
85        7  247.86  10.75  258.24    755.21     692.86    1448.07
86        1  278.00   0.00  264.57    180.24       0.00     180.24
87        1  263.00   0.00  270.91     62.50       0.00      62.50
Sum     505       -      -       -  5026.479   111755.8   116782.3

Source      df        SS     MS  F(LOF)  F(.95)  P-value
LackFit     16    5026.5  314.2   1.369   1.664   0.1521
PureError  487  111755.8  229.5

Do not reject $H_0: \mu_j = \beta_0 + \beta_1 X_j$
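The lack-of-fit F statistic can be reproduced from the column totals of the table. This sketch assumes the totals and the critical value 1.664 exactly as printed on the slide; names are illustrative.

```python
n, c = 505, 18                         # observations and distinct height levels
SS_LF, SS_PE = 5026.479, 111755.8      # column totals from the table
SSE = 116782.3                         # error sum of squares of the linear fit

df_lf, df_pe = c - 2, n - c            # 16 and 487
MS_LF, MS_PE = SS_LF / df_lf, SS_PE / df_pe
F_lof = MS_LF / MS_PE
F_crit = 1.664                         # F(.95; 16, 487) as printed on the slide

print(f"MS(LF) = {MS_LF:.1f}, MS(PE) = {MS_PE:.1f}, F_LOF = {F_lof:.3f}")
print(f"reject H0: {F_lof >= F_crit}")
print(f"SS(LF) + SS(PE) = {SS_LF + SS_PE:.1f} vs SSE = {SSE}")
```

The last line checks the decomposition SSE = SS(LF) + SS(PE), which holds up to rounding in the printed totals.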