Logistic Regression - NFL Field Goal Attempts (2003)

advertisement

41.0

42.0

43.0

44.0

45.0

46.0

47.0

48.0

49.0

50.0

51.0

52.0

53.0

54.0

55.0

56.0

57.0

58.0

60.0

62.0

29.0

30.0

31.0

32.0

33.0

34.0

35.0

36.0

37.0

38.0

39.0

40.0

18.0

19.0

20.0

21.0

22.0

23.0

24.0

25.0

26.0

27.0

28.0

Case Study – Logistic Regression

NFL Field Goal Attempts – 2003

Dependent Variable: Outcome of Field Goal Attempt (1=Success,

0=Failure)

Independent Variable: Distance (Yards)

Data:

YARDS * SUCCESS Crosstabulation

Count

YARDS

Total

36

29

25

18

31

21

34

27

14

14

8

6

1

948

1

2

1

2

27

25

32

33

26

18

33

25

33

29

32

28

28

Total

1

33

30

17

27

5

19

18

36

32

32

37

22

SUCCESS

0

15

13

11

6

11

8

9

7

5

5

9

7

1

192

0

2

0

1

6

7

7

4

3

6

3

2

10

2

7

5

7

0

0

1

0

2

1

0

0

1

4

3

0

1

1

21

16

14

12

20

13

25

20

3

1

5

7

0

756

1

0

1

1

20

21

26

26

23

16

30

19

26

24

22

26

21

1

33

28

17

26

5

18

17

36

32

31

33

19

Random Component: Binomial Distribution

Systematic Component: g(

) =

+

X

Link Function: logit g(

) = log



1

 



Regression Equation:

( x )

1 e

   x

 e

   x

Estimated Regression Equation:

Variables in the Equation

1

YARDS

Constant

B

-.110

5.698

S.E.

.011

.451

a. Variable(s) entered on s tep 1: YARDS.

Wald

107.856

159.545

df

1

1

Sig.

.000

.000

Exp(B)

.896

298.235

95.0% C.I.for EXP(B)

Lower

.878

Upper

.915

^

( x )

1 e

5 .

698

0 .

110 x

 e

5 .

698

0 .

110 x

Yardage Probability

20

25

30

35

40

45

50

55

.9706

.9502

.9167

.8639

.7855

.6787

.5493

.4129

95% CI for

: -0.110

1.96(0.011)

-0.110

0.022

(-0.132 , -0.088)

Odds Ratio: OR

 odds ( x

1 ) odds ( x )

1

( x

 

(

 x

1 )

1 )

1

(

 x )

( x )

 e

Estimated odds ratio: e

^

  e

0 .

110 

0 .

896

Odds of successful fieldgoal change by about (0.896-1)100% = -10.4% for each extra yard of the attempt.

95% CI for odds ratio: (e -0.132

, e -0.088

)

(0.876 , 0.916)

Hosmer-Lemeshow Goodness-of-Fit Test:

1.

Lump attempts into 10 bins of approximately equal number of attempts

2.

Obtain the observed and predicted (expected) numbers of successes and failures for each bin (20 total cells)

3.

For each bin, compute the contribution to the chi-square statistic:

X

2 

( observed expected)

2 expected

4.

Sum the results from step 3 over all 20 cells

Hosme r and Leme show Test

St ep

1

Chi-square

5.974

df

8

Sig.

.650

Contingency Table for Hosmer and Lemeshow Test

SUCCESS = 0 SUCCESS = 1

Step

1

1

2

Observed Expected

47 47.043

35 36.501

Observed Expected

45 44.957

57 55.499

Total

92

92

3 28 27.657

58 58.343

86

4 19 22.232

69 65.768

88

5 19 18.588

76 76.412

95

6 17 12.632

67 71.368

84

7 14 10.995

88 91.005

102

8 8 6.779

83 84.221

91

9 3 5.638

103 100.362

106

10 2 3.936

110 108.064

112

Don’t reject the null hypothesis that the model fit is appropriate.

Computational Approach to Obtaining Logistic Regression Analysis

Data: n i

observations at the i th of m distinct levels of the independent variable(s), with y i

successes.

X

 x x

'

1

2

' x m

'

 x i

' 

1 x

1 i

 x pi

 

0

1 p

Note: p =1 in this case

With Likelihood and log-Likelihood Functions:

L (

| ( n

1

, y

1

),..., ( n m

, y m

)) log( L )

 i n 

1

 log( n i

!

)

 log(

 i m 

1

 y i

!

)

 y i

!

( n i n i

!

 log(( n i

 y i

)!





1 exp(

 x i

' exp(

 x i

'

)

)

 y i y i

)!

)

 i n 

1 y i x i

'

 



1

 n  i

1 n i

1 exp( x i

' 

)

 n i

 y i log

1

 exp( x i

'

)

The derivative of the log-likelihood wrt

:

 log( L )

 

  y i x i

  n i

1

1 exp( x i

'

) x i exp( x i

' 

)

 

( y i

 n i

 i

) x i

 g (

)

The Hessian matrix:

 2

'

   n x i

 x i

'

)

 x i

' exp( x i

'

  x i

'

 x i x i

'

)

 x i

'

)

2

   n x x i i i

' exp( x i

'

) 

W ii

 n

 i i

(1

  i

) W ij

1

 x i

'

)

2

0 i

 j

 

Newton-Raphson-Algorithm:

^

NEW

^

OLD

 G



^

OLD 



1 g



^

OLD 



'

G

Results from SAS PROC IML:

BETA_NEW SEBETA

5.6978802 0.4510991

-0.10991 0.0105832

VBETA LOGLIKE

0.2034904 -0.004683 -408.6017

-0.004683 0.000112

Note: There log-likelihood function evaluation does not match SPSS’ (it does not evaluate the first term, which does not involve

). Any tests concerning

will not be effected.

1 y

.8

.6

.4

.2

20 30 40 50 60 70 x

For more detail, see Agresti (2002), Categorical Data Analysis . Chapter 4.

Sources: www.jt-sw.com

, www.espn.com

Download