Caution Flags (Crashes) in NASCAR
Winston Cup Races 1975-1979
L. Winner (2006). “NASCAR Winston Cup Race Results for 1975-2003,” Journal of Statistics Education, Vol. 14, #3, www.amstat.org/publications/jse/v14n3/datasets.winner.html
• Units: NASCAR Winston Cup Races (1975-1979) n=151 Races
• Dependent Variable:
Y=# of Caution Flags/Crashes (CAUTIONS)
• Independent Variables:
$X_1$ = # of Drivers in race (DRIVERS)
$X_2$ = Circumference of Track (TRKLENGTH)
$X_3$ = # of Laps in Race (LAPS)
• Random Component:
Poisson Distribution for # of Caution Flags
• Density Function:
$$P(Y = y \mid X_1, X_2, X_3) = \frac{e^{-\mu(X_1, X_2, X_3)}\,\mu(X_1, X_2, X_3)^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
• Link Function: $g(\mu) = \log(\mu)$
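• Systematic Component:
$$g(\mu) = \log\big(\mu(X_1, X_2, X_3)\big) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \quad\Rightarrow\quad \mu(X_1, X_2, X_3) = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$$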
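The three model components above (Poisson random component, log link, linear systematic component) can be sketched in a few lines of Python. This is an illustrative sketch, not the SAS computation: the helper names `poisson_mean` and `poisson_pmf` are hypothetical, and the usage lines plug in the full-model estimates from the parameter table together with the covariates of the first race in the data listing (35 drivers, track length 2.54, 191 laps).

```python
import math


def poisson_mean(beta, x):
    """mu = exp(beta0 + beta1*x1 + beta2*x2 + ...); x holds the predictors without the intercept."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))  # systematic component g(mu)
    return math.exp(eta)                                      # inverse of the log link


def poisson_pmf(y, mu):
    """P(Y = y) = e^(-mu) mu^y / y!, evaluated on the log scale to avoid overflow in y!."""
    if y < 0:
        return 0.0
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


if __name__ == "__main__":
    # Full-model estimates (Intercept, Drivers, TrkLength, Laps) from the parameter table.
    beta_full = [-0.7963, 0.0365, 0.1145, 0.0026]
    # First race in the data listing: 35 drivers, track length 2.54, 191 laps.
    mu = poisson_mean(beta_full, [35, 2.54, 191])
    print(f"mu = {mu:.3f}")
    for y in range(6):
        print(y, round(poisson_pmf(y, mu), 4))
```

Working on the log scale with `math.lgamma` keeps the probability mass function stable for large counts.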
• $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (# Cautions independent of all predictors)
• $H_A$: Not all $\beta_j = 0$ (# Cautions associated with at least 1 predictor)
• Test Statistic: $X^2_{obs} = -2(L_0 - L_1)$
• P-Value: $P(\chi^2_3 \ge X^2_{obs})$
• Rejection Region: $X^2_{obs} \ge \chi^2_{\alpha,3}$
• Where:
$L_0$ is the maximized log likelihood under model $H_0$
$L_1$ is the maximized log likelihood under model $H_A$
Model: $g(\mu) = \beta_0$ (null model)

Criterion            DF    Value      Value/DF
Deviance             150   215.4915   1.4366
Scaled Deviance      150   215.4915   1.4366
Pearson Chi-Square   150   201.6050   1.3440
Scaled Pearson X2    150   201.6050   1.3440
Log Likelihood             410.8784
Model: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$ (full model)

Criterion            DF    Value      Value/DF
Deviance             147   171.2162   1.1647
Scaled Deviance      147   171.2162   1.1647
Pearson Chi-Square   147   158.8281   1.0805
Scaled Pearson X2    147   158.8281   1.0805
Log Likelihood             433.0160
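Test Statistic: $X^2_{obs} = -2(L_0 - L_1) = -2(410.8784 - 433.0160) = 44.2752$
Rejection Region ($\alpha = 0.05$): $X^2_{obs} \ge \chi^2_{.05,3} = 7.815$
P-value: $P(\chi^2_3 \ge 44.2752) < .0001$
Since 44.2752 ≥ 7.815, reject $H_0$: the number of cautions is associated with at least one of Drivers, Track Length, and Laps.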
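The likelihood-ratio computation above can be checked with a short sketch. It assumes the two maximized log likelihoods are taken straight from the GENMOD tables; the helper names are hypothetical, and the 3-df tail probability uses the closed form $P(\chi^2_3 \ge x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which holds for exactly 3 degrees of freedom.

```python
import math


def lr_statistic(loglik_null, loglik_full):
    """X^2_obs = -2 (L0 - L1), with L0, L1 the maximized log likelihoods under H0 and HA."""
    return -2.0 * (loglik_null - loglik_full)


def chi2_sf_3df(x):
    """Upper-tail probability P(chi^2_3 >= x) for x >= 0 (closed form valid only for df = 3)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


if __name__ == "__main__":
    x2_obs = lr_statistic(410.8784, 433.0160)  # null vs full model log likelihoods
    print(f"X2_obs = {x2_obs:.4f}")
    print("reject H0 at alpha = 0.05:", x2_obs >= 7.815)
    print(f"P-value = {chi2_sf_3df(x2_obs):.2e}")
```

Evaluating `chi2_sf_3df(7.815)` returns roughly 0.05, which is a quick sanity check on the tabled critical value.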
Statistical output obtained from SAS PROC GENMOD
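Tests for Individual Coefficients (Wald tests): $H_0: \beta_j = 0$ vs $H_A: \beta_j \neq 0$
• Test Statistic: $z_{obs} = \dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}$, P-value: $2P(Z \ge |z_{obs}|)$
• Test Statistic: $X^2_{obs} = \left(\dfrac{\hat\beta_j}{\widehat{SE}(\hat\beta_j)}\right)^2$, P-value: $P(\chi^2_1 \ge X^2_{obs})$
• 1-Sided Tests: Confirm the sign of $\hat\beta_j$ is correct, then "cut" the P-value in half.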
Parameter   DF   Estimate   Std Error   Chi-Square   Pr>ChiSq
Intercept   1    -0.7963    0.4117      3.74         0.0531
Drivers     1     0.0365    0.0125      8.55         0.0035
TrkLength   1     0.1145    0.1684      0.46         0.4966
Laps        1     0.0026    0.0008      10.82        0.0010
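The Wald statistics in the table can be reproduced approximately from the printed estimates and standard errors. The sketch below uses the identity $2P(Z \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$; the function name `wald_test` is hypothetical. Because the printed estimates are rounded, the results will not match the SAS Chi-Square column exactly (for Laps, 0.0026/0.0008 gives 10.56 rather than 10.82).

```python
import math


def wald_test(estimate, std_error):
    """Return z_obs, X^2_obs = z_obs^2 and the two-sided P-value 2 P(Z >= |z_obs|)."""
    z = estimate / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, z * z, p_two_sided


# (estimate, standard error) pairs copied from the parameter table
ESTIMATES = {
    "Intercept": (-0.7963, 0.4117),
    "Drivers": (0.0365, 0.0125),
    "TrkLength": (0.1145, 0.1684),
    "Laps": (0.0026, 0.0008),
}

if __name__ == "__main__":
    for name, (est, se) in ESTIMATES.items():
        z, x2, p = wald_test(est, se)
        # One-sided P-value: halve p, valid only if the sign of the estimate matches H_A.
        print(f"{name:10s} z={z:6.2f} X2={x2:6.2f} P={p:.4f} one-sided P={p / 2:.4f}")
```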
Conclude the following:
• Controlling for Track Length and Laps, as Drivers increase, Cautions increase (P = 0.0035)
• Controlling for Drivers and Laps, there is no evidence of an association between Cautions and Track Length (P = 0.4966)
• Controlling for Drivers and Track Length, as Laps increase, Cautions increase (P = 0.0010)
Reduced Model (TrkLength dropped): log(μ̂) = -0.6876 + 0.0428*Drivers + 0.0021*Laps, where μ̂ is the expected number of cautions (crashes) in a race
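With a log link, exponentiating a coefficient gives the multiplicative change in the expected count per unit increase in that predictor. The sketch below applies this standard reading to the reduced model; the helper names are hypothetical, and the example race (35 drivers, 191 laps) is the first observation in the data listing.

```python
import math

# Reduced-model coefficients: log(mu) = -0.6876 + 0.0428*Drivers + 0.0021*Laps
B0, B_DRIVERS, B_LAPS = -0.6876, 0.0428, 0.0021


def expected_cautions(drivers, laps):
    """Fitted mean number of cautions for a race under the reduced model."""
    return math.exp(B0 + B_DRIVERS * drivers + B_LAPS * laps)


def rate_ratio(coef, delta=1.0):
    """Multiplicative change in expected cautions for a `delta`-unit increase in a predictor."""
    return math.exp(coef * delta)


if __name__ == "__main__":
    mu = expected_cautions(35, 191)  # first race in the data listing
    print(f"expected cautions = {mu:.2f}, P(no cautions) = {math.exp(-mu):.3f}")
    print(f"+1 driver multiplies mu by {rate_ratio(B_DRIVERS):.4f}")
    print(f"+100 laps multiplies mu by {rate_ratio(B_LAPS, 100):.4f}")
```

So each additional driver raises the expected number of cautions by about 4.4%, holding laps fixed.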
• Two Common Measures of Goodness of Fit:
– Pearson’s Chi-Square
– Deviance
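• Both measures have approximate chi-square distributions (here with n − p − 1 degrees of freedom: 150 for the null model and 147 for the full model) under the hypothesis that the current model is appropriate, provided the number of distinct combinations of the independent variables is fixed and the counts are large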
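Pearson's Chi-Square:
$$X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\mu_i)^2}{\hat V(\hat\mu_i)}, \qquad \text{where } \hat V(\hat\mu_i) = \hat\mu_i \text{ for the Poisson distribution}$$
Deviance:
$$G^2 = 2\sum_{i=1}^{n} y_i \log\!\left(\frac{y_i}{\hat\mu_i}\right)$$
(Terms with $y_i = 0$ contribute 0. The general Poisson deviance also subtracts $(y_i - \hat\mu_i)$ inside the sum, but those terms sum to zero for a log-link model that includes an intercept.)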
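Both goodness-of-fit measures are simple sums over cases, as the sketch below shows. It implements the general Poisson deviance, including the $-(y_i - \hat\mu_i)$ term, so it does not rely on the intercept/log-link simplification; the function names are hypothetical.

```python
import math


def pearson_x2(y, mu):
    """Pearson chi-square: sum (y_i - mu_i)^2 / V(mu_i), with V(mu) = mu for the Poisson."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def poisson_deviance(y, mu):
    """G^2 = 2 * sum [y_i log(y_i / mu_i) - (y_i - mu_i)]; y log(y/mu) is taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


if __name__ == "__main__":
    y = [2, 0, 5]
    mu = [1.0, 1.0, 5.0]
    print(pearson_x2(y, mu), poisson_deviance(y, mu))
```

In this toy example the observed and fitted totals are equal (7 = 7), so the deviance agrees with the shorter form $2\sum y_i\log(y_i/\hat\mu_i) = 4\ln 2$.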
Null Model
Criterion    DF    Value      Value/DF   P-Value
Pearson X2   150   201.6050   1.3440     0.0032
Deviance     150   215.4915   1.4366     0.0004

Full Model
Criterion    DF    Value      Value/DF   P-Value
Pearson X2   147   158.8281   1.0805     0.2386
Deviance     147   171.2162   1.1647     0.0838
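Note that the null model clearly does not fit well (P = 0.0032 and 0.0004), while for the full model we fail to reject the null hypothesis that the model is appropriate (P = 0.2386 and 0.0838). However, there are many distinct combinations of Laps, Track Length, and Drivers, so the chi-square approximation is questionable here.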
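The P-values in the two tables above are upper-tail chi-square probabilities. A stdlib-only sketch for integer degrees of freedom uses the recurrence $Q_{k+2}(x) = Q_k(x) + (x/2)^{k/2}e^{-x/2}/\Gamma(k/2+1)$, starting from $Q_1(x) = \operatorname{erfc}(\sqrt{x/2})$ or $Q_2(x) = e^{-x/2}$. The function name `chi2_sf` is hypothetical; in practice a statistics library would be used instead.

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df >= x) for integer df >= 1 via the upper incomplete gamma recurrence."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1  # Q_1(x)
    else:
        q, k = math.exp(-half), 2             # Q_2(x)
    while k < df:
        # add (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1), computed on the log scale
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


if __name__ == "__main__":
    rows = [
        ("Null Pearson X2", 201.6050, 150),
        ("Null Deviance", 215.4915, 150),
        ("Full Pearson X2", 158.8281, 147),
        ("Full Deviance", 171.2162, 147),
    ]
    for label, value, df in rows:
        print(f"{label:16s} {value:9.4f} df={df} P={chi2_sf(value, df):.4f}")
```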
options ps=54 ls=76;
data one;
  input serrace 6-8 year 13-16 searace 23-24 drivers 31-32 trklength 34-40
        laps 46-48 road 56 cautions 63-64 leadchng 71-72;
  cards;
1 1975 1 35 2.54 191 1 5 13
...
151 1979 31 37 2.5 200 0 6 35
;
run;

/* Data set one contains the data for analysis. Variable names and column specs are given in the INPUT statement. I have included only the first and last observations. */

/* The following model fits a Generalized Linear Model with a Poisson random component and a constant mean: g(mu)=beta0 is the systematic component and g(mu)=log(mu) is the link function, so mu=exp(beta0). */
proc genmod;
  model Cautions = / dist=poi link=log;
run;

/* The following model fits a Generalized Linear Model with a Poisson random component, systematic component g(mu)=beta0 + beta1*drivers + beta2*trklength + beta3*laps, and link function g(mu)=log(mu), so mu=exp(beta0 + beta1*drivers + beta2*trklength + beta3*laps). */
proc genmod;
  model Cautions = drivers trklength laps / dist=poi link=log;
run;
quit;
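• Used when there are “many” distinct levels of the explanatory variables
• Based on “lumping” cases together according to their predicted values into J groups (J = 10 is often used)
• Compares observed and expected counts by group using Pearson and deviance residuals. For the Poisson model (where $obs_i$ is the observed and $exp_i$ the expected count in group i):
Pearson: $r_i = \dfrac{obs_i - exp_i}{\sqrt{exp_i}}$, $\quad X^2 = \sum_i r_i^2$
Deviance: $d_i = \operatorname{sign}(obs_i - exp_i)\sqrt{2\left[obs_i \log(obs_i/exp_i) - (obs_i - exp_i)\right]}$, $\quad G^2 = \sum_i d_i^2$ (which reduces to $2\sum_i obs_i\log(obs_i/exp_i)$ when the observed and expected totals agree)
Degrees of Freedom: J − p − 1, where p = # of predictor variables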
Fitted values from the reduced model: $\hat\mu_i = e^{-0.6876 + 0.0428 D_i + 0.0021 L_i}$ ($D_i$ = Drivers, $L_i$ = Laps)

Group   Fitted       #Races   #Crashes   Expected   Pearson
1       <3.50        15       37         46.05      -1.33
2       3.50-3.80    14       60         50.37       1.36
3       3.80-4.08    18       72         71.24       0.09
4       4.08-4.25    20       68         84.03      -1.75
5       4.25-4.42    12       51         52.35      -0.19
6       4.42-5.15    17       100        81.39       2.06
7       5.15-5.50    15       88         78.19       1.11
8       5.50-6.25    15       91         87.40       0.38
9       6.25-6.70    14       94         90.81       0.33
10      >6.70        11       63         78.46      -1.75

Pearson X2 = 15.5119 (df = 10 − 2 − 1 = 7), P-value = 0.0300
Note that there is evidence (P = 0.03 < 0.05) that the Poisson model does not provide a good fit.
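The grouped comparison can be sketched as below. The helper names are hypothetical, and the grouping rule (sort cases by fitted mean, then cut into J nearly equal-sized groups) is an assumption: the groups in the table above hold between 11 and 20 races, so the slide's cutpoints were evidently chosen differently. The usage lines feed in the observed and expected counts from the table.

```python
import math


def group_by_fitted(y, mu, n_groups=10):
    """Sort cases by fitted mean, split into n_groups nearly equal-sized groups,
    and return (observed total, expected total) for each group."""
    order = sorted(range(len(mu)), key=lambda i: mu[i])
    n = len(order)
    groups = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        groups.append((sum(y[i] for i in idx), sum(mu[i] for i in idx)))
    return groups


def grouped_fit_statistics(groups, n_predictors):
    """Pearson X^2 = sum r_i^2, deviance G^2 = sum d_i^2 over groups, and df = J - p - 1."""
    x2 = 0.0
    g2 = 0.0
    for obs, exp in groups:
        x2 += (obs - exp) ** 2 / exp
        term = obs * math.log(obs / exp) if obs > 0 else 0.0
        g2 += 2.0 * (term - (obs - exp))
    df = len(groups) - n_predictors - 1
    return x2, g2, df


if __name__ == "__main__":
    observed = [37, 60, 72, 68, 51, 100, 88, 91, 94, 63]
    expected = [46.05, 50.37, 71.24, 84.03, 52.35, 81.39, 78.19, 87.40, 90.81, 78.46]
    x2, g2, df = grouped_fit_statistics(list(zip(observed, expected)), n_predictors=2)
    print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}")
```

Because the table's expected counts are rounded to two decimals, the recomputed X2 (about 15.51) differs slightly from the reported 15.5119.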
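Computational Approach
• Poisson Probability Mass Function: $P(Y = y) = \dfrac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots$
• Systematic Component: $g(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$
• Link Function: $g(\mu) = \log(\mu) \;\Rightarrow\; \mu = e^{g(\mu)} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$

For Subject i: $\mathbf{x}_i = [1 \;\; X_{1i} \;\; X_{2i} \;\; X_{3i}]'$, so $g(\mu_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} = \mathbf{x}_i'\boldsymbol\beta$, where
$$\boldsymbol\beta = \begin{bmatrix}\beta_0\\ \beta_1\\ \beta_2\\ \beta_3\end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix}\mathbf{x}_1'\\ \mathbf{x}_2'\\ \vdots\\ \mathbf{x}_n'\end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}, \quad \boldsymbol\mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots\\ \mu_n\end{bmatrix}, \quad \mu_i = e^{\mathbf{x}_i'\boldsymbol\beta}$$

Likelihood Function:
$$L(\boldsymbol\beta \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} = \prod_{i=1}^{n} \frac{\exp\!\left(-e^{\mathbf{x}_i'\boldsymbol\beta}\right)\left(e^{\mathbf{x}_i'\boldsymbol\beta}\right)^{y_i}}{y_i!}$$

Log Likelihood:
$$l = l(\boldsymbol\beta) = \ln L = -\sum_{i=1}^{n} e^{\mathbf{x}_i'\boldsymbol\beta} + \sum_{i=1}^{n} y_i\,\mathbf{x}_i'\boldsymbol\beta - \sum_{i=1}^{n} \ln(y_i!)$$

Score:
$$\frac{\partial l}{\partial\boldsymbol\beta} = -\sum_{i=1}^{n} \mathbf{x}_i e^{\mathbf{x}_i'\boldsymbol\beta} + \sum_{i=1}^{n} y_i\mathbf{x}_i = \sum_{i=1}^{n} \mathbf{x}_i (y_i - \mu_i) = \sum_{i=1}^{n} (y_i - \mu_i)\begin{bmatrix}1\\ X_{1i}\\ X_{2i}\\ X_{3i}\end{bmatrix}$$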
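The log likelihood and score can be evaluated directly from these formulas, as in the sketch below (hypothetical helper names; rows of `X` include the leading 1). The `include_constant` flag exists because the Log Likelihood values in the GENMOD tables appear to be the kernel without $-\sum \ln(y_i!)$: for the null model the kernel is $\sum y_i \ln\bar y - \sum y_i$, and with the 724 cautions in the grouped table's #Crashes column spread over 151 races it comes to about 410.88, matching 410.8784. The omitted constant cancels in the likelihood-ratio statistic.

```python
import math


def linear_predictor(beta, xi):
    """x_i' beta for one row xi = [1, X_1i, X_2i, X_3i]."""
    return sum(b * v for b, v in zip(beta, xi))


def log_likelihood(beta, X, y, include_constant=True):
    """l(beta) = sum [y_i x_i'beta - exp(x_i'beta)] - sum ln(y_i!)."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(beta, xi)
        total += yi * eta - math.exp(eta)
        if include_constant:
            total -= math.lgamma(yi + 1)  # ln(y_i!)
    return total


def score(beta, X, y):
    """dl/dbeta = sum x_i (y_i - mu_i) = X'(Y - mu)."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        resid = yi - math.exp(linear_predictor(beta, xi))
        for j, xij in enumerate(xi):
            grad[j] += xij * resid
    return grad


if __name__ == "__main__":
    X = [[1.0]] * 4              # intercept-only design (toy data, not the NASCAR races)
    y = [1, 2, 3, 6]
    beta = [math.log(3.0)]       # ln(ybar), the intercept-only MLE
    print(score(beta, X, y), log_likelihood(beta, X, y, include_constant=False))
```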
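Setting the score to zero gives the likelihood equations:
$$\frac{\partial l}{\partial\boldsymbol\beta} = \sum_{i=1}^{n} \mathbf{x}_i (y_i - \mu_i) = \mathbf{X}'(\mathbf{Y} - \boldsymbol\mu) = \mathbf{0}$$
Second derivatives:
$$\frac{\partial^2 l}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\sum_{i=1}^{n} \mathbf{x}_i e^{\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i' = -\mathbf{X}'\mathbf{W}\mathbf{X}, \qquad \text{where } \mathbf{W} = \operatorname{diag}(\mu_1, \ldots, \mu_n)$$
Setting $\mathbf{G} = \mathbf{X}'\mathbf{W}\mathbf{X}$ and $\mathbf{g} = \mathbf{X}'(\mathbf{Y} - \boldsymbol\mu)$ with
$$\hat{\boldsymbol\beta}^{New} = \hat{\boldsymbol\beta}^{Old} + \mathbf{G}\big(\hat{\boldsymbol\beta}^{Old}\big)^{-1}\,\mathbf{g}\big(\hat{\boldsymbol\beta}^{Old}\big)$$
and a reasonable starting vector of $\hat{\boldsymbol\beta}^{(0)} = [\ln(\bar y) \;\; 0 \;\; 0 \;\; 0]'$ leads to the estimate $\hat{\boldsymbol\beta}$ via the Newton-Raphson algorithm, with approximate large-sample estimated variance-covariance matrix:
$$\hat V\big(\hat{\boldsymbol\beta}\big) = \mathbf{G}\big(\hat{\boldsymbol\beta}\big)^{-1} = \left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1}$$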
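A minimal Newton-Raphson sketch of the algorithm above, in stdlib-only Python, follows. It is an illustration under stated assumptions, not how PROC GENMOD is implemented: the helper names are hypothetical, the small linear systems are solved by Gaussian elimination with partial pivoting, and the demonstration uses toy data with one binary predictor, whose MLEs are known to be logs of the group means ($\ln 1.5$ and $\ln(8/3)$).

```python
import math


def solve(A, b):
    """Solve A v = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def score_and_information(beta, X, y):
    """Return g = X'(Y - mu) and G = X'WX with W = diag(mu_i), mu_i = exp(x_i'beta)."""
    p = len(beta)
    mu = [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in X]
    g = [sum(xi[j] * (yi - mi) for xi, yi, mi in zip(X, y, mu)) for j in range(p)]
    G = [[sum(xi[j] * mi * xi[k] for xi, mi in zip(X, mu)) for k in range(p)] for j in range(p)]
    return g, G


def fit_poisson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for log-link Poisson regression; each row of X starts with 1."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # starting vector [ln(ybar), 0, ..., 0]'
    for _ in range(max_iter):
        g, G = score_and_information(beta, X, y)
        step = solve(G, g)  # G(beta_old)^{-1} g(beta_old)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, G = score_and_information(beta, X, y)
    # Estimated covariance matrix G(beta_hat)^{-1}, built column by column from G c_k = e_k.
    cols = [solve(G, [1.0 if j == k else 0.0 for j in range(p)]) for k in range(p)]
    cov = [[cols[k][j] for k in range(p)] for j in range(p)]
    return beta, cov


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # toy design, not the NASCAR data
    y = [1, 3, 2, 5]
    beta_hat, cov = fit_poisson(X, y)
    se = [math.sqrt(cov[j][j]) for j in range(len(beta_hat))]
    print("beta_hat:", beta_hat)
    print("SE:", se)
```

Since the Poisson log likelihood is concave in $\boldsymbol\beta$ under the log link, plain Newton steps from $[\ln(\bar y), 0, \ldots, 0]'$ usually converge in a handful of iterations. A production implementation would add step-halving as a safeguard.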