Negative Binomial Regression - NASCAR Lead Changes (1975

Negative Binomial Regression NASCAR Lead Changes 1975-1979 Data Description • Units – 151 NASCAR races during the 19751979 Seasons • Response - # of Lead Changes in a Race • Predictors:  # Laps in the Race  # Drivers in the Race  Track Length (Circumference, in miles)  Models:  Poisson (assumes E(Y) = V(Y))  Negative Binomial (Allows for V(Y) > E(Y)) Poisson Regression • Random Component: Poisson Distribution for # of Lead Changes • Systematic Component: Linear function with Predictors: Laps, Drivers, Trklength • Link Function: log: g(m) = ln(m) y m X   m  X   e Mass Function: P Y  y | X 1 , X 2 , X 3   y! g  m  X      1 X 1   2 X 2  3 X 3  x '   m  X   e  1 X1  2 X 2  3 X 3  e x '  x '  1 X 1 X 2 X 3  Regression Coefficients – Z-tests Parameter Intercept Laps Drivers Trklength Estimate -0.4903 0.0021 0.0516 0.6104 Std Error 0.2178 0.0004 0.0057 0.0829 Z -2.25 5.15 9.09 7.36 Note: All predictors are highly significant. Holding all other factors constant: • As # of laps increases, lead changes increase • As # of drivers increases, lead changes increase • As Track Length increases, lead changes increase m e 0.49030.0021L0.0516 D0.6104T P-value .0244 <.0001 <.0001 <.0001 Testing Goodness-of-Fit • Break races down into 10 groups of approximately equal size based on their fitted values • The Pearson residuals are obtained by computing: ^ ^ Yi  m i Yi  m i observed- fitted ei    ^ V (Yi ) fitted mi X 2   ei2 • Under the hypothesis that the model is adequate, X2 is approximately chi-square with 10-4=6 degrees of freedom (10 cells, 4 estimated parameters). • The critical value for an =0.05 level test is 12.59. • The data (next slide) clearly are not consistent with the model. • Note that the variances within each group are orders of magnitude larger than the mean. Testing Goodness-of-Fit Range 0-9.4 9.4-10.5 10.5-11.6 11.6-20 20-21 21-23 23-26 26-32 32-36 36+ Total #Races 15 14 14 17 19 15 16 16 11 13 151 obs fit 113 138 178 321 485 191 353 491 349 574 131.3 150.6 157.1 274.4 390.3 328.7 397.1 452.9 374.2 536.4 Pearson -1.60 -1.03 1.67 2.81 4.79 -7.60 -2.21 1.79 -1.30 1.62 2 X =107.4 Mean 7.53 9.20 12.71 18.88 25.53 12.73 22.06 30.69 31.73 44.15 Variance 23.41 34.46 41.30 56.36 89.93 48.21 74.33 183.70 201.82 229.47 107.4 >> 12.59  Data are not consistent with Poisson model Negative Binomial Regression • Random Component: Negative Binomial Distribution for # of Lead Changes • Systematic Component: Linear function with Predictors: Laps, Drivers, Trklength • Link Function: log: g(m) = ln(m)  y  k k  k   m  Mass Function: P Y  y | X 1 , X 2 , X 3 , k         k    y  1  k  m   k  m  E Y   m V Y   m  m2 k g  m  X      1 X 1   2 X 2  3 X 3  x '   m  X   e  1 X1  2 X 2  3 X 3  e x '  x '  1 X 1 X 2 X 3  y y  0,1, 2,... Regression Coefficients – Z-tests Note that SAS and STATA estimate 1/k in this model. Parameter Intercept Laps Drivers Trklength 1/k m e Estimate -0.5038 0.0017 0.0597 0.5153 0.1905 Std Error 0.4616 0.0009 0.0143 0.1636 0.0294 Z -1.09 2.01 4.17 2.87 0.5038 0.0017 L  0.0597 D  0.5153T   V Y   m  0.1905 m 2 P-value .2752 .0447 <.0001 .0041 Goodness-of-Fit Test Pearson Residuals: ei  Range 0-9.4 9.4-10.5 10.5-11.6 11.6-20 20-21 21-23 23-26 26-32 32-36 36+ Total #Races 13 17 20 11 21 12 18 16 14 9 151 Yi  m i V Yi  obs fit 96 155 248 251 523 141 442 470 445 422 111.4 170.2 223.3 202.4 431.5 261.8 452.0 464.3 485.3 397.5  Yi  m i 2 mi  mi k Pearson -0.30952 -0.20153 0.250504 0.543148 0.482911 -1.04674 -0.0504 0.02797 -0.18924 0.140292 X2=1.88 X 2   ei2 Mean 7.38 9.12 12.40 22.82 24.90 11.75 24.56 29.38 31.79 46.89 S.D. 4.22 6.10 5.87 5.53 9.55 6.44 10.98 14.83 14.12 13.82 • Clearly this model fits better than Poisson Regression Model. • For the negative binomial model, SD/mean is estimated to be 0.43 = sqrt(1/k). • For these 10 cells, ratios range from 0.24 to 0.67, consistent with that value. Computational Aspects - I k is restricted to be positive, so we estimate k* = log(k) which can take on any value. Note that software packages estimating 1/k are estimating –k* Likelihood Function: k yi k ( yi  k )  k   mi  ( yi  k  1) (k )(k )  k   mi  Li           (k )( yi  1)  k  mi   k  mi  (k )( yi  1) k  m k  m i   i   ( y  k  1)  i yi ! k yi ( k )  k   mi       k  m k  m i   i   ( yi  e  1) yi ! k* ek* yi   mi  e  e  k*   k*  e  m e  m i  i    k* k* yi Log-Likelihood Function: yi 1 li  ln  Li    ln(ek*  j )  ln  yi !  ek* ln(ek* )  yi ln( mi )  (ek*  yi ) ln( mi  ek* ) j 0 Computational Aspects - II Derivatives wrt k* and : yi 1  li e k *  yi 1 k*  k*  e  k *  1  ln(e )  k *  ln(e k *  mi )  k * e  mi  j 0 e  j   yi 1   yi 1  2li e k *  yi mi  yi  1 1 ek*  k*  k* k* k* k*   e  k *  1  ln(e )  k *  ln(e  mi )  e  k * 1 e   2 k* 2    (k *) 2 e  j e  m ( e  j ) m  ek*  j 0 i i m  e  j 0    i      2li y  m k* i i   xi e mi    m  e k * 2  k *   i   y m  li  xi e k *  i ki*    mi  e   k*   2li e  yi  k*    xi xi ' e mi   m  e k * 2    '  i  Computational Aspects - III Newton-Raphson Algorithm Steps: l gk   i k *  2li Gk   k *2 l g   i   2li G    ' g  g k      gk  G k  G    ' 2    l i     k *   Step 1: Set k*=0 (k=1) and iterate to obtain estimate of : Step 2: Set ’ = [1 0 0 0] and iterate to obtain estimate of k*:  2li   k *      Gk *  ~ (i )   ~ (i ) k* ~ ( i 1) ~ ( i 1) k*    G  ~ ( i 1)     Gk 1 g k      k * Step 4: Back-transform k* to get estimate of k: k=exp(k*) g  Gk  g k Step 3: Use results from steps and 2 as starting values (software packages seem to use different intercept) to obtain estimates of k* and  ~ (i ) 1 1

Negative Binomial Regression - NASCAR Lead Changes (1975

Related documents

Products

Support

Negative Binomial Regression - NASCAR Lead Changes (1975

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib