Gamma Regression - Napa Valley Marathon Speeds by Age and Gender (2015)

Gamma Regression
Marathon Miles per Hour at the 2015 Napa Valley Marathon by Age and Gender
Data Description
• Y = Race speed (miles per hour) at the 2015 Napa Valley Marathon (3/1/2015) for 1882 runners (977 M, 905 F)
• Predictor variables:
  - Age in years (18-76)
  - Gender (1 = Male, 0 = Female)
  - Age x Gender interaction
• Distribution: Gamma (strictly positive values, skewed right)
• Potential link functions:
  - Inverse link (canonical)
  - Log link
[Figures: 1/mph vs Age - Females; 1/mph vs Age - Males; ln(mph) vs Age - Females; ln(mph) vs Age - Males]
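Gamma Distribution – Likelihood Function
$$f(y \mid \alpha,\beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta} \qquad y > 0;\ \alpha,\beta > 0$$
$$\int_0^\infty y^{\alpha-1}e^{-y/\beta}\,dy = \Gamma(\alpha)\beta^{\alpha}$$
$$E(Y) = \mu_Y = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^\infty y^{\alpha}e^{-y/\beta}\,dy = \frac{\Gamma(\alpha+1)\beta^{\alpha+1}}{\Gamma(\alpha)\beta^{\alpha}} = \alpha\beta$$
$$E(Y^2) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_0^\infty y^{\alpha+1}e^{-y/\beta}\,dy = \frac{\Gamma(\alpha+2)\beta^{\alpha+2}}{\Gamma(\alpha)\beta^{\alpha}} = \alpha(\alpha+1)\beta^2$$
$$V(Y) = \sigma_Y^2 = E(Y^2) - [E(Y)]^2 = \alpha(\alpha+1)\beta^2 - \alpha^2\beta^2 = \alpha\beta^2 = \frac{\mu_Y^2}{\alpha}$$
Let $\phi = 1/\alpha$, so that $\alpha = 1/\phi$, $\beta = \phi\mu_Y$ and $\sigma_Y^2 = \phi\mu_Y^2$.
Likelihood function:
$$L_i = \frac{1}{\Gamma(1/\phi)\,y_i}\left(\frac{y_i}{\phi\,\mu_i}\right)^{1/\phi}\exp\left(-\frac{y_i}{\phi\,\mu_i}\right)$$
Inverse Link: $g(\mu_i) = \dfrac{1}{\mu_i} = \mathbf{x}_i'\boldsymbol\beta \;\Rightarrow\; \mu_i = \dfrac{1}{\mathbf{x}_i'\boldsymbol\beta}$
Log Link: $g(\mu_i) = \ln(\mu_i) = \mathbf{x}_i'\boldsymbol\beta \;\Rightarrow\; \mu_i = e^{\mathbf{x}_i'\boldsymbol\beta}$
Link Function: $g(\mu_i) = \beta_0 + \beta_1X_{i1} + \cdots + \beta_pX_{ip} = \mathbf{x}_i'\boldsymbol\beta$
Inverse link:
$$L_i = \frac{1}{\Gamma(1/\phi)\,y_i}\left(\frac{y_i\,\mathbf{x}_i'\boldsymbol\beta}{\phi}\right)^{1/\phi}\exp\left(-\frac{y_i\,\mathbf{x}_i'\boldsymbol\beta}{\phi}\right)$$
Log link:
$$L_i = \frac{1}{\Gamma(1/\phi)\,y_i}\left(\frac{y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}}{\phi}\right)^{1/\phi}\exp\left(-\frac{y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}}{\phi}\right)$$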
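A quick numerical check of the reparameterization $\alpha = 1/\phi$, $\beta = \phi\mu$: the minimal Python sketch below evaluates the gamma log-density in both forms and writes out the mean function implied by each link. The function names and example values (y = 6.0, μ = 6.2, φ = 0.0288) are illustrative assumptions, not part of the original analysis.

```python
import math

def gamma_logpdf_shape_scale(y, alpha, beta):
    """ln f(y | alpha, beta) for the shape/scale form of the gamma density."""
    return (-math.lgamma(alpha) - alpha * math.log(beta)
            + (alpha - 1.0) * math.log(y) - y / beta)

def gamma_logpdf_mu_phi(y, mu, phi):
    """ln L_i in the (mu, phi) form: alpha = 1/phi, beta = phi*mu, V(Y) = phi*mu**2."""
    a = 1.0 / phi
    return (-math.lgamma(a) - math.log(y)
            + a * (math.log(y) - math.log(phi) - math.log(mu))
            - y / (phi * mu))

def mu_from_inverse_link(eta):
    """Inverse link: 1/mu_i = x_i'beta, so mu_i = 1/(x_i'beta); needs eta > 0."""
    return 1.0 / eta

def mu_from_log_link(eta):
    """Log link: ln(mu_i) = x_i'beta, so mu_i = exp(x_i'beta)."""
    return math.exp(eta)

# Illustrative values only.
y, mu, phi = 6.0, 6.2, 0.0288
lp_mu_phi = gamma_logpdf_mu_phi(y, mu, phi)
lp_shape_scale = gamma_logpdf_shape_scale(y, 1.0 / phi, phi * mu)
print(lp_mu_phi, lp_shape_scale)  # the two forms should agree up to rounding
```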
 
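Gamma Distribution – Inverse Link
$$\mathbf{x}_i'\boldsymbol\beta = \begin{bmatrix}1 & X_{i1} & \cdots & X_{ip}\end{bmatrix}\begin{bmatrix}\beta_0\\ \beta_1\\ \vdots\\ \beta_p\end{bmatrix} = \beta_0 + \beta_1X_{i1} + \cdots + \beta_pX_{ip}$$
$$\frac{\partial\,\mathbf{x}_i'\boldsymbol\beta}{\partial\boldsymbol\beta} = \begin{bmatrix}\partial(\beta_0 + \beta_1X_{i1} + \cdots + \beta_pX_{ip})/\partial\beta_0\\ \partial(\beta_0 + \beta_1X_{i1} + \cdots + \beta_pX_{ip})/\partial\beta_1\\ \vdots\\ \partial(\beta_0 + \beta_1X_{i1} + \cdots + \beta_pX_{ip})/\partial\beta_p\end{bmatrix} = \begin{bmatrix}1\\ X_{i1}\\ \vdots\\ X_{ip}\end{bmatrix} = \mathbf{x}_i \qquad \frac{\partial\,\mathbf{x}_i'\boldsymbol\beta}{\partial\boldsymbol\beta'} = \mathbf{x}_i'$$
Inverse Link: $g(\mu_i) = \dfrac{1}{\mu_i} = \mathbf{x}_i'\boldsymbol\beta$
$$L_i = \frac{1}{\Gamma(1/\phi)\,y_i}\left(\frac{y_i\,\mathbf{x}_i'\boldsymbol\beta}{\phi}\right)^{1/\phi}\exp\left(-\frac{y_i\,\mathbf{x}_i'\boldsymbol\beta}{\phi}\right)$$
log-Likelihood (Inverse Link):
$$l_i = \ln L_i = -\ln y_i - \ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\ln y_i + \ln\left(\mathbf{x}_i'\boldsymbol\beta\right) - \ln\phi - y_i\,\mathbf{x}_i'\boldsymbol\beta\right]$$
$$\frac{\partial l_i}{\partial\boldsymbol\beta} = \frac{1}{\phi}\left[\frac{1}{\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i - y_i\mathbf{x}_i\right] = \frac{1}{\phi}\left[\frac{1}{\mathbf{x}_i'\boldsymbol\beta} - y_i\right]\mathbf{x}_i \qquad \mathbf{g}(\boldsymbol\beta) = \sum_{i=1}^n\frac{\partial l_i}{\partial\boldsymbol\beta} = \frac{1}{\phi}\sum_{i=1}^n\left[\frac{1}{\mathbf{x}_i'\boldsymbol\beta} - y_i\right]\mathbf{x}_i$$
$$\frac{\partial^2 l_i}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\frac{1}{\phi}\frac{1}{\left(\mathbf{x}_i'\boldsymbol\beta\right)^2}\mathbf{x}_i\mathbf{x}_i' \qquad \mathbf{G}(\boldsymbol\beta) = \sum_{i=1}^n\frac{\partial^2 l_i}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\frac{1}{\phi}\sum_{i=1}^n\frac{1}{\left(\mathbf{x}_i'\boldsymbol\beta\right)^2}\mathbf{x}_i\mathbf{x}_i' = -\frac{1}{\phi}\mathbf{X}'\mathbf{W}\mathbf{X}$$
$$\mathbf{X} = \begin{bmatrix}\mathbf{x}_1'\\ \vdots\\ \mathbf{x}_n'\end{bmatrix} \qquad W_{ij} = \begin{cases}\left(\mathbf{x}_i'\boldsymbol\beta\right)^{-2} & i = j\\ 0 & i \neq j\end{cases}$$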
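The score $\mathbf{g}(\boldsymbol\beta)$ and the matrix $\mathbf{X}'\mathbf{W}\mathbf{X}$ can be checked numerically. The standalone Python sketch below (hypothetical helper names; a made-up four-runner data set whose X columns are (1, Age)) computes the inverse-link log-likelihood, its analytic gradient and X'WX, then compares the gradient with a central finite difference of the log-likelihood.

```python
import math

def loglik_inverse(beta, X, y, phi):
    """Sum of l_i for the inverse link, mu_i = 1/(x_i'beta)."""
    a = 1.0 / phi
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))  # x_i'beta, must be > 0
        total += (-math.lgamma(a) - math.log(yi)
                  + a * (math.log(yi) + math.log(eta) - math.log(phi) - yi * eta))
    return total

def score_inverse(beta, X, y, phi):
    """g(beta) = (1/phi) * sum_i (1/(x_i'beta) - y_i) x_i."""
    p = len(beta)
    g = [0.0] * p
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        r = 1.0 / eta - yi
        for j in range(p):
            g[j] += r * xi[j] / phi
    return g

def xtwx_inverse(beta, X):
    """X'WX with W = diag(1/(x_i'beta)^2); the Hessian is G(beta) = -(1/phi) X'WX."""
    p = len(beta)
    M = [[0.0] * p for _ in range(p)]
    for xi in X:
        eta = sum(b * x for b, x in zip(beta, xi))
        w = 1.0 / eta ** 2
        for j in range(p):
            for k in range(p):
                M[j][k] += w * xi[j] * xi[k]
    return M

# Made-up data: columns of X are (1, Age); y is speed in mph.
X = [[1.0, 25.0], [1.0, 40.0], [1.0, 55.0], [1.0, 70.0]]
y = [6.8, 6.4, 5.9, 5.1]
beta = [0.14, 0.0005]
phi = 0.03

# Central finite-difference check of the analytic score.
h = 1e-7
g = score_inverse(beta, X, y, phi)
num = []
for j in range(len(beta)):
    bp = list(beta)
    bp[j] += h
    bm = list(beta)
    bm[j] -= h
    num.append((loglik_inverse(bp, X, y, phi) - loglik_inverse(bm, X, y, phi)) / (2 * h))
print(g, num)
```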
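Newton-Raphson Algorithm:
$$\tilde{\boldsymbol\beta}^{New} = \tilde{\boldsymbol\beta}^{Old} - \left[\mathbf{G}\left(\tilde{\boldsymbol\beta}^{Old}\right)\right]^{-1}\mathbf{g}\left(\tilde{\boldsymbol\beta}^{Old}\right)$$
Starting values: choose $\tilde{\boldsymbol\beta}^{Old}$, compute $\tilde{\boldsymbol\beta}^{New}$, set $\tilde{\boldsymbol\beta}^{Old} = \tilde{\boldsymbol\beta}^{New}$ and iterate to convergence. Note that $\phi$ "cancels out" in the algorithm, because
$$\left[\mathbf{G}(\boldsymbol\beta)\right]^{-1}\mathbf{g}(\boldsymbol\beta) = \left[-\frac{1}{\phi}\mathbf{X}'\mathbf{W}\mathbf{X}\right]^{-1}\frac{1}{\phi}\sum_{i=1}^n\left[\frac{1}{\mathbf{x}_i'\boldsymbol\beta} - y_i\right]\mathbf{x}_i = -\left(\mathbf{X}'\mathbf{W}\mathbf{X}\right)^{-1}\sum_{i=1}^n\left[\frac{1}{\mathbf{x}_i'\boldsymbol\beta} - y_i\right]\mathbf{x}_i$$
For the inverse link, if there were no age or gender effects, $1/\mu = \beta_0$, which is estimated by $1/\bar y$, so start with:
$$\tilde{\boldsymbol\beta}^{Old} = \begin{bmatrix}1/\bar y\\ 0\\ 0\\ 0\end{bmatrix}$$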
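The Newton-Raphson recursion for the inverse link can be sketched in a few lines of Python. Because $\phi$ cancels, the update is $\tilde{\boldsymbol\beta}^{New} = \tilde{\boldsymbol\beta}^{Old} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\sum_i[1/(\mathbf{x}_i'\boldsymbol\beta) - y_i]\mathbf{x}_i$. The data are made up and the helper names (`solve`, `fit_gamma_inverse`) are hypothetical; the sketch simply raises an error if a step makes some $\mathbf{x}_i'\boldsymbol\beta$ non-positive rather than implementing step-halving.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_gamma_inverse(X, y, tol=1e-12, max_iter=50):
    """Newton-Raphson for a gamma GLM with inverse link, 1/mu_i = x_i'beta."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    beta = [1.0 / ybar] + [0.0] * (p - 1)  # start: (1/ybar, 0, ..., 0)'
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
        if min(eta) <= 0:
            raise ValueError("x_i'beta must stay positive under the inverse link")
        # phi * g(beta) = sum_i (1/eta_i - y_i) x_i
        u = [sum((1.0 / e - yi) * xi[j] for xi, yi, e in zip(X, y, eta)) for j in range(p)]
        # X'WX with W = diag(1/eta_i^2), so G(beta) = -(1/phi) X'WX
        A = [[sum(xi[j] * xi[k] / e ** 2 for xi, e in zip(X, eta)) for k in range(p)]
             for j in range(p)]
        # beta_new = beta_old - G^{-1} g = beta_old + (X'WX)^{-1} u; phi cancels
        step = solve(A, u)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Made-up data: columns of X are (1, Age); y is speed in mph.
ages = [25.0, 40.0, 55.0, 70.0, 35.0, 60.0]
y = [6.8, 6.4, 5.9, 5.1, 6.1, 5.5]
X = [[1.0, a] for a in ages]
beta_hat = fit_gamma_inverse(X, y)
print(beta_hat)
```

With an intercept-only design the starting value $1/\bar y$ already solves the score equation, so the loop stops after one (zero-length) step.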
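Labelling the final converged estimate as $\hat{\boldsymbol\beta}$, we obtain its estimated variance-covariance matrix as:
$$\hat V\left(\hat{\boldsymbol\beta}\right) = \left[-E\left(\mathbf{G}(\boldsymbol\beta)\right)\right]^{-1}_{\boldsymbol\beta = \hat{\boldsymbol\beta}} = \hat\phi\left(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X}\right)^{-1} \qquad \mathbf{X} = \begin{bmatrix}\mathbf{x}_1'\\ \vdots\\ \mathbf{x}_n'\end{bmatrix} \qquad \hat W_{ij} = \begin{cases}\left(\mathbf{x}_i'\hat{\boldsymbol\beta}\right)^{-2} & i = j\\ 0 & i \neq j\end{cases}$$
where $\phi$ is estimated below.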
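A minimal sketch of $\hat V(\hat{\boldsymbol\beta}) = \hat\phi(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}$ for the inverse link, using a hand-rolled Gauss-Jordan inverse (the helper names are hypothetical). For an intercept-only model with $\hat\beta_0 = 1/\bar y$, $\mathbf{X}'\hat{\mathbf{W}}\mathbf{X} = n\bar y^2$, so $SE(\hat\beta_0) = \sqrt{\hat\phi/n}/\bar y$, which the example uses as a check.

```python
import math

def invert(A):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def cov_inverse_link(beta_hat, X, phi_hat):
    """phi_hat * (X' W_hat X)^{-1} with W_hat = diag(1/(x_i'beta_hat)^2)."""
    p = len(beta_hat)
    eta = [sum(b * x for b, x in zip(beta_hat, xi)) for xi in X]
    A = [[sum(xi[j] * xi[k] / e ** 2 for xi, e in zip(X, eta)) for k in range(p)]
         for j in range(p)]
    return [[phi_hat * v for v in row] for row in invert(A)]

# Intercept-only check with made-up speeds and phi_hat = 0.03.
y = [5.0, 6.0, 7.0, 8.0]
ybar = sum(y) / len(y)
V = cov_inverse_link([1.0 / ybar], [[1.0] for _ in y], 0.03)
se_b0 = math.sqrt(V[0][0])
print(se_b0, math.sqrt(0.03 / len(y)) / ybar)
```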
Estimating $\phi$ (Method of Moments):
$$E(Y_i) = \mu_i \qquad V(Y_i) = \phi\mu_i^2$$
$$E\left(\frac{Y_i}{\mu_i}\right) = 1 \qquad V\left(\frac{Y_i}{\mu_i}\right) = \phi \qquad\Rightarrow\qquad E\left(\frac{Y_i - \mu_i}{\mu_i}\right) = 0 \qquad V\left(\frac{Y_i - \mu_i}{\mu_i}\right) = E\left[\left(\frac{Y_i - \mu_i}{\mu_i}\right)^2\right] = \phi$$
$$\hat\phi = \frac{1}{n - p'}\sum_{i=1}^n\left(\frac{Y_i - \hat\mu_i}{\hat\mu_i}\right)^2$$
where $p' = p + 1$ is the number of regression coefficients, and for the inverse link $\hat\mu_i = \dfrac{1}{\mathbf{x}_i'\hat{\boldsymbol\beta}}$.
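The estimator itself is a one-liner. The sketch below also runs a small simulation (an illustrative check, not part of the original analysis) that draws gamma variates with shape $1/\phi$ and scale $\phi\mu$ through Python's `random.gammavariate(alpha, beta)`, whose mean is alpha*beta, and confirms that the average of $((Y - \mu)/\mu)^2$ is close to $\phi$. The values φ = 0.03 and μ = 6.2 are assumptions chosen to resemble the race data.

```python
import random

def phi_hat(y, mu_hat, p_prime):
    """Method-of-moments estimate: sum(((y_i - mu_i)/mu_i)^2) / (n - p')."""
    n = len(y)
    return sum(((yi - mi) / mi) ** 2 for yi, mi in zip(y, mu_hat)) / (n - p_prime)

# Simulation check: Y ~ Gamma(shape = 1/phi, scale = phi*mu), so E(Y) = mu, V(Y) = phi*mu^2.
rng = random.Random(2015)
phi_true, mu = 0.03, 6.2
ys = [rng.gammavariate(1.0 / phi_true, phi_true * mu) for _ in range(100_000)]
est = phi_hat(ys, [mu] * len(ys), 0)  # mu is known here, so no coefficients are estimated
print(est)
```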
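Gamma Distribution – Log Link
Log Link: $g(\mu_i) = \ln(\mu_i) = \mathbf{x}_i'\boldsymbol\beta \;\Rightarrow\; \mu_i = e^{\mathbf{x}_i'\boldsymbol\beta}$
$$L_i = \frac{1}{\Gamma(1/\phi)\,y_i}\left(\frac{y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}}{\phi}\right)^{1/\phi}\exp\left(-\frac{y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}}{\phi}\right)$$
log-Likelihood (Log Link):
$$l_i = \ln L_i = -\ln y_i - \ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\ln y_i - \mathbf{x}_i'\boldsymbol\beta - \ln\phi - y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}\right]$$
$$\frac{\partial l_i}{\partial\boldsymbol\beta} = \frac{1}{\phi}\left[-\mathbf{x}_i + y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i\right] = \frac{1}{\phi}\left[y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta} - 1\right]\mathbf{x}_i \qquad \mathbf{g}(\boldsymbol\beta) = \sum_{i=1}^n\frac{\partial l_i}{\partial\boldsymbol\beta} = \frac{1}{\phi}\sum_{i=1}^n\left[y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta} - 1\right]\mathbf{x}_i$$
$$\frac{\partial^2 l_i}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\frac{1}{\phi}y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i\mathbf{x}_i' \qquad \mathbf{G}(\boldsymbol\beta) = \sum_{i=1}^n\frac{\partial^2 l_i}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta'} = -\frac{1}{\phi}\sum_{i=1}^n y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i\mathbf{x}_i' = -\frac{1}{\phi}\mathbf{X}'\mathbf{W}\mathbf{X}$$
$$\mathbf{X} = \begin{bmatrix}\mathbf{x}_1'\\ \vdots\\ \mathbf{x}_n'\end{bmatrix} \qquad W_{ij} = \begin{cases}y_i\,e^{-\mathbf{x}_i'\boldsymbol\beta} & i = j\\ 0 & i \neq j\end{cases}$$
$$\hat V\left(\hat{\boldsymbol\beta}\right) = \left[-E\left(\mathbf{G}(\boldsymbol\beta)\right)\right]^{-1}_{\boldsymbol\beta = \hat{\boldsymbol\beta}} = \hat\phi\left(\mathbf{X}'\mathbf{X}\right)^{-1} \qquad -E\left(\mathbf{G}(\boldsymbol\beta)\right) = \frac{1}{\phi}\sum_{i=1}^n e^{\mathbf{x}_i'\boldsymbol\beta}e^{-\mathbf{x}_i'\boldsymbol\beta}\mathbf{x}_i\mathbf{x}_i' = \frac{1}{\phi}\mathbf{X}'\mathbf{X}$$
$$\hat\phi = \frac{1}{n - p'}\sum_{i=1}^n\left(\frac{y_i - \hat\mu_i}{\hat\mu_i}\right)^2 \qquad \hat\mu_i = e^{\mathbf{x}_i'\hat{\boldsymbol\beta}}$$
Start with:
$$\tilde{\boldsymbol\beta}^{Old} = \begin{bmatrix}\ln\bar y\\ 0\\ 0\\ 0\end{bmatrix}$$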
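The log-link fit mirrors the inverse-link one: each Newton-Raphson step uses the observed information $\mathbf{X}'\mathbf{W}\mathbf{X}$ with $W_{ii} = y_i e^{-\mathbf{x}_i'\boldsymbol\beta}$, while the standard errors use the expected information, $\hat\phi(\mathbf{X}'\mathbf{X})^{-1}$. A standalone Python sketch on made-up data (helper names are hypothetical):

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_gamma_log(X, y, tol=1e-12, max_iter=50):
    """Newton-Raphson for a gamma GLM with log link, ln(mu_i) = x_i'beta."""
    n, p = len(X), len(X[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)  # start: (ln ybar, 0, ..., 0)'
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, xi)) for xi in X]
        w = [yi * math.exp(-e) for yi, e in zip(y, eta)]  # W_ii = y_i e^{-x_i'beta}
        # phi * g(beta) = sum_i (y_i e^{-x_i'beta} - 1) x_i
        u = [sum((wi - 1.0) * xi[j] for xi, wi in zip(X, w)) for j in range(p)]
        A = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
             for j in range(p)]  # X'WX, so G(beta) = -(1/phi) X'WX
        step = solve(A, u)  # -G^{-1} g; phi cancels
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def phi_and_se_log(beta_hat, X, y):
    """phi_hat by method of moments and SEs from V(beta_hat) = phi_hat (X'X)^{-1}."""
    n, p = len(X), len(beta_hat)
    mu = [math.exp(sum(b * x for b, x in zip(beta_hat, xi))) for xi in X]
    phi = sum(((yi - mi) / mi) ** 2 for yi, mi in zip(y, mu)) / (n - p)
    XtX = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    se = []
    for j in range(p):
        e_j = [1.0 if i == j else 0.0 for i in range(p)]
        se.append(math.sqrt(phi * solve(XtX, e_j)[j]))  # [(X'X)^{-1}]_jj via a solve
    return phi, se


# Made-up data: columns of X are (1, Age); y is speed in mph.
ages = [25.0, 40.0, 55.0, 70.0, 35.0, 60.0]
y = [6.8, 6.4, 5.9, 5.1, 6.1, 5.5]
X = [[1.0, a] for a in ages]
beta_hat = fit_gamma_log(X, y)
phi_hat, se = phi_and_se_log(beta_hat, X, y)
print(beta_hat, phi_hat, se)
```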
Results – Inverse Link
$$\frac{1}{\mu_i} = \beta_0 + \beta_A Age_i + \beta_M Male_i + \beta_{AM}Age_iMale_i \qquad \bar y = 6.2007$$
Columns Beta0-Beta3 are the Newton-Raphson iterates (Beta0 = starting values, Beta3 = converged estimates):

Parameter  Beta0        Beta1      Beta2         Beta3         SE{Beta}      t          P-value
β0         0.161272667  0.157777   0.157083313   0.157079252   0.003710707   42.33136   1.6E-275
βA         0            0.000274   0.000302308   0.000302448   9.27468E-05   3.261005   0.00113
βM         0            -0.02556   -0.02282392   -0.02280836   0.005067636   -4.50079   7.19E-06
βAM        0            0.00024    0.000175482   0.000175131   0.000121399   1.44261    0.149297

INV(X'WX)
 0.000478401   -1.15572E-05  -0.00048      1.15572E-05
-1.1557E-05     2.98866E-07   1.16E-05    -2.9887E-07
-0.0004784      1.15572E-05   0.000892    -2.0667E-05
 1.15572E-05   -2.98866E-07  -2.1E-05      5.12042E-07

phi-hat     D(y,mu)       GOF        df    X2(.05)       P-value
0.02878202  53.69870121   1865.703   1878  1979.931009   0.575425386

LRTest(B3) = 2.080118   P-value = 0.149229
The main effects of Age and Gender are significant; the interaction is not.
Deviance, goodness-of-fit, and likelihood ratio tests are described below.
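The standard errors, t statistics and P-values in the table follow from $\hat\phi$ and the diagonal of INV(X'WX): $SE(\hat\beta_j) = \sqrt{\hat\phi\,[(\mathbf{X}'\hat{\mathbf{W}}\mathbf{X})^{-1}]_{jj}}$. The short Python check below recomputes them from the printed values. It uses two-sided normal P-values, an approximation that can differ slightly from the slide's (about 0.1491 versus 0.149297 for the interaction), and the rounded diagonal entry 0.000892 makes the βM standard error agree only to about four digits.

```python
import math

# Values copied from the Results - Inverse Link table.
phi_hat = 0.02878202
beta_hat = [0.157079252, 0.000302448, -0.02280836, 0.000175131]    # beta0, betaA, betaM, betaAM
inv_xtwx_diag = [0.000478401, 2.98866e-07, 0.000892, 5.12042e-07]  # diagonal of INV(X'WX)
reported_se = [0.003710707, 9.27468e-05, 0.005067636, 0.000121399]

se = [math.sqrt(phi_hat * d) for d in inv_xtwx_diag]          # SE = sqrt(phi_hat * [(X'WX)^{-1}]_jj)
t = [b / s for b, s in zip(beta_hat, se)]
p_normal = [math.erfc(abs(tj) / math.sqrt(2.0)) for tj in t]  # two-sided normal approximation
for row in zip(se, t, p_normal):
    print(row)
```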
Results – Log Link
$$\ln(\mu_i) = \beta_0 + \beta_A Age_i + \beta_M Male_i + \beta_{AM}Age_iMale_i \qquad \bar y = 6.2007$$
Columns Beta0-Beta3 are the Newton-Raphson iterates (Beta0 = starting values, Beta3 = converged estimates):

Parameter  Beta0        Beta1      Beta2         Beta3        SE(Beta)   t          P-value
β0         1.824658765  1.851699   1.849421537   1.8494178    0.022058   83.84319   0
βA         0            -0.0019    -0.00181168   -0.0018116   0.000546   -3.3171    0.000927
βM         0            0.143565   0.152182376   0.1521939    0.031508   4.830278   1.47E-06
βAM        0            -0.00113   -0.00133855   -0.0013388   0.000742   -1.80392   0.071404

INV(X'X)
 0.016917118   -0.000404935  -0.01692      0.000404935
-0.00040493     1.037E-05     0.000405    -1.037E-05
-0.01691712     0.000404935   0.034518    -0.00078647
 0.000404935   -1.037E-05    -0.00079      1.91514E-05

phi-hat     D(y,mu)       GOF        df    X2(.05)       P-value
0.02876127  53.66783997   1865.976   1878  1979.931009   0.573672

LRTest(B3) = 3.194504   P-value = 0.073886
The main effects of Age and Gender are significant; the interaction is not (Wald P = 0.071404, LR P = 0.073886).
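Under the log link the fitted mean curves are exponentials of linear functions of Age, which is what the `yhat.F` and `yhat.M` lines of the R program compute. A standalone Python sketch using the converged interaction-model estimates from the table (the function names are illustrative). Because $e^{\beta_M + \beta_{AM}Age}$ is the fitted male/female speed ratio, the two curves would only meet at $Age = -\beta_M/\beta_{AM}$, roughly 114, well outside the observed 18-76 range.

```python
import math

# Converged log-link estimates (interaction model) from the Results - Log Link table.
b0, bA, bM, bAM = 1.8494178, -0.0018116, 0.1521939, -0.0013388

def mph_female(age):
    """Fitted mean speed for women: exp(b0 + bA*Age)."""
    return math.exp(b0 + bA * age)

def mph_male(age):
    """Fitted mean speed for men: exp((b0 + bM) + (bA + bAM)*Age)."""
    return math.exp((b0 + bM) + (bA + bAM) * age)

for age in (20, 40, 60, 76):
    print(age, round(mph_female(age), 3), round(mph_male(age), 3))

# The male/female ratio exp(bM + bAM*Age) equals 1 at Age = -bM/bAM.
crossing_age = -bM / bAM
print(round(crossing_age, 1))
```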
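Deviance
Deviance measures the discrepancy between the observed and fitted values for a model:
$$\text{Deviance} = 2\sum_{i=1}^n\left[-\ln\left(\frac{y_i}{\hat\mu_i}\right) + \frac{y_i - \hat\mu_i}{\hat\mu_i}\right]$$
$$\text{Scaled Deviance} = -2\left[l\left(\hat{\boldsymbol\mu},\phi,\mathbf{y}\right) - l\left(\mathbf{y},\phi,\mathbf{y}\right)\right]$$
$$l\left(\hat{\boldsymbol\mu},\phi,\mathbf{y}\right) = -\sum_{i=1}^n\ln y_i - n\ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\sum_{i=1}^n\ln y_i - \sum_{i=1}^n\ln\hat\mu_i - n\ln\phi - \sum_{i=1}^n\frac{y_i}{\hat\mu_i}\right]$$
$$l\left(\mathbf{y},\phi,\mathbf{y}\right) = -\sum_{i=1}^n\ln y_i - n\ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\sum_{i=1}^n\ln y_i - \sum_{i=1}^n\ln y_i - n\ln\phi - \sum_{i=1}^n\frac{y_i}{y_i}\right]$$
$$\Rightarrow\quad l\left(\hat{\boldsymbol\mu},\phi,\mathbf{y}\right) - l\left(\mathbf{y},\phi,\mathbf{y}\right) = \frac{1}{\phi}\sum_{i=1}^n\left[\ln\left(\frac{y_i}{\hat\mu_i}\right) - \frac{y_i - \hat\mu_i}{\hat\mu_i}\right] \quad\Rightarrow\quad \text{Scaled Deviance} = \frac{\text{Deviance}}{\phi}$$
Under the null hypothesis that the current model is correct, approximately:
$$\frac{\text{Deviance}}{\hat\phi} \sim \chi^2_{n-p'}$$
Reject the current model if:
$$\frac{\text{Deviance}}{\hat\phi} \ge \chi^2_{\alpha,\,n-p'}$$
This ratio is the GOF column in the results tables, compared with X2(.05) = $\chi^2_{.05,\,n-p'}$.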
Both link models provide a good fit (goodness-of-fit P-values 0.575 and 0.574, both > 0.50).
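The identity Scaled Deviance = Deviance/φ and the GOF statistic can be checked directly. The Python sketch below uses hypothetical function names and made-up observed and fitted speeds for the identity check, then reproduces the inverse-link GOF value from the slide's D(y,mu) and phi-hat.

```python
import math

def gamma_deviance(y, mu):
    """Deviance = 2 * sum[-ln(y_i/mu_i) + (y_i - mu_i)/mu_i]."""
    return 2.0 * sum(-math.log(yi / mi) + (yi - mi) / mi for yi, mi in zip(y, mu))

def gamma_loglik(y, mu, phi):
    """l(mu, phi, y) = sum of ln L_i in the (mu, phi) parameterization."""
    a = 1.0 / phi
    return sum(-math.lgamma(a) - math.log(yi)
               + a * (math.log(yi) - math.log(phi) - math.log(mi) - yi / mi)
               for yi, mi in zip(y, mu))

# Made-up observed and fitted speeds (mph).
y = [6.8, 6.4, 5.9, 5.1]
mu = [6.6, 6.3, 6.0, 5.4]
phi = 0.03
scaled = -2.0 * (gamma_loglik(y, mu, phi) - gamma_loglik(y, y, phi))
print(scaled, gamma_deviance(y, mu) / phi)  # should agree up to rounding

# GOF statistic for the inverse-link interaction model, from the slide's numbers.
gof = 53.69870121 / 0.02878202
print(round(gof, 3))
```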
[Figure: Model with Interaction - Log Link. mph vs Age, showing observed speeds (mph_f, mph_m) and fitted means (mu-hat_f, mu_hat_m) for females and males.]
Additive Model Results – Log Link
$$\ln(\mu_i) = \beta_0 + \beta_A Age_i + \beta_M Male_i$$
Columns Beta0-Beta3 are the Newton-Raphson iterates (Beta0 = starting values, Beta3 = converged estimates):

Parameter  Beta0        Beta1      Beta2         Beta3         SE(Beta)   t          P-value
β0         1.824658765  1.876355   1.877591181   1.877591953   0.015512   121.0442   0
βA         0            -0.00254   -0.00253238   -0.00253237   0.00037    -6.84351   1.04E-11
βM         0            0.097432   0.097190587   0.097190276   0.007997   12.15384   8.97E-33

INV(X'X)
 0.008355214   -0.000185672  -0.00029
-0.00018567     4.7549E-06   -2.1E-05
-0.00028807    -2.09201E-05   0.002221

phi-hat     D(y,mu)       GOF        df    X2(.05)       P-value
0.02879762  53.75971785   1866.811   1879  1980.957848   0.574711
Likelihood Ratio Test for $H_0: \beta_{AM} = 0$
$$TS = \frac{DEV_R - DEV_F}{\hat\phi_F} = \frac{53.75971785 - 53.66783997}{0.02876127} = 3.1945$$
$$P = \Pr\left(\chi^2_1 \ge 3.1945\right) = 0.073886$$
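The test statistic and its P-value can be reproduced from the two deviances and $\hat\phi_F$. For one degree of freedom, $\Pr(\chi^2_1 \ge x) = \Pr(|Z| \ge \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$, so the standard library suffices; this is a small illustrative check, not the original computation.

```python
import math

dev_reduced = 53.75971785  # additive model (no Age x Gender term), log link
dev_full = 53.66783997     # model with interaction, log link
phi_full = 0.02876127      # phi-hat from the full model

ts = (dev_reduced - dev_full) / phi_full
# For 1 df: Pr(chi2_1 >= x) = Pr(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(ts / 2.0))
print(round(ts, 4), round(p_value, 4))
```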
[Figure: Additive Model - Log Link. mph vs Age, showing observed speeds (mph_f, mph_m) and fitted means (mu-hat_f, mu_hat_m) for females and males.]
R Program
napaf2015 <- read.csv("http://www.stat.ufl.edu/~winner/data/napa_marathon_fm2015.csv", header=TRUE)
attach(napaf2015); names(napaf2015)
gender <- factor(Gender)   # Gender is coded "F"/"M" in the file, so "F" is the baseline level
# family=Gamma uses the inverse link by default; Gamma(link="log") gives the log link
napa.mod1 <- glm(mph ~ 1, family=Gamma); summary(napa.mod1); deviance(napa.mod1)
napa.mod2 <- glm(mph ~ Age, family=Gamma); summary(napa.mod2)
napa.mod3 <- glm(mph ~ Age, family=Gamma(link="log")); summary(napa.mod3)
napa.mod4 <- glm(mph ~ gender, family=Gamma); summary(napa.mod4)
napa.mod5 <- glm(mph ~ Age + gender, family=Gamma); summary(napa.mod5)
napa.mod6 <- glm(mph ~ Age + gender, family=Gamma(link="log")); summary(napa.mod6)
napa.mod7 <- glm(mph ~ Age*gender, family=Gamma); summary(napa.mod7)
napa.mod8 <- glm(mph ~ Age*gender, family=Gamma(link="log")); summary(napa.mod8)
# Fitted mean curves from the log-link interaction model (napa.mod8)
b <- coef(napa.mod8)   # order: (Intercept), Age, genderM, Age:genderM
age1 <- min(Age):max(Age)
yhat.F <- exp(b[1] + b[2]*age1)
yhat.M <- exp((b[1] + b[3]) + (b[2] + b[4])*age1)
plot(Age, mph, col=gender)
lines(age1, yhat.F, col=1)
lines(age1, yhat.M, col=2)
# Likelihood ratio (deviance) tests for the Age x Gender interaction
anova(napa.mod5, napa.mod7, test="Chisq")   # inverse link
anova(napa.mod6, napa.mod8, test="Chisq")   # log link
# Link diagnostics: ln(mph) and 1/mph vs Age, by gender
par(mfrow=c(2,2))
plot(Age[Gender=="F"], log(mph[Gender=="F"]), main="ln(mph) vs Age - Females")
plot(Age[Gender=="M"], log(mph[Gender=="M"]), main="ln(mph) vs Age - Males")
plot(Age[Gender=="F"], 1/mph[Gender=="F"], main="1/mph vs Age - Females")
plot(Age[Gender=="M"], 1/mph[Gender=="M"], main="1/mph vs Age - Males")