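Gamma Regression: Marathon Miles per Hour at Napa Valley 2015 Marathon by Age and Gender

Data Description
• Y = Race speed (miles per hour) at the Napa Valley 2015 Marathon (3/1/2015) for 1882 runners (977 M, 905 F)
• Predictor Variables:
    Age in years (18-76)
    Gender (1=Male, 0=Female)
    Age x Gender interaction
• Distribution: Gamma (strictly positive values, skewed right)
• Potential Link Functions:
    Inverse Link (Canonical)
    Log Link

[Figure: four scatterplots against Age (16 to 76): 1/mph vs Age - Females, 1/mph vs Age - Males, ln(mph) vs Age - Females, ln(mph) vs Age - Males]

Gamma Distribution – Likelihood Function

$$ f(y \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, \qquad y > 0;\ \alpha, \beta > 0 $$

$$ E\{Y\} = \int_0^\infty y\, \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}\, dy = \frac{\Gamma(\alpha+1)\beta^{\alpha+1}}{\Gamma(\alpha)\beta^{\alpha}} = \alpha\beta = \mu_Y $$

$$ E\{Y^2\} = \int_0^\infty y^2\, \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}\, dy = \frac{\Gamma(\alpha+2)\beta^{\alpha+2}}{\Gamma(\alpha)\beta^{\alpha}} = \alpha(\alpha+1)\beta^2 $$

$$ V\{Y\} = \sigma_Y^2 = E\{Y^2\} - \left(E\{Y\}\right)^2 = \alpha(\alpha+1)\beta^2 - \alpha^2\beta^2 = \alpha\beta^2 $$

Reparameterizing in terms of the mean and a dispersion parameter:

$$ \mu = \alpha\beta, \quad \phi = \frac{1}{\alpha} \quad\Rightarrow\quad \alpha = \frac{1}{\phi}, \quad \beta = \mu\phi, \quad \sigma_Y^2 = \phi\mu^2 $$

Likelihood function:

$$ L_i = \frac{1}{y_i\,\Gamma(1/\phi)} \left(\frac{y_i}{\phi\mu_i}\right)^{1/\phi} \exp\left(-\frac{y_i}{\phi\mu_i}\right) $$

Link Function:

$$ g(\mu_i) = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} = x_i'\beta $$

Inverse Link:

$$ g(\mu_i) = \frac{1}{\mu_i} = x_i'\beta \quad\Rightarrow\quad \mu_i = \frac{1}{x_i'\beta}, \qquad L_i = \frac{1}{y_i\,\Gamma(1/\phi)} \left(\frac{y_i\, x_i'\beta}{\phi}\right)^{1/\phi} \exp\left(-\frac{y_i\, x_i'\beta}{\phi}\right) $$

Log Link:

$$ g(\mu_i) = \ln\mu_i = x_i'\beta \quad\Rightarrow\quad \mu_i = e^{x_i'\beta}, \qquad L_i = \frac{1}{y_i\,\Gamma(1/\phi)} \left(\frac{y_i}{\phi\, e^{x_i'\beta}}\right)^{1/\phi} \exp\left(-\frac{y_i}{\phi\, e^{x_i'\beta}}\right) $$

Gamma Distribution – Inverse Link

$$ x_i'\beta = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} \quad\Rightarrow\quad \frac{\partial\, x_i'\beta}{\partial\beta} = \begin{bmatrix} 1 \\ X_{i1} \\ \vdots \\ X_{ip} \end{bmatrix} = x_i, \qquad \frac{\partial\, x_i'\beta}{\partial\beta'} = x_i' $$

log-Likelihood (Inverse Link):

$$ l_i = \ln L_i = -\ln y_i - \ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\ln y_i + \ln\left(x_i'\beta\right) - \ln\phi\right] - \frac{y_i\, x_i'\beta}{\phi} $$

$$ \frac{\partial l_i}{\partial\beta} = \frac{1}{\phi}\left[\frac{1}{x_i'\beta}\, x_i - y_i x_i\right] = -\frac{1}{\phi}\left(y_i - \frac{1}{x_i'\beta}\right) x_i \quad\Rightarrow\quad g_\beta = \sum_{i=1}^n \frac{\partial l_i}{\partial\beta} = -\frac{1}{\phi}\sum_{i=1}^n \left(y_i - \frac{1}{x_i'\beta}\right) x_i $$

$$ \frac{\partial^2 l_i}{\partial\beta\,\partial\beta'} = -\frac{1}{\phi}\, \frac{1}{\left(x_i'\beta\right)^2}\, x_i x_i' \quad\Rightarrow\quad G_\beta = \sum_{i=1}^n \frac{\partial^2 l_i}{\partial\beta\,\partial\beta'} = -\frac{1}{\phi}\sum_{i=1}^n \frac{1}{\left(x_i'\beta\right)^2}\, x_i x_i' = -\frac{1}{\phi}\, X'WX $$

$$ X = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}, \qquad W_{ij} = \begin{cases} \dfrac{1}{\left(x_i'\beta\right)^2} & i = j \\ 0 & i \ne j \end{cases} $$

Newton-Raphson Algorithm:

$$ \tilde\beta^{New} = \tilde\beta^{Old} - \left[G_\beta\left(\tilde\beta^{Old}\right)\right]^{-1} g_\beta\left(\tilde\beta^{Old}\right) $$

Choose starting values for $\tilde\beta^{Old}$ and iterate to convergence. (Note that $\phi$ cancels out in the algorithm, since $G_\beta^{-1}$ carries a factor $\phi$ and $g_\beta$ a factor $1/\phi$.)

Starting Values: for the inverse link, if there were no age or gender effects, $1/\mu = \beta_0$, so use

$$ \tilde\beta^{Old} = \begin{bmatrix} 1/\bar{y} & 0 & \cdots & 0 \end{bmatrix}' $$

Labelling the final converged estimate as $\hat\beta$, we obtain its estimated variance-covariance matrix as:

$$ \hat{V}\{\hat\beta\} = -\left[E\{G_\beta\}\right]^{-1}\Big|_{\beta=\hat\beta} = \hat\phi\left(X'\hat{W}X\right)^{-1}, \qquad \hat{W}_{ij} = \begin{cases} \dfrac{1}{\left(x_i'\hat\beta\right)^2} & i = j \\ 0 & i \ne j \end{cases} $$

where $\phi$ is estimated below.

Estimating $\phi$ (Method of Moments):

$$ E\{Y_i\} = \mu_i, \quad V\{Y_i\} = \phi\mu_i^2 \quad\Rightarrow\quad E\left\{\frac{Y_i}{\mu_i}\right\} = 1, \quad V\left\{\frac{Y_i}{\mu_i}\right\} = \phi \quad\Rightarrow\quad E\left\{\left(\frac{Y_i}{\mu_i} - 1\right)^2\right\} = E\left\{\left(\frac{Y_i - \mu_i}{\mu_i}\right)^2\right\} = \phi $$

$$ \hat\phi = \frac{1}{n - p'} \sum_{i=1}^n \left(\frac{y_i - \hat\mu_i}{\hat\mu_i}\right)^2 $$

where $p'$ is the number of regression parameters and, for the Inverse Link, $\hat\mu_i = 1 / \left(x_i'\hat\beta\right)$.

Gamma Distribution – Log Link

$$ g(\mu_i) = \ln\mu_i = x_i'\beta \quad\Rightarrow\quad \mu_i = e^{x_i'\beta}, \qquad L_i = \frac{1}{y_i\,\Gamma(1/\phi)} \left(\frac{y_i\, e^{-x_i'\beta}}{\phi}\right)^{1/\phi} \exp\left(-\frac{y_i\, e^{-x_i'\beta}}{\phi}\right) $$

log-Likelihood (Log Link):

$$ l_i = \ln L_i = -\ln y_i - \ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\ln y_i - x_i'\beta - \ln\phi\right] - \frac{y_i\, e^{-x_i'\beta}}{\phi} $$

$$ \frac{\partial l_i}{\partial\beta} = \frac{1}{\phi}\left[-x_i + y_i e^{-x_i'\beta} x_i\right] = \frac{1}{\phi}\left(y_i e^{-x_i'\beta} - 1\right) x_i \quad\Rightarrow\quad g_\beta = \frac{1}{\phi}\sum_{i=1}^n \left(y_i e^{-x_i'\beta} - 1\right) x_i $$

$$ \frac{\partial^2 l_i}{\partial\beta\,\partial\beta'} = -\frac{1}{\phi}\, y_i e^{-x_i'\beta}\, x_i x_i' \quad\Rightarrow\quad G_\beta = -\frac{1}{\phi}\sum_{i=1}^n y_i e^{-x_i'\beta}\, x_i x_i' = -\frac{1}{\phi}\, X'WX $$

$$ X = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}, \qquad W_{ij} = \begin{cases} y_i e^{-x_i'\beta} & i = j \\ 0 & i \ne j \end{cases} $$

$$ E\{G_\beta\} = -\frac{1}{\phi}\sum_{i=1}^n e^{x_i'\beta} e^{-x_i'\beta}\, x_i x_i' = -\frac{1}{\phi}\, X'X \quad\Rightarrow\quad \hat{V}\{\hat\beta\} = -\left[E\{G_\beta\}\right]^{-1}\Big|_{\beta=\hat\beta} = \hat\phi\left(X'X\right)^{-1} $$

$$ \hat\phi = \frac{1}{n - p'} \sum_{i=1}^n \left(\frac{y_i - \hat\mu_i}{\hat\mu_i}\right)^2, \qquad \hat\mu_i = e^{x_i'\hat\beta} $$

Start with:

$$ \tilde\beta^{Old} = \begin{bmatrix} \ln\bar{y} & 0 & \cdots & 0 \end{bmatrix}' $$

Results – Inverse Link

$$ \frac{1}{\mu_i} = \beta_0 + \beta_A\, Age_i + \beta_M\, Male_i + \beta_{AM}\, Age_i\, Male_i, \qquad \bar{y} = 6.2007 $$

Columns Beta0 to Beta3 are the successive Newton-Raphson iterates (Beta0 holds the starting values); SE{Beta}, t, and P-value refer to the final iterate.

Parameter   Beta0         Beta1      Beta2         Beta3         SE{Beta}      t          P-value
beta_0      0.161272667   0.157777   0.157083313   0.157079252   0.003710707   42.33136   1.6E-275
beta_A      0             0.000274   0.000302308   0.000302448   9.27468E-05   3.261005   0.00113
beta_M      0             -0.02556   -0.02282392   -0.02280836   0.005067636   -4.50079   7.19E-06
beta_AM     0             0.00024    0.000175482   0.000175131   0.000121399   1.44261    0.149297

INV(X'WX)
 0.000478401   -1.15572E-05   -0.00048    1.15572E-05
-1.1557E-05     2.98866E-07    1.16E-05  -2.9887E-07
-0.0004784      1.15572E-05    0.000892  -2.0667E-05
 1.15572E-05   -2.98866E-07   -2.1E-05    5.12042E-07

phi-hat      D(y,mu)       GOF        df     X2(.05)       P-value       LRTest(B3)   P-value
0.02878202   53.69870121   1865.703   1878   1979.931009   0.575425386   2.080118     0.149229

The main effects of Age and Gender are significant; the interaction is not.
Deviance, Goodness-of-Fit and Likelihood Ratio Tests are described below.

Results – Log Link

$$ \ln\mu_i = \beta_0 + \beta_A\, Age_i + \beta_M\, Male_i + \beta_{AM}\, Age_i\, Male_i, \qquad \bar{y} = 6.2007 $$

Parameter   Beta0         Beta1      Beta2         Beta3        SE(Beta)   t          P-value
beta_0      1.824658765   1.851699   1.849421537   1.8494178    0.022058   83.84319   0
beta_A      0             -0.0019    -0.00181168   -0.0018116   0.000546   -3.3171    0.000927
beta_M      0             0.143565   0.152182376   0.1521939    0.031508   4.830278   1.47E-06
beta_AM     0             -0.00113   -0.00133855   -0.0013388   0.000742   -1.80392   0.071404

INV(X'X)
 0.016917118   -0.000404935   -0.01692    0.000404935
-0.00040493     1.037E-05      0.000405  -1.037E-05
-0.01691712     0.000404935    0.034518  -0.00078647
 0.000404935   -1.037E-05     -0.00079    1.91514E-05

phi-hat      D(y,mu)       GOF        df     X2(.05)       P-value    LRTest(B3)   P-value
0.02876127   53.66783997   1865.976   1878   1979.931009   0.573672   3.194504     0.073886

The main effects of Age and Gender are significant; the interaction is not.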
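The iterate columns in the results tables come from Newton-Raphson updates. For the inverse link, the gradient is proportional to X'(mu - y) and the Hessian to -X'WX with W_ii = 1/(x_i'beta)^2, so the scheme can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the program that produced the tables: the data are simulated (sample size, coefficient values, and `phi` are hypothetical), and `newton_gamma_inverse` is a name introduced here only for the example.

```r
# Simulated stand-in data (NOT the Napa data); all parameter values are hypothetical
set.seed(1)
n    <- 500
age  <- sample(18:76, n, replace = TRUE)
male <- rbinom(n, 1, 0.5)
X    <- cbind(1, age, male, age * male)           # design matrix, rows are x_i'
beta.true <- c(0.157, 0.0003, -0.023, 0.0002)
mu   <- 1 / drop(X %*% beta.true)                 # inverse link: mu = 1/(x'beta)
phi  <- 0.03
y    <- rgamma(n, shape = 1 / phi, scale = mu * phi)  # E(Y) = mu, V(Y) = phi*mu^2

# Newton-Raphson for the inverse link; phi cancels in the update, so it is omitted
newton_gamma_inverse <- function(X, y, tol = 1e-8, max.iter = 25) {
  beta <- c(1 / mean(y), rep(0, ncol(X) - 1))     # start: no covariate effects
  for (iter in seq_len(max.iter)) {
    eta  <- drop(X %*% beta)                      # x_i'beta = 1/mu_i
    g    <- crossprod(X, 1 / eta - y)             # phi * gradient
    G    <- -crossprod(X, X / eta^2)              # phi * Hessian = -X'WX
    step <- solve(G, g)
    beta <- beta - drop(step)
    if (max(abs(step)) < tol) break
  }
  mu.hat  <- 1 / drop(X %*% beta)
  # Method-of-moments estimate of phi, then V(beta-hat) = phi-hat * (X'WX)^{-1}
  phi.hat <- sum(((y - mu.hat) / mu.hat)^2) / (nrow(X) - ncol(X))
  V <- phi.hat * solve(crossprod(X, X * mu.hat^2))
  list(coef = beta, se = sqrt(diag(V)), phi = phi.hat, iterations = iter)
}

fit.nr  <- newton_gamma_inverse(X, y)
fit.glm <- glm(y ~ age * male, family = Gamma)    # inverse link is the Gamma default
cbind(newton = fit.nr$coef, glm = coef(fit.glm))
```

Because the inverse link is canonical for the Gamma family, the observed and expected Hessians coincide, so these updates should land on the same estimates as `glm`, which the final `cbind` makes easy to check by eye.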
Deviance

Deviance measures the discrepancy between the observed and fitted values for a model.

$$ \text{Deviance} = 2\sum_{i=1}^n \left[-\ln\left(\frac{y_i}{\hat\mu_i}\right) + \frac{y_i - \hat\mu_i}{\hat\mu_i}\right], \qquad \text{Scaled Deviance} = -2\left[l\left(\hat\mu, \phi; y\right) - l\left(y, \phi; y\right)\right] = \frac{\text{Deviance}}{\phi} $$

$$ l\left(\hat\mu, \phi; y\right) = -\sum_{i=1}^n \ln y_i - n\ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\sum_{i=1}^n \ln y_i - \sum_{i=1}^n \ln\hat\mu_i - n\ln\phi\right] - \frac{1}{\phi}\sum_{i=1}^n \frac{y_i}{\hat\mu_i} $$

$$ l\left(y, \phi; y\right) = -\sum_{i=1}^n \ln y_i - n\ln\Gamma\left(\frac{1}{\phi}\right) + \frac{1}{\phi}\left[\sum_{i=1}^n \ln y_i - \sum_{i=1}^n \ln y_i - n\ln\phi\right] - \frac{1}{\phi}\sum_{i=1}^n \frac{y_i}{y_i} $$

$$ l\left(\hat\mu, \phi; y\right) - l\left(y, \phi; y\right) = \frac{1}{\phi}\sum_{i=1}^n \left[\ln\left(\frac{y_i}{\hat\mu_i}\right) - \frac{y_i - \hat\mu_i}{\hat\mu_i}\right] $$

Under the null hypothesis that the current model is correct:

$$ \frac{\text{Deviance}}{\hat\phi} \sim \chi^2_{n-p'} \qquad \text{Reject the current model if: } \frac{\text{Deviance}}{\hat\phi} \ge \chi^2_{\alpha,\, n-p'} $$

Both (link) models provide a good fit (both P > 0.50).

[Figure: Model with Interaction - Log Link. mph vs Age (16 to 80), observed speeds (mph_f, mph_m) with fitted curves (mu-hat_f, mu-hat_m)]

Additive Model Results – Log Link

Parameter   Beta0         Beta1      Beta2         Beta3         SE(Beta)   t          P-value
beta_0      1.824658765   1.876355   1.877591181   1.877591953   0.015512   121.0442   0
beta_A      0             -0.00254   -0.00253238   -0.00253237   0.00037    -6.84351   1.04E-11
beta_M      0             0.097432   0.097190587   0.097190276   0.007997   12.15384   8.97E-33

INV(X'X)
 0.008355214   -0.000185672   -0.00029
-0.00018567     4.7549E-06    -2.1E-05
-0.00028807    -2.09201E-05    0.002221

phi-hat      D(y,mu)       GOF        df     X2(.05)       P-value
0.02879762   53.75971785   1866.811   1879   1980.957848   0.574711

Likelihood Ratio Test for $H_0: \beta_{AM} = 0$

$$ TS: \frac{DEV_R - DEV_F}{\hat\phi_F} = \frac{53.75971785 - 53.66783997}{0.02876127} = 3.1945, \qquad P = \Pr\left(\chi^2_1 \ge 3.1945\right) = 0.073886 $$

[Figure: Additive Model - Log Link. mph vs Age (16 to 80), observed speeds (mph_f, mph_m) with fitted curves (mu-hat_f, mu-hat_m)]

R Program

    # Read the data and make the variables available by name
    napaf2015 <- read.csv("http://www.stat.ufl.edu/~winner/data/napa_marathon_fm2015.csv", header=TRUE)
    attach(napaf2015); names(napaf2015)
    gender <- factor(Gender)

    # Gamma regressions; family=Gamma uses the inverse link unless link="log" is given
    napa.mod1 <- glm(mph~1,family=Gamma); summary(napa.mod1); deviance(napa.mod1)
    napa.mod2 <- glm(mph~Age,family=Gamma); summary(napa.mod2)
    napa.mod3 <- glm(mph ~ Age, family=Gamma(link="log")); summary(napa.mod3)
    napa.mod4 <- glm(mph~gender,family=Gamma); summary(napa.mod4)
    napa.mod5 <- glm(mph~Age + gender,family=Gamma); summary(napa.mod5)
    napa.mod6 <- glm(mph ~ Age + gender, family=Gamma(link="log")); summary(napa.mod6)
    napa.mod7 <- glm(mph~Age*gender,family=Gamma); summary(napa.mod7)
    napa.mod8 <- glm(mph ~ Age*gender, family=Gamma(link="log")); summary(napa.mod8)

    # Fitted curves from the log-link interaction model (coefficients from napa.mod8)
    age1 <- min(Age):max(Age)
    yhat.F <- exp(1.8494178 - 0.0018116*age1)
    yhat.M <- exp((1.8494178+0.1521938) - (0.0018116+0.0013388)*age1)
    plot(Age,mph,col=gender)
    lines(age1,yhat.F,col=1)
    lines(age1,yhat.M,col=2)

    # Likelihood ratio tests of the interaction (inverse link, then log link)
    anova(napa.mod5,napa.mod7,test="Chisq")
    anova(napa.mod6,napa.mod8,test="Chisq")

    # Transformed speed vs Age by gender, to compare the two candidate links
    par(mfrow=c(2,2))
    plot(Age[Gender=="F"],log(mph[Gender=="F"]))
    plot(Age[Gender=="M"],log(mph[Gender=="M"]))
    plot(Age[Gender=="F"],1/mph[Gender=="F"])
    plot(Age[Gender=="M"],1/mph[Gender=="M"])
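The dispersion estimate, the deviance goodness-of-fit statistic, and the likelihood ratio test for the interaction can also be computed directly from their definitions, which makes the arithmetic behind the summary rows explicit. The sketch below assumes simulated data rather than the Napa file (so it runs without network access); the coefficient values, `phi`, and the helper names `phi_hat` and `gamma_deviance` are hypothetical and introduced only for this example.

```r
# Simulated stand-in data (NOT the Napa data); all parameter values are hypothetical
set.seed(2)
n    <- 400
age  <- sample(18:76, n, replace = TRUE)
male <- rbinom(n, 1, 0.5)
mu   <- exp(1.88 - 0.0025 * age + 0.10 * male)    # additive truth on the log scale
phi  <- 0.03
y    <- rgamma(n, shape = 1 / phi, scale = mu * phi)

fit.add <- glm(y ~ age + male, family = Gamma(link = "log"))  # reduced model
fit.int <- glm(y ~ age * male, family = Gamma(link = "log"))  # full model

# Method-of-moments estimate of phi: sum of squared relative residuals / (n - p')
phi_hat <- function(fit) {
  mu.hat <- fitted(fit)
  sum(((fit$y - mu.hat) / mu.hat)^2) / df.residual(fit)
}

# Gamma deviance from its definition: 2 * sum(-ln(y/mu) + (y - mu)/mu)
gamma_deviance <- function(fit) {
  mu.hat <- fitted(fit)
  2 * sum(-log(fit$y / mu.hat) + (fit$y - mu.hat) / mu.hat)
}

# Goodness of fit: Deviance / phi-hat compared with chi-square on n - p' df
gof.stat <- gamma_deviance(fit.int) / phi_hat(fit.int)
gof.p    <- pchisq(gof.stat, df.residual(fit.int), lower.tail = FALSE)

# Likelihood ratio test of the interaction: (DEV_R - DEV_F) / phi-hat_F on 1 df
lr.stat <- (gamma_deviance(fit.add) - gamma_deviance(fit.int)) / phi_hat(fit.int)
lr.p    <- pchisq(lr.stat, df = 1, lower.tail = FALSE)

c(GOF = gof.stat, GOF.p = gof.p, LR = lr.stat, LR.p = lr.p)
```

On the real data, the same helpers applied to `napa.mod6` and `napa.mod8` would be expected to reproduce the phi-hat, D(y,mu), GOF, and LRTest entries in the log-link results, up to rounding.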