Generalized Linear Models

(For a much more detailed discussion, refer to Agresti's text, Categorical Data Analysis, Second Edition, Chapter 4, particularly pages 115-118 and 125-132.)

Generalized linear models (GLMs) are "a broad class of models that include ordinary regression and analysis of variance for continuous response variables, as well as for categorical response variables". There are three components common to all GLMs:

1. a random component
2. a systematic component
3. a link function

Random Component. The random component refers to the probability distribution of the response Y. We observe independent random variables Y1, Y2, . . ., YN. Here are three examples of random components.

Example 1. Y1, Y2, . . ., YN might be normal. In this case, we say the random component is the normal distribution. This component leads to ordinary regression and analysis of variance models.

Example 2. If the observations are Bernoulli random variables (which take the values 0 or 1), then the random component is the binomial distribution. When the random component is the binomial distribution, we are commonly concerned with logistic regression models or probit models.

Example 3. Quite often the random variables Y1, Y2, . . ., YN have a Poisson distribution. Then we will be involved with Poisson regression models or loglinear models.

Systematic Component. The random variables Yi, i = 1, 2, . . ., N, have expected values µi, i = 1, 2, . . ., N. The systematic component combines the explanatory variables x1, x2, . . ., xk into a linear predictor:

  β0 + β1x1 + β2x2 + · · · + βkxk.

Link Function. The third component of a GLM is the link between the random and systematic components. It says how the mean µ = E(Y) relates to the explanatory variables in the linear predictor through a function g(µ):

  g(µ) = β0 + β1x1 + β2x2 + · · · + βkxk.

g(µ) is called the link function. Here are some examples.

Example 1. The logistic regression model says

  ln[π(x1, x2, . . ., xk) / (1 − π(x1, x2, . . ., xk))] = β0 + β1x1 + β2x2 + · · · + βkxk.

The observations Y1, Y2, . . ., YN have a binomial distribution (the random component). Thus, for logistic regression, the link function is ln[µ/(1 − µ)], called the logit link. There are other link functions used when the random component is binomial. For example, the normit/probit model has the binomial distribution as the random component and link function g(µ) = Φ⁻¹(µ), where Φ(x) is the cumulative normal distribution function. There is also a 'gompit'/complementary log-log link (available in Minitab along with the probit link).

Example 2. For ordinary linear regression, we assume the observations have a normal distribution (the random component) and the mean is

  µ(x1, x2, . . ., xk) = β0 + β1x1 + β2x2 + · · · + βkxk.

In this case the link function is the identity: g(µ) = µ.

Example 3. If we assume the observations Y1, Y2, . . ., YN have a Poisson distribution (the random component) and the link function is g(µ) = ln µ, then we have the Poisson regression model:

  ln µ(x1, x2, . . ., xk) = β0 + β1x1 + β2x2 + · · · + βkxk.

Sometimes the identity link function is used in Poisson regression, so that

  µ(x1, x2, . . ., xk) = β0 + β1x1 + β2x2 + · · · + βkxk.

This model is the same as that used in ordinary regression, except that the random component is the Poisson distribution. There are other random components and link functions used in generalized linear models.
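To make the three components concrete in software: the notes use Minitab and SAS later on, but purely as an illustration (not part of the original notes) the same recipe can be written as a short Python/statsmodels sketch on simulated data. The family argument plays the role of the random component, the model matrix carries the systematic component β0 + β1x, and each family's default link (identity, logit, log) serves as g(µ).

    # Minimal sketch (assumes numpy and statsmodels are available); the data are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=100)
    X = sm.add_constant(x)                      # systematic component: b0 + b1*x

    # Normal random component, identity link -> ordinary regression
    y_norm = 2 + 0.5 * x + rng.normal(size=100)
    print(sm.GLM(y_norm, X, family=sm.families.Gaussian()).fit().params)

    # Binomial (0/1) random component, logit link -> logistic regression
    y_bin = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.5 * x))))
    print(sm.GLM(y_bin, X, family=sm.families.Binomial()).fit().params)

    # Poisson random component, log link -> Poisson regression
    y_pois = rng.poisson(np.exp(0.2 + 0.15 * x))
    print(sm.GLM(y_pois, X, family=sm.families.Poisson()).fit().params)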
The probit model has the binomial distribution as the random component and link function g(µ) = Φ⁻¹(µ), where Φ(x) is the cumulative normal distribution function. In some disciplines, the negative binomial distribution has been used as the random component.

Here is a comparison of the cumulative logistic and normal distributions:

    x   CumNormal  Logistic
 -3.0   0.001350  0.047426
 -2.9   0.001866  0.052154
 -2.8   0.002555  0.057324
 -2.7   0.003467  0.062973
 -2.6   0.004661  0.069138
 -2.5   0.006210  0.075858
 -2.4   0.008198  0.083173
 -2.3   0.010724  0.091123
 -2.2   0.013903  0.099750
 -2.1   0.017864  0.109097
 -2.0   0.022750  0.119203
 -1.9   0.028717  0.130108
 -1.8   0.035930  0.141851
 -1.7   0.044565  0.154465
 -1.6   0.054799  0.167982
 -1.5   0.066807  0.182426
 -1.4   0.080757  0.197816
 -1.3   0.096800  0.214165
 -1.2   0.115070  0.231475
 -1.1   0.135666  0.249740
 -1.0   0.158655  0.268941
 -0.9   0.184060  0.289050
 -0.8   0.211855  0.310026
 -0.7   0.241964  0.331812
 -0.6   0.274253  0.354344
 -0.5   0.308538  0.377541
 -0.4   0.344578  0.401312
 -0.3   0.382089  0.425557
 -0.2   0.420740  0.450166
 -0.1   0.460172  0.475021
  0.0   0.500000  0.500000

[Scatterplots of CumNormal vs x and of Logistic vs x, for x from −3 to 3.]

Regression Models with Binary Response Variables: Logistic Regression

A common problem is that of estimating the probability of success using a predictor variable x. Here is an example. Launch temperatures (in degrees Fahrenheit) and an indicator of O-ring failure for 24 space shuttle launches prior to the space shuttle Challenger disaster in 1986 are given below:

x (temperature)   53   56   57   63   66   67   67   67   68   69   70   70
Failure          yes  yes  yes   no   no   no   no   no   no   no   no  yes

x (temperature)   70   70   72   73   75   75   76   76   78   79   80   81
Failure          yes  yes   no   no   no  yes   no   no   no   no   no   no

Can we predict the probability of failure using temperature? Let π(x) = Prob(success | x) and 1 − π(x) = Prob(failure | x). We want a 'model' for π(x), and we will set it up as a 'regression model'. Why not a linear regression model π(x) = β0 + β1x?

Answer:
a. For x large positively and x large negatively, π(x) = β0 + β1x will eventually be negative or greater than 1, an undesirable feature of a model for probabilities.
b. We are working with Bernoulli trials. The variance of the outcome of a Bernoulli trial is π(x)[1 − π(x)] = [β0 + β1x][1 − (β0 + β1x)]. The variance of an observation depends on x, so the assumption of constant variance is not satisfied.
c. The errors would be either 0 − (β0 + β1x) = −β0 − β1x or 1 − (β0 + β1x), only two possible values for a given x, violating the assumption of normality.

What should a regression model look like?

1. Since π(x) is a probability, its values should be between 0 and 1.
2. For the O-ring problem, we would expect π(x) to increase from values near 0 to values near 1: as temperatures increase the chances of a failure should decrease, or equivalently the chances of a 'success' (no O-ring failure) should increase.

Here are some 'nice-looking' curves for π(x):

[Two plots: the normal distribution function and the logistic distribution function, each rising from 0 to 1 as x runs from −3 to 3.]

What we are looking at on the left (above) is a normal curve for probabilities. On the y-axis is π(x) and on the x-axis is x: given a value of x, the probability of a success is π(x), where π(x) is the normal curve.
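(An aside, not in the original notes: the cumulative normal vs. logistic comparison tabulated earlier can be reproduced with a few lines of Python, assuming numpy and scipy are available. The logistic curve discussed next is just e^x / (1 + e^x).)

    import numpy as np
    from scipy.stats import norm

    x = np.linspace(-3.0, 0.0, 31)             # same grid as the table above
    cum_normal = norm.cdf(x)                   # cumulative standard normal, Phi(x)
    logistic = np.exp(x) / (1 + np.exp(x))     # cumulative logistic, e^x / (1 + e^x)

    for xi, cn, lg in zip(x, cum_normal, logistic):
        print(f"{xi:5.1f}  {cn:8.6f}  {lg:8.6f}")   # e.g. -3.0  0.001350  0.047426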
There are other 'curves' we could use. The curve on the right looks a lot like the one on the left (the normal), but it is called the 'logistic' curve. There are many other curves we could use, but these are the two most commonly used ones (by a country mile!). The curves above are in 'standard units'. Φ(x) denotes the cumulative normal curve; for a regression model we use π(x) = Φ(β0 + β1x). The expression for the logistic curve is much nicer:

  F(x) = e^x / (1 + e^x).

The corresponding regression model is

  π(x) = F(β0 + β1x) = exp(β0 + β1x) / [1 + exp(β0 + β1x)].

If the 'slope' β1 is negative, the curves bend downward as x increases.

Which curve should be used? Or better yet: which curve(s) are used in practice? If the normal distribution is used, the model is called the 'probit' (or 'normit') model, while if the logistic curve is used it is called the 'logistic regression model'.

The logistic model says that

  π(x) = F(β0 + β1x) = exp(β0 + β1x) / [1 + exp(β0 + β1x)].

A bit of algebra shows that this model is equivalent to

  ln[π(x) / (1 − π(x))] = β0 + β1x.

A correspondingly simple expression cannot be obtained for the probit model. The quantity ln[π(x)/(1 − π(x))] is called the logit of π(x), or the logit transform of π(x).

Logistic Regression Example. We illustrate logistic regression using the Challenger shuttle data on O-ring failures. Earlier we called 'no O-ring failure' a success; in the Minitab output below, however, the modeled event is an O-ring failure ('yes'), which is coded as y = 1 in the fitted-probability listing. Here is the Minitab output, using Stat > Regression > Binary Logistic Regression.

Binary Logistic Regression

Link Function: Logit

Response Information
Variable  Value  Count
failure   yes        7  (Event)
          no        17
          Total     24

Logistic Regression Table
                                                Odds     95% CI
Predictor      Coef    StDev      Z      P     Ratio  Lower  Upper
Constant     10.875    5.703   1.91  0.057
temp       -0.17132  0.08344  -2.05  0.040      0.84   0.72   0.99

Log-Likelihood = -11.515
Test that slope is zero: G = 5.944, DF = 1, P-Value = 0.015

Fitted model:

  Prob(failure | temp) = e^(10.875 − 0.17132·temp) / [1 + e^(10.875 − 0.17132·temp)].

Fitted probabilities, with y = 1 denoting 'failure', are given below.

Row  Temp  y      Prob
  1    53  1  0.857583
  2    56  1  0.782688
  3    57  1  0.752144
  4    63  0  0.520528
  5    66  0  0.393696
  6    67  0  0.353629
  7    67  0  0.353629
  8    67  0  0.353629
  9    68  0  0.315518
 10    69  0  0.279737
 11    70  0  0.246552
 12    70  1  0.246552
 13    70  1  0.246552
 14    70  1  0.246552
 15    72  0  0.188509
 16    73  0  0.163687
 17    75  0  0.121993
 18    75  1  0.121993
 19    76  0  0.104799
 20    76  0  0.104799
 21    78  0  0.076729
 22    79  0  0.065438
 23    80  0  0.055709
 24    81  0  0.047353

Poisson and Ordinary Regression of 'Number of Arguments on Years Married'

Suppose we wanted to model the number Y of arguments married couples have as a function of the number of years they have been married. Sixty couples, with 3 couples married x years for each x = 1, 2, …, 20, are randomly obtained and asked how many arguments they had in the past year (assume they answer honestly). A summary by year is given below, followed by output for 1) a simple linear regression, 2) a quadratic regression, 3) a quadratic regression using the square root of Y, and 4) a Poisson regression.
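The Minitab fit above can be checked with other software. As an illustration that is not part of the original notes, here is a Python/statsmodels sketch using the same 24 launches, with y = 1 denoting an O-ring failure; the estimated coefficients should agree closely with Minitab's 10.875 and -0.17132.

    import numpy as np
    import statsmodels.api as sm

    temp = np.array([53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                     70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81])
    y = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
                  1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0])

    X = sm.add_constant(temp)           # intercept plus temperature
    fit = sm.Logit(y, X).fit()          # logistic regression (logit link)
    print(fit.params)                   # intercept and slope, cf. 10.875 and -0.17132
    print(fit.predict(X))               # fitted failure probabilities, as in the listing above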
Data Display

(ysq is √y; FITS1/SRES1 and FITS2/SRES2 appear to be the fitted values and standardized residuals from the quadratic regressions of Y and of √Y, respectively, shown below. The yr, ysum, and aver columns give, for each year married, the total and the average number of arguments for the 3 couples; the remaining columns list the first 20 of the 60 individual observations.)

 yr  ysum     aver   x   x2   y   ysq  SRES1  FITS1  SRES2  FITS2
  1     7   2.3333   1    1   5  2.24   1.79  -2.12   2.51   1.01
  2    12   4.0000   1    1   0  0.00   0.53  -2.12  -2.08   1.01
  3    14   4.6667   1    1   2  1.41   1.03  -2.12   0.82   1.01
  4    27   9.0000   2    4   2  1.41  -0.13   2.52  -0.61   1.72
  5    31  10.3333   2    4   6  2.45   0.85   2.52   1.46   1.72
  6    38  12.6667   2    4   4  2.00   0.36   2.52   0.56   1.72
  7    54  18.0000   3    9   2  1.41  -1.13   6.68  -1.86   2.36
  8    59  19.6667   3    9   6  2.45  -0.17   6.68   0.19   2.36
  9    61  20.3333   3    9   6  2.45  -0.17   6.68   0.19   2.36
 10    73  24.3333   4   16   4  2.00  -1.53  10.37  -1.80   2.92
 11    69  23.0000   4   16  10  3.16  -0.09  10.37   0.48   2.92
 12    81  27.0000   4   16  13  3.61   0.63  10.37   1.35   2.92
 13    69  23.0000   5   25  11  3.32  -0.62  13.59  -0.18   3.41
 14    81  27.0000   5   25  11  3.32  -0.62  13.59  -0.18   3.41
 15    57  19.0000   5   25   9  3.00  -1.09  13.59  -0.80   3.41
 16    47  15.6667   6   36  10  3.16  -1.51  16.32  -1.31   3.83
 17    38  12.6667   6   36  15  3.87  -0.32  16.32   0.08   3.83
 18    31  10.3333   6   36  13  3.61  -0.79  16.32  -0.44   3.83
 19    26   8.6667   7   49  20  4.47   0.34  18.58   0.57   4.18
 20    14   4.6667   7   49  18  4.24  -0.14  18.58   0.12   4.18

Simple Linear Regression

The regression equation is
Y = 11.1 + 0.354 x

Predictor     Coef  SE Coef     T      P
Constant    11.104    2.234  4.97  0.000
x           0.3536   0.1865  1.90  0.063

S = 8.33097   R-Sq = 5.8%   R-Sq(adj) = 4.2%

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       1   249.49  249.49  3.59  0.063
Residual Error  58  4025.49   69.41
Total           59  4274.98

Unusual Observations
Obs     x      Y    Fit  SE Fit  Residual  St Resid
 35  12.0  33.00  15.35    1.11     17.65     2.14R
 38  13.0  36.00  15.70    1.17     20.30     2.46R
 58  20.0   2.00  18.18    2.07    -16.18    -2.00R

R denotes an observation with a large standardized residual.

A four-in-one graphical display of the residuals is given below.

[Residual plots for Y: normal probability plot of the residuals, residuals versus the fitted values, histogram of the residuals, and residuals versus the order of the data.]

The top right graph (residuals versus the fitted values) shows curvature, suggesting a squared term be added to the model.

Quadratic Regression Analysis: Y versus x, xsq

The regression equation is
Y = -7.24 + 5.36 x - 0.238 xsq

Predictor      Coef  SE Coef       T      P
Constant     -7.243    1.831   -3.96  0.000
x            5.3572   0.4015   13.34  0.000
xsq        -0.23827  0.01857  -12.83  0.000

S = 4.26222   R-Sq = 75.8%   R-Sq(adj) = 74.9%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  3239.5  1619.7  89.16  0.000
Residual Error  57  1035.5    18.2
Total           59  4275.0

Source  DF  Seq SS
x        1   249.5
xsq      1  2990.0

Unusual Observations
Obs     x       Y     Fit  SE Fit  Residual  St Resid
 35  12.0  33.000  22.733   0.809    10.267     2.45R
 38  13.0  36.000  22.134   0.782    13.866     3.31R
 42  14.0  30.000  21.058   0.753     8.942     2.13R

R denotes an observation with a large standardized residual.

[Residual plots for Y from the quadratic fit: normal probability plot, residuals versus fitted values, histogram, and residuals versus observation order.]

The graph at top right suggests non-constant variance: make a square root transformation.
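For readers working in Python rather than Minitab, the same three least-squares fits can be set up as below. This sketch is not from the notes, and since the full 60-observation data set is not listed there, simulated counts with a roughly similar shape stand in for the real data; only the model structure (linear in x, quadratic in x, quadratic with √Y as the response) is the point.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = np.repeat(np.arange(1, 21), 3)                        # 3 couples per year, years 1..20
    y = rng.poisson(np.exp(0.42 + 0.49 * x - 0.021 * x**2))   # simulated argument counts

    X_lin = sm.add_constant(x)                                # Y = b0 + b1 x
    X_quad = sm.add_constant(np.column_stack([x, x**2]))      # Y = b0 + b1 x + b2 xsq

    print(sm.OLS(y, X_lin).fit().params)             # simple linear regression
    print(sm.OLS(y, X_quad).fit().params)            # quadratic regression
    print(sm.OLS(np.sqrt(y), X_quad).fit().params)   # quadratic regression on sqrt(Y)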
Regression Analysis: sqrtY versus x, xsq

The regression equation is
sqrtY = 0.236 + 0.814 x - 0.0358 xsq

Predictor       Coef   SE Coef       T      P
Constant      0.2358    0.2235    1.06  0.296
x            0.81396   0.04901   16.61  0.000
xsq        -0.035782  0.002267  -15.78  0.000

S = 0.520233   R-Sq = 83.0%   R-Sq(adj) = 82.4%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       2  75.236  37.618  139.00  0.000
Residual Error  57  15.427   0.271
Total           59  90.663

Source  DF  Seq SS
x        1   7.804
xsq      1  67.432

Unusual Observations
Obs     x   sqrtY     Fit  SE Fit  Residual  St Resid
  1   1.0  2.2361  1.0140  0.1829    1.2221     2.51R
  2   1.0  0.0000  1.0140  0.1829   -1.0140    -2.08R
 38  13.0  6.0000  4.7702  0.0954    1.2298     2.40R

R denotes an observation with a large standardized residual.

The plot of residuals versus fitted values now looks more random, suggesting the variances are constant.

[Residual plots for sqrtY: normal probability plot, residuals versus fitted values, histogram, and residuals versus observation order.]

Poisson Regression Output

Why use Poisson regression? Because, from the nature of the phenomenon, the probability distribution of Y should be given by the Poisson.

The SAS System -- The GENMOD Procedure

Model Information
Data Set            WORK.ARGUMENTS
Distribution        Poisson
Link Function       Log
Dependent Variable  y
Observations Used   60

Criteria For Assessing Goodness Of Fit
Criterion           DF      Value  Value/DF
Deviance            57    54.4054    0.9545
Scaled Deviance     57    54.4054    0.9545
Pearson Chi-Square  57    51.4574    0.9028
Scaled Pearson X2   57    51.4574    0.9028
Log Likelihood           1638.7309

Analysis Of Parameter Estimates
                         Standard      Wald 95%
Parameter  DF  Estimate     Error  Confidence Limits  Chi-Square  Pr > ChiSq
Intercept   1    0.4193    0.1845   0.0576    0.7810        5.16      0.0231
x           1    0.4904    0.0347   0.4223    0.5585      199.33      <.0001
xsq         1   -0.0213    0.0015  -0.0243   -0.0183      194.01      <.0001
Scale       0    1.0000    0.0000   1.0000    1.0000

Selected observations with their fitted values (pred) from the Poisson fit:

Obs   x  xsq   y     pred
  1   1    1   5   2.4311
  4   2    4   2   3.7240
  7   3    9   2   5.4662
 10   4   16   4   7.6885
 13   5   25  11  10.3628
 16   6   36  10  13.3840
 19   7   49  20  16.5644
 22   8   64  22  19.6445
 25   9   81  22  22.3247
 28  10  100  27  24.3112
 31  11  121  21  25.3691
 34  12  144  28  25.3678
 37  13  169  17  24.3073
 40  14  196  26  22.3187
 43  15  225  20  19.6372
 46  16  256  11  16.5564
 51  17  289  11  13.3762
 52  18  324  12  10.3556
 55  19  361   8   7.6824
 58  20  400   2   5.4612

[Plot of the fitted values (pred) versus year (x).]
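The SAS GENMOD fit is a GLM with a Poisson random component and log link, ln µ = β0 + β1x + β2·xsq. As an illustration that is not part of the original notes (and, since the full data set is not listed, using simulated counts of a similar shape), the same model can be fit in Python with statsmodels:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = np.repeat(np.arange(1, 21), 3)                        # 3 couples per year, years 1..20
    y = rng.poisson(np.exp(0.42 + 0.49 * x - 0.021 * x**2))   # simulated argument counts

    X = sm.add_constant(np.column_stack([x, x**2]))           # intercept, x, xsq
    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()    # log link is the Poisson default
    print(fit.params)                    # compare with SAS's 0.4193, 0.4904, -0.0213
    print(fit.deviance, fit.df_resid)    # analogue of the Deviance and DF in the SAS output

A deviance-to-DF ratio near 1 (0.9545 in the SAS output above) indicates that the Poisson model fits these counts reasonably well.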