*10. GLM, and General Non-linear Models

GLM models are Generalized Linear Models. They extend the multiple regression model. The GAM (Generalized Additive Model) model is a further extension.

10.1 A Taxonomy of Extensions to the Linear Model

R allows a variety of extensions to the multiple linear regression model. In this chapter we describe the alternative functional forms. The basic model formulation[41] is:

   Observed value = Model Prediction + Statistical Error

Often it is assumed that the statistical error values (values of ε in the discussion below) are independently and identically distributed as Normal. Generalized Linear Models, and the other extensions we describe, allow a variety of non-normal distributions. In the discussion of this section, our focus is on the form of the model prediction, and we leave until later sections the discussion of different possibilities for the "error" distribution.

   [41] This may be generalized in various ways. Models which have this form may be nested within other models which have this basic form. Thus there may be `predictions' and `errors' at different levels within the total model.

Multiple regression model

   y = α + β1 x1 + β2 x2 + ... + βp xp + ε

Use lm() to fit multiple regression models. The various other models we describe are, in essence, generalizations of this model.

Generalized Linear Model (e.g. logit model)

   y = g(a + b1 x1) + ε

Here g(.) is selected from one of a small number of options. For logit models, y = π + ε, where

   log(π / (1 − π)) = a + b1 x1

Here π is an expected proportion, and log(π / (1 − π)) = logit(π) is the log(odds). We can turn this model around, and write

   y = g(a + b1 x1) + ε = exp(a + b1 x1) / (1 + exp(a + b1 x1)) + ε

Here g(.) undoes the logit transformation. We can add more explanatory variables: a + b1 x1 + ... + bp xp. Use glm() to fit generalized linear models.

Additive Model

   y = φ1(x1) + φ2(x2) + ... + φp(xp) + ε

Additive models are a generalization of lm models. In one dimension, y = φ1(x1) + ε. Some of the terms z1 = φ1(x1), z2 = φ2(x2), ..., zp = φp(xp) may be smoothing functions, while others may be the usual linear model terms. The constant term gets absorbed into one or more of the φs.

Generalized Additive Model

   y = g(φ1(x1) + φ2(x2) + ... + φp(xp)) + ε

Generalized Additive Models are a generalization of Generalized Linear Models. For example, g(.) may be the function that undoes the logit transformation, as in a logistic regression model. Some of the terms z1 = φ1(x1), z2 = φ2(x2), ..., zp = φp(xp) may be smoothing functions, while others may be the usual linear model terms. We can transform to get the model

   y = g(z1 + z2 + ... + zp) + ε

Notice that even if p = 1, we may still want to retain both φ1(.) and g(.), i.e.

   y = g(φ1(x1)) + ε

The reason is that g(.) is a specific function, such as the inverse of the logit function. The function φ1(.) does any further necessary smoothing, in case g(.) is not quite the right transformation. One wants g(.) to do as much as possible of the task of transformation, with φ1(.) giving the transformation any necessary additional flourishes.

At the time of writing, R has no specific provision for generalized additive models. The fitting of spline (bs() or ns()) terms in a linear model or a generalized linear model will often do what is needed.
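To summarise the taxonomy, the sketch below shows how each of the four model types might be specified in R. It is a minimal sketch only, not code from these notes: the data frame df and its columns y, x1 and x2 are hypothetical stand-ins, and a 0/1 response is assumed for the two logit fits.

library(splines)
fit.lm  <- lm(y ~ x1 + x2, data = df)        # multiple regression
fit.glm <- glm(y ~ x1 + x2, family = binomial(link = logit),
               data = df)                    # generalized linear model (logit)
fit.add <- lm(y ~ ns(x1, 3) + x2, data = df)
                                             # additive model: spline term for x1
fit.gam <- glm(y ~ ns(x1, 3) + x2, family = binomial(link = logit),
               data = df)                    # generalized additive model, via a
                                             # spline term in a glm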
10.2 Logistic Regression

We will use a logistic regression model as a starting point for discussing Generalized Linear Models. With proportions that range from less than 0.1 to 0.99, it is not reasonable to expect that the expected proportion will be a linear function of x. Some such transformation (`link' function) as the logit is required.

A good way to think about logit models is that they work on a log(odds) scale. If p is a probability (e.g. that horse A will win the race), then the corresponding odds are p/(1 − p), and

   log(odds) = log(p / (1 − p)) = log(p) − log(1 − p)

The linear model predicts, not p, but log(p / (1 − p)). Fig. 24 shows the logit transformation.

Figure 24: The logit or log(odds) transformation. Shown here is a plot of log(odds), i.e. logit(proportion), versus proportion. Notice how the range is stretched out at both ends.

The logit or log(odds) function turns expected proportions into values that may range from −∞ to +∞. It is not satisfactory to use a linear model to predict proportions. The values from the linear model may well lie outside the range from 0 to 1. It is however in order to use a linear model to predict logit(proportion). The logit function is an example of a link function. There are various other link functions that we can use with proportions. One of the commonest is the complementary log-log function.

10.2.1 Anesthetic Depth Example

Thirty patients were given an anesthetic agent that was maintained at a pre-determined [alveolar] concentration for 15 minutes before making an incision[42]. It was then noted whether the patient moved, i.e. jerked or twisted. The interest is in estimating how the probability of jerking or twisting varies with increasing concentration of the anesthetic agent.

The response is best taken as nomove, for reasons that will emerge later. There is a small number of concentrations; so we begin by tabulating the proportion that have the nomove outcome against concentration.

                        Alveolar Concentration
   Nomove          0.8    1.0    1.2    1.4    1.6    2.5
   0 (move)          6      4      2      2      0      0
   1 (no move)       1      1      4      4      4      2
   Total             7      5      6      6      4      2

Table 1: Patients moving (0) and not moving (1), for each of six different alveolar concentrations.

   [42] I am grateful to John Erickson (Anesthesia and Critical Care, University of Chicago) and to Alan Welsh (Centre for Mathematics & its Applications, Australian National University) for allowing me use of these data.

Fig. 25 then displays a plot of these proportions.

Figure 25: Plot, versus concentration, of proportion of patients not moving. The horizontal line is the estimate of the proportion of moves one would expect if the concentration had no effect.

We fit two models, the logit model and the complementary log-log model. We can fit the models either directly to the 0/1 data, or to the proportions in Table 1. To understand the output, you need to know about "deviances". A deviance has a role very similar to a sum of squares in regression. Thus we have:

   Regression                               Logistic regression
   degrees of freedom                       degrees of freedom
   sum of squares                           deviance
   mean sum of squares (divide by d.f.)     mean deviance (divide by d.f.)

Just as we prefer regression models with a small mean residual sum of squares, so we prefer logistic regression models with a small mean deviance.

If individuals respond independently, with the same probability, then we have Bernoulli trials. Justification for assuming the same probability will arise from the way in which individuals are sampled. While individuals will certainly be different in their response, the notion is that, each time a new individual is taken, they are drawn at random from some larger population.
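Before fitting the model, one can reproduce the proportions of Table 1, and the corresponding empirical log(odds), directly from the 0/1 data. This is a minimal sketch, assuming (as in the model fit below) that the data frame anesthetic holds the 0/1 variable nomove and the variable conc:

z <- table(anesthetic$nomove, anesthetic$conc)   # counts, as in Table 1
tot <- apply(z, 2, sum)            # total patients at each concentration
prop <- z[2, ] / tot               # proportion with nomove == 1
log(prop / (1 - prop))             # empirical log(odds); infinite where
                                   # prop is 0 or 1, as at conc = 1.6 and 2.5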
Here is the R code for the logit fit:

> anaes.logit <- glm(nomove ~ conc, family = binomial(link = logit),
+                    data = anesthetic)

The output summary is:

> summary(anaes.logit)

Call: glm(formula = nomove ~ conc, family = binomial(link = logit),
        data = anesthetic)

Deviance Residuals:
    Min      1Q  Median     3Q    Max
  -1.77  -0.744  0.0341  0.687   2.07

Coefficients:
            Value Std. Error t value
(Intercept) -6.47       2.42   -2.68
conc         5.57       2.04    2.72

(Dispersion Parameter for Binomial family taken to be 1)

    Null Deviance: 41.5 on 29 degrees of freedom
Residual Deviance: 27.8 on 28 degrees of freedom

Number of Fisher Scoring Iterations: 5

Correlation of Coefficients:
     (Intercept)
conc      -0.981

Fig. 26 is a graphical summary of the results:

Figure 26: Plot, versus concentration, of log(odds) [= logit(proportion)] of patients not moving. The line is the estimate of the proportion of moves, based on the fitted logit model.

With such a small sample size it is impossible to do much that is useful to check the adequacy of the model. You can also try plot(anaes.logit) and plot.gam(anaes.logit).

10.3 glm models (Generalized Linear Regression Modelling)

In the above we had

anaes.logit <- glm(nomove ~ conc, family = binomial(link = logit),
                   data = anesthetic)

The family parameter specifies the distribution for the dependent variable. There is an optional argument that allows us to specify the link function. Below we give further examples.

10.3.2 Data in the form of counts

Data that are in the form of counts can often be analysed quite effectively assuming the poisson family. The link that is commonly used here is log. The log link transforms from positive numbers to numbers in the range −∞ to +∞ that a linear model may predict.
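As an illustration, a count response might be fitted as follows. This is a minimal sketch only; the data frame counts.df and its columns count, x1 and x2 are hypothetical stand-ins, not data supplied with these notes.

counts.glm <- glm(count ~ x1 + x2, family = poisson(link = log),
                  data = counts.df)   # Poisson errors, log link
summary(counts.glm)                   # gives deviances, as for the logit
                                      # model above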
10.3.3 The gaussian family

If no family is specified, then the family is taken to be gaussian. The default link is then the identity, as for an lm model. This way of formulating an lm type model does however have the advantage that one is not restricted to the identity link.

data(airquality)
air.glm <- glm(Ozone^(1/3) ~ Solar.R + Wind + Temp, data = airquality)
# Assumes gaussian family, i.e. normal errors model
summary(air.glm)

10.4 Models that Include Smooth Spline Terms

These make it possible to fit spline and other smooth transformations of explanatory variables. One can request a `smooth' b-spline or n-spline transformation of a column of the X matrix. In place of x one specifies bs(x) or ns(x). One can control the smoothness of the curve, but often the default works quite well. You need to load the splines library, i.e. library(splines). R does not at present have a facility for plots that show the contribution of each term to the model.

10.4.1 Dewpoint Data

The data set dewpoint[43] has columns mintemp, maxtemp and dewpoint. The dewpoint values are averages, for each combination of mintemp and maxtemp, of monthly data from a number of different times and locations. We fit the model:

   dewpoint = mean of dewpoint + smooth(mintemp) + smooth(maxtemp)

Taking out the mean is a computational convenience. Also it provides a more helpful form of output. Here are details of the calculations:

dewpoint.lm <- lm(dewpoint ~ bs(mintemp) + bs(maxtemp),
                  data = dewpoint)
options(digits = 3)
summary(dewpoint.lm)

   [43] I am grateful to Dr Edward Linacre, Visiting Fellow, Geography Department, Australian National University, for making these data available.

10.5 Non-linear Models

You can use nls() (non-linear least squares) to obtain a least squares fit to a non-linear function.

10.6 Model Summaries

Type in

methods(summary)

to get a list of the summary methods that are available. You may want to mix and match, e.g. summary.lm() on an aov or glm object. The output may not be what you might expect. So be careful!

10.7 Further Elaborations

Generalised Linear Models were developed in the 1970s. They unified a huge range of diverse methodology. They have now become a stock-in-trade of statistical analysts. Their practical implementation built on the powerful computational abilities that, by the 1970s, had been developed for handling linear model calculations.

Practical data analysis demands further elaborations. An important elaboration is the incorporation of more than one term in the error structure. The R nlme library implements such extensions, both for linear models and for a wide class of nonlinear models.

Each such new development builds on the theoretical and computational tools that have arisen from earlier developments. Exciting new analysis tools will continue to appear for a long time yet. This is fortunate. Most professional users of R will regularly encounter data where the methodology that the data ideally demands is not yet available.

10.8 Exercises

1. Fit a Poisson regression model to the data in the data frame moths that accompanies these notes. Allow different intercepts for different habitats. Use log(meters) as a covariate.

10.9 References

Dobson, A. J. 1983. An Introduction to Statistical Modelling. Chapman and Hall, London.

Hastie, T. J. and Tibshirani, R. J. 1990. Generalized Additive Models. Chapman and Hall, London.

McCullagh, P. and Nelder, J. A. 1989. Generalized Linear Models, 2nd edn. Chapman and Hall, London.

Venables, W. N. and Ripley, B. D. 1997. Modern Applied Statistics with S-Plus, 2nd edn. Springer, New York.