PROC GLIMMIX: Generalized Linear Mixed Models
Animal Science 500, Lecture Nos. 17-18, October 25, 2010
IOWA STATE UNIVERSITY Department of Animal Science

GLIMMIX Information
- PROC GLIMMIX is a procedure for fitting generalized linear mixed models (GLMMs).
- Generalized linear models (GLiMs, or GLMs) allow for non-normal data. GLMMs add random effects, which allow for correlation among responses.

An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX
P. Gibbs, SAS Technical Support

Getting GLIMMIX
- SAS 9.1: download the add-on (Windows, Unix, Linux) from http://support.sas.com (see also http://www.sas.com/statistics). The add-on is supported on a limited number of platforms and platform configurations.
- SAS 9.2: available now for most academic sites.

GLIMMIX overview
PROC GLIMMIX fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed. These models are known as generalized linear mixed models (GLMMs). GLMMs, like linear mixed models, assume normal (Gaussian) random effects. Conditional on these random effects, the data can have any distribution in the exponential family.

GLIMMIX overview: the exponential family
The exponential family comprises many of the elementary discrete and continuous distributions, including:
- Binary (Bernoulli): each trial can result in just two possible outcomes. We call one of these outcomes a success and the other a failure. Repeating such a trial gives an experiment with these properties:
  - The experiment consists of n repeated trials.
  - The probability of success, denoted by P, is the same on every trial.
  - The trials are independent; that is, the outcome of one trial does not affect the outcomes of the other trials.
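The binomial experiment just described can be sketched in Python. The number of successes in n independent trials with success probability P follows the binomial probability mass function C(n, k) P^k (1 - P)^(n - k). The values n = 10 and P = 0.5 below are illustrative assumptions, not data from this lecture.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values (assumed): 10 tosses of a fair coin.
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(exactly 5 heads) = {binomial_pmf(5, n, p):.4f}")
print(f"sum over k = {sum(probs):.4f}")                          # probabilities sum to one
print(f"mean = {sum(k * q for k, q in enumerate(probs)):.2f}")   # equals n * p
```

Setting n = 1 gives the binary (Bernoulli) case: `binomial_pmf(1, 1, p)` returns p.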
- Binomial: the number of successes in n independent trials, each with two possible outcomes whose probabilities sum to one. The two outcomes need not be equally likely. For example, a biased coin has heads and tails with different probabilities.
- Poisson: an appropriate model for count data. Examples of such data are mortality counts, the number of misprints in a book, the number of bacteria on a plate, and the number of activations of a Geiger counter.
- Negative binomial (also known as the Pascal distribution): the probability distribution of a negative binomial random variable. Example: you flip a coin repeatedly and count the number of heads (successes). If you keep flipping until the coin has landed on heads 2 times, you are conducting a negative binomial experiment. The negative binomial random variable is the number of coin flips required to achieve 2 heads.

The distributions above are the discrete members of this family.
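The Poisson and negative binomial examples above can be made concrete with their probability mass functions. The sketch below uses the "number of trials needed to reach the r-th success" form of the negative binomial, matching the coin example: P(X = x) = C(x - 1, r - 1) p^r (1 - p)^(x - r) for x = r, r + 1, .... A fair coin (p = 0.5) and a Poisson mean of 3 are assumptions made for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson count with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

def neg_binomial_trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x), where X is the number of trials needed to reach the r-th success."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

# Coin example from the slide: keep flipping a (fair, assumed) coin until 2 heads appear.
r, p = 2, 0.5
for x in range(2, 6):
    print(f"P({x} flips needed) = {neg_binomial_trials_pmf(x, r, p):.4f}")

# Mean number of flips; truncating the infinite sum at 200 terms is ample here.
mean_flips = sum(x * neg_binomial_trials_pmf(x, r, p) for x in range(r, 200))
print(f"expected flips = {mean_flips:.3f}")   # theory: r / p

# Poisson counts with an assumed mean of 3 (e.g., misprints per page).
print(f"P(0 misprints) = {poisson_pmf(0, 3.0):.4f}")
```

With a fair coin, getting 2 heads takes r / p = 4 flips on average.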
The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.

GLIMMIX overview: continuous distributions in the exponential family
- Normal (also called Gaussian).
- Beta. The beta distribution has two main uses:
  1. As the description of uncertainty or random variation of a probability, fraction, or prevalence.
  2. As a flexible distribution that can be rescaled and shifted to create distributions with a wide range of shapes over any finite range. As such, it is sometimes used to model expert opinion.
- Gamma. A gamma variable with integer shape parameter is the sum of one or more independent exponentially distributed variables, so the gamma distribution is applied to intervals between events. Examples include queuing models, the flow of items through manufacturing and distribution processes, the load on web servers, and the many and varied forms of telecom exchange. Because of its moderately skewed profile, it is also used in a range of other disciplines. In climatology it is a workable model for rainfall. In financial services it has been used to model insurance claims and the size of loan defaults, and hence in probability-of-ruin and value-at-risk calculations.
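The two properties above, a gamma variable as a sum of exponential waiting times and a beta variable rescaled onto a finite range, can be checked by simulation. This is a minimal sketch using Python's standard `random` module. The shape k = 3, scale 2.0, beta parameters (2, 5), and range [10, 30] are all assumed values chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible
N = 100_000

# Gamma as a sum of k independent exponential waiting times, each with mean theta.
# The shape k = 3 and scale theta = 2.0 are illustrative assumptions.
k, theta = 3, 2.0
waits = [sum(random.expovariate(1 / theta) for _ in range(k)) for _ in range(N)]
print(f"gamma: mean {statistics.fmean(waits):.2f} (theory {k * theta}), "
      f"variance {statistics.variance(waits):.2f} (theory {k * theta**2})")

# Beta rescaled and shifted onto a finite range [low, high], e.g. an expert's
# guess for a quantity known to lie between 10 and 30 (all values assumed).
a, b, low, high = 2.0, 5.0, 10.0, 30.0
scaled = [low + (high - low) * random.betavariate(a, b) for _ in range(N)]
print(f"rescaled beta: mean {statistics.fmean(scaled):.2f} "
      f"(theory {low + (high - low) * a / (a + b):.2f}), "
      f"range [{min(scaled):.2f}, {max(scaled):.2f}]")
```

The simulated gamma mean and variance should land close to k·θ and k·θ², and every rescaled beta draw stays inside [low, high].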
From http://www.brighton-webs.co.uk/distributions/gamma.asp

- Chi-square. The best-known use of the chi-square distribution is in the common chi-square goodness-of-fit tests. These compare an observed distribution to a known or theoretical distribution, for example an expected distribution of movie ratings versus the observed distribution of movie ratings. The chi-square distribution can also be used to test the independence of two criteria of classification of qualitative data. The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is an observed frequency and E is the corresponding expected frequency.

Chi-square test
Hypotheses
- H0: The distribution of observed frequencies equals the distribution of expected frequencies.
- H1: The distribution of observed frequencies does not equal the distribution of expected frequencies.
Assumptions
- Observations are independent (each subject can appear once and only once in a table).
- Expected frequencies in each row are at least 5.

Example of a chi-square test. Example 1: Pepsi Challenge
Test whether cola preference among 220 college students in a simple random sample is equally distributed. Each individual tastes each of the three colas and eats a soda cracker between tastes. Each subject receives the colas in a different order and then selects the cola he or she likes best.
Results: Pepsi 85, Coke 57, RC 78. With equal expected frequencies for each row, E = 220/3 = 73.33.

           O      E       O-E
Pepsi     85    73.33    11.67
Coke      57    73.33   -16.33
RC        78    73.33     4.67
Totals   220   219.99

df = rows - 1 = 3 - 1 = 2. The critical value of χ² at alpha = 0.05 is 5.99. The observed value of χ² is 5.8. Decision: fail to reject H0.
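The arithmetic of the Pepsi Challenge test can be checked with a short Python sketch. For df = 2 the chi-square survival function is exp(-x/2), so the p-value and the 5% critical value have closed forms and no statistics library is needed. This shortcut holds only for 2 degrees of freedom. Exact arithmetic gives χ² ≈ 5.79. The 5.8 on the slide comes from rounded intermediate values.

```python
import math

observed = {"Pepsi": 85, "Coke": 57, "RC": 78}
n = sum(observed.values())            # 220 students
expected = n / len(observed)          # equal preference: 220 / 3 ≈ 73.33

# Chi-square statistic: sum over rows of (O - E)^2 / E
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # rows - 1 = 2

# For df = 2 only: P(chi-square > x) = exp(-x / 2),
# so the p-value and the alpha = 0.05 critical value are closed-form.
p_value = math.exp(-chi_sq / 2)
critical = -2 * math.log(0.05)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.3f}")
print(f"critical value (alpha = 0.05) = {critical:.2f}")
print("reject H0" if chi_sq > critical else "fail to reject H0")
```

The p-value is about 0.055, just above 0.05, which agrees with the slide's decision to fail to reject H0.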
Squared deviations and each row's contribution to χ²:

          (O-E)²    χ² = (O-E)²/E
Pepsi     136.19        1.86
Coke      266.67        3.64
RC         21.81        0.30
Total                   5.80

Example from: http://www.philender.com/courses/intro/notes3/chi.html

Distributions Supported in PROC GLIMMIX
- Discrete: Binary, Binomial, Poisson, Geometric, Negative Binomial, Multinomial (nominal and ordinal)
- Continuous: Beta, Normal, "Lognormal", Gamma, Exponential, Inverse Gaussian, Shifted T
Distributions are specified through the DIST= (and LINK=) options on the MODEL statement.

GLIMMIX overview
In the absence of random effects, the GLIMMIX procedure fits generalized linear models (the models fit by the GENMOD procedure). GLMMs are useful for:
- estimating trends in disease rates,
- modeling CD4 counts in a clinical trial over time,
- modeling the proportion of infected plants on experimental units in a design with randomly selected treatments or randomly selected blocks,
- predicting the probability of high ozone levels in counties,
- modeling skewed data over time,
- analyzing customer preference,
- joint modeling of multivariate outcomes, etc.

GLIMMIX overview
The SAS syntax for GLIMMIX is similar to what we have learned for PROC MIXED. It includes the CLASS, MODEL, and RANDOM statements.

PROC GLIMMIX features
- SUBJECT= and GROUP= options, which enable blocking of variance matrices and parameter heterogeneity.
- Linear unbiased predictors.
- Flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures.
- The CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis tests and estimable linear combinations of effects.
- The NLOPTIONS statement, which enables you to exercise control over the numerical optimization.
  You can choose techniques, update methods, line-search algorithms, convergence criteria, and more, or you can accept the default optimization strategy selected for the particular class of model you are fitting.

PROC GLIMMIX features (continued)
- Computed variables with SAS programming statements inside PROC GLIMMIX (except for variables listed in the CLASS statement). These computed variables can appear in the MODEL, RANDOM, WEIGHT, or FREQ statement.
- User-specified link and variance functions.
- A choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich) estimators, which make the analysis robust against misspecification of the covariance structure and adjust for small-sample bias.
- Joint modeling for multivariate data. For example, you can model binary and normal responses from a subject jointly and use random effects to relate (fuse) the two outcomes.

Comparing the GLIMMIX and MIXED Procedures
The MIXED procedure differs from the GLIMMIX procedure in the following respect: linear mixed models are a special case in the family of generalized linear mixed models. A linear mixed model is a generalized linear mixed model in which the conditional distribution is normal and the link function is the identity function. Most models that can be fit with the MIXED procedure can also be fit with the GLIMMIX procedure. Despite this overlap in functionality, there are also some important differences between the two procedures. Knowing these differences lets you select the most appropriate tool when you have a choice between procedures, and identify situations where no choice exists.
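A link function g connects the conditional mean μ to the linear predictor η = g(μ), and its inverse g⁻¹ maps η back to the data scale. The ILINK option used later in this lecture requests this back-transformation. The Python sketch below shows three common link/inverse-link pairs. Identity, log, and logit are the canonical links for the normal, Poisson, and binomial distributions, respectively. This is an illustration of the mathematics, not GLIMMIX's internal implementation.

```python
import math

# Each entry: (link g(mu) -> eta, inverse link g^{-1}(eta) -> mu)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),               # normal
    "log": (math.log, math.exp),                                # Poisson counts
    "logit": (lambda mu: math.log(mu / (1 - mu)),               # binomial proportions
              lambda eta: 1 / (1 + math.exp(-eta))),
}

# A linear mixed model is the special case: normal distribution + identity link.
mu = 0.25
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:8s}: mu = {mu} -> eta = {eta:.4f} -> back to mu = {g_inv(eta):.4f}")
```

Each round trip g⁻¹(g(μ)) returns the original mean. Only the scale on which effects are additive changes.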
Comparing the GLIMMIX and MIXED Procedures
The following REPEATED statement in PROC MIXED

   repeated / subject=id type=ar(1);

is equivalent to the following RANDOM statement in the GLIMMIX procedure:

   random _residual_ / subject=id type=ar(1);

Syntax: GLIMMIX Procedure
You can specify the following statements in the GLIMMIX procedure:

   PROC GLIMMIX <options> ;
   BY variables ;
   CLASS variables ;
   CONTRAST 'label' contrast-specification <, contrast-specification> <, ...> </ options> ;
   COVTEST <'label'> <test-specification> </ options> ;
   EFFECT effect-specification ;
   ESTIMATE 'label' contrast-specification <(divisor=n)> <, 'label' contrast-specification <(divisor=n)>> <, ...> </ options> ;
   FREQ variable ;
   ID variables ;
   LSMEANS fixed-effects </ options> ;
   LSMESTIMATE fixed-effect <'label'> values <divisor=n> <, <'label'> values <divisor=n>> <, ...> </ options> ;
   MODEL response<(response-options)> = <fixed-effects> </ model-options> ;
   MODEL events/trials = <fixed-effects> </ model-options> ;
   NLOPTIONS <options> ;
   OUTPUT <OUT=SAS-data-set> <keyword<(keyword-options)> <=name>> ... <keyword<(keyword-options)> <=name>> </ options> ;
   PARMS (value-list) ... </ options> ;
   RANDOM random-effects </ options> ;
   WEIGHT variable ;
   Programming statements ;

The CLASS, CONTRAST, COVTEST, EFFECT, ESTIMATE, LSMEANS, LSMESTIMATE, and RANDOM statements and the programming statements can appear multiple times. The PROC GLIMMIX and MODEL statements are required, and the MODEL statement must appear after the CLASS statement if a CLASS statement is included. The EFFECT statements must appear before the MODEL statement.
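The TYPE=AR(1) structure in the REPEATED / RANDOM _RESIDUAL_ comparison above models the covariance between measurements i and j on the same subject as σ² ρ^|i−j|. Measurements further apart in time are less correlated, and only two parameters are needed however many time points there are. A minimal Python sketch of that matrix follows. The 4 time points, σ² = 2.0, and ρ = 0.6 are assumed values for illustration.

```python
def ar1_covariance(n_times: int, sigma2: float, rho: float) -> list[list[float]]:
    """Covariance matrix with entries sigma2 * rho**|i - j| (first-order autoregressive)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]

# Assumed values: 4 repeated measures on one subject, variance 2.0, lag-1 correlation 0.6.
R = ar1_covariance(4, sigma2=2.0, rho=0.6)
for row in R:
    print(" ".join(f"{v:6.3f}" for v in row))
```

With SUBJECT=id, one such block applies to each subject, and observations on different subjects are uncorrelated.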
Comparing MIXED and GLIMMIX
Statements available in each procedure:
- PROC GLIMMIX: BY, CLASS, CONTRAST, EFFECT, ESTIMATE, FREQ, ID, LSMEANS, LSMESTIMATE, MODEL, NLOPTIONS, OUTPUT, PARMS, PRIOR, RANDOM, WEIGHT, <programming statements>
- PROC MIXED: BY, CLASS, CONTRAST, ESTIMATE, ID, LSMEANS, MODEL, PARMS, RANDOM, REPEATED, WEIGHT

Comparing MIXED and GLIMMIX
- MIXED uses the RANDOM statement for G-side effects and the REPEATED statement for R-side effects.
- Both types of effects are specified with the RANDOM statement in GLIMMIX.

Comparing MIXED and GLIMMIX: What are G-side and R-side random effects?
Recall from mixed models: Y = X*Beta + Z*Gamma + E
- G-side effects enter through Z*Gamma.
- R-side effects apply to the covariance matrix of E.
- G-side effects are "inside" the link function, making them easier to interpret and understand.
- R-side effects are "outside" the link function and are more difficult to interpret.

Glimmix Example

   Proc glimmix data=one;
      Class treatment date site load;
      Model deads/pigs_transported = treatment / dist=binomial link=logit solution;
      Random site date(site) load(date*site);
      LSMeans treatment / ilink pdiff cl;
   Run;
   Quit;

Glimmix Example: output
The GLIMMIX Procedure

Model Information
Data Set                       WORK.ONE
Response Variable (Events)     Deads
Response Variable (Trials)     Pigs_Transported
Response Distribution          Binomial
Link Function                  Logit
Variance Function              Default
Variance Matrix                Not blocked
Estimation Technique           Residual PL
Degrees of Freedom Method      Containment

Class       Levels   Values
Treatment    2       Blue Red
Date        10       07/07/09 07/08/09 07/13/09 07/14/09 07/15/09 07/20/09 07/21/09 07/22/09 07/27/09 07/28/09
Site         2       L&L1 LPB
Load        27       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28

Number of Observations Read     54
Number of Observations Used     54
Number of Events                10
Number of Trials              4462

Dimensions
G-side Cov. Parameters     3
Columns in X               3
Columns in Z              40
Subjects (Blocks in V)     1
Max Obs per Subject       54

Iteration History
                                      Objective
Iteration  Restarts  Subiterations    Function        Change        Max Gradient
    0         0           1           180.8730287     2.00000000    9.588598
    1         0           0           226.21287482    0.17907168    5.707842
    2         0           3           244.93049605    2.00000000    4.510821
    3         0           2           241.99123222    0.24664831    4.378435
    4         0           2           241.22432004    0.03671922    4.357186
    5         0           1           241.08063527    0.00328332    4.35531
    6         0           1           241.06655367    0.00015363    4.355223
    7         0           0           241.06587398    0.00000000    4.355221

Convergence criterion (PCONV=1.11022E-8) satisfied.
Estimated G matrix is not positive definite.
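Why does a zero variance component trigger the "not positive definite" message? With variance-component random effects, G is diagonal and holds each component's variance. A symmetric matrix is positive definite exactly when its Cholesky factorization succeeds with strictly positive pivots, so any component estimated at zero makes G singular. The Python sketch below tests this using the Site, Date(Site), and Load(Date*Site) estimates reported in the covariance parameter table that follows. As a simplification it collapses G to one column per effect. The actual G here has 40 columns, matching "Columns in Z". The helper names are hypothetical.

```python
import math

def is_positive_definite(m: list[list[float]], tol: float = 1e-12) -> bool:
    """Attempt a Cholesky factorization; it succeeds only for a positive definite m."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= tol:          # zero or negative pivot: not positive definite
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True

def diag(values):
    """Diagonal matrix with the given variances on the diagonal (hypothetical helper)."""
    return [[v if i == j else 0.0 for j in range(len(values))] for i, v in enumerate(values)]

# Simplified G: one entry per random effect (Site, Date(Site), Load(Date*Site)).
print(is_positive_definite(diag([0.0, 0.0, 0.1569])))   # two components at zero
print(is_positive_definite(diag([0.1569])))             # only Load(Date*Site) kept
```

Dropping the zero components leaves a positive definite G. This mirrors the advice to remove such terms from the RANDOM statement.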
The message "Estimated G matrix is not positive definite" usually indicates that one or more variance components on the RANDOM statement are estimated to be zero and could (or should) be removed from the model.

Glimmix Example: Fit Statistics
-2 Res Log Pseudo-Likelihood     241.07
Generalized Chi-Square            47.41
Gener. Chi-Square / DF             0.91

Glimmix Example: Covariance Parameter Estimates
Cov Parm            Estimate   Standard Error
Site                0          .
Date(Site)          0          .
Load(Date*Site)     0.1569     0.7068

Glimmix Example: Solutions for Fixed Effects
Effect      Treatment   Estimate   Standard Error   DF   t Value   Pr > |t|
Intercept               -5.9213    0.4160            1   -14.23    0.0447
Treatment   Blue        -0.4067    0.6466           26    -0.63    0.5348
Treatment   Red          0         .                 .     .       .

Glimmix Example: Type III Tests of Fixed Effects
Effect      Num DF   Den DF   F Value   Pr > F
Treatment   1        26       0.40      0.5348

Glimmix Example: Treatment Least Squares Means
Treatment   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper     Mean
Blue        -6.3280    0.5066           26   -12.49    <.0001     0.05    -7.3692   -5.2867   0.001782
Red         -5.9213    0.4160           26   -14.23    <.0001     0.05    -6.7764   -5.0662   0.002675

Treatment   Standard Error Mean   Lower Mean   Upper Mean
Blue        0.000901              0.000630     0.005033
Red         0.001110              0.001139     0.006267

Glimmix Example: Differences of Treatment Least Squares Means
Treatment   _Treatment   Estimate   Standard Error   DF   t Value   Pr > |t|   Alpha   Lower     Upper
Blue        Red          -0.4067    0.6466           26   -0.63     0.5348     0.05    -1.7358   0.9224
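The "Mean" columns requested by ILINK are the logit-scale LS-means passed through the inverse logit. The Python sketch below reproduces them from the printed estimates, to rounding. It gets the "Standard Error Mean" column from the delta method, SE(μ) ≈ μ(1 − μ) · SE(η), and the interval limits by back-transforming the logit-scale limits. The last lines interpret the Blue − Red difference, which on the logit scale is a log odds ratio, by exponentiating it. That odds-ratio step is an added interpretation, not part of the printed output.

```python
import math

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: converts a log-odds value to a probability."""
    return 1 / (1 + math.exp(-eta))

# Logit-scale LS-means: (estimate, standard error, lower, upper) from the output above.
lsmeans = {
    "Blue": (-6.3280, 0.5066, -7.3692, -5.2867),
    "Red": (-5.9213, 0.4160, -6.7764, -5.0662),
}

for trt, (eta, se, lo, hi) in lsmeans.items():
    mu = inv_logit(eta)
    se_mu = mu * (1 - mu) * se   # delta method: d(mu)/d(eta) = mu * (1 - mu)
    print(f"{trt}: mean {mu:.6f}, SE {se_mu:.6f}, "
          f"CI ({inv_logit(lo):.6f}, {inv_logit(hi):.6f})")

# The Blue - Red difference on the logit scale is a log odds ratio.
diff, diff_lo, diff_hi = -0.4067, -1.7358, 0.9224
print(f"odds ratio Blue vs Red: {math.exp(diff):.3f} "
      f"(95% CI {math.exp(diff_lo):.3f} to {math.exp(diff_hi):.3f})")
```

The odds ratio is about 0.67, but its confidence interval spans 1. This is consistent with the non-significant treatment test (p = 0.5348).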