Lecture 16 GLIMMIX - Animal Science Computer Labs

PROC GLIMMIX
Generalized Linear Mixed Models
Animal Science 500
Lecture No. 17-18
October 25, 2010
IOWA STATE UNIVERSITY
Department of Animal Science
GLIMMIX Information
• PROC GLIMMIX is a procedure for fitting generalized linear mixed models (GLMMs).
• GLMMs allow for non-normal data (as generalized linear models, GLiMs or GLMs, do) and for random effects.
• GLMMs also allow for correlation among responses.
Source: An Introduction to Generalized Linear Mixed Models Using SAS PROC GLIMMIX, P. Gibbs, SAS Technical Support
Getting GLIMMIX
• SAS 9.1: download the add-on (Windows, Unix, Linux) from
    http://support.sas.com
    http://www.sas.com/statistics
  - Supported on a limited number of platforms and platform configurations
• SAS 9.2 (available now for most academic sites)
GLIMMIX overview
• PROC GLIMMIX fits statistical models to data with correlations or nonconstant variability and where the response is not necessarily normally distributed.
• These models are known as generalized linear mixed models (GLMMs).
• GLMMs, like linear mixed models, assume normal (Gaussian) random effects.
• Conditional on these random effects, the data can have any distribution in the exponential family.
GLIMMIX overview
• The exponential family comprises many of the elementary discrete and continuous distributions and includes the:
  - Binary,
  - Binomial,
  - Poisson, and
  - Negative binomial distributions.
• Binary: a single trial with two possible outcomes. Repeating such a trial n times gives a binomial experiment, which has these properties:
  - The experiment consists of n repeated trials.
  - Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other a failure.
  - The probability of success, denoted by P, is the same on every trial.
  - The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
GLIMMIX overview
• Binomial:
  - Consider situations in which a coin, for example, is biased, so that heads and tails have different probabilities.
  - Each trial has just two possible outcomes, with fixed probabilities that sum to one.
  - The distribution of the number of successes in n such trials is called a binomial distribution.
GLIMMIX overview
• Poisson:
  - The Poisson distribution is an appropriate model for count data.
  - Examples of such data are mortality data, the number of misprints in a book, the number of bacteria on a plate, and the number of activations of a Geiger counter.
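A comparable Python sketch for the Poisson case evaluates P(Y = y) = λ^y e^(-λ) / y!. Here λ is both the mean and the variance of the count. The function name `poisson_pmf` and the plate average of 2 colonies are illustrative assumptions, not values from the lecture.

```python
from math import exp, factorial


def poisson_pmf(y: int, lam: float) -> float:
    """P(Y = y) for a Poisson count with mean (and variance) lam."""
    if y < 0:
        return 0.0
    return lam**y * exp(-lam) / factorial(y)


# Hypothetical plates averaging 2 bacterial colonies each:
print(round(poisson_pmf(0, 2.0), 4))  # 0.1353
print(round(poisson_pmf(2, 2.0), 4))  # 0.2707
```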
GLIMMIX overview
• Negative binomial:
  - The probability distribution of a negative binomial random variable is called a negative binomial distribution, also known as the Pascal distribution.
  - Example: You flip a coin repeatedly and count the number of heads (successes). If you continue flipping until the coin has landed on heads 2 times, you are conducting a negative binomial experiment.
  - The negative binomial random variable is the number of coin flips required to achieve 2 heads.
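The coin example can be written out as the Pascal form of the negative binomial. The probability that the r-th success occurs on trial x is C(x - 1, r - 1) p^r (1 - p)^(x - r). This Python sketch assumes a fair coin (p = 0.5) and r = 2 heads. The function name is hypothetical, and count-data software may use a different but related parameterization.

```python
from math import comb


def neg_binomial_trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x), where X is the number of trials needed to obtain the
    r-th success, each trial succeeding with probability p."""
    if x < r:
        return 0.0
    # The x-th trial must be the r-th success, so choose the positions
    # of the other r - 1 successes among the first x - 1 trials.
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


# Fair coin, flip until 2 heads:
for x in range(2, 6):
    print(x, neg_binomial_trials_pmf(x, r=2, p=0.5))
# 2 0.25
# 3 0.25
# 4 0.1875
# 5 0.125
```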
GLIMMIX overview
• The exponential family comprises many of the elementary discrete and continuous distributions.
  - The previous distributions are discrete members of this family.
  - The normal, beta, gamma, and chi-square distributions are representatives of the continuous distributions in this family.
GLIMMIX overview
• The continuous distributions in the exponential family include the:
  - Normal (also called Gaussian)
  - Beta
• The beta distribution has two main uses:
  1. As the description of uncertainty or random variation of a probability, fraction, or prevalence;
  2. As a flexible distribution that can be rescaled and shifted to create distributions with a wide range of shapes over any finite range. As such, it is sometimes used to model expert opinion.
GLIMMIX overview
• The continuous distributions in the exponential family include the:
  - Gamma distribution
• A gamma variable can arise as the sum of one or more exponentially distributed variables, so the gamma distribution is used in applications based on intervals between events. Examples include queuing models, the flow of items through manufacturing and distribution processes, the load on web servers, and the many and varied forms of telecom exchange.
• Because of its moderately skewed profile, it is used as a model in a range of disciplines:
  - Climatology, where it is a workable model for rainfall.
  - Financial services, where it has been used to model insurance claims and the size of loan defaults, and hence in probability-of-ruin and value-at-risk calculations.
From http://www.brighton-webs.co.uk/distributions/gamma.asp
GLIMMIX overview
• The continuous distributions in the exponential family include the:
  - Chi-square distributions
• The best-known use of the chi-square distribution is the common chi-square goodness-of-fit test, which compares an observed distribution to a known or theoretical distribution.
  - Example: comparing the expected movie-rating distribution to the observed movie-rating distribution
• It can also be used to test the independence of two criteria of classification of qualitative data.
• Test statistic: χ² = Σ (O - E)² / E, where O is the observed and E the expected frequency in each category.
GLIMMIX overview
• Chi-square distributions (continued)
• Hypotheses
  - H0: The distribution of observed frequencies equals the distribution of expected frequencies.
  - H1: The distribution of observed frequencies does not equal the distribution of expected frequencies.
• Assumptions
  - Observations are independent (each subject can appear once and only once in a table).
  - Expected frequencies in each row are at least 15.
Example of a Chi-Square Distribution
Example 1: Pepsi Challenge
• Test whether cola preference among 220 college students in a simple random sample is equally distributed.
  - Each individual tastes each of the three colas.
  - Between tastes, subjects eat a soda cracker.
  - Each subject receives the colas in a different order.
  - Each subject then selects which soda he/she likes best.
• Results: Pepsi 85, Coke 57, RC 78.
• Use equal expected frequencies for each row, E = 220/3 = 73.33.

  Cola     O     E        O - E     (O - E)²   (O - E)²/E
  Pepsi    85    73.33    11.67     136.19     1.86
  Coke     57    73.33   -16.33     266.67     3.64
  RC       78    73.33     4.67      21.81     0.30
  Totals   220   219.99                         χ² = 5.8

• df = rows - 1 = 3 - 1 = 2.
• Critical value of χ² = 5.99 at alpha = 0.05.
• Observed value of χ² = 5.8.
• Decision: Fail to reject H0.
Example from: http://www.philender.com/courses/intro/notes3/chi.html
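A short Python sketch can check the Pepsi Challenge arithmetic using only the standard library. It relies on a property of the chi-square distribution with 2 degrees of freedom: the upper-tail probability is exactly exp(-x/2), so the 5% critical value is -2 ln(0.05) ≈ 5.99. Computed without intermediate rounding, the statistic is about 5.79, which the table rounds to 5.8. The decision is unchanged.

```python
from math import exp, log

observed = {"Pepsi": 85, "Coke": 57, "RC": 78}
n = sum(observed.values())                 # 220 students
expected = n / len(observed)               # equal preference: 220/3 = 73.33

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the categories
chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                     # rows - 1 = 2

# With df = 2 the chi-square upper tail has the closed form exp(-x / 2),
# so the p-value and the 5% critical value need no statistics library.
p_value = exp(-chi_sq / 2)
critical_05 = -2 * log(0.05)

print(round(chi_sq, 2), round(critical_05, 2), round(p_value, 3))
# 5.79 5.99 0.055
```

Because 5.79 < 5.99 (equivalently, p ≈ 0.055 > 0.05), the sketch agrees with the decision to fail to reject H0.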
Distributions Supported in PROC GLIMMIX
• Discrete
  - Binary
  - Binomial
  - Poisson
  - Geometric
  - Negative Binomial
  - Multinomial (nominal and ordinal)
• Continuous
  - Beta
  - Normal
  - “Lognormal”
  - Gamma
  - Exponential
  - Inverse Gaussian
  - Shifted T
• Distributions are specified through the DIST= (and LINK=) options on the MODEL statement.
GLIMMIX overview
• In the absence of random effects, the GLIMMIX procedure fits generalized linear models (the models fit by the GENMOD procedure).
• GLMMs are useful for:
  - Estimating trends in disease rates,
  - Modeling CD4 counts in a clinical trial over time,
  - Modeling the proportion of infected plants on experimental units in a design with randomly selected treatments or randomly selected blocks,
  - Predicting the probability of high ozone levels in counties,
  - Modeling skewed data over time,
  - Analyzing customer preference,
  - Joint modeling of multivariate outcomes, etc.
GLIMMIX overview
• The SAS syntax for PROC GLIMMIX is similar to what we have learned for PROC MIXED.
• This includes the CLASS, MODEL, and RANDOM statements.
PROC GLIMMIX features
• SUBJECT= and GROUP= options, which enable blocking of variance matrices and parameter heterogeneity
• Best linear unbiased predictors
• Flexible covariance structures for random and residual random effects, including variance components, unstructured, autoregressive, and spatial structures
• The CONTRAST, ESTIMATE, LSMEANS, and LSMESTIMATE statements, which produce hypothesis tests and estimable linear combinations of effects
• The NLOPTIONS statement, which enables you to exercise control over the numerical optimization
  - You can choose techniques, update methods, line-search algorithms, convergence criteria, and more. Or you can use the default optimization strategies selected for the particular class of model you are fitting.
PROC GLIMMIX features
• Computed variables with SAS programming statements inside PROC GLIMMIX (except for variables listed in the CLASS statement)
  - These computed variables can appear in the MODEL, RANDOM, WEIGHT, or FREQ statement.
• User-specified link and variance functions
• Choice of model-based variance-covariance estimators for the fixed effects or empirical (sandwich) estimators, to make analysis robust against misspecification of the covariance structure and to adjust for small-sample bias
• Joint modeling for multivariate data
  - For example, you can model binary and normal responses from a subject jointly and use random effects to relate (fuse) the two outcomes.
Comparing the GLIMMIX and MIXED Procedures
• The MIXED procedure differs from the GLIMMIX procedure in the following respect:
  - Linear mixed models are a special case in the family of generalized linear mixed models.
  - A linear mixed model is a generalized linear mixed model in which the conditional distribution is normal and the link function is the identity function.
• Most models that can be fit with the MIXED procedure can also be fit with the GLIMMIX procedure.
• Despite this overlap in functionality, there are also some important differences between the two procedures.
• Knowing these differences enables you to select the most appropriate tool when you have a choice between procedures, and to identify situations where no choice exists.
Comparing the GLIMMIX and MIXED Procedures
• The following REPEATED statement in PROC MIXED
    repeated / subject=id type=ar(1);
  is equivalent to the following RANDOM statement in the GLIMMIX procedure:
    random _residual_ / subject=id type=ar(1);
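For intuition about what TYPE=AR(1) requests, this Python sketch builds the within-subject residual covariance matrix it implies. Measurements i and j have covariance σ² ρ^|i - j|. The sketch assumes the repeated measurements within each subject are equally spaced and sorted in time, and the values σ² = 2 and ρ = 0.5 are arbitrary. With SUBJECT=id, one such block applies to each subject, and observations on different subjects are uncorrelated.

```python
def ar1_covariance(n_times: int, sigma2: float, rho: float) -> list[list[float]]:
    """Within-subject covariance matrix for an AR(1) structure:
    Cov(e_i, e_j) = sigma2 * rho**|i - j| for equally spaced times."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


for row in ar1_covariance(4, sigma2=2.0, rho=0.5):
    print(row)
# [2.0, 1.0, 0.5, 0.25]
# [1.0, 2.0, 1.0, 0.5]
# [0.5, 1.0, 2.0, 1.0]
# [0.25, 0.5, 1.0, 2.0]
```

The correlation decays geometrically with the lag |i - j|, which is why AR(1) suits equally spaced repeated measures.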
Syntax: GLIMMIX Procedure
• You can specify the following statements in the GLIMMIX procedure:
    PROC GLIMMIX <options> ;
    BY variables ;
    CLASS variables ;
    CONTRAST 'label' contrast-specification <, contrast-specification> <, ...> </ options> ;
    COVTEST <'label'> <test-specification> </ options> ;
    EFFECT effect-specification ;
    ESTIMATE 'label' contrast-specification <(divisor=n)>
             <, 'label' contrast-specification <(divisor=n)>> <, ...> </ options> ;
    FREQ variable ;
Syntax: GLIMMIX Procedure
    ID variables ;
    LSMEANS fixed-effects </ options> ;
    LSMESTIMATE fixed-effect <'label'> values <divisor=n>
                <, <'label'> values <divisor=n>> <, ...> </ options> ;
    MODEL response<(response-options)> = <fixed-effects> </ model-options> ;
    MODEL events/trials = <fixed-effects> </ model-options> ;
    NLOPTIONS <options> ;
    OUTPUT <OUT=SAS-data-set>
           <keyword<(keyword-options)> <=name>>...
           <keyword<(keyword-options)> <=name>> </ options> ;
    PARMS (value-list) ... </ options> ;
    RANDOM random-effects </ options> ;
Syntax: GLIMMIX Procedure
    WEIGHT variable ;
    Programming statements ;
• The CLASS, CONTRAST, COVTEST, EFFECT, ESTIMATE, LSMEANS, LSMESTIMATE, and RANDOM statements and the programming statements can appear multiple times.
• The PROC GLIMMIX and MODEL statements are required. If a CLASS statement is included, the MODEL statement must appear after it.
• The EFFECT statements must appear before the MODEL statement.
Comparing MIXED and GLIMMIX
  PROC GLIMMIX                PROC MIXED
  BY                          BY
  CLASS                       CLASS
  CONTRAST                    CONTRAST
  EFFECT
  ESTIMATE                    ESTIMATE
  FREQ
  ID                          ID
  LSMEANS                     LSMEANS
  LSMESTIMATE
  MODEL                       MODEL
  NLOPTIONS
  OUTPUT
  PARMS                       PARMS
                              PRIOR
  RANDOM                      RANDOM
                              REPEATED
  WEIGHT                      WEIGHT
  <Programming Statements>
Comparing MIXED and GLIMMIX
MIXED uses the RANDOM statement for G-side effects and the REPEATED statement for R-side effects.
Comparing MIXED and GLIMMIX
In GLIMMIX, both types of effects are specified with the RANDOM statement.
Comparing MIXED and GLIMMIX
What are G-side and R-side random effects?
• Recall from mixed models: Y = X*Beta + Z*Gamma + E
• G-side effects enter through Z*Gamma.
• R-side effects apply to the covariance matrix of E.
• G-side effects are “inside” the link function, making them easier to interpret and understand.
• R-side effects are “outside” the link function and are more difficult to interpret.
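Here is a minimal Python sketch of what "inside the link function" means for a binomial model with a logit link. The G-side random effect is added to the linear predictor, and only then is the inverse link applied. The intercept (-5.9213) and the Load(Date*Site) variance (0.1569) are taken from the pig-transport example output in this lecture. Evaluating a load one standard deviation above or below average is a hypothetical illustration, and the function names are made up.

```python
from math import exp, sqrt


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + exp(-eta))


def conditional_probability(x_beta: float, z_gamma: float) -> float:
    """G-side random effects are added to the linear predictor *before*
    the inverse link is applied: mu = inv_logit(X*Beta + Z*Gamma)."""
    return inv_logit(x_beta + z_gamma)


# Values borrowed from the pig-transport example output (logit scale):
x_beta = -5.9213              # Intercept (= Red treatment LS-mean)
sd_load = sqrt(0.1569)        # SD of the Load(Date*Site) variance component

# Hypothetical loads: one SD below average, average, one SD above average
for gamma in (-sd_load, 0.0, sd_load):
    print(f"load effect {gamma:+.3f} -> P(death) = "
          f"{conditional_probability(x_beta, gamma):.5f}")
```

An R-side effect, by contrast, would modify the variance of the responses around these conditional probabilities rather than shifting the linear predictor.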
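Glimmix Example
proc glimmix data=one;
   class treatment date site load;
   /* events/trials syntax: deads = dead pigs (events),
      pigs_transported = pigs transported (trials) */
   model deads/pigs_transported = treatment / dist=binomial link=logit solution;
   /* G-side random effects: site, date within site,
      and load within date-by-site */
   random site date(site) load(date*site);
   /* ilink: also report LS-means on the probability scale;
      pdiff: pairwise differences; cl: confidence limits */
   lsmeans treatment / ilink pdiff cl;
run;
quit;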
Glimmix Example
The GLIMMIX Procedure
Model Information
  Data Set                       WORK.ONE
  Response Variable (Events)     Deads
  Response Variable (Trials)     Pigs_Transported
  Response Distribution          Binomial
  Link Function                  Logit
  Variance Function              Default
  Variance Matrix                Not blocked
  Estimation Technique           Residual PL
  Degrees of Freedom Method      Containment
Glimmix Example
  Class       Levels   Values
  Treatment    2       Blue Red
  Date        10       07/07/09 07/08/09 07/13/09 07/14/09 07/15/09
                       07/20/09 07/21/09 07/22/09 07/27/09 07/28/09
  Site         2       L&L1 LPB
  Load        27       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                       19 20 21 22 23 24 25 26 28

  Number of Observations Read      54
  Number of Observations Used      54
  Number of Events                 10
  Number of Trials               4462
Glimmix Example
Dimensions
  G-side Cov. Parameters       3
  Columns in X                 3
  Columns in Z                40
  Subjects (Blocks in V)       1
  Max Obs per Subject         54
Glimmix Example
The GLIMMIX Procedure
Iteration History
                                           Objective                     Max
  Iteration   Restarts   Subiterations    Function        Change        Gradient
  0           0          1                180.8730287     2.00000000    9.588598
  1           0          0                226.21287482    0.17907168    5.707842
  2           0          3                244.93049605    2.00000000    4.510821
  3           0          2                241.99123222    0.24664831    4.378435
  4           0          2                241.22432004    0.03671922    4.357186
  5           0          1                241.08063527    0.00328332    4.35531
  6           0          1                241.06655367    0.00015363    4.355223
  7           0          0                241.06587398    0.00000000    4.355221

Convergence criterion (PCONV=1.11022E-8) satisfied.
Estimated G matrix is not positive definite.

The "Estimated G matrix is not positive definite" message usually indicates that one or more variance components on the RANDOM statement are estimated to be zero and could (or should) be removed from the model.
Glimmix Example
Fit Statistics
  -2 Res Log Pseudo-Likelihood    241.07
  Generalized Chi-Square           47.41
  Gener. Chi-Square / DF            0.91
Glimmix Example
Covariance Parameter Estimates
  Cov Parm           Estimate   Standard Error
  Site               0          .
  Date(Site)         0          .
  Load(Date*Site)    0.1569     0.7068
Glimmix Example
Solutions for Fixed Effects
  Effect      Treatment   Estimate   Standard Error   DF   t Value   Pr > |t|
  Intercept               -5.9213    0.4160            1   -14.23    0.0447
  Treatment   Blue        -0.4067    0.6466           26    -0.63    0.5348
  Treatment   Red          0         .                 .     .        .
Glimmix Example
Type III Tests of Fixed Effects
  Effect      Num DF   Den DF   F Value   Pr > F
  Treatment   1        26       0.40      0.5348
Glimmix Example
Treatment Least Squares Means
  Treatment  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper    Mean
  Blue       -6.3280   0.5066          26  -12.49   <.0001    0.05   -7.3692  -5.2867  0.001782
  Red        -5.9213   0.4160          26  -14.23   <.0001    0.05   -6.7764  -5.0662  0.002675

Treatment Least Squares Means
  Treatment  Standard Error Mean  Lower Mean  Upper Mean
  Blue       0.000901             0.000630    0.005033
  Red        0.001110             0.001139    0.006267
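The ILINK option converts the logit-scale LS-means to the probability (data) scale. This Python sketch reproduces that conversion for the values in the LS-means table:

- The mean is the inverse logit of the estimate.
- The confidence limits are the inverse logits of the logit-scale limits.
- The standard error uses the delta method, SE × μ(1 - μ).

The delta-method formula is the standard approximation. Treating it as exactly what GLIMMIX computes is an assumption, but it matches the printed values up to rounding.

```python
from math import exp


def inv_logit(eta: float) -> float:
    """Inverse logit: probability corresponding to a logit-scale value."""
    return 1.0 / (1.0 + exp(-eta))


# Logit-scale LS-means from the output: estimate, SE, lower, upper
lsmeans = {
    "Blue": (-6.3280, 0.5066, -7.3692, -5.2867),
    "Red":  (-5.9213, 0.4160, -6.7764, -5.0662),
}

results = {}
for trt, (est, se, lo, hi) in lsmeans.items():
    mean = inv_logit(est)
    # Delta method: d/d(eta) inv_logit(eta) = mu * (1 - mu)
    se_mean = se * mean * (1 - mean)
    # Confidence limits are transformed endpoint by endpoint
    results[trt] = (mean, se_mean, inv_logit(lo), inv_logit(hi))
    print(f"{trt}: mean={mean:.6f} se={se_mean:.6f} "
          f"CI=({inv_logit(lo):.6f}, {inv_logit(hi):.6f})")
```

Up to rounding, the printed values should agree with the Mean, Standard Error Mean, Lower Mean, and Upper Mean columns.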
Glimmix Example
Differences of Treatment Least Squares Means
  Treatment  _Treatment  Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha  Lower    Upper
  Blue       Red         -0.4067   0.6466          26  -0.63    0.5348    0.05   -1.7358  0.9224
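The Blue minus Red difference is on the logit scale, so exponentiating it and its confidence limits gives an odds ratio of death for Blue relative to Red. This Python sketch does that arithmetic with the values printed in the differences table. The odds-ratio interpretation is added here for illustration and is not part of the original output.

```python
from math import exp

# Blue - Red difference on the logit scale, with its 95% limits
diff, lower, upper = -0.4067, -1.7358, 0.9224

odds_ratio = exp(diff)
print(f"OR (Blue vs Red) = {odds_ratio:.3f}, "
      f"95% CI ({exp(lower):.3f}, {exp(upper):.3f})")
```

The interval (about 0.18 to 2.52) contains 1. This agrees with the nonsignificant treatment test (Pr > |t| = 0.5348).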