Lecture 14 - 15 MIXED NEW

Use of PROC MIXED to Analyze Experimental Data
Animal Science 500
Lecture No.
October , 2010
IOWA STATE UNIVERSITY
Department of Animal Science
GLM and MIXED in SAS

The SAS procedures GLM and MIXED can both be used to fit linear models.

PROC GLM was designed to fit fixed-effect models.
    It is commonly used to analyze data from a wide range of experiments.
    It was later amended to fit some random-effect models by including the RANDOM statement with the TEST option.
    The REPEATED statement in PROC GLM makes it possible to estimate and test repeated-measures models with an arbitrary correlation structure for the repeated observations.

PROC MIXED was specifically designed to fit mixed-effect models.
    These models can include fixed effects, random effects, and repeated effects.
It can model data with heterogeneous variances and autocorrelated observations.
PROC MIXED gives the user more flexibility in specifying correlation structures, which is particularly useful in repeated-measures and random-effect models.

SAS has made many advancements to different procedures over the years.
PROC MIXED is not an extension of GLM.
    It is based on different statistical principles.
    GLM and MIXED use different estimation methods.
    GLM uses ordinary least squares (OLS) estimation.
        The parameter estimates are the values of the model parameters that minimize the sum of squared differences between the observed and predicted values of the dependent variable.
        OLS provides the familiar analysis of variance table, in which the variability in the dependent variable (the total sum of squares) is partitioned into variation due to different sources (sums of squares for the effects in the model).
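
The least squares criterion described above can be written compactly. The matrix notation (y for the vector of observations, X for the design matrix, β for the parameter vector) is introduced here for illustration, and the decomposition in the second line assumes the usual model with an intercept and a corrected total sum of squares.

```latex
\[
\hat{\boldsymbol\beta}_{\text{OLS}}
  = \arg\min_{\boldsymbol\beta}\;
    (\mathbf{y}-\mathbf{X}\boldsymbol\beta)'(\mathbf{y}-\mathbf{X}\boldsymbol\beta)
\]
\[
SS_{\text{total}} = SS_{\text{model}} + SS_{\text{error}}
\]
```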

With PROC MIXED you do not get the analysis of variance table that you may be used to evaluating.
    It uses estimation methods based on different principles.
PROC MIXED offers several methods of estimation; the three covered here are:
    ML (maximum likelihood),
    REML (restricted or residual maximum likelihood, which is the default method), and
    MIVQUE0 (minimum variance quadratic unbiased estimation).

Defining fixed or random factor

                      Random                                   Fixed
Levels                Selected at random from a conceptually   Finite number of possibilities
                      infinite collection of possibilities
Another experiment    Would use different levels from the      Would use the same levels of the
                      same population                          factor
Goals                 Estimate variance components             Estimate means
Inference             For all levels of the factor (i.e.,      Only for levels actually used in
                      for the population from which the        the experiment
                      levels are selected)

From D. A. Dickey, 2008: SAS Global Forum

Characteristics of Mixed Solutions
Fixed effects: Best Linear Unbiased Estimates = BLUE
Random effects: Best Linear Unbiased Predictions = BLUP
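
For reference, the standard textbook expressions behind these two terms are sketched below. The notation is an assumption made for this illustration: the model is y = Xβ + ZU + e, G is the covariance matrix of the random effects U, R is the covariance matrix of the errors e, and V = ZGZ' + R. The formulas treat G and R as known; in practice they are replaced by estimates.

```latex
\[
\hat{\boldsymbol\beta}
  = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\,\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}
  \qquad\text{(BLUE of the fixed effects)}
\]
\[
\hat{\mathbf{U}}
  = \mathbf{G}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol\beta})
  \qquad\text{(BLUP of the random effects)}
\]
```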

Comparison with GLM
GLM has a RANDOM statement; however, it fits all effects as if they were fixed.
Usually the tests of interest, even in a mixed model, are tests of differences among fixed effects.
Random effects can be tested by using likelihood ratio tests.
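
A likelihood ratio test for a random effect can be sketched as follows. The symbols are introduced here for illustration only: ℓ_full and ℓ_red are the maximized log likelihoods of the models with and without the random effect, fitted to the same data by the same method and, under REML, with identical fixed effects.

```latex
\[
\lambda \;=\; \bigl(-2\,\ell_{\text{red}}\bigr) \;-\; \bigl(-2\,\ell_{\text{full}}\bigr) \;\ge\; 0
\]
```

The statistic is referred to a chi-square distribution whose degrees of freedom equal the number of covariance parameters removed. Because a variance of 0 lies on the boundary of the parameter space, the plain chi-square p-value tends to be conservative.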

PROC MIXED General Form
PROC MIXED is used to fit models of the form
    y = Xβ + ZU + e
where
    y is a vector of responses,
    X is a known design matrix for the fixed effects,
    β is a vector of unknown fixed-effect parameters,
    Z is a known design matrix for the random effects,
    U is a vector of unknown random-effect parameters, and
    e is a vector of (normally distributed) random errors.
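
The usual distributional assumptions that accompany this model can be written as below. The matrices G and R are not named on the slide; they are the conventional symbols for the covariance matrices of U and e. In PROC MIXED the RANDOM statement determines Z and G, and the REPEATED statement determines R.

```latex
\[
\mathbf{U} \sim N(\mathbf{0}, \mathbf{G}), \qquad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{R}), \qquad
\operatorname{Cov}(\mathbf{U}, \mathbf{e}) = \mathbf{0}
\]
\[
\operatorname{Var}(\mathbf{y}) = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}
\]
```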

PROC MIXED Syntax
The PROC MIXED syntax is similar to the syntax of PROC GLM, with a few important differences:
    The RANDOM and REPEATED statements are used differently.
    Random effects are not listed in the MODEL statement.
    GLM has MEANS and LSMEANS statements, whereas MIXED has only the LSMEANS statement.
    GLM offers Type I, II, III and IV tests for fixed effects, while MIXED offers Type I, II and III (Type III is the default; the HTYPE= option of the MODEL statement selects the others).

The following is a general form of the PROC MIXED statements:
    PROC MIXED options;
        CLASS variable-list;
        MODEL dependent = fixed-effects / options;
        RANDOM random-effects / options;
        REPEATED repeated-effect / options;
        CONTRAST 'label' fixed-effect values | random-effect values / options;
        ESTIMATE 'label' fixed-effect values | random-effect values / options;
        LSMEANS fixed-effects / options;
        MAKE 'table' OUT=SAS-data-set < options >;
    RUN;

The CONTRAST, ESTIMATE, LSMEANS, MAKE and RANDOM statements can appear multiple times; all other statements can appear only once.
The PROC MIXED and MODEL statements are required.
The MODEL statement must appear after the CLASS statement if a CLASS statement is used.
The CONTRAST, ESTIMATE, LSMEANS, RANDOM and REPEATED statements must follow the MODEL statement.
The CONTRAST and ESTIMATE statements must follow the RANDOM statement if a RANDOM statement is used.

PROC MIXED <options>;
Selected options:
    DATA=SAS-data-set
        Names the SAS data set to be used by PROC MIXED. The default is the most recently created data set.
    METHOD=REML | ML | MIVQUE0
        Specifies the estimation method. REML is the default method.
    COVTEST
        Prints asymptotic standard errors and Wald Z-tests for the variance-covariance parameter estimates.
        For example, if a random effect A is included in the model, then the estimate of the variance of A is printed together with the Wald test of the hypothesis that the variance of A is 0.
        The COVTEST option is specified in the PROC MIXED statement, before the semicolon. For example:
            Proc mixed data=mydata method=reml covtest;

CLASS variables;
    The CLASS statement is used in the same manner as in the other SAS procedures we have used.
    It lists the classification variables (the categorical independent variables in the model).
    For example:
        PROC MIXED data=mydata covtest;
        Class group gender agecat;

MODEL dependent = fixed-effects </ options>;
    The MODEL statement names a single dependent variable and the fixed effects, that is, the independent variables that are not random.
    This differs from GLM, where several dependent variables can be listed on the left-hand side of the equation; only one can be listed with MIXED.
    A separate PROC MIXED step (with its own CLASS, MODEL, RANDOM, REPEATED, etc. statements) is required for each dependent variable.
    An intercept is included in the model by default. The NOINT option can be used to remove the intercept.

Selected options of the MODEL statement:
    CHISQ requests that chi-square tests (Wald tests) be performed for all fixed effects in addition to the F-tests.
    DDFM=RESIDUAL | CONTAIN | BETWITHIN | SATTERTH
        The DDFM= option specifies the method for computing the denominator degrees of freedom for the tests of fixed effects.
        DDFM=SATTERTH requests the Satterthwaite approximation for the denominator degrees of freedom.
        For balanced designs with random effects it will produce the same test results as the RANDOM …/ TEST option in PROC GLM (if the default METHOD=REML is used in PROC MIXED).

RANDOM random-effects </ options>;
    The RANDOM statement defines the random effects in the model.
    It can be used to specify traditional variance components (independent random effects with different variances) or to list correlated random effects and specify a covariance structure for them with the TYPE=covariance-structure option.
    A variety of structures are available. Those most frequently used include
        TYPE=VC, a variance components structure, and
        TYPE=UN, an unstructured (that is, arbitrary) covariance matrix.
    TYPE=VC is the default structure. In the following example, the effect of subject is random.

PROC MIXED Syntax Example
    Proc mixed data=one method=reml covtest;
        Class gender treat subject;
        Model y = gender treat gender*treat / ddfm=satterth;
        Random subject(gender);
    Run;
    Quit;

REPEATED repeated-effect / options;
    The REPEATED statement is used in PROC MIXED to specify the covariance structure of the error term.
    The repeated effect has to be categorical, it has to appear in the CLASS statement, and the data have to be sorted accordingly.
        For example, suppose that each pig or steer was weighed at five equally spaced time points.
        Time is the repeated effect, and the data have to be sorted by subject and by time within each subject.
        If time is also used as a continuous independent variable in the model, then a new variable, say t, identical to time has to be defined, and t should be used in the CLASS and REPEATED statements.

PROC MIXED Syntax Repeated Example
    Data one; Set one;
        T = time;   /* categorical copy of time for the CLASS and REPEATED statements */
    Run;
    Quit;
    Proc sort data=one;
        By group id t;
    Run;
    Quit;
    Proc mixed data=one covtest;
        Class t group id;
        Model y = group time group*time;
        Repeated t / type=ar(1) subject=id;
    Run;
    Quit;

The TYPE= option in the REPEATED statement specifies the type of the error correlation structure.
The one specified in the above example is the first-order autoregressive correlation.
The SUBJECT= option is needed to identify observations that are correlated.
Observations within the same subject are correlated with the type of correlation specified in TYPE=; observations from different subjects are independent.
The TYPE= option allows for many types of correlation structures. The most commonly used are autoregressive, compound symmetry, Huynh-Feldt, Toeplitz, variance components, unstructured and spatial.

Common covariance structures used with the REPEATED statement

Variance components:
    The VC structure is the standard variance components structure and is the default.
    Under VC the effects are not correlated with one another: all covariances are assumed to be 0.
    The random error terms are uncorrelated as well.

\[
\begin{bmatrix}
\sigma_A^2 & 0 & 0 & 0 \\
0 & \sigma_B^2 & 0 & 0 \\
0 & 0 & \sigma_{AB}^2 & 0 \\
0 & 0 & 0 & \sigma_{BA}^2
\end{bmatrix}
\]

Compound symmetry:
    This structure says that the correlations between all pairs of measures are the same.
    One reason for its popularity is that in many simple cases it gives the same results as the univariate analysis from pre-PROC MIXED repeated measures ANOVA programs, including SAS's own PROC GLM.
    The assumption is not unreasonable when the repeated measures arise from different sets of conditions, such as the response to different treatments.
    Its biggest drawback is that it is often unrealistic when the repeated measures are serial measurements, that is, the same response measured over time.
    Typically, measurements that are made close together (consecutive measurements, say) will be more highly correlated than measurements made farther apart (the first and last).

\[
\begin{bmatrix}
\sigma^2+\sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma^2+\sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma^2+\sigma_1^2 & \sigma_1^2 \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma^2+\sigma_1^2
\end{bmatrix}
\]

Autoregressive (1):
    This structure resolves some of the objections to the use of compound symmetry with serial data when the measures are equally spaced over time.
    AR(1) says that the correlation between two responses that are t measurements apart is ρ^t.
    Since ρ is less than 1 in absolute value, the greater the power, the smaller the magnitude.
    Thus, the farther apart measurements are, the lower their correlation.

\[
\sigma^2
\begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 \\
\rho & 1 & \rho & \rho^2 \\
\rho^2 & \rho & 1 & \rho \\
\rho^3 & \rho^2 & \rho & 1
\end{bmatrix}
\]
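
As a small worked illustration of the AR(1) pattern, suppose ρ = 0.6 (a value chosen here purely for illustration, not taken from any data set in these slides). The correlation between measurements 1, 2 and 3 steps apart is then 0.6, 0.36 and 0.216, so the correlation matrix for four time points would be:

```latex
\[
\rho = 0.6:\qquad
\rho^1 = 0.6,\quad \rho^2 = 0.36,\quad \rho^3 = 0.216
\]
\[
\begin{bmatrix}
1 & 0.6 & 0.36 & 0.216 \\
0.6 & 1 & 0.6 & 0.36 \\
0.36 & 0.6 & 1 & 0.6 \\
0.216 & 0.36 & 0.6 & 1
\end{bmatrix}
\]
```

Only two parameters (σ² and ρ) are estimated, however many time points there are.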

Unstructured:
    Sometimes no standard covariance structure seems appropriate.
    The unstructured option estimates every variance and covariance individually and lets the data themselves dictate what they should be.
    It is the most liberal of all covariance structures, as it allows every term to be different.
    The more of the data that is used to estimate the covariance structure, the less there is to estimate the parameters of the linear model.
    An analysis that uses an unstructured covariance matrix will be less powerful than an analysis that uses the proper structure.
    The challenge is knowing what the structure is.

\[
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{bmatrix}
\]

Toeplitz:
    The TOEP structure is similar to the AR(1) in that all measurements next to each other have the same correlation.
    Measurements two apart have the same correlation, different from the first; measurements three apart have the same correlation, different from the first two; and so on.
    However, the correlations do not necessarily have the same pattern as in the AR(1). Technically, the AR(1) is a special case of the Toeplitz.

\[
\begin{bmatrix}
\sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\
\sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_3 & \sigma_2 & \sigma_1 & \sigma^2
\end{bmatrix}
\]

What is the proper way to choose among many covariance structures?
Ideally, the covariance structure should be known from previous work or subject-matter considerations.
    Otherwise, one runs the risk of "shopping" for the structure that leads to a preconceived result.
However, there are many cases where the structure is unknown, or where the analyst would like to check that s/he is not making a mistake (in the manner of checking for an interaction between a factor and a covariate in an analysis of covariance model).
It is common to consider a few likely structures and choose among them according to some measure of fit.
The purist statistician would say, as above, that this is not really the correct way to choose a structure, but it is what happens in practice.

These measures of fit tend to be composed of two parts: one that rewards the accuracy of the fit and another that penalizes the number of parameters it takes to achieve it.
The most popular of these is the Akaike Information Criterion (AIC).
In the reward portion, the AIC looks at how well the estimated and observed structures agree, or rather the extent to which they differ.
    Smaller values are good.
In the penalty portion, the AIC considers how many parameters it takes to achieve the fit.
Thus, one might analyze the data using the CS, AR(1), and UN covariance structures and choose the one for which the AIC is a minimum.
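
One way to carry out this comparison is sketched below. It reuses the data set and variable names from the repeated-measures example earlier in the lecture (one, y, group, time, t, id); the macro name fit_cov and the output data set names are hypothetical and exist only for this illustration. All three fits use the same fixed effects and the default REML method, so that their AIC values are comparable.

```sas
/* Sketch only: fit the same model under three covariance structures
   and collect the Fit Statistics tables for side-by-side comparison. */
%macro fit_cov(type, label);
    ods output FitStatistics=fit_&label;   /* capture the Fit Statistics table */
    proc mixed data=one;
        class t group id;
        model y = group time group*time;
        repeated t / type=&type subject=id;
    run;
    data fit_&label;                        /* tag each table with its structure */
        set fit_&label;
        length structure $8;
        structure = "&label";
    run;
%mend fit_cov;

%fit_cov(cs,    CS)
%fit_cov(ar(1), AR1)
%fit_cov(un,    UN)

data fit_all;
    set fit_CS fit_AR1 fit_UN;
run;

/* keep only the AIC rows; the smallest value indicates the preferred structure */
proc print data=fit_all;
    where Descr like 'AIC (%';
run;
```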

Example (Littell, Milliken, Stroup, Wolfinger)
Consider a multi-location trial (9 locations) with 4 treatments.
The treatments were observed at each of the 9 locations, and at each location an RCB design with 3 blocks was used.
The model is as follows:

Writing the model statement
\[
y_{ijk} = \mu + \tau_i + L_j + R(L)_{jk} + (\tau L)_{ij} + e_{ijk}
\]
where
    \(y_{ijk}\) is the observation,
    \(\mu\) is the overall mean,
    \(\tau_i\) is the treatment effect,
    \(L_j\) is the random location effect, \(\sim N(0, \sigma_L^2)\),
    \(R(L)_{jk}\) is the block within location effect, \(\sim N(0, \sigma_R^2)\),
    \((\tau L)_{ij}\) is the treatment by location effect, \(\sim N(0, \sigma_T^2)\), and
    \(e_{ijk}\) is the random error, \(\sim N(0, \sigma^2)\).

SAS code to analyze
The SAS code we'll use to fit the model is the following.
    Proc Mixed;
        Class loc block trt;
        Model resp = trt / ddfm=satterth;
        Random loc block(loc) loc*trt;
    Run;
    Quit;

Note: This code uses the default variance components (VC) structure, which gives us an estimate of the block-within-location variance along with the other variance components. Because the errors are assumed to be independent with a common variance, we do not need to specify a REPEATED statement.

CONTRAST 'label' fixed-effect values | random-effect values / options;
ESTIMATE 'label' fixed-effect values | random-effect values / options;
    The CONTRAST statement is used when there is a need for custom hypothesis tests, and the ESTIMATE statement when there is a need for custom estimates. Although they were extended in PROC MIXED to include random effects, their use is very similar to that of the CONTRAST and ESTIMATE statements in PROC GLM.
SAS/STAT(R) 9.2 User's Guide, Second Edition
    LABEL is required for every CONTRAST or ESTIMATE statement. It identifies the contrast or estimated parameter in the output. It cannot be longer than 20 characters.
    FIXED-EFFECT is the name of an effect appearing in the MODEL statement.
    RANDOM-EFFECT is the name of an effect appearing in the RANDOM statement.
    VALUES are the coefficients of the contrast to be tested or the parameter to be estimated.
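
A minimal sketch of both statements follows, reusing the multi-location example from this lecture (variables resp, trt, loc and block, with 4 treatments). The labels and the particular comparisons are made up for illustration, and the coefficient order assumes the treatment levels sort as 1, 2, 3, 4.

```sas
/* Sketch only: custom estimate and custom test on the treatment effect */
proc mixed;
    class loc block trt;
    model resp = trt / ddfm=satterth;
    random loc block(loc) loc*trt;
    /* custom estimate: treatment 1 minus treatment 2 */
    estimate 'trt 1 vs trt 2' trt 1 -1 0 0;
    /* custom test: treatment 1 versus the average of treatments 2 to 4 */
    contrast 'trt 1 vs others' trt 3 -1 -1 -1;
run;
```

Both statements are placed after the RANDOM statement, in line with the ordering rules given earlier.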

LSMEANS fixed-effects / options;
    Its use is similar to that in GLM.
    LSMEANS computes the least squares means of fixed effects.
    The ADJUST= option requests a multiple-comparison adjustment to the p-values for pairwise comparisons of means.
        The following adjustments are available: BON (Bonferroni), DUNNETT, SCHEFFE, SIDAK, SIMULATE, SMM|GT2 and TUKEY.
        The ADJUST= option results in all possible pairwise comparisons.
        If only comparisons with a control level are needed, then PDIFF=CONTROL should be used in addition to the ADJUST= option.
    The SLICE= option makes it possible to test the significance of one effect at each level of another effect.
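
The options above might be combined as in the following sketch. It reuses the two example models from earlier in the lecture; treating level '1' of trt as the control is an assumption made only for this illustration.

```sas
/* Sketch only: multi-location example with adjusted comparisons */
proc mixed;
    class loc block trt;
    model resp = trt / ddfm=satterth;
    random loc block(loc) loc*trt;
    /* all pairwise differences with a Tukey adjustment */
    lsmeans trt / pdiff adjust=tukey;
    /* comparisons with a control level only (level '1' is assumed) */
    lsmeans trt / pdiff=control('1') adjust=dunnett;
run;

/* Sketch only: test the treat effect separately within each gender */
proc mixed data=one;
    class gender treat subject;
    model y = gender treat gender*treat / ddfm=satterth;
    random subject(gender);
    lsmeans gender*treat / slice=gender;
run;
```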

MAKE 'table' OUT=SAS-data-set < options >;
    The MAKE statement converts any table produced by PROC MIXED into a SAS data set.
    The NOPRINT option can be used to prevent printing of the requested table.
    Only requested or default output can be converted into a SAS data set.
        The P option has to be used in the MODEL statement to produce a data set with predicted values, and
        the LSMEANS statement has to be included to output least squares means.
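
In current SAS releases the ODS OUTPUT statement is generally used for the same purpose. The sketch below is an alternative to MAKE rather than the approach shown on the slide; the output data set names (lsm_out, cov_out, tests_out, pred_out) are hypothetical, and the model is the earlier subject(gender) example.

```sas
/* Sketch only: save PROC MIXED tables as data sets through ODS */
ods output LSMeans=lsm_out      /* least squares means         */
           CovParms=cov_out     /* covariance parameter table  */
           Tests3=tests_out;    /* Type 3 tests of fixed effects */
proc mixed data=one covtest;
    class gender treat subject;
    /* OUTP= writes the predicted values to a data set */
    model y = gender treat gender*treat / ddfm=satterth outp=pred_out;
    random subject(gender);
    lsmeans treat;
run;
```

As with MAKE, a table can only be saved if the step actually produces it, so the LSMEANS statement is still needed to obtain the least squares means.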

Understanding the PROC MIXED LOG
If the model is correct and runs, a message stating "convergence criteria met" will appear in the log.
If you see anything else, this usually means that either your model is not correct or the assumption of normality of the data does not hold.
A common message is "G matrix not positive definite".
    This arises because the mean square within subjects is greater than the mean square between subjects.
    The G matrix can be displayed by adding the G option to the RANDOM statement; the NOBOUND option of the PROC MIXED statement allows variance estimates to become negative.
        If the estimate is a small negative value relative to the size of the residual, there is nothing to worry about; if not, the model may need to be changed.
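
A minimal sketch of these two options, using the earlier subject(gender) example as a stand-in model:

```sas
/* Sketch only: display G and allow negative variance estimates */
proc mixed data=one covtest nobound;   /* NOBOUND removes the lower bound of 0 */
    class gender treat subject;
    model y = gender treat gender*treat / ddfm=satterth;
    random subject(gender) / g;        /* G prints the estimated G matrix */
run;
```

NOBOUND belongs to the PROC MIXED statement, while G is an option of the RANDOM statement.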

Understanding the PROC MIXED OUTPUT
Output 56.1.1 Results for Split-Plot Analysis

The Mixed Procedure

Model Information
    Data Set                     WORK.SP
    Dependent Variable           Y
    Covariance Structure         Variance Components
    Estimation Method            REML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

Class Level Information
    Class    Levels    Values
    A        3         1 2 3
    B        2         1 2
    Block    4         1 2 3 4

The "Class Level Information" table lists the levels of all variables specified in the CLASS statement. Check this table to make sure that the data are correct.

Dimensions
    Covariance Parameters     3
    Columns in X              12
    Columns in Z              16
    Subjects                  1
    Max Obs Per Subject       24

The "Dimensions" table lists the magnitudes of various vectors and matrices.

Number of Observations
    Number of Observations Read        24
    Number of Observations Used        24
    Number of Observations Not Used    0

The "Number of Observations" table shows that all observations read from the data set are used in the analysis.

Iteration History
    Iteration    Evaluations    -2 Res Log Like    Criterion
    0            1              139.81461222
    1            1              119.76184570       0.00000000

PROC MIXED estimates the variance components for Block, A*Block, and the residual by REML. The REML estimates are the values that maximize the likelihood of a set of linearly independent error contrasts, and they provide a correction for the downward bias found in the usual maximum likelihood estimates. The objective function is -2 times the logarithm of the restricted likelihood, and PROC MIXED minimizes this objective function to obtain the estimates.
The minimization method is the Newton-Raphson algorithm, which uses the first and second derivatives of the objective function to iteratively find its minimum. The "Iteration History" table records the steps of that optimization process. For this example, only one iteration is required to obtain the estimates. The Evaluations column reveals that the restricted likelihood is evaluated once for each of the iterations. A criterion of 0 indicates that the Newton-Raphson algorithm has converged.

Covariance Parameter Estimates
    Cov Parm    Estimate
    Block       62.3958
    A*Block     15.3819
    Residual    9.3611

The REML estimates of the variance components for the random effects Block and A*Block and for the residual are shown in the Estimate column of the "Covariance Parameter Estimates" table.

Fit Statistics
    -2 Res Log Likelihood        119.8
    AIC (smaller is better)      125.8
    AICC (smaller is better)     127.5
    BIC (smaller is better)      123.9

The "Fit Statistics" table lists several values about the fitted mixed model, including the residual log likelihood.
The Akaike (AIC) and Bayesian (BIC) information criteria can be used to compare different models; the ones with smaller values are preferred.
The AICC information criterion is a small-sample bias-adjusted form of the Akaike criterion (Hurvich and Tsai 1989).
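
The AIC and AICC values in this table can be reproduced by hand, which shows how the penalty works. In the sketch below, d = 3 is the number of covariance parameters from the "Dimensions" table and n = 24 is the number of observations; the rank of X, p = 6 (one cell mean for each of the 3 × 2 combinations of A and B), is an assumption made here, as is the use of n* = n − p as the effective sample size under REML.

```latex
\[
\mathrm{AIC} = -2\ell_R + 2d = 119.76 + 2(3) = 125.76 \approx 125.8
\]
\[
\mathrm{AICC} = -2\ell_R + \frac{2\,d\,n^*}{n^* - d - 1}
             = 119.76 + \frac{2(3)(18)}{18 - 3 - 1}
             = 119.76 + 7.71 \approx 127.5,
\qquad n^* = n - p = 24 - 6 = 18
\]
```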

Type 3 Tests of Fixed Effects
    Effect    Num DF    Den DF    F Value    Pr > F
    A         2         6         4.07       0.076
    B         1         9         19.39      0.0017
    A*B       2         9         4.02       0.0566

The fixed effects are tested by using Type 3 estimable functions.

The results from the PROC MIXED analysis are the same as those obtained from the following GLM analysis:
    PROC GLM data=sp;
        class A B Block;
        model Y = A B A*B Block A*Block;
        test h=A e=A*Block;
    Run;
    Quit;
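
For comparison, a PROC MIXED step consistent with the output tables shown above (fixed effects A, B and A*B; variance components for Block and A*Block; data set sp) might look like the following sketch. The exact statements that produced the output are not shown on these slides, so this code is an assumption.

```sas
/* Sketch only: split-plot analysis with Block and A*Block as random effects */
proc mixed data=sp;
    class A B Block;
    model Y = A B A*B;        /* fixed effects only */
    random Block A*Block;     /* whole-plot error enters as A*Block */
run;
```

Here the test of A automatically uses A*Block as its error term, which is what the TEST statement requests explicitly in PROC GLM.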

LS means can be obtained in a similar manner as in GLM.
Various mean separation techniques can be used to determine differences between levels of a factor once the factor has been found to be a significant source of variation in the analysis model used to evaluate the data.