Generalized Linear Models Using Proc Genmod
Generalized Linear Models can be fitted using SAS Proc Genmod. This procedure
allows you to fit models for binary outcomes, ordinal outcomes, and models for
other distributions in the exponential family (e.g., Poisson, negative binomial,
gamma). GEE (Generalized Estimating Equations) can be used to fit marginal models
with repeated measures, by using the repeated statement.
We will be using data from the Apple Tree Dental Plan for these examples. Apple
Tree Dental is a non-profit organization whose mission is to provide comprehensive
oral health care for people with special dental access needs. This data is for
elderly nursing home residents, and was collected as part of Grant R03DE1697601A1 ("Dental Utilization by Nursing Home Residents: 1986-2004", National
Institute of Dental and Craniofacial Research), Barbara J. Smith, Principal
Investigator. There are 987 patients in this database, with baseline ages from 55
to 102 years. They all entered the program in 1992, and were followed for a
maximum of 5 follow-up periods. Each period was from 0 days to 547 days long. A
participant could have had a period of zero days length if they came to the
program, had their initial dental visit, and then never returned for any follow-up
visits. We will be taking a look at the number of claims that these participants
made for diagnostic dental services during their first period with Apple Tree
Dental, and then over the five possible periods in the dataset. We are mainly
interested in comparing three levels of functional dentition (FUNCTDENT):
0 = edentulous, 1 = fewer than 20 teeth, and 2 = 20 or more teeth. We will also
control for other covariates in the analysis.
We first take a look at the distribution of the number of diagnostic services,
NUM_DIAGNOSTIC, using histograms for each level of FUNCTDENT. The histograms
produced by the code below show that the distributions are not normal; in fact,
they are highly skewed to the right. Note that, because the outcome is a count,
there can be no values less than zero.
proc format;
  value functdent 0="Edentulous"
                  1="<20 Teeth"
                  2=">=20 Teeth";
run;
Generalized Linear Models Using SAS
proc sgpanel data=mylib.appletree;
  where period=1;
  panelby functdent / rows=1 novarname;
  histogram Num_Diagnostic;
  format functdent functdent.;
run;
One of the covariates that we wish to include in the model is the size (NBEDS) of
the facility where the person is staying. However, we want to use this as a
categorical predictor. We modify the dataset to create a new categorical variable,
NURSBEDS, which has a value of 1: 100 or fewer beds, 2: 101-150 beds, or 3: >150
beds in the nursing home where the participant lived.
We also want to be sure that we are comparing "rates" of dental services usage, by
taking into account the length of time included in the first follow-up period in our
model as an offset. We calculate the length of the period in years, rather than
days, so the estimated mean values for the outcome will be based on annual, rather
than daily rates of usage. We then take the natural log of the number of years,
after adding .0001 to the value, so the zero values will not be excluded. This
variable (LOG_PERIOD_YR) will be the offset in the model.
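data mylib.appletree2;
  set mylib.appletree;
  /* Categorize facility size; nursbeds stays missing when nbeds is missing */
  if nbeds ne . then do;
    if nbeds < 101 then nursbeds = 1;
    if nbeds >= 101 and nbeds < 151 then nursbeds = 2;
    if nbeds >= 151 then nursbeds = 3;
  end;
  /* Period length in years, so that rates are annual rather than daily */
  Period_yr = Period_days/365.25;
  /* Offset: log of years; .0001 is added so zero-length periods are kept */
  log_period_yr = log(period_yr+.0001);
run;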
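The role of the offset can be written out explicitly. This is a sketch that assumes the standard log-link formulation; the symbols (expected number of services \mu_i, period length in years t_i, and linear predictor x_i'\beta) are notation introduced here and do not appear in the SAS output.

```latex
\[
\log(\mu_i) = \log(t_i) + x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\frac{\mu_i}{t_i} = \exp\!\left(x_i^{\top}\beta\right)
\]
```

Because the coefficient of the offset is fixed at 1, the exponentiated linear predictor is a rate per year. In the data step above, log(t_i) is computed as log(period_yr + .0001).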
Poisson Regression Model
We now fit a Poisson regression model, restricting the analysis to period 1 by
using a where statement. We specify dist=poisson in the model statement to
request a Poisson distribution, and specify the offset as LOG_PERIOD_YR. The link
function is the log (the default for the Poisson distribution). We get contrasts
between levels of functional dentition using the estimate statement.
title "Annual Rate of Diagnostic Services in Period 1";
title2 "Poisson Model";
proc genmod data=mylib.appletree2;
  where period=1;
  class sex nursbeds functdent;
  model Num_Diagnostic = functdent sex baseage nursbeds /
        dist=poisson offset = log_period_yr type3;
  lsmeans functdent;
  estimate "<20 vs Edent"  functdent -1  1 0;
  estimate ">=20 vs Edent" functdent -1  0 1;
  estimate ">=20 vs <20"   functdent  0 -1 1;
run;
Annual Rate of Diagnostic Services in Period 1
Poisson Model
The GENMOD Procedure
Model Information
Data Set              MYLIB.APPLETREE2
Distribution          Poisson
Link Function         Log
Dependent Variable    Num_Diagnostic
Offset Variable       log_period_yr

Number of Observations Read    987
Number of Observations Used    981
Missing Values                   6

Class Level Information
Class        Levels    Values
Sex               2    F M
nursbeds          3    1 2 3
functdent         3    0 1 2
Parameter Information
Parameter    Effect       functdent    Sex    nursbeds
Prm1         Intercept
Prm2         functdent    0
Prm3         functdent    1
Prm4         functdent    2
Prm5         Sex                       F
Prm6         Sex                       M
Prm7         BaseAge
Prm8         nursbeds                         1
Prm9         nursbeds                         2
Prm10        nursbeds                         3
Criteria For Assessing Goodness Of Fit
Criterion                     DF         Value    Value/DF
Deviance                     974     1339.6041      1.3754
Scaled Deviance              974     1339.6041      1.3754
Pearson Chi-Square           974     2146.0464      2.2033
Scaled Pearson X2            974     2146.0464      2.2033
Log Likelihood                       -500.1528
Full Log Likelihood                 -1920.6563
AIC (smaller is better)              3855.3126
AICC (smaller is better)             3855.4277
BIC (smaller is better)              3889.5326
Algorithm converged.
Analysis Of Maximum Likelihood Parameter Estimates
                                    Standard    Wald 95% Confidence         Wald
Parameter          DF   Estimate       Error           Limits         Chi-Square   Pr > ChiSq
Intercept           1     0.9799      0.1987     0.5904      1.3695        24.31       <.0001
functdent   0       1    -0.6784      0.0598    -0.7957     -0.5611       128.53       <.0001
functdent   1       1     0.2087      0.0519     0.1070      0.3104        16.18       <.0001
functdent   2       0     0.0000      0.0000     0.0000      0.0000          .          .
Sex         F       1    -0.1483      0.0488    -0.2440     -0.0525         9.21       0.0024
Sex         M       0     0.0000      0.0000     0.0000      0.0000          .          .
BaseAge             1     0.0038      0.0024    -0.0010      0.0086         2.46       0.1168
nursbeds    1       1    -0.0418      0.0676    -0.1743      0.0907         0.38       0.5365
nursbeds    2       1    -0.0075      0.0457    -0.0970      0.0821         0.03       0.8704
nursbeds    3       0     0.0000      0.0000     0.0000      0.0000          .          .
Scale               0     1.0000      0.0000     1.0000      1.0000
NOTE: The scale parameter was held fixed.
LR Statistics For Type 3 Analysis
Source        DF    ChiSquare    Pr > ChiSq
functdent      2       325.23        <.0001
Sex            1         9.03        0.0027
BaseAge        1         2.48        0.1156
nursbeds       2         0.39        0.8244
Least Squares Means
                               Estimate          Standard
Effect      functdent      Mean     L'Beta          Error    DF    ChiSquare    Pr > ChiSq
functdent   0            1.6966     0.5286         0.0451     1       137.24        <.0001
functdent   1            4.1193     1.4157         0.0341     1       1719.8        <.0001
functdent   2            3.3434     1.2070         0.0465     1       672.68        <.0001
Contrast Estimate Results
                       Mean            Mean              L'Beta    Standard
Label              Estimate     Confidence Limits      Estimate       Error    Alpha
<20 vs Edent         2.4280      2.1928     2.6885       0.8871      0.0520     0.05
>=20 vs Edent        1.9707      1.7526     2.2159       0.6784      0.0598     0.05
>=20 vs <20          0.8116      0.7332     0.8985      -0.2087      0.0519     0.05

Contrast Estimate Results
                         L'Beta
Label               Confidence Limits     ChiSquare    Pr > ChiSq
<20 vs Edent         0.7852     0.9890       291.00        <.0001
>=20 vs Edent        0.5611     0.7957       128.53        <.0001
>=20 vs <20         -0.3104    -0.1070        16.18        <.0001
The estimated annual number of diagnostic services for participants who are
edentulous is 1.7, while it is 4.1 for those with <20 teeth and 3.3 for those with
>=20 teeth. Each pairwise difference between levels of functional dentition in the
annual number of diagnostic services in Period 1 is significant (p < .0001 for all
three contrasts), after controlling for the other covariates in the model.
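Because the model uses a log link, exponentiating a contrast on the L'Beta scale gives a ratio of annual rates. As a check on the Contrast Estimate Results above, the following hand calculation uses the rounded L'Beta estimates from the output; the notation RR (rate ratio) is introduced here for illustration.

```latex
\begin{align*}
\mathrm{RR}_{<20\ \text{vs Edent}}    &= \exp(0.8871)  \approx 2.43 \\
\mathrm{RR}_{\ge 20\ \text{vs Edent}} &= \exp(0.6784)  \approx 1.97 \\
\mathrm{RR}_{\ge 20\ \text{vs}\ <20}  &= \exp(-0.2087) \approx 0.81
\end{align*}
```

These agree with the Mean Estimate column. Participants with <20 teeth have roughly 2.4 times the annual rate of diagnostic services of edentulous participants, while those with >=20 teeth have about 0.81 times the rate of those with <20 teeth.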
Overdispersed Poisson Model
The deviance divided by its degrees of freedom (1.38) and the Pearson chi-square
divided by its degrees of freedom (2.20) suggest that there might be some
overdispersion. (If the distribution were Poisson, we would expect these ratios
to be close to 1.0.)
We will next fit an overdispersed Poisson model, using Proc Genmod. To do this,
simply insert either scale=Pearson or scale=deviance as an option in the model
statement, to obtain an overdispersed Poisson distribution, based on the Pearson
chi-square or the deviance, respectively.
When either of these options is specified, the model estimates are first obtained
by setting the scale to 1.0, as for the Poisson distribution; thus the parameter
estimates are unchanged from the Poisson model. Then, the scale parameter is
estimated by either the square root of the Pearson chi-square/df or the square
root of the deviance chi-square/df. The standard errors and other statistics are
adjusted accordingly. For example, the standard errors of the parameter
estimates are multiplied by the new scale statistic, making the statistical tests
more conservative.
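To make the adjustment concrete, here is an approximate hand calculation based on the rounded values in the Poisson output above. These numbers are not SAS output; the symbol \hat{\phi} for the estimated scale is notation introduced here.

```latex
\begin{align*}
\hat{\phi}_{\text{Pearson}}  &= \sqrt{2146.0464 / 974} = \sqrt{2.2033} \approx 1.484 \\
\hat{\phi}_{\text{deviance}} &= \sqrt{1339.6041 / 974} = \sqrt{1.3754} \approx 1.173 \\
\text{SE}_{\text{adj}}(\text{functdent } 0) &\approx 0.0598 \times 1.484 \approx 0.0887
\end{align*}
```

With scale=Pearson, the standard error for functdent 0 would therefore grow from about 0.060 to about 0.089, while the estimate itself (-0.6784) stays the same.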
The syntax to use is illustrated below (output not shown):
model Num_Diagnostic = functdent sex baseage nursbeds
/ scale=Pearson dist=poisson offset = log_period_yr type3;
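For completeness, the model statement above can be placed in a full procedure call. This is a sketch that reuses the dataset, class variables, and where statement from the Poisson model earlier in this section; the title2 text is an illustrative choice, not taken from the original.

```sas
title "Annual Rate of Diagnostic Services in Period 1";
title2 "Overdispersed Poisson Model (Pearson scale)"; /* illustrative title */
proc genmod data=mylib.appletree2;
  where period=1;
  class sex nursbeds functdent;
  /* scale=pearson estimates the scale from the Pearson chi-square/df */
  model Num_Diagnostic = functdent sex baseage nursbeds /
        scale=pearson dist=poisson offset=log_period_yr type3;
  lsmeans functdent;
run;
```

Replacing scale=pearson with scale=deviance bases the scale estimate on the deviance instead.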
Negative Binomial Model
We now refit the model, using dist=negbin, to fit a negative binomial model.
title "Annual Rate of Diagnostic Services in Period 1";
title2 "Negative Binomial Regression Model";
proc genmod data=mylib.appletree2;
where period=1;
class sex nursbeds functdent ;
model Num_Diagnostic = functdent sex baseage nursbeds /
dist=negbin offset = log_period_yr type3;
lsmeans functdent;
run;
The deviance/df and Pearson chi-square/df are now closer to 1.0, so this is an
improvement over the original Poisson Model.
Criteria for Assessing Goodness of Fit
Criterion                     DF         Value    Value/DF
Deviance                     974     1010.3136      1.0373
Scaled Deviance              974     1010.3136      1.0373
Pearson Chi-Square           974     1715.2718      1.7611
Scaled Pearson X2            974     1715.2718      1.7611
Log Likelihood                       -471.6065
Full Log Likelihood                 -1892.1099
AIC (smaller is better)              3800.2199
AICC (smaller is better)             3800.3680
BIC (smaller is better)              3839.3285
Algorithm converged.
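The information criteria point the same way. A quick comparison using the values reported in the Poisson and negative binomial goodness-of-fit tables (a hand calculation, not part of the SAS output):

```latex
\begin{align*}
\Delta\mathrm{AIC} &= 3855.3126 - 3800.2199 = 55.0927 \\
\Delta\mathrm{BIC} &= 3889.5326 - 3839.3285 = 50.2041
\end{align*}
```

Both criteria are smaller for the negative binomial model, even though it estimates one additional parameter (the dispersion).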
Analysis Of Maximum Likelihood Parameter Estimates
                                    Standard    Wald 95% Confidence         Wald
Parameter          DF   Estimate       Error           Limits         Chi-Square   Pr > ChiSq
Intercept           1     1.0088      0.2363     0.5456      1.4719        18.22       <.0001
functdent   0       1    -0.6906      0.0689    -0.8256     -0.5557       100.61       <.0001
functdent   1       1     0.2245      0.0621     0.1027      0.3463        13.05       0.0003
functdent   2       0     0.0000      0.0000     0.0000      0.0000          .          .
Sex         F       1    -0.1480      0.0584    -0.2626     -0.0335         6.42       0.0113
Sex         M       0     0.0000      0.0000     0.0000      0.0000          .          .
BaseAge             1     0.0040      0.0029    -0.0017      0.0097         1.85       0.1735
nursbeds    1       1    -0.0588      0.0795    -0.2146      0.0971         0.55       0.4599
nursbeds    2       1    -0.0100      0.0543    -0.1164      0.0964         0.03       0.8539
nursbeds    3       0     0.0000      0.0000     0.0000      0.0000          .          .
Dispersion          1     0.1448      0.0243     0.0971      0.1925
LR Statistics For Type 3 Analysis
Source        DF    ChiSquare    Pr > ChiSq
functdent      2       226.22        <.0001
Sex            1         6.35        0.0118
BaseAge        1         1.86        0.1729
nursbeds       2         0.55        0.7589
Least Squares Means
                               Estimate          Standard
Effect      functdent      Mean     L'Beta          Error    DF    ChiSquare    Pr > ChiSq
functdent   0            1.7316     0.5490         0.0512     1       115.04        <.0001
functdent   1            4.3239     1.4642         0.0422     1       1201.3        <.0001
functdent   2            3.4544     1.2397         0.0552     1       503.61        <.0001
There are some minor differences in the model estimates and standard errors for
this negative binomial model vs. the original Poisson model. We can carry out a test
to decide whether the data are better fit using an overdispersed Poisson
distribution, against alternatives of the form:
V ( )    k  2
which is appropriate for a negative binomial distribution. This is a Lagrange
Multiplier test in SAS (Cameron and Trivedi, 1988). To obtain this test in Proc
Genmod, insert the noscale option in the negative binomial model statement, after
the /.
model Num_Diagnostic = functdent sex baseage nursbeds
/ noscale dist=negbin offset = log_period_yr type3;
The Lagrange Multiplier test is added to the output window. The results of this
test are significant, indicating that we would reject H0, and conclude that the
Negative Binomial model is a better choice for this analysis.
Lagrange Multiplier Statistics
Parameter     Chi-Square    Pr > ChiSq
Dispersion       37.1391        <.0001
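To see what the negative binomial variance function implies, here is a small illustration that plugs the estimated dispersion from the negative binomial output (k = 0.1448) into V(\mu) = \mu + k\mu^2. The values of \mu are arbitrary illustrative choices, not estimates from the data.

```latex
\begin{align*}
V(1) &= 1 + 0.1448 \times 1^{2} = 1.1448 \\
V(4) &= 4 + 0.1448 \times 4^{2} = 4 + 2.3168 = 6.3168
\end{align*}
```

Under a Poisson model the variance would equal the mean (1 and 4, respectively), so the extra variance grows quickly as the expected count increases.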
Generalized Estimating Equations (GEE) Model for Clustered Data:
We now examine a model for the Apple Tree Dental data, but this time, we include
observations for up to 5 periods for each participant. We use the repeated
statement in SAS to set up the subject (RANDOM_ID) and the correlation type
(exchangeable). Other correlation types can be examined as well. SAS will
automatically use "sandwich" estimates (empirical estimates) of the standard
errors for GEE models.
The syntax below shows the inclusion of PERIOD, and the PERIOD*FUNCTDENT
interaction in the model statement. We also include a repeated statement to set
up the desired correlation structure among observations for the same participant.
title "Annual Rate of Diagnostic Services Across Periods";
proc genmod data=mylib.appletree2;
  where nmiss(Num_Diagnostic,functdent,nursbeds,baseage)=0;
  class random_id sex nursbeds period functdent;
  model Num_Diagnostic = functdent period functdent*period sex
        baseage nursbeds /
        dist=negbin offset = log_period_yr type3;
  repeated subject=random_id / type=exch;
  lsmeans functdent*period;
run;
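Other working correlation structures can be requested by changing the type= option. The sketch below swaps in a first-order autoregressive structure purely as an illustration; the choice of AR(1), the title2 text, and the added within= and corrw options are assumptions made here, not part of the original analysis.

```sas
title "Annual Rate of Diagnostic Services Across Periods";
title2 "GEE with AR(1) Working Correlation"; /* illustrative title */
proc genmod data=mylib.appletree2;
  where nmiss(Num_Diagnostic,functdent,nursbeds,baseage)=0;
  class random_id sex nursbeds period functdent;
  model Num_Diagnostic = functdent period functdent*period sex
        baseage nursbeds /
        dist=negbin offset=log_period_yr type3;
  /* within=period tells SAS how to order the repeated measures;  */
  /* corrw prints the estimated working correlation matrix        */
  repeated subject=random_id / type=ar(1) within=period corrw;
run;
```

The QIC values reported under GEE Fit Criteria can then be compared across fits that use different working correlation structures.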
The output from this model fit is shown below:
Annual Rate of Diagnostic Services Across Periods
The GENMOD Procedure

Model Information
Data Set              MYLIB.APPLETREE2
Distribution          Negative Binomial
Link Function         Log
Dependent Variable    Num_Diagnostic
Offset Variable       log_period_yr

Number of Observations Read    2892
Number of Observations Used    2892

Class Level Information
Class        Levels    Values
Random_ID       981    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                       21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
                       38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
                       55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
                       72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
                       ...
Sex               2    F M
nursbeds          3    1 2 3
Period            5    1 2 3 4 5
functdent         3    0 1 2

Algorithm converged.

GEE Model Information
Correlation Structure           Exchangeable
Subject Effect                  Random_ID (981 levels)
Number of Clusters              981
Correlation Matrix Dimension    5
Maximum Cluster Size            5
Minimum Cluster Size            1

Algorithm converged.

Exchangeable Working Correlation
Correlation    -0.011628583

GEE Fit Criteria
QIC     19.7690
QICu    57.6942
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                                        Standard     95% Confidence
Parameter                   Estimate       Error         Limits              Z    Pr > |Z|
Intercept                     0.5355      0.1739     0.1947     0.8764    3.08      0.0021
functdent         0          -0.2255      0.1411    -0.5020     0.0511   -1.60      0.1100
functdent         1           0.0732      0.1146    -0.1515     0.2979    0.64      0.5230
functdent         2           0.0000      0.0000     0.0000     0.0000     .         .
Period            1           0.3947      0.0830     0.2321     0.5573    4.76      <.0001
Period            2           0.2259      0.0862     0.0570     0.3948    2.62      0.0087
Period            3           0.2068      0.0856     0.0390     0.3746    2.42      0.0157
Period            4           0.1929      0.0899     0.0167     0.3690    2.15      0.0319
Period            5           0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  1   0      -0.2906      0.1389    -0.5629    -0.0183   -2.09      0.0365
Period*functdent  1   1       0.1441      0.1210    -0.0930     0.3813    1.19      0.2335
Period*functdent  1   2       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  2   0       0.0098      0.1443    -0.2730     0.2927    0.07      0.9456
Period*functdent  2   1      -0.1181      0.1282    -0.3693     0.1331   -0.92      0.3568
Period*functdent  2   2       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  3   0      -0.0678      0.1406    -0.3434     0.2078   -0.48      0.6298
Period*functdent  3   1      -0.0470      0.1243    -0.2906     0.1967   -0.38      0.7054
Period*functdent  3   2       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  4   0      -0.0757      0.1461    -0.3620     0.2106   -0.52      0.6043
Period*functdent  4   1      -0.1030      0.1320    -0.3618     0.1558   -0.78      0.4353
Period*functdent  4   2       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  5   0       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  5   1       0.0000      0.0000     0.0000     0.0000     .         .
Period*functdent  5   2       0.0000      0.0000     0.0000     0.0000     .         .
Sex               F          -0.1599      0.0434    -0.2451    -0.0748   -3.68      0.0002
Sex               M           0.0000      0.0000     0.0000     0.0000     .         .
BaseAge                       0.0068      0.0020     0.0028     0.0107    3.33      0.0009
nursbeds          1           0.0180      0.0535    -0.0869     0.1228    0.34      0.7368
nursbeds          2          -0.0982      0.0388    -0.1742    -0.0222   -2.53      0.0114
nursbeds          3           0.0000      0.0000     0.0000     0.0000     .         .
Score Statistics For Type 3 GEE Analysis
Source                DF    ChiSquare    Pr > ChiSq
functdent              2        36.59        <.0001
Period                 4        41.36        <.0001
Period*functdent       8        50.72        <.0001
Sex                    1        11.95        0.0005
BaseAge                1         9.66        0.0019
nursbeds               2         7.62        0.0222
Least Squares Means
                                                 Estimate         Standard
Effect              Period   functdent       Mean     L'Beta         Error    DF    ChiSquare    Pr > ChiSq
Period*functdent    1        0             2.3728     0.8641        0.0313     1       764.44        <.0001
Period*functdent    1        1             4.9408     1.5975        0.0391     1       1670.4        <.0001
Period*functdent    1        2             3.9755     1.3802        0.0474     1       846.19        <.0001
Period*functdent    2        0             2.7066     0.9957        0.0425     1       549.79        <.0001
Period*functdent    2        1             3.2106     1.1665        0.0459     1       646.57        <.0001
Period*functdent    2        2             3.3580     1.2114        0.0578     1       439.67        <.0001
Period*functdent    3        0             2.4572     0.8990        0.0489     1       338.14        <.0001
Period*functdent    3        1             3.3822     1.2185        0.0552     1       488.06        <.0001
Period*functdent    3        2             3.2946     1.1923        0.0594     1       402.68        <.0001
Period*functdent    4        0             2.4040     0.8771        0.0756     1       134.73        <.0001
Period*functdent    4        1             3.1535     1.1485        0.0645     1       317.27        <.0001
Period*functdent    4        2             3.2489     1.1783        0.0821     1       205.80        <.0001
Period*functdent    5        0             2.1382     0.7600        0.1169     1        42.27        <.0001
Period*functdent    5        1             2.8826     1.0587        0.0837     1       160.05        <.0001
Period*functdent    5        2             2.6790     0.9855        0.0856     1       132.45        <.0001
There is a significant Period*Functdent interaction, indicating that the effect of
period differs for different levels of functional dentition. A graph of the number
of services per period for each level of functional dentition would illustrate this.
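One way to draw such a graph is sketched below. It plots the raw (unadjusted) mean number of diagnostic services at each period, with one line per level of functional dentition, so it is a descriptive picture rather than a plot of the model-based annual rates. The axis labels are illustrative choices; the dataset and the functdent. format are the ones defined earlier in this document.

```sas
proc sgplot data=mylib.appletree2;
  /* Mean of Num_Diagnostic at each period, one line per dentition level */
  vline period / response=Num_Diagnostic stat=mean group=functdent markers;
  format functdent functdent.;
  xaxis label="Period";
  yaxis label="Mean number of diagnostic services";
run;
```

Lines that are not parallel across periods are the visual counterpart of the Period*Functdent interaction.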