Generalized Linear Models Using Proc Genmod Generalized Linear Models can be fitted using SAS Proc Genmod. This procedure allows you to fit models for binary outcomes, ordinal outcomes, and models for other distributions in the exponential family (e.g., Poisson, negative binomial, gamma). GEE (Generalized Estimating Equations) can be used to fit marginal models with repeated measures, by using the repeated statement. We will be using data from the Apple Tree Dental Plan for these examples. Apple Tree Dental is a non-profit organization whose mission is to provide comprehensive oral health care for people with special dental access needs. This data is for elderly nursing home residents, and was collected as part of Grant R03DE1697601A1 ("Dental Utilization by Nursing Home Residents: 1986-2004", National Institute of Dental and Craniofacial Research), Barbara J. Smith, Principal Investigator. There are 987 patients in this database, with baseline ages from 55 to 102 years. They all entered the program in 1992, and were followed for a maximum of 5 follow-up periods. Each period was from 0 days to 547 days long. A participant could have had a period of zero days length if they came to the program, had their initial dental visit, and then never returned for any follow-up visits. We will be taking a look at the number of claims that these participants made for diagnostic dental services during their first period with Apple Tree Dental, and then over the five possible periods in the dataset. We are mainly interested in comparing three different levels of functional dentition, FUNCTDENT, 0: Edentulous, 1: < 20 teeth, and 2: >=20 teeth. We will also control for other covariates in the analysis. We first take a look at the distribution of the number of Diagnostic services, NUM_DIAGNOSTIC, using histograms for each level of FUNCTDENT. As you can see in the graphs below, the distributions are not normal, in fact, they are highly skewed to the right. Note, that due to the nature of the data, there can be no values less than zero. proc format; value functdent 0="Edentulous" 1="<20 Teeth" 2=">=20 Teeth"; run; Generalized Linear Models Using SAS 1 proc sgpanel data=mylib.appletree; where period=1; panelby functdent / rows=1 novarname; histogram Num_Diagnostic ; format functdent functdent.;run; One of the covariates that we wish to include in the model is the size (NBEDS) of the facility where the person is staying. However, we want to use this as a categorical predictor. We modify the dataset to create a new categorical variable, NURSBEDS, which has a value of 1: 100 or fewer beds, 2: 101-150 beds, or 3: >150 beds in the nursing home where the participant lived. We also want to be sure that we are comparing "rates" of dental services usage, by taking into account the length of time included in the first follow-up period in our model as an offset. We calculate the length of the period in years, rather than days, so the estimated mean values for the outcome will be based on annual, rather than daily rates of usage. We then take the natural log of the number of years, Generalized Linear Models Using SAS 2 after adding .0001 to the value, so the zero values will not be excluded. This variable (LOG_PERIOD_YR) will be the offset in the model. data mylib.appletree2; set mylib.appletree; if nbeds ne . then do; if nbeds < 101 then nursbeds =1; if nbeds >= 101 and nbeds < 151 then nursbeds=2; if nbeds >= 151 then nursbeds=3; end; Period_yr = Period_days/365.25; log_period_yr = log(period_yr+.0001); run; Poisson Regression Model We now fit a Poisson regression model, restricting the analysis to period 1 only, by using a Where Statement. We tell SAS that the Dist=Poisson, so that we get the correct model, and specify the offset as LOG_PERIOD_YR. The link function that is used will be the log function (by default). We get contrasts between different levels of functional dentition using the estimate statement. title "Annual Rate of Diagnostic Services in Period 1"; title2 "Poisson Model"; proc genmod data=mylib.appletree2; where period=1; class sex nursbeds functdent ; model Num_Diagnostic = functdent sex baseage nursbeds / dist=poisson offset = log_period_yr type3; lsmeans functdent; estimate "<20 vs Edent" functdent -1 1 0; estimate ">=20 vs Edent" functdent -1 0 1; estimate ">=20 vs <20" functdent 0 -1 1; run; Generalized Linear Models Using SAS 3 Annual Rate of Diagnostic Services in Period 1 Poisson Model The GENMOD Procedure Model Information Data Set Distribution Link Function Dependent Variable Offset Variable MYLIB.APPLETREE2 Poisson Log Num_Diagnostic log_period_yr Number of Observations Read Number of Observations Used Missing Values 987 981 6 Class Level Information Class Sex nursbeds functdent Levels 2 3 3 Values F M 1 2 3 0 1 2 Parameter Prm1 Prm2 Prm3 Prm4 Parameter Information Effect Sex nursbeds Intercept functdent functdent functdent Prm5 Prm6 Prm7 Prm8 Prm9 Prm10 Sex Sex BaseAge nursbeds nursbeds nursbeds functdent 0 1 2 F M 1 2 3 Criteria For Assessing Goodness Of Fit Criterion Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X2 Log Likelihood Full Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) DF 974 974 974 974 Value 1339.6041 1339.6041 2146.0464 2146.0464 -500.1528 -1920.6563 3855.3126 3855.4277 3889.5326 Value/DF 1.3754 1.3754 2.2033 2.2033 Algorithm converged. Analysis Of Maximum Likelihood Parameter Estimates Parameter Intercept functdent functdent functdent Sex 0 1 2 F DF Estimate Standard Error 1 1 1 0 1 0.9799 -0.6784 0.2087 0.0000 -0.1483 0.1987 0.0598 0.0519 0.0000 0.0488 Generalized Linear Models Using SAS Wald 95% Confidence Limits 0.5904 -0.7957 0.1070 0.0000 -0.2440 1.3695 -0.5611 0.3104 0.0000 -0.0525 4 Wald Chi-Square 24.31 128.53 16.18 . 9.21 Pr > ChiSq <.0001 <.0001 <.0001 . 0.0024 Sex M 0 0.0000 BaseAge 1 0.0038 nursbeds 1 1 -0.0418 nursbeds 2 1 -0.0075 nursbeds 3 0 0.0000 Scale 0 1.0000 NOTE: The scale parameter was held 0.0000 0.0024 0.0676 0.0457 0.0000 0.0000 fixed. 0.0000 -0.0010 -0.1743 -0.0970 0.0000 1.0000 0.0000 0.0086 0.0907 0.0821 0.0000 1.0000 . 2.46 0.38 0.03 . . 0.1168 0.5365 0.8704 . LR Statistics For Type 3 Analysis Source functdent Sex BaseAge nursbeds DF ChiSquare Pr > ChiSq 2 1 1 2 325.23 9.03 2.48 0.39 <.0001 0.0027 0.1156 0.8244 Least Squares Means Effect functdent functdent functdent functdent 0 1 2 Estimate Mean L'Beta 1.6966 4.1193 3.3434 Standard Error DF ChiSquare Pr > ChiSq 0.0451 0.0341 0.0465 1 1 1 137.24 1719.8 672.68 <.0001 <.0001 <.0001 0.5286 1.4157 1.2070 Contrast Estimate Results Mean Estimate Label <20 vs Edent >=20 vs Edent >=20 vs <20 2.4280 1.9707 0.8116 Mean Confidence Limits 2.1928 1.7526 0.7332 L'Beta Estimate Standard Error Alpha 0.8871 0.6784 -0.2087 0.0520 0.0598 0.0519 0.05 0.05 0.05 2.6885 2.2159 0.8985 Contrast Estimate Results Label <20 vs Edent >=20 vs Edent >=20 vs <20 L'Beta Confidence Limits 0.7852 0.5611 -0.3104 0.9890 0.7957 -0.1070 ChiSquare Pr > ChiSq 291.00 128.53 16.18 <.0001 <.0001 <.0001 The estimated annual number of diagnostic services for those participants who are edentulous is 1.7, while it is 4.1 for those with < 20 teeth, and 3.3 for those with >=20 teeth. There is a significant difference in the annual number of diagnostic services required in Period 1 between each of the levels of functional dentition, after controlling for the other covariates in the model. Overdispersed Poisson Model Generalized Linear Models Using SAS 5 The value of the deviance divided by its degrees of freedom and the Pearson chisquare divided by its degress of freedom, 1.38 and 2.20, respectively, suggest that there might be some overdispersion. (If the distribution were Poisson, we would expect the deviance divided by degrees of freedom to be close to 1.0). We will next fit an overdispersed Poisson model, using Proc Genmod. To do this, simply insert either scale=Pearson or scale=deviance as an option in the model statement, to obtain an overdispersed Poisson distribution, based on the deviance or Pearson chi-square, respectively. When either of these options is specified, the model estimates are first obtained by setting the scale to 1.0, as for the Poisson distribution; thus the parameter estimates are unchanged from the Poisson model. Then, the scale parameter is estimated by either the square root of the Pearson chi-square/df or the square root of the deviance chi-square/df. The standard errors and other statistics are adjusted accordingly. For example, the standard errors of the parameter estimates are multiplied by the new scale statistic, making the statistical tests more conservative. The syntax to use is illustrated below (output not shown): model Num_Diagnostic = functdent sex baseage nursbeds / scale=Pearson dist=poisson offset = log_period_yr type3; Negative Binomial Model We now refit the model, using dist=negbin, to fit a negative binomial model. title "Annual Rate of Diagnostic Services in Period 1"; title2 "Negative Binomial Regression Model"; proc genmod data=mylib.appletree2; where period=1; class sex nursbeds functdent ; model Num_Diagnostic = functdent sex baseage nursbeds / dist=negbin offset = log_period_yr type3; lsmeans functdent; run; Generalized Linear Models Using SAS 6 The deviance/df and Pearson chi-square/df are now closer to 1.0, so this is an improvement over the original Poisson Model. Criteria for Assessing Goodness of Fit Criterion Deviance Scaled Deviance Pearson Chi-Square Scaled Pearson X2 Log Likelihood Full Log Likelihood AIC (smaller is better) AICC (smaller is better) BIC (smaller is better) DF 974 974 974 974 Value 1010.3136 1010.3136 1715.2718 1715.2718 -471.6065 -1892.1099 3800.2199 3800.3680 3839.3285 Value/DF 1.0373 1.0373 1.7611 1.7611 Algorithm converged. Analysis Of Maximum Likelihood Parameter Estimates Parameter Intercept functdent functdent functdent Sex Sex BaseAge nursbeds nursbeds nursbeds Dispersion 0 1 2 F M 1 2 3 DF Estimate Standard Error Wald 95% Confidence Limits 1 1 1 0 1 0 1 1 1 0 1.0088 -0.6906 0.2245 0.0000 -0.1480 0.0000 0.0040 -0.0588 -0.0100 0.0000 0.2363 0.0689 0.0621 0.0000 0.0584 0.0000 0.0029 0.0795 0.0543 0.0000 0.5456 -0.8256 0.1027 0.0000 -0.2626 0.0000 -0.0017 -0.2146 -0.1164 0.0000 1.4719 -0.5557 0.3463 0.0000 -0.0335 0.0000 0.0097 0.0971 0.0964 0.0000 1 0.1448 0.0243 0.0971 0.1925 Wald Chi-Square 18.22 100.61 13.05 . 6.42 . 1.85 0.55 0.03 . Pr > ChiSq <.0001 <.0001 0.0003 . 0.0113 . 0.1735 0.4599 0.8539 . LR Statistics For Type 3 Analysis Source functdent Sex BaseAge nursbeds DF ChiSquare Pr > ChiSq 2 1 1 2 226.22 6.35 1.86 0.55 <.0001 0.0118 0.1729 0.7589 Least Squares Means Effect functdent functdent functdent functdent 0 1 2 Estimate Mean L'Beta 1.7316 0.5490 4.3239 1.4642 3.4544 1.2397 Standard Error 0.0512 0.0422 0.0552 DF 1 1 1 ChiSquare 115.04 1201.3 503.61 Pr > ChiSq <.0001 <.0001 <.0001 There are some minor differences in the model estimates and standard errors for this negative binomial model vs. the original Poisson model. We can carry out a test Generalized Linear Models Using SAS 7 to decide whether the data are better fit using an overdispersed Poisson distribution, against alternatives of the form: V ( ) k 2 which is appropriate for a negative binomial distribution. This is a Lagrange Multiplier test in SAS (Cameron and Trivedi, 1988). To obtain this test in Proc Genmod, insert the noscale option in the negative binomial model statement, after the /. model Num_Diagnostic = functdent sex baseage nursbeds dist=negbin offset = log_period_yr type3; / noscale The Lagrange Multiplier test is added to the output window. The results of this test are significant, indicating that we would reject H0, and conclude that the Negative Binomial model is a better choice for this analysis. Lagrange Multiplier Statistics Parameter Dispersion Chi-Square 37.1391 Pr > ChiSq <.0001 Generalized Estimating Equations (GEE) Model for Clustered Data: We now examine a model for the Apple Tree Dental data, but this time, we include observations for up to 5 periods for each participant. We use the repeated statement in SAS to set up the subject (RANDOM_ID) and the correlation type (exchangeable). Other correlation types can be examined as well. SAS will automatically use "sandwich" estimates (empirical estimates) of the standard errors for GEE models. The syntax below shows the inclusion of PERIOD, and the PERIOD*FUNCTDENT interaction in the model statement. We also include a repeated statement to set up the desired correlation structure among observations for the same participant. title "Annual Rate of Diagnostic Services Across Periods"; proc genmod data=mylib.appletree2; where nmiss(Num_Diagnostic,functdent,nursbeds,baseage)=0; class random_id sex nursbeds period functdent; model Num_Diagnostic = functdent period functdent*period sex baseage nursbeds / dist=negbin offset = log_period_yr type3; Generalized Linear Models Using SAS 8 repeated subject=random_id / type=exch ; lsmeans functdent*period; run; The output from this model fit is shown below: Annual Rate of Diagnostic Services Across Periods The GENMOD Procedure Model Information Data Set Distribution Link Function Dependent Variable Offset Variable MYLIB.APPLETREE2 Negative Binomial Log Num_Diagnostic log_period_yr Number of Observations Read Number of Observations Used Class Levels Random_ID 981 Sex nursbeds Period functdent 2 3 5 3 2892 2892 Class Level Information Values 1 2 3 21 22 38 39 55 56 72 73 ... F M 1 2 3 1 2 3 0 1 2 4 5 6 23 24 40 41 57 58 74 75 7 8 9 25 26 42 43 59 60 76 77 10 27 44 61 78 11 28 45 62 79 12 29 46 63 80 13 30 47 64 81 14 31 48 65 82 15 32 49 66 83 16 33 50 67 84 17 34 51 68 85 4 5 Algorithm converged. GEE Model Information Correlation Structure Subject Effect Number of Clusters Correlation Matrix Dimension Maximum Cluster Size Minimum Cluster Size Exchangeable Random_ID (981 levels) 981 5 5 1 Algorithm converged. Exchangeable Working Correlation Correlation -0.011628583 GEE Fit Criteria QIC 19.7690 QICu 57.6942 Generalized Linear Models Using SAS 9 18 35 52 69 86 19 36 53 70 87 20 37 54 71 Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Parameter Intercept functdent functdent functdent Period Period Period Period Period 0 1 2 1 2 3 4 5 Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Sex Sex BaseAge nursbeds nursbeds nursbeds 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 F M 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 1 2 3 Estimate Standard Error 95% Confidence Limits 0.5355 -0.2255 0.0732 0.0000 0.3947 0.2259 0.2068 0.1929 0.0000 0.1739 0.1411 0.1146 0.0000 0.0830 0.0862 0.0856 0.0899 0.0000 0.1947 -0.5020 -0.1515 0.0000 0.2321 0.0570 0.0390 0.0167 0.0000 0.8764 0.0511 0.2979 0.0000 0.5573 0.3948 0.3746 0.3690 0.0000 3.08 -1.60 0.64 . 4.76 2.62 2.42 2.15 . 0.0021 0.1100 0.5230 . <.0001 0.0087 0.0157 0.0319 . -0.2906 0.1441 0.0000 0.0098 -0.1181 0.0000 -0.0678 -0.0470 0.0000 -0.0757 -0.1030 0.0000 0.0000 0.0000 0.0000 -0.1599 0.0000 0.0068 0.0180 -0.0982 0.0000 0.1389 0.1210 0.0000 0.1443 0.1282 0.0000 0.1406 0.1243 0.0000 0.1461 0.1320 0.0000 0.0000 0.0000 0.0000 0.0434 0.0000 0.0020 0.0535 0.0388 0.0000 -0.5629 -0.0930 0.0000 -0.2730 -0.3693 0.0000 -0.3434 -0.2906 0.0000 -0.3620 -0.3618 0.0000 0.0000 0.0000 0.0000 -0.2451 0.0000 0.0028 -0.0869 -0.1742 0.0000 -0.0183 0.3813 0.0000 0.2927 0.1331 0.0000 0.2078 0.1967 0.0000 0.2106 0.1558 0.0000 0.0000 0.0000 0.0000 -0.0748 0.0000 0.0107 0.1228 -0.0222 0.0000 -2.09 1.19 . 0.07 -0.92 . -0.48 -0.38 . -0.52 -0.78 . . . . -3.68 . 3.33 0.34 -2.53 . 0.0365 0.2335 . 0.9456 0.3568 . 0.6298 0.7054 . 0.6043 0.4353 . . . . 0.0002 . 0.0009 0.7368 0.0114 . Z Pr > |Z| Score Statistics For Type 3 GEE Analysis Source DF ChiSquare Pr > ChiSq 2 4 8 1 1 2 36.59 41.36 50.72 11.95 9.66 7.62 <.0001 <.0001 <.0001 0.0005 0.0019 0.0222 functdent Period Period*functdent Sex BaseAge nursbeds Least Squares Means Effect Period functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent 1 1 1 2 2 2 3 3 3 0 1 2 0 1 2 0 1 2 Generalized Linear Models Using SAS Estimate Mean L'Beta 2.3728 4.9408 3.9755 2.7066 3.2106 3.3580 2.4572 3.3822 3.2946 0.8641 1.5975 1.3802 0.9957 1.1665 1.2114 0.8990 1.2185 1.1923 10 Standard Error DF 0.0313 0.0391 0.0474 0.0425 0.0459 0.0578 0.0489 0.0552 0.0594 1 1 1 1 1 1 1 1 1 Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent 4 4 4 5 5 5 0 1 2 0 1 2 2.4040 3.1535 3.2489 2.1382 2.8826 2.6790 0.8771 1.1485 1.1783 0.7600 1.0587 0.9855 0.0756 0.0645 0.0821 0.1169 0.0837 0.0856 1 1 1 1 1 1 Least Squares Means Effect Period functdent ChiSquare Pr > ChiSq Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent Period*functdent 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 764.44 1670.4 846.19 549.79 646.57 439.67 338.14 488.06 402.68 134.73 317.27 205.80 42.27 160.05 132.45 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 There is a significant Period*Functdent interaction, indicating that the effect of period differs for different levels of functional dentition. A graph of the number of services per period for each level of functional dentition would illustrate this. Generalized Linear Models Using SAS 11