Ordinal Regression Analysis: Fitting the Proportional Odds Model Using Stata and SAS

Xing Liu
Neag School of Education, University of Connecticut

Purpose
- Introduce ordinal logistic regression analysis.
- Demonstrate the use of the proportional odds (PO) model in Stata (version 9.0).
- Compare the results of the proportional odds model fitted with Stata OLOGIT and SAS PROC LOGISTIC.

Why Ordinal Regression Analysis?
- Many dependent variables are ordinal, for example:
  - teaching experience
  - SES (high, middle, low)
  - degree of agreement
  - ability level (e.g., literacy, reading)
- Context is important when deciding whether to treat such a variable as ordinal.

Why Use Stata and SAS?
- Both are powerful general-purpose statistical software packages.
- Stata is more powerful for binary and ordinal logistic regression analysis.
- SAS offers two options for ordering an ordinal dependent variable: ascending (the default) and descending.

Proportional Odds Model (1)
- One of several possible regression models for the analysis of ordinal data, and the most common.
- The model predicts the ln(odds) of being in category j or beyond.
- Simplifying assumption: the effect of each independent variable (IV) is assumed to be invariant across the cumulative splits of the outcome.

Proportional Odds Model (2)
- The model predicts K - 1 cumulative logits for an outcome with K response categories.
- For K = 6 (here, Y = 0 to 5), these cumulative logits can be used to predict the K - 1 = 5 cumulative probabilities, given the set of explanatory variables.
- Logit = ln(odds); probability = odds / (1 + odds).
- Estimation chain: logits → odds → estimated probability.

A Latent-Variable Model (1)
- The PO model can also be expressed as a latent-variable model (Agresti, 2002; Greene, 2003; Long, 1997; Long & Freese, 2006; Powers & Xie, 2000; Wooldridge, 2001).
- Assume that a latent variable $y^*$ exists, with $y^* = \mathbf{x}\boldsymbol{\beta} + \varepsilon$.
- Let $y^*$ be divided by cut points (thresholds) $\alpha_1, \alpha_2, \dots, \alpha_{K-1}$, where $\alpha_1 < \alpha_2 < \dots < \alpha_{K-1}$; with K = 6 there are five cut points, $\alpha_1$ to $\alpha_5$.
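As a minimal sketch of the logit → odds → probability chain described in Proportional Odds Model (2), the Python snippet below converts five cumulative logits (K = 6) into cumulative probabilities and then into the six category probabilities. The logit values are hypothetical, not estimates from the ECLS-K data; following the "category j or beyond" wording of Proportional Odds Model (1), each logit is assumed to be ln[P(Y >= j) / P(Y < j)], and the helper names are illustrative only.

```python
import math

def logit_to_probability(logit):
    """Convert a logit, ln(odds), to a probability: odds / (1 + odds)."""
    odds = math.exp(logit)
    return odds / (1.0 + odds)

def category_probabilities(cum_geq):
    """Turn the K - 1 cumulative probabilities P(Y >= j), j = 1..K-1,
    into the K category probabilities P(Y = j), j = 0..K-1."""
    upper = [1.0] + list(cum_geq)   # P(Y >= 0) = 1
    lower = list(cum_geq) + [0.0]   # P(Y >= K) = 0
    return [u - l for u, l in zip(upper, lower)]

# Hypothetical cumulative logits for one child (illustrative values only).
# K = 6 categories (Y = 0..5), so there are K - 1 = 5 logits, ln[P(Y >= j) / P(Y < j)].
cumulative_logits = [2.0, 1.0, 0.0, -1.0, -2.0]

cum_geq = []
for j, lg in enumerate(cumulative_logits, start=1):
    odds = math.exp(lg)                 # logit -> odds
    p = logit_to_probability(lg)        # odds -> estimated probability
    cum_geq.append(p)
    print(f"j={j}: logit={lg:+.1f}  odds={odds:.3f}  P(Y>={j})={p:.3f}")

probs = category_probabilities(cum_geq)
print("P(Y = 0..5):", [round(q, 3) for q in probs], "sum =", round(sum(probs), 6))
```

Because P(Y >= j) can only shrink as j increases, the cumulative logits must decrease with j; the category probabilities are the successive differences and sum to 1.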
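In this study, the observed ordinal outcome y is the child's literacy proficiency level, which ranges from 0 to 5.

A Latent-Variable Model (2)
We define:

$$
y = \begin{cases}
0 & \text{if } y^* \le \alpha_1 \\
1 & \text{if } \alpha_1 < y^* \le \alpha_2 \\
2 & \text{if } \alpha_2 < y^* \le \alpha_3 \\
3 & \text{if } \alpha_3 < y^* \le \alpha_4 \\
4 & \text{if } \alpha_4 < y^* \le \alpha_5 \\
5 & \text{if } y^* > \alpha_5
\end{cases}
$$

A Latent-Variable Model (3)
We can compute the probability of a child attaining each proficiency level, where F is the cumulative distribution function of $\varepsilon$ (the logistic distribution in the ordinal logit model):

$$
\begin{aligned}
P(y=0) &= P(y^* \le \alpha_1) = P(\mathbf{x}\boldsymbol{\beta} + \varepsilon \le \alpha_1) = F(\alpha_1 - \mathbf{x}\boldsymbol{\beta}) \\
P(y=1) &= P(\alpha_1 < y^* \le \alpha_2) = F(\alpha_2 - \mathbf{x}\boldsymbol{\beta}) - F(\alpha_1 - \mathbf{x}\boldsymbol{\beta}) \\
&\;\;\vdots \\
P(y=4) &= P(\alpha_4 < y^* \le \alpha_5) = F(\alpha_5 - \mathbf{x}\boldsymbol{\beta}) - F(\alpha_4 - \mathbf{x}\boldsymbol{\beta}) \\
P(y=5) &= P(y^* > \alpha_5) = 1 - F(\alpha_5 - \mathbf{x}\boldsymbol{\beta})
\end{aligned}
$$

The cumulative probabilities can also be computed in the form

$$P(y \le j) = F(\alpha_{j+1} - \mathbf{x}\boldsymbol{\beta}), \qquad j = 0, 1, \dots, 4.$$

General Logistic Regression Model
In a binary logistic regression model, we predict the log(odds) of success from a set of predictors, where $\pi(x)$ is the probability of success:

$$\ln(Y') = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

In Stata, the ordinal logistic regression model is expressed as:

$$\ln(Y'_j) = \ln\left(\frac{\pi_j(x)}{1 - \pi_j(x)}\right) = \alpha_j + (-\beta_1 X_1 - \beta_2 X_2 - \dots - \beta_p X_p)$$

where $\pi_j(x) = P(y^* \le \alpha_j \mid x)$ is the probability of falling at or below the j-th cut point, j = 1, ..., K - 1.

SAS uses a different ordinal logit model for estimating the parameters:

$$\ln(Y'_j) = \ln\left(\frac{\pi_j(x)}{1 - \pi_j(x)}\right) = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where $\pi_j(x)$ is the probability of falling at or below the j-th cut point under the default ascending order, and the probability of falling above it, $P(y \ge j \mid x)$, under the DESCENDING option.

Methodology
- Sample: Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), National Center for Education Statistics (NCES); n = 3,365 children from 225 schools.
- Outcome variable: proficiency in early reading.
- Eight explanatory variables.
- A PO model with a single explanatory variable was fitted first, followed by the full model with all eight explanatory variables.
- The proportional odds assumption of each model was tested.
- Software packages: Stata and SAS.

Results
Figure 1: Full-Model Analysis of Proportional Odds Using Stata
Figure 2: Brant Test of the Parallel Regression (Proportional Odds) Assumption
Figure 3: Measures of Fit Statistics for the Full Model

Conclusions
- Both packages produce the same or similar results for the model fit statistics and for the test of the proportional odds assumption.
- Stata produces more detailed information on the PO assumption test and on fit statistics.
- The estimated coefficients and cut points (thresholds) are the same in magnitude but may be reversed in sign: with SAS's default ascending order the slope coefficients have the opposite sign to Stata's, while with the DESCENDING option the slopes match Stata's and the intercepts are the negatives of Stata's cut points.
- Unlike Stata, SAS (in both ascending and descending order) does not negate the signs of the logit coefficients in its model equation.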
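The latent-variable formulas and the two software parameterizations can be checked numerically. The Python sketch below assumes a logistic F, two predictors, and made-up cut points and coefficients (not the ECLS-K estimates); names such as `stata_category_probs` and `sas_cum` are hypothetical helpers, not Stata or SAS commands. It computes P(y = j) from the cut points, rewrites the same model in the SAS ascending form (intercepts equal to the cut points, slopes with reversed sign) and the descending form (slopes unchanged, intercepts with reversed sign, modeling P(y* > α_j)), and checks by simulation that cutting y* = xβ + ε at the thresholds reproduces the category probabilities.

```python
import bisect
import math
import random

def F(z):
    """Logistic CDF: the assumed distribution of epsilon in y* = x*beta + epsilon."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(coefs, x):
    """x*beta for one observation."""
    return sum(c * xi for c, xi in zip(coefs, x))

def category_from_latent(y_star, cuts):
    """y = j when alpha_j < y* <= alpha_{j+1}, with alpha_0 = -inf and alpha_K = +inf."""
    return bisect.bisect_left(cuts, y_star)

def stata_category_probs(cuts, beta, x):
    """Stata form: P(y* <= alpha_j) = F(alpha_j - x*beta); P(y = j) is the
    difference of adjacent cumulative probabilities."""
    xb = linear_predictor(beta, x)
    cum = [0.0] + [F(a - xb) for a in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cum, cum[1:])]

def sas_cum(intercepts, slopes, x):
    """SAS form: logit(pi_j) = alpha_j + x*b, with no minus sign.
    pi_j is P(y* <= alpha_j) in ascending order, P(y* > alpha_j) with DESCENDING."""
    xb = linear_predictor(slopes, x)
    return [F(a + xb) for a in intercepts]

# Hypothetical values (NOT the ECLS-K estimates): five cut points, two predictors.
cuts = [-2.0, -0.5, 0.5, 1.5, 3.0]   # alpha_1 < ... < alpha_5 (Stata /cut1 ... /cut5)
beta = [0.8, -0.3]
x = [1.0, 2.0]
xb = linear_predictor(beta, x)

probs = stata_category_probs(cuts, beta, x)
stata_le = [F(a - xb) for a in cuts]

# SAS ascending: intercepts equal the cut points, slopes change sign.
sas_asc = sas_cum(cuts, [-b for b in beta], x)
# SAS descending: slopes keep their sign, intercepts change sign, and the
# modeled probability is the complement P(y* > alpha_j).
sas_desc = sas_cum([-a for a in cuts], beta, x)

print("P(y = 0..5):        ", [round(p, 3) for p in probs])
print("Stata P(y* <= a_j): ", [round(p, 3) for p in stata_le])
print("SAS ascending:      ", [round(p, 3) for p in sas_asc])
print("1 - SAS descending: ", [round(1 - p, 3) for p in sas_desc])

# Simulation check of the latent-variable model: draw logistic errors by
# inverse-CDF sampling, cut y* at the thresholds, and tabulate the categories.
rng = random.Random(2006)
n = 200_000
counts = [0] * (len(cuts) + 1)
for _ in range(n):
    u = rng.random()
    while u == 0.0:               # guard against log(0); practically never happens
        u = rng.random()
    eps = math.log(u / (1.0 - u))
    counts[category_from_latent(xb + eps, cuts)] += 1
print("simulated frequencies:", [round(c / n, 3) for c in counts])
```

The three cumulative-probability rows agree, which is the sign relationship summarized in the Conclusions: the fitted model is the same, only the parameterization differs.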
References
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: John Wiley & Sons.
Allison, P. D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute Inc.
Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26, 1323-1333.
Armstrong, B. B., & Sloan, M. (1989). Ordinal regression models for epidemiological data. American Journal of Epidemiology, 129(1), 191-204.
Bender, R., & Benner, A. (2000). Calculating ordinal regression models in SAS and S-Plus. Biometrical Journal, 42(6), 677-699.
Bender, R., & Grouven, U. (1998). Using binary logistic regression models for ordinal data with non-proportional odds. Journal of Clinical Epidemiology, 51(10), 809-816.
Brant, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics, 46, 1171-1178.
Clogg, C. C., & Shihadeh, E. S. (1994). Statistical models for ordinal variables. Thousand Oaks, CA: Sage.
Greene, W. H. (2003). Econometric analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: John Wiley & Sons.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42, 109-142.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage.
O’Connell, A. A. (2000). Methods for modeling ordinal outcome variables. Measurement and Evaluation in Counseling and Development, 33(3), 170-193.
O’Connell, A. A. (2006). Logistic regression models for ordinal response variables. Thousand Oaks, CA: Sage.
O’Connell, A. A., Liu, X., Zhao, J., & Goldstein, J. (2006, April). Model diagnostics for proportional and partial proportional odds models. Paper presented at the annual meeting of the American Educational Research Association (AERA), San Francisco, CA.
Powers, D. A., & Xie, Y. (2000). Statistical models for categorical data analysis. San Diego, CA: Academic Press.
Wooldridge, J. M. (2001). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.