Ordinal Regression Analysis: Fitting the Proportional Odds Model Using Stata and SAS

Xing Liu
Neag School of Education
University of Connecticut
Purpose
Introduce ordinal logistic regression analysis
Demonstrate the use of the proportional odds (PO) model using Stata (V. 9.0)
Compare the results of the proportional odds model using both Stata OLOGIT and SAS LOGISTIC
Why Ordinal Regression Analysis?
Ordinal dependent variables are common, for example:
Teaching experience
SES (high, middle, low)
Degree of agreement
Ability level (e.g., literacy, reading)
Context is important
Why Use Stata and SAS?
Both are powerful general-purpose statistical software packages
Stata is more powerful for binary and ordinal logistic regression analysis (e.g., more detailed output for the PO assumption test and fit statistics)
SAS offers two orderings for an ordinal dependent variable:
Ascending (the default)
Descending
Proportional Odds Model (1)
One of several possible regression models for the analysis of ordinal data, and also the most common
The model predicts the ln(odds) of being in category j or beyond
Simplifying assumption: the effect of each independent variable (IV) is assumed to be invariant across all cumulative splits (the parallel regression assumption)
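To make the simplifying assumption concrete, here is a minimal Python sketch. It uses hypothetical thresholds and a single hypothetical slope, not estimates from this study, and writes the cumulative logits in the Stata-style sign convention. Because every split shares the same slope, the gap between two adjacent cumulative logits equals the difference of their thresholds and does not depend on the predictor. In other words, the lines are parallel.

```python
# Hypothetical parameters for illustration only; they are not estimates from the ECLS-K data.
ALPHAS = [-2.0, -0.5, 0.5, 1.5, 3.0]  # thresholds alpha_1 < ... < alpha_5 (K = 6)
BETA = 0.8                            # one slope shared by every split (the PO assumption)


def cumulative_logits(x, alphas=ALPHAS, beta=BETA):
    """Cumulative logits logit P(Y <= j - 1 | x) = alpha_j - beta * x, j = 1..5 (Y coded 0..5, Stata-style sign)."""
    return [a - beta * x for a in alphas]


def split_gaps(x):
    """Differences between adjacent cumulative logits at a given value of x."""
    logits = cumulative_logits(x)
    return [round(upper - lower, 10) for lower, upper in zip(logits, logits[1:])]


# Because beta is shared, each gap equals alpha_{j+1} - alpha_j for every x: the lines are parallel.
for x in (0.0, 1.0, 2.5):
    print(x, split_gaps(x))
```

If the slope were allowed to differ across splits, as in a partial proportional odds model, these gaps would change with x.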
Proportional Odds Model (2)
The model predicts cumulative logits for the K-1 cumulative splits of the K response categories
For K = 6 (here, Y = 0 to 5), these cumulative logits can be used to predict the K-1 = 5 cumulative probabilities, given the collection of explanatory variables:
Logit = ln(odds); odds = exp(logit)
Probability = odds / (1 + odds)
logits → odds → estimated probabilities
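The conversion chain above can be sketched in a few lines of Python. The five cumulative logits below are hypothetical values, and they are assumed to refer to P(Y ≤ j), as in the latent-variable formulation. If you use the "category j or beyond" direction, the corresponding probabilities are the complements. The sketch converts each logit to odds and then to a probability. It then differences the cumulative probabilities to recover the six category probabilities.

```python
import math


def logit_to_probability(logit):
    """Convert a logit to a probability through the odds: odds = exp(logit), p = odds / (1 + odds)."""
    odds = math.exp(logit)
    return odds / (1.0 + odds)


# Hypothetical cumulative logits ln[P(Y <= j) / P(Y > j)] for j = 0..4 (K = 6 categories, Y = 0..5).
cum_logits = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Five cumulative probabilities P(Y <= j), j = 0..4.
cum_probs = [logit_to_probability(z) for z in cum_logits]

# Category probabilities are differences of successive cumulative probabilities (P(Y <= 5) = 1).
padded = [0.0] + cum_probs + [1.0]
cat_probs = [upper - lower for lower, upper in zip(padded, padded[1:])]

print([round(p, 3) for p in cum_probs])
print([round(p, 3) for p in cat_probs])
```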
A Latent-variable Model (1)
The PO model can be expressed as a latent variable model (Agresti, 2002; Greene, 2003; Long, 1997; Long & Freese, 2006; Powers & Xie, 2000; Wooldridge, 2001)
Assuming a continuous latent variable y* exists, we can define
$$y^* = x\beta + \varepsilon$$
Let y* be divided by cut points (thresholds) $\alpha_1, \alpha_2, \ldots, \alpha_{K-1}$, with $\alpha_1 < \alpha_2 < \cdots < \alpha_{K-1}$ (here K-1 = 5)
The child's observed literacy proficiency level is the ordinal outcome y, ranging from 0 to 5
A Latent-variable Model (2)
We define:
0
1

2
y
3
4

5
if y*  1


if 1  y*   2 
if  2  y*   3 

if  3  y*   4 
if  4  y*   5 

if  5  y*   
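The threshold rule above is a sorted lookup. The following minimal Python sketch uses hypothetical thresholds and the standard-library `bisect_left`. `bisect_left` counts how many thresholds lie strictly below y*. That count matches the strict-lower, inclusive-upper inequalities in the definition, so a value exactly equal to a cut point falls into the lower category.

```python
from bisect import bisect_left


def observed_category(y_star, thresholds):
    """Map a latent value y* to the observed ordinal category 0..K-1.

    y = 0 if y* <= alpha_1; y = j if alpha_j < y* <= alpha_{j+1}; y = K-1 if y* > alpha_{K-1}.
    bisect_left returns the number of thresholds strictly less than y*, which matches these rules.
    """
    return bisect_left(thresholds, y_star)


alphas = [-2.0, -0.5, 0.5, 1.5, 3.0]  # hypothetical alpha_1..alpha_5, in increasing order
for y_star in (-3.0, -2.0, 0.0, 1.5, 4.2):
    print(y_star, observed_category(y_star, alphas))
```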
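A Latent-variable Model (3)
We can compute the probability of a child attaining each proficiency level:
$$
\begin{aligned}
P(y=0) &= P(y^* \le \alpha_1) = P(x\beta + \varepsilon \le \alpha_1) = F(\alpha_1 - x\beta) \\
P(y=1) &= P(\alpha_1 < y^* \le \alpha_2) = F(\alpha_2 - x\beta) - F(\alpha_1 - x\beta) \\
&\;\;\vdots \\
P(y=4) &= P(\alpha_4 < y^* \le \alpha_5) = F(\alpha_5 - x\beta) - F(\alpha_4 - x\beta) \\
P(y=5) &= P(y^* > \alpha_5) = 1 - F(\alpha_5 - x\beta)
\end{aligned}
$$
where F is the cumulative distribution function of ε (the standard logistic CDF in the ordinal logit model).
We can also compute the cumulative probabilities using the form:
$$P(y \le j) = F(\alpha_{j+1} - x\beta), \quad j = 0, 1, \ldots, 4$$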
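The formulas above can be evaluated directly. The Python sketch below takes F to be the standard logistic CDF, which is the ordinal logit case. The thresholds and linear-predictor values are hypothetical. Each category probability is the difference of F evaluated at adjacent thresholds, so the six probabilities always sum to 1. Raising xβ moves probability mass toward the higher proficiency levels.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(xb, alphas):
    """P(y = j), j = 0..K-1, given the linear predictor xb = x*beta and thresholds alpha_1..alpha_{K-1}.

    P(y=0) = F(alpha_1 - xb); P(y=j) = F(alpha_{j+1} - xb) - F(alpha_j - xb); P(y=K-1) = 1 - F(alpha_{K-1} - xb).
    """
    cdf = [0.0] + [logistic_cdf(a - xb) for a in alphas] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


alphas = [-2.0, -0.5, 0.5, 1.5, 3.0]  # hypothetical thresholds
for xb in (-1.0, 0.0, 1.0):           # hypothetical values of x*beta
    probs = category_probabilities(xb, alphas)
    print(xb, [round(p, 3) for p in probs], round(sum(probs), 10))
```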
General Logistic Regression Model
In a binary logistic regression model, we predict the log(odds) of success on a set of predictors:
$$\ln(Y') = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
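In Stata, the ordinal logistic regression model is expressed as:
$$\ln(Y_j') = \ln\left(\frac{\pi_j(x)}{1 - \pi_j(x)}\right) = \alpha_j + (-\beta_1 X_1 - \beta_2 X_2 - \cdots - \beta_p X_p)$$
where $\pi_j(x)$ is the cumulative probability of a response at or below the j-th cut point, j = 1, ..., K-1.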



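SAS uses a different ordinal logit model for estimating the parameters:
$$\ln(Y_j') = \ln\left(\frac{\pi_j(x)}{1 - \pi_j(x)}\right) = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$
Here the cumulative probability $\pi_j(x)$ follows the response ordering requested in SAS (ascending by default), and the coefficients enter with a plus sign.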
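The sign difference between the Stata and SAS equations can be checked numerically. The Python sketch below uses hypothetical parameter values and evaluates the two equations as plain formulas; it does not reproduce output from either package. It computes one cumulative probability from the Stata-style equation and one from the SAS-style equation under ascending ordering, with the slopes negated. The two probabilities agree. This is why the packages can report slopes that are equal in magnitude but opposite in sign.

```python
import math


def expit(z):
    """Inverse logit: converts a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def cum_prob_stata(alpha_j, betas, xs):
    """Stata-style equation: logit(pi_j) = alpha_j - (beta_1 x_1 + ... + beta_p x_p)."""
    return expit(alpha_j - sum(b * x for b, x in zip(betas, xs)))


def cum_prob_sas_ascending(alpha_j, betas, xs):
    """SAS-style equation (ascending ordering): logit(pi_j) = alpha_j + (beta_1 x_1 + ... + beta_p x_p)."""
    return expit(alpha_j + sum(b * x for b, x in zip(betas, xs)))


# Hypothetical Stata-style estimates; the SAS-style slopes describing the same fit are their negatives.
alpha_1 = -2.0
stata_betas = [0.8, -0.3]
sas_betas = [-b for b in stata_betas]
x = [1.0, 2.0]  # hypothetical predictor values

p_stata = cum_prob_stata(alpha_1, stata_betas, x)
p_sas = cum_prob_sas_ascending(alpha_1, sas_betas, x)
print(round(p_stata, 6), round(p_sas, 6))
```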


Methodology
Sample: Early Childhood Longitudinal Study, Kindergarten Class (ECLS-K), National Center for Education Statistics (NCES)
n = 3,365 children from 225 schools
Outcome variable: proficiency in early reading
Eight explanatory variables
The PO model with a single explanatory variable was fitted first
The full model with all eight explanatory variables was then fitted
The proportional odds assumption was tested
Software packages: Stata and SAS
Results
Figure 1: Full-Model Analysis of Proportional Odds Using Stata
Figure 2: Brant Test of Parallel Regression (Proportional Odds) Assumption
Figure 3: Measures of Fit Statistics for the Full Model
Conclusions
Both packages produce the same or similar results for the model fit statistics and the test of the proportional odds assumption
Stata produces more detailed output for the PO assumption test and the fit statistics
The estimated coefficients and cut points (thresholds) are the same in magnitude but may be reversed in sign
Unlike Stata, SAS (with either ascending or descending ordering) does not negate the signs of the logit coefficients in the model equation
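The descending case can be sketched in the same way. This sketch assumes that SAS's descending option models the cumulative logits of the reversed ordering, that is, logit P(Y ≥ k) = γ_k + xβ. Under that assumption, the algebra implies two things. First, the slope keeps Stata's sign. Second, the intercepts become Stata's cut points, negated and listed in reverse order. The Python below checks this identity with hypothetical values. It reproduces only the algebra, not actual SAS or Stata output.

```python
import math


def expit(z):
    """Inverse logit: converts a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical Stata-style fit: logit P(Y <= k - 1) = alpha_k - x * beta, k = 1..5, Y coded 0..5.
alphas = [-2.0, -0.5, 0.5, 1.5, 3.0]
beta = 0.8

# Assumed descending-style parameters: logit P(Y >= k) = gamma_k + x * beta, same slope,
# intercepts negated and reversed (gammas_desc[0] is for P(Y >= 5), gammas_desc[4] for P(Y >= 1)).
gammas_desc = [-a for a in reversed(alphas)]


def p_at_least(k, x):
    """P(Y >= k), k = 1..5, from the Stata-style parameters: 1 - P(Y <= k - 1)."""
    return 1.0 - expit(alphas[k - 1] - beta * x)


def p_at_least_desc(k, x):
    """P(Y >= k), k = 1..5, from the assumed descending-style parameters."""
    return expit(gammas_desc[5 - k] + beta * x)


x = 1.2  # hypothetical predictor value
for k in range(1, 6):
    print(k, round(p_at_least(k, x), 6), round(p_at_least_desc(k, x), 6))
```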
References
Agresti, A. (2002). Categorical data analysis (2nd ed.). New York: John Wiley & Sons.
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley & Sons.
Allison, P. D. (1999). Logistic regression using the SAS system: Theory and application. Cary, NC: SAS Institute, Inc.
Ananth, C. V., & Kleinbaum, D. G. (1997). Regression models for ordinal responses: A review of methods and applications. International Journal of Epidemiology, 26, 1323-1333.
Armstrong, B. B., & Sloan, M. (1989). Ordinal regression models for epidemiological data. American Journal of Epidemiology, 129(1), 191-204.
Bender, R., & Benner, A. (2000). Calculating ordinal regression models in SAS and S-Plus. Biometrical Journal, 42(6), 677-699.
Bender, R., & Grouven, U. (1998). Using binary logistic regression models for ordinal data with non-proportional odds. Journal of Clinical Epidemiology, 51(10), 809-816.
Brant, R. (1990). Assessing proportionality in the proportional odds model for ordinal logistic regression. Biometrics, 46, 1171-1178.
Clogg, C. C., & Shihadeh, E. S. (1994). Statistical models for ordinal variables. Thousand Oaks, CA: Sage.
Greene, W. H. (2003). Econometric analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: John Wiley & Sons.
Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42, 109-142.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.
Menard, S. (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage.
O’Connell, A. A. (2000). Methods for modeling ordinal outcome variables. Measurement and Evaluation in Counseling and Development, 33(3), 170-193.
O’Connell, A. A. (2006). Logistic regression models for ordinal response variables. Thousand Oaks, CA: Sage.
O’Connell, A. A., Liu, X., Zhao, J., & Goldstein, J. (2006, April). Model diagnostics for proportional and partial proportional odds models. Paper presented at the 2006 Annual Meeting of the American Educational Research Association (AERA), San Francisco, CA.
Powers, D. A., & Xie, Y. (2000). Statistical models for categorical data analysis. San Diego, CA: Academic Press.
Wooldridge, J. M. (2001). Econometric analysis of cross section and panel data. Cambridge, MA: The MIT Press.