Paper - Southern Methodist University

advertisement
Running Head: REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
Regression Analysis with the Ordered Multinomial Logistic Model
Braden Hoelzle
Southern Methodist University
______________________________
Paper presented at the annual meeting of the Southwest Educational Research Association, San
Antonio, TX, February 2-4, 2011.
1
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
2
Regression Analysis with the Ordered Multinomial Logistic Model
While logistic regression is a commonly utilized subset of regression when modeling a
dichotomous dependent variable, very few researchers make use of the ordered multinomial
logistic model. The ordered multinomial logistic model can be understood as an extension of the
logistic regression model that is useful when modeling ordinal dependent variables with three or
more categories (eg. yes, maybe, no; first, second, third; Likert scale, etc). The rarity of its use
in educational research stands in sharp contrast to the abundance of ordinal data in educational
studies. In fact Johnson and Albert argue that “ordinal data are the most frequently encountered
type of data in the social sciences” (1999, p. 126). If ordinal data types occur so frequently in
social science research and if the ordered multinomial model is rarely used, how are researchers
modeling ordinal data? In many cases they are treating ordinal data like interval scale data and
modeling it using standard regression techniques. However, ordinal dependent variables often
violate the assumptions of regression and produce unreliable results. This paper will give
examples of ordinal data, examine briefly the debate around modeling ordinal data using linear
regression, and demonstrate how to model ordinal data with the ordered multinomial logistic
model using the open-source statistical program R. In addition, the appendix of this paper will
include a script file containing all the R code necessary for running two example regressions
using the ordered logistic model.
Ordinal Data in Educational Research
Were Johnson and Albert (1999) exaggerating when they said that ordinal data is the
most frequently found data type in social science research? Probably not. Table 1 contains a
few examples of ordinal data frequently found in education research. However, it should be
noted that the ordered multinomial logistic model is only appropriate when the dependent
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
3
variable follows an ordinal scale. Ordinal data can serve as predictor variables in linear
regression without increasing the likelihood of violating the assumptions of the model.
Table 1
Examples of Ordinal Data Found in Educational Research
always, frequently, sometimes, rarely, never
low, medium, high
no high school diploma, high school diploma,
some college, bachelor’s degree, master’s
degree, doctoral degree
0-10K per year, 10-20K per year, 20-30K per
year, 30 – 60K per year, > 60K per year
strongly agree, agree, neither agree nor
disagree, disagree, strongly disagree
yes, maybe, no
free school lunch, reduced school lunch, full
price lunch
not proficient, proficient, commended
basic math, regular math, pre-AP math, AP
math
first, second, third, fourth
A, B, C, D, F
tier 1, tier 2, tier 3
Modeling Ordinal Data Using Linear Regression
A common practice among social science researchers is to treat ordinal data as intervallyscaled data (Jamieson, 2004). For example, a person might conduct a survey using a Likert scale
that attempts to measure a person’s level of agreement with the 2010 health care reform bill.
Often the Likert scale is coded from one to five and modeled using linear regression as if it
consisted of five equally distanced integers. However, if the Likert scale were intervally scaled
one could argue that the difference in the level of agreement between someone who marked
Strongly Disagree (1) and someone who marked Neither Agree nor Disagree (3) is the same as
the distance between someone who marked Disagree (2) and someone who marked Agree (4).
While mathematically each of the differences equal 2, it would be inappropriate to say that the
difference in the level of agreement within pair one is the same as that within pair two.
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
4
Therefore, treating ordinal data as intervally scaled creates problems for interpreting the results
and often violates the assumptions necessary for regression.
One of the underlying assumptions of linear regression is that the data is distributed
around a single linear line defined as π‘Œ = 𝛽0 + 𝛽1 𝑋1 with a normally distributed error term
which has a mean of zero and constant variance (McKelvey & Zavoina, 1975). However, often
an ordinal dependent variable violates this assumption of linearity and constant error variance.
When this assumption is not met the results of the regression cannot be trusted and according to
Lu should be “accepted with a grain of salt” (1999, p. 264). However, the decision between
linear regression and ordered multinomial regression is not always black and white. According
to Gelman and Hill (2007) unequal error variance is in many cases a minor issue. In addition,
they indicate that when an ordinal dependent variable has a large number of categories that can
be considered equally spaced simple linear regression is an optional alternative. While
significant debate surrounds the decision of when ordinal data can be modeled using parametric
measures designed for interval data (Carifio & Perla, 2007; Jamieson, 2004; Knapp, 1990;
Labovitz, 1970; Lu, 1999) this paper is primarily concerned with showing how to model ordinal
data using the ordered multinomial logistic regression model. It is this author’s opinion that
when modeling between three and five ordered categories the ordered multinomial logistic model
is highly preferable. Further, when setting up a research study in which you intend to use
standard parametric statistics on ordinal data it is usually preferable to use more rather than
fewer categories – ie. ten instead of five (Knapp, 1990).
Modeling Ordinal Data with the Ordered Multinomial Logistic Model
As mentioned earlier, the ordered multinomial logistic model is an extension of logistic
regression when the data falls into more than two ordered categories. Like logistic regression,
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
5
the ordered logistic model requires that the continuously-scaled linear coefficients derived from
regression must be transformed using the inverse logit function onto the range (0, 1). The
inverse logit function can be written
πΏπ‘œπ‘”π‘–π‘‘ −1 (π‘₯) =
𝑒π‘₯
1 + 𝑒π‘₯
This transformation enables the calculation of probabilities that in logistic regression can
be interpreted as the probability that a person does or doesn’t have a particular trait. However, in
the ordered multinomial logistic model this transformation enables the calculation of the
probability that a person belongs in a given category. In order to conceptualize this concept one
needs some knowledge of cumulative distribution functions (see figure 1). A cumulative
distribution function (CDF) can take values in the range [0,1] that are equivocal to the
probability that the random variable X takes a value less than or equal to x. For example, in
figure 1 the probability that X is less than or equal to x1 (P0) is 45 percent. And by definition,
Figure 1
Cumulative Distribution Function
1.0
Cumulative Distribution Function F(X)
x3
P2
0.6
x1
0.2
0.4
P1
P0
0.0
pnorm(x)
0.8
x2
-2
-1
0
1
x
2
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
the probability that X is greater than x1 is equal to 55 percent. However to determine the
probability that X falls somewhere between x1 and x2 (P1) we would need to subtract the
probability that X ≤ x1 from the probability that X ≤ x2 (85 - 45 = 40 percent). Therefore, for
each cut point (x1, x2, and x3) the CDF is equal to the probability that X is less than or equal to
that cut point. To crystallize these concepts we will apply the ordered multinomial logistic
model to a dataset using the R statistical package “bayespolr” in the “arm” library.
Example Ordered Multinomial Logistic Model
The hypothetical dataset for this analysis can be downloaded from the UCLA Academic
Technology Services ( n.d.). Before running the analysis the “arm” and “psych” libraries must
be loaded and the dataset read into R.
grad<-read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")
This dataset contains four variables defined in table 2. In this ordered regression model
apply will serve as the ordinal outcome variable and will be modeled by the variable gpa.
Table 2
Legend for Variables
Variable Names
apply
pared
Code
0 = unlikely, 1 = somewhat
likely, 2 = very likely
0 = no, 1 = yes
public
0 = private, 1 = public
gpa
Real number from 0 – 4
Definition
college juniors reported likelihood of
applying to graduate school
indicating whether at least one parent has
a graduate degree
indicating whether the undergraduate
institution is public or private
college gpa
The reliability of an ordered multinomial regression model is contingent upon the
assumptions that there is sufficient data that falls into each category of the dependent variable.
6
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
This assumption can be checked using crosstabs to identify any cells with none or very few
elements. It appears that this data met our assumptions for the ordered regression model.
> xtabs(~ pared + apply)
apply
pared
0
1
2
0 200 110 27
1 20 30 13
> xtabs(~ public + apply)
apply
public
0
1
2
0 189 124 30
1 31 16 10
Notice in the following code that for the ordered regression model the dependent
variable must be defined as ordinal in the script.
summary(m1 <- bayespolr(as.ordered(apply)~gpa,data=mydata))
Call:
bayespolr(formula = as.ordered(apply) ~ gpa, data = mydata)
Coefficients:
Value Std. Error t value
gpa 0.7109826
0.247078 2.877563
Intercepts:
Value Std. Error t value
0|1 2.3308 0.7502
3.1068
1|2 4.3508 0.7744
5.6183
Residual Deviance: 737.6921
AIC: 743.6921
The model produces a coefficient for the predictive variable gpa, and two cut points (0|1
and 1|2) representing the division between the categories 0 and 1 and 1 and 2 respectively.
Hence, at cut point 0|1 the CDF would produce the probability that our dependent variable
7
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
8
(apply) is less than or equal to the dividing line between 0 and 1 (ie. apply = 0). However, these
coefficients are not interpretable until they are mapped onto the range (0,1) using the inverse
logit function which can be expanded to read
𝑒 (𝐡1 𝑋𝑖 +𝐡0 )
𝑝̂ 𝑖 =
1 + 𝑒 (𝐡1 𝑋𝑖 +𝐡0 )
Rather than calculating by hand the probabilities of a person falling into each of the three
categories of the outcome variable apply the function can be coded into R using the following
script.
(beta <- coef(m1))
(tau <- m1$zeta)
x<- 3
logit.prob <- function(eta){exp(eta)/(1+exp(eta))}
(p0<- logit.prob(tau[1] - x * beta))
(p1<- logit.prob(tau[2] - x*beta)-logit.prob(tau[1] - x * beta))
(p2<- 1 - logit.prob(tau[2] - x * beta))
Using the function entitled logit.prob the script transforms the linear coefficients such
that p0 equals the probability that apply = 0 given a gpa of 3. The gpa was set to 3 since the
mean of gpa for this sample was 2.99. Using these formulas we derive that for students with a
gpa of 3, about 55 percent indicated that they were unlikely to apply to graduate school, 35
indicated that they were somewhat likely to apply, and 10 percent indicated that they were very
likely to apply. Notice, that p1 is calculated by subtracting the probability that apply is less than
cut point 0|1 ( tau[1] ) from the probability that apply is less than cut point (1|2) ( tau[2] ). Also,
p2 is calculated by subtracting the probability that apply is less than cut point 1|2 from 1, since
the sum of all the subset probabilities (p0, p1, p2) must equal 1.
By allowing gpa to vary and calculating a probability matrix for each of the three
categories of apply we can determine the likelihood of applying to graduate school given various
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
9
gpa’s. Further, we can add other predictor variables to the model to determine the likelihood of
applying to graduate school given a range of predictor variable values. Figure 2 graphically
displays the different probabilities of applying to graduate school for students who had parents
who attended graduate school and those who did not across a range of undergraduate gpa’s.
Figure 2
Probability of Applying to Grad School
From this graph we can draw two primary conclusions. First, students with at least one
parent who attended graduate school are more likely to apply to graduate school than students
whose parent(s) did not, across the entire range of potential gpa scores. Second, the likelihood of
applying for graduate school increases as GPA increases for all students.
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
10
Conclusion
The ordered multinomial logistic model offers an important alternative to standard linear
regression when modeling ordinal data. While significant debate still surrounds whether ordinal
data can be modeled using standard parametric regression techniques, in general, ordinal data
with few categories that do not approximate an interval scale should be modeled with nonparametric models such as the ordered multinomial logistic regression model. For a comparison
of the various models available for modeling ordinal data consult Fullerton (2009) or Winship
and Mare (1984). To replicate using R the analyses conducted for this article and to explore a
second example see Appendix A. The ordered multinomial model is an under-utilized regression
model that offers a non-parametric approach to modeling ordinal data.
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
11
References
Carifio, J. & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent
myths and urban legends about Likert scales and Likert response formats and their
antidotes. Journal of Social Sciences, 3 (3), 106-116.
Fullerton, A. S. (2009). A conceptual framework for ordered logistic regression models.
Sociological Methods & Research, 38 (2), 306–347. doi: 10.1177/0049124109346162
Gelman, A. & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models.
New York: Cambridge University Press.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38, 1212-1218.
Johnson, V. E. & Albert, J. H. (1999). Statistics for the social sciences and public policy:
Ordinal data modeling. New York: Springer.
Knapp, T. R. (1990). Treating ordinal scales as interval scales: An attempt to resolve the
controversy. Nursing Research, 39 (2), 121-123.
Labovitz, S. (1970). The assignment of numbers to rank order categories. American
Sociological Review, 35 (3), 515-524.
Lu, M. (1999). Determinants of residential satisfaction: Ordered logit vs. regression models.
Growth and Change, 30, 264-287.
McKelvey, R. D. & Zavoina, W. (1975). A statistical model for the analysis of ordinal level
dependent variables. Journal of Mathematical Sociology, 4, 104-120.
Quinn, K. (n.d.). NES 1996 [Data file and Code Book] Retrieved from
http://www.stat.washington.edu /quinn/classes/536/data/nes96r.dat
UCLA Academic Technology Services ( n.d.). [Hypothetical Data file] Retrieved from
http://www.ats.ucla.edu/stat/r/dae/ologit.csv
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
Winship, C. & Mare, R. D. (1984). Regression models with ordinal variables. American
Sociological Review, 49, 512-525.
12
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
Appendix A
###### Script for Running the Ordered Multinomial Logist
Regression
#### Load Libraries
library(arm)
library(psych)
#### Load Data
grad <read.csv("http://www.ats.ucla.edu/stat/r/dae/ologit.csv")
##### Examine Data
str(grad)
summary(grad$gpa)
table(grad$apply)
table(grad$pared)
table(grad$public)
xtabs(~ grad$pared + grad$apply)
xtabs(~ grad$public + grad$apply)
##### Run ordered multinomial logistic regression
summary(m1 <- bayespolr(as.ordered(apply)~gpa,data=grad))
##### In order to interpret it we need to transform the data
into a probability.
str(m1)
(beta <- coef(m1))
(tau <- m1$zeta)
x<- 3
##### Note: mean = 2.999
logit.prob <- function(eta){exp(eta)/(1+exp(eta))}
(p1 <- logit.prob(tau[1] - x * beta))
(p2<- logit.prob(tau[2] - x * beta) - logit.prob(tau[1] - x *
beta))
(p3<- 1 - logit.prob(tau[2] - x * beta))
p1+p2+p3
13
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
##### Adding multiple predictors
summary(m2 <- bayespolr(as.ordered(apply)~gpa + pared + public
,data=grad))
#### Transforming the Coefficients to probabilities
(beta <- coef(m2))
(tau <- m2$zeta)
(x<- cbind(0:4, 0 , .15))
(x2<-cbind(0:4, 1 , .15))
logit.prob <- function(eta){exp(eta)/(1+exp(eta))}
(p1 <- logit.prob(tau[1] - x %*% beta))
(p2<- logit.prob(tau[2] - x %*% beta) - logit.prob(tau[1] - x
%*% beta))
(p3<- 1 - logit.prob(tau[2] - x %*% beta))
(p4 <- logit.prob(tau[1] - x2 %*% beta))
(p5<- logit.prob(tau[2] - x2 %*% beta) - logit.prob(tau[1] - x2
%*% beta))
(p6<- 1 - logit.prob(tau[2] - x2 %*% beta))
#### Plotting our probability curves
Undergrad.GPA <-0:4
plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))
lines(0:4, p2, col=2)
lines(0:4, p3, col=3)
lines(0:4, p4, col=1, lty = 2)
lines(0:4, p5, col=2, lty = 2)
lines(0:4, p6, col=3, lty = 2)
legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)",
"P(very likely)", "Line Type when Pared = 0",
"Line Type when Pared = 1"), col=c(1:3,1,1), lty=c(1,1,1,1,2))
##### Run model using Linear Regression
summary(m1.2<-lm(apply~gpa, data=grad))
#### Plot regression
plot(apply~gpa, data=grad)
abline(m1.2, col = "red")
#### Check Assumptions
par(mfrow = c(2,2))
plot(m1.2)
14
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
15
#### Will someone w/ a 3.0 gpa likely apply to gradschool?
(y.hat<-(-.2201 + (.2568 * 3)))
#### .5503 is their predicted apply score which means they are
somewhere
#### between unlikely and somewhat likely. This is somewhat
difficult to interpret
#### and it completely violates the assumptions for regression.
#### Practice Problem ****Quinn, K. (n.d.). *****
##### Description of Data (nes96)
### For more details see "NES 1996 [Data file and Code Book]
##Retrieved from
##http://www.stat.washington.edu /quinn/classes/536/data"
#ClinLR = Ordinal variable from 1-7 indicating ones view of Bill
Clinton’s
#political leanings, where 1 = extremely liberal, 2 = liberal, 3
= slightly
#liberal, 4 = moderate, 5= slightly conservative, 6 =
conservative, 6 =
#extremely conservative.
#PID = Ordinal variable from 0-6 indicating ones own political
identification,
#where 0 = Strong Democrat and 6 = Strong Republican
#educ = Ordinal variable from 1-7 indicating ones own level of
education,
#where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no
further
#schooling, 3 = High school diploma or equivalency test, 4 =
More than 12
#years of schooling, no higher degree, 5 = Junior or community
college level
#degree (AA degrees), 6 = BA level degrees; 17+ years, no
postgraduate degree,
#7 = Advanced degree
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
16
# load the data
nes96 <read.table("http://www.stat.washington.edu/quinn/classes/536/dat
a/nes96r.dat", header=TRUE)
# check to see if the data were read in properly
head(nes96)
describe(nes96)
table(nes96$ClinLR)
table(nes96$PID)
table(nes96$educ)
xtabs(~ nes96$ClinLR + nes96$PID)
xtabs(~ nes96$ClinLR + nes96$educ)
#### Note: we do violate some of our assumptions, but we will
run the model
#### anyway. We would probably not want to use these results
for anything
#### except an practice example.
summary(m3 <- bayespolr(as.ordered(ClinLR)~PID+educ,data=nes96))
(beta <- coef(m3))
(tau <- m3$zeta)
(X <- cbind(0:6,4.57))
#### use this function to transform into probabilities
logit.prob <- function(eta){exp(eta)/(1+exp(eta))}
(p1 <- logit.prob(tau[1] - X %*% beta))
(p2 <- logit.prob(tau[2] - X %*% beta) - logit.prob(tau[1]
%*% beta))
(p3 <- logit.prob(tau[3] - X %*% beta) - logit.prob(tau[2]
%*% beta))
(p4 <- logit.prob(tau[4] - X %*% beta) - logit.prob(tau[3]
%*% beta))
(p5 <- logit.prob(tau[5] - X %*% beta) - logit.prob(tau[4]
%*% beta))
(p6 <- logit.prob(tau[6] - X %*% beta) - logit.prob(tau[5]
%*% beta))
(p7 <- 1.0 - logit.prob(tau[6] - X %*% beta))
Political.ID <- 0:6
plot(Political.ID, p1, type="l", col=1, ylim=c(0,1))
lines(0:6, p2, col=2)
lines(0:6, p3, col=3)
lines(0:6, p4, col=4)
- X
- X
- X
- X
- X
REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL
LOGISTIC MODEL
17
lines(0:6, p5, col=5)
lines(0:6, p6, col=6)
lines(0:6, p7, col=7)
legend(0, 1, legend=c("P(extremely liberal)", "P(liberal)",
"P(slightly liberal)", "P(moderate)", "P(slightly
conservative)",
"P(conservative)", "P(extremely conservative)"), col=1:7, lty=1)
#### let educ vary and hold PID constant this time
(beta <- coef(m3))
(tau <- m3$zeta)
(X <- cbind(2.84,1:7))
#### use this function to transform the
logit.prob <- function(eta){exp(eta)/(1+exp(eta))}
(p1 <- logit.prob(tau[1] - X %*% beta))
(p2 <- logit.prob(tau[2] - X %*% beta) - logit.prob(tau[1]
%*% beta))
(p3 <- logit.prob(tau[3] - X %*% beta) - logit.prob(tau[2]
%*% beta))
(p4 <- logit.prob(tau[4] - X %*% beta) - logit.prob(tau[3]
%*% beta))
(p5 <- logit.prob(tau[5] - X %*% beta) - logit.prob(tau[4]
%*% beta))
(p6 <- logit.prob(tau[6] - X %*% beta) - logit.prob(tau[5]
%*% beta))
(p7 <- 1.0 - logit.prob(tau[6] - X %*% beta))
- X
- X
- X
- X
- X
Education <- 1:7
plot(Education, p1, type="l", col=1, ylim=c(0,1))
lines(1:7, p2, col=2)
lines(1:7, p3, col=3)
lines(1:7, p4, col=4)
lines(1:7, p5, col=5)
lines(1:7, p6, col=6)
lines(1:7, p7, col=7)
legend(1, 1, legend=c("P(extremely liberal)", "P(liberal)",
"P(slightly liberal)", "P(moderate)", "P(slightly
conservative)",
"P(conservative)", "P(extremely conservative)"), col=1:7, lty=1)
Download