Chapter 8

Multicategory Logit Models

In the past, we have restricted the response or dependent variable in logit models to be
dichotomous. Now we will consider a response variable, Y, with J levels. The explanatory
or independent variables may be quantitative, qualitative, or both.

There are three ways in which logistic regression models for response variables with more
than two outcomes differ from logistic regression for dichotomous data.
1. How the logits are formed
When J = 2 there is only one logit we can form. When J > 2 there are J(J-1)/2 logits
we can form, but only J - 1 of them are non-redundant. There are different ways to
form the non-redundant logits, each of which corresponds to a different way of
"dichotomizing" the response variable. The way we choose to form the logits will depend
in part on whether Y is ordinal or nominal.
2. The sampling distribution
When Y is dichotomous, at each combination of the explanatory variables we assume the
data come from a binomial distribution. When J > 2, at each combination of the
explanatory variables we assume the data come from a multinomial distribution. The
binomial distribution is a special case of the multinomial distribution.
The multinomial distribution depends on n and the category probabilities {π₁, …, π_J}.
It gives the probability of each possible way of classifying the n observations into
the J categories of the response variable (the probability formula is given after this
list). For example, the possible ways to classify n = 2 observations into J = 3
categories are:
y1  y2  y3
2   0   0
0   2   0
0   0   2
1   1   0
0   1   1
1   0   1
3. Connections with other models, such as loglinear models
Some multicategory logit models are equivalent to Poisson regression or loglinear
models, while others are derived from latent variable models. For example, some are
very similar to IRT models in their parametric form; however, in these logit models we
assume the predictor variable is observed, while in IRT the predictor is unobserved
(latent).
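For reference, the probability that the multinomial distribution assigns to a particular
classification (y₁, …, y_J) of the n observations is

\[
P(Y_1 = y_1, \ldots, Y_J = y_J)
= \frac{n!}{y_1! \, y_2! \cdots y_J!} \; \pi_1^{y_1} \pi_2^{y_2} \cdots \pi_J^{y_J},
\qquad \sum_{j=1}^{J} y_j = n .
\]

For instance, with n = 2 and J = 3 the classification (1, 1, 0) has probability 2π₁π₂,
while (2, 0, 0) has probability π₁². With J = 2 this reduces to the binomial
distribution.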
Baseline Category Logit Model for Nominal Response Variables

This model is basically just an extension of the binary logistic regression model. It gives a
simultaneous representation of the odds of being in one category relative to being in another
category, for all pairs of categories.

With a set of J - 1 non-redundant odds we can figure out the odds for any pair of
categories.

Suppose we have data that identify respondents' political affiliation as democrat,
republican, or independent, and we want to know whether political affiliation can be
predicted by SES, which is a quantitative (i.e., continuous) variable.
For these data the response variable is party identification. We could fit a binary
logit model to each pair of party identifications:

\[
\log\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{republican}}}\right)
= \log\left(\frac{\pi_1(x)}{\pi_2(x)}\right) = \alpha_1 + \beta_1 x
\]
\[
\log\left(\frac{\pi_{\text{republican}}}{\pi_{\text{independent}}}\right)
= \log\left(\frac{\pi_2(x)}{\pi_3(x)}\right) = \alpha_2 + \beta_2 x
\]
\[
\log\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{independent}}}\right)
= \log\left(\frac{\pi_1(x)}{\pi_3(x)}\right) = \alpha_3 + \beta_3 x
\]
We can write one of the odds in terms of the other two:

\[
\frac{\pi_1(x)}{\pi_3(x)} = \frac{\pi_1(x)}{\pi_2(x)} \cdot \frac{\pi_2(x)}{\pi_3(x)}
\]

Therefore, we can find the model parameters of one logit from the other two:

\[
\log\left(\frac{\pi_1(x)}{\pi_2(x)}\right) + \log\left(\frac{\pi_2(x)}{\pi_3(x)}\right)
= \log\left(\frac{\pi_1(x)}{\pi_3(x)}\right)
\]
\[
(\alpha_1 + \beta_1 x) + (\alpha_2 + \beta_2 x) = \alpha_3 + \beta_3 x
\]

which means that in the population

\[
\alpha_1 + \alpha_2 = \alpha_3
\qquad\text{and}\qquad
\beta_1 + \beta_2 = \beta_3 .
\]
With sample data, the estimates from separate binary logit models are consistent
estimators of the model parameters, but fitting separate binary logit models will NOT
reproduce the equalities that hold in the population. In other words,

\[
\hat{\alpha}_1 + \hat{\alpha}_2 \neq \hat{\alpha}_3
\qquad\text{and}\qquad
\hat{\beta}_1 + \hat{\beta}_2 \neq \hat{\beta}_3 .
\]
We can solve this problem by simultaneously estimating the parameters of the model. This
will enforce the logical relationships among the parameters and will use the data more
efficiently, resulting in smaller standard errors.
With the baseline category logit model we choose one of the categories as the “baseline”.
This choice may be arbitrary or there may be a logical choice depending on the data.
For convenience, we’ll use the last level (i.e. the Jth level) of the response variable as the
baseline.
The baseline category logit model with one explanatory variable, x, is:

\[
\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = \alpha_j + \beta_j x_i
\qquad\text{for } j = 1, 2, \ldots, J-1 .
\]
For J = 2 this is just the regular binary logistic regression model. For J > 2, the
α_j and β_j can differ depending on which two categories are being compared. The odds
for any pair of categories of Y are a function of the parameters of the model.
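In SAS, one way to estimate all J - 1 logits simultaneously is proc logistic with the
glogit link. A minimal sketch, with hypothetical data set and variable names:

/* Baseline-category logit model; 'independent' is the baseline
   (reference) level of the response. Data set and variable names
   are hypothetical. */
proc logistic data=party_data;
  model party(ref='independent') = ses / link=glogit;
run;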
Using the previous data, with SES as the single explanatory variable, we have 3 - 1 = 2
non-redundant logits:

\[
\log\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{independent}}}\right)
= \log\left(\frac{\pi_1}{\pi_3}\right) = \alpha_1 + \beta_1 x
\]
\[
\log\left(\frac{\pi_{\text{republican}}}{\pi_{\text{independent}}}\right)
= \log\left(\frac{\pi_2}{\pi_3}\right) = \alpha_2 + \beta_2 x
\]
The logit for democrat versus republican is:

\[
\log\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{republican}}}\right)
= \log\left(\frac{\pi_1}{\pi_2}\right)
= \log\left(\frac{\pi_1/\pi_3}{\pi_2/\pi_3}\right)
= \log\left(\frac{\pi_1}{\pi_3}\right) - \log\left(\frac{\pi_2}{\pi_3}\right)
\]
\[
= (\alpha_1 + \beta_1 x) - (\alpha_2 + \beta_2 x)
= (\alpha_1 - \alpha_2) + (\beta_1 - \beta_2)\,x .
\]

A difference such as β₁ - β₂ is called a contrast.
CAUTION: You MUST be certain what the computer program you use to estimate the model is
doing. Some programs set α₁ = β₁ = 0 (first category as baseline), some set
α_J = β_J = 0 (last category as baseline), and others constrain the parameters to sum
to 0 across categories.
Fitting the model with SES predicting party affiliation, I obtained:

\[
\widehat{\log}\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{independent}}}\right)
= \widehat{\log}\left(\frac{\pi_1}{\pi_3}\right) = 0.1502 - 0.00013\,x
\]
\[
\widehat{\log}\left(\frac{\pi_{\text{republican}}}{\pi_{\text{independent}}}\right)
= \widehat{\log}\left(\frac{\pi_2}{\pi_3}\right) = -0.9987 + 0.0191\,x
\]
The estimated logit for democrat versus republican is:

\[
\widehat{\log}\left(\frac{\pi_{\text{democrat}}}{\pi_{\text{republican}}}\right)
= \widehat{\log}\left(\frac{\pi_1/\pi_3}{\pi_2/\pi_3}\right)
= \widehat{\log}\left(\frac{\pi_1}{\pi_3}\right)
- \widehat{\log}\left(\frac{\pi_2}{\pi_3}\right)
\]
\[
= (0.1502 - 0.00013\,x) - (-0.9987 + 0.0191\,x)
= 1.1489 - 0.01923\,x .
\]
We can interpret the parameters of the model in terms of odds ratios, given an increase
in SES. For a 10-point increase in the SES index we obtain the following odds ratios:

Democrat to Independent = exp(10(-0.00013)) = 0.999
Republican to Independent = exp(10(0.0191)) = 1.210
Democrat to Republican = exp(10(-0.01923)) = 0.825
Republican to Democrat = 1/0.825 = 1.212
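As a consistency check (writing β̂_DI for the fitted democrat-vs-independent slope, and
similarly for the other pairs, my shorthand), these odds ratios must multiply correctly,
because the democrat-independent logit is the sum of the democrat-republican and
republican-independent logits:

\[
\exp(10\,\hat{\beta}_{DI}) = \exp(10\,\hat{\beta}_{DR}) \times \exp(10\,\hat{\beta}_{RI}):
\qquad 0.999 \approx 0.825 \times 1.210 .
\]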
Just as in binary logistic regression, we can also interpret the parameters of the model in
terms of probabilities.
The probability of a response being in category j is

\[
\pi_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{k=1}^{J} \exp(\alpha_k + \beta_k x)} .
\]

Note that for the baseline category (independent in our case), α_J = β_J = 0; this is an
identification constraint. Furthermore, the denominator,
\(\sum_{k=1}^{J} \exp(\alpha_k + \beta_k x)\), ensures that the probabilities sum to 1.
Using our estimated parameters we obtain:

\[
\hat{\pi}_{\text{democrat}}
= \frac{\exp(0.1502 - 0.00013\,x)}
{1 + \exp(0.1502 - 0.00013\,x) + \exp(-0.9987 + 0.0191\,x)}
\]
\[
\hat{\pi}_{\text{republican}}
= \frac{\exp(-0.9987 + 0.0191\,x)}
{1 + \exp(0.1502 - 0.00013\,x) + \exp(-0.9987 + 0.0191\,x)}
\]
\[
\hat{\pi}_{\text{independent}}
= \frac{1}{1 + \exp(0.1502 - 0.00013\,x) + \exp(-0.9987 + 0.0191\,x)}
\]
We can use these functions to plot the probabilities versus SES.
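As a sketch, a short SAS data step can evaluate these fitted probabilities at a chosen
SES value (the coefficients are the estimates above; the data set name is hypothetical):

/* Evaluate the fitted baseline-category probabilities at SES = 50. */
data probs;
  x = 50;
  denom = 1 + exp(0.1502 - 0.00013*x) + exp(-0.9987 + 0.0191*x);
  p_democrat    = exp(0.1502 - 0.00013*x) / denom;
  p_republican  = exp(-0.9987 + 0.0191*x) / denom;
  p_independent = 1 / denom;
run;

proc print data=probs; run;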
[Figure: estimated probabilities of democrat, republican, and independent plotted
against SES (0 to 100).]
We can easily add more explanatory variables to our model, and these variables can be
either categorical or numeric. We identify numeric variables in proc catmod with the
"direct" statement. Furthermore, all of the model comparison methods that we have used
in the past will work with this model as well.

The baseline category logit model can be used when the categories of the response variable
are ordered, but it may not be the best model for ordinal responses.
Proportional Odds Model for Ordinal Response Variables

When the response variable is ordered we can use the ordering of the categories in
forming the logits. Using the ordering of the categories yields a more powerful
analysis than the baseline category logit model, as well as a simpler model with
simpler interpretations. We will consider only one of these models, the proportional
odds model.

For this model the effect of the explanatory variable(s) is the same regardless of how we
collapse Y into dichotomous categories. Therefore, a single parameter describes the effect of
x on Y, versus the J-1 parameters that are needed in the baseline model. However, the
intercepts can differ.

For this model we use cumulative probabilities, which are the probabilities that Y
falls in category j or below. In other words, P(Y ≤ j) = π₁ + π₂ + ⋯ + π_j, for
j = 1, 2, …, J.

Cumulative probabilities reflect the ordering of the categories and are used to form
cumulative logits.

A cumulative logit is of the form:

\[
\log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right)
= \log\left(\frac{P(Y \le j)}{P(Y > j)}\right)
= \log\left(\frac{\pi_1 + \pi_2 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_J}\right) .
\]
Models that use cumulative probabilities do not use the final cumulative probability,
P(Y ≤ J), since it must equal 1.

A model for the jth cumulative logit looks like an ordinary logit model for a
dichotomous response variable in which categories 1 to j combine to form a single
category. In other words, the response variable collapses into two categories: one for
categories 1 to j, and one for categories j + 1 to J.

The proportional odds model has the form:

\[
\log\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \alpha_j + \beta x .
\]
Cumulative probabilities are given by:

\[
P(Y \le j) = \frac{\exp(\alpha_j + \beta x)}{1 + \exp(\alpha_j + \beta x)} .
\]
We can compute the probability of being in category j by taking differences between
cumulative probabilities. In other words,
P(Y = j) = P(Y ≤ j) - P(Y ≤ j - 1) for j = 2, …, J
P(Y = 1) = P(Y ≤ 1)
Therefore, this model is sometimes referred to as a difference model.
To interpret this model in terms of odds ratios for a given level of Y, say Y = j,
compare two values of X:

\[
\frac{P(Y \le j \mid X = x_2) \,/\, P(Y > j \mid X = x_2)}
{P(Y \le j \mid X = x_1) \,/\, P(Y > j \mid X = x_1)}
= \frac{\exp(\alpha_j + \beta x_2)}{\exp(\alpha_j + \beta x_1)}
= \exp[\beta(x_2 - x_1)] .
\]

The log of the odds ratio is proportional to the difference between x₂ and x₁, and
since the constant of proportionality, β, is the same for every j, this model is called
the proportional odds model.
We can fit this model using either proc logistic or proc catmod. When the number of
response categories is greater than 2, proc logistic fits a proportional odds model
using maximum likelihood estimation. To use proc catmod you need to specify that the
response to be modeled is clogits, an abbreviation for cumulative logits. Proc catmod
uses weighted least squares estimation to fit the proportional odds model. For large
samples with categorical explanatory variables the two approaches give almost the same
results. In general, maximum likelihood estimation is preferred with quantitative
explanatory variables.
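For example, a minimal proc logistic call (hypothetical data set and variable names);
with an ordinal response having more than two levels, this fits the proportional odds
model by maximum likelihood:

/* Proportional odds (cumulative logit) model; bigband is an
   ordered response and age is quantitative. Names are hypothetical. */
proc logistic data=music;
  model bigband = age;
run;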
I fit a proportional odds model predicting how much one liked big band music (five
ordered categories) from age and obtained the following estimates:

Intercepts:
  like it very much   -3.2566
  like it             -1.2391
  mixed feelings      -0.1981
  dislike it           1.6670
Slope:
  age                  0.0361
Interpreting this in terms of odds ratios: comparing a 30-year-old with a 50-year-old,
the estimated cumulative odds ratio is exp(0.0361(30 - 50)) = exp(-0.722) = 0.486. For
every category j, the odds that a 30-year-old responds in category j or below are about
half the corresponding odds for a 50-year-old.
Interpreting this in terms of cumulative probabilities:

\[
\hat{P}(Y \le 1) = \frac{\exp(-3.2566 + 0.0361\,x)}{1 + \exp(-3.2566 + 0.0361\,x)}
\]
\[
\hat{P}(Y \le 2) = \frac{\exp(-1.2391 + 0.0361\,x)}{1 + \exp(-1.2391 + 0.0361\,x)}
\]

Etc.
Graphing this we get:

[Figure: estimated cumulative probabilities P(Y ≤ 1) through P(Y ≤ 4) plotted against
age (10 to 90).]
Calculating the category probabilities we get:

\[
\hat{P}(Y = 1) = \frac{\exp(-3.2566 + 0.0361\,x)}{1 + \exp(-3.2566 + 0.0361\,x)}
\]
\[
\hat{P}(Y = 2) = \hat{P}(Y \le 2) - \hat{P}(Y \le 1)
= \frac{\exp(-1.2391 + 0.0361\,x)}{1 + \exp(-1.2391 + 0.0361\,x)}
- \frac{\exp(-3.2566 + 0.0361\,x)}{1 + \exp(-3.2566 + 0.0361\,x)}
\]

Etc.
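A short SAS data step (hypothetical names) that carries out this differencing at a
chosen age; the intercepts and slope are the estimates reported above:

/* Category probabilities at age 30, obtained by differencing the
   fitted cumulative probabilities. */
data catprobs;
  x = 30;
  array a{4} _temporary_ (-3.2566 -1.2391 -0.1981 1.6670);
  array cum{4};
  do j = 1 to 4;
    cum{j} = exp(a{j} + 0.0361*x) / (1 + exp(a{j} + 0.0361*x));
  end;
  p1 = cum{1};            /* P(Y = 1) = P(Y <= 1)        */
  p2 = cum{2} - cum{1};   /* P(Y = 2)                    */
  p3 = cum{3} - cum{2};
  p4 = cum{4} - cum{3};
  p5 = 1 - cum{4};        /* last category: 1 - P(Y <= 4) */
run;

proc print data=catprobs; run;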
Graphing this we get:

[Figure: estimated category probabilities P(Y = 1) through P(Y = 5) plotted against age
(10 to 90).]