Thursday, October 7: Interactions and Ordered Categorical Variables

I. Interpreting Interaction Effects (also see Greene, pp. 123-124)
A. The Concept of an Interaction. Remember that interactions measure how
much changes in one variable affect the effect of another variable. Put another way, they
reveal whether or not the effect of a variable changes in different contexts. For instance,
increases in ethnolinguistic fractionalization could lead to significant declines in political
stability under authoritarian regimes, but lead to increases in the stability of democracies. If
this is true, there will be a strong interaction between type of political system and
ethnolinguistic fractionalization.
B. Measuring an Interaction Effect. To test the hypothesis that Variable x1
influences the effect of Variable x2 (and vice-versa), your model should include an
interaction term that is the product of these two variables. If the effect of one variable is
altered by the other, the coefficient on this interaction term (x1x2) will be significant. In
ordinary least squares regression, evaluating the effect of x1 (ELF fractionalization in this
example) at any particular level of x2 (type of political system) is straightforward: you add the
coefficient of x1 to the product of the coefficient on x1x2 and the particular value of x2 in
which you are interested.
For the model

y = β0 + β1x1 + β2x2 + β3x1x2 + e

we can get a predicted value by

yhat = β0 + β1x1 + β2x2 + β3x1x2

And we can obtain the first difference for x1, the change brought in yhat by a one-unit change
in x1 from x1low to x1high while all other factors are held constant, by:

Δy = β0 + β1x1high + β2x2 + β3x1highx2 - β0 - β1x1low - β2x2 - β3x1lowx2
Δy = β1x1high - β1x1low + β3x1highx2 - β3x1lowx2
Δy = β1(x1high - x1low) + β3x2(x1high - x1low)
Δy = β1 + β3x2    (since x1high - x1low = 1 for a one-unit change)
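The collapse to β1 + β3x2 is easy to verify numerically; the coefficients in this Python sketch are made up for illustration:

```python
# Made-up OLS coefficients for y = b0 + b1*x1 + b2*x2 + b3*x1*x2
b0, b1, b2, b3 = 1.0, 0.5, -0.3, 0.2

def yhat(x1, x2):
    """Linear predictor with the interaction term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

# First difference for a one-unit shift in x1, holding x2 at 4
x2 = 4.0
fd = yhat(3.0, x2) - yhat(2.0, x2)
print(fd)              # 1.3
print(b1 + b3 * x2)    # also 1.3: b0 and b2*x2 drop out
```

The starting value of x1 is irrelevant; only the size of the shift and the level of x2 matter, exactly as the last line of the derivation says.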
C. Interaction Effects in Maximum Likelihood Models. If only life were so
simple in the ML world. In OLS, this expression reduces so nicely because the predicted
value yhat for a given observation is just a linear function of the coefficients and explanatory
variables. The contribution of β2x2 to yhat is not going to vary across observations, so it can
drop out of the expression. But in ML models with more complex systematic components,
β2x2 and all of the other terms will be put through a transformation such as the logit
transformation. In a model with an interaction term, the effect of x2 will depend upon both
x1 and upon the levels of all the other variables in the equation (which determine where this
observation is on the logit curve when the shift in x1 begins).
Just as we did when we were finding simple first differences, we will have to pick
some value to set all other variables constant at. One pitfall to avoid is that when you tell
Clarify to setx mean, it will set the interaction term constant at the mean of the interaction
term (the product of x1 and x2). This is not very substantively meaningful, since the
interaction term isn’t a real variable. Instead of setting it constant at the mean of the
product of the two variables, we should set it constant at the product of the means of x1 and
x2. Then, to find the effect of a one-unit shift in x1 at some particular value of x2, we
simulate the first difference brought by both a one-unit change in x1 (analogous to β1) and
the change in the interaction term that results from a one-unit change in x1 (analogous to
β3x2).
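A minimal Python sketch of this logic (coefficients made up): the first difference is computed by moving x1 and the product x1·x2 together, and, unlike OLS, its size depends on where the observation sits on the logit curve.

```python
import math

# Made-up logit coefficients for b0 + b1*x1 + b2*x2 + b3*(x1*x2)
b0, b1, b2, b3 = -1.0, 0.8, 0.5, -0.4

def pr(x1, x2):
    """Pr(y = 1): the interaction term is rebuilt from x1 and x2,
    never held at some separate 'mean of the product'."""
    xb = b0 + b1 * x1 + b2 * x2 + b3 * (x1 * x2)
    return 1.0 / (1.0 + math.exp(-xb))

def first_diff(x1_low, x2):
    """One-unit shift in x1: x1 and x1*x2 change together."""
    return pr(x1_low + 1.0, x2) - pr(x1_low, x2)

# Same x2, different starting point on the curve, different first difference:
print(first_diff(x1_low=0.0, x2=0.0))
print(first_diff(x1_low=2.0, x2=0.0))
```

This is why the levels of all the other variables matter in ML models: they shift where on the curve the one-unit change in x1 begins.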
D. Syntax for Interactions in Clarify.
Suppose I want to predict whether Senate committees in the 50 states are
required to report legislation that they are assigned. I begin with a model
that attempts to explain the presence of a reporting rule (senrepor=1) using
the three components of legislative professionalism (the chamber's session
length, staffing levels, and salaries), along with a measure of the “moralism”
of the state's political culture. I run the following logit model, using
Clarify.
. estsimp logit senrepor totalday salary staffup moral

Iteration 0:   log likelihood = -26.345398
Iteration 1:   log likelihood = -20.849193
Iteration 2:   log likelihood = -20.011675
Iteration 3:   log likelihood = -19.917045
Iteration 4:   log likelihood = -19.914806
Iteration 5:   log likelihood = -19.914805

Logit estimates                                   Number of obs   =         50
                                                  LR chi2(4)      =      12.86
                                                  Prob > chi2     =     0.0120
Log likelihood = -19.914805                       Pseudo R2       =     0.2441

------------------------------------------------------------------------------
    senrepor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    totalday |  -.0141113   .0099893    -1.41   0.158    -.0336899    .0054674
      salary |   .0000921   .0000477     1.93   0.053    -1.33e-06    .0001856
     staffup |  -2.492502   1.392717    -1.79   0.074    -5.222177    .2371723
       moral |   2.617587   1.120754     2.34   0.020     .4209504    4.814224
       _cons |  -2.046383   .9856179    -2.08   0.038    -3.978159   -.1146075
------------------------------------------------------------------------------
Simulating main parameters.  Please wait....
% of simulations completed: 20% 40% 60% 80% 100%
Number of simulations : 1000
Names of new variables : b1 b2 b3 b4 b5
Now I can look at the effects increasing the salary level from Maine’s value
($12,900) to California’s ($99,250), holding other variables constant at their
means:
setx mean
simqi fd(pr) changex(salary 12900 99250)
First Difference: salary 12900 99250

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(senrepor = 0)   | -.842876     .2485586    -.9939116  -.0548535
      dPr(senrepor = 1)   |  .842876     .2485586     .0548535   .9939116
This tells me that moving from Maine's salary to California's, all other
factors being equal, increases the chances of requiring Senate committees to
report bills by 0.84, with a confidence interval of (0.05, 0.99).
Now suppose I want to test whether the effect of salary is different in
moralistic and nonmoralistic states, and estimate those two context-bound
effects. I create the interaction variable salary_moral (the product of the
two, using the Stata command gen salary_moral=salary*moral) and estimate
another logit model with this interaction term included. But first, I have to
get rid of the simulated parameters from the last model.
drop b*
estsimp logit senrepor totalday salary staffup moral salary_moral
Logit estimates                                   Number of obs   =         50
                                                  LR chi2(5)      =      14.40
                                                  Prob > chi2     =     0.0132
Log likelihood = -19.144086                       Pseudo R2       =     0.2733

------------------------------------------------------------------------------
    senrepor |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    totalday |  -.0093382   .0097495    -0.96   0.338    -.0284469    .0097705
      salary |   .0001326   .0000596     2.22   0.026     .0000157    .0002495
     staffup |  -2.422675   1.377301    -1.76   0.079    -5.122136    .2767861
       moral |   4.598178   2.281156     2.02   0.044     .1271951    9.069161
salary_moral |  -.0000673   .0000575    -1.17   0.242    -.0001801    .0000454
       _cons |  -4.184283   2.295119    -1.82   0.068    -8.682634    .3140674
------------------------------------------------------------------------------
Simulating main parameters.  Please wait....
% of simulations completed: 16% 33% 50% 66% 83% 100%
Number of simulations : 1000
Names of new variables : b1 b2 b3 b4 b5 b6
Although it is not statistically significant, this interaction term tells me
that salary has a smaller effect on committee reporting requirements in states
with moralistic political cultures. To get a sense of the scale of the “slope
shift,” I can look at first differences.
Maybe I wasn’t listening to my own lectures, and I set all of the variables at
their means. This sets the interaction term at the mean of the product of the
two variables:
setx mean
simqi fd(pr) changex(salary 12900 99250)
First Difference: salary 12900 99250

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(senrepor = 0)   | -.917242     .1888489    -.9980687  -.2803625
      dPr(senrepor = 1)   |  .917242     .1888489     .2803625   .9980687
But now I know that I need to hold the interaction term constant at the product
of their means.
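To see why this matters: whenever the interacted variables are correlated, the mean of the product and the product of the means diverge. A quick Python check with made-up data (not the state dataset):

```python
# Toy data standing in for salary (x1) and a correlated dummy (x2)
x1 = [float(v) for v in range(50)]
x2 = [1.0 if v >= 25 else 0.0 for v in x1]  # 1 exactly when x1 is large

mean = lambda v: sum(v) / len(v)
mean_of_product = mean([a * b for a, b in zip(x1, x2)])  # what `setx mean` uses
product_of_means = mean(x1) * mean(x2)                   # what we actually want

print(mean_of_product)   # 18.5
print(product_of_means)  # 12.25
```

Since the dummy picks out the high values of x1, the mean of the product overstates the interaction term's value at the "typical" case.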
. sum salary if moral!=.

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |        50    25376.64    19829.32        200      99250

. sum moral

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       moral |        50          .5    .5050763          0          1
setx mean
setx salary_moral 25376*0.5
This would be the syntax I needed in order to correctly estimate the effect of
some other variable, like staffing levels. But if I want to look at the
effects of one of the interacted terms, like salary, I would have to look at
the first difference of its change and of the change that its change brings to
the interaction term.
simqi fd(pr) changex(salary 12900 99250 salary_moral 12900*.5 99250*.5)
First Difference: salary 12900 99250 salary_moral 12900*.5 99250*.5

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(senrepor = 0)   | -.7968219    .2824022    -.9988686  -.0267144
      dPr(senrepor = 1)   |  .7968219    .2824022     .0267144   .9988686
This is the effect of salary when moralism is held constant at its mean of 0.5.
setx moral 0
. simqi fd(pr) changex(salary 12900 99250 salary_moral 0 0)
First Difference: salary 12900 99250 salary_moral 0 0

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(senrepor = 0)   | -.915604     .2083988    -.9998101  -.1976774
      dPr(senrepor = 1)   |  .915604     .2083988     .1976775   .9998101
This is the effect of salary in nonmoralistic states.
. setx moral 1
. simqi fd(pr) changex(salary 12900 99250 salary_moral 12900 99250)
First Difference: salary 12900 99250 salary_moral 12900 99250

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(senrepor = 0)   | -.6433052    .3541569    -.9467836   .3529614
      dPr(senrepor = 1)   |  .6433052    .3541569    -.3529614   .9467836
This is the effect of salary in moralistic states.
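At the point-estimate level (ignoring simulation uncertainty), the same logic can be sketched in Python using the coefficients from the interaction model above. The means of totalday and staffup are not reported here, so the values below are illustrative placeholders, and the printed numbers will not match Clarify's simulated means:

```python
import math

# Point estimates from the interaction logit above
b = {"_cons": -4.184283, "totalday": -0.0093382, "salary": 0.0001326,
     "staffup": -2.422675, "moral": 4.598178, "salary_moral": -0.0000673}

def pr(salary, moral, totalday, staffup):
    """Pr(senrepor = 1); the interaction is rebuilt from salary and moral."""
    xb = (b["_cons"] + b["totalday"] * totalday + b["salary"] * salary
          + b["staffup"] * staffup + b["moral"] * moral
          + b["salary_moral"] * salary * moral)
    return 1.0 / (1.0 + math.exp(-xb))

# Placeholder values for the uninteracted covariates (NOT the sample means)
td, st = 60.0, 0.5

fds = {}
for moral in (0.0, 1.0):
    fds[moral] = pr(99250, moral, td, st) - pr(12900, moral, td, st)
    print(f"moral={moral}: first difference for salary = {fds[moral]:.3f}")
```

Because the coefficient on salary_moral is negative, the salary first difference shrinks when moral = 1, which is exactly the "slope shift" the interaction term captures.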
II. Ordered Categorical Variables
A. When is an ordered probit model appropriate? When your measurement
strategy results in an ordinal dependent variable that takes on three or more discrete values.
This means that you can specify a clear ordering of the outcomes (“3” means more of the
variable than “2,” which means more than “1”) but you may not be ready to assume that the
distance between each of the categories is constant. Ordered probit is often used to analyze
things like Likert scales on polls or subjective scales assigned by field experts. Since we
don’t assume a constant difference between each category, we will allow for the possibility
that it takes a bigger change in an independent variable to get over the “threshold” into one
category than it takes to get into the next category. An ordered probit model estimates both
the effects of the independent variables (through the systematic component) and the
thresholds of the dependent variable (through the stochastic component) at the same time.
B. Some Notation. We will use the stylized normal distribution, the special case of
the Normal distribution where σ2=1. The probability density function of the distribution is
denoted by fstn(yi|μi), and Fstn(yi|μi) denotes the cumulative distribution function, which tells us
the probability of observing some value of y up to and including yi.
C. The Univariate PDF. The idea behind the ordered probit model is that even
though your measure is ordinal, the latent concept that you are trying to measure is
continuous. We assume that the random variable is distributed around μi according to the
stylized normal distribution, but that we only observe whether it got over one of the m
thresholds T1 … Tm. The latent continuous variable can be
denoted as yi*. So that every possible value of yi* fits into a category, we assume T1 = -∞
and Tm = ∞. Thus, yi* is observed to fall into category j iff:
Tj-1 < yi* < Tj
for j=2…m
Now our observed realization yji is a set of dichotomous variables that tell us whether or not
observation i fell into each particular category j.
yji = { 1 if Tj-1 < yi* < Tj
0 otherwise
For one particular category j and one observation i, we can derive a univariate pdf.
Pr(Yji =1) = Pr(Tj-1 < yi* < Tj)
= ∫Tj-1Tj fstn(yi*|μi)∂yi*
= FN(Tj|μi,1) - FN(Tj-1|μi,1)
= FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)
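This category probability is just a difference of two normal CDF values; a small Python sketch using the section's convention (T1 = -∞, Tm = ∞; thresholds and μ made up):

```python
from math import erf, sqrt

def F_stn(y, mu):
    """CDF of the stylized normal (variance 1) centered at mu."""
    return 0.5 * (1.0 + erf((y - mu) / sqrt(2.0)))

# Made-up thresholds following the convention above: T1 = -inf and
# Tm = +inf, so m = 4 thresholds give three observable categories.
T = [float("-inf"), -0.5, 0.8, float("inf")]
mu = 0.3  # a made-up linear predictor x_i * beta

# Pr(category j) = F_stn(Tj | mu) - F_stn(Tj-1 | mu), for j = 2..m
probs = [F_stn(T[j], mu) - F_stn(T[j - 1], mu) for j in range(1, len(T))]
print(probs)
print(sum(probs))  # the category probabilities sum to 1
```

Because the outer thresholds are infinite, every value of the latent yi* falls into exactly one category, so the probabilities sum to one.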
D. Building a Likelihood Function. All we’ve done so far is write out the
univariate pdf for one category. To get the full univariate pdf for all categories (while still
only looking at one case), we need to estimate the chances that our unobserved variable fell
into all of the categories. We do that by taking the product of the above expression across
all m categories. We raise the expression to the power yji so that it takes on a value of 1 for
all of the cases where yi* doesn’t fall into category j, but takes on that category-specific pdf
for the category that it does fall into. Then to turn this into a joint pdf across all
observations, we take the product of that product across all n observations. The likelihood
and log-likelihood functions become:
L(T, β | y) = Π(i=1 to n) Π(j=1 to m) [FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)]^yji

ln L(T, β | y) = Σ(i=1 to n) Σ(j=1 to m) yji ln[FN(Tj|xiβ,1) - FN(Tj-1|xiβ,1)]
E. Estimating and Interpreting Results.
estsimp oprobit delivery termlim termnext salary totalday staffper income pop
edcoll urban govrace senhear init, table
Ordered probit estimates                          Number of obs   =         50
                                                  LR chi2(12)     =      23.45
                                                  Prob > chi2     =     0.0241
Log likelihood = -40.969525                       Pseudo R2       =     0.2225

------------------------------------------------------------------------------
    delivery |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     termlim |  -3.634462   1.941232    -1.87   0.061    -7.439206    .1702822
    termnext |   2.139961   1.021766     2.09   0.036     .1373353    4.142586
      salary |  -6.49e-06   .0000182    -0.36   0.721    -.0000421    .0000291
    totalday |   .0116516   .0041179     2.83   0.005     .0035808    .0197225
    staffper |  -.0323126   .1144711    -0.28   0.778    -.2566718    .1920467
      income |   -.000027    .000101    -0.27   0.789    -.0002249    .0001709
         pop |  -3.57e-07   .0000764    -0.00   0.996    -.0001501    .0001494
      edcoll |    .011924   .0627344     0.19   0.849    -.1110333    .1348812
       urban |   .0172823   .0173222     1.00   0.318    -.0166687    .0512333
     govrace |  -.0892512   .3308387    -0.27   0.787     -.737683    .5591806
     senhear |     1.3727   .5114623     2.68   0.007     .3702525    2.375148
        init |  -.8800094   .5167325    -1.70   0.089    -1.892787    .1327677
-------------+----------------------------------------------------------------
       _cut1 |   1.936859   1.699538          (Ancillary parameters)
       _cut2 |   2.642842   1.711786
------------------------------------------------------------------------------

    delivery |  Probability             Observed
-------------+-------------------------------------
           0 |  Pr( xb+u<_cut1)           0.4200
           1 |  Pr(_cut1<xb+u<_cut2)      0.2000
           2 |  Pr(_cut2<xb+u)            0.3800
. setx mean
. simqi fd(pr) changex(termlim 0 1)
First Difference: termlim 0 1

     Quantity of Interest |     Mean     Std. Err.   [95% Conf. Interval]
--------------------------+-----------------------------------------------
      dPr(delivery = 0)   |  .6302705    .2017743     -.034686    .841227
      dPr(delivery = 1)   | -.2250201    .0921626    -.3875622  -.0081882
      dPr(delivery = 2)   | -.4052504    .1566423    -.6185045   .0443655