Event History as a Binary Time Series, Cross Section Analysis

I. Choosing this Format
II. “How To” Guide
III. An Extension of the TS, CS Format
I. Why Would You Choose this Format?
A. The Binary time series, cross section format. Here, each of your cases
represents one Representative during one Congressional session. Instead of recording the
length of a spell, your dependent variable is now a dichotomous measure registering a 1 if a
failure occurs (if a member retires) at the end of that session or a 0 if the spell continues.
Explanatory variables take on the value that you have measured for that member during that
session. It is fine if some of these variables are constant across time, like gender and party,
but in order to get increased causal leverage out of this sort of dataset, you will want to
gather some measures like district competitiveness or fundraising that shift over time. Your
total N should be 435 Congressional districts times 10 sessions since 1980 = 4350.
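One way to build a dataset like this in Stata is to start from spell-level records and expand them into Representative-session rows. The sketch below assumes hypothetical variables id, spell, duration (measured in sessions), and a 0/1 censor flag, so treat it as an illustration rather than a recipe:
expand duration                        /* one row per Representative-session */
bysort id spell: gen term = _n         /* which session of the spell this is */
gen retire = 0                         /* DV: 1 only in the session a spell ends */
bysort id spell: replace retire = 1 if _n == _N & censor == 0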
To capture the effects of time, you need to create j dichotomous variables Kt,
numbered K1 through Kj, where j is the largest number of sessions for which any member is
observed to serve. Each dummy variable Kt takes on a value of 1 only if a member is
serving in his or her t-th term during this observation. For instance, the observation for
Representative Connie Morella’s 3rd term would have K3=1 and every other Kt=0.
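If a term counter like the hypothetical term variable in the sketch above is already in the dataset, Stata’s tabulate command with the generate() option will create the whole set of dummies in one step:
tabulate term, generate(k)             /* creates k1, k2, ..., kj */
Because tabulate, generate() names the dummies in ascending order of term, the result matches the K1 through Kj indexing described above.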
 Strengths: Essentially, this method transforms a bounded, continuous dependent
variable (the length of a spell) into a dummy variable, making the estimation of multivariate
models relatively easy. You can conduct an event history analysis simply, using logit or
probit, and interpret the coefficients just as you would in those models. Much more
importantly, it allows you to analyze the effects of factors that change during the course of a
spell.
 Weaknesses: The tradeoff for simplicity is that this method does not make any of
the traditional approach’s powerful assumptions about when the risk of failure is high or
low in order to improve the fit of models. Another problem is that logit and probit estimates
can be biased when the events documented by your dependent variable are rare. In this
approach, long durations lead to rare events, so if your intervals are very small and failures
are rare, consult the rare events logit literature. Also note that if members of Congress enter
and reenter the dataset, you should worry about whether these are truly independent spells.
Finally, when using country dyads, you’ll want to pay attention to all of the
lack-of-independence issues that you would in any dyad analysis.
II. “How To” Guide.
A. Setting up the Dataset. To estimate the effects of a set of explanatory variables
on the duration of some event, use the logit (or probit) function of your statistics package.
Regress the dichotomous dependent variable – indicating whether or not a failure occurred
in a given observation – on the set of explanatory variables and on the set of Kt’s that index
the progress of time. But make sure that you drop one of the Kt’s to let it serve as the
default case, leaving you with j-1 Kt variables.
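As a minimal sketch of that command (the variable names are placeholders, not the leadership dataset used in the example below):
* retire = 1 in the session a spell ends, 0 otherwise; x1-x3 are covariates
* k1 is left out to serve as the baseline, so only k2 onward enter the model
logit retire x1 x2 x3 k2-k20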
Just as in other logit/probit equations, the coefficients indicate whether increases in
each variable increase or decrease the odds of a failure, holding all other factors constant.
Coefficients on the Kt’s show how much more or less risky each period “t” is, compared to
the default case you chose. Suppose you left out the K1 dummy as your anchor. If the
baseline hazard were relatively constant, none of the Kt’s would be significant. If the hazard
rate increased over time, the Kt coefficients would get progressively larger, and they would
get smaller if the hazard rate decreased over time.
B. An example using legislative leadership data. This is an interesting one
because, in the 9th and 11th sessions, the only leaders to make it that far all survived those
sessions, so Stata drops k9 and k11 from the model (see the notes in the output below).
logit change salary totalday staffper turn1n size income pop edhigh money k2 k3
k4 k5 k6 k7 k8 k9 k10 k11 k12
note: k9 != 0 predicts failure perfectly
k9 dropped and 4 obs not used
note: k11 != 0 predicts failure perfectly
k11 dropped and 1 obs not used
note: k12 dropped due to collinearity
Iteration 0:   log likelihood =  -694.3046
Iteration 4:   log likelihood = -647.53627

Logit estimates                                   Number of obs   =       1003
                                                  LR chi2(17)     =      93.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -647.53627                       Pseudo R2       =     0.0674

------------------------------------------------------------------------------
      change |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      salary |   5.41e-06   7.65e-06     0.71   0.479    -9.57e-06     .0000204
    totalday |  -.0018653   .0009013    -2.07   0.039    -.0036318    -.0000987
    staffper |   .0941146   .0442924     2.12   0.034      .007303     .1809261
      turn1n |    .028106    .006535     4.30   0.000     .0152975     .0409144
        size |  -.0019538   .0012793    -1.53   0.127    -.0044611     .0005536
      income |  -.0000477   .0000232    -2.06   0.040    -.0000931    -2.27e-06
         pop |   -.000034   .0000264    -1.29   0.198    -.0000856     .0000177
      edhigh |   .0547446   .0192492     2.84   0.004     .0170168     .0924723
       money |  -.0000198   9.36e-06    -2.11   0.034    -.0000381    -1.45e-06
          k2 |  -.3157197   .1680538    -1.88   0.060    -.6450991     .0136598
          k3 |  -.3637111   .2262405    -1.61   0.108    -.8071342     .0797121
          k4 |  -.2173993   .2771846    -0.78   0.433    -.7606712     .3258726
          k5 |  -.4474778   .3889799    -1.15   0.250    -1.209864     .3149088
          k6 |  -1.857217   .7766616    -2.39   0.017    -3.379446    -.3349883
          k7 |   .1041463   .5726524     0.18   0.856    -1.018232     1.226524
          k8 |  -.6246922   .8698798    -0.72   0.473    -2.329625     1.080241
         k10 |   1.310247   1.202222     1.09   0.276    -1.046064     3.666559
       _cons |  -3.715394   1.512452    -2.46   0.014    -6.679745    -.7510428
------------------------------------------------------------------------------
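To translate this output into quantities that are easier to discuss, two standard Stata post-estimation steps are to redisplay the model as odds ratios and to generate predicted failure probabilities (the new variable name below is just a placeholder):
logit, or                          /* redisplay the same model as odds ratios */
predict p_change if e(sample)      /* predicted failure probability per observation */
display exp(_b[k6])                /* odds ratio on the 6th-session dummy */
For instance, the k6 coefficient of -1.857 corresponds to an odds ratio of roughly exp(-1.857) = 0.16, so a leader’s sixth session looks far less risky than the omitted first session, holding the other variables constant.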
C. Censoring in the TS, CS Binary Format.
i. Right Censoring. This method deals with right censoring by dropping spells out of
the analysis at the point at which they are right censored (since they no longer contribute any
cases, or Representative-sessions in our example). This is analogous to the way that a
Kaplan-Meier survival plot drops spells that are no longer at risk out of the denominator.
ii. “Left censoring” is a more difficult problem. While right censoring happens when
we fail to observe the end of a spell, left censoring occurs when we fail to observe its start.
Under both, we cannot tell how long the spell lasts. The difference is that with right
censoring, at least we know exactly which periods of a spell we might be missing (the first
one after the censoring point, the second, and so on). Since left censoring leaves us with
more to guess about, it is best to avoid it altogether by sampling only the individuals who
begin their spells at a given time (like the Congressional Class of 1980).
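For left censoring, the sampling fix just described is a one-line restriction in Stata (the entry-cohort variable is hypothetical):
keep if entry_year == 1980             /* e.g., the Congressional Class of 1980 */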
III. An Extension of the TS, CS Format: Modeling Multiple Outcomes
A. What’s the Issue? The above method is fine if all we care about is whether a
Representative leaves Congress or not, or whether a leader loses power or not. However,
another interesting question is how a spell ended: did the Representative retire, lose, pass
away, or get sent to jail? These are called different types of “exits,” and different explanatory
factors may make one type of exit more likely and another less likely. If that is the case, you
don’t want to constrain a factor’s effect to be the same for every type of exit. You should
estimate a “competing risks” or “multiple exits” model. Traditional spell-based event history
techniques can incorporate multiple exits, and so can the time series, cross section approach.
B. A time series, cross section multiple exits model. Quite simply, you can
handle this just as you would any other unordered choice model. If it’s the characteristics of
the individual Representatives that you care about, you can use multinomial logit to predict
the probabilities of each reason for leaving Congress. If you think the attributes of the
possible exits change over time, you might want to use conditional logit. For either of these
approaches, you need to think about the IIA assumption (see a footnote in my leadership
paper for another way to think about IIA in this context). To look at legislative leadership, I
coded a new dependent variable, exit2, which took on a value of -1 when the leader left
due to a switch in party control, 0 if there was no change, 1 if the leader retired, and 2 if the
leader fell victim to a revolt within his own party.
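A minimal sketch of that recoding (the source variable exittype and its labels are hypothetical; only exit2’s values come from the description above):
gen exit2 = 0                                      /* default: no change */
replace exit2 = -1 if exittype == "party switch"   /* leader’s party lost control */
replace exit2 =  1 if exittype == "retired"
replace exit2 =  2 if exittype == "revolt"
Allowing Stata to use the most common outcome, 0, as the comparison case, here is my model: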
mlogit exit2 salary totalday staffper turn1n size income pop edhigh money k2 k3 k4 k5 k6 k7 k8 k10 k11 k12 k1
matsize too small
set matsize 500
mlogit exit2 salary totalday staffper turn1n size income pop edhigh money k2 k3 k4 k5 k6 k7 k8 k10 k11 k12 k1
note: k12 dropped due to collinearity
Iteration 0:    log likelihood = -1200.2411
Iteration 39:   log likelihood =  -1085.364

Multinomial logistic regression                   Number of obs   =       1005
                                                  LR chi2(57)     =     229.75
                                                  Prob > chi2     =     0.0000
Log likelihood = -1085.364                        Pseudo R2       =     0.0957

------------------------------------------------------------------------------
       exit2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
-1           |
      salary |   .0000285   .0000116     2.45   0.014     5.73e-06     .0000513
    totalday |  -.0004404   .0012757    -0.35   0.730    -.0029407     .0020599
    staffper |  -.0374925   .0762662    -0.49   0.623    -.1869715     .1119864
      turn1n |   .0519028   .0110317     4.70   0.000      .030281     .0735246
        size |  -.0012329    .002224    -0.55   0.579    -.0055919     .0031261
      income |  -.0000179   .0000374    -0.48   0.632    -.0000911     .0000553
         pop |   .0000444   .0000444     1.00   0.317    -.0000426     .0001313
      edhigh |   .1970114   .0367853     5.36   0.000     .1249134     .2691093
       money |  -8.79e-06   .0000157    -0.56   0.574    -.0000395     .0000219
          k1 |   17.65737   3.151187     5.60   0.000     11.48116     23.83359
          k2 |   16.94086   3.144106     5.39   0.000     10.77853      23.1032
          k3 |   16.22168   3.183817     5.10   0.000      9.98151     22.46184
          k4 |   16.74635    3.17803     5.27   0.000     10.51753     22.97518
          k5 |   15.96912   3.314889     4.82   0.000     9.472058     22.46619
          k6 |   16.24364   3.314851     4.90   0.000     9.746646     22.74062
          k7 |   17.17906    3.25037     5.29   0.000     10.80845     23.54967
          k8 |   17.30323   3.351653     5.16   0.000     10.73411     23.87235
         k10 |   18.75432   3.467517     5.41   0.000     11.95812     25.55053
         k11 |  -22.53855   1.81e+09    -0.00   1.000    -3.54e+09     3.54e+09
       _cons |  -36.83572          .        .       .            .            .
-------------+----------------------------------------------------------------
1            |
      salary |  -5.20e-06   .0000101    -0.51   0.608    -.0000251     .0000147
    totalday |  -.0019916   .0012193    -1.63   0.102    -.0043814     .0003982
    staffper |   .1320419   .0565285     2.34   0.019      .021248     .2428358
      turn1n |   .0366031   .0081496     4.49   0.000     .0206302     .0525759
        size |   -.000499   .0014912    -0.33   0.738    -.0034216     .0024236
      income |  -.0000616   .0000299    -2.06   0.039    -.0001202    -3.02e-06
         pop |  -.0000412   .0000339    -1.22   0.224    -.0001076     .0000252
      edhigh |   .0645511   .0241751     2.67   0.008     .0171687     .1119334
       money |  -.0000148   .0000121    -1.22   0.222    -.0000385     8.94e-06
          k1 |   18.97942   1.921069     9.88   0.000     15.21419     22.74464
          k2 |   19.04831   1.910441     9.97   0.000     15.30391      22.7927
          k3 |    19.1312   1.915219     9.99   0.000     15.37744     22.88496
          k4 |   19.35788   1.919486    10.08   0.000     15.59575        23.12
          k5 |   19.00244   1.952629     9.73   0.000     15.17535     22.82952
          k6 |  -21.66208   3.05e+08    -0.00   1.000    -5.97e+08     5.97e+08
          k7 |   19.62406   2.008169     9.77   0.000     15.68812        23.56
          k8 |  -20.98548   4.16e+08    -0.00   1.000    -8.15e+08     8.15e+08
         k10 |   20.88315   2.301314     9.07   0.000     16.37266     25.39364
         k11 |  -21.33359   1.29e+09    -0.00   1.000    -2.53e+09     2.53e+09
-------------+----------------------------------------------------------------
2            |
      salary |   1.33e-06   .0000129     0.10   0.918     -.000024     .0000267
    totalday |  -.0036549   .0016751    -2.18   0.029    -.0069381    -.0003717
------------------------------------------------------------------------------
(Outcome exit2==0 is the comparison group)
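One way to read this output is to redisplay the coefficients as relative-risk ratios or to compute each observation’s predicted probability of every exit type; both are standard post-estimation commands, and the new variable names below are placeholders:
mlogit, rrr                                   /* redisplay as relative-risk ratios */
predict p_switch p_none p_retire p_revolt     /* predicted probability of each outcome, in the order -1, 0, 1, 2 */
Note, too, that the enormous standard errors on dummies like k11 are a sign that no exits of that type occurred in those sessions, the multinomial analogue of the perfect-prediction notes in the simple logit above.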