Event History as a Binary Time Series, Cross Section Analysis

I. Choosing this Format
II. “How To” Guide
III. An Extension of the TS, CS Format

I. Why Would You Choose this Format?

A. The binary time series, cross section format.

Here, each of your cases represents one Representative during one Congressional session. Instead of recording the length of a spell, your dependent variable is now a dichotomous measure. It equals 1 if a failure occurs (the member retires) at the end of that session, and 0 if the spell continues. Explanatory variables take on the values you measured for that member during that session. It is fine if some of these variables are constant over time, like gender and party. To get more causal leverage out of this sort of dataset, however, you will want to gather some measures that shift over time, like district competitiveness or fundraising. With 435 Congressional districts and 10 sessions since 1980, your total N should be 435 × 10 = 4,350 Representative-sessions.

To capture the effects of time, you need to create j dichotomous variables Kt, numbered K1 through Kj. Here j is the largest number of sessions that any member is observed to serve. Each dummy variable Kt equals 1 only if the member is serving his or her t-th term during that observation. For instance, the observation for Representative Connie Morella’s 3rd term would have K3 = 1 and every other Kt = 0.

Strengths: Essentially, this method transforms a bounded, continuous dependent variable (the length of a spell) into a dummy variable, which makes multivariate models relatively easy to estimate. You can conduct an event history analysis with ordinary logit or probit and interpret the coefficients just as you would in any other logit or probit model. Much more importantly, it lets you analyze the effects of factors that change during the course of a spell (time-varying covariates).
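The reshaping described above can be sketched in Python; this is an illustration only, since the guide itself works in Stata. The minimal sketch assumes hypothetical spell records, one per member, each carrying a list of per-session covariates. It expands them into one row per member-session, with the 0/1 failure indicator and the K1 through Kj dummies. The names `Spell` and `expand_spells` and the covariate `margin` are made up for illustration and do not come from any package.

```python
from dataclasses import dataclass

# Hypothetical record layout and helper names; not from any package.


@dataclass
class Spell:
    """One member's spell in office."""
    member: str
    sessions: list  # one dict of covariates per session served, in order
    failed: bool    # True if the spell ended in a failure (e.g., retirement)


def expand_spells(spells, j=None):
    """Expand one-row-per-spell data into one row per member-session.

    j defaults to the longest spell observed, so K1..Kj covers every term.
    """
    if j is None:
        j = max(len(s.sessions) for s in spells)
    rows = []
    for s in spells:
        last = len(s.sessions)
        for t, covariates in enumerate(s.sessions, start=1):
            row = {"member": s.member, "term": t}
            # Failure is 1 only in the final session of a spell that ended;
            # a spell still running when observation stops contributes only 0s.
            row["failure"] = int(s.failed and t == last)
            # Time-varying covariates take the value measured in this session.
            row.update(covariates)
            # Exactly one Kt equals 1: the one marking the t-th term.
            for i in range(1, j + 1):
                row[f"K{i}"] = int(i == t)
            rows.append(row)
    return rows
```

Take one three-session spell that ended in retirement and one two-session spell still in progress. They expand into five rows, and only the final row of the first spell has failure = 1.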
Weaknesses: The price of this simplicity is that the method makes none of the powerful assumptions about when the risk of failure is high or low that the traditional approach uses to improve the fit of its models. A second problem is that logit and probit can perform poorly when the events recorded by your dependent variable are rare. In this approach, long durations translate into rare events (many 0s for every 1). So if your intervals are very short and failures are rare, consult the rare events logit literature. Also note that if members of Congress leave and later reenter the dataset, you may need to worry about whether their spells are truly independent. Finally, if your units are country dyads, pay attention to all of the non-independence issues you would face in any dyadic analysis.

II. “How To” Guide

A. Setting up the Dataset.

To estimate the effects of a set of explanatory variables on the duration of some event, use the logit (or probit) command of your statistics package. Regress the dichotomous dependent variable, which indicates whether or not a failure occurred in a given observation, on the explanatory variables and on the set of Kt’s that index the passage of time. Make sure you drop one of the Kt’s so that it serves as the default (reference) case, leaving you with j-1 Kt variables. Otherwise the Kt’s sum to 1 in every observation and are perfectly collinear with the constant.

Just as in other logit or probit equations, the coefficients indicate whether increases in each variable raise or lower the probability of a failure, holding all other factors constant. In logit, the coefficients are on the log-odds scale. The coefficients on the Kt’s show how much more or less risky each period t is compared to the default case you chose. Suppose you left out K1 as your anchor. If the baseline hazard were roughly constant, none of the Kt’s would be significant. If the hazard rose over time, the Kt coefficients would get progressively larger. If it fell, they would get progressively smaller.

B. An example using legislative leadership data.
This example is interesting because in the 9th and 11th sessions, the only leaders who made it that far survived that session. As a result, Stata drops k9 and k11 because they predict the outcome perfectly, along with the observations they cover. In these notes, “failure” means change = 0, that is, no leadership change.

logit change salary totalday staffper turn1n size income pop edhigh money k2 k3 k4 k5 k6 k7 k8 k9 k10 k11 k12

note: k9 != 0 predicts failure perfectly
      k9 dropped and 4 obs not used

note: k11 != 0 predicts failure perfectly
      k11 dropped and 1 obs not used

note: k12 dropped due to collinearity
Iteration 0:   log likelihood =  -694.3046
Iteration 4:   log likelihood = -647.53627

Logit estimates                                   Number of obs   =       1003
                                                  LR chi2(17)     =      93.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -647.53627                       Pseudo R2       =     0.0674

------------------------------------------------------------------------------
      change |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      salary |   5.41e-06   7.65e-06     0.71   0.479    -9.57e-06    .0000204
    totalday |  -.0018653   .0009013    -2.07   0.039    -.0036318   -.0000987
    staffper |   .0941146   .0442924     2.12   0.034      .007303    .1809261
      turn1n |    .028106    .006535     4.30   0.000     .0152975    .0409144
        size |  -.0019538   .0012793    -1.53   0.127    -.0044611    .0005536
      income |  -.0000477   .0000232    -2.06   0.040    -.0000931   -2.27e-06
         pop |   -.000034   .0000264    -1.29   0.198    -.0000856    .0000177
      edhigh |   .0547446   .0192492     2.84   0.004     .0170168    .0924723
       money |  -.0000198   9.36e-06    -2.11   0.034    -.0000381   -1.45e-06
          k2 |  -.3157197   .1680538    -1.88   0.060    -.6450991    .0136598
          k3 |  -.3637111   .2262405    -1.61   0.108    -.8071342    .0797121
          k4 |  -.2173993   .2771846    -0.78   0.433    -.7606712    .3258726
          k5 |  -.4474778   .3889799    -1.15   0.250    -1.209864    .3149088
          k6 |  -1.857217   .7766616    -2.39   0.017    -3.379446   -.3349883
          k7 |   .1041463   .5726524     0.18   0.856    -1.018232    1.226524
          k8 |  -.6246922   .8698798    -0.72   0.473    -2.329625    1.080241
         k10 |   1.310247   1.202222     1.09   0.276    -1.046064    3.666559
       _cons |  -3.715394   1.512452    -2.46   0.014    -6.679745   -.7510428
------------------------------------------------------------------------------

C. Censoring in the TS, CS Binary Format.

i. Right Censoring.
This method handles right censoring automatically. A right-censored spell simply stops contributing cases (Representative-sessions, in our example) after its last observed period, and none of its rows records a failure. This is analogous to the way a Kaplan-Meier survival estimate drops spells that are no longer at risk from the denominator.

ii. Left Censoring.

Left censoring is a more difficult problem. Right censoring happens when we fail to observe the end of a spell; left censoring occurs when we fail to observe its start. In both cases we cannot tell how long the spell lasts. The difference is that with right censoring, we at least know exactly which periods of the spell we are missing (the first one after the censoring point, the second, and so on). With left censoring, we do not know how many periods elapsed before we began observing, so we cannot even tell which Kt applies to an observation. Since left censoring leaves us with more to guess about, it is best to avoid it altogether by sampling only individuals who begin their spells at a given time (like the Congressional Class of 1980).

III. An Extension of the TS, CS Format: Modeling Multiple Outcomes

A. What’s the Issue?

The above method is fine if all we care about is whether a Representative leaves Congress or not, or whether a leader loses power or not. However, another interesting question is how a spell ended: did the Representative retire, lose, pass away, or get sent to jail? These are different types of “exits,” and a given explanatory factor may make one type of exit more likely and another less likely. If that is the case, you don’t want to constrain each variable’s effect to be the same for all types of exits. Instead, you should estimate a “competing risks” or “multiple exits” model. Traditional spell-based event history techniques can incorporate multiple exits, and so can the time series, cross section approach.

B. A time series, cross section multiple exits model.

Quite simply, you can handle this just as you would any other unordered choice model.
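In the person-session layout, the multiple-exits version simply replaces the 0/1 failure indicator with a categorical outcome. The outcome is 0 in every session the spell continues and a code for the type of exit in the session where it ends. A multinomial logit then gives each outcome m the probability exp(x'b_m) divided by the sum of exp(x'b_k) over all outcomes k, with the base outcome’s coefficients fixed at zero. The Python sketch below shows both steps. It assumes hypothetical helpers `exit_codes` and `mnl_probabilities` and arbitrary integer exit codes. It only computes probabilities from given linear predictors and does not estimate any coefficients.

```python
import math

# Hypothetical helper names; exit codes are arbitrary integers chosen by the analyst.


def exit_codes(n_sessions, exit_type):
    """Per-session outcome codes for one spell.

    0 marks a session in which the spell continues; the final session carries
    the exit type. exit_type=None marks a right-censored spell (no exit seen).
    """
    codes = [0] * n_sessions
    if exit_type is not None:
        codes[-1] = exit_type
    return codes


def mnl_probabilities(eta):
    """Multinomial logit probabilities for one observation.

    eta maps each non-base outcome m to its linear predictor x'b_m; the base
    outcome 0 is normalized to x'b_0 = 0, as when 0 is the comparison group.
    """
    scores = {0: 0.0, **eta}
    top = max(scores.values())  # subtract the max to avoid overflow in exp
    weights = {m: math.exp(v - top) for m, v in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

Because the ratio of any two outcome probabilities is exp(eta_a - eta_b), it does not depend on the other alternatives. That is the independence of irrelevant alternatives (IIA) property built into multinomial logit.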
If you care mainly about the characteristics of the individual Representatives, you can use multinomial logit to predict the probability of each reason for leaving Congress. If you think the attributes of the possible exits themselves change over time, you might want to use conditional logit instead. With either approach, you need to think about the independence of irrelevant alternatives (IIA) assumption (see a footnote in my leadership paper for another way to think about IIA in this context).

To look at legislative leadership, I coded a new dependent variable, exit2. It takes the value -1 when the leader left because party control switched, 0 if there was no change, 1 if the leader retired, and 2 if the leader fell victim to a revolt within his own party. Stata uses the most common outcome, 0, as the comparison case. My first attempt to run the model stopped with a matsize error, so I raised matsize and reran it.

Note that this command includes k1 and leaves out k9 instead, which makes the 9th session the reference period for the Kt’s. As the logit above showed, none of the leaders observed in their 9th session lost power, so the reference period contains no exits at all. That is the most likely reason the constant for outcome -1 is huge and negative (with no standard error) while most Kt coefficients are large and positive. It is also the likely reason terms such as k11 carry enormous standard errors. These time coefficients should therefore not be interpreted substantively.

mlogit exit2 salary totalday staffper turn1n size income pop edhigh money k2 k3 k4 k5 k6 k7 k8 k10 k11 k12 k1
matsize too small

set matsize 500
mlogit exit2 salary totalday staffper turn1n size income pop edhigh money k2 k3 k4 k5 k6 k7 k8 k10 k11 k12 k1

note: k12 dropped due to collinearity
Iteration 0:   log likelihood = -1200.2411
Iteration 39:  log likelihood =  -1085.364

Multinomial logistic regression                   Number of obs   =       1005
                                                  LR chi2(57)     =     229.75
                                                  Prob > chi2     =     0.0000
Log likelihood =  -1085.364                       Pseudo R2       =     0.0957

------------------------------------------------------------------------------
       exit2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
-1           |
      salary |   .0000285   .0000116     2.45   0.014     5.73e-06    .0000513
    totalday |  -.0004404   .0012757    -0.35   0.730    -.0029407    .0020599
    staffper |  -.0374925   .0762662    -0.49   0.623    -.1869715    .1119864
      turn1n |   .0519028   .0110317     4.70   0.000      .030281    .0735246
        size |  -.0012329    .002224    -0.55   0.579    -.0055919    .0031261
      income |  -.0000179   .0000374    -0.48   0.632    -.0000911    .0000553
         pop |   .0000444   .0000444     1.00   0.317    -.0000426    .0001313
      edhigh |   .1970114   .0367853     5.36   0.000     .1249134    .2691093
       money |  -8.79e-06   .0000157    -0.56   0.574    -.0000395    .0000219
          k1 |   17.65737   3.151187     5.60   0.000     11.48116    23.83359
          k2 |   16.94086   3.144106     5.39   0.000     10.77853     23.1032
          k3 |   16.22168   3.183817     5.10   0.000      9.98151    22.46184
          k4 |   16.74635    3.17803     5.27   0.000     10.51753    22.97518
          k5 |   15.96912   3.314889     4.82   0.000     9.472058    22.46619
          k6 |   16.24364   3.314851     4.90   0.000     9.746646    22.74062
          k7 |   17.17906    3.25037     5.29   0.000     10.80845    23.54967
          k8 |   17.30323   3.351653     5.16   0.000     10.73411    23.87235
         k10 |   18.75432   3.467517     5.41   0.000     11.95812    25.55053
         k11 |  -22.53855   1.81e+09    -0.00   1.000    -3.54e+09    3.54e+09
       _cons |  -36.83572          .        .       .            .           .
-------------+----------------------------------------------------------------
1            |
      salary |  -5.20e-06   .0000101    -0.51   0.608    -.0000251    .0000147
    totalday |  -.0019916   .0012193    -1.63   0.102    -.0043814    .0003982
    staffper |   .1320419   .0565285     2.34   0.019      .021248    .2428358
      turn1n |   .0366031   .0081496     4.49   0.000     .0206302    .0525759
        size |   -.000499   .0014912    -0.33   0.738    -.0034216    .0024236
      income |  -.0000616   .0000299    -2.06   0.039    -.0001202   -3.02e-06
         pop |  -.0000412   .0000339    -1.22   0.224    -.0001076    .0000252
      edhigh |   .0645511   .0241751     2.67   0.008     .0171687    .1119334
       money |  -.0000148   .0000121    -1.22   0.222    -.0000385    8.94e-06
          k1 |   18.97942   1.921069     9.88   0.000     15.21419    22.74464
          k2 |   19.04831   1.910441     9.97   0.000     15.30391     22.7927
          k3 |    19.1312   1.915219     9.99   0.000     15.37744    22.88496
          k4 |   19.35788   1.919486    10.08   0.000     15.59575       23.12
          k5 |   19.00244   1.952629     9.73   0.000     15.17535    22.82952
          k6 |  -21.66208   3.05e+08    -0.00   1.000    -5.97e+08    5.97e+08
          k7 |   19.62406   2.008169     9.77   0.000     15.68812       23.56
          k8 |  -20.98548   4.16e+08    -0.00   1.000    -8.15e+08    8.15e+08
         k10 |   20.88315   2.301314     9.07   0.000     16.37266    25.39364
         k11 |  -21.33359   1.29e+09    -0.00   1.000    -2.53e+09    2.53e+09
-------------+----------------------------------------------------------------
2            |
      salary |   1.33e-06   .0000129     0.10   0.918     -.000024    .0000267
    totalday |  -.0036549   .0016751    -2.18   0.029    -.0069381   -.0003717

(Outcome exit2==0 is the comparison group)
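Logit and multinomial logit coefficients are easier to read once exponentiated. In the binary logit of leadership change, exp(b) is the factor by which a one-unit increase multiplies the odds of a change; for a Kt, it is the odds in period t relative to the omitted K1. In the multinomial logit, exp(b) multiplies the odds of that exit relative to the comparison outcome 0 (the relative-risk ratio). The Python sketch below uses a hypothetical helper, `exp_summary`. It recomputes z and the 95% interval from a coefficient and standard error as printed in the tables above, assuming the usual normal critical value of 1.959964. Applied to k6 in the leadership logit (-1.857217, SE .7766616), it gives an odds ratio of about 0.16, with an interval of roughly 0.034 to 0.72. In other words, leaders in their 6th session faced markedly lower odds of a leadership change than those in their 1st, all else equal.

```python
import math

# Hypothetical helper; the inputs are coefficients and standard errors copied
# from the Stata tables above.
Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def exp_summary(coef, se, z_crit=Z_95):
    """Odds ratio (logit) or relative-risk ratio (mlogit) with a Wald interval.

    Returns the z statistic, exp(coef), and exp of the interval endpoints,
    mirroring the z and [95% Conf. Interval] columns of the Stata output.
    """
    z = coef / se
    lo, hi = coef - z_crit * se, coef + z_crit * se
    return {"z": z, "ratio": math.exp(coef), "ci": (math.exp(lo), math.exp(hi))}


# k6 from the binary logit of leadership change (relative to the omitted k1)
k6 = exp_summary(-1.857217, 0.7766616)
# edhigh for exit2 == -1 (party-control switch) in the multinomial logit
edhigh_switch = exp_summary(0.1970114, 0.0367853)
```

Rows with enormous standard errors, such as k11 in the multinomial model, most likely reflect outcomes that never occur in that period. Exponentiating them produces numbers with no substantive meaning.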