August 2005
Stata Application Tutorial 8: Competing Risks/Split Population
________________________________________________________________
Data Note: The code uses the data on state adoption of restrictive abortion legislation.
These data are available on the Event History website. The code is based on Stata version 8.
Preliminaries: With competing risks, the question arises of how to handle the fact that different kinds of events can occur. This is a common issue in many of the duration-type problems social scientists work with. For example, in what ways can a career end? Can regimes “fail” in different ways (overthrow, elections, resignation, etc.)? And so on.
I start with the restrictive abortion legislation adoption data. Suppose we just estimate a garden-variety single-state Cox model. We obtain:
. stcox south lgctsid nbrestr mooneyp ugov conright, nohr exactp
failure _d: event
analysis time _t: time
Iteration 0: log likelihood = -189.66918
Iteration 1: log likelihood = -173.75225
Iteration 2: log likelihood = -172.84282
Iteration 3: log likelihood = -172.83956
Iteration 4: log likelihood = -172.83956
Refining estimates:
Iteration 0: log likelihood = -172.83956
Cox regression -- exact partial likelihood
No. of subjects = 418 Number of obs = 418
No. of failures = 44
Time at risk = 3170
LR chi2(6) = 33.66
Log likelihood = -172.83956 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
south | .8244373 .4292111 1.92 0.055 -.016801 1.665676
lgctsid | -.1419261 .1166981 -1.22 0.224 -.3706502 .0867981
nbrestr | .235164 .2336448 1.01 0.314 -.2227715 .6930995
mooneyp | -.2547299 .0856969 -2.97 0.003 -.4226928 -.086767
ugov | -.0061848 .3388421 -0.02 0.985 -.6703032 .6579336
conright | -1.312286 .4413719 -2.97 0.003 -2.177359 -.4472127
------------------------------------------------------------------------------
Under this model, it is assumed that any event j is equivalent to any event k; that is, we are not distinguishing among possibly disparate events, or competing risks. (Note also that the code above will replicate results from Jones and Branton 2005.)
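As a quick check on how to read this output, the following sketch (meant to be run from a do-file) converts two of the reported coefficients into hazard ratios and reproduces the LR chi2 from the iteration log. It uses only numbers copied from the output above; treating the iteration-0 log likelihood as the null-model value is my reading of the log, not something the tutorial states.

```stata
* Numbers below are copied from the stcox output above.
* Hazard ratio for south: exp(coefficient), roughly 2.28
display exp(.8244373)
* Hazard ratio for conright, roughly 0.27
display exp(-1.312286)
* LR chi2 = 2 * (final log likelihood - iteration-0 log likelihood), about 33.66
display 2 * (-172.83956 - (-189.66918))
```

Because the model was run with nohr, the table reports coefficients rather than hazard ratios; exponentiating is the usual way to move between the two.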
Bradford S. Jones
ICPSR MLE-2 Event History Course
Stata Tutorial
There are a variety of ways to handle competing risks. The simplest way, via a Cox model, is to estimate a stratified model. Under this model, we make the (strong?) assumption that heterogeneity due to different event types is found in the baseline hazards; the covariate effects are the same over the competing risks. This is a strong assumption insofar as it disallows directly estimating different covariate effects for the J competing risks. Instead, heterogeneity is “swept” into the baseline hazards.
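In symbols, the stratified specification can be sketched as follows. The notation is my own rather than the tutorial's: j indexes the J event types, x is the covariate vector, and \beta is a single coefficient vector shared by all risks.

```latex
h_j(t \mid x) = h_{0j}(t)\,\exp(\beta' x), \qquad j = 1, \dots, J
```

Each event type gets its own unspecified baseline hazard h_{0j}(t), while \beta carries no j subscript, which is exactly the equal-effects restriction described above.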
Estimation is straightforward (assuming the data are constructed appropriately).
Under this model, it is assumed that an observation is at risk of experiencing any one of the J events. Consequently, the data set will consist of multiple records per observation, with each observation contributing a separate record of data for each of the competing risks.
The model makes an important assumption about the competing risks. At the entry time, an observation is assumed to be at risk of experiencing any one of the J events. After event j is experienced, it is assumed the observation is no longer at risk of experiencing that event. Hence, it is now at risk of experiencing one of the J - 1 remaining risks.
See below. Here the variable “stcode” is our identifier; the variable “type_r” labels the type of event (1, 2, 4, 5; the coding is arbitrary) the state is at risk of experiencing. In this context, each event type is a type of restrictive abortion legislation: informed consent, parental consent, limited funding, and spousal consent. The variable “rev_even” is the event indicator: 1 if policy j was adopted, 0 if not. Year is self-explanatory, and _d is redundant with rev_even.
Note that there are 4 lines of data per year until the first event occurs. This is because it is assumed the state is at risk of adopting any one of those four kinds of policies. Note how “type_r” is adjusted after an event is experienced. Look at state 1 in 1977. Here it adopted policy type “5” (spousal consent). After 1977, the events state 1 is at risk of experiencing are risks 1, 2, and 4. Risk 5 was observed, and so it is assumed risk 5 cannot be repeated. Is this feasible? This is a call you have to make.
+----------------------------------------+
| stcode type_r rev_even year _d |
|----------------------------------------|
1. | 1 1 0 1974 0 |
2. | 1 2 0 1974 0 |
3. | 1 4 0 1974 0 |
4. | 1 5 0 1974 0 |
5. | 1 1 0 1975 0 |
|----------------------------------------|
6. | 1 2 0 1975 0 |
7. | 1 4 0 1975 0 |
8. | 1 5 0 1975 0 |
9. | 1 1 0 1976 0 |
10. | 1 2 0 1976 0 |
|----------------------------------------|
11. | 1 4 0 1976 0 |
12. | 1 5 0 1976 0 |
13. | 1 1 0 1977 0 |
14. | 1 2 0 1977 0 |
15. | 1 4 0 1977 0 |
|----------------------------------------|
16. | 1 5 1 1977 1 |
17. | 1 1 0 1978 0 |
18. | 1 2 0 1978 0 |
19. | 1 4 0 1978 0 |
20. | 1 1 0 1979 0 |
|----------------------------------------|
21. | 1 2 0 1979 0 |
22. | 1 4 0 1979 0 |
23. | 1 1 0 1980 0 |
24. | 1 2 0 1980 0 |
25. | 1 4 0 1980 0 |
|----------------------------------------|
26. | 1 1 0 1981 0 |
27. | 1 2 0 1981 0 |
28. | 1 4 0 1981 0 |
29. | 1 1 0 1982 0 |
30. | 1 2 0 1982 0 |
|----------------------------------------|
31. | 1 4 0 1982 0 |
32. | 1 1 0 1983 0 |
33. | 1 2 0 1983 0 |
34. | 1 4 0 1983 0 |
35. | 1 1 0 1984 0 |
|----------------------------------------|
36. | 1 2 0 1984 0 |
37. | 1 4 0 1984 0 |
38. | 1 1 0 1985 0 |
39. | 1 2 0 1985 0 |
40. | 1 4 0 1985 0 |
|----------------------------------------|
41. | 1 1 0 1986 0 |
42. | 1 2 1 1986 1 |
43. | 1 4 0 1986 0 |
44. | 1 1 0 1987 0 |
45. | 1 4 0 1987 0 |
|----------------------------------------|
46. | 1 1 0 1988 0 |
47. | 1 4 0 1988 0 |
48. | 1 1 0 1989 0 |
49. | 1 4 0 1989 0 |
50. | 1 1 0 1990 0 |
|----------------------------------------|
51. | 1 4 0 1990 0 |
52. | 1 1 0 1991 0 |
53. | 1 4 0 1991 0 |
54. | 1 1 0 1992 0 |
55. | 1 4 0 1992 0 |
|----------------------------------------|
56. | 1 1 0 1993 0 |
57. | 1 4 1 1993 1 |
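A layout like the one listed above could be built along the following lines. This is a minimal sketch on made-up data for a single state over 1974-1979, not the tutorial's actual construction: the adoption years and the helper variable adopt_year are hypothetical, and only the names stcode, type_r, rev_even, and year come from the listing.

```stata
* Hypothetical toy data: one state, six years
clear
set obs 6
generate stcode = 1
generate year = 1973 + _n
* One record per state-year for each of the four risks
expand 4
bysort stcode year: generate type_r = real(word("1 2 4 5", _n))
* Made-up adoption years: type 5 in 1977, type 2 in 1979; types 1 and 4 never adopted
generate adopt_year = .
replace adopt_year = 1977 if type_r == 5
replace adopt_year = 1979 if type_r == 2
* Event indicator: 1 in the year the type_r policy is adopted
generate rev_even = (year == adopt_year)
* Once a policy type has been adopted, the state leaves that risk set
drop if year > adopt_year & !missing(adopt_year)
sort stcode year type_r
list stcode type_r rev_even year, sepby(year)
```

The drop step is what makes the rows for type 5 disappear after 1977 in this toy example, mirroring how type_r is adjusted in the listing.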
You can continue this exercise throughout state 1’s history if you want. Below, I estimate this model:
. stcox south lgctsid nbrestr mooneyp UGOV conright, nohr exactp strata(type_r)
failure _d: rev_even
analysis time _t: duration
Iteration 0: log likelihood = -478.38134
Iteration 1: log likelihood = -455.87676
Iteration 2: log likelihood = -455.23175
Iteration 3: log likelihood = -455.23085
Refining estimates:
Iteration 0: log likelihood = -455.23085
Stratified Cox regr. -- exact partial likelihood
No. of subjects = 2554 Number of obs = 2554
No. of failures = 98
Time at risk = 23593
LR chi2(6) = 46.30
Log likelihood = -455.23085 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
south | .6588073 .2610267 2.52 0.012 .1472043 1.17041
lgctsid | -.142223 .0758693 -1.87 0.061 -.2909241 .0064782
nbrestr | .1128489 .1619133 0.70 0.486 -.2044954 .4301933
mooneyp | -.1923744 .0587138 -3.28 0.001 -.3074513 -.0772974
UGOV | .1085451 .2174234 0.50 0.618 -.3175969 .534687
conright | -.9162597 .2803983 -3.27 0.001 -1.46583 -.3666891
------------------------------------------------------------------------------
Stratified by type_r
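For readers without the data file, the mechanics of a stratified Cox fit can be tried on simulated data. The sketch below is an illustration only: the covariate x, the seed, and every number in it are hypothetical, and it uses rnormal() and runiform(), which require a newer Stata than the version 8 this tutorial was written for.

```stata
* Simulated data only; x and all parameter values are hypothetical
clear
set seed 2005
set obs 400
* Four strata, labelled 1-4
generate type_r = 1 + mod(_n, 4)
generate x = rnormal()
* Exponential durations: baseline rate differs by stratum, true coefficient on x is 0.5
generate duration = -ln(runiform()) / (0.1 * type_r * exp(0.5 * x))
* Administrative censoring at t = 10
generate rev_even = duration < 10
replace duration = 10 if duration >= 10
stset duration, failure(rev_even)
* One coefficient on x, separate baseline hazard per stratum
stcox x, nohr strata(type_r)
```

Because the simulated times are continuous there are no ties, so the sketch leaves the tie-handling method at its default rather than using exactp as the tutorial does.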
The major difference between the stratified model and the single-event model is that we are now explicitly allowing for different kinds of events. The covariate estimates differ between the two, so the conclusions you draw depend on which conceptualization of events you believe is most valid. I summarize this below (table taken from Jones and Branton 2005).
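Table 2. Comparing Competing Risks and Single Event Cox Models of State Adoption of Restrictive Abortion Legislation, 1974-1993
------------------------------------------------------------------------
                        Stratified Competing
                        Risks Model              Single-Event Model
Variable                Estimate (s.e.)          Estimate (s.e.)
------------------------------------------------------------------------
South                     .66 (.26)                .82 (.43)
Ideology distance        -.14 (.07)               -.14 (.12)
Neighbor                  .11 (.16)                .24 (.23)
Pre-Roe                  -.19 (.06)               -.25 (.09)
Unified Government        .11 (.22)               -.01 (.34)
Constitutional Right     -.92 (.28)              -1.31 (.44)
------------------------------------------------------------------------
N                        2554                      418
Log-Likelihood           -455.23                  -172.84
------------------------------------------------------------------------
Note: Data are from Brace, Hall, and Langer 2001. Both models are semi-parametric Cox models.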
In the contrast between column 1 and column 2, I would choose column 1 on the grounds that it explicitly allows for competing risks.
The usual and mostly valid complaint against this model is that covariate effects are constrained to be equal across risks. Maybe this is a bad assumption.
Here is another way: partitioned likelihood.
Under this approach, we model the type-specific hazards using a Cox model. Practically speaking, this requires us to estimate 4 models, one for each risk. The basic idea is that, when modeling risk j, we treat events of the J - 1 remaining risks as if they were right-censored cases. This isolates the risk of interest. Mechanically (in Stata), we re-stset the data for each estimation.
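Written out, the type-specific approach can be sketched as follows. The notation is my own, not the tutorial's: \beta_j is the coefficient vector for risk j, and L_j is the Cox partial likelihood obtained when only type-j events count as failures.

```latex
h_j(t \mid x) = h_{0j}(t)\,\exp(\beta_j' x), \qquad
L(\beta_1, \dots, \beta_J) = \prod_{j=1}^{J} L_j(\beta_j)
```

Because the overall likelihood factors into J pieces, each \beta_j can be estimated from its own Cox model, which is why the procedure amounts to one estimation per risk.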
In Stata this would mean 4 stsets:
. stset duration, failure(type_r==1)
(estimate Cox model)
. stset duration, failure(type_r==2)
(estimate Cox model)
. stset duration, failure(type_r==3)
(estimate Cox model)
. stset duration, failure(type_r==4)
(estimate Cox model)
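The four stset and estimation steps can also be written as a loop. The sketch below runs on simulated data with one record per unit, so it shows the mechanics only: etype, x, and the stored-estimate names risk1 to risk4 are hypothetical, and the efron tie method is my choice for speed rather than the tutorial's exactp.

```stata
* Simulated data only; etype and x are hypothetical names
clear
set seed 2005
set obs 400
generate x = rnormal()
generate duration = ceil(20 * runiform())
* etype: 0 = no event observed (censored), 1-4 = type of event observed
generate etype = floor(5 * runiform())
foreach j of numlist 1/4 {
    * Only type-`j' events count as failures; all other types are treated as censored
    stset duration, failure(etype==`j')
    stcox x, nohr efron
    estimates store risk`j'
}
* Side-by-side coefficients and standard errors for the four risks
estimates table risk1 risk2 risk3 risk4, se
```

Storing each fit makes it easy to line the risk-specific coefficients up in one table, which is the kind of comparison the type-specific models are meant to support.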
Below, I summarize the results that would be obtained from this estimation.
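Table 3. Cox Type-Specific Competing Risks Models of State Adoption of Restrictive Abortion Legislation, (years)
------------------------------------------------------------------------------------------------
                       Informed          Parental          Limited           Spousal
                       Consent           Consent           Funding           Consent
Variable               Estimate (s.e.)   Estimate (s.e.)   Estimate (s.e.)   Estimate (s.e.)
------------------------------------------------------------------------------------------------
South                  .32 (.59)         .31 (.46)         .93 (.49)         .35 (.63)
Ideology Distance      .17 (.16)         .32 (.14)         .08 (.14)         .05 (.18)
Neighbor States        .05 (.36)         .05 (.28)         .01 (.31)         .38 (.36)
Pre-Roe                .26 (.14)         .14 (.10)         .12 (.11)         .29 (.16)
Unified Government     .34 (.47)         .24 (.36)         .21 (.44)         .34 (.53)
Constitutional Right   1.06 (.65)        .93 (.47)         .82 (.53)         .66 (.63)
------------------------------------------------------------------------------------------------
N                      386               386               386               386
Log-Likelihood         132.75            222.67            157.08            91.25
------------------------------------------------------------------------------------------------
Note: Data are from Brace, Hall, and Langer 1999.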
Differences? There are definite changes from the stratified model to this one.
Which one would I report? Most likely this one. It accounts for competing risks but has the flexibility of permitting risk-specific covariate effects.
A Very Quick Look at a Split-Population Model:
First, a split-population model using the cloglog link (estimated with spsurv; thanks to S. Jenkins 2001):
. spsurv dead mooneymean neighbor, id(stcode) seq(t)
Iteration 0: log likelihood = -129.08367 (not concave)
Iteration 1: log likelihood = -128.78891
Iteration 2: log likelihood = -127.66949
Iteration 3: log likelihood = -127.09029
Iteration 4: log likelihood = -127.0878
Iteration 5: log likelihood = -127.0878
Split population survival model Number of obs = 386
LR chi2(3) = 15.58
Log likelihood = -127.0878 Prob > chi2 = 0.0014
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hazard |
mooneymean | -.3223903 .0959937 -3.36 0.001 -.5105345 -.134246
neighbor | -.0677823 .4140629 -0.16 0.870 -.8793307 .7437661
_cons | -1.713602 .3519405 -4.87 0.000 -2.403393 -1.023812
-------------+----------------------------------------------------------------
cure_p |
_cons | -2.484965 .6332793 -3.92 0.000 -3.72617 -1.24376
------------------------------------------------------------------------------
c = Pr(never fail) = .07691892; Std.Err. = .04496435; z = 1.7106645
Likelihood ratio test of c=0: chibar2(01)= 5.45 Prob>=chibar2 = 0.010
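The reported c appears to be the inverse logit of the cure_p constant. That is an inference from the numbers in the output rather than something the tutorial states, and the following sketch (run from a do-file) checks it using only the constant copied from above.

```stata
* -2.484965 is the cure_p constant reported in the spsurv output above
* Inverse logit of the constant, roughly .0769
display invlogit(-2.484965)
* Same quantity written out by hand
display 1 / (1 + exp(2.484965))
```

On this reading, roughly 8 percent of the states are estimated never to experience the event.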
Second, a standard cloglog model.
. cloglog dead mooneymean neighbor
Iteration 0: log likelihood = -129.95727
Iteration 1: log likelihood = -129.81491
Iteration 2: log likelihood = -129.81464
Iteration 3: log likelihood = -129.81464
Complementary log-log regression Number of obs = 386
Zero outcomes = 343
Nonzero outcomes = 43
LR chi2(2) = 10.13
Log likelihood = -129.81464 Prob > chi2 = 0.0063
------------------------------------------------------------------------------
dead | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mooneymean | -.2126234 .0888717 -2.39 0.017 -.3868088 -.038438
neighbor | .2638002 .3695151 0.71 0.475 -.460436 .9880364
_cons | -2.247332 .3018676 -7.44 0.000 -2.838981 -1.655682
------------------------------------------------------------------------------
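The likelihood ratio test of c=0 reported with the split-population model can be reproduced from the two log likelihoods, since the standard cloglog model is the split-population model with c fixed at 0. The sketch below (run from a do-file) uses only numbers copied from the two sets of output above; the scalar name lr is my own.

```stata
* Log likelihoods copied from the spsurv and cloglog output above
scalar lr = 2 * (-127.0878 - (-129.81464))
* LR statistic, about 5.45
display scalar(lr)
* chibar2(01) is a 50:50 mixture of chi2(0) and chi2(1), so halve the chi2(1) tail area
display 0.5 * chi2tail(1, scalar(lr))
```

The halved tail probability comes out at about 0.010, in line with the Prob>=chibar2 value shown in the spsurv output.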
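I did this quickly! In practice you would want to account for duration dependence, f(t), here (for example, by including a function of time among the covariates).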