Tutorial 8: Competing Risks


August 2005

Stata Application Tutorial 8: Competing Risks/Split Population

________________________________________________________________

Data Note: The code uses the data on state adoption of restrictive abortion legislation.

These data are available on the Event History website. Code is based on Stata version 8.

Preliminaries: With competing risks, the question arises as to how to handle the fact that different kinds of events can occur. This is a common issue in many of the duration-type problems social scientists work with. For example, in what ways can a career end? Can regimes “fail” in different ways (overthrown, elections, resignation, etc.)? And so on.

I start with the restrictive abortion legislation adoption data. Suppose we just estimate a garden-variety single-state Cox model. We obtain:

. stcox south lgctsid nbrestr mooneyp ugov conright, nohr exactp

         failure _d:  event
   analysis time _t:  time

Iteration 0:   log likelihood = -189.66918
Iteration 1:   log likelihood = -173.75225
Iteration 2:   log likelihood = -172.84282
Iteration 3:   log likelihood = -172.83956
Iteration 4:   log likelihood = -172.83956
Refining estimates:
Iteration 0:   log likelihood = -172.83956

Cox regression -- exact partial likelihood

No. of subjects =          418                     Number of obs   =       418
No. of failures =           44
Time at risk    =         3170
                                                   LR chi2(6)      =     33.66
Log likelihood  =   -172.83956                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       south |   .8244373   .4292111     1.92   0.055     -.016801    1.665676
     lgctsid |  -.1419261   .1166981    -1.22   0.224    -.3706502    .0867981
     nbrestr |    .235164   .2336448     1.01   0.314    -.2227715    .6930995
     mooneyp |  -.2547299   .0856969    -2.97   0.003    -.4226928    -.086767
        ugov |  -.0061848   .3388421    -0.02   0.985    -.6703032    .6579336
    conright |  -1.312286   .4413719    -2.97   0.003    -2.177359   -.4472127
------------------------------------------------------------------------------
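The output header (failure _d: event; analysis time _t: time) suggests the data were declared roughly as follows before the stcox call. This is a sketch inferred from that header, not the author's actual do-file; the original stset may have carried options the output does not show.

```stata
* Declare the survival-time data: "time" is the duration variable and
* "event" the failure indicator (both names taken from the output header).
stset time, failure(event)

* Single-event Cox model: report coefficients rather than hazard ratios
* (nohr) and use the exact partial likelihood for tied failures (exactp).
stcox south lgctsid nbrestr mooneyp ugov conright, nohr exactp
```

Because no id() is given, each of the 418 records is treated as its own subject, which matches the "No. of subjects" line in the output.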

Under this model, it is assumed that any event j is equivalent to any event k; that is, we are not distinguishing between or among possibly disparate events, or competing risks. Also note that the code above will replicate results from Jones and Branton 2005.

Bradford S. Jones

ICPSR MLE-2 Event History Course

Stata Tutorial

There are a variety of ways to handle competing risks. The simplest way, via a Cox model, is to estimate a stratified model. Under this model, we make the (strong?) assumption that heterogeneity due to different event types is found in the baseline hazards; the covariate effects are the same over the competing risks. This is a strong assumption insofar as it disallows directly estimating different covariate effects for the J competing risks. Instead, heterogeneity is “swept” into the baseline hazards.
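In symbols, the stratified model just described can be sketched as below. The notation ($h_{0j}$ for the baseline hazard of risk $j$, $\beta$ for the coefficient vector, $x_i$ for the covariates of observation $i$) is my own shorthand, not the tutorial's.

```latex
% Stratified Cox model: risk-specific baseline hazards, common covariate effects
h_j(t \mid x_i) = h_{0j}(t)\,\exp(\beta' x_i), \qquad j = 1, \dots, J
```

Because $\beta$ carries no $j$ subscript, every risk shares the same covariate effects; only the baseline $h_{0j}(t)$ is free to differ across risks.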

Estimation is straightforward (assuming the data are constructed appropriately).

Under this model, it is assumed that an observation is at risk of experiencing any one of the J events. Consequently, the data set consists of multiple records per observation, with each observation contributing a separate record for each of the competing risks.

The model makes an important assumption about the competing risks. At the entry time, an observation is assumed to be at risk of experiencing any one of the J events. After event j is experienced, the observation is assumed to be no longer at risk of experiencing that event; it remains at risk of experiencing the J - 1 remaining events.

See below. Here the variable “stcode” is our identifier; the variable “type_r” labels the type of event (1, 2, 4, 5; the coding is arbitrary) the state is at risk of experiencing. In this context, each event type is a type of restrictive abortion legislation: informed consent, parental consent, limited funding, and spousal consent. The variable “rev_even” is the event indicator: 1 if policy j was adopted, 0 if not. The variable “year” is self-explanatory, and _d is redundant with rev_even.

Note that there are initially 4 lines of data per year. This is because it is assumed the state is at risk of adopting any one of those four kinds of policies. Note how “type_r” is adjusted after an event is experienced. Look at state 1 in 1977. Here it adopted policy type “5” (spousal consent). After 1977, the events state 1 is at risk of experiencing are risks 1, 2, and 4. Risk 5 was observed, and so it is assumed risk 5 cannot be repeated. Is this feasible? This is a call you have to make.

     +----------------------------------------+
     | stcode   type_r   rev_even   year   _d |
     |----------------------------------------|
  1. |      1        1          0   1974    0 |
  2. |      1        2          0   1974    0 |
  3. |      1        4          0   1974    0 |
  4. |      1        5          0   1974    0 |
  5. |      1        1          0   1975    0 |
     |----------------------------------------|
  6. |      1        2          0   1975    0 |
  7. |      1        4          0   1975    0 |
  8. |      1        5          0   1975    0 |
  9. |      1        1          0   1976    0 |
 10. |      1        2          0   1976    0 |
     |----------------------------------------|
 11. |      1        4          0   1976    0 |
 12. |      1        5          0   1976    0 |
 13. |      1        1          0   1977    0 |
 14. |      1        2          0   1977    0 |
 15. |      1        4          0   1977    0 |
     |----------------------------------------|
 16. |      1        5          1   1977    1 |
 17. |      1        1          0   1978    0 |
 18. |      1        2          0   1978    0 |
 19. |      1        4          0   1978    0 |
 20. |      1        1          0   1979    0 |
     |----------------------------------------|
 21. |      1        2          0   1979    0 |
 22. |      1        4          0   1979    0 |
 23. |      1        1          0   1980    0 |
 24. |      1        2          0   1980    0 |
 25. |      1        4          0   1980    0 |
     |----------------------------------------|
 26. |      1        1          0   1981    0 |
 27. |      1        2          0   1981    0 |
 28. |      1        4          0   1981    0 |
 29. |      1        1          0   1982    0 |
 30. |      1        2          0   1982    0 |
     |----------------------------------------|
 31. |      1        4          0   1982    0 |
 32. |      1        1          0   1983    0 |
 33. |      1        2          0   1983    0 |
 34. |      1        4          0   1983    0 |
 35. |      1        1          0   1984    0 |
     |----------------------------------------|
 36. |      1        2          0   1984    0 |
 37. |      1        4          0   1984    0 |
 38. |      1        1          0   1985    0 |
 39. |      1        2          0   1985    0 |
 40. |      1        4          0   1985    0 |
     |----------------------------------------|
 41. |      1        1          0   1986    0 |
 42. |      1        2          1   1986    1 |
 43. |      1        4          0   1986    0 |
 44. |      1        1          0   1987    0 |
 45. |      1        4          0   1987    0 |
     |----------------------------------------|
 46. |      1        1          0   1988    0 |
 47. |      1        4          0   1988    0 |
 48. |      1        1          0   1989    0 |
 49. |      1        4          0   1989    0 |
 50. |      1        1          0   1990    0 |
     |----------------------------------------|
 51. |      1        4          0   1990    0 |
 52. |      1        1          0   1991    0 |
 53. |      1        4          0   1991    0 |
 54. |      1        1          0   1992    0 |
 55. |      1        4          0   1992    0 |
     |----------------------------------------|
 56. |      1        1          0   1993    0 |
 57. |      1        4          1   1993    1 |
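A layout like the listing above could be built along the following lines. This is only a sketch under assumptions of my own: it starts from one record per state-year and supposes four hypothetical variables, adopt1, adopt2, adopt4, and adopt5, each holding the year the corresponding policy type was adopted (missing if never adopted). Those names and the helper variables k and adoptyr are illustrative and do not come from the tutorial's data.

```stata
* Four copies of every state-year record, one per policy type at risk
expand 4
bysort stcode year: generate byte k = _n

* Map the copy number 1,2,3,4 onto the tutorial's type codes 1,2,4,5
generate byte type_r = cond(k <= 2, k, k + 1)

* Adoption year of the policy type that this record refers to
generate int adoptyr = .
foreach j of numlist 1 2 4 5 {
    replace adoptyr = adopt`j' if type_r == `j'
}

* Event indicator: 1 in the year that this record's policy type was adopted
generate byte rev_even = (year == adoptyr)

* Once a type has been adopted, the state leaves the risk set for that type
drop if adoptyr < . & year > adoptyr

sort stcode year type_r
list stcode type_r rev_even year in 1/20
```

The drop step is what encodes the assumption discussed above, namely that a policy type cannot be adopted twice; omitting it would keep the state at risk of repeating the event.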

You can continue this exercise throughout state 1’s history if you want. Below, I estimate this model:

. stcox south lgctsid nbrestr mooneyp UGOV conright, nohr exactp strata(type_r)

         failure _d:  rev_even
   analysis time _t:  duration

Iteration 0:   log likelihood = -478.38134
Iteration 1:   log likelihood = -455.87676
Iteration 2:   log likelihood = -455.23175
Iteration 3:   log likelihood = -455.23085
Refining estimates:
Iteration 0:   log likelihood = -455.23085

Stratified Cox regr. -- exact partial likelihood

No. of subjects =         2554                     Number of obs   =      2554
No. of failures =           98
Time at risk    =        23593
                                                   LR chi2(6)      =     46.30
Log likelihood  =   -455.23085                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       south |   .6588073   .2610267     2.52   0.012     .1472043     1.17041
     lgctsid |   -.142223   .0758693    -1.87   0.061    -.2909241    .0064782
     nbrestr |   .1128489   .1619133     0.70   0.486    -.2044954    .4301933
     mooneyp |  -.1923744   .0587138    -3.28   0.001    -.3074513   -.0772974
        UGOV |   .1085451   .2174234     0.50   0.618    -.3175969     .534687
    conright |  -.9162597   .2803983    -3.27   0.001     -1.46583   -.3666891
------------------------------------------------------------------------------
                                                         Stratified by type_r

The major difference between this model and the previous one is that we are explicitly allowing for different kinds of events. The covariate estimates differ depending on which conceptualization of events you believe is most valid. I summarize this below (table taken from Jones and Branton 2005).

Table 2. Comparing Competing Risks and Single Event Cox Models of State Adoption of Restrictive Abortion Legislation, 1974-1993

                          Stratified Competing     Single-Event
                          Risks Model              Model
Variable                  Estimate (s.e.)          Estimate (s.e.)

South                       .66 (.26)                .82 (.43)
Ideology distance          -.14 (.07)               -.14 (.12)
Neighbor                    .11 (.16)                .24 (.23)
Pre-Roe                    -.19 (.06)               -.25 (.09)
Unified Government          .11 (.22)               -.01 (.34)
Constitutional Right       -.92 (.28)              -1.31 (.44)

N                          2554                      418
Log-Likelihood             -455.23                  -172.84

Note: Data are from Brace, Hall, and Langer 2001. Both models are semi-parametric Cox models.

In the contrast between column 1 and column 2, I would choose column 1 on the grounds that it explicitly allows for competing risks.

The usual and mostly valid complaint against this model is that covariate effects are constrained to be equal across risks. Maybe this is a bad assumption.

Here is another way: partitioned likelihood.

Under this approach, we are going to model the type-specific hazards using a Cox model. Practically speaking, this will require us to estimate 4 models, one for each risk. The basic idea here is that we treat the J - 1 remaining risks as if they are right-censored cases. This isolates the risk of interest. Mechanically (in Stata), we will re-stset the data after each estimation.
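Written out, the idea is that the overall likelihood factors into one piece per risk. The block below is a sketch in my own notation, assuming no tied failure times for simplicity: $\delta_{ij} = 1$ if observation $i$ fails from risk $j$ at time $t_i$ and $0$ otherwise (censored, or failed from a different risk), and $R(t_i)$ is the set of observations still at risk at $t_i$.

```latex
% Type-specific Cox partial likelihood for risk j, and the product over risks
L_j(\beta_j) = \prod_{i=1}^{n}
  \left[ \frac{\exp(\beta_j' x_i)}{\sum_{l \in R(t_i)} \exp(\beta_j' x_l)} \right]^{\delta_{ij}},
\qquad
L = \prod_{j=1}^{J} L_j(\beta_j)
```

Since each factor $L_j$ involves only its own $\beta_j$, the four pieces can be maximized separately, which is why four separate Cox fits suffice.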

In Stata this would mean 4 stsets:

stset duration, failure(type_r==1)
(estimate Cox model)

stset duration, failure(type_r==2)
(estimate Cox model)

stset duration, failure(type_r==3)
(estimate Cox model)

stset duration, failure(type_r==4)
(estimate Cox model)
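The four stset and estimation steps above can also be written as a loop. The sketch below makes several assumptions: the data hold one record per state-period, type_r records the observed event type under the 1 to 4 coding used in the stset lines above (the earlier listing codes the types 1, 2, 4, 5, so the number list should be adjusted to match the data actually in use), and the covariates carry the same names as in the earlier models. The stored-result names risk1 to risk4 are my own.

```stata
* Covariate list reused from the models above (assumed to exist under
* these names in the one-record-per-period data).
local xvars south lgctsid nbrestr mooneyp ugov conright

* One pass per event type: re-stset so that only type j counts as a
* failure; records ending in any other type are treated as right-censored.
foreach j of numlist 1/4 {
    stset duration, failure(type_r==`j')
    stcox `xvars', nohr exactp
    estimates store risk`j'
}

* Side-by-side coefficients and standard errors for the four fits
estimates table risk1 risk2 risk3 risk4, b(%6.2f) se(%6.2f)
```

Storing each fit keeps all four sets of estimates available after the loop, so they can be compared in one table rather than copied from four separate outputs.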

Below, I summarize the results that would be obtained from this estimation.

Table 3. Cox Type-Specific Competing Risks Models of State Adoption of Restrictive Abortion Legislation, (years)

Models (columns): Informed Consent; Parental Consent; Limited Funding; Spousal Consent. Each column reports Estimate (s.e.).

Variables (rows): South; Ideology Distance; Neighbor States; Pre-Roe; Unified Government; Constitutional Right.

.32 (.59)

.17 (.16)

.05 (.36)

.26 (.14)

.34 (.47)

1.06 (.65)

.31 (.46)

.32 (.14)

.05 (.28)

.14 (.10)

.24 (.36)

.93 (.47)

.93 (.49)

.08 (.14)

.01 (.31)

.35 (.63)

.05 (.18)

.38 (.36)

.12 (.11)

.29 (.16)

.21 (.44)

.34 (.53)

.82 (.53)

.66 (.63)

                    Informed     Parental     Limited      Spousal
                    Consent      Consent      Funding      Consent

N                   386          386          386          386
Log-Likelihood      132.75       222.67       157.08       91.25

Note: Data are from Brace, Hall, and Langer 1999.

Differences? There are definite changes from the stratified model to this one.

Which one would I report? Most likely this one. It accounts for competing risks but has the flexibility of permitting risk-specific covariate effects.

A Very Quick Look at a Split-Population Model:

First a split-population model using the cloglog link (thanks to S. Jenkins 2001):

. spsurv dead mooneymean neighbor, id(stcode) seq(t)

Iteration 0:   log likelihood = -129.08367  (not concave)
Iteration 1:   log likelihood = -128.78891
Iteration 2:   log likelihood = -127.66949
Iteration 3:   log likelihood = -127.09029
Iteration 4:   log likelihood =  -127.0878
Iteration 5:   log likelihood =  -127.0878

Split population survival model                    Number of obs   =       386
                                                   LR chi2(3)      =     15.58
Log likelihood = -127.0878                         Prob > chi2     =    0.0014

------------------------------------------------------------------------------
        dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hazard       |
  mooneymean |  -.3223903   .0959937    -3.36   0.001    -.5105345    -.134246
    neighbor |  -.0677823   .4140629    -0.16   0.870    -.8793307    .7437661
       _cons |  -1.713602   .3519405    -4.87   0.000    -2.403393   -1.023812
-------------+----------------------------------------------------------------
cure_p       |
       _cons |  -2.484965   .6332793    -3.92   0.000     -3.72617    -1.24376
------------------------------------------------------------------------------
c = Pr(never fail) = .07691892; Std.Err. = .04496435; z = 1.7106645
Likelihood ratio test of c=0: chibar2(01)= 5.45 Prob>=chibar2 = 0.010

Second, a standard cloglog model.

. cloglog dead mooneymean neighbor

Iteration 0:   log likelihood = -129.95727
Iteration 1:   log likelihood = -129.81491
Iteration 2:   log likelihood = -129.81464
Iteration 3:   log likelihood = -129.81464

Complementary log-log regression                Number of obs     =        386
                                                Zero outcomes     =        343
                                                Nonzero outcomes  =         43

                                                LR chi2(2)        =      10.13
Log likelihood = -129.81464                     Prob > chi2       =     0.0063

------------------------------------------------------------------------------
        dead |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  mooneymean |  -.2126234   .0888717    -2.39   0.017    -.3868088    -.038438
    neighbor |   .2638002   .3695151     0.71   0.475     -.460436    .9880364
       _cons |  -2.247332   .3018676    -7.44   0.000    -2.838981   -1.655682
------------------------------------------------------------------------------
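The two fits above can be tied together with a little arithmetic. The sketch below uses only numbers printed in the output. That spsurv parameterizes c through a logit link is my inference from those numbers, not something the tutorial states.

```stata
* Pr(never fail): inverse logit of the cure_p constant from the spsurv output
display 1/(1 + exp(2.484965))

* LR statistic for c = 0: split-population fit versus the standard cloglog fit
display 2*(-127.0878 - (-129.81464))

* chibar2(01) p-value: half the upper-tail chi-squared(1) probability
display 0.5*chi2tail(1, 2*(-127.0878 - (-129.81464)))
```

These should come out at roughly .0769, 5.45, and .010, in line with the values spsurv reports. In other words, the likelihood ratio test of c=0 is a comparison of the split-population model against the standard cloglog model.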

I did this quickly! You would want to account for f(t) here!
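One way to read this closing remark is that neither model above includes any function of time, so the baseline hazard is implicitly flat. The following is a minimal sketch of one common fix. It assumes the sequence variable t passed to seq() counts periods at risk starting from 1; the variable name lnt is my own, and log time is only one of several possible specifications.

```stata
* Log-time term so the discrete-time hazard can rise or fall with duration
generate lnt = ln(t)

* Standard cloglog model with duration dependence
cloglog dead mooneymean neighbor lnt

* Split-population counterpart (spsurv is user-written and must be installed)
spsurv dead mooneymean neighbor lnt, id(stcode) seq(t)
```

Period dummies or polynomial terms in t would serve the same purpose with fewer functional-form restrictions, at the cost of more parameters for only 43 events.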
