Event History Modeling, Survival Analysis, Duration Models, Hazard

Event History Modeling,
aka Survival Analysis,
aka Duration Models,
aka Hazard Analysis
How Long Until …?





Given a strike, how long will it last?
How long will a military intervention or war last?
How likely is a war or intervention?
What determines the length of a Prime Minister’s stay in office?
When will a government liberalize capital controls?
Origins

Medical Science

Wanted to know the time of survival
0 = ALIVE
1 = DEAD


Model slightly peculiar – once you transition, there is no going back.
Many analogs in Social Sciences
Disadvantages of Alternatives
(Cross Sections)

Assumes steady state equilibrium
Individuals may vary but overall probability is stable
Not dynamic
Can’t detect causation.
Disadvantages of Alternatives
(Panel)





Measurement Effects
Attrition
Shape not clear
Arbitrary lags
Time periods may miss transitions
Event History Data



Know the transition moment
Allows for greater cohort and temporal flexibility
Takes full advantage of data
Data Collection Strategy
(Retrospective Surveys)



Ask Respondent for Recollections
Benefit: Can “cheaply” collect life history data with single-shot survey
Disadvantages:



Only measure survivors
Retrospective views may be incorrect
Factors may be unknown to respondent
Logic of Model

T = Duration Time
t = elapsed time

Survival Function = S(t) = P(T≥t)
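
A minimal sketch of the definition S(t) = P(T≥t): with fully observed durations, the survival function can be estimated as the share of spells lasting at least t. The function name and the sample durations are hypothetical, and the sketch assumes no censoring.

```python
def empirical_survival(durations, t):
    """Share of observed durations T with T >= t (assumes no censoring)."""
    if not durations:
        raise ValueError("need at least one duration")
    return sum(1 for d in durations if d >= t) / len(durations)

# Hypothetical spell lengths in days, for illustration only.
sample = [5, 12, 12, 30, 45]

print(empirical_survival(sample, 0))   # every spell lasts at least 0 days
print(empirical_survival(sample, 12))  # spells of 12 days or more
print(empirical_survival(sample, 46))  # no spell lasts this long
```

The estimate starts at 1 and can only fall as t grows, which mirrors the "no going back" feature of the model.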

Logic of Model (2)

Probability (density) that an event occurs at time t: $f(t)$

Cumulative distribution function of f(t): $F(t) = \int_0^{t} f(u)\,du$


Note: $S(t) = 1 - F(t) = \int_t^{\infty} f(u)\,du$
Logic of Model (3)

Hazard Rate

Cumulative Hazard Rate
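
For reference, the standard textbook definitions of these two quantities, written to match S(t) = P(T≥t) and the density f(t) used earlier, can be sketched as follows. The symbol H(t) for the cumulative hazard is an assumption about notation.

```latex
% Hazard rate: instantaneous risk of the event at t, given survival up to t
h(t) = \lim_{\Delta t \to 0}
       \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
     = \frac{f(t)}{S(t)}

% Cumulative hazard rate: hazard accumulated from 0 to t
H(t) = \int_0^{t} h(u)\,du
```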
Logic of Model (4)

Interrelationships

so knowing h(t) allows us to derive the survival function and the probability density.
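
The claim that h(t) determines the other quantities can be checked numerically. This sketch assumes the standard relationships S(t) = exp(-H(t)) and f(t) = h(t) S(t), where H(t) is the integral of h from 0 to t. The function names and the linear hazard are hypothetical choices for illustration.

```python
import math

def cumulative_hazard(h, t, steps=10_000):
    """Trapezoidal approximation of H(t), the integral of h(u) from 0 to t."""
    if t == 0:
        return 0.0
    width = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for i in range(1, steps):
        total += h(i * width)
    return total * width

def survival_from_hazard(h, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(h, t))

def density_from_hazard(h, t):
    """f(t) = h(t) * S(t)."""
    return h(t) * survival_from_hazard(h, t)

def linear_hazard(t):
    """Hypothetical hazard that rises linearly with time."""
    return 0.1 * t

print(survival_from_hazard(linear_hazard, 2.0))
print(density_from_hazard(linear_hazard, 2.0))
```

For this hazard H(t) = 0.05 t², so the first value should agree with exp(-0.2).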
Censoring and Truncation

Right truncation
Don’t know when the event will end
Left truncation
Don’t know when the event began
Censoring and Truncation (2)
tR
t t
h(t ) 

f (u )du
S (t ) 
t
f (u )du
t
t



f (u )du

S (t ) 

tL
f (u )du
Discrete vs. Continuous Time


Texts draw sharp distinction
Not clear it makes a difference



Estimates rarely differ
Need to measure time in some increment
Big problem comes for Cox Proportional Hazards Model – it doesn’t like ties
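
A quick way to see whether ties are an issue is to count repeated durations before fitting anything. The sketch below uses the Days values from the Prime Minister table in this deck. The function name is hypothetical.

```python
from collections import Counter

# Days in office, copied from the Prime Minister table.
days = [13, 13, 1866, 390, 450, 391, 326, 1351, 1170, 31]

def tied_durations(durations):
    """Map each duration that occurs more than once to its count."""
    counts = Counter(durations)
    return {d: n for d, n in counts.items() if n > 1}

print(tied_durations(days))  # {13: 2}
```

Measuring time in coarser units (months rather than days) would produce more ties, which is where the choice of time increment starts to matter.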
How to Set up Data
(Single Record)
Prime Minister      Took Office         Left Office          Days   Event
Henry Sewell        7 May 1856          20 May 1856            13       1
William Fox         20 May 1856         2 June 1856            13       1
Edward Stafford     2 June 1856         12 July 1861         1866       1
William Fox         12 July 1861        6 August 1862         390       1
Alfred Domett       6 August 1862       30 October 1863       450       1
Frederick Whitaker  30 October 1863     24 November 1864      391       1
Frederick Weld      24 November 1864    16 October 1865       326       1
Edward Stafford     16 October 1865     28 June 1869         1351       1
William Fox         28 June 1869        10 September 1872    1170       1
Edward Stafford     10 September 1872   11 October 1872        31       1
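
The Days and Event columns can be derived from the two date columns. The following is a minimal sketch using only the Python standard library. The function names, the `study_end` parameter, and the open-ended "Hypothetical PM" spell are assumptions added to show how a spell still in progress would be coded with Event = 0.

```python
from datetime import datetime

DATE_FORMAT = "%d %B %Y"  # e.g. "7 May 1856"; assumes English month names

def parse_date(text):
    return datetime.strptime(text, DATE_FORMAT).date()

def to_record(name, took_office, left_office, study_end=None):
    """Return (name, days, event); event is 1 if the exit was observed, else 0."""
    start = parse_date(took_office)
    if left_office is None:
        # Spell still open when observation stops: duration runs to study_end.
        if study_end is None:
            raise ValueError("an open spell needs a study_end date")
        return (name, (parse_date(study_end) - start).days, 0)
    return (name, (parse_date(left_office) - start).days, 1)

spells = [
    ("Henry Sewell", "7 May 1856", "20 May 1856"),
    ("William Fox", "20 May 1856", "2 June 1856"),
    ("Edward Stafford", "2 June 1856", "12 July 1861"),
    ("William Fox", "12 July 1861", "6 August 1862"),
    ("Alfred Domett", "6 August 1862", "30 October 1863"),
    ("Frederick Whitaker", "30 October 1863", "24 November 1864"),
    ("Frederick Weld", "24 November 1864", "16 October 1865"),
    ("Edward Stafford", "16 October 1865", "28 June 1869"),
    ("William Fox", "28 June 1869", "10 September 1872"),
    ("Edward Stafford", "10 September 1872", "11 October 1872"),
]

records = [to_record(*spell) for spell in spells]
for record in records:
    print(record)

# Hypothetical open spell, not part of the table above.
censored = to_record("Hypothetical PM", "1 January 1873", None,
                     study_end="31 January 1873")
print(censored)
```

Each person-spell is one row, so a Prime Minister who served several times (William Fox, Edward Stafford) contributes several records.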
Choices / Distributions

Need to assume a distribution for h(t).




Decision matters
Exponential
Weibull
Cox

Many others, but these are most common
Distributions (Exponential)


Constant Hazard Rate
Can be made to accommodate coefficients: $\lambda = \beta_0 + \beta_1 X$
$f(t) = \lambda e^{-\lambda t}$
$S(t) = e^{-\lambda t}$
$h(t) = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$
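
The constant hazard can be confirmed by evaluating f(t)/S(t) at different times. This sketch also estimates the rate from the Days and Event columns of the Prime Minister table using the standard maximum likelihood result for the exponential model (events divided by total time at risk). The function names are hypothetical, and no covariates are included.

```python
import math

def exp_density(t, lam):
    return lam * math.exp(-lam * t)

def exp_survival(t, lam):
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    """h(t) = f(t) / S(t), which reduces to lam for every t."""
    return exp_density(t, lam) / exp_survival(t, lam)

def exp_rate_mle(durations, events):
    """Estimated rate: number of observed events over total time at risk."""
    return sum(events) / sum(durations)

days = [13, 13, 1866, 390, 450, 391, 326, 1351, 1170, 31]
events = [1] * len(days)

lam_hat = exp_rate_mle(days, events)
print(lam_hat)                    # estimated daily hazard of leaving office
print(exp_hazard(10, lam_hat))    # same hazard early in a term...
print(exp_hazard(1000, lam_hat))  # ...and late in a term
```

Whether a constant hazard is plausible for these data is a separate question. The sketch only shows what the exponential assumption implies.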
Distributions (Weibull)

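Allows for time dependent hazard rates
$\alpha < 1$: hazard falls over time
$\alpha = 1$: constant hazard (exponential)
$\alpha > 1$: hazard rises over time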
Weibull Survival Functions
[Figure: survival curves for Alpha = 1 (Exponential), Alpha = 0.5, and Alpha = 1.5, plotted over time 0 to 20]
Weibull Hazard Rates
[Figure: Hazard Rate against Time (1 to 20) for Alpha = 1 (Exponential), Alpha = 0.5, and Alpha = 1.5]
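
The qualitative pattern in the two Weibull charts can be reproduced with a few lines of code. The parameterization here, h(t) = λα(λt)^(α-1) and S(t) = exp(-(λt)^α), is one common convention and an assumption, since other texts and software scale the parameters differently. The function names and the default λ = 1 are also illustrative choices.

```python
import math

def weibull_hazard(t, alpha, lam=1.0):
    """Hazard under the assumed parameterization; requires t > 0."""
    return lam * alpha * (lam * t) ** (alpha - 1)

def weibull_survival(t, alpha, lam=1.0):
    return math.exp(-((lam * t) ** alpha))

times = (1, 5, 10, 20)
for alpha in (0.5, 1.0, 1.5):
    hazards = [round(weibull_hazard(t, alpha), 3) for t in times]
    survival = [round(weibull_survival(t, alpha), 3) for t in times]
    print(f"alpha={alpha}: hazard={hazards} survival={survival}")
```

With Alpha = 1 the hazard stays at λ, which is the exponential case. Alpha below 1 gives a falling hazard and Alpha above 1 a rising one.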
Distributions (Cox)

Useful when




Unsure of shape of time dependence
Have weak theory supporting model
Only interested in magnitude and direction
Parameterizing the baseline hazard rate
Distributions (Cox – 2)
$h(t \mid X) = h_0(t)\, e^{\beta X}$
$h_0(t)$: baseline, a function of “t” not “X”
$e^{\beta X}$: involves “X” but not “t”
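
A small numerical sketch of this two-part structure: the baseline depends only on t, the covariate term only on X, so the ratio of hazards for two values of X is the same at every t. The baseline function, the coefficient value, and the function names are hypothetical. In an actual Cox model the baseline is left unspecified rather than given a formula.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """h(t | X = x) = h0(t) * exp(beta * x)."""
    return baseline(t) * math.exp(beta * x)

def example_baseline(t):
    """Hypothetical baseline hazard that rises with time."""
    return 0.02 + 0.01 * t

beta = 0.7  # hypothetical coefficient

for t in (1, 10, 100):
    ratio = (cox_hazard(t, 3, beta, example_baseline)
             / cox_hazard(t, 2, beta, example_baseline))
    print(t, ratio)  # the ratio does not change with t

print(math.exp(beta))
```

Both hazards grow over time here, yet the hazard at X = 3 remains the same multiple of the hazard at X = 2 throughout.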
Distributions (Cox –3)
$h(t \mid X) = h_0(t)\, e^{\beta X}$
Why is it called proportional?
$h(t \mid X = x) = h_0(t)\, e^{\beta x}$
$h(t \mid X = x + 1) = h_0(t)\, e^{\beta (x + 1)}$
$\Rightarrow h(t \mid X = x + 1) = h_0(t)\, e^{\beta x} e^{\beta} = e^{\beta}\, h(t \mid X = x)$
How to Interpret Output



Positive coefficients mean observation is at increased risk of event.
Negative coefficients mean observation is at decreased risk of event.
Graphs helpful.
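
For proportional hazards output, a coefficient is often easier to read after exponentiating it: exp(β) is the multiplicative change in the hazard for a one-unit increase in the covariate. The sketch below shows the conversion. The variable names and coefficient values are made up for illustration and do not come from any fitted model.

```python
import math

def hazard_ratio(beta):
    """Multiplicative change in the hazard for a one-unit increase in X."""
    return math.exp(beta)

def percent_change_in_hazard(beta):
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical coefficients, for illustration only.
coefficients = {"covariate_a": 0.5, "covariate_b": -0.5, "covariate_c": 0.0}

for name, beta in coefficients.items():
    print(name, round(hazard_ratio(beta), 3),
          round(percent_change_in_hazard(beta), 1))
```

A ratio above 1 means higher risk and therefore shorter expected durations. A ratio below 1 means lower risk and longer expected durations.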
Unobserved heterogeneity
and time dependency

Thought experiment with groups
Each group has a constant hazard rate
The group with the higher hazard rate experiences the event sooner (and drops out of the dataset)
Only the people with the lower hazard rate are left
The hazard appears to drop over time
“Solution” akin to random effects
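
The thought experiment can be worked through numerically. This sketch assumes two equally sized groups with constant (exponential) hazards. The rates 0.5 and 0.1, the equal shares, and the function name are hypothetical. The pooled hazard is the pooled density divided by the pooled survival function.

```python
import math

def pooled_hazard(t, rates, shares):
    """Hazard among everyone still at risk at time t, pooling all groups."""
    surviving = [share * math.exp(-rate * t)
                 for rate, share in zip(rates, shares)]
    density = [rate * s for rate, s in zip(rates, surviving)]
    return sum(density) / sum(surviving)

rates = (0.5, 0.1)   # high-risk and low-risk group, each constant over time
shares = (0.5, 0.5)  # initial share of each group

for t in (0, 5, 10, 20):
    print(t, round(pooled_hazard(t, rates, shares), 3))
```

No individual's hazard changes, yet the pooled hazard falls from the average of the two rates toward the lower rate as the high-risk group leaves the risk set.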
Extensions



Time Varying Coefficients
Multiple Events
Competing Risk Models