Event History Analysis
PS 791 Advanced Topics in Data Analysis

Event History Analysis … and its cousins
Event History Analysis is a general term comprising a set of time duration models:
    Survival Analysis
    Duration Analysis
    Hazard Modeling

Event Duration
When we look at processes that occur over time, we are often interested in two aspects of the process:
    The duration of the events
        How long a regime or alliance lasts
    The transition event or state
        The occurrence of a coup

Survival in broader terms
Survival analysis is often used to examine the length of time that an entity survives after exposure to a disease or toxin.
In toxicity studies the related benchmark is the LC50
    The concentration of the toxin that will kill 50% of the test organisms during a fixed time of exposure, say 24 hours
    Used for determining the acute toxicity of a chemical compound

Survival in a non-fatal sense
Other senses of survival:
    Length of time a regime lasts or stays in power
    Length of a military intervention
    Duration of wars or alliances

The Mathematics of Survival
Some definitions:
    T is a positive random variable for survival time: the length of time before a change of state
    T is continuous
        Until we assume it isn't (for later)
    The actual measure of the survival time, or an instance of it, is t.
    The possible values of T have a probability density function, f(t), and a cumulative distribution function, F(t).

The distribution function of T
The distribution function of T is expressed as:
    $$F(t) = \int_0^t f(u)\,du = \Pr(T \le t)$$
This expresses the probability that a survival time T is less than or equal to t.

The Unconditional Failure Rate
If we differentiate F(t), we get the density function
    $$f(t) = \frac{dF(t)}{dt} = F'(t)$$
We can characterize the distribution of failures by either the distribution function or the density function.

The Survivor Function
The survivor function denotes the probability that a survival time T is equal to or greater than some time t.
    $$S(t) = 1 - F(t) = \Pr(T \ge t)$$
This is also the proportion of units surviving beyond t.
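As a quick numerical check of the definitions of f(t), F(t), and S(t), here is a minimal sketch. It assumes an exponential density with a rate of 0.5; the distribution, the rate, and the function names are illustrative choices, not part of the course material. It integrates f numerically to recover F(t) and then forms S(t) = 1 - F(t).

```python
import math

RATE = 0.5  # assumed rate parameter, chosen only for illustration


def density(t, rate=RATE):
    """f(t): exponential density of failure times."""
    return rate * math.exp(-rate * t)


def distribution(t, rate=RATE, steps=10_000):
    """F(t) = integral of f(u) du from 0 to t, by the trapezoid rule."""
    if t <= 0:
        return 0.0
    width = t / steps
    total = 0.5 * (density(0.0, rate) + density(t, rate))
    for i in range(1, steps):
        total += density(i * width, rate)
    return total * width


def survivor(t, rate=RATE):
    """S(t) = 1 - F(t) = Pr(T >= t)."""
    return 1.0 - distribution(t, rate)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0):
        print(f"t={t:4.1f}  F(t)={distribution(t):.4f}  S(t)={survivor(t):.4f}")
```

For this assumed density the closed form is S(t) = exp(-0.5 t), so the numerical values can be compared against it directly.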
S(t) starts at S(0) = 1 and never increases; it is strictly decreasing wherever failures can occur, since as time passes there are fewer and fewer individuals surviving.

The Hazard Rate
Given the survivor function and the density of failures, we have a way to account for "survival" and "death" in EHA (Event History Analysis).
We obtain another important component in EHA when we look at the relationship between the two in the hazard rate.
    $$h(t) = \frac{f(t)}{S(t)}$$

A Conditional Failure Rate
The hazard rate is the rate at which units fail, or durations end, at t given that the unit has survived until t.
Thus the hazard rate is a conditional failure rate.

The Interrelationships
The hazard rate, survivor function, and distribution and density functions are all interrelated.
    $$f(t) = -\frac{dS(t)}{dt}$$
Thus the hazard rate can be represented by
    $$h(t) = \frac{-dS(t)/dt}{S(t)} = -\frac{d \log S(t)}{dt}$$

Using OLS on Durations
If we model the duration of an event using OLS
    Like the years a regime lasts
We regress the duration length on a set of characteristics or exogenous variables
Often we will log the duration time because some extremely durable cases make the distribution asymmetric.
Using OLS on durations will cause problems

Censoring
In some cases, a unit may not have failed by the end of the observation period.
We refer to this as right-censoring.
    Model the adoption of a state lottery
    If a state has not adopted it by the end of the sample time frame, it is right-censored

Left-censoring
Left-censoring occurs when the history of the event begins prior to the start of the observed period
    A regime that began before the time frame
    A dispute already underway

Censoring (cont)
Note that both right- and left-censoring are common in many time-series data sets and are not dealt with in regression designs at all.
EHA can incorporate censoring in the models.
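To see how censoring can enter a model, here is a minimal sketch. It assumes a constant hazard (an exponential model) and a small made-up data set; the durations, the `failed` flags, and the function names are all hypothetical. Each failed unit contributes f(t) = h(t)S(t) to the likelihood, while each right-censored unit contributes only S(t), the probability of surviving at least that long.

```python
import math

# Hypothetical durations in years; failed=False marks a right-censored case
# (still surviving when the observation period ended).
durations = [2.0, 5.0, 1.0, 8.0, 10.0, 10.0, 3.0, 10.0]
failed = [True, True, True, True, False, False, True, False]


def log_likelihood(rate, durations, failed):
    """Exponential log-likelihood with right-censoring.

    Failed cases contribute log f(t) = log(rate) - rate * t.
    Censored cases contribute log S(t) = -rate * t.
    """
    total = 0.0
    for t, d in zip(durations, failed):
        if d:
            total += math.log(rate)
        total -= rate * t
    return total


def censored_mle(durations, failed):
    """Closed-form maximizer: number of failures / total time at risk."""
    return sum(failed) / sum(durations)


def naive_rate(durations, failed):
    """Rate implied by dropping censored cases and using only the failures."""
    observed = [t for t, d in zip(durations, failed) if d]
    return len(observed) / sum(observed)


if __name__ == "__main__":
    rate_hat = censored_mle(durations, failed)
    print("hazard estimate keeping censored cases:", round(rate_hat, 4))
    print("hazard estimate dropping them:", round(naive_rate(durations, failed), 4))
    print("implied mean duration:", round(1 / rate_hat, 2))
```

On these made-up numbers the estimate that keeps the censored cases is 5/49, about 0.10, while dropping them gives 5/19, about 0.26. Discarding the censored units overstates the hazard because the longest-lived units are exactly the ones thrown away.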
Handling censoring is based on calculating likelihoods

Selection Bias
Duration models can give us a tool to look at selection bias.
When we study something like the determinants of regime failure, and we have a data set comprised of regimes, their failure dates, and the exogenous variables we think led to the failure, we have omitted the cases that didn't fail.
The cases that did not fail survived because of the same factors that drove the others to fail, so leaving them out biases our sample.
Duration models can account for this bias: cases that have not failed stay in the data as right-censored observations.

Time Varying Covariates
Regression assumes constant relationships between the covariates and the outcome
    $$Y_t = B_0 + B_1 X_t + e_t$$
What if the slope changes over the course of the study?
    $$Y_t = B_0 + B_t X_t + e_t$$
Regression can handle this through Stochastic or Time-Varying Parameter models, but they are usually ignored

Distribution of failure times
If we can correctly specify the type and shape of the distribution of failure times, we can estimate the impact of the covariates on the failure rate.
The shape of that failure rate is a function of its parameterization
The model's covariates are used to assess that parameterization

The exponential model
The exponential model implies a baseline hazard rate that is flat
The risk of failure, given survival so far, is the same at any given time
This implies a constant hazard rate
    $$h(t) = \lambda$$

Other distributions
Weibull
    Used if the hazard rate is steadily increasing or decreasing
Log-logistic or Log-normal
Gompertz
    How to choose? Theory?
Generalized Gamma

Proportional Hazard Models
Cox Proportional Hazard
Similar to Weibull

Discrete Time Data
An example
Events
Action-reaction Models
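Returning to the choice of distribution, the contrast between the flat exponential hazard and the rising or falling Weibull hazard can be sketched as follows. The parameterization h(t) = λp(λt)^(p-1) is one common convention and is an assumption here, as are the parameter values and function names; with p = 1 it reduces to the exponential model.

```python
import math


def weibull_hazard(t, rate, shape):
    """h(t) = rate * shape * (rate * t) ** (shape - 1), for t > 0."""
    return rate * shape * (rate * t) ** (shape - 1)


def weibull_survivor(t, rate, shape):
    """S(t) = exp(-(rate * t) ** shape)."""
    return math.exp(-((rate * t) ** shape))


def hazard_trend(rate, shape, times=(1.0, 2.0, 4.0, 8.0)):
    """Describe how the hazard moves across the given time points."""
    values = [weibull_hazard(t, rate, shape) for t in times]
    pairs = list(zip(values, values[1:]))
    if all(math.isclose(a, b) for a, b in pairs):
        return "flat"
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "non-monotone"


if __name__ == "__main__":
    for shape in (0.5, 1.0, 1.5):
        print(f"shape p={shape}: hazard is {hazard_trend(0.1, shape)}")
```

Under this assumed parameterization, a shape below 1 gives a falling hazard, a shape of 1 gives the flat exponential hazard, and a shape above 1 gives a rising hazard, which is one way theory about the process can guide the choice of distribution.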