A Competing Risks Model with Time-varying Heterogeneity and Simultaneous Failure

Ruixuan Liu
Department of Economics, Emory University, Atlanta, GA 30322

This version: December 2015

Abstract

This paper proposes a new bivariate competing risks model that specifies each marginal duration as the first time a Lévy subordinator crosses a random threshold. Our specification is a natural variant of the competing risks version of the mixed proportional hazards model, but it allows time-varying heterogeneity and simultaneous termination of both durations. When the structural multiplicative effects of covariates take linear-index forms, we identify these effects, the baseline hazard function, and the characteristics of the latent Lévy subordinators from competing risks data. Semiparametric estimation of the finite-dimensional parameter in our model is developed based on average derivative estimation.

Keywords: Competing Risks, Duration Analysis, First Passage Time, Lévy Copula, Lévy Processes.

JEL Codes: C14, C24, C34.

I am deeply indebted to my mentor and supervisor Yanqin Fan for her continuing guidance and support. I also appreciate detailed and inspiring comments from Jaap Abbring, Irene Botosaru, Emmanuel Guerre, Qi Li, Eric Renault, Aman Ullah, and Jon Wellner.

1 Introduction

In empirical studies, a spell could end for various reasons, but only the realized one is recorded in the dataset. This phenomenon is captured by the competing risks model, where two or more types of durations may occur on a subject, but only the one occurring first is observed together with its occurrence time, and the other durations are censored. For two durations or failure times $(Y_1, Y_2)$ of interest, the econometrician observes only the minimum $Y = \min(Y_1, Y_2)$ and the cause of failure.
Applications of competing risks models in economics include Flinn and Heckman (1982), who investigated the duration of unemployment where an individual can exit unemployment either by finding a job or by leaving the labor market; Katz and Meyer (1990), who studied the probability of leaving unemployment through recalls or new jobs; Deng, Quigley, and Van Order (2000), who investigated mortgage termination by prepayment or default; Honoré and Lleras-Muney (2006), who studied changes in cancer and cardiovascular mortality since 1970; and Brown and Dinc (2010), who examined banks' survival where failure could be triggered by bankruptcy or acquisition. The competing risks structure is generic for many other economic models, including the classical Roy model in Heckman and Honoré (1990), where workers self-select into the sector that gives them the highest expected earnings, and the first-price auction model in Athey and Haile (2003), where only the winning price and the winner's identity are available in the data.

This paper proposes and studies a new bivariate competing risks model that specifies each marginal duration as the first time a stochastic process crosses a random threshold, and imposes dependence between the two durations via unobservable dependent bivariate processes. Each marginal stochastic process is assumed to be a Lévy subordinator, i.e., a continuous-time process with stationary, independent, and nonnegative increments, and we resort to the Lévy copula (Kallsen and Tankov, 2006) to generate dependence between the two latent Lévy subordinators without restricting their marginal specifications. The two random thresholds are assumed to follow independent unit exponential distributions, and covariates enter the model multiplicatively, essentially changing the magnitude of the jumps of the latent processes.
Such a model is of substantial interest because it is related to optimal stopping time models where agents optimally time discrete actions (Heckman and Navarro, 2007; Abbring, 2012), and it also applies to scenarios in which the termination of a duration is caused by gradual and irreversible accumulation of damage (Singpurwalla, 1995, 2005). The choice of latent processes, thresholds, covariate effects, and dependence structure in our model could be seen as a natural variant of the competing risks version of the mixed proportional hazards model (MPH) from a stochastic process point of view. In contrast with traditional hazard-based models including MPH, our model embeds time-varying heterogeneity represented by the latent processes, and it allows simultaneous termination of both durations with positive probability. Surprisingly, the conditional distribution function of the two durations in our model is of the same form as that in Marshall and Olkin (1967). Compared with the original model and existing generalizations of the Marshall-Olkin type (Klein, Keiding, and Kamby, 1989; An, Christensen, and Gupta, 2004), the durations in our model are defined in a highly structural way and the concurrent termination of both durations is not necessarily driven by some independent and common shock. Thus, I will name the present model the Extended Marshall-Olkin (EMO) model. EMO exhibits tractable mathematical structures and interesting probabilistic properties, and has empirical content under a competing risks scheme where one does not observe both durations sequentially.

Our contributions are mainly three-fold. First, the current paper contributes to the literature on process-based duration models advanced by Abbring (2012) by proposing a bivariate duration model where both durations are described by their threshold-crossing behavior.
The combination of our latent processes and thresholds differs from the one in Abbring (2012), resulting in a different model setup and different identification and inference procedures. In the sequel, we show that our construction resembles the bivariate version of MPH, as the latter could be put into the same framework, with its duration defined by the first passage time of a deterministically increasing function (multiplied by some static heterogeneity term) crossing a unit exponential threshold, and with the dependence between the two durations caused by unobservable static heterogeneity terms.¹ Our model complements Abbring's (2012): because the sample path of the latent process here is non-decreasing, it is more suitable for capturing failure due to wear-out or aging effects.

Second, it is well known that the competing risks scheme induces delicate (non-)identifiability issues once dependence between the two durations is allowed, and this continues to attract attention from researchers; see Heckman and Honoré (1989, 1990), Abbring and van den Berg (2003), Lee (2006), Lee and Lewbel (2012), Khan and Tamer (2009), and Fan and Liu (2013). However, none of the aforementioned identification results apply to our present model. Instead, we make novel use of the analytical properties of characteristic functions of Lévy processes and the special Marshall-Olkin structure arising from the simultaneous jumping behavior of the latent processes. Assuming the multiplicative effects of observable covariates take linear index forms, we can separately identify these effects, the baseline hazard function, and the characteristics of the latent Lévy processes, even in a competing risks setting.

Last but not least, the identification results immediately suggest straightforward semiparametric estimators based on average derivative estimation (Stoker, 1986; Powell, Stock, and Stoker, 1989).
We establish root-n consistency and asymptotic normality of the proposed estimator for the finite-dimensional parameters in the structural multiplicative effects. This is remarkable because although the competing risks version of MPH is known to be nonparametrically identifiable (Heckman and Honoré, 1989; Abbring and van den Berg, 2003), only partial answers have been given regarding non-/semiparametric estimation; see Fermanian (2004) and Chen (2010). As for the existing process-based models, the literature is restricted to univariate duration models and the suggested estimation method is fully parametric with numerical inversion involved; see Abbring (2012) and Abbring and Salimans (2012).

We are now in a position to describe the agenda for this paper. Section 2 contains a brief review of related literature. Section 3 presents the new univariate duration model in Liu (2015) after re-examining MPH from a process point of view. In Section 4, we construct the extended Marshall-Olkin model for bivariate durations and derive the conditional joint survival function. Various probabilistic properties and its distinction from the original Marshall-Olkin model are also discussed. Section 5 presents its empirical content under competing risks, and Section 6 gives the semiparametric estimation procedure for the finite-dimensional parameters. We conclude in Section 7 by discussing potential applications and possible extensions. The Appendices containing all the proofs are further divided into three sections. Appendix A collects the proofs of main results.

¹ Another close connection to MPH is that the distribution of the non-negative heterogeneity term is often assumed to be infinitely divisible, such as the gamma distribution in Lancaster (1979) and the stable distribution in Hougaard (1986a,b). Correspondingly in the present model, it is well known that any Lévy subordinator has a non-negative infinitely divisible distribution (Sato, 1999).
More technical proofs of the linear representation of the semiparametric estimator are in Appendix B. Appendix C lists various definitions and properties related to the theory of Lévy processes for easy reference.

2 Literature Review

In this section we review two branches of related literature: nonparametric identification and estimation of competing risks models, and recent advances in process-based duration models.

The potential identification failure of competing risks models has a long history dating back to Cox (1959). Tsiatis (1975) gives an explicit construction to demonstrate the non-identifiability of the marginal distribution function of $Y_1$ once independence between $(Y_1, Y_2)$ is dispensed with. Crowder (1991) further amplifies this identifiability crisis by proving that even if the two marginal distribution functions of $(Y_1, Y_2)$ are given, their joint distribution generally remains unidentified.

In response to the identifiability crisis, three general approaches have been taken in the literature. First, covariate information and specific model structures imposed on the marginal distributions may restore point identification; see, e.g., the proportional hazards and accelerated failure time models in Heckman and Honoré (1989), Abbring and van den Berg (2003), Lee (2006), and Lee and Lewbel (2012). Second, in the spirit of Peterson (1976), worst-case bounds have been adapted to the accelerated failure time model with discrete regressors in Honoré and Lleras-Muney (2006), and to censored quantile regression in Khan and Tamer (2009). Finally, assuming a known copula for the individual risks, Zheng and Klein (1995) first extend point identification results for independent risks to dependent risks and propose a copula-graphic estimator of the marginal survival function.
When the copula function is Archimedean and known, Rivest and Wells (2001) first derive an explicit expression for the copula-graphic estimator of the survival function proposed in Zheng and Klein (1995). In the context of a censored quantile regression where the dependent variable is subject to dependent censoring, Fan and Liu (2013) consider partial identification where the copula function is allowed to vary within a prespecified class. Fan and Liu (2013) demonstrate that this approach can be more informative about the conditional quantile function than worst-case bounds.

Even though MPH is known to be nonparametrically identifiable with competing risks data, the literature on non-/semiparametric estimation is still in its infancy. Fermanian (2004) has proved several consistency results for different model primitives based on kernel estimators, but no asymptotic distribution has been developed. Moreover, it is not clear what the optimal rates of convergence are. With a known survival copula function between the heterogeneity terms, Chen (2010) has studied a semiparametric MLE for the finite-dimensional parameter in the model without restricting the baseline hazard functions. Burda, Harding, and Hausman (2015) have proposed Bayesian estimation methods with a piecewise linear hazard function and heterogeneity terms having an infinite mixture of generalized inverse Gaussian densities.

Parallel to the traditional hazard-based modeling strategy, many process-based models have been developed by making use of a specific parametric sub-class of Lévy processes. For example, Brownian motion appears in Lancaster's (1972) strike model and Whitmore's (1979) job tenure model, while the gamma process has been utilized in Singpurwalla (1995) to model the degradation process. Abbring (2012) has pioneered the study of identifiability of process-based duration models without making any parametric assumption.
The mixed hitting-time (MHT) model in Abbring (2012) specifies the duration formally as the first passage time of a random threshold by the sample path of a latent stochastic process. Abbring (2012) has demonstrated that not only does MHT incorporate the time-varying heterogeneity represented by the latent process, but it also arises more naturally from the optimal stopping time problem, where the solution could be described by this threshold-crossing behavior. When the process is chosen to be a spectrally negative Lévy process, the identifiability of all structural components could be shown by adapting the strategies used to analyze the identifiability of MPH. So far the identification of MHT is restricted to the univariate duration model, and the suggested estimation is fully parametric.

Botosaru (2013) also proposes another univariate duration model making use of the Lévy subordinator, which could be viewed either as a hazard-based model with a certain random hazard density or as a process-based model with the latent process being a double stochastic integral of the Lévy subordinator. Identification in Botosaru's (2013) random hazard density model relates to solving a nonlinear integral equation. For the identified case, Botosaru (2013) presents a sieve-type estimator and proves its consistency.

In a companion paper of the present work, Liu (2015) proposes a new univariate duration model where the structural duration is the first passage time of a Lévy subordinator crossing a unit exponential threshold. When the structural multiplicative effect is parameterized as a linear index, the model is equivalent to the single-index proportional hazards model where the unknown link function is the Lévy-Laplace exponent function of the latent Lévy subordinator. Liu (2015) has studied a sieve maximum partial likelihood estimator for the finite-dimensional parameter and proved its semiparametric efficiency.
To the best of my knowledge, the only existing process-based competing risks model is due to Ryu (1993). Ryu (1993) studies two duration variables, one driven by the natural failure mechanism (characterized by the threshold-crossing behavior of a certain non-decreasing process) and the other arising from the economic agent's endogenous optimization behavior. The observed duration arises as a consequence of these two competing forces. There is just one latent process in Ryu (1993), and it is assumed to be compound Poisson with nonparametric increments, which is in fact a Lévy subordinator. Apart from this process, all the other components are parametric in Ryu (1993). The identifiability of Ryu's (1993) model remains to be addressed.

3 A Unified Perspective of Duration Analysis

We provide a unified perspective on structural duration models described by the threshold-crossing behavior of some latent process, including MPH in Lancaster (1979), MHT in Abbring (2012), and the random hazard density model (RHD) in Botosaru (2013). Thereafter we introduce our new model in the univariate case.

We start with a generic duration model where the duration outcome is formally defined as the first passage time of a latent process $\Upsilon(t)$ crossing an unobservable threshold $e$, with the structural proportional effect $\phi(X)$ from covariates $X$ acting on the process:

$$Y \equiv \inf\{t : \phi(X)\,\Upsilon(t) \ge e\}. \tag{3.1}$$

In all models discussed in the paper, $X$, $\Upsilon(t)$ and $e$ are assumed to be mutually independent. The rule specified in (3.1) closely resembles the classical binary choice model in a dynamic setup.² But instead of making a discrete choice based on static comparisons, it stems from the optimal stopping time model where the agent optimally chooses when to switch to another state at time $Y$, as in firms' entry/exit choices (Dixit, 1989), workers' unemployment decisions (Mortensen and Pissarides, 1994), and real-option-type investments (McDonald and Siegel, 1986).
Specifically, the economic agent faces a utility maximization problem where the current state generates a payoff $e$ to the agent and switching to another state gives $\phi(X)\Upsilon(t)$,³ i.e., the duration in (3.1) is the optimal solution of

$$\max_{Y}\left\{\int_0^{Y} e\,\exp(-\rho s)\,ds + \int_{Y}^{\infty} \phi(X)\,\Upsilon(s)\,\exp(-\rho s)\,ds\right\}$$

with the discount rate equal to $\rho$.

Although MPH is constructed from a regression model for the hazard rate, it is interesting to ask whether it could be put into the same framework where the duration is specified as the first passage time of a latent process. Indeed this is possible. This alternative point of view further reveals the restrictions on the latent process and threshold imposed by MPH, and it provides the link between our specification and MPH.

Example 3.1 (MPH) The duration outcome in Lancaster's (1979) MPH could be defined as in (3.1) by

$$Y \equiv \inf\{t : \phi(X)\,\nu\,\Lambda(t) \ge e\}, \tag{3.2}$$

where the latent process is equal to $\Upsilon(t) = \nu\Lambda(t)$ with an unobservable heterogeneity term $\nu$ and the cumulative baseline hazard function $\Lambda(t)$. The threshold follows the unit exponential distribution, i.e., $e \sim \mathrm{Exp}(1)$. To see this, note that

$$\Pr\{Y > t \mid X, \nu\} = \Pr\{e > \nu\Lambda(t)\phi(X) \mid X, \nu\} = \exp(-\nu\Lambda(t)\phi(X)).$$

In MPH, given the realization of $\nu$, we just have a deterministic and nondecreasing trend $\nu\Lambda(t)$ approaching the threshold from below. The individual heterogeneity does not evolve over time along the entire spell, hence it is desirable to construct a model with time-varying heterogeneity. One has to make some assumptions on the underlying class for $\Upsilon(t)$ in order to maintain a tractable structure. In the sequel, we shall restrict our attention to general Lévy processes.

² In a discrete-time setting, Heckman and Navarro (2007) have constructed a general mixture duration model based on a latent process crossing thresholds.

³ Because only the comparison plays a role from the agent's perspective, it is not restrictive to let one side be time-invariant; see Honoré and de Paula (2010).
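The first-passage reading of MPH in Example 3.1 can be checked numerically. The sketch below is only an illustration under assumed ingredients that are not in the paper: a Weibull-type cumulative baseline hazard $\Lambda(t) = t^{a}$, fixed hypothetical values for $\phi(x)$ and $\nu$, and the function names themselves. It draws unit exponential thresholds, inverts $\Lambda$ to obtain the first passage time, and compares the empirical survival frequencies with $\exp(-\nu\Lambda(t)\phi(x))$.

```python
import math
import random

def simulate_mph_duration(phi, nu, shape, rng):
    """One draw of Y = inf{t : phi * nu * Lambda(t) >= e} with Lambda(t) = t**shape (assumed form)."""
    e = rng.expovariate(1.0)  # unit exponential threshold
    # Lambda is strictly increasing, so the first passage time is Lambda^{-1}(e / (phi * nu)).
    return (e / (phi * nu)) ** (1.0 / shape)

def mph_survival(t, phi, nu, shape):
    """Closed-form conditional survival exp(-nu * Lambda(t) * phi)."""
    return math.exp(-nu * (t ** shape) * phi)

rng = random.Random(0)
phi, nu, shape = 1.5, 0.8, 2.0  # hypothetical values, chosen only for illustration
draws = [simulate_mph_duration(phi, nu, shape, rng) for _ in range(100_000)]

t_grid = [0.25, 0.5, 1.0]
empirical = [sum(y > t for y in draws) / len(draws) for t in t_grid]
theoretical = [mph_survival(t, phi, nu, shape) for t in t_grid]
max_gap = max(abs(a - b) for a, b in zip(empirical, theoretical))
print(max_gap)  # Monte Carlo error only; shrinks as the number of draws grows
```

Because $\nu$ is held fixed here, every simulated path is the same deterministic curve and all randomness comes from the threshold, which is exactly the static-heterogeneity restriction the text attributes to MPH.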
Lévy processes constitute a very rich and attractive class of stochastic processes, including the commonly encountered Brownian motion, gamma process, and stable process as special cases. They have attracted considerable attention due to their flexibility for a wide variety of modeling issues in finance, insurance, and engineering; see Kyprianou (2006). Their adaptation to duration models has been developed by making use of specific parametric sub-classes of Lévy processes in Lancaster (1972), Whitmore (1979), and Singpurwalla (1995). Abbring (2012) has pioneered the study of nonparametric identification of process-based duration models. Before reviewing his contribution, it is necessary to introduce the formal definition of a $d$-dimensional Lévy process and two important subclasses. To avoid digressions, we refer the reader to Sato (1999) for an authoritative treatment of the theory of Lévy processes.

Definition 3.2 A $d$-dimensional Lévy process $L(t)$ is a right-continuous stochastic process with left limits such that for every $t$ and $r \ge 0$, the increment $L(t+r) - L(t)$ is independent of $\{L(s) : 0 \le s \le t\}$ and has the same distribution as $L(r)$. Two subclasses are of particular importance:
(i) $L(t)$ is a Lévy subordinator if it has almost surely nondecreasing sample paths, i.e., for $t \ge s$ one has $L(t) \ge L(s)$.
(ii) $L(t)$ is a spectrally negative Lévy process if it has no positive jumps.

The most remarkable property of a Lévy process is that even though it is defined purely in terms of descriptive features of the sample path, it admits a very concrete analytic characterization via the well-known Lévy-Khintchine representation; see Sato (1999). Specifically, its Laplace transform could be expressed as

$$E \exp\left(-z^{T} L(t)\right) = \exp[-t\,\Psi(z)],$$

where $\Psi(\cdot)$ is known as the Lévy-Laplace exponent function, which is nonparametric and time-invariant.
This elegant analytic characterization plays a crucial role in the identification problems of duration models driven by Lévy processes, including Abbring (2012), Botosaru (2013), and Liu (2015).

Example 3.3 (MHT) Abbring (2012) starts with a model where the structural duration is defined to be

$$Y \equiv \inf\{t : \phi(X)\,\tilde{L}(t) \ge e\}, \tag{3.3}$$

where $e$ is an arbitrary random threshold and $\tilde{L}(t)$ is a spectrally negative Lévy process. The empirical content of MHT is revealed through the conditional Laplace transform $\mathcal{L}_Y(\cdot \mid X)$ of the duration outcome:

$$\mathcal{L}_Y(s \mid X) \equiv E\{\exp(-sY) \mid X\} = \mathcal{L}_e\!\left(\frac{\tilde{\varrho}(s)}{\phi(X)}\right), \tag{3.4}$$

where $\mathcal{L}_e$ is the Laplace transform of $e$ and $\tilde{\varrho}(s)$ is the largest root satisfying $\tilde{\Psi}(\tilde{\varrho}(s)) = s$.

Example 3.4 (RHD) The duration in Botosaru's (2013) random hazard density model could be expressed as

$$Y \equiv \inf\{t : \phi(X)\,\Upsilon(t) \ge e\},$$

where the latent crossing process is a double stochastic integral of the Lévy subordinator, $\Upsilon(t) = \int_0^t \int_0^s f(u)\,dL(u)\,ds$, with a transformation function $f(u)$. When the process $L(u)$ is taken to be a Lévy subordinator, the conditional survival function could be written as

$$S_Y(t \mid x) = \exp\left(-\int_0^t \Psi\big(\phi(x)\,f(u)\,(t-u)\big)\,du\right).$$

Inspired by Abbring (2012) and Botosaru (2013), I construct a new duration model under the framework of (3.1), replacing the product of the cumulative baseline hazard function and the frailty term, $\nu\Lambda(t)$, in MPH altogether with a latent stochastic process $L(\Lambda(t))$. Here $L(\cdot)$ is a Lévy subordinator and $\Lambda(t)$ is a deterministically increasing time-change function. This time-change function could be seen as a suitable transformation from the abstract time scale on which the process evolves to the rate of economic transactions (Kyprianou, 2006), and it simply turns out to be the cumulative baseline hazard function. Hence the duration in our model becomes

$$Y \equiv \inf\{t : \phi(X)\,L(\Lambda(t)) \ge e\}, \tag{3.5}$$

where the random threshold is still assumed to be unit exponentially distributed as in MPH.
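To make the first-passage definition (3.5) concrete, the following sketch works through one parametric special case. It assumes, purely for illustration, a gamma subordinator with $L(s) \sim \mathrm{Gamma}(\text{shape } a s, \text{rate } b)$, whose Lévy-Laplace exponent is $\Psi(z) = a\log(1 + z/b)$, and a hypothetical time change $\Lambda(t) = t^{1.5}$; none of these choices come from the paper. Since the sample path is nondecreasing, $\{Y > t\} = \{\phi(x) L(\Lambda(t)) < e\}$, so conditioning on the path and using $e \sim \mathrm{Exp}(1)$ gives $\Pr\{Y > t \mid x\} = E\exp(-\phi(x) L(\Lambda(t))) = \exp[-\Lambda(t)\Psi(\phi(x))]$, which the simulation compares against.

```python
import math
import random

def gamma_laplace_exponent(z, a, b):
    """Psi(z) = a * log(1 + z / b): Laplace exponent of a gamma subordinator
    whose value at time s is Gamma(shape=a*s, rate=b) distributed."""
    return a * math.log1p(z / b)

def time_change(t):
    """Hypothetical cumulative baseline hazard Lambda(t) = t**1.5."""
    return t ** 1.5

def survival_closed_form(t, phi, a, b):
    """exp(-Lambda(t) * Psi(phi)), the survival function implied by (3.5)."""
    return math.exp(-time_change(t) * gamma_laplace_exponent(phi, a, b))

def survival_monte_carlo(t, phi, a, b, n, rng):
    """Estimate Pr{Y > t} = Pr{phi * L(Lambda(t)) < e} by direct simulation."""
    s = time_change(t)
    survived = 0
    for _ in range(n):
        level = rng.gammavariate(a * s, 1.0 / b)  # L(Lambda(t)); second argument is the scale
        threshold = rng.expovariate(1.0)          # e ~ Exp(1)
        if phi * level < threshold:               # the subordinator has not crossed by time t
            survived += 1
    return survived / n

rng = random.Random(1)
phi, a, b = 0.7, 2.0, 1.0  # illustrative values only
gaps = [
    abs(survival_monte_carlo(t, phi, a, b, 50_000, rng) - survival_closed_form(t, phi, a, b))
    for t in (0.5, 1.0, 2.0)
]
print(gaps)
```

Only the marginal law of $L(\Lambda(t))$ is simulated at each $t$, which suffices for checking the survival function pointwise; simulating whole paths would be needed for joint statements across several time points.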
Notice that the latent Lévy subordinator $L(\cdot)$ has replaced the static frailty term $\nu$, and its sample path inherits the key feature of a monotonic driving trend in MPH. Therefore, the latent Lévy subordinator can be viewed as the process generalization of a standard frailty term; see Gjessing, Aalen, and Hjort (2003). Another close connection to MPH is that the distribution of the frailty term is often taken to be infinitely divisible; see Hougaard (1986a,b). Correspondingly in our present model, it is well known that the Lévy subordinator has a non-negative infinitely divisible distribution (Sato, 1999). The conditional survival function of $Y$ in (3.5) could be derived in a straightforward way from the Lévy-Khintchine representation:

$$S_Y(t \mid x) = \exp[-\Lambda(t)\,\Psi(\phi(x))]. \tag{3.6}$$

Our new model complements the studies in Abbring (2012) and Botosaru (2013), while the great advantage of the current model is that the proportional hazards structure is maintained, which delivers very tractable econometric and statistical properties; see Liu (2015).

4 An Extended Marshall-Olkin (EMO) Model

In this section, we present the formal model setup of EMO and derive the conditional joint survival function of the bivariate structural durations $(Y_1, Y_2)$, highlighting its connection with the bivariate MPH and with a continuous-time game model in Honoré and de Paula (2010). We further distinguish our modeling strategy from the original Marshall-Olkin (1967) model and existing generalizations. Building on the specific conditional joint survival function, we discuss various probabilistic properties of EMO, especially its dependence structure, which has important implications for its empirical content under competing risks in the next section.

4.1 Construction of EMO

We propose a bivariate model where each individual duration is defined as the first passage time of a latent Lévy subordinator crossing a unit exponential threshold.
The dependence is generated via a bivariate dependent Lévy subordinator whose two margins are denoted as $L_1(t)$ and $L_2(t)$. Formally, the two durations are defined by

$$Y_\ell \equiv \inf\{t : \phi_\ell(X)\,L_\ell[\Lambda(t)] \ge e_\ell\}, \tag{4.1}$$

where $e_\ell \sim \mathrm{Exp}(1)$ for $\ell = 1, 2$. Notice that we assume the time-change function $\Lambda(\cdot)$ is common to both durations.

As already discussed in Section 3, not only is the class of Lévy subordinators flexible enough to incorporate a variety of interesting parametric processes (see Appendix C), but it also admits a sharp characterization via the well-known Lévy-Khintchine representation. For each marginal Lévy subordinator $L_\ell(t)$, its Lévy-Laplace exponent function $\Psi_\ell(\cdot)$ is defined via the equation

$$E\{\exp[-z L_\ell(t)]\} = \exp[-t\,\Psi_\ell(z)].$$

In fact $\Psi_\ell(\cdot)$ could be expressed as the following integral with respect to the jump intensity measure $\Pi_\ell(\cdot)$:

$$\Psi_\ell(z) = \int_{\mathbb{R}_+} \left(1 - e^{-zy}\right) \Pi_\ell(dy). \tag{4.2}$$

Here the characterizing jump measure $\Pi_\ell(\cdot)$ is known to be a Lévy measure (see the definition in Appendix C), and we denote its corresponding Lévy density function as $\pi_\ell(\cdot)$ if $\Pi_\ell(\cdot)$ is absolutely continuous with respect to the Lebesgue measure.

For a bivariate Brownian motion, it is obvious that the covariance structure between the two marginal Brownian motions captures all the dependence. The answer for pure jump processes is not immediately clear, as the dependence comes from the relationship between their jump intensities. Such a dependence characterization relies on the notion of the Lévy copula, introduced in Cont and Tankov (2003) and Kallsen and Tankov (2006). It is worth emphasizing that the Lévy copula is not a standard copula coupling two marginal distribution functions; rather, it is connected with the Lévy measure, which measures the jump intensity. There are mainly two advantages of the Lévy copula compared with the standard distributional copula.
First, the laws of multivariate Lévy processes are conveniently specified by their Lévy measures, and only in a few cases can the distribution or probability density function be given in closed form. In fact, for two given marginal infinitely divisible distributions, it is unclear which distributional copula generates a joint infinitely divisible distribution. Second, the distributional copula implied by the joint Lévy processes is typically time-varying, whereas the Lévy copula is time-invariant. Nevertheless, it does share certain similarities with the standard distributional or survival copula function, including the Sklar theorem and its role in pinning down the joint Lévy-Laplace exponent function $\Psi_{12}(z_1, z_2)$:

$$E\{\exp[-z_1 L_1(t) - z_2 L_2(t)]\} = \exp[-t\,\Psi_{12}(z_1, z_2)].$$

Here we have

$$\Psi_{12}(u, v) = \iint_{\mathbb{R}_+^2} \left(1 - e^{-y_1 u - y_2 v}\right) dC_L\big(\bar{\Pi}_1(y_1), \bar{\Pi}_2(y_2)\big), \tag{4.3}$$

where $C_L(\cdot, \cdot)$ is the Lévy copula coupling the two marginal Lévy tail integrals $\bar{\Pi}_\ell(y) = \int_y^{\infty} \Pi_\ell(ds)$. We refer the reader to Appendix C for a probabilistic interpretation of the tail integral. Theorem 4.1 in Cont and Tankov (2003) ensures that (4.3) could be marginalized into (4.2).

The next theorem contains the conditional joint survival function of $(Y_1, Y_2)$. Notice that our model exhibits a singularity, meaning there is a positive probability mass assigned to the diagonal, whose two-dimensional Lebesgue measure is equal to zero.

Theorem 4.1 Assume that the bivariate durations are defined by (4.1) with mutually independent covariates $X$, Lévy subordinators $(L_1(t), L_2(t))$, and unit exponential thresholds $(e_1, e_2)$.
If the two thresholds are independent of each other, then the conditional joint survival function is

$$\Pr\{Y_1 > t_1, Y_2 > t_2 \mid X = x\} = \exp\big[-\lambda_1(x)\Lambda(t_1) - \lambda_2(x)\Lambda(t_2) - \lambda_3(x)\Lambda[\max(t_1, t_2)]\big], \tag{4.4}$$

where

$$\begin{aligned}
\lambda_1(x) &= \Psi_{12}(\phi_1(x), \phi_2(x)) - \Psi_2(\phi_2(x)), \\
\lambda_2(x) &= \Psi_{12}(\phi_1(x), \phi_2(x)) - \Psi_1(\phi_1(x)), \\
\lambda_3(x) &= \Psi_1(\phi_1(x)) + \Psi_2(\phi_2(x)) - \Psi_{12}(\phi_1(x), \phi_2(x));
\end{aligned} \tag{4.5}$$

moreover, all $\lambda_j(x) \ge 0$ for $j = 1, 2, 3$.

The following table summarizes the comparison of the three models. Clearly, our EMO embeds time-varying heterogeneity and inherits nice properties from MPH in terms of identifiability and flexible semiparametric estimation.

Table 1: Comparison of Three Models

Structural Component      MHT                        MPH                          EMO
Threshold                 Unrestricted               Unit Exponential             Unit Exponential
Latent Process            Spectrally Negative Lévy   Baseline Hazard Function     Lévy Subordinator
Sample Path               No Positive Jump           Deterministic & Increasing   Increasing
Heterogeneity             Time-varying               Static                       Time-varying
Univariate Case
  Identification          Yes                        Yes                          Yes
  Semiparametric Est.     N.A.                       Yes                          Yes
Competing Risks
  Identification          N.A.                       Yes                          Yes
  Semiparametric Est.     N.A.                       N.A.                         Yes

Remark 4.2 The bivariate MPH model in Heckman and Honoré (1989) and Abbring and van den Berg (2003) starts with a specification of the marginal conditional hazard function $\theta_{Y_\ell}(t \mid X, \nu_\ell)$ for $\ell = 1, 2$, in terms of the proportional effect from covariates, the frailty term, and the baseline hazard function $h_\ell(t)$:

$$\theta_{Y_1}(t \mid X, \nu_1) = \phi_1(X)\,\nu_1\,h_1(t); \qquad \theta_{Y_2}(t \mid X, \nu_2) = \phi_2(X)\,\nu_2\,h_2(t).$$

The underlying dependence between $(Y_1, Y_2)$ is produced via the dependent frailty pair $(\nu_1, \nu_2)$. Heuristically, in our construction $L_\ell(\Lambda(t))$ plays a similar role to $\nu_\ell \Lambda_\ell(t)$. Analogously, our model specifies each structural duration individually as a first passage time via (4.1), and the dependence is generated by the bivariate Lévy subordinators $(L_1(t), L_2(t))$, essentially through the Lévy copula.
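The mapping from Laplace exponents to the three rates in Theorem 4.1 can be illustrated with a simple special case. The sketch below assumes, as a hypothetical example not taken from the paper, a common-shock Poisson pair $L_1 = N_1 + N_3$ and $L_2 = N_2 + N_3$ built from independent unit-jump Poisson processes with rates `a1`, `a2`, `a3`, so that $\Psi_\ell(z) = (a_\ell + a_3)(1 - e^{-z})$ and $\Psi_{12}(z_1, z_2) = a_1(1 - e^{-z_1}) + a_2(1 - e^{-z_2}) + a_3(1 - e^{-z_1 - z_2})$. The time change $\Lambda(t) = t$ and all function names are likewise assumptions made for the example.

```python
import math

def psi_marginal(z, own_rate, common_rate):
    """Laplace exponent of N_own + N_common with unit jumps: (own + common) * (1 - exp(-z))."""
    return (own_rate + common_rate) * (1.0 - math.exp(-z))

def psi_joint(z1, z2, a1, a2, a3):
    """Joint Laplace exponent of (N1 + N3, N2 + N3) for independent Poisson N1, N2, N3."""
    return (a1 * (1.0 - math.exp(-z1))
            + a2 * (1.0 - math.exp(-z2))
            + a3 * (1.0 - math.exp(-(z1 + z2))))

def emo_lambdas(phi1, phi2, a1, a2, a3):
    """Rates of the joint survival function, computed as in (4.5)."""
    p12 = psi_joint(phi1, phi2, a1, a2, a3)
    p1 = psi_marginal(phi1, a1, a3)
    p2 = psi_marginal(phi2, a2, a3)
    return p12 - p2, p12 - p1, p1 + p2 - p12

def emo_joint_survival(t1, t2, lambdas, time_change=lambda t: t):
    """Joint survival function of the form (4.4); time_change plays the role of Lambda."""
    l1, l2, l3 = lambdas
    return math.exp(-l1 * time_change(t1) - l2 * time_change(t2)
                    - l3 * time_change(max(t1, t2)))

phi1, phi2 = 0.6, 1.2            # hypothetical covariate effects at some x
a1, a2, a3 = 1.0, 0.5, 0.8       # hypothetical Poisson rates
lambdas = emo_lambdas(phi1, phi2, a1, a2, a3)
print(lambdas)
print(emo_joint_survival(0.5, 1.0, lambdas))
```

In this special case $\lambda_3 = a_3(1 - e^{-\phi_1})(1 - e^{-\phi_2})$, so the rate attached to $\max(t_1, t_2)$ vanishes exactly when there are no common jumps, and setting $t_2 = 0$ recovers the univariate survival function because $\lambda_1 + \lambda_3 = \Psi_1(\phi_1)$.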
Remark 4.3 Honoré and de Paula (2010) have studied the identification of a simultaneous equation model where bivariate durations are determined by the optimal choices of two strategic agents exhibiting the threshold-crossing feature:

$$\begin{aligned}
Y_1 &\equiv \inf\{t_1 : \phi_1(X)\,\Lambda(t_1)\,\exp(\delta\,1\{Y_2 \le t_1\}) \ge e_1\}, \\
Y_2 &\equiv \inf\{t_2 : \phi_2(X)\,\Lambda(t_2)\,\exp(\delta\,1\{Y_1 \le t_2\}) \ge e_2\}.
\end{aligned}$$

In this model, the thresholds $(e_1, e_2)$ are allowed to be arbitrary, but the leader-follower effect parameter $\delta$ and the deterministic trend function $\Lambda(\cdot)$ are assumed to be common to both players. The system resembles a classical simultaneous equations model in which endogenous variables interact with observable and unobservable components. Despite the presence of multiple equilibria, all model primitives are shown to be identifiable by observing bivariate durations (not with competing risks) in Honoré and de Paula (2010). To incorporate simultaneous stopping, Honoré and de Paula (2014) consider the Nash bargaining solution in a cooperative game. In comparison, our model could be viewed as a seemingly unrelated regression, as the dependence is produced via the unobservable dependent processes.

Remark 4.4 Assuming $(L_1(t), L_2(t))$ is a compound Poisson process with bivariate non-negative shocks, our model boils down to the bivariate random shock model studied by Marshall and Shaked (1979), in the absence of covariates. Under this circumstance, the joint survival function of $(Y_1, Y_2)$ could be determined explicitly without referring to the Lévy-Khintchine representation. Let the marginal Lévy subordinator be $L_\ell(t) = \sum_{i=1}^{N(t)} W_{i,\ell}$ for $\ell = 1, 2$, where the arrival of shocks is governed by a common Poisson process $N(t)$ with rate $\mu$.⁴ The joint survival function of $(Y_1, Y_2)$ is

$$\Pr\{Y_1 > t_1, Y_2 > t_2\} = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \frac{\exp(-\mu t_1)(\mu t_1)^k}{k!}\,\frac{\exp(-\mu(t_2 - t_1))\,(\mu(t_2 - t_1))^l}{l!}\; E\!\left[F^{[k,\,k+l]}(e_1, e_2)\right]$$

for $t_1 \le t_2$, or

$$\Pr\{Y_1 > t_1, Y_2 > t_2\} = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} \frac{\exp(-\mu t_2)(\mu t_2)^k}{k!}\,\frac{\exp(-\mu(t_1 - t_2))\,(\mu(t_1 - t_2))^l}{l!}\; E\!\left[F^{[k+l,\,k]}(e_1, e_2)\right]$$

for $t_1 \ge t_2$. In the above expressions,

$$F^{[k_1, k_2]}(w_1, w_2) \equiv \Pr\left\{\sum_{j=1}^{k_1} W_{j,1} \le w_1,\; \sum_{j=1}^{k_2} W_{j,2} \le w_2\right\},$$

and the expectation is computed with respect to the thresholds $(e_1, e_2)$.

⁴ The single Poisson process is not restrictive because several Poisson processes can always be superimposed to yield a new Poisson process and the several sequences of damage distributions can be replaced by a sequence of appropriate mixtures; see Marshall and Shaked (1979).

4.2 The Original Marshall-Olkin Model and Existing Generalizations

The prominent Marshall-Olkin (1967) model is widely adopted in biostatistics, engineering, and life testing where simultaneous failures of human organs, components, or engines are present. The original model does not involve any Lévy subordinator and is constructed with three types of independent shocks: one leading to joint spell termination and two leading to individual spell termination. The occurrence times of the three shocks $Z_1$, $Z_2$, and $Z_3$ are assumed to be independent exponential random variables with parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$, respectively. So $Y_1 = \min(Z_1, Z_3)$ and $Y_2 = \min(Z_2, Z_3)$ follow the joint distribution given below:

$$\Pr\{Y_1 > t_1, Y_2 > t_2\} = \exp[-\lambda_1 t_1 - \lambda_2 t_2 - \lambda_3 \max(t_1, t_2)].$$

In comparison, our construction specifies $Y_\ell = \inf\{t : L_\ell(t) \ge e_\ell\}$ for $\ell = 1, 2$, and those parameters could equivalently be expressed as

$$\lambda_1 = \Psi_{12}(1, 1) - \Psi_2(1); \qquad \lambda_2 = \Psi_{12}(1, 1) - \Psi_1(1); \qquad \lambda_3 = \Psi_1(1) + \Psi_2(1) - \Psi_{12}(1, 1).$$

Maintaining the presence of a common shock, Klein, Keiding, and Kamby (1989) and An, Christensen, and Gupta (2004) have generalized the Marshall-Olkin model with covariates and frailty terms. The observed bivariate durations are still assumed to be $Y_1 = \min(Z_1, Z_3)$ and $Y_2 = \min(Z_2, Z_3)$; then a conditional marginal hazard model like MPH is specified for each $Z_j$, $j = 1, 2, 3$. Although estimation
procedures have been suggested in both papers, neither of them presents any large sample theory for the proposed estimators. Honore and Paula (2010, 2014) also criticize this type of generalization of the Marshall-Olkin model due to a lack of structural interpretation and the fact that the existence of a common shock to both durations seems too restrictive for economic applications. In sharp contrast with the existing generalizations of the Marshall-Olkin (1967) model, our construction circumvents those shortcomings.

4.3 Dependence Properties

In EMO, dependence between the two duration variables is generated by dependence between the unobserved characteristics of latent stochastic processes. We are also interested in how the dependence between latent processes transfers into the dependence between structural durations. Pearson's correlation coefficient and two popular rank correlation coefficients, Kendall's tau and Spearman's rho, are given in closed form for the bivariate durations in EMO. In this context, the advantage of EMO is borne out, compared with the calculation and bounds analysis for bivariate MPH in van den Berg (1997). With the help of the Lévy copula, our goal is to understand the dependence structure imposed on the resulting first passage times, or the structural durations. Straightforward algebra shows that the conditional survival copula function of the two durations is:

$$C_S(u, v \mid x) = S_{Y_1,Y_2}\left(S_{Y_1}^{-1}(u \mid x),\, S_{Y_2}^{-1}(v \mid x) \mid x\right) = uv \min\left(u^{-\omega_1(x)},\, v^{-\omega_2(x)}\right), \qquad (4.6)$$

where

$$\omega_1(x) = \frac{\lambda_3(x)}{\lambda_1(x) + \lambda_3(x)} \quad \text{and} \quad \omega_2(x) = \frac{\lambda_3(x)}{\lambda_2(x) + \lambda_3(x)}.$$

To see this, note that $S_{Y_\iota}^{-1}(u \mid x) = \Lambda^{-1}\left(\frac{-\log u}{\lambda_\iota(x) + \lambda_3(x)}\right)$ for $\iota = 1, 2$; then by monotonicity of $\Lambda(\cdot)$ we get:

$$C_S(u, v \mid x) = \begin{cases} u\,v^{1-\omega_2(x)}, & \text{if } \dfrac{-\log u}{\lambda_1(x) + \lambda_3(x)} \ge \dfrac{-\log v}{\lambda_2(x) + \lambda_3(x)}, \\[2ex] u^{1-\omega_1(x)}\,v, & \text{if } \dfrac{-\log u}{\lambda_1(x) + \lambda_3(x)} \le \dfrac{-\log v}{\lambda_2(x) + \lambda_3(x)}. \end{cases}$$

This is known as the Marshall-Olkin copula or the generalized Cuadras-Auge copula.
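As a quick numerical illustration of (4.6), the sketch below evaluates the closed-form survival copula and compares it with the value obtained by plugging marginal quantiles into the joint survival function. It is only a sketch under stated assumptions: the covariate value is held fixed, the time change is taken to be $\Lambda(t) = t$, and the function names and the coefficient values are hypothetical choices made for illustration.

```python
import math


def mo_survival_copula(u, v, lam1, lam2, lam3):
    """Closed form (4.6): u * v * min(u**(-w1), v**(-w2)) at a fixed covariate value."""
    w1 = lam3 / (lam1 + lam3)
    w2 = lam3 / (lam2 + lam3)
    return u * v * min(u ** (-w1), v ** (-w2))


def mo_copula_from_survival(u, v, lam1, lam2, lam3):
    """Same copula via S(S1^{-1}(u), S2^{-1}(v)), assuming Lambda(t) = t."""
    t1 = -math.log(u) / (lam1 + lam3)  # quantile of the first margin
    t2 = -math.log(v) / (lam2 + lam3)  # quantile of the second margin
    return math.exp(-lam1 * t1 - lam2 * t2 - lam3 * max(t1, t2))


if __name__ == "__main__":
    # Illustrative coefficients (lambda_1, lambda_2, lambda_3) = (1.0, 2.0, 0.5).
    for u, v in [(0.2, 0.9), (0.5, 0.5), (0.8, 0.3)]:
        closed = mo_survival_copula(u, v, 1.0, 2.0, 0.5)
        plugged = mo_copula_from_survival(u, v, 1.0, 2.0, 0.5)
        print(u, v, closed, plugged)
```

The two functions should agree up to rounding error at every point of the unit square, which is the content of the case-by-case display above.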
Although this copula is singular without a copula density function, it is an extreme value copula with Pickands' function (see Section 3.3.4 in Nelsen, 2006):

$$A(t \mid x) = 1 - \min\left(\omega_2(x)\,t,\ \omega_1(x)(1 - t)\right).$$

Notice that the conditional survival copula function (4.6) changes with $x$ as well, in contrast with the commonly considered situations with an invariant copula in Heckman and Honore (1989) and Abbring and van den Berg (2003). The conditional survival copula reveals that our subsequent analysis will not fit into the framework in Fan and Liu (2013), because it is not Archimedean, nor does it admit a copula density function with respect to the two-dimensional Lebesgue measure. Nevertheless, the dependence structure still delivers very informative empirical content. The next theorem summarizes several dependence measures which are directly linked to this (conditional) Marshall-Olkin copula, see Nelsen (2006). In comparison, only certain bounds for Kendall's tau are available for the bivariate MPH in van den Berg (1997).

Theorem 4.5 Let $\tau_K(x)$ stand for (conditional) Kendall's tau and $\rho_S(x)$ for (conditional) Spearman's rho. Moreover, let $t_U(x)$, $t_L(x)$ denote the (conditional) upper and lower tail dependence parameters. Then they are of closed forms in EMO:

$$\tau_K(x) = \frac{\omega_1(x)\,\omega_2(x)}{\omega_1(x) + \omega_2(x) - \omega_1(x)\,\omega_2(x)}, \qquad \rho_S(x) = \frac{3\,\omega_1(x)\,\omega_2(x)}{2[\omega_1(x) + \omega_2(x)] - \omega_1(x)\,\omega_2(x)},$$

$$t_U(x) = \min(\omega_2(x), \omega_1(x)), \qquad t_L(x) = 0.$$

For Lévy subordinators, any Lévy copula is between the two extreme cases, the independent Lévy copula $C_{L,\perp}(u, v)$ and the completely dependent Lévy copula $C_{L,\parallel}(u, v)$, with the following expressions (Cont and Tankov, 2003):

$$C_{L,\perp}(u, v) = u\,I\{v = \infty\} + v\,I\{u = \infty\}, \qquad C_{L,\parallel}(u, v) = \min\{u, v\}.$$

A convenient parametric Lévy copula which lies between those two extremes is the Lévy-Clayton copula with a single parameter $\theta > 0$:

$$C_{L,\theta}(u, v) = \left(u^{-\theta} + v^{-\theta}\right)^{-1/\theta}. \qquad (4.7)$$

One nice property related to the Lévy-Clayton copula is that the joint Lévy exponent function (4.3) can be further simplified as:

$$\Psi_{12}(u, v) = \Psi_1(u) + \Psi_2(v) - \kappa(u, v; \theta),$$

where

$$\kappa(u, v; \theta) = uv \iint_{R_+^2} e^{-y_1 u - y_2 v}\, C_{L,\theta}\left(U_1(y_1), U_2(y_2)\right) dy_1\, dy_2, \qquad (4.8)$$

with $U_\iota$ denoting the marginal tail integrals, see Proposition 1 in Epifani and Lijoi (2010). Note that the function $\kappa(u, v; \theta)$ summarizes all the dependence structure when the Lévy-Clayton copula is adopted. Given the representation (4.8), we have the following expressions for the coefficients in the extended Marshall-Olkin model:

$$\lambda_1(x) = \Psi_1(\phi_1(x)) - \kappa(\phi_1(x), \phi_2(x); \theta),$$
$$\lambda_2(x) = \Psi_2(\phi_2(x)) - \kappa(\phi_1(x), \phi_2(x); \theta), \qquad (4.9)$$
$$\lambda_3(x) = \kappa(\phi_1(x), \phi_2(x); \theta).$$

Consequently, the expressions for those exponents in (4.6) simplify to

$$\omega_1(x) = \frac{\kappa(\phi_1(x), \phi_2(x); \theta)}{\Psi_1(\phi_1(x))} \quad \text{and} \quad \omega_2(x) = \frac{\kappa(\phi_1(x), \phi_2(x); \theta)}{\Psi_2(\phi_2(x))}.$$

Moreover, the dependence of the two Lévy processes could be stated in terms of the (conditional) Pearson's correlation coefficient between $Y_1$ and $Y_2$. For bivariate MPH with Weibull baseline hazard functions, van den Berg (1997) obtains a closed form expression for Pearson's correlation coefficient, see equation (6) in van den Berg (1997). In comparison, our general expression (4.10) in the next proposition does not restrict the baseline hazard function.
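The closed forms in Theorem 4.5 are easy to evaluate and, for Kendall's tau, easy to check by simulation. The following is a minimal sketch, assuming a fixed covariate value, $\Lambda(t) = t$, and the three-shock representation $Y_1 = \min(Z_1, Z_3)$, $Y_2 = \min(Z_2, Z_3)$ of Section 4.2; the function names, sample size, and seed are hypothetical and chosen only for illustration.

```python
import random


def mo_dependence(w1, w2):
    """Closed forms of Theorem 4.5 for fixed exponents w1, w2 in (0, 1]."""
    tau = w1 * w2 / (w1 + w2 - w1 * w2)
    rho = 3.0 * w1 * w2 / (2.0 * (w1 + w2) - w1 * w2)
    return {"kendall_tau": tau, "spearman_rho": rho,
            "upper_tail": min(w1, w2), "lower_tail": 0.0}


def simulate_mo(n, lam1, lam2, lam3, seed=0):
    """Draw n pairs (Y1, Y2) from the original Marshall-Olkin three-shock model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)
        z2 = rng.expovariate(lam2)
        z3 = rng.expovariate(lam3)  # common shock hitting both durations
        pairs.append((min(z1, z3), min(z2, z3)))
    return pairs


def sample_kendall_tau(pairs):
    """O(n^2) sample Kendall's tau: (concordant - discordant) / number of pairs."""
    n = len(pairs)
    score = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            s = (xi - pairs[j][0]) * (yi - pairs[j][1])
            score += (s > 0) - (s < 0)
    return 2.0 * score / (n * (n - 1))


if __name__ == "__main__":
    lam1, lam2, lam3 = 1.0, 1.0, 1.0
    w1, w2 = lam3 / (lam1 + lam3), lam3 / (lam2 + lam3)
    print(mo_dependence(w1, w2))
    print(sample_kendall_tau(simulate_mo(1200, lam1, lam2, lam3)))
```

Because Kendall's tau is invariant to strictly monotone transformations of the margins, the simulated durations can be used directly without first converting them to copula scale.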
Theorem 4.6 (Epifani and Lijoi, 2010) Let $\rho(Y_1, Y_2 \mid x)$ stand for the conditional correlation coefficient when the Lévy copula is (4.7), then

$$\rho(Y_1, Y_2 \mid x) = \frac{\iint_{R_+^2} e^{-\Lambda(s)\Psi_1(\phi_1(x)) - \Lambda(t)\Psi_2(\phi_2(x))}\left(e^{\Lambda(s \wedge t)\,\kappa(\phi_1(x), \phi_2(x); \theta)} - 1\right) ds\, dt}{\prod_{j=1,2}\sqrt{2\int_0^\infty t\, e^{-\Lambda(t)\Psi_j(\phi_j(x))}\, dt - \left(\int_0^\infty e^{-\Lambda(t)\Psi_j(\phi_j(x))}\, dt\right)^2}}. \qquad (4.10)$$

In the special case when $\Lambda(t) = t$, we generalize the classical result in Section 3.2 of Marshall and Olkin (1967):

$$\rho(Y_1, Y_2 \mid x) = \frac{\kappa(\phi_1(x), \phi_2(x); \theta)}{\Psi_1(\phi_1(x)) + \Psi_2(\phi_2(x)) - \kappa(\phi_1(x), \phi_2(x); \theta)}.$$

5 Semiparametric Identification with Competing Risks

In this section we consider the empirical content of EMO under competing risks. In a competing risks model, we only observe $Y = \min(Y_1, Y_2)$ and the cause of failure $D$, which is defined slightly differently from the one in a standard competing risks model without simultaneous failure. Here $D = 1$ if $Y_1 < Y_2$, $D = 2$ if $Y_2 < Y_1$, and $D = 3$ if $Y_1 = Y_2$. The sampling information is summarized by the sub-distribution functions:

$$F_{Y,D=j}(t \mid X = x) = \Pr\{Y \le t, D = j \mid X = x\} \quad \text{for } j = 1, 2, 3.$$

From the discussion in Section 4, $(Y_1, Y_2)$ are still dependent conditional on covariates, hence the identification analysis calls for extra attention. The following lemma collects the expressions for the sub-density functions and already appears in Marshall and Olkin (1967). This convenient form is also utilized by Basu and Ghosh (1978) to show the identifiability of the standard MO distribution under competing risks. A short proof is given in Appendix A for completeness.

Lemma 5.1 (Basu and Ghosh, 1978) Assume $\Lambda(\cdot)$ is absolutely continuous with its density function $\lambda(\cdot)$, then the conditional sub-density functions $f_{Y,D=j}(t \mid x) = \partial F_{Y,D=j}(t \mid x)/\partial t$ are equal to

$$f_{Y,D=j}(t \mid x) = \lambda(t)\,\lambda_j(x)\exp[-\Lambda(t)\,\Psi_{12}(\phi_1(x), \phi_2(x))], \quad \text{for } j = 1, 2, 3.$$

It is worthwhile highlighting the novelty in our identification strategy, as existing methods do not apply here.
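Integrating the sub-densities in Lemma 5.1 over $t$ gives cause probabilities $\Pr\{D = j \mid x\} = \lambda_j(x)/[\lambda_1(x) + \lambda_2(x) + \lambda_3(x)]$, and summing them shows that $Y$ has an exponential-type survival function. The sketch below illustrates this with simulated competing risks data. It assumes a fixed covariate value, $\Lambda(t) = t$, and the three-shock representation of Section 4.2; the function name and the coefficient values are hypothetical.

```python
import random


def simulate_competing_risks(n, lam1, lam2, lam3, seed=0):
    """Return n observations (Y, D) with D in {1, 2, 3}; D = 3 marks simultaneous failure."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)
        z2 = rng.expovariate(lam2)
        z3 = rng.expovariate(lam3)  # common shock
        y1, y2 = min(z1, z3), min(z2, z3)
        if y1 < y2:
            d = 1
        elif y2 < y1:
            d = 2
        else:
            d = 3  # both durations end at the common shock time
        data.append((min(y1, y2), d))
    return data


if __name__ == "__main__":
    lam = (1.0, 1.5, 2.5)
    data = simulate_competing_risks(20000, *lam)
    n = len(data)
    for j in (1, 2, 3):
        freq = sum(1 for _, d in data if d == j) / n
        print(j, freq, lam[j - 1] / sum(lam))  # empirical vs. lambda_j / sum(lambda)
    # Mean of Y vs. 1 / (lambda_1 + lambda_2 + lambda_3) when Lambda(t) = t.
    print(sum(y for y, _ in data) / n, 1.0 / sum(lam))
```

Note that the event $D = 3$ occurs with positive probability here, which is exactly the simultaneous-failure feature that a standard competing risks model rules out.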
In our model, the conditional survival copula function of the two durations changes with the covariates' value, unlike the models in Heckman and Honore (1989), Abbring and van den Berg (2003), Lee (2006), and Lee and Lewbel (2013). Furthermore, this conditional survival copula is not Archimedean and does not even admit a density function, which goes beyond the framework in Fan and Liu (2013). But thanks to the Marshall-Olkin structure, we can first identify certain reduced-form functions which are determined jointly by the structural multiplicative effects from observable covariates and the characteristics of the latent Lévy subordinators. When covariates enter into the model via a linear index, those reduced-form functions generate two single-index models (Ichimura, 1993) from which we could identify the finite dimensional parameter in the indices and the marginal Lévy measures. Thereafter we proceed to identify the Lévy copula. We first list all the regularity assumptions, and then present the main theorem establishing the identifiability of all model primitives.

Assumption (A)

(1) The time-change function $\Lambda : R_+ \to R_+$ is nondecreasing and absolutely continuous with its density denoted by $\lambda(\cdot)$; moreover $\Lambda(0) = 0$ and $\lambda(0) = 1$.

(2) The two marginal Lévy measures satisfy $\int_{R_+} y\, \nu_\iota(dy) < \infty$ for $\iota = 1, 2$ and the joint Lévy measure satisfies $\int_{R_+^2} y_1 y_2\, \nu_{12}(dy_1, dy_2) < \infty$. Moreover, the two marginal tail integrals $U_\iota$ are continuous for $\iota = 1, 2$.

(3) The support of $(\phi_1(X), \phi_2(X))$ contains an open rectangle in $R_+^2$.

(4) For $\iota = 1, 2$, $\phi_\iota(x) = \exp(x^T \beta_\iota)$; assume $d \ge 2$, and the support of $X$ is not contained in any linear subspace of $R^d$; moreover $\beta_\iota \in S_+^{d-1}$, where

$$S_+^{d-1} \equiv \{\beta : \|\beta\|_2 = 1, \text{ and } \beta\text{'s first non-zero coordinate is positive}\}.$$

Theorem 5.1 Let Assumptions (A1)-(A4) hold, then all the structural components $(\Lambda, \beta_\iota, \nu_\iota, C_L)$ with $\iota = 1, 2$ are point identified.

The normalization in (A1), which restricts $\Lambda(0) = 0$ and $\lambda(0) = 1$, is innocuous since we do not observe the magnitude of the latent process anyway.
Besides the obvious example with $\Lambda(t) = t$, one could also consider higher order polynomials $\Lambda(t) = \sum_m c_m t^m$ or the following logarithm function which leads to the Pareto type distribution: $\Lambda(t) = \log(1 + ct)$ with $c > 0$. Assumption (A2) requires that the marginal Lévy measure has a finite first moment, which turns out to be equivalent to the corresponding moment restriction on $L_1(1)$ or $L_2(1)$, see Wolfe (1971). Correspondingly, assuming the finiteness of the first moment of a frailty term is widely adopted in the identification results of MPH, see Elbers and Ridder (1982), Heckman and Honore (1989), and Abbring and van den Berg (2003). In fact, such an assumption is almost necessary to identify MPH, see Ridder (1990). Again this confirms that a Lévy subordinator is a natural generalization of the frailty term from a process point of view. Here we also need the assumption $E[L_1(1) L_2(1)] < \infty$, restricting the joint Lévy measure. The variation restriction in (A3) on structural regressions is rather mild in view of Abbring and van den Berg (2003). With the linear index restrictions on covariates, the reduced-form conditional sub-density functions $f_{Y,D}(t \mid x)$ belong to the class of multiple-index models, whose identification typically relies on exclusion restrictions as in Ichimura and Lee (1991). But in our context, such an assumption could be circumvented since we could first identify certain single-index models because

$$\Psi_\iota\left(\exp(x'\beta_\iota)\right) = \Psi_{12}(\phi_1(x), \phi_2(x))\, E[D_\iota \mid X = x], \quad \text{for } \iota = 1, 2, \qquad (5.1)$$

where the auxiliary binary variables are

$$D_{\iota,i} = I\{D_i = \iota \text{ or } 3\} \quad \text{for } \iota = 1, 2, \qquad (5.2)$$

which prove to be useful in the estimation part as well. The function $\Psi_{12}(\phi_1(x), \phi_2(x))$ is identified from the sampling information in $Y$:

$$\Pr[Y > t \mid X = x] = \exp[-\Lambda(t)\,\Psi_{12}(\phi_1(x), \phi_2(x))].$$

In those compositions, the unknown link function $\Psi_\iota$ (the marginal Lévy-Laplace exponent function) could be identified without extra regularity assumptions, as those Lévy-Laplace exponent functions are automatically very smooth.
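To make relation (5.1) concrete at a single covariate value, the sketch below recovers the two link values $\Psi_\iota(\phi_\iota(x))$ from competing risks data as the product of an estimate of $\Psi_{12}$ and the sample mean of $D_\iota$. This is an illustration under simplifying assumptions, not the paper's estimator: the covariate value is fixed, $\Lambda(t) = t$ (so $Y$ is exponential with rate $\Psi_{12}$), there is no additional censoring, and the data come from the three-shock representation of Section 4.2. All function names and parameter values are hypothetical.

```python
import random


def simulate_competing_risks(n, lam1, lam2, lam3, seed=0):
    """Return n observations (Y, D) from the three-shock model with Lambda(t) = t."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1, z2, z3 = (rng.expovariate(lam1), rng.expovariate(lam2),
                      rng.expovariate(lam3))
        y1, y2 = min(z1, z3), min(z2, z3)
        d = 1 if y1 < y2 else (2 if y2 < y1 else 3)
        data.append((min(y1, y2), d))
    return data


def identify_links(data):
    """Plug-in version of (5.1) at one covariate value.

    Returns (psi12_hat, psi1_hat, psi2_hat), where psi12_hat is the
    exponential-rate estimate n / sum(Y) and psi_iota_hat = psi12_hat * mean(D_iota).
    """
    n = len(data)
    psi12_hat = n / sum(y for y, _ in data)
    mean_d1 = sum(1 for _, d in data if d in (1, 3)) / n  # D_1 = I{D = 1 or 3}
    mean_d2 = sum(1 for _, d in data if d in (2, 3)) / n  # D_2 = I{D = 2 or 3}
    return psi12_hat, psi12_hat * mean_d1, psi12_hat * mean_d2


if __name__ == "__main__":
    lam1, lam2, lam3 = 1.0, 1.5, 2.5
    est = identify_links(simulate_competing_risks(20000, lam1, lam2, lam3))
    # Targets: Psi_12 = lam1 + lam2 + lam3, Psi_1 = lam1 + lam3, Psi_2 = lam2 + lam3.
    print(est, (lam1 + lam2 + lam3, lam1 + lam3, lam2 + lam3))
```

Repeating this calculation across covariate values would trace out the two single-index functions whose indices and link functions are the objects of interest.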
In the end, after all the marginal components are identified, the Lévy copula is also uniquely determined given Sklar's theorem in Kallsen and Tankov (2006).

6 Semiparametric Estimation with Competing Risks

Considering semiparametric estimation, we will focus on estimating $\beta_\iota$ for $\iota = 1, 2$ with a parameterized baseline hazard function. We denote the cumulative hazard function as $\Lambda(t; \alpha)$ and the hazard density function as $\lambda(t; \alpha)$, with its univariate true parameter $\alpha_0$. Notice that for the random variable $Y$, its conditional hazard function is shown to be of proportional structure in Section 5:

$$\lambda_Y(t \mid X = x) = \lambda(t)\exp[\psi(x)], \quad \text{where } \psi(x) = \log \Psi_{12}(\phi_1(x), \phi_2(x)).$$

Therefore we could estimate $\psi(\cdot)$ and its derivatives using Fan, Gijbels and King (1997) even when $Y$ is subject to some additional random censoring $C$, due to drop-out or early termination of the study. Although independent censoring can be handled by traditional hazard-based models in duration or survival analysis, its incorporation into recent structural duration models calls for extra attention. For example, even though Abbring's (2012) MHT is naturally expressed in terms of the conditional Laplace transform, one has to invert the Laplace transform back to the probability density function to handle additional independent censoring. In comparison, such a complication could be easily embedded in our EMO. Let $V = \min(Y, C)$ and $\delta = I[Y < C]$. From now on we shall explicitly work with i.i.d. observations $Z_i = (V_i, \delta_i, \delta_i D_i, X_i)$. We can only distinguish $D_i$ if $Y_i$ is not subject to additional independent censoring from $C_i$. It is well known that the finite-dimensional parameter in the single-index model is only identifiable up to a scale normalization (Ichimura, 1993).
Referring to (5.1), we see that the following parameter $\eta_\iota$ is scale-equivalent to $\beta_\iota$ based on average derivatives (Stoker, 1986; Powell, Stock, and Stoker, 1989):

$$\eta_\iota = E\left[a_{D_\iota}(X_i)\, b_Y(X_i) \exp(a_Y(X_i)) + \exp(a_Y(X_i))\, b_{D_\iota}(X_i)\right], \quad \text{for } \iota = 1, 2, \qquad (6.1)$$

where $a_{D_\iota}(x) = E[D_\iota \mid X = x]$, $b_{D_\iota}(x) = \frac{\partial}{\partial x} E[D_\iota \mid X = x]$, $a_Y(x) = \psi(x)$, and $b_Y(x) = \frac{\partial}{\partial x}\psi(x)$. Hence, the natural estimators for $\eta_\iota$ are

$$\hat{\eta}_\iota = \frac{1}{n}\sum_{i=1}^{n}\left\{\hat{a}_{D_\iota}(X_i)\, \hat{b}_Y(X_i) \exp(\hat{a}_Y(X_i)) + \exp(\hat{a}_Y(X_i))\, \hat{b}_{D_\iota}(X_i)\right\}, \quad \text{for } \iota = 1, 2, \qquad (6.2)$$

with $\hat{a}(\cdot)$ and $\hat{b}(\cdot)$ being the estimated intercept term and slope coefficients obtained from local polynomial fitting. Starting from Fan and Gijbels (1996), the local polynomial estimator has become a default choice if one plans to estimate either the function value or its various partial derivatives. Chaudhuri, Doksum, and Samarov (1997) and Li, Lu, and Ullah (2003) have studied average derivative estimators from a general multivariate local polynomial fitting, and both demonstrated better finite sample performance compared with the local constant based approaches in Hardle and Stoker (1989) and Powell, Stock, and Stoker (1989). Here we follow this route, since our "average derivative" involves both the function values and their first order derivatives together.

Let us pin down some notation for the local polynomial estimator of order $p$ (adapted from Tsybakov, 2008), with $d$-dimensional covariates. Let $v = (v_1, \ldots, v_d)$ with $|v| = v_1 + \cdots + v_d$, and let $P$ be the total number of $v$ with $|v| \le p$. The following vector $U(z)$ is in $R^P$:

$$U(z) = \left(\frac{z^v}{v!}\right)^T_{|v| \le p}, \qquad (6.3)$$

where $v! = \prod_{i=1}^{d} v_i!$, the vectors $v \in N^d$ being ordered lexicographically, and $z^v = \prod_{i=1}^{d} z_i^{v_i}$. Furthermore, we define two types of selection vectors $e \in R^P$ to pick up the estimators for the function value and its first order derivatives respectively:

$$e_{0,P}^T = (1, 0, 0, \ldots, 0), \qquad e_{s,P}^T = (0, \underbrace{0, \ldots, 1, \ldots, 0}_{d}, 0, \ldots, 0),$$

where the entry 1 sits in the $s$-th position of the block of length $d$, and $1 \le s \le d$.
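The notation in (6.3) can be made concrete with a short sketch that enumerates the multi-indices, builds $U(z)$, and forms the selection vectors. It is only an illustration: the ordering used here (by total degree, then lexicographically with the first coordinate leading) is an assumed convention chosen so that the intercept comes first and the $d$ first-order terms follow, and the function names are hypothetical.

```python
from itertools import product
from math import factorial, prod


def multi_indices(d, p):
    """All v in N^d with |v| <= p, ordered by total degree, then by leading coordinates."""
    idx = [v for v in product(range(p + 1), repeat=d) if sum(v) <= p]
    return sorted(idx, key=lambda v: (sum(v), tuple(-c for c in v)))


def U(z, p):
    """Vector (z^v / v!) over |v| <= p, as in (6.3); its length is P."""
    d = len(z)
    out = []
    for v in multi_indices(d, p):
        power = prod(zi ** vi for zi, vi in zip(z, v))   # z^v
        fact = prod(factorial(vi) for vi in v)           # v!
        out.append(power / fact)
    return out


def selection_vector(s, d, p):
    """e_{s,P}: s = 0 picks the intercept, 1 <= s <= d picks the s-th first-order term."""
    P = len(multi_indices(d, p))
    e = [0.0] * P
    e[s] = 1.0
    return e


if __name__ == "__main__":
    print(multi_indices(2, 2))
    print(U((2.0, 3.0), 2))
    print(selection_vector(1, 2, 2))
```

With $d = 2$ and $p = 2$ there are $P = 6$ terms, and the inner product of `selection_vector(s, d, p)` with a fitted coefficient vector extracts the intercept ($s = 0$) or the coefficient on the $s$-th first-order term.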
We need to carry out the local polynomial estimation for the dependent variables $D_\iota$ and $Y$ separately. For notational brevity, we shall use the same kernel density function $K(u)$, bandwidth $h_n$, and polynomial order $p$ in both estimations. We write $K_{h_n}(\cdot) = h_n^{-d} K(\cdot/h_n)$ and $U_{h_n}(\cdot) = U(\cdot/h_n)$. The local polynomial fitting associated with $D_\iota$ is standard and solves the following weighted least squares problem, see Fan and Gijbels (1996):

$$\hat{\gamma}_{D_\iota}(x) = \arg\min_{\gamma}\ \frac{1}{n}\sum_{i=1}^{n}\left[D_{\iota,i} - \gamma^T U_{h_n}(X_i - x)\right]^2 \delta_i K_{h_n}(X_i - x). \qquad (6.4)$$

Whenever we evaluate the estimate at a sample point $X_i$, we shall use its leave-one-out version as in Li, Lu, and Ullah (2003):

$$\hat{\gamma}_{D_\iota}(X_i) = \arg\min_{\gamma}\ \frac{1}{n}\sum_{j \ne i}\left[D_{\iota,j} - \gamma^T U_{h_n}(X_j - X_i)\right]^2 \delta_j K_{h_n}(X_j - X_i).$$

The weights $(\delta_i)_{i=1}^{n}$ indicate that the estimation is only carried out for the uncensored subsample. Let the estimators for the intercept and slope coefficients be denoted as:

$$\hat{a}_{D_\iota}(X_i) = e_{0,P}^T\, \hat{\gamma}_{D_\iota}(X_i), \quad \text{and} \quad \hat{b}_{D_\iota,s}(X_i) = h_n^{-1} e_{s,P}^T\, \hat{\gamma}_{D_\iota}(X_i),$$

and $\hat{b}_{D_\iota}(\cdot) = \left(\hat{b}_{D_\iota,1}(\cdot), \ldots, \hat{b}_{D_\iota,d}(\cdot)\right)^T$.

As for the second local polynomial estimation, we consider the minimizer of the (negative) local log-likelihood function in Fan, Gijbels, and King (1997), where $\left(\hat{\gamma}_Y(x), \hat{\alpha}(x)\right) = \arg\min_{\gamma, \alpha}\ l_n(x; \gamma, \alpha)$ with:

$$l_n(x; \gamma, \alpha) = -\frac{1}{n}\sum_{i=1}^{n}\left[\delta_i\left\{\log \lambda(V_i; \alpha) + \gamma^T U_{h_n}(X_i - x)\right\} - \Lambda(V_i; \alpha)\exp\left(\gamma^T U_{h_n}(X_i - x)\right)\right] K_{h_n}(X_i - x).$$

Therefore $\hat{a}_Y(X_i)$ and $\hat{b}_Y(X_i)$ are defined in a similar way based on the leave-one-out version. The following theorem shows the desired asymptotic normality of our estimator, and we spell out the form of its asymptotic covariance in Appendix B.

Theorem 6.1 Under Assumptions (B1)-(B5) and (C1)-(C5) in Appendix B, the following asymptotic normality holds for $\iota = 1, 2$:

$$\sqrt{n}\left(\hat{\eta}_\iota - \eta_\iota\right) \Longrightarrow N(0, \Sigma_\iota).$$

7 Conclusion

This is the first paper on a competing risks model where the structural durations are driven by dependent stochastic processes, without parameterizing the latent processes.
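The weighted least squares problem (6.4) has a closed-form solution. The sketch below works it out for the simplest case of a scalar covariate ($d = 1$) and a local linear fit ($p = 1$), so that $U_{h_n}(X_i - x) = (1, (X_i - x)/h_n)^T$. It is a minimal illustration rather than the paper's implementation: the Epanechnikov kernel, the function names, and the omission of the leave-one-out step are all simplifying assumptions.

```python
def epanechnikov(u):
    """Compactly supported kernel on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def local_linear(x0, xs, ds, deltas, h):
    """Solve (6.4) for d = 1, p = 1 and return (a_hat, b_hat) at x0.

    xs: covariates, ds: binary (or real) responses, deltas: censoring
    indicators used as weights, h: bandwidth. The slope coefficient on
    (x - x0) / h is rescaled by 1 / h to estimate the derivative.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, d, delta in zip(xs, ds, deltas):
        z = (x - x0) / h
        w = delta * epanechnikov(z) / h      # delta_i * K_h(x_i - x0)
        s0 += w
        s1 += w * z
        s2 += w * z * z
        t0 += w * d
        t1 += w * z * d
    det = s0 * s2 - s1 * s1                  # determinant of the 2 x 2 normal matrix
    gamma0 = (s2 * t0 - s1 * t1) / det       # intercept: estimate of a(x0)
    gamma1 = (s0 * t1 - s1 * t0) / det       # coefficient on (x - x0) / h
    return gamma0, gamma1 / h


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(101)]
    ds = [2.0 + 3.0 * x for x in xs]         # noiseless linear response
    deltas = [1] * len(xs)
    print(local_linear(0.5, xs, ds, deltas, 0.2))
```

A local linear fit reproduces a linear regression function exactly, which gives a convenient sanity check: in the demonstration the intercept equals the function value at the evaluation point and the rescaled slope equals the true derivative.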
The latent Lévy subordinators could be seen as the natural variant of the static heterogeneity terms in MPH from a process point of view, and they closely resemble the notion of usage and wear-out effect (Singpurwalla, 1995). Hence, besides optimal stopping time problems in economics, our model also applies to scenarios concerning life or failure times in biology, medical science, and engineering, where the traditional hazard-based models dominate. In this Extended Marshall-Olkin (EMO) model, it is transparent how the deep structural parameters of theoretical models can be related to reduced-form functions, and all model primitives could be point identified for various semiparametric specifications. The identification strength further leads to root-n consistent and asymptotically normal estimators for parameters in the structural proportional effects.

Interestingly, the model generates equal durations with positive probability. This feature is motivated by an empirical application on the joint retirement decisions of married couples, as a significant portion of couples choose to retire at the same time, a pattern documented across different datasets (Honore and Paula, 2014). This key feature is ruled out from the very beginning in traditional duration or survival analysis. In this application, the Lévy subordinator with its increasing sample path is more suitable to represent the latent aging process or the degradation of the health condition of the elderly, as opposed to the spectrally negative Lévy process in Abbring (2012), which fluctuates up and down. There is no positive jump in the spectrally negative Lévy process by definition, but unexpected health shocks, such as heart attacks or new cancer diagnoses, are quite common for people near retirement (Coile, 2004). The machinery developed in this paper provides a nice complementary tool compared with existing methods that have been used.

Numerous extensions are possible.
First, one could incorporate endogenous and time-varying covariates by modeling additional stochastic processes explicitly. Renault, van den Heijden, and Werker (2014) present a structural model for transaction time (duration) and asset prices (associated marks) driven by multivariate Brownian motion. In their model, successive passage times of one latent Gaussian component relative to random boundaries define durations and the other correlated components generate the marks. It is both interesting and challenging to search for other processes which lead to a tractable structure. Second, it is desirable to relax the parametric assumption on the threshold variable and to consider a more general Lévy process, i.e., the spectrally positive Lévy process (Boyarchenko and Levendorskii, 2014). Last but not least, it is worthwhile to study the empirical content of stochastic game models driven by a Lévy process; see Chapter 11 in Kyprianou (2006).

8 Appendix A. Proofs of Main Results

In this section, we prove the main theoretical results in the paper, including deriving the expressions of the joint survival function and the sub-density functions, proving the identifiability of all model primitives, and showing asymptotic normality of the semiparametric estimator.

Proof. (of Theorem 4.1) We consider two cases separately in the proof. When $t_1 \le t_2$, we get

$$\begin{aligned} \Pr\{Y_1 > t_1, Y_2 > t_2 \mid X = x\} &= \Pr\{e_1 > \phi_1(x) L_1[\Lambda(t_1)],\ e_2 > \phi_2(x) L_2[\Lambda(t_2)]\} \\ &= E\{\exp(-\phi_1(x) L_1[\Lambda(t_1)] - \phi_2(x) L_2[\Lambda(t_2)])\} \\ &= E\{\exp[-(\phi_1(x) L_1[\Lambda(t_1)] + \phi_2(x) L_2[\Lambda(t_1)]) - \phi_2(x)(L_2[\Lambda(t_2)] - L_2[\Lambda(t_1)])]\} \\ &= \exp[-(\Lambda(t_2) - \Lambda(t_1))\,\Psi_2(\phi_2(x)) - \Lambda(t_1)\,\Psi_{12}(\phi_1(x), \phi_2(x))]. \end{aligned}$$

In the above string of equalities, the first one comes from the way we define $Y_1$, $Y_2$ as first passage times, while the second one utilizes the distributional assumption on the random thresholds. The third equality prepares for using the independence between increments of Lévy processes, while the final one follows from the Lévy-Khintchine representation.
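The case $t_1 \le t_2$ can be checked by simulation in a deliberately simple special case. The sketch below assumes, purely for illustration, that both latent processes coincide with a single Poisson process with rate `mu` and unit jumps (complete dependence), that $\phi_1 = \phi_2 = 1$, and that $\Lambda(t) = t$. Under these assumptions $\Psi_\iota(u) = \mu(1 - e^{-u})$ and $\Psi_{12}(u, v) = \mu(1 - e^{-(u+v)})$. The function names, the rate, and the evaluation point are hypothetical.

```python
import math
import random


def first_passage_pair(mu, rng):
    """One draw of (Y1, Y2), with Y = inf{t : N(t) >= e} for a rate-mu Poisson process N."""
    e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)        # unit exponential thresholds
    k1, k2 = max(1, math.ceil(e1)), max(1, math.ceil(e2))      # jumps needed to cross each one
    t, arrivals = 0.0, []
    for _ in range(max(k1, k2)):
        t += rng.expovariate(mu)                               # next jump time of N
        arrivals.append(t)
    return arrivals[k1 - 1], arrivals[k2 - 1]


def survival_formula(t1, t2, mu):
    """Joint survival probability from the proof, valid for t1 <= t2."""
    psi2 = mu * (1.0 - math.exp(-1.0))     # Psi_2(1)
    psi12 = mu * (1.0 - math.exp(-2.0))    # Psi_12(1, 1)
    return math.exp(-(t2 - t1) * psi2 - t1 * psi12)


def survival_monte_carlo(t1, t2, mu, n=20000, seed=0):
    """Empirical frequency of the event {Y1 > t1, Y2 > t2}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y1, y2 = first_passage_pair(mu, rng)
        if y1 > t1 and y2 > t2:
            hits += 1
    return hits / n


if __name__ == "__main__":
    print(survival_formula(0.5, 1.0, 1.0), survival_monte_carlo(0.5, 1.0, 1.0))
```

Since both thresholds are crossed by the same jump whenever they fall in the same unit interval, this special case also produces $Y_1 = Y_2$ with positive probability.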
Analogously for the other case $t_1 \ge t_2$, we have:

$$\Pr\{Y_1 > t_1, Y_2 > t_2 \mid X = x\} = \exp[-(\Lambda(t_1) - \Lambda(t_2))\,\Psi_1(\phi_1(x)) - \Lambda(t_2)\,\Psi_{12}(\phi_1(x), \phi_2(x))].$$

Now the unified expression in (4.4) just summarizes those two cases, and the functions $\lambda_j(x)$ arise from a simple reparameterization. When it comes to the non-negativeness of $\lambda_j(x)$, the following inequalities hold

$$\max\{\Psi_1(u), \Psi_2(v)\} \le \Psi_{12}(u, v) \le \Psi_1(u) + \Psi_2(v),$$

for the marginal and joint Lévy-Laplace exponent functions and non-negative $(u, v)$. For an arbitrary Lévy copula $C_L$ defined on $R_+^2$, the Frechet-Hoeffding type inequality (Cont and Tankov, 2004) holds:

$$C_{L,\perp} \le C_L \le C_{L,\parallel},$$

where

$$C_{L,\perp}(u, v) = u\,I\{v = +\infty\} + v\,I\{u = +\infty\}, \qquad C_{L,\parallel}(u, v) = \min\{u, v\}.$$

Here $C_{L,\perp}(u, v)$ is the independence copula and $C_{L,\parallel}(u, v)$ is the complete dependence copula function, respectively. Notice that $1 - e^{-y_1 u - y_2 v}$ is the distribution function of a bivariate exponential distribution with parameters $(u, v)$, which we denote by $F_{u,v}(\cdot, \cdot)$. It is clear that

$$\begin{aligned} \Psi_{12}(u, v) &= \iint_{R_+^2} \left(1 - e^{-y_1 u - y_2 v}\right) dC_L\left(U_1(y_1), U_2(y_2)\right) \\ &= \iint_{R_+^2} C_L\left(U_1(y_1), U_2(y_2)\right) dF_{u,v}(y_1, y_2) \\ &\le \iint_{R_+^2} C_{L,\perp}\left(U_1(y_1), U_2(y_2)\right) dF_{u,v}(y_1, y_2) \\ &= \Psi_1(u) + \Psi_2(v), \end{aligned}$$

where integration by parts is used in the second equality. Similarly, we have

$$\begin{aligned} \Psi_{12}(u, v) &= \iint_{R_+^2} C_L\left(U_1(y_1), U_2(y_2)\right) dF_{u,v}(y_1, y_2) \\ &\ge \iint_{R_+^2} C_{L,\parallel}\left(U_1(y_1), U_2(y_2)\right) dF_{u,v}(y_1, y_2) \\ &= \iint_{R_+^2} \left(1 - e^{-y_1 u - y_2 v}\right) dC_{L,\parallel}\left(U_1(y_1), U_2(y_2)\right) \\ &\ge \max\{\Psi_1(u), \Psi_2(v)\}, \end{aligned}$$

where the final inequality follows from the fact that

$$1 - e^{-y_1 u - y_2 v} \ge \max\left\{1 - e^{-y_1 u},\ 1 - e^{-y_2 v}\right\},$$

and the integration is carried out on the line $\{(y_1, y_2) : U_1(y_1) = U_2(y_2)\}$.

Proof. (of Lemma 5.1) Recall that the conditional joint survival function is

$$S(t_1, t_2 \mid x) = \exp[-\lambda_1(x)\,\Lambda(t_1) - \lambda_2(x)\,\Lambda(t_2) - \lambda_3(x)\,\Lambda[\max(t_1, t_2)]],$$

whence for $t_1 < t_2$, it simplifies to $\exp[-\lambda_1(x)\,\Lambda(t_1) - [\lambda_2(x) + \lambda_3(x)]\,\Lambda(t_2)]$.
Upon differentiation, it has a density of the following form:

$$\lambda_1(x)[\lambda_2(x) + \lambda_3(x)]\,\lambda(t_1)\,\lambda(t_2)\exp[-\lambda_1(x)\,\Lambda(t_1) - [\lambda_2(x) + \lambda_3(x)]\,\Lambda(t_2)].$$

Integrating the above density over the range $[t, +\infty) \times (t_1, +\infty)$ delivers the conditional sub-survival function for $D = 1$:

$$\begin{aligned} S_{Y,D=1}(t \mid x) &= \int_t^\infty \int_{t_1}^\infty \lambda_1(x)[\lambda_2(x) + \lambda_3(x)]\,\lambda(t_1)\,\lambda(t_2)\exp[-\lambda_1(x)\,\Lambda(t_1) - [\lambda_2(x) + \lambda_3(x)]\,\Lambda(t_2)]\, dt_2\, dt_1 \\ &= \int_t^\infty \lambda_1(x)\exp[-[\lambda_1(x) + \lambda_2(x) + \lambda_3(x)]\,\Lambda(t_1)]\, d\Lambda(t_1) \\ &= \frac{\lambda_1(x)}{\lambda_1(x) + \lambda_2(x) + \lambda_3(x)}\exp[-[\lambda_1(x) + \lambda_2(x) + \lambda_3(x)]\,\Lambda(t)]. \end{aligned}$$

A similar derivation gives the conditional sub-survival function for $D = 2$:

$$S_{Y,D=2}(t \mid x) = \frac{\lambda_2(x)}{\lambda_1(x) + \lambda_2(x) + \lambda_3(x)}\exp[-[\lambda_1(x) + \lambda_2(x) + \lambda_3(x)]\,\Lambda(t)].$$

Hence, subtracting the sum of those two sub-survival functions from the joint survival function evaluated at $t_1 = t_2 = t$, we get

$$S_{Y,D=3}(t \mid x) = \frac{\lambda_3(x)}{\lambda_1(x) + \lambda_2(x) + \lambda_3(x)}\exp[-[\lambda_1(x) + \lambda_2(x) + \lambda_3(x)]\,\Lambda(t)].$$

Notice that the following equation holds

$$\lambda_1(x) + \lambda_2(x) + \lambda_3(x) = \Psi_{12}(\phi_1(x), \phi_2(x)),$$

given the reparameterization in (4.5). Finally, the claimed expressions of the conditional sub-density functions follow by differentiating the corresponding sub-survival functions.

Proof. (of Theorem 5.1) (Step 1) First, let us summarize the sampling information in the competing risks model that is crucial for the subsequent analysis. Referring to the auxiliary random variables $D_{\iota,i} = I\{D_i = \iota \text{ or } 3\}$ with $\iota = 1, 2$, we have

$$\Pr[D_\iota = 1 \mid X = x] = \frac{\lambda_\iota(x) + \lambda_3(x)}{\lambda_1(x) + \lambda_2(x) + \lambda_3(x)} \qquad (A.1)$$

from Lemma 5.1. The numerator on the right hand side of (A.1) delivers two single-index models as

$$\Psi_\iota\left(\exp(x'\beta_\iota)\right) = \lambda_\iota(x) + \lambda_3(x), \quad \text{for } \iota = 1, 2, \qquad (A.2)$$

which leads to

$$\Pr[D_\iota = 1 \mid X = x] = \frac{\Psi_\iota\left(\exp(x'\beta_\iota)\right)}{\Psi_{12}(\phi_1(x), \phi_2(x))} \quad \text{for } \iota = 1, 2. \qquad (A.3)$$

Moreover, we have

$$\Pr[Y > t \mid X = x] = \exp[-\Lambda(t)\,\Psi_{12}(\phi_1(x), \phi_2(x))],$$

which asserts that the conditional hazard function of $Y$ is of proportional structure:

$$\lambda_Y(t \mid X = x) = \lambda(t)\,\Psi_{12}(\phi_1(x), \phi_2(x)). \qquad (A.4)$$

(Step 2) Notice that we would be able to identify $\Psi_\iota$ and $\beta_\iota$ from these two single-index models in (A.2) if the denominator in (A.3) is identifiable.
This is indeed the case as $\Psi_{12}(\phi_1(x), \phi_2(x)) = f_Y(0 \mid x)$, given the multiplicative structure in (A.4) and the normalization assumption (A1) imposed on the time-change function $\Lambda$. Thereafter, $\Lambda(t)$ could be identified from the conditional survival function of $Y$. The identifiability of $\beta_\iota$ follows from Theorem 1 in Lin and Kulasekera (2007) given our Assumption (A4) and the smoothness of the Lévy-Laplace exponent functions $\Psi_\iota$. As link functions, $\Psi_\iota$ are identified within the support of $\exp(x'\beta_\iota)$ as shown by Lin and Kulasekera (2007). The real analytic property of $\Psi_\iota$ allows a unique extension to the whole positive real line, by Proposition 1 in Abbring and van den Berg (2003).

(Step 3) Given the identification of $\Psi_\iota$, we can identify the marginal Lévy measure $\nu_\iota$ in the following way. Recall that

$$\Psi_\iota(u) = \int_{R_+}\left(1 - e^{-yu}\right)\nu_\iota(dy), \quad \text{for } \iota = 1, 2.$$

$\Psi_1(\cdot)$, $\Psi_2(\cdot)$ belong to the class of so-called Bernstein functions, meaning their derivatives are completely monotone as in Theorem 3.2 in Schilling, Song, and Vondracek (2012). Let the operator $\mathcal{L}$ denote the Laplace transform of a positive measure and define two positive measures $\tilde{\nu}_\iota(dy) = y\,\nu_\iota(dy)$ for $\iota = 1, 2$. Then we get

$$\frac{\partial}{\partial u}\Psi_1(u) = \int_{R_+} e^{-yu}\, y\, \nu_1(dy) = \mathcal{L}(\tilde{\nu}_1), \qquad \frac{\partial}{\partial v}\Psi_2(v) = \int_{R_+} e^{-yv}\, y\, \nu_2(dy) = \mathcal{L}(\tilde{\nu}_2),$$

by differentiating with respect to $u$ and $v$ respectively. Assumption (A2) guarantees that $\tilde{\nu}_\iota(\cdot)$ are finite measures for $\iota = 1, 2$. Hence, by Theorem 12b in Widder (1947) and Proposition 1 in Abbring and van den Berg (2003), we could identify $\tilde{\nu}_1$, $\tilde{\nu}_2$ from $\mathcal{L}(\tilde{\nu}_1)$, $\mathcal{L}(\tilde{\nu}_2)$, as long as we get enough variation on some non-empty open sets as in Assumption (A3). Thereafter, $\nu_\iota$ are also identified for $\iota = 1, 2$.

(Step 4) It suffices to identify the Lévy copula function to fully pin down the characteristics of the Lévy subordinators. So far, we have already identified $\Psi_{12}(\phi_1(x), \phi_2(x))$ and $\phi_\iota(x) = \exp(x^T\beta_\iota)$ for $\iota = 1, 2$. Hence $\Psi_{12}(u, v)$ is also identified. Notice that

$$-\frac{\partial^2}{\partial u\, \partial v}\Psi_{12}(u, v) = \int_{R_+^2} e^{-y_1 u - y_2 v}\, y_1 y_2\, \nu_{12}(dy_1, dy_2) = \mathcal{L}(\tilde{\nu}_{12}).$$

Whence, we could identify $\tilde{\nu}_{12}(dy_1, dy_2) = y_1 y_2\, \nu_{12}(dy_1, dy_2)$ by the bivariate Laplace inversion theorem in Widder (1947). Given the joint Lévy measure, its copula function $C_L$ is unique by the Sklar theorem in Kallsen and Tankov (2006), given the absolute continuity of the tail integrals associated with the marginal Lévy measures.

Proof. (of Theorem 6.1) The proof of the main theorem is patterned after Hardle and Stoker (1989) and Li, Lu, and Ullah (2003), but the details are slightly more involved, as we are dealing with various intercept and slope estimates together. We decompose the statistic into three categories: the first is the centered i.i.d. average of the infeasible estimator, while the second part consists of several U-statistics. The third, a remainder term, is shown to be negligible by applying supremum bounds on various local polynomial estimators. The U-statistics in the second part are further linearized via the Hoeffding decomposition. Thereafter the desired asymptotic normality follows from the behavior of those centered linear terms.

(Step 1) We prove the uniform consistency of the local polynomial estimators and show that the rates of convergence are

$$\hat{a}(x) - a(x) = O_p\left(\sqrt{\frac{\log n}{n h_n^d}}\right), \quad \text{and} \quad \hat{b}(x) - b(x) = O_p\left(\sqrt{\frac{\log n}{n h_n^{d+2}}}\right),$$

for $D_\iota$ or $Y$.

(Step 2) We have the following decomposition of $\hat{\eta}_\iota$:

$$\hat{\eta}_\iota = t_{n0} + \sum_{l=1}^{4} t_{nl} + o_p\left(n^{-1/2}\right),$$

where

$$\begin{aligned} t_{n0} &= n^{-1}\sum_{i=1}^{n}\left\{a_{D_\iota}(X_i)\, b_Y(X_i)\, e^{a_Y(X_i)} + e^{a_Y(X_i)}\, b_{D_\iota}(X_i)\right\}, \\ t_{n1} &= n^{-1}\sum_{i=1}^{n} b_Y(X_i)\, e^{a_Y(X_i)}\left[\hat{a}_{D_\iota}(X_i) - a_{D_\iota}(X_i)\right], \\ t_{n2} &= n^{-1}\sum_{i=1}^{n} e^{a_Y(X_i)}\left[\hat{b}_{D_\iota}(X_i) - b_{D_\iota}(X_i)\right], \\ t_{n3} &= n^{-1}\sum_{i=1}^{n} e^{a_Y(X_i)}\left(a_{D_\iota}(X_i)\, b_Y(X_i) + b_{D_\iota}(X_i)\right)\left[\hat{a}_Y(X_i) - a_Y(X_i)\right], \\ t_{n4} &= n^{-1}\sum_{i=1}^{n} a_{D_\iota}(X_i)\, e^{a_Y(X_i)}\left[\hat{b}_Y(X_i) - b_Y(X_i)\right]. \end{aligned}$$

(Step 3) The statistics $t_{nl}$ are all U-statistics for $l = 1, \ldots, 4$.
We obtain the linear representation

$$t_{nl} = n^{-1}\sum_{i=1}^{n}\varphi_l(Z_i) + o_p\left(n^{-1/2}\right), \quad \text{for } l = 1, \ldots, 4.$$

We refer readers to Lemma B.5 and Lemma B.8 for explicit expressions of $\varphi_l(\cdot)$, $l = 1, \ldots, 4$.

(Step 4) The terms associated with $\varphi_3(\cdot)$ and $\varphi_4(\cdot)$ can be written as local martingales attached to the counting processes $N_i(t) = I[V_i \le t, \delta_i = 1]$, $i = 1, \ldots, n$. Let the associated random intensity process be specified as

$$\lambda_i(t) = I(V_i > t)\,\lambda(t)\exp(\psi(X_i)).$$

Thus

$$M_i(t) = N_i(t) - \int_0^t \lambda_i(u)\, du$$

are local martingales. Because the regressor is time-invariant, the local martingale has zero mean conditional on all $\{X_i\}_{i=1}^{n}$. Hence the asymptotic normality of $t_{n0}, t_{n1}, t_{n2}$ could be checked apart from $t_{n3}, t_{n4}$. Those two parts are in fact orthogonal. The asymptotic normality then follows by verifying the conditions in Rebolledo's CLT for local martingales, while the standard Lindeberg CLT applies to the sample average of i.i.d. terms.

9 Appendix B. Proofs of Linear Representations

Our analysis of average derivative type estimators follows the same scheme as in Powell, Stock, and Stoker (1989), Hardle and Stoker (1989), Chaudhuri, Doksum, and Samarov (1997), and Li, Lu, and Ullah (2003). Two key components are the uniform linear representation of local polynomial estimators and the linearization of U-statistics. For the first purpose, we rely on some recent results by Einmahl and Mason (2005), while for the second we resort to Gine and Mason (2007). We let $M$ be a generic finite positive constant whose value might change from line to line, and occasionally we use subscripts when more than one constant appears in the same display.

First recall some standard definitions in Einmahl and Mason (2005). Let $\mathcal{F}$ collect some functions $f : R^d \to R$.
We say the class $\mathcal{F}$ is of VC type with respect to an envelope $F$ if the covering number $N(\mathcal{F}, L_2(Q), \varepsilon)$, the smallest number of $L_2(Q)$ open balls of radius $\varepsilon$ required to cover $\mathcal{F}$, satisfies

$$N(\mathcal{F}, L_2(Q), \varepsilon) \le \left(\frac{M\,\|F\|_{L_2(Q)}}{\varepsilon}\right)^{v}, \quad \text{for } 0 < \varepsilon \le 2\|F\|_{L_2(Q)},$$

for some universal positive constants $M$, $v$ and for every probability measure $Q$ on the underlying space. Also, by a pointwise measurable class $\mathcal{F}$, we mean that there exists a countable subclass $\mathcal{F}_0$ of $\mathcal{F}$ such that we can find for any $f \in \mathcal{F}$ a sequence of functions $\{f_m\}$ in $\mathcal{F}_0$ for which $f_m(x) \to f(x)$ for $x \in R^d$. We shall apply the following lemma when $f$ is the identity map and the generalized kernel function $K(x/h_n)$ is taken to be $K(x/h_n)\,U(x/h_n)$.

Lemma B.1 For i.i.d. observations, we consider the following generalized kernel type estimator

$$g_{n h_n}(x) = \frac{1}{n h_n^d}\sum_{i=1}^{n} c_f(x)\, K\left(\frac{x - X_i}{h_n}\right) f(Z_i),$$

where the classes $\mathcal{F} = \{f\}$ and $\mathcal{K} = \left\{K\left(\frac{x - \cdot}{h_n}\right) : h_n > 0,\ x \in R^d\right\}$ are both pointwise measurable and of VC type. Moreover, $\mathcal{C} \equiv \{c_f : f \in \mathcal{F}\}$ is relatively compact with respect to the sup-norm topology and either

$$\exists M > 0 : F(Z)\, 1\{x \in J\} \le M < \infty, \qquad (B.1)$$

or

$$\sup_{x \in J} E\left[F^{2+\varrho}(Z) \mid X = x\right] = M < \infty, \qquad (B.2)$$

with $\varrho > 2$ and the envelope function $F$ for the class $\mathcal{F}$. Then we have

$$\limsup_{n \to \infty}\ \sup_{h_n \in H_n}\ \sup_{f \in \mathcal{F}}\ \frac{\sqrt{n h_n^d}\,\left\|g_{n h_n}(x) - E g_{n h_n}(x)\right\|}{\sqrt{|\log h_n^d| \vee \log\log n}} < \infty, \qquad (B.3)$$

where

$$H_n = \left\{h_n : C\left(\frac{\log n}{n}\right)^{\zeta} \le h_n \le h_0\right\},$$

with $\zeta = 1$ in the bounded case (B.1), or $\zeta = 1 - 2/(2 + \varrho)$ under the moment restriction in (B.2).

To handle the U-processes which are related to $t_{nj}$, we need to introduce additional terminology and notation, see Gine and Mason (2007). For a kernel function $f$ of $k$ variables, we denote

$$U_n^{(k)}(f) = \frac{(n - k)!}{n!}\sum_{i \in I_n^k} f(X_{i_1}, \ldots, X_{i_k}),$$

where $I_n^m = \{(i_1, \ldots, i_m) : 1 \le i_j \le n,\ i_j \ne i_k \text{ if } j \ne k\}$.
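The definition of $U_n^{(k)}(f)$ translates directly into code: average the kernel over all ordered $k$-tuples of distinct indices. The sketch below is a brute-force illustration (its cost grows like $n^k$, so it is meant for small samples only); the function name and the example kernel are hypothetical.

```python
from itertools import permutations
from math import factorial


def u_statistic(f, xs, k):
    """U_n^{(k)}(f): average of f over all ordered k-tuples of distinct indices."""
    n = len(xs)
    total = sum(f(*(xs[i] for i in idx)) for idx in permutations(range(n), k))
    return factorial(n - k) / factorial(n) * total


if __name__ == "__main__":
    # With the symmetric kernel f(a, b) = (a - b)^2 / 2, the order-2 U-statistic
    # is the unbiased sample variance.
    xs = [1.0, 2.0, 3.0, 4.0]
    print(u_statistic(lambda a, b: 0.5 * (a - b) ** 2, xs, 2))
```

For $k = 1$ the statistic reduces to a sample mean, which is the case in which the bound of Lemma B.2 coincides with the empirical process bound of Einmahl and Mason (2005).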
Now suppose f is symmetric in its entries, we have the well-known Hoe¤ding decomposition: Un(m) (f ) Ef = m X Un(k) ( kf ) ; k=1 where kf 2 Moreover let =( x1 P) ( P) xk Pm k f: (which we call maximal variance) be any number satisfying P mf 2 2 F M 2: Lemma B.2 (Gine and Mason, 2007, Theorem 8) Let F be a collection of measurable symmetric functions f : S m ! R, bounded up by M in absolute values, and let P be any probability measure on (S; S) : Assume F is of VC type with envelope function F and A m e ;v 1; there exist constants C1; C2 , s.t. for any k = 1; :::; m; nk E Un(k) ( assuming n 2 M and with characteristics A and v. Then for every m 2 N , C2 log A 2 kf ) F C1 2k 2 log A k ; (B.4) . When we simply have k = 1, the bound in (B.4) is equivalent as Proposition 1 in Einmahl and Mason (2005). It’s clear from Section 5 that our estimators for the structural regression parameter would be based on various conditional moments and corresponding …rst order derivatives. We let m ( ) be the conditional mean function of D : Di = m (Xi ) + "i ; where E ["i jXi ] = 0 holds almost surely. The local polynomial estimator takes a closed form as: 25 bD where " # n 1X (x) = Gn 1 (x) n i=1 i Di Khn (Xi x) Uhn (Xi x) ; (B.5) n Gn (x) = Let q (x) = Pr f i 1X n i=1 i Khn (Xi x) UhTn (Xi x) Uhn (Xi x) : = 1jXi = xg and fX (x) denote the marginal density function of covariates X: Now we list the regularity assumptions needed. Assumption (B) (1) We assume the conditional independence censoring mechanism (Yi ? Ci ) jXi . In addition, q (x) M0 > 0 uniformly in x 2 X . (2) The conditional mean function is p + 1 order di¤erentiable with bounded (p + 1)-th order derivative uniformly in x 2 X , thus 0 jm(v) (x) m(v) x for all v 2 N d such that jvj = p, one has m(v) (x) = j v @x11 0 Ljjx @v v @xdd x jj; m (x), stands for partial derivative and jj jj for the Euclidean norm. (3) The marginal density function fX (x) exist and is second order continuously di¤erentiable. 
Moreover it is bounded above and below uniformly, i.e. $\forall x \in \mathcal{X}$, $f_X(x) \ge M_0 > 0$ and $f_X(x) \le M_1 < \infty$ hold.
(4) The kernel density function $K$ is compactly supported (w.l.o.g. let the support be $[-1, 1]^{d}$) and bounded (above by $K_{\max}$). Furthermore, $\mathcal{K} = \left\{K\left(\frac{x - \cdot}{h_n}\right) : h_n > 0,\ x \in \mathbb{R}^{d}\right\}$ is pointwise measurable and of VC type. Moreover, $\Gamma_p(K) = \int K(u) U(u) U^{T}(u)\, du$ is a positive definite matrix.
(5) The bandwidth sequence satisfies: $h_n \to 0$, $\frac{\log n}{\sqrt{n}\, h_n^{d+1}} \to 0$, and $n h_n^{2p} \to 0$.

This calls for a uniform asymptotic analysis of the derivative estimators from local polynomial fitting. We shall focus on $\hat{b}_{D,s}(x)$, as the analysis of $\hat{a}_D(x)$ can be carried out upon a change of notation. Recall $\hat{b}_{D,s}(x) = h_n^{-1} e_{s,P}^{T} \hat{\beta}_D(x)$, where $e_{s,P}^{T}$ is the selection vector picking up the $s$-th component of the first order partial derivatives. Write it as a linear smoother

$$\hat{b}_{D,s}(x) = \sum_{i} D_i W_{ni}^{s}(x), \tag{B.6}$$

with

$$W_{ni}^{s}(x) = \frac{1}{n h_n}\, e_{s,P}^{T} G_n^{-1}(x)\, \delta_i K_{h_n}(X_i - x)\, U_{h_n}(X_i - x). \tag{B.7}$$

Consider the $p$-th order Taylor expansion of a generic smooth function $m(\cdot)$ locally around $x$, where the window size is controlled by $h_n$:

$$m(X_i) = \sum_{0 \le |s| \le p} \frac{m^{(s)}(x)}{s!} (X_i - x)^{s} + R(X_i, x),$$

where $|R(X_i, x)| \le \sum_{|s| = p} |r_s(X_i, x)|$ and the total number of terms in the sum equals $N_p = \binom{p+d-1}{d-1}$. Thus we can decompose the estimation error into two parts by Proposition 1.12 in Tsybakov (2008):

$$\hat{b}_{D,s}(x) - b_{D,s}(x) = \sum_{i} R(X_i, x) W_{ni}^{s}(x) + \sum_{i} \varepsilon_i W_{ni}^{s}(x).$$

We first bound the bias term $\sum_i R(X_i, x) W_{ni}^{s}(x)$ uniformly in the covariates' value.

Lemma B.3 Under Assumptions (B1)-(B5), we get $\sup_x \left|\sum_i R(X_i, x) W_{ni}^{s}(x)\right| = O_p(h_n^{p})$.

Proof. Recall that $|R(X_i, x)| \le \sum_{|s| = p} |r_s(X_i, x)|$, where the total number of terms in the sum equals $N_p = \binom{p+d-1}{d-1}$; similarly one could define $N_s$, $0 \le s \le p$. We have

$$|r_s(X_i, x)| = \left|\frac{m^{(s)}(\tilde{x}) - m^{(s)}(x)}{p!} (X_i - x)^{s}\right| \le \frac{L}{p!} h_n^{p+1},$$

with $\tilde{x}$ being some intermediate value in a neighborhood of $x$. Therefore, the remainder term is controlled (uniformly in $x$) at the following order:

$$|R(X_i, x)| \le \frac{L N_p}{p!} h_n^{p+1},$$

and with probability approaching 1 we have

$$|W_{ni}^{s}(x)| \le \frac{1}{n h_n} \left\|G_n^{-1}(x) K_{h_n}(X_i - x) U_{h_n}(X_i - x)\right\| \le \frac{C}{n h_n},$$

where we use the fact that $G_n(x)$ is positive semidefinite and symmetric with probability approaching 1, thus $\|G_n^{-1}(x) v\| \le C \|v\|$. Finally the bias term can be bounded uniformly by

$$\sup_x \left|\sum_i R(X_i, x) W_{ni}^{s}(x)\right| \le \frac{C N_p}{p!} h_n^{p+1} \sum_i |W_{ni}^{s}(x)| \le C h_n^{p}.$$

In the current context, we need to expand the inverse $G_n^{-1}(x)$ into two terms, similarly to Lemma 4.2 in Chaudhuri, Doksum, and Samarov (1997) or Lemma A.1 in Li, Lu, and Ullah (2003). For a kernel function $K$, we define

$$\Gamma_p(K) = \int U(u) U^{T}(u) K(u)\, du, \qquad \gamma_p(K) = \int U(u) K(u)\, du.$$

Furthermore let $u^{T} = (u_1, \ldots, u_d)$, and for $1 \le s \le d$,

$$\vartheta_{p,s}(K) = \int U(u) U^{T}(u) K(u) u_s\, du, \qquad \varkappa_{p,s}(K) = \int U(u) K(u) u_s\, du.$$

Lemma B.4 Under Assumptions (B1)-(B5), we have

$$\sup_x \left\|G_n^{-1}(x) - \frac{1}{f_X(x) q(x)} \Gamma_p^{-1}(K) - h_n G(x)\right\| = O_p\left(h_n^{2} + \sqrt{\frac{\log n}{n h_n^{d}}}\right),$$

where

$$G(x) \equiv -\frac{1}{f_X^{2}(x) q^{2}(x)}\, \Gamma_p^{-1}(K) \left(\sum_{s=1}^{d} [f_X(x) q(x)]_s^{(1)} \vartheta_{p,s}(K)\right) \Gamma_p^{-1}(K).$$

Proof. Note that

$$\begin{aligned}
E G_n(x) &= \int \frac{1}{h_n^{d}} K\left(\frac{X_i - x}{h_n}\right) U\left(\frac{X_i - x}{h_n}\right) U^{T}\left(\frac{X_i - x}{h_n}\right) q(X_i)\, dF_X(X_i) \\
&= \int f_X(x + u h_n)\, q(x + u h_n)\, K(u) U(u) U^{T}(u)\, du \\
&= f_X(x) q(x) \Gamma_p(K) + h_n \sum_{s=1}^{d} [f_X(x) q(x)]_s^{(1)} \vartheta_{p,s}(K) + O(h_n^{2}).
\end{aligned}$$

By the results in Dony, Einmahl, and Mason (2006), we get

$$\sup_x |G_n(x) - E G_n(x)| = O_p\left(\sqrt{\frac{\log n}{n h_n^{d}}}\right).$$

Hence we use the von Neumann series expansion for the inverse matrix:

$$G_n^{-1}(x) = \frac{1}{f_X(x) q(x)} \Gamma_p^{-1}(K) - \frac{h_n}{f_X^{2}(x) q^{2}(x)} \Gamma_p^{-1}(K) \left(\sum_{s=1}^{d} [f_X(x) q(x)]_s^{(1)} \vartheta_{p,s}(K)\right) \Gamma_p^{-1}(K) + O_p\left(h_n^{2} + \sqrt{\frac{\log n}{n h_n^{d}}}\right).$$

Thus the linear representations we are seeking are

$$\hat{a}_D(x) - a_D(x) = \frac{1}{n f_X(x) q(x)} \sum_{i} e_{0,P}^{T} \Gamma_p^{-1}(K)\, \varepsilon_i \delta_i K_{h_n}(X_i - x) U_{h_n}(X_i - x) + o_p(n^{-1/2}), \tag{B.8}$$

and

$$\hat{b}_{D,s}(x) - b_{D,s}(x) = \frac{1}{n h_n f_X(x) q(x)} \sum_{i} e_{s,P}^{T} \Gamma_p^{-1}(K)\, \varepsilon_i \delta_i K_{h_n}(X_i - x) U_{h_n}(X_i - x) + \frac{1}{n} \sum_{i} e_{s,P}^{T} G(x)\, \varepsilon_i \delta_i K_{h_n}(X_i - x) U_{h_n}(X_i - x) + o_p(n^{-1/2}), \tag{B.9}$$

where $1 \le s \le d$. From the previous two lemmas we immediately obtain

$$\sup_{x \in \mathcal{X}} |\hat{a}_D(x) - a_D(x)| = O_p\left(\sqrt{\frac{\log n}{n h_n^{d}}}\right), \qquad \sup_{x \in \mathcal{X}} |\hat{b}_D(x) - b_D(x)| = O_p\left(\frac{1}{h_n} \sqrt{\frac{\log n}{n h_n^{d}}}\right).$$

We define the matrix $A = (e_{1,P}, \ldots, e_{d,P})^{T}$. The following lemma gives the expression of the influence functions for the terms $t_{n1}$ and $t_{n2}$.

Lemma B.5 Under Assumptions (B1)-(B5), we have for $l = 1, 2$:

$$t_{nl} = \frac{1}{n} \sum_{i=1}^{n} \varphi_l(Z_i) + o_p(n^{-1/2}),$$

where the influence functions are defined by

$$\varphi_1(Z_i) = \varepsilon_i \delta_i e^{a_Y(X_i)} b_Y(X_i),$$

and

$$\varphi_2(Z_i) = \varepsilon_i \delta_i \left\{e^{a_Y(X_i)} b_Y(X_i) + e^{a_Y(X_i)} f_X(X_i) \left[A\, G(X_i)\, \gamma_p(K)\right]\right\}.$$

Proof. The goal is to collect the dominating linear terms from various U-statistics. We only carry out a detailed proof for $t_{n2}$, as the analysis for $t_{n1}$ follows the same type of argument and is somewhat simpler. Let $t_{n2,s}$ denote the $s$-th element of the vector $t_{n2}$. Given (B.9) and the restriction on the bandwidth $h_n$, we have

$$t_{n2,s} = \frac{1}{n} \sum_{i=1}^{n} e^{a_Y(X_i)} \left[\hat{b}_{D,s}(X_i) - b_{D,s}(X_i)\right] = \frac{1}{h_n^{d+1}} U_n^{(2)}(f_1) + \frac{1}{h_n^{d}} U_n^{(2)}(f_2) + o_p(n^{-1/2}),$$

where

$$U_n^{(2)}(f_1) = \frac{1}{n^{2}} \sum_{j=1}^{n} \frac{e^{a_Y(X_j)}}{f_X(X_j) q(X_j)} \left(\sum_{i \ne j} e_{s,P}^{T} \Gamma_p^{-1}(K)\, \varepsilon_i \delta_i K\left(\frac{X_i - X_j}{h_n}\right) U_{h_n}(X_i - X_j)\right),$$

$$U_n^{(2)}(f_2) = \frac{1}{n^{2}} \sum_{j=1}^{n} e^{a_Y(X_j)} \left(\sum_{i \ne j} e_{s,P}^{T} G(X_j)\, \varepsilon_i \delta_i K\left(\frac{X_i - X_j}{h_n}\right) U_{h_n}(X_i - X_j)\right).$$

These terms can be analyzed as U-statistics after a simple symmetrization. For the first one,

$$\frac{1}{n} \sum_{j=1}^{n} \left(\frac{1}{n h_n^{d+1}} \sum_{i \ne j} e_{s,P}^{T} \Gamma_p^{-1}(K) \frac{e^{a_Y(X_j)} \varepsilon_i \delta_i}{f_X(X_j) q(X_j)} K\left(\frac{X_i - X_j}{h_n}\right) U_{h_n}(X_i - X_j)\right)$$

equals, up to the asymptotically negligible factor $(n-1)/n$,

$$\frac{2}{n(n-1)} \sum_{1 \le i < j \le n} e_{s,P}^{T} \Gamma_p^{-1}(K) \frac{1}{2 h_n^{d+1}} K\left(\frac{X_i - X_j}{h_n}\right) \left[\frac{e^{a_Y(X_i)} \varepsilon_j \delta_j}{f_X(X_i) q(X_i)} U_{h_n}(X_j - X_i) + \frac{e^{a_Y(X_j)} \varepsilon_i \delta_i}{f_X(X_j) q(X_j)} U_{h_n}(X_i - X_j)\right].$$

The classes of functions that come into play in the U-processes are

$$\mathcal{F}_1 = \left\{f_1 : f_1 = e_{s,P}^{T} \Gamma_p^{-1}(K) \frac{e^{a_Y(X_j)} \varepsilon_i \delta_i}{f_X(X_j) q(X_j)} K\left(\frac{X_i - X_j}{h_n}\right) U_{h_n}(X_i - X_j) : h_n \in \mathcal{H}\right\},$$

$$\mathcal{F}_2 = \left\{f_2 : f_2 = e_{s,P}^{T} G(X_j)\, \varepsilon_i \delta_i e^{a_Y(X_j)} K\left(\frac{X_i - X_j}{h_n}\right) U_{h_n}(X_i - X_j) : h_n \in \mathcal{H}\right\}.$$

The second order term of the Hoeffding decomposition is of smaller order,

$$h_n^{-d-1} U_n^{(2)}(\pi_2 f_1) = O_p\left(\frac{\log n}{n h_n^{d}}\right),$$

by the maximal moment bound (B.4) in Lemma B.2, as we can bound the maximal variance by a term of order $h_n^{d}$.
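The Hoeffding decomposition used in this proof can be checked numerically in the simplest case $m = 2$, where it reads $U_n^{(2)}(f) - Ef = 2 U_n^{(1)}(\pi_1 f) + U_n^{(2)}(\pi_2 f)$. The Python sketch below is only an illustration under assumed ingredients: the kernel $f(x, y) = xy$, the uniform law on $\{0, 1, 2\}$, and the sample values are hypothetical choices, not objects from the paper.

```python
from itertools import permutations


def u2(kernel, xs):
    """Second-order U-statistic: average over ordered pairs of distinct indices."""
    n = len(xs)
    return sum(kernel(xs[i], xs[j]) for i, j in permutations(range(n), 2)) / (n * (n - 1))


# Hypothetical setup: P is the uniform law on {0, 1, 2}, f(x, y) = x * y.
support = [0.0, 1.0, 2.0]
mu = sum(support) / len(support)  # E X under P
Ef = mu * mu                      # E f(X1, X2) for independent draws


def f(x, y):
    return x * y


def pi1(x):
    # (delta_x - P) x P applied to f: integrate out one argument, then center.
    return x * mu - Ef


def pi2(x, y):
    # (delta_x - P) x (delta_y - P) applied to f: fully degenerate part.
    return f(x, y) - x * mu - y * mu + Ef


sample = [2.0, 0.0, 1.0, 2.0, 2.0]  # illustrative sample
lhs = u2(f, sample) - Ef
first_order = 2 * sum(pi1(x) for x in sample) / len(sample)  # binom(2,1) * U^(1)(pi_1 f)
second_order = u2(pi2, sample)                               # binom(2,2) * U^(2)(pi_2 f)
```

The identity holds exactly for any sample, which is why the proof can treat the first order (linear) term and the second order (degenerate) term separately.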
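The first order term in the Hoeffding decomposition for $f_1$ is

$$\begin{aligned}
\frac{1}{h_n^{d+1}} U_n^{(1)}(\pi_1 f_1)
&= \frac{1}{n} \sum_{i=1}^{n} E_j\left[e_{s,P}^{T} \Gamma_p^{-1}(K) \frac{1}{h_n^{d+1}} K\left(\frac{X_i - X_j}{h_n}\right) \frac{e^{a_Y(X_j)} \varepsilon_i}{f_X(X_j)} U\left(\frac{X_i - X_j}{h_n}\right)\right] \\
&= \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i\, e_{s,P}^{T} \Gamma_p^{-1}(K) \int \frac{1}{h_n^{d+1}} e^{a_Y(x_j)} K\left(\frac{X_i - x_j}{h_n}\right) U\left(\frac{X_i - x_j}{h_n}\right) dx_j \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{e^{a_Y(X_i)} \varepsilon_i}{h_n}\, e_{s,P}^{T} \Gamma_p^{-1}(K) \gamma_p(K) + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i e^{a_Y(X_i)}\, e_{s,P}^{T} \Gamma_p^{-1}(K) \left[\sum_{r=1}^{d} b_{Y,r}(X_i) \varkappa_{p,r}(K)\right] + o_p(n^{-1/2}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i e^{a_Y(X_i)} b_{Y,s}(X_i) + o_p(n^{-1/2}),
\end{aligned}$$

where we have used $e_{s,P}^{T} \Gamma_p^{-1}(K) \int K(u) U(u)\, du = \left[\Gamma_p^{-1}(K) \Gamma_p(K)\right]_{s+1,1} = 0$ for $1 \le s \le d$, as in Lemma A.3 of Li, Lu, and Ullah (2003) and the equality (4.27) in Chaudhuri, Doksum, and Samarov (1997). When it comes to $f_2$, we get

$$\frac{1}{h_n^{d}} U_n^{(1)}(\pi_1 f_2) = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f_X(X_i) e^{a_Y(X_i)}\, e_{s,P}^{T} G(X_i) \gamma_p(K) + o_p(n^{-1/2}).$$

Similarly, for the other second order term, we have $h_n^{-d} U_n^{(2)}(\pi_2 f_2) = O_p\left(\frac{\log n}{n h_n^{d}}\right)$.

The derivation of the influence function from local likelihood fitting falls into the same framework as the analysis above. The complication is due to the presence of the extra parameter $\theta$ in the hazard function. Following Fan, Gijbels, and King (1997), it is convenient to introduce some additional notation. Let $\ell(t; \theta) = \log \lambda(t; \theta)$, with primes denoting derivatives with respect to $\theta$, and let

$$\varsigma(x) = e^{\psi(x)} E\left[\Lambda(V; \theta_0) \mid X = x\right],$$

$$\rho(x) = e^{\psi(x)} E\left[\Lambda'(V; \theta_0) \mid X = x\right],$$

$$\tau(x) = E\left[-\delta\, \ell''(V; \theta_0) + e^{\psi(x)} \Lambda''(V; \theta_0) \mid X = x\right].$$

Moreover, let

$$S_0(x) = \begin{pmatrix} \varsigma(x) \Gamma_p(K) & \rho(x) \gamma_p(K) \\ \rho(x) \gamma_p^{T}(K) & \tau(x) \end{pmatrix}$$

and

$$S_1(x) = \begin{pmatrix} \sum_{s=1}^{d} [f_X(x) \varsigma(x)]_s^{(1)} \vartheta_{p,s}(K) & \sum_{s=1}^{d} [f_X(x) \rho(x)]_s^{(1)} \varkappa_{p,s}(K) \\ \sum_{s=1}^{d} [f_X(x) \rho(x)]_s^{(1)} \varkappa_{p,s}^{T}(K) & 0 \end{pmatrix}.$$

It is straightforward to verify that the matrix $S_0(x)$ is invertible:

$$S_0^{-1}(x) = \begin{pmatrix} \varsigma^{-1}(x) \Gamma_p^{-1}(K) & 0 \\ 0^{T} & 0 \end{pmatrix} + \Delta^{-1}(x) \begin{pmatrix} \rho^{2}(x) \varsigma^{-2}(x)\, e_{0,P} e_{0,P}^{T} & -\rho(x) \varsigma^{-1}(x)\, e_{0,P} \\ -\rho(x) \varsigma^{-1}(x)\, e_{0,P}^{T} & 1 \end{pmatrix}, \tag{B.10}$$

where $\Delta(x) = \tau(x) - \rho^{2}(x) / \varsigma(x)$.

Assumption (C)
(1) The function $\psi(x)$ is $(p+1)$-th order continuously differentiable with bounded $(p+1)$-th order derivative uniformly in $x \in \mathcal{X}$.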
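(2) The conditional mean functions $E(\delta \mid X)$, $E(\Lambda(V; \theta_0) \mid X)$, $E(\Lambda'(V; \theta_0) \mid X)$, $E(\ell'(V; \theta_0) \mid X)$, $E(\Lambda''(V; \theta_0) \mid X)$, and $E(\ell''(V; \theta_0) \mid X)$ are all uniformly continuous for $x \in \mathcal{X}$.
(3) There exists some $\eta > 0$ such that $E(\Lambda^{2+\eta}(V; \theta_0) \mid X)$, $E(\Lambda'^{\,2+\eta}(V; \theta_0) \mid X)$, and $E(\ell'^{\,2+\eta}(V; \theta_0) \mid X)$ are finite and uniformly continuous for $X = x$.
(4) $\theta_0$ is an interior point of the parameter space. In addition, there exists a function $M(v)$, with $E M(V) < \infty$, such that

$$\left|\frac{\partial^{3}}{\partial \theta^{3}} \ell(v; \theta)\right| < M(v), \quad \text{and} \quad \left|\frac{\partial^{3}}{\partial \theta^{3}} \Lambda(v; \theta)\right| < M(v),$$

for all $v$ and all $\theta$ in a neighborhood of $\theta_0$.
(5) We assume $\{\log \lambda(V; \theta)\}'' \le 0$, and $\Lambda''(V; \theta) \ge 0$.

Let $\beta_Y^{0}(x) = \left(\psi_Y(x),\ \psi_Y^{(s)}(x),\ 1 \le |s| \le p\right)^{T}$ collect the function $\psi_Y(x)$ and all its partial derivatives up to order $p$, and let $\beta_Y(x) = H \beta_Y^{0}(x)$, where $H = \mathrm{diag}\left(1,\ h_n^{|v|},\ 1 \le |v| \le p\right)$. Strengthening the result of Theorem 1 in Fan, Gijbels, and King (1997), we get the uniform consistency result

$$\sup_x \left\{\left|\hat{\beta}_Y(x) - \beta_Y(x)\right| + \left|\hat{\theta}(x) - \theta_0\right|\right\} = o_p(1);$$

moreover, the following expansion holds:

$$\begin{pmatrix} \hat{\beta}_Y(x) - \beta_Y(x) \\ \hat{\theta}(x) - \theta_0 \end{pmatrix} = S_n^{-1}(x)\, \dot{l}_n(\beta_Y(x), \theta_0) + O_p\left(\left|\hat{\beta}_Y(x) - \beta_Y(x)\right|^{2} + \left|\hat{\theta}(x) - \theta_0\right|^{2}\right).$$

The score function is $\dot{l}_n(\beta, \theta) = \left(\frac{\partial l_n(\beta, \theta)}{\partial \beta^{T}}, \frac{\partial l_n(\beta, \theta)}{\partial \theta}\right)^{T}$, where

$$\frac{\partial l_n(\beta, \theta)}{\partial \beta} = n^{-1} \sum_{i=1}^{n} \left[\delta_i - \Lambda(V_i; \theta) \exp\left(\beta^{T} U_{h_n}(X_i - x)\right)\right] U_{h_n}(X_i - x) K_{h_n}(X_i - x),$$

$$\frac{\partial l_n(\beta, \theta)}{\partial \theta} = n^{-1} \sum_{i=1}^{n} \left[\delta_i \ell'(V_i; \theta) - \Lambda'(V_i; \theta) \exp\left(\beta^{T} U_{h_n}(X_i - x)\right)\right] K_{h_n}(X_i - x),$$

and the (negative) Hessian matrix $S_n(x)$ is defined, with $U_i = U_{h_n}(X_i - x)$ and $\beta = \beta_Y(x)$, as

$$S_n(x) = n^{-1} \sum_{i=1}^{n} K_{h_n}(X_i - x) \begin{pmatrix} \Lambda(V_i; \theta_0) e^{\beta^{T} U_i} U_i U_i^{T} & \Lambda'(V_i; \theta_0) e^{\beta^{T} U_i} U_i \\ \Lambda'(V_i; \theta_0) e^{\beta^{T} U_i} U_i^{T} & \Lambda''(V_i; \theta_0) e^{\beta^{T} U_i} - \delta_i \ell''(V_i; \theta_0) \end{pmatrix}.$$

Assumption (C5) ensures that $S_n(x)$, written with this sign convention, is positive definite. Let

$$r_n(x) = n^{-1} \sum_{i=1}^{n} \begin{pmatrix} \Lambda(V_i; \theta_0) U_{h_n}(X_i - x) \\ \Lambda'(V_i; \theta_0) \end{pmatrix} \left[e^{\psi(X_i)} - e^{\beta_Y^{T}(x) U_{h_n}(X_i - x)}\right] K_{h_n}(X_i - x).$$

We shall decompose the score function $\dot{l}_n(\beta_Y(x), \theta_0)$ into two parts:

$$\dot{l}_n(\beta_Y(x), \theta_0) = n^{-1} \sum_{i=1}^{n} \begin{pmatrix} \varepsilon_i U_{h_n}(X_i - x) \\ v_i \end{pmatrix} K_{h_n}(X_i - x) + r_n(x),$$

where

$$\varepsilon_i = \delta_i - \Lambda(V_i; \theta_0) e^{\psi(X_i)}, \quad \text{and} \quad v_i = \delta_i \ell'(V_i; \theta_0) - \Lambda'(V_i; \theta_0) e^{\psi(X_i)}$$

both have conditional zero mean by Proposition 1 in Fan, Gijbels, and King (1997).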
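The zero-mean property of the two residuals can be illustrated by simulation. Writing (in notation assumed for this sketch) $\delta$ for the censoring indicator, $V$ for the observed duration, $\Lambda(t; \theta)$ for the cumulative baseline hazard and $\psi$ for the log relative risk, the residuals are $\delta - \Lambda(V; \theta_0) e^{\psi}$ and $\delta\, \partial_\theta \log\lambda(V; \theta_0) - \partial_\theta \Lambda(V; \theta_0) e^{\psi}$. The Python sketch below assumes a constant baseline hazard $\lambda(t; \theta) = \theta$ and exponential censoring; both are hypothetical choices made only to keep the example self-contained, and the covariate is held fixed.

```python
import math
import random


def simulate_residual_means(n, theta0, psi, censor_rate, seed=0):
    """Monte Carlo means of the two residuals under a constant baseline hazard.

    Assumed model (illustrative): hazard theta0 * exp(psi), so that
    Lambda(t; theta) = theta * t and log lambda(t; theta) = log theta.
    """
    rng = random.Random(seed)
    relative_risk = math.exp(psi)
    eps_sum = 0.0
    v_sum = 0.0
    for _ in range(n):
        duration = rng.expovariate(theta0 * relative_risk)  # latent duration
        censoring = rng.expovariate(censor_rate)            # independent censoring
        observed = min(duration, censoring)
        delta = 1.0 if duration <= censoring else 0.0
        # eps = delta - Lambda(V; theta0) * exp(psi)
        eps_sum += delta - theta0 * observed * relative_risk
        # v = delta * d/dtheta log(lambda) - d/dtheta Lambda * exp(psi)
        v_sum += delta / theta0 - observed * relative_risk
    return eps_sum / n, v_sum / n


mean_eps, mean_v = simulate_residual_means(200_000, theta0=1.5, psi=0.3, censor_rate=0.8)
```

Both sample means should be close to zero up to Monte Carlo error, which is the property that lets the centered part of the score be handled by a uniform-in-bandwidth bound.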
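Applying Lemma B.1 under the moment condition (B.2) to the centered part of $\dot{l}_n(\beta_Y(x), \theta_0)$, it holds that

$$\sup_x \left|n^{-1} \sum_{i=1}^{n} \begin{pmatrix} \varepsilon_i U_{h_n}(X_i - x) \\ v_i \end{pmatrix} K_{h_n}(X_i - x)\right| = O_p\left(\sqrt{\frac{\log n}{n h_n^{d}}}\right). \tag{B.11}$$

Lemma B.6 Under Assumptions (C1)-(C5), we have $\sup_x |r_n(x)| = O_p(h_n^{p+1})$.

Proof. First of all, the expected value of $r_n(x)$ is equal to

$$E r_n(x) = f(x) e^{\psi(x)} h_n^{p+1} \begin{pmatrix} E(\Lambda(V; \theta_0) \mid X = x) \int \sum_{|v| = p+1} \frac{\psi^{(v)}(x)}{v!} u^{v} U(u) K(u)\, du \\ E(\Lambda'(V; \theta_0) \mid X = x) \int \sum_{|v| = p+1} \frac{\psi^{(v)}(x)}{v!} u^{v} K(u)\, du \end{pmatrix} + o(h_n^{p+1}), \tag{B.12}$$

extending the proof of Theorem 2 in Fan, Gijbels, and King (1997) to the multivariate case, and similarly we get $\sqrt{E r_n^{2}(x)} = O(h_n^{p+1})$. Thus by the uniform bound in (B.4), we have

$$\sup_x |r_n(x) - E r_n(x)| = O_p\left(h_n^{p+1} \sqrt{\frac{\log n}{n h_n^{d}}}\right),$$

as desired.

Lemma B.7 Under Assumptions (C1)-(C5), we have

$$\sup_x \left\|S_n^{-1}(x) - \frac{1}{f_X(x)} S_0^{-1}(x) - h_n S(x)\right\| = O_p\left(h_n^{2} + \sqrt{\frac{\log n}{n h_n^{d}}}\right), \tag{B.13}$$

where $S(x) \equiv -f_X^{-2}(x)\, S_0^{-1}(x) S_1(x) S_0^{-1}(x)$.

The proof of Lemma B.7 follows the same lines as that of Lemma B.4, mutatis mutandis, hence we omit it. We define the matrix $B = (e_{1,P+1}, \ldots, e_{d,P+1})^{T}$. Now we are ready to obtain the influence functions for $t_{n3}$ and $t_{n4}$.

Lemma B.8 Under Assumptions (C1)-(C5), we have for $l = 3, 4$:

$$t_{nl} = \frac{1}{n} \sum_{i=1}^{n} \varphi_l(Z_i) + o_p(n^{-1/2}),$$

where the influence functions are defined by

$$\varphi_3(Z_i) = e^{a_Y(X_i)} \left[a_D(X_i) b_Y(X_i) + b_D(X_i)\right] \left\{\varepsilon_i \left(\frac{1}{\varsigma(X_i)} + \frac{\rho^{2}(X_i)}{\Delta(X_i) \varsigma^{2}(X_i)}\right) - v_i \frac{\rho(X_i)}{\Delta(X_i) \varsigma(X_i)}\right\},$$

and

$$\varphi_4(Z_i) = e^{a_Y(X_i)} B\, S(X_i) \begin{pmatrix} \varepsilon_i \gamma_p(K) \\ v_i \end{pmatrix} + A \left(\varepsilon_i \left\{\Gamma_p^{-1}(K) \sum_{s=1}^{d} \left[\frac{e^{a_Y(X_i)}}{\varsigma(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K) + e_{0,P} e_{0,P}^{T} \sum_{s=1}^{d} \left[\frac{\rho^{2}(X_i) e^{a_Y(X_i)}}{\Delta(X_i) \varsigma^{2}(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K)\right\} - v_i\, e_{0,P} e_{0,P}^{T} \sum_{s=1}^{d} \left[\frac{\rho(X_i) e^{a_Y(X_i)}}{\Delta(X_i) \varsigma(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K)\right).$$

Proof. Here we only highlight the differences from the proof of Lemma B.5, focusing on $t_{n4}$.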
Combining (B.11), (B.12), and (B.13), it follows that

$$t_{n4,s} = \frac{1}{h_n^{d+1}} U_n^{(2)}(g_1) + \frac{1}{h_n^{d}} U_n^{(2)}(g_2) + o_p(n^{-1/2}),$$

where

$$U_n^{(2)}(g_1) = \frac{1}{n^{2}} \sum_{j=1}^{n} \sum_{i \ne j} f_X^{-1}(X_j) e^{a_Y(X_j)}\, e_{s,P+1}^{T} S_0^{-1}(X_j) \begin{pmatrix} \varepsilon_i U_{h_n}(X_i - X_j) \\ v_i \end{pmatrix} K\left(\frac{X_i - X_j}{h_n}\right),$$

$$U_n^{(2)}(g_2) = \frac{1}{n^{2}} \sum_{j=1}^{n} \sum_{i \ne j} e^{a_Y(X_j)}\, e_{s,P+1}^{T} S(X_j) \begin{pmatrix} \varepsilon_i U_{h_n}(X_i - X_j) \\ v_i \end{pmatrix} K\left(\frac{X_i - X_j}{h_n}\right).$$

By the same argument as before, the second order terms in the Hoeffding decomposition are of smaller order, so we focus on the expressions of the first order terms:

$$\begin{aligned}
\frac{1}{h_n^{d+1}} U_n^{(1)}(\pi_1 g_1)
&= \frac{1}{n h_n^{d+1}} \sum_{i=1}^{n} \int e_{s,P+1}^{T} S_0^{-1}(x_j) f_X^{-1}(x_j) e^{a_Y(x_j)} \begin{pmatrix} \varepsilon_i U_{h_n}(X_i - x_j) \\ v_i \end{pmatrix} K\left(\frac{X_i - x_j}{h_n}\right) dF_X(x_j) \\
&= \frac{1}{n h_n} \sum_{i=1}^{n} \int e_{s,P+1}^{T} S_0^{-1}(X_i - u h_n) e^{a_Y(X_i - u h_n)} \begin{pmatrix} \varepsilon_i U(u) \\ v_i \end{pmatrix} K(u)\, du \\
&= \frac{1}{n} \sum_{i=1}^{n} e_{s,P}^{T} \left(\varepsilon_i \left\{\Gamma_p^{-1}(K) \sum_{s=1}^{d} \left[\frac{e^{a_Y(X_i)}}{\varsigma(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K) + e_{0,P} e_{0,P}^{T} \sum_{s=1}^{d} \left[\frac{\rho^{2}(X_i) e^{a_Y(X_i)}}{\Delta(X_i) \varsigma^{2}(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K)\right\} - v_i\, e_{0,P} e_{0,P}^{T} \sum_{s=1}^{d} \left[\frac{\rho(X_i) e^{a_Y(X_i)}}{\Delta(X_i) \varsigma(X_i)}\right]_s^{(1)} \varkappa_{p,s}(K)\right) + o_p(n^{-1/2}),
\end{aligned}$$

where we have taken a first-order Taylor expansion of $S_0^{-1}(X_i - u h_n) e^{a_Y(X_i - u h_n)}$, and the zero-order term vanishes as in Lemma A.3 of Li, Lu, and Ullah (2003). For $g_2$, the linear term from the Hoeffding decomposition is

$$\frac{1}{h_n^{d}} U_n^{(1)}(\pi_1 g_2) = \frac{1}{n} \sum_{i=1}^{n} e^{a_Y(X_i)}\, e_{s,P+1}^{T} S(X_i) \begin{pmatrix} \varepsilon_i \gamma_p(K) \\ v_i \end{pmatrix} + o_p(n^{-1/2}).$$

Appendix C. A Review of Lévy Processes

A Lévy process, a process with independent and stationary increments, is the natural generalization of a random walk to continuous time, and it encompasses many well-studied processes, such as Brownian motion, the (compound) Poisson process, the gamma process, and the inverse Gaussian process. The distribution of a Lévy process at a fixed time is infinitely divisible, a class of laws that also arises as the most general limit law in the classical limit theorems for null arrays; see Sato (1999). Two sub-families of the general Lévy process are of particular interest. One is the so-called spectrally negative Lévy process used in Abbring (2012), which cannot have any positive jump.
The other is the Lévy subordinator, which has non-decreasing sample paths. The Gaussian component is necessarily ruled out in a Lévy subordinator. In this Appendix, I collect some characterizations and terminology for Lévy processes, and refer interested readers to Sato (1999) and Cont and Tankov (2004) for further information.

Despite the richness of the Lévy class, it admits a concise characterization via the celebrated Lévy-Khintchine representation. A $d$-dimensional Lévy process $L(t)$ is completely characterized by the triplet $(\gamma, \Sigma, \nu)$ through its characteristic function

$$E\{\exp[i\langle z, L(t)\rangle]\} = \exp[t \Psi(z)], \qquad z \in \mathbb{R}^{d},$$

with

$$\Psi(z) = i\langle \gamma, z\rangle - \frac{1}{2} z' \Sigma z + \int_{\mathbb{R}^{d}} \left[e^{i\langle y, z\rangle} - 1 - i\langle y, z\rangle 1\{|y| \le 1\}\right] \nu(dy),$$

where $i = \sqrt{-1}$, $\gamma \in \mathbb{R}^{d}$, $\Sigma$ is the covariance matrix of the Gaussian component, and $\nu$ is the Lévy measure.

Definition 10.1 A positive Radon measure $\nu$ on $\mathbb{R}^{d} \setminus \{0\}$ is called a Lévy measure if it satisfies the following condition:

$$\int \left(1 \wedge |y|^{2}\right) \nu(dy) < \infty.$$

Recall that a Lévy subordinator $L(t)$ has almost surely nondecreasing sample paths, i.e., for $t \ge s$ one has $L(t) \ge L(s)$. The following proposition gives an equivalent characterization of the subordinator in terms of the triplet $(\gamma, \Sigma, \nu)$.

Proposition 10.2 $L(t)$ is a Lévy subordinator iff $\Sigma = 0$, $\gamma^{0} \ge 0$, and $\nu$ assigns zero measure outside the positive quadrant with $\int_{\mathbb{R}^{d}_{+}} (1 \wedge |y|)\, \nu(dy) < \infty$.

The restriction $\int_{\mathbb{R}^{d}_{+}} (1 \wedge |y|)\, \nu(dy) < \infty$ arises because a nondecreasing process has bounded variation. In this case the Lévy-Laplace exponent function simplifies to

$$\Psi(z) = i\langle \gamma^{0}, z\rangle + \int_{\mathbb{R}^{d}_{+}} \left[e^{i\langle y, z\rangle} - 1\right] \nu(dy),$$

where

$$\gamma^{0} = \gamma - \int_{|y| \le 1} y\, \nu(dy).$$

In this paper we restrict our attention to the case $\gamma^{0} = 0$, i.e. without any deterministic linear trend. Otherwise, identification would require taking second order derivatives. By Theorem 4.1 in Cont and Tankov (2004), there is then no linear trend in either marginal Lévy process. The next two propositions concern the equivalence between moment conditions for $L(t)$ and moment conditions for the corresponding Lévy measure. For the absolute moment conditions, see Corollary 25.8 in Sato (1999).
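For a driftless one-dimensional subordinator, the simplified exponent evaluated at $z = iu$ with $u \ge 0$ becomes the real integral $\int_0^\infty (e^{-uy} - 1)\, \nu(dy)$, which can be checked numerically. The Python sketch below is an illustration under assumptions that are not taken from the paper: the Lévy measure is that of a compound Poisson process with exponential jumps (density `rate * kappa * exp(-kappa * y)`), the integral is approximated by a midpoint rule on a truncated range, and all names and parameter values are hypothetical.

```python
import math


def levy_exponent_at_iu(u, levy_density, upper, steps=200_000):
    """Midpoint-rule approximation of int_0^upper (exp(-u*y) - 1) * nu(y) dy."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * width
        total += (math.exp(-u * y) - 1.0) * levy_density(y)
    return total * width


# Hypothetical Levy density: jump arrival rate `rate`, Exp(kappa) jump sizes.
rate, kappa = 3.0, 2.0


def cp_density(y):
    return rate * kappa * math.exp(-kappa * y)


u = 1.3
numeric = levy_exponent_at_iu(u, cp_density, upper=40.0 / kappa)
# Closed form for this density: rate * kappa / (kappa + u) - rate.
closed_form = -rate * u / (kappa + u)
```

The truncation point `40 / kappa` is chosen so that the neglected tail mass is of order `exp(-40)`; for Lévy densities that explode near zero a quadrature adapted to the singularity would be needed instead.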
Proposition 10.3 (Theorem 25.17, Sato, 1999) For any non-negative $u$, the exponential moment $E\{\exp[-\langle u, L(t)\rangle]\}$ is finite for some $t$ if and only if it is finite for all $t$, if and only if

$$\int_{|y| \ge 1} e^{-\langle y, u\rangle}\, \nu(dy) < \infty.$$

In this situation, we have

$$E\{\exp[-\langle u, L(t)\rangle]\} = \exp[t \Psi(iu)].$$

Proposition 10.4 (Wolfe, 1971) The existence of a moment of the marginal distribution of $L(t)$ is equivalent to the existence of the same moment of the Lévy measure $\nu$.

For most Lévy processes, the jump intensity explodes to infinity as the jump size converges to zero. As a result, the survival function version of the Lévy measure is more tractable than its distribution function version. The survival function attached to the Lévy measure is the so-called tail integral, which we define next.

Definition 10.5 (Tail Integral) A two-dimensional tail integral is a function $U : \mathbb{R}^{2}_{+} \to \bar{\mathbb{R}}_{+}$ such that
(1) $U$ is a 2-increasing function;
(2) $U$ is equal to zero if one of its arguments is equal to $+\infty$;
(3) $U$ is finite everywhere except possibly at zero.

The probabilistic interpretation of a (marginal) tail integral $U(x)$ is the following. Let $N(t, x)$ denote the number of jumps of magnitude larger than the positive constant $x$ during the time interval $[0, t]$. Then, as a function of $t$, $N(t, x)$ is a Poisson process with rate $U(x)$, so that $N(t, x)$ has mean $U(x) t$. With the tail integral, we can define the notion of a Lévy copula and describe its role in pinning down the joint Lévy measure.

Definition 10.6 (Lévy Copula) A two-dimensional Lévy copula for Lévy subordinators is a 2-increasing grounded function $C_L(u, v) : [0, \infty]^{2} \to [0, \infty]$ with uniform margins, i.e. $C_L(u, \infty) = C_L(\infty, u) = u$.

Theorem 10.7 (Sklar Theorem) Let $U$ be a two-dimensional tail integral with margins $U_1, U_2$. Then there exists a positive Lévy copula $C_L$ such that

$$U(u, v) = C_L(U_1(u), U_2(v)); \tag{C.1}$$

if $U_1, U_2$ are continuous, the copula function $C_L$ is unique. Conversely, for a given Lévy copula $C_L$ and two marginal tail integrals $U_1, U_2$, (C.1) defines a two-dimensional tail integral.
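The converse part of the Sklar theorem can be illustrated with a small Python sketch that builds a joint tail integral from two margins and a Lévy copula. Everything concrete here is an assumption made for illustration: the copula is the Clayton-Lévy form $(u^{-\theta} + v^{-\theta})^{-1/\theta}$, the margins are power-law tail integrals $c\, x^{-\alpha}$, and the function names and parameter values are hypothetical.

```python
INF = float("inf")


def clayton_levy_copula(u, v, theta):
    """(u^-theta + v^-theta)^(-1/theta) on [0, inf]^2, for theta > 0."""
    if u == 0.0 or v == 0.0:
        return 0.0  # grounded: the copula vanishes when one argument is zero
    if u == INF:
        return v    # uniform margin C(inf, v) = v
    if v == INF:
        return u    # uniform margin C(u, inf) = u
    return (u ** (-theta) + v ** (-theta)) ** (-1.0 / theta)


def power_tail(x, scale, alpha):
    """Hypothetical marginal tail integral scale * x^(-alpha); infinite at zero."""
    return INF if x == 0.0 else scale * x ** (-alpha)


def joint_tail(x, y, theta):
    # Construction as in (C.1): plug the marginal tail integrals into the copula.
    return clayton_levy_copula(power_tail(x, 1.0, 0.5), power_tail(y, 2.0, 0.7), theta)


def rectangle_mass(x1, x2, y1, y2, theta):
    """Levy-measure mass of (x1, x2] x (y1, y2] implied by the joint tail integral."""
    return (joint_tail(x1, y1, theta) - joint_tail(x2, y1, theta)
            - joint_tail(x1, y2, theta) + joint_tail(x2, y2, theta))


theta = 1.0
margin_check = joint_tail(1.0, 0.0, theta)        # should equal the first margin at x = 1
mass = rectangle_mass(1.0, 4.0, 1.0, 2.0, theta)  # should be nonnegative
```

Setting the second argument to zero recovers the first marginal tail integral, and the nonnegative rectangle mass reflects the 2-increasing property required of a tail integral.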
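Recall that the Clayton-Lévy copula takes the form $C_{L,\theta}(u, v) = (u^{-\theta} + v^{-\theta})^{-1/\theta}$, with a single parameter $\theta \in (0, \infty)$. A straightforward yet slightly tedious computation leads to the following expressions for its partial derivatives w.r.t. the arguments $u$ and $v$:

$$\frac{\partial}{\partial u} C_{L,\theta}(u, v) = u^{-\theta-1} \left(u^{-\theta} + v^{-\theta}\right)^{-\frac{1}{\theta}-1}, \qquad \frac{\partial}{\partial v} C_{L,\theta}(u, v) = v^{-\theta-1} \left(u^{-\theta} + v^{-\theta}\right)^{-\frac{1}{\theta}-1},$$

and for its derivative w.r.t. the parameter $\theta$:

$$\frac{\partial}{\partial \theta} C_{L,\theta}(u, v) = \left(u^{-\theta} + v^{-\theta}\right)^{-\frac{1}{\theta}} \left[\frac{\log\left(u^{-\theta} + v^{-\theta}\right)}{\theta^{2}} + \frac{u^{-\theta} \log u + v^{-\theta} \log v}{\theta \left(u^{-\theta} + v^{-\theta}\right)}\right].$$

When the marginal Lévy measures have density functions $\nu_j(\cdot)$ with respect to the Lebesgue measure and the Lévy copula function is differentiable, the joint Lévy-Laplace exponent function admits the following alternative expression:

$$\Phi_{12}(z_1, z_2) = \iint_{\mathbb{R}^{2}_{+}} \left(1 - e^{-z_1 y_1 - z_2 y_2}\right) \left.\frac{\partial^{2} C_L(u, v)}{\partial u\, \partial v}\right|_{u = U_1(y_1),\, v = U_2(y_2)} \nu_1(y_1) \nu_2(y_2)\, dy_1\, dy_2.$$

Next we list some commonly used univariate Lévy densities, from Cont and Tankov (2004) and Gjessing, Aalen, and Hjort (2003).

Example 10.8 The stable process has a power parameter $\alpha \in (0, 1)$ (a general stable index lies in $(0, 2]$, but a subordinator requires $\alpha < 1$). The marginal Lévy density is

$$\nu_S(y) = \frac{\alpha}{\Gamma(1 - \alpha)\, y^{1+\alpha}},$$

and the Lévy exponent function is $\Phi_S(u) = u^{\alpha}$.

Example 10.9 The gamma process has a shape parameter $\eta$ and a scale parameter $\kappa$. The marginal Lévy density is

$$\nu_G(y) = \frac{\eta\, e^{-\kappa y}}{y},$$

and the Lévy exponent function is $\Phi_G(u) = \eta \log\left(1 + \frac{u}{\kappa}\right)$.

Example 10.10 The power variance function (PVF) process is a general process nesting the above gamma process as a special case. The marginal Lévy density is

$$\nu_{PVF}(y) = \frac{\rho\, n\, \kappa^{n}}{\Gamma(n + 1)}\, y^{n-1} e^{-\kappa y},$$

with $\kappa > 0$, $n > -1$, and $n\rho > 0$, and the Lévy exponent function is

$$\Phi_{PVF}(u) = \rho \left\{1 - \left(1 + \frac{u}{\kappa}\right)^{-n}\right\}.$$

The gamma process arises as a borderline case when $n = 0$.

Finally we would like to comment on the compound Poisson process $L(t) = \sum_{i \le N(t)} W_i$, because it is the only Lévy process having piecewise constant sample paths (almost surely), and its corresponding Lévy measure is finite, i.e. $\int_{\mathbb{R}^{d}} \nu(dy) < \infty$.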
In addition, when $W_i$ has a density function $f(w)$ and the arrival rate of $N(t)$ is $\lambda$, the Lévy measure has the density function $\nu(w) = \lambda f(w)$ and the Lévy exponent function simplifies to

$$\Psi_{CP}(z) = \lambda \int_{\mathbb{R}^{d}} \left[e^{i\langle w, z\rangle} - 1\right] f(w)\, dw.$$

References

[1] Abbring, J.H. (2012), Mixed hitting-time models, Econometrica, 80, 783-819.
[2] Abbring, J.H. and G.J. van den Berg (2003), The identifiability of the mixed proportional hazards competing risks model, Journal of the Royal Statistical Society B, 65, 701-710.
[3] Abbring, J.H. and T. Salimans (2012), The likelihood of mixed hitting times, working paper.
[4] An, M.Y., Christensen, B.J., and N.D. Gupta (2004), Multivariate mixed proportional hazard modelling of the joint retirement of married couples, Journal of Applied Econometrics, 19, 687-704.
[5] Athey, S. and P. Haile (2002), Identification of standard auction models, Econometrica, 70, 2107-2140.
[6] Basu, A.P. and J.K. Ghosh (1978), Identifiability of the multinormal and other distributions under competing risks model, Journal of Multivariate Analysis, 8, 413-429.
[7] Botosaru, I. (2013), A duration model with dynamic unobserved heterogeneity, working paper.
[8] Boyarchenko, S. and S. Levendorskii (2014), Preemption games under Lévy uncertainty, Games and Economic Behavior, 88, 354-380.
[9] Brown, C.O. and I.S. Dinc (2011), Too many to fail? Evidence of regulatory forbearance when the banking sector is weak, Review of Financial Studies, 24, 1378-1405.
[10] Burda, M., Harding, M., and J. Hausman (2015), A Bayesian semiparametric competing risk model with unobserved heterogeneity, Journal of Applied Econometrics, 30, 353-376.
[11] Chaudhuri, P., Doksum, K., and A. Samarov (1997), On average derivative quantile regression, The Annals of Statistics, 25, 715-744.
[12] Chen, Y.-H. (2010), Semiparametric marginal regression analysis for dependent competing risks under an assumed copula, Journal of the Royal Statistical Society B, 72, 235-251.
[13] Coile, C. (2004), Health shocks and couples' labor supply decisions, NBER working paper 10810.
[14] Cox, D.R. (1959), The analysis of exponentially distributed life-times with two types of failure, Journal of the Royal Statistical Society B, 21, 411-421.
[15] Cont, R. and P. Tankov (2004), Financial Modelling with Jump Processes, Chapman & Hall/CRC, Boca Raton, FL.
[16] Crowder, M. (1991), On the identifiability crisis in competing risk analysis, Scandinavian Journal of Statistics, 18, 223-233.
[17] de la Pena, V.H. and E. Gine (1999), Decoupling: From Dependence to Independence, Springer.
[18] Deng, Y., Quigley, J.M., and R. Van Order (2000), Mortgage terminations, heterogeneity and the exercise of mortgage options, Econometrica, 68, 275-307.
[19] Dixit, A.K. (1989), Entry and exit decisions under uncertainty, Journal of Political Economy, 97, 620-638.
[20] Einmahl, U. and D.M. Mason (2005), Uniform in bandwidth consistency of kernel-type function estimators, The Annals of Statistics, 33, 1380-1403.
[21] Elbers, C. and G. Ridder (1982), True and spurious duration dependence: The identifiability of the proportional hazard model, Review of Economic Studies, 49, 403-409.
[22] Epifani, I. and A. Lijoi (2010), Nonparametric priors for vectors of survival functions, Statistica Sinica, 20, 1455-1484.
[23] Fan, J. and I. Gijbels (1996), Local Polynomial Modeling and Its Applications, Chapman & Hall, London.
[24] Fan, J., Gijbels, I., and M. King (1997), Local likelihood and local partial likelihood in hazard regression, The Annals of Statistics, 25, 1661-1690.
[25] Fan, Y. and R. Liu (2013), Partial identification and inference in censored quantile regression: a sensitivity analysis, working paper.
[26] Fermanian, J.-D. (2003), Nonparametric estimation of competing risks models with covariates, Journal of Multivariate Analysis, 85, 156-191.
[27] Flinn, C.J. and J.J. Heckman (1982), New methods for analyzing structural models of labor force dynamics, Journal of Econometrics, 18, 115-168.
[28] Gine, E. and D.M. Mason (2007), On local U-statistic processes and the estimation of densities of functions of several sample variables, The Annals of Statistics, 35, 1105-1145.
[29] Gjessing, H.K., Aalen, O.O., and N.L. Hjort (2003), Frailty models based on Lévy processes, Advances in Applied Probability, 35, 532-550.
[30] Hardle, W. and T.M. Stoker (1989), Investigating smooth multiple regression by the method of average derivatives, Journal of the American Statistical Association, 84, 986-995.
[31] Heckman, J.J. and B. Honore (1989), The identifiability of competing risks model, Biometrika, 76, 325-330.
[32] Heckman, J.J. and B. Honore (1990), The empirical content of the Roy model, Econometrica, 58, 1121-1149.
[33] Heckman, J.J. and S. Navarro (2007), Dynamic discrete choice and dynamic treatment effects, Journal of Econometrics, 136, 341-396.
[34] Honore, B. and A. Lleras-Muney (2006), Bounds in competing risks models and the war on cancer, Econometrica, 74, 1675-1698.
[35] Honore, B. and A. de Paula (2010), Interdependent durations, Review of Economic Studies, 77, 1138-1163.
[36] Honore, B. and A. de Paula (2014), Interdependent durations in joint retirement, working paper.
[37] Hougaard, P. (1986a), Survival models for heterogeneous populations derived from stable distributions, Biometrika, 73, 387-396.
[38] Hougaard, P. (1986b), A class of multivariate failure time distributions, Biometrika, 73, 671-678.
[39] Ichimura, H. (1993), Semiparametric least squares (SLS) and weighted SLS estimation of single-index models, Journal of Econometrics, 58, 71-120.
[40] Ichimura, H. and L.F. Lee (1991), Semiparametric least squares estimation of multiple index models: single equation estimation, in Nonparametric and Semiparametric Methods in Econometrics and Statistics, edited by William A. Barnett, James Powell, George E. Tauchen.
[41] Kallsen, J. and P. Tankov (2006), Characterization of dependence of multidimensional Lévy processes using Lévy copulas, Journal of Multivariate Analysis, 97, 1551-1572.
[42] Katz, L.F. and B.D. Meyer (1990), Unemployment insurance, recall expectations, and unemployment outcomes, Quarterly Journal of Economics, 105, 973-1002.
[43] Klein, J.P., Keiding, N., and C. Kamby (1989), Semiparametric Marshall-Olkin models applied to the occurrence of metastases at multiple sites after breast cancer, Biometrics, 45, 1073-1086.
[44] Kyprianou, A.E. (2006), Introductory Lectures on Fluctuations of Lévy Processes with Applications, Springer.
[45] Lancaster, T. (1972), A stochastic model for the duration of a strike, Journal of the Royal Statistical Society A, 135, 257-271.
[46] Lancaster, T. (1979), Econometric methods for the duration of unemployment, Econometrica, 47, 939-956.
[47] Lee, S. (2006), Identification of a competing risks model with unknown transformations of latent failure times, Biometrika, 93, 996-1002.
[48] Lee, S. and A. Lewbel (2013), Nonparametric identification of accelerated failure time competing risks model, Econometric Theory, 29, 905-919.
[49] Li, Q., Lu, X., and A. Ullah (2003), Multivariate local polynomial regression for estimating average derivatives, Journal of Nonparametric Statistics, 15, 607-624.
[50] Lin, W. and K.B. Kulasekera (2007), Identifiability of single-index models and additive-index models, Biometrika, 94, 496-501.
[51] Liu, R. (2015), A single-index Cox model driven by Lévy subordinators, working paper.
[52] Marshall, A.W. and I. Olkin (1967), A multivariate exponential distribution, Journal of the American Statistical Association, 62, 30-44.
[53] Marshall, A.W. and M. Shaked (1979), Multivariate shock models for distributions with increasing hazard rate average, The Annals of Probability, 7, 343-358.
[54] McDonald, R. and D. Siegel (1986), The value of waiting to invest, Quarterly Journal of Economics, 101, 707-728.
[55] Mortensen, D.T. and C.A. Pissarides (1994), Job creation and job destruction in the theory of unemployment, Review of Economic Studies, 61, 397-415.
[56] Nelsen, R.B. (2006), An Introduction to Copulas, 2nd Edition, Springer.
[57] Peterson, A.V. (1976), Bounds for a joint distribution function with fixed sub-distribution functions: applications to competing risks, Proceedings of the National Academy of Sciences USA, 73, 11-13.
[58] Powell, J., Stock, J., and T.M. Stoker (1989), Semiparametric estimation of index coefficients, Econometrica, 57, 1403-1430.
[59] Renault, E., van den Heijden, T., and B.J.M. Werker (2014), The dynamic mixed hitting-time model for multiple transaction prices and times, Journal of Econometrics, 180, 233-250.
[60] Ridder, G. (1990), The non-parametric identification of generalized accelerated failure-time models, Review of Economic Studies, 57, 167-181.
[61] Rivest, L.P. and M.T. Wells (2001), A martingale approach to the copula-graphic estimator for the survival function under dependent censoring, Journal of Multivariate Analysis, 79, 138-155.
[62] Ryu, K. (1993), Structural duration analysis of management data, Journal of Econometrics, 57, 91-115.
[63] Sato, K. (1999), Lévy Processes and Infinitely Divisible Distributions, Cambridge University Press.
[64] Schilling, R.L., Song, R., and Z. Vondracek (2012), Bernstein Functions: Theory and Applications, SIAM.
[65] Singpurwalla, N.D. (1995), Survival in dynamic environments, Statistical Science, 10, 86-103.
[66] Singpurwalla, N.D. (2006), The hazard potential: introduction and overview, Journal of the American Statistical Association, 101, 1705-1717.
[67] Stoker, T.M. (1986), Consistent estimation of scaled coefficients, Econometrica, 54, 1461-1481.
[68] Tsiatis, A. (1975), A nonidentifiability aspect of the problem of competing risks, Proceedings of the National Academy of Sciences USA, 72, 20-22.
[69] Tsybakov, A.B. (2008), Introduction to Nonparametric Estimation, Springer.
[70] van den Berg, G.J. (1997), Association measures for durations in bivariate hazard rate models, Journal of Econometrics, 79, 221-245.
[71] van den Berg, G.J. (2001), Duration models: specification, identification, and multiple durations, in Handbook of Econometrics, Vol. 5, ed. by J.J. Heckman and E. Leamer. Amsterdam: Elsevier Science, Chapter 55, 3381-3460.
[72] Whitmore, G.A. (1979), An inverse Gaussian model for labour turnover, Journal of the Royal Statistical Society A, 142, 468-478.
[73] Widder, D.V. (1946), The Laplace Transform, Princeton University Press.
[74] Wolfe, S.J. (1971), On moments of infinitely divisible distribution functions, The Annals of Mathematical Statistics, 42, 2036-2043.
[75] Zheng, M. and J.P. Klein (1995), Estimates of marginal survival for dependent competing risks based on an assumed copula, Biometrika, 82, 127-138.