CCP Estimation of Dynamic Discrete/Continuous Choice Models with Generalized Finite Dependence and Correlated Unobserved Heterogeneity Wayne-Roy Gayle∗ October 2, 2012 Abstract This paper investigates conditional choice probability estimation of dynamic structural discrete and continuous choice models. We extend the concept of finite dependence in a way that accommodates non-stationary, irreducible transition probabilities. We show that, under this new definition of finite dependence, one-period dependence is obtainable in any dynamic model. This finite dependence property also provides a convenient and computationally cheap representation of the optimality conditions for the continuous choice variables. We allow for general form of discrete-valued unobserved heterogeneity in utility, transition probabilities, and production functions. The unobserved heterogeneity may be correlated with the observable state variables, and its dimension may be large and unknown. Because of the potentially large dimension of the distribution of unobserved heterogeneity, we modify the smoothly clipped absolute deviation estimator to jointly estimate the structural parameters of the model, and the dimension of the unobserved heterogeneity. We show that the estimator is rootn–asymptotically normal. We develop a new and computationally cheap algorithm to compute the estimator. KEYWORDS: Conditional Choice Probabilities Estimator, Discrete/Continuous Choice, Finite Dependence, Correlated Random Effects. JEL: C14, C31, C33, C35. ∗ Department of Economics, University of Virginia, Monroe Hall, McCormick Rd, Room 208A, Charlottesville, VA 22903, E-mail: wg4b@virginia.edu. 1 1 Introduction In this paper, we investigate conditional choice probability (CCP) estimation of dynamic structural discrete/continuous choice models with unobserved individual heterogeneity. We show that a modest extension of the finite dependence definition in Hotz and Miller (1993), Altug and Miller (1998), and Arcidiacono and Miller (2010) accommodates general nonstationary and irreducible transition probabilities, as well as a general form of correlated unobserved heterogeneity in the utility functions, production functions, and the transition probabilities. We propose a generalized method of moments (GMM) estimator for the structural parameters of the model and derive their asymptotic distributions. We also propose a simple algorithm to implement the estimator. Since its introduction by Hotz and Miller (1993), CCP estimation of dynamic structural models has flourished in empirical labor economics and industrial organization, largely because of its potentially immense reduction in computational cost compared to the more traditional backward recursive- and contraction mapping-based full maximum likelihood estimation pioneered by Rust (1987), referred to as the nested fixed point algorithm (NFXP). The CCP estimator circumvents having to solve the dynamic programming problem for each trial value of the structural parameters by making use of a one-to-one mapping between the normalized value functions and the CCPs established in Hotz and Miller (1993). Therefore, nonparametric estimates of the CCPs can be inverted to obtain estimates of the normalized value functions, which can then be used in estimating the structural parameters. Empirical application the early formulation of CCP estimation had important limitations relative to the NFXP method. The emerging literature has focused on separate, but related drawbacks. The first is nonparametric estimation of the CCPs results in less efficient estimates of the structural parameters, as well as, relatively poor finite sample performance. The second is the difficulty of accounting for unobserved individual heterogeneity, mainly due to having to estimate the CCPs by nonparametric methods. A limitation of both approaches to estimation is the difficulty of both the CCP and NFXP estimators is they are largely restricted to discrete choice, discrete states models. Aguirregabiria and Mira (2002) proposed a solution to the issue of efficiency and finite sample performance by showing, for a given value of the preference parameters, the fixed 2 point problem in the value function space can be transformed into a fixed point problem in the probability space. Aguirregabiria and Mira (2002) propose swapping the nesting of the NFXP, and show that the resulting estimator is asymptotically equivalent to the NFXP estimator. Furthermore, Aguirregabiria and Mira (2002) show in simulation studies that their method produce estimates 5 to 15 times faster than NFXP. The method proposed by Aguirregabiria and Mira (2002) is restricted to discrete choice models in stationary environments, and is not designed to account for unobserved individual heterogeneity. Recent developments concerning incorporating unobserved heterogeneity into CCP estimators include Aguirregabiria and Mira (2007), Arcidiacono and Miller (2010). Aguirregabiria and Mira (2007) incorporate permanent unobserved heterogeneity into stationary, dynamic discrete games. Their method requires multiple inversion of potentially large dimensional matrices. Arcidiacono and Miller (2010) propose a more general method for incorporating timespecific or time-invariant unobserved heterogeneity into CCP estimators. Their method modifies the Expectations-Maximization algorithm proposed in Acidiacono (2002). Their method also allows for a restricted form of correlation between the unobserved and observed state variables. However, Arcidiacono and Miller’s method is only applicable to discrete dynamic models. Altug and Miller (1998) proposed a method for allowing for continuous choices in the CCP framework. By assuming complete markets, estimates of individual effects and aggregate shocks are obtained, which are then used in the second stage to form (now) observationally equivalent individuals. These observationally equivalent individuals are used to compute counterfactual continuous choices. Bajari et al. (2007) modify the methods of Hotz and Miller (1993) and Hotz et al. (1994), to estimating dynamic games. Their method of modeling unobserved heterogeneity in continuous choices is inconsistent with the dynamic selection. The finite dependence property; when two different policies associated with different initial choices lead to the same distribution of states after a few periods, is critical for the computational feasibility and finite sample performance of CCP estimators. Finite dependence combined with the invertibility result of Hotz and Miller (1993) results in significant reduction in computational cost of estimating dynamic structural models. Essentially, the smaller the order of dependence, the faster and more precise the estimator, because fewer future choice probabilities that either have to be estimated or updated, depending of the method of 3 estimation. The concept of finite dependence was first introduced by Hotz and Miller (1993), extended by Altug and Miller (1998), and further by Arcidiacono and Miller (2010). Despite these generalizations, the concept of finite dependence is largely restricted to discrete choice models with either stationary transitions or the renewal property. This paper make three separate, but closely related contributions to the literature on CCP estimation of dynamic structural models. We extend the concept of finite dependence to allow for general non-stationary and irreducible transition probabilities. While its definition is precise and well understood, the strategy to construct finite dependence in dynamic structural models have been largely ad hoc and imprecise, many times relying on assumptions that are either theoretically unjustified, or significantly restricting the data. Altug and Miller (1998), Gayle and Miller (2003), and Gayle (2006) rely on complete market and degenerate transition probability assumptions to form counterfactual strategies that obtain finite dependence. A key insight of Arcidiacono and Miller (2010) is “the expected value of future utilities from optimal decision making can always be expressed as functions of the flow payoffs and conditional choice probabilities for any sequence of future choices, optimal or not.” This insight is the basis of our extension of the finite dependence property. We show the expected value of future utilities from optimal decision making can be expressed as any linear combination of flow payoffs and conditional CCPs, as long as the weights sum to one. This insight converts the difficult problem of finding one pair of sequences of choices that obtains finite dependence to a continuum of finite dependencies from which to choose. Given we are now able to choose from a continuum of finite dependence representations, the question becomes whether there is a choice that obtains one-period finite dependence. Indeed, there are also a continuum of one-period finite dependencies, regardless of the form of the transition probabilities. One particular choice has the advantage of being elegant and intuitive, as well as providing a method to accommodate continuous choices. Our approach to accounting for continuous choices do not rely on first stage estimation as in Altug and Miller (1998), and Bajari et al. (2007), nor does it require forward simulation as in Hotz et al. (1994), and Bajari et al. (2007). The proposed method for estimating discrete/continuous dynamic structural models parallels the method for estimating discrete/continuous static structural models of Dubin and McFadden (1984), and Hanemann (1984), operationalized by the inversion result of Hotz and Miller (1993), Arcidiacono and Miller (2010), and the generalized finite dependence result of this paper. 4 To avoid stochastic degeneracy, the econometric specification of discrete/continuous structural models requires at least as many choice-specific unobserved shocks as there are choices. The estimator developed in this paper conveniently accommodates these traditional i.i.d. shocks, as well as, other correlated unobserved heterogeneity. The advantages of the algorithm proposed in this paper are, it does not require specifying initial conditions, it does not require discrete approximation of the continuous choice variables and the value functions, and its convergence is well understood. 2 Model 2.1 General framework The general setup of a dynamic structural discrete/continuous choice model is as follows. In each period, t, an individual chooses among J discrete, mutually exclusive alternatives. Let dt j be one if the discrete action j ∈ {1, · · · , J} is taken, and zero otherwise, and define dt = (dt1 , · · · , dtJ ). Associated with each discrete alternative, j, the individual chooses L j continuous alternatives. Let ctl j ∈ ℜ+ , l j ∈ 1, · · · , L j , be the continuous actions assoL ciated with alternative j with ctl j > 0 if dt j = 1. Define ct j = (ct1 , · · · , ctL j ) ∈ ℜ+j , and ct = (ct1 , · · · , ctJ ) ∈ ℜL+ , where L = ∑Jj=1 L j . Also, let ( j, ct j ) be the vector of discrete and continuous actions associated with alternative j. The current period payoff associated with action ( j, ct j ) depends on the observed state xt ∈ ℜDx , where Dx is the dimension of xt , the unobserved state st ∈ ℜDs , where Ds is the dimension of st , the unidimensional discrete- choice–specific shock ε jt ∈ ℜ, and the L j -dimensional vector of continuous-choice–specific shocks rt j = (rt1 , · · · , rtL j ) ∈ ℜL j . Let zt = (xt , st ). The probability density function of zt+1 given action ( j, ct j ) is taken in period t is denoted by f jt (zt+1 |zt , ct j ). The shocks associated with alternative ( j, ct j ) in period t, et j = (εt j , rt j ), are observed to the individual at the beginning of period t. The individual’s conditional direct current period payoff from choosing alternative ( j, ct j ) in period t is denoted by ut j (zt , ct j , rt j ) + εt j . Define yt j = (dt j , ct j ). The individual chooses the vector yt = (yt1 , · · · , ytJ ) to sequentially 5 maximize the expected discounted sum of payoffs: E ( T J ∑ ∑ βt−1dt j [ut j (zt , ct j , rt j ) + εt j ] t=1 j=1 ) , (2.1) where β ∈ (0, 1) is the discount factor. In each period, t, the expectation is taken over zt+1 , · · · , zT and et+1 , · · · , eT . The solution to maximizing expression (2.1) is a Markov decision rule for optimal choice conditional on the time-specific state vectors and i.i.d. shocks. Let the optimal decision rule at period t be given by (dt0j (zt , et ), ct0j (zt , et )). Let the ex-ante value function in period t, Vt (zt ), be the discounted sum of expected future payoffs, before et is revealed, given the optimal decision rule: Vt (zt ) = E ( T J ∑ ∑ βτ−t dτ0j (zτ, eτ)[uτ j (zτ, c0τ j (zτ, eτ), rτ j ) + ετ j ] τ=t j=1 ) . As is standard in dicrete/continuous models, the additive separability of the utility function implies the discrete-choice–specific continuous choice is a function of their associated shocks and not of εt . The expected value function in period t + 1, given zt , rt j , the discrete choice, j, and corresponding optimal continuous choice, ct0j (zt , rt j ), is V̄t+1, j (zt , rt ) = β Z Vt+1 (zt+1 ) f jt (zt+1 |zt , ct0j (zt , rt j ))dzt+1, R where Vt+1 (zt+1 ) = Vt+1 (zt+1 , rt+1 )gr (rt+1 ), and gr is the density function of rt . Let gε be the density function of ε, and assume that εt and rt are independent random vectors, so that ge = gε gr . If behavior is governed by a Markov decision rule, then Vt (zt ) can be written recursively: Vt (zt ) = E = Z = Z ( J ∑ dt0j (zt , et ) j=1 J ∑ dt0j (zt , et ) j=1 J ∑ dt0j (zt , et ) j=1 ) 0 ut j (zt , ct j (zt , rt j ), rt j ) + εt j + βV̄t+1, j (zt , rt ) ut j (zt , ct0j (zt , rt j ), rt j ) + εt j + βV̄t+1, j (zt , rt ) ge (et )de, vt j (zt , ct0j (zt , rt j ), rt j ) + εt j ge (et )de 6 where vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j ) + βV̄t+1, j (zt , rt ), (2.2) the choice-specific conditional value function without εt j . The optimal conditional continuous choice, given the discrete alternative j being chosen in period t, must satisfy ∂ vt j (zt , ct0j (zt , rt j ), rt j ) = 0, ∂ctl j (2.3) for l j = 1, · · · , L j . Given the optimal conditional continuous choice, ct0j (zt , rt j ), the individual’s discrete choice of alternative j is optimal if dt0j (zt , et ) = ( 1 if 0 (z , r ), r ) + ε vt j (zt , ct0j (zt , rt j ), rt j ) + εt j > vtk (zt , ctk t tk tk tk 0 otherwise ∀k 6= j (2.4) Finally, the optimal unconditional continuous choice, ct j (zt , rt j ), is given by ct j (zt , rt j ) = dt0j (zt , et )ct0j (zt , rt j ). (2.5) 2.2 Alternative representation The probability of choosing alternative j at time t, conditional on zt , rt , and the vector of 0 , · · · , c0 ) is given by choice-specific optimal conditional continuous choices, ct0 = (ct1 tJ pt0j (zt , rt ) = Z dt0j (zt , rt , εt )gε (εt )dεt , (2.6) so that, for all (zt , rt ), ∑Jj=1 pt0j (zt , rt ) = 1, and pt0j (zt , rt ) > 0 for all j. Let pt0 (zt , rt ) = 0 (z , r ), · · · , p0 (z , r )) be the vector of conditional choice probabilities. Lemma 1 of (pt1 t t tJ t t Arcidiacono and Miller (2010) show a function ψ : [0, 1]J 7→ ℜ exists such that, for k = 1, · · · , J 0 ψk (pt (zt , rt )) ≡ Vt (zt , rt ) − vtk (zt , ctk (zt , rtk ), rtk ). (2.7) Equation (2.7) is simply equation (3.5) of Arcidiacono and Miller (2010), modified so the choice probabilities and value functions are also conditional on the i.i.d. shocks associated with the conditional continuous choices. Our key insight is, given equation (2.7) holds for 7 k = 1, · · · , J, then for any J-dimensional vector of real numbers at = (at1 , · · · , atJ ) such that ∑Jk=1 atk = 1, we have J Vt (zt , rt ) = ∑ atk [vtk (zt , ctk0 (zt , rtk ), rtk ) + ψk (pt (zt , rt ))]. (2.8) k=1 Let at+1, j = (at+1,1 j , · · · , at+1,J, j ) be the weights associated with the initial discrete choice, j, in period t. Denote f jt (zt+1 |zt , ct0j (zt , rt j )) by f jt (zt+1 |zt , rt j ). Substituting equation (2.8) into equation (2.2) gives: vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j ) J Z +β ∑ 0 [vt+1,k (zt+1 , ctk (zt , rtk ), rt+1,k ) k=1 + ψk (pt+1 (zt+1 , rt+1 ))]at+1,k j gr (rt+1)drt+1 f jt (zt+1|zt , rt j )dzt+1 , (2.9) Equation (2.9) shows the value function conditional on (zt , rt ) can be written as the flow payoff of the choice plus any weighted sum of a function of the one period ahead CCPs plus the one period ahead conditional value functions, where the weights sum to one. This modest extension of the results of Arcidiacono and Miller (2010) provides a powerful tool for obtaining finite dependence in any model which can be formulated as the one developed in the previous section. Specifically, note the following. 2.3 Generalized finite dependence R Define f jt (zt+1 |zt ) = f jt (zt+1 |zt , rt j )gr (rt j )drt j . For any initial choice ( j, ct j ), for periods τ = {t + 1, · · · ,t + ρ}, and any corresponding sequence aτ = {aτk j , k, j = 1, · · · , J} with ∑Jk=1 aτk j = 1, define κτ j (zτ+1 , |zt , rt j ) = ( for τ = t f jt (zt+1 |zt , rt j ) R . ∑Jk=1 aτ+1,k j fkτ (zτ+1 |zτ )κτ−1, j (zτ |zt , rt j )dzτ for τ = t + 1, · · · ,t + ρ (2.10) R Because ∑Jk=1 aτk j = 1, κτ j (zτ+1 , |zt )dzτ+1 = 1. We also impose the restriction a = (at+1 , · · · , at+ρ ) is so that κt+ρ+1 j (zt+ρ+1 , |zt ) > 0. This restriction does not require a j ≥ 0. Then, by forward 8 substitution, equations (2.9) and (2.10) obtain vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j ) t+ρ + J Z ∑ ∑ τ=t+1 k=1 +β t+ρ+1 Z βτ−t [uτk (zτ , c0τk (zτ , rτk ), rτk ) + ψk [pτ (zτ , rτ )]]aτk j gr (rτ )κτ−1, j (zτ |zt , rt j )drτ dzτ Vt+ρ+1 (zt+ρ+1 , rt+ρ+1 )gr (rt+ρ+1 )κt+ρ+1, j (zt+ρ+1 |zt , rt j )drt+ρ+1 dzt+ρ+1 . (2.11) Equation (2.11) shows the alternative-specific conditional value functions depend on the sequence of (possibly non-optimal) choice probabilities pt+1 = (pt+1 (zt+1 , rt+1 ), · · · , pT (zT , rT )), and their corresponding optimal alternative-specific continuous choices ct0 = (ct0 (zt , rt ), · · · , c0T (zT , rT )), so that vt j (zt , ct0j (zt , rt j ), rt j ) = vt j (zt , ct0 , pt+1 , rt j ). (2.12) In what follows, we suppress this dependence on (ct0 , pt+1 ) and reintroduce them when clarity is required. Using equation (2.11), we can therefore express the difference in the conditional value functions associated with two alternative initial choices, j and j′ as vt j (zt , rt j ) − vt j′ (zt , rt j′ ) = ut j (zt , rt j ) − ut j′ (zt , rt j′ ) t+ρ + J Z ∑ ∑ τ=t+1 k=1 βτ−t [uτk (zτ , rτk ) + ψk [pτ (zτ , rτ )]]gr (rτ )drτ × [aτk j κτ−1, j (zτ |zt , rt j ) − aτk j′ κτ−1, j′ (zτ |zt , rt j′ )]dzτ +β t+ρ+1 Z Vt+ρ+1 (zt+ρ+1 , rt+ρ+1 )gr (rt+ρ+1 )drt+ρ+1 × [κt+ρ, j (zt+ρ+1 |zt , rt j ) − κt+ρ, j′ (zt+ρ+1 |zt , rt j′ )]dzt+ρ+1. (2.13) Therefore, we say a pair of initial choices, ( j, ct j ) and ( j′ , ct j′ ) exhibit generalized ρ-period dependence if corresponding sequences (at+1, j , · · · , at+ρ, j ), and (at+1, j′ , · · · , at+ρ, j′ ) exists such that κt+ρ, j (zt+ρ+1 |zt , rt j ) = κt+ρ, j′ (zt+ρ+1 |zt , rt j′ ) almost everywhere. 9 (2.14) We now show that this generalization of the finite dependence property can be used to obtain one-period dependence for any model that satisfies the setup given in the previous section. For initial choice ( j, ct j ), κt+1, j (zt+2 |zt ) = Z J ∑ at+1,k j fkt+1 (zt+2|zt+1) f jt (zt+1|zt , rt j )dzt+1, k=1 so for any pair of initial choices, ( j, ct j ) and ( j′ , ct j′ ), κt+1, j (zt+2 |zt , rt j ) − κt+1, j′ (zt+2 |zt , rt j′ ) = = Z J ∑ [at+1,k j fkt+1 (zt+2|zt+1) f jt (zt+1|zt , rt j ) − at+1,k j′ fkt+1 (zt+2|zt+1) f j′t (zt+1|zt , rt j′ )]dzt+1 k=1 Z J ∑ fkt+1 (zt+2|zt+1)[at+1,k j f jt (zt+1|zt , rt j ) − at+1,k j′ f j′t (zt+1|zt , rt j′ )]dzt+1. (2.15) k=1 Then a sufficient condition for one-period dependence is at+1,k j f jt (zt+1 |zt , rt j ) = at+1,kl f j′t (zt+1 |zt , rt j′ ), k = 1, · · · , J. (2.16) One particular choice that has an elegant representation is at+1,l j = 1, at+1, j j′ = f jt (zt+1 |zt , rt j )/ f j′t (zt+1 |zt , rt j′ ), at+1,ll = 1 − at+1, j j′ . (2.17) Given the volume of alternative choices of weights that obtains one-period finite dependence, and we proceed by assuming ρ = 1, so vt j (zt , rt j ) − vt j′ (zt , rt j′ ) = ut j (zt , rt j ) − ut j′ (zt , rt j′ ) +β Z Z J ∑ [ut+1,k (zt+1, rt+1,k ) + ψk [pt+1 (zt+1, rt+1)]]gr(rt+1)drt+1 k=1 × [at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ft j′ (zt+1 |zt , rt j′ )]dzt+1 . 10 ! (2.18) 2.4 Optimality conditions The alternative representation of the difference in conditional value functions provide a simple and convenient representation of the condition for optimal conditional continuous choice, ct0j (zt , rt j ) given alternative j is chosen. The key is to note ∂vt j′ (zt , ct0j′ (zt , rt j′ ), rt j′ )/∂ctl j = 0 for j′ 6= j and l j = 1, · · · , L j . This equality, and equation (2.18) imply ct0j (zt , rt j ) solves 0= ∂ ut j (zt , ct0j (zt , rt j ), rt j ) ∂ctl j +β Z Z J 0 (zt+1 , rt+1,k ), rt+1,k ) + ψk [pt+1 (zt+1 , rt+1 )]]gr (rt+1 )drt+1 ∑ [ut+1,k (zt+1, ct+1,k k=1 ∂ × [at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ftl (zt+1 |zt , rt j′ )]dzt+1 ∂ctl j ! (2.19) The key to this solution is to note at+1k j′ will typically be a function of ct0j′ . For example, expression (2.17) implies ∂ ∂ at+1 j j′ = [ f j′t (zt+1 |zt , rtl )]−1 f jt (zt+1 |zt , rt j ), ∂ctl j ∂ctl j and ∂ ∂ at+1, j′ j′ = − at+1 jl . ∂ctl j ∂ctl j (2.20) Substituting the optimal conditional continuous choice into equation (2.17) obtains vt j (zt , ct0j (zt , rt j ), rt j ) − vt j′ (zt , ct0j′ (zt , rt j′ ), rt j′ ) = ut j (zt , ct0j (zt , rt j ), rt j ) − ut j′ (zt , ct0j′ (zt , rt j′ ), rt j′ ) ! Z Z +β J 0 (zt+1 , rt+1,k ), rt+1,k ) + ψk [pt+1 (zt+1 , rt+1 )]]gr (rt+1 )drt+1 ∑ [ut+1,k (zt+1, ct+1,k k=1 × [at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ft j′ (zt+1 |zt , rt j′ )]dzt+1 . (2.21) 3 Identification For each individual unit, the random variables (dt , ct , xt ), t = 1, · · · , T are observable. Hence in the population, the joint distribution F(dt , ct , xt ) is observed. Identification is semiparametric in the sense that we assume parametric restrictions on ut j , gε , and gr , but we only impose shape, exclusion, and granularity restrictions on f jt (zt+1 |zt , rt j ) and Π(st , st+1 |xt , rt ), 11 the conditional distribution of the unobserved state. For t = 1, · · · , T , j = 1, · · · , J, k 6= j, define ut jk (zt , ct , rt ) = ut j (zt , ct j , rt j ) − utk (zt , ctk , rtk ), and vt jk (zt , ct , rt ) = vt j (zt , ct j , rt j ) − vtk (zt , ctk , rtk ). In what follows, we will use the shorthand notations ut jk and vt jk for ut jk (zt , ct , rt ) and vt jk (zt , ct , rt ). Assumption 3.1. For t = 1, · · · , T , j = 1, · · · , J, k 6= j, 1. β ∈ [0, 1) is known. 2. ut j (zt , ct j , rt j ) = ut j (zt , ct j , rt j ; θ1 ) is known up to θ1 , and increasing and strictly concave in ctl j , with limctl j →0 ut j (zt , ct j , rt j ; θ1 ) = −∞, l j = 1, · · · , L j . 3. Pr (xt , st , rt )|ut jk (θ1 ) 6= 0, θ1 ∈ Θ1 ; ut jk (θ1 ) 6= ut jk (θ̃1 ), θ1 6= θ̃1 ∈ Θ1 = 1. 4. ε j and εk are independent with common density function, gε , which is twice continuously differentiable and log-concave with support ℜ. 5. gr (rt ) = gr (rt ; θ2 ), is continuous and known up to θ2 . 6. f jt (zt+1 |zt , ct j ) = f jt (xt+1 |xt , st , ct j )πt (st+1 |st , xt ), where π(st+1 |st , xt ) is the conditional density of st+1 . 7. ft j (xt+1 |xt , st , ct j ) is twice continuously differentiable in ctl j , l j = 1, · · · , L j , with ∂ ft j (xt+1 |xt , st , ct j ) < ∞. max sup l j =1,···,L j ctl ∂ctl j j 8. Let wt be a proper subset of xt . Π(st , st+1 |xt ) ∈ {(sq , sq′ , πqq′ (wt )), q = 1, · · · , Q, q′ = 1, · · · , Q; Q ∈ N }. T / 9. Dx ≥ 2, where Dx is the dimension of xt , and for at least one k ∈ {1, · · · , Dx }, xtk wt = 0, and the conditional distribution of xkt given x−k,t , where x−k,t = {x jt , j 6= k}, is absolutely continuous with respect to a Lebesgue measure with non-degenerate Radon-Nikodym density. Without loss of generality, set k = 1. R 10. Let θ = (θ1, θ2 ), and pt0j (xt , sq ; θ) = pt0j (xt , sq , rt ; θ)gr (rt ; θ2 )drt . pt0j (xt , sq ; θ) 6= pt0j (xt , sq′ ; θ) for sq 6= sq′ , and for any wt and sq , αt j ∈ ℜ, depending on wt and sq , exists such that P({xt |pt0j (xt , sq ; θ) = αt j }) > 0. 12 Define 0 ptt+1, jk (xt ; θ, Π) = Z Q Q 0 πqq′ (wt )pt0j (xt , sq ; θ)pt+1,k (xt+1 , sq′ ; θ) fk,t (xt+1 |xt , sq ), ∑∑ ′ q=1 q =1 0 and P0 (x; θ, Π) = (ptt+1, jk (xt ; θ, Π),t = 2, · · · , T − 1, j, k = 1, · · · , J). Let (θ0 , Π0 ) be the true parameter vector, in that the probabilities generated from the model at (θ0 , Π0) coincides with the true probabilities: P(x) = P0 (x; θ0 , Π0 ). We say that the parameters of the model are identified if x distinguishes between the true parameter vector and any other admissible parameter vector: (θ, Π) 6= (θ0 , Π0 ) ⇒ P0 (x; θ, Π) 6= P0 (x; θ0 , Π0 ), with probability one. Theorem 3.2. Suppose assumption 3.1 holds. Then (θ0 , Π0) is identified. The proof of 3.2 is in Appendix A.1. 4 Estimator In this section, we propose a GMM estimator for the parameters of the model, θ. We choose to propose a GMM estimator instead of the ML estimator for two reasons. First, definition of the GMM estimator does not require specifying the distribution of measurement errors. Second, the GMM estimator is implementable without the need to specify the initial conditions for the process of st . We begin by imposing a parametric restriction on πqq′ (w). Assumption 4.1. For q, q′ = 1, · · · , Q, πqq′ (w) = πqq′ (w; θ3 ) is known up to a set of finitedimensional parameters, θ3 . Rechristen θ = (θ1 , θ2 , {sq , q = 1, · · · , Q}, θ3 ). Suppose n observations of the random vector {yit = (dit , cit , xit ),t = 1, · · · , T } are independently drawn from F(dt , ct , xt ). Let yi = (y′i3 , · · · , y′iT )′ , p0i = (p0i3 , · · · , p0iT ), and c0i = (c0i3 , · · · , c0iT ). For each i, and for t = 3, · · · , T , 13 define the residuals ρ1it j (yi ; θ) = dit j − pt0j (xit ; θ), j = 2, · · · , J, ρ2it j (yi ; θ) = cit j − ct0j (xit ; θ), j = 1, · · · , J, 0 ρ3t jk (yi ; θ) = dit j dit+1,k − ptt+1, jk (xit ; θ), j = 1, · · · , J − 1, k = j + 1, · · · , J. Define ρ1it (yi ; θ) = (ρ1it j (yi ; θ), j = 2, · · · , J), ρ2it (yi ; θ) = (ρ2it j (yi ; θ), j = 1, · · · , J), and ρ3it (yi ; θ) = (ρ3t jk (yi ; θ), j = 1, · · · , J − 1, k = j + 1 · · · , J). Define the L + (J + 2)(J − 1)/2-dimensional residual vector ρit (yi ; θ) = (ρ1it , ρ2it , ρ3it )′ . Let Xit be a (L + (J + 2)(J − 1)/2) × NtX matrix of instruments, and define the NtX -dimensional vector. mit (θ) = Xit′ ρit (yi ; θ). (4.1) T −1 X Nt , Define the N X -dimensional vector mi (θ) = (mi2 (θ)′ , · · · , miT −1 (θ)′ )′ , where N X = ∑t=2 and the N X -dimensional vector of moments m(θ) = E[mi (θ)]. Let Ω be a N X ×N X -dimensional symmetric, positive definite weighting matrix. Then from the results of section 3, θ0 minimizes S(θ) = m(θ)′ Ωm(θ). (4.2) To implement the estimator, we assume the number of types, Q, is known to the investigator. We discuss methods for estimating Q at the end of this section. Let m̂(θ) = 1 n ∑ mi(θ). n i=1 (4.3) Then the GMM estimator for θ0 , θ̂ minimizes Ŝ(θ) = m̂(θ)′ Ω̂m̂(θ), 14 (4.4) where Ω̂ is a consistent estimator for Ω. 5 Computing The Estimator In this section, we present a method for computing the estimator proposed in the previous section. We begin by discussing updating the discrete alternative-specific CCPs and continuous choice variables. We describe the algorithm starting with the o + 1 iteration with p0,[o] , c0,[o] , θ[o] in hand. The alternative-specific continuous choice variables can be updated by iterating (over o′ ) 0,[o′ +1] cit j " 0,[o′ ] (xit , rit j ; θ[o] ) = cit j (xit , rit j ; θ[o] ) #−1 ′ ∂2 [o ] 0,[o] [o] vit j (xit , cit j (xit , rit j ; θ[o] ), cit+1 , pit+1 , rit j ; θ[o] ) − 2 ∂cit j ∂ [o′ ] 0,[o] [o] [o] [o] vit j (xit , cit j (xit , rit j ; θ ), cit+1 , pit+1 , rit j ; θ ) , × ∂cit j 0,[o] until convergence, where the procedure is initialized by cit 0,[o+1] . cit (5.1) . Denote the fixed point by The conditional choice probabilities can be updated as follows: [o+1] pit j 0,[o+1] Given ci [o+1] (xit , rit j ; θ[o] ) = Ψ j (vit j (xit , cit j 0,[o+1] (θ[o] ), and pi 0,[o] [o] (xit , rit j ; θ[o] ), cit+1 , pit+1 , rit j ; θ[o] )). (5.2) (θ[o] ), θ is updated as follows θ[o+1] = θ[o] ′ −1 ∂ ∂ [o] 0,[o+1] [o] 0,[o+1] [o] [o] 0,[o+1] [o] 0,[o+1] [o] − m(θ ; c (θ ), p (θ )) Ω̂ m(θ ; c (θ ), p (θ )) ∂θ ∂θ ′ ∂ [o] 0,[o+1] [o] 0,[o+1] [o] × m(θ ; c (θ ), p (θ )) Ω̂ m(θ[o] ; c0,[o+1] (θ[o] ), p0,[o+1] (θ[o] )), (5.3) ∂θ where c = (ci3 , i = 1, · · · , n) and p = (pi3 , i = 1, · · · , n). The full algorithm is as follows. 15 Main Algorithm. 1– Initialize θ[0] ∈ Θ, p0,[0] ∈ [0, 1][n(T−3)(J−1)] , and c0,[0] ∈ ℜn(T −3)L . 2– For o ≥ 1 2.1– Update c0,[o] using (5.1) 2.2– Update p0,[o] using equation (5.2) 2.3– Update θ[o] using equation (5.3) Until convergence in θ. Notice that steps 2.1 and 2.1 of the main algorithm is simply evaluating the objective function at the new trial values of θ, while step 2.3 is a Gauss-Newton step. Hence, the main algorithm converges, and the convergence rate is at most quadratic. 6 Asymptotic properties and consistent covariance estimation This section provides additional sufficient conditions for consistency and normality of the estimator, θ̂, of θ0 ∈ Θ. √ n-asymptotic Assumption 6.1. As sample of n independent realizations is drawn from F(d, c, x). For each i = 1, · · · , n, (dit , cit , xit ,t = 1, · · · , T ) is observed. Assumption 6.2. Ω̂ is symmetric and positive definite with kΩ̂ − Ωk = o p (1). Theorem 6.3. Suppose (i) Assumption 3.1 holds, (ii) Assumption 4.1 hold, and (iii) Assumpp tions 6.1, and 6.2 hold. Then θ̂ −→ θ0 , and √ p n(θ̂ − θ0 ) −→ N(0,V ), where V = (M ′ ΩM)−1 (M ′ ΩΣΩM)(M ′ΩM)−1 , M = ∂m(θ0 )/∂θ, and Σ = E[mi (θ0 )mi (θ0 )′ ]. The proof of Theorem 6.3 is standard and can be found in Newey and McFadden (1994). To estimate V , let M̂ = m(θ̂)/∂θ, Σ̂ = ∑ni=1 mi (θ̂)mi (θ̂)′ /n, and define V̂ = (M̂ ′ Ω̂M̂)−1 (M̂ ′ Ω̂Σ̂Ω̂M̂)(M̂ ′ Ω̂M̂)−1 . 16 Theorem 6.4. Suppose the conditions of Theorem 6.3 hold. Then kV̂ −V k = o p (1). Again, the proof of Theorem 6.4 is standard, and can also be found in Newey and McFadden (1994). 7 Monte Carlo study A LEMMA AND THEOREMS A.1 Proof of Theorem 3.2 For fixed θ1 , in last period, T , assumption 3.1.2 implies a unique solution of cT l j to the equation ∂uT j (zT , cT j , rT j ; θ1 )/∂cT l j = 0, j = 1, · · · , J, l j = 1, · · · , L j , denoted by c0T l j (zT , rT ). This solution, along with assumptions 3.1.3, and 3.1.4 imply unique pT j (zT , rT ; θ1 ), j = 1, · · · , J. Note that, under the conditions of the Theorem, f jt (xt+1 |xt , ct j ) can be computed from the population distribution of the observables. Now consider period T − 1. For fixed (θ, Π), these results, along with assumptions 3.1.1, and 3.1.5-3.1.7, vT −1, j (zT −1 , rT −1 ; θ, Π) = uT −1, j (zT −1 , cT −1. j , rT −1, j ; θ1 ) + βV̄T j (zT −1 , rT −1, j ; θ, Π), j = 1, · · · , J are unique. For fixed aT j,k , j, k = 1, · · · , J, vT −1, j j′ (zT −1 , rT −1 ; θ, Π) has a unique representation as function of uT −1, j j′ (zT −1 , rT −1 ; θ, Π), (aT jk , aT j′ k ), k = 1, · · · , J, uT (zT , cT , rT ; θ1 ), and pT (xT ; θ, Π), given in equation (2.18), from which c0T −1,l j (zT −1 , rT −1 ; θ, Π) uniquely solves equation (2.19), and in turn, pT −1, j (zT −1 , rT −1 ; θ, Π) is unique. Proceeding by backward induction obtain unique p jt (zt , rt , θ, Π), j = 1, · · · , J, t = 1, · · · , J, from which we obtain unique p jt (zt , θ, Π), j = 1, · · · , J, t = 1, · · · , J, and unique ptt+1, jk (xt ; θ, Π), j, k = 1, · · · , J, t = 1, · · · , T − 1. By assumption, ptt+1, jk (xt ; θ0 , Π0 ) = ptt+1, jk (xt ), the population probability. Suppose (θ, Π) 6= (θ0 , Π0 ) exists with ptt+1, jk (xt ; θ, Π) = ptt+1, jk (xt ). Then under the assumptions of the theorem, Gayle (2012) shows that (θ, Π) = (θ0 , Π0 ). In particular, the log-concavity of gε , assumptions 3.1.9, and 3.1.10 along with the uniqueness of basis vectors obtain the result. 17 References AGUIRREGABIRIA , V. AND P. M IRA (2002): “Swapping The Nested Fixed Point Algorithm: A Class of Estimators for Distrete Markov Decision Models,” Econometrica, 70, 1519–1543. ——— (2007): “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75, 1–53. A LTUG , S. AND R. A. M ILLER (1998): “Effect of Work Experience of Female Wages and Labour Supply,” The Review of Economic Studies, 65, 45–85. A RCIDIACONO , P. AND R. M ILLER (2010): “CCP Estimation of Dynamic Discrete Choice Models with Unobserved Heterogeneity,” MIMEO, Duke University. BAJARI , P., C. B ENKARD , AND J. L EVIN (2007): “Estimating Dynamic Models of Imperfect Competition,” Econometrica, 75. D UBIN , J. A. AND D. L. M C FADDEN (1984): “An Econometric Analysis of Residential Electric Applicance Holdings and Consumption,” Econometrica, 52, 345–362. G AYLE , G.-L. AND R. A. M ILLER (2003): “Life-Cycle Fertility and Human Capital Accumulation,” Unpublished Manuscript, Carnegie Mellon University. G AYLE , W.-R. (2006): “A Dynamic Structural Model of Labor Market Supply and Educational Attainment,” MIMEO: University of Virginia. H ANEMANN , M. W. (1984): “Discrete/Continuous Models of Consumer Demand,” Econometrica, 52, 541–561. H OTZ , J. V. AND R. A. M ILLER (1993): “Conditional Choice Probabilities and the Estimation of Dynamic Models,” Review of Economic Studies, 60, 497–529. H OTZ , J. V., R. A. M ILLER , S. S ANDERS , AND J. S MITH (1994): “A Simulation Estimator for Dynamic Models of Discrete Choice,” Review of Economic Studies, 61, 265–289. N EWEY, W. K. AND D. M C FADDEN (1994): Large Sample Estimation and Hypothesis Testing, Elsevier Science Publishers. RUST, J. (1987): “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher,” Econometrica, 55, 999–1033. 18