CCP Estimation of Dynamic Discrete/Continuous

advertisement
CCP Estimation of Dynamic Discrete/Continuous
Choice Models with Generalized Finite Dependence
and Correlated Unobserved Heterogeneity
Wayne-Roy Gayle∗
October 2, 2012
Abstract
This paper investigates conditional choice probability estimation of dynamic structural discrete and continuous choice models. We extend the concept of finite dependence in a way that accommodates non-stationary, irreducible transition probabilities.
We show that, under this new definition of finite dependence, one-period dependence
is obtainable in any dynamic model. This finite dependence property also provides a
convenient and computationally cheap representation of the optimality conditions for
the continuous choice variables. We allow for general form of discrete-valued unobserved heterogeneity in utility, transition probabilities, and production functions. The
unobserved heterogeneity may be correlated with the observable state variables, and
its dimension may be large and unknown. Because of the potentially large dimension
of the distribution of unobserved heterogeneity, we modify the smoothly clipped absolute deviation estimator to jointly estimate the structural parameters of the model, and
the dimension of the unobserved heterogeneity. We show that the estimator is rootn–asymptotically normal. We develop a new and computationally cheap algorithm to
compute the estimator.
KEYWORDS: Conditional Choice Probabilities Estimator, Discrete/Continuous Choice,
Finite Dependence, Correlated Random Effects.
JEL: C14, C31, C33, C35.
∗ Department
of Economics, University of Virginia, Monroe Hall, McCormick Rd, Room 208A, Charlottesville, VA 22903, E-mail: wg4b@virginia.edu.
1
1 Introduction
In this paper, we investigate conditional choice probability (CCP) estimation of dynamic
structural discrete/continuous choice models with unobserved individual heterogeneity. We
show that a modest extension of the finite dependence definition in Hotz and Miller (1993),
Altug and Miller (1998), and Arcidiacono and Miller (2010) accommodates general nonstationary and irreducible transition probabilities, as well as a general form of correlated
unobserved heterogeneity in the utility functions, production functions, and the transition
probabilities. We propose a generalized method of moments (GMM) estimator for the structural parameters of the model and derive their asymptotic distributions. We also propose a
simple algorithm to implement the estimator.
Since its introduction by Hotz and Miller (1993), CCP estimation of dynamic structural
models has flourished in empirical labor economics and industrial organization, largely because of its potentially immense reduction in computational cost compared to the more traditional backward recursive- and contraction mapping-based full maximum likelihood estimation pioneered by Rust (1987), referred to as the nested fixed point algorithm (NFXP).
The CCP estimator circumvents having to solve the dynamic programming problem for each
trial value of the structural parameters by making use of a one-to-one mapping between the
normalized value functions and the CCPs established in Hotz and Miller (1993). Therefore,
nonparametric estimates of the CCPs can be inverted to obtain estimates of the normalized
value functions, which can then be used in estimating the structural parameters.
Empirical application the early formulation of CCP estimation had important limitations
relative to the NFXP method. The emerging literature has focused on separate, but related
drawbacks. The first is nonparametric estimation of the CCPs results in less efficient estimates of the structural parameters, as well as, relatively poor finite sample performance. The
second is the difficulty of accounting for unobserved individual heterogeneity, mainly due to
having to estimate the CCPs by nonparametric methods. A limitation of both approaches to
estimation is the difficulty of both the CCP and NFXP estimators is they are largely restricted
to discrete choice, discrete states models.
Aguirregabiria and Mira (2002) proposed a solution to the issue of efficiency and finite
sample performance by showing, for a given value of the preference parameters, the fixed
2
point problem in the value function space can be transformed into a fixed point problem
in the probability space. Aguirregabiria and Mira (2002) propose swapping the nesting of
the NFXP, and show that the resulting estimator is asymptotically equivalent to the NFXP
estimator. Furthermore, Aguirregabiria and Mira (2002) show in simulation studies that
their method produce estimates 5 to 15 times faster than NFXP. The method proposed by
Aguirregabiria and Mira (2002) is restricted to discrete choice models in stationary environments, and is not designed to account for unobserved individual heterogeneity.
Recent developments concerning incorporating unobserved heterogeneity into CCP estimators include Aguirregabiria and Mira (2007), Arcidiacono and Miller (2010). Aguirregabiria and Mira
(2007) incorporate permanent unobserved heterogeneity into stationary, dynamic discrete
games. Their method requires multiple inversion of potentially large dimensional matrices. Arcidiacono and Miller (2010) propose a more general method for incorporating timespecific or time-invariant unobserved heterogeneity into CCP estimators. Their method
modifies the Expectations-Maximization algorithm proposed in Acidiacono (2002). Their
method also allows for a restricted form of correlation between the unobserved and observed
state variables. However, Arcidiacono and Miller’s method is only applicable to discrete
dynamic models.
Altug and Miller (1998) proposed a method for allowing for continuous choices in the
CCP framework. By assuming complete markets, estimates of individual effects and aggregate shocks are obtained, which are then used in the second stage to form (now) observationally equivalent individuals. These observationally equivalent individuals are used
to compute counterfactual continuous choices. Bajari et al. (2007) modify the methods of
Hotz and Miller (1993) and Hotz et al. (1994), to estimating dynamic games. Their method
of modeling unobserved heterogeneity in continuous choices is inconsistent with the dynamic selection.
The finite dependence property; when two different policies associated with different initial choices lead to the same distribution of states after a few periods, is critical for the computational feasibility and finite sample performance of CCP estimators. Finite dependence
combined with the invertibility result of Hotz and Miller (1993) results in significant reduction in computational cost of estimating dynamic structural models. Essentially, the smaller
the order of dependence, the faster and more precise the estimator, because fewer future
choice probabilities that either have to be estimated or updated, depending of the method of
3
estimation. The concept of finite dependence was first introduced by Hotz and Miller (1993),
extended by Altug and Miller (1998), and further by Arcidiacono and Miller (2010). Despite
these generalizations, the concept of finite dependence is largely restricted to discrete choice
models with either stationary transitions or the renewal property.
This paper make three separate, but closely related contributions to the literature on CCP
estimation of dynamic structural models. We extend the concept of finite dependence to allow for general non-stationary and irreducible transition probabilities. While its definition is
precise and well understood, the strategy to construct finite dependence in dynamic structural
models have been largely ad hoc and imprecise, many times relying on assumptions that are
either theoretically unjustified, or significantly restricting the data. Altug and Miller (1998),
Gayle and Miller (2003), and Gayle (2006) rely on complete market and degenerate transition probability assumptions to form counterfactual strategies that obtain finite dependence.
A key insight of Arcidiacono and Miller (2010) is “the expected value of future utilities from
optimal decision making can always be expressed as functions of the flow payoffs and conditional choice probabilities for any sequence of future choices, optimal or not.” This insight
is the basis of our extension of the finite dependence property. We show the expected value
of future utilities from optimal decision making can be expressed as any linear combination of flow payoffs and conditional CCPs, as long as the weights sum to one. This insight
converts the difficult problem of finding one pair of sequences of choices that obtains finite
dependence to a continuum of finite dependencies from which to choose.
Given we are now able to choose from a continuum of finite dependence representations,
the question becomes whether there is a choice that obtains one-period finite dependence. Indeed, there are also a continuum of one-period finite dependencies, regardless of the form of
the transition probabilities. One particular choice has the advantage of being elegant and intuitive, as well as providing a method to accommodate continuous choices. Our approach to
accounting for continuous choices do not rely on first stage estimation as in Altug and Miller
(1998), and Bajari et al. (2007), nor does it require forward simulation as in Hotz et al.
(1994), and Bajari et al. (2007). The proposed method for estimating discrete/continuous
dynamic structural models parallels the method for estimating discrete/continuous static
structural models of Dubin and McFadden (1984), and Hanemann (1984), operationalized
by the inversion result of Hotz and Miller (1993), Arcidiacono and Miller (2010), and the
generalized finite dependence result of this paper.
4
To avoid stochastic degeneracy, the econometric specification of discrete/continuous structural models requires at least as many choice-specific unobserved shocks as there are choices.
The estimator developed in this paper conveniently accommodates these traditional i.i.d.
shocks, as well as, other correlated unobserved heterogeneity. The advantages of the algorithm proposed in this paper are, it does not require specifying initial conditions, it does not
require discrete approximation of the continuous choice variables and the value functions,
and its convergence is well understood.
2 Model
2.1 General framework
The general setup of a dynamic structural discrete/continuous choice model is as follows.
In each period, t, an individual chooses among J discrete, mutually exclusive alternatives.
Let dt j be one if the discrete action j ∈ {1, · · · , J} is taken, and zero otherwise, and define dt = (dt1 , · · · , dtJ ). Associated with each discrete alternative, j, the individual chooses
L j continuous alternatives. Let ctl j ∈ ℜ+ , l j ∈ 1, · · · , L j , be the continuous actions assoL
ciated with alternative j with ctl j > 0 if dt j = 1. Define ct j = (ct1 , · · · , ctL j ) ∈ ℜ+j , and
ct = (ct1 , · · · , ctJ ) ∈ ℜL+ , where L = ∑Jj=1 L j . Also, let ( j, ct j ) be the vector of discrete and
continuous actions associated with alternative j. The current period payoff associated with
action ( j, ct j ) depends on the observed state xt ∈ ℜDx , where Dx is the dimension of xt , the
unobserved state st ∈ ℜDs , where Ds is the dimension of st , the unidimensional discrete-
choice–specific shock ε jt ∈ ℜ, and the L j -dimensional vector of continuous-choice–specific
shocks rt j = (rt1 , · · · , rtL j ) ∈ ℜL j . Let zt = (xt , st ). The probability density function of zt+1
given action ( j, ct j ) is taken in period t is denoted by f jt (zt+1 |zt , ct j ). The shocks associated
with alternative ( j, ct j ) in period t, et j = (εt j , rt j ), are observed to the individual at the beginning of period t. The individual’s conditional direct current period payoff from choosing
alternative ( j, ct j ) in period t is denoted by ut j (zt , ct j , rt j ) + εt j .
Define yt j = (dt j , ct j ). The individual chooses the vector yt = (yt1 , · · · , ytJ ) to sequentially
5
maximize the expected discounted sum of payoffs:
E
(
T
J
∑ ∑ βt−1dt j [ut j (zt , ct j , rt j ) + εt j ]
t=1 j=1
)
,
(2.1)
where β ∈ (0, 1) is the discount factor. In each period, t, the expectation is taken over
zt+1 , · · · , zT and et+1 , · · · , eT . The solution to maximizing expression (2.1) is a Markov decision rule for optimal choice conditional on the time-specific state vectors and i.i.d. shocks.
Let the optimal decision rule at period t be given by (dt0j (zt , et ), ct0j (zt , et )). Let the ex-ante
value function in period t, Vt (zt ), be the discounted sum of expected future payoffs, before
et is revealed, given the optimal decision rule:
Vt (zt ) = E
(
T
J
∑ ∑ βτ−t dτ0j (zτ, eτ)[uτ j (zτ, c0τ j (zτ, eτ), rτ j ) + ετ j ]
τ=t j=1
)
.
As is standard in dicrete/continuous models, the additive separability of the utility function implies the discrete-choice–specific continuous choice is a function of their associated
shocks and not of εt . The expected value function in period t + 1, given zt , rt j , the discrete
choice, j, and corresponding optimal continuous choice, ct0j (zt , rt j ), is
V̄t+1, j (zt , rt ) = β
Z
Vt+1 (zt+1 ) f jt (zt+1 |zt , ct0j (zt , rt j ))dzt+1,
R
where Vt+1 (zt+1 ) = Vt+1 (zt+1 , rt+1 )gr (rt+1 ), and gr is the density function of rt . Let gε
be the density function of ε, and assume that εt and rt are independent random vectors, so
that ge = gε gr . If behavior is governed by a Markov decision rule, then Vt (zt ) can be written
recursively:
Vt (zt ) = E
=
Z
=
Z
(
J
∑
dt0j (zt , et )
j=1
J
∑ dt0j (zt , et )
j=1
J
∑ dt0j (zt , et )
j=1
)
0
ut j (zt , ct j (zt , rt j ), rt j ) + εt j + βV̄t+1, j (zt , rt )
ut j (zt , ct0j (zt , rt j ), rt j ) + εt j + βV̄t+1, j (zt , rt ) ge (et )de,
vt j (zt , ct0j (zt , rt j ), rt j ) + εt j ge (et )de
6
where
vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j ) + βV̄t+1, j (zt , rt ),
(2.2)
the choice-specific conditional value function without εt j . The optimal conditional continuous choice, given the discrete alternative j being chosen in period t, must satisfy
∂
vt j (zt , ct0j (zt , rt j ), rt j ) = 0,
∂ctl j
(2.3)
for l j = 1, · · · , L j . Given the optimal conditional continuous choice, ct0j (zt , rt j ), the individual’s discrete choice of alternative j is optimal if
dt0j (zt , et ) =
(
1
if
0 (z , r ), r ) + ε
vt j (zt , ct0j (zt , rt j ), rt j ) + εt j > vtk (zt , ctk
t tk
tk
tk
0
otherwise
∀k 6= j
(2.4)
Finally, the optimal unconditional continuous choice, ct j (zt , rt j ), is given by
ct j (zt , rt j ) = dt0j (zt , et )ct0j (zt , rt j ).
(2.5)
2.2 Alternative representation
The probability of choosing alternative j at time t, conditional on zt , rt , and the vector of
0 , · · · , c0 ) is given by
choice-specific optimal conditional continuous choices, ct0 = (ct1
tJ
pt0j (zt , rt ) =
Z
dt0j (zt , rt , εt )gε (εt )dεt ,
(2.6)
so that, for all (zt , rt ), ∑Jj=1 pt0j (zt , rt ) = 1, and pt0j (zt , rt ) > 0 for all j. Let pt0 (zt , rt ) =
0 (z , r ), · · · , p0 (z , r )) be the vector of conditional choice probabilities. Lemma 1 of
(pt1
t t
tJ t t
Arcidiacono and Miller (2010) show a function ψ : [0, 1]J 7→ ℜ exists such that, for k =
1, · · · , J
0
ψk (pt (zt , rt )) ≡ Vt (zt , rt ) − vtk (zt , ctk
(zt , rtk ), rtk ).
(2.7)
Equation (2.7) is simply equation (3.5) of Arcidiacono and Miller (2010), modified so the
choice probabilities and value functions are also conditional on the i.i.d. shocks associated
with the conditional continuous choices. Our key insight is, given equation (2.7) holds for
7
k = 1, · · · , J, then for any J-dimensional vector of real numbers at = (at1 , · · · , atJ ) such that
∑Jk=1 atk = 1, we have
J
Vt (zt , rt ) =
∑ atk [vtk (zt , ctk0 (zt , rtk ), rtk ) + ψk (pt (zt , rt ))].
(2.8)
k=1
Let at+1, j = (at+1,1 j , · · · , at+1,J, j ) be the weights associated with the initial discrete choice,
j, in period t. Denote f jt (zt+1 |zt , ct0j (zt , rt j )) by f jt (zt+1 |zt , rt j ). Substituting equation (2.8)
into equation (2.2) gives:
vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j )
J Z
+β ∑
0
[vt+1,k (zt+1 , ctk
(zt , rtk ), rt+1,k )
k=1
+ ψk (pt+1 (zt+1 , rt+1 ))]at+1,k j gr (rt+1)drt+1 f jt (zt+1|zt , rt j )dzt+1 ,
(2.9)
Equation (2.9) shows the value function conditional on (zt , rt ) can be written as the flow
payoff of the choice plus any weighted sum of a function of the one period ahead CCPs
plus the one period ahead conditional value functions, where the weights sum to one. This
modest extension of the results of Arcidiacono and Miller (2010) provides a powerful tool
for obtaining finite dependence in any model which can be formulated as the one developed
in the previous section. Specifically, note the following.
2.3 Generalized finite dependence
R
Define f jt (zt+1 |zt ) = f jt (zt+1 |zt , rt j )gr (rt j )drt j . For any initial choice ( j, ct j ), for periods τ = {t + 1, · · · ,t + ρ}, and any corresponding sequence aτ = {aτk j , k, j = 1, · · · , J} with
∑Jk=1 aτk j = 1, define
κτ j (zτ+1 , |zt , rt j ) =
(
for τ = t
f jt (zt+1 |zt , rt j )
R
.
∑Jk=1 aτ+1,k j fkτ (zτ+1 |zτ )κτ−1, j (zτ |zt , rt j )dzτ for τ = t + 1, · · · ,t + ρ
(2.10)
R
Because ∑Jk=1 aτk j = 1, κτ j (zτ+1 , |zt )dzτ+1 = 1. We also impose the restriction a = (at+1 , · · · , at+ρ )
is so that κt+ρ+1 j (zt+ρ+1 , |zt ) > 0. This restriction does not require a j ≥ 0. Then, by forward
8
substitution, equations (2.9) and (2.10) obtain
vt j (zt , ct0j (zt , rt j ), rt j ) = ut j (zt , ct0j (zt , rt j ), rt j )
t+ρ
+
J Z
∑ ∑
τ=t+1 k=1
+β
t+ρ+1
Z
βτ−t [uτk (zτ , c0τk (zτ , rτk ), rτk ) + ψk [pτ (zτ , rτ )]]aτk j gr (rτ )κτ−1, j (zτ |zt , rt j )drτ dzτ
Vt+ρ+1 (zt+ρ+1 , rt+ρ+1 )gr (rt+ρ+1 )κt+ρ+1, j (zt+ρ+1 |zt , rt j )drt+ρ+1 dzt+ρ+1 .
(2.11)
Equation (2.11) shows the alternative-specific conditional value functions depend on the sequence of (possibly non-optimal) choice probabilities pt+1 = (pt+1 (zt+1 , rt+1 ), · · · , pT (zT , rT )),
and their corresponding optimal alternative-specific continuous choices ct0 = (ct0 (zt , rt ), · · · , c0T (zT , rT )),
so that
vt j (zt , ct0j (zt , rt j ), rt j ) = vt j (zt , ct0 , pt+1 , rt j ).
(2.12)
In what follows, we suppress this dependence on (ct0 , pt+1 ) and reintroduce them when clarity is required. Using equation (2.11), we can therefore express the difference in the conditional value functions associated with two alternative initial choices, j and j′ as
vt j (zt , rt j ) − vt j′ (zt , rt j′ ) = ut j (zt , rt j ) − ut j′ (zt , rt j′ )
t+ρ
+
J Z
∑ ∑
τ=t+1 k=1
βτ−t [uτk (zτ , rτk ) + ψk [pτ (zτ , rτ )]]gr (rτ )drτ
× [aτk j κτ−1, j (zτ |zt , rt j ) − aτk j′ κτ−1, j′ (zτ |zt , rt j′ )]dzτ
+β
t+ρ+1
Z
Vt+ρ+1 (zt+ρ+1 , rt+ρ+1 )gr (rt+ρ+1 )drt+ρ+1
× [κt+ρ, j (zt+ρ+1 |zt , rt j ) − κt+ρ, j′ (zt+ρ+1 |zt , rt j′ )]dzt+ρ+1.
(2.13)
Therefore, we say a pair of initial choices, ( j, ct j ) and ( j′ , ct j′ ) exhibit generalized ρ-period
dependence if corresponding sequences (at+1, j , · · · , at+ρ, j ), and (at+1, j′ , · · · , at+ρ, j′ ) exists
such that
κt+ρ, j (zt+ρ+1 |zt , rt j ) = κt+ρ, j′ (zt+ρ+1 |zt , rt j′ )
almost everywhere.
9
(2.14)
We now show that this generalization of the finite dependence property can be used to
obtain one-period dependence for any model that satisfies the setup given in the previous
section. For initial choice ( j, ct j ),
κt+1, j (zt+2 |zt ) =
Z
J
∑ at+1,k j fkt+1 (zt+2|zt+1) f jt (zt+1|zt , rt j )dzt+1,
k=1
so for any pair of initial choices, ( j, ct j ) and ( j′ , ct j′ ),
κt+1, j (zt+2 |zt , rt j ) − κt+1, j′ (zt+2 |zt , rt j′ )
=
=
Z
J
∑ [at+1,k j fkt+1 (zt+2|zt+1) f jt (zt+1|zt , rt j ) − at+1,k j′ fkt+1 (zt+2|zt+1) f j′t (zt+1|zt , rt j′ )]dzt+1
k=1
Z J
∑ fkt+1 (zt+2|zt+1)[at+1,k j f jt (zt+1|zt , rt j ) − at+1,k j′ f j′t (zt+1|zt , rt j′ )]dzt+1.
(2.15)
k=1
Then a sufficient condition for one-period dependence is
at+1,k j f jt (zt+1 |zt , rt j ) = at+1,kl f j′t (zt+1 |zt , rt j′ ),
k = 1, · · · , J.
(2.16)
One particular choice that has an elegant representation is
at+1,l j = 1,
at+1, j j′ = f jt (zt+1 |zt , rt j )/ f j′t (zt+1 |zt , rt j′ ),
at+1,ll = 1 − at+1, j j′ .
(2.17)
Given the volume of alternative choices of weights that obtains one-period finite dependence,
and
we proceed by assuming ρ = 1, so
vt j (zt , rt j ) − vt j′ (zt , rt j′ ) = ut j (zt , rt j ) − ut j′ (zt , rt j′ )
+β
Z
Z
J
∑ [ut+1,k (zt+1, rt+1,k ) + ψk [pt+1 (zt+1, rt+1)]]gr(rt+1)drt+1
k=1
× [at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ft j′ (zt+1 |zt , rt j′ )]dzt+1 .
10
!
(2.18)
2.4 Optimality conditions
The alternative representation of the difference in conditional value functions provide a simple and convenient representation of the condition for optimal conditional continuous choice,
ct0j (zt , rt j ) given alternative j is chosen. The key is to note ∂vt j′ (zt , ct0j′ (zt , rt j′ ), rt j′ )/∂ctl j = 0
for j′ 6= j and l j = 1, · · · , L j . This equality, and equation (2.18) imply ct0j (zt , rt j ) solves
0=
∂
ut j (zt , ct0j (zt , rt j ), rt j )
∂ctl j
+β
Z
Z
J
0
(zt+1 , rt+1,k ), rt+1,k ) + ψk [pt+1 (zt+1 , rt+1 )]]gr (rt+1 )drt+1
∑ [ut+1,k (zt+1, ct+1,k
k=1
∂
×
[at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ftl (zt+1 |zt , rt j′ )]dzt+1
∂ctl j
!
(2.19)
The key to this solution is to note at+1k j′ will typically be a function of ct0j′ . For example,
expression (2.17) implies
∂
∂
at+1 j j′ = [ f j′t (zt+1 |zt , rtl )]−1
f jt (zt+1 |zt , rt j ),
∂ctl j
∂ctl j
and
∂
∂
at+1, j′ j′ = −
at+1 jl .
∂ctl j
∂ctl j
(2.20)
Substituting the optimal conditional continuous choice into equation (2.17) obtains
vt j (zt , ct0j (zt , rt j ), rt j ) − vt j′ (zt , ct0j′ (zt , rt j′ ), rt j′ ) = ut j (zt , ct0j (zt , rt j ), rt j ) − ut j′ (zt , ct0j′ (zt , rt j′ ), rt j′ )
!
Z Z
+β
J
0
(zt+1 , rt+1,k ), rt+1,k ) + ψk [pt+1 (zt+1 , rt+1 )]]gr (rt+1 )drt+1
∑ [ut+1,k (zt+1, ct+1,k
k=1
× [at+1k j ft j (zt+1 |zt , rt j ) − at+1k j′ ft j′ (zt+1 |zt , rt j′ )]dzt+1 .
(2.21)
3 Identification
For each individual unit, the random variables (dt , ct , xt ), t = 1, · · · , T are observable. Hence
in the population, the joint distribution F(dt , ct , xt ) is observed. Identification is semiparametric in the sense that we assume parametric restrictions on ut j , gε , and gr , but we only
impose shape, exclusion, and granularity restrictions on f jt (zt+1 |zt , rt j ) and Π(st , st+1 |xt , rt ),
11
the conditional distribution of the unobserved state. For t = 1, · · · , T , j = 1, · · · , J, k 6= j,
define ut jk (zt , ct , rt ) = ut j (zt , ct j , rt j ) − utk (zt , ctk , rtk ), and vt jk (zt , ct , rt ) = vt j (zt , ct j , rt j ) −
vtk (zt , ctk , rtk ). In what follows, we will use the shorthand notations ut jk and vt jk for ut jk (zt , ct , rt )
and vt jk (zt , ct , rt ).
Assumption 3.1. For t = 1, · · · , T , j = 1, · · · , J, k 6= j,
1. β ∈ [0, 1) is known.
2. ut j (zt , ct j , rt j ) = ut j (zt , ct j , rt j ; θ1 ) is known up to θ1 , and increasing and strictly concave
in ctl j , with limctl j →0 ut j (zt , ct j , rt j ; θ1 ) = −∞, l j = 1, · · · , L j .
3. Pr (xt , st , rt )|ut jk (θ1 ) 6= 0, θ1 ∈ Θ1 ; ut jk (θ1 ) 6= ut jk (θ̃1 ), θ1 6= θ̃1 ∈ Θ1 = 1.
4. ε j and εk are independent with common density function, gε , which is twice continuously
differentiable and log-concave with support ℜ.
5. gr (rt ) = gr (rt ; θ2 ), is continuous and known up to θ2 .
6. f jt (zt+1 |zt , ct j ) = f jt (xt+1 |xt , st , ct j )πt (st+1 |st , xt ), where π(st+1 |st , xt ) is the conditional
density of st+1 .
7. ft j (xt+1 |xt , st , ct j ) is twice continuously differentiable in ctl j , l j = 1, · · · , L j , with
∂
ft j (xt+1 |xt , st , ct j ) < ∞.
max sup l j =1,···,L j ctl ∂ctl j
j
8. Let wt be a proper subset of xt . Π(st , st+1 |xt ) ∈ {(sq , sq′ , πqq′ (wt )), q = 1, · · · , Q, q′ =
1, · · · , Q; Q ∈ N }.
T
/
9. Dx ≥ 2, where Dx is the dimension of xt , and for at least one k ∈ {1, · · · , Dx }, xtk wt = 0,
and the conditional distribution of xkt given x−k,t , where x−k,t = {x jt , j 6= k}, is absolutely
continuous with respect to a Lebesgue measure with non-degenerate Radon-Nikodym density. Without loss of generality, set k = 1.
R
10. Let θ = (θ1, θ2 ), and pt0j (xt , sq ; θ) = pt0j (xt , sq , rt ; θ)gr (rt ; θ2 )drt . pt0j (xt , sq ; θ) 6= pt0j (xt , sq′ ; θ)
for sq 6= sq′ , and for any wt and sq , αt j ∈ ℜ, depending on wt and sq , exists such that
P({xt |pt0j (xt , sq ; θ) = αt j }) > 0.
12
Define
0
ptt+1,
jk (xt ; θ, Π) =
Z
Q
Q
0
πqq′ (wt )pt0j (xt , sq ; θ)pt+1,k
(xt+1 , sq′ ; θ) fk,t (xt+1 |xt , sq ),
∑∑
′
q=1 q =1
0
and P0 (x; θ, Π) = (ptt+1,
jk (xt ; θ, Π),t = 2, · · · , T − 1, j, k = 1, · · · , J). Let (θ0 , Π0 ) be the true
parameter vector, in that the probabilities generated from the model at (θ0 , Π0) coincides
with the true probabilities: P(x) = P0 (x; θ0 , Π0 ). We say that the parameters of the model
are identified if x distinguishes between the true parameter vector and any other admissible
parameter vector:
(θ, Π) 6= (θ0 , Π0 ) ⇒ P0 (x; θ, Π) 6= P0 (x; θ0 , Π0 ),
with probability one.
Theorem 3.2. Suppose assumption 3.1 holds. Then (θ0 , Π0) is identified.
The proof of 3.2 is in Appendix A.1.
4 Estimator
In this section, we propose a GMM estimator for the parameters of the model, θ. We choose
to propose a GMM estimator instead of the ML estimator for two reasons. First, definition of
the GMM estimator does not require specifying the distribution of measurement errors. Second, the GMM estimator is implementable without the need to specify the initial conditions
for the process of st . We begin by imposing a parametric restriction on πqq′ (w).
Assumption 4.1. For q, q′ = 1, · · · , Q, πqq′ (w) = πqq′ (w; θ3 ) is known up to a set of finitedimensional parameters, θ3 .
Rechristen θ = (θ1 , θ2 , {sq , q = 1, · · · , Q}, θ3 ). Suppose n observations of the random
vector {yit = (dit , cit , xit ),t = 1, · · · , T } are independently drawn from F(dt , ct , xt ). Let yi =
(y′i3 , · · · , y′iT )′ , p0i = (p0i3 , · · · , p0iT ), and c0i = (c0i3 , · · · , c0iT ). For each i, and for t = 3, · · · , T ,
13
define the residuals
ρ1it j (yi ; θ) = dit j − pt0j (xit ; θ), j = 2, · · · , J,
ρ2it j (yi ; θ) = cit j − ct0j (xit ; θ), j = 1, · · · , J,
0
ρ3t jk (yi ; θ) = dit j dit+1,k − ptt+1,
jk (xit ; θ),
j = 1, · · · , J − 1, k = j + 1, · · · , J.
Define
ρ1it (yi ; θ) = (ρ1it j (yi ; θ), j = 2, · · · , J),
ρ2it (yi ; θ) = (ρ2it j (yi ; θ), j = 1, · · · , J),
and
ρ3it (yi ; θ) = (ρ3t jk (yi ; θ), j = 1, · · · , J − 1, k = j + 1 · · · , J).
Define the L + (J + 2)(J − 1)/2-dimensional residual vector ρit (yi ; θ) = (ρ1it , ρ2it , ρ3it )′ . Let
Xit be a (L + (J + 2)(J − 1)/2) × NtX matrix of instruments, and define the NtX -dimensional
vector.
mit (θ) = Xit′ ρit (yi ; θ).
(4.1)
T −1 X
Nt ,
Define the N X -dimensional vector mi (θ) = (mi2 (θ)′ , · · · , miT −1 (θ)′ )′ , where N X = ∑t=2
and the N X -dimensional vector of moments m(θ) = E[mi (θ)]. Let Ω be a N X ×N X -dimensional
symmetric, positive definite weighting matrix. Then from the results of section 3, θ0 minimizes
S(θ) = m(θ)′ Ωm(θ).
(4.2)
To implement the estimator, we assume the number of types, Q, is known to the investigator. We discuss methods for estimating Q at the end of this section. Let
m̂(θ) =
1 n
∑ mi(θ).
n i=1
(4.3)
Then the GMM estimator for θ0 , θ̂ minimizes
Ŝ(θ) = m̂(θ)′ Ω̂m̂(θ),
14
(4.4)
where Ω̂ is a consistent estimator for Ω.
5 Computing The Estimator
In this section, we present a method for computing the estimator proposed in the previous
section. We begin by discussing updating the discrete alternative-specific CCPs and continuous choice variables. We describe the algorithm starting with the o + 1 iteration with
p0,[o] , c0,[o] , θ[o] in hand.
The alternative-specific continuous choice variables can be updated by iterating (over o′ )
0,[o′ +1]
cit j
"
0,[o′ ]
(xit , rit j ; θ[o] ) = cit j (xit , rit j ; θ[o] )
#−1
′
∂2
[o ]
0,[o] [o]
vit j (xit , cit j (xit , rit j ; θ[o] ), cit+1 , pit+1 , rit j ; θ[o] )
−
2
∂cit j
∂
[o′ ]
0,[o] [o]
[o]
[o]
vit j (xit , cit j (xit , rit j ; θ ), cit+1 , pit+1 , rit j ; θ ) ,
×
∂cit j
0,[o]
until convergence, where the procedure is initialized by cit
0,[o+1]
.
cit
(5.1)
. Denote the fixed point by
The conditional choice probabilities can be updated as follows:
[o+1]
pit j
0,[o+1]
Given ci
[o+1]
(xit , rit j ; θ[o] ) = Ψ j (vit j (xit , cit j
0,[o+1]
(θ[o] ), and pi
0,[o]
[o]
(xit , rit j ; θ[o] ), cit+1 , pit+1 , rit j ; θ[o] )).
(5.2)
(θ[o] ), θ is updated as follows
θ[o+1] = θ[o]
′ −1
∂
∂
[o] 0,[o+1] [o]
0,[o+1] [o]
[o] 0,[o+1] [o]
0,[o+1] [o]
−
m(θ ; c
(θ ), p
(θ )) Ω̂
m(θ ; c
(θ ), p
(θ ))
∂θ
∂θ
′
∂
[o] 0,[o+1] [o]
0,[o+1] [o]
×
m(θ ; c
(θ ), p
(θ )) Ω̂ m(θ[o] ; c0,[o+1] (θ[o] ), p0,[o+1] (θ[o] )), (5.3)
∂θ
where c = (ci3 , i = 1, · · · , n) and p = (pi3 , i = 1, · · · , n). The full algorithm is as follows.
15
Main Algorithm.
1– Initialize θ[0] ∈ Θ, p0,[0] ∈ [0, 1][n(T−3)(J−1)] , and c0,[0] ∈ ℜn(T −3)L .
2– For o ≥ 1
2.1– Update c0,[o] using (5.1)
2.2– Update p0,[o] using equation (5.2)
2.3– Update θ[o] using equation (5.3)
Until convergence in θ.
Notice that steps 2.1 and 2.1 of the main algorithm is simply evaluating the objective
function at the new trial values of θ, while step 2.3 is a Gauss-Newton step. Hence, the main
algorithm converges, and the convergence rate is at most quadratic.
6 Asymptotic properties and consistent covariance estimation
This section provides additional sufficient conditions for consistency and
normality of the estimator, θ̂, of θ0 ∈ Θ.
√
n-asymptotic
Assumption 6.1. As sample of n independent realizations is drawn from F(d, c, x). For each
i = 1, · · · , n, (dit , cit , xit ,t = 1, · · · , T ) is observed.
Assumption 6.2. Ω̂ is symmetric and positive definite with kΩ̂ − Ωk = o p (1).
Theorem 6.3. Suppose (i) Assumption 3.1 holds, (ii) Assumption 4.1 hold, and (iii) Assumpp
tions 6.1, and 6.2 hold. Then θ̂ −→ θ0 , and
√
p
n(θ̂ − θ0 ) −→ N(0,V ),
where V = (M ′ ΩM)−1 (M ′ ΩΣΩM)(M ′ΩM)−1 , M = ∂m(θ0 )/∂θ, and Σ = E[mi (θ0 )mi (θ0 )′ ].
The proof of Theorem 6.3 is standard and can be found in Newey and McFadden (1994).
To estimate V , let M̂ = m(θ̂)/∂θ, Σ̂ = ∑ni=1 mi (θ̂)mi (θ̂)′ /n, and define
V̂ = (M̂ ′ Ω̂M̂)−1 (M̂ ′ Ω̂Σ̂Ω̂M̂)(M̂ ′ Ω̂M̂)−1 .
16
Theorem 6.4. Suppose the conditions of Theorem 6.3 hold. Then kV̂ −V k = o p (1).
Again, the proof of Theorem 6.4 is standard, and can also be found in Newey and McFadden
(1994).
7 Monte Carlo study
A LEMMA AND THEOREMS
A.1
Proof of Theorem 3.2
For fixed θ1 , in last period, T , assumption 3.1.2 implies a unique solution of cT l j to the equation ∂uT j (zT , cT j , rT j ; θ1 )/∂cT l j = 0, j = 1, · · · , J, l j = 1, · · · , L j , denoted by c0T l j (zT , rT ).
This solution, along with assumptions 3.1.3, and 3.1.4 imply unique pT j (zT , rT ; θ1 ), j =
1, · · · , J. Note that, under the conditions of the Theorem, f jt (xt+1 |xt , ct j ) can be computed
from the population distribution of the observables. Now consider period T − 1. For fixed
(θ, Π), these results, along with assumptions 3.1.1, and 3.1.5-3.1.7, vT −1, j (zT −1 , rT −1 ; θ, Π) =
uT −1, j (zT −1 , cT −1. j , rT −1, j ; θ1 ) + βV̄T j (zT −1 , rT −1, j ; θ, Π), j = 1, · · · , J are unique. For fixed
aT j,k , j, k = 1, · · · , J, vT −1, j j′ (zT −1 , rT −1 ; θ, Π) has a unique representation as function of
uT −1, j j′ (zT −1 , rT −1 ; θ, Π), (aT jk , aT j′ k ), k = 1, · · · , J, uT (zT , cT , rT ; θ1 ), and pT (xT ; θ, Π), given
in equation (2.18), from which c0T −1,l j (zT −1 , rT −1 ; θ, Π) uniquely solves equation (2.19),
and in turn, pT −1, j (zT −1 , rT −1 ; θ, Π) is unique. Proceeding by backward induction obtain
unique p jt (zt , rt , θ, Π), j = 1, · · · , J, t = 1, · · · , J, from which we obtain unique p jt (zt , θ, Π),
j = 1, · · · , J, t = 1, · · · , J, and unique ptt+1, jk (xt ; θ, Π), j, k = 1, · · · , J, t = 1, · · · , T − 1. By
assumption, ptt+1, jk (xt ; θ0 , Π0 ) = ptt+1, jk (xt ), the population probability. Suppose (θ, Π) 6=
(θ0 , Π0 ) exists with ptt+1, jk (xt ; θ, Π) = ptt+1, jk (xt ). Then under the assumptions of the theorem, Gayle (2012) shows that (θ, Π) = (θ0 , Π0 ). In particular, the log-concavity of gε ,
assumptions 3.1.9, and 3.1.10 along with the uniqueness of basis vectors obtain the result.
17
References
AGUIRREGABIRIA , V. AND P. M IRA (2002): “Swapping The Nested Fixed Point Algorithm: A Class of Estimators for Distrete Markov Decision Models,” Econometrica, 70,
1519–1543.
——— (2007): “Sequential Estimation of Dynamic Discrete Games,” Econometrica, 75,
1–53.
A LTUG , S. AND R. A. M ILLER (1998): “Effect of Work Experience of Female Wages and
Labour Supply,” The Review of Economic Studies, 65, 45–85.
A RCIDIACONO , P. AND R. M ILLER (2010): “CCP Estimation of Dynamic Discrete Choice
Models with Unobserved Heterogeneity,” MIMEO, Duke University.
BAJARI , P., C. B ENKARD , AND J. L EVIN (2007): “Estimating Dynamic Models of Imperfect Competition,” Econometrica, 75.
D UBIN , J. A. AND D. L. M C FADDEN (1984): “An Econometric Analysis of Residential
Electric Applicance Holdings and Consumption,” Econometrica, 52, 345–362.
G AYLE , G.-L. AND R. A. M ILLER (2003): “Life-Cycle Fertility and Human Capital Accumulation,” Unpublished Manuscript, Carnegie Mellon University.
G AYLE , W.-R. (2006): “A Dynamic Structural Model of Labor Market Supply and Educational Attainment,” MIMEO: University of Virginia.
H ANEMANN , M. W. (1984): “Discrete/Continuous Models of Consumer Demand,” Econometrica, 52, 541–561.
H OTZ , J. V. AND R. A. M ILLER (1993): “Conditional Choice Probabilities and the Estimation of Dynamic Models,” Review of Economic Studies, 60, 497–529.
H OTZ , J. V., R. A. M ILLER , S. S ANDERS , AND J. S MITH (1994): “A Simulation Estimator
for Dynamic Models of Discrete Choice,” Review of Economic Studies, 61, 265–289.
N EWEY, W. K. AND D. M C FADDEN (1994): Large Sample Estimation and Hypothesis
Testing, Elsevier Science Publishers.
RUST, J. (1987): “Optimal Replacement of GMC Bus Engines: An Empirical Model of
Harold Zurcher,” Econometrica, 55, 999–1033.
18
Download