Estimation with Aggregate Shocks

Jinyong Hahn (UCLA)
Guido Kuersteiner (University of Maryland)
Maurizio Mazzocco (UCLA)

February 10, 2016
Abstract

Aggregate shocks affect most households' and firms' decisions. Using two stylized models we show that inference based on cross-sectional data alone generally fails to correctly account for decision making of rational agents facing aggregate uncertainty. We propose an econometric framework that overcomes these problems by explicitly parametrizing the agents' inference problem relative to aggregate shocks. We advocate the combined use of cross-sectional and time series data. Our framework and corresponding examples illustrate that cross-sectional and time series aspects of the model are often interdependent. Our inferential theory rests on a new joint Central Limit Theorem for combined time-series and cross-sectional data sets. In models with aggregate shocks, the asymptotic distribution of the parameter estimators is a combination of asymptotic distributions from cross-sectional analysis and time-series analysis.
UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095, hahn@econ.ucla.edu
University of Maryland, Department of Economics, Tydings Hall 3145, College Park, MD 20742, kuersteiner@econ.umd.edu
UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095, mmazzocc@econ.ucla.edu
1 Introduction
An extensive body of economic research suggests that aggregate shocks have important effects on households' and firms' decisions. Consider for instance the oil shock that hit developed countries in 1973. A large literature has provided evidence that this aggregate shock triggered a recession in the United States, where the demand and supply of non-durable and durable goods declined, inflation grew, the unemployment rate increased, and real wages dropped.
The profession has generally adopted one of the following three strategies to deal with aggregate shocks. The most common strategy is to assume that aggregate shocks have no effect on households' and firms' decisions and, hence, that aggregate shocks can be ignored. Almost all papers estimating discrete choice dynamic models or dynamic games are based on this premise. Examples include Keane and Wolpin (1997), Bajari, Benkard, and Levin (2007), and Eckstein and Lifshitz (2011). We show that, if aggregate shocks are an important feature of the data, ignoring them generally leads to inconsistent parameter estimates. The second approach is to include time dummies in the model in an attempt to capture the effect of aggregate shocks on the estimation of the parameters of interest, as was done for instance in Runkle (1991) and Shea (1995). We clarify that, if the econometrician does not account properly for aggregate shocks, the parameter estimates will generally be inconsistent even if the actual realizations of the aggregate shocks are observed. The inclusion of time dummies, therefore, fails to solve the problem.[1] The last strategy is to fully specify how aggregate shocks affect individual decisions jointly with the rest of the structure of the economic problem. Using this approach, the econometrician can obtain consistent estimates of the parameters of interest. We are aware of only one paper that uses this strategy, Lee and Wolpin (2010). Their paper is primarily focused on the estimation of a specific empirical model. They do not address the broader question of which statistical assumptions and what type of data requirements are needed more generally to obtain consistent estimators when aggregate shocks are present. Moreover, as we argue later on, in Lee and Wolpin's (2010) paper there are issues with statistical inference and efficiency.

[1] In the Euler equation context, Chamberlain (1984) uses examples to argue that, when aggregate shocks are present but disregarded, the estimated parameters are generally inconsistent. His examples make clear that time dummies do not solve the problems introduced by the presence of aggregate shocks.
The previous discussion reveals that there is no generally agreed on econometric framework for statistical inference in models where aggregate shocks have an effect on individual decisions. We show that inference based on cross-sectional data alone generally fails to correctly account for decision making of rational agents facing aggregate uncertainty. By parametrizing aggregate uncertainty and explicitly accounting for it when solving the agents' decision problem, we are able to offer an econometric framework that overcomes these problems. We advocate the combined use of cross-sectional and time series data, and we develop simple-to-use formulas for test statistics and confidence intervals that enable the combined use of time series and cross-sectional data. We proceed in two steps.

We first introduce a general class of models with the following two features. First, each model in this class is composed of two submodels. The first submodel includes all the cross-sectional features, whereas the second submodel is composed of all the time-series aspects. As a consequence, the parameters of the model can also be divided into two groups: the parameters that characterize the cross-sectional submodel and the parameters that enter the time-series submodel. The second feature is that the two submodels are linked by a vector of aggregate shocks and by the parameters that govern their dynamics. Individual decision making thus depends on aggregate shocks.

Given the interplay between the two submodels, the aggregate shocks have complicated effects on the estimation of the parameters of interest. To better understand those effects, we present stylized examples of the general framework that illustrate the complexities introduced by the aggregate shocks. The examples show that, if the existence of aggregate shocks is ignored in the estimation of these models, the estimated parameters will generally be inconsistent and biased. An important implication of this result is that the presence of aggregate shocks generally invalidates cross-sectional inference. The examples show that in general the parameters of our models can be consistently estimated only if cross-sectional variation is used in combination with time-series variation. We argue that the best approach to consistently estimate the parameters of the investigated models is to combine cross-sectional data with a long time-series of aggregate variables. An alternative method would be to combine cross-sectional and time-series variation by using panel data. Panel data, however, are generally too short to achieve consistency, whereas long time-series data are easier to find for most of the variables that are of interest to economists.

In the second part of the paper, we derive the asymptotic distribution of the parameter estimators. The distribution is derived by assuming that the dimension of the cross-sectional data $n$ as well as the dimension of the time series data $\tau$ go to infinity.[2] The derivation of the asymptotic distribution of the parameter estimators for our class of models is important for two reasons. First, the asymptotic distribution enables economists who estimate models with aggregate shocks to perform statistical inference. To our knowledge, a framework to perform statistical inference for our class of models had not been derived in the theoretical econometrics literature.[3] Second, the derivation of the asymptotic distribution can be used to show that, in our general class of models, consistency can be achieved only if the dimension of the time-series data goes to infinity. Letting the dimension of the cross-sectional data go to infinity is not sufficient. This is a remarkable result because it indicates that even the parameters of the cross-sectional submodel will be inconsistent if the time-series is not sufficiently long. This result explains why time-series of aggregate data should generally be preferred to panel data, the latter having typically a short time dimension.

Our asymptotic results are of interest also from a technical point of view. In order to deal with parameters estimated using two data sets of completely different nature, we adopt the notion of stable convergence and establish a Central Limit Theorem. Stable convergence dates back to Renyi (1963) and was recently used in Kuersteiner and Prucha (2013) in a panel data context to establish joint limiting distributions. Using this concept, we show that the asymptotic distributions of the parameter estimators are a combination of asymptotic distributions from cross-sectional analysis and time-series analysis.

We also derive a novel result related to the unit root literature. We show that, when the time series data are characterized by unit roots, the asymptotic distribution is a combination of a normal distribution and the distribution found in the unit root literature. Therefore, the asymptotic distribution exhibits mathematical similarity to the inferential problem in predictive regressions, as discussed by Campbell and Yogo (2006). While the similarity is superficial in that Campbell and Yogo's (2006) result is about an estimator based on a single data source, it leads to the same need to address uniform inference. Phillips (2014) proposes a method of uniform inference for predictive regressions, which we adopt and modify to our own estimation problem in the unit root case.

[2] If the econometrician uses panel data combined with time-series data, it is assumed that the time dimension of the panel $T$ remains constant.

[3] Lee and Wolpin (2006, footnote 37) attempt to use standard statistical inference. Donghoon Lee, in private communication, kindly informed us that the standard errors in their paper do not account for the noise introduced by the estimation of the time series parameters, i.e. the standard errors reported in the paper assume that the time-series parameters are known or, equivalently, fixed at their estimated value.
Our paper also contributes to a growing literature whose objective is the estimation of general equilibrium models. Some examples of papers in this literature are Heckman and Sedlacek (1985), Heckman, Lochner, and Taber (1998), Lee (2005), Lee and Wolpin (2006), Gemici and Wiswall (2011), and Gillingham, Iskhakov, Munk-Nielsen, Rust, and Schjerning (2015). Aggregate shocks are a natural feature of general equilibrium models. Without them those models have the unpleasant implication that all aggregate variables can be fully explained by observables and, hence, that errors have no effects on those variables. Our general econometric framework makes this point clear by highlighting the impact of aggregate shocks on parameter estimation and the variation required in the data to estimate those models. More importantly, our results provide formulas that can be used to perform statistical inference in a general equilibrium context.

The remainder of the paper is organized as follows. In Section 2, we introduce the general model and discuss two examples. In Section 3, we present the intuition underlying our main result, which is presented in Section 4.
2 Model
This section is composed of four parts. In the first subsection, we introduce in a general form the identification problem generated by the existence of aggregate shocks. In the second subsection, we describe a simple stylized example which illustrates that, in the presence of aggregate shocks, the estimates of the model parameters will generally be inconsistent and biased if the econometrician uses exclusively cross-sectional data. In the third subsection, we consider a general equilibrium example that highlights the complex relationship between the estimation of the cross-sectional and time-series parameters when aggregate shocks are present. In the last subsection, we address the question of whether long panels can be used to solve the problems considered in this paper.
2.1 The General Identification Problem
We consider a class of models with four main features. First, the model can be divided into two parts. The first part encompasses all the static aspects of the model and will be denoted with the term cross-sectional submodel. The second part includes the dynamic aspects of the aggregate variables and will be denoted with the term time-series submodel. Second, the two submodels are linked by the presence of a vector of aggregate shocks $\nu_t$ and by the parameters that govern their dynamics. Third, the vector of aggregate shocks may not be observed. If that is the case, it is treated as a set of parameters to be estimated. Lastly, the parameters of the model can be consistently estimated only if a combination of cross-sectional and time-series data are available.
We now formally introduce the general model. The variables that characterize the model can be divided into two vectors $y_{i,t}$ and $z_s$. The first vector $y_{i,t}$ includes all the variables that characterize the cross-sectional submodel, where $i$ describes an individual decision-maker, a household or a firm, and $t$ a time period in the cross-section.[4] The second vector $z_s$ is composed of all the variables associated with the time-series model. Accordingly, the parameters of the general model can be divided into two sets, $\theta$ and $\rho$. The first set of parameters $\theta$ characterizes the cross-sectional submodel, in the sense that, if the second set $\rho$ was known, $\theta$ and $\nu_t$ can be consistently estimated using exclusively variation in the cross-sectional variables $y_{i,t}$. Similarly, the vector $\rho$ characterizes the time-series submodel, meaning that, if $\theta$ and $\nu_t$ were known, those parameters can be consistently estimated using exclusively the time series variables $z_s$. There are two functions that relate the cross-sectional and time-series variables to the parameters. The function $f(y_{i,t} \mid \theta, \nu_t, \rho)$ restricts the behavior of the cross-sectional variables conditional on a particular value of the parameters. Analogously, the function $g(z_s \mid \rho, \theta)$ describes the behavior of the time-series variables for a given value of the parameters. An example is a situation in which (i) the variables $y_{i,t}$ for $i = 1, \ldots, n$ are i.i.d. given the aggregate shock $\nu_t$, (ii) the variables $z_s$ correspond to $(\nu_s, \nu_{s-1})$, (iii) the cross-sectional function $f(y_{i,t} \mid \theta, \nu_t, \rho)$ denotes the log likelihood of $y_{i,t}$ given the aggregate shock $\nu_t$, and (iv) the time-series function $g(z_s \mid \rho, \theta) = g(\nu_s \mid \nu_{s-1}; \rho)$ is the log of the conditional probability density function of the aggregate shock $\nu_s$ given $\nu_{s-1}$. In this special case the time-series function $g$ does not depend on the cross-sectional parameters $\theta$.

[4] Even if the time subscript $t$ is not necessary in this subsection, we keep it here for notational consistency because later we consider the case where longitudinal data are collected.
More generally, the functions $f$ and $g$ may be the basis for method of moments (the exactly identified case) or GMM (the overidentified case) estimation. In these situations parameters are identified from the conditions $E_C[f(y_{i,t} \mid \theta, \nu_t, \rho)] = 0$ given the shock $\nu_t$ and $E[g(z_s \mid \rho, \theta)] = 0$. The first expectation, $E_C$, is understood as being over the cross-section population distribution holding $\nu_t$ fixed, while the second, $E$, is over the stationary distribution of the time-series data generating process.

We assume that our cross-sectional data consist of $\{y_{i,t};\; i = 1, \ldots, n\}$, and our time series data consist of $\{z_s;\; s = \tau_0 + 1, \ldots, \tau_0 + \tau\}$. For simplicity, we assume that $\tau_0 = 0$ in this section.

The parameters of the general model can be estimated by maximizing a well-specified objective function, for instance the likelihood function of the model or the negative of a GMM criterion function.[5] Since in our case the general framework is composed of two submodels, a natural approach is to estimate the parameters of interest by maximizing two separate objective functions, one for the cross-sectional model and one for the time-series model. We denote these criterion functions by $F_n(\theta, \nu_t, \rho)$ and $G_\tau(\rho, \theta)$. In the case of maximum likelihood these functions are simply $F_n(\theta, \nu_t, \rho) = \frac{1}{n}\sum_{i=1}^{n} f(y_{i,t} \mid \theta, \nu_t, \rho)$ and $G_\tau(\rho, \theta) = \frac{1}{\tau}\sum_{s=1}^{\tau} g(z_s \mid \rho, \theta)$. In the case of moment based estimation, we define the empirical moment functions $h_n(\theta, \nu_t, \rho) = \frac{1}{n}\sum_{i=1}^{n} f(y_{i,t} \mid \theta, \nu_t, \rho)$ and $k_\tau(\rho, \theta) = \frac{1}{\tau}\sum_{s=1}^{\tau} g(z_s \mid \rho, \theta)$. For two conforming, almost surely positive definite weight matrices $W_n^C$ and $W_\tau^T$, we can then define $F_n(\theta, \nu_t, \rho) = -h_n(\theta, \nu_t, \rho)'\, W_n^C\, h_n(\theta, \nu_t, \rho)$ and $G_\tau(\rho, \theta) = -k_\tau(\rho, \theta)'\, W_\tau^T\, k_\tau(\rho, \theta)$. The use of two separate objective functions is helpful in our context because it enables us to discuss which issues arise if only cross-sectional variables or only time-series variables are used in the estimation.[6]

[5] Our discussion is motivated by Newey and McFadden's (1994) unified treatment of maximum likelihood and GMM as extremum estimators.

[6] Note that our framework covers the case where the joint distribution of $(y_{i,t}, z_t)$ is modelled. Considering the two components separately adds flexibility in that data is not required for all variables in the same period.
We now formalize the idea that the identification of the parameters of the general model requires the joint use of cross-sectional and time-series data. Let $F(\theta, \nu_t, \rho)$ and $G(\rho, \theta)$ be the probability limits of the objective functions described above, i.e.
$$F(\theta, \nu_t, \rho) = \operatorname{plim}_{n \to \infty} F_n(\theta, \nu_t, \rho), \qquad G(\rho, \theta) = \operatorname{plim}_{\tau \to \infty} G_\tau(\rho, \theta).$$
Let the argmax of those probability limits be defined as follows:
$$(\theta(\rho), \nu(\rho)) \equiv \operatorname*{argmax}_{\theta, \nu} F(\theta, \nu, \rho), \qquad \rho(\theta) \equiv \operatorname*{argmax}_{\rho} G(\rho, \theta),$$
and denote with $\theta_0$ and $\rho_0$ the true values of the cross-sectional and time-series parameters.

Observe that, if the objective function $F$ of the cross-sectional model evaluated at the parameters that maximize it takes the same value for any feasible parameter $\rho$, then $\rho$ cannot be consistently estimated using exclusively the data that characterize the cross-sectional model. Based on this observation, we can express the idea that with cross-sectional data alone a subset of the parameters of interest cannot be consistently estimated as follows:
$$\max_{\theta, \nu_t} F(\theta, \nu_t, \rho) = \max_{\theta, \nu_t} F(\theta, \nu_t, \rho_0) \quad \text{for all } \rho. \tag{1}$$
We also assume that $\rho_0$ is needed for the consistent estimation of $\theta_0$ and $\nu_t$:
$$(\theta(\rho), \nu(\rho)) \neq (\theta_0, \nu_t) \quad \text{for all } \rho \neq \rho_0. \tag{2}$$
Analogously, for the class of models we consider, time-series data alone is not sufficient to consistently estimate the time series parameters $\rho$. This idea can be formalized in the following way:
$$\max_{\rho} G(\rho, \theta) = \max_{\rho} G(\rho, \theta_0) \quad \text{for all } \theta,$$
$$\rho(\theta) \neq \rho_0 \quad \text{for all } \theta \neq \theta_0.$$
In our class of models, however, all the parameters of interest can be consistently estimated if cross-sectional data are combined with time-series data. Define $\gamma \equiv (\theta', \nu_t', \rho')'$ and impose the following two assumptions: (i) there exists a unique solution to the system of equations
$$\left(\frac{\partial F(\theta, \nu_t, \rho)}{\partial \gamma'},\; \frac{\partial G(\rho, \theta)}{\partial \gamma'}\right) = 0; \tag{3}$$
and (ii) the solution is given by the true value of the parameters.
In the next subsection, we discuss an example which illustrates why, in our class of models,
some of the parameters are generally not consistently estimated if one uses cross-sectional data
alone.
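To make the abstract setup concrete, the following minimal Python sketch illustrates the estimation logic implied by the two criteria: the time-series criterion is maximized to obtain an estimate of $\rho$, which is then plugged into the cross-sectional criterion to estimate $(\theta, \nu_t)$, in the spirit of the system of equations (3). The functions `f_cross` and `g_time` are hypothetical user-supplied moment functions standing in for $f$ and $g$; this is an illustration of the logic, not the paper's formal estimator.

```python
# Minimal sketch of the two-criterion estimation logic described above.
# f_cross(y_i, theta, nu, rho) and g_time(z_s, rho, theta) are hypothetical
# user-supplied moment functions playing the roles of f and g in the text.
import numpy as np
from scipy.optimize import minimize

def gmm_objective(moments):
    """Quadratic form h' W h with identity weight; minimizing it maximizes -h' W h."""
    h = moments.mean(axis=0)
    return float(h @ h)

def estimate(y, z, f_cross, g_time, theta0, nu0, rho0):
    # Step 1: time-series criterion G_tau(rho, theta), evaluated at an initial theta.
    def G(rho):
        return gmm_objective(np.array([g_time(z_s, rho, theta0) for z_s in z]))
    rho_hat = minimize(G, rho0, method="Nelder-Mead").x

    # Step 2: cross-sectional criterion F_n(theta, nu, rho_hat) given rho_hat.
    def F(params):
        theta, nu = params[:len(theta0)], params[len(theta0):]
        return gmm_objective(np.array([f_cross(y_i, theta, nu, rho_hat) for y_i in y]))
    sol = minimize(F, np.concatenate([theta0, nu0]), method="Nelder-Mead").x
    return sol[:len(theta0)], sol[len(theta0):], rho_hat
```

In models where $g$ depends on $\theta$, the two steps can be iterated or the stacked system (3) solved jointly.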
2.2 Example 1: Portfolio Choice

Consider an economy that, in each period $t$, is populated by $n$ households. These households are born at the beginning of period $t$, live for one period, and are replaced in the next period by $n$ new families. The households living in consecutive periods do not overlap and, hence, make independent decisions. Each household is endowed with deterministic income and has preferences over a non-durable consumption good $c_{i,t}$. The preferences can be represented by Constant Absolute Risk Aversion (CARA) utility functions which take the following form: $U(c_{i,t}) = -e^{-\gamma c_{i,t}}$. For simplicity, we normalize income to be equal to 1.

During the period in which households are alive, they can invest a share of their income in a risky asset with return $u_{i,t}$. The remaining share is automatically invested in a risk-free asset with a return $r$ that does not change over time. At the end of the period, the return to the investment is realized and households consume the quantity of the non-durable good they can purchase with the realized income. The distribution of the return on the risky asset depends on aggregate shocks. This is modelled by considering an economy in which the return on the risky asset is the sum of two components: $u_{i,t} = \nu_t + \varepsilon_{i,t}$, where $\nu_t$ is the aggregate shock and $\varepsilon_{i,t}$ is an i.i.d. idiosyncratic shock. The idiosyncratic shock, and hence the heterogeneity in the return on the risky asset, can be interpreted as differences across households in transaction costs, in information on the profitability of different stocks, or in marginal tax rates. We assume that $\nu_t \sim N(\mu, \sigma_\nu^2)$, $\varepsilon_{i,t} \sim N(0, \sigma_\varepsilon^2)$, and hence that $u_{i,t} \sim N(\mu, \sigma^2)$, where $\sigma^2 = \sigma_\nu^2 + \sigma_\varepsilon^2$.
Household $i$ living in period $t$ chooses the fraction of income to be allocated to the risk-free asset $\alpha_{i,t}$ by maximizing its life-time expected utility:
$$\max_{\alpha}\; E\!\left[-e^{-\gamma c_{i,t}}\right] \quad \text{s.t.} \quad c_{i,t} = \alpha(1+r) + (1-\alpha)(1+u_{i,t}), \tag{4}$$
where the expectation is taken with respect to the return on the risky asset. It is straightforward to show that the household's optimal choice of $\alpha_{i,t}$ is given by[7]
$$\alpha_{i,t} = \alpha = 1 - \frac{\mu - r}{\gamma \sigma^2}. \tag{5}$$
We will assume that the econometrician is mainly interested in estimating the risk aversion parameter $\gamma$.
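Since the derivation of (5) is deferred to the appendix, a short sketch under the assumptions above may help. Because $c_{i,t} = \alpha(1+r) + (1-\alpha)(1+u_{i,t})$ is normal given $\alpha$,
$$E\!\left[-e^{-\gamma c_{i,t}}\right] = -\exp\!\left\{-\gamma\, E[c_{i,t}] + \tfrac{\gamma^2}{2}\operatorname{Var}(c_{i,t})\right\},$$
so maximizing expected utility is equivalent to maximizing the certainty equivalent
$$\alpha(1+r) + (1-\alpha)(1+\mu) - \tfrac{\gamma}{2}(1-\alpha)^2\sigma^2.$$
The first-order condition $(r - \mu) + \gamma(1-\alpha)\sigma^2 = 0$ gives $1 - \alpha = (\mu - r)/(\gamma\sigma^2)$, which is equation (5).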
We now consider an estimator that takes the form of a population analog of (5), and study the impact of aggregate shocks on the estimator's consistency when an econometrician works only with cross-sectional data. Our analysis reveals that such an estimator is inconsistent, since the presence of parameters related to aggregate risk precludes cross-sectional ergodicity. Our analysis makes explicit the dependence of the estimator on the probability distribution of the aggregate shock and points to the following way of generating a consistent estimator of $\gamma$. Using time series variation, one can consistently estimate the parameters pertaining to aggregate uncertainty. Those estimates can then be used in the cross-sectional model to estimate the remaining parameters.[8]

Without loss of generality, we assume that the cross-sectional data are observed in period $t = 1$. The econometrician observes data on the return of the risky asset $u_{i,t}$ and on the return of the risk-free asset $r$. We assume that in addition he also observes a noisy measure of the share invested in the risk-free asset, $\tilde\alpha_{i,t} = \alpha + e_{i,t}$, where $e_{i,t}$ is a zero-mean measurement error. In other words, $y_i = (u_{i1}, \tilde\alpha_{i1})$. We make the simplifying assumption that the aggregate shock is observable to econometricians and that the time-series variables only include the aggregate shock: $z_t = \nu_t$.

Because $\mu = E[u_{i1}]$, $\sigma^2 = \operatorname{Var}(u_{i1})$, and $\alpha = E[\tilde\alpha_{i1}]$, if only cross-sectional variation is used for estimation, we would have the following method-of-moments estimators of those parameters:
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} u_{i1} = \bar{u}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(u_{i1} - \bar{u})^2, \qquad \hat\alpha = \frac{1}{n}\sum_{i=1}^{n}\tilde\alpha_{i1}.$$
In the second step, the econometrician can then use equation (5) to write the risk aversion parameter as $\gamma = (\mu - r)/\left(\sigma^2(1-\alpha)\right)$ and estimate it using the sample analog $\hat\gamma = (\hat\mu - r)/\left(\hat\sigma^2(1-\hat\alpha)\right)$.

[7] See the appendix.

[8] Our model is a stylized version of many models considered in a large literature interested in estimating the parameter $\gamma$ using cross-sectional variation. Estimators are often based on moment conditions derived from first order conditions (FOC) related to optimal investment and consumption decisions. Such estimators have similar problems, which we discuss in Appendix A.2.
In the presence of the aggregate shock $\nu_t$, we would have
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} u_{i1} = \nu_1 + \frac{1}{n}\sum_{i=1}^{n}\varepsilon_{i1} = \nu_1 + o_p(1),$$
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(u_{i1} - \bar{u})^2 = \frac{1}{n}\sum_{i=1}^{n}(\varepsilon_{i1} - \bar\varepsilon)^2 = \sigma_\varepsilon^2 + o_p(1),$$
$$\hat\alpha = \alpha + \frac{1}{n}\sum_{i=1}^{n} e_{i1} = \alpha + o_p(1),$$
which implies that $\gamma$ will be estimated to be
$$\hat\gamma = \frac{\nu_1 - r + o_p(1)}{\left(\sigma_\varepsilon^2 + o_p(1)\right)\left(1 - \alpha + o_p(1)\right)} = \frac{\nu_1 - r}{\sigma_\varepsilon^2(1-\alpha)} + o_p(1). \tag{6}$$

Using equation (6), we can study the properties of the proposed estimator $\hat\gamma$. If there were no aggregate shock in the model, we would have $\nu_1 = \mu$, $\sigma_\nu^2 = 0$, $\sigma_\varepsilon^2 = \sigma^2$ and, therefore, $\hat\gamma$ would converge to $\gamma$, a nonstochastic constant, as $n$ grows to infinity. It is therefore a consistent estimator of the risk aversion parameter. In the presence of the aggregate shock, however, the proposed estimator has different properties. If one conditions on the realization of the aggregate shock $\nu_1$, the estimator $\hat\gamma$ is inconsistent with probability 1, since it converges to $(\nu_1 - r)/\left(\sigma_\varepsilon^2(1-\alpha)\right)$ and not to the true value $\gamma = (\mu - r)/\left((\sigma_\varepsilon^2 + \sigma_\nu^2)(1-\alpha)\right)$. If one does not condition on the aggregate shock, as $n$ grows to infinity, $\hat\gamma$ converges to a random variable with a mean that is different from the true value of the risk aversion parameter. The estimator will therefore be biased and inconsistent. To see this, remember that $\nu_1 \sim N(\mu, \sigma_\nu^2)$. As a consequence, the unconditional asymptotic distribution of $\hat\gamma$ takes the following form:
$$\hat\gamma \to N\!\left(\frac{\mu - r}{\sigma_\varepsilon^2(1-\alpha)},\; \frac{\sigma_\nu^2}{\left(\sigma_\varepsilon^2(1-\alpha)\right)^2}\right) = N\!\left(\gamma + \frac{\gamma\,\sigma_\nu^2}{\sigma_\varepsilon^2},\; \frac{\sigma_\nu^2}{\left(\sigma_\varepsilon^2(1-\alpha)\right)^2}\right),$$
which is centered at $\gamma + \gamma\sigma_\nu^2/\sigma_\varepsilon^2$ and not at $\gamma$, hence the bias.
We are not the first to consider a case in which the estimator converges to a random variable. Andrews (2005) and more recently Kuersteiner and Prucha (2013) discuss similar scenarios. Our example is remarkable because the nature of the asymptotic randomness is such that the estimator is not even asymptotically unbiased. This is not the case in Andrews (2005) or Kuersteiner and Prucha (2013), where in spite of the asymptotic randomness the estimator is unbiased.[9]

[9] Kuersteiner and Prucha (2013) also consider cases where the estimator is random and inconsistent. However, in their case this happens for quite different reasons, due to endogeneity of the factors. The inconsistency considered here occurs even when the factors are strictly exogenous.
There is a simple explanation for the result discussed above: cross-sectional variation is not sufficient for the consistent estimation of the risk aversion parameter if aggregate shocks affect individual decisions.[10] To make this point clearer, we now relate this result to the non-consistency conditions (1)-(2) introduced in Section 2.1. We note that given the assumptions of this subsection, conditional on the aggregate shock, $y_i$ has the following distribution:
$$y_i \mid \nu_1 \sim N\!\left(\begin{pmatrix} \nu_1 \\[4pt] \dfrac{\gamma(\sigma_\varepsilon^2 + \sigma_\nu^2) - \mu + r}{\gamma(\sigma_\varepsilon^2 + \sigma_\nu^2)} \end{pmatrix},\; \begin{pmatrix} \sigma_\varepsilon^2 & 0 \\ 0 & \sigma_e^2 \end{pmatrix}\right). \tag{7}$$
Using (7), it is straightforward to see that the cross-sectional likelihood is maximized for any arbitrary choice of the time-series parameters $\rho = (\mu, \sigma_\nu^2)$, as long as one chooses $\gamma$ that satisfies the following equation:
$$\frac{\gamma(\sigma_\varepsilon^2 + \sigma_\nu^2) - \mu + r}{\gamma(\sigma_\varepsilon^2 + \sigma_\nu^2)} = \alpha.$$
As a consequence, the non-consistency conditions (1)-(2) are satisfied. Hence, $\mu$ and $\sigma_\nu^2$ cannot be consistently estimated by maximizing the cross-sectional likelihood and $\gamma$ cannot be consistently estimated using only cross-sectional data.

A solution to the problem discussed in this subsection is to combine cross-sectional variables with time-series variables. In this case, one can consistently estimate $(\mu, \sigma_\nu^2)$ by using the time-series of the aggregate data $\{z_t\}$. Consistent estimation of $(\gamma, \sigma_\varepsilon^2, \sigma_e^2)$ can then be achieved by plugging the consistent estimators of $(\mu, \sigma_\nu^2)$ in the correctly specified cross-section likelihood (7).
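The following Monte Carlo sketch (Python; all parameter values are illustrative assumptions, not taken from the paper) can be used to visualize both points: across replications the purely cross-sectional estimator $\hat\gamma$ fluctuates around $\gamma + \gamma\sigma_\nu^2/\sigma_\varepsilon^2$ no matter how large $n$ is, while an estimator that plugs time-series estimates of $(\mu, \sigma_\nu^2)$ into the cross-sectional moments (a moment-based stand-in for the likelihood-based procedure described above) recovers $\gamma$ as $n$ and $\tau$ grow.

```python
# Illustrative simulation of Example 1 (parameter values are assumptions).
import numpy as np

rng = np.random.default_rng(0)
gamma, r, mu = 4.0, 0.02, 0.04            # true risk aversion, risk-free rate, mean risky return
sig_nu, sig_eps, sig_e = 0.05, 0.15, 0.02  # aggregate, idiosyncratic, measurement std. dev.
sigma2 = sig_nu**2 + sig_eps**2
alpha = 1.0 - (mu - r) / (gamma * sigma2)  # optimal risk-free share, equation (5)

def one_replication(n=50_000, tau=2_000):
    nu1 = rng.normal(mu, sig_nu)                      # aggregate shock in the cross-section period
    u = nu1 + rng.normal(0.0, sig_eps, size=n)        # risky returns u_i1
    a_tilde = alpha + rng.normal(0.0, sig_e, size=n)  # noisy observed shares
    # Cross-section only: mu, sigma^2 and alpha all estimated from one cross-section.
    g_cs = (u.mean() - r) / (u.var() * (1.0 - a_tilde.mean()))
    # Combined data: (mu, sigma_nu^2) from a long time series of aggregate shocks {nu_t}.
    nu_ts = rng.normal(mu, sig_nu, size=tau)
    mu_hat, sig_nu2_hat = nu_ts.mean(), nu_ts.var()
    sig_eps2_hat = u.var()                            # Var(u_i1 | nu_1) = sigma_eps^2
    g_comb = (mu_hat - r) / ((sig_eps2_hat + sig_nu2_hat) * (1.0 - a_tilde.mean()))
    return g_cs, g_comb

draws = np.array([one_replication() for _ in range(200)])
print("cross-section only: mean =", round(draws[:, 0].mean(), 3), " true gamma =", gamma)
print("combined data:      mean =", round(draws[:, 1].mean(), 3))
```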
The example in this subsection is a simplified version of the model in Section 2.1 in the sense that the relationship between the cross-sectional and time-series submodels is simple and one-directional. The variables and parameters of the time-series submodel affect the cross-sectional model, but the cross-sectional variables and parameters have no impact on the time-series model. As a consequence, the time-series parameters can be consistently estimated without knowing the cross-sectional parameters. The cross-sectional parameters, however, can be consistently estimated only if the time-series parameters are known. In more complicated situations, such as general equilibrium models, where aggregate shocks are a natural feature, the relationship between the two submodels can be bi-directional. In the next subsection, we discuss a general-equilibrium example in which the relationship between the two submodels is bi-directional as in the general model.

[10] As discussed briefly in the introductory section, a common practice to address the problems with aggregate shocks is to include time dummies in the model. The portfolio example clarifies that the addition of time dummies does not solve the problem generated by the presence of aggregate shocks. The inclusion of time dummies is equivalent to the assumption that the aggregate shocks are known. But the previous discussion indicates that, using exclusively cross-sectional data, the estimator $\hat\gamma$ is biased and inconsistent even if the aggregate shocks are known. An unbiased and consistent estimator of $\gamma$ can only be obtained if the distribution of the aggregate shocks is known, which is feasible only by exploiting the variation contained in time-series data.
2.3 Example 2: A General Equilibrium Model
We consider as a second example a general equilibrium model of labor force participation and saving decisions in which aggregate shocks influence individual choices. In principle, we could have used as a general example a model proposed in the general equilibrium literature, such as the model developed in Lee and Wolpin (2006). We decided against this alternative because in those models the effect of the aggregate shocks and the relationship between the cross-sectional and time-series submodels is complicated and therefore difficult to describe. Instead, we have decided to develop a model that is sufficiently general to generate an interesting relationship between the two submodels, but at the same time is stylized enough for this relationship to be easy to describe and understand.

Consider an economy populated in each period by one infinitely-lived firm and two overlapping generations. Each generation is composed of individuals who live for two periods, the working period and the retirement period. Individual $i$ in period $t$ has preferences over a consumption good $c_{i,t}$ and leisure $l_{i,t}$, which can be characterized using a constant absolute risk aversion utility function of the following form:
$$u_i(c_{i,t}, l_{i,t}) = -\exp\{-r(\delta c_{i,t} + (1-\delta) l_{i,t})\}. \tag{8}$$
In the first period, individuals choose whether to work, consumption, and how much to save in a risk-free asset with constant return $R$ for the retirement period. These choices maximize the individual's life-time utility. Each individual is endowed with one unit of time. If they choose to work, they supply inelastically this unit of time and provide to the firm the amount of effective labor $1 \times e_{i,t}$, the unit of time multiplied by their ability. The inverse $e_{i,t}^{-1}$ of the time endowment is drawn from a uniform distribution defined on $[\lambda, 1]$. In the second period, they simply consume the available resources. In both periods, individuals are endowed with non-labor income $y$. The second period non-labor income, pension income, follows the following process:
$$y_{i,t+1} = \alpha_1 + \alpha_2 x_{i,t} + \ln \nu^S_{t+1} + \varepsilon_{i,t+1}, \tag{9}$$
where $x_{i,t}$ is an observable variable known at $t$, such as the outcome of an aptitude test, $\nu^S_{t+1}$ is an aggregate shock determining the current state of the economy, and $\varepsilon_{i,t+1}$ is an idiosyncratic shock. We assume that $\varepsilon_{i,t+1}$ is independent of $x_{i,t}$ as well as $e_{i,t}$. The aggregate shock follows the following AR(1) process:
$$\ln \nu^S_{t+1} = \varrho \ln \nu^S_t + \eta_{t+1}, \tag{10}$$
with $\eta_{t+1}$ normally distributed with mean 0 and variance $\omega^2$. The idiosyncratic shock is drawn from a normal distribution with mean zero and variance $\sigma_\varepsilon^2$. Individuals know the process governing non-labor income. After they observe $y_{i,t+1}$ and $x_{i,t}$, they therefore know $z_{t+1} = \ln \nu^S_{t+1} + \varepsilon_{i,t+1}$.
Each generation is composed of $N^A_t$ skilled individuals and $N^B_t$ unskilled individuals, where the type is observed by the econometrician. If a skilled individual chooses to work, she receives earnings $w^A_{i,t}$ with $w^A_{i,t} = w^A_t e_{i,t}$, where $w^A_t$ is the period-$t$ wage paid by the infinitely-lived firm to all skilled workers and $e_{i,t}$ is the amount of labor supplied by the worker. Similarly, unskilled workers are paid $w^B_{i,t} = w^B_t e_{i,t}$.

Each period, the infinitely-lived firm produces the consumption good using two inputs, skilled and unskilled labor, and the following Cobb-Douglas production function:
$$f\!\left(L^A_t, L^B_t\right) = \nu^D_t \left(L^A_t\right)^{\beta_A}\left(L^B_t\right)^{\beta_B}, \tag{11}$$
with $\beta_A + \beta_B < 1$, where $\nu^D_t$ is an aggregate shock, and $L^A_t$ and $L^B_t$ are the labor demands for skilled and unskilled labor. Without loss of generality, we normalize the price of consumption to 1. The firm chooses skilled and unskilled labor by maximizing its profit function.

The next Proposition describes the main features of the equilibrium that characterizes the economy.
Proposition 1 In equilibrium, period-$t$ consumption of skilled and unskilled individuals who choose to work takes the following form:
$$c^{h,w}_{i,t} = \frac{R}{R+1}\left(y_{i,t} + w^h_{i,t}\right) + \frac{\alpha_1 + \alpha_2 x_{i,t}}{R+1} - \frac{r\,\sigma_z^2}{2(R+1)} + \frac{\mu_z z_t}{R+1}, \qquad h = A, B,$$
where $\mu_z z_t$ and $\sigma_z^2$ are the mean and variance of $z_{t+1}$ conditional on $z_t$. The corresponding consumption for individuals who decide not to work is
$$c^{h,nw}_{i,t} = \frac{R}{R+1}\,y_{i,t} + \frac{\alpha_1 + \alpha_2 x_{i,t}}{R+1} - \frac{r\,\sigma_z^2}{2(R+1)} + \frac{\mu_z z_t}{R+1}, \qquad h = A, B. \tag{12}$$
A worker chooses to work with probability
$$P\!\left(\frac{1}{e_{i,t}} \le \frac{\delta\, w^h_t}{(1-\delta)(1-\lambda)}\right) = \frac{1}{1-\lambda}\left(\frac{\delta\, w^h_t}{(1-\delta)(1-\lambda)} - \lambda\right), \qquad h = A, B. \tag{13}$$
With this probability, the period-$t$ equilibrium wages of skilled and unskilled individuals solve the following two equations:
$$\frac{\ln \nu^D_t + (1-\beta_B)\ln\beta_A - (1-\beta_B)\ln w^A_t + \beta_B\ln\beta_B - \beta_B\ln w^B_t}{1-\beta_A-\beta_B} = \ln N^A_t + \ln\!\left(\frac{\delta\, w^A_t}{(1-\delta)(1-\lambda)} - \lambda\right) - \ln(1-\lambda)$$
and
$$\frac{\ln \nu^D_t + (1-\beta_A)\ln\beta_B - (1-\beta_A)\ln w^B_t + \beta_A\ln\beta_A - \beta_A\ln w^A_t}{1-\beta_A-\beta_B} = \ln N^B_t + \ln\!\left(\frac{\delta\, w^B_t}{(1-\delta)(1-\lambda)} - \lambda\right) - \ln(1-\lambda).$$

Proof. In Appendix B.1.
We can now consider the estimation of the parameters of interest. For simplicity, we will assume that $\beta_A$ and $\beta_B$ are known.[11] Parameters in this model can be consistently estimated exploiting both cross-sectional and time-series data. The cross section data consist of $n$ i.i.d. observations $\left(y_{i,t+1}, x_{i,t}, c^{A,nw}_{i,t}, c^{B,nw}_{i,t}\right)$ for $t = 1$. The time series data include $(Y_{t+1}, X_t)$, where $Y_t \equiv E_C[y_{i,t}]$ and $X_t \equiv E_C[x_{i,t}]$.[12] They also include the probability of working for each type of worker, as well as the equilibrium wages. We will assume that the econometrician knows the value of $R$. By equation (13), we can assume that $\delta$ and $\lambda$ are known to the econometrician; it is because we can solve for $\delta$ and $\lambda$ by plugging the probability of working in its left hand side and the equilibrium wage in its right hand side for skilled and unskilled workers. We will assume that the econometrician's primary objective is to estimate the risk aversion parameter $r$ in (8).[13]

We initially provide an intuitive discussion. First, equation (9) reveals that a cross-section regression of $y_{i,t+1}$ on $x_{i,t}$ and a constant provides a consistent estimator of $\alpha_1 + \ln\nu^S_{t+1}$ and of $\alpha_2$. Moreover, the estimated variance of the error term represents a consistent estimator of $\sigma_\varepsilon^2$. Second, using the $\alpha_2$ consistently estimated in the first-step cross-section regression, let $V_t \equiv Y_{t+1} - \alpha_2 X_t$, which implies that $V_t = \alpha_1 + \ln\nu^S_{t+1}$. Using equation (10), it is then straightforward to show that $V_{t+1} = (1-\varrho)\alpha_1 + \varrho V_t + \eta_{t+2}$. As a consequence, an autoregression of $V_t$ on its lag and a constant term provides a consistent estimator of $((1-\varrho)\alpha_1, \varrho)$. In addition, the estimated variance of the error provides a consistent estimator of $\omega^2$. Thus, the second-step, time-series estimation allows us to consistently estimate $\varrho$, $\omega^2$, $\alpha_1$ using cross-sectional estimates obtained in the first step. This result is similar to the portfolio-choice example: we need cross-sectional estimates to generate consistent estimates of the time-series parameters.

Now note that the first and second steps provide consistent estimators of $\mu_z$, $\sigma_z^2$, and $\ln\nu^S_t$, because $\mu_z = \mu_z(\varrho, \omega^2, \sigma_\varepsilon^2)$ and $\sigma_z^2 = \sigma_z^2(\varrho, \omega^2, \sigma_\varepsilon^2)$ are known functions of the parameters estimated in the first two steps, and $V_{t-1} = \alpha_1 + \ln\nu^S_t$. Moreover, replacing for $z_t = \ln\nu^S_t + \varepsilon_{i,t}$ in equation (12), we can use the cross-sectional average of $(R+1)\,c^{A,nw}_{i,t} - R\,y_{i,t} - \alpha_2 x_{i,t}$ to produce a consistent estimator of $\alpha_1 - r\sigma_z^2/2 + \mu_z \ln\nu^S_t$. Lastly, since $\alpha_1$, $\mu_z$, $\sigma_z^2$, $\ln\nu^S_t$ have been estimated in the first two steps, we can solve for the risk-aversion parameter $r$ using the consistent estimator of $\alpha_1 - r\sigma_z^2/2 + \mu_z \ln\nu^S_t$. The result is a consistent estimator of $r$, which can only be generated because we have previously estimated parameters of the cross-sectional model and parameters of the time-series model. We therefore have the bi-directional relationship between the two submodels that was missing from the portfolio-choice example.

[11] In Appendix B.2, we discuss how $\beta_A$ and $\beta_B$ can be estimated.

[12] The $E_C[\,\cdot\,]$, which are formally defined in Section 2.1, can be understood to be cross-sectional expectations.

[13] It is straightforward to show that the parameters characterizing the production function (11) can be estimated separately using time-series variation alone.
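A schematic version of this multi-step argument, written in Python, is sketched below. It only traces the order of the steps (cross-section regression, then time-series autoregression, then recovery of $r$ from the consumption moment); function names are hypothetical, and the inputs `mu_z`, `sigma_z2` and `ln_nuS` are assumed to have been computed from the step-1 and step-2 estimates via the model's formulas, with $\ln\hat\nu^S_t$ obtained from $V_{t-1} - \hat\alpha_1$.

```python
# Schematic multi-step estimator for Example 2 (illustrative sketch only).
import numpy as np

def step1_cross_section(y_next, x):
    """OLS of y_{i,t+1} on x_{i,t} and a constant: recovers (alpha1 + ln nu^S_{t+1}, alpha2, sig_eps2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    resid = y_next - X @ coef
    return coef[0], coef[1], resid.var()            # intercept, alpha2_hat, sig_eps2_hat

def step2_time_series(Y, X_bar, alpha2_hat):
    """Autoregression of V_t = Y_{t+1} - alpha2*X_t on its lag: recovers (alpha1, rho, omega2)."""
    V = Y[1:] - alpha2_hat * X_bar[:-1]
    Z = np.column_stack([np.ones(len(V) - 1), V[:-1]])
    coef, *_ = np.linalg.lstsq(Z, V[1:], rcond=None)
    const, rho_hat = coef
    omega2_hat = (V[1:] - Z @ coef).var()
    return const / (1.0 - rho_hat), rho_hat, omega2_hat   # alpha1_hat, rho_hat, omega2_hat

def step3_risk_aversion(c_nw, y, x, alpha1, alpha2, mu_z, sigma_z2, ln_nuS, R):
    """Back out r from the cross-sectional average of (R+1)c - R y - alpha1 - alpha2 x."""
    m = np.mean((R + 1) * c_nw - R * y - alpha1 - alpha2 * x)
    # m estimates -r*sigma_z2/2 + mu_z*ln_nuS, so:
    return 2.0 * (mu_z * ln_nuS - m) / sigma_z2
```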
The intuitive estimation strategy may be formally related to the method-of-moments type identification discussed at the end of Section 2.1. For this purpose, it suffices to write $\theta = (r, \alpha_1, \alpha_2, \sigma_\varepsilon^2)$, $\rho = (\varrho, \omega^2)$,
$$
h_n(\theta, \nu_t, \rho) =
\begin{bmatrix}
\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\begin{pmatrix}1\\ x_{i,t}\end{pmatrix}\left(y_{i,t+1} - \alpha_1 - \alpha_2 x_{i,t} - \ln\nu^S_{t+1}\right)\\[8pt]
\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left[\left(y_{i,t+1} - \alpha_1 - \alpha_2 x_{i,t} - \ln\nu^S_{t+1}\right)^2 - \sigma_\varepsilon^2\right]\\[8pt]
\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}\left[(R+1)\,c^{A,nw}_{i,t} - R\,y_{i,t} - \alpha_1 - \alpha_2 x_{i,t} + \dfrac{r\,\sigma_z^2(\varrho,\omega^2,\sigma_\varepsilon^2)}{2} - \mu_z(\varrho,\omega^2,\sigma_\varepsilon^2)\,\ln\nu^S_t\right]
\end{bmatrix}
$$
and
$$
k_\tau(\rho, \theta) =
\begin{bmatrix}
\dfrac{1}{\tau}\displaystyle\sum_{s=1}^{\tau}\begin{pmatrix}1\\ Y_{s+1} - \alpha_2 X_s\end{pmatrix}\left((Y_{s+2} - \alpha_2 X_{s+1}) - (1-\varrho)\alpha_1 - \varrho\,(Y_{s+1} - \alpha_2 X_s)\right)\\[8pt]
\dfrac{1}{\tau}\displaystyle\sum_{s=1}^{\tau}\left[\left((Y_{s+2} - \alpha_2 X_{s+1}) - (1-\varrho)\alpha_1 - \varrho\,(Y_{s+1} - \alpha_2 X_s)\right)^2 - \omega^2\right]
\end{bmatrix}.
$$
This characterization of the model makes clear that (i) it is identified if both cross-sectional and time-series data are available, (ii) using only cross-sectional data the model cannot be identified unless a subset of the time-series parameters are known, and (iii) using time-series data the model can only be identified if a subset of the cross-sectional parameters are known.
2.4 Long Panels?
One may wonder whether the problems discussed in this section can be addressed by exploiting panel data sets in which the time series dimension of the panel data is large enough. Our proposal discussed in the next section requires access to two data sets, a cross-section (or short panel) and a long time series of aggregate variables. One obvious advantage of combining two sources of data is that time series data may contain variables that are unavailable in typical panel data sets. For example the inflation rate potentially provides more information about aggregate shocks than is available in panel data. We argue in this subsection that even without access to such variables, the estimator based on the two data sets is expected to be more precise, which suggests that the advantage of data combination goes beyond availability of more observable variables.

Consider the alternative method based on one long panel data set, in which both $n$ and $T$ go to infinity. Since the number of aggregate shocks $\nu_t$ increases as the time-series dimension $T$ grows, we expect that the long panel analysis can be executed with tedious yet straightforward arguments by modifying ideas in Hahn and Kuersteiner (2002), Hahn and Newey (2004) and Gagliardini and Gourieroux (2011), among others.

We will now illustrate a potential problem with the long panel approach with a simple artificial example. Suppose that the econometrician is interested in the estimation of a parameter $\delta$ that characterizes the following system of linear equations:
$$q_{i,t} = x_{i,t}\beta + \nu_t + \varepsilon_{i,t}, \qquad i = 1, \ldots, n;\; t = 1, \ldots, T,$$
$$\nu_t = \omega \nu_{t-1} + u_t.$$
The variables $q_{i,t}$ and $x_{i,t}$ are observed and it is assumed that $x_{i,t}$ is strictly exogenous in the sense that it is independent of the error term $\varepsilon_{i,t}$, including all leads and lags. For simplicity, we also assume that $u_t$ and $\varepsilon_{i,t}$ are normally distributed with zero mean and that $\varepsilon_{i,t}$ is i.i.d. across both $i$ and $t$. We will denote by $\delta$ the ratio $\beta/\omega$.

In order to estimate $\delta$ based on the panel data $\{(q_{i,t}, x_{i,t});\; i = 1, \ldots, n;\; t = 1, \ldots, T\}$, we can adopt a simple two-step estimator of $\delta$. In a first step, the parameter $\beta$ and the aggregate shocks $\nu_t$ are estimated using an Ordinary Least Squares (OLS) regression of $q_{i,t}$ on $x_{i,t}$ and time dummies. In the second step, the time-series parameter $\omega$ is estimated by regressing $\hat\nu_t$ on $\hat\nu_{t-1}$, where $\hat\nu_t$, $t = 1, \ldots, T$, are the aggregate shocks estimated in the first step using the time dummies. An estimator of $\delta$ can then be obtained as $\hat\delta = \hat\beta/\hat\omega$.

The following remarks are useful to understand the properties of the estimator $\hat\delta = \hat\beta/\hat\omega$. First, even if $\nu_t$ were observed, for $\hat\omega$ to be a consistent estimator of $\omega$ we would need $T$ to go to infinity, under which assumption we have $\hat\omega = \omega + O_p\!\left(T^{-1/2}\right)$. This implies that it is theoretically necessary to assume that our data source is a "long" panel, i.e., $T \to \infty$. Similarly, $\hat\nu_t$ is a consistent estimator of $\nu_t$ only if $n$ goes to infinity. As a consequence, we have $\hat\nu_t = \nu_t + O_p\!\left(n^{-1/2}\right)$. This implies that it is in general theoretically necessary to assume that $n \to \infty$.[14] Moreover, if $n$ and $T$ both go to infinity, $\hat\delta$ is a consistent estimator of $\delta$, with $\hat\beta = \beta + O_p\!\left(\tfrac{1}{\sqrt{nT}}\right)$ and $\hat\omega = \omega + O_p\!\left(\tfrac{1}{\sqrt{T}}\right)$. All this implies that
$$\hat\delta = \frac{\hat\beta}{\hat\omega} = \frac{\beta + O_p\!\left(\tfrac{1}{\sqrt{nT}}\right)}{\omega + O_p\!\left(\tfrac{1}{\sqrt{T}}\right)} = \frac{\beta}{\omega} + O_p\!\left(\tfrac{1}{\sqrt{T}}\right) = \delta + O_p\!\left(\tfrac{1}{\sqrt{T}}\right).$$
The $O_p\!\left(\tfrac{1}{\sqrt{nT}}\right)$ estimation noise of $\hat\beta$, which is dominated by the $O_p\!\left(\tfrac{1}{\sqrt{T}}\right)$ term, is the term that would arise if $\omega$ were not estimated. The term reflects typical findings in long panel analysis (i.e., large $n$, large $T$), where the standard errors are inversely proportional to the square root of the number $n \times T$ of observations. The fact that it is dominated by the $O_p\!\left(\tfrac{1}{\sqrt{T}}\right)$ term indicates that the number of observations is effectively equal to $T$, i.e., the long panel should be treated as a time series problem for all practical purposes.

[14] For $\hat\omega$ to have the same distribution as if $\nu_t$ were observed, we need $n$ to go to infinity faster than $T$ or equivalently that $T = o(n)$. See Heckman and Sedlacek (1985, p. 1088).
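A small simulation (Python; all parameter values are illustrative assumptions) makes the order-of-magnitude argument above tangible: the dispersion of $\hat\delta = \hat\beta/\hat\omega$ across replications is governed by $T$ and barely moves when $n$ is increased.

```python
# Illustrative simulation of the two-step long-panel estimator delta_hat = beta_hat / omega_hat.
import numpy as np

rng = np.random.default_rng(1)
beta, omega = 1.0, 0.5

def delta_hat(n, T):
    # Simulate the panel: q_it = x_it*beta + nu_t + eps_it, nu_t = omega*nu_{t-1} + u_t.
    nu = np.zeros(T)
    for t in range(1, T):
        nu[t] = omega * nu[t - 1] + rng.normal()
    x = rng.normal(size=(n, T))
    q = x * beta + nu + rng.normal(size=(n, T))
    # Step 1: OLS of q on x and time dummies (within-period demeaning is equivalent here).
    xd, qd = x - x.mean(axis=0), q - q.mean(axis=0)
    beta_hat = (xd * qd).sum() / (xd * xd).sum()
    nu_hat = (q - x * beta_hat).mean(axis=0)          # estimated time effects
    # Step 2: regress nu_hat_t on nu_hat_{t-1} (no constant, for simplicity).
    omega_hat = (nu_hat[1:] * nu_hat[:-1]).sum() / (nu_hat[:-1] ** 2).sum()
    return beta_hat / omega_hat

for n, T in [(200, 50), (5000, 50), (200, 500)]:
    draws = np.array([delta_hat(n, T) for _ in range(200)])
    print(f"n={n:5d} T={T:4d}  sd(delta_hat)={draws.std():.3f}")
```

Increasing $n$ from 200 to 5,000 leaves the standard deviation essentially unchanged, while increasing $T$ shrinks it, consistent with the $O_p(T^{-1/2})$ dominance discussed above.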
This conclusion has two interesting implications. First, the sampling noise due to cross-section variation should be ignored and the "standard" asymptotic variance formulae should generally be avoided in the panel data analysis when aggregate shocks are present. We note that Lee and Wolpin's (2006, 2010) standard errors use the standard formula that ignores the $O_p\!\left(T^{-1/2}\right)$ term. Second, since in most cases the time-series dimension $T$ of a panel data set is relatively small, despite the theoretical assumption that it grows to infinity, estimators based on panel data will generally be more imprecise than may be expected from the "large" number $n \times T$ of observations.[15]

In the next section, we formally develop an alternative asymptotic framework that enables one to combine the two sources of data without having to worry about that drawback. Our proposal improves precision by exploiting long time series data of aggregate variables.

[15] This raises an interesting point. Suppose there is an aggregate time series data set available with which consistent estimation of $\rho$ is feasible at the standard rate of convergence. Also suppose that the number of observations there, say $\tau$, is a lot larger than $T$. If this were the case, we should probably speculate that the panel data analysis is strictly dominated by the time series analysis from the efficiency point of view.

3 Asymptotic Inference

While so far our discussion focused on the question of identification, we now turn the focus to correct inference. The model in Section 2.1 can be generalized by allowing the cross section data to be a short panel data set. This implies that the cross-sectional part of our estimator is based on $F_n(\theta, \nu_t, \rho) = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} f(y_{i,t} \mid \theta, \nu_t, \rho)$ in the case of maximum likelihood estimation, and on $h_n(\theta, \nu_t, \rho) = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} f(y_{i,t} \mid \theta, \nu_t, \rho)$ with $F_n(\theta, \nu_t, \rho) = -h_n(\theta, \nu_t, \rho)'\, W_n^C\, h_n(\theta, \nu_t, \rho)$ in the case of moment based models. The definition of the time series criterion $G_\tau(\rho, \theta)$ is not affected by the presence of panel cross-sectional data. The M-estimator solves
$$\left(\hat\theta, \hat\nu_1, \ldots, \hat\nu_T\right) \equiv \operatorname*{argmax}_{\theta,\, \nu_1, \ldots, \nu_T} F_n(\theta, \nu_t, \rho) \quad \text{and} \quad \hat\rho \equiv \operatorname*{argmax}_{\rho} G_\tau(\rho, \theta), \tag{14}$$
with the moment equation (3) modified accordingly. We present and study an asymptotic framework such that $T$ is finite but $\tau$ goes to infinity at the same rate as $n$. Here $\{y_{i,t};\; i = 1, \ldots, n;\; t = 1, \ldots, T\}$ denotes our cross section data of individual observations, and $\{z_s;\; s = 1, \ldots, \tau\}$ denotes our time series data of aggregate variables. Our focus in this paper is on the asymptotic distribution of the estimators. We provide an intuitive sketch of the basic properties of these estimators in this section, leaving the technical details to Section 4.

Our asymptotic framework is such that standard textbook level analysis suffices for the discussion of consistency of the estimator. In standard analysis with a single data source, one typically restricts the moment equation to ensure identification, and imposes further restrictions such that the sample analog of the moment function converges uniformly to the population counterpart. An identical analysis applies to our problem, and as such, we simply impose a high-level assumption that our estimator is consistent.

We begin this section by arguing that a more intuitive estimation strategy exploiting long panel data may have subtle problems. We then present the intuition underlying the main results.
3.1 Stationary Models

Suppose that there is a separate time series $z_t$ such that its log of conditional probability density function given $z_{t-1}$ is given by $g(z_t \mid z_{t-1}; \rho)$. Note that the time series model does not depend on the micro parameter $\theta$. Let $\tilde\rho$ denote a consistent estimator.

We assume that the dimension of the time series data is $\tau$, and that the influence function of $\tilde\rho$ is such that
$$\sqrt{\tau}\,(\tilde\rho - \rho) = \frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s \tag{15}$$
with $E[\varphi_s] = 0$. Here, $\tau_0 + 1$ denotes the beginning of the time series data, which is allowed to differ from the beginning of the panel data. Using $\tilde\rho$ from the time series data, we can then consider maximizing the criterion $F_n(\theta, \nu_t, \tilde\rho)$ with respect to $\psi = (\theta, \nu_1, \ldots, \nu_T)$. Implicit in this representation is the idea that we are given a short panel for estimation of $\psi = (\theta, \nu_1, \ldots, \nu_T)$, where $T$ denotes the time series dimension of the panel data. In order to emphasize that $T$ is small, we use the term 'cross-section' for the short panel data set, and adopt asymptotics where $T$ is fixed. The moment equation then is
$$\frac{\partial F_n(\hat\psi, \tilde\rho)}{\partial \psi} = 0$$
and the asymptotic distribution of $\hat\psi$ is characterized by
$$\sqrt{n}\,(\hat\psi - \psi) \approx -\left(\frac{\partial^2 F(\psi, \tilde\rho)}{\partial\psi\,\partial\psi'}\right)^{-1}\sqrt{n}\,\frac{\partial F_n(\psi, \tilde\rho)}{\partial\psi}.$$
Because $\sqrt{n}\left(\partial F_n(\psi, \tilde\rho)/\partial\psi - \partial F_n(\psi, \rho)/\partial\psi\right) \approx \left(\partial^2 F(\psi, \rho)/\partial\psi\,\partial\rho'\right)\sqrt{n}\,(\tilde\rho - \rho)$, we obtain
$$\sqrt{n}\,(\hat\psi - \psi) \approx -A^{-1}\sqrt{n}\,\frac{\partial F_n(\psi, \rho)}{\partial\psi} - A^{-1}B\,\sqrt{\frac{n}{\tau}}\left(\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s\right), \tag{16}$$
with
$$A \equiv \frac{\partial^2 F(\psi, \rho)}{\partial\psi\,\partial\psi'}, \qquad B \equiv \frac{\partial^2 F(\psi, \rho)}{\partial\psi\,\partial\rho'}.$$
We adopt asymptotics where $n, \tau \to \infty$ at the same rate, but $T$ is fixed. We stress that a technical difficulty arises because we are conditioning on the factors $(\nu_1, \ldots, \nu_T)$. This is accounted for in the limit theory we develop through a convergence concept by Renyi (1963) called stable convergence, essentially a notion of joint convergence. It can be thought of as convergence conditional on a specified $\sigma$-field, in our case the $\sigma$-field $\mathcal{C}$ generated by $(\nu_1, \ldots, \nu_T)$. In simple special cases, and because $T$ is fixed, the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \ldots, \nu_T)$ may be equal to the unconditional asymptotic distribution. However, as we show in Section 4, this is not always the case, even when the model is stationary.

Renyi (1963) and Aldous and Eagleson (1978) show that the concepts of convergence of the distribution conditional on any positive probability event in $\mathcal{C}$ and the concept of stable convergence are equivalent. Eagleson (1975) proves a stable CLT by establishing that the conditional characteristic functions converge almost surely. Hall and Heyde's (1980) proof of stable convergence on the other hand is based on demonstrating that the characteristic function converges weakly in $L^1$. As pointed out in Kuersteiner and Prucha (2013), the Hall and Heyde (1980) approach lends itself to proving the martingale CLT under slightly weaker conditions than what Eagleson (1975) requires. While both approaches can be used to demonstrate very similar stable and thus conditional limit laws, neither simplifies to conventional marginal weak convergence except in trivial cases. For this reason it is not possible to separate the cross-sectional inference problem from the time series problem simply by 'fixing' the common shocks $(\nu_1, \ldots, \nu_T)$. Such an approach would only be valid if the shocks $(\nu_1, \ldots, \nu_T)$ did not change in any state of the world, in other words if they were constants in a probabilistic sense. Only in that scenario would stable or conditional convergence be equivalent to marginal convergence. The inherent randomness of $(\nu_1, \ldots, \nu_T)$, taken into account by the rational agents in the models we discuss, is at the heart of our examples and is the essence of the inference problems we discuss in this paper. Thus, treating $(\nu_1, \ldots, \nu_T)$ as constants is not an option available to us. This is also the reason why time dummies are no remedy for the problems we analyze. A related idea might be to derive conditional (on $\mathcal{C}$) limiting results separately for the cross-section and time series dimension of our estimators. As noted before, such a result in fact amounts to demonstrating stable convergence, in this case for each dimension separately. Irrespective, this approach is flawed because it does not deliver joint convergence of the two components. It is evident from (16) that the continuous mapping theorem needs to be applied to derive the asymptotic distribution of $\hat\psi$. Because both $A$ and $B$ are $\mathcal{C}$-measurable random variables in the limit, the continuous mapping theorem can only be applied if joint convergence of $\sqrt{n}\,\partial F_n(\psi, \rho)/\partial\psi$, $\tau^{-1/2}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ and any $\mathcal{C}$-measurable random variable is established. Joint stable convergence of both components delivers exactly that. Finally, we point out that it is perfectly possible to consistently estimate parameters, in our case $(\nu_1, \ldots, \nu_T)$, that remain random in the limit. For related results, see the recent work of Kuersteiner and Prucha (2015).
Here, for the purpose of illustration we consider the simple case where the dependence of the time series component on the factors vanishes asymptotically. Let us say that the unconditional distribution is such that
$$\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s \to N(0, \Sigma_\rho),$$
where $\Sigma_\rho$ is a fixed constant that does not depend on $(\nu_1, \ldots, \nu_T)$. Let us also assume that
$$\sqrt{n}\,\frac{\partial F_n(\psi, \rho)}{\partial\psi} \to N(0, \Sigma_y)$$
conditional on $(\nu_1, \ldots, \nu_T)$. Unlike in the case of the time series sample, $\Sigma_y$ generally does depend on $(\nu_1, \ldots, \nu_T)$ through the parameter $\nu$.

We note that $\partial F(\psi, \rho)/\partial\psi$ is a function of $(\nu_1, \ldots, \nu_T)$. If there is overlap between $(1, \ldots, T)$ and $(\tau_0 + 1, \ldots, \tau_0 + \tau)$, we need to worry about the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \ldots, \nu_T)$. However, because in this example the only connection between $y$ and $\varphi$ is assumed to be through $\nu$ and because $T$ is assumed fixed, the two terms $\sqrt{n}\,\partial F_n(\psi, \rho)/\partial\psi$ and $\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\varphi_t$ are expected to be asymptotically independent in the trend stationary case and when $\Sigma_\rho$ does not depend on $(\nu_1, \ldots, \nu_T)$. Even in this simple setting, independence between the two samples does not hold automatically, and asymptotic conditional or unconditional independence as well as joint convergence with $\mathcal{C}$-measurable random variables needs to be established formally. This is achieved by establishing $\mathcal{C}$-stable convergence in Section 4.2.
It follows that
$$\sqrt{n}\,(\hat\psi - \psi) \to N\!\left(0,\; A^{-1}\Sigma_y A^{-1} + \kappa\, A^{-1} B\, \Sigma_\rho\, B' A^{-1}\right), \tag{17}$$
where $\kappa \equiv \lim n/\tau$. This means that a practitioner would use the square root of
$$\frac{1}{n}A^{-1}\Sigma_y A^{-1} + \frac{\kappa}{n}A^{-1} B\,\Sigma_\rho\, B' A^{-1} = \frac{1}{n}A^{-1}\Sigma_y A^{-1} + \frac{1}{\tau}A^{-1} B\,\Sigma_\rho\, B' A^{-1}$$
as the standard error. This result looks similar to Murphy and Topel's (1985) formula, except that we need to make an adjustment to the second component to address the differences in sample sizes.
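In practice the formula above is straightforward to apply once estimates of $A$, $B$, $\Sigma_y$, $\Sigma_\rho$, $n$ and $\tau$ are in hand. A minimal Python helper, assuming those objects have already been estimated (their construction is model-specific and not shown), might look as follows.

```python
# Minimal sketch: combined standard errors implied by the asymptotic variance above,
#   Var(psi_hat) ~ (1/n) A^{-1} Sigma_y A^{-1} + (1/tau) A^{-1} B Sigma_rho B' A^{-1}.
# A, B, Sigma_y, Sigma_rho are assumed to be consistently estimated elsewhere.
import numpy as np

def combined_standard_errors(A, B, Sigma_y, Sigma_rho, n, tau):
    A_inv = np.linalg.inv(A)
    V = (A_inv @ Sigma_y @ A_inv) / n + (A_inv @ B @ Sigma_rho @ B.T @ A_inv) / tau
    return np.sqrt(np.diag(V))
```

Dropping the second term reproduces the "standard" cross-sectional standard errors that the discussion above warns against.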
The asymptotic variance formula is such that the noise of the time series estimator $\tilde\rho$ can make quite a difference if $\kappa$ is large, i.e., if the time series size $\tau$ is small relative to the cross section size $n$. Obviously, this calls for long time series for accurate estimation of even the micro parameter $\theta$. We also note that time series estimation has no impact on micro estimation if $B = 0$. This confirms the intuition that if $\rho$ does not appear as part of the micro moment $f$, which is the case in Heckman and Sedlacek (1985), and Heckman, Lochner, and Taber (1998), cross section estimation can be considered separate from time series estimation.

In the more general setting of Section 4, $\Sigma_\rho$ may depend on $(\nu_1, \ldots, \nu_T)$. In this case the limiting distribution of the time series component is mixed Gaussian and dependent upon the limiting distribution of the cross-sectional component. This dependence does not vanish asymptotically even in stationary settings. As we show in Section 4, standard inference based on asymptotically pivotal statistics is available even though the limiting distribution of $\hat\psi$ is no longer a sum of two independent components.
3.2 Unit Root Problems

When the simple trend stationary paradigm does not apply, the limiting distribution of our estimators may be more complicated. A general treatment is beyond the scope of this paper and likely requires a case by case analysis. In this subsection we consider a simple unit root model where initial conditions can be neglected. We use it to exemplify additional inferential difficulties that arise even in this relatively simple setting. In Section 4.3 we consider a slightly more complex version of the unit root model where initial conditions cannot be ignored. We show that more complicated dependencies between the asymptotic distributions of the cross-section and time series samples manifest themselves. The result is a cautionary tale of the difficulties that may present themselves when nonstationary time series data are combined with cross-sections. We leave the development of inferential methods for this case to future work.
We again consider the model in the previous section, except with the twist that (i) $\rho$ is the AR(1) coefficient in the time series regression of $z_t$ on $z_{t-1}$ with independent error; and (ii) $\rho$ is at (or near) unity. In the same way that led to (16), we obtain
$$\sqrt{n}\,(\hat\psi - \psi) \approx -A^{-1}\sqrt{n}\,\frac{\partial F_n(\psi, \rho)}{\partial\psi} - A^{-1}B\,\sqrt{n}\,(\tilde\rho - \rho).$$
For simplicity, again assume that the two terms on the right are asymptotically independent. The first term converges in distribution to a normal distribution $N(0, A^{-1}\Sigma_y A^{-1})$, but with $\rho = 1$ and i.i.d. AR(1) errors the second term converges to
$$-A^{-1}B\,\kappa_1\,\frac{W(1)^2 - 1}{2\int_0^1 W(r)^2\, dr},$$
where $\kappa_1 \equiv \lim \sqrt{n}/\tau$ and $W(\cdot)$ is the standard Wiener process, in contrast to the result in (17) when $\rho$ is away from unity. The result is formalized in Section 4.3.
The fact that the limiting distribution of ^ is no longer Gaussian complicates inference. This
discontinuity is mathematically similar to Campbell and Yogo’s (2006) observation, which leads
to a question of how uniform inference could be conducted. In principle, the problem here can
be analyzed by modifying the proposal in Phillips (2014, Section 4.3). First, construct the 1
con…dence interval for
argmax Fn ( ; ) for
using Mikusheva (2007). Call it [
2[
L;
U ].
Assuming that
24
L;
U ].
Second, compute b ( )
1
is …xed, characterize the asymptotic variance
( ), say, of
p
n b( )
( ) , which is asymptotically normal in general. Third, construct the
con…dence region, say CI ( 2 ; ), using asymptotic normality and ( ). Our con…dence
S
interval for 1 is then given by
2[ L ; U ] CI ( 2 ; ). By Bonferroni, its asymptotic coverage rate
1
2
is expected to be at least 1
4
2.
1
Joint Panel-Time Series Limit Theory
In this section we …rst establish a generic joint limiting result for a combined panel-time series
process and then specialize it to the limiting distributions of parameter estimates under stationarity
and non-stationarity. The process we analyze consists of a triangular array of panel data
y
n;it
observed for i = 1; :::; n and t = 1; : : : ; T where n ! 1 while T is …xed and t = 1 is an
arbitrary normalization of time at the beginning of the cross-sectional sample. It also consists
of a separate triangular array of time series
1 <
K
;t
for t =
0
+ 1; : : : ;
K < 1 for some bounded K and
0
0
+
where
0
is …xed with
y
n;it and
! 1. Typically,
;t
are
the scores of a cross-section and time series criterion function based on observed data yit and zt :
We assume that T
0
y
n;it ;
+ . Throughout we assume that
;t
is a martingale di¤erence
sequence relative to a …ltration to be speci…ed below. We derive the joint limiting distribution
P P
P 0+
y
and a related functional central limit theorem for p1n Tt=1 ni=1 n;it
and p1
;t .
t= 0 +1
We now construct the triangular array of filtrations similarly to Kuersteiner and Prucha (2013)
by setting $\mathcal{C}=\sigma(\lambda_1,\ldots,\lambda_T)$ and
$$\mathcal{G}_{n,0}=\mathcal{C},$$
$$\mathcal{G}_{n,i}=\sigma\Bigl(z_{\min(1,\tau_0)},\{y_{j,\min(1,\tau_0)}\}_{j=1}^{i}\Bigr)\vee\mathcal{C},$$
$$\mathcal{G}_{n,n+i}=\sigma\Bigl(\{y_{j,\min(1,\tau_0)}\}_{j=1}^{n},\,z_{\min(1,\tau_0)+1},z_{\min(1,\tau_0)},\,\{y_{j,\min(1,\tau_0)+1}\}_{j=1}^{i}\Bigr)\vee\mathcal{C},\qquad(18)$$
$$\vdots$$
$$\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}=\sigma\Bigl(\{y_{j,t-1},y_{j,t-2},\ldots,y_{j,\min(1,\tau_0)}\}_{j=1}^{n},\,z_t,z_{t-1},\ldots,z_{\min(1,\tau_0)},\,\{y_{j,t}\}_{j=1}^{i}\Bigr)\vee\mathcal{C},$$
$$\vdots$$
We use the convention that $\mathcal{G}_{n,(t-\min(1,\tau_0))n}=\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+n}$. This implies that $z_t$ and $y_{1t}$ are
added simultaneously to the filtration $\mathcal{G}_{n,(t-\min(1,\tau_0))n+1}$. Also note that $\mathcal{G}_{n,i}$ predates the time
series sample by at least one period, i.e. corresponds to the 'time zero' sigma field. To simplify
notation define the function $q_n(t,i)=(t-\min(1,\tau_0))n+i$ that maps the two-dimensional index
$(t,i)$ into the integers and note that for $q=q_n(t,i)$ it follows that $q\in\{0,\ldots,\max(T,\tau)n\}$. The
filtrations $\mathcal{G}_{n,q}$ are increasing in the sense that $\mathcal{G}_{n,q}\subseteq\mathcal{G}_{n,q+1}$ for all $q$, $\tau$ and $n$. We note that
$E\bigl[\psi_{\tau,t}\mid\mathcal{G}_{n,q_n(t-1,i)}\bigr]=0$ for all $i$ is not guaranteed because we condition not only on $z_{t-1},z_{t-2},\ldots$
but also on $\lambda_1,\ldots,\lambda_T$,
where the latter may have non-trivial overlap with the former.
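A minimal sketch of the index map $q_n(t,i)$ may help fix ideas; the numerical values below are illustrative only and are not taken from the paper.

```python
# Minimal illustration: the map q_n(t, i) = (t - min(1, tau0)) * n + i orders the
# two-dimensional panel index (t, i) into the single index of the martingale array.
def q_n(t, i, n, tau0=0):
    """Map the panel index (t, i), with 1 <= i <= n, into a scalar index."""
    return (t - min(1, tau0)) * n + i

n, T, tau0 = 3, 2, -1                      # illustrative sizes only
pairs = [(t, i) for t in range(min(1, tau0) + 1, T + 1) for i in range(1, n + 1)]
qs = [q_n(t, i, n, tau0) for (t, i) in pairs]
# The map is strictly increasing along the (t, i) ordering, so each q identifies a unique (t, i).
assert qs == sorted(qs) and len(set(qs)) == len(qs)
print(list(zip(pairs, qs)))
```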
The central limit theorem we develop needs to establish joint convergence for terms involving
both $\psi^y_{n,it}$ and $\psi_{\tau,t}$ with both the time series and the cross-sectional dimension becoming large
simultaneously. Let $[a]$ be the largest integer less than or equal to $a$. Joint convergence is achieved
by stacking both moment vectors into a single sum that extends over both $t$ and $i$. Let $r\in[0,1]$
and define
$$\tilde\psi^\lambda_{it}(r)=\frac{\psi_{\tau,t}}{\sqrt\tau}\,1\{\tau_0+1\le t\le\tau_0+[\tau r]\}\,1\{i=1\},\qquad(19)$$
which depends on $r$ in a non-trivial way. This dependence will be of particular interest when we
specialize our models to the near unit root case. For the cross-sectional data define
$$\tilde\psi^y_{it}=\frac{\psi^y_{n,it}}{\sqrt n},\qquad(20)$$
where $\tilde\psi^y_{it}$ is constant as a function of $r\in[0,1]$. In turn, this implies that functional convergence
of the component (20) is the same as the finite dimensional limit. It also means that the limiting
process is degenerate (i.e. constant) when viewed as a function of $r$. However, this does not matter
in our applications as we are only interested in the sum
$$\frac{1}{\sqrt n}\sum_{t=1}^{T}\sum_{i=1}^{n}\psi^y_{n,it}=\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde\psi^y_{it}\equiv X^y_n.$$
Define the stacked vector $\tilde\psi_{it}(r)=\bigl(\tilde\psi^{y\prime}_{it},\tilde\psi^\lambda_{it}(r)'\bigr)'\in\mathbb{R}^{k}$ and consider the stochastic process
$$X_n(r)=\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde\psi_{it}(r),\qquad X_n(0)=(X^{y\prime}_n,0)'.\qquad(21)$$
We derive a functional central limit theorem which establishes joint convergence between the
panel and time series portions of the process $X_n(r)$. The result is useful in analyzing both trend
stationary and unit root settings. In the latter, we specialize the model to a linear time series
setting. The functional CLT is then used to establish proper joint convergence between stochastic
integrals and the cross-sectional component of our model.
For the stationary case we are mostly interested in $X_n(1)$, where in particular
$$\frac{1}{\sqrt\tau}\sum_{t=\tau_0+1}^{\tau_0+\tau}\psi_{\tau,t}=\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde\psi^\lambda_{it}(1).$$
The limiting distribution of $X_n(1)$ is a simple corollary of the functional CLT for $X_n(r)$. We
note that our treatment differs from Phillips and Moon (1999), who develop functional CLTs for
the time series dimension of the panel data set. In our case, since $T$ is fixed and finite, a similar
treatment is not applicable.
We introduce the following general regularity conditions. In later sections these conditions will
be specialized to the particular models considered there.

Condition 1 Assume that
i) $\psi^y_{n,it}$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$;
ii) $\psi_{\tau,t}$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$ for all $i=1,\ldots,n$;
iii) for some $\delta>0$ and $C<\infty$, $\sup_{it}E\bigl[\|\psi^y_{n,it}\|^{2+\delta}\bigr]\le C$ for all $n\ge 1$;
iv) for some $\delta>0$ and $C<\infty$, $\sup_{t}E\bigl[\|\psi_{\tau,t}\|^{2+\delta}\bigr]\le C$ for all $\tau\ge 1$;
v) $E\bigl[\psi^y_{n,it}\mid\mathcal{G}_{n,(t-\min(1,\tau_0))n+i-1}\bigr]=0$;
vi) $E\bigl[\psi_{\tau,t}\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+i}\bigr]=0$ for $t>T$ and all $i=1,\ldots,n$.
Remark 1 Conditions 1(i), (iii) and (v) can be justified in a variety of ways. One is the subordinated process theory employed in Andrews (2005), which arises when $y_{it}$ are random draws
from a population of outcomes $y$. A sufficient condition for Condition 1(v) to hold is that
$E\bigl[\psi^y(y_j,\theta,\lambda_t)\mid\mathcal{C}\bigr]=0$ holds in the population. This would be the case, for example, if $\psi^y$ were
the correctly specified score for the population distribution. See Andrews (2005, pp. 1573-1574).
Condition 2 Assume that:
i) for any $s,r\in[0,1]$ with $r>s$,
$$\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\psi_{\tau,t}\psi_{\tau,t}'\overset{p}{\to}\Omega_\lambda(r)-\Omega_\lambda(s)\quad\text{as }\tau\to\infty,$$
where $\Omega_\lambda(r)-\Omega_\lambda(s)$ is positive definite a.s. and measurable with respect to $\sigma(\lambda_1,\ldots,\lambda_T)$ for all
$r\in(s,1]$. Normalize $\Omega_\lambda(0)=0$.
ii) The elements of $\Omega_\lambda(r)$ are bounded continuously differentiable functions of $r>s\in[0,1]$. The
derivatives $\dot\Omega_\lambda(r)=\partial\Omega_\lambda(r)/\partial r$ are positive definite almost surely.
iii) There is a fixed constant $M<\infty$ such that $\sup_{\xi\in\mathbb{R}^{k_\lambda},\,\|\xi\|=1}\sup_t\,\xi'\,\partial\Omega_\lambda(t)/\partial t\,\xi\le M$ a.s.
Condition 2 is weaker than the conditions of Billingsley's (1968) functional CLT for strictly
stationary martingale difference sequences (mds). We do not assume that $E\bigl[\psi_{\tau,t}\psi_{\tau,t}'\bigr]$ is constant.
Brown (1971) allows for time varying variances, but uses stopping times to achieve a standard
Brownian limit. Even more general treatments with random stopping times are possible; see
Gaenssler and Haeussler (1979). Here, on the other hand, we pursue convergence to a Gaussian process (not
a standard Wiener process) with the same methodology as in Billingsley (i.e. establishing convergence of finite
dimensional distributions and tightness), but without assuming homoskedasticity. Heteroskedastic errors are explicitly used in Section 4.3, where
$\psi_{\tau,t}=\exp\bigl(-(t-\tau_0)c/\tau\bigr)\varepsilon_t$. Even if $\varepsilon_t$ is iid$(0,\sigma^2)$, it follows that $\psi_{\tau,t}$
is a heteroskedastic triangular array that depends on $\tau$. It can be shown that the variance kernel $\Omega_\lambda(r)$ is
$\Omega_\lambda(r)=\sigma^{2}\bigl(1-\exp(-2rc)\bigr)/(2c)$ in this case.
See equation (68).
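For illustration, assuming the time-series scores take the rescaled near unit root form $\psi_{\tau,t}=e^{-(t-\tau_0)c/\tau}\varepsilon_t$ with $\varepsilon_t$ iid$(0,\sigma^2)$, a Riemann-sum calculation reproduces the kernel quoted above:
$$\frac{1}{\tau}\sum_{t=\tau_0+1}^{\tau_0+[\tau r]}E\bigl[\psi_{\tau,t}^{2}\bigr]=\frac{\sigma^{2}}{\tau}\sum_{j=1}^{[\tau r]}e^{-2jc/\tau}\;\longrightarrow\;\sigma^{2}\int_{0}^{r}e^{-2uc}\,du=\frac{\sigma^{2}\bigl(1-e^{-2rc}\bigr)}{2c}=\Omega_\lambda(r).$$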
Condition 3 Assume that
$$\frac{1}{n}\sum_{i=1}^{n}\psi^y_{n,it}\psi^{y\prime}_{n,it}\overset{p}{\to}\Omega^{ty},$$
where $\Omega^{ty}$ is positive definite a.s. and measurable with respect to $\sigma(\lambda_1,\ldots,\lambda_T)$.

Condition 2 holds under a variety of conditions that imply some form of weak dependence
of the process $\psi_{\tau,t}$. These include, in addition to Condition 1(ii) and (iv), mixing or near epoch
dependence assumptions on the temporal dependence properties of the process $\psi_{\tau,t}$. Condition
3 holds under appropriate moment bounds and random sampling in the cross-section even if
the underlying population distribution is not independent (see Andrews, 2005, for a detailed
treatment).
4.1 Stable Functional CLT

This section details the probabilistic setting we use to accommodate the results that Jacod and
Shiryaev (2002) (shorthand notation JS) develop for general Polish spaces. Let $(\Omega',\mathcal{F}',P')$ be
a probability space with increasing filtrations $\mathcal{F}^n_t\subseteq\mathcal{F}^n_{t+1}\subseteq\mathcal{F}'$ for any $t=1,\ldots,k_n$
and an increasing sequence $k_n\to\infty$ as $n\to\infty$. Let $D_{\mathbb{R}^d}[0,1]$, with $d=k_y+k_\lambda$, be the space of functions
$[0,1]\to\mathbb{R}^{d}$ that are right continuous and have left limits (see Billingsley, 1968, p. 109).
Let $\mathcal{C}$ be a sub-sigma field of $\mathcal{F}'$. Let $(\eta,Z^n(\omega,t)):\Omega'\times[0,1]\to\mathbb{R}\times\mathbb{R}^{d}$ be random variables
or random elements in $\mathbb{R}$ and $D_{\mathbb{R}^d}[0,1]$, respectively, defined on the common probability space
$(\Omega',\mathcal{F}',P')$, and assume that $\eta$ is bounded and measurable with respect to $\mathcal{C}$. Equip $D_{\mathbb{R}^d}[0,1]$
with the Skorohod topology; see JS (p. 328, Theorem 1.14). By Billingsley (1968), Theorem 15.5,
the uniform metric can be used to establish tightness for certain processes that are continuous in
the limit.
We use the results of JS to define a precise notion of stable convergence on $D_{\mathbb{R}^d}[0,1]$.
JS (p. 512, Definition 5.28) define stable convergence for sequences $Z^n$ defined on a Polish space.
We adapt their definition to our setting, noting that by JS (p. 328, Theorem 1.14), $D_{\mathbb{R}^d}[0,1]$
equipped with the Skorohod topology is a Polish space. Also following their Definition VI.1.1 and
Theorem VI.1.14 we define the $\sigma$-field generated by all coordinate projections as $\mathcal{D}_{\mathbb{R}^d}$.
Definition 1 The sequence $Z^n$ converges $\mathcal{C}$-stably if for all bounded $\eta$ measurable with respect to
$\mathcal{C}$ and for all bounded continuous functions $f$ defined on $D_{\mathbb{R}^d}[0,1]$ there exists a probability
measure $\mu$ on $\bigl(\Omega'\times D_{\mathbb{R}^d}[0,1],\,\mathcal{C}\otimes\mathcal{D}_{\mathbb{R}^d}\bigr)$ such that
$$E\bigl[\eta f(Z^n)\bigr]\to\int_{\Omega'\times D_{\mathbb{R}^d}[0,1]}\eta(\omega')f(x)\,\mu(d\omega',dx).$$

As in JS, let $Q(\omega',dx)$ be a distribution conditional on $\mathcal{C}$ such that $\mu(d\omega',dx)=P'(d\omega')Q(\omega',dx)$,
and let $Q^n(\omega',dx)$ be a version of the conditional (on $\mathcal{C}$) distribution of $Z^n$. Then we can define the joint probability space $(\Omega,\mathcal{F},P)$ with
$\Omega=\Omega'\times D_{\mathbb{R}^d}[0,1]$, $\mathcal{F}=\mathcal{F}'\otimes\mathcal{D}_{\mathbb{R}^d}$ and
$P=P(d\omega,dx)=P'(d\omega')Q(\omega',dx)$. Let $Z(\omega',x)=x$ be the canonical element on $D_{\mathbb{R}^d}[0,1]$.
It follows that $\int\eta(\omega')f(x)\,\mu(d\omega',dx)=E[\eta Qf]$. We say that $Z^n$ converges $\mathcal{C}$-stably to $Z$ if for
all bounded, $\mathcal{C}$-measurable $\eta$,
$$E\bigl[\eta f(Z^n)\bigr]\to E[\eta Qf].\qquad(22)$$
More specifically, if $W(r)$ is standard Brownian motion, we say that $Z^n\Rightarrow W(r)$ $\mathcal{C}$-stably, where
the notation means that (22) holds when $Q$ is Wiener measure (for a definition and existence proof
see Billingsley, 1968, ch. 9). By JS Proposition VIII.5.33 it follows that $Z^n$ converges $\mathcal{C}$-stably iff
$Z^n$ is tight and for all $A\in\mathcal{C}$, $E[1_A f(Z^n)]$ converges.
The concept of stable convergence was introduced by Renyi (1963) and has found wide application in probability and statistics. Most relevant to the discussion here are the stable central
limit theorem of Hall and Heyde (1980) and Kuersteiner and Prucha (2013), who extend the result
in Hall and Heyde (1980) to panel data with fixed $T$. Dedecker and Merlevede (2002) established
a related stable functional CLT for strictly stationary martingale differences.
Following Billingsley (1968, p. 120) let $\pi_{r_1,\ldots,r_k}Z^n=\bigl(Z^n_{r_1},\ldots,Z^n_{r_k}\bigr)$ be the coordinate projections
of $Z^n$. By JS VIII.5.36 and by the proof of Theorems 5.7 and 5.14 on p. 509 of JS (see also Rootzen
(1983), Feigin (1985), and Dedecker and Merlevede (2002, p. 1057)), $\mathcal{C}$-stable convergence of $Z^n$ to
$Z$ follows if $E\bigl[\eta f\bigl(Z^n_{r_1},\ldots,Z^n_{r_k}\bigr)\bigr]\to E\bigl[\eta f\bigl(Z_{r_1},\ldots,Z_{r_k}\bigr)\bigr]$ and $Z^n$ is tight under the measure $P$. We
note that the first condition is equivalent to stable convergence of the finite dimensional vector of
random variables $\bigl(Z^n_{r_1},\ldots,Z^n_{r_k}\bigr)$ defined on $\mathbb{R}^{dk}$ and is established with a multivariate stable central
limit theorem.
Theorem 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for $\tilde\psi_{it}$ defined in (21),
and as $\tau,n\to\infty$ and $T$ fixed,
$$X_n(r)\Rightarrow\begin{bmatrix}B_y(1)\\[2pt] B_\lambda(r)\end{bmatrix}\quad(\mathcal{C}\text{-stably})$$
where $B_y(r)=\Omega_y^{1/2}W_y(r)$, $B_\lambda(r)=\int_0^r\dot\Omega_\lambda(s)^{1/2}\,dW_\lambda(s)$, $\Omega(r)=\mathrm{diag}\bigl(\Omega_y,\Omega_\lambda(r)\bigr)$ is $\mathcal{C}$-measurable,
$\dot\Omega_\lambda(s)=\partial\Omega_\lambda(s)/\partial s$, and $\bigl(W_y(r),W_\lambda(r)\bigr)$ is a vector of standard $k_y$- and $k_\lambda$-dimensional
Brownian processes independent of $\mathcal{C}$.
Proof. In Appendix C.

Remark 2 Note that the component $W_y(1)$ of $W(r)$ does not depend on $r$. Thus, $W_y(1)$ is simply
a vector of standard Gaussian random variables, independent both of $W_\lambda(r)$ and of any random
variable measurable w.r.t. $\mathcal{C}$.
The limiting random variables $B_y(r)$ and $B_\lambda(r)$ both depend on $\mathcal{C}$ and are thus mutually
dependent. The representation $B_y(1)=\Omega_y^{1/2}W_y(1)$, where a stable limit is represented as the
product of an independent Gaussian random variable and a scale factor that depends on $\mathcal{C}$, is common in the literature on stable convergence. Results similar to the one for $B_\lambda(r)$ were obtained by
Phillips (1987, 1988) for cases where $\dot\Omega_\lambda(s)$ is non-stochastic and has an explicit functional form,
notably for near unit root processes, and when convergence is marginal rather than stable. Rootzen
(1983) establishes stable convergence but gives a representation of the limiting process in terms
of standard Brownian motion obtained by a stopping time transformation. The representation of
$B_\lambda(r)$ in terms of a stochastic integral over the random scale process $\dot\Omega_\lambda(s)$ seems to be new. It
is obtained by utilizing a technique mentioned in Rootzen (1983, p. 10) but not utilized there,
namely establishing finite dimensional convergence using a stable martingale CLT. This technique,
combined with a tightness argument, establishes the characteristic function of the limiting process.
The representation for $B_\lambda(r)$ is then obtained by utilizing results for characteristic functions of
affine diffusions in Duffie, Pan and Singleton (2000). Rootzen (1983, p. 13) similarly utilizes characteristic functions to identify the limiting distribution in the case of standard Brownian motion, a
much simpler scenario than ours. Finally, the results of Dedecker and Merlevede (2002) differ from
ours in that they only consider asymptotically homoskedastic and strictly stationary processes. In
our case, heteroskedasticity is explicitly allowed because of $\dot\Omega_\lambda(s)$. An important special case of
Theorem 1 is the near unit root model discussed in more detail in Section 4.3.
More importantly, our results innovate over the literature by establishing joint convergence
between cross-sectional and time series averages that are generally not independent and whose
limiting distributions are not independent. This result is obtained by a novel construction that
embeds both data sets in a random field. A careful construction of the information filtrations $\mathcal{G}_{n,q}$
allows us to map the field into a martingale array. Similar techniques were used in Kuersteiner
and Prucha (2013) for panels with fixed $T$. In this paper we extend their approach to handle an
additional and distinct time series data set and by allowing both $n$ and $\tau$ to tend to infinity
jointly. In addition to the more complicated data structure, we extend Kuersteiner and Prucha
(2013) by considering functional central limit theorems.
The following corollary is useful for possibly non-linear but trend stationary models.

Corollary 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for $\tilde\psi_{it}$ defined in (21),
and as $\tau,n\to\infty$ and $T$ fixed,
$$X_n(1)\overset{d}{\to}B:=\Omega^{1/2}W\quad(\mathcal{C}\text{-stably})$$
where $\Omega=\mathrm{diag}\bigl(\Omega_y,\Omega_\lambda(1)\bigr)$ is $\mathcal{C}$-measurable and $W=\bigl(W_y(1)',W_\lambda(1)'\bigr)'$ is a vector of standard
$d$-dimensional Gaussian random variables independent of $\mathcal{C}$. The variables $\Omega_y$, $\Omega_\lambda(\cdot)$, $W_y(\cdot)$ and
$W_\lambda(\cdot)$ are as defined in Theorem 1.
Proof. In Appendix C.
The result of Corollary 1 is equivalent to the statement that $X_n(1)\overset{d}{\to}N(0,\Omega)$ conditional
on positive probability events in $\mathcal{C}$. As noted earlier, no simplification of the technical arguments
is possible by conditioning on $\mathcal{C}$ except in the trivial case where $\Omega$ is a fixed constant. Eagleson
(1975, Corollary 3), see also Hall and Heyde (1980, p. 59), establishes a simpler result where
$X_n(1)\overset{d}{\to}B$ weakly but not $\mathcal{C}$-stably. Such results could in principle be obtained here as
well, but they would not be useful for the analysis in Sections 4.2 and 4.3 because the limiting
distributions of our estimators depend not only on $B$ but also on other $\mathcal{C}$-measurable scaling
matrices. Since the continuous mapping theorem requires joint convergence, a weak limit for $B$
alone is not sufficient to establish the results we obtain below.
Theorem 1 establishes what Phillips and Moon (1999) call diagonal convergence, a special
form of joint convergence. To see that sequential convergence, where first $n$ or $\tau$ goes to infinity
followed by the other index, is generally not useful in our setup, consider the following example.
Assume that $d=k$ is the dimension of the vector $\tilde\psi_{it}$. This would hold for just identified moment
estimators and likelihood based procedures. Consider the double indexed process
$$X_n(1)=\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde\psi_{it}(1).\qquad(23)$$
For each $\tau$ fixed, convergence in distribution of $X_n$ as $n\to\infty$ follows from the central limit
theorem in Kuersteiner and Prucha (2013). Let $X_\tau$ denote the "large $n$, fixed $\tau$" limit. For each $n$
fixed, convergence in distribution of $X_n$ as $\tau\to\infty$ follows from a standard martingale central limit
theorem for Markov processes. Let $X_n$ be the "large $\tau$, fixed $n$" limit. It is worth pointing out that
the distributions of both $X_n$ and $X_\tau$ are unknown because the limits are trivial in one direction.
For example, when $\tau$ is fixed and $n$ tends to infinity, the component $\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\psi_{\tau,t}$ trivially
converges in distribution (it does not change with $n$), but the distribution of $\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\psi_{\tau,t}$
is generally unknown. More importantly, application of a conventional CLT for the cross-section
alone will fail to account for the dependence between the time series and cross-sectional components. Sequential convergence arguments thus are not recommended even as heuristic justifications
of limiting distributions in our setting.
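This dependence is easy to see numerically. The sketch below uses a deliberately simple toy design of our own (not a model from the paper): a single AR(1) common shock enters every cross-sectional observation and is also observed as a time series, and the two scaled averages remain clearly correlated across Monte Carlo replications, so their joint limit cannot be obtained by treating the two samples separately.

```python
import numpy as np

# Toy illustration: a common AR(1) shock drives both the time series sample and
# every cross-sectional observation, so the two scaled averages are dependent.
rng = np.random.default_rng(0)
n, tau, rho, reps = 500, 200, 0.9, 2000
cs_avg, ts_avg = np.empty(reps), np.empty(reps)
for b in range(reps):
    eps = rng.normal(size=tau)
    lam = np.empty(tau)
    lam[0] = eps[0]
    for t in range(1, tau):                      # common shock: AR(1) in time
        lam[t] = rho * lam[t - 1] + eps[t]
    y = lam[-1] + rng.normal(size=n)             # cross-section drawn in the last period
    cs_avg[b] = np.sqrt(n) * y.mean()            # scaled cross-sectional average
    ts_avg[b] = np.sqrt(tau) * lam.mean()        # scaled time-series average
print("correlation of the two scaled averages:", np.corrcoef(cs_avg, ts_avg)[0, 1])
```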
4.2 Trend Stationary Models

Let $\varphi=(\theta,\lambda_1,\ldots,\lambda_T)$ and define the shorthand notation $f_{it}(\varphi,\rho)=f(y_{it}\mid\varphi,\rho)$, $g_t(\varphi,\rho)=g(\lambda_t\mid\lambda_{t-1},\varphi,\rho)$,
$f_{\varphi,it}(\varphi,\rho)=\partial f_{it}(\varphi,\rho)/\partial\varphi$ and $g_{\rho,t}(\varphi,\rho)=\partial g_t(\varphi,\rho)/\partial\rho$. Also let $f_{it}=f_{it}(\varphi_0,\rho_0)$,
$f_{\varphi,it}=f_{\varphi,it}(\varphi_0,\rho_0)$, $g_t=g_t(\varphi_0,\rho_0)$ and $g_{\rho,t}=g_{\rho,t}(\varphi_0,\rho_0)$. Depending on whether the estimator under
consideration is maximum likelihood or moment based, we assume that either $(f_{\varphi,it},g_{\rho,t})$ or $(f_{it},g_t)$
satisfy the same assumptions as $(\psi^y_{it},\psi_{\tau,t})$ in Condition 1. We recall that $\lambda_t=\lambda_t(\varphi,\rho)$ is a function
of $(z_t,\varphi,\rho)$, where $z_t$ are observable macro variables. For the CLT, the process $\lambda_t=\lambda_t(\varphi_0,\rho_0)$
is evaluated at the true parameter values and treated as observed. In applications, $\lambda_t$ will be
replaced by an estimate, which potentially affects the limiting distribution of $\hat\rho$. This dependence
is analyzed in a step separate from the CLT.
The next step is to use Corollary 1 to derive the joint limiting distribution of estimators for
$\phi=(\varphi',\rho')'$. Define $s^\lambda_{ML}(\varphi,\rho)=\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\partial g\bigl(\lambda_t(\varphi,\rho)\mid\lambda_{t-1}(\varphi,\rho),\varphi,\rho\bigr)/\partial\rho$ and
$s^y_{ML}(\varphi,\rho)=n^{-1/2}\sum_{t=1}^{T}\sum_{i=1}^{n}\partial f(y_{it}\mid\varphi,\rho)/\partial\varphi$ for maximum likelihood, and
$$s^\lambda_{M}(\varphi,\rho)=\bigl(\partial k_\tau(\varphi,\rho)/\partial\rho\bigr)'W_\tau\;\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}g\bigl(\lambda_t(\varphi,\rho)\mid\lambda_{t-1}(\varphi,\rho),\varphi,\rho\bigr)$$
and $s^y_{M}(\varphi,\rho)=\bigl(\partial h_n(\varphi,\rho)/\partial\varphi\bigr)'W^C_n\;n^{-1/2}\sum_{t=1}^{T}\sum_{i=1}^{n}f(y_{it}\mid\varphi,\rho)$ for moment based estimators.
We use $s^\lambda(\varphi,\rho)$ and $s^y(\varphi,\rho)$ generically for arguments that apply to both maximum likelihood
and moment based estimators. The estimator $\hat\phi$ jointly satisfies the moment restrictions using
time series data
$$s^\lambda\bigl(\hat\varphi,\hat\rho\bigr)=0\qquad(24)$$
and cross-sectional data
$$s^y\bigl(\hat\varphi,\hat\rho\bigr)=0.\qquad(25)$$
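As an illustration of how (24) and (25) are solved jointly in practice, the following sketch uses a deliberately simple parametric toy design of our own (not a specification from the paper): the cross-sectional moment identifies a slope on the observed common factor and the time-series moment identifies the AR(1) coefficient of the factor, and the two stacked blocks of equations happen to be solvable in closed form.

```python
import numpy as np

# Toy joint estimator: a cross-sectional score for theta and a time-series score for rho,
# stacked as in (24)-(25). Both blocks are linear here, so the solution is closed form.
rng = np.random.default_rng(1)
n, T, tau, theta0, rho0 = 400, 3, 300, 1.5, 0.8

lam = np.empty(tau)
lam[0] = rng.normal()
for t in range(1, tau):
    lam[t] = rho0 * lam[t - 1] + rng.normal()     # observed macro factor
lam_cs = lam[-T:]                                  # the T periods covered by the panel
y = theta0 * lam_cs[None, :] + rng.normal(size=(n, T))

# Cross-sectional moment: sum over (i, t) of lam_t * (y_it - theta * lam_t) = 0
theta_hat = (lam_cs * y).sum() / (n * (lam_cs ** 2).sum())
# Time-series moment: sum over t of lam_{t-1} * (lam_t - rho * lam_{t-1}) = 0
rho_hat = (lam[:-1] * lam[1:]).sum() / (lam[:-1] ** 2).sum()
print("theta_hat:", theta_hat, "rho_hat:", rho_hat)
```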
Defining $s(\phi)=\bigl(s^y(\phi)',s^\lambda(\phi)'\bigr)'$, the estimator $\hat\phi$ satisfies $s(\hat\phi)=0$. A first order Taylor series
expansion around $\phi_0$ is used to obtain the limiting distribution for $\hat\phi$. We impose the following
additional assumption.
Condition 4 Let $\phi=(\varphi',\rho')'\in\mathbb{R}^k$, $\varphi\in\mathbb{R}^{k_\varphi}$, and $\rho\in\mathbb{R}^{k_\rho}$. Define $D_n=\mathrm{diag}\bigl(n^{-1/2}I_y,\;\tau^{-1/2}I_\rho\bigr)$, where
$I_y$ is an identity matrix of dimension $k_\varphi$ and $I_\rho$ is an identity matrix of dimension $k_\rho$. Let
$W^C=\mathrm{plim}_n\,W^C_n$ and $W=\mathrm{plim}_\tau\,W_\tau$, and assume the limits to be positive definite and $\mathcal{C}$-measurable.
Define $h(\varphi,\rho)=\mathrm{plim}_n\,h_n(\varphi,\lambda_t,\rho)$ and $k(\varphi,\rho)=\mathrm{plim}_\tau\,k_\tau(\varphi,\rho)$. Assume that for
some $\varepsilon>0$,
i) $\sup_{\phi:\|\phi-\phi_0\|\le\varepsilon}\bigl\|\bigl(\partial k_\tau(\varphi,\rho)/\partial\rho\bigr)'W_\tau-\bigl(\partial k(\varphi,\rho)/\partial\rho\bigr)'W\bigr\|=o_p(1)$;
ii) $\sup_{\phi:\|\phi-\phi_0\|\le\varepsilon}\bigl\|\bigl(\partial h_n(\varphi,\rho)/\partial\varphi\bigr)'W^C_n-\bigl(\partial h(\varphi,\rho)/\partial\varphi\bigr)'W^C\bigr\|=o_p(1)$;
iii) $\sup_{\phi:\|\phi-\phi_0\|\le\varepsilon}\bigl\|\partial s(\phi)/\partial\phi'\,D_n-A(\phi)\bigr\|=o_p(1)$, where $A(\phi)$ is $\mathcal{C}$-measurable and $A=A(\phi_0)$ is full
rank almost surely. Let $\kappa=\lim n/\tau$,
$$A=\begin{bmatrix}A_{y,\varphi}&\sqrt{\kappa}\,A_{y,\rho}\\[2pt]\tfrac{1}{\sqrt\kappa}A_{\lambda,\varphi}&A_{\lambda,\rho}\end{bmatrix}$$
with $A_{y,\varphi}=\mathrm{plim}\,n^{-1/2}\,\partial s^y(\phi_0)/\partial\varphi'$, $A_{y,\rho}=\mathrm{plim}\,n^{-1/2}\,\partial s^y(\phi_0)/\partial\rho'$, $A_{\lambda,\varphi}=\mathrm{plim}\,\tau^{-1/2}\,\partial s^\lambda(\phi_0)/\partial\varphi'$ and
$A_{\lambda,\rho}=\mathrm{plim}\,\tau^{-1/2}\,\partial s^\lambda(\phi_0)/\partial\rho'$.
Condition 5 For maximum likelihood criteria the following holds:
i) for any $s,r\in[0,1]$ with $r>s$, $\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}g_{\rho,t}g_{\rho,t}'\overset{p}{\to}\Omega_\lambda(r)-\Omega_\lambda(s)$ as $\tau\to\infty$, where
$\Omega_\lambda(r)$ satisfies the same regularity conditions as in Condition 2(ii).
ii) $\frac{1}{n}\sum_{i=1}^{n}f_{\varphi,it}f_{\varphi,it}'\overset{p}{\to}\Omega^{ty}$ for all $t\in\{1,\ldots,T\}$, where $\Omega^{ty}$ is positive definite a.s. and measurable
with respect to $\sigma(\lambda_1,\ldots,\lambda_T)$. Let $\Omega_y=\sum_{t=1}^{T}\Omega^{ty}$.
Condition 6 For moment based criteria the following holds:
i) for any $s,r\in[0,1]$ with $r>s$, $\frac{1}{\tau}\sum_{t,q=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}g_t g_q'\overset{p}{\to}\Omega_g(r)-\Omega_g(s)$ as $\tau\to\infty$, where
$\Omega_g(r)$ satisfies the same regularity conditions as in Condition 2(ii).
ii) $\frac{1}{n}\sum_{i=1}^{n}f_{it}f_{ir}'\overset{p}{\to}\Omega^{t,r}_f$ for all $t,r\in\{1,\ldots,T\}$. Let $\Omega_f=\sum_{t,r=1}^{T}\Omega^{t,r}_f$. Assume that $\Omega_f$ is positive
definite a.s. and measurable with respect to $\sigma(\lambda_1,\ldots,\lambda_T)$.

Condition 6 accounts for the possibility of misspecification of the model. In that case, the
martingale difference property of the moment conditions may not hold, necessitating the use of
robust standard errors through long run variances.
The following result establishes the joint limiting distribution of $\hat\phi$.

Theorem 2 Assume that Conditions 1, 4, and either 5 with $(\psi^y_{it},\psi_{\tau,t})=(f_{\varphi,it},g_{\rho,t})$ in the case of
likelihood based estimators, or 6 with $(\psi^y_{it},\psi_{\tau,t})=(f_{it},g_t)$ in the case of moment based estimators,
hold. Assume that $\hat\phi-\phi_0=o_p(1)$ and that (24) and (25) hold. Then,
$$D_n^{-1}\bigl(\hat\phi-\phi_0\bigr)\overset{d}{\to}-A^{-1}\Omega^{1/2}W\quad(\mathcal{C}\text{-stably})$$
where $A$ is full rank almost surely, $\mathcal{C}$-measurable and defined in Condition 4. The distribution
of $\Omega^{1/2}W$ is given in Corollary 1. In particular, $\Omega=\mathrm{diag}\bigl(\Omega_y,\Omega_\lambda(1)\bigr)$. When the criterion is
maximum likelihood, $\Omega_y$ and $\Omega_\lambda(1)$ are given in Condition 5. When the criterion is moment based,
$$\Omega_y=\frac{\partial h(\varphi_0,\rho_0)'}{\partial\varphi}\,W^C\,\Omega_f\,W^{C\prime}\,\frac{\partial h(\varphi_0,\rho_0)}{\partial\varphi'}\quad\text{and}\quad
\Omega_\lambda(1)=\frac{\partial k(\varphi_0,\rho_0)'}{\partial\rho}\,W\,\Omega_g(1)\,W'\,\frac{\partial k(\varphi_0,\rho_0)}{\partial\rho'},$$
with $\Omega_f$ and $\Omega_g(1)$ defined in Condition 6.
Proof. In Appendix C.
Corollary 2 Under the same conditions as in Theorem 2 it follows that
$$\sqrt{n}\bigl(\hat\varphi-\varphi_0\bigr)\overset{d}{\to}-A^{y,\varphi}\,\Omega_y^{1/2}W_y(1)-\sqrt{\kappa}\,A^{y,\rho}\,\Omega_\lambda(1)^{1/2}W_\lambda(1)\quad(\mathcal{C}\text{-stably}),\qquad(26)$$
where
$$A^{y,\varphi}=A_{y,\varphi}^{-1}+A_{y,\varphi}^{-1}A_{y,\rho}\bigl(A_{\lambda,\rho}-A_{\lambda,\varphi}A_{y,\varphi}^{-1}A_{y,\rho}\bigr)^{-1}A_{\lambda,\varphi}A_{y,\varphi}^{-1},\qquad
A^{y,\rho}=-A_{y,\varphi}^{-1}A_{y,\rho}\bigl(A_{\lambda,\rho}-A_{\lambda,\varphi}A_{y,\varphi}^{-1}A_{y,\rho}\bigr)^{-1}.$$
For
$$\Sigma=A^{y,\varphi}\,\Omega_y\,A^{y,\varphi\prime}+\kappa\,A^{y,\rho}\,\Omega_\lambda(1)\,A^{y,\rho\prime}$$
it follows that
$$\sqrt{n}\,\Sigma^{-1/2}\bigl(\hat\varphi-\varphi_0\bigr)\overset{d}{\to}N(0,I)\quad(\mathcal{C}\text{-stably}).\qquad(27)$$

Note that $\Sigma$, the asymptotic variance of $\sqrt{n}\bigl(\hat\varphi-\varphi_0\bigr)$ conditional on $\mathcal{C}$, in general is a random
variable, and the asymptotic distribution of $\hat\varphi$ is mixed normal. However, as in Andrews (2005), the
result in (27) can be used to construct an asymptotically pivotal test statistic. For a consistent
estimator $\hat\Sigma$ the statistic $\sqrt{n}\,\bigl(R\hat\Sigma R'\bigr)^{-1/2}\bigl(R\hat\varphi-r\bigr)$ is asymptotically distribution free under the null
hypothesis $R\varphi_0-r=0$, where $R$ is a conforming matrix of dimension $q\times k$ and $r$ a $q\times 1$
vector.
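A standard implication of (27), assuming in addition that $\hat\Sigma\to_p\Sigma$ ($\mathcal{C}$-stably), is that the usual Wald statistic remains asymptotically pivotal despite the mixed normal limit:
$$n\bigl(R\hat\varphi-r\bigr)'\bigl(R\hat\Sigma R'\bigr)^{-1}\bigl(R\hat\varphi-r\bigr)\overset{d}{\to}\chi^{2}_{q}\quad\text{under }H_0:\;R\varphi_0=r,$$
so conventional chi-square critical values can be used even though $\Sigma$ is random.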
4.3 Unit Root Time Series Models

In this section we consider the special case where $\lambda_t$ follows an autoregressive process of the form
$\lambda_{t+1}=\rho\lambda_t+\varepsilon_t$ as discussed in Section 2.3. As in Hansen (1992) and Phillips (1987, 1988, 2014) we
allow for nearly integrated processes where $\rho_\tau=\exp(c/\tau)$ is a scalar parameter localized to unity
such that (28) is
$$\lambda_{\tau,t+1}=\exp(c/\tau)\,\lambda_{\tau,t}+\varepsilon_{t+1}\qquad(28)$$
and the notation $\lambda_{\tau,t}$ emphasizes that $\lambda_{\tau,t}$ is a sequence of processes indexed by $\tau$. We assume that
$\lambda_{\tau,\min(1,\tau_0)}=\tau^{1/2}\lambda_0$, where $\lambda_0=V(0)$ a.s.
is a potentially nondegenerate random variable. In other words, the initial condition for (28) is
$\lambda_{\tau,\min(1,\tau_0)}=\tau^{1/2}\lambda_0$. We explicitly allow for the case where $\lambda_0=0$, to model a situation
where the initial condition can be ignored. This assumption is similar to, although more parametric
than, the specification considered in Kurtz and Protter (1991). We limit our analysis to the case
of maximum likelihood criterion functions. Results for moment based estimators can be developed
along the same lines as in Section 4.2, but for ease of exposition we omit the details. For the unit
root version of our model we assume that $\lambda_t$ is observed in the data and that the only parameter to
be estimated from the time series data is $\rho$. Further assuming a Gaussian quasi-likelihood function,
we note that the score function now is
$$g_{\rho,t}(\varphi,\rho)=\lambda_{\tau,t-1}\bigl(\lambda_{\tau,t}-\rho\,\lambda_{\tau,t-1}\bigr).\qquad(29)$$
The estimator $\hat\rho$ solving sample moment conditions based on (29) is the conventional OLS estimator
given by
$$\hat\rho=\frac{\sum_{t=\tau_0+1}^{\tau_0+\tau}\lambda_{\tau,t-1}\lambda_{\tau,t}}{\sum_{t=\tau_0+1}^{\tau_0+\tau}\lambda_{\tau,t-1}^{2}}.$$
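The non-Gaussian behavior of $\hat\rho$ near unity is easy to see by simulation. The following sketch (illustrative settings only, zero initial condition) computes the OLS estimator above for a nearly integrated process with $\rho=\exp(c/\tau)$ and reports the skewness of $\tau(\hat\rho-\rho)$, which would be approximately zero if the limit were Gaussian.

```python
import numpy as np

# Illustrative simulation of the OLS estimator of rho for a nearly integrated AR(1).
rng = np.random.default_rng(2)
tau, c, reps = 400, -2.0, 5000
rho = np.exp(c / tau)
stats = np.empty(reps)
for b in range(reps):
    eps = rng.normal(size=tau)
    lam = np.zeros(tau + 1)                     # zero initial condition (lambda_0 = 0)
    for t in range(1, tau + 1):
        lam[t] = rho * lam[t - 1] + eps[t - 1]
    num = (lam[:-1] * lam[1:]).sum()
    den = (lam[:-1] ** 2).sum()
    stats[b] = tau * (num / den - rho)          # centered and scaled OLS estimator
skew = ((stats - stats.mean()) ** 3).mean() / stats.std() ** 3
print("skewness of tau*(rho_hat - rho):", skew)  # markedly negative, i.e. non-Gaussian
```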
We continue to use the definition for $f_{\theta,it}(\varphi,\rho)$ in Section 4.2 but now consider the simplified case
where $\varphi_0=(\theta_0,\lambda_0)$. We note that in this section the common shock entering the cross-sectional model is
$\lambda_0=\tau^{-1/2}\lambda_{\tau,\min(1,\tau_0)}$ rather than $\lambda_t$ as used in Section 4.2. The implicit scaling of $\lambda_{\tau,\min(1,\tau_0)}$ by $\tau^{-1/2}$ is necessary in the
cross-sectional specification to maintain a well defined model even as $\tau\to\infty$.
Consider the joint process $\bigl(V^n_\tau(r),Y^n\bigr)$ where $V^n_\tau(r)=\tau^{-1/2}\lambda_{\tau,\tau_0+[\tau r]}$ and
$$Y^n=\frac{1}{\sqrt n}\sum_{t=1}^{T}\sum_{i=1}^{n}f_{\theta,it}.$$
Note that
$$\int_0^r V^n_\tau\,dW^n_\lambda=\frac{1}{\tau}\sum_{t=\tau_0+1}^{\tau_0+[\tau r]}\lambda_{\tau,t-1}\,\varepsilon_t$$
with $W^n_\lambda(r)=\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+[\tau r]}\varepsilon_t$. We define the limiting process for $V^n_\tau(r)$ as
$$V_{c,V(0)}(r)=e^{rc}\,V(0)+\int_0^r e^{(r-s)c}\,dW_\lambda(s)\qquad(30)$$
where $W_\lambda$ is defined in Theorem 1. When $\lambda_0=0$, Theorem 1 directly implies that $e^{-[\tau r]c/\tau}V^n_\tau(r)\Rightarrow\int_0^r e^{-sc}\,dW_\lambda(s)$
$\mathcal{C}$-stably, noting that in this case $\Omega_\lambda(s)=\sigma^{2}\bigl(1-\exp(-2sc)\bigr)/(2c)$ and $\dot\Omega_\lambda(s)^{1/2}=\sigma e^{-sc}$.
The familiar result (cf. Phillips 1987) that $V^n_\tau(r)\Rightarrow\int_0^r e^{(r-s)c}\,dW_\lambda(s)$ then is a consequence of the continuous mapping theorem. The case in (30) where $\lambda_0$ is a $\mathcal{C}$-measurable random
variable now follows from $\mathcal{C}$-stable convergence of $V^n_\tau(r)$. In this section we establish joint $\mathcal{C}$-stable
convergence of the triple $\bigl(V^n_\tau(r),\,Y^n,\,\int_0^r V^n_\tau\,dW^n_\lambda\bigr)$.
Let $\phi=(\theta',c)'\in\mathbb{R}^k$ with $\theta\in\mathbb{R}^{k_\theta}$ and $c\in\mathbb{R}$, and $\rho=\exp(c/\tau)$. The true parameters are denoted by
$\theta_0$ and $c_0$, and both $c$ and $c_0$ are bounded. We impose the following modified
assumptions to account for the specific features of the unit root model.

Condition 7 Define $\mathcal{C}=\sigma(\lambda_0)$. Define the $\sigma$-fields $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$ in the same way as in (18),
except that here $\tau=n$, so that dependence on $\tau$ is suppressed, and that $z_t$ is replaced with $\lambda_{\tau,t}$ in
$$\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}=\sigma\Bigl(\{y_{j,t-1},y_{j,t-2},\ldots,y_{j,\min(1,\tau_0)}\}_{j=1}^{n},\;\lambda_t,\lambda_{t-1},\ldots,\lambda_{\min(1,\tau_0)},\;\{y_{j,t}\}_{j=1}^{i}\Bigr)\vee\mathcal{C}.$$
Assume that
i) $f_{\theta,it}$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$;
ii) $\lambda_{\tau,t}$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$ for all $i=1,\ldots,n$;
iii) for some $\delta>0$ and $C<\infty$, $\sup_{it}E\bigl[\|f_{\theta,it}\|^{2+\delta}\bigr]\le C$;
iv) for some $\delta>0$ and $C<\infty$, $\sup_{t}E\bigl[\|\varepsilon_t\|^{2+\delta}\bigr]\le C$;
v) $E\bigl[f_{\theta,it}\mid\mathcal{G}_{n,(t-\min(1,\tau_0))n+i-1}\bigr]=0$;
vi) $E\bigl[\varepsilon_t\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+i}\bigr]=0$ for $t>T$ and all $i\in\{1,\ldots,n\}$;
vii) For any $1>r>s\ge 0$ fixed, let $\sigma^2_{r,s,\tau}=\frac{1}{\tau}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}E\bigl[\varepsilon_t^{2}\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+n}\bigr]$.
Then $\sigma^2_{r,s,\tau}\overset{p}{\to}(r-s)\sigma^{2}$;
viii) Assume that $\frac{1}{n}\sum_{i=1}^{n}f_{\theta,it}f_{\theta,it}'\overset{p}{\to}\Omega^{ty}$, where $\Omega^{ty}$ is positive definite a.s. and measurable with
respect to $\mathcal{C}$. Let $\Omega_y=\sum_{t=1}^{T}\Omega^{ty}$.
Conditions 7(i)-(vi) are the same as Conditions 1(i)-(vi) adapted to the unit root model. Condition 7(vii) replaces Condition 2. It is slightly more primitive in the sense that if $\varepsilon_t^2$ is homoskedastic, Condition 7(vii) holds automatically and convergence of $\frac{1}{\tau}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}\varepsilon_t^{2}\overset{p}{\to}(r-s)\sigma^{2}$
follows from an argument given in the proofs rather than being assumed. On the other hand,
Condition 7(vii) is somewhat more restrictive than Condition 2 in the sense that it limits heteroskedasticity to be of a form that does not affect the limiting distribution. In other words,
we essentially assume $\frac{1}{\tau}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}\varepsilon_t^{2}$ to be proportional to $r-s$ asymptotically. This
assumption is stronger than needed but helps to compare the results with the existing unit root
literature.
For Condition 7(viii) we note that typically $\Omega^{ty}=\Omega^{ty}(\theta_0,\lambda_0,\rho_0)$. Thus, even if the function
$\Omega^{ty}(\cdot)=E\bigl[f_{\theta,it}f_{\theta,it}'\bigr](\cdot)$ is non-stochastic, it follows that $\Omega^{ty}=\Omega^{ty}(\phi_0)$ is random and measurable with respect to $\mathcal{C}$ because it depends on $\lambda_0$, which is a random variable measurable w.r.t.
$\mathcal{C}$.
The following results are established by modifying arguments in Phillips (1987) and Chan and
Wei (1987) to account for $\mathcal{C}$-stable convergence and by applying Theorem 1.

Theorem 3 Assume that Condition 7 holds. As $\tau,n\to\infty$ and $T$ fixed with $\tau=n^{a}$ for some
$a\in(0,\infty)$, it follows that
$$\Bigl(V^n_\tau(r),\;Y^n,\;\int_0^s V^n_\tau\,dW^n_\lambda\Bigr)\Rightarrow\Bigl(V_{c,V(0)}(r),\;\Omega_y^{1/2}W_y(1),\;\int_0^s V_{c,V(0)}\,dW_\lambda\Bigr)\quad(\mathcal{C}\text{-stably})$$
in the Skorohod topology on $D_{\mathbb{R}^d}[0,1]$.
Proof. In Appendix C.
We now employ Theorem 3 to analyze the limiting behavior of $\hat\theta$ when the common factors are
generated from a linear unit root process. To derive a limiting distribution for $\hat\theta$ we impose the
following additional assumptions.

Condition 8 Let $\hat\theta=\arg\max_\theta\sum_{t=1}^{T}\sum_{i=1}^{n}f\bigl(y_{it}\mid\theta,\hat\rho\bigr)$. Assume that $\hat\theta-\theta_0=O_p\bigl(n^{-1/2}\bigr)$.

Condition 9 Let $\tilde s^y_{it}(\phi)=f_{\theta,it}(\phi)/\sqrt n$ and $\tilde s^\lambda_{it}(\phi)=1\{i=1\}\,g_{\rho,t}(\phi)/\tau$. Assume that $\tilde s^y_{it}(\phi):\mathbb{R}^k\to\mathbb{R}^{k_\theta}$,
$\tilde s^\lambda_{it}(\phi):\mathbb{R}^k\to\mathbb{R}$, and define $D_n=\mathrm{diag}\bigl(n^{-1/2}I_y,\,\tau^{-1}\bigr)$, where $I_y$ is an identity matrix of
dimension $k_\theta$. Let $\kappa=\lim n/\tau^{2}$. Let $A_{y,\theta}(\phi)=\sum_{t=1}^{T}E\bigl[\partial f_{\theta,it}(\phi)/\partial\theta'\bigr]$,
$A_{y,\rho}(\phi)=\sum_{t=1}^{T}E\bigl[\partial f_{\theta,it}(\phi)/\partial\rho\bigr]$,
and define $A_y(\phi)=\bigl[A_{y,\theta}(\phi)\;\;\sqrt{\kappa}\,A_{y,\rho}(\phi)\bigr]$, where $A_y(\phi)$ is a $k_\theta\times k$ dimensional matrix of non-random functions of $\phi$. Assume that $A_{y,\theta}(\phi_0)$ is full rank almost surely. Assume that for some
$\varepsilon>0$,
$$\sup_{\phi:\|\phi-\phi_0\|\le\varepsilon}\Bigl\|\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\frac{\partial\tilde s^y_{it}(\phi)}{\partial\phi'}\,D_n-A_y(\phi)\Bigr\|=o_p(1).$$

We make the possibly simplifying assumption that $A_y(\cdot)$ depends on the common factors only through
the parameter $\lambda_0$.
Theorem 4 Assume that Conditions 7, 8 and 9 hold. It follows that
$$\sqrt{n}\bigl(\hat\theta-\theta_0\bigr)\overset{d}{\to}-A_{y,\theta}^{-1}\,\Omega_y^{1/2}W_y(1)-\sqrt{\kappa}\,A_{y,\theta}^{-1}A_{y,\rho}\Bigl(\int_0^1 V_{c,V(0)}^{2}\,dr\Bigr)^{-1}\int_0^1 V_{c,V(0)}\,dW_\lambda\quad(\mathcal{C}\text{-stably}).$$
Proof. In Appendix C.

The result in Theorem 4 is an example that shows how common factors affecting both time
series and cross-section data can lead to non-standard limiting distributions. In this case, the
initial condition of the unit root process in the time series dimension causes dependence between
the components of the asymptotic distribution of $\hat\theta$ because both $\Omega_y$ and $V_{c,V(0)}$ in general depend
on $\lambda_0$. Thus, the situation encountered here is generally more difficult than the one considered
in Stock and Yogo (2006) and Phillips (2014). In addition, because the limiting distribution of $\hat\theta$
is not mixed asymptotically normal, simple pivotal test statistics as in Andrews (2005) are not
readily available, contrary to the stationary case.
5 Summary

We argue that cross-sectional inference generally fails in the face of aggregate uncertainty affecting
agents' decisions. We propose a general econometric framework that rests on parametrizing the
effects of aggregate shocks and on using a combination of time series and cross-sectional data to
identify parameters of interest. We develop new asymptotics for combined data sets based on the
notion of stable convergence and establish a Central Limit Theorem. Our results are expected
to be helpful for the econometric analysis of rational expectations models involving individual
decision making as well as general equilibrium settings.
References
[1] Andrews, D.W.K. (2005): “Cross-Section Regression with Common Shocks,” Econometrica
73, pp. 1551-1585.
[2] Aldous, D.J. and G.K. Eagleson (1978): “On mixing and stability of limit theorems,” The
Annals of Probability 6, 325–331.
[3] Arellano, M., R. Blundell, and S. Bonhomme (2014): “Household Earnings and Consumption:
A Nonlinear Framework,”unpublished working paper.
[4] Bajari, Patrick, Benkard, C. Lanier, and Levin, Jonathan (2007): “Estimating Dynamic
Models of Imperfect Competition,”Econometrica 75, 5, pp. 1331-1370.
[5] Billingsley, P. (1968): “Convergence of Probability Measures,” John Wiley and Sons, New
York.
[6] Brown, B.M. (1971): “Martingale Central Limit Theorems,”Annals of Mathematical Statistics 42, pp. 59-66.
[7] Campbell, J.Y., and M. Yogo (2006): "Efficient Tests of Stock Return Predictability," Journal
of Financial Economics 81, pp. 27–60.
[8] Chamberlain, G. (1984): “Panel Data,” in Handbook of Econometrics, eds. by Z. Griliches
and M. Intriligator. Amsterdam: North Holland, pp. 1247-1318.
[9] Chan, N.H. and C.Z. Wei (1987): “Asymptotic Inference for Nearly Nonstationary AR(1)
Processes,”Annals of Statistics 15, pp.1050-1063.
[10] Dedecker, J. and F. Merlevede (2002): "Necessary and Sufficient Conditions for the Conditional Central Limit Theorem," Annals of Probability 30, pp. 1044-1081.
[11] Duffie, D., J. Pan and K. Singleton (2000): "Transform Analysis and Asset Pricing for Affine
Jump Diffusions," Econometrica, pp. 1343-1376.
[12] Durrett, R. (1996): “Stochastic Calculus,”CRC Press, Boca Raton, London, New York.
[13] Eagleson, G.K. (1975): "Martingale convergence to mixtures of infinitely divisible laws," The
Annals of Probability 3, 557–562.
[14] Feigin, P. D. (1985): “Stable convergence of Semimartingales,”Stochastic Processes and their
Applications 19, pp. 125-134.
[15] Gagliardini, P., and C. Gourieroux (2011): "Efficiency in Large Dynamic Panel Models with
Common Factor," unpublished working paper.
[16] Gemici, A., and M. Wiswall (2014): "Evolution of Gender Differences in Post-Secondary
Human Capital Investments: College Majors," International Economic Review 55, 23–56.
[17] Gillingham, K., F. Iskhakov, A. Munk-Nielsen, J. Rust, and B. Schjerning (2015): “A Dynamic Model of Vehicle Ownership, Type Choice, and Usage,”unpublished working paper.
[18] Hahn, J., and G. Kuersteiner (2002): "Asymptotically Unbiased Inference for a Dynamic
Panel Model with Fixed Effects When Both n and T are Large," Econometrica 70, pp. 1639–57.
[19] Hahn, J., and W.K. Newey (2004): “Jackknife and Analytical Bias Reduction for Nonlinear
Panel Models,”Econometrica 72, pp. 1295–1319.
[20] Hall, P., and C. Heyde (1980): “Martingale Limit Theory and its Applications,” Academic
Press, New York.
[21] Hansen, B.E. (1992): “Convergence to Stochastic Integrals for Dependent Heterogeneous
Processes,”Econometric Theory 8, pp. 489-500.
[22] Heckman, J.J., L. Lochner, and C. Taber (1998), “Explaining Rising Wage Inequality: Explorations with a Dynamic General Equilibrium Model of Labor Earnings with Heterogeneous
Agents,”Review of Economic Dynamics 1, pp. 1-58.
[23] Heckman, J.J., and G. Sedlacek (1985): “Heterogeneity, Aggregation, and Market Wage
Functions: An Empirical Model of Self-Selection in the Labor Market,” Journal of Political
Economy, 93, pp. 1077-1125.
[24] Jacod, J. and A.N. Shiryaev (2002): “Limit Theorems for stochastic processes,” Springer
Verlag, Berlin.
[25] Kuersteiner, G.M., and I.R. Prucha (2013): “Limit Theory for Panel Data Models with Cross
Sectional Dependence and Sequential Exogeneity,”Journal of Econometrics 174, pp. 107-126.
[26] Kuersteiner, G.M and I.R. Prucha (2015): “Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity,”CESifo Working Paper No. 5445.
[27] Kurtz and Protter (1991): “Weak Limit Theorems for Stochastic Integrals and Stochastic
Di¤erential Equations,”Annals of Probability 19, pp. 1035-1070.
[28] Lee, D. (2005): “An Estimable Dynamic General Equilibrium Model of Work, Schooling and
Occupational Choice,”International Economic Review 46, pp. 1-34.
[29] Lee, D., and K.I. Wolpin (2006): “Intersectoral Labor Mobility and the Growth of Service
Sector,”Econometrica 47, pp. 1-46.
[30] Lee, D., and K.I. Wolpin (2010): “Accounting for Wage and Employment Changes in the U.S.
from 1968-2000: A Dynamic Model of Labor Market Equilibrium,” Journal of Econometrics
156, pp. 68–85.
[31] Magnac, T. and D. Thesmar (2002): “Identifying Dynamic Discrete Decision Processes,”
Econometrica 70, pp. 801–816.
[32] Magnus, J.R. and H. Neudecker (1988): "Matrix Differential Calculus with Applications in
Statistics and Econometrics," Wiley.
[33] Mikusheva, A. (2007): “Uniform Inference in Autoregressive Models,” Econometrica 75, pp.
1411–1452.
[34] Murphy, K. M. and R. H. Topel (1985): “Estimation and Inference in Two-Step Econometric
Models,”Journal of Business and Economic Statistics 3, pp. 370 –379.
[35] Peccati, G. and M.S. Taqqu (2007): “Stable convergence of generalized L2 stochastic integrals
and the principle of conditioning,”Electronic Journal of Probability 12, pp.447-480.
[36] Phillips, P.C.B. (1987): "Towards a unified asymptotic theory for autoregression," Biometrika
74, pp. 535-547.
[37] Phillips, P.C.B. (1988): “Regression theory for near integrated time series,” Econometrica
56, pp. 1021-1044.
[38] Phillips, P.C.B. and S.N. Durlauf (1986): “Multiple Time Series Regression with Integrated
Processes,”Review of Economic Studies 53, pp. 473-495.
[39] Phillips, P.C.B. and H.R. Moon (1999): “Linear Regression Limit Theory for Nonstationary
Panel Data,”Econometrica 67, pp. 1057-1111.
[40] Phillips, P.C.B. (2014): "On Confidence Intervals for Autoregressive Roots and Predictive
Regression," Econometrica 82, pp. 1177–1195.
[41] Renyi, A (1963): “On stable sequences of events,”Sankya Ser. A, 25, 293-302.
[42] Rootzen, H. (1983): “Central limit theory for martingales via random change of time,” Department of Statistics, University of North Carolina, Technical Report 28.
[43] Runkle, D.E. (1991): “Liquidity Constraints and the Permanent-Income Hypothesis: Evidence from Panel Data,”Journal of Monetary Economics 27, pp. 73–98.
[44] Shea, J. (1995):“Union Contracts and the Life-Cycle/Permanent-Income Hypothesis,”American Economic Review 85, pp. 186–200.
[45] Tobin, J. (1950): “A Statistical Demand Function for Food in the U.S.A.,” Journal of the
Royal Statistical Society, Series A. Vol. 113, pp.113-149.
[46] Wooldridge, J.M. and H. White (1988): “Some invariance principles and central limit theorems for dependent heterogeneous processes,”Econometric Theory 4, pp.210-230.
Appendix – Available Upon Request

A Discussion for Section 2.2

A.1 Proof of (5)
The maximization problem is equivalent to
$$\max_{\alpha}\;-e^{-\gamma\bigl(\alpha(1+r)\bigr)}\,E\Bigl[e^{-\gamma(1-\alpha)(1+u_{i,t})}\Bigr].$$
Since $-\gamma(1-\alpha)u_{i,t}\sim N\bigl(-\gamma(1-\alpha)\mu,\;\gamma^{2}(1-\alpha)^{2}\sigma^{2}\bigr)$, we have
$$E\Bigl[e^{-\gamma(1-\alpha)u_{i,t}}\Bigr]=e^{-\gamma(1-\alpha)\mu+\frac{\gamma^{2}(1-\alpha)^{2}\sigma^{2}}{2}},$$
and the maximization problem can be rewritten as follows:
$$\max_{\alpha}\;-e^{-\gamma\bigl(\alpha(1+r)+(1-\alpha)(1+\mu)\bigr)+\frac{\gamma^{2}(1-\alpha)^{2}\sigma^{2}}{2}}.$$
Taking the first order condition, we have
$$0=r-\mu+\gamma(1-\alpha)\sigma^{2},$$
from which we obtain the solution
$$\alpha=1-\frac{\mu-r}{\gamma\sigma^{2}}.$$

A.2 Euler Equation and Cross Section
Our model in Section 2.2 is a stylized version of many models considered in a large literature
interested in estimating the preference parameter $\gamma$ using cross-sectional variation. Estimators are often based
on moment conditions derived from first order conditions (FOC) related to optimal investment
and consumption decisions. We illustrate the problems facing such estimators.
Assume a researcher has a cross-section of observations for individual consumption and returns,
$c_{i,t}$ and $u_{i,t}$. The population FOC of our model takes the simple form $E\bigl[e^{-\gamma c_{i,t}}(r-u_{i,t})\bigr]=0$
(Footnote 16: we assume $\gamma\neq 0$ and rescale the equation by $1/\gamma$). A
just-identified moment based estimator for $\gamma$ solves the sample analog
$n^{-1}\sum_{i=1}^{n}e^{-\hat\gamma c_{i,t}}(r-u_{i,t})=0$. It turns out that the probability limit of $\hat\gamma$ is equal to
$(\lambda_t-r)/\bigl((1-\alpha)\sigma_\varepsilon^{2}\bigr)$, i.e., $\hat\gamma$ is inconsistent.
We now compare the population FOC a rational agent uses to form their optimal portfolio
with the empirical FOC an econometrician using cross-sectional data observes:
$$\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma c_{i,t}}(r-u_{i,t})=0.$$
Noting that $u_{i,t}=\lambda_t+\varepsilon_{i,t}$ and substituting into the budget constraint
$$c_{i,t}=1+\alpha r+(1-\alpha)u_{i,t}=1+\alpha r+(1-\alpha)\lambda_t+(1-\alpha)\varepsilon_{i,t}$$
we have
$$\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma c_{i,t}}(r-u_{i,t})
=\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma\bigl(1+\alpha r+(1-\alpha)\lambda_t\bigr)-\gamma(1-\alpha)\varepsilon_{i,t}}\bigl(r-\lambda_t-\varepsilon_{i,t}\bigr)\qquad(31)$$
$$=e^{-\gamma\bigl(1+\alpha r+(1-\alpha)\lambda_t\bigr)}\Bigl[(r-\lambda_t)\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma(1-\alpha)\varepsilon_{i,t}}-\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma(1-\alpha)\varepsilon_{i,t}}\varepsilon_{i,t}\Bigr].$$
Under suitable regularity conditions, including independence of $\varepsilon_{i,t}$ in the cross-section, it follows
that
$$\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma(1-\alpha)\varepsilon_{i,t}}=E\bigl[e^{-\gamma(1-\alpha)\varepsilon_{i,t}}\bigr]+o_p(1)=e^{\frac{\gamma^{2}(1-\alpha)^{2}\sigma_\varepsilon^{2}}{2}}+o_p(1)\qquad(32)$$
and
$$\frac{1}{n}\sum_{i=1}^{n}e^{-\gamma(1-\alpha)\varepsilon_{i,t}}\varepsilon_{i,t}=E\bigl[e^{-\gamma(1-\alpha)\varepsilon_{i,t}}\varepsilon_{i,t}\bigr]+o_p(1)=-\gamma(1-\alpha)\sigma_\varepsilon^{2}\,e^{\frac{\gamma^{2}(1-\alpha)^{2}\sigma_\varepsilon^{2}}{2}}+o_p(1).\qquad(33)$$
Taking limits as $n\to\infty$ in (31) and substituting (32) and (33) then shows that the method of
moments estimator based on the empirical FOC asymptotically solves
$$\Bigl[(r-\lambda_t)+\hat\gamma(1-\alpha)\sigma_\varepsilon^{2}\Bigr]e^{\frac{\hat\gamma^{2}(1-\alpha)^{2}\sigma_\varepsilon^{2}}{2}}=0.\qquad(34)$$
Solving for $\hat\gamma$ we obtain
$$\mathrm{plim}\,\hat\gamma=\frac{\lambda_t-r}{(1-\alpha)\sigma_\varepsilon^{2}}.$$
This estimate is inconsistent because the cross-sectional data set lacks cross-sectional ergodicity,
or in other words does not contain the same information about aggregate risk as is used by rational
agents. Therefore, the empirical version of the FOC is unable to properly account for the aggregate
risk and return characterizing the risky asset. The estimator based on the FOC takes the form of
an implicit solution to an empirical moment equation, which obscures the effects of cross-sectional
non-ergodicity. A more illuminating approach uses our modelling strategy in Section 2.1.
On the other hand, it is easily shown using properties of the Gaussian moment generating
function that the population FOC is proportional to
$$E\Bigl[e^{-\gamma(1-\alpha)u_{i,t}}(r-u_{i,t})\Bigr]
=\Bigl[r-\mu+\gamma(1-\alpha)\bigl(\sigma_\lambda^{2}+\sigma_\varepsilon^{2}\bigr)\Bigr]
e^{-\gamma(1-\alpha)\mu+\frac{\gamma^{2}(1-\alpha)^{2}(\sigma_\lambda^{2}+\sigma_\varepsilon^{2})}{2}}=0.\qquad(35)$$
The main difference between (32) and (33) lies in the fact that the aggregate variance $\sigma_\lambda^{2}$ is estimated to be 0 in the
sample and that $\lambda_t\neq\mu$ in general. Note that (35) implies that consistency may be achieved with
a large number of repeated cross sections, or a panel data set with a long time series dimension.
However, this raises other issues discussed later in Section 2.4.
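The role of cross-sectional non-ergodicity can also be seen in a small simulation. The sketch below is a stylized version of the setup with parameter values of our own choosing ($\gamma$ labels the preference parameter being estimated, as in the reconstruction above): conditional on a single realization of the aggregate shock $\lambda_t$, the estimator that solves the empirical FOC concentrates around a value that depends on $\lambda_t$ rather than on the true parameter.

```python
import numpy as np
from scipy.optimize import brentq

# Stylized illustration (our own parameter values): the cross-sectional FOC estimator
# conditions on a single realization of the aggregate shock lam_t.
rng = np.random.default_rng(3)
gamma0, r, mu = 2.0, 0.02, 0.06
sig_lam, sig_eps = 0.2, 0.3
alpha = 1 - (mu - r) / (gamma0 * (sig_lam ** 2 + sig_eps ** 2))  # safe-asset share optimal for gamma0
lam_t = 0.10                                       # one fixed realization of the aggregate shock
n = 20000
u = lam_t + sig_eps * rng.normal(size=n)           # individual returns share the same lam_t
c = 1 + alpha * r + (1 - alpha) * u                # consumption from the budget constraint

def empirical_foc(g):                              # sample analogue of E[exp(-g c)(r - u)]
    return np.mean(np.exp(-g * c) * (r - u))

gamma_hat = brentq(empirical_foc, 1e-8, 200.0)
print("true gamma:", gamma0)
print("cross-sectional estimate:", round(gamma_hat, 2),
      "  value implied by this lam_t:", round((lam_t - r) / ((1 - alpha) * sig_eps ** 2), 2))
```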
B Details of Section 2.3

B.1 Proof of Proposition 1
We start by considering the problem solved by individual i. For simplicity, we will assume that
the discount factor is equal to 1. We solve the problem in two steps. First, conditional on the
working decision, we determine optimal consumption and, hence, savings. Then, we determine
the optimal working decision. Conditional on working in period t, worker i solves the following
program:
max
bt ;ci;t ;ci;t+1
exp ( r ci;t ) + Et [ exp ( r ci;t+1 ) exp ( r (1
))]
s.t. ci;t + bw
i;t = yi;t + wi;t
ci;t+1 = yi;t+1 + Rbw
i;t ;
or equivalently,
max
ci;t ;ci;t+1
exp ( r ci;t ) + Et [ exp ( r ci;t+1 ) exp ( r (1
s.t. ci;t+1 = yi;t+1 + R (yi;t + wi;t
47
ci;t ) :
))]
The …rst order condition for ci;t generate the following standard Euler equation:
exp ( r ci;t ) = Et [exp ( r ci;t+1 ) exp ( r (1
))] :
Replacing for ci;t+1 using the budget constraint, gives us
exp ( r ci;t ) = Et [exp f r (yi;t+1 + R (yi;t + wi;t
ci;t ))g exp ( r (1
))] ;
which can be written as follows:
exp ( r ci;t ) = exp ( r R (yi;t + wi;t
ci;t )) exp ( r (1
)) Et [exp ( r yi;t+1 )] :
As mentioned in the main text, after the shocks are realized, individuals can infer zt+1 = ln
i;t+1 .
Under our assumption, zt+1 is normally distributed with mean zero and variance
a consequence, zt+1 jzt is also normally distributed with mean
2
z
are functions of the variances and covariances of ln
and
t+1
and variance
z zt
i;t+1 .
2
z,
2
+
where
= exp ( r (
+
1
+
1
2 xi;t
+ zt+1 ))]
2 xi;t )) Et
[exp ( r zt+1 )] :
Now, conditional on zt , zt+1 is normal with mean equal to
Cov (zt ; zt+1 )
(zt
E [zt+1 ] +
Var (zt )
% 2
1 %2
E [zt ]) =
2
1 %2
2
+
zt
z zt ;
and variance equal to
Var (zt+1 )
Cov (zt ; zt+1 )2
=
Var (zt )
1
2
%2
+
%
2
2
2
%2
1
2
z:
It follows that
Et [exp ( r zt+1 )] = exp
r
z zt
+
r2
2 2
z
2
and hence,
exp ( r ci;t ) = exp ( r R (yi;t + wi;t
exp ( r
z zt ) exp
ci;t )) exp ( r (1
r2
2 2
z
2
48
:
+
2
. As
z
and
Speci…cally, for individuals
who worked we have
Et [exp ( r yi;t+1 )] = Et [exp ( r (
t+1
)) exp ( r (
1
+
2 xi;t ))
Taking the log of both sides and solving for ci;t , we obtain
cw
i;t =
R (yi;t + wi;t ) + 1
+ (
1
+
2 xi;t )
+
r
z zt
2 2
z
2
:
(R + 1)
Following similar steps using the non-labor income of an individual who does not work, it can be
shown that his consumption takes the following form
Ryi;t + (
cnw
i;t =
1+
2 xi;t ) +
r
z zt
2 2
z
2
(R + 1)
:
We can now determine whether it is optimal for the individual to work in period t by comparing the life-time utility conditional on working with the corresponding utility conditional on not
working. The individual works if
expf r cw
i;t g + Et
expf r cnw
i;t g expf r (1
expf r
yi;t + wi;t
)g + Et
cw
i;t expf r (1
expf r
)g
cnw
i;t expf r (1
yi;t
)g
or equivalently
expf r cw
i;t g + Et expf r
expf r cnw
i;t g expf r (1
yi;t+1 + R yi;t + wi;t
)g + Et expf r
cw
i;t
yi;t+1 + R yi;t
expf r (1
cnw
i;t
)g
expf r (1
)g :
The Euler equation of a working individual implies that
expf r cw
i;t g = Et [expf r (yi;t+1 + R (yi;t + wi;t
ci;t )) expf r (1
)g] :
Similarly, the Euler equation of a non-working individual implies that
expf r cnw
i;t g expf r (1
)g = Et [expf r (yi;t+1 + R (yi;t
ci;t ))g expf r (1
)g] :
The above inequality is therefore satis…ed if
2 exp
r cw
i;t
2 exp
r cnw
i;t exp ( r (1
)) :
nw
Taking logs of both sides and replacing for cw
i;t and ci;t , the inequality becomes (after simpli…cation)
wi;t
1
49
;
or equivalently
1
wt
1
:
ei;t
1
is drawn from a uniform distribution de…ned in the interval [ ; 1]. The
ei;t
probability that a skilled worker chooses to work takes therefore the following form:
The random variable
P
wtA
1
ei;t
1
=P
1
ei;t
wtA
1
=
(1
) (1
)
wtA
:
1
Similarly, the probability that an unskilled worker decides to work has the following form:
P
wtB
1
1
ei;t
=P
1
ei;t
wtB
1
=
(1
) (1
)
wtB
:
1
The labor supply of skilled and unskilled worker is therefore given by
StA = NtA
(1
) (1
)
wtA
and StB = NtB
1
(1
) (1
)
wtB
1
:
or, equivalently,
ln StA = ln NtA +ln
(1
)
wtA
ln (1
and
)
ln StB = ln NtB +ln
(1
)
wtB
ln (1
We will now solve the …rm’s problem. Since the market price is normalized to 1, the …rm
chooses the amount of skilled and unskilled labor that maximizes the following pro…t function:
max qt
c(qt ) =
B
LA
t ;Lt
D
t
LA
t
LB
t
A
wtA LA
t
B
wtB LB
t :
B
Deriving the …rst order conditions and solving for LA
t and Lt , it can be shown that the optimal
labor demands take the following form:
LA
t =
"
LB
t =
"
1
D
t
A
wtA
B
B
B
wtB
#
1
1
and
1
D
t
B
wtB
A
A
A
wtA
#
A
B
1
1
A
B
;
or equivalently,
ln LA
t =
1
1
A
ln
B
D
t
+ (1
B ) ln A
50
(1
A
B ) ln wt
+
B
ln
B
B
ln wtB
):
and
ln LB
t =
1
1
D
t
ln
A
B
+ (1
A ) ln B
B
A ) ln wt
(1
+
A
ln
A
A
ln wtA :
Equilibrium in the labor market requires that
ln DtA = ln StA
ln DtB = ln StB ;
and
or, equivalently,
ln
D
t
+ (1
B ) ln A
A
B ) ln wt
(1
1
A
+
ln
B
B
B
ln wtB
B
= ln NtA +ln
(1
wtA
ln (1
);
wtB
ln (1
):
)
and
ln
D
t
+ (1
A ) ln B
B
A ) ln wt
(1
1
A
+
ln
A
A
A
ln wtA
B
= ln NtB +ln
(1
)
The equilibrium wages wtA and wtB are the solution to these two equations.
B.2
Estimation of ( A ;
B)
We assume that the time series data include wtA ; wtB ; StA ; StB ; NtA ; NtB in addition to (Yt+1 ; Xt ).
We also assume that NtA ; NtB are exogenous in the sense that they are independent of
D
t .
Note
that ln DtA = ln StA can be rewritten as
ln
1
A
D
t
(1
=
B ) ln A
1
B
+
1
1
+
A
B
A
B
B
ln
B
ln (1
B
ln wtA +
B
1
A
B
) + ln NtA + ln
1
wtA
ln wtB :
The independence assumption suggests a straightforward method of moments estimation. If we
rede…ne to also include u and s and augment k ( ; ) to include
0
1
1
C
1 XB
1
B
C
B
A
+ ln Nss + ln
wsA
+
ln wsA +
B ln Ns C
1
1
1
@
A
A
B
s=1
B
ln Ns
we can estimate (
A; B )
B
A
along with the other parameters of the model, where
(1
B ) ln A
1
A
+
B
ln
B
B
51
ln (1
)
E ln
1
A
D
t
:
B
B
ln wsB ;
C
Proofs for Section 4
C.1
Proof of Theorem 1
To prove the functional central limit theorem we follow Billingsley (1968) and Dedecker and
Merlevede (2002). The proof involves establishing …nite dimensional convergence and a tightness
argument. For …nite dimensional convergence …x r1 < r2 <
Xn (ri ) = Xn (ri )
< rk 2 [0; 1]. De…ne the increment
(36)
Xn (ri 1 ) :
Since there is a one to one mapping between Xn (r1 ) ; :::; Xn (rk ) and Xn (r1 ) ; Xn (r2 ) ; :::; Xn (rk )
we establish joint convergence of the latter. The proof proceeds by checking that the conditions
of Theorem 1 in Kuersteiner and Prucha (2013) hold. Let kn = max(T; )n where both n ! 1
and
! 1 such that clearly kn ! 1 (this is a diagonal limit in the terminology of Phillips and
Moon, 1999). Let d = k + k : To handle the fact that Xn 2 Rd we use Lemmas A.1 - A.3 in
Phillips and Durlauf (1986). De…ne
De…ne t = t
j
=
0
j;y ;
0
j;
0
and let
= ( 1; : : : ;
k)
2 Rdk with k k = 1:
min (1; 0 ).
For each n and
0
de…ne the mapping q (t; i) : N2+ ! N+ as q (i; t) := t n+i and note that q (i; t)
is invertible, in particular for each q 2 f1; :::; kn g there is a unique pair t; i such that q (i; t) = q:
We often use shorthand notation q for q (i; t). Let
•q(i;t)
k
X
0
j
~it (rj )
E
j=1
h
~it (rj ) jG
n;t n+i 1
i
(37)
where
~it (rj ) = ~it (rj )
Note that
and
~it (rj ) =
~y (rj ) ;
it
~it (rj 1 ) ;
~it (r1 ) = ~it (r1 ) :
(38)
0
~ (rj ) with
t
8
< 0 for j > 1
~y (rj ) =
it
: ~y for j = 1
it
8
< ~ (r ) if [ r ] < t [ r ] and i = 1
j
j 1
j
;t
~ (rj ) =
:
it
:
0
otherwise
52
(39)
(40)
Using this notation and noting that
0
1 Xn
(r1 ) +
k
X
Pmax(T;
0+ )
t=min(1; 0 +1)
0
j
Pn
i=1
•q(i;t) = Pkn •q , we write
q=1
Xn (rj )
j=2
kn
X
=
q=1
First analyze the term
max(T; 0 + )
•q +
X
n X
k
X
0
jE
t=min(1; 0 +1) i=1 j=1
Pkn •
q=1 q . Note that
y
n;it
h
~it (rj ) G
n;t n+i 1
i
is measurable with respect to G
(41)
n;t n+i
by con-
struction. Note that by (37), (39) and (40) the individual components of •q are either 0 or equal
h
i
to ~it (rj ) E ~it (rj ) jG n;t n+i 1 respectively. This implies that •q is measurable with respect to
h
i
G n;q ; noting in particular that E ~it (rj ) jG n;t n+i 1 is measurable w.r.t G n;t n+i 1 by the propi
h
•
erties of conditional expectations and G n;t n+i 1
G n;q . By construction, E r jG n;q 1 = 0.
Pq •
This establishes that for Snq = s=1 s ,
fSnq ; G
n;q ; 1
q
kn ; n
1g
is a mean zero martingale array with di¤erences •q .
To establish …nite dimensional convergence we follow Kuersteiner and Prucha (2013) in the
proof of their Theorem 2. Note that, for any …xed n and given q, and thus for a corresponding
unique vector (t; i) ; there exists a unique j 2 f1; : : : ; kg such that
0
+ [ rj 1 ] < t
0
+ [ rj ] :
Then,
•q(i;t) =
k
X
l=1
=
0
1;y
+
0
j;
~it (rl )
0
l
E
h
~it (rl ) jG
n;t n+i 1
i
h
i
~y jG n;t n+i 1 1 fj = 1g
E
it
it
h
i
~ (rj ) E ~ (rj ) jG n;t n+i 1 1 f[ rj 1 ] < t
;t
;t
~y
[ rj ]g 1 f i = 1g
where all remaining terms in the sum are zero because of by (37), (39) and (40). For the subsequent
inequalities, …x q 2 f1; :::; kn g (and the corresponding (t; i) and j) arbitrarily. Introduce the
shorthand notation 1j = 1 fj = 1g and 1ij = 1 f[ rj 1 ] < t
First, note that for
[ rj ]g 1 f i = 1g :
0, and by Jensen’s inequality applied to the empirical measure
53
1
4
P4
i=1
xi
we have that
•q
2+
= 42+
42+
+ 42+
= 22+2
+ 22+2
h
i
h
i
1 0
~y E ~y jG n;t n+i 1 1j + 1 0
~ (rj ) E ~ (rj ) jG n;t n+i 1 1ij
;t
;t
it
4 1;y it
4 j;
h
i 2+
2+
1
1
2+
2+
y
y
~
~
k 1;y k
+ k 1;y k
E it jG n;t n+i 1
1j
it
4
4
h
i 2+
2+
1
1
+ k j; k2+ E ~ ;t (rj ) jG n;t n+i 1
k j; k2+ ~ ;t (rj )
1ij
4
4
h
i 2+
2+
1j
+ k 1;y k2+ E ~ity jG n;t n+i 1
k 1;y k2+ ~ity
k
j;
~ (rj )
;t
k2+
2+
+k
j;
h
E ~ ;t (rj ) jG
k2+
n;t n+i 1
i
2+
2+
1ij :
We further use the de…nitions in (19) such that by Jensen’s inequality and for i = 1 and t 2
[
0
+ 1;
0
+ ]
2+
~ (rj )
;t
+ E ~ ;t (rj ) jG
1
2+
;t
1+ =2
1
2+
;t
1+ =2
while for i > 1 or t 2
=[
0
+ 1;
h
0
+ E
h
+E
Similarly, for t 2 [1; :::; T ]
2+
it
1
h
+ E ~ity jG
=2
k
while for t 2
= [1; :::; T ]
Noting that k
E
•q
j;y k
jG
;t
n;t n+i 1
i
y 2+
it k
n;t n+i 1
h
+E k
i
n;t n+i 1
i
2+
y 2+
it k
jG
~y = 0:
it
1 and k
2+
G
2+
2+
n;t n+i 1
= 0:
it
n1+
jG
;t
2+
+ ];
~
~y
n;t n+i 1
i
n;q 1
j;
k < 1,
23+2 1 fi = 1; t 2 [
0
1+ =2
+ 1;
23+2 1 ft 2 [1; :::; T ]g h
+
E k
n1+ =2
54
0
+ ]g
y 2+
it k
E
G
h
2+
G
;t
n;t n+i 1
i
;
n;t n+i 1
i
(42)
where the inequality in (42) holds for
check that
0. To establish the limiting distribution of
kn
X
2+
•q
E
! 0;
q=1
kn
X
q=1
and
p
•2 !
q
X
0
1;y
1;y +
yt
sup E 4
n
k
X
(43)
0
j;
(rj
rj 1 )
j;
(44)
;
j=1
t2f1;::;T g
2
Pkn •
q=1 q we
h
kn
X
E •q2 G
q=1
n;q 1
i
!1+
=2
3
5 < 1;
(45)
which are adapted to the current setting from Conditions (A.26), (A.27) and (A.28) in Kuersteiner
and Prucha (2013). These conditions in turn are related to conditions of Hall and Heyde (1980)
and are shown by Kuersteiner and Prucha (2013) to be su¢ cient for their Theorem 1.
To show that (43) holds note that from (42) and Condition 1 it follows that for some constant
C < 1;
kn
X
E
•q
0+
X
23+2
2+
1+ =2
q=1
E
t= 0 +1
h
2+
T
n
23+2 X X h
E k
+ 1+ =2
n
t=1 i=1
23+2 C
1+ =2
+
i
;t
G
n;t n
y 2+
it k
G
n;t n
i
23+2 nT C
23+2 C 23+2 T C
=
+
!0
=2
n1+ =2
n =2
because 23+2 C and T are …xed as ; n ! 1.
P n •2
Next, consider the probability limit of kq=1
q : We have
kn
X
•2
q
q=1
=
k
0+
1X X
0
j;
;t jG n;t n
E
;t
j=1 t= 0 +1
2
+p
n
X
0
1;
t2f 0 ;:::; 0 + g
\fmin(1; 0 );::;T g
(46)
1ij
;t jG n;t n
E
;t
2
(
y
1t
E[
0
y
1t jG n;t n ])
1;y 1 ft
0
+ [ r1 ]g 1ij 1j
(47)
+
1
n
X
n
X
t2fmin(1; 0 );::;T g i=1
0
1;y
y
n;it
E
y
n;it jG n;t n+i 1
55
2
1j
(48)
where for (46) we note that E
k
0+
1X X
;t jG n;t n
0
j;
= 0 when t > T . This implies that
j=1 t= 0 +1
1X
X
k
=
2
;t jG n;t n
E
;t
1ij
0
j;
j=1 t2f 0 ;:::; 0 + g\fmin(1; 0 +1);::;T g
0+
X
1X
k
+
2
0
j;
;t
2
;t jG n;t n
E
;t
1ij
1ij
j=1 t=maxf 0 ;T g
where by Condition 2
0+
X
1X
k
2
0
j;
;t
j=1 t=maxf 0 ;T g
1f
0
p
+ [ rj 1 ] < t
0
+ [ rj ]g !
k
X
0
j;
(
(rj )
(rj 1 ))
j;
j=1
and
2
1+ =2
k
6 1X
6
E6
4
j=1
X
0
j
;t
t2f 0 ;:::; 0 + g
\fmin(1; 0 +1);::;T g
20
k
6BX
1
6B
= 1+ =2 E 6@
4 j=1
(T + j 0 j)
=2
k
0
;t
n;t n
E
t2f 0 ;:::; 0 + g
\fmin(1; 0 +1);::;T g
=2
1+ =2
22+ (T + j 0 j)1+
1+ =2
X
2
;t jG
E
k
X
X
;t jG
E
j=1 t2f 0 ;:::; 0 + g\fmin(1; 0 +1);::;T g
=2
k 1+
=2
sup E
t
h
2+
;t
i
h
7
7
7
5
1ij
2
n;t n
0
3
;t
(49)
11+
C
1ij C
A
E
=2
3
7
7
7
5
;t jG
n;t n
2+
i
! 0;
where the …rst inequality follows from noting that the set f 0 ; :::;
0
+ g \ fmin (1; 0 ) ; ::; T g has
at most T + j 0 j elements and from using Jensen’s inequality on the counting measure. The second
inequality follows from Hölder’s inequality. Finally, we use the fact that (T + j 0 j) = ! 0 and
h
i
2+
supt E
C < 1 by Condition 1(iv).
;t
56
Next consider (47) where
2
6 2
E6
4 p
2
p
E
n
h
n
X
0
1;
t2f 0 ;:::; 0 + g
\fmin(1; 0 );::;T g
X
E
t2f 0 ;:::; 0 + g
\fmin(1; 0 );::;T g
(
y
1t
E[
h
0
1
23
p sup E
n t
1y
i
;t
t2f 0 ;:::; 0 + g
\fmin(1; 0 );::;T g
1=2
sup E
;t
y
1t
(
2
n;t n
i
E[
0
y
1t jG n;t n ])
1;y 1 ft
1=2
0
7
+ [ r1 ]g 7
5
1=2
X
1=2
24 T + j 0 j
p
sup E
n
t
;t jG
E
;t
2
0
y
1t jG n;t n ])
;t jG n;t n+i 1
E
;t
3
i;t
1f
0
<t
E
(
y
1t
h
h
2
y
n;it
0
E[
i
(50)
+ [ rj ]g
2
0
y
1t jG n;t n ])
1;y
i
1=2
(51)
1=2
!0
(52)
where the …rst inequality in (50) follows from the Cauchy-Schwartz inequality, (51) uses Condition
1(iv), and the last inequality uses Condition 1(iii). Then we have in (51), by Condition 1(iii) and
the Hölder inequality that
E
such that (52) follows.
h
(
y
1t
E[
2
0
y
1t jG n;t n ])
y
i
h
2E k
y 2
1t k
i
p
We note that (52) goes to zero as long as T = n ! 0: Clearly, this condition holds as long as
T is held …xed, but holds under weaker conditions as well.
Next the limit of (48) is, by Condition 1(v) and Condition 3,
n
1 X X 0
2 p
y
y
E n;it
jG n;t n+i 1
!
1;y
n;it
n
i=1
t2f1;::;T g
This veri…es (44). Finally, for (45) we check that
2
kn
X
2
4
sup E
E •q G
n
First, use (42) with
kn
X
q=1
n;q 1
q=1
!1+
= 0 to obtain
E
•q
0+
23 X
2
G
n;q 1
t=
+
23
n
E
0
h
n
X X
t2f1;::;T g i=1
57
2
;t
E
h
G
=2
X
0
1;y
yt 1;y :
t2f1;::;T g
3
5 < 1:
n;t n
i
2
y
n;it
G
n;t n+i 1
(53)
i
:
(54)
Applying (54) to (53) and using the Hölder inequality implies
2
!1+ =2 3
kn
X
2
5
E4
E •q G n;q 1
q=1
=2
2
2
0+
X
3
2
E4
20
6 23
+ 2 =2 E 4@
n
E
t= 0 +1
h
2
G
;t
X
n;t n+i 1
n
X
E
t2f 0 ;:::; 0 + g\f1;::;T g i=1
h
i
!1+
=2
2
y
n;it
3
5
G
n;t n+i 1
i
By Jensen’s inequality, we have
0+
1 X
E
t= 0 +1
h
and
E
so that
2
3
2
E4
0+
X
2
;t
h
h
E
t= 0 +1
G
2
G
;t
2
;t
G
n;t n+i 1
!
i 1+
n;t n+i 1
n;t n+i 1
i1+
!
i 1+
=2
0+
1 X
E
t= 0 +1
=2
E
=2
3
h
=2
0+
X
=2
> T (which holds eventually)
20
n
h
2
6@ 23 X X
y
E
jG
E4
n;it
n
i=1
sup E
n;t n+i
t2f1;::;T g
=2
23+3
(T n)
n1+
3+3 =2
2
T
=2
=2
T X
n
X
E
t=1 i=1
1+ =2
sup E
i;t
h
G
t= 0 +1
23+3
h
2+
y
n;it
2+
y
n;it
i
i
3+3 =2
2
sup E
t
h
2+
;t
i
+2
3+3 =2
T
=2
1+ =2
sup E
h
i
G
< 1:
3
7
5
2+
y
n;it
=2
i
;t
;t
i1+
n;t n+i 1
ii
(55)
(56)
<1
i;t
58
7
5
2+
2+
11+
i
A
1
3
n;t n+i 1
h h
E E
h
=2
A
n;t n+i 1
By combining (55) and (56) we obtain the following bound for (53),
2
!1+ =2 3
kn
X
2+
5
E4
E •q
jG n;q 1
q=1
G
;t
;t
t
and similarly, for all
2
2+
23+3
5
h
11+
i
< 1:
This establishes that (43), (44) and (45) hold and thus establishes the CLT for
Pkn •
q:
q=1
It remains to be shown that the second term in (41) can be neglected. Consider
max(T; 0 + )
X
n X
k
X
0
jE
t=min(1; 0 +1) i=1 j=1
=
k
0+ X
X
1=2
t=
+n
1=2
0
0
j;
h
~it (rj ) jG
E
;t jG n;t n+i 1
0
1;y E
y
n;it jG n;t n+i 1
j=1
T X
n
X
n;t n+i 1
t=1 i=1
1f
0
i
+ [ rj 1 ] < t
0
+ [ rj ]g
:
Note that
;t jG n;t n+i 1
E
= 0 for t > T
and
y
n;it jG n;t n+i 1
E
= 0:
This implies, using the convention that a term is zero if it is a sum over indices from a to b with
a > b, that
1=2
0+
X
t=
=
1=2
0
;t jG n;t n+i 1
E
0
T X
k
X
t=
0
0
j;
;t jG n;t n+i 1
E
j=1
1f
0
+ [ rj 1 ] < t
0
+ [ rj ]g :
By a similar argument used to show that (49) vanishes, and noting that T is …xed while
it follows that
as
2
E4
1=2
1+ =2
T
X
t=
0
j;
E
0
n X
k
0+ X
X
t=
0
0
jE
i=1 j=1
h
~it (rj ) jG
3
5!0
;t jG n;t n
! 1: The Markov inequality then implies that
1=2
! 1;
n;t n+i 1
i
= op (1) :
and consequently that
0
1 Xn
(r1 ) +
k
X
0
j
Xn (rj ) =
kn
X
q=1
j=2
59
•q + op (1) :
(57)
We have shown that the conditions of Theorem 1 of Kuersteiner and Prucha (2013) hold by
establishing (43), (44), (45) and (57). Applying the Cramer-Wold theorem to the vector
Ynt = Xn (r1 )0 ; Xn (r2 )0 :::; Xn (rk )0
0
it follows from Theorem 1 in Kuersteiner and Prucha (2013) that for all …xed r1 ; ::; rk and using
the convention that r0 = 0;
2
0
E [exp (i 0 Ynt )] ! E 4exp @
When
0
1@
2
X
0
1;y
yt 1;y
0
j;
(
(rj )
(rj 1 ))
j;
j=1
t2f1;::;T g
for all r 2 [0; 1] and some
(r) = r
+
k
X
113
AA5 :
(58)
positive de…nite and measurable w.r.t C this
result simpli…es to
k
X
0
j;
(
(rj )
(rj 1 ))
j;
=
j=1
k
X
0
j;
j;
(rj
rj 1 ) :
j=1
The second step in establishing the functional CLT involves proving tightness of the sequence
0
Xn (r) : By Lemma A.3 of Phillips and Durlauf (1986) and Proposition 4.1 of Wooldridge and
White (1988), see also Billingsley (1968, p.41), it is enough to establish tightness componentwise.
This is implied by establishing tightness for
0
Xn (r) for all
2 Rd such that
0
= 1. In the
following we make use of Theorems 8.3 and 15.5 in Billingsley (1968). We need to show that for
the ‘modulus of continuity’
! (Xn ; ) = sup j 0 (Xn (s)
(59)
Xn (t))j
jt sj<
where t; s 2 [0; 1] it follows that
lim lim supP (! (Xn ; )
!0
") = 0:
n;
De…ne
1 XX
Xn ;y (r) = p
n t=1 i=1
T
Since
j 0 (Xn (s)
Xn (t))j
0
y
(Xn
n
;y
y
n;it ;
(s)
Xn
Xn
60
;
;y
Xr]
0 +[
1
(r) = p
(t)) + j
;t :
t= 0 +1
0
(Xn
;
(s)
Xn
;
(t))j
0
y
and noting that
(Xn
;y
(s)
Xn
;y
(t)) = 0 uniformly in t; s 2 [0; 1] because of the initial
condition Xn (0) given in (21) and the fact that Xn
;y
(t) is constant as a function of t: It follows
that
sup j
! (Xn ; )
0
(Xn
;
(s)
Xn
(60)
(t))j
;
js tj<
such that
P (! (Xn ; )
3")
P (! (Xn ; ; )
3") :
To analyze the term in (60) use Billingsley (1968, Theorem 8.4) and the comments in Billingsley (1968, p. 59). Let $S_s = \sum_{t=\tau_0+1}^{\tau_0+s}\lambda_\varepsilon'\,\varepsilon_{\tau,t}$. To establish tightness it is enough to show that for each $\varepsilon > 0$ there exists $c > 1$ and $\tau'$ such that if $\tau > \tau'$,

$$
P\Big(\max_{s}\big|S_{k+s}-S_k\big| > c\,\varepsilon\sqrt{\tau}\Big) \le \frac{\varepsilon}{c^2}
\tag{61}
$$

hold for all $k$. Note that for each $k$ fixed, $M_s = S_{s+k}-S_k$ and $\mathcal{F}_s = \mathcal{G}_{n,(s+k-\min(1,\tau_0))n+1}$, $\{M_s,\mathcal{F}_s\}$ is a martingale. By a maximal inequality, see Hall and Heyde (1980, Corollary 2.1), it follows that for each $k$

$$
P\Big(\max_{s}\big|S_{k+s}-S_k\big| > c\,\varepsilon\sqrt{\tau}\Big)
= P\Big(\max_{s}\big|S_{k+s}-S_k\big|^p > (c\varepsilon)^p\,\tau^{p/2}\Big)
\le \frac{E\Big[\big|\sum_{t=k+1}^{k+\tau}\lambda_\varepsilon'\,\varepsilon_{\tau,t}\big|^p\Big]}{(c\varepsilon)^p\,\tau^{p/2}}
\le \frac{2^p}{c^p\,\varepsilon^p}\,\sup_t E\Big[\big|\lambda_\varepsilon'\,\varepsilon_{\tau,t}\big|^p\Big]
\tag{62}
$$

by an inequality similar to (49). Note that the bound for (62) does not depend on $k$. Now choose

$$
c = 2^{p/(p-2)}\Big(\sup_t E\Big[\big|\lambda_\varepsilon'\,\varepsilon_{\tau,t}\big|^p\Big]\Big)^{1/(p-2)}\Big/\,\varepsilon^{(1+p)/(p-2)}
$$

such that (61) follows.
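The choice of $c$ above can be checked numerically in a simple special case; the sketch below is an assumption-laden illustration (Gaussian shocks, $p = 4$, an assumed moment bound) of how the simulated tail probability in (61) compares with the target $\varepsilon/c^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
tau, reps, p, eps = 400, 5000, 4, 1.0
sup_moment = 3.0               # assumed bound on sup_t E|eps_t|^p (for eps_t ~ N(0,1), E|eps_t|^4 = 3)
c = (2 ** (p / (p - 2)) * sup_moment ** (1 / (p - 2))) / eps ** ((1 + p) / (p - 2))

shocks = rng.normal(size=(reps, tau))
S = np.cumsum(shocks, axis=1)
# P(max_s |S_s| > c*eps*sqrt(tau)), estimated over Monte Carlo replications.
lhs = np.mean(np.max(np.abs(S), axis=1) > c * eps * np.sqrt(tau))
print("simulated tail prob:", lhs, "  target eps/c^2:", eps / c ** 2)
```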
We now identify the limiting distribution using the technique of Rootzen (1983). Tightness together with finite dimensional convergence in distribution in (58), Condition 2 and the fact that the partition $r_1,\dots,r_k$ is arbitrary implies that for $\lambda\in\mathbb{R}^d$ and with $\Omega_y = \sum_{t\in\{1,\dots,T\}}\Omega_{yt}$,

$$
E\big[\exp(i\lambda' X_n(r))\big] \to
E\left[\exp\left(-\frac{1}{2}\Big(\lambda_y'\,\Omega_y\,\lambda_y + \lambda_\varepsilon'\,\Lambda(r)\,\lambda_\varepsilon\Big)\right)\right].
\tag{63}
$$

Let $W(r) = (W_y(r), W_\varepsilon(r))$ be a vector of mutually independent standard Brownian motion processes in $\mathbb{R}^d$, independent of any $\mathcal{C}$-measurable random variable. We note that the RHS of (63) is the same as

$$
E\left[\exp\left(-\frac{1}{2}\Big(\lambda_y'\,\Omega_y\,\lambda_y + \lambda_\varepsilon'\,\Lambda(r)\,\lambda_\varepsilon\Big)\right)\right]
= E\left[\exp\left(i\,\lambda_y'\,\Omega_y^{1/2}\,W_y(1) + i\,\lambda_\varepsilon'\int_0^r \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)\right)\right].
\tag{64}
$$
The result in (64) can be deduced in the same way as in Duffie, Pan and Singleton (2000), in particular p. 1371 and their Proposition 1. Conjecture that $X_t = \int_0^t (\partial\Lambda(s)/\partial s)^{1/2}\,dW_\varepsilon(s)$. By Condition 2(iii) and the fact that $\partial\Lambda(t)/\partial t$ does not depend on $X_t$ it follows that the conditions of Durrett (1996, Theorem 2.8, Chapter 5) are satisfied. This means that the stochastic differential equation $X_t = \int_0^t (\partial\Lambda(s)/\partial s)^{1/2}\,dW_\varepsilon(s)$ with initial condition $X_0 = 0$ has a strong solution $\big(X, W_t, \mathcal{F}^W_t \vee \mathcal{C}\big)$, where $\mathcal{F}^W_t$ is the filtration generated by $W_\varepsilon(t)$. Then, $X_t$ is a martingale (and thus a local martingale) w.r.t. the filtration $\mathcal{F}^W_t$. For $\mathcal{C}$-measurable functions $\alpha(t):[0,1]\to\mathbb{R}$ and $\beta(t):[0,1]\to\mathbb{R}^{k_\varepsilon}$ define the transformation

$$
\Psi_t = \exp\big(\alpha(t) + \beta(t)' X_t\big).
$$

The terminal conditions $\alpha(r) = 0$ and $\beta(r) = i\lambda_\varepsilon$ are imposed such that

$$
\Psi_r = \exp\big(\alpha(r) + \beta(r)' X_r\big) = \exp\big(i\lambda_\varepsilon' X_r\big).
$$

The goal is now to show that

$$
E\big[\Psi_r\mid\mathcal{C}\big] = \Psi_0 = \exp\big(\alpha(0)+\beta(0)' X_0\big) = \exp\big(\alpha(0)\big),
$$

where the initial condition $X_0 = 0$ was used. In other words, we need to find $\alpha(t)$ and $\beta(t)$ such that $\Psi_t$ is a martingale. Following the proof of Proposition 1 in Duffie, Pan and Singleton (2000) and letting

$$
\mu(t) = \dot\alpha(t) + \dot\beta(t)' X_t + \tfrac{1}{2}\,\beta(t)'\,\frac{\partial\Lambda(t)}{\partial t}\,\beta(t),
$$

use Ito's Lemma to obtain

$$
\Psi_r = \Psi_0 + \int_0^r \Psi_s\,\mu(s)\,ds + \int_0^r \Psi_s\,\beta(s)'\Big(\frac{\partial\Lambda(s)}{\partial s}\Big)^{1/2}dW_s.
$$

It follows that for $\Psi_r$ to be a martingale we need $\mu(t) = 0$, which implies the differential equations $\dot\beta(t) = 0$ and $\dot\alpha(t) = -\tfrac{1}{2}\,\beta(t)'\,(\partial\Lambda(t)/\partial t)\,\beta(t)$. Using the terminal conditions $\beta(r) = i\lambda_\varepsilon$ and $\alpha(r) = 0$ it follows that $\beta(t) = i\lambda_\varepsilon$ and

$$
\alpha(r) - \alpha(0) = \int_0^r \dot\alpha(t)\,dt
= \frac{1}{2}\int_0^r \lambda_\varepsilon'\,\frac{\partial\Lambda(t)}{\partial t}\,\lambda_\varepsilon\,dt
= \frac{1}{2}\,\lambda_\varepsilon'\,\Lambda(r)\,\lambda_\varepsilon,
$$

so that $\alpha(0) = -\tfrac{1}{2}\lambda_\varepsilon'\Lambda(r)\lambda_\varepsilon$ and

$$
E\left[\exp\left(i\lambda_\varepsilon'\int_0^r \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)\right)\Bigg|\ \mathcal{C}\right]
= \exp\left(-\frac{1}{2}\,\lambda_\varepsilon'\,\Lambda(r)\,\lambda_\varepsilon\right)\qquad\text{a.s.,}
\tag{65}
$$
which implies (64) after taking expectations on both sides of (65). To check the regularity conditions in Duffie et al (2000, Definition A) note that $\gamma_t = 0$ because there is no jump component. Thus, (i) holds automatically. For (ii) we have $\eta_t = \Psi_t\,\beta(t)'\big(\partial\Lambda(t)/\partial t\big)^{1/2}$ so that

$$
\eta_t'\eta_t = \exp\big(2\alpha(t) + 2\beta(t)' X_t\big)\,\lambda_\varepsilon'\,\frac{\partial\Lambda(t)}{\partial t}\,\lambda_\varepsilon,
$$

and, noting that $\beta(t) = i\lambda_\varepsilon$ and therefore $\big|\exp\big(2\beta(t)' X_t\big)\big| \le 1$, it follows that

$$
\big|\eta_t'\eta_t\big| \le \big|\exp\big(2\alpha(t)\big)\big|\,
\sup_{\lambda\in\mathbb{R}^{k_\varepsilon},\,\|\lambda\|=1}\lambda'\,\frac{\partial\Lambda(t)}{\partial t}\,\lambda
\le \big|\exp\big(2\alpha(t)\big)\big|\,
\sup_t\ \sup_{\lambda\in\mathbb{R}^{k_\varepsilon},\,\|\lambda\|=1}\lambda'\,\frac{\partial\Lambda(t)}{\partial t}\,\lambda,
$$

such that condition (ii) holds by Condition 2(iii), where $\sup_t\sup_{\|\lambda\|=1}\lambda'\,(\partial\Lambda(t)/\partial t)\,\lambda \le M$ a.s. Finally, for (iii) one obtains similarly that

$$
\big|\Psi_t\big| \le \big|\exp\big(\alpha(t)\big)\big|\,\big|\exp\big(\beta(t)' X_t\big)\big|
= \big|\exp\big(\alpha(t)\big)\big| \le \exp(M)\ \text{a.s.},
$$

such that the inequality follows.
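Since the exponential-martingale argument is abstract, a small simulation may help; the sketch below discretizes $X_r = \int_0^r \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)$ for an assumed scalar variance function $\Lambda(t)$ and checks (65) by comparing the Monte Carlo characteristic function with $\exp(-\tfrac{1}{2}\lambda^2\Lambda(r))$. The variance function, grid and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
reps, steps, r, lam = 20000, 400, 0.8, 1.3

# Assumed deterministic variance function Lambda(t) = 1 - exp(-2t), so dLambda/dt = 2*exp(-2t).
t = np.linspace(0.0, r, steps + 1)
dt = np.diff(t)
dLam = 2.0 * np.exp(-2.0 * t[:-1])               # Lambda'(t) evaluated on the left grid points

# Euler discretization of X_r = int_0^r Lambda'(t)^{1/2} dW(t).
dW = rng.normal(size=(reps, steps)) * np.sqrt(dt)
X_r = (np.sqrt(dLam) * dW).sum(axis=1)

cf_mc = np.mean(np.exp(1j * lam * X_r)).real
cf_theory = np.exp(-0.5 * lam ** 2 * (1.0 - np.exp(-2.0 * r)))   # exp(-0.5*lam^2*Lambda(r))
print("Monte Carlo E[exp(i*lam*X_r)]:", cf_mc)
print("exp(-0.5*lam^2*Lambda(r))    :", cf_theory)
```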
C.2 Proof of Corollary 1

We note that finite dimensional convergence established in the proof of Theorem 1 implies that

$$
E\big[\exp(i\lambda' X_n(1))\big] \to
E\left[\exp\left(-\frac{1}{2}\Big(\lambda_y'\,\Omega_y\,\lambda_y + \lambda_\varepsilon'\,\Lambda(1)\,\lambda_\varepsilon\Big)\right)\right].
$$

We also note that because of (65) it follows that

$$
E\left[\exp\left(i\,\lambda_\varepsilon'\int_0^1 \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)\right)\right]
= E\left[\exp\left(-\frac{1}{2}\,\lambda_\varepsilon'\,\Lambda(1)\,\lambda_\varepsilon\right)\right],
$$

which shows that $\int_0^1 \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)$ has the same distribution as $\Lambda(1)^{1/2}\,W_\varepsilon(1)$.
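As a companion check for Corollary 1, the following sketch compares simulated draws of $\int_0^1 \dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)$ with draws of $\Lambda(1)^{1/2} W_\varepsilon(1)$ for an assumed scalar $\Lambda$; the two samples should agree in distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
reps, steps = 20000, 400

# Assumed variance function Lambda(t) = t**2, so Lambda'(t) = 2t and Lambda(1) = 1.
t = np.linspace(0.0, 1.0, steps + 1)
dt = np.diff(t)
dW = rng.normal(size=(reps, steps)) * np.sqrt(dt)
ito = (np.sqrt(2.0 * t[:-1]) * dW).sum(axis=1)        # int_0^1 Lambda'(t)^{1/2} dW(t)
gauss = np.sqrt(1.0) * rng.normal(size=reps)          # Lambda(1)^{1/2} * W(1)

for q in (0.1, 0.5, 0.9):
    print(f"q={q}:  ito {np.quantile(ito, q):+.3f}   Lambda(1)^0.5*W(1) {np.quantile(gauss, q):+.3f}")
print("variances:", ito.var(), gauss.var())
```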
C.3 Proof of Theorem 2

Let $s^y_{it}(\theta,\gamma) = f_{\theta,it}(\theta,\gamma)$ and $s^\varepsilon_t(\theta,\gamma) = g_{\theta,t}(\theta,\gamma)$ in the case of maximum likelihood estimation and $s^y_{it}(\theta,\gamma) = f_{it}(\theta,\gamma)$ and $s^\varepsilon_t(\theta,\gamma) = g_t(\theta,\gamma)$ in the case of moment based estimation. Using the notation developed before we define

$$
\tilde{s}^y_{it}(\theta,\gamma) =
\begin{cases}
s^y_{it}(\theta,\gamma)/\sqrt{n} & \text{if } t\in\{1,\dots,T\} \\
0 & \text{otherwise}
\end{cases}
$$

analogously to (20) and

$$
\tilde{s}^\varepsilon_{it}(\theta,\gamma) =
\begin{cases}
s^\varepsilon_t(\theta,\gamma)/\sqrt{\tau} & \text{if } t\in\{\tau_0+1,\dots,\tau_0+\tau\}\text{ and } i=1 \\
0 & \text{otherwise}
\end{cases}
$$

analogously to (19). Stack the moment vectors in

$$
\tilde{s}_{it}(\theta) := \tilde{s}_{it}(\theta,\gamma) = \big(\tilde{s}^y_{it}(\theta,\gamma)',\ \tilde{s}^\varepsilon_{it}(\theta,\gamma)'\big)'
\tag{66}
$$

and define the scaling matrix $D_n = \mathrm{diag}\big(n^{-1/2} I_y,\ \tau^{-1/2} I_\varepsilon\big)$, where $I_y$ is an identity matrix of dimension $k_y$ and $I_\varepsilon$ is an identity matrix of dimension $k_\varepsilon$. For the maximum likelihood estimator, the moment conditions (24) and (25) can be directly written as

$$
\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}\big(\hat\theta,\hat\gamma\big) = 0.
$$

For moment based estimators we have by Conditions 4(i) and (ii) that

$$
\sup_{\|\theta-\theta_0\|\le\varepsilon}\left\|\big(s^y_M(\theta,\gamma)',\ s^\varepsilon_M(\theta,\gamma)'\big)'
- \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}(\theta,\gamma)\right\| = o_p(1).
$$

It then follows that for the moment based estimators

$$
0 = s\big(\hat\theta\big) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}\big(\hat\theta,\hat\gamma\big) + o_p(1).
$$

A first order mean value expansion around $\theta_0$ leads to

$$
o_p(1) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}(\theta_0)
+ \left(\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\frac{\partial\tilde{s}_{it}}{\partial\theta'}\bigg|_{\bar\theta}\,D_n\right)D_n^{-1}\big(\hat\theta-\theta_0\big)
$$

or

$$
D_n^{-1}\big(\hat\theta-\theta_0\big)
= -\left(\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\frac{\partial\tilde{s}_{it}}{\partial\theta'}\bigg|_{\bar\theta}\,D_n\right)^{-1}
\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}(\theta_0) + o_p(1),
$$

where $\theta_0$ and $\hat\theta$ stack the parameters entering the cross-sectional and time-series moments, $\bar\theta$ lies between $\hat\theta$ and $\theta_0$, and we note that with some abuse of notation we implicitly allow for $\bar\theta$ to differ across rows of $\partial\tilde{s}_{it}/\partial\theta'$. Note that

$$
\frac{\partial\tilde{s}_{it}}{\partial\theta'} =
\begin{bmatrix}
\partial\tilde{s}^y_{it}(\theta,\gamma)/\partial\theta' & \partial\tilde{s}^y_{it}(\theta,\gamma)/\partial\gamma' \\
\partial\tilde{s}_{it,\varepsilon}(\theta,\gamma)/\partial\theta' & \partial\tilde{s}_{it,\varepsilon}(\theta,\gamma)/\partial\gamma'
\end{bmatrix},
$$

where $\tilde{s}_{it,\varepsilon}$ denotes moment conditions associated with $\gamma$. From Condition 4(iii) and Theorem 1 it follows that (note that we make use of the continuous mapping theorem, which is applicable because Theorem 1 establishes stable and thus joint convergence)

$$
D_n^{-1}\big(\hat\theta-\theta_0\big)
= -A(\theta_0)^{-1}\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde{s}_{it}(\theta_0) + o_p(1).
$$

It now follows from the continuous mapping theorem and joint convergence in Corollary 1 that

$$
D_n^{-1}\big(\hat\theta-\theta_0\big) \overset{d}{\to} -A(\theta_0)^{-1}\,\Omega^{1/2}\,W \qquad (\mathcal{C}\text{-stably}).
$$
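To make the role of the scaling matrix $D_n$ concrete, the following toy sketch (an assumed just-identified problem with two sample means, not the estimator of the paper) shows that applying $D_n^{-1} = \mathrm{diag}(n^{1/2} I_y, \tau^{1/2} I_\varepsilon)$ to $\hat\theta - \theta_0$ stabilizes both the cross-sectional and the time-series blocks across very different sample sizes.

```python
import numpy as np

rng = np.random.default_rng(5)
reps = 2000
mu_y, mu_e, sig_y, sig_e = 1.0, -0.5, 2.0, 1.0     # assumed toy parameters

for (n, tau) in [(200, 50), (2000, 500)]:
    draws = np.empty((reps, 2))
    for b in range(reps):
        y = rng.normal(mu_y, sig_y, size=n)        # cross-sectional sample, rate sqrt(n)
        e = rng.normal(mu_e, sig_e, size=tau)      # time-series sample, rate sqrt(tau)
        theta_hat = np.array([y.mean(), e.mean()])
        Dn_inv = np.diag([np.sqrt(n), np.sqrt(tau)])   # D_n^{-1} = diag(n^{1/2}, tau^{1/2})
        draws[b] = Dn_inv @ (theta_hat - np.array([mu_y, mu_e]))
    print(f"n={n:5d} tau={tau:4d}  var of D_n^-1*(theta_hat - theta0):", draws.var(axis=0).round(3))
```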
C.4 Proof of Corollary 2

Partition

$$
A(\theta_0) = \begin{bmatrix} A_{y,\theta} & A_{y,\gamma} \\ A_{\varepsilon,\theta} & A_{\varepsilon,\gamma} \end{bmatrix}
$$

with inverse $A(\theta_0)^{-1}$ given by the partitioned inverse formula. It now follows from the continuous mapping theorem and joint convergence in Corollary 1 that

$$
D_n^{-1}\big(\hat\theta-\theta_0\big) \overset{d}{\to} -A(\theta_0)^{-1}\,\Omega^{1/2}\,W \qquad (\mathcal{C}\text{-stably}),
$$

where the right hand side has a mixed normal distribution,

$$
-A(\theta_0)^{-1}\,\Omega^{1/2}\,W \sim MN\Big(0,\ A(\theta_0)^{-1}\,\Omega\,A(\theta_0)'^{-1}\Big),
$$

and the blocks of $A(\theta_0)^{-1}\,\Omega\,A(\theta_0)'^{-1}$ are formed from $A_{y,\theta}$, $A_{y,\gamma}$, $A_{\varepsilon,\gamma}$, $\Omega_y$ and $\Lambda(1)$. The form of the matrices $A_{y,\theta}$, $A_{y,\gamma}$, $A_{\varepsilon,\gamma}$, $\Omega_y$ and $\Omega_\varepsilon$ follows from Condition 5 in the case of the maximum likelihood estimator. For the moment based estimator, $\Omega_y$ and $\Omega_\varepsilon$ follow from Condition 6, the definition of $s^y_M(\theta,\gamma)$ and $s^\varepsilon_M(\theta,\gamma)$ and Conditions 4(i) and (ii).
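The mixed normal limit can be illustrated by simulation; in the sketch below the $\mathcal{C}$-measurable randomness is mimicked by a random scale for the time-series block of $\Omega$, and draws from $MN(0, A^{-1}\Omega A'^{-1})$ are generated by first drawing the random covariance and then a conditional normal. The matrix $A$ and the mixing distribution are arbitrary assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
reps = 5000

A = np.array([[1.0, 0.3],
              [0.0, 2.0]])                 # assumed Jacobian-type matrix A(theta0)
draws = np.empty((reps, 2))
for b in range(reps):
    # C-measurable randomness: a random scale for the time-series block of Omega.
    lam1 = rng.chisquare(3) / 3.0
    Omega = np.diag([1.0, lam1])           # Omega = diag(Omega_y, Lambda(1)), with Lambda(1) random
    cov = np.linalg.inv(A) @ Omega @ np.linalg.inv(A).T
    draws[b] = rng.multivariate_normal(np.zeros(2), cov)

# A mixed normal is a scale mixture: heavier tails than a normal with the same covariance.
print("sample covariance:\n", np.cov(draws.T).round(3))
print("excess kurtosis of second coordinate:",
      (np.mean(draws[:, 1] ** 4) / np.var(draws[:, 1]) ** 2 - 3).round(3))
```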
C.5 Proof of Theorem 3

We first establish the joint stable convergence of $(V_{\tau n}(r), Y_n)$. Recall that $V_{\tau n}(r) = \tau^{-1/2} V_{[\tau_0+\tau r]}$ and

$$
\tau^{-1/2} V_t = \exp\big((t-\min(1,\tau_0))\rho/\tau\big)\,\tau^{-1/2} V_{\min(1,\tau_0)}
+ 1\{t > \min(1,\tau_0)\}\,\tau^{-1/2}\sum_{s=\min(1,\tau_0)+1}^{t}\exp\big((t-s)\rho/\tau\big)\,v_s.
$$

Define

$$
\tilde V_{\tau n}(r) = \tau^{-1/2}\sum_{s=\min(1,\tau_0)+1}^{\tau_0+[\tau r]}\exp\big(-s\rho/\tau\big)\,v_s.
$$

It follows that

$$
V_{\tau n}(r) = \exp\big(([\tau r]-\min(1,\tau_0))\rho/\tau\big)\,V_{\tau n}(0)
+ 1\{t > \min(1,\tau_0)\}\,\exp\big([\tau r]\rho/\tau\big)\,\tilde V_{\tau n}(r).
$$

We establish joint stable convergence of $(\tilde V_{\tau n}(r), Y_n)$ and use the continuous mapping theorem to deal with the first term in $(V_{\tau n}(r), Y_n)$. By the continuous mapping theorem (see Billingsley (1968, p. 30)), the characterization of stable convergence on $D[0,1]$ (as given in JS, Theorem VIII 5.33(ii)) and an argument used in Kuersteiner and Prucha (2013, p. 119), stable convergence of $(\tilde V_{\tau n}(r), Y_n)$ implies that $(\exp([\tau r]\rho/\tau)\,\tilde V_{\tau n}(r), Y_n)$ also converges jointly and $\mathcal{C}$-stably. Subsequently, this argument will simply be referred to as the 'continuous mapping theorem'. In addition, $\exp\big(([\tau r]-\min(1,\tau_0))\rho/\tau\big)\,V_{\tau n}(0) \to_p \exp(r\rho)\,V(0)$, which is measurable with respect to $\mathcal{C}$. Together these results imply joint stable convergence of $(V_{\tau n}(r), Y_n)$. We thus turn to $(\tilde V_{\tau n}(r), Y_n)$. To apply Theorem 1 we need to show that $\varepsilon_{\tau,s} = \exp(-s\rho/\tau)\,v_s$ satisfies Conditions 1 iv) and 2. Since

$$
\big|\exp(-s\rho/\tau)\,v_s\big|^{2+\delta}
= \big|\exp(-s\rho/\tau)\big|^{2+\delta}\,|v_s|^{2+\delta}
\le e^{|\rho|(2+\delta)}\,|v_s|^{2+\delta},
\tag{67}
$$

it follows that $E\big[|\exp(-s\rho/\tau)\,v_s|^{2+\delta}\big] \le C$ and Condition 1 iv) holds. Note that $E\big[|v_t|^{2+\delta}\big] \le C$ holds since we impose Condition 7. Next, note that $E\big[\exp(-2s\rho/\tau)\,v_s^2\big] = \sigma_v^2\exp(-2s\rho/\tau)$. Then, it follows from the proof of Chan and Wei (1987, Equation 2.3) (see Appendix D for details) that

$$
\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\big(\varepsilon_{\tau,t}\big)^2
= \frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\exp(-2\rho t/\tau)\,v_t^2
\overset{p}{\to} \sigma_v^2\int_s^r \exp(-2\rho t)\,dt.
\tag{68}
$$
In this case, $\Lambda(r) = \sigma_v^2\big(1-\exp(-2r\rho)\big)/(2\rho)$ and $\dot\Lambda(r)^{1/2} = \sigma_v\exp(-\rho r)$. By the relationship in (64) and Theorem 1 we have that

$$
\big(\tilde V_{\tau n}(r),\ Y_n\big) \Rightarrow
\left(\sigma_v\int_0^r e^{-\rho s}\,dW(s),\ \Omega_y^{1/2} W_y(1)\right)\qquad\mathcal{C}\text{-stably},
$$

which implies, by the continuous mapping theorem and $\mathcal{C}$-stable convergence, that

$$
\big(V_{\tau n}(r),\ Y_n\big) \Rightarrow
\left(\exp(r\rho)\,V(0) + \sigma_v\int_0^r e^{(r-s)\rho}\,dW(s),\ \Omega_y^{1/2} W_y(1)\right)\qquad\mathcal{C}\text{-stably}.
\tag{69}
$$

Note that $\int_0^r e^{(r-s)\rho}\,dW(s)$ is the same term as in Phillips (1987), while the limit given in (69) is the same as in Kurtz and Protter (1991, p. 1043).
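The limit in (69) is easy to visualize; the sketch below simulates the local-to-unity recursion $V_t = e^{\rho/\tau} V_{t-1} + v_t$, rescales the endpoint to $V_{\tau n}(1)$, and compares its mean and variance with those of the Ornstein-Uhlenbeck functional $e^{\rho}V(0) + \sigma_v\int_0^1 e^{(1-s)\rho}\,dW(s)$. The values of $\rho$, $\sigma_v$, $V(0)$ and $\tau$ are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
tau, reps, rho, sig_v = 500, 20000, -1.0, 1.0
V0 = 0.5                                       # assumed limit of tau^{-1/2} * (initial level)

# Discrete local-to-unity recursion: V_t = exp(rho/tau) V_{t-1} + v_t, with V_0 = sqrt(tau)*V0.
a = np.exp(rho / tau)
V = np.full(reps, np.sqrt(tau) * V0)
for t in range(tau):
    V = a * V + rng.normal(scale=sig_v, size=reps)
V_tn_1 = V / np.sqrt(tau)

# OU limit at r = 1: exp(rho)*V0 + sig_v * int_0^1 exp((1-s)*rho) dW(s), a normal with
# mean exp(rho)*V0 and variance sig_v^2 * (exp(2*rho) - 1) / (2*rho).
mean_lim = np.exp(rho) * V0
var_lim = sig_v ** 2 * (np.exp(2 * rho) - 1) / (2 * rho)
print("simulated mean/var :", V_tn_1.mean().round(3), V_tn_1.var().round(3))
print("OU limit  mean/var :", round(mean_lim, 3), round(var_lim, 3))
```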
We now square (28) and sum both sides as in Chan and Wei (1987, Equation (2.8)) or Phillips (1987) to write

$$
\frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}\,v_s
= \frac{e^{-\rho/\tau}}{2}\left[\frac{1}{\tau}\Big(V_{\tau_0+\tau}^2 - V_{\tau_0}^2\Big)
- \frac{e^{2\rho/\tau}-1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}^2
- \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} v_s^2\right].
\tag{70}
$$

We note that $e^{-\rho/\tau}\to 1$ and $\tau\big(e^{2\rho/\tau}-1\big)\to 2\rho$ as $\tau\to\infty$. Furthermore, note that for all $\delta,\varepsilon > 0$ it follows by the Markov and triangular inequalities and Condition 7iv) that

$$
P\left(\frac{1}{\tau}\sum_{t=\tau_0+1}^{\tau_0+\tau}
E\Big[v_t^2\,1\big\{|v_t| > \delta\,\tau^{1/2}\big\}\,\Big|\,\mathcal{G}_{n,tn}\Big] > \varepsilon\right)
\le \frac{1}{\varepsilon\,\tau}\sum_{t=\tau_0+1}^{\tau_0+\tau}
E\Big[v_t^2\,1\big\{|v_t| > \delta\,\tau^{1/2}\big\}\Big]
\le \frac{\sup_t E\big[|v_t|^{2+\delta}\big]}{\varepsilon\,\delta^{\delta}\,\tau^{\delta/2}} \to 0
$$

as $\tau\to\infty$, such that Condition 1.3 of Chan and Wei (1987) holds. Let $U_\tau^{2,k} = \frac{1}{\tau}\sum_{t=\tau_0+1}^{k+\tau_0} E\big[v_t^2\mid\mathcal{G}_{n,tn}\big]$. Then, by Holder's and Jensen's inequality,

$$
E\Big[\big|U_\tau^{2,\tau}\big|^{1+\delta/2}\Big]
\le \frac{1}{\tau}\sum_{t=\tau_0+1}^{\tau_0+\tau}
E\Big[E\big[v_t^2\mid\mathcal{G}_{n,tn}\big]^{1+\delta/2}\Big]
\le \sup_t E\Big[|v_t|^{2+\delta}\Big] < \infty,
\tag{71}
$$

such that $U_\tau^{2,\tau}$ is uniformly integrable. The bound in (71) also means that by Theorem 2.23 of Hall and Heyde it follows that $E\big[\big|U_\tau^{2,\tau} - \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} v_s^2\big|\big] \to 0$, and thus by Condition 7 vii) and by Markov's inequality

$$
\frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} v_s^2 \overset{p}{\to} \sigma_v^2.
$$

We also have

$$
\frac{1}{\tau}V_{\tau_0+\tau}^2 = V_{\tau n}(1)^2, \qquad
\frac{1}{\tau}V_{\tau_0}^2 \overset{p}{\to} V(0)^2, \qquad
\frac{1}{\tau^2}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}^2 = \int_0^1 V_{\tau n}(r)^2\,dr,
\tag{72}
$$

such that by the continuous mapping theorem and (69) it follows that

$$
\frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}\,v_s
\Rightarrow \frac{1}{2}\left(V_{\rho,V(0)}(1)^2 - V(0)^2 - \sigma_v^2 - 2\rho\int_0^1 V_{\rho,V(0)}(r)^2\,dr\right).
\tag{73}
$$

An application of Ito's calculus to $V_{\rho,V(0)}(r)^2/2$ shows that the RHS of (73) is equal to $\sigma_v\int_0^1 V_{\rho,V(0)}\,dW$, which also appears in Kurtz and Protter (1991, Equation 3.10). However, note that the results in Kurtz and Protter (1991) do not establish stable convergence and thus don't directly apply here. When $V(0) = 0$ these expressions are the same as in Phillips (1987, Equation 8). It then is a further consequence of the continuous mapping theorem that

$$
\left(V_{\tau n}(r),\ Y_n(r),\ \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}\,v_s\right)
\Rightarrow
\left(V_{\rho,V(0)}(r),\ Y(r),\ \sigma_v\int_0^1 V_{\rho,V(0)}\,dW\right)
\qquad (\mathcal{C}\text{-stably}).
$$
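The algebra behind (70) holds exactly in finite samples, so it can be verified directly on a simulated path; the sketch below does so for assumed values of $\rho$, $\tau$ and the innovation distribution.

```python
import numpy as np

rng = np.random.default_rng(8)
tau, rho, sig_v = 1000, -0.8, 1.0
a = np.exp(rho / tau)

v = rng.normal(scale=sig_v, size=tau)
V = np.empty(tau + 1)
V[0] = 3.0                                    # arbitrary assumed starting level V_{tau_0}
for s in range(1, tau + 1):
    V[s] = a * V[s - 1] + v[s - 1]            # V_s = e^{rho/tau} V_{s-1} + v_s

lhs = (V[:-1] * v).sum() / tau                # (1/tau) * sum_s V_{s-1} v_s
rhs = (np.exp(-rho / tau) / 2.0) * (
    (V[-1] ** 2 - V[0] ** 2) / tau
    - (np.exp(2 * rho / tau) - 1.0) / tau * (V[:-1] ** 2).sum()
    - (v ** 2).sum() / tau
)
print("LHS of (70):", lhs, "  RHS of (70):", rhs)
```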
C.6 Proof of Theorem 4

For $\tilde s_{it}(\theta) = \big(\tilde s^y_{it}(\theta,\gamma)',\ \tilde s_{it,\varepsilon}(\theta)'\big)'$ we note that in the case of the unit root model

$$
\frac{\partial\tilde s_{it}(\theta)}{\partial\theta'} =
\begin{bmatrix}
\partial\tilde s^y_{it}(\theta,\gamma)/\partial\theta' & \partial\tilde s^y_{it}(\theta,\gamma)/\partial\rho \\
0 & \partial\tilde s_{it,\varepsilon}(\theta)/\partial\rho
\end{bmatrix}.
$$

Defining

$$
A_{yn}(\theta) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\frac{\partial\tilde s^y_{it}(\theta,\gamma)}{\partial\theta'}\,D_n,
$$

and letting $A_n(\theta)$ denote the scaled Jacobian whose upper-left block is $A_{yn}(\theta)$, whose lower-left block is zero, and whose lower-right block is $\tau^{-2}\sum_{t=\tau_0+1}^{\tau_0+\tau} V_{t-1}^2$, we have as before, for some $\bar\theta$ between $\hat\theta$ and $\theta_0$, that

$$
D_n^{-1}\big(\hat\theta-\theta_0\big)
= -A_n\big(\bar\theta\big)^{-1}
\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}(\theta_0).
$$
Using the representation

$$
\frac{1}{\tau^2}\sum_{t=\tau_0+1}^{\tau_0+\tau} V_{t-1}^2 = \int_0^1 V_{\tau n}(r)^2\,dr,
$$

it follows from the continuous mapping theorem and Theorem 3 that

$$
\left(V_{\tau n}(r),\ Y_n,\ A_{yn}(\theta_0),\ \int_0^1 V_{\tau n}(r)^2\,dr,\ \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} V_{s-1}\,v_s\right)
\Rightarrow
\left(V_{\rho,V(0)}(r),\ \Omega_y^{1/2}W_y(1),\ A_y(\theta_0),\ \int_0^1 V_{\rho,V(0)}(r)^2\,dr,\ \sigma_v\int_0^1 V_{\rho,V(0)}\,dW\right)
\tag{74}
$$

($\mathcal{C}$-stably). The partitioned inverse formula implies that

$$
A(\theta_0)^{-1} =
\begin{bmatrix}
A_{y,\theta}^{-1} & -A_{y,\theta}^{-1}\,A_{y,\rho}\left(\int_0^1 V_{\rho,V(0)}(r)^2\,dr\right)^{-1} \\
0 & \left(\int_0^1 V_{\rho,V(0)}(r)^2\,dr\right)^{-1}
\end{bmatrix}.
\tag{75}
$$

By Condition 9, (74) and the continuous mapping theorem it follows that

$$
D_n^{-1}\big(\hat\theta-\theta_0\big)
\Rightarrow
-A(\theta_0)^{-1}
\begin{bmatrix}
\Omega_y^{1/2}\,W_y(1) \\
\sigma_v\int_0^1 V_{\rho,V(0)}\,dW
\end{bmatrix}.
\tag{76}
$$

The result now follows immediately from (75) and (76).
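To connect the limit in (75)-(76) with something computable, the sketch below simulates the pure time-series block of the unit root model (the cross-sectional block is dropped for simplicity, an assumption made only here), computes the scaled least squares error $\tau(\hat a - a_0)$, and reports its skewed, Dickey-Fuller-type distribution, which reflects the ratio $\int_0^1 V\,dW\big/\int_0^1 V(r)^2\,dr$ appearing above.

```python
import numpy as np

rng = np.random.default_rng(9)
tau, reps, rho, sig_v = 400, 2000, 0.0, 1.0    # rho = 0 gives the exact unit root case
a0 = np.exp(rho / tau)

stats = np.empty(reps)
for b in range(reps):
    v = rng.normal(scale=sig_v, size=tau)
    V = np.empty(tau + 1)
    V[0] = 0.0
    for s in range(1, tau + 1):
        V[s] = a0 * V[s - 1] + v[s - 1]
    num = (V[:-1] * v).sum() / tau             # (1/tau)  * sum V_{s-1} v_s  ->  sig_v * int V dW
    den = (V[:-1] ** 2).sum() / tau ** 2       # (1/tau^2)* sum V_{s-1}^2    ->  int V(r)^2 dr
    stats[b] = num / den                       # = tau * (a_hat - a0), the scaled LS estimation error

# The skewed shape reflects the int V dW / int V^2 dr limit when V(0) = 0 and rho = 0.
print("mean:", stats.mean().round(3), " median:", np.median(stats).round(3),
      " 5%/95% quantiles:", np.quantile(stats, [0.05, 0.95]).round(3))
```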
D Proof of (68)

Lemma 1 Assume that Conditions 7, 8 and 9 hold. For $r,s\in[0,1]$ fixed and as $\tau\to\infty$ it follows that

$$
\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}
\left[\big(\varepsilon_{\tau,t}\big)^2 - e^{-2\rho t/\tau}\,E\Big[v_t^2\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]\right]
\overset{p}{\to} 0.
$$
Proof. By Hall and Heyde (1980, Theorem 2.23) we need to show that for all $\varepsilon > 0$

$$
\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}
e^{-2\rho t/\tau}\,E\Big[v_t^2\,1\big\{\big|e^{-\rho t/\tau}v_t\big| > \varepsilon\,\tau^{1/2}\big\}\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]
\overset{p}{\to} 0.
\tag{77}
$$

By Condition 7iv) it follows that for some $\delta > 0$

$$
E\left[\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}
e^{-2\rho t/\tau}\,E\Big[v_t^2\,1\big\{\big|e^{-\rho t/\tau}v_t\big| > \varepsilon\,\tau^{1/2}\big\}\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]\right]
\le \frac{[\tau r]-[\tau s]}{\tau^{1+\delta/2}\,\varepsilon^{\delta}}\,e^{(2+\delta)|\rho|}\,
\sup_t E\Big[|v_t|^{2+\delta}\Big] \to 0.
$$

This establishes (77) by the Markov inequality. Since $\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2\rho t/\tau}\,E\big[v_t^2\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big]$ is uniformly integrable by (67) and (71), it follows from Hall and Heyde (1980, Theorem 2.23, Eq 2.28) that

$$
E\left[\left|\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}
\left[\big(\varepsilon_{\tau,t}\big)^2 - e^{-2\rho t/\tau}\,E\Big[v_t^2\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]\right]\right|\right] \to 0.
$$

The result now follows from the Markov inequality.
Lemma 2 Assume that Conditions 7, 8 and 9 hold. For $r,s\in[0,1]$ fixed and as $\tau\to\infty$ it follows that

$$
\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}
e^{-2\rho t/\tau}\,E\Big[v_t^2\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]
\overset{p}{\to} \sigma_v^2\int_s^r \exp(-2\rho t)\,dt.
$$
Proof. The proof closely follows Chan and Wei (1987, p. 1060-1062) with a few necessary adjustments. Fix $\delta > 0$ and choose $s = t_0 \le t_1 \le \dots \le t_k = r$ such that

$$
\max_{i\le k}\ \big|e^{-2\rho t_i} - e^{-2\rho t_{i-1}}\big| < \delta.
$$

This implies

$$
\left|\int_s^r e^{-2\rho t}\,dt - \sum_{i=1}^{k} e^{-2\rho t_i}\,(t_i - t_{i-1})\right|
\le \sum_{i=1}^{k}\int_{t_{i-1}}^{t_i}\big|e^{-2\rho t} - e^{-2\rho t_i}\big|\,dt \le \delta.
\tag{78}
$$

Let $I_i = \{l : [\tau t_{i-1}] < l \le [\tau t_i]\}$. Then,

$$
\frac{1}{\tau}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2\rho t/\tau}\,E\big[v_t^2\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big]
- \sigma_v^2\int_s^r e^{-2\rho t}\,dt
$$
$$
= \frac{1}{\tau}\sum_{i=1}^{k}\sum_{l\in I_i}\Big(e^{-2\rho l/\tau} - e^{-2\rho[\tau t_{i-1}]/\tau}\Big)\,E\big[v_l^2\mid\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big]
$$
$$
\quad + \sum_{i=1}^{k} e^{-2\rho[\tau t_{i-1}]/\tau}\left(\frac{1}{\tau}\sum_{l\in I_i}E\big[v_l^2\mid\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma_v^2\,(t_i - t_{i-1})\right)
$$
$$
\quad + \sigma_v^2\left(\sum_{i=1}^{k} e^{-2\rho[\tau t_{i-1}]/\tau}\,(t_i - t_{i-1}) - \int_s^r e^{-2\rho t}\,dt\right)
= I_n + II_n + III_n.
$$

For $III_n$ we have that $e^{-2\rho[\tau t_{i-1}]/\tau} \to e^{-2\rho t_{i-1}}$ as $\tau\to\infty$. In other words, there exists a $\tau'$ such that for all $\tau \ge \tau'$, $\big|e^{-2\rho[\tau t_{i-1}]/\tau} - e^{-2\rho t_{i-1}}\big| \le \delta$, and by (78)

$$
|III_n| \le 2\,\delta\,\sigma_v^2.
$$

We also have by Condition 7vii) that

$$
\frac{1}{\tau}\sum_{l\in I_i}E\big[v_l^2\mid\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma_v^2\,(t_i - t_{i-1}) \overset{p}{\to} 0
$$

as $\tau\to\infty$, such that, by $\max_{i\le k} e^{-2\rho[\tau t_{i-1}]/\tau} \le e^{2|\rho|}$,

$$
|II_n| \le e^{2|\rho|}\sum_{i=1}^{k}\left|\frac{1}{\tau}\sum_{l\in I_i}E\big[v_l^2\mid\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma_v^2\,(t_i - t_{i-1})\right| = o_p(1).
$$

Finally, there exists a $\tau'$ such that for all $\tau \ge \tau'$ it follows that

$$
\max_{i\le k}\ \max_{l\in I_i}\ \big|e^{-2\rho l/\tau} - e^{-2\rho[\tau t_{i-1}]/\tau}\big|
\le 2\max_{i\le k}\big|e^{-2\rho[\tau t_i]/\tau} - e^{-2\rho t_i}\big|
+ \max_{i\le k}\big|e^{-2\rho t_i} - e^{-2\rho t_{i-1}}\big|
\le 2\delta + \delta = 3\delta.
$$

We conclude that

$$
|I_n| \le 3\,\delta\,\frac{1}{\tau}\sum_{i=1}^{k}\sum_{l\in I_i}E\big[v_l^2\mid\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big]
= 3\,\delta\,\sigma_v^2\,(1 + o_p(1)).
$$

The remainder of the proof is identical to Chan and Wei (1987, p. 1062).
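Finally, the convergence (68) established by Lemmas 1 and 2 can be inspected numerically; the sketch below compares $\tau^{-1}\sum_t e^{-2\rho t/\tau} v_t^2$ over $t\in(\tau s, \tau r]$ with $\sigma_v^2\int_s^r e^{-2\rho t}\,dt$ for increasing $\tau$, using an assumed i.i.d. innovation sequence.

```python
import numpy as np

rng = np.random.default_rng(10)
rho, sig_v, s, r = -0.7, 1.3, 0.2, 0.9
# sig_v^2 * int_s^r exp(-2*rho*t) dt, computed in closed form.
target = sig_v ** 2 * (np.exp(-2 * rho * s) - np.exp(-2 * rho * r)) / (2 * rho)

for tau in (100, 1000, 10000, 100000):
    t = np.arange(int(tau * s) + 1, int(tau * r) + 1)
    v = rng.normal(scale=sig_v, size=t.size)
    approx = np.sum(np.exp(-2 * rho * t / tau) * v ** 2) / tau
    print(f"tau={tau:6d}   (1/tau)*sum = {approx:.4f}   limit = {target:.4f}")
```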