Estimation with Aggregate Shocks

Jinyong Hahn (UCLA), Guido Kuersteiner (University of Maryland), and Maurizio Mazzocco (UCLA)

February 10, 2016

Abstract

Aggregate shocks affect most households' and firms' decisions. Using two stylized models, we show that inference based on cross-sectional data alone generally fails to account correctly for the decision making of rational agents facing aggregate uncertainty. We propose an econometric framework that overcomes these problems by explicitly parametrizing the agents' inference problem relative to aggregate shocks. We advocate the combined use of cross-sectional and time-series data. Our framework and the corresponding examples illustrate that the cross-sectional and time-series aspects of the model are often interdependent. Our inferential theory rests on a new joint Central Limit Theorem for combined time-series and cross-sectional data sets. In models with aggregate shocks, the asymptotic distribution of the parameter estimators is a combination of asymptotic distributions from cross-sectional analysis and time-series analysis.

Author contact information: Jinyong Hahn, UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop 147703, Los Angeles, CA 90095, hahn@econ.ucla.edu. Guido Kuersteiner, University of Maryland, Department of Economics, Tydings Hall 3145, College Park, MD 20742, kuersteiner@econ.umd.edu. Maurizio Mazzocco, UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop 147703, Los Angeles, CA 90095, mmazzocc@econ.ucla.edu.

1 Introduction

An extensive body of economic research suggests that aggregate shocks have important effects on households' and firms' decisions. Consider, for instance, the oil shock that hit developed countries in 1973. A large literature has provided evidence that this aggregate shock triggered a recession in the United States, where the demand and supply of non-durable and durable goods declined, inflation grew, the unemployment rate increased, and real wages dropped.

The profession has generally adopted one of three strategies to deal with aggregate shocks. The most common strategy is to assume that aggregate shocks have no effect on households' and firms' decisions and, hence, that aggregate shocks can be ignored. Almost all papers estimating discrete choice dynamic models or dynamic games are based on this premise. Examples include Keane and Wolpin (1997), Bajari, Benkard, and Levin (2007), and Eckstein and Lifshitz (2011). We show that, if aggregate shocks are an important feature of the data, ignoring them generally leads to inconsistent parameter estimates. The second approach is to include time dummies in the model in an attempt to capture the effect of aggregate shocks on the estimation of the parameters of interest, as was done for instance in Runkle (1991) and Shea (1995). We clarify that, if the econometrician does not account properly for aggregate shocks, the parameter estimates will generally be inconsistent even if the actual realizations of the aggregate shocks are observed. The inclusion of time dummies, therefore, fails to solve the problem (footnote 1). The last strategy is to fully specify how aggregate shocks affect individual decisions jointly with the rest of the structure of the economic problem. Using this approach, the econometrician can obtain consistent estimates of the parameters of interest. We are aware of only one paper that uses this strategy, Lee and Wolpin (2010). Their paper is primarily focused on the estimation of a specific empirical model.
They do not address the broader question of which statistical assumptions and what type of data requirements are needed more generally to obtain consistent estimators when aggregate shocks are present. Moreover, as we argue later on, in Lee and Wolpin's (2010) paper there are issues with statistical inference and efficiency.

Footnote 1: In the Euler equation context, Chamberlain (1984) uses examples to argue that, when aggregate shocks are present but disregarded, the estimated parameters are generally inconsistent. His examples make clear that time dummies do not solve the problems introduced by the presence of aggregate shocks.

The previous discussion reveals that there is no generally agreed-upon econometric framework for statistical inference in models where aggregate shocks have an effect on individual decisions. We show that inference based on cross-sectional data alone generally fails to account correctly for the decision making of rational agents facing aggregate uncertainty. By parametrizing aggregate uncertainty and explicitly accounting for it when solving the agents' decision problem, we are able to offer an econometric framework that overcomes these problems. We advocate the combined use of cross-sectional and time-series data, and we develop simple-to-use formulas for test statistics and confidence intervals that enable the combined use of time-series and cross-sectional data.

We proceed in two steps. We first introduce a general class of models with the following two features. First, each model in this class is composed of two submodels. The first submodel includes all the cross-sectional features, whereas the second submodel is composed of all the time-series aspects. As a consequence, the parameters of the model can also be divided into two groups: the parameters that characterize the cross-sectional submodel and the parameters that enter the time-series submodel. The second feature is that the two submodels are linked by a vector of aggregate shocks and by the parameters that govern their dynamics. Individual decision making thus depends on aggregate shocks.

Given the interplay between the two submodels, the aggregate shocks have complicated effects on the estimation of the parameters of interest. To better understand those effects, we present stylized examples of the general framework that illustrate the complexities introduced by the aggregate shocks. The examples show that, if the existence of aggregate shocks is ignored in the estimation of these models, the estimated parameters will generally be inconsistent and biased. An important implication of this result is that the presence of aggregate shocks generally invalidates cross-sectional inference. The examples show that, in general, the parameters of our models can be consistently estimated only if cross-sectional variation is used in combination with time-series variation. We argue that the best approach to consistently estimate the parameters of the investigated models is to combine cross-sectional data with a long time series of aggregate variables. An alternative method would be to combine cross-sectional and time-series variation by using panel data. Panel data, however, are generally too short to achieve consistency, whereas long time-series data are easier to find for most of the variables that are of interest to economists.

In the second part of the paper, we derive the asymptotic distribution of the parameter estimators.
The distribution is derived by assuming that the dimension n of the cross-sectional data as well as the dimension of the time-series data go to infinity (footnote 2). The derivation of the asymptotic distribution of the parameter estimators for our class of models is important for two reasons. First, the asymptotic distribution enables economists who estimate models with aggregate shocks to perform statistical inference. To our knowledge, a framework to perform statistical inference for our class of models had not been derived in the theoretical econometrics literature (footnote 3). Second, the derivation of the asymptotic distribution can be used to show that, in our general class of models, consistency can be achieved only if the dimension of the time-series data goes to infinity. Letting the dimension of the cross-sectional data go to infinity is not sufficient. This is a remarkable result because it indicates that even the parameters of the cross-sectional submodel will be inconsistent if the time series is not sufficiently long. This result explains why time series of aggregate data should generally be preferred to panel data, the latter typically having a short time dimension.

Our asymptotic results are also of interest from a technical point of view. In order to deal with parameters estimated using two data sets of a completely different nature, we adopt the notion of stable convergence and establish a Central Limit Theorem. Stable convergence dates back to Renyi (1963) and was recently used in Kuersteiner and Prucha (2013) in a panel data context to establish joint limiting distributions. Using this concept, we show that the asymptotic distributions of the parameter estimators are a combination of asymptotic distributions from cross-sectional analysis and time-series analysis. We also derive a novel result related to the unit root literature. We show that, when the time-series data are characterized by unit roots, the asymptotic distribution is a combination of a normal distribution and the distribution found in the unit root literature. The asymptotic distribution therefore exhibits mathematical similarity to the inferential problem in predictive regressions, as discussed by Campbell and Yogo (2006). While the similarity is superficial, in that Campbell and Yogo's (2006) result is about an estimator based on a single data source, it leads to the same need to address uniform inference. Phillips (2014) proposes a method of uniform inference for predictive regressions, which we adopt and modify to our own estimation problem in the unit root case.

Footnote 2: If the econometrician uses panel data combined with time-series data, it is assumed that the time dimension T of the panel remains constant.

Footnote 3: Lee and Wolpin (2006, footnote 37) attempt to use standard statistical inference. Donghoon Lee, in private communication, kindly informed us that the standard errors in their paper do not account for the noise introduced by the estimation of the time-series parameters; i.e., the standard errors reported in the paper assume that the time-series parameters are known or, equivalently, fixed at their estimated values.

Our paper also contributes to a growing literature whose objective is the estimation of general equilibrium models. Some examples of papers in this literature are Heckman and Sedlacek (1985), Heckman, Lochner, and Taber (1998), Lee (2005), Lee and Wolpin (2006), Gemici and Wiswall (2011), and Gillingham, Iskhakov, Munk-Nielsen, Rust, and Schjerning (2015). Aggregate shocks are a natural feature of general equilibrium models.
Without them, those models have the unpleasant implication that all aggregate variables can be fully explained by observables and, hence, that errors have no effect on those variables. Our general econometric framework makes this point clear by highlighting the impact of aggregate shocks on parameter estimation and the variation required in the data to estimate those models. More importantly, our results provide formulas that can be used to perform statistical inference in a general equilibrium context.

The remainder of the paper is organized as follows. In Section 2, we introduce the general model and discuss two examples. In Section 3, we present the intuition underlying our main result, which is presented in Section 4.

2 Model

This section is composed of four parts. In the first subsection, we introduce in a general form the identification problem generated by the existence of aggregate shocks. In the second subsection, we describe a simple stylized example which illustrates that, in the presence of aggregate shocks, the estimates of the model parameters will generally be inconsistent and biased if the econometrician uses exclusively cross-sectional data. In the third subsection, we consider a general equilibrium example that highlights the complex relationship between the estimation of the cross-sectional and time-series parameters when aggregate shocks are present. In the last subsection, we address the question of whether long panels can be used to solve the problems considered in this paper.

2.1 The General Identification Problem

We consider a class of models with four main features. First, the model can be divided into two parts. The first part encompasses all the static aspects of the model and will be denoted with the term cross-sectional submodel. The second part includes the dynamic aspects of the aggregate variables and will be denoted with the term time-series submodel. Second, the two submodels are linked by the presence of a vector of aggregate shocks $\nu_t$ and by the parameters that govern their dynamics. Third, the vector of aggregate shocks may not be observed. If that is the case, it is treated as a set of parameters to be estimated. Lastly, the parameters of the model can be consistently estimated only if a combination of cross-sectional and time-series data is available.

We now formally introduce the general model. The variables that characterize the model can be divided into two vectors, $y_{i,t}$ and $z_s$. The first vector, $y_{i,t}$, includes all the variables that characterize the cross-sectional submodel, where $i$ denotes an individual decision-maker, a household or a firm, and $t$ a time period in the cross-section (footnote 4). The second vector, $z_s$, is composed of all the variables associated with the time-series submodel. Accordingly, the parameters of the general model can be divided into two sets, $\theta$ and $\lambda$. The first set of parameters, $\theta$, characterizes the cross-sectional submodel, in the sense that, if the second set $\lambda$ were known, $\theta$ and $\nu_t$ could be consistently estimated using exclusively variation in the cross-sectional variables $y_{i,t}$. Similarly, the vector $\lambda$ characterizes the time-series submodel, meaning that, if $\theta$ and $\nu_t$ were known, those parameters could be consistently estimated using exclusively the time-series variables $z_s$. There are two functions that relate the cross-sectional and time-series variables to the parameters. The function $f(y_{i,t} \mid \theta, \nu_t, \lambda)$ restricts the behavior of the cross-sectional variables conditional on a particular value of the parameters.
Analogously, the function $g(z_s \mid \theta, \lambda)$ describes the behavior of the time-series variables for a given value of the parameters. An example is a situation in which (i) the variables $y_{i,t}$ for $i = 1, \dots, n$ are i.i.d. given the aggregate shock $\nu_t$, (ii) the variables $z_s$ correspond to $(\nu_s, \nu_{s-1})$, (iii) the cross-sectional function $f(y_{i,t} \mid \theta, \nu_t, \lambda)$ denotes the log likelihood of $y_{i,t}$ given the aggregate shock $\nu_t$, and (iv) the time-series function $g(z_s \mid \theta, \lambda) = g(\nu_s \mid \nu_{s-1}, \lambda)$ is the log of the conditional probability density function of the aggregate shock $\nu_s$ given $\nu_{s-1}$. In this special case the time-series function $g$ does not depend on the cross-sectional parameters $\theta$. More generally, the functions $f$ and $g$ may be the basis for method-of-moments (the exactly identified case) or GMM (the overidentified case) estimation. In these situations the parameters are identified from the conditions
$$E_C\left[f(y_{i,t} \mid \theta, \nu_t, \lambda)\right] = 0 \ \text{ given the shock } \nu_t, \qquad \text{and} \qquad E\left[g(z_s \mid \theta, \lambda)\right] = 0.$$
The first expectation, $E_C$, is understood as being over the cross-section population distribution holding $\nu_t$ fixed, while the second, $E$, is over the stationary distribution of the time-series data generating process.

Footnote 4: Even though the time subscript t is not necessary in this subsection, we keep it here for notational consistency because later we consider the case where longitudinal data are collected.

We assume that our cross-sectional data consist of $\{y_{i,t};\ i = 1, \dots, n\}$ and our time-series data consist of $\{z_s;\ s = \tau_0+1, \dots, \tau_0+\tau\}$. For simplicity, we assume that $\tau_0 = 0$ in this section. The parameters of the general model can be estimated by maximizing a well-specified objective function, for instance the likelihood function of the model or the negative of a GMM criterion function (footnote 5). Since in our case the general framework is composed of two submodels, a natural approach is to estimate the parameters of interest by maximizing two separate objective functions, one for the cross-sectional submodel and one for the time-series submodel. We denote these criterion functions by $F_n(\theta, \nu_t, \lambda)$ and $G_\tau(\theta, \lambda)$. In the case of maximum likelihood these functions are simply
$$F_n(\theta, \nu_t, \lambda) = \frac{1}{n}\sum_{i=1}^{n} f(y_{i,t} \mid \theta, \nu_t, \lambda) \qquad \text{and} \qquad G_\tau(\theta, \lambda) = \frac{1}{\tau}\sum_{s=1}^{\tau} g(z_s \mid \theta, \lambda).$$
In the case of moment-based estimation, we define the empirical moment functions $h_n(\theta, \nu_t, \lambda) = \frac{1}{n}\sum_{i=1}^{n} f(y_{i,t} \mid \theta, \nu_t, \lambda)$ and $k_\tau(\theta, \lambda) = \frac{1}{\tau}\sum_{s=1}^{\tau} g(z_s \mid \theta, \lambda)$. For two conforming, almost surely positive definite weight matrices $W_n^C$ and $W_\tau^T$, we can then define $F_n(\theta, \nu_t, \lambda) = -h_n(\theta, \nu_t, \lambda)' W_n^C h_n(\theta, \nu_t, \lambda)$ and $G_\tau(\theta, \lambda) = -k_\tau(\theta, \lambda)' W_\tau^T k_\tau(\theta, \lambda)$. The use of two separate objective functions is helpful in our context because it enables us to discuss which issues arise if only cross-sectional variables or only time-series variables are used in the estimation (footnote 6).

Footnote 5: Our discussion is motivated by Newey and McFadden's (1994) unified treatment of maximum likelihood and GMM as extremum estimators.

Footnote 6: Note that our framework covers the case where the joint distribution of $(y_{i,t}, z_t)$ is modelled. Considering the two components separately adds flexibility in that data are not required for all variables in the same period.

We now formalize the idea that the identification of the parameters of the general model requires the joint use of cross-sectional and time-series data. Let $F(\theta, \nu_t, \lambda)$ and $G(\theta, \lambda)$ be the probability limits of the objective functions described above, i.e.,
$$F(\theta, \nu_t, \lambda) = \operatorname*{plim}_{n \to \infty} F_n(\theta, \nu_t, \lambda), \qquad G(\theta, \lambda) = \operatorname*{plim}_{\tau \to \infty} G_\tau(\theta, \lambda).$$
Let the argmax of those probability limits be defined as follows:
$$(\theta(\lambda), \nu(\lambda)) \equiv \operatorname*{argmax}_{\theta,\nu} F(\theta, \nu, \lambda), \qquad \lambda(\theta) \equiv \operatorname*{argmax}_{\lambda} G(\theta, \lambda),$$
and denote by $\theta_0$ and $\lambda_0$ the true values of the cross-sectional and time-series parameters. Observe that, if the objective function $F$ of the cross-sectional submodel evaluated at the parameters that maximize it takes the same value for any feasible parameter $\lambda$, then $\lambda$ cannot be consistently estimated using exclusively the data that characterize the cross-sectional submodel. Based on this observation, we can express the idea that with cross-sectional data alone a subset of the parameters of interest cannot be consistently estimated as follows:
$$\max_{\theta,\nu} F(\theta, \nu, \lambda) = \max_{\theta,\nu} F(\theta, \nu, \lambda_0) \quad \text{for all } \lambda \in \Lambda. \qquad (1)$$
We also assume that $\lambda_0$ is needed for the consistent estimation of $\theta_0$ and $\nu_t$:
$$(\theta(\lambda), \nu(\lambda)) \neq (\theta_0, \nu_t) \quad \text{for all } \lambda \neq \lambda_0. \qquad (2)$$
Analogously, for the class of models we consider, time-series data alone are not sufficient to consistently estimate the time-series parameters $\lambda$. This idea can be formalized in the following way:
$$\max_{\lambda} G(\theta, \lambda) = \max_{\lambda} G(\theta_0, \lambda) \quad \text{for all } \theta \in \Theta, \qquad \lambda(\theta) \neq \lambda_0 \quad \text{for all } \theta \neq \theta_0.$$
In our class of models, however, all the parameters of interest can be consistently estimated if cross-sectional data are combined with time-series data. Define the stacked parameter vector $\xi \equiv (\theta', \nu_t', \lambda')'$ and impose the following two assumptions: (i) there exists a unique solution to the system of equations
$$\left(\frac{\partial F(\theta, \nu_t, \lambda)}{\partial \xi'},\ \frac{\partial G(\theta, \lambda)}{\partial \xi'}\right) = 0, \qquad (3)$$
and (ii) the solution is given by the true value of the parameters.
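To make the two-criterion structure concrete, the following sketch shows one way the estimation could be organized in the special case described above, in which $g$ does not depend on $\theta$: the time-series parameters $\lambda$ are estimated first from $\{z_s\}$, and $(\theta, \nu_t)$ are then estimated from the cross-section with $\lambda$ held fixed. The moment functions f and g, the starting values, and the weight matrices are placeholders to be supplied by a specific model; this is an illustrative sketch, not the estimator studied formally in Sections 3 and 4.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(moment_avg, W):
    """Quadratic-form GMM criterion h' W h (minimized here; maximizing its negative is equivalent)."""
    return float(moment_avg @ W @ moment_avg)

def estimate_time_series(z, g, lam0, W_T):
    """Step 1: estimate the time-series parameters lambda from {z_s} alone,
    for the special case in which g does not depend on theta."""
    def crit(lam):
        k = np.mean([g(z_s, lam) for z_s in z], axis=0)
        return gmm_objective(k, W_T)
    return minimize(crit, lam0, method="Nelder-Mead").x

def estimate_cross_section(y, f, lam_hat, theta_nu0, W_C):
    """Step 2: estimate (theta, nu_t) from the cross-section {y_i},
    holding lambda fixed at the first-step estimate."""
    def crit(theta_nu):
        h = np.mean([f(y_i, theta_nu, lam_hat) for y_i in y], axis=0)
        return gmm_objective(h, W_C)
    return minimize(crit, theta_nu0, method="Nelder-Mead").x
```

With model-specific f and g plugged in, the two calls mirror the two blocks of first-order conditions in (3); the inferential issue addressed in Sections 3 and 4 is that the sampling noise of the first step must be carried into the standard errors of the second.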
In the next subsection, we discuss an example which illustrates why, in our class of models, some of the parameters are generally not consistently estimated if one uses cross-sectional data alone.

2.2 Example 1: Portfolio Choice

Consider an economy that, in each period t, is populated by n households. These households are born at the beginning of period t, live for one period, and are replaced in the next period by n new families. The households living in consecutive periods do not overlap and, hence, make independent decisions. Each household is endowed with deterministic income and has preferences over a non-durable consumption good $c_{i,t}$. The preferences can be represented by a Constant Absolute Risk Aversion (CARA) utility function of the following form: $U(c_{i,t}) = -e^{-\gamma c_{i,t}}$. For simplicity, we normalize income to be equal to 1. During the period in which households are alive, they can invest a share of their income in a risky asset with return $u_{i,t}$. The remaining share is automatically invested in a risk-free asset with a return r that does not change over time. At the end of the period, the return to the investment is realized and households consume the quantity of the non-durable good they can purchase with the realized income. The distribution of the return on the risky asset depends on aggregate shocks. This is modelled by considering an economy in which the return on the risky asset is the sum of two components: $u_{i,t} = \nu_t + \varepsilon_{i,t}$, where $\nu_t$ is the aggregate shock and $\varepsilon_{i,t}$ is an i.i.d. idiosyncratic shock. The idiosyncratic shock, and hence the heterogeneity in the return on the risky asset, can be interpreted as differences across households in transaction costs, in information on the profitability of different stocks, or in marginal tax rates. We assume that $\nu_t \sim N(\mu, \sigma_\nu^2)$ and $\varepsilon_{i,t} \sim N(0, \sigma_\varepsilon^2)$, and hence that $u_{i,t} \sim N(\mu, \sigma^2)$, where $\sigma^2 = \sigma_\nu^2 + \sigma_\varepsilon^2$.

Household i living in period t chooses the fraction $\alpha_{i,t}$ of income to be allocated to the risk-free asset by maximizing its expected lifetime utility:
$$\max_{\alpha_{i,t}} E\left[-e^{-\gamma c_{i,t}}\right] \quad \text{s.t.} \quad c_{i,t} = \alpha_{i,t}(1+r) + (1-\alpha_{i,t})(1+u_{i,t}), \qquad (4)$$
where the expectation is taken with respect to the return on the risky asset. It is straightforward to show that the household's optimal choice of $\alpha_{i,t}$ is given by (footnote 7)
$$\alpha_{i,t} = \alpha = \frac{\gamma\sigma^2 - \mu + r}{\gamma\sigma^2}. \qquad (5)$$

Footnote 7: See the appendix.

We will assume that the econometrician is mainly interested in estimating the risk aversion parameter $\gamma$. We now consider an estimator that takes the form of a population analog of (5) and study the impact of aggregate shocks on the estimator's consistency when an econometrician works only with cross-sectional data. Our analysis reveals that such an estimator is inconsistent, since the presence of parameters related to aggregate risk precludes cross-sectional ergodicity. Our analysis makes explicit the dependence of the estimator on the probability distribution of the aggregate shock and points to the following way of generating a consistent estimator of $\gamma$. Using time-series variation, one can consistently estimate the parameters pertaining to aggregate uncertainty. Those estimates can then be used in the cross-sectional model to estimate the remaining parameters (footnote 8).

Footnote 8: Our model is a stylized version of many models considered in a large literature interested in estimating the parameter $\gamma$ using cross-sectional variation. Estimators are often based on moment conditions derived from first-order conditions (FOC) related to optimal investment and consumption decisions. Such estimators have similar problems, which we discuss in Appendix A.2.

Without loss of generality, we assume that the cross-sectional data are observed in period t = 1. The econometrician observes data on the return of the risky asset $u_{i,t}$ and on the return of the risk-free asset r. We assume that in addition he also observes a noisy measure of the portfolio share, $\alpha_{i,t} = \alpha + e_{i,t}$, where $e_{i,t}$ is a zero-mean measurement error. In other words, $y_i = (u_{i1}, \alpha_{i1})$. We make the simplifying assumption that the aggregate shock is observable to econometricians and that the time-series variables only include the aggregate shock: $z_t = \nu_t$. Because $\mu = E[u_{i1}]$, $\sigma^2 = \mathrm{Var}(u_{i1})$, and $\alpha = E[\alpha_{i1}]$, if only cross-sectional variation is used for estimation, we would have the following method-of-moments estimators of those parameters:
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} u_{i1} = \bar u, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (u_{i1} - \bar u)^2, \qquad \hat\alpha = \frac{1}{n}\sum_{i=1}^{n} \alpha_{i1}.$$
In the second step, the econometrician can then use equation (5) to write the risk aversion parameter as $\gamma = (\mu - r)/(\sigma^2(1-\alpha))$ and estimate it using the sample analog $\hat\gamma = (\hat\mu - r)/(\hat\sigma^2(1-\hat\alpha))$.
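For completeness, here is a sketch of the mean-variance argument behind (5), under the normality assumption stated above (the full derivation is in the appendix referenced in footnote 7). Since $c_{i,t} = \alpha_{i,t}(1+r) + (1-\alpha_{i,t})(1+u_{i,t})$ and $u_{i,t} \sim N(\mu, \sigma^2)$, consumption is normal with mean $\alpha_{i,t}(1+r) + (1-\alpha_{i,t})(1+\mu)$ and variance $(1-\alpha_{i,t})^2\sigma^2$, so that
$$E\left[-e^{-\gamma c_{i,t}}\right] = -\exp\left\{-\gamma\left[\alpha_{i,t}(1+r) + (1-\alpha_{i,t})(1+\mu)\right] + \frac{\gamma^2}{2}(1-\alpha_{i,t})^2\sigma^2\right\}.$$
Maximizing expected utility is therefore equivalent to maximizing $\gamma\left[\alpha_{i,t}(1+r) + (1-\alpha_{i,t})(1+\mu)\right] - \frac{\gamma^2}{2}(1-\alpha_{i,t})^2\sigma^2$, and the first-order condition $\gamma(r-\mu) + \gamma^2(1-\alpha_{i,t})\sigma^2 = 0$ gives $1-\alpha_{i,t} = (\mu - r)/(\gamma\sigma^2)$, which is (5) and also yields the expression $\gamma = (\mu - r)/(\sigma^2(1-\alpha))$ used in the sample analog above.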
In the presence of the aggregate shock $\nu_t$, we would have
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} u_{i1} = \frac{1}{n}\sum_{i=1}^{n}(\nu_1 + \varepsilon_{i1}) = \nu_1 + o_p(1),$$
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(u_{i1} - \bar u)^2 = \frac{1}{n}\sum_{i=1}^{n}(\varepsilon_{i1} - \bar\varepsilon_1)^2 = \sigma_\varepsilon^2 + o_p(1), \qquad \hat\alpha = \frac{1}{n}\sum_{i=1}^{n}(\alpha + e_{i1}) = \alpha + o_p(1),$$
which implies that $\gamma$ will be estimated to be
$$\hat\gamma = \frac{\hat\mu - r}{\hat\sigma^2(1-\hat\alpha)} = \frac{\nu_1 - r + o_p(1)}{(\sigma_\varepsilon^2 + o_p(1))(1-\alpha+o_p(1))} = \frac{\nu_1 - r}{\sigma_\varepsilon^2(1-\alpha)} + o_p(1). \qquad (6)$$
Using equation (6), we can study the properties of the proposed estimator $\hat\gamma$. If there were no aggregate shock in the model, we would have $\nu_1 = \mu$, $\sigma_\nu^2 = 0$, $\sigma^2 = \sigma_\varepsilon^2$ and, therefore, $\hat\gamma$ would converge to $\gamma$, a nonstochastic constant, as n grows to infinity. It would therefore be a consistent estimator of the risk aversion parameter. In the presence of the aggregate shock, however, the proposed estimator has different properties. If one conditions on the realization of the aggregate shock $\nu_1$, the estimator $\hat\gamma$ is inconsistent with probability 1, since it converges to $\frac{\nu_1 - r}{\sigma_\varepsilon^2(1-\alpha)}$ and not to the true value $\gamma = \frac{\mu - r}{(\sigma_\varepsilon^2 + \sigma_\nu^2)(1-\alpha)}$. If one does not condition on the aggregate shock, as n grows to infinity $\hat\gamma$ converges to a random variable with a mean that is different from the true value of the risk aversion parameter. The estimator will therefore be biased and inconsistent. To see this, remember that $\nu_1 \sim N(\mu, \sigma_\nu^2)$. As a consequence, the unconditional asymptotic distribution of $\hat\gamma$ takes the following form:
$$\hat\gamma \to N\left(\frac{\mu - r}{\sigma_\varepsilon^2(1-\alpha)},\ \frac{\sigma_\nu^2}{\sigma_\varepsilon^4(1-\alpha)^2}\right) = N\left(\gamma + \frac{\gamma\sigma_\nu^2}{\sigma_\varepsilon^2},\ \frac{\sigma_\nu^2}{\left(\sigma_\varepsilon^2(\alpha - 1)\right)^2}\right),$$
which is centered at $\gamma + \gamma\sigma_\nu^2/\sigma_\varepsilon^2$ and not at $\gamma$; hence the bias.

We are not the first to consider a case in which the estimator converges to a random variable. Andrews (2005) and, more recently, Kuersteiner and Prucha (2013) discuss similar scenarios. Our example is remarkable because the nature of the asymptotic randomness is such that the estimator is not even asymptotically unbiased. This is not the case in Andrews (2005) or Kuersteiner and Prucha (2013), where in spite of the asymptotic randomness the estimator is unbiased (footnote 9).

Footnote 9: Kuersteiner and Prucha (2013) also consider cases where the estimator is random and inconsistent. However, in their case this happens for quite different reasons, due to endogeneity of the factors. The inconsistency considered here occurs even when the factors are strictly exogenous.

There is a simple explanation for the result discussed above: cross-sectional variation is not sufficient for the consistent estimation of the risk aversion parameter if aggregate shocks affect individual decisions (footnote 10). To make this point clearer, we now relate this result to the non-consistency conditions (1)-(2) introduced in Section 2.1. We note that, given the assumptions of this subsection, conditional on the aggregate shock, $y_i$ has the following distribution:
$$y_i \mid \nu_1 \sim N\left(\begin{pmatrix} \nu_1 \\[4pt] \dfrac{\gamma(\sigma_\nu^2+\sigma_\varepsilon^2) - \mu + r}{\gamma(\sigma_\nu^2+\sigma_\varepsilon^2)} \end{pmatrix},\ \begin{pmatrix} \sigma_\varepsilon^2 & 0 \\ 0 & \sigma_e^2 \end{pmatrix}\right). \qquad (7)$$
Using (7), it is straightforward to see that the cross-sectional likelihood is maximized for any arbitrary choice of the time-series parameters $\lambda = (\mu, \sigma_\nu^2)$, as long as one chooses a $\gamma$ that satisfies the following equation:
$$\frac{\gamma(\sigma_\nu^2+\sigma_\varepsilon^2) - \mu + r}{\gamma(\sigma_\nu^2+\sigma_\varepsilon^2)} = \alpha.$$
As a consequence, the non-consistency conditions (1)-(2) are satisfied. Hence, $\mu$ and $\sigma_\nu^2$ cannot be consistently estimated by maximizing the cross-sectional likelihood, and $\gamma$ cannot be consistently estimated using only cross-sectional data.

Footnote 10: As discussed briefly in the introductory section, a common practice to address the problems with aggregate shocks is to include time dummies in the model. The portfolio example clarifies that the addition of time dummies does not solve the problem generated by the presence of aggregate shocks. The inclusion of time dummies is equivalent to the assumption that the aggregate shocks are known. But the previous discussion indicates that, using exclusively cross-sectional data, the estimator $\hat\gamma$ is biased and inconsistent even if the aggregate shocks are known. An unbiased and consistent estimator of $\gamma$ can only be obtained if the distribution of the aggregate shocks is known, which is feasible only by exploiting the variation contained in time-series data.

A solution to the problem discussed in this subsection is to combine cross-sectional variables with time-series variables. In this case, one can consistently estimate $(\mu, \sigma_\nu^2)$ by using the time series of the aggregate data $\{z_t\}$. Consistent estimation of $(\gamma, \sigma_\varepsilon^2, \sigma_e^2)$ can then be achieved by plugging the consistent estimators of $(\mu, \sigma_\nu^2)$ into the correctly specified cross-section likelihood (7).
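A short Monte Carlo sketch makes the two claims above concrete. The parameter values below are invented for illustration and are not taken from the paper; the first function implements the cross-section-only estimator of equation (6), and the second implements the combined estimator that obtains $(\mu, \sigma_\nu^2)$ from the aggregate series $\{\nu_t\}$ and the remaining moments from the cross-section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters, chosen only for illustration
gamma, mu, r = 2.0, 0.06, 0.02             # risk aversion, mean risky return, risk-free rate
sig_nu, sig_eps, sig_e = 0.10, 0.15, 0.10  # aggregate, idiosyncratic, measurement-error s.d.
sig2 = sig_nu**2 + sig_eps**2
alpha = 1 - (mu - r) / (gamma * sig2)      # optimal risk-free share, equation (5)

def simulate_cross_section(n, nu_1):
    """One cross-section observed in period t = 1, conditional on the aggregate shock nu_1."""
    u = nu_1 + rng.normal(0.0, sig_eps, n)   # risky returns u_i1 = nu_1 + eps_i1
    a = alpha + rng.normal(0.0, sig_e, n)    # noisy portfolio shares alpha_i1
    return u, a

def gamma_hat_cross_section(u, a):
    """Method-of-moments estimator that uses cross-sectional variation only, as in (6)."""
    return (u.mean() - r) / (u.var() * (1 - a.mean()))

def gamma_hat_two_step(u, a, nu_series):
    """Combined estimator: (mu, sig_nu^2) from the aggregate time series, the rest from the cross-section."""
    mu_hat, sig_nu2_hat = nu_series.mean(), nu_series.var()
    sig2_hat = sig_nu2_hat + u.var()         # u.var() estimates sig_eps^2 conditional on nu_1
    return (mu_hat - r) / (sig2_hat * (1 - a.mean()))

n, tau, reps = 5000, 2000, 500
cs, ts = [], []
for _ in range(reps):
    nu = rng.normal(mu, sig_nu, tau)         # observed aggregate shock series {nu_t}
    u, a = simulate_cross_section(n, nu[0])
    cs.append(gamma_hat_cross_section(u, a))
    ts.append(gamma_hat_two_step(u, a, nu))

print(f"true gamma                : {gamma:.3f}")
print(f"cross-section only        : mean {np.mean(cs):.3f}, s.d. {np.std(cs):.3f}")
print(f"combined cross-section/TS : mean {np.mean(ts):.3f}, s.d. {np.std(ts):.3f}")
```

Across replications the cross-section-only estimator remains dispersed and centered away from $\gamma$ no matter how large n is, while the combined estimator concentrates around $\gamma$ as the time-series dimension grows, in line with the discussion above.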
The example in this subsection is a simplified version of the model in Section 2.1, in the sense that the relationship between the cross-sectional and time-series submodels is simple and one-directional. The variables and parameters of the time-series submodel affect the cross-sectional model, but the cross-sectional variables and parameters have no impact on the time-series model. As a consequence, the time-series parameters can be consistently estimated without knowing the cross-sectional parameters. The cross-sectional parameters, however, can be consistently estimated only if the time-series parameters are known. In more complicated situations, such as general equilibrium models, where aggregate shocks are a natural feature, the relationship between the two submodels can be bi-directional. In the next subsection, we discuss a general-equilibrium example in which the relationship between the two submodels is bi-directional, as in the general model.

2.3 Example 2: A General Equilibrium Model

We consider as a second example a general equilibrium model of labor force participation and saving decisions in which aggregate shocks influence individual choices. In principle, we could have used as a general example a model proposed in the general equilibrium literature, such as the model developed in Lee and Wolpin (2006). We decided against this alternative because in those models the effect of the aggregate shocks and the relationship between the cross-sectional and time-series submodels are complicated and therefore difficult to describe. Instead, we have decided to develop a model that is sufficiently general to generate an interesting relationship between the two submodels, but at the same time is stylized enough for this relationship to be easy to describe and understand.

Consider an economy populated in each period by one infinitely-lived firm and two overlapping generations. Each generation is composed of individuals who live for two periods: the working period and the retirement period. Individual i in period t has preferences over a consumption good $c_{i,t}$ and leisure $l_{i,t}$, which can be characterized using a constant absolute risk aversion utility function of the following form:
$$u_i(c_{i,t}, l_{i,t}) = -\exp\left\{-r\left(\psi c_{i,t} + (1-\psi)l_{i,t}\right)\right\}. \qquad (8)$$
In the first period, individuals choose whether to work, consumption, and how much to save in a risk-free asset with constant return R for the retirement period. These choices maximize the individual's lifetime utility. Each individual is endowed with one unit of time. If they choose to work, they supply this unit of time inelastically and provide the firm with the amount of effective labor $1 \times e_{i,t}$, the unit of time multiplied by their ability. The inverse $e_{i,t}^{-1}$ of the effective time endowment is drawn from a uniform distribution defined on $[\underline{e}, 1]$. In the second period, they simply consume the available resources. In both periods, individuals are endowed with non-labor income y. The second-period non-labor income, pension income, follows the process
$$y_{i,t+1} = \beta_1 + \beta_2 x_{i,t} + \ln\nu^S_{t+1} + \epsilon_{i,t+1}, \qquad (9)$$
where $x_{i,t}$ is an observable variable known at t, such as the outcome of an aptitude test, $\nu^S_{t+1}$ is an aggregate shock determining the current state of the economy, and $\epsilon_{i,t+1}$ is an idiosyncratic shock. We assume that $\epsilon_{i,t+1}$ is independent of $x_{i,t}$ as well as $e_{i,t}$.
The aggregate shock follows the AR(1) process
$$\ln\nu^S_{t+1} = \varrho\ln\nu^S_t + \eta_{t+1}, \qquad (10)$$
with $\eta_{t+1}$ normally distributed with mean 0 and variance $\sigma_\eta^2$. The idiosyncratic shock is drawn from a normal distribution with mean zero and variance $\sigma_\epsilon^2$. Individuals know the process governing non-labor income. After they observe $y_{i,t+1}$ and $x_{i,t}$, they therefore know $z_{t+1} = \ln\nu^S_{t+1} + \epsilon_{i,t+1}$.

Each generation is composed of $N^A_t$ skilled individuals and $N^B_t$ unskilled individuals, where the type is observed by the econometrician. If a skilled individual chooses to work, she receives earnings $w^A_{i,t}$ with $w^A_{i,t} = w^A_t e_{i,t}$, where $w^A_t$ is the period-t wage paid by the infinitely-lived firm to all skilled workers and $e_{i,t}$ is the amount of labor supplied by the worker. Similarly, unskilled workers are paid $w^B_{i,t} = w^B_t e_{i,t}$.

Each period, the infinitely-lived firm produces the consumption good using two inputs, skilled and unskilled labor, and the following Cobb-Douglas production function:
$$f\left(L^A_t, L^B_t\right) = \nu^D_t \left(L^A_t\right)^{\alpha_A}\left(L^B_t\right)^{\alpha_B}, \qquad (11)$$
with $\alpha_A + \alpha_B < 1$, where $\nu^D_t$ is an aggregate shock, and $L^A_t$ and $L^B_t$ are the labor demands for skilled and unskilled labor. Without loss of generality, we normalize the price of consumption to 1. The firm chooses skilled and unskilled labor by maximizing its profit function. The next Proposition describes the main features of the equilibrium that characterizes the economy.

Proposition 1 In equilibrium, period-t consumption of skilled and unskilled individuals who choose to work takes the following form:
$$c^{h,w}_{i,t} = \frac{R}{R+1}\left(y_{i,t} + w^h_{i,t}\right) + \frac{\beta_1 + \beta_2 x_{i,t}}{R+1} - \frac{r\sigma_z^2}{2(R+1)} + \frac{\mu_z z_t}{R+1}, \qquad h = A, B,$$
where $\mu_z$ and $\sigma_z^2$ are the mean and variance of $z_{t+1}$ conditional on $z_t$. The corresponding consumption for individuals who decide not to work is
$$c^{h,nw}_{i,t} = \frac{R}{R+1}y_{i,t} + \frac{\beta_1 + \beta_2 x_{i,t}}{R+1} - \frac{r\sigma_z^2}{2(R+1)} + \frac{\mu_z z_t}{R+1}, \qquad h = A, B. \qquad (12)$$
A worker chooses to work with probability
$$P\left(\frac{1}{e_{i,t}} \le \frac{\psi}{1-\psi}w^h_t\right) = \frac{\psi w^h_t - (1-\psi)\underline{e}}{(1-\psi)(1-\underline{e})}, \qquad h = A, B. \qquad (13)$$
With this probability, the period-t equilibrium wages of skilled and unskilled individuals solve the following two equations:
$$\ln\nu^D_t + (1-\alpha_B)\ln\alpha_A - (1-\alpha_A-\alpha_B)\ln w^A_t + \alpha_B\ln\alpha_B - \alpha_B\ln w^B_t = \ln N^A_t + \ln\left(\psi w^A_t - (1-\psi)\underline{e}\right) - \ln\left((1-\psi)(1-\underline{e})\right),$$
$$\ln\nu^D_t + (1-\alpha_A)\ln\alpha_B - (1-\alpha_A-\alpha_B)\ln w^B_t + \alpha_A\ln\alpha_A - \alpha_A\ln w^A_t = \ln N^B_t + \ln\left(\psi w^B_t - (1-\psi)\underline{e}\right) - \ln\left((1-\psi)(1-\underline{e})\right).$$
Proof. In Appendix B.1.

We can now consider the estimation of the parameters of interest. For simplicity, we will assume that $\alpha_A$ and $\alpha_B$ are known (footnote 11). Parameters in this model can be consistently estimated by exploiting both cross-sectional and time-series data. The cross-section data consist of n i.i.d. observations $\left(y_{i,t+1}, x_{i,t}, c^{A,nw}_{i,t}, c^{B,nw}_{i,t}\right)$ for t = 1. The time-series data include $(Y_{t+1}, X_t)$, where $Y_t \equiv E_C[y_{i,t}]$ and $X_t \equiv E_C[x_{i,t}]$ (footnote 12). They also include the probability of working for each type of worker, as well as the equilibrium wages. We will assume that the econometrician knows the value of R. By equation (13), we can assume that $\psi$ and $\underline{e}$ are known to the econometrician; this is because we can solve for them by plugging the probability of working into its left-hand side and the equilibrium wage into its right-hand side for skilled and unskilled workers. We will assume that the econometrician's primary objective is to estimate the risk aversion parameter r in (8) (footnote 13). We initially provide an intuitive discussion.

Footnote 11: In Appendix B.2, we discuss how $\alpha_A$ and $\alpha_B$ can be estimated.

Footnote 12: The $E_C[\cdot]$, which are formally defined in Section 2.1, can be understood to be cross-sectional expectations.
First, equation (9) reveals that a cross-section regression of $y_{i,t+1}$ on $x_{i,t}$ and a constant provides a consistent estimator of $\beta_2$. Moreover, the estimated variance of the error term represents a consistent estimator of $\sigma_\epsilon^2$. Second, using the $\beta_2$ consistently estimated in the first-step cross-section regression, let $V_t \equiv Y_{t+1} - \beta_2 X_t$, which implies that $V_t = \beta_1 + \ln\nu^S_{t+1}$ and $V_{t+1} = (1-\varrho)\beta_1 + \varrho V_t + \eta_{t+1}$. As a consequence, an autoregression of $V_t$ on its lag and a constant term provides a consistent estimator of $((1-\varrho)\beta_1, \varrho)$. Using equation (10), it is then straightforward to show that this provides a consistent estimator of $\beta_1$. In addition, the estimated variance of the error provides a consistent estimator of $\sigma_\eta^2$. Thus, the second-step, time-series estimation allows us to consistently estimate $(\beta_1, \varrho, \sigma_\eta^2)$ using cross-sectional estimates obtained in the first step. This result is similar to the portfolio-choice example: we need cross-sectional estimates to generate consistent estimates of the time-series parameters.

Now note that the first and second steps provide consistent estimators of $\mu_z$, $\sigma_z^2$, and $\ln\nu^S_t$, because $\mu_z = \mu_z(\varrho, \sigma_\eta^2, \sigma_\epsilon^2)$ and $\sigma_z^2 = \sigma_z^2(\varrho, \sigma_\eta^2, \sigma_\epsilon^2)$ are known functions of the parameters estimated in the first two steps, and $V_{t-1} = \beta_1 + \ln\nu^S_t$. Moreover, replacing $z_t = \ln\nu^S_t + \epsilon_{i,t}$ in equation (12), we can use the cross-sectional average of $(R+1)c^{a,nw}_{i,t} - Ry_{i,t} - \beta_2 x_{i,t}$ to produce a consistent estimator of $\beta_1 - \frac{1}{2}r\sigma_z^2 + \mu_z\ln\nu^S_t$. Lastly, since $\beta_1$, $\mu_z$, $\sigma_z^2$, and $\ln\nu^S_t$ have been estimated in the first two steps, we can solve for the risk-aversion parameter r using the consistent estimator of $\beta_1 - \frac{1}{2}r\sigma_z^2 + \mu_z\ln\nu^S_t$. The result is a consistent estimator of r, which can only be generated because we have previously estimated parameters of the cross-sectional model and parameters of the time-series model. We therefore have the bi-directional relationship between the two submodels that was missing from the portfolio example.

Footnote 13: It is straightforward to show that the parameters characterizing the production function (11) can be estimated separately using time-series variation alone.

The intuitive estimation strategy may be formally related to the method-of-moments type identification discussed at the end of Section 2.1. For this purpose, it suffices to write $\theta = (r, \beta_1, \beta_2, \sigma_\epsilon^2)$, $\lambda = (\varrho, \sigma_\eta^2)$,
$$h_n(\theta, \nu_t, \lambda) = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n}\left[(R+1)c^{a,nw}_{i,t} - Ry_{i,t} - \beta_2 x_{i,t} - \beta_1 + \frac{1}{2}r\sigma_z^2(\varrho,\sigma_\eta^2,\sigma_\epsilon^2) - \mu_z(\varrho,\sigma_\eta^2,\sigma_\epsilon^2)\left(Y_t - \beta_1 - \beta_2 X_{t-1}\right)\right] \\[4pt] \frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix} 1 \\ x_{i,t} \end{pmatrix}\left(y_{i,t+1} - \beta_1 - \beta_2 x_{i,t} - \ln\nu^S_{t+1}\right) \\[4pt] \frac{1}{n}\sum_{i=1}^{n}\left[\left(y_{i,t+1} - \beta_1 - \beta_2 x_{i,t} - \ln\nu^S_{t+1}\right)^2 - \sigma_\epsilon^2\right] \end{pmatrix}$$
and
$$k_\tau(\theta, \lambda) = \begin{pmatrix} \frac{1}{\tau}\sum_{s=1}^{\tau}\begin{pmatrix} 1 \\ Y_{s+1} - \beta_2 X_s \end{pmatrix}\left[(Y_{s+2} - \beta_2 X_{s+1}) - (1-\varrho)\beta_1 - \varrho(Y_{s+1} - \beta_2 X_s)\right] \\[4pt] \frac{1}{\tau}\sum_{s=1}^{\tau}\left[\left((Y_{s+2} - \beta_2 X_{s+1}) - (1-\varrho)\beta_1 - \varrho(Y_{s+1} - \beta_2 X_s)\right)^2 - \sigma_\eta^2\right] \end{pmatrix}.$$
This characterization of the model makes clear that (i) it is identified if both cross-sectional and time-series data are available, (ii) using only cross-sectional data the model cannot be identified unless a subset of the time-series parameters is known, and (iii) using time-series data the model can only be identified if a subset of the cross-sectional parameters is known.
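As an illustration of the first two steps just described, the following sketch runs the cross-section regression of (9) and the aggregate autoregression of $V_t$. The variable names and the packaging into two functions are ours; the final step, which recovers r from the cross-sectional average of $(R+1)c^{a,nw}_{i,t} - Ry_{i,t} - \beta_2 x_{i,t}$, is omitted because it only involves solving one linear equation in r once the quantities above have been estimated.

```python
import numpy as np

def step1_cross_section(y_next, x):
    """Cross-section regression of y_{i,t+1} on a constant and x_{i,t} (equation (9)).
    Returns the slope estimate of beta_2 and the residual variance, which estimates
    the variance of the idiosyncratic shock."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    resid = y_next - X @ coef
    return coef[1], resid.var()

def step2_time_series(Y_next, X, beta2_hat):
    """Aggregate autoregression: V_t = Y_{t+1} - beta2_hat * X_t is regressed on a
    constant and its own lag, V_{t+1} = (1 - rho) beta_1 + rho V_t + eta_{t+1}."""
    V = Y_next - beta2_hat * X
    Z = np.column_stack([np.ones(len(V) - 1), V[:-1]])
    coef, *_ = np.linalg.lstsq(Z, V[1:], rcond=None)
    rho_hat = coef[1]
    beta1_hat = coef[0] / (1.0 - rho_hat)
    eta_var_hat = (V[1:] - Z @ coef).var()
    return beta1_hat, rho_hat, eta_var_hat
```

The first function uses only the cross-section, the second only the aggregate series, but the second cannot be run without the slope produced by the first, which is the bi-directional dependence emphasized in the text.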
2.4 Long Panels?

One may wonder whether the problems discussed in this section can be addressed by exploiting panel data sets in which the time-series dimension of the panel is large enough. Our proposal, discussed in the next section, requires access to two data sets: a cross-section (or short panel) and a long time series of aggregate variables. One obvious advantage of combining two sources of data is that the time-series data may contain variables that are unavailable in typical panel data sets. For example, the inflation rate potentially provides more information about aggregate shocks than is available in panel data. We argue in this subsection that, even without access to such variables, the estimator based on the two data sets is expected to be more precise, which suggests that the advantage of data combination goes beyond the availability of more observable variables.

Consider the alternative method based on one long panel data set, in which both n and T go to infinity. Since the number of aggregate shocks $\nu_t$ increases as the time-series dimension T grows, we expect that the long panel analysis can be executed with tedious yet straightforward arguments by modifying ideas in Hahn and Kuersteiner (2002), Hahn and Newey (2004), and Gagliardini and Gourieroux (2011), among others. We will now illustrate a potential problem with the long panel approach using a simple artificial example. Suppose that the econometrician is interested in the estimation of a parameter $\delta$ that characterizes the following system of linear equations:
$$q_{i,t} = \beta x_{i,t} + \nu_t + \varepsilon_{i,t}, \qquad i = 1,\dots,n,\ t = 1,\dots,T,$$
$$\nu_t = \omega\nu_{t-1} + u_t.$$
The variables $q_{i,t}$ and $x_{i,t}$ are observed, and it is assumed that $x_{i,t}$ is strictly exogenous in the sense that it is independent of the error term $\varepsilon_{i,t}$, including all leads and lags. For simplicity, we also assume that $u_t$ and $\varepsilon_{i,t}$ are normally distributed with zero mean and that $\varepsilon_{i,t}$ is i.i.d. across both i and t. We will denote by $\delta$ the ratio $\beta/\omega$. In order to estimate $\delta$ based on the panel data $\{(q_{i,t}, x_{i,t});\ i = 1,\dots,n;\ t = 1,\dots,T\}$, we can adopt a simple two-step estimator of $\delta$. In the first step, the parameter $\beta$ and the aggregate shocks $\nu_t$ are estimated using an Ordinary Least Squares (OLS) regression of $q_{i,t}$ on $x_{i,t}$ and time dummies. In the second step, the time-series parameter $\omega$ is estimated by regressing $\hat\nu_t$ on $\hat\nu_{t-1}$, where $\hat\nu_t$, t = 1,...,T, are the aggregate shocks estimated in the first step using the time dummies. An estimator of $\delta$ can then be obtained as $\hat\delta = \hat\beta/\hat\omega$.

The following remarks are useful to understand the properties of the estimator $\hat\delta = \hat\beta/\hat\omega$. First, even if $\nu_t$ were observed, for $\hat\omega$ to be a consistent estimator of $\omega$ we would need T to go to infinity, under which assumption we have $\hat\omega = \omega + O_p(T^{-1/2})$. This implies that it is theoretically necessary to assume that our data source is a "long" panel, i.e., $T \to \infty$. Similarly, $\hat\nu_t$ is a consistent estimator of $\nu_t$ only if n goes to infinity. As a consequence, we have $\hat\nu_t = \nu_t + O_p(n^{-1/2})$. This implies that it is in general theoretically necessary to assume that $n \to \infty$ (footnote 14).

Footnote 14: For $\hat\omega$ to have the same distribution as if $\nu_t$ were observed, we need n to go to infinity faster than T, or equivalently that T = o(n). See Heckman and Sedlacek (1985, p. 1088).

Moreover, if n and T both go to infinity, $\hat\delta$ is a consistent estimator of $\delta$, and
$$\hat\beta = \beta + O_p\left(\frac{1}{\sqrt{nT}}\right), \qquad \hat\omega = \omega + O_p\left(\frac{1}{\sqrt{T}}\right).$$
All this implies that
$$\hat\delta = \hat\beta/\hat\omega = \delta + O_p\left(\frac{1}{\sqrt{T}}\right).$$
The $O_p\left(1/\sqrt{nT}\right)$ term is the estimation noise of $\hat\beta$, i.e., the noise that would arise if $\omega$ were not estimated, and it is dominated by the $O_p(T^{-1/2})$ term. The $O_p\left(1/\sqrt{nT}\right)$ term reflects typical findings in long panel analysis (i.e., large n, large T), where the standard errors are inversely proportional to the square root of the number nT of observations. The fact that it is dominated by the $O_p(T^{-1/2})$ term indicates that the number of observations is effectively equal to T; i.e., the long panel should be treated as a time-series problem for all practical purposes.
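A small simulation of the artificial model above makes the point concrete. The parameter values are made up; the two-step estimator is the one described in the text (OLS with time dummies, then an AR(1) regression of the estimated shocks), and the output illustrates that the dispersion of $\hat\delta$ is governed by T rather than by nT.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, omega = 1.0, 0.8                 # hypothetical true values; delta = beta / omega

def simulate_panel(n, T):
    nu = np.zeros(T)
    for t in range(1, T):
        nu[t] = omega * nu[t - 1] + rng.normal()
    x = rng.normal(size=(n, T))
    q = beta * x + nu + rng.normal(size=(n, T))
    return q, x

def delta_hat(q, x):
    # Step 1: OLS of q on x and time dummies. With time dummies, beta_hat comes from
    # within-period-demeaned data and nu_hat_t is the period mean of q - beta_hat * x.
    qd, xd = q - q.mean(0), x - x.mean(0)
    beta_hat = (xd * qd).sum() / (xd * xd).sum()
    nu_hat = (q - beta_hat * x).mean(0)
    # Step 2: regress the estimated shocks on their own lag (zero-mean AR(1), no intercept).
    omega_hat = (nu_hat[:-1] * nu_hat[1:]).sum() / (nu_hat[:-1] ** 2).sum()
    return beta_hat / omega_hat

for n, T in [(100, 20), (10000, 20), (100, 200)]:
    draws = [delta_hat(*simulate_panel(n, T)) for _ in range(300)]
    print(f"n={n:6d}, T={T:4d}: sd(delta_hat) = {np.std(draws):.3f}")
```

Multiplying n by one hundred barely changes the dispersion of $\hat\delta$, while multiplying T by ten reduces it markedly, which is the sense in which the effective number of observations is T.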
This conclusion has two interesting implications. First, the sampling noise due to cross-sectional variation should be ignored, and the "standard" asymptotic variance formulae should generally be avoided in panel data analysis when aggregate shocks are present. We note that Lee and Wolpin's (2006, 2010) standard errors use the standard formula that ignores the $O_p(T^{-1/2})$ term. Second, since in most cases the time-series dimension T of a panel data set is relatively small, despite the theoretical assumption that it grows to infinity, estimators based on panel data will generally be more imprecise than may be expected from the "large" number nT of observations (footnote 15). In the next section, we formally develop an alternative asymptotic framework that enables one to combine the two sources of data without having to worry about that drawback. Our proposal improves precision by exploiting long time-series data on aggregate variables.

Footnote 15: This raises an interesting point. Suppose there is an aggregate time-series data set available with which consistent estimation of $\omega$ is feasible at the standard rate of convergence. Also suppose that the number of observations there, say $\tau$, is a lot larger than T. If this were the case, we should probably speculate that the panel data analysis is strictly dominated by the time-series analysis from the efficiency point of view.

3 Asymptotic Inference

While so far our discussion has focused on the question of identification, we now turn the focus to correct inference. The model in Section 2.1 can be generalized by allowing the cross-section data to be a short panel data set. This implies that the cross-sectional part of our estimator is based on $F_n(\theta, \nu_t, \lambda) = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} f(y_{i,t} \mid \theta, \nu_t, \lambda)$ in the case of maximum likelihood estimation, and on $h_n(\theta, \nu_t, \lambda) = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} f(y_{i,t} \mid \theta, \nu_t, \lambda)$ with $F_n(\theta, \nu_t, \lambda) = -h_n(\theta, \nu_t, \lambda)' W_n^C h_n(\theta, \nu_t, \lambda)$ in the case of moment-based models. The definition of the time-series criterion $G_\tau(\theta, \lambda)$ is not affected by the presence of panel cross-sectional data. The M-estimator solves
$$\operatorname*{argmax}_{\theta,\nu_1,\dots,\nu_T} F_n(\theta, \nu_t, \lambda) \qquad \text{and} \qquad \operatorname*{argmax}_{\lambda} G_\tau(\theta, \lambda), \qquad (14)$$
with the moment equation (3) modified accordingly. We present and study an asymptotic framework such that T is finite but $\tau$ goes to infinity at the same rate as n. Here $\{y_{i,t};\ i = 1,\dots,n;\ t = 1,\dots,T\}$ denotes our cross-section data of individual observations, and $\{z_s;\ s = 1,\dots,\tau\}$ denotes our time-series data of aggregate variables.

Our focus in this paper is on the asymptotic distribution of the estimators. We provide an intuitive sketch of the basic properties of these estimators in this section, leaving the technical details to Section 4. Our asymptotic framework is such that standard textbook-level analysis suffices for the discussion of consistency of the estimator. In standard analysis with a single data source, one typically restricts the moment equation to ensure identification and imposes further restrictions such that the sample analog of the moment function converges uniformly to the population counterpart. An identical analysis applies to our problem, and as such, we simply impose a high-level assumption that our estimator is consistent. We began this section by arguing that a more intuitive estimation strategy exploiting long panel data may have subtle problems. We now present the intuition underlying the main results.

3.1 Stationary Models

Suppose that there is a separate time series $z_t$ such that the log of its conditional probability density function given $z_{t-1}$ is $g(z_t \mid z_{t-1}, \lambda)$. Note that the time-series model does not depend on the micro parameter $\theta$.
Let $\tilde\lambda$ denote a consistent estimator. We assume that the dimension of the time-series data is $\tau$, and that the influence function of $\tilde\lambda$ is such that
$$\sqrt{\tau}\left(\tilde\lambda - \lambda_0\right) = \frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s \qquad (15)$$
with $E[\varphi_s] = 0$. Here, $\tau_0 + 1$ denotes the beginning of the time-series data, which is allowed to differ from the beginning of the panel data. Using $\tilde\lambda$ from the time-series data, we can then consider maximizing the criterion $F_n(\theta, \nu_t, \tilde\lambda)$ with respect to $\beta = (\theta, \nu_1, \dots, \nu_T)$. Implicit in this representation is the idea that we are given a short panel for estimation of $\beta = (\theta, \nu_1, \dots, \nu_T)$, where T denotes the time-series dimension of the panel data. In order to emphasize that T is small, we use the term 'cross-section' for the short panel data set and adopt asymptotics where T is fixed. The moment equation then is
$$\frac{\partial F_n\big(\hat\beta, \tilde\lambda\big)}{\partial\beta} = 0,$$
and the asymptotic distribution of $\hat\beta$ is characterized by
$$\sqrt{n}\left(\hat\beta - \beta\right) \approx -\left(\frac{\partial^2 F(\beta,\lambda)}{\partial\beta\partial\beta'}\right)^{-1}\sqrt{n}\,\frac{\partial F_n(\beta,\tilde\lambda)}{\partial\beta}.$$
Because
$$\sqrt{n}\,\frac{\partial F_n(\beta,\tilde\lambda)}{\partial\beta} \approx \sqrt{n}\,\frac{\partial F_n(\beta,\lambda)}{\partial\beta} + \frac{\partial^2 F(\beta,\lambda)}{\partial\beta\partial\lambda'}\,\sqrt{n}\left(\tilde\lambda - \lambda\right),$$
we obtain
$$\sqrt{n}\left(\hat\beta - \beta\right) \approx -A^{-1}\left[\sqrt{n}\,\frac{\partial F_n(\beta,\lambda)}{\partial\beta} + B\,\sqrt{\frac{n}{\tau}}\left(\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s\right)\right], \qquad (16)$$
with
$$A \equiv \frac{\partial^2 F(\beta,\lambda)}{\partial\beta\partial\beta'}, \qquad B \equiv \frac{\partial^2 F(\beta,\lambda)}{\partial\beta\partial\lambda'}.$$
We adopt asymptotics where $n, \tau \to \infty$ at the same rate, but T is fixed. We stress that a technical difficulty arises because we are conditioning on the factors $(\nu_1, \dots, \nu_T)$. This is accounted for in the limit theory we develop through a convergence concept by Renyi (1963) called stable convergence, essentially a notion of joint convergence. It can be thought of as convergence conditional on a specified $\sigma$-field, in our case the $\sigma$-field $\mathcal{C}$ generated by $(\nu_1, \dots, \nu_T)$. In simple special cases, and because T is fixed, the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \dots, \nu_T)$ may be equal to the unconditional asymptotic distribution. However, as we show in Section 4, this is not always the case, even when the model is stationary. Renyi (1963) and Aldous and Eagleson (1978) show that the concept of convergence of the distribution conditional on any positive-probability event in $\mathcal{C}$ and the concept of stable convergence are equivalent. Eagleson (1975) proves a stable CLT by establishing that the conditional characteristic functions converge almost surely. Hall and Heyde's (1980) proof of stable convergence, on the other hand, is based on demonstrating that the characteristic function converges weakly in $L^1$. As pointed out in Kuersteiner and Prucha (2013), the Hall and Heyde (1980) approach lends itself to proving the martingale CLT under slightly weaker conditions than what Eagleson (1975) requires. While both approaches can be used to demonstrate very similar stable and thus conditional limit laws, neither simplifies to conventional marginal weak convergence except in trivial cases. For this reason it is not possible to separate the cross-sectional inference problem from the time-series problem simply by 'fixing' the common shocks $(\nu_1, \dots, \nu_T)$. Such an approach would only be valid if the shocks $(\nu_1, \dots, \nu_T)$ did not change in any state of the world, in other words if they were constants in a probabilistic sense. Only in that scenario would stable or conditional convergence be equivalent to marginal convergence. The inherent randomness of $(\nu_1, \dots, \nu_T)$, taken into account by the rational agents in the models we discuss, is at the heart of our examples and is the essence of the inference problems we discuss in this paper. Thus, treating $(\nu_1, \dots, \nu_T)$ as constants is not an option available to us.
This is also the reason why time dummies are no remedy for the problems we analyze. A related idea might be to derive conditional (on $\mathcal{C}$) limiting results separately for the cross-section and time-series dimensions of our estimators. As noted before, such a result in fact amounts to demonstrating stable convergence, in this case for each dimension separately. Irrespective, this approach is flawed because it does not deliver joint convergence of the two components. It is evident from (16) that the continuous mapping theorem needs to be applied to derive the asymptotic distribution of $\hat\beta$. Because both A and B are $\mathcal{C}$-measurable random variables in the limit, the continuous mapping theorem can only be applied if joint convergence of $\sqrt{n}\,\partial F_n(\beta,\lambda)/\partial\beta$, $\tau^{-1/2}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$, and any $\mathcal{C}$-measurable random variable is established. Joint stable convergence of both components delivers exactly that. Finally, we point out that it is perfectly possible to consistently estimate parameters, in our case $(\nu_1, \dots, \nu_T)$, that remain random in the limit. For related results, see the recent work of Kuersteiner and Prucha (2015).

Here, for the purpose of illustration, we consider the simple case where the dependence of the time-series component on the factors vanishes asymptotically. Let's say that the unconditional distribution is such that
$$\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s \to N(0, \Omega),$$
where $\Omega$ is a fixed constant that does not depend on $(\nu_1, \dots, \nu_T)$. Let's also assume that
$$\sqrt{n}\,\frac{\partial F_n(\beta,\lambda)}{\partial\beta} \to N(0, \Psi_y)$$
conditional on $(\nu_1, \dots, \nu_T)$. Unlike in the case of the time-series sample, $\Psi_y$ generally does depend on $(\nu_1, \dots, \nu_T)$ through the parameter $\beta$. We note that $\partial F(\beta,\lambda)/\partial\beta$ is a function of $(\nu_1, \dots, \nu_T)$. If there is overlap between $(1, \dots, T)$ and $(\tau_0+1, \dots, \tau_0+\tau)$, we need to worry about the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \dots, \nu_T)$. However, because in this example the only connection between $\Psi_y$ and $\varphi$ is assumed to be through $\beta$, and because T is assumed fixed, the two terms $\sqrt{n}\,\partial F_n(\beta,\lambda)/\partial\beta$ and $\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\varphi_t$ are expected to be asymptotically independent in the trend stationary case and when $\Omega$ does not depend on $(\nu_1, \dots, \nu_T)$. Even in this simple setting, independence between the two samples does not hold, and asymptotic conditional or unconditional independence, as well as joint convergence with $\mathcal{C}$-measurable random variables, needs to be established formally. This is achieved by establishing $\mathcal{C}$-stable convergence in Section 4.2. It follows that
$$\sqrt{n}\left(\hat\beta - \beta\right) \to N\left(0,\ A^{-1}\Psi_y A^{-1} + \kappa\, A^{-1}B\,\Omega\,B'A^{-1}\right), \qquad (17)$$
where $\kappa \equiv \lim n/\tau$. This means that a practitioner would use the square root of
$$\frac{1}{n}A^{-1}\Psi_y A^{-1} + \frac{\kappa}{n}A^{-1}B\,\Omega\,B'A^{-1} = \frac{1}{n}A^{-1}\Psi_y A^{-1} + \frac{1}{\tau}A^{-1}B\,\Omega\,B'A^{-1}$$
as the standard error. This result looks similar to Murphy and Topel's (1985) formula, except that we need to make an adjustment to the second component to address the differences in sample sizes. The asymptotic variance formula is such that the noise of the time-series estimator $\tilde\lambda$ can make quite a difference if $\kappa$ is large, i.e., if the time-series sample size $\tau$ is small relative to the cross-section size n. Obviously, this calls for long time series for accurate estimation of even the micro parameter $\theta$. We also note that time-series estimation has no impact on micro estimation if B = 0. This confirms the intuition that, if $\lambda$ does not appear as part of the micro moment f, which is the case in Heckman and Sedlacek (1985) and Heckman, Lochner, and Taber (1998), cross-section estimation can be considered separate from time-series estimation.
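Translating the variance formula below (17) into code is straightforward. In the sketch that follows, A, B, Psi_y and Omega stand for consistent estimates of the corresponding matrices (how they are obtained is model specific and not shown), and n and tau are the two sample sizes; the function simply evaluates $\frac{1}{n}A^{-1}\Psi_y A^{-1} + \frac{1}{\tau}A^{-1}B\Omega B'A^{-1}$ and returns the square roots of its diagonal.

```python
import numpy as np

def combined_se(A, B, Psi_y, Omega, n, tau):
    """Standard errors implied by (17): the cross-sectional block contributes
    A^{-1} Psi_y A^{-1} / n and the estimated time-series parameters contribute
    A^{-1} B Omega B' A^{-1} / tau (equivalently kappa / n with kappa = n / tau)."""
    A_inv = np.linalg.inv(A)
    avar = A_inv @ Psi_y @ A_inv.T / n + A_inv @ B @ Omega @ B.T @ A_inv.T / tau
    return np.sqrt(np.diag(avar))
```

Setting B = 0 (or letting tau grow without bound) recovers the usual cross-sectional standard errors, which is the separability case noted above.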
In the more general setting of Section 4, $\Omega$ may depend on $(\nu_1, \dots, \nu_T)$. In this case the limiting distribution of the time-series component is mixed Gaussian and dependent upon the limiting distribution of the cross-sectional component. This dependence does not vanish asymptotically, even in stationary settings. As we show in Section 4, standard inference based on asymptotically pivotal statistics is available even though the limiting distribution of $\hat\beta$ is no longer a sum of two independent components.

3.2 Unit Root Problems

When the simple trend stationary paradigm does not apply, the limiting distribution of our estimators may be more complicated. A general treatment is beyond the scope of this paper and likely requires a case-by-case analysis. In this subsection we consider a simple unit root model where initial conditions can be neglected. We use it to exemplify additional inferential difficulties that arise even in this relatively simple setting. In Section 4.3 we consider a slightly more complex version of the unit root model where initial conditions cannot be ignored. We show that more complicated dependencies between the asymptotic distributions of the cross-section and time-series samples manifest themselves. The result is a cautionary tale of the difficulties that may present themselves when nonstationary time-series data are combined with cross-sections. We leave the development of inferential methods for this case to future work.

We again consider the model of the previous section, except with the twist that (i) the AR(1) coefficient in the time-series regression of $z_t$ on $z_{t-1}$ is $\lambda$, with independent errors; and (ii) $\lambda$ is at (or near) unity. In the same way that led to (16), we obtain
$$\sqrt{n}\left(\hat\beta - \beta\right) \approx -A^{-1}\sqrt{n}\,\frac{\partial F_n(\beta,\lambda)}{\partial\beta} - A^{-1}B\,\sqrt{n}\left(\tilde\lambda - \lambda\right).$$
For simplicity, again assume that the two terms on the right are asymptotically independent. The first term converges in distribution to a normal distribution $N(0, A^{-1}\Psi_y A^{-1})$, but with $\lambda = 1$ and i.i.d. AR(1) errors the second term converges to
$$-\kappa_1\, A^{-1}B\,\frac{W(1)^2 - 1}{2\int_0^1 W(r)^2\,dr},$$
where $\kappa_1 \equiv \lim \sqrt{n}/\tau$ and $W(\cdot)$ is the standard Wiener process, in contrast to the result in (17) that applies when $\lambda$ is away from unity. The result is formalized in Section 4.3.

The fact that the limiting distribution of $\hat\beta$ is no longer Gaussian complicates inference. This discontinuity is mathematically similar to Campbell and Yogo's (2006) observation, which leads to a question of how uniform inference could be conducted. In principle, the problem here can be analyzed by modifying the proposal in Phillips (2014, Section 4.3). First, construct a $1-\eta_1$ confidence interval for $\lambda$ using Mikusheva (2007); call it $[\lambda_L, \lambda_U]$. Second, compute $\hat\beta(\lambda) \equiv \operatorname{argmax}_\beta F_n(\beta, \lambda)$ for $\lambda \in [\lambda_L, \lambda_U]$. Assuming that $\lambda$ is fixed, characterize the asymptotic variance, say $\Sigma(\lambda)$, of $\sqrt{n}\left(\hat\beta(\lambda) - \beta(\lambda)\right)$, which is asymptotically normal in general. Third, construct the confidence region, say $CI(\eta_2; \lambda)$, using asymptotic normality and $\Sigma(\lambda)$. Our confidence interval for $\beta$ is then given by $\bigcup_{\lambda \in [\lambda_L, \lambda_U]} CI(\eta_2; \lambda)$. By Bonferroni, its asymptotic coverage rate is expected to be at least $1 - \eta_1 - \eta_2$.
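A schematic implementation of the three steps just listed is given below. The two callables beta_hat_given and se_given are placeholders for model-specific routines that re-estimate $\beta$ and its standard error with $\lambda$ held fixed, and the endpoints lam_lo, lam_hi come from a unit-root-robust confidence interval for $\lambda$ (e.g., of the type in Mikusheva (2007)); none of these routines is implemented here.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_ci(beta_hat_given, se_given, lam_lo, lam_hi, eta1=0.05, eta2=0.05, grid=200):
    """Union over a grid of lambda values in the 1 - eta1 interval [lam_lo, lam_hi]
    of the 1 - eta2 normal confidence intervals for beta(lambda). By the Bonferroni
    inequality the union covers beta with asymptotic probability at least 1 - eta1 - eta2."""
    z = norm.ppf(1.0 - eta2 / 2.0)
    lo, hi = np.inf, -np.inf
    for lam in np.linspace(lam_lo, lam_hi, grid):
        b, s = beta_hat_given(lam), se_given(lam)
        lo, hi = min(lo, b - z * s), max(hi, b + z * s)
    return lo, hi
```

The union interval is conservative by construction; the grid over $[\lambda_L, \lambda_U]$ is a practical discretization of the union described in the text.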
4 Joint Panel-Time Series Limit Theory

In this section we first establish a generic joint limiting result for a combined panel-time series process and then specialize it to the limiting distributions of parameter estimates under stationarity and non-stationarity. The process we analyze consists of a triangular array of panel data $\psi^y_{n,it}$ observed for $i = 1,\dots,n$ and $t = 1,\dots,T$, where $n \to \infty$ while T is fixed and t = 1 is an arbitrary normalization of time at the beginning of the cross-sectional sample. It also consists of a separate triangular array of time series $\psi_{\tau,t}$ for $t = \tau_0+1, \dots, \tau_0+\tau$, where $\tau_0$ is fixed with $-\infty < -K \le \tau_0 \le K < \infty$ for some bounded K, and $\tau \to \infty$. Typically, $\psi^y_{n,it}$ and $\psi_{\tau,t}$ are the scores of a cross-section and a time-series criterion function based on observed data $y_{it}$ and $z_t$. We assume that $T \le \tau_0 + \tau$. Throughout we assume that $\psi_{\tau,t}$ is a martingale difference sequence relative to a filtration to be specified below. We derive the joint limiting distribution and a related functional central limit theorem for $\frac{1}{\sqrt{n}}\sum_{t=1}^{T}\sum_{i=1}^{n}\psi^y_{n,it}$ and $\frac{1}{\sqrt{\tau}}\sum_{t=\tau_0+1}^{\tau_0+\tau}\psi_{\tau,t}$.

We now construct the triangular array of filtrations similarly to Kuersteiner and Prucha (2013) by setting $\mathcal{C} = \sigma(\nu_1, \dots, \nu_T)$ and
$$\mathcal{G}_{n,0} = \sigma\left(z_{\min(1,\tau_0)}\right) \vee \mathcal{C}, \qquad \mathcal{G}_{n,i} = \sigma\left(z_{\min(1,\tau_0)},\ \{y_{j,\min(1,\tau_0)}\}_{j=1}^{i}\right) \vee \mathcal{C}, \quad i = 1,\dots,n,$$
$$\vdots$$
$$\mathcal{G}_{n,(t-\min(1,\tau_0))n+i} = \sigma\left(\{y_{j,t-1}, y_{j,t-2}, \dots, y_{j,\min(1,\tau_0)}\}_{j=1}^{n},\ z_t, z_{t-1}, \dots, z_{\min(1,\tau_0)},\ \{y_{j,t}\}_{j=1}^{i}\right) \vee \mathcal{C}. \qquad (18)$$
We use the convention that $\mathcal{G}_{n,(t-\min(1,\tau_0))n} = \mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+n}$. This implies that $z_t$ and $y_{1t}$ are added simultaneously to the filtration $\mathcal{G}_{n,(t-\min(1,\tau_0))n+1}$.
The functional CLT is then used to establish proper joint convergence between stochastic integrals and the cross-sectional component of our model. For the stationary case we are mostly interested in Xn (1) where in particular max(T; 0 + ) 0+ X 1 p ;t t= 0 +1 = X n X ~ (1) : it t=min(1; 0 +1) i=1 The limiting distribution of Xn (1) is a simple corollary of the functional CLT for Xn (r) : We note that our treatment di¤ers from Phillips and Moon (1999), who develop functional CLT’s for the time series dimension of the panel data set. In our case, since T is …xed and …nite, a similar treatment not applicable. We introduce the following general regularity conditions. In later sections these conditions will be specialized to the particular models considered there. Condition 1 Assume that i) y n;it ii) ;t is measurable with respect to G n;(t min(1; 0 ))n+i : is measurable with respect to G n;(t min(1; 0 ))n+i iii) for some iv) for some v) E vi) E y n;it ;t h > 0 and C < 1; supit E h > 0 and C < 1; supt E 2+ y n;it i 2+ ;t i for all i = 1; :::; n: C for all n C for all G n;(t min(1; 0 ))n+i 1 = 0: G n;(t min(1; 0 ) 1)n+i = 0 for t > T and all i = 1; :::; n: 1: 1: Remark 1 Conditions 1(i), (iii) and (v) can be justi…ed in a variety of ways. One is the subordinated process theory employed in Andrews (2005) which arises when yit are random draws from a population of outcomes y. A su¢ cient condition for Conditions 1(v) to hold is that E [ (yj ; ; t )j C] = 0 holds in the population. This would be the case, for example, if were the correctly speci…ed score for the population distribution. See Andrews (2005, pp. 1573-1574). Condition 2 Assume that: i) for any s; r 2 [0; 1] with r > s; 1 Xr] 0 +[ t= 0 +[ s]+1 ;t 0 p ;t ! (r) 27 (s) as !1 where (s) is positive de…nite a.s. and measurable with respect to (r) r 2 (s; 1]. Normalize ( 1 ; :::; T) for all (0) = 0: ii) The elements of derivatives _ (r) = @ (r) are bounded continuously di¤erentiable functions of r > s 2 [0; 1]. The (r) =@r are positive de…nite almost surely. iii) There is a …xed constant M < 1 such that supk 2Rk k=1; supt 0 @ M a.s. (t) =@t Condition 2 is weaker than the conditions of Billingsley’s (1968) functional CLT for strictly stationary martingale di¤erence sequences (mds). We do not assume that E ;t t 0 is constant. Brown (1971) allows for time varying variances, but uses stopping times to achieve a standard Brownian limit: Even more general treatments with random stopping times are possible - see Gaenssler and Haeussler (1979). On the other hand, here convergence to a Gaussian process (not a standard Wiener process) with the same methodology (i.e. establishing convergence of …nite dimensional distributions and tightness) as in Billingsley, but without assuming homoskedasticity is pursued. Heteroskedastic errors are explicitly used in Section 4.3 where Even if s is iid(0; 2 ) it follows that ;t ;t = exp ((t s) = ) s. is a heteroskedastic triangular array that depends on . It can be shown that the variance kernel (r) is (r) = 2 (1 exp ( 2r ))/ 2 in this case. See equation (68). Condition 3 Assume that where ty 1X n i=1 n y n;it p y0 n;it ! ty is positive de…nite a.s. and measurable with respect to ( 1 ; :::; T): Condition 2 holds under a variety of conditions that imply some form of weak dependence of the process ;t . 
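Condition 3 allows the cross-sectional second-moment limit to be random as long as it is measurable with respect to the aggregate shocks. The following sketch, in the spirit of Andrews (2005) and with an entirely hypothetical score design, illustrates the point: the cross-sectional average of squared scores settles down as $n$ grows, but at a value that changes with the realized common shock, so the probability limit is a random variable rather than a constant.

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_section_variance(n, lam_t, rng):
    """(1/n) * sum_i psi_it^2 for hypothetical scores whose scale depends on the common shock."""
    psi = np.exp(0.5 * lam_t) * rng.normal(size=n)
    return np.mean(psi ** 2)

n = 200_000
for rep in range(5):
    lam_t = rng.normal()                      # one realization of the aggregate shock
    omega_hat = cross_section_variance(n, lam_t, rng)
    print(f"shock = {lam_t:+.2f}   sample second moment = {omega_hat:.3f}   "
          f"conditional limit exp(shock) = {np.exp(lam_t):.3f}")
```

Each repetition converges to its own conditional limit, which is exactly the sense in which the limit in Condition 3 is measurable with respect to the common shocks. Condition 2 plays the complementary role of restricting the temporal dependence of the time-series scores through weak-dependence requirements.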
These include, in addition to Condition 1(ii) and (iv), mixing or near epoch dependence assumptions on the temporal dependence properties of the process ;t : Assumption 3 holds under appropriate moment bounds and random sampling in the cross-section even if the underlying population distribution is not independent (see Andrews, 2005, for a detailed treatment). 28 4.1 Stable Functional CLT This section details the probabilistic setting we use to accommodate the results that Jacod and Shiryaev (2002) (shorthand notation JS) develop for general Polish spaces. Let ( 0 ; F 0 ; P 0 ) be a probability space with increasing …ltrations Ftn and an increasing sequence kn ! 1 as n ! 1. Let DRk [0; 1] ! Rk n for any t = 1; :::; kn Ft+1 F and Ftn Rk [0; 1] be the space of functions Rk that are right continuous and have left limits (see Billingsley (1968, p.109)). Let C be a sub-sigma …eld of F 0 . Let ( ; Z n (!; t)) : or random elements in R and DRk ( 0 ; F 0 ; P 0 ) and assume that Rk 0 [0; 1] ! R Rk Rk be random variables [0; 1], respectively de…ned on the common probability space is bounded and measurable with respect to C. Equip DRk Rk [0; 1] with the Skorohod topology, see JS (p.328, Theorem 1.14). By Billingsley (1968), Theorem 15.5 the uniform metric can be used to establish tightness for certain processes that are continuous in the limit. We use the results of JS to de…ne a precise notion of stable convergence on DRk Rk [0; 1]. JS (p.512, De…nition 5.28) de…ne stable convergence for sequences Z n de…ned on a Polish space. We adopt their de…nition to our setting, noting that by JS (p.328, Theorem 1.14), DRk Rk [0; 1] equipped with the Skorohod topology is a Polish space. Also following their De…nition VI1.1 and Theorem VI1.14 we de…ne the -…eld generated by all coordinate projections as DRk De…nition 1 The sequence Z n converges C-stably if for all bounded C and for all bounded continuous functions f de…ned on DRk measure on ( 0 DRk Rk [0; 1] ; C n E [ f (Z )] ! Z DRk Rk Rk Rk . measurable with respect to [0; 1] there exists a probability ) such that (! 0 ) f (x) (d! 0 ; dx) : 0 D Rk Rk [0;1] As in JS, let Q (! 0 ; dx) be a distribution conditional on C such that (d! 0 ; dx) = P 0 (d! 0 ) Q (! 0 ; dx) and let Qn (! 0 ; dx) be a version of the conditional (on C) distribution of Z n : Then we can de…ne the joint probability space ( ; F; P ) with = 0 DRk Rk [0; 1] ; F = F 0 DRk Rk and P = P (d!; dx) = P 0 (d! 0 ) Q (! 0 ; dx) : Let Z (! 0 ; x) = x be the canonical element on DRk Rk [0; 1] : R (! 0 ) f (x) (d! 0 ; dx) = E [ Qf ] :We say that Z n converges C-stably to Z if for It follows that all bounded, C-measurable ; E [ f (Z n )] ! E [ Qf ] : 29 (22) More speci…cally, if W (r) is standard Brownian motion, we say that Z n ) W (r) C-stably where the notation means that (22) holds when Q is Wiener measure (for a de…nition and existence proof see Billingsley (1968, ch.9)). By JS Proposition VIII5.33 it follows that Z n converges C-stably i¤ Z n is tight and for all A 2 C, E [1A f (Z n )] converges. The concept of stable convergence was introduced by Renyi (1963) and has found wide application in probability and statistics. Most relevant to the discussion here are the stable central limit theorem of Hall and Heyde (1980) and Kuersteiner and Prucha (2013) who extend the result in Hall and Heyde (1980) to panel data with …xed T . Dedecker and Merlevede (2002) established a related stable functional CLT for strictly stationary martingale di¤erences. Following Billingsley (1968, p. 
120) let $\pi_{r_1,\ldots,r_k} Z^n = \left(Z^n_{r_1}, \ldots, Z^n_{r_k}\right)$ be the coordinate projections of $Z^n$. By JS VIII.5.36 and by the proof of Theorems 5.7 and 5.14 on p. 509 of JS (see also Rootzen (1983), Feigin (1985), Dedecker and Merlevede (2002, p. 1057)), $\mathcal{C}$-stable convergence of $Z^n$ to $Z$ follows if $E\left[\xi f\left(Z^n_{r_1},\ldots,Z^n_{r_k}\right)\right] \to E\left[\xi f\left(Z_{r_1},\ldots,Z_{r_k}\right)\right]$ and $Z^n$ is tight under the measure $P$. We note that the first condition is equivalent to stable convergence of the finite dimensional vector of random variables $\left(Z^n_{r_1},\ldots,Z^n_{r_k}\right)$ and is established with a multivariate stable central limit theorem.

Theorem 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for $\tilde{\xi}_{it}$ defined in (21), and as $\tau, n \to \infty$ and $T$ fixed,
\[
X_n(r) \Rightarrow \begin{bmatrix} B_y(1) \\ B_\lambda(r) \end{bmatrix} \quad (\mathcal{C}\text{-stably}),
\]
where $B_y(r) = \Omega_y^{1/2} W_y(r)$, $B_\lambda(r) = \int_0^r \dot{\Omega}_\lambda(s)^{1/2}\, dW_\lambda(s)$, $\Omega(r) = \operatorname{diag}\left(\Omega_y, \Omega_\lambda(r)\right)$ is $\mathcal{C}$-measurable, $\dot{\Omega}_\lambda(s) = \partial \Omega_\lambda(s)/\partial s$, and $(W_y(r), W_\lambda(r))$ is a vector of standard Brownian processes independent of $\mathcal{C}$.

Proof. In Appendix C.

Remark 2 Note that the component $W_y(1)$ of $W(r)$ does not depend on $r$. Thus, $W_y(1)$ is simply a vector of standard Gaussian random variables, independent both of $W_\lambda(r)$ and of any random variable measurable w.r.t. $\mathcal{C}$.

The limiting random variables $B_y(r)$ and $B_\lambda(r)$ both depend on $\mathcal{C}$ and are thus mutually dependent. The representation $B_y(1) = \Omega_y^{1/2} W_y(1)$, where a stable limit is represented as the product of an independent Gaussian random variable and a scale factor that depends on $\mathcal{C}$, is common in the literature on stable convergence. Results similar to the one for $B_\lambda(r)$ were obtained by Phillips (1987, 1988) for cases where $\dot{\Omega}_\lambda(s)$ is non-stochastic and has an explicit functional form, notably for near unit root processes, and when convergence is marginal rather than stable. Rootzen (1983) establishes stable convergence but gives a representation of the limiting process in terms of standard Brownian motion obtained by a stopping time transformation. The representation of $B_\lambda(r)$ in terms of a stochastic integral over the random scale process $\dot{\Omega}_\lambda(s)$ seems to be new. It is obtained by utilizing a technique mentioned in Rootzen (1983, p. 10) but not utilized there, namely establishing finite dimensional convergence using a stable martingale CLT. This technique, combined with a tightness argument, establishes the characteristic function of the limiting process. The representation for $B_\lambda(r)$ is then obtained by utilizing results for characteristic functions of affine diffusions in Duffie, Pan and Singleton (2000). Rootzen (1983, p. 13) similarly utilizes characteristic functions to identify the limiting distribution in the case of standard Brownian motion, a much simpler scenario than ours. Finally, the results of Dedecker and Merlevede (2002) differ from ours in that they only consider asymptotically homoskedastic and strictly stationary processes. In our case, heteroskedasticity is explicitly allowed because of $\dot{\Omega}_\lambda(s)$. An important special case of Theorem 1 is the near unit root model discussed in more detail in Section 4.3. More importantly, our results innovate over the literature by establishing joint convergence between cross-sectional and time series averages that are generally not independent and whose limiting distributions are not independent. This result is obtained by a novel construction that embeds both data sets in a random field. A careful construction of the information filtrations $\mathcal{G}_{n,q}$ makes it possible to map the field into a martingale array.
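A small Monte Carlo, with an entirely hypothetical design, illustrates the two features of Theorem 1 that matter for inference: each block of the limit is Gaussian conditional on the common shocks but a scale mixture of normals unconditionally, and the two blocks are unconditionally dependent through their $\mathcal{C}$-measurable scales even though they are conditionally independent.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau, reps = 2000, 2000, 4000

by, blam = np.zeros(reps), np.zeros(reps)
for r in range(reps):
    c = rng.normal()                                   # common shock realization (generates C)
    sig_y, sig_lam = np.exp(0.4 * c), np.exp(0.4 * c)  # C-measurable scales of the two blocks
    by[r] = (sig_y * rng.normal(size=n)).sum() / np.sqrt(n)        # cross-section block
    blam[r] = (sig_lam * rng.normal(size=tau)).sum() / np.sqrt(tau)  # time-series block

print("corr(B_y, B_lam)     =", round(np.corrcoef(by, blam)[0, 1], 3))        # near 0: uncorrelated
print("corr(B_y^2, B_lam^2) =", round(np.corrcoef(by**2, blam**2)[0, 1], 3))  # positive: dependent via C
print("kurtosis of B_y      =", round(np.mean((by / by.std())**4), 2))        # above 3: mixture, not Gaussian
```

In the paper this joint behaviour is delivered precisely by the random-field embedding and the filtration construction described above.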
Similar techniques were used in Kuersteiner and Prucha (2013) for panels with …xed T: In this paper we extend their approach to handle an additional and distinct time series data-set and by allowing for both n and to tend to in…nity jointly. In addition to the more complicated data-structure we extend Kuersteiner and Prucha (2013) by considering functional central limit theorems. The following corollary is useful for possibly non-linear but trend stationary models. 31 Corollary 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for ~it de…ned in (21), and as ; n ! 1 and T …xed, d Xn (1) ! B := where = diag ( y; 1=2 W (C-stably) (1)) is C-measurable and W = (Wy (1) ; W (1)) is a vector of standard d-dimensional Gaussian random variables independent of . The variables y; (:) ; Wy (:) and W (:) are as de…ned in Theorem 1. Proof. In Appendix C. d The result of Corollary 1 is equivalent to the statement that Xn (1) ! N (0; ) conditional on positive probability events in C. As noted earlier, no simpli…cation of the technical arguments are possible by conditioning on C except in the trivial case where is a …xed constant. Eagleson (1975, Corollary 3), see also Hall and Heyde (1980, p. 59), establishes a simpler result where d Xn (1) ! B weakly but not (C-stably). Such results could in principle be obtained here as well, but they would not be useful for the analysis in Sections 4.2 and 4.3 because the limiting distributions of our estimators not only depend on B but also on other C-measurable scaling matrices. Since the continuous mapping theorem requires joint convergence, a weak limit for B alone is not su¢ cient to establish the results we obtain below. Theorem 1 establishes what Phillips and Moon (1999) call diagonal convergence, a special form of joint convergence. To see that sequential convergence where …rst n or go to in…nity, followed by the other index, is generally not useful in our set up, consider the following example. Assume that d = k is the dimension of the vector ~it . This would hold for just identi…ed moment estimators and likelihood based procedures. Consider the double indexed process max(T; 0 + ) Xn (1) = X n X ~it (1) : (23) t=min(1; 0 +1) i=1 For each …xed, convergence in distribution of Xn as n ! 1 follows from the central limit theorem in Kuersteiner and Prucha (2013). Let X denote the “large n, …xed ”limit. For each n …xed, convergence in distribution of Xn as ! 1 follows from a standard martingale central limit theorem for Markov processes. Let Xn be the “large , …xed n”limit. It is worth pointing out that the distributions of both Xn and X are unknown because the limits are trivial in one direction. 32 For example, when is …xed and n tends to in…nity, the component 1=2 P 0+ t= 0 +1 trivially ;t P 1=2 0+ t= 0 +1 converges in distribution (it does not change with n) but the distribution of ;t is generally unknown. More importantly, application of a conventional CLT for the cross-section alone will fail to account for the dependence between the time series and cross-sectional components. Sequential convergence arguments thus are not recommended even as heuristic justi…cations of limiting distributions in our setting. 
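The practical content of joint (diagonal) convergence can be seen in a deliberately stylized example that is not the model of this paper: a cross-sectional estimator that plugs in a nuisance parameter estimated from $\tau$ time-series observations. Under joint asymptotics the time-series noise contributes a term of order $\kappa = n/\tau$ to the scaled variance, which inference based on the cross-section alone ignores. All names and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def coverage(n, tau, reps=2000, theta=1.0, mu=0.5, sig_y=1.0, sig_z=1.0):
    """Stylized plug-in estimator theta_hat = mean_i(y_i) - mean_t(z_t).
    Naive CIs use only the cross-sectional variance sig_y^2 / n."""
    hits_naive = hits_joint = 0
    for _ in range(reps):
        y = theta + mu + sig_y * rng.normal(size=n)   # cross-section, mean theta + mu
        z = mu + sig_z * rng.normal(size=tau)         # time series identifying mu
        theta_hat = y.mean() - z.mean()
        se_naive = sig_y / np.sqrt(n)                 # ignores estimation error from the time series
        se_joint = np.sqrt(sig_y**2 / n + sig_z**2 / tau)
        hits_naive += abs(theta_hat - theta) <= 1.96 * se_naive
        hits_joint += abs(theta_hat - theta) <= 1.96 * se_joint
    return hits_naive / reps, hits_joint / reps

for n, tau in [(200, 2000), (2000, 2000), (20000, 2000)]:   # kappa = n / tau growing
    nav, joi = coverage(n, tau)
    print(f"n={n:6d}  tau={tau}  kappa={n/tau:5.1f}  naive coverage={nav:.3f}  joint coverage={joi:.3f}")
```

As $\kappa$ grows, cross-section-only intervals undercover badly while the jointly computed intervals remain close to nominal, which is the pattern the formal results below make precise.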
4.2 Trend Stationary Models Let = ( ; 1 ; :::; T ) and de…ne the shorthand notation fit ( ; ) = f (yit j ; ), gt ( ; ) = g ( t j f ;it ( ; ) = @fit ( ; ) =@ f ;it ( 0; 0) ; gt = gt ( 0 ; 0) and g ;t ( ; ) = @gt ( ; ) =@ : Also let fit = fit ( 0 ; and g ;t = g ;t ( 0; 0) : y it ; ;t f ;it = Depending on whether the estimator under consideration is maximum likelihood or moment based we assume that either (f satisfy the same Assumptions as 0) ; t 1; in Condition 1. We recall that t ;it ; g ;t ) or (fit ; gt ) ( ; ) is a function of (zt ; ; ), where zt are observable macro variables. For the CLT, the process t = is evaluated at the true parameter values and treated as observed. In applications, t ( 0; 0) t will be replaced by an estimate which potentially a¤ects the limiting distribution of : This dependence is analyzed in a step separate from the CLT. The next step is to use Corollary 1 to derive the joint limiting distribution of estimators for P = ( 0 ; 0 )0 . De…ne sM L ( ; ) = 1=2 t=0 +0 +1 @g ( t ( ; ) j t 1 ( ; ) ; ; ) =@ and syM L ( ; ) = P P n 1=2 Tt=1 ni=1 @f (yit j ; ) =@ for maximum likelihood, and sM ( ; ) = 0 (@k ( ; ) =@ ) W 1=2 0+ X t= 0 +1 and syM ( ; ) = (@hn ( ; ) =@ )0 WnC n 1=2 PT Pn t=1 i=1 g( t( ; )j t 1 ( ; ); ; ) f (yit j ; ) for moment based estimators. We use s ( ; ) and sy ( ; ) generically for arguments that apply to both maximum likelihood and moment based estimators. The estimator ^ jointly satis…es the moment restrictions using time series data ^; ^ = 0: (24) sy ^; ^ = 0: (25) s and cross-sectional data 33 ; ); 0 De…ning s ( ) = sy ( )0 ; s ( )0 expansion around 0 the estimator ^ satis…es s ^ = 0. A …rst order Taylor series is used to obtain the limiting distribution for ^. We impose the following additional assumption. Condition 4 Let = ( 0 ; 0 )0 2 Rk ; 2 Rk ; and 2 Rk : De…ne Dn = diag n 1=2 Iy ; 1=2 I ;where Iy is an identity matrix of dimension k and I is an identity matrix of dimension k : Let W C = plimn WnC and W and assume the limits to be positive de…nite and C- = plim W measurable. De…ne h ( ; ) = plimn hn ( ; t ; ) and k ( ; ) = plim k ( ; ) : Assume that for some " > 0; i) sup ii) sup :k 0k :k iii)s sup 0k :k @k ( ; )0 =@ W (@k ( ; ) =@ )0 W " 0 (@hn ( ; ) =@ ) WnC " 0k " @s( ) Dn @ 0 rank almost surely. Let = op (1) ; 0 (@h ( ; ) =@ ) W C = op (1) ; A ( ) = op (1) where A ( ) is C-measurable and A = A ( 0 ) is full = lim n= ; 2 A=4 p Ay; p1 A ; Ay; A ; 3 5 with Ay; = plim n 1 @sy ( 0 ) =@ 0 , Ay; = plim n 1 @sy ( 0 ) =@ 0 , A A ; = plim 1 ; = plim 1 @s ( 0 ) =@ 0 and @s ( 0 ) =@ 0 . Condition 5 For maximum likelihood criteria the following holds: P +[ r] p (r) i) for any s; r 2 [0; 1] with r > s; 1 t=0 0 +[ s]+1 g ;t g 0 ;t ! (s) as ! 1 and where (r) satis…es the same regularity conditions as in Condition 2(ii). P p ii) n1 ni=1 f ;it f 0 ;it ! ty for all t 2 [1; :::; T ] and where ty is positive de…nite a.s. and measurable P with respect to ( 1 ; :::; T ) : Let y = Tt=1 ty . Condition 6 For moment based criteria the following holds: P 0 +[ r] p i) for any s; r 2 [0; 1] with r > s; 1 t;q= gt gq0 ! g (r) 0 +[ s]+1 g (s) as ! 1 and where (r) satis…es the same regularity conditions as in Condition 2(ii). P P p ii) n1 ni=1 fit fir0 ! t;rf for all t; s 2 [1; :::; T ] Let f = Tt;r=1 t;sf : Assume that g de…nite a.s. and measurable with respect to ( 1 ; :::; 34 T ). f is positive Condition 6 accounts for the possibility of misspeci…cation of the model. 
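The following sketch shows one minimal way the system (24)-(25) can be operationalized, under assumed functional forms that are not those of any particular application: the time-series block (24) is the score of an AR(1) model for an observable macro series, and the cross-sectional block (25) is a single moment condition in which the time-series parameter enters.

```python
import numpy as np

rng = np.random.default_rng(4)

# --- hypothetical data generating process --------------------------------------
rho0, theta0 = 0.6, 2.0
tau, n = 400, 1000
z = np.zeros(tau + 1)
for t in range(1, tau + 1):                    # observable macro series z_t
    z[t] = rho0 * z[t - 1] + rng.normal()
y = theta0 / (1 - rho0) + rng.normal(size=n)   # cross-section whose mean depends on (theta, rho)

# --- time-series block: solve the AR(1) score, a (24)-type condition -----------
rho_hat = (z[1:] * z[:-1]).sum() / (z[:-1] ** 2).sum()

# --- cross-sectional block: a (25)-type condition evaluated at rho_hat ---------
# moment condition E[ y_i - theta / (1 - rho) ] = 0
theta_hat = y.mean() * (1 - rho_hat)

print(f"rho_hat   = {rho_hat:.3f}  (true {rho0})")
print(f"theta_hat = {theta_hat:.3f}  (true {theta0})")
```

The sketch assumes the model is correctly specified; Condition 6, by contrast, explicitly allows for misspecification.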
In that case, the martingale di¤erence property of the moment conditions may not hold, necessitating the use of robust standard errors through long run variances. The following result establishes the joint limiting distribution of ^: y it ; Theorem 2 Assume that Conditions 1, 4, and either 5 with y it ; likelihood based estimators or 6 with hold. Assume that ^ 0 ;t = (f ;it ; g ;t ) in the case of = (fit ; gt ) in the case of moment based estimators ;t = op (1) and that (24) and (25) hold. Then, d Dn 1 ^ ! 0 1 A 1=2 W (C-stably) where A is full rank almost surely, C-measurable and is de…ned in Condition 4. The distribution 1=2 of W is given in Corollary 1. In particular, maximum likelihood = y @h( 0 0; 0) @ WC y fW and = diag ( (1)). Then the criterion is y; (1) are given in Condition 5. When the criterion is moment based, C0 @h( 0; 0) @ and @k( (1) = 0 0; 0) @ W g (1) W 0 @k( 0; 0) @ with f and g (1) de…ned in Condition 6. Proof. In Appendix C. Corollary 2 Under the same conditions as in Theorem 2 it follows that p n ^ d 0 Ay; ! 1=2 y Wy p (1) Ay; 1=2 (1) W (1) (C-stably): (26) where Ay; = Ay;1 + Ay;1 Ay; Ay; = Ay;1 Ay; A A ; A ; Ay;1 Ay; A ; Ay;1 Ay; ; 1 1 A ; Ay;1 : For = Ay; y; 0 yA + Ay; (1) Ay; 0 it follows that p n 1=2 ^ d 0 ! N (0; I) (C-stably). 35 (27) Note that ; the asymptotic variance of p ^ n 0 conditional on C, in general is a random variable, and the asymptotic distribution of ^ is mixed normal. However, as in Andrews (2005), the result in (27) can be used to construct an asymptotically pivotal test statistic. For a consistent p 1=2 estimator ^ the statistic n ^ R ^ r is asymptotically distribution free under the null hypothesis R r = 0 where R is a conforming matrix of dimension q k and and r a q 1 vector. 4.3 Unit Root Time Series Models In this section we consider the special case where t+1 = t + t t follows an autoregressive process of the form as discussed in Section 2.3. As in Hansen (1992), Phillips (1987, 1988, 2014) we allow for nearly integrated processes where = exp ( / ) is a scalar parameter localized to unity such that ;t+1 and the notation ;t emphasizes that ;t 1=2 where 0 (28) is = exp ( / ) ;t + (28) t+1 is a sequence of processes indexed by . We assume that ;min(1; 0 ) = 0 = V (0) a.s. is a potentially nondegenerate random variable. In other words, the initial condition for ;min(1; 0) = 1=2 0. We explicitly allow for the case where 0 = 0, to model a situation where the initial condition can be ignored. This assumption is similar, although more parametric than, the speci…cation considered in Kurtz and Protter (1991). We limit our analysis to the case of maximum likelihood criterion functions. Results for moment based estimators can be developed along the same lines as in Section 4.2 but for ease of exposition we omit the details. For the unit root version of our model we assume that t is observed in the data and that the only parameter to be estimated from the time series data is : Further assuming a Gaussian quasi-likelihood function we note that the score function now is g ;t ( ; )= ;t 1 ( ;t ;t 1 ): (29) The estimator ^ solving sample moment conditions based on (29) is the conventional OLS estimator given by P t= ^= P ;t 1 ;t : 2 ;t 1 t= 0 +1 0 +1 36 We continue to use the de…nition for f where 0 =( ; 0 ). ;it ( ; ) in Section 4.2 but now consider the simpli…ed case We note that in this section, rather than 0 used in the cross-sectional model. 
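Before turning to the nonstationary case, the following sketch indicates how the combined variance underlying Corollary 2 can be used in practice. It is a simplified version, valid under the additional assumption that the time-series scores do not depend on $\theta$, in which case the correction takes the familiar plug-in form $A_\theta^{-1}\left(\Omega_y + \kappa\, A_\lambda V_\lambda A_\lambda'\right)A_\theta^{-1\prime}$ with $\kappa = n/\tau$ and $V_\lambda$ the asymptotic variance of $\sqrt{\tau}(\hat\lambda - \lambda_0)$; it is not the general formula of Corollary 2, and all numerical inputs are hypothetical placeholders for estimated Jacobians and score variances.

```python
import numpy as np

def combined_variance(A_theta, A_lambda, Omega_y, V_lambda, n, tau):
    """Asymptotic variance of sqrt(n)(theta_hat - theta_0) when a time-series estimate
    lambda_hat (based on tau observations, variance V_lambda) is plugged into the
    cross-sectional moments. kappa = n / tau weights the time-series contribution."""
    kappa = n / tau
    A_inv = np.linalg.inv(A_theta)
    middle = Omega_y + kappa * A_lambda @ V_lambda @ A_lambda.T
    return A_inv @ middle @ A_inv.T

# hypothetical estimated ingredients
A_theta  = np.array([[1.0, 0.2], [0.2, 1.5]])   # Jacobian of cross-sectional scores w.r.t. theta
A_lambda = np.array([[0.5], [0.3]])             # Jacobian of cross-sectional scores w.r.t. lambda
Omega_y  = np.array([[2.0, 0.1], [0.1, 1.0]])   # variance of cross-sectional scores
V_lambda = np.array([[0.8]])                    # asymptotic variance of the time-series estimator
n, tau = 5000, 250

V = combined_variance(A_theta, A_lambda, Omega_y, V_lambda, n, tau)
print("corrected standard errors:", np.sqrt(np.diag(V) / n))

V_naive = np.linalg.inv(A_theta) @ Omega_y @ np.linalg.inv(A_theta).T
print("cross-section-only SEs:   ", np.sqrt(np.diag(V_naive) / n))
```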
The implicit scaling of ;min(1; 0 ) n (r) ; Y n ) where V Y Note that Z n (r) = 1=2 P 0 +[ r] t= 0 +1 1=2 (r) = is necessary in the ! 1. [ r] ; and T X n X f ;it p : = n t=1 i=1 n Xr] 0 +[ r V n dW n 1 = 0 with W n 1=2 by cross-sectional speci…cation to maintain a well de…ned model even as Consider the joint process (V is the common shock ;min(1; 0) ;t 1 t t= 0 +1 t. We de…ne the limiting process for V n (r) as Z r r e (r s) dW (s) V ;V (0) (r) = e V (0) + (30) 0 where W is de…ned in Theorem 1. When 0 = 0; Theorem 1 directly implies that e [r ]= V n (r) ) Rr e s dW (s) C-stably noting that in this case (s) = 2 (1 exp ( 2s )) 2 and _ (s)1=2 = 0 Rr e s . The familiar result (cf. Phillips 1987) that V n (r) ) 0 e (r s) dW (s) then is a consequence of the continuous mapping theorem. The case in (30) where 0 is a C-measurable random variable now follows from C-stable convergence of V n (r) : In this section we establish joint C-stable Rr convergence of the triple V n (r) ; Y n ; 0 V n dW n : Let 0 = ( 0 ; )0 2 Rk ; = exp ( 0 = ) with 0 2 Rk ; and 2 R and both 2 R: The true parameters are denoted by 0 and 0 and bounded. We impose the following modi…ed 0 assumptions to account for the the speci…c features of the unit root model. Condition 7 De…ne C = except that here ( 0 ). De…ne the -…elds Gn;jmin(1; = n such that dependence on 0 )jn+i in the same way as in (18) is suppressed and that t is replaced with t in G n;(t min(1; 0 ))n+i = yjt 1 ; yjt 2 ; : : : ; yj min(1; 0) n j=1 Assume that i) f ;it is measurable with respect to G n;(t min(1; 0 ))n+i : 37 ; t ; t 1 ; : : : ; min(1; 0 ) ; (yj;t )ij=1 _ C as ii) t is measurable with respect to G iii) for some iv) for some ;it jG n;(t min(1; 0 ))n+i 1 vi) E t jG n;(t min(1; 0 ) 1)n+i vii) For any 1 > r > s r;s ; h > 0 and C < 1; supit E kf ;it k2+ h i 2+ > 0 and C < 1; supt E k t k v) E f Then, n;(t min(1; 0 ))n+i !p (r viii) Assume that respect to C. Let s) 2 : Pn 1 i=1 n y = f PT for all i = 1; :::; n C C =0 = 0 for t > T and all i = f1; :::; ng: P 0 +[ r] 1 0 …xed let r;s; = t=min(1; 0 )+[ 0 ;it f ;it t=1 i p ! ty where ty s]+1 E 2 t jG n;(t min(1; 0 ) 1)n+n : is positive de…nite a.s. and measurable with ty . Conditions 7(i)-(vi) are the same as Conditions 1 (i)-(vi) adapted to the unit root model. Condition 7(vii) replaces Condition 2. It is slightly more primitive in the sense that if t2 is homoskedasP 0 +[ r] 2 tic, Condition 7(vii) holds automatically and convergence of 1 t=min(1; ! (r s) 2 0 )+[ s]+1 t follows from an argument given in the proofs rather than being assumed. On the other hand, Condition 7(vii) is somewhat more restrictive than Condition 2 in the sense that it limits heteroskedasticity to be of a form that does not a¤ect the limiting distribution. In other words, P 0 +[ r] 2 we essentially assume 1 t=min(1; to be proportional to r s asymptotically. This 0 )+[ s]+1 t assumption is stronger than needed but helps to compare the results with the existing unit root literature. For Condition 7(viii) we note that typically 0 = ( 00 ; 0 0; 0 ). Thus, even if ty ty 0 ;it f ;it ( )=E f and (:) is non-stochastic, it follows that surable with respect to C because it depends on 0 ty ty = ty ( 0 ) where is random and mea- which is a random variable measurable w.r.t C. The following results are established by modifying arguments in Phillips (1987) and Chan and Wei (1987) to account for C-stable convergence and by applying Theorem 1. Theorem 3 Assume that Conditions 7 hold. As ; n ! 
1 and T …xed with 2 (0; 1) it follows that Z s V n (r) ; Y n ; V n dW 0 n ) V ;V (0) (r) ; 1=2 y Wy (1) ; Z 0 in the Skorohod topology on DRd [0; 1] : 38 = n for some s V ;V (0) dW (C-stably) Proof. In Appendix C. We now employ Theorem 3 to analyze the limiting behavior of ^ when the common factors are generated from a linear unit root process. To derive a limiting distribution for ^ we impose the following additional assumption. P P Condition 8 Let ^ = arg max Tt=1 ni=1 f (yit j ; ^). Assume that Condition 9 Let s~yit ( ) = f ;it p ( ) = n and s~it ( ) = 1 fi = 1g g ^ ;t 0 = Op n 1=2 . ( ) = . Assume that s~yit ( ) : Rk ! Rk , s~it ( ) : R ! R and de…ne Dn = diag n 1=2 Iy ; 1 , where Iy is an identity matrix of P dimension k . Let = lim n= 2 . Let Ay; ( ) = Tt=1 E [@syit ( ) =@ 0 ] ; Ay; ( ) = and de…ne Ay ( ) = random functions h Ay; ( ) p Ay; ( ) T X i E [@syit ( ) =@ ] t=1 where A ( ) is a k k dimensional matrix of non- ! R. Assume that Ay; ( 0 ) is full rank almost surely. Assume that for some " > 0; max(T; 0 + ) sup :k 0k " X t=min(1; 0 +1) n X @~ syit ( ) Dn 0 @ i=1 Ay ( ) = op (1) We make the possibly simplifying assumption that A ( ) only depends on the factors through the parameter : Theorem 4 Assume that Conditions 7, 8 and 9 hold. It follows that Z 1 1 Z 1 p p d 1 1=2 1 2 ^ Ay; y Wy (1) n Ay; Ay; V ;V (0) dr V 0 ! 0 ;V (0) dW (C-stably). 0 Proof. In Appendix C. The result in Theorem 4 is an example that shows how common factors a¤ecting both time series and cross-section data can lead to non-standard limiting distributions. In this case, the initial condition of the unit root process in the time series dimension causes dependence between the components of the asymptotic distribution of ^ because both on 0. y and V ;V (0) in general depend Thus, the situation encountered here is generally more di¢ cult than the one considered in Stock and Yogo (2006) and Phillips (2014). In addition, because the limiting distribution of ^ is not mixed asymptotically normal, simple pivotal test statistics as in Andrews (2005) are not readily available contrary to the stationary case. 39 5 Summary We argue that cross-sectional inference generally fails in the face of aggregate uncertainty a¤ecting agents’decisions. We propose a general econometric framework that rests on parametrizing the e¤ects of aggregate shocks and on using a combination of time series and cross-sectional data to identify parameters of interest. We develop new asymptotics for combined data-sets based on the notion of stable convergence and establish a Central Limit Theorem. Our results are expected to be helpful for the econometric analysis of rational expectations models involving individual decision making as well general equilibrium settings. 40 References [1] Andrews, D.W.K. (2005): “Cross-Section Regression with Common Shocks,” Econometrica 73, pp. 1551-1585. [2] Aldous, D.J. and G.K. Eagleson (1978): “On mixing and stability of limit theorems,” The Annals of Probability 6, 325–331. [3] Arellano, M., R. Blundell, and S. Bonhomme (2014): “Household Earnings and Consumption: A Nonlinear Framework,”unpublished working paper. [4] Bajari, Patrick, Benkard, C. Lanier, and Levin, Jonathan (2007): “Estimating Dynamic Models of Imperfect Competition,”Econometrica 75, 5, pp. 1331-1370. [5] Billingsley, P. (1968): “Convergence of Probability Measures,” John Wiley and Sons, New York. [6] Brown, B.M. (1971): “Martingale Central Limit Theorems,”Annals of Mathematical Statistics 42, pp. 59-66. [7] Campbell, J.Y., and M. 
Yogo (2006): “E¢ cient Tests of Stock Return Predictability,”Journal of Financial Economics 81, pp. 27–60. [8] Chamberlain, G. (1984): “Panel Data,” in Handbook of Econometrics, eds. by Z. Griliches and M. Intriligator. Amsterdam: North Holland, pp. 1247-1318. [9] Chan, N.H. and C.Z. Wei (1987): “Asymptotic Inference for Nearly Nonstationary AR(1) Processes,”Annals of Statistics 15, pp.1050-1063. [10] Dedecker, J and F. Merlevede (2002): “Necessary and Su¢ cient Conditions for the Conditional Central Limit Theorem,”Annals of Probability 30, pp. 1044-1081. [11] Du¢ e, D., J. Pan and K. Singleton (2000): “Transform Analysis and Asset Pricing for A¢ ne Jump Di¤usions,”Econometrica, pp. 1343-1376. [12] Durrett, R. (1996): “Stochastic Calculus,”CRC Press, Boca Raton, London, New York. 41 [13] Eagleson, G.K. (1975): “Martingale convergence to mixtures of in…nitely divisible laws,”The Annals of Probability 3, 557–562. [14] Feigin, P. D. (1985): “Stable convergence of Semimartingales,”Stochastic Processes and their Applications 19, pp. 125-134. [15] Gagliardini, P., and C. Gourieroux (2011): “E¢ ciency in Large Dynamic Panel Models with Common Factor,”unpublished working paper. [16] Gemici, A., and M. Wiswall (2014): “Evolution of Gender Di¤erences in Post-Secondary Human Capital Investments: College Majors,”International Economic Review 55, 23–56. [17] Gillingham, K., F. Iskhakov, A. Munk-Nielsen, J. Rust, and B. Schjerning (2015): “A Dynamic Model of Vehicle Ownership, Type Choice, and Usage,”unpublished working paper. [18] Hahn, J., and G. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed E¤ects When Both n and T are Large,”Econometrica 70, pp. 1639– 57. [19] Hahn, J., and W.K. Newey (2004): “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,”Econometrica 72, pp. 1295–1319. [20] Hall, P., and C. Heyde (1980): “Martingale Limit Theory and its Applications,” Academic Press, New York. [21] Hansen, B.E. (1992): “Convergence to Stochastic Integrals for Dependent Heterogeneous Processes,”Econometric Theory 8, pp. 489-500. [22] Heckman, J.J., L. Lochner, and C. Taber (1998), “Explaining Rising Wage Inequality: Explorations with a Dynamic General Equilibrium Model of Labor Earnings with Heterogeneous Agents,”Review of Economic Dynamics 1, pp. 1-58. [23] Heckman, J.J., and G. Sedlacek (1985): “Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market,” Journal of Political Economy, 93, pp. 1077-1125. 42 [24] Jacod, J. and A.N. Shiryaev (2002): “Limit Theorems for stochastic processes,” Springer Verlag, Berlin. [25] Kuersteiner, G.M., and I.R. Prucha (2013): “Limit Theory for Panel Data Models with Cross Sectional Dependence and Sequential Exogeneity,”Journal of Econometrics 174, pp. 107-126. [26] Kuersteiner, G.M and I.R. Prucha (2015): “Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity,”CESifo Working Paper No. 5445. [27] Kurtz and Protter (1991): “Weak Limit Theorems for Stochastic Integrals and Stochastic Di¤erential Equations,”Annals of Probability 19, pp. 1035-1070. [28] Lee, D. (2005): “An Estimable Dynamic General Equilibrium Model of Work, Schooling and Occupational Choice,”International Economic Review 46, pp. 1-34. [29] Lee, D., and K.I. Wolpin (2006): “Intersectoral Labor Mobility and the Growth of Service Sector,”Econometrica 47, pp. 1-46. [30] Lee, D., and K.I. Wolpin (2010): “Accounting for Wage and Employment Changes in the U.S. 
from 1968-2000: A Dynamic Model of Labor Market Equilibrium,” Journal of Econometrics 156, pp. 68–85. [31] Magnac, T. and D. Thesmar (2002): “Identifying Dynamic Discrete Decision Processes,” Econometrica 70, pp. 801–816. [32] Magnus, J.R. and H. Neudecker (1988): “Matrix Di¤erential Calculus with Applications in Statistics and Econometrics,”Wiley. [33] Mikusheva, A. (2007): “Uniform Inference in Autoregressive Models,” Econometrica 75, pp. 1411–1452. [34] Murphy, K. M. and R. H. Topel (1985): “Estimation and Inference in Two-Step Econometric Models,”Journal of Business and Economic Statistics 3, pp. 370 –379. [35] Peccati, G. and M.S. Taqqu (2007): “Stable convergence of generalized L2 stochastic integrals and the principle of conditioning,”Electronic Journal of Probability 12, pp.447-480. 43 [36] Phillips, P.C.B. (1987): “Towards a uni…ed asymptotic theory for autoregression,”Biometrika 74, pp. 535-547. [37] Phillips, P.C.B. (1988): “Regression theory for near integrated time series,” Econometrica 56, pp. 1021-1044. [38] Phillips, P.C.B. and S.N. Durlauf (1986): “Multiple Time Series Regression with Integrated Processes,”Review of Economic Studies 53, pp. 473-495. [39] Phillips, P.C.B. and H.R. Moon (1999): “Linear Regression Limit Theory for Nonstationary Panel Data,”Econometrica 67, pp. 1057-1111. [40] Phillips, P.C.B. (2014): “On Con…dence Intervals for Autoregressive Roots and Predictive Regression,”Econometrica 82, pp. 1177–1195. [41] Renyi, A (1963): “On stable sequences of events,”Sankya Ser. A, 25, 293-302. [42] Rootzen, H. (1983): “Central limit theory for martingales via random change of time,” Department of Statistics, University of North Carolina, Technical Report 28. [43] Runkle, D.E. (1991): “Liquidity Constraints and the Permanent-Income Hypothesis: Evidence from Panel Data,”Journal of Monetary Economics 27, pp. 73–98. [44] Shea, J. (1995):“Union Contracts and the Life-Cycle/Permanent-Income Hypothesis,”American Economic Review 85, pp. 186–200. [45] Tobin, J. (1950): “A Statistical Demand Function for Food in the U.S.A.,” Journal of the Royal Statistical Society, Series A. Vol. 113, pp.113-149. [46] Wooldridge, J.M. and H. White (1988): “Some invariance principles and central limit theorems for dependent heterogeneous processes,”Econometric Theory 4, pp.210-230. 44 Appendix –Available Upon Request A Discussion for Section 2.2 A.1 Proof of (5) The maximization problem is equivalent to Since (1 ) ui;t N ( (1+r)+(1 max e (1 ) ; E e (1 2 )2 (1 )ui;t )) =e E e 2 (1 (1 )ui;t : , we have ) + 2 (1 )2 2 2 ; and the maximization problem can be rewritten as follows: max (1+r)+(1 e )(1+ ) )2 2 (1 2 : Taking the …rst order condition, we have, 0= r + 2 2 from which we obtain the solution = A.2 1 2 r + 2 : Euler Equation and Cross Section Our model in Section 2.2 is a stylized version of many models considered in a large literature interested in estimating the parameter using cross-sectional variation. Estimators are often based on moment conditions derived from …rst order conditions (FOC) related to optimal investment and consumption decisions. We illustrate the problems facing such estimators. Assume a researcher has a cross-section of observations for individual consumption and returns ci;t and ui;t . The population FOC of our model16 takes the simple form E e 16 We assume 6= 0 and rescale the equation by 1 : 45 ci;t (r ui;t ) = 0. A just-identi…ed moment based estimator for solves the sample analog n 0. 
It turns out that the probability limit of ^ is equal to ( r)/ ((1 t 1 Pn e i=1 2 ) ^ci;t (r ui;t ) = ), i.e., ^ is inconsistent. We now compare the population FOC a rational agent uses to form their optimal portfolio with the empirical FOC an econometrician using cross-sectional data observes: 1 n n X ci;t e (r ui;t ) = 0: i=1 Noting that ui;t = t + i;t and substituting into the budget constraint ci;t = 1 + r + (1 ) ui;t = 1 + r + (1 ) t + (1 ) i;t we have n 1 n X e ci;t (r ui;t ) = n i=1 1 n X (1+ r+(1 e ) t) (1 ) i;t (r (31) i;t ) t i=1 =e (1+ r+(1 ) t) (r 1 t) n n X e (1 ) 1 n i;t i=1 n 1 n X e (1 ) i;t =E e (1 ) + op (1) = e i;t e (1 ) i=1 Under suitable regularity conditions including independence of that n X i;t i;t i;t ! : in the cross-section it follows 2 (1 )2 2 2 + op (1) (32) i=1 and n 1 n X e (1 ) i;t i;t =E e (1 ) i;t i;t + op (1) = (1 ) 2 e 2 (1 )2 2 2 + op (1) : (33) i=1 Taking limits as n ! 1 in (31) and substituting (32) and (33) then shows that the method of moments estimator based on the empirical FOC asymptotically solves (r Solving for t) + (1 ) 2 e 2 (1 )2 2 2 = 0: (34) we obtain plim ^ = t (1 r ) 2 : This estimate is inconsistent because the cross-sectional data set lacks cross sectional ergodicity, or in other words does not contain the same information about aggregate risk as is used by rational 46 agents. Therefore, the empirical version of the FOC is unable to properly account for aggregate risk and return characterizing the risky asset. The estimator based on the FOC takes the form of an implicit solution to an empirical moment equation, which obscures the e¤ects of cross-sectional non-ergodicity. A more illuminative approach uses our modelling strategy in Section 2.1. On the other hand, it is easily shown using properties of the Gaussian moment generating function that the population FOC is proportional to (1 E e )ui;t (r ui;t ) = r + (1 ) 2 e (1 The main di¤erence between (32) and (33) lies in the fact that sample and that t 6= ) + 2 v 2 (1 )2 2 2 = 0: (35) is estimated to be 0 in the in general. Note that (35) implies that consistency may be achieved with a large number of repeated cross sections, or a panel data set with a long time series dimension. However, this raises other issues discussed later in Section 2.4. B B.1 Details of Section 2.3 Proof of Proposition 1 We start by considering the problem solved by individual i. For simplicity, we will assume that the discount factor is equal to 1. We solve the problem in two steps. First, conditional on the working decision, we determine optimal consumption and, hence, savings. Then, we determine the optimal working decision. Conditional on working in period t, worker i solves the following program: max bt ;ci;t ;ci;t+1 exp ( r ci;t ) + Et [ exp ( r ci;t+1 ) exp ( r (1 ))] s.t. ci;t + bw i;t = yi;t + wi;t ci;t+1 = yi;t+1 + Rbw i;t ; or equivalently, max ci;t ;ci;t+1 exp ( r ci;t ) + Et [ exp ( r ci;t+1 ) exp ( r (1 s.t. ci;t+1 = yi;t+1 + R (yi;t + wi;t 47 ci;t ) : ))] The …rst order condition for ci;t generate the following standard Euler equation: exp ( r ci;t ) = Et [exp ( r ci;t+1 ) exp ( r (1 ))] : Replacing for ci;t+1 using the budget constraint, gives us exp ( r ci;t ) = Et [exp f r (yi;t+1 + R (yi;t + wi;t ci;t ))g exp ( r (1 ))] ; which can be written as follows: exp ( r ci;t ) = exp ( r R (yi;t + wi;t ci;t )) exp ( r (1 )) Et [exp ( r yi;t+1 )] : As mentioned in the main text, after the shocks are realized, individuals can infer zt+1 = ln i;t+1 . 
Under our assumption, zt+1 is normally distributed with mean zero and variance a consequence, zt+1 jzt is also normally distributed with mean 2 z are functions of the variances and covariances of ln and t+1 and variance z zt i;t+1 . 2 z, 2 + where = exp ( r ( + 1 + 1 2 xi;t + zt+1 ))] 2 xi;t )) Et [exp ( r zt+1 )] : Now, conditional on zt , zt+1 is normal with mean equal to Cov (zt ; zt+1 ) (zt E [zt+1 ] + Var (zt ) % 2 1 %2 E [zt ]) = 2 1 %2 2 + zt z zt ; and variance equal to Var (zt+1 ) Cov (zt ; zt+1 )2 = Var (zt ) 1 2 %2 + % 2 2 2 %2 1 2 z: It follows that Et [exp ( r zt+1 )] = exp r z zt + r2 2 2 z 2 and hence, exp ( r ci;t ) = exp ( r R (yi;t + wi;t exp ( r z zt ) exp ci;t )) exp ( r (1 r2 2 2 z 2 48 : + 2 . As z and Speci…cally, for individuals who worked we have Et [exp ( r yi;t+1 )] = Et [exp ( r ( t+1 )) exp ( r ( 1 + 2 xi;t )) Taking the log of both sides and solving for ci;t , we obtain cw i;t = R (yi;t + wi;t ) + 1 + ( 1 + 2 xi;t ) + r z zt 2 2 z 2 : (R + 1) Following similar steps using the non-labor income of an individual who does not work, it can be shown that his consumption takes the following form Ryi;t + ( cnw i;t = 1+ 2 xi;t ) + r z zt 2 2 z 2 (R + 1) : We can now determine whether it is optimal for the individual to work in period t by comparing the life-time utility conditional on working with the corresponding utility conditional on not working. The individual works if expf r cw i;t g + Et expf r cnw i;t g expf r (1 expf r yi;t + wi;t )g + Et cw i;t expf r (1 expf r )g cnw i;t expf r (1 yi;t )g or equivalently expf r cw i;t g + Et expf r expf r cnw i;t g expf r (1 yi;t+1 + R yi;t + wi;t )g + Et expf r cw i;t yi;t+1 + R yi;t expf r (1 cnw i;t )g expf r (1 )g : The Euler equation of a working individual implies that expf r cw i;t g = Et [expf r (yi;t+1 + R (yi;t + wi;t ci;t )) expf r (1 )g] : Similarly, the Euler equation of a non-working individual implies that expf r cnw i;t g expf r (1 )g = Et [expf r (yi;t+1 + R (yi;t ci;t ))g expf r (1 )g] : The above inequality is therefore satis…ed if 2 exp r cw i;t 2 exp r cnw i;t exp ( r (1 )) : nw Taking logs of both sides and replacing for cw i;t and ci;t , the inequality becomes (after simpli…cation) wi;t 1 49 ; or equivalently 1 wt 1 : ei;t 1 is drawn from a uniform distribution de…ned in the interval [ ; 1]. The ei;t probability that a skilled worker chooses to work takes therefore the following form: The random variable P wtA 1 ei;t 1 =P 1 ei;t wtA 1 = (1 ) (1 ) wtA : 1 Similarly, the probability that an unskilled worker decides to work has the following form: P wtB 1 1 ei;t =P 1 ei;t wtB 1 = (1 ) (1 ) wtB : 1 The labor supply of skilled and unskilled worker is therefore given by StA = NtA (1 ) (1 ) wtA and StB = NtB 1 (1 ) (1 ) wtB 1 : or, equivalently, ln StA = ln NtA +ln (1 ) wtA ln (1 and ) ln StB = ln NtB +ln (1 ) wtB ln (1 We will now solve the …rm’s problem. 
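Before turning to the firm's problem, a quick numerical check of the conditional-normal step used above may be helpful. The following sketch, with hypothetical covariance values and a generic exponent $a$ standing in for the term multiplying $z_{t+1}$ in the Euler equation, verifies that $E[\exp(a z_{t+1}) \mid z_t]$ agrees with $\exp(a\beta z_t + a^2\sigma^2_{\mathrm{cond}}/2)$, where $\beta$ and $\sigma^2_{\mathrm{cond}}$ are the usual projection coefficients for a jointly normal pair.

```python
import numpy as np

rng = np.random.default_rng(5)

# Jointly normal (z_t, z_{t+1}) with hypothetical covariance structure.
var_t, var_t1, cov = 1.3, 1.3, 0.7
a = -0.8                                   # stands in for the coefficient on z_{t+1}

draws = rng.multivariate_normal([0.0, 0.0], [[var_t, cov], [cov, var_t1]], size=2_000_000)
z_t, z_t1 = draws[:, 0], draws[:, 1]

beta = cov / var_t                         # conditional mean of z_{t+1} given z_t is beta * z_t
var_cond = var_t1 - cov**2 / var_t         # conditional variance

# Compare a binned Monte Carlo conditional expectation with the closed form.
z0 = 0.5
mask = np.abs(z_t - z0) < 0.01
mc = np.exp(a * z_t1[mask]).mean()
closed_form = np.exp(a * beta * z0 + 0.5 * a**2 * var_cond)
print(f"Monte Carlo: {mc:.4f}   closed form: {closed_form:.4f}")
```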
Since the market price is normalized to 1, the …rm chooses the amount of skilled and unskilled labor that maximizes the following pro…t function: max qt c(qt ) = B LA t ;Lt D t LA t LB t A wtA LA t B wtB LB t : B Deriving the …rst order conditions and solving for LA t and Lt , it can be shown that the optimal labor demands take the following form: LA t = " LB t = " 1 D t A wtA B B B wtB # 1 1 and 1 D t B wtB A A A wtA # A B 1 1 A B ; or equivalently, ln LA t = 1 1 A ln B D t + (1 B ) ln A 50 (1 A B ) ln wt + B ln B B ln wtB ): and ln LB t = 1 1 D t ln A B + (1 A ) ln B B A ) ln wt (1 + A ln A A ln wtA : Equilibrium in the labor market requires that ln DtA = ln StA ln DtB = ln StB ; and or, equivalently, ln D t + (1 B ) ln A A B ) ln wt (1 1 A + ln B B B ln wtB B = ln NtA +ln (1 wtA ln (1 ); wtB ln (1 ): ) and ln D t + (1 A ) ln B B A ) ln wt (1 1 A + ln A A A ln wtA B = ln NtB +ln (1 ) The equilibrium wages wtA and wtB are the solution to these two equations. B.2 Estimation of ( A ; B) We assume that the time series data include wtA ; wtB ; StA ; StB ; NtA ; NtB in addition to (Yt+1 ; Xt ). We also assume that NtA ; NtB are exogenous in the sense that they are independent of D t . Note that ln DtA = ln StA can be rewritten as ln 1 A D t (1 = B ) ln A 1 B + 1 1 + A B A B B ln B ln (1 B ln wtA + B 1 A B ) + ln NtA + ln 1 wtA ln wtB : The independence assumption suggests a straightforward method of moments estimation. If we rede…ne to also include u and s and augment k ( ; ) to include 0 1 1 C 1 XB 1 B C B A + ln Nss + ln wsA + ln wsA + B ln Ns C 1 1 1 @ A A B s=1 B ln Ns we can estimate ( A; B ) B A along with the other parameters of the model, where (1 B ) ln A 1 A + B ln B B 51 ln (1 ) E ln 1 A D t : B B ln wsB ; C Proofs for Section 4 C.1 Proof of Theorem 1 To prove the functional central limit theorem we follow Billingsley (1968) and Dedecker and Merlevede (2002). The proof involves establishing …nite dimensional convergence and a tightness argument. For …nite dimensional convergence …x r1 < r2 < Xn (ri ) = Xn (ri ) < rk 2 [0; 1]. De…ne the increment (36) Xn (ri 1 ) : Since there is a one to one mapping between Xn (r1 ) ; :::; Xn (rk ) and Xn (r1 ) ; Xn (r2 ) ; :::; Xn (rk ) we establish joint convergence of the latter. The proof proceeds by checking that the conditions of Theorem 1 in Kuersteiner and Prucha (2013) hold. Let kn = max(T; )n where both n ! 1 and ! 1 such that clearly kn ! 1 (this is a diagonal limit in the terminology of Phillips and Moon, 1999). Let d = k + k : To handle the fact that Xn 2 Rd we use Lemmas A.1 - A.3 in Phillips and Durlauf (1986). De…ne De…ne t = t j = 0 j;y ; 0 j; 0 and let = ( 1; : : : ; k) 2 Rdk with k k = 1: min (1; 0 ). For each n and 0 de…ne the mapping q (t; i) : N2+ ! N+ as q (i; t) := t n+i and note that q (i; t) is invertible, in particular for each q 2 f1; :::; kn g there is a unique pair t; i such that q (i; t) = q: We often use shorthand notation q for q (i; t). Let •q(i;t) k X 0 j ~it (rj ) E j=1 h ~it (rj ) jG n;t n+i 1 i (37) where ~it (rj ) = ~it (rj ) Note that and ~it (rj ) = ~y (rj ) ; it ~it (rj 1 ) ; ~it (r1 ) = ~it (r1 ) : (38) 0 ~ (rj ) with t 8 < 0 for j > 1 ~y (rj ) = it : ~y for j = 1 it 8 < ~ (r ) if [ r ] < t [ r ] and i = 1 j j 1 j ;t ~ (rj ) = : it : 0 otherwise 52 (39) (40) Using this notation and noting that 0 1 Xn (r1 ) + k X Pmax(T; 0+ ) t=min(1; 0 +1) 0 j Pn i=1 •q(i;t) = Pkn •q , we write q=1 Xn (rj ) j=2 kn X = q=1 First analyze the term max(T; 0 + ) •q + X n X k X 0 jE t=min(1; 0 +1) i=1 j=1 Pkn • q=1 q . 
Note that y n;it h ~it (rj ) G n;t n+i 1 i is measurable with respect to G (41) n;t n+i by con- struction. Note that by (37), (39) and (40) the individual components of •q are either 0 or equal h i to ~it (rj ) E ~it (rj ) jG n;t n+i 1 respectively. This implies that •q is measurable with respect to h i G n;q ; noting in particular that E ~it (rj ) jG n;t n+i 1 is measurable w.r.t G n;t n+i 1 by the propi h • erties of conditional expectations and G n;t n+i 1 G n;q . By construction, E r jG n;q 1 = 0. Pq • This establishes that for Snq = s=1 s , fSnq ; G n;q ; 1 q kn ; n 1g is a mean zero martingale array with di¤erences •q . To establish …nite dimensional convergence we follow Kuersteiner and Prucha (2013) in the proof of their Theorem 2. Note that, for any …xed n and given q, and thus for a corresponding unique vector (t; i) ; there exists a unique j 2 f1; : : : ; kg such that 0 + [ rj 1 ] < t 0 + [ rj ] : Then, •q(i;t) = k X l=1 = 0 1;y + 0 j; ~it (rl ) 0 l E h ~it (rl ) jG n;t n+i 1 i h i ~y jG n;t n+i 1 1 fj = 1g E it it h i ~ (rj ) E ~ (rj ) jG n;t n+i 1 1 f[ rj 1 ] < t ;t ;t ~y [ rj ]g 1 f i = 1g where all remaining terms in the sum are zero because of by (37), (39) and (40). For the subsequent inequalities, …x q 2 f1; :::; kn g (and the corresponding (t; i) and j) arbitrarily. Introduce the shorthand notation 1j = 1 fj = 1g and 1ij = 1 f[ rj 1 ] < t First, note that for [ rj ]g 1 f i = 1g : 0, and by Jensen’s inequality applied to the empirical measure 53 1 4 P4 i=1 xi we have that •q 2+ = 42+ 42+ + 42+ = 22+2 + 22+2 h i h i 1 0 ~y E ~y jG n;t n+i 1 1j + 1 0 ~ (rj ) E ~ (rj ) jG n;t n+i 1 1ij ;t ;t it 4 1;y it 4 j; h i 2+ 2+ 1 1 2+ 2+ y y ~ ~ k 1;y k + k 1;y k E it jG n;t n+i 1 1j it 4 4 h i 2+ 2+ 1 1 + k j; k2+ E ~ ;t (rj ) jG n;t n+i 1 k j; k2+ ~ ;t (rj ) 1ij 4 4 h i 2+ 2+ 1j + k 1;y k2+ E ~ity jG n;t n+i 1 k 1;y k2+ ~ity k j; ~ (rj ) ;t k2+ 2+ +k j; h E ~ ;t (rj ) jG k2+ n;t n+i 1 i 2+ 2+ 1ij : We further use the de…nitions in (19) such that by Jensen’s inequality and for i = 1 and t 2 [ 0 + 1; 0 + ] 2+ ~ (rj ) ;t + E ~ ;t (rj ) jG 1 2+ ;t 1+ =2 1 2+ ;t 1+ =2 while for i > 1 or t 2 =[ 0 + 1; h 0 + E h +E Similarly, for t 2 [1; :::; T ] 2+ it 1 h + E ~ity jG =2 k while for t 2 = [1; :::; T ] Noting that k E •q j;y k jG ;t n;t n+i 1 i y 2+ it k n;t n+i 1 h +E k i n;t n+i 1 i 2+ y 2+ it k jG ~y = 0: it 1 and k 2+ G 2+ 2+ n;t n+i 1 = 0: it n1+ jG ;t 2+ + ]; ~ ~y n;t n+i 1 i n;q 1 j; k < 1, 23+2 1 fi = 1; t 2 [ 0 1+ =2 + 1; 23+2 1 ft 2 [1; :::; T ]g h + E k n1+ =2 54 0 + ]g y 2+ it k E G h 2+ G ;t n;t n+i 1 i ; n;t n+i 1 i (42) where the inequality in (42) holds for check that 0. To establish the limiting distribution of kn X 2+ •q E ! 0; q=1 kn X q=1 and p •2 ! q X 0 1;y 1;y + yt sup E 4 n k X (43) 0 j; (rj rj 1 ) j; (44) ; j=1 t2f1;::;T g 2 Pkn • q=1 q we h kn X E •q2 G q=1 n;q 1 i !1+ =2 3 5 < 1; (45) which are adapted to the current setting from Conditions (A.26), (A.27) and (A.28) in Kuersteiner and Prucha (2013). These conditions in turn are related to conditions of Hall and Heyde (1980) and are shown by Kuersteiner and Prucha (2013) to be su¢ cient for their Theorem 1. To show that (43) holds note that from (42) and Condition 1 it follows that for some constant C < 1; kn X E •q 0+ X 23+2 2+ 1+ =2 q=1 E t= 0 +1 h 2+ T n 23+2 X X h E k + 1+ =2 n t=1 i=1 23+2 C 1+ =2 + i ;t G n;t n y 2+ it k G n;t n i 23+2 nT C 23+2 C 23+2 T C = + !0 =2 n1+ =2 n =2 because 23+2 C and T are …xed as ; n ! 1. 
P n •2 Next, consider the probability limit of kq=1 q : We have kn X •2 q q=1 = k 0+ 1X X 0 j; ;t jG n;t n E ;t j=1 t= 0 +1 2 +p n X 0 1; t2f 0 ;:::; 0 + g \fmin(1; 0 );::;T g (46) 1ij ;t jG n;t n E ;t 2 ( y 1t E[ 0 y 1t jG n;t n ]) 1;y 1 ft 0 + [ r1 ]g 1ij 1j (47) + 1 n X n X t2fmin(1; 0 );::;T g i=1 0 1;y y n;it E y n;it jG n;t n+i 1 55 2 1j (48) where for (46) we note that E k 0+ 1X X ;t jG n;t n 0 j; = 0 when t > T . This implies that j=1 t= 0 +1 1X X k = 2 ;t jG n;t n E ;t 1ij 0 j; j=1 t2f 0 ;:::; 0 + g\fmin(1; 0 +1);::;T g 0+ X 1X k + 2 0 j; ;t 2 ;t jG n;t n E ;t 1ij 1ij j=1 t=maxf 0 ;T g where by Condition 2 0+ X 1X k 2 0 j; ;t j=1 t=maxf 0 ;T g 1f 0 p + [ rj 1 ] < t 0 + [ rj ]g ! k X 0 j; ( (rj ) (rj 1 )) j; j=1 and 2 1+ =2 k 6 1X 6 E6 4 j=1 X 0 j ;t t2f 0 ;:::; 0 + g \fmin(1; 0 +1);::;T g 20 k 6BX 1 6B = 1+ =2 E 6@ 4 j=1 (T + j 0 j) =2 k 0 ;t n;t n E t2f 0 ;:::; 0 + g \fmin(1; 0 +1);::;T g =2 1+ =2 22+ (T + j 0 j)1+ 1+ =2 X 2 ;t jG E k X X ;t jG E j=1 t2f 0 ;:::; 0 + g\fmin(1; 0 +1);::;T g =2 k 1+ =2 sup E t h 2+ ;t i h 7 7 7 5 1ij 2 n;t n 0 3 ;t (49) 11+ C 1ij C A E =2 3 7 7 7 5 ;t jG n;t n 2+ i ! 0; where the …rst inequality follows from noting that the set f 0 ; :::; 0 + g \ fmin (1; 0 ) ; ::; T g has at most T + j 0 j elements and from using Jensen’s inequality on the counting measure. The second inequality follows from Hölder’s inequality. Finally, we use the fact that (T + j 0 j) = ! 0 and h i 2+ supt E C < 1 by Condition 1(iv). ;t 56 Next consider (47) where 2 6 2 E6 4 p 2 p E n h n X 0 1; t2f 0 ;:::; 0 + g \fmin(1; 0 );::;T g X E t2f 0 ;:::; 0 + g \fmin(1; 0 );::;T g ( y 1t E[ h 0 1 23 p sup E n t 1y i ;t t2f 0 ;:::; 0 + g \fmin(1; 0 );::;T g 1=2 sup E ;t y 1t ( 2 n;t n i E[ 0 y 1t jG n;t n ]) 1;y 1 ft 1=2 0 7 + [ r1 ]g 7 5 1=2 X 1=2 24 T + j 0 j p sup E n t ;t jG E ;t 2 0 y 1t jG n;t n ]) ;t jG n;t n+i 1 E ;t 3 i;t 1f 0 <t E ( y 1t h h 2 y n;it 0 E[ i (50) + [ rj ]g 2 0 y 1t jG n;t n ]) 1;y i 1=2 (51) 1=2 !0 (52) where the …rst inequality in (50) follows from the Cauchy-Schwartz inequality, (51) uses Condition 1(iv), and the last inequality uses Condition 1(iii). Then we have in (51), by Condition 1(iii) and the Hölder inequality that E such that (52) follows. h ( y 1t E[ 2 0 y 1t jG n;t n ]) y i h 2E k y 2 1t k i p We note that (52) goes to zero as long as T = n ! 0: Clearly, this condition holds as long as T is held …xed, but holds under weaker conditions as well. Next the limit of (48) is, by Condition 1(v) and Condition 3, n 1 X X 0 2 p y y E n;it jG n;t n+i 1 ! 1;y n;it n i=1 t2f1;::;T g This veri…es (44). Finally, for (45) we check that 2 kn X 2 4 sup E E •q G n First, use (42) with kn X q=1 n;q 1 q=1 !1+ = 0 to obtain E •q 0+ 23 X 2 G n;q 1 t= + 23 n E 0 h n X X t2f1;::;T g i=1 57 2 ;t E h G =2 X 0 1;y yt 1;y : t2f1;::;T g 3 5 < 1: n;t n i 2 y n;it G n;t n+i 1 (53) i : (54) Applying (54) to (53) and using the Hölder inequality implies 2 !1+ =2 3 kn X 2 5 E4 E •q G n;q 1 q=1 =2 2 2 0+ X 3 2 E4 20 6 23 + 2 =2 E 4@ n E t= 0 +1 h 2 G ;t X n;t n+i 1 n X E t2f 0 ;:::; 0 + g\f1;::;T g i=1 h i !1+ =2 2 y n;it 3 5 G n;t n+i 1 i By Jensen’s inequality, we have 0+ 1 X E t= 0 +1 h and E so that 2 3 2 E4 0+ X 2 ;t h h E t= 0 +1 G 2 G ;t 2 ;t G n;t n+i 1 ! i 1+ n;t n+i 1 n;t n+i 1 i1+ ! 
i 1+ =2 0+ 1 X E t= 0 +1 =2 E =2 3 h =2 0+ X =2 > T (which holds eventually) 20 n h 2 6@ 23 X X y E jG E4 n;it n i=1 sup E n;t n+i t2f1;::;T g =2 23+3 (T n) n1+ 3+3 =2 2 T =2 =2 T X n X E t=1 i=1 1+ =2 sup E i;t h G t= 0 +1 23+3 h 2+ y n;it 2+ y n;it i i 3+3 =2 2 sup E t h 2+ ;t i +2 3+3 =2 T =2 1+ =2 sup E h i G < 1: 3 7 5 2+ y n;it =2 i ;t ;t i1+ n;t n+i 1 ii (55) (56) <1 i;t 58 7 5 2+ 2+ 11+ i A 1 3 n;t n+i 1 h h E E h =2 A n;t n+i 1 By combining (55) and (56) we obtain the following bound for (53), 2 !1+ =2 3 kn X 2+ 5 E4 E •q jG n;q 1 q=1 G ;t ;t t and similarly, for all 2 2+ 23+3 5 h 11+ i < 1: This establishes that (43), (44) and (45) hold and thus establishes the CLT for Pkn • q: q=1 It remains to be shown that the second term in (41) can be neglected. Consider max(T; 0 + ) X n X k X 0 jE t=min(1; 0 +1) i=1 j=1 = k 0+ X X 1=2 t= +n 1=2 0 0 j; h ~it (rj ) jG E ;t jG n;t n+i 1 0 1;y E y n;it jG n;t n+i 1 j=1 T X n X n;t n+i 1 t=1 i=1 1f 0 i + [ rj 1 ] < t 0 + [ rj ]g : Note that ;t jG n;t n+i 1 E = 0 for t > T and y n;it jG n;t n+i 1 E = 0: This implies, using the convention that a term is zero if it is a sum over indices from a to b with a > b, that 1=2 0+ X t= = 1=2 0 ;t jG n;t n+i 1 E 0 T X k X t= 0 0 j; ;t jG n;t n+i 1 E j=1 1f 0 + [ rj 1 ] < t 0 + [ rj ]g : By a similar argument used to show that (49) vanishes, and noting that T is …xed while it follows that as 2 E4 1=2 1+ =2 T X t= 0 j; E 0 n X k 0+ X X t= 0 0 jE i=1 j=1 h ~it (rj ) jG 3 5!0 ;t jG n;t n ! 1: The Markov inequality then implies that 1=2 ! 1; n;t n+i 1 i = op (1) : and consequently that 0 1 Xn (r1 ) + k X 0 j Xn (rj ) = kn X q=1 j=2 59 •q + op (1) : (57) We have shown that the conditions of Theorem 1 of Kuersteiner and Prucha (2013) hold by establishing (43), (44), (45) and (57). Applying the Cramer-Wold theorem to the vector Ynt = Xn (r1 )0 ; Xn (r2 )0 :::; Xn (rk )0 0 it follows from Theorem 1 in Kuersteiner and Prucha (2013) that for all …xed r1 ; ::; rk and using the convention that r0 = 0; 2 0 E [exp (i 0 Ynt )] ! E 4exp @ When 0 1@ 2 X 0 1;y yt 1;y 0 j; ( (rj ) (rj 1 )) j; j=1 t2f1;::;T g for all r 2 [0; 1] and some (r) = r + k X 113 AA5 : (58) positive de…nite and measurable w.r.t C this result simpli…es to k X 0 j; ( (rj ) (rj 1 )) j; = j=1 k X 0 j; j; (rj rj 1 ) : j=1 The second step in establishing the functional CLT involves proving tightness of the sequence 0 Xn (r) : By Lemma A.3 of Phillips and Durlauf (1986) and Proposition 4.1 of Wooldridge and White (1988), see also Billingsley (1968, p.41), it is enough to establish tightness componentwise. This is implied by establishing tightness for 0 Xn (r) for all 2 Rd such that 0 = 1. In the following we make use of Theorems 8.3 and 15.5 in Billingsley (1968). We need to show that for the ‘modulus of continuity’ ! (Xn ; ) = sup j 0 (Xn (s) (59) Xn (t))j jt sj< where t; s 2 [0; 1] it follows that lim lim supP (! (Xn ; ) !0 ") = 0: n; De…ne 1 XX Xn ;y (r) = p n t=1 i=1 T Since j 0 (Xn (s) Xn (t))j 0 y (Xn n ;y y n;it ; (s) Xn Xn 60 ; ;y Xr] 0 +[ 1 (r) = p (t)) + j ;t : t= 0 +1 0 (Xn ; (s) Xn ; (t))j 0 y and noting that (Xn ;y (s) Xn ;y (t)) = 0 uniformly in t; s 2 [0; 1] because of the initial condition Xn (0) given in (21) and the fact that Xn ;y (t) is constant as a function of t: It follows that sup j ! (Xn ; ) 0 (Xn ; (s) Xn (60) (t))j ; js tj< such that P (! (Xn ; ) 3") P (! (Xn ; ; ) 3") : To analyze the term in (60) use Billingsley (1968, Theorem 8.4) and the comments in Billingsley P (1968, p. 59). Let Ss = t=0 +s0 +1 0 ;t . 
To establish tightness it is enough to show that for each 0 " > 0 there exists c > 1 and such that if P > max jSk+s s 0 p Sk j > c" hold for all k: Note that for each k …xed, Ms = Ss+k " c2 Sk and Fs = G (61) n;(s+k min(1; 0 )n+1) ; fMs ; Fs g is a martingale. By a maximal inequality, see Hall and Heyde (1980, Corollary 2.1), it follows that for each k P p Sk j > c" max jSk+s s max jSk+s Sk jp > (c")p p=2 s hP pi 1 k+ 0 E ;t t=k+1 (c")p p=2 2p p=2 " 2p p sup E = 2 p 2 1+p sup E ;t (c")p p=2 t c c " t =P p ;t (62) by an inequality similar to (49). Note that the bound for (62) does not depend on k. Now choose c = 2p=(p 2) supt E p 1=(p 2) ;t ="(1+p)=(p 2) such that (61) follows. We now identify the limiting distribution using the technique of Rootzen (1983). Tightness together with …nite dimensional convergence in distribution in (58), Condition 2 and the fact that the partition r1 ; :::; rk is arbitrary implies that for with y = P 2 Rd with = 0 y; yt : 0 1 2 E [exp (i 0 Xn (r))] ! E exp t2f1;::;T g 0 0 y y y + 0 (63) (r) Let W (r) = (Wy (r) ; W (r)) be a vector of mutually independent standard Brownian motion processes in Rd ; independent of any C-measurable random variable. We note that the RHS of (63) is the same as E exp 1 2 0 y y y + 0 (r) = E exp i 0 y 1=2 y Wy (1) + i Z r 0 _ (t) 1=2 dW (t) 0 61 (64) : The result in (64) can be deduced in the same way as in Du¢ e, Pan and Singleton (2000), in Rt particular p.1371 and their Proposition 1. Conjecture that Xt = 0 (@ (t) =@t)1=2 dW (t). By Condition 2(iii) and the fact that @ (t) =@t does not depend on Xt it follows that the conditions of Durrett (1996, Theorem 2.8, Chaper 5) are satis…ed. This means that the stochastic di¤erential Rt equation Xt = 0 (@ (t) =@t)1=2 dW (s) with initial condition X0 = 0 has a strong solution X; Wt ; FtW _ C where FtW is the …ltration generated by W (t) : Then, Xt is a martingale (and thus a local martingale) w.r.t the …ltration FtW : For C-measurable functions (t) : [0; 1] ! R and (t) : [0; 1] ! Rk de…ne the transformation The terminal conditions (r) = 0 and r (r) + (r)0 Xr : = exp r are imposed such that (r) = i (r) + (r)0 Xr = exp (i = exp 0 Xr ) : The goal is now to show that r jC] E[ = 0 = exp a (0) + (0)0 X0 = exp (a (0)) where the initial condition X0 = 0 was used. In other words, we need to …nd such that t = (t)0 (Xt ), t r = 0 + Z (t)0 (Xt ) (Xt ) (t) + _ (t) + _ (t) Xt use Ito’s 1 2 (t) = Lemma to obtain r (s) ds + s It follows that for r to be a martingale we need _ (t) = 0 and _ (t) = 1=2 0 (@ (t) =@t) Z r Z r 1 0 (r) (0) = _ (t) dt = @ 0 0 2 1 2 0 r s dWs : (t) = 0 which implies the di¤erential equations : Using the terminal condition (t) =@t dt = 0 ( (r) C = exp 1 2 (r) = 0 it follows that (0)) = 1 2 0 (r) and (r) E exp i Z 0 0 (0) = (t) is a martingale. Following the proof of Proposition 1 in Du¢ e, Pan and Singleton r (2000) and letting or (t) and Z r 0 _ (t) 1=2 dW (t) 0 0 (r) a.s. (65) which implies (64) after taking expectations on both sides of (65). To check the regularity conditions in Du¢ e et al (2000, De…nition A) note that 62 t = 0 because (x) = 0: Thus, (i) holds automatically. 
C.2 Proof of Corollary 1

We note that the finite dimensional convergence established in the proof of Theorem 1 implies that

$$E\big[\exp\big(i\lambda' X_n(1)\big)\big] \to E\Big[\exp\Big(-\tfrac12\big(\lambda_y'\Omega_y\lambda_y + \lambda_\varepsilon'\Lambda(1)\lambda_\varepsilon\big)\Big)\Big].$$

We also note that because of (65) it follows that

$$E\Big[\exp\Big(i\lambda_\varepsilon'\int_0^1\dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)\Big)\Big] = E\Big[\exp\Big(-\tfrac12\lambda_\varepsilon'\Lambda(1)\lambda_\varepsilon\Big)\Big],$$

which shows that $\int_0^1\dot\Lambda(t)^{1/2}\,dW_\varepsilon(t)$ has the same distribution as $\Lambda(1)^{1/2}W_\varepsilon(1)$.

C.3 Proof of Theorem 2

Let $s^y_{it}(\beta,\delta) = f_{\beta,it}(\beta,\delta)$ and $s^\varepsilon_t(\beta,\delta) = g_{\delta,t}(\beta,\delta)$ in the case of maximum likelihood estimation, and $s^y_{it}(\beta,\delta) = f_{it}(\beta,\delta)$ and $s^\varepsilon_t(\beta,\delta) = g_t(\beta,\delta)$ in the case of moment based estimation. Using the notation developed before, we define

$$\tilde s^y_{it}(\beta,\delta) = \begin{cases} s^y_{it}(\beta,\delta)/\sqrt{n} & \text{if } t\in\{1,\ldots,T\},\\ 0 & \text{otherwise,}\end{cases}$$

analogously to (20), and

$$\tilde s^\varepsilon_{it}(\beta,\delta) = \begin{cases} s^\varepsilon_t(\beta,\delta)/\sqrt{\tau} & \text{if } t\in\{\tau_0+1,\ldots,\tau_0+\tau\} \text{ and } i = 1,\\ 0 & \text{otherwise,}\end{cases}$$

analogously to (19). Stack the moment vectors in $\tilde s_{it}(\theta) := \tilde s_{it}(\beta,\delta) = \big(\tilde s^y_{it}(\beta,\delta)',\ \tilde s^\varepsilon_{it}(\beta,\delta)'\big)'$ and define the scaling matrix

$$D_n = \operatorname{diag}\big(n^{-1/2}I_y,\ \tau^{-1/2}I_\varepsilon\big), \qquad (66)$$

where $I_y$ is an identity matrix of dimension $k_y$ and $I_\varepsilon$ is an identity matrix of dimension $k_\varepsilon$. For the maximum likelihood estimator, the moment conditions (24) and (25) can be directly written as

$$\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}\big(\hat\beta,\hat\delta\big) = 0.$$

For moment based estimators we have by Conditions 4(i) and (ii) that

$$\sup_{\|\theta-\theta_0\|\le\varepsilon}\Big\|\big(s^y_M(\beta,\delta)',\ s^\varepsilon_M(\beta,\delta)'\big)' - \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}(\beta,\delta)\Big\| = o_p(1).$$

It then follows that for the moment based estimators

$$0 = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}\big(\hat\beta,\hat\delta\big) + o_p(1).$$

A first order mean value expansion around $\theta_0 = (\beta_0',\delta_0')'$ leads to

$$o_p(1) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}(\theta_0) + \Bigg(\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\frac{\partial\tilde s_{it}}{\partial\theta'}\bigg|_{\bar\theta}\, D_n\Bigg)\, D_n^{-1}\big(\hat\theta-\theta_0\big),$$

where $\hat\theta = (\hat\beta',\hat\delta')'$ and $\bar\theta$ lies between $\hat\theta$ and $\theta_0$; with some abuse of notation we implicitly allow $\bar\theta$ to differ across rows of $\partial\tilde s_{it}/\partial\theta'$. Note that

$$\frac{\partial\tilde s_{it}}{\partial\theta'} = \begin{bmatrix}\partial\tilde s^y_{it}(\beta,\delta)/\partial\beta' & \partial\tilde s^y_{it}(\beta,\delta)/\partial\delta' \\ \partial\tilde s_{it,\varepsilon}(\beta,\delta)/\partial\beta' & \partial\tilde s_{it,\varepsilon}(\beta,\delta)/\partial\delta'\end{bmatrix},$$

where $\tilde s_{it,\varepsilon}$ denotes the moment conditions associated with $\delta$. Solving for $D_n^{-1}(\hat\theta-\theta_0)$, it follows from Condition 4(iii) and Theorem 1 that (note that we make use of the continuous mapping theorem, which is applicable because Theorem 1 establishes stable and thus joint convergence)

$$D_n^{-1}\big(\hat\theta-\theta_0\big) = A(\theta_0)^{-1}\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}(\theta_0) + o_p(1).$$

It now follows from the continuous mapping theorem and joint convergence in Corollary 1 that

$$D_n^{-1}\big(\hat\theta-\theta_0\big) \overset{d}{\to} A(\theta_0)^{-1}\,\Omega^{1/2}W \quad (\mathcal{C}\text{-stably}).$$
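The block structure of $D_n$ in (66) reflects the fact that the two groups of parameters are estimated from samples of different sizes and therefore converge at different rates. The toy simulation below is an illustration only; the simple mean-plus-AR(1) design, the parameter values and the sample sizes are assumptions made for this sketch and are not the model of the paper. It estimates a cross-sectional parameter from $n$ units and a time-series parameter from $\tau$ periods and confirms that $\sqrt{n}$ and $\sqrt{\tau}$ are the respective scalings that keep the two components of $D_n^{-1}(\hat\theta-\theta_0)$ stochastically bounded.

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau, reps = 5_000, 300, 2_000
beta0, delta0 = 1.0, 0.5          # cross-sectional mean, AR(1) coefficient of the aggregate series

draws = []
for _ in range(reps):
    # cross-sectional block: beta estimated from n units
    y = beta0 + rng.normal(size=n)
    beta_hat = y.mean()
    # time-series block: delta estimated from tau periods of an AR(1) series
    e = rng.normal(size=tau + 1)
    v = np.zeros(tau + 1)
    for t in range(1, tau + 1):
        v[t] = delta0 * v[t - 1] + e[t]
    delta_hat = (v[1:] @ v[:-1]) / (v[:-1] @ v[:-1])
    # the two components of D_n^{-1}(theta_hat - theta_0), each scaled at its own rate
    draws.append([np.sqrt(n) * (beta_hat - beta0), np.sqrt(tau) * (delta_hat - delta0)])

draws = np.asarray(draws)
print(draws.std(axis=0))   # both entries are O_p(1): approx. 1 and approx. sqrt(1 - delta0**2)
```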
C.4 Proof of Corollary 2

Partition

$$A(\theta_0) = \begin{bmatrix} A_{y\beta} & A_{y\delta} \\ A_{\varepsilon\beta} & A_{\varepsilon\delta}\end{bmatrix}$$

conformably with $(\beta,\delta)$ and let $A(\theta_0)^{-1}$ be the corresponding partitioned inverse. It now follows from the continuous mapping theorem and joint convergence in Corollary 1 that

$$D_n^{-1}\big(\hat\theta-\theta_0\big) \overset{d}{\to} A(\theta_0)^{-1}\,\Omega^{1/2}W \quad (\mathcal{C}\text{-stably}),$$

where the right hand side has a mixed normal distribution,

$$A(\theta_0)^{-1}\,\Omega^{1/2}W \sim MN\big(0,\ A(\theta_0)^{-1}\,\Omega\,A(\theta_0)^{-1\prime}\big),$$

and where $\Omega$ collects the cross-sectional and time-series variance blocks $\Omega_y$ and $\Lambda(1)$. The form of the matrices $A_{y\beta}$, $A_{y\delta}$, $A_{\varepsilon\beta}$, $A_{\varepsilon\delta}$, $\Omega_y$ and $\Lambda$ follows from Condition 5 in the case of the maximum likelihood estimator. For the moment based estimator it follows from Condition 6, the definition of $s^y_M(\beta,\delta)$ and $s^\varepsilon_M(\beta,\delta)$, and Conditions 4(i) and (ii).

C.5 Proof of Theorem 3

We first establish the joint stable convergence of $(V_{\tau n}(r), Y_n)$. Recall that $V_{\tau n}(r)$ can be written as the sum of an initial-condition term, $\exp\big(([\tau r]-\min(1,\tau_0))\kappa/\tau\big)V_{\tau n}(0)$, and the weighted innovation term $1\{[\tau r]>\min(1,\tau_0)\}\exp\big([\tau r]\kappa/\tau\big)\tilde V_{\tau n}(r)$, where

$$\tilde V_{\tau n}(r) = \frac{1}{\sqrt{\tau}}\sum_{s=\min(1,\tau_0)+1}^{\tau_0+[\tau r]}\exp(-s\kappa/\tau)\,\varepsilon_{\sigma,s}.$$

We establish joint stable convergence of $(\tilde V_{\tau n}(r), Y_n)$ and use the continuous mapping theorem to deal with the first term. By the continuous mapping theorem (see Billingsley (1968, p. 30)), the characterization of stable convergence on $D[0,1]$ (as given in JS, Theorem VIII 5.33(ii)) and an argument used in Kuersteiner and Prucha (2013, p. 119), stable convergence of $(\tilde V_{\tau n}(r), Y_n)$ implies that $(\exp([\tau r]\kappa/\tau)\tilde V_{\tau n}(r), Y_n)$ also converges jointly and $\mathcal{C}$-stably. Subsequently, this argument will simply be referred to as the 'continuous mapping theorem'. In addition,

$$\exp\big(([\tau r]-\min(1,\tau_0))\kappa/\tau\big)\,V_{\tau n}(0) \overset{p}{\to} \exp(r\kappa)\,V(0),$$

which is measurable with respect to $\mathcal{C}$. Together these results imply joint stable convergence of $(V_{\tau n}(r), Y_n)$.

We thus turn to $(\tilde V_{\tau n}(r), Y_n)$. To apply Theorem 1 we need to show that $\exp(-s\kappa/\tau)\varepsilon_{\sigma,s}$ satisfies Conditions 1(iv) and 2. Since

$$\big|\exp(-s\kappa/\tau)\,\varepsilon_{\sigma,s}\big|^{2+\delta} = \big|\exp(-s\kappa/\tau)\big|^{2+\delta}\,|\varepsilon_{\sigma,s}|^{2+\delta} \le e^{|\kappa|(2+\delta)}\,|\varepsilon_{\sigma,s}|^{2+\delta}, \qquad (67)$$

it follows that $E[|\exp(-s\kappa/\tau)\varepsilon_{\sigma,s}|^{2+\delta}] \le C$, and Condition 1(iv) holds. Note that $E[|\varepsilon_{\sigma,t}|^{2+\delta}] \le C$ holds since we impose Condition 7. Next, note that $E[\exp(-2s\kappa/\tau)\varepsilon_{\sigma,s}^2] = \sigma^2\exp(-2s\kappa/\tau)$. Then, it follows from the proof of Chan and Wei (1987, Equation 2.3) (see Appendix D for details) that

$$\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\exp(-2t\kappa/\tau)\,\varepsilon_{\sigma,t}^2 \overset{p}{\to} \sigma^2\int_s^r\exp(-2\kappa t)\,dt. \qquad (68)$$

In this case, $\Lambda_\kappa(r) = \sigma^2\big(1-\exp(-2r\kappa)\big)/(2\kappa)$ and $\dot\Lambda_\kappa(r)^{1/2} = \sigma\exp(-r\kappa)$. By the relationship in (64) and Theorem 1 we have that

$$\big(\tilde V_{\tau n}(r),\ Y_n\big) \Rightarrow \Big(\sigma\int_0^r e^{-s\kappa}\,dW_\varepsilon(s),\ \Omega_y^{1/2}W_y(1)\Big) \quad \mathcal{C}\text{-stably},$$

which implies, by the continuous mapping theorem and $\mathcal{C}$-stable convergence, that

$$\big(V_{\tau n}(r),\ Y_n\big) \Rightarrow \Big(\exp(r\kappa)V(0) + \sigma\int_0^r e^{(r-s)\kappa}\,dW_\varepsilon(s),\ \Omega_y^{1/2}W_y(1)\Big) \quad \mathcal{C}\text{-stably}. \qquad (69)$$

Note that $\int_0^r e^{(r-s)\kappa}\,dW_\varepsilon(s)$ is the same term as in Phillips (1987), while the limit given in (69) is the same as in Kurtz and Protter (1991, p. 1043). We now square the recursion (28), whose autoregressive coefficient is $e^{\kappa/\tau}$ and whose underlying process we denote by $v_t$, and sum both sides, as in Chan and Wei (1987, Equation (2.8)) or Phillips (1987), to express

$$\frac1\tau\sum_{s=\tau_0+1}^{\tau_0+\tau}v_{s-1}\,\varepsilon_{\sigma,s} \qquad (70)$$

in terms of $\tau^{-1}\big(v_{\tau_0+\tau}^2 - v_{\tau_0}^2\big)$, $\tau^{-1}\big(e^{2\kappa/\tau}-1\big)\sum_{s=\tau_0+1}^{\tau_0+\tau}v_{s-1}^2$ and $\tau^{-1}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varepsilon_{\sigma,s}^2$. We note that $e^{-\kappa/\tau}\to 1$ and $\tau\big(e^{2\kappa/\tau}-1\big)\to 2\kappa$ as $\tau\to\infty$. Furthermore, note that for all $\eta,\varepsilon>0$ it follows by the Markov and triangular inequalities and Condition 7(iv) that

$$P\bigg(\frac1\tau\sum_{t=\tau_0+1}^{\tau_0+\tau} E\Big[\varepsilon_{\sigma,t}^2\,1\big\{\tau^{-1/2}|\varepsilon_{\sigma,t}|>\eta\big\}\,\Big|\,\mathcal{G}_{n,tn}\Big] > \varepsilon\bigg)
\le \frac{1}{\varepsilon\tau}\sum_{t=\tau_0+1}^{\tau_0+\tau} E\Big[\varepsilon_{\sigma,t}^2\,1\big\{\tau^{-1/2}|\varepsilon_{\sigma,t}|>\eta\big\}\Big]
\le \frac{\sup_t E\big[|\varepsilon_{\sigma,t}|^{2+\delta}\big]}{\varepsilon\,\eta^{\delta}\,\tau^{\delta/2}} \to 0$$

as $\tau\to\infty$, such that Condition 1.3 of Chan and Wei (1987) holds. Let $U^2_{\tau,k} = \frac1\tau\sum_{t=\tau_0+1}^{k+\tau_0}E\big[\varepsilon_{\sigma,t}^2\mid\mathcal{G}_{n,tn}\big]$. Then, by Hölder's and Jensen's inequalities,

$$E\Big[\big|U^2_{\tau,\tau}\big|^{1+\delta/2}\Big] \le \frac1\tau\sum_{t=\tau_0+1}^{\tau_0+\tau}E\Big[E\big[\varepsilon_{\sigma,t}^2\mid\mathcal{G}_{n,tn}\big]^{1+\delta/2}\Big] \le \sup_t E\big[|\varepsilon_{\sigma,t}|^{2+\delta}\big] < \infty, \qquad (71)$$

such that $U^2_{\tau,\tau}$ is uniformly integrable. The bound in (71) also means that, by Theorem 2.23 of Hall and Heyde (1980), $E\big[\big|U^2_{\tau,\tau} - \frac1\tau\sum_{t=\tau_0+1}^{\tau_0+\tau}\varepsilon_{\sigma,t}^2\big|\big]\to 0$, and thus by Condition 7(vii) and by Markov's inequality

$$\frac1\tau\sum_{t=\tau_0+1}^{\tau_0+\tau}\varepsilon_{\sigma,t}^2 \overset{p}{\to} \sigma^2.$$

We also have

$$\frac1\tau v_{\tau_0+\tau}^2 = V_{\tau n}(1)^2, \qquad \frac1\tau v_{\tau_0}^2 \overset{p}{\to} V(0)^2, \qquad \frac1\tau\sum_{s=1}^{\tau}V_{\tau n}(s/\tau)^2 = \int_0^1 V_{\tau n}(r)^2\,dr, \qquad (72)$$

such that by the continuous mapping theorem and (69) it follows that

$$\frac1\tau\sum_{s=\tau_0+1}^{\tau_0+\tau}v_{s-1}\,\varepsilon_{\sigma,s} \Rightarrow \frac12\Big(V_{\kappa,V(0)}(1)^2 - V(0)^2 - \sigma^2 - 2\kappa\int_0^1 V_{\kappa,V(0)}(r)^2\,dr\Big). \qquad (73)$$

An application of Ito's calculus to $V_{\kappa,V(0)}(r)^2/2$ shows that the RHS of (73) is equal to $\sigma\int_0^1 V_{\kappa,V(0)}\,dW_\varepsilon$, which also appears in Kurtz and Protter (1991, Equation 3.10). However, note that the results in Kurtz and Protter (1991) do not establish stable convergence and thus do not directly apply here. When $V(0) = 0$ these expressions are the same as in Phillips (1987, Equation 8). It is then a further consequence of the continuous mapping theorem that

$$\Big(V_{\tau n}(r),\ Y_n,\ \frac1\tau\sum_{s=\tau_0+1}^{\tau_0+\tau}v_{s-1}\,\varepsilon_{\sigma,s}\Big) \Rightarrow \Big(V_{\kappa,V(0)}(r),\ \Omega_y^{1/2}W_y(1),\ \sigma\int_0^1 V_{\kappa,V(0)}\,dW_\varepsilon\Big) \quad \mathcal{C}\text{-stably}.$$
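The weak-convergence claim in (69) can be illustrated numerically. The sketch below is purely illustrative and not part of the proof; the Gaussian innovations, the zero initial condition and the values of $\kappa$, $\sigma$ and $\tau$ are assumptions of this toy exercise. It simulates a local-to-unity AR(1) with coefficient $e^{\kappa/\tau}$ and compares the variance of the scaled endpoint with the variance $\sigma^2(e^{2\kappa r}-1)/(2\kappa)$ of the limit $\sigma\int_0^r e^{(r-s)\kappa}dW_\varepsilon(s)$ when $V(0)=0$.

```python
import numpy as np

rng = np.random.default_rng(3)
tau, reps, kappa, r, sigma = 1_000, 10_000, -2.0, 1.0, 1.0   # illustrative values only

rho = np.exp(kappa / tau)                 # local-to-unity AR coefficient exp(kappa / tau)
v = np.zeros(reps)                        # V(0) = 0 case
for t in range(int(tau * r)):             # v_t = rho * v_{t-1} + eps_t
    v = rho * v + sigma * rng.normal(size=reps)
V_end = v / np.sqrt(tau)                  # the analogue of V_{tau n}(r)

# variance of the limit  sigma * int_0^r exp((r-s)*kappa) dW(s)  when V(0) = 0
var_limit = sigma**2 * (np.exp(2 * kappa * r) - 1) / (2 * kappa)
print(round(V_end.var(), 4), round(var_limit, 4))   # close for large tau
```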
C.6 Proof of Theorem 4

For $\tilde s_{it}(\theta) = \big(\tilde s^y_{it}(\beta,\delta)',\ \tilde s_{it,\varepsilon}(\delta)'\big)'$ we note that in the case of the unit root model

$$\frac{\partial\tilde s_{it}(\theta)}{\partial\theta'} = \begin{bmatrix}\partial\tilde s^y_{it}(\beta,\delta)/\partial\beta' & \partial\tilde s^y_{it}(\beta,\delta)/\partial\delta' \\ 0 & \partial\tilde s_{it,\varepsilon}(\delta)/\partial\delta'\end{bmatrix}.$$

Defining

$$A^y_n(\theta) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\Big[\partial\tilde s^y_{it}/\partial\beta' \quad \partial\tilde s^y_{it}/\partial\delta'\Big]\, D_n,$$

we have, as before, that for some $\tilde\theta$ between $\hat\theta$ and $\theta_0$

$$D_n^{-1}\big(\hat\theta-\theta_0\big) = A_n\big(\tilde\theta\big)^{-1}\sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)}\sum_{i=1}^{n}\tilde s_{it}(\theta_0),$$

where the lower block of $A_n(\tilde\theta)$ involves $\tau^{-2}\sum_{t=\tau_0+1}^{\tau_0+\tau}v_{t-1}^2$. Using the representation

$$\frac{1}{\tau^2}\sum_{t=\tau_0+1}^{\tau_0+\tau}v_{t-1}^2 = \int_0^1 V_{\tau n}(r)^2\,dr,$$

it follows from the continuous mapping theorem and Theorem 3 that

$$\Big(V_{\tau n}(r),\ Y_n,\ A^y_n(\theta_0),\ \int_0^1 V_{\tau n}(r)^2\,dr,\ \frac1\tau\sum_{s=\tau_0+1}^{\tau_0+\tau}v_{s-1}\,\varepsilon_{\sigma,s}\Big)
\Rightarrow \Big(V_{\kappa,V(0)}(r),\ \Omega_y^{1/2}W_y(1),\ A^y(\theta_0),\ \int_0^1 V_{\kappa,V(0)}(r)^2\,dr,\ \sigma\int_0^1 V_{\kappa,V(0)}\,dW_\varepsilon\Big) \qquad (74)$$

($\mathcal{C}$-stably). The partitioned inverse formula implies that

$$A(\theta_0)^{-1} = \begin{bmatrix} A_{y\beta}^{-1} & -A_{y\beta}^{-1}A_{y\delta}\Big(\int_0^1 V_{\kappa,V(0)}(r)^2\,dr\Big)^{-1} \\ 0 & \Big(\int_0^1 V_{\kappa,V(0)}(r)^2\,dr\Big)^{-1}\end{bmatrix}. \qquad (75)$$

By Condition 9, (74) and the continuous mapping theorem it follows that

$$D_n^{-1}\big(\hat\theta-\theta_0\big) \Rightarrow A(\theta_0)^{-1}\begin{bmatrix}\Omega_y^{1/2}W_y(1) \\ \sigma\int_0^1 V_{\kappa,V(0)}\,dW_\varepsilon\end{bmatrix}. \qquad (76)$$

The result now follows immediately from (75) and (76).

D Proof of (68)

Lemma 1 Assume that Conditions 7, 8 and 9 hold. For $r,s\in[0,1]$ fixed and as $\tau\to\infty$ it follows that

$$\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\Big(e^{-2t\kappa/\tau}\,\varepsilon_{\sigma,t}^2 - e^{-2t\kappa/\tau}\,E\big[\varepsilon_{\sigma,t}^2\,\big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big]\Big) \overset{p}{\to} 0.$$

Proof. By Hall and Heyde (1980, Theorem 2.23) we need to show that for all $\varepsilon>0$

$$\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2t\kappa/\tau}\,E\Big[\varepsilon_{\sigma,t}^2\,1\big\{\tau^{-1/2}e^{-t\kappa/\tau}|\varepsilon_{\sigma,t}|>\varepsilon\big\}\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big] \overset{p}{\to} 0. \qquad (77)$$

By Condition 7(iv) it follows that for some $\delta>0$

$$E\Bigg[\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2t\kappa/\tau}\,E\Big[\varepsilon_{\sigma,t}^2\,1\big\{\tau^{-1/2}e^{-t\kappa/\tau}|\varepsilon_{\sigma,t}|>\varepsilon\big\}\,\Big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\Big]\Bigg]
\le \frac{1}{\tau^{1+\delta/2}\,\varepsilon^{\delta}}\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-(2+\delta)t\kappa/\tau}\,E\big[|\varepsilon_{\sigma,t}|^{2+\delta}\big]
\le \frac{[\tau r]-[\tau s]}{\tau^{1+\delta/2}}\,\frac{e^{(2+\delta)|\kappa|}}{\varepsilon^{\delta}}\,\sup_t E\big[|\varepsilon_{\sigma,t}|^{2+\delta}\big] \to 0.$$

This establishes (77) by the Markov inequality. Since $\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}e^{-2t\kappa/\tau}E\big[\varepsilon_{\sigma,t}^2\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big]$ is uniformly integrable by (67) and (71), it follows from Hall and Heyde (1980, Theorem 2.23, Eq. 2.28) that

$$E\Bigg[\bigg|\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]}\Big(e^{-2t\kappa/\tau}\,\varepsilon_{\sigma,t}^2 - e^{-2t\kappa/\tau}\,E\big[\varepsilon_{\sigma,t}^2\,\big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big]\Big)\bigg|\Bigg] \to 0.$$

The result now follows from the Markov inequality.
Lemma 2 Assume that Conditions 7, 8 and 9 hold. For $r,s\in[0,1]$ fixed and as $\tau\to\infty$ it follows that

$$\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2t\kappa/\tau}\,E\big[\varepsilon_{\sigma,t}^2\,\big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big] \overset{p}{\to} \sigma^2\int_s^r\exp(-2\kappa t)\,dt.$$

Proof. The proof closely follows Chan and Wei (1987, p. 1060-1062), with a few necessary adjustments. Fix $\eta>0$ and choose $s = t_0 \le t_1 \le \cdots \le t_k = r$ such that

$$\max_{i\le k}\big|e^{-2\kappa t_i} - e^{-2\kappa t_{i-1}}\big| < \eta.$$

This implies

$$\bigg|\int_s^r e^{-2\kappa t}\,dt - \sum_{i=1}^{k}e^{-2\kappa t_i}(t_i - t_{i-1})\bigg| \le \sum_{i=1}^{k}\int_{t_{i-1}}^{t_i}\big|e^{-2\kappa t} - e^{-2\kappa t_i}\big|\,dt. \qquad (78)$$

Let $I_i = \{l : \tau_0+[\tau t_{i-1}] < l \le \tau_0+[\tau t_i]\}$. Then,

$$\frac1\tau\sum_{t=\tau_0+[\tau s]+1}^{\tau_0+[\tau r]} e^{-2t\kappa/\tau}\,E\big[\varepsilon_{\sigma,t}^2\,\big|\,\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n}\big] - \sigma^2\int_s^r e^{-2\kappa t}\,dt
= \mathrm{I}_n + \mathrm{II}_n + \mathrm{III}_n,$$

where

$$\mathrm{I}_n = \frac1\tau\sum_{i=1}^{k}\sum_{l\in I_i}\Big(e^{-2l\kappa/\tau} - e^{-2[\tau t_{i-1}]\kappa/\tau}\Big)\,E\big[\varepsilon_{\sigma,l}^2\,\big|\,\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big],$$

$$\mathrm{II}_n = \sum_{i=1}^{k} e^{-2[\tau t_{i-1}]\kappa/\tau}\bigg(\frac1\tau\sum_{l\in I_i}E\big[\varepsilon_{\sigma,l}^2\,\big|\,\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma^2\,(t_i - t_{i-1})\bigg),$$

$$\mathrm{III}_n = \sigma^2\bigg(\sum_{i=1}^{k} e^{-2[\tau t_{i-1}]\kappa/\tau}\,(t_i - t_{i-1}) - \int_s^r e^{-2\kappa t}\,dt\bigg).$$

For $\mathrm{III}_n$ we have that $e^{-2[\tau t_{i-1}]\kappa/\tau} \to e^{-2\kappa t_{i-1}}$ as $\tau\to\infty$. In other words, there exists a $\tau'$ such that for all $\tau\ge\tau'$, $\max_{i\le k}\big|e^{-2[\tau t_{i-1}]\kappa/\tau} - e^{-2\kappa t_{i-1}}\big| \le \eta$, and by the choice of the partition and (78),

$$|\mathrm{III}_n| \le 2\,\sigma^2\,\eta.$$

We also have, by Condition 7(vii), that

$$\frac1\tau\sum_{l\in I_i}E\big[\varepsilon_{\sigma,l}^2\,\big|\,\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma^2\,(t_i - t_{i-1}) = o_p(1)$$

as $\tau\to\infty$, such that, for fixed $k$,

$$|\mathrm{II}_n| \le e^{2|\kappa|}\sum_{i=1}^{k}\bigg|\frac1\tau\sum_{l\in I_i}E\big[\varepsilon_{\sigma,l}^2\,\big|\,\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] - \sigma^2\,(t_i - t_{i-1})\bigg| = o_p(1).$$

Finally, there exists a $\tau''$ such that for all $\tau\ge\tau''$

$$\max_{i\le k}\ \max_{l\in I_i}\Big|e^{-2l\kappa/\tau} - e^{-2[\tau t_{i-1}]\kappa/\tau}\Big|
\le 2\max_{i\le k}\Big|e^{-2[\tau t_i]\kappa/\tau} - e^{-2\kappa t_i}\Big| + \max_{i\le k}\Big|e^{-2\kappa t_i} - e^{-2\kappa t_{i-1}}\Big| \le 3\eta.$$

We conclude that

$$|\mathrm{I}_n| \le 3\eta\,\frac1\tau\sum_{i=1}^{k}\sum_{l\in I_i}E\big[\varepsilon_{\sigma,l}^2\,\big|\,\mathcal{G}_{n,(l-\min(1,\tau_0)-1)n}\big] \le 3\eta\,\sigma^2\big(1+o_p(1)\big).$$

The remainder of the proof is identical to Chan and Wei (1987, p. 1062).
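As a closing illustration of the convergence in (68) that Lemmas 1 and 2 combine to deliver, the sketch below compares the weighted sum of squared innovations with the deterministic limit $\sigma^2\int_s^r e^{-2\kappa t}\,dt$. It is not part of the argument; it assumes $\tau_0 = 0$, i.i.d. innovations with variance $\sigma^2$, and arbitrary illustrative values of $\kappa$, $\sigma$, $s$ and $r$.

```python
import numpy as np

rng = np.random.default_rng(4)
tau, kappa, sigma, s, r = 100_000, 1.5, 2.0, 0.2, 0.9   # illustrative values only

t = np.arange(int(tau * s) + 1, int(tau * r) + 1)        # summation range as in (68), with tau_0 = 0
eps = sigma * rng.standard_t(df=8, size=t.size) / np.sqrt(8 / 6)   # variance-sigma^2 shocks, fat tails
lhs = (np.exp(-2 * kappa * t / tau) * eps**2).sum() / tau

# deterministic limit: sigma^2 * int_s^r exp(-2 * kappa * u) du
rhs = sigma**2 * (np.exp(-2 * kappa * s) - np.exp(-2 * kappa * r)) / (2 * kappa)
print(round(lhs, 4), round(rhs, 4))   # close for large tau
```

Heavier-tailed innovations (here Student-t with 8 degrees of freedom, rescaled to variance $\sigma^2$) are used to emphasize that only $2+\delta$ moments are required for the convergence.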