Estimation of Constant Gain Learning Models
Eric Gaus∗
Srikanth Ramamurthy†
April 2014
Abstract
Constant Gain Learning models present a unique set of challenges to the
econometrician compared to models based on Rational Expectations. This is
because of the differences in the equilibrium concept and stability conditions
between the two paradigms. This paper focuses on three key issues: stability
conditions, identification and derivation of the likelihood function. Several illustrative examples complement the general estimation methodology to demonstrate
the aforementioned issues in practice.
Keywords: Adaptive Learning, Constant Gain, E-Stability.
∗ Gaus: Ursinus College, 601 East Main St., Collegeville, PA 19426-1000 (e-mail: egaus@ursinus.edu).
† Ramamurthy: Sellinger School of Business, Loyola University Maryland, 4501 N. Charles Street, Baltimore, MD 21210 (e-mail: sramamurthy@loyola.edu).
1 Introduction
Adaptive learning explores economic decision making within a bounded rationality
framework. In the field of macroeconomics, adaptive learning models have made a
resurgence primarily based on the foundations of Evans and Honkapohja (2001). While
much of the work in this area has remained within the confines of theory, there is a
small but growing literature that attempts to fit these models to real data. Both
the complexity of models and econometric techniques have varied tremendously. On
the frequentist side, Orphanides and Williams (2005) calibrate the learning parameters and then estimate the structural parameters of the model. Branch and Evans
(2006) demonstrate that a simple calibrated adaptive learning model provides the best
fit and forecast of the Survey of Professional Forecasters. Chevillon, Massmann and
Mavroeidis (2010) demonstrate a technique based on the Anderson-Rubin statistic for
improving estimates of the structural parameters.
With the exception of Chevillon et al. (2010), the central focus of the empirical
learning literature has been macroeconomic, concerned primarily with the relative fit
of various learning models to the data. Details of the estimation methodology, in particular, the unique set of issues involved in the estimation of these models, remain
elusive to many emerging researchers. This is particularly relevant in light of the fundamental differences in the equilibrium concept, stability conditions and the associated
parameter constraints between the learning and RE paradigms. The goal of this paper
is to fill this void. It describes the general methodology for estimating a broad category
of learning models, paying careful attention to the specific issues that the researcher is
likely to encounter in the process. Specifically, it focuses on the following three areas:
1. Stability conditions: Equilibrium in a learning model is defined by convergence
to some solution. Typically, though not always, the benchmark is the solution
under rational expectations. Evans and Honkapohja (2001) study this equilibrium in great depth and characterize its stability properties with the E-stability
principle. An important question that arises in this regard is the relation between
E-stability and determinacy of the RE solution. McCallum (2007) demonstrates
that under certain conditions determinacy is a sufficient condition for E-stability.
Likewise it is also possible that when a multiplicity of solutions exist, some of
those solutions may not be E-stable. It is important for the researcher to be
aware of these subtleties to ensure that the appropriate parameter constraints
are enforced in the estimation process. These issues are particularly relevant
when estimating a model with lagged endogenous variables. From a practical
standpoint, it is necessary to assume that agents include only t − 1 data in their
information set. In the case of lagged endogenous variables, this results in multiple equilibria under learning. Typically, E-stability conditions can help the
researcher select the appropriate equilibrium when taking a model to the data.
In the simplest case two solutions exist, one locally E-stable and the other locally
E-unstable. As shown in Marcet and Sargent (1989), in this case convergence
to the E-stable solution occurs only with the use of a projection facility. This
technique ensures that a series of bad shocks does not lead the system outside the
basin of attraction. An alternative approach is to penalize the likelihood function
whenever the underlying law of motion becomes explosive. Example 2 in Section
3 highlights these issues.
2. Identification: Chevillon et al. (2010) show that, under certain conditions, the
structural parameters may not all be identified as the equilibrium under learning
converges to that of RE. Clearly this raises concerns about the reliability of the
estimates in models that assume E-stability. However, as we show in this paper
with the aid of a simple bivariate example, identification of the structural parameters can be improved as long as one of the variables is influenced by expectations
of other variables, but is itself not directly or indirectly part of the expectational
feedback loop. As it turns out, the interest rate, specified by the Taylor (1993)
rule in the NK-DSGE models, satisfies precisely this condition. Note that expectations of the interest rate appear neither in the Taylor rule (direct), nor in the
IS or Phillips curve equations (indirect). Intuitively, the interest rate provides
an additional measurement that aids in the identifiability of the parameters by
shrinking the set of possible parameter values that give rise to the same data. It
is our belief that a more generalized version of this property plays an important
role in the identification of the structural parameters in learning based NK-DSGE
models.
3. Derivation of the Likelihood function: From an econometric standpoint, an interesting aspect of fitting these models concerns the evaluation of the likelihood.
Because the agents in the models have limited information - either about some
of the model parameters or the structure of the model itself (or possibly both) they update their beliefs about the model based on some form of least squares
(LS) estimation. Consequently, the evolution of the endogenous variables depends critically on these LS coefficients. We refer to these coefficients as the
learning parameters. Neither the learning parameters nor the covariates of the
LS model are observed by the econometrician. While these quantities may not be
of particular interest to the researcher, deriving the joint density of the data as a
function of only the model parameters (but marginalized of the learning parameters) becomes an issue. The approach taken in the literature is to recursively
calculate the learning parameters based on predicted values of the unobserved
covariates. A detailed discussion of this is presented in the following section of
the paper.
Because much of the empirical work in this area has focused mainly on learning
under constant gain (CGL), the goal here is to provide a self-contained guide that can
be readily implemented for estimating CGL models. To accomplish this goal, several
examples complement the general estimation methodology. The first two examples
illustrate the aforementioned issues with the aid of simulated data. These examples
are meant to be instructive with complete details of the estimation process. Since New-Keynesian (NK) Dynamic Stochastic General Equilibrium (DSGE) models serve as one
of the bulwarks of the empirical macroeconomics field, our third example is a standard
NK-DSGE model that is fit to real data. The handful of papers in this area continue
the tradition of fitting these models using Bayesian techniques. Milani (2007) shows
that incorporating learning in a simple linearized DSGE model results in a better fit
to the data compared to its RE counterpart. That paper also considers interesting extensions by
introducing learning in the underlying microfounded model along the lines of Evans,
Honkapohja and Mitra (2011) and Preston (2005). Continuing on those lines Eusepi
and Preston (2011) find that expectations, in general, and learning, specifically, may
explain business cycle fluctuations. More recently, Slobodyan and Wouters (2012) have
estimated a more complicated medium scale DSGE model. Their findings suggest that
the overall fit of the model hinges on the initialization of the agents’ beliefs. More
importantly, they conclude that limiting the information set of the agents leads to an
empirical fit that compares favorably to that of a structural VAR model. For a more in-depth
survey of the literature on DSGE models under learning we refer the interested reader
to Milani (2012).
The rest of the paper is organized as follows. The next section describes the general
class of adaptive learning models and the estimation procedure. Section three provides
three examples from this class of models. Concluding remarks are provided in Section
four.
2 General Framework
Evans and Honkapohja (2001) provides a unifying theoretical framework for analyzing
the dynamic properties of adaptive learning models. In general, the methodology
for taking a learning model to the data involves some additional steps compared to
a model based on rational expectations. While some of these steps are founded in
theory, others are relevant from an empirical standpoint. This section describes the
model, the stability conditions, the state space form, and estimation strategy. For ease
of readability, the exposition here closely follows Chapter 10 in Evans and Honkapohja
(2001).
Consider the class of models given by

    y_t = K + N y_{t-1} + M_0 y_t^e + M_1 y_{t+1}^e + P w_t + Q υ_t    (1)
    w_t = F w_{t-1} + e_t    (2)
Here yt is an n × 1 vector of endogenous variables, wt is a p × 1 vector of exogenous
variables that follows a stationary VAR process with et ∼ N (0, Σ1 ) and υt ∼ N (0, Σ2 )
is an additional q dimensional error that is independent of et ; n ≥ p + q. Capital letters
denote vectors or matrices of appropriate dimensions that comprise the model parameters θ. For future reference, we call the parameters in matrices K, N , M0 , and M1 the
structural parameters and those in P , Q, F , Σ1 and Σ2 the shock parameters. Finally,
the superscript e represents the mathematical expectation of y given the information set I at time t − 1: y_t^e := E(y_t | I_{t-1}). One could also work with contemporaneous expectations by defining the appropriate information set.¹ Clearly, expectations of y
depend on what is and is not included in this information set. Rational expectation
(RE) assumes agents’ knowledge of the history of the realizations of both the endogenous and exogenous variables in the model up to and including period (t − 1), all of
the parameters in the model as well as the structure of the model itself. In contrast,
¹ See Evans and Honkapohja (2001) for details as well as some examples of this case.
adaptive learning agents have a strict subset of this information. In particular, we
will suppose that (a) our agents do not know the structural parameters of the model,
and (b) their perception of the model is based on the Minimum State Variable (MSV)
representation of the RE solution (McCallum 1983). Alternative information sets have
been studied in Marcet and Sargent (1989) and Hommes and Sorger (1998). In typical
macroeconomic applications, this system is obtained by log-linearizing the underlying
microfounded general equilibrium model. As will be clear from the discussion below,
models with lagged endogenous variables present additional issues relative to ones without.
2.1 Solution and Stability Conditions
The MSV solution to (1)-(2) takes the form

    y_t = a + b y_{t-1} + c w_{t-1} + P e_t + Q υ_t    (3)
where a, b and c are to be determined. Whereas the RE agent solves for these parameters by the method of undetermined coefficients, adaptive learners estimate them
using the data available in their information set. Accordingly, this reduced form is
referred to as the learning agents’ perceived law of motion (PLM). Stability analysis in
this case revolves around the “consistency” of the estimator in relation to some solution. The benchmark solution is usually the rational expectations equilibrium (REE).²
Consequently, the question of interest is whether the REE is learnable. Evans and
Honkapohja (2001, 2009) analyze the stability properties of the REE in terms of the
so called E-stability principle. The main results from the practitioner’s viewpoint are
summarized below.
Calculating the conditional expectations y_t^e = a + b y_{t-1} + c w_{t-1} and y_{t+1}^e = a + ab + b^2 y_{t-1} + bc w_{t-1} + cF w_{t-1}, and substituting into (1) yields

    y_t = K + (M_0 + M_1(I + b))a + (N + M_0 b + M_1 b^2) y_{t-1}
          + (PF + M_0 c + M_1(bc + cF)) w_{t-1} + P e_t + Q υ_t.    (4)
This is the law of motion for yt that results from the expectations above and is referred
to as the actual law of motion (ALM). In practice, estimates of a, b and c would be
plugged into this expression. Hereafter we refer to these coefficients as the learning
parameters. Given the form of the PLM (3), these coefficients can be estimated by
the least squares algorithm with z_{t-1} = (1, y_{t-1}', w_{t-1}')' as the regressors. However, agents presumably assign a larger weight to more recent observations than the distant past. A natural choice is therefore some form of discounted least squares. Denoting φ_i = (a_i, b_i, c_i)', where i refers to the i-th equation in the system, and b_i and c_i are the i-th rows of b and c, these quantities can be estimated efficiently in real time by the formulas

    φ̂_{i,t} = φ̂_{i,t-1} + γ R_t^{-1} z_{t-1} (y_{i,t} − φ̂_{i,t-1}' z_{t-1})    (5)
    R_t = R_{t-1} + γ (z_{t-1} z_{t-1}' − R_{t-1})    (6)
² This need not be the case. However, it is the natural benchmark because agents effectively have the rational expectations information set when the structural parameters are known in this model.
where Rt is the familiar moment matrix. It is important to note the timing of the
information set here. The assumption is that the estimate φ̂t is available only after
yt is observed. Thus, the ALM evolves in real time with φ̂t−1 replacing the learning
parameters in (4). Parameter γ ∈ (0, 1) dictates the extent to which past data is
discounted. A larger value of γ leads to heavier discounting. Note that γ is assumed
to be constant. Hence the terminology of constant gain least squares (CGLS). Setting
γ = 1/t above results in the familiar recursive least squares (RLS) formula. For a brief
introduction to the RLS algorithm and CGLS, see pages 32-34 and 48-50, respectively,
in Evans and Honkapohja (2001). Lastly, notice the squared term b^2 in the ALM. This
(matrix) quadratic arises because of the autoregressive component in the model. In
the absence of yt−1 , the solution, stability condition and estimation are all simpler. We
discuss this case separately later.
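For concreteness, the recursion (5)-(6) can be written in a few lines of code. The following is a minimal NumPy sketch (the function name and the stacking of the φ_i into the columns of a matrix are illustrative conventions, not part of the model):

```python
import numpy as np

def cgls_update(phi, R, z, y, gamma):
    """One constant gain least squares step, following (5)-(6).

    phi   : (k, n) array whose i-th column is phi_i = (a_i, b_i, c_i)'
    R     : (k, k) moment matrix R_{t-1}
    z     : (k,) regressor vector z_{t-1} = (1, y_{t-1}', w_{t-1}')'
    y     : (n,) observation y_t
    gamma : constant gain in (0, 1)
    """
    # update the moment matrix first, as in (6)
    R_new = R + gamma * (np.outer(z, z) - R)
    # equation-by-equation forecast errors y_{i,t} - phi_i' z_{t-1}
    err = y - phi.T @ z
    # coefficient update (5); solve avoids forming R_new^{-1} explicitly
    phi_new = phi + gamma * np.linalg.solve(R_new, np.outer(z, err))
    return phi_new, R_new
```

Setting gamma = 1/t in each call reproduces the RLS recursion mentioned above.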
To calculate the REE we equate coefficients in (3) and (4). This gives rise to the following set of matrix equations:

    a = K + (M_0 + M_1 + M_1 b)a    (7)
    b = M_1 b^2 + M_0 b + N    (8)
    c = M_0 c + M_1 bc + M_1 cF + PF    (9)
Without making further assumptions it is generally not possible to obtain a closed form
solution to this system of equations. The interesting point though is that the matrix
quadratic can result in multiple solutions for b. We illustrate this in the context of a
univariate example below for which the analytical solution is readily calculated. The
mapping from the PLM to the ALM, referred to as the T-map, is given by

    T(a, b, c) = (K + (M_0 + M_1(I + b))a, M_1 b^2 + M_0 b + N, M_0 c + M_1 bc + M_1 cF + PF)    (10)
To determine the conditions for the local stability of the REE under learning one follows the E-stability principle in Evans and Honkapohja (2001). Under decreasing gain (γ = 1/t), the condition for E-stability is that, at the REE, the eigenvalues of the Jacobian of vec T must have real parts less than 1. Denoting the REE (ā, b̄, c̄), this translates to the condition that the following square matrix

    DT(ā, b̄, c̄) =
        [ I ⊗ (M_0 + M_1(I + b̄))    ā' ⊗ M_1                          0
          0                          b̄' ⊗ M_1 + I ⊗ (M_0 + M_1 b̄)    0
          0                          c̄' ⊗ M_1                          I ⊗ (M_0 + M_1 b̄) + F' ⊗ M_1 ]    (11)
must have eigenvalues with real parts less than 1. For computational efficiency, it suffices to calculate the eigenvalues of the three submatrices on the diagonal of DT. The analogous E-stability condition for constant gain learning is that these eigenvalues lie within a circle of radius 1/γ with origin (1 − 1/γ, 0) (Evans and Honkapohja 2009); a numerical check of this circle condition is sketched after the following list.
The following points regarding the importance of constraining the parameter space to the E-stable region are noteworthy to the practitioner:
1. Presumably, the main goal of working with a learning model is to analyze the
learnability of a REE. Learnability provides a practical justification for the existence of a REE. This point is also stressed in McCallum (2007). It is therefore
imperative for a practitioner to enforce the E-stability constraint, particularly
when conducting empirical analysis. Otherwise, the purpose of the analysis remains unclear as are the interpretations of the parameter estimates. In this
situation, one might as well directly estimate the parameters under the said
REE.
2. An exception to the previous point is when the agents’ information set includes
the contemporaneous endogenous variables. As shown in McCallum (2007), in
this case, determinacy of a REE also ensures that it is E-stable. However, because the causality does not run in the other direction, a researcher who is only
interested in the determinate solution needs to exercise caution.
3. When a multiplicity of solutions exist, E-stability may not by itself suffice to
ensure convergence to the stationary solution. As shown in Marcet and Sargent
(1989), ensuring convergence in this situation requires the implementation of an
appropriate projection facility. There is no unique way to implement it however.
Consequently, parameter estimates depend on the rule chosen by the researcher.
An alternative and more direct, although less effective, approach is to penalize the
likelihood function when the T-map condition is violated for a particular estimate
of the learning coefficients. We discuss both alternatives more concretely in the
context of Example 2 in the following section.
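Numerically, the circle condition referenced above is a one-liner; the following helper is ours, not the authors':

```python
import numpy as np

def in_cg_circle(eigs, gamma):
    """True if every eigenvalue lies strictly inside the constant gain
    E-stability circle of radius 1/gamma centered at (1 - 1/gamma, 0).
    As gamma -> 0 the circle fills the half-plane Re(lambda) < 1, which
    is the decreasing gain condition stated above."""
    eigs = np.asarray(eigs, dtype=complex)
    return bool(np.all(np.abs(eigs - (1.0 - 1.0 / gamma)) < 1.0 / gamma))
```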
As an illustration, consider the following univariate model

    y_t = α + δ y_{t-1} + β y_{t+1}^e + w_t
    w_t = ρ w_{t-1} + e_t    (12)
Comparing with the model in (1)-(2), K = α, N = δ, M_1 = β, P = 1 and F = ρ. For
reasons that will be clear in the following section, we have set M0 = Q = 0. Writing
the MSV solution as

    y_t = a + b y_{t-1} + c w_{t-1} + e_t,    (13)

the conditional expectation of y_{t+1} given information up to (t − 1) is

    E_{t-1} y_{t+1} = (1 + b)a + b^2 y_{t-1} + c(b + ρ) w_{t-1}.    (14)
Substituting this back into the model (12) we get the ALM

    y_t = α + β(1 + b)a + (βb^2 + δ) y_{t-1} + (βc(b + ρ) + ρ) w_{t-1} + e_t,    (15)
with the corresponding T-map

    T_a(a, b, c) = α + β(1 + b)a    (16)
    T_b(a, b, c) = βb^2 + δ    (17)
    T_c(a, b, c) = ρ + β(b + ρ)c    (18)
The RE solution to this system is given by

    ā = α / (1 − β(1 + b̄))    (19)
    b̄ = (1 ± √(1 − 4βδ)) / (2β)    (20)
    c̄ = ρ / (1 − β(b̄ + ρ))    (21)
where it is assumed that the discriminant 1 − 4βδ is positive and that all the denominators are non-zero. Note that the squared term leads to two distinct solutions for b.
We reiterate here that it is important to impose the following constraint: −1 < b̄ < 1.
This ensures that the ALM at the RE solution is stationary, but allows for temporary
non-stationarity based on a particular path of learning coefficient estimates.
The derivative of the T-map at the RE solution is

    DT(ā, b̄, c̄) = [ β(1 + b̄)   βā      0
                     0            2βb̄    0
                     0            βc̄     β(b̄ + ρ) ]    (22)
Recall that, under constant gain learning, the REE solution is E-stable if the eigenvalues
of DT are contained in the circle with origin (1 − 1/γ, 0) and radius 1/γ. Because we
restrict the REE to be real, the eigenvalues of DT (which are simply its diagonal) are
also real. For a given γ it is then sufficient to check that 1 − 2/γ < diag(DT ) < 1.
Clearly, only one of the solutions for b̄ (the one with the negative square root) satisfies
this condition. The alternative solution, obtained from the positive square root of the
discriminant, is E-unstable.
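The two roots and their stability are easy to verify numerically. A small sketch, using a parameterization consistent with the RE solution reported in Example 2 below (α = 0.2, δ = 0.05, β = 0.8, ρ = 0.8, γ = 0.05):

```python
import numpy as np

alpha, delta, beta, rho, gamma = 0.2, 0.05, 0.8, 0.8, 0.05

disc = 1.0 - 4.0 * beta * delta
assert disc > 0, "real RE solutions require a positive discriminant"

for sign in (-1.0, 1.0):                      # negative root first
    b = (1.0 + sign * np.sqrt(disc)) / (2.0 * beta)
    a = alpha / (1.0 - beta * (1.0 + b))      # (19)
    c = rho / (1.0 - beta * (b + rho))        # (21)
    # DT in (22) is triangular, so its eigenvalues are its diagonal
    eigs = np.array([beta * (1.0 + b), 2.0 * beta * b, beta * (b + rho)])
    stable = np.all((eigs > 1.0 - 2.0 / gamma) & (eigs < 1.0))
    print(f"b = {b:+.3f}: eigenvalues {np.round(eigs, 3)}, E-stable: {stable}")
```

The negative root yields eigenvalues (0.842, 0.083, 0.682), all inside the stability interval; for the positive root, 2βb̄ ≈ 1.92 > 1, confirming E-instability.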
We complete this section with a brief discussion of a special case of the general
model in (1). In the absence of the autoregressive component, the REE is unique and,
assuming the appropriate rank condition for the system matrices, has a closed form
solution. To see this, we write the MSV solution (as well as the PLM) as
    y_t = a + c w_{t-1} + P e_t + Q υ_t    (23)

Calculating the conditional expectations as before and substituting into the model results in the ALM

    y_t = K + (M_0 + M_1)a + (M_0 c + M_1 cF + PF) w_{t-1} + P e_t + Q υ_t    (24)

Finally, equating terms in (23) and (24), it follows that the RE solution for a and c is

    ā = (I_n − M_0 − M_1)^{-1} K    (25)
    vec(c̄) = (I_{np} − I_p ⊗ M_0 − F' ⊗ M_1)^{-1} (F' ⊗ I_n) vec(P)    (26)

where I_m denotes the m-dimensional identity matrix and vec denotes vectorization by column.
The T-map now simplifies to
    T(a, c) = (K + (M_0 + M_1)a, M_0 c + M_1 cF + PF)    (27)
In this case, the E-stability condition for CGLS simplifies to the condition that the eigenvalues of the following matrices

    DT_a(ā) = M_0 + M_1    (28)
    DT_c(c̄) = I_p ⊗ M_0 + F' ⊗ M_1    (29)

lie within a circle of radius 1/γ with origin (1 − 1/γ, 0). With this background, we now
turn to the econometrician’s problem of estimation and inference.
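Before doing so, here is a minimal NumPy sketch of the closed form (25)-(26) and the constant gain check (28)-(29); the function names and the assumed dimensions (K being n × 1, P being n × p, F being p × p) are ours:

```python
import numpy as np

def ree_no_lags(K, M0, M1, P, F):
    """RE solution (25)-(26) for the model without lagged endogenous variables."""
    n, p = P.shape
    a_bar = np.linalg.solve(np.eye(n) - M0 - M1, K)
    A = np.eye(n * p) - np.kron(np.eye(p), M0) - np.kron(F.T, M1)
    # vec(PF) = (F' kron I_n) vec(P), with column-major vectorization
    vec_c = np.linalg.solve(A, np.kron(F.T, np.eye(n)) @ P.flatten(order="F"))
    return a_bar, vec_c.reshape((n, p), order="F")

def e_stable_cgls(M0, M1, F, gamma):
    """Check (28)-(29) against the circle of radius 1/gamma centered at 1 - 1/gamma."""
    p = F.shape[0]
    eigs = np.concatenate([
        np.linalg.eigvals(M0 + M1),
        np.linalg.eigvals(np.kron(np.eye(p), M0) + np.kron(F.T, M1)),
    ])
    return bool(np.all(np.abs(eigs - (1.0 - 1.0 / gamma)) < 1.0 / gamma))
```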
2.2 Likelihood Function and Parameter Estimation
Our goal is to estimate the model parameters θ given the sample data on the endogenous
variables: YT = (y0 , . . . , yT ). One might also be interested in inferring the exogenous
variables WT = (w0 , . . . , wT ), as well as the learning parameters ΦT = (φ0 , . . . , φT ).
Parameter inference is based on the assumption that data is generated by the ALM
(4). Unlike in the RE framework, therefore, the model generating the data is different
for the agent and the researcher. We return to this point later in the section.
To estimate the unknowns in the model, we combine the ALM (4) and the exogenous
process (2) in the state space model (SSM)
    s_t = μ_{t-1}(θ, φ̂_{t-1}) + G_{t-1}(θ, φ̂_{t-1}) s_{t-1} + H(θ) u_t    (30)
    y_t = B s_t    (31)

by defining

    s_t = [y_t; w_t],   u_t = [e_t; υ_t] ~ N(0, Ω),
    μ_{t-1} = [K + (M_0 + M_1(I + b̂_{t-1})) â_{t-1}; 0],

    G_{t-1} = [ N + M_0 b̂_{t-1} + M_1 b̂_{t-1}^2    PF + M_0 ĉ_{t-1} + M_1(b̂_{t-1} ĉ_{t-1} + ĉ_{t-1} F)
                0                                     F ],

    H = [ P   Q
          I   0 ],   B = [I_n   0],
where 0 denotes vectors or matrices of appropriate dimensions. Note that the measurement equation (31) simply extracts y_t from the state vector s_t. Consequently, only the part of the state vector that comprises w_t is truly latent. In the context of SSMs, it is important that w_t remain an exogenous process.
The interesting part of the SSM here is the learning parameters φ̂t−1 in the system
matrices μ_{t-1} and G_{t-1}. Recall that these time-varying parameters are calculated by the formulas (5)-(6) with z_{t-2} = (1, y_{t-2}', w_{t-2}')' as the vector of regressors. Consequently,
the system matrices involve parameters that are non-linear functions of the lagged state
vector wt−2 as well as the past observations Yt−1 . Our SSM is therefore unconventional
and the usual technique for deriving the likelihood function as a by-product of the
Kalman filter is not fruitful here.
To state the problem explicitly, the likelihood function now depends on Φ̂_T in addition to θ. Thus, the joint density of the data takes the form

    f(y_1, ..., y_T | Y_0, θ, Φ̂_T) = f(y_T | Y_{T-1}, θ, φ̂_{T-1}) ··· f(y_1 | Y_0, θ, φ̂_0)    (32)
where, from the ALM we have exploited the fact that, for t = 1, . . . , T , given Yt−1 , the
density of yt depends only on θ and φ̂t−1 . For given values of Φ̂T and θ, this likelihood
function is well defined. From a frequentist perspective, therefore, one can at least
conceptually think of maximizing it with respect to both Φ̂T and θ, much the same
way as one would calculate the MLE of θ in a standard SSM. However, the impediment
here is that φ̂t is a function of the latent states wt−1 , which is itself a random variable.
While estimates (or, more precisely, the conditional distribution) of the latent states
may be calculated based on the observed data, it is not equivalent to knowing the latent
states. In other words, only conditional on WT can Φ̂T be treated as a parameter in
the likelihood function. But then the likelihood function itself is a function of WT . It
is therefore not feasible to derive the joint density of the data only as a function of
the unknown parameters by means of the Kalman filter. To our knowledge there is
no legitimate method for calculating the likelihood function in this problem without
taking recourse to nonlinear techniques.
The approach taken in the literature is to approximate the likelihood value by calculating φ̂t , t = 1, . . . , T , conditional on estimates of WT . Conveniently then, the
Kalman filter delivers the latter in the form of E(wt |Yt , θ), t = 1, . . . , T . One therefore
iteratively updates φ̂t within the Kalman filter by substituting E(wt |Yt ) in place of
wt in (5)-(6). Of course, this procedure raises concerns about the optimality of the
Kalman filter as well as the properties of the MLE of θ. Clearly this approach leans
towards computational efficiency rather than accuracy. Finally, it is worth noting that
if ∀t, wt = 0, then the ALM is simply a vector autoregressive process that can be
cast as a standard linear Gaussian SSM. In that case the aforementioned complication
regarding the likelihood function is not relevant.
Having dealt with the calculation of φ̂t , t = 1, . . . , T , the only remaining concern is
the initialization of φ̂0 . The two main approaches in the literature to deal with this are
(a) estimate φ̂0 using extraneous information and treat it as a known quantity when
estimating θ and (b) treat φ̂0 as an additional parameter and estimate it alongside θ
using the sample data. While several variants of the former have been discussed in
the literature (see, for instance, Slobodyan and Wouters (2012)), we focus here on the
training sample approach. We then compare this to the case where φ0 is estimated
within the sample. Note that the aforementioned discussion applies equally well to R0 .
Without loss of generality, we only consider the estimation of φ0 in this paper. In our
examples, we initialize R_0^{-1} as an arbitrary matrix with large diagonal elements.
We now provide the details for likelihood based estimation of θ in the case of
the training sample approach. We denote the available pre-sample of length S by Y = (y_{-S+1}, ..., y_0). Simultaneously let W = (w_{-S}, ..., w_{-1}) be the collection of the corresponding latent states. Then, in the frequentist setting, the procedure for estimating θ can be summarized conceptually as follows:

    max_θ f(Y_T | θ, φ̂_0(Y, θ))
A Bayesian would combine the likelihood function with a prior on θ to calculate the
posterior distribution of θ. We discuss the specifics of this in the context of specific
examples in the following section. For now, our primary focus is on the derivation of
the likelihood function. Procedurally, it involves two steps.
Step 1: Estimate φ0 given θ and Y
1a. For the pre-sample period, replace φ̂_t in the SSM with the REE, φ̄, of the model (computed either analytically or numerically)
1b. Run the Kalman filter and smoother to estimate E(W | Y, θ)
1c. Construct for t = −S, ..., −1 the vector ẑ_t = (1, y_t', ŵ_t')'
1d. Calculate the OLS estimate φ̂_0 by regressing Y on Z = (ẑ_{-S}, ..., ẑ_{-1})
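Steps 1c-1d amount to an equation-by-equation OLS regression on the smoothed pre-sample. A sketch, assuming the smoothed states from step 1b are already in hand (the R_0 line is one possible by-product; our examples instead set R_0^{-1} to a matrix with large diagonal elements):

```python
import numpy as np

def initialize_phi0(Y_all, w_hat):
    """Steps 1c-1d: OLS estimate of phi_0 from the pre-sample.

    Y_all : (S+1, n) array holding y_{-S}, ..., y_0
    w_hat : (S, p) array of smoothed states E(w_t | Y, theta), t = -S, ..., -1
    """
    S = w_hat.shape[0]
    # z_t = (1, y_t', w_hat_t')' for t = -S, ..., -1; row s of Z explains y_{t+1}
    Z = np.column_stack([np.ones(S), Y_all[:-1], w_hat])
    phi0, *_ = np.linalg.lstsq(Z, Y_all[1:], rcond=None)   # (1+n+p, n)
    R0 = Z.T @ Z / S    # one natural initialization of the moment matrix
    return phi0, R0
```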
With this initial value of φ̂, we now turn to the evaluation of the likelihood function. Let w_0 | Y_0 ~ N(ω_0, Ω_0) denote the prior distribution of w_0 based on some initial information Y_0. The hyperparameters of the normal distribution can be specified in one of several ways. We present two alternatives here. Assuming that the latent process is stationary, these quantities take on the steady state values of ω_0 = 0 and vec(Ω_0) = (I_{p^2} − F ⊗ F)^{-1} vec(Σ_1). Alternatively, we could include w_0 in W and extend step 1b above by one additional iteration to get ω_0 and Ω_0 conditioned on Y and θ. Subsequently, to evaluate the likelihood at a given value of θ, one essentially implements the Kalman filter with an additional step to update the learning parameters.
Thus, the second step in the procedure is as follows:
Step 2: Calculate the likelihood value as a function of θ and φ̂_0

2a. Initialize s_0 | Y_0 ~ N(ψ_0, Ψ_0) where

    ψ_0 = [y_0; ω_0]   and   Ψ_0 = [ 0   0
                                     0   Ω_0 ]

2b. For t = 1, ..., T, calculate

(i)
    [s_t; y_t] | Y_{t-1}, θ, φ̂_{t-1} ~ N( [ψ_{t|t-1}; B ψ_{t|t-1}],
                                            [ Ψ_{t|t-1}       Ψ_{t|t-1} B'
                                              B Ψ_{t|t-1}'    Γ_t ] )

    where ψ_{t|t-1} = μ_{t-1} + G_{t-1} ψ_{t-1},  Ψ_{t|t-1} = G_{t-1} Ψ_{t-1} G_{t-1}' + H Ω H',  Γ_t = B Ψ_{t|t-1} B'

(ii)
    R_t = R_{t-1} + γ (ẑ_{t-1} ẑ_{t-1}' − R_{t-1})
    φ̂_{i,t} = φ̂_{i,t-1} + γ R_t^{-1} ẑ_{t-1} (y_{i,t} − φ̂_{i,t-1}' ẑ_{t-1})

    where ẑ_{t-1} = (1, y_{t-1}', ŵ_{t-1}')'

(iii)
    s_t | Y_t, θ, φ̂_{t-1} ~ N(ψ_t, Ψ_t)

    where ψ_t = ψ_{t|t-1} + Ψ_{t|t-1} B' Γ_t^{-1} (y_t − B ψ_{t|t-1}),  Ψ_t = Ψ_{t|t-1} − Ψ_{t|t-1} B' Γ_t^{-1} B Ψ_{t|t-1}

2c. Return the log-likelihood as Σ_{t=1}^T log f(y_t | Y_{t-1}, θ, φ̂_{t-1})
Embedding steps 1 and 2 in a numerical maximization routine, one can calculate the
MLE of θ. Likewise, a Bayesian would combine these steps to evaluate the likelihood
for a given value of θ.
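In code, Step 2 is an otherwise standard Kalman filter with the learning recursion spliced in between the prediction and update steps. The sketch below is schematic: the `model` object (its `system`, `H`, `Omega`, `B`, `initial_state` methods and its `n` attribute) and the dictionary-valued θ are hypothetical scaffolding of ours, not part of the paper:

```python
import numpy as np

def approx_log_likelihood(theta, phi0, R0, Y, model):
    """Step 2: approximate log likelihood with CGLS updating inside the filter."""
    phi, R = phi0.copy(), R0.copy()
    H, Omega, B = model.H(theta), model.Omega(theta), model.B()
    psi, Psi = model.initial_state(theta, Y[0])          # 2a: (psi_0, Psi_0)
    loglik = 0.0
    for t in range(1, len(Y)):
        mu, G = model.system(theta, phi)                 # mu_{t-1}, G_{t-1}
        # 2b(i): prediction
        psi_p = mu + G @ psi
        Psi_p = G @ Psi @ G.T + H @ Omega @ H.T
        Gam = B @ Psi_p @ B.T
        err = Y[t] - B @ psi_p
        loglik -= 0.5 * (len(err) * np.log(2.0 * np.pi)
                         + np.linalg.slogdet(Gam)[1]
                         + err @ np.linalg.solve(Gam, err))
        # 2b(ii): learning step, with E(w_{t-1} | Y_{t-1}) in place of w_{t-1}
        z = np.concatenate(([1.0], Y[t - 1], psi[model.n:]))
        R = R + theta["gamma"] * (np.outer(z, z) - R)
        phi = phi + theta["gamma"] * np.linalg.solve(R, np.outer(z, Y[t] - phi.T @ z))
        # 2b(iii): update
        Kg = Psi_p @ B.T @ np.linalg.inv(Gam)
        psi = psi_p + Kg @ err
        Psi = Psi_p - Kg @ B @ Psi_p
    return loglik
```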
The following points are worth noting about the above procedure:
• It is easy to verify that for all t = 1, 2, ..., T, ψ_t and Ψ_t replicate the structure of ψ_0 and Ψ_0. That is,

    ψ_t = [y_t; ω_t]   and   Ψ_t = [ 0   0
                                     0   Ω_t ]

where ω_t = E(w_t | Y_t, θ, φ_0) and Ω_t = Var(w_t | Y_t, θ, φ_0).
• As mentioned earlier, if it were possible to calculate φ̂_t (and R_t) conditional on w_{t-1} instead of ω_{t-1}, then the preceding three steps would form a coherent cycle. Further, in this non-standard SSM, there is also the concern as to whether ω_{t-1} is the optimal estimate of w_{t-1}. As such, we are unaware of any solution to this problem within the frequentist setting without resorting to nonlinear techniques.
Our main goal here is to highlight this issue and we leave the resolution to future
work.
We conclude this section with the following note. To estimate φ̂0 within the sample,
one simply skips Step 1 above and treats φ̂0 as an additional set of parameters. But
this means that the MLE problem is now one of

    max_{θ, φ̂_0} f(Y_T | θ, φ̂_0)
Examples 1 and 3 below explore both alternatives. We estimate the initial values of
the learning parameters with a training sample and then separately with the actual
sample along with θ.
3 Examples
In this section we provide three illustrative examples. The first two stylized examples
are based on simulated data, whereas the third is a simple New Keynesian (NK) model
that we fit to real data. In the spirit of the framework discussed in the previous section,
all three examples involve exogenous processes that follow a stationary AR(1) process.
3.1 Simulated Data Example 1
Consider the following model

    y_{1,t} = α + β y_{1,t+1}^e + κ y_{2,t} + w_t    (33)
    y_{2,t} = λ y_{1,t}^e + v_t    (34)
    w_t = ρ w_{t-1} + e_t    (35)
where et and vt are independent zero mean i.i.d. sequences with standard deviations
σ1 and σ2 , respectively. Equation (34), which appears to be ad-hoc in this setup, is
motivated by the Taylor rule specification in monetary DSGE models (Taylor 1993).
Its role is explained in the estimation section below. Because y_{2,t} depends on expectations of y_{1,t} and not of itself, we suppress it in the following discussion. Substituting
(34) into (33) one obtains
    y_{1,t} = α + β y_{1,t+1}^e + κλ y_{1,t}^e + w_t + κ v_t    (36)
which is of the form (1). Accordingly, the MSV solution is
    y_{1,t} = a + c w_{t-1} + e_t + κ v_t    (37)
with the resulting law of motion for y_{1,t} under RE being

    y_{1,t} = α/(1 − β − κλ) + [ρ/(1 − ρβ − κλ)] w_{t-1} + e_t + κ v_t    (38)
Equation (37) serves as the PLM of the agents, who update φ_t = (a_t, c_t)' following (5).
The ALM then takes the form
    y_{1,t} = α + (β + κλ) a_{t-1} + (ρ + (ρβ + κλ) c_{t-1}) w_{t-1} + e_t + κ v_t    (39)
with the T-map defined as

    T(a, c) = (α + (β + κλ)a, ρ + (ρβ + κλ)c)
In this case, the convergence criteria (28)-(29) simplify to β + κλ < 1 and ρβ + κλ < 1.
One completes the state space setup by defining

    s_t = [y_{1,t}; y_{2,t}; w_t],   u_t = [e_t; v_t],   y_t = [y_{1,t}; y_{2,t}],

    μ_{t-1} = [α + (β + κλ) a_{t-1};  λ a_{t-1};  0],

    G_{t-1} = [ 0   0   ρ + (ρβ + κλ) c_{t-1}
                0   0   λ c_{t-1}
                0   0   ρ ],

    H = [ 1   κ
          0   1
          1   0 ],   B = [ 1   0   0
                           0   1   0 ],

where y_{2,t} = λ a_{t-1} + λ c_{t-1} w_{t-1} + v_t is obtained by substituting y_t^e in (34).
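A data generating sketch for this example makes the timing of the learning recursion explicit (the function name and the initializations R_0 = I and w_0 = 0 are illustrative assumptions):

```python
import numpy as np

def simulate_example1(T, alpha, beta, kappa, lam, rho, sig1, sig2, gamma,
                      a0, c0, seed=0):
    """Simulate (y_1, y_2) from the ALM (39) with CGLS updating of (a_t, c_t)."""
    rng = np.random.default_rng(seed)
    a, c, w = a0, c0, 0.0
    R = np.eye(2)                     # moment matrix for z = (1, w)'; R_0 = I is an assumption
    out = np.empty((T, 2))
    for t in range(T):
        e, v = sig1 * rng.standard_normal(), sig2 * rng.standard_normal()
        # ALM (39), with a_{t-1}, c_{t-1} and w_{t-1} on the right-hand side
        y1 = (alpha + (beta + kappa * lam) * a
              + (rho + (rho * beta + kappa * lam) * c) * w + e + kappa * v)
        y2 = lam * (a + c * w) + v    # (34): y_2t = lam * E_{t-1} y_1t + v_t
        # CGLS step (5)-(6) on the PLM (37)
        z = np.array([1.0, w])
        R = R + gamma * (np.outer(z, z) - R)
        a, c = np.array([a, c]) + gamma * np.linalg.solve(R, z * (y1 - a - c * w))
        w = rho * w + e               # (35), using the same e_t
        out[t] = (y1, y2)
    return out
```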
With the SSM setup, we now turn to the problem of parameter estimation. As
noted by Chevillon et al. (2010), the structural parameters in this model suffer from weak identification when γ is small. This might be immediately apparent to the reader
because not all parameters are identified under RE. In light of this issue, we consider
three alternative versions of this model when estimating the unknown parameters.
Model 1(a) λ = κ = 0
Model 1(b) λ ≠ 0 and κ = 0
Model 1(c) λ, κ ≠ 0
The intent here is to understand the extent to which the sample data provides information about the structural parameters when additional measurements in the form of
y2,t are available. In this regard, the additional information afforded in Model 1(b) is
expectations of y1,t . It is assumed, however, that these expectations do not influence
the evolution of y1,t directly. In contrast, Model 1(c) assumes that these expectations feed back to y1,t.
Separately, we are also interested in whether estimating φ0 together with θ affects
the estimates of the latter. For this experiment, we follow the two alternative approaches discussed in the previous section for estimating φ0 . That is, we first estimate
φ0 alongside θ using the sample data. We then compare this to the case when φ0 is estimated using a pre-sample. For fair comparison of the results from the two approaches
we work with the same sample length throughout.
We generated 240 observations of yt for each of the three versions of the model. The
actual sample was preceded by a burn-in period of 5000 observations, which ensured that any lingering effects of the initial values of the unknowns set in the DGP had worn off. Table 1 presents
the data generating values for the parameters θ = {α, β, κ, σ_1^2, γ, λ, ρ, σ_2^2, a_0, c_0}. The
initial values of a0 and c0 reported are the values immediately preceding the sample
after an initial burn-in of 1000 observations. Table 1 reports the MLE of θ and φ0 in
all three cases of Example 1.
Our results confirm the difficulty in identifying the structural parameters in case (a)
while providing the new result that the initial values of the learning parameters are not
well identified either. As we turn to case (b) we see that adding a measurement with
no feedback improves identification of the learning parameters, but not the structural
parameters. However, once we have the feedback in case (c) we find that all the
estimates are fairly well identified. Our intuition for the improved identification in
cases (b) and (c) is as follows. The fact that agents do not form expectations over
the additional measurement reduces the set of possible time paths of the learning
parameters, resulting in a better defined likelihood function. As regards the
second concern above, the second column of results in each case indicates that the
estimates of the initial values of the learning parameters have little impact on the
estimates of the structural parameters.
Turning to the Bayesian approach, we suppose that the parameters are a priori independent. The prior distribution on each element of θ is centered at values that are
distinctly different from the DGP. We also assume fairly large standard deviations. In
particular, the gain parameter γ is allowed to vary uniformly between 0 and 1. Also,
the autoregressive parameter ρ of the exogenous process is assumed to be uniformly
distributed between −1 and 1. Note that this enforces the stationary constraint on
wt . The priors on a0 and c0 are centered at the REE corresponding to the prior mean
of θ. To capture the uncertainty regarding these quantities we specify large standard
deviations. Table 2 summarizes our prior. In addition to the parameter constraints
implied by the prior distribution, we also enforce the E-stability constraints mentioned
in the previous section.
In the estimation procedure we initialized the parameters at their prior mean. We
then ran the Tailored Randomized Block Metropolis-Hastings (TaRB-MH) sampler
(Chib and Ramamurthy (2010)) for 11000 iterations and discarded the initial 1000
draws as burn-in. See Appendix A for a brief overview of the TaRB-MH sampler.
Table 2 summarizes the posterior distribution based on the sampled draws. The values
in the left panel correspond to the case where a0 and c0 are estimated from a training
sample. For this experiment, the training sample comprised the first 40 observations
with the remaining 200 observations contributing to the likelihood value. The posterior
summary in the right panel includes a0 and c0 as additional parameters. For conformity of the data sample, these estimates are also based on observations 41 through
240. The results indicate that the estimated marginal posteriors are practically indistinguishable for the two cases. Also noteworthy is the closeness of the posterior
mean to the data generating values of the parameters. This is not surprising given
the maximum likelihood estimates - the data carried substantial information about the
parameters.
To conclude this section, we plot the marginal prior and (kernel smoothed) posterior
of each parameter in Figure 1. In this figure, the dashed line is the prior, the green line
is the posterior for the training sample that corresponds to the left posterior panel in
Table 2 and the blue line is the posterior without the training sample that corresponds
to the right panel in Table 2. As mentioned earlier, the similarity between the estimates
for the two approaches is more evident in the figure. We also plot the simulated at
and ct for the PLM (dashed line) and ALM (circled line) based on the posterior draws
in Figure 2. The shaded regions capture the 95% band. These plots were generated
by first simulating the PLM and ALM for each draw from the posterior.³ Then for each t we calculated the .025, .5 and .975 quantiles. The horizontal lines denote the corresponding quantiles for the REE.

³ The procedure is identical to Step 2 in the likelihood calculation. Recall that, for a given value of θ and φ_0, a_t and c_t are updated within the Kalman filter based on the PLM. The corresponding quantities for the ALM are simply the non-zero elements in the first row of μ_{t-1} and G_{t-1}.
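The band construction itself is just a pointwise quantile across simulated paths; a minimal sketch (the function name is ours):

```python
import numpy as np

def quantile_bands(paths, qs=(0.025, 0.5, 0.975)):
    """Pointwise quantiles of simulated learning paths.

    paths : (n_draws, T) array with one simulated path of a_t (or c_t)
            per retained posterior draw
    """
    return np.quantile(paths, qs, axis=0)   # shape (3, T)
```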
3.2 Simulated Data Example 2
Consider the univariate example associated with equation (12) presented in Section 2.1
that we repeat here for the reader’s convenience.
    y_t = α + δ y_{t-1} + β y_{t+1}^e + w_t
    w_t = ρ w_{t-1} + e_t

The parameter values that we use to generate the data (see Table 3) imply the following values of the rational expectations solution (equations 19-21): ā = 1.264, b̄ = 0.052, and c̄ = 2.514. One can verify that this is the E-stable solution. The alternative solution, obtained from the positive square root of the discriminant, is E-unstable.
When generating the data, we initialize the learning parameters sufficiently close
to the RE solution. However, in the process of updating these quantities by the RLS
algorithm it is entirely possible that they wander off the stable region as discussed
by Marcet and Sargent (1989). This happens because a particular estimate (by the
agents) of b may push the eigenvalues of DT to the unstable region. The solution to
this problem is to invoke a projection facility which nudges the learning parameters
back to the stable region, which in the case of CGLS is the circle with origin (1 −
1/γ, 0) and radius 1/γ. A projection facility can come in many forms. The simplest
implementation is to recall the latest value of the learning parameters that resulted
in eigenvalues in the stable region. This approach clearly poses the danger of getting
stuck at those latest stable values of the learning parameters. Practically, however,
this may not be a source of major concern as long as the corresponding values of
the structural parameters are unlikely to have generated the data. For alternative
implementations, Marcet and Sargent (1989) provide a general guide to restricting the
stable space. Even in our simple example, however, defining the stable space of the
learning parameters can result in several inequalities, turning this process into an ordeal
for the practitioner. As mentioned earlier, there is an easier third alternative - that of
penalizing the likelihood function for unstable values of the learning parameters. From
a practical viewpoint, this can always be a fallback option that can be implemented
along with the simple projection facility.
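The simplest facility described above, combined with a penalty flag for the likelihood, can be sketched as follows (the stability test is passed in as a callable, and the -1e6 penalty is an arbitrary illustrative value):

```python
import numpy as np

def cgls_step_with_projection(phi, R, z, y, gamma, is_stable):
    """CGLS step that reverts to the last stable estimate when the proposed
    update leaves the E-stable region; also returns a likelihood penalty."""
    R_prop = R + gamma * (np.outer(z, z) - R)
    phi_prop = phi + gamma * np.linalg.solve(R_prop, np.outer(z, y - phi.T @ z))
    if is_stable(phi_prop):
        return phi_prop, R_prop, 0.0
    # projection facility binds: keep the previous estimates and flag a penalty
    return phi, R, -1e6
```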
We now focus on our ability to estimate a model with lagged endogenous variables.
Table 3 displays the results from varying sample sizes ranging from 200 to 2000. As
the results demonstrate, even with a very large sample size, the estimation of this class
of models is tenuous at best. In particular, notice that estimates of γ get worse as the
sample size increases, which is troublesome since the chief objective is the story about
learning. Table 3 also supports the claim made in the previous example that estimating
the initial values of the learning parameters has little impact on the estimates of the
structural parameters as shown by the second column for each sample size.
3.3 Application to a Monetary DSGE Model
As an application to a DSGE model, consider the following canonical model derived
from Woodford (2003)
    x_t = x_{t+1}^e − ψ(i_t − π_{t+1}^e) + ε_{x,t}    (40)
    π_t = λ x_t + β π_{t+1}^e + ε_{π,t}    (41)
    i_t = θ_x x_t^e + θ_π π_t^e + ε_{i,t}    (42)
Here, the endogenous variables are output gap xt , inflation πt and the nominal interest
rate it . Underlying this linear system is a nonlinear microfounded model comprising
households and firms that are intertemporal optimizers. The central bank, on the other
hand, sets the interest rate following the Taylor rule. As mentioned earlier, this interest
rate rule (42) is an ad-hoc specification in the sense that the economic agents do not
incorporate expectations of it in their decisions. Finally, the structural shocks εx,t and
επ,t follow independent stationary AR(1) processes. For these processes, we denote the
autoregressive coefficients ρ_x and ρ_π, and the variances of the white noise terms σ_x^2 and σ_π^2, respectively. We also assume that ε_{i,t} ~ N(0, σ_i^2).
As in the previous example, substituting (42) into (40) leads to the familiar form (1) with N = K = 0:

    y_t = M_0 y_t^e + M_1 y_{t+1}^e + P w_t + Q ε_{i,t}    (43)

where

    y_t = [x_t; π_t],   w_t = [ε_{x,t}; ε_{π,t}],

    M_0 = [ −ψθ_x     −ψθ_π
            −λψθ_x    −λψθ_π ],   M_1 = [ 1    ψ
                                          λ    β + λψ ],

    P = [ 1   0
          λ   1 ],   Q = [ −ψ
                           −λψ ],   F = [ ρ_x   0
                                          0     ρ_π ].

As before, we write the PLM as
    y_t = a + c w_{t-1} + P e_t + Q ε_{i,t}

where

    a = [a_x; a_π],   c = [ c_xx   c_xπ
                            c_πx   c_ππ ].
Substituting the expectational terms yields the following ALM

    y_t = (M_0 + M_1) a + (M_0 c + M_1 cF + PF) w_{t-1} + P e_t + Q ε_{i,t}    (44)

The corresponding T-map for the learning parameters is then

    T(a) = (M_0 + M_1) a    (45)
    T(c) = M_0 c + M_1 cF + PF    (46)
Finally, note that the RE solution for this model is a_RE = 0 and

    vec(c_RE) = (I_4 − F' ⊗ M_1 − I_2 ⊗ M_0)^{-1} (F' ⊗ I_2) vec(P).
The SSM is completed as before by including the additional measurement i_t = S a_{t-1} + S c_{t-1} w_{t-1} + ε_{i,t}, where S = (θ_x, θ_π).
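For reference, a sketch assembling the system matrices and c_RE (ψ, θ_x and θ_π are set to their prior means below, β and λ to their calibrated values; the ρ's, which carry flat priors, are set to an arbitrary illustrative 0.9):

```python
import numpy as np

def nk_matrices(psi, lam, beta, theta_x, theta_pi, rho_x, rho_pi):
    """System matrices of (43) implied by the structural parameters."""
    M0 = np.array([[-psi * theta_x,       -psi * theta_pi],
                   [-lam * psi * theta_x, -lam * psi * theta_pi]])
    M1 = np.array([[1.0, psi],
                   [lam, beta + lam * psi]])
    P = np.array([[1.0, 0.0],
                  [lam, 1.0]])
    Q = np.array([-psi, -lam * psi])
    F = np.diag([rho_x, rho_pi])
    return M0, M1, P, Q, F

M0, M1, P, Q, F = nk_matrices(psi=4.0, lam=0.05, beta=0.95,
                              theta_x=0.25, theta_pi=1.0, rho_x=0.9, rho_pi=0.9)
# vec(c_RE) = (I_4 - F' kron M1 - I_2 kron M0)^{-1} (F' kron I_2) vec(P)
A = np.eye(4) - np.kron(F.T, M1) - np.kron(np.eye(2), M0)
c_re = np.linalg.solve(A, np.kron(F.T, np.eye(2)) @ P.flatten(order="F"))
c_re = c_re.reshape(2, 2, order="F")
```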
We fit this model to the sample period 1954:III – 2007:I with the vector of quarterly
measurements on output gap, inflation and the nominal interest rate. These quantities
were calculated from the data on quarterly GDP, GDP deflator and the Federal Funds
Rate published by the St. Louis Fed as follows. Output gap was obtained by taking
the log difference of real GDP from its potential (as calculated by the Congressional
Budget Office). The inflation and interest rates were taken to be the demeaned GDP
deflator and annualized federal funds rate.
Given the complexity of the model we favor using Bayesian techniques. Collecting
the parameters of interest in the vector

    θ = (ψ, λ, β, θ_x, θ_π, ρ_x, ρ_π, σ_x^2, σ_π^2, σ_i^2, γ)'
we suppose that the intertemporal elasticity of substitution ψ a priori follows a Gamma
distribution with mean 4.0 and standard deviation 2.0. For the Taylor rule coefficients
θx and θπ we assume a normal prior with mean 0.25 and 1.0, respectively, and unit
variance. As is well documented in the literature, the estimate of the discount factor β ∈ (0, 1) that is most consistent with the data usually exceeds its theoretical upper limit of 1. For this reason it is not uncommon to fix its value close to 1. In our estimation we fix β = 0.95. Similarly, λ is
also pinned down by the cost adjustment parameter in the underlying structural model.
Because in this paper we deal only with the reduced form, we also fix λ to a reasonable
0.05. For the remaining model parameters, the prior specification is identical to that in
the previous example. Finally, for the initial learning parameters φ0 , we suppose that
each element follows a normal distribution with mean calculated as in the previous
example and a large standard deviation. Table 4 summarizes this prior.
This table also presents the posterior summary based on 9000 iterations of the
TaRB-MH algorithm beyond a burn-in of 1000. The left posterior panel corresponds
to the case where φ0 was estimated using a training sample. The first forty observations
from 1954:III to 1964:II were reserved for this purpose. The effective sample therefore
runs from 1964:III to 2007:I. Results in the center panel are also based on this sample
but now include estimates of φ0 as well. Once again, the estimated marginal posteriors
are similar in both cases. This can also be viewed in Figure 3 that plots the kernel
smoothed marginal posterior densities along with the prior densities of the parameters.
In contrast, the results in the last panel are starkly different. These estimates are based
on the full sample 1954:III to 2007:I. While in this paper we would not draw attention
to the parameter estimates per se, it is worth noting the tight posterior interval for γ.
This is in sharp contrast to the flat prior and is particularly striking in the third panel.
Recall that the convergence condition enforced here is valid only for small enough γ.
Finally, we reiterate our stance that if one estimates a model of learning one presumably is interested in the potential story of agents' expectations. Therefore, we include
graphs of the reduced form coefficients of the ALM and PLM, that is M_0 c + M_1 cF + PF
and c respectively, in Figure 4. While we would not consider these results conclusive,
the graphs are certainly insightful. Note that these graphs represent how a lagged
shock to output or inflation influences current output or inflation. The graphs show
that both the ALM and PLM of shocks to output influencing output stay quite close to
the RE solution and the same is true for inflation on inflation. In addition, the ALM
and the PLM are generally overlapping. However, looking at the cross terms, shocks
of output on inflation and shocks of inflation on output, we find that the learning parameters drift from the RE solution, but also that perceptions do not always align with reality. If one considers this in the context of the traditional Phillips curve (that is, in the short run shocks to inflation should influence output), this provides some insight into how perceptions of this relationship have changed over time. Since the early 1980s, perceptions suggest that shocks to inflation have a positive influence on output, and that has influenced the ALM. However, the opposite story does not hold: perceptions of shocks of output on inflation have at times implied a positive relationship and at others a negative relationship.
a negative relationship.
4 Conclusion
We conclude by noting that models with learning present a challenging problem to
the econometrician. In practical applications, not only are the parameter restrictions
imposed by the solution stringent, but the structure implied by learning limits the
econometrician to a second-best estimate of the variables that the agents incorporate.
This paper highlights these issues and provides an accessible estimation guide that is
useful for the applied researcher. In particular, this is applicable to the growing body
of work on NK-DSGE models and suggests a promising future for the role of learning
in this area.
A Appendix: TaRB-MH Algorithm
The two key components of the Tailored Randomized Block MH (TaRB-MH) algorithm
are randomized blocks and tailored proposal densities. In each MCMC iteration, one
begins by dividing θ into a random number of blocks, where each block comprises
random components of θ. For a given block, one then constructs a Student-t proposal
density (with low degrees of freedom, say, 15) centered at the posterior mode of the
block and variance calculated as the negative inverse of the Hessian at the mode. The
final step is to draw a value from this proposal density and accept it with the standard
MH probability.
To illustrate this procedure for an MCMC iteration, suppose that θ is divided into B
blocks as (θ1 , . . . , θB ). Consider the update of the lth block, l = 1, . . . , B with current
value θl . Denoting the current value of the preceding blocks as θ−l = (θ1 , . . . , θl−1 ) and
the following blocks as θ_{+l} = (θ_{l+1}, ..., θ_B):

1. Calculate

    θ_l* = arg max_{θ_l} log f(y | θ_{-l}, θ_l, θ_{+l}) π(θ_l)

using a suitable numerical optimization procedure
2. Calculate the variance V_l* as the negative inverse of the Hessian evaluated at θ_l*
3. Generate θ_l^p ~ t(θ_l*, V_l*, ν_l), where ν_l denotes the degrees of freedom
4. Calculate the MH acceptance probability α as

    α = min{ [f(y | θ_{-l}, θ_l^p, θ_{+l}) π(θ_l^p) / f(y | θ_{-l}, θ_l, θ_{+l}) π(θ_l)] · [t(θ_l | θ_l*, V_l*, ν_l) / t(θ_l^p | θ_l*, V_l*, ν_l)], 1 }
5. Accept θ_l^p if u ~ U(0, 1) < α; else retain θ_l, and repeat for the next block.
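A compact SciPy-based sketch of one block update follows; the randomization of the block structure across iterations, which the algorithm also requires, is omitted here:

```python
import numpy as np
from scipy import optimize, stats

def tarb_block_step(theta, idx, log_post, nu=15, rng=None):
    """One tailored MH update of the block theta[idx], all other blocks fixed.

    log_post : callable returning log f(y | theta) + log pi(theta)
    """
    rng = rng or np.random.default_rng()

    def neg(x):
        th = theta.copy()
        th[idx] = x
        return -log_post(th)

    # tailor the proposal: mode and curvature of the conditional posterior
    res = optimize.minimize(neg, theta[idx], method="BFGS")
    prop = stats.multivariate_t(loc=res.x, shape=res.hess_inv, df=nu)
    cand = np.ravel(prop.rvs(random_state=rng))
    log_alpha = (-neg(cand) + neg(theta[idx])
                 + prop.logpdf(theta[idx]) - prop.logpdf(cand))
    if np.log(rng.uniform()) < log_alpha:
        theta = theta.copy()
        theta[idx] = cand
    return theta
```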
References
Branch, William A and George W Evans, “A simple recursive forecasting model,”
Economics Letters, 2006, 91 (2), 158–166.
Chevillon, G, M Massmann, and S Mavroeidis, “Inference in models with adaptive learning,” Journal of Monetary Economics, 2010, 57 (3), 341–351.
Chib, S and Srikanth Ramamurthy, “Tailored randomized-block MCMC methods
for analysis of DSGE models,” Journal of Econometrics, 2010, 155 (1), 19–38.
Eusepi, Stefano and Bruce Preston, “Expectations, Learning, and Business Cycle
Fluctuations,” American Economic Review, October 2011, 101 (6), 2844–2872.
Evans, George W and Seppo Honkapohja, Learning and expectations in macroeconomics, Princeton: Princeton University Press, January 2001.
Evans, George W and Seppo Honkapohja, “Robust learning stability with operational monetary policy rules,” in Karl Schmidt-Hebbel and Carl Walsh, eds., Monetary Policy under Uncertainty and Learning, Santiago: Central Bank of Chile, 2009, pp. 145–170.
Evans, George W, Seppo Honkapohja, and Kaushik Mitra, “Notes on Agents’ Behavioral Rules Under Adaptive Learning and Studies of Monetary Policy,” CDMA Working Paper Series, 2011.
Hommes, Cars H and Gerhard Sorger, “Consistent Expectations Equilibria,”
Macroeconomic Dynamics, July 1998, 2, 287–321.
Marcet, Albert and Thomas J Sargent, “Convergence of Least-Squares Learning
in Environments with Hidden State Variables and Private information,” Journal
of Political Economy, 1989, 97 (6), 1306–1322.
McCallum, Bennett T, “On Non-Uniqueness in Rational Expectations Models: An
Attempt at Perspective,” Journal of Monetary Economics, 1983, 11 (2), 139–168.
McCallum, Bennett T, “E-Stability vis-a-vis determinacy results for linear rational expectations models,” Journal of Economic Dynamics and Control, April 2007, 31 (4), 1376–1391.
Milani, Fabio, “Expectations, learning and macroeconomic persistence,” Journal of
Monetary Economics, 2007, 54 (7), 2065–2082.
Milani, Fabio, “The Modeling of Expectations in Empirical DSGE Models: A Survey,” Advances in Econometrics, July 2012, 28, 3–38.
Orphanides, Athanasios and John C Williams, “The decline of activist stabilization policy: Natural rate misperceptions, learning, and expectations,” Journal of
Economic Dynamics and Control, 2005, 29 (11), 1927–1950.
Preston, Bruce, “Learning about monetary policy rules when long-horizon expectations matter,” International Journal Of Central Banking, 2005, 1 (2), 81–126.
19
Slobodyan, Sergey and Raf Wouters, “Estimating a medium-scale DSGE model
with expectations based on small forecasting models,” Journal of Economic Dynamics and Control, 2012, 36 (1), 26–46.
Taylor, John, “Discretion versus policy rules in practice,” Carnegie-Rochester Conference Series on Public Policy, 1993.
Woodford, Michael, Interest and prices: foundations of a theory of monetary policy,
Princeton: Princeton University Press, 2003.
Table 1: MLE of parameters in Example 1

                         model 1a                              model 1b                              model 1c
param    true            (i)               (ii)               (i)               (ii)               (i)               (ii)
α        2.0000          2.8051 (0.5034)   2.6092 (0.5155)    3.0149 (0.4801)   3.0896 (0.5238)    2.2500 (0.4991)   2.2888 (0.5175)
β        0.5000          0.2592 (0.1215)   0.2989 (0.1231)    0.2197 (0.1038)   0.2311 (0.1198)    0.4471 (0.0521)   0.4534 (0.0550)
κ        1.0000          --                --                 --                --                 1.0452 (0.0563)   1.0386 (0.0558)
λ        0.3000          --                --                 0.2674 (0.0561)   0.2895 (0.0137)    0.3006 (0.0050)   0.3050 (0.0044)
ρ        0.6000          0.6791 (0.0648)   0.6722 (0.0706)    0.6818 (0.0656)   0.6944 (0.0563)    0.6550 (0.0529)   0.6551 (0.0541)
σ1^2     1.0000          1.0548 (0.1055)   1.0517 (0.1053)    0.8906 (0.0892)   0.8776 (0.0878)    1.1305 (0.1133)   1.1298 (0.1131)
σ2^2     0.5000          --                --                 0.4849 (0.0486)   0.4887 (0.0489)    0.4482 (0.0450)   0.4542 (0.0455)
γ        0.1000          0.2249 (0.1389)   0.3191 (0.0923)    0.0209 (0.0183)   0.0310 (0.0340)    0.1429 (0.0543)   0.1192 (0.0577)
a0       (1a) 3.98; (1b) 3.24; (1c) 9.91
                         -4.0664 (6.5200)  --                 4.4511 (1.2312)   --                 11.5107 (1.0887)  --
c0       (1a) 1.06; (1b) 0.93; (1c) 1.38
                         2.0203 (3.4174)   --                 1.5421 (0.5047)   --                 1.2684 (0.6009)   --
log-likelihood           -289.6026         -289.4089          -484.1486         -483.4043          -500.2818         -501.6780

Note:
i. Model index: (a) λ = κ = 0, (b) λ ≠ 0 and κ = 0, and (c) λ, κ ≠ 0. Standard errors in parentheses.
ii. Column (i) for each version of the model: joint estimation of θ and φ0 using sample data 41:240. Column (ii): φ0 initialized using a training sample comprising observations 1:40 and θ estimated using sample data 41:240.
Table 2: Posterior Summary of Parameters in Example 1

                 prior                       posterior, left panel                  posterior, right panel
param   true     dist      mean    s.d.     mean      n.s.e.    .025      .975      mean      n.s.e.    .025      .975
α       2.00     N         1.00    2.00     2.5831    0.0170    1.7348    3.4771    2.4788    0.0225    1.5849    3.4452
β       0.50     N         1.00    2.50     0.4107    0.0021    0.2965    0.5040    0.4145    0.0019    0.3074    0.5141
κ       1.00     N         0.10    1.50     1.0600    0.0013    0.9485    1.1826    1.0698    0.0014    0.9535    1.2000
λ       0.30     N         1.00    1.50     0.3045    0.0001    0.2931    0.3154    0.3002    0.0002    0.2871    0.3125
ρ       0.60     U(-1,1)   0.00    0.58     0.6886    0.0021    0.5817    0.8442    0.6754    0.0018    0.5684    0.8054
σ1^2    1.00     IG        0.75    2.00     1.1496    0.0018    0.9455    1.3989    1.1473    0.0018    0.9430    1.3909
σ2^2    0.50     IG        0.75    2.00     0.4636    0.0007    0.3809    0.5590    0.4646    0.0008    0.3779    0.5725
γ       0.10     U(0,1)    0.50    0.29     0.1420    0.0008    0.0550    0.2446    0.1619    0.0011    0.0564    0.2732
a0      9.91     N         2.50    7.50     --        --        --        --        11.4283   0.0237    9.2875    13.6527
c0      1.38     N         0.00    1.50     --        --        --        --        1.0916    0.0125    -0.0455   2.1549
posterior ordinate (unnormalized)           -508.53                                 -509.83

Note: .025 and .975 denote the quantiles. n.s.e. is the numerical standard error. Posterior summary in the left panel: φ0 initialized using a training sample comprising observations 1:40 and θ estimated using sample data 41:240. Posterior summary in the right panel: joint estimation of θ and φ0 using sample data 41:240.
Figure 1: Marginal prior-posterior plots of the parameters. Red: prior, Green: posterior with training sample, Blue: posterior without training
sample.
Figure 2: Simulated REE, PLM and ALM plots for at (top) and ct (bottom) in Example 1.
Shaded regions show the 95% band (darker shade is for PLM).
Table 3: DGP and MLE of Parameters in Example 2

                     n = 200                                n = 500                                n = 1000                               n = 2000
param   dgp          (i)                (ii)                (i)                (ii)                (i)                (ii)                (i)                (ii)
α       0.2000       0.6060 (0.6564)    0.4559 (0.4970)     0.5493 (0.3524)    0.5093 (0.2868)     0.4082 (0.1832)    0.2904 (0.1807)     0.0985 (0.0962)    0.1463 (0.0859)
β       0.8000       0.8226 (0.0494)    0.8365 (0.0516)     0.7903 (0.0490)    0.8062 (0.0292)     0.8297 (0.0248)    0.8142 (0.0197)     0.8124 (0.0146)    0.8272 (0.0135)
δ       0.0500       0.0069 (0.0955)    0.0069 (0.0988)     0.1031 (0.0951)    0.0417 (0.0434)     0.0121 (0.0455)    0.0520 (0.0291)     0.0532 (0.0288)    0.0297 (0.0232)
ρ       0.8000       0.8526 (0.0444)    0.8008 (0.0517)     0.7925 (0.0399)    0.8155 (0.0291)     0.8280 (0.0208)    0.8055 (0.0220)     0.7973 (0.0168)    0.8058 (0.0156)
σ^2     1.0000       0.8690 (0.3810)    1.0389 (0.2169)     0.8645 (0.2069)    0.9763 (0.1578)     0.9971 (0.1582)    0.9554 (0.1027)     0.9494 (0.1065)    0.9830 (0.0862)
γ       0.0500       0.0396 (0.0133)    0.0345 (0.0070)     0.0150 (0.0090)    0.0173 (0.0112)     0.0149 (0.0050)    0.0115 (0.0067)     0.0105 (0.0035)    0.0118 (0.0036)
a0      1.1627       0.6984 (1.9551)    --                  0.5666 (1.2949)    --                  0.5719 (1.5204)    --                  1.4647 (0.7033)    --
b0      -0.0045      -0.1629 (0.1334)   --                  -0.1103 (0.1324)   --                  -0.1567 (0.0724)   --                  0.2283 (0.1944)    --
c0      2.7191       3.9071 (1.5364)    --                  3.4863 (1.0969)    --                  3.3489 (0.7073)    --                  2.2857 (0.4052)    --
log-likelihood       -399.9057          -399.3193           -1012.0615         -1013.0774          -2015.4323         -2016.0206          -3936.8494         -3936.7587

Note: For each sample size (n), column (i) presents the mle of all parameters; column (ii) presents the mle of the model parameters only. In the latter case the learning parameters were set at their data generating values. Standard errors in parentheses.
Table 4: Prior and Posterior Summary of Parameters in the NK Model

                prior                     posterior, left panel         posterior, center panel       posterior, right panel
param   dist      mean    s.d.      mean      .025      .975      mean      .025      .975      mean      .025      .975
ψ       G         4.00    2.00      0.073     0.027     0.128     0.072     0.027     0.131     0.201     0.134     0.265
θx      N         0.25    1.00      0.077     -0.060    0.223     0.111     -0.024    0.271     0.291     0.134     0.443
θπ      N         1.00    1.00      1.071     0.936     1.229     1.030     0.893     1.184     1.272     1.111     1.439
ρx      U(-1,1)   --      --        0.455     0.408     0.507     0.489     0.427     0.557     0.786     0.708     0.874
ρπ      U(-1,1)   --      --        0.404     0.328     0.483     0.369     0.282     0.453     0.518     0.457     0.581
σx^2    IG        0.75    2.00      1.225     0.991     1.511     1.137     0.913     1.406     0.082     0.048     0.130
σπ^2    IG        0.75    2.00      1.714     1.386     2.107     1.647     1.323     2.028     1.954     1.601     2.370
σi^2    IG        0.75    2.00      5.976     4.830     7.337     6.133     4.893     7.538     4.612     3.728     5.675
γ       U(0,1)    --      --        0.112     0.092     0.133     0.124     0.092     0.147     0.069     0.062     0.075
a0x     N         0.00    5.00      --        --        --        1.410     0.113     2.819     0.308     -1.246    2.309
a0π     N         0.00    5.00      --        --        --        0.125     -1.108    1.424     -2.085    -2.649    -1.537
c0xx    N         0.00    5.00      --        --        --        1.000     -0.891    3.525     5.380     3.928     7.178
c0πx    N         0.00    5.00      --        --        --        -0.981    -2.099    0.192     -0.216    -1.311    0.785
c0xπ    N         0.00    5.00      --        --        --        2.245     0.800     3.906     0.269     -0.335    0.874
c0ππ    N         0.00    5.00      --        --        --        0.341     -0.905    1.406     -0.606    -1.074    -0.138
log-posterior ordinate (unnormalized)  -966.48                    -964.69                       -1144.774

Note: .025 and .975 denote the quantiles. Sample data for posterior summary: Left panel: 1964:III to 2007:I with initial estimates of φ0 and R0 based on observations from 1954:III to 1964:II. Center panel: 1964:III to 2007:I. Right panel: 1954:III to 2007:I.
Figure 3: Marginal prior-posterior plots of the parameters in the NK model. Red: prior, Green: posterior with training sample, Blue: posterior
without training sample. Training sample: 1954:III to 1964:II. Estimation sample: 1964:III to 2007:I
Figure 4: Simulated REE, PLM and ALM plots for ct,xx and ct,πx in the NK model for
sample period 1964:III to 2007:I (center panel in Table 4). Shaded regions show the 95%
band (darker shade is for PLM).
Figure 4: (cont’d) Simulated REE, PLM and ALM plots for ct,xπ and ct,ππ in the NK model
for sample period 1964:III to 2007:I (center panel in Table 4). Shaded regions show the 95%
band (darker shade is for PLM).