Particle MCMC and Sequential Monte Carlo Squared for DSGE Models

Edward Herbst (Federal Reserve Board)
Frank Schorfheide∗ (University of Pennsylvania, CEPR, and NBER)
February 13, 2015
Abstract
This paper explores the efficacy and accuracy of Bayesian estimation of nonlinear dynamic
stochastic general equilibrium (DSGE) models using two different approaches. First, we consider particle
Markov chain Monte Carlo (PMCMC) techniques [Andrieu, Doucet, and Holenstein (2010)],
wherein a particle filter-based estimate of the likelihood function is used in lieu of an exact
likelihood within an MCMC algorithm. Second, we consider an alternative approach in which
the parameters are estimated jointly with the states of the nonlinear model, as in Chopin, Jacob,
and Papaspiliopoulos (2012). We assess the effectiveness of various configurations of these two
algorithms using two DSGE models where the true likelihood is known. Specifically, we focus
on how the number of particles, type of particle filter, and size of model affects the numerical
efficiency of both posterior samplers. We also discuss computational considerations important
for implementing both approaches.
JEL CLASSIFICATION: C11, C15, E10
KEY WORDS: Bayesian Analysis, DSGE Models, Monte Carlo Methods
∗Correspondence: E. Herbst: Board of Governors of the Federal Reserve System, 20th Street and Constitution
Avenue N.W., Washington, D.C. 20551. Email: edward.p.herbst@frb.gov. F. Schorfheide: Department of Economics,
3718 Locust Walk, University of Pennsylvania, Philadelphia, PA 19104-6297. Email: schorf@ssc.upenn.edu.
1 Introduction
We explore the efficacy and accuracy of Bayesian estimation of nonlinear dynamic stochastic general equilibrium (DSGE) models. Bayesian estimation of such models is challenging and relatively few papers have attempted full posterior elicitation: see, for example, Fernandez-Villaverde and Rubio-Ramirez (2004) and Gust, Lopez-Salido, and Smith (2012). Therefore, there is relatively little understanding of the econometric performance of previously-used methods, such as particle MCMC, and of newer sequential Monte Carlo techniques.
We assess two posterior simulators for nonlinear state space models. First, we consider
particle Markov chain Monte Carlo (PMCMC) techniques [Andrieu, Doucet, and Holenstein (2010)],
wherein a particle filter-based estimate of the likelihood function is used in lieu of an exact likelihood
within an MCMC algorithm. Second, we consider an alternative approach in which the parameters are estimated jointly with the states of the nonlinear model, as in Chopin, Jacob, and Papaspiliopoulos (2012), using the so-called sequential Monte Carlo squared (SMC²) algorithm. Since we are only
interested in the econometrics of both simulators, we use linear models as our benchmarks. This
allows us to compare both approaches to their Kalman filter-based equivalents, and thus isolate
the loss in efficiency attributable solely to the particle approximation to the likelihood. We use a
small-scale New Keynesian model and the Smets and Wouters (2007) model (henceforth, the SW model) to get a sense of how model complexity affects the simulators.
For both approaches, the particle filter forms the basis of the estimation. In our examples, we
span (loosely speaking) “best” and “worst” case scenarios, using both the naive bootstrap particle
filter and a particle filter tailored with a so-called “conditionally-optimal” proposal distribution.
We also vary the number of particles as well as several other tuning parameters in our examination.
Our results can be summarized as follows. For both PMCMC and SMC², there can be substantial gains from adapting the particle filter to the particular economic model being estimated. The bootstrap particle filter performs very poorly in most cases, requiring substantially more particles to obtain reliable likelihood estimates. This inaccuracy affects the posterior simulators: for the larger SW model, the particle MCMC algorithm broke down completely under the bootstrap particle filter. Finally, SMC² seems more sensitive to the accuracy of the particle approximation than PMCMC. While PMCMC using the conditionally-optimal particle filter performs nearly as well as its Kalman filter-based counterpart, the same is not true for SMC², which is clearly less efficient than a Kalman filter-based SMC algorithm.
The material that follows is abstracted from the yet-to-be-published book monograph, “Bayesian Estimation of DSGE Models.” The discussion of the particle MCMC and sequential Monte Carlo methods builds on some of the earlier material in the book, so we include the relevant sections here for completeness. We provide a rough table of contents for navigating the document below.
• Chapter 1. Describes the small-scale New Keynesian model and the SW model used in the
simulations.
• Chapter 4. Describes the basics of the random walk Metropolis-Hastings algorithm that is
the backbone of the particle MCMC method.
• Chapter 5. Describes the basic Sequential Monte Carlo (SMC) method for estimating model
parameters that forms the basis for the SMC² method.
• Chapter 8. Introduces the particle filter for evaluating the likelihood of nonlinear state
space models. We define and describe the bootstrap version of the particle filter as well as
one based on “conditionally optimal” proposals.
• Chapter 9. Describes the PMCMC algorithm and shows posterior simulation results for the
small-scale New Keynesian model and the SW model.
• Chapter 10. Describes the SMC² algorithm and shows posterior simulation results for the
small-scale New Keynesian model.
Bayesian Estimation of DSGE Models
Edward Herbst
Frank Schorfheide
February 5, 2015
Chapter 1
DSGE Modeling
Estimated dynamic stochastic general equilibrium (DSGE) models are now widely used by academics to conduct empirical research in macroeconomics as well as by central banks to interpret the current state of the economy, analyze the impact of changes in monetary or fiscal
policy, and to generate predictions for key macroeconomic aggregates. The term DSGE
model encompasses a broad class of macroeconomic models that span the real business cycle
models of Kydland and Prescott (1982) and King, Plosser, and Rebelo (1988) as well as the
New Keynesian models of Rotemberg and Woodford (1997) or Christiano, Eichenbaum, and
Evans (2005), which feature nominal price and wage rigidities and a role for central banks
to adjust interest rates in response to inflation and output fluctuations. A common feature of these models is that decision rules of economic agents are derived from assumptions
about preferences and technologies by solving intertemporal optimization problems. Moreover, agents potentially face uncertainty with respect to aggregate variables such as total
factor productivity or nominal interest rates set by a central bank. This uncertainty is generated by exogenous stochastic processes that may shift technology or generate unanticipated
deviations from a central bank’s interest-rate feedback rule.
The focus of this book is the Bayesian estimation of DSGE models. Conditional on distributional assumptions for the exogenous shocks, the DSGE model generates a likelihood
function, that is, a joint probability distribution for the endogenous model variables such
as output, consumption, investment, and inflation that depends on the structural parameters of the model. These structural parameters characterize agents’ preferences, production
technologies, and the law of motion of the exogenous shocks. In a Bayesian framework, this
likelihood function can be used to transform a prior distribution for the structural parameters
of the DSGE model into a posterior distribution. This posterior is the basis for substantive
inference and decision making. Unfortunately, it is not feasible to characterize moments and
quantiles of the posterior distribution analytically. Instead, we have to use computational
techniques to generate draws from the posterior and then approximate posterior expectations
by Monte Carlo averages.
In Section 1.1 we will present a small-scale New Keynesian DSGE model and describe
the decision problems of firms and households and the behavior of the monetary and fiscal
authorities. We then characterize the resulting equilibrium conditions. This model is subsequently used in many of the numerical illustrations. Section 1.2 briefly sketches two other
DSGE models that will be estimated in subsequent chapters.
1.1 A Small-Scale New Keynesian DSGE Model
We begin with a small-scale New Keynesian DSGE model that has been widely studied in
the literature (see Woodford (2003) or Gali (2008) for textbook treatments). The particular
specification presented below is based on An and Schorfheide (2007). The model economy
consists of final good producing firms, intermediate goods producing firms, households, a
central bank, and a fiscal authority. We will first describe the decision problems of these
agents, then describe the law of motion of the exogenous processes, and finally summarize
the equilibrium conditions. The likelihood function for a linearized version of this model can
be quickly evaluated, which makes the model an excellent showcase for the computational
algorithms studied in this book.
1.1.1 Firms
Production takes place in two stages. There are monopolistically-competitive intermediate
goods producing firms and perfectly competitive final goods producing firms that aggregate
the intermediate goods into a single good that is used for household and government consumption. This two-stage production process makes it fairly straightforward to introduce
price stickiness, which in turn creates a real effect of monetary policy.
The perfectly competitive final good producing firms combine a continuum of intermediate
goods indexed by j ∈ [0, 1] using the technology
$$ Y_t = \left( \int_0^1 Y_t(j)^{1-\nu} \, dj \right)^{\frac{1}{1-\nu}}. \tag{1.1} $$
The final good producers take input prices $P_t(j)$ and output prices $P_t$ as given. The revenue from the sale of the final good is $P_t Y_t$ and the input costs incurred to produce $Y_t$ are $\int_0^1 P_t(j) Y_t(j) \, dj$. Maximization of profits
$$ \Pi_t = P_t \left( \int_0^1 Y_t(j)^{1-\nu} \, dj \right)^{\frac{1}{1-\nu}} - \int_0^1 P_t(j) Y_t(j) \, dj, \tag{1.2} $$
with respect to the inputs Yt (j) implies that the demand for intermediate good j is given by
$$ Y_t(j) = \left( \frac{P_t(j)}{P_t} \right)^{-1/\nu} Y_t. \tag{1.3} $$
Thus, the parameter $1/\nu$ represents the elasticity of demand for each intermediate good. In the absence of an entry cost, final good producers will enter the market until profits are equal to zero. From the zero-profit condition, it is possible to derive the following relationship
between the intermediate goods prices and the price of the final good:
$$ P_t = \left( \int_0^1 P_t(j)^{\frac{\nu-1}{\nu}} \, dj \right)^{\frac{\nu}{\nu-1}}. \tag{1.4} $$
Intermediate good j is produced by a monopolist who has access to the following linear
production technology:
$$ Y_t(j) = A_t N_t(j), \tag{1.5} $$
where $A_t$ is an exogenous productivity process that is common to all firms and $N_t(j)$ is the labor input of firm $j$. To keep the model simple, we abstract from capital as a factor of production for now. Labor is hired in a perfectly competitive factor market at the real wage $W_t$.
In order to introduce nominal price stickiness, we assume that firms face quadratic price
adjustment costs
$$ AC_t(j) = \frac{\phi}{2} \left( \frac{P_t(j)}{P_{t-1}(j)} - \pi \right)^2 Y_t(j), \tag{1.6} $$
where φ governs the price rigidity in the economy and π is the steady state inflation rate
associated with the final good. Under this adjustment cost specification it is costless to
change prices at the rate π. If the price change deviates from π the firm incurs a cost in
terms of lost output that is a quadratic function of the discrepancy between the price change
and π. The larger the adjustment cost parameter φ, the more reluctant the intermediate
goods producers are to change their prices and the more rigid the prices are at the aggregate
level. Firm j chooses its labor input Nt (j) and the price Pt (j) to maximize the present value
of future profits
"
Et
∞
X
β s Qt+s|t
s=0
#
Pt+s (j)
Yt+s (j) − Wt+s Nt+s (j) − ACt+s (j) .
Pt+s
(1.7)
Here, Qt+s|t is the time t value of a unit of the consumption good in period t + s to the
household, which is treated as exogenous by the firm.
1.1.2 Households
The representative household derives utility from consumption $C_t$ relative to a habit stock (which is approximated by the level of technology $A_t$)¹ and real money balances $M_t/P_t$. The household derives disutility from hours worked $H_t$ and maximizes
$$ \mathbb{E}_t \left[ \sum_{s=0}^{\infty} \beta^s \left( \frac{(C_{t+s}/A_{t+s})^{1-\tau} - 1}{1-\tau} - \chi_H H_{t+s} + \chi_M \ln \left( \frac{M_{t+s}}{P_{t+s}} \right) \right) \right], \tag{1.8} $$
where β is the discount factor, 1/τ is the intertemporal elasticity of substitution, and χM
and χH are scale factors that determine steady state real money balances and hours worked.
We will set χH = 1. The household supplies perfectly elastic labor services to the firms,
taking the real wage Wt as given. The household has access to a domestic bond market
where nominal government bonds Bt are traded that pay (gross) interest Rt . Furthermore,
it receives aggregate residual real profits Dt from the firms and has to pay lump-sum taxes
Tt . Thus, the household’s budget constraint is of the form
$$ P_t C_t + B_t + M_t + T_t = P_t W_t H_t + R_{t-1} B_{t-1} + M_{t-1} + P_t D_t + P_t SC_t, \tag{1.9} $$
where SCt is the net cash inflow from trading a full set of state-contingent securities.
1.1.3 Monetary and Fiscal Policy
Monetary policy is described by an interest rate feedback rule of the form
$$ R_t = (R_t^*)^{1-\rho_R} \, R_{t-1}^{\rho_R} \, e^{\epsilon_{R,t}}, \tag{1.10} $$
¹This assumption ensures that the economy evolves along a balanced growth path even if the utility function is additively separable in consumption, real money balances, and leisure.
where $\epsilon_{R,t}$ is a monetary policy shock and $R_t^*$ is the (nominal) target rate:
$$ R_t^* = r \pi^* \left( \frac{\pi_t}{\pi^*} \right)^{\psi_1} \left( \frac{Y_t}{Y_t^*} \right)^{\psi_2}. \tag{1.11} $$
Here r is the steady state real interest rate (defined below), πt is the gross inflation rate
defined as πt = Pt /Pt−1 , and π ∗ is the target inflation rate. Yt∗ in (1.11) is the level of output
that would prevail in the absence of nominal rigidities.
We assume that the fiscal authority consumes a fraction ζt of aggregate output Yt , that is
Gt = ζt Yt , and that ζt ∈ [0, 1] follows an exogenous process specified below. The government
levies a lump-sum tax Tt (subsidy) to finance any shortfalls in government revenues (or to
rebate any surplus). The government’s budget constraint is given by
$$ P_t G_t + R_{t-1} B_{t-1} + M_{t-1} = T_t + B_t + M_t. \tag{1.12} $$
1.1.4 Exogenous Processes
The model economy is perturbed by three exogenous processes. Aggregate productivity
evolves according to
$$ \ln A_t = \ln \gamma + \ln A_{t-1} + \ln z_t, \quad \text{where} \quad \ln z_t = \rho_z \ln z_{t-1} + \epsilon_{z,t}. \tag{1.13} $$
Thus, on average technology grows at the rate $\gamma$ and $z_t$ captures exogenous fluctuations of the technology growth rate. Define $g_t = 1/(1-\zeta_t)$, where $\zeta_t$ was previously defined as the fraction of aggregate output purchased by the government. We assume that
$$ \ln g_t = (1-\rho_g) \ln g + \rho_g \ln g_{t-1} + \epsilon_{g,t}. \tag{1.14} $$
Finally, the monetary policy shock $\epsilon_{R,t}$ is assumed to be serially uncorrelated. The three innovations are independent of each other at all leads and lags and are normally distributed with means zero and standard deviations $\sigma_z$, $\sigma_g$, and $\sigma_R$, respectively.
1.1.5 Equilibrium Relationships
We consider the symmetric equilibrium in which all intermediate goods producing firms make
identical choices so that the j subscript can be omitted. The market clearing conditions are
given by
$$ Y_t = C_t + G_t + AC_t \quad \text{and} \quad H_t = N_t. \tag{1.15} $$
Because the households have access to a full set of state-contingent claims, it turns out that
Qt+s|t in (1.7) is
$$ Q_{t+s|t} = (C_{t+s}/C_t)^{-\tau} (A_t/A_{t+s})^{1-\tau}. \tag{1.16} $$
Thus, in equilibrium households and firms are using the same stochastic discount factor.
Moreover, it can be shown that output, consumption, interest rates, and inflation have to
satisfy the following optimality conditions
"
#
−τ
Ct+1 /At+1
At Rt
1 = βEt
Ct /At
At+1 πt+1
τ 1
Ct
1
π
1 =
1−
+ φ(πt − π) 1 −
πt +
ν
At
2ν
2ν
"
#
−τ
Ct+1 /At+1
Yt+1 /At+1
−φβEt
(πt+1 − π)πt+1 .
Ct /At
Yt /At
(1.17)
(1.18)
(1.17) is the consumption Euler equation, which reflects the first-order condition with respect
to the government bonds Bt . In equilibrium, the household equates the marginal utility of
consuming a dollar today with the discounted marginal utility from investing the dollar,
earning interest Rt and consuming it in the next period. (1.18) characterizes the profit
maximizing choice of the intermediate goods producing firms. The first-order condition for
the firms’ problem depends on the wage Wt . We used the households’ labor supply condition
to replace Wt by a function of the marginal utility of consumption. In the absence of nominal
rigidities (φ = 0) aggregate output is given by
$$ Y_t^* = (1-\nu)^{1/\tau} A_t g_t, \tag{1.19} $$
which is the target level of output that appears in the monetary policy rule (1.11).
In Chapter 2.1 we will use a solution technique for the DSGE model that is based on a
Taylor series approximation of the equilibrium conditions. A natural point around which to
construct this approximation is the steady state of the DSGE model. The steady state is
attained by setting the innovations R,t , g,t , and z,t to zero at all times. Because technology
ln At evolves according to a random walk with drift ln γ, consumption and output need to
be detrended for a steady state to exist. Let $c_t = C_t/A_t$, $y_t = Y_t/A_t$, and $y_t^* = Y_t^*/A_t$. Then the steady state is given by
$$ \pi = \pi^*, \quad r = \frac{\gamma}{\beta}, \quad R = r\pi^*, \quad c = (1-\nu)^{1/\tau}, \quad \text{and} \quad y = g c = y^*. \tag{1.20} $$
Steady state inflation equals the targeted inflation rate $\pi^*$; the real rate depends on the growth rate of the economy $\gamma$ and the reciprocal of the households’ discount factor $\beta$; the nominal interest rate is determined by the Fisher equation; the dependence of steady state consumption on $\nu$ reflects the distortion generated by the monopolistic competition among intermediate goods producers; and, finally, steady state output can be determined from the aggregate resource constraint. We are now in a position to rewrite the equilibrium
conditions by expressing each variable in terms of percentage deviations from its steady state
value. Let x̂t = ln(xt /x) and write
$$ 1 = \beta \mathbb{E}_t \left[ e^{-\tau \hat{c}_{t+1} + \tau \hat{c}_t + \hat{R}_t - \hat{z}_{t+1} - \hat{\pi}_{t+1}} \right] \tag{1.21} $$
$$ \frac{1-\nu}{\nu \phi \pi^2} \left( e^{\tau \hat{c}_t} - 1 \right) = \left( e^{\hat{\pi}_t} - 1 \right) \left[ \left( 1 - \frac{1}{2\nu} \right) e^{\hat{\pi}_t} + \frac{1}{2\nu} \right] - \beta \mathbb{E}_t \left[ \left( e^{\hat{\pi}_{t+1}} - 1 \right) e^{-\tau \hat{c}_{t+1} + \tau \hat{c}_t + \hat{y}_{t+1} - \hat{y}_t + \hat{\pi}_{t+1}} \right] \tag{1.22} $$
$$ e^{\hat{c}_t - \hat{y}_t} = e^{-\hat{g}_t} - \frac{\phi \pi^2 g}{2} \left( e^{\hat{\pi}_t} - 1 \right)^2 \tag{1.23} $$
$$ \hat{R}_t = \rho_R \hat{R}_{t-1} + (1-\rho_R) \psi_1 \hat{\pi}_t + (1-\rho_R) \psi_2 (\hat{y}_t - \hat{g}_t) + \epsilon_{R,t} \tag{1.24} $$
$$ \hat{g}_t = \rho_g \hat{g}_{t-1} + \epsilon_{g,t} \tag{1.25} $$
$$ \hat{z}_t = \rho_z \hat{z}_{t-1} + \epsilon_{z,t}. \tag{1.26} $$
The equilibrium law of motion of consumption, output, interest rates, and inflation has to
satisfy the expectational difference equations (1.21) to (1.26).
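As a small worked example of the formulas in this section, the steady state relationships in (1.20) can be evaluated directly from the structural parameters. The sketch below does this in Python; the parameter values are purely illustrative and are not a calibration taken from the text.

```python
# Steady state (1.20): pi = pi*, r = gamma/beta, R = r*pi*,
# c = (1 - nu)^(1/tau), and y = g*c = y*.
def steady_state(gamma, beta, pi_star, nu, g, tau):
    r = gamma / beta                      # real rate
    return {
        "pi": pi_star,                    # inflation equals the target rate
        "r": r,
        "R": r * pi_star,                 # nominal rate via the Fisher equation
        "c": (1 - nu) ** (1 / tau),       # distorted by monopolistic competition
        "y": g * (1 - nu) ** (1 / tau),   # from the aggregate resource constraint
    }

# Hypothetical parameter values, chosen only to exercise the formulas:
print(steady_state(gamma=1.005, beta=0.998, pi_star=1.008, nu=0.1, g=1.2, tau=2.0))
```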
1.2 Other DSGE Models Considered in this Book
In addition to the small-scale New Keynesian DSGE model, we consider two other models:
the widely-used Smets-Wouters (SW) model, which is a more elaborate version of the small-scale DSGE model that includes capital accumulation as well as wage rigidities, and a real
business cycle model with a detailed characterization of fiscal policy. We will present a brief
overview of these models below and provide further details as needed in Chapter 6.
1.2.1 The Smets-Wouters Model
The Smets and Wouters (2007) model is a more elaborate version of the small-scale DSGE
model presented in the previous section. In the SW model capital is a factor of intermediate goods production, and in addition to price stickiness the model features nominal wage
stickiness. In order to generate a richer autocorrelation structure, the model also includes
10
investment adjustment costs, habit formation in consumption, and partial dynamic indexation of prices and wages to lagged values. The model is based on work by Christiano,
Eichenbaum, and Evans (2005), who added various forms of frictions to a basic New Keynesian DSGE model in order to capture the dynamic response to a monetary policy shock
as measured by a structural vector autoregression (VAR). In turn (the publication dates are
misleading), Smets and Wouters (2003) augmented the Christiano-Eichenbaum-Evans model
by additional exogenous structural shocks (among them price markup shocks, wage markup
shocks, preference shocks, and others) to be able to capture the joint dynamics of Euro Area
output, consumption, investment, hours, wages, inflation, and interest rates.
The Smets and Wouters (2003) paper has been highly influential, not just in academic
circles but also in central banks because it demonstrated that a modern DSGE model that
is usable for monetary policy analysis can achieve a time series fit that is comparable to a
less restrictive vector autoregression (VAR). The 2007 version of the SW model contains a
number of minor modifications of the 2003 model in order to optimize its fit on U.S. data. We
will use the 2007 model exactly as it is presented in Smets and Wouters (2007) and refer the
reader to that article for details. The log-linearized equilibrium conditions are reproduced
in Appendix A.1. By now, the SW model has become one of the workhorse models in the
DSGE model literature and in central banks around the world. It forms the core of most large-scale DSGE models that augment the SW model with additional features such as a
multi-sector production structure or financial frictions. Because of its widespread use, we
will consider its estimation in this book.
1.2.2 A DSGE Model For the Analysis of Fiscal Policy
In the small-scale New Keynesian DSGE model and in the SW model fiscal policy is passive
and non-distortionary. The level of government spending as a fraction of GDP is assumed
to evolve exogenously and an implicit money demand equation determines the amount of
seignorage generated by the interest rate feedback rule. Fiscal policy is passive in the sense
that the government raises lump-sum taxes (or distributes lump-sum transfers) to ensure that
its budget constraint is satisfied in every period. These lump-sum taxes are non-distortionary,
because they do not affect the decisions of households and firms. The exact magnitude of
the lump-sum taxes and the level of government debt are not uniquely determined, but they
also do not matter for macroeconomic outcomes. Both the small-scale DSGE model and the
Chapter 4
Metropolis-Hastings Algorithms for DSGE Models
To date, the most widely used method to generate draws from posterior distributions of
a DSGE model is the random walk MH (RWMH) algorithm. This algorithm is a special
case of the generic Algorithm 5 introduced in Chapter 3.5 in which the proposal distribution
q(ϑ|θi−1 ) can be expressed as the random walk ϑ = θi−1 +η and η is drawn from a distribution
that is centered at zero. We introduce a benchmark RWMH algorithm in Section 4.1 and
apply it to the small-scale New Keynesian DSGE model in Section 4.2. The DSGE model
likelihood function in combination with the prior distribution presented in Chapter 2.3 leads to a posterior distribution that has a fairly regular elliptical shape. In turn, the draws from
a simple RWMH algorithm can be used to obtain an accurate numerical approximation of
posterior moments.
Unfortunately, in many other applications, in particular those involving medium- and large-scale DSGE models, the posterior distributions can be very non-elliptical. Irregularly-shaped posterior distributions are often caused by identification problems or misspecification.
The DSGE model may suffer from a local identification problem that generates a posterior
that is very flat in certain directions of the parameter space, similar to the posterior encountered in the simple set-identified model of Chapter 3.3. Alternatively, the posterior may
exhibit multimodal features. Multimodality could be caused by the data’s inability to distinguish between the role of a DSGE model’s external and internal propagation mechanisms.
For instance, inflation persistence can be generated by highly autocorrelated cost-push shocks
or by firms’ inability to frequently re-optimize their prices in view of fluctuating marginal
costs. We use a very stylized state-space model to illustrate these challenges for posterior
simulators in Section 4.3.
In view of the difficulties caused by irregularly-shaped posterior surfaces, we review a
variety of alternative MH samplers in Section 4.4. These algorithms differ from the RWMH
algorithm in two dimensions. First, they use alternative proposal distributions q(ϑ|θi−1 ). In
general, we consider distributions of the form
$$ q(\cdot \,|\, \theta^{i-1}) = p_t\big(\cdot \,\big|\, \mu(\theta^{i-1}), \Sigma(\theta^{i-1}), \nu\big), \tag{4.1} $$
where $p_t(\cdot)$ refers to the density of a Student-$t$ distribution. Thus, our exploration of proposal densities concentrates on different ways of forming the location parameter $\mu(\cdot)$ and the scale matrix $\Sigma(\cdot)$. For $\nu = \infty$ this notation nests Gaussian proposal distributions. The second
dimension in which we generalize the algorithm is blocking, i.e., we group the parameters
into subvectors, and use a Block MH sampler to draw iteratively from conditional posterior
distributions.
While the alternative MH samplers are designed for irregular posterior surfaces for which
the simple RWMH algorithm generates inaccurate approximations, we illustrate the performance gains obtained through these algorithms using the simple New Keynesian DSGE
model in Section 4.5. Similar to the illustrations in Chapter 3.5, we evaluate the accuracy
of the algorithms by computing the variance of Monte Carlo approximations across multiple
chains. Our simulations demonstrate that careful tailoring of proposal densities q(ϑ|θi−1 )
as well as blocking the parameters can drastically improve the accuracy of Monte Carlo
approximations. Finally, Section 4.6 takes a brief look at the numerical approximation of
marginal data densities that are used to compute posterior model probabilities.
In Chapter 3.5, we showed directly that the Monte Carlo estimates associated with the
discrete MH algorithm satisfied a SLLN and CLT for dependent, identically distributed random variables. All of the MH algorithms here give rise to Markov chains that are recurrent,
irreducible and aperiodic for the target distribution of interest. These properties are sufficient for a SLLN to hold. However, validating conditions for a CLT to hold is much more
difficult and beyond the scope of this book.
4.1 A Benchmark Algorithm
The most widely-used MH algorithm for DSGE model applications is the random walk MH
(RWMH) algorithm. The mean of the proposal distribution in (4.1) is simply the current
location in the chain and its variance is prespecified:
$$ \mu(\theta^{i-1}) = \theta^{i-1} \quad \text{and} \quad \Sigma(\theta^{i-1}) = c^2 \hat{\Sigma}. \tag{4.2} $$
The name of the algorithm comes from the random walk form of the proposal, which can be
written as
$$ \vartheta = \theta^{i-1} + \eta, $$
where $\eta$ is mean zero with variance $c^2 \hat{\Sigma}$. Given the symmetric nature of the proposal distribution, the acceptance probability becomes
$$ \alpha = \min \left\{ \frac{p(\vartheta|Y)}{p(\theta^{i-1}|Y)}, \; 1 \right\}. $$
A draw, ϑ, is accepted with probability one if the posterior at ϑ has a higher value than the
posterior at θi−1 . The probability of acceptance decreases as the posterior at the candidate
value decreases relative to the current posterior.
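To make the mechanics concrete, here is a minimal Python sketch of the RWMH algorithm. The function log_posterior is a placeholder for a user-supplied evaluator of ln p(θ|Y) up to a constant; everything else follows (4.1)-(4.2) with a Gaussian proposal (ν = ∞).

```python
import numpy as np

def rwmh(log_posterior, theta0, Sigma_hat, c=0.4, n_draws=10000, seed=0):
    """Random walk Metropolis-Hastings with proposal variance c^2 * Sigma_hat."""
    rng = np.random.default_rng(seed)
    d = len(theta0)
    L = np.linalg.cholesky(c**2 * Sigma_hat)   # factor once, reuse for all draws
    draws = np.empty((n_draws, d))
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    n_accept = 0
    for i in range(n_draws):
        vartheta = theta + L @ rng.standard_normal(d)   # proposal: theta + eta
        logp_prop = log_posterior(vartheta)
        # alpha = min{ p(vartheta|Y) / p(theta|Y), 1 }, evaluated on the log scale
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = vartheta, logp_prop
            n_accept += 1
        draws[i] = theta
    return draws, n_accept / n_draws
```

In practice the scaling c would be adjusted until the reported acceptance rate falls in the range discussed at the end of this section.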
To implement the RWMH, the user still needs to specify ν, c, and Σ̂. For all of the
variations of the RWMH we implement, we set ν = ∞ in (4.1), that is, we use a multivariate
normal proposal distribution in keeping with most of the literature. Typically, the choice
of c is made conditional on Σ̂, so we first discuss the choice for Σ̂. The proposal variance
controls the relative variances and correlations in the proposal distribution. As we have seen
in Chapter 3.5, the sampler can work very poorly if q is strongly at odds with the target
distribution. This intuition extends to the multivariate setting here. Suppose θ comprises two parameters, say β and δ, that are highly correlated in the posterior distribution. If the
variance of the proposal distribution does not capture this correlation, e.g., the matrix Σ̂ is
diagonal, then the draw ϑ is unlikely to reflect the fact that if β is large then δ should also
be large, and vice versa. Therefore, p(ϑ|Y ) is likely to be smaller than p(θi−1 |Y ), and so the
proposed draw will be rejected with high probability. As a consequence, the chain will have
a high rejection rate, exhibit a high autocorrelation, and the Monte Carlo estimates derived
from it will have a high variance.
A good choice for Σ̂ seeks to incorporate information from the posterior, to potentially capture the correlations discussed above. Obtaining this information can be difficult. A popular
approach, used in Schorfheide (2000), is to set Σ̂ to be the negative of the inverse Hessian at
the mode of the log posterior, θ̂, obtained by running a numerical optimization routine before
running MCMC. Using this as an estimate for the covariance of the posterior is attractive,
because it can be viewed as a large sample approximation to the posterior covariance matrix
as the sample size T −→ ∞. There exists a large literature on the asymptotic normality
of posterior distributions. Fundamental conditions can be found, for instance, in Johnson
(1970).
Unfortunately, in many applications the maximization of the posterior density is tedious
and the numerical approximation of the Hessian may be inaccurate. These problems may
arise if the posterior distribution is very non-elliptical and possibly multi-modal, or if the
likelihood function is replaced by a non-differentiable particle filter approximation (see Chapter 8 below). In both cases, a (partially) adaptive approach may work well: First, generate
a set of posterior draws based on a reasonable initial choice for Σ̂, e.g. the prior covariance
matrix. Second, compute the sample covariance matrix from the first sequence of posterior
draws and use it as Σ̂ in a second run of the RWMH algorithm. In principle, the covariance
matrix Σ̂ can be adjusted more than once. However, Σ̂ must be fixed eventually to guarantee
the convergence of the posterior simulator. Samplers which constantly (or automatically)
adjust Σ̂ are known as adaptive samplers and require substantially more elaborate theoretical
justifications.
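A minimal sketch of this two-step procedure, reusing the rwmh function sketched above (log_posterior, theta0, and prior_cov are placeholders for the user’s posterior evaluator, starting value, and prior covariance matrix):

```python
# Step 1: run the sampler with a reasonable initial choice for Sigma_hat,
# here the prior covariance matrix, as suggested in the text.
draws1, _ = rwmh(log_posterior, theta0, Sigma_hat=prior_cov, c=0.2, n_draws=5000)

# Step 2: re-run with the sample covariance of the first-stage draws,
# discarding an initial block as burn-in; Sigma_hat is then held fixed.
Sigma_hat = np.cov(draws1[1000:], rowvar=False)
draws2, acc_rate = rwmh(log_posterior, theta0, Sigma_hat=Sigma_hat, c=0.4, n_draws=20000)
```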
Instead of strictly following one of the two approaches of tuning Σ̂ that we just described,
we use an estimate of the posterior covariance, Vπ [θ], obtained from an earlier estimation
in the subsequent numerical illustrations. While this approach is impractical in empirical
work, it is useful for the purpose of comparing the performance of different posterior samplers,
because it avoids a distortion due to a mismatch between the Hessian-based estimate and
the posterior covariance. Thus, it is a best-case scenario for the algorithm. To summarize,
we examine the following variant of the RWMH algorithm:
RWMH-V : Σ̂ = Vπ [θ].
The final parameter of the algorithm is the scaling factor c. This parameter is typically
adjusted to ensure a “reasonable” acceptance rate. Given the opacity of the posterior, it
is difficult to derive a theoretically optimal acceptance rate. If the sampler accepts too
frequently, it may be making very small movements, resulting in large serial correlation and
a high variance of the resulting Monte Carlo estimates. Similarly, if the chain rejects too
frequently, it may get stuck in one region of the parameter space, again resulting in inaccurate estimates. However, for the special case of a target distribution which is multivariate normal,
Roberts, Gelman, and Gilks (1997) have derived a limiting optimal acceptance rate of 0.234 (as the size of the parameter vector grows). Most practitioners target an acceptance rate between 0.20
and 0.40. The scaling factor c can be tuned during the burn-in period or via pre-estimation
Chapter 5
Sequential Monte Carlo Methods
Importance sampling has rarely been used in DSGE model applications. A key difficulty with
Algorithm 4, in particular in high-dimensional parameter spaces, is to find a good proposal
density. In this chapter, we will explore methods in which proposal densities are constructed
sequentially. Suppose φn , n = 1, . . . , Nφ , is a sequence that slowly increases from zero to
one. We can define a sequence of tempered posteriors as
$$ \pi_n(\theta) = \frac{[p(Y|\theta)]^{\phi_n} \, p(\theta)}{\int [p(Y|\theta)]^{\phi_n} \, p(\theta) \, d\theta}, \qquad n = 0, \ldots, N_\phi, \qquad \phi_n \uparrow 1. \tag{5.1} $$
Provided that φ1 is close to zero, the prior density p(θ) may serve as an efficient proposal
density for π1 (θ). Likewise, the density πn (θ) may be a good proposal density for πn+1 (θ).
Sequential Monte Carlo (SMC) algorithms try to exploit this insight efficiently.
SMC algorithms were initially developed to solve filtering problems that arise in nonlinear
state-space models. We will consider such filtering applications in detail in Chapter 8.
Chopin (2002) showed how to adapt the particle filtering techniques to conduct posterior
inference for a static parameter vector. Textbook treatments of SMC algorithms can be
found, for instance, in Liu (2001) and Cappé, Moulines, and Ryden (2005). The volume by
Doucet, de Freitas, and Gordon (2001) discusses many applications and practical aspects of
SMC. Creal (2012) provides a recent survey focusing on SMC applications in econometrics.
The first paper that applied SMC techniques to posterior inference in DSGE models is
Creal (2007). He presents a basic SMC algorithm and uses it for posterior inference in a
small-scale DSGE model that is similar to the model in Section 1.1. Herbst and Schorfheide
(2014) developed the algorithm further, provided some convergence results for an adaptive
version of the algorithm building on the theoretical analysis of Chopin (2004), and showed
that a properly tailored SMC algorithm delivers more reliable posterior inference for large-scale DSGE models with multi-modal posteriors than the widely-used RWMH-V algorithm. Much of the exposition in this chapter borrows from Herbst and Schorfheide (2014). An additional advantage of the SMC algorithms over MCMC algorithms, on the computational front, highlighted by Durham and Geweke (2014), is that SMC is much more amenable to parallelization. Durham and Geweke (2014) show how to implement an SMC algorithm on a graphics processing unit (GPU), facilitating massive speed gains in estimation. While the evaluation of DSGE likelihoods is not (yet) amenable to GPU calculation, we will show how to exploit the parallel structure of the algorithm.
We present a generic SMC algorithm in Section 5.1. Further details on the implementation of the algorithm, the adaptive choice of tuning constants, and the convergence of the Monte Carlo approximations constructed from the output of the algorithm are provided in Section 5.2. Finally, we apply the algorithm to the small-scale New Keynesian DSGE model in Section 5.3. Because we will generate draws of $\theta$ sequentially, from a sequence of posterior distributions $\{\pi_n(\theta)\}_{n=1}^{N_\phi}$, it is useful to equip the parameter vector with a subscript $n$. Thus, $\theta_n$ is associated with the density $\pi_n(\cdot)$.
5.1 A Generic SMC Algorithm
Just like the basic importance sampling algorithm, SMC algorithms generate weighted draws from the sequence of posteriors $\{\pi_n\}_{n=1}^{N_\phi}$ in (5.1). The weighted draws are called particles. We denote the overall number of particles by $N$. At any stage the posterior distribution $\pi_n(\theta)$ is represented by a swarm of particles $\{\theta_n^i, W_n^i\}_{i=1}^N$ in the sense that the Monte Carlo average
$$ \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N W_n^i \, h(\theta_n^i) \stackrel{a.s.}{\longrightarrow} \mathbb{E}_{\pi_n}[h(\theta_n)]. \tag{5.2} $$
Starting from the stage $n-1$ particles $\{\theta_{n-1}^i, W_{n-1}^i\}_{i=1}^N$, the algorithm proceeds in three steps, using Chopin (2004)’s terminology: correction, that is, reweighting the stage $n-1$ particles to reflect the density in iteration $n$; selection, that is, eliminating a highly uneven distribution of particle weights (degeneracy) by resampling the particles; and mutation, that is, propagating the particles forward using a Markov transition kernel to adapt the particle values to the stage $n$ bridge density.
The sequence of posteriors in (5.1) was obtained by tempering the likelihood function,
that is, we replaced $p(Y|\theta)$ by $[p(Y|\theta)]^{\phi_n}$. Alternatively, one could construct the sequence of posteriors by sequentially adding observations to the likelihood function, that is, $\pi_n(\theta)$ is based on $p(Y_{1:\lfloor \phi_n T \rfloor}|\theta)$:
$$ \pi_n^{(D)}(\theta) = \frac{p(Y_{1:\lfloor \phi_n T \rfloor}|\theta) \, p(\theta)}{\int p(Y_{1:\lfloor \phi_n T \rfloor}|\theta) \, p(\theta) \, d\theta}, \tag{5.3} $$
where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. This data tempering is particularly
attractive in sequential applications. Due to the fact that individual observations are not
divisible, the data tempering approach is slightly less flexible. This may matter for the early
stages of the SMC sampler in which it may be advantageous to add information in very
small increments. The subsequent algorithm is presented in terms of likelihood tempering.
However, we also discuss the necessary adjustments for data tempering.
The algorithm provided below relies on sequences of tuning parameters. To make the exposition more transparent, we begin by assuming that these sequences are provided ex ante. Let $\{\rho_n\}_{n=1}^{N_\phi}$ be a sequence of zeros and ones that determines whether the particles are resampled in the selection step, and let $\{\zeta_n\}_{n=1}^{N_\phi}$ be a sequence of tuning parameters for the Markov transition density in the mutation step. The adaptive choice of these tuning parameters will be discussed in Section 5.2.2 below.
Algorithm 8 (Generic SMC Algorithm with Likelihood Tempering)

1. Initialization. ($\phi_0 = 0$.) Draw the initial particles from the prior: $\theta_1^i \stackrel{iid}{\sim} p(\theta)$ and $W_1^i = 1$, $i = 1, \ldots, N$.

2. Recursion. For $n = 1, \ldots, N_\phi$,

(a) Correction. Reweight the particles from stage $n-1$ by defining the incremental weights
$$ \tilde{w}_n^i = [p(Y|\theta_{n-1}^i)]^{\phi_n - \phi_{n-1}} \tag{5.4} $$
and the normalized weights
$$ \tilde{W}_n^i = \frac{\tilde{w}_n^i \, W_{n-1}^i}{\frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i}, \qquad i = 1, \ldots, N. \tag{5.5} $$
An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \tilde{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N \tilde{W}_n^i \, h(\theta_{n-1}^i). \tag{5.6} $$

(b) Selection. Case (i): If $\rho_n = 1$, resample the particles via multinomial resampling. Let $\{\hat{\theta}^i\}_{i=1}^N$ denote $N$ iid draws from a multinomial distribution characterized by support points and weights $\{\theta_{n-1}^i, \tilde{W}_n^i\}_{i=1}^N$ and set $W_n^i = 1$. Case (ii): If $\rho_n = 0$, let $\hat{\theta}_n^i = \theta_{n-1}^i$ and $W_n^i = \tilde{W}_n^i$, $i = 1, \ldots, N$. An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \hat{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N W_n^i \, h(\hat{\theta}_n^i). \tag{5.7} $$

(c) Mutation. Propagate the particles $\{\hat{\theta}_n^i, W_n^i\}$ via $N_{MH}$ steps of an MH algorithm with transition density $\theta_n^i \sim K_n(\theta_n | \hat{\theta}_n^i; \zeta_n)$ and stationary distribution $\pi_n(\theta)$ (see Algorithm 9 for details below). An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N h(\theta_n^i) \, W_n^i. \tag{5.8} $$

3. For $n = N_\phi$ ($\phi_{N_\phi} = 1$) the final importance sampling approximation of $\mathbb{E}_\pi[h(\theta)]$ is given by:
$$ \bar{h}_{N_\phi,N} = \frac{1}{N} \sum_{i=1}^N h(\theta_{N_\phi}^i) \, W_{N_\phi}^i. \tag{5.9} $$
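The following is a minimal Python sketch of Algorithm 8, under simplifying assumptions that we flag explicitly: a tempering schedule φ_n = (n/N_φ)^λ, resampling whenever the effective sample size falls below N/2, and a single RWMH-V mutation step scaled by c (the adaptive versions of these choices are the subject of Section 5.2.2). The functions log_prior, log_like, and draw_prior are user-supplied placeholders.

```python
import numpy as np

def smc(log_prior, log_like, draw_prior, n_part=1000, n_phi=100, lam=2.0,
        c=0.3, seed=0):
    rng = np.random.default_rng(seed)
    phi = (np.arange(n_phi + 1) / n_phi) ** lam        # tempering schedule
    theta = draw_prior(n_part)                         # stage 0: iid prior draws (N x d)
    loglike = np.array([log_like(th) for th in theta])
    logprior = np.array([log_prior(th) for th in theta])
    W = np.ones(n_part)                                # weights normalized to mean one
    log_mdd = 0.0                                      # accumulates the log of (5.12)
    for n in range(1, n_phi + 1):
        # Correction: incremental weights [p(Y|theta)]^(phi_n - phi_{n-1}), eq. (5.4)
        incr = (phi[n] - phi[n - 1]) * loglike
        m = incr.max()                                 # subtract max to avoid overflow
        w = np.exp(incr - m)
        log_mdd += m + np.log(np.mean(w * W))
        W = w * W / np.mean(w * W)
        # Selection: multinomial resampling when ESS falls below N/2
        if n_part / np.mean(W**2) < n_part / 2:
            idx = rng.choice(n_part, size=n_part, p=W / W.sum())
            theta, loglike, logprior = theta[idx], loglike[idx], logprior[idx]
            W = np.ones(n_part)
        # Mutation: one RWMH-V step with stationary distribution pi_n
        Sigma = np.atleast_2d(np.cov(theta, rowvar=False, aweights=W))
        Sigma += 1e-10 * np.eye(theta.shape[1])
        prop = theta + rng.standard_normal(theta.shape) @ np.linalg.cholesky(c**2 * Sigma).T
        ll_prop = np.array([log_like(th) for th in prop])
        lp_prop = np.array([log_prior(th) for th in prop])
        log_alpha = phi[n] * (ll_prop - loglike) + (lp_prop - logprior)
        accept = np.log(rng.uniform(size=n_part)) < log_alpha
        theta[accept] = prop[accept]
        loglike[accept], logprior[accept] = ll_prop[accept], lp_prop[accept]
    return theta, W, log_mdd
```

Posterior moments are then approximated as weighted averages of the final particle swarm, e.g. np.average(theta, axis=0, weights=W), and log_mdd approximates the log marginal data density discussed in Section 5.1.2 below.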
5.1.1 Three-Step Particle Propagation
A stylized representation of the propagation of the particles is depicted in Figure 5.1. Each
dot corresponds to a particle and its size indicates the weight. At stage n = 0, N = 21 draws
are generated from a U [−10, 10] distribution and each particle receives the weight W0i = 1.
At stage n = 1 the particles are reweighted during the correction step (the sizes of the dots are no longer uniform) and the particle values are modified during the mutation step (the locations of the dots shift). The bottom of the figure depicts the target posterior density.
As n and φn increase, the target distribution becomes more concentrated. The concentration
is mainly reflected in the increased weight of the particles with values between -3 and 3. The
figure is generated under the assumption that ρn equals one for n = 3 and zero otherwise.
Thus, in iteration n = 3 the resampling step is executed and the uneven particle weights are
equalized. While the selection step generates 21 particles with equal weights, there are only
six distinct particle values, four of which have multiple copies. The subsequent mutation
step restores the diversity in particle values.
Figure 5.1: SMC: Evolution of Particles

[Figure omitted: particle swarms at stages φ0 through φ3, separated by C (Correction), S (Selection), and M (Mutation) steps; particle values range from −10 to 10.]

Notes: The vertical location of each dot represents the particle value and the diameter of the dot represents its weight. The densities at the bottom represent the tempered target posterior πn(·). C is Correction; S is Selection; and M is Mutation.
5.1.2 A Closer Look at the Algorithm
Algorithm 8 is initialized for n = 0 by generating iid draws from the prior distribution.
This initialization works well as long as the prior is sufficiently diffuse to assign non-trivial
probability mass to the area of the parameter space in which the likelihood function peaks.
There do exist papers in the DSGE model estimation literature in which the posterior mean
of some parameters is several prior standard deviations away from the prior mean. For such
applications it might be necessary to choose φ0 > 0 and to use an initial distribution that is
also informed by the tempered likelihood function [p(Y |θ)]φ0 . If the particles are initialized
based on a more general distribution with density g(θ), then for n = 1 the incremental
weights have to be corrected by the ratio p(θ)/g(θ).
The correction step reweights the stage $n-1$ particles to generate an importance sampling approximation of $\pi_n$. Because the parameter value $\theta^i$ does not change in this step, no further evaluation of the likelihood function is required. The likelihood value $p(Y|\theta_{n-1}^i)$ was computed as a by-product of the mutation step in iteration $n-1$. As discussed in Chapter 3.4,
the accuracy of the importance sampling approximation depends on the distribution of the particle weights $\tilde{W}_n^i$. The more uniformly the weights are distributed, the more accurate the approximation. If likelihood tempering is replaced by data tempering, then the incremental weights $\tilde{w}_n^i$ in (5.4) have to be defined as
$$ \tilde{w}_n^{i(D)} = p\big(Y_{(\lfloor \phi_{n-1} T \rfloor + 1):\lfloor \phi_n T \rfloor} \,\big|\, Y_{1:\lfloor \phi_{n-1} T \rfloor}, \theta_{n-1}^i\big). \tag{5.10} $$
The correction steps deliver a numerical approximation of the marginal data density as
a by-product. Using arguments that we will present in more detail in Section 5.2.4 below,
it can be verified that the unnormalized particle weights converge under suitable regularity
conditions as follows:
$$ \frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i \stackrel{a.s.}{\longrightarrow} \int [p(Y|\theta)]^{\phi_n - \phi_{n-1}} \, \frac{[p(Y|\theta)]^{\phi_{n-1}} \, p(\theta)}{\int [p(Y|\theta)]^{\phi_{n-1}} \, p(\theta) \, d\theta} \, d\theta = \frac{\int [p(Y|\theta)]^{\phi_n} \, p(\theta) \, d\theta}{\int [p(Y|\theta)]^{\phi_{n-1}} \, p(\theta) \, d\theta}. \tag{5.11} $$
Thus, the data density approximation is given by
$$ \hat{p}_{SMC}(Y) = \prod_{n=1}^{N_\phi} \left( \frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i \right). \tag{5.12} $$
Computing this approximation does not require any additional likelihood evaluations.
The selection step is designed to equalize the particle weights if their distribution becomes very uneven. In Algorithm 8 the particles are resampled whenever the indicator ρn is equal
to one. On the one hand, resampling introduces noise in the Monte Carlo approximation,
which makes it undesirable. On the other hand, resampling equalizes the particle weights
and therefore increases the accuracy of the correction step in the subsequent iteration. In
Algorithm 8 we use multinomial resampling. Alternative resampling algorithms are discussed
in Section 5.2.3 below.
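For concreteness, a minimal implementation of the multinomial resampler used above, together with a systematic variant of the kind compared in Section 5.2.3 (our labels; that section is not reproduced in this excerpt):

```python
import numpy as np

def multinomial_resample(theta, W, rng):
    # N iid draws from the support points {theta^i} with probabilities W^i / sum(W)
    N = len(W)
    idx = rng.choice(N, size=N, p=W / W.sum())
    return theta[idx], np.ones(N)      # resampled particles carry equal weight

def systematic_resample(theta, W, rng):
    # A single uniform draw placed on a stratified grid; this typically adds
    # less noise than multinomial resampling for the same particle system.
    N = len(W)
    u = (rng.uniform() + np.arange(N)) / N
    idx = np.searchsorted(np.cumsum(W / W.sum()), u)
    return theta[idx], np.ones(N)
```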
The mutation step changes the values of the particles from $\theta_{n-1}^i$ to $\theta_n^i$. To understand the importance of the mutation step, consider what would happen without this step. For simplicity, suppose also that $\rho_n = 0$ for all $n$. In this case the particle values would never change, that is, $\theta_n^i = \theta_1^i$ for all $n$. Thus, we would be using the prior as the importance sampling distribution and reweighting the draws from the prior by the tempered likelihood function $[p(Y|\theta_1^i)]^{\phi_n}$. Given the information content of a typical DSGE model likelihood function, this procedure would lead to a degenerate distribution of particles, in which in the last stage $N_\phi$ the weight is concentrated on a very small number of particles and the importance sampling approximation is very inaccurate. Thus, the goal of the mutation step is to adapt the values of the stage $n$ particles to $\pi_n(\theta)$. This is achieved by using steps of an MH algorithm with a transition density that satisfies the invariance property
$$ \int K_n(\theta_n | \hat{\theta}_n^i) \, \pi_n(\hat{\theta}_n^i) \, d\hat{\theta}_n^i = \pi_n(\theta_n). $$
The execution of the MH steps during the particle mutation phase requires at least one, but possibly multiple, evaluations of the likelihood function for each particle $i$. To the extent that the likelihood function is recursively evaluated with a filter, data tempering has a computational advantage over likelihood tempering, because the former only requires $\lfloor \phi_n T \rfloor \leq T$ iterations of the filter, whereas the latter requires $T$ iterations. The particle mutation is ideally suited for parallelization, because the MH steps are independent across particles and do not require any communication across processors. For DSGE models, the evaluation of the likelihood function is computationally very costly because it requires running a model solution procedure as well as a filtering algorithm. Thus, gains from parallelization are potentially quite large.
5.1.3 A Numerical Illustration
We provide a first numerical illustration of the SMC algorithm in the context of the stylized
state-space model introduced in Chapter 4.3. Recall that the model is given by
$$ y_t = [1 \;\; 1] \, s_t, \qquad s_t = \begin{bmatrix} \theta_1^2 & 0 \\ (1 - \theta_1^2) - \theta_1 \theta_2 & (1 - \theta_1^2) \end{bmatrix} s_{t-1} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \epsilon_t, \qquad \epsilon_t \sim iid \, N(0, 1). $$
We simulate $T = 200$ observations given $\theta = [0.45, 0.45]'$, which is observationally equivalent to $\theta = [0.89, 0.22]'$, and use a prior distribution that is uniform on the square $0 \leq \theta_1 \leq 1$
and 0 ≤ θ2 ≤ 1. Because the state-space model has only two parameters and the model used
for posterior inference is correctly specified, the SMC algorithm works extremely well. It is
configured as follows. We use N = 1,024 particles, Nφ = 50 stages, and a linear tempering
schedule that sets φn = n/50. The transition kernel for the mutation step is generated by a
single step of the RWMH-V algorithm.
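A short simulation of the stylized model, using the transition matrices exactly as written in the display above, makes the observational equivalence easy to verify numerically:

```python
import numpy as np

def simulate_stylized(theta1, theta2, T=200, seed=0):
    rng = np.random.default_rng(seed)
    Phi1 = np.array([[theta1**2,                         0.0          ],
                     [(1 - theta1**2) - theta1 * theta2, 1 - theta1**2]])
    Phi_eps = np.array([1.0, 0.0])
    s = np.zeros(2)
    y = np.empty(T)
    for t in range(T):
        s = Phi1 @ s + Phi_eps * rng.standard_normal()  # state transition
        y[t] = s.sum()                                  # y_t = [1 1] s_t
    return y

y = simulate_stylized(0.45, 0.45)   # same law of motion as theta = [0.89, 0.22]
```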
Some of the output of the SMC algorithm is depicted in Figure 5.2. The left panel displays
the sequence of tempered (marginal) posterior distributions πn (θ1 ). It clearly shows that
the tempering dampens the posterior density. While the posterior is still unimodal during
Figure 5.2: SMC Posterior Approximations for Stylized State-Space Model

[Figure omitted: the left panel plots the tempered marginal posteriors πn(θ1) across stages n = 0, . . . , 50; the right panel plots posterior contours and draws in the (θ1, θ2) plane over [0, 1]².]

Notes: The left panel depicts kernel density estimates of the sequence πn(θ) for n = 0, . . . , 50. The right panel depicts the contours of the posterior π(θ) as well as draws from π(θ).
the first few stages of the algorithm, a clear bimodal shape has emerged by n = 10. As φn approaches one, the bimodality of the posterior becomes more pronounced. The left panel also suggests that the stage n = Nφ − 1 tempered posterior provides a much better importance sampling distribution for the overall posterior π(·) than the stage n = 1 (prior)
distribution. The right panel shows a contour plot of the joint posterior density of θ1 and θ2 as
well as the draws from the posterior π(θ) = πNφ (θ) obtained in the last stage of Algorithm 8.
The algorithm successfully generates draws from the two high-posterior-density regions.
5.2 Further Details of the SMC Algorithm
Our initial description of the SMC algorithm left out some details that are important for
the successful implementation of the algorithm. Section 5.2.1 discusses the choice of the
transition kernel in the mutation step. Section 5.2.2 considers the adaptive choice of various tuning parameters of the algorithm. Section 5.2.4 outlines some convergence results for Monte Carlo approximations constructed from the output of the SMC sampler. Finally, the resampling step is discussed in more detail in Section 5.2.3.
Chapter 8
Particle Filters
The key difficulty that arises when the Bayesian estimation of DSGE models is extended
from linear to nonlinear models is the evaluation of the likelihood function. Throughout this
book, we will focus on the use of particle filters to accomplish this task. Our starting point
is a state-space representation for the nonlinear DSGE model of the form
$$ y_t = \Psi(s_t, t; \theta) + u_t, \qquad u_t \sim F_u(\cdot\,; \theta), $$
$$ s_t = \Phi(s_{t-1}, \epsilon_t; \theta), \qquad \epsilon_t \sim F_\epsilon(\cdot\,; \theta). \tag{8.1} $$
As discussed in the previous chapter, the functions $\Psi(s_t, t; \theta)$ and $\Phi(s_{t-1}, \epsilon_t; \theta)$ are generated numerically by the solution method. For reasons that will become apparent below, we require
that the measurement error ut in the measurement equation is additively separable and that
the probability density function p(ut |θ) can be evaluated analytically. In many applications,
ut ∼ N (0, Σu ). While the exposition of the algorithms in this chapter focuses on the nonlinear
state-space model (8.1), the numerical illustrations and empirical applications are based on
the linear Gaussian model
$$ y_t = \Psi_0(\theta) + \Psi_1(\theta) \, t + \Psi_2(\theta) \, s_t + u_t, \qquad u_t \sim N(0, \Sigma_u), $$
$$ s_t = \Phi_1(\theta) \, s_{t-1} + \Phi_\epsilon(\theta) \, \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma_\epsilon), \tag{8.2} $$
obtained by solving a log-linearized DSGE model. For model (8.2) the Kalman filter described in Table 2.1 delivers the exact distributions p(yt |Y1:t−1 , θ) and p(st |Y1:t , θ) against
which the accuracy of the particle filter approximation can be evaluated.
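Table 2.1 is not reproduced in this excerpt, so for reference we include a minimal Kalman filter that returns the exact log likelihood of model (8.2). As simplifying assumptions, the sketch omits the deterministic trend term Ψ1(θ)t and initializes the filter at the unconditional distribution of the state.

```python
import numpy as np

def kalman_loglik(Y, Psi0, Psi2, Phi1, Phi_eps, Sigma_u, Sigma_eps):
    ns = Phi1.shape[0]
    Q = Phi_eps @ Sigma_eps @ Phi_eps.T
    # unconditional state covariance: solve P = Phi1 P Phi1' + Q
    P = np.linalg.solve(np.eye(ns**2) - np.kron(Phi1, Phi1), Q.ravel()).reshape(ns, ns)
    s = np.zeros(ns)
    ll = 0.0
    for y in Y:                                          # Y is T x n
        s, P = Phi1 @ s, Phi1 @ P @ Phi1.T + Q           # prediction
        nu = y - (Psi0 + Psi2 @ s)                       # forecast error
        F = Psi2 @ P @ Psi2.T + Sigma_u                  # its covariance, F_{t|t-1}
        ll += -0.5 * (len(y) * np.log(2 * np.pi)
                      + np.linalg.slogdet(F)[1] + nu @ np.linalg.solve(F, nu))
        K = P @ Psi2.T @ np.linalg.inv(F)                # Kalman gain
        s, P = s + K @ nu, P - K @ Psi2 @ P              # updating
    return ll
```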
There exists a large literature on particle filters. Surveys and tutorials can be found, for
instance, in Arulampalam, Maskell, Gordon, and Clapp (2002), Cappé, Godsill, and Moulines
(2007), Doucet and Johansen (2011), Creal (2012). Kantas, Doucet, Singh, Maciejowski, and
Chopin (2014) discuss using particle filters in the context of estimating the parameters of
a state space models. These papers provide detailed references to the literature. The basic
bootstrap particle filtering algorithm is remarkably straightforward, but may perform quite
poorly in practice. Thus, much of the literature focuses on refinements of the bootstrap
filter that increases the efficiency of the algorithm, see, for instance, Doucet, de Freitas, and
Gordon (2001). Textbook treatments of the statistical theory underlying particle filters can
be found in Cappé, Moulines, and Ryden (2005), Liu (2001), and Del Moral (2013).
The remainder of this chapter is organized as follows. We introduce the bootstrap particle
filter in Section 8.1. This filter is easy to implement and has been widely used in DSGE
model applications. The bootstrap filter is a special case of a sequential importance sampling
with resampling (SISR) algorithm. This more general algorithm is presented in Section 8.2.
An important step in the generic particle filter algorithm is to generate draws that reflect
the distribution of the states in period t conditional on the information Y1:t . These draws are
generated through an importance sampling step in which states are drawn from a proposal
distribution and reweighted. For the bootstrap particle filter, this proposal distribution is
based on the state transition equation. Unfortunately, the forward iteration of the state
transition equation might produce draws that are associated with highly variable weights,
which in turn leads to imprecise Monte Carlo approximations.
The accuracy of the particle filter can be improved by choosing other proposal distributions.
We discuss in Section 8.3 how to construct more efficient proposal distributions. While the tailoring (or adaptation) of the proposal distributions tends to require additional computations,
the number of particles can often be reduced drastically, which leads to an improvement in
efficiency. DSGE model-specific implementation issues of the particle filter are examined in
Section 8.4. Finally, we present the auxiliary particle filter and a filter recently proposed
by DeJong, Liesenfeld, Moura, Richard, and Dharmarajan (2013) in Section 8.5. Various
versions of the particle filter are applied to the small-scale New Keynesian DSGE model
and the SW model in Sections 8.6 and 8.7. We close this chapter with some computational
considerations in Section 8.8. Throughout this chapter we condition on a fixed vector of
parameter values θ and defer the topic of parameter estimation to Chapters 9 and 10.
8.1 The Bootstrap Particle Filter
We begin with a version of the particle filter in which the particles representing the hidden
state vector st are propagated by iterating the state-transition equation in (8.1) forward.
This version of the particle filter is due to Gordon, Salmond, and Smith (1993) and is called the bootstrap particle filter. As in Algorithm 8, we use the sequence $\{\rho_t\}_{t=1}^T$ to indicate whether
the particles are resampled in period t. A resampling step is necessary to prevent the
distribution of the particle weights from degenerating. We discuss the adaptive choice of ρt
below. The function h(·) is used to denote transformations of interest of the state vector st .
The particle filter algorithm closely follows the steps of the generic filter in Algorithm 1.
Algorithm 11 (Bootstrap Particle Filter)

1. Initialization. Draw the initial particles from the distribution $s_0^j \stackrel{iid}{\sim} p(s_0)$ and set $W_0^j = 1$, $j = 1, \ldots, M$.

2. Recursion. For $t = 1, \ldots, T$:

(a) Forecasting $s_t$. Propagate the period $t-1$ particles $\{s_{t-1}^j, W_{t-1}^j\}$ by iterating the state-transition equation forward:
$$ \tilde{s}_t^j = \Phi(s_{t-1}^j, \epsilon_t^j; \theta), \qquad \epsilon_t^j \sim F_\epsilon(\cdot\,; \theta). \tag{8.3} $$
An approximation of $\mathbb{E}[h(s_t)|Y_{1:t-1}, \theta]$ is given by
$$ \hat{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(\tilde{s}_t^j) \, W_{t-1}^j. \tag{8.4} $$

(b) Forecasting $y_t$. Define the incremental weights
$$ \tilde{w}_t^j = p(y_t | \tilde{s}_t^j, \theta). \tag{8.5} $$
The predictive density $p(y_t|Y_{1:t-1}, \theta)$ can be approximated by
$$ \hat{p}(y_t|Y_{1:t-1}, \theta) = \frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j. \tag{8.6} $$
If the measurement errors are $N(0, \Sigma_u)$ then the incremental weights take the form
$$ \tilde{w}_t^j = (2\pi)^{-n/2} |\Sigma_u|^{-1/2} \exp \left\{ -\frac{1}{2} \big( y_t - \Psi(\tilde{s}_t^j, t; \theta) \big)' \Sigma_u^{-1} \big( y_t - \Psi(\tilde{s}_t^j, t; \theta) \big) \right\}, \tag{8.7} $$
where $n$ here denotes the dimension of $y_t$.

(c) Updating. Define the normalized weights
$$ \tilde{W}_t^j = \frac{\tilde{w}_t^j \, W_{t-1}^j}{\frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j}. \tag{8.8} $$
An approximation of $\mathbb{E}[h(s_t)|Y_{1:t}, \theta]$ is given by
$$ \tilde{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(\tilde{s}_t^j) \, \tilde{W}_t^j. \tag{8.9} $$

(d) Selection. Case (i): If $\rho_t = 1$, resample the particles via multinomial resampling. Let $\{s_t^j\}_{j=1}^M$ denote $M$ iid draws from a multinomial distribution characterized by support points and weights $\{\tilde{s}_t^j, \tilde{W}_t^j\}$ and set $W_t^j = 1$ for $j = 1, \ldots, M$. Case (ii): If $\rho_t = 0$, let $s_t^j = \tilde{s}_t^j$ and $W_t^j = \tilde{W}_t^j$ for $j = 1, \ldots, M$. An approximation of $\mathbb{E}[h(s_t)|Y_{1:t}, \theta]$ is given by
$$ \bar{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(s_t^j) \, W_t^j. \tag{8.10} $$

3. Likelihood Approximation. The approximation of the log likelihood function is given by
$$ \ln \hat{p}(Y_{1:T}|\theta) = \sum_{t=1}^T \ln \left( \frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j \right). \tag{8.11} $$
As for the SMC Algorithm 8, we can define an effective sample size (in terms of the number of particles) as
$$ \widehat{ESS}_t = M \bigg/ \left( \frac{1}{M} \sum_{j=1}^M (\tilde{W}_t^j)^2 \right) \tag{8.12} $$
and replace the deterministic sequence $\{\rho_t\}_{t=1}^T$ by an adaptively chosen sequence $\{\hat{\rho}_t\}_{t=1}^T$, for which $\hat{\rho}_t = 1$ whenever $\widehat{ESS}_t$ falls below a threshold, say $M/2$. In the remainder of this section we discuss the asymptotic properties of the particle filter, letting the number of particles $M \longrightarrow \infty$, and the role of the measurement errors $u_t$ in the bootstrap particle filter.
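To fix ideas, here is a minimal Python sketch of Algorithm 11 with the adaptive resampling rule just described. The functions Phi, Psi, draw_s0, and draw_eps are placeholders for a (vectorized) state transition, measurement function, initial-state sampler, and shock sampler, and measurement errors are assumed N(0, Σu) so that the incremental weights follow (8.7).

```python
import numpy as np

def bootstrap_pf(Y, Phi, Psi, Sigma_u, draw_s0, draw_eps, M=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    Sigma_u_inv = np.linalg.inv(Sigma_u)
    log_norm = -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma_u)[1])
    s = draw_s0(M)                                   # M x ns initial particles
    W = np.ones(M)                                   # weights normalized to mean one
    ll = 0.0
    for t, y in enumerate(Y):
        s = Phi(s, draw_eps(M))                      # (a) forecast s_t, row-wise
        nu = y - Psi(s, t)                           # M x n forecast errors
        logw = log_norm - 0.5 * np.einsum('ij,jk,ik->i', nu, Sigma_u_inv, nu)
        m = logw.max()
        w = np.exp(logw - m)                         # (b) incremental weights, eq. (8.7)
        ll += m + np.log(np.mean(w * W))             # adds ln p_hat(y_t | Y_{1:t-1})
        W = w * W / np.mean(w * W)                   # (c) updating
        if M / np.mean(W**2) < M / 2:                # (d) resample if ESS_t < M/2
            idx = rng.choice(M, size=M, p=W / W.sum())
            s, W = s[idx], np.ones(M)
    return ll                                        # ln p_hat(Y_{1:T} | theta), eq. (8.11)
```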
8.1.1 Asymptotic Properties
The convergence theory underlying the particle filter is similar to the theory sketched in
Chapter 5.2.4 for the SMC sampler. As in our presentation of the SMC sampler, in the subsequent exposition we will abstract from many of the technical details that underlie the convergence theory for the SMC sampler. Our exposition will be mainly heuristic, meaning that we will present some basic convergence results and the key steps for their derivation. Rigorous derivations are provided in Chopin (2004). As in Chapter 5.2.4 the asymptotic
variance formulas are represented in recursive form. While this renders them unusable for
the computation of numerical standard errors, they do provide some qualitative insights into
the accuracy of the Monte Carlo approximations generated by the particle filter.
To simplify the notation, we drop the parameter vector $\theta$ from the conditioning set. The starting point is the assumption that Monte Carlo averages constructed from the $t-1$ particle swarm $\{s_{t-1}^j, W_{t-1}^j\}_{j=1}^M$ satisfy an SLLN and a CLT of the form
$$ \bar{h}_{t-1,M} \stackrel{a.s.}{\longrightarrow} \mathbb{E}[h(s_{t-1})|Y_{1:t-1}], \qquad \sqrt{M} \big( \bar{h}_{t-1,M} - \mathbb{E}[h(s_{t-1})|Y_{1:t-1}] \big) \Longrightarrow N\big(0, \Omega_{t-1}(h)\big). \tag{8.13} $$
Based on this assumption, we will show the convergence of ĥt,M , p̂(yt |Y1:t−1 ), h̃t,M , and h̄t,M .
We write Ωt−1 (h) to indicate that the asymptotic covariance matrix depends on the function
h(st ) for which the expected value is computed. The filter is typically initialized by directly
sampling from the initial distribution p(s0 ), which immediately delivers the SLLN and CLT
provided the required moments of h(s0 ) exist. We now sketch the convergence arguments
for the Monte Carlo approximations in Steps 2(a) to 2(d) of Algorithm 11. A rigorous proof
would involve verifying the existence of moments required by the SLLN and CLT and a
careful characterization of the asymptotic covariance matrices.
Forecasting Steps. The forward iteration of the state-transition equation amounts to drawing s_t from a conditional density g_t(s_t|s_{t−1}^j). In Algorithm 11 this density is given by g_t(s_t|s_{t−1}^j) = p(s_t|s_{t−1}^j). We denote expectations under this density as E_{p(·|s_{t−1}^j)}[h] and decompose

ĥ_{t,M} − E[h(s_t)|Y_{1:t−1}] = (1/M) ∑_{j=1}^M ( h(s̃_t^j) − E_{p(·|s_{t−1}^j)}[h] ) W_{t−1}^j
    + ( (1/M) ∑_{j=1}^M E_{p(·|s_{t−1}^j)}[h] W_{t−1}^j − E[h(s_t)|Y_{1:t−1}] )   (8.14)
    = I + II,
say. This decomposition is similar to the decomposition (5.31) used in the analysis of the
mutation step of the SMC algorithm.
Conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M the weights W_{t−1}^j are known and the summands in term I form a triangular array of mean-zero random variables that within each row are independently but not identically distributed. Provided the required moment bounds for h(s̃_t^j) W_{t−1}^j are satisfied, term I converges to zero almost surely and satisfies a CLT. Term II also converges to zero because

(1/M) ∑_{j=1}^M E_{p(·|s_{t−1}^j)}[h] W_{t−1}^j → E[ E_{p(·|s_{t−1})}[h] | Y_{1:t−1} ]  a.s.
    = ∫ ( ∫ h(s_t) p(s_t|s_{t−1}) ds_t ) p(s_{t−1}|Y_{1:t−1}) ds_{t−1}   (8.15)
    = E[h(s_t)|Y_{1:t−1}].
Thus, under suitable regularity conditions,

ĥ_{t,M} → E[h(s_t)|Y_{1:t−1}]  a.s.,   √M ( ĥ_{t,M} − E[h(s_t)|Y_{1:t−1}] ) ⇒ N(0, Ω̂_t(h)).   (8.16)
The asymptotic covariance matrix Ω̂_t(h) is given by the sum of the asymptotic variances of the terms √M · I and √M · II in (8.14). The convergence of the predictive density approximation p̂(y_t|Y_{1:t−1}) to p(y_t|Y_{1:t−1}) in Step 2(b) follows directly from (8.16) by setting h(s_t) = p(y_t|s_t).
Updating and Selection Steps. The goal of the updating step is to approximate posterior expectations of the form

E[h(s_t)|Y_{1:t}] = ∫ h(s_t) p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t / ∫ p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t
    ≈ ( (1/M) ∑_{j=1}^M h(s̃_t^j) w̃_t^j W_{t−1}^j ) / ( (1/M) ∑_{j=1}^M w̃_t^j W_{t−1}^j ) = h̃_{t,M}.   (8.17)
The Monte Carlo approximation of E[h(s_t)|Y_{1:t}] has the same form as the Monte Carlo approximation of h̃_{n,M} in (5.24) in the correction step of the SMC Algorithm 8 and its convergence can be analyzed in a similar manner. Define the normalized incremental weights as

v_t(s_t) = p(y_t|s_t) / ∫ p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t.   (8.18)

Then, under suitable regularity conditions, the Monte Carlo approximation satisfies a CLT of the form

√M ( h̃_{t,M} − E[h(s_t)|Y_{1:t}] ) ⇒ N(0, Ω̃_t(h)),   Ω̃_t(h) = Ω̂_t( v_t(s_t)(h(s_t) − E[h(s_t)|Y_{1:t}]) ).   (8.19)
The expression for the asymptotic covariance matrix Ω̃t (h) highlights that the accuracy
depends on the distribution of the incremental weights. Roughly, the larger the variance of
the particle weights, the less accurate the approximation.
Finally, the selection step in Algorithm 11 is identical to the selection step in Algorithm 8 and it adds some additional noise to the approximation. If ρ_t = 1, then under multinomial resampling

√M ( h̄_{t,M} − E[h(s_t)|Y_{1:t}] ) ⇒ N(0, Ω_t(h)),   Ω_t(h) = Ω̃_t(h) + V[h(s_t)|Y_{1:t}].   (8.20)
As discussed in Chapter 5.2.3, the variance can be reduced by replacing the multinomial
resampling with a more efficient resampling scheme.
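A popular lower-variance alternative is systematic resampling. The sketch below is a standard textbook implementation, shown for illustration only: a single uniform draw places an evenly spaced grid on the unit interval, which preserves the mean of the weights while introducing less noise than M independent multinomial draws.

    import numpy as np

    def systematic_resample(weights, rng):
        # One uniform draw induces an evenly spaced grid of M points on [0, 1).
        M = len(weights)
        u = (rng.uniform() + np.arange(M)) / M
        cdf = np.cumsum(weights) / np.sum(weights)
        return np.searchsorted(cdf, u)   # ancestor indices

    rng = np.random.default_rng(1)
    W = rng.uniform(size=500)
    idx = systematic_resample(W, rng)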
8.1.2 The Role of Measurement Errors
Many DSGE models, e.g., the ones considered in this book, do not assume that the observables y_t are measured with error. Instead, the number of structural shocks is chosen to be equal to the number of observables, which means that the likelihood function p(Y_{1:T}|θ) is nondegenerate. It is apparent from the formulas in Table 2.1 that the Kalman filter iterations are well defined even if the measurement error covariance matrix Σ_u in the linear Gaussian state-space model (8.2) is equal to zero, provided that the number of shocks ε_t is not smaller than the number of observables and the forecast error covariance matrix F_{t|t−1} is invertible.
For the bootstrap particle filter the case of Σ_u = 0 presents a serious problem. The incremental weights w̃_t^j in (8.7) are degenerate if Σ_u = 0 because the conditional distribution of y_t|(s_t, θ) is a pointmass. For a particle j, this point mass is located at y_t^j = Ψ(s̃_t^j, t; θ). If the innovation ε_t^j is drawn from a continuous distribution in the forecasting step and the state-transition equation Φ(s_{t−1}, ε_t; θ) is a smooth function of the lagged state and the innovation ε_t, then the probability that y_t^j = y_t is zero, which means that w̃_t^j = 0 for all j and the particles vanish after one iteration. The intuition for this result is straightforward. The incremental weights are large for particles j for which y_t^j = Ψ(s̃_t^j, t; θ) is close to the actual y_t. Under Gaussian measurement errors, the metric for closeness is given by Σ_u^{−1}. Thus, all else equal, decreasing the measurement error variance Σ_u increases the discrepancy between y_t^j and y_t and therefore the variance of the particle weights.
Consider the following stylized example (we are omitting the j superscripts). Suppose that y_t is scalar, the measurement errors are distributed according to u_t ∼ N(0, σ_u²), W_{t−1} = 1, and let δ = y_t − Ψ(s_t, t; θ). Suppose that in population δ is distributed according to a N(0, 1). In this case v_t(s_t) in (8.18) can be viewed as a population approximation of the normalized weights W̃_t constructed in the updating step (note that the denominator of these two objects is slightly different):

W̃_t(δ) ≈ v_t(δ) = exp{ −δ²/(2σ_u²) } / ∫ (2π)^{−1/2} exp{ −(1/2)(1 + 1/σ_u²) δ² } dδ = (1 + 1/σ_u²)^{1/2} exp{ −δ²/(2σ_u²) }.
The asymptotic covariance matrix Ω̃_t(h) in (8.19), which captures the accuracy of h̃_{t,M}, as well as the heuristic effective sample size measure defined in (8.12), depend on the variance of the particle weights, which in population (integrating against the N(0, 1) density of δ, denoted φ(δ)) is given by

∫ v_t²(δ) φ(δ) dδ = (1 + 1/σ_u²) / √(1 + 2/σ_u²) = (1/σ_u) (1 + σ_u²) / √(2 + σ_u²) → ∞ as σ_u → 0.

Thus, a decrease in the measurement error variance raises the variance of the particle weights and thereby decreases the effective sample size. More importantly, the increasing dispersion of the weights translates into an increase in the limit covariance matrix Ω̃_t(h) and a deterioration of the Monte Carlo approximations generated by the particle filter. In sum, all else equal, the smaller the measurement error variance, the less accurate the particle filter.
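A few lines of code suffice to trace out this formula numerically; the grid of values for σ_u below is arbitrary and purely illustrative.

    import numpy as np

    for sig_u in [1.0, 0.5, 0.1, 0.01]:
        # Population second moment of the normalized weights derived above.
        m2 = (1 + 1 / sig_u**2) / np.sqrt(1 + 2 / sig_u**2)
        # Heuristic ESS share implied by (8.12): ESS/M is roughly 1/E[v_t^2].
        print(f"sigma_u = {sig_u:5.2f}   E[v_t^2] = {m2:9.2f}   ESS/M = {1/m2:7.4f}")

As σ_u shrinks from 1 to 0.01, the second moment of the weights rises from about 1.2 to about 70, so the effective sample size collapses to a small fraction of M.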
8.2 A Generic Particle Filter
In the basic version of the particle filter the time t particles were generated by simulating
the state transition equation forward. However, the naive forward simulation ignores information contained in the current observation yt and may lead to a very uneven distribution
of particle weights, in particular if the measurement error variance is small or if the model
has difficulties explaining the period t observation in the sense that for most particles s̃jt the
actual observation yt lies far in the tails of the model-implied distribution of yt |(s̃jt , θ). The
particle filter can be generalized by allowing s̃jt in the forecasting step to be drawn from a
generic importance sampling density gt (·|sjt−1 , θ), which leads to the following algorithm:
Algorithm 12 (Generic Particle Filter)
1. Initialization. (Same as Algorithm 11)
2. Recursion. For t = 1, . . . , T :
(a) Forecasting s_t. Draw s̃_t^j from density g_t(s̃_t|s_{t−1}^j, θ) and define the importance weights

ω_t^j = p(s̃_t^j|s_{t−1}^j, θ) / g_t(s̃_t^j|s_{t−1}^j, θ).   (8.21)

An approximation of E[h(s_t)|Y_{1:t−1}, θ] is given by

ĥ_{t,M} = (1/M) ∑_{j=1}^M h(s̃_t^j) ω_t^j W_{t−1}^j.   (8.22)

(b) Forecasting y_t. Define the incremental weights

w̃_t^j = p(y_t|s̃_t^j, θ) ω_t^j.   (8.23)

The predictive density p(y_t|Y_{1:t−1}, θ) can be approximated by

p̂(y_t|Y_{1:t−1}, θ) = (1/M) ∑_{j=1}^M w̃_t^j W_{t−1}^j.   (8.24)
(c) Updating. (Same as Algorithm 11)
(d) Selection. (Same as Algorithm 11)
3. Likelihood Approximation. (Same as Algorithm 11).
The only difference between Algorithms 11 and 12 is the introduction of the importance
weights ωtj which appear in (8.22) as well as the definition of the incremental weights w̃tj
in (8.23). The main goal of replacing the forward iteration of the state-transition equation
by an importance sampling step is to improve the accuracy of p̂(yt |Y1:t−1 , θ) in Step 2(b) and
h̃t,M in Step 2(c).
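The following fragment sketches the modified forecasting step for a stylized scalar Gaussian model. The importance density g_t used here, a Gaussian centered between the transition mean and the observation, is a crude and purely illustrative form of adaption, not a recommendation; all densities and parameter values are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    M, phi, sig_eps, sig_u, y = 1000, 0.9, 1.0, 0.1, 0.8
    s_prev = rng.standard_normal(M)

    def npdf(x, mean, std):
        # Gaussian density, used for p(s_t|s_{t-1}), g_t, and p(y_t|s_t).
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)

    # Draw from the importance density g_t instead of the state transition.
    g_mean, g_std = 0.5 * (phi * s_prev + y), 0.7
    s_tilde = g_mean + g_std * rng.standard_normal(M)

    omega = npdf(s_tilde, phi * s_prev, sig_eps) / npdf(s_tilde, g_mean, g_std)  # (8.21)
    w_tilde = npdf(y, s_tilde, sig_u) * omega                                    # (8.23)
    print(np.mean(w_tilde))  # p-hat(y_t|Y_{1:t-1}) with W_{t-1}^j = 1, cf. (8.24)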
We subsequently focus on the analysis of p̂(y_t|Y_{1:t−1}, θ). Emphasizing the dependence of ω_t^j on both s̃_t^j and s_{t−1}^j, write

p̂(y_t|Y_{1:t−1}) − p(y_t|Y_{1:t−1})
    = (1/M) ∑_{j=1}^M ( p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) − E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] ) W_{t−1}^j
    + (1/M) ∑_{j=1}^M ( E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] − p(y_t|Y_{1:t−1}) ) W_{t−1}^j   (8.25)
    = I + II,
say. Consider term II. First, notice that

(1/M) ∑_{j=1}^M E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] W_{t−1}^j
    → E[ E_{g_t(·|s_{t−1})}[ p(y_t|s̃_t) ω_t(s̃_t, s_{t−1}) ] ]  a.s.
    = ∫ ( ∫ p(y_t|s̃_t) ( p(s̃_t|s_{t−1}) / g_t(s̃_t|s_{t−1}) ) g_t(s̃_t|s_{t−1}) ds̃_t ) p(s_{t−1}|Y_{1:t−1}) ds_{t−1}   (8.26)
    = p(y_t|Y_{1:t−1}),
which implies that term II converges to zero almost surely and ensures the consistency of the Monte Carlo approximation. Second, because

E_{g_t(·|s_{t−1})}[ p(y_t|s̃_t^j) ω_t(s̃_t^j, s_{t−1}) ] = ∫ p(y_t|s̃_t^j) ( p(s̃_t^j|s_{t−1}) / g_t(s̃_t^j|s_{t−1}) ) g_t(s̃_t^j|s_{t−1}) ds̃_t^j = p(y_t|s_{t−1}),   (8.27)
the variance of term II is independent of the choice of the importance density gt (s̃jt |sjt−1 ).
Now consider term I. Conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M the weights W_{t−1}^j are known and the summands form a triangular array of mean-zero random variables that are independently distributed within each row. Recall that

p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) = p(y_t|s̃_t^j) p(s̃_t^j|s_{t−1}^j) / g_t(s̃_t^j|s_{t−1}^j).   (8.28)
Choosing a suitable importance density g_t(s̃_t^j|s_{t−1}^j) that is a function of the time t observation y_t can drastically reduce the variance of term I conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M. Such filters are called adapted particle filters. In turn, this leads to a variance reduction of the Monte Carlo approximation of p(y_t|Y_{1:t−1}). A similar argument can be applied to the variance of h̃_{t,M}. The bootstrap particle filter simply sets g_t(s̃_t^j|s_{t−1}^j) = p(s̃_t^j|s_{t−1}^j) and ignores the information in y_t. We will subsequently discuss more efficient choices of g_t(s̃_t^j|s_{t−1}^j).
8.3 Adapting the Generic Filter
There exists a large literature on the implementation and the improvement of the particle filters in Algorithms 11 and 12. Detailed references to this literature are provided, for instance, in Doucet, de Freitas, and Gordon (2001), Cappé, Godsill, and Moulines (2007), Doucet and Johansen (2011), and Creal (2012). We will focus in this section on the choice of the proposal density g_t(s̃_t^j|s_{t−1}^j). The starting point is the conditionally-optimal importance distribution. In nonlinear DSGE models it is typically infeasible to generate draws from this distribution but it might be possible to construct an approximately conditionally-optimal proposal. Finally, we consider conditionally linear Gaussian state-space models for which it is possible to use Kalman filter updating for a subset of state variables conditional on the remaining elements of the state vector.
8.3.1 Conditionally-Optimal Importance Distribution
The conditionally-optimal distribution, e.g., Liu and Chen (1998), is defined as the distribution that minimizes the Monte Carlo variation of the importance weights. However, this notion of optimality conditions on the current observation y_t as well as the period t−1 particle s_{t−1}^j. Given (y_t, s_{t−1}^j) the weights w̃_t^j are constant (as a function of s̃_t) if

g_t(s̃_t|s_{t−1}^j) = p(s̃_t|y_t, s_{t−1}^j),   (8.29)

that is, s̃_t is sampled from the posterior distribution of the period t state given (y_t, s_{t−1}^j). In this case

w̃_t^j = ∫ p(y_t|s_t) p(s_t|s_{t−1}^j) ds_t.   (8.30)
In typical (nonlinear) DSGE model applications it is not possible to sample directly from p(s̃_t|y_t, s_{t−1}^j). One could use an accept-reject algorithm as discussed in Künsch (2005) to generate draws from the conditionally-optimal distribution. However, for this approach to work efficiently, the user needs to find a good proposal distribution within the accept-reject algorithm.
As mentioned above, our numerical illustrations below are all based on DSGE models that take the form of the linear Gaussian state-space model (8.2). In this case, one can obtain p(s̃_t|y_t, s_{t−1}^j) from the Kalman filter updating step described in Table 2.1. Let

s̄_{t|t−1}^j = Φ_1 s_{t−1}^j,
P_{t|t−1} = Φ_ε Σ_ε Φ_ε′,
ȳ_{t|t−1}^j = Ψ_0 + Ψ_1 t + Ψ_2 s̄_{t|t−1}^j,
F_{t|t−1} = Ψ_2 P_{t|t−1} Ψ_2′ + Σ_u,
s̄_{t|t}^j = s̄_{t|t−1}^j + P_{t|t−1} Ψ_2′ F_{t|t−1}^{−1} (y_t − ȳ_{t|t−1}^j),
P_{t|t} = P_{t|t−1} − P_{t|t−1} Ψ_2′ F_{t|t−1}^{−1} Ψ_2 P_{t|t−1}.

The conditionally-optimal proposal distribution is given by

s̃_t|(s_{t−1}^j, y_t) ∼ N( s̄_{t|t}^j, P_{t|t} ).   (8.31)
We will use (8.31) as a benchmark to document how accurate a well-adapted particle filter
could be. In applications with nonlinear DSGE models, in which it is not possible to sample
directly from p(s̃t |yt , sjt−1 ), the documented level of accuracy is typically not attainable.
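Because P_{t|t−1}, F_{t|t−1}, and P_{t|t} do not depend on the particle index j, they can be computed once per period; only the means are particle-specific. The sketch below implements the proposal (8.31) for small illustrative system matrices (with Ψ_0 = Ψ_1 = 0); the matrices are placeholders, not a calibrated model.

    import numpy as np

    rng = np.random.default_rng(0)
    ns, M = 2, 500
    Phi1 = np.array([[0.9, 0.0], [0.1, 0.5]])
    Phie = np.eye(ns); Sige = 0.1 * np.eye(ns)       # Phi_eps, Sigma_eps
    Psi2 = np.array([[1.0, 1.0]]); Sigu = np.array([[0.05]])
    y_t = np.array([0.4])
    s_prev = rng.standard_normal((M, ns))

    # Period-t quantities that are common across particles.
    P_pred = Phie @ Sige @ Phie.T
    F = Psi2 @ P_pred @ Psi2.T + Sigu
    K = P_pred @ Psi2.T @ np.linalg.inv(F)
    P_upd = P_pred - K @ Psi2 @ P_pred
    chol = np.linalg.cholesky(P_upd)

    # Particle-specific means and draws from the proposal (8.31).
    s_pred = s_prev @ Phi1.T
    y_pred = s_pred @ Psi2.T
    s_upd = s_pred + (y_t - y_pred) @ K.T
    s_tilde = s_upd + rng.standard_normal((M, ns)) @ chol.T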
8.3.2 Approximately Conditionally-Optimal Distributions
In a typical DSGE model application, sampling from the conditionally-optimal importance
distribution is infeasible or computationally too costly. Alternatively, one could try to sample from an approximately conditionally-optimal importance distribution. For instance, if
the DSGE model nonlinearity arises from a higher-order perturbation solution and the nonlinearities are not too strong, then an approximately conditionally-optimal importance distribution could be obtained by applying the one-step Kalman filter updating in (8.31) to the
first-order approximation of the DSGE model. More generally, as suggested in Guo, Wang,
and Chen (2005), one could use the updating steps of a conventional nonlinear filter, such
as an extended Kalman filter, unscented Kalman filter, or a Gaussian quadrature filter, to
construct an efficient proposal distribution. Approximate filters for nonlinear DSGE models
have been developed by Andreasen (2013) and Kollmann (2014).
8.3.3 Conditional Linear Gaussian Models
Certain DSGE models have a conditional linear structure that can be exploited to improve
the efficiency of the particle filter. These models include the class of Markov switching linear
rational expectations (MS-LRE) models analyzed in Farmer, Waggoner, and Zha (2009) as
well as models that are linear conditional on exogenous stochastic volatility processes, e.g.,
the linearized DSGE model with heteroskedastic structural shocks estimated by Justiniano
and Primiceri (2008) and the long-run risks model studied in Schorfheide, Song, and Yaron
(2014).
For concreteness, consider an MS-LRE model obtained by replacing the fixed target-inflation rate π* in the monetary policy rule (1.24) with a time-varying process π_t*(m_t) of the form

π_t* = m_t π_L* + (1 − m_t) π_H*,   P{m_t = l | m_{t−1} = l} = η_{ll},   l ∈ {0, 1}.   (8.32)

This model was estimated in Schorfheide (2005).¹ Log-linearizing the model with Markov-switching target inflation rate leads to an MS-LRE similar to (2.1), except that the log-linearized monetary policy rule now contains an intercept that depends on m_t.

¹ A richer DSGE model with a Markov-switching target inflation rate was subsequently estimated by Liu, Waggoner, and Zha (2011).
Chapter 9
Combining Particle Filters with MH Samplers
We previously focused on the particle-filter approximation of the likelihood function of a potentially nonlinear DSGE model. In order to conduct Bayesian inference, the approximate likelihood function has to be embedded into a posterior sampler. We begin by combining the particle filtering methods of Chapter 8 with the MCMC methods of Chapter 4. In a nutshell, we replace the actual likelihood function that appears in the formula for the acceptance probability α(ϑ|θ^{i−1}) in Algorithm 5 by the particle filter approximation p̂(Y|θ). This idea was first proposed for the estimation of nonlinear DSGE models by Fernández-Villaverde and Rubio-Ramírez (2007). We refer to the resulting algorithm as the PFMH algorithm. It is a special case of a larger class of algorithms called particle Markov chain Monte Carlo (PMCMC). The theoretical properties of PMCMC methods were established in Andrieu, Doucet, and Holenstein (2010). Applications of PFMH algorithms in other areas of econometrics are discussed in Flury and Shephard (2011).
9.1 The PFMH Algorithm
The statistical theory underlying the PFMH algorithm is very complex and beyond the scope
of this book. We refer the interested reader to Andrieu, Doucet, and Holenstein (2010) for a
careful exposition. Below we will sketch the main idea behind the algorithm. The exposition
is based on Flury and Shephard (2011). We will distinguish between {p(Y |θ), p(θ|Y ), p(Y )}
and {p̂(Y |θ), p̂(θ|Y ), p̂(Y )}. The first triplet consists of the exact likelihood function p(Y |θ)
and the resulting posterior distribution and marginal data density defined as
p(θ|Y) = p(Y|θ) p(θ) / p(Y),   p(Y) = ∫ p(Y|θ) p(θ) dθ.   (9.1)
The second triplet consists of the particle filter approximation of the likelihood function
denoted by p̂(Y |θ) and the resulting posterior and marginal data density:
p̂(θ|Y) = p̂(Y|θ) p(θ) / p̂(Y),   p̂(Y) = ∫ p̂(Y|θ) p(θ) dθ.   (9.2)
By replacing the exact likelihood function p(Y|θ) with the particle filter approximation p̂(Y|θ) in Algorithm 5 one might expect to obtain draws from the approximate posterior p̂(θ|Y) instead of the exact posterior p(θ|Y). The surprising implication of the theory developed in Andrieu, Doucet, and Holenstein (2010) is that the distribution of draws from the PFMH algorithm that replaces p(Y|θ) by p̂(Y|θ) in fact does converge to the exact posterior. The algorithm takes the following form:
Algorithm 18 (PFMH Algorithm) For i = 1 to N:

1. Draw ϑ from a density q(ϑ|θ^{i−1}).

2. Set θ^i = ϑ with probability

α(ϑ|θ^{i−1}) = min{ 1, [ p̂(Y|ϑ) p(ϑ) / q(ϑ|θ^{i−1}) ] / [ p̂(Y|θ^{i−1}) p(θ^{i−1}) / q(θ^{i−1}|ϑ) ] }

and θ^i = θ^{i−1} otherwise. The likelihood approximation p̂(Y|ϑ) is computed using Algorithm 12.
Any of the particle filters described in Chapter 8 could be used in the PFMH algorithm. For concreteness, we use the generic filter described in Algorithm 12. At each iteration the filter generates draws s̃_t^j from the proposal distribution g_t(·|s_{t−1}^j). Let S̃_t = (s̃_t^1, . . . , s̃_t^M)′ and denote the entire sequence of draws by S̃_{1:T}^{1:M}. In the selection step we use multinomial resampling to determine the ancestor of each particle in the next iteration. Thus, we can define a random variable A_t^j that contains this ancestry information. For instance, suppose that during the resampling particle j = 1 was assigned the value s̃_t^{10}; then A_t^1 = 10. Note that the A_t^j's are random variables whose values are determined during the selection step. Let A_t = (A_t^1, . . . , A_t^M) and use A_{1:T} to denote the sequence of A_t's.
The PFMH algorithm operates on a probability space that includes the parameter vector θ as well as S̃_{1:T} and A_{1:T}. We use U_{1:T} to denote the sequence of random vectors that are used to generate S̃_{1:T} and A_{1:T}. U_{1:T} can be thought of as an array of iid uniform random numbers. The transformation of U_{1:T} into (S̃_{1:T}, A_{1:T}) typically depends on θ and Y_{1:T} because the proposal distribution g_t(s̃_t|s_{t−1}^j) in Algorithm 12 depends on both the current observation y_t and the parameter vector θ, which enters the measurement and state-transition equations, see (8.1).
For concreteness, consider the conditionally-optimal particle filter for a linear state-space model described in Chapter 8.3.1. Implementation of this filter requires sampling from a N(s̄_{t|t}^j, P_{t|t}) distribution for each particle j. The mean of this distribution depends on y_t, and both the mean and the covariance matrix depend on θ through the system matrices of the state-space representation (8.2). Draws from this distribution can in principle be obtained by sampling iid uniform random variates, using a probability integral transform to convert them into iid draws from a standard normal distribution, and then converting them into draws from a N(s̄_{t|t}^j, P_{t|t}). Likewise, in the selection step, the multinomial resampling can be implemented based on draws from iid uniform random variables. Therefore, we can express the particle filter approximation of the likelihood function as

p̂(Y_{1:T}|θ) = g(Y_{1:T}|θ, U_{1:T}),   (9.3)

where

U_{1:T} ∼ p(U_{1:T}) = ∏_{t=1}^T p(U_t).   (9.4)
The PFMH algorithm can be interpreted as operating on an enlarged probability space for
the triplet (Y1:T , θ, U1:T ). Define the joint distribution
p_g(Y_{1:T}, θ, U_{1:T}) = g(Y_{1:T}|θ, U_{1:T}) p(U_{1:T}) p(θ).   (9.5)

The PFMH algorithm samples from the joint posterior

p_g(θ, U_{1:T}|Y_{1:T}) ∝ g(Y|θ, U_{1:T}) p(U_{1:T}) p(θ)   (9.6)
and discards the draws of U_{1:T}. For this procedure to be valid, it has to be the case that marginalizing the joint posterior p_g(θ, U_{1:T}|Y_{1:T}) with respect to U_{1:T} yields the exact posterior p(θ|Y_{1:T}). In other words, we require that the particle filter produces an unbiased simulation approximation of the likelihood function for all values of θ:

E[p̂(Y_{1:T}|θ)] = ∫ g(Y_{1:T}|θ, U_{1:T}) p(U_{1:T}) dU_{1:T} = p(Y_{1:T}|θ).   (9.7)
Del Moral (2004) has shown that particle filters indeed deliver unbiased estimates of the
likelihood function.
It turns out that the acceptance probability for the MH algorithm that operates on the enlarged probability space can be expressed directly in terms of the particle filter approximation p̂(Y_{1:T}|θ). On the enlarged probability space, one needs to generate a proposed draw for both θ and U_{1:T}. We denote these draws by ϑ and U_{1:T}*. The proposal distribution for (ϑ, U_{1:T}*) in the MH algorithm is given by q(ϑ|θ^{(i−1)}) p(U_{1:T}*). There is no need to keep track of the draws U_{1:T}*, because the acceptance ratio for Algorithm 18 can be written as follows (omitting the time subscripts):

α(ϑ|θ^{i−1}) = min{ 1, [ g(Y|ϑ, U*) p(U*) p(ϑ) / ( q(ϑ|θ^{(i−1)}) p(U*) ) ] / [ g(Y|θ^{(i−1)}, U^{(i−1)}) p(U^{(i−1)}) p(θ^{(i−1)}) / ( q(θ^{(i−1)}|ϑ) p(U^{(i−1)}) ) ] }   (9.8)
    = min{ 1, [ p̂(Y|ϑ) p(ϑ) / q(ϑ|θ^{(i−1)}) ] / [ p̂(Y|θ^{(i−1)}) p(θ^{(i−1)}) / q(θ^{(i−1)}|ϑ) ] }.

The terms p(U*) and p(U^{(i−1)}) cancel from the expression in the first line of (9.8) and it suffices to record the particle filter likelihood approximations p̂(Y|ϑ) and p̂(Y|θ^{(i−1)}).
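In code, the only change relative to a standard RWMH sampler is that the log-likelihood calls return noisy particle filter estimates and that the stored estimate for the current draw is recycled rather than recomputed. The sketch below is schematic: pf_loglik is a hypothetical stand-in for a full run of Algorithm 12, and the prior and proposal are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def pf_loglik(theta):
        # Hypothetical stand-in for ln p-hat(Y|theta) from Algorithm 12; the
        # noise term mimics the simulation error of the particle filter.
        return -0.5 * np.sum(theta**2) + 0.1 * rng.standard_normal()

    def log_prior(theta):
        return -0.5 * np.sum(theta**2)

    theta = np.zeros(3)
    lp = pf_loglik(theta) + log_prior(theta)
    for i in range(1000):
        vartheta = theta + 0.1 * rng.standard_normal(3)  # symmetric RWMH-V proposal
        lp_prop = pf_loglik(vartheta) + log_prior(vartheta)
        # The stored estimate lp for the current draw is recycled, not refreshed;
        # re-evaluating it at every step would invalidate the algorithm.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = vartheta, lp_prop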
9.2 Application to the Small-Scale DSGE Model
We now apply the PFMH algorithm to the small-scale New Keynesian model, which is estimated over the period 1983:I to 2002:IV. We use the 1-block RWMH-V algorithm and
combine it with the Kalman filter, the bootstrap PF, and the conditionally optimal PF.
According to the theory sketched in the previous section the PFMH algorithm should accurately approximate the posterior distribution of the DSGE model parameters. Our results
are based on Nrun = 20 runs of each algorithm. In each run we generate N = 100, 000
posterior draws and discard the first N0 = 50, 000. As in Chapter 8.6, we use M = 40, 000
particles for the bootstrap filter and M = 400 particles for the conditionally-optimal filter.
A single run of the RWMH-V algorithm takes 1 minute and 30 seconds with the Kalman
filter, approximately 40 minutes with the conditionally-optimal PF, and approximately 1
day with the bootstrap PF.
The results are summarized in Table 9.1. Most notably, despite the inaccurate likelihood
approximation of the bootstrap PF documented in Chapter 8.6, the PFMH works remarkably
well. Columns 2 to 4 of the table report posterior means which are computed by pooling the
Table 9.1: Accuracy of MH Approximations

           Posterior Mean (Pooled)      Inefficiency Factors            Std Dev of Means
           KF       CO-PF    BS-PF      KF       CO-PF     BS-PF        KF      CO-PF   BS-PF
τ          2.63     2.62     2.64        66.17   126.76    1360.22      0.020   0.028   0.091
κ          0.82     0.81     0.82       128.00    97.11    1887.37      0.007   0.006   0.026
ψ1         1.88     1.88     1.87       113.46   159.53     749.22      0.011   0.013   0.029
ψ2         0.64     0.64     0.63        61.28    56.10     681.85      0.011   0.010   0.036
ρr         0.75     0.75     0.75       108.46   134.01    1535.34      0.002   0.002   0.007
ρg         0.98     0.98     0.98        94.10    88.48    1613.77      0.001   0.001   0.002
ρz         0.88     0.88     0.88       124.24   118.74    1518.66      0.001   0.001   0.005
r(A)       0.44     0.44     0.44       148.46   151.81    1115.74      0.016   0.016   0.044
π(A)       3.32     3.33     3.32       152.08   141.62    1057.90      0.017   0.016   0.045
γ(Q)       0.59     0.59     0.59       106.68   142.37     899.34      0.006   0.007   0.018
σr         0.24     0.24     0.24        35.21   179.15    1105.99      0.001   0.002   0.004
σg         0.68     0.68     0.67        98.22    64.18    1490.81      0.003   0.002   0.011
σz         0.32     0.32     0.32        84.77    61.55     575.90      0.001   0.001   0.003
ln p̂(Y)  -357.14  -357.17  -358.32                                      0.040   0.038   0.949

Notes: Results are based on Nrun = 20 runs of the PF-RWMH-V algorithm. Each run of the algorithm generates N = 100,000 draws and the first N0 = 50,000 are discarded. The likelihood function is computed with the Kalman filter (KF), bootstrap particle filter (BS-PF, M = 40,000) or conditionally-optimal particle filter (CO-PF, M = 400). "Pooled" means that we are pooling the draws from the Nrun = 20 runs to compute posterior statistics.
draws generated by the 20 runs of the algorithms. Except for some minor discrepancies in
the posterior mean for τ and r(A) , which are parameters with a high posterior variance, the
posterior mean approximations are essentially identical for all three likelihood evaluation
methods. Columns 5 to 7 contain the inefficiency factors InEff_N(θ̄) for each parameter,
and the last three columns of Table 9.1 contain the standard deviations of the posterior
mean estimates across the 20 runs. Not surprisingly, the posterior sampler that is based on
the bootstrap PF is the least accurate. The standard deviations are 2 to 4 times as large
as for the samplers that utilize either the Kalman filter or the conditionally-optimal PF.
Under the Kalman filter the inefficiency factors range from 35 to about 150, whereas under
the bootstrap particle filter they range from 575 to 1,890. As stressed in Section 9.1 the
most important requirement for PFMH algorithms is that the particle filter approximation is unbiased; it does not have to be exact.

Figure 9.1: Autocorrelation of PFMH Draws

[Six panels showing autocorrelation functions for τ, κ, ψ1, ψ2, ρr, and σr at lags 0 to 40.]

Notes: The figure depicts autocorrelation functions computed from the output of the 1-Block RWMH-V algorithm based on the Kalman filter (solid), the conditionally-optimal particle filter (dashed), and the bootstrap particle filter (solid with dots).
In Figure 9.1 we depict autocorrelation functions for parameter draws computed based on
the output of the PFMH algorithms. As a benchmark, the figure also contains autocorrelation
functions obtained from the sampler that uses the exact likelihood function computed with
the Kalman filter. While under the conditionally-optimal particle filter the persistence of
the Markov chain for the DSGE model parameters is comparable to the persistence under
the Kalman filter, the use of the bootstrap particle filter raises the serial correlation of the
parameter draws drastically, which leads to the less precise Monte Carlo approximations
reported in Table 9.1.
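For reference, the inefficiency factors reported above can be computed from a chain of posterior draws as one plus twice the sum of the autocorrelations. The sketch below truncates the sum at a fixed lag, which is one simple convention among several; the book's exact truncation rule is not restated here, and the AR(1) test chain is a toy example.

    import numpy as np

    def inefficiency(draws, L=200):
        # InEff = 1 + 2 * sum of autocorrelations, truncated at lag L.
        x = draws - draws.mean()
        acov = np.correlate(x, x, mode="full")[x.size - 1:] / x.size
        rho = acov[1:L + 1] / acov[0]
        return 1.0 + 2.0 * np.sum(rho)

    rng = np.random.default_rng(0)
    e = rng.standard_normal(50_000)
    draws = np.empty_like(e)
    draws[0] = 0.0
    for t in range(1, e.size):
        draws[t] = 0.95 * draws[t - 1] + e[t]   # persistent toy chain
    print(inefficiency(draws))   # close to (1 + 0.95)/(1 - 0.95) = 39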
9.3 Application to the SW Model
We now use the PF-RWMH-V algorithm to estimate the SW model. Unlike in Chapter 6.2,
where we used a more diffuse prior distribution to estimate the SW model, we now revert
to the prior originally specified by Smets and Wouters (2007). This prior is summarized in
Table A-2 in the Appendix. As shown in Herbst and Schorfheide (2014), under the original
prior distribution the RWMH algorithm is much better behaved than under our diffuse
prior, because it leads to a posterior distribution that does not exhibit multiple modes. The
estimation sample is 1966:I to 2004:IV. Using the RWMH-V algorithm, we estimate the model
posteriors using the Kalman filter and the conditionally-optimal PF, running each algorithm
Nrun = 20 times. The bootstrap particle filters with M = 40, 000 and M = 400, 000 particles
turned out to be too inaccurate to deliver reliable posterior estimates.
Table 9.2 shows the results for the Kalman filter and the conditionally-optimal PF. While the MCMC chains generated using the KF and CO-PF produce roughly the same means for the parameter draws on average, the variability across the chains is much higher for the CO-PF. According to the inefficiency factors, the KF chains are about 10 times more efficient than the CO-PF chains.
The results are summarized in Table 9.2. Due to the computational complexity of the PF-RWMH-V algorithm, the results reported in the table are based on N = 10,000 instead of N = 100,000 draws from the posterior distribution. We used the conditionally-optimal PF with M = 40,000 particles and a single-block RWMH-V algorithm in which we scaled the posterior covariance matrix that served as covariance matrix of the proposal distribution by c² = 0.25² for the KF and c² = 0.05² for the conditionally-optimal PF. This leads to acceptance rates of 33% for the KF and 24% for the conditionally-optimal PF. In our experience, the noisy approximation of the likelihood function through the PF makes it necessary to reduce the variance of the proposal distribution to maintain a targeted acceptance rate. In the SW application the proposed moves under the PF approximation are about five times smaller than under the exact KF likelihood function. This increases the persistence of the Markov chain and leads to a reduction in accuracy. Because the precision of the PF approximation differs across points in the parameter space, the RWMH-V acceptance rate also varies much more across chains. For example, the standard deviation of the acceptance rate for the CO-PF PMCMC runs is 0.09, about ten times larger than for the KF runs.
While the pooled posterior means using the KF and the conditionally-optimal PF reported in Table 9.2 are very similar, the standard deviation of the means across runs is three to five times larger if the PF approximation of the likelihood function is used. Because the PF approximation of the log-likelihood function is downward-biased, the log marginal data density approximation obtained with the PF is much smaller than the one obtained with the KF.
Reducing the number of particles for the conditionally-optimal PF to 4,000 or switching
to the bootstrap PF with 40,000 or 400,000 particles was not successful in the sense that the
acceptance rate quickly dropped to zero. Reducing the variance of the proposal distribution
did not solve the problem because to obtain a nontrivial acceptance rate the step-size had
to be so small that the sampler would not be able to traverse the high posterior density
region of the parameter space in a reasonable amount of time. In view of the accuracy of the
likelihood approximation reported in Table 8.4 this is not surprising. The PF approximations
are highly volatile and even though the PF approximation is unbiased in theory, finite sample
averages appear to be severely biased.
In a nutshell, if the variation in the likelihood, conditional on a particular value of θ,
is much larger than the variation that we observe along a Markov chain (evaluating the
likelihood for the sequence θi , i = 1, . . . , N ) that is generated by using the exact likelihood
function, the sampler easily gets stuck in the sense that the acceptance rate for proposed
draws drops to zero. Once the PF has generated an estimate p̂(Y |θ) that exceeds p(Y |θ) by
a wide margin, it becomes extremely difficult to move to a nearby θ. The reason is that, on
the one hand, most of the PF evaluations underestimate p(Y |θ) and, on the other hand, two
θ’s that are close to each other tend to be associated with similar likelihood values.
9.4 Computational Considerations
We implement the PFMH algorithm on a single machine, utilizing up to twelve cores. Efficient parallelization of the algorithm is difficult, because it is challenging to parallelize MCMC algorithms and it is not profitable to use distributed memory parallelization for the filter. For the small-scale DSGE model it takes 30:20:33 [hh:mm:ss] to generate 100,000 parameter draws using the bootstrap PF with 40,000 particles. Under the conditionally-optimal filter we only use 400 particles, which reduces the run time to 00:39:20 [hh:mm:ss]. Thus, with the conditionally-optimal filter, the PFMH algorithm runs about 50 times faster and delivers highly accurate approximations of the posterior means. For the SW model the computational time is substantially larger. It took 05:14:20:00 [dd:hh:mm:ss] to generate 10,000 draws using the conditionally-optimal PF with 40,000 particles.
In practical applications with nonlinear DSGE models the conditionally-optimal PF that
we used in our numerical illustrations is typically not available and has to be replaced by
one of the other filters, possibly an approximately conditionally-optimal PF. Having a good
understanding of the accuracy of the PF approximation is crucial. Thus, we recommend assessing the variance of the likelihood approximation at various points in the parameter space, as we did in Chapters 8.6 and 8.7, and tailoring the filter until it is reasonably accurate.
To put the accuracy of the filter approximation into perspective, one could compare it to
the variation in the likelihood function of a linearized DSGE model fitted to the same data,
along a sequence of posterior draws θi . If the variation in the likelihood function due to the
PF approximation is larger than the variation generated by moving through the parameter
space, the PF-MH algorithm is unlikely to produce reliable results.
In general, likelihood evaluations for nonlinear DSGE models are computationally very
costly. Rather than spending computational resources on tailoring the proposal density for
the PF to reduce the number of particles, one can also try to lower the number of likelihood
evaluations in the MH algorithm. Smith (2012) developed a PFMH algorithm based on
surrogate transitions. In a nutshell the algorithm proceeds as follows: Instead of evaluating
the posterior density (and thereby the DSGE model likelihood function) for every proposal
draw ϑ, one first evaluates the likelihood function for an approximate model, e.g., a linearized
DSGE model, or one uses a fast approximate filter, e.g., an extended Kalman filter, to obtain
a likelihood value for the nonlinear model. Using the surrogate likelihood, one can compute
the acceptance probability α. For ϑ’s rejected in this step, one never has to execute the timeconsuming PF computations. If the proposed draw ϑ is accepted in the first stage, then a
second randomization that requires the evaluation of the actual likelihood is necessary to
determine whether θi = ϑ or θi = θi−1 . If the surrogate transition is well tailored, then the
acceptance probability in the second step is high and the overall algorithm accelerates the
posterior sampler by reducing the number of likelihood evaluations for poor proposals ϑ.
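The two-stage logic can be sketched as follows; both likelihood functions below are hypothetical placeholders (a cheap surrogate and an expensive exact evaluation), the prior is left flat, and the proposal is a simple random walk.

    import numpy as np

    rng = np.random.default_rng(0)
    cheap_loglik = lambda th: -0.5 * 1.1 * np.sum(th**2)   # e.g., linearized model
    exact_loglik = lambda th: -0.5 * np.sum(th**2)         # e.g., PF for the nonlinear model

    theta, n_exact_calls = np.zeros(2), 0
    lc_cur, le_cur = cheap_loglik(theta), exact_loglik(theta)
    for i in range(2000):
        vartheta = theta + 0.3 * rng.standard_normal(2)
        lc_prop = cheap_loglik(vartheta)
        # Stage 1: screen the proposal with the surrogate likelihood only.
        if np.log(rng.uniform()) < lc_prop - lc_cur:
            # Stage 2: correct with the exact-to-surrogate likelihood ratio.
            n_exact_calls += 1
            le_prop = exact_loglik(vartheta)
            if np.log(rng.uniform()) < (le_prop - le_cur) - (lc_prop - lc_cur):
                theta, lc_cur, le_cur = vartheta, lc_prop, le_prop
    print(n_exact_calls)   # far fewer exact evaluations than MH iterations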
Table 9.2: Accuracy of MH Approximations

              Post. Mean (Pooled)     Ineff. Factors           Std Dev of Means
              KF        CO-PF         KF        CO-PF          KF      CO-PF
100(β⁻¹−1)    0.14      0.14          172.58    3732.90        0.007   0.034
π̄             0.73      0.74          185.99    4343.83        0.016   0.079
l̄             0.51      0.37          174.39    3133.89        0.130   0.552
α             0.19      0.20          149.77    5244.47        0.003   0.015
σc            1.49      1.45           86.27    3557.81        0.013   0.086
Φ             1.47      1.45          134.34    4930.55        0.009   0.056
ϕ             5.34      5.35          138.54    3210.16        0.131   0.628
h             0.70      0.72          277.64    3058.26        0.008   0.027
ξw            0.75      0.75          343.89    2594.43        0.012   0.034
σl            2.28      2.31          162.09    4426.89        0.091   0.477
ξp            0.72      0.72          182.47    6777.88        0.008   0.051
ιw            0.54      0.53          241.80    4984.35        0.016   0.073
ιp            0.48      0.50          205.27    5487.34        0.015   0.078
ψ             0.45      0.44          248.15    3598.14        0.020   0.078
rπ            2.09      2.09           98.32    3302.07        0.020   0.116
ρ             0.80      0.80          241.63    4896.54        0.006   0.025
ry            0.13      0.13          243.85    4755.65        0.005   0.023
r∆y           0.21      0.21          101.94    5324.19        0.003   0.022
ρa            0.96      0.96          153.46    1358.87        0.002   0.005
ρb            0.22      0.21          325.98    4468.10        0.018   0.068
ρg            0.97      0.97           57.08    2687.56        0.002   0.011
ρi            0.71      0.70          219.11    4735.33        0.009   0.044
ρr            0.54      0.54          194.73    4184.04        0.020   0.094
ρp            0.80      0.81          338.69    2527.79        0.022   0.061
ρw            0.94      0.94          135.83    4851.01        0.003   0.019
ρga           0.41      0.37          196.38    5621.86        0.025   0.133
µp            0.66      0.66          300.29    3552.33        0.025   0.087
µw            0.82      0.81          218.43    5074.31        0.011   0.052
σa            0.34      0.34          128.00    5096.75        0.005   0.034
σb            0.24      0.24          186.13    3494.71        0.004   0.016
σg            0.51      0.49          208.14    2945.02        0.006   0.021
σi            0.43      0.44          115.42    6093.72        0.006   0.043
σr            0.14      0.14          193.37    3408.01        0.004   0.016
σp            0.13      0.13          194.22    4587.76        0.003   0.013
σw            0.22      0.22          211.80    2256.19        0.004   0.012
ln p̂(Y)    -964.44  -1017.94                                   0.298   9.139

Notes: Results are based on Nrun = 20 runs of the PF-RWMH-V algorithm. Each run of the algorithm generates N = 10,000 draws. The likelihood function is computed with the Kalman filter (KF) or conditionally-optimal particle filter (CO-PF). "Pooled" means that we are pooling the draws from the Nrun = 20 runs to compute posterior statistics. The CO-PF uses M = 40,000 particles to compute the likelihood.
Chapter 10
Combining Particle Filters with SMC Samplers
Following recent work by Chopin, Jacob, and Papaspiliopoulos (2012), we now combine the SMC algorithm of Chapter 5 with the particle filter approximation of the likelihood function developed in Chapter 8 to obtain an SMC² algorithm.
10.1 An SMC² Algorithm
As with the PFMH algorithm, our goal is to obtain a posterior sampler for the DSGE model parameters for settings in which the likelihood function of the DSGE model cannot be evaluated with the Kalman filter. The starting point is the SMC Algorithm 8. However, we make a number of modifications to our previous algorithm. Some of these modifications are important, others are merely made to simplify the exposition. First and foremost, we add data sequentially to the likelihood function rather than tempering the entire likelihood function: we consider the sequence of posteriors π_n^D(θ) = p(θ|Y_{1:t_n}), defined in (5.3), where t_n = ⌊φ_n T⌋. The advantage of using data tempering is that the particle filter can deliver an unbiased estimate of the incremental weight p(Y_{t_{n−1}+1:t_n}|θ) in the correction step, whereas the estimate of a concave transformation p(Y_{1:T}|θ)^{φ_n − φ_{n−1}} tends to be biased. Moreover, in general one has to evaluate the likelihood only for t_n observations instead of all T observations, which can speed up computations considerably.
Second, the evaluation of the incremental and the full likelihood function in the correction and mutation steps of Algorithm 8 is replaced by the evaluation of the respective particle filter approximations. Using the same notation as in (9.3), we write the particle filter approximations as

p̂(y_{t_{n−1}+1:t_n} | Y_{1:t_{n−1}}, θ) = g(y_{t_{n−1}+1:t_n} | Y_{1:t_{n−1}}, θ, U_{1:t_n}),   p̂(Y_{1:t_n} | θ_n) = g(Y_{1:t_n} | θ_n, U_{1:t_n}).   (10.1)

As before, U_{1:t_n} is an array of iid uniform random variables generated by the particle filter with density p(U_{1:t_n}), see (9.4). The approximation of the likelihood increment also depends on the entire sequence U_{1:t_n} because of the recursive structure of the filter: the particle approximation of p(s_{t_{n−1}+1}|Y_{1:t_{n−1}}, θ) depends on the particle approximation of p(s_{t_{n−1}}|Y_{1:t_{n−1}}, θ). The distribution of U_{1:t_n} depends neither on θ nor on Y_{1:t_n} and can be factorized as

p(U_{1:t_n}) = p(U_{1:t_1}) p(U_{t_1+1:t_2}) · · · p(U_{t_{n−1}+1:t_n}).   (10.2)
To describe the particle system we follow the convention of Chapter 5 and index the parameter vector θ by the stage n of the SMC algorithm, writing θ_n. The particles generated by the SMC sampler are indexed i = 1, . . . , N and the particles generated by the particle filter are indexed j = 1, . . . , M. At stage n we have a particle system {θ_n^i, W_n^i}_{i=1}^N that represents the posterior distribution p(θ_n|Y_{1:t_n}). Moreover, for each θ_n^i we have a particle system that represents the distribution p(s_t|Y_{1:t_n}, θ_n^i). To distinguish the weights used for the particle values that represent the conditional distribution of θ from the weights used to characterize the conditional distribution of s_t, we denote the latter by 𝒲 instead of W. Moreover, because the distribution of the states is conditional on the value of θ, we use i, j superscripts: {s_t^{i,j}, 𝒲_t^{i,j}}_{j=1}^M. The particle system can be arranged in the matrix form given in Table 10.1.

Table 10.1: Particle System for SMC² Sampler After Stage n

Parameter            State
(θ_n^1, W_n^1)       (s_{t_n}^{1,1}, 𝒲_{t_n}^{1,1})   (s_{t_n}^{1,2}, 𝒲_{t_n}^{1,2})   · · ·   (s_{t_n}^{1,M}, 𝒲_{t_n}^{1,M})
(θ_n^2, W_n^2)       (s_{t_n}^{2,1}, 𝒲_{t_n}^{2,1})   (s_{t_n}^{2,2}, 𝒲_{t_n}^{2,2})   · · ·   (s_{t_n}^{2,M}, 𝒲_{t_n}^{2,M})
...                  ...                              ...                                      ...
(θ_n^N, W_n^N)       (s_{t_n}^{N,1}, 𝒲_{t_n}^{N,1})   (s_{t_n}^{N,2}, 𝒲_{t_n}^{N,2})   · · ·   (s_{t_n}^{N,M}, 𝒲_{t_n}^{N,M})
Finally, to streamline the notation used in the description of the algorithm, we assume
that during each stage n exactly one observation is added to the likelihood function. Thus,
we can write θt instead of θn and Y1:t instead of Y1:tn and the number of stages is Nφ = T .
Moreover, we resample the θ particles at every iteration of the algorithm (which means we
do not have to keep track of the resampling indicator ρt ) and we only use one MH step in
the mutation phase.
Algorithm 19 (SMC²)

1. Initialization. Draw the initial particles from the prior: θ_0^i ∼ iid p(θ) and W_0^i = 1, i = 1, . . . , N.

2. Recursion. For t = 1, . . . , T,

(a) Correction. Reweight the particles from stage t−1 by defining the incremental weights

w̃_t^i = p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) = g(y_t|Y_{1:t−1}, θ_{t−1}^i, U_{1:t}^i)   (10.3)

and the normalized weights

W̃_t^i = (w̃_t^i W_{t−1}^i) / ( (1/N) ∑_{i=1}^N w̃_t^i W_{t−1}^i ),   i = 1, . . . , N.   (10.4)

An approximation of E_{π_t}[h(θ)] is given by

h̃_{t,N} = (1/N) ∑_{i=1}^N W̃_t^i h(θ_{t−1}^i).   (10.5)
(b) Selection. Resample the particles via multinomial resampling. Let {θ̂_t^i}_{i=1}^N denote N iid draws from a multinomial distribution characterized by support points and weights {θ_{t−1}^i, W̃_t^i}_{i=1}^N and set W_t^i = 1. Define the vector of ancestors A_t with elements A_t^i by setting A_t^i = k if the ancestor of resampled particle i is particle k, that is, θ̂_t^i = θ_{t−1}^k.

An approximation of E_{π_t}[h(θ)] is given by

ĥ_{t,N} = (1/N) ∑_{i=1}^N W_t^i h(θ̂_t^i).   (10.6)
(c) Mutation. Propagate the particles {θ̂_t^i, W_t^i} via one step of an MH algorithm. The proposal distribution is given by

q(ϑ_t^i|θ̂_t^i) p(U_{1:t}^{*i})   (10.7)

and the acceptance ratio can be expressed as

α(ϑ_t^i|θ̂_t^i) = min{ 1, [ p̂(Y_{1:t}|ϑ_t^i) p(ϑ_t^i) / q(ϑ_t^i|θ̂_t^i) ] / [ p̂(Y_{1:t}|θ̂_t^i) p(θ̂_t^i) / q(θ̂_t^i|ϑ_t^i) ] }.   (10.8)

An approximation of E_{π_t}[h(θ)] is given by

h̄_{t,N} = (1/N) ∑_{i=1}^N h(θ_t^i) W_t^i.   (10.9)
3. For t = T the final importance sampling approximation of E_π[h(θ)] is given by

h̄_{T,N} = (1/N) ∑_{i=1}^N h(θ_T^i) W_T^i.   (10.10)
A formal analysis of SMC² algorithms is provided in Chopin, Jacob, and Papaspiliopoulos (2012). We will provide a heuristic explanation of why the algorithm correctly approximates the target posterior distribution and comment on some aspects of the implementation. At the end of iteration t−1 the algorithm has generated particles {θ_{t−1}^i, W_{t−1}^i}_{i=1}^N. For each parameter value θ_{t−1}^i there is also a particle filter approximation of the likelihood function p̂(Y_{1:t−1}|θ_{t−1}^i), a swarm of particles {s_{t−1}^{i,j}, 𝒲_{t−1}^{i,j}}_{j=1}^M that represents the distribution p(s_{t−1}|Y_{1:t−1}, θ_{t−1}^i), and the sequence of random vectors U_{1:t−1}^i that underlies the simulation approximation of the particle filter. To gain an understanding of the algorithm it is useful to focus on the triplets {θ_{t−1}^i, U_{1:t−1}^i, W_{t−1}^i}_{i=1}^N. Suppose that

∫∫ h(θ, U_{1:t−1}) p(U_{1:t−1}) p(θ|Y_{1:t−1}) dU_{1:t−1} dθ ≈ (1/N) ∑_{i=1}^N h(θ_{t−1}^i, U_{1:t−1}^i) W_{t−1}^i.   (10.11)
This implies that we obtain the familiar approximation for functions h(·) that do not depend on U_{1:t−1}:

∫ h(θ) p(θ|Y_{1:t−1}) dθ ≈ (1/N) ∑_{i=1}^N h(θ_{t−1}^i) W_{t−1}^i.   (10.12)
Correction Step. The incremental likelihood p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) can be evaluated by iterating the particle filter forward for one period, starting from {s_{t−1}^{i,j}, 𝒲_{t−1}^{i,j}}_{j=1}^M. Using the notation in (10.1), the particle filter approximation of the likelihood increment can be written as

p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) = g(y_t|Y_{1:t−1}, U_{1:t}^i, θ_{t−1}^i).   (10.13)
The value of the likelihood function for Y_{1:t} can be tracked recursively as follows:

p̂(Y_{1:t}|θ_{t−1}^i) = p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) p̂(Y_{1:t−1}|θ_{t−1}^i)
    = g(y_t|Y_{1:t−1}, U_{1:t}^i, θ_{t−1}^i) g(Y_{1:t−1}|U_{1:t−1}^i, θ_{t−1}^i)   (10.14)
    = g(Y_{1:t}|U_{1:t}^i, θ_{t−1}^i).

The last equality follows because conditioning g(Y_{1:t−1}|U_{1:t−1}^i, θ_{t−1}^i) also on U_t does not change the particle filter approximation of the likelihood function for Y_{1:t−1}.
By induction, we can deduce from (10.11) that the Monte Carlo average (1/N) ∑_{i=1}^N h(θ_{t−1}^i) w̃_t^i W_{t−1}^i approximates the following integral:

∫∫ h(θ) g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) p(θ|Y_{1:t−1}) dU_{1:t} dθ
    = ∫ h(θ) ( ∫ g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) dU_{1:t} ) p(θ|Y_{1:t−1}) dθ.   (10.15)

Provided that the particle filter approximation of the likelihood increment is unbiased, that is,

∫ g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) dU_{1:t} = p(y_t|Y_{1:t−1}, θ)   (10.16)

for each θ, we deduce that h̃_{t,N} is a consistent estimator of E_{π_t}[h(θ)].
Selection Step. The selection step of Algorithm 19 is very similar to that of Algorithm 8. To simplify the description of the SMC² algorithm, we are resampling in every iteration. Moreover, we are keeping track of the ancestry information in the vector A_t. This is important because for each resampled particle i we not only need to know its value θ̂_t^i but we also want to track the corresponding value of the likelihood function p̂(Y_{1:t}|θ̂_t^i) as well as the particle approximation of the state, given by {s_t^{i,j}, 𝒲_t^{i,j}}, and the set of random numbers U_{1:t}^i. In the implementation of the algorithm, the likelihood values are needed for the mutation step and the state particles are useful for a quick evaluation of the incremental likelihood in the correction step of iteration t+1 (see above). The U_{1:t}^i's are not required for the actual implementation of the algorithm but are useful to provide a heuristic explanation for the validity of the algorithm.
Mutation Step. The mutation step essentially consists of one iteration of the PFMH algorithm described in Chapter 9.1. For each particle i there is a proposed value ϑ_t^i and an associated particle filter approximation p̂(Y_{1:t}|ϑ_t^i) of the likelihood, based on a sequence of random vectors U_{1:t}^* drawn from the distribution p(U_{1:t}) in (10.2). As in (9.8), the densities p(U_{1:t}^i) and p(U_{1:t}^*) cancel from the formula for the acceptance probability α(ϑ_t^i|θ̂_t^i). For the implementation it is important to record the likelihood value as well as the particle system for the state s_t for each particle θ_t^i.
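Putting the three steps together, the outer loop of the algorithm can be sketched as follows. The function pf_increment is a hypothetical placeholder for a one-period particle filter update (Algorithm 12), the mutation step is suppressed, and all numerical values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200
    theta = rng.standard_normal((N, 2))   # stage 0: draws from the prior
    W = np.ones(N)
    log_lik = np.zeros(N)                 # running ln p-hat(Y_{1:t}|theta^i)

    def pf_increment(th, t):
        # Placeholder for one period of Algorithm 12 at parameter th; the
        # noise term mimics the simulation error of the particle filter.
        return np.exp(-0.5 * np.sum(th**2) / (t + 10) + 0.05 * rng.standard_normal())

    for t in range(1, 11):
        # Correction (10.3)-(10.4): reweight by the likelihood increment.
        w_tilde = np.array([pf_increment(th, t) for th in theta])
        W_tilde = w_tilde * W / np.mean(w_tilde * W)
        log_lik += np.log(w_tilde)
        # Selection: multinomial resampling; ancestors carry their likelihood
        # values (and, in a full implementation, their state particle swarms).
        idx = rng.choice(N, size=N, p=W_tilde / W_tilde.sum())
        theta, log_lik, W = theta[idx], log_lik[idx], np.ones(N)
        # Mutation: one PFMH step per particle would follow here (Chapter 9.1).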
10.2 Application to the Small-Scale DSGE Model
We now present an application of the SMC² algorithm to the small-scale DSGE model. The results in this section can be compared to the results obtained in Chapter 9.2. Because the SMC² algorithm requires an unbiased approximation of the likelihood function, we will use data tempering instead of likelihood tempering as in Chapter 5.3. Overall, we compare the output of four algorithms: SMC² based on the conditionally-optimal PF; SMC² based on the bootstrap PF; SMC based on the Kalman filter likelihood function using data tempering; and SMC based on the Kalman filter likelihood function using likelihood tempering. In order to approximate the likelihood function with the particle filter, we are using M = 40,000 particles for the bootstrap PF and M = 400 particles for the conditionally-optimal PF. The approximation of the posterior distribution is based on N = 4,000 particles for θ, N_φ = T + 1 = 81 stages under data tempering, and N_blocks = 3 blocks for the mutation step.
Table 10.2 summarizes the results from running each algorithm Nrun = 20 times. We report pooled posterior means from the output of the 20 runs as well as inefficiency factors InEff_N(θ̄) and the standard deviation of the posterior mean approximations across the 20 runs. The results in the column labeled KF(L) are based on the Kalman filter likelihood evaluation and obtained from the same algorithm that was used in Chapter 5.3. The results in column KF(D) are also based on the Kalman filter, but the SMC algorithm uses data tempering instead of likelihood tempering. The columns CO-PF and BS-PF contain SMC² results based on the conditionally-optimal and the bootstrap PF, respectively. The pooled means of the DSGE model parameters computed from the output of the KF(L), KF(D), and CO-PF algorithms are essentially identical. The log marginal data density approximations are less accurate than the posterior mean approximations and vary for the first three algorithms from -358.75 to -356.33.
A comparison of the standard deviations and the inefficiency factors indicates that moving from likelihood tempering to data tempering leads to a deterioration of accuracy. For
instance, the standard deviation of the log marginal data density increases from 0.12 to
1.19.
Table 10.2: Accuracy of SMC² Approximations

          Posterior Mean (Pooled)              Inefficiency Factors                Std Dev of Means
          KF(L)    KF(D)    CO-PF    BS-PF     KF(L)   KF(D)   CO-PF    BS-PF      KF(L)  KF(D)  CO-PF  BS-PF
τ          2.65     2.67     2.68     2.53      1.51    4.28    60.18     7223     0.01   0.03   0.07   1.30
κ          0.81     0.81     0.81     0.70      1.40    8.36    40.60     4785     0.00   0.01   0.01   0.18
ψ1         1.87     1.88     1.87     1.89      3.29   18.27    22.56     4197     0.01   0.02   0.02   0.27
ψ2         0.66     0.66     0.67     0.65      2.72   10.02    43.30    14979     0.01   0.02   0.03   0.34
ρr         0.75     0.75     0.75     0.72      1.31   11.39   250.34    21736     0.00   0.00   0.01   0.08
ρg         0.98     0.98     0.98     0.95      1.32   15.06    35.35    10802     0.00   0.00   0.00   0.04
ρz         0.88     0.88     0.88     0.84      3.16   26.58    73.78     7971     0.00   0.00   0.00   0.05
r(A)       0.45     0.46     0.44     0.46      1.09   40.45   158.64     6529     0.00   0.02   0.04   0.42
π(A)       3.32     3.31     3.31     3.56      2.15   32.35   133.25     5296     0.01   0.03   0.06   0.40
γ(Q)       0.59     0.59     0.59     0.64      2.35    7.29    43.96    16084     0.00   0.01   0.03   0.16
σr         0.24     0.24     0.24     0.26      0.75    1.48    20.20     5098     0.00   0.00   0.00   0.06
σg         0.68     0.68     0.68     0.76      0.73    3.63    26.98    41284     0.00   0.00   0.00   0.08
σz         0.32     0.32     0.32     0.42      2.32   10.41    47.60     6570     0.00   0.00   0.00   0.11
ln p̂(Y)  -357.34  -356.33  -358.75  -340.47                                        0.120  1.191  4.374  14.49

Notes: Results are based on Nrun = 20 runs of the SMC² algorithm with N = 4,000 particles. D is data tempering and L is likelihood tempering. KF is Kalman filter, CO-PF is conditionally-optimal PF with M = 400, BS-PF is bootstrap PF with M = 40,000. CO-PF and BS-PF use data tempering.

As discussed in Chapter 5.3, in DSGE model applications it is important to use a
convex tempering schedule that adds very little likelihood information in the initial stages. The implied tempering schedule of our sequential estimation procedure is linear and adds a full observation in stage n = 1 (recall that n = 0 corresponds to sampling from the prior distribution). Replacing the Kalman filter evaluation of the likelihood function by the conditionally-optimal particle filter increases the standard deviations further. Compared to KF(D), the standard deviations of the posterior mean approximations increase by factors ranging from 1.5 to 5. The inefficiency factors for the KF(D) algorithm range from 1.5 to 40, whereas they range from 20 to 250 for CO-PF. A comparison with Table 9.1 indicates that the SMC algorithm is more sensitive to the switch from the Kalman filter likelihood to the particle filter approximation: using the conditionally-optimal particle filter, there was no comparable deterioration in the accuracy of the RWMH algorithm. Finally, replacing the conditionally-optimal PF by the bootstrap PF leads to an additional deterioration in accuracy. Compared to KF(D), the standard deviations for the BS-PF approach are an order of magnitude larger and the smallest inefficiency factor is 4,200. Nonetheless, the pooled posterior means are fairly close to those obtained from the other three algorithms.
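The contrast between the two schedules is easy to visualize; the exponent λ = 2 below is an arbitrary illustrative choice for a convex schedule, not the value used elsewhere in the book.

    import numpy as np

    T = 80                                   # sample size, so N_phi = T under data tempering
    n = np.arange(T + 1)
    data_tempering = n / T                   # implied linear schedule: t_n = n
    likelihood_tempering = (n / T) ** 2.0    # convex schedule adds little information early
    print(np.round(data_tempering[:5], 3), np.round(likelihood_tempering[:5], 3))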
10.3 Computational Considerations
The SMC² results reported in Table 10.2 are obtained by utilizing 40 processors. We parallelized the likelihood evaluations p̂(Y_{1:t}|θ_t^i) for the θ_t^i particles rather than the particle filter computations for the swarms {s_t^{i,j}, 𝒲_t^{i,j}}_{j=1}^M. The likelihood evaluations are computationally costly and do not require communication across processors. The run time for the SMC² algorithm with the conditionally-optimal PF (N = 4,000, M = 400) is 23:24 [mm:ss], whereas the algorithm with the bootstrap PF (N = 4,000, M = 40,000) runs for 08:05:35 [hh:mm:ss]. The bootstrap PF performs poorly in terms of both accuracy and run time.
After running the particle filter for the sample Y_{1:t−1} one could in principle save the particle swarm for the final state s_{t−1} for each θ_t^i. In the period t forecasting step, this information can then be used to quickly evaluate the likelihood increment. In our experience with the small-scale DSGE model, the sheer memory size of the objects (in the range of 10-20 gigabytes) precluded us from saving the t−1 state particle swarms in a distributed parallel environment in which memory transfers are costly. Instead, we re-computed the entire likelihood for Y_{1:t} in each iteration.
Our sequential (data-tempering) implementation of the SMC² algorithm suffers from particle degeneracy in the initial stages, i.e., for small sample sizes. Instead of initially sampling from the prior distribution, one could initialize the algorithm by using an importance sampler with a Student-t proposal distribution that approximates the posterior distribution obtained conditional on a small set of observations, e.g., Y_{1:2} or Y_{1:5}, as suggested in Creal (2007).
Bibliography
Altug, S. (1989): “Time-to-Build and Aggregate Fluctuations: Some New Evidence,”
International Economic Review, 30(4), 889–920.
An, S., and F. Schorfheide (2007): “Bayesian Analysis of DSGE Models,” Econometric
Reviews, 26(2-4), 113–172.
Anderson, G. (2000): “A Reliable and Computationally Efficient Algorithm for Imposing
the Saddle Point Property in Dynamic Models,” Manuscript, Federal Reserve Board of
Governors.
Andreasen, M. M. (2013): “Non-Linear DSGE Models and the Central Difference Kalman
Filter,” Journal of Applied Econometrics, 28(6), 929–955.
Andrieu, C., A. Doucet, and R. Holenstein (2010): “Particle Markov Chain Monte
Carlo Methods,” Journal of the Royal Statistical Society Series B, 72(3), 269–342.
Ardia, D., N. Bastürk, L. Hoogerheide, and H. K. van Dijk (2012): “A Comparative Study of Monte Carlo Methods for Efficient Evaluation of Marginal Likelihood,”
Computational Statistics and Data Analysis, 56(11), 3398–3414.
Arulampalam, S., S. Maskell, N. Gordon, and T. Clapp (2002): “A Tutorial on
Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, 50(2), 174–188.
Aruoba, B., P. Cuba-Borda, and F. Schorfheide (2014): “Macroeconomic Dynamics
Near the ZLB: A Tale of Two Countries,” NBER Working Paper, 19248.
Aruoba, S. B., L. Bocola, and F. Schorfheide (2013): "Assessing DSGE Model Nonlinearities," NBER Working Paper, 19693, Manuscript, University of Maryland and University of Pennsylvania.
Aruoba, S. B., J. Fernández-Villaverde, and J. F. Rubio-Ramı́rez (2006): “Comparing Solution Methods for Dynamic Equilibrium Economies,” Journal of Economic Dynamics and Control, 30(12), 2477–2508.
Berzuini, C., and W. Gilks (2001): “RESAMPLE-MOVE Filtering with Cross-Model
Jumps,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas,
and N. Gordon, pp. 117–138. Springer Verlag.
Bianchi, F. (2013): “Regime Switches, Agents’ Beliefs, and Post-World War II U.S.
Macroeconomic Dynamics,” Review of Economic Studies, 80(2), 463–490.
Binder, M., and H. Pesaran (1997): “Multivariate Linear Rational Expectations Models:
Characterization of the Nature of the Solutions and Their Fully Recursive Computation,”
Econometric Theory, 13(6), 877–888.
Blanchard, O. J., and C. M. Kahn (1980): “The Solution of Linear Difference Models
under Rational Expectations,” Econometrica, 48(5), 1305–1312.
Canova, F., and L. Sala (2009): “Back to Square One: Identification Issues in DSGE
Models,” Journal of Monetary Economics, 56, 431 – 449.
Cappé, O., S. J. Godsill, and E. Moulines (2007): “An Overview of Existing Methods
and Recent Advances in Sequential Monte Carlo,” Proceedings of the IEEE, 95(5), 899–
924.
Cappé, O., E. Moulines, and T. Ryden (2005): Inference in Hidden Markov Models.
Springer Verlag.
Chang, Y., T. Doh, and F. Schorfheide (2007): “Non-stationary Hours in a DSGE
Model,” Journal of Money, Credit, and Banking, 39(6), 1357–1373.
Chen, R., and J. Liu (2000): “Mixture Kalman Filters,” Journal of the Royal Statistical
Society Series B, 62, 493–508.
Chib, S., and E. Greenberg (1995): “Understanding the Metropolis-Hastings Algorithm,” The American Statistician, 49, 327–335.
Chib, S., and I. Jeliazkov (2001): “Marginal Likelihoods from the Metropolis Hastings
Output,” Journal of the American Statistical Association, 96(453), 270–281.
Chib, S., and S. Ramamurthy (2010): “Tailored Randomized Block MCMC Methods
with Application to DSGE Models,” Journal of Econometrics, 155(1), 19–38.
Chopin, N. (2002): “A Sequential Particle Filter for Static Models,” Biometrika, 89(3),
539–551.
(2004): “Central Limit Theorem for Sequential Monte Carlo Methods and its
Application to Bayesian Inference,” Annals of Statistics, 32(6), 2385–2411.
Chopin, N., P. E. Jacob, and O. Papaspiliopoulos (2012): “SM C 2 : An Efficient
Algorithm for Sequential Analysis of State-Space Models,” arXiv:1101.1528.
Christiano, L. J. (2002): "Solving Dynamic Equilibrium Models by a Method of Undetermined Coefficients," Computational Economics, 20(1-2), 21–55.
Christiano, L. J., M. Eichenbaum, and C. L. Evans (2005): “Nominal Rigidities
and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,
113(1), 1–45.
Cox, W., and E. Hirschhorn (1983): “The Market Value of US Government Debt,”
Journal of Monetary Economics, 11, 261–272.
Creal, D. (2007): “Sequential Monte Carlo Samplers for Bayesian DSGE Models,” Unpublished Manuscript, Vrije Universiteit.
(2012): “A Survey of Sequential Monte Carlo Methods for Economics and Finance,”
Econometric Reviews, 31(3), 245–296.
Curdia, V., and R. Reis (2009): “Correlated Disturbances and U.S. Business Cycles,”
Working Paper.
(2010): “Correlated Disturbances and U.S. Business Cycles,” Manuscript, Columbia
University and FRB New York.
Davig, T., and E. M. Leeper (2007): “Generalizing the Taylor Principle,” American
Economic Review, 97(3), 607–635.
DeJong, D. N., B. F. Ingram, and C. H. Whiteman (2000): “A Bayesian Approach
to Dynamic Macroeconomics,” Journal of Econometrics, 98(2), 203–223.
DeJong, D. N., R. Liesenfeld, G. V. Moura, J.-F. Richard, and H. Dharmarajan (2013): “Efficient Likelihood Evaluation of State-Space Representations,” Review of
Economic Studies, 80(2), 538–567.
Del Moral, P. (2004): Feynman-Kac Formulae. Springer Verlag.
(2013): Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC.
Del Negro, M., R. B. Hasegawa, and F. Schorfheide (2014): “Dynamic Prediction Pools: An Investigation of Financial Frictions and Forecasting Performance,” NBER
Working Paper, 20575.
Del Negro, M., and F. Schorfheide (2008): “Forming Priors for DSGE Models (and
How it Affects the Assessment of Nominal Rigidities),” Journal of Monetary Economics,
55(7), 1191–1208.
(2013): “DSGE Model-Based Forecasting,” in Handbook of Economic Forecasting,
ed. by G. Elliott, and A. Timmermann, vol. 2, forthcoming. North Holland, Amsterdam.
Doucet, A., N. de Freitas, and N. Gordon (eds.) (2001): Sequential Monte Carlo
Methods in Practice. Springer Verlag.
Doucet, A., and A. M. Johansen (2011): “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later,” in Handbook of Nonlinear Filtering, ed. by D. Crisan, and B. Rozovsky. Oxford University Press.
Durbin, J., and S. J. Koopman (2001): Time Series Analysis by State Space Methods.
Oxford University Press.
Durham, G., and J. Geweke (2014): “Adaptive Sequential Posterior Simulators for
Massively Parallel Computing Environments,” in Advances in Econometrics, ed. by I. Jeliazkov, and D. Poirier, vol. 34, chap. 6, pp. 1–44. Emerald Group Publishing Limited.
Farmer, R., D. Waggoner, and T. Zha (2009): “Understanding Markov Switching
Rational Expectations Models,” Journal of Economic Theory, 144(5), 1849–1867.
Fernández-Villaverde, J., G. Gordon, P. Guerrón-Quintana, and J. F. Rubio-Ramírez (2012): “Nonlinear Adventures at the Zero Lower Bound,” National Bureau of
Economic Research Working Paper.
Fernandez-Villaverde, J., and J. F. Rubio-Ramirez (2004): “Comparing Dynamic
Equilibrium Models to Data: A Bayesian Approach,” Journal of Econometrics, 123(1),
153–187.
Fernández-Villaverde, J., and J. F. Rubio-Ramı́rez (2007): “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74(4), 1059–1087.
(2013): “Macroeconomics and Volatility: Data, Models, and Estimation,” in Advances in Economics and Econometrics: Tenth World Congress, ed. by D. Acemoglu,
M. Arellano, and E. Dekel, vol. 3, pp. 137–183. Cambridge University Press.
Flury, T., and N. Shephard (2011): “Bayesian Inference Based Only On Simulated
Likelihood: Particle Filter Analysis of Dynamic Economic Models,” Econometric Theory,
27, 933–956.
Foerster, A., J. F. Rubio-Ramirez, D. F. Waggoner, and T. Zha (2014): “Perturbation Methods for Markov-Switching DSGE Models,” Manuscript, FRB Kansas City
and Atlanta.
Gali, J. (2008): Monetary Policy, Inflation, and the Business Cycle: An Introduction to
the New Keynesian Framework. Princeton University Press.
Geweke, J. (1989): “Bayesian Inference in Econometric Models Using Monte Carlo Integration,” Econometrica, 57(6), 1317–1339.
(1999): “Using Simulation Methods for Bayesian Econometric Models: Inference,
Development, and Communication,” Econometric Reviews, 18(1), 1–126.
(2005): Contemporary Bayesian Econometrics and Statistics. John Wiley & Sons,
Inc.
Girolami, M., and B. Calderhead (2011): “Riemann Manifold Langevin and Hamilton
Monte Carlo Methods (with discussion),” Journal of the Royal Statistical Society Series
B, 73, 123–214.
Gordon, N., D. Salmond, and A. F. Smith (1993): “Novel Approach to Nonlinear/NonGaussian Bayesian State Estimation,” Radar and Signal Processing, IEE Proceedings F,
140(2), 107–113.
Guo, D., X. Wang, and R. Chen (2005): “New Sequential Monte Carlo Methods for
Nonlinear Dynamic Systems,” Statistics and Computing, 15, 135–147.
Gust, C., D. Lopez-Salido, and M. E. Smith (2012): “The Empirical Implications of
the Interest-Rate Lower Bound,” Manuscript, Federal Reserve Board.
Halpern, E. F. (1974): “Posterior Consistency for Coefficient Estimation and Model Selection in the General Linear Hypothesis,” Annals of Statistics, 2(4), 703–712.
Hamilton, J. D. (1989): “A New Approach to the Economic Analysis of Nonstationary
Time Series and the Business Cycle,” Econometrica, 57(2), 357–384.
(1994): Time Series Analysis. Princeton University Press.
Hammersley, J., and D. Handscomb (1964): Monte Carlo Methods. Methuen and Company, London.
Hastings, W. (1970): “Monte Carlo Sampling Methods Using Markov Chains and Their
Applications,” Biometrika, 57, 97–109.
Herbst, E. (2011): “Gradient and Hessian-based MCMC for DSGE Models,” Unpublished
Manuscript, University of Pennsylvania.
Herbst, E., and F. Schorfheide (2014): “Sequential Monte Carlo Sampling for DSGE
Models,” Journal of Applied Econometrics, forthcoming.
Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky (1999): “Bayesian
Model Averaging: A Tutorial,” Statistical Science, 14(4), 382–417.
Holmes, M. (1995): Introduction to Perturbation Methods. Cambridge University Press.
Ireland, P. N. (2004): “A Method for Taking Models to the Data,” Journal of Economic
Dynamics and Control, 28(6), 1205–1226.
Iskrev, N. (2010): “Local Identification of DSGE Models,” Journal of Monetary Economics, 57(2), 189–202.
Johnson, R. (1970): “Asymptotic Expansions Associated with Posterior Distributions,”
Annals of Mathematical Statistics, 41, 851–864.
Judd, K. (1998): Numerical Methods in Economics. MIT Press, Cambridge.
Justiniano, A., and G. E. Primiceri (2008): “The Time-Varying Volatility of Macroeconomic Fluctuations,” American Economic Review, 98(3), 604–641.
Kantas, N., A. Doucet, S. Singh, J. Maciejowski, and N. Chopin (2014): “On
Particle Methods for Parameter Estimation in State-Space Models,” arXiv Working Paper,
1412.8659v1.
Kass, R. E., and A. E. Raftery (1995): “Bayes Factors,” Journal of the American
Statistical Association, 90(430), 773–795.
Kim, J., S. Kim, E. Schaumburg, and C. A. Sims (2008): “Calculating and Using
Second-Order Accurate Solutions of Discrete Time Dynamic Equilibrium Models,” Journal
of Economic Dynamics and Control, 32, 3397–3414.
Kim, J., and F. Ruge-Murcia (2009): “How Much Inflation is Necessary to Grease the
Wheels,” Journal of Monetary Economics, 56, 365–377.
King, R. G., C. I. Plosser, and S. Rebelo (1988): “Production, Growth, and Business
Cycles: I The Basic Neoclassical Model,” Journal of Monetary Economics, 21(2-3), 195–
232.
King, R. G., and M. W. Watson (1998): “The Solution of Singular Linear Difference
Systems under Rational Expectations,” International Economic Review, 39(4), 1015–1026.
Klein, P. (2000): “Using the Generalized Schur Form to Solve a Multivariate Linear Rational Expectations Model,” Journal of Economic Dynamics and Control, 24(10), 1405–1423.
Kloek, T., and H. K. van Dijk (1978): “Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo,” Econometrica, 46, 1–20.
Koenker, R. (2005): Quantile Regression. Cambridge University Press.
Koenker, R., and G. Bassett (1978): “Regression Quantiles,” Econometrica, 46(1),
33–50.
Kohn, R., P. Giordani, and I. Strid (2010): “Adaptive Hybrid Metropolis-Hastings
Samplers for DSGE Models,” Working Paper.
Kollmann, R. (2014): “Tractable Latent State Filtering for Non-Linear DSGE Models
Using a Second-Order Approximation and Pruning,” Computational Economics, forthcoming.
Komunjer, I., and S. Ng (2011): “Dynamic Identification of DSGE Models,” Econometrica, 79(6), 1995–2032.
Koop, G. (2003): Bayesian Econometrics. John Wiley & Sons, Inc.
Künsch, H. R. (2005): “Recursive Monte Carlo Filters: Algorithms and Theoretical Analysis,” Annals of Statistics, 33(5), 1983–2021.
Kydland, F. E., and E. C. Prescott (1982): “Time to Build and Aggregate Fluctuations,” Econometrica, 50(6), 1345–1370.
Lancaster, T. (2004): An Introduction to Modern Bayesian Econometrics. Blackwell Publishing.
Leamer, E. E. (1978): Specification Searches. Wiley, New York.
Leeper, E. M., M. Plante, and N. Traum (2010): “Dynamics of Fiscal Financing
in the United States,” Journal of Econometrics, 156, 304–321.
Liu, J. S. (2001): Monte Carlo Strategies in Scientific Computing. Springer Verlag.
Liu, J. S., and R. Chen (1998): “Sequential Monte Carlo Methods for Dynamic Systems,”
Journal of the American Statistical Association, 93(443), 1032–1044.
Liu, J. S., R. Chen, and T. Logvinenko (2001): “A Theoretical Framework for Sequential Importance Sampling with Resampling,” in Sequential Monte Carlo Methods in
Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon, pp. 225–246. Springer Verlag.
Liu, Z., D. F. Waggoner, and T. Zha (2011): “Sources of Macroeconomic Fluctuations:
A Regime-switching DSGE Approach,” Quantitative Economics, 2, 251–301.
Lubik, T., and F. Schorfheide (2003): “Computing Sunspot Equilibria in Linear Rational Expectations Models,” Journal of Economic Dynamics and Control, 28(2), 273–285.
(2006): “A Bayesian Look at New Open Economy Macroeconomics,” NBER Macroeconomics Annual 2005.
Maliar, L., and S. Maliar (2014): “Merging Simulation and Projection Approaches
to Solve High-Dimensional Problems with an Application to a New Keynesian Model,”
Quantitative Economics, forthcoming.
Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller
(1953): “Equations of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, 21, 1087–1091.
Min, C.-K., and A. Zellner (1993): “Bayesian and Non-Bayesian Methods for Combining Models and Forecasts with Applications to Forecasting International Growth Rates,”
Journal of Econometrics, 56(1-2), 89–118.
Müller, U. (2011): “Measuring Prior Sensitivity and Prior Informativeness in Large
Bayesian Models,” Manuscript, Princeton University.
Murray, L. M., A. Lee, and P. E. Jacob (2014): “Parallel Resampling in the Particle
Filter,” arXiv Working Paper, 1301.4019v2.
Neal, R. (2010): “MCMC using Hamiltonian Dynamics,” in Handbook of Markov Chain
Monte Carlo, ed. by S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, pp. 113–162.
Chapman & Hall, CRC Press.
Otrok, C. (2001): “On Measuring the Welfare Costs of Business Cycles,” Journal of
Monetary Economics, 47(1), 61–92.
Phillips, D., and A. Smith (1994): “Bayesian Model Comparison via Jump Diffusions,”
Technical report 94-20, Imperial College of Science, Technology, and Medicine, London.
Pitt, M. K., and N. Shephard (1999): “Filtering via Simulation: Auxiliary Particle
Filters,” Journal of the American Statistical Association, 94(446), 590–599.
Qi, Y., and T. P. Minka (2002): “Hessian-based Markov Chain Monte-Carlo Algorithms,”
Unpublished Manuscript.
Rı́os-Rull, J.-V., F. Schorfheide, C. Fuentes-Albero, M. Kryshko, and
R. Santaeulalia-Llopis (2012): “Methods versus Substance: Measuring the Effects
of Technology Shocks,” Journal of Monetary Economics, 59(8), 826–846.
Robert, C. P. (1994): The Bayesian Choice. Springer Verlag.
Robert, C. P., and G. Casella (2004): Monte Carlo Statistical Methods. Springer.
Roberts, G., and J. S. Rosenthal (1998): “Markov-Chain Monte Carlo: Some Practical
Implications of Theoretical Results,” The Canadian Journal of Statistics, 25(1), 5–20.
Roberts, G., and O. Stramer (2002): “Langevin Diffusions and Metropolis-Hastings
Algorithms,” Methodology and Computing in Applied Probability, 4, 337–357.
Roberts, G. O., A. Gelman, and W. R. Gilks (1997): “Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms,” The Annals of Applied Probability,
7(1), 110–120.
Roberts, G. O., and S. Sahu (1997): “Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler,” Journal of the Royal Statistical Society.
Series B (Methodological), 59(2), 291–317.
Roberts, G. O., and R. Tweedie (1992): “Exponential Convergence of Langevin Diffusions and Their Discrete Approximations,” Bernoulli, 2, 341–363.
Rotemberg, J. J., and M. Woodford (1997): “An Optimization-Based Econometric
Framework for the Evaluation of Monetary Policy,” in NBER Macroeconomics Annual
1997, ed. by B. S. Bernanke, and J. J. Rotemberg. MIT Press, Cambridge.
Sargent, T. J. (1989): “Two Models of Measurements and the Investment Accelerator,”
Journal of Political Economy, 97(2), 251–287.
Schmitt-Grohé, S., and M. Uribe (2004): “Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function,” Journal of Economic
Dynamics and Control, 28, 755–775.
(2012): “What’s News in Business Cycles?,” Econometrica, forthcoming.
Schorfheide, F. (2000): “Loss Function-based Evaluation of DSGE Models,” Journal of
Applied Econometrics, 15, 645–670.
(2005): “Learning and Monetary Policy Shifts,” Review of Economic Dynamics,
8(2), 392–419.
(2010): “Estimation and Evaluation of DSGE Models: Progress and Challenges,”
NBER Working Paper.
Schorfheide, F., D. Song, and A. Yaron (2014): “Identifying Long-Run Risks: A
Bayesian Mixed-Frequency Approach,” NBER Working Paper, 20303.
Schwarz, G. (1978): “Estimating the Dimension of a Model,” Annals of Statistics, 6(2),
461–464.
Sims, C. A. (2002): “Solving Linear Rational Expectations Models,” Computational Economics, 20, 1–20.
Sims, C. A., D. Waggoner, and T. Zha (2008): “Methods for Inference in Large
Multiple-Equation Markov-Switching Models,” Journal of Econometrics, 146(2), 255–274.
Smets, F., and R. Wouters (2003): “An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area,” Journal of the European Economic Association, 1(5),
1123–1175.
Smets, F., and R. Wouters (2007): “Shocks and Frictions in US Business Cycles: A
Bayesian DSGE Approach,” American Economic Review, 97, 586–608.
Smith, M. (2012): “Estimating Nonlinear Economic Models Using Surrogate Transitions,”
Manuscript, RePEc.
Strid, I. (2009): “Efficient Parallelisation of Metropolis-Hastings Algorithms Using a
Prefetching Approach,” Computational Statistics and Data Analysis, in press.
Tierney, L. (1994): “Markov Chains for Exploring Posterior Distributions,” The Annals
of Statistics, 22(4), 1701–1728.
van der Vaart, A. (1998): Asymptotic Statistics. Cambridge University Press.
Woodford, M. (2003): Interest and Prices. Princeton University Press.
Wright, J. (2008): “Bayesian Model Averaging and Exchange Rate Forecasting,” Journal
of Econometrics, 146, 329–341.
Appendix A
Model Descriptions
A.1 Smets-Wouters Model
The log-linearized equilibrium conditions of the Smets and Wouters (2007) model take the
following form:
\hat{y}_t = c_y \hat{c}_t + i_y \hat{i}_t + z_y \hat{z}_t + \varepsilon^g_t    (A.1)

\hat{c}_t = \frac{h/\gamma}{1+h/\gamma} \hat{c}_{t-1} + \frac{1}{1+h/\gamma} E_t \hat{c}_{t+1} + \frac{w l_c (\sigma_c - 1)}{\sigma_c (1+h/\gamma)} ( \hat{l}_t - E_t \hat{l}_{t+1} )
    - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} ( \hat{r}_t - E_t \hat{\pi}_{t+1} ) - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \varepsilon^b_t    (A.2)

\hat{i}_t = \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} \hat{i}_{t-1} + \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} E_t \hat{i}_{t+1} + \frac{1}{\varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)})} \hat{q}_t + \varepsilon^i_t    (A.3)

\hat{q}_t = \beta(1-\delta)\gamma^{-\sigma_c} E_t \hat{q}_{t+1} - \hat{r}_t + E_t \hat{\pi}_{t+1} + (1 - \beta(1-\delta)\gamma^{-\sigma_c}) E_t \hat{r}^k_{t+1} - \varepsilon^b_t    (A.4)

\hat{y}_t = \Phi ( \alpha \hat{k}^s_t + (1-\alpha) \hat{l}_t + \varepsilon^a_t )    (A.5)

\hat{k}^s_t = \hat{k}_{t-1} + \hat{z}_t    (A.6)

\hat{z}_t = \frac{1-\psi}{\psi} \hat{r}^k_t    (A.7)

\hat{k}_t = \frac{1-\delta}{\gamma} \hat{k}_{t-1} + (1 - (1-\delta)/\gamma) \hat{i}_t + (1 - (1-\delta)/\gamma) \varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)}) \varepsilon^i_t    (A.8)

\hat{\mu}^p_t = \alpha ( \hat{k}^s_t - \hat{l}_t ) - \hat{w}_t + \varepsilon^a_t    (A.9)

\hat{\pi}_t = \frac{\beta\gamma^{(1-\sigma_c)}}{1+\iota_p \beta\gamma^{(1-\sigma_c)}} E_t \hat{\pi}_{t+1} + \frac{\iota_p}{1+\beta\gamma^{(1-\sigma_c)}} \hat{\pi}_{t-1}
    - \frac{(1-\beta\gamma^{(1-\sigma_c)}\xi_p)(1-\xi_p)}{(1+\iota_p \beta\gamma^{(1-\sigma_c)})(1+(\Phi-1)\epsilon_p)\xi_p} \hat{\mu}^p_t + \varepsilon^p_t    (A.10)

\hat{r}^k_t = \hat{l}_t + \hat{w}_t - \hat{k}^s_t    (A.11)

\hat{\mu}^w_t = \hat{w}_t - \sigma_l \hat{l}_t - \frac{1}{1-h/\gamma} ( \hat{c}_t - (h/\gamma) \hat{c}_{t-1} )    (A.12)

\hat{w}_t = \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} ( E_t \hat{w}_{t+1} + E_t \hat{\pi}_{t+1} ) + \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} ( \hat{w}_{t-1} - \iota_w \hat{\pi}_{t-1} )
    - \frac{1+\beta\gamma^{(1-\sigma_c)}\iota_w}{1+\beta\gamma^{(1-\sigma_c)}} \hat{\pi}_t - \frac{(1-\beta\gamma^{(1-\sigma_c)}\xi_w)(1-\xi_w)}{(1+\beta\gamma^{(1-\sigma_c)})(1+(\lambda_w-1)\epsilon_w)\xi_w} \hat{\mu}^w_t + \varepsilon^w_t    (A.13)

\hat{r}_t = \rho \hat{r}_{t-1} + (1-\rho)( r_\pi \hat{\pi}_t + r_y (\hat{y}_t - \hat{y}^*_t) ) + r_{\Delta y} ( (\hat{y}_t - \hat{y}^*_t) - (\hat{y}_{t-1} - \hat{y}^*_{t-1}) ) + \varepsilon^r_t    (A.14)

The exogenous shocks evolve according to

\varepsilon^a_t = \rho_a \varepsilon^a_{t-1} + \eta^a_t    (A.15)
\varepsilon^b_t = \rho_b \varepsilon^b_{t-1} + \eta^b_t    (A.16)
\varepsilon^g_t = \rho_g \varepsilon^g_{t-1} + \rho_{ga} \eta^a_t + \eta^g_t    (A.17)
\varepsilon^i_t = \rho_i \varepsilon^i_{t-1} + \eta^i_t    (A.18)
\varepsilon^r_t = \rho_r \varepsilon^r_{t-1} + \eta^r_t    (A.19)
\varepsilon^p_t = \rho_p \varepsilon^p_{t-1} + \eta^p_t - \mu_p \eta^p_{t-1}    (A.20)
\varepsilon^w_t = \rho_w \varepsilon^w_{t-1} + \eta^w_t - \mu_w \eta^w_{t-1}    (A.21)
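Note that the price and wage markup shocks, (A.20) and (A.21), are ARMA(1,1) processes, so the lagged innovation has to be carried along when simulating or filtering them. The following is a minimal simulation sketch (Python, with purely illustrative parameter values; it is not code from the paper):

    import numpy as np

    def simulate_arma11_shock(rho, mu, sigma, T, seed=0):
        # eps[t] = rho * eps[t-1] + eta[t] - mu * eta[t-1], cf. (A.20)-(A.21)
        rng = np.random.default_rng(seed)
        eta = sigma * rng.standard_normal(T)   # i.i.d. innovations eta_t
        eps = np.zeros(T)
        for t in range(1, T):
            eps[t] = rho * eps[t - 1] + eta[t] - mu * eta[t - 1]
        return eps

    # Illustrative values only (not estimates from the paper):
    eps_p = simulate_arma11_shock(rho=0.9, mu=0.7, sigma=0.1, T=200)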
The counterfactual no-rigidity prices and quantities evolve according to
\hat{y}^*_t = c_y \hat{c}^*_t + i_y \hat{i}^*_t + z_y \hat{z}^*_t + \varepsilon^g_t    (A.22)

\hat{c}^*_t = \frac{h/\gamma}{1+h/\gamma} \hat{c}^*_{t-1} + \frac{1}{1+h/\gamma} E_t \hat{c}^*_{t+1} + \frac{w l_c (\sigma_c - 1)}{\sigma_c (1+h/\gamma)} ( \hat{l}^*_t - E_t \hat{l}^*_{t+1} )
    - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \hat{r}^*_t - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \varepsilon^b_t    (A.23)

\hat{i}^*_t = \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} \hat{i}^*_{t-1} + \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} E_t \hat{i}^*_{t+1} + \frac{1}{\varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)})} \hat{q}^*_t + \varepsilon^i_t    (A.24)

\hat{q}^*_t = \beta(1-\delta)\gamma^{-\sigma_c} E_t \hat{q}^*_{t+1} - \hat{r}^*_t + (1 - \beta(1-\delta)\gamma^{-\sigma_c}) E_t \hat{r}^{k*}_{t+1} - \varepsilon^b_t    (A.25)

\hat{y}^*_t = \Phi ( \alpha \hat{k}^{s*}_t + (1-\alpha) \hat{l}^*_t + \varepsilon^a_t )    (A.26)

\hat{k}^{s*}_t = \hat{k}^*_{t-1} + \hat{z}^*_t    (A.27)

\hat{z}^*_t = \frac{1-\psi}{\psi} \hat{r}^{k*}_t    (A.28)

\hat{k}^*_t = \frac{1-\delta}{\gamma} \hat{k}^*_{t-1} + (1 - (1-\delta)/\gamma) \hat{i}^*_t + (1 - (1-\delta)/\gamma) \varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)}) \varepsilon^i_t    (A.29)

\hat{w}^*_t = \alpha ( \hat{k}^{s*}_t - \hat{l}^*_t ) + \varepsilon^a_t    (A.30)

\hat{r}^{k*}_t = \hat{l}^*_t + \hat{w}^*_t - \hat{k}^*_t    (A.31)

\hat{w}^*_t = \sigma_l \hat{l}^*_t + \frac{1}{1-h/\gamma} ( \hat{c}^*_t - (h/\gamma) \hat{c}^*_{t-1} )    (A.32)
The steady state (ratios) that appear in the measurement equation or the log-linearized
equilibrium conditions are given by
\gamma = \bar{\gamma}/100 + 1    (A.33)
\pi^* = \bar{\pi}/100 + 1    (A.34)
\bar{r} = 100 ( \beta^{-1} \gamma^{\sigma_c} \pi^* - 1 )    (A.35)
r^k_{ss} = \gamma^{\sigma_c}/\beta - (1-\delta)    (A.36)
w_{ss} = \left( \frac{\alpha^\alpha (1-\alpha)^{(1-\alpha)}}{\Phi (r^k_{ss})^\alpha} \right)^{\frac{1}{1-\alpha}}    (A.37)
i_k = (1 - (1-\delta)/\gamma) \gamma    (A.38)
l_k = \frac{1-\alpha}{\alpha} \frac{r^k_{ss}}{w_{ss}}    (A.39)
k_y = \Phi l_k^{(\alpha-1)}    (A.40)
i_y = (\gamma - 1 + \delta) k_y    (A.41)
c_y = 1 - g_y - i_y    (A.42)
z_y = r^k_{ss} k_y    (A.43)
w l_c = \frac{1}{\lambda_w} \frac{1-\alpha}{\alpha} \frac{r^k_{ss} k_y}{c_y}    (A.44)
The measurement equations take the form:
YGR_t = \bar{\gamma} + \hat{y}_t - \hat{y}_{t-1}
INF_t = \bar{\pi} + \hat{\pi}_t
FFR_t = \bar{r} + \hat{R}_t
CGR_t = \bar{\gamma} + \hat{c}_t - \hat{c}_{t-1}
IGR_t = \bar{\gamma} + \hat{i}_t - \hat{i}_{t-1}
WGR_t = \bar{\gamma} + \hat{w}_t - \hat{w}_{t-1}
HOURS_t = \bar{l} + \hat{l}_t    (A.45)
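The steady-state ratios (A.33)-(A.44) are closed-form functions of the deep parameters and supply the constants γ̄, π̄, r̄, and l̄ that enter the measurement equations (A.45). A sketch of this mapping (our own illustration, not the authors' code; names follow the appendix, and the defaults for δ and g_y are the values fixed during estimation):

    def sw_steady_state(gamma_bar, pi_bar, beta, sigma_c, alpha, Phi, lambda_w,
                        delta=0.025, g_y=0.18):
        # Equations (A.33)-(A.44), in order.
        gamma = gamma_bar / 100 + 1
        pi_star = pi_bar / 100 + 1
        r_bar = 100 * (gamma**sigma_c * pi_star / beta - 1)
        rk_ss = gamma**sigma_c / beta - (1 - delta)
        w_ss = (alpha**alpha * (1 - alpha)**(1 - alpha)
                / (Phi * rk_ss**alpha))**(1 / (1 - alpha))
        i_k = (1 - (1 - delta) / gamma) * gamma
        l_k = (1 - alpha) / alpha * rk_ss / w_ss
        k_y = Phi * l_k**(alpha - 1)
        i_y = (gamma - 1 + delta) * k_y
        c_y = 1 - g_y - i_y
        z_y = rk_ss * k_y
        wl_c = (1 / lambda_w) * (1 - alpha) / alpha * rk_ss * k_y / c_y
        return dict(gamma=gamma, pi_star=pi_star, r_bar=r_bar, rk_ss=rk_ss,
                    w_ss=w_ss, i_k=i_k, l_k=l_k, k_y=k_y, i_y=i_y, c_y=c_y,
                    z_y=z_y, wl_c=wl_c)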
Table A-1: Diffuse Prior for SW Model

Parameter        Type        Para (1)  Para (2)    Parameter  Type        Para (1)  Para (2)
ϕ                Normal      4.00      4.50        α          Normal      0.30      0.15
σc               Normal      1.50      1.11        ρa         Uniform     0.00      1.00
h                Uniform     0.00      1.00        ρb         Uniform     0.00      1.00
ξw               Uniform     0.00      1.00        ρg         Uniform     0.00      1.00
σl               Normal      2.00      2.25        ρi         Uniform     0.00      1.00
ξp               Uniform     0.00      1.00        ρr         Uniform     0.00      1.00
ιw               Uniform     0.00      1.00        ρp         Uniform     0.00      1.00
ιp               Uniform     0.00      1.00        ρw         Uniform     0.00      1.00
ψ                Uniform     0.00      1.00        µp         Uniform     0.00      1.00
Φ                Normal      1.25      0.36        µw         Uniform     0.00      1.00
rπ               Normal      1.50      0.75        ρga        Uniform     0.00      1.00
ρ                Uniform     0.00      1.00        σa         Inv. Gamma  0.10      2.00
ry               Normal      0.12      0.15        σb         Inv. Gamma  0.10      2.00
r∆y              Normal      0.12      0.15        σg         Inv. Gamma  0.10      2.00
π                Gamma       0.62      0.30        σi         Inv. Gamma  0.10      2.00
100(β^{-1} − 1)  Gamma       0.25      0.30        σr         Inv. Gamma  0.10      2.00
l                Normal      0.00      6.00        σp         Inv. Gamma  0.10      2.00
γ                Normal      0.40      0.30        σw         Inv. Gamma  0.10      2.00

Notes: Para (1) and Para (2) correspond to the mean and standard deviation of the Beta,
Gamma, and Normal distributions and to the lower and upper bounds of the support for
the Uniform distribution. For the Inv. Gamma distribution, Para (1) and Para (2) refer to
s and ν, where p(σ|ν, s) ∝ σ^{−ν−1} e^{−νs²/(2σ²)}. The following parameters are fixed during the
estimation: δ = 0.025, g_y = 0.18, λ_w = 1.50, ε_w = 10.0, and ε_p = 10.
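As a check on the parameterization described in the notes, the (unnormalized) Inv. Gamma density can be evaluated directly; a small sketch, assuming exactly the kernel stated above:

    import numpy as np

    def log_inv_gamma_kernel(sigma, s, nu):
        # log p(sigma | nu, s) up to a constant, with
        # p(sigma | nu, s) proportional to sigma^(-nu-1) * exp(-nu * s^2 / (2 * sigma^2))
        if sigma <= 0:
            return -np.inf
        return -(nu + 1) * np.log(sigma) - nu * s**2 / (2 * sigma**2)

    # The table's Para (1) = 0.10 and Para (2) = 2.00 correspond to s = 0.10, nu = 2.00:
    print(log_inv_gamma_kernel(0.2, s=0.10, nu=2.00))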
Table A-2: SW Model: Original Prior

Parameter        Type        Para (1)  Para (2)    Parameter  Type        Para (1)  Para (2)
ϕ                Normal      4.00      1.50        α          Normal      0.30      0.05
σc               Normal      1.50      0.37        ρa         Beta        0.50      0.20
h                Beta        0.70      0.10        ρb         Beta        0.50      0.20
ξw               Beta        0.50      0.10        ρg         Beta        0.50      0.20
σl               Normal      2.00      0.75        ρi         Beta        0.50      0.20
ξp               Beta        0.50      0.10        ρr         Beta        0.50      0.20
ιw               Beta        0.50      0.15        ρp         Beta        0.50      0.20
ιp               Beta        0.50      0.15        ρw         Beta        0.50      0.20
ψ                Beta        0.50      0.15        µp         Beta        0.50      0.20
Φ                Normal      1.25      0.12        µw         Beta        0.50      0.20
rπ               Normal      1.50      0.25        ρga        Beta        0.50      0.20
ρ                Beta        0.75      0.10        σa         Inv. Gamma  0.10      2.00
ry               Normal      0.12      0.05        σb         Inv. Gamma  0.10      2.00
r∆y              Normal      0.12      0.05        σg         Inv. Gamma  0.10      2.00
π                Gamma       0.62      0.10        σi         Inv. Gamma  0.10      2.00
100(β^{-1} − 1)  Gamma       0.25      0.10        σr         Inv. Gamma  0.10      2.00
l                Normal      0.00      2.00        σp         Inv. Gamma  0.10      2.00
γ                Normal      0.40      0.10        σw         Inv. Gamma  0.10      2.00

Notes: Para (1) and Para (2) correspond to the mean and standard deviation of the Beta,
Gamma, and Normal distributions and to the lower and upper bounds of the support for
the Uniform distribution. For the Inv. Gamma distribution, Para (1) and Para (2) refer to
s and ν, where p(σ|ν, s) ∝ σ^{−ν−1} e^{−νs²/(2σ²)}.
A.2 Leeper-Plante-Traum Fiscal Policy Model

The log-linearized equilibrium conditions of the Leeper, Plante, and Traum (2010) model are given by:
\hat{u}^b_t - \frac{\gamma(1+h)}{1-h} \hat{C}_t + \frac{\gamma h}{1-h} \hat{C}_{t-1} - \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    = \hat{R}_t + E_t \hat{u}^b_{t+1} - \frac{\gamma}{1-h} E_t \hat{C}_{t+1} - \frac{\tau^c}{1+\tau^c} E_t \hat{\tau}^c_{t+1}    (A.46)

\hat{u}^l_t + (1+\kappa) \hat{l}_t + \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    = \hat{Y}_t - \frac{\tau^l}{1+\tau^l} \hat{\tau}^l_t - \frac{\gamma}{1-h} \hat{C}_t + \frac{\gamma h}{1-h} \hat{C}_{t-1}    (A.47)

\hat{q}_t = E_t \hat{u}^b_{t+1} - \hat{u}^b_t - \frac{\gamma}{1-h} E_t \hat{C}_{t+1} + \frac{\gamma(1+h)}{1-h} \hat{C}_t - \frac{\gamma h}{1-h} \hat{C}_{t-1}
    - \frac{\tau^c}{1+\tau^c} E_t \hat{\tau}^c_{t+1} + \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    + \beta(1-\tau^k) \alpha \frac{Y}{K} E_t \hat{Y}_{t+1} - \beta(1-\tau^k) \alpha \frac{Y}{K} \hat{K}_t - \beta \tau^k \alpha \frac{Y}{K} E_t \hat{\tau}^k_{t+1}
    - \beta \delta_1 E_t \hat{\nu}_{t+1} + \beta(1-\delta_0) E_t \hat{q}_{t+1}    (A.48)

\hat{Y}_t = \frac{\tau^k}{1-\tau^k} \hat{\tau}^k_t + \hat{K}_{t-1} + \hat{q}_t + \left( 1 + \frac{\delta_2}{\delta_1} \right) \hat{\nu}_t    (A.49)

0 = \frac{1}{s''(1)} \hat{q}_t - (1+\beta) \hat{I}_t + \hat{I}_{t-1} + \beta E_t \hat{I}_{t+1} - \hat{u}^i_t + \beta E_t \hat{u}^i_{t+1}    (A.50)

Y \hat{Y}_t = C \hat{C}_t + G \hat{G}_t + I \hat{I}_t    (A.51)

\hat{K}_t = (1-\delta_0) \hat{K}_{t-1} + \delta_1 \hat{\nu}_t + \delta_0 \hat{I}_t    (A.52)

B \hat{B}_t + \tau^k \alpha Y ( \hat{\tau}^k_t + \hat{Y}_t ) + \tau^l (1-\alpha) Y ( \hat{\tau}^l_t + \hat{Y}_t ) + \tau^c C ( \hat{\tau}^c_t + \hat{C}_t )
    = \frac{B}{\beta} \hat{R}_{t-1} + \frac{B}{\beta} \hat{B}_{t-1} + G \hat{G}_t + Z \hat{Z}_t    (A.53)

\hat{Y}_t = \hat{u}^a_t + \alpha \hat{\nu}_t + \alpha \hat{K}_{t-1} + (1-\alpha) \hat{L}_t    (A.54)
The fiscal policy rules and the law of motion of the exogenous shock processes are provided
in the main text (Chapter 6.3).
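For estimation, a linear system such as (A.46)-(A.54) is typically stacked into the canonical form of Sims (2002), Γ_0 s_t = Γ_1 s_{t-1} + Ψ ε_t + Π η_t. As a minimal sketch, the row for the capital accumulation equation (A.52) can be loaded as follows (the state ordering and parameter values are illustrative assumptions, not the paper's code):

    import numpy as np

    # Illustrative ordering of a truncated state vector: s_t = [K_t, nu_t, I_t]
    n = 3
    Gamma0 = np.zeros((n, n))
    Gamma1 = np.zeros((n, n))

    delta0, delta1 = 0.025, 0.035   # illustrative values only

    # (A.52): K_t - delta1 * nu_t - delta0 * I_t = (1 - delta0) * K_{t-1}
    Gamma0[0, 0] = 1.0
    Gamma0[0, 1] = -delta1
    Gamma0[0, 2] = -delta0
    Gamma1[0, 0] = 1.0 - delta0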
Table A-3: Fiscal Model: Posterior Moments - Part 2

             Based on LPT Prior           Based on Diff. Prior
Parameter    Mean    [5%, 95%] Int.       Mean    [5%, 95%] Int.

Endogenous Propagation Parameters
γ            2.5     [1.82, 3.35]         2.5     [1.81, 3.31]
κ            2.4     [1.70, 3.31]         2.5     [1.74, 3.37]
h            0.57    [0.46, 0.68]         0.57    [0.46, 0.67]
s''          7.0     [6.08, 7.98]         6.9     [6.06, 7.89]
δ2           0.25    [0.16, 0.39]         0.24    [0.16, 0.37]

Exogenous Shock Parameters
ρa           0.96    [0.93, 0.98]         0.96    [0.93, 0.98]
ρb           0.65    [0.60, 0.69]         0.65    [0.60, 0.69]
ρl           0.98    [0.96, 1.00]         0.98    [0.96, 1.00]
ρi           0.48    [0.38, 0.57]         0.47    [0.37, 0.57]
ρg           0.96    [0.94, 0.98]         0.96    [0.94, 0.98]
ρtk          0.93    [0.89, 0.97]         0.94    [0.88, 0.98]
ρtl          0.98    [0.95, 1.00]         0.93    [0.86, 0.98]
ρtc          0.93    [0.89, 0.97]         0.97    [0.94, 0.99]
ρz           0.95    [0.91, 0.98]         0.95    [0.91, 0.98]
σb           7.2     [6.48, 8.02]         7.2     [6.47, 8.00]
σl           3.2     [2.55, 4.10]         3.2     [2.55, 4.08]
σi           5.7     [4.98, 6.47]         5.6     [4.98, 6.40]
σa           0.64    [0.59, 0.70]         0.64    [0.59, 0.70]
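The entries in Table A-3 are posterior means and equal-tailed 90 percent credible intervals. Given a vector of posterior draws for one parameter, both are one-liners; the sketch below uses synthetic draws purely for illustration:

    import numpy as np

    def posterior_summary(draws):
        # Mean and [5%, 95%] interval, as reported in the table.
        return np.mean(draws), np.percentile(draws, [5.0, 95.0])

    rng = np.random.default_rng(0)
    draws = rng.normal(2.5, 0.45, size=10_000)   # synthetic stand-in for sampler output
    mean, (lo, hi) = posterior_summary(draws)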
Appendix B
Data Sources
B.1 Small-Scale New Keynesian DSGE Model
The data for the estimation come from Lubik and Schorfheide (2006). Here we detail the construction of the extended sample (2003:I to 2013:IV) for Chapter 8.6.
1. Per Capita Real Output Growth. Take the level of real gross domestic product (FRED mnemonic “GDPC1”), call it GDP_t. Take the quarterly average of the Civilian Non-institutional Population (FRED mnemonic “CNP16OV” / BLS series “LNS10000000”), call it POP_t. Then,

   Per Capita Real Output Growth = 100 [ ln(GDP_t / POP_t) − ln(GDP_{t−1} / POP_{t−1}) ].
2. Annualized Inflation. Take the CPI price level (FRED mnemonic “CPIAUCSL”), call it CPI_t. Then,

   Annualized Inflation = 400 ln(CPI_t / CPI_{t−1}).
3. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FEDFUNDS”), call it FFR_t. Then,

   Federal Funds Rate = FFR_t.
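Assuming the raw series have already been downloaded from FRED and aligned at quarterly frequency as pandas Series (the retrieval step and all names below are our own), the three observables can be assembled as follows:

    import numpy as np
    import pandas as pd

    def small_scale_observables(gdp, pop, cpi, ffr):
        # gdp: GDPC1, pop: quarterly-averaged CNP16OV, cpi: CPIAUCSL, ffr: FEDFUNDS
        ygr = 100 * np.log(gdp / pop).diff()   # per capita real output growth
        inf = 400 * np.log(cpi).diff()         # annualized inflation
        return pd.DataFrame({"YGR": ygr, "INF": inf, "FFR": ffr}).dropna()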
B.2 Smets-Wouters Model
The data cover 1966:Q1 to 2004:Q4. The construction follows that of Smets and Wouters (2007). Output data come from the NIPA; other sources are noted in the exposition.
1. Per Capita Real Output Growth. Take the level of real gross domestic product (FRED mnemonic “GDPC1”), call it GDP_t. Take the quarterly average of the Civilian Non-institutional Population (FRED mnemonic “CNP16OV” / BLS series “LNS10000000”), normalized so that its 1992:Q3 value is one, call it POP_t. Then,

   Per Capita Real Output Growth = 100 [ ln(GDP_t / POP_t) − ln(GDP_{t−1} / POP_{t−1}) ].
2. Per Capita Real Consumption Growth. Take the level of personal consumption expenditures (FRED mnemonic “PCEC”), call it CONS_t. Take the level of the GDP price deflator (FRED mnemonic “GDPDEF”), call it GDPP_t. Then,

   Per Capita Real Consumption Growth = 100 [ ln(CONS_t / (GDPP_t POP_t)) − ln(CONS_{t−1} / (GDPP_{t−1} POP_{t−1})) ].
3. Per Capita Real Investment Growth. Take the level of fixed private investment (FRED mnemonic “FPI”), call it INV_t. Then,

   Per Capita Real Investment Growth = 100 [ ln(INV_t / (GDPP_t POP_t)) − ln(INV_{t−1} / (GDPP_{t−1} POP_{t−1})) ].
4. Per Capita Real Wage Growth. Take the BLS measure of compensation per hour for the nonfarm business sector (FRED mnemonic “COMPNFB” / BLS series “PRS85006103”), call it W_t. Then,

   Per Capita Real Wage Growth = 100 [ ln(W_t / GDPP_t) − ln(W_{t−1} / GDPP_{t−1}) ].
5. Per Capita Hours Index. Take the index of average weekly nonfarm business hours (FRED mnemonic / BLS series “PRS85006023”), call it HOURS_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it EMP_t. Then,

   Per Capita Hours = 100 ln( HOURS_t EMP_t / POP_t ).

   The series is then demeaned.
6. Inflation. Take the GDP price deflator, then

   Inflation = 100 ln(GDPP_t / GDPP_{t−1}).
7. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FEDFUNDS”), call it FFR_t. Then,

   Federal Funds Rate = FFR_t / 4.
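Unlike the growth-rate observables, the hours series is a demeaned log level. A sketch, assuming quarterly pandas Series with a PeriodIndex so that the 1992:Q3 normalization can be looked up by label (function and names are our own):

    import numpy as np

    def per_capita_hours(hours_index, emp, pop):
        # Item 5: normalize EMP to 1 in 1992:Q3, form the log level, then demean.
        emp_norm = emp / emp.loc["1992Q3"]
        h = 100 * np.log(hours_index * emp_norm / pop)
        return h - h.mean()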
B.3 Leeper-Plante-Traum Fiscal Policy Model
The data cover 1960:Q1 to 2008:Q1. The construction follows that of Leeper, Plante, and Traum (2010). Output data come from the NIPA; other sources are noted in the exposition. Each series has a (separate) linear trend removed prior to estimation.
1. Real Investment. Take nominal personal consumption expenditures on durable goods (FRED mnemonic “PCDG”), call it PCED_t, and deflate it by the GDP deflator for personal consumption (FRED mnemonic “DPCERD3Q086SBEA”), call it P_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it POP_t. Then,

   Real Investment = 100 ln( (PCED_t / P_t) / POP_t ).
2. Real Consumption. Take nominal personal consumption expenditures on nondurable goods (FRED mnemonic “PCNG”), call it PCE_t, and deflate it by the GDP deflator for personal consumption (FRED mnemonic “DPCERD3Q086SBEA”), call it P_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it POP_t. Then,

   Real Consumption = 100 ln( (PCE_t / P_t) / POP_t ).
3. Real Hours Worked. Take the index of average weekly nonfarm business hours (FRED mnemonic / BLS series “PRS85006023”), call it HOURS_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it EMP_t. Then,

   Real Hours Worked = 100 ln( HOURS_t EMP_t / POP_t ).
4. Real Consumption Tax Revenues. Take federal government current tax receipts from production and imports (FRED mnemonic “W007RC1Q027SBEA”), call it CTAX_t. Then,

   Real Consumption Tax Revenues = 100 ln( (CTAX_t / P_t) / POP_t ).
5. Real Labor Tax Revenues. Take personal current tax revenues (FRED mnemonic “A074RC1Q027SBEA”), call it IT_t; take wage and salary accruals (FRED mnemonic “WASCUR”), call it W_t; and take proprietors’ income (FRED mnemonic “A04RC1Q027SBEA”), call it PRI_t. Take rental income (FRED mnemonic “RENTIN”), call it RENT_t; take corporate profits (FRED mnemonic “CPROFIT”), call it PROF_t; and take interest income (FRED mnemonic “W255RC1Q027SBEA”), call it INT_t. Define capital income, CI_t, as

   CI_t = RENT_t + PROF_t + INT_t + PRI_t / 2.

   Define the average personal income tax rate as

   τ^p_t = IT_t / (W_t + PRI_t / 2 + CI_t).

   Take contributions for government social insurance (FRED mnemonic “W780RC1Q027SBEA”), call it CSI_t, and take compensation of employees (FRED mnemonic “A57RC1Q027SBEA”), call it EC_t. Define the average labor income tax rate as

   τ^l_t = ( τ^p_t (W_t + PRI_t / 2) + CSI_t ) / (EC_t + PRI_t / 2).

   Take the tax base BASE_t = PCE_t + PCED_t. Then

   Real Labor Tax Revenues = 100 ln( (τ^l_t BASE_t / P_t) / POP_t ).
6. Real Capital Tax Revenues. Take taxes on corporate income (FRED mnemonic “B075RC1Q027SBEA”), call it CT_t, and take property taxes (FRED mnemonic “B249RC1Q027SBEA”), call it PT_t. Define the average capital income tax rate as

   τ^k_t = ( τ^p_t CI_t + CT_t ) / (CI_t + PT_t).

   Then

   Real Capital Tax Revenues = 100 ln( (τ^k_t BASE_t / P_t) / POP_t ).

   A code sketch collecting these tax-rate definitions appears after item 9 below.
7. Real Government Expenditure. Take government consumption expenditures (FRED mnemonic “FGEXPND”), call it GC_t. Take government gross investment (FRED mnemonic “A787RC1Q027SBEA”), call it GI_t. Take government net purchases of non-produced assets (FRED mnemonic “AD08RC1A027NBEA”), call it GP_t. Finally, take government consumption of fixed capital (FRED mnemonic “A918RC1Q027SBEA”), call it GCK_t. Define G_t = GC_t + GI_t + GP_t − GCK_t. Then

   Real Government Expenditure = 100 ln( (G_t / P_t) / POP_t ).
8. Real Government Transfers. Take current transfer payments (FRED mnemonic “A084RC1Q027SBEA”), call it TRANSPAY_t, and take current transfer receipts (FRED mnemonic “A577RC1Q027SBEA”), call it TRANSREC_t. Define net current transfers as

   CURRTRANS_t = TRANSPAY_t − TRANSREC_t.

   Take capital transfer payments (FRED mnemonic “W069RC1Q027SBEA”), call it CAPTRANSPAY_t, and take capital transfer receipts (FRED mnemonic “B232RC1Q027SBEA”), call it CAPTRANSREC_t. Define net capital transfers as

   CAPTRANS_t = CAPTRANSPAY_t − CAPTRANSREC_t.

   Take current tax receipts (FRED mnemonic “W006RC1Q027SBEA”), call it TAXREC_t; take income receipts on assets (FRED mnemonic “W210RC1Q027SBEA”), call it INCREC_t; and take the current surplus of government enterprises (FRED mnemonic “A108RC1Q027SBEA”), call it GOVSRP_t. Define the total tax revenue, T_t, as the sum of consumption, labor, and capital tax revenues. Define the tax residual as

   TAXRESID_t = TAXREC_t + INCREC_t + CSI_t + GOVSRP_t − T_t.

   Define

   TR_t = CURRTRANS_t + CAPTRANS_t − TAXRESID_t.

   Then

   Real Government Transfers = 100 ln( (TR_t / P_t) / POP_t ).
9. Real Government Debt. Take interest payments (FRED mnemonic “A091RC1Q027SBEA”), call it INTPAY_t. Define net borrowing as

   NB_t = G_t + INTPAY_t + TR_t − T_t.

   Take the adjusted monetary base (FRED mnemonic “AMBSL”), take quarterly averages, and call it M_t. Then define government debt, starting in 1947, as

   B_t = NB_t − ∆M_t + B_{t−1}.

   The value B_{1947:Q1} is taken from Cox and Hirschhorn (1983). Then,

   Real Government Debt = 100 ln( (B_t / P_t) / POP_t ).
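The effective tax rates in items 5 and 6 chain together (the personal rate feeds both the labor and the capital rate), and the debt series in item 9 is a simple recursion. A sketch collecting both, assuming the series above are aligned pandas Series or NumPy arrays (function and argument names are our own):

    def average_tax_rates(IT, W, PRI, RENT, PROF, INT, CSI, EC, CT, PT):
        # Items 5-6: capital income, then the three average tax rates.
        CI = RENT + PROF + INT + PRI / 2
        tau_p = IT / (W + PRI / 2 + CI)                         # personal income
        tau_l = (tau_p * (W + PRI / 2) + CSI) / (EC + PRI / 2)  # labor income
        tau_k = (tau_p * CI + CT) / (CI + PT)                   # capital income
        return tau_p, tau_l, tau_k

    def government_debt(nb, dm, b0):
        # Item 9: B_t = NB_t - dM_t + B_{t-1}, seeded with the 1947:Q1 value
        # from Cox and Hirschhorn (1983).
        path, b = [], b0
        for nb_t, dm_t in zip(nb, dm):
            b = nb_t - dm_t + b
            path.append(b)
        return path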