Particle MCMC and Sequential Monte Carlo Squared for DSGE Models

Edward Herbst (Federal Reserve Board)
Frank Schorfheide∗ (University of Pennsylvania, CEPR, and NBER)
February 13, 2015
Abstract
This paper explores the efficacy and accuracy of Bayesian estimation of nonlinear dynamic
stochastic general equilibrium (DSGE) models using two different approaches. First, we consider particle
Markov chain Monte Carlo (PMCMC) techniques [Andrieu, Doucet, and Holenstein (2010)],
wherein a particle filter-based estimate of the likelihood function is used in lieu of an exact
likelihood within an MCMC algorithm. Second, we consider an alternative approach in which
the parameters are estimated jointly with the states of the nonlinear model, as in Chopin, Jacob,
and Papaspiliopoulos (2012). We assess the effectiveness of various configurations of these two
algorithms using two DSGE models where the true likelihood is known. Specifically, we focus
on how the number of particles, type of particle filter, and size of model affects the numerical
efficiency of both posterior samplers. We also discuss computational considerations important
for implementing both approaches.
JEL CLASSIFICATION: C11, C15, E10
KEY WORDS: Bayesian Analysis, DSGE Models, Monte Carlo Methods
∗Correspondence: E. Herbst: Board of Governors of the Federal Reserve System, 20th Street and Constitution
Avenue N.W., Washington, D.C. 20551. Email: edward.p.herbst@frb.gov. F. Schorfheide: Department of Economics,
3718 Locust Walk, University of Pennsylvania, Philadelphia, PA 19104-6297. Email: schorf@ssc.upenn.edu.
1 Introduction
We explore the efficacy and accuracy of Bayesian estimation of nonlinear dynamic stochastic general equilibrium (DSGE) models. Bayesian estimation of such models is challenging and relatively few papers have attempted full posterior elicitation: see, for example, Fernandez-Villaverde and Rubio-Ramirez (2004) and Gust, Lopez-Salido, and Smith (2012). Therefore, there is relatively little understanding of the econometric performance of previously-used methods, such as particle MCMC, and of newer sequential Monte Carlo techniques.
We assess two posterior simulators for nonlinear state space models. First, we consider
particle Markov chain Monte Carlo (PMCMC) techniques [Andrieu, Doucet, and Holenstein (2010)],
wherein a particle filter-based estimate of the likelihood function is used in lieu of an exact likelihood
within an MCMC algorithm. Second, we consider an alternative approach in which the parameters are estimated jointly with the states of the nonlinear model, as in Chopin, Jacob, and Papaspiliopoulos (2012), using the so-called sequential Monte Carlo squared (SMC²) algorithm. Since we are only
interested in the econometrics of both simulators, we use linear models as our benchmarks. This
allows us to compare both approaches to their Kalman filter-based equivalents, and thus isolate
the loss in efficiency attributable solely to the particle approximation to the likelihood. We use a
small-scale New Keynesian model and the Smets and Wouters (2007) model (henceforth, the SW model) to get a sense of how model complexity affects the simulators.
For both approaches, the particle filter forms the basis of the estimation. In our examples, we
span (loosely speaking) “best” and “worst” case scenarios, using both the naive bootstrap particle
filter and a particle filter tailored with a so-called “conditionally-optimal” proposal distribution.
We also vary the number of particles as well as several other tuning parameters in our examination.
Our results can be summarized as follows. For both PMCMC and SMC², there can be substantial gains from adapting the particle filter to the particular economic model being estimated. The bootstrap particle filter performs very poorly in most cases, requiring substantially more particles to obtain reliable likelihood estimates. This inaccuracy affects the posterior simulators: for the larger SW model, the particle MCMC algorithm broke down completely under the bootstrap particle filter. Finally, SMC² seems more sensitive to the accuracy of the particle approximation than PMCMC. While PMCMC using the conditionally-optimal particle filter performs nearly as well as its Kalman filter-based counterpart, the same is not true for SMC², which is clearly less efficient than a Kalman filter-based SMC algorithm.
The material that follows is abstracted from the yet-to-be-published book monograph, “Bayesian Estimation of DSGE Models.” The discussion of the particle MCMC and sequential Monte Carlo methods builds on some of the earlier material in the book, so we include the relevant sections here for completeness. We provide a rough table of contents for navigating the document below.
• Chapter 1. Describes the small-scale New Keynesian model and the SW model used in the
simulations.
• Chapter 4. Describes the basics of the random walk Metropolis-Hastings algorithm that is
the backbone of the particle MCMC method.
• Chapter 5. Describes the basic Sequential Monte Carlo (SMC) method for estimating model
parameters that forms the basis for the SMC² method.
• Chapter 8. Introduces the particle filter for evaluating the likelihood of nonlinear state
space models. We define and describe the bootstrap version of the particle filter as well as
one based on “conditionally optimal” proposals.
• Chapter 9. Describes the PMCMC algorithm and shows posterior simulation results for the
small-scale New Keynesian model and the SW model.
• Chapter 10. Describes the SMC² algorithm and shows posterior simulation results for the
small-scale New Keynesian model.
Bayesian Estimation of DSGE Models
Edward Herbst
Frank Schorfheide
February 5, 2015
Chapter 1
DSGE Modeling
Estimated dynamic stochastic general equilibrium (DSGE) models are now widely used by academics to conduct empirical research in macroeconomics as well as by central banks to interpret the current state of the economy, analyze the impact of changes in monetary or fiscal
policy, and to generate predictions for key macroeconomic aggregates. The term DSGE
model encompasses a broad class of macroeconomic models that span the real business cycle
models of Kydland and Prescott (1982) and King, Plosser, and Rebelo (1988) as well as the
New Keynesian models of Rotemberg and Woodford (1997) or Christiano, Eichenbaum, and
Evans (2005), which feature nominal price and wage rigidities and a role for central banks
to adjust interest rates in response to inflation and output fluctuations. A common feature of these models is that decision rules of economic agents are derived from assumptions
about preferences and technologies by solving intertemporal optimization problems. Moreover, agents potentially face uncertainty with respect to aggregate variables such as total
factor productivity or nominal interest rates set by a central bank. This uncertainty is generated by exogenous stochastic processes that may shift technology or generate unanticipated
deviations from a central bank’s interest-rate feedback rule.
The focus of this book is the Bayesian estimation of DSGE models. Conditional on distributional assumptions for the exogenous shocks, the DSGE model generates a likelihood
function, that is, a joint probability distribution for the endogenous model variables such
as output, consumption, investment, and inflation that depends on the structural parameters of the model. These structural parameters characterize agents’ preferences, production
technologies, and the law of motion of the exogenous shocks. In a Bayesian framework, this
likelihood function can be used to transform a prior distribution for the structural parameters
of the DSGE model into a posterior distribution. This posterior is the basis for substantive
inference and decision making. Unfortunately, it is not feasible to characterize moments and
quantiles of the posterior distribution analytically. Instead, we have to use computational
techniques to generate draws from the posterior and then approximate posterior expectations
by Monte Carlo averages.
In Section 1.1 we will present a small-scale New Keynesian DSGE model and describe
the decision problems of firms and households and the behavior of the monetary and fiscal
authorities. We then characterize the resulting equilibrium conditions. This model is subsequently used in many of the numerical illustrations. Section 1.2 briefly sketches two other
DSGE models that will be estimated in subsequent chapters.
1.1 A Small-Scale New Keynesian DSGE Model
We begin with a small-scale New Keynesian DSGE model that has been widely studied in
the literature (see Woodford (2003) or Gali (2008) for textbook treatments). The particular
specification presented below is based on An and Schorfheide (2007). The model economy
consists of final good producing firms, intermediate goods producing firms, households, a
central bank, and a fiscal authority. We will first describe the decision problems of these
agents, then describe the law of motion of the exogenous processes, and finally summarize
the equilibrium conditions. The likelihood function for a linearized version of this model can
be quickly evaluated, which makes the model an excellent showcase for the computational
algorithms studied in this book.
1.1.1 Firms
Production takes place in two stages. There are monopolistically-competitive intermediate
goods producing firms and perfectly competitive final goods producing firms that aggregate
the intermediate goods into a single good that is used for household and government consumption. This two-stage production process makes it fairly straightforward to introduce
price stickiness, which in turn creates a real effect of monetary policy.
The perfectly competitive final good producing firms combine a continuum of intermediate
goods indexed by j ∈ [0, 1] using the technology
$$ Y_t = \left( \int_0^1 Y_t(j)^{1-\nu} \, dj \right)^{\frac{1}{1-\nu}}. \tag{1.1} $$
The final good producers take input prices $P_t(j)$ and output prices $P_t$ as given. The revenue from the sale of the final good is $P_t Y_t$ and the input costs incurred to produce $Y_t$ are $\int_0^1 P_t(j) Y_t(j) \, dj$. Maximization of profits
$$ \Pi_t = P_t \left( \int_0^1 Y_t(j)^{1-\nu} \, dj \right)^{\frac{1}{1-\nu}} - \int_0^1 P_t(j) Y_t(j) \, dj, \tag{1.2} $$
with respect to the inputs Yt (j) implies that the demand for intermediate good j is given by
$$ Y_t(j) = \left( \frac{P_t(j)}{P_t} \right)^{-1/\nu} Y_t. \tag{1.3} $$
Thus, the parameter $1/\nu$ represents the elasticity of demand for each intermediate good. In the absence of an entry cost, final good producers will enter the market until profits are equal to zero. From the zero-profit condition, it is possible to derive the following relationship
between the intermediate goods prices and the price of the final good:
$$ P_t = \left( \int_0^1 P_t(j)^{\frac{\nu-1}{\nu}} \, dj \right)^{\frac{\nu}{\nu-1}}. \tag{1.4} $$
Intermediate good j is produced by a monopolist who has access to the following linear
production technology:
$$ Y_t(j) = A_t N_t(j), \tag{1.5} $$
where $A_t$ is an exogenous productivity process that is common to all firms and $N_t(j)$ is the labor input of firm $j$. To keep the model simple, we abstract from capital as a factor of production for now. Labor is hired in a perfectly competitive factor market at the real wage $W_t$.
In order to introduce nominal price stickiness, we assume that firms face quadratic price
adjustment costs
$$ AC_t(j) = \frac{\phi}{2} \left( \frac{P_t(j)}{P_{t-1}(j)} - \pi \right)^2 Y_t(j), \tag{1.6} $$
where φ governs the price rigidity in the economy and π is the steady state inflation rate
associated with the final good. Under this adjustment cost specification it is costless to
change prices at the rate π. If the price change deviates from π the firm incurs a cost in
terms of lost output that is a quadratic function of the discrepancy between the price change
and π. The larger the adjustment cost parameter φ, the more reluctant the intermediate
goods producers are to change their prices and the more rigid the prices are at the aggregate
level. Firm j chooses its labor input Nt (j) and the price Pt (j) to maximize the present value
of future profits
"
Et
∞
X
β s Qt+s|t
s=0
#
Pt+s (j)
Yt+s (j) − Wt+s Nt+s (j) − ACt+s (j) .
Pt+s
(1.7)
Here, Qt+s|t is the time t value of a unit of the consumption good in period t + s to the
household, which is treated as exogenous by the firm.
1.1.2 Households
The representative household derives utility from consumption $C_t$ relative to a habit stock (which is approximated by the level of technology $A_t$)¹ and real money balances $M_t/P_t$. The household derives disutility from hours worked $H_t$ and maximizes
$$ \mathbb{E}_t \left[ \sum_{s=0}^{\infty} \beta^s \left( \frac{(C_{t+s}/A_{t+s})^{1-\tau} - 1}{1-\tau} - \chi_H H_{t+s} + \chi_M \ln \left( \frac{M_{t+s}}{P_{t+s}} \right) \right) \right], \tag{1.8} $$
where β is the discount factor, 1/τ is the intertemporal elasticity of substitution, and χM
and χH are scale factors that determine steady state real money balances and hours worked.
We will set χH = 1. The household supplies perfectly elastic labor services to the firms,
taking the real wage Wt as given. The household has access to a domestic bond market
where nominal government bonds Bt are traded that pay (gross) interest Rt . Furthermore,
it receives aggregate residual real profits Dt from the firms and has to pay lump-sum taxes
Tt . Thus, the household’s budget constraint is of the form
$$ P_t C_t + B_t + M_t + T_t = P_t W_t H_t + R_{t-1} B_{t-1} + M_{t-1} + P_t D_t + P_t SC_t, \tag{1.9} $$
where SCt is the net cash inflow from trading a full set of state-contingent securities.
1.1.3 Monetary and Fiscal Policy
Monetary policy is described by an interest rate feedback rule of the form
$$ R_t = (R_t^*)^{1-\rho_R} \, R_{t-1}^{\rho_R} \, e^{\epsilon_{R,t}}, \tag{1.10} $$
¹This assumption ensures that the economy evolves along a balanced growth path even if the utility function is additively separable in consumption, real money balances, and leisure.
where $\epsilon_{R,t}$ is a monetary policy shock and $R_t^*$ is the (nominal) target rate:
$$ R_t^* = r \pi^* \left( \frac{\pi_t}{\pi^*} \right)^{\psi_1} \left( \frac{Y_t}{Y_t^*} \right)^{\psi_2}. \tag{1.11} $$
Here r is the steady state real interest rate (defined below), πt is the gross inflation rate
defined as πt = Pt /Pt−1 , and π ∗ is the target inflation rate. Yt∗ in (1.11) is the level of output
that would prevail in the absence of nominal rigidities.
We assume that the fiscal authority consumes a fraction ζt of aggregate output Yt , that is
Gt = ζt Yt , and that ζt ∈ [0, 1] follows an exogenous process specified below. The government
levies a lump-sum tax Tt (subsidy) to finance any shortfalls in government revenues (or to
rebate any surplus). The government’s budget constraint is given by
$$ P_t G_t + R_{t-1} B_{t-1} + M_{t-1} = T_t + B_t + M_t. \tag{1.12} $$
1.1.4 Exogenous Processes
The model economy is perturbed by three exogenous processes. Aggregate productivity
evolves according to
$$ \ln A_t = \ln \gamma + \ln A_{t-1} + \ln z_t, \quad \text{where} \quad \ln z_t = \rho_z \ln z_{t-1} + \epsilon_{z,t}. \tag{1.13} $$
Thus, on average technology grows at the rate $\gamma$ and $z_t$ captures exogenous fluctuations of the technology growth rate. Define $g_t = 1/(1-\zeta_t)$, where $\zeta_t$ was previously defined as the fraction of aggregate output purchased by the government. We assume that
$$ \ln g_t = (1-\rho_g) \ln g + \rho_g \ln g_{t-1} + \epsilon_{g,t}. \tag{1.14} $$
Finally, the monetary policy shock $\epsilon_{R,t}$ is assumed to be serially uncorrelated. The three innovations are independent of each other at all leads and lags and are normally distributed with means zero and standard deviations $\sigma_z$, $\sigma_g$, and $\sigma_R$, respectively.
1.1.5 Equilibrium Relationships
We consider the symmetric equilibrium in which all intermediate goods producing firms make
identical choices so that the j subscript can be omitted. The market clearing conditions are
given by
$$ Y_t = C_t + G_t + AC_t \quad \text{and} \quad H_t = N_t. \tag{1.15} $$
Because the households have access to a full set of state-contingent claims, it turns out that
Qt+s|t in (1.7) is
$$ Q_{t+s|t} = (C_{t+s}/C_t)^{-\tau} (A_t/A_{t+s})^{1-\tau}. \tag{1.16} $$
Thus, in equilibrium households and firms are using the same stochastic discount factor.
Moreover, it can be shown that output, consumption, interest rates, and inflation have to
satisfy the following optimality conditions
"
#
−τ
Ct+1 /At+1
At Rt
1 = βEt
Ct /At
At+1 πt+1
τ 1
Ct
1
π
1 =
1−
+ φ(πt − π) 1 −
πt +
ν
At
2ν
2ν
"
#
−τ
Ct+1 /At+1
Yt+1 /At+1
−φβEt
(πt+1 − π)πt+1 .
Ct /At
Yt /At
(1.17)
(1.18)
(1.17) is the consumption Euler equation, which reflects the first-order condition with respect
to the government bonds Bt . In equilibrium, the household equates the marginal utility of
consuming a dollar today with the discounted marginal utility from investing the dollar,
earning interest Rt and consuming it in the next period. (1.18) characterizes the profit
maximizing choice of the intermediate goods producing firms. The first-order condition for
the firms’ problem depends on the wage Wt . We used the households’ labor supply condition
to replace Wt by a function of the marginal utility of consumption. In the absence of nominal
rigidities (φ = 0) aggregate output is given by
$$ Y_t^* = (1-\nu)^{1/\tau} A_t g_t, \tag{1.19} $$
which is the target level of output that appears in the monetary policy rule (1.11).
In Chapter 2.1 we will use a solution technique for the DSGE model that is based on a
Taylor series approximation of the equilibrium conditions. A natural point around which to
construct this approximation is the steady state of the DSGE model. The steady state is
attained by setting the innovations R,t , g,t , and z,t to zero at all times. Because technology
ln At evolves according to a random walk with drift ln γ, consumption and output need to
be detrended for a steady state to exist. Let $c_t = C_t/A_t$, $y_t = Y_t/A_t$, and $y_t^* = Y_t^*/A_t$. Then the steady state is given by
$$ \pi = \pi^*, \quad r = \frac{\gamma}{\beta}, \quad R = r\pi^*, \quad c = (1-\nu)^{1/\tau}, \quad \text{and} \quad y = g c = y^*. \tag{1.20} $$
Steady state inflation equals the targeted inflation rate $\pi^*$; the real rate depends on the growth rate of the economy $\gamma$ and the reciprocal of the households’ discount factor $\beta$; the nominal interest rate is determined by the Fisher equation; the dependence of steady state consumption on $\nu$ reflects the distortion generated by the monopolistic competition among intermediate goods producers; and, finally, steady state output can be determined from the aggregate resource constraint. We are now in a position to rewrite the equilibrium
conditions by expressing each variable in terms of percentage deviations from its steady state
value. Let x̂t = ln(xt /x) and write
$$ 1 = \beta \mathbb{E}_t \left[ e^{-\tau \hat{c}_{t+1} + \tau \hat{c}_t + \hat{R}_t - \hat{z}_{t+1} - \hat{\pi}_{t+1}} \right] \tag{1.21} $$
$$ \frac{1-\nu}{\nu \phi \pi^2} \left( e^{\tau \hat{c}_t} - 1 \right) = \left( e^{\hat{\pi}_t} - 1 \right) \left[ \left( 1 - \frac{1}{2\nu} \right) e^{\hat{\pi}_t} + \frac{1}{2\nu} \right] - \beta \mathbb{E}_t \left[ \left( e^{\hat{\pi}_{t+1}} - 1 \right) e^{-\tau \hat{c}_{t+1} + \tau \hat{c}_t + \hat{y}_{t+1} - \hat{y}_t + \hat{\pi}_{t+1}} \right] \tag{1.22} $$
$$ e^{\hat{c}_t - \hat{y}_t} = e^{-\hat{g}_t} - \frac{\phi \pi^2 g}{2} \left( e^{\hat{\pi}_t} - 1 \right)^2 \tag{1.23} $$
$$ \hat{R}_t = \rho_R \hat{R}_{t-1} + (1-\rho_R) \psi_1 \hat{\pi}_t + (1-\rho_R) \psi_2 (\hat{y}_t - \hat{g}_t) + \epsilon_{R,t} \tag{1.24} $$
$$ \hat{g}_t = \rho_g \hat{g}_{t-1} + \epsilon_{g,t} \tag{1.25} $$
$$ \hat{z}_t = \rho_z \hat{z}_{t-1} + \epsilon_{z,t}. \tag{1.26} $$
The equilibrium law of motion of consumption, output, interest rates, and inflation has to
satisfy the expectational difference equations (1.21) to (1.26).
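As a small worked example of the formulas in this section, the steady state relationships in (1.20) can be evaluated directly from the structural parameters. The sketch below does this in Python; the parameter values are purely illustrative and are not a calibration taken from the text.

```python
# Steady state (1.20): pi = pi*, r = gamma/beta, R = r*pi*,
# c = (1 - nu)^(1/tau), and y = g*c = y*.
def steady_state(gamma, beta, pi_star, nu, g, tau):
    r = gamma / beta                      # real rate
    return {
        "pi": pi_star,                    # inflation equals the target rate
        "r": r,
        "R": r * pi_star,                 # nominal rate via the Fisher equation
        "c": (1 - nu) ** (1 / tau),       # distorted by monopolistic competition
        "y": g * (1 - nu) ** (1 / tau),   # from the aggregate resource constraint
    }

# Hypothetical parameter values, chosen only to exercise the formulas:
print(steady_state(gamma=1.005, beta=0.998, pi_star=1.008, nu=0.1, g=1.2, tau=2.0))
```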
1.2 Other DSGE Models Considered in this Book
In addition to the small-scale New Keynesian DSGE model, we consider two other models:
the widely-used Smets-Wouters (SW) model, which is a more elaborate version of the small-scale DSGE model that includes capital accumulation as well as wage rigidities, and a real
business cycle model with a detailed characterization of fiscal policy. We will present a brief
overview of these models below and provide further details as needed in Chapter 6.
1.2.1 The Smets-Wouters Model
The Smets and Wouters (2007) model is a more elaborate version of the small-scale DSGE
model presented in the previous section. In the SW model capital is a factor of intermediate goods production, and in addition to price stickiness the model features nominal wage
stickiness. In order to generate a richer autocorrelation structure, the model also includes
10
investment adjustment costs, habit formation in consumption, and partial dynamic indexation of prices and wages to lagged values. The model is based on work by Christiano,
Eichenbaum, and Evans (2005), who added various forms of frictions to a basic New Keynesian DSGE model in order to capture the dynamic response to a monetary policy shock
as measured by a structural vector autoregression (VAR). In turn (the publication dates are
misleading), Smets and Wouters (2003) augmented the Christiano-Eichenbaum-Evans model
by additional exogenous structural shocks (among them price markup shocks, wage markup
shocks, preference shocks, and others) to be able to capture the joint dynamics of Euro Area
output, consumption, investment, hours, wages, inflation, and interest rates.
The Smets and Wouters (2003) paper has been highly influential, not just in academic
circles but also in central banks because it demonstrated that a modern DSGE model that
is usable for monetary policy analysis can achieve a time series fit that is comparable to a
less restrictive vector autoregression (VAR). The 2007 version of the SW model contains a
number of minor modifications of the 2003 model in order to optimize its fit on U.S. data. We
will use the 2007 model exactly as it is presented in Smets and Wouters (2007) and refer the
reader to that article for details. The log-linearized equilibrium conditions are reproduced
in Appendix A.1. By now, the SW model has become one of the workhorse models in the
DSGE model literature and in central banks around the world. It forms the core of most large-scale DSGE models that augment the SW model with additional features such as a
multi-sector production structure or financial frictions. Because of its widespread use, we
will consider its estimation in this book.
1.2.2 A DSGE Model For the Analysis of Fiscal Policy
In the small-scale New Keynesian DSGE model and in the SW model fiscal policy is passive
and non-distortionary. The level of government spending as a fraction of GDP is assumed
to evolve exogenously and an implicit money demand equation determines the amount of
seignorage generated by the interest rate feedback rule. Fiscal policy is passive in the sense
that the government raises lump-sum taxes (or distributes lump-sum transfers) to ensure that
its budget constraint is satisfied in every period. These lump-sum taxes are non-distortionary,
because they do not affect the decisions of households and firms. The exact magnitude of
the lump-sum taxes and the level of government debt are not uniquely determined, but they
also do not matter for macroeconomic outcomes. Both the small-scale DSGE model and the
Chapter 4
Metropolis-Hastings Algorithms for DSGE Models
To date, the most widely used method to generate draws from posterior distributions of
a DSGE model is the random walk MH (RWMH) algorithm. This algorithm is a special
case of the generic Algorithm 5 introduced in Chapter 3.5 in which the proposal distribution
q(ϑ|θi−1 ) can be expressed as the random walk ϑ = θi−1 +η and η is drawn from a distribution
that is centered at zero. We introduce a benchmark RWMH algorithm in Section 4.1 and
apply it to the small-scale New Keynesian DSGE model in Section 4.2. The DSGE model
likelihood function in combination with the prior distribution presented in Chapter 2.3 leads to a posterior distribution that has a fairly regular elliptical shape. In turn, the draws from
a simple RWMH algorithm can be used to obtain an accurate numerical approximation of
posterior moments.
Unfortunately, in many other applications, in particular those involving medium- and large-scale DSGE models, the posterior distributions can be very non-elliptical. Irregularly-shaped posterior distributions are often caused by identification problems or misspecification.
The DSGE model may suffer from a local identification problem that generates a posterior
that is very flat in certain directions of the parameter space, similar to the posterior encountered in the simple set-identified model of Chapter 3.3. Alternatively, the posterior may
exhibit multimodal features. Multimodality could be caused by the data’s inability to distinguish between the role of a DSGE model’s external and internal propagation mechanisms.
For instance, inflation persistence can be generated by highly autocorrelated cost-push shocks
or by firms’ inability to frequently re-optimize their prices in view of fluctuating marginal
costs. We use a very stylized state-space model to illustrate these challenges for posterior
simulators in Section 4.3.
In view of the difficulties caused by irregularly-shaped posterior surfaces, we review a
variety of alternative MH samplers in Section 4.4. These algorithms differ from the RWMH
algorithm in two dimensions. First, they use alternative proposal distributions q(ϑ|θi−1 ). In
general, we consider distributions of the form
$$ q(\cdot \,|\, \theta^{i-1}) = p_t\big(\cdot \,\big|\, \mu(\theta^{i-1}), \Sigma(\theta^{i-1}), \nu\big), \tag{4.1} $$
where $p_t(\cdot)$ refers to the density of a Student-$t$ distribution. Thus, our exploration of proposal densities concentrates on different ways of forming the location parameter $\mu(\cdot)$ and the scale matrix $\Sigma(\cdot)$. For $\nu = \infty$ this notation nests Gaussian proposal distributions. The second
dimension in which we generalize the algorithm is blocking, i.e., we group the parameters
into subvectors, and use a Block MH sampler to draw iteratively from conditional posterior
distributions.
While the alternative MH samplers are designed for irregular posterior surfaces for which
the simple RWMH algorithm generates inaccurate approximations, we illustrate the performance gains obtained through these algorithms using the simple New Keynesian DSGE
model in Section 4.5. Similar to the illustrations in Chapter 3.5, we evaluate the accuracy
of the algorithms by computing the variance of Monte Carlo approximations across multiple
chains. Our simulations demonstrate that careful tailoring of proposal densities q(ϑ|θi−1 )
as well as blocking the parameters can drastically improve the accuracy of Monte Carlo
approximations. Finally, Section 4.6 takes a brief look at the numerical approximation of
marginal data densities that are used to compute posterior model probabilities.
In Chapter 3.5, we showed directly that the Monte Carlo estimates associated with the
discrete MH algorithm satisfied a SLLN and CLT for dependent, identically distributed random variables. All of the MH algorithms here give rise to Markov chains that are recurrent,
irreducible and aperiodic for the target distribution of interest. These properties are sufficient for a SLLN to hold. However, validating conditions for a CLT to hold is much more
difficult and beyond the scope of this book.
4.1 A Benchmark Algorithm
The most widely-used MH algorithm for DSGE model applications is the random walk MH
(RWMH) algorithm. The mean of the proposal distribution in (4.1) is simply the current
location in the chain and its variance is prespecified:
$$ \mu(\theta^{i-1}) = \theta^{i-1} \quad \text{and} \quad \Sigma(\theta^{i-1}) = c^2 \hat{\Sigma}. \tag{4.2} $$
The name of the algorithm comes from the random walk form of the proposal, which can be
written as
$$ \vartheta = \theta^{i-1} + \eta, $$
where $\eta$ is mean zero with variance $c^2 \hat{\Sigma}$. Given the symmetric nature of the proposal distribution, the acceptance probability becomes
$$ \alpha = \min \left\{ \frac{p(\vartheta|Y)}{p(\theta^{i-1}|Y)}, \; 1 \right\}. $$
A draw, ϑ, is accepted with probability one if the posterior at ϑ has a higher value than the
posterior at θi−1 . The probability of acceptance decreases as the posterior at the candidate
value decreases relative to the current posterior.
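To make the mechanics concrete, here is a minimal Python sketch of the RWMH algorithm. The function log_posterior is a placeholder for a user-supplied evaluator of ln p(θ|Y) up to a constant; everything else follows (4.1)-(4.2) with a Gaussian proposal (ν = ∞).

```python
import numpy as np

def rwmh(log_posterior, theta0, Sigma_hat, c=0.4, n_draws=10000, seed=0):
    """Random walk Metropolis-Hastings with proposal variance c^2 * Sigma_hat."""
    rng = np.random.default_rng(seed)
    d = len(theta0)
    L = np.linalg.cholesky(c**2 * Sigma_hat)   # factor once, reuse for all draws
    draws = np.empty((n_draws, d))
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    n_accept = 0
    for i in range(n_draws):
        vartheta = theta + L @ rng.standard_normal(d)   # proposal: theta + eta
        logp_prop = log_posterior(vartheta)
        # alpha = min{ p(vartheta|Y) / p(theta|Y), 1 }, evaluated on the log scale
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = vartheta, logp_prop
            n_accept += 1
        draws[i] = theta
    return draws, n_accept / n_draws
```

In practice the scaling c would be adjusted until the reported acceptance rate falls in the range discussed at the end of this section.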
To implement the RWMH, the user still needs to specify ν, c, and Σ̂. For all of the
variations of the RWMH we implement, we set ν = ∞ in (4.1), that is, we use a multivariate
normal proposal distribution in keeping with most of the literature. Typically, the choice
of c is made conditional on Σ̂, so we first discuss the choice for Σ̂. The proposal variance
controls the relative variances and correlations in the proposal distribution. As we have seen
in Chapter 3.5, the sampler can work very poorly if q is strongly at odds with the target
distribution. This intuition extends to the multivariate setting here. Suppose θ comprises two parameters, say β and δ, that are highly correlated in the posterior distribution. If the
variance of the proposal distribution does not capture this correlation, e.g., the matrix Σ̂ is
diagonal, then the draw ϑ is unlikely to reflect the fact that if β is large then δ should also
be large, and vice versa. Therefore, p(ϑ|Y ) is likely to be smaller than p(θi−1 |Y ), and so the
proposed draw will be rejected with high probability. As a consequence, the chain will have
a high rejection rate, exhibit a high autocorrelation, and the Monte Carlo estimates derived
from it will have a high variance.
A good choice for Σ̂ seeks to incorporate information from the posterior, to potentially capture the correlations discussed above. Obtaining this information can be difficult. A popular
approach, used in Schorfheide (2000), is to set Σ̂ to be the negative of the inverse Hessian at
the mode of the log posterior, θ̂, obtained by running a numerical optimization routine before
running MCMC. Using this as an estimate for the covariance of the posterior is attractive,
because it can be viewed as a large sample approximation to the posterior covariance matrix
as the sample size T −→ ∞. There exists a large literature on the asymptotic normality
of posterior distributions. Fundamental conditions can be found, for instance, in Johnson
(1970).
Unfortunately, in many applications the maximization of the posterior density is tedious
and the numerical approximation of the Hessian may be inaccurate. These problems may
arise if the posterior distribution is very non-elliptical and possibly multi-modal, or if the
likelihood function is replaced by a non-differentiable particle filter approximation (see Chapter 8 below). In both cases, a (partially) adaptive approach may work well: First, generate
a set of posterior draws based on a reasonable initial choice for Σ̂, e.g. the prior covariance
matrix. Second, compute the sample covariance matrix from the first sequence of posterior
draws and use it as Σ̂ in a second run of the RWMH algorithm. In principle, the covariance
matrix Σ̂ can be adjusted more than once. However, Σ̂ must be fixed eventually to guarantee
the convergence of the posterior simulator. Samplers which constantly (or automatically)
adjust Σ̂ are known as adaptive samplers and require substantially more elaborate theoretical
justifications.
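A minimal sketch of this two-step procedure, reusing the rwmh function sketched above (log_posterior, theta0, and prior_cov are placeholders for the user’s posterior evaluator, starting value, and prior covariance matrix):

```python
# Step 1: run the sampler with a reasonable initial choice for Sigma_hat,
# here the prior covariance matrix, as suggested in the text.
draws1, _ = rwmh(log_posterior, theta0, Sigma_hat=prior_cov, c=0.2, n_draws=5000)

# Step 2: re-run with the sample covariance of the first-stage draws,
# discarding an initial block as burn-in; Sigma_hat is then held fixed.
Sigma_hat = np.cov(draws1[1000:], rowvar=False)
draws2, acc_rate = rwmh(log_posterior, theta0, Sigma_hat=Sigma_hat, c=0.4, n_draws=20000)
```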
Instead of strictly following one of the two approaches of tuning Σ̂ that we just described,
we use an estimate of the posterior covariance, Vπ [θ], obtained from an earlier estimation
in the subsequent numerical illustrations. While this approach is impractical in empirical
work, it is useful for the purpose of comparing the performance of different posterior samplers,
because it avoids a distortion due to a mismatch between the Hessian-based estimate and
the posterior covariance. Thus, it is a best-case scenario for the algorithm. To summarize,
we examine the following variant of the RWMH algorithm:
RWMH-V : Σ̂ = Vπ [θ].
The final parameter of the algorithm is the scaling factor c. This parameter is typically
adjusted to ensure a “reasonable” acceptance rate. Given the opacity of the posterior, it
is difficult to derive a theoretically optimal acceptance rate. If the sampler accepts too
frequently, it may be making very small movements, resulting in large serial correlation and
a high variance of the resulting Monte Carlo estimates. Similarly, if the chain rejects too
frequently, it may get stuck in one region of the parameter space, again resulting in inaccurate estimates. However, for the special case of a target distribution which is multivariate normal,
Roberts, Gelman, and Gilks (1997) have derived a limiting optimal acceptance rate of 0.234 (as the size of the parameter vector grows). Most practitioners target an acceptance rate between 0.20
and 0.40. The scaling factor c can be tuned during the burn-in period or via pre-estimation
Chapter 5
Sequential Monte Carlo Methods
Importance sampling has rarely been used in DSGE model applications. A key difficulty with
Algorithm 4, in particular in high-dimensional parameter spaces, is to find a good proposal
density. In this chapter, we will explore methods in which proposal densities are constructed
sequentially. Suppose φn , n = 1, . . . , Nφ , is a sequence that slowly increases from zero to
one. We can define a sequence of tempered posteriors as
$$ \pi_n(\theta) = \frac{[p(Y|\theta)]^{\phi_n} \, p(\theta)}{\int [p(Y|\theta)]^{\phi_n} \, p(\theta) \, d\theta}, \qquad n = 0, \ldots, N_\phi, \qquad \phi_n \uparrow 1. \tag{5.1} $$
Provided that φ1 is close to zero, the prior density p(θ) may serve as an efficient proposal
density for π1 (θ). Likewise, the density πn (θ) may be a good proposal density for πn+1 (θ).
Sequential Monte Carlo (SMC) algorithms try to exploit this insight efficiently.
SMC algorithms were initially developed to solve filtering problems that arise in nonlinear
state-space models. We will consider such filtering applications in detail in Chapter 8.
Chopin (2002) showed how to adapt the particle filtering techniques to conduct posterior
inference for a static parameter vector. Textbook treatments of SMC algorithms can be
found, for instance, in Liu (2001) and Cappé, Moulines, and Ryden (2005). The volume by
Doucet, de Freitas, and Gordon (2001) discusses many applications and practical aspects of
SMC. Creal (2012) provides a recent survey focusing on SMC applications in econometrics.
The first paper that applied SMC techniques to posterior inference in DSGE models is
Creal (2007). He presents a basic SMC algorithm and uses it for posterior inference in a
small-scale DSGE model that is similar to the model in Section 1.1. Herbst and Schorfheide
(2014) developed the algorithm further, provided some convergence results for an adaptive
version of the algorithm building on the theoretical analysis of Chopin (2004), and showed
that a properly tailored SMC algorithm delivers more reliable posterior inference for large-scale DSGE models with multi-modal posteriors than the widely-used RWMH-V algorithm. Much of the exposition in this chapter borrows from Herbst and Schorfheide (2014). An additional advantage of the SMC algorithms over MCMC algorithms, on the computational front, highlighted by Durham and Geweke (2014), is that SMC is much more amenable to parallelization. Durham and Geweke (2014) show how to implement an SMC algorithm on a graphics processing unit (GPU), facilitating massive speed gains in estimation. While the evaluation of DSGE likelihoods is not (yet) amenable to GPU calculation, we will show how to exploit the parallel structure of the algorithm.
We present a generic SMC algorithm in Section 5.1. Further details on the implementation of the algorithm, the adaptive choice of tuning constants, and the convergence of the Monte Carlo approximations constructed from the output of the algorithm are provided in Section 5.2. Finally, we apply the algorithm to the small-scale New Keynesian DSGE model in Section 5.3. Because we will generate draws of $\theta$ sequentially, from a sequence of posterior distributions $\{\pi_n(\theta)\}_{n=1}^{N_\phi}$, it is useful to equip the parameter vector with a subscript $n$. Thus, $\theta_n$ is associated with the density $\pi_n(\cdot)$.
5.1 A Generic SMC Algorithm
Just like the basic importance sampling algorithm, SMC algorithms generate weighted draws from the sequence of posteriors $\{\pi_n\}_{n=1}^{N_\phi}$ in (5.1). The weighted draws are called particles. We denote the overall number of particles by $N$. At any stage the posterior distribution $\pi_n(\theta)$ is represented by a swarm of particles $\{\theta_n^i, W_n^i\}_{i=1}^N$ in the sense that the Monte Carlo average
$$ \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N W_n^i \, h(\theta_n^i) \stackrel{a.s.}{\longrightarrow} \mathbb{E}_{\pi_n}[h(\theta_n)]. \tag{5.2} $$
Starting from the stage $n-1$ particles $\{\theta_{n-1}^i, W_{n-1}^i\}_{i=1}^N$, the algorithm proceeds in three steps, using Chopin (2004)’s terminology: correction, that is, reweighting the stage $n-1$ particles to reflect the density in iteration $n$; selection, that is, eliminating a highly uneven distribution of particle weights (degeneracy) by resampling the particles; and mutation, that is, propagating the particles forward using a Markov transition kernel to adapt the particle values to the stage $n$ bridge density.
The sequence of posteriors in (5.1) was obtained by tempering the likelihood function,
that is, we replaced $p(Y|\theta)$ by $[p(Y|\theta)]^{\phi_n}$. Alternatively, one could construct the sequence of posteriors by sequentially adding observations to the likelihood function, that is, $\pi_n(\theta)$ is based on $p(Y_{1:\lfloor \phi_n T \rfloor}|\theta)$:
$$ \pi_n^{(D)}(\theta) = \frac{p(Y_{1:\lfloor \phi_n T \rfloor}|\theta) \, p(\theta)}{\int p(Y_{1:\lfloor \phi_n T \rfloor}|\theta) \, p(\theta) \, d\theta}, \tag{5.3} $$
where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. This data tempering is particularly
attractive in sequential applications. Due to the fact that individual observations are not
divisible, the data tempering approach is slightly less flexible. This may matter for the early
stages of the SMC sampler in which it may be advantageous to add information in very
small increments. The subsequent algorithm is presented in terms of likelihood tempering.
However, we also discuss the necessary adjustments for data tempering.
The algorithm provided below relies on sequences of tuning parameters. To make the exposition more transparent, we begin by assuming that these sequences are provided ex ante. Let $\{\rho_n\}_{n=1}^{N_\phi}$ be a sequence of zeros and ones that determines whether the particles are resampled in the selection step, and let $\{\zeta_n\}_{n=1}^{N_\phi}$ be a sequence of tuning parameters for the Markov transition density in the mutation step. The adaptive choice of these tuning parameters will be discussed in Section 5.2.2 below.
Algorithm 8 (Generic SMC Algorithm with Likelihood Tempering)

1. Initialization. ($\phi_0 = 0$.) Draw the initial particles from the prior: $\theta_1^i \stackrel{iid}{\sim} p(\theta)$ and $W_1^i = 1$, $i = 1, \ldots, N$.

2. Recursion. For $n = 1, \ldots, N_\phi$,

(a) Correction. Reweight the particles from stage $n-1$ by defining the incremental weights
$$ \tilde{w}_n^i = [p(Y|\theta_{n-1}^i)]^{\phi_n - \phi_{n-1}} \tag{5.4} $$
and the normalized weights
$$ \tilde{W}_n^i = \frac{\tilde{w}_n^i \, W_{n-1}^i}{\frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i}, \qquad i = 1, \ldots, N. \tag{5.5} $$
An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \tilde{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N \tilde{W}_n^i \, h(\theta_{n-1}^i). \tag{5.6} $$

(b) Selection. Case (i): If $\rho_n = 1$, resample the particles via multinomial resampling. Let $\{\hat{\theta}^i\}_{i=1}^N$ denote $N$ iid draws from a multinomial distribution characterized by support points and weights $\{\theta_{n-1}^i, \tilde{W}_n^i\}_{i=1}^N$ and set $W_n^i = 1$. Case (ii): If $\rho_n = 0$, let $\hat{\theta}_n^i = \theta_{n-1}^i$ and $W_n^i = \tilde{W}_n^i$, $i = 1, \ldots, N$. An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \hat{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N W_n^i \, h(\hat{\theta}_n^i). \tag{5.7} $$

(c) Mutation. Propagate the particles $\{\hat{\theta}_n^i, W_n^i\}$ via $N_{MH}$ steps of an MH algorithm with transition density $\theta_n^i \sim K_n(\theta_n | \hat{\theta}_n^i; \zeta_n)$ and stationary distribution $\pi_n(\theta)$ (see Algorithm 9 for details below). An approximation of $\mathbb{E}_{\pi_n}[h(\theta)]$ is given by
$$ \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N h(\theta_n^i) \, W_n^i. \tag{5.8} $$

3. For $n = N_\phi$ ($\phi_{N_\phi} = 1$) the final importance sampling approximation of $\mathbb{E}_\pi[h(\theta)]$ is given by:
$$ \bar{h}_{N_\phi,N} = \frac{1}{N} \sum_{i=1}^N h(\theta_{N_\phi}^i) \, W_{N_\phi}^i. \tag{5.9} $$
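The following is a minimal Python sketch of Algorithm 8, under simplifying assumptions that we flag explicitly: a tempering schedule φ_n = (n/N_φ)^λ, resampling whenever the effective sample size falls below N/2, and a single RWMH-V mutation step scaled by c (the adaptive versions of these choices are the subject of Section 5.2.2). The functions log_prior, log_like, and draw_prior are user-supplied placeholders.

```python
import numpy as np

def smc(log_prior, log_like, draw_prior, n_part=1000, n_phi=100, lam=2.0,
        c=0.3, seed=0):
    rng = np.random.default_rng(seed)
    phi = (np.arange(n_phi + 1) / n_phi) ** lam        # tempering schedule
    theta = draw_prior(n_part)                         # stage 0: iid prior draws (N x d)
    loglike = np.array([log_like(th) for th in theta])
    logprior = np.array([log_prior(th) for th in theta])
    W = np.ones(n_part)                                # weights normalized to mean one
    log_mdd = 0.0                                      # accumulates the log of (5.12)
    for n in range(1, n_phi + 1):
        # Correction: incremental weights [p(Y|theta)]^(phi_n - phi_{n-1}), eq. (5.4)
        incr = (phi[n] - phi[n - 1]) * loglike
        m = incr.max()                                 # subtract max to avoid overflow
        w = np.exp(incr - m)
        log_mdd += m + np.log(np.mean(w * W))
        W = w * W / np.mean(w * W)
        # Selection: multinomial resampling when ESS falls below N/2
        if n_part / np.mean(W**2) < n_part / 2:
            idx = rng.choice(n_part, size=n_part, p=W / W.sum())
            theta, loglike, logprior = theta[idx], loglike[idx], logprior[idx]
            W = np.ones(n_part)
        # Mutation: one RWMH-V step with stationary distribution pi_n
        Sigma = np.atleast_2d(np.cov(theta, rowvar=False, aweights=W))
        Sigma += 1e-10 * np.eye(theta.shape[1])
        prop = theta + rng.standard_normal(theta.shape) @ np.linalg.cholesky(c**2 * Sigma).T
        ll_prop = np.array([log_like(th) for th in prop])
        lp_prop = np.array([log_prior(th) for th in prop])
        log_alpha = phi[n] * (ll_prop - loglike) + (lp_prop - logprior)
        accept = np.log(rng.uniform(size=n_part)) < log_alpha
        theta[accept] = prop[accept]
        loglike[accept], logprior[accept] = ll_prop[accept], lp_prop[accept]
    return theta, W, log_mdd
```

Posterior moments are then approximated as weighted averages of the final particle swarm, e.g. np.average(theta, axis=0, weights=W), and log_mdd approximates the log marginal data density discussed in Section 5.1.2 below.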
5.1.1 Three-Step Particle Propagation
A stylized representation of the propagation of the particles is depicted in Figure 5.1. Each
dot corresponds to a particle and its size indicates the weight. At stage n = 0, N = 21 draws
are generated from a U [−10, 10] distribution and each particle receives the weight W0i = 1.
At stage n = 1 the particles are reweighted during the correction step (the sizes of the dots are no longer uniform) and the particle values are modified during the mutation step (the locations of the dots shift). The bottom of the figure depicts the target posterior density.
As n and φn increase, the target distribution becomes more concentrated. The concentration
is mainly reflected in the increased weight of the particles with values between -3 and 3. The
figure is generated under the assumption that ρn equals one for n = 3 and zero otherwise.
Thus, in iteration n = 3 the resampling step is executed and the uneven particle weights are
equalized. While the selection step generates 21 particles with equal weights, there are only
six distinct particle values, four of which have multiple copies. The subsequent mutation
step restores the diversity in particle values.
Figure 5.1: SMC: Evolution of Particles

[Figure omitted: particle swarms at stages φ0 through φ3, separated by C (Correction), S (Selection), and M (Mutation) steps; particle values range from −10 to 10.]

Notes: The vertical location of each dot represents the particle value and the diameter of the dot represents its weight. The densities at the bottom represent the tempered target posterior πn(·). C is Correction; S is Selection; and M is Mutation.
5.1.2 A Closer Look at the Algorithm
Algorithm 8 is initialized for n = 0 by generating iid draws from the prior distribution.
This initialization works well as long as the prior is sufficiently diffuse to assign non-trivial
probability mass to the area of the parameter space in which the likelihood function peaks.
There do exist papers in the DSGE model estimation literature in which the posterior mean
of some parameters is several prior standard deviations away from the prior mean. For such
applications it might be necessary to choose φ0 > 0 and to use an initial distribution that is
also informed by the tempered likelihood function [p(Y |θ)]φ0 . If the particles are initialized
based on a more general distribution with density g(θ), then for n = 1 the incremental
weights have to be corrected by the ratio p(θ)/g(θ).
The correction step reweights the stage $n-1$ particles to generate an importance sampling approximation of $\pi_n$. Because the parameter value $\theta^i$ does not change in this step, no further evaluation of the likelihood function is required. The likelihood value $p(Y|\theta_{n-1}^i)$ was computed as a by-product of the mutation step in iteration $n-1$. As discussed in Chapter 3.4,
the accuracy of the importance sampling approximation depends on the distribution of the particle weights $\tilde{W}_n^i$. The more uniformly the weights are distributed, the more accurate the approximation. If likelihood tempering is replaced by data tempering, then the incremental weights $\tilde{w}_n^i$ in (5.4) have to be defined as
$$ \tilde{w}_n^{i(D)} = p\big(Y_{(\lfloor \phi_{n-1} T \rfloor + 1):\lfloor \phi_n T \rfloor} \,\big|\, Y_{1:\lfloor \phi_{n-1} T \rfloor}, \theta_{n-1}^i\big). \tag{5.10} $$
The correction steps deliver a numerical approximation of the marginal data density as
a by-product. Using arguments that we will present in more detail in Section 5.2.4 below,
it can be verified that the unnormalized particle weights converge under suitable regularity
conditions as follows:
$$ \frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i \stackrel{a.s.}{\longrightarrow} \int [p(Y|\theta)]^{\phi_n - \phi_{n-1}} \, \frac{[p(Y|\theta)]^{\phi_{n-1}} \, p(\theta)}{\int [p(Y|\theta)]^{\phi_{n-1}} \, p(\theta) \, d\theta} \, d\theta = \frac{\int [p(Y|\theta)]^{\phi_n} \, p(\theta) \, d\theta}{\int [p(Y|\theta)]^{\phi_{n-1}} \, p(\theta) \, d\theta}. \tag{5.11} $$
Thus, the data density approximation is given by
$$ \hat{p}_{SMC}(Y) = \prod_{n=1}^{N_\phi} \left( \frac{1}{N} \sum_{i=1}^N \tilde{w}_n^i \, W_{n-1}^i \right). \tag{5.12} $$
Computing this approximation does not require any additional likelihood evaluations.
The selection step is designed to equalize the particle weights if their distribution becomes very uneven. In Algorithm 8 the particles are resampled whenever the indicator ρn is equal
to one. On the one hand, resampling introduces noise in the Monte Carlo approximation,
which makes it undesirable. On the other hand, resampling equalizes the particle weights
and therefore increases the accuracy of the correction step in the subsequent iteration. In
Algorithm 8 we use multinomial resampling. Alternative resampling algorithms are discussed
in Section 5.2.3 below.
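For concreteness, a minimal implementation of the multinomial resampler used above, together with a systematic variant of the kind compared in Section 5.2.3 (our labels; that section is not reproduced in this excerpt):

```python
import numpy as np

def multinomial_resample(theta, W, rng):
    # N iid draws from the support points {theta^i} with probabilities W^i / sum(W)
    N = len(W)
    idx = rng.choice(N, size=N, p=W / W.sum())
    return theta[idx], np.ones(N)      # resampled particles carry equal weight

def systematic_resample(theta, W, rng):
    # A single uniform draw placed on a stratified grid; this typically adds
    # less noise than multinomial resampling for the same particle system.
    N = len(W)
    u = (rng.uniform() + np.arange(N)) / N
    idx = np.searchsorted(np.cumsum(W / W.sum()), u)
    return theta[idx], np.ones(N)
```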
The mutation step changes the values of the particles from $\theta_{n-1}^i$ to $\theta_n^i$. To understand the importance of the mutation step, consider what would happen without this step. For simplicity, suppose also that $\rho_n = 0$ for all $n$. In this case the particle values would never change, that is, $\theta_n^i = \theta_1^i$ for all $n$. Thus, we would be using the prior as the importance sampling distribution and reweighting the draws from the prior by the tempered likelihood function $[p(Y|\theta_1^i)]^{\phi_n}$. Given the information content of a typical DSGE model likelihood function, this procedure would lead to a degenerate distribution of particles, in which in the last stage $N_\phi$ the weight is concentrated on a very small number of particles and the importance sampling approximation is very inaccurate. Thus, the goal of the mutation step is to adapt the values of the stage $n$ particles to $\pi_n(\theta)$. This is achieved by using steps of an MH algorithm with a transition density that satisfies the invariance property
$$ \int K_n(\theta_n | \hat{\theta}_n^i) \, \pi_n(\hat{\theta}_n^i) \, d\hat{\theta}_n^i = \pi_n(\theta_n). $$
The execution of the MH steps during the particle mutation phase requires at least one, but possibly multiple, evaluations of the likelihood function for each particle $i$. To the extent that the likelihood function is recursively evaluated with a filter, data tempering has a computational advantage over likelihood tempering, because the former only requires $\lfloor \phi_n T \rfloor \leq T$ iterations of the filter, whereas the latter requires $T$ iterations. The particle mutation is ideally suited for parallelization, because the MH steps are independent across particles and do not require any communication across processors. For DSGE models, the evaluation of the likelihood function is computationally very costly because it requires running a model solution procedure as well as a filtering algorithm. Thus, gains from parallelization are potentially quite large.
5.1.3 A Numerical Illustration
We provide a first numerical illustration of the SMC algorithm in the context of the stylized
state-space model introduced in Chapter 4.3. Recall that the model is given by
$$ y_t = [1 \;\; 1] \, s_t, \qquad s_t = \begin{bmatrix} \theta_1^2 & 0 \\ (1 - \theta_1^2) - \theta_1 \theta_2 & (1 - \theta_1^2) \end{bmatrix} s_{t-1} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \epsilon_t, \qquad \epsilon_t \sim iid \, N(0, 1). $$
We simulate $T = 200$ observations given $\theta = [0.45, 0.45]'$, which is observationally equivalent to $\theta = [0.89, 0.22]'$, and use a prior distribution that is uniform on the square $0 \leq \theta_1 \leq 1$
and 0 ≤ θ2 ≤ 1. Because the state-space model has only two parameters and the model used
for posterior inference is correctly specified, the SMC algorithm works extremely well. It is
configured as follows. We use N = 1,024 particles, Nφ = 50 stages, and a linear tempering
schedule that sets φn = n/50. The transition kernel for the mutation step is generated by a
single step of the RWMH-V algorithm.
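A short simulation of the stylized model, using the transition matrices exactly as written in the display above, makes the observational equivalence easy to verify numerically:

```python
import numpy as np

def simulate_stylized(theta1, theta2, T=200, seed=0):
    rng = np.random.default_rng(seed)
    Phi1 = np.array([[theta1**2,                         0.0          ],
                     [(1 - theta1**2) - theta1 * theta2, 1 - theta1**2]])
    Phi_eps = np.array([1.0, 0.0])
    s = np.zeros(2)
    y = np.empty(T)
    for t in range(T):
        s = Phi1 @ s + Phi_eps * rng.standard_normal()  # state transition
        y[t] = s.sum()                                  # y_t = [1 1] s_t
    return y

y = simulate_stylized(0.45, 0.45)   # same law of motion as theta = [0.89, 0.22]
```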
Some of the output of the SMC algorithm is depicted in Figure 5.2. The left panel displays
the sequence of tempered (marginal) posterior distributions πn (θ1 ). It clearly shows that
the tempering dampens the posterior density. While the posterior is still unimodal during
Figure 5.2: SMC Posterior Approximations for Stylized State-Space Model

[Figure omitted: the left panel plots the tempered marginal posteriors πn(θ1) across stages n = 0, . . . , 50; the right panel plots posterior contours and draws in the (θ1, θ2) plane over [0, 1]².]

Notes: The left panel depicts kernel density estimates of the sequence πn(θ) for n = 0, . . . , 50. The right panel depicts the contours of the posterior π(θ) as well as draws from π(θ).
the first few stages of the algorithm, a clear bimodal shape has emerged by n = 10. As φn approaches one, the bimodality of the posterior becomes more pronounced. The left panel also suggests that the stage n = Nφ − 1 tempered posterior provides a much better importance sampling distribution for the overall posterior π(·) than the stage n = 1 (prior)
distribution. The right panel shows a contour plot of the joint posterior density of θ1 and θ2 as
well as the draws from the posterior π(θ) = πNφ (θ) obtained in the last stage of Algorithm 8.
The algorithm successfully generates draws from the two high-posterior-density regions.
5.2 Further Details of the SMC Algorithm
Our initial description of the SMC algorithm left out some details that are important for
the successful implementation of the algorithm. Section 5.2.1 discusses the choice of the
transition kernel in the mutation step. Section 5.2.2 considers the adaptive choice of various tuning parameters of the algorithm. Section 5.2.4 outlines some convergence results for Monte Carlo approximations constructed from the output of the SMC sampler. Finally, the resampling step is discussed in more detail in Section 5.2.3.
Chapter 8
Particle Filters
The key difficulty that arises when the Bayesian estimation of DSGE models is extended
from linear to nonlinear models is the evaluation of the likelihood function. Throughout this
book, we will focus on the use of particle filters to accomplish this task. Our starting point
is a state-space representation for the nonlinear DSGE model of the form
$$ y_t = \Psi(s_t, t; \theta) + u_t, \qquad u_t \sim F_u(\cdot\,; \theta), $$
$$ s_t = \Phi(s_{t-1}, \epsilon_t; \theta), \qquad \epsilon_t \sim F_\epsilon(\cdot\,; \theta). \tag{8.1} $$
As discussed in the previous chapter, the functions $\Psi(s_t, t; \theta)$ and $\Phi(s_{t-1}, \epsilon_t; \theta)$ are generated numerically by the solution method. For reasons that will become apparent below, we require
that the measurement error ut in the measurement equation is additively separable and that
the probability density function p(ut |θ) can be evaluated analytically. In many applications,
ut ∼ N (0, Σu ). While the exposition of the algorithms in this chapter focuses on the nonlinear
state-space model (8.1), the numerical illustrations and empirical applications are based on
the linear Gaussian model
$$ y_t = \Psi_0(\theta) + \Psi_1(\theta) \, t + \Psi_2(\theta) \, s_t + u_t, \qquad u_t \sim N(0, \Sigma_u), $$
$$ s_t = \Phi_1(\theta) \, s_{t-1} + \Phi_\epsilon(\theta) \, \epsilon_t, \qquad \epsilon_t \sim N(0, \Sigma_\epsilon), \tag{8.2} $$
obtained by solving a log-linearized DSGE model. For model (8.2) the Kalman filter described in Table 2.1 delivers the exact distributions p(yt |Y1:t−1 , θ) and p(st |Y1:t , θ) against
which the accuracy of the particle filter approximation can be evaluated.
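Table 2.1 is not reproduced in this excerpt, so for reference we include a minimal Kalman filter that returns the exact log likelihood of model (8.2). As simplifying assumptions, the sketch omits the deterministic trend term Ψ1(θ)t and initializes the filter at the unconditional distribution of the state.

```python
import numpy as np

def kalman_loglik(Y, Psi0, Psi2, Phi1, Phi_eps, Sigma_u, Sigma_eps):
    ns = Phi1.shape[0]
    Q = Phi_eps @ Sigma_eps @ Phi_eps.T
    # unconditional state covariance: solve P = Phi1 P Phi1' + Q
    P = np.linalg.solve(np.eye(ns**2) - np.kron(Phi1, Phi1), Q.ravel()).reshape(ns, ns)
    s = np.zeros(ns)
    ll = 0.0
    for y in Y:                                          # Y is T x n
        s, P = Phi1 @ s, Phi1 @ P @ Phi1.T + Q           # prediction
        nu = y - (Psi0 + Psi2 @ s)                       # forecast error
        F = Psi2 @ P @ Psi2.T + Sigma_u                  # its covariance, F_{t|t-1}
        ll += -0.5 * (len(y) * np.log(2 * np.pi)
                      + np.linalg.slogdet(F)[1] + nu @ np.linalg.solve(F, nu))
        K = P @ Psi2.T @ np.linalg.inv(F)                # Kalman gain
        s, P = s + K @ nu, P - K @ Psi2 @ P              # updating
    return ll
```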
There exists a large literature on particle filters. Surveys and tutorials can be found, for
instance, in Arulampalam, Maskell, Gordon, and Clapp (2002), Cappé, Godsill, and Moulines
(2007), Doucet and Johansen (2011), Creal (2012). Kantas, Doucet, Singh, Maciejowski, and
Chopin (2014) discuss using particle filters in the context of estimating the parameters of
a state space models. These papers provide detailed references to the literature. The basic
bootstrap particle filtering algorithm is remarkably straightforward, but may perform quite
poorly in practice. Thus, much of the literature focuses on refinements of the bootstrap
filter that increases the efficiency of the algorithm, see, for instance, Doucet, de Freitas, and
Gordon (2001). Textbook treatments of the statistical theory underlying particle filters can
be found in Cappé, Moulines, and Ryden (2005), Liu (2001), and Del Moral (2013).
The remainder of this chapter is organized as follows. We introduce the bootstrap particle
filter in Section 8.1. This filter is easy to implement and has been widely used in DSGE
model applications. The bootstrap filter is a special case of a sequential importance sampling
with resampling (SISR) algorithm. This more general algorithm is presented in Section 8.2.
An important step in the generic particle filter algorithm is to generate draws that reflect
the distribution of the states in period t conditional on the information Y1:t . These draws are
generated through an importance sampling step in which states are drawn from a proposal
distribution and reweighted. For the bootstrap particle filter, this proposal distribution is
based on the state transition equation. Unfortunately, the forward iteration of the state
transition equation might produce draws that are associated with highly variable weights,
which in turn leads to imprecise Monte Carlo approximations.
The accuracy of the particle filter can be improved by choosing other proposal distributions.
We discuss in Section 8.3 how to construct more efficient proposal distributions. While the tailoring (or adaptation) of the proposal distributions tends to require additional computations,
the number of particles can often be reduced drastically, which leads to an improvement in
efficiency. DSGE model-specific implementation issues of the particle filter are examined in
Section 8.4. Finally, we present the auxiliary particle filter and a filter recently proposed
by DeJong, Liesenfeld, Moura, Richard, and Dharmarajan (2013) in Section 8.5. Various
versions of the particle filter are applied to the small-scale New Keynesian DSGE model
and the SW model in Sections 8.6 and 8.7. We close this chapter with some computational
considerations in Section 8.8. Throughout this chapter we condition on a fixed vector of
parameter values θ and defer the topic of parameter estimation to Chapters 9 and 10.
8.1 The Bootstrap Particle Filter
We begin with a version of the particle filter in which the particles representing the hidden
state vector st are propagated by iterating the state-transition equation in (8.1) forward.
This version of the particle filter is due to Gordon, Salmond, and Smith (1993) and is called the bootstrap particle filter. As in Algorithm 8, we use the sequence $\{\rho_t\}_{t=1}^T$ to indicate whether
the particles are resampled in period t. A resampling step is necessary to prevent the
distribution of the particle weights from degenerating. We discuss the adaptive choice of ρt
below. The function h(·) is used to denote transformations of interest of the state vector st .
The particle filter algorithm closely follows the steps of the generic filter in Algorithm 1.
Algorithm 11 (Bootstrap Particle Filter)

1. Initialization. Draw the initial particles from the distribution $s_0^j \stackrel{iid}{\sim} p(s_0)$ and set $W_0^j = 1$, $j = 1, \ldots, M$.

2. Recursion. For $t = 1, \ldots, T$:

(a) Forecasting $s_t$. Propagate the period $t-1$ particles $\{s_{t-1}^j, W_{t-1}^j\}$ by iterating the state-transition equation forward:
$$ \tilde{s}_t^j = \Phi(s_{t-1}^j, \epsilon_t^j; \theta), \qquad \epsilon_t^j \sim F_\epsilon(\cdot\,; \theta). \tag{8.3} $$
An approximation of $\mathbb{E}[h(s_t)|Y_{1:t-1}, \theta]$ is given by
$$ \hat{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(\tilde{s}_t^j) \, W_{t-1}^j. \tag{8.4} $$

(b) Forecasting $y_t$. Define the incremental weights
$$ \tilde{w}_t^j = p(y_t | \tilde{s}_t^j, \theta). \tag{8.5} $$
The predictive density $p(y_t|Y_{1:t-1}, \theta)$ can be approximated by
$$ \hat{p}(y_t|Y_{1:t-1}, \theta) = \frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j. \tag{8.6} $$
If the measurement errors are $N(0, \Sigma_u)$ then the incremental weights take the form
$$ \tilde{w}_t^j = (2\pi)^{-n/2} |\Sigma_u|^{-1/2} \exp \left\{ -\frac{1}{2} \big( y_t - \Psi(\tilde{s}_t^j, t; \theta) \big)' \Sigma_u^{-1} \big( y_t - \Psi(\tilde{s}_t^j, t; \theta) \big) \right\}, \tag{8.7} $$
where $n$ here denotes the dimension of $y_t$.

(c) Updating. Define the normalized weights
$$ \tilde{W}_t^j = \frac{\tilde{w}_t^j \, W_{t-1}^j}{\frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j}. \tag{8.8} $$
An approximation of $\mathbb{E}[h(s_t)|Y_{1:t}, \theta]$ is given by
$$ \tilde{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(\tilde{s}_t^j) \, \tilde{W}_t^j. \tag{8.9} $$

(d) Selection. Case (i): If $\rho_t = 1$, resample the particles via multinomial resampling. Let $\{s_t^j\}_{j=1}^M$ denote $M$ iid draws from a multinomial distribution characterized by support points and weights $\{\tilde{s}_t^j, \tilde{W}_t^j\}$ and set $W_t^j = 1$ for $j = 1, \ldots, M$. Case (ii): If $\rho_t = 0$, let $s_t^j = \tilde{s}_t^j$ and $W_t^j = \tilde{W}_t^j$ for $j = 1, \ldots, M$. An approximation of $\mathbb{E}[h(s_t)|Y_{1:t}, \theta]$ is given by
$$ \bar{h}_{t,M} = \frac{1}{M} \sum_{j=1}^M h(s_t^j) \, W_t^j. \tag{8.10} $$

3. Likelihood Approximation. The approximation of the log likelihood function is given by
$$ \ln \hat{p}(Y_{1:T}|\theta) = \sum_{t=1}^T \ln \left( \frac{1}{M} \sum_{j=1}^M \tilde{w}_t^j \, W_{t-1}^j \right). \tag{8.11} $$
As for the SMC Algorithm 8, we can define an effective sample size (in terms of the number of particles) as
$$ \widehat{ESS}_t = M \bigg/ \left( \frac{1}{M} \sum_{j=1}^M (\tilde{W}_t^j)^2 \right) \tag{8.12} $$
and replace the deterministic sequence $\{\rho_t\}_{t=1}^T$ by an adaptively chosen sequence $\{\hat{\rho}_t\}_{t=1}^T$, for which $\hat{\rho}_t = 1$ whenever $\widehat{ESS}_t$ falls below a threshold, say $M/2$. In the remainder of this section we discuss the asymptotic properties of the particle filter, letting the number of particles $M \longrightarrow \infty$, and the role of the measurement errors $u_t$ in the bootstrap particle filter.
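To fix ideas, here is a minimal Python sketch of Algorithm 11 with the adaptive resampling rule just described. The functions Phi, Psi, draw_s0, and draw_eps are placeholders for a (vectorized) state transition, measurement function, initial-state sampler, and shock sampler, and measurement errors are assumed N(0, Σu) so that the incremental weights follow (8.7).

```python
import numpy as np

def bootstrap_pf(Y, Phi, Psi, Sigma_u, draw_s0, draw_eps, M=1000, seed=0):
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    Sigma_u_inv = np.linalg.inv(Sigma_u)
    log_norm = -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma_u)[1])
    s = draw_s0(M)                                   # M x ns initial particles
    W = np.ones(M)                                   # weights normalized to mean one
    ll = 0.0
    for t, y in enumerate(Y):
        s = Phi(s, draw_eps(M))                      # (a) forecast s_t, row-wise
        nu = y - Psi(s, t)                           # M x n forecast errors
        logw = log_norm - 0.5 * np.einsum('ij,jk,ik->i', nu, Sigma_u_inv, nu)
        m = logw.max()
        w = np.exp(logw - m)                         # (b) incremental weights, eq. (8.7)
        ll += m + np.log(np.mean(w * W))             # adds ln p_hat(y_t | Y_{1:t-1})
        W = w * W / np.mean(w * W)                   # (c) updating
        if M / np.mean(W**2) < M / 2:                # (d) resample if ESS_t < M/2
            idx = rng.choice(M, size=M, p=W / W.sum())
            s, W = s[idx], np.ones(M)
    return ll                                        # ln p_hat(Y_{1:T} | theta), eq. (8.11)
```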
8.1.1 Asymptotic Properties
The convergence theory underlying the particle filter is similar to the theory sketched in
Chapter 5.2.4 for the SMC sampler. As in our presentation of the SMC sampler, in the subsequent exposition we will abstract from many of the technical details that underlie the convergence theory for the SMC sampler. Our exposition will be mainly heuristic, meaning that we will present some basic convergence results and the key steps for their derivation. Rigorous derivations are provided in Chopin (2004). As in Chapter 5.2.4 the asymptotic
variance formulas are represented in recursive form. While this renders them unusable for
the computation of numerical standard errors, they do provide some qualitative insights into
the accuracy of the Monte Carlo approximations generated by the particle filter.
To simplify the notation, we drop the parameter vector $\theta$ from the conditioning set. The starting point is the assumption that Monte Carlo averages constructed from the $t-1$ particle swarm $\{s_{t-1}^j, W_{t-1}^j\}_{j=1}^M$ satisfy an SLLN and a CLT of the form
$$ \bar{h}_{t-1,M} \stackrel{a.s.}{\longrightarrow} \mathbb{E}[h(s_{t-1})|Y_{1:t-1}], \qquad \sqrt{M} \big( \bar{h}_{t-1,M} - \mathbb{E}[h(s_{t-1})|Y_{1:t-1}] \big) \Longrightarrow N\big(0, \Omega_{t-1}(h)\big). \tag{8.13} $$
Based on this assumption, we will show the convergence of ĥt,M , p̂(yt |Y1:t−1 ), h̃t,M , and h̄t,M .
We write Ωt−1 (h) to indicate that the asymptotic covariance matrix depends on the function
h(st ) for which the expected value is computed. The filter is typically initialized by directly
sampling from the initial distribution p(s0 ), which immediately delivers the SLLN and CLT
provided the required moments of h(s0 ) exist. We now sketch the convergence arguments
for the Monte Carlo approximations in Steps 2(a) to 2(d) of Algorithm 11. A rigorous proof
would involve verifying the existence of moments required by the SLLN and CLT and a
careful characterization of the asymptotic covariance matrices.
Forecasting Steps. The forward iteration of the state-transition equation amounts to drawing s_t from a conditional density g_t(s_t|s_{t−1}^j). In Algorithm 11 this density is given by g_t(s_t|s_{t−1}^j) = p(s_t|s_{t−1}^j). We denote expectations under this density as E_{p(·|s_{t−1}^j)}[h] and decompose

ĥ_{t,M} − E[h(s_t)|Y_{1:t−1}] = (1/M) ∑_{j=1}^M ( h(s̃_t^j) − E_{p(·|s_{t−1}^j)}[h] ) W_{t−1}^j
    + ( (1/M) ∑_{j=1}^M E_{p(·|s_{t−1}^j)}[h] W_{t−1}^j − E[h(s_t)|Y_{1:t−1}] )   (8.14)
    = I + II,
say. This decomposition is similar to the decomposition (5.31) used in the analysis of the
mutation step of the SMC algorithm.
Conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M the weights W_{t−1}^j are known and the summands in term I form a triangular array of mean-zero random variables that within each row are independently but not identically distributed. Provided the required moment bounds for h(s̃_t^j) W_{t−1}^j are satisfied, term I converges to zero almost surely and satisfies a CLT. Term II also converges to zero because

(1/M) ∑_{j=1}^M E_{p(·|s_{t−1}^j)}[h] W_{t−1}^j → E[ E_{p(·|s_{t−1})}[h] | Y_{1:t−1} ]  a.s.
    = ∫ ( ∫ h(s_t) p(s_t|s_{t−1}) ds_t ) p(s_{t−1}|Y_{1:t−1}) ds_{t−1}   (8.15)
    = E[h(s_t)|Y_{1:t−1}].
Thus, under suitable regularity conditions,

ĥ_{t,M} → E[h(s_t)|Y_{1:t−1}]  a.s.,   √M ( ĥ_{t,M} − E[h(s_t)|Y_{1:t−1}] ) ⇒ N(0, Ω̂_t(h)).   (8.16)
The asymptotic covariance matrix Ω̂_t(h) is given by the sum of the asymptotic variances of the terms √M · I and √M · II in (8.14). The convergence of the predictive density approximation p̂(y_t|Y_{1:t−1}) to p(y_t|Y_{1:t−1}) in Step 2(b) follows directly from (8.16) by setting h(s_t) = p(y_t|s_t).
Updating and Selection Steps. The goal of the updating step is to approximate posterior expectations of the form

E[h(s_t)|Y_{1:t}] = ∫ h(s_t) p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t / ∫ p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t
    ≈ ( (1/M) ∑_{j=1}^M h(s̃_t^j) w̃_t^j W_{t−1}^j ) / ( (1/M) ∑_{j=1}^M w̃_t^j W_{t−1}^j ) = h̃_{t,M}.   (8.17)
The Monte Carlo approximation of E[h(s_t)|Y_{1:t}] has the same form as the Monte Carlo approximation of h̃_{n,M} in (5.24) in the correction step of the SMC Algorithm 8 and its convergence can be analyzed in a similar manner. Define the normalized incremental weights as

v_t(s_t) = p(y_t|s_t) / ∫ p(y_t|s_t) p(s_t|Y_{1:t−1}) ds_t.   (8.18)

Then, under suitable regularity conditions, the Monte Carlo approximation satisfies a CLT of the form

√M ( h̃_{t,M} − E[h(s_t)|Y_{1:t}] ) ⇒ N(0, Ω̃_t(h)),   Ω̃_t(h) = Ω̂_t( v_t(s_t)(h(s_t) − E[h(s_t)|Y_{1:t}]) ).   (8.19)
The expression for the asymptotic covariance matrix Ω̃t (h) highlights that the accuracy
depends on the distribution of the incremental weights. Roughly, the larger the variance of
the particle weights, the less accurate the approximation.
Finally, the selection step in Algorithm 11 is identical to the selection step in Algorithm 8 and it adds some additional noise to the approximation. If ρ_t = 1, then under multinomial resampling

√M ( h̄_{t,M} − E[h(s_t)|Y_{1:t}] ) ⇒ N(0, Ω_t(h)),   Ω_t(h) = Ω̃_t(h) + V[h(s_t)|Y_{1:t}].   (8.20)
As discussed in Chapter 5.2.3, the variance can be reduced by replacing the multinomial
resampling with a more efficient resampling scheme.
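A popular lower-variance alternative is systematic resampling. The sketch below is a standard textbook implementation, shown for illustration only: a single uniform draw places an evenly spaced grid on the unit interval, which preserves the mean of the weights while introducing less noise than M independent multinomial draws.

    import numpy as np

    def systematic_resample(weights, rng):
        # One uniform draw induces an evenly spaced grid of M points on [0, 1).
        M = len(weights)
        u = (rng.uniform() + np.arange(M)) / M
        cdf = np.cumsum(weights) / np.sum(weights)
        return np.searchsorted(cdf, u)   # ancestor indices

    rng = np.random.default_rng(1)
    W = rng.uniform(size=500)
    idx = systematic_resample(W, rng)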
8.1.2 The Role of Measurement Errors
Many DSGE models, e.g., the ones considered in this book, do not assume that the observables y_t are measured with error. Instead, the number of structural shocks is chosen to be equal to the number of observables, which means that the likelihood function p(Y_{1:T}|θ) is nondegenerate. It is apparent from the formulas in Table 2.1 that the Kalman filter iterations are well defined even if the measurement error covariance matrix Σ_u in the linear Gaussian state-space model (8.2) is equal to zero, provided that the number of shocks ε_t is not smaller than the number of observables and the forecast error covariance matrix F_{t|t−1} is invertible.
For the bootstrap particle filter the case of Σ_u = 0 presents a serious problem. The incremental weights w̃_t^j in (8.7) are degenerate if Σ_u = 0 because the conditional distribution of y_t|(s_t, θ) is a pointmass. For a particle j, this point mass is located at y_t^j = Ψ(s̃_t^j, t; θ). If the innovation ε_t^j is drawn from a continuous distribution in the forecasting step and the state-transition equation Φ(s_{t−1}, ε_t; θ) is a smooth function of the lagged state and the innovation ε_t, then the probability that y_t^j = y_t is zero, which means that w̃_t^j = 0 for all j and the particles vanish after one iteration. The intuition for this result is straightforward. The incremental weights are large for particles j for which y_t^j = Ψ(s̃_t^j, t; θ) is close to the actual y_t. Under Gaussian measurement errors, the metric for closeness is given by Σ_u^{−1}. Thus, all else equal, decreasing the measurement error variance Σ_u increases the discrepancy between y_t^j and y_t and therefore the variance of the particle weights.
Consider the following stylized example (we are omitting the j superscripts). Suppose that y_t is scalar, the measurement errors are distributed according to u_t ∼ N(0, σ_u²), W_{t−1} = 1, and let δ = y_t − Ψ(s_t, t; θ). Suppose that in population δ is distributed according to a N(0, 1). In this case v_t(s_t) in (8.18) can be viewed as a population approximation of the normalized weights W̃_t constructed in the updating step (note that the denominator of these two objects is slightly different):

W̃_t(δ) ≈ v_t(δ) = exp{ −δ²/(2σ_u²) } / ∫ (2π)^{−1/2} exp{ −(1/2)(1 + 1/σ_u²) δ² } dδ = (1 + 1/σ_u²)^{1/2} exp{ −δ²/(2σ_u²) }.
The asymptotic covariance matrix Ω̃_t(h) in (8.19), which captures the accuracy of h̃_{t,M}, as well as the heuristic effective sample size measure defined in (8.12), depend on the variance of the particle weights, which in population (integrating against the N(0, 1) density of δ, denoted φ(δ)) is given by

∫ v_t²(δ) φ(δ) dδ = (1 + 1/σ_u²) / √(1 + 2/σ_u²) = (1/σ_u) (1 + σ_u²) / √(2 + σ_u²) → ∞ as σ_u → 0.

Thus, a decrease in the measurement error variance raises the variance of the particle weights and thereby decreases the effective sample size. More importantly, the increasing dispersion of the weights translates into an increase in the limit covariance matrix Ω̃_t(h) and a deterioration of the Monte Carlo approximations generated by the particle filter. In sum, all else equal, the smaller the measurement error variance, the less accurate the particle filter.
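A few lines of code suffice to trace out this formula numerically; the grid of values for σ_u below is arbitrary and purely illustrative.

    import numpy as np

    for sig_u in [1.0, 0.5, 0.1, 0.01]:
        # Population second moment of the normalized weights derived above.
        m2 = (1 + 1 / sig_u**2) / np.sqrt(1 + 2 / sig_u**2)
        # Heuristic ESS share implied by (8.12): ESS/M is roughly 1/E[v_t^2].
        print(f"sigma_u = {sig_u:5.2f}   E[v_t^2] = {m2:9.2f}   ESS/M = {1/m2:7.4f}")

As σ_u shrinks from 1 to 0.01, the second moment of the weights rises from about 1.2 to about 70, so the effective sample size collapses to a small fraction of M.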
8.2 A Generic Particle Filter
In the basic version of the particle filter the time t particles were generated by simulating
the state transition equation forward. However, the naive forward simulation ignores information contained in the current observation yt and may lead to a very uneven distribution
of particle weights, in particular if the measurement error variance is small or if the model
has difficulties explaining the period t observation in the sense that for most particles s̃jt the
actual observation yt lies far in the tails of the model-implied distribution of yt |(s̃jt , θ). The
particle filter can be generalized by allowing s̃jt in the forecasting step to be drawn from a
generic importance sampling density gt (·|sjt−1 , θ), which leads to the following algorithm:
Algorithm 12 (Generic Particle Filter)
1. Initialization. (Same as Algorithm 11)
2. Recursion. For t = 1, . . . , T :
(a) Forecasting s_t. Draw s̃_t^j from density g_t(s̃_t|s_{t−1}^j, θ) and define the importance weights

ω_t^j = p(s̃_t^j|s_{t−1}^j, θ) / g_t(s̃_t^j|s_{t−1}^j, θ).   (8.21)

An approximation of E[h(s_t)|Y_{1:t−1}, θ] is given by

ĥ_{t,M} = (1/M) ∑_{j=1}^M h(s̃_t^j) ω_t^j W_{t−1}^j.   (8.22)

(b) Forecasting y_t. Define the incremental weights

w̃_t^j = p(y_t|s̃_t^j, θ) ω_t^j.   (8.23)

The predictive density p(y_t|Y_{1:t−1}, θ) can be approximated by

p̂(y_t|Y_{1:t−1}, θ) = (1/M) ∑_{j=1}^M w̃_t^j W_{t−1}^j.   (8.24)
(c) Updating. (Same as Algorithm 11)
(d) Selection. (Same as Algorithm 11)
3. Likelihood Approximation. (Same as Algorithm 11).
The only difference between Algorithms 11 and 12 is the introduction of the importance
weights ωtj which appear in (8.22) as well as the definition of the incremental weights w̃tj
in (8.23). The main goal of replacing the forward iteration of the state-transition equation
by an importance sampling step is to improve the accuracy of p̂(yt |Y1:t−1 , θ) in Step 2(b) and
h̃t,M in Step 2(c).
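The following fragment sketches the modified forecasting step for a stylized scalar Gaussian model. The importance density g_t used here, a Gaussian centered between the transition mean and the observation, is a crude and purely illustrative form of adaption, not a recommendation; all densities and parameter values are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    M, phi, sig_eps, sig_u, y = 1000, 0.9, 1.0, 0.1, 0.8
    s_prev = rng.standard_normal(M)

    def npdf(x, mean, std):
        # Gaussian density, used for p(s_t|s_{t-1}), g_t, and p(y_t|s_t).
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)

    # Draw from the importance density g_t instead of the state transition.
    g_mean, g_std = 0.5 * (phi * s_prev + y), 0.7
    s_tilde = g_mean + g_std * rng.standard_normal(M)

    omega = npdf(s_tilde, phi * s_prev, sig_eps) / npdf(s_tilde, g_mean, g_std)  # (8.21)
    w_tilde = npdf(y, s_tilde, sig_u) * omega                                    # (8.23)
    print(np.mean(w_tilde))  # p-hat(y_t|Y_{1:t-1}) with W_{t-1}^j = 1, cf. (8.24)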
We subsequently focus on the analysis of p̂(y_t|Y_{1:t−1}, θ). Emphasizing the dependence of ω_t^j on both s̃_t^j and s_{t−1}^j, write

p̂(y_t|Y_{1:t−1}) − p(y_t|Y_{1:t−1})
    = (1/M) ∑_{j=1}^M ( p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) − E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] ) W_{t−1}^j
    + (1/M) ∑_{j=1}^M ( E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] − p(y_t|Y_{1:t−1}) ) W_{t−1}^j   (8.25)
    = I + II,
say. Consider term II. First, notice that

(1/M) ∑_{j=1}^M E_{g_t(·|s_{t−1}^j)}[ p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) ] W_{t−1}^j
    → E[ E_{g_t(·|s_{t−1})}[ p(y_t|s̃_t) ω_t(s̃_t, s_{t−1}) ] ]  a.s.
    = ∫ ( ∫ p(y_t|s̃_t) ( p(s̃_t|s_{t−1}) / g_t(s̃_t|s_{t−1}) ) g_t(s̃_t|s_{t−1}) ds̃_t ) p(s_{t−1}|Y_{1:t−1}) ds_{t−1}   (8.26)
    = p(y_t|Y_{1:t−1}),
which implies that term II converges to zero almost surely and ensures the consistency of the Monte Carlo approximation. Second, because

E_{g_t(·|s_{t−1})}[ p(y_t|s̃_t^j) ω_t(s̃_t^j, s_{t−1}) ] = ∫ p(y_t|s̃_t^j) ( p(s̃_t^j|s_{t−1}) / g_t(s̃_t^j|s_{t−1}) ) g_t(s̃_t^j|s_{t−1}) ds̃_t^j = p(y_t|s_{t−1}),   (8.27)
the variance of term II is independent of the choice of the importance density gt (s̃jt |sjt−1 ).
Now consider term I. Conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M the weights W_{t−1}^j are known and the summands form a triangular array of mean-zero random variables that are independently distributed within each row. Recall that

p(y_t|s̃_t^j) ω_t^j(s̃_t^j, s_{t−1}^j) = p(y_t|s̃_t^j) p(s̃_t^j|s_{t−1}^j) / g_t(s̃_t^j|s_{t−1}^j).   (8.28)
Choosing a suitable importance density g_t(s̃_t^j|s_{t−1}^j) that is a function of the time t observation y_t can drastically reduce the variance of term I conditional on {s_{t−1}^j, W_{t−1}^j}_{j=1}^M. Such filters are called adapted particle filters. In turn, this leads to a variance reduction of the Monte Carlo approximation of p(y_t|Y_{1:t−1}). A similar argument can be applied to the variance of h̃_{t,M}. The bootstrap particle filter simply sets g_t(s̃_t^j|s_{t−1}^j) = p(s̃_t^j|s_{t−1}^j) and ignores the information in y_t. We will subsequently discuss more efficient choices of g_t(s̃_t^j|s_{t−1}^j).
8.3 Adapting the Generic Filter
There exists a large literature on the implementation and the improvement of the particle filters in Algorithms 11 and 12. Detailed references to this literature are provided, for instance, in Doucet, de Freitas, and Gordon (2001), Cappé, Godsill, and Moulines (2007), Doucet and Johansen (2011), and Creal (2012). We will focus in this section on the choice of the proposal density g_t(s̃_t^j|s_{t−1}^j). The starting point is the conditionally-optimal importance distribution. In nonlinear DSGE models it is typically infeasible to generate draws from this distribution but it might be possible to construct an approximately conditionally-optimal proposal. Finally, we consider conditionally linear Gaussian state-space models for which it is possible to use Kalman filter updating for a subset of state variables conditional on the remaining elements of the state vector.
8.3.1 Conditionally-Optimal Importance Distribution
The conditionally-optimal distribution, e.g., Liu and Chen (1998), is defined as the distribution that minimizes the Monte Carlo variation of the importance weights. However, this notion of optimality conditions on the current observation y_t as well as the period t−1 particle s_{t−1}^j. Given (y_t, s_{t−1}^j) the weights w̃_t^j are constant (as a function of s̃_t) if

g_t(s̃_t|s_{t−1}^j) = p(s̃_t|y_t, s_{t−1}^j),   (8.29)

that is, s̃_t is sampled from the posterior distribution of the period t state given (y_t, s_{t−1}^j). In this case

w̃_t^j = ∫ p(y_t|s_t) p(s_t|s_{t−1}^j) ds_t.   (8.30)
In typical (nonlinear) DSGE model applications it is not possible to sample directly from p(s̃_t|y_t, s_{t−1}^j). One could use an accept-reject algorithm as discussed in Künsch (2005) to generate draws from the conditionally-optimal distribution. However, for this approach to work efficiently, the user needs to find a good proposal distribution within the accept-reject algorithm.
As mentioned above, our numerical illustrations below are all based on DSGE models that take the form of the linear Gaussian state-space model (8.2). In this case, one can obtain p(s̃_t|y_t, s_{t−1}^j) from the Kalman filter updating step described in Table 2.1. Let

s̄_{t|t−1}^j = Φ_1 s_{t−1}^j,
P_{t|t−1} = Φ_ε Σ_ε Φ_ε′,
ȳ_{t|t−1}^j = Ψ_0 + Ψ_1 t + Ψ_2 s̄_{t|t−1}^j,
F_{t|t−1} = Ψ_2 P_{t|t−1} Ψ_2′ + Σ_u,
s̄_{t|t}^j = s̄_{t|t−1}^j + P_{t|t−1} Ψ_2′ F_{t|t−1}^{−1} (y_t − ȳ_{t|t−1}^j),
P_{t|t} = P_{t|t−1} − P_{t|t−1} Ψ_2′ F_{t|t−1}^{−1} Ψ_2 P_{t|t−1}.

The conditionally-optimal proposal distribution is given by

s̃_t|(s_{t−1}^j, y_t) ∼ N( s̄_{t|t}^j, P_{t|t} ).   (8.31)
We will use (8.31) as a benchmark to document how accurate a well-adapted particle filter
could be. In applications with nonlinear DSGE models, in which it is not possible to sample
directly from p(s̃t |yt , sjt−1 ), the documented level of accuracy is typically not attainable.
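Because P_{t|t−1}, F_{t|t−1}, and P_{t|t} do not depend on the particle index j, they can be computed once per period; only the means are particle-specific. The sketch below implements the proposal (8.31) for small illustrative system matrices (with Ψ_0 = Ψ_1 = 0); the matrices are placeholders, not a calibrated model.

    import numpy as np

    rng = np.random.default_rng(0)
    ns, M = 2, 500
    Phi1 = np.array([[0.9, 0.0], [0.1, 0.5]])
    Phie = np.eye(ns); Sige = 0.1 * np.eye(ns)       # Phi_eps, Sigma_eps
    Psi2 = np.array([[1.0, 1.0]]); Sigu = np.array([[0.05]])
    y_t = np.array([0.4])
    s_prev = rng.standard_normal((M, ns))

    # Period-t quantities that are common across particles.
    P_pred = Phie @ Sige @ Phie.T
    F = Psi2 @ P_pred @ Psi2.T + Sigu
    K = P_pred @ Psi2.T @ np.linalg.inv(F)
    P_upd = P_pred - K @ Psi2 @ P_pred
    chol = np.linalg.cholesky(P_upd)

    # Particle-specific means and draws from the proposal (8.31).
    s_pred = s_prev @ Phi1.T
    y_pred = s_pred @ Psi2.T
    s_upd = s_pred + (y_t - y_pred) @ K.T
    s_tilde = s_upd + rng.standard_normal((M, ns)) @ chol.T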
8.3.2 Approximately Conditionally-Optimal Distributions
In a typical DSGE model application, sampling from the conditionally-optimal importance
distribution is infeasible or computationally too costly. Alternatively, one could try to sample from an approximately conditionally-optimal importance distribution. For instance, if
the DSGE model nonlinearity arises from a higher-order perturbation solution and the nonlinearities are not too strong, then an approximately conditionally-optimal importance distribution could be obtained by applying the one-step Kalman filter updating in (8.31) to the
first-order approximation of the DSGE model. More generally, as suggested in Guo, Wang,
and Chen (2005), one could use the updating steps of a conventional nonlinear filter, such
as an extended Kalman filter, unscented Kalman filter, or a Gaussian quadrature filter, to
construct an efficient proposal distribution. Approximate filters for nonlinear DSGE models
have been developed by Andreasen (2013) and Kollmann (2014).
8.3.3 Conditional Linear Gaussian Models
Certain DSGE models have a conditional linear structure that can be exploited to improve
the efficiency of the particle filter. These models include the class of Markov switching linear
rational expectations (MS-LRE) models analyzed in Farmer, Waggoner, and Zha (2009) as
well as models that are linear conditional on exogenous stochastic volatility processes, e.g.,
the linearized DSGE model with heteroskedastic structural shocks estimated by Justiniano
and Primiceri (2008) and the long-run risks model studied in Schorfheide, Song, and Yaron
(2014).
For concreteness, consider an MS-LRE model obtained by replacing the fixed target-inflation rate π* in the monetary policy rule (1.24) with a time-varying process π_t*(m_t) of the form

π_t* = m_t π_L* + (1 − m_t) π_H*,   P{m_t = l | m_{t−1} = l} = η_{ll},   l ∈ {0, 1}.   (8.32)

This model was estimated in Schorfheide (2005).¹ Log-linearizing the model with Markov-switching target inflation rate leads to an MS-LRE similar to (2.1), except that the log-linearized monetary policy rule now contains an intercept that depends on m_t.

¹ A richer DSGE model with a Markov-switching target inflation rate was subsequently estimated by Liu, Waggoner, and Zha (2011).
Chapter 9
Combining Particle Filters with MH Samplers
We previously focused on the particle-filter approximation of the likelihood function of a potentially nonlinear DSGE model. In order to conduct Bayesian inference, the approximate likelihood function has to be embedded into a posterior sampler. We begin by combining the particle filtering methods of Chapter 8 with the MCMC methods of Chapter 4. In a nutshell, we replace the actual likelihood function that appears in the formula for the acceptance probability α(ϑ|θ^{i−1}) in Algorithm 5 by the particle filter approximation p̂(Y|θ). This idea was first proposed for the estimation of nonlinear DSGE models by Fernández-Villaverde and Rubio-Ramírez (2007). We refer to the resulting algorithm as the PFMH algorithm. It is a special case of a larger class of algorithms called particle Markov chain Monte Carlo (PMCMC). The theoretical properties of PMCMC methods were established in Andrieu, Doucet, and Holenstein (2010). Applications of PFMH algorithms in other areas of econometrics are discussed in Flury and Shephard (2011).
9.1 The PFMH Algorithm
The statistical theory underlying the PFMH algorithm is very complex and beyond the scope
of this book. We refer the interested reader to Andrieu, Doucet, and Holenstein (2010) for a
careful exposition. Below we will sketch the main idea behind the algorithm. The exposition
is based on Flury and Shephard (2011). We will distinguish between {p(Y |θ), p(θ|Y ), p(Y )}
and {p̂(Y |θ), p̂(θ|Y ), p̂(Y )}. The first triplet consists of the exact likelihood function p(Y |θ)
and the resulting posterior distribution and marginal data density defined as
p(θ|Y) = p(Y|θ) p(θ) / p(Y),   p(Y) = ∫ p(Y|θ) p(θ) dθ.   (9.1)
The second triplet consists of the particle filter approximation of the likelihood function
denoted by p̂(Y |θ) and the resulting posterior and marginal data density:
p̂(θ|Y) = p̂(Y|θ) p(θ) / p̂(Y),   p̂(Y) = ∫ p̂(Y|θ) p(θ) dθ.   (9.2)
By replacing the exact likelihood function p(Y|θ) with the particle filter approximation p̂(Y|θ) in Algorithm 5 one might expect to obtain draws from the approximate posterior p̂(θ|Y) instead of the exact posterior p(θ|Y). The surprising implication of the theory developed in Andrieu, Doucet, and Holenstein (2010) is that the distribution of draws from the PFMH algorithm that replaces p(Y|θ) by p̂(Y|θ) in fact does converge to the exact posterior. The algorithm takes the following form:
Algorithm 18 (PFMH Algorithm) For i = 1 to N:

1. Draw ϑ from a density q(ϑ|θ^{i−1}).

2. Set θ^i = ϑ with probability

α(ϑ|θ^{i−1}) = min{ 1, [ p̂(Y|ϑ) p(ϑ) / q(ϑ|θ^{i−1}) ] / [ p̂(Y|θ^{i−1}) p(θ^{i−1}) / q(θ^{i−1}|ϑ) ] }

and θ^i = θ^{i−1} otherwise. The likelihood approximation p̂(Y|ϑ) is computed using Algorithm 12.
Any of the particle filters described in Chapter 8 could be used in the PFMH algorithm. For concreteness, we use the generic filter described in Algorithm 12. At each iteration the filter generates draws s̃_t^j from the proposal distribution g_t(·|s_{t−1}^j). Let S̃_t = (s̃_t^1, . . . , s̃_t^M)′ and denote the entire sequence of draws by S̃_{1:T}^{1:M}. In the selection step we use multinomial resampling to determine the ancestor of each particle in the next iteration. Thus, we can define a random variable A_t^j that contains this ancestry information. For instance, suppose that during the resampling particle j = 1 was assigned the value s̃_t^{10}; then A_t^1 = 10. Note that the A_t^j's are random variables whose values are determined during the selection step. Let A_t = (A_t^1, . . . , A_t^M) and use A_{1:T} to denote the sequence of A_t's.
The PFMH algorithm operates on a probability space that includes the parameter vector θ as well as S̃_{1:T} and A_{1:T}. We use U_{1:T} to denote the sequence of random vectors that are used to generate S̃_{1:T} and A_{1:T}. U_{1:T} can be thought of as an array of iid uniform random numbers. The transformation of U_{1:T} into (S̃_{1:T}, A_{1:T}) typically depends on θ and Y_{1:T} because the proposal distribution g_t(s̃_t|s_{t−1}^j) in Algorithm 12 depends on both the current observation y_t and the parameter vector θ, which enters the measurement and state-transition equations, see (8.1).
For concreteness, consider the conditionally-optimal particle filter for a linear state-space model described in Chapter 8.3.1. Implementation of this filter requires sampling from a N(s̄_{t|t}^j, P_{t|t}) distribution for each particle j. The mean of this distribution depends on y_t, and both the mean and the covariance matrix depend on θ through the system matrices of the state-space representation (8.2). Draws from this distribution can in principle be obtained by sampling iid uniform random variates, using a probability integral transform to convert them into iid draws from a standard normal distribution, and then converting them into draws from a N(s̄_{t|t}^j, P_{t|t}). Likewise, in the selection step, the multinomial resampling can be implemented based on draws from iid uniform random variables. Therefore, we can express the particle filter approximation of the likelihood function as

p̂(Y_{1:T}|θ) = g(Y_{1:T}|θ, U_{1:T}),   (9.3)

where

U_{1:T} ∼ p(U_{1:T}) = ∏_{t=1}^T p(U_t).   (9.4)
The PFMH algorithm can be interpreted as operating on an enlarged probability space for
the triplet (Y1:T , θ, U1:T ). Define the joint distribution
p_g(Y_{1:T}, θ, U_{1:T}) = g(Y_{1:T}|θ, U_{1:T}) p(U_{1:T}) p(θ).   (9.5)

The PFMH algorithm samples from the joint posterior

p_g(θ, U_{1:T}|Y_{1:T}) ∝ g(Y|θ, U_{1:T}) p(U_{1:T}) p(θ)   (9.6)
and discards the draws of U_{1:T}. For this procedure to be valid, it has to be the case that marginalizing the joint posterior p_g(θ, U_{1:T}|Y_{1:T}) with respect to U_{1:T} yields the exact posterior p(θ|Y_{1:T}). In other words, we require that the particle filter produces an unbiased simulation approximation of the likelihood function for all values of θ:

E[p̂(Y_{1:T}|θ)] = ∫ g(Y_{1:T}|θ, U_{1:T}) p(U_{1:T}) dU_{1:T} = p(Y_{1:T}|θ).   (9.7)
Del Moral (2004) has shown that particle filters indeed deliver unbiased estimates of the
likelihood function.
It turns out that the acceptance probability for the MH algorithm that operates on the enlarged probability space can be expressed directly in terms of the particle filter approximation p̂(Y_{1:T}|θ). On the enlarged probability space, one needs to generate a proposed draw for both θ and U_{1:T}. We denote these draws by ϑ and U_{1:T}*. The proposal distribution for (ϑ, U_{1:T}*) in the MH algorithm is given by q(ϑ|θ^{(i−1)}) p(U_{1:T}*). There is no need to keep track of the draws U_{1:T}*, because the acceptance ratio for Algorithm 18 can be written as follows (omitting the time subscripts):

α(ϑ|θ^{i−1}) = min{ 1, [ g(Y|ϑ, U*) p(U*) p(ϑ) / ( q(ϑ|θ^{(i−1)}) p(U*) ) ] / [ g(Y|θ^{(i−1)}, U^{(i−1)}) p(U^{(i−1)}) p(θ^{(i−1)}) / ( q(θ^{(i−1)}|ϑ) p(U^{(i−1)}) ) ] }   (9.8)
    = min{ 1, [ p̂(Y|ϑ) p(ϑ) / q(ϑ|θ^{(i−1)}) ] / [ p̂(Y|θ^{(i−1)}) p(θ^{(i−1)}) / q(θ^{(i−1)}|ϑ) ] }.

The terms p(U*) and p(U^{(i−1)}) cancel from the expression in the first line of (9.8) and it suffices to record the particle filter likelihood approximations p̂(Y|ϑ) and p̂(Y|θ^{(i−1)}).
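In code, the only change relative to a standard RWMH sampler is that the log-likelihood calls return noisy particle filter estimates and that the stored estimate for the current draw is recycled rather than recomputed. The sketch below is schematic: pf_loglik is a hypothetical stand-in for a full run of Algorithm 12, and the prior and proposal are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def pf_loglik(theta):
        # Hypothetical stand-in for ln p-hat(Y|theta) from Algorithm 12; the
        # noise term mimics the simulation error of the particle filter.
        return -0.5 * np.sum(theta**2) + 0.1 * rng.standard_normal()

    def log_prior(theta):
        return -0.5 * np.sum(theta**2)

    theta = np.zeros(3)
    lp = pf_loglik(theta) + log_prior(theta)
    for i in range(1000):
        vartheta = theta + 0.1 * rng.standard_normal(3)  # symmetric RWMH-V proposal
        lp_prop = pf_loglik(vartheta) + log_prior(vartheta)
        # The stored estimate lp for the current draw is recycled, not refreshed;
        # re-evaluating it at every step would invalidate the algorithm.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = vartheta, lp_prop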
9.2 Application to the Small-Scale DSGE Model
We now apply the PFMH algorithm to the small-scale New Keynesian model, which is estimated over the period 1983:I to 2002:IV. We use the 1-block RWMH-V algorithm and
combine it with the Kalman filter, the bootstrap PF, and the conditionally optimal PF.
According to the theory sketched in the previous section the PFMH algorithm should accurately approximate the posterior distribution of the DSGE model parameters. Our results
are based on Nrun = 20 runs of each algorithm. In each run we generate N = 100, 000
posterior draws and discard the first N0 = 50, 000. As in Chapter 8.6, we use M = 40, 000
particles for the bootstrap filter and M = 400 particles for the conditionally-optimal filter.
A single run of the RWMH-V algorithm takes 1 minute and 30 seconds with the Kalman
filter, approximately 40 minutes with the conditionally-optimal PF, and approximately 1
day with the bootstrap PF.
The results are summarized in Table 9.1. Most notably, despite the inaccurate likelihood
approximation of the bootstrap PF documented in Chapter 8.6, the PFMH works remarkably
well. Columns 2 to 4 of the table report posterior means which are computed by pooling the
Table 9.1: Accuracy of MH Approximations

           Posterior Mean (Pooled)      Inefficiency Factors            Std Dev of Means
           KF       CO-PF    BS-PF      KF       CO-PF     BS-PF        KF      CO-PF   BS-PF
τ          2.63     2.62     2.64        66.17   126.76    1360.22      0.020   0.028   0.091
κ          0.82     0.81     0.82       128.00    97.11    1887.37      0.007   0.006   0.026
ψ1         1.88     1.88     1.87       113.46   159.53     749.22      0.011   0.013   0.029
ψ2         0.64     0.64     0.63        61.28    56.10     681.85      0.011   0.010   0.036
ρr         0.75     0.75     0.75       108.46   134.01    1535.34      0.002   0.002   0.007
ρg         0.98     0.98     0.98        94.10    88.48    1613.77      0.001   0.001   0.002
ρz         0.88     0.88     0.88       124.24   118.74    1518.66      0.001   0.001   0.005
r(A)       0.44     0.44     0.44       148.46   151.81    1115.74      0.016   0.016   0.044
π(A)       3.32     3.33     3.32       152.08   141.62    1057.90      0.017   0.016   0.045
γ(Q)       0.59     0.59     0.59       106.68   142.37     899.34      0.006   0.007   0.018
σr         0.24     0.24     0.24        35.21   179.15    1105.99      0.001   0.002   0.004
σg         0.68     0.68     0.67        98.22    64.18    1490.81      0.003   0.002   0.011
σz         0.32     0.32     0.32        84.77    61.55     575.90      0.001   0.001   0.003
ln p̂(Y)  -357.14  -357.17  -358.32                                      0.040   0.038   0.949

Notes: Results are based on Nrun = 20 runs of the PF-RWMH-V algorithm. Each run of the algorithm generates N = 100,000 draws and the first N0 = 50,000 are discarded. The likelihood function is computed with the Kalman filter (KF), bootstrap particle filter (BS-PF, M = 40,000) or conditionally-optimal particle filter (CO-PF, M = 400). "Pooled" means that we are pooling the draws from the Nrun = 20 runs to compute posterior statistics.
draws generated by the 20 runs of the algorithms. Except for some minor discrepancies in
the posterior mean for τ and r(A) , which are parameters with a high posterior variance, the
posterior mean approximations are essentially identical for all three likelihood evaluation
methods. Columns 5 to 7 contain the inefficiency factors InEff_N(θ̄) for each parameter,
and the last three columns of Table 9.1 contain the standard deviations of the posterior
mean estimates across the 20 runs. Not surprisingly, the posterior sampler that is based on
the bootstrap PF is the least accurate. The standard deviations are 2 to 4 times as large
as for the samplers that utilize either the Kalman filter or the conditionally-optimal PF.
Under the Kalman filter the inefficiency factors range from 35 to about 150, whereas under
the bootstrap particle filter they range from 575 to 1,890. As stressed in Section 9.1 the
most important requirement for PFMH algorithms is that the particle filter approximation is unbiased; it does not have to be exact.

Figure 9.1: Autocorrelation of PFMH Draws

[Six panels showing autocorrelation functions for τ, κ, ψ1, ψ2, ρr, and σr at lags 0 to 40.]

Notes: The figure depicts autocorrelation functions computed from the output of the 1-Block RWMH-V algorithm based on the Kalman filter (solid), the conditionally-optimal particle filter (dashed), and the bootstrap particle filter (solid with dots).
In Figure 9.1 we depict autocorrelation functions for parameter draws computed based on
the output of the PFMH algorithms. As a benchmark, the figure also contains autocorrelation
functions obtained from the sampler that uses the exact likelihood function computed with
the Kalman filter. While under the conditionally-optimal particle filter the persistence of
the Markov chain for the DSGE model parameters is comparable to the persistence under
the Kalman filter, the use of the bootstrap particle filter raises the serial correlation of the
parameter draws drastically, which leads to the less precise Monte Carlo approximations
reported in Table 9.1.
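For reference, the inefficiency factors reported above can be computed from a chain of posterior draws as one plus twice the sum of the autocorrelations. The sketch below truncates the sum at a fixed lag, which is one simple convention among several; the book's exact truncation rule is not restated here, and the AR(1) test chain is a toy example.

    import numpy as np

    def inefficiency(draws, L=200):
        # InEff = 1 + 2 * sum of autocorrelations, truncated at lag L.
        x = draws - draws.mean()
        acov = np.correlate(x, x, mode="full")[x.size - 1:] / x.size
        rho = acov[1:L + 1] / acov[0]
        return 1.0 + 2.0 * np.sum(rho)

    rng = np.random.default_rng(0)
    e = rng.standard_normal(50_000)
    draws = np.empty_like(e)
    draws[0] = 0.0
    for t in range(1, e.size):
        draws[t] = 0.95 * draws[t - 1] + e[t]   # persistent toy chain
    print(inefficiency(draws))   # close to (1 + 0.95)/(1 - 0.95) = 39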
9.3 Application to the SW Model
We now use the PF-RWMH-V algorithm to estimate the SW model. Unlike in Chapter 6.2,
where we used a more diffuse prior distribution to estimate the SW model, we now revert
to the prior originally specified by Smets and Wouters (2007). This prior is summarized in
Table A-2 in the Appendix. As shown in Herbst and Schorfheide (2014), under the original
prior distribution the RWMH algorithm is much better behaved than under our diffuse
prior, because it leads to a posterior distribution that does not exhibit multiple modes. The
estimation sample is 1966:I to 2004:IV. Using the RWMH-V algorithm, we estimate the model
posteriors using the Kalman filter and the conditionally-optimal PF, running each algorithm
Nrun = 20 times. The bootstrap particle filters with M = 40, 000 and M = 400, 000 particles
turned out to be too inaccurate to deliver reliable posterior estimates.
Table 9.2 shows the results for the Kalman filter and the conditionally-optimal PF. While the MCMC chains generated using the KF and CO-PF produce roughly the same means for the parameter draws on average, the variability across the chains is much higher for the CO-PF. According to the inefficiency factors, the KF chains are about 10 times more efficient than the CO-PF chains.
The results are summarized in Table 9.2. Due to the computational complexity of the PF-RWMH-V algorithm, the results reported in the table are based on N = 10,000 instead of N = 100,000 draws from the posterior distribution. We used the conditionally-optimal PF with M = 40,000 particles and a single-block RWMH-V algorithm in which we scaled the posterior covariance matrix that served as covariance matrix of the proposal distribution by c² = 0.25² for the KF and c² = 0.05² for the conditionally-optimal PF. This leads to acceptance rates of 33% for the KF and 24% for the conditionally-optimal PF. In our experience, the noisy approximation of the likelihood function through the PF makes it necessary to reduce the variance of the proposal distribution to maintain a targeted acceptance rate. In the SW application the proposed moves under the PF approximation are about five times smaller than under the exact KF likelihood function. This increases the persistence of the Markov chain and leads to a reduction in accuracy. Because the precision of the PF approximation differs across points in the parameter space, the RWMH-V acceptance rate also varies much more across chains. For example, the standard deviation of the acceptance rate for the CO-PF PMCMC runs is 0.09, about ten times larger than for the KF runs.
While the pooled posterior means using the KF and the conditionally-optimal PF reported in Table 9.2 are very similar, the standard deviation of the means across runs is three to five times larger if the PF approximation of the likelihood function is used. Because the PF approximation of the log-likelihood function is downward-biased, the log marginal data density approximation obtained with the PF is much smaller than the one obtained with the KF.
Reducing the number of particles for the conditionally-optimal PF to 4,000 or switching
to the bootstrap PF with 40,000 or 400,000 particles was not successful in the sense that the
acceptance rate quickly dropped to zero. Reducing the variance of the proposal distribution
did not solve the problem because to obtain a nontrivial acceptance rate the step-size had
to be so small that the sampler would not be able to traverse the high posterior density
region of the parameter space in a reasonable amount of time. In view of the accuracy of the
likelihood approximation reported in Table 8.4 this is not surprising. The PF approximations
are highly volatile and even though the PF approximation is unbiased in theory, finite sample
averages appear to be severely biased.
In a nutshell, if the variation in the likelihood, conditional on a particular value of θ,
is much larger than the variation that we observe along a Markov chain (evaluating the
likelihood for the sequence θi , i = 1, . . . , N ) that is generated by using the exact likelihood
function, the sampler easily gets stuck in the sense that the acceptance rate for proposed
draws drops to zero. Once the PF has generated an estimate p̂(Y |θ) that exceeds p(Y |θ) by
a wide margin, it becomes extremely difficult to move to a nearby θ. The reason is that, on
the one hand, most of the PF evaluations underestimate p(Y |θ) and, on the other hand, two
θ’s that are close to each other tend to be associated with similar likelihood values.
9.4 Computational Considerations
We implement the PFMH algorithm on a single machine, utilizing up to twelve cores. Efficient parallelization of the algorithm is difficult, because it is challenging to parallelize MCMC algorithms and it is not profitable to use distributed memory parallelization for the filter. For the small-scale DSGE model it takes 30:20:33 [hh:mm:ss] to generate 100,000 parameter draws using the bootstrap PF with 40,000 particles. Under the conditionally-optimal filter we only use 400 particles, which reduces the run time to 00:39:20 [hh:mm:ss]. Thus, with the conditionally-optimal filter, the PFMH algorithm runs about 50 times faster and delivers highly accurate approximations of the posterior means. For the SW model the computational time is substantially larger. It took 05:14:20:00 [dd:hh:mm:ss] to generate 10,000 draws using the conditionally-optimal PF with 40,000 particles.
In practical applications with nonlinear DSGE models the conditionally-optimal PF that
we used in our numerical illustrations is typically not available and has to be replaced by
one of the other filters, possibly an approximately conditionally-optimal PF. Having a good
understanding of the accuracy of the PF approximation is crucial. Thus, we recommend assessing the variance of the likelihood approximation at various points in the parameter space, as we did in Chapters 8.6 and 8.7, and tailoring the filter until it is reasonably accurate.
To put the accuracy of the filter approximation into perspective, one could compare it to
the variation in the likelihood function of a linearized DSGE model fitted to the same data,
along a sequence of posterior draws θi . If the variation in the likelihood function due to the
PF approximation is larger than the variation generated by moving through the parameter
space, the PF-MH algorithm is unlikely to produce reliable results.
In general, likelihood evaluations for nonlinear DSGE models are computationally very
costly. Rather than spending computational resources on tailoring the proposal density for
the PF to reduce the number of particles, one can also try to lower the number of likelihood
evaluations in the MH algorithm. Smith (2012) developed a PFMH algorithm based on
surrogate transitions. In a nutshell the algorithm proceeds as follows: Instead of evaluating
the posterior density (and thereby the DSGE model likelihood function) for every proposal
draw ϑ, one first evaluates the likelihood function for an approximate model, e.g., a linearized
DSGE model, or one uses a fast approximate filter, e.g., an extended Kalman filter, to obtain
a likelihood value for the nonlinear model. Using the surrogate likelihood, one can compute
the acceptance probability α. For ϑ’s rejected in this step, one never has to execute the timeconsuming PF computations. If the proposed draw ϑ is accepted in the first stage, then a
second randomization that requires the evaluation of the actual likelihood is necessary to
determine whether θi = ϑ or θi = θi−1 . If the surrogate transition is well tailored, then the
acceptance probability in the second step is high and the overall algorithm accelerates the
posterior sampler by reducing the number of likelihood evaluations for poor proposals ϑ.
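The two-stage logic can be sketched as follows; both likelihood functions below are hypothetical placeholders (a cheap surrogate and an expensive exact evaluation), the prior is left flat, and the proposal is a simple random walk.

    import numpy as np

    rng = np.random.default_rng(0)
    cheap_loglik = lambda th: -0.5 * 1.1 * np.sum(th**2)   # e.g., linearized model
    exact_loglik = lambda th: -0.5 * np.sum(th**2)         # e.g., PF for the nonlinear model

    theta, n_exact_calls = np.zeros(2), 0
    lc_cur, le_cur = cheap_loglik(theta), exact_loglik(theta)
    for i in range(2000):
        vartheta = theta + 0.3 * rng.standard_normal(2)
        lc_prop = cheap_loglik(vartheta)
        # Stage 1: screen the proposal with the surrogate likelihood only.
        if np.log(rng.uniform()) < lc_prop - lc_cur:
            # Stage 2: correct with the exact-to-surrogate likelihood ratio.
            n_exact_calls += 1
            le_prop = exact_loglik(vartheta)
            if np.log(rng.uniform()) < (le_prop - le_cur) - (lc_prop - lc_cur):
                theta, lc_cur, le_cur = vartheta, lc_prop, le_prop
    print(n_exact_calls)   # far fewer exact evaluations than MH iterations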
Table 9.2: Accuracy of MH Approximations

              Post. Mean (Pooled)     Ineff. Factors           Std Dev of Means
              KF        CO-PF         KF        CO-PF          KF      CO-PF
100(β⁻¹−1)    0.14      0.14          172.58    3732.90        0.007   0.034
π̄             0.73      0.74          185.99    4343.83        0.016   0.079
l̄             0.51      0.37          174.39    3133.89        0.130   0.552
α             0.19      0.20          149.77    5244.47        0.003   0.015
σc            1.49      1.45           86.27    3557.81        0.013   0.086
Φ             1.47      1.45          134.34    4930.55        0.009   0.056
ϕ             5.34      5.35          138.54    3210.16        0.131   0.628
h             0.70      0.72          277.64    3058.26        0.008   0.027
ξw            0.75      0.75          343.89    2594.43        0.012   0.034
σl            2.28      2.31          162.09    4426.89        0.091   0.477
ξp            0.72      0.72          182.47    6777.88        0.008   0.051
ιw            0.54      0.53          241.80    4984.35        0.016   0.073
ιp            0.48      0.50          205.27    5487.34        0.015   0.078
ψ             0.45      0.44          248.15    3598.14        0.020   0.078
rπ            2.09      2.09           98.32    3302.07        0.020   0.116
ρ             0.80      0.80          241.63    4896.54        0.006   0.025
ry            0.13      0.13          243.85    4755.65        0.005   0.023
r∆y           0.21      0.21          101.94    5324.19        0.003   0.022
ρa            0.96      0.96          153.46    1358.87        0.002   0.005
ρb            0.22      0.21          325.98    4468.10        0.018   0.068
ρg            0.97      0.97           57.08    2687.56        0.002   0.011
ρi            0.71      0.70          219.11    4735.33        0.009   0.044
ρr            0.54      0.54          194.73    4184.04        0.020   0.094
ρp            0.80      0.81          338.69    2527.79        0.022   0.061
ρw            0.94      0.94          135.83    4851.01        0.003   0.019
ρga           0.41      0.37          196.38    5621.86        0.025   0.133
µp            0.66      0.66          300.29    3552.33        0.025   0.087
µw            0.82      0.81          218.43    5074.31        0.011   0.052
σa            0.34      0.34          128.00    5096.75        0.005   0.034
σb            0.24      0.24          186.13    3494.71        0.004   0.016
σg            0.51      0.49          208.14    2945.02        0.006   0.021
σi            0.43      0.44          115.42    6093.72        0.006   0.043
σr            0.14      0.14          193.37    3408.01        0.004   0.016
σp            0.13      0.13          194.22    4587.76        0.003   0.013
σw            0.22      0.22          211.80    2256.19        0.004   0.012
ln p̂(Y)    -964.44  -1017.94                                   0.298   9.139

Notes: Results are based on Nrun = 20 runs of the PF-RWMH-V algorithm. Each run of the algorithm generates N = 10,000 draws. The likelihood function is computed with the Kalman filter (KF) or conditionally-optimal particle filter (CO-PF). "Pooled" means that we are pooling the draws from the Nrun = 20 runs to compute posterior statistics. The CO-PF uses M = 40,000 particles to compute the likelihood.
Chapter 10
Combining Particle Filters with SMC Samplers
Following recent work by Chopin, Jacob, and Papaspiliopoulos (2012), we now combine the SMC algorithm of Chapter 5 with the particle filter approximation of the likelihood function developed in Chapter 8 to obtain an SMC² algorithm.
10.1 An SMC² Algorithm
As with the PFMH algorithm, our goal is to obtain a posterior sampler for the DSGE model parameters for settings in which the likelihood function of the DSGE model cannot be evaluated with the Kalman filter. The starting point is the SMC Algorithm 8. However, we make a number of modifications to our previous algorithm. Some of these modifications are important, others are merely made to simplify the exposition. First and foremost, we add data sequentially to the likelihood function rather than tempering the entire likelihood function: we consider the sequence of posteriors π_n^D(θ) = p(θ|Y_{1:t_n}), defined in (5.3), where t_n = ⌊φ_n T⌋. The advantage of using data tempering is that the particle filter can deliver an unbiased estimate of the incremental weight p(Y_{t_{n−1}+1:t_n}|θ) in the correction step, whereas the estimate of a concave transformation p(Y_{1:T}|θ)^{φ_n − φ_{n−1}} tends to be biased. Moreover, in general one has to evaluate the likelihood only for t_n observations instead of all T observations, which can speed up computations considerably.
Second, the evaluation of the incremental and the full likelihood function in the correction and mutation steps of Algorithm 8 is replaced by the evaluation of the respective particle filter approximations. Using the same notation as in (9.3), we write the particle filter approximations as

p̂(y_{t_{n−1}+1:t_n} | Y_{1:t_{n−1}}, θ) = g(y_{t_{n−1}+1:t_n} | Y_{1:t_{n−1}}, θ, U_{1:t_n}),   p̂(Y_{1:t_n} | θ_n) = g(Y_{1:t_n} | θ_n, U_{1:t_n}).   (10.1)

As before, U_{1:t_n} is an array of iid uniform random variables generated by the particle filter with density p(U_{1:t_n}), see (9.4). The approximation of the likelihood increment also depends on the entire sequence U_{1:t_n} because of the recursive structure of the filter: the particle approximation of p(s_{t_{n−1}+1}|Y_{1:t_{n−1}}, θ) depends on the particle approximation of p(s_{t_{n−1}}|Y_{1:t_{n−1}}, θ). The distribution of U_{1:t_n} depends neither on θ nor on Y_{1:t_n} and can be factorized as

p(U_{1:t_n}) = p(U_{1:t_1}) p(U_{t_1+1:t_2}) · · · p(U_{t_{n−1}+1:t_n}).   (10.2)
To describe the particle system we follow the convention of Chapter 5 and index the parameter vector θ by the stage n of the SMC algorithm, writing θ_n. The particles generated by the SMC sampler are indexed i = 1, . . . , N and the particles generated by the particle filter are indexed j = 1, . . . , M. At stage n we have a particle system {θ_n^i, W_n^i}_{i=1}^N that represents the posterior distribution p(θ_n|Y_{1:t_n}). Moreover, for each θ_n^i we have a particle system that represents the distribution p(s_t|Y_{1:t_n}, θ_n^i). To distinguish the weights used for the particle values that represent the conditional distribution of θ from the weights used to characterize the conditional distribution of s_t, we denote the latter by 𝒲 instead of W. Moreover, because the distribution of the states is conditional on the value of θ, we use i, j superscripts: {s_t^{i,j}, 𝒲_t^{i,j}}_{j=1}^M. The particle system can be arranged in the matrix form given in Table 10.1.

Table 10.1: Particle System for SMC² Sampler After Stage n

Parameter            State
(θ_n^1, W_n^1)       (s_{t_n}^{1,1}, 𝒲_{t_n}^{1,1})   (s_{t_n}^{1,2}, 𝒲_{t_n}^{1,2})   · · ·   (s_{t_n}^{1,M}, 𝒲_{t_n}^{1,M})
(θ_n^2, W_n^2)       (s_{t_n}^{2,1}, 𝒲_{t_n}^{2,1})   (s_{t_n}^{2,2}, 𝒲_{t_n}^{2,2})   · · ·   (s_{t_n}^{2,M}, 𝒲_{t_n}^{2,M})
...                  ...                              ...                                      ...
(θ_n^N, W_n^N)       (s_{t_n}^{N,1}, 𝒲_{t_n}^{N,1})   (s_{t_n}^{N,2}, 𝒲_{t_n}^{N,2})   · · ·   (s_{t_n}^{N,M}, 𝒲_{t_n}^{N,M})
Finally, to streamline the notation used in the description of the algorithm, we assume
that during each stage n exactly one observation is added to the likelihood function. Thus,
we can write θt instead of θn and Y1:t instead of Y1:tn and the number of stages is Nφ = T .
Moreover, we resample the θ particles at every iteration of the algorithm (which means we
do not have to keep track of the resampling indicator ρt ) and we only use one MH step in
the mutation phase.
Algorithm 19 (SMC²)

1. Initialization. Draw the initial particles from the prior: θ_0^i ∼ iid p(θ) and W_0^i = 1, i = 1, . . . , N.

2. Recursion. For t = 1, . . . , T,

(a) Correction. Reweight the particles from stage t−1 by defining the incremental weights

w̃_t^i = p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) = g(y_t|Y_{1:t−1}, θ_{t−1}^i, U_{1:t}^i)   (10.3)

and the normalized weights

W̃_t^i = (w̃_t^i W_{t−1}^i) / ( (1/N) ∑_{i=1}^N w̃_t^i W_{t−1}^i ),   i = 1, . . . , N.   (10.4)

An approximation of E_{π_t}[h(θ)] is given by

h̃_{t,N} = (1/N) ∑_{i=1}^N W̃_t^i h(θ_{t−1}^i).   (10.5)
(b) Selection. Resample the particles via multinomial resampling. Let {θ̂_t^i}_{i=1}^N denote N iid draws from a multinomial distribution characterized by support points and weights {θ_{t−1}^i, W̃_t^i}_{i=1}^N and set W_t^i = 1. Define the vector of ancestors A_t with elements A_t^i by setting A_t^i = k if the ancestor of resampled particle i is particle k, that is, θ̂_t^i = θ_{t−1}^k.

An approximation of E_{π_t}[h(θ)] is given by

ĥ_{t,N} = (1/N) ∑_{i=1}^N W_t^i h(θ̂_t^i).   (10.6)
(c) Mutation. Propagate the particles {θ̂_t^i, W_t^i} via one step of an MH algorithm. The proposal distribution is given by

q(ϑ_t^i|θ̂_t^i) p(U_{1:t}^{*i})   (10.7)

and the acceptance ratio can be expressed as

α(ϑ_t^i|θ̂_t^i) = min{ 1, [ p̂(Y_{1:t}|ϑ_t^i) p(ϑ_t^i) / q(ϑ_t^i|θ̂_t^i) ] / [ p̂(Y_{1:t}|θ̂_t^i) p(θ̂_t^i) / q(θ̂_t^i|ϑ_t^i) ] }.   (10.8)

An approximation of E_{π_t}[h(θ)] is given by

h̄_{t,N} = (1/N) ∑_{i=1}^N h(θ_t^i) W_t^i.   (10.9)
3. For t = T the final importance sampling approximation of E_π[h(θ)] is given by

h̄_{T,N} = (1/N) ∑_{i=1}^N h(θ_T^i) W_T^i.   (10.10)
A formal analysis of SMC² algorithms is provided in Chopin, Jacob, and Papaspiliopoulos (2012). We will provide a heuristic explanation of why the algorithm correctly approximates the target posterior distribution and comment on some aspects of the implementation. At the end of iteration t−1 the algorithm has generated particles {θ_{t−1}^i, W_{t−1}^i}_{i=1}^N. For each parameter value θ_{t−1}^i there is also a particle filter approximation of the likelihood function p̂(Y_{1:t−1}|θ_{t−1}^i), a swarm of particles {s_{t−1}^{i,j}, 𝒲_{t−1}^{i,j}}_{j=1}^M that represents the distribution p(s_{t−1}|Y_{1:t−1}, θ_{t−1}^i), and the sequence of random vectors U_{1:t−1}^i that underlies the simulation approximation of the particle filter. To gain an understanding of the algorithm it is useful to focus on the triplets {θ_{t−1}^i, U_{1:t−1}^i, W_{t−1}^i}_{i=1}^N. Suppose that

∫∫ h(θ, U_{1:t−1}) p(U_{1:t−1}) p(θ|Y_{1:t−1}) dU_{1:t−1} dθ ≈ (1/N) ∑_{i=1}^N h(θ_{t−1}^i, U_{1:t−1}^i) W_{t−1}^i.   (10.11)
This implies that we obtain the familiar approximation for functions h(·) that do not depend on U_{1:t−1}:

∫ h(θ) p(θ|Y_{1:t−1}) dθ ≈ (1/N) ∑_{i=1}^N h(θ_{t−1}^i) W_{t−1}^i.   (10.12)
Correction Step. The incremental likelihood p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) can be evaluated by iterating the particle filter forward for one period, starting from {s_{t−1}^{i,j}, 𝒲_{t−1}^{i,j}}_{j=1}^M. Using the notation in (10.1), the particle filter approximation of the likelihood increment can be written as

p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) = g(y_t|Y_{1:t−1}, U_{1:t}^i, θ_{t−1}^i).   (10.13)
The value of the likelihood function for Y_{1:t} can be tracked recursively as follows:

p̂(Y_{1:t}|θ_{t−1}^i) = p̂(y_t|Y_{1:t−1}, θ_{t−1}^i) p̂(Y_{1:t−1}|θ_{t−1}^i)
    = g(y_t|Y_{1:t−1}, U_{1:t}^i, θ_{t−1}^i) g(Y_{1:t−1}|U_{1:t−1}^i, θ_{t−1}^i)   (10.14)
    = g(Y_{1:t}|U_{1:t}^i, θ_{t−1}^i).

The last equality follows because conditioning g(Y_{1:t−1}|U_{1:t−1}^i, θ_{t−1}^i) also on U_t does not change the particle filter approximation of the likelihood function for Y_{1:t−1}.
By induction, we can deduce from (10.11) that the Monte Carlo average (1/N) ∑_{i=1}^N h(θ_{t−1}^i) w̃_t^i W_{t−1}^i approximates the following integral:

∫∫ h(θ) g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) p(θ|Y_{1:t−1}) dU_{1:t} dθ
    = ∫ h(θ) ( ∫ g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) dU_{1:t} ) p(θ|Y_{1:t−1}) dθ.   (10.15)

Provided that the particle filter approximation of the likelihood increment is unbiased, that is,

∫ g(y_t|Y_{1:t−1}, U_{1:t}, θ) p(U_{1:t}) dU_{1:t} = p(y_t|Y_{1:t−1}, θ)   (10.16)

for each θ, we deduce that h̃_{t,N} is a consistent estimator of E_{π_t}[h(θ)].
Selection Step. The selection step of Algorithm 19 is very similar to that of Algorithm 8. To simplify the description of the SMC² algorithm, we are resampling in every iteration. Moreover, we are keeping track of the ancestry information in the vector A_t. This is important because for each resampled particle i we not only need to know its value θ̂_t^i but we also want to track the corresponding value of the likelihood function p̂(Y_{1:t}|θ̂_t^i) as well as the particle approximation of the state, given by {s_t^{i,j}, 𝒲_t^{i,j}}, and the set of random numbers U_{1:t}^i. In the implementation of the algorithm, the likelihood values are needed for the mutation step and the state particles are useful for a quick evaluation of the incremental likelihood in the correction step of iteration t+1 (see above). The U_{1:t}^i's are not required for the actual implementation of the algorithm but are useful to provide a heuristic explanation for the validity of the algorithm.
Mutation Step. The mutation step essentially consists of one iteration of the PFMH algorithm described in Chapter 9.1. For each particle i there is a proposed value ϑ_t^i and an associated particle filter approximation p̂(Y_{1:t}|ϑ_t^i) of the likelihood, based on a sequence of random vectors U_{1:t}^* drawn from the distribution p(U_{1:t}) in (10.2). As in (9.8), the densities p(U_{1:t}^i) and p(U_{1:t}^*) cancel from the formula for the acceptance probability α(ϑ_t^i|θ̂_t^i). For the implementation it is important to record the likelihood value as well as the particle system for the state s_t for each particle θ_t^i.
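Putting the three steps together, the outer loop of the algorithm can be sketched as follows. The function pf_increment is a hypothetical placeholder for a one-period particle filter update (Algorithm 12), the mutation step is suppressed, and all numerical values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200
    theta = rng.standard_normal((N, 2))   # stage 0: draws from the prior
    W = np.ones(N)
    log_lik = np.zeros(N)                 # running ln p-hat(Y_{1:t}|theta^i)

    def pf_increment(th, t):
        # Placeholder for one period of Algorithm 12 at parameter th; the
        # noise term mimics the simulation error of the particle filter.
        return np.exp(-0.5 * np.sum(th**2) / (t + 10) + 0.05 * rng.standard_normal())

    for t in range(1, 11):
        # Correction (10.3)-(10.4): reweight by the likelihood increment.
        w_tilde = np.array([pf_increment(th, t) for th in theta])
        W_tilde = w_tilde * W / np.mean(w_tilde * W)
        log_lik += np.log(w_tilde)
        # Selection: multinomial resampling; ancestors carry their likelihood
        # values (and, in a full implementation, their state particle swarms).
        idx = rng.choice(N, size=N, p=W_tilde / W_tilde.sum())
        theta, log_lik, W = theta[idx], log_lik[idx], np.ones(N)
        # Mutation: one PFMH step per particle would follow here (Chapter 9.1).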
10.2 Application to the Small-Scale DSGE Model
We now present an application of the SMC² algorithm to the small-scale DSGE model. The results in this section can be compared to the results obtained in Chapter 9.2. Because the SMC² algorithm requires an unbiased approximation of the likelihood function, we will use data tempering instead of likelihood tempering as in Chapter 5.3. Overall, we compare the output of four algorithms: SMC² based on the conditionally-optimal PF; SMC² based on the bootstrap PF; SMC based on the Kalman filter likelihood function using data tempering; and SMC based on the Kalman filter likelihood function using likelihood tempering. In order to approximate the likelihood function with the particle filter, we are using M = 40,000 particles for the bootstrap PF and M = 400 particles for the conditionally-optimal PF. The approximation of the posterior distribution is based on N = 4,000 particles for θ, N_φ = T + 1 = 81 stages under data tempering, and N_blocks = 3 blocks for the mutation step.
Table 10.2 summarizes the results from running each algorithm Nrun = 20 times. We report pooled posterior means from the output of the 20 runs as well as inefficiency factors InEff_N(θ̄) and the standard deviation of the posterior mean approximations across the 20 runs. The results in the column labeled KF(L) are based on the Kalman filter likelihood evaluation and obtained from the same algorithm that was used in Chapter 5.3. The results in column KF(D) are also based on the Kalman filter, but the SMC algorithm uses data tempering instead of likelihood tempering. The columns CO-PF and BS-PF contain SMC² results based on the conditionally-optimal and the bootstrap PF, respectively. The pooled means of the DSGE model parameters computed from the output of the KF(L), KF(D), and CO-PF algorithms are essentially identical. The log marginal data density approximations are less accurate than the posterior mean approximations and vary for the first three algorithms from -358.75 to -356.33.
A comparison of the standard deviations and the inefficiency factors indicates that moving from likelihood tempering to data tempering leads to a deterioration of accuracy. For
instance, the standard deviation of the log marginal data density increases from 0.12 to
1.19.
Table 10.2: Accuracy of SMC² Approximations

          Posterior Mean (Pooled)              Inefficiency Factors                Std Dev of Means
          KF(L)    KF(D)    CO-PF    BS-PF     KF(L)   KF(D)   CO-PF    BS-PF      KF(L)  KF(D)  CO-PF  BS-PF
τ          2.65     2.67     2.68     2.53      1.51    4.28    60.18     7223     0.01   0.03   0.07   1.30
κ          0.81     0.81     0.81     0.70      1.40    8.36    40.60     4785     0.00   0.01   0.01   0.18
ψ1         1.87     1.88     1.87     1.89      3.29   18.27    22.56     4197     0.01   0.02   0.02   0.27
ψ2         0.66     0.66     0.67     0.65      2.72   10.02    43.30    14979     0.01   0.02   0.03   0.34
ρr         0.75     0.75     0.75     0.72      1.31   11.39   250.34    21736     0.00   0.00   0.01   0.08
ρg         0.98     0.98     0.98     0.95      1.32   15.06    35.35    10802     0.00   0.00   0.00   0.04
ρz         0.88     0.88     0.88     0.84      3.16   26.58    73.78     7971     0.00   0.00   0.00   0.05
r(A)       0.45     0.46     0.44     0.46      1.09   40.45   158.64     6529     0.00   0.02   0.04   0.42
π(A)       3.32     3.31     3.31     3.56      2.15   32.35   133.25     5296     0.01   0.03   0.06   0.40
γ(Q)       0.59     0.59     0.59     0.64      2.35    7.29    43.96    16084     0.00   0.01   0.03   0.16
σr         0.24     0.24     0.24     0.26      0.75    1.48    20.20     5098     0.00   0.00   0.00   0.06
σg         0.68     0.68     0.68     0.76      0.73    3.63    26.98    41284     0.00   0.00   0.00   0.08
σz         0.32     0.32     0.32     0.42      2.32   10.41    47.60     6570     0.00   0.00   0.00   0.11
ln p̂(Y)  -357.34  -356.33  -358.75  -340.47                                        0.120  1.191  4.374  14.49

Notes: Results are based on Nrun = 20 runs of the SMC² algorithm with N = 4,000 particles. D is data tempering and L is likelihood tempering. KF is Kalman filter, CO-PF is conditionally-optimal PF with M = 400, BS-PF is bootstrap PF with M = 40,000. CO-PF and BS-PF use data tempering.

As discussed in Chapter 5.3, in DSGE model applications it is important to use a
convex tempering schedule that adds very little likelihood information in the initial stages. The implied tempering schedule of our sequential estimation procedure is linear and adds a full observation in stage n = 1 (recall that n = 0 corresponds to sampling from the prior distribution). Replacing the Kalman filter evaluation of the likelihood function by the conditionally-optimal particle filter increases the standard deviations further. Compared to KF(D), the standard deviations of the posterior mean approximations increase by factors ranging from 1.5 to 5. The inefficiency factors for the KF(D) algorithm range from 1.5 to 40, whereas they range from 20 to 250 for CO-PF. A comparison with Table 9.1 indicates that the SMC algorithm is more sensitive to the switch from the Kalman filter likelihood to the particle filter approximation: using the conditionally-optimal particle filter, there was no comparable deterioration in the accuracy of the RWMH algorithm. Finally, replacing the conditionally-optimal PF by the bootstrap PF leads to an additional deterioration in accuracy. Compared to KF(D), the standard deviations for the BS-PF approach are an order of magnitude larger and the smallest inefficiency factor is 4,200. Nonetheless, the pooled posterior means are fairly close to those obtained from the other three algorithms.
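The contrast between the two schedules is easy to visualize; the exponent λ = 2 below is an arbitrary illustrative choice for a convex schedule, not the value used elsewhere in the book.

    import numpy as np

    T = 80                                   # sample size, so N_phi = T under data tempering
    n = np.arange(T + 1)
    data_tempering = n / T                   # implied linear schedule: t_n = n
    likelihood_tempering = (n / T) ** 2.0    # convex schedule adds little information early
    print(np.round(data_tempering[:5], 3), np.round(likelihood_tempering[:5], 3))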
10.3 Computational Considerations
The SMC² results reported in Table 10.2 are obtained by utilizing 40 processors. We parallelized the likelihood evaluations p̂(Y_{1:t}|θ_t^i) for the θ_t^i particles rather than the particle filter computations for the swarms {s_t^{i,j}, 𝒲_t^{i,j}}_{j=1}^M. The likelihood evaluations are computationally costly and do not require communication across processors. The run time for the SMC² algorithm with the conditionally-optimal PF (N = 4,000, M = 400) is 23:24 [mm:ss], whereas the algorithm with the bootstrap PF (N = 4,000, M = 40,000) runs for 08:05:35 [hh:mm:ss]. The bootstrap PF performs poorly in terms of both accuracy and run time.
After running the particle filter for the sample Y_{1:t−1} one could in principle save the particle swarm for the final state s_{t−1} for each θ_t^i. In the period t forecasting step, this information can then be used to quickly evaluate the likelihood increment. In our experience with the small-scale DSGE model, the sheer memory size of the objects (in the range of 10-20 gigabytes) precluded us from saving the t−1 state particle swarms in a distributed parallel environment in which memory transfers are costly. Instead, we re-computed the entire likelihood for Y_{1:t} in each iteration.
Our sequential (data-tempering) implementation of the SMC² algorithm suffers from particle degeneracy in the initial stages, i.e., for small sample sizes. Instead of initially sampling from the prior distribution, one could initialize the algorithm by using an importance sampler with a Student-t proposal distribution that approximates the posterior distribution obtained conditional on a small set of observations, e.g., Y_{1:2} or Y_{1:5}, as suggested in Creal (2007).
Bibliography
Altug, S. (1989): “Time-to-Build and Aggregate Fluctuations: Some New Evidence,”
International Economic Review, 30(4), 889–920.
An, S., and F. Schorfheide (2007): “Bayesian Analysis of DSGE Models,” Econometric
Reviews, 26(2-4), 113–172.
Anderson, G. (2000): “A Reliable and Computationally Efficient Algorithm for Imposing
the Saddle Point Property in Dynamic Models,” Manuscript, Federal Reserve Board of
Governors.
Andreasen, M. M. (2013): “Non-Linear DSGE Models and the Central Difference Kalman
Filter,” Journal of Applied Econometrics, 28(6), 929–955.
Andrieu, C., A. Doucet, and R. Holenstein (2010): “Particle Markov Chain Monte
Carlo Methods,” Journal of the Royal Statistical Society Series B, 72(3), 269–342.
Ardia, D., N. Bastürk, L. Hoogerheide, and H. K. van Dijk (2012): “A Comparative Study of Monte Carlo Methods for Efficient Evaluation of Marginal Likelihood,”
Computational Statistics and Data Analysis, 56(11), 3398–3414.
Arulampalam, S., S. Maskell, N. Gordon, and T. Clapp (2002): “A Tutorial on
Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking,” IEEE Transactions on Signal Processing, 50(2), 174–188.
Aruoba, B., P. Cuba-Borda, and F. Schorfheide (2014): “Macroeconomic Dynamics
Near the ZLB: A Tale of Two Countries,” NBER Working Paper, 19248.
Aruoba, S. B., L. Bocola, and F. Schorfheide (2013): "Assessing DSGE Model Nonlinearities," NBER Working Paper, 19693, Manuscript, University of Maryland and University of Pennsylvania.
Aruoba, S. B., J. Fernández-Villaverde, and J. F. Rubio-Ramı́rez (2006): “Comparing Solution Methods for Dynamic Equilibrium Economies,” Journal of Economic Dynamics and Control, 30(12), 2477–2508.
Berzuini, C., and W. Gilks (2001): “RESAMPLE-MOVE Filtering with Cross-Model
Jumps,” in Sequential Monte Carlo Methods in Practice, ed. by A. Doucet, N. de Freitas,
and N. Gordon, pp. 117–138. Springer Verlag.
Bianchi, F. (2013): “Regime Switches, Agents’ Beliefs, and Post-World War II U.S.
Macroeconomic Dynamics,” Review of Economic Studies, 80(2), 463–490.
Binder, M., and H. Pesaran (1997): “Multivariate Linear Rational Expectations Models:
Characterization of the Nature of the Solutions and Their Fully Recursive Computation,”
Econometric Theory, 13(6), 877–888.
Blanchard, O. J., and C. M. Kahn (1980): “The Solution of Linear Difference Models
under Rational Expectations,” Econometrica, 48(5), 1305–1312.
Canova, F., and L. Sala (2009): “Back to Square One: Identification Issues in DSGE
Models,” Journal of Monetary Economics, 56, 431 – 449.
Cappé, O., S. J. Godsill, and E. Moulines (2007): “An Overview of Existing Methods
and Recent Advances in Sequential Monte Carlo,” Proceedings of the IEEE, 95(5), 899–
924.
Cappé, O., E. Moulines, and T. Ryden (2005): Inference in Hidden Markov Models.
Springer Verlag.
Chang, Y., T. Doh, and F. Schorfheide (2007): “Non-stationary Hours in a DSGE
Model,” Journal of Money, Credit, and Banking, 39(6), 1357–1373.
Chen, R., and J. Liu (2000): “Mixture Kalman Filters,” Journal of the Royal Statistical
Society Series B, 62, 493–508.
Chib, S., and E. Greenberg (1995): “Understanding the Metropolis-Hastings Algorithm,” The American Statistician, 49, 327–335.
Chib, S., and I. Jeliazkov (2001): “Marginal Likelihoods from the Metropolis Hastings
Output,” Journal of the American Statistical Association, 96(453), 270–281.
Chib, S., and S. Ramamurthy (2010): “Tailored Randomized Block MCMC Methods
with Application to DSGE Models,” Journal of Econometrics, 155(1), 19–38.
Chopin, N. (2002): “A Sequential Particle Filter for Static Models,” Biometrika, 89(3),
539–551.
(2004): “Central Limit Theorem for Sequential Monte Carlo Methods and its
Application to Bayesian Inference,” Annals of Statistics, 32(6), 2385–2411.
Chopin, N., P. E. Jacob, and O. Papaspiliopoulos (2012): “SM C 2 : An Efficient
Algorithm for Sequential Analysis of State-Space Models,” arXiv:1101.1528.
Christiano, L. J. (2002): "Solving Dynamic Equilibrium Models by a Method of Undetermined Coefficients," Computational Economics, 20(1-2), 21–55.
Christiano, L. J., M. Eichenbaum, and C. L. Evans (2005): “Nominal Rigidities
and the Dynamic Effects of a Shock to Monetary Policy,” Journal of Political Economy,
113(1), 1–45.
Cox, W., and E. Hirschhorn (1983): “The Market Value of US Government Debt,”
Journal of Monetary Economics, 11, 261–272.
Creal, D. (2007): “Sequential Monte Carlo Samplers for Bayesian DSGE Models,” Unpublished Manuscript, Vrije Universiteit.
(2012): “A Survey of Sequential Monte Carlo Methods for Economics and Finance,”
Econometric Reviews, 31(3), 245–296.
Curdia, V., and R. Reis (2009): “Correlated Disturbances and U.S. Business Cycles,”
Working Paper.
(2010): “Correlated Disturbances and U.S. Business Cycles,” Manuscript, Columbia
University and FRB New York.
Davig, T., and E. M. Leeper (2007): “Generalizing the Taylor Principle,” American
Economic Review, 97(3), 607–635.
DeJong, D. N., B. F. Ingram, and C. H. Whiteman (2000): “A Bayesian Approach
to Dynamic Macroeconomics,” Journal of Econometrics, 98(2), 203–223.
DeJong, D. N., R. Liesenfeld, G. V. Moura, J.-F. Richard, and H. Dharmarajan (2013): “Efficient Likelihood Evaluation of State-Space Representations,” Review of
Economic Studies, 80(2), 538–567.
Del Moral, P. (2004): Feynman-Kac Formulae. Springer Verlag.
(2013): Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC.
Del Negro, M., R. B. Hasegawa, and F. Schorfheide (2014): “Dynamic Prediction Pools: An Investigation of Financial Frictions and Forecasting Performance,” NBER
Working Paper, 20575.
Del Negro, M., and F. Schorfheide (2008): “Forming Priors for DSGE Models (and
How it Affects the Assessment of Nominal Rigidities),” Journal of Monetary Economics,
55(7), 1191–1208.
(2013): “DSGE Model-Based Forecasting,” in Handbook of Economic Forecasting,
ed. by G. Elliott, and A. Timmermann, vol. 2, forthcoming. North Holland, Amsterdam.
Doucet, A., N. de Freitas, and N. Gordon (eds.) (2001): Sequential Monte Carlo
Methods in Practice. Springer Verlag.
Doucet, A., and A. M. Johansen (2011): “A Tutorial on Particle Filtering and Smoothing: Fifteen Years Later,” in Handbook of Nonlinear Filtering, ed. by D. Crisan, and B. Rozovsky. Oxford University Press.
Durbin, J., and S. J. Koopman (2001): Time Series Analysis by State Space Methods.
Oxford University Press.
Durham, G., and J. Geweke (2014): “Adaptive Sequential Posterior Simulators for
Massively Parallel Computing Environments,” in Advances in Econometrics, ed. by I. Jeliazkov, and D. Poirier, vol. 34, chap. 6, pp. 1–44. Emerald Group Publishing Limited.
Farmer, R., D. Waggoner, and T. Zha (2009): “Understanding Markov Switching
Rational Expectations Models,” Journal of Economic Theory, 144(5), 1849–1867.
Fernández-Villaverde, J., G. Gordon, P. Guerrón-Quintana, and J. F. Rubio-Ramírez (2012): “Nonlinear Adventures at the Zero Lower Bound,” National Bureau of
Economic Research Working Paper.
Fernandez-Villaverde, J., and J. F. Rubio-Ramirez (2004): “Comparing Dynamic
Equilibrium Models to Data: A Bayesian Approach,” Journal of Econometrics, 123(1),
153–187.
Fernández-Villaverde, J., and J. F. Rubio-Ramı́rez (2007): “Estimating Macroeconomic Models: A Likelihood Approach,” Review of Economic Studies, 74(4), 1059–1087.
(2013): “Macroeconomics and Volatility: Data, Models, and Estimation,” in Advances in Economics and Econometrics: Tenth World Congress, ed. by D. Acemoglu,
M. Arellano, and E. Dekel, vol. 3, pp. 137–183. Cambridge University Press.
Flury, T., and N. Shephard (2011): “Bayesian Inference Based Only On Simulated
Likelihood: Particle Filter Analysis of Dynamic Economic Models,” Econometric Theory,
27, 933–956.
Foerster, A., J. F. Rubio-Ramirez, D. F. Waggoner, and T. Zha (2014): “Perturbation Methods for Markov-Switching DSGE Models,” Manuscript, FRB Kansas City
and Atlanta.
Gali, J. (2008): Monetary Policy, Inflation, and the Business Cycle: An Introduction to
the New Keynesian Framework. Princeton University Press.
Geweke, J. (1989): “Bayesian Inference in Econometric Models Using Monte Carlo Integration,” Econometrica, 57(6), 1317–1339.
(1999): “Using Simulation Methods for Bayesian Econometric Models: Inference,
Development, and Communication,” Econometric Reviews, 18(1), 1–126.
(2005): Contemporary Bayesian Econometrics and Statistics. John Wiley & Sons,
Inc.
Girolami, M., and B. Calderhead (2011): “Riemann Manifold Langevin and Hamilton
Monte Carlo Methods (with discussion),” Journal of the Royal Statistical Society Series
B, 73, 123–214.
Gordon, N., D. Salmond, and A. F. Smith (1993): “Novel Approach to Nonlinear/NonGaussian Bayesian State Estimation,” Radar and Signal Processing, IEE Proceedings F,
140(2), 107–113.
Guo, D., X. Wang, and R. Chen (2005): “New Sequential Monte Carlo Methods for
Nonlinear Dynamic Systems,” Statistics and Computing, 15, 135–147.
Gust, C., D. Lopez-Salido, and M. E. Smith (2012): “The Empirical Implications of
the Interest-Rate Lower Bound,” Manuscript, Federal Reserve Board.
Halpern, E. F. (1974): “Posterior Consistency for Coefficient Estimation and Model Selection in the General Linear Hypothesis,” Annals of Statistics, 2(4), 703–712.
Hamilton, J. D. (1989): “A New Approach to the Economic Analysis of Nonstationary
Time Series and the Business Cycle,” Econometrica, 57(2), 357–384.
(1994): Time Series Analysis. Princeton University Press.
Hammersley, J., and D. Handscomb (1964): Monte Carlo Methods. Methuen and Company, London.
Hastings, W. (1970): “Monte Carlo Sampling Methods Using Markov Chains and Their
Applications,” Biometrika, 57, 97–109.
Herbst, E. (2011): “Gradient and Hessian-based MCMC for DSGE Models,” Unpublished
Manuscript, University of Pennsylvania.
Herbst, E., and F. Schorfheide (2014): “Sequential Monte Carlo Sampling for DSGE
Models,” Journal of Applied Econometrics, forthcoming.
Hoeting, J. A., D. Madigan, A. E. Raftery, and C. T. Volinsky (1999): “Bayesian
Model Averaging: A Tutorial,” Statistical Science, 14(4), 382–417.
Holmes, M. (1995): Introduction to Perturbation Methods. Cambridge University Press.
Ireland, P. N. (2004): “A Method for Taking Models to the Data,” Journal of Economic
Dynamics and Control, 28(6), 1205–1226.
Iskrev, N. (2010): “Local Identification of DSGE Models,” Journal of Monetary Economics, 57(2), 189–202.
Johnson, R. (1970): “Asymptotic Expansions Associated with Posterior Distributions,”
Annals of Mathematical Statistics, 41, 851–864.
Judd, K. (1998): Numerical Methods in Economics. MIT Press, Cambridge.
Justiniano, A., and G. E. Primiceri (2008): “The Time-Varying Volatility of Macroeconomic Fluctuations,” American Economic Review, 98(3), 604–641.
Kantas, N., A. Doucet, S. Singh, J. Maciejowski, and N. Chopin (2014): “On
Particle Methods for Parameter Estimation in State-Space Models,” arXiv Working Paper,
1412.8659v1.
Kass, R. E., and A. E. Raftery (1995): “Bayes Factors,” Journal of the American
Statistical Association, 90(430), 773–795.
Kim, J., S. Kim, E. Schaumburg, and C. A. Sims (2008): “Calculating and Using
Second-Order Accurate Solutions of Discrete Time Dynamic Equilibrium Models,” Journal
of Economic Dynamics and Control, 32, 3397–3414.
Kim, J., and F. Ruge-Murcia (2009): “How Much Inflation is Necessary to Grease the
Wheels,” Journal of Monetary Economics, 56, 365–377.
King, R. G., C. I. Plosser, and S. Rebelo (1988): “Production, Growth, and Business
Cycles: I The Basic Neoclassical Model,” Journal of Monetary Economics, 21(2-3), 195–
232.
King, R. G., and M. W. Watson (1998): “The Solution of Singular Linear Difference
Systems under Rational Expectations,” International Economic Review, 39(4), 1015–1026.
Klein, P. (2000): “Using the Generalized Schur Form to Solve a Multivariate Linear Rational Expectations Model,” Journal of Economic Dynamics and Control, 24(10), 1405–1423.
Kloek, T., and H. K. van Dijk (1978): “Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo,” Econometrica, 46, 1–20.
Koenker, R. (2005): Quantile Regression. Cambridge University Press.
Koenker, R., and G. Bassett (1978): “Regression Quantiles,” Econometrica, 46(1),
33–50.
Kohn, R., P. Giordani, and I. Strid (2010): “Adaptive Hybrid Metropolis-Hastings
Samplers for DSGE Models,” Working Paper.
Kollmann, R. (2014): “Tractable Latent State Filtering for Non-Linear DSGE Models
Using a Second-Order Approximation and Pruning,” Computational Economics, forthcoming.
Komunjer, I., and S. Ng (2011): “Dynamic Identification of DSGE Models,” Econometrica, 79(6), 1995–2032.
Koop, G. (2003): Bayesian Econometrics. John Wiley & Sons, Inc.
Künsch, H. R. (2005): “Recursive Monte Carlo Filters: Algorithms and Theoretical Analysis,” Annals of Statistics, 33(5), 1983–2021.
Kydland, F. E., and E. C. Prescott (1982): “Time to Build and Aggregate Fluctuations,” Econometrica, 50(6), 1345–1370.
Lancaster, T. (2004): An Introduction to Modern Bayesian Econometrics. Blackwell Publishing.
Leamer, E. E. (1978): Specification Searches. Wiley, New York.
Leeper, E. M., M. Plante, and N. Traum (2010): “Dynamics of Fiscal Financing
in the United States,” Journal of Econometrics, 156, 304–321.
Liu, J. S. (2001): Monte Carlo Strategies in Scientific Computing. Springer Verlag.
Liu, J. S., and R. Chen (1998): “Sequential Monte Carlo Methods for Dynamic Systems,”
Journal of the American Statistical Association, 93(443), 1032–1044.
Liu, J. S., R. Chen, and T. Logvinenko (2001): “A Theoretical Framework for Sequential Importance Sampling with Resampling,” in Sequential Monte Carlo Methods in
Practice, ed. by A. Doucet, N. de Freitas, and N. Gordon, pp. 225–246. Springer Verlag.
Liu, Z., D. F. Waggoner, and T. Zha (2011): “Sources of Macroeconomic Fluctuations:
A Regime-switching DSGE Approach,” Quantitative Economics, 2, 251–301.
Lubik, T., and F. Schorfheide (2003): “Computing Sunspot Equilibria in Linear Rational Expectations Models,” Journal of Economic Dynamics and Control, 28(2), 273–285.
(2006): “A Bayesian Look at New Open Economy Macroeconomics,” NBER Macroeconomics Annual 2005.
Maliar, L., and S. Maliar (2014): “Merging Simulation and Projection Approaches
to Solve High-Dimensional Problems with an Application to a New Keynesian Model,”
Quantitative Economics, forthcoming.
Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller
(1953): “Equations of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, 21, 1087–1091.
Min, C.-K., and A. Zellner (1993): “Bayesian and Non-Bayesian Methods for Combining Models and Forecasts with Applications to Forecasting International Growth Rates,”
Journal of Econometrics, 56(1-2), 89–118.
Müller, U. (2011): “Measuring Prior Sensitivity and Prior Informativeness in Large
Bayesian Models,” Manuscript, Princeton University.
Murray, L. M., A. Lee, and P. E. Jacob (2014): “Parallel Resampling in the Particle
Filter,” arXiv Working Paper, 1301.4019v2.
Neal, R. (2010): “MCMC using Hamiltonian Dynamics,” in Handbook of Markov Chain
Monte Carlo, ed. by S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, pp. 113–162.
Chapman & Hall, CRC Press.
Otrok, C. (2001): “On Measuring the Welfare Costs of Business Cycles,” Journal of
Monetary Economics, 47(1), 61–92.
Phillips, D., and A. Smith (1994): “Bayesian Model Comparison via Jump Diffusions,”
Technical report 94-20, Imperial College of Science, Technology, and Medicine, London.
Pitt, M. K., and N. Shephard (1999): “Filtering via Simulation: Auxiliary Particle
Filters,” Journal of the American Statistical Association, 94(446), 590–599.
Qi, Y., and T. P. Minka (2002): “Hessian-based Markov Chain Monte-Carlo Algorithms,”
Unpublished Manuscript.
Rı́os-Rull, J.-V., F. Schorfheide, C. Fuentes-Albero, M. Kryshko, and
R. Santaeulalia-Llopis (2012): “Methods versus Substance: Measuring the Effects
of Technology Shocks,” Journal of Monetary Economics, 59(8), 826–846.
Robert, C. P. (1994): The Bayesian Choice. Springer Verlag.
Robert, C. P., and G. Casella (2004): Monte Carlo Statistical Methods. Springer.
Roberts, G., and J. S. Rosenthal (1998): “Markov-Chain Monte Carlo: Some Practical
Implications of Theoretical Results,” The Canadian Journal of Statistics, 25(1), 5–20.
Roberts, G., and O. Stramer (2002): “Langevin Diffusions and Metropolis-Hastings
Algorithms,” Methodology and Computing in Applied Probability, 4, 337–357.
Roberts, G. O., A. Gelman, and W. R. Gilks (1997): “Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms,” The Annals of Applied Probability,
7(1), 110–120.
Roberts, G. O., and S. Sahu (1997): “Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler,” Journal of the Royal Statistical Society.
Series B (Methodological), 59(2), 291–317.
Roberts, G. O., and R. Tweedie (1992): “Exponential Convergence of Langevin Diffusions and Their Discrete Approximations,” Bernoulli, 2, 341–363.
Rotemberg, J. J., and M. Woodford (1997): “An Optimization-Based Econometric
Framework for the Evaluation of Monetary Policy,” in NBER Macroeconomics Annual
1997, ed. by B. S. Bernanke, and J. J. Rotemberg. MIT Press, Cambridge.
Sargent, T. J. (1989): “Two Models of Measurements and the Investment Accelerator,”
Journal of Political Economy, 97(2), 251–287.
Schmitt-Grohé, S., and M. Uribe (2004): “Solving Dynamic General Equilibrium Models Using a Second-Order Approximation to the Policy Function,” Journal of Economic
Dynamics and Control, 28, 755–775.
(2012): “What’s News in Business Cycles?,” Econometrica, forthcoming.
Schorfheide, F. (2000): “Loss Function-based Evaluation of DSGE Models,” Journal of
Applied Econometrics, 15, 645–670.
(2005): “Learning and Monetary Policy Shifts,” Review of Economic Dynamics,
8(2), 392–419.
(2010): “Estimation and Evaluation of DSGE Models: Progress and Challenges,”
NBER Working Paper.
Schorfheide, F., D. Song, and A. Yaron (2014): “Identifying Long-Run Risks: A
Bayesian Mixed-Frequency Approach,” NBER Working Paper, 20303.
Schwarz, G. (1978): “Estimating the Dimension of a Model,” Annals of Statistics, 6(2),
461–464.
Sims, C. A. (2002): “Solving Linear Rational Expectations Models,” Computational Economics, 20, 1–20.
Sims, C. A., D. Waggoner, and T. Zha (2008): “Methods for Inference in Large
Multiple-Equation Markov-Switching Models,” Journal of Econometrics, 146(2), 255–274.
Smets, F., and R. Wouters (2003): “An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area,” Journal of the European Economic Association, 1(5),
1123–1175.
Smets, F., and R. Wouters (2007): “Shocks and Frictions in US Business Cycles: A
Bayesian DSGE Approach,” American Economic Review, 97, 586–608.
Smith, M. (2012): “Estimating Nonlinear Economic Models Using Surrogate Transitions,”
Manuscript, RePEc.
Strid, I. (2009): “Efficient Parallelisation of Metropolis-Hastings Algorithms Using a
Prefetching Approach,” Computational Statistics and Data Analysis, in press.
Tierney, L. (1994): “Markov Chains for Exploring Posterior Distributions,” The Annals
of Statistics, 22(4), 1701–1728.
van der Vaart, A. (1998): Asymptotic Statistics. Cambridge University Press.
Woodford, M. (2003): Interest and Prices. Princeton University Press.
Wright, J. (2008): “Bayesian Model Averaging and Exchange Rate Forecasting,” Journal
of Econometrics, 146, 329–341.
Appendix A
Model Descriptions
A.1 Smets-Wouters Model
The log-linearized equilibrium conditions of the Smets and Wouters (2007) model take the
following form:
\hat{y}_t = c_y \hat{c}_t + i_y \hat{i}_t + z_y \hat{z}_t + \varepsilon^g_t    (A.1)

\hat{c}_t = \frac{h/\gamma}{1+h/\gamma} \hat{c}_{t-1} + \frac{1}{1+h/\gamma} E_t \hat{c}_{t+1} + \frac{w l_c (\sigma_c - 1)}{\sigma_c (1+h/\gamma)} ( \hat{l}_t - E_t \hat{l}_{t+1} )
    - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} ( \hat{r}_t - E_t \hat{\pi}_{t+1} ) - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \varepsilon^b_t    (A.2)

\hat{i}_t = \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} \hat{i}_{t-1} + \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} E_t \hat{i}_{t+1} + \frac{1}{\varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)})} \hat{q}_t + \varepsilon^i_t    (A.3)

\hat{q}_t = \beta(1-\delta)\gamma^{-\sigma_c} E_t \hat{q}_{t+1} - \hat{r}_t + E_t \hat{\pi}_{t+1} + (1 - \beta(1-\delta)\gamma^{-\sigma_c}) E_t \hat{r}^k_{t+1} - \varepsilon^b_t    (A.4)

\hat{y}_t = \Phi ( \alpha \hat{k}^s_t + (1-\alpha) \hat{l}_t + \varepsilon^a_t )    (A.5)

\hat{k}^s_t = \hat{k}_{t-1} + \hat{z}_t    (A.6)

\hat{z}_t = \frac{1-\psi}{\psi} \hat{r}^k_t    (A.7)

\hat{k}_t = \frac{1-\delta}{\gamma} \hat{k}_{t-1} + (1 - (1-\delta)/\gamma) \hat{i}_t + (1 - (1-\delta)/\gamma) \varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)}) \varepsilon^i_t    (A.8)

\hat{\mu}^p_t = \alpha ( \hat{k}^s_t - \hat{l}_t ) - \hat{w}_t + \varepsilon^a_t    (A.9)

\hat{\pi}_t = \frac{\beta\gamma^{(1-\sigma_c)}}{1+\iota_p \beta\gamma^{(1-\sigma_c)}} E_t \hat{\pi}_{t+1} + \frac{\iota_p}{1+\beta\gamma^{(1-\sigma_c)}} \hat{\pi}_{t-1}
    - \frac{(1-\beta\gamma^{(1-\sigma_c)}\xi_p)(1-\xi_p)}{(1+\iota_p \beta\gamma^{(1-\sigma_c)})(1+(\Phi-1)\epsilon_p)\xi_p} \hat{\mu}^p_t + \varepsilon^p_t    (A.10)

\hat{r}^k_t = \hat{l}_t + \hat{w}_t - \hat{k}^s_t    (A.11)

\hat{\mu}^w_t = \hat{w}_t - \sigma_l \hat{l}_t - \frac{1}{1-h/\gamma} ( \hat{c}_t - (h/\gamma) \hat{c}_{t-1} )    (A.12)

\hat{w}_t = \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} ( E_t \hat{w}_{t+1} + E_t \hat{\pi}_{t+1} ) + \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} ( \hat{w}_{t-1} - \iota_w \hat{\pi}_{t-1} )
    - \frac{1+\beta\gamma^{(1-\sigma_c)}\iota_w}{1+\beta\gamma^{(1-\sigma_c)}} \hat{\pi}_t - \frac{(1-\beta\gamma^{(1-\sigma_c)}\xi_w)(1-\xi_w)}{(1+\beta\gamma^{(1-\sigma_c)})(1+(\lambda_w-1)\epsilon_w)\xi_w} \hat{\mu}^w_t + \varepsilon^w_t    (A.13)

\hat{r}_t = \rho \hat{r}_{t-1} + (1-\rho)( r_\pi \hat{\pi}_t + r_y (\hat{y}_t - \hat{y}^*_t) ) + r_{\Delta y} ( (\hat{y}_t - \hat{y}^*_t) - (\hat{y}_{t-1} - \hat{y}^*_{t-1}) ) + \varepsilon^r_t    (A.14)

The exogenous shocks evolve according to

\varepsilon^a_t = \rho_a \varepsilon^a_{t-1} + \eta^a_t    (A.15)
\varepsilon^b_t = \rho_b \varepsilon^b_{t-1} + \eta^b_t    (A.16)
\varepsilon^g_t = \rho_g \varepsilon^g_{t-1} + \rho_{ga} \eta^a_t + \eta^g_t    (A.17)
\varepsilon^i_t = \rho_i \varepsilon^i_{t-1} + \eta^i_t    (A.18)
\varepsilon^r_t = \rho_r \varepsilon^r_{t-1} + \eta^r_t    (A.19)
\varepsilon^p_t = \rho_p \varepsilon^p_{t-1} + \eta^p_t - \mu_p \eta^p_{t-1}    (A.20)
\varepsilon^w_t = \rho_w \varepsilon^w_{t-1} + \eta^w_t - \mu_w \eta^w_{t-1}    (A.21)
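Note that the price and wage markup shocks, (A.20) and (A.21), are ARMA(1,1) processes, so the lagged innovation has to be carried along when simulating or filtering them. The following is a minimal simulation sketch (Python, with purely illustrative parameter values; it is not code from the paper):

    import numpy as np

    def simulate_arma11_shock(rho, mu, sigma, T, seed=0):
        # eps[t] = rho * eps[t-1] + eta[t] - mu * eta[t-1], cf. (A.20)-(A.21)
        rng = np.random.default_rng(seed)
        eta = sigma * rng.standard_normal(T)   # i.i.d. innovations eta_t
        eps = np.zeros(T)
        for t in range(1, T):
            eps[t] = rho * eps[t - 1] + eta[t] - mu * eta[t - 1]
        return eps

    # Illustrative values only (not estimates from the paper):
    eps_p = simulate_arma11_shock(rho=0.9, mu=0.7, sigma=0.1, T=200)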
The counterfactual no-rigidity prices and quantities evolve according to
\hat{y}^*_t = c_y \hat{c}^*_t + i_y \hat{i}^*_t + z_y \hat{z}^*_t + \varepsilon^g_t    (A.22)

\hat{c}^*_t = \frac{h/\gamma}{1+h/\gamma} \hat{c}^*_{t-1} + \frac{1}{1+h/\gamma} E_t \hat{c}^*_{t+1} + \frac{w l_c (\sigma_c - 1)}{\sigma_c (1+h/\gamma)} ( \hat{l}^*_t - E_t \hat{l}^*_{t+1} )
    - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \hat{r}^*_t - \frac{1-h/\gamma}{(1+h/\gamma)\sigma_c} \varepsilon^b_t    (A.23)

\hat{i}^*_t = \frac{1}{1+\beta\gamma^{(1-\sigma_c)}} \hat{i}^*_{t-1} + \frac{\beta\gamma^{(1-\sigma_c)}}{1+\beta\gamma^{(1-\sigma_c)}} E_t \hat{i}^*_{t+1} + \frac{1}{\varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)})} \hat{q}^*_t + \varepsilon^i_t    (A.24)

\hat{q}^*_t = \beta(1-\delta)\gamma^{-\sigma_c} E_t \hat{q}^*_{t+1} - \hat{r}^*_t + (1 - \beta(1-\delta)\gamma^{-\sigma_c}) E_t \hat{r}^{k*}_{t+1} - \varepsilon^b_t    (A.25)

\hat{y}^*_t = \Phi ( \alpha \hat{k}^{s*}_t + (1-\alpha) \hat{l}^*_t + \varepsilon^a_t )    (A.26)

\hat{k}^{s*}_t = \hat{k}^*_{t-1} + \hat{z}^*_t    (A.27)

\hat{z}^*_t = \frac{1-\psi}{\psi} \hat{r}^{k*}_t    (A.28)

\hat{k}^*_t = \frac{1-\delta}{\gamma} \hat{k}^*_{t-1} + (1 - (1-\delta)/\gamma) \hat{i}^*_t + (1 - (1-\delta)/\gamma) \varphi\gamma^2 (1+\beta\gamma^{(1-\sigma_c)}) \varepsilon^i_t    (A.29)

\hat{w}^*_t = \alpha ( \hat{k}^{s*}_t - \hat{l}^*_t ) + \varepsilon^a_t    (A.30)

\hat{r}^{k*}_t = \hat{l}^*_t + \hat{w}^*_t - \hat{k}^*_t    (A.31)

\hat{w}^*_t = \sigma_l \hat{l}^*_t + \frac{1}{1-h/\gamma} ( \hat{c}^*_t - (h/\gamma) \hat{c}^*_{t-1} )    (A.32)
The steady state (ratios) that appear in the measurement equation or the log-linearized
equilibrium conditions are given by
\gamma = \bar{\gamma}/100 + 1    (A.33)
\pi^* = \bar{\pi}/100 + 1    (A.34)
\bar{r} = 100 ( \beta^{-1} \gamma^{\sigma_c} \pi^* - 1 )    (A.35)
r^k_{ss} = \gamma^{\sigma_c}/\beta - (1-\delta)    (A.36)
w_{ss} = \left( \frac{\alpha^\alpha (1-\alpha)^{(1-\alpha)}}{\Phi (r^k_{ss})^\alpha} \right)^{\frac{1}{1-\alpha}}    (A.37)
i_k = (1 - (1-\delta)/\gamma) \gamma    (A.38)
l_k = \frac{1-\alpha}{\alpha} \frac{r^k_{ss}}{w_{ss}}    (A.39)
k_y = \Phi l_k^{(\alpha-1)}    (A.40)
i_y = (\gamma - 1 + \delta) k_y    (A.41)
c_y = 1 - g_y - i_y    (A.42)
z_y = r^k_{ss} k_y    (A.43)
w l_c = \frac{1}{\lambda_w} \frac{1-\alpha}{\alpha} \frac{r^k_{ss} k_y}{c_y}    (A.44)
The measurement equations take the form:
YGR_t = \bar{\gamma} + \hat{y}_t - \hat{y}_{t-1}
INF_t = \bar{\pi} + \hat{\pi}_t
FFR_t = \bar{r} + \hat{R}_t
CGR_t = \bar{\gamma} + \hat{c}_t - \hat{c}_{t-1}
IGR_t = \bar{\gamma} + \hat{i}_t - \hat{i}_{t-1}
WGR_t = \bar{\gamma} + \hat{w}_t - \hat{w}_{t-1}
HOURS_t = \bar{l} + \hat{l}_t    (A.45)
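The steady-state ratios (A.33)-(A.44) are closed-form functions of the deep parameters and supply the constants γ̄, π̄, r̄, and l̄ that enter the measurement equations (A.45). A sketch of this mapping (our own illustration, not the authors' code; names follow the appendix, and the defaults for δ and g_y are the values fixed during estimation):

    def sw_steady_state(gamma_bar, pi_bar, beta, sigma_c, alpha, Phi, lambda_w,
                        delta=0.025, g_y=0.18):
        # Equations (A.33)-(A.44), in order.
        gamma = gamma_bar / 100 + 1
        pi_star = pi_bar / 100 + 1
        r_bar = 100 * (gamma**sigma_c * pi_star / beta - 1)
        rk_ss = gamma**sigma_c / beta - (1 - delta)
        w_ss = (alpha**alpha * (1 - alpha)**(1 - alpha)
                / (Phi * rk_ss**alpha))**(1 / (1 - alpha))
        i_k = (1 - (1 - delta) / gamma) * gamma
        l_k = (1 - alpha) / alpha * rk_ss / w_ss
        k_y = Phi * l_k**(alpha - 1)
        i_y = (gamma - 1 + delta) * k_y
        c_y = 1 - g_y - i_y
        z_y = rk_ss * k_y
        wl_c = (1 / lambda_w) * (1 - alpha) / alpha * rk_ss * k_y / c_y
        return dict(gamma=gamma, pi_star=pi_star, r_bar=r_bar, rk_ss=rk_ss,
                    w_ss=w_ss, i_k=i_k, l_k=l_k, k_y=k_y, i_y=i_y, c_y=c_y,
                    z_y=z_y, wl_c=wl_c)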
Table A-1: Diffuse Prior for SW Model

Parameter        Type        Para (1)  Para (2)    Parameter  Type        Para (1)  Para (2)
ϕ                Normal      4.00      4.50        α          Normal      0.30      0.15
σc               Normal      1.50      1.11        ρa         Uniform     0.00      1.00
h                Uniform     0.00      1.00        ρb         Uniform     0.00      1.00
ξw               Uniform     0.00      1.00        ρg         Uniform     0.00      1.00
σl               Normal      2.00      2.25        ρi         Uniform     0.00      1.00
ξp               Uniform     0.00      1.00        ρr         Uniform     0.00      1.00
ιw               Uniform     0.00      1.00        ρp         Uniform     0.00      1.00
ιp               Uniform     0.00      1.00        ρw         Uniform     0.00      1.00
ψ                Uniform     0.00      1.00        µp         Uniform     0.00      1.00
Φ                Normal      1.25      0.36        µw         Uniform     0.00      1.00
rπ               Normal      1.50      0.75        ρga        Uniform     0.00      1.00
ρ                Uniform     0.00      1.00        σa         Inv. Gamma  0.10      2.00
ry               Normal      0.12      0.15        σb         Inv. Gamma  0.10      2.00
r∆y              Normal      0.12      0.15        σg         Inv. Gamma  0.10      2.00
π                Gamma       0.62      0.30        σi         Inv. Gamma  0.10      2.00
100(β^{-1} − 1)  Gamma       0.25      0.30        σr         Inv. Gamma  0.10      2.00
l                Normal      0.00      6.00        σp         Inv. Gamma  0.10      2.00
γ                Normal      0.40      0.30        σw         Inv. Gamma  0.10      2.00

Notes: Para (1) and Para (2) correspond to the mean and standard deviation of the Beta,
Gamma, and Normal distributions and to the lower and upper bounds of the support for
the Uniform distribution. For the Inv. Gamma distribution, Para (1) and Para (2) refer to
s and ν, where p(σ|ν, s) ∝ σ^{−ν−1} e^{−νs²/(2σ²)}. The following parameters are fixed during the
estimation: δ = 0.025, g_y = 0.18, λ_w = 1.50, ε_w = 10.0, and ε_p = 10.
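As a check on the parameterization described in the notes, the (unnormalized) Inv. Gamma density can be evaluated directly; a small sketch, assuming exactly the kernel stated above:

    import numpy as np

    def log_inv_gamma_kernel(sigma, s, nu):
        # log p(sigma | nu, s) up to a constant, with
        # p(sigma | nu, s) proportional to sigma^(-nu-1) * exp(-nu * s^2 / (2 * sigma^2))
        if sigma <= 0:
            return -np.inf
        return -(nu + 1) * np.log(sigma) - nu * s**2 / (2 * sigma**2)

    # The table's Para (1) = 0.10 and Para (2) = 2.00 correspond to s = 0.10, nu = 2.00:
    print(log_inv_gamma_kernel(0.2, s=0.10, nu=2.00))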
Table A-2: SW Model: Original Prior

Parameter        Type        Para (1)  Para (2)    Parameter  Type        Para (1)  Para (2)
ϕ                Normal      4.00      1.50        α          Normal      0.30      0.05
σc               Normal      1.50      0.37        ρa         Beta        0.50      0.20
h                Beta        0.70      0.10        ρb         Beta        0.50      0.20
ξw               Beta        0.50      0.10        ρg         Beta        0.50      0.20
σl               Normal      2.00      0.75        ρi         Beta        0.50      0.20
ξp               Beta        0.50      0.10        ρr         Beta        0.50      0.20
ιw               Beta        0.50      0.15        ρp         Beta        0.50      0.20
ιp               Beta        0.50      0.15        ρw         Beta        0.50      0.20
ψ                Beta        0.50      0.15        µp         Beta        0.50      0.20
Φ                Normal      1.25      0.12        µw         Beta        0.50      0.20
rπ               Normal      1.50      0.25        ρga        Beta        0.50      0.20
ρ                Beta        0.75      0.10        σa         Inv. Gamma  0.10      2.00
ry               Normal      0.12      0.05        σb         Inv. Gamma  0.10      2.00
r∆y              Normal      0.12      0.05        σg         Inv. Gamma  0.10      2.00
π                Gamma       0.62      0.10        σi         Inv. Gamma  0.10      2.00
100(β^{-1} − 1)  Gamma       0.25      0.10        σr         Inv. Gamma  0.10      2.00
l                Normal      0.00      2.00        σp         Inv. Gamma  0.10      2.00
γ                Normal      0.40      0.10        σw         Inv. Gamma  0.10      2.00

Notes: Para (1) and Para (2) correspond to the mean and standard deviation of the Beta,
Gamma, and Normal distributions and to the lower and upper bounds of the support for
the Uniform distribution. For the Inv. Gamma distribution, Para (1) and Para (2) refer to
s and ν, where p(σ|ν, s) ∝ σ^{−ν−1} e^{−νs²/(2σ²)}.
A.2 Leeper-Plante-Traum Fiscal Policy Model

The log-linearized equilibrium conditions of the Leeper, Plante, and Traum (2010) model are given by:
\hat{u}^b_t - \frac{\gamma(1+h)}{1-h} \hat{C}_t + \frac{\gamma h}{1-h} \hat{C}_{t-1} - \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    = \hat{R}_t + E_t \hat{u}^b_{t+1} - \frac{\gamma}{1-h} E_t \hat{C}_{t+1} - \frac{\tau^c}{1+\tau^c} E_t \hat{\tau}^c_{t+1}    (A.46)

\hat{u}^l_t + (1+\kappa) \hat{l}_t + \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    = \hat{Y}_t - \frac{\tau^l}{1+\tau^l} \hat{\tau}^l_t - \frac{\gamma}{1-h} \hat{C}_t + \frac{\gamma h}{1-h} \hat{C}_{t-1}    (A.47)

\hat{q}_t = E_t \hat{u}^b_{t+1} - \hat{u}^b_t - \frac{\gamma}{1-h} E_t \hat{C}_{t+1} + \frac{\gamma(1+h)}{1-h} \hat{C}_t - \frac{\gamma h}{1-h} \hat{C}_{t-1}
    - \frac{\tau^c}{1+\tau^c} E_t \hat{\tau}^c_{t+1} + \frac{\tau^c}{1+\tau^c} \hat{\tau}^c_t
    + \beta(1-\tau^k) \alpha \frac{Y}{K} E_t \hat{Y}_{t+1} - \beta(1-\tau^k) \alpha \frac{Y}{K} \hat{K}_t - \beta \tau^k \alpha \frac{Y}{K} E_t \hat{\tau}^k_{t+1}
    - \beta \delta_1 E_t \hat{\nu}_{t+1} + \beta(1-\delta_0) E_t \hat{q}_{t+1}    (A.48)

\hat{Y}_t = \frac{\tau^k}{1-\tau^k} \hat{\tau}^k_t + \hat{K}_{t-1} + \hat{q}_t + \left( 1 + \frac{\delta_2}{\delta_1} \right) \hat{\nu}_t    (A.49)

0 = \frac{1}{s''(1)} \hat{q}_t - (1+\beta) \hat{I}_t + \hat{I}_{t-1} + \beta E_t \hat{I}_{t+1} - \hat{u}^i_t + \beta E_t \hat{u}^i_{t+1}    (A.50)

Y \hat{Y}_t = C \hat{C}_t + G \hat{G}_t + I \hat{I}_t    (A.51)

\hat{K}_t = (1-\delta_0) \hat{K}_{t-1} + \delta_1 \hat{\nu}_t + \delta_0 \hat{I}_t    (A.52)

B \hat{B}_t + \tau^k \alpha Y ( \hat{\tau}^k_t + \hat{Y}_t ) + \tau^l (1-\alpha) Y ( \hat{\tau}^l_t + \hat{Y}_t ) + \tau^c C ( \hat{\tau}^c_t + \hat{C}_t )
    = \frac{B}{\beta} \hat{R}_{t-1} + \frac{B}{\beta} \hat{B}_{t-1} + G \hat{G}_t + Z \hat{Z}_t    (A.53)

\hat{Y}_t = \hat{u}^a_t + \alpha \hat{\nu}_t + \alpha \hat{K}_{t-1} + (1-\alpha) \hat{L}_t    (A.54)
The fiscal policy rules and the law of motion of the exogenous shock processes are provided
in the main text (Chapter 6.3).
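For estimation, a linear system such as (A.46)-(A.54) is typically stacked into the canonical form of Sims (2002), Γ_0 s_t = Γ_1 s_{t-1} + Ψ ε_t + Π η_t. As a minimal sketch, the row for the capital accumulation equation (A.52) can be loaded as follows (the state ordering and parameter values are illustrative assumptions, not the paper's code):

    import numpy as np

    # Illustrative ordering of a truncated state vector: s_t = [K_t, nu_t, I_t]
    n = 3
    Gamma0 = np.zeros((n, n))
    Gamma1 = np.zeros((n, n))

    delta0, delta1 = 0.025, 0.035   # illustrative values only

    # (A.52): K_t - delta1 * nu_t - delta0 * I_t = (1 - delta0) * K_{t-1}
    Gamma0[0, 0] = 1.0
    Gamma0[0, 1] = -delta1
    Gamma0[0, 2] = -delta0
    Gamma1[0, 0] = 1.0 - delta0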
Table A-3: Fiscal Model: Posterior Moments - Part 2

             Based on LPT Prior           Based on Diff. Prior
Parameter    Mean    [5%, 95%] Int.       Mean    [5%, 95%] Int.

Endogenous Propagation Parameters
γ            2.5     [1.82, 3.35]         2.5     [1.81, 3.31]
κ            2.4     [1.70, 3.31]         2.5     [1.74, 3.37]
h            0.57    [0.46, 0.68]         0.57    [0.46, 0.67]
s''          7.0     [6.08, 7.98]         6.9     [6.06, 7.89]
δ2           0.25    [0.16, 0.39]         0.24    [0.16, 0.37]

Exogenous Shock Parameters
ρa           0.96    [0.93, 0.98]         0.96    [0.93, 0.98]
ρb           0.65    [0.60, 0.69]         0.65    [0.60, 0.69]
ρl           0.98    [0.96, 1.00]         0.98    [0.96, 1.00]
ρi           0.48    [0.38, 0.57]         0.47    [0.37, 0.57]
ρg           0.96    [0.94, 0.98]         0.96    [0.94, 0.98]
ρtk          0.93    [0.89, 0.97]         0.94    [0.88, 0.98]
ρtl          0.98    [0.95, 1.00]         0.93    [0.86, 0.98]
ρtc          0.93    [0.89, 0.97]         0.97    [0.94, 0.99]
ρz           0.95    [0.91, 0.98]         0.95    [0.91, 0.98]
σb           7.2     [6.48, 8.02]         7.2     [6.47, 8.00]
σl           3.2     [2.55, 4.10]         3.2     [2.55, 4.08]
σi           5.7     [4.98, 6.47]         5.6     [4.98, 6.40]
σa           0.64    [0.59, 0.70]         0.64    [0.59, 0.70]
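The entries in Table A-3 are posterior means and equal-tailed 90 percent credible intervals. Given a vector of posterior draws for one parameter, both are one-liners; the sketch below uses synthetic draws purely for illustration:

    import numpy as np

    def posterior_summary(draws):
        # Mean and [5%, 95%] interval, as reported in the table.
        return np.mean(draws), np.percentile(draws, [5.0, 95.0])

    rng = np.random.default_rng(0)
    draws = rng.normal(2.5, 0.45, size=10_000)   # synthetic stand-in for sampler output
    mean, (lo, hi) = posterior_summary(draws)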
Appendix B
Data Sources
B.1 Small-Scale New Keynesian DSGE Model
The data for the estimation come from Lubik and Schorfheide (2006). Here we detail the construction of the extended sample (2003:I to 2013:IV) for Chapter 8.6.
1. Per Capita Real Output Growth. Take the level of real gross domestic product (FRED mnemonic “GDPC1”), call it GDP_t. Take the quarterly average of the Civilian Non-institutional Population (FRED mnemonic “CNP16OV” / BLS series “LNS10000000”), call it POP_t. Then,

   Per Capita Real Output Growth = 100 [ ln(GDP_t / POP_t) − ln(GDP_{t−1} / POP_{t−1}) ].
2. Annualized Inflation. Take the CPI price level (FRED mnemonic “CPIAUCSL”), call it CPI_t. Then,

   Annualized Inflation = 400 ln(CPI_t / CPI_{t−1}).
3. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FEDFUNDS”), call it FFR_t. Then,

   Federal Funds Rate = FFR_t.
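Assuming the raw series have already been downloaded from FRED and aligned at quarterly frequency as pandas Series (the retrieval step and all names below are our own), the three observables can be assembled as follows:

    import numpy as np
    import pandas as pd

    def small_scale_observables(gdp, pop, cpi, ffr):
        # gdp: GDPC1, pop: quarterly-averaged CNP16OV, cpi: CPIAUCSL, ffr: FEDFUNDS
        ygr = 100 * np.log(gdp / pop).diff()   # per capita real output growth
        inf = 400 * np.log(cpi).diff()         # annualized inflation
        return pd.DataFrame({"YGR": ygr, "INF": inf, "FFR": ffr}).dropna()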
B.2 Smets-Wouters Model
The data cover 1966:Q1 to 2004:Q4. The construction follows that of Smets and Wouters (2007). Output data come from the NIPA; other sources are noted in the exposition.
1. Per Capita Real Output Growth. Take the level of real gross domestic product (FRED mnemonic “GDPC1”), call it GDP_t. Take the quarterly average of the Civilian Non-institutional Population (FRED mnemonic “CNP16OV” / BLS series “LNS10000000”), normalized so that its 1992:Q3 value is one, call it POP_t. Then,

   Per Capita Real Output Growth = 100 [ ln(GDP_t / POP_t) − ln(GDP_{t−1} / POP_{t−1}) ].
2. Per Capita Real Consumption Growth. Take the level of personal consumption expenditures (FRED mnemonic “PCEC”), call it CONS_t. Take the level of the GDP price deflator (FRED mnemonic “GDPDEF”), call it GDPP_t. Then,

   Per Capita Real Consumption Growth = 100 [ ln(CONS_t / (GDPP_t POP_t)) − ln(CONS_{t−1} / (GDPP_{t−1} POP_{t−1})) ].
3. Per Capita Real Investment Growth. Take the level of fixed private investment (FRED mnemonic “FPI”), call it INV_t. Then,

   Per Capita Real Investment Growth = 100 [ ln(INV_t / (GDPP_t POP_t)) − ln(INV_{t−1} / (GDPP_{t−1} POP_{t−1})) ].
4. Per Capita Real Wage Growth. Take the BLS measure of compensation per hour for the nonfarm business sector (FRED mnemonic “COMPNFB” / BLS series “PRS85006103”), call it W_t. Then,

   Per Capita Real Wage Growth = 100 [ ln(W_t / GDPP_t) − ln(W_{t−1} / GDPP_{t−1}) ].
5. Per Capita Hours Index. Take the index of average weekly nonfarm business hours (FRED mnemonic / BLS series “PRS85006023”), call it HOURS_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it EMP_t. Then,

   Per Capita Hours = 100 ln( HOURS_t EMP_t / POP_t ).

   The series is then demeaned.
6. Inflation. Take the GDP price deflator, then

   Inflation = 100 ln(GDPP_t / GDPP_{t−1}).
7. Federal Funds Rate. Take the effective federal funds rate (FRED mnemonic “FEDFUNDS”), call it FFR_t. Then,

   Federal Funds Rate = FFR_t / 4.
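Unlike the growth-rate observables, the hours series is a demeaned log level. A sketch, assuming quarterly pandas Series with a PeriodIndex so that the 1992:Q3 normalization can be looked up by label (function and names are our own):

    import numpy as np

    def per_capita_hours(hours_index, emp, pop):
        # Item 5: normalize EMP to 1 in 1992:Q3, form the log level, then demean.
        emp_norm = emp / emp.loc["1992Q3"]
        h = 100 * np.log(hours_index * emp_norm / pop)
        return h - h.mean()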
B.3 Leeper-Plante-Traum Fiscal Policy Model
The data cover 1960:Q1 to 2008:Q1. The construction follows that of Leeper, Plante, and Traum (2010). Output data come from the NIPA; other sources are noted in the exposition. Each series has a (separate) linear trend removed prior to estimation.
1. Real Investment. Take nominal personal consumption expenditures on durable goods (FRED mnemonic “PCDG”), call it PCED_t, and deflate it by the GDP deflator for personal consumption (FRED mnemonic “DPCERD3Q086SBEA”), call it P_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it POP_t. Then,

   Real Investment = 100 ln( (PCED_t / P_t) / POP_t ).
2. Real Consumption. Take nominal personal consumption expenditures on nondurable goods (FRED mnemonic “PCNG”), call it PCE_t, and deflate it by the GDP deflator for personal consumption (FRED mnemonic “DPCERD3Q086SBEA”), call it P_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it POP_t. Then,

   Real Consumption = 100 ln( (PCE_t / P_t) / POP_t ).
3. Real Hours Worked. Take the index of average weekly nonfarm business hours (FRED mnemonic / BLS series “PRS85006023”), call it HOURS_t. Take the number of employed civilians (FRED mnemonic “CE16OV”), normalized so that its 1992:Q3 value is 1, call it EMP_t. Then,

   Real Hours Worked = 100 ln( HOURS_t EMP_t / POP_t ).
4. Real Consumption Tax Revenues. Take federal government current tax receipts from production and imports (FRED mnemonic “W007RC1Q027SBEA”), call it CTAX_t. Then,

   Real Consumption Tax Revenues = 100 ln( (CTAX_t / P_t) / POP_t ).
5. Real Labor Tax Revenues. Take personal current tax revenues (FRED mnemonic “A074RC1Q027SBEA”), call it IT_t; take wage and salary accruals (FRED mnemonic “WASCUR”), call it W_t; and take proprietors’ income (FRED mnemonic “A04RC1Q027SBEA”), call it PRI_t. Take rental income (FRED mnemonic “RENTIN”), call it RENT_t; take corporate profits (FRED mnemonic “CPROFIT”), call it PROF_t; and take interest income (FRED mnemonic “W255RC1Q027SBEA”), call it INT_t. Define capital income, CI_t, as

   CI_t = RENT_t + PROF_t + INT_t + PRI_t / 2.

   Define the average personal income tax rate as

   τ^p_t = IT_t / (W_t + PRI_t / 2 + CI_t).

   Take contributions for government social insurance (FRED mnemonic “W780RC1Q027SBEA”), call it CSI_t, and take compensation of employees (FRED mnemonic “A57RC1Q027SBEA”), call it EC_t. Define the average labor income tax rate as

   τ^l_t = ( τ^p_t (W_t + PRI_t / 2) + CSI_t ) / (EC_t + PRI_t / 2).

   Take the tax base BASE_t = PCE_t + PCED_t. Then

   Real Labor Tax Revenues = 100 ln( (τ^l_t BASE_t / P_t) / POP_t ).
6. Real Capital Tax Revenues. Take taxes on corporate income (FRED mnemonic “B075RC1Q027SBEA”), call it CT_t, and take property taxes (FRED mnemonic “B249RC1Q027SBEA”), call it PT_t. Define the average capital income tax rate as

   τ^k_t = ( τ^p_t CI_t + CT_t ) / (CI_t + PT_t).

   Then

   Real Capital Tax Revenues = 100 ln( (τ^k_t BASE_t / P_t) / POP_t ).

   A code sketch collecting these tax-rate definitions appears after item 9 below.
7. Real Government Expenditure. Take government consumption expenditures (FRED mnemonic “FGEXPND”), call it GC_t. Take government gross investment (FRED mnemonic “A787RC1Q027SBEA”), call it GI_t. Take government net purchases of non-produced assets (FRED mnemonic “AD08RC1A027NBEA”), call it GP_t. Finally, take government consumption of fixed capital (FRED mnemonic “A918RC1Q027SBEA”), call it GCK_t. Define G_t = GC_t + GI_t + GP_t − GCK_t. Then

   Real Government Expenditure = 100 ln( (G_t / P_t) / POP_t ).
8. Real Government Transfers. Take current transfer payments (FRED mnemonic “A084RC1Q027SBEA”), call it TRANSPAY_t, and take current transfer receipts (FRED mnemonic “A577RC1Q027SBEA”), call it TRANSREC_t. Define net current transfers as

   CURRTRANS_t = TRANSPAY_t − TRANSREC_t.

   Take capital transfer payments (FRED mnemonic “W069RC1Q027SBEA”), call it CAPTRANSPAY_t, and take capital transfer receipts (FRED mnemonic “B232RC1Q027SBEA”), call it CAPTRANSREC_t. Define net capital transfers as

   CAPTRANS_t = CAPTRANSPAY_t − CAPTRANSREC_t.

   Take current tax receipts (FRED mnemonic “W006RC1Q027SBEA”), call it TAXREC_t; take income receipts on assets (FRED mnemonic “W210RC1Q027SBEA”), call it INCREC_t; and take the current surplus of government enterprises (FRED mnemonic “A108RC1Q027SBEA”), call it GOVSRP_t. Define the total tax revenue, T_t, as the sum of consumption, labor, and capital tax revenues. Define the tax residual as

   TAXRESID_t = TAXREC_t + INCREC_t + CSI_t + GOVSRP_t − T_t.

   Define

   TR_t = CURRTRANS_t + CAPTRANS_t − TAXRESID_t.

   Then

   Real Government Transfers = 100 ln( (TR_t / P_t) / POP_t ).
9. Real Government Debt. Take interest payments (FRED mnemonic “A091RC1Q027SBEA”), call it INTPAY_t. Define net borrowing as

   NB_t = G_t + INTPAY_t + TR_t − T_t.

   Take the adjusted monetary base (FRED mnemonic “AMBSL”), take quarterly averages, and call it M_t. Then define government debt, starting in 1947, as

   B_t = NB_t − ∆M_t + B_{t−1}.

   The value B_{1947:Q1} is taken from Cox and Hirschhorn (1983). Then,

   Real Government Debt = 100 ln( (B_t / P_t) / POP_t ).
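The effective tax rates in items 5 and 6 chain together (the personal rate feeds both the labor and the capital rate), and the debt series in item 9 is a simple recursion. A sketch collecting both, assuming the series above are aligned pandas Series or NumPy arrays (function and argument names are our own):

    def average_tax_rates(IT, W, PRI, RENT, PROF, INT, CSI, EC, CT, PT):
        # Items 5-6: capital income, then the three average tax rates.
        CI = RENT + PROF + INT + PRI / 2
        tau_p = IT / (W + PRI / 2 + CI)                         # personal income
        tau_l = (tau_p * (W + PRI / 2) + CSI) / (EC + PRI / 2)  # labor income
        tau_k = (tau_p * CI + CT) / (CI + PT)                   # capital income
        return tau_p, tau_l, tau_k

    def government_debt(nb, dm, b0):
        # Item 9: B_t = NB_t - dM_t + B_{t-1}, seeded with the 1947:Q1 value
        # from Cox and Hirschhorn (1983).
        path, b = [], b0
        for nb_t, dm_t in zip(nb, dm):
            b = nb_t - dm_t + b
            path.append(b)
        return path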