Bayesian VARs for large panels of time series
Lucrezia Reichlin
ECB, ECARES and CEPR
Duke University Second Forecasting Conference, March 2007
“Empirical Econometric Methods applied to Business Cycles and Forecasting: Comparisons between the U.S. and Europe”

This Presentation
• Does the standard Bayesian VAR work for large panels of time series?
- Large? n possibly > T, as in the data sets used in the recent literature on factor models.
- Work? For forecasting and for structural analysis.
• This paper is a follow-up of De Mol, Giannone and Reichlin (2006). In that paper (among other things) we explored the properties of Bayesian regression as the cross-sectional dimension of the data becomes large. Two results:
1. Under the same assumptions used to study the asymptotic properties of forecasts based on principal components, the forecast computed from the point estimates of Bayesian regression converges to the optimal forecast as n and T go to infinity along any path.
2. Empirically, forecasts based on the Stock and Watson (SW) dataset for the US economy (130 variables) are accurate (as accurate as principal component forecasts); see also Giacomini and White (2006).
⇒ Bayesian regression is a suitable tool for large panels.

This paper (Banbura, Giannone and Reichlin, 2007)
• Bayesian VARs for models with n = 3, 8, 20, 150:
- evaluation of out-of-sample forecast accuracy;
- structural analysis: the effect of a monetary policy shock.

Why is this problem interesting?
• Bayesian methods are part of the traditional econometrician's toolbox and offer a natural solution to the curse of dimensionality: they shrink the parameters by imposing priors.
• BVARs are standard tools in macroeconomics. However, the maximum size in the empirical literature is n = 20 (see for example Leeper, Sims and Zha, 1996; Uhlig, 2004).

Bayesian Vector Autoregressive models
VAR(p):
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t, \qquad Y_t = (y_{1,t}, y_{2,t}, \dots, y_{n,t})'$$
where u_t is an n-dimensional white noise with $E[u_t u_t'] = \Psi$.

Priors: Minnesota prior (Litterman)
$$\frac{(A_k)_{ij} - E[(A_k)_{ij}]}{\sqrt{V[(A_k)_{ij}]}} \,\Big|\, \Psi \sim \text{i.i.d. } N(0, 1), \quad i, j = 1, \dots, n, \; k = 1, \dots, p, \qquad \Psi \sim IW(\Sigma, n + 2)$$
$$E[(A_k)_{ij}] = \begin{cases} \delta_i, & j = i, \ k = 1 \\ 0, & \text{otherwise} \end{cases} \qquad V[(A_k)_{ij}] = \frac{\lambda^2 \sigma_i^2}{k^2 \sigma_j^2}, \qquad (\Sigma)_{ij} = \begin{cases} \sigma_i^2, & j = i \\ 0, & \text{otherwise} \end{cases}$$
• Set δ_i = 1 (prior belief in a random walk) for persistent variables y_{it}.
• Set δ_i = 0 (prior belief in white noise) for mean-reverting variables.

Comments:
• The parameter λ controls the overall tightness of the prior around the random walk or white noise. Two extreme cases:
1) λ = ∞: no shrinkage ⇔ OLS;
2) λ = 0: we impose the prior (naive) model.
• The factor 1/k² is the rate at which the prior variance shrinks as the lag length k increases.
• We treat symmetrically the coefficients on own lags, (A_k)_{ij} with i = j, and the coefficients on lags of other variables, (A_k)_{ij} with i ≠ j ⇒ the posterior is tractable: computing it for the full system is equivalent to computing it equation by equation (Kadiyala and Karlsson, 1997; Sims and Zha, 1998).

Additional remarks
• The parameters σ_i², i = 1, ..., n, are scale parameters accounting for the different scale and variability of the data.
• For the constant terms µ_1, ..., µ_n we use a diffuse prior. The prior correlation between coefficients is set to zero.
• The degrees of freedom of the inverted Wishart are set equal to n + 2 to ensure that the prior expectation of the residual variance exists. In this case E[Ψ] = Σ.

Empirics
Data: 131 monthly macro indicators (real activity, prices, financial, ...).
Sample: January 1959 - December 2003.
Variables we focus on:
- a measure of real economic activity: employment (CES002);
- a measure of prices: the consumer price index (PUNEW);
- the monetary policy instrument: the federal funds rate (FYFF).
VAR specifications:
• SMALL. Baseline monetary VAR (n = 3) including the three key variables;
• CEE.
  Monetary model of Christiano, Eichenbaum and Evans (1999) (n = 8). In addition to the key variables, this model includes the index of sensitive material prices (PSM99Q) and monetary aggregates: non-borrowed reserves (FMRNBA), total reserves (FMRRA) and the M2 money stock (FM2);
• MEDIUM. VAR containing key macroeconomic variables (n = 20). In addition to the CEE variables it includes: Personal Income (A0M051), Real Consumption (A0M224_R), Industrial Production (IPS10), Capacity Utilization (A0M082), Unemployment Rate (LHUR), Housing Starts (HSFR), Producer Price Index (PWFSA), Personal Consumption Expenditures Price Deflator (GMDC), Average Hourly Earnings (CES275), M1 Money Stock (FM1), Standard and Poor’s Stock Price Index (FSPCOM), Yield on 10-year U.S. Treasury Bonds (FYGT10);
• LARGE. All 131 indicators from Stock and Watson (2002).

Forecasting
Point forecasts are computed using the posterior mean of the parameters. Write $\hat A_j^{(\lambda)}$, j = 1, ..., p, and $\hat\mu^{(\lambda)}$ for the posterior means of the autoregressive coefficients and of the constant term obtained by setting the overall tightness equal to λ. The point estimate of the one-step-ahead forecast is
$$\hat Y^{(\lambda)}_{t+1|t} = \hat\mu^{(\lambda)} + \hat A^{(\lambda)}_1 Y_t + \dots + \hat A^{(\lambda)}_p Y_{t-p+1}$$
Forecasts h steps ahead are computed recursively. Write $\hat Y^{(\lambda)}_{t+h|t} = (\hat y^{(\lambda)}_{1,t+h|t}, \dots, \hat y^{(\lambda)}_{n,t+h|t})'$ for the h-step-ahead forecasts, where n is the number of variables in the model.
Benchmark: the no-predictability forecasts obtained by imposing the prior specification (λ = 0), denoted $\hat Y^{(0)}_{t+h|t}$.
- For Employment, CPI and FFR this is a random walk with drift.
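A minimal NumPy sketch of the point forecast described above. It assumes the Minnesota prior is implemented by appending artificial (dummy) observations to the data, a common implementation device that these slides do not spell out; with it, the posterior mean of the coefficients is a least-squares regression on the augmented data. The choice of σ_i as the residual standard deviation of a univariate AR(1), the weight `eps` on the constant, and all function names are illustrative assumptions. Dummy rows that pin down the prior on Ψ are omitted because they have zero regressors and so do not change the posterior mean of the coefficients.

```python
import numpy as np

def lag_matrix(Y, p):
    """Regressors [Y_{t-1}, ..., Y_{t-p}, 1] and targets Y_t, for t = p, ..., T-1."""
    T = Y.shape[0]
    lags = [Y[p - k:T - k] for k in range(1, p + 1)]
    return np.hstack(lags + [np.ones((T - p, 1))]), Y[p:]

def ar1_scale(y):
    """Residual std of a univariate AR(1) with constant (assumed choice of sigma_i)."""
    Z = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return np.std(y[1:] - Z @ coef)

def minnesota_dummies(sigma, delta, lam, p, eps=1e-5):
    """Dummy rows giving prior mean delta_i on own first lag (0 elsewhere) and
    prior std proportional to lam / (k * sigma_j). Requires lam > 0;
    lam = np.inf switches the coefficient prior off."""
    n = len(sigma)
    # One row per (lag k, variable j): x = k*sigma_j/lam, y = delta_j*sigma_j/lam if k = 1
    Yd = np.vstack([np.diag(delta * sigma) / lam, np.zeros((n * (p - 1), n))])
    Xd = np.hstack([np.kron(np.diag(np.arange(1, p + 1)), np.diag(sigma)) / lam,
                    np.zeros((n * p, 1))])
    # Nearly diffuse prior on the constant: one row with a tiny weight eps
    Yd = np.vstack([Yd, np.zeros((1, n))])
    Xd = np.vstack([Xd, np.hstack([np.zeros((1, n * p)), np.array([[eps]])])])
    return Yd, Xd

def bvar_posterior_mean(Y, p, lam, delta):
    """Posterior mean of B = [A_1'; ...; A_p'; mu'] by least squares on augmented data."""
    X, Yt = lag_matrix(Y, p)
    sigma = np.array([ar1_scale(Y[:, i]) for i in range(Y.shape[1])])
    Yd, Xd = minnesota_dummies(sigma, np.asarray(delta, float), lam, p)
    B, *_ = np.linalg.lstsq(np.vstack([X, Xd]), np.vstack([Yt, Yd]), rcond=None)
    return B

def forecast(Y, B, p, h):
    """Iterate the one-step forecast h times, feeding forecasts back in as data."""
    hist = list(Y[-p:])
    path = []
    for _ in range(h):
        x = np.concatenate([hist[-k] for k in range(1, p + 1)] + [np.ones(1)])
        y_next = x @ B
        path.append(y_next)
        hist.append(y_next)
    return np.array(path)
```

Setting `lam=np.inf` reproduces OLS, while a very small `lam` with δ_i = 1 pins the first-lag coefficients to the identity, approaching the random-walk-with-drift benchmark.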
Out-of-sample forecasting experiment
• Evaluate the forecast performance of the VARs for the three key series included in all VAR specifications (Employment, CPI and FFR).
• Set the order of the VAR to p = 13.
• Parameters are estimated recursively using the most recent 10 years of observations (rolling scheme).
• Evaluation sample: Jan70 (T_0) - Dec03 (T_1).
• Forecast horizons: h = 1, 3, 6, 12 months ahead.
We measure forecast accuracy by the Mean Square Forecast Error
$$MSFE^{(\lambda)}_{i,h} = \frac{1}{T_1 - T_0 - 12 + 1} \sum_{T = T_0 + 12 - h}^{T_1 - h} \left( \hat y^{(\lambda)}_{i,T+h|T} - y_{i,T+h} \right)^2$$
The target dates T + h run from T_0 + 12 to T_1 for every horizon, so all horizons are evaluated on the same sample. We report results in terms of MSFE relative to the benchmark:
$$RMSFE^{(\lambda)}_{i,h} = \frac{MSFE^{(\lambda)}_{i,h}}{MSFE^{(0)}_{i,h}}$$

Setting the overall shrinkage
Models of different size require different degrees of shrinkage: as n increases, the curse of dimensionality and overfitting become more severe. (De Mol, Giannone and Reichlin (2006) show that to achieve consistency one should shrink more as the cross-section becomes large.)
The overall tightness is set to yield a desired in-sample fit for the three variables of interest. It can be seen as incorporating a prior belief on the fit of the model. This allows comparability across models, since we control for overfitting.
We set the tightness during the pre-evaluation period [from Jan60 (t = 1) until Dec69 (t = T_0 − 1)]. It is then kept fixed for the entire evaluation period.
The in-sample fit is an in-sample measure of the one-step-ahead mean squared forecast error. We evaluate it at the beginning of the evaluation period, that is when the model is estimated for the first time, using data from Jan60 until Dec69 (t = 1, ..., T_0):
$$msfe^{(\lambda)}_i = \frac{1}{T_0 - p} \sum_{t=p}^{T_0 - 1} \left( \hat y^{(\lambda)}_{i,t+1|t} - y_{i,t+1} \right)^2$$
where the parameters are computed using the sample t = 1, ..., T_0.
Order the variables in all models so that Employment, CPI and FFR come first.
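The evaluation and calibration steps above can be sketched in a few lines. This is a minimal sketch with hypothetical function names: `relative_msfe` computes the RMSFE ratio on a common set of target dates, `average_fit_ratio` averages msfe_i(λ)/msfe_i(0) over the three key variables, and `calibrate_lambda` searches for the λ that delivers a target fit by bisection on log λ. Bisection is our choice of root finder, and it assumes the fit ratio falls monotonically as λ grows, which is what one expects since a looser prior lets the model fit the estimation sample more closely.

```python
import numpy as np

def relative_msfe(fc_model, fc_bench, actual):
    """MSFE of a model's forecasts divided by the benchmark MSFE on the same targets."""
    fc_model, fc_bench, actual = map(np.asarray, (fc_model, fc_bench, actual))
    return np.mean((fc_model - actual) ** 2) / np.mean((fc_bench - actual) ** 2)

def average_fit_ratio(msfe_lam, msfe_0):
    """(1/3) * sum_i msfe_i(lambda) / msfe_i(0) over the three key variables."""
    return float(np.mean(np.asarray(msfe_lam) / np.asarray(msfe_0)))

def calibrate_lambda(fit_ratio, target, lo=1e-4, hi=1e3, iters=60):
    """Bisection on log(lambda) for fit_ratio(lambda) = target.

    Assumes fit_ratio is decreasing in lambda and that [lo, hi] brackets the solution.
    """
    a, b = np.log(lo), np.log(hi)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if fit_ratio(np.exp(m)) > target:
            a = m   # fit still too poor: loosen the prior (larger lambda)
        else:
            b = m   # fit too good: tighten the prior (smaller lambda)
    return float(np.exp(0.5 * (a + b)))
```

In the experiment, `fit_ratio(lam)` would re-estimate the model on the pre-evaluation sample for the given λ, compute the in-sample msfe of the three key variables, and return `average_fit_ratio` against the λ = 0 fit.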
For a given measure of fit, we set the shrinkage $\lambda^*_{Fit}$ such that
$$\frac{1}{3} \sum_{i=1}^{3} \frac{msfe_i^{(\lambda^*_{Fit})}}{msfe_i^{(0)}} = Fit$$
where i = 1, 2, 3 indexes Employment, CPI and FFR.
• We report results corresponding to the fit of the SMALL model with p = 13 estimated by OLS (Fit ≈ 0.60).
• We also present results for a range of in-sample fits, Fit = 1/4, 1/2, 3/4.

Table 1: BVAR, Relative MSFE, 1971-2003
         h=1              h=3              h=6              h=12
         EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR    λ
SMALL    1.14 0.89 1.86   0.95 0.66 1.77   1.11 0.64 2.08   1.02 0.83 2.59   ∞
CEE      0.67 0.52 0.89   0.65 0.41 1.07   0.78 0.41 1.30   1.21 0.57 1.71   0.262
MEDIUM   0.54 0.50 0.78   0.51 0.41 0.95   0.66 0.40 1.30   0.86 0.47 1.48   0.108
LARGE    0.46 0.50 0.75   0.38 0.40 0.94   0.50 0.40 1.29   0.78 0.44 1.93   0.035

Comments:
• Adding information helps: larger systems produce more accurate forecasts.
• Forecasts from MEDIUM are already quite accurate.
• The federal funds rate is forecastable only in the short run.
• Larger models require more shrinkage (smaller λ).

Alternative to Bayesian shrinkage: parsimony by lag selection
Table 2 presents the results for SMALL and CEE with p = 13 and with p selected by BIC. The BVAR and LARGE entries recall the results from the Bayesian estimation and from the large model, to facilitate comparison. This exercise is not feasible for larger models, since OLS estimation with p = 13 is infeasible (for MEDIUM, each equation would have 20 × 13 + 1 = 261 regressors against 120 observations in the 10-year rolling window).
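The lag-selection alternative can be sketched as follows. This minimal sketch assumes the standard multivariate form of BIC, log|Σ̂_p| + log(T)·n(np + 1)/T, computed on a common estimation sample so that criteria for different p are comparable; the slides do not state the exact formula used, and the function names are hypothetical. The penalty term also makes the feasibility issue concrete: each equation carries np + 1 regressors.

```python
import numpy as np

def var_ols_residuals(Y, p, start):
    """OLS residuals of a VAR(p) with constant, fitted on the targets Y[start:]."""
    T = Y.shape[0]
    X = np.hstack([Y[start - k:T - k] for k in range(1, p + 1)] + [np.ones((T - start, 1))])
    B, *_ = np.linalg.lstsq(X, Y[start:], rcond=None)
    return Y[start:] - X @ B

def select_lag_bic(Y, p_max):
    """Lag order in 1..p_max minimising BIC = log|Sigma_p| + log(T) * n(np+1) / T."""
    T_total, n = Y.shape
    T = T_total - p_max   # same estimation sample for every p, so BICs are comparable
    best_p, best_bic = 1, np.inf
    for p in range(1, p_max + 1):
        U = var_ols_residuals(Y, p, p_max)
        _, logdet = np.linalg.slogdet(U.T @ U / T)
        bic = logdet + np.log(T) * n * (n * p + 1) / T
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p
```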
Table 2: OLS, Relative MSFE, 1971-2003
         h=1              h=3              h=6              h=12
         EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR
SMALL
p = 13   1.14 0.89 1.86   0.95 0.66 1.77   1.11 0.64 2.08   1.02 0.83 2.59
p = BIC  0.73 0.55 0.99   0.76 0.49 1.29   0.90 0.51 1.51   1.15 0.56 1.59
BVAR     1.14 0.89 1.86   0.95 0.66 1.77   1.11 0.64 2.08   1.02 0.83 2.59
LARGE    0.46 0.50 0.75   0.38 0.40 0.94   0.50 0.40 1.29   0.78 0.44 1.93
CEE
p = 13   7.56 5.61 6.39   5.11 4.52 6.92   7.79 4.80 15.9   22.3 21.0 47.1
p = BIC  0.76 0.55 1.21   0.75 0.45 1.27   0.78 0.44 1.48   0.82 0.53 1.62
BVAR     0.67 0.52 0.89   0.65 0.41 1.07   0.78 0.41 1.30   1.21 0.57 1.71
LARGE    0.46 0.50 0.75   0.38 0.40 0.94   0.50 0.40 1.29   0.78 0.44 1.93

Results
• VARs estimated by OLS perform better when the number of lags is chosen by BIC.
• However, even in this case the forecasts from SMALL and CEE are less accurate than those from MEDIUM and LARGE.

Bayesian shrinkage and FAVAR
FAVAR: augment the baseline monetary VAR (SMALL) with a few principal components extracted from the large dataset (Bernanke, Boivin and Eliasz, 2005; Stock and Watson, 2005).
• Principal components are extracted from the large panel of 131 variables.
• Variables are first made stationary by taking first differences wherever we have imposed a random walk prior (δ_i = 1). They are then standardized, since principal components are not scale invariant.
• Principal components are computed recursively at each point T in the evaluation sample.
Specification details
We augment the baseline VAR with one and three principal components (r = 1, 3) and compare different lag specifications for the VAR.
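The principal-component extraction described above can be sketched with a singular value decomposition. This is a minimal sketch under stated assumptions: levels series (δ_i = 0) simply drop their first observation so all columns align with the differenced ones, the scores are taken as X V_r from the SVD of the standardized panel, and the function name is hypothetical.

```python
import numpy as np

def principal_components(panel, delta, r):
    """First r principal components of a (T x N) panel of levels.

    Columns with delta == 1 are first-differenced; columns with delta == 0 are kept
    in levels (first observation dropped so all series are aligned). Every series
    is then standardised, because principal components are not scale invariant.
    """
    X = np.column_stack([
        np.diff(panel[:, j]) if delta[j] == 1 else panel[1:, j]
        for j in range(panel.shape[1])
    ])
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r]   # scores X @ V_r, ordered by explained variance
```

In the recursive exercise one would call this on the panel truncated at each forecast origin T and append the r columns to the SMALL VAR's data.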
Lag specifications:
- p = 13, as in Bernanke, Boivin and Eliasz (2005);
- p selected by BIC;
- BFAVAR: Bayesian estimation with p = 13 (we impose the fit of the OLS SMALL model, as in Table 1);
- results for alternative shrinkages are in the paper.

Table 3: FAVAR, Relative MSFE, 1971-2003
                 p = 13  p = BIC  BFAVAR  LARGE
r=1  h=1   EMP   1.36    0.54     0.70    0.46
           CPI   1.10    0.57     0.65    0.50
           FFR   1.86    0.98     0.89    0.75
     h=3   EMP   1.13    0.55     0.68    0.38
           CPI   0.80    0.49     0.55    0.40
           FFR   1.62    1.12     1.03    0.94
     h=6   EMP   1.33    0.73     0.87    0.50
           CPI   0.74    0.52     0.55    0.40
           FFR   2.07    1.31     1.40    1.29
     h=12  EMP   1.15    0.98     0.92    0.78
           CPI   0.95    0.58     0.70    0.44
           FFR   2.69    1.43     1.93    1.93
r=3  h=1   EMP   3.02    0.52     0.65    0.46
           CPI   2.39    0.52     0.58    0.50
           FFR   2.40    0.97     0.85    0.75
     h=3   EMP   2.11    0.50     0.61    0.38
           CPI   1.44    0.44     0.49    0.40
           FFR   3.08    1.16     0.99    0.94
     h=6   EMP   2.52    0.63     0.77    0.50
           CPI   1.18    0.46     0.50    0.40
           FFR   3.28    1.45     1.27    1.29
     h=12  EMP   3.16    0.84     0.83    0.78
           CPI   1.98    0.54     0.64    0.44
           FFR   7.09    1.46     1.69    1.93

Results:
• FAVAR with p = 13 performs poorly: the gain in parsimony obtained with principal components is lost because of too many lags.
• The FAVAR is in general outperformed by the medium and large BVARs, but the differences are not large once the overfitting problem is overcome by using fewer lags or by shrinkage.

Out-of-sample forecasting experiment: conclusions
• More information improves forecast accuracy once we control for overfitting.
• Large BVARs produce accurate forecasts (competitive with small models and with PC-based FAVARs) ⇒ a suitable tool for incorporating large information sets.

Structural analysis
Impulse responses to a monetary policy shock
Recursive identification scheme:
• Divide the variables in the panel into two categories: slow-moving (real variables and prices) and fast-moving (financial and monetary variables).
• Identifying assumption: slow-moving variables do not respond contemporaneously to a monetary policy shock (Christiano et al., 2000) ⇒ recursive (Cholesky) identification scheme.
• The monetary policy shock is orthogonal to all other shocks driving the economy and raises the federal funds rate by 100 basis points on impact.

Reduced-form VAR:
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t, \qquad u_t \sim WN(0, \Psi)$$
Structural VAR:
$$A_0 Y_t = \nu + A_1^* Y_{t-1} + \dots + A_p^* Y_{t-p} + e_t, \qquad e_t \sim WN(0, D)$$
where
- D is diagonal,
- A_0 is lower triangular with ones on the main diagonal,
- $A_0^{-1} D A_0^{-1\prime} = \Psi$,
- $A_0^{-1} A_j^* = A_j$,
⇒ unique mapping from Ψ to (A_0, D).

Since the mapping from the reduced form to the structural VAR is unique, draws of the impulse response functions are easily obtained following Canova (1991) and Gordon and Leeper (1994):
i) generate draws from the posterior of (A_1, ..., A_p, Ψ);
ii) for each draw compute the associated A_0 and D, and hence $A_j^* = A_0 A_j$;
iii) for each draw of (A_0, A_1^*, ..., A_p^*) compute the IRFs.
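Steps ii) and iii) can be sketched for a single posterior draw. This minimal NumPy sketch assumes the variables are already ordered slow-moving block, federal funds rate, fast-moving block, so that a plain Cholesky factor of Ψ delivers the recursive scheme; the sampler for step i) is not shown, and the function names are hypothetical. Looping over posterior draws and taking quantiles of the responses would give the confidence bands.

```python
import numpy as np

def structural_from_reduced(Psi):
    """Map Psi to (A0, D): A0 unit lower triangular, D diagonal, A0^{-1} D A0^{-1}' = Psi."""
    P = np.linalg.cholesky(Psi)        # lower triangular, Psi = P P'
    d = np.diag(P)
    A0_inv = P / d                     # divide column j by P[j, j] -> ones on the diagonal
    return np.linalg.inv(A0_inv), np.diag(d ** 2)

def impulse_responses(A, Psi, shock, horizon):
    """Responses (horizon+1 x n) to a unit structural shock in equation `shock`.

    A is the list [A_1, ..., A_p] of reduced-form matrices. Because A0 has ones on
    its diagonal, the shock moves variable `shock` by exactly one unit on impact
    (e.g. one percentage point = 100 basis points if the FFR is in percent).
    """
    n, p = Psi.shape[0], len(A)
    A0, _ = structural_from_reduced(Psi)
    impact = np.linalg.solve(A0, np.eye(n)[:, shock])   # column `shock` of A0^{-1}
    Phi = [np.eye(n)]                                    # MA coefficients of the reduced form
    for h in range(1, horizon + 1):
        Phi.append(sum(A[k - 1] @ Phi[h - k] for k in range(1, min(h, p) + 1)))
    return np.array([Ph @ impact for Ph in Phi])
```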
We report results for the same degree of shrinkage used in the forecasting evaluation. Estimation is carried out using the whole time span of the data, 1961-2002. The number of lags remains p = 13.

[Figure: Impulse response functions to the monetary policy shock: responses of Employment (CES002), CPI (PUNEW) and the federal funds rate (FYFF) in the SMALL, CEE, MEDIUM and LARGE models; horizons 0-48 months; IRF with 68% and 90% confidence bands.]

Results:
• The MEDIUM and LARGE impulse responses have the expected signs.
• Adding information helps resolve the price puzzle; cf. Christiano et al. (2000), Bernanke et al. (2005).
• At longer horizons the confidence intervals for LARGE become explosive.

Explosive draws ↔ non-stationarity (the data enter in levels). Solution: “inexact differencing”.
Rewrite the VAR(p)
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t$$
in error-correction (ECM) form:
$$\Delta Y_t = \mu - (I_n - A_1 - A_2 - \dots - A_p) Y_{t-1} + B_1 \Delta Y_{t-1} + \dots + B_{p-1} \Delta Y_{t-p+1} + u_t, \qquad B_j = -(A_{j+1} + \dots + A_p)$$
- “Exact differences”: impose (I_n − A_1 − A_2 − ... − A_p) = 0.
- “Inexact differences”: shrink (I_n − A_1 − A_2 − ... − A_p) towards zero ⇒ “sum of coefficients” prior (Doan, Litterman and Sims, 1984).
Let τ control the degree of shrinkage on the sum of coefficients:
• τ → 0: “exact differences”;
• τ → ∞: no shrinkage.
Our choice: a loose prior on the sum of coefficients, τ = 10λ. Results are robust to a less informative prior (e.g. τ = 100λ) and to a more informative one (τ = λ).
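The sum-of-coefficients prior can be imposed with n artificial observations, in the spirit of Doan, Litterman and Sims (1984). This is a minimal sketch: the scaling by μ_i, the mean of the first p observations of y_i, is a common implementation choice assumed here rather than taken from the slides, and the function name is hypothetical. The rows would be appended to the data (together with any rows implementing the Minnesota prior) before computing the posterior mean, with τ = 10λ.

```python
import numpy as np

def sum_of_coefficients_dummies(Y, p, delta, tau):
    """n artificial observations shrinking I_n - A_1 - ... - A_p towards zero.

    Dummy row j implies sum_k (A_k)_{ij} ~ (1 if i == j else 0) in every equation i,
    more tightly as tau -> 0. Columns are [lag 1 block, ..., lag p block, constant].
    """
    n = Y.shape[1]
    mu = Y[:p].mean(axis=0)                                  # assumed scaling: initial means
    M = np.diag(np.asarray(delta, float) * mu) / tau
    Yd = M
    Xd = np.hstack([np.tile(M, (1, p)), np.zeros((n, 1))])   # constant not involved
    return Yd, Xd
```

A coefficient matrix whose lag matrices sum to the identity fits these rows exactly, while one with a sum far from the identity is penalized.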
[Figure: Impulse response functions to the monetary policy shock with the sum of coefficients prior: responses of Employment (CES002), CPI (PUNEW) and the federal funds rate (FYFF) in the SMALL, CEE, MEDIUM and LARGE models; horizons 0-48 months; IRF with 68% and 90% confidence bands.]

[Figure: Impulse response functions to the monetary policy shock, model LARGE with the sum of coefficients prior: responses of selected variables (CES002, PSM99Q, PUNEW, A0M224_R, FYGT10, FM2, FSPCOM, FMRNBA, GMDC, FM1, LHUR, FYFF, PWFSA, HSFR, A0M082, IPS10, A0M051, EXRUS, FMRRA, CES275); horizons 0-48 months; IRF with 68% and 90% confidence bands.]

BVAR, Relative MSFE, 1971-2003: baseline
         h=1              h=12
         EMP  CPI  FFR    EMP  CPI  FFR
SMALL    1.14 0.89 1.86   1.02 0.83 2.59
CEE      0.67 0.52 0.89   1.21 0.57 1.71
MEDIUM   0.54 0.50 0.78   0.86 0.47 1.48
LARGE    0.46 0.50 0.75   0.78 0.44 1.93

BVAR, Relative MSFE, 1971-2003: with sum of coefficients prior
         h=1              h=3              h=6              h=12
         EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR    EMP  CPI  FFR
SMALL    1.14 0.89 1.86   0.95 0.66 1.77   1.11 0.64 2.08   1.02 0.83 2.59
CEE      0.68 0.57 0.97   0.60 0.44 1.28   0.65 0.45 1.40   0.65 0.55 1.61
MEDIUM   0.53 0.49 0.75   0.49 0.39 0.85   0.58 0.37 0.96   0.60 0.43 0.93
LARGE    0.44 0.49 0.74   0.36 0.37 0.82   0.44 0.36 0.92   0.50 0.40 0.92

Results
With the prior on the sum of coefficients both the structural and the forecasting results improve:
• the IRFs of the LARGE model are no longer explosive;
• forecast accuracy improves, particularly at long horizons and for CPI and the FFR.
Similar findings in Tallman and Robertson (1999).
Conclusions