Bayesian VARs for large panels of time series

Lucrezia Reichlin
ECB, ECARES and CEPR
Duke University
Second Forecasting Conference, March 2007
“Empirical Econometric Methods applied to Business Cycles and
Forecasting: Comparisons between the U.S. and Europe”
This Presentation
Does the standard Bayesian VAR work for large panels of time
series?
Large? n possibly > T, as in the data sets used in the recent literature on
factor models
Work? Forecast and structural analysis
This paper is a follow up of De Mol, Giannone and Reichlin, 2006
In that paper (amongst other things) we explored properties of Bayesian
regression as the cross-sectional dimension of the data becomes large
Two results:
1. Under the same assumptions used to study the asymptotic properties
of forecasts based on principal components, the forecast computed on
the basis of the point estimates of Bayesian regression converges to the
optimal forecast as n and T go to infinity along any path
2. Empirically, forecasts based on the Stock and Watson (SW) dataset for the US economy
(130 variables) are accurate (as accurate as principal components (PC) forecasts)
see also Giacomini and White, 2006
⇒ Bayesian regression is a suitable tool for large panels
This paper (Banbura, Giannone and Reichlin, 2007)
Bayesian VAR for models with n = 3, 8, 20, 150
– Evaluation of out-of-sample forecast accuracy
– Structural analysis: the effect of a monetary shock
Why is this problem interesting?
Bayesian methods are part of the traditional econometrician's
toolbox and offer a natural solution to the curse of
dimensionality: they shrink the parameters by imposing priors.
BVARs are standard tools in macroeconomics.
However, the maximum size in the empirical literature is n = 20
(see for example Leeper, Sims and Zha, 1996; Uhlig, 2004).
Bayesian Vector Autoregressive models
VAR(p):
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t$$
$$Y_t = (y_{1,t}, y_{2,t}, \dots, y_{n,t})'$$
u_t: n-dimensional white noise with $E[u_t u_t'] = \Psi$
Priors:
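Minnesota prior (Litterman)
$$\frac{(A_k)_{ij} - E[(A_k)_{ij}]}{\sqrt{V[(A_k)_{ij}]}} \,\Big|\, \Psi \sim \text{i.i.d. } N(0, 1), \qquad i, j = 1, \dots, n, \quad k = 1, \dots, p$$
$$\Psi \sim IW(\Sigma, n + 2)$$
$$E[(A_k)_{ij}] = \begin{cases} \delta_i, & j = i,\ k = 1 \\ 0, & \text{otherwise} \end{cases}$$
$$V[(A_k)_{ij}] = \frac{\lambda^2}{k^2} \frac{\sigma_i^2}{\sigma_j^2}, \qquad (\Sigma)_{ij} = \begin{cases} \sigma_i^2, & j = i \\ 0, & \text{otherwise} \end{cases}$$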
Set δi = 1 (random walk prior) for variables yi,t that show persistence
Set δi = 0 (white noise prior) for variables that show mean reversion
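The prior moments can be tabulated directly from δ, σ² and λ. The following is a minimal sketch, not the authors' code; the function name `minnesota_prior` and the array layout `[k-1, i, j]` are assumptions made for illustration.

```python
import numpy as np

def minnesota_prior(delta, sigma2, p, lam):
    """Prior mean and variance of (A_k)_{ij}, returned as arrays indexed [k-1, i, j].

    Hypothetical helper: delta and sigma2 are length-n sequences, p is the lag
    order and lam is the overall tightness (lambda).
    """
    delta = np.asarray(delta, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    n = delta.size
    mean = np.zeros((p, n, n))
    mean[0] = np.diag(delta)                    # E[(A_1)_{ii}] = delta_i, zero elsewhere
    ratio = sigma2[:, None] / sigma2[None, :]   # sigma_i^2 / sigma_j^2
    k = np.arange(1, p + 1)[:, None, None]      # lag index k = 1, ..., p
    var = lam**2 * ratio[None, :, :] / k**2     # lambda^2 sigma_i^2 / (k^2 sigma_j^2)
    return mean, var
```

For example, `minnesota_prior([1, 0], [4.0, 1.0], p=2, lam=0.5)` puts prior mean 1 on the first own lag of the first variable only, and the variance on own lags equals λ²/k² regardless of the scale parameters.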
Comments:
The parameter λ controls the overall tightness of the prior belief around the
random walk or white noise prior.
Two extreme cases:
1) λ = ∞: no shrinkage ⇔ OLS
2) λ = 0, we impose the prior (naive) model.
The factor 1/k² sets the rate at which the prior variance decreases with
increasing lag length.
We treat symmetrically the coefficients on own lags, (Ak)ij with i = j, and the coefficients on lags of other variables, (Ak)ij with i ≠ j
⇒ the computation of the posterior is tractable:
computing the posterior for the full system is equivalent to computing it
equation by equation (Kadiyala & Karlsson, 1997; Sims & Zha, 1998)
Additional remarks
• The parameters σi2, i = 1, ..., n, are scale parameters accounting for the
different scale and variability of the data.
• For the constant terms µ1 , ..., µn we use a diffuse prior. The prior correlation
between coefficients is set to zero.
• The degrees of freedom of the inverse Wishart are set equal to n + 2
to ensure that the expectation of the prior residual variance exists. In this
case we have E[Ψ] = Σ.
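Because the prior is conjugate, the posterior mean of the coefficients has a closed form that resembles a ridge regression centred on the prior mean. The sketch below is an illustration under stated assumptions, not the implementation used in the paper: the name `bvar_posterior_mean` is hypothetical, the coefficient matrix B stacks the lag blocks first and the constant last, the constant gets zero prior precision (diffuse prior), and λ must be strictly positive.

```python
import numpy as np

def bvar_posterior_mean(Y, p, lam, delta, sigma2):
    """Posterior mean of B in Y_t' = [Y_{t-1}', ..., Y_{t-p}', 1] B + u_t'.

    Row (k-1)*n + j, column i of B holds (A_k)_{ij}; the last row holds mu.
    Requires lam > 0 (lam = np.inf reproduces OLS).
    """
    Y = np.asarray(Y, dtype=float)
    delta = np.asarray(delta, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    T, n = Y.shape
    # Regressors: lag 1, ..., lag p of Y, then a constant
    X = np.hstack([Y[p - k:T - k] for k in range(1, p + 1)] + [np.ones((T - p, 1))])
    Yp = Y[p:]
    # Prior mean: delta_i on the first own lag, zero elsewhere
    B0 = np.zeros((n * p + 1, n))
    B0[:n, :] = np.diag(delta)
    # Prior precision of the regressor "lag k of variable j": k^2 sigma_j^2 / lambda^2;
    # zero precision on the constant (diffuse prior)
    k = np.repeat(np.arange(1, p + 1), n)
    precision = np.append(k**2 * np.tile(sigma2, p) / lam**2, 0.0)
    P = np.diag(precision)
    return np.linalg.solve(X.T @ X + P, X.T @ Yp + P @ B0)
```

With a very large λ the result approaches the OLS estimate, and with a very small λ the lag coefficients approach the prior mean, which matches the two extreme cases listed in the comments on the prior.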
Empirics
Data:
131 monthly macro indicators: real activity, prices, financial,...
Sample: January 1959 - December 2003
Variables we focus on:
- measure of real economic activity (employment/ CES002),
- measure of prices (consumer price index /PUNEW),
- monetary policy instrument (federal funds rate / FYFF).
VAR specifications:
• SMALL. Baseline monetary VAR (n = 3)
including the three key variables;
• CEE. Monetary model of Christiano, Eichenbaum, and Evans, 1999 (n = 8).
In addition to the key variables this model includes the index of sensitive
material prices (PSM99Q) and monetary aggregates: non-borrowed reserves (FMRNBA), total reserves (FMRRA) and M2 money stock (FM2);
• MEDIUM. VAR containing key macroeconomic variables (n = 20).
In addition to the CEE model, the following variables are included: Personal Income (A0M051), Real Consumption (A0M224_R), Industrial Production (IPS10), Capacity Utilization (A0M082), Unemployment Rate
(LHUR), Housing Starts (HSFR), Producer Price Index (PWFSA), Personal Consumption Expenditures Price Deflator (GMDC), Average Hourly
Earnings (CES275), M1 Monetary Stock (FM1), Standard and Poor’s
Stock Price Index (FSPCOM), Yields on 10 year U.S. Treasury Bond
(FYGT10).
• LARGE. Include all the 131 indicators from Stock and Watson, 2002.
Forecasting
Compute point forecasts by using the posterior mean of the
parameters.
Write $\hat{A}_j^{(\lambda)}$, j = 1, ..., p, and $\hat{\mu}^{(\lambda)}$ for the posterior mean of the
autoregressive coefficients and the constant term obtained by
setting the overall tightness equal to λ.
The point estimate of the one-step-ahead forecast is computed
as
$$\hat{Y}^{(\lambda)}_{t+1|t} = \hat{\mu}^{(\lambda)} + \hat{A}_1^{(\lambda)} Y_t + \dots + \hat{A}_p^{(\lambda)} Y_{t-p+1}$$
Forecasts h steps ahead are computed recursively.
Write $Y^{(\lambda)}_{t+h|t} = (y^{(\lambda)}_{1,t+h|t}, \dots, y^{(\lambda)}_{n,t+h|t})'$ for the h-steps-ahead forecasts,
where n is the number of variables included in the model.
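The recursion feeds each forecast back in as the most recent "observation" for the next step. A minimal sketch, assuming the posterior means are already available as a vector `mu` and a list `A` of p coefficient matrices (the function name `iterate_forecast` is hypothetical):

```python
import numpy as np

def iterate_forecast(history, mu, A, h):
    """Return the forecasts for horizons 1, ..., h as an (h, n) array.

    history: (T, n) array of observations, most recent row last.
    mu: length-n constant; A: list [A_1, ..., A_p] of (n, n) matrices.
    """
    p = len(A)
    # lags[0] is Y_t, lags[1] is Y_{t-1}, ..., lags[p-1] is Y_{t-p+1}
    lags = [np.asarray(y, dtype=float) for y in history[-p:]][::-1]
    path = []
    for _ in range(h):
        y_next = mu + sum(A_k @ y_lag for A_k, y_lag in zip(A, lags))
        path.append(y_next)
        lags = [y_next] + lags[:-1]   # the forecast becomes the newest lag
    return np.array(path)
```

The last row of the returned array is the h-steps-ahead forecast; earlier rows are the intermediate horizons.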
Benchmark of non-predictability
forecasts obtained by imposing the prior specification (λ = 0).
The corresponding forecasts are denoted by $Y^{(0)}_{t+h|t}$.
- RW with drift for Employment, CPI and FFR.
Out-of-sample forecasting experiment
• Evaluate the forecast performance of the VARs for the three
key series included in all VAR specifications (Employment, CPI
and FFR)
• Set the order of the VAR to p = 13
• Parameters are estimated recursively using the most recent 10
years of observations (rolling scheme)
• Evaluation sample: Jan70 (T0) - Dec03 (T1)
• Forecast horizon: h = 1, 3, 6, 12 months ahead.
We measure forecast accuracy in terms of Mean Squared Forecast
Error
$$MSFE^{(\lambda)}_{i,h} = \frac{1}{T_1 - T_0 - 12 + 1} \sum_{T = T_0 + 12 - h}^{T_1 - h} \left( y^{(\lambda)}_{i,T+h|T} - y_{i,T+h} \right)^2$$
We report results in terms of MSFE relative to the benchmark:
$$RMSFE^{(\lambda,m)}_{i,h} = \frac{MSFE^{(\lambda)}_{i,h}}{MSFE^{(0)}_{i,h}}$$
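Given aligned arrays of forecasts and realized values, the relative MSFE is a ratio of two averages of squared errors. A short sketch, assuming one row per forecast origin and one column per variable (the function names are illustrative, not from the paper):

```python
import numpy as np

def msfe(forecasts, actuals):
    """Mean squared forecast error per variable (average over forecast origins)."""
    errors = np.asarray(forecasts, dtype=float) - np.asarray(actuals, dtype=float)
    return np.mean(errors**2, axis=0)

def relative_msfe(model_forecasts, benchmark_forecasts, actuals):
    """MSFE of the model divided by the MSFE of the benchmark (lambda = 0) forecasts."""
    return msfe(model_forecasts, actuals) / msfe(benchmark_forecasts, actuals)
```

A value below one means the model beats the benchmark for that variable and horizon; a value above one means the benchmark forecast is more accurate.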
Setting the overall shrinkage ...
Models of different size require a different degree of shrinkage,
since as n increases the curse of dimensionality and overfitting become more severe.
(De Mol, Giannone and Reichlin, 2006, show that to achieve consistency you
should shrink more as the cross-section becomes large.)
The overall tightness is set to yield a desired fit for the three
variables of interest. It can be seen as incorporating a prior belief
on the fit of the model. This allows for comparability across models since we control for overfitting.
We set the tightness during the pre-evaluation period, from
Jan60 (t = 1) until Dec69 (t = T0 − 1). It is then kept fixed for
the entire evaluation period.
... Setting the overall shrinkage ...
The in-sample fit is an in-sample measure of the 1-step ahead mean squared
forecast error.
We evaluate it at the beginning of the evaluation period, that is when we
estimated the model for the first time, using data from Jan60 until Dec69
(t = 1, ..., T0):
$$msfe^{(\lambda)}_i = \frac{1}{T_0 - p} \sum_{t=p}^{T_0 - 1} \left( y^{(\lambda)}_{i,t+1|t} - y_{i,t+1} \right)^2$$
where the parameters are computed using the sample t = 1, ..., T0.
Order the variables in all models so that Empl, CPI and FFR are the first.
For a given measure of fit, we set the shrinkage as follows:
$\lambda^*_{Fit}$ such that
$$\frac{1}{3} \sum_{i=1}^{3} \frac{msfe_i^{(\lambda^*_{Fit})}}{msfe_i^{(0)}} = Fit$$
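In practice the condition can be met approximately by searching over candidate values of λ. The sketch below uses a plain grid search, which is a substitute chosen for simplicity and not necessarily the procedure used in the paper. It assumes a user-supplied callable `insample_msfe(lam)` (hypothetical) that returns the in-sample msfe of each variable, key variables first, and that also accepts `lam = 0.0` for the prior model.

```python
import numpy as np

def select_lambda(insample_msfe, lam_grid, target_fit, n_key=3):
    """Pick the grid value whose average relative in-sample fit is closest to target_fit.

    insample_msfe: callable lam -> array of msfe values, key variables first.
    Returns (lambda, achieved_fit).
    """
    benchmark = np.asarray(insample_msfe(0.0), dtype=float)[:n_key]
    best = None
    for lam in lam_grid:
        ratios = np.asarray(insample_msfe(lam), dtype=float)[:n_key] / benchmark
        fit = float(np.mean(ratios))          # (1/3) * sum_i msfe_i(lam) / msfe_i(0)
        gap = abs(fit - target_fit)
        if best is None or gap < best[0]:
            best = (gap, lam, fit)
    return best[1], best[2]
```

A finer grid, or a bisection on λ when the fit is monotone in λ, would bring the achieved fit closer to the target.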
... Setting the overall shrinkage
• Report results corresponding to the fit of the small model with
p = 13 estimated by OLS (Fit ≈ 0.60)
- We also present the results for a range of in-sample fits, Fit =
1/4, 1/2, 3/4
Table 1: BVAR, Relative MSFE, 1971-2003
          h=1                 h=3                 h=6                 h=12
          EMP   CPI   FFR     EMP   CPI   FFR     EMP   CPI   FFR     EMP   CPI   FFR     λ
SMALL     1.14  0.89  1.86    0.95  0.66  1.77    1.11  0.64  2.08    1.02  0.83  2.59    ∞
CEE       0.67  0.52  0.89    0.65  0.41  1.07    0.78  0.41  1.30    1.21  0.57  1.71    0.262
MEDIUM    0.54  0.50  0.78    0.51  0.41  0.95    0.66  0.40  1.30    0.86  0.47  1.48    0.108
LARGE     0.46  0.50  0.75    0.38  0.40  0.94    0.50  0.40  1.29    0.78  0.44  1.93    0.035
Comments:
• Adding information helps: larger systems produce more accurate forecasts.
• Forecasts from MEDIUM are already quite accurate
• The federal funds rate is forecastable only in the short run
• Shrink more for larger models
Alternative to Bayesian shrinkage: parsimony by lag selection
Table 2 presents the results for SMALL and CEE for p = 13
and for p selected by BIC.
The last two columns repeat the results for the Bayesian estimation and the
results from the large model to facilitate the comparison.
Note that this exercise is not feasible for larger models, since
OLS estimation with p = 13 is infeasible.
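Lag selection by BIC trades off the fit of the OLS residual covariance against the number of estimated coefficients. The following is a sketch of one common form of the criterion, ln|Σ̂(p)| + (ln T / T) · (number of coefficients); the exact variant used in the paper is not stated here, and the function name `select_lag_bic` is an assumption.

```python
import numpy as np

def select_lag_bic(Y, p_max):
    """Return the lag order in 1..p_max that minimizes the BIC of an OLS VAR.

    All candidate models are fitted on the same sample (the first p_max
    observations are reserved as initial conditions).
    """
    Y = np.asarray(Y, dtype=float)
    T, n = Y.shape
    T_eff = T - p_max
    Y_dep = Y[p_max:]
    best_p, best_bic = None, np.inf
    for p in range(1, p_max + 1):
        lags = [Y[p_max - k:T - k] for k in range(1, p + 1)]
        X = np.hstack([np.ones((T_eff, 1))] + lags)
        B, *_ = np.linalg.lstsq(X, Y_dep, rcond=None)
        U = Y_dep - X @ B
        Sigma = U.T @ U / T_eff                      # residual covariance estimate
        _, logdet = np.linalg.slogdet(Sigma)
        n_coef = n * (n * p + 1)                     # coefficients in all n equations
        bic = logdet + np.log(T_eff) * n_coef / T_eff
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p
```

With n variables each extra lag adds n² coefficients, so the penalty grows quickly with the size of the system, which is why BIC tends to pick short lag lengths in larger VARs.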
Table 2: OLS, Relative MSFE, 1971-2003
                    p = 13   p = BIC   BVAR    LARGE
SMALL  h=1    EMP   1.14     0.73      1.14    0.46
              CPI   0.89     0.55      0.89    0.50
              FFR   1.86     0.99      1.86    0.75
       h=3    EMP   0.95     0.76      0.95    0.38
              CPI   0.66     0.49      0.66    0.40
              FFR   1.77     1.29      1.77    0.94
       h=6    EMP   1.11     0.90      1.11    0.50
              CPI   0.64     0.51      0.64    0.40
              FFR   2.08     1.51      2.08    1.29
       h=12   EMP   1.02     1.15      1.02    0.78
              CPI   0.83     0.56      0.83    0.44
              FFR   2.59     1.59      2.59    1.93
CEE    h=1    EMP   7.56     0.76      0.67    0.46
              CPI   5.61     0.55      0.52    0.50
              FFR   6.39     1.21      0.89    0.75
       h=3    EMP   5.11     0.75      0.65    0.38
              CPI   4.52     0.45      0.41    0.40
              FFR   6.92     1.27      1.07    0.94
       h=6    EMP   7.79     0.78      0.78    0.50
              CPI   4.80     0.44      0.41    0.40
              FFR   15.9     1.48      1.30    1.29
       h=12   EMP   22.3     0.82      1.21    0.78
              CPI   21.0     0.53      0.57    0.44
              FFR   47.1     1.62      1.71    1.93
Results
• VARs estimated by OLS perform better when the number of
lags is chosen by the BIC criterion
• However, even in this case the forecasts from SMALL and CEE
are less accurate than those from MEDIUM and LARGE.
Bayesian shrinkage and FAVAR
FAVAR: augment the baseline monetary VAR (SMALL) with a
few principal components extracted from the large dataset
(Bernanke, Boivin & Eliasz, 2005; Stock & Watson, 2005).
Principal components are extracted from the large panel of 131 variables.
Variables are first made stationary by taking first differences wherever we have
imposed a random walk prior (δi = 1). Then they are standardized, since PCs
are not scale invariant. Principal components are computed recursively at
each point T in the evaluation sample.
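The extraction step can be sketched with a singular value decomposition of the standardized panel. This is an illustration under simple assumptions (a complete T × n panel that has already been transformed to stationarity, no missing values); the function name `principal_components` is hypothetical.

```python
import numpy as np

def principal_components(X, r):
    """First r principal components (scores) of the standardized columns of X.

    X: (T, n) panel of stationary series. Returns a (T, r) array.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each series: PCs are not invariant to the scale of the data
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] * s[:r]   # scores, ordered by explained variance
```

In a recursive exercise the function would be called again on each estimation window, so that the components at date T only use data available at T. The sign of each component is arbitrary, which does not affect forecasts from a VAR that includes them.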
Specification details
We augment the baseline VAR with one [1] and three [3] principal
components and look at different lag selection for the VAR.
- p = 13, as in Bernanke & Boivin & Eliasz, 2005
- p selected by BIC.
- BFAVAR, using Bayesian estimation with p = 13 [we impose
the fit from OLS on the small model as in Table 1]
- Results for alternative shrinkages are in the paper
Table 3: FAVAR, Relative MSFE, 1971-2003
                   p = 13   p = BIC   BVAR    LARGE
r=1   h=1    EMP   1.36     0.54      0.70    0.46
             CPI   1.10     0.57      0.65    0.50
             FFR   1.86     0.98      0.89    0.75
      h=3    EMP   1.13     0.55      0.68    0.38
             CPI   0.80     0.49      0.55    0.40
             FFR   1.62     1.12      1.03    0.94
      h=6    EMP   1.33     0.73      0.87    0.50
             CPI   0.74     0.52      0.55    0.40
             FFR   2.07     1.31      1.40    1.29
      h=12   EMP   1.15     0.98      0.92    0.78
             CPI   0.95     0.58      0.70    0.44
             FFR   2.69     1.43      1.93    1.93
r=3   h=1    EMP   3.02     0.52      0.65    0.46
             CPI   2.39     0.52      0.58    0.50
             FFR   2.40     0.97      0.85    0.75
      h=3    EMP   2.11     0.50      0.61    0.38
             CPI   1.44     0.44      0.49    0.40
             FFR   3.08     1.16      0.99    0.94
      h=6    EMP   2.52     0.63      0.77    0.50
             CPI   1.18     0.46      0.50    0.40
             FFR   3.28     1.45      1.27    1.29
      h=12   EMP   3.16     0.84      0.83    0.78
             CPI   1.98     0.54      0.64    0.44
             FFR   7.09     1.46      1.69    1.93
Results:
• FAVAR with p = 13 performs poorly: the gain in parsimony
obtained by PC is lost because of too many lags.
• The FAVAR is in general outperformed by the BVARs of medium
and large size, but the differences are not that large once we
overcome the overfitting problem by using fewer lags or by
shrinkage.
Out-of-sample forecasting experiment: conclusions
• More information improves forecast accuracy once we control
for overfitting.
• Large BVARs produce accurate forecasts (competitive with
small models, PCs and FAVAR)
⇒ Suitable tool for incorporating large information sets
Structural analysis
Impulse responses to a monetary policy shock
Recursive identification scheme:
• divide the variables in the panel into 2 categories, slow (real
variables and prices) and fast-moving (financial and monetary
variables).
• Identifying assumption: slow-moving variables do not respond
contemporaneously to a monetary policy shock (Christiano et
al., 2000) ⇒ Recursive (Cholesky) identification scheme.
Assume that the monetary policy shock is orthogonal to all other shocks driving
the economy and that on impact it changes the federal funds rate by 100
basis points.
Reduced form VAR(p):
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t, \qquad u_t \sim WN(0, \Psi)$$
Structural VAR:
$$\mathcal{A}_0 Y_t = \nu + \mathcal{A}_1 Y_{t-1} + \dots + \mathcal{A}_p Y_{t-p} + e_t, \qquad e_t \sim WN(0, D)$$
where
- D is diagonal,
- $\mathcal{A}_0$ is lower triangular with ones on the main diagonal,
- $\mathcal{A}_0^{-1} D \mathcal{A}_0'^{-1} = \Psi$,
- $\mathcal{A}_0^{-1} \mathcal{A}_j = A_j$, j = 1, ..., p
=⇒ unique mapping from Ψ to ($\mathcal{A}_0$, D).
Since the mapping from the reduced form to the structural VAR is unique,
draws of the impulse response functions are easily obtained following
Canova (1991) and Gordon & Leeper (1994):
i) generate draws from the posterior of (A1, ..., Ap, Ψ)
ii) for each draw compute the associated $\mathcal{A}_0$ and D, and hence $\mathcal{A}_j = \mathcal{A}_0 A_j$
iii) from each draw of $\mathcal{A}_0, \mathcal{A}_1, \dots, \mathcal{A}_p$ compute the IRFs.
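For a single draw, step ii) amounts to a rescaled Cholesky factorization of Ψ, and step iii) to a recursion on the reduced-form coefficients. The sketch below is illustrative only; the function names are assumptions, and the shock size is left to the caller (for instance, the impact vector can be rescaled so that the policy rate moves by one unit).

```python
import numpy as np

def recursive_identification(Psi):
    """Return (A0, D) with A0 unit lower triangular, D diagonal and
    inv(A0) @ D @ inv(A0).T == Psi."""
    L = np.linalg.cholesky(Psi)        # Psi = L L', L lower triangular
    d = np.diag(L)
    A0_inv = L / d                     # divide column j by L_jj: unit diagonal
    D = np.diag(d**2)
    return np.linalg.inv(A0_inv), D

def impulse_responses(A, impact, horizon):
    """Responses of Y at horizons 0..horizon to a shock with the given impact vector.

    A: list [A_1, ..., A_p] of reduced-form coefficient matrices.
    """
    p = len(A)
    resp = [np.asarray(impact, dtype=float)]
    for h in range(1, horizon + 1):
        # Theta_h = sum_{k=1}^{min(h, p)} A_k Theta_{h-k}
        resp.append(sum(A[k] @ resp[h - 1 - k] for k in range(min(h, p))))
    return np.array(resp)
```

The impact vector of the j-th structural shock is the j-th column of the inverse of A0, scaled as desired. Repeating both steps over posterior draws and taking quantiles at each horizon gives the bands reported alongside the IRFs.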
We report results for the same degree of shrinkage used for the forecasting
evaluation
Estimation is carried out using the whole time span of the data, 1961-2002
The number of lags remains p = 13
Impulse response functions to the monetary policy shock
- responses of Employment, CPI and FFR
- different models
[Figure: impulse responses of CES002, PUNEW and FYFF (rows) in the SMALL, CEE, MEDIUM and LARGE models (columns), horizons 0 to 48 months. Lines: IRF with 0.68 and 0.9 confidence intervals.]
Results:
• MEDIUM and LARGE impulse responses have the expected
signs
• Adding information helps resolve the price puzzle, cf. Christiano et al. (2000), Bernanke et al. (2005)
• At longer horizons the confidence intervals for LARGE become
explosive
Explosive draws ↔ non-stationarity (data enter in levels)
Solve by “inexact differencing”
Rewrite the VAR(p):
$$Y_t = \mu + A_1 Y_{t-1} + \dots + A_p Y_{t-p} + u_t$$
in the ECM form:
$$\Delta Y_t = \mu - (I_n - A_1 - \dots - A_p) Y_{t-1} + B_1 \Delta Y_{t-1} + \dots + B_{p-1} \Delta Y_{t-p+1} + u_t, \qquad B_j = -(A_{j+1} + \dots + A_p)$$
- “Exact differences”: impose (In − A1 − A2 − ... − Ap) equal to zero
- “Inexact differences”: shrink (In − A1 − A2 − ... − Ap) to zero.
⇒ “Sum of coefficients prior” (Doan, Litterman, Sims, 1984)
Let τ control the degree of shrinkage on the sum of coefficients:
• τ ↓ 0 → “exact differences”
• τ ↑ ∞ → no shrinkage
Our choice: loose prior on the sum of coefficients, τ = 10λ
Results are robust to less informative priors (e.g. τ = 100λ) and more informative priors (e.g. τ = λ).
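The mapping between the level and the error-correction representations can be checked numerically. In the sketch below (the function name `var_to_ecm` is an assumption), the coefficient on Y_{t−1} in the equation for ΔY_t is A_1 + ... + A_p − I_n, that is, minus the matrix that the sum of coefficients prior shrinks to zero, and the coefficients on lagged differences are B_j = −(A_{j+1} + ... + A_p).

```python
import numpy as np

def var_to_ecm(A):
    """Map VAR coefficients [A_1, ..., A_p] to the error-correction form.

    Returns (Pi, B) such that
    dY_t = mu + Pi @ Y_{t-1} + B[0] @ dY_{t-1} + ... + B[p-2] @ dY_{t-p+1} + u_t,
    with Pi = A_1 + ... + A_p - I_n.
    """
    A = [np.asarray(A_k, dtype=float) for A_k in A]
    n = A[0].shape[0]
    Pi = sum(A) - np.eye(n)                        # -(I_n - A_1 - ... - A_p)
    B = [-sum(A[j:]) for j in range(1, len(A))]    # B_j = -(A_{j+1} + ... + A_p)
    return Pi, B
```

Setting `Pi` to zero gives a VAR in first differences ("exact differences"), while shrinking it towards zero keeps the level information but discourages explosive dynamics.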
Impulse response functions to the monetary policy shock
- responses of Employment, CPI and FFR
- sum of coefficient prior
[Figure: impulse responses of CES002, PUNEW and FYFF (rows) in the SMALL, CEE, MEDIUM and LARGE models (columns) with the sum of coefficients prior, horizons 0 to 48 months. Lines: IRF with 0.68 and 0.9 confidence intervals.]
Impulse response functions to the monetary policy shock
- model LARGE
- responses of selected variables
- sum of coefficient prior
[Figure: impulse responses in the LARGE model with the sum of coefficients prior, horizons 0 to 48 months. Panels: CES002, PSM99Q, PUNEW, A0M224_R, EXRUS, FMRRA, CES275, FYGT10, FM2, FSPCOM, FMRNBA, GMDC, FM1, LHUR, FYFF, PWFSA, HSFR, A0M082, IPS10, A0M051. Lines: IRF with 0.68 and 0.9 confidence intervals.]
BVAR, Relative MSFE, 1971-2003: baseline
          h=1                 h=12
          EMP   CPI   FFR     EMP   CPI   FFR
SMALL     1.14  0.89  1.86    1.02  0.83  2.59
CEE       0.67  0.52  0.89    1.21  0.57  1.71
MEDIUM    0.54  0.50  0.78    0.86  0.47  1.48
LARGE     0.46  0.50  0.75    0.78  0.44  1.93
BVAR, Relative MSFE, 1971-2003: with sum of coefficient prior
          h=1                 h=12
          EMP   CPI   FFR     EMP   CPI   FFR
SMALL     1.14  0.89  1.86    1.02  0.83  2.59
CEE       0.68  0.57  0.97    0.65  0.55  1.61
MEDIUM    0.53  0.49  0.75    0.60  0.43  0.93
LARGE     0.44  0.49  0.74    0.50  0.40  0.92
BVAR, Relative MSFE, 1971-2003
With sum of coefficient prior
          h=1                 h=3                 h=6                 h=12
          EMP   CPI   FFR     EMP   CPI   FFR     EMP   CPI   FFR     EMP   CPI   FFR
SMALL     1.14  0.89  1.86    0.95  0.66  1.77    1.11  0.64  2.08    1.02  0.83  2.59
CEE       0.68  0.57  0.97    0.60  0.44  1.28    0.65  0.45  1.40    0.65  0.55  1.61
MEDIUM    0.53  0.49  0.75    0.49  0.39  0.85    0.58  0.37  0.96    0.60  0.43  0.93
LARGE     0.44  0.49  0.74    0.36  0.37  0.82    0.44  0.36  0.92    0.50  0.40  0.92
Results
With the prior on the sum of coefficients everything works better:
• It makes the IRFs of the LARGE model non-explosive.
• It improves forecast accuracy, particularly at long horizons and
for CPI and the FFR.
Similar finding in Tallman & Robertson, 1999
Conclusions