Problems in model averaging with dummy variables
David F. Hendry and J. James Reade
Economics Department, Oxford University
Model Evaluation in Macroeconomics Workshop, University of Oslo
6th May 2005
1. Introduction
Model averaging:
• widely used,
• proposed as a method for accommodating model uncertainty,
• can be shown to have desirable properties in a stationary world:
– Raftery et al. (1997) show, on a logarithmic scoring rule, that averaged model forecasts are better than those of any individual model.
– Hendry & Clements (2004) explore cases where averaging might improve forecasts.
But: extension to a non-stationary world presents difficulties.
Plan
Section 2:
• model averaging introduced,
• various methods of implementing it outlined,
• use of model averaging in the empirical literature discussed.
In Section 3:
• simple models introduced to highlight problems with model averaging in empirically relevant situations,
• Monte Carlo simulations used to support the predictions of the models and to suggest problems exist in a more general context.
Section 4 concludes.
2. Model Averaging
Possible in both:
• classical statistical framework (see Buckland et al. 1997), and
• Bayesian framework (BMA) (see Raftery et al. 1997).
Latter much more commonly used in literature. Examples include:
• growth theory:
– Fernandez et al. (2001),
– Doppelhofer et al. (2000),
• US quarterly GDP: Koop & Potter (2003),
• Swedish inflation: Eklund & Karlsson (2004).
2.1. Implementation
• Set of K variables thought to have explanatory power for parameters of interest.
• Form a set M of L models (every subset), {M1, . . . , ML} ∈ M.
• Model selection could be used to reduce size of M.
• Consider linear regression models.
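The enumeration of every subset of the K candidate variables can be sketched as follows; a minimal illustration, with hypothetical variable names standing in for the regressors:

```python
from itertools import chain, combinations

def model_set(variables):
    """Enumerate every subset of the K candidate variables,
    giving the L = 2**K models, from the empty model to the full GUM."""
    return list(chain.from_iterable(
        combinations(variables, r) for r in range(len(variables) + 1)))

# Hypothetical names: with K = 3 candidate regressors we get L = 2**3 = 8 models.
models = model_set(["x1", "x2", "x3"])
```

With even moderate K the set M grows as 2^K, which is why model selection to reduce its size is mentioned above.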
2.1.1. Reporting of results
In conventional linear regression analysis, β̂ = E(β|X) reported.
Might expect weighted average of β̂ over the L models,

    β̃ = Σ_{l=1}^{L} wl β̂(l),   (1)

and output of:

    y = β̃X + ũ.   (2)
2.1.2. Existence of true parameters
• BUT: Bayesian statisticians reject idea of a single, true estimate.
• Instead: each parameter has a distribution.
• Thus Fernandez et al. (2001) report probabilities of inclusion for parameters, plot distribution functions.
• Hoover & Perez (2004) pp. 767–769 for discussion on existence of ‘true’ parameters, relevance for empirical work.
• We analyse model averaging as in equations (1) and (2).
2.1.3. Issues relating to construction of weights
Key issue on two levels:
• how to construct the weights;
• which weighting criterion to use.
2.1.3. How to construct the weights
Considering the first issue, for any particular weighting criterion, say Cl, the weighting method might be:

    wl = Cl / Σ_{i=1}^{L} Ci.   (3)

• ensures that Σ_{l=1}^{L} wl = 1.
• but no variable appears in every model ⇒ sum of weights applied to particular variable not unity, so bias.
2.1.3. How to construct the weights
Alternative construction: rescale weight for regressor, so sum over models it appears in is unity. If Nk ⊂ M is the set of models in M containing βk:

    wl = Cl / Σ_{i∈Nk} Ci.   (4)

• weights for any particular regressor sum to unity.
2.1.3. How to construct the weights
• Buckland et al. (1997) favour first method: sum of weights measure of importance of regressor.
• But Doppelhofer et al. (2000) advocate rescaled weighting for reporting coefficients: coefficients produced by this method are ones used in forecasting, and for analysing marginal effects.
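The two weight constructions in equations (3) and (4) can be sketched as follows; a minimal illustration, assuming the criterion values Cl are already computed, with made-up numbers:

```python
def weights_all_models(C):
    """Equation (3): w_l = C_l / sum_i C_i, so the weights over ALL L models sum to 1."""
    total = sum(C)
    return [c / total for c in C]

def weights_rescaled(C, contains_regressor):
    """Equation (4): rescale so the weights over the models in N_k, i.e. those
    CONTAINING a given regressor beta_k, sum to 1."""
    total = sum(c for c, has in zip(C, contains_regressor) if has)
    return [c / total if has else 0.0 for c, has in zip(C, contains_regressor)]

C = [1.0, 2.0, 1.0, 4.0]            # illustrative criterion values for 4 models
has_k = [False, True, False, True]  # flags the models containing beta_k
w3 = weights_all_models(C)
w4 = weights_rescaled(C, has_k)
# Under (3) the weights on the two models containing beta_k sum to 6/8 < 1,
# biasing the averaged coefficient towards zero; under (4) they sum to 1.
```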
2.1.3. Which weighting criterion to use
• In Bayesian context each model weighted by posterior probability:

    Pr(Ml | X) = Pr(Ml) Pr(X | Ml) / Σ_{k=1}^{L} Pr(Mk) Pr(X | Mk).   (5)

• In non-Bayesian contexts, information criteria might be considered, e.g. Akaike, Bayesian (Schwarz) information criteria.
• Out-of-sample methods might be used:
– Eklund & Karlsson (2004): predictive Bayesian densities,
– Hendry & Clements (2004): minimising MSFE of averaged model.
2.1.3. Which weighting criterion to use
• Here, following Buckland et al. (1997), approximation to Schwarz information criterion (SIC) used. Uses exp(−σ̂²v,l/2), where σ̂²v,l is the residual variance of the l-th model:

    σ̂²v,l = (1/T) Σ_{t=1}^{T} v̂t².

• Thus (non-rescaled) weights given by:

    wl = exp(−½ σ̂²v,l) / Σ_{l=1}^{L} exp(−½ σ̂²v,l).   (6)
2.1.3. Which weighting criterion to use
Justification for non-Bayesian weights given predominance of BMA in literature:
• Schwarz information criterion does not discriminate strongly between similar models, fits in with concerns over model uncertainty.
• Schwarz criterion is approximation to Bayes factor.
• Difficulty of choosing priors for 2^K models.
Thus can apply analytical results and Monte Carlo simulation results here to Bayesian model averaging.
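The exponential weighting of equation (6) can be sketched as follows; a minimal illustration with made-up residual variances for the L models:

```python
import math

def exp_weights(sigma2):
    """Equation (6): w_l = exp(-sigma2_l / 2) / sum_l exp(-sigma2_l / 2),
    where sigma2_l is the residual variance of model l."""
    scores = [math.exp(-s / 2.0) for s in sigma2]
    total = sum(scores)
    return [s / total for s in scores]

# Illustrative residual variances: three similar fits and one poor one.
# Similar fits receive nearly equal weights, so the criterion does not
# discriminate strongly between similar models.
w = exp_weights([0.010, 0.011, 0.012, 0.500])
```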
2.2. Context of macroeconomic modelling
Model averaging just one way of carrying out a data-centred macroeconomic modelling exercise; other paradigms include:
• alternative Bayesian strategies such as extreme-bounds analysis (see Hoover & Perez 2004);
• General-to-Specific model selection (see Hoover & Perez 1999, Hendry & Krolzig 2005, Perez-Amaral et al. 2003):
– general model posited to include all possible determining factors for parameter of interest,
– then process of reduction (Hendry 1995, Ch. 9) carried out, resulting in parsimonious, congruent, encompassing model.
3. The Effect of Dummy Variables on Model Averaging
3.1. Impulse dummy variables
• Simple location-scale data generation process (DGP) with transient mean shift:

    yt = β + γ1{t=ta} + vt,  where vt ∼ IN(0, σv²)   (7)

• Parameter of interest is β, forecast for yT+1, from forecast origin T.
• Consider empirically relevant case where γ = λ√T for a fixed constant λ (see Doornik et al. 1998).
3.1. Impulse dummy variables
• General model (GUM, our initial K variables) is DGP augmented by unnecessary impulse dummy, d2,t = 1{t=tb} (where d1,t = 1{t=ta}):

    yt = β + γd1,t + δd2,t + ut   (8)

• thus structural break or outlier has been accounted for, but only one transient location shift actually occurs (investigator unaware of this).
3.1. Impulse dummy variables
• 2³ = 8 possible models result, can be estimated using least squares:

    M0: ŷt = 0                       M1: ŷt = β̂(1)
    M2: ŷt = δ̂(2)d2,t                M3: ŷt = γ̂(3)d1,t
    M4: ŷt = β̂(4) + δ̂(4)d2,t         M5: ŷt = β̂(5) + γ̂(5)d1,t
    M6: ŷt = γ̂(6)d1,t + δ̂(6)d2,t     M7: ŷt = β̂(7) + γ̂(7)d1,t + δ̂(7)d2,t   (9)
3.1.1. Deriving weights and estimates
• Least squares gives 3 possible outcomes:

    β̂(0) = β̂(2) = β̂(3) = β̂(6) = 0,

    β̂(1) = β̂(4) = (1/T) Σ_{t=1}^{T} yt = (1/T) Σ_{t=1}^{T} (β + γ1{t=ta} + vt) ≃ β + γ/T = β + λ/√T,

    β̂(5) = β̂(7) = (1/(T−1)) Σ_{t=1, t≠ta}^{T} yt = (1/(T−1)) Σ_{t=1, t≠ta}^{T} (β + γ1{t=ta} + vt) ≃ β.
3.1.1. Deriving weights and estimates
Cumulating these:

    β̃ ≃ (w5 + w7)β + (w1 + w4)(β + λ/√T)
      = (w1 + w4 + w5 + w7)β + (w1 + w4)λ/√T.   (12)

• averaged coefficient ≠ true coefficient if λ ≠ 0, and/or w1 + w4 + w5 + w7 < 1 (which Σ_{l=1}^{L} wl = 1 implies in most cases).
• rescaling will mean w1 + w4 larger, hence bias from λ/√T greater.
• rescaling ⇒ δ̃, the coefficient on the irrelevant regressor, will receive greater weight.
3.1.2. Model averaging for forecasting stationary data
• One justification for model averaging is for ‘forecast pooling’. Outlier one-off so will not occur in forecast period:

    M0: ŷT+1,0 = 0     M1: ŷT+1,1 = β̂(1)    M2: ŷT+1,2 = 0
    M3: ŷT+1,3 = 0     M4: ŷT+1,4 = β̂(4)    M5: ŷT+1,5 = β̂(5)
    M6: ŷT+1,6 = 0     M7: ŷT+1,7 = β̂(7)                         (13)

• Letting: ỹT+1|T = Σ_{i=0}^{7} wi ŷT+1,i
• Then forecast error is ṽT+1|T = yT+1 − ỹT+1|T, with mean:

    E[ṽT+1|T] = (w0 + w2 + w3 + w6)β − (w1 + w4)λ/√T.

• Hence again bias.
3.1.2. Model averaging for forecasting stationary data
Worse still, MSFE:

    E[ṽ²T+1|T] = E[(yT+1 − Σ_{i=0}^{7} wi ŷT+1,i)²]
               = E[((w0 + w2 + w3 + w6)β − (w1 + w4)λ/√T + vT+1)²]
               = σv² + (w0 + w2 + w3 + w6)²β² + (w1 + w4)²λ²/T
                 − 2(w0 + w2 + w3 + w6)(w1 + w4)βλ/√T,   (14)

• Likely to be worse for large λ than GUM or any selected model, even allowing for estimation uncertainty, certainly if weights are not rescaled.
3.1.3. Numerical Example
• β = 1, λ = −1, σv² = 0.01, and T = 25. Then averaged β estimate is:

    β̃ = (w5 + w7)β + (w1 + w4)(β + λ/√T)
      = 0.382 + (0.305)(1 − 1/5) = 0.626   (15)

• very biased for the true value of unity.
• MSFE when forecasting without rescaling the weights is:

    E[ṽ²T+1|T] = 0.118.
3.1.3. Numerical Example
• Bias smaller if second weighting methodology used:

    β̃ = β + (w5 + w7)λ/√T = 1 + 0.37754 × (−1/5) = 0.924.

• Hard to calculate MSFE with rescaled weights, because each weight depends on which coefficient it multiplies; would expect MSFE smaller when weights rescaled.
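The arithmetic of the two numerical examples can be checked directly; a minimal sketch, taking the weight sums 0.382, 0.305 and 0.37754 from the slides as given:

```python
import math

beta, lam, T = 1.0, -1.0, 25

# Equation (15): non-rescaled weights, with w5 + w7 = 0.382 and w1 + w4 = 0.305.
beta_tilde = 0.382 * beta + 0.305 * (beta + lam / math.sqrt(T))

# Rescaled weighting: beta_tilde = beta + 0.37754 * lam / sqrt(T).
beta_tilde_rescaled = beta + 0.37754 * (lam / math.sqrt(T))
```

The first line reproduces 0.626 and the second 0.924 (to three decimals), confirming the bias figures quoted above.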
3.1.3. Numerical Example
• γ = −5 large when σv = 0.1, but outliers of magnitude √T often occur in practical models (see Hendry 2001, Doornik et al. 1998).
• In Monte Carlo simulation, range of values of λ and T considered.
3.1.4. Impulse Dummy Monte Carlo Simulation
Table 1: Bias on β coefficient for each modelling strategy. Based on 1,000 replications.

            GUM      MA       MA R     Lib      Cons     DGP True Value
λ = 0
T 25        0.000   -0.319    0.000    0.000    0.000    1
T 50       -0.001   -0.316   -0.001   -0.001   -0.001    1
λ = -0.05
T 25        0.000   -0.323   -0.006    0.000    0.000    1
T 50       -0.001   -0.319   -0.004   -0.001   -0.001    1
λ = -0.5
T 25        0.000   -0.362   -0.048    0.000    0.000    1
T 50       -0.001   -0.348   -0.034   -0.001   -0.001    1
λ = -1
T 25        0.000   -0.397   -0.079    0.000    0.000    1
T 50       -0.001   -0.376   -0.055   -0.001   -0.001    1
3.1.4. Impulse Dummy Monte Carlo Simulation
• Bias when simply the GUM is run is tiny.
• Model averaging induces large bias, ranging from about 30% of the
true β coefficient size when the dummies are both insignificant
(λ = 0), to around 40% when λ = −1.
• Calculations of the previous section are supported here; the predicted bias of -0.374 is reproduced and is in fact stronger, at -0.397.
• Rescaling leads to lower bias; bias does increase with size of true
γ (T fixed), but only reaches about 8% of coefficient size (again
corroborating calculations from earlier).
• Model selection induces GUM-sized, negligible bias regardless of
strategy.
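The impulse-dummy experiment can be sketched as a small Monte Carlo. This is an illustrative reconstruction, not the authors' code: the dummy positions ta, tb are arbitrary choices, and the exp(−σ̂²/2) scores of equation (6) are normalised over all eight models as in equation (3):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_mc(beta=1.0, lam=-1.0, sigma=0.1, T=25, reps=300):
    """Monte Carlo bias of the model-averaged constant.
    DGP: y_t = beta + gamma*1{t=ta} + v_t, with gamma = lam*sqrt(T)."""
    ta, tb = 5, 10                      # impulse positions (arbitrary here)
    gamma = lam * np.sqrt(T)
    const = np.ones(T)
    d1 = np.zeros(T); d1[ta] = 1.0      # relevant dummy
    d2 = np.zeros(T); d2[tb] = 1.0      # irrelevant dummy
    # the 8 models M0..M7: every subset of {constant, d1, d2}
    designs = [[], [const], [d1], [d2], [const, d1], [const, d2],
               [d1, d2], [const, d1, d2]]
    biases = []
    for _ in range(reps):
        y = beta + gamma * d1 + sigma * rng.standard_normal(T)
        beta_hats, sig2 = [], []
        for X in designs:
            if X:
                Xm = np.column_stack(X)
                b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
                resid = y - Xm @ b
                # estimate of the constant; zero if the model omits it
                beta_hats.append(b[0] if X[0] is const else 0.0)
            else:
                resid = y
                beta_hats.append(0.0)
            sig2.append(resid @ resid / T)
        w = np.exp(-0.5 * np.array(sig2))   # equation (6) scores
        w /= w.sum()                        # normalised over all 8 models
        biases.append(w @ np.array(beta_hats) - beta)
    return float(np.mean(biases))

bias = run_mc()   # a sizeable negative bias, in line with the MA column of Table 1
```

The exact magnitude depends on the weighting details, but the averaged constant is pulled well below its true value of unity whenever zero-constant models receive weight.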
3.1.4. Impulse Dummy Monte Carlo Simulation
Table 2: Bias on γ coefficient for each modelling strategy. Based on 1,000 replications.

            GUM      MA       MA R     Lib      Cons     DGP True Value
λ = 0
T 25       -0.002    0.212    0.383   -0.073   -0.116    0
T 50        0.000    0.211    0.381    0.026   -0.042    0
λ = -0.05
T 25       -0.002    0.324    0.383   -0.004   -0.007   -0.25
T 50        0.000    0.369    0.381   -0.002   -0.002   -0.354
λ = -0.5
T 25       -0.002    1.277    0.383   -0.003   -0.003   -2.5
T 50        0.000    1.709    0.381   -0.002   -0.002   -3.536
λ = -1
T 25       -0.002    1.966    0.383   -0.003   -0.003   -5
T 50        0.000    2.645    0.381   -0.002   -0.002   -7.07
3.1.4. Impulse Dummy Monte Carlo Simulation
• Again negligible GUM bias.
• Bias on γ under model averaging decreases as percentage of size
of true coefficient from about 100% when dummy barely noticeable
(λ = −0.05), to about 40% when dummy very conspicuous (λ = −1).
• Rescaled bias invariant to changes in λ (this can be shown analytically).
• Rescaled bias larger relative to the true coefficient when λ small (dummy effectively insignificant); backs up earlier assertion. Relative bias smaller when λ bigger.
• Model selection bias again very small.
3.1.4. Impulse Dummy Monte Carlo Simulation
Table 3: MSFE for 1-step forecast of T + 1 from T for each modelling
strategy. Based on 1,000 replications.
GUM MA MA R Lib Cons σv²
λ=0
T 25 0.011 0.108 0.011 0.013 0.013 0.01
T 50 0.010 0.109 0.010 0.013 0.013 0.01
λ = -0.05
T 25 0.011 0.111 0.011 0.013 0.013 0.01
T 50 0.010 0.111 0.010 0.013 0.013 0.01
λ = -0.5
T 25 0.011 0.137 0.013 0.013 0.013 0.01
T 50 0.010 0.130 0.011 0.013 0.013 0.01
λ = -1
T 25 0.011 0.163 0.016 0.013 0.013 0.01
T 50 0.010 0.151 0.013 0.013 0.013 0.01
3.1.4. Impulse Dummy Monte Carlo Simulation
• GUM MSFE is of expected size.
• Huge MSFEs predicted for MA are supported.
• Rescaling substantially improves MSFE; only when λ large is MSFE
here larger than under model selection.
• Model selection provides competitive MSFE for each (λ, T )
combination.
• Suggests that model averaging, appropriately used, could be useful
for forecasting (see Raftery et al. 1997, Hendry & Clements 2004).
3.1.5. Does this generalise?
• Constant and dummies very simplistic; relevant to real-world applications?
• Monte Carlo on initial model, constant replaced with regressor ⇒ same results; in fact stronger bias of 0.103 on β in λ = −1, T = 25 case.
• If dummy variable was say d = 1{t1<t<t2}, then all λ/√T expressions replaced with (t2 − t1)λ/√T; expect bias to increase with size of period dummy covers.
• Further, some time series, many cross-section studies have ‘intermittent’ dummies, i.e. d = 1{t∈D}, e.g. industrial action in a year, country located in Africa. Expect same effect on bias.
• Will consider these more general contexts...
3.2. Period and Intermittent Dummies
3.2.1. Period Dummy Monte Carlo Simulation
• Monte Carlo from earlier:

    yt = β + γd1,t + δd2,t + ut   (18)

rerun with d1,t = 1{0<t<T/2} (half the sample), and d2,t = 1{(T/2)+1<t<(T/2)+8}; practitioner unsure of point of break. Same parameter values as before.
3.2.1. Period Dummy Monte Carlo Simulation
Table 4: Bias on β and MSFE for each modelling strategy. Based on 1,000 replications.

            Bias on β (true value = 1)         MSFE (σv² = 0.01)
            GUM      MA R     Lib      Cons    GUM      MA R     Lib      Cons
λ = 0
T 25       -0.001   -0.001    0.000    0.000   0.013    0.011    0.014    0.014
T 50       -0.001   -0.001    0.000   -0.001   0.011    0.010    0.013    0.013
λ = -0.05
T 25       -0.001   -0.081   -0.001   -0.002   0.013    0.017    0.015    0.015
T 50       -0.001   -0.101   -0.001   -0.001   0.011    0.020    0.013    0.013
λ = -0.5
T 25       -0.001   -0.606    0.001    0.000   0.013    0.370    0.014    0.014
T 50       -0.001   -0.409   -0.001   -0.001   0.011    0.177    0.013    0.013
λ = -1
T 25       -0.001   -0.418    0.001    0.000   0.013    0.182    0.014    0.014
T 50       -0.001   -0.020   -0.001   -0.001   0.011    0.011    0.013    0.013
3.2.1. Period Dummy Monte Carlo Simulation
• Bias same across all strategies when dummies irrelevant but in GUM.
• But even slight break (λ = −0.05) gives bias up to 10% of coefficient size in model averaging.
• λ = −0.5 gives horrific bias for model averaging but nothing from selection or GUM.
• Similar story for MSFE; competitive when λ = 0 but quickly deteriorates as λ increases.
• Horrendous MSFE when λ = −0.5 of up to 30 times DGP error variance.
• Competitive MSFE again for larger T when λ = −1 (i.e. γ is massive).
• So predictions borne out for longer dummies.
3.2.2. Period Dummy Monte Carlo Simulation: Generalisation
• Ran experiment again with two regressors in place of constant.
• Principle appears to generalise as again get noticeable bias and worse MSFE. Results in paper.
3.2.3. Intermittent Dummy Monte Carlo Simulation
• Also considered role of intermittent dummies.
• GUM:

    yt = β1X1,t + β2X2,t + γd1,t + δd2,t + ut.

• d1,t is African dummy, d2,t is Latin America dummy (both from Sala-i-Martin 1997a, Sala-i-Martin 1997b).
• As before, d1,t relevant, d2,t irrelevant. X1,t, X2,t both relevant, both mean-zero Normally distributed random numbers.
• β1 = β2 = 1.
3.2.3. Intermittent Dummy Monte Carlo Simulation
Table 5: Bias on β1 and MSFE for each modelling strategy. Based on 1,000 replications.

            Bias on β1 (true value = 1)    MSFE (σv² = 0.01)
            GUM      MA R     MS           GUM      MA R     MS
λ = 0
T 50        0.000    0.109    0.000        0.010    0.067    0.013
T 75        0.000    0.069    0.000        0.010    0.011    0.015
λ = -0.05
T 50        0.000    0.133   -0.001        0.010    0.073    0.013
T 75        0.000    0.084    0.000        0.010    0.012    0.015
λ = -0.5
T 50        0.000    0.212   -0.001        0.010    0.095    0.013
T 75        0.000    0.085    0.000        0.010    0.013    0.015
λ = -1
T 50        0.000    0.123   -0.001        0.010    0.072    0.013
T 75        0.000    0.070    0.000        0.010    0.011    0.015
3.2.3. Intermittent Dummy Monte Carlo Simulation
• Bias on β1, one of parameters of interest, shown.
• Even when dummies irrelevant, bias noticeably stronger on model
averaging (rescaled), especially as T increases.
• Bias greatest when λ = −0.5, considerably more than under any other modelling strategy.
• Bias pretty invariant to λ when T larger (75). Not huge but much
larger than any other strategy.
3.2.4. Lessons from Monte Carlo
• Strong bias and shocking MSFE from simple model averaging
supported, shown to be general across (λ, T ) combinations. Argues
against Buckland et al.’s (1997) idea of model averaging.
3.2.4. Lessons from Monte Carlo
• Strong bias and shocking MSFE from simple model averaging
3.2.4. Lessons from Monte Carlo
• Strong bias and shocking MSFE from simple model averaging are
supported, and shown to be general across (λ, T ) combinations; this
argues against Buckland et al.’s (1997) idea of model averaging.
• Rescaling improves both the bias on β and the MSFE,
• BUT: rescaling increases the size of coefficients on irrelevant variables,
• Further, bias is still strong and MSFE still large when period or
intermittent dummies are present:
– Worrisome for empirical work, e.g. growth regressions.
– Often many dummies are specified:
∗ Doppelhofer et al. (2000): 8 dummies in a 32-variable, 98-country
dataset;
∗ Hoover & Perez (2004): 7 dummies in a 36-variable, 107-country
dataset.
– How biased are regression coefficients, given the inclusion of dummies?
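The rescaling discussed above can be sketched in code. The weights below are illustrative exp(−AIC/2) weights in the spirit of Buckland et al. (1997), not the paper's exact scheme, and the data, coefficient values, and function name are assumptions for demonstration. Un-rescaled averaging shrinks a coefficient towards zero by the total weight on models that exclude the variable; rescaling divides that shrinkage back out:

```python
import numpy as np

def average_beta0(y, X, rescale=False):
    """Average the coefficient on X[:, 0] over all non-empty subsets of
    regressors, using exp(-AIC/2) weights (an illustrative assumption)."""
    n, k = X.shape
    weights, betas, includes = [], [], []
    for mask in range(1, 2 ** k):                 # every non-empty model
        cols = [j for j in range(k) if mask & (1 << j)]
        Xm = X[:, cols]
        b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
        rss = float(np.sum((y - Xm @ b) ** 2))
        aic = n * np.log(rss / n) + 2 * len(cols)
        weights.append(np.exp(-aic / 2.0))
        includes.append(0 in cols)
        # excluded variables contribute a zero coefficient to the average
        betas.append(float(b[cols.index(0)]) if 0 in cols else 0.0)
    w = np.array(weights) / np.sum(weights)
    beta_bar = float(w @ np.array(betas))
    if rescale:
        # divide by the total weight on models that include variable 0
        beta_bar /= float(w[np.array(includes)].sum())
    return beta_bar

# toy data: x0 is relevant (true coefficient 0.5), x1 and x2 are not
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 0.5 * X[:, 0] + rng.normal(size=60)
plain = average_beta0(y, X, rescale=False)
scaled = average_beta0(y, X, rescale=True)
print(plain, scaled)   # |scaled| >= |plain| by construction
```

Because the rescaled estimate is the un-rescaled one divided by an inclusion probability in (0, 1], rescaling always moves the averaged coefficient away from zero, which is why it also inflates coefficients on irrelevant variables.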
3.3. Larger Model
• Attempt to show problems of bias occur in larger models.
• 10 variable dataset with 3 dummies (two relevant).
• DGP:
yt = β3X3,t + β4X4,t + β5X5,t + β6X6,t + β7X7,t + δ1d1,t + δ2d2,t + vt,
vt ∼ N(0, σ²).
(21)
• GUM specified with two irrelevant variables and one irrelevant
dummy:
yt = β1X1,t + β2X2,t + β3X3,t + β4X4,t + β5X5,t + β6X6,t
+ β7X7,t + δ1d1,t + δ2d2,t + δ3d3,t + ut.
(22)
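As a sanity check on this design, a minimal simulation of DGP (21) estimated with GUM (22) shows that OLS on the GUM leaves the irrelevant β2 essentially unbiased, consistent with the near-zero GUM column of Table 6 below. The β and δ values and the impulse-dummy timings here are assumptions, since the slides do not report them:

```python
import numpy as np

rng = np.random.default_rng(1)
T, R = 100, 500
# assumed illustrative parameters: beta1 = beta2 = 0 (irrelevant), d3 irrelevant
beta = np.array([0.0, 0.0, 0.5, 0.4, 0.6, 0.5, 0.4])
delta = np.array([2.0, 2.0, 0.0])

est_b2 = []
for _ in range(R):
    X = rng.normal(size=(T, 7))
    d = np.zeros((T, 3))
    d[T // 2, 0] = 1.0        # d1: impulse dummy at mid-sample (assumption)
    d[3 * T // 4, 1] = 1.0    # d2: impulse dummy (assumption)
    d[T // 4, 2] = 1.0        # d3: irrelevant dummy
    y = X @ beta + d @ delta + rng.normal(size=T)   # DGP (21)
    Z = np.hstack([X, d])                           # GUM regressors (22)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    est_b2.append(b[1])                             # coefficient on X2

print(np.mean(est_b2))   # near zero: OLS on the GUM is unbiased for beta2
```

The bias in the tables below therefore comes from the averaging step, not from estimating the GUM itself.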
3.4. Larger Model Monte Carlo simulation
Table 6: Bias on β2 coefficient for each modelling strategy. Based on
1,000 replications.
          GUM      MA(R)    Lib      Con      True DGP value
λ = 0
T = 50   -0.006   -0.212    0.001    0.002    0
T = 75   -0.007   -0.192   -0.005    0.001    0
T = 100  -0.004   -0.107   -0.020   -0.003    0
λ = -0.5
T = 50   -0.006   -0.107    0.002    0.001    0
T = 75   -0.007   -0.124   -0.001    0.002    0
T = 100  -0.004   -0.049   -0.026   -0.004    0
λ = -1
T = 50   -0.006   -0.033    0.002    0.001    0
T = 75   -0.007   -0.086   -0.001    0.002    0
T = 100  -0.004   -0.014   -0.026   -0.004    0
3.4. Larger Model Monte Carlo simulation
• Bias from rescaling on the insignificant coefficient β2 is decreasing in
T and λ,
• sizeable bias arises if dummies are erroneously specified (λ = 0),
• considerable bias remains even when λ = −0.5, where the dummy is noticeable.
3.4. Larger Model Monte Carlo simulation
Table 7: Bias on β4 coefficient for each modelling strategy. Based on
1,000 replications.
          GUM      MA(R)    Lib      Con      True DGP value
λ = 0
T = 50    0.008   -0.178    0.009    0.014    0.676
T = 75    0.001   -0.089    0.007    0.010    0.516
T = 100   0.000   -0.082    0.005    0.009    0.433
λ = -0.5
T = 50    0.008   -0.244    0.010    0.015    0.676
T = 75    0.001   -0.059    0.007    0.011    0.516
T = 100   0.000   -0.079    0.005    0.008    0.433
λ = -1
T = 50    0.008   -0.292    0.010    0.015    0.676
T = 75    0.001   -0.040    0.007    0.011    0.516
T = 100   0.000   -0.076    0.005    0.008    0.433
3.4. Larger Model Monte Carlo simulation
• Bias on the significant β4 coefficient is around 20% of the true
coefficient (final column), regardless of T .
• When the dummy is relevant, bias is greater as a percentage of the true
coefficient at T = 100 than at T = 75.
• Large bias for small sample sizes, as the T = 50 rows suggest.
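These percentage claims can be checked directly against the λ = 0 rows of Table 7:

```python
# bias on beta4 under model averaging (Table 7, lambda = 0 rows),
# expressed as a percentage of the true DGP value
bias = {50: -0.178, 75: -0.089, 100: -0.082}
true_value = {50: 0.676, 75: 0.516, 100: 0.433}
pct = {T: round(100 * bias[T] / true_value[T], 1) for T in bias}
print(pct)   # {50: -26.3, 75: -17.2, 100: -18.9}
```

The T = 50 figure is noticeably larger than 20 per cent, consistent with the small-sample point above.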
3.4.1. Effect on other regressors and lessons from Monte Carlo
• For β5 and β6, strongly significant parameters, there is bias in small
samples (increasing in λ), but small bias when T is large.
• Coefficients on dummies are strongly biased, invariant to λ, with large
bias even when T = 100.
• This suggests the predictions of the earlier small models generalise to
larger models; worrisome for empirical work using BMA.
• MSFE from model averaging is competitive here. But the period- and
intermittent-dummy Monte Carlos showed that forecasting suffers when
structural breaks exist; this does generalise.
4. Conclusions
• Small models with outliers and structural breaks ⇒ bias and poor
forecasting in averaged models.
• Bias and bad forecasting arise:
– because, regardless of rescaling, too much emphasis is placed on bad
models and no selection is made;
– even if the practitioner has noted the breaks/outliers and accounted
for them in the dataset;
– even in ‘clean’ datasets where DGP ∈ GUM, with no collinearity,
heteroskedasticity, or structural breaks in unmodelled variables.
4. Conclusions
We suggest these results:
• Refute Buckland et al.’s (1997) un-rescaled averaging,
• Call into question Raftery et al.’s (1997) forecasting results in
non-stationary datasets,
• ⇒ bias problems in the work of, amongst others, Fernandez et al. (2001)
and Doppelhofer et al. (2000).
Model selection is shown to be an effective alternative to model averaging,
and has been tested in other difficult modelling contexts (see e.g. Hoover &
Perez 2004, Castle 2004).
References
Buckland, S.T., K.P. Burnham & N.H. Augustin (1997), ‘Model selection: An integral part of inference’, Biometrics
53, 603–618.
Castle, J. (2004), Evaluating PcGets and RETINA as automatic model selection algorithms. Unpublished paper,
Economics Department, Oxford University.
Doornik, Jurgen A, David F Hendry & Bent Nielsen (1998), ‘Inference in cointegrating models: UK M1 revisited’,
Journal of Economic Surveys 12(5), 533–72.
Doppelhofer, Gernot, Ronald I. Miller & Xavier Sala-i-Martin (2000), Determinants of long-term growth: A Bayesian
Averaging of Classical Estimates (BACE) approach, Technical report, National Bureau of Economic Research,
Inc.
Eklund, J. & S. Karlsson (2004), Forecast combination and model averaging using predictive measures. Unpublished
paper, Stockholm School of Economics.
Fernandez, C., E. Ley & M.F.J. Steel (2001), ‘Model uncertainty in cross-country growth regressions’, Journal of Applied
Econometrics 16(5), 563–576.
Hendry, David F. (2001), ‘Modelling UK inflation, 1875-1991’, Journal of Applied Econometrics 16(3), 255–275.
Hendry, David F. & Hans-Martin Krolzig (2005), ‘The properties of automatic Gets modelling’, The Economic
Journal 115(502), C32–C61.
Hendry, D.F. (1995), Dynamic Econometrics, Oxford University Press, Oxford.
Hendry, D.F. & M.P. Clements (2004), ‘Pooling of forecasts’, Econometrics Journal 7, 1–31.
Hoover, K.D. & S.J. Perez (1999), ‘Data mining reconsidered: Encompassing and the general-to-specific approach to
specification search’, Econometrics Journal 2, 167–191.
Hoover, Kevin D. & Stephen J. Perez (2004), ‘Truth and robustness in cross-country growth regressions’, Oxford
Bulletin of Economics and Statistics 66(5), 765–798.
Koop, Gary & Simon Potter (2003), Forecasting in large macroeconomic panels using Bayesian Model Averaging,
Staff Report 163, Federal Reserve Bank of New York.
Perez-Amaral, Teodosio, Giampiero M. Gallo & Halbert White (2003), ‘A flexible tool for model building: the Relevant Transformation of the Inputs Network Approach (RETINA)’, Oxford Bulletin of Economics and Statistics
65(s1), 821–838.
Raftery, A.E., D. Madigan & J.A. Hoeting (1997), ‘Bayesian model averaging for linear regression models’, Journal
of the American Statistical Association 92(437), 179–191.
Sala-i-Martin, Xavier X. (1997a), ‘I just ran two million regressions’, American Economic Review 87(2), 178–83.
Sala-i-Martin, Xavier X. (1997b), I just ran four million regressions, Technical report, National Bureau of Economic
Research, Inc.