Panel Data Analysis
Stefan Trappl, Constanze Fay

Schularick, Taylor (2012)
• Does credit growth predict financial crises?
• Analysis of macroeconomic panel data for developed countries, 1870-2008. Two time periods (pre- and post-WW2); 79 major banking crises in 14 countries.
• Dependent variable: "financial crisis" from Bordo et al. (2001) and Reinhart, Rogoff (2009); independent variables: lagged credit and money supply, loans and bank assets, inflation, investment, GDP
Trappl, Fay 11/11/2014 2

Our approach:
• Does income inequality predict financial crises?
• We use the Schularick/Taylor dataset, reduced to 8 countries because of the limited availability of income-inequality data
• Dependent binary variable: "financial crisis (0/1)"; independent variables: lagged credit and money supply, loans, investment, personal income inequality (measured by the "Top 1% income share")
• Dataset by Thomas Piketty: Capital in the 21st Century

Schularick Taylor - Model
• Logistic regression estimating the probability of a crisis based on credit growth in previous periods
[Model equation: probability of crisis as a function of lagged credit growth and a control variable]
• OLS and logit models with country and year fixed effects

Our Model:
• Generalized linear mixed-effects regression estimating the probability of a crisis based on income inequality in previous periods
[Model equation: probability of crisis as a function of fixed-effects terms, random-effects terms and an error term]
• GLMM; country = grouping factor

XLConnect package
• Java-based; used to import Excel files and to read and write Excel worksheets from within R
• Alternative: the RODBC package, whose Excel connection only works in the 32-bit version of R (switch from 64-bit to 32-bit under "Tools/Global Options" in RStudio)
• The "incomplete final line" warning that read.table raises when creating data frames from Excel or .csv files can be worked around by loading the data through the JGR console (File/Load data)
• Load Excel sheets in R with either the loadWorkbook or the readWorksheetFromFile function; always save the workbook (saveWorkbook) so that your changes are actually written to the file!

Panel data analysis packages in R
• Paneldata: linear models for panel data
• pdR: panel data regression
• pglm: panel generalized linear models
• phtt: panel data analysis with heterogeneous time trends
• plm: linear models for panel data
• lme4, nlme: maximum likelihood estimation of mixed-effects models for panels
OLS does not account for heterogeneity across units or over time.

Data preparation
• By default, the first two columns of the panel data must be (1) the unit and (2) the time period (the most granular level)
• The pdata.frame function in plm prepares data frames for panel data analysis. Its "index" argument tells it which columns identify the unit and the time period:
  - NULL (default): observations are identified by the individual (column 1) and then the time period (column 2)
  - a number: the number of units in a balanced panel
  - a character vector naming the individual and time columns, e.g. c("state", "year")

Models in the plm package
• Heterogeneity across units is captured by two error components: an individual component that does not change over time, and an idiosyncratic component assumed to be well behaved and iid.
• Errors uncorrelated with the regressors and white noise? Yes: OLS pooling model. No: ask whether the errors are uncorrelated with the regressors.
• Errors uncorrelated with the regressors? Yes: random effects model ("random"). No: fixed effects model ("within"); if the errors are persistent, first-differencing model.

Models in plm
• plm model objects are estimated on transformed data: time-demeaned for the fixed effects ("within") model, quasi-time-demeaned for the random effects model, and not demeaned for the pooling (OLS) model

Types of models
• Pooling model ("pooling"): OLS on the pooled panel; the time-series component is not considered
• Fixed effects model ("within", equivalent to unit dummy variables): based on deviations from the individual means
• First-differences model ("fd"): removes time-invariant individual error components by first-differencing; preferred when the idiosyncratic errors are persistent (strongly serially correlated) over time
• Random effects model ("random"): the individual error component is assumed to be uncorrelated with the regressors; if this holds, it is more efficient than fixed effects
• "Between" model: based on the time averages of each unit (group means), which discards within-group variability but suits non-stationary data; used for estimating long-run relationships
• Variable coefficient models assume that coefficients vary around an average
• FGLS is used when errors are heteroscedastic and autocorrelated; with fixed effects, fixed effects FGLS

Functions
• plm: within, between and random effects models
• pvcm: models with variable coefficients
• pggls: FGLS
• pgmm: GMM
• General call: plm(formula, data, index, effect, model)
• effect: individual or time effects; if there are time effects, use the gls function in the nlme package (see John Fox's appendix on time-series regression)

Which model to choose?
1. F-test ("poolability" test): compares the model estimated on the full sample with models based on a separate equation for each unit. H0: OLS is the appropriate model, i.e. there are no fixed effects, the units are sufficiently homogeneous and the coefficients are the same for all units. Functions: pooltest(plm model, pvcm model with model = "within") or pFtest. A significant F-statistic rejects H0, implying that there are fixed effects.
2. Test for individual or time effects: plmtest(plm model, type, effect); type: Lagrange multiplier tests ("bp", "honda", "kw", "ghm"); effect: "individual", "time" or "twoways"
3. Choosing between fixed and random effects: Hausman-type test comparing the two estimators under the null of no significant difference between them (the random effects model is more efficient): phtest(plm "within", plm "random"). Assume random effects if n is large relative to T, so that the individual effects can be viewed as random.
4. Test for serial correlation of the error term: individual effects always induce serial correlation of the composite error; in addition, the idiosyncratic error may follow the usual AR(1) process. Because these tests have power against each other, joint tests are needed; a rejection, however, does not tell you which cause is responsible. plm offers several joint, marginal and conditional tests; they become problematic if the errors are not normal and homoscedastic. In short panels with a large number of observations, serial correlation is not a problem because, given the many observations, the error correlations appear random. This does not hold for long time-series macro panels.
5. Further diagnostics and screening tests; for dynamic models and when regressors lack exogeneity: GMM

Panel analysis functions
• Command: ls("package:plm") lists the functions:
"between", "Between", "cipstest", "dynformula", "ercomp", "fixef", "has.intercept", "index", "mtest", "pbgtest", "pbltest", "pbsytest", "pcce", "pcdtest", "pdata.frame", "pdim", "pdwtest", "pFormula", "pFtest", "pggls", "pgmm", "pht", "phtest", "plm", "plm.data", "plmtest", "pmg", "pmodel.response", "pooltest", "purtest", "pvar", "pvcm", "pvcovHC", "pwartest", "pwfdtest", "pwtest", "r.squared", "sargan", "vcovBK", "vcovHC", "vcovSCC", "Within"
• plm: function (formula, data, subset, na.action, effect = c("individual", "time", "twoways"), model = c("within", "random", "ht", "between", "pooling", "fd"), random.method = c("swar", "walhus", "amemiya", "nerlove", "kinla"), inst.method = c("bvk", "baltagi"), restrict.matrix = NULL, restrict.rhs = NULL, index = NULL, ...)
• pdata.frame: function (x, index = NULL, drop.index = FALSE, row.names = TRUE)
• Exploratory data analysis: use "|" in the formula of the scatterplot function (car package) to take both the unit and the year dimension into account

Literature
• Croissant, Y., Millo, G.: Panel Data Econometrics in R: The plm Package. http://cran.r-project.org/web/packages/plm/vignettes/plm.pdf
• Schularick, M., Taylor, A. M. (2012): Credit Booms Gone Bust: Monetary Policy, Leverage Cycles, and Financial Crises, 1870-2008. American Economic Review, 102(2): 1029-61.
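To make the "Types of models" slide concrete, the sketch below computes the pooling, within, first-differences and between slopes by hand in plain Python (standard library only); it is not plm's code. The panel is hypothetical and noise-free: y_it = a_i + 2 * x_it, with unit effects a_i deliberately correlated with x. Demeaning ("within") and differencing ("fd") remove a_i and recover the slope 2 exactly, while pooled OLS and the between estimator absorb the unit effects and are biased upwards.

```python
from statistics import mean

BETA = 2.0  # true slope of the hypothetical data-generating process

# Hypothetical balanced panel: 3 units, 4 periods; a_i grows with the level of x
units = {
    "A": {"a": 0.0, "x": [1.0, 2.0, 3.0, 4.0]},
    "B": {"a": 5.0, "x": [3.0, 4.0, 6.0, 7.0]},
    "C": {"a": 10.0, "x": [6.0, 8.0, 9.0, 11.0]},
}
# unit -> (x series, y series) with y_it = a_i + BETA * x_it (no noise)
panel = {u: (d["x"], [d["a"] + BETA * x for x in d["x"]]) for u, d in units.items()}


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pooled(panel):
    """Pooling: stack all observations and ignore the unit dimension."""
    xs = [x for xv, _ in panel.values() for x in xv]
    ys = [y for _, yv in panel.values() for y in yv]
    return ols_slope(xs, ys)


def within(panel):
    """Within: subtract each unit's time mean, then regress without intercept."""
    num = den = 0.0
    for xv, yv in panel.values():
        mx, my = mean(xv), mean(yv)
        num += sum((x - mx) * (y - my) for x, y in zip(xv, yv))
        den += sum((x - mx) ** 2 for x in xv)
    return num / den


def first_diff(panel):
    """First differences: regress dy on dx (no intercept); a_i cancels out."""
    dx = [xv[t] - xv[t - 1] for xv, _ in panel.values() for t in range(1, len(xv))]
    dy = [yv[t] - yv[t - 1] for _, yv in panel.values() for t in range(1, len(yv))]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


def between(panel):
    """Between: OLS on the unit means only (one observation per unit)."""
    return ols_slope([mean(xv) for xv, _ in panel.values()],
                     [mean(yv) for _, yv in panel.values()])


for name, f in [("pooling", pooled), ("within", within), ("fd", first_diff), ("between", between)]:
    print(f"{name:8s} slope = {f(panel):.3f}")
```

With real data the within and fd estimates would differ from each other through the noise; in plm the same choice is made via the model argument, e.g. model = "within" or model = "fd".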
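The "Models in plm" slide says the random effects model works on quasi-time-demeaned data. For a balanced panel, the standard weight is theta = 1 - sqrt(sigma2_e / (sigma2_e + T * sigma2_u)), and each variable is transformed as v_it - theta * mean_i(v). The sketch below assumes the two variance components are already known; plm estimates them (e.g. random.method = "swar"), and the input values here are hypothetical. It shows why random effects lie between pooling (theta = 0) and within (theta close to 1).

```python
import math


def re_theta(sigma2_idios, sigma2_indiv, T):
    """Quasi-demeaning weight of the random effects model (balanced panel).

    sigma2_idios: variance of the idiosyncratic error
    sigma2_indiv: variance of the individual (unit) error component
    T: number of time periods per unit
    """
    return 1.0 - math.sqrt(sigma2_idios / (sigma2_idios + T * sigma2_indiv))


def quasi_demean(series, theta):
    """Subtract theta times the unit's time mean from one unit's series."""
    m = sum(series) / len(series)
    return [v - theta * m for v in series]


print(re_theta(1.0, 0.0, 5))    # no individual variance: theta = 0, i.e. pooling
print(re_theta(1.0, 1.0, 3))    # 1 - sqrt(1/4) = 0.5
print(re_theta(1.0, 1.0, 400))  # long panel: theta close to 1, i.e. near within
print(quasi_demean([2.0, 4.0, 6.0], 0.5))
```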
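Step 3 of "Which model to choose?" uses a Hausman-type test (phtest in plm). For a single coefficient the statistic reduces to H = (b_fe - b_re)^2 / (Var(b_fe) - Var(b_re)), which is chi-square with 1 degree of freedom under H0; its p-value is erfc(sqrt(H / 2)). The sketch below uses hypothetical estimates; phtest itself handles the multi-coefficient (matrix) case.

```python
import math


def hausman_1coef(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df).

    Under H0 both estimators are consistent and random effects is efficient,
    so Var(b_fe) - Var(b_re) should be positive.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("Var(b_fe) - Var(b_re) must be positive")
    h = (b_fe - b_re) ** 2 / diff_var
    p = math.erfc(math.sqrt(h / 2.0))  # upper tail of chi-square(1)
    return h, p


h, p = hausman_1coef(b_fe=0.5, b_re=0.3, var_fe=0.02, var_re=0.01)
print(f"H = {h:.2f}, p = {p:.4f}")  # p < 0.05: reject H0, prefer fixed effects
```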
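Step 2 lists plmtest with type = "bp". The Breusch-Pagan LM statistic for individual effects in a balanced panel is computed from pooled OLS residuals e_it as LM = nT / (2(T - 1)) * (sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1)^2, chi-square(1) under H0 of no individual effects. The sketch below takes hypothetical residuals as input; it illustrates the formula and is not plm's implementation.

```python
def breusch_pagan_lm(residuals):
    """Breusch-Pagan LM statistic for individual effects.

    residuals: dict unit -> list of pooled OLS residuals (balanced panel).
    Large values indicate that residuals cluster by unit.
    """
    n = len(residuals)
    T = len(next(iter(residuals.values())))
    if T < 2 or any(len(e) != T for e in residuals.values()):
        raise ValueError("balanced panel with T >= 2 required")
    sum_sq_of_sums = sum(sum(e) ** 2 for e in residuals.values())
    sum_sq = sum(v * v for e in residuals.values() for v in e)
    return n * T / (2 * (T - 1)) * (sum_sq_of_sums / sum_sq - 1) ** 2


# Hypothetical residuals that are persistently positive for A and negative for B
print(breusch_pagan_lm({"A": [1.0, 1.0, 1.0], "B": [-1.0, -1.0, -1.0]}))
```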
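Both models use lagged regressors (lagged credit growth, lagged income inequality), and the "Data preparation" slide is why the unit/time index matters: a lag has to be taken within each unit, or the first year of one country silently receives the last value of the previous country. In R, lagging a variable of a pdata.frame respects the index. The sketch below reproduces the idea in Python; the column names and values are hypothetical, and it assumes consecutive years without gaps.

```python
def panel_lag(rows, unit_key, time_key, var, k=1):
    """Add a k-period lag of `var`, computed separately within each unit.

    rows: list of dicts; the result is sorted by unit, then time.
    The first k observations of each unit get None instead of another unit's values.
    Assumes consecutive time periods (no gaps) within each unit.
    """
    ordered = sorted(rows, key=lambda r: (r[unit_key], r[time_key]))
    history = {}  # unit -> values of `var` seen so far, in time order
    out = []
    for r in ordered:
        past = history.setdefault(r[unit_key], [])
        new = dict(r)
        new[f"{var}_lag{k}"] = past[-k] if len(past) >= k else None
        past.append(r[var])
        out.append(new)
    return out


# Hypothetical credit growth data, deliberately unsorted
rows = [
    {"country": "DE", "year": 1871, "credit": 0.05},
    {"country": "DE", "year": 1870, "credit": 0.02},
    {"country": "US", "year": 1870, "credit": 0.10},
    {"country": "US", "year": 1871, "credit": 0.07},
]
for r in panel_lag(rows, "country", "year", "credit"):
    print(r)
```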
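Both the Schularick/Taylor logit and the GLMM on the "Our Model" slide map a linear predictor to a crisis probability through the inverse logit. In a random-intercept logit with country as the grouping factor, each country shifts the intercept by its own random effect u_country. The sketch below only evaluates that formula for hypothetical coefficients; it does not estimate anything (in R, models of this kind are commonly fitted with lme4's glmer and family = binomial).

```python
import math


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def glmm_probability(beta0, beta1, x_lag, u_country=0.0):
    """P(crisis) for logit(p) = beta0 + u_country + beta1 * x_lag (hypothetical model)."""
    return inv_logit(beta0 + u_country + beta1 * x_lag)


# Hypothetical coefficients: a country with u = 0.5 is more crisis-prone at the same x_lag
print(glmm_probability(-3.0, 5.0, 0.2, u_country=0.0))
print(glmm_probability(-3.0, 5.0, 0.2, u_country=0.5))
```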