Econometrics II, Summer 2004, CERGE-EI, Daniel Munich
Instructor’s notes

Applied econometrics (based on PK, Ch. 1, 21)

1) Why?
  a) Empirical theses are more common at CERGE-EI
  b) With an M.A. only, econometrics is used in business
  c) Empirical econometrics has expanded thanks to technological progress in recent decades: PCs, computing speed, memory, survey and population databases
  d) Hunt for causal, not only statistical, relationships
2) Empirical econometrics
  a) Theory
    i) ≠ empirical
    ii) Y = f(X): the deterministic model of an economist
    iii) Technique-oriented, not problem-oriented
    iv) Standard solutions to standard problems
  b) Empirical
    i) Y = f(X) + ε: the stochastic model of an econometrician
    ii) Econometrics is much easier without data
    iii) Attempts to overcome problems of imperfect data using standard solutions
      - Errors, mistakes, definitions
      - Endogeneity
      - Lack of controlled experiments
    iv) Why an error term?
      - Omission of non-systematic factors
      - Omission of systematic influences
      - Measurement error
    v) Randomness of human behavior
3) General principles of empirical econometrics
  a) Use common sense and/or economic theory
    i) Match like with like: rate vs. rate, real vs. real, trend vs. trend, per capita vs. per capita
    ii) Correlation ≠ causality
  b) Know the context
    i) History, institutions, data gathering, instructions, variable definitions, preliminary data cleaning, rounding, etc.
  c) Inspect the data
    i) Skipping this step wastes time later in the research
    ii) Summary statistics
      - Format types
      - Summary statistics: number of positive, negative, zero and missing values; min/max
      - Graphing: scatter plots, histograms, trends (technology!)
      - Causes of missing data: rejection, wrong coding, top-coding
  d) Keep it simple
    i) Bottom-up, or specific-to-general, approach
      - Testing is biased if the model is not complete
    ii) Top-down, or general-to-specific, approach
      - Less biased
      - But the number of possible variables and functional forms is infinite, while the data are limited
    iii) Compromise
      - Expand a simple model whenever it fails (misspecification test)
  e) Results
    i) Expected signs of coefficients; a wrong sign may come from
      - An omitted variable negatively correlated with an included variable
      - Multicollinearity -> high variance -> estimates may take negative values
      - Endogeneity bias, e.g., active labor market policy (ALMP): U = f(-M) or M = g(+U)?
      - Selection bias, e.g., the impact of retraining on earnings (self-selection)
      - Outliers
      - Lack of identification (movement along the demand or the supply curve?)
    ii) Have a plausible interpretation of unexpected results
    iii) Are the important variables significant?
    iv) Magnitude of coefficients
    v) Sensitivity to
      - Functional form
      - Included variables
      - Sample/period
  f) Data mining: pros and cons
    i) Bad side: tailoring the specification to get the desired results
    ii) Good side: discovering regularities that inform economic theory
4) Practical hints (we will learn)
  a) Logging variables
    i) Use logs if a percentage change makes more sense
    ii) Logging variables can eliminate heteroskedasticity
    iii) Zero (or negative) observations cannot be logged
    iv) Remember that reading a dummy's coefficient as a percentage effect on a logged dependent variable is only an approximation
  b) Recognize the trade-off between bias and efficiency
  c) Multicollinearity
    i) Does not create bias
    ii) A solution requires more information
  d) Remember that heteroskedasticity-consistent estimators give the same coefficients as OLS; only the variance-covariance matrix and the standard errors differ.
  e) Do not forget to consider interactions of variables.
  f) Do not use a linear form if the dependent variable measures fractions; it is acceptable only if the fractions are far enough from 0 (and 1).
  g) Use ordered explanatory variables (schooling level, number of children) carefully.
  h) Do not forget about possible endogeneity.
  i) Unbiasedness is not sacred.
     Some bias can buy efficiency!
  j) Be able to predict the direction of measurement-error bias
  k) R²
    i) Has no meaning if the intercept is omitted
    ii) Adjusted R² is much better than R² for comparing models, because it penalizes additional regressors
  l) Dropping observations
    i) Inspect outliers before omitting them
    ii) Try to understand missing data
      - Selection problem
      - Omit or predict?
  m) Pre-testing should be done at a higher significance level: 25% instead of 5%
  n) Always check sensitivity
  o) Reporting
    i) Admit problems and deficiencies (and learn from them)
    ii) Failure to confirm a theory or to find a significant effect is also a valuable finding (there is a selection bias toward significant results)
    iii) Make plausible assumptions to overcome problems
    iv) Keep output finite and focused
  p) Practice
    i) Programs
      - Fine-tune at the command line, save the steps in a DO file
      - Number versions
      - Store .dta files, DO files and logs separately
      - Comment your DO files
      - Understand commands; test them on simple data
    ii) Check, check, check the outcomes of all your steps
    iii) Give variables meaningful names
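Point 3c) asks for summary statistics before any estimation. Below is a minimal Python sketch, assuming a variable arrives as a list in which `None` marks a missing value; the function name `inspect_variable` and the earnings data are hypothetical. The `at_max` count is only a heuristic hint of top-coding, not a test.

```python
from typing import Optional, Sequence


def inspect_variable(values: Sequence[Optional[float]]) -> dict:
    """Counts and extremes for one variable; None marks a missing value."""
    present = [v for v in values if v is not None]
    stats = {
        "n": len(values),
        "missing": len(values) - len(present),
        "positive": sum(1 for v in present if v > 0),
        "negative": sum(1 for v in present if v < 0),
        "zero": sum(1 for v in present if v == 0),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    # Heuristic: a pile of observations exactly at the maximum can signal
    # top-coding (values above a cap recorded as the cap itself).
    stats["at_max"] = sum(1 for v in present if v == stats["max"])
    return stats


# Hypothetical earnings: two missing entries, a suspicious -1 (possibly a
# wrongly coded missing value) and three values stuck at a cap of 50000.
earnings = [12000, 0, 18500, None, 50000, 50000, 50000, -1, None, 23000]
print(inspect_variable(earnings))
```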
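Point 3e)i) lists an omitted variable correlated with an included one as a source of unexpected signs. If y = β0 + β1·x1 + β2·x2 but y is regressed on x1 alone, the estimated slope equals β1 + β2·δ, where δ is the slope from regressing the omitted x2 on x1 (in sample this is an exact identity with the long-regression estimates). The sketch below uses hypothetical, error-free data in which x2 falls as x1 rises, so a positive β1 turns into a negative estimate; the helper `slope` is illustrative.

```python
from statistics import mean


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: x2 is negatively correlated with x1.
x1 = [1, 2, 3, 4, 5]
x2 = [10, 8, 7, 4, 1]
beta1, beta2 = 2.0, 3.0  # true effects, both positive
# No error term, so the long regression would recover beta1 and beta2 exactly.
y = [1 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

delta = slope(x1, x2)     # auxiliary regression of the omitted x2 on x1
b1_short = slope(x1, y)   # x2 omitted from the regression
print(delta, b1_short, beta1 + beta2 * delta)
```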
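For point 4a)iv): in ln(y) = ... + b·D, switching the dummy D from 0 to 1 changes y by exactly 100·(e^b − 1) percent, so reading 100·b as the percentage effect works only for small |b|. A minimal sketch; the function name is hypothetical, and it transforms the point estimate only, ignoring its sampling variance.

```python
import math


def dummy_effect_pct(b: float) -> float:
    """Exact % change in y when dummy D goes 0 -> 1 in ln(y) = ... + b*D."""
    return 100.0 * (math.exp(b) - 1.0)


for b in (0.05, 0.5, -0.5):
    # 100*b is the usual approximation; it deteriorates as |b| grows.
    print(f"b = {b:+.2f}: approx {100 * b:+.1f}%, exact {dummy_effect_pct(b):+.1f}%")
```

For b = 0.5 the approximation gives 50%, while the exact change is about 64.9%.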
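Points 3e)i) and 4c) can be made concrete with the variance formula Var(b_j) = σ² / [SST_j (1 − R_j²)], where R_j² comes from regressing x_j on the other regressors. Multicollinearity leaves b_j unbiased but multiplies its variance by the variance inflation factor 1/(1 − R_j²). With an intercept and only two regressors, R_j² is simply their squared correlation, which this sketch with hypothetical data relies on:

```python
import math
from statistics import mean


def corr(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """Variance inflation factor 1 / (1 - R_j^2); with two regressors,
    R_j^2 from regressing one on the other equals their squared correlation."""
    return 1.0 / (1.0 - corr(x1, x2) ** 2)


x1 = [1, 2, 3, 4, 5]
x2_close = [1.1, 1.9, 3.2, 3.9, 5.1]  # hypothetical, nearly collinear with x1
x2_orth = [1, -1, 0, -1, 1]           # hypothetical, uncorrelated with x1
print(vif_two_regressors(x1, x2_close), vif_two_regressors(x1, x2_orth))
```

The nearly collinear pair inflates the variance of each slope by a factor of roughly 140; no estimator tweak removes this, which is why a solution requires more information.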
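Point 4d) in code: a White (HC0) heteroskedasticity-consistent standard error reuses the OLS slope and changes only the variance estimate. For a simple regression, the HC0 variance of the slope is Σ(x_i − x̄)² e_i² / [Σ(x_i − x̄)²]². A standard-library sketch with hypothetical data whose errors are larger for x ≥ 5; statistical packages often apply small-sample variants (HC1 to HC3), so their numbers may differ slightly.

```python
import math


def ols_with_two_ses(x, y):
    """Simple regression y = a + b*x: OLS slope with classical and HC0 standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical SE: one pooled error variance s^2 for all observations.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # White/HC0 SE: each squared residual enters with its own weight (x_i - mean)^2.
    se_hc0 = math.sqrt(sum((xi - mx) ** 2 * e ** 2 for xi, e in zip(x, resid))) / sxx
    return b, se_classical, se_hc0


# Hypothetical data: y = 1 + 2x + error, errors of size 0.1 for x <= 4 and 2 for x >= 5.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 6.9, 9.1, 13.0, 11.0, 13.0, 19.0]
b, se_classical, se_hc0 = ols_with_two_ses(x, y)
print(f"slope {b:.3f}, classical SE {se_classical:.4f}, HC0 SE {se_hc0:.4f}")
```

The slope is 2 under both; only the standard errors differ (about 0.252 versus 0.218 here).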
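Point 4f) in a sketch: with hypothetical shares that rise toward 1, a linear fit predicts a "share" above 1 outside the sample range. One common alternative, chosen here only for illustration, is to regress the log-odds ln(p/(1 − p)) and transform predictions back, which keeps them inside (0, 1); like logging (point 4a)iii)), it fails for shares of exactly 0 or 1. All helper names are hypothetical.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Intercept and slope of the OLS line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def logit(p):
    """Log-odds; defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical shares, strictly between 0 and 1, rising with x.
x = list(range(1, 11))
share = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90, 0.95, 0.97]

a_lin, b_lin = fit_line(x, share)
a_log, b_log = fit_line(x, [logit(p) for p in share])

x_new = 15
pred_linear = a_lin + b_lin * x_new            # can leave [0, 1]
pred_logit = inv_logit(a_log + b_log * x_new)  # stays inside (0, 1)
print(pred_linear, pred_logit)
```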
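For point 4j): classical measurement error in a regressor (x = x* + u, with u independent of x* and of the equation error) biases the OLS slope toward zero by the factor λ = Var(x*) / [Var(x*) + Var(u)], whereas classical error in the dependent variable only inflates the standard errors. A simulation sketch under these assumptions, with Var(x*) = Var(u) = 1 so that λ = 0.5; the seed and sample size are arbitrary.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(12345)  # arbitrary seed, for reproducibility
n, beta = 50_000, 2.0
x_true = [random.gauss(0.0, 1.0) for _ in range(n)]        # Var(x*) = 1
y = [beta * xs + random.gauss(0.0, 1.0) for xs in x_true]  # equation error
x_obs = [xs + random.gauss(0.0, 1.0) for xs in x_true]     # measurement error, Var(u) = 1

lam = 1.0 / (1.0 + 1.0)     # Var(x*) / (Var(x*) + Var(u))
b_clean = slope(x_true, y)  # close to beta = 2
b_noisy = slope(x_obs, y)   # close to beta * lam = 1: biased toward zero
print(b_clean, b_noisy, beta * lam)
```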
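Point 4k) in a sketch: when the intercept is omitted, the usual R² = 1 − SSR/SST can be negative, while the uncentered version (1 − SSR/Σy²), often reported when the constant is suppressed, can look respectable for a model that fits badly. Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with k regressors besides the intercept, penalizes each additional regressor. Data and function names are hypothetical.

```python
def fit_through_origin(x, y):
    """OLS slope of y = b*x with the intercept deliberately omitted."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def centered_r2(y, fitted):
    """1 - SSR/SST, with SST measured around the mean of y."""
    my = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ssr / sst


def uncentered_r2(y, fitted):
    """1 - SSR/sum(y^2): measured around zero instead of the mean."""
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ssr / sum(yi ** 2 for yi in y)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors besides the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


x = [1, 2, 3, 4]
y = [10, 9, 11, 10]  # hypothetical: y hovers around 10 with no trend in x
b = fit_through_origin(x, y)
fitted = [b * xi for xi in x]
print(centered_r2(y, fitted), uncentered_r2(y, fitted))
print(adjusted_r2(0.5, n=21, k=5))
```

Here the centered R² is about −30 and the uncentered one about 0.85, for a line forced through the origin although the data show no trend at all.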