Econometrics
Session 1 – Introduction
Amine Ouazad, Asst. Prof. of Economics

PRELIMINARIES

Introduction
• Who I am
• Trade-offs
• Textbook
• Grading
• Homework
• Implementation
Session 1
• The two econometric problems
• Randomization as the Golden Benchmark
Outline of the Course

Who I am
• Applied empirical economist.
• Work on urban economics, economics of education, applied econometrics in accounting.
• Emphasis on the identification of causal effects.
• Careful empirical work: clean data work, correct identification of causal effects.
• Large datasets:
  – 100+ million observations, administrative datasets, geographic information software.
• Implementation of econometric procedures in Stata/Mata.

Trade-offs
• The classroom is heterogeneous.
  – In tastes, mathematics level, needs, prior knowledge.
• Different fields have different habits.
  – E.g. “endogeneity” is not an issue, or not the same issue, in OB, Finance, Strategy, or TOM.
• Conclusion:
  – The course provides a particular spin on econometrics, with mathematics when needed and with applications.
• This is a difficult course, even for students with a prior course in econometrics.

Textbooks
• *William H. Greene, Econometric Analysis, 6th edition.
• Jeffrey Wooldridge, Econometric Analysis of Cross Section and Panel Data.
• Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics.
• A. Colin Cameron and Pravin K. Trivedi, Microeconometrics Using Stata.

Prerequisites
• I assume you know:
  – Statistics
    • Random variables.
    • Moments of random variables (mean, variance, skewness, kurtosis).
    • Probabilities.
  – Real analysis
    • Integrals and derivatives of functions.
    • Limit of a function at a point x or at infinity.
  – Matrix algebra
    • Inverse, multiplication, projections.

Grading
• Exam: 60%
• Participation: 10%
• Homework: 30%
  – One problem set between Econometrics A and B.

Implementation
• Stata version 12.
  – License for PhD students. Ask IT. 5555 or Alina Jacquet.
  – Interactive mode, Do files, Mata programming.
  – Compulsory for this course.
• MATLAB, not for everybody.
  – For coding econometric procedures yourself, e.g. GMM.

Outline for Session 1
Introduction
1. Correlation and Causation
2. The Two Econometric Problems
3. Treatment Effects

1. CORRELATION AND CAUSATION

1. The perils of confounding correlation and causation
• How can we boost children’s reading scores?
  – Shoe size is correlated with IQ.
• Women earn less than men.
  – Sign of discrimination?
• Health is negatively correlated with the number of days spent in hospital.
  – Do hospitals kill patients?

Potential outcomes framework
• A.k.a. the “Rubin causal model”.
• Outcome with the treatment: Y(1). Outcome without the treatment: Y(0).
• Treatment status: D = 0 or 1.
• FUNDAMENTAL PROBLEM OF ECONOMETRICS: for each subject, only one of Y(1) and Y(0) is observed, never both. Equivalently, only Y = Y(1) D + Y(0) (1 − D) is observed.
• What would have happened if a given subject had received a different treatment?

Naïve estimator of the treatment effect
• Δ = E(Y | D=1) − E(Y | D=0).
• Does that identify any relevant parameter?
• Notice that:
  – Δ = E(Y | D=1) − E(Y | D=0) = E(Y(1) | D=1) − E(Y(0) | D=0).
• What are we looking for?

Ignorable Treatment (Rubin 1983)
• Assume (Y(1), Y(0)) ⊥ D, i.e. the potential outcomes are independent of treatment status.
• Then E(Y(0) | D=1) = E(Y(0) | D=0) = E(Y(0)).
• Similarly for Y(1).
• Then: Δ = E(Y(1) | D=1) − E(Y(0) | D=0) = E(Y(1)) − E(Y(0)) = E(Y(1) − Y(0)).

Another Interpretation
• Assume Y(D) = a + bD + e.
• e is the “unobservables”.
• The naïve estimator is Δ = b + E(e | D=1) − E(e | D=0).
• Selection bias: S = E(e | D=1) − E(e | D=0).
  – Δ overestimates the effect if S > 0.
  – Δ underestimates the effect if S < 0.

Definitions
• Treatment Effect: Y(1) − Y(0).
• Average Treatment Effect (ATE): E(Y(1) − Y(0)).
• Average Treatment Effect on the Treated (ATT): E(Y(1) − Y(0) | D=1).
• Average Treatment Effect on the Untreated (ATU): E(Y(1) − Y(0) | D=0).

Randomization as the Golden Benchmark
• Effect of a medical treatment.
  – Treatment and control group.
  – Randomization of the assignment to the treatment and to the control.
• Why randomize?
• … effect of jumping without a parachute on the probability of death.

With ignorability…
• If the treatment is ignorable (e.g. if the treatment has been randomly assigned to subjects), then
  – ATE = ATT = ATU

Selection bias
• Why is there a selection bias?
  – In medicine, in economics, in management?
1. Self-selection of subjects into the treatment.
2. Correlation between unobservables and observables, e.g. industry, gender, income.

2. THE TWO ECONOMETRIC PROBLEMS

2. The Two Econometric Problems
• Identification and Inference
  – “Studies of identification seek to characterize the conclusions that could be drawn if one could use the sampling process to obtain an unlimited number of observations.”
  – “Studies of statistical inference seek to characterize the generally weaker conclusions that can be drawn from a finite number of observations.”

Identification vs inference
• Consider a survey of a random subset of 1,302 French individuals.
• Identification:
  – Can you identify the average income in France?
• Inference:
  – How close to the true average income is the mean income in the sample?
  – i.e. what is the confidence interval around the estimate of the average income in France?

Identification vs inference
• Consider a lab experiment with 9 rats, randomly assigned to a treatment group and a control group.
• Identification:
  – Can you identify the effect of the medication on the rats using the random assignment?
• Inference:
  – With 9 rats, can you say anything about the effectiveness of the medication?

This session
• This session has focused on identification.
  – i.e. I assume we have a potentially infinite dataset.
  – I focus on the conditions for the identification of the causal effect of a variable.
• Next session: what problems appear because we have a limited number of observations?

LOOKING FORWARD: OUTLINE OF THE COURSE

Outline of the course
1. Introduction: Identification
2. Introduction: Inference
3. Linear Regression
4. Identification Issues in Linear Regressions
5. Inference Issues in Linear Regressions
6. Identification in Simultaneous Equation Models
7. Instrumental variable (IV) estimation
8. Finding IVs: Identification strategies
9. Panel data analysis
10. Bootstrap
11. Generalized Method of Moments (GMM)
12. GMM: Dynamic Panel Data estimation
13. Maximum Likelihood (ML): Introduction
14. ML: Probit and Logit
15. ML: Heckman selection models
16. ML: Truncation and censoring
+ Exercise/Review session
+ Exam
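
Illustration: naïve estimator and selection bias
The decomposition of the naïve estimator into the effect b plus the selection bias S, from the slide "Another Interpretation", can be checked numerically. What follows is a minimal sketch and not part of the course material. It is written in Python rather than Stata for self-containment. The function name simulate, the values a = 1 and b = 2, the standard normal unobservables and the self-selection rule (subjects with high e are more likely to take the treatment) are all assumptions made for illustration.

```python
import random


def simulate(n, randomized, a=1.0, b=2.0, seed=0):
    """Simulate Y(D) = a + b*D + e and return (delta, s).

    delta is the naive estimator: mean(Y | D=1) - mean(Y | D=0).
    s is the sample selection bias: mean(e | D=1) - mean(e | D=0).
    All parameter values and the selection rule are illustrative assumptions.
    """
    rng = random.Random(seed)
    y_by_group = {0: [], 1: []}
    e_by_group = {0: [], 1: []}
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)  # the "unobservables"
        if randomized:
            # Random assignment: D is a coin flip, independent of e.
            d = int(rng.random() < 0.5)
        else:
            # Hypothetical self-selection: high-e subjects tend to opt in.
            d = int(e + rng.gauss(0.0, 1.0) > 0.0)
        y = a + b * d + e  # observed outcome
        y_by_group[d].append(y)
        e_by_group[d].append(e)

    def avg(values):
        return sum(values) / len(values)

    delta = avg(y_by_group[1]) - avg(y_by_group[0])
    s = avg(e_by_group[1]) - avg(e_by_group[0])
    return delta, s


if __name__ == "__main__":
    for label, randomized in [("randomized", True), ("self-selected", False)]:
        delta, s = simulate(20000, randomized)
        print(f"{label}: delta = {delta:.3f}, S = {s:.3f}, "
              f"delta - S = {delta - s:.3f}")
```

In both designs the difference between the naïve estimator and S equals b up to floating-point rounding. Under random assignment S is close to zero, so the naïve estimator is close to b. Under the assumed self-selection rule S is positive, so the naïve estimator overestimates the effect.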