ECO 420 Advanced Empirical Methods

Instructor
• Jing Li
• Second year at Miami
• Taught undergraduate and graduate econometrics before
• Married with two kids

Books
• Required: Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge
• http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=Introductory+Econometrics%2C+a+Modern+Approach+by+Jeffrey+M.+Wooldridge
• Recommended: Mostly Harmless Econometrics: An Empiricist's Companion

Webpage
• http://fsb.muohio.edu/lij14/
• Notes, data and code will be posted

Critical Thinking
• Example: someone tries to show that Canadians like girls more than boys
• How? He shows that more baby girls than boys were born in 2011. Ok?
• Next, he shows that more girls than boys were adopted in 2011. Ok?
• What do you think?

Critical Thinking
• The president of a private high school wants to prove that private school is worth the money.
• How? He shows that more students from his school go to Ivy League colleges than from the best public school in town.
• What do you think?

Causality
• How to interpret a regression?
• Regression can show association
• Under a stricter assumption (ceteris paribus), regression can prove causality
• Econometrics focuses on causality

Two Examples
• Does wearing a safety belt cause fewer deaths?
• Did the Great Recession of 2007-2009 cause fewer marriages?

Ceteris Paribus
• It means all other things being equal
• Ideally, causality can be proved if ceteris paribus holds
• The president of the private school is wrong because the family backgrounds of students in private and public schools are not equal, so ceteris paribus fails.

Key Issues
• How to design an experiment to ensure ceteris paribus?
• How to find a natural experiment?
• How to deal with non-experimental data?

STATA
• Available at FSB computer lab Room 2037
• A do file collects commands. Click File--Do, and then choose the do file.
• A log file collects results (no graphs).
The text format is recommended. You can use any text editor to open the file.

A Typical Do File
• clear
• capture log close
• log using logfilename.txt, text replace
• insheet using datafilename.csv, clear
• …
• log close

Most Important Commands
• insheet: read data into memory
• list: display data on screen
• sort: sort data
• sum: summarize a quantitative variable
• tab: summarize a qualitative variable
• reg: run a regression
• gen: generate a new variable
• egen: generate fancier variables (extended generate)

Tips for Reading Data
• Double check the following items
• The variable names (letters and numbers only)
• The missing values (NA, space, a dot)
• Dollar signs, commas, etc.

Tips for Summarizing Data
• Pay attention to min, max, obs
• The median (the 50th percentile) is more robust to extreme values than the mean
• Skewness is zero for a symmetric distribution
• Kurtosis is three for a normal distribution
• Variance and standard deviation measure the dispersion of the distribution

Tips for Plotting
• help twoway
• A scatter plot only shows association, not causality
• Pay attention to scale
• It is not uncommon to plot the log of the data

Tips for Running Regression
• In general, regression shows association, not causality
• Pay attention to the following:
• Outliers
• Structural change
• Omitted variables

Review
• The mean is a good guess because it minimizes the mean squared error
• The conditional mean is a better guess than the unconditional mean
• The conditional mean is a random variable
• Law of iterated expectation

STATA
• Unconditional mean: sum y
• Conditional mean: by x: sum y (you need to sort the data by x first)

Compare Means
• Example: Are the exam scores in section B greater than in section C?
• If there are two groups and x is the group indicator, use ttest y, by(x)

Regression with Dummy Variables
• Alternatively, we can run a regression on dummies to compare means.
This approach is more flexible than ttest (it extends to more than two groups and to additional regressors).
gen d1 = (x == 1)
gen d2 = (x == 2)
…
reg y d1 d2 …

Question (Optional, Just for Fun)
• Suppose we draw $Z \sim \text{Uniform}(0,1)$. After we know $Z = z$, we draw $Y \mid (Z = z) \sim \text{Uniform}(z, 1)$. Find $E(Y)$.
• Solve this problem using the law of iterated expectation.

Answer
• $E(Y) = E[E(Y \mid Z)] = E\left[\frac{1+Z}{2}\right] = \frac{1+E(Z)}{2} = \frac{1+0.5}{2} = 0.75$

Simulation
• help uniform
• set obs 10000
• gen z = uniform()
• gen y = z+(1-z)*uniform()
• sum y

Mean and Sample Mean
• We use the sample mean (estimator) to estimate the population mean (parameter)
• The sample mean has the nice properties of (1) unbiasedness; (2) consistency

Unbiasedness
• $E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$

Law of Large Numbers (Consistency)
• If $x_i \sim iid(\mu, \sigma^2)$, then as $n \to \infty$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \to \mu$
In words, the sample mean gets closer and closer to the true population mean as the size of the random sample rises

Simulation (Show Consistency)
• clear
• set obs 1000
• gen y = 4 + 2*invnormal(uniform())
• sum y in 1/10
• sum y in 1/100
• sum y in 1/1000

Discuss
• How to use simulation to show unbiasedness?

Answer
• We need to generate many samples.
• Compute the sample mean for each sample.
• Unbiasedness means the average of those sample means should be close (or in theory identical) to the population mean

Causality
• Most often, an economist's goal is to infer that one variable (x) has a causal effect on another variable (y)
• Example 1: x = price, y = quantity demanded
• Example 2: x = wearing a safety belt, y = death rate
• Example 3: x = hosting the Olympic Games, y = GDP

Ceteris Paribus
• By definition, inferring causality requires ceteris paribus (CP)
• CP means all other factors being equal (fixed, constant)
• Without CP, we cannot be sure that the change in y is due to the change in x.
• Exercise: what does CP mean when x = price, y = quantity demanded?

Real Problem
• Someone thinks that driving in the left (passing) lane causes more accidents
• How to prove it?
• One (bad) answer: let's pick an interstate, say I-275. Consider I-275 between exit 33 and exit 41. Each day between 8 am and 10 am we record the number of accidents that happen in the right lane and in the left lane. Then we do the mean-comparison test.

Discuss
• Is I-275 representative?
• Is traffic between exit 33 and exit 41 representative?
• Does the time (rush hour, night time) matter?
• How to run a regression?
• How about using the accident rate (# accidents / traffic volume) rather than the number of accidents?

Fundamental Drawback
• CP fails because the bad answer uses observed data
• Reckless drivers (W) tend to drive in the left lane (X), and reckless driving causes more accidents (Y).
• The observed association between X and Y tells nothing about causality since W is not held constant (CP fails).

Solutions
• (1) use an experiment: randomly assign reckless drivers to left and right lanes. Then compare the means using the experimental data.
• (2) still use observed data, but run a multiple regression that includes the number of reckless drivers in the left lane as a regressor.
• (3) use fancier econometric models such as instrumental variable regression if the number of reckless drivers in the left lane is unobserved.

Discrimination Paper
• (Race) discrimination means that someone is treated unfairly just because of his skin color (even if he has high ability)
• Using observed data cannot ensure ceteris paribus

Experimental Data
• Obtained by using fake resumes
• Factors (characteristics) other than names (a signal of skin color) are made comparable
• In other words, the name (skin color) is independent of ability.
• E(xu) = 0, so the key regressor is exogenous
• Ceteris paribus is ensured by using experimental data

Policy Implications
• The punch line of this research is that job training programs may help African-Americans little, because a program may improve their skills but cannot change their skin color.

Discuss
• This is a very smart paper. Why?
• How about using observed data? Can we draw a conclusion based on the observed salary difference between black and white workers?
• How about market heterogeneity? Can we generalize the finding to other markets, such as the market for college faculty? Why?

Tax Paper
• Supply-side economics says that people work more (and so GDP rises) after a tax cut
• So the theory implies a causal effect of a tax cut on labor supply

Discuss
• Consider using observed data and running the regression
$\text{labor hours} = \beta_0 + \beta_1 \, \text{tax rate} + u$
What does $u$ represent? Is E(xu) = 0, that is, is the tax rate exogenous? What is the consequence of using observed data?

Natural Experiment
• There is a tax reform (natural experiment)
• In 1987-1988, Iceland moved from a system under which taxes were paid on the previous year's income to a pay-as-you-earn system.
• So the tax rate on 1987 income became zero in an exogenous manner (it has nothing to do with $u$)

Policy Implications
• Figure 1 shows that cutting taxes leads to higher employment
• Figure 2 shows a hike in GDP in 1987-1988
• Another paper that uses a natural experiment: "Home-Equity Lending and Retail Spending: Evidence from a Natural Experiment in Texas"

Discuss
• Where to find a natural experiment? Reforms, law changes, natural events…
• Q: how to show the causal effect of the number of children on the labor hours of women? How to design a pure experiment? How about a natural experiment?
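
Simulation (Observed vs. Experimental Data)
The contrast between observed and experimental data that runs through the lane, resume and tax examples can be sketched with a small simulation in the style of the earlier ones. This is only an illustration under made-up assumptions: the variable names (w, x_obs, x_exp, y_obs, y_exp), the seed, and all coefficients are hypothetical, and the true causal effect of x on y is set to zero on purpose.

```stata
* Hypothetical setup: w = recklessness, x = left lane, y = accidents
clear
set seed 420
set obs 10000

* w is the factor that is NOT held constant in observed data
gen w = invnormal(uniform())

* Observed data: reckless drivers are more likely to pick the left lane
gen x_obs = (w + invnormal(uniform()) > 0)
* True effect of the lane on y is zero; only w drives y
gen y_obs = 2*w + invnormal(uniform())
* Ceteris paribus fails: the slope on x_obs picks up the effect of w
reg y_obs x_obs

* Experimental data: lane assigned at random, independent of w
gen x_exp = (uniform() > 0.5)
gen y_exp = 2*w + invnormal(uniform())
* Ceteris paribus holds on average: the slope should be close to zero
reg y_exp x_exp

* Multiple regression on observed data, holding w constant
reg y_obs x_obs w
```

Under these assumptions the first regression should report a clearly positive slope on x_obs even though the lane has no causal effect, while the slopes in the second and third regressions should be close to zero. The third regression is only feasible when w is actually observed, which is why the unobserved case calls for other tools such as instrumental variables.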