ECO 420 Advanced Empirical Methods

1
Instructor
• Jing Li
• Second year at Miami
• Taught undergraduate and graduate
econometrics before
• Married with two kids
2
Books
• Required: Introductory Econometrics: A Modern Approach by Jeffrey M. Wooldridge
• http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&fieldkeywords=Introductory+Econometrics%2C+a+Modern+Approach+by+Jeffrey+M.+Wooldridge
• Recommended: Mostly Harmless
Econometrics: An Empiricist's Companion
3
Webpage
• http://fsb.muohio.edu/lij14/
• Notes, data, and code will be posted
4
Critical Thinking
• Example: someone tries to show Canadians
like girls more than boys
• How? He shows that more baby girls than baby boys were born in 2011. Ok?
• Next, he shows that more girls than boys were adopted in 2011. Ok?
• What do you think?
5
Critical Thinking
• The president of a private high school wants to prove that private school is worth the money.
• How? He shows that more students from his school go to the Ivy League than from the best public school in town.
• What do you think?
6
Causality
• How should we interpret a regression?
• Regression can show association
• Under a stricter assumption (ceteris paribus), regression can prove causality
• Econometrics focuses on causality
7
Two Examples
• Does wearing a safety belt cause fewer deaths?
• Did the Great Recession of 2007-2009 cause fewer marriages?
8
Ceteris Paribus
• It means all other things being equal
• Ideally, causality can be proved if Ceteris
Paribus holds
• The president of the private school is wrong
because the family backgrounds of students in
private and public schools are not equal, so
Ceteris Paribus fails.
9
Key Issues
• How to design an experiment to ensure ceteris paribus?
• How to find a natural experiment?
• How to deal with non-experimental data?
10
STATA
• Available at FSB computer lab Room 2037
• A do file puts commands together. Click File--Do, and then choose the do file.
• A log file puts results (no graphs) together. The text format is recommended; you can open the file with any text editor.
11
A Typical Do File
• clear
• capture log close
• log using logfilename.txt, text replace
• insheet using datafilename.csv, clear
• …
• log close
12
Most Important Commands
• insheet: read data into memory
• list: display data on screen
• sort: sort data
• sum: summarize a quantitative variable
• tab: summarize a qualitative variable
• reg: run a regression
• gen: generate a new variable
• egen: generate a new variable using extended functions (e.g., group means)
13
Tips for Reading Data
• Double check the following items:
• The variable names (letters and numbers only)
• The missing values (NA, a space, a dot)
• Dollar signs, commas, etc.
14
Tips for Summarizing Data
• Pay attention to min, max, and obs (the number of observations)
• The median (the 50th percentile) is more robust to extreme values than the mean
• Skewness is zero for a symmetric distribution
• Kurtosis is three for a normal distribution
• Variance and standard deviation measure the
dispersion of the distribution
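The tips above can be tried with a short sketch. It assumes the auto dataset that ships with Stata and its variable price; neither is part of the course data, and any numeric variable would do.

```stata
* Load an example dataset shipped with Stata (an assumption, not course data)
sysuse auto, clear

* The detail option adds percentiles, skewness, and kurtosis
* to the basic obs / mean / sd / min / max table
summarize price, detail

* Stored results can be reused right after the command
display "mean = " r(mean) "   median = " r(p50)
display "skewness = " r(skewness) "   kurtosis = " r(kurtosis)
```

A mean well above the median, together with positive skewness, would suggest a long right tail.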
15
Tips for Plotting
• help twoway
• A scatter plot only shows association, not
causality
• Pay attention to scale
• It is not uncommon to plot the log of data
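As a rough illustration of the scale and log tips, the sketch below draws the same scatter plot in levels and in logs. The auto dataset and the variables price and mpg are assumptions chosen only for illustration.

```stata
* Example data shipped with Stata (an assumption, not course data)
sysuse auto, clear

* Scatter plot in levels
twoway scatter price mpg

* Same plot with the log of price on the vertical axis
generate lnprice = ln(price)
twoway scatter lnprice mpg
```

Either plot shows only how the two variables move together; neither says anything about causality.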
16
Tips for Running Regression
• In general, regression shows association, not
causality
• Pay attention to the following:
• Outliers
• Structural Change
• Omitted Variables
17
Review
• The mean is a good guess because it minimizes the expected squared error
• The conditional mean is a better guess than the unconditional mean
• The conditional mean E(y|x) is a random variable (a function of x)
• Law of iterated expectation
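The review points can be written out as a short derivation. This is a sketch in generic notation (Y is the variable being guessed, X the conditioning variable, c any constant guess), assuming finite variances.

```latex
% The mean minimizes expected squared error: for any constant guess c,
E\left[(Y - c)^2\right] = \operatorname{Var}(Y) + \left(E(Y) - c\right)^2 ,
\quad \text{which is smallest at } c = E(Y).

% The conditional mean is at least as good a guess as the unconditional mean:
E\left[\left(Y - E(Y \mid X)\right)^2\right] \le E\left[\left(Y - E(Y)\right)^2\right].

% Law of iterated expectation:
E(Y) = E\left[E(Y \mid X)\right].
```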
18
STATA
• Unconditional Mean:
sum y
• Conditional Mean:
by x: sum y
You need to sort data by x first
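A minimal sketch of the two commands, assuming the auto dataset shipped with Stata, with price playing the role of y and foreign the role of x:

```stata
* Example data shipped with Stata (an assumption, not course data)
sysuse auto, clear

* Unconditional mean of price
summarize price

* Conditional mean of price given foreign: sort first, then use by
sort foreign
by foreign: summarize price

* bysort does the sorting and the by step in one command
bysort foreign: summarize price
```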
19
Compare Means
• Example: Are the exam scores in section B greater than those in section C?
• If there are two groups and x is the group indicator, use
ttest y, by(x)
20
Regression with Dummy Variables
• Alternatively, we can run a regression using dummies to compare means. This approach is more flexible than ttest: it handles more than two groups and allows additional regressors
gen d1 = (x == 1)
gen d2 = (x == 2)
…
reg y d1 d2 …
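The link between the two approaches can be checked with a small sketch. It assumes the auto dataset shipped with Stata, where foreign is already a 0/1 group indicator; the variable names are illustrative only.

```stata
* Example data shipped with Stata (an assumption, not course data)
sysuse auto, clear

* Two-group mean comparison (equal variances by default)
ttest price, by(foreign)

* Same comparison as a regression on the dummy: the coefficient on
* foreign is the difference in group means (ttest reports it with the
* opposite sign, group 0 minus group 1), and the t statistic matches
* in absolute value
regress price foreign

* Unlike ttest, the regression can also hold another variable fixed
regress price foreign weight
```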
21
Question (Optional, Just for fun)
• Suppose we draw $Z \sim \mathrm{Uniform}(0,1)$. After we know $Z = z$, we draw $Y \mid (Z = z) \sim \mathrm{Uniform}(z, 1)$. Find $E(Y)$.
• Solve this problem using the Law of Iterated Expectation.
22
Answer
• $E(Y) = E[E(Y \mid Z)] = E\left[\frac{1+Z}{2}\right] = \frac{1+E(Z)}{2} = \frac{1+0.5}{2} = 0.75$
23
Simulation
• help uniform
• set obs 10000
• gen z = uniform()
• gen y = z+(1-z)*uniform()
• sum y
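A reproducible variant of this simulation might look like the sketch below. It swaps in runiform(), the current name for Stata's uniform random-number function, and sets an arbitrary seed; both choices are assumptions rather than part of the original slide.

```stata
clear
* Arbitrary seed so the run can be repeated exactly (an assumption)
set seed 420
set obs 10000

* Z ~ Uniform(0,1)
generate z = runiform()

* Given Z = z, Y ~ Uniform(z, 1): shift by z and rescale to width 1 - z
generate y = z + (1 - z)*runiform()

summarize y
display "simulated mean = " r(mean) "   theoretical mean = 0.75"
```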
24
Mean and Sample Mean
• We use sample mean (estimator) to estimate
the population mean (parameter)
• The sample mean has nice properties of (1)
unbiasedness; (2) consistency
25
Unbiasedness
• $E(\bar{x}) = E\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$
26
Law of Large Numbers (Consistency)
• If $x_i \sim iid(\mu, \sigma^2)$, then as $n \to \infty$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \to \mu$
In words, the sample mean gets closer and closer to the true population mean as the size of the random sample rises
27
Simulation (Show Consistency)
• clear
• set obs 1000
• gen y = 4 + 2*invnormal(uniform())
• sum y in 1/10
• sum y in 1/100
• sum y in 1/1000
28
Discuss
• How to use simulation to show unbiasedness?
29
Answer
• We need to generate many samples.
• Compute sample mean for each sample.
• Unbiasedness means the average of those
sample means should be close (or in theory
identical) to the population mean
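One way to carry out these steps is Stata's simulate prefix. The sketch below is only an illustration: the program name onesample, the sample size of 20, the 1000 replications, and the seed are all hypothetical choices, and the population is taken to be normal with mean 4 and standard deviation 2, as in the consistency simulation.

```stata
clear
* Arbitrary seed (an assumption)
set seed 12345

* Hypothetical helper: draw one sample of size 20 and return its sample mean
capture program drop onesample
program define onesample, rclass
    drop _all
    set obs 20
    * Population mean 4, standard deviation 2
    generate y = rnormal(4, 2)
    summarize y
    return scalar xbar = r(mean)
end

* Repeat 1000 times; the data in memory become the 1000 sample means
simulate xbar = r(xbar), reps(1000) nodots: onesample

* The average of the sample means should be close to 4
summarize xbar
```

Each individual sample mean can be far from 4 because the samples are small; unbiasedness is a statement about their average.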
30
Causality
• Most often, an economist's goal is to infer that one variable (x) has a causal effect on another variable (y)
• Example 1: x = price, y = quantity demanded
• Example 2: x = wearing a safety belt, y = death rate
• Example 3: x = hosting the Olympic Games, y = GDP
31
Ceteris Paribus
• By definition, inferring causality requires
ceteris paribus (CP)
• CP means all other factors being equal (fixed,
constant)
• Without CP, we are not sure the change in Y is
due to the change in X.
• Exercise: what does CP mean when x = price, y
= quantity demanded?
32
Real Problem
• Someone thinks that driving in the left (passing) lane causes more accidents
• How to prove it?
• One (bad) answer: let's pick an interstate, say I275. Consider I275 between exit 33 and exit 41. Each day between 8 am and 10 am we record the number of accidents that happen in the right lane and the left lane. Then we do the mean-comparison test.
33
Discuss
• Is I275 representative?
• Is traffic between exit 33 and exit 41
representative?
• Does the time (rush hour, night time) matter?
• How to run a regression?
• How about using the percentage (# accident /
# traffic) rather than the number of accidents?
34
Fundamental Drawback
• CP fails since the bad answer uses the
observed data
• Reckless drivers (W) tend to drive in the left lane (X), and reckless driving causes more accidents (Y).
• The observed association between X and Y
tells nothing about causality since W is not
held constant (CP fails).
35
Solutions
• (1) use an experiment: randomly assign drivers to the left and right lanes, so that reckless drivers are spread evenly across lanes. Then compare the means using the experimental data.
• (2) still use observed data, but run a multiple regression which includes as a regressor the number of reckless drivers in the left lane.
• (3) use fancy econometric models such as instrumental variable regression if the number of reckless drivers in the left lane is unobserved.
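The first two solutions can be illustrated with simulated data. Everything in the sketch below is hypothetical: w is a made-up recklessness score, x a left-lane indicator, y an accident-risk index, and the true causal effect of the lane is set to zero by construction.

```stata
clear
* Arbitrary seed (an assumption)
set seed 2024
set obs 5000

* w: recklessness, the confounder
generate w = rnormal()

* x = 1 if the driver uses the left lane; reckless drivers do so more often
generate x = (w + rnormal() > 0)

* y depends on w only, so the true effect of x on y is zero
generate y = 1 + 2*w + rnormal()

* Observed data without w: x picks up the effect of recklessness
regress y x

* Solution (2): holding w constant, the coefficient on x should be near zero
regress y x w

* Solution (1): a randomly assigned lane is unrelated to w
generate xr = (runiform() > 0.5)
regress y xr
```

The first regression should show a large positive coefficient on x even though the lane has no effect here, which is the failure of ceteris paribus described above.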
36
Discrimination Paper
• (Race) Discrimination means that someone is
treated unfairly just because of his skin color
(even if he has high ability)
• Using observed data cannot ensure ceteris
paribus
37
Experimental Data
• Obtained by sending out fake resumes
• Factors (characteristics) other than names (a signal of skin color) are made comparable
• In other words, the name (skin color) is independent of ability.
• E(xu)=0, so the key regressor is exogenous
• Ceteris paribus is ensured by using experimental data
38
Policy Implications
• The punch line of this research is that job training programs may help African Americans little, because the programs may improve their skills, but cannot change their skin color.
39
Discuss
• This is a very smart paper. Why?
• How about using observed data? Can we draw a conclusion based on the observed salary difference between black and white workers?
• How about market heterogeneity? Can we generalize the finding to other markets such as the market for college faculty? Why?
40
Tax Paper
• Supply-side economics says that people work more (and so GDP rises) after a tax cut
• So the theory implies a causal effect of a tax cut on labor supply
41
Discuss
• Consider using the observed data to run the regression
$\text{Labor hours} = \beta_0 + \beta_1\,\text{tax rate} + u$
What does $u$ represent?
Is E(xu)=0, that is, is the tax rate exogenous?
What is the consequence of using observed data?
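As a hint for the last question, the standard large-sample result for a simple regression can be written as follows. This is a sketch with x denoting the tax rate, assuming a random sample and Var(x) > 0.

```latex
% Probability limit of the OLS slope in: Labor hours = \beta_0 + \beta_1 x + u
\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
```

The second term vanishes only when x is uncorrelated with u; otherwise the estimate mixes the causal effect with whatever u contains.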
42
Natural Experiment
• There is a tax reform (natural experiment)
• In 1987-1988, Iceland moved from a system
under which taxes were paid on previous
year's income to a pay-as-you-earn system.
• So the tax rate for 1987 income became zero in an exogenous manner (it has nothing to do with $u$)
43
Policy Implications
• Figure 1 shows that cutting tax leads to higher
employment
• Figure 2 shows a hike in GDP in 1987-1988
• Another paper that uses natural experiment:
HOME-EQUITY LENDING AND RETAIL
SPENDING: EVIDENCE FROM A NATURAL
EXPERIMENT IN TEXAS
44
Discuss
• Where can we find natural experiments? Reforms, law changes, natural events…
• Q: how to show the causal effect of the number of children on the labor hours of women? How to design a pure experiment? How about a natural experiment?
45