Lecture 2: Estimating a Slope
• (Chapter 2.1–2.7)
2-1
Recall: The Components of Econometrics
1) Economic theory
2) Mathematical model
3) Statistical theory
4) Econometric model
[Diagram relating economic phenomena, economic theory, the mathematical model, statistical theory, and the econometric model]
2-2
How do econometrics and economic theory differ?
• Economic theory: qualitative results—
Demand Curves Slope Downward
• Econometrics: quantitative results—
price elasticity of demand for milk = -.75
2-3
How do econometrics and statistics differ?
• Statistics: “summarize the data faithfully”;
“let the data speak for themselves.”
• Econometrics: “ what do we learn from
economic theory AND the data at hand?”
2-4
What can econometrics do?
• Estimation: What is the marginal propensity to
consume in Taiwan? (structural analysis)
• Hypothesis Testing: Is the productivity of Korean
college-educated workers higher than that of their
Taiwanese counterparts? (hypothesis testing)
• Prediction & Forecasting: What will Personal
Savings be in 2004 if GDP is $14,864? And will
they grow in the near future (2008)? (prediction and forecasting)
2-5
Economists Ask:
“What Changes What and How?”
• Higher Income, Higher Saving
• Higher Price, Lower Quantity Demanded
• Higher Interest Rate, Lower Investment
2-6
Savings Versus Income
• Theory Would Assume an Exact Relationship,
e.g., Y = bX
[Plot: savings (vertical axis, 0–6,000) against income (horizontal axis, 24,000–96,000)]
2-7
Slope of the Line Is Key!
• Slope is the change in savings with
respect to changes in income
• Slope is the derivative of savings with
respect to income
• If we know the slope, we’ve quantified
the relationship!
2-8
Never So Neat: Savings Versus Income
[Scatter plot: saving (vertical axis, 0–6) against income (horizontal axis, 0–120)]
2-9
Long-run Consumption Function
Key features:
• Slopes upward
• Passes through the origin
• Guess the slope?
2-10
Underlying Mean + Random Part
Yi  b Xi  i
•
(憑直覺) 四大猜法
four intuitively appealing ways
to estimate b
2-11
估計策略
1. Min Σ (Yi − Ŷi), where Ŷi is the fitted value
2. Min Σ ∣ Yi − Ŷi ∣
3. Min Σ (Yi − Ŷi)²
• Pros and cons of each
2-12
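Criterion 3, applied to the line-through-the-origin model Yi = bXi + εi (so the fitted value is Ŷi = bXi), leads directly to "Best Guess 4" below. This short derivation is added here for clarity; it is not on the original slide:

```latex
\min_{b}\ \sum_{i=1}^{n} (Y_i - bX_i)^2
\quad\Longrightarrow\quad
\frac{d}{db}\sum_{i=1}^{n} (Y_i - bX_i)^2
  = -2\sum_{i=1}^{n} X_i (Y_i - bX_i) = 0
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^{2}}
```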
“Best Guess 1”
Mean of Ratios:
bg1 = (1/n) Σ (Yi / Xi)
(an average over all n ratios)
2-13
Figure 2.4 Estimating the Slope
of a Line with Two Data Points
2-14
“Best Guess 2”
Ratio of Means:
bg2 = Σ Yi / Σ Xi
2-15
Figure 2.5
Estimating the Slope of a Line: bg2
2-16
“Best Guess 3”
Mean of Changes in Y over Changes in X:
Yi  Yi1
1
b g3 

n  1 X i  X i1
y
X
2-17
“Best Guess 4”
Ordinary Least Squares:
bg4 = Σ XiYi / Σ Xi²
(minimizes the sum of squared residuals in the sample)
2-18
Four Ways to Estimate b
Yi  b Xi  i
1) Mean of Ratios:
1 Yi
b g1  
n Xi
3) Mean of Ratio of Changes:
Yi  Yi1
1
b g3 

n 1 X i  X i1
2) Ratio of Means:
4) Ordinary Least Squares:
Y


X
YX


X
b g2
b g4
i
i
i
i
2
i
2-19
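To make the four guesses concrete, here is a minimal Python sketch (not part of the original slides; the sample numbers are invented for illustration) that computes bg1–bg4 for one sample:

```python
import numpy as np

def four_guesses(X, Y):
    """Four intuitive estimators of b in the model Yi = b*Xi + error."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    bg1 = np.mean(Y / X)                    # 1) mean of ratios
    bg2 = Y.sum() / X.sum()                 # 2) ratio of means
    bg3 = np.mean(np.diff(Y) / np.diff(X))  # 3) mean of ratio of changes
    bg4 = (X * Y).sum() / (X ** 2).sum()    # 4) OLS for a line through the origin
    return bg1, bg2, bg3, bg4

# Hypothetical income (X) and savings (Y) data, in thousands of dollars
X = np.array([24.0, 48.0, 72.0, 96.0])
Y = np.array([1.1, 2.3, 3.4, 4.9])
print(four_guesses(X, Y))
```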
Underlying Mean + Random Part
Yi  b Xi  i
• Are lines through the origin likely
phenomena?
2-20
Regression’s Greatest Hits!!!
• An Econometric Top 40
2-21
Two Classical Favorites!!
• Friedman’s Permanent Income hypothesis:
Consumptioni = b · (Permanent Incomei) + εi
• Capital Asset Pricing Model (CAPM) :
(Asset j's Return Above a Riskless Rate)i =
b · (Market's Return Above a Riskless Rate)i + εi
2-22
A Golden Oldie !!
• Engel on the Demand for Rye:
E(% change in quantity) = elasticity · (% change in price)
2-23
Four Guesses
• How to Choose?
2-24
What Criteria Did We Discuss?
• Pick The One That's Right
• Make Mean Error Close to Zero
• Minimize Mean Absolute Error
• Minimize Mean Square Error
2-25
What Criteria Did We Discuss? (cont.)
• Pick The One That's Right…
– In every sample, a different estimator
may be “right.”
– Can only decide which is right if we
ALREADY KNOW the right answer—
which is a trivial case.
2-26
What Criteria Did We Discuss? (cont.)
• Make Mean Error Close to Zero
…seek unbiased guesses
Mean Error = E(bg − b)
= BIAS of bg in estimating b
– If E(bg-b) = 0, bg is right on average
– If BIAS = 0, bg is an unbiased estimator
of b
2-27
Checking Understanding
• Question: Which estimator does better
under the "make mean error close to zero"
criterion?
1. bg-b is always a positive number less
than 2 (our guesses are always a little
high), or
2. bg-b is always +10 or -10
(50/50 chance)
2-28
Checking Understanding (cont.)
Mean Error = E(bg − b)
= BIAS of bg in estimating b
• If our guess is wrong by +10 for half the
observations, and by -10 for the other half,
then E(bg-b) = 0!
– The second estimator is unbiased!
• Mistakes in opposite directions cancel out.
The first estimator is always closer to being
right, but it does worse on this criterion.
2-29
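A quick simulation (not on the original slides; both error distributions are illustrative) confirms the point: the ±10 estimator comes out unbiased, yet its absolute errors are far larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimator 1: always a little high, by some amount between 0 and 2
err_small_but_biased = rng.uniform(0.0, 2.0, size=100_000)
# Estimator 2: off by +10 or -10 with equal probability
err_large_but_unbiased = rng.choice([-10.0, 10.0], size=100_000)

for name, e in [("always a little high", err_small_but_biased),
                ("+/-10 coin flip", err_large_but_unbiased)]:
    print(f"{name:22s} mean error = {e.mean():6.3f}   mean |error| = {np.abs(e).mean():6.3f}")
```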
What Criteria Did We Discuss?
• Minimize Mean Absolute Error…
Mean Absolute Error = E(|bg − b|)
– Mistakes don’t cancel out.
– Implicitly treats cost of a mistake as being
proportional to the mistake’s size.
– Absolute values don’t go well
with differentiation.
2-30
What Criteria Did We Discuss? (cont.)
• Minimize Mean Square Error…
Mean Square Error = E[(bg − b)²]
– Implicitly treats cost of mistakes as
disproportionately large for larger mistakes.
– Squared expressions are
mathematically tractable.
2-31
What Criteria Did We Discuss? (cont.)
• Pick The One That’s Right…
– only works trivially
• Make Mean Error Close to Zero…
– seek unbiased guesses
• Minimize Mean Absolute Error…
– mathematically tough
• Minimize Mean Square Error…
– more tractable mathematically
2-32
Criteria Focus Across Samples
• Make Mean Error Close to Zero
• Minimize Mean Absolute Error
• Minimize Mean Square Error
• What do the distributions of the
estimators look like?
2-33
Try the Four in Many Samples
• Pros will use estimators repeatedly—
what track record will they have?
• Idea: Let’s have the computer create
many, many data sets.
• We apply all our estimators to each
data set.
2-34
Try the Four in Many Samples (cont.)
• We use our estimates on many datasets
that we created ourselves.
• We know the true value of b because
we picked it!
• We can compare estimators.
• We run “horseraces.”
2-35
Try the Four in Many Samples (cont.)
• Pros will use estimators repeatedly—
what track record will they have?
• Which horse runs best on many tracks?
• Don’t design tracks that guarantee failure.
• What properties do we need our
computer-generated datasets to have
to avoid automatic failure for one of
our estimators?
2-36
Building a Fair Racetrack
Under what conditions will each estimator fail?
1) Mean of Ratios: bg1 = (1/n) Σ (Yi / Xi)
2) Ratio of Means: bg2 = Σ Yi / Σ Xi
3) Mean of Ratio of Changes: bg3 = (1/(n−1)) Σ (Yi − Yi−1) / (Xi − Xi−1)
4) Ordinary Least Squares: bg4 = Σ XiYi / Σ Xi²
2-37
To Preclude Automatic Failure...
1) bg1 = (1/n) Σ (Yi / Xi): No Xi = 0
2) bg2 = Σ Yi / Σ Xi: Σ Xi ≠ 0
3) bg3 = (1/(n−1)) Σ (Yi − Yi−1) / (Xi − Xi−1): No successive X's equal
4) bg4 = Σ XiYi / Σ Xi²: Some Xi ≠ 0
2-38
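A small helper (illustrative only, not from the slides) turns these four conditions into explicit checks on a candidate set of X values:

```python
import numpy as np

def racetrack_is_fair(X):
    """Return, for each estimator, whether this vector of X values avoids its automatic failure."""
    X = np.asarray(X, dtype=float)
    return {
        "bg1: no Xi equal to zero":      bool(np.all(X != 0)),
        "bg2: sum of Xi not zero":       bool(X.sum() != 0),
        "bg3: no successive X's equal":  bool(np.all(np.diff(X) != 0)),
        "bg4: some Xi not zero":         bool(np.any(X != 0)),
    }

print(racetrack_is_fair([24, 48, 72, 96]))   # a fair track: every check passes
print(racetrack_is_fair([24, 24, -48, 0]))   # breaks bg1, bg2, and bg3
```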
Why Does Viewing Many Samples
Work Well?
• We are interested in means: mean error,
mean absolute error, mean squared error.
• Drawing many (m) independent samples lets us
estimate those means with variance σe²/m, where
σe² is the variance of the quantity being averaged.
• If m is large, our estimates will be
quite precise.
2-39
How to Build a Race Track...
Yi  b Xi   i
i  1, 2, , n
• n=?
– How big is each sample?
• b=?
– What slope are we estimating?
• Set X1 , X2 , … , Xn
– Do it once, or for each sample?
• Draw 1 , 2 , ... , n
– Must draw randomly each sample.
2-40
What to Assume About the εi ?
• What do the εi represent?
• What should the εi equal on average?
• What variance do we want for the εi ?
2-41
Checking Understanding
Yi  b Xi   i
i  1, 2, , n
• n=?
– How big is each sample?
• b=?
– What slope are we estimating?
• Set X1 , X2 , … , Xn
– Do it once, or for each sample?
• Draw 1 , 2 , … , n
– Must draw randomly each sample.
• Form Y1 , Y2 , … , Yn
– Yi = bXi + εi
• We create 10,000 datasets with X and Y.
• For each dataset, what do we want to do?
2-42
Checking Understanding (cont.)
• We create 10,000 datasets with X and Y
• For each dataset, we use all four of our
estimators to estimate bg1 , bg2 , bg3 ,
and bg4
• We save the mean error, mean
absolute error, and mean squared error
for each estimator
2-43
What Have We Assumed?
• We are creating our own data.
• We get to specify the underlying “Data
Generating Process” relating Y to X.
• What is our Data Generating
Process (DGP)?
2-44
What Is Our Data Generating Process?
Yi  b Xi   i
i  1, 2, , n
• E(i ) = 0
• Var(i ) =  2
• Cov(i ,k ) = 0
i≠k
• X1 , X2 , … , Xn are fixed across samples
GAUSS–MARKOV ASSUMPTIONS
2-45
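A minimal Python sketch of the full horserace under this DGP (not from the original slides; the values of n, b, σ, the X draws, and the use of normal errors are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# Racetrack settings (illustrative): sample size, true slope, error std. dev., number of samples
n, b, sigma, m = 20, 0.2, 1.0, 10_000
X = rng.uniform(1.0, 100.0, size=n)   # set once, then held fixed across all samples

def four_guesses(X, Y):
    return np.array([
        np.mean(Y / X),                    # bg1: mean of ratios
        Y.sum() / X.sum(),                 # bg2: ratio of means
        np.mean(np.diff(Y) / np.diff(X)),  # bg3: mean of ratio of changes
        (X * Y).sum() / (X ** 2).sum(),    # bg4: OLS through the origin
    ])

guesses = np.empty((m, 4))
for s in range(m):
    eps = rng.normal(0.0, sigma, size=n)   # mean 0, constant variance, independent draws
    guesses[s] = four_guesses(X, b * X + eps)

err = guesses - b   # columns: bg1, bg2, bg3, bg4
print("mean error        :", err.mean(axis=0).round(4))
print("mean absolute err :", np.abs(err).mean(axis=0).round(4))
print("mean squared err  :", (err ** 2).mean(axis=0).round(4))
```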
What Will We Get?
• We will get precise estimates of:
1. Mean Error of each estimator
2. Mean Absolute Error of each estimator
3. Mean Squared Error of each estimator
4. Distribution of each estimator
• By running different racetracks (DGPs), we
check the robustness of our results.
2-46
Review
• We want an estimator to form a “best guess”
of the slope of a line through the origin.
• Yi = bXi + εi
• We want an estimator that works well
across many different samples: low mean
error, low mean absolute error, low mean
squared error…
2-47
Review (cont.)
• We have brainstormed 4 “best guesses”:
1) Mean of Ratios: bg1 = (1/n) Σ (Yi / Xi)
2) Ratio of Means: bg2 = Σ Yi / Σ Xi
3) Mean of Ratio of Changes: bg3 = (1/(n−1)) Σ (Yi − Yi−1) / (Xi − Xi−1)
4) Ordinary Least Squares: bg4 = Σ XiYi / Σ Xi²
2-48
Review (cont.)
• We will compare these estimators in
“horseraces” across thousands of
computer-generated datasets
• We get to specify the underlying relationship
between Y and X
• We know the “right answer” that the
estimators are trying to guess
• We can see how each estimator does
2-49
Review (cont.)
• We choose all the rules for how our
data are created.
• The underlying rules are the
“Data Generating Process” (DGP)
• We choose to use the Gauss–
Markov Rules.
2-50
What Is Our Data Generating Process?
Yi  b Xi   i
i  1, 2, , n
• E(i ) = 0
• Var(i ) =  2
• Cov (i ,k ) = 0
i≠k
• X1 , X2 , … , Xn are fixed across samples
GAUSS–MARKOV ASSUMPTIONS
2-51