Lecture: Gauss–Markov Modeling
Regression with One Explanator
(Chapters 3.1–3.5, 3.7 and 4.1–4.4)
5-1
Agenda
• Finding a good estimator for a
straight line through the origin:
Chapter 3.1–3.5, 3.7
• Finding a good estimator for a straight
line with an intercept: Chapter 4.1–4.4
5-2
Where Are We? (An Example)
• We wish to uncover quantitative features
of an underlying process, such as the
relationship between family income and
financial aid.
• More precisely: how much less aid will I receive, on average, for each dollar of additional family income?
• Data: a sample of the process, for example, observations on 10,000 students’ aid awards and family incomes.
5-3
The Random Term
• Other factors (ε), such as the number of siblings, influence any individual student’s aid, so we cannot directly observe the relationship between income and aid.
• We need a rule for making a good guess
about the relationship between income and
financial aid, based on the data.
5-4
Guess
• A good guess is one that is right on average.
• We also want a guess with a low variance around the true value.
5-5
Estimators
• Our rule is called an “estimator.”
• We started by brainstorming a number
of estimators and then comparing their
performances in a series of computer
simulations.
• We found that the Ordinary Least Squares
estimator dominated the other estimators.
• Why is Ordinary Least Squares so good?
5-6
Tools
• To make more general statements, we
need to move beyond the computer and
into the world of mathematics.
• Last time, we reviewed a number of
mathematical tools: summations,
descriptive statistics, expectations,
variances, and covariances.
5-7
DGP
• As a starting place, we need to write down all
our assumptions about the way the
underlying process works, and about how that
process led to our data.
• These assumptions are called the “Data
Generating Process.”
• Then we can derive estimators that have
good properties for the Data Generating
Process we have assumed.
5-8
Model
• The DGP is a model to approximate
reality. We trade off realism to gain
parsimony and tractability.
• Models are to be used, not believed.
5-9
DGP assumptions
• Much of this course focuses on different
types of DGP assumptions that you can
make, giving you many options as you
trade realism for tractability.
5-10
Two Ways to Screw Up in Econometrics
– Your Data Generating Process assumptions
missed a fundamental aspect of reality (your DGP
is not a useful approximation); or
– Your estimator did a bad job for your DGP.
• Today we focus on picking a good estimator
for your DGP.
5-11
GMT
• Today, we will focus on deriving the
properties of an estimator for a simple
DGP: the Gauss–Markov Assumptions.
• First we will find the expectations and
variances of any linear estimator under
the DGP.
• Then we will derive the Best Linear
Unbiased Estimator (BLUE).
5-12
Our Baseline DGP: Gauss–Markov
(Chapter 3)
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X ’s fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.
5-13
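To make this DGP concrete, here is a minimal Python sketch of a single draw from it, with made-up values for β, σ, and the fixed X’s (none of these numbers come from the slides, and normal disturbances are an extra assumption used only for illustration; the Gauss–Markov DGP specifies only the mean, variance, and zero covariance of the disturbances).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values -- the slides do not specify any numbers.
beta = 0.5                                       # true slope we want to estimate
sigma = 2.0                                      # standard deviation of the disturbance
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])    # X's fixed across samples

def draw_sample():
    """One draw from the DGP: Y_i = beta * X_i + eps_i."""
    # Normality is assumed here only for illustration.
    eps = rng.normal(0.0, sigma, size=X.size)    # E(eps)=0, Var(eps)=sigma^2, independent
    return beta * X + eps

Y = draw_sample()
print(Y)
```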
A Strategy for Inference
• The DGP tells us the assumed relationships between the data we observe and the underlying process of interest.
• Using the assumptions of the DGP and the algebra of expectations, variances, and covariances, we can derive key properties of our estimators, and search for estimators with desirable properties.
5-14
An Example: bg1
Yi  b X i  e i
E(e i )  0
Var(e i )  s 2
Cov(e i ,e j )  0, for i  j
X's fixed across samples (so we can treat it as a constant).
1 n Yi
b g1  
n i1 X i
In our simulations, b g1 appeared to give estimates close to b.
Was this an accident, or does b g1 on average give us b ?
5-15
An Example: bg1 (OK on average)
$$E(\beta_{g1}) = E\Big(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{\beta X_i + \varepsilon_i}{X_i}\Big)$$
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i) = \frac{1}{n}\,n\beta + 0 = \beta$$

On average, $\beta_{g1} = \beta$: that is, $E(\beta_{g1}) = \beta$.
Using the DGP and the algebra of expectations, we conclude that $\beta_{g1}$ is unbiased.
5-16
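A quick Monte Carlo check of this result, again with hypothetical β, σ, and X’s: averaging $\beta_{g1}$ over many simulated samples should land very close to the true β.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, sigma = 0.5, 2.0                               # hypothetical true slope and noise level
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])        # fixed across samples

eps = rng.normal(0.0, sigma, size=(10_000, X.size))  # E(eps)=0, Var(eps)=sigma^2, independent
Y = beta * X + eps                                   # 10,000 simulated samples from the DGP
beta_g1 = (Y / X).mean(axis=1)                       # (1/n) * sum of Y_i / X_i, per sample
print(beta_g1.mean())                                # close to 0.5, consistent with E(beta_g1) = beta
```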
Checking Understanding
$$E(\beta_{g1}) = E\Big(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{\beta X_i + \varepsilon_i}{X_i}\Big)$$
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i) = \frac{1}{n}\,n\beta + 0 = \beta$$
$$E(\beta_{g1}) = \beta$$
Question: which DGP assumptions did we need to use?
5-17
Which assumption used?
$$E(\beta_{g1}) = E\Big(\frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{Y_i}{X_i}\Big) = \frac{1}{n}\sum_{i=1}^{n} E\Big(\frac{\beta X_i + \varepsilon_i}{X_i}\Big)$$
Here we used $Y_i = \beta X_i + \varepsilon_i$.
$$= \frac{1}{n}\sum_{i=1}^{n} E(\beta) + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{X_i}E(\varepsilon_i)$$
Here we used the assumption that the X's are fixed across samples.
$$= \frac{1}{n}\,n\beta + 0 = \beta$$
Here we used $E(\varepsilon_i) = 0$.
5-18
Checking Point 2:
We did NOT use the assumptions about the variance and covariances of $\varepsilon_i$.
We will use these assumptions when we
calculate the variance of the estimator.
5-19
Linear Estimators
• bg1 is unbiased. Can we generalize?
• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.
$$\hat{\beta} = \sum w_i Y_i$$
5-20
Linear Estimators (weighted sum)
• Linear estimator: $\hat{\beta} = \sum w_i Y_i$
• Example: bg1 is a linear estimator:
$$\beta_{g1} = \frac{1}{n}\sum \frac{Y_i}{X_i} = \sum w_i Y_i, \qquad w_i = \frac{1}{nX_i}$$
5-21
A class of Linear Estimators
1) Mean of Ratios:
$$\beta_{g1} = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
2) Ratio of Means:
$$\beta_{g2} = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
3) Mean of Ratio of Changes:
$$\beta_{g3} = \frac{1}{n-1}\sum \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\Big(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\Big)$$
4) Ordinary Least Squares:
$$\beta_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
• All of our “best guesses” are linear estimators!
5-22
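The sketch below (hypothetical β, σ, and X’s, not taken from the slides) computes all four estimators explicitly as weighted sums $w \cdot Y$, which is exactly the sense in which each one is “linear.”

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma = 0.5, 2.0                           # hypothetical values
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])    # fixed across samples
Y = beta * X + rng.normal(0.0, sigma, size=X.size)
n = X.size

w1 = 1.0 / (n * X)                               # 1) mean of ratios
w2 = np.ones(n) / X.sum()                        # 2) ratio of means
w4 = X / np.sum(X**2)                            # 4) OLS through the origin

# 3) mean of ratio of changes: Y_i enters the i-th difference quotient with a plus
#    sign and the (i+1)-th quotient with a minus sign.
w3 = np.zeros(n)
dX = np.diff(X)                                  # X_i - X_{i-1}
w3[1:] += 1.0 / ((n - 1) * dX)
w3[:-1] -= 1.0 / ((n - 1) * dX)

for name, w in [("bg1", w1), ("bg2", w2), ("bg3", w3), ("bg4", w4)]:
    print(name, w @ Y)                           # each estimate is just a weighted sum of the Y's
```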
Expectation of Linear Estimators
Yi  b X i  e i
E (e i )  0
Var (e i )  s 2
Cov(e i , e j )  0, for i  j
X 's fixed across samples (so we can treat it as a constant).
n
bˆ   wiYi
i 1
n
n
n
i 1
i 1
i 1
E ( bˆ )  E ( wiYi )   wi E (Yi )  wi E ( b X i  e i )
n
n
i 1
i 1
  wi [ E ( b X i )  E (e i )] b  wi X i
5-23
Condition for Unbiasedness
$$\hat{\beta} = \sum_{i=1}^{n} w_i Y_i, \qquad E(\hat{\beta}) = \beta \sum_{i=1}^{n} w_i X_i$$
A linear estimator is unbiased if $\sum_{i=1}^{n} w_i X_i = 1$.
5-24
Check others
• A linear estimator is unbiased if $\sum w_i X_i = 1$.
• Are bg2 and bg4 unbiased?
2) Ratio of Means:
$$\beta_{g2} = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}, \qquad \sum w_i X_i = \frac{\sum X_i}{\sum X_j} = 1$$
4) Ordinary Least Squares:
$$\beta_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}, \qquad \sum w_i X_i = \frac{\sum X_i^2}{\sum X_j^2} = 1$$
• Both sums equal 1, so bg2 and bg4 are unbiased.
5-25
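A two-line numerical confirmation of the same condition, for any fixed X’s you care to plug in (the values below are made up):

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])    # any fixed X's (made up)

w2 = np.ones(X.size) / X.sum()                   # ratio-of-means weights
w4 = X / np.sum(X**2)                            # OLS weights

print(w2 @ X, w4 @ X)                            # both equal 1: the unbiasedness condition holds
```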
Better unbiased estimator
• Similar calculations hold for bg3
• All 4 of our “best guesses” are unbiased.
• But bg4 did much better than bg3. Not all
unbiased estimators are created equal.
• We want an unbiased estimator with a low
mean squared error.
5-26
First: A Puzzle…
• Suppose n = 1
– Would you like a big X or a small X for
that observation?
– Why?
5-27
What Observations
Receive More Weight?
1) Mean of Ratios:
$$\beta_{g1} = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
2) Ratio of Means:
$$\beta_{g2} = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
3) Mean of Ratio of Changes:
$$\beta_{g3} = \frac{1}{n-1}\sum \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\Big(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\Big)$$
4) Ordinary Least Squares:
$$\beta_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
5-28
(Stat. significant)?
• bg1 puts more weight on observations with low values of X.
• bg3 puts more weight on observations with low values of X, relative to neighboring observations.
• These estimators did very poorly in the simulations.
$$\beta_{g1} = \frac{1}{n}\sum \frac{Y_i}{X_i}, \qquad w_i = \frac{1}{nX_i}$$
$$\beta_{g3} = \frac{1}{n-1}\sum \frac{Y_i - Y_{i-1}}{X_i - X_{i-1}}, \qquad w_i = \frac{1}{n-1}\Big(\frac{1}{X_i - X_{i-1}} - \frac{1}{X_{i+1} - X_i}\Big)$$
5-29
What Observations
Receive More Weight? (cont.)
• bg2 weights all observations equally.
• bg4 puts more weight on observations with high values of X.
• These estimators did very well in the simulations.
$$\beta_{g2} = \frac{\sum Y_i}{\sum X_i}, \qquad w_i = \frac{1}{\sum X_j}$$
$$\beta_{g4} = \frac{\sum Y_i X_i}{\sum X_j^2}, \qquad w_i = \frac{X_i}{\sum X_j^2}$$
5-30
Why Weight More Heavily Observations
With High X ’s?
• Under our Gauss–Markov DGP, the disturbances are drawn from the same distribution for every value of X.
• To compare a high-X choice and a low-X choice, ask what effect a given disturbance will have on each.
5-31
Figure 3.1 Effects of a Disturbance for
Small and Large X
5-32
Linear Estimators and Efficiency
• For our DGP, good estimators place more weight on observations with high values of X.
• Inferences from these observations are less sensitive to the effects of the same disturbance ε.
• Only one of our “best guesses” had this property.
• bg4 (a.k.a. OLS) dominated the other estimators.
• Can we do even better?
5-33
Min. MSE
• Mean Squared Error = Variance + Bias²
• To have a low Mean Squared Error,
we want two things: a low bias and a
low variance.
5-34
Need Variance
• An unbiased estimator with a low variance will tend to give answers close to the true value of $\beta$.
• Using the algebra of variances and our DGP, we can calculate the variance of our estimators.
5-35
Algebra of Variances
(1) Var (k )  0
(2) Var (kY )  k 2 ·Var (Y )
(3) Var (k  Y )  Var (Y )
(4) Var ( X  Y )  Var ( X )  Var (Y )  2Cov( X , Y )
n
n
i 1
i 1
n
n
(5) Var ( Yi )   Var (Yi )   Cov(Yi , Y j )
i 1 j 1
j i
• One virtue of independent observations is that
Cov( Yi ,Yj ) = 0, killing all the cross-terms in the
variance of the sum.
5-36
Back Again to Our Baseline DGP: Gauss–Markov
• Our benchmark DGP: Gauss–Markov
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X ’s fixed across samples
We will refer to this DGP (very) frequently.
5-37
Variance of OLS
$$Var(\hat{\beta}_{OLS}) = Var\Big(\frac{\sum X_i Y_i}{\sum X_k^2}\Big) = \sum_{i=1}^{n} Var\Big(\frac{X_i Y_i}{\sum X_k^2}\Big) + \sum_{\substack{j=1 \\ j \neq i}}^{n} Cov\Big(\frac{X_i Y_i}{\sum X_k^2}, \frac{X_j Y_j}{\sum X_k^2}\Big)$$
$$= \sum \Big(\frac{X_i}{\sum X_k^2}\Big)^2 Var(Y_i) + 0 = \sum \Big(\frac{X_i}{\sum X_k^2}\Big)^2 Var(\beta X_i + \varepsilon_i)$$
5-38
Variance of OLS (cont.)
$$Var(\hat{\beta}_{OLS}) = \sum \Big(\frac{X_i}{\sum X_k^2}\Big)^2 Var(\beta X_i + \varepsilon_i) = \sum \Big(\frac{X_i}{\sum X_k^2}\Big)^2 \big(0 + Var(\varepsilon_i) + 0\big)$$
$$= \sum \Big(\frac{X_i}{\sum X_k^2}\Big)^2 \sigma^2 = \sigma^2 \Big(\frac{1}{\sum X_k^2}\Big)^2 \sum X_i^2 = \frac{\sigma^2}{\sum X_k^2}$$
• Note: the higher $\sum X_k^2$ is, the lower the variance.
5-39
Variance of a Linear Estimator
• More generally:
$$Var\Big(\sum w_i Y_i\Big) = \sum Var(w_i Y_i) + 2\,(\text{covariance terms}) = \sum Var(w_i Y_i) + 0 = \sum w_i^2\,Var(Y_i)$$
$$= \sum w_i^2\,Var(\beta X_i + \varepsilon_i) = \sum w_i^2\big(0 + Var(\varepsilon_i) + 0\big) = \sigma^2 \sum w_i^2$$
5-40
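The formula $\sigma^2 \sum w_i^2$ can be checked against simulation. The sketch below (hypothetical β, σ, and X’s) compares the analytic variance with the sampling variance across many simulated samples for bg1 and bg4; OLS comes out with the markedly smaller variance.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, sigma = 0.5, 2.0                           # hypothetical values
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])    # fixed across samples
n = X.size

w1 = 1.0 / (n * X)                               # mean-of-ratios weights
w4 = X / np.sum(X**2)                            # OLS weights

Y = beta * X + rng.normal(0.0, sigma, size=(20_000, n))   # 20,000 simulated samples

for name, w in [("bg1", w1), ("bg4", w4)]:
    analytic = sigma**2 * np.sum(w**2)           # Var(sum w_i Y_i) = sigma^2 * sum w_i^2
    simulated = (Y @ w).var()                    # variance of the estimates across samples
    print(name, analytic, simulated)             # the two numbers agree; OLS is smaller
```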
Variance of a Linear Estimator (cont.)
• The algebra of expectations and variances gives us exact results where the Monte Carlo simulations gave only approximations.
• The exact results apply to ANY
model meeting our Gauss–Markov
assumptions.
5-41
Variance of a Linear Estimator (cont.)
• We now know mathematically that bg1–bg4
are all unbiased estimators of b under our
Gauss–Markov assumptions.
• Our Monte Carlo simulations also suggest that bg4 is the best of these four estimators, in that it is more efficient than the others.
• They are all unbiased (we know from the
algebra), but bg4 appears to have a smaller
variance than the other 3.
5-42
Variance of a Linear Estimator (cont.)
• Is there an unbiased linear estimator
better (i.e., more efficient) than bg4?
– What is the Best Linear Unbiased Estimator?
– How do we find the BLUE estimator?
5-43
BLUE Estimators
• Mean Squared Error = Variance + Bias²
• An unbiased estimator is right
“on average”
• In practice, we don’t get to average. We
see only one draw from the DGP.
5-44
BLUE Estimators (Trade-off ??)
• Some analysts would prefer an estimator with a small bias if it gave them a large reduction in variance.
• What good is being right on average if
you’re likely to be very wrong in your
one draw?
5-45
BLUE Estimators (cont.)
• Mean Squared Error = Variance + Bias²
• In a particular application, there may be
a favorable trade-off between accepting a
little bias in return for a lot less variance.
• We will NOT look for these trade-offs.
• Only after we have made sure our
estimator is unbiased will we try to make
the variance small.
5-46
BLUE Estimators (cont.)
A Strategy for Finding the Best Linear
Unbiased Estimator:
1. Start with linear estimators: wiYi
2. Impose the unbiasedness condition wiXi=1
3. Calculate the variance of a linear estimator:
Var(wiYi) =s2wi2
– Use calculus to find the wi that give the smallest
variance subject to the unbiasedness condition
Result: the BLUE Estimator for Our DGP
5-47
BLUE Estimators (cont.)
Using calculus, we would find $w_i = \dfrac{X_i}{\sum X_j^2}$.
This formula is OLS!
OLS is the Best Linear Unbiased Estimator for the Gauss–Markov DGP.
This result is called the Gauss–Markov Theorem.
5-48
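The calculus step is not spelled out on the slides; a sketch of it using a Lagrange multiplier looks like this (σ² is a positive constant, so it does not affect which weights minimize the variance):

```latex
\[
\begin{aligned}
&\min_{w_1,\dots,w_n}\ \sigma^2 \sum_i w_i^2
 \quad\text{subject to}\quad \sum_i w_i X_i = 1, \\[4pt]
&\mathcal{L} = \sigma^2 \sum_i w_i^2 - \lambda\Big(\sum_i w_i X_i - 1\Big), \qquad
 \frac{\partial\mathcal{L}}{\partial w_i} = 2\sigma^2 w_i - \lambda X_i = 0
 \;\Rightarrow\; w_i = \frac{\lambda}{2\sigma^2}\,X_i, \\[4pt]
&\sum_i w_i X_i = 1 \;\Rightarrow\; \frac{\lambda}{2\sigma^2}\sum_i X_i^2 = 1
 \;\Rightarrow\; w_i = \frac{X_i}{\sum_j X_j^2}.
\end{aligned}
\]
```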
BLUE Estimators (cont.)
• OLS is a very good strategy for the
Gauss–Markov DGP.
• OLS is unbiased: our guesses are right
on average.
• OLS is efficient: it has a small variance
(or at least the smallest possible variance
for unbiased linear estimators).
• Our guesses will tend to be close to right (or
at least as close to right as we can get; the
minimum variance could still be pretty large!)
5-49
BLUE Estimator (cont.)
• According to the Gauss–Markov Theorem,
OLS is the BLUE Estimator for the
Gauss–Markov DGP.
• We will study other DGPs. For any DGP,
we can follow this same procedure:
– Look at Linear Estimators
– Impose the unbiasedness conditions
– Minimize the variance of the estimator
5-50
Example: Cobb–Douglas Production
Functions (Chapter 3.7)
• A classic production function in economics is
the Cobb–Douglas function.
• $Y = a L^{\beta} K^{1-\beta}$
• If firms pay workers and capital their marginal product, then worker compensation equals a fraction $\beta$ of total output (or national income).
5-51
Example: Cobb–Douglas
• To illustrate, we randomly pick 8 years
between 1900 and 1995. For each year,
we observe total worker compensation
and national income.
• We use bg1, bg2, bg3, and bg4 to estimate
$$\text{Compensation} = \beta \cdot \text{National Income} + \varepsilon$$
5-52
TABLE 3.6 Estimates of the Cobb–Douglas
Parameter b, with Standard Errors
5-53
TABLE 3.7
Outputs from
a Regression* of
Compensation on
National Income
5-54
Example: Cobb–Douglas
• All 4 of our estimators give very
similar estimates.
• However, bg2 and bg4 have much smaller
standard errors. (We will see the value of
small standard errors when we cover
hypothesis tests.)
• Using our estimate from bg4, 0.738, a
1 billion dollar increase in National Income
is predicted to increase total worker
compensation by 0.738 billion dollars.
5-55
A New DGP
• Most lines do not go through the origin.
• Let’s add an intercept term and find the
BLUE Estimator (from Chapter 4).
5-56
Gauss–Markov with an Intercept
Yi  b0  b1 X i  e i (i  1...n)
E(e i )  0
Var(e i )  s
2
Cov(e i ,e j )  0, i  j
X's fixed across samples.
All we have done is add a b0 .
5-57
Gauss–Markov with an Intercept (cont.)
• Example: let’s estimate the effect of income
on college financial aid.
• Students whose families have 0 income do
not receive 0 aid. They receive a lot of aid.
• $E[\text{financial aid} \mid \text{family income}] = \beta_0 + \beta_1 \cdot (\text{family income})$
5-58
Gauss–Markov with an Intercept (cont.)
5-59
Gauss–Markov with an Intercept (cont.)
• How do we construct a BLUE Estimator?
• Step 1: focus on linear estimators.
• Step 2: calculate the expectation of a linear
estimator for this DGP, and find the condition
for the estimator to be unbiased.
• Step 3: calculate the variance of a linear
estimator. Find the weights that minimize this
variance subject to the unbiasedness
constraint.
5-60
Expectation of a Linear Estimator
E ( bˆ )  E   wiYi    E ( wiYi )
  wi E (Yi )   wi E ( b 0  b1 X i  e i )
   wi E ( b 0 )  wi E ( b1 X i )  wi E (e i ) 
   b 0 wi  b1wi X i  0
 b 0  wi  b1  wi X i
5-61
Checking Understanding
E ( bˆ )  b0  wi  b1  wi X i
• Question: What are the conditions for an
estimator of b1 to be unbiased? What
are the conditions for an estimator of b0
to be unbiased?
5-62
Checking Understanding (cont.)
E ( bˆ )  b0  wi  b1  wi X i
• When is the expectation equal to b1?
– When wi = 0 and wiXi = 1
• What if we were estimating b0? When is the
expectation equal to b0?
– When wi = 1 and wiXi = 0
• To estimate 1 parameter, we needed 1 unbiasedness
condition. To estimate 2 parameters, we need 2
unbiasedness conditions.
5-63
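A quick numerical check (with arbitrary, made-up fixed X’s) that the weights used by OLS with an intercept, which appear later on slide 5-68, satisfy both conditions:

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])         # any fixed X's (made up)
w = (X - X.mean()) / np.sum((X - X.mean())**2)         # OLS-with-intercept weights (slide 5-68)

print(w.sum(), w @ X)                                  # ~0 and ~1: both unbiasedness conditions hold
```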
Variance of a Linear Estimator
Var ( bˆ )  Var   wY
i i    Var  wY
i i0
  wi Var  b 0  b1 X i  e i 
2
  wi
2
0  0  Var (e i )  0
  wi s
2
2
• Adding a constant to the DGP does NOT
change the variance of the estimator.
5-64
BLUE Estimator
To find the BLUE estimator of $\beta_1$, we minimize $\sigma^2 \sum w_i^2$ subject to the constraints
$$\sum w_i = 0, \qquad \sum w_i X_i = 1$$
Solution:
$$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
5-65
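Here is a sketch of this estimator on simulated data (hypothetical β₀, β₁, σ, and X’s), cross-checked against NumPy’s built-in least-squares line fit:

```python
import numpy as np

rng = np.random.default_rng(4)
b0, b1, sigma = 3.0, 0.5, 2.0                    # hypothetical intercept, slope, noise level
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])    # fixed across samples
Y = b0 + b1 * X + rng.normal(0.0, sigma, size=X.size)

# BLUE / OLS slope with an intercept: centered cross-products over centered squares.
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)

# Cross-check against NumPy's degree-1 least-squares fit (returns [slope, intercept]).
slope_check, intercept_check = np.polyfit(X, Y, deg=1)
print(b1_hat, slope_check)                       # identical up to floating-point error
```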
BLUE Estimator of b1
( X i  X )(Yi  Y )
ˆ
b1   n
2
(X j  X )
j 1
• This estimator is OLS for the DGP with
an intercept.
• It is the Best (minimum variance) Linear
Unbiased Estimator for the Gauss–Markov
DGP with an intercept.
5-66
BLUE Estimator of b1 (cont.)
( X i  X )(Yi  Y )
ˆ
b1   n
2
(X j  X )
j 1
• This formula is very similar to the
formula for OLS without an intercept.
• However, now we subtract the mean
values from both X and Y.
5-67
BLUE Estimator of b1 (cont.)
( X i  X )(Yi  Y )
ˆ
b1   n
2
(
X

X
)
 j
j 1
• OLS places more weight on high values of:
Xi  X
• Observations are more valuable if X is far
away from its mean.
5-68
BLUE Estimator of b1 (cont.)
$$w_i = \frac{X_i - \bar{X}}{\sum (X_j - \bar{X})^2}$$
$$Var(\hat{\beta}_1) = \sigma^2 \sum w_i^2 = \sigma^2 \sum \left(\frac{X_i - \bar{X}}{\sum (X_j - \bar{X})^2}\right)^2 = \sigma^2 \frac{\sum (X_i - \bar{X})^2}{\left[\sum (X_j - \bar{X})^2\right]^2} = \frac{\sigma^2}{\sum (X_j - \bar{X})^2}$$
5-69
BLUE Estimator of b0
• The easiest way to estimate the intercept:
bˆ0  Y  bˆ1 X
• Notice that the fitted regression line always
goes through the point
( X ,Y )
• Our fitted regression line passes through “the
middle of the data.”
5-70
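Continuing the same kind of sketch (hypothetical values), the intercept estimate and the claim that the fitted line passes through $(\bar{X}, \bar{Y})$ can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(5)
b0, b1, sigma = 3.0, 0.5, 2.0                    # hypothetical values
X = np.array([1.0, 2.0, 4.0, 5.0, 8.0, 10.0])
Y = b0 + b1 * X + rng.normal(0.0, sigma, size=X.size)

b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0_hat = Y.mean() - b1_hat * X.mean()            # intercept estimate from the slide's formula

# Evaluating the fitted line at X-bar recovers Y-bar: the line passes through (X-bar, Y-bar).
print(b0_hat + b1_hat * X.mean(), Y.mean())
```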
Example: The Phillips Curve
• Phillips argued that nations
face a trade-off between inflation
and unemployment.
• He used annual British data on wage
inflation and unemployment from
1861–1913 and 1914–1957 to regress
inflation on unemployment.
5-71
Example: The Phillips Curve (cont.)
• The fitted regression line for 1861–1913
did a good job predicting the data from
1914 to 1957.
• “Out of sample predictions” are a strong
test of an econometric model.
5-72
Example: The Phillips Curve (cont.)
• The US data from 1958–1969 also
suggest a trade-off between inflation
and unemployment.
Unemploymentt  0.06 - 0.55·Inflationt
bˆ0  0.06
bˆ1  0.55
5-73
Example: The Phillips Curve (cont.)
Unemploymentt  0.06 - 0.55·Inflationt
• How do we interpret these numbers?
• If Inflation were 0, our best guess of
Unemployment would be 0.06
percentage points.
• A one percentage point increase of Inflation
decreases our predicted Unemployment level
by 0.55 percentage points.
5-74
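A tiny prediction sketch using the fitted line above, assuming both rates are measured as decimal fractions (the 2% inflation value is made up for illustration):

```python
# Fitted Phillips-curve line reported on the slide; rates as decimal fractions (assumption).
b0_hat, b1_hat = 0.06, -0.55
inflation = 0.02                                   # hypothetical 2% inflation rate
predicted_unemployment = b0_hat + b1_hat * inflation
print(predicted_unemployment)                      # 0.049, i.e. about 4.9% unemployment
```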
Figure 4.2 U.S. Unemployment and
Inflation, 1958–1969
5-75
TABLE 4.1 The Phillips Curve
5-76
Example: The Phillips Curve
• We no longer need to assume our
regression line goes through the origin.
• We have learned how to estimate
an intercept.
• A straight line doesn’t seem to do a
great job here. Can we do better?
5-77
Review
• As a starting place, we need to write down all
our assumptions about the way the
underlying process works, and about how that
process led to our data.
• These assumptions are called the “Data
Generating Process.”
• Then we can derive estimators that have
good properties for the Data Generating
Process we have assumed.
5-78
Review: The Gauss–Markov DGP
• $Y_i = \beta X_i + \varepsilon_i$
• $E(\varepsilon_i) = 0$
• $Var(\varepsilon_i) = \sigma^2$
• $Cov(\varepsilon_i, \varepsilon_j) = 0$, for $i \neq j$
• X ’s fixed across samples (so we can treat them like constants).
• We want to estimate $\beta$.
5-79
Review
• We will focus on linear estimators.
• Linear estimator: a weighted sum of the Y ’s.
$$\hat{\beta} = \sum w_i Y_i$$
5-80
Review (cont.)
Yi  b X i  e i
E (e i )  0
Var (e i )  s 2
Cov(e i , e j )  0, for i  j
X 's fixed across samples (so we can treat it as a constant).
n
bˆ   wiYi
i 1
n
E ( bˆ )  b  wi X i
i 1
n
A linear estimator is unbiased if
w X
i 1
i
i
 1.
5-81
Review (cont.)
Yi  b X i  e i
E(e i )  0
Var(e i )  s 2
Cov(e i ,e j )  0, for i  j
X's fixed across samples (so we can treat it as a constant).
n
A linear estimator is unbiased if  wi X i  1.
i1
Many linear estimators will be unbiased. How do I pick the "best"
linear unbiased estimator (BLUE)?
5-82
Review: BLUE Estimators
A Strategy for Finding the Best Linear
Unbiased Estimator:
1. Start with linear estimators: wiYi
2. Impose the unbiasedness condition wiXi = 1
3. Use calculus to find the wi that give the smallest
variance subject to the unbiasedness condition.
Result: The BLUE Estimator for our DGP
5-83
Review: BLUE Estimators (cont.)
• Ordinary Least Squares (OLS) is BLUE
for our Gauss–Markov DGP.
• This result is called the “Gauss–Markov
Theorem.”
5-84
Review: BLUE Estimators (cont.)
• OLS is a very good strategy for the Gauss–
Markov DGP.
• OLS is unbiased: our guesses are right
on average.
• OLS is efficient: it has the smallest possible variance among linear unbiased estimators.
• Our guesses will tend to be close to right
(or at least as close to right as we can get).
• Warning: the minimum variance could still be
pretty large!
5-85
Gauss–Markov with an Intercept
Yi  b0  b1 X i  e i (i  1...n)
E(e i )  0
Var(e i )  s
2
Cov(e i ,e j )  0, i  j
X's fixed across samples.
All we have done is add a b0 .
5-86
Review: BLUE Estimator of b1
( X i  X )(Yi  Y )
ˆ
b1   n
2
(
X

X
)
 j
j 1
• This estimator is OLS for the DGP with
an intercept.
• It is the Best (minimum variance) Linear
Unbiased Estimator for the Gauss–Markov
DGP with an intercept.
5-87
BLUE Estimator of b0
• The easiest way to estimate the intercept:
bˆ0  Y  bˆ1 X
• Notice that the fitted regression line always
goes through the point
( X ,Y )
• Our fitted regression line passes through “the
middle of the data.”
5-88