Ch. 2 The Simple Regression Model
Econometrics

y = β₀ + β₁x + u

1. Definition of the simple regression model
2. Deriving the OLS estimates
3. Mechanics of OLS
4. Units of measurement & functional form
5. Expected values & variances of the OLS estimators
6. Regression through the origin
2.1 Definition of the model
Equation (2.1), y = β₀ + β₁x + u, defines the simple regression model.
In the model, we typically refer to
- y as the dependent variable,
- x as the independent variable,
- the βs as parameters, and
- u as the error term.
The Concept of the Error Term
u represents factors other than x that affect y. If the other factors in u are held fixed, so that Δu = 0, then Δy = β₁Δx.
Ex. 2.1: yield = β₀ + β₁fertilizer + u (2.3); u includes land quality, rainfall, etc.
Ex. 2.2: wage = β₀ + β₁educ + u (2.4); u includes experience, ability, tenure, etc.
A Simple Assumption for u
The average value of u, the error term, in the population is 0. That is, E(u) = 0.
This is not a restrictive assumption, since we can always use β₀ to normalize E(u) to 0.
To draw ceteris paribus conclusions about how x affects y, we have to hold all other factors (in u) fixed.
Zero Conditional Mean
We need to make a crucial assumption about how u and x are related. We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated. That is,
E(u|x) = E(u) = 0 (2.5 & 2.6),
which implies the population regression function (PRF):
E(y|x) = β₀ + β₁x. (2.8)

[Figure: E(y|x) as a linear function of x, where for any x the distribution f(y) of y is centered about E(y|x) = β₀ + β₁x.]

2.2 Deriving the OLSE
The basic idea of regression is to estimate the population parameters from a sample. Let {(xᵢ, yᵢ): i = 1, …, n} denote a random sample of size n from the population. For each observation in this sample, it will be the case that
yᵢ = β₀ + β₁xᵢ + uᵢ. (2.9)

[Figure: Population regression line E(y|x) = β₀ + β₁x, sample data points (xᵢ, yᵢ), and the associated error terms uᵢ.]

Deriving OLSE using MM
To derive the OLS estimates, we need to realize that our main assumption of E(u|x) = E(u) = 0 also implies that
Cov(x, u) = E(xu) = 0, because Cov(X, Y) = E(XY) − E(X)E(Y). (B.27)
Now we have 2 restrictions to estimate the βs:
E(u) = 0 (2.10)
E(xu) = 0 (2.11)

Deriving OLSE using MM (cont.)
Since u = y − β₀ − β₁x, we can rewrite:
E(u) = E(y − β₀ − β₁x) = 0 (2.12)
E(xu) = E[x(y − β₀ − β₁x)] = 0 (2.13)
These are called moment restrictions. The approach to estimation imposes the population moment restrictions on the sample moments. That is, a sample estimator of E(X), the mean of a population distribution, is simply the arithmetic mean of the sample.
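The method-of-moments principle above can be sketched in a few lines of Python (the function name and data are my own, for illustration only): the population moment E(X) is estimated by its sample analog, the arithmetic mean.

```python
# Method-of-moments idea: estimate the population moment E(X)
# by the corresponding sample moment, the arithmetic mean.
def sample_mean(xs):
    return sum(xs) / len(xs)

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample
print(sample_mean(data))     # -> 5.0
```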
More Derivation of OLS
We want to choose values of the parameters that ensure that the sample versions of our moment restrictions hold. The sample versions are as follows:
n⁻¹ Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0 (2.14)
n⁻¹ Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0 (2.15)

More Derivation of OLS (cont.)
Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as
ȳ = β̂₀ + β̂₁x̄ (2.16), or β̂₀ = ȳ − β̂₁x̄. (2.17)
So the OLS estimated slope is
β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)². (2.19)

Summary of OLS slope estimate
- The slope estimate is the sample covariance between x and y divided by the sample variance of x.
- If x and y are positively (negatively) correlated, the slope will be positive (negative).
- x needs to vary in our sample.

[Figure: Sample regression line ŷ = β̂₀ + β̂₁x, sample data points, and the associated estimated error terms (residuals) ûᵢ.]

Alternate approach to derivation
Given the intuitive idea of fitting a line, we can set up a formal minimization problem:
min Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)². (2.22)
The first-order conditions are almost the same as (2.14) & (2.15):
Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0, Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0.

More OLS
Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible; hence the term "least squares." The residual, û, is an estimate of the error term, u, and is the difference between the fitted line (sample regression function) and the sample point. See (2.18) & Figure 2.3.
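A minimal sketch of the estimator, using formulas (2.17) and (2.19) on hypothetical data (the numbers and variable names are mine, not from the text). It also checks that the first-order conditions (2.14) & (2.15) hold at the estimates, up to floating-point error:

```python
# Hand-rolled OLS on hypothetical data, following (2.17) and (2.19).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope (2.19): sample covariance of x and y over sample variance of x.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
# Intercept (2.17): forces the line through the point of means.
b0 = y_bar - b1 * x_bar

# The first-order conditions (2.14) & (2.15) hold at the estimates.
resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
assert abs(sum(resid)) < 1e-9
assert abs(sum(x * u for x, u in zip(xs, resid))) < 1e-9
```

For this sample, b1 works out to 1.95 and b0 to 0.15; any data with variation in x would do.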
2.3 Properties of OLS
Algebraic properties:
1. The sum of the OLS residuals is zero: Σᵢ₌₁ⁿ ûᵢ = 0 (2.30). Thus, the sample average of the OLS residuals, n⁻¹ Σᵢ₌₁ⁿ ûᵢ, is zero as well.
2. The sample covariance between the regressors and the OLS residuals is zero: Σᵢ₌₁ⁿ xᵢûᵢ = 0. (2.31)
3. The OLS regression line always goes through the sample means: ȳ = β̂₀ + β̂₁x̄.

Algebraic Properties (cont.)
We can think of each observation as being made up of an explained part and an unexplained part:
yᵢ = ŷᵢ + ûᵢ. (2.32)
Then we define the following:
Σ(yᵢ − ȳ)² = SST (total sum of squares) (2.33)
Σ(ŷᵢ − ȳ)² = SSE (explained sum of squares) (2.34)
Σûᵢ² = SSR (residual sum of squares) (2.35)
Then SST = SSE + SSR. (2.36)

Goodness-of-Fit
It is useful to think about how well the sample regression line fits the sample data. From (2.36),
R² = SSE/SST = 1 − SSR/SST. (2.38)
R² indicates the fraction of the sample variation in yᵢ that is explained by the model.

2.4 Measurement Units & Functional Form
If we use the model y* = β₀* + β₁*x* + u* instead of y = β₀ + β₁x + u, where y* = cy and x* = dx, we get
β̂₀* = cβ̂₀ and β̂₁* = (c/d)β̂₁.
Similarly,
β̂₁* = Δy*/Δx* = (Δy/y)/(Δx/x),
where y* = ln y and x* = ln x.

2.5 Means & Variance of OLSE
Now we view the β̂ᵢ as estimators for the parameters βᵢ that appear in the population, which means we study the properties of the distributions of the β̂ᵢ over different random samples from the population.
Unbiased estimator: an estimator whose expected value (the mean of its sampling distribution) equals the population value, regardless of the population value.
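The decomposition (2.36) and R² in (2.38) can be verified numerically. A sketch on hypothetical data, with a hand-rolled fit (names and numbers are mine):

```python
# SST = SSE + SSR decomposition (2.36) and R-squared (2.38).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
sst = sum((y - y_bar) ** 2 for y in ys)                # total
sse = sum((yh - y_bar) ** 2 for yh in fitted)          # explained
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, fitted))  # residual
r2 = sse / sst                                         # equals 1 - ssr/sst

assert abs(sst - (sse + ssr)) < 1e-9
```

For this sample the line fits very tightly, so R² is close to 1.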
Unbiasedness of OLS
Assumptions for unbiasedness:
1. Linear in parameters: y = β₀ + β₁x + u.
2. Random sampling: {(xᵢ, yᵢ): i = 1, 2, …, n}; thus yᵢ = β₀ + β₁xᵢ + uᵢ.
3. Sample variation in the xᵢ: Σ(xᵢ − x̄)² > 0.
4. Zero conditional mean: E(u|x) = 0.

Unbiasedness of OLS (cont.)
In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameter:
β̂₁ = Σ(xᵢ − x̄)yᵢ / Σ(xᵢ − x̄)² = β₁ + Σ(xᵢ − x̄)uᵢ / Σ(xᵢ − x̄)². (2.49), (2.52)
Then
E(β̂₁) = β₁ + Σ(xᵢ − x̄)E(uᵢ|x) / Σ(xᵢ − x̄)² = β₁. (2.53)
We can also get E(β̂₀) = β₀ in the same way.

Unbiasedness Summary
- The OLS estimates of β₁ and β₀ are unbiased.
- The proof of unbiasedness depends on our 4 assumptions; if any assumption fails, then OLS is not necessarily unbiased.
- Remember, unbiasedness is a description of the estimator; in a given sample our estimate may be "near" or "far" from the true parameter.

Variances of the OLS Estimators
Now we know that the sampling distribution of our estimate is centered around the true parameter. We want to think about how spread out this distribution is. It is much easier to think about this variance under an additional assumption, so assume
5. Var(u|x) = σ² (homoskedasticity).
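Unbiasedness is a statement about the estimator across repeated samples, which a small Monte Carlo experiment can illustrate. A sketch with only the standard library; the true parameters, sample size, and error distribution are chosen for illustration:

```python
import random

# Monte Carlo illustration: under assumptions 1-4, the OLS slope
# averages out to the true beta1 across repeated random samples.
random.seed(0)
beta0, beta1, n, reps = 1.0, 2.0, 50, 2000

def ols_slope(xs, ys):
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
           sum((x - x_bar) ** 2 for x in xs)

estimates = []
for _ in range(reps):
    xs = [random.uniform(0, 10) for _ in range(n)]
    us = [random.gauss(0, 1) for _ in range(n)]  # E(u|x) = 0 by construction
    ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]
    estimates.append(ols_slope(xs, ys))

# The average of the 2000 slope estimates should be close to beta1 = 2,
# even though each individual estimate is "near" or "far" from it.
print(sum(estimates) / reps)
```

In any single draw the estimate varies; only the mean of the sampling distribution is pinned down at β₁.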
[Figure: Homoskedastic case — the conditional densities f(y|x) at x₁ and x₂ have the same spread around E(y|x) = β₀ + β₁x.]

Variance of OLSE
σ² is also the unconditional variance, called the error variance, since
Var(u|x) = E(u²|x) − [E(u|x)]², and E(u|x) = 0, so σ² = E(u²|x) = E(u²) = Var(u).
σ, the square root of the error variance, is called the standard deviation of the error. Then we can say
E(y|x) = β₀ + β₁x and Var(y|x) = σ².

[Figure: Heteroskedastic case — the conditional densities f(y|x) at x₁, x₂, x₃ have different spreads around E(y|x) = β₀ + β₁x.]

Variance of OLSE (cont.)
Var(β̂₁) = σ² / Σ(xᵢ − x̄)². (2.57)
- The larger the error variance, σ², the larger the variance of the slope estimate.
- The larger the variability in the xᵢ, the smaller the variance of the slope estimate.
- As a result, a larger sample size should decrease the variance of the slope estimate.
Recall that s.d.(β̂) = √Var(β̂).

Estimating the Error Variance
We don't know the error variance, σ², because we don't observe the errors, uᵢ. What we observe are only the residuals, ûᵢ. So we can use the residuals to form an estimate of the error variance.

Estimating the Error Variance (cont.)
ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ = (β₀ + β₁xᵢ + uᵢ) − β̂₀ − β̂₁xᵢ = uᵢ − (β̂₀ − β₀) − (β̂₁ − β₁)xᵢ.
Then an unbiased estimator of σ² is
σ̂² = (1/(n − 2)) Σ ûᵢ². (2.61)

Estimating the Error Variance (cont.)
σ̂ = √σ̂² is the standard error of the regression. If we substitute σ̂ for σ, then we have the standard error of β̂₁:
se(β̂₁) = σ̂ / √Σ(xᵢ − x̄)².

2.6 Regression through the Origin
Now consider the model without an intercept:
ỹ = β̃₁x. (2.63)
Solving the FOC of the minimization problem, the OLS estimated slope is
β̃₁ = Σ xᵢyᵢ / Σ xᵢ². (2.66)
Recall that an intercept can always normalize E(u) to 0 in the model with β₀.
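A sketch tying the last three results together on hypothetical data: the error-variance estimator (2.61), the standard error of the slope, and the through-the-origin slope (2.66). Names and numbers are mine:

```python
# sigma^2 estimate (2.61), se(b1), and the no-intercept slope (2.66).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)  # (2.61), df = n - 2
se_b1 = (sigma2_hat / sxx) ** 0.5                  # standard error of b1

# Regression through the origin, (2.66): no intercept term.
b1_origin = sum(x * y for x, y in zip(xs, ys)) / sum(x ** 2 for x in xs)
```

Note the n − 2 degrees of freedom in σ̂²: two parameters were estimated to obtain the residuals.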