Lecture 1
Mark Hendricks
University of Chicago
September 2012
Outline
OLS
OLS Inference
OLS Asymptotics
Linear regression model
Consider a linear regression model involving two variables, y and x:

y = β0 + βx + ε

- y is referred to as the regressand, or explained variable.
- x is referred to as the regressor, covariate, or explanatory variable.
- β0 and β are the (constant) parameters of the model.
Multivariate linear regression
In a multivariate regression model with k regressors,

y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
  = β0 + Σ_{j=1}^k βj xj + ε
  = x′β + ε

- The last line defines x such that its first element is the constant 1, and the first element of β is β0.
- Including the regression constant in the vector notation will simplify the algebra, as we will always consider the case where the first regressor is a constant.
Data from the regression model
A sample of n observations is denoted as (yi, xi) for i = 1, 2, …, n:

yi = xi′β + εi

where

xi ≡ (1, x_{i,1}, x_{i,2}, …, x_{i,k})′,    β ≡ (β0, β1, β2, …, βk)′
Sample regression
Consider a sample estimate of β, denoted by b.

- Define ei = yi − xi′b. Thus,

  yi = xi′b + ei

- This is the sample counterpart to the population regression equation.
Ordinary least squares
The ordinary least squares estimator of β minimizes the sum of squared sample errors:

b ≡ arg min_{bo} Σ_{i=1}^n (ei)²
  = arg min_{bo} Σ_{i=1}^n (yi − xi′bo)²
OLS problem
Rewrite the OLS problem in matrix notation,

b ≡ arg min_{bo} e′e
  = arg min_{bo} (Y − X bo)′(Y − X bo)

where

X ≡ (x1′, x2′, …, xn′)′ (the n-row matrix stacking the xi′),    Y ≡ (y1, y2, …, yn)′,    e ≡ (e1, e2, …, en)′
Assumption: Full-rank
Assumption 1: X′X is full rank.

Equivalently, assume that there is no exact linear relationship among any of the regressors.

- Clearly, the existence of the OLS estimator requires that this assumption be satisfied.
- Multicollinearity refers to the case where this assumption fails.
OLS estimate
Solving the minimization problem above gives the OLS estimate:

b = (X′X)⁻¹ X′Y

- This estimate yields sample residuals of

  e = Y − X(X′X)⁻¹X′Y
    = (I − X(X′X)⁻¹X′) Y

- Thus e is orthogonal to X.
- Equivalently, the in-sample correlation between xi and ei is zero.
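As a concrete illustration, here is a minimal numpy sketch on hypothetical simulated data (not from the lecture) that computes b via the normal equations and checks the residual orthogonality:

```python
import numpy as np

# Hypothetical simulated sample: a constant plus k = 2 regressors.
rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # first column is the constant
beta = np.array([1.0, 0.5, -2.0])
Y = X @ beta + rng.normal(size=n)

# OLS estimate b = (X'X)^{-1} X'Y, computed via a linear solve rather than
# an explicit inverse for numerical stability.
b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

print(b)                      # close to beta
print(np.abs(X.T @ e).max())  # X'e = 0 up to rounding error
```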
OLS as linear projection
Using the derivation of b and e, we can write the regression equation as a linear projection,

Y = PY + MY

where M and P are defined as

P ≡ X(X′X)⁻¹X′
M ≡ In − X(X′X)⁻¹X′

- P is an n × n matrix which projects data into the space spanned by the column vectors of X.
- M is an n × n matrix which projects data into the space orthogonal to X.
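Continuing the simulated example above, a quick numerical check of these projection properties (a sketch, not part of the lecture):

```python
# Projection matrices built from the simulated X above.
P = X @ np.linalg.solve(X.T @ X, X.T)   # X (X'X)^{-1} X'
M = np.eye(n) - P

assert np.allclose(P @ Y + M @ Y, Y)    # Y = PY + MY
assert np.allclose(P @ Y, X @ b)        # PY are the fitted values
assert np.allclose(M @ Y, e)            # MY are the residuals
assert np.allclose(P @ P, P)            # projections are idempotent
assert np.allclose(M @ X, 0)            # M annihilates the columns of X
```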
OLS as method of moments
Suppose the population correlation between x and ε is zero:

0 = E[xε]
0 = E[x(y − x′β)]

Thus,

β = (E[xx′])⁻¹ E[xy]

If we estimate β by simply replacing population moments with sample moments,

b = (X′X)⁻¹ X′Y

which is the same as the OLS estimator above.
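In code, the sample-moment version coincides exactly with the normal-equations estimate, since the 1/n factors cancel (continuing the snippet above):

```python
# Sample-moment version of the estimator: (1/n) sums replace expectations.
Sxx = (X.T @ X) / n      # sample analogue of E[xx']
Sxy = (X.T @ Y) / n      # sample analogue of E[xy]
b_mm = np.linalg.solve(Sxx, Sxy)

assert np.allclose(b_mm, b)   # identical to OLS: the 1/n factors cancel
```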
Assumption: Exogeneity
Assumption 2: ε is exogenous to the regressors, x:

E[ε | x] = 0

The exogeneity assumption,

- implies that ε is uncorrelated with x.
- implies that ε is uncorrelated with any function of x.
- does NOT imply that ε is independent of x.
Conditional expectation
It is generally true that given any variable y along with a vector of variables, x, we can write

y = E[y | x] + ε

where ε is simply the prediction error,

ε = y − E[y | x]

Under the exogeneity assumption,

E[y | x] = x′β
Projection and conditional expectation
- Using just the assumption of full-rank, linear projection still gives the best linear conditional forecast:

  b = arg min_{γ} E[(y − x′γ)²]

  That is, the linear projection x′b minimizes the mean-squared error.
- Contrast with assuming linearity and exogeneity. Then the linear projection equals the conditional expectation:

  E[y | x] = x′b
Unbiased statistics
An estimate is unbiased if its expectation equals the population
value.
- Suppose we have a variable x with population mean µx.
- Consider the sample mean estimator, x̄, for a sample of size n.
- Verify that x̄ is unbiased:

  E[x̄] = E[(1/n) Σ_{i=1}^n xi]
       = (1/n) Σ_{i=1}^n E[xi]
       = µx
Is OLS estimate unbiased?
Check if the OLS estimator is unbiased:

E[b] = E[(X′X)⁻¹ X′Y]
     = E[(X′X)⁻¹ X′(Xβ + ε)]
     = β + E[(X′X)⁻¹ X′ε]

But E[b] = β if, and only if,

E[(X′X)⁻¹ X′ε] = 0

which is guaranteed by the exogeneity assumption but not simply by orthogonality.
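A Monte Carlo sketch of this result under assumed exogeneity (errors drawn independently of the regressors, a hypothetical setup): averaging the estimate across many redrawn samples recovers β.

```python
# Monte Carlo: redraw (X, eps) with eps independent of X (hence exogenous)
# and average the OLS estimates across samples.
reps = 5000
draws = np.zeros((reps, 3))
for s in range(reps):
    Xs = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
    eps = rng.normal(size=50)                  # independent of Xs
    ys = Xs @ beta + eps
    draws[s] = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

print(draws.mean(axis=0))   # approximately beta = [1.0, 0.5, -2.0]
```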
Variance of estimator
Continue with the example of the sample mean estimator, x̄, for sample size n.

- Suppose var[x] = σ².
- What is the variance of the sample mean estimator, x̄?

  var[x̄] = σ²/n
Variance of OLS estimator
The variance of the OLS estimator is

var[b | x] = (X′X)⁻¹ X′ΣX (X′X)⁻¹

where Σ = E[εε′ | x].

- So far, no assumptions have been made regarding the distribution of the residuals, ε, beyond their orthogonality.
- Inference regarding regression estimates is heavily influenced by the covariance of the residuals.
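The sandwich form is straightforward to compute. Σ is unobserved, so a plug-in estimate is needed; one common choice (White's heteroscedasticity-robust estimator diag(ei²), an assumption here rather than anything from the lecture) gives, continuing the simulated example:

```python
# Sandwich form of var[b|x] with a plug-in estimate of Sigma.
XtX_inv = np.linalg.inv(X.T @ X)
Sigma_hat = np.diag(e**2)                 # White's plug-in: diag(e_i^2)
V = XtX_inv @ X.T @ Sigma_hat @ X @ XtX_inv
print(np.sqrt(np.diag(V)))                # robust standard errors for b
```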
Heteroscedasticity and autocorrelation of residuals
- Heteroscedasticity refers to the case where the residuals have distinct variances but are uncorrelated,

  Σ = diag(σ1², σ2², …, σn²)

- Autocorrelation refers to the case where residuals are correlated. In this case, Σ is not diagonal.
Assumption: Homoscedastic and orthogonal residuals
Assumption 3: The residuals are uncorrelated across
observations, with identical variances,
Σ = E[εε′ | x] = σ² In
Gauss-Markov Theorem
With these assumptions, the OLS estimator, b, is the minimum
variance linear unbiased estimator of β.
- The assumption on Σ simplifies the variance of the OLS estimator,

  var[b | x] = (X′X)⁻¹ X′(σ² In)X (X′X)⁻¹
             = σ² (X′X)⁻¹

- Any other linear, unbiased estimator of β will have larger variance.
- This is known as the Gauss-Markov theorem. It depends on the above assumptions regarding linearity, exogeneity, full-rank, and residual covariance structure.
OLS Inference
The distribution of the OLS estimates is required in order to assess
statistical significance.
- Above, the mean and variance of b were derived without making any distributional assumptions.
Assumption: Normality of residuals
Assumption 4:
The residuals, ε, are normally distributed:

ε | x ∼ N(0, Σ)
Distribution of OLS estimator
Assumptions 1, 2, 3, 4 imply

b | x ∼ N(β, Ω)

where

Ω = σ² (X′X)⁻¹

Often, these 4 assumptions are referred to as the classical regression model.

- Note that many results were derived without any distributional assumptions.
OLS Z-test
Testing the significance of an element of b would simply be a z-test:

(bj − βj) / (σ ωjj) ∼ Z

where ωjj² is the (j, j) element of (X′X)⁻¹. Thus, σ² ωij² is the (i, j) element of Ω.

- However, this statistic depends on σ, which is unknown.
- Instead one must use the sample estimate of the variance,

  s² = e′e / (n − k − 1)
OLS t-test
The t-test assesses statistical significance of an element of b:

(bj − βj) / (s ωjj) ∼ t(n − k − 1)

which depends only on observable data as well as the hypothesized value, βj.
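For instance, continuing the snippets above, the t-statistics for the hypothesis βj = 0:

```python
# t-statistics for H0: beta_j = 0, using s^2 = e'e / (n - k - 1).
s2 = (e @ e) / (n - k - 1)
se = np.sqrt(s2 * np.diag(XtX_inv))   # s * omega_jj for each j
print(b / se)                         # compare to t(n - k - 1) critical values
```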
R-squared
The R-squared, or coefficient of determination, in a regression is defined as

R²_{y,x} = regression sum of squares / total sum of squares
         = 1 − error sum of squares / total sum of squares

Algebraically, this is

R²_{y,x} = b² Σ_{i=1}^n (xi − x̄)² / Σ_{i=1}^n (yi − ȳ)²
         = 1 − e′e / Σ_{i=1}^n (yi − ȳ)²
R-squared versus correlation
Intuitively, the R-squared is the square of the correlation between y
and the projection of y onto x.
R²_{y,x} = [corr(Y, PY)]²

In a univariate regression of y on x,

R²_{y,x} = [corr(y, x)]²
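A quick numerical confirmation of the two equivalent definitions, continuing the simulated example:

```python
# R-squared two ways: from the sums of squares, and as the squared
# correlation between Y and the fitted values PY = X b.
tss = np.sum((Y - Y.mean())**2)
r2 = 1 - (e @ e) / tss
r2_corr = np.corrcoef(Y, X @ b)[0, 1]**2
assert np.isclose(r2, r2_corr)
```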
Caveat: Regressing on a constant
The interpretation and formula for R-squared do not hold if there is no constant regressor.

- Recall that the first element of x is the constant 1, and the first column of X is a column of 1’s.
- Including a constant in the regression is equivalent to running a regression with demeaned data.
- Running a regression on just a constant regressor, and nothing else, would simply pick up the mean of the data.
- Thus, simply shifting the sample data will improve the fit as it lines up closer to the sample mean.

Note that throughout the derivations above, the constant regressor guarantees that the mean of e will be zero.
OLS F-test
An F-test will determine the joint significance of the linear regression:

[R²_{y,x} / (1 − R²_{y,x})] · [(n − k − 1) / (k − 2)] ∼ F(k − 2, n − k − 1)

Namely, this tests whether all coefficients are jointly equal to zero.

- Note the use of the R-squared stat.
- It is simple to generalize this to test whether b is jointly equal to a non-zero hypothesis vector, β*.
Problems with multicollinearity
If regressor j is highly correlated with the other regressors, then the variance of the coefficient estimate can be written as

var[bj | x] = (1 / (1 − R²_{xj, x[−j]})) · σ² / Σ_{i=1}^n (x_{i,j} − x̄j)²

where R²_{xj, x[−j]} denotes the R-squared from regressing xj on the remaining regressors.
Variance inflation factor
The term

1 / (1 − R²_{xj, x[−j]})

is known as the variance inflation factor.

- The VIF is closely related to the condition number of X′X.
- The condition number in linear algebra measures the sensitivity of inverting a matrix.
- It compares the largest and smallest eigenvalues.
- Most software packages will warn the user if the condition number (VIF) is too big.
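A sketch of the VIF computation via the auxiliary regression, continuing the simulated example:

```python
# VIF for regressor j: regress x_j on the remaining regressors (and the
# constant), then compute 1 / (1 - R^2) from that auxiliary regression.
def vif(X, j):
    xj = X[:, j]
    X_rest = np.delete(X, j, axis=1)
    g = np.linalg.solve(X_rest.T @ X_rest, X_rest.T @ xj)
    u = xj - X_rest @ g
    r2_aux = 1 - (u @ u) / np.sum((xj - xj.mean())**2)
    return 1 / (1 - r2_aux)

print([vif(X, j) for j in (1, 2)])   # near 1: the simulated regressors are independent
```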
Variation in regressors
The sample variance of xj reduces the variance of the OLS estimate bj.

- Again write

  var[bj | x] = (1 / (1 − R²_{xj, x[−j]})) · σ² / Σ_{i=1}^n (x_{i,j} − x̄j)²

- The denominator of the second term is the scaled sample variance.
- If there is not enough variation in the regressor data, then the OLS estimation cannot precisely estimate βj.
Non-normality
Applications often do not satisfy Assumption 4, upon which the
inference results relied.
- However, the asymptotic distribution of the OLS estimate is still normal under much weaker assumptions.
- In practice, inference often relies on having large data sets and appealing to the asymptotic results.
Assumption: Orthogonality of population residuals
Assumption 5: The population residuals are uncorrelated with the regressors:

E[xε] = 0

- This assumption is much weaker than Assumption 2.
- This is a restriction on the population variables, not the fitted estimates, which have zero correlation by construction.
Consistency
A sample statistic is consistent if it converges to the true
population value in probability.
- Suppose that Assumptions 1 and 5 hold.
- Then the OLS estimator, b, is consistent:

  plim b = β

- In practice, more attention is paid to having a consistent estimator than an unbiased estimator, due to the weaker assumption.
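A hypothetical simulation of consistency (with non-normal, here Student-t, errors that are still uncorrelated with x): the estimation error shrinks as the sample grows.

```python
# Consistency: the estimate converges to beta as the sample size grows.
for n_big in (100, 10_000, 1_000_000):
    Xs = np.column_stack([np.ones(n_big), rng.normal(size=(n_big, 2))])
    ys = Xs @ beta + rng.standard_t(df=5, size=n_big)
    bs = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    print(n_big, np.abs(bs - beta).max())   # shrinks toward zero
```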
Asymptotic distribution of OLS
Under Assumptions 1, 3, and 5, the OLS estimate is asymptotically normal,

b | x ∼asym N(β, Ω)

where

Ω = σ² (X′X)⁻¹
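A sketch illustrating this asymptotic result under an assumed skewed error distribution (centered exponential draws, chosen here for illustration): the standardized slope estimate behaves like a standard normal even though the errors are far from normal.

```python
# Asymptotic normality with skewed errors: standardize b_1 by its
# theoretical standard deviation sigma * omega_11 (sigma^2 = 1 here).
reps, n_s = 5000, 500
z = np.zeros(reps)
for s in range(reps):
    Xs = np.column_stack([np.ones(n_s), rng.normal(size=n_s)])
    eps = rng.exponential(size=n_s) - 1.0    # mean zero, variance one, skewed
    ys = Xs @ np.array([1.0, 0.5]) + eps
    bs = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    omega2 = np.linalg.inv(Xs.T @ Xs)[1, 1]
    z[s] = (bs[1] - 0.5) / np.sqrt(omega2)
print(z.mean(), z.std())   # approximately 0 and 1, as N(0, 1) predicts
```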
Heteroscedastic and autocorrelated inference
For many applications, particularly in time-series, Assumption 3 is
clearly false.
- For practical purposes, this is not a big problem for inference.
- Standard errors which correct for heteroscedasticity and autocorrelation are widely employed.
- We come back to these later.