A. Restating the Model. Armed with these new skills, we can see more clearly how to find the coefficient vector β hat if we are willing to make the assumptions required to run an OLS regression. For the moment, let’s look at the most algebraically simple case, in which we have mean-deviated all of our variables so that we do not need an intercept in our regression model. Then we can simply look at the equation for a regression line, Y = Xβ + e, as an equation of vectors and matrices. Y is now an n x 1 matrix containing n observations of our dependent variable. The matrix X has dimension n x k, and contains n observations of all k of our independent variables. The vector β is a k x 1 vector that will contain one coefficient for each of our k independent variables, and e is an n x 1 vector of errors. Both
β and e are unknown in this equation, but it is β for which we are really anxious to solve.
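As a concrete illustration of these dimensions, here is a minimal numpy sketch (the simulated data, variable names, and coefficient values are my own, not from the notes) that builds a mean-deviated Y and X of the stated shapes:

```python
import numpy as np

# A minimal illustration of the dimensions described above (simulated data,
# not from the notes): n observations of k mean-deviated independent variables.
rng = np.random.default_rng(0)
n, k = 100, 3

X_raw = rng.normal(size=(n, k))                 # n x k matrix of independent variables
beta_true = np.array([[2.0], [-1.0], [0.5]])    # k x 1 coefficient vector (illustrative values)
e = rng.normal(size=(n, 1))                     # n x 1 vector of errors
Y_raw = X_raw @ beta_true + e                   # n x 1 vector of the dependent variable

# Mean-deviate every variable so that no intercept is needed.
X = X_raw - X_raw.mean(axis=0)
Y = Y_raw - Y_raw.mean()

print(Y.shape, X.shape, beta_true.shape, e.shape)   # (100, 1) (100, 3) (3, 1) (100, 1)
```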
B. Solving for β. Notice that Xβ is a conformable product. We might like to multiply every term in this equation by X⁻¹ in order to isolate β, but X will not be square and thus will not be invertible. So we need to go through slightly more (but not very) complicated algebraic steps in order to solve for the coefficient vector:
Y = Xβ + e
X’Y = X’Xβ + X’e
(X’X)⁻¹X’Y = (X’X)⁻¹X’Xβ        (dropping the X’e term, which we will assume goes to zero in expectation; see assumption C below)
β = (X’X)⁻¹X’Y

This should look analogous to the scalar solution β = σ_xy/σ²_xx, the covariance of x and y divided by the variance of x.
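Under the same assumptions as the sketch above (simulated, mean-deviated data of my own construction), the matrix solution translates directly into numpy; the explicit inverse mirrors the formula, while np.linalg.solve on the normal equations is the numerically preferable equivalent:

```python
import numpy as np

# Simulated mean-deviated data as in the earlier sketch.
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)
Y = X @ np.array([[2.0], [-1.0], [0.5]]) + rng.normal(size=(n, 1)); Y -= Y.mean()

# beta_hat = (X'X)^-1 X'Y, following the algebra above.
XtX = X.T @ X                                   # k x k
XtY = X.T @ Y                                   # k x 1
beta_hat = np.linalg.inv(XtX) @ XtY

# Numerically preferable equivalent: solve the normal equations X'X b = X'Y directly.
beta_hat_alt = np.linalg.solve(XtX, XtY)

print(beta_hat.ravel())                         # close to the illustrative true values (2, -1, 0.5)
print(np.allclose(beta_hat, beta_hat_alt))      # True
```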
A. Specification Assumption: μ = Xβ. This is equivalent to E(y) = Xβ, or, for any particular observation, μ_i = x_iβ, and says that we got the specification right. Omitting a relevant explanatory variable means that our specification is wrong and violates this assumption.
B. Full Rank Assumption: (X’X)⁻¹ exists. In order for X’X to be invertible, and for the linear algebra to work, the X matrix of our explanatory variables needs to have full (column) rank. One way to think of this is that all of the columns of X must be linearly independent. Another way to think of this is that because the X’X matrix is (proportional to) the variance-covariance matrix of our explanatory variables, none of our explanatory variables can be perfectly collinear with another IV.
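One hedged way to see this assumption bite numerically: in the sketch below (my own construction, not from the notes), one column of X is an exact linear combination of two others, so X lacks full column rank and (X’X)⁻¹ effectively does not exist:

```python
import numpy as np

# Illustration (my own construction): make one column an exact linear
# combination of two others, so X lacks full column rank.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 2 * x1 - 3 * x2                            # perfectly collinear with x1 and x2

X_bad = np.column_stack([x1, x2, x3])

print(np.linalg.matrix_rank(X_bad))             # 2, not 3: the columns are linearly dependent
print(np.linalg.cond(X_bad.T @ X_bad))          # enormous condition number: (X'X) is numerically singular
```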
C. X is nonstochastic: its elements are fixed. Sometimes we think of our explanatory variables as something that we manipulate (e.g., assignment to treatment and control groups) rather than fixed characteristics determined by the social system. If so, at least we need to be able to assume that the Xs are independent of the errors: knowing the x values for a case gives us no idea about what its error will be. This fits with experiments when random assignment is used, and makes our X’e vector go to zero in expectation. If this is true, we can see that β hat is an unbiased estimator of β using the following algebra:
β̂ = (X’X)⁻¹X’Y
  = (X’X)⁻¹X’(Xβ + e)
  = (X’X)⁻¹X’(Xβ) + (X’X)⁻¹X’e        by the distributive property
  = β + (X’X)⁻¹X’e                    by the associative property
  = β                                 when E(X’e) goes to zero
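A small simulation can make this argument concrete. The sketch below (simulated values of my own choosing) holds X fixed, redraws the errors many times, and checks that β̂ averages out to β:

```python
import numpy as np

# Hold X fixed (nonstochastic) and redraw the errors many times; the average
# of beta_hat across replications should recover beta. Simulated data, my own values.
rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)
beta = np.array([[2.0], [-1.0], [0.5]])
XtX_inv_Xt = np.linalg.inv(X.T @ X) @ X.T       # (X'X)^-1 X'

estimates = []
for _ in range(5000):
    e = rng.normal(size=(n, 1))                 # fresh errors, independent of X
    Y = X @ beta + e
    estimates.append(XtX_inv_Xt @ Y)            # beta_hat = beta + (X'X)^-1 X'e

print(np.mean(estimates, axis=0).ravel())       # approximately (2, -1, 0.5): E(beta_hat) = beta
```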
D. Sphericality: The variance-covariance matrix of our errors, E(ee’) = Ω = σ²I. Note that these are true rather than estimated errors. Remembering what the identity matrix looks like, you should picture the Ω matrix for a regression with three observations as looking like:

Ω = σ²I =
[ σ²  0   0  ]
[ 0   σ²  0  ]
[ 0   0   σ² ]
What does this imply? In a variance-covariance matrix of errors, the i,jth element is the covariance between the ith observation’s error and the jth observation’s error, and the i,ith element is the variance of the ith observation’s error. So the sphericality assumption implies two crucial things for a regression:
i. There is no autocorrelation: E(e_ie_j) = 0 for all i ≠ j. The off-diagonal zeroes tell us that there is no covariance between errors across observations, in expectation.
ii. We have homoskedasticity: E(e_i²) = σ² for all i. The fact that all of the elements along the diagonal of the matrix are the same means that we have constant error variance; there is no systematic variation across our dataset in how well our model predicts outcomes.
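As a quick illustration, the sketch below builds the Ω matrix for three observations with an arbitrarily chosen σ² = 2 and reads off the two implications: a constant diagonal and zero off-diagonal elements:

```python
import numpy as np

# The Omega matrix for three observations under sphericality, with an
# arbitrary illustrative error variance sigma^2 = 2.
sigma2 = 2.0
n = 3
Omega = sigma2 * np.eye(n)                      # E(ee') = sigma^2 * I

print(Omega)
# [[2. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 2.]]
print(np.diag(Omega))                           # constant diagonal: homoskedasticity
print(Omega[0, 1], Omega[1, 2])                 # zero off-diagonals: no autocorrelation
```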
E. The Gauss-Markov Theorem. If all of these assumptions are met, the least squares coefficient vector β hat is the minimum variance linear unbiased estimator of the parameter vector β. You can find a proof of this in places like pages 165-6 of Goldberger’s
A Course in Econometrics, but the point is that OLS is not just curve fitting; it has a theoretical justification that may be as deep as the justification of the linear-normal model under the likelihood theorem.
A. Estimating σ². In order to estimate the Ω matrix and the variance of our coefficient estimates, we need to estimate σ̂², the mean squared residual. We begin by computing ê’ê, where ê is the n x 1 vector of empirically-derived errors for each observation. Notice the vast difference in the dimensionality of ê’ê versus êê’. To finish the estimate, we use:

σ̂² = ê’ê/(n - k)
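A minimal numpy sketch of this calculation (simulated data of my own; the degrees-of-freedom correction n - k follows the formula above):

```python
import numpy as np

# Simulated mean-deviated data; residuals and the mean squared residual.
rng = np.random.default_rng(3)
n, k = 100, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)
Y = X @ np.array([[2.0], [-1.0], [0.5]]) + rng.normal(size=(n, 1)); Y -= Y.mean()

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat                        # n x 1 vector of residuals

print((e_hat.T @ e_hat).shape, (e_hat @ e_hat.T).shape)    # (1, 1) versus (100, 100)

sigma2_hat = (e_hat.T @ e_hat).item() / (n - k)            # mean squared residual
print(sigma2_hat)                               # roughly 1 here, since the simulated errors are standard normal
```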
B. Estimating the Variance of β_i. In order to get our standard errors, we need to estimate the variance of our coefficients. We can do this for all coefficients at once by deriving the variance-covariance matrix of our estimated coefficient vector, β̂.
Var(β̂) = E[(β̂ - β)(β̂ - β)’]
E[(β̂ - β)(β̂ - β)’] = E[(X’X)⁻¹X’e((X’X)⁻¹X’e)’]        using β̂ - β = (X’X)⁻¹X’e from the algebra above
E[(β̂ - β)(β̂ - β)’] = (X’X)⁻¹X’E(ee’)X(X’X)⁻¹           since X is nonstochastic
E[(β̂ - β)(β̂ - β)’] = (X’X)⁻¹X’ΩX(X’X)⁻¹
E[(β̂ - β)(β̂ - β)’] = (X’X)⁻¹X’σ²IX(X’X)⁻¹
E[(β̂ - β)(β̂ - β)’] = σ²(X’X)⁻¹X’IX(X’X)⁻¹              by the sphericality assumption
E[(β̂ - β)(β̂ - β)’] = σ²(X’X)⁻¹
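The last line becomes an estimated variance-covariance matrix once σ² is replaced by σ̂²; a hedged sketch with simulated data:

```python
import numpy as np

# Simulated mean-deviated data; plug sigma^2-hat into sigma^2 (X'X)^-1.
rng = np.random.default_rng(4)
n, k = 100, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)
Y = X @ np.array([[2.0], [-1.0], [0.5]]) + rng.normal(size=(n, 1)); Y -= Y.mean()

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e_hat = Y - X @ beta_hat
sigma2_hat = (e_hat.T @ e_hat).item() / (n - k)

var_beta_hat = sigma2_hat * XtX_inv             # estimated Var(beta_hat) = sigma^2-hat (X'X)^-1
std_errors = np.sqrt(np.diag(var_beta_hat))     # standard errors: square roots of the diagonal

print(beta_hat.ravel())
print(std_errors)
```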
C. Estimating the R-square. Introducing some new terms and very little algebra:
ê’ê = Y’Y - Ŷ’Ŷ
Error Sum of Squares = Total Sum of Squares – Regression Sum of Squares
ESS = TSS – RSS
R-square = RSS/TSS = 1 – ESS/TSS
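The same quantities in a short sketch (simulated, mean-deviated data as before, so TSS = Y’Y needs no further centering):

```python
import numpy as np

# Simulated mean-deviated data; the three sums of squares and R-square.
rng = np.random.default_rng(5)
n, k = 100, 3
X = rng.normal(size=(n, k)); X -= X.mean(axis=0)
Y = X @ np.array([[2.0], [-1.0], [0.5]]) + rng.normal(size=(n, 1)); Y -= Y.mean()

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e_hat = Y - Y_hat

TSS = (Y.T @ Y).item()                          # Total Sum of Squares (Y is already mean-deviated)
RSS = (Y_hat.T @ Y_hat).item()                  # Regression Sum of Squares
ESS = (e_hat.T @ e_hat).item()                  # Error Sum of Squares

print(np.isclose(ESS, TSS - RSS))               # True: ESS = TSS - RSS
print(RSS / TSS, 1 - ESS / TSS)                 # two equivalent expressions for R-square
```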