Economics of Antitrust
Econometrics Lecture Notes
These lecture notes are intended to refresh your memory on the basic tools of
econometrics that will be necessary in order to complete the empirical exercise. It is
assumed that you’ve had some exposure to basic probability, statistics, and econometrics.
This lecture will not cover the formal proofs involved in those subjects. Rather, the focus
will be on outlining the intuition behind the key assumptions and theorems. The goal is
for you (1) to be able to type in the right commands into the computer, (2) to understand
why typing in those commands might be a good idea (i.e., to remember the key theorems
that say good things happen when you do the right procedures), and (3) to understand the
key assumptions underlying the theorems well enough to be able to evaluate their validity
in an application.
Since only some of you may be comfortable with matrix notation, I’ve stated the
assumptions and results both with and without using such notation. In order to keep the
notation simple, when I’m not using matrix notation, I’ve limited attention to the
univariate case. Be assured that all of this analysis generalizes to the case of multivariate
regression, even if you can’t follow the matrix notation.
Preliminaries (Definitions and Notation)
We start with some data. Let N equal the number of observations, with observations
indexed by i = 1, 2, … N. Each observation contains the value of each variable for that
observation. Our goal is to use data on some variables (called regressors, right-hand-side
(RHS) variables, exogenous variables, independent variables, or explanatory variables) to
explain variation in another variable (called the left-hand-side (LHS) variable, the
endogenous variable, the dependent variable, or the explained variable). Let K equal the
number of explanatory variables, with these variables indexed by j = 1, 2, … K. Let Xbar
denote the mean value of the Xi and let Ybar denote the mean value of the Yi.
Note: The explanatory variables include the constant. For example, in the univariate
model Yi = β0 + Xi β1 + εi, there are two explanatory variables: Xi, whose coefficient is
β1, and a constant “variable” that takes the value 1 for all observations, whose coefficient
is β0.
Note: Even though I’ve implied otherwise above, “exogenous” is not a synonym for
“explanatory,” and “endogenous” is not a synonym for “explained.” As will be explained
in more detail below, it is possible for an explanatory variable to be endogenous.
Exogeneity of the RHS variables is an assumption that helps to justify OLS, not a
definition.
Ordinary Least Squares (OLS)
We want to find the effect of X on Y. One way to do this is via ordinary least squares,
illustrated in the following graph.
[Figure: data points and a fitted line, with Y on the vertical axis and X on the horizontal axis.]
The vertical distance between data point i and the line, Yi − B0 − Xi B1, is called the
residual. What OLS does is to pick the slope and intercept of the line that minimize the
sum of squared residuals.
BOLS  arg min
B0 , B1
N
 Y
i 1
i
 B0  X i B1 
2
BOLS  arg min Y  XB  (Y  XB)
T
B
To solve for the OLS estimator, you take the derivative of the above expression with
respect to each B, set the derivatives equal to zero, and solve. (The second-order
condition is automatically satisfied in these problems, so any solution is guaranteed to be
a minimum. Under the identification assumption described below, a unique solution
always exists.)
B1 = Σi (Xi − Xbar)(Yi − Ybar) / Σi (Xi − Xbar)²  (sums over i = 1, …, N)

B0 = Ybar − Xbar·B1

B = (X^T X)^(-1) X^T Y
Note that because Y (and possibly X) are random variables, the OLS estimates, which are
functions of Y, will also be random variables. As random variables, they have means
(expected values) and variances. Since our estimates are random, we would like them to
have some good properties; e.g., we would like the expected value to be equal to or close
to the true value of the parameter, and we would like the variance of the estimates to be
small. The next section deals with the assumptions under which OLS can have nice
properties like these.
In STATA, running OLS is easy: “regress Y X” runs the regression of Y on X and a
constant.
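After the regression runs, the estimates and residuals can be recovered from STATA's stored results (a minimal sketch, again with placeholder variable names):

regress Y X
display _b[X]             // the slope estimate B1
display _b[_cons]         // the intercept estimate B0
predict ehat, residuals   // ehat holds each observation's residual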
Justifications for OLS
Why might OLS be a good idea? Why not minimize the horizontal sum of squared
residuals, the perpendicular sum of squared residuals, or the sum of the absolute values of
the residuals? Why is minimizing the vertical sum of squared residuals better than these
alternatives? The real reason we like OLS is that, under a wide range of different
assumptions, OLS has various “good” properties. Exactly what “good” properties it has
depends on exactly what assumptions you make. For the purposes of this exercise, I’d
like to focus on the Gauss-Markov theorem, which is one of many theorems that start
with some assumptions about the random process generating the data and prove that OLS
has some nice properties.
A1: Linearity
Yi = β0 + Xi β1 + εi
Y=Xβ+ε
Note that this assumption requires linearity in the parameters (β), not necessarily in the
variables (X). If your model has a nonlinear variable (e.g., Yi = β0 + ln(Xi)β1 + εi),
simply define a new variable (e.g., Wi = ln(Xi)) such that the model is linear in the new
variable (e.g., Yi = β0 + Wi β1 + εi). Similarly, εi can be defined as the difference
between Yi and β0 + Xi β1.
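In STATA, the kind of variable transformation described above is a one-liner (a minimal sketch; the variable names are placeholders):

generate W = ln(X)   // define the transformed variable
regress Y W          // the model is now linear in the new variable W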
We call εi the disturbance term. It captures all of the variables that we do not observe
(and therefore cannot account for) that affect Yi, conditional on Xi.
A2: There is “enough” variation in the explanatory variables (The Identification
Condition)
In the univariate case, this assumption requires that the data contain at least two different
values for X.
In the multivariate case, this assumption requires that X be an N by K matrix with rank
K. This not only means that there must be variation in each individual variable (other
than the constant), but also that the variables must be linearly independent. For example,
if income is always exactly 10% of wealth, it would be impossible for a regression to
estimate separate coefficients for the effects of income and wealth on consumption.
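As an illustration, here is roughly what that failure looks like in STATA (a hypothetical sketch in which income is constructed to be exactly 10% of wealth):

generate income = 0.1*wealth
regress consumption income wealth
* STATA notes that one of the collinear variables was omitted
* and reports coefficients only for the rest.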
Here’s a picture that shows what goes wrong if this assumption is violated:
[Figure: a data set in which every observation has the same value of X. Annotation: ANY line where the predicted Y equals the mean value of Y at the observed value of X will minimize the residual sum of squares.]
What happens if you attempt to get a computer to run a regression on data that has this
problem? The exact details vary by which software package you’re using, but the results
will typically immediately reveal that you have a problem. For example, if STATA
encounters this problem, it will drop variables from the regression until the problem goes
away. In the above example, STATA will drop the only variable (X), and will simply
report a horizontal line at the mean value of Y.
A3: Exogeneity
E[εi | X1, X2 … XN] = 0
E[ε | X] = 0
If assumption A3 is satisfied, we say that X is exogenous. If it is not satisfied, we say that
X is endogenous. Note that E[ε | Y] does not generally equal zero (in the univariate
model it equals Y − β0 − β1·E[X | Y], a function of Y), so we say that Y is endogenous.
How can this assumption fail? If there are omitted variables (reflected in the disturbance
term) that are correlated with X, then the expected value of the disturbance term,
conditional on X, will be a function of X, and hence will not equal zero. Similarly, if we
omit the constant from the regression when the true β0 does not equal zero, then the
disturbance absorbs the constant and E[εi] = β0 ≠ 0.
Theorem 1: Under A1-A3, OLS is unbiased.
An estimator is unbiased if E[B] = β.
A4: Homoscedasticity
Var[εi | X1, X2 … XN] = σ²
Var[ε | X] = σ²I
This assumption requires that the variance of each disturbance be a constant that does not
vary with the explanatory variables (in the matrix version, σ²I also requires that the
disturbances be uncorrelated across observations).
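After a regression, STATA offers a quick diagnostic for violations of A4 (a sketch; Y and X are placeholders):

regress Y X
estat hettest   // Breusch-Pagan test: a small p-value is evidence of heteroscedasticity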
Theorem 2 (Gauss-Markov Theorem): Under A1-A4, OLS is BLUE (Best Linear
Unbiased Estimator)
What does this mean? Because the disturbances are random variables, any estimator
based in part on the observed values of Y will also be a random variable. Like all random
variables, such estimators have a mean (i.e., an expected value) and a variance.
An estimator is unbiased if its expected value equals the true value of the parameter.
An estimator, B, is linear if it is a linear function of Y (it can be non-linear in X).
B = f1(X1,X2,…XN)*Y1 + … + fN(X1,X2,…XN)*YN
B = f(X)Y
One estimator is “better” than another if it has lower variance.
Thus, the Gauss-Markov theorem says that, of all possible estimators that are both linear
and unbiased, the OLS estimator has the lowest variance.
Some statisticians and econometricians object to calling a minimum variance estimator
the “best” estimator. They point out that there may be many other criteria for deciding on
the best estimator. For example, you might want to use the estimator with the least
mean-square error (E[(B − β)²]), and accept some small bias in return for lowering the
variance. These people prefer the phrase “MVLUE” (“minimum variance linear unbiased
estimator”) to “BLUE.”
[ASK ME ABOUT A FUNNY STORY ABOUT “BLUE” AT THIS POINT.]
Note: In calculating standard errors, most computer programs (including STATA)
assume A1-A4 by default. If you only want to assume A1-A3, you can usually set an
option to do so (in STATA, use the “robust” option, e.g., reg Y X, robust). Of course,
under A1-A3 alone, OLS is still unbiased, but it is no longer guaranteed to be minimum
variance; under these weaker assumptions there may be better estimators available.
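A side-by-side comparison makes the point concrete (the coefficient estimates are identical; only the standard errors change):

regress Y X           // classical standard errors (assumes A1-A4)
regress Y X, robust   // robust standard errors (assumes only A1-A3)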
A5: Normality
εi ~ N(0, σ²)
ε ~ N(0, σ²I)
Theorem 3 (based on the Rao-Blackwell Theorem): Under A1-A5, OLS is the minimum
variance unbiased estimator.
This theorem tells us that, under the stronger assumption of normality, OLS not only has
lower variance than all other estimators that are unbiased and linear, it also has lower
variance than all non-linear unbiased estimators.
Theorem 4 (based on the Cramer-Rao Inequality): Under A1-A5, the OLS estimator is a
maximum likelihood estimator, and is consistent, asymptotically normally distributed,
asymptotically efficient, and invariant to transformations of the parameter.
This theorem gives us an entirely different approach to justifying the use of OLS.
An estimator B is consistent if it approaches the true value β as the number of
observations increases.
An estimator is asymptotically normally distributed, if its distribution approaches a
normal distribution as the number of observations increases.
An estimator is asymptotically efficient if, in the limit as the number of observations goes
to infinity, its variance is smaller than that of any other consistent and asymptotically
normally distributed estimator.
An estimator B is invariant to a transformation of the parameter if the corresponding
estimator of f(β) equals f(B).
Simultaneous Equations and the Problem of Endogenous Variables
Now I’d like to focus on one key assumption in more detail.
A3: Exogeneity
E[εi | X1, X2 … XN] = 0
E[ε | X] = 0
Under many circumstances, including the empirical exercise, this is an unreasonable
assumption, and OLS gives biased estimates. In particular, A3 is violated whenever the
disturbance is correlated with one of the explanatory variables, since this means that the
expected value of the disturbance, conditional on the explanatory variables, will be a
non-zero function of the explanatory variables, and thus cannot equal zero.
Short example: Suppose we want to find out the effect of schooling on income. Note that
“ability” may affect both schooling (smarter people get more education) and income
(smarter people earn more). We can attempt to control for this by adding RHS variables
that proxy for ability. But, as long as the RHS variables we include do not perfectly
capture ability, there will be some residual unobserved ability that is reflected in the
disturbance term. If the residual component of ability affects only income and not
schooling, then we have no problem and OLS will work fine. But if the residual
component of ability still affects both schooling and income, then schooling will be
correlated with the disturbance, and A3 will be violated. In this case (because the
correlation is positive), the OLS estimate of the effect of schooling is biased upward.
Longer example, closely related to the empirical exercise: if both the dependent variable
and an explanatory variable are actually determined by a system of simultaneous
equations, then A3 will almost certainly be violated. To see this, we'll walk through a
simple supply and demand example.
[Equation 1a: Demand] Q = a – b*P + εd
[Equation 2a: Supply] P = c + d*Q + εs
Suppose that εd ~ N(0, σ²d) and εs ~ N(0, σ²s), and that εd and εs are independent. We
would like to estimate [1a]. Is A3 satisfied? The answer is no. To see this, suppose that
there is a positive shock to demand (e.g., the price of a complement fell). This means
that consumers are willing to purchase a greater quantity at each price than before, i.e., εd
increased. What happens? In equilibrium, when demand shifts out, both price and
quantity increase. Thus, we have established that changes in εd are positively correlated
with changes in P, via movement along an upward-sloping supply curve. But this means
that P cannot be exogenous, and an OLS estimate of b will be biased toward zero (i.e.,
OLS will estimate a demand curve that is more inelastic than the true demand curve).
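To see this bias in action, here is a small simulation sketch in STATA (all variable names and parameter values are hypothetical; the true demand slope is −b = −1):

clear
set seed 12345
set obs 1000
generate ed = rnormal()   // demand shock
generate es = rnormal()   // supply shock
* Solving Q = a - b*P + ed and P = c + d*Q + es simultaneously, with
* a = 10, b = 1, c = 1, d = 0.5, gives the equilibrium price:
generate P = (1 + 0.5*10 + 0.5*ed + es)/(1 + 1*0.5)
generate Q = 10 - P + ed
regress Q P   // the slope estimate is biased toward zero (above -1)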
Here’s another way of thinking about this. If we were to simply regress Q on P, would
we be estimating a demand curve or a supply curve? It might be tempting to say that we
would be estimating a demand curve, since above we have written demand with Q as a
function of P, and supply with P as a function of Q. But this is incorrect, since we could
have written the equations the other way around. As illustrated in the graphs below, the
answer depends on what is changing between observations: supply and/or demand. If the
demand curve is constant across the observations (εd = 0), then movements in the supply
curve (variation in εs) will trace out the demand curve. Conversely, if the supply curve is
constant across observations (εs = 0), then movements in the demand curve (variation in
εd) will trace out the supply curve. On the other hand, we might expect that typically,
both supply and demand are subject to shocks that vary across the observations. In that
case, what we estimate when we regress Q on P is some mixture of the two curves,
weighted toward whichever one moves around the least.
[Three graphs plotting Q against P, illustrating the three cases described above. Note: P
is on the horizontal axis and Q is on the vertical axis, exactly the opposite of the usual
layout.]
Instrumental Variables (IV)
What can we do about this problem? One possible solution is called instrumental
variables regression. Let’s start by supposing that we found some way to decompose P
into Pexog + Pendog in such a way that Pexog were exogenous (E[εd | Pexog] = 0) and that
Pendog were defined to equal P – Pexog (we know that Pendog must be endogenous and
positively correlated with εd, since P is positively correlated with εd and Pexog is not). We
can then rewrite the demand equation as follows:
[Equation 1b] Q = a – b*Pexog + (εd – b*Pendog)
Note that if we treat Pexog as the explanatory variable and (εd – b*Pendog) as the
disturbance, we have an equation that satisfies A3. Thus, if we regress Q on Pexog, we will
get unbiased estimates of a and b. Furthermore, if Pexog is correlated with P, this
regression will have some power (conversely, if Pexog is uncorrelated with P, the
expected variance of the estimator is infinite). Thus, we can solve the endogeneity
problem if we can construct a variable that is uncorrelated with the disturbance, but still
moves with the endogenous explanatory variable.
How could we construct such a variable? Let’s suppose that we have data on another
variable, Z, which gives the price of raw materials used to produce the good. Obviously,
Z should appear in the supply equation, but probably not in the demand equation.
[Equation 1a: Demand] Q = a – b*P + εd
[Equation 2b: Supply] P = c + d*Q + e*Z + εs
If Z is independent of demand shocks, it will have no effect on demand except through its
effect on price, and E[εd | Z] = 0 (Z is exogenous). This fact is the key assumption that
we need to be true in order to solve the endogeneity problem. Now what happens if we
regress P on Z (P = f + g*Z), and then use the results of this regression to construct a
prediction of P for each observation (Phati = fOLS + gOLS * Zi)? Since Z is exogenous,
and Phat is a linear function of Z (and some constants), Phat is exogenous (E[εd | Phat] =
0). [Quick proof: Suppose Phat were not exogenous, i.e., that E[εd | Phat] = h(Phat) for
some non-zero function h. Then it would have to be the case that E[εd | Z] =
h(fOLS + gOLS * Z) ≠ 0. But this contradicts the assumption that Z is exogenous.]
Thus, if we use Pexog = Phat, we have solved the endogeneity problem. Phat can be
constructed from our exogenous variable Z. Since Z is one of the determinants of P, Phat
will be correlated with P. Since Z is exogenous, Phat will also be exogenous. Thus, a
regression of Q on Phat will yield unbiased estimates of the structural coefficients a and b
that we are interested in.
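In STATA, this construction takes only a few commands (a sketch; Q, P, and Z are placeholder names; note that the standard errors reported by the second regression are not correct for this two-step procedure, which is one reason to use the built-in command described below):

regress P Z        // first stage: P = f + g*Z
predict Phat, xb   // fitted values: Phat = fOLS + gOLS*Z
regress Q Phat     // second stage: the slope estimates -b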
Note 1: The above argument relied heavily on the following assumptions.
a. E[εd | Z] = 0.
b. Z does not appear in the equation we are estimating (the demand equation). [If Z did
appear in the demand equation, then Z would have a direct effect on Q, and the
regression of Q on Phat would conflate that direct effect with the effect of price.]
c. Z and P are correlated (either positively or negatively).
Note 2: The instrumental variables approach is much more general than this simple
example indicates. In particular you can use IV regression in models with some
exogenous explanatory variables that appear in the equation of interest, multiple
endogenous explanatory variables, and multiple instrumental variables. In general, the
setup is as follows.
[Equation of interest]
Yi = β0*1 + β1*Xexog1 + … + βK*XexogK + βK+1*Xendog1 + … + βK+M*XendogM + εi
[Instrumental variables]
E[εi | Zm] = 0 for m = 1, 2, …, R.
Note that the assumptions a-c above must be satisfied for each instrumental variable Zm.
Furthermore, you need at least one instrumental variable for each endogenous
explanatory variable (i.e., R >= M). [There is one more assumption (the rank condition).
I’m not telling it to you because (1) I don’t know how to express it without using linear
algebra, and (2) it is almost impossible for the rank condition to not be satisfied when
there is at least one instrument for each endogenous explanatory variable. If the rank
condition is not satisfied, STATA will let you know.]
The general procedure is as follows:
Step 1a: Regress each of the endogenous explanatory variables on all of the exogenous
variables (this latter group includes the constant, the exogenous explanatory variables,
and the instrumental variables).
Step 1b: Using the regression coefficients from step 1a, construct a prediction of each
endogenous explanatory variable.
Step 2: Regress the dependent variable of interest on the explanatory variables, replacing
each endogenous explanatory variable with its prediction from step 1b. The regression
coefficients will be unbiased estimates of the true coefficients in the structural equation
of interest.
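For example, with one exogenous explanatory variable W, one endogenous explanatory variable Xendog, and two instruments Z1 and Z2, the procedure could be run by hand as follows (a sketch with hypothetical variable names):

regress Xendog W Z1 Z2   // step 1a: regress on ALL the exogenous variables
predict Xendog_hat, xb   // step 1b: predicted endogenous variable
regress Y W Xendog_hat   // step 2 (the built-in command below also fixes the standard errors)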
Because of this two-step procedure, IV regression is sometimes called Two-Stage Least
Squares (2SLS).
Note 3: You can do an IV regression by hand, using one command for each step, but
STATA and most other software packages can do it automatically. Here’s how to get
STATA to do it:
ivreg Y Xexog1 … XexogK (Xendog1 … XendogM = Z1 … ZR)
In the case of the empirical exercise, you will have two endogenous explanatory
variables, two instrumental variables, and at least one exogenous explanatory variable.
Thus, you will enter commands that look something like this:
ivreg q1 income (p1 p2 = aac1 aac2)
ivreg q2 income (p1 p2 = aac1 aac2)
For details, see the assignment.
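(Note: in more recent versions of STATA, ivreg has been superseded by ivregress; if
ivreg is unavailable, the equivalent command should look something like
“ivregress 2sls q1 income (p1 p2 = aac1 aac2)”.)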