Lecture 15: Regression specification, part II BUEC 333 Professor David Jacks

Specification error

Specification error is a violation of Assumption 1 of the CLRM.
Mis-specification occurs with the wrong choice of:
1.) independent variables
2.) functional form
3.) error distribution
We have already considered choosing the "right" independent variables; we now turn to functional form.
What is functional form and why does it matter?

As always, our regression model is:
Yi = E[Yi | X1i, X2i, ..., Xki] + εi
Given that X1i, X2i, ..., Xki will be included in the model, we need to decide on a shape for the regression function, E[Yi | X1i, X2i, ..., Xki].
Do we think the relationship between Xji and Yi is:
1.) a straight line?
2.) a curve?
3.) non-monotonic?
Should the slope be the same for every observation or are there distinct groups that have separate slopes (e.g., men/women, before/after)?
Likewise, should the intercept be the same for all observations or are there distinct groups of observations that have a separate intercept?
A functional form is a mathematical specification of the regression function E[Yi | X1i, X2i, ..., Xki] that we choose in response to these questions.
The point is that different functional forms may give very different answers about the marginal effects of X on Y…and very different predictions.
Thus, correct inference requires getting the functional form right.
First things first: the constant in OLS regressions

Should a model include an intercept?
The short answer: yes, always.
The long answer: it is possible that theory tells you that the regression function should pass through the origin; that is, theory tells you that when all the X's are zero, then Y is zero as well.
And why is including the intercept better? Theory could be wrong.
If we include the intercept and the true intercept (of the DGP) turns out to be zero, the (unbiased) OLS estimate of the intercept should tend to zero.
But if we leave out the intercept and the true intercept (of the DGP) turns out not to be zero, we can really screw up our estimates of the slopes.
An example: estimating a cost function:
Ci = β0 + β1Qi + εi
If Q = 0, then we often think C should be zero.
This makes sense for variable costs: if GM produces no cars, it needs no workers on the assembly line.
It might also help to think about what an estimate of β0 typically includes:
1.) the true β0
2.) the (constant) impact of any specification error
3.) the sample mean of the residuals (if not equal to 0)
Ideally, we want to purge our results of this garbage by letting the constant absorb it.
Now, why would anyone ever run a regression without a constant?
As it turns out, running OLS without a constant serves to artificially inflate both the value of the F-statistic and the R2 of the regression.
By folding the effect of the intercept into the TSS, a regression without a constant increases TSS, but ESS more so than RSS, and as R2 = ESS/TSS, the reported R2 rises.
Functional form

Having settled the question of whether or not to include an intercept, our attention turns primarily to functional form.
The most common functional forms we will encounter are the:
1.) linear: Y = β0 + β1X1 + ε
2.) polynomial: Y = β0 + β1X1 + β2X1² + ε
3.) log-log: log(Y) = β0 + β1log(X1) + ε
4.) semi-log: Y = β0 + β1log(X1) + ε or log(Y) = β0 + β1X1 + ε
The linear functional form

The simplest functional form arises when the independent variables enter linearly:
Yi = β0 + β1X1i + εi
Remember linearity can refer to two things: linearity in variables and linearity in coefficients.
Examples of non-linearity in variables:
Y = β0 + β1X1 + β2X1² + ε
Y = β0 + β1log(X1) + ε
Y = β0 + β1(1/X1) + ε
For our purposes, non-linearity in variables is OK.
On the other hand, linearity in coefficients is essential and occurs when the betas enter in the most straightforward fashion.
Thus, the betas cannot be:
1.) raised to any power except one
2.) multiplied or divided by other coefficients
Why would we choose the linear form?
1.) If theory or intuition tells us that the marginal effect of X on Y is a constant (that is, the same at every level of X):
ΔYi/ΔX1i = β1
2.) If theory or intuition tells us that the elasticity of Y with respect to X is not a constant (that is, it is not the same at every level of X):
ε(Y,X1) = (ΔYi/Yi) / (ΔX1i/X1i) = (ΔYi/ΔX1i)(X1i/Yi) = β1(X1i/Yi)
3.) If you do not know what else to do
The polynomial functional form

A flexible alternative to the linear functional form is a polynomial where one or more independent variables are raised to powers other than one:
Yi = β0 + β1X1i + β2(X1i)² + β3(X1i)³ + β4X2i + εi
Why would we choose a polynomial form?
If theory or intuition tells us that the marginal effect of X on Y is not a constant:
ΔYi/ΔX1i = β1 + 2β2X1i + 3β3(X1i)²
This also implies that the elasticity is not constant:
ε(Y,X1) = (ΔYi/ΔX1i)(X1i/Yi) = (β1 + 2β2X1i + 3β3(X1i)²)(X1i/Yi)
This approach is a useful supplement to the linear form, precisely because of its flexibility.
A brief explanation of logs

If y to the "b-th power" produces x, then b is the logarithm of x (with base y):
b = log(x) if y^b = x
Thus, a logarithm (or a log) is the exponent to which a given base must be raised in order to produce a specific number.
While logs come in more than one variety, we will use only natural logs (logs to the base e):
e = 2.718282, so e² = 7.389 → ln(7.389) = 2
ln(100) = 4.605
ln(1,000) = 6.908
ln(10,000) = 9.210
ln(100,000) = 11.513
ln(1,000,000) = 13.816
Distinct advantages of using logs:
1.) makes it easy to figure out impact in % terms
The log-log functional form

One of the most common specifications:
lnYi = β0 + β1lnX1i + β2lnX2i + εi
Why would we choose this form?
If theory or intuition tells us that the marginal effect of X on Y is not a constant (that is, the regression function is, in fact, a curve):
ΔYi/ΔX1i = (Δln Yi/Δln X1i)(Yi/X1i) = β1(Yi/X1i)
But this implies that the elasticity of Y with respect to X is constant (the main reason for using this form):
ε(Y,X1) = (ΔYi/ΔX1i)(X1i/Yi) = β1(Yi/X1i)(X1i/Yi) = β1
Here, the betas directly measure elasticities: that is, the % change in Y for a 1% change in X (holding all else constant), implying a non-linear but smooth relationship between X and Y.
The semi-log functional form

This one comes in two flavors:
(1) ln(Yi) = β0 + β1X1i + β2X2i + εi
(2) Yi = β0 + β1ln(X1i) + β2X2i + εi
That is, some variables are in natural logarithms.
Economists use this kind of functional form often; a common application is when the variable being logged has a very skewed distribution.
[Figure: histograms of usd_salary (heavily right-skewed) and ln_salary (roughly symmetric)]
Here, neither the marginal effect nor the elasticity is constant.
But the coefficients in model 1 do have a very useful interpretation: β1 measures the (approximate) proportional change in Yi for a one-unit change in Xi; that is, 100·β1 is the percentage change.
Example: in model 1, if Y is a person's salary and X1 is years of education, then β1 measures the % change in salary from one more year of education.
Dummy variables

Another specification issue we are concerned with is whether different groups of observations have different slopes and/or intercepts.
We have seen dummy variables before ("salary_nomiss" in the NHL; "I" in gravity).
These simply indicate the presence or absence of a characteristic and, thus, take the value of 0 or 1.
Example: Di = 1 if person i is a male
Intercept dummies

The most common use of dummy variables is to allow different intercepts for different groups.
Example: Wi = β0 + β1Xi + β2Di + εi where
Wi = person i's hourly wage
Xi = person i's education (years)
Di = 1 if person i is male
Di = 0 if person i is not male
For males, Wi = β0 + β1Xi + β2 + εi (intercept is β0 + β2)
For non-males, Wi = β0 + β1Xi + εi (intercept is β0)
Same marginal effect, but for a given level of education, the average wage of males and non-males differs by β2 dollars.
The dummy variable trap

Notice that in the previous example, we did not include a second dummy variable for being female:
Wi = β0 + β1Xi + β2Di + β3Fi + εi where
Fi = 1 if person i is female
Fi = 0 if person i is not female (is male)
Why not? Doing so violates Assumption 6 of the CLRM (no perfect collinearity), as Fi = 1 – Di, so Fi is an exact linear function of Di.
What about more than two categories?

Using dummy variables to indicate the absence/presence of conditions with more than two categories is no problem…create more dummies.
Example: dummies for position in hockey; POSITION takes one of four values (L, C, R, D).
We know that SALARY and GOALS are positively correlated…a result that holds up in OLS results.
In this case, we could create a set of position dummies:
L = 1 if POSITION = L, and 0 otherwise
C = 1 if POSITION = C, and 0 otherwise
R = 1 if POSITION = R, and 0 otherwise
Our omitted category is POSITION = D.
Our regression could then be:
SALARYi = β0 + β1Li + β2Ci + β3Ri + … + εi
Slope dummies

We can also use dummy variables to allow the slope of the regression to vary across observations.
Example: suppose we think the returns to education (the marginal effect of another year of education on wages) are different for males than for non-males, but the intercepts are the same.
We could estimate the following regression model:
Wi = β0 + β1Xi + β2XiDi + εi
For males (Di = 1), the regression model is:
Wi = β0 + (β1 + β2)Xi + εi
For non-males (Di = 0), the regression model is:
Wi = β0 + β1Xi + εi
Likewise, we can consider regressions where there are both different slopes (i.e., interaction terms are included) as well as different intercepts.
With both an intercept dummy and an interaction term included, the average wage of males and non-males differs by β2 dollars (the intercept-dummy coefficient) when Xi = 0.
Choosing the wrong functional form

As always, you should choose a functional form based on theory and intuition.
Consequently, you should avoid choosing a functional form based on model fit (R2).
For one thing, you cannot compare R2 across different functional forms for the dependent variable; for example, Y versus ln(Y).