Econ107 Applied Econometrics
Topic 6: Specification: Choosing a Functional Form
(Studenmund, Chapter 7)
I. Should a Constant Term be Included?
The constant term is there to provide flexibility in the shape (position) of the
regression line. Suppose the correct regression model is:
$\ln W_i = \beta_0 + \beta_1 S_i + \epsilon_i$
with $\beta_0 \neq 0$, but we estimate the following model:
$\ln W_i = \beta_1 S_i + \epsilon_i$
The effect of suppressing $\beta_0$ can be seen from the graph:
[Figure: lnW (vertical axis, 0 to 12) plotted against S (horizontal axis, 0 to 20).]
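The consequence of suppressing the constant is that the slope coefficient
estimates are biased. Also, under the 'false' model:
$Var(\hat{\beta}_1^*) = \sigma^2 / \sum_{i=1}^{n} S_i^2$
Under the true model:
$Var(\hat{\beta}_1) = \sigma^2 / \sum_{i=1}^{n} s_i^2$
Since
$\sum_{i=1}^{n} s_i^2 = \sum_{i=1}^{n} (S_i - \bar{S})^2 = \sum_{i=1}^{n} S_i^2 - n\bar{S}^2 \leq \sum_{i=1}^{n} S_i^2$,
the variance under the 'false' model is smaller, so the t-ratio is inflated.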
Include the constant term if the data are not in the neighborhood of the origin.
Unless you have a strong reason, do not suppress the constant term. Although the
constant term is important from the specification viewpoint, it should NOT be
relied on for purposes of interpretation and analysis.
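The effect of suppressing the constant can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true values (`beta0 = 2.5`, `beta1 = 0.08`, `sigma = 0.3`), the range of S, and the helper `ols` are all hypothetical choices made for illustration, not figures from the course material.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])        # estimate of sigma^2
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(0)
n = 200
beta0, beta1, sigma = 2.5, 0.08, 0.3                  # hypothetical true values
S = rng.uniform(8, 20, size=n)                        # data far from the origin
lnW = beta0 + beta1 * S + rng.normal(0, sigma, size=n)

# True model: constant included
coef_true, se_true = ols(np.column_stack([np.ones(n), S]), lnW)
# 'False' model: constant suppressed
coef_false, se_false = ols(S.reshape(-1, 1), lnW)

slope_true, t_true = coef_true[1], coef_true[1] / se_true[1]
slope_false, t_false = coef_false[0], coef_false[0] / se_false[0]

print(f"with constant:    slope = {slope_true:.3f}, t = {t_true:.1f}")
print(f"without constant: slope = {slope_false:.3f}, t = {t_false:.1f}")
```

With data this far from the origin, the slope from the regression without a constant should land well above the true value of 0.08, and its t-ratio should be much larger than the one from the correctly specified regression.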
II. Functional Forms.
The Log-Log Regression Model
Consider the following 'exponential' regression model:
$Y_i = \alpha X_i^{\beta_1} e^{\epsilon_i}$
which we can express as a linear (in logs) regression model by taking natural
logarithms of both sides:
$\ln Y_i = \beta_0 + \beta_1 \ln X_i + \epsilon_i$
where 'ln' denotes the natural log, 'e' is the natural number (i.e., e ≈ 2.71828) and
$\beta_0 = \ln \alpha$.
The model is linear in the logarithms, even though it was originally nonlinear in
terms of both the variables and parameters. Also referred to as a Double-Log or
Log-Log model.
If the classical assumptions are fulfilled, then we can estimate the parameters
using OLS by letting:
$Y_i^* = \beta_0 + \beta_1 X_i^* + \epsilon_i$
where:
$Y_i^* = \ln Y_i$
$X_i^* = \ln X_i$
The estimates are BLUE. This is a useful specification for a regression model,
because the slope coefficient can be interpreted as an 'elasticity'. Using calculus:
$\frac{dY/Y}{dX/X} = \frac{dY}{dX} \cdot \frac{X}{Y} = \frac{\%\Delta Y}{\%\Delta X} = \beta_1$
The assumption is that the elasticity is constant.
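As a rough check on the log-log transformation, the sketch below generates data from the exponential model and then runs OLS on the logged variables. The true values (`alpha = 2.0`, `beta1 = -0.25`), the noise level and the range of X are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300
alpha, beta1 = 2.0, -0.25                 # hypothetical true values
X = rng.uniform(1.0, 10.0, size=n)
eps = rng.normal(0, 0.1, size=n)
Y = alpha * X**beta1 * np.exp(eps)        # nonlinear in variables and parameters

# OLS on the transformed variables Y* = ln Y and X* = ln X
Z = np.column_stack([np.ones(n), np.log(X)])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)

print(f"estimated elasticity: {b1_hat:.3f}")
print(f"implied alpha = exp(b0): {np.exp(b0_hat):.3f}")
```

The slope on ln X should come out close to the elasticity used to generate the data, and exponentiating the intercept should roughly recover the multiplicative constant.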
A numerical example. Coffee demand function.
$\widehat{\ln Y_t} = -0.7774 - 0.2530 \ln X_t$, $R^2 = 0.7448$
(0.0152) (0.0494)
where $Y_t$ = coffee consumption in cups per day,
$X_t$ = coffee price per pound.
The price elasticity is -0.253, implying that for a 1% increase in the price of coffee,
the quantity of coffee demanded (as measured by cups consumed each day)
decreases by 0.253%.
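The 1% reading is a local approximation. For larger price changes, the constant-elasticity form implies a slightly different exact response, as this short sketch shows. The function name `pct_change_in_quantity` is made up for this illustration; only the elasticity of -0.253 comes from the example above.

```python
ELASTICITY = -0.253   # slope estimate from the coffee demand regression

def pct_change_in_quantity(pct_price_change, elasticity=ELASTICITY):
    """Exact % change in Y implied by Y = alpha * X**elasticity."""
    ratio = 1.0 + pct_price_change / 100.0
    return (ratio ** elasticity - 1.0) * 100.0

for p in (1, 10, 50):
    exact = pct_change_in_quantity(p)
    approx = ELASTICITY * p               # elasticity times % price change
    print(f"price +{p}%: exact {exact:.3f}%, approximation {approx:.3f}%")
```

For a 1% price increase the two figures are nearly identical; the gap widens as the price change grows, because the elasticity is defined for small proportional changes.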
Note also that the coefficients of determination of two regressions with
different dependent variables cannot be compared, because each measures the share
of variation explained in a different variable (here lnY versus Y). For example,
here the $R^2$ is .7448. Suppose we estimated the regression without the logs (i.e.,
we regressed cups of coffee against the per pound cost of both coffee and tea). If
the $R^2$ for this regression was .6519, we couldn't say that the log-log regression
had a 'better fit'.
The Log-Lin Regression Model
Take an example from labour economics. The theory of human capital investment
says that individuals will invest in education because it raises their productivity,
and higher productivity raises their potential wages in the labour market.
$W_i = Y_0 e^{\beta_1 S_i} e^{\epsilon_i}$
Taking the logs of both sides:
$\ln W_i = \beta_0 + \beta_1 S_i + \epsilon_i$ where $\beta_0 = \ln Y_0$
where W is income or earnings, and S is the number of years of schooling
(education). Y0 represents earnings in the absence of all education. This is known
as a Semilog regression model, because only one variable (in this case the
dependent variable) is written as a log. This is also expressed as a Log-Lin model
(a Lin-Log model has the independent variable as the only log).
In this model, the slope coefficient measures ' ... the constant proportional change
in W for a given absolute change in X.' In this case, this is the percentage change
in earnings for a one-year increase in educational attainment.
Numerical example:
$\widehat{\ln W_i} = 2.574 + 0.085 S_i$, $R^2 = 0.215$
(0.339) (0.009)
The estimated coefficient on schooling indicates that the ‘incremental impact’ of
a year of education is to raise earnings by 8.5%.
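The 8.5% figure reads the coefficient directly as a proportional change, which is an approximation that works well for small coefficients. A short derivation of the exact proportional change for one extra year of schooling, using only the estimated slope of 0.085 from the example above, might look like this:

```latex
\ln W(S+1) - \ln W(S) = \hat{\beta}_1
\quad\Longrightarrow\quad
\frac{W(S+1) - W(S)}{W(S)} = e^{\hat{\beta}_1} - 1

e^{0.085} - 1 \approx 0.0887
```

So the exact implied increase is about 8.9%, close to the 8.5% approximation.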
The Polynomial Form
Take another example from labour economics.
$Earnings_i = \beta_0 + \beta_1 Age_i + \beta_2 Age_i^2 + \epsilon_i$
This model can produce slopes that change as the independent variable changes:
$\frac{dEarnings_i}{dAge_i} = \beta_1 + 2\beta_2 Age_i$
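A minimal sketch of the polynomial form, assuming hypothetical coefficients chosen so that earnings peak at age 50. The data are simulated; none of the numbers come from an actual earnings regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
b0, b1, b2 = -10.0, 3.0, -0.03            # hypothetical: peak at -b1/(2*b2) = 50
age = rng.uniform(20, 65, size=n)
earnings = b0 + b1 * age + b2 * age**2 + rng.normal(0, 2.0, size=n)

# The model is linear in the parameters, so OLS applies with Age and Age^2
X = np.column_stack([np.ones(n), age, age**2])
(b0_hat, b1_hat, b2_hat), *_ = np.linalg.lstsq(X, earnings, rcond=None)

def slope(age_value):
    """Estimated dEarnings/dAge = b1 + 2 * b2 * Age."""
    return b1_hat + 2 * b2_hat * age_value

turning_point = -b1_hat / (2 * b2_hat)    # age at which the slope is zero

for a in (30, 50, 60):
    print(f"age {a}: slope = {slope(a):.3f}")
print(f"estimated turning point: {turning_point:.1f}")
```

The slope should be positive at younger ages and negative past the turning point, which is the behaviour a purely linear specification cannot capture.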
The Inverse Form
Take an example from macroeconomics.
$W_t = \beta_0 + \beta_1 / U_t + \epsilon_t$
This model can also produce slopes that change as the independent variable changes:
$\frac{dW_t}{dU_t} = -\beta_1 / U_t^2$
So the slope changes as U changes. As $U_t$ gets larger and larger, $W_t$ gets
closer and closer to the constant $\beta_0$.
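The behaviour of the inverse form can be tabulated directly. The values `beta0 = 1.0` and `beta1 = 8.0` below are hypothetical, and the function names are invented for this sketch; the error term is omitted so that only the systematic part is shown.

```python
def inverse_form(u, beta0=1.0, beta1=8.0):
    """Systematic part of the inverse model: beta0 + beta1 / U."""
    return beta0 + beta1 / u

def inverse_slope(u, beta1=8.0):
    """Derivative dW/dU = -beta1 / U**2."""
    return -beta1 / u**2

for u in (2, 4, 8, 100):
    print(f"U = {u:>3}: W = {inverse_form(u):.3f}, slope = {inverse_slope(u):.4f}")
```

With a positive beta1 the slope is negative everywhere, flattens out as U grows, and W approaches beta0 from above.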
III. Problems with Adopting Wrong Functional Forms.
Suppose we estimate:
$\ln W_i = \beta_0 + \beta_1 S_i + \epsilon_i$
but the 'true' model is:
$\ln W_i = \beta_0 + \beta_1 S_i + \beta_2 S_i^2 + \epsilon_i$
We want an estimate of the 'rate of return' to education. However, we assume in
our estimated regression that it is constant for each year of education. The truth
may be that it decreases with the level of education (i.e., $\beta_1 > 0$ and $\beta_2 < 0$).
The rate of return is just the partial derivative of the regression function:
$\frac{\partial \ln W_i}{\partial S_i} = \beta_1 + 2\beta_2 S_i$
Thus, we'd get a biased estimate of the overall rate of return to education, if we
ignored the fact that it's a linear function of the level of education. The SRF is a
biased estimate of the PRF, because the wrong functional form was adopted from
the outset.
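To make the consequence concrete, the sketch below simulates data from a quadratic 'true' model and then fits the linear specification. All parameter values (`b1 = 0.20`, `b2 = -0.005`, the range of S, the noise level) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 600
b0, b1, b2 = 1.0, 0.20, -0.005            # hypothetical: beta1 > 0, beta2 < 0
S = rng.uniform(0, 16, size=n)
lnW = b0 + b1 * S + b2 * S**2 + rng.normal(0, 0.1, size=n)

# Estimate the misspecified model, which forces a constant rate of return
X_lin = np.column_stack([np.ones(n), S])
coef_lin, *_ = np.linalg.lstsq(X_lin, lnW, rcond=None)
constant_return = coef_lin[1]

def true_return(s):
    """Rate of return under the true model: beta1 + 2 * beta2 * S."""
    return b1 + 2 * b2 * s

print(f"estimated constant return: {constant_return:.3f}")
for s in (0, 8, 16):
    print(f"true return at S = {s:>2}: {true_return(s):.3f}")
```

The single estimated return sits between the true returns at low and high levels of schooling, so it overstates the return for the highly educated and understates it for those with little schooling.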
[Figure: lnW (vertical axis, 0 to 14) plotted against S (horizontal axis, 0 to 10).]
IV. Dummy Independent Variables
Dummy variables are 'discrete' and 'qualitative' (e.g., male or female, in the labour
force or not, working under a collective or individual employment contract,
renting or owning your home). Units of measurement are ‘meaningless’.
Normally 1 is assigned to the presence of some characteristic or attribute; 0 for the
absence of that characteristic or attribute.
EXAMPLE: A regression model of labour market discrimination by gender.
$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \epsilon_i$
where $Y_i$ = annual earnings
$S_i$ = years of education
$G_i$ = 1 if ith person is a male, 0 if ith person is a female.
No special estimation issues arise as long as the regression meets all the
classical assumptions. Only the nature of the independent variables has changed.
The expected salary of a female is:
$E(Y_i | S_i, G_i = 0) = \beta_0 + \beta_1 S_i$
The expected salary of a male is:
$E(Y_i | S_i, G_i = 1) = \beta_0 + \beta_1 S_i + \beta_2 = (\beta_0 + \beta_2) + \beta_1 S_i$
since $E(\epsilon_i | S_i, G_i) = 0$. Testing for discrimination (i.e., $H_0: \beta_2 = 0$) is a
test for a difference in the intercept terms.
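A simulated version of the intercept-dummy regression, offered as a sketch only. The true values (`beta0 = 10`, `beta1 = 2`, `beta2 = 5`), the sample size and the helper `ols` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(3)
n = 400
beta0, beta1, beta2 = 10.0, 2.0, 5.0      # hypothetical true values
S = rng.uniform(8, 20, size=n)            # years of education
G = rng.integers(0, 2, size=n)            # 1 = male, 0 = female
Y = beta0 + beta1 * S + beta2 * G + rng.normal(0, 3.0, size=n)

coef, se = ols(np.column_stack([np.ones(n), S, G]), Y)
t_beta2 = coef[2] / se[2]                 # t-ratio for H0: beta2 = 0

female_intercept = coef[0]
male_intercept = coef[0] + coef[2]
print(f"female intercept: {female_intercept:.2f}")
print(f"male intercept:   {male_intercept:.2f}")
print(f"t-ratio on G:     {t_beta2:.1f}")
```

Both groups share the same slope on S by construction; the coefficient on G shifts only the intercept, and its t-ratio is the test statistic for a difference in intercepts.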
Watch for the Dummy Variable Trap: Suppose we estimate the following:
$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 M_i + \epsilon_i$
where $F_i$ = 1 if ith person is female, 0 if ith person is male
$M_i$ = 1 if ith person is male, 0 if ith person is female
This is known as the 'Dummy Variable Trap'. We're including redundant
information in the regression. Suppose the sample looks like this:
Constant   Fi   Mi
1          1    0
1          0    1
1          1    0
1          0    1
1          1    0
1          1    0
1          0    1
The problem is that the two dummies are a linear function of the constant (i.e.,
Fi+Mi = 1). Perfect multicollinearity. Violates Assumption (6). We’ll see in Ch8
that the estimated coefficients and their standard errors can’t be computed.
The solution is simple -- drop a dummy variable or the constant term.
Rule of Thumb: If you have 'm' categories, then use 'm-1' dummies.
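The perfect multicollinearity can be verified numerically on the seven-observation sample shown above. This sketch uses NumPy's `matrix_rank`; the variable names are chosen for illustration.

```python
import numpy as np

# The sample from the table: constant, female dummy, male dummy
F = np.array([1, 0, 1, 0, 1, 1, 0])
M = 1 - F                                  # F + M = 1 for every observation
const = np.ones_like(F)

X_trap = np.column_stack([const, F, M])    # all three columns: the trap
X_ok = np.column_stack([const, M])         # one dummy dropped

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
det_trap = np.linalg.det(X_trap.T @ X_trap)

print(f"columns: {X_trap.shape[1]}, rank: {rank_trap}")   # rank below column count
print(f"columns: {X_ok.shape[1]}, rank: {rank_ok}")       # full column rank
print(f"det(X'X) with both dummies: {det_trap:.2e}")
```

With both dummies and the constant, the design matrix has three columns but rank two, so X'X is singular and cannot be inverted. Dropping one dummy restores full column rank.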
Slope dummy variables: We could allow for differences in the returns to education
between males and females by adding an 'interacted' variable:
$Y_i = \beta_0 + \beta_1 S_i + \beta_2 G_i + \beta_3 G_i S_i + \epsilon_i$
This is a more 'flexible' specification.
The expected salary of a female is:
$E(Y_i | S_i, G_i = 0) = \beta_0 + \beta_1 S_i$
The expected salary of a male is:
$E(Y_i | S_i, G_i = 1) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S_i$
We now have both a 'composite' intercept term and a 'composite' slope coefficient
for males.
If $\beta_2 > 0$, then the male regression line has a higher intercept; if $\beta_3 > 0$,
it also has a steeper slope.
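The composite intercept and slope can be recovered from a single regression with the interacted variable. In this sketch the true values (`b0 = 10`, `b1 = 2`, `b2 = 3`, `b3 = 1`) and the simulated sample are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
b0, b1, b2, b3 = 10.0, 2.0, 3.0, 1.0      # hypothetical true values
S = rng.uniform(8, 20, size=n)            # years of education
G = rng.integers(0, 2, size=n)            # 1 = male, 0 = female
Y = b0 + b1 * S + b2 * G + b3 * G * S + rng.normal(0, 3.0, size=n)

# Regressors: constant, S, intercept dummy G, slope dummy G*S
X = np.column_stack([np.ones(n), S, G, G * S])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

female_line = (coef[0], coef[1])                        # (intercept, slope)
male_line = (coef[0] + coef[2], coef[1] + coef[3])      # composite terms

print(f"female: intercept {female_line[0]:.2f}, slope {female_line[1]:.2f}")
print(f"male:   intercept {male_line[0]:.2f}, slope {male_line[1]:.2f}")
```

The female line uses the baseline coefficients, while the male line adds the coefficient on G to the intercept and the coefficient on the interaction to the slope.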
V. How to Detect the Problem of Adopting a Wrong Functional Form?
Plot the residuals and look for a 'distinct pattern'. If there is a systematic
pattern between $e_i$ and $X_i$, a different functional form is called for. If there
is a systematic pattern between $e_i$ and a dummy variable, a dummy variable is needed.
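A plot is the usual tool, but the same pattern can be seen numerically by averaging the residuals over ranges of the regressor. The sketch below fits a linear model to data generated from a quadratic one; all parameter values are hypothetical, and splitting the sample into thirds is just one convenient way to summarise the pattern.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 600
S = rng.uniform(0, 16, size=n)
# Hypothetical 'true' model with a quadratic term
lnW = 1.0 + 0.20 * S - 0.005 * S**2 + rng.normal(0, 0.1, size=n)

# Fit the (wrong) linear functional form and compute residuals
X = np.column_stack([np.ones(n), S])
coef, *_ = np.linalg.lstsq(X, lnW, rcond=None)
resid = lnW - X @ coef

# Average residual in the bottom, middle and top third of S
order = np.argsort(S)
low, mid, high = (resid[idx].mean() for idx in np.array_split(order, 3))

print(f"mean residual, low S:    {low:.3f}")
print(f"mean residual, middle S: {mid:.3f}")
print(f"mean residual, high S:   {high:.3f}")
```

Under a correct specification the three averages would all be close to zero. Here the residuals should be negative at both ends and positive in the middle, the kind of systematic pattern that signals a missing squared term.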
VI. Questions for Discussion: Q7.9
VII. Computing Exercise: Q7.16 (Johnson, Ch 7)