The Semi-Log Specification, AKA Log-Linear Specification
It is often convenient, in order to obtain a better interpretation of the results, to transform some of the variables we analyze in our econometric models.
One common transformation results from assuming
that the true relationship between Y and X is
Y = e^(β0) e^(β1X)   (1)
Taking natural logarithms of both sides of the equation we obtain:
lnY = β0 + β1X   (2)
which is a linear relation between lnY and X.
The usefulness of the semi-log relation arises from
the ease of interpreting β1. It is possible to show that
β1 = (δY/Y) / δX   (3)

which means that the slope coefficient is the ratio of the proportionate change in Y to the absolute change in X.
When the change in X is not infinitesimally small, say one unit, we denote it as ∆X, and from the last equation we can write

proportionate change in Y = β1∆X   (4)
If ∆X = 1, then β1 is equal to the proportionate change in Y that results from a unit change in X.
In our econometric model this means that we can
write:
lnYi = β0 + β1Xi + ui   (5)
where the dependent variable is now lnY and the exogenous variable is still only X.
This specification has been widely used in the human capital theory of earnings determination. The theory states that the logarithm of earnings is linearly related to the level of educational attainment, so:

ln EARNSi = β0 + β1EDi + ui   (6)
If, for example, β1 = 0.1, this means that an additional year of schooling increases earnings by around 10%.
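To illustrate, here is a minimal sketch (in Python with NumPy) that simulates data from a semi-log earnings equation and recovers β1 by OLS; the variable names and the true values β0 = 1.5 and β1 = 0.10 are hypothetical, chosen only for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    ED = rng.integers(8, 21, size=n)        # years of schooling (hypothetical data)
    u = rng.normal(0.0, 0.3, size=n)        # disturbance term
    lnEARNS = 1.5 + 0.10 * ED + u           # ln(EARNS) = b0 + b1*ED + u, with b1 = 0.10

    # OLS of ln(EARNS) on a constant and ED
    X = np.column_stack([np.ones(n), ED])
    b0_hat, b1_hat = np.linalg.lstsq(X, lnEARNS, rcond=None)[0]

    print(b0_hat, b1_hat)                   # b1_hat should be close to 0.10
    print(np.exp(b1_hat) - 1)               # exact proportionate change, about 10.5%

The last line shows why we say "around 10%": β1∆X is an approximation to the proportionate change, the exact figure being e^(β1∆X) − 1.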
Multiple Regression: The Two-Variable Case
Most economic relations, and the processes they describe, involve more than one determinant of a particular dependent variable.
For example, in the main example we have discussed so far, education is the only variable that affects a person's earnings. There are obviously many other variables that can affect a person's labor earnings: age, experience, gender, marital status, etc.
Think first of the case where we have two explanatory variables, X1 and X2. We are going to assume
these are the only two variables that affect Y , and
that they are determined outside the model. We can
then write the linear multiple regression model:
Yi = β0 + β1X1i + β2X2i + ui   (7)
notice that here X1i denotes the ith observation of X1
and X2i denotes the ith observation of X2. The interpretation of the disturbance term has not changed.
Our econometric task is to estimate β0, β1, and β2.
The estimated regression decomposes each Yi into
the fitted value:
Ŷi = β̂0 + β̂1X1i + β̂2X2i   (8)
and its residual
ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i   (9)
The OLS technique calculates the values of the unknown parameters by minimizing the sum of the squares of these residuals. Although this will usually be done with the help of a computer, we can write formulas for this particular, two-variable, model.
To simplify the notation, define the deviations from the sample means:

y = Y − Ȳ,   x1 = X1 − X̄1   and   x2 = X2 − X̄2   (10)

then the estimates are

β̂2 = (∑x2y ∑x1² − ∑x1y ∑x1x2) / (∑x1² ∑x2² − (∑x1x2)²)   (11)

β̂1 = (∑x1y ∑x2² − ∑x2y ∑x1x2) / (∑x1² ∑x2² − (∑x1x2)²)   (12)

β̂0 = Ȳ − β̂2X̄2 − β̂1X̄1   (13)
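As a check on these formulas, the following sketch (again Python/NumPy, with hypothetical data) computes β̂0, β̂1 and β̂2 exactly as in (10)–(13):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X1 = rng.normal(size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)      # X1 and X2 are deliberately correlated
    Y = 2.0 + 1.0 * X1 - 0.5 * X2 + rng.normal(size=n)

    # Deviations from the sample means, as defined in (10)
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

    D = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2
    b2 = (np.sum(x2 * y) * np.sum(x1**2) - np.sum(x1 * y) * np.sum(x1 * x2)) / D   # (11)
    b1 = (np.sum(x1 * y) * np.sum(x2**2) - np.sum(x2 * y) * np.sum(x1 * x2)) / D   # (12)
    b0 = Y.mean() - b2 * X2.mean() - b1 * X1.mean()                                # (13)

    print(b0, b1, b2)                       # close to the true values 2.0, 1.0, -0.5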
Notice that the estimators for all three coefficients depend on the values of all the variables. For example, β̂2 depends not only on Y and X2 but also
on X1. This means that β̂2 is different from the slope
coefficient of a regression of Y on X2. The multiple regression coefficients cannot be obtained by estimating two simple regressions, one of Y on X1 and
another of Y on X2.
The only exception to this is when X1 and X2 are uncorrelated, meaning that their correlation is zero, which is equivalent to ∑x1x2 = 0; in that case the formulas above give slope coefficients equal to those of the simple regression models.
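Continuing the previous sketch, this point can be seen numerically: the slope of a simple regression of Y on X1 alone, ∑x1y / ∑x1², differs from the multiple-regression β̂1 because X1 and X2 were generated with a nonzero correlation.

    # Simple regression of Y on X1 alone
    b1_simple = np.sum(x1 * y) / np.sum(x1**2)
    print(b1, b1_simple)        # different: b1_simple also picks up part of the effect of X2

If X2 were generated independently of X1, ∑x1x2 would be close to zero and the two slopes would nearly coincide.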
The normal situation is one in which the correlation between the explanatory variables is not zero; it might be positive or negative, and in both cases it is taken into account when estimating the unknown parameters by OLS.
It is important to understand the interpretation of the
coefficients in this new model. For example, let’s
interpret β̂1.
We say that β̂1 is the change in Ŷ that results from
a unit change in X1, holding constant the value of
X2. This phrasing corresponds quite closely to the concept of ceteris paribus, which is commonly used in economics. This does not mean that X2 remains constant when X1 changes; it just means that if we were able to keep X2 constant and X1 were to change by one unit, then Ŷ would change by β̂1.
The same interpretation applies to β̂2.
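Again continuing the sketch, the ceteris paribus interpretation can be verified on the fitted values: raising X1 by one unit while holding X2 fixed changes Ŷ by exactly β̂1.

    a, c = 1.0, 2.0                                   # arbitrary values of X1 and X2
    yhat_before = b0 + b1 * a + b2 * c
    yhat_after  = b0 + b1 * (a + 1) + b2 * c          # X1 up by one unit, X2 held at c
    print(yhat_after - yhat_before)                   # equals b1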
Multiple Regression: The General Case
In this case the values of the dependent variable are
determined by several explanatory variables, or regressors. In general we say that Y depends on k explanatory variables:
Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + ui   (14)
A typical variable is denoted by Xj and a typical coefficient is βj. This technique allows us to take into
account all the relevant variables that help determine
the value of the dependent variable.
OLS here minimizes the sum of squared residuals; these are quite similar to the ones we wrote in the two-variable case, but now involve k variables.
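For the general case it is easiest to stack the data in matrices and let a least-squares routine do the minimization; a sketch with k = 3 hypothetical regressors:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 500, 3
    X = rng.normal(size=(n, k))                       # the k explanatory variables
    beta = np.array([2.0, 1.0, -0.5, 0.25])           # hypothetical [b0, b1, b2, b3]
    Y = beta[0] + X @ beta[1:] + rng.normal(size=n)

    Xmat = np.column_stack([np.ones(n), X])           # add a column of ones for b0
    beta_hat = np.linalg.lstsq(Xmat, Y, rcond=None)[0]
    e = Y - Xmat @ beta_hat                           # the residuals whose squares OLS minimizes

    print(beta_hat)
    print(np.sum(e**2))                               # the minimized sum of squared residuals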
The interpretation of the estimated parameters of this model is not very different from the two-variable case. We say that β̂j is the change in Ŷ that results from a unit change in Xj, holding constant the values of all the other variables. Again, this closely corresponds to the concept of ceteris paribus. These are the only effects revealed when estimating multiple regression models.
Notice that the main measure of goodness of fit that we have used, the R², is calculated in the same way as in the case of the simple regression:

R² = 1 − ∑ ei² / ∑ (Yi − Ȳ)²   (15)

where both sums run over i = 1, …, n.
However, the R² is not very useful for comparing alternative specifications of the regression model when they have different numbers of regressors. The reason is that it always gives the same answer: the regression with additional variables included fits better. This is because adding an explanatory variable to an original regression model cannot raise the sum of squared residuals. Therefore the addition of one variable to a regression model cannot decrease the R², and for all practical purposes it increases it.
But the gain in R² comes at the cost of including another variable. So to compare specifications with different numbers of parameters to estimate, we need a measure that assesses whether the gain in fit outweighs the cost of estimating one more coefficient for a given number of observations.
A statistic that makes this comparison is the adjusted (or corrected) R², denoted R̄². It is defined as:

R̄² = 1 − [∑ ei² / (n − k − 1)] / [∑ (Yi − Ȳ)² / (n − 1)]   (16)

Notice that when comparing specifications it all boils down to the numerator, since the denominator does not change. When we add one more regressor, the sum of squared residuals goes down, but so does n − k − 1; so the change in R̄² depends on whether the sum of squared residuals decreases proportionately more or less than n − k − 1.
Most software programs report R̄², so we can compare the fit of different regressions.
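Both measures are easy to compute from (15) and (16); continuing the k-regressor sketch above, the snippet below adds an irrelevant regressor and shows that R² cannot fall while R̄² can.

    def r2_and_adjusted(Y, Xmat):
        # OLS fit of Y on Xmat (which already includes the constant); Xmat has k + 1 columns
        n, kplus1 = Xmat.shape
        b = np.linalg.lstsq(Xmat, Y, rcond=None)[0]
        e = Y - Xmat @ b
        tss = np.sum((Y - Y.mean())**2)
        r2 = 1 - np.sum(e**2) / tss                                    # equation (15)
        r2_adj = 1 - (np.sum(e**2) / (n - kplus1)) / (tss / (n - 1))   # equation (16)
        return r2, r2_adj

    noise = rng.normal(size=n)                        # an irrelevant extra regressor
    print(r2_and_adjusted(Y, Xmat))
    print(r2_and_adjusted(Y, np.column_stack([Xmat, noise])))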