Predicting Y with a Natural Log Functional Form

How to Predict the Value of Y using a Regression Involving ln(Y)
In predicting the level of a variable based on a regression in which the log of the variable is the
dependent variable, the straightforward procedure of taking the antilog of Predicted $\ln y_t$ is not
quite right. In this How To document, we explain why and how to do the job correctly.
The difficulty is related to the distinction between finding an expected value (a forecasted
value) when we are dealing with the original equation and finding an expected value when we
are working with a transformation of the original equation.
Here is the assumed data generation process:
$$
\ln y_t = \beta_0 + \beta_1 t + \varepsilon_t.
$$
Suppose we know $\beta_0$ and $\beta_1$. Then to predict $\ln y_{30}$, we substitute 30 for $t$ and obtain:
$$
\begin{aligned}
\text{Predicted } \ln y_{30} &= E[\beta_0 + \beta_1 \cdot 30 + \varepsilon_{30}] \\
&= \beta_0 + \beta_1 \cdot 30 + E[\varepsilon_{30}].
\end{aligned}
$$
Because $\beta_0$ and $\beta_1$ are constants, their expected value is just their value. Given that
$E[\varepsilon_{30}] = 0$, the above expression reduces to
$$
\text{Predicted } \ln y_{30} = \beta_0 + \beta_1 \cdot 30.
$$
When we have estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, we simply substitute in their values to obtain
our forecast:
$$
\text{Predicted } \ln y_{30} = b_0 + b_1 \cdot 30.
$$
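As a rough illustration of this step, the following Python sketch estimates $b_0$ and $b_1$ by ordinary least squares of $\ln y_t$ on $t$ and then forms Predicted $\ln y_{30}$. The data are synthetic and purely hypothetical, and the helper name `fit_log_linear` is an assumption made for this example; it is not taken from the AnnualGDP.xls workbook.

```python
import numpy as np

def fit_log_linear(t, y):
    """OLS of ln(y) on t with an intercept; returns (b0, b1)."""
    t = np.asarray(t, dtype=float)
    ln_y = np.log(np.asarray(y, dtype=float))
    # np.polyfit returns coefficients from the highest power down: [slope, intercept]
    b1, b0 = np.polyfit(t, ln_y, 1)
    return float(b0), float(b1)

# Hypothetical data: ln y_t = 1.0 + 0.05 t + error, for t = 1, ..., 25
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
t = np.arange(1, 26)
y = np.exp(1.0 + 0.05 * t + rng.normal(0.0, 0.1, size=t.size))

b0, b1 = fit_log_linear(t, y)
pred_ln_y30 = b0 + b1 * 30      # Predicted ln y_30 = b0 + b1 * 30
print(b0, b1, pred_ln_y30)
```

Up to this point no correction is needed, because the forecast is of the log itself rather than of the level.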
The situation is different for predicting the level $y_{30}$:
HowtoPredictYfromLnY.doc
Page 1 of 3
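$$
\begin{aligned}
\text{Predicted } y_{30} &= E[\exp(\beta_0 + \beta_1 \cdot 30 + \varepsilon_{30})] \\
&= E[\exp(\beta_0 + \beta_1 \cdot 30)\,\exp(\varepsilon_{30})] \\
&= E[\exp(\beta_0 + \beta_1 \cdot 30)]\,E[\exp(\varepsilon_{30})].
\end{aligned}
$$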
In the third line of the preceding derivation, we use the fact that the $\exp(\beta_0 + \beta_1 \cdot 30)$ term is a
constant and therefore independent of $\exp(\varepsilon_{30})$. That allows us to separate the two terms.
We’d like to be able to write:
$$
\text{Predicted } y_{30} = E[\exp(\beta_0 + \beta_1 \cdot 30)]
$$
which looks like it should work because $E[\varepsilon_{30}] = 0$ and exp(0) = 1. We could then simply
substitute the estimated coefficients $b_0$ and $b_1$ for the true coefficients $\beta_0$ and $\beta_1$. The problem is that
$E[\exp(\varepsilon_{30})] \neq \exp(E[\varepsilon_{30}])$. The expected value of $\exp(\varepsilon_{30})$ depends on the distribution of the error
terms. In general, the larger the SD of the error terms, the greater the expected value of the
exponential of a mean-zero error term. For example, when the errors are normally distributed
and $E[\varepsilon_{30}] = 0$,
$$
E[\exp(\varepsilon_{30})] = \exp\left(\frac{SD(\varepsilon_{30})^2}{2}\right).
$$
If you are comfortable assuming that the errors are normally distributed, an obvious correction
is to substitute in the RMSE for $SD(\varepsilon_{30})$ in the above expression to obtain the correction
factor; we call this the normal correction.
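The normal correction can be checked numerically. The sketch below is only an illustration under stated assumptions: the error SD of 0.5 is a hypothetical value chosen to make the effect visible, and `normal_correction` is a name introduced here. It simulates mean-zero normal errors and compares the average of exp(error) with the formula and with the naive value exp(mean error).

```python
import numpy as np

def normal_correction(rmse):
    """Correction factor exp(rmse^2 / 2), assuming mean-zero normal errors."""
    return float(np.exp(rmse ** 2 / 2.0))

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
sigma = 0.5                      # hypothetical SD of the error term
eps = rng.normal(0.0, sigma, size=1_000_000)

simulated = float(np.exp(eps).mean())    # Monte Carlo estimate of E[exp(eps)]
theoretical = normal_correction(sigma)   # exp(sigma^2 / 2)
naive = float(np.exp(eps.mean()))        # exp of the sample mean error, close to exp(0) = 1

print(simulated, theoretical, naive)
```

In an application, the regression RMSE would take the place of `sigma`, and the uncorrected prediction exp(b0 + b1 t) would be multiplied by the resulting factor.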
A more general procedure, which works whether or not the errors are normally
distributed, is the following: first compute the antilog of the predicted log of the dependent
variable for each observation—call this Exp(Predicted Ln y). Then regress the actual values of
the dependent variable (the y series) on Exp(Predicted Ln y) without an intercept. Finally, use
the estimated slope (call it c) as the general correction factor:
$$
\text{Predicted } y_t = c \cdot \exp(b_0 + b_1 t), \quad t = 1, \ldots, T.
$$
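The three steps of the general procedure can be sketched in Python as follows. The data are synthetic, with skewed (non-normal) mean-zero errors chosen only for illustration, and the helper name `general_correction` is an assumption of this example. The slope of a regression through the origin of $y$ on $m$ = Exp(Predicted Ln y) is computed directly as $\sum m_t y_t / \sum m_t^2$.

```python
import numpy as np

def general_correction(y, pred_ln_y):
    """Slope c from regressing y on exp(pred_ln_y) without an intercept."""
    y = np.asarray(y, dtype=float)
    m = np.exp(np.asarray(pred_ln_y, dtype=float))   # Exp(Predicted Ln y)
    return float((m @ y) / (m @ m))                  # OLS slope through the origin

# Hypothetical data: ln y_t = 1.0 + 0.05 t + error, with skewed mean-zero errors
rng = np.random.default_rng(1)   # fixed seed so the sketch is reproducible
T = 40
t = np.arange(1, T + 1)
eps = rng.exponential(0.2, size=T) - 0.2             # exponential errors shifted to mean zero
y = np.exp(1.0 + 0.05 * t + eps)

# Regress ln y on t, then form Predicted ln y for each observation
b1, b0 = np.polyfit(t, np.log(y), 1)                 # polyfit returns [slope, intercept]
pred_ln_y = b0 + b1 * t

c = general_correction(y, pred_ln_y)                 # general correction factor
pred_y = c * np.exp(pred_ln_y)                       # corrected predictions of the level
print(c)
```

Because c is estimated from the sample without assuming a distribution for the errors, it is not forced to be at least 1 the way the normal correction is.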
These methods are demonstrated in the AnnualGDP.xls file of Chapter 21. In practice the
choice of which procedure to use can make a difference, as the GDP example shows: the
correction factor based on the normal method is 1.00083; that based on the general method is
0.990. For more on these procedures see Wooldridge (2003), pp. 207-210.
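To put the two reported factors in perspective, the short sketch below inverts the normal-correction formula to back out the RMSE it implies, and applies both factors to a hypothetical uncorrected prediction of 100. The implied RMSE is an inference from the reported factor under the normality formula, not a number taken from the workbook, and the level of 100 is made up for illustration.

```python
import math

normal_factor = 1.00083    # normal correction reported in the text
general_factor = 0.990     # general correction reported in the text

# The normal correction is exp(rmse^2 / 2); solving for the RMSE gives:
implied_rmse = math.sqrt(2.0 * math.log(normal_factor))

level = 100.0              # hypothetical value of exp(b0 + b1 * t)
normal_pred = level * normal_factor     # prediction after the normal correction
general_pred = level * general_factor   # prediction after the general correction

print(implied_rmse, normal_pred, general_pred)
```

The normal correction can never fall below 1, since it is the exponential of a nonnegative number, so the two methods adjust the prediction in opposite directions in this example.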
Reference:
Wooldridge, Jeffrey M. (2003) Introductory Econometrics: A Modern Approach. Second
Edition. Mason, Ohio: South-Western.