How to Predict the Value of Y using a Regression Involving ln(Y)

When predicting the level of a variable from a regression in which the log of the variable is the dependent variable, the straightforward procedure of taking the antilog of Predicted $\ln y_t$ is not quite right. In this How To document, we explain why and how to do the job correctly.

The difficulty is related to the distinction between finding an expected value (a forecasted value) when we are dealing with the original equation and finding an expected value when we are working with a transformation of the original equation.

Here is the assumed data generation process:

$$\ln y_t = \beta_0 + \beta_1 t + \varepsilon_t .$$

Suppose we know $\beta_0$ and $\beta_1$. Then to predict $\ln y_{30}$, we substitute in 30 for $t$ and obtain:

$$\text{Predicted } \ln y_{30} = E[\beta_0 + \beta_1 \cdot 30 + \varepsilon_{30}] = \beta_0 + \beta_1 \cdot 30 + E[\varepsilon_{30}] .$$

Because $\beta_0$ and $\beta_1$ are constants, their expected value is just their value. Given that $E[\varepsilon_{30}] = 0$, the above expression reduces to

$$\text{Predicted } \ln y_{30} = \beta_0 + \beta_1 \cdot 30 .$$

When we have estimates $b_0$ and $b_1$ for $\beta_0$ and $\beta_1$, we simply substitute in their values to obtain our forecast:

$$\text{Predicted } \ln y_{30} = b_0 + b_1 \cdot 30 .$$

HowtoPredictYfromLnY.doc Page 1 of 3

The situation is different for predicting $y_{30}$:

$$
\begin{aligned}
\text{Predicted } y_{30} &= E[\exp(\beta_0 + \beta_1 \cdot 30 + \varepsilon_{30})] \\
&= E[\exp(\beta_0 + \beta_1 \cdot 30)\,\exp(\varepsilon_{30})] \\
&= E[\exp(\beta_0 + \beta_1 \cdot 30)]\; E[\exp(\varepsilon_{30})]
\end{aligned}
$$

In the third line of the preceding derivation, we use the fact that the $\exp(\beta_0 + \beta_1 \cdot 30)$ term is a constant and therefore independent of $\exp(\varepsilon_{30})$. That allows us to separate the two terms. We'd like to be able to write:

$$\text{Predicted } y_{30} = E[\exp(\beta_0 + \beta_1 \cdot 30)] = \exp(\beta_0 + \beta_1 \cdot 30),$$

which looks like it should work because $E[\varepsilon_{30}] = 0$ and $\exp(0) = 1$. We could then simply substitute the estimated coefficients $b_0$ and $b_1$ for the true parameters $\beta_0$ and $\beta_1$.

The problem is that $E[\exp(\varepsilon_{30})] \neq \exp(E[\varepsilon_{30}])$. The expected value of $\exp(\varepsilon_{30})$ depends on the distribution of the error terms. In general, the larger the SD of the error terms, the greater the expected value of the exponential of a mean-zero error term. For example, when the errors are normally distributed with $E[\varepsilon_{30}] = 0$ and $SD(\varepsilon_{30}) = \sigma$,

$$E[\exp(\varepsilon_{30})] = \exp(\sigma^2 / 2) .$$
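The claim that the expected value of the exponential of a mean-zero error exceeds 1, and grows with the SD of the errors, can be checked with a small simulation. The following is only a sketch using Python's standard library; the function name `mean_exp_of_normal_errors`, the number of draws, and the seed are illustrative choices, not part of the original document.

```python
import math
import random
import statistics


def mean_exp_of_normal_errors(sigma, n_draws=200_000, seed=0):
    """Simulated average of exp(e) for e ~ Normal(0, sigma).

    Hypothetical helper for illustration only: it approximates
    E[exp(e)] by averaging exp(e) over many random draws.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        math.exp(rng.gauss(0.0, sigma)) for _ in range(n_draws)
    )


for sigma in (0.1, 0.5, 1.0):
    simulated = mean_exp_of_normal_errors(sigma)
    theoretical = math.exp(sigma ** 2 / 2)  # normal-errors formula
    # exp(E[e]) = exp(0) = 1, so any gap above 1 is the bias from
    # taking the antilog without a correction.
    print(f"sigma={sigma}: simulated={simulated:.4f}, "
          f"exp(sigma^2/2)={theoretical:.4f}, exp(E[e])=1.0000")
```

The simulated averages should sit close to $\exp(\sigma^2/2)$ and move further above 1 as $\sigma$ increases, which is why the uncorrected antilog tends to underpredict $y$ when the errors are normal.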
If you are comfortable assuming that the errors are normally distributed, so that $E[\exp(\varepsilon_{30})] = \exp(\sigma^2/2)$ with $\sigma = SD(\varepsilon_{30})$, an obvious correction is to substitute the RMSE for $SD(\varepsilon_{30})$ in this expression to obtain the correction factor $\exp(\text{RMSE}^2/2)$; we call this the normal correction.

A more general procedure, which works whether or not the errors are normally distributed, is the following: first compute the antilog of the predicted log of the dependent variable for each observation; call this Exp(Predicted Ln y). Then regress the actual values of the dependent variable (the y series) on Exp(Predicted Ln y) without an intercept. Finally, use the estimated slope (call it $c$) as the general correction factor:

$$\text{Predicted } y_t = c \cdot \exp(b_0 + b_1 t), \qquad t = 1, \ldots, T .$$

These methods are demonstrated in the AnnualGDP.xls file of Chapter 21. In practice the choice of which procedure to use can make a difference, as the GDP example shows: the correction factor based on the normal method is 1.00083; that based on the general method is 0.990. For more on these procedures see Wooldridge (2003), pp. 207-210.

Reference:

Wooldridge, Jeffrey M. (2003). Introductory Econometrics: A Modern Approach. Second Edition. Mason, Ohio: South-Western.
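As a supplementary illustration, the normal correction and the general correction described in this document can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the data are synthetic (they are not the AnnualGDP.xls series), the function names are hypothetical, and the RMSE is computed with $T - 2$ degrees of freedom, which is an assumption about how the RMSE is defined.

```python
import math
import random


def fit_log_linear(t, y):
    """OLS of ln(y) on t. Returns (b0, b1, rmse)."""
    n = len(t)
    ln_y = [math.log(v) for v in y]
    t_bar = sum(t) / n
    ln_bar = sum(ln_y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (li - ln_bar) for ti, li in zip(t, ln_y))
    b1 = sxy / sxx
    b0 = ln_bar - b1 * t_bar
    ssr = sum((li - (b0 + b1 * ti)) ** 2 for ti, li in zip(t, ln_y))
    rmse = math.sqrt(ssr / (n - 2))  # assumption: n - 2 degrees of freedom
    return b0, b1, rmse


def normal_correction(rmse):
    """Correction factor exp(RMSE^2 / 2), valid under normal errors."""
    return math.exp(rmse ** 2 / 2)


def general_correction(t, y, b0, b1):
    """Slope from regressing y on exp(b0 + b1*t) with no intercept."""
    m = [math.exp(b0 + b1 * ti) for ti in t]
    return sum(mi * yi for mi, yi in zip(m, y)) / sum(mi * mi for mi in m)


# Synthetic data, for illustration only: ln y_t = 1.0 + 0.03 t + error.
rng = random.Random(1)
t = list(range(1, 51))
y = [math.exp(1.0 + 0.03 * ti + rng.gauss(0.0, 0.2)) for ti in t]

b0, b1, rmse = fit_log_linear(t, y)
c_normal = normal_correction(rmse)
c_general = general_correction(t, y, b0, b1)

naive = math.exp(b0 + b1 * 30)  # uncorrected antilog prediction for t = 30
print(f"naive prediction:          {naive:.4f}")
print(f"normal-corrected (x{c_normal:.5f}): {c_normal * naive:.4f}")
print(f"general-corrected (x{c_general:.5f}): {c_general * naive:.4f}")
```

The normal correction is always at least 1 by construction, while the general correction is an estimated slope and can fall on either side of 1, which is consistent with the two factors reported for the GDP example.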