The Exponential Model

The exponential model is a useful tool because in its most typical "first order" form it traces out the path of a variable that is growing at a constant rate. By generalizing the model, the assumption of constant growth can be dropped, allowing the model to better fit the path of a variable that is not growing at a constant rate.

The First Order Exponential Model

The "first order" exponential model generates a path with a constant growth rate. Letting r represent this constant rate and letting y denote the value of the variable that is growing, the value of the variable at a point in time t is given by

(1) y_t = y_0 e^{rt},

where y_0 is the initial value of the variable at point in time t = 0, and r is the constant rate at which the variable is growing. The continuous rate of growth of the variable y is represented by the quantity (1/y)(dy/dt). The quantity y_t in equation (1) represents the value of the variable y at time t, so we can also write y_t as y(t). The representation y(t) emphasizes that we are thinking of the value of the variable y as a function of the time t. If this function is differentiable, then it is common to write the derivative as y'(t) = dy/dt. Since y(t) = y_t, it follows that we obtain the derivative y'(t) for equation (1) by taking the derivative of the right side of the equation with respect to t.

Given data on the variable y, the least squares regression technique can be used to obtain an estimate of the constant growth rate r. This estimate characterizes the trend followed by the variable; in particular, it is the best-fit constant growth rate. Because least squares is a "linear regression" technique, we must "linearize" the model before we can apply least squares. This can be accomplished by taking the natural log of both sides of equation (1). Doing so, we obtain

(2) ln(y_t) = ln(y_0) + rt.
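As a sketch of this linearize-then-regress step, the following simulates a series growing at a constant 3 percent rate and recovers r by least squares on equation (2). The data here are synthetic stand-ins, not the series studied later in this section.

```python
import numpy as np

# Hypothetical illustration: simulate y_t = y_0 * e^{rt} with r = 0.03,
# perturbed by small noise, then estimate r from ln(y_t) = ln(y_0) + r*t.
rng = np.random.default_rng(0)
t = np.arange(60)
y = 100.0 * np.exp(0.03 * t) * np.exp(rng.normal(0, 0.01, t.size))

# Least squares fit of ln(y_t) on t: slope estimates r, intercept estimates ln(y_0)
slope, intercept = np.polyfit(t, np.log(y), 1)
print(f"estimated r = {slope:.4f}, estimated y_0 = {np.exp(intercept):.1f}")
```

The slope of the fitted line is the estimate of the constant growth rate r, and exponentiating the intercept recovers the estimate of y_0.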
Notice that this "logarithmic transformation" moved r from being part of the exponent on the number e in equation (1) to being a coefficient on the variable t in equation (2). In equation (1), the fact that r appears in an exponent implies that there is a non-linear relationship between r and y_t; i.e., you would not get a straight line if you plotted the relationship between y_t and r. However, in equation (2), the fact that r is a coefficient implies that there is a linear relationship between r and ln(y_t); i.e., you would get a straight line if you plotted the relationship between ln(y_t) and r. Because of this linear relationship, we can obtain a least squares estimate for r by regressing ln(y_t) on t.

The Second Order Exponential Model

The "second order" exponential model generates a path where the growth rate of the variable may change:

(3) y_t = y_0 e^{r_1 t + r_2 t^2}.

Taking the natural log of (3), we have ln(y_t) = ln(y_0) + r_1 t + r_2 t^2. Differentiating with respect to time t, we obtain (1/y_t)(dy_t/dt) = r_1 + 2 r_2 t. Letting g_t denote the growth rate of the variable y, letting a_0 = r_1, and letting a_1 = 2 r_2, we can rewrite the last equation as

(4) g_t = a_0 + a_1 t.

Equation (4) is a first degree polynomial model. Comparing equations (3) and (4), we find that a second degree exponential model for the level becomes a first degree polynomial model for the growth rate. In general, when differentiated with respect to time, an exponential model of degree k becomes a polynomial model of degree k-1. In particular, a fourth degree exponential model would be written

(5) y_t = y_0 e^{r_1 t + r_2 t^2 + r_3 t^3 + r_4 t^4}.

Taking the natural log of (5) and differentiating with respect to time t, we obtain

(6) g_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3,

where g_t is again the growth rate of y, a_0 = r_1, a_1 = 2 r_2, a_2 = 3 r_3, and a_3 = 4 r_4. Equation (6) is a relatively general model in that it allows the growth rate to vary in a flexible manner. Note that, if a_1 = a_2 = a_3 = 0, then the growth rate is constant and equal to g_t = a_0.
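The term-by-term mapping from the exponential coefficients r_1, ..., r_k to the growth-rate polynomial coefficients a_0, ..., a_{k-1} can be sketched in a few lines. The function name here is illustrative, not from the source.

```python
# Differentiating ln(y_t) = ln(y_0) + r1*t + r2*t^2 + ... + rk*t^k with respect
# to t sends each term rk*t^k to k*rk*t^(k-1), so a_{k-1} = k * r_k, as in (4) and (6).
def growth_rate_coeffs(r):
    """r = [r1, r2, ..., rk]; returns [a0, a1, ..., a_{k-1}]."""
    return [(k + 1) * rk for k, rk in enumerate(r)]

# Fourth degree exponential model -> third degree growth-rate polynomial
print(growth_rate_coeffs([0.02, 0.001, -0.0001, 0.000002]))
```

For a degree-4 model this reproduces a_0 = r_1, a_1 = 2r_2, a_2 = 3r_3, a_3 = 4r_4.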
If a_2 = a_3 = 0 but a_1 ≠ 0, then the growth rate either increases over time (if a_1 > 0) or decreases over time (if a_1 < 0). If a_3 = 0 but a_1 ≠ 0 and a_2 ≠ 0, then the growth rate can increase and then decrease, or vice versa. Finally, if a_1 ≠ 0, a_2 ≠ 0, and a_3 ≠ 0, then the growth rate can change directions twice, which is about as much flexibility as you would want for any model.

By estimating the parameters in the exponential model, it is possible to create a model that fits the level of the variable that is growing, and this model can be used to forecast the level. To illustrate, consider the following figure, which presents an employment measure called Full-Time Equivalent Employees. Clearly, the employment level has increased over time, though it is also evident that there are some years in which the employment level decreases.

[Figure: U.S. Full Time Equivalent Employees, 1948-2006 (thousands employed)]

To obtain a model that fits this time series, we begin with the general exponential model (5). Because it is nonlinear in the parameters, we must first take the natural log of model (5) to linearize it. Doing so, we obtain

(7) ln(y_t) = ln(y_0) + r_1 t + r_2 t^2 + r_3 t^3 + r_4 t^4.

Because there is a linear relationship between ln(y_t) and the parameters r_1, r_2, r_3, and r_4, estimates for these four parameters and an estimate for the constant ln(y_0) can be obtained by regressing ln(y_t) on the variables t, t^2, t^3, and t^4. The data plotted in the figure start in 1948 and run to the year 2006. The variable t runs from zero, which is associated with the year 1948, up to 58, which is associated with the last year, 2006. The variables t^2, t^3, and t^4 are constructed by squaring, cubing, and raising to the fourth power the variable t. The variable ln(y_t) is constructed by taking the natural log of the employment variable.
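The construction just described, building t, t^2, t^3, and t^4 and regressing ln(y_t) on them, can be sketched as follows. The employment data are not reproduced here, so a synthetic series stands in; the coefficients below are placeholders, not the estimates reported later.

```python
import numpy as np

# Sketch of the regression behind equation (7). t = 0 corresponds to 1948
# and t = 58 to 2006, matching the text's indexing.
rng = np.random.default_rng(1)
t = np.arange(59, dtype=float)
ln_y = 10.8 + 0.02 * t - 1e-4 * t**2 + rng.normal(0, 0.01, t.size)  # stand-in data

# Design matrix: constant, t, t^2, t^3, t^4
X = np.column_stack([np.ones_like(t), t, t**2, t**3, t**4])

# Least squares estimates of ln(y_0), r1, r2, r3, r4
coeffs, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
print("ln(y0), r1, r2, r3, r4 estimates:", np.round(coeffs, 6))
```

Note that the powers of t become highly correlated, so the higher-order estimates can be individually imprecise even when the overall fit is tight, which foreshadows the insignificant t^3 and t^4 estimates below.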
The following displays the regression output obtained from Excel. The R Square, which ranges between 0 and 1, measures the percentage of the variance in the dependent variable ln(y_t) that is explained by the regression. R^2 = 0.9955 indicates that 99.55 percent of the variance is explained. This is a very high R square, but it is typical for this type of regression, where the model is fitting a trend. It is important to understand that, while it does provide information about how well the model fits the data, the R square should generally not be used to determine the quality of the model.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.997743599
R Square             0.995492289
Adjusted R Square    0.995158385
Standard Error       2016.212786
Observations         59

ANOVA
             df    SS             MS             F
Regression    4    48478406922    12119601731    2981.368
Residual     54    219516156      4065114
Total        58    48697923078

             Coefficients    Standard Error    t Stat           P-value
Intercept    51863.38621     1189.562957       43.59868966      9E-44
t            804.5338063     289.3534157       2.780453807      0.007457
t^2          15.01072125     20.54046926       0.73078765       0.468068
t^3          0.278632597     0.534288411       0.521502227      0.604149
t^4          -0.005357015    0.004568738       -1.172537265     0.246128

The Coefficients column presents the estimates of the parameters. The intercept, which is also referred to as the constant, is the estimate of ln(y_0) in equation (7). The other four numbers in the column are the estimates of r_1, r_2, r_3, and r_4. The full summary output presented here is not what is typically reported. More typically, the estimated equation is presented in something like the following manner:

(8) ln(y_t) = 51,863 + 805t + 15.0t^2 + 0.28t^3 - 0.005t^4,   R^2 = 0.9955
              (1,190)*** (289)*** (20.5)   (0.53)    (0.005)

Underneath each estimate, the standard error associated with the estimate is presented in parentheses. Some researchers would present the t-stat rather than the standard error.
The two are interchangeable in a sense because there is a relationship between them: the t-stat is equal to the coefficient divided by the standard error. The t-stat is a test statistic associated with the null hypothesis that the coefficient is zero. The estimate for the coefficient will (almost surely) never actually be precisely zero; the question is how close the coefficient is to zero. The standard error is a measure of the lack of fit associated with the coefficient. The t-stat will be near zero when the coefficient itself is near zero, or when the standard error is large. When the t-stat is far from zero, we would tend to want to reject the hypothesis that the coefficient is zero. We say the coefficient is statistically insignificant if it is close enough to zero that we fail to reject the null hypothesis that the coefficient is zero.

The p-value is the lowest level of significance at which we can reject the null hypothesis. For example, the p-value of 0.246128 for the coefficient estimate -0.005 for r_4, the coefficient on t^4, indicates we can reject the hypothesis that r_4 = 0 only at a 25 percent (or higher) level of significance. This is the same as saying we can reject the hypothesis with only about 75 percent confidence. Typically, at least 90 percent confidence is desirable for rejection. Thus, in this case the p-value of 0.246128 indicates we should fail to reject the hypothesis, meaning we conclude the r_4 estimate is insignificant, which is the same as saying it is not significantly different than zero (even though it is actually -0.005). The commonly used levels of significance are 10 percent, 5 percent, and 1 percent. Note that the p-value for the intercept estimate is 9E-44. This is a very small number: a decimal point followed by 43 zeros before you reach the 9. Thus, we are 99.99999... percent confident that the intercept is not zero, so we confidently reject the null hypothesis that it is zero.
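The coefficient-to-t-stat relationship can be checked directly against the r_4 row of the regression output above. The critical values used below are the familiar large-sample approximations, quoted from memory rather than from the source.

```python
# The t-stat is the coefficient divided by its standard error. Using the
# r4 estimate and standard error from the Excel output above:
coef, se = -0.005357015, 0.004568738
t_stat = coef / se
print(f"t-stat = {t_stat:.4f}")  # reproduces the -1.1725 in the t Stat column

# With many degrees of freedom, |t| must exceed roughly 1.645, 1.96, or 2.576
# to reject at the 10%, 5%, or 1% significance level.
for level, crit in [(0.10, 1.645), (0.05, 1.960), (0.01, 2.576)]:
    print(f"reject r4 = 0 at the {level:.0%} level? {abs(t_stat) > crit}")
```

Since |t| = 1.17 falls short of even the 10 percent critical value, the test fails to reject at every conventional level, consistent with the p-value of 0.246.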
One common way of indicating the level of confidence at which a null hypothesis can be rejected is by attaching asterisks to the standard errors, as shown in (8). Three asterisks (***) indicate 99 percent confidence, or rejection at a p-value less than or equal to 0.01. Two asterisks (**) indicate 95 percent confidence, or rejection at a p-value less than or equal to 0.05 but greater than 0.01. One asterisk (*) indicates 90 percent confidence, or rejection at a p-value less than or equal to 0.10 but greater than 0.05. Applying this method to (8), the lack of asterisks on the last three standard errors indicates the estimates for r_2, r_3, and r_4 are insignificant, while the three asterisks for the intercept and for r_1 indicate that we are very confident these estimates are not zero.

Because we find that our estimate for r_4 is insignificant, the common practice is to drop the variable t^4 from the regression and re-estimate. Doing so, we obtain

(9) ln(y_t) = 52,643 + 513t + 38.1t^2 - 0.34t^3,   R^2 = 0.9954
              (990)***  (149)*** (6.00)*** (0.68)***

In (9), the asterisks indicate that all of the estimates are significant at the 1 percent level of significance, meaning we have at least 99 percent confidence that all of the estimates are significantly different than zero. Notice the R square dropped, which will happen whenever you drop an explanatory variable. However, the R square typically will not drop much when you drop an insignificant variable, because that variable had close to zero explanatory power. Here, indeed, we see the R square did not drop much.

The following figure plots the model (9) along with the actual data.

[Figure: U.S. Full Time Equivalent Employees, 1948-2006 (thousands employed), actual vs. model]

Because the model (9) is based only upon the time variables, it can be used to forecast. This is accomplished by extending the time variables and then extending the model.
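The drop-and-re-estimate step, and the claim that R square falls only slightly when the dropped variable is insignificant, can be sketched as follows. The data are again a synthetic stand-in for the employment series.

```python
import numpy as np

# Refit without the t^4 regressor and compare R^2 across the two specifications.
rng = np.random.default_rng(2)
t = np.arange(59, dtype=float)
ln_y = 10.8 + 0.02 * t - 1e-4 * t**2 + rng.normal(0, 0.01, t.size)  # stand-in data

def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

X4 = np.column_stack([np.ones_like(t), t, t**2, t**3, t**4])
X3 = X4[:, :4]  # same design matrix with the t^4 column dropped

r2_full, r2_reduced = r_squared(X4, ln_y), r_squared(X3, ln_y)
print(f"R^2 with t^4:    {r2_full:.6f}")
print(f"R^2 without t^4: {r2_reduced:.6f}")
```

With a constant included, R square can only fall (weakly) when a regressor is dropped; when the dropped regressor carried almost no explanatory power, the fall is negligible, mirroring the move from 0.9955 to 0.9954 in the text.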
The following figure does this for 10 additional years, so we obtain a forecast to the year 2016. From our model, we find that the estimated employment level for 2016 is 155,721,500. One can see the cubic term of the polynomial in (9) at work. The third order of the polynomial is what allows the model to first increase at an increasing rate, but then increase at a decreasing rate.

[Figure: U.S. Full Time Equivalent Employees, 1948-2006, with the model forecast extended to 2016; actual vs. model]
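The mechanics of extending the time variables and then the model can be sketched as below. The coefficients are hypothetical placeholders chosen for illustration, not the estimates from equation (9).

```python
import numpy as np

# Sketch of the forecasting step: evaluate the fitted cubic at out-of-sample
# values of t. Coefficients are illustrative, not the estimates from (9).
b = np.array([10.85, 0.025, -1.5e-4, 5e-7])  # intercept, t, t^2, t^3 (hypothetical)

def fitted_ln_y(t):
    """Fitted value of ln(y_t) from the cubic trend model."""
    return b[0] + b[1] * t + b[2] * t**2 + b[3] * t**3

t_sample = np.arange(59)        # t = 0..58 covers 1948..2006
t_forecast = np.arange(59, 69)  # t = 59..68 covers 2007..2016

# Exponentiate to convert the fitted log back to the level of employment
forecast_levels = np.exp(fitted_ln_y(t_forecast))
print("forecast for 2016 (t = 68):", round(float(forecast_levels[-1]), 1))
```

Because the model depends only on t, forecasting requires nothing beyond extending the time index, which is exactly why trend models of this kind are so convenient, and also why they should be extrapolated with caution.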