Notes: Using a Dummy Variable

Because models describe generalities, a model will not usually predict an unusual change in a variable. The result is a large “error,” or difference between the model and the actual level of the variable. The “t-stat” associated with an estimated coefficient equals the coefficient divided by its standard error, and larger errors inflate that standard error. Any increase in the error therefore makes it more likely that the t-stat will be close to zero. That, in turn, makes it more likely that the coefficient is judged “insignificantly different than zero.”
By reducing the error associated with unusual changes in
a variable, the introduction of a “dummy variable” can increase t-stats and thereby increase the
likelihood that a coefficient is judged “significantly different than zero.” This set of notes
demonstrates the use of a dummy variable in the process of determining whether or not the U.S.
employment growth rate is decreasing.
Exponential Growth and the Log Difference
If a variable $y$ is growing at a constant rate $r$, starting at an initial level $y_0$, then we can write

$$ y_t = y_0 e^{rt}. \tag{1} $$

This implies that for the earlier point in time $t-1$, we can write

$$ y_{t-1} = y_0 e^{r(t-1)}. \tag{2} $$

Taking the natural log of (1), we have $\ln(y_t) = \ln(y_0) + rt$. Taking the natural log of (2), we have $\ln(y_{t-1}) = \ln(y_0) + r(t-1)$. Subtracting the latter equation from the former constructs the “log difference” of the variable, $\ln(y_t) - \ln(y_{t-1})$:

$$ \ln(y_t) - \ln(y_{t-1}) = \left[\ln(y_0) + rt\right] - \left[\ln(y_0) + r(t-1)\right], \tag{3} $$

which reduces to

$$ \ln(y_t) - \ln(y_{t-1}) = r. \tag{4} $$
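From conditions (1)-(4), we learn that the log difference equals the (continuously compounded) growth rate of a variable over the time period. It is common to use the log difference, rather than the percentage change $(y_t - y_{t-1})/y_{t-1}$, as the measure of a variable's growth rate. The two measures are close when growth is modest. Under (1), the percentage change equals $e^{r} - 1$, which is approximately $r$ for small $r$.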
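The result in (4) can be checked numerically with a short standard-library sketch. The series is hypothetical, with $y_0 = 100$ and $r = 0.03$ chosen for illustration. The function names are illustrative.

```python
import math

def log_differences(series):
    """Return ln(y_t) - ln(y_{t-1}) for each pair of consecutive observations."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

def percent_changes(series):
    """Return (y_t - y_{t-1}) / y_{t-1} for each pair of consecutive observations."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

# Hypothetical series growing continuously at r = 3 percent from y0 = 100, as in (1).
r, y0 = 0.03, 100.0
y = [y0 * math.exp(r * t) for t in range(6)]

print(log_differences(y))   # every entry equals r = 0.03 (up to floating-point error)
print(percent_changes(y))   # every entry equals exp(r) - 1, about 0.0305
```

The log difference recovers $r$ exactly. The percentage change is slightly larger because it measures discrete rather than continuous compounding.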
The Growth Rate of Employment in the U.S.
Let $L_t$ denote the level of employment during period $t$, so that $\ln(L_t) - \ln(L_{t-1})$ is the growth rate of employment. The following figure plots this employment growth rate for the U.S. over the 1948-2006 period.
[Figure: U.S. Employment Growth Rate. Vertical axis: percent change from previous year; horizontal axis: year.]
One question of interest is whether the employment growth rate is changing on average over time. The figure above clearly shows that the growth rate is volatile, meaning it occasionally changes dramatically. But is the employment growth rate trending up or down? To answer this question, we fit the following first-order polynomial model to the growth rate data:
$$ g_{Lt} = a_0 + a_1 t, \tag{5} $$

where $g_{Lt}$ is the employment growth rate during period $t$, measured by the log difference.
Regressing this growth rate on the time variable $t$, we obtain

$$ g_{Lt} = \underset{(0.005)^{***}}{0.022} - \underset{(0.0002)}{0.00014}\,t, \qquad R^2 = 0.0157, \tag{6} $$

where the standard error is presented in parentheses under each estimated coefficient.
The estimated model (6) indicates that the growth rate of employment is decreasing slightly over time, but this decreasing trend is “not significant.” In particular, the standard error of 0.0002 is large relative to the coefficient estimate of -0.00014 on the $t$ variable, which implies a relatively small t-stat. The t-stat is the coefficient divided by its standard error. With the rounded figures shown in (6), this gives -0.00014/0.0002 = -0.7; the reported value of -0.94 presumably reflects the unrounded estimates. (The t-stat is reported by the regression package. Because it can be obtained by this division, it is common practice to present either the standard error or the t-stat, but not both.) Roughly, t-stats in the range [-2, 2] indicate non-significance, while t-stats outside this range indicate significance. Thus, the “test statistic” of -0.94 indicates non-significance. The p-value reported for the -0.00014 estimate is .35, which confirms the non-significance conclusion. The estimate is significant only at the 65 percent (1 - 0.35) confidence level, and we want at least 90 percent confidence. The p-value itself is not shown in (6). Instead, the absence of asterisks on the standard error indicates that we have not reached the 90 percent confidence level.
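The arithmetic in this paragraph can be checked with a few lines of standard-library Python. This sketch assumes that the standard normal distribution is an adequate large-sample stand-in for the t distribution. The exact t distribution has heavier tails and gives a slightly larger p-value. The function names `t_stat` and `two_sided_p_value` are illustrative.

```python
from statistics import NormalDist

def t_stat(coef, se):
    """t-stat = estimated coefficient divided by its standard error."""
    return coef / se

def two_sided_p_value(t):
    """Two-sided p-value for a t-stat, using the standard normal as a
    large-sample approximation to the t distribution (an assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# Rounded figures shown in (6): coefficient -0.00014, standard error 0.0002.
print(round(t_stat(-0.00014, 0.0002), 2))   # -0.7

# t-stat reported in the text for the trend coefficient.
p = two_sided_p_value(-0.94)
print(round(p, 3))        # 0.347, close to the reported .35
print(round(1 - p, 2))    # 0.65, the confidence level reached

# Two-sided cutoff for 90 percent confidence under the normal approximation.
print(round(NormalDist().inv_cdf(0.95), 3))   # 1.645
```

Under this approximation, a coefficient needs a t-stat beyond about ±1.645 to reach the 90 percent level. The [-2, 2] rule of thumb corresponds more closely to 95 percent.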
Introducing a Dummy Variable
Notice in the figure above that the employment growth rate is negative in only a few
instances. Examining the data, we find a negative employment growth rate for the years 1949,
1954, 1958, 1970, 1971, 1975, 1982, 1991, 2001, 2002, and 2003. If we think of these years as
being “unusual” recessionary years, then it is reasonable to adjust our thinking about the trend so that these unusual periods do not receive undue influence. One way to make such an adjustment is to introduce a “dummy variable.”
Define a new variable D that is equal to 1 during a year when the employment growth
rate is negative and equal to zero for all the other years. This dummy variable is also referred to
as an “indicator” variable because the value of 1 indicates something, in this case negative
employment growth. When we include the dummy variable in the regression, our polynomial model for the employment growth rate becomes

$$ g_{Lt} = a_0 + a_1 t + a_2 D. \tag{7} $$
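Before (7) can be estimated, D has to be built. Here is a minimal standard-library sketch. It assumes that the sample runs from 1948 through 2006, as in the figure, and takes the negative-growth years from the list given above. The growth-rate values passed to the hypothetical helper `negative_growth_dummy` are made up for illustration.

```python
# Years with negative U.S. employment growth, as listed in the text.
NEGATIVE_GROWTH_YEARS = {1949, 1954, 1958, 1970, 1971, 1975,
                         1982, 1991, 2001, 2002, 2003}

years = list(range(1948, 2007))   # sample period 1948-2006
D = [1 if year in NEGATIVE_GROWTH_YEARS else 0 for year in years]
print(len(years), sum(D))         # 59 11

def negative_growth_dummy(growth_rates):
    """Indicator equal to 1 when the growth rate is negative, 0 otherwise."""
    return [1 if g < 0 else 0 for g in growth_rates]

# Hypothetical growth rates: only the negative one is flagged.
print(negative_growth_dummy([0.02, -0.01, 0.0, 0.03]))   # [0, 1, 0, 0]
```

When the growth-rate series itself is at hand, building D from the sign of each value gives the same indicator as the list of years.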
Estimating this model, we obtain

$$ g_{Lt} = \underset{(0.003)^{***}}{0.027} - \underset{(0.0001)}{0.00012}\,t - \underset{(0.0046)^{***}}{0.037}\,D, \qquad R^2 = 0.5483. \tag{8} $$
Comparing (6) and (8), notice that the introduction of the dummy variable dramatically increased the $R^2$, from 0.0157 to 0.5483. This indicates that much of the error in model (6) is due to the unusual years in which the employment growth rate turned negative. The coefficient on the time variable $t$ still hints at a downward trend in the employment growth rate. However, the absence of asterisks on its standard error indicates that this downward trend is not significant. Thus, while the introduction of a dummy variable can change the significance of another coefficient, it did not do so in this case.
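The jump in $R^2$ follows from its definition, $R^2 = 1 - \text{SSR}/\text{SST}$. Anything that shrinks the residuals in the unusual years raises it. Here is a minimal NumPy sketch on hypothetical growth rates (not the employment data), fitting models shaped like (5) and (7). The helper name `r_squared` is illustrative.

```python
import numpy as np

def r_squared(y, X):
    """R^2 = 1 - SSR/SST for an OLS fit of y on X (X includes a constant column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - (resid @ resid) / sst

# Hypothetical growth rates with three negative "recession" observations.
g = np.array([0.030, 0.025, -0.015, 0.028, 0.021, -0.020, 0.026, 0.024, -0.010, 0.027])
t = np.arange(g.size, dtype=float)
D = (g < 0).astype(float)               # dummy: 1 in negative-growth periods

ones = np.ones(g.size)
r2_trend = r_squared(g, np.column_stack([ones, t]))      # shaped like model (5)
r2_dummy = r_squared(g, np.column_stack([ones, t, D]))   # shaped like model (7)
print(round(r2_trend, 3), round(r2_dummy, 3))
```

Because (7) nests (5), adding D can never lower $R^2$. How much it raises $R^2$ depends on how much of the variation comes from the flagged periods.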
Note, however, that the addition of the dummy variable did increase the value of the intercept term, from 0.022 in model (6) to 0.027 in model (8). This matters. We do not find significant evidence of a time trend, so we should drop the variable $t$ from the model, and the intercept then becomes our estimate of the average growth rate. With $t$ dropped, model (5) reduces to $g_{Lt} = a_0$, which indicates that the growth rate is constant. The estimate of this constant is simply the average of the growth rates. For the data we have, we find

$$ g_{Lt} = 0.0176. \tag{9} $$
Dropping the variable $t$ from model (7) and regressing the employment growth rate on the dummy variable alone, we find

$$ g_{Lt} = \underset{(0.0019)^{***}}{0.0240} - \underset{(0.0046)^{***}}{0.037}\,D, \qquad R^2 = 0.5372. \tag{10} $$
The figure below shows what can also be seen in the estimated models (9) and (10). If we do not adjust for the unusual downturn years, our estimate of the average growth rate of employment is 1.76 percent per year. When we do adjust for them, our estimate of the average employment growth rate is 2.40 percent per year. This is a rather large quantitative difference. In our tests for trends in models (6) and (8), we cannot find evidence of a significant downward trend in the employment growth rate.
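The two averages differ because of how OLS treats a constant and a single dummy. The intercept equals the average growth rate over the years with D = 0. The intercept plus the dummy coefficient equals the average over the years with D = 1. A constant-only model like (9) simply returns the overall mean. Here is a minimal NumPy sketch on hypothetical growth rates (not the employment data) that checks these identities.

```python
import numpy as np

# Hypothetical growth rates: mostly positive, with a few negative "recession" years.
g = np.array([0.031, 0.025, -0.012, 0.028, 0.019, -0.020, 0.034, 0.022, 0.027, -0.008])
D = (g < 0).astype(float)   # dummy: 1 in negative-growth years

# Constant-only model, as in (9): the estimate is the sample mean.
a0_const = np.linalg.lstsq(np.ones((g.size, 1)), g, rcond=None)[0][0]

# Constant plus dummy, as in (10).
X = np.column_stack([np.ones(g.size), D])
a0, a2 = np.linalg.lstsq(X, g, rcond=None)[0]

print(np.isclose(a0_const, g.mean()))          # True: overall mean
print(np.isclose(a0, g[D == 0].mean()))        # True: mean of the D = 0 years
print(np.isclose(a0 + a2, g[D == 1].mean()))   # True: mean of the D = 1 years
```

Read this way, the 2.40 percent estimate in (10) is the average growth rate outside the negative-growth years. The 1.76 percent estimate in (9) averages over all years, including the downturns.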