Chapter 9: Dummy Variables

9.1 A Dummy Variable

A dummy variable is a variable that can take on only 2 possible values:
    yes, no
    up, down
    male, female
    union member, non-union member
Dummy variables provide a method for “quantifying” a “qualitative” variable.
The variable D = 1 if yes, D = 0 if no.
It doesn’t matter which category gets the 0 or the 1. The fitted values are the same either way; only the interpretation of the coefficients changes.

9.2 Estimation with Dummy Variables

If the dummy variable is the only independent variable:
    $Y_t = \beta_1 + \beta_2 D_t + e_t$
    If D = 0: $Y_t = \beta_1 + e_t$
    If D = 1: $Y_t = (\beta_1 + \beta_2) + e_t$

Example: Wage data (see class handout)
    FE = 0 if the person is male
    FE = 1 if the person is female
    $\text{Wage}_t = \beta_1 + \beta_2 FE_t + e_t$
Least squares regression will produce values $b_1$ and $b_2$ such that
    $b_1$ = the mean of the Wage values for the FE = 0 observations (males)
    $b_1 + b_2$ = the mean of the Wage values for the FE = 1 observations (females)
so $b_2$ is the difference between the two group means.

9.3 Estimation with Dummy Variables

If there is one continuous explanatory variable and one dummy variable:
    $Y_t = \beta_1 + \beta_2 X_t + \delta D_t + e_t$
    If D = 0: $Y_t = \beta_1 + \beta_2 X_t + e_t$
    If D = 1: $Y_t = (\beta_1 + \delta) + \beta_2 X_t + e_t$
Suppose that $\beta_1 > 0$, $\beta_2 > 0$, $\delta > 0$.
It is as though we have two regression lines that have the same slope coefficient but different intercepts.
[Figure: Y plotted against X. Two parallel lines with common slope $\beta_2$; the D = 0 line has intercept $\beta_1$ and the D = 1 line has intercept $\beta_1 + \delta$.]

9.4 Estimation with Dummy Variables

Example: Wage data (see class handout)
    FE = 0 if the person is male
    FE = 1 if the person is female
    $\text{Wage}_t = \beta_1 + \beta_2 ED_t + \beta_3 FE_t + e_t$
We estimate this model as an ordinary multiple regression model.
Our estimate $b_3$ measures the difference in expected wages between females and males who have the same level of education, that is, the wage difference after controlling for differences in education.
See class handout.

9.5 Interaction Terms

An interaction term is an independent variable that is the product of two other independent variables. These independent variables can be continuous or dummy variables.
    $Y_t = \beta_1 + \beta_2 X_t + \beta_3 Z_t + \beta_4 X_t Z_t + e_t$
In this model, the effect of X on Y depends on the level of Z: a one-unit change in X changes expected Y by $\beta_2 + \beta_4 Z_t$.
Likewise, the effect of Z on Y depends on the level of X: a one-unit change in Z changes expected Y by $\beta_3 + \beta_4 X_t$.
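The group-mean property from slide 9.2 and the interaction model from slide 9.5 can be checked numerically. The following is a minimal sketch, not the class handout: the data are simulated, the coefficient values used to generate them are arbitrary, and the helper `ols` is a hypothetical convenience wrapper around NumPy's least squares routine.

```python
import numpy as np

def ols(y, *regressors):
    """Least squares coefficients for y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200

# Simulated (hypothetical) wage data; all generating values are assumptions.
fe = rng.integers(0, 2, size=n)      # dummy: 0 = male, 1 = female
ed = rng.uniform(8, 18, size=n)      # years of education
wage = 4.0 + 0.8 * ed - 2.0 * fe + rng.normal(0, 1.5, size=n)

# Slide 9.2: the dummy is the only regressor.
b1, b2 = ols(wage, fe)
mean_male = wage[fe == 0].mean()
mean_female = wage[fe == 1].mean()
print("b1      =", b1, " mean(FE=0) =", mean_male)
print("b1 + b2 =", b1 + b2, " mean(FE=1) =", mean_female)

# Slides 9.3 and 9.4: one continuous regressor plus the dummy.
# Both groups share the slope c2; the intercept shifts by c3 when FE = 1.
c1, c2, c3 = ols(wage, ed, fe)
print("male line:   intercept", c1, "slope", c2)
print("female line: intercept", c1 + c3, "slope", c2)

# Slide 9.5: interaction of two continuous variables.
# The data are noise-free here so the generating coefficients are recovered.
x = rng.uniform(0, 10, size=n)
z = rng.uniform(0, 5, size=n)
y = 1.0 + 2.0 * x + 3.0 * z + 0.5 * x * z
g1, g2, g3, g4 = ols(y, x, z, x * z)

def effect_of_x(z_level):
    """Change in fitted Y for a one-unit change in X at the given level of Z."""
    return g2 + g4 * z_level

print("effect of X at Z = 0:", effect_of_x(0.0))
print("effect of X at Z = 4:", effect_of_x(4.0))
```

The first comparison holds exactly for any data set, because the dummy-only regression fits each group by its own mean. The interaction part shows the effect of X rising with Z when the coefficient on the product term is positive.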
9.6 Interaction Terms Involving Dummy Variables

    $Y_t = \beta_1 + \beta_2 X_t + \beta_3 D_t + \beta_4 D_t X_t + e_t$
    If D = 0: $Y_t = \beta_1 + \beta_2 X_t + e_t$
    If D = 1: $Y_t = (\beta_1 + \beta_3) + (\beta_2 + \beta_4) X_t + e_t$
Suppose that $\beta_1 > 0$, $\beta_2 > 0$, $\beta_3 > 0$, $\beta_4 > 0$.
It is as though we have two regression lines that have different slope coefficients and different intercepts.
[Figure: Y plotted against X. The D = 0 line has intercept $\beta_1$ and slope $\beta_2$; the D = 1 line has intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$.]

9.7

Dependent Variable: lnwage

Analysis of Variance

                              Sum of        Mean
Source              DF       Squares      Square    F Value    Pr > F
Model                1      36.74586    36.74586     122.43    <.0001
Error              532     159.67604     0.30014
Corrected Total    533     196.42191

Root MSE           0.54785    R-Square    0.1871
Dependent Mean     2.80361    Adj R-Sq    0.1855
Coeff Var         19.54101

Parameter Estimates

                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1       1.33757     0.13460       9.94      <.0001
ed            1       0.10673     0.00965      11.06      <.0001

******************************************************************************

Dependent Variable: lnwage

Analysis of Variance

                              Sum of        Mean
Source              DF       Squares      Square    F Value    Pr > F
Model                3      45.65281    15.21760      53.49    <.0001
Error              530     150.76910     0.28447
Corrected Total    533     196.42191

Root MSE           0.53336    R-Square    0.2324
Dependent Mean     2.80361    Adj R-Sq    0.2281
Coeff Var         19.02397

Parameter Estimates

                    Parameter    Standard
Variable     DF      Estimate       Error    t Value    Pr > |t|
Intercept     1       1.54743     0.17989       8.60      <.0001
ed            1       0.09993     0.01306       7.65      <.0001
feed          1       0.02233     0.01885       1.18      0.2368
female        1      -0.56090     0.26344      -2.13      0.0337
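The second regression output can be read as two separate lines, one per group, in the way slide 9.6 describes. The sketch below only does arithmetic on the reported estimates. It rests on two assumptions that the output itself does not state: that `feed` is the product of `female` and `ed`, and that `female` equals 1 for women and 0 for men. The function names are hypothetical.

```python
# Estimates copied from the second regression output (dependent variable: lnwage).
# Assumption: feed = female * ed, with female = 1 for women and 0 for men.
b_intercept = 1.54743
b_ed = 0.09993
b_feed = 0.02233
b_female = -0.56090

# Implied line for each group (slide 9.6 with D = female, X = ed).
male_intercept, male_slope = b_intercept, b_ed
female_intercept = b_intercept + b_female
female_slope = b_ed + b_feed

def predicted_lnwage(ed, female):
    """Fitted lnwage at a given education level for female = 0 or 1."""
    return b_intercept + b_ed * ed + b_feed * female * ed + b_female * female

def predicted_gap(ed):
    """Female minus male fitted lnwage at the same education level."""
    return predicted_lnwage(ed, 1) - predicted_lnwage(ed, 0)

print("male:   lnwage =", male_intercept, "+", male_slope, "* ed")
print("female: lnwage =", round(female_intercept, 5), "+", round(female_slope, 5), "* ed")
for years in (8, 12, 16):
    print("ed =", years, " gap in fitted lnwage:", round(predicted_gap(years), 5))
```

Under these assumptions the gap at a given education level is `b_female + b_feed * ed`, so it narrows as education rises. Note that the `feed` estimate has a p-value of 0.2368 in the output, so the difference in slopes is not statistically significant at the 5% level, while the intercept shift (`female`, p-value 0.0337) is.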