Rheinisch-Westfälisches Institut für Wirtschaftsforschung

"Implementing Restricted Least Squares in Linear Models"

Dr. John P. Haisken-DeNew
jhaiskendenew@rwi-essen.de

Haisken-DeNew / Stata 2006 Mannheim, March 31, 2006


1a. Background

Inter-Industry Wage Differentials
- Why do secretaries in the steel industry make more money than otherwise observably identical secretaries in the services industry?
- Calculating "wage differentials": are wages in steel > services?
- Dummy variables: 0 or 1

Starting Point
Krueger/Summers (1988), "Efficiency Wages and the Inter-Industry Wage Structure", Econometrica, 56, pp. 259-93.
- Would like to interpret differentials as deviations from a weighted average
- Remove arbitrary selection of reference category
- Excellent seminal paper, however technical problems …
- Attempt to implement Restricted Least Squares (RLS), but:
  - Incorrect standard errors: t-values systematically biased downward
  - Incorrect overall inference: variation systematically biased downward


1b. Background

Technical Contribution (in Handout)
Haisken-DeNew/Schmidt (1997), "Inter-Industry and Inter-Regional Differentials: Mechanics and Interpretation", Review of Economics and Statistics, 79(3), pp. 517-21.
- How to implement Restricted Least Squares (RLS) correctly
- How to implement RLS after any linear model (OLS, FE, RE…)
- RLS was implemented in GAUSS, LIMDEP and Stata (crudely)

Now RLS is implemented in Stata in a flexible Ado <hds97.ado>
- What does the syntax look like?


2a. RLS <hds97.ado> - One Dummy Set

Run a linear regression
    reg/xtreg depvar indepvars

Standard Syntax (only ONE dummy set)
    hds97 indepvars [, options]

    options             description
    refname(string)     a string containing the name of the "reference" category
    realname(string)    a string containing a descriptive name for the set of dummy variables
    weight(varname)     a string containing the name of the weighting variable


2b. RLS <hds97.ado> - Many Dummy Sets

Run a linear regression
    reg/xtreg depvar x* Xvar_1 Zvar_1 Zvar_2 Dvar_* XXLvar_*

Advanced Syntax (MANY dummy variable sets)
    global hds97_1  Xvar_1         Xvar_ref    descriptive_name_for_X
    global hds97_2  Zvar_1 Zvar_2  Zvar_ref    descriptive_name_for_Z
    global hds97_3  Dvar_*         Dvar_ref    descriptive_name_for_D
    ...
    global hds97_50 XXLvar_*       XXLvar_ref  descriptive_name_for_XXL
    (up to 50 globals/constraints can be set)

    Xvar_1 is a regressor used in regress or xtreg previously
    Xvar_ref is a text name for the reference category
    descriptive_name is a descriptive text name of the dummy set

    hds97 [, weight(wgt_var_name)]


2c. RLS <hds97.ado>

Output created by <hds97.ado>
(A) Original Regression (OLS, RE, FE etc) repeated
(B) Each Dummy Variable Group using RLS is calculated
    - From "k-1" Dummy Variables: "k" Coefficients reported
(C) Weighted Standard Deviation (Sampling Corrected) of RLS Betas
    - Measure of overall variation
(D) F-Tests of Joint Significance
    - Are the dummy variables as a group significant?
(E) Sample Shares of each Dummy
    - What were the sample shares used to create the weighted average?
    - From the weighted average, the deviations are calculated (see B)


3. Illustrative Example (in Handout)

American Current Population Survey (CPS)
- Use freely available January 2004 CPS sample
- http://www.nber.org/morg/annual/morg04.dta

Run simple wage regression (age 18-65)
- log hourly wages = f (age, gender, race, marital status, state)

Dummy Indicators
- gender: male, female
- race: white, black, other
- marital status: married, divorced, separated, single
- states: AK, AL… WY

Selecting arbitrary dummy variable as reference
- Which one? Makes no difference in the calculation, just in interpretation

With RLS, interpret the dummy variables as deviations from a weighted average as opposed to an arbitrary reference category

If wages are logged, then interpretation: approximate percentage deviations from the average

Use <hds97.ado> to implement RLS


3. Sample Regression Output (in Handout)

. regress lhw age genderm raceb raceo msmar msdiv mssep

      Source |       SS       df       MS              Number of obs =    8417
-------------+------------------------------           F(  7,  8409) =  181.36
       Model |  242.712792     7   34.673256           Prob > F      =  0.0000
    Residual |  1607.68867  8409  .191186665           R-squared     =  0.1312
-------------+------------------------------           Adj R-squared =  0.1304
       Total |  1850.40146  8416  .219867093           Root MSE      =  .43725

------------------------------------------------------------------------------
         lhw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |     .00861   .0004585    18.78   0.000     .0077112    .0095088
     genderm |   .1737988   .0095849    18.13   0.000     .1550101    .1925876
       raceb |  -.0730053   .0162526    -4.49   0.000    -.1048645   -.0411462
       raceo |  -.0131488   .0193254    -0.68   0.496    -.0510315    .0247338
       msmar |   .1365145   .0125807    10.85   0.000     .1118532    .1611758
       msdiv |   .1014927   .0180303     5.63   0.000     .0661489    .1368365
       mssep |   .0237369   .0341694     0.69   0.487    -.0432435    .0907174
       _cons |     6.5783    .016593   396.45   0.000     6.545774    6.610826
------------------------------------------------------------------------------

. global hds97_1 genderm              genderf  gender
. global hds97_2 raceb raceo          racew    race
. global hds97_3 msmar msdiv mssep    mssgl    marital
. hds97

Slide callouts: "Name of reference" marks the second-to-last entry of each global (genderf, racew, mssgl); "description" marks the last entry (gender, race, marital).


3a. Gender (2-Way)

[Figure: Gender Wage Differentials 2-Way (SD=0.0867). Y-axis: Wage Differential, -0.20 to 0.20. Categories: male, female. Panels: Ref=Female, Ref=Male, Restricted Least Squares.]


3b. Race (3-Way)

[Figure: Race Wage Differentials 3-Way (SD=0.0205). Y-axis: Wage Differential, -0.20 to 0.20. Categories: white, black, other. Panels: Ref=Black, Ref=White, Ref=Other, Restricted Least Squares.]


3c. Marital Status (4-Way)

[Figure: Marital Status Wage Differentials 4-Way (SD=0.0609). Y-axis: Wage Differential, -0.20 to 0.20. Categories: married, divorced, separated, single. Panels: Ref=Single, Ref=Married, Ref=Divorced, Ref=Separated, RLS.]


3d. State of Residence (51-Way) Ref=Hi

[Figure: State Wage Differential 51-Way (Reference=Alaska). Y-axis: Wage Differential, -0.6 to 0.6. X-axis: American States (Ordinary Least Squares).]


3d. State of Residence (51-Way) Ref=Lo

[Figure: State Wage Differential 51-Way (Reference=Arkansas). Y-axis: Wage Differential, -0.6 to 0.6. X-axis: American States (Ordinary Least Squares).]


3d. State of Residence (51-Way)

[Figure: State Wage Differentials 51-Way (SD=0.0684). Y-axis: Wage Differential, -0.6 to 0.6. X-axis: American States (Restricted Least Squares).]


4. Conclusions

RLS: Interpretation of Dummy Variables
- Even with a small number of categories, RLS gives an intuitive interpretation
- Remove arbitrariness of reference category
- Allow for importance weighting of each category

Easily Implemented with <hds97.ado>
- Can be used after regress or xtreg, once the coefficients have been calculated
- Useful additional statistics calculated

Flexible use
- Transform a single set of dummy variables
- Transform up to 50 sets of dummy variables at once

Areas of Application
- Wage Differentials by: Region, Industry, Occupation, Education, Marital Status, Race, etc…
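
Appendix sketch: deviations from a weighted average

The slides describe the RLS coefficients as deviations from a share-weighted average and report a sampling-corrected weighted standard deviation, but do not write the formulas out. The block below is a minimal sketch of that calculation. The notation is an assumption introduced here, not taken from the slides: k categories, OLS dummy coefficients beta_j with the omitted reference set to zero, sample shares s_j, a vector of ones iota, and the OLS covariance matrix V(beta) padded with a zero row and column for the reference. Haisken-DeNew/Schmidt (1997) remains the authoritative statement.

```latex
% Sketch only: symbols (s_j, \beta_j, \iota, V) are assumed notation,
% not copied from the slides or from hds97.ado.
\begin{align}
\bar{\beta} &= \sum_{j=1}^{k} s_j \beta_j,
  \qquad \beta_k \equiv 0, \quad \sum_{j=1}^{k} s_j = 1
  && \text{(share-weighted average)} \\
\beta^{*}_j &= \beta_j - \bar{\beta}, \qquad j = 1, \dots, k
  && \text{($k$ coefficients from $k-1$ dummies)} \\
\sum_{j=1}^{k} s_j \beta^{*}_j &= \bar{\beta} - \bar{\beta} \sum_{j=1}^{k} s_j = 0
  && \text{(the linear restriction)} \\
\beta^{*} &= (I - \iota s')\,\beta,
  \qquad V(\beta^{*}) = (I - \iota s')\, V(\beta)\, (I - \iota s')'
  && \text{(standard errors)} \\
\mathrm{SD}(\beta^{*}) &= \sqrt{\sum_{j=1}^{k} s_j \left(\beta^{*}_j\right)^{2}
  - \sum_{j=1}^{k} s_j\, \widehat{\mathrm{Var}}\!\left(\beta^{*}_j\right)}
  && \text{(sampling-corrected weighted SD)}
\end{align}
```

Two consequences follow directly from these definitions. Switching the reference category shifts every beta_j by the same constant, which also shifts the weighted average by that constant, so the beta*_j do not depend on the reference chosen. With two categories the formulas reduce to beta*_1 = s_2 beta_1 and beta*_2 = -s_1 beta_1, so the single estimated dummy coefficient is split between the two groups in proportion to the other group's share.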