An Analysis of the Impact of Union Membership on Wages Using Propensity Score Matching

Brody Tyler Albregts
Department of Economics
Indiana University-Purdue University at Indianapolis
4/25/12

1. Introduction

Unions have been a powerful force in the United States labor market for well over a century, beginning with the Knights of Labor in the late 1800s and reaching a peak in the mid-twentieth century. Since then, economists have analyzed how unions affect workers’ earnings and have continually applied new statistical techniques to uncover the benefits, or costs, of labor unions. While it is widely accepted that union members earn more than non-members, the size of that premium has been called into question.

The goal of this paper is to use propensity score matching to estimate the hourly wage gain, as a percentage, that members receive from joining a union. Data were downloaded from the Panel Study of Income Dynamics (PSID) for the year 2005. Using kernel matching, the findings indicate that the estimated wage gain from union membership falls after matching from 23.8 percent to 10.3 percent for the entire labor force, and from 22.4 percent to 16.4 percent for the private sector. The estimated gain was much smaller for workers in the public sector, going as low as 2 percent after matching; however, the post-matching results for that sector were not significant at any meaningful level.

2. Literature Review

One of the more prominent papers on union wages using propensity score matching was published by Alex Bryson (2002). Bryson used British data from the Workplace Employee Relations Survey (WERS) and separated his private sector data by occupational coverage and workplace coverage. Unfortunately, the PSID contains no data on whether a person has occupational or workplace coverage; it records only whether the person is in a union.
Bryson found that before matching, wages were on average 17 to 25 percent higher among union members, but after matching the premium fell to between 3 and 6 percent. Bryson used nearest neighbor matching, while this paper relies on kernel matching.

3. The Economic Model and Hypothesis

Several factors may affect the pay gap between members and non-members. Employers may collude with labor unions to pay lower wage rates to non-members and then share the benefits with union members (Blakemore et al., 1986). However, as union membership declines, it becomes unlikely that employers could collude with unions to pay non-members less than the market wage for their skills and experience, since non-members increasingly have employment options elsewhere.

Another factor may be union spillovers to non-members, along with the threat that non-members will join the workplace union if they are not given higher pay. Kahn and Curme (1987) argue for spillover effects, concluding that they raise nonunion wages even after accounting for employer-union collusion and other phenomena that lower wages. There seems to be some agreement among economists that a spillover and/or threat effect raises nonunion wages. How far this extends outside of union-heavy industries is debatable; one would expect very little spillover in an industry with almost no union presence. If spillover effects exist, however, they raise non-members’ wages, which could affect the matching. Fortunately, the problem should be ameliorated by the large number of non-members in the sample, especially in the private sector.

Finally, union members may receive other benefits that are not wage related. These include increased job security, pension plans, severance pay, and other social benefits one might get from joining a union. Data on many of these benefits are hard to come by and typically cannot be measured via econometric methods, but the benefits may carry monetary value to the union member.
These extra benefits may lower the gains to the wage rate. For example, a union might bargain for better pensions in lieu of higher wages, or for a combination of higher wages and better pensions.

Given these factors, one would still expect the estimated gains from union membership to decrease after matching. As most data have shown, union members’ mean age and experience are generally higher than those of non-members, which should increase their earnings regardless of union membership. Bryson found similar results in Britain using the WERS data.

4. Econometrics Methodology

Although many econometric techniques exist, propensity score matching is particularly useful for finding the real gains from union membership. Caliendo and Kopeinig (2008) offer an overview of the basics of propensity score matching. The method starts from a treated group (D=1) and an untreated group (D=0). In this case, those who are in a union are the treated group, and those who are non-members are the untreated group. The treated and untreated groups are then matched on a variety of independent variables (age, sex, education level, etc.) and one or both of two parameters are estimated: the average treatment effect (ATE) and the average treatment effect on the treated (ATT). The latter is the focus of this paper. The ATT shows the gains of the treated from being in the treated group and can be shown to equal:

$$\tau_{ATT} = E(\tau \mid D = 1) = E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$$

where Y(1) is the outcome an individual would have with the treatment and Y(0) is the outcome the same individual would have without it. The ATT is therefore the mean outcome of the treated minus the mean outcome the treated would have had without treatment. Since no individual can be observed in both the treated and the untreated state, the second term is estimated from matched untreated individuals, and the gains are calculated on average.

While several different matching methods can be used, this paper focuses on kernel matching.
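To make the estimator concrete, the following is a minimal Python sketch of a kernel-matching ATT calculation. It is an illustration under stated assumptions, not the routine used to produce the results in this paper: the function names `epanechnikov` and `kernel_att`, the choice of an Epanechnikov kernel, the bandwidth of 0.06, and the toy propensity scores and log wages are all hypothetical. For each treated observation, the counterfactual outcome E[Y(0)|D=1] is approximated by a kernel-weighted mean of untreated outcomes, with more weight on untreated observations whose propensity scores are closer.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive weight only when |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_att(pscore, treated, outcome, bandwidth=0.06):
    """Kernel-matching estimate of the ATT (illustrative sketch).

    pscore, treated and outcome are equal-length sequences; treated
    holds 1 for D=1 and 0 for D=0. The bandwidth is an assumed value.
    Returns (att, number of treated observations that were matched).
    """
    controls = [(p, y) for p, d, y in zip(pscore, treated, outcome) if d == 0]
    gains = []
    for p_i, d_i, y_i in zip(pscore, treated, outcome):
        if d_i != 1:
            continue
        # Weight each untreated observation by its distance in propensity score.
        weights = [epanechnikov((p_j - p_i) / bandwidth) for p_j, _ in controls]
        total = sum(weights)
        if total == 0.0:
            # No untreated observation within the bandwidth: skip this one.
            continue
        y0_hat = sum(w * y_j for w, (_, y_j) in zip(weights, controls)) / total
        gains.append(y_i - y0_hat)
    if not gains:
        raise ValueError("no treated observation could be matched")
    return sum(gains) / len(gains), len(gains)


# Hypothetical toy data: four untreated and three treated observations.
pscore = [0.20, 0.22, 0.50, 0.52, 0.21, 0.51, 0.95]
treated = [0, 0, 0, 0, 1, 1, 1]
logwage = [2.8, 2.9, 3.0, 3.1, 3.0, 3.2, 3.5]

att, n_matched = kernel_att(pscore, treated, logwage)
print(f"ATT = {att:.3f} using {n_matched} matched treated observations")
```

In this toy example the third treated observation has no untreated observation within the bandwidth and is skipped, so the ATT of roughly 0.15 is averaged over the two matched treated observations.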
Caliendo and Kopeinig analyzed kernel matching and found a decrease in the variance associated with this type of matching, meaning a lower standard error and a higher t-statistic. However, kernel matching can produce bad matches, which can influence the ATT.

A useful paper by Markus Frolich (2004) analyzed the benefits of different matching techniques. Frolich argued that matching one treated observation to one untreated observation, as one-to-one nearest neighbor matching does, was “inefficient”, and tested a variety of other techniques including kernel matching, local linear matching, and k-nearest-neighbor matching. Further, Frolich found that kernel matching seemed to work best when there were a large number of untreated observations relative to treated observations, which is the case with the data used in this paper.

After the data were downloaded for this paper, several interaction terms were generated. Different matching methods were then tried. Nearest neighbor matching was used first; however, this method proved to be very erratic, and outcomes would change drastically with the addition or removal of a single independent variable. Kernel matching was then used, which produced more stable results. Even with slightly different variables in each specification, results tended to be close to the outcome reported, while consistently having a significant t-statistic.

5. Data

The data were downloaded from the Panel Study of Income Dynamics for the year 2005. The University of Michigan collects the data from the same individuals every two years. After pertinent variables were downloaded and cleaned, there were a total of 2,960 observations in the sample, of which 516 were union members. Dummy variables were generated where appropriate, for race and region among other variables. Observations were dropped for a variety of reasons, for example being unemployed for all of 2004 or working outside of the United States in 2004.
As noted before, union members were used as the treated group (D=1) and non-members were placed in the untreated group (D=0).

6. Results and Discussion

Kernel matching was used to estimate the gain in log wage (logwage), interpreted as the percentage change in wage, for the entire labor force. The data were then split into private sector workers and public sector workers, and kernel matching was attempted for each sector individually.

6.1 Entire Labor Force

Table 2 shows the propensity score matching results for the entire labor force. Before matching there is a 23.8 percent difference in wages between union members and non-members. After matching, that number falls to 10.3 percent, which means union membership offers a 10.3 percent gain in wages. This result is significant at the 5 percent level.

Table 3 shows the pstest results for several of the variables used in the matching, but leaves out exponential and interaction terms. As one can see, any variables that were significant before matching become insignificant after matching, and matching reduces the bias on almost every variable. Table 4 shows the pstest of the joint significance of all variables, reporting only the p-value and the pseudo R-squared. Of interest is that the p-value is insignificant at all meaningful levels following matching, meaning that post-match the combination of these variables is no longer significant in explaining union membership, as one would expect if the matched groups are balanced.

Table 5 shows the total sample size and how many observations were on the common support. The ratio of non-members to union members for the entire sample is about 5-to-1. Graph 1 details the distribution of the propensity scores of the observations. The top half shows union members, labeled as treated, while the bottom half shows non-members, labeled as untreated. Union members with propensity scores close to 1 were dropped because they were off the support, meaning they could not be matched with any of the non-members.
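The figures reported in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes the outcome is the natural log of the hourly wage, which the tables do not state explicitly. It recomputes each t-statistic as the difference divided by its standard error and converts each log difference d into an exact percentage gap using exp(d) - 1. The helper name `summarize` is hypothetical.

```python
import math

# Differences in mean logwage and their standard errors, as reported in Table 2.
table2 = {
    "unmatched": (0.238829, 0.031576),
    "matched": (0.103694, 0.050470),
}


def summarize(diff, se):
    """Return (t-statistic, exact percentage gap implied by a log difference)."""
    t_stat = diff / se
    # Assumes natural logs: a log gap d corresponds to a ratio of exp(d).
    pct_gap = (math.exp(diff) - 1.0) * 100.0
    return t_stat, pct_gap


for label, (diff, se) in table2.items():
    t_stat, pct_gap = summarize(diff, se)
    print(f"{label}: t = {t_stat:.2f}, exact gap = {pct_gap:.1f}%")
```

The t-statistics reproduce the 7.56 and 2.05 reported in Table 2. Under the natural-log assumption, the exact gaps are about 27.0 and 10.9 percent, slightly above the 23.8 and 10.3 percent obtained by reading the log differences directly as percentages; the approximation is closer for the smaller, post-match difference.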
6.2 Private and Public Sectors

Table 6 shows the propensity score matching results for the private sector. Before matching there is a 22.4 percent wage gain among private sector workers from joining a union. After matching, that gain decreases to 16.4 percent and is significant at the 5 percent level.

Table 7 shows many of the variables used in the matching process, but leaves out exponential and interaction terms. As with the entire labor force, there is a reduction in bias on almost every variable, and any variables that were significant before matching become insignificant after matching. Table 8 shows the pstest for all variables; as with the entire labor force, the p-value becomes insignificant at all meaningful levels. Table 9 shows the sample size and how many observations were on the common support. More observations are off the common support than in the full sample, but the ratio of non-members to union members is higher, at more than 7-to-1. Graph 2 shows the distribution of the propensity scores using kernel matching. Again, many of the non-members have low propensity scores, which makes matching with union members with high scores more difficult; in the case of union members with scores near 1, it cannot be done.

Kernel matching for the government sector was attempted; however, post-match test statistics were not significant at any meaningful level. The post-match logwage gain dropped to between 2 and 6 percent in most of the kernel matching models, but the standard error in each model was too high to obtain a significant test statistic. Given that the drop in the logwage gain was so much smaller in the private sector than in the entire labor force, it is likely that the drop in the public sector would be larger still.

6.3 Discussion

The kernel matching produced an ATT higher than what Bryson found using data from the British WERS. There could be several reasons for the difference.
First, American unions might be more powerful than those in Britain; a more powerful union would be able to command higher wages than a weaker one, which would push up the post-match logwage gain. Unfortunately, data on union size, or on which union each member belongs to, are not available in the PSID. Second, the matching method might play a part in the discrepancy. As noted in the literature review, Bryson used nearest neighbor matching, while this paper uses kernel matching; Frolich argued that kernel matching should be employed when the number of non-members is high relative to the number of union members, as it is here. Lastly, there may be other non-quantifiable differences between Britain and the United States that affect wages and unions. Differences in laws, or even in geographic makeup, could account for some of these differences.

7. Self-critique

The results put forward are the culmination of much work in the attempt to develop a model that was statistically significant and provided insight into the labor market. While more work could have gone into finding a significant test statistic for the public sector, little work seems to have been done on the topic, perhaps because gains from union membership there are so small that any matching would produce insignificant results. Given the number of matching attempts made for this paper, it seems very unlikely that any type of kernel matching would show significant results for that sector. While nearest neighbor matching was attempted, as noted before the results proved to be very erratic. The distribution of propensity scores would also make nearest neighbor matching less efficient, since several of the union members had propensity scores close to 1, while many of the non-members had scores skewed closer to zero.

8. Conclusion

The goal of this paper was to determine the gains to wages from unionization by looking at the percent change in wage rates using propensity score matching.
The findings indicate that the gains in wage rate decrease post-match from 23.8 to 10.3 percent in the entire labor force and from 22.4 to 16.4 percent in the private sector, while public sector gains likely fall even lower. Although the post-match percentage gain is higher than that obtained by Alex Bryson, the model is significant at the 5 percent level. It appears that different matching methods produce differing results. There may also be differences in the concentration of union members between the United States and Britain which would allow for a higher logwage gain in the United States.

While the amount gained from unionization decreases post-match, there are still gains in wage rates among union members. There are likely other factors drawing individuals into joining a labor union, some of which have been discussed above. It would be interesting to see a study achieve a significant test statistic with public sector employees.

9. References

Bryson, A. (2002), “The Union Membership Wage Premium: An Analysis Using Propensity Score Matching,” CEP Discussion Papers dp0530, Centre for Economic Performance, LSE.

Blakemore, A. E., Hunt, J. C. and Kiker, B. F. (1986), “Collective Bargaining and Union Membership Effects on the Wages of Male Youths,” Journal of Labor Economics, 4, April, pp. 193-211.

Caliendo, M. and Kopeinig, S. (2008), “Some Practical Guidance for the Implementation of Propensity Score Matching,” Journal of Economic Surveys, 22, pp. 31-72.

Frolich, M. (2004), “Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators,” The Review of Economics and Statistics, 86(1), pp. 77-90.

Kahn, L. M. and Curme, M. (1987), “Unions and Nonunion Wage Dispersion,” The Review of Economics and Statistics, 69(4), Nov., pp. 600-607.

10. Tables and Graphs

Table 1.
Variable Table

Variable Name        Description
age                  Age of the individual surveyed
exper                Number of years of experience at main job
educ                 Number of years of education
poor                 Dummy variable for if the individual thought they grew up in a poor household (monetarily)
average              Dummy variable for if the individual thought they grew up in an average household (monetarily)
rich (not listed)    Dummy variable for if the individual thought they grew up in a rich household (monetarily)
pension              Binary variable indicating if the individual has a pension at their main job (yes=1, no=0)
northeast            Dummy variable for if individual lives in: Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, or Vermont
midwest              Dummy variable for if individual lives in: Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota, or Wisconsin
south                Dummy variable for if individual lives in: Alabama, Arkansas, Delaware, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, Washington D.C., or West Virginia
west (not listed)    Dummy variable for if individual lives in: Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, or Wyoming
white                Dummy variable for if individual is white
black                Dummy variable for if individual is black
nativeamer           Dummy variable for if individual is Native American
asian                Dummy variable for if individual is Asian
rother (not listed)  Dummy variable for if individual is race other
govtworker           Binary variable indicating whether the individual works for the government (yes=1, no=0)
married              Binary variable indicating whether the individual is married (yes=1, no=0)
urban                Binary variable indicating whether the individual lives in an urban setting (yes=1, no=0)
whitecollar          Binary variable indicating whether the individual works at a white collar job

Table 2.
Kernel Matching for the Entire Labor Force

Variable   Sample      Treated   Controls   Difference   S.E.      T-stat
Logwage    Unmatched   3.06677   2.82794    .238829      .031576   7.56
Logwage    Matched     3.06345   2.95975    .103694      .050470   2.05

Table 3. Reduction in Bias among selected non-exponential and non-interaction variables for Entire Labor Force

                          Mean
Variable      Sample      Treated   Control   Significance level
age           Unmatched   45.062    42.556    ***
              Matched     44.9      45.591
exper         Unmatched   14.186    9.0078    ***
              Matched     13.984    14.088
educ          Unmatched   13.38     13.567
              Matched     13.357    13.342
poor          Unmatched   .27519    .23118
              Matched     .2749     .27964
average       Unmatched   .46318    .491
              Matched     .46016    .48699
pension       Unmatched   .86047    .53396    ***
              Matched     .85857    .86383
northeast     Unmatched   .24225    .13625    ***
              Matched     .24104    .21564
midwest       Unmatched   .30426    .25941    **
              Matched     .30677    .29448
south         Unmatched   .22674    .42185    ***
              Matched     .23108    .25075
white         Unmatched   .64729    .71522    ***
              Matched     .65139    .62859
black         Unmatched   .31008    .23445    *** **
              Matched     .31474    .35107
nativeamer    Unmatched   .00581    .00327
              Matched     0         .00019
asian         Unmatched   .01163    .01596
              Matched     .01195    .0061
govtworker    Unmatched   .45736    .17471    ***
              Matched     .45418    .4811
married       Unmatched   .61822    .56997    **
              Matched     .62351    .58503
urban         Unmatched   .92636    .88216    ***
              Matched     .9243     .92807
whitecollar   Unmatched   .42636    .51432    ***
              Matched     .42231    .45533

Where * indicates significance at the 10% level, ** indicates significance at the 5% level, and *** indicates significance at the 1% level

Table 4. Test of the Significance of All Variables for the Entire Sector

Sample      Pseudo R2   LR chi2   p>chi2
Unmatched   0.358       980.46    0.000
Matched     0.057       79.33     1.000

Table 5. Breakdown of Variables on the Support in the Entire Labor Force

Psmatch2: Treatment   Psmatch2: Common support
assignment            Off Support   On Support   Total
Untreated             0             2,444        2,444
Treated               14            502          516
Total                 14            2,946        2,960

Table 6. Kernel Matching for the Private Sector

Variable   Sample      Treated   Controls   Difference   S.E.      T-stat
Logwage    Unmatched   3.04204   2.81759    .224442      .043372   5.17
Logwage    Matched     3.03661   2.87256    .164048      .060330   2.72

Table 7. Reduction in Bias among selected non-exponential and non-interaction variables for the Private Sector

                          Mean
Variable      Sample      Treated   Control   Significance level
age           Unmatched   43.939    42.397    **
              Matched     43.072    43.91
exper         Unmatched   14.457    8.4422    ***
              Matched     13.637    13.428
educ          Unmatched   12.654    13.393    ***
              Matched     12.661    12.621
male          Unmatched   .84643    .75508    ***
              Matched     .84064    .85536
poor          Unmatched   .30714    .23104    ***
              Matched     .30279    .27038
average       Unmatched   .43214    .48884    *
              Matched     .43426    .4313
pension       Unmatched   .79286    .47       ***
              Matched     .77291    .77639
northeast     Unmatched   .18214    .15171
              Matched     .18327    .21465
midwest       Unmatched   .375      .2588     ***
              Matched     .38645    .3648
south         Unmatched   .25       .39564    ***
              Matched     .251      .25549
white         Unmatched   .63571    .73178    ***
              Matched     .66534    .68255
black         Unmatched   .30357    .21071    ***
              Matched     .30677    .29429
nativeamer    Unmatched   .01071    .00397
              Matched     0         .00031
asian         Unmatched   .01429    .01785
              Matched     0         .00137
married       Unmatched   .67143    .57759
              Matched     .65339    .67259
urban         Unmatched   .91071    .89241    ***
              Matched     .92032    .92173
whitecollar   Unmatched   .15714    .47199    ***
              Matched     .17131    .16951

Where * indicates significance at the 10% level, ** indicates significance at the 5% level, and *** indicates significance at the 1% level

Table 8. Test of the Significance of All Variables for the Private Sector

Sample      Pseudo R2   LR chi2   p>chi2
Unmatched   0.379       645.77    0.000
Matched     0.051       35.47     1.000

Table 9. Breakdown of Variables on the Support in the Private Sector

Psmatch2: Treatment   Psmatch2: Common support
assignment            Off support   On support   Total
Untreated             0             2,017        2,017
Treated               29            251          280
Total                 29            2,268        2,297

Graph 1. Kernel Matching Among the Entire Labor Force

Graph 2. Kernel Matching Among the Private Sector