Income Inequality in the United States
Data Analysis Assignment

For this assignment we will explore the impact of gender and race on the earnings of full-time workers in 2000. The purpose of this assignment is to introduce you to some basic data analysis software (WebCHIP), to help you become familiar with data from the Current Population Survey, and to apply what you have learned in the course to explain differences in earnings by race and gender.

Learning Objectives
After using this module, students will gain skills in:
Using software to access and analyze census data
Identifying independent and dependent variables
Employing control variables
Forming testable hypotheses using quantitative data
Quantitative writing
Constructing, reading, and interpreting bivariate tables that display frequencies and percentages
Using real-world data to enhance and support key course concepts

I. Data Sets
You will need to use EARN2KC.DAT and EARN2KC.NY, plus one other data set from a state of your choice. You can access WebCHIP through the SSDAN website using these instructions:
1. Go to http://www.ssdan.net/datacounts/data/
2. From there, click "Browse" on the left sidebar. Find "geo2000" in the drop-down box and select it.
3. Scroll down through the list of data sets until you find the appropriate data set. Highlight it and click "Submit." This opens the data set in the WebCHIP program, ready for analysis.

II. Variables
Although there are several ways in which the following terms may be conceptualized, defined, and measured, these are the definitions used by the U.S. Census Bureau and the Bureau of Labor Statistics:
Income (Earning) – the money a person makes from working, as wages, salary, or self-employment income, expressed as an annual amount.
Race (RaceLat) – individual's self-identification as:
  Non-Hispanic White (NHwhite) – all persons who indicated their race as white and not of Hispanic origin.
  Black – all persons who indicated their race as black.
  Hispanic – persons of white or "other" races who identified themselves as Mexican, Puerto Rican, Cuban, or Other Spanish/Hispanic.
  Asian (or Pacific Islander) – all persons who indicated their race or ethnicity as Chinese, Filipino, Japanese, Asian Indian, Korean, Vietnamese, Cambodian, Hmong, Laotian, Thai, or other Asian, as well as Hawaiian, Samoan, Guamanian, or other Pacific Islander.
  American Indian (AmIndian) – all persons who classified themselves as American Indian, Eskimo, or Aleut.
Gender (Gender) – individual's self-identification as either male or female.
Work Age (WkAge) – age in years, grouped into 9- to 10-year intervals, starting at age 16.

III. Using WebCHIP
A. Frequency Distributions or Marginals
To get a listing of the variables and their frequencies, use the "Marginals" function. You should get the following output (percentages):

RaceLat
  NHwhite       73.6
  Black         11.7
  Hispanic       9.9
  Asian          4.0
  AmIndian       0.7
  Total        100.0%

Gender
  Male          58.7
  Female        41.3
  Total        100.0%

Earning
  <15K          12.3
  15-25K        22.3
  25-35K        20.9
  35-50K        20.4
  50K-75K       14.7
  75K+           9.4
  Total        100.0%

WkAge
  16-24          7.8
  25-34         24.4
  35-44         30.2
  45-54         25.0
  55-64         10.7
  65+            1.9
  Total        100.0%

This is a frequency table for the CPS sample of full-time, year-round workers in 2000. According to the table, 58.7% of all full-time workers are male and 41.3% are female. Of all full-time workers, 12.3% earn less than $15,000 a year, while 9.4% earn $75,000 or more. Non-Hispanic whites make up 73.6% of full-time workers, blacks 11.7%, Hispanics 9.9%, Asians 4.0%, and American Indians only 0.7%. Workers between the ages of 35 and 54 account for 55.2% of all full-time workers (30.2% + 25.0%). Only 1.9% of full-time workers are age 65 or older.
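Each marginal percentage is simply a category's weighted count divided by the total number of workers (N). As an optional check, assuming you have Python 3 available (it is not required for this assignment), the sketch below reproduces the Gender and RaceLat percentages from the 100%= bases that WebCHIP reports in the national crosstabs in Part IV, and shows how adjacent categories are combined, as in the 35-54 age figure above. The helper name `marginals` is hypothetical and not part of WebCHIP.

```python
# Hypothetical helper (not part of WebCHIP): rebuild marginal percentages
# from the weighted counts (100%= bases) shown in the national crosstabs.
N = 95_823_053  # total full-time, year-round workers in the national data

gender_counts = {"Male": 56_212_417, "Female": 39_610_636}
race_counts = {
    "NHwhite": 70_500_890,
    "Black": 11_220_490,
    "Hispanic": 9_510_429,
    "Asian": 3_880_288,
    "AmIndian": 710_956,
}


def marginals(counts, total):
    """Return each category's share of the total, in percent."""
    return {category: 100 * n / total for category, n in counts.items()}


for name, counts in [("Gender", gender_counts), ("RaceLat", race_counts)]:
    # The categories of each variable should add up to the full sample.
    assert sum(counts.values()) == N
    shares = marginals(counts, N)
    print(name, {k: f"{v:.1f}" for k, v in shares.items()})

# Adjacent categories are combined by adding their percentages,
# e.g. the 35-44 and 45-54 groups from the WkAge marginals.
wkage = {"16-24": 7.8, "25-34": 24.4, "35-44": 30.2,
         "45-54": 25.0, "55-64": 10.7, "65+": 1.9}
print(f"Ages 35-54: {wkage['35-44'] + wkage['45-54']:.1f}%")
```

Rounded to one decimal place, the printed shares match the Gender and RaceLat percentages in the marginals table, which is a quick way to confirm that you are reading the right columns of your output.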
The frequency table gives an overall sense of the distribution of each variable, which is an important place to start. However, what we want to explore further is the impact of gender and race on earnings. Do men still generally earn more than women? Which racial group typically has the highest earnings? Which group has the lowest earnings? Do blacks earn more than Hispanics? To address these questions, we need to crosstabulate the variables of interest.

IV. Crosstabs
A. Before crosstabulating two variables, be sure that you know how they might be associated. It makes sense to say that one's gender may influence his or her earnings, but it does not make sense to say that earnings influence whether one is male or female. The variable that influences or affects another variable is called the independent variable (x), and the variable that is influenced or affected by it is called the dependent variable (y). You can write this as x → y. In this case, we are interested in how gender influences earnings, so gender is the independent variable (x) and earnings is the dependent variable (y). In other words, earnings depend, to some extent, on gender: Gender → Earnings.

B. To create a crosstabulation table in WebCHIP, you need to tell the program which variable is independent and which is dependent. Create a percent-down crosstab with the dependent (y) or row variable "Earning" and the independent (x) or column variable "Gender." You should get the following table (each column sums to 100%):

Earning          Male       Female           All
75K+             13.3          3.9           9.4
50-75K           18.6          9.3          14.7
35-50K           21.9         18.3          20.4
25-35K           19.1         23.4          20.9
15-25K           17.9         28.5          22.3
<15K              9.2         16.7          12.3
100%=        56212417     39610636    N = 95823053

In this case, we can see that 31.9% of male full-time workers make $50,000 or more a year (13.3% + 18.6%), compared with only 13.2% of female full-time workers (3.9% + 9.3%).
On the other hand, 45.2% of female full-time workers earn under $25,000 a year (28.5% + 16.7%), compared with only 27.1% of male full-time workers (17.9% + 9.2%).

1. Do male or female full-time workers typically earn more money? Report some of the specific findings (as I did above).
2. What does this suggest about the current state of income inequality between men and women? Does the gender earnings gap appear to be any smaller in the year 2000?

Now examine how race influences earnings by following the same process as in step B above, but substituting the "RaceLat" variable for "Gender." You should be able to complete the table below and answer the questions that follow:

Earnings     NHwhite      Black   Hispanic      Asian   AmIndian            All
75K+
50-75K
35-50K
25-35K
15-25K
<15K
100%=       70500890   11220490    9510429    3880288     710956   N = 95823053

3. Which racial group has the highest earnings? Report some specific percentages from the table.
4. Which racial group has the lowest earnings? Again, report some specifics.
5. Since these are all full-time workers, what factors might contribute to the observed racial differences in earnings?
6. What do you find most interesting and/or surprising about these findings?

V. Finishing Up
A. Repeat Parts III and IV using the EARN2KC.NY data set for New York State. After calculating the marginals and crosstabulations for New York, answer the following:
1. What differences in income do you observe in New York State in comparison to the entire United States?
2. Is the gender gap in wage earnings higher or lower in New York?
3. How do the racial differences in earnings in New York compare to the national data?
4. What might account for the differences observed in New York?
Be sure to summarize your findings using the marginal frequencies as well as the two crosstabulation tables for gender and race. Report specific findings from your tables, just as I did in my earlier analysis.

B. Repeat Parts III and IV again for a different state of your choosing.
Remember that the suffix of each data set name indicates the state it represents (for example, EARN2KC.WI is the 2000 earnings data for Wisconsin). Answer the four questions for this state as you did in Part A above. How does gender and racial income inequality in this state compare with New York and with the nation as a whole? Why do you think the income situation in this state might differ?

C. Your completed assignment should include:
1. A cover page with your name, course, section number, and title.
2. ALL of the output from your WebCHIP analysis, which includes the marginals, earnings by gender, and earnings by race for the national, New York, and additional state data. Output from WebCHIP is easily copied by left-clicking and dragging your mouse over it and then pasting it (right-click) into your word processor. Please clean up your output before you print it, so that you don't have four copies of the same table! If you copy the output into a word processor, it will help to label each table so that you can refer to it in your analysis.
3. A 3-4 page typed (word-processed) analysis of your findings addressing:
a. The marginals from each of the three data sets. Describe the distribution of gender, race, and earnings of full-time workers at the national and state levels. Is there much difference between the New York frequencies and those of the other state you selected? How does New York compare to the national data?
b. The six questions posed in Part IV and the four questions in Part V.A. Refer to the specific tables from your WebCHIP output, and use the web resources listed at the end of this assignment to guide you in understanding and interpreting your findings.

VI. Useful Web Resources
Bureau of Labor Statistics: http://www.bls.gov/
CensusScope: http://www.censusscope.org/
Current Population Survey: http://www.bls.census.gov/cps/cpsmain.htm
The Social Science Data Analysis Network: http://www.ssdan.net/
U.S. Census Bureau: http://www.census.gov/
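The combined percentages reported in the Part IV.B discussion (31.9% of men versus 13.2% of women earning $50,000 or more; 45.2% of women versus 27.1% of men earning under $25,000) can be reproduced with a short script if you want to check your own New York or state figures. The sketch below, assuming Python 3, uses hypothetical helper names (`combined_share`, `all_column`) that are not part of WebCHIP. It adds earnings categories within each column of the percent-down table, computes the gender gap in percentage points, and rebuilds the "All" column as the average of the gender columns weighted by their 100%= bases. Because the input percentages are already rounded to one decimal place, the rebuilt "All" values can differ from the published ones by up to about 0.1 (for example, 50-75K comes out near 14.8 rather than 14.7).

```python
# Hypothetical helpers (not part of WebCHIP): arithmetic on the national
# percent-down crosstab of Earning (rows) by Gender (columns) from Part IV.B.
EARNING_ROWS = ["75K+", "50-75K", "35-50K", "25-35K", "15-25K", "<15K"]

# Column percentages: each gender column sums to roughly 100%.
column_pct = {
    "Male": [13.3, 18.6, 21.9, 19.1, 17.9, 9.2],
    "Female": [3.9, 9.3, 18.3, 23.4, 28.5, 16.7],
}
# The 100%= row: weighted number of full-time workers in each column.
column_base = {"Male": 56_212_417, "Female": 39_610_636}


def combined_share(pcts, wanted):
    """Add the percentages of the earnings categories listed in `wanted`."""
    return sum(p for row, p in zip(EARNING_ROWS, pcts) if row in wanted)


def all_column(pct, base):
    """Rebuild the 'All' column as the base-weighted average of the columns.

    The inputs are rounded to one decimal, so the results are approximate.
    """
    total = sum(base.values())
    return [
        sum(pct[col][i] * base[col] for col in pct) / total
        for i in range(len(EARNING_ROWS))
    ]


HIGH = {"75K+", "50-75K"}  # $50,000 or more
LOW = {"15-25K", "<15K"}   # under $25,000

for sex, pcts in column_pct.items():
    print(f"{sex}: $50K+ {combined_share(pcts, HIGH):.1f}%, "
          f"under $25K {combined_share(pcts, LOW):.1f}%")

# Gender gap in the share earning $50,000 or more, in percentage points.
gap = (combined_share(column_pct["Male"], HIGH)
       - combined_share(column_pct["Female"], HIGH))
print(f"Gap at $50K+: {gap:.1f} percentage points")

print("All:", [f"{p:.1f}" for p in all_column(column_pct, column_base)])
```

To compare New York or another state with the nation (Part V), replace `column_pct` and `column_base` with the values from that state's Earning by Gender table and compare the resulting gaps.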