Chapter Study Guide
Chapter 4 Displaying and Describing Categorical Data

I. How to Display Categorical Data? Frequency Table, Relative Frequency Table, Bar Chart, Pie Chart, and Contingency Table

1. Frequency Table: A table that shows the count (number of cases) in each category. For example:

Search Engine       Visits
Google              50,629
Direct              22,173
Yahoo                7,272
MSN                  3,166
SnapLink               946
All Others           8,987
Total               93,173

2. Relative Frequency Table: When the counts are replaced by proportions (or percentages) of the total, the frequency table becomes a relative frequency table. For example, the table above can be presented as follows:

Search Engine       Visits
Google              54.34%
Direct              23.80%
Yahoo                7.80%
MSN                  3.40%
SnapLink             1.02%
All Others           9.65%
Total              100.00%

3. Bar Chart and Pie Chart: When presenting data using bar or pie charts, we must observe the Area Principle: the area occupied by a part of the graph should correspond to the magnitude of the value it represents.

Chaodong Han OPRE 504 Page 1 of 4

4. Contingency Table: When data on two categorical variables are arranged in a two-way table, we can show the distribution of each variable contingent on the value of the other variable. Such a table is called a contingency table. For example, students are reported by class and gender:

                  Gender
Class            Male   Female    Total
Freshman          450      550    1,000
Sophomore         500      450      950
Junior            480      420      900
Senior            450      400      850
Total           1,880    1,820    3,700

II. How to Describe Categorical Data? Marginal Distribution and Conditional Distribution

1. Marginal Distribution: The margins of a contingency table give the totals (the right-hand column and the bottom row). In the table above, the totals in the right-hand column show the frequency distribution of the variable Class, while the totals in the bottom row show the frequency distribution of the variable Gender. For example, the marginal distribution of Gender is 1,880 male and 1,820 female students.

2. Conditional Distribution: It shows the distribution of one variable for cases that satisfy a condition on the other variable. From the same table, we can show the gender distribution within each class and the class distribution within each gender. In the table below, each count is also shown as a percentage of its column (gender) total:

                       Gender
Class                  Male           Female    Total
Freshman        450 (23.9%)      550 (30.2%)    1,000
Sophomore       500 (26.6%)      450 (24.7%)      950
Junior          480 (25.5%)      420 (23.1%)      900
Senior          450 (23.9%)      400 (22.0%)      850
Total          1,880 (100%)     1,820 (100%)    3,700

In the table above, for male students (i.e., Gender is Male), the class distribution is 23.9% Freshman, 26.6% Sophomore, 25.5% Junior and 23.9% Senior.

III. Issues When Comparing Categorical Data

Simpson’s Paradox: A comparison that holds within every category can reverse when the categories are combined. This happens when the overall percentage is heavily influenced by one category (because it accounts for most of the cases), so the combined percentage is misleading.

An Example of Simpson’s Paradox (sales out of sales calls, by product)

Sales Representative   CD                Printer           Overall
John                   90/100 (90%)      10/20 (50%)       100/120 (83%)
Adam                   19/20 (95%)       75/100 (75%)      94/120 (78%)

By overall performance, John converted 83% of his sales calls into sales, while Adam converted only 78%. However, this comparison is misleading. Within each product category, Adam has a higher success rate than John, both for CDs (95% vs. 90%) and for printers (75% vs. 50%). The paradox arises because we compared percentages computed from totals combined across different categories. Selling CDs tends to be much easier than selling printers (both representatives succeed far more often with CDs), and most of John’s sales calls (100 of 120) were for CDs, while most of Adam’s (100 of 120) were for printers. Therefore, Adam should be considered the more successful sales representative. The lesson is that we should compare percentages within each level of the other variable, rather than across levels or by combining them.

Question 4.1 [Sharpe 2011, Exercise 30, p. 80] The following table shows attendance data collected by the Motion Picture Association of America during the period 2002 to 2006. Figures are in millions of movie admissions.

                    Year
Patron Age       2006   2005   2004   2003   2002   Total
12-24             485    489    567    567    551    2659
25-29             136    135    132    124    158     685
30-39             246    194    265    269    237    1211
40-49             219    216    236    193    211    1075
50-59             124    125    145    152    119     665
60 and over       124    122    132    118    130     626
Total            1334   1281   1477   1423   1406    6921

a) What percent of all admissions during this period were bought by people aged 12-24?
b) What percent of the admissions in 2003 were bought by people aged 12-24?
c) What percent of the admissions were bought by people aged 12-24 in 2006?
d) What percent of the admissions in 2006 were bought by people aged 60 and over?
e) What percent of the admissions bought by people aged 60 and over were in 2002?

More Exercises: [Sharpe 2011, pp. 77-82]. Chapter 4 Exercises 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40.

Question 4.2 [Sharpe 2011, Exercise 42, p. 83] A company must decide which of two delivery services it will contract with. During a recent trial period, it shipped numerous packages with each service and kept track of how often deliveries did not arrive on time.

Delivery Company   Type of Service   Number of Deliveries   Number of Late Deliveries
Pack Rats          Regular                            400                          12
Pack Rats          Overnight                          100                          16
Boxes R Us         Regular                            100                           2
Boxes R Us         Overnight                          400                          28

a) Compare the overall percentage of late deliveries for the two companies.
b) Based on the results in part a, the company has decided to hire Pack Rats. Do you agree that Pack Rats delivers on time more often? Why or why not? Be specific.
c) The results here are an instance of what phenomenon?

More Exercises: Sharpe 2011, pp. 82-83: Chapter 4 Exercises 41, 43 and 44.
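The relative frequency table in Section I can be reproduced by dividing each count by the grand total. The following is a minimal Python sketch using the visit counts from the frequency table; the helper name `relative_frequencies` is a hypothetical illustration, not part of the course materials.

```python
# Visit counts from the frequency table in Section I.
visits = {
    "Google": 50_629,
    "Direct": 22_173,
    "Yahoo": 7_272,
    "MSN": 3_166,
    "SnapLink": 946,
    "All Others": 8_987,
}


def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage (hypothetical helper)."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


total = sum(visits.values())
print(f"Total visits: {total:,}")  # Total visits: 93,173
for category, pct in relative_frequencies(visits).items():
    # Two decimal places, matching the relative frequency table.
    print(f"{category:<12}{pct:6.2f}%")
```

Printing with two decimal places gives the percentages shown in the relative frequency table (54.34% for Google, 23.80% for Direct, and so on). Because each share is rounded separately, the printed percentages need not add to exactly 100.00%.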
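For a pie chart, the Area Principle in Section I means each slice's central angle should be the category's share of 360 degrees, since a sector's area is proportional to its angle. This is a minimal sketch reusing the visit counts from the frequency table; the variable names are illustrative, and the drawing step itself is left out.

```python
# Visit counts from the frequency table in Section I.
visits = {
    "Google": 50_629,
    "Direct": 22_173,
    "Yahoo": 7_272,
    "MSN": 3_166,
    "SnapLink": 946,
    "All Others": 8_987,
}

total = sum(visits.values())

# A pie slice's area is proportional to its central angle, so giving each
# category its share of 360 degrees keeps slice areas proportional to counts.
angles = {category: 360 * count / total for category, count in visits.items()}

for category, angle in angles.items():
    print(f"{category:<12}{angle:6.1f} degrees")
```

Google, with just over half of all visits, gets just over half of the circle (more than 180 degrees).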
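The marginal and conditional distributions in Section II can be computed directly from the class-by-gender counts. In this sketch, a nested dict stands in for the contingency table and `class_given_gender` is a hypothetical helper name: row and column sums give the two marginal distributions, and dividing each count by its column total gives the class distribution conditional on gender.

```python
# Student counts by class (rows) and gender (columns), from the contingency table.
counts = {
    "Freshman": {"Male": 450, "Female": 550},
    "Sophomore": {"Male": 500, "Female": 450},
    "Junior": {"Male": 480, "Female": 420},
    "Senior": {"Male": 450, "Female": 400},
}
genders = ["Male", "Female"]

# Marginal distributions: row totals (Class) and column totals (Gender).
class_totals = {cls: sum(row.values()) for cls, row in counts.items()}
gender_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
grand_total = sum(class_totals.values())


def class_given_gender(gender):
    """Conditional distribution of Class among students of one gender, in percent."""
    return {cls: 100 * row[gender] / gender_totals[gender] for cls, row in counts.items()}


print(class_totals)   # {'Freshman': 1000, 'Sophomore': 950, 'Junior': 900, 'Senior': 850}
print(gender_totals)  # {'Male': 1880, 'Female': 1820}
for cls, pct in class_given_gender("Male").items():
    print(f"{cls:<10}{pct:5.1f}%")
```

Calling `class_given_gender("Male")` reproduces the male column of percentages (23.9%, 26.6%, 25.5%, 23.9%). Dividing by the row totals instead would give the gender distribution within each class.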
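The Simpson’s paradox example in Section III can be checked numerically. This sketch stores (sales, sales calls) pairs for each representative and product, taken from the example table, and compares within-product rates with the pooled overall rate. The helpers `rate` and `summarize` are illustrative names, not from the course materials.

```python
# (sales, sales calls) for each representative and product, from the example table.
results = {
    "John": {"CD": (90, 100), "Printer": (10, 20)},
    "Adam": {"CD": (19, 20), "Printer": (75, 100)},
}


def rate(sales, calls):
    """Success rate as a percentage."""
    return 100 * sales / calls


def summarize(by_product):
    """Return (per-product rates, pooled overall rate) for one representative."""
    # Within-product rates: the fair, like-for-like comparison.
    per_product = {p: rate(s, c) for p, (s, c) in by_product.items()}
    # Pooled rate: combines products with very different mixes of calls.
    total_sales = sum(s for s, _ in by_product.values())
    total_calls = sum(c for _, c in by_product.values())
    return per_product, rate(total_sales, total_calls)


for rep, by_product in results.items():
    per_product, overall = summarize(by_product)
    shown = {p: f"{r:.0f}%" for p, r in per_product.items()}
    print(rep, shown, f"overall {overall:.0f}%")
```

Adam's rate is higher for each product, yet John's pooled rate (83%) exceeds Adam's (78%), because most of John's calls fall in the easier CD category.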