HERE - Towson University

Chapter Study Guide
Chapter 4
Displaying and Describing Categorical Data
I. How to Display Categorical Data?
Frequency Table, Relative Frequency Table, Bar Chart, Pie Chart, and Contingency Table
1. Frequency Table:
A table that shows the count (the number of cases) for each category. For example:
Search Engine    Visits
Google           50,629
Direct           22,173
Yahoo             7,272
MSN               3,166
SnapLink            946
All Others        8,987
Total            93,173
2. Relative Frequency Table:
When actual counts are replaced by proportions or percentages of the total, the frequency table
becomes a Relative Frequency Table. For example, the above table can be presented as follows:
Search Engine    Visits
Google           54.34%
Direct           23.80%
Yahoo             7.80%
MSN               3.40%
SnapLink          1.02%
All Others        9.65%
Total           100.00%
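
As a rough illustration of how counts become relative frequencies, the sketch below divides each count by the total. The counts are copied from the frequency table in this guide; the function name `relative_frequencies` is a hypothetical helper chosen for this example.

```python
# Visit counts copied from the frequency table in this guide.
visits = {
    "Google": 50629,
    "Direct": 22173,
    "Yahoo": 7272,
    "MSN": 3166,
    "SnapLink": 946,
    "All Others": 8987,
}


def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


shares = relative_frequencies(visits)
for category, pct in shares.items():
    print(f"{category:<12}{pct:6.2f}%")
```

Because each percentage is rounded separately, the printed values may add up to slightly more or less than 100%.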
3. Bar Chart and Pie Chart:
When presenting data using Bar or Pie Charts, we must observe the Area Principle: the area
occupied by a part of the graph should correspond to the magnitude of the value it represents.
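
One simple way to respect the Area Principle in a bar chart is to give every bar the same thickness and start it at zero, so that its length (and therefore its area) is proportional to the count. The sketch below draws a text-only bar chart under that assumption; `text_bar_chart` and its `width` parameter are hypothetical names used only for illustration.

```python
# Visit counts copied from the frequency table in this guide.
visits = {
    "Google": 50629,
    "Direct": 22173,
    "Yahoo": 7272,
    "MSN": 3166,
    "SnapLink": 946,
    "All Others": 8987,
}


def text_bar_chart(counts, width=50):
    """Build one line per category; bar length is proportional to the count."""
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        # Every bar starts at zero, so equal counts always get equal bars.
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<12}{bar} {n:,}")
    return lines


chart = text_bar_chart(visits)
print("\n".join(chart))
```

Starting a bar anywhere other than zero, or varying its thickness, would break the proportionality that the principle asks for.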
Chaodong Han
OPRE 504
Page 1 of 4
4. Contingency Table:
When data are arranged in a two-way table for two variables, we can show the distribution of
each variable, contingent on the value of the other variable. Such a table is called a Contingency
Table. For example, students are reported by class and gender:
                  Gender
Class        Male     Female    Total
Fresh         450        550    1,000
Sophomore     500        450      950
Junior        480        420      900
Senior        450        400      850
Total       1,880      1,820    3,700
II. How to Describe Categorical Data?
Marginal Distribution and Conditional Distribution
1. Marginal Distribution:
The margins of a contingency table give totals (right-hand column and bottom row). In the
above table, the totals in the right-hand column show the frequency distribution of the variable
Class, while the bottom-row totals show the frequency distribution of the variable Gender. For
example, the marginal distribution of Gender is 1,880 male and 1,820 female students. The interior
cells, by contrast, show the gender breakdown within each class and the class breakdown within
each gender.
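
The marginal totals can be recomputed from the interior cells of the contingency table. The following is a minimal sketch, assuming the cell counts are stored in a nested dictionary; the variable names are illustrative only.

```python
# Cell counts copied from the class-by-gender contingency table:
# counts[class][gender]
counts = {
    "Fresh": {"Male": 450, "Female": 550},
    "Sophomore": {"Male": 500, "Female": 450},
    "Junior": {"Male": 480, "Female": 420},
    "Senior": {"Male": 450, "Female": 400},
}

# Marginal distribution of Class: sum across each row.
class_totals = {cls: sum(row.values()) for cls, row in counts.items()}

# Marginal distribution of Gender: sum down each column.
gender_totals = {}
for row in counts.values():
    for gender, n in row.items():
        gender_totals[gender] = gender_totals.get(gender, 0) + n

grand_total = sum(class_totals.values())

print(class_totals)
print(gender_totals)
print(grand_total)
```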
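2. Conditional Distribution:
A conditional distribution shows the distribution of one variable for only those cases that
satisfy a condition on the other variable. In the table below, each percentage is the cell count
divided by its column (Gender) total.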
                       Gender
Class        Male            Female          Total
Fresh        450 (23.9%)     550 (30.2%)     1,000
Sophomore    500 (26.6%)     450 (24.7%)       950
Junior       480 (25.5%)     420 (23.1%)       900
Senior       450 (23.9%)     400 (22.0%)       850
Total        1,880 (100%)    1,820 (100%)    3,700
In the above table, for Male students (i.e., Gender is Male), the class distribution is 23.9% for
Fresh, 26.6% for Sophomore, 25.5% for Junior and 23.9% for Senior.
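
The percentages in the conditional table can be reproduced by dividing each cell by its column total. The sketch below conditions on Gender; `class_distribution_given` is a hypothetical helper name, not something defined in the textbook.

```python
# Cell counts copied from the class-by-gender contingency table:
# counts[class][gender]
counts = {
    "Fresh": {"Male": 450, "Female": 550},
    "Sophomore": {"Male": 500, "Female": 450},
    "Junior": {"Male": 480, "Female": 420},
    "Senior": {"Male": 450, "Female": 400},
}


def class_distribution_given(gender, table):
    """Percentage of each class among students of the given gender."""
    column_total = sum(row[gender] for row in table.values())
    return {cls: 100 * row[gender] / column_total for cls, row in table.items()}


male = class_distribution_given("Male", counts)
female = class_distribution_given("Female", counts)
for cls in counts:
    print(f"{cls:<10}{male[cls]:6.1f}%{female[cls]:7.1f}%")
```

Conditioning on Class instead would divide each cell by its row total, which answers a different question (the gender mix within each class).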
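III. Issues When Comparing Categorical Data
Simpson’s Paradox: a comparison based on overall percentages can reverse the comparison seen
within each category, because the overall percentage is heavily influenced by whichever category
has the most cases. The result is a misleading overall percentage.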
An Example of Simpson’s Paradox
Each entry shows sales out of sales calls, followed by the success rate.

                                    Product
Sales Representative    CD                 Printer            Overall
John                    90 of 100 (90%)    10 of 20 (50%)     100 of 120 (83%)
Adam                    19 of 20 (95%)     75 of 100 (75%)    94 of 120 (78%)
By overall performance, John made a sale on 83% of his sales calls while Adam made a sale on only
78% of his. However, this comparison is misleading. When comparing within each product
category, we find that Adam has a higher success rate than John in CD (95% vs. 90%) and in
Printer (75% vs. 50%). The paradox occurs when we compare percentages using combined totals
across different categories. In fact, selling CDs tends to be much easier than selling printers. Most
of John’s sales calls are for CDs while most of Adam’s sales calls are for printers. Therefore, Adam
should be considered the more successful sales representative. The lesson learned is that we should
compare percentages within each level, rather than across levels or by combining percentages.
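
The reversal can be checked directly from the numbers in the example. This is a minimal sketch: the (sales, sales calls) pairs are copied from the example table, while `success_rate` and `overall_rate` are hypothetical helper names.

```python
# (sales, sales calls) pairs copied from the Simpson's Paradox example.
results = {
    "John": {"CD": (90, 100), "Printer": (10, 20)},
    "Adam": {"CD": (19, 20), "Printer": (75, 100)},
}


def success_rate(sales, calls):
    """Percentage of sales calls that ended in a sale."""
    return 100 * sales / calls


def overall_rate(by_product):
    """Success rate after pooling all products together."""
    total_sales = sum(sales for sales, _ in by_product.values())
    total_calls = sum(calls for _, calls in by_product.values())
    return success_rate(total_sales, total_calls)


for rep, by_product in results.items():
    per_product = ", ".join(
        f"{product}: {success_rate(*pair):.0f}%"
        for product, pair in by_product.items()
    )
    print(f"{rep}: {per_product}, Overall: {overall_rate(by_product):.0f}%")

# Adam is ahead within every product, yet John is ahead overall.
adam_wins_each_product = all(
    success_rate(*results["Adam"][p]) > success_rate(*results["John"][p])
    for p in ("CD", "Printer")
)
john_wins_overall = overall_rate(results["John"]) > overall_rate(results["Adam"])
print(adam_wins_each_product, john_wins_overall)
```

The pooled rate is a weighted average of the per-product rates, weighted by the number of calls, which is why John's 100 CD calls dominate his overall figure.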
Question 4.1 [Sharpe 2011, Exercise 30, p.80] The following table shows attendance data
collected by the Motion Picture Association of America during the period 2002 to 2006. Figures
are in millions of movie admissions.
                              Patron Age
Year     12-24   25-29   30-39   40-49   50-59   60 and over   Total
2006       485     136     246     219     124           124    1334
2005       489     135     194     216     125           122    1281
2004       567     132     265     236     145           132    1477
2003       567     124     269     193     152           118    1423
2002       551     158     237     211     119           130    1406
Total     2659     685    1211    1075     665           626    6921

a) What percent of all admissions during this period were bought by people aged 12-24?
b) What percent of the admissions in 2003 were bought by people aged 12-24?
c) What percent of the admissions were bought by people aged 12-24 in 2006?
d) What percent of the admissions in 2006 were bought by people aged 60 and over?
e) What percent of the admissions bought by people aged 60 and over were in 2002?
More Exercises:
[Sharpe 2011, pp. 77-82]. Chapter 4 Exercises 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40.
Question 4.2 [Sharpe 2011, Exercise 42, p.83] A company must decide which of the two
delivery services they will contract with. During a recent trial period, they shipped numerous
packages with each service and have kept track of how often deliveries did not arrive on time.
Delivery Company    Type of Service    Number of Deliveries    Number of Late Deliveries
Pack Rats           Regular            400                     12
                    Overnight          100                     16
Boxes R Us          Regular            100                      2
                    Overnight          400                     28
a) Compare the overall percentage of late deliveries.
b) Based on the results in part a, the company has decided to hire Pack Rats. Do you agree they
deliver on time more often? Why or why not? Be specific.
c) The results here are an instance of what phenomenon?
More Exercises:
Sharpe 2011, pp. 82-83: Chapter 4 Exercises 41, 43 and 44.