Displaying & Describing Categorical Data
To Analyze Data, Make a Picture!
• Pictures show patterns in your data
you may not otherwise see.
• Pictures make it easier for others to
understand your data.
What kind of picture?
It depends on the data…
First Step – Frequency Table
• Records the totals and category names
for counts of categorical data
• Associated Terms:
–Proportion: Count of a given category
divided by the total number of cases
–Relative Frequency Table: Displays
percentages instead of counts
–Distribution: Breakdown of categories
and associated frequencies of data
Frequency Table Example
• Suppose Target sells 50 stuffed animals, Walmart sells 75 stuffed
animals and Toys’R’Us sells 150 stuffed animals in one week. This is
the associated frequency table.
FREQUENCY                 RELATIVE FREQ.
STORE        Count        STORE            %
Target          50        Target       18.18
Walmart         75        Walmart      27.27
Toys’R’Us      150        Toys’R’Us    54.55
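The relative frequencies above can be reproduced with a short Python sketch. The variable names (counts, proportions, relative_freq) are illustrative choices, not part of the slides; only the three weekly counts come from the example.

```python
# Weekly stuffed animal sales from the example above.
counts = {"Target": 50, "Walmart": 75, "Toys'R'Us": 150}

total = sum(counts.values())  # total number of cases

# Proportion: count of a given category divided by the total number of cases.
proportions = {store: n / total for store, n in counts.items()}

# Relative frequency table: the same breakdown shown as percentages.
relative_freq = {store: round(100 * p, 2) for store, p in proportions.items()}

for store, n in counts.items():
    print(f"{store:<10} {n:>5} {relative_freq[store]:>7.2f}%")
```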
The Area Principle
• The area occupied by a part of the graph
should correspond to the magnitude of the
value it represents.
• Our eyes tend to be more impressed by
area than other aspects like length so…
• Neglecting the area principle is a very
popular way to “lie” with Statistics.
Example of Abuse of Area
[Figure: pictures sized for Target (50), Walmart (75) and Toys’R’Us (150)]
With this area, Toys’R’Us looks at least 8
times as large as Target, but in reality
it sells only 3 times as many bears.
Deceptive? I’d say so!
10 for 1!
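One way to see why the picture exaggerates: if a picture is scaled in both height and width in proportion to the count, its area grows with the square of the count ratio. The sketch below assumes that kind of scaling, which the slide does not state explicitly; area_ratio is a hypothetical helper name.

```python
def area_ratio(count_a, count_b):
    """Area ratio of two pictures whose height AND width both scale with the counts.

    Assumption: both dimensions are stretched by the same factor, so the
    area ratio is the square of the count ratio.
    """
    linear_ratio = count_a / count_b
    return linear_ratio ** 2

toys, target = 150, 50
print(toys / target)              # ratio of the actual counts
print(area_ratio(toys, target))   # ratio of the drawn areas under the assumption
```

Under this assumption a 3-to-1 difference in counts is drawn as a 9-to-1 difference in area, which is in line with the "at least 8 times" impression described above.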
How could you fix it? Easy!
[Figure: corrected version of the graph for Target (50), Walmart (75) and Toys’R’Us (150)]
Bar Charts
• An accurate visual representation of
counts of categorical data
• All bars should be the exact same
width & equally spaced
• Can be vertical or horizontal
Bar Chart of Stuffed Animal Data
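As a rough stand-in for the chart, here is a minimal text-only sketch of a horizontal bar chart for the stuffed animal counts. It is not the chart from the slide; text_bar_chart and its scale parameter are hypothetical names. Every bar is one row thick, so only the length varies with the count, which respects the area principle.

```python
def text_bar_chart(counts, scale=5):
    """Return one line per category.

    Every bar has the same thickness (one text row) and a length
    proportional to the count: one '#' per `scale` cases.
    """
    width = max(len(name) for name in counts)
    lines = []
    for name, n in counts.items():
        bar = "#" * round(n / scale)
        lines.append(f"{name:<{width}} | {bar} {n}")
    return lines

sales = {"Target": 50, "Walmart": 75, "Toys'R'Us": 150}
print("\n".join(text_bar_chart(sales)))
```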
Relative Frequency Bar Chart
• All bars have same width and spacing
• Draws attention to proportions rather than
actual counts
• Looks the same but shows percentages
instead of frequencies
Stuffed Animal Relative Frequency Bar Chart
Pie Charts
• Whole group of cases is represented as
one circle.
• The size of the slice is proportional to the
fraction of the whole in a given category.
[Pie chart of stuffed animal sales: Target 50, Walmart 75, Toys'R'Us 150]
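The "proportional slice" rule can be made concrete by computing each slice's central angle as 360 degrees times the category's fraction of the whole. This is a small sketch using the stuffed animal counts; slice_angles is a hypothetical helper name.

```python
def slice_angles(counts):
    """Central angle in degrees for each category: 360 times its fraction of the whole."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}

angles = slice_angles({"Target": 50, "Walmart": 75, "Toys'R'Us": 150})
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles add up to the full circle, and the Toys'R'Us slice is three times the Target slice, matching the ratio of the counts.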
Contingency Table
• Shows how individuals are distributed along
two different variables
• Margins of the table give the totals
• Marginal Distribution: A frequency
distribution that appears in the margins of a
contingency table
Example of Contingency Table
Reagan
                              Class
Sex        9th Grade  10th Grade  11th Grade  12th Grade  Total
Male          540         420         352         267      1579
Female        505         400         321         275      1501
Total        1045         820         673         542      3080
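The margins of the Reagan table can be recomputed from the interior cells. In this sketch the dictionary layout and the names table, row_totals, col_totals and grand_total are illustrative assumptions; the cell counts are the ones from the table above.

```python
grades = ["9th", "10th", "11th", "12th"]

# Interior cells of the contingency table: one row per sex, one column per grade.
table = {
    "Male":   [540, 420, 352, 267],
    "Female": [505, 400, 321, 275],
}

# Marginal distribution of sex: the row totals in the right margin.
row_totals = {sex: sum(row) for sex, row in table.items()}

# Marginal distribution of class: the column totals in the bottom margin.
col_totals = {g: sum(table[sex][i] for sex in table) for i, g in enumerate(grades)}

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```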
Conditional Distribution
• Shows the distribution of one variable for
just the individuals who satisfy some
condition on another variable
• Independent: When the distribution of
one variable is the same for all categories
of another variable
Example of Conditional Distributions
Male       9th Grade  10th Grade  11th Grade  12th Grade  Total
Count         540         420         352         267      1579
Percent      34.2%       26.6%       22.3%       16.9%     100%

Female     9th Grade  10th Grade  11th Grade  12th Grade  Total
Count         505         400         321         275      1501
Percent      33.6%       26.6%       21.4%       18.3%     100%
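The two conditional distributions can be recomputed by dividing each row of counts by its own row total. The sketch below also reports the largest gap between the male and female percentages as an informal way to eyeball independence; that gap is an illustrative device, not a formal test, and conditional_distribution is a hypothetical helper name.

```python
def conditional_distribution(row):
    """Percentages of one row of counts, conditional on being in that row."""
    total = sum(row)
    return [round(100 * n / total, 1) for n in row]

# Counts by grade (9th, 10th, 11th, 12th) from the table above.
male = conditional_distribution([540, 420, 352, 267])
female = conditional_distribution([505, 400, 321, 275])

# If class and sex were independent, the two distributions would match.
largest_gap = max(abs(m - f) for m, f in zip(male, female))

print(male)
print(female)
print(round(largest_gap, 1))
```

Here the two rows are close but not identical; the biggest difference is in the 12th grade column.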
Segmented Bar Chart
• One bar represents the entire
population – compares percentages
instead of counts
• The bars are of equal height &
divided proportionally into segments
corresponding to percentages in each
group
Example of Segmented Bar Chart
[Segmented bar chart: one bar each for Male and Female, vertical axis from 0% to 100%, segments for Freshmen, Sophomores, Juniors and Seniors]
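The segment boundaries of each bar are running totals of the counts, expressed as percentages of that bar's total. A minimal sketch, assuming the counts from the Reagan table; segment_boundaries is a hypothetical helper name.

```python
from itertools import accumulate

def segment_boundaries(row):
    """Upper edge of each segment as a percentage of the bar's total."""
    total = sum(row)
    return [100 * running / total for running in accumulate(row)]

# Counts in the order Freshmen, Sophomores, Juniors, Seniors.
male_edges = segment_boundaries([540, 420, 352, 267])
female_edges = segment_boundaries([505, 400, 321, 275])

print([round(e, 1) for e in male_edges])
print([round(e, 1) for e in female_edges])
```

Both bars end at 100%, which is why segmented bars of different-sized groups have equal height and can be compared segment by segment.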
Simpson’s Paradox
• Averaging unfairly across different
groups may give you misleading
results.
• It is generally safer to compare averages
within each category, since the overall
average may be misleading.
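A small numeric sketch may make the paradox concrete. The counts below are made up purely for illustration and do not come from the slides: X has the higher success rate within each category, yet Y has the higher overall rate because the two took on very different mixes of easy and hard cases.

```python
# Made-up (successes, attempts) counts, chosen only to illustrate the effect.
results = {
    "X": {"easy": (9, 10), "hard": (30, 90)},
    "Y": {"easy": (80, 90), "hard": (3, 10)},
}

def rate(successes, attempts):
    """Success rate within a single category."""
    return successes / attempts

def overall_rate(groups):
    """Success rate after pooling all categories together."""
    successes = sum(s for s, _ in groups.values())
    attempts = sum(a for _, a in groups.values())
    return successes / attempts

for category in ("easy", "hard"):
    print(category, {name: round(rate(*g[category]), 3) for name, g in results.items()})
print("overall", {name: round(overall_rate(g), 3) for name, g in results.items()})
```

Comparing within "easy" and within "hard" favors X both times, while the pooled comparison favors Y.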