Displaying & Describing Categorical Data

Chapter 3
• Look at different types of data, checking
a condition before plunging ahead.
• Writing clear explanations in context.
• Independence and its importance to
statistics.
• Simpson’s Paradox: think deeply or be
misled!!

Objective

Number 1 Rule of Data Analysis?
Draw a picture!!!
Frequency Table: records totals and
category names!!
Frequency Tables
• Counts are useful, but sometimes we want
to know the fraction or proportion of data
in each category.
• These proportions are expressed as
percentages.
• Relative Frequency Table: displays
percentages rather than counts.

Frequency Tables
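To make the two kinds of table concrete, here is a minimal Python sketch. The list of answers is hypothetical illustration data, not taken from the chapter; the names `responses`, `counts`, and `percents` are likewise just chosen for this example.

```python
from collections import Counter

# Hypothetical raw data: one category label per case (not from the chapter).
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "no"]

# Frequency table: category name -> count.
counts = Counter(responses)

# Relative frequency table: category name -> percentage of all cases.
total = sum(counts.values())
percents = {category: 100 * n / total for category, n in counts.items()}

# Print both tables side by side, largest category first.
for category, n in counts.most_common():
    print(f"{category:<10} {n:>3} {percents[category]:>6.1f}%")
```

Both columns describe the same distribution; the percentage column simply divides each count by the total number of cases.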

Both these types of tables describe the
distribution across categories.
Frequency Tables
• The area principle says that the area
occupied by a part of the graph should
correspond to the magnitude of the value
it represents.
• Violations of this principle are a common
way to lie with statistics.

Area Principle

A bar chart displays the distribution of a
categorical variable, showing the counts
for each category next to each other for
easy comparison.
Bar Charts

If we are interested in the relative
proportion we could replace the counts
with percentages and use a relative
frequency bar chart.
Bar Charts

Pie charts show the whole group of cases
as a circle. They slice the circle into
pieces whose sizes are proportional to the
fraction of the whole in each category.
Pie Charts
Think before you draw!!
• Always check the categorical condition.
• If you want to make a relative frequency
bar chart or a pie chart, make sure the
categories do not overlap.
• If they overlap, you can still make the
chart, but your percentages will not add up
to 100%.
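One quick way to catch overlapping categories before drawing is to check whether the percentages add up to 100. The sketch below is only an illustration: the helper `percentages_form_whole`, its rounding tolerance, and both lists of figures are assumptions made up for this example.

```python
import math

def percentages_form_whole(percents, tolerance=0.5):
    """Return True if the percentages sum to 100 within a rounding tolerance."""
    return math.isclose(sum(percents), 100.0, abs_tol=tolerance)

# Hypothetical figures, not from the chapter.
non_overlapping = [50.0, 37.5, 12.5]   # each case falls in exactly one category
overlapping = [60.0, 55.0, 20.0]       # cases could be counted in several categories

print(percentages_form_whole(non_overlapping))
print(percentages_form_whole(overlapping))
```

A sum well above 100 suggests that some cases were counted in more than one category, so a pie chart or relative frequency bar chart would misrepresent the data.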

What should you pick?


To look at two categorical variables together,
we often arrange the counts in a two-way
table.
A contingency table displays counts and,
sometimes, percentages of individuals falling
into named categories on two or more
variables. The table categorizes the
individuals on all variables at once to reveal
possible patterns in one variable that may be
contingent on the category of the other.
Contingency Tables

In a contingency table, the distribution of
either variable alone is called the marginal
distribution. The counts or percentages
are the totals found in the margins (last
row or column) of the table.
Contingency Tables
In January 2007, a Gallup poll asked 1008
Americans aged 18 and over whether they
planned to watch the upcoming Super
Bowl. The pollster also asked those who
planned to watch whether they were
looking forward more to seeing the
football game or the commercials. The
results are summarized in the table.
Example
What’s the marginal distribution of the
responses?
Response       Male   Female   Total
Game            279      200     479
Commercials      81      156     237
Won’t Watch     132      160     292
Total           492      516    1008
Example
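The marginal distribution of the responses comes from the row totals divided by the grand total. A minimal Python sketch of that arithmetic, using the counts from the Super Bowl table (the variable names are just chosen for this example):

```python
# Counts from the Super Bowl table: response -> (male, female).
table = {
    "Game": (279, 200),
    "Commercials": (81, 156),
    "Won't Watch": (132, 160),
}

# Row totals are the counts found in the right-hand margin.
row_totals = {response: sum(cells) for response, cells in table.items()}
grand_total = sum(row_totals.values())

# Marginal distribution of the responses, as percentages of all respondents.
marginal = {r: 100 * t / grand_total for r, t in row_totals.items()}

for response, pct in marginal.items():
    print(f"{response:<12} {row_totals[response]:>4} {pct:>5.1f}%")
```

Rounded to one decimal place, this gives 47.5% for the game, 23.5% for the commercials, and 29.0% who won't watch.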
• The distribution of a variable restricting
the “Who” to consider only a smaller
group of individuals is called a conditional
distribution.
• In a contingency table, when the
distribution of one variable is the same for
all categories of another, we say that the
variables are independent.

Conditional Distributions

How do the conditional distributions of
interest in the commercials differ for men
and women?
Conditional Distributions
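To compare men and women, each column of the Super Bowl table can be divided by its own total, which restricts the "Who" to one sex at a time. A minimal Python sketch, using the counts from the table (the variable names are chosen for this example):

```python
# Counts from the Super Bowl table: response -> (male, female).
table = {
    "Game": (279, 200),
    "Commercials": (81, 156),
    "Won't Watch": (132, 160),
}
sexes = ("Male", "Female")

# Column totals: number of men and number of women surveyed.
col_totals = [sum(cells[i] for cells in table.values()) for i in range(len(sexes))]

# Conditional distribution of responses within each sex, in percent.
conditional = {
    sex: {resp: 100 * cells[i] / col_totals[i] for resp, cells in table.items()}
    for i, sex in enumerate(sexes)
}

for sex in sexes:
    row = ", ".join(f"{resp}: {pct:.1f}%" for resp, pct in conditional[sex].items())
    print(f"{sex:<7} {row}")
```

About 16.5% of the men and 30.2% of the women said they were looking forward more to the commercials. Since the two conditional distributions are not the same, the data do not look like what independence would produce.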

Does it seem that there’s an association
between interest in Super Bowl TV
coverage and a person’s sex?
Example

A segmented bar chart treats each bar as
the “whole” and divides it proportionally
into segments corresponding to the
percentage in each group.
Segmented Bar Charts

Averages taken within separate groups
can appear to contradict the averages
taken over all groups combined.
Simpson’s Paradox
It’s the last inning of the important game.
Your team is a run down with bases
loaded and two outs. The pitcher is due
up, so you’ll be sending in a pinch hitter.
There are 2 batters available on the
bench. Whom should you send to bat?
Player   Overall      Vs LHP      Vs RHP
Wade     33 for 103   28 for 81   5 for 22
Lee      45 for 151   12 for 32   33 for 119
Example
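Working out the batting averages shows the paradox directly. The Python sketch below uses the hits and at-bats from the table; the names `records`, `by_split`, and `overall` are chosen for this example.

```python
# Hits and at-bats from the table: player -> split -> (hits, at_bats).
records = {
    "Wade": {"vs LHP": (28, 81), "vs RHP": (5, 22)},
    "Lee": {"vs LHP": (12, 32), "vs RHP": (33, 119)},
}

# Batting average within each split (left- and right-handed pitchers).
by_split = {
    player: {split: hits / at_bats for split, (hits, at_bats) in splits.items()}
    for player, splits in records.items()
}

# Overall batting average: pool the hits and at-bats across both splits.
overall = {
    player: sum(h for h, _ in splits.values()) / sum(ab for _, ab in splits.values())
    for player, splits in records.items()
}

for player in records:
    splits = ", ".join(f"{s}: {avg:.3f}" for s, avg in by_split[player].items())
    print(f"{player:<5} overall: {overall[player]:.3f}, {splits}")
```

Wade has the higher overall average (.320 against .298), yet Lee is better against left-handers (.375 against .346) and against right-handers (.277 against .227). The reversal comes from the unequal mix of at-bats: most of Wade's came against left-handers, where both players hit better.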