Chapter 2: Frequency Distributions

Frequency distributions offer a telling example of the importance of communication. The nature
of a large data set is difficult to communicate without some means of summarizing it. We can do
so graphically or statistically. This chapter focuses on the pictorial/graphical approach.
First, let’s look at some data. Here are the weights (in pounds) of some people. Keep in mind,
apropos the notion of real limits, that it’s unlikely that anyone weighs exactly 102 pounds (to 20
decimal places). Instead, if a person weighs between 101.5 and 102.5 pounds, we’ll call 102
pounds the weight of that person.
102  130  152  168  104  133  114  148  140  147
124  152  136  148  138  116  152  141  129  107
132  129  164  136  158  133  135  167  193  137
128  143  128  179  139  139  141  147  154  152
What can you tell me about these weights? Well, it should be obvious that no one is lighter than
100 pounds. With careful scrutiny, you may also determine that no one weighs 200 pounds or
more. But if you wanted to communicate about the nature of weights in this data set, how might
you do so?
One way to describe the data set would be to talk about the minimum and maximum weights, but
they are difficult to identify in the array above. As a first step, re-organize the data above from
heaviest to lightest.
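If you would rather let a computer do the re-ordering, here is a minimal sketch in Python. Python is not part of these notes, so treat the variable names (weights, ordered, heaviest, lightest) as my own assumptions rather than anything from G&W.

```python
# Weights (in pounds) copied from the data set above, in their original order.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]

# sorted() returns a new list; reverse=True orders it from heaviest to lightest.
ordered = sorted(weights, reverse=True)

# After sorting, the extremes sit at the two ends of the list.
heaviest = ordered[0]
lightest = ordered[-1]

print(ordered)
print("n =", len(ordered), "max =", heaviest, "min =", lightest)
```

Once the scores are in order, the minimum and maximum are simply the two ends of the list.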
A frequency distribution is an organized tabulation of the number of individuals located in each
category on the scale of measurement.
A frequency distribution can be structured either as a table or as a graph, but in either case the
distribution presents the same two elements:
1. The set of categories that make up the original measurement scale.
2. A record of the frequency, or number of individuals, in each category.
Frequency Distribution Tables
We could construct a frequency distribution table for the above data by listing each weight, as
well as the frequency with which each weight occurred. With this tabular representation of the
data set, you start to gain a better sense of the nature of the data. Enter the unique weights under
X and the frequency with which each occurs under f.
X        f
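To check the table you fill in by hand, the tally can be sketched in Python as follows. This is only one way to do it; Counter comes from Python's standard library, and the names weights and freq are my own choices.

```python
from collections import Counter

# Weights (in pounds) copied from the data set above.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]

# Counter maps each unique weight (X) to the number of times it occurs (f).
freq = Counter(weights)

# Print the table from the heaviest weight down to the lightest.
print(f"{'X':>5} {'f':>3}")
for x in sorted(freq, reverse=True):
    print(f"{x:>5} {freq[x]:>3}")

# The frequencies should add up to the number of people in the data set.
print("sum of f =", sum(freq.values()))
```

A quick sanity check for any frequency table, hand-made or not: the f column must sum to the number of scores.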
However, this table should strike you as a bit unwieldy. It might be more useful to construct a
grouped frequency distribution table. G&W supply a number of rules that should govern your
choice of groups: keep the number of groupings to about 10, make the interval width a simple
number that is the same for every interval, and make the bottom score in each class interval a
multiple of the width. In our case, let’s use 10-pound intervals: 100-109 (real limits 99.5 to
109.5), 110-119 (real limits 109.5 to 119.5), etc.
X          f
190-199
180-189
170-179
160-169
150-159
140-149
130-139
120-129
110-119
100-109
Notice that two things have happened—one good and one bad. The data now appear a bit more
comprehensible at a glance, but we no longer know what specific weights occurred.
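The grouping into 10-pound intervals can also be sketched in Python. The helper interval_start and the names WIDTH, grouped, and table are hypothetical, introduced only for this illustration; the real limits are taken as half a pound beyond each apparent limit, as described earlier.

```python
from collections import Counter

# Weights (in pounds) copied from the data set above.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]

WIDTH = 10  # interval width in pounds


def interval_start(score, width=WIDTH):
    """Return the bottom score of the interval holding `score` (a multiple of width)."""
    return (score // width) * width


# Count how many weights fall in each interval, keyed by the bottom score.
grouped = Counter(interval_start(w) for w in weights)

# List the intervals from the highest down to the lowest, including empty ones.
top = interval_start(max(weights))
bottom = interval_start(min(weights))
table = [(low, low + WIDTH - 1, grouped[low]) for low in range(top, bottom - WIDTH, -WIDTH)]

for low, high, f in table:
    print(f"{low}-{high}  (real limits {low - 0.5} to {high + 0.5})  f = {f}")
```

Note that an interval with no scores still gets a row, with a frequency of zero.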
Frequency Distribution Graphs
G&W distinguish between histograms (used with continuous data, and no space between bars)
and bar charts (used with discrete data, and spaces between bars). I’ll tend to use the terms
interchangeably, so I don’t care too much if there is space between bars or not, as long as the
information is clearly conveyed.
You should be able to transfer the grouped frequency distribution table into the grouped
frequency histogram seen below. (“Rough” bars are okay.)
To construct a frequency distribution polygon, you could simply make a dot at the middle of the
top line in each bar, and then connect those dots.
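As a rough stand-in for a drawn graph, the sketch below prints the grouped histogram on its side with text bars, and then lists the points a frequency polygon would connect. Each point is taken at the midpoint of its interval, which is one reading of "the middle of the top line in each bar"; the names grouped, lows, and polygon are my own.

```python
from collections import Counter

# Weights (in pounds) copied from the data set above.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]

WIDTH = 10
grouped = Counter((w // WIDTH) * WIDTH for w in weights)
lows = list(range(100, 200, WIDTH))  # bottom score of each interval

# Sideways histogram: one row per interval, one '#' per person.
for low in lows:
    print(f"{low}-{low + WIDTH - 1} | {'#' * grouped[low]}")

# Polygon points: (interval midpoint, frequency), e.g. 104.5 for 100-109.
polygon = [(low + (WIDTH - 1) / 2, grouped[low]) for low in lows]
print(polygon)
```

Plotting the polygon points and joining them with straight lines gives the frequency distribution polygon.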
John Tukey developed an even simpler graphical display, which has the added advantage of
retaining all the information about the original scores in the data set. The stem-and-leaf display is
easily constructed from the original data without first rearranging the data. The first step is to set
up the stem. For this data set, the stem would be the decades used to group the data above
(100-109, etc.). Simply make a vertical line and then place the first two digits of each grouping on
the left of the line. Then, add the “leaves,” which in this case would be the appropriate units for
each weight. Go back to the original data set to enter the data, so that you can see how easily the
stem-and-leaf display is constructed without any need to first rearrange the data.
19 |
18 |
17 |
16 |
15 |
14 |
13 |
12 |
11 |
10 | 2
I’ve entered the first weight (102) above. Now you can enter the rest. Just be sure to make each
numeral the same size. When you’re done, you should have a graphical display that is just like
the grouped frequency distribution graph, except that it’s reversed when you turn the page on its
side (from 190 down to 100).
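The same procedure can be sketched in Python: walk through the original, unsorted data once and drop each units digit onto its stem. The names leaves and rows are hypothetical, and divmod is used here simply to split a weight into its first two digits and its last digit.

```python
from collections import defaultdict

# Weights (in pounds) in their original, unsorted order.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]

# Map each stem (first two digits) to its leaves (units digits), in order of entry.
leaves = defaultdict(list)
for w in weights:
    stem, leaf = divmod(w, 10)  # e.g. 102 -> stem 10, leaf 2
    leaves[stem].append(leaf)

# Build one row per stem, from 19 down to 10; a stem with no scores stays empty.
rows = []
for stem in range(19, 9, -1):
    digits = "".join(str(d) for d in leaves[stem])
    rows.append(f"{stem} | {digits}".rstrip())

print("\n".join(rows))
```

Because every leaf is one character wide, the length of each row plays the role of a bar in the grouped histogram.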
The Shape of a Frequency Distribution
One way in which one can describe a distribution is to talk about its shape. A distribution can be
symmetrical (possible to draw a straight line through the middle so that one side of the
distribution is the mirror image of the other) or skewed (scores tend to pile up at one end of the
distribution and taper off gradually at the other end). A positively skewed distribution has a
longer tail on the right-hand side. In a negatively skewed distribution, the tail is longer to the left.
The distribution on the left below is a negatively skewed distribution. The distribution on the
right below is a positively skewed distribution.
Playing “Fair” with Graphs
As the authors of your text indicate (Box 2.1), it’s possible to distort the information found in
graphs. Edward Tufte is one of the major advocates of honest use of graphs. He’s written a book
(The Visual Display of Quantitative Information) that addresses issues related to accurate
portrayal of graphical information.
On the left below is a graph that details the commission payments to travel agents. First of all,
what is the sense of the change that you would get from just a quick visual inspection of the
graph? Now look more closely. Can you tell what’s “wrong”? What about the graph on the right?
Yep, if the x-axis for the Nobel Prize data had been ruled evenly, the graph would actually have
looked like this:
Needless to say, the productivity of the U.S. did not actually decline.
There are other ways to distort the basic information when portraying it graphically. Below left is
a graph that addresses some old data about the mandated fuel efficiency of automobiles. Can you
tell what’s wrong with the graph? OK, how about a graph that’s more accurate? The graph on the
right below illustrates the same data, but is far less dramatic. Tufte argues for placing all the
“chart junk” outside of the important part of the graph!
On a related topic, at that time people were also concerned about the price of fuel oil. Note the
cost per barrel! Some very disturbing graphs appeared that indicated a large increase in the
price of crude oil. Can you see what’s wrong with these graphs?
Oil prices were definitely increasing, but the rise may not have been as steep as it appeared to be
in these graphs. The graph below is a bit less disturbing (and more informative). Note that this
graph corrects for inflation (the “REAL” line), a factor that the other graphs ignored.
Here are a couple of graphs from an NPR report. They were attempting to illustrate a large
increase in the crime rate. And the graphs certainly seem to support their point.
However, below is a better picture of the violent crime rate. Note that the crime rate is down in
2005 relative to many previous years (e.g., 1991). So, the changes in the later years (2000-2005)
are all from a period with a relatively small crime rate. The 2.5% increase from 2004-2005 is
from 4.65 violent crimes per 1000 people to 4.77 violent crimes per 1000 people. That seems far
less dramatic! And what about that whopping 8.3% increase in crimes for cities of 500,000 to
999,999 inhabitants? It represents a real increase in crime from 16.9 violent crimes per day to
18.4 violent crimes per day. The increase is not as impressive when portrayed in that fashion.
[http://stubbornfacts.us/domestic_policy/crime/crime_rate_down_bad_graphs_and_misleading_headlines_up]
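The contrast between a relative change and an absolute change can be sketched with a little arithmetic in Python. The function percent_change and the dictionary cases are my own illustrative names, and the inputs are the rounded rates quoted above. Because they are rounded, the percentages this prints will not match the reported 2.5% and 8.3% exactly; I am assuming the headline figures were computed from unrounded counts.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


# Rounded rates quoted in the text above: (earlier value, later value).
cases = {
    "violent crimes per 1000 people, 2004 to 2005": (4.65, 4.77),
    "violent crimes per day, cities of 500,000 to 999,999": (16.9, 18.4),
}

for label, (old, new) in cases.items():
    absolute = new - old
    relative = percent_change(old, new)
    print(f"{label}: {old} -> {new}")
    print(f"  absolute change: {absolute:+.2f}   relative change: {relative:+.1f}%")
```

The same change looks large or small depending on whether it is reported as a percentage or in the original units, which is exactly the choice a graph's designer makes.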
And what is one to make of this graph? 193%? Really? Thank you, Fox News!