Chapter 2: Frequency Distributions

It’s in the context of frequency distributions that we encounter a telling example of the importance of communication. The nature of large data sets is difficult to communicate without some means of summarizing the data sets. We can do so graphically or statistically. This chapter focuses on the pictorial/graphical approach.

First, let’s look at some data. Here are the weights (in pounds) of some people. Keep in mind, apropos the notion of real limits, that it’s unlikely that anyone weighs exactly 102 pounds (to 20 decimal places). Instead, if a person weighs between 101.5 and 102.5 pounds, we’ll call 102 pounds the weight of that person.

102 130 152 168 104 133 114 148 140 147
124 152 136 148 138 116 152 141 129 107
132 129 164 136 158 133 135 167 193 137
128 143 128 179 139 139 141 147 154 152

What can you tell me about these weights? Well, it should be obvious that no one is lighter than 100 pounds. With careful scrutiny, you may also determine that no one weighs 200 pounds or more. But if you wanted to communicate about the nature of weights in this data set, how might you do so? One way to describe the data set would be to talk about the minimum and maximum weights, but they are difficult to identify in the array above. As a first step, re-organize the data above from heaviest to lightest.

A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement. A frequency distribution can be structured either as a table or as a graph, but in either case the distribution presents the same two elements:

1. The set of categories that make up the original measurement scale.
2. A record of the frequency, or number of individuals, in each category.

Frequency Distribution Tables

We could construct a frequency distribution table for the above data by listing each weight, as well as the frequency with which each weight occurred.
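If you would like to check a hand-built table, the tabulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `frequency_table` and `format_table` are made up for this example, and the scores are assumed to be whole-number weights as listed above.

```python
from collections import Counter

# The 40 weights (in pounds) from the data set above.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]


def frequency_table(scores):
    """Return (X, f) pairs, ordered from the heaviest weight to the lightest.

    Hypothetical helper for illustration; not part of the course materials.
    """
    counts = Counter(scores)  # maps each distinct score to its frequency
    return sorted(counts.items(), reverse=True)


def format_table(rows):
    """Format (X, f) pairs as a two-column text table."""
    lines = [f"{'X':>5} {'f':>3}"]
    for score, freq in rows:
        lines.append(f"{score:>5} {freq:>3}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(frequency_table(weights)))
```

Run on these data, the table has 29 rows (one per distinct weight), and the f column sums to 40, the number of people in the data set.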
With this tabular representation of the data set, you start to gain a better sense of the nature of the data. Enter the unique weights under X and the frequency with which each occurs under f.

X          f

However, this table should strike you as a bit unwieldy. It might be more useful to construct a grouped frequency distribution table. G&W supply a number of rules that should govern your choice of groups: keep the number of groupings to about 10, keep the interval width a simple number that is the same for every interval, and make the bottom score in each class interval a multiple of the width. In our case, let’s use 10-pound intervals: 100-109 (actually 99.5 to 109.5), 110-119 (actually 109.5 to 119.5), etc.

X          f
190-199
180-189
170-179
160-169
150-159
140-149
130-139
120-129
110-119
100-109

Notice that two things have happened, one good and one bad. The data now appear a bit more comprehensible at a glance, but we no longer know what specific weights occurred.

Frequency Distribution Graphs

G&W distinguish between histograms (used with continuous data, and no space between bars) and bar charts (used with discrete data, and spaces between bars). I’ll tend to use the terms interchangeably, so I don’t care too much if there is space between bars or not, as long as the information is clearly conveyed. You should be able to transfer the grouped frequency distribution table into the grouped frequency histogram seen below. (“Rough” bars are okay.)

To construct a frequency distribution polygon, you could simply make a dot at the middle of the top line in each bar, and then connect those dots.

John Tukey developed an even simpler graphical display, which has the added advantage of retaining all the information about the original scores in the data set. The stem-and-leaf display is easily constructed from the original data without first rearranging the data. The first step is to set up the stem.
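The point that a stem-and-leaf display needs no prior sorting can be seen in a short Python sketch. It is a rough illustration, not a prescribed method: `stem_and_leaf` and `format_display` are hypothetical names, and the code assumes whole-number weights with the hundreds and tens digits as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# The 40 weights (in pounds), in their original, unsorted order.
weights = [
    102, 130, 152, 168, 104, 133, 114, 148, 140, 147,
    124, 152, 136, 148, 138, 116, 152, 141, 129, 107,
    132, 129, 164, 136, 158, 133, 135, 167, 193, 137,
    128, 143, 128, 179, 139, 139, 141, 147, 154, 152,
]


def stem_and_leaf(scores):
    """Group each score's units digit (leaf) under its leading digits (stem).

    Leaves are appended in the order the scores appear, so no sorting is needed.
    """
    leaves = defaultdict(list)
    for score in scores:
        stem, leaf = divmod(score, 10)  # e.g. 102 -> stem 10, leaf 2
        leaves[stem].append(leaf)
    return leaves


def format_display(leaves):
    """Print stems from highest to lowest, including stems with no leaves."""
    lines = []
    for stem in range(max(leaves), min(leaves) - 1, -1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_display(stem_and_leaf(weights)))
```

The number of leaves in each row equals the frequency of the matching 10-pound interval, which is why the display doubles as a grouped frequency graph while still showing every original score.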
For this data set, the stem would be the decades used to group the data above (100-109, etc.). Simply make a vertical line and then place the first two digits of each grouping on the left of the line. Then, add the “leaves,” which in this case would be the appropriate units for each weight. Go back to the original data set to enter the data, so that you can see how easily the stem-and-leaf display is constructed without any need to first rearrange the data.

19 |
18 |
17 |
16 |
15 |
14 |
13 |
12 |
11 |
10 | 2

I’ve entered the first weight (102) above. Now you can enter the rest. Just be sure to make each numeral the same size. When you’re done, you should have a graphical display that is just like the grouped frequency distribution graph, except that it’s reversed when you turn the page on its side (from 190 down to 100).

The Shape of a Frequency Distribution

One way in which one can describe a distribution is to talk about its shape. A distribution can be symmetrical (possible to draw a straight line through the middle so that one side of the distribution is the mirror image of the other) or skewed (scores tend to pile up at one end of the distribution and taper off gradually at the other end). A positively skewed distribution has a longer tail on the right-hand side. In a negatively skewed distribution, the tail is longer to the left.

The distribution on the left below is a negatively skewed distribution. The distribution on the right below is a positively skewed distribution.

Playing “Fair” with Graphs

As the authors of your text indicate (Box 2.1), it’s possible to distort the information found in graphs. Edward Tufte is one of the major advocates of honest use of graphs. He’s written a book (The Visual Display of Quantitative Information) that addresses issues related to accurate portrayal of graphical information.

On the left below is a graph that details the commission payments to travel agents.
First of all, what is the sense of the change that you would get from just a quick visual inspection of the graph? Now look more closely. Can you tell what’s “wrong”? What about the graph on the right?

Yep, if the x-axis for the Nobel Prize data had been ruled evenly, the graph would actually have looked like this:

Needless to say, the productivity of the U.S. did not actually decline.

There are other ways to distort the basic information when portraying it graphically. Below left is a graph that addresses some old data about the mandated fuel efficiency of automobiles. Can you tell what’s wrong with the graph? OK, how about a graph that’s more accurate? The graph on the right below illustrates the same data, but is far less dramatic. Tufte argues for placing all the “chart junk” outside of the important part of the graph!

On a related topic, at that time people were also concerned about the price of fuel oil. Note the cost per barrel! Some very disturbing graphs appeared that indicated a large increase in the price of crude oil. Can you see what’s wrong with these graphs?

Oil prices were definitely increasing, but the rise may not have been as steep as it appeared to be in these graphs. The graph below is a bit less disturbing (and more informative). Note that this graph corrects for inflation (the “REAL” line), a factor that the other graphs ignored.

Here are a couple of graphs from an NPR report. They were attempting to illustrate a large increase in the crime rate. And the graphs certainly seem to support their point.

However, below is a better picture of the violent crime rate. Note that the crime rate is down in 2005 relative to many previous years (e.g., 1991). So, the changes in the later years (2000-2005) are all from a period with a relatively small crime rate. The 2.5% increase from 2004-2005 is from 4.65 violent crimes per 1000 people to 4.77 violent crimes per 1000 people. That seems far less dramatic!
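The gap between a relative change and an absolute change is easy to check with a little arithmetic. The Python sketch below is only an illustration: the function names are made up, and the two inputs are the rounded 2004 and 2005 rates quoted above. With those rounded rates the relative change comes out at roughly 2.6%, close to the reported 2.5%; the small difference is presumably due to rounding in the quoted rates, which is an assumption on my part.

```python
def percent_change(old, new):
    """Relative change from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


def absolute_change(old, new):
    """Plain difference between the new and old values."""
    return new - old


# Violent crimes per 1000 people, as quoted in the text (rounded values).
rate_2004 = 4.65
rate_2005 = 4.77

if __name__ == "__main__":
    rel = percent_change(rate_2004, rate_2005)
    diff = absolute_change(rate_2004, rate_2005)
    print(f"Relative change: {rel:.1f}%")
    print(f"Absolute change: {diff:.2f} violent crimes per 1000 people")
```

The same change can be reported as a percentage or as about a tenth of a crime per 1000 people, and the second framing is the one that makes it look small.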
And what about that whopping 8.3% increase in crimes for cities of 500,000 to 999,999 inhabitants? It represents a real increase in crime from 16.9 violent crimes per day to 18.4 violent crimes per day. The increase is not as impressive when portrayed in that fashion. [http://stubbornfacts.us/domestic_policy/crime/crime_rate_down_bad_graphs_and_misleading_headlines_up]

And what is one to make of this graph? 193%? Really? Thank you, Fox News!
