Section 2.1 Frequency Distributions

With additions and improvements by D.R.S., University of Cordele

HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved.

Frequency Distributions

A distribution is a way to describe the structure of a particular data set or population.

A frequency distribution is a display of the values that occur in a data set and how often each value, or range of values, occurs.

Frequencies (f) are the numbers of data values in the categories of a frequency distribution.

A class is a category of data in a frequency distribution.

Are the data measurements Qualitative or Quantitative? (Circle one for each.)

– Blood type: qualitative or quantitative
– Dollars spent on meal: qualitative or quantitative
– Heart rate, beats/min.: qualitative or quantitative
– Model of car: qualitative or quantitative
– Favorite fast food restaurant: qualitative or quantitative
– Fuel efficiency, mi/gal: qualitative or quantitative
– My pain, on a scale from 1 to 10: qualitative or quantitative

Frequency Distribution for Categorical Data

Category               Frequency                      Relative Frequency
(list the categories   (put the counts of how many    (this category is what percent
here in this column)   in this column)                of the total sample size?)

(What order? Highest frequency down to lowest? Lowest to highest? Alphabetical? It’s your design decision.)

Categorical Frequency Distributions are the fuel for the “Family Feud”

What question did they ask to obtain these answers?

(Photograph borrowed from some web site somewhere; I failed to record the exact source.)
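The Category / Frequency / Relative Frequency layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `categorical_frequency_table` and the blood-type sample are made up for the example, and the rows are ordered from highest frequency to lowest, which is just one of the possible design decisions.

```python
from collections import Counter

def categorical_frequency_table(values):
    """Return (category, frequency, relative frequency) rows, highest frequency first."""
    counts = Counter(values)
    n = sum(counts.values())  # total sample size
    # most_common() sorts by descending frequency; ties keep first-seen order
    return [(category, f, f / n) for category, f in counts.most_common()]

# Made-up sample of blood types, for illustration only
sample = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

for category, freq, rel in categorical_frequency_table(sample):
    # Relative frequency is shown as a percent of the total sample size
    print(f"{category:<4}{freq:>4}{rel:>8.0%}")
```

Sorting alphabetically or from lowest to highest frequency would only change the ordering step, not the counts.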
Categorical (or, Qualitative) Frequency Distribution example

“What state did you visit most recently?”

State visited (the category)    How many (the frequency)
Alabama                          71
California                       18
Florida                         138
New York                          7
South Carolina                   48
Tennessee                        27
Texas                            53
Other states                     70
TOTAL                           432

Things we do with Categorical Frequency Distributions

Sometimes we just leave them as tables of words and numbers for reference and interpretation.

We draw pictures of them (future lessons).
– Bar graphs
– Pie charts
– Cutesy repeated icons variation of the bar graph

A famous categorical frequency distribution we will revisit later

Draw this 5-card poker hand                                Frequency
Royal Flush                                                        4
Straight Flush (not including Royal Flush)                        36
Four of a Kind                                                   624
Full House                                                     3,744
Flush (not including Royal Flush or Straight Flush)            5,108
Straight (not including Royal Flush or Straight Flush)        10,200
Three of a Kind                                               54,912
Two Pair                                                     123,552
One Pair                                                   1,098,240
Something that’s not special at all                        1,302,540
Total                                                      2,598,960

Quantitative Frequency Distribution

When the data are number measurements

Classes: Each class is a low-to-high range of values. These are called the “Class Limits”.

Frequency: The frequency column gives a count of how many data values fit in the class.

Quantitative Frequency Distribution (data are number measurements)

Each class is a “bucket” containing all the values in its low-to-high limits.

Placement Test Score    How many applicants
0-9                     19    (the 0 to 9 bucket)
10-19                   38    (the 10 to 19 bucket)
20-29                   52    (the 20 to 29 bucket)
30-39                   71    etc.
40-49                   50    etc.
50 and above            28    etc.

Quantitative Frequency Distribution (data are number measurements)

Taxable Income    How many taxpayers    % of taxpayers

Where are you going to get the raw data from?

GPA    How many students    % of students

How do you know how many rows to put in a frequency distribution?

Quantitative Frequency Distribution (data are number measurements)

mph    How many cars    % of cars

How do you pick the low-to-high limits for each of the classes?

Weight of newborn    How many newborns    % of newborns

Will the distribution need to be shown in graphic form as well as in this tabular form?

Frequency Distributions
Constructing a Frequency Distribution

1. Decide how many classes should be in the distribution.

There are typically between 5 and 20 classes in a frequency distribution. Several different methods can be used to determine the number of classes that will show the data most clearly, but in this textbook, the number of classes for a given data set will be suggested.

Frequency Distributions
Constructing a Frequency Distribution (cont.)

2. Choose an appropriate class width.

In some cases, the data set easily lends itself to natural divisions, such as decades or years. At other times, we must choose divisions for ourselves. When starting a frequency distribution from scratch, one method of finding an appropriate class width is to begin by subtracting the lowest number in the data set from the highest number in the data set and dividing the difference by the number of classes.

Frequency Distributions
Constructing a Frequency Distribution (cont.)

Rounding this number up gives a good starting point from which to choose the class width. You will want to choose a width so that the classes formed present a clear representation of the data and include all members of the data set, so make a sensible choice.

Frequency Distributions
Constructing a Frequency Distribution (cont.)

3. Find the class limits.
The lower class limit is the smallest number that can belong to a particular class, and the upper class limit is the largest number that can belong to a class. Using the minimum data value, or a smaller number, as the lower limit of the first class is a good place to begin. However, judgment is required. You should choose the first lower limit so that reasonable classes will be produced, and it should have the same number of decimal places as the largest number of decimal places in the data.

Frequency Distributions
Constructing a Frequency Distribution (cont.)

After choosing the lower limit of the first class, add the class width to it to find the lower limit of the second class. Continue this pattern until you have the desired number of lower class limits. The upper limit of each class is determined such that the classes do not overlap. If, after creating your classes, there are any data values that fall outside the class limits, you must adjust either the class width or the choice for the first lower class limit.

Frequency Distributions
Constructing a Frequency Distribution (cont.)

4. Determine the frequency of each class.

Make a tally mark for each data value in the appropriate class. Count the marks to find the total frequency for each class.

Frequency Distributions
Class width

The class width is the difference between the lower limits or upper limits of two consecutive classes of a frequency distribution.

The lower class limit is the smallest number that can belong to a particular class.
The upper class limit is the largest number that can belong to a particular class.

Example 2.1: Constructing a Frequency Distribution

Create a frequency distribution using five classes for the list of 3-D TV prices given in Table 2.2.

Table 2.2: 3-D TV Prices (in Dollars and in an Ordered Array)
1595  1599  1685  1699  1699
1699  1699  1757  1787  1799
1799  1885  1888  1899  1899
1899  1984  1999  1999  1999

Example 2.1: Constructing a Frequency Distribution (cont.)

Solution

Because we were told how many classes to include, _____ classes, we will begin by deciding on a class width. Subtract the lowest data value from the highest and divide by the number of classes:

\[ \frac{\text{high} - \text{low}}{\text{how many classes}} = \text{class width} \]

This gives us a class width of $ __________.

Example 2.1: Constructing a Frequency Distribution (cont.)

We will stop here and consider some options. Choosing a class width of $81 does seem perfectly reasonable from a theoretical point of view. However, one should consider the impression created by having TV prices grouped in intervals of $81. Can you imagine presenting this data to a client?

1500 – 1580, 1581 – ______, ______ – _____, etc.

Instead, it would be more reasonable to group TV prices by intervals of $100. Therefore, we will choose our class width to be $100.

Example 2.1: Constructing a Frequency Distribution (cont.)
Next, we need to choose a starting point for the classes, that is, the first lower class limit. One should always first consider using the smallest data value for the beginning point. In this case, if we choose the smallest TV price, we would be starting the first class at $_____ with a width of $100. However, given that we’ve chosen a class width of $100, it is more natural to begin the first class at $_____.

Example 2.1: Constructing a Frequency Distribution (cont.)

Now let’s continue building the class limits. Adding the class width of $100 to $1500, we obtain a second lower class limit of $______. The next lower limit is found by adding $____ to $_____. We continue in this fashion until we have five lower class limits, one for each of our five classes.

Finally, we need to determine appropriate upper class limits. Again, be reasonable. Remember, too, that the classes are not allowed to overlap.

Example 2.1: Constructing a Frequency Distribution (cont.)

Because the data are in whole dollar amounts, it makes sense to choose upper class limits that are one dollar less than the next lower limit. The classes we have come up with are as follows.

3-D TV Prices
Class                Frequency
$1500 - $______
$______ - $______
$______ - $______
$______ - $______
$______ - $______

Example 2.1: Constructing a Frequency Distribution (cont.)

Note that the last upper class limit is also the maximum value in the data set. COINCIDENCE! This will not necessarily occur in every frequency table.
However, we have included all the data values in our range of classes, so no adjustments to the classes are necessary. (It is important for every data value to find a home in one of the classes; nobody’s left out.)

Tabulating the number of data values that occur in each class produces the following frequency table.

Example 2.1: Constructing a Frequency Distribution (cont.)

3-D TV Prices
Class            Frequency
$1500 - $1599    ____
$1600 - $1699    ____
$1700 - $1799    ____
$1800 - $1899    ____
$1900 - $1999    ____

Note that the sum of the frequency column is ______ and the number of data values in the original list is _____. They are { equal, unequal }. What does that mean? _________________________________

Characteristics of a Frequency Distribution
Class boundary

A class boundary is the value that lies halfway between the upper limit of one class and the lower limit of the next class.

After finding one class boundary, add (or subtract) the class width to find the next class boundary.

The boundaries of a class are typically given in interval form: lower boundary–upper boundary.

Example 2.2: Calculating Class Boundaries

Calculate the class boundaries for each class in the frequency distribution from Example 2.1.

Solution

Look at the first and second classes. The upper limit of class one is 1599. The lower limit of class two is 1600. Thus, the class boundary between the first two classes is calculated as follows.

\[ \frac{1599 + 1600}{2} \]  → $__________

Example 2.2: Calculating Class Boundaries (cont.)

Recall that the class width is 100. Adding 100 to 1599.5 gives the next class boundary. You can repeat this step to find the remaining class boundaries.

3-D TV Prices with Class Boundaries
Class            Frequency    Class Boundaries
$1500 - $1599    2            _______ - _______
$1600 - $1699    5            1599.5 - _______
$1700 - $1799    4            _______ - _______
$1800 - $1899    5            _______ - _______
$1900 - $1999    4            _______ - _______

Characteristics of a Frequency Distribution
Class Midpoint

\[ \text{Class Midpoint} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2} \]

Example 2.3: Calculating Class Midpoints

Calculate the midpoint of each class in the frequency distribution from Example 2.1.

Solution

The midpoint is the sum of the class limits divided by two. For the first class, the midpoint is calculated as follows.

\[ \frac{\underline{\hspace{1.5cm}} + \underline{\hspace{1.5cm}}}{2} \]  → $ ________________

Example 2.3: Calculating Class Midpoints (cont.)

We can use this same calculation to find the midpoints of the remaining classes. Another method is to add 100 (the class width) to the first midpoint, as we did with class boundaries.

3-D TV Prices with Midpoints
Class            Frequency    Midpoint
$1500 - $1599    2            1549.5
$1600 - $1699    5            __________
$1700 - $1799    4            __________
$1800 - $1899    5            __________
$1900 - $1999    4            __________
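The boundary and midpoint rules from Examples 2.2 and 2.3 can be checked with a short Python sketch. It is a minimal illustration, not part of the original slides: the function name `class_boundaries_and_midpoints` is made up, and the code assumes equally wide classes with the same gap between every upper limit and the next lower limit. The lower boundary of the first class is found by the same half-gap rule, which agrees with subtracting the class width from 1599.5.

```python
def class_boundaries_and_midpoints(limits):
    """limits: list of (lower, upper) class limits for consecutive, equal-width classes.

    Returns one (lower boundary, upper boundary, midpoint) tuple per class.
    """
    # Gap between one class's upper limit and the next class's lower limit
    gap = limits[1][0] - limits[0][1]
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2   # halfway to the previous class
        upper_boundary = upper + gap / 2   # halfway to the next class
        midpoint = (lower + upper) / 2     # sum of the class limits divided by two
        rows.append((lower_boundary, upper_boundary, midpoint))
    return rows

# Class limits from Example 2.1: $1500 - $1599, $1600 - $1699, ..., $1900 - $1999
limits = [(1500 + 100 * i, 1599 + 100 * i) for i in range(5)]

for (lo, hi), (lb, ub, mid) in zip(limits, class_boundaries_and_midpoints(limits)):
    print(f"${lo} - ${hi}   {lb} - {ub}   {mid}")
```

For the first class this reproduces the values already shown on the slides: an upper boundary of 1599.5 and a midpoint of 1549.5.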
Characteristics of a Frequency Distribution
Relative Frequency

The relative frequency is the fraction or percentage of the data set that falls into a particular class, given by

\[ \text{Relative Frequency} = \frac{f}{n} \]

where f is the class frequency, n is the sample size, given by \( n = \sum f_i \), and \( f_i \) is the frequency of the i-th class.

Example 2.4: Calculating Relative Frequencies (cont.)

The Relative Frequency for a class is:

\[ \frac{\text{class frequency}}{\text{total of frequency column}} \]

Optionally convert it to a percent. Compute these relative frequencies as percents.

3-D TV Prices with Relative Frequencies
Class            Frequency    Relative Frequency
$1500 - $1599    2
$1600 - $1699    5
$1700 - $1799    4
$1800 - $1899    5
$1900 - $1999    4

Calculate relative frequency as decimal

Draw this 5-card poker hand                                Frequency    Relative Frequency
Royal Flush                                                        4
Straight Flush (not including Royal Flush)                        36
Four of a Kind                                                   624
Full House                                                     3,744
Flush (not including Royal Flush or Straight Flush)            5,108
Straight (not including Royal Flush or Straight Flush)        10,200
Three of a Kind                                               54,912
Two Pair                                                     123,552
One Pair                                                   1,098,240
Something that’s not special at all                        1,302,540
Total                                                      2,598,960

Characteristics of a Frequency Distribution
Cumulative Frequency

The cumulative frequency is the sum of the frequencies of a given class and all previous classes.

The cumulative frequency of the last class equals the sample size.

Example 2.5: Calculating Cumulative Frequencies

Calculate the cumulative frequency for each class in the frequency distribution from Example 2.1.
Solution

3-D TV Prices with Cumulative Frequencies
Class            Frequency    Cumulative Frequency
$1500 - $1599    2            2
$1600 - $1699    5            _______
$1700 - $1799    4            _______
$1800 - $1899    5            _______
$1900 - $1999    4            _______

Example 2.6: Characteristics of a Frequency Distribution

Data collected on the numbers of miles that Beta Corp. employees drive to work daily are listed below. Use these data to create a frequency distribution that includes the class boundaries, midpoint, relative frequency, and cumulative frequency of each class. Use six classes.

Be sure that your class limits have the same number of decimal places as the largest number of decimal places in the data.

Example 2.6: Characteristics of a Frequency Distribution (cont.)

Numbers of Miles Beta Corp. Employees Drive to Work Each Day
3.8   10.2   11.9   2.7   1     5.5
9.3   3.7    4.8    6.5   9.1   7.3
5.8   6.2    9.1    7     11    1.4

Class width:

\[ \frac{\text{max value} - \text{min value}}{6 \text{ classes}} \]

Instead of using the rough decimal value, let’s make the class width be a nicer value; let’s make it ________.

Example 2.6: Characteristics of a Frequency Distribution (cont.)

Choose a nice starting point for the lower class limit of the first class. Fill in all the class limits.

Numbers of Miles Beta Corp. Employees Drive to Work Each Day
Class              Frequency
______ - ______
______ - ______
______ - ______
______ - ______
______ - ______
______ - ______

Check: Will every data value fit in one of the classes? If so, go ahead; count to find the frequency of each class.

Example 2.6: Characteristics of a Frequency Distribution (cont.)

Numbers of Miles Beta Corp. Employees Drive to Work Each Day
Class        Frequency    Class Boundaries    Midpoint    Relative Frequency    Cumulative Frequency
1.0–2.9      3
3.0–4.9      3
5.0–6.9      4
7.0–8.9      2
9.0–10.9     4
11.0–12.9    2
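The full table for Example 2.6 can be cross-checked with a short Python sketch. This is a hedged illustration rather than the textbook's method: the function name `frequency_distribution` and its parameters are made up, and it assumes a class width of 2 with a first lower class limit of 1.0, as in the table above, and six classes, as the example requests. Values are converted to whole tenths before binning so that floating-point round-off cannot push a value into the wrong class.

```python
# Miles driven to work each day, as listed in Example 2.6
miles = [3.8, 10.2, 11.9, 2.7, 1, 5.5, 9.3, 3.7, 4.8,
         6.5, 9.1, 7.3, 5.8, 6.2, 9.1, 7, 11, 1.4]

def frequency_distribution(data, first_lower, width, num_classes, unit=0.1):
    """Rows of (lower, upper, lower bd, upper bd, midpoint, f, relative f, cumulative f)."""
    # Work in integer multiples of `unit` (tenths here) to avoid float round-off
    scale = round(1 / unit)
    start, step = round(first_lower * scale), round(width * scale)
    counts = [0] * num_classes
    for x in data:
        index = (round(x * scale) - start) // step
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} does not fit in any class")
        counts[index] += 1
    n, running, rows = len(data), 0, []
    for i, f in enumerate(counts):
        lower = start + i * step      # lower class limit, in tenths
        upper = lower + step - 1      # upper class limit: one unit below the next lower limit
        running += f                  # cumulative frequency so far
        rows.append((lower / scale, upper / scale,
                     (lower - 0.5) / scale, (upper + 0.5) / scale,
                     (lower + upper) / 2 / scale, f, f / n, running))
    return rows

for row in frequency_distribution(miles, first_lower=1.0, width=2.0, num_classes=6):
    lo, hi, lb, ub, mid, f, rel, cum = row
    print(f"{lo:.1f}-{hi:.1f}  {lb:.2f}-{ub:.2f}  {mid:.2f}  {f}  {rel:.3f}  {cum}")
```

The function raises an error if any data value falls outside the classes, which mirrors the check that every data value must fit in one of the classes before counting frequencies.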