Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation
Chapter 2
McGraw-Hill/Irwin ©The McGraw-Hill Companies, Inc. 2008

GOALS
•Organize qualitative data into a frequency table.
•Present a frequency table as a “bar chart” (in Excel, these are called column charts) or a pie chart.
•Organize quantitative data into a frequency distribution.
•Present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.

Mutually Exclusive
•An individual, object, or measurement is included in only one category
•It can’t be in two categories
– Example: A particular phone call cannot originate with both AT&T and MCI

Frequency Table
Frequency Table: A grouping of qualitative data into mutually exclusive classes (categories) showing the number of observations in each class

Sales data from Auto Dealership

Price        Price ($000)   Age of Customer   ForD       Car Type
24,624.00    24.624         50                Domestic   GM
23,032.00    23.032         50                Domestic   Ford
27,556.00    27.556         59                Domestic   GM
20,384.00    20.384         32                Domestic   GM
20,953.00    20.953         29                Domestic   Ford
37,270.00    27.270         35                Foreign    Mercedes
21,006.00    21.006         57                Domestic   GM
27,594.00    27.594         43                Domestic   GM
29,636.00    29.636         51                Domestic   GM
26,357.00    26.357         31                Foreign    Honda
38,262.00    28.262         39                Foreign    Mercedes
38,910.00    38.910         25                Foreign    Honda
23,947.00    23.947         43                Domestic   Ford

We would like a Frequency Table that shows how many of each Car Type we sold last month from the Auto Dealership data (counting).

Count of Car Type
Car Type       Total
Ford           28
GM             22
Honda          13
Mercedes       10
Toyota          7
Grand Total    80

Relative Class Frequencies
•Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class.
•A relative frequency captures the relationship between a class total and the total number of observations.
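As a minimal sketch of the conversion just described, the Python snippet below divides each class frequency by the total number of observations. The counts are the car-type totals from the frequency table; the function name relative_frequencies is an illustrative choice and does not come from the textbook or from Excel.

```python
# Car-type counts from the frequency table (80 vehicles in total).
counts = {"Ford": 28, "GM": 22, "Honda": 13, "Mercedes": 10, "Toyota": 7}


def relative_frequencies(class_counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(class_counts.values())
    return {label: freq / total for label, freq in class_counts.items()}


rel = relative_frequencies(counts)

# Print one row per class: label, frequency, relative frequency as a percent.
for label, share in rel.items():
    print(f"{label:<10}{counts[label]:>4}{share:>9.2%}")
```

The relative frequencies should sum to 1 (100%), which is a quick check that no class was dropped or counted twice.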
Car Type    Frequency   Relative Frequency
GM          22           27.50%
Ford        28           35.00%
Mercedes    10           12.50%
Honda       13           16.25%
Toyota       7            8.75%
Total       80          100.00%

Textbook: Bar Charts
Excel: Column Chart
[Column chart: Number of Autos Sold. Vertical axis: Frequency. Columns: Ford 28, GM 22, Honda 13, Mercedes 10, Toyota 7]
•In Excel, this is a Column chart.
•Column charts are good for Nominal Level Data.
•Notice that the columns do not touch.

Pie Charts
[Pie chart: Percentage of Autos Sold. Ford 35%, GM 27%, Honda 16%, Mercedes 13%, Toyota 9%]

Frequency Distribution
A Frequency distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.
•The raw data are more easily interpreted if organized into a frequency distribution
•The resulting frequency distribution helps a person to quickly see the “shape” of the data
•Although the frequency distribution will result in the loss of some detail, seeing patterns in the data can help a person to make better decisions

5 Steps To Organize Raw Data Into A Frequency Distribution
Step 1: Decide on Number of Classes
Step 2: Determine The Class Interval
Step 3: Set The Individual Class Limits
Step 4: Tally The Data Into Classes
Step 5: Count The Tallies in Each Class & Present the Frequency Distribution

Step 1: Determining The Number Of Classes
•Goal is to use just enough classes so you can see the “shape” of the data.
•You must use professional judgment.
•Useful recipe to determine the number of classes: \(2^k \ge n\)
– n = total observations
– k = number of classes
•Best to use 5 < k < 15
These are general guidelines that are not always possible to follow. Thus, making Frequency Distributions is often referred to as an “art”.
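The “2 to the k” recipe from Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name number_of_classes is hypothetical, and the result is a starting point for judgment rather than a fixed answer. The example call uses n = 80, the number of vehicles in the car-type table.

```python
def number_of_classes(n):
    """Return the smallest k such that 2**k >= n (the "2 to the k" rule)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 0
    # Keep adding classes until 2**k is at least the number of observations.
    while 2 ** k < n:
        k += 1
    return k


# 2**6 = 64 is less than 80, while 2**7 = 128 is not, so this prints 7.
print(number_of_classes(80))
```

The suggested k should still be compared against the 5 < k < 15 guideline and adjusted if the resulting classes hide or exaggerate the shape of the data.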
Definitions
•Class Interval
– Distance between lower limit of class and lower limit of the next class
– The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class (also midpoint to midpoint)
•Class Midpoint (Class Mark)
– The midpoint can be thought of as the “typical value” for the class
– This is the average of the upper and lower class limits: (Lower class limit + upper class limit)/2

Step 2: Determine The Class Interval Or Width
•Class interval should be the same for every class
– If the intervals are not equal, graphs may be misleading, & calculations may be problematic
– In some cases, where there is a potential for many empty classes, unequal class intervals may be necessary
•The classes all taken together must cover at least the distance from the lowest value in the raw data up to the highest value:

Determine Class Interval
\(i \ge \frac{H - L}{k}\)
i = Class Interval
H = Highest Value
L = Lowest Value
k = Number of Classes

EXAMPLE – Creating a Frequency Distribution Table
Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show the typical selling price on various dealer lots. The table on the right reports only the price of the 80 vehicles sold last month at Whitner Autoplex.

Constructing a Frequency Table Example
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is the “2 to the k rule”: choose k such that \(2^k > n\).
There were 80 vehicles sold. So n = 80. If we try k = 6, which means we would use 6 classes, then \(2^6 = 64\), somewhat less than 80. Hence, 6 is not enough classes. If we let k = 7, then \(2^7 = 128\), which is greater than 80. So the recommended number of classes is 7.
Step 2: Determine the class interval or width.
The formula is: \(i \ge (H - L)/k\), where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.
($35,925 - $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of $3,000.

Step 3: Set The Individual Class Limits
•Classes must be mutually exclusive
•Avoid overlapping or unclear class limits:
– Include lower limit
– Exclude upper limit
•Example of class limits:
– $12,000 up to $15,000 and $15,000 up to $18,000
– $12,000 & $14,999 belong in the first class
– $15,000 belongs in the second class
•Avoid open ended classes (problems with graphing)
•The lower limit of the first class should be a multiple of the class interval (not always possible)
•Convenient multiples of ten are useful
•You must compare the actual range to the range implied by the number of classes & class interval
These are general guidelines that are not always possible to follow. Thus, making Frequency Distributions is often referred to as an “art”.

Constructing a Frequency Table Example
Step 3: Set the individual class limits

Constructing a Frequency Table
Step 4: Tally the vehicle selling prices into the classes.
Step 5: Count the number of items in each class.

Observed Patterns:
•Range: about $15,000 to about $36,000
•Concentration between $18,000 & $27,000
•Largest concentration is in $18,000 - $21,000 class
– Typical Value = (18+21)/2 = 19.5 K
•Two sold for $33,000 or more
•8 sold for less than $18,000

Relative Frequency Distribution
To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.

Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are:
•Histograms
•Frequency polygons
•Cumulative frequency distributions

Histogram
•A histogram for a frequency distribution based on quantitative data is very similar to the column chart (book says: bar chart) showing the distribution of qualitative data.
•The classes are marked on the horizontal axis and the class frequencies on the vertical axis.
•The class frequencies are represented by the heights of the bars.
•The columns must touch in order to visually articulate that the class interval spans from lower class limit to upper class limit.

Other Notes About Histogram
•Histograms constructed from Relative Frequency Distributions look the same (have the same shape), but instead, the vertical axis would show percentages
•Histograms must have the columns touching:
– The columns must touch in order to visually articulate that the class interval spans from lower class limit to upper class limit (a continuous variable)
•For nominal or ordinal level data, the columns are not drawn adjacent to each other
– The category labels are usually words

Frequency Polygon
•A frequency polygon also shows the shape of a distribution and is similar to a histogram.
•It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.

Cumulative Frequency Distribution

Second Example of a Cumulative Frequency Distribution (prices of vehicles are lower)

Cumulative Frequency Distribution for Vehicles Selling Price

Selling Prices    Number of Vehicles   Cumulative
($ thousands)     Sold (Frequency)     Frequency
12 up to 15        8                    8
15 up to 18       23                   31          = 8 + 23
18 up to 21       17                   48          = 31 + 17
21 up to 24       18                   66          = 48 + 18
24 up to 27        8                   74          = 66 + 8
27 up to 30        4                   78          = 74 + 4
30 up to 33        2                   80          = 78 + 2
Total             80

Cumulative Frequency Polygon
[Figure: cumulative frequency polygon. Horizontal axis: Selling Price ($000), 9 to 33. Left vertical axis: Number of Vehicles Sold, 0 to 80. Right vertical axis: percent, 25% to 100%]

Cumulative Frequency Polygon
•Plot line on coordinate system
•X-axis = Upper limit of class
•Y-axis (Left) = Cumulative Frequency
•Y-axis (Right) = %
•First point on graph is: (lower limit of first class, 0)

x          12   15   18   21   24   27   30   33
y (left)    0    8   31   48   66   74   78   80

Cumulative Frequency Polygon
Readings from the chart:
•50% of the vehicles sold for less than about $19,500
•25 of the vehicles sold for less than about $17,500
•80% of the vehicles sold for less than about $24,000

End of Chapter 2
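The cumulative frequencies and the chart readings for the second example can be approximated in code. The Python sketch below rebuilds the cumulative column from the class frequencies and reads values off the polygon by linear interpolation between the plotted points. The helper name price_below_which is hypothetical, and straight-line interpolation is an assumption that mirrors reading along the polygon's line segments.

```python
from itertools import accumulate

# Class limits ($000) and class frequencies from the second example.
limits = [12, 15, 18, 21, 24, 27, 30, 33]
freqs = [8, 23, 17, 18, 8, 4, 2]

# Running totals; the polygon starts at (lower limit of first class, 0).
cumulative = [0] + list(accumulate(freqs))


def price_below_which(count):
    """Interpolate along the polygon to find the price ($000) below which
    `count` vehicles sold."""
    if not 0 <= count <= cumulative[-1]:
        raise ValueError("count must be between 0 and the total sold")
    for i in range(1, len(limits)):
        lo, hi = cumulative[i - 1], cumulative[i]
        if count <= hi:
            if hi == lo:  # empty class: no slope to interpolate along
                return float(limits[i - 1])
            frac = (count - lo) / (hi - lo)
            return limits[i - 1] + frac * (limits[i] - limits[i - 1])


total = cumulative[-1]
print(round(price_below_which(0.50 * total), 1))  # 50% of the vehicles
print(round(price_below_which(25), 1))            # 25 vehicles (a count)
print(round(price_below_which(0.80 * total), 1))  # 80% of the vehicles
```

Run as written, the three readings come out near 19.6, 17.2, and 23.7 (in $000), in the same neighborhood as the approximate values read from the chart.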