Frequency Distributions: Raw Data List → Grouped into Classes → Reported in a Table

Categorical Frequency Distribution [Blu4 Example 2-1 Page 36]
Blood Types

Class    Tally    Frequency    Percent of Total
A
B
O
AB
Total    -        [*]          [**]   [***]

There is a list of raw data (not shown here).
“Class” column: lists the groups, the classifications we choose, one class on each line of the table. In a “Categorical” distribution the classes are “nominal” values, usually not numbers: colors, sizes, or, in this example, blood types.
“Tally” column: where [Blu4] puts hash marks to count manually as we process the raw data.
“Frequency” column: the count of how many raw data values fall in the class.
[*] Total the Frequency column. The total must match the number of data values in the original raw data list.
“Percent of Total” column: not in every frequency distribution, but it can be useful. Divide the Frequency for the class on this line by the Total of all the Frequencies and express the result as a percent, rounded.

Numerical Frequency Distribution, Ungrouped [Blu4 Example 2-3 Page 41]
SUVs’ MPG ratings

Class    Class Boundaries    Tally    Frequency    Cumulative Frequency
12       11.5-12.5
13       12.5-13.5
14       13.5-14.5
15       14.5-15.5
16       15.5-16.5
17       16.5-17.5
18       17.5-18.5
19       18.5-19.5
Total                        -                     [**]

Raw data is a list of MPG ratings; the only values that occurred were 12, 13, 14, 15, 16, 17, 18, 19. No grouping was done because the number of classes is naturally small enough for this particular set of data.
Class Boundaries: 11.5-12.5, for example, means 11.5 ≤ 𝑥 < 12.5. If there were a 12.5, it would go into the next class, 13, where 12.5 ≤ 𝑥 < 13.5.
Cumulative Frequency: a running total of the frequencies as you go down the table. Not every frequency distribution has such a column, but sometimes it is useful.
“Percent of Total” isn’t in this example but it could have been used.

[***] Based on [Blu4 Section 2-2]
Document1 2/9/2016 10:02 AM - D.R.S.
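The Frequency, Percent of Total, and Cumulative Frequency columns described above can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table and both raw data lists are invented here, because the handout does not show the [Blu4] raw data.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(raw_data, classes):
    """Return one row per class: (class, frequency, percent of total, cumulative frequency)."""
    counts = Counter(raw_data)            # the "tally" step, done by machine
    total = len(raw_data)                 # must equal the sum of the Frequency column
    freqs = [counts[c] for c in classes]  # a class with no data gets frequency 0
    cumulative = list(accumulate(freqs))  # running total down the table
    return [
        (c, f, round(100 * f / total), cf)
        for c, f, cf in zip(classes, freqs, cumulative)
    ]


# Hypothetical raw data, invented for illustration (NOT the [Blu4] data sets).
blood_types = ["A", "B", "B", "AB", "O", "O", "O", "B", "AB", "B"]
blood_rows = frequency_table(blood_types, ["A", "B", "O", "AB"])

mpg_ratings = [12, 14, 14, 15, 15, 15, 16, 19]
mpg_rows = frequency_table(mpg_ratings, list(range(12, 20)))

for row in blood_rows + mpg_rows:
    print(row)
```

The cumulative column is mainly meaningful for ordered numerical classes such as the MPG ratings; for nominal classes like blood types the Percent of Total column is usually the one that gets reported. Note that the classes 13, 17 and 18 stay in the MPG table with frequency 0 instead of being dropped.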
Frequency Distribution, Grouped [Blu4 Example 2-2 Page 39]
Record high temperatures for each of the fifty states

Class Limits    Class Boundaries    Tally    Frequency    Cumulative Frequency
100-104         99.5-104.5
105-109         104.5-109.5
110-114         109.5-114.5
115-119         114.5-119.5
120-124         119.5-124.5
125-129         124.5-129.5
130-134         129.5-134.5
Total

Determining the Classes in this Example:
Find the highest value and the lowest value among the raw data, 𝐻 and 𝐿.
Compute 𝑅 = range = 𝐻 − 𝐿. This example: 𝐻 = 134, 𝐿 = 100, 𝑅 = 34.
How many classes do you want? Usually pick from 5 to 20 classes. The example just decided to make 7 classes. So 𝐶 = 7.
Compute class width 𝑊 = 𝑅/𝐶 and if it has a remainder, bump up. In this example, 𝑊 = 34/7 ≈ 4.9, ↑ to 5. (Not “rounding”, but “ceiling”.)
Choose a starting point, arbitrarily, anything that’s ≤ 𝐿. They picked 100.
Lower Class Limits: Begin at the starting point and add 𝑊, add 𝑊 again, etc.: 100, 105, … , 130.
Upper Class Limits: Second class’s Lower Limit − 1: 105 − 1 = 104, then repeatedly add 𝑊: 109, 114, … , 134.
Class Boundaries: Lower Class Limit − 0.5 to Upper Class Limit + 0.5 (the data here are whole numbers).
The Midpoint is the numeric center of the class: Class Midpoint = (Lower Boundary + Upper Boundary) / 2.

Summary of the Rules and the Guidelines for Grouped Frequency Distributions

Procedures [Blu4 Page 42]
1) Determine the classes
   a) Find the highest and the lowest values, 𝐻 and 𝐿.
   b) Find the range, 𝑅 = 𝐻 − 𝐿.
   c) Select the number of classes desired, 𝐶. Usually 5 ≤ 𝐶 ≤ 20. Then class width, 𝑊 = 𝑐𝑒𝑖𝑙𝑖𝑛𝑔(𝑅/𝐶).
   d) Lower Class Limits: Select a starting point = the lowest value or any convenient number less than the lowest value; add the width (repeatedly) to get the lower class limits.
   e) Find the upper class limits.
   f) Find the boundaries.
2) Tally the data.
3) Find the numerical frequencies from the tallies.
4) Find the cumulative frequencies.

Some more rules and guidelines [Blu4 Page 40]
There should be between 5 and 20 classes. No hard and fast rule but that’s a good rule of thumb.
It is nice if the class width is an odd number. Then you will get class midpoints that have the same place value as the data. [But JK10 page 57 recommends an even number for class width!]
The classes must be Mutually Exclusive. (This means the Class Limits don’t overlap. Each data value falls into only One class.)
The classes must be Exhaustive. Every data value fits somewhere.
The classes must be Continuous. No gaps, even if a class turns out to be empty. (But you can omit a zero-frequency class at the beginning or end of the frequency distribution; just don’t omit any class in the middle.)
The classes must all have the same Width.
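The class-building procedure for a grouped frequency distribution (range, ceiling width, limits, boundaries, midpoints, then tallying) can be sketched in Python as follows. The function names build_classes and tally are invented for this illustration, the sketch assumes whole-number data (hence the 0.5 offsets), and the short list of temperatures is made up because the fifty-state raw data is not shown in this handout.

```python
import math


def build_classes(low, high, num_classes, start=None):
    """Return (width, classes); each class is
    (lower limit, upper limit, lower boundary, upper boundary, midpoint).

    Assumes whole-number data, so boundaries sit 0.5 beyond the limits.
    Follows the handout's rule W = ceiling(R / C); if R / C is already a
    whole number, check that the last class still reaches the highest value.
    """
    if start is None:
        start = low                                # any value <= low is allowed
    data_range = high - low                        # R = H - L
    width = math.ceil(data_range / num_classes)    # W = ceiling(R / C)
    classes = []
    for i in range(num_classes):
        lower = start + i * width                  # add W repeatedly
        upper = lower + width - 1                  # next lower limit minus 1
        lo_b, hi_b = lower - 0.5, upper + 0.5
        classes.append((lower, upper, lo_b, hi_b, (lo_b + hi_b) / 2))
    return width, classes


def tally(raw_data, classes):
    """Count how many data values fall within each class's limits."""
    return [sum(lower <= x <= upper for x in raw_data)
            for lower, upper, *_ in classes]


# Values taken from the handout's example: H = 134, L = 100, C = 7, start = 100.
width, classes = build_classes(low=100, high=134, num_classes=7, start=100)

# Hypothetical temperatures, invented for illustration (NOT the real data).
temps = [100, 104, 105, 112, 118, 118, 120, 134]
freqs = tally(temps, classes)

for (lower, upper, lo_b, hi_b, mid), f in zip(classes, freqs):
    print(f"{lower}-{upper}  {lo_b}-{hi_b}  midpoint {mid}  frequency {f}")
```

Because the width here is odd (5), every midpoint is a whole number such as 102 or 107, which matches the place value of the data. Checking that the frequencies add up to the number of data values is a quick way to confirm that the classes are mutually exclusive and exhaustive for the data at hand.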