To accompany Hawkes Lesson 2.1
Original content by D.R.S.
Is your data Qualitative or Quantitative?
• Qualitative: it’s a category
– Blood type
– Model of car
– Favorite fast food restaurant
• Quantitative: it’s a numerical measurement
– Heart rate, beats per minute
– Fuel efficiency, miles per gallon
– Dollars spent on meal
– My pain, on a scale from 1 to 10
Frequency Distribution for Categorical Data
• Category column: list the categories here
• Frequency column: put the counts of how many in this column
• Relative Frequency column: this category is what percent of the total sample size?
• (What order? Highest frequency down to lowest? Lowest to highest? Alphabetical? It’s your design decision.)
Categorical Frequency Distributions are the fuel for the “Family Feud”
(Photograph borrowed from some web site somewhere;
I failed to record the exact source.)
Categorical (or Qualitative) Frequency Distribution Example
• “What state did you visit most recently?”
State visited (the category)    How many (the frequency)
Alabama                         71
California                      18
Florida                         138
New York                        48
South Carolina                  7
Tennessee                       27
Texas                           53
Other states                    70
TOTAL                           432
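For readers who want to automate the arithmetic, here is a minimal Python sketch (not part of the original lesson) that turns this table into a relative frequency distribution. Sorting from highest frequency down is just one of the design choices mentioned earlier; the names `state_counts` and `relative` are illustrative.

```python
# Frequencies copied from the "state visited" table
state_counts = {
    "Alabama": 71,
    "California": 18,
    "Florida": 138,
    "New York": 48,
    "South Carolina": 7,
    "Tennessee": 27,
    "Texas": 53,
    "Other states": 70,
}

total = sum(state_counts.values())  # total sample size

# Relative frequency = category frequency / total sample size
relative = {state: count / total for state, count in state_counts.items()}

# List categories from highest frequency down to lowest (one possible design choice)
for state, count in sorted(state_counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{state:<15} {count:>4} {relative[state]:>7.1%}")
print(f"{'TOTAL':<15} {total:>4}")
```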
Things we do with Categorical Frequency Distributions
• Sometimes we just leave them as tables of words and numbers for reference and interpretation.
• We draw pictures of them (future lessons).
– Bar graphs
– Pie charts
– Pictographs (a cutesy variation of the bar graph that uses repeated icons)
A famous categorical frequency distribution we will revisit later
Draw this 5-card poker hand                             Frequency
Royal Flush                                             4
Straight Flush (not including Royal Flush)              36
Four of a Kind                                          624
Full House                                              3,744
Flush (not including Royal Flush or Straight Flush)     5,108
Straight (not including Royal Flush or Straight Flush)  10,200
Three of a Kind                                         54,912
Two Pair                                                123,552
One Pair                                                1,098,240
Something that’s not special at all                     1,302,540
Total                                                   2,598,960
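A quick consistency check, sketched in Python (an illustration, not from the original slides): every possible 5-card hand falls into exactly one category, so the frequencies must add up to the number of ways to choose 5 cards from 52, which `math.comb(52, 5)` computes (Python 3.8 or later). The short category keys are illustrative.

```python
from math import comb

# Frequencies of each 5-card poker hand, copied from the table
hand_counts = {
    "Royal Flush": 4,
    "Straight Flush": 36,
    "Four of a Kind": 624,
    "Full House": 3_744,
    "Flush": 5_108,
    "Straight": 10_200,
    "Three of a Kind": 54_912,
    "Two Pair": 123_552,
    "One Pair": 1_098_240,
    "Not special at all": 1_302_540,
}

total = sum(hand_counts.values())

# Each possible hand is counted exactly once, so the total must equal "52 choose 5"
assert total == comb(52, 5)

# Relative frequency of each kind of hand
for hand, count in hand_counts.items():
    print(f"{hand:<19} {count:>10,} {count / total:.6%}")
print(f"{'Total':<19} {total:>10,}")
```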
Quantitative Frequency Distribution
(data are numerical measurements)
• Classes: each class is a low-to-high range of values. These are called the “Class Limits.”
• Frequency: the frequency column gives a count of how many data values fit in each class.
Quantitative Frequency Distribution
(data are numerical measurements)
Placement Test Score    How many applicants
0-9                     19
10-19                   38
20-29                   52
30-39                   71
40-49                   50
50 and above            28
About the Quantitative Frequency Distribution
• Instead of individual test score values, we GROUPED data into CLASSES
• Other names for “classes”: “bins”, “buckets”
• Each class is a low-to-high range of data values
• Each data value falls into exactly one class
• There may be one or two “open-ended” classes
– Like our “50 and above”
• CLASS LIMITS are 10-19, 20-29, etc.
• Classes do not overlap!
• Classes are usually the same width.
• CLASS MIDPOINTS are like 14.5, 24.5, etc.
(Low limit plus high limit, divided by 2: (10 + 19) ÷ 2 = 14.5)
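The midpoint rule applied to the placement-test classes, as a minimal Python sketch. The open-ended “50 and above” class is left out because it has no upper limit to average with; the helper name `midpoint` is illustrative.

```python
# Class limits for the placement test (closed classes only; "50 and above" is open-ended)
class_limits = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

def midpoint(lower, upper):
    """Class midpoint: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

midpoints = [midpoint(low, high) for low, high in class_limits]
print(midpoints)  # [4.5, 14.5, 24.5, 34.5, 44.5]
```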
• CLASS LIMITS are 10-19, 20-29, etc.
• CLASS BOUNDARIES split the “gap” between class limits: 9.5-19.5, 19.5-29.5, etc.
• “9.5-19.5” means 9.5 ≤ x < 19.5 (note ≤ vs. < )
– All values between 9.5 and 19.5
– Including the lower endpoint of 9.5
– But excluding the upper endpoint of 19.5
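A minimal Python sketch of the “lower boundary ≤ x < upper boundary” rule, using the placement-test class boundaries. One assumption: the open-ended top class is given an upper boundary of infinity. Test scores are whole numbers, so a value like 19.5 never really occurs; it appears here only to show which class an exact boundary value lands in. The helper name `class_for` is illustrative.

```python
# Class boundaries for the placement test: each class covers lower <= x < upper.
# Assumption: the open-ended "50 and above" class has an upper boundary of infinity.
boundaries = [-0.5, 9.5, 19.5, 29.5, 39.5, 49.5, float("inf")]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50 and above"]

def class_for(x):
    """Return the label of the class whose boundaries satisfy lower <= x < upper."""
    for lower, upper, label in zip(boundaries, boundaries[1:], labels):
        if lower <= x < upper:
            return label
    raise ValueError(f"{x} is below the lowest class boundary")

print(class_for(19))    # 10-19
print(class_for(19.5))  # 20-29 (the upper boundary is excluded, so 19.5 starts the next class)
```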
A Cumulative Frequency Column
19 + 38 = 57
57 + 52 = 109
109 + 71 = 180
180 + 50 = 230
230 + 28 = 258
Placement Test Score    How many applicants    Cumulative frequency
0-9                     19                     19
10-19                   38                     57
20-29                   52                     109
30-39                   71                     180
40-49                   50                     230
50 and above            28                     258
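The same running total in Python: `itertools.accumulate` from the standard library adds each frequency to the total of all the ones before it. This is a sketch for checking the hand arithmetic, not part of the original lesson.

```python
from itertools import accumulate

# Frequencies for the placement-test classes, in class order
frequencies = [19, 38, 52, 71, 50, 28]

# Each cumulative frequency is the running total of the frequencies so far
cumulative = list(accumulate(frequencies))
print(cumulative)  # [19, 57, 109, 180, 230, 258]
```

The last cumulative frequency always equals the total number of data values, which makes a handy check on the tally.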
A Relative Frequency Column
Example: 19 ÷ 258 ≈ 0.0736, which rounds to 7.4%
Should total exactly 100%, but rounding might throw it off a wee bit.
Placement Test Score    How many applicants    Relative frequency
0-9                     19                     7.4%
10-19                   38                     14.7%
20-29                   52                     20.2%
30-39                   71                     27.5%
40-49                   50                     19.4%
50 and above            28                     10.9%
TOTAL                   258
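A short Python sketch of the relative frequency column. Rounding each percent to one decimal place, as the table does, is exactly what makes this particular column add up to 100.1% rather than 100%.

```python
frequencies = [19, 38, 52, 71, 50, 28]
total = sum(frequencies)  # 258 applicants

# Relative frequency as a percent, rounded to one decimal place for display
percents = [round(100 * f / total, 1) for f in frequencies]
print(percents)                 # [7.4, 14.7, 20.2, 27.5, 19.4, 10.9]
print(round(sum(percents), 1))  # 100.1 (rounding throws the total off a wee bit)
```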
Constructing a Frequency Distribution
1. How many classes should we have?
2. What class width should we use?
3. Find the class limits.
4. Sort your data and find the frequency of each class.
Adapted from textbook page 46 © HLS
Using runners’ times from the Bunny Hop 5K in Cordele, March 31, 2012 (original data downloaded from a link at rungeorgia.com)
• Between 5 classes and 20 classes is good
• How many data values do you have?
• One textbook suggests: if you have fewer than 125 data values, use the square root of the number of data values as the number of classes
• The Bunny Hop race had 103 finishers.
• By that rule, √103 ≈ 10.15, so we would have 10 or 11 classes.
• Let’s agree on 10 classes for this example.
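The square-root rule for the Bunny Hop data, as a small Python sketch. The rule only suggests a number; rounding down or up (10 or 11) is the judgment call described above.

```python
import math

n = 103  # number of finishers in the Bunny Hop 5K

# Square-root rule (one textbook's suggestion for fewer than 125 data values)
suggested = math.sqrt(n)
print(f"{suggested:.2f}")                           # 10.15
print(math.floor(suggested), math.ceil(suggested))  # 10 11
```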
The “range” is the highest data value minus the lowest data value.
Divide the range by the number of classes
Then bump up to the next integer.
That’s just a starting point
High 66.8000 - Low 20.0167 = Range 46.7833
Divide the range by the number of classes
46.7833 ÷ 10 = 4.67833
Then bump up to the next integer.
Class width is 5
That’s just a starting point
We like it; it sounds good.
Nice “round” kind of a number for our readers
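The class-width steps, sketched in Python with the high and low times from this slide. One design note: `math.ceil` leaves a result that is already a whole number unchanged; if you want to bump up even in that case, `math.floor(x) + 1` is the alternative.

```python
import math

high, low = 66.8000, 20.0167  # highest and lowest finishing times, in minutes
num_classes = 10

data_range = high - low
# Divide the range by the number of classes, then bump up to the next integer
class_width = math.ceil(data_range / num_classes)
print(class_width)  # 5
```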
Start at what value for the first class?
• The lowest value is 20.0167
• Let’s start our first class at 20.0000
• Same number of decimal places as the data
The first class has a lower class limit of 20.0000
The lower limit of the next class is 25.0000
• Take the lower limit of the previous class, 20.0000
• Add the class width of 5: 20.0000 + 5 = 25.0000, the lower limit for the next class
The first class has lower class limit = 20.0000
The next class has lower class limit = 25.0000
Etc. for the rest of the 10 classes:
• 30.0000, 35.0000, 40.0000, 45.0000 minutes, and
• 50.0000, 55.0000, 60.0000, 65.0000 minutes
• The first class has lower class limit 20.0000
• The second class has lower class limit 25.0000
– So the first class has upper class limit 24.9999
• The first class’s class limits: 20.0000 – 24.9999
• Then next comes 25.0000 – 29.9999
• Then 30.0000 – 34.9999, etc.
• All the way up through 65.0000-69.9999
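Generating all ten sets of class limits in Python, as a sketch. It uses the standard `decimal` module so that values like 24.9999 print exactly, without floating-point noise; the variable names are illustrative.

```python
from decimal import Decimal

first_lower = Decimal("20.0000")  # starting value, same 4 decimal places as the data
width = Decimal("5")              # class width
step = Decimal("0.0001")          # smallest unit of the data (4 decimal places)
num_classes = 10

limits = []
for i in range(num_classes):
    lower = first_lower + i * width
    upper = lower + width - step  # one data unit below the next class's lower limit
    limits.append((lower, upper))

for lower, upper in limits:
    print(f"{lower}-{upper}")     # 20.0000-24.9999, 25.0000-29.9999, ...
```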
4. Count the frequency of each class
If tallying unsorted data by hand, hash marks are useful.
Time (minutes)      Frequency
20.0000-24.9999     9
25.0000-29.9999     26
30.0000-34.9999     23
35.0000-39.9999     14
40.0000-44.9999     7
45.0000-49.9999     11
50.0000-54.9999     10
55.0000-59.9999     0
60.0000-64.9999     2
65.0000-69.9999     1
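If the race times were in a Python list, a sketch like the following could do the tallying. The function `tally` is hypothetical (written for this illustration) and treats each class as running from its lower limit up to, but not including, the next class's lower limit. The sample times passed to it are the race's lowest and highest times plus two made-up values, not the real results. The first lines just confirm that the table's frequencies account for all 103 finishers.

```python
import math

# Frequencies from the Bunny Hop 5K table, one per 5-minute class starting at 20
frequencies = [9, 26, 23, 14, 7, 11, 10, 0, 2, 1]
assert sum(frequencies) == 103  # every one of the 103 finishers is counted exactly once

def tally(times, first_lower=20, width=5, num_classes=10):
    """Hypothetical helper: count times in each class [first_lower + k*width, first_lower + (k+1)*width)."""
    counts = [0] * num_classes
    for t in times:
        k = math.floor((t - first_lower) / width)  # index of the class containing t
        if not 0 <= k < num_classes:
            raise ValueError(f"{t} is outside all classes")
        counts[k] += 1
    return counts

# Lowest and highest race times plus two made-up values, just to show the function
print(tally([20.0167, 24.9999, 25.0000, 66.8000]))  # [2, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```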
Class Limits and Class Boundaries
Class Limits        Class Boundaries
20.0000-24.9999     19.99995 – 24.99995
25.0000-29.9999     24.99995 – 29.99995
30.0000-34.9999     29.99995 – 34.99995
Etc.                Etc.
55.0000-59.9999     54.99995 – 59.99995
60.0000-64.9999     59.99995 – 64.99995
65.0000-69.9999     64.99995 – 69.99995
• What to do with the gap between the class limits of adjacent classes?
• Limits 25.0000-29.9999 and 30.0000-34.9999
• There’s a gap between 29.99990 and 30.00000
• Midway between them is 29.99995
• Class Boundaries extend to that midpoint
• 24.99995 – 29.99995 and 29.99995 – 34.99995
• Example: Class Limits 25.0000 – 29.9999
• Class Boundaries 24.99995 – 29.99995
• This means 24.99995 ≤ x < 29.99995
• Note: including the lower boundary (≤)
• But not including the upper boundary (<)
• Because classes must never overlap
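Computing class boundaries from class limits, as a Python sketch: half of the 0.0001 gap is subtracted from each lower limit and added to each upper limit. The `decimal` module keeps the five-decimal results exact; the helper name `class_boundaries` is illustrative.

```python
from decimal import Decimal

gap = Decimal("0.0001")  # gap between one class's upper limit and the next class's lower limit
half_gap = gap / 2       # 0.00005

def class_boundaries(lower_limit, upper_limit):
    """Extend each class limit halfway into the gap: (lower - gap/2, upper + gap/2)."""
    return lower_limit - half_gap, upper_limit + half_gap

lower_b, upper_b = class_boundaries(Decimal("25.0000"), Decimal("29.9999"))
print(lower_b, upper_b)  # 24.99995 29.99995
```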
Class Midpoints
(Upper Limit + Lower Limit) ÷ 2
Class Limits        Class Midpoints
20.0000-24.9999     22.49995
25.0000-29.9999     27.49995
30.0000-34.9999     32.49995
Etc.                Etc.
55.0000-59.9999     57.49995
60.0000-64.9999     62.49995
65.0000-69.9999     67.49995
Consecutive midpoints are one class width apart.
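A Python sketch of the midpoint formula for the first few 5K classes, again with `decimal` so the five-decimal results come out exactly; the final assertion checks that consecutive midpoints sit one class width apart.

```python
from decimal import Decimal

width = Decimal("5")
# (lower limit, upper limit) for the first three Bunny Hop classes
limits = [
    (Decimal("20.0000"), Decimal("24.9999")),
    (Decimal("25.0000"), Decimal("29.9999")),
    (Decimal("30.0000"), Decimal("34.9999")),
]

midpoints = [(lower + upper) / 2 for lower, upper in limits]
print([str(m) for m in midpoints])  # ['22.49995', '27.49995', '32.49995']

# Consecutive midpoints differ by exactly one class width
assert all(b - a == width for a, b in zip(midpoints, midpoints[1:]))
```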
Class Limits, Boundaries, and Midpoints for the Placement Test
• It’s easier with whole numbers as class limits
Class Limits  Frequency  Class Boundaries    Class Midpoint
0-9           19         -0.5 – 9.5          4.5
10-19         38         9.5 – 19.5          14.5
20-29         52         19.5 – 29.5         24.5
30-39         71         29.5 – 39.5         34.5
40-49         50         39.5 – 49.5         44.5
50 +          28         49.5 and up         None? Or 54.5?
• Link: The Excel FREQUENCY function.
• Link: The Excel COUNTIF function.
– Need to add info about the COUNTIFS function.
• Excel’s “Histogram” tool also generates frequency distributions (discussed in the Histogram lesson).