2.1 Frequency Distributions


To accompany Hawkes Lesson 2.1

Original content by D.R.S.


Is your data Qualitative or Quantitative?

• Qualitative: it’s a category

– Blood type

– Model of car

– Favorite fast food restaurant

• Quantitative: it’s a numerical measurement

– Heart rate, beats per minute

– Fuel efficiency, miles per gallon

– Dollars spent on meal

– My pain, on a scale from 1 to 10


Frequency Distribution for Categorical Data

The table has three columns:

• Category: list the categories in this column

• Frequency: put the counts of how many in this column

• Relative Frequency: this category is what percent of the total sample size?

(What order? Highest frequency down to lowest? Lowest to highest? Alphabetical? It’s your design decision.)

Categorical Frequency Distributions are the fuel for the “Family Feud”

(Photograph borrowed from some web site somewhere;

I failed to record the exact source.)


Categorical (or, Qualitative) Frequency Distribution example

• “What state did you visit most recently?”

State visited (the category)    How many (the frequency)
Alabama                          71
California                       18
Florida                         138
New York                         48
South Carolina                    7
Tennessee                        27
Texas                            53
Other states                     70
TOTAL                           432
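The relative frequency column for this table can be filled in with a few lines of code. This is only a sketch in Python; the variable names are made up for illustration, and sorting from highest to lowest is just one of the possible design decisions.

```python
# Frequencies copied from the "state visited" table.
state_counts = {
    "Alabama": 71,
    "California": 18,
    "Florida": 138,
    "New York": 48,
    "South Carolina": 7,
    "Tennessee": 27,
    "Texas": 53,
    "Other states": 70,
}

# The total sample size is the sum of all the frequencies.
total = sum(state_counts.values())

# One possible ordering: highest frequency down to lowest.
ordered = sorted(state_counts.items(), key=lambda item: item[1], reverse=True)

# Relative frequency = category frequency / total sample size.
relative = {state: count / total for state, count in ordered}

for state, count in ordered:
    print(f"{state:<16}{count:>5}{relative[state]:>8.1%}")
print(f"{'TOTAL':<16}{total:>5}")
```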

Things we do with Categorical Frequency Distributions

• Sometimes we just leave them as tables of words and numbers for reference and interpretation.

• We draw pictures of them (future lessons).

– Bar graphs

– Pie charts

– Cutesy repeated icons variation of the bar graph


A famous categorical frequency distribution we will revisit later

Draw this 5-card poker hand                               Frequency
Royal Flush                                                       4
Straight Flush (not including Royal Flush)                       36
Four of a Kind                                                  624
Full House                                                    3,744
Flush (not including Royal Flush or Straight Flush)           5,108
Straight (not including Royal Flush or Straight Flush)       10,200
Three of a Kind                                              54,912
Two Pair                                                    123,552
One Pair                                                  1,098,240
Something that’s not special at all                       1,302,540
Total                                                     2,598,960
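A quick way to check the total: the frequencies should add up to the number of ways to choose 5 cards from a 52-card deck. The sketch below does that check in Python; the dictionary and variable names are illustrative only.

```python
from math import comb

# Frequencies copied from the poker hand table.
poker_counts = {
    "Royal Flush": 4,
    "Straight Flush": 36,
    "Four of a Kind": 624,
    "Full House": 3_744,
    "Flush": 5_108,
    "Straight": 10_200,
    "Three of a Kind": 54_912,
    "Two Pair": 123_552,
    "One Pair": 1_098_240,
    "Nothing special": 1_302_540,
}

# Every possible hand falls in exactly one category, so the sum
# of the frequencies is the number of possible 5-card hands.
total_hands = sum(poker_counts.values())

# Number of ways to choose 5 cards out of 52, order ignored.
all_hands = comb(52, 5)

print(total_hands, all_hands, total_hands == all_hands)
```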

Quantitative Frequency Distribution

(data are number measurements)

• Classes: each class is a low-to-high range of values. These are called the “Class Limits”.

• Frequency: the frequency column gives a count of how many data values fit in the class.

Quantitative Frequency Distribution

(data are number measurements)

Placement Test Score    How many applicants
0-9                      19
10-19                    38
20-29                    52
30-39                    71
40-49                    50
50 and above             28

About the Quantitative Frequency Distribution

• Instead of individual test score values, we GROUPED data into CLASSES

• Other names for “classes”: “bins”, “buckets”

• Each class is a low-to-high range of data values

• Each data value falls into exactly one class

• There may be one or two “open-ended” classes

– Like our “50 and above”

About the classes

• CLASS LIMITS are 10-19, 20-29, etc.

• Classes do not overlap!

• Classes are usually the same width.

• CLASS MIDPOINTS are like 14.5, 24.5, etc.

(High plus low, divided by 2. For the class 10-19: (10 + 19) ÷ 2 = 14.5)

Class LIMITS vs. Class BOUNDARIES

• CLASS LIMITS are 10-19, 20-29, etc.

• CLASS BOUNDARIES split the “gap” between class limits: 9.5-19.5, 19.5-29.5, etc.

• “9.5-19.5” means 9.5 ≤ x < 19.5 (note ≤ vs. < )

– All values between 9.5 and 19.5

– Including the lower endpoint of 9.5

– But excluding the upper endpoint of 19.5


A Cumulative Frequency column

Placement Test Score    How many applicants    Cumulative frequency
0-9                      19                     19
10-19                    38                     57    (19 + 38 = 57)
20-29                    52                    109    (+ 52 = 109)
30-39                    71                    180    (+ 71 = 180)
40-49                    50                    230    (+ 50 = 230)
50 and above             28                    258    (+ 28 = 258)
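The cumulative column is just a running total of the frequency column. Here is a minimal Python sketch using the standard library's `itertools.accumulate`; the list names are illustrative.

```python
from itertools import accumulate

classes = ["0-9", "10-19", "20-29", "30-39", "40-49", "50 and above"]
frequencies = [19, 38, 52, 71, 50, 28]

# accumulate() yields the running sum: 19, 19+38, 19+38+52, ...
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label:<14}{freq:>4}{cum:>6}")
```

The last cumulative value always equals the total sample size, which makes a handy check.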

A Relative Frequency column

Relative frequency = class frequency ÷ total. For the first class: 19 ÷ 258 = 0.0736, or 7.4%.

The relative frequencies should total exactly 100%, but rounding might throw it off a wee bit.

Placement Test Score    How many applicants    Relative frequency
0-9                      19                      7.4%
10-19                    38                     14.7%
20-29                    52                     20.2%
30-39                    71                     27.5%
40-49                    50                     19.4%
50 and above             28                     10.9%
TOTAL                   258
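To see the rounding effect concretely, here is a short Python sketch that computes each relative frequency and then adds up the rounded percentages. The variable names are illustrative; rounding to one decimal place matches the table.

```python
frequencies = [19, 38, 52, 71, 50, 28]
total = sum(frequencies)

# Relative frequency = class frequency / total sample size.
relative = [f / total for f in frequencies]

# Express as percentages rounded to one decimal place, as in the table.
rounded_percents = [round(100 * r, 1) for r in relative]

# The unrounded values sum to 1 (that is, 100%) up to floating-point error;
# the rounded percentages can land slightly off 100.
print(rounded_percents)
print(sum(relative), sum(rounded_percents))
```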

Constructing a Frequency Distribution

1. How many classes should we have?

2. What class width should we use?

3. Find the class limits.

4. Sort your data, find the frequency of each class.

Adapted from textbook page 46 © HLS


Example of Construction

Using runners’ times from the Bunny Hop 5K in Cordele, March 31, 2012. Original data downloaded from a link at rungeorgia.com.

1. How many classes?

• Between 5 classes and 20 classes is good

• How many data values do you have?

• One textbook suggests: if you have fewer than 125 data values, use the square root of the number of data values as the number of classes

• The Bunny Hop race had 103 finishers.

• By that rule, √103 ≈ 10.15, so we would have 10 or 11 classes.

• Let’s agree on 10 classes for this example.
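The square-root rule of thumb is easy to try out. This Python sketch just takes the square root of the number of finishers and shows the whole numbers on either side; it is an illustration of the rule quoted above, not a hard requirement.

```python
from math import ceil, floor, sqrt

n = 103  # number of finishers in the Bunny Hop race

root = sqrt(n)  # a little over 10

# The rule gives a non-integer, so the nearby whole numbers are the candidates.
candidates = (floor(root), ceil(root))

print(round(root, 2), candidates)
```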

2. Choose a Class Width

• The “range” is the highest data value minus the lowest data value.

• Divide the range by the number of classes

• Then bump up to the next integer.

• That’s just a starting point

2. Choose a Class Width

• High 66.8000 – Low 20.0167 = Range 46.7833

• Divide the range by the number of classes

• 46.7833 ÷ 10 = 4.67833

• Then bump up to the next integer.

• Class width is 5

• That’s just a starting point

• We like it; it sounds good.

• Nice “round” kind of a number for our readers
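The class width arithmetic can be sketched in Python as follows. One assumption to note: `math.ceil` leaves a quotient that is already a whole number unchanged, so if your textbook says to always bump up in that case, you would need to add 1 yourself.

```python
from math import ceil

high = 66.8000     # slowest finishing time, in minutes
low = 20.0167      # fastest finishing time, in minutes
num_classes = 10   # agreed on in step 1

# Range = highest data value minus lowest data value.
data_range = high - low

# Divide by the number of classes, then round up to an integer.
# Assumption: ceil() does not bump an exact whole number any higher.
class_width = ceil(data_range / num_classes)

print(round(data_range, 4), class_width)
```

With this width, 10 classes starting at 20 cover everything up to 70, which is enough to include the highest time.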

3. Find the Class Limits

• Start at what value for the first class?

– The lowest value is 20.0167

– Let’s start our first class at 20.0000

– Same number of decimal places as the data

• The first class has a lower class limit of 20.0000

• The lower limit of the next class is 25.0000

– Take the lower limit of 20.0000 from the previous class and add the class width of 5: 25.0000 is the lower limit for the next class

3. Find the Class Limits - Lower

• The first class has lower class limit = 20.0000

• The next class has lower class limit = 25.0000

• Etc. for the rest of the 10 classes:

– 30.0000, 35.0000, 40.0000, 45.0000 minutes, and

– 50.0000, 55.0000, 60.0000, 65.0000 minutes

3. Find the class limits - Upper

• The first class has lower class limit 20.0000

• The second class has lower class limit 25.0000

– So the first class has upper class limit 24.9999

• The first class’s class limits: 20.0000 – 24.9999

• Then next comes 25.0000 – 29.9999

• Then 30.0000 – 34.9999, etc.

• All the way up through 65.0000-69.9999
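The lower and upper limits follow a simple pattern, so they can be generated rather than typed. This Python sketch uses `decimal.Decimal` so that values like 24.9999 stay exact; the names `first_lower`, `unit`, and `class_limits` are made up for illustration.

```python
from decimal import Decimal

first_lower = Decimal("20.0000")  # chosen starting point for the first class
class_width = Decimal("5")
num_classes = 10

# The data have four decimal places, so the smallest step is 0.0001.
unit = Decimal("0.0001")

class_limits = []
for i in range(num_classes):
    lower = first_lower + i * class_width
    # Upper limit sits one unit below the next class's lower limit.
    upper = lower + class_width - unit
    class_limits.append((lower, upper))

for lower, upper in class_limits:
    print(f"{lower} - {upper}")
```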


4. Count the frequency of each class

If tallying unsorted data by hand, hash marks are useful.

Time (minutes)     Frequency
20.0000-24.9999      9
25.0000-29.9999     26
30.0000-34.9999     23
35.0000-39.9999     14
40.0000-44.9999      7
45.0000-49.9999     11
50.0000-54.9999     10
55.0000-59.9999      0
60.0000-64.9999      2
65.0000-69.9999      1
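Tallying by computer works the same way as hash marks: decide which class each value belongs to, then count. The Python sketch below shows the idea. Important: `sample_times` is a handful of made-up values for illustration only, NOT the actual race results, and the helper `class_index` is hypothetical.

```python
from collections import Counter

first_lower = 20    # lower limit of the first class
class_width = 5
num_classes = 10

# Made-up finishing times in minutes (NOT the real Bunny Hop data).
sample_times = [20.0167, 23.5, 27.25, 29.9999, 30.0, 31.4, 44.0, 52.75, 66.8]

def class_index(x):
    """Return 0 for the first class, 1 for the second, and so on."""
    return int((x - first_lower) // class_width)

# Count how many values land in each class (the electronic hash marks).
tally = Counter(class_index(t) for t in sample_times)

# Classes with no data values get a frequency of 0.
frequencies = [tally[i] for i in range(num_classes)]

for i, freq in enumerate(frequencies):
    lower = first_lower + i * class_width
    print(f"{lower}-{lower + class_width} (upper end excluded): {freq}")
```

As a check, the frequencies should add up to the number of data values.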

Class Limits and Class Boundaries

Class Limits       Class Boundaries
20.0000-24.9999    19.99995 – 24.99995
25.0000-29.9999    24.99995 – 29.99995
30.0000-34.9999    29.99995 – 34.99995
Etc.               Etc.
55.0000-59.9999    54.99995 – 59.99995
60.0000-64.9999    59.99995 – 64.99995
65.0000-69.9999    64.99995 – 69.99995

Class Limits and Class Boundaries

• What to do with the gap between the class limits of adjacent classes?

• Limits 25.0000-29.9999 and 30.0000-34.9999

• There’s a gap between 29.99990 and 30.00000

• Midway between them is 29.99995

• Class Boundaries extend to that midpoint

• 24.99995 – 29.99995 and 29.99995 – 34.99995

Class Boundaries

• Example: Class Limits 25.0000 – 29.9999

• Class Boundaries 24.99995 – 29.99995

• This means 24.99995 ≤ x < 29.99995

• Note: including the lower boundary (≤)

• But not including the upper boundary (<)

• Because classes must never overlap
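Splitting the gap can be written out as a small calculation. This Python sketch uses `decimal.Decimal` to keep the five-decimal boundary values exact; the helper names `boundaries` and `in_class` are hypothetical, chosen just for this example.

```python
from decimal import Decimal

# Two adjacent classes, given by their class limits.
class_limits = [
    (Decimal("25.0000"), Decimal("29.9999")),
    (Decimal("30.0000"), Decimal("34.9999")),
]

# The gap runs from one class's upper limit to the next class's lower limit.
gap = class_limits[1][0] - class_limits[0][1]   # 0.0001
half_gap = gap / 2                              # 0.00005

def boundaries(lower_limit, upper_limit):
    """Extend each limit halfway into the neighboring gap."""
    return (lower_limit - half_gap, upper_limit + half_gap)

def in_class(x, lower_boundary, upper_boundary):
    """Lower boundary included, upper boundary excluded."""
    return lower_boundary <= x < upper_boundary

low_b, high_b = boundaries(*class_limits[0])
print(low_b, high_b)
print(in_class(low_b, low_b, high_b), in_class(high_b, low_b, high_b))
```

A value sitting exactly on a shared boundary belongs to the upper class only, which is what keeps the classes from overlapping.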

Class Midpoints

(Upper Limit + Lower Limit) ÷ 2

Class Limits       Class Midpoints
20.0000-24.9999    22.49995
25.0000-29.9999    27.49995
30.0000-34.9999    32.49995
Etc.               Etc.
55.0000-59.9999    57.49995
60.0000-64.9999    62.49995
65.0000-69.9999    67.49995

Adjacent class midpoints are one class width apart.
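The midpoint formula, and the fact that neighboring midpoints sit one class width apart, can be checked with a few lines of Python. This is a sketch using `decimal.Decimal` for exact arithmetic; only the first three classes are listed, and the variable names are illustrative.

```python
from decimal import Decimal

class_limits = [
    (Decimal("20.0000"), Decimal("24.9999")),
    (Decimal("25.0000"), Decimal("29.9999")),
    (Decimal("30.0000"), Decimal("34.9999")),
]

# Midpoint = (upper limit + lower limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Differences between consecutive midpoints.
spacings = [b - a for a, b in zip(midpoints, midpoints[1:])]

print(midpoints)
print(spacings)
```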

Class Limits, Boundaries, and Midpoints for the Placement Test

• It’s easier with whole numbers as class limits

Class Limits    Frequency    Class Boundaries    Class Midpoint
0-9              19          -0.5 – 9.5           4.5
10-19            38          9.5 – 19.5          14.5
20-29            52          19.5 – 29.5         24.5
30-39            71          29.5 – 39.5         34.5
40-49            50          39.5 – 49.5         44.5
50 +             28          49.5 and up         None? Or 54.5?

Excel Tools

• The Excel FREQUENCY function.

• The Excel COUNTIF function.

– Need to add info about COUNTIFS function.

• Also, the Excel “Histogram” tool generates frequency distributions (discussed in the Histogram lesson)