Chapter 2: Frequency Distributions and Graphs

Learning Objectives
Upon completion of Chapter 2, you will be able to:
– Organize the data into a table or chart (called a frequency distribution)
– Construct a graph from the chart
I. Basic Vocabulary
• Raw data is data in its original form.
• A frequency distribution is the organization of raw data into a table using categories for the
data in one column and the frequencies for each category in the second column.
• Frequency (f) is the tally or count of the number of data values in each class.
• Relative frequency (f/n) is the tally or count of the number of data values in each class
divided by the total number of data values.
• Cumulative frequency is the tally or count of the number of data values in a class plus the
frequencies for all lower classes.
• Cumulative relative frequency is the cumulative frequency divided by the total number of
data values.
II. Frequency Distributions
A. Types of Frequency Distributions
I. Qualitative Data:
• Categorical frequency distribution is a two column chart with a list of all possible
attributes or categories for the data in the first column and the count of the
amount of data in each category in the second column.
II. Quantitative Data:
a) Grouped frequency distribution (for data with a large range) is a chart of classes,
each several units in width, in the first column and the count of the amount of
data in each class in the second column.
b) Ungrouped frequency distribution (for data with a small range) is a chart of each
possible individual value of data in the first column and the count of the amount of
data with that value in the second column.
Dr. Janet Winter, jmw11@psu.edu
Stat 200
B. Examples of Frequency Distributions
I. Qualitative or Categorical Frequency Distributions
• Create a table with gender (Male/Female) in the first column and the count of the
number of men and women in the class in the second column.
• Create a table with level of employment (none, part time, full time) in the first
column and the count of the number of students in the class in each category in the
second column.
II. Ungrouped Quantitative Frequency Distributions
• In the first column, list the numbers 0, 1, 2, 3, 4… representing the number of keys a
student is carrying. In the second column, list the count of the number of students
with that many keys.
• In the first column, list the numbers 0, 1, 2, 3, 4… representing the number of cars
in your family. In the second column, list the count of the number of students with
that many cars in their family.
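As a minimal sketch of the keys example, the tally can be built from raw data as follows. The data values and the names `keys_carried` and `table` are hypothetical, chosen only for illustration; the columns follow the vocabulary in section I (f, f/n, cumulative frequency).

```python
from collections import Counter

# Hypothetical raw data: number of keys carried by each of 12 students.
keys_carried = [2, 3, 0, 1, 2, 2, 4, 3, 2, 1, 0, 2]

counts = Counter(keys_carried)
n = len(keys_carried)

# One row per individual value from the smallest to the largest observed,
# including any value whose frequency is zero.
table = []
cumulative = 0
for value in range(min(keys_carried), max(keys_carried) + 1):
    f = counts[value]
    cumulative += f
    table.append((value, f, f / n, cumulative))

print("keys  f  f/n    cum f")
for value, f, rel, cum in table:
    print(f"{value:>4}  {f}  {rel:.3f}  {cum:>5}")
```

The last cumulative frequency should always equal n, which is a quick check that no data value was dropped.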
C. Why Construct Frequency Distributions?
• To organize the data for interpretation
• To compare different data sets
• To simplify the computation for measures of average and spread
• To determine the shape of the distribution
• To draw charts and graphs for data
III. Grouped Frequency Distributions
For data with a “large” range, place the data in groups or classes that are several units in width.
A. Terms for Grouped Frequency Distributions
• The lower class limit represents the smallest data value that can be included in the
class.
• The upper class limit represents the largest value that can be included in the class.
• Range (R): largest data value minus the smallest data value.
• Class boundaries are the numbers used to separate classes but without the gaps
created by class limits.
B. Characteristics of Classes or Groups
• There should be between 5 and 20 classes.
• The class width should be an odd number. (Suggested by Bluman)
• The classes must be mutually exclusive.
• The classes must be continuous.
• The classes must have equal width.
• The classes must include all data, that is, be exhaustive.
C. Finding the Class Midpoint
It is the average of either:
a) The 2 class boundaries for each individual class, OR
b) The 2 class limits for each individual class
D. Finding the Class Width
There are several ways to find the class width (all with the same answer).
The class width is either:
a) The difference between 2 sequential lower class limits (2 different classes),
b) The difference between 2 sequential upper class limits (2 different classes), OR
c) The difference between the lower and upper boundaries for the same class
Note: the class width is constant throughout the frequency distribution
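The midpoint and width rules in sections C and D can be checked numerically. This is a small sketch, assuming two adjacent classes with limits 1-20 and 21-40 (an assumption made for illustration) whose boundaries are 0.5-20.5 and 20.5-40.5.

```python
# Two adjacent classes. The limits 1-20 and 21-40 are assumed for illustration;
# they correspond to the boundaries 0.5-20.5 and 20.5-40.5.
limits = [(1, 20), (21, 40)]
boundaries = [(0.5, 20.5), (20.5, 40.5)]

# Midpoint of the first class: average of its two limits or of its two boundaries.
midpoint_from_limits = (limits[0][0] + limits[0][1]) / 2
midpoint_from_boundaries = (boundaries[0][0] + boundaries[0][1]) / 2

# Width: three equivalent computations.
width_lower_limits = limits[1][0] - limits[0][0]        # sequential lower limits
width_upper_limits = limits[1][1] - limits[0][1]        # sequential upper limits
width_boundaries = boundaries[0][1] - boundaries[0][0]  # boundaries of one class

print(midpoint_from_limits, midpoint_from_boundaries)
print(width_lower_limits, width_upper_limits, width_boundaries)
```

Note that the width is not the upper limit minus the lower limit of the same class; for the first class that difference is 19, one unit short of the width of 20.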
E. Procedure to find the Class Limits from Data
The process to find class limits from data is as follows:
1. Find the range. Range = maximum value - minimum value
2. To find the class width, divide the range by the number of classes and round up to the
next whole odd number. The width has the same number of decimal places as the data.
3. Select the lowest data value as the starting point or lowest class limit.
4. Add the width to find the next lower class limit.
5. Upper limits are 1 unit less than the next class’s lower limit.
6. Continue this process until an upper class limit is greater than or equal to the highest
data value.
Note: The last class should not be empty; that is, its frequency should not be zero.
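The steps above can be sketched in code. This is one possible reading of the procedure, assuming whole-number data (so 1 unit = 1) and a number of classes chosen by the reader; the function name `class_limits` and the sample data are hypothetical.

```python
import math

def class_limits(data, num_classes):
    """Return (width, list of (lower, upper) class limits) for whole-number data."""
    low, high = min(data), max(data)
    data_range = high - low                      # step 1: range
    width = math.ceil(data_range / num_classes)  # step 2: divide and round up
    if width % 2 == 0:                           # ...to the next whole odd number
        width += 1
    limits = []
    lower = low                                  # step 3: start at the lowest value
    while True:
        upper = lower + width - 1                # step 5: 1 unit below the next lower limit
        limits.append((lower, upper))
        if upper >= high:                        # step 6: highest data value is covered
            break
        lower += width                           # step 4: next lower class limit
    return width, limits

# Hypothetical data with minimum 10 and maximum 58.
data = [10, 14, 22, 25, 31, 33, 37, 40, 41, 47, 52, 58]
width, limits = class_limits(data, 6)
print(width)   # 48 / 6 = 8, rounded up to the next odd number
print(limits)
```

Because the width is rounded up, the procedure can produce fewer classes than requested for some data sets; the count should be checked against the characteristics in section B.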
F. Suggested Number of Classes Based on Sample Size
Sample Size (n)    Suggested Number of Classes
Less than 16       Not Enough Information
16 - 31            5
32 - 63            6
64 - 127           7
128 - 255          8
256 - 511          9
512 - 1023         10
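The rows of this table follow a pattern: each range runs from one power of 2 up to just below the next, and the suggested number of classes equals the number of binary digits of n. That pattern is an observation about the table, not a rule stated in these notes, and the function name below is hypothetical.

```python
def suggested_classes(n):
    """Suggested number of classes for a sample of size n, per the table (n <= 1023)."""
    if n < 16:
        return None  # the table lists "Not Enough Information"
    if n > 1023:
        raise ValueError("the table stops at n = 1023")
    # Each row spans 2**(k-1) through 2**k - 1 and suggests k classes,
    # which is the number of binary digits of n.
    return n.bit_length()

for n in (15, 16, 31, 32, 100, 1023):
    print(n, suggested_classes(n))
```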
G. Finding Class Boundaries from Class Limits
Use class limits to find class boundaries:
I. Find the class limits (same number of decimal places as the data).
II. Find upper class boundaries by adding ½ unit to the upper class limit of each class.
III. Find the lower class boundaries by subtracting ½ unit from the lower class limit of each
class.
H. Decimal Place Rule
Class limits have the same number of decimal places as the data, but class boundaries have
one more decimal place than the data and end in a “5”.
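A short sketch of sections G and H: subtracting and adding ½ unit, where the unit depends on the number of decimal places in the data. The function name `class_boundaries`, its `decimals` parameter, and the sample limits are assumptions for illustration; `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

def class_boundaries(limits, decimals=0):
    """Convert (lower, upper) class limits to class boundaries.

    decimals is the number of decimal places in the data (0 for whole numbers).
    """
    half_unit = Decimal(10) ** -decimals / 2   # 0.5 for whole numbers, 0.05 for tenths
    return [(Decimal(str(lo)) - half_unit, Decimal(str(hi)) + half_unit)
            for lo, hi in limits]

# Whole-number data: boundaries gain one decimal place and end in 5.
print(class_boundaries([(1, 20), (21, 40)]))
# Data recorded to one decimal place: boundaries have two decimal places.
print(class_boundaries([(2.1, 2.5), (2.6, 3.0)], decimals=1))
```

The upper boundary of one class equals the lower boundary of the next, which is how boundaries remove the gaps left by class limits.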
IV. Graphs
A. The Role of Graphs
• Presents the data in pictorial form.
• Attracts attention in a publication or a presentation.
B. Types of Graphs
• Bar graph – graph of the frequency distribution for qualitative or categorical data.
• Histogram – graph of the frequency distribution for quantitative data.
• Ogive – graph of the cumulative frequency for quantitative data.
• Frequency polygon – graph of the frequency for quantitative data.
C. Histogram
Scale: class boundaries or class midpoints
Vertical (or horizontal) bars are proportional to the frequencies for each class.
Class Boundaries    Frequency
0.5 – 20.5          4
20.5 – 40.5         9
40.5 – 60.5         20
60.5 – 80.5         40
80.5 – 100.5        24
Note: The scale on the non-frequency axis is either the class boundaries or class midpoints.
• Class midpoints are located in the middle of the bars and class boundaries are
located at the ends of the bars.
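As a rough illustration of bars proportional to frequency, the histogram data in this section can be drawn as horizontal text bars. This is only a sketch, not a substitute for a plotted histogram; the function name `text_histogram` is hypothetical.

```python
# Class boundaries and frequencies from the histogram table in this section.
boundaries = [(0.5, 20.5), (20.5, 40.5), (40.5, 60.5), (60.5, 80.5), (80.5, 100.5)]
frequencies = [4, 9, 20, 40, 24]

def text_histogram(boundaries, frequencies):
    """Return one line per class; the bar length equals the class frequency."""
    lines = []
    for (lower, upper), f in zip(boundaries, frequencies):
        label = f"{lower:>5} - {upper:<5}"
        lines.append(f"{label} | {'#' * f} ({f})")
    return lines

for line in text_histogram(boundaries, frequencies):
    print(line)
```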
D. Frequency Polygon
Scale: class midpoints
• Plot the frequency of each class at its midpoint, i.e., (class midpoint, class frequency).
• The scale is sequential midpoints.
• Extend the midpoint scale once below the first class midpoint and once above the last
class midpoint. Label the extensions.
• Plot a point at each extension with a frequency of zero (extension, 0).
• Connect all of the points with line segments forming a polygon.
Note: Remember a polygon is a many sided closed figure. The extension points and the
axis make the figure closed.
Things to Remember About Frequency Polygons
• The scale increment is the difference between two sequential class midpoints, which
equals the class width.
• Extend the scale and graph once above the largest class midpoint and once below the
smallest class midpoint.
• Use a frequency of zero with both extensions.
Class           Midpoint    Frequency
0.5 – 20.5      10.5        4
20.5 – 40.5     30.5        9
40.5 – 60.5     50.5        20
60.5 – 80.5     70.5        40
80.5 – 100.5    90.5        24
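The points of the frequency polygon for this table, including the two zero-frequency extensions, can be listed with a short sketch. The function name `polygon_points` is hypothetical, and equal class widths are assumed, as section III requires.

```python
# Midpoints and frequencies from the table in this section.
midpoints = [10.5, 30.5, 50.5, 70.5, 90.5]
frequencies = [4, 9, 20, 40, 24]

def polygon_points(midpoints, frequencies):
    """Return the (midpoint, frequency) points plus one extension at each end."""
    step = midpoints[1] - midpoints[0]     # difference between sequential midpoints
    below = (midpoints[0] - step, 0)       # extension below the first midpoint
    above = (midpoints[-1] + step, 0)      # extension above the last midpoint
    return [below, *zip(midpoints, frequencies), above]

for point in polygon_points(midpoints, frequencies):
    print(point)
```

Connecting these seven points in order, together with the horizontal axis, gives the closed figure.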
Note: Cumulative frequency for each upper boundary is the sum of the frequency in that
class plus all lower class frequencies.
E. Ogive or Cumulative Frequency Graph
Scale: class boundaries
Start at the lowest lower class boundary with a cumulative frequency of zero, that is, the
point (lowest lower boundary, 0). Then plot the cumulative frequency at the upper class
boundary of each class. End at the point (highest upper boundary, n), where n is the total
number of data values.
Class           Frequency    Cumulative Frequency
0.5 – 20.5      4            4
20.5 – 40.5     9            13
40.5 – 60.5     20           33
60.5 – 80.5     40           73
80.5 – 100.5    24           97
[Figure: ogive. Vertical axis: Number of Students (0 to 100); horizontal axis: Scores for
final exam, marked at the class boundaries 0.5, 20.5, 40.5, 60.5, 80.5, 100.5.]
Note: The line segments connect at (0.5, 0), (20.5, 4), (40.5, 13), (60.5, 33), (80.5, 73),
(100.5, 97), which are (lowest lower boundary, 0), (first upper boundary, cumulative
frequency through the first class), (second upper boundary, cumulative frequency through
the second class), …, (last upper boundary, total frequency).
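The ogive points in the note can be reproduced from the class frequencies with a running total. This is a minimal sketch using the example's boundaries and frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries (lowest lower boundary first) and frequencies from the example.
boundaries = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5]
frequencies = [4, 9, 20, 40, 24]

# Running total of the frequencies gives the cumulative frequency of each class.
cumulative = list(accumulate(frequencies))

# Start at (lowest lower boundary, 0), then pair each upper boundary
# with the cumulative frequency through that class.
ogive_points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

print(cumulative)
print(ogive_points)
```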
F. Relative Frequency Graphs
A relative frequency graph uses the frequencies divided by the total of all frequencies
instead of frequencies. Use it with any graph when proportions are more meaningful than
the actual count or frequency.
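For instance, the frequencies from the final exam example convert to relative frequencies as sketched below; the variable names are illustrative.

```python
# Frequencies from the final exam example.
frequencies = [4, 9, 20, 40, 24]
n = sum(frequencies)   # total number of data values

# Relative frequency of each class: f / n.
relative = [f / n for f in frequencies]

for f, rel in zip(frequencies, relative):
    print(f"{f:>3}  {rel:.3f}")
```

The relative frequencies sum to 1, so a relative frequency graph has the same shape as the frequency graph with a rescaled vertical axis.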
G. Other Graphs of Interest
I. A dot plot is a graph with a point or dot for each data value above a scaled horizontal line.
II. A Pareto chart is a bar graph (for the categorical data) with the categories arranged from
the highest to the lowest frequency.
[Figure: Pareto chart titled "How People Get to Work". Vertical axis: Frequency (0 to 30);
categories in order: Auto, Bus, Trolley, Train, Walk.]
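The ordering step that distinguishes a Pareto chart from an ordinary bar graph can be sketched as a sort by frequency. The counts below are hypothetical, since the bar heights of the chart are not recoverable from these notes; only the category names come from the chart.

```python
# Hypothetical counts for the categories in "How People Get to Work".
commute_counts = {"Walk": 5, "Bus": 20, "Train": 10, "Auto": 30, "Trolley": 15}

# Arrange the categories from the highest to the lowest frequency.
pareto_order = sorted(commute_counts.items(), key=lambda item: item[1], reverse=True)

for category, count in pareto_order:
    print(f"{category:<8} {count}")
```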
III. A time series graph is used for data that occur over a specific period of time; it plots
time on the x-axis and the measured quantity on the y-axis, with the points (time, quantity)
connected by line segments:
[Figure: time series graph titled "Temperature Over a 5-hour Period". Vertical axis:
Temperature (35 to 55); horizontal axis: Time (12, 1, 2, 3, 4, 5).]
IV. A pie graph is a circle divided into sections proportional to the percentage in each category.
Favorite American Snacks
Snack             Percentage
Nuts              8%
Popcorn           13%
Pretzels          14%
Potato Chips      38%
Tortilla Chips    27%
Note: The degree for a segment is the relative frequency for the segment times 360°.
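Applying the note to the snack percentages gives the angle of each section. This is a small sketch; the variable names are illustrative.

```python
# Percentages from the "Favorite American Snacks" pie graph.
snack_percent = {
    "Nuts": 8,
    "Popcorn": 13,
    "Pretzels": 14,
    "Potato Chips": 38,
    "Tortilla Chips": 27,
}

# Angle of each section = relative frequency x 360 degrees.
angles = {snack: pct * 360 / 100 for snack, pct in snack_percent.items()}

for snack, angle in angles.items():
    print(f"{snack:<15} {angle:.1f} degrees")
```

The angles should total 360 degrees, which holds here because the percentages total 100.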
V. A stem-and-leaf plot
• Use for quantitative data
• Vertically ordered list of the left part of the data digits (or stem)
• The rightmost digit of each data value (called the leaf) is listed horizontally and
sequentially to the right
• Retains actual data while showing it in graphic form.
a) Process:
1. Split each number into its rightmost digit, called the leaf, and the
remaining digits to the left, called the stem
2. List all possible stem values once in increasing order
3. Draw a vertical line to the right of the stems
4. List the leaves sequentially and horizontally to the right of the vertical line,
next to their respective stems, once for each time they occur
Note: A stem value is listed once, while a leaf is listed once for every data
value in which it occurs
b) Example:
Data: 123 125 131 113 101 102 104 111
114 111 132 133 141 142 143 132
Stem Plot:
10 | 1 2 4
11 | 1 1 3 4
12 | 3 5
13 | 1 2 2 3
14 | 1 2 3
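The process in part a) can be sketched in code and checked against the example data. This assumes non-negative whole numbers with a one-digit leaf; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Data from the example in part b).
data = [123, 125, 131, 113, 101, 102, 104, 111,
        114, 111, 132, 133, 141, 142, 143, 132]

def stem_and_leaf(values):
    """Map each stem to its sorted list of leaves (leaf = rightmost digit)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # split off the rightmost digit
        leaves[stem].append(leaf)
    # List every stem between the smallest and largest once, even if it has no leaves.
    return {stem: leaves.get(stem, [])
            for stem in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(data)
for stem, row in plot.items():
    print(f"{stem:>3} | {' '.join(map(str, row))}")
```

Repeated data values such as 111 and 132 appear as repeated leaves, which is how the plot retains the actual data.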
c) Other types of stem plots:
Split stem-and-leaf
• Each stem value is recorded twice
• The first line is for trailing digits 0-4
• The second line is for trailing digits 5-9
Back to back stem-and-leaf
• Separate the data into two categories by listing the leaves for one
category to the left of its stem and the leaves for the other category to
the right of its stem
V. Chapter Review Questions
1. One of the early steps a researcher must do when conducting a statistical study is to
a) gather and collect data.
b) use a computer or a calculator to analyze the data.
c) draw conclusions from the data.
2. A statistics professor gives a very easy 100 point test, with the highest score being 98 and
the lowest score being 71. We want to divide this data into categories. Then, a reasonable
width of categories could be
a) 1
b) 5
c) 10
3. The manager of a computer store wishes to track how many computer monitors of
different screen sizes are sold during the week. He tallies the sales by the following
categories: less than 15”, 15-15.9”, 16-16.9”, 17-17.9”, 18-18.9”, 19-19.9”, and 20” and
above. The best way to represent the data is using a
a) Histogram.
b) Frequency polygon.
c) Ogive.
d) All of the above.
4. Which presents more information, a frequency polygon or an ogive?
a) The frequency polygon presents more information.
b) The ogive presents more information.
c) They have equal amounts of information.
5. If we would like to display the areas of the states in the United States and we only care
about the states with the largest areas, then an appropriate graph would be a
a) Pareto chart.
b) Time series graph.
c) Pie graph.
6. The dean of engineering at a school wishes to track the number of students with
engineering majors over the past 10 years. An appropriate graph would be a
a) Pareto chart.
b) Time series graph.
c) Pie graph.
VI. Summary
• Histograms, frequency polygons and ogives are used for quantitative data in a grouped
frequency distribution.
• Pareto charts and bar graphs are frequency graphs for qualitative variables.
• Time series graphs are used to show a pattern or trend that occurs over time.
• Pie graphs are used to show the relationship between the parts and the whole for
qualitative or categorical data.
• Data can be organized in meaningful ways using frequency distributions and graphs.
VII. ANSWERS: Chapter Review Questions
1. One of the early steps a researcher must do when conducting a statistical study is to
a) gather and collect data.
2. A statistics professor gives a very easy 100 point test, with the highest score being 98 and
the lowest score being 71. We want to divide this data into categories. Then, a reasonable
width of categories could be
b) 5
3. The manager of a computer store wishes to track how many computer monitors of
different screen sizes are sold during the week. He tallies the sales by the following
categories: less than 15”, 15-15.9”, 16-16.9”, 17-17.9”, 18-18.9”, 19-19.9”, and 20” and
above. The best way to represent the data is using a
d) All of the above.
4. Which presents more information, a frequency polygon or an ogive?
c) They have equal amounts of information.
5. If we would like to display the areas of the states in the United States and we only care
about the states with the largest areas, then an appropriate graph would be a
a) Pareto chart.
6. The dean of engineering at a school wishes to track the number of students with
engineering majors over the past 10 years. An appropriate graph would be a
b) Time series graph.