Uploaded by Elijah Quade

Math 227 Statistics

Math 227 Elementary Statistics
Bluman 5th edition
CHAPTER 2
Frequency Distributions and Graphs
2
Objectives
• Organize data using frequency distributions.
• Represent data in frequency distributions
graphically using histograms, frequency
polygons, and ogives.
• Represent data using Pareto charts, time series
graphs, and pie graphs.
• Draw and interpret a stem and leaf plot.
• Draw and interpret a scatter plot for a set of
paired data.
3
Introduction
This chapter will show how to organize data
and then construct appropriate graphs to
represent the data in a concise, easy-to-understand form.
4
Section 2.1 Organizing Data
Basic Vocabulary
• When data are collected in original form, they are called
raw data.
• A frequency distribution is the organization of raw data in
table form, using classes and frequencies.
• The two most common distributions are the categorical
frequency distribution and the grouped frequency
distribution.
5
Frequency Distributions
Categorical Frequency Distributions
count how many times each distinct
category has occurred and
summarize the results in a table
format
6
Example 1: Letter grades for Math 227 Spring 2005:
C A B C D F B B A C C F C
B D A C C C F C C A A C
a)
Construct a frequency distribution for the categorical data.
Answer:
Class   Tally           Frequency   Percent
A       /////           5           20
B       ////            4           16
C       ///// ///// /   11          44
D       //              2           8
F       ///             3           12
Total                   25          100
7
b) What percentage of the students pass the class with
the grade C or better?
Answer:
Total number of letter grades = 25
Number of grades of C or better = 5 + 4 + 11 = 20
Percentage = (20 / 25) × 100% = 80%
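The tally in Example 1 can also be checked programmatically. The following is a minimal Python sketch, not part of the original slides; the function name `categorical_distribution` is made up for illustration.

```python
from collections import Counter

# Letter grades from Example 1 (25 students).
grades = "C A B C D F B B A C C F C B D A C C C F C C A A C".split()

def categorical_distribution(values):
    """Return {category: (frequency, percent)} in sorted category order."""
    counts = Counter(values)
    total = len(values)
    return {cat: (counts[cat], 100 * counts[cat] / total) for cat in sorted(counts)}

table = categorical_distribution(grades)
for grade, (freq, pct) in table.items():
    print(f"{grade}  {freq:>2}  {pct:.0f}%")

# Part b: share of students who earned a C or better.
c_or_better = sum(table[g][0] for g in "ABC")
print(f"C or better: {100 * c_or_better / len(grades):.0f}%")
```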
8
Frequency Distributions
Grouped Frequency Distributions
When the range of the data is large, the
data must be grouped into classes
that are more than one unit in width.
9
Grouped Frequency Distributions
The lower class limit represents the smallest value
that can be included in the class.
The upper class limit represents the largest value
that can be included in the class.
10
Lower and Upper Class Limits
Class Limits   Frequency
24 - 30        3
31 - 37        1
38 - 44        5
45 - 51        9
52 - 58        6
Lower class limits: 24, 31, 38, 45, 52
Upper class limits: 30, 37, 44, 51, 58
11
Grouped Frequency
Distributions (cont.)
The class boundaries are used to separate the classes so
that there are no gaps in the frequency distribution.
Class Limits   Class Boundaries   Frequency
24 - 30        23.5 - 30.5        3
31 - 37        30.5 - 37.5        1
38 - 44        37.5 - 44.5        5
45 - 51        44.5 - 51.5        9
52 - 58        51.5 - 58.5        6
12
Class Boundaries Significant Figures
• Rule of Thumb: Class limits should have the same decimal
place value as the data, but the class boundaries have
one additional place value and end in a 5.
e.g. data were whole numbers
lower class boundary = lower class limit – 0.5
upper class boundary = upper class limit + 0.5
e.g. data were one decimal place
lower class boundary = lower class limit – 0.05
upper class boundary = upper class limit + 0.05
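The rule of thumb above can be sketched in Python as follows. This is an illustrative sketch only: the function `class_boundaries` and its `decimals` parameter are assumptions, and the class 2.3 to 2.9 is a hypothetical one-decimal example that does not appear in the slides. `Decimal` is used so that values such as 0.05 are represented exactly.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, decimals=0):
    """Return (lower boundary, upper boundary) for one class.

    `decimals` is the number of decimal places in the data:
    0 gives an offset of 0.5, 1 gives an offset of 0.05, and so on.
    """
    half_unit = Decimal(5) / (Decimal(10) ** (decimals + 1))
    lower = Decimal(str(lower_limit)) - half_unit
    upper = Decimal(str(upper_limit)) + half_unit
    return lower, upper

# Whole-number data: the class 24 - 30 from the slides.
print(class_boundaries(24, 30))

# Hypothetical one-decimal data: a class with limits 2.3 and 2.9.
print(class_boundaries(2.3, 2.9, decimals=1))
```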
13
Class Midpoints
•The class midpoint (mark) is found by
adding the lower and upper boundaries (or
limits) and dividing by 2.
14
Class Midpoints
Class Limits   Class Midpoint   Frequency
24 - 30        27               3
31 - 37        34               1
38 - 44        41               5
45 - 51        48               9
52 - 58        55               6
15
Class Width
The class width for a class in a frequency
distribution is found by subtracting the lower
(or upper) class limit of one class from the
lower (or upper) class limit of the next
class.
16
Class Width
Class Limits   Frequency
24 - 30        3
31 - 37        1
38 - 44        5
45 - 51        9
52 - 58        6
Class width = 31 - 24 = 38 - 31 = 45 - 38 = 52 - 45 = 7
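As a quick check of the midpoint and class-width definitions, here is a short Python sketch using the class limits from the slides. The variable names are chosen for illustration only.

```python
# Class limits from the slides, as (lower limit, upper limit) pairs.
classes = [(24, 30), (31, 37), (38, 44), (45, 51), (52, 58)]

# Midpoint: add the lower and upper limits and divide by 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Width: lower limit of the next class minus lower limit of this class.
widths = [nxt[0] - cur[0] for cur, nxt in zip(classes, classes[1:])]

# Using boundaries instead of limits gives the same midpoint.
boundary_midpoint = (23.5 + 30.5) / 2

print(midpoints)
print(widths)
print(boundary_midpoint)
```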
17
Class Rules
• There should be between 5 and 20 classes.
• The class width should be an odd number, although this
is not absolutely necessary.
• The classes must be mutually exclusive.
• The classes must be continuous.
• The classes must be exhaustive.
• The classes must be equal in width.
18
Class width as an odd number
The class width being an odd number is preferable since it
ensures that the midpoint of each class has the same
place value as the data.
If the class width is an even number, the midpoint is in
tenths. For example, if the class width is 6 and the class
limits are 6 and 11, the midpoint is (6 + 11) / 2 = 8.5.
*This is only a suggestion, and it is not rigorously followed.
19
Relative Frequency
Relative Frequency is the frequency of each class
divided by the total number.
relative frequency = (class frequency) / (sum of all frequencies)
Class Limits   Frequency   Relative Frequency
24 - 30        3           3/24 = 0.125
31 - 37        1           1/24 = 0.042
38 - 44        5           5/24 = 0.208
45 - 51        9           9/24 = 0.375
52 - 58        6           6/24 = 0.250
Total          24          1
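The relative frequency column can be reproduced with a few lines of Python. This is a sketch for illustration; the dictionary keys simply reuse the class limits from the slides.

```python
# Frequencies for each class, taken from the slides.
frequencies = {"24 - 30": 3, "31 - 37": 1, "38 - 44": 5, "45 - 51": 9, "52 - 58": 6}

total = sum(frequencies.values())

# Relative frequency = class frequency / sum of all frequencies.
relative = {cls: freq / total for cls, freq in frequencies.items()}

for cls, rel in relative.items():
    print(f"{cls}  {rel:.3f}")
```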
20
Cumulative Frequency
Cumulative Frequency is the sum of the
frequencies accumulated up to the upper
boundary of a class.
Class Limits   Frequency   Cumulative Frequency
24 - 30        3           3
31 - 37        1           4
38 - 44        5           9
45 - 51        9           18
52 - 58        6           24
Total          24
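A running total is all that is needed for the cumulative frequency column. A minimal Python sketch, using the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

# Class frequencies from the slides, in class order.
frequencies = [3, 1, 5, 9, 6]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

print(cumulative)
```

The last cumulative frequency always equals the total number of data values (24 here).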
21
Procedure for constructing a
grouped frequency distribution
1. Decide on the number of classes you want (5 to 20 classes).
2. Calculate the class width and round up:
class width = range / (number of classes)
            = ((highest value) – (lowest value)) / (number of classes)
*Round the answer up to the nearest whole number if there is a
remainder, e.g. a class width of 14.167 rounds up to 15.
*If there is no remainder, you will need to add an extra class to
accommodate all the data.
3. Choose a number for the lower limit of the first class
4. Use the lower limit of the first class and the class width to list
the other lower class limits.
5. Enter the upper class limits.
6. Tally the frequency for each class
22
Example 1 : Construct a grouped frequency table for the
following data values.
44, 32, 35, 38, 35, 39, 42, 36, 36, 40, 51, 58
58, 62, 63, 72, 78, 81, 25, 84, 20
Tip: Consider reordering the data.
23
Answer:
20, 25, 32, 35, 35, 36, 36, 38, 39, 40, 42, 44,
51, 58, 58, 62, 63, 72, 78, 81, 84
1. Let number of classes be 5
2. Range = High – Low = 84 – 20 = 64
Class width = 64 / 5 = 12.8 ≈ 13 (Round-up)
Class     Tally        Frequency
20 – 32   ///          3
33 – 45   ///// ////   9
46 – 58   ///          3
59 – 71   //           2
72 – 84   ////         4
Each lower class limit is the previous lower class limit + 13 (the class width).
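The six-step procedure can be sketched in Python as shown below. This is one possible reading of the steps, not a definitive implementation: it assumes whole-number data, it assumes the lowest data value is chosen as the lower limit of the first class (step 3 leaves that choice open), and the function name `grouped_frequency` is made up for illustration.

```python
def grouped_frequency(data, num_classes):
    """Return (class width, [((lower limit, upper limit), frequency), ...])."""
    low, high = min(data), max(data)
    quotient, remainder = divmod(high - low, num_classes)
    if remainder:
        width = quotient + 1   # remainder: round the width up
    else:
        width = quotient       # no remainder: keep the width, add a class
        num_classes += 1
    table = []
    for i in range(num_classes):
        lower = low + i * width        # assumption: first class starts at the minimum
        upper = lower + width - 1      # whole-number data, so limits are 1 unit apart
        freq = sum(lower <= x <= upper for x in data)
        table.append(((lower, upper), freq))
    return width, table

data = [44, 32, 35, 38, 35, 39, 42, 36, 36, 40, 51, 58,
        58, 62, 63, 72, 78, 81, 25, 84, 20]
width, table = grouped_frequency(data, 5)
print(width)
for (lower, upper), freq in table:
    print(f"{lower} – {upper}  {freq}")
```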
24
Ex) Complete the table.
Class     Class                                               Cumulative   Relative
limits    boundaries    Midpoint         Tally        Frequency   frequency    frequency
20 – 32   19.5 – 32.5   (20+32)/2 = 26   ///          3           3            3/21 = 0.14
33 – 45   32.5 – 45.5   39               ///// ////   9           12           0.43
46 – 58   45.5 – 58.5   52               ///          3           15           0.14
59 – 71   58.5 – 71.5   65               //           2           17           0.10
72 – 84   71.5 – 84.5   78               ////         4           21           0.19
Total                                                 21                       1
25
Frequency Distributions
An ungrouped frequency distribution is used
for numerical data and when the range of
data is small.
26
Example: The number of incoming telephone calls
per day over the first 25 days of business:
4, 4, 1, 10, 12, 6, 4, 6, 9, 12, 12, 1, 1, 1,
12, 10, 4, 6, 4, 8, 8, 9, 8, 4, 1
Construct an ungrouped frequency distribution
27
Answer:
Class limits        Class                                   Cumulative
(Number of Calls)   boundaries    Tally     Frequency   frequency
1                   0.5 – 1.5     /////     5           5
2                   1.5 – 2.5               0           5
3                   2.5 – 3.5               0           5
4                   3.5 – 4.5     ///// /   6           11
5                   4.5 – 5.5               0           11
6                   5.5 – 6.5     ///       3           14
7                   6.5 – 7.5               0           14
8                   7.5 – 8.5     ///       3           17
9                   8.5 – 9.5     //        2           19
10                  9.5 – 10.5    //        2           21
11                  10.5 – 11.5             0           21
12                  11.5 – 12.5   ////      4           25
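The ungrouped distribution for the telephone-call data can be verified with a short Python sketch. Every whole number from the smallest to the largest value gets its own class, including values that never occur.

```python
from collections import Counter
from itertools import accumulate

# Incoming calls per day over the first 25 days (from the example).
calls = [4, 4, 1, 10, 12, 6, 4, 6, 9, 12, 12, 1, 1, 1,
         12, 10, 4, 6, 4, 8, 8, 9, 8, 4, 1]

counts = Counter(calls)
values = range(min(calls), max(calls) + 1)   # one class per whole number
frequencies = [counts[v] for v in values]     # Counter returns 0 for missing values
cumulative = list(accumulate(frequencies))

for v, f, c in zip(values, frequencies, cumulative):
    print(f"{v:>2}  {v - 0.5} – {v + 0.5}  {f}  {c}")
```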
28
Types of Frequency Distributions
(summary)
• A categorical frequency distribution is used when
the data is nominal.
• A grouped frequency distribution is used when
the range is large and classes of several units in
width are needed.
• An ungrouped frequency distribution is used for
numerical data and when the range of data is
small.
29
Why Construct Frequency
Distributions?
• To organize the data in a meaningful, intelligible way.
• To enable the reader to make comparisons among
different data sets.
• To facilitate computational procedures for measures of
average and spread.
• To enable the reader to determine the nature or shape of
the distribution.
• To enable the researcher to draw charts and graphs for
the presentation of data.
30
Section 2.2 Histograms, Frequency Polygons, and Ogives
This chapter will show how to organize data
and then construct appropriate graphs to
represent the data in a concise, easy-to-understand form.
31
The Role of Graphs
• The purpose of graphs in statistics is to convey
the data to the viewer in pictorial form.
• Graphs are useful in getting the audience’s
attention in a publication or a presentation.
32
Three Most Common Graphs
• The histogram displays the data by using vertical
bars of various heights to represent the
frequencies.
x-axis: class boundaries
y-axis: frequency
[Figure: sample histogram; x-axis: Class Boundaries (10.5 to 80.5), y-axis: Frequency (0 to 8)]
33
Three Most Common Graphs
(cont’d.)
• The frequency polygon displays the data by using
lines that connect points plotted for the
frequencies at the midpoints of the classes.
x-axis: midpoints
y-axis: frequency
[Figure: sample frequency polygon; x-axis: Class Midpoints, y-axis: Frequency (0 to 8)]
34
Three Most Common Graphs
(cont’d.)
• The cumulative frequency graph, or ogive,
represents the cumulative frequencies for the
classes in a frequency distribution.
x-axis: class boundaries
y-axis: cumulative frequency
[Figure: sample ogive; x-axis: Class Boundaries, y-axis: Cumulative Frequency (0 to 30)]
35
Relative Frequency Graphs
• A relative frequency graph is a graph that
uses proportions instead of frequencies.
Relative frequencies are used when the
proportion of data values that fall into a given
class is more important than the frequency.
36
Example 1:
The following data are the numbers of English-language Sunday
newspapers per state in the United States as of February 1, 1996.
2   3   3   4   4   4   4   4   5   6
6   6   7   7   7   8   10  11  11  11
12  12  13  14  14  14  15  15  16  16
16  16  16  16  18  18  19  21  21  23
27  31  35  37  38  39  40  44  62  85
37
a) Using 1 as the starting value and a class width of 15, construct a grouped
frequency distribution.
Table columns to complete:
Class Limit | Freq | Class Boundaries (for part b) | Class Midpoint (for part c) | Cumulative Frequency (for part d) | Relative Frequency (for part e) | Cumulative Relative Frequency (for part e)
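The columns for part a) can be computed as in the following Python sketch. It is an illustration under stated assumptions, not the worked answer from the slides: the function name `grouped_table` and the dictionary keys are made up, and the data are assumed to be whole numbers so that boundaries lie 0.5 beyond the limits.

```python
from itertools import accumulate

papers = [
    2, 3, 3, 4, 4, 4, 4, 4, 5, 6,
    6, 6, 7, 7, 7, 8, 10, 11, 11, 11,
    12, 12, 13, 14, 14, 14, 15, 15, 16, 16,
    16, 16, 16, 16, 18, 18, 19, 21, 21, 23,
    27, 31, 35, 37, 38, 39, 40, 44, 62, 85,
]

def grouped_table(data, start, width):
    """Build one row per class, starting at `start` with the given class width."""
    num_classes = (max(data) - start) // width + 1
    limits = [(start + i * width, start + (i + 1) * width - 1)
              for i in range(num_classes)]
    freqs = [sum(lo <= x <= hi for x in data) for lo, hi in limits]
    cum = list(accumulate(freqs))
    n = len(data)
    return [
        {"limits": (lo, hi), "freq": f,
         "boundaries": (lo - 0.5, hi + 0.5),   # whole-number data assumed
         "midpoint": (lo + hi) / 2,
         "cum_freq": c, "rel_freq": f / n, "cum_rel_freq": c / n}
        for (lo, hi), f, c in zip(limits, freqs, cum)
    ]

rows = grouped_table(papers, start=1, width=15)
for r in rows:
    print(r["limits"], r["freq"], r["boundaries"], r["midpoint"],
          r["cum_freq"], f'{r["rel_freq"]:.2f}', f'{r["cum_rel_freq"]:.2f}')
```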
38
b) Construct a histogram for the grouped frequency
distribution.
(x-axis: class boundaries; y-axis: frequency)
Answer:
[Figure: histogram; x-axis: Class Boundaries (0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5), y-axis: Frequency (0 to 30)]
39
c) Construct a frequency polygon.
(x-axis: class midpoints(marks); y-axis: frequency)
Answer:
[Figure: frequency polygon; x-axis: Class Marks (-7, 8, 23, 38, 53, 68, 83, 98), y-axis: Frequency (0 to 30)]
40
d) Construct an ogive.
(x-axis: class boundaries; y-axis: cumulative frequency)
Answer:
[Figure: ogive; x-axis: Class Boundaries (0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5), y-axis: Cumulative Frequency (0 to 60)]
41
e) Construct a (i) relative frequency histogram,
(ii) relative frequency polygon,
and (iii) relative cumulative frequency ogive.
Answer:
(i) relative frequency histogram
(x-axis: class boundaries; y-axis: relative frequency)
[Figure: relative frequency histogram; x-axis: Class Boundaries (0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5), y-axis: Relative Frequency (0 to 0.6)]
42
(ii) relative frequency polygon
(x-axis: class midpoints (marks); y-axis: relative frequency)
[Figure: relative frequency polygon; x-axis: Class Marks (-7, 8, 23, 38, 53, 68, 83, 98), y-axis: Relative Frequency (0 to 0.6)]
43
(iii) Ogive using relative cumulative frequency
(x-axis: class boundaries; y-axis: relative cumulative frequency)
[Figure: relative cumulative frequency ogive; x-axis: Class Boundary (0.5, 15.5, 30.5, 45.5, 60.5, 75.5, 90.5), y-axis: Relative Cumulative Frequency (0 to 1.2)]
44
Distribution
shapes
45
Section 2.3 Other Types of Graphs
A Pareto chart is used to represent a frequency distribution for a
categorical variable. The frequencies are displayed by the
heights of vertical bars, which are arranged in order from highest to
lowest.
(x-axis: categories; y-axis: frequencies, with bars arranged in
order from highest to lowest)
[Figure: Pareto chart "How People Get to Work"; x-axis: Auto, Bus, Trolley, Train, Walk; y-axis: Frequency (0 to 30)]
46
Other Types of Graphs (cont)
A pie graph is a circle that is divided into
sections or wedges according to the
percentage of frequencies in each
category of the distribution.
[Figure: pie graph "Favorite American Snacks"; potato chips 38%, tortilla chips 27%, pretzels 14%, popcorn 13%, snack nuts 8%]
47
Example 1: Grade received for Math 227
C A B B D C C C C B B A F F
a) Construct a Pareto chart.
Answer:
Grade       A   B   C   D   F
Frequency   2   4   5   1   2
Next, arrange the frequencies in descending order:
Grade       C   B   A   F   D
Frequency   5   4   2   2   1
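The reordering step for a Pareto chart amounts to sorting the categories by frequency, highest first. A minimal Python sketch; breaking ties alphabetically is an assumption made here so the result is deterministic, and it happens to match the order shown above.

```python
from collections import Counter

# Grades from the example (14 students).
grades = "C A B B D C C C C B B A F F".split()

counts = Counter(grades)

# Sort by descending frequency; ties (A and F) are broken alphabetically.
pareto_order = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for grade, freq in pareto_order:
    print(grade, freq)
```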
48
Pareto chart
[Figure: Pareto chart; x-axis: Grade (C, B, A, F, D), y-axis: Number (0 to 6)]
49
b) Construct a pie chart.
Answer:
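Degree = relative frequency × 360°
Grade   Frequency   Relative Frequency   Degree
A       2           2/14 = 14%           51.4°
B       4           4/14 = 29%           102.9°
C       5           5/14 = 36%           128.6°
D       1           1/14 = 7%            25.7°
F       2           2/14 = 14%           51.4°
Total   14          100%                 360°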
50
Pie chart
[Figure: pie chart of grades; A (14%), B (29%), C (36%), D (7%), F (14%)]
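Each wedge of a pie chart takes the same share of the 360° circle as its category takes of the total frequency. The following Python sketch illustrates that conversion for the grade data; the variable names are chosen for illustration.

```python
# Grade frequencies from the example (total 14).
frequencies = {"A": 2, "B": 4, "C": 5, "D": 1, "F": 2}
total = sum(frequencies.values())

# Percent of the whole and central angle of each wedge.
percents = {g: 100 * f / total for g, f in frequencies.items()}
degrees = {g: 360 * f / total for g, f in frequencies.items()}

for g in frequencies:
    print(f"{g}  {frequencies[g]}  {percents[g]:.0f}%  {degrees[g]:.1f}°")
```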
51
Other Types of Graphs (cont.)
• A time series graph represents data that occur
over a specific period of time.
[Figure: time series graph "Temperature Over a 5-hour Period"; x-axis: Time (12, 1, 2, 3, 4, 5), y-axis: Temp. (35 to 55)]
52
Example 1: The percentages of voters voting in the last 5 Presidential
elections are shown here. Construct a time series graph.
Year                 1984     1988     1992     1996     2000
% of voters voting   74.63%   72.48%   78.01%   65.97%   67.50%
Answer:
[Figure: time series graph; x-axis: Year (1984, 1988, 1992, 1996, 2000), y-axis: % of voters voting (0.00% to 90.00%)]
53
Stem-and-Leaf Plots
• A stem-and-leaf plot is a data plot that uses part
of a data value as the stem and part of the data
value as the leaf to form groups or classes.
• It has the advantage over grouped frequency
distribution of retaining the actual data while
showing them in graphic form.
54
Stem-and-Leaf Plots (cont)
The digits of each data value to the left of the vertical bar are called the stems.
The digits of each data value to the right of the appropriate stem are called the leaves.
Example 1: The test scores on a 100-point test were recorded for 20 students
61 93 91 86 55 63 86 82 76 57
94 89 67 62 72 87 68 65 75 84
Construct an ordered stem-and-leaf plot
Answer: Reorder the data:
55 57 61 62 63 65 67 68 72 75 76 82 84 86 86 87 89 91 93 94
Stem | Leaf
5    | 5 7
6    | 1 2 3 5 7 8
7    | 2 5 6
8    | 2 4 6 6 7 9
9    | 1 3 4
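An ordered stem-and-leaf plot can be built by sorting the data and splitting each value into its tens digit (stem) and units digit (leaf). The Python sketch below assumes two-digit whole-number scores, as in this example; the function name `stem_and_leaf` is made up for illustration.

```python
from collections import defaultdict

# Test scores from Example 1 (20 students).
scores = [61, 93, 91, 86, 55, 63, 86, 82, 76, 57,
          94, 89, 67, 62, 72, 87, 68, 65, 75, 84]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```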
55
Example 2: Use the data in Example 1 to construct a double stem-and-leaf plot.
Split each stem into two parts: leaves 0 – 4 belong to the first stem
and leaves 5 – 9 belong to the second stem.
Answer:
Stem | Leaf
5    |
5    | 5 7
6    | 1 2 3
6    | 5 7 8
7    | 2
7    | 5 6
8    | 2 4
8    | 6 6 7 9
9    | 1 3 4
9    |
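The double stem-and-leaf plot only changes how the leaves are grouped: each stem gets one row for leaves 0 to 4 and a second row for leaves 5 to 9. A Python sketch under the same assumption of two-digit whole-number scores; the function name `double_stem_and_leaf` is hypothetical.

```python
# Test scores from Example 1 (20 students).
scores = [61, 93, 91, 86, 55, 63, 86, 82, 76, 57,
          94, 89, 67, 62, 72, 87, 68, 65, 75, 84]

def double_stem_and_leaf(values):
    """Map (stem, half) to sorted leaves; half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows

rows = double_stem_and_leaf(scores)
stems = sorted({stem for stem, _ in rows})
for stem in range(stems[0], stems[-1] + 1):
    for half in (0, 1):
        leaves = " ".join(str(leaf) for leaf in rows.get((stem, half), []))
        print(f"{stem} | {leaves}")
```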
56
Stem-and-Leaf Plots (cont)
A stem-and-leaf plot portrays the shape of a
distribution and retains the original data values.
It is also useful for spotting outliers.
Outliers are data values that are extremely large or
extremely small in comparison to the norm.
57
Misleading Graphs
Example: Graph of Automaker’s Claim
Using a Scale from 95% to 100%
Using a Scale from 0% to 100%
58
2.4 Paired Data and Scatter Plots
• Many times researchers are interested in
determining if a relationship between two
variables exists.
• To do this, the researcher collects data
consisting of two measures that are paired with
one another.
• The variable first mentioned is called the
independent variable; the second variable is the
dependent variable.
59
• A scatter plot is a graph of ordered pairs
of data values that is used to determine
if a relationship exists between the two
variables.
60
Analyzing the Scatter Plot
• A positive linear relationship exists when the
points fall approximately in an ascending straight
line and both the x and y values increase at the
same time.
• A negative linear relationship exists when the
points fall approximately in a straight line
descending from left to right.
• A nonlinear relationship exists when the points
fall along a curve.
• No relationship exists when there is no
discernable pattern of the points.
61
Examples of Scatter Plots and Relationships
62
Example 1: A researcher wishes to determine if there is a relationship
between the number of days an employee missed a year and the
person’s age. Draw a scatter plot and comment on the nature of
the relationship.
Age (x)           22   30   25   35   65   50   27   53   42   58
Days missed (y)   0    4    1    2    14   7    3    8    6    4
[Figure: scatter plot; x-axis: Age (x) (0 to 80), y-axis: Days Missed (y) (0 to 16)]
The data show a positive linear relationship.
63
Summary of Graphs and Uses
• Histograms, frequency polygons, and
ogives are used when the data are
contained in a grouped frequency
distribution.
• Pareto charts are used to show
frequencies for nominal variables.
64
Summary of Graphs and Uses
(cont.)
• Time series graphs are used to show a
pattern or trend that occurs over time.
• Pie graphs are used to show the
relationship between the parts and the
whole.
• When data are collected in pairs, the
relationship, if one exists, can be determined by
looking at a scatter plot.
65
Conclusions
• Data can be organized in some
meaningful way using frequency
distributions. Once the frequency
distribution is constructed, the
representation of the data by graphs is a
simple task.
66