OBJECTIVES Organize data using frequency distributions

advertisement
M227
Chapter 2
Frequency Distributions and Graphs
Sections 1,2
OBJECTIVES
Organize data using frequency distributions.
Represent data in frequency distributions graphically using histograms, frequency
polygons, and ogives.
Represent data using Pareto charts, time series graphs, and pie graphs.
Draw and interpret a stem and leaf plot.
Draw and Interpret a scatter plot for a set of paired data.
This chapter will show how to organize data and then construct appropriate graphs to
represent the data in a concise, easy-to-understand form.
ORGANIZING DATA
When data are collected in original form, they are called raw data.
A frequency distribution is the organization of raw data in table form, using classes and
frequencies.
The most common distributions are
o categorical frequency distributions
o ungrouped frequency distributions
o grouped distributions
PRESENTING DATA
Organized data is presented in pictorial form, called Graphs and Charts, to facilitate its
analysis and interpretation.
The most common graphs and charts are:
o Histograms
o Frequency Polygons
o Cumulative Frequencies or Ogives
o Pareto Charts
o Time Series and Compound Time Series Graphs
o Pie Graphs
o Stem and Leaf Plots
o Scatter Plots
Sections 1,2
1
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Sections 1,2
A. Categorical Frequency Distributions
Categorical frequency distributions are used when data can be placed in specific categories,
such as nominal or ordinal level data: for example political affiliations, major field of study, or
color will use categorical frequency distributions. Example:
The data below (raw data) represent the blood types of 25 people:
A
B
B
AB
O
O
O
B
AB
B
B
B
O
A
O
A
O
O
O
AB
AB
A
O
B
A
The four categories are: A, B, O, and AB. We count how many concurrencies each category
has, and we form the following frequency table:
Class
A
B
O
AB
Tally
Frequency
Cumulative
Frequency
Percent
Cumulative
Percent
5
7
9
4
25
5
12
21
25
20
28
36
16
100
20
48
84
100
/////
///// //
///// ////
////
Table 2.1
B. Ungrouped Frequency Distributions
Ungrouped Frequency Distributions are used when we have few distinct numerical data to
organize. Example:
The data below (raw form) represents the number of incoming calls per day over the first 20
days of a month:
4
4
1
2
5
6
4
6
6
3
3
2
4
6
4
7
We just count how many times each distinct value has occurred:
Class limits
1
2
3
4
5
6
7
Tally
3
1
1
1
Frequency
////
//
///
/////
/
////
/
4
2
3
5
1
4
x 1x
20
Table 2.2
Sections 1,2
2
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Sections 1,2
C. Grouped Frequency Distributions
Grouped Frequency Distributions are used when we need to organize large amounts of
numerical data. Example:
The data below (raw form) represents the miles that the 50 employees of a company travel
to work every day.
1
17
5
8
2
4
2
8
6
17
9
4
7
1
11
14
12
14
16
7
13
5
1
3
2
4
9
2
6
16
2
6
9
4
10
5
5
11
18
8
4
7
6
10
3
5
9
15
18
18
We organize these data by grouping them into subintervals, called classes:
1-3 miles, 4-6 miles, 7-9 miles, 10-12 miles, 13-15 miles, 16-18 miles, where 1 is the lowest
data value and 18 is the highest.
Then, we count how many values fall in each class; this number is called the frequency of
the class:
Class limits
1-3
4-6
7-9
10-12
13-15
16-18
Tally
Frequency
///// /////
///// ///// ////
///// /////
/////
////
///// //
10
14
10
5
4
7
___
50
Table 2.3
Attributes of Frequency Distributions
1. Classes: Grouping of data into subdivisions of fixed width
2. Lower Class Limit: the smallest number of each class: numbers 1, 4, 7, 10, 13, 16 of the
classes in Table 1.
3. Upper Class Limit: the largest number of each class: number 3, 6, 9, 12, 15, 18 of the
classes in Table 1.
4. Class Width for a class is found by subtracting the lower (or upper) class limit of one
class from the lower (or upper) class limit of the next class.
5. Class Boundaries: close the gaps between classes. For example classes 1-3 and 4-6
have a gap between 3 and 4. In this case we define the boundary of each class to be .5
less than the lower limit and .5 more than the upper limit: .5 - 3.5, 3.5 - 6.5, … See
Table 1A below. It means all values greater or equal to .5, 3.5, … and less than 3.5,
6.5, …
In general, the class boundaries have an additional place value than the class limits and
they end with a 5.
Class limit: 3 – 6,
Boundary: 2.5 – 6.5
Class limit: 3.2 – 6.2
Boundary: 3.15 – 6.25
Class limit: 3.46 – 6.46, Boundary: 3.455 - 6.465
Sections 1,2
3
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Sections 1,2
lower class lim it + upper class lim it
2
7. Cumulative Frequency: the sum of the frequencies accumulated up to the upper
boundary of a class.
6. Class Mark: The midpoint of each class: X m =
8. Relative Frequency: the frequency of each class divided by the sum of all the
f
frequencies: Relative frequency =
.
∑f
9. Percent Frequency: the Relative Frequency expressed as a percent:
f
∑f
i100
Relative
Class Limit
Class
Class Mark
Boundaries (Midpoint)
Frequency
Cumulative
Frequency
Percent
Frequency Cumulative
Frequency
Frequency
Cumulative
Frequency
1-3
0.5-3.5
2
10
10
0.20
0.20
20%
20
4-6
3.5-6.5
5
14
24
0.28
0.48
28%
48
7-9
6.5-9.5
8
10
34
0.20
0.68
20%
68
10 - 12
9.5-2.5
11
5
39
0.10
0.78
10%
78
13 - 15
12.5-15.5
14
4
43
0.08
0.86
8%
86
16 - 18
15.5-18.5
17
7
50
0.14
1.00
14%
100
50
1.00
100%
Table 2.4
Rules to Construct Frequency Distributions
1. There should be between 5 and 20 classes
2. The class width should be an odd number. This insures that the midpoint has the same
H −L
place value as the data. Ex: Width =
; Round-up result.
number of classes
3. The classes must be mutually exclusive – nonoverlapping class limits
4. The classes must be continuous.
5. The classes must be exhaustive – enough classes to accommodate all the data.
6. The classes must be equal in width (exception: “open-ended distributions”, no specific
beginning value or no specific ending value.)
Sections 1,2
4
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Sections 1,2
Procedure for Constructing a Quantitative Frequency Distribution
Step 1: Determine the Classes
o Find the lowest and highest value
o Find the range
o Select the number of classes desired (often given)
o Find the width: Divide the range by the number of classes and round UP.
o Select starting point (normally the lowest data value)
o Set lowest class limit equal to the starting point.
o Calculate lower limits for all classes.
o Calculate upper limits for all classes.
o Find the boundaries.
o Calculate the midpoints .
Step 1: Tally the data
Step 3: Find the numerical frequencies.
Step 4: Find cumulative frequencies.
Step 5: Find relative (and / or percent) frequencies.
Step 6: Find cumulative relative and/or percent frequencies.
Reasons for constructing frequency distributions
To organize the data in a meaningful, intelligible way.
To enable the reader to make comparisons among different data sets.
To facilitate computational procedures for measures of average and spread.
To enable the reader to determine the nature or shape of the distribution
To enable the researcher to draw charts and graphs for the presentation of data.
Sections 1,2
5
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Section 2-3
Histograms, Frequency Polygons, and Ogives
The purpose of graphs in statistics is to convey the data to the viewer in pictorial form.
Graphs are useful in getting the audience’s attention in a publication or a presentation.
The three most common graphs:
1. Histogram: displays the data by using vertical bars of various heights to represent the
frequencies. The class boundaries are represented on the x-axis and the frequencies on
the y-axis.
Histogram
Histogram
16
Frequencies
12
10
10
Frequencies
14
14
10
7
8
5
6
4
4
16
14
12
10
8
6
4
2
0
14
10
7
5
2
0
0.5
3.5
6.5
9.5
12.5
15.5
5
0.
18.5
10
.
-3
5
5
3.
.
-6
5
5
6.
Boundaries
0.20
0.28
0.20
0.20
0.14
0.15
0.10
0.10
0.08
0.05
0.00
5
3.
50.
5
6.
53.
5
9.
5
2.
-1
.5
.5
15
18
55.
.
12
15
Histogram - Percent Frequencies
Percent Frequencies
Relative Frequencies
0.25
5
Boundaries
Histogram - Relative Frequencies
0.30
.
-9
4
5
5
5
.5
9.
8.
5.
12
-1
-1
55.5
.5
6.
5
2
9.
1
1
28%
30%
20%
14%
10%
10%
8%
0%
5
0.
Boundaries
20%
20%
.
-3
5
5
3.
.
-6
5
5
6.
.
-9
5
5
5
5
5.
8.
2.
-1
-1
-1
5
5
5
.
.
9.
12
15
Boundaries
Relative frequencies and percent frequencies histograms are constructed the same way as
the frequency histograms, except that instead of frequencies for the data values, we use the
relative or percent frequencies.
Section 2-3
6
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Section 2-3
2. Frequency polygon: displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes. Frequencies are represented on the y-axis.
Frequency Polygon
Relative Frequency Polygon
Relative Frequrncies
Frequrncies
16
14
12
10
8
6
4
2
0
0.30
0.25
0.20
0.15
0.10
0.05
0.00
-1
2
5
8
11
14
17
20
-1
2
5
8
Midpoints
11
14
17
20
Midpoints
Relative frequency and percent frequency polygons are constructed the same way as
frequency polygons, except that we use the relative or percent frequencies in place of the
frequencies.
3. Cumulative frequency or Ogive: represents the cumulative frequencies for the classes
in a frequency distribution. Class boundaries are located on the x-axis and frequencies
on the y-axis.
Realtive Ogive
60
1.2
Relative Cum Freq
Cumulative Frequrncies
Ogive
50
40
30
20
10
0
1
0.8
0.6
0.4
0.2
0
0.5
3.5
6.5
9.5
12.5
15.5
18.5
0.5
Boundaries
3.5
6.5
9.5
12.5
15.5
18.5
Boundaries
Realtive Ogive
Relative Cum Freq
120%
100%
80%
60%
40%
20%
0%
0.5
3.5
6.5
9.5
12.5
15.5
18.5
Boundaries
Relative Ogives can be constructed by replacing the frequency values with the relative (or
percent) values. Ogive charts are useful in answering questions such as how many or
what percent of values are below a certain upper class boundary.
Section 2-3
7
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Section 2-4
Other Types of Graphs
•
•
•
•
Pareto
Time Series
Pie
Stem and Leaf Plots
1. Pareto chart: is used to represent a frequency distribution for categorical variable; the
frequencies are displayed by the heights of vertical bars, which are arranged in order
from highest to lowest. (Graph below from data in table 2-1 above)
Pareto Chart of Blood Types
Frequencies
10
9
8
7
6
5
4
4
2
0
O
B
A
AB
Blood Types
2. Pie graph: is a circle that is divided into sections or wedges according to the percentage
of frequencies in each category of the distribution
Each wedge has an angle that is proportional to the ratio of its frequency to the sum of all
the frequencies of the distribution:
Pie
AB
16% - 57.60
O
36% - 129.60
A
20% - 720
B
28% - 100.80
Section 2-4
8
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Degree =
Section 2-4
f
⋅ 360 , where f is the frequency and n = sum of all the frequencies.
n
Percentages are calculated as before: % =
f
⋅ 100%
n
Class
Frequency
Percent
Angle
O
B
A
AB
9
7
5
4
25
36
28
20
16
100
129.60
100.80
72.00
57.60
3600
3. A stem-and-leaf plot is a data plot that uses part of a data value as the stem and part of
the data value as the leaf to form groups or classes. It has the advantage over grouped
frequency distribution of retaining the actual data while showing them in graphic form
Example: Consider the following data from example 2-12:
25 31
32 44
20
32
32
52
13
44
14
51
43
45
02
57
23
36
32
33
A. Arrange the data in ascending order:
02,13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
B. Separate the data according to the first digit:
02
13,14
20,23,25
31,32,32,32,32,33,36
43,44,44,45
51,52,57
C. Use the first digit as the stem and the trailing digits as the leaf:
0
1
2
3
4
5
2
3
0
1
3
1
4
3
2
4
2
5
2 2 2 3 6
4 5
7
3. Time series graph. It is a line graph that represents data occurring over a specific period
of time
For example, the data below represents the number (in millions) of vehicles sold in
California during the last 7 years:
2000
2001
2002
2003
2004
2005
2006
Section 2-4
1.05
0.95
1.45
1.62
1.63
1.96
2.22
9
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Compound Time Series
2.5
2.5
2
2
New Vehicles
New Vehicles
Time Series
Section 2-4
1.5
1
0.5
1.5
1
0.5
0
0
2000
2001
2002
2003
2004
2005
2006
Year
2000
2001
2002
2003
2004
2005
2006
Year
Time series graphs are very useful when two or more sets of values are plotted for the
same period of time. For example, assume that we have another set of values for the
years above, representing the number of vehicles “retired” for each year.
2000
2001
2002
2003
2004
2005
2006
1.05
0.95
1.45
1.62
1.63
1.96
2.22
0.20
0.15
0.21
0.41
0.51
0.68
0.75
We graph both data values with the same x-axis, as shown above. A trend can be
discerned from this graph that there is an increase over the years of the number of
“active” vehicles.
Section 2-4
10
2/6/2007
M227
Chapter 2
Frequency Distributions and Graphs
Section 2-5
4. Scatter Plots
Many times researchers are interested in determining if a relationship between two
variables exist.
To do this, the researcher collects data consisting of two measures that are paired with
each other.
The variable first mentioned is called the independent variable; the second variable is
the dependent variable.
A scatter plot is a graph of ordered pairs of data values that is used to determine if a
relationship exists between the two variables. Typically, the independent variable is
plotted on the x-axis and the dependent variable is plotted on the y-axis.
Example: 2-15
No. of Accidents, x
No. of fatalities, y
376
5
650
20
884
20
1162
28
1513
26
165
34
2236
35
3002
56
4028
68
Scatter Plot
80
70
# Fatalities
60
50
40
30
20
10
0
0
1000
2000
3000
4000
5000
# Accidents
Analyzing a Scatter Plot
A positive linear relationship exists when the points fall approximately in an
ascending straight line and both the x and y values increase at the same time.
A negative linear relationship exists when the points fall approximately in a straight
line descending from left to right.
A nonlinear relationship exists when the points fall along a curve.
No relationship exists when there is no discernable pattern of the points.
Section 2-5
11
2/6/2007
4010
55
M227
Chapter 2
Frequency Distributions and Graphs
Section 2-5
SUMMARY
Histograms, frequency polygons, and ogives are used when the data are contained in a
grouped frequency distribution.
Pareto charts are used to show frequencies for nominal variables
Time series graphs are used to show a pattern or trend that occurs over time.
Pie graphs are used to show the relationship between the parts and the whole.
When data are collected in pairs, the relationship, if one exists, can be determined by
looking at a scatter plot
Section 2-5
12
2/6/2007
Download