CHAPTER 2: Visual Description of Data

CHAPTER 2:
Visual Description of Data
to accompany
Introduction to Business Statistics
fourth edition, by Ronald M. Weiers
Presentation by Priscilla Chaffe-Stengel
Donald N. Stengel
© 2002 The Wadsworth Group
Chapter 2 - Learning Objectives
• Convert raw data into a data array.
• Construct:
– a frequency distribution.
– a relative frequency distribution.
– a cumulative relative frequency distribution.
• Construct a stem-and-leaf diagram.
• Visually represent data by using graphs
and charts.
Chapter 2 - Key Terms
• Data array
– An orderly presentation of data in either
ascending or descending numerical order.
• Frequency Distribution
– A table that represents the data in classes
and that shows the number of observations
in each class.
Chapter 2 - Key Terms
• Frequency Distribution
– Class - The category
– Frequency - Number in each class
– Class limits - Boundaries for each class
– Class interval - Width of each class
– Class mark - Midpoint of each class
Sturges’ rule
• How to set the approximate number of
classes to begin constructing a frequency
distribution.
\[ k \approx 1 + 3.322 \log_{10} n \]
where k = approximate number of classes to use and
n = the number of observations in the data set .
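As a quick illustration, the rule can be evaluated in a few lines of Python. This is a minimal sketch: the function name sturges_classes is an assumption made for this example, and rounding the result to the nearest whole number is one reasonable choice rather than part of the rule itself.

```python
import math

def sturges_classes(n):
    """Approximate number of classes: k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 + 3.322 * math.log10(n)

k = sturges_classes(50)
print(round(k, 3))  # 6.644
print(round(k))     # 7
```

The result is only a starting point; the number of classes finally used may differ once a convenient class width is chosen.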
How to Construct a
Frequency Distribution
1. Number of classes
Choose an approximate number of classes for your data.
Sturges’ rule can help.
2. Estimate the class interval
Divide the approximate number of classes (from Step 1) into
the range of your data to find the approximate class interval,
where the range is defined as the largest data value minus
the smallest data value.
3. Determine the class interval
Round the estimate (from Step 2) to a convenient value.
How to Construct a
Frequency Distribution, cont.
4. Lower Class Limit
Determine the lower class limit for the first class by
selecting a convenient number that is smaller than the
lowest data value.
5. Class Limits
Determine the other class limits by repeatedly adding the
class width (from Step 3) to the prior class limit, starting
with the lower class limit (from Step 4).
6. Define the classes
Use the sequence of class limits to define the classes.
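The six steps can be sketched in Python as follows. This is an illustrative sketch, not the textbook's procedure: the function name frequency_distribution and the eight toy values are hypothetical, and the "convenient" width and lower limit (Steps 3 and 4) are supplied by the caller because choosing them is a judgment call.

```python
def frequency_distribution(data, lower_limit, width):
    """Count observations in classes of the form 'lower up to upper'.

    Each class includes its lower limit and excludes its upper limit.
    Returns a list of (lower, upper, count) tuples.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if lower_limit > min(data):
        raise ValueError("lower_limit must not exceed the smallest value")
    # Steps 5 and 6: keep adding the width until the largest value is covered.
    n_classes = int((max(data) - lower_limit) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - lower_limit) // width)] += 1
    return [(lower_limit + i * width, lower_limit + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical toy data, not taken from the textbook.
toy = [12, 15, 21, 22, 29, 34, 38, 41]
estimate = (max(toy) - min(toy)) / 4   # Step 2: range / approximate k
print(estimate)                         # 7.25, rounded up to 10 in Step 3
for lower, upper, count in frequency_distribution(toy, 10, 10):
    print(f"{lower} up to {upper}: {count}")
```

Passing the width and starting point as arguments keeps the judgment steps visible instead of hiding them behind an automatic rounding rule.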
Converting to a Relative
Frequency Distribution
1. Retain the same classes defined in the
frequency distribution.
2. Sum the total number of observations
across all classes of the frequency
distribution.
3. Divide the frequency for each class by the
total number of observations, giving the proportion
(or, multiplied by 100, the percentage) of data
values in each class.
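The conversion amounts to one division per class. A minimal Python sketch, assuming the class frequencies are already available as a list; the name relative_frequencies and the toy counts are hypothetical.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    if total == 0:
        raise ValueError("there must be at least one observation")
    return [c / total for c in counts]

# Hypothetical class frequencies for four classes.
toy_counts = [2, 3, 2, 1]
print(relative_frequencies(toy_counts))  # [0.25, 0.375, 0.25, 0.125]
```

The relative frequencies should always sum to 1, which is a useful check on the arithmetic.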
Forming a Cumulative Relative
Frequency Distribution
1. List the number of observations in the lowest
class.
2. Add the frequency of the lowest class to the
frequency of the second class. Record that
cumulative sum for the second class.
3. Continue to add the prior cumulative sum to the
frequency for that class, so that the cumulative
sum for the final class is the total number of
observations in the data set.
Forming a Cumulative Relative
Frequency Distribution, cont.
4. Divide the accumulated frequencies for each class
by the total number of observations, giving you
the percent of all observations that occurred up to
and including that class.
• An Alternative: Accrue the relative frequencies
for each class instead of the raw frequencies.
Then you don’t have to divide by the total to get
percentages.
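Both routes can be compared in a short Python sketch. The function names and the toy counts are hypothetical; itertools.accumulate from the standard library produces the running sums.

```python
from itertools import accumulate

def cumulative_relative(counts):
    """Steps 1 to 4: accumulate raw frequencies, then divide by the total."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

def cumulative_relative_alt(counts):
    """The alternative: accumulate the relative frequencies directly."""
    total = sum(counts)
    return list(accumulate(c / total for c in counts))

# Hypothetical class frequencies for four classes.
toy_counts = [2, 3, 2, 1]
print(cumulative_relative(toy_counts))      # [0.25, 0.625, 0.875, 1.0]
print(cumulative_relative_alt(toy_counts))  # [0.25, 0.625, 0.875, 1.0]
```

With other counts, the second route can differ from the first in the last decimal places because of floating-point rounding, so the final value may come out very slightly off 1.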
Example: Problem 2.53
• The average daily cost to community hospitals for
patient stays during 1993 for each of the 50 U.S. states
is given in the next table.
– a) Arrange these into a data array.
– b) Construct a stem-and-leaf display.
– *) Approximately how many classes would be appropriate
for these data? [*not in textbook]
– c & d) Construct a frequency distribution. State interval
width and class mark.
– e) Construct a histogram, a relative frequency distribution,
and a cumulative relative frequency distribution.
Problem 2.53 - The Data
AL    $775    HI     823    MA   1,036    NM   1,046    SD     506
AK   1,136    ID     659    MI     902    NY     784    TN     859
AZ   1,091    IL     917    MN     652    NC     763    TX   1,010
AR     678    IN     898    MS     555    ND     507    UT   1,081
CA   1,221    IA     612    MO     863    OH     940    VT     676
CO     961    KS     666    MT     482    OK     797    VA     830
CT   1,058    KY     703    NE     626    OR   1,052    WA   1,143
DE   1,024    LA     875    NV     900    PA     861    WV     701
FL     960    ME     738    NH     976    RI     885    WI     744
GA     775    MD     889    NJ     829    SC     838    WY     537
Problem 2.53 - (a) Data Array
CA   1,221    TX   1,010    RI     885    NY     784    KS     666
WA   1,143    NH     976    LA     875    AL     775    ID     659
AK   1,136    CO     961    MO     863    GA     775    MN     652
AZ   1,091    FL     960    PA     861    NC     763    NE     626
UT   1,081    OH     940    TN     859    WI     744    IA     612
CT   1,058    IL     917    SC     838    ME     738    MS     555
OR   1,052    MI     902    VA     830    KY     703    WY     537
NM   1,046    NV     900    NJ     829    WV     701    ND     507
MA   1,036    IN     898    HI     823    AR     678    SD     506
DE   1,024    MD     889    OK     797    VT     676    MT     482
Problem 2.53 - (b)
The Stem-and-Leaf Display
Stem-and-Leaf Display
Stem unit: 100 (each leaf is the last two digits)
Count  Stem  Leaves
   1    12   21
   2    11   43, 36
   8    10   91, 81, 58, 52, 46, 36, 24, 10
   7     9   76, 61, 60, 40, 17, 02, 00
 (11)    8   98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
   9     7   97, 84, 75, 75, 63, 44, 38, 03, 01
   7     6   78, 76, 66, 59, 52, 26, 12
   4     5   55, 37, 07, 06
   1     4   82
N = 50
Range: $482 - $1,221
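A display of this kind can be generated with a short Python sketch. The function name stem_and_leaf is hypothetical, and the two-digit leaf formatting assumes a stem unit of 100, as in this problem. Only four of the 50 values are used here to keep the example brief.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=100):
    """Group integer values by stem (value // stem_unit), largest first.

    Leaves are the remainders, printed as two digits, so the formatting
    assumes stem_unit=100. Stems with no observations are skipped.
    """
    groups = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, stem_unit)
        groups[stem].append(leaf)
    lines = []
    for stem in sorted(groups, reverse=True):
        leaves = sorted(groups[stem], reverse=True)
        lines.append(f"{stem:>2} | " + " ".join(f"{leaf:02d}" for leaf in leaves))
    return lines

# Four of the daily-cost values from the data table.
for line in stem_and_leaf([1221, 1143, 1136, 482]):
    print(line)
# 12 | 21
# 11 | 43 36
#  4 | 82
```

Passing all 50 values reproduces the stems 12 down to 4 with the same leaves in the same descending order.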
Problem 2.53 - Continued
• To approximate the number of classes we
should use in creating the frequency
distribution, use Sturges’ Rule, n = 50:
\[ k \approx 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 \]
\[ = 1 + 3.322(1.69897) = 1 + 5.644 = 6.644 \approx 7 \]
Sturges’ rule suggests we use
approximately 7 classes.
Constructing the Frequency
Distribution
• Step 1. Number of classes
– Sturges’ Rule: approximately 7 classes.
The range is: $1,221 – $482 = $739
$739/7 ≈ $106 and $739/8 ≈ $92
• Steps 2 & 3. The Class Interval
– So, if we use 8 classes, we can make each
class $100 wide.
Constructing the Frequency
Distribution
• Step 4. The Lower Class Limit
– If we start at $450, we can cover the range in 8 classes,
each class $100 in width.
The first class : $450 up to $550
• Steps 5 & 6. Setting Class Limits
$450 up to $550
$550 up to $650
$650 up to $750
$750 up to $850
$850 up to $950
$950 up to $1,050
$1,050 up to $1,150
$1,150 up to $1,250
Problem 2.53 - (c) & (d)
Average daily cost         Number    Mark
$450 – under $550             4      $500
$550 – under $650             3      $600
$650 – under $750             9      $700
$750 – under $850             9      $800
$850 – under $950            11      $900
$950 – under $1,050           7      $1,000
$1,050 – under $1,150         6      $1,100
$1,150 – under $1,250         1      $1,200
Interval width: $100
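The counts and class marks in this table can be checked against the raw data with a short Python sketch. The variable names are assumptions made for this example; the 50 values are the daily costs from the data table, in the table's state order.

```python
costs = [
    775, 1136, 1091, 678, 1221, 961, 1058, 1024, 960, 775,   # AL to GA
    823, 659, 917, 898, 612, 666, 703, 875, 738, 889,        # HI to MD
    1036, 902, 652, 555, 863, 482, 626, 900, 976, 829,       # MA to NJ
    1046, 784, 763, 507, 940, 797, 1052, 861, 885, 838,      # NM to SC
    506, 859, 1010, 1081, 676, 830, 1143, 701, 744, 537,     # SD to WY
]

lower, width, n_classes = 450, 100, 8

# Each class runs from its lower limit up to, but not including, the next.
counts = [0] * n_classes
for c in costs:
    counts[(c - lower) // width] += 1

# The class mark is the midpoint of each class.
marks = [lower + i * width + width // 2 for i in range(n_classes)]

print(counts)  # [4, 3, 9, 9, 11, 7, 6, 1]
print(marks)   # [500, 600, 700, 800, 900, 1000, 1100, 1200]
```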
Problem 2.53 - (e) The Histogram
[Histogram: frequency (0 to 12) versus average daily cost, with bars centered on the class marks $500 to $1,200]
Problem 2.53 - The Relative
Frequency Distribution
Average daily cost         Number    Rel. Freq.
$450 – under $550             4      4/50 = .08
$550 – under $650             3      3/50 = .06
$650 – under $750             9      9/50 = .18
$750 – under $850             9      9/50 = .18
$850 – under $950            11      11/50 = .22
$950 – under $1,050           7      7/50 = .14
$1,050 – under $1,150         6      6/50 = .12
$1,150 – under $1,250         1      1/50 = .02
Problem 2.53 - (e) The Percentage Polygon
[Percentage polygon: relative frequency (0 to 0.25) versus average daily cost ($0 to $1,400)]
Problem 2.53 - The Cumulative
Frequency Distribution
Average daily cost         Number    Cum. Freq.
$450 – under $550             4          4
$550 – under $650             3          7
$650 – under $750             9         16
$750 – under $850             9         25
$850 – under $950            11         36
$950 – under $1,050           7         43
$1,050 – under $1,150         6         49
$1,150 – under $1,250         1         50
Problem 2.53 - The Cumulative
Relative Frequency Distribution
Average daily cost         Cum. Freq.    Cum. Rel. Freq.
$450 – under $550              4         4/50 = .08
$550 – under $650              7         7/50 = .14
$650 – under $750             16         16/50 = .32
$750 – under $850             25         25/50 = .50
$850 – under $950             36         36/50 = .72
$950 – under $1,050           43         43/50 = .86
$1,050 – under $1,150         49         49/50 = .98
$1,150 – under $1,250         50         50/50 = 1.00
Problem 2.53 - (e) The Percentage
Ogive (Less Than)
[Ogive: vertical axis from 0 to 50, horizontal axis average daily cost from $0 to $1,200]
The Scatter Diagram
• A scatter diagram is a two-dimensional plot of
data representing values of two quantitative
variables.
• x, the independent variable, on the horizontal axis
• y, the dependent variable, on the vertical axis
• Four ways in which two variables can be
related:
1. Direct
2. Inverse
3. Curvilinear
4. No relationship
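One rough numerical companion to a scatter diagram is the sign of the correlation coefficient, which this deck does not cover. The sketch below is offered only as an aid under that assumption: the function name pearson_r and the toy points are hypothetical and are not data from the textbook. A positive value suggests a direct relationship and a negative value an inverse one. A value near zero can mean either no relationship or a curvilinear one, so the plot itself should still be inspected.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy points lying exactly on a line.
xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]) > 0)  # True: direct relationship
print(pearson_r(xs, [8, 6, 4, 2]) < 0)  # True: inverse relationship
```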
An Example: Problem 2.38
• For 6 local offices of a large tax preparation
firm, the following data describe x = service
revenues and y = expenses for supplies,
freight, postage, etc.
• Draw a scatter diagram representing the data.
Does there appear to be any relationship
between the variables? If so, is the relationship
direct or inverse?
Problem 2.38, continued
[Scatter plot with trend line: y = Office Expenses (thous), 15.0 to 25.0, versus x = Service Revenue (thous), 200.0 to 600.0]
There appears to be a direct relationship between
the service revenue and the office expenses incurred.