Quantitative Data
For a statistics project, students
weighed the contents of cans of cola.
In 2000, 24 cans of cola were weighed
(full and empty). The difference (full
– empty) is the weight of the contents.
The units are grams.
Quantitative Data
Who? Cans of cola.
What? Weight (g) of contents.
368, 351, 355, 367, 352, 369, 370, 369
370, 355, 354, 357, 366, 353, 373, 365
355, 356, 362, 354, 353, 378, 368, 349
Weight of Contents
What can we say about the weight
of contents of a can of cola?
– Variation!
– Smallest value?
– Largest value?
– Middle value?
Display of Data
Stem-and-Leaf Display
or Stem Plot
– Orders the data and creates a
display of the distribution of values.
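A minimal sketch of how a stem plot orders the cola weights, assuming the stem is the leading digits (tens of grams) and the leaf is the last digit. The function name `stem_and_leaf` is illustrative only, and stems with no data would not be listed by this version.

```python
from collections import defaultdict

# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def stem_and_leaf(values):
    """Return one text line per stem; leaves are the sorted last digits."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = tens of grams, leaf = ones digit
    return [
        f"{stem} | {''.join(str(leaf) for leaf in leaves[stem])}"
        for stem in sorted(leaves)
    ]

stem_lines = stem_and_leaf(weights)
print("\n".join(stem_lines))
# 34 | 9
# 35 | 12334455567
# 36 | 25678899
# 37 | 0038
```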
Display of Data
Histogram
– A picture of the distribution of the
data.
– Collects values into bins.
– Bins should be of equal width.
– Different bin choices can yield
different pictures.
Histogram
[Figure: schematic histogram; vertical axis: Frequency; horizontal axis: Measurement]
Constructing a Histogram
Order data from smallest to largest
using a stem-and-leaf display.
Determine bins.
– equal width
– more data, more bins
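The binning step can be sketched as follows. The bin starts and widths (340 with width 10, 345 with width 5) are assumptions chosen for illustration, not necessarily the bins used in the slides, and `bin_counts` is a hypothetical helper. The two choices give different pictures of the same 24 weights.

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def bin_counts(values, start, width):
    """Count values in equal-width bins [left, left + width), keyed by left edge."""
    counts = {}
    for v in values:
        left = start + ((v - start) // width) * width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

wide = bin_counts(weights, start=340, width=10)
narrow = bin_counts(weights, start=345, width=5)

print(wide)    # {340: 1, 350: 11, 360: 8, 370: 4}
print(narrow)  # {345: 1, 350: 6, 355: 5, 360: 1, 365: 7, 370: 3, 375: 1}
```

With the narrower bins, the dip near 360 grams becomes visible; the wider bins smooth it away.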
Weight of Contents
[Figure: histogram, "Weight of Contents of Cans of Cola"; vertical axis: Frequency (0 to 15); horizontal axis: Weight (grams), 330 to 390]
Shape
Symmetry
– Mounded, flat
Skew
– Right, left
Other
– Multiple peaks, outliers
Symmetric, mounded in middle
[Figure: histogram, "Histogram of Octane Rating"; vertical axis: Frequency (0 to 10); horizontal axis: Octane (86 to 96)]
Skew - Right
[Figure: histogram, "pH of Pork Loins"; vertical axis: Frequency (0 to 80); horizontal axis: pH (5.0 to 7.0)]
Skew - Left
[Figure: histogram, "Flexibility Index of Young Adult Men"; vertical axis: Frequency (0 to 20); horizontal axis: Flexibility Index (1 to 10)]
Multiple Peaks
[Figure: histogram, "Size of Diamonds (carats)"; vertical axis: Frequency (0 to 15); horizontal axis: Size (carats), 0.1 to 0.4]
Center
A typical value.
Summary of the whole batch of
numbers.
For symmetric distributions –
easy.
Histogram of Octane
[Figure: histogram, "Histogram of Octane Rating"; vertical axis: Frequency (0 to 10); horizontal axis: Octane (86 to 96); the center of the distribution is marked]
Spread
Variation matters.
– Tightly clustered?
– Spread out?
– Low and high values?
Numerical Summaries
Weights of contents of cans of cola.
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Key: 34 | 9 = 349 grams
Numerical Summaries
What is a “typical” value?
Look for the center of the
distribution.
What do we mean by “center”?
Measures of Center
Sample Midrange
– Average of the minimum and the
maximum.
(349+378)/2=363.5 grams
– Greatly affected by outliers.
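As a quick check of the arithmetic above, a short sketch that computes the midrange directly from the 24 cola weights:

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

# Midrange: average of the minimum and the maximum.
midrange = (min(weights) + max(weights)) / 2
print(min(weights), max(weights), midrange)  # 349 378 363.5
```

Only the two extreme values enter the calculation, which is why a single outlier can move the midrange a great deal.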
Measures of Center
Sample Median
– A value that divides the data into a
lower half and an upper half.
– About half the data values are
greater than the median and about
half are less than the median.
Sample Median
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Median = (357+362)/2 = 359.5 grams
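The median calculation can be sketched as below. With an even number of values (n = 24) the two middle values of the ordered data are averaged; the helper name `sample_median` is illustrative only.

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def sample_median(values):
    """Middle value of the ordered data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

median_weight = sample_median(weights)
print(median_weight)  # 359.5
```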
Measure of Center
Sample mean
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{\sum y_i}{n}$
Sample Mean
Total = 8669
n = 24
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{8669}{24} \approx 361.2$ grams
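A short sketch confirming the total, the count, and the sample mean for the cola weights:

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

total = sum(weights)   # sum of all the values
n = len(weights)       # number of cans
mean = total / n       # sample mean = Total / n

print(total, n, round(mean, 1))  # 8669 24 361.2
```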
Mean or Median?
The sample mean is the balance point
of the distribution.
The sample median divides the
distribution into a lower and an upper
half.
For skewed data, the mean is pulled in
the direction of the skew.
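To illustrate the pull on the mean, the sketch below replaces the largest cola weight (378 grams) with a hypothetical value of 478 grams. That value is invented purely for illustration and is not part of the data. The median does not move, while the mean shifts toward the long right tail.

```python
import statistics

# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

# Hypothetical data set: the maximum (378) is replaced by 478 to stretch the right tail.
stretched = sorted(weights)
stretched[-1] = 478

mean_before = statistics.mean(weights)
mean_after = statistics.mean(stretched)
median_before = statistics.median(weights)
median_after = statistics.median(stretched)

print(round(mean_before, 1), round(mean_after, 1))  # 361.2 365.4
print(median_before, median_after)                  # 359.5 359.5
```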
Numerical Summaries
How much variation is there in the
data?
Look for the spread of the
distribution.
What do we mean by “spread”?
Measures of Spread
Sample Range
– The distance between the minimum
and the maximum.
Range = 378 – 349 = 29 grams
– The length of the interval that
contains 100% of the data.
– Greatly affected by outliers.
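The range calculation, sketched for the cola weights:

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

# Sample range: distance between the minimum and the maximum.
sample_range = max(weights) - min(weights)
print(sample_range)  # 29

# Every value lies in the interval [minimum, maximum].
inside = sum(min(weights) <= w <= max(weights) for w in weights)
print(inside == len(weights))  # True
```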
Quartiles
Medians of the lower and upper
halves of the data.
Trying to split the data into
fourths, quarters.
Quartiles
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Lower quartile = (354+354)/2 = 354 grams
Upper quartile = (368+369)/2 = 368.5 grams
Measure of Spread
Interquartile Range (IQR)
– The distance between the quartiles.
IQR = 368.5 – 354 = 14.5 grams
– The length of the interval that
contains the central 50% of the data.
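A sketch of the quartile and IQR calculation using the "medians of the lower and upper halves" rule. The helper `quartiles` is hypothetical, and for an odd number of values it assumes the middle value is left out of both halves, which is one of several conventions. Software packages often interpolate differently and may report slightly different quartiles.

```python
import statistics

# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def quartiles(values):
    """Medians of the lower and upper halves of the ordered data (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]  # skips the middle value when n is odd
    return statistics.median(lower_half), statistics.median(upper_half)

lower_q, upper_q = quartiles(weights)
iqr = upper_q - lower_q

print(lower_q, upper_q, iqr)  # 354.0 368.5 14.5
```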
Five Number Summary
Minimum          349 grams
Lower Quartile   354 grams
Median           359.5 grams
Upper Quartile   368.5 grams
Maximum          378 grams
Box Plots
Establish an axis with a scale.
Draw a box that extends from the
lower to the upper quartile.
Draw a line from the lower quartile to
the minimum and another line from
the upper quartile to the maximum.
Outlier Box Plots
Establishes boundaries on what
are “usual” values based on the
width of the box.
Values outside the boundaries are
flagged as potential outliers.
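The slides do not say how the boundaries are computed. The sketch below assumes the common convention of placing them 1.5 box widths (1.5 × IQR) beyond each quartile; the multiplier `K = 1.5` is an assumption, and the quartiles are the values found earlier for the cola data.

```python
# Weights of contents (grams) of the 24 cans of cola.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

lower_quartile = 354.0   # from the quartile calculation for these data
upper_quartile = 368.5
iqr = upper_quartile - lower_quartile   # width of the box: 14.5 grams

K = 1.5  # ASSUMED multiplier (a common convention); not stated in the slides

lower_boundary = lower_quartile - K * iqr
upper_boundary = upper_quartile + K * iqr

# Values outside the boundaries are flagged as potential outliers.
flagged = [w for w in weights if w < lower_boundary or w > upper_boundary]

print(lower_boundary, upper_boundary)  # 332.25 390.25
print(flagged)                         # []
```

Under this assumption none of the 24 weights is flagged, since the minimum (349) and maximum (378) both fall inside the boundaries.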
Contents of Cans of Cola
[Figure: outlier box plot of the contents of cans of cola; axis: Weight (grams), 345 to 385]
Measures of Spread
Based on the deviation from the
sample mean.
Deviation: $y - \bar{y}$
9-hole Golf Scores
46, 44, 50, 43, 47, 52
$\bar{y} = \dfrac{282}{6} = 47$ strokes
[Figure: plot of the six scores on an axis from 40 to 55 strokes]
Deviations
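[Figure: plot of the six scores on an axis from 40 to 55 strokes, with each score's deviation from the mean of 47 marked: –4, –3, –1, +3, +5]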
Sample Variance
Almost the average squared deviation
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
Sample Variance
$s^2 = \dfrac{16 + 9 + 1 + 25 + 9}{5} = \dfrac{60}{5} = 12 \text{ strokes}^2$
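The variance calculation for the golf scores can be sketched step by step. The deviations from the mean always sum to zero, which is why they are squared before averaging. The score of 47 has a deviation of 0, so it contributes nothing to the sum of squares.

```python
# 9-hole golf scores (strokes).
scores = [46, 44, 50, 43, 47, 52]

n = len(scores)
mean = sum(scores) / n                    # 282 / 6 = 47.0

deviations = [y - mean for y in scores]   # each score minus the sample mean
squared = [d ** 2 for d in deviations]

sum_of_squares = sum(squared)             # 60.0
variance = sum_of_squares / (n - 1)       # divide by n - 1, not n

print(deviations)       # [-1.0, -3.0, 3.0, -4.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
print(variance)         # 12.0
```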
Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
$s = \sqrt{12} \approx 3.46$ strokes
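A sketch of the standard deviation for the golf scores, computed from the formula and cross-checked against Python's standard-library `statistics.stdev`, which also divides by n − 1:

```python
import math
import statistics

# 9-hole golf scores (strokes).
scores = [46, 44, 50, 43, 47, 52]

n = len(scores)
mean = sum(scores) / n
variance = sum((y - mean) ** 2 for y in scores) / (n - 1)   # 12.0 strokes squared

# Taking the square root returns the measure to the original units (strokes).
s = math.sqrt(variance)

print(round(s, 2))                          # 3.46
print(round(statistics.stdev(scores), 2))   # 3.46
```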
Which summary is better?
For symmetric distributions use
the sample mean, $\bar{y}$, and sample
standard deviation, $s$.
For skewed distributions use the
five number summary.
Why?
For symmetric distributions the
sample mean and sample median
should be approximately equal so
either would work.
We will see in Chapter 6 why the
sample standard deviation is best
for symmetric distributions.
Why?
For skewed distributions, the
sample mean and standard
deviation will be affected by the
skew and/or potential outliers.
The five number summary
displays the skew, and its median and
quartiles are not affected by outliers.