Quantitative Data

Quantitative Data
• For a statistics project, students
weighed the contents of cans of cola.
• In 2000, 24 cans of cola were weighed,
both full and empty. The difference (full
– empty) is the weight of the contents,
in grams.
1
Quantitative Data
Who? Cans of cola.
What? Weight (g) of contents.
368, 351, 355, 367, 352, 369, 370, 369
370, 355, 354, 357, 366, 353, 373, 365
355, 356, 362, 354, 353, 378, 368, 349
2
Weight of Contents
What can we say about the weight
of contents of a can of cola?
– Variation!
– Smallest value?
– Largest value?
– Middle value?
3
Display of Data
Stem-and-Leaf Display
or Stem Plot
– Orders the data and creates a
display of the distribution of values.
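A minimal Python sketch of how such a display could be built for the cola weights. It assumes whole-gram values, so the last digit serves as the leaf and the remaining digits as the stem; the function name `stem_and_leaf` is illustrative and not part of the course material.

```python
from collections import defaultdict

# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def stem_and_leaf(values):
    """Map each stem (tens and up) to its leaves (units digits), in order."""
    plot = defaultdict(str)
    for v in sorted(values):          # sorting orders both stems and leaves
        stem, leaf = divmod(v, 10)    # e.g. 357 -> stem 35, leaf 7
        plot[stem] += str(leaf)
    return dict(plot)

stem_plot = stem_and_leaf(weights)
for stem, leaves in stem_plot.items():
    print(f"{stem} | {leaves}")
```

Each printed row is one stem, and the length of the row shows how many cans fall in that 10-gram interval. A stem with no values is simply omitted by this sketch.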
4
Display of Data
Histogram
– A picture of the distribution of the
data.
– Collects values into bins.
– Bins should be of equal width.
– Different bin choices can yield
different pictures.
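To illustrate how the bin choice changes the picture, here is a small Python sketch that counts the cola weights in equal-width bins and prints a text histogram. The two sets of bin edges are illustrative assumptions, not necessarily the bins used for the figures in these slides, and the helper names `bin_counts` and `show` are made up for this example.

```python
# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = (v - start) // width      # index of the bin that contains v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def show(values, start, width, n_bins):
    """Print one row of stars per bin (labels assume whole-number data)."""
    for k, c in enumerate(bin_counts(values, start, width, n_bins)):
        lo = start + k * width
        print(f"{lo}-{lo + width - 1}: {'*' * c}")

# Assumed bin choices: 10-gram bins from 340, then 5-gram bins from 345.
wide = bin_counts(weights, start=340, width=10, n_bins=4)
narrow = bin_counts(weights, start=345, width=5, n_bins=7)

show(weights, 340, 10, 4)
print()
show(weights, 345, 5, 7)
```

With 10-gram bins the data look like a single mound with a tail to the right. With 5-gram bins the same data show two clusters, one in the 350s and one in the upper 360s.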
5
Histogram
[Figure: histogram with vertical axis Frequency and horizontal axis Measurement]
6
Constructing a Histogram
• Order data from smallest to largest
using a stem and leaf display.
• Determine bins.
– Equal width.
– More data allow more bins.
7
Weight of Contents
[Figure: histogram, "Weight of Contents of Cans of Cola"; vertical axis: Frequency (0 to 15); horizontal axis: Weight (grams), 330 to 390]
8
Shape
Symmetry
– Mounded, flat
Skew
– Right, left
Other
– Multiple peaks, outliers
9
Symmetric, mounded in middle
[Figure: histogram, "Histogram of Octane Rating"; vertical axis: Frequency (0 to 10); horizontal axis: Octane, 86 to 96]
10
Skew - Right
[Figure: histogram, "pH of Pork Loins"; vertical axis: Frequency (0 to 80); horizontal axis: pH, 5.0 to 7.0]
11
Skew - Left
[Figure: histogram, "Flexibility Index of Young Adult Men"; vertical axis: Frequency (0 to 20); horizontal axis: Flexibility Index, 1 to 10]
12
Multiple Peaks
[Figure: histogram, "Size of Diamonds (carats)"; vertical axis: Frequency (0 to 15); horizontal axis: Size (carats), 0.1 to 0.4]
13
Center
A typical value.
Summary of the whole batch of
numbers.
For symmetric distributions, the
center is easy to locate.
14
Histogram of Octane
[Figure: histogram, "Histogram of Octane Rating", with the center of the distribution marked; vertical axis: Frequency (0 to 10); horizontal axis: Octane, 86 to 96]
15
Spread
Variation matters.
– Tightly clustered?
– Spread out?
– Low and high values?
16
Numerical Summaries
Weights of contents of cans of cola.
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
17
Numerical Summaries
What is a “typical” value?
Look for the center of the
distribution.
What do we mean by “center”?
18
Measures of Center
Sample Midrange
– Average of the minimum and the
maximum.
(349+378)/2=363.5 grams
– Greatly affected by outliers.
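A short Python sketch of the midrange calculation, together with a check of how strongly a single extreme value moves it. The 420-gram can is a hypothetical value added only for illustration; it is not part of the students' data.

```python
# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def midrange(values):
    """Average of the minimum and the maximum."""
    return (min(values) + max(values)) / 2

mid = midrange(weights)                       # (349 + 378) / 2

# Hypothetical: one extra can recorded at 420 g (not in the original data).
mid_with_outlier = midrange(weights + [420])  # (349 + 420) / 2

print(mid, mid_with_outlier)
```

Because only the two most extreme values enter the calculation, one unusual can shifts the midrange by 21 grams here.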
19
Measures of Center
Sample Median
– A value that divides the data into a
lower half and an upper half.
– About half the data values are
greater than the median and about half
are less than the median.
20
Sample Median
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Median = (357+362)/2 = 359.5 grams
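The same calculation as a minimal Python sketch: sort the values, then take the middle value, or the average of the two middle values when the count is even. The function name `sample_median` is illustrative; Python's standard `statistics.median` is used only as a cross-check.

```python
import statistics

# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def sample_median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

median = sample_median(weights)   # 12th and 13th ordered values: 357 and 362
print(median)
print(median == statistics.median(weights))
```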
21
Measure of Center
Sample mean
$\bar{y} = \frac{\text{Total}}{n} = \frac{\sum y_i}{n}$
22
Sample Mean
Total = 8669
n = 24
$\bar{y} = \frac{\text{Total}}{n} = \frac{8669}{24} \approx 361.2$ grams
23
Mean or Median?
• The sample mean is the balance point
of the distribution.
• The sample median divides the
distribution into a lower and an upper
half.
• For skewed data, the mean is pulled in
the direction of the skew.
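A small Python sketch of the last point. It computes the mean and median of the cola weights, then repeats the calculation after stretching the right tail. The 478-gram value is a hypothetical replacement for the largest observation (378 g), used only to exaggerate the tail.

```python
import statistics

# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

# Hypothetical data set: the largest value, 378 g, replaced by 478 g.
stretched = [478 if w == 378 else w for w in weights]

mean_orig = sum(weights) / len(weights)            # 8669 / 24
median_orig = statistics.median(weights)

mean_stretched = sum(stretched) / len(stretched)   # 8769 / 24
median_stretched = statistics.median(stretched)

print(round(mean_orig, 1), median_orig)
print(mean_stretched, median_stretched)
```

The mean moves up by about 4 grams toward the long right tail, while the median stays at 359.5 grams because the middle of the ordered data is unchanged.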
24
Numerical Summaries
How much variation is there in the
data?
Look for the spread of the
distribution.
What do we mean by “spread”?
25
Measures of Spread
Sample Range
– The distance from the minimum to
the maximum.
Range = 378 – 349 = 29 grams
– The length of the interval that
contains 100% of the data.
– Greatly affected by outliers.
26
Quartiles
Medians of the lower and upper
halves of the data.
They aim to split the data into
fourths (quarters).
27
Quartiles
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Lower quartile = (354+354)/2 = 354 grams
Upper quartile = (368+369)/2 = 368.5 grams
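A minimal Python sketch of the "medians of the halves" rule used here. The function name `quartiles` is illustrative. For an odd number of values this sketch leaves the overall median out of both halves, which is an assumption; the slides only show the even case.

```python
import statistics

# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves.

    Needs at least two values. With an odd count, the middle value is
    excluded from both halves (an assumed convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = statistics.median(ordered[:half])    # lower half of the data
    upper = statistics.median(ordered[-half:])   # upper half of the data
    return lower, upper

q1, q3 = quartiles(weights)
print(q1, q3)
```

Other quartile conventions exist, and software that interpolates between data values may report slightly different quartiles for the same data.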
28
Measure of Spread
Interquartile Range (IQR)
– The distance between the quartiles.
IQR = 368.5 – 354 = 14.5 grams
– The length of the interval that
contains the central 50% of the data.
29
Five Number Summary
Minimum: 349 grams
Lower Quartile: 354 grams
Median: 359.5 grams
Upper Quartile: 368.5 grams
Maximum: 378 grams
30
Box Plots
• Establish an axis with a scale.
• Draw a box that extends from the
lower to the upper quartile.
• Draw a line from the lower quartile to
the minimum and another line from
the upper quartile to the maximum.
31
Outlier Box Plots
Establishes boundaries on what
are “usual” values based on the
width of the box.
Values outside the boundaries are
flagged as potential outliers.
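The slides do not state how far the boundaries sit from the box. A common convention, assumed in the Python sketch below, places them 1.5 box widths (1.5 × IQR) beyond each quartile. The quartiles are the values computed for the cola data, and the multiplier `K` is the assumption.

```python
# Weights (g) of the contents of the 24 cans of cola, from the data slide.
weights = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]

# Quartiles (grams) as computed for the cola data in the slides.
q1, q3 = 354, 368.5

K = 1.5                       # assumed multiplier; not given in the slides
iqr = q3 - q1                 # width of the box
lower_fence = q1 - K * iqr    # values below this are flagged
upper_fence = q3 + K * iqr    # values above this are flagged

flagged = [w for w in weights if w < lower_fence or w > upper_fence]
print(lower_fence, upper_fence, flagged)
```

Under this assumed rule the boundaries are 332.25 and 390.25 grams, so none of the 24 cans is flagged as a potential outlier.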
32
Contents of Cans of Cola
[Figure: box plot; horizontal axis: Weight (grams), 345 to 385]
33
Measures of Spread
Based on the deviation from the
sample mean.
Deviation: $y - \bar{y}$
34
9-hole Golf Scores
46, 44, 50, 43, 47, 52
$\bar{y} = \frac{282}{6} = 47$ strokes
[Figure: the six scores plotted on an axis from 40 to 55]
35
Deviations
[Figure: the scores on an axis from 40 to 55, with deviations from the mean marked: –4, –3, –1, +3, +5]
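A short Python sketch of the deviations for the golf scores. It also shows why the deviations cannot simply be averaged to measure spread: they always sum to zero, which is the reason the sample variance works with squared deviations.

```python
# 9-hole golf scores from the slide.
scores = [46, 44, 50, 43, 47, 52]

mean = sum(scores) / len(scores)            # 282 / 6
deviations = [y - mean for y in scores]     # y minus the sample mean

print(mean)
print(deviations)
print(sum(deviations))                      # positives and negatives cancel
```

The score of 47 equals the mean, so its deviation is 0.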
36
Sample Variance
Almost the average squared deviation
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
37
Sample Variance
$s^2 = \frac{16 + 9 + 1 + 25 + 9}{5} = \frac{60}{5} = 12$ strokes$^2$
38
Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
$s = \sqrt{12} \approx 3.46$ strokes
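The variance and standard deviation of the golf scores as a minimal Python sketch, following the formulas step by step. The calls to `statistics.variance` and `statistics.stdev` from Python's standard library are included only as a cross-check; both also divide by n – 1.

```python
import math
import statistics

# 9-hole golf scores from the slide.
scores = [46, 44, 50, 43, 47, 52]

n = len(scores)
mean = sum(scores) / n                           # 47 strokes
squared_devs = [(y - mean) ** 2 for y in scores]

variance = sum(squared_devs) / (n - 1)           # 60 / 5, in strokes squared
sd = math.sqrt(variance)                         # back in strokes

print(variance, round(sd, 2))
print(math.isclose(variance, statistics.variance(scores)))
print(math.isclose(sd, statistics.stdev(scores)))
```

Taking the square root returns the measure of spread to the original units (strokes), which makes it easier to compare with the data than the variance.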
39
Which summary is better?
For symmetric distributions use
the sample mean, $\bar{y}$, and sample
standard deviation, s.
For skewed distributions use the
five number summary.
40
Why?
For symmetric distributions the
sample mean and sample median
should be approximately equal so
either would work.
We will see in Chapter 6 why the
sample standard deviation is best
for symmetric distributions.
41
Why?
For skewed distributions, the
sample mean and standard
deviation will be affected by the
skew and/or potential outliers.
The five number summary
displays the skew, and its median
and quartiles are not affected by
outliers.
42