Quantitative Data
For a statistics project, students weighed the contents of cans of cola. In 2000, 24 cans of cola were weighed (full and empty). The difference (full – empty) is the weight of the contents. The units are grams.

Quantitative Data
Who? Cans of cola.
What? Weight (g) of contents.
368, 351, 355, 367, 352, 369, 370, 369
370, 355, 354, 357, 366, 353, 373, 365
355, 356, 362, 354, 353, 378, 368, 349

Weight of Contents
What can we say about the weight of contents of a can of cola?
– Variation!
– Smallest value?
– Largest value?
– Middle value?

Display of Data
Stem-and-Leaf Display or Stem Plot
– Orders the data and creates a display of the distribution of values.

Display of Data
Histogram
– A picture of the distribution of the data.
– Collects values into bins.
– Bins should be of equal width.
– Different bin choices can yield different pictures.

[Figure: Histogram. Vertical axis: Frequency; horizontal axis: Measurement.]

Constructing a Histogram
Order data from smallest to largest using a stem-and-leaf display.
Determine bins.
– equal width
– more data, more bins

Weight of Contents
[Figure: Histogram, "Weight of Contents of Cans of Cola". Vertical axis: Frequency; horizontal axis: Weight (grams).]

Shape
Symmetry
– Mounded, flat
Skew
– Right, left
Other
– Multiple peaks, outliers

Symmetric, mounded in middle
[Figure: "Histogram of Octane Rating". Vertical axis: Frequency; horizontal axis: Octane.]

Skew - Right
[Figure: Histogram, "pH of Pork Loins". Vertical axis: Frequency; horizontal axis: pH.]

Skew - Left
[Figure: Histogram, "Flexibility Index of Young Adult Men". Vertical axis: Frequency; horizontal axis: Flexibility Index.]

Multiple Peaks
[Figure: Histogram, "Size of Diamonds (carats)". Vertical axis: Frequency; horizontal axis: Size (carats).]

Center
A typical value.
Summary of the whole batch of numbers.
For symmetric distributions, the center is easy to find.

Histogram of Octane
[Figure: "Histogram of Octane Rating" with the center marked. Vertical axis: Frequency; horizontal axis: Octane.]

Spread
Variation matters.
– Tightly clustered?
– Spread out?
– Low and high values?

Numerical Summaries
Weights of contents of cans of cola.
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038

Numerical Summaries
What is a “typical” value?
Look for the center of the distribution.
What do we mean by “center”?

Measures of Center
Sample Midrange
– Average of the minimum and the maximum.
  (349 + 378)/2 = 363.5 grams
– Greatly affected by outliers.

Measures of Center
Sample Median
– A value that divides the data into a lower half and an upper half.
– About half the data values are greater than the median and about half are less than the median.

Sample Median
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Median = (357 + 362)/2 = 359.5 grams

Measure of Center
Sample mean
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{\sum y_i}{n}$

Sample Mean
Total = 8669
n = 24
$\bar{y} = \dfrac{\text{Total}}{n} = \dfrac{8669}{24} = 361.2$ grams

Mean or Median?
The sample mean is the balance point of the distribution.
The sample median divides the distribution into a lower and an upper half.
For skewed data, the mean is pulled in the direction of the skew.

Numerical Summaries
How much variation is there in the data?
Look for the spread of the distribution.
What do we mean by “spread”?

Measures of Spread
Sample Range
– The distance between the minimum and the maximum.
  Range = (378 – 349) = 29 grams
– The length of the interval that contains 100% of the data.
– Greatly affected by outliers.

Quartiles
Medians of the lower and upper halves of the data.
Trying to split the data into fourths (quarters).

Quartiles
34 | 9
35 | 12334455567
36 | 25678899
37 | 0038
Lower quartile = (354 + 354)/2 = 354 grams
Upper quartile = (368 + 369)/2 = 368.5 grams

Measure of Spread
Interquartile Range (IQR)
– The distance between the quartiles.
  IQR = 368.5 – 354 = 14.5 grams
– The length of the interval that contains the central 50% of the data.

Five Number Summary
Minimum          349 grams
Lower Quartile   354 grams
Median           359.5 grams
Upper Quartile   368.5 grams
Maximum          378 grams

Box Plots
Establish an axis with a scale.
Draw a box that extends from the lower to the upper quartile.
Draw a line from the lower quartile to the minimum and another line from the upper quartile to the maximum.

Outlier Box Plots
Establishes boundaries on what are “usual” values based on the width of the box.
Values outside the boundaries are flagged as potential outliers.

Contents of Cans of Cola
[Figure: Box plot of the contents of cans of cola. Horizontal axis: Weight (grams).]

Measures of Spread
Based on the deviation from the sample mean.
Deviation $= y - \bar{y}$

9-hole Golf Scores
46, 44, 50, 43, 47, 52
$\bar{y} = \dfrac{282}{6} = 47$ strokes
[Figure: the golf scores plotted on an axis from 40 to 55 strokes.]

Deviations
[Figure: the golf scores on the same axis, with deviations from the mean labeled: -4, -3, -1, +3, +5.]

Sample Variance
Almost the average squared deviation.
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$

Sample Variance
$s^2 = \dfrac{16 + 9 + 1 + 25 + 9}{5} = \dfrac{60}{5} = 12$ strokes$^2$

Sample Standard Deviation
$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
$s = \sqrt{12} = 3.46$ strokes

Which summary is better?
For symmetric distributions, use the sample mean, $\bar{y}$, and the sample standard deviation, $s$.
For skewed distributions, use the five number summary.

Why?
For symmetric distributions, the sample mean and sample median should be approximately equal, so either would work.
We will see in Chapter 6 why the sample standard deviation is best for symmetric distributions.

Why?
For skewed distributions, the sample mean and standard deviation will be affected by the skew and/or potential outliers.
The five number summary displays the skew, and its median and quartiles are resistant to outliers.
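
Worked check in code
The numerical summaries in these slides can be reproduced with a short Python sketch. It is an illustration, not part of the original project. The helper name five_number_summary is hypothetical, the handling of an odd number of values is an assumption (the slides only show an even n), and the 1.5 × IQR boundaries are the common convention for outlier box plots, which the slides do not state.

```python
from statistics import mean, median, stdev, variance

# Weights of contents (grams), copied from the slides.
cola = [
    368, 351, 355, 367, 352, 369, 370, 369,
    370, 355, 354, 357, 366, 353, 373, 365,
    355, 356, 362, 354, 353, 378, 368, 349,
]


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max).

    Quartiles are the medians of the lower and upper halves, as in the
    slides. Assumption: when n is odd, the middle value is left out of
    both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[len(data) - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


minimum, q1, med, q3, maximum = five_number_summary(cola)
midrange = (minimum + maximum) / 2
data_range = maximum - minimum
iqr = q3 - q1

# Assumption: boundaries at 1.5 * IQR beyond the quartiles (a common
# convention; the slides only say "based on the width of the box").
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
flagged = [y for y in cola if y < low_fence or y > high_fence]

# 9-hole golf scores from the slides.
golf = [46, 44, 50, 43, 47, 52]
golf_mean = mean(golf)
golf_variance = variance(golf)  # sample variance, divides by n - 1
golf_sd = stdev(golf)           # square root of the sample variance

print("Five number summary:", minimum, q1, med, q3, maximum)
print("Midrange:", midrange, "Range:", data_range, "IQR:", iqr)
print("Cola mean:", round(mean(cola), 1))
print("Potential outliers:", flagged)
print("Golf mean, variance, SD:", golf_mean, golf_variance, round(golf_sd, 2))
```

Under the 1.5 × IQR assumption the boundaries fall at 332.25 and 390.25 grams, so none of the 24 cans is flagged.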