Stat 104 – Lecture 5

Display of Numerical Data
• Histogram
  – A picture of the distribution of the data.
  – Collects values into classes.
  – Classes should be of equal width.
  – Different class choices can yield different pictures.

[Figure: generic histogram; vertical axis: Frequency, horizontal axis: Measurement]

Constructing a Histogram
• Order data from smallest to largest using a stem and leaf display.
• Determine classes.
  – Equal width.
  – More data, more classes.

Split Stem
Body Mass (kg) of Canidae
0 | 1,3,3,3,4,4,4
0*| 5,5,5,5,5,6,6,6,7,8,9,9
1 | 0,0,1,2,3
1*|
2 | 2,3
2*| 5
3 |
3*| 6

Class                     Freq
0 ≤ Body Mass < 5           7
5 ≤ Body Mass < 10         12
10 ≤ Body Mass < 15         5
15 ≤ Body Mass < 20         0
20 ≤ Body Mass < 25         2
25 ≤ Body Mass < 30         1
30 ≤ Body Mass < 35         0
35 ≤ Body Mass < 40         1

Histogram
[Figure: "Distributions" histogram of Body Mass (kg); vertical axis: Count (0 to 12), horizontal axis: 0 to 40 kg]

Shape
• Symmetry (mirror image)
  – Mounded, flat
• Skew (mounded on one side, with a longer tail on the other)
  – Tail toward higher values (skewed right)
  – Tail toward lower values (skewed left)
• Other
  – Multiple peaks, outliers

Symmetric & Mounded
[Figure: Histogram of Octane Rating; vertical axis: Frequency, horizontal axis: Octane (86 to 96)]

Skewed to Right
[Figure: pH of Pork Loins; vertical axis: Frequency, horizontal axis: pH (5.0 to 7.0)]

Skewed to Left
[Figure: Flexibility Index of Young Adult Men; vertical axis: Frequency, horizontal axis: Flexibility Index (1 to 10)]

Multiple Peaks
[Figure: Size of Diamonds (carats); vertical axis: Frequency, horizontal axis: Size (carats), 0.1 to 0.4]

Summarizing Numerical Data
• What is a “typical” value?
• Look for the center of the distribution.
• What do we mean by “center”?

Summary Measures
• Central Tendency
  – Sample midrange
  – Sample median
  – Sample mean

Measures of Center
• Sample Midrange
  – Average of the minimum and the maximum.
  – Body mass of Canidae: (1 + 36)/2 = 18.5 kilograms
  – Greatly affected by outliers.
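
The class counts and the midrange for the Canidae data can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original slides. The names `class_counts` and `midrange` are illustrative assumptions, and the data are the body masses as read off the split stem and leaf display.

```python
# Canidae body masses (kg), transcribed from the split stem and leaf display.
canidae = [
    1, 3, 3, 3, 4, 4, 4,
    5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
    10, 10, 11, 12, 13,
    22, 23,
    25,
    36,
]


def class_counts(values, width, start, stop):
    """Count values in equal-width classes [lower, lower + width).

    Hypothetical helper: assumes every value lies in [start, stop).
    """
    counts = {lower: 0 for lower in range(start, stop, width)}
    for v in values:
        # Lower limit of the class that contains v.
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return counts


def midrange(values):
    """Average of the minimum and the maximum."""
    return (min(values) + max(values)) / 2


counts = class_counts(canidae, width=5, start=0, stop=40)
for lower, count in counts.items():
    # One '*' per observation gives a rough text histogram.
    print(f"{lower:2d} <= mass < {lower + 5:2d}: {count:2d} {'*' * count}")

print("midrange:", midrange(canidae))
```

Calling `class_counts` with a different `width` regroups the same 28 values into different classes, which is one way to see how the choice of classes changes the picture.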
Measures of Center
• Sample Median
  – A value that divides the data into a lower half and an upper half.
  – About half the data values are greater than the median and about half are less than the median.

Sample Median (n even)
Body Mass (kg) of Canidae
0 | 1,3,3,3,4,4,4
0*| 5,5,5,5,5,6,6,6,7,8,9,9
1 | 0,0,1,2,3
1*|
2 | 2,3
2*| 5
3 |
3*| 6
With n = 28, the median is the average of the 14th and 15th ordered values:
Median = (6+6)/2 = 6 kilograms

Sample Median (n odd)
Body Mass (kg) of Felidae
0 | 2,2,2,3,3,3,4,4,4,4,4,5,5,5,5,7,8
1 | 0,0,1,1,1,2,3,7
2 | 1
3 | 6
4 | 0,7
5 | 5
9 | 6
16| 2
17| 8
With n = 33, the median is the 17th ordered value:
Median = 8 kilograms

Measures of Center
• Formula for the sample mean
\[ \bar{y} = \frac{\text{Total}}{n} = \frac{\sum y_i}{n} \]

Sample Mean
• Body mass of Canidae
• Total = 260
• n = 28
\[ \bar{y} = \frac{\text{Total}}{n} = \frac{260}{28} \approx 9.3 \text{ kg} \]

What does each measure?
• The sample midrange is midway between the smallest and largest values.
• The sample median divides the distribution into a lower and an upper half.
• The sample mean is the balance point of the distribution.

Which summary is “best”?
• For symmetric shapes the sample mean is most informative.
• For skewed shapes the sample median is better because it is less affected by outliers.
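
The three measures of center can be compared side by side on the two data sets from the lecture. This is a short sketch in Python, not from the original slides. It uses `median` and `mean` from the standard library's `statistics` module, while `midrange` and `summarize` are hypothetical helpers introduced only for illustration. The data are transcribed from the stem and leaf displays.

```python
from statistics import mean, median

# Body masses (kg), transcribed from the stem and leaf displays.
canidae = [
    1, 3, 3, 3, 4, 4, 4,
    5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9,
    10, 10, 11, 12, 13,
    22, 23, 25, 36,
]
felidae = [
    2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 7, 8,
    10, 10, 11, 11, 11, 12, 13, 17,
    21, 36, 40, 47, 55, 96, 162, 178,
]


def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


def summarize(values):
    """Return the three measures of center discussed in the lecture."""
    return {
        "midrange": midrange(values),
        "median": median(values),  # averages the two middle values when n is even
        "mean": mean(values),      # Total / n
    }


for name, data in (("Canidae", canidae), ("Felidae", felidae)):
    s = summarize(data)
    print(
        f"{name}: n={len(data)}, midrange={s['midrange']:.1f}, "
        f"median={s['median']:.1f}, mean={s['mean']:.1f}"
    )
```

For the Canidae data this reproduces the values on the slides (midrange 18.5, median 6, mean about 9.3 kg). For the Felidae values as transcribed, the few very large cats pull the mean (about 24.2 kg) well above the median (8 kg), which illustrates why the median is the better summary for skewed shapes.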