Histograms, Frequency Distributions and Related Topics

These constructions allow us to represent large sets of data in ways that may be more meaningful to the reader.

Histograms provide a graphical representation of data with bars whose heights indicate the number of data values in a certain range.

A frequency table shows the distribution of data in classes (intervals). The classes are constructed so that each data value falls into exactly one class, and the class frequency is the number of data values in the class.

How long does the 1161 mile Iditarod take? (p. 47, problem 7)

261 271 236 244 279 296 284 299 288 288
247 256 338 360 341 333 261 266 287 296
313 311 307 307 299 303 277 283 304 305
288 290 288 289 297 299 332 330 309 328
307 328 285 291 295 298 306 315 310 318
318 320 333 321 323 324 327

Can you easily see what the maximum and minimum times are? Is it easy to tell how the times are distributed?

To divide the data range of a frequency table into classes of equal width, first compute:

(largest value - smallest value) / (desired number of classes)

Increase the computed value to the next highest whole number, even if the computed value was already a whole number. This ensures that the classes cover the data.

The lower class limit of a class is the lowest data value that can fit into the class; the upper class limit is the highest data value that can fit into the class. The class width is the difference between the lower class limits of adjacent classes.

Class Boundaries

Class boundaries cannot belong to any class.
The class boundary between two adjacent classes is the midpoint between the upper limit of the lower class and the lower limit of the higher class. The difference between the upper and lower boundaries of a given class is the class width.

The midpoint of a class (class mark) is the average of its upper and lower boundaries, which is also the average of its upper and lower limits.

It is easier to make the histogram if the data are sorted:

236 244 247 256 261 261 266 271 277 279
283 284 285 287 288 288 288 288 289 290
291 295 296 296 297 298 299 299 299 303
304 305 306 307 307 307 309 310 311 313
315 318 318 320 321 323 324 327 328 328
330 332 333 333 338 341 360

With 5 classes, the class width is computed from (360-236)/5, which is 24.8. Hence the class width is 25.

Lower Limit   Upper Limit   Lower Boundary   Upper Boundary   Mark   Frequency
236           260           235.5            260.5            248     4
261           285           260.5            285.5            273     9
286           310           285.5            310.5            298    25
311           335           310.5            335.5            323    16
336           360           335.5            360.5            348     3

Histograms

A histogram is a bar graph that can be constructed using a frequency table:
Put the class boundaries on the horizontal axis.
The bars have the same width and always touch, and the edges of the bars are on class boundaries.
The height of each bar is the class frequency.

Histogram for Iditarod Data
[Figure: histogram titled "Time to Complete Iditarod". Horizontal axis: Hours, marked at the class boundaries 235.5, 260.5, 285.5, 310.5, 335.5, 360.5. Vertical axis: Frequency, 0 to 30.]

Relative Frequencies

The relative frequency of a class is f/n, where f is the frequency of the class and n is the total of all frequencies.
Relative frequency tables are like frequency tables except that the relative frequency is given.
Relative frequency histograms are like frequency histograms except that the heights of the bars represent relative frequencies.
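The Iditarod frequency table can be reproduced with a short Python sketch. It follows the rules stated so far: the class width is the next whole number above (largest - smallest)/(number of classes), the first class starts at the smallest value, and boundaries lie 0.5 beyond the limits. The function names class_width and frequency_table are hypothetical, and the 0.5 offset assumes whole-number data, as in this example.

```python
import math

# Iditarod finishing times in hours (the data listed earlier).
times = [
    261, 271, 236, 244, 279, 296, 284, 299, 288, 288,
    247, 256, 338, 360, 341, 333, 261, 266, 287, 296,
    313, 311, 307, 307, 299, 303, 277, 283, 304, 305,
    288, 290, 288, 289, 297, 299, 332, 330, 309, 328,
    307, 328, 285, 291, 295, 298, 306, 315, 310, 318,
    318, 320, 333, 321, 323, 324, 327,
]


def class_width(values, num_classes):
    """Next whole number above (max - min) / num_classes.

    floor(...) + 1 bumps the result up even when the quotient
    is already a whole number, as the rule requires.
    """
    raw = (max(values) - min(values)) / num_classes
    return math.floor(raw) + 1


def frequency_table(values, num_classes):
    """Build one row per class (assumes whole-number data)."""
    width = class_width(values, num_classes)
    n = len(values)
    rows = []
    lower = min(values)  # first lower class limit
    for _ in range(num_classes):
        upper = lower + width - 1  # upper class limit
        freq = sum(lower <= v <= upper for v in values)
        rows.append({
            "lower_limit": lower,
            "upper_limit": upper,
            "lower_boundary": lower - 0.5,
            "upper_boundary": upper + 0.5,
            "mark": (lower + upper) / 2,
            "frequency": freq,
            "relative_frequency": freq / n,
        })
        lower += width
    return rows


table = frequency_table(times, 5)
for row in table:
    print(row)
```

Calling frequency_table with a different data list and class count gives the corresponding table for another data set, provided the values are whole numbers.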
Systolic blood pressures of 50 subjects

Make a histogram with 8 classes.

100 102 104 108 108 110 110 112 112 112
115 116 116 118 118 118 118 120 120 126
126 126 128 128 128 130 130 130 130 130
132 132 134 134 136 136 138 140 140 146
148 152 152 152 156 160 190 200 208 208

Class width = (208-100)/8 = 13.5, thus use 14.

Lower Boundary   Upper Boundary   Lower Limit   Upper Limit   Mark    Frequency   Relative Frequency   Cumulative Frequency
 99.5            113.5            100           113           106.5   10          0.20                 10
113.5            127.5            114           127           120.5   12          0.24                 22
127.5            141.5            128           141           134.5   17          0.34                 39
141.5            155.5            142           155           148.5    5          0.10                 44
155.5            169.5            156           169           162.5    2          0.04                 46
169.5            183.5            170           183           176.5    0          0.00                 46
183.5            197.5            184           197           190.5    1          0.02                 47
197.5            211.5            198           211           204.5    3          0.06                 50

Frequency Histogram for Blood Pressure Data
[Figure: frequency histogram. Horizontal axis: Systolic Blood Pressure, marked at the class boundaries 99.5, 113.5, 127.5, 141.5, 155.5, 169.5, 183.5, 197.5, 211.5. Vertical axis: Frequency, 0 to 18.]

Relative Frequency Histogram for Blood Pressure Data
[Figure: relative frequency histogram. Horizontal axis: Systolic Pressure, marked at the same class boundaries. Vertical axis: Relative Frequency, 0 to 0.4.]

Cumulative Frequencies & Ogives

The cumulative frequency of a class is the frequency of the class plus the frequencies of all previous classes.
An ogive is a line graph that displays cumulative frequencies.

Constructing Ogives

Make a frequency table showing class boundaries and cumulative frequencies.
For each class, put a dot over the upper class boundary at the height of the cumulative class frequency.
Place a dot on the horizontal axis at the lower class boundary of the first class.
Connect the dots.
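The ogive construction steps can be sketched in Python using the class frequencies from the blood pressure table. This is a minimal illustration, not a plotting routine: it only computes the cumulative frequencies and the (boundary, cumulative frequency) points that would be connected. The variable names are hypothetical.

```python
from itertools import accumulate

# Class frequencies, first lower boundary, and class width
# taken from the blood pressure frequency table.
frequencies = [10, 12, 17, 5, 2, 0, 1, 3]
first_lower_boundary = 99.5
class_width = 14

# Cumulative frequency: this class plus all previous classes.
cumulative = list(accumulate(frequencies))

# Upper class boundary of each class.
upper_boundaries = [
    first_lower_boundary + class_width * (i + 1)
    for i in range(len(frequencies))
]

# The ogive starts on the horizontal axis at the first lower
# boundary, then has one dot per class over its upper boundary.
ogive_points = [(first_lower_boundary, 0)] + list(
    zip(upper_boundaries, cumulative)
)

for boundary, cum_freq in ogive_points:
    print(boundary, cum_freq)
```

Passing these points to any line-plotting tool, in order, would draw the ogive.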
Ogive for Blood Pressure Data
[Figure: ogive titled "Blood Pressures of 50 Subjects". Horizontal axis: Systolic Pressure, 99.5 to 211.5. Vertical axis: Cumulative Frequency, 0 to 60.]

Winning Times for Kentucky Derby
[Figure: ogive. Horizontal axis: Seconds over 2 Minutes, marked at the class boundaries -0.85, 1.15, 3.15, 5.15, 7.15, 9.15, 11.15, 13.15. Vertical axis: Cumulative Frequency, 0 to 120. Point labels on the curve: 0, 12, 48, 75, 85, 94, 100, 101.]

(a) What number, and percentage, of winning times are under 2:07.15?
(b) Estimate the number, and percentage, of winning times between 2:05.15 and 2:11.15.

Distribution Shapes

Symmetrical
Uniform (it has a rectangular histogram)
Skewed left – the longer tail is on the left side.
Skewed right – the longer tail is on the right side.
Bimodal (the two classes with the largest frequencies are separated by at least one class)