Notes on Histograms

Histograms, Frequency Distributions and Related Topics
These are constructions that will allow us to
represent large sets of data in ways that may
be more meaningful to the reader.
Histograms provide a graphical representation
of data with bars whose heights indicate the
number of data values in a certain range.
A frequency table shows the distribution of
data in classes (intervals). The classes are
constructed so that each data value falls
into exactly one class, and the class
frequency is the number of data values in the class.
How long does the 1161 mile Iditarod take? (p. 47, problem 7).
261 271 236 244 279 296 284 299 288 288
247 256 338 360 341 333 261 266 287 296
313 311 307 307 299 303 277 283 304 305
288 290 288 289 297 299 332 330 309 328
307 328 285 291 295 298 306 315 310 318
318 320 333 321 323 324 327
Can you easily see what the maximum and minimum times are?
Is it easy to tell how the times are distributed?
To find the class width, first compute:
(largest value - smallest value) / (desired number of classes)
Then increase the computed value to the next highest whole
number, even if the computed value was already a whole number.
This will ensure the classes cover the data.
The lower class limit of a class is the lowest data value that
can fit into the class; the upper class limit is the highest data
value that can fit into the class. The class width is the
difference between the lower class limits of adjacent classes.
In a frequency table, the data range is divided into classes
of equal width.
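The class width rule above can be sketched in a few lines of Python. This is only an illustration: the function name `class_width` is made up for this example, and the sketch assumes whole-number data, as in the examples in these notes.

```python
import math

def class_width(smallest, largest, num_classes):
    """Divide the data range by the desired number of classes, then move up
    to the next whole number, even when the quotient is already whole."""
    quotient = (largest - smallest) / num_classes
    return math.floor(quotient) + 1

print(class_width(236, 360, 5))  # Iditarod times: 124 / 5 = 24.8, so 25
print(class_width(100, 208, 8))  # blood pressures: 108 / 8 = 13.5, so 14
print(class_width(10, 50, 4))    # quotient is exactly 10, so 11
```

Using floor-plus-one rather than a plain ceiling is what makes the width grow even when the quotient is already a whole number, which is the case the rule singles out.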
Class Boundaries
• Class boundaries cannot belong to any class.
• The class boundary between two adjacent classes is the
midpoint between the upper limit of the lower class
and the lower limit of the higher class.
• The difference between the upper and lower boundaries
of a given class is the class width.
• The midpoint of a class (class mark) is the average
of its upper and lower boundaries, which is also
the average of its upper and lower limits.
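As a rough sketch of these definitions, the limits, boundaries, and marks of every class can be generated from the smallest data value and the class width. The helper name `build_classes` and the dictionary keys are invented for this illustration, and the 0.5 offset assumes whole-number data, so that adjacent class limits differ by 1.

```python
def build_classes(smallest, width, num_classes):
    """Return one dict per class with its limits, boundaries, and mark.
    Assumes whole-number data: each boundary lies 0.5 beyond the limit."""
    classes = []
    for i in range(num_classes):
        lower = smallest + i * width
        upper = lower + width - 1
        classes.append({
            "lower_limit": lower,
            "upper_limit": upper,
            "lower_boundary": lower - 0.5,
            "upper_boundary": upper + 0.5,
            "mark": (lower + upper) / 2,
        })
    return classes

# Values from the Iditarod example in these notes: smallest time 236,
# class width 25, five classes.
for c in build_classes(236, 25, 5):
    print(c)
```

In each class the upper boundary minus the lower boundary equals the class width, and the mark is the same whether it is averaged from the limits or from the boundaries.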
It is easier to make the histogram if the data is sorted:
236 244 247 256 261 261 266 271 277 279
283 284 285 287 288 288 288 288 289 290
291 295 296 296 297 298 299 299 299 303
304 305 306 307 307 307 309 310 311 313
315 318 318 320 321 323 324 327 328 328
330 332 333 333 338 341 360
• The class width is computed as (360 - 236)/5,
which is 24.8. Hence the class width is 25.
Lower  Upper  Lower     Upper
Limit  Limit  Boundary  Boundary  Mark  Frequency
236    260    235.5     260.5     248   4
261    285    260.5     285.5     273   9
286    310    285.5     310.5     298   25
311    335    310.5     335.5     323   16
336    360    335.5     360.5     348   3
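The frequency column of this table can be checked with a short tally. The sketch below is one possible way to do it, not the method used in the notes: the function name `class_frequencies` is hypothetical, and the integer-division step assumes whole-number data and equal-width classes that start at the smallest value.

```python
# Iditarod finishing times from the notes (57 values, sorted).
times = [
    236, 244, 247, 256, 261, 261, 266, 271, 277, 279,
    283, 284, 285, 287, 288, 288, 288, 288, 289, 290,
    291, 295, 296, 296, 297, 298, 299, 299, 299, 303,
    304, 305, 306, 307, 307, 307, 309, 310, 311, 313,
    315, 318, 318, 320, 321, 323, 324, 327, 328, 328,
    330, 332, 333, 333, 338, 341, 360,
]

def class_frequencies(data, smallest, width, num_classes):
    """Count how many whole-number data values fall in each class.
    Class i runs from smallest + i*width through smallest + (i+1)*width - 1."""
    counts = [0] * num_classes
    for value in data:
        counts[(value - smallest) // width] += 1
    return counts

counts = class_frequencies(times, smallest=236, width=25, num_classes=5)
print(counts)       # [4, 9, 25, 16, 3]
print(sum(counts))  # 57
```

Because every value lands in exactly one class, the frequencies add up to the number of data values.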
Histograms
A histogram is a bar graph that can be constructed
using a frequency table:
• Put the class boundaries on the horizontal axis.
• The bars have the same width and always touch,
and the edges of the bars are on class boundaries.
• The height of each bar is the class frequency.
Histogram for Iditarod Data
[Figure: histogram titled "Time to Complete Iditarod". Horizontal axis: Hours, marked at the class boundaries 235.5, 260.5, 285.5, 310.5, 335.5, 360.5. Vertical axis: Frequency, from 0 to 30.]
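For readers without plotting software, the histogram construction rules can be roughed out in plain text. This is only a sketch: the bars are drawn sideways, so bar length stands in for bar height, and the function name `text_histogram` is invented for this example.

```python
def text_histogram(boundaries, frequencies):
    """Return one text line per class: the class boundaries, then a bar of
    '#' characters whose length equals the class frequency."""
    lines = []
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{lower:6.1f} to {upper:6.1f} | {'#' * freq} {freq}")
    return lines

# Class boundaries and frequencies from the Iditarod frequency table.
boundaries = [235.5, 260.5, 285.5, 310.5, 335.5, 360.5]
frequencies = [4, 9, 25, 16, 3]
for line in text_histogram(boundaries, frequencies):
    print(line)
```

Each bar starts at one class boundary and ends at the next, so consecutive bars touch just as they do in the drawn histogram.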
Relative Frequencies
The relative frequency of a class is f/n where f is the
frequency of the class, and n is the total of all
frequencies.
Relative frequency tables are like frequency tables
except that the relative frequency is given.
Relative frequency histograms are like frequency
histograms except that the heights of the bars represent
relative frequencies.
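As a small worked illustration of f/n, the sketch below turns the Iditarod class frequencies from the earlier table into relative frequencies. The function name `relative_frequencies` is made up for this example, and the rounding to three decimal places is only for display.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency f by the total n of all frequencies."""
    n = sum(frequencies)
    return [f / n for f in frequencies]

iditarod_freqs = [4, 9, 25, 16, 3]  # class frequencies, n = 57
rel = relative_frequencies(iditarod_freqs)
print([round(r, 3) for r in rel])   # [0.07, 0.158, 0.439, 0.281, 0.053]
```

Apart from rounding, the relative frequencies always add up to 1.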
Systolic blood pressures of 50 subjects
Make a histogram with 8 classes
100 102 104 108 108 110 110 112 112 112
115 116 116 118 118 118 118 120 120 126
126 126 128 128 128 130 130 130 130 130
132 132 134 134 136 136 138 140 140 146
148 152 152 152 156 160 190 200 208 208
Class Width = (208-100)/8 = 13.5, thus use 14
L. Bndy  U. Bndy  L. Limit  U. Limit  Mark   Freq.  R. Freq.  C. Freq.
99.5     113.5    100       113       106.5  10     0.20      10
113.5    127.5    114       127       120.5  12     0.24      22
127.5    141.5    128       141       134.5  17     0.34      39
141.5    155.5    142       155       148.5  5      0.10      44
155.5    169.5    156       169       162.5  2      0.04      46
169.5    183.5    170       183       176.5  0      0.00      46
183.5    197.5    184       197       190.5  1      0.02      47
197.5    211.5    198       211       204.5  3      0.06      50
Frequency Histogram for Blood Pressure Data
[Figure: frequency histogram. Horizontal axis: Systolic Blood Pressure, marked at the class boundaries 99.5, 113.5, 127.5, 141.5, 155.5, 169.5, 183.5, 197.5, 211.5. Vertical axis: Frequency, from 0 to 18.]
Relative Frequency Histogram for Blood Pressure Data
[Figure: relative frequency histogram. Horizontal axis: Systolic Pressure, marked at the class boundaries 99.5, 113.5, 127.5, 141.5, 155.5, 169.5, 183.5, 197.5, 211.5. Vertical axis: Relative Frequency, from 0 to 0.4.]
Cumulative Frequencies & Ogives
• The cumulative frequency of a class is the
frequency of the class plus the frequencies
for all previous classes.
• An ogive is a line graph that displays
cumulative frequencies.
Constructing Ogives
• Make a frequency table showing class boundaries
and cumulative frequencies.
• For each class, put a dot over the upper class
boundary at the height of the cumulative class
frequency.
• Place a dot on the horizontal axis at the lower class
boundary of the first class.
• Connect the dots.
Ogive for Blood Pressure Data
[Figure: ogive titled "Blood Pressures of 50 Subjects". Horizontal axis: Systolic Pressure, labeled at 99.5, 127.5, 155.5, 183.5, 211.5. Vertical axis: Cumulative Frequency, from 0 to 60.]
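The ogive construction steps can be sketched as a list of points to plot, using the blood pressure classes from the table in these notes. The function name `ogive_points` is hypothetical; the running totals come from `itertools.accumulate` in the Python standard library.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Return (x, y) dots for an ogive: a dot at height 0 over the first
    lower boundary, then one dot over each upper class boundary at the
    cumulative frequency of that class."""
    cumulative = list(accumulate(frequencies))
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

boundaries = [99.5, 113.5, 127.5, 141.5, 155.5, 169.5, 183.5, 197.5, 211.5]
frequencies = [10, 12, 17, 5, 2, 0, 1, 3]
for x, y in ogive_points(boundaries, frequencies):
    print(x, y)
```

Connecting these dots in order gives the ogive. The last dot sits at the total number of data values, here 50.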
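Winning Times for Kentucky Derby
[Figure: ogive. Horizontal axis: Seconds over 2 Minutes, marked at the class boundaries -0.85, 1.15, 3.15, 5.15, 7.15, 9.15, 11.15, 13.15. Vertical axis: Cumulative Frequency, from 0 to 120. Point labels on the plot, in increasing order: 0, 12, 48, 75, 85, 94, 100, 101.]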
(a) What number, and percentage, of winning times are under
2:07.15?
(b) Estimate number, and percentage, of winning times between
2:05.15 and 2:11.15.
Distribution Shapes
• Symmetrical
• Uniform (it has a rectangular histogram)
• Skewed left: the longer tail is on the left side.
• Skewed right: the longer tail is on the right side.
• Bimodal (the two classes with the largest
frequencies are separated by at least one class)