Histograms

HISTOGRAMS
Representing Data
Why use a Histogram?

When there is a lot of data
When the data is continuous: a mass, height, volume, time, etc.
When the data is presented in a Grouped Frequency Distribution,
often in groups or classes that are UNEQUAL
Histograms look like this...

NO GAPS between bars, because the data is continuous
Bars may be different in width, as determined by the Grouped Frequency Distribution
So we use FREQUENCY DENSITY = Frequency / Class width
AREA is proportional to FREQUENCY, NOT height, because of UNEQUAL classes!
Grouped Frequency Distribution

Classes: Speed, v (km/h)   Frequency
0 < v ≤ 40                 80
40 < v ≤ 50                15
50 < v ≤ 60                25
60 < v ≤ 90                90
90 < v ≤ 110               30

These classes are well defined: there are no gaps!
Drawing

Sensible scales
Bases of rectangles correctly aligned: plot the class boundaries carefully
Heights of rectangles need to be correct: use Frequency Density
Frequency Densities

Speed, v (km/h)   Class width   Frequency   Frequency Density
0 < v ≤ 40        40            80          2.0
40 < v ≤ 50       10            15          1.5
50 < v ≤ 60       10            25          2.5
60 < v ≤ 90       30            90          3.0
90 < v ≤ 110      20            30          1.5
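
The frequency density column can be checked with a few lines of Python. This is a minimal sketch using the speed classes from the table; the names speed_classes and frequency_densities are illustrative and do not come from the slides.

```python
# Speed classes from the table: (lower bound, upper bound, frequency).
speed_classes = [
    (0, 40, 80),
    (40, 50, 15),
    (50, 60, 25),
    (60, 90, 90),
    (90, 110, 30),
]

def frequency_densities(classes):
    """Return frequency / class width for each (lower, upper, frequency) class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(speed_classes)
for (lower, upper, freq), fd in zip(speed_classes, densities):
    # Width x height (class width x frequency density) gives back the frequency.
    print(f"{lower} < v <= {upper}: width {upper - lower}, frequency density {fd}")
```

Multiplying each density by its class width gives back the class frequency, which is why AREA, not height, represents frequency.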
[Figure: histogram of Freq Dens (0 to 3.0) against Speed (km/h) (0 to 120)]
Frequency = Width x Height
For the first bar: Frequency = 40 x 2.0 = 80
Grouped Frequency Distribution

Time taken (nearest minute)   Freq
5-9                           14
10-19                         9
20-29                         18
30-39                         3
40-59                         5

GAPS! Need to adjust to Continuous

Classes: Speed, v (km/h)   Frequency
0 < v ≤ 40                 80
40 < v ≤ 50                15
50 < v ≤ 60                25
60 < v ≤ 90                90
90 < v ≤ 110               30

No gaps: ready to graph
Adjusting Classes

Time taken (nearest minute)   Class boundaries   Class width   Freq
5-9                           4½ to 9½           5             14
10-19                         9½ to 19½          10            9
20-29                         19½ to 29½         10            18
30-39                         29½ to 39½         10            3
40-59                         39½ to 59½         20            5
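
The adjustment can be sketched in Python as follows, assuming each time was rounded to the nearest minute, so that a class reaches half a minute beyond its stated limits on each side. The function name class_boundaries and its precision parameter are illustrative, not part of the slides.

```python
# Classes as recorded to the nearest minute: (lowest value, highest value, frequency).
time_classes = [(5, 9, 14), (10, 19, 9), (20, 29, 18), (30, 39, 3), (40, 59, 5)]

def class_boundaries(low, high, precision=1):
    """Extend a rounded class by half the rounding unit on each side.

    `precision` is the rounding unit (1 for "nearest minute"); it is an
    illustrative parameter, not something named in the slides.
    """
    half = precision / 2
    return low - half, high + half

boundaries = [class_boundaries(low, high) for low, high, _ in time_classes]
widths = [upper - lower for lower, upper in boundaries]

for (low, high, _), (lower, upper), width in zip(time_classes, boundaries, widths):
    print(f"{low}-{high}: boundaries {lower} to {upper}, class width {width}")
```

Each upper boundary now meets the next lower boundary, so the gaps disappear and the bars can touch.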
Frequency Density

Time taken (nearest minute)   Freq   Class width   Frequency Density
5-9                           14     5             2.8
10-19                         9      10            0.9
20-29                         18     10            1.8
30-39                         3      10            0.3
40-59                         5      20            0.25
Drawing

Sensible scales
Bases correctly aligned: plot the class boundaries
Heights correct: use Frequency Density

[Figure: histogram of Freq Dens (0 to 3.0) against Time (Mins), with bars drawn between the class boundaries 4.5, 9.5, 19.5, 29.5, 39.5 and 59.5]
Estimating a Frequency

Imagine we want to estimate the number of people with a time between 12 and 25 mins.
Because the times in our classes have been rounded to the nearest minute, we consider the interval from 11.5 to 25.5.

[Figure: the histogram of Freq Dens against Time (Mins), with the interval from 11.5 to 25.5 marked]

Frequency = FD x Width
From 11.5 to 19.5: Frequency = 0.9 x 8 = 7.2
From 19.5 to 25.5: Frequency = 1.8 x 6 = 10.8
Total Frequency = 18
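
The same estimate can be reproduced as an area calculation. This is a minimal sketch, assuming the continuous class boundaries 4.5, 9.5, 19.5, 29.5, 39.5 and 59.5; the function name estimate_frequency is illustrative.

```python
# Continuous classes for the time data: (lower boundary, upper boundary, frequency).
time_classes = [
    (4.5, 9.5, 14),
    (9.5, 19.5, 9),
    (19.5, 29.5, 18),
    (29.5, 39.5, 3),
    (39.5, 59.5, 5),
]

def estimate_frequency(classes, start, end):
    """Estimate the frequency between start and end as area under the histogram."""
    total = 0.0
    for lower, upper, freq in classes:
        density = freq / (upper - lower)
        # Width of the part of this bar that lies inside [start, end].
        overlap = min(upper, end) - max(lower, start)
        if overlap > 0:
            total += density * overlap
    return total

# "Between 12 and 25 minutes" to the nearest minute means 11.5 to 25.5.
estimate = estimate_frequency(time_classes, 11.5, 25.5)
print(round(estimate, 1))
```

Only the 10-19 and 20-29 bars overlap the interval, contributing 0.9 x 8 and 1.8 x 6, which matches the total of 18.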
We can estimate the Mode

Time taken (nearest minute)   Freq   CF
5-9                           14     14
10-19                         9      23
20-29                         18     41
30-39                         3      44
40-59                         5      49

Mode is therefore in this Class

[Figure: the histogram of Freq Dens against Time (Mins), with the Modal class marked]
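
A small sketch of how the modal class could be read off in code, assuming the usual convention that with UNEQUAL classes the modal class is the one with the greatest frequency density (the tallest bar), not simply the greatest frequency. The function names are illustrative.

```python
# (class label, class width, frequency) for the time data.
time_classes = [
    ("5-9", 5, 14),
    ("10-19", 10, 9),
    ("20-29", 10, 18),
    ("30-39", 10, 3),
    ("40-59", 20, 5),
]

def modal_class(classes):
    """Return the label of the class with the greatest frequency density."""
    label, _, _ = max(classes, key=lambda c: c[2] / c[1])
    return label

def cumulative_frequencies(classes):
    """Running totals of the frequencies (the CF column)."""
    totals, running = [], 0
    for _, _, freq in classes:
        running += freq
        totals.append(running)
    return totals

print(modal_class(time_classes))
print(cumulative_frequencies(time_classes))
```

Under that assumption the sketch picks the 5-9 class (frequency density 2.8), even though the 20-29 class has the largest frequency (18).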
…and the other one?

Speed, v (km/h)   Frequency
0 < v ≤ 40        80
40 < v ≤ 50       15
50 < v ≤ 60       25
60 < v ≤ 90       90
90 < v ≤ 110      30

No adjustment required: the class widths are friendly, with no ½ values
Simpler to plot
Estimation uses the EXACT values given: an estimate for 15 to 56 would use 15 and 56!
Classes like these appear LESS OFTEN in the exam
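
As an illustration of estimating from the exact values, the sketch below works out the 15 to 56 estimate for the speed data. The result is computed here from the table and is not quoted on the slides; the function name estimate_between is illustrative.

```python
# Speed classes need no adjustment: (lower bound, upper bound, frequency).
speed_classes = [(0, 40, 80), (40, 50, 15), (50, 60, 25), (60, 90, 90), (90, 110, 30)]

def estimate_between(classes, start, end):
    """Return (lower, upper, estimated frequency) for each class that overlaps [start, end]."""
    parts = []
    for lower, upper, freq in classes:
        width = min(upper, end) - max(lower, start)
        if width > 0:
            # Overlap width x frequency density = share of this class's frequency.
            parts.append((lower, upper, width * freq / (upper - lower)))
    return parts

parts = estimate_between(speed_classes, 15, 56)
total = sum(part[2] for part in parts)

for lower, upper, share in parts:
    print(f"{lower} < v <= {upper}: {share}")
print("Total:", total)
```

The three overlapping bars contribute 25 x 2.0, 10 x 1.5 and 6 x 2.5, giving an estimate of 80 vehicles.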
Why use frequency density for the vertical axis of a Histogram?

With unequal class sizes, a histogram that plots frequency as the bar height can give misleading ideas about the data distribution.

frequency density = rectangle height = (frequency of class) / (class width)

The vertical axis is Frequency Density
Example: Misprediction of Grade Point Average (GPA)

The following table displays the differences between predicted GPA and actual GPA.
Positive differences result when predicted GPA > actual GPA.

Class Interval     Frequency   Class width
-2.0 to < -0.4     23          1.6
-0.4 to < -0.2     55          0.2
-0.2 to < -0.1     97          0.1
-0.1 to < 0        210         0.1
0 to < 0.1         189         0.1
0.1 to < 0.2       139         0.1
0.2 to < 0.4       116         0.2
0.4 to < 2.0       171         1.6
Total              1000
[Figure: frequency histogram of the GPA differences; the two most extreme classes hold 2.3% and 17.1% of the data]

The frequency histogram considerably exaggerates the incidence of overpredicted and underpredicted values.
The areas of the two most extreme rectangles are much too large!
Example: Density Histogram of Misreporting GPA

frequency density = rectangle height = (frequency of class) / (class width)

Class Interval     Frequency   Class width   Frequency Density
-2.0 to < -0.4     23          1.6           14
-0.4 to < -0.2     55          0.2           275
-0.2 to < -0.1     97          0.1           970
-0.1 to < 0        210         0.1           2100
0 to < 0.1         189         0.1           1890
0.1 to < 0.2       139         0.1           1390
0.2 to < 0.4       116         0.2           580
0.4 to < 2.0       171         1.6           107

To avoid a misleading histogram like the one on the last slide, display the data with frequency density.
Frequency = (rectangle height) x (class width) = area of rectangle

[Figure: density histogram of the GPA differences; vertical axis: Frequency density x 10^-3]
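
The density column can be recomputed from the frequencies and class widths. This is a minimal sketch, assuming the densities in the table were rounded to the nearest whole number; the variable names are illustrative.

```python
# (class interval, frequency, class width) from the GPA table.
gpa_classes = [
    ("-2.0 to < -0.4", 23, 1.6),
    ("-0.4 to < -0.2", 55, 0.2),
    ("-0.2 to < -0.1", 97, 0.1),
    ("-0.1 to < 0", 210, 0.1),
    ("0 to < 0.1", 189, 0.1),
    ("0.1 to < 0.2", 139, 0.1),
    ("0.2 to < 0.4", 116, 0.2),
    ("0.4 to < 2.0", 171, 1.6),
]

total = sum(freq for _, freq, _ in gpa_classes)

for interval, freq, width in gpa_classes:
    density = freq / width       # rectangle height
    share = 100 * freq / total   # percentage of the data in this class
    print(f"{interval:>15}: density {round(density):>5}, {share:.1f}% of data")

# Rounded densities, for comparison with the table.
densities = [round(freq / width) for _, freq, width in gpa_classes]
```

The two extreme classes hold 2.3% and 17.1% of the data, yet their densities (about 14 and 107) are the smallest in the table, because each is spread over a class width of 1.6.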
Principles of Excellent Graphs

The graph should not distort the data.
The graph should not contain unnecessary things (sometimes referred to as chart junk).
The scale on the vertical axis should begin at zero.
All axes should be properly labelled.
The graph should contain a title.
The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk

[Figure: two presentations of Minimum Wage, labelled Bad Presentation and Good Presentation. Data: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Axes: $ (0 to 4) against year (1960 to 1990).]
Graphical Errors: No Relative Basis

[Figure: A's received by students, by group (FD, UG, GR, SR). Bad Presentation: vertical axis Freq. (0 to 300). Good Presentation: vertical axis % (0% to 30%).]
FD = Foundation, UG = UG Dip, GR = Grad Dip, SR = Senior
Graphical Errors: Compressing the Vertical Axis

[Figure: Quarterly Sales ($) for Q1 to Q4. Bad Presentation: vertical axis from 0 to 200. Good Presentation: vertical axis from 0 to 50.]
Graphical Errors: No Zero Point on the Vertical Axis

[Figure: Monthly Sales ($) for the months J, F, M, A, M, J, with axis values 36, 39, 42 and 45. Bad Presentation: the vertical axis has no zero point. Good Presentations: the vertical axis includes 0.]
Graphing the first six months of sales