Comparing and understanding data sets, boxplots

Understanding and Comparing Distributions
Another Useful Graphical Method: Boxplots
Pulse Rates n = 138
Stem-and-leaf display (# = number of leaves on the stem; stems marked * hold leaves 0-4, stems marked . hold leaves 5-9)
  #   Stem   Leaves
      4*
  3   4.     588
  9   5*     001233444
 10   5.     5556788899
 23   6*     00011111122233333344444
 23   6.     55556666667777788888888
 16   7*     00000112222334444
 23   7.     55555666666777888888999
 10   8*     0000112224
 10   8.     5555667789
  4   9*     0012
  2   9.     58
  4   10*    0223
      10.
  1   11*    1
Median: mean of the pulses in ordered positions 69 and 70: median = (70 + 70)/2 = 70
Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
Recall the 5-number summary of data
• Minimum Q1 median Q3 maximum
• Pulse data 5-number summary: 45 63 70 78 111
A boxplot is a graphical display of the 5-number summary

Example
 Rank   Count   Value
  25      1      6.1
  24      2      5.6
  23      3      5.3
  22      4      4.9
  21      5      4.7
  20      6      4.5
  19      7      4.2
  18      6      4.1
  17      5      3.9
  16      4      3.8
  15      3      3.7
  14      2      3.6
  13      1      3.4
  12      2      3.3
  11      3      2.9
  10      4      2.8
   9      5      2.5
   8      6      2.3
   7      7      2.3
   6      6      2.1
   5      5      1.5
   4      4      1.9
   3      3      1.6
   2      2      1.2
   1      1      0.6
• Consider the data shown in the table.
– The data values 6.1, 5.6, …, are in the right column
– They are arranged in decreasing order from 6.1 (data rank of 25 shown in the far left column) to 0.6 (data rank of 1 in the far left column)
– The center column shows the ranks of the quartiles (in blue on the slide) counted from each end of the data and from the overall median (in yellow on the slide)
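The five-number summary of these 25 values can be checked with a short Python sketch. It assumes the quartile rule implied by the slides' examples: Q1 and Q3 are the medians of the lower and upper halves, and when n is odd the overall median is counted in both halves. That assumption reproduces the quartiles 2.3 and 4.2 that the slides report for this data set; other textbooks and software use slightly different rules. The function name five_number_summary is illustrative and does not come from the slides.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed rule: Q1 and Q3 are the medians of the lower and upper halves;
    for odd n the overall median belongs to both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # size of each half (includes the median when n is odd)
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Years until death, Disease X (the 25 values from the example)
years = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

For an even n such as the 138 pulses, the same rule splits the data into two halves of 69 values each, matching the position-35 quartiles used for the pulse data.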
Boxplot: display of 5-number summary
[Figure: boxplot of years until death for Disease X (vertical axis 0 to 7), drawn beside the ranked data]
Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6
Five-number summary: min Q1 m Q3 max
Boxplot: display of 5-number summary

Example: age of 66 “crush” victims at rock concerts 1999-2000.
5-number summary (min, Q1, median, Q3, max): 13 17 19 22 47
Boxplot construction
1) construct box with ends located at Q1
and Q3; in the box mark the location of
median (usually with a line or a “+”)
2) fences are determined by moving a
distance 1.5(IQR) from each end of the
box, where IQR = Q3 − Q1 is the interquartile range;
2a) upper fence is 1.5*IQR above the upper quartile
2b) lower fence is 1.5*IQR below the lower quartile
Note: the fences only help with constructing the
boxplot; they do not appear in the final boxplot
display
Box plot construction (cont.)
3) whiskers: draw lines from the ends of
the box left and right to the most
extreme data values found within the
fences;
4) outliers: special symbols represent
each data value beyond the fences;
4a) sometimes a different symbol is
used for “far outliers” that are more than
3 IQRs from the quartiles
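The construction steps can be sketched in Python. This is a minimal illustration, not a plotting routine: it only computes the pieces a drawing would need. The function name boxplot_parts and the dictionary keys are hypothetical, the quartiles are passed in rather than computed, and values lying exactly on a fence are treated as inside it (an assumption the slides do not address).

```python
def boxplot_parts(data, q1, q3):
    """Compute the box, fences, whisker ends and outliers for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr          # step 2b
    upper_fence = q3 + 1.5 * iqr          # step 2a
    # Step 3: whiskers reach the most extreme values within the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Step 4: values beyond the fences are outliers.
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    # Step 4a: far outliers lie more than 3 IQRs from the quartiles.
    far = [x for x in outliers if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    return {
        "box": (q1, q3),
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
        "far_outliers": far,
    }

# Years until death, Disease X, with the quartiles given on the slides
years = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]
parts = boxplot_parts(years, q1=2.3, q3=4.2)
print(parts["whiskers"], parts["outliers"])  # (0.6, 6.1) []
```

With these data every value lies within the fences, so the whiskers run all the way to the minimum and maximum and no outlier symbols are drawn.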
Boxplot: display of 5-number summary
Data (rank 25 down to rank 1): 7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6
[Figure: boxplot of years until death for Disease X (vertical axis 0 to 8), drawn beside the ranked data]
Largest = max = 7.9
Distance to Q3: 7.9 − 4.2 = 3.7
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3
Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9
1.5 * IQR = 1.5*1.9=2.85. Individual #25 has a value of
7.9 years, which is 3.7 years above the third quartile.
This is more than 2.85 = 1.5*IQR above Q3. Thus,
individual #25 is a suspected outlier.
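The arithmetic for individual #25 can be verified with a few lines of Python. This is only a check of the numbers quoted on the slide; the variable names are illustrative, and the results are rounded for display because decimal values such as 4.2 and 2.3 are not stored exactly in floating point.

```python
q1, q3 = 2.3, 4.2      # quartiles from the slide
largest = 7.9          # value for individual #25

iqr = q3 - q1          # interquartile range, about 1.9
threshold = 1.5 * iqr  # about 2.85
distance = largest - q3  # about 3.7

is_suspected_outlier = distance > threshold   # beyond the upper fence
is_far_outlier = distance > 3 * iqr           # more than 3 IQRs above Q3

print(round(iqr, 2), round(threshold, 2), round(distance, 2))  # 1.9 2.85 3.7
print(is_suspected_outlier, is_far_outlier)                    # True False
```

The value 7.9 lies beyond the upper fence but less than 3 IQRs (5.7 years) above Q3, so under the construction rules it is a suspected outlier, not a far outlier.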
ATM Withdrawals by Day, Month, Holidays
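Beginning of class pulses (n = 138)
• Q1 = 63, Q3 = 78
• IQR = 78 − 63 = 15
• 1.5(IQR) = 1.5(15) = 22.5
• Q1 − 1.5(IQR): 63 − 22.5 = 40.5
• Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Figure: boxplot of the pulse data with marked values 40.5 (lower fence), 45, 63, 70, 78, 100.5 (upper fence)]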
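The fence calculation for the pulse data can be sketched as follows. The tail values are an assumption: they are read off the stem-and-leaf display of the pulse rates (the three lowest pulses and the pulses of 95 or more), since only values near the ends of the data can fall outside the fences. The variable names are illustrative.

```python
q1, q3 = 63, 78
iqr = q3 - q1                    # 15
lower_fence = q1 - 1.5 * iqr     # 40.5
upper_fence = q3 + 1.5 * iqr     # 100.5

# Extreme pulses as read from the stem-and-leaf display (assumed reading)
low_tail = [45, 48, 48]
high_tail = [95, 98, 100, 102, 102, 103, 111]

# Whiskers stop at the most extreme values still within the fences.
lower_whisker = min(x for x in low_tail if x >= lower_fence)
upper_whisker = max(x for x in high_tail if x <= upper_fence)
outliers = [x for x in low_tail + high_tail
            if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence)                 # 40.5 100.5
print(lower_whisker, upper_whisker, outliers)   # 45 100 [102, 102, 103, 111]
```

Under this reading the lower whisker ends at the minimum, 45, the upper whisker ends at 100, and the four pulses above 100.5 would be drawn as outliers.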
Below is a boxplot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Figure: boxplot titled “Pass Catching Yards by Receivers”; axis from 0 to 1369]
1. 450
2. 750
3. 215
4. 545
Rock concert deaths: histogram and boxplot
Automating Boxplot Construction
• Excel “out of the box” does not draw boxplots.
• Many add-ins are available on the internet that give Excel the capability to draw boxplots.
• Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.
Statcrunch Boxplot
[Figure: Statcrunch boxplot of the 25 ranked data values (7.9 down to 0.6)]
Largest = max = 7.9
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3
Tuition 4-yr Colleges
Statcrunch: 2012-13 NFL Salaries by Position
College Football Head Coach Salaries by Conference
2013 Major League Baseball Salaries by Team
TA-DAAA! The End