Understanding and Comparing Distributions

Another Useful Graphical Method: Boxplots

Pulse Rates (n = 138)

Stem-and-leaf display (# = number of leaves on each stem; stems 4* and 10. have no leaves):

  #    Stem   Leaves
  3    4.     588
  9    5*     001233444
  10   5.     5556788899
  23   6*     00011111122233333344444
  23   6.     55556666667777788888888
  16   7*     00000112222334444
  23   7.     55555666666777888888999
  10   8*     0000112224
  10   8.     5555667789
  4    9*     0012
  2    9.     58
  4    10*    0223
  1    11*    1

Median: mean of the pulses in ordered positions 69 and 70: median = (70 + 70)/2 = 70
Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Recall the 5-number summary of data

  Minimum   Q1   median   Q3   maximum

Pulse data 5-number summary

  45   63   70   78   111

A boxplot is a graphical display of the 5-number summary.

Example

  Data rank   Count from end/median   Value
  25          1                       6.1
  24          2                       5.6
  23          3                       5.3
  22          4                       4.9
  21          5                       4.7
  20          6                       4.5
  19          7                       4.2   (Q3)
  18          6                       4.1
  17          5                       3.9
  16          4                       3.8
  15          3                       3.7
  14          2                       3.6
  13          1                       3.4   (median)
  12          2                       3.3
  11          3                       2.9
  10          4                       2.8
  9           5                       2.5
  8           6                       2.3
  7           7                       2.3   (Q1)
  6           6                       2.1
  5           5                       1.5
  4           4                       1.9
  3           3                       1.6
  2           2                       1.2
  1           1                       0.6

Consider the data shown in the table.
– The data values 6.1, 5.6, …, are in the right column.
– They are arranged in decreasing order from 6.1 (data rank of 25 in the far left column) to 0.6 (data rank of 1 in the far left column).
– The center column counts positions from each end of the data and from the overall median; the quartiles sit at count 7 and the overall median is the value with data rank 13.

Boxplot: display of 5-number summary

Figure: boxplot of years until death for Disease X (the 25 values above; vertical axis from 0 to 7), labeled with:
  Largest = max = 6.1
  Q3 = third quartile = 4.2
  m = median = 3.4
  Q1 = first quartile = 2.3
  Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 “crush” victims at rock concerts 1999-2000.
5-number summary: 13 17 19 22 47

Boxplot construction

1) Construct a box with ends located at Q1 and Q3; in the box, mark the location of the median (usually with a line or a “+”).
2) Fences are determined by moving a distance 1.5(IQR) from each end of the box, where IQR = Q3 − Q1 is the interquartile range:
   2a) the upper fence is 1.5*IQR above the upper quartile;
   2b) the lower fence is 1.5*IQR below the lower quartile.
Note: the fences only help with constructing the boxplot; they do not appear in the final boxplot display.

Box plot construction (cont.)

3) Whiskers: draw lines from the ends of the box left and right to the most extreme data values found within the fences.
4) Outliers: special symbols represent each data value beyond the fences;
   4a) sometimes a different symbol is used for “far outliers” that are more than 3 IQRs from the quartiles.

Boxplot: display of 5-number summary

Data (years until death, Disease X, data rank 25 down to data rank 1):
  7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Figure: boxplot of years until death for Disease X (vertical axis from 0 to 8), labeled with:
  Largest = max = 7.9
  Distance to Q3: 7.9 − 4.2 = 3.7
  Q3 = third quartile = 4.2
  Q1 = first quartile = 2.3
  Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9
  1.5 * IQR = 1.5 * 1.9 = 2.85

Individual #25 has a value of 7.9 years, which is 3.7 years above the third quartile. This is more than 1.5*IQR = 2.85 above Q3. Thus, individual #25 is a suspected outlier.

Figure: ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138)

  Q1 = 63, Q3 = 78
  IQR = 78 − 63 = 15
  1.5(IQR) = 1.5(15) = 22.5
  Q1 − 1.5(IQR): 63 − 22.5 = 40.5
  Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Figure: boxplot of the pulse data, with 40.5 (lower fence), 45 (minimum), 63 (Q1), 70 (median), 78 (Q3), and 100.5 (upper fence) marked.

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: boxplot of Pass Catching Yards by Receivers (horizontal axis from 0 to 1369)
Answer choices: 450, 750, 215, 545

Figure: Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel does not draw boxplots “out of the box.” Many add-ins available on the internet give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

Statcrunch Boxplot

Figure: Statcrunch boxplot of the Disease X data (25 values), labeled with:
  Largest = max = 7.9
  Q3 = third quartile = 4.2
  Q1 = first quartile = 2.3

Figure: Tuition 4-yr Colleges
Figure: Statcrunch: 2012-13 NFL Salaries by Position
Figure: College Football Head Coach Salaries by Conference
Figure: 2013 Major League Baseball Salaries by Team
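
The construction steps from these slides (five-number summary, fences at 1.5*IQR beyond the quartiles, whiskers, outliers) can also be computed directly. The following is a minimal Python sketch and not the method used by Statcrunch or any Excel add-in. The names five_number_summary, fences, and boxplot_stats are hypothetical. It assumes Q1 and Q3 are the medians of the lower and upper halves, with the overall median included in both halves when n is odd, which agrees with the quartiles at count 7 in the 25-value Disease X example. Other software may use a different quartile rule and give slightly different values.

```python
from statistics import median

# Disease X example with largest value 7.9 (years until death, 25 individuals).
DISEASE_X = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
             3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median belongs to both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half (shares the median if n is odd)
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])


def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence), each k*IQR beyond a quartile."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_stats(values):
    """Collect the quantities needed to draw a boxplot by hand."""
    data = sorted(values)
    _, q1, med, q3, _ = five_number_summary(data)
    lower_fence, upper_fence = fences(q1, q3)
    # Whiskers end at the most extreme data values inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": q3 - q1,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whiskers": (inside[0], inside[-1]),
        # Suspected outliers are the data values beyond the fences.
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


if __name__ == "__main__":
    print(five_number_summary(DISEASE_X))        # (0.6, 2.3, 3.4, 4.2, 7.9)
    print(boxplot_stats(DISEASE_X)["outliers"])  # [7.9]
    print(fences(63, 78))                        # pulse data: (40.5, 100.5)
```

For the Disease X data the sketch flags 7.9 as the only suspected outlier and ends the upper whisker at 6.1. For the pulse data, fences(63, 78) reproduces the fences 40.5 and 100.5.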