Box and Whisker Plots

WHY?
When you want to compare two or more sets of data, Box and Whisker Plots can be used to show the differences between them easily.

HOW?
To create a Box and Whisker Plot, you only need to know how to calculate the median of a data set.

To calculate the median, arrange your data from lowest to highest and select the middle value. If you have an even number of observations, you will have two values in the middle. Can you see why? If this is the case, the median is the average of these two numbers.

o For example: if you have 50 observations, then the median will be the average of the 25th and 26th observations in your list.
o
   .....  23rd    24th    25th    26th    27th  .....
          165cm   167cm   168cm   170cm   170cm

   Median = (168 + 170) / 2 = 169 cm

You now need to calculate the median of the lower half of the data and the median of the upper half of the data (called the lower and upper quartiles respectively). The Interquartile Range (IQR) is the difference between the upper and lower quartiles.

The box shows the middle half of the data, between the lower and upper quartiles, with the median marked as a solid line across the box. The whiskers show the range of the data. (To help identify any data that fall outside the overall pattern, any observations more than 1.5 times the Interquartile Range below the lower quartile or above the upper quartile are plotted individually. These are called outliers.)

Below are two sets of data from the UK CensusAtSchool. Both are from a class of year 11 pupils asked to give their height to the nearest centimetre. Set F are the females and set M are the males.

Set F
156 172 170 163 166 172 156 174 164 173 174 170
175 180 164 165 172 160 167 177 173 157 168 173
177 150 158 174 165 170 170 168 173 177

Set M
173 178 176 193 165 170 176 168 186 183 170 182
174 180 174 180 166 175 187 173 173 185 176 179
180 183 190 179 178 174

EXAMPLE
For Set F
1. Order the data from smallest to largest (34 observations):

150 156 156 157 158 160 163 164 164 165 165 166
167 168 168 170 170 170 170 172 172 172 173 173
173 173 174 174 174 175 177 177 177 180

2. Median = (170 + 170) / 2 = 170 cm
   Lower quartile = 164 cm
   Upper quartile = 173 cm
   IQR = 9 cm
   Range = 180 - 150 = 30 cm

[Figure: Boxplot of Female; height axis from 150 to 180]

Use the figures given to check the outlier shown: using the IQR rule, the lower quartile minus 1.5 times the IQR is 164 - (1.5 × 9) = 150.5 cm, so the height of 150 cm falls below this limit and is plotted as an outlier.

Now construct a Box and Whisker Plot for set M and compare it with set F. What are your conclusions? ****

Note that there are no outliers in set M. Check for yourself that the lower and upper limits for outliers are 159.5 cm and 195.5 cm respectively.

You may wish to record your class heights and compare these with the year 11 UK students.

Additional thoughts
What do you think would happen to the median of the UK year 11 boys if the tallest boy was actually 213 cm? Do you think it would change? This height would be considered an outlier using the IQR rule. This should tell you something about the median, and whether it is affected by outliers. What about the mean? Does this change? Which would be the 'better' measure of centre? See if you can find out when you should use the mean instead of the median, and when the two values will be the same.

**** For Set M
   Median = 177 cm
   Lower quartile = 173 cm
   Upper quartile = 182 cm
   IQR = 9 cm
   Range = 193 - 165 = 28 cm
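
If you would like to check your working on a computer, the steps described under HOW? can be sketched in Python as shown here. This is only an illustrative sketch, not part of the original worksheet: the names box_summary, SET_F, SET_M and taller_m are made up for this example. The worksheet does not say what to do with the middle value when the number of observations is odd, so the sketch assumes it is left out of both halves (both sets here have an even number of observations, so this choice does not affect the results).

```python
from statistics import mean, median

# Heights in cm, copied from the worksheet.
SET_F = [156, 172, 170, 163, 166, 172, 156, 174, 164, 173, 174, 170,
         175, 180, 164, 165, 172, 160, 167, 177, 173, 157, 168, 173,
         177, 150, 158, 174, 165, 170, 170, 168, 173, 177]
SET_M = [173, 178, 176, 193, 165, 170, 176, 168, 186, 183, 170, 182,
         174, 180, 174, 180, 166, 175, 187, 173, 173, 185, 176, 179,
         180, 183, 190, 179, 178, 174]


def box_summary(data):
    """Return the figures needed to draw a Box and Whisker Plot."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Assumption: with an odd count, the middle value is left out of both halves.
    upper_half = values[half + n % 2:]
    lower_quartile = median(lower_half)
    upper_quartile = median(upper_half)
    iqr = upper_quartile - lower_quartile
    # IQR rule: anything beyond these limits is plotted as an outlier.
    lower_limit = lower_quartile - 1.5 * iqr
    upper_limit = upper_quartile + 1.5 * iqr
    return {
        "median": median(values),
        "lower_quartile": lower_quartile,
        "upper_quartile": upper_quartile,
        "iqr": iqr,
        "range": values[-1] - values[0],
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": [v for v in values if v < lower_limit or v > upper_limit],
    }


summary_f = box_summary(SET_F)
summary_m = box_summary(SET_M)
for name, summary in (("Set F", summary_f), ("Set M", summary_m)):
    print(name, summary)

# Additional thoughts: pretend the tallest boy (193 cm) was actually 213 cm.
taller_m = [213 if height == 193 else height for height in SET_M]
print("Median before and after:", median(SET_M), median(taller_m))
print("Mean before and after:", mean(SET_M), mean(taller_m))
```

Running this should reproduce the figures given above for both sets, including the single outlier of 150 cm in set F and the outlier limits of 159.5 cm and 195.5 cm for set M. The last two lines of output let you compare how the median and the mean respond when the tallest height is changed to 213 cm.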