Mr Barton’s Maths Notes
Stats and Probability
4. Cumulative Frequency and Box Plots
www.mrbartonmaths.com

Why do we bother with Statistical Diagrams?

• The answer to this question is similar to the one for: “why do we bother working out averages and measures of spread?”.
• We live in a world jam-packed full of statistics. If we were forced to look at all the facts and figures in their raw, untreated form, not only would we probably not be able to make any sense of them, but there is also a very good chance our heads would explode.
• Statistical Diagrams, if they are done properly, present those figures in a clear, concise, visually pleasing way. This allows us to make some sense of the figures, summarise them, and compare them to other sets of data.

1. What is Cumulative Frequency?

Cumulative is just a posh way of saying “add up as you go along”.
Frequency is just a posh word for “total”.
So… if you put them together, you get a very posh way of saying “add the totals up as you go along”.

Big Example

Below is a table showing how long a group of 40 Year 10 students spent playing on the Nintendo Wii during a gloomy week in January. Draw a Cumulative Frequency Curve, use it to estimate the Median and Inter-Quartile Range, and construct a Box Plot.

Hours spent playing (h)    Frequency
0 < h ≤ 1                  2
1 < h ≤ 2                  5
2 < h ≤ 3                  10
3 < h ≤ 4                  15
4 < h ≤ 6                  5
6 < h ≤ 10                 3

2. Adding a Cumulative Frequency Column

Before you can even start thinking about drawing a Cumulative Frequency Curve, you need to be able to add a Cumulative Frequency column to your Frequency table. Remember, Cumulative Frequency just means that you add up the frequencies as you go along, so that is exactly what you do!

Hours spent playing (h)    Frequency    Cumulative Frequency
0 < h ≤ 1                  2            2
1 < h ≤ 2                  5            7
2 < h ≤ 3                  10           17
3 < h ≤ 4                  15           32
4 < h ≤ 6                  5            37
6 < h ≤ 10                 3            40

Check: This final entry should always equal the total frequency! (2 + 5 + 10 + 15 + 5 + 3 = 40)
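If you want to check your column, or build one for a bigger table, the running total can be sketched in a few lines of Python. This is a minimal illustration added to these notes rather than part of the method you need in the exam. Storing each group as a (label, frequency) pair is just an assumption made for the example.

```python
from itertools import accumulate

# Wii example: (group, frequency), with groups written as "lower < h <= upper"
groups = [("0 < h <= 1", 2), ("1 < h <= 2", 5), ("2 < h <= 3", 10),
          ("3 < h <= 4", 15), ("4 < h <= 6", 5), ("6 < h <= 10", 3)]

frequencies = [freq for _, freq in groups]

# "Add the totals up as you go along": a running total of the frequencies
cum_freq = list(accumulate(frequencies))

# Print the table with its new Cumulative Frequency column
for (group, freq), cf in zip(groups, cum_freq):
    print(f"{group:<12}{freq:>4}{cf:>6}")

# Check: the final entry should always equal the total frequency
assert cum_freq[-1] == sum(frequencies)
print(cum_freq)  # [2, 7, 17, 32, 37, 40]
```

`accumulate` does exactly what you do by hand: each entry is the previous entry plus the next frequency.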
Reading down the Cumulative Frequency column of the table:
• 2 is the number of people who play for 1 hour or less
• 7 is the number of people who play for 2 hours or less (2 + 5)
• 17 is the number of people who play for 3 hours or less (2 + 5 + 10)

3. Drawing the Cumulative Frequency Curve

Remember: we plot Cumulative Frequency (y axis) against the upper boundary of each group (x axis).
So… for group one it’s 1 on the x axis and 2 on the y axis, and for group two it’s 2 on the x axis and 7 on the y axis… That gives the points (1, 2), (2, 7), (3, 17), (4, 32), (6, 37) and (10, 40).

Things to notice about the Cumulative Frequency Curve:
1. When you have finished plotting the points, join them up with a smooth curve.
2. Notice the curve starts at (0, 0). This is because nobody plays for less than 0 hours a week!
3. You must label your axes correctly, or you lose very easy marks!

[Figure: Cumulative Frequency Curve. y axis: Cumulative Frequency (0 to 40); x axis: Time Spent Playing Wii (hours) (0 to 10)]

4. Estimating the Median and Inter-Quartile Range

We have spent a while drawing our Cumulative Frequency Curve, so we may as well use it. Very quickly we can come up with estimates for the Median and the Inter-Quartile Range.

(a) Median
As you hopefully remember, the Median is the MIDDLE value. To find it we:
1. Work out 50% of our total frequency (half way up the y axis)
2. Draw a horizontal line across until it hits our curve
3. When it hits the curve, draw a vertical line down to the x axis
4. The value on the x axis is our Median

(b) Inter-Quartile Range
For this we need to work out the upper quartile (UQ) and the lower quartile (LQ), and then calculate: UQ - LQ
To find the Upper Quartile:
1. Work out 75% of our total frequency (three-quarters of the way up the y axis)
2. Draw a horizontal line across until it hits our curve
3. When it hits the curve, draw a vertical line down to the x axis
4. The value on the x axis is our Upper Quartile
The Lower Quartile is the same, but 25% (one-quarter) of the way up!

[Figure: the same Cumulative Frequency Curve, with horizontal lines drawn across at cumulative frequencies 10, 20 and 30 and vertical lines dropped to the x axis]

Median: 50% of 40 = 20, so Median = 3.2 hours
Upper Quartile: 75% of 40 = 30, so UQ = 3.8 hours
Lower Quartile: 25% of 40 = 10, so LQ = 2.4 hours
Inter-Quartile Range = UQ - LQ = 3.8 - 2.4 = 1.4 hours

Remember: The Median is a form of average. Just like the Range, the Inter-Quartile Range is a measure of spread (how consistent the data is).

5. Drawing Box Plots

Box Plots are another way of showing the key information that can be read off a Cumulative Frequency graph: the lowest value, the lower quartile, the median, the upper quartile and the highest value.

Top Tip: if you have the chance, draw your box plot directly below your cumulative frequency graph, using the same scale on the x axis. Then you can just extend the vertical lines downwards and save yourself a lot of time!

[Figure: a labelled box plot. The box runs from the Lower Quartile to the Upper Quartile, with the Median marked inside it, and the whiskers run out to the Lowest value and the Highest value. The width of the box is the Inter-Quartile Range; the full width is the Range]

Note: The minimum value is the lowest possible value of your first group, and the maximum value is the highest possible value of your last group.

[Figure: the box plot drawn directly below the Cumulative Frequency Curve, on the same x axis, Time Spent Playing Wii (hours), 0 to 10]
Min Value = 0, LQ = 2.4, Median = 3.2, UQ = 3.8, Max Value = 10

Good luck with your revision!
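The “go across, then down” steps for the Median and Quartiles, together with the five values a Box Plot needs, can be sketched in Python as a way of checking your graph readings. This sketch deliberately swaps in a different technique. The notes read values off a hand-drawn smooth curve, whereas this code joins the plotted points with straight lines (linear interpolation). The helper names `cf_points`, `read_off` and `five_number_summary` are made up for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate

# Wii example: (lower boundary, upper boundary, frequency) for each group
groups = [(0, 1, 2), (1, 2, 5), (2, 3, 10), (3, 4, 15), (4, 6, 5), (6, 10, 3)]

def cf_points(groups):
    """Points to plot: (upper boundary, cumulative frequency),
    starting at (lowest boundary, 0)."""
    cum_freq = accumulate(freq for _, _, freq in groups)
    return [(groups[0][0], 0)] + [
        (upper, cf) for (_, upper, _), cf in zip(groups, cum_freq)
    ]

def read_off(points, target_cf):
    """Go across at height target_cf, then down to the x axis.
    Assumption: neighbouring points are joined by straight lines, not a
    smooth curve. target_cf must lie between 0 and the total frequency."""
    heights = [cf for _, cf in points]
    i = bisect_left(heights, target_cf)  # first point at or above target_cf
    if i == 0:
        return points[0][0]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return x0 + (target_cf - y0) / (y1 - y0) * (x1 - x0)

def five_number_summary(groups):
    points = cf_points(groups)
    total = points[-1][1]
    return {
        "min": groups[0][0],                       # lowest possible value of first group
        "LQ": read_off(points, 0.25 * total),      # 25% of the way up
        "median": read_off(points, 0.50 * total),  # 50% of the way up
        "UQ": read_off(points, 0.75 * total),      # 75% of the way up
        "max": groups[-1][1],                      # highest possible value of last group
    }

summary = five_number_summary(groups)
iqr = summary["UQ"] - summary["LQ"]
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"IQR: {iqr:.2f}")
# Expected lines: min: 0.00 | LQ: 2.30 | median: 3.20 | UQ: 3.87 | max: 10.00 | IQR: 1.57
```

The straight-line version gives LQ = 2.3, Median = 3.2, UQ ≈ 3.87 and IQR ≈ 1.57 hours. These are close to, but not the same as, the readings off the smooth curve (2.4, 3.2, 3.8 and 1.4). Both sets are only estimates, because grouped data hides each student’s exact time. The five values in `summary` are exactly the ones you mark on a Box Plot.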