S2 Chapter 5: Normal Approximations
Dr J Frost (jfrost@tiffin.kingston.sch.uk)
www.drfrostmaths.com
Last modified: 30th August 2015

RECAP :: Poisson → Binomial

1. We saw earlier that, conceptually speaking, a Poisson Distribution is a Binomial Distribution in disguise. Why?
The Poisson Distribution counts the occurrences of an event in a time period. We can divide the time period into a large number of time intervals, each a trial in which the event can occur or not occur (i.e. two outcomes). The count of events is then Binomially Distributed, and as the time intervals become infinitely small (i.e. $n$ larger and $p$ smaller), we get the Poisson Distribution in the limit.

2. If $X \sim B(n, p)$, then what distribution could we use to approximate this?
$X \sim Po(np)$

3. Under what conditions is the approximation sufficiently accurate?
When $n$ is large and $p$ is small. The rule of thumb is $np \le 10$.

$$p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

Calculating probabilities for a Poisson is fine if $x$ is small. But if $x$ is large, $x!$ is difficult to calculate. Is there an easier approximation to use?

Binomial → ?

[Graph: probability functions $p(x)$ against $x$ (number of successes) for different Binomial Distributions.]

The graph shows the probability function for different Binomial Distributions. Which one resembles another distribution, and what distribution does it resemble?
When $p = 0.5$ and $n$ is fairly large, it resembles a normal distribution, in part because it's symmetrical. This is not a coincidence: if you do S3 you'll see that it arises from something called the Central Limit Theorem.

This chapter is about how we can approximate either the Binomial or the Poisson Distribution using the Normal Distribution.

Continuity Corrections

We wish to approximate the Binomial and Poisson distributions using a Normal distribution. One problem is that the Binomial and Poisson are discrete, whereas the Normal distribution is continuous. We apply something called a continuity correction to approximate a discrete distribution using a continuous one.
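As a rough numerical check of the graph observation and of the continuity-correction idea, the sketch below compares an exact Binomial probability $P(X = k)$ with the area under a normal curve between $k - 0.5$ and $k + 0.5$. It is an illustration only: the choice $n = 20$, $p = 0.5$ and the helper names are assumptions made for this sketch, and the normal curve is simply given the Binomial's mean $np$ and variance $np(1 - p)$.

```python
from math import comb, erf, sqrt


def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def normal_estimate_of_point(k, n, p):
    """Area under the matching normal curve between k - 0.5 and k + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


# Illustrative values only (not taken from the slides).
n, p = 20, 0.5
for k in (8, 10, 12):
    exact = binomial_pmf(k, n, p)
    estimate = normal_estimate_of_point(k, n, p)
    print(f"k={k}: exact={exact:.4f}, normal area={estimate:.4f}")
```

With $p = 0.5$ the two columns should agree closely; trying a smaller $p$ with the same $n$ is a quick way to see the agreement worsen as the Binomial becomes skewed.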
The random variable $X$ represents the time to finish a race in a whole number of hours. We're interested in the probability that Alice took 6 hours. How would you represent this time on a number line, given that hours is discrete? And what if hours were now considered to be continuous (as $Y$)?

[Number line from 3 to 10]
Discrete: $X = 6$
Continuous: $5.5 < Y < 6.5$

We can't just find $P(Y = 6)$ when $Y$ is continuous, because the probability is effectively 0. But $P(5.5 < Y < 6.5)$ is a sensible interval to use, because any time between 5.5 and 6.5 would have been rounded to 6 hours were it discrete.

If $X$ is a discrete variable and $Y$ is its continuous equivalent, how would you represent $P(X \ge 5)$ for $Y$?
Discrete: $X \ge 5$
Continuous: $Y \ge 4.5$
Notice the range has been enlarged by an extra 0.5.

How would you represent $P(X < 9)$ for $Y$?
Discrete: $X < 9 \rightarrow X \le 8$
Continuous: $Y \le 8.5$

! A continuity correction is approximating a discrete range using a continuous one.
1. If > or <, convert to ≥, ≤ first.
2. Enlarge the range (at each end) by 0.5.

Examples
Discrete → Continuous
$P(X \le 7)$ → $P(Y \le 7.5)$
$P(X < 10) = P(X \le 9)$ → $P(Y \le 9.5)$
$P(X > 9) = P(X \ge 10)$ → $P(Y \ge 9.5)$
$P(1 \le X \le 10)$ → $P(0.5 \le Y \le 10.5)$
$P(3 < X < 6) = P(4 \le X \le 5)$ → $P(3.5 \le Y \le 5.5)$
$P(3 \le X < 6)$ → $P(2.5 \le Y \le 5.5)$
$P(3 < X \le 6)$ → $P(3.5 \le Y \le 6.5)$
$P(X = 3)$ → $P(2.5 \le Y \le 3.5)$

Binomial → Normal

To approximate a Binomial Distribution as a Normal Distribution, we just copy over the mean and variance of the Binomial to the Normal. Sorted!

[Graphs: Binomial probability function $p(x)$ and Normal density $f(x)$, each plotted against $x$.]

! If $n$ is large and $p$ is close to 0.5, $X \sim B(n, p)$ can be approximated by $Y \sim N(np, np(1 - p))$.
Thus $\sigma = \sqrt{np(1 - p)}$.

Bro Tip: It's a common error to forget to square root the variance to get the standard deviation, which is what we need when calculating probabilities.

Example
$X \sim B(20, 0.4)$.
a) Use tables to find $P(X \le 6)$ exactly.
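Where tables are not to hand, the exact value asked for in part (a) can also be found by summing the Binomial probability function directly. The following is a minimal sketch of that sum; the function name `binomial_cdf` is a hypothetical helper introduced here, not something from the slides or the textbook.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), summing the probability function term by term."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


# Part (a): X ~ B(20, 0.4), exact P(X <= 6).
exact = binomial_cdf(6, 20, 0.4)
print(f"P(X <= 6) = {exact:.4f}")
```

The printed figure can be compared against the tables value for part (a).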
$P(X \le 6) = 0.25$
b) Use a normal approximation to estimate $P(X \le 6)$.
$X \sim B(20, 0.4) \rightarrow Y \sim N(8, 4.8)$
$$P(Y < 6.5) = P\left(Z < \frac{6.5 - 8}{\sqrt{4.8}}\right) = P(Z < -0.6846\ldots) = 1 - P(Z < 0.6846\ldots) = 0.2483$$
Notes:
Whether we write < or ≤ doesn't matter at this point: $Y$ is continuous, so the probability is not affected.
We've standardised as per S1 to get $Z$.
Recall that either changing the direction of the inequality or changing the sign gives us $1 - \cdots$. We need < and a positive $z$ value to use the table.
The value 0.2483 comes from tables with $z$ rounded to 0.68.

Test Your Understanding

$X \sim B(40, 0.6)$. Use a normal approximation to estimate $P(X \ge 20)$.
$X \sim B(40, 0.6) \rightarrow Y \sim N(24, 9.6)$
$$P(Y > 19.5) = P\left(Z > \frac{19.5 - 24}{\sqrt{9.6}}\right) = P(Z > -1.452\ldots) = P(Z < 1.452\ldots) = 0.9265$$
We both changed > to < and changed the sign, so the two $1 - (1 - \cdots)$ steps cancel each other out.

$X \sim B(120, 0.25)$. Use a normal approximation to estimate $P(35 \le X \le 45)$.
$X \sim B(120, 0.25) \rightarrow Y \sim N(30, 22.5)$
$$P(34.5 \le Y \le 45.5) = P(Y \le 45.5) - P(Y \le 34.5) = P\left(Z < \frac{45.5 - 30}{\sqrt{22.5}}\right) - P\left(Z < \frac{34.5 - 30}{\sqrt{22.5}}\right) = 0.1705$$

Exercise 5B

1. $X \sim B(150, \frac{1}{3})$. Use a suitable approximation to estimate:
a) $P(X \le 40) = 0.0495$
b) $P(X > 60) = 0.0344$
c) $P(45 \le X \le 60) = 0.7945$

2. $X \sim B(100, 0.65)$. Use a suitable approximation to estimate:
a) $P(X > 58) = 0.913$
b) $P(60 < X \le 72) = 0.768$
c) $P(X = 70) = 0.0485$

3. In a multiple choice test there are 4 possible answers to each question. Given that there are 60 questions on the paper, use a suitable approximation to estimate the probability of getting more than 20 questions correct if the answer to each question is chosen at random from the 4 available choices.
$0.0505$

Poisson → Normal

Again, we want the same mean and variance for the Normal as for the original Poisson.

! If $\lambda$ is large, $X \sim Po(\lambda)$ can be approximated by $Y \sim N(\lambda, \lambda)$.
Thus $\sigma = \sqrt{\lambda}$.

Test Your Understanding

$X \sim Po(25)$. Determine $P(18 \le X < 35)$.
$X \sim Po(25) \rightarrow Y \sim N(25, 25)$
$$P(17.5 \le Y \le 34.5) = P(Y \le 34.5) - P(Y \le 17.5) = P\left(Z < \frac{34.5 - 25}{\sqrt{25}}\right) - P\left(Z < \frac{17.5 - 25}{\sqrt{25}}\right)$$
$$= P(Z < 1.9) - P(Z < -1.5) = 0.9713 - (1 - 0.9332) = 0.9045$$

Exam Example (Edexcel S2 Jan 2012)
$X \sim Po(5)$
(300 hits per hour is 50 hits per 10 mins)

All Approximations Summarised!

This handy diagram (found in your textbook) summarises all the approximations very nicely. It will help you decide what to do when the exam question says “Using a suitable approximation…”

Discrete: Binomial $B(n, p)$ and Poisson $Po(\lambda)$. Continuous: Normal $N(\mu, \sigma^2)$.
Binomial $B(n, p)$: is $np \le 10$ (large $n$, small $p$)?
YES → Poisson $Po(\lambda)$, with $\lambda = np$
NO → Normal $N(\mu, \sigma^2)$, with $\mu = np$ and $\sigma^2 = np(1 - p)$
Poisson $Po(\lambda)$ → Normal $N(\mu, \sigma^2)$, with $\mu = \lambda$ and $\sigma^2 = \lambda$
A continuity correction is needed whenever the line between discrete and continuous is crossed.

On the next few slides, read the question and vote with your diaries on what the original distribution is and what the new approximating one is. The options are $B \rightarrow N$, $B \rightarrow Po$ and $Po \rightarrow N$.

Q: A spinner is designed to land on red 10% of the time. Use a suitable approximation to estimate the probability of fewer than 4 reds in 60 turns of the spinner.
$B \rightarrow Po$, because $np = 6$, which is less than 10.

Q: A spinner is designed to land on red 10% of the time. Use a suitable approximation to estimate the probability of more than 20 reds in 150 turns of the spinner.
$B \rightarrow N$, because $np = 15 > 10$. We check whether a Poisson approximation is appropriate first. Even though $p$ is not close to 0.5, we use the normal approximation anyway!

Q: A Royal Mail processing centre receives on average 1200 letters a minute. Use a suitable approximation to work out the probability that it receives more than 1500 in a given minute.
$Po \rightarrow N$

Exercises

Edexcel S2 Jan 2013 Q2
In a village, power cuts occur randomly at a rate of 3 per year.
a) Find the probability that in any given year there will be (i) exactly 7 power cuts, (ii) at least 4 power cuts. (5 marks)
(i) $0.0216$ (ii) $0.3528$
b) Use a suitable approximation to find the probability that in the next 10 years the number of power cuts will be less than 20. (6 marks)
Between $0.0274$ and $0.0276$

Edexcel S2 May 2012 Q4
The number of houses sold by an estate agent follows a Poisson distribution, with a mean of 2 per week. The estate agent will receive a bonus if he sells more than 25 houses in the next 10 weeks.
(c) Use a suitable approximation to estimate the probability that the estate agent receives a bonus.
$X \sim Po(20)$ is approximated by $Y \sim N(20, 20)$, so $P(X > 25) \approx P(Y > 25.5) = 0.1093$ (or $0.1094$)

Edexcel S2 May 2011 Q5
Defects occur at random in planks of wood with a constant rate of 0.5 per 10 cm length. Jim buys a plank of length 100 cm.
a. Find the probability that Jim's plank contains at most 3 defects. (2 marks)
$X \sim Po(5)$; $P(X \le 3) = 0.2650$
b. Shivani buys 6 planks each of length 100 cm. Find the probability that fewer than 2 of Shivani's planks contain at most 3 defects. (5 marks)
$Y$ = number of planks with at most 3 defects
$Y \sim B(6, 0.265)$
$P(Y \le 1) = 0.4987$
c. Using a suitable approximation, estimate the probability that the total number of defects on Shivani's 6 planks is less than 18. (6 marks)
$T \sim Po(30)$, $T \approx S \sim N(30, 30)$
$P(T < 18) = P(S < 17.5) = 0.01123$
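The last part can be checked numerically. The sketch below applies the continuity correction and the $N(\lambda, \lambda)$ approximation to $T \sim Po(30)$, and also sums the exact Poisson probabilities for comparison. The helper names are hypothetical, and the normal distribution function is written with Python's `math.erf` rather than read from tables, so the last digit may differ slightly from a tables-based answer.

```python
from math import erf, exp, factorial, sqrt


def normal_cdf(x, mu, sigma):
    """P(S <= x) for S ~ N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def poisson_cdf(k, lam):
    """Exact P(T <= k) for T ~ Po(lam), summing the probability function."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(k + 1))


def poisson_less_than_via_normal(k, lam):
    """Estimate P(T < k): T < k means T <= k - 1, so the corrected bound is k - 0.5."""
    return normal_cdf(k - 0.5, lam, sqrt(lam))


lam = 30  # total rate for Shivani's 6 planks
approx = poisson_less_than_via_normal(18, lam)
exact = poisson_cdf(17, lam)
print(f"normal approximation: {approx:.5f}")
print(f"exact Poisson sum:    {exact:.5f}")
```

Printing both values side by side shows how far the approximation sits from the exact Poisson probability this far out in the tail.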