Mathematics for Computer Science
MIT 6.042J/18.062J

Sampling & Confidence

Albert R Meyer, May 10, 2010

Sampling

Estimate % contaminated fish in Charles River?
Procedure: catch n fish, test each, use % contaminated in catch as estimate of % contaminated in whole river.

Model as Coin Tosses

p ::= fraction contaminated in river
test a fish: toss a coin with bias p
catch n fish: toss n coins
$A_n$ ::= fraction contaminated in the sample of n

Sampling Questions

Catch 500 fish; what is the probability that the estimate is within 0.1 of the actual fraction?

Pairwise Independent Sampling

$$\Pr\{|A_n - \mu| > \delta\} \le \frac{1}{n}\left(\frac{\sigma}{\delta}\right)^2$$

With n = 500, $\mu = p$, $\delta = 0.1$, and worst-case $\sigma = 1/2$:

$$\Pr\{|A_{500} - p| > 0.1\} \le \frac{1}{500}\left(\frac{1/2}{0.1}\right)^2 = 0.05$$

so

$$\Pr\{|A_{500} - p| \le 0.1\} \ge 0.95$$

Confidence in our estimate

With probability at least 0.95, our estimated fraction will be within 0.1 of the actual fraction of contaminated fish in the whole river.

Sampling using Binomial PDF

Better estimate: $nA_n$ is $B_{n,p}$, a binomial random variable with parameters n and p, so

$$\Pr\{|A_n - p| \le \delta\} = \Pr\{|B_{n,p} - np| \le n\delta\}$$

With n = 500 and $\delta = 0.06$, we have $n\delta = 500(0.06) = 30$:

$$\Pr\{|A_{500} - p| \le 0.06\} = \Pr\{|B_{500,p} - 500p| \le 30\}$$

How to bound this probability when we don’t know p?

Lemma: $\Pr\{|B_{n,p} - np| \le n\delta\}$ is min when p = 1/2.

So, for n = 500 and $\delta = 0.06$:

$$\Pr\{|B_{500,p} - 500p| \le 30\} \ge \Pr\{|B_{500,1/2} - 250| \le 30\} = \Pr\{220 \le B_{500,1/2} \le 280\} = \sum_{i=220}^{280} \binom{500}{i} \frac{1}{2^{500}} \ge 0.99$$

Confidence in our estimate

We can actually be 99% confident that our estimated fraction is within 0.06 of the true fraction of contaminated fish in the whole river.

Confidence not Probable Reality

Now suppose we sample 500 fish and discover 230 are contaminated. So we estimate p is 230/500 = 0.46.
It’s tempting to say “the probability that p = 0.46 ± 0.06 is at least 0.99” -- technically wrong!

Confidence

p is unknown, but not a random variable! p is the actual fraction of bad fish in the river.

The outcome of our sampling procedure is a random variable. We can say that the “probability that our sampling process will yield a fraction that is within ±0.06 of the true fraction is at least 0.99”.

For simplicity we say that p = 0.46 ± 0.06 at the 99% confidence level.

Moral: when you are told that some fact holds at a high confidence level, remember that a random experiment lies behind this claim. Ask yourself “what experiment?”

Team Problems

Problems 14

MIT OpenCourseWare
http://ocw.mit.edu

6.042J / 18.062J Mathematics for Computer Science
Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
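The two estimates from the slides can be checked numerically. The Python sketch below is illustrative only and is not part of the lecture: the function names are made up here, the first function assumes the worst-case standard deviation of 1/2 for a single coin toss, and the second evaluates the binomial sum only for the fair-coin case p = 1/2 that the lemma identifies as the minimum.

```python
from math import comb


def chebyshev_tail_bound(n: int, delta: float, sigma: float = 0.5) -> float:
    """Pairwise independent sampling bound on Pr{|A_n - mu| > delta}.

    Returns (1/n) * (sigma/delta)^2. The default sigma = 1/2 is the
    worst case for a single coin toss (assumed here, as on the slides).
    """
    return (sigma / delta) ** 2 / n


def fair_binomial_within(n: int, k: int) -> float:
    """Exact Pr{n/2 - k <= B_{n,1/2} <= n/2 + k} for even n.

    Sums binomial coefficients as exact integers, then divides by 2^n.
    """
    lo, hi = n // 2 - k, n // 2 + k
    favorable = sum(comb(n, i) for i in range(lo, hi + 1))
    return favorable / 2 ** n


# Chebyshev-style estimate: n = 500, delta = 0.1.
tail = chebyshev_tail_bound(500, 0.1)
print("bound on Pr{|A_500 - p| > 0.1}:", tail)
print("so Pr{|A_500 - p| <= 0.1} is at least:", 1 - tail)

# Binomial estimate: n = 500, delta = 0.06, so n * delta = 30.
confidence = fair_binomial_within(500, 30)
print("Pr{220 <= B_(500,1/2) <= 280}:", confidence)

# Interval reported for the sample of 230 contaminated fish out of 500.
estimate = 230 / 500
print("reported interval:", (estimate - 0.06, estimate + 0.06))
```

The exact integer sum avoids floating-point underflow in the individual terms, since each term $\binom{500}{i}/2^{500}$ would otherwise be computed from very large and very small numbers. The printed interval is a statement about the sampling procedure, as the slides stress, not a probability statement about p.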