Mathematics for Computer Science
MIT 6.042J/18.062J

Sampling & Confidence

Albert R Meyer, May 10, 2010

Sampling Questions

How can we estimate the % of contaminated fish in the Charles River?

Procedure: catch n fish, test each, and use the % contaminated in the catch as the estimate of the % contaminated in the whole river.
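Here is a minimal Python sketch of this procedure. It makes a few assumptions. The river is a hypothetical list of booleans, where True means contaminated. Fish are caught independently and with replacement, so each catch is a uniform draw from the population. The names `river` and `estimate_fraction` are illustrative only.

```python
import random

def estimate_fraction(river: list[bool], n: int, rng: random.Random) -> float:
    """Catch n fish (uniformly, with replacement), test each, and return
    the fraction of the catch that is contaminated."""
    catch = rng.choices(river, k=n)  # n independent uniform draws
    return sum(catch) / n            # True counts as 1, False as 0

# Hypothetical river: 1,000 fish, 300 of them contaminated (true fraction 0.3).
river = [True] * 300 + [False] * 700
rng = random.Random(2010)
print(estimate_fraction(river, 500, rng))
```

Sampling with replacement is a simplifying choice. It makes the catches independent, which matches the coin-toss model of the procedure.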
Model as Coin Tosses

p ::= fraction contaminated in the river
test a fish ↔ toss a bias-p coin
catch n fish ↔ toss n coins
$A_n$ ::= fraction contaminated in the sample of n

Sampling Questions

Catch 500 fish; what is the probability that the estimate is within 0.1 of the actual fraction?

Pairwise Independent Sampling

$$\Pr\{|A_n - \mu| > \delta\} \le \left(\frac{\sigma}{\delta}\right)^2 \frac{1}{n}$$

Take n = 500, μ = p, δ = 0.1, and the worst case σ = 1/2. A single bias-p coin toss has standard deviation $\sqrt{p(1-p)} \le 1/2$. Then

$$\Pr\{|A_{500} - p| > 0.1\} \le \left(\frac{1/2}{0.1}\right)^2 \frac{1}{500} = \frac{25}{500} = 0.05,$$

so

$$\Pr\{|A_{500} - p| \le 0.1\} \ge 0.95.$$
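The bound above is plain arithmetic. The Python sketch below evaluates it exactly using `fractions.Fraction`. It also computes the sample size that the same bound requires for a target confidence. The function names are hypothetical, and σ defaults to the worst case 1/2.

```python
from fractions import Fraction
from math import ceil

WORST_SIGMA = Fraction(1, 2)  # sqrt(p(1-p)) <= 1/2 for any bias p

def pairwise_sampling_bound(n: int, delta: Fraction, sigma: Fraction = WORST_SIGMA) -> Fraction:
    """Upper bound (sigma/delta)^2 / n on Pr{|A_n - mu| > delta}."""
    return (sigma / delta) ** 2 / n

def sample_size_for(delta: Fraction, confidence: Fraction, sigma: Fraction = WORST_SIGMA) -> int:
    """Smallest n for which (sigma/delta)^2 / n <= 1 - confidence."""
    return ceil((sigma / delta) ** 2 / (1 - confidence))

delta = Fraction(1, 10)
bad = pairwise_sampling_bound(500, delta)
print(bad, 1 - bad)                               # 1/20 19/20
print(sample_size_for(delta, Fraction(95, 100)))  # 500
```

Exact rational arithmetic keeps 1/20 and 19/20 free of floating-point rounding. It also shows that 500 fish is exactly the smallest catch for which this bound reaches 0.95.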
Confidence in our estimate

With probability 0.95, our estimated fraction will be within 0.1 of the actual fraction of contaminated fish in the whole river.
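One way to see what the 0.95 statement is about is to repeat the experiment many times. This Python sketch picks a few hypothetical true fractions p. For each one, it simulates catching 500 fish as 500 bias-p coin tosses and reports how often the estimate lands within 0.1 of p. It is a simulation check under those assumptions, not a proof. Because the pairwise independent sampling bound is conservative, the observed frequencies typically come out well above 0.95.

```python
import random

def fraction_within(p: float, n: int, delta: float, trials: int, rng: random.Random) -> float:
    """Repeat the sampling experiment `trials` times and return the fraction
    of trials whose estimate A_n lies within delta of p."""
    hits = 0
    for _ in range(trials):
        contaminated = sum(rng.random() < p for _ in range(n))  # n bias-p coin tosses
        a_n = contaminated / n
        if abs(a_n - p) <= delta:
            hits += 1
    return hits / trials

rng = random.Random(6042)
for p in (0.1, 0.3, 0.5):  # hypothetical true fractions
    print(p, fraction_within(p, 500, 0.1, 1000, rng))
```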
Sampling using Binomial PDF

$A_n$ is $B_{n,p}/n$, so

$$\Pr\{|A_n - p| \le \delta\} = \Pr\{|B_{n,p} - np| \le \delta n\}$$

Sampling using Binomial PDF

Better estimate: with n = 500 and δ = 0.06,

$$\Pr\{|A_{500} - p| \le 0.06\} = \Pr\{|B_{500,p} - 500p| \le 30\}$$

How do we bound this probability when we don't know p?

Lemma: $\Pr\{|B_{n,p} - np| \le \delta n\}$ is minimized when p = 1/2.
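The Lemma can be spot-checked numerically. This Python sketch computes $\Pr\{|B_{n,p} - np| \le \delta n\}$ exactly with rational arithmetic for n = 500 and δ = 0.06. It evaluates a coarse grid of hypothetical p values, chosen so that np is an integer, and reports which p gives the smallest probability. This checks a few points only and is not a proof of the Lemma.

```python
from fractions import Fraction
from math import comb

def prob_within(n: int, p: Fraction, margin: Fraction) -> Fraction:
    """Exact Pr{|B_{n,p} - np| <= margin} for a binomial random variable B_{n,p}."""
    mean = n * p
    total = Fraction(0)
    for k in range(n + 1):
        if abs(k - mean) <= margin:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

n = 500
margin = Fraction(6, 100) * n  # delta * n = 30
grid = [Fraction(i, 10) for i in range(1, 10)]
probs = {p: prob_within(n, p, margin) for p in grid}
for p, pr in probs.items():
    print(f"p = {float(p):.1f}: {float(pr):.5f}")
print("smallest at p =", min(probs, key=probs.get))  # 1/2
```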
Sampling using Binomial PDF

By the Lemma, the worst case is p = 1/2. There, 500p = 250 and δn = 0.06 · 500 = 30:

$$\Pr\{|B_{500,p} - 500p| \le 30\} \ge \Pr\{|B_{500,1/2} - 250| \le 30\} = \Pr\{220 \le B_{500,1/2} \le 280\} = \sum_{i=220}^{280} \binom{500}{i} 2^{-500} \ge 0.99$$
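The final sum can be evaluated exactly, because every term is an integer over $2^{500}$. Here is a short Python sketch that uses integer arithmetic and `fractions.Fraction`. The function name is illustrative.

```python
from fractions import Fraction
from math import comb

def central_binomial_mass(n: int, lo: int, hi: int) -> Fraction:
    """Exact Pr{lo <= B_{n,1/2} <= hi} = sum_{i=lo}^{hi} C(n, i) / 2^n."""
    return Fraction(sum(comb(n, i) for i in range(lo, hi + 1)), 2**n)

prob = central_binomial_mass(500, 220, 280)
print(float(prob))
print(prob >= Fraction(99, 100))  # True
```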
Confidence in our estimate

We can actually be 99% confident that our estimated fraction is within 0.06 of the true fraction of contaminated fish in the whole river.

Confidence not Probable Reality

Now suppose we sample 500 fish and discover that 230 are contaminated. So we estimate p as 230/500 = 0.46. It's tempting to say

"the probability that p = 0.46 ± 0.06 is at least 0.99"

but that is technically wrong!
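The arithmetic on this slide, the point estimate 230/500 and the interval 0.46 ± 0.06, can be written as a small Python sketch with exact fractions. The helper name is hypothetical. The code only computes the interval. It says nothing about a probability that p lies inside it, which is the point of the warning about the tempting statement.

```python
from fractions import Fraction

def estimate_interval(contaminated: int, n: int, delta: Fraction) -> tuple[Fraction, Fraction, Fraction]:
    """Return (estimate, low, high) for the interval estimate +/- delta."""
    estimate = Fraction(contaminated, n)
    return estimate, estimate - delta, estimate + delta

est, low, high = estimate_interval(230, 500, Fraction(6, 100))
print(float(est), float(low), float(high))  # 0.46 0.4 0.52
```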
Confidence

p is the actual fraction of bad fish in the river. p is unknown, but it is not a random variable!

Confidence

The outcome of our sampling procedure is a random variable. We can say that the "probability that our sampling process will yield a fraction within ± 0.06 of the true fraction is at least 0.99".

Confidence

For simplicity, we say that p = 0.46 ± 0.06 at the 99% confidence level.

Confidence

Moral: when you are told that some fact holds at a high confidence level, remember that a random experiment lies behind this claim. Ask yourself "what experiment?"
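For the fish example, the answer to "what experiment?" is the whole sampling procedure. The 99% describes how often that procedure produces an interval containing the fixed, unknown p. This Python sketch assumes a hypothetical true value p = 0.45 that the procedure never sees. It reruns the procedure many times and counts how many of the intervals $A_{500} \pm 0.06$ contain p. Under that assumption, the observed coverage should come out close to or above 0.99. Any single run's interval, however, either contains p or it does not.

```python
import random
from fractions import Fraction

def coverage(p_true: Fraction, n: int, delta: Fraction, trials: int, rng: random.Random) -> float:
    """Fraction of repeated sampling experiments whose interval
    A_n +/- delta contains the fixed value p_true."""
    p_float = float(p_true)  # p is fixed; only the sample varies between trials
    covered = 0
    for _ in range(trials):
        contaminated = sum(rng.random() < p_float for _ in range(n))
        estimate = Fraction(contaminated, n)
        if abs(estimate - p_true) <= delta:  # exact check, no rounding at the edges
            covered += 1
    return covered / trials

rng = random.Random(18062)
print(coverage(Fraction(45, 100), 500, Fraction(6, 100), 2000, rng))
```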
Team Problems

Problems 1-4
MIT OpenCourseWare
http://ocw.mit.edu
6.042J / 18.062J Mathematics for Computer Science
Spring 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.