U06FPPProbabilityC

Unit 6: Probability
Math? Ugh! Why bother?
• You hear on TV a gubernatorial candidate has a
5% lead over her opponent. Should you believe
she’ll win?
• You’ve got sample data. How far might your
average (or whatever) be off from the population
average?
• You’ve got experimental data. It doesn’t seem to
match the prevailing theory. How likely is it that
you’ve found something new?
Probability
• Start with finite probability (“frequency
theory”), to understand rules
– finite number of possible results in “sample
space”, usually equally likely
• Move to continuous probability, to include
“normal curve”, etc.
– sample space is all numbers [maybe in some
interval]
Finite probabilities
• “Event”: some set of possible “outcomes”, i.e.,
values in the “sample space”
• Probability of an event (with equally likely
outcomes): # of outcomes in the event (called
“successful outcomes”) / # of all possible outcomes
(expressed as fraction or %)
– Ex: Roll a die. P(getting < 3) = 2/6 = 1/3.
• Idea: P(A) = fraction of times A would occur if
experiment is repeated many times
• Equal likelihood of outcomes is important
– Flip 2 quarters: TT or HH twice as likely as HT?
– Similarly, roll 2 dice: 36 equally likely outcomes, and 4,4
is only half as likely as 3,5 in either order (3,5 or 5,3), which is
easier to see if the dice have different colors
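A quick way to check the equal-likelihood point is to list the ordered outcomes and count them. The sketch below treats the two coins and the two dice as distinguishable, which is what makes the listed outcomes equally likely. The helper name `prob` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def prob(outcomes, event):
    """Equally likely outcomes: # successful outcomes / # all outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Two quarters, told apart (say, by date): HH, HT, TH, TT are equally likely.
coins = list(product("HT", repeat=2))
p_tt = prob(coins, lambda o: o == ("T", "T"))
p_one_each = prob(coins, lambda o: sorted(o) == ["H", "T"])

# Two dice of different colors: 36 equally likely (red, green) outcomes.
dice = list(product(range(1, 7), repeat=2))
p_44 = prob(dice, lambda o: o == (4, 4))
p_35 = prob(dice, lambda o: sorted(o) == [3, 5])  # 3,5 or 5,3

print(p_tt, p_one_each)  # 1/4 1/2
print(p_44, p_35)        # 1/36 1/18
```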
Immediate results
• P(A^c) = 1 – P(A) (A^c is the set of all
outcomes not in A, the “complement” of A)
– Ex: Roll one die. P(a four) = 1/6, so P(not a
four) = 1 – 1/6 = 5/6
– Ex: Flip a coin 5 times. P(no heads) = 1/32, so
P(at least one head) = 31/32.
• 0 ≤ P(A) ≤ 1
Hard example: In 5-card draw, P(4-of-a-kind)
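One way to work the hard example, as a sketch that treats the hand as the first five cards dealt and ignores the draw: count the four-of-a-kind hands and divide by the number of equally likely 5-card hands. `math.comb` is the standard-library binomial coefficient (Python 3.8+).

```python
from fractions import Fraction
from math import comb

# Treat the hand as the 5 cards first dealt (the draw itself is ignored here).
all_hands = comb(52, 5)      # equally likely 5-card hands
# Four of a kind: pick the rank (13 ways), take all 4 cards of that rank,
# then any one of the 48 other cards as the fifth card.
hands_4kind = 13 * 48
p = Fraction(hands_4kind, all_hands)
print(all_hands, hands_4kind, p)   # 2598960 624 1/4165
```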
Example: The first box model
• A box contains tickets with the numbers 1, 1,
4, 4, 4, 7, 7, 7, 7, 12. Pick a random slip.
• P(1) = 2/10 = 20%
• P(7) = 4/10 = 40%
• P(not 7) = 60%
• P(1 or 12) = 3/10 = 30%
• P(even) = 4/10 = 40%
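The same fractions can be checked by counting slips directly. This is a minimal sketch assuming every slip is equally likely to be picked; `p` is a hypothetical helper.

```python
from fractions import Fraction

box = [1, 1, 4, 4, 4, 7, 7, 7, 7, 12]

def p(event):
    """P(event) when one slip is drawn at random from the box."""
    return Fraction(sum(1 for t in box if event(t)), len(box))

print(p(lambda t: t == 1))        # 1/5   (= 2/10 = 20%)
print(p(lambda t: t == 7))        # 2/5   (= 4/10 = 40%)
print(p(lambda t: t != 7))        # 3/5   (= 60%)
print(p(lambda t: t in (1, 12)))  # 3/10  (= 30%)
print(p(lambda t: t % 2 == 0))    # 2/5   (= 4/10 = 40%)
```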
Boolean operations: And (I)
• Conditional probability P(A|B) (“probability of A
given B”): Sample space is restricted to B (i.e.,
we know B has occurred). Now compute
probability of A.
– Ex: Pick a card from a (straight) deck. (Face
cards don’t include aces.)
• P(♥) = 13/52 = 1/4
• P(♥ | face card) = 3/12 = 1/4
– Ex: Two cards dealt face down:
• P(2nd is deuce) = 4/52 = 1/13
• P(2nd is deuce | 1st is deuce) = 3/51 = 1/17
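A sketch of “restrict the sample space to B, then count A” by brute-force enumeration. The helper `cond` and the (rank, suit) card encoding are assumptions made for illustration; the two-card case lists all 52·51 ordered deals as equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) cards

def cond(outcomes, a, b):
    """P(A | B): keep only the outcomes in B, then count those also in A."""
    given = [o for o in outcomes if b(o)]
    return Fraction(sum(1 for o in given if a(o)), len(given))

def face(card):
    return card[0] in ("J", "Q", "K")  # aces are not face cards

print(cond(deck, lambda c: c[1] == "hearts", lambda c: True))  # 1/4
print(cond(deck, lambda c: c[1] == "hearts", face))            # 1/4

# Two cards dealt: all 52*51 = 2652 ordered deals are equally likely.
deals = list(permutations(deck, 2))
print(cond(deals, lambda d: d[1][0] == "2", lambda d: True))            # 1/13
print(cond(deals, lambda d: d[1][0] == "2", lambda d: d[0][0] == "2"))  # 1/17
```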
Boolean operations: And (II)
• Multiplication rule: P(A and B) = P(A)•P(B|A)
• “independent events”: P(B|A) = P(B)
− So with indep events, mult rule becomes
P(A and B) = P(A)•P(B)
− Remark: If A is indep of B, then B is indep
of A.
Examples
• Boxes of tickets
– Box 1: A1,A2,A2,B1,B1,B2 : letters and numbers are
not independent
– Box 2: A1,A2,A2,B1,B2,B2 : letters and numbers are
independent
• From box of 1, 1, 4, 4, 4, 7, 7, 7, 7, 12, pick two tickets:
– with replacement:
– P(two 4’s) = (3/10)(3/10) and P(1 then 7) = (2/10)(4/10)
– without replacement:
– P(two 4’s) = (3/10)(2/9) and P(1 then 7) = (2/10)(4/9)
• Die thrown 4 times
– Which is more likely, 3333 or 1436?
– What is P( 4 scores ≤ 2 )?
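The examples on this slide can be verified by enumeration. This sketch assumes each ticket (tracked by its position in the box) and each die face is equally likely; `independent`, `p`, and `p_rolls` are illustrative names, not anything from the text.

```python
from fractions import Fraction
from itertools import permutations, product

def independent(box):
    """True if P(letter and number) = P(letter) * P(number) for every combination."""
    n = len(box)
    for letter in {t[0] for t in box}:
        for number in {t[1] for t in box}:
            p_letter = Fraction(sum(t[0] == letter for t in box), n)
            p_number = Fraction(sum(t[1] == number for t in box), n)
            p_both = Fraction(sum(t == letter + number for t in box), n)
            if p_both != p_letter * p_number:
                return False
    return True

print(independent(["A1", "A2", "A2", "B1", "B1", "B2"]))  # False (Box 1)
print(independent(["A1", "A2", "A2", "B1", "B2", "B2"]))  # True  (Box 2)

# Two tickets from the box. Tickets are tracked by position so that the two
# 1's, three 4's, etc. count as different (equally likely) tickets.
box = [1, 1, 4, 4, 4, 7, 7, 7, 7, 12]
positions = range(len(box))
with_repl = list(product(positions, repeat=2))   # 100 ordered draws
without = list(permutations(positions, 2))       # 90 ordered draws

def p(draws, event):
    """P(event) for an event on the (first, second) ticket values."""
    return Fraction(sum(1 for i, j in draws if event(box[i], box[j])), len(draws))

print(p(with_repl, lambda a, b: a == 4 and b == 4))  # 9/100 = (3/10)(3/10)
print(p(without, lambda a, b: a == 4 and b == 4))    # 1/15  = (3/10)(2/9)

# A die thrown 4 times: 6**4 = 1296 equally likely sequences.
rolls = list(product(range(1, 7), repeat=4))

def p_rolls(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

print(p_rolls(lambda r: r == (3, 3, 3, 3)), p_rolls(lambda r: r == (1, 4, 3, 6)))  # 1/1296 1/1296
print(p_rolls(lambda r: all(x <= 2 for x in r)))  # 1/81
```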
Boolean operations: And (III)
• Ex: Caucasian woman with blonde ponytail
snatched purse, jumped into yellow car driven
by black man with mustache and beard. Man
and woman fitting description arrested. At trial,
prosecutor says probs are: yellow car, 1/10; man
with mustache, 1/4; woman with ponytail, 1/10;
woman with blonde hair, 1/3; black man with
beard, 1/10; interracial couple in car, 1/1000. So
chances are 1/(10•4•10•3•10•1000) =
1/12,000,000 that they are wrong people. (???)
Boolean operations: Or (I)
• Addition rule: P(A or B (or both)) = P(A) +
P(B) – P(A and B)
• “mutually exclusive events”: P(A and B) =
0; i.e., if one occurs, the other cannot
– With mut excl events, addition rule becomes
P(A or B) = P(A) + P(B)
Examples
• Pick a card.
– P(A or K) = 4/52 + 4/52 = 2/13
– P(A or ♠) = 4/52 + 13/52 – 1/52 = 4/13
• Tickets 1-100 in a box, draw one.
– P(≤ 10 or ≥ 90) = 10/100 + 11/100 = 21/100
– P(≤ 10 or div by 5) = 10/100 + 20/100 – 2/100
= 7/25
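Both forms of the addition rule can be compared against a direct count. A minimal sketch, assuming a standard 52-card deck and tickets numbered 1 to 100; `p` and `addition_rule` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

def p(space, event):
    """Equally likely outcomes: count the event, divide by the size of the space."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def addition_rule(space, a, b):
    """P(A) + P(B) - P(A and B)."""
    return p(space, a) + p(space, b) - p(space, lambda o: a(o) and b(o))

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = list(product(ranks, ["spades", "hearts", "diamonds", "clubs"]))

def ace(card):
    return card[0] == "A"

def king(card):
    return card[0] == "K"

def spade(card):
    return card[1] == "spades"

tickets = list(range(1, 101))

def low(t):
    return t <= 10

def high(t):
    return t >= 90

def div5(t):
    return t % 5 == 0

print(p(deck, lambda c: ace(c) or king(c)), addition_rule(deck, ace, king))        # 2/13 2/13
print(p(deck, lambda c: ace(c) or spade(c)), addition_rule(deck, ace, spade))      # 4/13 4/13
print(p(tickets, lambda t: low(t) or high(t)), addition_rule(tickets, low, high))  # 21/100 21/100
print(p(tickets, lambda t: low(t) or div5(t)), addition_rule(tickets, low, div5))  # 7/25 7/25
```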
Expected values
• Ex 1: Flip a coin 10 times, paying $1 to play
each time. You win $.50 (plus your $1) if you get
a head. How much should you expect to win?
• Ex 2: Roll two dodecahedral (12-sided) dice.
You win $10 (plus your payment to play) if you
get doubles. How much should you pay to play
for a fair game?
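A sketch of the expected-value arithmetic for both games. It assumes a particular reading of Ex 1 (a head returns your $1 plus $.50, a tail loses the $1) and, for Ex 2, that “fair” means an expected net gain of zero; the function `expected` is made up for illustration.

```python
from fractions import Fraction

def expected(dist):
    """Expected value of a list of (net winnings, probability) pairs."""
    return sum(value * prob for value, prob in dist)

# Ex 1 (assumed reading): net +$0.50 on a head, -$1 on a tail, per play.
half = Fraction(1, 2)
per_play = expected([(Fraction(1, 2), half), (-1, half)])
print(per_play, 10 * per_play)  # -1/4 -5/2  (expect to lose $2.50 over 10 plays)

# Ex 2: two 12-sided dice; 12 of the 144 ordered outcomes are doubles.
p_doubles = Fraction(12, 144)
# A fair price x makes the expected net zero: 10 * p - x * (1 - p) = 0.
fair_price = 10 * p_doubles / (1 - p_doubles)
print(fair_price)  # 10/11, i.e. about $0.91
```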
Two similar examples:
• From text: Paradox of the Chevalier de
Méré: P(at least 1 ace in 4 rolls of die) >
P(at least 1 double-ace in 24 rolls of 2 dice)
• Birthday problem: With 30 people in a
room, how likely is it that at least two have
the same birth date?
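For the de Méré comparison, the complement rule does the work: P(at least one) = 1 – P(none), with the rolls assumed independent and the dice fair. A minimal sketch using exact fractions:

```python
from fractions import Fraction

# Complement rule, assuming independent rolls of fair dice.
p_ace_in_4 = 1 - Fraction(5, 6) ** 4             # no ace in 4 rolls: (5/6)^4
p_double_ace_in_24 = 1 - Fraction(35, 36) ** 24  # no double-ace in 24 rolls: (35/36)^24
print(round(float(p_ace_in_4), 3), round(float(p_double_ace_in_24), 3))  # 0.518 0.491
```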
The Birthday problem
# people   P(no match)   P(match)
 2         0.99726027    0.00273973
 3         0.99179583    0.00820417
 4         0.98364409    0.01635591
 5         0.97286443    0.02713557
 6         0.95953752    0.04046248
 7         0.9437643     0.0562357
 8         0.92566471    0.07433529
 9         0.90537617    0.09462383
10         0.88305182    0.11694818
11         0.85885862    0.14114138
12         0.83297521    0.16702479
13         0.80558972    0.19441028
14         0.77689749    0.22310251
15         0.74709868    0.25290132
16         0.71639599    0.28360401
17         0.68499233    0.31500767
18         0.65308858    0.34691142
19         0.62088147    0.37911853
20         0.58856162    0.41143838
21         0.55631166    0.44368834
22         0.52430469    0.47569531
23         0.49270277    0.50729723
24         0.46165574    0.53834426
25         0.4313003     0.5686997
26         0.40175918    0.59824082
27         0.37314072    0.62685928
28         0.34553853    0.65446147
29         0.31903146    0.68096854
30         0.29368376    0.70631624
31         0.26954537    0.73045463
32         0.24665247    0.75334753
33         0.22502815    0.77497185
34         0.20468314    0.79531686
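The birthday table can be regenerated by multiplying the chances that each new person misses all the earlier birthdays. This sketch assumes 365 equally likely birthdays (ignoring February 29 and seasonal variation); `p_no_match` is a hypothetical name.

```python
def p_no_match(n, days=365):
    """P(no two of n people share a birthday), assuming `days` equally likely birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days  # person i+1 avoids the i birthdays already taken
    return p

print("# people   P(no match)   P(match)")
for n in range(2, 35):
    q = p_no_match(n)
    print(f"{n:>8}   {q:.8f}    {1 - q:.8f}")
```

Under these assumptions the rows agree with the table, and 23 is the smallest group in which a match is more likely than not.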
Tree diagram
Flip a coin, then roll a die,
list all alternatives
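The leaves of that tree are just the ordered pairs (coin, die). A tiny sketch that lists them:

```python
from itertools import product

# Each path through the tree is one (coin, die) pair: 2 * 6 = 12 leaves.
outcomes = list(product(["H", "T"], range(1, 7)))
for coin, die in outcomes:
    print(coin, die)
print(len(outcomes))  # 12
```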
The Monty Hall Problem
(From Marilyn vos Savant’s column)
Game show: Three doors hide a car and 2 goats.
Contestant picks a door. Host opens one of the
other doors to reveal a goat. Contestant then
may switch to the other unopened door. Is it
better to stay with the original choice or to
switch; or doesn’t it matter?
Marilyn’s answer: Switch!
Many respondents: Doesn’t matter. (“You’re the
goat!”)
Tree diagram of
Stayer’s possible
games
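One way to check Marilyn’s answer without a tree is to enumerate every game exactly. The sketch assumes the car is placed at random, the contestant’s first pick is random, and the host always opens a goat door other than the pick, choosing at random when two are available; `monty_hall` is an illustrative name.

```python
from fractions import Fraction

def monty_hall():
    """Exact P(win) for a stayer and a switcher under the assumptions above."""
    doors = [0, 1, 2]
    stay = switch = Fraction(0)
    for car in doors:
        for pick in doors:
            # Doors the host may open: not the contestant's pick, not the car.
            goat_doors = [d for d in doors if d != pick and d != car]
            for opened in goat_doors:
                weight = Fraction(1, 9) * Fraction(1, len(goat_doors))
                other = next(d for d in doors if d not in (pick, opened))
                stay += weight * (pick == car)
                switch += weight * (other == car)
    return stay, switch

print(monty_hall())  # (Fraction(1, 3), Fraction(2, 3))
```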
Binomial coefficients (I)
• How many ways are there to choose k
things (without regard to order) from a set
of n things?
– How many ways are there to choose 3 club
officers from a set of 5 to get funded for a trip
to a convention?
– How many ways are there to choose 2 cards
out of the 4 of a given rank to form a pair?
– How many ways are there to choose, out of 8
replications of an experiment, 6 to be
successful?
Ways to arrange 3 letters taken from {a,b,c,d,e}
abc aec bde cda dbc ead
abd aed bea cdb dbe eba
abe bac bec cde dca ebc
acb bad bed cea dcb ebd
acd bae cab ceb dce eca
ace bca cad ced dea ecb
adb bcd cae dab deb ecd
adc bce cba dac dec eda
ade bda cbd dae eab edb
aeb bdc cbe dba eac edc
Ways to arrange 3 given letters: a, b, c
abc acb bac bca cab cba
Binomial coefficients (II)
• Step one: How many ways are there to choose
k things in order from a set of n things?
– n(n-1)(n-2)...(n-k+1)
• Step two: How many ways are there to order k
given things?
– k(k-1)(k-2)...1
• Step three: Divide.
– C(n,k) = [n(n-1)(n-2)...(n-k+1)]/[k(k-1)(k-2)...1]
• Notation: n! = n(n-1)(n-2)...1 [1 if n=0]
– C(n,k) = n!/[k! (n-k)!]
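A sketch of steps one to three in Python, using the letters example: count the ordered arrangements, count the orders of 3 given letters, divide. The function `c` is a hypothetical helper; `math.comb`, `math.perm`, and `math.factorial` are standard-library functions (Python 3.8+) used only to cross-check it.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = "abcde"
ordered = ["".join(t) for t in permutations(letters, 3)]
print(len(ordered), perm(5, 3))        # 60 60  step one: 5*4*3
print(len(list(permutations("abc"))))  # 6      step two: 3*2*1
print(len(ordered) // 6)               # 10     step three: divide

def c(n, k):
    """C(n,k) = [n(n-1)...(n-k+1)] / [k(k-1)...1]."""
    top = 1
    for i in range(k):
        top *= n - i              # n(n-1)...(n-k+1): k factors
    return top // factorial(k)    # exact: k! always divides this product

# Cross-check against n!/[k!(n-k)!] and the standard library's math.comb.
assert all(c(n, k) == factorial(n) // (factorial(k) * factorial(n - k)) == comb(n, k)
           for n in range(12) for k in range(n + 1))
print(c(5, 3), len(list(combinations(letters, 3))))  # 10 10
```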
Binomial coefficients (I) revisited
• How many ways are there to choose k
things (without regard to order) from a set
of n things?
– How many ways are there to choose 3 club
officers from a set of 5 to get funded for a trip
to a convention?
– How many ways are there to choose 2 cards
out of the 4 of a given rank to form a pair?
– How many ways are there to choose, out of 8
replications of an experiment, 6 to be
successful?
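With the formula in hand, the three questions come out as follows (using the standard-library `math.comb`, Python 3.8+):

```python
from math import comb

officers = comb(5, 3)    # 3 officers out of 5
pairs = comb(4, 2)       # 2 cards out of the 4 of one rank
successes = comb(8, 6)   # which 6 of the 8 replications succeed
print(officers, pairs, successes)  # 10 6 28
```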
Binomial probabilities (I)
• General question: Suppose an experiment is
carried out n times under the same conditions.
A given event (set of outcomes) A has
probability p . What is the probability that A
occurs exactly k times out of the n repetitions?
– Ex: Roll a die 5 times. Probability of getting exactly
three 4’s?
• There are exactly C(5,3) = 10 patterns of three 4’s
and 2 non-4’s
• Each has probability (1/6)^3(5/6)^2
• So the answer is 10(1/6)^3(5/6)^2
• In general, the answer is C(n,k) p^k (1-p)^(n-k)
• Reqs for binomial probability:
– (1) Experiment has 2 complementary outcomes (A or not A).
– (2) On repeated trials, probabilities don’t change, and the trials are independent.
• Though formula gives probability of exactly k
“successes” out of n repetitions of an experiment,
we will usually use it for counting at least k
“successes” out of n
– So we have to add up the probabilities for k and
k+1 and k+2 and ... and n .
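A minimal sketch of the binomial formula and of the “add up k, k+1, ..., n” step. The names `binom_pmf` and `binom_at_least` are made up; exact `Fraction` arithmetic is used so the die example comes out as a fraction.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_at_least(n, k, p):
    """P(at least k successes): add the probabilities for k, k+1, ..., n."""
    return sum(binom_pmf(n, j, p) for j in range(k, n + 1))

p_four = Fraction(1, 6)              # success = rolling a 4
print(binom_pmf(5, 3, p_four))       # 125/3888, i.e. 10 (1/6)^3 (5/6)^2
print(binom_at_least(5, 3, p_four))  # 23/648
```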
Examples of binomial distributions (I)
• In a family of 5 kids, P(exactly 3 girls)
• Roll a die 15 times, P(exactly 4 twos)
• Roll two dice 10 times, P(at most two
sums of 5)
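A sketch of the three examples. It assumes each child is a girl with probability 1/2 independently, a fair die, and fair dice for the sum of 5 (4 of the 36 outcomes); the helper `binom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Family of 5 kids, assuming each child is a girl with probability 1/2.
girls = binom_pmf(5, 3, Fraction(1, 2))
# Die rolled 15 times; "success" = rolling a two.
twos = binom_pmf(15, 4, Fraction(1, 6))
# Two dice rolled 10 times; a sum of 5 is 1+4, 4+1, 2+3, 3+2: 4 of 36 outcomes.
p_sum5 = Fraction(4, 36)
at_most_two = sum(binom_pmf(10, k, p_sum5) for k in range(3))  # k = 0, 1, 2

print(girls)  # 5/16
print(round(float(twos), 4), round(float(at_most_two), 4))
```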
Examples of binomial distributions (II)
• Feed vitamin A to one rat in each of 10 pairs of
rats, then all run a maze. In 7 pairs, the A-rat was faster. If vitamin A were no help
(i.e., each rat in a pair were equally likely to be
faster), how likely is it that, just by chance, the
A-rat was faster in at least 7 pairs?
• In a county that is 40% Caucasian, how
likely is it that a jury pool of 20 people has
18 or more Caucasians?
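Both questions are “at least k” binomial tails. This sketch assumes p = 1/2 for the rats (vitamin A no help) and treats each juror as an independent draw with p = 0.4, which is only approximately right for a finite county; `binom_at_least` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def binom_at_least(n, k, p):
    """Sum of C(n,j) p^j (1-p)^(n-j) for j = k, ..., n."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Rats: if vitamin A is no help, the A-rat wins each pair with probability 1/2.
rats = binom_at_least(10, 7, Fraction(1, 2))
print(rats, float(rats))  # 11/64 0.171875

# Jury pool of 20, each member Caucasian with probability 0.4 (assumed independent).
jury = binom_at_least(20, 18, Fraction(2, 5))
print(float(jury))        # roughly 5e-06, about 5 in a million
```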
Math stuff about binomial coefficients
• They’re called that because they are the coefficients of the terms x^(n-k)y^k
in the expansion of (x+y)^n:
– C(n,0)x^n + C(n,1)x^(n-1)y + C(n,2)x^(n-2)y^2 + ... + C(n,n-1)xy^(n-1) + C(n,n)y^n
• For small n, compute C(n,k) with “Pascal’s triangle”: 1’s
in first row and column, then each entry is the sum of the one
above and the one to the left
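A sketch of the rectangular layout, with 1’s in the first row and column and each interior entry the sum of the entry above it and the entry to its left. Reading along an anti-diagonal gives one row of coefficients of (x+y)^n; `pascal_rectangle` is an illustrative name.

```python
from math import comb

def pascal_rectangle(size):
    """1's in the first row and column; each other entry = entry above + entry to the left."""
    t = [[1] * size for _ in range(size)]
    for i in range(1, size):
        for j in range(1, size):
            t[i][j] = t[i - 1][j] + t[i][j - 1]
    return t

t = pascal_rectangle(6)
for row in t:
    print(" ".join(f"{x:>3}" for x in row))

# Entry in row i, column j (counting from 0) is C(i+j, j), so the
# anti-diagonal i + j = n holds the coefficients of (x+y)^n.
n = 4
print([t[n - j][j] for j in range(n + 1)])  # [1, 4, 6, 4, 1]
```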
(More from Marilyn vos Savant’s column)
Suppose we assume that 5% of the people are drug users. A
drug test is 95% accurate (i.e., it produces a correct result
95% of the time, whether the person is using drugs or not).
A randomly chosen person tests positive. Is the person
highly likely to be a drug user?
Marilyn’s answer: Given your conditions, once the person has
tested positive, you may as well flip a coin to determine
whether she or he is a drug user. The chances are only 50-50. But the assumptions, the make-up of the test group and
the true accuracy of the tests themselves are additional
considerations.
(To see this, suppose the population is 10,000 people;
compare numbers of false positives and true positives.)
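The 10,000-person count suggested above, as a short sketch using integer arithmetic (5% users, 95% accuracy both ways, as in the question):

```python
population = 10_000
users = population * 5 // 100      # 5% use drugs: 500
non_users = population - users     # 9,500
true_pos = users * 95 // 100       # users the test correctly flags: 475
false_pos = non_users * 5 // 100   # non-users the test wrongly flags: 475
print(true_pos, false_pos, true_pos / (true_pos + false_pos))  # 475 475 0.5
```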
Drug [disease] testing probabilities
Drug [disease] present?   Test positive    Test negative    Sum
Yes                       “Sensitivity”    False negative   1
No                        False positive   “Specificity”    1
Ex: Suppose the Bovine test
for lactose abuse has a
sensitivity of 0.99 and a
specificity of 0.95; and that
7% of a certain population
abuses lactose. If a person
tests positive on the Bovine
test, how likely is it that (s)he
really abuses lactose?
          pos    neg
abuser    .99    .01
clean     .05    .95
[Figure: assuming 7% of the population is really positive, a plot of z = P(really positive | positive test) against x = sensitivity and y = specificity; the curve x = .99 is marked, with points at y = .95 and .90, where z = .6 and .42]
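The plotted values can be reproduced with Bayes’ rule: true positives divided by all positives. A minimal sketch; `p_really_positive` is a hypothetical name, and the 7% prevalence is taken from the example.

```python
def p_really_positive(sensitivity, specificity, prevalence):
    """Bayes' rule: P(abuser | positive test)."""
    true_pos = sensitivity * prevalence               # abusers who test positive
    false_pos = (1 - specificity) * (1 - prevalence)  # clean people who test positive
    return true_pos / (true_pos + false_pos)

print(round(p_really_positive(0.99, 0.95, 0.07), 3))  # 0.598
print(round(p_really_positive(0.99, 0.90, 0.07), 3))  # 0.427
```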
Counting dragonflies
(thanks to Profs. V. MacMillen and
R. Arnold)
Only two pairs
• 30 censuses altogether, 17 with only two
pairs
• Of 17, 12 had both in same plot
• Do they prefer to lay eggs in proximity?
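One way to ask whether 12 of 17 is surprising is a binomial tail. This sketch assumes a no-preference model that the slides do not state: each of the two pairs independently picks one of three equally likely plots (P1, P2, P3), so the chance both land in the same plot is 1/3.

```python
from fractions import Fraction
from math import comb

# Assumed no-preference model: both pairs pick among 3 plots uniformly and
# independently, so P(same plot) = 1/3 in each two-pair census.
p_same = Fraction(1, 3)
n, observed = 17, 12
p_value = sum(comb(n, k) * p_same**k * (1 - p_same) ** (n - k)
              for k in range(observed, n + 1))
print(float(p_value))  # about 0.0019
```

Under that model, 12 or more same-plot censuses out of 17 would happen by chance only about 2 times in 1000.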
Censuses with >2 pairs
[Tables: numbers of pairs in plots P1, P2, P3 in individual censuses; captions “With 3 pairs” and “Up to 12 at the same time”]