CONDITIONAL PROBABILITY
Pr(A/B) = the probability of A given B.
Note the difference between conditional and
categorical probability statements:
Categorical:
The probability that you will slip = 0.1
Pr(S) = 0.1
The probability that the Jays win the
World Series = 0.5
Pr(J) = 0.5
Conditional:
The probability that you will slip given that
the ground is icy = 0.4
Pr(S/I) = 0.4
The probability that the Jays win the
Series given that the Yankees make the
playoffs = 0.3
Pr(J/Y) = 0.3
Definition of Conditional Probability
When Pr(B) > 0
Pr(A/B) = Pr(A & B)/Pr(B)
(Remember, you cannot divide by 0.)
Let’s do some examples to convince
ourselves that this is correct.
Example 1 (dice):
What is the probability of throwing 3 given
that you throw an odd number?
There are three ways to throw an odd
number: (1, 3, 5).
So, throwing a 3 is one of three possibilities.
Pr(3/odd) = 1/3.
Now let’s use the definition:
Pr(3/odd) = Pr(3 & odd)/Pr(odd)
= Pr(3)/Pr(1, 3, 5)
= (1/6)/(1/2)
= 1/3
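If you want to check this mechanically, here
is a minimal sketch in Python (the variable
names are ours, not the book’s):

from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]              # one fair die
odd = [o for o in outcomes if o % 2 == 1]  # the conditioning event: {1, 3, 5}

pr_odd = Fraction(len(odd), len(outcomes))                 # 1/2
pr_3_and_odd = Fraction(sum(1 for o in odd if o == 3), 6)  # 1/6
print(pr_3_and_odd / pr_odd)                               # 1/3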
Example 2 (cards):
What is the probability of drawing a black
card (B) given that you’ve drawn an ace or a
club (A v C)?
There are 13 clubs plus the three aces that
aren’t clubs (the ace of clubs is already
counted) = 16 cards. 14 of these are black:
the 13 clubs plus the ace of spades.
So Pr(B/A v C) = 14/16 = 7/8
Definition:
Pr(B/A v C) = Pr[B & (A v C)]/Pr(A v C)
Pr[B & (A v C)] = Pr(club or ace of spades)
= 14/52
Pr(A v C) = Pr(club or ace of spades or ace
of diamonds or ace of hearts)
= 16/52
Pr(B/A v C) = (14/52)/(16/52)
= 14/16 = 7/8
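Again, a quick mechanical check by counting
cards (a Python sketch; the names are ours):

from fractions import Fraction

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['clubs', 'diamonds', 'hearts', 'spades']
deck = [(r, s) for r in ranks for s in suits]  # 52 cards

a_or_c = [(r, s) for r, s in deck if r == 'A' or s == 'clubs']       # 16 cards
b_and_aoc = [(r, s) for r, s in a_or_c if s in ('clubs', 'spades')]  # 14 cards

print(Fraction(len(b_and_aoc), 52) / Fraction(len(a_or_c), 52))      # 7/8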
Evidence and Learning from Experience
Imagine that you have the lists of players for
two hockey teams.
Team A: 12 Canadians, 8 other
Team B: 16 Canadians, 4 other
The lists aren’t marked, so you choose one
at random and select one player from it.
That player turns out to be Canadian.
What is the probability that you chose from
team B?
We want the probability that the player is
from B given that the player is Canadian,
i.e.:
Pr(B/C) = Pr(B & C)/Pr(C)
Pr(A) = Pr(B) = 0.5 (the list was chosen at
random)
Pr(C/A) = 12/20 = 0.6
Pr(C/B) = 16/20 = 0.8
Pr(C/B) = Pr(C & B)/Pr(B)
So:
Pr(C & B) = Pr(B & C) = Pr(C/B) x Pr(B)
= 0.8 x 0.5
= 0.4
Similarly,
Pr(A & C) = Pr(C/A) x Pr(A)
= 0.6 x 0.5
= 0.3
You can get a Canadian player in one of
two ways: (A & C) or (B & C). These are
mutually exclusive.
So, Pr(C) = Pr(A & C) + Pr(B & C)
= 0.3 + 0.4
= 0.7
Pr(B/C) = 0.4/0.7 ≈ 0.57
Slightly better than 50%.
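The whole calculation in a few lines of
Python (a sketch; variable names are ours):

from fractions import Fraction

pr_A = pr_B = Fraction(1, 2)      # the list was chosen at random
pr_C_given_A = Fraction(12, 20)   # 0.6
pr_C_given_B = Fraction(16, 20)   # 0.8

pr_B_and_C = pr_C_given_B * pr_B  # 0.4
pr_A_and_C = pr_C_given_A * pr_A  # 0.3
pr_C = pr_A_and_C + pr_B_and_C    # 0.7

print(float(pr_B_and_C / pr_C))   # about 0.57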
Suppose you pick two players from one of
the lists (alternatively, you pick a second
player from the list you first chose) and both
are Canadian. Now what is the probability
that you chose from B? (You cannot pick
the same name twice.)
Pr(B/C1 & C2) = Pr(B&C1&C2)/Pr(C1&C2)
Pr(B&C1&C2) = Pr(C2/B&C1) x Pr(B&C1)
= 15/19 x 0.4 ≈ 0.32
Pr(A&C1&C2) = Pr(C2/A&C1) x Pr(A&C1)
= 11/19 x 0.3 ≈ 0.17
Pr(C1&C2) = Pr(B&C1&C2) + Pr(A&C1&C2)
≈ 0.49
So:
Pr(B/C1&C2) ≈ 0.32/0.49 ≈ 0.65
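The same kind of check for two picks (again
a sketch with our variable names):

from fractions import Fraction

pr_B_and_C1 = Fraction(16, 20) * Fraction(1, 2)  # 0.4, from the one-pick case
pr_A_and_C1 = Fraction(12, 20) * Fraction(1, 2)  # 0.3

# Second pick from the same list, without replacement:
pr_B_and_C1_C2 = Fraction(15, 19) * pr_B_and_C1  # about 0.32
pr_A_and_C1_C2 = Fraction(11, 19) * pr_A_and_C1  # about 0.17
pr_C1_C2 = pr_B_and_C1_C2 + pr_A_and_C1_C2       # about 0.49

print(float(pr_B_and_C1_C2 / pr_C1_C2))          # about 0.65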
With the results from two choices we increase
the probability that we chose from B.
By adding a second result/test we increase
our evidence that we have chosen from B.
We learn from experience by obtaining more
evidence.
Another way of putting it: more evidence of
a given kind increases our confidence in a
conclusion (we knew that anyway, but this
shows that conditional probability captures
something correctly).
Basic Laws of Probability
Assumptions:
• Rules are for finite groups of
propositions (or events).
• If A and B are propositions (events),
then so are A v B, A & B, ~A.
• Elementary deductive logic (set
theory) is assumed.
• If A and B are logically equivalent
(same events), then Pr(A) = Pr(B).
We’ve already covered the basic laws:
1. Normality: 0 ≤ Pr(A) ≤ 1
2. Certainty: Pr(Ω) = 1
3. Additivity: Pr(A v B) = Pr(A) + Pr(B)
(Where A and B are mutually exclusive)
If A and B aren’t mutually exclusive we have
the general disjunction rule:
4. Pr(A v B) = Pr(A) + Pr(B) – Pr(A & B)
Notice we must subtract the “overlap”
between A and B.
This diagram might make it clear:
[Venn diagram: two overlapping circles,
Pr(A) and Pr(B); the regions are labelled
Pr(A & ~B), Pr(A & B), and Pr(B & ~A), and
together they make up Pr(A v B).]
If we don’t subtract Pr(A & B) we end up
counting it twice. (See the book for an
‘algebraic’ proof.)
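A small numerical check of rule 4 (our
example, not the book’s): roll one die, let
A = ‘even’ and B = ‘at least 4’.

from fractions import Fraction

outcomes = range(1, 7)
A = {o for o in outcomes if o % 2 == 0}  # {2, 4, 6}
B = {o for o in outcomes if o >= 4}      # {4, 5, 6}

direct = Fraction(len(A | B), 6)         # Pr(A v B) counted directly
by_rule = Fraction(len(A), 6) + Fraction(len(B), 6) - Fraction(len(A & B), 6)
print(direct, by_rule)                   # both 2/3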
We also defined conditional probability:
5. Pr(A/B) = Pr(A & B)/Pr(B) (If Pr(B) > 0)
From this it follows that:
6. Pr(A & B) = Pr(A/B) x Pr(B) (If Pr(B) > 0)
This is the general conjunction rule (i.e. it
can be used for events/propositions that are
not independent).
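A worked instance of rule 6 (our example):
the probability of drawing two aces in a row
without replacement. The draws are not
independent, so the conditional probability
is what matters.

from fractions import Fraction

pr_ace1 = Fraction(4, 52)             # first card is an ace
pr_ace2_given_ace1 = Fraction(3, 51)  # one ace is already gone
print(pr_ace1 * pr_ace2_given_ace1)   # 1/221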
Total Probability
Notice that A = (A & B) v (A & ~B)
Why? Remember, every proposition (event)
is true or false (happens or doesn’t). So, if
A is true (happens), then either B is (does)
as well or B is not (does not).
Therefore,
Pr(A) = Pr[(A & B) v (A & ~B)]
= Pr(A & B) + Pr(A & ~B) [why?]
So:
7. Pr (A) = Pr(A/B)Pr(B) + Pr(A/~B)Pr(~B)
(This follows from the definition of
conditional probability.)
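The hockey example above is rule 7 in
action, with A = ‘the player is Canadian’
and B = ‘you chose list B’ (so ~B = ‘you
chose list A’):

pr_B = 0.5
pr_C_given_B = 0.8     # Pr(Canadian / list B)
pr_C_given_notB = 0.6  # Pr(Canadian / list A)

pr_C = pr_C_given_B * pr_B + pr_C_given_notB * (1 - pr_B)
print(pr_C)            # 0.7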
Logical Consequence
If B logically entails A, then
Pr(B) ≤ Pr(A)
Why? In such cases, B can’t occur without
A occurring, so B is equivalent to A & B. As
we just saw:
Pr(A) = Pr(A & B) + Pr(A & ~B),
So, if B entails A, then
Pr(A) = Pr(B) + Pr(A & ~B).
But Pr(A & ~B) ≥ 0. Therefore:
Pr(B) ≤ Pr(A) when B entails A.
Statistical Independence
If Pr(A) and Pr(B) > 0, then A and B are
statistically independent if and only if:
8. Pr(A/B) = Pr(A)
In other words, B’s truth (occurrence)
doesn’t change the probability of A’s truth
(occurrence).
So, Pr(A & B) = Pr(A) x Pr(B)
Similarly, if A, B and C are independent:
Pr(A & B & C) = Pr(A) x Pr(B) x Pr(C)
And so on.
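A quick check of rule 8 with two dice (our
example): whether the second die comes up
even tells you nothing about the first.

from fractions import Fraction

rolls = [(x, y) for x in range(1, 7) for y in range(1, 7)]  # 36 outcomes
A = [r for r in rolls if r[0] % 2 == 0]                     # first die even
B = [r for r in rolls if r[1] % 2 == 0]                     # second die even
A_and_B = [r for r in A if r[1] % 2 == 0]

pr_A = Fraction(len(A), len(rolls))            # 1/2
pr_A_given_B = Fraction(len(A_and_B), len(B))  # also 1/2
print(pr_A == pr_A_given_B)                    # True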
Here is an important extension of law 5:
If Pr(E) > 0:
Pr[A/(B&E)] = Pr(A&B&E)/Pr(B&E) (from 5)
i. Pr([A&B]&E) = Pr[(A&B)/E] x Pr(E) (from 5)
ii. Pr(B&E) = Pr(B/E) x Pr(E) (from 5)
So if we divide i. by ii. we get:
5C. Pr[A/(B&E)] = Pr[(A&B)/E]/Pr(B/E)
This is the conditionalized form of
Pr[A/(B&E)].
Odd Question #2:
Remember, A & B entails B (‘A & B’ can’t be
true without ‘B’ being true).
So, it follows that
Pr(A & B) ≤ Pr(B)
Bayes’ Rule
H = a hypothesis
E = evidence [Pr(E) > 0]
Pr(H/E) = Pr(H)Pr(E/H) /
[Pr(H)Pr(E/H) + Pr(~H)Pr(E/~H)]
This is known as Bayes’ Rule.
This follows because H and ~H are mutually
exclusive and exhaustive.
(See chapter 7 in the book for a proof of this
rule).
Try the “Hockey Team” example above
using Bayes’ rule. You’ll see that it works.
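Here is that check in Python (a minimal
sketch; the function name bayes is ours):

def bayes(pr_H, pr_E_given_H, pr_E_given_notH):
    """Pr(H/E) by Bayes' Rule, for a hypothesis H and its negation ~H."""
    numerator = pr_H * pr_E_given_H
    return numerator / (numerator + (1 - pr_H) * pr_E_given_notH)

# Hockey example: H = 'chose list B', E = 'player is Canadian'
print(bayes(0.5, 0.8, 0.6))  # about 0.57, as before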
Sometimes there are more than two
mutually exclusive and jointly exhaustive
hypotheses:
H1, H2, H3, … Hk.
For each i, Pr(Hi)>0.
We can generalize Bayes’ Rule:
Pr(Hj/E) = Pr(Hj)Pr(E/Hj) / Σ[Pr(Hi)Pr(E/Hi)]
for i = 1, 2, … k.
NOTE: ‘Σ’ just means ‘the sum of’; i.e. you
figure out Pr(Hi)Pr(E/Hi) for each hypothesis
and then add them all up.
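A sketch of the generalized rule (the
function and parameter names are ours):

def bayes_general(priors, likelihoods, j):
    """Pr(Hj/E), where priors[i] = Pr(Hi) and likelihoods[i] = Pr(E/Hi)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[j] * likelihoods[j] / total

# With just H and ~H it reduces to the two-hypothesis rule:
print(bayes_general([0.5, 0.5], [0.8, 0.6], 0))  # about 0.57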
Base rates and reliability
Imagine a test to determine if drinking water
has E. coli. The test is right 90% of the
time.
Assume 95% of the water in Ontario is free
of E. coli.
Now you run the test on your drinking water
and it comes up positive for E. coli. What is
the probability that your water actually has
E. coli?
Do you think it’s very likely?
After all, the test is very reliable.
E = a water sample has E. coli
~E = a water sample is free of E. coli
P = the test is positive.
Pr(E) = 0.05
Pr(~E) = 0.95
Pr(P/E) = 0.90
Pr(P/~E) = 0.10 (the test is wrong 10% of
the time)
We want to know Pr(E/P). Let’s use Bayes’
Rule:
Pr(E/P) = Pr(E)Pr(P/E) /
[Pr(E)Pr(P/E) + Pr(~E)Pr(P/~E)]
= (0.05 x 0.90)/[(0.05 x 0.90) + (0.95 x 0.10)]
= 0.045/0.14
≈ 0.32
So it is only 32% likely that your water is
bad. It is 68% likely that it’s good.
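The same calculation in a few lines (a
sketch; variable names are ours):

pr_E = 0.05             # base rate: contaminated water
pr_P_given_E = 0.90     # positive test given contamination
pr_P_given_notE = 0.10  # false positive rate

pr_P = pr_E * pr_P_given_E + (1 - pr_E) * pr_P_given_notE  # 0.14
print(pr_E * pr_P_given_E / pr_P)                          # about 0.32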
But the test is reliable, so what’s going on
here?
Even though the test is very reliable, the
vast majority of water is clean. This is
called the base rate or background
information and it can’t be ignored:
If we tested 1000 samples, only 50 would
have E. coli and 950 would not (this is the
base rate).
Of the 950 clean samples, the test says:
10% (95) are contaminated;
90% (855) are clean.
Of the 50 contaminated samples, the test
says:
90% (45) are contaminated;
10% (5) are clean.
So, the test would flag 140 samples as
contaminated even though only 45 of them
actually are (the other 95 are false
positives; the 5 contaminated samples the
test calls clean are false negatives).
Note that 45/140 ≈ 0.32, exactly what
Bayes’ Rule told us.
This is why base rates make a difference.
Two ideas of reliability:
I. Pr(P/E): how reliable is the test at
identifying contaminated water
given that it is contaminated.
This is a question of how well the
test is designed.
II. Pr(E/P): how trustworthy is the test result
given that it came up positive.
This is a feature of the test plus
the base rate.
We must be careful to take into account
whether a phenomenon is very common or
very rare in the relevant population before
we draw conclusions about probabilities.
If something is quite rare, even a reliable
test will give more false positives than there
are actual positives.
Hence the test can be reliable (idea I) but
the result unreliable (idea II).
Summary
1. 0 ≤ Pr(A) ≤ 1
2. Pr() = 1
4. Pr(A v B) = Pr(A) + Pr(B) – Pr(A & B)
5. Pr(A/B) = Pr(A & B)/Pr(B)
6. Pr(A & B) = Pr(A/B) x Pr(B)
7. Pr (A) = Pr(A/B)Pr(B) + Pr(A/~B)Pr(~B)
Bayes’ Rule:
Pr(H/E) = Pr(H)Pr(E/H) /
[Pr(H)Pr(E/H) + Pr(~H)Pr(E/~H)]
Homework:
• Do the exercises at the end of chapters 5,
6 and 7.
• Go over the “odd questions” in these
chapters until you understand them.