Conditional Probability

When we obtain additional information about a probability experiment, we
want to use that information to reassess the probabilities of events.
Example: A box has 5 computer chips. Two are defective. A random sample
of size 2 is selected from the box. (All subsets of size 2 are equally likely).
1. Compute the probability that the second chip is defective.
Intuition/symmetry → P(second chip defective) = 2/5.
More formally, counting ordered draws:

P(second chip defective) = (# outcomes with second chip defective) / (# ways to draw two chips)
                         = (4)(2) / ((5)(4)) = 8/20 = 2/5.
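A quick check of this count in Python (just a sketch; it enumerates the 20 equally likely ordered draws):

    from itertools import permutations

    # Chips: 'D' = defective, 'G' = good.
    chips = ['D', 'D', 'G', 'G', 'G']
    # All ordered draws of two distinct chips: 5*4 = 20 equally likely outcomes.
    draws = list(permutations(range(5), 2))

    second_defective = sum(chips[j] == 'D' for (i, j) in draws)
    print(second_defective / len(draws))   # 8/20 = 0.4 = 2/5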
2. If we know that the first chip is good, what is the probability that the
second chip is defective?
Defn. The conditional probability of an event A given an event B is

P(A|B) := P(A ∩ B) / P(B),

provided P(B) ≠ 0.
The definition makes some sense: the conditional probability of A given
B is the fraction of outcomes in B that are also in A.
An important implication of the definition is as follows:

(∗∗)   P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

(∗∗) holds even if P(A) = 0 or P(B) = 0.
Example: Re-compute the probability that the second chip is defective
given that the first chip is good using the definition.
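Before re-deriving it by hand, here is the same enumeration applied to the conditional question (a sketch reusing chips and draws from the block above):

    # P(A|B) = P(A ∩ B)/P(B) with A = {second defective}, B = {first good}.
    both = sum(chips[i] == 'G' and chips[j] == 'D' for (i, j) in draws)   # A ∩ B
    first_good = sum(chips[i] == 'G' for (i, j) in draws)                 # B
    print(both / first_good)   # 6/12 = 0.5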
Example: More computer chips... A box has 500 computer chips with a speed
of 400 MHz and 500 computer chips with a speed of 500 MHz. The numbers
of good (G) and defective (D) chips at the two different speeds are as shown
in the table below.
            400 MHz   500 MHz   Total
    G          480       490      970
    D           20        10       30
    Total      500       500     1000
We select a chip at random and observe its speed. What is the probability
that the chip is defective given that its speed is 400 MHz?
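The answer can be read off the table with the definition; in Python (a sketch, where the dictionary layout is just one convenient encoding of the table):

    # Counts from the table, indexed as counts[speed][status].
    counts = {'400MHz': {'G': 480, 'D': 20},
              '500MHz': {'G': 490, 'D': 10}}
    total = 1000

    p_D_and_400 = counts['400MHz']['D'] / total      # P(D ∩ 400 MHz) = 20/1000
    p_400 = sum(counts['400MHz'].values()) / total   # P(400 MHz) = 500/1000
    print(p_D_and_400 / p_400)                       # P(D | 400 MHz) = 0.04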
Example: Consider three cards. One card has two green sides, one card
has two red sides, and the third card has one green side and one red side.
({G, G}, {R, R}, {R, G})
- I pick a card at random and show you a randomly selected side.
- What is the probability that the flip side is green given that the side I
show you is green?
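A Monte Carlo sketch of this experiment (the exact answer follows from the definition of conditional probability; the simulation is only a sanity check):

    import random

    # Cards as (side1, side2): green/green, red/red, red/green.
    cards = [('G', 'G'), ('R', 'R'), ('R', 'G')]

    shown_green = flip_green = 0
    for _ in range(100_000):
        card = random.choice(cards)    # pick a card at random
        side = random.randrange(2)     # show a randomly selected side
        if card[side] == 'G':
            shown_green += 1
            if card[1 - side] == 'G':
                flip_green += 1
    print(flip_green / shown_green)    # ≈ 2/3, not 1/2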
Independence
Sometimes, knowledge that B occurred does not change our assessment of
P(A). Let’s say I toss a fair coin. I tell you that I got a tail. I then
give you the coin to toss. Does the knowledge that I got a tail affect what
you think the chance is that you will get a head?
Intuitively, two events A and B are independent if the event B does not
have any influence on the probability that A happens (and vice versa).
Mathematically, independence of two events is defined as follows:
Defn. Two events A and B are called independent if
P (A ∩ B) = P (A)P (B).
Result: If P(B) ≠ 0, then
A and B are independent ↔ P(A|B) = P(A).
• Proof of Result: (HW...Use the definitions of conditional probability and
independence.)
The result gives us another way to think of independence: the fraction of A
out of B is the same as the fraction of A out of Ω.
Example: An alternative model for logging on to the AOL network using
dial-up.
Suppose I log on to AOL using dial-up. I connect successfully if and only
if the phone line works and the AOL network works. The probability
that the phone line works is .9, and the probability that the network works is .6.
Suppose that the status of the phone line and the status of the AOL network
are independent. What is the probability that I connect successfully?
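By independence, the two probabilities multiply (a one-line check):

    p_phone, p_network = 0.9, 0.6
    # Independence: P(phone works ∩ network works) = P(phone works) * P(network works)
    print(p_phone * p_network)   # 0.54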
Result: Events A and B are independent
↔ A and Bᶜ are independent
↔ Aᶜ and B are independent
↔ Aᶜ and Bᶜ are independent.
Proof of Result: (HW...Use the definition of independence and consequence 1. of Kolmogorov’s Axioms).
I defined independence of two events. We can also talk about independence
of a collection of events.
Defn. Events A1, . . . , An are mutually independent if for any {i1, . . . , ik} ⊂
{1, . . . , n},

P(∩_{j=1}^k A*_{ij}) = Π_{j=1}^k P(A*_{ij}),

where each A*_{ij} may be either A_{ij} or its complement A_{ij}ᶜ. Events A1, . . . , An are pairwise independent if
for any i ≠ j in {1, . . . , n}, Ai and Aj are independent.
Note: Mutual independence implies pairwise independence, but pairwise independence does not imply mutual independence. (See supplementary exercises for HW 2/3).
A Little Bit on Systems in Series, Systems in Parallel, and
Reliability (Reference: Hofmann, pp. 17-18.)
• A parallel system consists of k components c1, . . . , ck arranged in such a way that the
system works if and only if at least one of the k components functions properly.
• A series system consists of k components c1, . . . , ck arranged in such a way that the
system works if and only if all of the components function properly.
– The system consisting of the AOL network and the phone line is an example of a
series system.
• The reliability of a system is the probability that the system works.
– For example, the reliability of the system consisting of the AOL network and the
phone line is .54.
• We can also construct larger systems with sub-systems that are connected in series and
in parallel.
Example: Parallel system with k mutually independent components.
Let c1 , . . . , ck denote the k components in a parallel system. Assume the
k components operate independently, and P (cj works ) = pj . What is
the reliability of the system?
P(system works) = P(at least one component works)
               = 1 − P(all components fail)
               = 1 − P(c1 fails and c2 fails . . . and ck fails)
               = 1 − Π_{j=1}^k (1 − pj)   (by independence).
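As a reusable function (a sketch; math.prod requires Python 3.8+):

    from math import prod

    def parallel_reliability(p):
        """Reliability of a parallel system of independent components,
        where p[j] is the probability that component j works."""
        return 1 - prod(1 - pj for pj in p)

    print(parallel_reliability([0.9, 0.6]))   # 1 - (0.1)(0.4) = 0.96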
Example: System in series with k mutually independent components.
Let c1 , . . . , ck denote the k components in a system. Assume the
k components are connected in series, operate independently, and
P (cj works ) = pj . What is the reliability of the system?
P(system works) = P(all k components work)
               = Π_{j=1}^k pj   (by independence).
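The series case in code (a sketch reusing prod from the previous block); note that the components (0.9, 0.6) recover the AOL answer:

    def series_reliability(p):
        """Reliability of a series system of independent components."""
        return prod(p)

    print(series_reliability([0.9, 0.6]))   # 0.9 * 0.6 = 0.54, the AOL example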
Example: Let’s compute the reliability of a system consisting of subsystems connected in series and in parallel (see the sketch below).
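The notes don’t fix a particular system here, so the layout below is hypothetical: two parallel pairs connected in series, evaluated with the two functions above.

    # Hypothetical layout: parallel pair (0.9, 0.8) in series with parallel pair (0.7, 0.7).
    sub1 = parallel_reliability([0.9, 0.8])   # 1 - (0.1)(0.2) = 0.98
    sub2 = parallel_reliability([0.7, 0.7])   # 1 - (0.3)(0.3) = 0.91
    print(series_reliability([sub1, sub2]))   # 0.98 * 0.91 = 0.8918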
Disjointness and Independence are Different Ideas
Disjoint/Mutually Exclusive                vs.   Independent

P(A ∩ B) = P(∅) = 0                              P(A ∩ B) = P(A)P(B)

If I know B happened, then I                     Knowing that B happened
know A did not happen.                           tells me nothing about P(A).

P(A|B) = 0                                       P(A|B) = P(A)
Law of Total Probability and Bayes’ Rule.
This stuff is not new. The Law of Total Probability and Bayes’ Rule are
just restatements of what we already know.
Example: A ridiculous game...
Box 1 (B1) has two gold coins and one penny. Box 2 (B2) has one gold
coin and two pennies. Box 3 (B3) has four gold coins and one penny.
- Player 1 rolls a fair 6-sided die. Call the outcome D. Player 1 picks
a box according to the outcome of the die roll as follows:

    D = 1, 2    → pick B1
    D = 3, 4, 5 → pick B2
    D = 6       → pick B3.
Then, player 1 selects a coin at random from the chosen box and tells
player 2 whether the coin is a gold coin or a penny.
- Player 2 then guesses which box the coin came from.
- If player 2 guesses correctly, then player 2 keeps the selected coin.
Otherwise, player 1 keeps the chosen coin.
a.) What is the probability that player 1 selects a gold coin?
b.) What box will player 2 pick if player 1 selects a gold coin?
c.) What is the probability that player 2 guesses the correct box?
d.) Would you prefer to be player 1 or player 2?
a.) A tree diagram shows all possible outcomes of the two-step procedure.
- There are 3 distinct ways to get a gold coin:
E1 = (B1, G),
E2 = (B2, G),
and E3 = (B3, G).
- E1, E2, and E3 are mutually disjoint.
- E1 ∪ E2 ∪ E3 = G
- Axiom (iii) → P(G) = P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3)
- By definition of conditional probability,
  P(E1) = P(B1 and G) = P(G|B1)P(B1) = (2/3)(1/3) = 2/9.
- Likewise,
  P(E2) = P(G|B2)P(B2) = (1/3)(1/2) = 1/6,
  P(E3) = P(G|B3)P(B3) = (4/5)(1/6) = 4/30.
- Then, P(G) = 2/9 + 1/6 + 4/30 = 47/90 ≈ .522.
*** We just used the Law of Total Probability to compute the probability
of a gold coin.
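The same computation in Python, using exact fractions (a sketch; the variable names are ours):

    from fractions import Fraction as F

    # P(box) from the die roll; P(gold | box) from the coin counts in each box.
    p_box  = {'B1': F(2, 6), 'B2': F(3, 6), 'B3': F(1, 6)}
    p_gold = {'B1': F(2, 3), 'B2': F(1, 3), 'B3': F(4, 5)}

    # Law of Total Probability: sum P(G|box) * P(box) over the cover.
    p_G = sum(p_gold[b] * p_box[b] for b in p_box)
    print(p_G, float(p_G))   # 47/90 ≈ 0.522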
Defn. A collection of events B1, . . . , Bk is called a cover or partition of Ω
if
(i) the events are disjoint (Bi ∩ Bj = ∅ for i ≠ j), and
(ii) the union of the events is Ω (∪_{i=1}^k Bi = Ω).
– If we represent a multi-step procedure with a tree diagram, then the
branches of the tree are a cover.
– We can also represent a cover with a different kind of diagram.
Thrm. Law of Total Probability: If the collection of events B1, . . . , Bk
is a cover of Ω, and A is an event, then

P(A) = Σ_{i=1}^k P(A|Bi)P(Bi).
Proof of the Law of Total Probability:
– By definition of conditional probability, P(A|Bi)P(Bi) = P(A ∩ Bi).
– Because B1, . . . , Bk partition Ω, the events A ∩ B1, . . . , A ∩ Bk are
disjoint, and ∪_{i=1}^k (A ∩ Bi) = A.
– By Axiom (iii), P(A) = Σ_{i=1}^k P(A ∩ Bi) = Σ_{i=1}^k P(A|Bi)P(Bi).
Pictures for the law of total probability: a tree diagram and a Venn diagram (figures omitted).
b.) I tell you that I got a gold coin. Which box do you think it came
from?
We want to compute P (Bj |G), j = 1, 2, 3 and pick the highest one.
By definition of conditional probability,

P(Bj|G) = P(Bj ∩ G) / P(G)
        = P(G|Bj)P(Bj) / P(G)
        = P(G|Bj)P(Bj) / [P(G|B1)P(B1) + P(G|B2)P(B2) + P(G|B3)P(B3)].
Specifically,
P (B1 |G) =
P (B2 |G) =
P (B3 |G) =
To figure out these probabilities, we used Bayes’ rule.
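A sketch that fills in the three blanks above, reusing p_box, p_gold, and p_G from the earlier block:

    # Bayes' rule: P(Bj | G) = P(G|Bj) P(Bj) / P(G)
    for b in p_box:
        print(b, p_gold[b] * p_box[b] / p_G)
    # B1: 20/47 ≈ .426, B2: 15/47 ≈ .319, B3: 12/47 ≈ .255 → guess B1.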
Thrm. Bayes’ Rule: If B1, . . . , Bk is a cover or partition of Ω, and
A is an event, then

P(Bj|A) = P(A|Bj)P(Bj) / Σ_{i=1}^k P(A|Bi)P(Bi).
Proof of Bayes’ Rule:

P(Bj|A) = P(Bj ∩ A) / P(A)
        = P(A|Bj)P(Bj) / P(A)
        = P(A|Bj)P(Bj) / Σ_{i=1}^k P(A|Bi)P(Bi),

where the last equality applies the Law of Total Probability to P(A).
We can represent Bayes’ rule with tree diagrams and Venn diagrams as well.