Conditional Probability

When we obtain additional information about a probability experiment, we want to use that information to reassess the probabilities of events.

Example: A box has 5 computer chips. Two are defective. A random sample of size 2 is selected from the box. (All subsets of size 2 are equally likely.)

1. Compute the probability that the second chip is defective.

   Intuition/symmetry → P(second chip defective) = 2/5.

   More formally,

      P(second chip defective) = (# ordered outcomes with second chip defective) / (# ordered ways to draw two chips) = (4)(2)/((5)(4)) = 2/5.

2. If we know that the first chip is good, what is the probability that the second chip is defective?

Defn. The conditional probability of an event A given an event B is

   P(A|B) := P(A ∩ B) / P(B),

provided P(B) ≠ 0.

The definition makes some sense: the conditional probability of A given B is the fraction of outcomes in B that are also in A.

An important implication of the definition is as follows:

   (**) P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

(**) holds even if P(A) = 0 or P(B) = 0.

Example: Re-compute the probability that the second chip is defective given that the first chip is good, using the definition. (Both chip answers are checked by simulation in the sketch at the end of this section.)

Example: More computer chips... A box has 500 computer chips with a speed of 400 MHz and 500 computer chips with a speed of 500 MHz. The numbers of good (G) and defective (D) chips at the two different speeds are as shown in the table below.

            400 MHz   500 MHz   Total
   G          480       490       970
   D           20        10        30
   Total      500       500      1000

We select a chip at random and observe its speed. What is the probability that the chip is defective given that its speed is 400 MHz?

Example: Consider three cards. One card has two green sides, one card has two red sides, and the third card has one green side and one red side. ({G, G}, {R, R}, {R, G})

- I pick a card at random and show you a randomly selected side.
- What is the probability that the flip side is green given that the side I show you is green?

Independence

Sometimes, knowledge that B occurred does not change our assessment of P(A). Let's say I toss a fair coin. I tell you that I got a tail. I then give you the coin to toss. Does the knowledge that I got a tail affect what you think the chance is that you will get a head?

Intuitively, two events A and B are independent if the event B does not have any influence on the probability that A happens (and vice versa). Mathematically, independence of two events is defined as follows:

Defn. Two events A and B are called independent if P(A ∩ B) = P(A)P(B).

Result: If P(B) ≠ 0, then A and B are independent ↔ P(A|B) = P(A).

• Proof of Result: (HW... Use the definitions of conditional probability and independence.)

The result gives us another way to think of independence: the fraction of A out of B is the same as the fraction of A out of Ω.

Example: An alternative model for logging on to the AOL network using dial-up. Suppose I log on to AOL using dial-up. I connect successfully if and only if the phone line works and the AOL network works. The probability that the phone line works is .9, and the probability that the network works is .6. Suppose that the status of the phone line and the status of the AOL network are independent. What is the probability that I connect successfully?

Result: Events A and B are independent ↔ A and B^c are independent ↔ A^c and B are independent ↔ A^c and B^c are independent.

Proof of Result: (HW... Use the definition of independence and consequence 1 of Kolmogorov's Axioms.)
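As a numerical check of the chip example at the start of this section, here is a minimal Python simulation sketch (the setup code is mine, not part of the notes). It estimates P(second chip defective) and P(second chip defective | first chip good); the two estimates differ, so the events "first chip good" and "second chip defective" are also an example of events that are not independent.

import random

CHIPS = ["D", "D", "G", "G", "G"]  # 5 chips, 2 defective

trials = 200_000
second_defective = 0
first_good = 0
first_good_and_second_defective = 0

for _ in range(trials):
    first, second = random.sample(CHIPS, 2)  # ordered draw without replacement
    if second == "D":
        second_defective += 1
    if first == "G":
        first_good += 1
        if second == "D":
            first_good_and_second_defective += 1

# Unconditional probability: should be close to 2/5.
print(second_defective / trials)

# Conditional probability, estimated as a fraction of the conditioning
# outcomes: should be close to 2/4.
print(first_good_and_second_defective / first_good)

The second estimate is the empirical version of the definition: the fraction of outcomes in B (first chip good) that are also in A (second chip defective).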
I defined independence of two events. We can also talk about independence of a collection of events.

Defn. Events A_1, ..., A_n are mutually independent if for any {i_1, ..., i_k} ⊂ {1, ..., n},

   P(∩_{j=1}^k A*_{i_j}) = ∏_{j=1}^k P(A*_{i_j}),

where each A*_{i_j} may be either A_{i_j} or its complement A^c_{i_j}.

Events A_1, ..., A_n are pairwise independent if for any i ≠ j in {1, ..., n}, A_i and A_j are independent.

Note: Mutual independence implies pairwise independence, but pairwise independence does not imply mutual independence. (See supplementary exercises for HW 2/3.)

A Little Bit on Systems in Series, Systems in Parallel, and Reliability

(Reference: Hofmann, pp. 17-18.)

• A parallel system consists of k components c_1, ..., c_k arranged in such a way that the system works if and only if at least one of the k components functions properly.

• A series system consists of k components c_1, ..., c_k arranged in such a way that the system works if and only if all of the components function properly.

   – The system consisting of the AOL network and the phone line is an example of a series system: I connect successfully only if both work.

• The reliability of a system is the probability that the system works.

   – For example, the reliability of the system consisting of the AOL network and the phone line is (.9)(.6) = .54.

• We can also construct larger systems with sub-systems that are connected in series and in parallel.

Example: Parallel system with k mutually independent components. Let c_1, ..., c_k denote the k components in a parallel system. Assume the k components operate independently, and P(c_j works) = p_j. What is the reliability of the system?

   P(system works) = P(at least one component works)
                   = 1 − P(all components fail)
                   = 1 − P(c_1 fails and c_2 fails ... and c_k fails)
                   = 1 − ∏_{j=1}^k (1 − p_j).

Example: System in series with k mutually independent components. Let c_1, ..., c_k denote the k components in a system. Assume the k components are connected in series, operate independently, and P(c_j works) = p_j. What is the reliability of the system?

   P(system works) = P(all components work) = ∏_{j=1}^k p_j.

Example: Let's compute the reliability of a system consisting of sub-systems connected in series and in parallel. (One such computation appears in the sketch below.)
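The two formulas above, and a mixed series/parallel computation, are easy to put into code. Here is a minimal Python sketch; the function names series and parallel are my own, purely illustrative, and both assume the components operate independently, as in the examples above.

from math import prod

def series(ps):
    # All components must work: multiply the component reliabilities.
    return prod(ps)

def parallel(ps):
    # At least one component works: complement of "all components fail".
    return 1 - prod(1 - p for p in ps)

# The AOL example: phone line (.9) and network (.6) connected in series.
print(series([0.9, 0.6]))  # 0.54

# A hypothetical mixed system: two parallel pairs, themselves connected in series.
print(series([parallel([0.9, 0.9]), parallel([0.6, 0.6])]))  # 0.99 * 0.84 = 0.8316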
Disjointness and Independence are Different Ideas

   Disjoint/Mutually Exclusive            vs.   Independent
   P(A ∩ B) = P(∅) = 0                          P(A ∩ B) = P(A)P(B)
   If I know B happened, then I                 Knowing that B happened tells
   know A did not happen.                       me nothing about P(A).
   P(A|B) = 0                                   P(A|B) = P(A)

Law of Total Probability and Bayes' Rule

This stuff is not new. The Law of Total Probability and Bayes' Rule are just restatements of what we already know.

Example: A ridiculous game... Box 1 (B1) has two gold coins and one penny. Box 2 (B2) has one gold coin and two pennies. Box 3 (B3) has four gold coins and one penny.

- Player 1 rolls a fair 6-sided die. Call the outcome D. Player 1 picks a box according to the outcome of the die roll as follows:

      D = 1, 2    → pick B1
      D = 3, 4, 5 → pick B2
      D = 6       → pick B3.

  Then, player 1 selects a coin at random from the chosen box and tells player 2 whether the coin is a gold coin or a penny.
- Player 2 then guesses which box the coin came from.
- If player 2 guesses correctly, then player 2 keeps the selected coin. Otherwise, player 1 keeps the chosen coin.

a.) What is the probability that player 1 selects a gold coin?
b.) What box will player 2 pick if player 1 selects a gold coin?
c.) What is the probability that player 2 guesses the correct box?
d.) Would you prefer to be player 1 or player 2?

a.) A tree diagram shows all possible outcomes of the two-step procedure.

- There are 3 distinct ways to get a gold coin: E1 = (B1, G), E2 = (B2, G), and E3 = (B3, G).
- E1, E2, and E3 are mutually disjoint.
- E1 ∪ E2 ∪ E3 = G.
- Axiom (iii) → P(G) = P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3).
- By definition of conditional probability,

     P(E1) = P(B1 and G) = P(G|B1)P(B1) = (2/3)(1/3) = 2/9.

- Likewise,

     P(E2) = P(G|B2)P(B2) = (1/3)(1/2) = 1/6
     P(E3) = P(G|B3)P(B3) = (4/5)(1/6) = 4/30.

- Then, P(G) = 2/9 + 1/6 + 4/30 = 47/90 ≈ .522.

*** We just used the Law of Total Probability to compute the probability of a gold coin.

Defn. A collection of events B1, ..., Bk is called a cover or partition of Ω if (i) the events are disjoint (B_i ∩ B_j = ∅ for i ≠ j), and (ii) the union of the events is Ω (∪_{i=1}^k B_i = Ω).

– If we represent a multi-step procedure with a tree diagram, then the branches of the tree are a cover.
– We can also represent a cover with a different kind of diagram: a Venn diagram.

Thrm. Law of Total Probability: If the collection of events B1, ..., Bk is a cover of Ω, and A is an event, then

   P(A) = Σ_{i=1}^k P(A|B_i)P(B_i).

Proof of the Law of Total Probability:

– By definition of conditional probability, P(A|B_i)P(B_i) = P(A ∩ B_i).
– Because B_1, ..., B_k partition Ω, the events A ∩ B_1, ..., A ∩ B_k are disjoint, and ∪_{i=1}^k (A ∩ B_i) = A.
– By Axiom (iii), P(A) = Σ_{i=1}^k P(A ∩ B_i) = Σ_{i=1}^k P(A|B_i)P(B_i).

Pictures for the law of total probability: a tree diagram and a Venn diagram.

b.) I tell you that I got a gold coin. Which box do you think it came from?

We want to compute P(Bj|G), j = 1, 2, 3, and pick the highest one. By definition of conditional probability,

   P(Bj|G) = P(Bj ∩ G) / P(G)
           = P(G|Bj)P(Bj) / P(G)
           = P(G|Bj)P(Bj) / [P(G|B1)P(B1) + P(G|B2)P(B2) + P(G|B3)P(B3)].

Specifically,

   P(B1|G) = (2/9)/(47/90) = 20/47 ≈ .426
   P(B2|G) = (1/6)/(47/90) = 15/47 ≈ .319
   P(B3|G) = (4/30)/(47/90) = 12/47 ≈ .255,

so player 2 should guess B1.

To figure out these probabilities, we used Bayes' rule.

Thrm. Bayes' Rule: If B1, ..., Bk is a cover or partition of Ω, and A is an event, then

   P(Bj|A) = P(A|Bj)P(Bj) / Σ_{i=1}^k P(A|B_i)P(B_i).

Proof of Bayes' Rule:

   P(Bj|A) = P(Bj ∩ A) / P(A) = P(A|Bj)P(Bj) / P(A) = P(A|Bj)P(Bj) / Σ_{i=1}^k P(A|B_i)P(B_i).

We can represent Bayes' rule with tree diagrams and Venn diagrams as well.
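The calculations in parts a.) and b.) are easy to verify with a short Python sketch (the dictionary layout is my own choice; exact fractions from the standard-library fractions module avoid rounding):

from fractions import Fraction as F

prior = {"B1": F(2, 6), "B2": F(3, 6), "B3": F(1, 6)}  # P(Bj), from the die roll
gold = {"B1": F(2, 3), "B2": F(1, 3), "B3": F(4, 5)}   # P(G|Bj), from the box contents

# Law of Total Probability: P(G) = sum over j of P(G|Bj) P(Bj).
p_gold = sum(gold[b] * prior[b] for b in prior)
print(p_gold)  # 47/90, about .522

# Bayes' Rule: P(Bj|G) = P(G|Bj) P(Bj) / P(G).
posterior = {b: gold[b] * prior[b] / p_gold for b in prior}
print(posterior)  # B1: 20/47, B2: 15/47, B3: 12/47

# Player 2's best guess is the box with the largest posterior probability.
print(max(posterior, key=posterior.get))  # B1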