Lecture Notes 1 - Probability Space - March 2, 2022

1 Probability Space (Ω, F, P)

[Figure 1: Probability Space]

Definition 1. Sample Space and Outcome
The sample space Ω is the set of all possible outcomes of a random experiment.

Definition 2. Event
An event E is a subset of Ω. F is a collection of subsets of Ω, i.e., a collection of events.

Definition 3. Probability
A probability (or probability measure) P is a real-valued function defined on F which satisfies the following axioms:

1. For every event E ∈ F, P(E) ≥ 0
2. P(Ω) = 1
3. For any sequence of disjoint events E1, E2, ...,

   P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i)

Example 1. Tossing a fair coin three times

   Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
   F = {φ, {HHH}, {HHT}, ..., {HHH, HHT}, ..., Ω}
   P({HHH, HHT, HTH, THH}) = 1/2
   ...

Example 2. Tossing a fair coin until the first head appears

   Ω = {H, TH, TTH, TTTH, TTTTH, ...}
   F = {φ, {H}, {H, TH}, {H, TH, TTH}, ...}
   P({H, TH}) = 3/4
   ...

Example 3. Romeo and Juliet have a date at a given time, and each will arrive at the meeting place with a delay between 0 and 1 hour, with all pairs of delays being equally likely.

   Ω = {(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}
   F = {E | E ⊂ Ω}
   P({(x, y) | |x − y| ≤ 1/12, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}) = 23/144
   ...

2 Basic Theorems: Properties of Probability

Theorem 1.

   P(φ) = 0    (1)

Proof. Let E_i = φ for all i. The E_i are disjoint and Ω = Ω ∪ (∪_{i=1}^∞ E_i), so

   1 = P(Ω) = P(Ω ∪ (∪_{i=1}^∞ E_i)) = P(Ω) + Σ_{i=1}^∞ P(E_i) = 1 + Σ_{i=1}^∞ P(E_i)

   ∴ P(φ) = 0

Theorem 2. For any finite sequence of n disjoint events E1, ..., En,

   P(∪_{i=1}^n E_i) = Σ_{i=1}^n P(E_i)    (2)

Proof. Let E_i = φ for i > n. Then

   P(∪_{i=1}^n E_i) = P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) = Σ_{i=1}^n P(E_i) + Σ_{i=n+1}^∞ P(E_i) = Σ_{i=1}^n P(E_i)

Theorem 3. For every event E, P(E^c) = 1 − P(E)

Proof.

   Ω = E ∪ E^c,   E ∩ E^c = φ,   and P(Ω) = 1

   ∴ 1 = P(Ω) = P(E) + P(E^c), so P(E^c) = 1 − P(E)

Theorem 4. If E_i ⊂ E_j, then

   P(E_i) ≤ P(E_j)    (3)

Proof.

   P(E_j) = P(E_i) + P(E_i^c ∩ E_j),   and P(E_i^c ∩ E_j) ≥ 0

   ∴ P(E_i) ≤ P(E_j)

Theorem 5. For every event E, 0 ≤ P(E) ≤ 1

Proof. P(E) ≥ 0 by Axiom 1, and since E ⊂ Ω, Theorem 4 gives P(E) ≤ P(Ω) = 1.

   ∴ 0 ≤ P(E) ≤ 1

Theorem 6. For every two events E_i and E_j,

   P(E_i ∪ E_j) = P(E_i) + P(E_j) − P(E_i ∩ E_j)    (4)

Proof.

   P(E_i ∪ E_j) = P(E_i ∩ E_j^c) + P(E_i ∩ E_j) + P(E_i^c ∩ E_j)
                = P(E_i) − P(E_i ∩ E_j) + P(E_i ∩ E_j) + P(E_j) − P(E_i ∩ E_j)
                = P(E_i) + P(E_j) − P(E_i ∩ E_j)

3 Computation of Probability

Definition 4. Simple Sample Space
A sample space Ω containing N outcomes is called a simple sample space if the probability assigned to each of the outcomes is 1/N.

Remark. The probability P(E) of an event E containing N(E) outcomes in a simple sample space is

   P(E) = N(E)/N

Remark. Methods of Counting

1. Multiplication Rule: If an experiment has k parts, where part i has n_i possible outcomes regardless of the outcomes of the other parts, then the experiment has ∏_{i=1}^k n_i outcomes.

2. Two crucial aspects when counting: First, we should distinguish between sampling with replacement and sampling without replacement. Second, we should determine whether or not the ordering of the outcomes is important. Taking these two considerations into account, the possible methods of counting can be categorized into four cases: without replacement/ordered, without replacement/unordered, with replacement/ordered, and with replacement/unordered. Consider selecting k out of n.
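Before working through the four cases, note that in a simple sample space P(E) = N(E)/N, so probabilities can be checked by brute-force enumeration. Here is a minimal Python sketch along those lines for Example 1 (the helper name prob is illustrative, not from the notes):

```python
from fractions import Fraction
from itertools import product

# Simple sample space for Example 1: three tosses of a fair coin.
omega = list(product("HT", repeat=3))   # 8 equally likely outcomes

def prob(event):
    """P(E) = N(E)/N, valid only in a simple sample space."""
    return Fraction(len(event), len(omega))

# The event {HHH, HHT, HTH, THH}, i.e. at least two heads.
at_least_two_heads = [w for w in omega if w.count("H") >= 2]
print(prob(at_least_two_heads))   # 1/2, matching Example 1
```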
The methods of counting all possible outcomes under each of the four cases can be represented as follows.

[Figure 2: Methods of Counting]

[1] With replacement / Ordered
In this case, the number of distinct outcomes is given by

   n^k

Example 4. If a password is required to have 8 characters (letters or numbers), how many distinct passwords can we get?

[2] Without replacement / Ordered
In this case, the number of distinct outcomes is given by

   nPk = n!/(n − k)! = n × (n − 1) × (n − 2) × ··· × (n − k + 1)

Example 5. A box contains n balls numbered 1, ..., n. If k balls are selected with replacement, what is the probability of the event E that each of the k balls selected will have a different number (n ≥ k)?

Example 6. (Birthday Problem) What is the probability that at least two people in a group of k people (2 ≤ k ≤ 365) will have the same birthday?

[3] Without replacement / Unordered
In this case, the order is not important: outcomes consisting of the same elements are regarded as the same outcome. Reflecting this fact, the number of distinct outcomes is given by

   nPk / k! = nCk = n!/((n − k)! k!)

Example 7. Suppose that an urn contains 8 red balls and 4 white balls. We draw 3 balls from the urn without replacement. Assuming that at each draw each ball in the urn is equally likely to be chosen, what is the probability that two of the three balls drawn are red?

Example 8. Suppose that a deck of 52 cards containing four aces is shuffled thoroughly and the cards are then distributed among four players so that each player receives 13 cards. Determine the probability that each player will receive one ace.

[Figure 3]

[4] With replacement / Unordered
The easiest way to count in this case is to think of putting k balls into bins numbered from 1 to n. For example, one outcome can be shown as the following figure.

[Figure 4]

Thinking in this way, counting the number of distinct outcomes is equivalent to counting the number of arrangements of k balls and n − 1 interior walls (excluding the two end walls). Therefore, the number of distinct outcomes is given by

   (n−1+k)Ck = (n − 1 + k)!/((n − 1)! k!)

Example 9. From the numbers 1, 2, ..., 45, you may pick any six for your ticket. If the winning number is decided by randomly selecting six numbers from the forty-five with replacement, determine the probability of your winning.

4 Conditional Probability

Definition 5. Conditional Probability
For an event F such that P(F) > 0, we define the conditional probability of E given F by

   P(E|F) = P(E ∩ F)/P(F)    (5)

[Figure 5: Conditional Probability]

Remark. Conditional probability, P(E|F), is the name given to the new belief about E after receiving the new information, in this case that event F occurred. Conditional probability is itself a probability because it satisfies the axioms of probability:

   P(E|F) ≥ 0

   P(Ω|F) = P(Ω ∩ F)/P(F) = P(F)/P(F) = 1

   P(∪_{i=1}^∞ E_i | F) = P((∪_{i=1}^∞ E_i) ∩ F)/P(F) = P(∪_{i=1}^∞ (E_i ∩ F))/P(F) = Σ_{i=1}^∞ P(E_i ∩ F)/P(F) = Σ_{i=1}^∞ P(E_i|F)

Example 10. We roll two fair 6-sided dice. Given that the two dice land on different numbers, find the conditional probability that at least one die roll is a 6. (A short enumeration check follows Theorem 7 below.)

[Figure 6]

Theorem 7. Suppose P(F) > 0. Then

   P(E ∩ F) = P(E|F)P(F)    (6)

Assuming that all of the conditioning events have positive probability, we get

   P(∩_{i=1}^n E_i) = P(E1)P(E2|E1)P(E3|E1 ∩ E2) ··· P(En | ∩_{i=1}^{n−1} E_i)    (7)
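The enumeration check promised in Example 10: a minimal sketch evaluating Definition 5 directly on the simple sample space of 36 equally likely dice outcomes (the names omega, prob, F are ours):

```python
from fractions import Fraction
from itertools import product

# Simple sample space for two fair 6-sided dice: 36 equally likely outcomes.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

F = [w for w in omega if w[0] != w[1]]   # the dice land on different numbers
E_and_F = [w for w in F if 6 in w]       # ...and at least one die shows a 6

# Definition 5: P(E|F) = P(E ∩ F)/P(F)
print(prob(E_and_F) / prob(F))   # (10/36)/(30/36) = 1/3
```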
Example 11. Three cards are drawn from an ordinary 52-card deck without replacement. Find the probability that none of the three cards is a heart.

Theorem 8. The Theorem of Total Probability
If F1, F2, ..., Fn is a partition¹ of Ω and P(Fi) > 0 for all i, then

   P(E) = Σ_{i=1}^n P(E|Fi)P(Fi)    (8)

[Figure 7: Probability Space]

Proof.

   E = ∪_{i=1}^n (E ∩ Fi),   (E ∩ Fi) ∩ (E ∩ Fj) = φ for any i ≠ j

   P(E ∩ Fi) = P(E|Fi)P(Fi)

   ∴ P(E) = Σ_{i=1}^n P(E ∩ Fi) = Σ_{i=1}^n P(E|Fi)P(Fi)

¹ It is said that F1, F2, ..., Fn is a partition when F1, F2, ..., Fn are disjoint and exhaustive, that is, Ω = ∪_{i=1}^n Fi and Fi ∩ Fj = φ for any i ≠ j.

Example 12. Suppose that a box contains one fair coin and one coin with a head on each side. Suppose also that one coin is selected at random and it is tossed. What is the probability that a head will be obtained?

[Figure 8]

Theorem 9. Bayes' Theorem
If F1, F2, ..., Fn is a partition of Ω and P(Fj) > 0 for all j, and P(E) > 0, then²

   P(Fi|E) = P(E|Fi)P(Fi) / Σ_{j=1}^n P(E|Fj)P(Fj)    (9)

Proof.

   P(Fi|E) = P(Fi ∩ E)/P(E) = P(E|Fi)P(Fi) / Σ_{j=1}^n P(E|Fj)P(Fj)

² This way of updating the probability of event Fi is usually called Bayesian Updating.

Example 13. If a person has the disease, the test results are positive with probability 0.95, and if the person does not have the disease, the test results are negative with probability 0.95. A random person drawn from a certain population has probability 0.001 of having the disease. Given that the person just tested positive, what is the probability of having the disease?

We will call the event that a person has the disease "D" and the event that the test results are negative "N". We know

   P(D) = 0.001        P(D^c) = 0.999
   P(N^c|D) = 0.95     P(N|D) = 0.05
   P(N^c|D^c) = 0.05   P(N|D^c) = 0.95

By applying Bayes' theorem, we get

   P(D|N^c) = P(N^c|D)P(D) / [P(N^c|D)P(D) + P(N^c|D^c)P(D^c)]
            = 0.95 · 0.001 / (0.95 · 0.001 + 0.05 · 0.999) ≈ 0.0187

5 Independence

Definition 6. Independence
E and F are called independent if

   P(E ∩ F) = P(E) · P(F)    (10)

Remark. When P(E) > 0 and P(F) > 0, this definition of independence is equivalent to

   P(E|F) = P(E)  and  P(F|E) = P(F)    (11)

Theorem 10. Suppose that P(E) > 0 and P(F) > 0.

1. If E and F are disjoint, then E and F are dependent.
2. If E and F are independent, then E and F are not disjoint.

Remark. Suppose that P(E) = 0 and/or P(F) = 0. Then whether E and F are disjoint or not, the two events are independent (both sides of (10) are then 0).

Theorem 11. If E and F are independent, then E and F^c are also independent.

Proof.

   P(E ∩ F^c) = P(E) − P(E ∩ F) = P(E) − P(E)P(F) = P(E)[1 − P(F)] = P(E) · P(F^c)

Example 14. If E and F are independent, then are E^c and F^c also independent?

   P(E^c ∩ F^c) = P((E ∪ F)^c) = 1 − P(E ∪ F)
                = 1 − (P(E) + P(F) − P(E ∩ F))
                = 1 − (P(E) + P(F) − P(E) · P(F))   (since E and F are independent)
                = 1 − P(F) − P(E)(1 − P(F))
                = (1 − P(E)) · (1 − P(F)) = P(E^c) · P(F^c)

Example 15. A card is selected at random from an ordinary deck of 52 playing cards. E is the event that the selected card is an ace and F is the event that it is a spade. Are events E and F independent?
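Example 15 can be settled numerically with Definition 6. A minimal sketch (the deck representation and names are ours):

```python
from fractions import Fraction
from itertools import product

# A 52-card deck as (rank, suit) pairs; one card selected at random.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

def prob(event):
    return Fraction(len(event), len(deck))

E = [c for c in deck if c[0] == "A"]        # the card is an ace
F = [c for c in deck if c[1] == "spades"]   # the card is a spade
EF = [c for c in E if c[1] == "spades"]     # ace of spades

# Definition 6: independent iff P(E ∩ F) = P(E) · P(F)
print(prob(EF), prob(E) * prob(F))   # both 1/52, so E and F are independent
```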
Definition 7. (Mutual) Independence
E1, E2, ..., En are said to be (mutually) independent if for every sub-collection E_i, E_j, ..., E_k of k ≤ n of the events,

   P(E_i ∩ E_j ∩ ··· ∩ E_k) = P(E_i) · P(E_j) ··· P(E_k)    (12)

Example 16. Consider two independent fair coin tosses, and the following events:

   H1 = {1st toss is a head}
   H2 = {2nd toss is a head}
   D = {the two tosses have different results}

Are these three events independent? We know

   P(H1) = P(H2) = P(D) = 1/2

   P(H1 ∩ H2) = P(H1 ∩ D) = P(H2 ∩ D) = 1/4

   P(H1 ∩ H2 ∩ D) = 0

From these facts,

   P(H1 ∩ H2) = 1/4 = P(H1) · P(H2)

   P(Hi ∩ D) = 1/4 = P(Hi) · P(D)   for i = 1, 2

   P(H1 ∩ H2 ∩ D) = 0 ≠ 1/8 = P(H1) · P(H2) · P(D)

Therefore, H1 and H2 are independent, H1 and D are independent, and H2 and D are independent, but H1, H2 and D are not (mutually) independent.

Remark. (Mutual) independence implies that the occurrence or non-occurrence of any number of the events from the collection E1, E2, ..., En carries no information on the remaining events or their complements.

Example 17. Assume that E1, E2, E3, E4 are (mutually) independent and P(E3 ∩ E4) > 0. Show that

   P(E1 ∪ E2 | E3 ∩ E4) = P(E1 ∪ E2)

We have

   P(E1|E3 ∩ E4) = P(E1 ∩ E3 ∩ E4)/P(E3 ∩ E4) = P(E1)P(E3)P(E4)/(P(E3)P(E4)) = P(E1)

We similarly obtain

   P(E2|E3 ∩ E4) = P(E2)
   P(E1 ∩ E2|E3 ∩ E4) = P(E1 ∩ E2)

From these facts, we get

   P(E1 ∪ E2 | E3 ∩ E4) = P(E1|E3 ∩ E4) + P(E2|E3 ∩ E4) − P(E1 ∩ E2|E3 ∩ E4)
                        = P(E1) + P(E2) − P(E1 ∩ E2) = P(E1 ∪ E2)

Definition 8. Conditional Independence
Given an event G, the events E and F are called conditionally independent if

   P(E ∩ F|G) = P(E|G) · P(F|G)    (13)

Remark. Conditional independence is also characterized by

   P(E|F ∩ G) = P(E ∩ F ∩ G)/P(F ∩ G) = P(G) · P(E ∩ F|G) / (P(G) · P(F|G))    (14)
              = P(G) · P(E|G) · P(F|G) / (P(G) · P(F|G)) = P(E|G)    (15)

This relation states that if G is known to have occurred, the additional knowledge that F also occurred does not change the probability of E.

Remark. Independence of two events E and F does not imply conditional independence, and vice versa.

Example 18. There are two coins, a blue one and a red one. We choose one of the two at random, each being chosen with probability 1/2, and proceed with two independent tosses. The coins are biased: with the blue coin, the probability of heads in any given toss is 0.99, whereas for the red coin it is 0.01. Let B be the event that the blue coin was selected, and let R = B^c be the event that the red coin was selected. Let also Hi be the event that the ith toss resulted in heads.

1. When the event B is known to have occurred, are the events H1 and H2 conditionally independent?
2. Are the events H1 and H2 independent?

We know

   P(Hi|B) = 0.99,   P(Hi|R) = 0.01   for all i

   P(H1 ∩ H2|B) = 0.99 · 0.99 = P(H1|B) · P(H2|B)

Thus, H1 and H2 are conditionally independent given B.

   P(H1) = P(B) · P(H1|B) + P(R) · P(H1|R) = (1/2) · 0.99 + (1/2) · 0.01 = 0.5

   P(H2) = P(B) · P(H2|B) + P(R) · P(H2|R) = (1/2) · 0.99 + (1/2) · 0.01 = 0.5

   P(H1 ∩ H2) = P(B) · P(H1 ∩ H2|B) + P(R) · P(H1 ∩ H2|R) = (1/2) · 0.99² + (1/2) · 0.01² = 0.4901

From these facts,

   P(H1 ∩ H2) = 0.4901 ≠ 0.25 = P(H1) · P(H2)

Thus, H1 and H2 are not independent.

6 Appendix

Theorem 12. Inclusion and Exclusion Formula

   P(∪_{i=1}^n E_i) = Σ_{i=1}^n P(E_i) − Σ_{1≤i<j≤n} P(E_i ∩ E_j) + Σ_{1≤i<j<k≤n} P(E_i ∩ E_j ∩ E_k) − ··· + (−1)^{n−1} P(∩_{i=1}^n E_i)

Proof. We already proved this formula for n = 2 (Theorem 6). Now let's prove that if the formula is true for k, then it is also true for k + 1:

   P(∪_{i=1}^{k+1} E_i) = P((∪_{i=1}^k E_i) ∪ E_{k+1})
                        = P(∪_{i=1}^k E_i) + P(E_{k+1}) − P((∪_{i=1}^k E_i) ∩ E_{k+1})
                        = P(∪_{i=1}^k E_i) + P(E_{k+1}) − P(∪_{i=1}^k (E_i ∩ E_{k+1}))

Expanding both unions by the induction hypothesis,

   = Σ_{i=1}^k P(E_i) − Σ_{1≤i<j≤k} P(E_i ∩ E_j) + Σ_{1≤i<j<m≤k} P(E_i ∩ E_j ∩ E_m) − ··· + (−1)^{k−1} P(∩_{i=1}^k E_i)
     + P(E_{k+1}) − Σ_{1≤i≤k} P(E_i ∩ E_{k+1}) + Σ_{1≤i<j≤k} P(E_i ∩ E_j ∩ E_{k+1}) − ··· + (−1)^{(k+1)−1} P(∩_{i=1}^{k+1} E_i)

Collecting the terms that do and do not involve E_{k+1},

   = Σ_{i=1}^{k+1} P(E_i) − Σ_{1≤i<j≤k+1} P(E_i ∩ E_j) + Σ_{1≤i<j<m≤k+1} P(E_i ∩ E_j ∩ E_m) − ··· + (−1)^{(k+1)−1} P(∩_{i=1}^{k+1} E_i)
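Theorem 12 can be checked by brute force on a small simple sample space. In the sketch below the three events are arbitrary choices and the helper names are ours; the alternating sum over all non-empty sub-collections must equal the probability of the union:

```python
from fractions import Fraction
from itertools import combinations

# A tiny simple sample space: 10 equally likely outcomes 0..9.
omega = set(range(10))

def prob(A):
    return Fraction(len(A), len(omega))

# Three arbitrary overlapping events for the check.
E = [{0, 1, 2, 3}, {2, 3, 4, 5}, {0, 5, 6, 7}]
n = len(E)

# Right-hand side: alternating sum over all non-empty sub-collections.
rhs = Fraction(0)
for r in range(1, n + 1):
    for idx in combinations(range(n), r):
        inter = set.intersection(*(E[i] for i in idx))
        rhs += (-1) ** (r - 1) * prob(inter)

lhs = prob(set.union(*E))
assert lhs == rhs   # Theorem 12 holds: both sides equal 4/5 here
print(lhs, rhs)
```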
Remark. The principle of mathematical induction
Let n0 ∈ N, where N is the set of natural numbers, and let P(n) be a statement for each natural number n ≥ n0. Suppose that

1. The statement P(n0) is true.
2. For all k ≥ n0, the truth of P(k) implies the truth of P(k + 1).

Then P(n) is true for all n ≥ n0.

Example 19. The Monty Hall Problem
A prize is equally likely to be found behind any one of three closed doors in front of you. You point to one of the doors. A friend opens for you one of the remaining two doors, after making sure that the prize is not behind it. At this point, you can stick to your initial choice, or switch to the other unopened door. You win the prize if it lies behind your final choice of a door. Which door will you choose?

[Figure 9: The Monty Hall Problem]

Solution. First, the probability P(i) that the prize is behind door i, i = 1, 2, 3, is

   P(1) = P(2) = P(3) = 1/3

Second, given that you have chosen door 1, the probability that your friend opens door 3 under the condition that the prize is behind door i is

   P(opening 3|1) = 1/2,   P(opening 3|2) = 1,   P(opening 3|3) = 0

Now we can compute the probability of the prize being behind door j given that door 3 was opened, P(j|opening 3), j = 1, 2, by applying Bayes' rule:

   P(1|opening 3) = P(opening 3|1)P(1) / [P(opening 3|1)P(1) + P(opening 3|2)P(2) + P(opening 3|3)P(3)]
                  = (1/2 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 1/3

   P(2|opening 3) = P(opening 3|2)P(2) / [P(opening 3|1)P(1) + P(opening 3|2)P(2) + P(opening 3|3)P(3)]
                  = (1 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 2/3

Therefore, you are better off switching.

Example 20. A new couple, known to have two children, has just moved into town. Suppose that the mother is encountered walking with one of her children. If this child is a girl, what is the probability that both children are girls?

Solution. Let G be the event that the child seen with the mother is a girl, and define G1, G2, B1, B2 as follows:

   G1: The first child is a girl
   G2: The second child is a girl
   B1: The first child is a boy
   B2: The second child is a boy

The probability this example requires us to find is

   P(G1 ∩ G2|G) = P(G1 ∩ G2 ∩ G)/P(G) = P(G1 ∩ G2)/P(G)

(since G1 ∩ G2 ⊂ G, so G1 ∩ G2 ∩ G = G1 ∩ G2). Also, by the theorem of total probability,

   P(G) = P(G|G1 ∩ G2) · P(G1 ∩ G2) + P(G|G1 ∩ B2) · P(G1 ∩ B2) + P(G|B1 ∩ G2) · P(B1 ∩ G2) + P(G|B1 ∩ B2) · P(B1 ∩ B2)
        = P(G1 ∩ G2) + P(G|G1 ∩ B2) · P(G1 ∩ B2) + P(G|B1 ∩ G2) · P(B1 ∩ G2)

where the final equation used the results P(G|G1 ∩ G2) = 1 and P(G|B1 ∩ B2) = 0. Assuming that all 4 gender possibilities are equally likely, we get

   P(G1 ∩ G2|G) = (1/4) / (1/4 + P(G|G1 ∩ B2) · 1/4 + P(G|B1 ∩ G2) · 1/4)

Thus the answer depends on whatever assumptions we make about the conditional probabilities P(G|G1 ∩ B2) and P(G|B1 ∩ G2). A short sketch exploring this dependence follows.
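To make that dependence concrete, the sketch below sweeps the assumption. For simplicity it takes P(G|G1 ∩ B2) = P(G|B1 ∩ G2) = p; this symmetric parameterization is our simplification, not something the notes impose:

```python
from fractions import Fraction

def p_both_girls_given_seen_girl(p):
    """P(G1 ∩ G2 | G) when P(G | one girl, one boy) = p for either birth order,
    assuming the four gender combinations are equally likely (each 1/4)."""
    quarter = Fraction(1, 4)
    return quarter / (quarter + p * quarter + p * quarter)

# p = 1/2: the walking child is chosen at random -> answer 1/2
print(p_both_girls_given_seen_girl(Fraction(1, 2)))  # 1/2
# p = 1: a daughter is always the one shown when the family has one -> 1/3
print(p_both_girls_given_seen_girl(Fraction(1)))     # 1/3
```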