CS145: Probability & Computing
Lecture 2: Axioms of Probability, Conditioning & Bayes' Rule
Instructor: Erik Sudderth, Brown University Computer Science
January 29, 2015

Course Enrollment
Please complete the survey TODAY! http://tinyurl.com/cs145Survey
Overall course grades will be determined as follows: 50% homeworks, 20% midterm exam, 30% final exam. The midterm exam will be given during the normal lecture time on Thursday, March 12. The final exam will be given on Wednesday, May 6 from 9:00am-12:00pm.

Homework Policies
Homework 1: Out 1/29, due 2/5, NO MATLAB.

Homework Assignments
There will be ten homework assignments, each due one week after it is handed out. Homework problems will emphasize probabilistic derivations, calculations, and reasoning. Most homeworks will also have one problem requiring Matlab implementation of simple methods from probability or statistics. The scores of all ten assignments will be averaged equally to determine an overall homework score (we will not "drop" any homeworks). Homeworks will be submitted electronically.

Collaboration Policy
Students may discuss and work on homework problems in groups. However, each student must write up their solutions independently, and do any required programming independently. List the names of any collaborators on the front page of your solutions. You may not directly copy solutions from other students, or from materials distributed in previous versions of this or other courses. You may not make your solutions available to other students: files in your home directory may not be world-readable, and you may not post your solutions to public websites.

Late Submission Policy
Homework assignments are due by 11:59pm on Thursday evenings. Your answers may be submitted up to 4 days late (by Monday evening); after this point, solutions will be distributed and handins will no longer be accepted. You may submit up to two late assignments without penalty. For each subsequent late assignment, 20 points (out of a maximum of 100) will be deducted from the overall score. Exceptions to this policy are only given in very unusual circumstances, and any extensions must be requested in advance by e-mail to the instructor.

Syllabus: Summary of Course Topics

Reasoning Under Uncertainty
• 1/1000 of tourists who visit tropical country X return with a dangerous virus Y.
• There is a test to check for the virus. The test has a 5% false positive rate and no false negative error.
• You returned from country X, took the test, and it was positive. Should you take the painful treatment for the virus?

Elements of a Probabilistic Model
• The sample space Ω, which is the set of all possible outcomes of an experiment.
• The probability law, which assigns to a set A of possible outcomes (also called an event) a nonnegative number P(A) (called the probability of A) that encodes our knowledge or belief about the collective "likelihood" of the elements of A. The probability law must satisfy certain properties, to be introduced shortly.

Sample Spaces and Probability Laws
[Figure 1.2: The main ingredients of a probabilistic model. An experiment produces an outcome in the sample space (set of outcomes); the probability law assigns numbers P(A), P(B) to events A, B.]
Some figures and materials courtesy Bertsekas & Tsitsiklis, Introduction to Probability, 2008.

Defining a Probabilistic Model
A probabilistic model is a mathematical description of an uncertain situation. It must be in accordance with a fundamental framework that we discuss in this section. Its two main ingredients, the sample space and the probability law, are visualized in Fig. 1.2.

Sample Spaces and Events
Every probabilistic model involves an underlying process, called the experiment, that will produce exactly one out of several possible outcomes. The set of all possible outcomes is called the sample space of the experiment, and is denoted by Ω. A subset of the sample space, that is, a collection of possible outcomes, is called an event.

Background: Sets
• A set is a collection of objects, which are the elements of the set.
• A set can be finite: S = {1, 2, . . . , n}.
• A set can be countably infinite: S = {x | x = 2k + 1 or x = −(2k + 1), k = 0, 1, 2, . . .} = {1, −1, 3, −3, 5, −5, . . . }.
• A set can be uncountable: S = {x | x ∈ [0, 1]}.
• A set can be empty: S = ∅.

Sets: Elements & Relationships
• x ∈ S – the element x is a member of the set S
• x ∉ S – the element x is not a member of the set S
• ∃x – there exists an x . . .
• ∀x – for all elements x . . .
• T ⊆ S – ∀x ∈ T, x ∈ S
• T ⊂ S – ∀x ∈ T, x ∈ S, AND ∃x ∈ S such that x ∉ T.

Sets: Combination & Manipulation
• A base set Ω; all sets are subsets of Ω.
• Basic operations, for S, T ⊆ Ω:
  • S ∪ T = {x | x ∈ S or x ∈ T}  (union)
  • S ∩ T = {x | x ∈ S and x ∈ T}  (intersection)
  • S̄ = S^c = {x | x ∉ S}  (complement)
• De Morgan's laws:
  • (S ∪ T)^c = S̄ ∩ T̄
  • (S ∩ T)^c = S̄ ∪ T̄
  • (∪_{i∈I} S_i)^c = ∩_{i∈I} S̄_i
  • (∩_{i∈I} S_i)^c = ∪_{i∈I} S̄_i

Visualizing Sets: Venn Diagrams
[Venn diagram of sets S, T within the base set Ω, illustrating the operations and De Morgan's laws above.]

Partitions of a Set
[Figure 1 (Pitman): Partitions of a set B, showing a subset B of the square partitioned in three different ways.]
However B is partitioned into subsets, or broken up into pieces, the area of B is the sum of the areas of the pieces. This is the addition rule for area. The addition rule is satisfied by other measures of sets instead of area: for example, length, volume, and the number or proportion of elements for finite sets.
A set B is partitioned into n subsets B1, . . . , Bn if:
B1 ∪ B2 ∪ · · · ∪ Bn = B
Bi ∩ Bj = ∅ for any i ≠ j  (the pieces are mutually disjoint)
The addition rule now appears as one of the three basic rules of proportion. No matter how probabilities are interpreted, it is generally agreed they must satisfy the same three rules. (Pitman's Probability, 1999)

The Sample Space
Ω = "Omega": a set, or unordered "list", of possible outcomes from some random (not deterministic) experiment. The list defining the sample space must be:
Mutually exclusive: Each experiment has a unique outcome.
Collectively exhaustive: No matter what happens in the experiment, the outcome is an element of the sample space.
An art: Choosing the "right" granularity, to capture the phenomenon of interest as simply as possible. Modeling in science and engineering involves tradeoffs between accuracy, simplicity, & tractability.

A Finite Sample Space
You roll a tetrahedral (4-sided) die 2 times. With X = first roll and Y = second roll,
Ω = {(x, y) | x ∈ {1, 2, 3, 4}, y ∈ {1, 2, 3, 4}}
Formally, the sample space is a set of 4^2 = 16 discrete outcomes. We can also model the outcome via a tree-based sequential description.

The sample space of an experiment need not be unique. A given physical situation may be modeled in several different ways, depending on the kind of questions that we are interested in. Generally, the sample space chosen for a probabilistic model must be collectively exhaustive, in the sense that no matter what happens in the experiment, we always obtain an outcome that has been included in the sample space. In addition, the sample space should have enough detail to distinguish between all outcomes of interest to the modeler, while avoiding irrelevant details.

A Finite Sample Space
You toss a (2-sided) coin 10 times. Two possible sample spaces for this experiment:
A. You record the number of times the coin comes up heads: Ω = {0, 1, 2, . . . , 9, 10}
B. You record the full sequence of head-tail outcomes: Ω = {H, T}^10, all 2^10 possible H-T sequences.
Which is better? It depends on what you want to model. Example 1.1 considers two alternative games, both involving ten successive coin tosses:
Game 1: We receive $1 each time a head comes up.
Game 2: We receive $1 for every coin toss, up to and including the first time a head comes up. Then, we receive $2 for every coin toss, up to the second time a head comes up. More generally, the dollar amount per toss is doubled each time a head comes up.
(For Game 1 only the number of heads matters, so sample space A suffices; for Game 2 the order of the tosses matters, so sample space B is needed.)

Events and Sets
Table 1 (Pitman's Probability, 1999) gives translations between events and sets. To interpret Venn diagrams in terms of events, imagine that a point is picked at random from the square. Each point in the square then represents an outcome, and each region of the diagram represents the event that the point is picked from that region.
Event language / Set language / Set notation:
• outcome space / universal set / Ω
• event / subset of Ω / A, B, C, etc.
• impossible event / empty set / ∅
• not A, opposite of A / complement of A / A^c
• either A or B or both / union of A and B / A ∪ B
• both A and B / intersection of A and B / AB, A ∩ B
• A and B are mutually exclusive / A and B are disjoint / AB = ∅
• if A then B / A is a subset of B / A ⊆ B

The Axioms of Probability
• Event: a subset of the sample space
• Probability is assigned to events
Axioms:
1. Nonnegativity: P(A) ≥ 0
2. Normalization: P(Ω) = 1
3. Additivity: If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
These axioms are appropriate for any finite sample space. Infinite spaces require a generalization of the additivity axiom.
For a finite set of distinct outcomes, additivity gives
P({s1, s2, . . . , sk}) = P({s1}) + · · · + P({sk}) = P(s1) + · · · + P(sk)
• The nonnegativity and additivity axioms are fundamental to our intuitive understanding of probability and uncertainty.
• Unit normalization is just a convention, and other options could also be used (e.g., probability between 0% and 100%).
• Axiom 3 needs strengthening (for infinite sample spaces).
• Do weird sets have probabilities?

The Discrete Uniform Law
Formalizes the idea of "completely random" sampling.
• Let all outcomes be equally likely. Then
P(A) = (number of elements of A) / (total number of sample points)
• Computing probabilities reduces to counting.
• Defines fair coins, fair dice, well-shuffled decks.

Uniform Law for a Finite Sample Space
You roll a tetrahedral (4-sided) die 2 times; X = first roll, Y = second roll.
• Let every possible outcome have probability 1/16.
– P((X, Y) is (1,1) or (1,2)) = 2/16
– P({X = 1}) = 4/16
– P(X + Y is odd) = 8/16
– P(min(X, Y) = 2) = 5/16

Example 1.5. Romeo and Juliet have a date at a given time, and each will arrive at the meeting place with a delay between 0 and 1 hour, with all pairs of delays being equally likely. The first to arrive will wait for 15 minutes and will leave if the other has not yet arrived. What is the probability that they will meet? Let us use as sample space the square Ω = [0, 1] × [0, 1], whose elements are the possible pairs of delays for the two of them. Our interpretation of "equally likely" pairs of delays is to let the probability of a subset of Ω be equal to its area. This probability law satisfies the three probability axioms.
The event that Romeo and Juliet will meet is the shaded region in Fig. 1.5, and its probability is calculated to be 7/16.

Properties of Probability Laws
Probability laws have a number of properties, which can be deduced from the axioms. Some of them are summarized below. Consider a probability law, and let A, B, and C be events.
(a) If A ⊂ B, then P(A) ≤ P(B).
(b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(c) P(A ∪ B) ≤ P(A) + P(B).
(d) P(A ∪ B ∪ C) = P(A) + P(A^c ∩ B) + P(A^c ∩ B^c ∩ C).
[Figure 1.6: Visualization and verification of these properties using Venn diagrams. For example, if A ⊂ B, the events A and A^c ∩ B are disjoint and their union is B, so P(B) = P(A) + P(A^c ∩ B) ≥ P(A).]

Axioms of Probability for Infinite Spaces
Example with a countably infinite sample space:
• Sample space: {1, 2, . . .}
• We are given P(n) = 2^(−n), n = 1, 2, . . .
• Find P(outcome is even):
P({2, 4, 6, . . .}) = P(2) + P(4) + P(6) + · · · = 1/2^2 + 1/2^4 + 1/2^6 + · · · = 1/3
Countable additivity axiom (needed for this calculation): If A1, A2, . . . are disjoint events, then:
P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·

An (Infinite) Continuous Probability Model
• Two "random" numbers in [0, 1]: you throw a dart at a 1-meter square region. Sample space:
Ω = {(x, y) | 0 ≤ x, y ≤ 1}
• Uniform law: Probability = Area
– P(X + Y ≤ 1/2) = ?
– P((X, Y) = (0.5, 0.3)) = ?

Conditional Probabilities and Bayes' Rule
Some figures and materials courtesy Bertsekas & Tsitsiklis, Introduction to Probability, 2008.
Known properties of probability laws carry over to conditional probability laws. For example, the fact P(A ∪ C) ≤ P(A) + P(C) translates to the new fact P(A ∪ C | B) ≤ P(A | B) + P(C | B).

Conditional Probability
Let us summarize the conclusions reached so far.
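The continuous uniform law on the unit square can be checked numerically. Below is a minimal Monte Carlo sketch (illustrative Python, not part of the course materials, which use Matlab): the event X + Y ≤ 1/2 is a triangle of area 1/8, and sampling confirms this.

```python
import random

# Uniform law on the unit square: probability = area.
# Estimate P(X + Y <= 1/2) by sampling uniformly from [0,1] x [0,1].
# The event is a right triangle with legs of length 1/2, so its
# area (and hence its probability) is (1/2 * 1/2) / 2 = 1/8.

random.seed(0)
n = 200_000
hits = sum(1 for _ in range(n)
           if random.random() + random.random() <= 0.5)
estimate = hits / n
print(estimate)  # close to 0.125

# Any single point, e.g. (0.5, 0.3), is a set of area zero, so its
# probability under the uniform law is exactly 0.
```

This also illustrates why continuous models assign probability zero to individual outcomes while still assigning positive probability to regions.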
• P(A | B) = probability of A, given that B occurred
– B is our new universe
• Definition: Assuming P(B) ≠ 0,
P(A | B) = P(A ∩ B) / P(B)
• P(A | B) is undefined if P(B) = 0

Properties of Conditional Probability
• The conditional probability of an event A, given an event B with P(B) > 0, is defined by P(A | B) = P(A ∩ B) / P(B), and specifies a new (conditional) probability law on the same sample space Ω. In particular, all known properties of probability laws remain valid for conditional probability laws.
• Conditional probabilities can also be viewed as a probability law on a new universe B, because all of the conditional probability is concentrated on B.
• In the case where the possible outcomes are finitely many and equally likely, we have
P(A | B) = (number of elements of A ∩ B) / (number of elements of B)

Conditional Probability Example
You roll a tetrahedral (4-sided) die 2 times; X = first roll, Y = second roll; every possible outcome has probability 1/16.
• Let B be the event: min(X, Y) = 2
• Let M = max(X, Y)
• P(M = 1 | B) = 0
• P(M = 2 | B) = 1/5
Note that the conditional probabilities of M given B satisfy the axioms, i.e. they are non-negative and sum to one.
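The die-roll conditional probabilities can be verified by brute-force counting, since all 16 outcomes are equally likely. A small sketch (illustrative Python, not from the course materials):

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely outcomes of two rolls of a 4-sided die.
outcomes = list(product(range(1, 5), repeat=2))

# Conditioning event B: min(X, Y) = 2. It contains the 5 outcomes
# (2,2), (2,3), (2,4), (3,2), (4,2).
B = [(x, y) for (x, y) in outcomes if min(x, y) == 2]

def cond_prob_max(m):
    """P(max(X, Y) = m | B) by counting, valid because outcomes are equally likely."""
    favorable = [(x, y) for (x, y) in B if max(x, y) == m]
    return Fraction(len(favorable), len(B))

print(cond_prob_max(1))  # 0: on B the max is at least min = 2
print(cond_prob_max(2))  # 1/5: only (2,2) among the 5 outcomes in B
print(sum(cond_prob_max(m) for m in range(1, 5)))  # 1: a valid probability law
```

The final sum confirms the remark above: the conditional probabilities of M given B are non-negative and sum to one.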
Models Based on Conditional Probabilities
Radar detection example. Event A: Airplane is flying above. Event B: Something registers on radar screen.
P(A) = 0.05, P(A^c) = 0.95
P(B | A) = 0.99, P(B^c | A) = 0.01  (missed detection)
P(B | A^c) = 0.10  (false alarm), P(B^c | A^c) = 0.90
[Figure 1.8: Sequential (tree) description of the sample space for the radar detection problem in Example 1.9.]

Multiplication Rule
P(A ∩ B) = P(A) · P(B | A)
P(A ∩ B ∩ C) = P(A) · P(B | A) · P(C | A ∩ B)
In mathematical terms, we are dealing with an event A which occurs if and only if each one of several events A1, . . . , An has occurred, i.e., A = A1 ∩ A2 ∩ · · · ∩ An. The occurrence of A is viewed as an occurrence of A1, followed by the occurrence of A2, then of A3, etc., and it is visualized as a path on the tree with n branches, corresponding to the events A1, . . . , An. The probability of A is given by the following rule (see also Fig. 1.9).
Assuming that all of the conditioning events have positive probability, we have
P(∩_{i=1}^n Ai) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) · · · P(An | ∩_{i=1}^{n−1} Ai).
The multiplication rule can be verified by writing
P(∩_{i=1}^n Ai) = P(A1) · [P(A1 ∩ A2) / P(A1)] · [P(A1 ∩ A2 ∩ A3) / P(A1 ∩ A2)] · · · [P(∩_{i=1}^n Ai) / P(∩_{i=1}^{n−1} Ai)],
and by using the definition of conditional probability to rewrite the right-hand side above as
P(A1) P(A2 | A1) P(A3 | A1 ∩ A2) · · · P(An | ∩_{i=1}^{n−1} Ai).

Total Probability Theorem
• Divide and conquer
• Partition of the sample space into A1, A2, A3
• Have P(B | Ai), for every i
• One way of computing P(B):
P(B) = P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)

Bayes' Rule
• "Prior" probabilities P(Ai) – initial "beliefs"
• We know P(B | Ai) for each i
• Wish to compute P(Ai | B) – revise "beliefs", given that B occurred:
P(Ai | B) = P(Ai ∩ B) / P(B)
         = P(Ai) P(B | Ai) / P(B)
         = P(Ai) P(B | Ai) / Σ_j P(Aj) P(B | Aj)

Counterintuitive? Reasoning Under Uncertainty
• 1/1000 of tourists who visit tropical country X return with a dangerous virus Y.
• There is a test to check for the virus. The test has a 5% false positive rate and no false negative error.
• You returned from country X, took the test, and it was positive. Should you take the painful treatment for the virus?
• A: has the virus. B: positive in the test.
By Bayes' rule (using P(B | A) = 1, since there are no false negatives),
P(A | B) = (1/1000) / (1/1000 + (999/1000) · (1/20)) = 20/1019 ≈ 2%
Explanation: Out of 1000 tourists, 1 will have the virus and about another 50 will be false positives in the test.
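The same computation can be written out as a quick numerical check, combining the total probability theorem and Bayes' rule exactly as in the derivation above (illustrative Python sketch, not part of the course materials):

```python
from fractions import Fraction

# Virus test example: A = has the virus, B = test is positive.
p_A = Fraction(1, 1000)            # prior: 1/1000 of tourists have the virus
p_B_given_A = Fraction(1)          # no false negatives
p_B_given_notA = Fraction(5, 100)  # 5% false positive rate

# Total probability theorem: P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA

# Bayes' rule: P(A|B) = P(A)P(B|A) / P(B)
p_A_given_B = p_A * p_B_given_A / p_B

print(p_A_given_B)         # 20/1019
print(float(p_A_given_B))  # about 0.0196, i.e. roughly 2%
```

Exact rational arithmetic makes the 20/1019 result transparent: despite the positive test, the posterior probability of infection is only about 2%, because false positives among the 999/1000 healthy tourists vastly outnumber the true positives.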