Review of Probability

Axioms of Probability Theory
Pr(A) denotes the probability that proposition A is true. (A is also called an event, or a random variable.)
1. 0 ≤ Pr(A) ≤ 1
2. Pr(True) = 1 and Pr(False) = 0
3. Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B)

A Closer Look at Axiom 3
Pr(A ∨ B) = Pr(A) + Pr(B) − Pr(A ∧ B)
[Venn diagram: the universe True containing events A and B; the overlap A ∧ B is counted only once]

Using the Axioms to Prove New Properties
Pr(A ∨ ¬A) = Pr(True) = 1
Pr(A ∨ ¬A) = Pr(A) + Pr(¬A) − Pr(A ∧ ¬A) = Pr(A) + Pr(¬A) − Pr(False) = Pr(A) + Pr(¬A) − 0
Therefore 1 = Pr(A) + Pr(¬A), i.e., Pr(¬A) = 1 − Pr(A). We proved this from the axioms alone.

Probability of Events
• Sample space and events
  – Sample space S (e.g., all people in an area)
  – Events E1 ⊆ S (e.g., all people having a cough), E2 ⊆ S (e.g., all people having a cold)
• Prior (marginal) probabilities of events
  – P(E) = |E| / |S| (frequency interpretation)
  – P(E) = 0.1 (subjective probability)
  – 0 ≤ P(E) ≤ 1 for all events
  – Two special events, ∅ and S: P(∅) = 0 and P(S) = 1.0
• Boolean operators between events (to form compound events)
  – Conjunction (intersection): E1 ^ E2 (E1 ∩ E2)
  – Disjunction (union): E1 v E2 (E1 ∪ E2)
  – Negation (complement): ~E (E^c = S − E)
• Probabilities of compound events
  – P(~E) = 1 − P(E), because P(~E) + P(E) = 1
  – P(E1 v E2) = P(E1) + P(E2) − P(E1 ^ E2)
  – But how do we compute the joint probability P(E1 ^ E2)?
  [Venn diagrams: E and ~E; E1, E2, and their overlap E1 ^ E2]
• Conditional probability (of E1, given E2)
  – How likely E1 occurs within the subspace of E2:
    P(E1 | E2) = |E1 ∩ E2| / |E2| = (|E1 ∩ E2| / |S|) / (|E2| / |S|) = P(E1 ∩ E2) / P(E2)
  – Hence P(E1 ∩ E2) = P(E1 | E2) P(E2)
Using Venn diagrams and decision trees is very useful in proofs and reasoning.

Independence, Mutual Exclusion and Exhaustive Sets of Events
• Independence assumption
  – Two events E1 and E2 are said to be independent of each other if P(E1 | E2) = P(E1) (knowing E2 does not change the likelihood of E1)
  – Independence simplifies the computations:
    P(E1 ∩ E2) = P(E1 | E2) P(E2) = P(E1) P(E2)
    P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) = P(E1) + P(E2) − P(E1) P(E2) = 1 − (1 − P(E1))(1 − P(E2))
• Mutually exclusive (ME) and exhaustive (EXH) sets of events
  – ME: Ei ∩ Ej = ∅ (so P(Ei ∩ Ej) = 0) for all i, j = 1, ..., n with i ≠ j
  – EXH: E1 ∪ ... ∪ En = S (so P(E1 ∪ ... ∪ En) = 1)

Random Variables

Discrete Random Variables
• X denotes a random variable.
• X can take on a finite number of values in the set {x1, x2, ..., xn}.
• P(X = xi), or P(xi), is the probability that the random variable X takes on the value xi.
• P(·) is called the probability mass function.
• E.g., P(Room) = <0.7, 0.2, 0.08, 0.02>: these are the probabilities of the four possible values of X, and they must sum to 1.0.

Discrete Random Variables
• Finite set of possible outcomes: X ∈ {x1, x2, x3, ..., xn}
• P(xi) ≥ 0 and Σ_{i=1}^n P(xi) = 1
• X binary: P(x) + P(¬x) = 1
[Bar chart: P(xi) for the outcomes x1, x2, x3, x4]

Continuous Random Variable
• Probability distribution (density function) over continuous values
• X ∈ [0, 10], P(x) ≥ 0, ∫_0^10 P(x) dx = 1
• P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
[Plot: density P(x) on [0, 10], with the area between x = 5 and x = 7 shaded]

Continuous Random Variables
• X takes on values in the continuum.
• p(X = x), or p(x), is a probability density function (PDF).
• Pr(x ∈ [a, b]) = ∫_a^b p(x) dx
• E.g. [Plot: a density p(x) versus x]
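Before moving on to full distributions, here is a minimal, self-contained Python sketch of the two ideas above: a discrete PMF that must sum to 1, and Pr(x ∈ [a, b]) as the integral of a density. The room labels and the triangular density are illustrative assumptions, not taken from the slides.

```python
# Discrete PMF from the slide: four values that must sum to 1.
# The room names are assumed for illustration only.
room_pmf = {"office": 0.7, "kitchen": 0.2, "bedroom": 0.08, "bathroom": 0.02}
assert abs(sum(room_pmf.values()) - 1.0) < 1e-9  # a PMF sums to 1

def p(x):
    """An assumed density on [0, 10]: triangular, peaking at x = 5, total area 1."""
    if 0.0 <= x <= 5.0:
        return x / 25.0
    if 5.0 < x <= 10.0:
        return (10.0 - x) / 25.0
    return 0.0

def integrate(f, a, b, n=10_000):
    """Trapezoidal-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

print(integrate(p, 0, 10))  # ~1.0: a density integrates to 1 over its support
print(integrate(p, 5, 7))   # Pr(5 <= x <= 7): area under p(x) between 5 and 7
```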
Probability Distribution
• Probability distribution P(X | ξ)
  – X is a random variable (discrete or continuous)
  – ξ is the background state of information

Joint and Conditional Probabilities
• Joint: P(x, y) = P(X = x ∧ Y = y)
  – Probability that both X = x and Y = y
• Conditional: P(x | y) = P(X = x | Y = y)
  – Probability that X = x given we know that Y = y

Joint and Conditional Probability
• P(X = x and Y = y) = P(x, y)
• If X and Y are independent, then P(x, y) = P(x) P(y)
• P(x | y) is the probability of x given y:
  P(x | y) = P(x, y) / P(y), and hence P(x, y) = P(x | y) P(y)
• If X and Y are independent, then P(x | y) = P(x)

Law of Total Probability
• Discrete case:
  Σ_x P(x) = 1
  P(x) = Σ_y P(x, y)
  P(x) = Σ_y P(x | y) P(y)
• Continuous case:
  ∫ p(x) dx = 1
  p(x) = ∫ p(x, y) dy
  p(x) = ∫ p(x | y) p(y) dy

Rules of Probability: Marginalization
• Product rule: P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
• Marginalization: P(Y) = Σ_{i=1}^n P(Y, xi)
• X binary: P(Y) = P(Y, x) + P(Y, ¬x)

Gaussian, Mean and Variance
• N(μ, σ): the Gaussian (normal) distribution
  P(x) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))
[Plots: Gaussian curves N(μ, σ) with different means and different variances]

Gaussian Networks
• Each variable is a linear function of its parents, with Gaussian noise.
[Figures: two-node networks X → Y and their joint probability density functions]

Reverend Thomas Bayes (1702–1761)
Clergyman and mathematician who first used probability inductively. His research established a mathematical basis for probabilistic inference.

Bayes Rule
P(H, E) = P(H | E) P(E) = P(E | H) P(H)
P(H | E) = P(E | H) P(H) / P(E)

Bayes Rule: Smoking and Cancer Example
• E = smoke, H = cancer. All people = 1000.
  – 100 people smoke, so 1000 − 100 = 900 people do not smoke
  – 40 people have cancer, so 1000 − 40 = 960 people do not have cancer
  – 10 people smoke and have cancer
[Venn diagram: smokers and cancer patients within the population of 1000, overlapping in the 10 people who both smoke and have cancer]
• 10/40 = probability that you smoke if you have cancer = P(smoke | cancer)
• 10/100 = probability that you have cancer if you smoke = P(cancer | smoke)
• By Bayes rule: P(cancer | smoke) = P(smoke | cancer) P(cancer) / P(smoke)
  – P(smoke) = 100/1000, P(cancer) = 40/1000, P(smoke | cancer) = 10/40 = 25%
  – P(cancer | smoke) = (10/40) × (40/1000) / (100/1000) = (10/1000) / (100/1000) = 10/100 = 10%
• Similarly: P(cancer | not smoke) = P(not smoke | cancer) P(cancer) / P(not smoke)
  – P(not smoke | cancer) = 30/40, P(not smoke) = 900/1000
  – P(cancer | not smoke) = (30/40) × (40/1000) / (900/1000) = (30/1000) / (900/1000) = 30/900 ≈ 3.3%
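The same example can be checked in a few lines of code. Below is a minimal Python sketch using only the counts given above; the bayes function and the variable names are ours, not part of the slides.

```python
# Reproduce the smoking/cancer numbers with the generic rule P(H | E) = P(E | H) P(H) / P(E).
def bayes(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Posterior P(H | E) from likelihood P(E | H), prior P(H), and evidence P(E)."""
    return p_e_given_h * p_h / p_e

total = 1000
smokers, cancer, both = 100, 40, 10

p_smoke = smokers / total                            # 0.10
p_cancer = cancer / total                            # 0.04
p_smoke_given_cancer = both / cancer                 # 0.25
p_not_smoke_given_cancer = (cancer - both) / cancer  # 0.75

print(bayes(p_smoke_given_cancer, p_cancer, p_smoke))          # 0.10   -> P(cancer | smoke)
print(bayes(p_not_smoke_given_cancer, p_cancer, 1 - p_smoke))  # ~0.033 -> P(cancer | not smoke)
```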
Bayes' Theorem with Relative Likelihood
• In the setting of diagnostic/evidential reasoning:
  – H1, ..., Hn are hypotheses; E1, ..., Em are evidence/manifestations
  – We know the prior probabilities P(Hi) and the conditional probabilities P(Ej | Hi)
  – We want to compute the posterior probabilities P(Hi | Ej)
• Bayes' theorem (formula 1): P(Hi | Ej) = P(Hi) P(Ej | Hi) / P(Ej)
• If the purpose is only to find which of the n hypotheses H1, ..., Hn is most plausible given Ej, then we can ignore the denominator and rank them by the relative likelihood:
  rel(Hi | Ej) = P(Ej | Hi) P(Hi)

Relative Likelihood
• P(Ej) can be computed from P(Ej | Hi) and P(Hi) if we assume all hypotheses H1, ..., Hn are ME and EXH:
  P(Ej) = P(Ej ∩ (H1 ∪ ... ∪ Hn))        (by EXH)
        = Σ_{i=1}^n P(Ej ∩ Hi)           (by ME)
        = Σ_{i=1}^n P(Ej | Hi) P(Hi)
• Then we have another version of Bayes' theorem:
  P(Hi | Ej) = P(Ej | Hi) P(Hi) / Σ_{k=1}^n P(Ej | Hk) P(Hk) = rel(Hi | Ej) / Σ_{k=1}^n rel(Hk | Ej)
  where Σ_{k=1}^n P(Ej | Hk) P(Hk), the sum of the relative likelihoods of all n hypotheses, is a normalization factor.

Naïve Bayesian Approach
• Knowledge base:
  – E1, ..., Em: evidence/manifestations
  – H1, ..., Hn: hypotheses/disorders
  – Ej and Hi are binary, and the hypotheses form a ME & EXH set
  – Conditional probabilities P(Ej | Hi), i = 1, ..., n, j = 1, ..., m
• Case input: E1, ..., El
• Goal: find the hypothesis Hi with the highest posterior probability P(Hi | E1, ..., El)
• By Bayes' theorem: P(Hi | E1, ..., El) = P(E1, ..., El | Hi) P(Hi) / P(E1, ..., El)
• Assume all pieces of evidence are conditionally independent given any hypothesis:
  P(E1, ..., El | Hi) = Π_{j=1}^l P(Ej | Hi)

Absolute Posterior Probability
• The relative likelihood:
  rel(Hi | E1, ..., El) = P(E1, ..., El | Hi) P(Hi) = P(Hi) Π_{j=1}^l P(Ej | Hi)
• The absolute posterior probability:
  P(Hi | E1, ..., El) = rel(Hi | E1, ..., El) / Σ_{k=1}^n rel(Hk | E1, ..., El)
                      = P(Hi) Π_{j=1}^l P(Ej | Hi) / Σ_{k=1}^n [ P(Hk) Π_{j=1}^l P(Ej | Hk) ]
• Evidence accumulation (when new evidence is discovered):
  rel(Hi | E1, ..., El, El+1) = P(El+1 | Hi) · rel(Hi | E1, ..., El)
  rel(Hi | E1, ..., El, ~El+1) = (1 − P(El+1 | Hi)) · rel(Hi | E1, ..., El)
  (A small code sketch of this naïve Bayesian update appears at the end of this review.)

Bayesian Networks and Markov Models – Applications in Robotics
• Bayesian AI
• Bayesian filters
• Kalman filters
• Particle filters
• Bayesian networks
• Decision networks
• Reasoning about changes over time
• Dynamic Bayesian networks
• Markov models
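To close the review, here is a small, self-contained Python sketch of the naïve Bayesian approach with evidence accumulation described above. The hypotheses H1, H2, the evidence variables E1–E3, and all probability values are invented for illustration; recomputing the posteriors after adding a new observation is equivalent to multiplying the stored relative likelihoods by P(El+1 | Hi) (or 1 − P(El+1 | Hi)) and renormalizing.

```python
# Naive Bayes over two ME & EXH hypotheses with binary evidence variables.
priors = {"H1": 0.3, "H2": 0.7}                      # P(Hi); they sum to 1
likelihoods = {                                      # P(Ej | Hi), assumed values
    "H1": {"E1": 0.8, "E2": 0.4, "E3": 0.1},
    "H2": {"E1": 0.2, "E2": 0.5, "E3": 0.6},
}

def relative_likelihood(h: str, evidence: dict) -> float:
    """rel(Hi | E1..El) = P(Hi) * prod_j P(Ej | Hi); uses 1 - P(Ej | Hi) for absent evidence."""
    rel = priors[h]
    for e, present in evidence.items():
        p = likelihoods[h][e]
        rel *= p if present else (1.0 - p)
    return rel

def posteriors(evidence: dict) -> dict:
    """Absolute posteriors P(Hi | evidence): relative likelihoods divided by their sum."""
    rels = {h: relative_likelihood(h, evidence) for h in priors}
    z = sum(rels.values())                           # normalization factor
    return {h: r / z for h, r in rels.items()}

# Case input: E1 observed present, E2 observed absent (E3 not yet observed).
print(posteriors({"E1": True, "E2": False}))
# Evidence accumulation: E3 is later observed present; include it and renormalize.
print(posteriors({"E1": True, "E2": False, "E3": True}))
```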