Basic Probability and Bayes

Review of Probability
Axioms of Probability Theory
Pr(A) denotes the probability that proposition A is true (A is also called an event, or a random variable).
1.  0 ≤ Pr(A) ≤ 1
2.  Pr(True) = 1, Pr(False) = 0
3.  Pr(A v B) = Pr(A) + Pr(B) − Pr(A ^ B)
A Closer Look at Axiom 3
Pr(A v B) = Pr(A) + Pr(B) − Pr(A ^ B)
[Venn diagram: events A and B inside the universe True, overlapping in A ^ B]
Using the Axioms to prove new properties
Pr(A v ~A) = Pr(True) = 1
Pr(A v ~A) = Pr(A) + Pr(~A) − Pr(A ^ ~A)
           = Pr(A) + Pr(~A) − Pr(False)
           = Pr(A) + Pr(~A) − 0
Therefore Pr(~A) = 1 − Pr(A). We proved this from the axioms alone.
Probability of Events
• Sample space and events
  – Sample space S (e.g., all people in an area)
  – Events E1 ⊆ S (e.g., all people having cough), E2 ⊆ S (e.g., all people having cold)
• Prior (marginal) probabilities of events
  – P(E) = |E| / |S|  (frequency interpretation; see the sketch after this slide)
  – P(E) = 0.1  (subjective probability)
  – 0 ≤ P(E) ≤ 1 for all events
  – Two special events, ∅ and S: P(∅) = 0 and P(S) = 1.0
• Boolean operators between events (to form compound events)
  – Conjunctive (intersection): E1 ^ E2  (E1 ∩ E2)
  – Disjunctive (union): E1 v E2  (E1 ∪ E2)
  – Negation (complement): ~E  (E^c = S − E)
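A minimal Python sketch of these definitions, with a made-up sample space of people and made-up cough/cold events (all sizes are illustrative):

```python
from fractions import Fraction

# Hypothetical sample space S: person IDs in an area (made-up size of 10).
S = set(range(10))
cough = {0, 1, 2, 3}        # E1: people having cough (illustrative)
cold  = {2, 3, 4}           # E2: people having cold  (illustrative)

def P(E):
    """Frequency interpretation: P(E) = |E| / |S|."""
    return Fraction(len(E), len(S))

print(P(cough))                       # 2/5
print(P(S - cough))                   # complement: 1 - P(cough) = 3/5
print(P(cough | cold))                # union: P(E1) + P(E2) - P(E1 ^ E2) = 1/2
assert P(cough | cold) == P(cough) + P(cold) - P(cough & cold)
assert P(set()) == 0 and P(S) == 1    # the two special events
```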
• Probabilities of compound events
  – P(~E) = 1 − P(E), because P(~E) + P(E) = 1
  – P(E1 v E2) = P(E1) + P(E2) − P(E1 ^ E2)
  – But how to compute the joint probability P(E1 ^ E2)?
[Venn diagrams: E and its complement ~E; overlapping events E1 and E2 with intersection E1 ^ E2]
• Conditional probability (of E1, given E2)
  – How likely E1 occurs within the subspace of E2:
    P(E1 | E2) = |E1 ∩ E2| / |E2| = (|E1 ∩ E2| / |S|) / (|E2| / |S|) = P(E1 ∩ E2) / P(E2)
    P(E1 ∩ E2) = P(E1 | E2) P(E2)
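Continuing the same made-up cough/cold events, a sketch of conditional probability computed both from counts and as a ratio of probabilities:

```python
from fractions import Fraction

S = set(range(10))
cough = {0, 1, 2, 3}        # illustrative event E1
cold  = {2, 3, 4}           # illustrative event E2

def P(E):
    return Fraction(len(E), len(S))

# P(E1 | E2): how likely E1 is inside the subspace E2.
p_cough_given_cold = Fraction(len(cough & cold), len(cold))   # |E1 ^ E2| / |E2|
assert p_cough_given_cold == P(cough & cold) / P(cold)        # = P(E1 ^ E2) / P(E2)

# Product rule: P(E1 ^ E2) = P(E1 | E2) * P(E2)
assert P(cough & cold) == p_cough_given_cold * P(cold)
print(p_cough_given_cold)   # 2/3
```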
Using Venn diagrams and decision trees is very useful in proofs and reasoning.
Independence, Mutual Exclusion and Exhaustive sets of events
• Independence assumption
  – Two events E1 and E2 are said to be independent of each other if
    P(E1 | E2) = P(E1)  (given E2 does not change the likelihood of E1)
  – It can simplify the computation (illustrated in the sketch below):
    P(E1 ∩ E2) = P(E1 | E2) P(E2) = P(E1) P(E2)
    P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
               = P(E1) + P(E2) − P(E1) P(E2)
               = 1 − (1 − P(E1))(1 − P(E2))
• Mutually exclusive (ME) and exhaustive (EXH) set of events
  – ME:  Ei ∩ Ej = ∅  (P(Ei ∩ Ej) = 0),  i, j = 1, ..., n,  i ≠ j
  – EXH: E1 ∪ ... ∪ En = S  (P(E1 ∪ ... ∪ En) = 1)
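A short sketch (made-up numbers) of what the independence assumption buys, and of the ME/EXH conditions:

```python
# Independence: knowing E2 does not change the likelihood of E1.
p_e1, p_e2 = 0.3, 0.5                       # illustrative marginals
p_and = p_e1 * p_e2                         # P(E1 ^ E2) = P(E1) P(E2) under independence
p_or  = p_e1 + p_e2 - p_and                 # P(E1 v E2)
assert abs(p_or - (1 - (1 - p_e1) * (1 - p_e2))) < 1e-12

# ME & EXH set of events: pairwise disjoint and covering the sample space.
S = set(range(10))
partition = [{0, 1, 2}, {3, 4, 5, 6}, {7, 8, 9}]     # illustrative events E1..E3
assert all(partition[i].isdisjoint(partition[j])
           for i in range(3) for j in range(3) if i != j)    # ME
assert set().union(*partition) == S                          # EXH
```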
Random Variables
Discrete Random Variables
• X denotes a random variable.
• X can take on a finite number of values in the set {x1, x2, …, xn}.
• P(X = xi), or P(xi), is the probability that the random variable X takes on the value xi.
• P(·) is called the probability mass function.
• E.g.  P(Room) = ⟨0.7, 0.2, 0.08, 0.02⟩
  These are the probabilities of the four possible values of X; they must sum to 1.0.
Discrete Random Variables
• Finite set of possible outcomes: X ∈ {x1, x2, x3, ..., xn}
  P(xi) ≥ 0,   Σ_{i=1}^{n} P(xi) = 1
• X binary:  P(x) + P(~x) = 1
[Bar chart: an example probability mass function over four values X1–X4]
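A sketch of a probability mass function as a plain Python dictionary; the value names are made up, only the four probabilities come from the Room example above:

```python
# Illustrative PMF over four hypothetical values of the random variable Room.
P_room = {"living_room": 0.7, "kitchen": 0.2, "bedroom": 0.08, "bathroom": 0.02}

assert all(p >= 0 for p in P_room.values())          # P(x_i) >= 0
assert abs(sum(P_room.values()) - 1.0) < 1e-12       # sum_i P(x_i) = 1

# Binary special case: P(x) + P(~x) = 1
p_x = 0.7
p_not_x = 1 - p_x
```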
Continuous Random Variable
• Probability distribution (density function) over continuous values
  X ∈ [0, 10],   P(x) ≥ 0,   ∫_0^10 P(x) dx = 1
  P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
[Plot: density P(x) over x, with the area between x = 5 and x = 7 shaded]
Continuous Random Variables
• X takes on values in the continuum.
• p(X = x), or p(x), is a probability density function (PDF):
  Pr(x ∈ [a, b]) = ∫_a^b p(x) dx
• E.g.  [plot of a density p(x) over x]
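A numeric sketch of these properties for an illustrative (uniform) density on [0, 10], using simple midpoint integration:

```python
# Illustrative density on [0, 10]: uniform, p(x) = 1/10.
def p(x):
    return 0.1 if 0 <= x <= 10 else 0.0

def integrate(f, a, b, n=100000):
    """Simple midpoint-rule integration (a sketch, not a library routine)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

assert abs(integrate(p, 0, 10) - 1.0) < 1e-6    # total probability is 1
print(integrate(p, 5, 7))                       # P(5 <= x <= 7) = 0.2
```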
Probability Distribution
• Probability distribution P(X | x)
  – X is a random variable (discrete or continuous)
  – x is the background state of information
Joint and Conditional Probabilities
• Joint:  P(x, y) = P(X = x and Y = y)
  – Probability that both X = x and Y = y
• Conditional:  P(x | y) = P(X = x | Y = y)
  – Probability that X = x, given that we know Y = y
Joint and Conditional Probability
• P(X = x and Y = y) = P(x, y)
• If X and Y are independent, then P(x, y) = P(x) P(y)
• P(x | y) is the probability of x given y:
  P(x | y) = P(x, y) / P(y)
  P(x, y) = P(x | y) P(y)
• If X and Y are independent, then P(x | y) = P(x)
Law of Total Probability
Discrete case:
  Σ_x P(x) = 1
  P(x) = Σ_y P(x, y)
  P(x) = Σ_y P(x | y) P(y)
Continuous case:
  ∫ p(x) dx = 1
  p(x) = ∫ p(x, y) dy
  p(x) = ∫ p(x | y) p(y) dy
Rules of Probability: Marginalization
• Product rule:  P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
• Marginalization:  P(Y) = Σ_{i=1}^{n} P(Y, xi)
• X binary:  P(Y) = P(Y, x) + P(Y, ~x)
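A sketch of the product rule, marginalization, and the law of total probability on a small made-up joint table for binary X and Y:

```python
# Illustrative joint distribution P(X, Y) over binary X and Y.
P_xy = {("x", "y"): 0.2, ("x", "not_y"): 0.3,
        ("not_x", "y"): 0.1, ("not_x", "not_y"): 0.4}

# Marginalization: P(X) = sum over y of P(X, y); likewise for P(Y).
P_x = {x: sum(p for (xi, _), p in P_xy.items() if xi == x) for x in ("x", "not_x")}
P_y = {y: sum(p for (_, yi), p in P_xy.items() if yi == y) for y in ("y", "not_y")}

# Conditional probability: P(x | y) = P(x, y) / P(y)
P_x_given = {(x, y): P_xy[(x, y)] / P_y[y] for (x, y) in P_xy}

# Product rule: P(x, y) = P(x | y) P(y)
assert all(abs(P_xy[(x, y)] - P_x_given[(x, y)] * P_y[y]) < 1e-12 for (x, y) in P_xy)

# Law of total probability: P(x) = sum over y of P(x | y) P(y)
total = sum(P_x_given[("x", y)] * P_y[y] for y in ("y", "not_y"))
assert abs(total - P_x["x"]) < 1e-12
```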
Gaussian, Mean and Variance
Gaussian (normal) distributions N(m, s), with mean m and variance s:
  P(x) = 1 / √(2π s) · exp( −(x − m)² / (2s) )
[Plots: N(m, s) curves with different means and with different variances]
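A sketch of the density above (reading N(m, s) with m the mean and s the variance); shifting m moves the peak, increasing s flattens it:

```python
import math

def gaussian_pdf(x, m, s):
    """Density of N(m, s), where m is the mean and s the variance."""
    return math.exp(-(x - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)

# Different mean shifts the peak; different variance widens/narrows it.
print(gaussian_pdf(0.0, m=0.0, s=1.0))    # ~0.3989, peak of the standard normal
print(gaussian_pdf(0.0, m=2.0, s=1.0))    # smaller: peak has moved to x = 2
print(gaussian_pdf(0.0, m=0.0, s=4.0))    # ~0.1995: larger variance, flatter peak
```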
Gaussian networks
Each variable is a linear function of its parents, with Gaussian noise.
Joint probability density functions:
[Figures: two-node networks over X and Y with their joint densities]
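A minimal sampling sketch of the linear-Gaussian idea, with made-up coefficients and noise variance:

```python
import random

# Parent: X ~ N(0, 1).  Child: Y = a*X + b + noise, noise ~ N(0, s_y).
a, b, s_y = 2.0, 1.0, 0.25          # illustrative parameters

def sample_xy():
    x = random.gauss(0.0, 1.0)                      # gauss takes (mean, std dev)
    y = a * x + b + random.gauss(0.0, s_y ** 0.5)
    return x, y

samples = [sample_xy() for _ in range(5)]
print(samples)   # each (x, y) pair is drawn from the joint density p(x, y) = p(x) p(y | x)
```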
Reverend Thomas Bayes
(1702-1761)
Clergyman and mathematician who first used probability inductively. His research established a mathematical basis for probabilistic inference.
Bayes Rule
  P(H, E) = P(H | E) P(E) = P(E | H) P(H)
  P(H | E) = P(E | H) P(H) / P(E)
Example: E = smoke, H = cancer (all people = 1000)
  – 100 people smoke; 1000 − 100 = 900 people do not smoke
  – 40 people have cancer; 1000 − 40 = 960 people do not have cancer
  – 10 people both smoke and have cancer
  P(smoke) = 100/1000,  P(cancer) = 40/1000
  P(smoke | cancer) = 10/40 = 25%  (probability that you smoke if you have cancer)
  P(cancer | smoke) = P(smoke | cancer) · P(cancer) / P(smoke)
                    = (10/40) · (40/1000) / (100/1000) = (10/1000) / (100/1000) = 10/100 = 10%
  (probability that you have cancer if you smoke)
[Venn diagram: the 1000 people, with the 100 smokers and the 40 cancer patients overlapping in the 10 who smoke and have cancer]
Similarly, for non-smokers:
  P(cancer | not smoke) = P(not smoke | cancer) · P(cancer) / P(not smoke)
                        = (30/40) · (40/1000) / (900/1000) = (30/1000) / (900/1000) = 30/900 ≈ 3.3%
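The same smoke/cancer numbers checked directly in Python:

```python
total = 1000
smokers, cancer, both = 100, 40, 10             # counts from the example

p_smoke = smokers / total                       # 0.10
p_cancer = cancer / total                       # 0.04
p_smoke_given_cancer = both / cancer            # 10/40 = 0.25

# Bayes rule: P(cancer | smoke) = P(smoke | cancer) P(cancer) / P(smoke)
p_cancer_given_smoke = p_smoke_given_cancer * p_cancer / p_smoke
print(p_cancer_given_smoke)                     # 0.10  (i.e. 10/100)

# P(cancer | not smoke) = P(not smoke | cancer) P(cancer) / P(not smoke)
p_not_smoke_given_cancer = (cancer - both) / cancer        # 30/40
p_cancer_given_not_smoke = p_not_smoke_given_cancer * p_cancer / (1 - p_smoke)
print(p_cancer_given_not_smoke)                 # 30/900, about 0.033
```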
Bayes’ Theorem with relative likelihood
• In the setting of diagnostic/evidential reasoning:
  hypotheses H_1, ..., H_n with priors P(H_i); evidence/manifestations E_1, ..., E_j, ..., E_m with conditional probabilities P(E_j | H_i)
  – Known: the prior probability P(H_i) and the conditional probability P(E_j | H_i)
  – Wanted: the posterior probability P(H_i | E_j)
• Bayes’ theorem (formula 1):  P(H_i | E_j) = P(H_i) P(E_j | H_i) / P(E_j)
• If the purpose is only to find which of the n hypotheses H_1, ..., H_n is more plausible given E_j, then we can ignore the denominator and rank them by relative likelihood:
  rel(H_i | E_j) = P(E_j | H_i) P(H_i)
Relative likelihood
• P(E_j) can be computed from P(E_j | H_i) and P(H_i) if we assume all hypotheses H_1, ..., H_n are ME and EXH:
  P(E_j) = P(E_j ∩ (H_1 ∪ ... ∪ H_n))          (by EXH)
         = Σ_{i=1}^{n} P(E_j ∩ H_i)             (by ME)
         = Σ_{i=1}^{n} P(E_j | H_i) P(H_i)
• Then we have another version of Bayes’ theorem (see the sketch below):
  P(H_i | E_j) = P(E_j | H_i) P(H_i) / Σ_{k=1}^{n} P(E_j | H_k) P(H_k) = rel(H_i | E_j) / Σ_{k=1}^{n} rel(H_k | E_j)
  where Σ_{k=1}^{n} P(E_j | H_k) P(H_k), the sum of the relative likelihoods of all n hypotheses, is a normalization factor.
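A sketch of ranking hypotheses by relative likelihood and recovering the posteriors by normalization; the three hypotheses, priors, and likelihoods below are made up:

```python
# Illustrative priors P(H_i) and likelihoods P(E | H_i) for three ME & EXH hypotheses.
prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
likelihood = {"H1": 0.1, "H2": 0.6, "H3": 0.3}      # P(E_j | H_i)

# Relative likelihood rel(H_i | E_j) = P(E_j | H_i) P(H_i) is enough for ranking.
rel = {h: likelihood[h] * prior[h] for h in prior}
best = max(rel, key=rel.get)                        # most plausible hypothesis

# Formula 2: divide by the normalization factor sum_k P(E_j | H_k) P(H_k) = P(E_j).
p_e = sum(rel.values())
posterior = {h: rel[h] / p_e for h in rel}
print(best, posterior)                              # posteriors sum to 1
```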
Naïve Bayesian Approach
• Knowledge base:
  – E_1, ..., E_m: evidence/manifestations
  – H_1, ..., H_n: hypotheses/disorders
  – E_j and H_i are binary, and the hypotheses form a ME & EXH set
  – P(E_j | H_i), i = 1, ..., n, j = 1, ..., m: conditional probabilities
• Case input: E_1, ..., E_l
• Find the hypothesis H_i with the highest posterior probability P(H_i | E_1, ..., E_l)
• By Bayes’ theorem:
  P(H_i | E_1, ..., E_l) = P(E_1, ..., E_l | H_i) P(H_i) / P(E_1, ..., E_l)
• Assume all pieces of evidence are conditionally independent, given any hypothesis:
  P(E_1, ..., E_l | H_i) = Π_{j=1}^{l} P(E_j | H_i)
Absolute Posterior Probability
• The relative likelihood:
  rel(H_i | E_1, ..., E_l) = P(E_1, ..., E_l | H_i) P(H_i) = P(H_i) Π_{j=1}^{l} P(E_j | H_i)
• The absolute posterior probability:
  P(H_i | E_1, ..., E_l) = rel(H_i | E_1, ..., E_l) / Σ_{k=1}^{n} rel(H_k | E_1, ..., E_l)
                         = P(H_i) Π_{j=1}^{l} P(E_j | H_i) / Σ_{k=1}^{n} P(H_k) Π_{j=1}^{l} P(E_j | H_k)
• Evidence accumulation (when new evidence is discovered; see the sketch below):
  rel(H_i | E_1, ..., E_l, E_{l+1}) = P(E_{l+1} | H_i) · rel(H_i | E_1, ..., E_l)
  rel(H_i | E_1, ..., E_l, ~E_{l+1}) = (1 − P(E_{l+1} | H_i)) · rel(H_i | E_1, ..., E_l)
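A sketch of the naïve Bayesian computation with conditionally independent evidence and incremental evidence accumulation; all numbers below are made up:

```python
# Two ME & EXH hypotheses with priors, and per-evidence conditional probabilities.
prior = {"H1": 0.6, "H2": 0.4}
p_e_given_h = {                     # P(E_j | H_i), illustrative values
    "E1": {"H1": 0.8, "H2": 0.3},
    "E2": {"H1": 0.5, "H2": 0.9},
}

def posteriors(present, absent=()):
    """rel(H_i | evidence) = P(H_i) * prod_j P(E_j | H_i), then normalize."""
    rel = dict(prior)
    for e in present:                       # observed evidence E_j
        for h in rel:
            rel[h] *= p_e_given_h[e][h]
    for e in absent:                        # observed absence ~E_j
        for h in rel:
            rel[h] *= 1 - p_e_given_h[e][h]
    z = sum(rel.values())                   # normalization factor
    return {h: r / z for h, r in rel.items()}

print(posteriors(present=["E1"]))                   # after observing E1
print(posteriors(present=["E1", "E2"]))             # accumulate E2
print(posteriors(present=["E1"], absent=["E2"]))    # accumulate ~E2 instead
```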
Bayesian Networks and Markov Models – applications in robotics
• Bayesian AI
• Bayesian Filters
• Kalman Filters
• Particle Filters
• Bayesian networks
• Decision networks
• Reasoning about changes over time
  – Dynamic Bayesian Networks
  – Markov models