Decision Making Under Uncertainty
Russell and Norvig: ch. 16
CMSC421 – Fall 2006

Utility-Based Agent
• (Diagram) The agent is connected to its environment through sensors and actuators; the "?" marks the decision-making component we want to fill in.

Non-deterministic vs. Probabilistic Uncertainty
• Non-deterministic model: an action may lead to any outcome in {a, b, c}; choose the decision that is best for the worst case (~ adversarial search).
• Probabilistic model: an action leads to outcomes {a(pa), b(pb), c(pc)}; choose the decision that maximizes expected utility value.

Expected Utility
• Random variable X with n values x1,…,xn and distribution (p1,…,pn).
  E.g., xi is Resulti(A), the state reached after doing action A given E, what we know about the current state.
• Function U of X.
  E.g., U is the utility of a state.
• The expected utility of A is
  EU[A|E] = Σi=1,…,n P(xi | Do(A), E) U(xi)
          = Σi=1,…,n P(Resulti(A) | Do(A), E) U(Resulti(A))

One State/One Action Example
• (Diagram) From s0, action A1 leads to s1 (p = 0.2, U = 100), s2 (p = 0.7, U = 50), and s3 (p = 0.1, U = 70).
• U(S0) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1 = 20 + 35 + 7 = 62

One State/Two Actions Example
• (Diagram) A1 is as above; A2 leads to s2 (p = 0.2, U = 50) and s4 (p = 0.8, U = 80).
• U1(S0) = 62
• U2(S0) = 74
• U(S0) = max{U1(S0), U2(S0)} = 74

Introducing Action Costs
• (Diagram) As above, but A1 costs 5 and A2 costs 25.
• U1(S0) = 62 - 5 = 57
• U2(S0) = 74 - 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57

MEU Principle
• A rational agent should choose the action that maximizes the agent's expected utility.
• This is the basis of the field of decision theory.
• It is a normative criterion for rational choice of action.
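To make the MEU computation concrete, here is a minimal Python sketch (illustrative only; the data structure and names are my own, not from the slides) that redoes the two-action example with action costs:

# Minimal MEU sketch with action costs (illustrative; names are made up).
# Each action has a cost and a list of (probability, utility) outcome pairs.
actions = {
    "A1": {"cost": 5,  "outcomes": [(0.2, 100), (0.7, 50), (0.1, 70)]},
    "A2": {"cost": 25, "outcomes": [(0.2, 50), (0.8, 80)]},
}

def expected_utility(action):
    """Sum of p * U over outcomes, minus the action's cost."""
    return sum(p * u for p, u in action["outcomes"]) - action["cost"]

# MEU principle: pick the action with the largest expected utility.
for name, a in actions.items():
    print(f"EU({name}) = {expected_utility(a)}")   # A1: 57, A2: 49
best = max(actions, key=lambda name: expected_utility(actions[name]))
print("Best action:", best)                        # A1

Running it reproduces the slide's numbers: once costs are subtracted, A1 drops from 62 to 57 and A2 from 74 to 49, so the best action switches from A2 to A1.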
Not quite…
• Must have a complete model of:
  – Actions
  – Utilities
  – States
• Even if you have a complete model, the computation may be intractable.
• In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality).
• Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.

We'll look at
• Decision-theoretic reasoning:
  – Simple decision making (ch. 16)
  – Sequential decision making (ch. 17)

Preferences
• An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes.
• Lottery L = [p, A; (1 - p), B]
• Notation:
  – A > B: A preferred to B
  – A ~ B: indifference between A and B
  – A ≥ B: B not preferred to A

Rational Preferences
• Idea: the preferences of a rational agent must obey constraints.
• Axioms of utility theory:
  1. Orderability: (A > B) ∨ (B > A) ∨ (A ~ B)
  2. Transitivity: (A > B) ∧ (B > C) ⇒ (A > C)
  3. Continuity: A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B
  4. Substitutability: A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
  5. Monotonicity: A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])

Rational Preferences
• Violating the constraints leads to irrational behavior.
• E.g., an agent with intransitive preferences can be induced to give away all its money:
  – If B > C, then an agent who has C would pay some amount, say $1, to get B.
  – If A > B, then an agent who has B would pay, say, $1 to get A.
  – If C > A, then an agent who has A would pay, say, $1 to get C.
  – …uh oh!

Rational Preferences
• Utility Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): given preferences satisfying the constraints, there exists a real-valued function U such that
  U(A) ≥ U(B) ⇔ A ≥ B
  U([p1, S1; …; pn, Sn]) = Σi pi U(Si)
• MEU principle: choose the action that maximizes expected utility.

Utility Assessment
• Standard approach to assessment of human utilities: compare a given state A to a standard lottery Lp that has
  – the best possible prize ("continue as before") with probability p
  – the worst possible catastrophe ("instant death") with probability (1 - p)
• Adjust the lottery probability p until the agent is indifferent between A and Lp (A ~ Lp).

Aside: Money
• Given a lottery L with expected monetary value EMV(L), usually U(L) < U(EMV(L)), i.e., people are risk-averse.
• Would you rather have $1,000,000 for sure, or a lottery [0.5, $0; 0.5, $3,000,000]?

Decision Networks
• Extend BNs to handle actions and utilities.
• Also called influence diagrams.
• Make use of BN inference.
• Can do Value of Information calculations.

Decision Networks cont.
• Chance nodes: random variables, as in BNs.
• Decision nodes: actions that the decision maker can take.
• Utility/value nodes: the utility of the outcome state.
• (Examples: R&N example; prenatal testing example.)

Umbrella Network
• Decision node: take / don't take the umbrella.
• Chance nodes: rain, with P(rain) = 0.4; umbrella, with P(umb | take) = 1.0 and P(umb | ~take) = 0.
• Utility node: happiness, with
  U(~umb, ~rain) = 100
  U(~umb, rain) = -100
  U(umb, ~rain) = 0
  U(umb, rain) = -25

Evaluating Decision Networks
• Set the evidence variables for the current state.
• For each possible value of the decision node:
  – Set the decision node to that value.
  – Calculate the posterior probability of the parent nodes of the utility node, using BN inference.
  – Calculate the resulting utility for the action.
• Return the action with the highest utility (see the sketch after the EU computations below).

Umbrella Network #1
• Same network, but now P(umb | take) = 0.8 and P(umb | ~take) = 0.1.
• Joint distribution given take:
  umb  rain  P(umb, rain | take)
  0    0     0.2 × 0.6
  0    1     0.2 × 0.4
  1    0     0.8 × 0.6
  1    1     0.8 × 0.4
• #1: EU(take) = 100 × 0.12 + (-100) × 0.08 + 0 × 0.48 + (-25) × 0.32 = ???

Umbrella Network #2
• So, in this case I would…?
• Joint distribution given ~take:
  umb  rain  P(umb, rain | ~take)
  0    0     0.9 × 0.6
  0    1     0.9 × 0.4
  1    0     0.1 × 0.6
  1    1     0.1 × 0.4
• #2: EU(~take) = 100 × 0.54 + (-100) × 0.36 + 0 × 0.06 + (-25) × 0.04 = ???
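The evaluation procedure can be traced with a small Python sketch (illustrative only; not the slides' code, and the names are my own). It enumerates the joint distribution of umbrella and rain for each decision and fills in the two "???" values above:

# Evaluating the umbrella decision network by enumeration (illustrative sketch).
P_RAIN = 0.4
P_UMB_GIVEN_DECISION = {"take": 0.8, "dont_take": 0.1}

# Utility of the outcome state, U(umb, rain).
U = {(False, False): 100, (False, True): -100,
     (True, False): 0,    (True, True): -25}

def expected_utility(decision):
    """EU(decision) = sum over (umb, rain) of P(umb, rain | decision) * U(umb, rain)."""
    p_umb = P_UMB_GIVEN_DECISION[decision]
    eu = 0.0
    for umb in (False, True):
        for rain in (False, True):
            p = (p_umb if umb else 1 - p_umb) * (P_RAIN if rain else 1 - P_RAIN)
            eu += p * U[(umb, rain)]
    return eu

for d in ("take", "dont_take"):
    print(f"EU({d}) = {expected_utility(d):.1f}")
print("Best decision:", max(("take", "dont_take"), key=expected_utility))

Under these numbers EU(~take) comes out higher than EU(take), so with this noisy umbrella model the better decision is to leave the umbrella at home.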
Value of Information
• Idea: compute the expected value of acquiring possible evidence.
• Example: buying oil drilling rights.
  – Two blocks, A and B; exactly one of them has oil, worth k.
  – Prior probability 0.5 each.
  – Current price of a block is k/2.
  – What is the value of getting a survey of A done? The survey will say 'oil in A' or 'no oil in A', each with probability 0.5.
• Compute the expected value of information (VOI): the expected value of the best action given the information, minus the expected value of the best action without the information.
  VOI(Survey) = [0.5 × value of "buy A" given oil in A] + [0.5 × value of "buy B" given no oil in A] - 0 = ??

Value of Information (VOI)
• Suppose the agent's current knowledge is E.
• The value of the current best action α is
  EU(α | E) = maxA Σi U(Resulti(A)) P(Resulti(A) | E, Do(A))
• The value of the new best action αE' (after new evidence E' is obtained) is
  EU(αE' | E, E') = maxA Σi U(Resulti(A)) P(Resulti(A) | E, E', Do(A))
• The value of information for E' is
  VOI(E') = Σk P(ek | E) EU(αek | ek, E) - EU(α | E)

Umbrella Network with a Forecast
• Add a chance node forecast as a child of rain; the other CPTs and utilities are as before (P(rain) = 0.4, P(umb | take) = 0.8, P(umb | ~take) = 0.1, and the same happiness utilities).
  R  P(F = rainy | R)
  0  0.2
  1  0.7

VOI
• VOI(forecast) = P(rainy) EU(αrainy | rainy) + P(~rainy) EU(α~rainy | ~rainy) - EU(α)
• Tables to fill in (exercise):
  – P(umb, rain | take, rainy)   → #1: EU(take | rainy)
  – P(umb, rain | ~take, rainy)  → #2: EU(~take | rainy)
  – P(umb, rain | take, ~rainy)  → #3: EU(take | ~rainy)
  – P(umb, rain | ~take, ~rainy) → #4: EU(~take | ~rainy)

Umbrella Network (forecast as evidence)
• The same network, now parameterized so that the forecast can be conditioned on directly:
  F  P(R = rain | F)
  0  0.2
  1  0.7
  P(F = rainy) = 0.4
• Fill in the same four tables and the four EU values, then plug them into VOI(forecast) above.

Summary: Simple Decision Making
• Decision theory = probability theory + utility theory.
• A rational agent operates by MEU.
• Decision networks.
• Value of information.
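To close, a small Python sketch (illustrative; not from the slides) of the VOI(forecast) computation, using the forecast-as-evidence parameterization above, i.e., P(F = rainy) = 0.4, P(rain | rainy) = 0.7, P(rain | ~rainy) = 0.2 (my reading of the slide's CPT tables):

# Value of information for the forecast node (illustrative sketch).
P_FORECAST_RAINY = 0.4
P_RAIN_GIVEN_FORECAST = {True: 0.7, False: 0.2}   # assumed reading of the slide's CPT
P_UMB_GIVEN_DECISION = {"take": 0.8, "dont_take": 0.1}
U = {(False, False): 100, (False, True): -100,
     (True, False): 0,    (True, True): -25}

def eu(decision, p_rain):
    """EU of a decision when rain occurs with probability p_rain."""
    p_umb = P_UMB_GIVEN_DECISION[decision]
    return sum((p_umb if umb else 1 - p_umb)
               * (p_rain if rain else 1 - p_rain)
               * U[(umb, rain)]
               for umb in (False, True) for rain in (False, True))

def best_eu(p_rain):
    """EU of the best decision when rain occurs with probability p_rain."""
    return max(eu(d, p_rain) for d in ("take", "dont_take"))

# Best action without seeing the forecast: P(rain) = 0.4, as before.
eu_no_info = best_eu(0.4)
# Expected value of acting on the forecast: average the best EU over forecast outcomes.
eu_with_info = (P_FORECAST_RAINY * best_eu(P_RAIN_GIVEN_FORECAST[True])
                + (1 - P_FORECAST_RAINY) * best_eu(P_RAIN_GIVEN_FORECAST[False]))
print("VOI(forecast) =", eu_with_info - eu_no_info)

The value comes out positive here, which matches the intuition behind VOI: information is worth something exactly when it can change which action the agent prefers.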