Decision Making

Decision Making Under Uncertainty
Russell and Norvig: ch 16
CMSC421 – Fall 2006
Utility-Based Agent
[Diagram: a utility-based agent perceives the environment through sensors and acts on it through actuators.]
Non-deterministic vs. Probabilistic Uncertainty
Non-deterministic model: an action's possible outcomes are a set {a, b, c}; choose the decision that is best for the worst case (~ adversarial search).
Probabilistic model: the outcomes come with probabilities {a(pa), b(pb), c(pc)}; choose the decision that maximizes expected utility.
Expected Utility
Random variable X with n values x1,…,xn and distribution (p1,…,pn)
  E.g.: xi is Resulti(A), the state reached after doing action A, given E (what we know about the current state)
Function U of X
  E.g., U is the utility of a state
The expected utility of action A given evidence E is
  EU[A|E] = Σi=1,…,n P(xi|A) U(xi)
          = Σi=1,…,n P(Resulti(A) | Do(A), E) U(Resulti(A))
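The formula is just a probability-weighted sum, so a few lines of Python make it concrete. This is a minimal sketch (the function and variable names are mine, not from the slides); it reproduces the one-state/one-action example on the next slide.

```python
def expected_utility(outcomes):
    """Expected utility of an action.

    outcomes: list of (probability, utility) pairs, one per possible
    resulting state, i.e. (P(Result_i(A) | Do(A), E), U(Result_i(A))).
    """
    return sum(p * u for p, u in outcomes)

# One State/One Action example: A1 leads to s1, s2, s3
# with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70.
print(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]))  # 62.0
```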
One State/One Action Example
Action A1 from state s0 leads to:
  s1 with probability 0.2 and utility 100
  s2 with probability 0.7 and utility 50
  s3 with probability 0.1 and utility 70

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1
      = 20 + 35 + 7
      = 62
One State/Two Actions Example
From state s0:
  A1 leads to s1 (p = 0.2, U = 100), s2 (p = 0.7, U = 50), s3 (p = 0.1, U = 70)
  A2 leads to s2 (p = 0.2, U = 50), s4 (p = 0.8, U = 80)

• U1(S0) = 0.2 x 100 + 0.7 x 50 + 0.1 x 70 = 62
• U2(S0) = 0.2 x 50 + 0.8 x 80 = 74
• U(S0) = max{U1(S0), U2(S0)} = 74
Introducing Action Costs
From state s0, same outcomes as before, but now each action has a cost:
  A1 (cost 5) leads to s1 (p = 0.2, U = 100), s2 (p = 0.7, U = 50), s3 (p = 0.1, U = 70)
  A2 (cost 25) leads to s2 (p = 0.2, U = 50), s4 (p = 0.8, U = 80)

• U1(S0) = 62 – 5 = 57
• U2(S0) = 74 – 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57
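As a quick check of the two examples above, here is a small extension of the earlier sketch that subtracts the action cost before taking the max (again, the names are mine, not from the slides):

```python
def action_value(outcomes, cost=0.0):
    """Expected utility of an action minus its cost."""
    return sum(p * u for p, u in outcomes) - cost

A1 = [(0.2, 100), (0.7, 50), (0.1, 70)]
A2 = [(0.2, 50), (0.8, 80)]

# Without costs: A2 wins (74 vs 62).
print(max(action_value(A1), action_value(A2)))          # 74.0
# With costs 5 and 25: A1 wins (57 vs 49).
print(max(action_value(A1, 5), action_value(A2, 25)))   # 57.0
```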
MEU Principle
A rational agent should choose the action that maximizes the agent's expected utility.
This is the basis of the field of decision theory.
It is the normative criterion for rational choice of action.
Not quite…
Must have a complete model of:
  Actions
  Utilities
  States
Even with a complete model, computing the MEU action is often computationally intractable.
In fact, a truly rational agent takes into account the utility of reasoning as well---bounded rationality.
Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before.
Decision Theoretic Reasoning
We'll look at:
  Simple decision making (ch. 16)
  Sequential decision making (ch. 17)
Preferences
An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes
Lottery L = [p, A; (1 – p), B]
Notation:
  A > B : A preferred to B
  A ~ B : indifference between A and B
  A ≥ B : B not preferred to A
Rational Preferences
Idea: preferences of a rational agent must obey constraints
Axioms of Utility Theory:
1. Orderability:
   (A > B) v (B > A) v (A ~ B)
2. Transitivity:
   (A > B) ^ (B > C) ⇒ (A > C)
3. Continuity:
   A > B > C ⇒ ∃p [p, A; 1-p, C] ~ B
4. Substitutability:
   A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
5. Monotonicity:
   A > B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≥ [q, A; 1-q, B])
Rational Preferences
Violating the constraints leads to irrational behavior
E.g.: an agent with intransitive preferences can be induced to give away all its money (a small simulation of this "money pump" follows the list):
  if B > C, then an agent who has C would pay some amount, say $1, to get B
  if A > B, then an agent who has B would pay, say, $1 to get A
  if C > A, then an agent who has A would pay, say, $1 to get C
  ….oh, oh!
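A minimal sketch of that cycle; the $1 payment and the preference order are taken from the slide, while the starting cash and loop count are arbitrary:

```python
# Intransitive preferences: A > B, B > C, but C > A.
# Each trade "upgrades" to a preferred prize for $1, yet after every
# three trades the agent holds its original prize and is $3 poorer.
prefers = {"C": "B", "B": "A", "A": "C"}   # holding X, pays $1 to get prefers[X]

money, holding = 10, "C"
for _ in range(9):                         # three full cycles
    holding = prefers[holding]
    money -= 1
print(holding, money)                      # back to 'C', nine dollars poorer
```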
Rational Preferences ⇒ Utility
Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function U such that
  U(A) ≥ U(B) ⇔ A ≥ B
  U([p1, S1; …; pn, Sn]) = Σi pi U(Si)
MEU principle: Choose the action that
maximizes expected utility
Utility Assessment
Standard approach to assessment of human utilities:
  compare a given state A to a standard lottery Lp that has
    the best possible prize (e.g., continue as before) with probability p
    the worst possible catastrophe (e.g., instant death) with probability (1 – p)
  adjust the lottery probability p until A ~ Lp
Aside: Money ≠ Utility function
Given a lottery L with expected monetary value EMV(L),
  usually U(L) < U(EMV(L))
  i.e., people are risk-averse
Would you rather have $1,000,000 for sure,
or a lottery with [0.5, $0; 0.5, $3,000,000]?
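One way to see how risk aversion produces that preference is to plug the two options into a concave utility-of-money curve. The square-root utility below is purely an assumption for illustration; the slides do not specify a curve.

```python
from math import sqrt

def u(dollars):
    # Assumed concave (risk-averse) utility of money; not from the slides.
    return sqrt(dollars)

emv_lottery = 0.5 * 0 + 0.5 * 3_000_000         # $1,500,000 > $1,000,000
eu_lottery  = 0.5 * u(0) + 0.5 * u(3_000_000)   # ~866
eu_sure     = u(1_000_000)                      # 1000
print(emv_lottery, eu_lottery, eu_sure)
# The lottery has the higher expected *monetary* value, but for this agent
# the sure $1,000,000 has the higher expected *utility*.
```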
Decision Networks
Extend BNs to handle actions and utilities
Also called influence diagrams
Make use of BN inference
Can do Value of Information calculations
Decision Networks cont.
Chance nodes: random variables, as in BNs
Decision nodes: actions that the decision maker can take
Utility/value nodes: the utility of the outcome state
R&N example
Prenatal Testing Example
Umbrella Network
Decision node: Take Umbrella (take / don't take)
Chance node: rain, with P(rain) = 0.4
Chance node: umbrella, with P(umb|take) = 1.0 and P(~umb|~take) = 1.0
Utility node: happiness, with
  U(~umb, ~rain) = 100
  U(~umb, rain) = -100
  U(umb, ~rain) = 0
  U(umb, rain) = -25
Evaluating Decision Networks
Set the evidence variables for the current state
For each possible value of the decision node:
  Set the decision node to that value
  Calculate the posterior probability of the parent nodes of the utility node, using BN inference
  Calculate the resulting expected utility for the action
Return the action with the highest expected utility
(a runnable sketch of this loop follows)
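Here is a minimal sketch of that loop applied to the deterministic umbrella network on the next slide. It enumerates the joint over the utility node's parents by hand instead of calling a BN inference engine, and all names are mine rather than from the slides:

```python
# Umbrella network: P(rain) = 0.4, P(umb|take) = 1.0, P(umb|~take) = 0.
P_RAIN = 0.4
P_UMB = {True: 1.0, False: 0.0}                 # P(umbrella | take?)
U = {(False, False): 100, (False, True): -100,  # U(umb, rain)
     (True, False): 0,    (True, True): -25}

def eu(take):
    """Expected utility of one decision value: enumerate the utility node's parents."""
    total = 0.0
    for umb in (False, True):
        for rain in (False, True):
            p_umb = P_UMB[take] if umb else 1 - P_UMB[take]
            p_rain = P_RAIN if rain else 1 - P_RAIN
            total += p_umb * p_rain * U[(umb, rain)]
    return total

print(eu(True), eu(False))                      # -10.0 vs 20.0
print("take" if eu(True) > eu(False) else "don't take")
```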
Umbrella Network
Decision node: Take Umbrella (take / don't take)
Chance node: rain, with P(rain) = 0.4
Chance node: umbrella, with P(umb|take) = 1.0 and P(umb|~take) = 0
Utility node: happiness, with
  U(~umb, ~rain) = 100
  U(~umb, rain) = -100
  U(umb, ~rain) = 0
  U(umb, rain) = -25
Umbrella Network (#1: take)
Same network, but with a noisy umbrella: P(rain) = 0.4, P(umb|take) = 0.8, P(umb|~take) = 0.1
Utilities: U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

  umb  rain  P(umb, rain | take)
  0    0     0.2 x 0.6 = 0.12
  0    1     0.2 x 0.4 = 0.08
  1    0     0.8 x 0.6 = 0.48
  1    1     0.8 x 0.4 = 0.32

#1: EU(take) = 100 x 0.12 + -100 x 0.08 + 0 x 0.48 + -25 x 0.32 = ???
Umbrella Network (#2: don't take)
So, in this case I would…?
P(rain) = 0.4, P(umb|take) = 0.8, P(umb|~take) = 0.1
Utilities: U(~umb, ~rain) = 100, U(~umb, rain) = -100, U(umb, ~rain) = 0, U(umb, rain) = -25

  umb  rain  P(umb, rain | ~take)
  0    0     0.9 x 0.6 = 0.54
  0    1     0.9 x 0.4 = 0.36
  1    0     0.1 x 0.6 = 0.06
  1    1     0.1 x 0.4 = 0.04

#2: EU(~take) = 100 x 0.54 + -100 x 0.36 + 0 x 0.06 + -25 x 0.04 = ???
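To fill in the two '???' values, the earlier evaluation sketch can be rerun with the noisy umbrella probabilities (0.8 / 0.1). The numbers in the comments are my own check, not given on the slides:

```python
P_RAIN = 0.4
U = {(0, 0): 100, (0, 1): -100, (1, 0): 0, (1, 1): -25}    # U(umb, rain)

def eu(p_umb):
    """EU of a decision whose P(umbrella) is p_umb (0.8 if take, 0.1 if not)."""
    return sum((p_umb if umb else 1 - p_umb)
               * (P_RAIN if rain else 1 - P_RAIN)
               * U[(umb, rain)]
               for umb in (0, 1) for rain in (0, 1))

print(eu(0.8))   # EU(take)  = -4.0
print(eu(0.1))   # EU(~take) = 17.0, so the MEU action is ~take
```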
Value of Information
Idea: compute the expected value of acquiring possible evidence
Example: buying oil drilling rights
  Two blocks A and B, exactly one of them has oil, worth k
  Prior probability 0.5 for each block
  Current price of each block is k/2
What is the value of getting a survey of A done?
  The survey will say 'oil in A' or 'no oil in A', each with probability 0.5
Compute the expected value of information (VOI):
  expected value of the best action given the information, minus expected value of the best action without the information
VOI(Survey) = [0.5 x value of "buy A" given oil in A] +
              [0.5 x value of "buy B" given no oil in A] – 0
            = ??
(a worked computation follows)
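A minimal worked version of that computation, with k left as a parameter; the k/2 price and the baseline of 0 for the best action without information are from the slide:

```python
def voi_survey(k):
    # Without the survey: either block is worth 0.5*k in expectation but
    # costs k/2, so the best action has expected value 0.
    value_without = 0.0
    # With the survey: buy whichever block the survey indicates.
    # Either way, the block bought is the one with oil: worth k, cost k/2.
    value_if_oil_in_A    = k - k / 2   # buy A
    value_if_no_oil_in_A = k - k / 2   # buy B
    value_with = 0.5 * value_if_oil_in_A + 0.5 * value_if_no_oil_in_A
    return value_with - value_without  # = k/2

print(voi_survey(1_000_000))           # 500000.0
```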
Value of Information (VOI)
Suppose the agent's current knowledge is E. The value of the current best action α is
  EU(α | E) = maxA Σi U(Resulti(A)) P(Resulti(A) | E, Do(A))
The value of the new best action (after new evidence E' is obtained) is
  EU(αE' | E, E') = maxA Σi U(Resulti(A)) P(Resulti(A) | E, E', Do(A))
The value of information for E' is
  VOI(E') = Σk P(ek | E) EU(αek | ek, E) – EU(α | E)
Umbrella Network (with forecast)
Decision node: Take Umbrella (take / don't take)
Chance node: rain, with P(rain) = 0.4
Chance node: umbrella, with P(umb|take) = 0.8 and P(umb|~take) = 0.1
Chance node: forecast, with
  P(F=rainy | R=0) = 0.2
  P(F=rainy | R=1) = 0.7
Utility node: happiness, with
  U(~umb, ~rain) = 100
  U(~umb, rain) = -100
  U(umb, ~rain) = 0
  U(umb, rain) = -25
VOI
VOI(forecast) = P(rainy) EU(αrainy | rainy) + P(~rainy) EU(α~rainy | ~rainy) – EU(α)
To evaluate it, fill in the four conditional joint tables over (umb, rain) and the corresponding expected utilities:
  #1: P(umb, rain | take, rainy)    → EU(take | rainy)
  #2: P(umb, rain | ~take, rainy)   → EU(~take | rainy)
  #3: P(umb, rain | take, ~rainy)   → EU(take | ~rainy)
  #4: P(umb, rain | ~take, ~rainy)  → EU(~take | ~rainy)
Umbrella Network (forecast as evidence)
Decision node: Take Umbrella (take / don't take)
Chance node: forecast, with P(F=rainy) = 0.4
Chance node: rain, with
  P(R=rain | F=~rainy) = 0.2
  P(R=rain | F=rainy) = 0.7
Chance node: umbrella, with P(umb|take) = 0.8 and P(umb|~take) = 0.1
Utility node: happiness, with
  U(~umb, ~rain) = 100
  U(~umb, rain) = -100
  U(umb, ~rain) = 0
  U(umb, rain) = -25
Again, fill in the four conditional joint tables over (umb, rain):
  #1: P(umb, rain | take, rainy)    → EU(take | rainy)
  #2: P(umb, rain | ~take, rainy)   → EU(~take | rainy)
  #3: P(umb, rain | take, ~rainy)   → EU(take | ~rainy)
  #4: P(umb, rain | ~take, ~rainy)  → EU(~take | ~rainy)
VOI
VOI(forecast) = P(rainy) EU(αrainy | rainy) + P(~rainy) EU(α~rainy | ~rainy) – EU(α)
(a computational sketch follows)
Summary: Simple Decision Making
Decision Theory = Probability Theory +
Utility Theory
Rational Agent operates by MEU
Decision Networks
Value of Information