Chapter 11

What is Utility?
• A way of representing preferences
• Utility is not money (but money is a useful analogy)
• Typical relationship between utility & money:
1
MultiAgent Interactions
• Each agent has preferences. Each agent gets utility depending on its own choices and the choices of the other agent.
• We can write this set of two utilities as a payoff matrix, as follows.
2
PayOff Matrices
- Business plan: success depends on others’ choices.
- For simplicity, consider two players.
- I make a choice, but the consequences of that choice depend on what you do.
- Let c(a1,a2) denote the consequence that results when I (agent 1) choose action a1 and you (agent 2) choose action a2.
- Let the utility of that consequence be u1[c(a1,a2)]; we often abbreviate this as u1(a1,a2).
- There is a sound mathematical theory called Game Theory for dealing with multi-agent choice problems when every agent knows its utilities and the utilities of all other agents.
- The field of game theory came into being with the 1944 book Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern, although some early work began around 1930.
3
Normal Form
• When outcomes depend only on a single choice by me and a single choice by you, the game is said to be in normal form. The terminology reflects the idea that the game is in a normative form, meaning a canonical form.
• The payoff matrix expresses games in normal form, whereas the game tree expresses games in extensive form, representing turn taking. It will probably be helpful to associate the phrase extensive form with the notion of a tree, and the phrase normal form with the notion of a payoff matrix.
4
Strategy
• Game theorists can reason about strategies, based on contingencies, rather than discrete actions.
• In this context, a strategy is not an attitude like “play aggressively” or “defend the goal”, but rather a complete expression of what to do in every contingency. As described by Poundstone,
– [A strategy] is a complete description of a particular way to play a game, no matter what the other player(s) does and no matter how long the game lasts. A strategy must prescribe actions so thoroughly that you never have to make a decision in following it.
• A strategy thus specifies the choice to make in response to each possible sequence of previous choices in the game.
5
Prisoners’ dilemma – damaged property
                       | Ned: Confess                                | Ned: Don’t Confess
Kelly: Confess         | Both get large fines                        | Ned is suspended, Kelly gets wrist slapped
Kelly: Don’t Confess   | Kelly is suspended, Ned gets wrist slapped  | Both get minimal fines
6
Prisoners’ dilemma
                                   | Ned: Confess (defect) | Ned: Don’t Confess (cooperate)
Kelly: Confess (defect)            | 2,2                   | 5,0
Kelly: Don’t Confess (cooperate)   | 0,5                   | 3,3
7
Solution Concepts
• Three different approaches:
1. minimax, maximin
2. Nash equilibria
3. Pareto optimal
Maximin: look at the worst-case scenario for each choice, then pick the choice that maximizes the worst case.
                            | Results | Worst Case
Confess (defect)            | 2 or 5  | 2
Not confess (cooperative)   | 0 or 3  | 0
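The maximin rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the name maximin are assumptions made here, and the numbers are one player's payoffs from the prisoner's dilemma matrix.

```python
# One player's payoffs, indexed as payoff[my_action][other_action].
# The layout and names are illustrative assumptions, not from the slides.
payoff = {
    "confess":     {"confess": 2, "not confess": 5},
    "not confess": {"confess": 0, "not confess": 3},
}

def maximin(payoff):
    """Return (action, value) for the action whose worst case is largest."""
    # Worst case of each action: the minimum over the other player's choices.
    worst = {action: min(row.values()) for action, row in payoff.items()}
    best_action = max(worst, key=worst.get)
    return best_action, worst[best_action]

action, value = maximin(payoff)
print(action, value)
```

The worst cases are 2 for confessing and 0 for not confessing, so the sketch selects confess.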
8
Best Response
Suppose for a moment that I am P1 and that I know what P2 will choose. Given that I know P2's choice, I can search through all of my choices and find the option that is the best response to his choice. For the prisoner's dilemma, if P2 chooses to confess then the option that maximizes my payoff is to confess. Similarly, if P2 chooses to not confess then the option that maximizes my payoff is still to confess.
Best response: simply the best thing I can do GIVEN that I know what you will do.
BUT, I don’t know what you will do, so I may come to regret my choice.
The prisoner's dilemma makes this easy, as I pick the same thing either way.
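A best response to a known choice can be sketched as a simple search over my own actions. The function name best_response and the dictionary layout are assumptions for illustration; the payoffs are P1's values from the prisoner's dilemma.

```python
# P1's payoffs, indexed as payoff[my_action][p2_action] (illustrative layout).
payoff = {
    "confess":     {"confess": 2, "not confess": 5},
    "not confess": {"confess": 0, "not confess": 3},
}

def best_response(payoff, other_action):
    """Return my action with the highest payoff, given the other player's action."""
    return max(payoff, key=lambda mine: payoff[mine][other_action])

for p2_choice in ("confess", "not confess"):
    print(p2_choice, "->", best_response(payoff, p2_choice))
```

Both lookups return confess, which is why the prisoner's dilemma is the easy case.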
9
Prisoners’ dilemma – Best Response
                                   | Ned: Confess (defect) | Ned: Don’t Confess (cooperate)
Kelly: Confess (defect)            | 2,2                   | 5,0
Kelly: Don’t Confess (cooperate)   | 0,5                   | 3,3
10
Best Response
Suppose that instead of knowing exactly which choice P2 was going to make, I know only that P2 will play confess, say, 40% of the time and not confess 60% of the time.
I can find the expected utility for confessing (.4*(2)+.6*(5) = 3.8) and the expected utility for not confessing (.4*(0)+.6*(3) = 1.8), which tells me that my best response to P2's strategy is to confess. The notion of a best response is very useful for understanding other solution concepts from multi-agent choice.
For example, the maximin solution can be viewed as the best response (i.e., the maximal payoff) when I believe that the other player will always choose the thing that hurts me the most.
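The expected-utility calculation above can be reproduced with a short sketch. The names expected_utility and belief are assumptions introduced here; the 40%/60% belief and the payoffs come from the slide.

```python
# P1's payoffs, indexed as payoff[my_action][p2_action] (illustrative layout).
payoff = {
    "confess":     {"confess": 2, "not confess": 5},
    "not confess": {"confess": 0, "not confess": 3},
}

# My belief about how often P2 plays each action.
belief = {"confess": 0.4, "not confess": 0.6}

def expected_utility(payoff, my_action, belief):
    """Average my payoff over P2's actions, weighted by my belief."""
    return sum(p * payoff[my_action][other] for other, p in belief.items())

def best_response_to_belief(payoff, belief):
    """Return my action with the highest expected utility under the belief."""
    return max(payoff, key=lambda a: expected_utility(payoff, a, belief))

for a in payoff:
    print(a, round(expected_utility(payoff, a, belief), 2))
print("best response:", best_response_to_belief(payoff, belief))
```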
11
Prisoners’ dilemma
Note that no matter what Ned does, Kelly is better off if she confesses than if she does not confess. So ‘confess’ is a dominant strategy from Kelly’s perspective. We can predict that she will always confess.
                       | Ned: Confess | Ned: Don’t Confess
Kelly: Confess         | 2,2          | 5,0
Kelly: Don’t Confess   | 0,5          | 3,3
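The dominance check for Kelly can be sketched as follows. The function name dominant_action and the data layout are assumptions for illustration, and the check uses strict dominance (strictly better against every choice of the other player).

```python
# Kelly's payoffs, indexed as kelly[kelly_action][ned_action] (illustrative layout).
kelly = {
    "confess":       {"confess": 2, "don't confess": 5},
    "don't confess": {"confess": 0, "don't confess": 3},
}

def dominant_action(payoff):
    """Return an action strictly better than every alternative against
    every action of the other player, or None if no such action exists."""
    actions = list(payoff)
    for a in actions:
        alternatives = [b for b in actions if b != a]
        if all(payoff[a][o] > payoff[b][o]
               for b in alternatives for o in payoff[a]):
            return a
    return None

print(dominant_action(kelly))
```

For games in which neither action wins against every opposing choice, the function returns None.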
12
Nash Equilibrium
• Suppose you decide to look at the best choice for both of you. Clearly, “not confess” is best for both.
• Assuming both of you pick that “best for both” solution:
– You observe that if you confess, your utility improves.
– You observe that if the other player confesses, HIS utility improves.
– You notice that if you both confess, neither player has the motivation to change his mind.
• A Nash equilibrium is a set of choices where every player's choice is a best response to every other player's choice. In other words, the equilibrium solution is made up of a set of individual choices that make up a joint action from which no player benefits by deviating unilaterally.
13
Nash Equilibrium – (Beautiful Mind)
• In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
1. under the assumption that agent i plays s1, agent j can do no better than play s2; and
2. under the assumption that agent j plays s2, agent i can do no better than play s1.
• Neither agent has any incentive to deviate from a Nash equilibrium; it is stable.
• Unfortunately:
1. Not every interaction scenario has a pure-strategy Nash equilibrium
2. Some interaction scenarios have more than one Nash equilibrium
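The two-part definition can be checked mechanically for small games. The sketch below enumerates pure-strategy equilibria only; the function name pure_nash and the dictionary encoding (row action, column action) -> (row payoff, column payoff) are assumptions made for illustration. It is run on the prisoner's dilemma and on the matching pennies game that appears later in these slides.

```python
from itertools import product

def pure_nash(game):
    """Return all (row, col) profiles where neither player gains by deviating alone."""
    rows = sorted({r for r, _ in game})
    cols = sorted({c for _, c in game})
    equilibria = []
    for r, c in product(rows, cols):
        u_row, u_col = game[(r, c)]
        # Row player cannot do better by switching rows while the column stays fixed.
        row_ok = all(game[(r2, c)][0] <= u_row for r2 in rows)
        # Column player cannot do better by switching columns while the row stays fixed.
        col_ok = all(game[(r, c2)][1] <= u_col for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

prisoners_dilemma = {
    ("confess", "confess"): (2, 2), ("confess", "don't"): (5, 0),
    ("don't", "confess"): (0, 5),   ("don't", "don't"): (3, 3),
}
matching_pennies = {
    ("H", "H"): (-1, 1), ("H", "T"): (1, -1),
    ("T", "H"): (1, -1), ("T", "T"): (-1, 1),
}

print(pure_nash(prisoners_dilemma))
print(pure_nash(matching_pennies))
```

The prisoner's dilemma has a single pure equilibrium (both confess), while matching pennies has none in pure strategies.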
14
Criteria for evaluating systems
• Social welfare: max over outcomes of ∑i ui(outcome), where ui is the utility for player i.
• Surplus: social welfare of outcome – social welfare of status quo
– Constant sum games have 0 surplus (as all options sum to the same total). Markets are not constant sum.
• Pareto efficiency: An outcome o is Pareto efficient if there exists no other outcome o’ such that some agent has higher utility in o’ than in o and no agent has lower utility.
– If we maximize social welfare we get a Pareto optimal outcome, but not vice versa.
– Not a very useful way of selecting strategies
• Individual rationality: Each agent will do what is best for itself.
• Stability: No agent can increase its utility by changing its strategy (given that everyone else keeps the same strategy). If I knew what my opponent would do, would I still be satisfied with my decision?
• Symmetry: I would get the same utility as you if our roles were reversed.
• No dictator: no agent is inherently preferred.
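Two of these criteria, social welfare and Pareto efficiency, are easy to compute for a small game. The sketch below uses the prisoner's dilemma payoffs from the earlier slides; the function names and the dictionary encoding are assumptions for illustration.

```python
# Outcome -> (utility of player 1, utility of player 2); illustrative encoding.
outcomes = {
    ("confess", "confess"): (2, 2),
    ("confess", "don't"): (5, 0),
    ("don't", "confess"): (0, 5),
    ("don't", "don't"): (3, 3),
}

def social_welfare(utilities):
    """Sum of all players' utilities for one outcome."""
    return sum(utilities)

def pareto_dominates(v, u):
    """True if v is at least as good as u for everyone and better for someone."""
    return all(x >= y for x, y in zip(v, u)) and any(x > y for x, y in zip(v, u))

def pareto_efficient(outcomes):
    """Outcomes that no other outcome Pareto-dominates."""
    return [o for o, u in outcomes.items()
            if not any(pareto_dominates(v, u) for v in outcomes.values())]

welfare_max = max(outcomes, key=lambda o: social_welfare(outcomes[o]))
print("max social welfare:", welfare_max)
print("Pareto efficient:", pareto_efficient(outcomes))
```

The welfare-maximizing outcome is also Pareto efficient, but the two lopsided outcomes are Pareto efficient without maximizing welfare, which illustrates the "not vice versa" remark.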
15
Example: Prisoner’s Dilemma
• Two people are arrested for a crime. If neither suspect confesses, both get a light sentence. If both confess, then they get sent to jail. If one confesses and the other does not, then the confessor gets no jail time and the other gets a heavy sentence.
• (Actual numbers vary in different versions of the problem, but relative values are the same.)
                | Confess | Don’t Confess
Confess         | 2,2     | 0,5
Don’t Confess   | 5,0     | 3,3
– (Confess, Confess) = 2,2: dominant strategy equilibrium, not Pareto optimal
– (Don’t Confess, Don’t Confess) = 3,3: Pareto optimal, maximizes social welfare
16
Pareto Optimal
• Both maximin and Nash equilibrium solutions are pretty pessimistic; they frame the problem as a competitive game between my interests and your interests.
• This competition is not healthy: in the prisoner's dilemma, both approaches produce a solution with both players confessing, and both therefore receive their next-to-least favorite outcome of all possibilities (5,3,2,0).
• The idea behind Pareto optimality is that some joint solutions should obviously be avoided because they are bad for everyone.
17
Stated as ordinal (position) rather than cardinal (how much) values…
            | Defect | Cooperate
Defect      | 2,2    | 4,1
Cooperate   | 1,4    | 3,3
18
The term pareto efficient…
• The term Pareto efficient is named after Vilfredo Pareto, an Italian economist who used the concept in his studies of economic efficiency and income distribution.
• He is also the one credited with the 80/20 rule to describe the unequal distribution of wealth in his country, observing that twenty percent of the people owned eighty percent of the wealth.
• If an economic system is not Pareto efficient, then it is the case that some individual can be made better off without anyone being made worse off. It is commonly accepted that such inefficient outcomes are to be avoided, and therefore Pareto efficiency is an important criterion for evaluating economic systems and political policies.
19
Strategic Dominance
• One of my actions is better than my other choices regardless of what my opponent picks. (The example is a non-symmetric game.)
• P1 is always better off by playing C.
• Is there a dominant strategy for P2? (No. Why?)
• A strategically dominant solution is a solution which is a best response for every possible choice made by the other players.
20
Satisficing Equilibrium
• There are other, non-traditional solution concepts that are relevant for multi-agent games.
• One of these solution concepts is the notion of satisficing equilibrium.
• The word satisfice means to strive for something that is sufficient. A satisficing equilibrium occurs when agents have arrived at choices such that the consequence produced by these choices yields utilities that all agents are satisfied with.
• It is an equilibrium because a satisficing agent is content with what it has. If all agents are content, no agent has an incentive to change its actions, which means that the solution is stable.
• In the real world, an agent may not know (or have the resources to evaluate) all options, but can still make a decision knowing that it is “good enough”.
21
At seats: show in normal form (wrestling)
• There is a widespread practice in high school wrestling where the participants intentionally lose unnaturally large amounts of weight so as to compete against lighter opponents.
• In doing so, the participants are clearly not at their top level of physical and athletic fitness, and yet they often end up competing against the same opponents anyway, who have also followed this practice (mutual defection).
• The result is a reduction in the level of competition. Yet if a participant maintains their natural weight (cooperating), they could compete against a stronger opponent who has lost considerable weight.
22
Utility: inconvenience (0,-1) + competitive advantage (-3,3)
                           | Losing to compete | Compete at normal weight
Losing to compete          | -1,-1             | 2,-3
Compete at normal weight   | -3,2              | 0,0
So what is the best option?
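The four cells of the table can be rebuilt from the two utility components. The sketch below assumes one reading of the slide's shorthand: losing weight costs 1 in inconvenience (0 otherwise), and a wrestler who loses weight against a normal-weight opponent gains 3 while the opponent loses 3. Those readings, and all names in the code, are assumptions made for illustration.

```python
LOSE, NORMAL = "lose weight", "normal weight"

# Assumed inconvenience component: cutting weight costs 1, staying normal costs 0.
inconvenience = {LOSE: -1, NORMAL: 0}

def advantage(me, other):
    """Assumed competitive advantage: +3 if only I cut weight, -3 if only my opponent does."""
    if me == LOSE and other == NORMAL:
        return 3
    if me == NORMAL and other == LOSE:
        return -3
    return 0

def utility(me, other):
    """Total utility is inconvenience plus competitive advantage."""
    return inconvenience[me] + advantage(me, other)

# (row action, column action) -> (row utility, column utility)
table = {(a, b): (utility(a, b), utility(b, a))
         for a in (LOSE, NORMAL) for b in (LOSE, NORMAL)}

for profile, payoffs in table.items():
    print(profile, payoffs)
```

Under these assumptions the computed cells match the table: -1,-1; 2,-3; -3,2; 0,0.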
23
Game of Chicken
• Consider another type of encounter, the game of chicken (players: Ned and Kelly):
             | straight | swerve
straight     | -10, -10 | 5, 10
swerve       | 10, 5    | 7, 7
• (Think of James Dean in Rebel Without a Cause.)
• Difference from prisoner’s dilemma: mutually going straight is the most feared outcome. (Whereas the sucker’s payoff is most feared in prisoner’s dilemma.)
24
Game of Chicken
             | straight | swerve
straight     | -10, -10 | 5, 10
swerve       | 10, 5    | 7, 7
• Is there a dominant strategy?
• Is there a Pareto optimal outcome (one where no one can do better without making someone worse off)?
• Is there a “Nash” equilibrium: knowing what my opponent is going to do, would I be happy with my decision?
25
Try this one
          | coop | defect
coop      | 5,5  | 0,0
defect    | 0,0  | 10,10
• Is there a dominant strategy?
• Is there a Pareto optimal outcome (one where no one can do better without making someone worse off)?
• Is there a “Nash” equilibrium: knowing what my opponent is going to do, would I be happy with my decision?
26
And this one
          | coop | defect
coop      | 1,0  | 3,3
defect    | 4,3  | 5,2
• Is there a dominant strategy?
• Is there a Pareto optimal outcome (one where no one can do better without making someone worse off)?
• Is there a “Nash” equilibrium: knowing what my opponent is going to do, would I be happy with my decision?
27
Free Rider
• Described by Poundstone:
• It's late at night, and there's no one in the subway station. Why not just hop over the turnstiles and save yourself the fare? But remember, if everyone hopped the turnstiles, the subway system would go broke, and no one would be able to get anywhere.
• What's the chance that your lost fare will bankrupt the subway system? Virtually zero. The trains run whether the cars are empty or full. In no way does an extra passenger increase the system's operating expenses.
• But if everybody thinks this way
29
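Normal form game* (matching pennies)
              | Agent 2: H | Agent 2: T
Agent 1: H    | -1, 1      | 1, -1
Agent 1: T    | 1, -1      | -1, 1
Row and column labels (H, T) are actions; each cell is an outcome; the numbers in a cell are the payoffs (Agent 1, Agent 2).
*aka strategic form, matrix form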
30
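Extensive form game (matching pennies)
Player 1 chooses an action (H or T), then Player 2 chooses (H or T). Each terminal node (outcome) is labeled with payoffs (player 1, player 2):
Player 1: H
    Player 2: H -> (-1,1)
    Player 2: T -> (1,-1)
Player 1: T
    Player 2: H -> (1,-1)
    Player 2: T -> (-1,1)
Player 2 doesn’t know what has been played, so he doesn’t know which node he is at.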
31
Strategies
• Strategy:
– A strategy, sj, is a complete contingency plan; it defines the actions which agent j should take for all possible states of the world. In these simple games, the state is always “the beginning”.
• Strategy profile: s=(s1,…,sn), what each agent did (assuming n players).
– s-i = (s1,…,si-1,si+1,…,sn), what everyone else did
• Utility function: ui(s)
– Note that the utility of an agent depends on the strategy profile, not just its own strategy
– We assume agents are expected utility maximizers
32
Normal form game* (matching pennies)
              | Agent 2: H | Agent 2: T
Agent 1: H    | -1, 1      | 1, -1
Agent 1: T    | 1, -1      | -1, 1
*aka strategic form, matrix form
Strategy for agent 1: H
Strategy for agent 2: T
Strategy profile: (H,T)
U1((H,T))=1
U2((H,T))=-1
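The idea that utility is a function of the whole strategy profile can be sketched with a lookup table. The names payoffs, utility, and others are assumptions for illustration; the numbers are the matching pennies payoffs above.

```python
# Strategy profile (s1, s2) -> (U1, U2) for matching pennies.
payoffs = {
    ("H", "H"): (-1, 1), ("H", "T"): (1, -1),
    ("T", "H"): (1, -1), ("T", "T"): (-1, 1),
}

def utility(i, profile):
    """u_i(s): utility of agent i (numbered from 1) under strategy profile s."""
    return payoffs[profile][i - 1]

def others(i, profile):
    """s_-i: the strategies of everyone except agent i."""
    return profile[:i - 1] + profile[i:]

s = ("H", "T")
print(utility(1, s), utility(2, s), others(1, s))
```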
33
Battle of the Sexes
Consider the famous game of Battle of the Sexes. In the game, a husband and wife must independently decide on a date activity. The husband would prefer one form of entertainment, say fishing, and the wife would prefer another form of entertainment, say shopping for clothes. Although both have their most preferred activity, both prefer being together to being alone.
                     | Wife: Shopping | Wife: Fishing
Husband: Fishing     | 2,2            | 4,3
Husband: Shopping    | 3,4            | 1,1
Payoffs are listed as (husband, wife).
34
Maximin
The wife looks at the worst outcome for each of her options and picks the option that maximizes it.
Wife’s utilities:
                     | Wife: Shopping | Wife: Fishing
Husband: Fishing     | 2              | 3
Husband: Shopping    | 4              | 1
Worst case           | 2              | 1
35
Reactions
• Maximin: both should be selfish (and do what they want).
• This is not a great solution: if either player “defects” from the selfish choice, both are happier.
• It is better than the worst case of (1,1).
• In fact, either consequence that results when one is selfish and the other unselfish dominates (in the Pareto-optimal sense) the maximin solution, and both of these consequences are in equilibrium (since neither player benefits by unilaterally changing his/her mind). Unfortunately, with no way to communicate, the players are left to make independent choices to try to reach an equilibrium.
36
Reaction?
• Could just randomly pick a strategy, termed a mixed strategy.
How do you think that would work?
Battle of the sexes (cont)
• Using a mixed strategy (picking each option a fraction of the time) does not help their chances.
• In fact, the expected payoffs for two independent mixed strategies are pretty bad; if both choose uniformly at random, the expected payoff is only 2.5, not much better than the maximin value of 2.
• For your information, the set of possible payoffs for all possible combinations of mixed strategies is illustrated below.
38
Shows how various combinations of mixed strategies interact.
Randomly pick a mixed strategy for each, then plot the result.
39
Battle of the Sexes
Suppose the husband picks fishing 2/3 of the time and the wife picks shopping 2/3 of the time.
                        | Wife: Shopping 2/3 | Wife: Fishing 1/3
Husband: Fishing 2/3    | 2,2 (4/9)          | 4,3 (2/9)
Husband: Shopping 1/3   | 3,4 (2/9)          | 1,1 (1/9)
Husband’s expected utility: 2*4/9 + 4*2/9 + 3*2/9 + 1*1/9 = 2.56 (the wife’s expected utility comes to the same value)
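The expected payoffs under independent mixed strategies can be checked with exact fractions. The function name expected_payoffs and the dictionary encoding are assumptions for illustration; the payoffs and probabilities are those of the Battle of the Sexes slides.

```python
from fractions import Fraction
from itertools import product

# (husband action, wife action) -> (husband utility, wife utility)
payoffs = {
    ("fishing", "shopping"): (2, 2), ("fishing", "fishing"): (4, 3),
    ("shopping", "shopping"): (3, 4), ("shopping", "fishing"): (1, 1),
}

def expected_payoffs(husband_mix, wife_mix):
    """Expected (husband, wife) utilities when each player randomizes independently."""
    e_husband = e_wife = Fraction(0)
    for (h, p_h), (w, p_w) in product(husband_mix.items(), wife_mix.items()):
        u_h, u_w = payoffs[(h, w)]
        e_husband += p_h * p_w * u_h
        e_wife += p_h * p_w * u_w
    return e_husband, e_wife

half = Fraction(1, 2)
uniform = expected_payoffs({"fishing": half, "shopping": half},
                           {"shopping": half, "fishing": half})
biased = expected_payoffs({"fishing": Fraction(2, 3), "shopping": Fraction(1, 3)},
                          {"shopping": Fraction(2, 3), "fishing": Fraction(1, 3)})
print(uniform, biased)
```

Uniform mixing gives 5/2 for each player, and the 2/3 mix gives 23/9 (about 2.56) for each player.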
40
A simple competition game
Note: no player has a dominant strategy. But low is dominated for both players, so we can predict that neither will play low. Remove it.
                   | Donna: High | Donna: Medium | Donna: Low
Pierce: High       | 60, 60      | 36, 70        | 36, 35
Pierce: Medium     | 70, 36      | 50, 50        | 30, 35
Pierce: Low        | 35, 36      | 35, 30        | 25, 25
41
Iterated Elimination of Dominated Strategies
• Let Ri ⊆ Si be the set of removed strategies for agent i
• Initially Ri = Ø
• Choose agent i and strategy si such that si ∈ Si\Ri (Si minus Ri) and there exists si’ ∈ Si\Ri such that ui(si’,s-i) > ui(si,s-i) for all s-i ∈ S-i\R-i
• Add si to Ri, continue
• Theorem: If a unique strategy profile, s*, survives iterated elimination, then it is a Nash Eq.
• Theorem: If a profile, s*, is a Nash Eq then it must survive iterated elimination.
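The elimination procedure can be sketched for two-player games as follows. This is a minimal sketch using strict dominance; the function name iterated_elimination and the encoding (row strategy, column strategy) -> (row payoff, column payoff) are assumptions for illustration. It is run on the High/Medium/Low competition game from these slides, whose payoffs are symmetric, so the assignment of players to rows and columns does not affect the result.

```python
def iterated_elimination(rows, cols, u):
    """Repeatedly remove strictly dominated strategies for both players."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for s in rows[:]:
            # s is removed if some other surviving row is strictly better
            # against every surviving column.
            if any(all(u[(t, c)][0] > u[(s, c)][0] for c in cols)
                   for t in rows if t != s):
                rows.remove(s)
                changed = True
        for s in cols[:]:
            # Same test for the column player, using the second payoff.
            if any(all(u[(r, t)][1] > u[(r, s)][1] for r in rows)
                   for t in cols if t != s):
                cols.remove(s)
                changed = True
    return rows, cols

levels = ["High", "Medium", "Low"]
u = {
    ("High", "High"): (60, 60), ("High", "Medium"): (36, 70), ("High", "Low"): (36, 35),
    ("Medium", "High"): (70, 36), ("Medium", "Medium"): (50, 50), ("Medium", "Low"): (30, 35),
    ("Low", "High"): (35, 36), ("Low", "Medium"): (35, 30), ("Low", "Low"): (25, 25),
}

print(iterated_elimination(levels, levels, u))
```

The order in which strategies are removed here may differ from the order shown on the slides, but only Medium survives for each player.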
42
A simple competition game
Once we have removed low, medium is now a dominant strategy for both. So we predict that both Pierce and Donna will play medium.
                   | Donna: High | Donna: Medium
Pierce: High       | 60, 60      | 36, 70
Pierce: Medium     | 70, 36      | 50, 50
(Low has been removed for both players.)
43
Example – Zero Sum (most vicious)
(We divide the same cake. If I lose, you win.)
Bi-matrix form (show utilities separately for each player)
• Cake slicing
• Two players
– cutter
– chooser
Cutter's Utility:
                  | Choose bigger piece | Choose smaller piece
Cut cake evenly   | ½ - a bit           | ½ + a bit
Cut unevenly      | Small piece         | Big piece
Chooser's Utility:
                  | Choose bigger piece | Choose smaller piece
Cut cake evenly   | ½ + a bit           | ½ - a bit
Cut unevenly      | Big piece           | Small piece
44
Zero Sum
• Scientists debate whether zero sum scenarios
really exist.
• However, many TREAT situations as if they did.
45
Rationality
Utilities (cutter, chooser):
                  | Choose bigger piece | Choose smaller piece
Cut cake evenly   | (-1, +1)            | (+1, -1)
Cut unevenly      | (-10, +10)          | (+10, -10)
• Rationality
– each player will take the highest utility option
– taking into account the other player's likely behavior
• In the example
– if the cutter cuts unevenly
  • he might like to end up in the lower right
  • but the other player would never do that
  • so the cutter gets -10
– if the cutter cuts evenly,
  • he will end up in the upper left
  • so the cutter gets -1
• this is a stable outcome
– neither player has an incentive to deviate
46
Other Symmetric 2 x 2 Games
• Given the 4 possible outcomes of (symmetric) cooperate/defect games, there are 24 possible orderings on outcomes (showing preference for the first player; XY means I play X and the other player plays Y):
– CC ≻i CD ≻i DC ≻i DD: Cooperation dominates
– DC ≻i DD ≻i CC ≻i CD: Defect dominates (Deadlock). You will always do best by defecting
– DC ≻i CC ≻i DD ≻i CD: Prisoner’s dilemma
– DC ≻i CC ≻i CD ≻i DD: Chicken
– CC ≻i DC ≻i DD ≻i CD: Stag hunt
47