pptx - Department of Computer and Information Sciences

Nash’s Theorem
Theorem (Nash, 1951): Every finite game (finite number of players, finite
number of pure strategies) has at least one mixed-strategy Nash equilibrium.
Nash, John (1951) "Non-Cooperative Games" The Annals of
Mathematics 54(2):286-295.
(John Nash did not call them “Nash equilibria”, that name came later.)
He shared the 1994 Nobel Memorial Prize in Economic Sciences with game
theorists Reinhard Selten and John Harsanyi for his work on Nash equilibria.
He suffered from schizophrenia in the 1950s and 1960s, as depicted in the
2001 film “A Beautiful Mind”. He nevertheless recovered enough to return to
academia and continue his research.
More on Constant-Sum Games
Minimax Theorem (John von Neumann, 1928): For every two-person, zero-sum game
with finitely many pure strategies, there exists a mixed strategy for each player and a
value V such that:
Given player 2’s strategy, the best possible payoff for player 1 is V
Given player 1’s strategy, the best possible payoff for player 2 is –V.
The existence of strategies part is a special case of Nash’s theorem, and a precursor to
it.
This basically says that player 1 can guarantee himself a payoff of at least V, and
player 2 can guarantee himself a payoff of at least –V. If both players play optimally,
that’s exactly what they will get.
It’s called “minimax” because the players get this value by pursuing a strategy that
tries to minimize the maximum payoff of the other player. We’ll come back to this.
Definition: The value V is called the value of the game.
E.g., the value of Rock-Paper-Scissors is 0: the best that P1 can hope to achieve,
assuming P2 plays optimally (1/3 probability of each action), is a payoff of 0.
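The Rock-Paper-Scissors claim can be checked with a few lines of Python. This is only a sketch: the payoff numbers (win = +1, loss = -1, tie = 0) and the helper name `payoffs_against` are assumptions, since the slide states only the value of the game.

```python
from fractions import Fraction

# P1's payoffs in Rock-Paper-Scissors (win = +1, loss = -1, tie = 0).
# These numbers are an assumption; the slide only states the value of the game.
PAYOFF = [
    [0, -1, 1],   # Rock     vs Rock, Paper, Scissors
    [1, 0, -1],   # Paper    vs Rock, Paper, Scissors
    [-1, 1, 0],   # Scissors vs Rock, Paper, Scissors
]

def payoffs_against(opponent_mix):
    """Expected payoff of each pure action of P1 against P2's mixed strategy."""
    return [sum(a * q for a, q in zip(row, opponent_mix)) for row in PAYOFF]

uniform = [Fraction(1, 3)] * 3
best_for_p1 = max(payoffs_against(uniform))
print(best_for_p1)  # 0
```

Every pure action earns 0 against the uniform mix, so no strategy of P1 can do better than 0.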
Computing Nash Equilibria
• In general, it’s quite expensive: the problem is PPAD-complete, and it’s
not known exactly how that class relates to P or NP.
• For two-person, constant-sum games, this
problem reduces to another problem called
“Linear Programming”, which is in P.
Computing Nash Equilibria: 2-person, Zero-Sum Games

The Odds and Evens Game (each cell lists Odd’s payoff, then Even’s payoff)

                   Even: 1 finger   Even: 2 fingers
Odd: 1 finger         -2, +2           +3, -3
Odd: 2 fingers        +3, -3           -4, +4

The Odds and Evens game has no pure-strategy Nash equilibria.
By Nash’s theorem, it must have a mixed-strategy Nash equilibrium.
How can we find it?
Let’s start by making some definitions.
Let p1 be the probability that the Even player plays 1 finger in the Nash equilibrium. So with probability 1-p1, Even will play 2 fingers.
Likewise, let q1 be the probability that the Odd player plays 1 finger in the Nash equilibrium. So with probability 1-q1, Odd will play 2 fingers.
Next, let’s write down what we know about the outcomes, in terms of p1 and q1.
In equilibrium, Odd’s expected payoff is:
q1*p1*(-2) +
q1*(1-p1)*(+3) +
(1-q1)*p1*(+3) +
(1-q1)*(1-p1)*(-4)
In equilibrium, Even’s expected payoff is:
q1*p1*(+2) +
q1*(1-p1)*(-3) +
(1-q1)*p1*(-3) +
(1-q1)*(1-p1)*(+4)
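The two expected-payoff expressions can be evaluated for any p1 and q1 with a small helper. This is a minimal sketch; the names `ODD_PAYOFF` and `expected_payoffs` are made up for illustration, and the payoffs are the ones in the Odds and Evens table.

```python
from fractions import Fraction

# Odd's payoffs from the Odds and Evens table,
# indexed [Odd's fingers - 1][Even's fingers - 1].
ODD_PAYOFF = [[-2, 3], [3, -4]]

def expected_payoffs(p1, q1):
    """Return (Odd's, Even's) expected payoff when Even plays 1 finger with
    probability p1 and Odd plays 1 finger with probability q1."""
    odd_mix = [q1, 1 - q1]
    even_mix = [p1, 1 - p1]
    odd = sum(odd_mix[i] * even_mix[j] * ODD_PAYOFF[i][j]
              for i in range(2) for j in range(2))
    return odd, -odd  # zero-sum: Even's payoff is the negative of Odd's

print(expected_payoffs(Fraction(1, 2), Fraction(1, 2)))
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare results against the hand calculations.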
Observation:
If Even selects p1 so that Odd gets a higher utility by playing 1 finger instead of 2 fingers, then Odd will always select 1 finger.
But that can’t be an equilibrium! (Why not?)
Likewise, if Even selects p1 so that Odd gets a higher utility by playing 2 fingers instead of 1 finger, then Odd will always select 2 fingers.
But that can’t be an equilibrium, either!
So, the only possible equilibrium has Even selecting p1 so that Odd’s payoff for selecting 1 finger equals Odd’s payoff for selecting 2 fingers.
In algebra:
Odd’s payoff when Even plays 1 finger with probability p1, and Odd always plays 1 finger:
p1*(-2) + (1-p1)*(+3)
Odd’s payoff when Even plays 1 finger with probability p1, and Odd always plays 2 fingers:
p1*(+3) + (1-p1)*(-4)
Our observation says these should be equal:
p1*(-2) + (1-p1)*(+3) = p1*(+3) + (1-p1)*(-4)
=> -2p1 + 3 - 3p1 = 3p1 - 4 + 4p1
=> 7 = 12p1
=> p1 = 7/12
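The same indifference equation can be solved for any 2x2 game. The helper below is a sketch with a hypothetical name and argument order; it assumes the game has no pure-strategy equilibrium, since otherwise the result can fall outside [0, 1] and is not a valid probability.

```python
from fractions import Fraction

def indifference_probability(a, b, c, d):
    """Probability p of the mixing player's first action that makes the
    opponent indifferent, where the opponent's payoffs are
      first reply:  p*a + (1-p)*b
      second reply: p*c + (1-p)*d
    Setting them equal gives p = (d - b) / (a - b - c + d)."""
    denominator = a - b - c + d
    if denominator == 0:
        raise ValueError("no unique solution to the indifference equation")
    return Fraction(d - b, denominator)

# Odd's payoffs: 1 finger gives  p1*(-2) + (1-p1)*(+3),
#                2 fingers gives p1*(+3) + (1-p1)*(-4).
p1 = indifference_probability(-2, 3, 3, -4)
print(p1)  # 7/12
```

Note that the inputs are the opponent's payoffs: Even's probability p1 is pinned down by Odd's payoffs, not by Even's own.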
We could have done this for either player; here it is for Odd’s probability q1, which must make Even’s payoff for playing 1 finger equal Even’s payoff for playing 2 fingers:
q1*(+2) + (1-q1)*(-3) = q1*(-3) + (1-q1)*(+4)
=> 2q1 - 3 + 3q1 = -3q1 + 4 - 4q1
=> 12q1 = 7
=> q1 = 7/12
So now we know a mixed-strategy Nash equilibrium:
POdd(1 finger) = 7/12
POdd(2 fingers) = 5/12
PEven(1 finger) = 7/12
PEven(2 fingers) = 5/12
Quiz: 2-person, Zero-Sum Games
What is the value of this game for Even? (Remember, the value of the game is the expected payoff for the player in equilibrium.)
Likewise, what is the value of the game for Odd?
Answer: 2-person, Zero-Sum Games
You can get the value for Even three ways.
Recall: In equilibrium, Even’s expected payoff is:
q1*p1*(+2) +
q1*(1-p1)*(-3) +
(1-q1)*p1*(-3) +
(1-q1)*(1-p1)*(+4)
Or, q1*(+2) + (1-q1)*(-3)
or, q1*(-3) + (1-q1)*(+4)
These all equal: -1/12
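As a quick check, the three expressions can be evaluated exactly at p1 = q1 = 7/12. The variable names are illustrative only.

```python
from fractions import Fraction

p1 = q1 = Fraction(7, 12)  # equilibrium probabilities of playing 1 finger

# Way 1: the full four-term expectation over both players' mixed strategies.
full = (q1 * p1 * 2 + q1 * (1 - p1) * -3
        + (1 - q1) * p1 * -3 + (1 - q1) * (1 - p1) * 4)
# Ways 2 and 3: Even always plays 1 finger, or always 2 fingers, against Odd's mix.
even_plays_1 = q1 * 2 + (1 - q1) * -3
even_plays_2 = q1 * -3 + (1 - q1) * 4

print(full, even_plays_1, even_plays_2)  # -1/12 -1/12 -1/12
```

The last two agree because Odd's mix makes Even indifferent, and the first is a weighted average of those two.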
You can get the value for Odd the same three ways, or you can just say that this is a zero-sum game, so the value for Odd must be the negative of the value for Even: +1/12
In other words, it’s better to be the Odd player than the Even player, since Odd will win, on average.
2-person games with more actions
When there are more actions available than 2 per
person, the simple algorithm I gave will no longer
work.
However, it is still possible to compute Nash
equilibria for zero-sum games in polynomial time
using a technique called Linear Programming.
Linear Programming is a well-known kind of
problem with existing solvers, and I won’t cover it in
detail here.
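For reference, one standard way to write the linear program is sketched below. The symbols are assumptions not used elsewhere in these slides: A is the m-by-n matrix of the row player's payoffs, x is the row player's mixed strategy, and v is the payoff the row player can guarantee.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i=1}^{m} x_i A_{ij} \ge v \quad \text{for every column } j = 1,\dots,n, \\
& \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0 \quad \text{for every row } i .
\end{aligned}
```

The optimal v is the value of the game for the row player, and the optimal x is an equilibrium strategy; the column player's strategy comes from the analogous minimization.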
Quiz: Computing an Equilibrium for Zero-Sum Games

(each cell lists Player 1’s payoff, then Player 2’s payoff)

                Player 1: X   Player 1: Y
Player 2: X       +5, -5        +2, -2
Player 2: Y       +3, -3        +6, -6

In equilibrium,
1. What is the probability that P1 plays X?
2. What is the probability that P2 plays X?
3. What is the value of the game for P1?
Answer: Computing an Equilibrium for Zero-Sum Games

(each cell lists Player 1’s payoff, then Player 2’s payoff)

                Player 1: X   Player 1: Y
Player 2: X       +5, -5        +2, -2
Player 2: Y       +3, -3        +6, -6

In equilibrium,
1. What is the probability that P1 plays X? 2/3
2. What is the probability that P2 plays X? 0.5
3. What is the value of the game for P1? 4
Games beyond this class’s limits
There are MANY aspects of games and Game
Theory in AI that we will not cover. I’ll briefly
mention some of them:
1. Repeated games and Learning
2. Communication between agents
3. Mechanism Design: How to create games so
that agents have the incentives to behave in
desirable ways (e.g., voting and auctions)
1. Repeated Games and Learning
Many games (e.g., Rock-Paper-Scissors) are
typically played multiple times.
These are called repeated games.
This can change the incentive structure and the
best strategies:
E.g., in the Prisoner’s Dilemma, it might be better to say
nothing if you believe you can teach your opponent to
cooperate and say nothing as well.
Learning and Teaching
in Repeated Games
The history of play in a repeated game offers
examples of your opponent’s strategy.
This provides an opportunity for learning.
It also provides an opportunity for teaching!
In multi-agent settings with repeated games, every
agent is both a learner and a teacher.
Example learning strategy:
“Fictitious Play”
Idea: build a model of what the opponent’s
strategy is, and then play a best response.
Fictitious Play Learning
1. Create an array A that has an entry for each of the
opponent’s actions. Initialize with prior beliefs.
2. Repeat:
• Assuming the counts in A represent the opponent’s
mixed strategy, play a best response to A.
• Observe the opponent’s action, and update the
appropriate count in A.
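The two steps above can be sketched in Python for the Odds and Evens game, with both players running fictitious play against each other. The function names, the uniform prior of one count per action, and the tie-breaking rule (lowest index wins) are all assumptions made for illustration.

```python
def best_response(payoff, counts):
    """Index of the action with the highest total payoff against the opponent's
    observed action counts (equivalent to best-responding to the empirical mix)."""
    scores = [sum(row[j] * counts[j] for j in range(len(counts))) for row in payoff]
    return scores.index(max(scores))

def fictitious_play(odd_payoff, even_payoff, rounds):
    # Step 1: one count array per player; a prior of 1 per action is an assumption.
    odd_belief = [1, 1]   # Odd's counts of Even's actions
    even_belief = [1, 1]  # Even's counts of Odd's actions
    for _ in range(rounds):
        # Step 2a: each player best-responds to its current counts.
        odd_action = best_response(odd_payoff, odd_belief)
        even_action = best_response(even_payoff, even_belief)
        # Step 2b: each player observes the other's action and updates its counts.
        odd_belief[even_action] += 1
        even_belief[odd_action] += 1
    total = sum(even_belief)
    return [c / total for c in even_belief]  # empirical mix of Odd's play (prior included)

# Odd's payoffs are indexed [Odd's action][Even's action];
# Even's payoffs are indexed [Even's action][Odd's action].
ODD = [[-2, 3], [3, -4]]
EVEN = [[2, -3], [-3, 4]]
print(fictitious_play(ODD, EVEN, 20000))
```

Neither player ever randomizes here; only the long-run frequencies of play should drift toward the 7/12 and 5/12 equilibrium mix.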
Some Theoretical Results about
Fictitious Play
Theorem: If both players use fictitious play, and
if the empirical distribution of their chosen
actions converges, then it converges to a Nash
equilibrium.
Theorem: In zero-sum games, if both players
use fictitious play, the empirical distribution of their
chosen actions converges to a Nash equilibrium.
2. Communication in Games
Sometimes, communication can improve player outcomes.

Prisoner’s Dilemma (each cell lists Player 1’s payoff, then Player 2’s payoff)

            P2: C      P2: D
P1: C      +3, +3     0, +5
P1: D      +5, 0      +1, +1

Coordination game

            P2: C      P2: D
P1: C      +1, +1     0, 0
P1: D      0, 0       +1, +1

In each game, Player 1 says: “I will play C”. Response?
In the coordination game, P1’s statement is self-committing and self-revealing, so it is believable.
In the Prisoner’s Dilemma it is not: D is P1’s best action whatever P2 does, so the promise to play C is not credible.
3. Mechanism Design: Creating Games
with Desired Outcomes
Elections and auctions are examples of games: they
involve multiple agents, possible actions for each
agent (who to vote for, how much to bid), and
outcomes that depend on all of the agents’
actions.
“Mechanism Design” is the study of creating a
reward structure so that we have good outcomes,
such as that the most popular politician gets
elected, or that the person who benefits most from
a good wins the auction.
Arrow’s Theorem
Definition: A voting mechanism is dictatorial if it exactly follows the preferences of a
single voter (called the dictator).
Theorem (Arrow, 1951) (Informally): Any voting mechanism in which voters express
their true preferences for the outcomes (candidates)
1. that has at least 3 outcomes
2. that always selects the most popular outcome
3. and where the choice between two outcomes is not affected by other less-popular outcomes
must be dictatorial.
Note: This is a well-known example of an impossibility theorem: a theorem that says
it is impossible to design a game with a certain list of desirable properties.
This theorem and many like it don’t apply to certain kinds of voting, like rating systems
(where voters rate each outcome, for example on a scale of 1-10, rather than
specifying preferences.) But it does apply to most voting mechanisms in modern
democracies.
Which property does the US presidential voting system fail on?
Second-Price Auctions
Definition: A second-price auction awards the good
to the highest bidder, and charges a price equal to
the second-highest bid.
Quiz: Second-Price Auctions
Let v=10 be your value for a good.
Let b be your bid.
Let c be the highest bid by anyone else in the auction.
Your payoff is:
v-c if b > c (you win the auction)
0 if b <= c (you lose the auction)

Fill in the matrix of payoffs:

          c=7   c=9   c=11   c=13
b=12
b=10
b=8

Is there a dominant strategy?
If yes, is the strategy “truth-revealing”? (That is, does the
strategy make you bid exactly how much you value the good?)
Answer: Second-Price Auctions
Let v=10 be your value for a good.
Let b be your bid.
Let c be the highest bid by anyone else in the auction.
Your payoff is:
v-c if b > c (you win the auction)
0 if b <= c (you lose the auction)

The matrix of payoffs:

          c=7   c=9   c=11   c=13
b=12       3     1     -1      0
b=10       3     1      0      0
b=8        3     0      0      0

Is there a dominant strategy? Yes, bid b=10
If yes, is the strategy “truth-revealing”? Yes, the dominant
strategy matches the value v=10
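The payoff matrix can be reproduced directly from the payoff rule. This is a small sketch: the function name `payoff` and the variable names are illustrative, and the dominance check only covers the three bids and four competing bids in the quiz grid, not all possible bids.

```python
def payoff(v, b, c):
    """Second-price payoff: win and pay c if b > c, otherwise get nothing."""
    return v - c if b > c else 0

v = 10
bids = [12, 10, 8]
others = [7, 9, 11, 13]

table = {b: [payoff(v, b, c) for c in others] for b in bids}
for b in bids:
    print(f"b={b}: {table[b]}")

# Bidding v is (weakly) dominant on this grid if no other bid ever does better.
truthful_is_dominant = all(
    table[v][k] >= table[b][k] for b in bids for k in range(len(others))
)
print(truthful_is_dominant)  # True
```

Overbidding (b=12) loses money only when c=11, and underbidding (b=8) misses a profitable win only when c=9; bidding b=10 avoids both.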
Second-Price Auctions
Definition: A second-price auction awards the good to the highest
bidder, and charges a price equal to the second-highest bid.
Some properties (under a bunch of assumptions that I won’t get into):
1. They are Pareto efficient.
2. They are dominant-strategy truthful: the best strategy is to bid
exactly what you think the good is worth to you.
3. It is always worth it for agents to take part in the auction.
4. The auctioneer will never lose money.
5. These auctions come from a family of auctions called Vickrey-Clarke-Groves
mechanisms, and these are the only possible
mechanisms that have the first two properties.