Uploaded by Patrick Hup

Game Theory

Methods of PPE I
Part Game Theory
Introduction
Decision theory is the theory about rational decision making.
We can distinguish between:
• Normative decision theory, which studies what rational decision makers ought to do;
• Descriptive decision theory, which tries to explain what decision makers actually do.
Decision matrices and decision trees are models that describe a decision situation for a decision maker
Decision matrices are usually used for one-shot decision situations, while decision trees are used for sequential (or dynamic)
decision situations.
When we describe (or model) a decision situation, we distinguish three levels of abstraction:
1. The decision problem
2. A formalisation of the decision problem
3. A visualization of the formalisation
The model of a decision problem contains at least the following elements:
1. States
2. Acts or strategies
3. Outcomes
4. Preferences over outcomes
5. Information of the decision maker
Ad 1. A state represents all that the decision maker has no influence on.
Ad 2. In a one-shot decision model, a decision maker can choose an act from a set A = {a_1, a_2, …, a_n} of acts. So, an act represents
what the agent chooses/does.
In a sequential decision model, a decision maker has to choose a sequence of acts, where the set of acts to choose
from at a given moment might depend on the acts chosen in the past. A sequence of acts is called a strategy.
Ad 3. An outcome is the result of an act (in a one-shot decision model) or strategy (in a sequential decision model).
Ad 4. The outcomes can be ordered based on a preference relation. For every two outcomes, this preference relation states whether
one is preferred to the other or not. If neither is preferred, the decision maker is either indifferent between the two outcomes
or cannot compare them. (We say more about this in the coming slides.)
Ad 5. Given the preferences over the outcomes, the decision maker chooses his/her act or strategy from the set of acts. This can
be done by different criteria or objectives. This also depends on the information of the decision maker: is it decision making under
uncertainty, risk or ignorance?
Transformation of scales:
We can distinguish three types of scales:
1. Ordinal scale: Based on the numbers you can compare the outcomes, but you can give no meaning to the difference or
ratio between these numbers.
2. Cardinal scales:
   a. Interval scale: Differences have meaning, but ratios do not.
   b. Ratio scale: Differences and ratios have a meaning.
Transformation of scales:
1. Ordinal scale:
   a. Maintain preference, e.g. A is better than B, but C is better than D.
2. Interval scale:
   a. Maintain numerical difference, e.g. difference of 160 between states
3. Ratio scale:
   a. Maintain ratio in difference, e.g. the difference in state B should be twice the difference in A
Expected utility
Expected value (EV):
• Expected monetary value (EMV): Σ_{x=1}^{n} p_x m_x = p_1 m_1 + p_2 m_2 + ⋯ + p_n m_n
• Expected utility (EU): Σ_{x=1}^{n} p_x u_x = p_1 u_1 + p_2 u_2 + ⋯ + p_n u_n
Risk attitude:
• Risk averse: diminishing marginal utility
• Risk neutral: constant marginal utility
• Risk seeking: increasing marginal utility
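The difference between EMV and EU, and the link with risk attitude, can be illustrated with a small sketch. The lottery and the square-root utility function below are assumptions chosen for illustration (a concave utility has diminishing marginal utility, so it models a risk-averse decision maker); they do not come from the lecture.

```python
import math

def expected_value(lottery, value=lambda m: m):
    """Probability-weighted sum of value(m) over (probability, money) pairs."""
    return sum(p * value(m) for p, m in lottery)

# Hypothetical lottery: 50% chance of 0, 50% chance of 100.
lottery = [(0.5, 0.0), (0.5, 100.0)]

# Expected monetary value: utility is just the amount of money.
emv = expected_value(lottery)

# Expected utility with an assumed concave utility u(m) = sqrt(m).
eu_averse = expected_value(lottery, math.sqrt)

# Certainty equivalent: the sure amount whose utility equals the expected utility.
certainty_equivalent = eu_averse ** 2
```

Under these assumptions the certainty equivalent lies below the EMV, which is what risk aversion amounts to: the decision maker would accept a sure amount smaller than the expected monetary value instead of the lottery.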
Axiomatization of expected utility:
EU1: If all outcomes of an act have utility u, then the utility of the act is u.
EU2: If one act is certain to lead to better outcomes under all states than another, then the utility of the first act exceeds that of
the latter; and if both acts lead to equal outcomes they have the same utility.
EU3: Every decision problem can be transformed into a decision problem with equally probable states, in which the utility of all
acts is preserved.
EU4: If two outcomes are equally probable, and if the better outcome is made slightly worse, then this can be compensated for
by adding some amount of utility to the other outcome, such that the overall utility of the act is preserved.
Theorem 4.1
If axioms EU1-4 hold for all decisions under risk, then the utility of an act equals its expected utility.
© 2020 Patrick Hup
Preference relations
Consider a set of alternatives X.
• ≻ (strict preference)
• ≽ (“… at least as good as …”)
• ∼ (indifference)
Properties of preference relations:
• Complete: if and only if for all x, y ∈ X, it holds that: x ≻ y or x ∼ y or y ≻ x.
  In words, a preference relation is complete if any two alternatives x and y can be compared to each other: x is better than
  y, or y is better than x, or the decision maker is indifferent between x and y.
• Asymmetric: if and only if for all x, y ∈ X, it holds that: if x ≻ y then ¬(y ≻ x).
  In words, a preference relation is asymmetric if for any two alternatives at most one can be (strictly) better than the other.
• Transitive: if and only if for all x, y, z ∈ X, it holds that: [x ≻ y and y ≻ z] implies that [x ≻ z].
  In words, a preference relation is transitive if, whenever the decision maker prefers alternative x to alternative y, and
  prefers alternative y to alternative z, then he/she prefers alternative x to alternative z.
• Negatively transitive: if and only if for all x, y, z ∈ X, it holds that: [¬(x ≻ y) and ¬(y ≻ z)] implies that [¬(x ≻ z)].
Utility functions
Definition A real-valued function u is a utility function representing the preference relation ≻ if and only if:
x ≻ y if and only if u(x) > u(y)
Remark: a utility function measures on an ordinal utility scale.
Theorem 5.1
Let X be a finite set of outcomes. Preference relation ≻ can be represented by a utility function u if and only if ≻ is complete,
asymmetric and negatively transitive on X.
Axioms
vNM1 Preference relation ≻ is complete if and only if for all A, B ∈ L, it holds that: A ≻ B or A ∼ B or B ≻ A
vNM2 Preference relation ≻ is transitive if and only if for all A, B, C ∈ L, it holds that: [A ≻ B and B ≻ C] implies [A ≻ C]
Remark: Note that these are similar to the properties defined before, but now over lotteries.
A new axiom:
vNM3 Preference relation ≻ satisfies independence if and only if for all A, B, C ∈ L and 0 < p ≤ 1, it holds that: [A ≻ B] if and only
if [ApC ≻ BpC].
In words, if you prefer lottery A to lottery B then you prefer any lottery between A and a third lottery C to the lottery between B
and C with the same probabilities.
Theorem 5.2
Preference relation ≻ satisfies vNM 1-4 if and only if it can be represented by a utility function u satisfying:
(i) A ≻ B if and only if u(A) > u(B)
(ii) u(ApB) = p u(A) + (1 − p) u(B)
(iii) for every other function u′ satisfying (i) and (ii) there are numbers c > 0 and d such that u′ = c ⋅ u + d
A utility function as in Theorem 5.2 is called a von Neumann-Morgenstern (expected) utility function.
This theory is fundamental in the development of decision theory, and in particular of game theory.
When the preferences of a decision maker satisfy axioms vNM1-vNM4, the decision maker can base his/her decision on
maximizing expected utility.
Decisions under ignorance
In decision making under risk, the decision maker knows the probability of the possible outcomes.
In decision making under ignorance, the decision maker does not know the probability of the possible outcomes (or these
probabilities do not exist).
So, when making decisions under risk, you can use the probability distribution to calculate the expected payoff. This cannot be
done when making decisions under ignorance.
What to do if you do not know the probability of the states?
There are several criteria that can be used.
Strong dominance
a_i ≻ a_j if and only if v(a_i, s) ≥ v(a_j, s) for every state s, and there is at least one state s_n such that v(a_i, s_n) > v(a_j, s_n)
We say that a_i is a strongly dominant act if a_i ≻ a_j for all a_j ≠ a_i.
If a rational decision maker uses this criterion then he/she chooses the strongly dominant act (if it exists).
Similarly for weak dominance:
a_i ≽ a_j if and only if v(a_i, s) ≥ v(a_j, s) for every state s
If there is no (strongly) dominant act, then there can still be a (strongly) dominated act.
We say that act a_j is strongly dominated by act a_i if a_i ≻ a_j.
If a rational decision maker uses this criterion then he/she does not choose a strongly dominated act. Similarly for weak dominance.
Notice that a_i is a strongly dominant act if and only if all other acts are strongly dominated by act a_i.
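The dominance definitions above translate directly into state-by-state comparisons. The following is a minimal sketch; the decision matrix `acts` (one list of values per act, one entry per state) is a made-up example, not one from the lecture.

```python
def weakly_dominates(a, b):
    """a is at least as good as b in every state."""
    return all(x >= y for x, y in zip(a, b))

def strongly_dominates(a, b):
    """a is at least as good in every state and strictly better in at least one."""
    return weakly_dominates(a, b) and any(x > y for x, y in zip(a, b))

def strongly_dominant_acts(acts):
    """Names of acts that strongly dominate every other act."""
    return [
        name
        for name, values in acts.items()
        if all(strongly_dominates(values, other)
               for other_name, other in acts.items() if other_name != name)
    ]

# Hypothetical decision matrix: rows are acts, columns are states.
acts = {"a1": [4, 3, 2], "a2": [4, 1, 0], "a3": [2, 3, 1]}
dominant = strongly_dominant_acts(acts)
```

In this illustrative matrix a1 strongly dominates both a2 and a3, while a2 and a3 do not dominate each other.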
Decision rules:
Maximin
Maximize the minimum value obtainable with each act.
min(act a) = min(10, 5, −5, −10) = −10
min(act b) = min(6, 5, x, −5) = −5 (assuming x ≥ −5)
min(act c) = min(2, 2, 2, 2) = 2
So, maximin chooses ‘act c’, the act with the largest minimum.
Leximin
If the worst outcomes are equal, one should choose an alternative in which the second worst outcome is as good as possible.
min(act a) = min(−5, 6, 9) = −5
min(act b) = min(5, 12, −7) = −7
min(act c) = min(−10, 5, 0) = −10
min(act d) = min(10, −5, 7) = −5
So, act a and act d tie on the worst outcome (−5). Compare their second worst outcomes:
Act a: second worst of (−5, 6, 9) = 6
Act d: second worst of (10, −5, 7) = 7
So, choose ‘act d’
Note that if the second worst outcomes are also equal, compare the third worst outcomes, etc.
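Maximin and leximin can be sketched in a few lines, using the four acts of the leximin example. Comparing the sorted outcome lists lexicographically is one convenient way to implement "worst first, then second worst, and so on"; the function names are illustrative.

```python
# Acts and outcomes as in the leximin example.
acts = {"a": [-5, 6, 9], "b": [5, 12, -7], "c": [-10, 5, 0], "d": [10, -5, 7]}

def maximin(acts):
    """Acts whose worst outcome is as good as possible (may be a tie)."""
    best_worst = max(min(values) for values in acts.values())
    return [name for name, values in acts.items() if min(values) == best_worst]

def leximin(acts):
    """Break maximin ties by the second worst outcome, then the third worst, etc."""
    # Python compares lists lexicographically, so sorted outcome lists
    # are compared worst outcome first.
    best = max(sorted(values) for values in acts.values())
    return [name for name, values in acts.items() if sorted(values) == best]

maximin_choice = maximin(acts)
leximin_choice = leximin(acts)
```

Maximin alone leaves a tie between a and d; leximin resolves it in favour of d.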
Maximax
Maximize the maximal value obtainable with an act.
max(act a) = max(−5, 6, 9) = 9
max(act b) = max(5, 12, 7) = 12
max(act c) = max(−10, 5, 0) = 5
max(act d) = max(10, −5, 7) = 10
So, choose ‘act b’
Optimism-Pessimism Rule
Consider both the best and the worst possible outcome of each alternative, and then choose an alternative according to the decision maker's degree of
optimism or pessimism.
Example: the decision maker is optimistic to degree 0.7.
Act A:
max(act a) = max(−5, 6, 9) = 9
min(act a) = min(−5, 6, 9) = −5
total value = 0.7 ⋅ 9 + (1 − 0.7) ⋅ (−5) = 4.8
Act B:
max(act b) = max(5, 12, 7) = 12
min(act b) = min(5, 12, 7) = 5
total value = 0.7 ⋅ 12 + (1 − 0.7) ⋅ 5 = 9.9
Act C:
max(act c) = max(−10, 5, 0) = 5
min(act c) = min(−10, 5, 0) = −10
total value = 0.7 ⋅ 5 + (1 − 0.7) ⋅ (−10) = 0.5
Act D:
max(act d) = max(10, −5, 7) = 10
min(act d) = min(10, −5, 7) = −5
total value = 0.7 ⋅ 10 + (1 − 0.7) ⋅ (−5) = 5.5
So, choose ‘act B’
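The optimism-pessimism rule is a weighted average of the best and the worst outcome of each act. The sketch below uses the outcome lists a = (−5, 6, 9), b = (5, 12, 7), c = (−10, 5, 0), d = (10, −5, 7) from the examples and the optimism degree 0.7; the function and variable names are illustrative.

```python
def optimism_pessimism(acts, alpha):
    """Value of each act: alpha * best outcome + (1 - alpha) * worst outcome."""
    return {name: alpha * max(values) + (1 - alpha) * min(values)
            for name, values in acts.items()}

acts = {"a": [-5, 6, 9], "b": [5, 12, 7], "c": [-10, 5, 0], "d": [10, -5, 7]}

values = optimism_pessimism(acts, alpha=0.7)
choice = max(values, key=values.get)

# Maximax only looks at the best outcome of each act.
maximax_choice = max(acts, key=lambda name: max(acts[name]))
```

Setting `alpha` to 1 reproduces maximax and setting it to 0 reproduces maximin, so the rule interpolates between the two.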
Minimax Regret
Choose an alternative under which the maximum regret value is as low as possible.
Regret is calculated by subtracting the value of the best outcome of each state from the value of the outcome in question.
max(Growing economy) = max(8, −10, 0) = 8
max(Shrinking economy) = max(4, −3, 0) = 4
max(Stable economy) = max(2, 2, 2) = 2
The regret matrix is
[regret matrix missing in the source]
max(−12, −5, −6) = −5
So, according to the minimax regret criterion choose ‘Firm B’.
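Because the decision matrix of this example is not available, the sketch below uses a hypothetical payoff table (firm names, states and numbers are all assumptions) to show the mechanics. It follows the sign convention of the text: regret is the outcome minus the best outcome in that state, so regrets are zero or negative, and the chosen act is the one whose worst regret is closest to zero.

```python
# Hypothetical payoffs; columns are the states Growing, Shrinking, Stable.
payoffs = {
    "Firm A": [8, -4, 2],
    "Firm B": [4, 0, 2],
    "Firm C": [0, 1, 2],
}

def regret_matrix(payoffs):
    """Regret = outcome minus the best outcome in the same state (so regret <= 0)."""
    n_states = len(next(iter(payoffs.values())))
    best = [max(row[s] for row in payoffs.values()) for s in range(n_states)]
    return {act: [row[s] - best[s] for s in range(n_states)]
            for act, row in payoffs.items()}

def minimax_regret(payoffs):
    """Act whose largest regret (most negative entry) is closest to zero."""
    regrets = regret_matrix(payoffs)
    return max(regrets, key=lambda act: min(regrets[act]))

regrets = regret_matrix(payoffs)
choice = minimax_regret(payoffs)
```

With these made-up numbers the worst regrets are −5, −4 and −8, so Firm B is chosen.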
The Principle of Insufficient Reason
If one has no reason to think that one state of the world is more probable than another, then all states should be assigned equal
probability.
No lockdown: 1/4 ⋅ (10 + 5 + (−5) + (−10)) = 0
Partial lockdown: 1/4 ⋅ (6 + 5 + x + (−5)) = (6 + x)/4
Lockdown: 1/4 ⋅ (2 + 2 + 2 + 2) = 2
So, partial lockdown is chosen according to the Principle of Insufficient Reason if (6 + x)/4 > 2, so x > 2
Randomized acts
[Insert theory]
Game Theory
Game Theory (also called Interdependent or Interactive Decision Theory) studies situations with more than one decision maker
where the decision makers are aware of the influence of their actions on the choice behaviour of the other decision makers.
In game theory we refer to the decision makers as players.
A game needs to describe the following elements:
• Players: decision makers
• Rules: which player moves when?
• Actions: possible moves of a player
• Outcomes: for each set of actions (or strategies)
• Payoff: the utility a player receives
• Information: the knowledge a player has of the relevant variables at a certain point in the game
Taxonomy:
• In zero-sum games, the sum of all payoffs is equal to zero in every outcome of the game. These games reflect strong
  competition.
• In nonzero-sum games, different outcomes of the game can have a different total sum of payoffs.
• Noncooperative games: no previous binding commitments.
• Cooperative games: binding agreements during pre-play negotiations.
• Simultaneous-move games: each player moves once, without knowing the moves of the other players. A move is called an action.
• Sequential-move games: players do not all move at the same time, and it can be that a player moves more than once. Each
  move is called an action, and a contingent plan in which a player chooses an action at every moment he/she can make a move
  is called a strategy.
Normal form games
A simultaneous-move game is usually modelled as a normal form game.
Definition A normal form game consists of
(i) a set of players N = {1, …, n},
(ii) for every player i ∈ N a set of (pure) strategies S_i, and
(iii) for every player i ∈ N a payoff (utility) function u_i over the possible strategy profiles (outcomes of the game).
Here u_i(s_1, s_2, …, s_n) is the payoff for player i when player 1 plays s_1, player 2 plays s_2, etc.
So, the payoff of a player depends on the strategies chosen by all players.
This interactive element is essential in game theory.
We assume that the payoff functions are von Neumann-Morgenstern utility functions, see Chapter 5/Lecture DT2A.
A tuple s = (s_1, s_2, …, s_n) is called a strategy profile.
It consists of n strategies, one for each player.
Let S = ∏_{i∈N} S_i be the space of all strategy profiles.
Sometimes we write a strategy profile as s = (s_i, s_{−i}), with s_{−i} ∈ S_{−i} = ∏_{j∈N\{i}} S_j, where S_{−i} is the set of strategy profiles of ‘the other
players’.
Dominant strategies
Definition A strategy s_i is a (strictly) dominant strategy for player i if against every strategy profile of the other players, it
gives player i a higher payoff than any other strategy player i could play.
So, a strategy s_i is a (strictly) dominant strategy for player i if
u_i(s_i, s_{−i}) > u_i(s_i′, s_{−i}) for all s_{−i} ∈ S_{−i} and all s_i′ ∈ S_i \ {s_i}
Definition A strategy profile s = (s_1, s_2, …, s_n) is Pareto dominated if there exists a strategy profile s′ = (s_1′, s_2′, …, s_n′) such that
u_i(s′) > u_i(s) for all i ∈ N
A strategy profile is Pareto optimal if it is NOT Pareto dominated.
The Prisoner's Dilemma shows that the strategy profile that results if all players play a dominant strategy need not be Pareto
optimal.
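The last point can be checked mechanically. The sketch below encodes a Prisoner's Dilemma with commonly used illustrative payoffs (the numbers and the strategy labels C and D are assumptions, not taken from the lecture) and tests the two definitions above.

```python
from itertools import product

STRATEGIES = ["C", "D"]  # cooperate, defect (labels are illustrative)

# Hypothetical payoffs: (row strategy, column strategy) -> (u_1, u_2).
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def profile(player, own, other):
    """Build a strategy profile from one player's point of view."""
    return (own, other) if player == 0 else (other, own)

def is_strictly_dominant(player, strategy):
    """strategy beats every alternative against every strategy of the opponent."""
    return all(
        PAYOFF[profile(player, strategy, other)][player]
        > PAYOFF[profile(player, alt, other)][player]
        for other in STRATEGIES
        for alt in STRATEGIES if alt != strategy
    )

def is_pareto_dominated(s):
    """Some other profile gives every player a strictly higher payoff."""
    return any(
        all(PAYOFF[t][i] > PAYOFF[s][i] for i in (0, 1))
        for t in product(STRATEGIES, repeat=2)
    )
```

With these payoffs D is strictly dominant for both players, yet the resulting profile (D, D) is Pareto dominated by (C, C).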
Dominated strategies
Often there are no strictly dominant strategies.
In these cases there can still exist strictly dominated strategies.
Example
This game does not have a strictly dominant strategy.
However, it has a strictly dominated strategy, namely strategy R2 for player Row.
Definition A strategy s_i is strictly dominated if there is another strategy s_i′ ≠ s_i for player i such that against every strategy
profile of the other players s_i′ gives player i a higher payoff than s_i.
So a strategy s_i is strictly dominated if there is another strategy s_i′ ≠ s_i such that
u_i(s_i, s_{−i}) < u_i(s_i′, s_{−i}) for all s_{−i} ∈ S_{−i}
• A rational player will not play a strictly dominated strategy.
• Note that this says nothing about strategy s_i′, but only that strategy s_i will not be played.
• There are games which have no strictly dominated strategies.
• If a game has a strictly dominant strategy (for a player), then it also has strictly dominated strategies (all the other
  strategies of that player).
• Alternative definition of strictly dominant strategy: a strategy which strictly dominates all the other strategies of that
  player.
Iterated Elimination of Strictly Dominated Strategies
Consider again the previous example.
What would you do (as player Row or Col)?
We already saw that R2 is dominated.
What would you do as player Col?
What next?
What is left after iterated elimination of dominated strategies is (R1, C2).
Step 1. If there are no strictly dominated strategies, then stop. Otherwise go to step 2.
Step 2. Choose a player that was not chosen in the previous round and delete at least one (maybe even all) of its strictly
dominated strategies in the reduced game. Go to step 3.
Step 3. Consider the reduced game (the game that is left after step 2) and return to step 1.
The final reduced game is independent of the order of elimination.
So, any player can find this reduced game by itself when it knows all payoffs.
The rationality requirement is limited: it is only required that each player is able to make (a finite number of) comparisons.
Weak Dominance
What would you play in the game?
There are no strictly dominated strategies.
We say that R2 is weakly dominated (by R1) for player Row.
(Similarly, C2 is a weakly dominated strategy for player Col.)
Definition A strategy s_i is a weakly dominated strategy for player i if there is another strategy s_i′ such that against every strategy
profile of the other players, s_i′ gives player i at least as high a payoff as s_i, and there is at least one strategy profile of the
other players against which s_i′ gives player i a higher payoff than s_i.
Formally, a strategy s_i is weakly dominated if there is another strategy s_i′ ≠ s_i such that
u_i(s_i, s_{−i}) ≤ u_i(s_i′, s_{−i}) for all s_{−i} ∈ S_{−i}
and there is at least one s_{−i} ∈ S_{−i} such that
u_i(s_i, s_{−i}) < u_i(s_i′, s_{−i})
Definition A strategy is weakly dominant for a player if it weakly dominates all the other strategies of that player.
Remarks:
The order of elimination matters in the iterated elimination of weakly dominated strategies. Therefore, we usually do not apply it.
Moreover, it can eliminate Nash equilibria.
Game Theory: Nash Equilibrium
A rational player will try to maximize his/her payoff
Each player will choose a strategy in response to the strategies of all other players that maximizes his/her own payoff.
The result is called a Nash equilibrium.
Definition A strategy profile is a (pure) Nash equilibrium if and only if, once every player has chosen its strategy,
none of the players could reach a better outcome by unilaterally switching to another strategy.
Formally,
A strategy profile s = (s_1, s_2, …, s_n) is a (pure) Nash equilibrium if for every player i
u_i(s_i, s_{−i}) ≥ u_i(s_i′, s_{−i}) for all s_i′ ∈ S_i
Remark: Note that a Nash equilibrium is a strategy profile, while a dominant (or dominated) strategy is a strategy for some player.
Remarks: A (pure) Nash equilibrium is always one of the pure strategy profiles left after iterated elimination of (pure)
strictly dominated strategies.
Generally, there are fewer pure Nash equilibria than pure strategy profiles that remain after the iterated elimination of strictly
dominated pure strategies.
As mentioned in Lecture DT2, this does not have to be the case for the iterated elimination of weakly dominated strategies.
An alternative definition considers the best response of a player.
A strategy s_i is a best response of player i against the strategy profile s_{−i} of the other players if it maximizes the payoff of
player i when the other players play their strategies in s_{−i}.
Then a strategy profile is a (pure) Nash equilibrium if and only if every player plays a best response against the others.
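The formal definition suggests a brute-force check: go through all strategy profiles and keep those from which no player gains by a unilateral deviation. The sketch below does this for a hypothetical two-player coordination game; the payoff numbers are assumptions chosen for illustration.

```python
from itertools import product

def pure_nash_equilibria(strategies, payoff):
    """strategies: one list of pure strategies per player.
    payoff: dict mapping each strategy profile to a tuple of payoffs."""
    equilibria = []
    for s in product(*strategies):
        def deviation_pays(i):
            # Does player i gain by switching unilaterally to another strategy?
            return any(
                payoff[s[:i] + (alt,) + s[i + 1:]][i] > payoff[s][i]
                for alt in strategies[i]
            )
        if not any(deviation_pays(i) for i in range(len(strategies))):
            equilibria.append(s)
    return equilibria

# Hypothetical coordination game with illustrative payoffs.
strategies = [["Bar", "Cinema"], ["Bar", "Cinema"]]
payoff = {
    ("Bar", "Bar"): (2, 1), ("Bar", "Cinema"): (0, 0),
    ("Cinema", "Bar"): (0, 0), ("Cinema", "Cinema"): (1, 2),
}
equilibria = pure_nash_equilibria(strategies, payoff)
```

The enumeration grows with the product of the strategy set sizes, so it is only meant for small games like the ones in these notes.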
Example
Best response of player Row:
R1 is the best response against C1
R1 is the best response against C2
R1 and R3 are best responses against C3
Best response of player Col:
C1 and C3 are best responses against R1
C2 and C3 are best responses against R2
C1 is the best response against R3
One (pure) Nash equilibrium: (R1,C3)
Some classic examples
We already saw one of the most famous games, the Prisoner's Dilemma, in Lecture DT2.
Another example: Coordination game
Both players prefer to do something together, but Row prefers Bar to Cinema, and Col prefers Cinema to Bar.
(Pure) Nash equilibria are (Bar, Bar) and (Cinema, Cinema)
Remark: A game like this is also known as ‘Battle of the Sexes’
Hawk Dove Game
The best outcome for a player is that he/she fights and the opponent does not. Then the ‘fighter’ wins.
The worst that can happen for both players is that both fight (disaster).
Losing because you do not fight while the other does is still better than both fighting.
(Pure) Nash equilibria are (Fight, Truce) and (Truce, Fight).
Remark: A game like this is also known as ‘Chicken’.
Mixed strategies
A pure Nash equilibrium does not have to be unique.
In other words, a game can have more than one pure Nash equilibrium.
Not every game has a pure Nash equilibrium, see ‘Matching Pennies’.
A mixed strategy for a player is a probability distribution over its pure strategies.
Every finite normal form game has at least one Nash equilibrium in mixed strategies.
Each player will choose a mixed strategy in response to the mixed strategies of all other players that maximizes his/her own
expected payoff.
The result is called a mixed Nash equilibrium.
Finding Mixed Nash Equilibria
We illustrate this with an example of a game with two players who each have two pure strategies.
Notation:
p: probability that player Row plays R1 (0 ≤ p ≤ 1)
q: probability that player Col plays C1 (0 ≤ q ≤ 1)
Then
1 − p: probability that player Row plays R2
1 − q: probability that player Col plays C2
Note that p describes a mixed strategy for player Row, and q describes a mixed strategy for player Col.
• The expected payoff u_1(p, q) of player Row (player 1) depends on p and q and is given by:
  u_1(p, q) = 3pq + p(1 − q) + 2(1 − p)q + 4(1 − p)(1 − q)
            = 3pq + p − pq + 2q − 2pq + 4 − 4p − 4q + 4pq
            = 4pq − 3p − 2q + 4
  Similarly, the expected payoff u_2(p, q) of player Col (player 2) depends on p and q and is given by:
  u_2(p, q) = 0pq + p(1 − q) + 2(1 − p)q + (1 − p)(1 − q)
            = 0pq + p − pq + 2q − 2pq + 1 − p − q + pq
            = −2pq + q + 1
• p = 0 (R2) is a best response for player 1
  if and only if u_1(0, q) ≥ u_1(p, q) for all 0 < p ≤ 1
  if and only if −2q + 4 ≥ 4pq − 3p − 2q + 4 for all 0 < p ≤ 1
  if and only if q ≤ 3/4
• p = 1 (R1) is a best response for player 1
  if and only if q ≥ 3/4
• Every p with 0 ≤ p ≤ 1 is a best response for player 1
  if and only if q = 3/4
• q = 0 (C2) is a best response for player 2
  if and only if u_2(p, 0) ≥ u_2(p, q) for all 0 < q ≤ 1
  if and only if 1 ≥ −2pq + q + 1 for all 0 < q ≤ 1
  if and only if p ≥ 1/2
• q = 1 (C1) is a best response for player 2
  if and only if p ≤ 1/2
• Every q with 0 ≤ q ≤ 1 is a best response for player 2
  if and only if p = 1/2
This gives the unique (mixed) Nash equilibrium: p = 1/2, q = 3/4
In the Nash equilibrium player 1 plays R1 and R2 both with probability ½,
while player 2 plays C1 with probability ¾ and C2 with probability ¼.
An alternative way to find mixed Nash equilibria is to use partial differentiation:
First order condition for maximizing u_1(p, q) with respect to p is
∂u_1(p, q)/∂p = 4q − 3 = 0, so q = 3/4
First order condition for maximizing u_2(p, q) with respect to q is
∂u_2(p, q)/∂q = −2p + 1 = 0, so p = 1/2
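The equilibrium can be double-checked numerically. The payoff tables below are read off from the coefficients of the expected payoff formulas (3, 1, 2, 4 for Row and 0, 1, 2, 1 for Col); the check uses the fact that in a mixed equilibrium each player is indifferent between the pure strategies he/she mixes over. Exact fractions avoid rounding issues. The variable names are illustrative.

```python
from fractions import Fraction as F

# Payoffs implied by the expected payoff formulas in the text.
ROW = {("R1", "C1"): 3, ("R1", "C2"): 1, ("R2", "C1"): 2, ("R2", "C2"): 4}
COL = {("R1", "C1"): 0, ("R1", "C2"): 1, ("R2", "C1"): 2, ("R2", "C2"): 1}

def expected(payoff, p, q):
    """Expected payoff when Row plays R1 with probability p
    and Col plays C1 with probability q."""
    return (p * q * payoff["R1", "C1"]
            + p * (1 - q) * payoff["R1", "C2"]
            + (1 - p) * q * payoff["R2", "C1"]
            + (1 - p) * (1 - q) * payoff["R2", "C2"])

p_star, q_star = F(1, 2), F(3, 4)

# Row is indifferent between R1 (p = 1) and R2 (p = 0) when q = 3/4.
row_r1 = expected(ROW, 1, q_star)
row_r2 = expected(ROW, 0, q_star)

# Col is indifferent between C1 (q = 1) and C2 (q = 0) when p = 1/2.
col_c1 = expected(COL, p_star, 1)
col_c2 = expected(COL, p_star, 0)
```

Both indifference conditions hold at p = 1/2 and q = 3/4, so neither player can gain by shifting probability between his/her pure strategies.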
A social choice problem is any decision problem faced by a group in which each individual is able to state ordinal preferences
over outcomes.
Social Choice: Social Choice and Welfare functions
A social choice problem consists of a set of individual decision makers, and for each decision maker a preference relation.
We distinguish two types of preference aggregation:
• A Social Choice Function assigns to every social choice problem one or more alternatives which can be considered as the
  alternatives that are chosen by society.
• A Social Welfare Function assigns to every social choice problem one preference relation which can be seen as the ‘social
  preference relation’.
Social choice problems
We consider a society with a finite set of agents or individuals who can choose among a finite set of alternatives.
The society should come to one collective decision (choice of one alternative) taking into account the preferences of the individual
agents.
Definition
Given a set of alternatives A = {a_1, …, a_m} and a finite set of agents N = {1, …, n}, a preference profile is a tuple G = (≽_i)_{i∈N} with
≽_i a preference relation on A, for i ∈ N.
A social choice problem is a triple (N, A, G) where
• N is a finite set of agents,
• A is a finite set of alternatives, and
• G = (≽_i)_{i∈N} is a preference profile.
Since we take the set of agents N as well as the set of alternatives A as given, we represent a social choice problem (N, A, G) just
by its preference profile G.
We make the following assumption.
Assumption: All individual preference relations are transitive, complete and asymmetric.
Consequently, we can denote the preference relation of agent i ∈ N by ≻_i, and a preference profile by (≻_i)_{i∈N}.
a ≻_i b means that agent i considers alternative a ‘better than’ alternative b.
Social choice functions
Given the preferences of the individual agents, there are two main questions:
• How do/should the agents choose one alternative together for the whole society? (Social choice function)
• Is it possible to derive a social preference relation reflecting the preferences of the society as a whole? (Social welfare
  function)
Note that both questions are relevant from a normative as well as a descriptive viewpoint.
Two viewpoints that have been taken in the literature are
• a cooperative viewpoint, where a benevolent dictator tries to do what is ‘best’ for society
• a strategic viewpoint, where, by voting, agents can strategically manipulate the voting outcome
A social choice function C assigns to every preference profile G a subset of the set of alternatives A, i.e. C(G) ⊆ A.
The set C(G) is called the social choice set associated with preference profile G.
Remark: Social choice functions are also called voting rules, or simply rules.
Social welfare functions
Instead of only making a (social) choice, we might want to know the full social preference relation for a social choice situation.
A social welfare function 𝐹 assigns a preference relation to every social choice situation.
Most social choice and social welfare functions fall into one of the following categories:
1. Scoring functions (Borda)
2. Majoritarian functions (Condorcet)
Ad 1. Scoring scf’s and swf’s assign scores (points) to the alternatives in every preference profile, and the ‘winner’ is the
alternative that has the highest sum of scores over all individual agents.
Ad 2. Majoritarian scf’s and swf’s derive from each social choice problem one preference relation (the social preference relation)
and based on this relation determine who is the ‘winner’.
We abbreviate:
• scf: social choice function
• swf: social welfare function
Plurality social choice function
• The plurality scf chooses from the alternatives by only considering what are the best alternatives for each agent. It chooses
  the alternatives that are best for the highest number of agents.
Example
Rank   Agent 1   Agent 2   Agent 3   Agent 4   Agent 5
1st    a         a         a         c         d
2nd    b         b         b         b         b
3rd    c         c         d         d         c
4th    d         d         c         a         a
plur(G) = (a, b, c, d) = (3, 0, 1, 1)
So, the plurality winner is alternative a.
Antiplurality social choice function
• The antiplurality scf chooses the alternatives by only considering what are the worst alternatives for each agent. It chooses
  the alternatives that are worst for the lowest number of agents.
Consider the previous example.
antiplur(G) = (a, b, c, d) = (2, 0, 1, 2)
So, the antiplurality winner is alternative b.
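Both rules only count positions, which makes them easy to sketch. The profile below is the five-agent example, with each ranking written as a string from best to worst; Agent 5's ranking is taken to be d, b, c, a, an assumption that is consistent with the stated plurality and antiplurality scores.

```python
from collections import Counter

ALTERNATIVES = "abcd"

# One ranking per agent, best alternative first.
profile = ["abcd", "abcd", "abdc", "cbda", "dbca"]

def plurality_scores(profile):
    """How often each alternative is ranked first."""
    firsts = Counter(ranking[0] for ranking in profile)
    return {x: firsts[x] for x in ALTERNATIVES}

def antiplurality_scores(profile):
    """How often each alternative is ranked last."""
    lasts = Counter(ranking[-1] for ranking in profile)
    return {x: lasts[x] for x in ALTERNATIVES}

plur = plurality_scores(profile)
antiplur = antiplurality_scores(profile)

# Plurality picks the most first places, antiplurality the fewest last places.
plurality_winner = max(plur, key=plur.get)
antiplurality_winner = min(antiplur, key=antiplur.get)
```

Note that `max` and `min` return a single alternative; with ties, a social choice function would return the whole set of tied alternatives.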
Borda social welfare function
• The Borda swf assigns points to all alternatives, and the ‘winner’ is the alternative with the highest number of points when
  summing over all the agents.
Consider the following example with three agents and five alternatives:
Agent 1: a ≻ b ≻ c ≻ d ≻ e
Agent 2: c ≻ d ≻ e ≻ b ≻ a
Agent 3: b ≻ c ≻ a ≻ d ≻ e
The Borda scores per agent, in the order (a, b, c, d, e), are
borda(≻_1) = (4, 3, 2, 1, 0)
borda(≻_2) = (0, 1, 4, 3, 2)
borda(≻_3) = (2, 4, 3, 1, 0)
Then, the total Borda scores are Borda(G) = (6, 8, 9, 5, 2)
The Borda winner is alternative c.
Condorcet social welfare function (Majority rule)
• The Condorcet scf chooses the set of alternatives that are ‘best elements’ in the social preference relation ≽_G.
• For the five-agent example above, the majority rule (Condorcet rule) gives social preference relation a ≽_G b ≽_G c ≽_G d.
Consider the following example with three agents:
Agent 1: d ≻ a ≻ b ≻ c
Agent 2: b ≻ c ≻ d ≻ a
Agent 3: c ≻ d ≻ a ≻ b
The majority relation is
a ≽_G b
b ≽_G c
c ≽_G a
d ≽_G a
d ≽_G b
c ≽_G d
So, there is no Condorcet winner.
Remark: The majority relation ≽_G need not be transitive.
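The absence of a Condorcet winner can be verified by pairwise majority comparison. The three rankings below (d, a, b, c for Agent 1; b, c, d, a for Agent 2; c, d, a, b for Agent 3) are an assumed reading of the example, chosen because it reproduces the listed majority relation; the function names are illustrative.

```python
from itertools import permutations

ALTERNATIVES = "abcd"

# One ranking per agent, best alternative first (assumed reading of the example).
profile = ["dabc", "bcda", "cdab"]

def majority_prefers(profile, x, y):
    """True if a strict majority of agents ranks x above y."""
    votes = sum(ranking.index(x) < ranking.index(y) for ranking in profile)
    return votes > len(profile) / 2

def condorcet_winner(profile):
    """The alternative that beats every other one pairwise, or None."""
    for x in ALTERNATIVES:
        if all(majority_prefers(profile, x, y) for y in ALTERNATIVES if y != x):
            return x
    return None

# All ordered pairs (x, y) such that a majority prefers x to y.
majority_relation = [(x, y) for x, y in permutations(ALTERNATIVES, 2)
                     if majority_prefers(profile, x, y)]
winner = condorcet_winner(profile)
```

The pairs (a, b), (b, c) and (c, a) form a cycle, which is exactly why the majority relation is not transitive here and no alternative beats all others.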
Social Choice: Properties of Social Choice and Social Welfare Functions, Sen’s liberalism, Restricted Domains
Properties of social welfare functions
Definition A group of people D (which may be a single-member group), which is part of the group of all individuals, is decisive
with respect to the ordered pair of social states (a, b) if and only if state a is socially preferred to b whenever everyone in D
prefers a to b. A group that is decisive with respect to all pairs of social states is called decisive.
Property
A social welfare function satisfies non-dictatorship if and only if no single individual is decisive.
Property
A social welfare function satisfies ordering if and only if for every possible combination of individual preference relations, the
social preference relation is complete, asymmetric and transitive.
Property
A social welfare function satisfies Pareto efficiency if and only if the group of all individuals in society is decisive.
Property
A social welfare function satisfies independence of irrelevant alternatives (IIA) if and only if all individuals having the same
preference between a and b in two different preference profiles G and G′ implies that society’s preference between a and b must
be the same in G and G′.
Arrow’s Impossibility Theorem
Theorem (Arrow’s impossibility theorem)
If there are at least three alternatives, then no Social Welfare Function satisfies independence of irrelevant alternatives, Pareto
efficiency, non-dictatorship and the ordering condition.
Remark: This is Theorem 13.1 in the book.
Corollary If there are at least three alternatives, then every Social Welfare Function that satisfies Pareto efficiency, the ordering
condition and IIA must be dictatorial.
Remark: There exist non-dictatorial social welfare functions that satisfy unanimity and IIA on restricted domains. For example, if
preferences are single-peaked, then the Condorcet social welfare function satisfies IIA and is Pareto efficient.
Sen on Liberalism
Property
A social welfare function satisfies minimal liberalism if and only if there are at least two individuals in society such that for each
of them there is at least one pair of alternatives with respect to which she is decisive, that is, there is a pair a and b such that
if she prefers a to b, then society prefers a to b (and society prefers b to a if she prefers b to a).
Theorem
There is no Social Welfare Function that satisfies minimal liberalism, Pareto efficiency and the ordering condition.
Remark: This is Theorem 13.2 in the book (Paradox of the Paretian Liberal).
Properties of social choice functions
Next, we discuss an impossibility result for social choice functions.
Assumption: We assume the social choice function to tb single valued, i.e. to every social choice problem it assings a unique
choice (the choice set is a singleton).
Remark: Note that this is a rather strong assumption. Moreover, it is an assumption on the ‘outcome’ (what is assigned by the
social choice function), and not an assumption on the preference profile.
Definition
Agent 𝑖 ∈ 𝑁 has a successful manipulation in preference profile 𝐺 = (≻ᵢ)_{𝑖∈𝑁} if, by ‘misreporting’ his/her preferences (i.e. stating
a preference relation ≻′ᵢ that differs from the real preference relation ≻ᵢ) while the other agents do not change their preference
relations, the social choice becomes better for agent 𝑖 according to the real preference relation ≻ᵢ.
Property
A social choice function is strategy-proof if for every preference profile there is no agent who has a successful manipulation.
So, a social choice function is strategy-proof if misreporting is never beneficial for any agent.
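For a small number of agents and alternatives, strategy-proofness can be checked by brute force. The following is a minimal sketch, not a definitive implementation: the function names, the restriction to three alternatives, and the alphabetical tie-breaking in the Borda rule are assumptions made for illustration only.

```python
from itertools import permutations, product

ALTERNATIVES = ("a", "b", "c")
RANKINGS = list(permutations(ALTERNATIVES))  # all strict rankings, best first


def borda_choice(profile):
    """Borda winner; ties are broken alphabetically (an assumption of this sketch)."""
    m = len(ALTERNATIVES)
    scores = {x: 0 for x in ALTERNATIVES}
    for ranking in profile:
        for position, x in enumerate(ranking):
            scores[x] += m - 1 - position
    return min(ALTERNATIVES, key=lambda x: (-scores[x], x))


def dictator_choice(profile):
    """Always choose the best alternative of agent 0."""
    return profile[0][0]


def find_manipulation(choice, n_agents):
    """Return (profile, agent, misreport) for a successful manipulation, or None."""
    for profile in product(RANKINGS, repeat=n_agents):
        truthful = choice(profile)
        for i in range(n_agents):
            for lie in RANKINGS:
                changed = profile[:i] + (lie,) + profile[i + 1:]
                outcome = choice(changed)
                # "Better" is judged by agent i's real ranking profile[i].
                if profile[i].index(outcome) < profile[i].index(truthful):
                    return profile, i, lie
    return None


print(find_manipulation(dictator_choice, 3))          # None: no agent can gain
print(find_manipulation(borda_choice, 3) is not None)  # True: Borda is manipulable
```

The search enumerates every profile and every possible misreport, so it is only feasible for very small cases.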
Property
A social choice function 𝐶 is dictatorial if there is an individual agent whose unique best element is always the social choice.
Formal definition:
A social choice function 𝐶 is dictatorial if there is an agent 𝑖 ∈ 𝑁 such that, for every preference profile 𝐺,
𝑎 ≻ᵢ 𝑏 for all 𝑏 ∈ 𝐴\{𝑎} ⇒ 𝐶(𝐺) = {𝑎}
Gibbard-Satterthwaite Impossibility Theorem
Theorem (Gibbard-Satterthwaite theorem)
If there are at least three alternatives, then there is no social choice function that is strategy-proof and non-dictatorial.
Phrased differently:
Corollary: If there are at least three alternatives, then every strategy-proof social choice function is dictatorial.
© 2020 Patrick Hup
Remark: As with Arrow’s impossibility theorem for social welfare functions, if we restrict the domain (i.e. we do not allow
all preference relations), then there might be strategy-proof social choice functions that are not dictatorial. An example of such
a domain is the domain of single-peaked preferences.
Remark: Dowding and van Hees (2007) argue that manipulation might be a virtue from a democratic perspective.
Strategic manipulation
Consider the following preference profile with 5 agents and 4 alternatives (each column lists an agent's ranking from best to worst):
Agent 1   Agent 2   Agent 3   Agent 4   Agent 5
a(3)      a(3)      a(3)      c(3)      d(3)
b(2)      b(2)      b(2)      b(2)      b(2)
c(1)      c(1)      d(1)      d(1)      c(1)
d(0)      d(0)      c(0)      a(0)      a(0)
(The numbers in parentheses are the Borda scores that each agent gives to the alternatives.)
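The total Borda scores of the alternatives (a, b, c, d) are 𝐵𝑜𝑟𝑑𝑎(𝐺) = (9, 10, 6, 5), so the Borda winner is alternative b.
Agent 1 has a successful manipulation: reporting the ranking a, c, d, b instead of the real ranking a, b, c, d.
Then the total Borda scores are 𝐵𝑜𝑟𝑑𝑎(𝐺′) = (9, 8, 7, 6), so the Borda winner becomes alternative a, which agent 1 prefers to b.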
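The Borda computation in this example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `borda_scores` and `borda_winners` are made up here, and each ranking is written as a string from best to worst.

```python
def borda_scores(profile, alternatives):
    """Total Borda score per alternative; each ranking lists the best alternative first."""
    m = len(alternatives)
    totals = {x: 0 for x in alternatives}
    for ranking in profile:
        for position, x in enumerate(ranking):
            totals[x] += m - 1 - position  # best gets m-1 points, worst gets 0
    return totals


def borda_winners(profile, alternatives):
    """All alternatives with the highest total Borda score."""
    totals = borda_scores(profile, alternatives)
    best = max(totals.values())
    return [x for x in alternatives if totals[x] == best]


alternatives = "abcd"
truthful = ["abcd", "abcd", "abdc", "cbda", "dbca"]  # agents 1 to 5
manipulated = ["acdb"] + truthful[1:]                # agent 1 misreports

print(borda_scores(truthful, alternatives))      # {'a': 9, 'b': 10, 'c': 6, 'd': 5}
print(borda_winners(truthful, alternatives))     # ['b']
print(borda_scores(manipulated, alternatives))   # {'a': 9, 'b': 8, 'c': 7, 'd': 6}
print(borda_winners(manipulated, alternatives))  # ['a']
```

By ranking b last, agent 1 lowers the score of b by two points, which is enough to make the preferred alternative a the winner.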
Single-peaked preferences
A preference relation is single-peaked if the alternatives can be put in a line (order) such that there is a best alternative 𝑎∗, and
every alternative 𝑏 that ‘lies between’ 𝑎 and 𝑎∗ is considered better than alternative 𝑎.
So, there is a unique best alternative, and if you ‘walk away’ from that alternative in either direction (left or right) your utility
decreases.
So, we consider a set of alternatives that can be ordered on a line, in other words let 𝐴 = {𝑎₁, 𝑎₂, …, 𝑎ₘ} with 𝑎ₖ ∈ ℕ (the natural
numbers) be such that 𝑎ₖ < 𝑎ₖ₊₁ for all 𝑘 ∈ {1, …, 𝑚 − 1}.
Example: 𝐴 = {1, 2, …, 𝑚}
Definition: Preference relation ≽ᵢ on 𝐴 is single-peaked if
1. there is an 𝑎∗ ∈ 𝐴 such that 𝑎∗ ≻ᵢ 𝑏 for all 𝑏 ∈ 𝐴\{𝑎∗}, and
2. for all 𝑎, 𝑏 ∈ 𝐴 it holds that:
if 𝑎 < 𝑏 < 𝑎∗ then 𝑏 ≻ᵢ 𝑎, and
if 𝑎 > 𝑏 > 𝑎∗ then 𝑏 ≻ᵢ 𝑎.
Remark: A single-peaked preference relation need not be complete: the definition does not say how two alternatives on opposite
sides of the peak 𝑎∗ compare.
Some examples of single-peaked preferences on 𝐴 = {1, 2, …, 100}:
• 𝑎 ≻ᵢ 𝑏 iff 𝑎 < 𝑏 (1 ≻ᵢ 2 ≻ᵢ 3 ≻ᵢ …)
• 𝑎 ≻ᵢ 𝑏 iff 𝑎 > 𝑏 (100 ≻ᵢ 99 ≻ᵢ 98 ≻ᵢ …)
• 𝑎 ≻ᵢ 𝑏 iff |𝑎 − 4| < |𝑏 − 4| (2 ≻ᵢ 1, 2 ≻ᵢ 7, 3 ≻ᵢ 2, …)
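Whether a given ranking is single-peaked with respect to the natural order of the alternatives can be tested mechanically. The sketch below assumes a strict and complete ranking given as a list from best to worst, with the alternatives being numbers on the line; the function name is hypothetical.

```python
def is_single_peaked(ranking):
    """ranking: numeric alternatives listed from best to worst (strict, complete)."""
    peak = ranking[0]  # the unique best alternative (condition 1)
    position = {x: r for r, x in enumerate(ranking)}  # smaller position = better
    # Alternatives on each side of the peak, ordered so that they approach the peak.
    left = sorted(x for x in ranking if x < peak)
    right = sorted((x for x in ranking if x > peak), reverse=True)
    for side in (left, right):
        # Condition 2: moving one step closer to the peak must be an improvement.
        for farther, closer in zip(side, side[1:]):
            if position[closer] > position[farther]:
                return False
    return True


print(is_single_peaked([1, 2, 3, 4, 5]))     # True, peak at 1
print(is_single_peaked([4, 3, 5, 2, 6, 1]))  # True, peak at 4
print(is_single_peaked([1, 3, 2]))           # False: 2 lies between 3 and the peak 1
```

Checking neighbouring alternatives on each side is enough, because the ranking is transitive.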
Theorem
If all preference relations ≻ᵢ, 𝑖 ∈ 𝑁, are single-peaked, then the majority relation ≽𝐺 is complete and transitive.
Corollary
If all preference relations ≻ᵢ, 𝑖 ∈ 𝑁, are single-peaked, then a Condorcet winner exists.
Theorem
If all preference relations ≻ᵢ, 𝑖 ∈ 𝑁, are single-peaked, then the Condorcet rule is strategy-proof.
Remarks: For the Condorcet rule only the peaks matter.
No scoring rule is strategy-proof.
The Condorcet winner is the median peak: the alternative 𝑎∗ such that at most half of the agents have their peak ‘to the left’ of 𝑎∗
and at most half of the agents have their peak ‘to the right’ of 𝑎∗. (Median voter)
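The median voter remark can be illustrated on a small profile. The following sketch assumes an odd number of agents with strict single-peaked rankings over 𝐴 = {1, …, 5}; the profile and the function names are invented for illustration.

```python
def median_peak(peaks):
    """Median of the agents' peaks; assumes an odd number of agents."""
    ordered = sorted(peaks)
    return ordered[len(ordered) // 2]


def condorcet_winner(rankings, alternatives):
    """Alternative that beats every other one in a pairwise majority vote, or None."""
    for x in alternatives:
        if all(
            sum(r.index(x) < r.index(y) for r in rankings) > len(rankings) / 2
            for y in alternatives
            if y != x
        ):
            return x
    return None


# Five single-peaked rankings (best first) with peaks 1, 2, 3, 4 and 5.
rankings = [
    [1, 2, 3, 4, 5],
    [2, 3, 1, 4, 5],
    [3, 4, 2, 5, 1],
    [4, 3, 5, 2, 1],
    [5, 4, 3, 2, 1],
]
alternatives = [1, 2, 3, 4, 5]

print(median_peak([r[0] for r in rankings]))     # 3
print(condorcet_winner(rankings, alternatives))  # 3
```

In this profile the pairwise majority winner coincides with the median of the peaks, as the remark states.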