Lecture Materials - Soft Computing Lab.

Evolutionary Computation, 2009
Evolutionary Iterated
Prisoner’s Dilemma Game
H.-T. Kim
Outline
• Evolutionary Prisoner's Dilemma Game
– Prisoner's Dilemma Game
– Iterated Prisoner's Dilemma Game
– N-person Iterated Prisoner's Dilemma Game
– Robert Axelrod’s nIPD game
• Evolution of Iterated Prisoner's Dilemma Game Strategies in
Structured Demes Under Random Pairing in Game Playing
• Simulation on Worksite Interactions between Laborers and
Firms by using Multi-Agent based Evolutionary Computation
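
Prisoner's Dilemma Game
<An example of the prisoner's dilemma game>
One day a large research grant disappears from the SC Lab's shared bank account, and the professor singles out Seunghyun and Juwon as the suspects. There is no physical evidence, so the professor calls each of them into a separate room and says:
'If you readily admit to it, I will double this month's research stipend for you alone, as a reward for your honesty, and let you graduate early. But if you keep quiet and the other one confesses, only that person gets the reward, and you will be sent into a six-year Ph.D. program after your master's degree.'
However, the professor plans to give no reward at all if both confess. In this situation, what is the most rational choice for Seunghyun and Juwon?
Payoff table for Seunghyun (prisoner A in the classic version; the classic prison terms are given in parentheses)
Seunghyun (A) \ Juwon (B) | Confess (defect) | Stay silent (cooperate)
Confess (defect) | Professor's displeasure (4 years) | Early graduation, double stipend (0 years)
Stay silent (cooperate) | 6-year Ph.D. program (10 years) | Professor's suspicion (1 year)
Nash equilibrium: both confess (mutual defection)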
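
Prisoner's Dilemma Game
• Characteristics of the game
– Devised in the 1950s by Merrill Flood and Melvin Dresher
– Models two prisoners being interrogated by a detective
• The two players cannot communicate with each other
– An important model because many social phenomena take this form
• Studied in depth in game theory, economics, and political science
• Ex) arms races, price competition…
• Conditions of the game
– R : payoff for mutual cooperation
– P : payoff for mutual defection
– T : payoff when only I defect
– S : payoff when only I cooperate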
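The slide lists the four payoff symbols but the inequalities themselves did not survive extraction. As a hedged sketch, the check below uses the constraints usually quoted for the prisoner's dilemma, T > R > P > S and 2R > T + S; the function name and the sample values are assumptions for illustration, not taken from the slides.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Return True if the payoffs satisfy the usual prisoner's dilemma constraints."""
    ordering = T > R > P > S          # temptation > reward > punishment > sucker's payoff
    no_alternation = 2 * R > T + S    # mutual cooperation beats taking turns exploiting
    return ordering and no_alternation

print(is_prisoners_dilemma(5, 3, 1, 0))  # a commonly used set of values
print(is_prisoners_dilemma(5, 2, 1, 0))  # fails 2R > T + S
```

A check like this can be run on any payoff matrix before using it in an experiment.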
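
Iterated Prisoner's Dilemma Game
• The IPD game
– Two players play the Prisoner's Dilemma game repeatedly
– A player can punish the opponent for defecting
• Outcome of the IPD game
– The more the game is repeated, the more the players tend to cooperate
– The players must be able to learn
→ Through repetition, they learn that cooperation ultimately brings the larger gain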
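
N-person Iterated Prisoner's Dilemma Game
• Characteristics of the nIPD game
– The number of players grows from 2 to n
– Reflects real-world problems more broadly
– Evolutionary computation is mainly used to model the problem
• Robert Axelrod's nIPD game experiment
– In the nIPD setting, what is the most rational way for each individual to behave?
– Experimental procedure
• step1) Strategies hand-written by experts from various fields compete against each other
• step2) Each individual's strategy is evolved with evolutionary computation
→ Which strategy turned out to be dominant?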
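
Robert Axelrod’s nIPD game – Step1
• Experts from various academic fields were asked to submit programs that play a specific behavioral strategy in the IPD game
– Each program remembers its own and its opponent's actions (defect, cooperate) in the previous 3 games
– Its behavioral strategy is based on this memory
• ex) defect if the opponent defected twice; always defect; cooperate twice, then defect once
• The programs competed against each other to find the dominant strategy
– Format : round-robin tournament
– 63 programs competed in total
• Some programs used sophisticated techniques such as Markov models or Bayesian inference
• The final winner
– 'TIT FOR TAT', the simplest of the submitted strategies
– TIT FOR TAT: cooperate on the first move, then copy the opponent's previous move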
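A minimal sketch of a round-robin tournament with TIT FOR TAT and two trivial opponents follows. The payoff values (T=5, R=3, P=1, S=0), the 10-round match length, and all function names are assumptions for illustration; the slides do not specify them.

```python
from itertools import combinations

# Assumed payoffs: (my payoff, opponent payoff) for each pair of moves.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, opp_hist):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not opp_hist else opp_hist[-1]

def always_defect(my_hist, opp_hist):
    return "D"

def always_cooperate(my_hist, opp_hist):
    return "C"

def play_match(strat_a, strat_b, rounds=10):
    """Play an iterated game and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

def round_robin(strategies, rounds=10):
    """Every strategy meets every other strategy once; scores are summed."""
    totals = {name: 0 for name in strategies}
    for (na, sa), (nb, sb) in combinations(strategies.items(), 2):
        pa, pb = play_match(sa, sb, rounds)
        totals[na] += pa
        totals[nb] += pb
    return totals

print(round_robin({"TFT": tit_for_tat, "ALLD": always_defect, "ALLC": always_cooperate}))
```

With only these three entrants, always-defect collects the highest total, which illustrates that a tournament ranking depends on the field of competing strategies.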
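
Robert Axelrod’s nIPD game – Step2
• Tests whether evolutionary computation can successfully evolve strategies
• Encoding
– C : Cooperation
– D : Defect
– For the previous 1 game, there are 4 possible cases in total:
Previous game | CC | CD | DC | DD
Case | case 1 | case 2 | case 3 | case 4
– Behavioral strategy : specifies how to act (cooperate or defect) in each case
– ex) TIT FOR TAT
Previous game | CC | CD | DC | DD
Action | cooperate | defect | cooperate | defect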
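The one-game-memory encoding above can be sketched as a lookup table. The reading of each pair as (my previous move, opponent's previous move) is inferred from the TIT FOR TAT row; the names `CASES`, `decode`, and `next_move` are hypothetical.

```python
# Each case is (my previous move, opponent's previous move), in slide order.
CASES = ["CC", "CD", "DC", "DD"]

def decode(actions):
    """Map a 4-character string of C/D actions onto the four cases."""
    return dict(zip(CASES, actions))

# TIT FOR TAT: cooperate, defect, cooperate, defect for cases 1 to 4.
TFT = decode("CDCD")

def next_move(strategy, my_prev, opp_prev):
    """Look up the action prescribed for the previous game's outcome."""
    return strategy[my_prev + opp_prev]

print(next_move(TFT, "C", "D"))  # opponent defected last time
```

Because the strategy is just a string of four symbols, it can be handed directly to crossover and mutation operators.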

Robert Axelrod’s nIPD game – Step2
• Encoding - when the previous 3 games must be remembered
CC CC CC (case 1)
CC CC CD (case 2)
CC CC DC (case 3)
… (64 cases in total)
DD DD DC (case 63)
DD DD DD (case 64)
→ A strategy can therefore be encoded in 64 bits + 6 bits
• 64 bits : one-to-one mapping from each case to an action
• 6 bits : store the actions of the previous 3 games
• EX) CCDCDDDC … DC CCDDCD
– Number of possible strategies = 2^70
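A hedged sketch of how a three-game history can be turned into an index into the 64-entry action table. It assumes the case order shown on the slide (C before D, oldest game first) and that the first 64 symbols of the chromosome are the action table; the function names are hypothetical.

```python
def case_index(history):
    """history: three (my_move, opp_move) pairs, oldest game first.

    Returns 0..63, i.e. the slide's case number minus 1.
    """
    index = 0
    for my_move, opp_move in history:
        for move in (my_move, opp_move):
            index = index * 2 + (1 if move == "D" else 0)
    return index

def next_move(chromosome, history):
    """chromosome: 70 symbols of C/D; the first 64 form the action table."""
    return chromosome[case_index(history)]

# 70 binary positions give 2^70 distinct strategies.
NUM_STRATEGIES = 2 ** 70

print(case_index([("C", "C"), ("C", "C"), ("C", "D")]))  # index of case 2
```

The remaining 6 symbols would supply the history used before three real games have been played.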
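
Robert Axelrod’s nIPD game – Step2
• Other parameters
– Fitness : sum of payoffs
– Population : 20
– Generation : 50
• Results
– Most of the evolved strategies reciprocate cooperation and retaliate against defection
• Similar to TIT FOR TAT!
– Strategies that score higher than TIT FOR TAT were also found
• Course of the experiment
① In the early generations, cooperative individuals die out because other individuals do not reciprocate
② After about 10 to 20 generations, strategies appear that reciprocate cooperation and retaliate against defection
③ From then on, such strategies make up most of the population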

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 9, NO. 6, DECEMBER 2005
Evolution of Iterated Prisoner's Dilemma Game Strategies in Structured Demes Under Random Pairing in Game Playing
Hisao Ishibuchi, Member, IEEE, and Naoki Namikawa, Student Member, IEEE
Hee-Taek Kim
Outline
• Introduction
• Two neighborhood structures
– IPD game structure
– Mating strategy
• Simulation – Standard Pairing Scheme
• Random Pairing Scheme
• Simulation – Random Pairing Scheme
• Conclusion

Introduction
• Spatial IPD game
– Framework of structured demes
– Players are located in the cells of a two-dimensional grid-world
• Two neighborhood structures
① Interaction among players through the IPD game
② Interaction among players for mating strategies
Similar to the world of territorial animals or plants
• Random pairing scheme
– A player plays the game with a randomly chosen neighbor at every round
– Demonstrates the evolution of cooperative behavior under random pairing

Basic structure – Payoff Matrix
• Payoff Matrix of the game

Basic structure – Strategy Encoding
• Each player has a single strategy
• Every strategy is represented by a 5-bit binary string
– Example of strategy (TIT-FOR-TAT)
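The encoding figure is missing from the extracted slide, so the bit layout below is an assumption inferred from strings quoted on the result slides (TFT written as "10011", "11**1" described as recovering from (D, D), and "****0" described as breaking mutual cooperation). Under that reading, bit 0 is the first move and bits 1 to 4 are the replies to the previous (opponent, own) moves (D, D), (D, C), (C, D), (C, C), with 1 = cooperate and 0 = defect.

```python
# Assumed layout: position of the reply bit for each previous (opponent, me) pair.
REPLY_BIT = {("D", "D"): 1, ("D", "C"): 2, ("C", "D"): 3, ("C", "C"): 4}

def action(strategy, opp_prev=None, my_prev=None):
    """Return 'C' or 'D' for a 5-bit strategy string such as '10011'."""
    if opp_prev is None:
        bit = strategy[0]                      # first round: no history yet
    else:
        bit = strategy[REPLY_BIT[(opp_prev, my_prev)]]
    return "C" if bit == "1" else "D"

TFT = "10011"
print(action(TFT), action(TFT, "D", "C"), action(TFT, "C", "D"))
```

Under this assumed layout "10011" cooperates first and then repeats the opponent's last move, which matches the description of TIT FOR TAT.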

IPD game structure – World and Neighborhood
• Uses a 31 * 31 grid-world
– Each player is located in one cell
– 961 players exist
• Examples of neighborhood structure

IPD game structure - Game play and Fitness
• NIPD(i)
– The set of player i and its neighbors
• Game play
– The game is iterated for a pre-specified number of rounds (e.g., 100 rounds)
– Each player plays the game only against its neighbors
• Opponents are selected randomly
• Fitness
– Average payoff obtained from each round of the game

Mating strategy – formulation
• NGA(i)
– The set of player i and its neighbors
→ NIPD(i) = NGA(i) does not always hold
• Parents are selected from NGA(i)
– Using roulette wheel selection
• Selection probability of strategy j
– f(si) : fitness of player i with strategy si
– Fmin(NGA(i)) : minimum fitness among the players in NGA(i)
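The selection-probability formula itself did not survive extraction. One form consistent with the two definitions above is to make the probability of neighbor j proportional to f(sj) minus the minimum fitness in NGA(i); the sketch below assumes that form, and the uniform fallback for equal fitness values and all names are assumptions as well.

```python
import random

def selection_probabilities(fitness):
    """fitness: dict mapping each player id in NGA(i) to its fitness."""
    f_min = min(fitness.values())
    shifted = {j: f - f_min for j, f in fitness.items()}
    total = sum(shifted.values())
    if total == 0:
        # Assumption: if every neighbor is equally fit, choose uniformly.
        return {j: 1 / len(fitness) for j in fitness}
    return {j: s / total for j, s in shifted.items()}

def roulette_select(fitness, rng=random):
    """Draw one parent id with roulette wheel selection."""
    probs = selection_probabilities(fitness)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[j] for j in ids], k=1)[0]

print(selection_probabilities({0: 1.0, 1: 2.0, 2: 4.0}))
```

Subtracting the minimum means the weakest neighbor is never chosen unless all neighbors tie, which sharpens selection inside a small neighborhood.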

Mating strategy – crossover and mutation
• One-point crossover
• Bit-flip mutation
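A minimal sketch of the two operators on bit strings such as the 5-bit strategies. The per-bit mutation probability and the function names are assumptions; the slide's figures for these operators are not available.

```python
import random

def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def bit_flip_mutation(bits, p_mut, rng=random):
    """Flip each bit independently with probability p_mut."""
    return "".join(("1" if b == "0" else "0") if rng.random() < p_mut else b
                   for b in bits)

child_a, child_b = one_point_crossover("11111", "00000", random.Random(1))
print(child_a, child_b, bit_flip_mutation(child_a, 1 / 5))
```

The cut point is drawn from 1 to length minus 1 so that both parents always contribute at least one bit.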

Simulation
• Two kinds of simulation
① Simulate the two neighborhood structures with the standard pairing scheme
• Verify the effect of the two neighborhood structures on the evolution of cooperative behavior
② Simulate the two neighborhood structures with the random pairing scheme
• Examine the effect of the random pairing scheme on the evolution of cooperative behavior
• 961 spatially fixed players (31 * 31 grid-world)
• Mistake (noisy IPD model)
– A player chooses an action different from the one its strategy prescribes
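The noisy model can be sketched as a wrapper that flips the intended action with a given mistake probability. The function name and signature are hypothetical.

```python
import random

def noisy_action(intended, p_mistake, rng=random):
    """Return the intended action, flipped with probability p_mistake."""
    if rng.random() < p_mistake:
        return "D" if intended == "C" else "C"
    return intended

rng = random.Random(0)
moves = [noisy_action("C", 0.1, rng) for _ in range(20)]
print("".join(moves))
```

With a mistake probability of 0 the strategy is followed exactly, so the noise-free model is a special case.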

Standard Pairing Simulation – Parameter Setting
• Case of two neighborhood structures
• Parameter values
Parameter | Value
Mistake probability | 0, 0.001, 0.01, 0.1
Crossover probability | 1.0
Mutation probability | 1 / (5*961)
Termination of IPD game | 100 rounds
Termination of evolution | 1000 generations

Standard Pairing Simulation – Result
• NIPD has a significant effect on the evolution of cooperative behavior
• NGA has a much smaller effect than NIPD
• A small NIPD facilitates the evolution of cooperative behavior
<Mistake probability 0.1>

Standard Pairing Simulation – Result (2)
• Better results were obtained with smaller mistake probabilities
• Cooperative behavior evolved independently of the two neighborhood structures
<Mistake probability 0.01>
<Mistake probability 0.001>

Random Pairing Scheme
• Every player chooses its opponent randomly from NIPD at every round of the game
• The memory of an interaction with one neighbor may influence a player’s future action against another neighbor

Random Pairing Simulation – Result (1)
• The same parameter specifications were used as in the previous simulation
• Evolution of cooperative behavior is very difficult to achieve
<Mistake probability 0>
• Increased number of opponents → decreased probability to

Random Pairing Simulation – Result (2)
Parameter | Value
Mistake probability | 0
NIPD(i) | 3
NGA(i) | 5
• Strategies characterized by the genetic form “1***1”

Random Pairing Simulation – Result (3)
Parameter | Value
Mistake probability | 0
NIPD(i) | 5
NGA(i) | 5
• Strategies characterized by the genetic form “****0”
– The existence of strategies of this type prevents the consecutive occurrence of mutual cooperation

Random Pairing Simulation – Result (4)
Parameter | Value
Mistake probability | 0.01
NIPD(i) | 3
NGA(i) | 5
• Strategies characterized by the genetic form “11**1”
– Those strategies have the ability to recover from mutual defection (D, D)
– This ability seems to be important under a noisy situation

Random Pairing Simulation – Result (5)
Parameter | Value
Mistake probability | 0.01
NIPD(i) | 5
NGA(i) | 5
• The TFT strategy “10011” increased its percentage to almost 100%
• A higher average payoff was obtained from strategies of the form “11**1” than from the TFT strategy “10011”

Other Simulations

Conclusion
• Formulated a spatial IPD game using the concept of two neighborhood structures
① Interaction among players through the IPD game
② Mating strategies
– Computer simulation
• Use of a small interaction neighborhood facilitated the evolution of cooperative behavior
• Introduced a random pairing scheme with the two neighborhood structures
– Computer simulation
• Cooperative behavior evolved when the smallest interaction neighborhood was used
• Future work
– Explain the results of the random pairing scheme simulations
– Use a stochastic strategy represented by a string of real numbers between 0 and 1

Social Simulation Workshop at the International Joint Conference on Artificial Intelligence
Simulation on Worksite Interactions between Laborers and Firms by using Multi-Agent based Evolutionary Computation
Soft Computing Laboratory, Yonsei University
Hee-Taek Kim and Sung-Bae Cho
elsein@sclab.yonsei.ac.kr , sbcho@cs.yonsei.ac.kr

Motivation
<Figure: the firm wants low wages but high productivity, while the laborer wants high wages; the firm pays the wage and the laborer supplies labor>
• Laborers and firms form a strategic relationship
– What is the rational strategy from the position of a laborer or a firm?
• Can we bring about a mutually beneficial relationship between laborers and firms?
• General economic belief
• Laborers tend to cooperate with cooperative firms
• Firms tend to cooperate with cooperative laborers

Introduction of the Simulation Model
• Construct a computational work-site interaction model
– Multi-agent based approach
– Consists of worker agents and firm agents
– Adaptive agents are implemented using evolutionary computation
• Simulate the interaction between workers and firms
– Workers and firms interact with each other
– They form collaborative or competitive relationships

Evolutionary Computation
• Based on Darwinism
– “Survival of the fittest”
– Applies the theory of evolution to computation
• Widely used to model social phenomena
– Population of individuals, behavioral rules, selection and reproduction
– Each individual can adapt to a dynamic environment
• Basic evolution process
Population → Calculate Fitness → Selection → Reproduction (Crossover and mutation)
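The basic evolution process can be sketched as a small generational loop over bit strings. Everything concrete here is an assumption for illustration: the function name `evolve`, the default parameter values, the requirement that fitness be non-negative, and the toy fitness function (count of 1 bits) used in the usage line.

```python
import random

def evolve(fitness_fn, n_bits, pop_size=20, generations=50, p_mut=0.01, seed=0):
    """Generational GA over bit strings; fitness_fn must return a non-negative number."""
    rng = random.Random(seed)
    # Population: random bit strings.
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Calculate fitness (a tiny constant keeps the total weight positive).
        weights = [fitness_fn(ind) + 1e-9 for ind in population]
        next_population = []
        while len(next_population) < pop_size:
            # Selection: fitness-proportionate (roulette wheel).
            mother, father = rng.choices(population, weights=weights, k=2)
            # Reproduction: one-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n_bits - 1)
            child = mother[:cut] + father[cut:]
            child = "".join(("1" if b == "0" else "0") if rng.random() < p_mut else b
                            for b in child)
            next_population.append(child)
        population = next_population
    return max(population, key=fitness_fn)

best = evolve(lambda bits: bits.count("1"), n_bits=10)
print(best)
```

Replacing the toy fitness function with the payoff an agent earns in the simulated game turns the same loop into a strategy-evolution experiment.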

Simulation Process – Laborer’s Phase
• The interaction protocol between workers and firms can be divided into two phases
– Laborer’s phase and firm’s phase
<Laborer’s Phase>
• Laborers have to decide whether to resign from the firm or not
• Laborers have to decide whether to cooperate with or defect against their employer

Simulation Process – Firm’s Phase
• Firm’s phase
• Firms have to decide whether to cooperate with or defect against their laborers

Overall Process of Simulation

Simulation framework

Internal Attributes – Laborer
Type | Attribute | Description
int | ID | Unique identifier of this laborer
int | employedFirmID | Unique identifier of the firm that employs this laborer
double | asset | Total assets of this laborer
double | productivity | The productivity offered to the firm
double | livingCost | Living expenses per generation; subtracted from asset
int | state | Current state { WORKING, JOBLESS, FRESH, FAILED }
int | continues | Number of generations since employment
Array | chromosome | Array of integers representing the strategy of this laborer
Array | firmCareer | After resignation, a laborer is never employed by the same firm again
Queue | firmPastBehaviors | Cooperation history of the firm that employs this laborer
Queue | laborerPastBehaviors | Cooperation history of this laborer

Internal Attributes – Firm
Type | Attribute | Description
int | ID | Unique identifier of the firm
double | capital | Total capital of this firm; corresponds to the laborer’s asset
double | supportingCost | The cost of maintaining the firm
Array | chromosome | Array of integers representing the strategy of this firm
Array | myLaborers | Array of laborers who are employed by this firm
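The two attribute tables can be sketched as Python dataclasses. The field names follow the slides; the use of an Enum for `state`, the value -1 for "not employed", and the history queue length of 10 are assumptions made for this sketch.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum

class State(Enum):
    WORKING = 0
    JOBLESS = 1
    FRESH = 2
    FAILED = 3

def _history():
    # Assumption: bounded history queue of length 10.
    return deque(maxlen=10)

@dataclass
class Laborer:
    ID: int
    asset: float
    productivity: float
    livingCost: float
    chromosome: list
    employedFirmID: int = -1          # assumption: -1 means "not employed"
    state: State = State.FRESH
    continues: int = 0                # generations since employment
    firmCareer: list = field(default_factory=list)
    firmPastBehaviors: deque = field(default_factory=_history)
    laborerPastBehaviors: deque = field(default_factory=_history)

@dataclass
class Firm:
    ID: int
    capital: float
    supportingCost: float
    chromosome: list
    myLaborers: list = field(default_factory=list)

worker = Laborer(ID=1, asset=200, productivity=18, livingCost=10, chromosome=[1] * 10)
firm = Firm(ID=1, capital=2000, supportingCost=30, chromosome=[1] * 10)
firm.myLaborers.append(worker)
```

A bounded deque drops the oldest entry automatically, which is one simple way to keep only recent behavior in the history.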

Action of Agent
• Cooperation and defection
• Laborer
– Cooperation : High productivity (ProdH)
– Defection : Low productivity (ProdL)
– Resign : resign from the current firm
• Firm
– Cooperation : High wage (WageH)
– Defection : Low wage (WageL)
Payoffs are given as (Laborer, Firm)
Laborer \ Firm | Cooperation | Defection
Cooperation | (WageH, ProdH - WageH) | (WageL, ProdH - WageL)
Defection | (WageH, ProdL - WageH) | (WageL, ProdL - WageL)
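The payoff table can be written as a small function. The numeric values (WageH = 12, WageL = WageH/2, ProdH = 18, ProdL = ProdH/2) are the ones listed on the Experimental Design slide of this deck; the function and constant names are hypothetical.

```python
WAGE_H, PROD_H = 12, 18
WAGE_L, PROD_L = WAGE_H / 2, PROD_H / 2

def payoff(laborer_cooperates, firm_cooperates):
    """Return (laborer payoff, firm payoff) for one interaction."""
    prod = PROD_H if laborer_cooperates else PROD_L   # laborer's choice sets productivity
    wage = WAGE_H if firm_cooperates else WAGE_L      # firm's choice sets the wage
    return wage, prod - wage

for laborer_c in (True, False):
    for firm_c in (True, False):
        print(laborer_c, firm_c, payoff(laborer_c, firm_c))
```

Note that the laborer's payoff depends only on the firm's action, while the firm's payoff depends on both actions.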

Behavioral Strategy of Agent
• The behavioral strategy determines the current action of the agent
– Every individual has its own strategy
– All strategies evolve as the simulation progresses
Strategy of firm encoded on firm chromosome
If (laborerPastBehaviors.countCoperation == 5) then return COOPERATE
Cooperation count of opponent laborer | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Strategy to decide cooperation or defection | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1
0 : Defection
1 : Cooperation
Strategy of laborer encoded on laborer chromosome
Cooperation count of opponent firm | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Strategy to decide cooperation or defection | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0
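The chromosome lookup can be sketched as follows, using the two example chromosomes from the slide. Representing the history as a queue of 1 (cooperated) and 0 (defected), its length of 9, and clamping the count to the last gene are assumptions of this sketch.

```python
from collections import deque

# Example chromosomes from the slide; index = opponent's cooperation count.
FIRM_CHROMOSOME    = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
LABORER_CHROMOSOME = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]

def decide(chromosome, opponent_history):
    """Return 1 (cooperate) or 0 (defect) from the opponent's cooperation count."""
    count = sum(opponent_history)               # history holds 1 = cooperated, 0 = defected
    count = min(count, len(chromosome) - 1)     # assumption: clamp to the last gene
    return chromosome[count]

laborer_history = deque([1, 1, 0, 1, 1, 1, 0, 0, 0], maxlen=9)   # 5 cooperations
print(decide(FIRM_CHROMOSOME, laborer_history))
```

For a laborer history with five cooperations the firm chromosome above yields cooperation, in line with the rule quoted on the slide.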

Evolutionary Engine
• Fitness evaluation
– Firm
• The capital attribute is treated as the fitness of the firm
$\mathrm{Capital}(f_{t',i}) = \mathrm{Profit}(f_{t',i}) + \mathrm{Capital}(f_{t'-1,i}) = \sum_{t=0}^{t'} \mathrm{Profit}(f_{t,i})$
$= \sum_{t=0}^{t'} \Big( \sum_{j=1}^{m} \big( \mathrm{Prod}(l_{t,i,j}) - \mathrm{Wage}(f_{t,i}, l_{t,i,j}) \big) - \mathrm{SupportingCost} \Big)$
– Laborer
• The asset attribute is treated as the fitness of the laborer
$\mathrm{Asset}(l_{t',i,j}) = \sum_{t=0}^{t'} \mathrm{Profit}(l_{t,i,j}) = \sum_{t=0}^{t'} \big( \mathrm{Wage}(\hat{f}_{t,i}, l_{t,i,j}) - \mathrm{livingCost} \big)$
• Selection
– Roulette wheel selection is used
– Probability of selection
$P(l_{t,i,j}) = \dfrac{\mathrm{Asset}(l_{t,i,j})}{\sum_{i=0}^{k} \sum_{j=1}^{m} \mathrm{Asset}(l_{t,i,j})}$
$P(f_{t,i}) = \dfrac{\mathrm{Capital}(f_{t,i})}{\sum_{i=1}^{k} \mathrm{Capital}(f_{t,i})}$

Evolutionary Engine
• Crossover and mutation
– One-point crossover
– One-point bit mutation
• Elimination
– Eliminate incapable agents from the simulation
$\mathrm{Asset}(l_{t,i,j}) \le 0$ or $\mathrm{Capital}(f_{t,i}) \le 0$

Experimental Design
Group | Description | Value
Firm | Initial capital | 2000
Firm | Initial number of laborers per one firm | 10
Firm | Maximum number of laborers per one firm | 30
Firm | supportingCost | 30
Firm | WageH | 12
Firm | WageL | WageH/2
Laborer | Initial asset | 200
Laborer | livingCost | 10
Laborer | ProdH | 18
Laborer | ProdL | ProdH/2
Other | Initial number of firms | 30
Other | Maximum capacity of history queue | 10
Other | Mistake probability | 0.01

Description | Value
Initial population of firm | 30
Maximum population of firm | Infinite
Initial population of laborers | 330
Maximum population of laborers | Infinite
Increment rate of laborers population (Reproduce rate) | 0.005
Mutation rate | 0.005
Selection method | Roulette wheel
Crossover method | 1 point crossover

Payoffs as (Laborer, Firm)
Laborer \ Firm | Cooperation | Defection
Cooperation | (WageH, ProdH - WageH) = (12, 6) | (WageL, ProdH - WageL) = (6, 12)
Defection | (WageH, ProdL - WageH) = (12, -3) | (WageL, ProdL - WageL) = (6, 3)

Experimental Result

Experimental Result (2)

Conclusion – Second Experiment
• Resignation of laborers is forbidden
– Laborers cannot escape from a vicious firm
– Firms simply try to exploit faithful laborers
• This results in the breakdown of all agents because of the selfish behavior of the firms

Current Works
• Extend the 2*2 interaction model → continuous model based on linear algebra
– Asset/livingCost × X1 + RecentGivenPay × X2 + Continuous × X3 …
• Besides the previous actions of the opponent agent, many other factors can affect the current action of an agent
– Environmental information, the agent's own current state, the opponent's state, and so on…
• Test various policies on the simulation model and analyze their effects
과제
• nIPD game을 직접 구현해 봅시다^^
• 기본적인 실험방법은 Robert Axelrod’s nIPD game과 동일
– Population, Generation, Mutation Rate, Crossover Rate, payoff
matrix, 게임 방식 모두 자유
– 단 payoff matrix는 prisoner’s dilemma game의 조건을 만족해야 함
– http://www.aistudy.co.kr/biology/genetic/example_mitchell.htm참고
• 단, 3번의 이전게임이 아니라, 바로 직전 1번의 게임만 기억
– 따라서 전략 길이는 6bit가 됨
• 제출물
– 보고서, 소스코드
– 기한 : 9월 24
– 언어 : VS2008로 돌아가는 거 (C, C++, C# 등)
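
Report
• Overview of the experiment
• Experimental settings
– Population, Generation, Mutation Rate, Crossover Rate, payoff matrix, the way the game is played, etc.
• Results
– Which strategy turned out to be dominant?
– Graphs and tables could be used to show the evolutionary process clearly
– Screenshots
• Conclusion

Thank you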