The patrolling game
Siner Gokhan Yilmaz
Outline
• The explanation of the game
• The strategies of players
• Bilinear model approach
• Iterative two-model approach
Players of the game
• The patrolling game is played between an attacker and a defender
• Turns are discrete, and the players act simultaneously
Environment
• The game environment is represented by a graph G = (V, E)
• V is the set of vertices
• E is the set of edges
• We define a set of entrances U, which is a subset of V
Environment
• The values of the targets are denoted C_i^d and C_i^a for the defender and the attacker, respectively
• We denote the set of neighbors of a vertex j by N(j)
Strategies
• The attacker can observe the position of the defender
• The attacker also knows the full strategy of the defender
• The defender can only observe its own position
Strategies
• If the defender is at vertex i, it moves to a neighbor j with probability α_ij
• The attacker has two choices: move to a neighboring vertex or initiate an attack
• If the attacker attacks vertex i, it must stay there for m_i turns
Strategies
• When the attacker and the defender are at the same vertex, the defender detects the attacker with probability d_i
• In that case the attacker receives payoff -ε^a and the defender receives ε^d
• If the attack is successful, the attacker receives payoff C_i^a and the defender receives -C_i^d (a turn-level sketch follows below)
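To make the turn structure concrete, here is a minimal Python sketch of one simultaneous turn, assuming the graph is given as a neighbor dictionary and replacing the attacker's move-or-attack decision with a coin flip purely for illustration (in the game the attacker plays a best response). All identifiers (play_turn, alpha, d, m, C_a, C_d, eps_a, eps_d) are illustrative, not from the slides.
```python
import random

def play_turn(N, i, j, attack_left, alpha, d, m, C_a, C_d, eps_a, eps_d):
    """One simultaneous turn.  i: defender vertex, j: attacker vertex,
    attack_left: remaining attack turns (None if no attack is in progress).
    Returns (new_i, j, attack_left, outcome); outcome is None while the game
    continues, otherwise a (defender payoff, attacker payoff) pair."""
    # Defender moves to a neighbor i' with probability alpha[i][i'].
    nbrs = N[i]
    new_i = random.choices(nbrs, weights=[alpha[i][k] for k in nbrs])[0]

    if attack_left is None:
        # Illustrative attacker behavior: move to a random neighbor or start an attack.
        if random.random() < 0.5:
            j = random.choice(N[j])
        else:
            attack_left = m[j]          # attacking vertex j takes m_j turns
    else:
        attack_left -= 1                # attack in progress

    # Detection: if both players are at the same vertex, capture with probability d.
    if new_i == j and random.random() < d[j]:
        return new_i, j, attack_left, (eps_d, -eps_a)
    if attack_left == 0:
        return new_i, j, attack_left, (-C_d[j], C_a[j])   # successful attack at j
    return new_i, j, attack_left, None
```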
Strategies
• At the start of the game, the attacker is outside the graph and may enter at any time through one of the entrance vertices in U
• We denote the outside node by 'o' when defining the Markov decision process
Bilevel Model
• f^d(·) and f^a(·) denote the utility functions of the defender and the attacker, respectively
Attacker’s model: States
• While the game is in progress, a state is a triple (i, j, m_j)
• There are also end-game states Δ^c and Δ^s_i, which represent capture and a successful attack at vertex i, respectively
Attacker’s model: Actions
• We denote the set of actions available to the attacker in state s by B(s)
• μ_j represents moving to vertex j
• δ_j represents dwelling at vertex j (a small sketch of the state and action sets follows)
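The following Python sketch shows one plausible encoding of these states and action sets; the representation of the triple (i, j, m_j), the use of remaining = 0 for "attack not started", and all identifiers are assumptions made for illustration, not the authors' code.
```python
from itertools import product

def build_states_and_actions(V, U, N, m):
    """Enumerate attacker-MDP states and the action sets B(s) described above (a sketch).
    V: list of vertices, U: entrances, N: neighbor dict, m: attack durations m_i."""
    # In-progress states (i, j, r): defender at i, attacker at j, r attack turns remaining
    # (r = 0 means the attack has not started); 'o' is the outside node.
    states = [(i, j, r) for i, j in product(V, V + ['o'])
              for r in range(0, m.get(j, 0) + 1)]
    states += ['CAPTURE'] + [('SUCCESS', t) for t in V]   # end-game states Δ^c, Δ^s_i

    def B(s):
        """Action set B(s): moves μ_k into vertex k, or the dwell action δ."""
        if s == 'CAPTURE' or s[0] == 'SUCCESS':
            return []                                     # end-game states are absorbing
        i, j, remaining = s
        if remaining > 0:                                 # attack started: keep dwelling
            return [('dwell', j)]
        reachable = U if j == 'o' else N[j]               # from outside, only entrances
        moves = [('move', k) for k in reachable]
        return moves + ([('dwell', j)] if j != 'o' else [])

    return states, B
```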
Attacker’s model: Transitions
• We denote the probability of transitioning from state s under action b to state s' by P_sbs'
Attacker’s model: Transitions
• The attacker has not started its attack and is still moving around
• When the defender and the attacker move to different vertices, the transition probability is α_ii'
• When they move to the same vertex, we also need to account for the capture probability
Attacker’s model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is α_ii'
• When the defender moves to the attacked vertex, we also need to account for the capture probability
Attacker’s model: Transitions
• The attacker is one move away from a successful attack
• If the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, there is still a capture probability (a sketch of the pre-attack case is given below)
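A sketch of these transition probabilities for the pre-attack case, assuming detection is resolved with probability d at the vertex where the two players meet; the later cases replace the move action μ_j' by the dwell action and route the "successful" mass to Δ^s_j instead.
```latex
P_{(i,j,m_j)\,\mu_{j'}\,s'} \;=\;
\begin{cases}
\alpha_{ii'} & s' = (i',j',m_{j'}),\ i' \neq j',\\[2pt]
\alpha_{ii'}\,(1-d_{j'}) & s' = (i',j',m_{j'}),\ i' = j',\\[2pt]
\alpha_{ij'}\, d_{j'} & s' = \Delta^{c}.
\end{cases}
```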
Attacker’s model: Rewards
• Each transition yields a reward R^a_sbs' for the attacker
• Rewards are nonzero only if s' is one of the end-game states (see the sketch below)
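Consistent with the payoffs described earlier, the attacker's rewards can be written as:
```latex
R^{a}_{sbs'} \;=\;
\begin{cases}
-\epsilon^{a} & s' = \Delta^{c},\\[2pt]
\;\;C^{a}_{i} & s' = \Delta^{s}_{i},\\[2pt]
\;\;0 & \text{otherwise.}
\end{cases}
```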
Attacker’s model: main
[Equation image: attacker's main model]
• We can use the Bellman equation to define the attacker's main model
• γ_s is the starting distribution over states
• v_s is the decision variable of the main model and represents the value of state s (a standard LP form is sketched below)
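The slide's formulation is not reproduced in this text, but the standard linear-programming form of such a Bellman model, which the description above matches, is:
```latex
\min_{v}\ \sum_{s}\gamma_{s}\,v_{s}
\qquad\text{s.t.}\qquad
v_{s}\ \ge\ \sum_{s'} P_{sbs'}\bigl(R^{a}_{sbs'} + v_{s'}\bigr)
\quad \forall\,s,\ \forall\,b\in B(s).
```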
Attacker's model: dual
[Equation image: attacker's main model]
• We also need the dual of this problem, since we will use its KKT conditions
• The dual variables u_sb represent state-action frequencies (a sketch of the dual follows)
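Its dual is the usual flow (state-action frequency) formulation; the variable name u_sb follows the frequencies referenced later in the slides.
```latex
\max_{u\ge 0}\ \sum_{s}\sum_{b\in B(s)} u_{sb}\sum_{s'} P_{sbs'}\,R^{a}_{sbs'}
\qquad\text{s.t.}\qquad
\sum_{b\in B(s)} u_{sb}\;-\;\sum_{s'}\sum_{b\in B(s')} P_{s'bs}\,u_{s'b}\;=\;\gamma_{s}
\quad \forall\,s.
```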
[Equation image: attacker's dual model]
Attacker’s model: KKT conditions
• We can convert the attacker's model into a set of constraints by using its KKT conditions
• In addition to primal feasibility and dual feasibility, we need to include the complementary-slackness constraints (written out below)
[Equation images: main constraints, dual constraints, complementary slackness]
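Written out, the three groups of constraints are (a sketch in the notation above):
```latex
\begin{aligned}
&\text{primal feasibility:} && v_{s}\ \ge\ \sum_{s'} P_{sbs'}\bigl(R^{a}_{sbs'}+v_{s'}\bigr) &&\forall\,s,\,b\\
&\text{dual feasibility:}   && u_{sb}\ \ge\ 0,\quad \sum_{b} u_{sb}-\sum_{s',\,b'} P_{s'b's}\,u_{s'b'}=\gamma_{s} &&\forall\,s\\
&\text{compl. slackness:}   && u_{sb}\Bigl(v_{s}-\sum_{s'} P_{sbs'}\bigl(R^{a}_{sbs'}+v_{s'}\bigr)\Bigr)=0 &&\forall\,s,\,b
\end{aligned}
```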
Defender’s objective
• We can use the state-action frequencies from the attacker's model to compute the defender's expected reward (one way to write this is sketched below)
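One plausible way to write this, using the attacker's frequencies u and a defender reward R^d defined on the same transitions (ε^d on capture, -C^d_i on a successful attack), is:
```latex
f^{d}(\alpha)\;=\;\sum_{s}\sum_{b\in B(s)} u_{sb}\sum_{s'} P_{sbs'}\,R^{d}_{sbs'}.
```
Since P_sbs' depends on α, this objective couples the defender's policy α with the attacker's frequencies u, i.e. it is bilinear in the two.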
Overall model
• The overall model can be rewritten by replacing the attacker's problem with its KKT conditions
• Any feasible solution of the resulting constraints corresponds to an optimal attacker strategy for the given α
Experimental results
• The mathematical model is implemented in GAMS
• We tried several nonlinear solvers and settled on the BARON solver
Experimental results: some remarks
• Works well on graphs with a small number of vertices (<10)
• The model size grows polynomially with the number of vertices
• For larger graphs the problem becomes unsolvable in a reasonable amount of time, i.e., the optimality gap does not decrease
Experimental results: circular example
• Consists of six vertices connected in a cycle
• In this example the rewards of all vertices are equal (see the snippet below)
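A minimal Python description of this instance; the vertex labels and the common reward value are illustrative.
```python
# Circular example: six vertices connected in a cycle, all with equal rewards.
V = list(range(6))
E = [(i, (i + 1) % 6) for i in range(6)]                 # edges of the cycle
N = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}    # neighbors of each vertex
C_a = {i: 1.0 for i in V}                                # equal attacker rewards
C_d = {i: 1.0 for i in V}                                # equal defender losses
```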
Some remarks
• The problem is nonconvex
• It works well on graphs with a small number of vertices (<10)
• Computational cost grows quickly on larger graphs
• This makes solving large instances in a reasonable time infeasible
Alternative approach
[Diagram: attacker's model and defender's model solved in alternation]
• Just like the attacker's model, we write down a defender's model
• The attacker's model uses the defender's policy as parameters (and the defender's model uses the attacker's policy, see below)
• We solve the two problems iteratively (a sketch of the loop follows)
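A sketch of that loop in Python; solve_attacker and solve_defender stand for the two linear models (they are not defined here, e.g. wrappers around GAMS/CPLEX calls), and the stopping rule is an assumption for illustration.
```python
def policy_distance(p, q):
    """Maximum absolute difference between two policies stored as nested dicts."""
    return max(abs(p[s][a] - q[s][a]) for s in p for a in p[s])

def iterate_two_models(solve_attacker, solve_defender, alpha0, max_iters=50, tol=1e-6):
    """Iterative two-model approach (sketch): the attacker's model is solved against
    the current defender policy alpha, then the defender's model against the
    resulting attacker policy beta.  Both solver functions are hypothetical."""
    alpha, beta = alpha0, None
    for _ in range(max_iters):
        beta = solve_attacker(alpha)          # attacker best-responds to alpha
        alpha_new = solve_defender(beta)      # defender best-responds to beta
        if policy_distance(alpha, alpha_new) < tol:
            return alpha_new, beta            # policies stopped changing
        alpha = alpha_new
    return alpha, beta                        # no convergence within max_iters
```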
Defender’s model: Actions
• We denote the set of actions available to the defender in state s by A(s)
• ν_j represents moving to vertex j
Top-level model: Transitions
• We denote the probability of transitioning from state s under actions a and b to state s' by P_sabs'
• We can use this top-level model to derive the defender's transition probabilities
Top-level model: Transitions
• The attacker has not started its attack and is still moving around
• When the defender and the attacker move to different vertices, the transition probability is 1
• When they move to the same vertex, we also need to account for the capture probability
Top-level model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is 1
• When the defender moves to the attacked vertex, we also need to account for the capture probability
Top-level model: Transitions
• The attacker is one move away from a successful attack
• If the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, we need to include the capture probability
Defender’s model: Policy of attacker
• β_sj represents the probability that the attacker moves to vertex j when the game is in state s
• β_sd represents the attacker's dwell probability when the game is in state s
• The β values can be calculated from the attacker's state-action frequencies u (see below)
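For states with positive visit frequency, the usual normalization recovers the policy from the frequencies (a sketch in the notation above):
```latex
\beta_{sj}\;=\;\frac{u_{s\,\mu_{j}}}{\sum_{b\in B(s)} u_{sb}},
\qquad
\beta_{sd}\;=\;\frac{u_{s\,\delta_{j}}}{\sum_{b\in B(s)} u_{sb}}.
```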
Defender’s model: Transitions
• We denote the probability of transitioning from state s under action a to state s' by P_sas'
Defender’s model: Transitions
• The attacker has not started its attack and is still moving around
• When the defender and the attacker end up at different vertices, the transition probability is β_sj'
• When they end up at the same vertex, we also need to account for the capture probability (a sketch of this case follows)
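A sketch of this pre-attack case, mirroring the attacker-side transitions earlier, with the defender choosing ν_i' and the attacker's move to j' weighted by β_sj':
```latex
P_{(i,j,m_j)\,\nu_{i'}\,s'} \;=\;
\begin{cases}
\beta_{sj'} & s' = (i',j',m_{j'}),\ j' \neq i',\\[2pt]
\beta_{sj'}\,(1-d_{j'}) & s' = (i',j',m_{j'}),\ j' = i',\\[2pt]
\beta_{si'}\, d_{i'} & s' = \Delta^{c}.
\end{cases}
```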
Defender’s model: Transitions
• The attacker has not started its attack, or has just started it
• The defender and the attacker end up at the same vertex
• With probability d_j' the attacker gets captured
Defender’s model: Transitions
• The attacker has just started its attack
• When the defender ends up at a different vertex, the transition probability is β_sd
• When the defender ends up at the same vertex, we also include the detection probability
Defender’s model: Transitions
• The attacker has already started its attack
• The attacker has only one option: continue the attack
• If the players end up at different vertices, the transition probability is one
• Otherwise there are two possible successor states, one of which is the capture state
Defender’s model: Transitions
• The attacker is one step away from a successful attack
• The attacker has only one option: continue the attack
• This is very similar to the last three cases; the only difference is that the game is certain to end
Defender’s model: Rewards
• Each transition yields a reward R^d_sas' for the defender
• Rewards are nonzero only if s' is one of the end-game states
Defender’s model: Main and dual
[Equation image: defender's main model]
• We can use the Bellman equation to define the defender's main model (sketched below)
• w_s is the decision variable of the main model and represents the value of state s
• The dual variables z_sa represent state-action frequencies
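The main model mirrors the attacker's (a sketch; the dual again takes the flow-balance form, with z_sa in place of u_sb):
```latex
\min_{w}\ \sum_{s}\gamma_{s}\,w_{s}
\qquad\text{s.t.}\qquad
w_{s}\ \ge\ \sum_{s'} P_{sas'}\bigl(R^{d}_{sas'} + w_{s'}\bigr)
\quad \forall\,s,\ \forall\,a\in A(s).
```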
[Equation image: defender's dual model]
Defender's model: Calculating α from the solution
• Just as β is computed from u, the α values can be calculated from the state-action frequencies, in this case z (one possible aggregation is sketched below)
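Since α is indexed by vertices while z is indexed by full states, one plausible aggregation is:
```latex
\alpha_{ij}\;=\;
\frac{\sum_{s:\ \text{defender at } i} z_{s\,\nu_{j}}}
     {\sum_{s:\ \text{defender at } i}\ \sum_{a\in A(s)} z_{sa}}.
```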
Method summary
[Diagram: iterating between the attacker's and defender's models]
• We solved the attacker's and defender's models iteratively, but observed no improvement in the policies from one iteration to the next
Experimental results
• The mathematical model is implemented in GAMS
• We used CPLEX as the linear solver
Experimental results: Some remarks
• Each individual problem is easy to solve, even for large instances (tried up to ~30 vertices)
• Unfortunately, with the current models, the system does not converge to an equilibrium
• As a next step, alternative models can be tried
Questions?