The patrolling game
Siner Gokhan Yilmaz

Outline
• The explanation of the game
• The strategies of the players
• The bilevel model approach
• The iterative two-model approach

Players of the game
• The patrolling game is played between an attacker and a defender
• Turns are discrete, and the players take their actions simultaneously

Environment
• The game environment is represented by a graph G = (V, E)
• V is the set of vertices
• E is the set of edges
• We define a set of entrances U, which is a subset of V

Environment
• The values of the targets are represented by C_j^d and C_j^a for the defender and the attacker, respectively
• We denote the set of neighbors of each vertex j by N(j)

Strategies
• The attacker can observe the position of the defender
• The attacker also knows the full strategy of the defender
• The defender can only observe its own position

Strategies
• If the defender is at vertex i, it moves to one of its neighbors j with probability α_ij
• The attacker has two choices: it can move to a neighboring node or initiate an attack
• If the attacker decides to attack vertex j, it needs to stay there for m_j turns

Strategies
• When the attacker and the defender are at the same vertex, the defender detects the attacker with probability p_j
• In this case, the attacker gets payoff −r_c and the defender gets r_c
• If the attack is successful, the attacker gets payoff C_j^a and the defender gets −C_j^d

Strategies
• At the start of the game, the attacker is outside the graph and can enter at any time through one of the entrance vertices in U
• We denote the outside node by 'o' when defining the Markov decision process

Bilevel Model
• u_d(·) and u_a(·)
denote the utility functions of the defender and the attacker, respectively

Attacker's model: States
• While the game is in progress, a state is defined by the triple (i, j, m), where i is the defender's position, j is the attacker's position, and m is the remaining attack time
• There are also the end-game states Δ_c and Δ_j^s, which represent the capture state and a successful attack at vertex j, respectively

Attacker's model: Actions
• We define the set of actions available to the attacker in state s as B(s)
• μ_j represents moving to vertex j
• δ_j represents dwelling at vertex j

Attacker's model: Transitions
• We denote the transition probability from state s under action b to state s′ by P_{s,b,s′}

Attacker's model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is α_{ij′}
• When they move to the same vertex, we need to include the capture probability

Attacker's model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is α_{ij′}
• When the defender moves to the attacked vertex, we need to include the capture probability

Attacker's model: Transitions
• The attacker is one move away from a successful attack
• When the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, there is still a capture probability

Attacker's model: Rewards
• Each transition yields a reward r_{s,b,s′} for the attacker
• Rewards are nonzero only if s′ is one of the end-game states

Attacker's main model
• We can use the Bellman equation to define the main model for the attacker
• γ_s is the starting distribution over states
• v_s is the variable of the main model, representing the value of state s

Attacker's dual model
• We also need the dual of the problem, since we will use the KKT conditions
• The dual problem's variables are u_{s,b}, which represent state-action frequencies

Attacker's model: KKT conditions
• We can convert
the attacker's model into a set of constraints by using the KKT conditions
• In addition to primal feasibility and dual feasibility, we need to include the complementary slackness constraints

Main constraints
Dual constraints
Complementary slackness

Defender's objective
• We can use the state-action frequencies from the attacker's model to calculate the expected reward of the defender

Overall model
• The overall model can be rewritten by using the KKT conditions
• Any feasible solution to this problem is an optimal strategy for the attacker, given α

Experimental results
• The mathematical model is implemented in GAMS
• We tried several nonlinear solvers, but settled on the BARON solver

Experimental results: some remarks
• Works well on graphs with a small number of nodes (<10)
• With an increasing number of vertices, the model size grows polynomially
• For a large number of vertices, the problem becomes unsolvable in a reasonable amount of time, i.e., the optimality gap does not decrease

Experimental results: circular example
• Consists of six nodes connected in a cycle
• In this example, the rewards of all nodes are equal

Some remarks
• The problem is nonconvex
• Works well on graphs with a small number of nodes (<10)
• Computational costs increase on large graphs
• This makes solving the problem in a reasonable time infeasible

Alternative approach
• Just like the attacker's model, write down the defender's model
• The attacker's model uses the defender's policy as a parameter
• Solve the two problems iteratively
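As a concrete instance, the circular example above is easy to build in code. A minimal Python sketch; the common reward value of 1.0 and the symmetric defender policy are illustrative assumptions, since the slides only state that the six nodes form a cycle and all rewards are equal:

```python
# Build the six-node circular example: vertices 0..5 connected in a cycle.
# The reward value (1.0) and the uniform defender policy below are
# assumptions for illustration; the slides only fix the structure.
n = 6
V = list(range(n))
E = [(j, (j + 1) % n) for j in range(n)]          # cycle edges
N = {j: [(j - 1) % n, (j + 1) % n] for j in V}    # neighbors N(j)
C = {j: 1.0 for j in V}                           # equal target values

# A natural defender policy on the cycle: from each vertex, move to
# either neighbor with probability 1/2.
alpha = {i: {j: 1 / len(N[i]) for j in N[i]} for i in V}

# Sanity check: every row of alpha is a probability distribution.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in alpha.values())
```

Because all rewards are equal and the graph is vertex-transitive, this symmetric policy is a reasonable starting point for the solver.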
Defender's model: Actions
• We define the set of actions available to the defender in state s as A(s)
• μ_j represents moving to vertex j

Top-level model: Transitions
• We denote the transition probability from state s under defender action a and attacker action b to state s′ by P_{s,a,b,s′}
• We can use this top-level model to derive the defender's transitions

Top-level model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is 1
• When they move to the same vertex, we need to include the capture probability

Top-level model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is 1
• When the defender moves to the attacked vertex, we need to include the capture probability

Top-level model: Transitions
• The attacker is one move away from a successful attack
• When the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, we need to include the capture probability

Defender's model: Policy of the attacker
• β_s^j represents the attacker's probability of moving to vertex j when the game is in state s
• β_s^δ represents the attacker's dwell probability when the game is in state s
• The β values can be calculated from the attacker's state-action frequencies u

Defender's model: Transitions
• We denote the transition probability from state s under action a to state s′ by P_{s,a,s′}

Defender's model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is β_s^j′
• When they move to the same vertex, we need to include the capture probability

Defender's model: Transitions
• The attacker has not started its attack, or has just started it
• The players end up at the same vertex
• With probability p_j′ the attacker gets captured

Defender's model: Transitions
• The attacker has just started its attack
• When the defender ends up at a different vertex, the transition probability is β_s^δ
• When the defender ends up at
the same vertex, we include the detection probability

Defender's model: Transitions
• The attacker has already started its attack
• The attacker has only one option: continue its attack
• If the players end up at different vertices, the transition probability is one
• Otherwise, there are two end states, one of which is the capture state

Defender's model: Transitions
• The attacker is one step away from a successful attack
• The attacker has only one option: continue its attack
• This is very similar to the last three cases; the only difference is that the game ends for certain

Defender's model: Rewards
• Each transition yields a reward r_{s,a,s′} for the defender
• Rewards are nonzero only if s′ is one of the end-game states

Defender's main model
• We can use the Bellman equation to define the main model for the defender
• w_s is the variable of the main model, representing the value of state s
• The dual problem's variables are z_{s,a}, which represent state-action frequencies

Defender's dual model

Defender's model: Calculating α from the solution
• Just like calculating β, the α values can be calculated from state-action frequencies, z in this case

Method summary
• We solved the attacker's model and the defender's model iteratively, but observed no improvement in the policies from one iteration to the next

Experimental results
• The mathematical model is implemented in GAMS
• We used CPLEX as the linear solver

Experimental results: some remarks
• Each individual problem is easy to solve, even for large instances (tried up to ~30 nodes)
• Unfortunately, with the current models, the system does not converge to an equilibrium
• As a next step, alternative models can be tried

Questions?
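Backup: the iterative two-model scheme from the method summary can be sketched as a loop. The best_response_attacker and best_response_defender functions below are hypothetical placeholders for the actual GAMS/CPLEX solves of each player's model (they are trivial stubs here so the skeleton runs end to end), and the stopping test compares successive policies:

```python
# Skeleton of the iterative two-model approach: alternately solve the
# attacker's model given the defender's policy alpha, and the defender's
# model given the attacker's policy beta, until the policies stop changing.
# The best_response_* functions are stand-ins for the real LP solves.

def best_response_attacker(alpha):
    # Placeholder: would solve the attacker's MDP given alpha and
    # return beta computed from the state-action frequencies u.
    return {s: {"dwell": 1.0} for s in alpha}

def best_response_defender(beta):
    # Placeholder: would solve the defender's MDP given beta and
    # return alpha computed from the state-action frequencies z.
    return {s: {"patrol": 1.0} for s in beta}

def policy_distance(p, q):
    # Largest change in any action probability between two policies.
    return max(abs(p[s][a] - q[s][a]) for s in p for a in p[s])

def iterate(alpha0, max_iters=100, tol=1e-6):
    alpha = alpha0
    for k in range(max_iters):
        beta = best_response_attacker(alpha)
        new_alpha = best_response_defender(beta)
        if policy_distance(new_alpha, alpha) < tol:
            return new_alpha, k + 1  # policies stopped changing
        alpha = new_alpha
    return alpha, max_iters  # no convergence, as observed in practice
```

With the stubs above the loop converges immediately; with the real best-response solves, the slides report that the policies did not improve across iterations, which is exactly what this stopping test would detect.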