The patrolling game
Siner Gokhan Yilmaz

Outline
• The explanation of the game
• The strategies of the players
• The bilevel model approach
• The iterative two-model approach

Players of the game
• The patrolling game is played between an attacker and a defender
• Turns are discrete, and the players take their actions simultaneously

Environment
• The game environment is represented by a graph G = (V, E)
• V is the set of vertices
• E is the set of edges
• We define a set of entrances U, which is a subset of V

Environment
• The values of the targets are represented by C_j^d and C_j^a for the defender and the attacker, respectively
• We denote the set of neighbors of each vertex j by N(j)

Strategies
• The attacker can observe the position of the defender
• The attacker also knows the full strategy of the defender
• The defender can only observe its own position

Strategies
• If the defender is at vertex i, it moves to one of its neighbors j with probability α_ij
• The attacker has two choices: it can move to a neighboring node or initiate an attack
• If the attacker decides to attack vertex j, it needs to stay there for m_j turns

Strategies
• When the attacker and the defender are at the same vertex, the defender detects the attacker with probability p_j
• In this case, the attacker gets payoff −r_c and the defender gets r_c
• If the attack is successful, the attacker gets payoff C_j^a and the defender gets −C_j^d

Strategies
• At the start of the game, the attacker is outside the graph and can enter at any time through one of the entrance vertices in U
• We denote the outside node by 'o' when defining the Markov decision process

Bilevel Model
• u_d(·) and u_a(·)
denote the utility functions of the defender and the attacker, respectively

Attacker's model: States
• While the game is in progress, a state is defined by the triple (i, j, m), where i is the defender's position, j is the attacker's position, and m is the remaining attack time
• There are also the end-game states Δ_c and Δ_j^s, which represent the capture state and a successful attack at vertex j, respectively

Attacker's model: Actions
• We define the set of actions available to the attacker in state s as B(s)
• μ_j represents moving to vertex j
• δ_j represents dwelling at vertex j

Attacker's model: Transitions
• We denote the transition probability from state s under action b to state s′ by P_{s,b,s′}

Attacker's model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is α_{ij′}
• When they move to the same vertex, we need to include the capture probability

Attacker's model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is α_{ij′}
• When the defender moves to the attacked vertex, we need to include the capture probability

Attacker's model: Transitions
• The attacker is one move away from a successful attack
• When the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, there is still a capture probability

Attacker's model: Rewards
• Each transition yields a reward r_{s,b,s′} for the attacker
• Rewards are nonzero only if s′ is one of the end-game states

Attacker's main model
• We can use the Bellman equation to define the main model for the attacker
• γ_s is the starting distribution over states
• v_s is the variable of the main model, representing the value of state s

Attacker's dual model
• We also need the dual of the problem, since we will use the KKT conditions
• The dual problem's variables are u_{s,b}, which represent state-action frequencies

Attacker's model: KKT conditions
• We can convert
the attacker's model into a set of constraints by using the KKT conditions
• In addition to primal feasibility and dual feasibility, we need to include the complementary slackness constraints

Main constraints
Dual constraints
Complementary slackness

Defender's objective
• We can use the state-action frequencies from the attacker's model to calculate the expected reward of the defender

Overall model
• The overall model can be rewritten by using the KKT conditions
• Any feasible solution to this problem is an optimal strategy for the attacker, given α

Experimental results
• The mathematical model is implemented in GAMS
• We tried several nonlinear solvers, but settled on the BARON solver

Experimental results: some remarks
• Works well on graphs with a small number of nodes (<10)
• With an increasing number of vertices, the model size grows polynomially
• For a large number of vertices, the problem becomes unsolvable in a reasonable amount of time, i.e., the optimality gap does not decrease

Experimental results: circular example
• Consists of six nodes connected in a cycle
• In this example, the rewards of all nodes are equal

Some remarks
• The problem is nonconvex
• Works well on graphs with a small number of nodes (<10)
• Computational costs increase on large graphs
• This makes solving the problem in a reasonable time infeasible

Alternative approach
• Just like the attacker's model, write down the defender's model
• The attacker's model uses the defender's policy as a parameter
• Solve the two problems iteratively
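As a concrete instance, the circular example above is easy to build in code. A minimal Python sketch; the common reward value of 1.0 and the symmetric defender policy are illustrative assumptions, since the slides only state that the six nodes form a cycle and all rewards are equal:

```python
# Build the six-node circular example: vertices 0..5 connected in a cycle.
# The reward value (1.0) and the uniform defender policy below are
# assumptions for illustration; the slides only fix the structure.
n = 6
V = list(range(n))
E = [(j, (j + 1) % n) for j in range(n)]          # cycle edges
N = {j: [(j - 1) % n, (j + 1) % n] for j in V}    # neighbors N(j)
C = {j: 1.0 for j in V}                           # equal target values

# A natural defender policy on the cycle: from each vertex, move to
# either neighbor with probability 1/2.
alpha = {i: {j: 1 / len(N[i]) for j in N[i]} for i in V}

# Sanity check: every row of alpha is a probability distribution.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in alpha.values())
```

Because all rewards are equal and the graph is vertex-transitive, this symmetric policy is a reasonable starting point for the solver.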
Defender's model: Actions
• We define the set of actions available to the defender in state s as A(s)
• μ_j represents moving to vertex j

Top-level model: Transitions
• We denote the transition probability from state s under defender action a and attacker action b to state s′ by P_{s,a,b,s′}
• We can use this top-level model to derive the defender's transitions

Top-level model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is 1
• When they move to the same vertex, we need to include the capture probability

Top-level model: Transitions
• The attacker has already started its attack
• When the defender moves to a vertex other than the attacked vertex, the transition probability is 1
• When the defender moves to the attacked vertex, we need to include the capture probability

Top-level model: Transitions
• The attacker is one move away from a successful attack
• When the defender moves to a different vertex, the attack succeeds
• If the defender moves to the same vertex, we need to include the capture probability

Defender's model: Policy of the attacker
• β_s^j represents the attacker's probability of moving to vertex j when the game is in state s
• β_s^δ represents the attacker's dwell probability when the game is in state s
• The β values can be calculated from the attacker's state-action frequencies u

Defender's model: Transitions
• We denote the transition probability from state s under action a to state s′ by P_{s,a,s′}

Defender's model: Transitions
• The attacker has not yet started its attack and is still moving around
• When the players move to different vertices, the transition probability is β_s^j′
• When they move to the same vertex, we need to include the capture probability

Defender's model: Transitions
• The attacker has not started its attack, or has just started it
• The players end up at the same vertex
• With probability p_j′ the attacker gets captured

Defender's model: Transitions
• The attacker has just started its attack
• When the defender ends up at a different vertex, the transition probability is β_s^δ
• When the defender ends up at
the same vertex, we include the detection probability

Defender's model: Transitions
• The attacker has already started its attack
• The attacker has only one option: continue its attack
• If the players end up at different vertices, the transition probability is one
• Otherwise, there are two end states, one of which is the capture state

Defender's model: Transitions
• The attacker is one step away from a successful attack
• The attacker has only one option: continue its attack
• This is very similar to the last three cases; the only difference is that the game ends for certain

Defender's model: Rewards
• Each transition yields a reward r_{s,a,s′} for the defender
• Rewards are nonzero only if s′ is one of the end-game states

Defender's main model
• We can use the Bellman equation to define the main model for the defender
• w_s is the variable of the main model, representing the value of state s
• The dual problem's variables are z_{s,a}, which represent state-action frequencies

Defender's dual model

Defender's model: Calculating α from the solution
• Just like calculating β, the α values can be calculated from state-action frequencies, z in this case

Method summary
• We solved the attacker's model and the defender's model iteratively, but observed no improvement in the policies from one iteration to the next

Experimental results
• The mathematical model is implemented in GAMS
• We used CPLEX as the linear solver

Experimental results: some remarks
• Each individual problem is easy to solve, even for large instances (tried up to ~30 nodes)
• Unfortunately, with the current models, the system does not converge to an equilibrium
• As a next step, alternative models can be tried

Questions?
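Backup: the iterative two-model scheme from the method summary can be sketched as a loop. The best_response_attacker and best_response_defender functions below are hypothetical placeholders for the actual GAMS/CPLEX solves of each player's model (they are trivial stubs here so the skeleton runs end to end), and the stopping test compares successive policies:

```python
# Skeleton of the iterative two-model approach: alternately solve the
# attacker's model given the defender's policy alpha, and the defender's
# model given the attacker's policy beta, until the policies stop changing.
# The best_response_* functions are stand-ins for the real LP solves.

def best_response_attacker(alpha):
    # Placeholder: would solve the attacker's MDP given alpha and
    # return beta computed from the state-action frequencies u.
    return {s: {"dwell": 1.0} for s in alpha}

def best_response_defender(beta):
    # Placeholder: would solve the defender's MDP given beta and
    # return alpha computed from the state-action frequencies z.
    return {s: {"patrol": 1.0} for s in beta}

def policy_distance(p, q):
    # Largest change in any action probability between two policies.
    return max(abs(p[s][a] - q[s][a]) for s in p for a in p[s])

def iterate(alpha0, max_iters=100, tol=1e-6):
    alpha = alpha0
    for k in range(max_iters):
        beta = best_response_attacker(alpha)
        new_alpha = best_response_defender(beta)
        if policy_distance(new_alpha, alpha) < tol:
            return new_alpha, k + 1  # policies stopped changing
        alpha = new_alpha
    return alpha, max_iters  # no convergence, as observed in practice
```

With the stubs above the loop converges immediately; with the real best-response solves, the slides report that the policies did not improve across iterations, which is exactly what this stopping test would detect.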