Cow-Path Games: Tactical Strategies to Search for
Scarce Resources
by
Kevin Spieser
Submitted to the Department of Aeronautics and Astronautics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
October 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Signature redacted
Author ..........................................................
        Department of Aeronautics and Astronautics
        October 7, 2014

Signature redacted
Certified by ....................................................
        Emilio Frazzoli
        Professor of Aeronautics and Astronautics
        Thesis Supervisor

Signature redacted
Certified by ....................................................
        Patrick Jaillet
        Professor of Electrical Engineering
        Committee Member

Signature redacted
Certified by ....................................................
        Hamsa Balakrishnan
        Associate Professor of Aeronautics and Astronautics
        Committee Member

Signature redacted
Accepted by .....................................................
        Paulo C. Lozano
        Associate Professor of Aeronautics and Astronautics
        Chair, Graduate Program Committee
Cow-Path Games: Tactical Strategies to Search for Scarce
Resources
by
Kevin Spieser
Submitted to the Department of Aeronautics and Astronautics
on October 7, 2014, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Abstract
This thesis investigates search scenarios in which multiple mobile, self-interested
agents, cows in our case, compete to capture targets.
The problems considered in this thesis address search strategies that reflect (i)
the need to efficiently search for targets given a prior on their location, and (ii)
an awareness that the environment in which searching takes place contains other
self-interested agents. Surprisingly, problems that feature these elements are largely
under-represented in the literature. Granted, the scenarios of interest inherit the
challenges and complexities of search theory and game theory alike. Undeterred, this
thesis makes a contribution by considering competitive search problems that feature
a modest number of agents and take place in simple environments. These restrictions
permit an in-depth analysis of the decision-making involved, while preserving interesting options for strategic play. In studying these problems, we report a number of
fundamental competitive search game results and, in so doing, begin to populate a
toolbox of techniques and results useful for tackling more elaborate scenarios.
The thesis begins by introducing a collection of problems that fit within the competitive search game framework. We use the example of taxi systems, in which drivers
compete to find passengers and garner fares, as a motivational example throughout.
Owing to connections with a well-known problem, called the Cow-Path Problem, the
agents of interest, which could represent taxis or robots depending on the scenario,
will be referred to as cows. To begin, we first consider a one-sided search problem in
which a hungry cow, left to her own devices, tries to efficiently find a patch of clover
located on a ring. Subsequently, we consider a game in which two cows, guided only
by limited prior information, compete to capture a target. We begin by considering a
version in which each cow can turn at most once and show this game admits an equilibrium. A dynamic-programming-based approach is then used to extend the result
to games featuring at most a finite number of turns. Subsequent chapters consider
games that add one or more elements to this basic construct. We consider games
where one cow has additional information on the target's location, and games where
targets arrive dynamically. For a number of these variants, we characterize equilibrium search strategies. In settings where this proves overly difficult, we characterize
search strategies that provide performance within a known factor of the utility that
would be achieved in an equilibrium.
The thesis closes by highlighting the key ideas discussed and outlining directions
of future research.
Thesis Supervisor: Emilio Frazzoli
Title: Professor of Aeronautics and Astronautics
Committee Member: Patrick Jaillet
Title: Professor of Electrical Engineering
Committee Member: Hamsa Balakrishnan
Title: Associate Professor of Aeronautics and Astronautics
Acknowledgments
This thesis has been a long time in the making. As with many lengthy endeavors, the
road has not always been smooth. However, it is also true that, looking back, I am
grateful for the experience, the knowledge gained, the doors that have been opened,
and the many acquaintances and friends made along the way. Of course, I am also
very appreciative of the frequent encouragement and timely distractions provided by
those that have seen me through this degree.
My advisor, Emilio Frazzoli, afforded me tremendous freedom to pursue a wide
range of research topics throughout my studies. This flexibility to think freely across
a breadth of problems not only kept my work engaging, but also made me a more
independent, well-rounded, and altogether better researcher. I must also point out
the opportunities I had to visit NASA Ames in California, SMART in Singapore,
as well as the various venues at which I have been fortunate enough to present my
research.
He helped make all of these ventures possible. Finally, as I got off to somewhat of
a rocky start at MIT, I owe him a special thanks for sticking with me.
The remaining members of my thesis committee, Professors Patrick Jaillet and
Hamsa Balakrishnan, have provided thoughtful commentary and a fresh perspective
on my research during our meetings and the writing of this document. I am grateful
for their time and feedback.
Lastly, I have seen many fellow lab mates come and go in my time at MIT. I
have taken classes and collaborated on research projects with a number of these
individuals. Their assistance with solving homework problems, studying for exams,
marking exams, writing papers, re-writing papers, and brainstorming ideas is much
appreciated. These interactions have been one of the defining features of my graduate
school experience. I hope that, on the whole, they have found my contributions to
these efforts as insightful and formative as I have found theirs. Thank you!
Sincerely,
Kevin Spieser
This thesis is dedicated to Briggs and my parents.
No cows were harmed in the writing of this thesis.
Contents

1  Introduction                                                           21
   1.1  Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . .  22
   1.2  Contributions . . . . . . . . . . . . . . . . . . . . . . . . . .  27
   1.3  Organization  . . . . . . . . . . . . . . . . . . . . . . . . . .  28

2  Background Material                                                    31
   2.1  Search Theory: An Introduction  . . . . . . . . . . . . . . . . .  32
   2.2  Probabilistic Search  . . . . . . . . . . . . . . . . . . . . . .  34
        2.2.1  Search with an imperfect sensor  . . . . . . . . . . . . .  35
        2.2.2  Search with a perfect sensor . . . . . . . . . . . . . . .  37
   2.3  Pursuit-evasion games . . . . . . . . . . . . . . . . . . . . . .  39
   2.4  Persistent planning problems  . . . . . . . . . . . . . . . . . .  42

3  Mathematical Preliminaries                                             45
   3.1  Game theory . . . . . . . . . . . . . . . . . . . . . . . . . . .  45

4  The Cow-Path Ring Problem                                              51
   4.1  Introducing The Cow-Path Ring Problem . . . . . . . . . . . . . .  51
   4.2  CPRP Notation and Terminology . . . . . . . . . . . . . . . . . .  53
   4.3  The number of turns in the CPRP . . . . . . . . . . . . . . . . .  58
   4.4  An iterative algorithm for s*. . . . . . . . . . . . . . . . . .  59
   4.5  A direct algorithm for finding s* . . . . . . . . . . . . . . . .  64
   4.6  Summary of the CPRP . . . . . . . . . . . . . . . . . . . . . . .  64

5  The Cow-Path Ring Game                                                 67
   5.1  Adding a second cow to the ring . . . . . . . . . . . . . . . . .  67
   5.2  A model for informed cows . . . . . . . . . . . . . . . . . . . .  68
   5.3  Defining the Cow-Path Ring Game . . . . . . . . . . . . . . . . .  70
   5.4  A remark about Cow-Path games on the line . . . . . . . . . . . .  71
   5.5  CPRG-specific notation and terminology  . . . . . . . . . . . . .  72
   5.6  Search strategies in the CPRG . . . . . . . . . . . . . . . . . .  74
   5.7  The one-turn, two-cow CPRG  . . . . . . . . . . . . . . . . . . .  74
   5.8  1T-CPRG: computational considerations . . . . . . . . . . . . . .  83
   5.9  The 1T-CPRG for different cow speeds  . . . . . . . . . . . . . .  84
   5.10 Finite-turn CPRGs . . . . . . . . . . . . . . . . . . . . . . . .  84
   5.11 Summary of the CPRG . . . . . . . . . . . . . . . . . . . . . . .  87

6  Games with Asymmetric Information: Life as a Cow Gets Harder           91
   6.1  Searching with asymmetric information: a motivating example . . .  92
   6.2  Supplementary notation and terminology  . . . . . . . . . . . . .  93
   6.3  Information models for situational awareness  . . . . . . . . . .  93
   6.4  Behavioral models for asymmetric games  . . . . . . . . . . . . .  95
   6.5  A bound on the maximum number of turning points in the CPRG . . .  95
   6.6  CPRGs with asymmetric information . . . . . . . . . . . . . . . .  99
   6.7  AI-CPRGs with perfect knowledge . . . . . . . . . . . . . . . . .  99
   6.8  Socially Optimal Resource Gathering . . . . . . . . . . . . . . . 105
   6.9  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

7  Dynamic Cow-Path Games: Search Strategies for a Changing World        113
   7.1  A motivation for dynamic environments . . . . . . . . . . . . . . 114
   7.2  Dynamic Cow-Path Games with target transport requirements . . . . 115
   7.3  Greedy search strategies for the DE-CPRG  . . . . . . . . . . . . 118
   7.4  Equilibria utilities of cows in the DE-CPRG . . . . . . . . . . . 120
   7.5  An aggregate worst-case analysis of greedy searching  . . . . . . 123
   7.6  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

8  Conclusions and Future Directions                                     137
   Summary and Future Directions
List of Figures

1-1  Depiction of real-world scenarios where agents compete, against one
     another, to capture targets given limited information on the location
     of targets. (a) Snapshot of taxi operations in Manhattan. In busy
     urban cities, the operation of taxi drivers trying to find passengers
     can be represented as a competitive search game. (b) A sunken
     treasure ship wrecked along a coral reef. The exploits that result
     from two rival recovery boats each trying to find the ship using a
     crude sonar map of the area constitutes a competitive search game.
     (c) A Kittyhawk P-40 that crash-landed in the Sahara desert during
     World War II. In the event rescue and enemy forces had some idea
     of the aircraft's location, e.g., from (intercepted) radio
     communications, and had each party launched a recovery operation,
     the resulting race to locate the plane would be a competitive search
     game. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  26

2-1  A partial taxonomy of select sub-disciplines within the field of
     search theory. It is worth reinforcing that the families of games
     represented are only a relevant sampling of select research areas in
     the field, and not an exhaustive listing. The box shaded in blue
     represents agent-vs-agent or competitive search games, the class of
     problems that will be the focus of this thesis. The location of CSGs
     in the tree indicates these problems share fundamental attributes
     with probabilistic search problems and pursuit-evasion search games.
     The image was inspired by similar figures reported in [17], [33]. .  34

2-2  Illustration of the key features of the stochastic or average-case
     Cow-Path Problem. Starting from the origin, the cow explores the
     real line in search of clover. A hypothetical search plan is shown
     in gray. In the instance depicted, the cow makes four turns before
     finding the target at the point marked with a red exclamation mark.  38

4-1  A visualization of the Cow-Path Ring Problem. The target density,
     f_T, as a function of radial position q, is shown in blue. In the
     instance depicted, the target, T, is located on the North-West
     portion of R. In the search plan shown, the cow (yellow triangle)
     travels in the ccw direction toward q1, where, having not found T,
     she reverses direction, and travels in the cw direction toward q2.
     Upon reaching q2, having still not found T, she again reverses
     direction and continues searching in the ccw direction until
     ultimately finding T at the site indicated with a red exclamation
     mark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  55

5-1  A visualization of the Cow-Path Line Game. The target density, f_T,
     is shown in blue. The unique equilibrium search strategy of each
     cow, s*, is indicated by a directed gray line. Under s*, each cow
     heads toward the other and, just before meeting, reverses direction
     and visits any previously unexplored territory.  . . . . . . . . .  72

5-2  An instance of the CPRG illustrating the initial positions and
     initial headings of cows C1 and C2. The trajectories of both cows,
     right up to the point of capture, are shown in dark gray. The target
     density f_T achieves a global maximum in [-π/4, 0]. In the instance
     shown, T is located along the North-West portion of R. The site at
     which T is found, in this case by C2, is indicated with a red
     exclamation mark.  . . . . . . . . . . . . . . . . . . . . . . . .  73

5-3  A diagram showing associations between families of finite-turn
     CPRGs. The node labelled with the pair (i, j) denotes the family of
     games in which C1 and C2 may turn up to i and j times, respectively.
     The numbers above and beside the arrows indicate which cow turns to
     bring about the indicated transition. The nodes representing
     base-case games, for which equilibria strategies may be found using
     the methods discussed in previous sections, are colored in red. The
     nodes representing all other games are colored in gray. The arrows
     indicate how one family of games reduces to a simpler family of
     games when a cow turns. For example, the (2,2)-CPRG becomes an
     instance of the (1,2)-CPRG when C1 turns, and an instance of the
     (2,1)-CPRG when C2 turns.  . . . . . . . . . . . . . . . . . . . .  88

6-1  Visualization of the notation used for describing subregions of R
     and one point relative to another on R. Due to the circular topology
     of R, there is flexibility in the notational system. For example,
     [q1, q2]_cw and [q2, q1]_ccw refer to the same arc of R. Similarly,
     (q3 + x)_cw and (q3 + 2π - x)_ccw refer to the same point on R. . .  94

6-2  Initial positions, q_i(0); initial headings, φ_i(0); and target
     priors, f_T^i, of C1 and C2 for an instance of an AI-CPRG.
     (a) f_T^1, shown in blue, has local maxima along the South-East and
     North-West regions of R. (b) f_T^2, shown in green, is more evenly
     distributed and contains three modest peaks along R. For q in R such
     that f_T^1(q) ≠ f_T^2(q), C1 and C2 have different valuations for
     visiting q first.  . . . . . . . . . . . . . . . . . . . . . . . . 100

6-3  An instance of an AI-CPRG. The cows (depicted as cars) C1 and C2 are
     initially diametrically opposed at the top and bottom of R,
     respectively. C1's prior on T, namely f_T^1, is shown in blue.
     Owing to f_T^1, C1 is motivated to, if possible, be the first cow to
     explore segments R1 and R2. Shown in green, it is assumed that
     f_T^2(q) = 1 for all q in R, such that any two segments of R having
     equal length are equally valuable to C2. The points a, b, c, d, e,
     f, and g are points of interest in Example 6.3. . . . . . . . . . . 102

6-4  Visualization of key quantities used in the proof of Theorem 6.5.
     The points labelled 1, 2, and 3 in red correspond to the three
     points visited by C1 in (6.14). In the instance shown, d* = ccw.  . 104

6-5  Illustration of the three ways in which U_SO can fall short of the
     maximum value of 2. In each figure, f_T^1 and f_T^2 are shown in
     blue and green, respectively. In (a), C1 and C2 are initially
     positioned on the "wrong" sides of R, resulting in a shortfall from
     2. Were the cows able to switch positions, the shortfall could be
     avoided. In (b), overlap between Su(f_T^1) and Su(f_T^2) creates
     unavoidable inefficiency. In (c), the shortfall results from the
     lack of convexity of f_T^1 and f_T^2.  . . . . . . . . . . . . . . 107

6-6  A socially optimal search strategy for the scenario considered in
     Example 6.3. The socially optimal search strategy is illustrated by
     the purple line: C1 and C2 rendezvous at b, having travelled there
     in the cw and ccw directions, respectively, and proceed to explore
     [q1(0), q2(0)]_cw in tandem. The segment R1, shown in red, is the
     portion of the ring that transitions from being explored by C2 in a
     cooperative search to being visited first by C1 in a competitive
     search.  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7-1  A sample sequence of target capture times associated with the early
     stages of a DE-CPRG. In the instance shown, C1 captures targets T1,
     T2, and T4, while C2 captures targets T3 and T5. If the statistics
     shown are representative of steady-state behavior, then the
     aggregate utilities of the cows would be U_1^agg(s) = 0.6 and
     U_2^agg(s) = 0.4, respectively.  . . . . . . . . . . . . . . . . . 117

7-2  An isometric visualization of an instance of the DE-CPRG. The prior
     f_O is shown in blue as a function of position along R. Also shown
     are the origin and destination points associated with targets T_j,
     T_{j+1}, and T_{j+2}. At the instant shown, C1 and C2 are searching
     R for T_j. Once T_j is discovered and transported from O_j to D_j,
     T_{j+1} is popped from the queue of targets and appears on R.  . .  118

7-3  A snapshot of a DE-CPRG taken at the start of CPRG_j. The
     origin-target density, f_O, and destination-target density, f_D,
     are shown in (a) and (b), respectively. Targets are significantly
     more likely to (i) arrive in E1 rather than R \ E1 and (ii) seek
     transport to E2 rather than R \ E2.  . . . . . . . . . . . . . . . 120

7-4  Illustration of two scenarios used in the proof of Proposition 7.1.
     In (a), C2 discovers a target at O_a that requires transport to D_b,
     a distance L away. During transport, C1 has time to optimally
     preposition herself at O_c in preparation for the next stage. A
     finite time later, in (b), the roles reverse: C1 discovers a target
     at O_a that also requires transport to D_b, allowing C2 to optimally
     position herself at O_c for the next game.  . . . . . . . . . . . . 122

7-5  Visual breakdown of a typical interval spanning the time between
     successive target captures for C_i using s_i = s̄. On average, it
     takes C_i time d̄ to return after delivering T_j. From the
     perspective of C_i, in the worst case, C_i finds a target at time
     t_d(T_j)+, which, on average, is delivered in time d̄. . . . . . .  132

7-6  Segments from possible sample runs, from the perspective of C_i, of
     a DE-CPRG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
List of Tables

3.1  Utility payoffs for the simple two-agent search game in Example 3.4.
     The possible actions of agent 1 are displayed in the leftmost
     column. The possible actions of agent 2 are displayed along the top
     row. Given each agent's search strategy, the first and second entry
     in each cell represent the utility of agent 1 and agent 2,
     respectively. For example, if agent 1 searches using s1 = a and
     agent 2 searches using s2 = b, then the probability of agent 1
     finding the target is 1 and the probability of agent 2 finding the
     target is … .  . . . . . . . . . . . . . . . . . . . . . . . . . .  48

4.1  Summary of general and CPRP-specific notation used in the thesis. .  54

5.1  Summary of CPRG-specific Notation  . . . . . . . . . . . . . . . .  70
Chapter 1
Introduction
When eating an elephant, take one bite at a time.
Creighton Abrams
This thesis considers the decision-making process of mobile, self-interested search
agents that compete, against one another, to find targets in a spatial environment.
We are quick to point out that the adversarial scenarios of interest are fundamentally distinct from the cooperative formulations that dominate much of multi-agent
search theory. By and large, these existing works study the exploits of a team of
searchers that cooperate to efficiently locate a target. Unsurprisingly, the study of
search scenarios that stress inter-agent competition among searchers calls for a new
evaluative framework and a customized assortment of analytic methods. Providing
these elements and putting them to use is the central contribution of this thesis.
The role of this preliminary chapter is to introduce, at a high level, the types of
problems that will be of interest. To this end, we recount a number of real-world
examples in which the competitive search framework features prominently. The aim
is to whet the reader's appetite and motivate why the problems considered are both
intriguing and relevant.
In particular, we will provide an example based on the
operation of a taxi system that will be revisited and serve as a motivational aid
at various points throughout the thesis. A contributions section discusses how we
see the work that comprises this document supplementing and extending the field of
search theory. Finally, this chapter provides an overview of the thesis's organizational
structure. This outline is useful as both a navigational aid and a preview of the story
that follows.
1.1
Motivation
The problems considered in this thesis result from the fusion of two fundamental, yet
previously disparate, ideas. The first key idea is that a mobile agent that wishes to
locate a target, but does not know the target's exact location, is obligated to search
for it. The second key idea is that competition naturally arises when multiple self-interested agents each vie to acquire a scarce resource. The first point is the premise of
search theory. The second point, somewhat more subtly, touches on the competitive
undertones of game theory. This work considers scenarios that incorporate both of
these notions through the study of multi-agent systems in which the agents compete
to capture targets given only limited knowledge of where the targets are located. To
understand the void that the problems investigated in this thesis begin to fill, it is
useful to, very briefly, highlight the types of search problems considered to date by
those in the community. A more detailed account of the relevant literature will be
provided in the next chapter.
In probabilistic search problems, one or more search agents attempt to (efficiently)
capture a target that is indifferent to their actions. In many of these cases, search
plans must be devised given only limited knowledge of a target's location. An example
of a probabilistic search problem is the case of an explorer trying to find buried
treasure given a raggedy and faded map of the area.
When multiple agents are
involved in the search, their plans are often formulated in a cooperative context in
order to improve efficiency.
For example, coordinated planning may increase the
probability the target is found or reduce the expected time required to capture the
target. It also has the benefit of avoiding redundancies that could emerge if the agents
planned independently.
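To make the flavor of these problems concrete, the expected capture time of a single searcher can be sketched in a few lines. This is a minimal illustration only: the discretized environment, unit inspection time, and the function name `expected_capture_time` are assumptions introduced here, not a model drawn from the thesis or the literature.

```python
import numpy as np

def expected_capture_time(prior, plan, cell_time=1.0):
    """Expected time for one searcher with a perfect sensor to find an
    immobile target. `prior` is a probability mass function over cells;
    `plan` is the order in which cells are inspected (each once).
    Illustrative sketch only, not a formal search-theoretic model."""
    t = 0.0
    expected = 0.0
    for cell in plan:
        t += cell_time               # time to reach and inspect this cell
        expected += prior[cell] * t  # target is here with prob. prior[cell]
    return expected

# A prior concentrated near cell 0 of a 6-cell environment.
prior = np.array([0.4, 0.2, 0.1, 0.05, 0.05, 0.2])
sweep = [0, 1, 2, 3, 4, 5]  # naive left-to-right sweep
print(expected_capture_time(prior, sweep))
```

Reordering the plan so that high-probability cells are inspected earlier lowers the expectation; balancing that gain against the travel cost of reaching those cells is precisely what a prior-guided search plan must manage.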
In pursuit-evasion games, the target assumes a more animated role and actively
chooses a fixed hiding location (immobile target) or trajectory (mobile target) to
evade capture by a team of searchers. Again, the searchers act cooperatively in order
to capture the target efficiently, e.g., in minimum time or minimum expected time. In
other words, competitive tension exists only between the target and, collectively, the
team of search agents. The agents themselves have no preference for which agent, if
any, ultimately finds the target. An example of a pursuit-evasion game is the case of
an escaped convict who tries to evade capture by a team of police officers. Pursuitevasion games naturally incorporate an element of game theory, as it is reasonable,
and often necessary, for each party to factor in their adversary's actions when planning
routes and making search-related decisions.
Probabilistic search problems and pursuit-evasion games address decision-making
in a host of applications. However, they offer little guidance about how agents should
search when the agents, themselves, compete against one another to capture targets.
Pragmatically, it is fair to ask, why might one be interested in these scenarios? The
answer, in the author's opinion, is that there are a number of relevant venues where the
agent-versus-agent search dynamic features prominently. Before providing a collection
of examples, it is useful to first give an informal description of the exact relationship
that exists between agents and targets in the problems of interest. Naturally, a formal
discussion of each of these components will follow in later chapters of the thesis.
In the search games considered in this thesis, each agent is adversarially aligned
with every other agent. Agents do not form teams, nor do they cooperate, unless
doing so expressly benefits all parties. Agents are, unless otherwise stated, and aside
from their initial conditions, homogeneous. Each agent has a prior on the location of
targets, but the exact locations of the targets are unknown. Moreover, an agent can
discover a target only when standing directly over it. Unlike pursuit-evasion games,
the targets in the games we consider are artifacts of the environment, not strategic
decision-makers. The targets are purely immobile and the locations at which they
appear in the environment are determined by a random process. Instead, the game,
as it were, is played among the search agents, with each agent trying to capture as
many targets as possible. To emphasize the stark differences that exist between this
search framework and the cooperative formulations previously described, we refer to
the problems considered in this thesis as competitive search games.
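The core of this interaction can be sketched with a toy model: two equal-speed searchers follow fixed plans over a discretized ring, and each agent's utility is the probability of being the first to stand over the target. This is an illustrative discretization under assumed rules (ties split evenly, one cell per time step), not the formal game defined later in the thesis.

```python
import numpy as np

def capture_probabilities(prior, plan1, plan2):
    """Probability that each of two equal-speed searchers is first to
    stand over the immobile target. `plan1`/`plan2` list the cell each
    searcher occupies at time steps 0, 1, 2, ...; ties split evenly.
    Illustrative sketch only, not the thesis's formal game."""
    # Earliest time each cell is visited (first occurrence wins, since
    # reversed iteration writes the smallest time step last).
    first1 = {c: t for t, c in reversed(list(enumerate(plan1)))}
    first2 = {c: t for t, c in reversed(list(enumerate(plan2)))}
    u1 = u2 = 0.0
    for cell, p in enumerate(prior):
        t1 = first1.get(cell, float("inf"))
        t2 = first2.get(cell, float("inf"))
        if t1 < t2:
            u1 += p
        elif t2 < t1:
            u2 += p
        else:
            u1 += p / 2
            u2 += p / 2
    return u1, u2

# Two cows start diametrically opposed on an 8-cell ring, both sweeping
# clockwise under a uniform prior; each claims the half it reaches first.
prior = np.full(8, 1 / 8)
plan_c1 = [0, 1, 2, 3, 4, 5, 6, 7]
plan_c2 = [4, 5, 6, 7, 0, 1, 2, 3]
print(capture_probabilities(prior, plan_c1, plan_c2))
```

Against a non-uniform prior, racing a rival to the high-probability arc becomes profitable even at the cost of local search efficiency; that tension between the prior and the opponent's plan is what the games studied in this thesis formalize.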
Returning to scenarios that emphasize the incentive for strategic decision-making
in a competitive search setting, consider the role of yellow cabs in Manhattan [73].
These taxis operate in what is called a "hail market". By law, yellow cab drivers may
only pick up passengers that have hailed them from the side of the street [69]. They
cannot schedule jobs in advance, nor can they respond to call-in requests. (Jobs that
originate under these circumstances are handled by a separate fleet of vehicles.) We
argue that yellow cab operations constitute a competitive search game. Abstractly,
the road network on which taxis drive may be viewed as a graph, whose edges and
vertices represent roadways and intersections, respectively. The targets are the passengers; they arrive dynamically according to an, albeit fairly complex, socially-driven
and time-variant, spatio-temporal process. The game is played among the taxi drivers,
with each driver trying to maximize their individual revenue, which clearly requires
getting passengers onboard. To operate effectively, drivers must plan their routes by
accounting for the spatial demand pattern of passengers, as well as the location of
nearby cabs. An interesting feature of this system is that targets must be transported
from a pickup location to a dropoff location. That is, there is a service component
associated with capturing a target. The logistics of taxi operations will be revisited
at various points throughout this thesis to motivate specific problems of interest.
As a second example, consider the case of two rival shipwreck-recovery boats
searching the outskirts of a jagged coral reef for the remnants of a treasure ship
lost at sea. Once more, we argue that this encounter has all the makings of a competitive search game. The environment is the subset of R2 that represents the waters
surrounding the reef. The lone target is the sunken ship. The game is played between
the two recovery boats, with each boat trying to discover the wreck (and any treasure that may be onboard) first. Given a priori knowledge of where the ship sank,
perhaps from crude sonar images, historical maps, word of mouth accounts of the
sinking, etc., each boat must chart a course to search the coastal waters surrounding
the reef. Once again, prudent search strategies must factor in not only probabilistic
information about where the target is likely to reside, but also the presence of a rival
salvage boat that harbors similar ambitions. A version of this scenario will motivate
the work in Chapter 6.
The preceding scenarios differ markedly in terms of workspace geometry, the number of agents involved, the processes by which targets arrive, and the time scales over
which searching takes place. This suggests competitive search games encompass a
broad class of search problems and there are potentially many other practical applications that fit naturally within the framework. For example, the same ideas emerge
in the prospecting and mining industries, where rival firms survey vast swaths of land
for gold, minerals, or oil deposits. Given preliminary geographic information, how
one of a handful of firms should prioritize testing potential mining sites in order to
be the first to file claims on the most lucrative locations, fits well within the domain
of competitive search games.
Along similar lines, imagine a military aircraft carrying sensitive information over
hostile enemy territory were to crash-land in a desert. How should authorities search
for the aircraft knowing other individuals, perhaps with unscrupulous intentions, are
also looking for the plane? Once again, the competitive search game framework is a
natural venue to pursue this question. Finally, competitive search games also have
relevant connections to the foraging behavior of animals in the wild and the diffusion
of bacterial colonies over a nutrient-laden agar plate. A sampling of these scenarios
is illustrated in Figure 1-1. Given the apparent relevance of competitive search
Given the apparent relevance of competitive search
games, it is surprising, at least to the author, that few results, even for scenarios
involving just two search agents, have been reported in the literature.
At this stage, we have, hopefully, convinced the reader that the competitive search
game framework captures the incentive for strategic decision-making in a host of
meaningful search applications.
However, as alluded to, the scenarios mentioned
above differ markedly with respect to key features and environmental parameters,
e.g., the number of search agents involved, the workspace geometries, the processes
by which targets arrive, and the time scales over which searching takes place. Unsurprisingly, it would be rather ambitious to expect a single formulation to capture the
nuances and peculiarities of each scenario. In this thesis, we will consider a collection
of idealized search games, with each encounter emphasizing one or more of the elements that punctuate the aforementioned scenarios. In analyzing these encounters,
this thesis provides the first rigorous analysis of competitive search games. A more
detailed exposition of our research philosophy and the contributions of our work is
discussed in the next section.
Figure 1-1: Depiction of real-world scenarios where agents compete, against one another,
to capture targets given limited information on the location of targets. (a) Snapshot of
taxi operations in Manhattan. In busy urban cities, the operation of taxi drivers trying to
find passengers can be represented as a competitive search game. (b) A sunken treasure
ship wrecked along a coral reef. The exploits that result from two rival recovery boats each
trying to find the ship using a crude sonar map of the area constitutes a competitive search
game. (c) A Kittyhawk P-40 that crash-landed in the Sahara desert during World War II.
In the event rescue and enemy forces had some idea of the aircraft's location, e.g., from
(intercepted) radio communications, and had each party launched a recovery operation, the
resulting race to locate the plane would be a competitive search game.
1.2
Contributions
This section provides a high-level synopsis of the contributions of the thesis. A
detailed account of specific advancements can be found in the next section, where the
thesis is deconstructed on a chapter-by-chapter basis. Here, the focus is on conveying
the spirit of the thesis, defining the scope of the work, and remarking on the value of
the work going forward.
The major contribution of this thesis is a collection of algorithmic strategies that
agents can use to search for targets in an environment.
However, unlike existing
constructs, which have no competitive tension or pit a cooperative team of pursuers
against a target, the strategies presented herein are designed for situations in which
search agents compete against one another to find targets. As mentioned, we believe
this framework encapsulates the inter-agent search dynamics in many real-world scenarios, yet acknowledge that, in their fullest form, these problems introduce a number
of analytic and computational complexities.
To make headway, we will often restrict ourselves to encounters that involve two
agents and that take place in topologically simple environments. For example, many of
the problems considered involve two agents contesting targets on a ring. Despite these
restrictions, the search scenarios nevertheless support an assortment of interesting and
sometimes surprising behaviors. Moreover, the modest furnishings of these problems
allow us to conduct a formal analysis of the constituent decision-making, often in the
form of quantifiable performance bounds, if not equilibrium strategies for the agents
involved. This approach not only caters to the author's research style, but also serves
to compile a set of initial competitive search game contributions. In this way, our
efforts begin the process of populating a toolbox of competitive search game results
that may prove useful in tackling more elaborate problems.
The next section outlines the content and contributions of the thesis on a chapter-by-chapter basis.
1.3 Organization
This thesis is organized as follows. Chapter 2 begins our investigation by providing
an overview of the relevant literature. By and large, this consists of contributions
to the fields of probabilistic search, pursuit-evasion games, and persistent planning
problems.
Included here is an overview of the Cow-Path Problem.
Many of the
scenarios considered in this thesis are an adversarial twist on this well-known problem,
so there is a vested interest in detailing its finer points. Throughout, we adopt the
philosophy that by understanding the pillars currently in place, one can better define
and appreciate the contributions of the work in this thesis. Still in a preparatory role,
Chapter 3 provides a brief overview of some of the technical details used in subsequent
chapters. Specifically, the game-theoretic terms discussed will be used to frame our
discussion of algorithmic search strategies.
With the requisite background material in place, the thesis moves straight into
novel material. The environment of interest in Chapter 4, as in many of the chapters that
follow, is a ring. Rather than plunge headfirst into a treatment of competitive search
games on a ring, we adopt a more tempered approach. As a prelude, we consider the
problem of how a hungry cow should search the ring to find a patch of clover in the
minimum expected time, given only a prior on the clover's location. This problem is
thematically very similar to the Cow-Path Problem, and serves as a stepping stone to
the adversarial encounters that follow. Iterative algorithms are provided for finding
optimal search strategies and, for games with bounded target densities, a bound is
given on the maximum number of times an intelligent cow would ever turn around.
In this latter case, we show that an optimal search plan may be expressed as the
end product of a nonlinear program followed by a trimming algorithm. This result is
noteworthy as the Cow-Path problem has no known closed-form solutions for general
target distributions.
Without further ado, Chapter 5 considers the canonical problem of this thesis: the
Cow-Path Ring Game, a scenario in which two hungry cows compete, given a prior,
to find a patch of clover on a ring. Upon introducing the necessary notation and
terminology, the problem is formally defined. Many of the games discussed in later
chapters will be variations of this formulation. A concerted effort is made to explain
why, for a number of reasons, including the real-time and feedback nature of the game,
the Cow-Path Ring Game is a challenging problem. In response, a simplified version
of the game is presented, one in which each cow may turn at most once. With this
restriction in place, an iterative algorithm is developed that establishes the existence
of an ε-Nash equilibrium. Subsequently, a dynamic-programming-based approach is
used to extend this result to games in which each cow may turn at most a finite
number of times.
Moving on, Chapter 6 considers a potpourri of interesting scenarios that are sufficiently different from the Cow-Path Game to justify a chapter of their own. By imposing a cost each time a cow turns, a bound is developed on the maximum number of
times an intelligent cow would ever reverse directions. Continuing, we consider games
in which each cow maintains a unique prior on the target's location. Carrying this
momentum forward, we consider games where one cow is in the advantageous position
of knowing where her rival suspects the target is located. This dichotomy requires
specification of a behavioral model for both the more-informed and less-informed cow.
Assuming the less-informed cow behaves conservatively, it is shown the game admits a
Nash equilibrium. Moreover, for select distributions, the more-informed cow is able to
increase her utility by leveraging her situational advantage. The asymmetric nature of
these games invites the opportunity to study the social welfare of competitive search
games. To this end, a cooperative search strategy is presented that is socially optimal
for any set of target priors. This treatment transitions naturally into a discussion,
albeit a brief one, of the price of anarchy in competitive search games.
Chapter 7 considers competitive search games in which targets arrive dynamically
on a ring. The persistent nature of these games places an emphasis on strategies
that ensure targets are captured efficiently in the long run.
Among the dynamic
games introduced are scenarios with transport requirements, in which targets, once
found, must be delivered to a destination point. A defining attribute of the search
strategies in these games concerns how a cow should position herself while her rival
is preoccupied delivering a target.
As a first contribution, we show that in any
equilibrium of a two-cow dynamic search game, each cow captures, in steady-state,
half of all targets. With this benchmark established, it is shown that greedy search
strategies can, for select target distributions, lead to arbitrarily poor capture rates.
Because it is difficult to quantify the long-term effect of short-term actions in an
equilibrium setting, we instead focus on defensive or conservative search strategies
and lower-bound the expected fraction of targets captured using these methods. This
bound is then compared with the theoretically optimum value, i.e., one half, for select
target distributions.
Finally, Chapter 8 summarizes the prominent ideas of the thesis, reflects on the
contributions made, and evaluates the resultant state of competitive search games.
It also takes the time to outline an assortment of open research directions that have
arisen during the development of this work, but that, on account of time constraints,
or their tangential nature, or both, have received only modest deliberation.
Chapter 2
Background Material
This chapter provides an overview of related work. Recall that competitive search
games stress the goal of finding targets in an agent-vs-agent setting. We view search
theory as the natural domain of our work, and game theory as the appropriate framework to study the problems of interest. Consistent with this mindset, we will not
attempt to summarize works from the field of game theory. Rather, a short compendium of the basic game-theoretic ideas, which constitute tools for our study, will
be provided in Chapter 3. These ideas are well established and covered in any introductory text on the subject. The major contribution of this thesis lies in using these
ideas to understand search in a novel setting. To this end, this chapter surveys relevant contributions to the field of search theory. The intent is to provide an overview
of the literature, both pre-existing and ongoing, that has relevant connections to
competitive search games, be the association in terms of application, methodology, or
both. This highly focused survey of select works in the fields of probabilistic search,
pursuit-evasion games, and what we will refer to as persistent planning problems,
fosters an appreciation for the state-of-the-art and assists in defining the ultimate
identity of the thesis. Unlike competitive search games, the vast majority of works
covered in this chapter do not pit one search agent against another. Nevertheless,
they share salient features with the problems we consider or serve to better position
the thesis work within a larger search narrative.
2.1 Search Theory: An Introduction
Searching for a lost or hidden item is an age-old problem. People still misplace their
keys, passports, phones, cash, etc., and, when they do, typically search to find them.
Searching is also associated with larger-scale recovery operations, e.g., rescuing a
camper lost in the wilderness. In this thesis, search theory is defined as the study of
problems that take place in a spatial environment and involve n agents trying to find
m targets. Note this definition is quite broad. For example, it says nothing about the
interaction between agents, which could be of a cooperative or competitive nature,
nor the manner in which targets are distributed and arrive. In many cases, the task
of searching is constrained by the need to locate targets efficiently. For example,
in some applications, there is the possibility a target goes undiscovered.
Here, it
makes sense to use a strategy that maximizes the probability of finding it. In other
applications, it is known that a target resides somewhere in a bounded environment.
Although a lawnmower-style sweep of the workspace is guaranteed to find the target,
it is more appropriate to use a search strategy that minimizes the expected discovery
time [31]. In short, interesting search problems combine the goal of finding targets
with the need to provide some form of performance guarantee.
When people misplace basic everyday items, they typically launch small-scale,
ad-hoc campaigns to relocate them. These undisciplined approaches generally suffice
for finding low-value items in small spaces.
However, as the value of the object
increases, e.g., a human life, the size of the environment grows, e.g., a forest hundreds
of square kilometers in size, or both, successfully searching demands a more structured
treatment.
The first rigorous approach to solving search problems was undertaken
during World War II when the Anti-Submarine Warfare Operations Group was tasked
with finding submarines in the Atlantic [58], [63]. The declassification of these efforts
kicked off a mathematical investigation of search-related problems. Today, search
theory is an established field within operations research. More recently, advances in
autonomy have initiated efforts to revisit traditional search paradigms from a robotics
perspective. This movement has attracted interest from control theorists, roboticists,
and computer scientists. These researchers have raised the profile of new and emerging
applications within the search community, and advocated for more algorithmic and
pragmatic approaches to solve many existing search problems [17], [33].
Auspiciously, search theory researchers have surveyed their field with remarkable
regularity; the authors of [33], for example, recount many of the efforts detailed
here. Owing to these efforts, search problems generally obey a well-defined taxonomy.
Figure 2-1 provides a visualization of select disciplines within the hierarchy of search
problems. Rather than reiterate existing surveys, the focus of this chapter is on conveying advancements that help to frame the contributions of our work. Accordingly,
we freely pick and choose to cover the topics we deem most relevant. In doing so, we
often ignore or only briefly touch on contributions that, while extremely significant,
are of minimal importance for future discussion. For completeness, the interested
reader can find detailed accounts of the classical period of search theory, spanning
the first forty-or-so years of the field, which we touch on only briefly, in [17], [37],
[62], [92].
The last section of the chapter highlights problems that stress persistent planning
for applications that feature indefinite horizons of operation and time-varying environments. Many of the results reported have emerged only recently, and, so far, have
had only tenuous affiliations with searching. Nevertheless, their long-term approach
to planning is pertinent for the work in Chapter 7, where we consider games with
dynamically arriving targets. In briefly surveying these works, the intent is to convey
how our efforts contribute to the state-of-the-art.
[Figure 2-1 tree diagram: search theory branches into probabilistic search (split by imperfect vs. perfect sensors, the latter including cow-path problems) and search games (agent(s)-vs-target games with static or mobile evaders).]
Figure 2-1: A partial taxonomy of select sub-disciplines within the field of search theory.
It is worth reinforcing that the families of games represented are only a relevant sampling
of select research areas in the field, and not an exhaustive listing. The box shaded in blue
represents agent-vs-agent or competitive search games, the class of problems that will be
the focus of this thesis. The location of CSGs in the tree indicates these problems share
fundamental attributes with probabilistic search problems and pursuit-evasion search games.
The image was inspired by similar figures reported in [17], [33].
2.2 Probabilistic Search
In probabilistic search problems, a search agent, located in environment Q, attempts
to capture a target, T, whose initial position and movement are independent of the
agent's actions [33].
That is, T is impervious or indifferent to being captured. In
many cases, the agent must devise a search plan given only a prior density, f_T : Q → R≥0, on T's position. Initial efforts to place probabilistic search problems on
a firm theoretical foundation were undertaken in [58], and later expanded upon in
[59], [60]. Here, geometric arguments are used to characterize the sensor footprint
and detection efficiency of various sensor rigs, e.g., aerial surveillance with a human
spotter. Included is a result linking random search to a detection probability function
that is exponential in the time spent searching a bounded region.
As Figure 2-1 indicates, probabilistic search problems are naturally categorized according to the
sensing capabilities of the agents involved. For most problems, the natural distinction
is between sensors that are (i) imperfect, i.e., may generate false-negatives, and (ii)
perfect, but have a finite sensing range.
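The exponential-detection result mentioned above is the classical random-search formula: a searcher with sweep width W and speed v, covering a region of area A "at random," detects a stationary target by time t with probability 1 - e^{-Wvt/A}. The sketch below, which is an illustration for intuition and not part of the thesis's own development, checks the formula against a memoryless simulation; all parameter values are illustrative.

```python
import math
import random

def random_search_detection(sweep_width, speed, area, t):
    """Classical random-search formula: a searcher with sweep width W moving
    at speed v 'at random' through a region of area A detects a stationary
    target by time t with probability 1 - exp(-W * v * t / A)."""
    return 1.0 - math.exp(-sweep_width * speed * t / area)

def simulate(sweep_width, speed, area, t, dt=0.05, trials=20000, seed=1):
    """Memoryless approximation: in each short interval dt the searcher sweeps
    a 'fresh' random strip of area W*v*dt, detecting with probability W*v*dt/A."""
    random.seed(seed)
    p_step = sweep_width * speed * dt / area
    steps = int(round(t / dt))
    hits = 0
    for _ in range(trials):
        if any(random.random() < p_step for _ in range(steps)):
            hits += 1
    return hits / trials

p_formula = random_search_detection(1.0, 1.0, 10.0, 5.0)  # 1 - e^{-0.5}
p_sim = simulate(1.0, 1.0, 10.0, 5.0)
```

The exponential form reflects the memorylessness of purely random coverage: each increment of effort finds the target with a rate proportional to the fraction of the region freshly swept.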
2.2.1 Search with an imperfect sensor
When an agent's sensor is imperfect, it may fail to detect a target within its sensing
zone. Accordingly, whether the target is found must be discussed in probabilistic
terms. For the time being, assume the target is stationary. The detection function
b : Q × R≥0 → [0, 1] assigns the probability b(q, z) of finding the target at point
q ∈ Q, given that z units of effort are spent searching q and the target is in fact at
q. Given f_T, b, and a budget C on the resources (e.g., time) that can be
devoted to searching, the canonical imperfect-sensor search problem is to find z*(q),
the optimal expenditure of search effort at each point q ∈ Q that maximizes the
probability of finding the target subject to all constraints, i.e.,

    z* = argmax_{z : Q → R≥0}  ∫_Q f_T(q) b(q, z(q)) dq        (2.1)

    s.t.  ∫_Q z(q) dq ≤ C.                                     (2.2)
In continuous environments, solutions to (2.1)-(2.2) typically employ Lagrange
multiplier techniques [60], [92], [95]. Hypothesis testing [40], [74] and convex programming [28] methods have also been used. If an optimal search plan z*, with
search budget T, has the property that for all t < T, z* restricted to [0, t] is optimal for search budget t, then z* is said to be uniformly optimal, a desirable property.
In [7], [61], it is shown that uniformly optimal search plans exist for a broad class
of problems, provided b is a regular function, i.e., b(·) is increasing but provides diminishing returns in its argument [33], [93]. Unfortunately, the search plans these
methods generate are not always feasible, and may require that the agents jump or teleport
between points in order to realize them [10], [35], [76]. Even today, search plans often
neglect this fundamental requirement, providing high-level search instructions, but
little guidance as to how these may be transformed into drivable search routes, i.e.,
realizable paths [26], [66]. The search plans considered in this thesis all correspond
to continuous, i.e., realizable, trajectories.
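To make the structure of (2.1)-(2.2) concrete, consider the special case, used here only for illustration, of a discretized workspace with exponential (hence regular) detection, b(q, z) = 1 - e^{-z}. The Lagrange condition then reduces to a water-filling rule: spend effort only in cells whose prior probability exceeds a threshold λ, with z_i = ln(p_i / λ) above it. The sketch below, with an illustrative prior and budget, finds the multiplier by bisection; it demonstrates the technique and is not code from the thesis.

```python
import math

def optimal_allocation(prior, budget):
    """Maximize sum_i p_i * (1 - exp(-z_i)) subject to sum_i z_i <= budget.

    The Lagrange condition p_i * exp(-z_i) = lam (wherever z_i > 0) gives
    z_i = max(0, ln(p_i / lam)); the multiplier lam is found by bisection."""
    spend = lambda lam: sum(max(0.0, math.log(p / lam)) for p in prior if p > 0)
    hi = max(prior)                                           # spend(hi) = 0
    lo = min(p for p in prior if p > 0) * math.exp(-budget)   # spend(lo) >= budget
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if spend(lam) > budget:
            lo = lam          # overspending: raise the threshold
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [max(0.0, math.log(p / lam)) if p > 0 else 0.0 for p in prior]

prior = [0.5, 0.3, 0.15, 0.05]   # illustrative cell probabilities
z = optimal_allocation(prior, budget=3.0)
p_detect = sum(p * (1 - math.exp(-zi)) for p, zi in zip(prior, z))
```

Note that the allocation concentrates effort where the prior is largest and ignores cells whose probability falls below the threshold, consistent with the diminishing-returns intuition behind uniformly optimal plans.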
From a search perspective, an environment Q is discrete if it may be partitioned
into regions 1, ..., K such that b(k, z) is the probability of finding the target in
region k, given that z units of effort are spent searching region k (as a whole) and the
target is in fact in region k. Unfortunately, Lagrange methods do not extend directly
to discrete Q. The necessary amendments are discussed in [50]. As an analog to
uniform optimality, it was shown in [30] that optimal search plans in discrete Q are
temporally greedy: at any point during search, the next cell to be further inspected
is the one that maximizes the marginal gain in detection probability per marginal
effort expended. Operating within discrete environments makes it easier to formulate
and analyze variants on the basic problem structure. [17] and [33] provide a solid
overview of these and other endeavors. In this and the next paragraph, we reiterate a
number of these efforts to provide a sense for the scope of problems that have been
addressed. Partitioning the environment and the effort expended into discrete
components is a much more natural framework to model, among other things, scenarios
where a quantized level of effort must be expended each time a region is searched and
the rewards for capture are discounted in time or search effort. More discussion of
these problems, often referred to as sequential searching, can be found in [52], [68],
[93], [94].
A recent trend in one-sided searching is to consider operations in physically expansive environments. In these settings, streamlining the numerics is an important consideration to alleviate computational bottlenecks.
Efforts in this vein are
recounted in [82]. Other works have focused on search problems with false contacts
[53], sensors that generate false positives [45], or environments that contain
obstacles.
In the optimal stopping problem, it is not known whether or not a
target actually resides in the environment [51], [52]. Rather, after some amount of
exploration, the agent must report (i) the target is not in Q, or (ii) the target is in Q
alongside its most likely location. Of course, the agent may reply incorrectly. These
problems introduce a new class of performance metrics, e.g., the probability the agent
makes a correct decision about the target's inclusion in Q. Dynamic formulations of this
problem use observations to update target belief functions within a Bayesian setting
[34]. This latter class of problems results in feedback search plans that use recorded
observations to guide future decision-making. Motivated by an interest in developing autonomous robotic search platforms, a number of recent efforts have considered
cooperative, multi-agent formulations to solve variants of the probabilistic search
problem. For example, [32] uses a Bayesian approach that allows a team of pursuers
to quickly decide (though not necessarily correctly) whether or not a target is in the
environment, and dynamic programming-based solutions to solve similar problems
are discussed in [65]. However, as these methods often scale unfavorably with respect
to key factors, e.g., the number of search agents or the workspace complexity, research
efforts have also been directed at developing suboptimal algorithms with known, and
acceptable, performance guarantees.
2.2.2 Search with a perfect sensor
A sensor is perfect if it never registers false negatives. Occasionally, it makes sense
to rule out the possibility of false positives as well. It is important to remember that
perfect sensors are still assumed to have a finite sensing radius. If all targets are
stationary, then there is no need for an agent equipped with a perfect sensor to scan
any point more than once for targets. For this reason, perfect-sensor problems focus
not only on finding targets, but doing so in the minimum expected time. Of course,
if targets arrive dynamically, it is likely necessary to revisit points throughout the
workspace. This section considers a collection of perfect-sensor, probabilistic search
problems for static environments.
The Cow-Path Problem, or simply CPP, was proposed, independently, by Beck
[12] and Bellman [16] in the 1960s.
As the reader may have surmised, the CPP
has strong thematic connections with the encounters studied in this thesis. This is
reflected, most notably, by the fact that the players or agents in our games are also
cows. Since its inception, the CPP has become a canonical problem in the fields of
probabilistic path-planning, robotics, and operations research. The exact statement
of the CPP varies from field to field. For example, those in the online algorithms
community typically concentrate on a version that emphasizes search performance
in the worst case [35]. In this thesis, however, we will be interested in the following
stochastic version, which places a premium on performance in the average case.
Definition 2.1 (The Cow-Path Problem). A hungry cow is positioned at the
origin of a fence represented by the real line R. The cow knows that a patch of clover
resides somewhere along R, but has only a prior, f_T : R → R≥0, on its location.
The cow can move at unit speed and reverse direction instantaneously. On account
of being severely nearsighted, the cow can locate the target only when she is standing
directly over it. How should the cow search to find the clover in the minimum expected
time?
Figure 2-2: Illustration of the key features of the stochastic or average case Cow-Path
Problem. Starting from the origin, the cow explores R in search of clover T. A hypothetical
search plan is shown in gray. In the instance depicted, the cow makes four turns before
finding the target at the point marked with a red exclamation mark.
In the CPP, the cow's sensory capabilities are represented by a sensor that has
zero sensing radius, but perfect accuracy. Figure 2-2 provides a visualization of the
CPP. In the decades since the CPP's inception, various analytic conditions necessary
for search plan optimality have been reported, e.g., [9], [12], [13], [14], [15], [54]. Given
these contributions, it is perhaps surprising that exact (analytic) solutions are known
for only a handful of target distributions; specifically, the rectangular, triangular,
and normal density functions [42]. For general distributions, the common approach
remains to discretize the workspace and rely on dynamic-programming techniques.
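To illustrate the discretize-and-evaluate step on which such dynamic-programming approaches rest, the sketch below scores a candidate search plan against a discrete prior. The plan encoding is a hypothetical one chosen for this example (a sequence of alternating excursion extents [a1, a2, a3, ...]: go right to +a1, back past the origin to -a2, out to +a3, and so on), and the example prior is illustrative; neither is the thesis's own formulation.

```python
def expected_discovery_time(extents, prior):
    """Expected time for a unit-speed cow starting at the origin to find the
    target, given a plan of alternating excursion extents (right, left,
    right, ...) and a discrete prior {position: probability}.

    A point first swept during excursion k is reached at time
    2 * (a_1 + ... + a_{k-1}) + |x|."""
    total = 0.0
    for x, p in prior.items():
        prev = 0.0                      # total length of completed excursions
        right_reach = left_reach = 0.0  # farthest points swept so far
        t = None
        for k, a in enumerate(extents):
            if k % 2 == 0:              # rightward excursion, out to +a
                if right_reach < x <= a:
                    t = 2 * prev + x
                    break
                right_reach = max(right_reach, a)
            else:                       # leftward excursion, out to -a
                if left_reach < -x <= a:
                    t = 2 * prev - x
                    break
                left_reach = max(left_reach, a)
            prev += a
        if t is None:
            raise ValueError("plan never reaches position %r" % x)
        total += p * t
    return total

uniform = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}
one_sweep = expected_discovery_time([2, 2], uniform)  # full sweep right, then left
zigzag = expected_discovery_time([1, 2, 2], uniform)  # one extra turn
```

For this symmetric prior the single full sweep beats the plan with an extra turn; for skewed priors the ordering can reverse, which is why optimizing the turning points is nontrivial.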
Recently, a variant of the CPP, in which costs are levied against the cow each time
she turns, was investigated in [36].
2.3 Pursuit-evasion games
Pursuit-evasion games involve one or more pursuers trying to capture an evader.
Typically, these games can be effectively classified according to which aspects of the
pursuit receive the greatest emphasis, which, in turn, drive the mathematical techniques and solution methodologies employed. Unless otherwise stated, it is assumed
throughout the section that both the pursuer and evader are mobile.
The Princess-and-Monster and Homicidal-Chauffeur games [11] involve one player
attempting to capture a more agile adversary. In these and other games where it
is critical to model player dynamics, e.g., a finite-turning radius, acceleration constraints, etc., the interplay between pursuer and evader may be cast as a differential
game. The players' objectives and dynamics are incorporated in the Hamilton-Jacobi-Isaacs equation, which, when solved, gives each player's equilibrium strategy in the
form of a full-state (position) feedback control law. Unfortunately, as more pursuers
are recruited for searching, or the workspace becomes more complex, e.g., obstacles
or irregular perimeter boundaries are introduced, these methods quickly become intractable [33]. Differential games are sufficiently distinct from competitive search
games that to avoid unnecessary digression, the interested reader is referred to [47],
[48] for a detailed account of the subject.
Combinatoric search games strip away details of the problem deemed superfluous
for the application at hand. Instead, player motion and workspace geometry are modeled using simplified dynamics and basic topological structures. These abstractions
permit an insightful high-level study of pursuit-evasion scenarios that would not otherwise be possible. Within this class, ambush games stress the need for algorithms
that ensure the evader is captured, even under conditions that are highly unfavorable
or statistically rare. In the cops-and-robbers game, a robber (evader) and one or more
cops (pursuers) take turns moving between vertices of a graph [3], [75]. The cops win
the game whenever a cop and the robber are collocated at a vertex. When capture
can always be avoided, by judicious play on the part of the robber, the robber wins.
The cop number of a game is the minimum number of cops needed to ensure capture
for any initial conditions of the game. Generally speaking, the bulk of research in
this area is aimed at characterizing winning conditions, the cop-number, and monotonicity, i.e., the property that the number of safe vertices decreases in the number of
cop moves, in relation to graph topology. For example, a famous result is that every
planar graph has a cop number of three or less. Extensions of these ideas to games
played on hypergraphs are discussed in [43] for the marshalls-and-robbers game.
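For intuition on how such winning conditions are characterized, recall the classical result of Nowakowski and Winkler, and independently Quilliot: a graph has cop number one exactly when it is dismantlable, i.e., reducible to a single vertex by repeatedly deleting a vertex whose closed neighborhood lies inside that of another vertex. A minimal sketch of this check follows; the example graphs are illustrative, and this is not code from the thesis.

```python
def is_cop_win(adj):
    """Decide whether one cop suffices on a finite undirected graph, via the
    dismantlability characterization: repeatedly delete a 'dominated' vertex
    u, one whose closed neighborhood (restricted to the surviving vertices)
    is contained in that of some other surviving vertex v; the graph is
    cop-win iff this reduces it to a single vertex."""
    closed = {u: adj[u] | {u} for u in adj}   # closed neighborhoods N[u]
    verts = set(adj)
    removed_one = True
    while len(verts) > 1 and removed_one:
        removed_one = False
        for u in list(verts):
            if any(closed[u] & verts <= closed[v] & verts
                   for v in verts if v != u):
                verts.discard(u)          # u is dominated; delete it
                removed_one = True
                break
    return len(verts) == 1

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}          # trees are cop-win
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # C4 is robber-win
```

The planar-graph bound of three cops mentioned above, by contrast, requires a genuinely game-theoretic argument rather than such a local reduction.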
Parson's game considers the problem of clearing a building (a graph with nodes
and edges representing rooms and hallways, respectively) that has been infiltrated by
an infinitely fast and agile trespasser [78], [79]. The problem is similar to the art-gallery problem [76], except the super-human speed of the assailant requires rooms
be swept in a manner that ensures previously cleared spaces are not recontaminated
[78]. This is achieved by placing guards at topologically-inspired locations to ensure
the perpetrator is confined to an ever diminishing region of the workspace. Research
efforts have focused almost exclusively on the search-number, i.e., the minimum number of guards needed to locate the evader in the worst-case. Once again, much less
effort has been devoted to developing search algorithms to carry out such a sweep,
or to characterizing the time complexity of these schemes. The GRAPH-CLEAR
problem is an extension of Parson's game in which multiple agents are, in general,
required to guard doorways and sweep rooms [56]. Determining the minimum number
of pursuers is known to be computationally challenging, but efficient algorithms have
been reported for graphs possessing special structure, e.g., tree graphs [55], [57].
The lion-and-man game is thematically similar to cops-and-robbers, but unfolds
in two-dimensions, typically a circle or a polygon. The hungry lion tries to eat the
man, while, as one may have guessed, the man tries to escape. Conditions for the
lion to capture the man are reported in [84] for a turn-based version of the game.
Extensions to higher dimensions, most meaningfully R3, are reported in [4], [64]. A
robotics-inspired variant of the game, in which the pursuer has line-of-sight visibility,
but cannot see through obstacles or boundaries is proposed in [44], [67]. Strategies
for two lions to capture the man in any simple polygon and for three lions to capture
a man in a two-dimensional polygon with obstacles are reported in [49] and [22],
respectively. In many of these problems, randomized search schemes are employed
to minimize the search number. Variations of the game involving finite sensing radii
are reported in [4], [24].
The problem of coordinating multiple agents that each
have a finite sensing radius remains an open problem [33], as does quantifying the
time-complexity of possible capture schemes.
The essential elements of Parson's game are extended to hypergraphs in [43],
where the encounter is referred to as a marshalls-and-robbers game. Here the graph-theoretic notion of tree width features prominently in analyzing the game. Establishing connections to other notions of graph theory, various complexity measures for
pursuit-evasion games are characterized in [25], [83].
A more complete survey of pursuit evasion in graphs can be found in [6], [39], and
more recently in [33], which originally pointed the author to these works.
A second class of combinatoric search games concerns the exploits of a pursuer who
tries to capture an evader in minimal time. The evader, not being one to go quietly,
strives to delay capture for as long as possible [5], [42]. Owing to these competing
objectives, optimal player strategies are best understood using the language and formalism of game theory. Equilibrium strategies, which are typically highly dependent
on workspace geometry, are reported in [5] for games taking place in an assortment
of environments, including line segments, specialized graphs, and compact regions of
R2. Once again, the use of mixed strategies is critical to describing equilibrium play.
Also in [5], the authors consider team search games in which multiple pursuers scour
the environment in an effort to locate the evader quickly. In this respect, the game
is played between the team of pursuers and the lone evader; individual pursuers have
no preference for which of them, if any, ultimately succeeds in capturing the evader.
As in the problems we consider, the structure of player strategies in adversarial
search games depends heavily on a number of key factors, including sequencing of
player moves, e.g., simultaneous versus turn-based; the information made available to
the players, e.g., finite versus infinite sensing radius; the geometry of the environment;
and the number of pursuers, i.e., search agents, participating in the game. Depending
on the specifics of the problem, solutions and efficient algorithms for computing them
are either known, known to scale poorly with problem complexity, or remain open
research problems [33].
It is useful to take stock of the body of work discussed thus far in the context
of competitive search games. Recall that competitive search games emphasize the
decision-making process of search agents that compete (rather than cooperate) to
capture a target. In this setting, agents must individually design and execute search
plans given only a prior on the target's position. This requirement recalls the basic
premise of probabilistic search problems. However, due to inter-agent competition
for targets, prudent search strategies must account for the presence of other agents in
the workspace, and thus prove efficient in a game-theoretic sense. This latter point
recounts the methodology used to analyze select pursuit-evasion games. The next section considers problems that unfold in dynamically changing environments. Although
not directly search-related, these problems help to frame the work in Chapter 7, where
targets are contested on an ongoing basis in a time-varying environment.
2.4
Persistent planning problems
When an environment is dynamic or contains an inherent level of uncertainty, it is
often necessary to employ agents that perform a task indefinitely. This requirement
is present, for example, in surveillance and patrol applications. Here, agents must
inspect or provide service to specific regions of the workspace on an ongoing basis
[21]. Describing policies that realize this long-term functionality is best achieved
by using iterative constructs and adopting an algorithmic perspective. This section
draws attention to research, much of which is quite recent, that stresses the need for
precisely this type of persistent planning.
In dynamic vehicle routing (DVR) problems, one or more service vehicles mobilize
within a workspace to accomplish a task; usually to satisfy a set of demands that arrive
dynamically at a time and location that are unknown in advance. However, unlike the
search scenarios we consider, service vehicles in DVR problems are alerted to each new
demand and its location at the time of the demand's arrival. To provide a high quality
of service, agents adopt adaptive policies that reflect the current state of the system
and, to some extent, a projection for how the set of outstanding demands will evolve
[80]. A typical quality of service metric is the expected amount of time, in steady-state, that a demand spends in the system before receiving service [18]. In the dynamic
traveling repairman problem (DTRP), vehicles are notified of each demand's location,
provide on-site service, and the service times are modeled as random variables [19],
[20]. More exotic variants of the DTRP framework that capture features such as
customer impatience, various demand priority levels, and nonholonomic vehicular
constraints are studied using queueing-based policies in [27].
A decentralized DTRP policy that performs optimally under light-load conditions
is considered in [8].
More relevant to the research at hand, however, is the fact
that the policy supports a game-theoretic interpretation in which a socially optimal
Nash equilibrium exists. Service vehicles treat the centroid of all previously serviced
demands as a home base, and return to wait at this location when the environment is
void of outstanding demands. When demands do appear, the vehicles venture forth
to provide service, adjusting their home base as necessary. In this case, the utility
function of each vehicle places a premium on being much closer to a demand than
the next closest vehicle in the workspace.
The problem of finding a target that intermittently emits a low-power distress
beacon is presented in [88]. A vehicle with finite sensing radius is capable of locating
the target only when it senses a signal that originated from within its sensing zone.
The expected time for a vehicle to discover the target using periodic search paths
is analyzed. In [46], the persistent patrol problem assumes that targets enter the
environment dynamically according to a renewal process, and the vehicle has a prior
over each target's location. Persistent search plans are designed to, in steady-state,
and as the sensing radius becomes small, locate targets in minimal expected time.
Extensions to multi-robot persistent patrol scenarios are provided in [38] in the context of optimal foraging strategies. The main result is that, to ensure targets are collected in a timely manner, the frequency of visits to a region should be proportional to the cube root of the target density in that particular region.
The task of providing persistent surveillance to a finite set of feature points within
a region is considered in [87]. The points in question could represent, for example,
sites at which hazardous waste materials have accrued and must be collected for safe
disposal. A periodic speed controller for a fixed vehicle route is designed to minimize
the maximum steady-state accumulation of waste across the sites. A closely related
problem is addressed in [86], where observation points are classified based on the
required visitation frequency and a control policy is enacted that ensures each point
is visited sufficiently often. Similar themes are explored in the context of perimeter
patrol, coverage, and surveillance applications, e.g., [1], [2], [26], [29], [31], [81], [85].
The collection of persistent planning problems outlined above emphasizes the need to continually revisit sites within the environment.
This involves translating the
operational objectives of the problem, in conjunction with the geometric and temporal parameters of the systems, into a visitation schedule for the various sites in the
workspace. In Chapter 7, we consider games in which targets enter the environment
dynamically.
Here again, agent planning requires deciding how often to visit spe-
cific points within the workspace. Of course, the competitive nature of our problem
requires integrating these considerations with the recent travel histories of nearby
search agents.
Chapter 3
Mathematical Preliminaries
This chapter provides a focused summary of the main theoretical tools used in the
thesis. Because we are interested in characterizing search strategies that find targets
efficiently in the face of inter-agent competition, this amounts to a presentation of
key game-theoretic ideas. As mentioned in the previous chapter, search theory draws
upon a range of techniques, but has few transcendent results. Rather, contributions
tend to be application-specific. In contrast, game theory has a number of core ideas
that permeate the field and provide a framework to characterize and study a range
of multi-agent encounters, search-related or otherwise. Here, we recount three such
concepts: Nash equilibria, best-response strategies, and maximin strategies. A full
discussion of these ideas may be found in any of the many textbooks on the subject,
including [11], [41], [70], [77]. Finally, a short example is provided to better illustrate
each concept. Although well-established, these ideas, taken collectively, provide an
able toolset for characterizing strategic play in competitive search games.
3.1
Game theory
For the purpose of this thesis, a game constitutes an encounter involving n players in
which the utility, i.e., well-being or payoff, of each player depends on the collective
actions of all players. Game theory studies the decision-making process and ultimate
outcome of players in a game. In competitive search games, the number of targets a
cow captures depends not only on her search strategy, but also the search strategy
of each rival cow. Therefore, the scenarios of interest are games not only in the
vernacular, but also in the game-theoretic sense. Moreover, game theory is the natural
tool to describe and analyze these encounters. No concept is more central to game theory than that of a Nash equilibrium, or simply NE, a strategy profile, i.e., a
collection of player strategies, from which no player has a unilateral incentive to
deviate.
For our work, the closely related concept of an ε-NE also proves useful
[71], [72]. To this end, we present the following definition, which closely follows the
presentation in [41].
Definition 3.1 (Pure Strategy ε-Nash Equilibrium). Let G be a game with n players, V = {1, . . . , n}. Let S_i be the strategy set of player i and S = S_1 × · · · × S_n the set of strategy profiles. For s ∈ S, let s_i be the strategy played by player i in s and s_{−i} the strategy profile of all players other than i in s. Let U_i(s) be the utility of player i under s. For ε > 0, a strategy profile s* ∈ S is a pure strategy ε-Nash equilibrium of G if, for all i ∈ V,

    U_i(s_i*, s_{−i}*) + ε ≥ U_i(s_i, s_{−i}*) for all s_i ∈ S_i.    (3.1)
In words, (3.1) says that s* is an ε-Nash equilibrium, or ε-NE, if no player can unilaterally deviate from s* and improve her utility by more than ε. In this thesis, we will be interested in cases where ε is small. The more traditional notion of a Nash equilibrium (NE) corresponds to an ε-NE with ε = 0. In this way, it is useful to think of ε in (3.1) as a buffer or measure of indifference, i.e., it is not worth going to the trouble of shifting strategies if the gains do not exceed ε.
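Condition (3.1) is mechanical to check for finite games. The following sketch, in Python, enumerates pure strategy ε-NE by testing every unilateral deviation; the helper names are ours, and the 2 × 2 coordination game at the end is purely illustrative, not taken from this thesis.

```python
from itertools import product

def is_epsilon_ne(profile, strategy_sets, utility, eps=0.0):
    """Condition (3.1): no player gains more than eps by unilaterally
    deviating from `profile`. `utility(s)` returns a tuple of payoffs."""
    for i, s_set in enumerate(strategy_sets):
        base = utility(profile)[i]
        for s_i in s_set:
            deviation = profile[:i] + (s_i,) + profile[i + 1:]
            if utility(deviation)[i] > base + eps:
                return False
    return True

def pure_epsilon_ne(strategy_sets, utility, eps=0.0):
    """Enumerate all pure strategy eps-NE of a finite game."""
    return [s for s in product(*strategy_sets)
            if is_epsilon_ne(s, strategy_sets, utility, eps)]

# Illustrative 2x2 coordination game (hypothetical payoffs).
payoffs = {("A", "A"): (2, 2), ("A", "B"): (0, 0),
           ("B", "A"): (0, 0), ("B", "B"): (1, 1)}
print(pure_epsilon_ne([["A", "B"], ["A", "B"]], payoffs.get))
# [('A', 'A'), ('B', 'B')]
```

With ε = 0, only the two coordinated profiles survive; raising ε admits additional profiles once the best unilateral gain falls within the indifference buffer.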
In most games, it is assumed that player strategies are selected prior to the game
and then revealed, simultaneously, when the game begins. Nevertheless, it is useful
to have a notion for how player i would respond were she to know the strategy of
each of her rivals beforehand. The following idea meets these requirements.
Definition 3.2 (Best Response Strategy). Let G be a game with n players. Retain all of the notation introduced in Definition 3.1. For i ∈ V and s_{−i} ∈ S_{−i}, s_i ∈ S_i is a best response of player i to s_{−i} if

    U_i(s_i, s_{−i}) ≥ U_i(s_i', s_{−i}) for all s_i' ∈ S_i.    (3.2)

In other words, if player i were to (omnisciently) know her opponents would, collectively, play s_{−i}, she could maximize U_i by playing s_i. To capture the fact that it is possible for player i to have multiple best responses to a given s_{−i}, the best-response relation of player i to s_{−i} ∈ S_{−i} is defined as

    BR_i(s_{−i}) = {s_i ∈ S_i : U_i(s_i, s_{−i}) ≥ U_i(s_i', s_{−i}) for all s_i' ∈ S_i}.    (3.3)
For game G, let S_NE denote the set of Nash equilibria. Nash equilibria and best-response strategies are related by the following condition:

    s* ∈ S_NE  ⟺  s_i* ∈ BR_i(s_{−i}*), ∀ i ∈ V.    (3.4)
This stipulation is consistent with the idea that, in any equilibrium, no player can
unilaterally deviate from s* and increase her utility. Although strategies that comprise
a NE profile are intuitively appealing, in some cases, it makes sense to focus on
strategies that offer guarantees regardless of what strategies one's rivals play. The
following idea serves this purpose.
Definition 3.3 (Maximin Strategy). Let G be a game with n players. The strategy s_i ∈ S_i is a maximin strategy for player i if

    min_{s_{−i} ∈ S_{−i}} U_i(s_i, s_{−i}) ≥ max_{s_i' ∈ S_i} min_{s_{−i} ∈ S_{−i}} U_i(s_i', s_{−i}).    (3.5)
In other words, if player i were concerned that her fellow players may act (intentionally or otherwise) to minimize U_i, then playing a maximin strategy is the intelligent thing for her to do. In general, however, playing a maximin strategy may be overly
Table 3.1: Utility payoffs for the simple two-agent search game in Example 3.4. The possible actions of agent 1 are displayed in the leftmost column. The possible actions of agent 2 are displayed along the top row. Given each agent's search strategy, the first and second entries in each cell represent the utility of agent 1 and agent 2, respectively.
[3 × 3 payoff table: rows are agent 1's strategies a, b, c; columns are agent 2's strategies a, b, c; each cell lists the pair (U_1, U_2) as fractions. The individual entries did not survive extraction.]
conservative. In many cases, player utilities are at least partially aligned and player
i may do herself a disservice by guarding against a worst-case scenario. Nevertheless,
in certain scenarios, including zero-sum games, congestion games, and games where
it is difficult to resolve utility payoffs for overly complex strategy profiles, playing a
maximin strategy may be a level-headed and reasonable course of action.
Example 3.4 (A Simple Search Game). To cement the ideas of a NE, best-response strategy, and maximin strategy, consider a very simple search game, G, involving two search agents (players). In G, a target arrives at a random location in the environment. The probability that an agent captures the target is a function of the agents' initial conditions and the search strategies used by each participant. For simplicity, assume the initial position of each agent is fixed and that there are three abstract search strategies, denoted a, b, and c, from which each agent may choose. Each agent's utility is the probability that she succeeds in capturing the target. Agent utilities for each combination of search strategies are shown in Table 3.1.
By inspecting each combination of search strategies in Table 3.1, we see the profile s* = (s_1*, s_2*) = (b, c) is the only pure strategy NE of G. Under this search profile, the probability that agent 1 finds the target is two-thirds, and the probability that agent 2 finds the target is one-third. From here, if agent 1 deviates, she has at best a sixty percent chance of finding the target. Similarly, if agent 2 deviates, she has at best a twenty-five percent chance of finding the target.
To illustrate the notion of best response, observe that if agent 2 were to play s_2 = a, then agent 1's best response is to play s_1 = b, i.e., BR_1(a) = {b}. Similarly, BR_2(c) = {a, c}. Finally, computing min_{s_2} U_1(s_1, s_2) for each s_1 ∈ {a, b, c} (equations (3.6)-(3.8)) shows that

    max_{s_1} min_{s_2} U_1(s_1, s_2) = 2/3,

attained at s_1 = b. Hence agent 1's maximin strategy is s_1^m = b, and by playing s_1^m agent 1 has at least a two-in-three chance of finding the target.
Following a similar line of reasoning for agent 2, computing min_{s_1} U_2(s_1, s_2) for each s_2 ∈ {a, b, c} (equations (3.9)-(3.11)) shows that

    max_{s_2} min_{s_1} U_2(s_1, s_2) = 1/3,

attained at s_2 = c. Hence agent 2's maximin strategy is s_2^m = c, and by playing s_2^m agent 2 guarantees she has at least a one-in-three chance of finding the target. Note that in this particular example, s* = (s_1^m, s_2^m). In general, two-player zero-sum games, i.e., games in which U_1(s) + U_2(s) is constant ∀ s ∈ S, have the property that s* ∈ S_NE ⟺ s_i* is a maximin strategy of player i, i = 1, 2.
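Because Table 3.1's entries did not survive extraction, the sketch below uses a hypothetical constant-sum payoff matrix chosen only to reproduce every outcome stated in Example 3.4 (unique pure NE at (b, c) with value (2/3, 1/3), BR_1(a) = {b}, BR_2(c) = {a, c}, and maximin values 2/3 and 1/3). The specific fractions are ours, not the thesis's.

```python
from fractions import Fraction as F

# Hypothetical stand-in for Table 3.1. U1[(s1, s2)] is agent 1's capture
# probability; the game is constant-sum, so U2 = 1 - U1. These numbers
# are NOT from the thesis; they only reproduce the example's outcomes.
A = ["a", "b", "c"]
U1 = {("a", "a"): F(1, 4), ("a", "b"): F(3, 5), ("a", "c"): F(3, 5),
      ("b", "a"): F(3, 4), ("b", "b"): F(3, 4), ("b", "c"): F(2, 3),
      ("c", "a"): F(1, 3), ("c", "b"): F(1, 2), ("c", "c"): F(1, 3)}
U2 = {s: 1 - v for s, v in U1.items()}

def br1(s2):  # agent 1's best responses to s2, per Definition 3.2
    best = max(U1[(s1, s2)] for s1 in A)
    return {s1 for s1 in A if U1[(s1, s2)] == best}

def br2(s1):  # agent 2's best responses to s1
    best = max(U2[(s1, s2)] for s2 in A)
    return {s2 for s2 in A if U2[(s1, s2)] == best}

# Pure NE via the fixed-point condition (3.4).
ne = [(s1, s2) for s1 in A for s2 in A
      if s1 in br1(s2) and s2 in br2(s1)]
print(ne)                                     # [('b', 'c')]

# Maximin strategies and security levels, per Definition 3.3.
v1 = {s1: min(U1[(s1, s2)] for s2 in A) for s1 in A}
v2 = {s2: min(U2[(s1, s2)] for s1 in A) for s2 in A}
print(max(v1, key=v1.get), max(v1.values()))  # b 2/3
print(max(v2, key=v2.get), max(v2.values()))  # c 1/3
```

Since the stand-in game is constant-sum, the computed NE and the two maximin strategies coincide, as the example's closing remark predicts.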
Chapter 4
The Cow-Path Ring Problem
This chapter considers a variant of the Cow-Path Problem that takes place on a ring,
instead of the real line. It turns out that pitting two agents against one another in a
contest to capture a target on the real line is not especially interesting. In this case,
there is a unique and very simple equilibrium profile which is independent of each
agent's initial position and the target distribution. We will discuss this strategy in
its entirety in the next chapter. Interesting strategic play can, however, emerge when
the venue shifts to a simple, closed curve, i.e., a ring or a perimeter. Rather than
study search games on rings straight away, this chapter serves to broach the topic
more gradually, by first considering how a single cow should search for a patch of
clover on the ring. Here, the rationale is to first understand how the (intelligent) cow
should search in a relaxed and solitary setting. The insight gleaned from this analysis
will aid in understanding how the cow should, once a second hungry cow is added to
the ring, search in a competitive environment. Equally important, the problem is, in
its own right, an interesting variant of the Cow-Path Problem.
4.1
Introducing The Cow-Path Ring Problem
The Cow-Path Ring Problem, or CPRP for short, was originally considered by the
author in [89]. As the name suggests, the CPRP is closely affiliated with the CPP,
the key difference being that the venue, in which searching takes place, has shifted
from the line to a ring. Because a ring has no endpoints, if and when to double-back
during search is a particularly nebulous proposition for the cow in the CPRP. Whereas
reversing directions is critical to formulating a search plan in the CPP, in the CPRP,
it is possible to find the target by fixing a heading and exhaustively searching the
ring. Of course, there is nothing to say this is an efficient method of finding clover
for a given target distribution. Ultimately, this chapter is about understanding at
which points on the ring to turn and when it is best to simply sweep any remaining
territory. Formally, the stochastic or average case CPRP is defined as follows.
Definition 4.1 (The Cow-Path Ring Problem). A hungry cow, C, is located at an origin point, q_o, on the ring R. C knows that a patch of clover, T, is located somewhere on R, but has only a prior, f_T : R → R_{≥0}, on T's position. C can move at unit speed and can change directions instantaneously. Finally, on account of the clover's small footprint, C can discover T only when she is standing directly over it. How should C search so as to find the clover in minimum expected time?
A few comments are in order. First, as mentioned, the only difference between the CPRP and the CPP is the environment in which searching takes place. C's sensor is, once again, perfectly accurate, but has zero sensing radius. Second, in much of the analysis that follows, it will be assumed f_T is bounded, i.e., f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. While it is perfectly acceptable to consider CPRPs for arbitrary densities f_T, we will develop many of our results for bounded f_T. Moving along, in the CPRP, C has the option, at any point in time, of fixing her heading, once and for all, and exhaustively searching R until she finds T. This approach has no direct analog in the CPP, where C must have a turning contingency or risk never finding the target. (In the event the target density is zero along one half-line of the real line, C's optimal search plan is to explore the other half-line until T is found.) To formally investigate the consequences of this new ability, we require a notational system that is both descriptive and concise. The next section provides these elements. For convenience, Table 4.1 provides a summary of the CPRP-specific notation used in this chapter.
4.2
CPRP Notation and Terminology
The exploration rule C uses to search for T is referred to as a search plan. A generic search plan is represented by s and the set of all feasible search plans by S. To be feasible, s must specify a feasible location for C at every point in time. However, because C is interested in finding T quickly, we can safely assume she will always travel at her maximum speed, i.e., one. Search plan s may then be specified by listing its sequence of turning points, i.e., points on R at which C is to reverse direction:
In this system, should C reach
qn,
(4.1)
i.e., her last scheduled turning point in s, having
not found 7, she reverses direction one last time, before exhaustively searching for and
eventually finding 2. Let Sn denote the set of all search plans that specify n E Zo
turning points, with S = U _Sn the set of all search plans. We now provide a means
to describe specific points on 'Z.
The notation q ∈ R denotes that point q is on R. A specific q ∈ R is referenced by the number in [0, 2π] that represents the counter-clockwise (ccw) distance from q_o to q, where q_o is C's position when the search begins at time zero. Owing to the cyclic nature of R, q may also be referenced by the number in [−2π, 0] whose magnitude represents the clockwise (cw) distance from q_o to q. This circular parity affords some flexibility when referring to points on R; for example, for q ∈ [0, 2π], q, q − 2π, and q + 2π refer to the same point on R. We will frequently switch between these two labeling schemes, opting to employ whichever is most convenient at a given time.
Although listing a sequence of turning points allows us to describe all search
plans of interest, it makes sense to place some restrictions on turning points so as to
proactively exclude search plans that are clearly suboptimal. As a first effort in this
direction, we enforce the following condition on s ∈ S_n:
    q_i · q_{i+1} < 0, for 1 ≤ i ≤ n − 1.    (4.2)
Symbol/Acronym               Meaning/Definition
Q                            the environment
R                            the ring with unit radius
T                            the target (clover)
q ∈ Q                        a point in Q
f_T : Q → [f_min, f_max]     the target density function
C_i for i = 1, . . . , n     the i-th cow or cow i
CPP                          Cow-Path Problem
CPRP                         Cow-Path Ring Problem
CPRG                         Cow-Path Ring Game

Table 4.1: Summary of general and CPRP-specific notation used in the thesis.
In words, (4.2) says that after each turn, C must cross q_o before making her next turn. With a little thought, it is clear that any search plan aiming to find T quickly satisfies (4.2), since turning while still in previously explored territory is clearly wasteful. To illustrate the semantics of the notational system implied by (4.1) and (4.2), consider a search plan s = {q_1, q_2, . . . , q_n} ∈ S_n with q_1 > 0 and n even. In this case, the search for T would evolve as follows:
    q_o →(ccw) q_1 →(cw) q_2 →(ccw) q_3 → · · · →(cw) q_n →(ccw) (2π + q_n).    (4.3)

That is, starting from q_o, C travels in the ccw direction toward q_1. Should C reach q_1 having not found T, she reverses direction and travels toward q_2. Should C reach q_2, still having not found T, she again reverses direction and travels toward q_3, and so on. Finally, should C reach q_n, i.e., her last scheduled turning point in s, with T still proving elusive, she reverses direction one last time before exhaustively searching R and, eventually, finding T. Similar arguments apply when n is odd, q_1 < 0, or both.
The pertinent geometric features of the CPRP, along with a portion of a sample
search plan, are illustrated in Figure 4-1.
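The trajectory semantics just described can also be sketched directly in simulation. The helper below is our own construction (names and conventions are ours, not the thesis's): it returns the time at which a cow following an alternating-sign turning-point sequence first stands over a target at ccw distance u, and a Monte Carlo run confirms that, under a uniform prior, a no-turn ccw sweep finds the target in expected time π.

```python
import math, random

TWO_PI = 2 * math.pi

def discovery_time(u, turns):
    """Time for the cow (start q_o = 0, unit speed) to first stand over a
    target at ccw distance u, following alternating-sign turning points
    `turns` (convention (4.2)); after the last scheduled turn the cow
    sweeps the remainder of the ring. An empty plan means a ccw sweep."""
    t = 0.0           # elapsed time
    a, b = 0.0, 0.0   # explored frontiers: ccw reach a, cw reach b (>= 0)
    v = TWO_PI - u    # target's cw distance from q_o
    for q in turns:
        if q > 0:                     # ccw leg, starting at the cw frontier
            if a < u <= q:
                return t + b + u
            t += b + q
            a = q
        else:                         # cw leg, starting at the ccw frontier
            if b < v <= -q:
                return t + a + v
            t += a - q
            b = -q
    # Final exhaustive sweep, opposite in direction to the last leg.
    if turns and turns[-1] > 0:
        return t + a + v              # sweep cw
    return t + b + u                  # sweep ccw

# Sanity checks for a two-turn plan, s = (0.5, -0.5):
print(discovery_time(0.3, [0.5, -0.5]))   # 0.3: found on the first ccw leg
print(discovery_time(3.0, [0.5, -0.5]))   # 5.0: two turns, then the sweep

# Under a uniform prior, the no-turn ccw sweep takes expected time pi.
random.seed(0)
n = 200_000
est = sum(discovery_time(random.uniform(0, TWO_PI), []) for _ in range(n)) / n
print(abs(est - math.pi) < 0.02)          # True
```

The two explored frontiers, one per direction, are exactly the "ever-expanding frontiers" that the optimality conditions of the next sections formalize.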
With a notational system for describing search plans in place, we are finally in a position to consider the performance of s ∈ S.

Figure 4-1: A visualization of the Cow-Path Ring Problem. The target density, f_T, as a function of radial position q, is shown in blue. In the instance depicted, the target, T, is located on the north-west portion of R. In the search plan shown, the cow (yellow triangle) travels in the ccw direction toward q_1, where, having not found T, she reverses direction and travels in the cw direction toward q_2. Upon reaching q_2, having still not found T, she again reverses direction and continues searching in the ccw direction until ultimately finding T at the site indicated with a red exclamation mark.

Let T_D(s) denote the time it takes for C to discover T using s. Because the location of T is random, so too is T_D(s). Consequently, we focus on E[T_D(s)], the derivation of which is closely based on [12]. Assume, once again, that q_1 > 0 and n is even. By the end of the derivation,
extensions to the cases where n is odd, q_1 < 0, or both will be obvious. To begin, note that E[T_D(s)] can be broken up into two components: 1.) the amount of time C spends traveling from the origin to T on the excursion in which T is discovered and 2.) the amount of time C spends backtracking before finding T. Let these times, each a random variable, be denoted by T_{D,1}(s) and T_{D,2}(s), respectively. First, consider E[T_{D,1}(s)]. Given n is even and q_1 > 0, it must be that q_n < 0 and all points in [0, 2π + q_n] ⊂ R will be visited, if required, for the first time when C is moving in the ccw direction. Similarly, all points in [q_n, 0] ⊂ R will be visited, if required, for the first time when C is moving in the cw direction. Therefore,
    E[T_{D,1}(s)] = ∫_{q=0}^{2π+q_n} q f_T(q) dq + ∫_{q=0}^{|q_n|} q f_T(−q) dq.    (4.4)
Before considering E[T_{D,2}(s)], define, for any y, z ∈ [−2π, 2π] such that yz < 0, F_T(y, z) to be the probability that T is located along the arc of R that contains q_o and has endpoints y and z, i.e.,

    F_T(y, z) = F_T(z, y) = ∫_{q=min(y,z)}^{max(y,z)} f_T(q) dq.    (4.5)

Hence, F_T may be thought of as a type of distribution function taken over arcs of R.
Returning to E[T_{D,2}(s)], if T ∉ [0, q_1], which happens with probability 1 − F_T(q_1, 0), then C will reach q_1 having not found T, and will subsequently have to travel back to q_o, contributing 2q_1 to T_{D,2}(s). Generalizing, if T is not on the arc containing q_o and having endpoints q_{i−1} and q_i, which happens with probability 1 − F_T(q_i, q_{i−1}), then C will reach q_i having not found T and will have to head back to q_o empty-handed, contributing an additional 2|q_i| to T_{D,2}(s). Putting all of these ideas together gives
    E[T_D(s)] = E[T_{D,1}(s)] + E[T_{D,2}(s)]    (4.6)
              = ∫_{q=0}^{2π+q_n} q f_T(q) dq + ∫_{q=0}^{|q_n|} q f_T(−q) dq + 2 Σ_{i=1}^{n} |q_i| (1 − F_T(q_i, q_{i−1})),    (4.7)

where q_0 := 0.
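Equation (4.7) is straightforward to evaluate numerically. The sketch below is our own (the function names are ours, and the midpoint-rule integrator is a stand-in for any quadrature scheme); it computes E[T_D(s)] for plans given under convention (4.2) and checks the uniform-prior case, where the no-turn sweep takes expected time π and any turning only adds backtracking.

```python
import math

TWO_PI = 2 * math.pi

def integrate(g, lo, hi, n=20000):
    # Midpoint rule; a stand-in for any quadrature scheme.
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def arc_prob(f, y, z):
    """F_T(y, z), eq. (4.5): probability the target lies on the arc
    through q_o with endpoints y and z."""
    return integrate(f, min(y, z), max(y, z))

def expected_discovery_time(turns, f):
    """E[T_D(s)] from eq. (4.7). Assumes the derivation's setting: turning
    points alternate in sign starting with q_1 > 0 and len(turns) is even
    (possibly zero). f is defined on [-2*pi, 2*pi], with f(q) = f(q + 2*pi)
    for q < 0."""
    q_n = turns[-1] if turns else 0.0
    # E[T_{D,1}], eq. (4.4): first-visit travel time on each side of q_o.
    t = integrate(lambda q: q * f(q), 0.0, TWO_PI + q_n) \
        + integrate(lambda q: q * f(-q), 0.0, abs(q_n))
    # E[T_{D,2}]: backtracking terms 2|q_i| (1 - F_T(q_i, q_{i-1})), q_0 = 0.
    prev = 0.0
    for q in turns:
        t += 2.0 * abs(q) * (1.0 - arc_prob(f, q, prev))
        prev = q
    return t

uniform = lambda q: 1.0 / TWO_PI   # uniform prior on the unit-radius ring

# Under a uniform prior the no-turn sweep takes expected time pi, and any
# turning only adds backtracking cost.
print(round(expected_discovery_time([], uniform), 4))           # 3.1416
print(expected_discovery_time([0.5, -0.5], uniform) > math.pi)  # True
```

For non-uniform priors, the same function lets one compare candidate plans directly, which is the computational core of the algorithm developed in Section 4.4.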
Let S* denote the set of optimal search plans, i.e., the set of search plans that minimize (4.7). Previously, it was argued that any s* ∈ S* satisfies (4.2). Two additional conditions that s* satisfies are discussed in the following lemma.
Lemma 4.2. Let s* = {q_1*, q_2*, . . . , q_n*} ∈ S* be an optimal search plan for the CPRP. Then s* satisfies the following conditions:

    |q_i*| < |q_{i+2}*|, for 1 ≤ i ≤ n − 2,    (4.8)
    |q_i*| + |q_{i+1}*| ≤ π, for 1 ≤ i ≤ n − 1.    (4.9)
Proof. We address the validity of (4.8) and (4.9) in turn. Assume, to obtain a contradiction, that s* violates (4.8). Then there must be an i ∈ Z_{>0} such that |q_{i+2}*| ≤ |q_i*| in s*. Assuming F_T(q_i*, q_{i+1}*) < 1, there is positive probability that C will have to turn at both q_i* and q_{i+1}*. However, in traveling from q_{i+1}* to q_{i+2}*, C fails to explore any new territory. Consequently, if T has not been found upon C reaching q_{i+1}*, it is guaranteed to remain undiscovered upon C reaching q_{i+2}*. Therefore, by deleting q_{i+1}* and q_{i+2}* from s*, it is possible to realize a strict reduction in E[T_D], an improvement that contradicts the optimality of s*. It follows that (4.8) must hold. Although this analysis has been customized to the ring R, it is essentially the same as the associated proof for the Cow-Path Problem in [12].
To verify (4.9), assume, again to obtain a contradiction, that there exists an s* ∈ S* that satisfies |q_i*| + |q_{i+1}*| > π for some i ∈ Z_{>0}. In the event C turns at q_{i+1}*, she must then backtrack along a previously explored arc of R having length greater than π before exploring new territory. Conversely, by forgoing the turn at q_{i+1}* and simply maintaining course until T is discovered, C is guaranteed to find T in additional time less than π. Since this alternate strategy realizes a strict reduction in cost, we again have a result that contradicts the optimality of s*. It follows that (4.9) must hold. □
In summary, (4.2), (4.8), and (4.9) state that an optimal search plan tasks C with alternately exploring ever-expanding frontiers of R, until such time as half of the ring has been explored, at which point C continues searching in her current direction until T is found. The following strategies are useful in establishing bounds on C's achievable performance.
Definition 4.3 (no-turn search strategies). Let s_cw (respectively, s_ccw) denote the search plan in which C departs from q_o and travels exclusively in the cw (ccw) direction. That is, {s_cw, s_ccw} is the collection of search plans in which C has no built-in contingency for turning. Recall that these strategies have no sensible analogue in the CPP.
Given C moves at unit speed, s_cw and s_ccw immediately yield the following preliminary upper bound:

    E[T_D(s*)] ≤ min(E[T_D(s_cw)], E[T_D(s_ccw)]) ≤ 2π.    (4.10)

The proof of (4.10) follows from the simple fact that, in the case of both s_cw and s_ccw, C is guaranteed to find T after sweeping R, and this sweep can be completed in time at most 2π.
4.3
The number of turns in the CPRP
We now focus our attention on the task of characterizing the optimal number of turning points in the CPRP. Given a search plan s, let #s denote the number of turning points listed in s, i.e., the maximum number of times, n_max, that C could conceivably turn under s. For example, for s = (q_1, . . . , q_n), #s = n. Note #s is, by definition, a property of s, while #s* is a property of f_T. In the CPP, it may be the case that the optimal search plan contains an infinite number of terms. For CPRPs with bounded f_T, the following result states that this is not the case.
Proposition 4.1. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Then #s* is finite.
Before embarking on a proof, we comment on the boundedness assumption on f_T, i.e., that f_T(q) ∈ [f_min, f_max] for q ∈ R. In many applications, it is reasonable to bound the probability of finding T in any interval of finite length away from zero, e.g., when there is no location for which T's absence can be guaranteed. Moreover, it is often the case that it is only over intervals of finite length that one can associate a positive probability to finding T, i.e., the distribution of T has no atoms. For the class of problems for which both of these conditions hold, the assumption that f_T : R → [f_min, f_max] is a reasonable one.
Proof. The key idea of the proof is that, by turning a sufficiently large number of times, the burden of backtracking becomes less appealing than simply using a no-turn strategy, e.g., s_cw. Assume, to obtain a contradiction, that for a given target density f_T : R → [f_min, f_max], #s* = ∞. From (4.9), for any n ∈ Z_{>0}, |q_{2n}*| + |q_{2n+1}*| ≤ π, i.e., at most half of the ring has been explored by the time C reaches q_{2n+1}*. Given f_T(q) ≥ f_min for all q ∈ R, it follows that, with probability p ≥ π f_min, T will remain undiscovered by the time C reaches q_{2n+1}*. It follows that E[T_D(s*)] ≥ p n q̄, where q̄ = min(|q_1*|, |q_2*|) > 0, and, for sufficiently large n, this bound exceeds 2π, which violates (4.10). Yet, from the earlier discussion, (4.10) must hold. This contradiction proves the claim. □
4.4
An iterative algorithm for s*
The discussion to follow centers on determining the minimum number of turn-around points required to realize an optimal search plan, i.e., min{#s* : s* ∈ S*}. To this end, with S_n = {s ∈ S : #s = n}, let

    S_n* = {s ∈ S_n : E[T_D(s)] ≤ E[T_D(s')] for all s' ∈ S_n}    (4.11)

be the set of n-turn optimal search plans.
For convenience, let n* = min{#s* : s* ∈ S*}. The next result captures the intuitive idea that search plans employing more than n* turn-around points can recover optimal performance.
Proposition 4.2. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Let s be a search plan in S_n* with #s = n > n*. Then, for s* ∈ S*, E[T_D(s)] = E[T_D(s*)].
Proof. Intuitively, because s* is optimal, the need to specify additional turning points in s confers no advantage, i.e., we would expect E[T_D(s)] = E[T_D(s*)]. For the time being, consider the case when #s = n* + 1. A sensible strategy for selecting s is to downplay the need to specify an extra turning point by ensuring it stands only an arbitrarily negligible chance of impacting E[T_D(s)]. For example, for small ε > 0, let s = {s*, q_{n*+1}}, where, for s' ∈ S and q ∈ R, {s', q} is the concatenation resulting from adding q to the end of s', and

    q_{n*+1} = −2π · sgn(q_{n*}) + q_{n*} + ε · sgn(q_{n*}).    (4.12)

In words, (4.12) says that when using s, and should it prove necessary, C turns for the last time just before she would have finished sweeping R under s*. Consequently, T_D(s) ≠ T_D(s*) only in the rare instances when T ∈ (q_{n*}, q_{n*} + ε · sgn(q_{n*})) ⊂ R. Given f_T(q) ≤ f_max and that we are free to take ε arbitrarily small,

    lim_{ε→0+} P(T ∈ (q_{n*}, q_{n*} + ε · sgn(q_{n*}))) = 0,    (4.13)

implying lim_{ε→0+} E[T_D(s)] = E[T_D(s*)].

For #s ≥ n* + 2, we can again make similar arguments to ensure the extra turning points are largely superfluous. Namely, for k = 0, 1, . . . , #s − n*, selecting the extra turning points as

    q_{#s−k} = q_{n*} − 0.5^k · ε · sgn(q_{n*}),  k even,
    q_{#s−k} = −2π · sgn(q_{n*}) + q_{n*} + 0.5^k · ε · sgn(q_{n*}),  k odd,    (4.14)

and taking the search strategy as s = (q_1*, . . . , q_{n*−1}*, q_{n*}, . . . , q_{#s}) ensures that lim_{ε→0+} E[T_D(s)] = E[T_D(s*)]. □
The proof of Proposition 4.2 provides insight into how turning points beyond the necessary n* can be mitigated to ensure optimality. It also suggests the following straightforward approach, provided below in Algorithm 4.1, to compute n*.
A few comments are in order. First, s_n* in line 3 is found by minimizing (4.7) which, for a given n, is nonlinear in the decision variables q_1, . . . , q_n. A natural question concerning Algorithm 4.1 is whether or not termination can be guaranteed. If it can, then both n* and, more importantly, again from line 3, s* ∈ S* are known. Intuitively, using more turning points when searching cannot increase E[T_D(s_n*)]. Along the lines of Proposition 4.2, it seems reasonable that E[T_D(s_n*)] will decrease monotonically in n for n ≤ n* and, based on Proposition 4.2, E[T_D(s_n*)] = E[T_D(s*)] for n ≥ n*. The following discussion provides further insight into the CPRP.
Algorithm 4.1: iterative algorithm for finding s*
1  n ← 2;
2  while true do
3      find s*_n;
4      if |q_{n-1}| + |q_n| ≈ 2π then
5          n* = n - 1;
6          break
7      else
8          n ← n + 1;
9  return s*
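For intuition, E[T_D(s)] can also be estimated by direct simulation, sweeping the ring leg by leg rather than evaluating the closed form (4.7). The sketch below is our own illustration, not code from the thesis: turning points are assumed to be given as signed cumulative angles from the cow's start (q_1 > 0 for an initial counterclockwise leg, signs alternating thereafter), and `expected_discovery_time` is a hypothetical helper name.

```python
import math

TWO_PI = 2.0 * math.pi

def expected_discovery_time(turns, density, n_cells=4096):
    """Estimate E[T_D(s)] for a unit-speed cow starting at angle 0.

    `turns` lists the signed cumulative turning points q_1, q_2, ...
    (an empty list is the never-turn plan); `density` is f_T on
    [0, 2*pi).  A numerical stand-in for the closed form (4.7)."""
    cell = TWO_PI / n_cells
    theta = [(j + 0.5) * cell for j in range(n_cells)]   # ring cells
    w = [density(th) * cell for th in theta]             # cell masses
    t_first = [0.0] * n_cells
    jp, jm = 0, n_cells - 1   # next uncovered cell on the +/- side
    pos, t = 0.0, 0.0

    # after the last scheduled turn the cow keeps going until the
    # whole ring is swept; 3 circumferences is simply "far enough"
    if turns:
        final = turns[-1] - math.copysign(3.0 * TWO_PI, turns[-1])
    else:
        final = 3.0 * TWO_PI

    for wp in list(turns) + [final]:
        if jp > jm:
            break                          # ring fully explored
        if wp > pos:                       # ccw leg: + frontier grows
            while jp <= jm and theta[jp] <= wp:
                t_first[jp] = t + (theta[jp] - pos)
                jp += 1
            t += wp - pos
        else:                              # cw leg: - frontier grows
            while jm >= jp and theta[jm] - TWO_PI >= wp:
                t_first[jm] = t + (pos - (theta[jm] - TWO_PI))
                jm -= 1
            t += pos - wp
        pos = wp

    return sum(wi * ti for wi, ti in zip(w, t_first)) / sum(w)
```

Grid-searching this function over candidate turning points gives a crude stand-in for line 3 of Algorithm 4.1; under the uniform density it reproduces the expected discovery time π of the never-turn plan, while a gratuitous turn at π raises the expectation to 3π/2.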
Lemma 4.4. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max. Suppose an s* = {q_1*, ..., q_{n*}*} ∈ S* has a first turning point that satisfies |q_1*| ≥ m for some m > 0. Then the optimal number of turns satisfies n* ≤ ⌊2/(m f_min)⌋.
Proof. Assume the clause of the lemma is true, i.e., there exists an s* = {q_1*, ..., q_{n*}*} ∈ S* with |q_1*| ≥ m. Let A be the event that, under s*, C has to turn n* times before finding T, i.e., C finds T having made all of her scheduled turns. From (4.9) and the optimality of s*, C will make no further turns once the explored segment of R reaches length π. Given f_T(q) ≥ f_min ∀q ∈ R, it follows that P(A) ≥ π f_min. Assuming A occurs, C will have to travel a distance no less than m between each successive turn. Conditioning on A, the last result implies that

    E[T_D(s*)] ≥ E[T_D(s*) | A] P(A)    (4.15)
               ≥ (m n*)(π f_min) = π m f_min n*.    (4.16)

Given E[T_D(s_c)] ≤ 2π, the optimality of s* requires that

    π m f_min n* ≤ E[T_D(s*)] ≤ E[T_D(s_c)] ≤ 2π.    (4.17)

Working from the left- and rightmost terms of (4.17) and solving for n* subject to the constraint that n* ∈ Z_{≥0} gives the required result. ∎
A key component of the development of Lemma 4.4 was the fact that the first turning point was at least distance m > 0 away from q_0, i.e., the distance D(q_0, q_1) ≥ m. The following lemma provides a bound on m as a function of the variation in f_T along R.
Lemma 4.5. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Let s* = {q_1*, q_2*, ...} be an optimal search plan. Then at least one of the following cases is true:

    |q_1*| ≥ 1/(4 f_max)    (4.18)

or

    |q_2*| ≥ 1/(2 f_max).    (4.19)
Proof. The only terms that involve q_1 in (4.7) originate within the summation and are associated with doubling back. Without loss of generality, we may assume that q_1* > 0 and, therefore, q_2* < 0. In this case, the terms involving q_1* in E[T_D(s)] are 2q_1*(1 - F_T(q_1*, 0)) and 2|q_2*|(1 - F_T(q_1*, q_2*)). Because s* is optimal, there is no advantage to making small adjustments in q_1*, such that

    d/dq_1 (E[T_D(s)]) = 1 - F_T(q_1*, 0) - q_1* f_T(q_1*) + q_2* f_T(q_1*) = 0,    (4.20)

where (4.20) makes use of the fact that q_2* < 0 ⇒ |q_2*| = -q_2*. Given f_T(q) ∈ [f_min, f_max], it follows that F_T(q_1*, 0) ≤ f_max q_1* and f_T(q_1*) ≤ f_max. Applying these bounds to (4.20) gives

    2 f_max q_1* ≥ 1 + f_max q_2*.    (4.21)

With q_2* < 0, 1 + f_max q_2* < 1. Now, if 1 + f_max q_2* ≥ 1/2, then (4.21) immediately gives (4.18). Otherwise, 1 + f_max q_2* < 1/2, which implies (4.19). ∎
A few observations follow. First, the preceding result says that by the end of her second turn, C has ventured a distance at least 1/(4 f_max) from q_0. Second, both (4.18) and (4.19) depend on f_max, but not on f_min. Intuitively, larger f_max values are indicative of f_T exhibiting more region-specific characteristics over R, which may encourage more aggressive turning early in the search. Finally, Lemma 4.4 came with the stipulation that |q_1*| ≥ m, for positive m. Clearly, if (4.18) holds, m may be taken as 1/(4 f_max). In the event (4.19) holds, then C must travel a distance |q_2*| after each of at least n* - 1 of her turns. In this case, the proof of Lemma 4.4 can be easily tailored to develop a similar upper bound on n*. Moreover, the derivation of m suggests that it may be possible to use similar techniques to bound |q_2*|, |q_3*|, etc.
The following theorem pairs Lemma 4.4 with Lemma 4.5 to establish an upper bound on n*.
Theorem 4.6. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. The optimal number of turning points satisfies

    n* ≤ n_ub = ⌊8 f_max / f_min⌋.    (4.22)
Proof. The result follows from successive application of Lemmas 4.4 and 4.5. Assume (4.18) holds, such that |q_1*| ≥ 1/(4 f_max). Then applying Lemma 4.4 for m = 1/(4 f_max) gives n* ≤ ⌊8 f_max / f_min⌋. Otherwise, from Lemma 4.5, (4.19) must hold, such that |q_2*| ≥ 1/(2 f_max). Applying Lemma 4.4, this time for q_2*, and remembering to count the first turn gives n* ≤ ⌊4 f_max / f_min⌋ + 1. Moreover, because f_max ≥ f_min, we have that ⌊4 f_max / f_min⌋ + 1 ≤ ⌊8 f_max / f_min⌋, and so n* ≤ ⌊8 f_max / f_min⌋. ∎
A few remarks are in order. First, the bound on n* is proportional to the ratio f_max/f_min. An f_T with one or more peaks will have a higher f_max/f_min than an f_T possessing little variation, i.e., one that is more uniform, over R. Peaks in f_T make it easier to justify turning around and, in this respect, the proportional dependence on f_max/f_min is in line with our intuition. Second, the bound in Theorem 4.6 is, at least in certain cases, rather conservative. To see why, consider that when f_T is uniform, i.e., f_min = f_max = 1/(2π), the only optimal search plans are the circumnavigation plans s_cw and s_ccw. Hence, for this case, although it is clear that n* = 1, the bound in Theorem 4.6 can only assert that the maximum number of turning times satisfies n* ≤ 8.
4.5 A direct algorithm for finding s*

The iterative nature of Algorithm 4.1 materialized as a consequence of not knowing whether or not the luxury of an extra turning point could reduce E[T_D(s)]. Given the upper bound in Theorem 4.6, we can approach the CPRP from a more informed vantage point. Specifically, we now know that C requires no more than n_ub turns to efficiently find T. Algorithm 4.2, provided below, combines the main results of Proposition 4.2 and Theorem 4.6 to provide a direct means of computing s*. It is prudent to comment on a couple of the finer points of the algorithm. First, minimizing (4.7) in line 2 involves solving a nonlinear, non-convex optimization problem in the decision variables q_1, q_2, ..., q_{n_ub}, subject to constraints (4.2), (4.8), and (4.9). Consequently, the optimization is potentially challenging. Second, the pruning in lines 3 and 4 aims to counteract the potential looseness of the upper bound on n* by removing superfluous turning points, yielding a concise search plan that finds T in the minimum expected time.
Algorithm 4.2: direct algorithm for finding s*
1  let n_ub equal the upper bound in Theorem 4.6;
2  find an s*_{n_ub} ∈ S_{n_ub} that minimizes (4.7);
3  let n* = min{n ∈ {1, ..., n_ub} : |q_n| + |q_{n+1}| ≈ 2π};
4  let s* equal the first n* turning points in s*_{n_ub}
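The pruning in lines 3 and 4 is simple enough to state directly in code. The helper below is our own sketch, not code from the thesis: it is 0-indexed, and a numerical tolerance stands in for the ≈ test in line 3.

```python
import math

def prune(turns, tol=1e-6):
    """Sketch of lines 3-4 of Algorithm 4.2: keep turning points only
    up to the first consecutive pair whose magnitudes already span the
    ring, i.e., |q_n| + |q_{n+1}| ~ 2*pi; later turns are superfluous."""
    for n in range(len(turns) - 1):
        if abs(turns[n]) + abs(turns[n + 1]) >= 2.0 * math.pi - tol:
            return turns[: n + 1]
    return list(turns)
```

For example, a plan whose first two turning points already sweep the whole ring is cut back to its first turn, while a plan whose consecutive turns fall short is left intact.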
4.6 Summary of the CPRP
To summarize, the CPRP considered how a cow should search, unencumbered, to find a target on a ring in minimum expected time. We showed that her optimal search plan satisfies (4.2), (4.8), and (4.9). Moreover, this plan requires her to turn around at no more than n_ub points. The exact location of the turn-around points may be determined, as a function of f_T, by solving a nonlinear optimization problem in the decision variables q_1, q_2, ..., q_{n_ub} and applying the pruning measures outlined in Algorithm 4.2. Alternatively, an optimal search plan may be found, as in Algorithm 4.1, by finding the optimal search plan associated with a fixed number of turning points, and increasing this number by one until no further reductions in the discovery time are possible. We remark that the CPP is infinite-dimensional and obtaining numeric solutions typically involves using dynamic programming to solve a discretized version of the problem. In contrast, the cyclic topology of R, coupled with the boundedness assumption f_min ≤ f_T(q) ≤ f_max for all q ∈ R, permitted us to bound the maximum number of turns required to solve the CPRP in its native continuous domain using nonlinear optimization techniques.
In the next chapter, a second hungry cow is added to R and the search for clover becomes competitive in a game-theoretic sense.
Chapter 5
The Cow-Path Ring Game
This chapter considers the problem in which two or more hungry cows compete to find a clover patch located somewhere on R. Because many elements of this scenario draw heavily from the CPRP, we refer to it as the Cow-Path Ring Game, or simply the CPRG. Introducing and analyzing this problem is the core contribution of the thesis. Moreover, the subsequent chapters build on this problem and consider variations of the standard framework. Accordingly, we invest in describing the particulars of the encounter and formulating the problem before considering effective search strategies for the cows to use.
5.1 Adding a second cow to the ring
Adding a second hungry cow to R calls for a number of notational extensions and operational clarifications. This section provides the necessary additions. First, the clover is once again referred to as a target and denoted by T. The cows contesting T are identified by an index, 1, ..., n, with cow i denoted as C_i. The set of all cows is given by C = {C_1, C_2, ..., C_n} and the set of all cows excluding C_i by C_{-i} := C \ C_i. The movement and sensing capabilities of C_i in the CPRG carry over directly from the CPRP. The position of C_i at time t ≥ 0 is denoted by q_i(t) ∈ R and the trajectory of C_i over [0, t] by q_{i,t} : [0, t] → R, such that q_{i,t}(τ) = q_i(τ), ∀τ ∈ [0, t]. As in the CPRP, C_i has a prior f_T : R → R_{≥0} on T's location, but she does not know T's exact position. Note that, for the time being, it is assumed each cow has the same prior on T. In general, how C_i searches for T in the CPRG will reflect an awareness of the other hungry cows on R. To emphasize the strategic nature of the encounter, the approach that C_i uses to search for T will be referred to as her search strategy. C_i's search strategy is denoted by s_i and the set of all possible search strategies at her disposal by S_i. The collection of search strategies in a game is given by the search profile s ∈ S = S_1 × ... × S_n. Given this chapter and all future discussions focus on CPRGs, we trust there will be no confusion between this convention and the similar notation used to describe search plans for the CPRP in Chapter 4.
Naturally, how C_i should search when competing to find T depends heavily on her awareness of the environment. As with the CPRP, this includes the prior information, e.g., f_T, she has on the clover's location. However, C_i's search strategy now also depends on the knowledge she has regarding the recent whereabouts of rival cows. The level of awareness a cow has regarding these elements is characterized by her information model of the game. Given the pronounced role this knowledge has in shaping search strategies, the following section is devoted to specifying the information model of each cow in the CPRG.
5.2 A model for informed cows
In a general n-player game, player i's information model specifies the information available to her throughout the course of the game. In a continuous-time game, player i has a closed-loop or feedback information model if, at each instant t ≥ 0, she has complete knowledge of all previous actions taken by every other player [11]. Conversely, player i has an open-loop information model if, at each instant t ≥ 0, she has no knowledge about the previous actions of any other player. In a feedback game, each player has a feedback information model. Likewise, in an open-loop game, each player has an open-loop information model.
The CPRG we will consider is a continuous-time, closed-loop game. Denoting C_i's information model by I_i, the latter point is emphasized by writing I_i = I_i^cl, i = 1, ..., n. Because the CPRG centers on searching a continuous workspace and evolves in continuous time, it is worth refining the generic notion of a closed-loop game to more closely reflect the situation at hand. First, stemming from I_i = I_i^cl, for each t ≥ 0, and for each C_i ∈ C, C_i knows q_{j,t} for all C_j ∈ C \ {C_i}. Of course, it is assumed that C_i also knows her own trajectory, i.e., q_{i,t}. Let

    Γ(t) := {q_{i,t} : C_i ∈ C}    (5.1)

denote the set of all cow trajectories taken up to time t ≥ 0. Then, in the CPRG, C_i's knowledge at any instant t ≥ 0 is comprised of the target prior and the previous and current positions of all cows, i.e., I_i(t) = (f_T, Γ(t)). The feedback nature of the CPRG allows C_i to use high-performance search strategies capable of responding to the actions of her rivals in real time. That is, with Ω = {turn, straight}, or equivalently {cw, ccw}, the set of steering commands available to C_i, C_i's search strategies in the CPRG take the form of mappings from I_i(t) to Ω, i.e.,

    S_i = {s_i : s_i(f_T, Γ(t)) → Ω, t ≥ 0}.    (5.2)
Conversely, in an open-loop CPRG, for each instant t ≥ 0, and for each C_i ∈ C, C_i has no knowledge of q_j(t) for any C_j ∈ C \ {C_i}. An open-loop model effectively corresponds to cows that are extremely nearsighted, such that C_i can gather knowledge of C_j's position, j ≠ i, only in the event C_i and C_j collide while searching. Unsurprisingly, the restricted sensory infrastructure of these encounters necessitates much of the planning be done offline, in advance. A treatment of open-loop CPRGs can be found in [96]. The results reported therein chronicle an independent analysis of open-loop Cow-Path Games; their nature stands in stark contrast to, and thus complements, the contributions of this thesis, underlining the defining role I_i(t) has in shaping not only the search strategies cows ought to use, but also the analytic techniques used to study these games.
  Symbol/Acronym                  Meaning/Definition
  s_i                             the search strategy of C_i
  S_i                             the set of all of C_i's search strategies
  S = S_1 × ... × S_n             the set of search strategy profiles
  q_{i,t}                         trajectory of C_i over [0, t]
  L_i(s)                          the landclaim of C_i under s ∈ S
  U_i(s)                          the utility of C_i under s ∈ S
  I_i                             information model of C_i
  s* = (s_1*, s_2*, ..., s_n*)    an equilibrium search profile in S

Table 5.1: Summary of CPRG-specific Notation
5.3 Defining the Cow-Path Ring Game
In this section, we formally define the CPRG. The preceding sections have provided much of the machinery needed to discuss competitive search on R. However, we have yet to update one very important detail: the objective of C_i. A game featuring n cows, but just a single target, necessitates a reexamination of bovine logic. To this end, we assume that, although C_i would prefer to discover T early in the encounter rather than late, it is more important, given the scarcity of clover, that she be the first to find it. In other words, C_i searches to maximize her chance of finding the clover, irrespective of at what time in the game capture may occur. Although this thesis will focus primarily on CPRGs featuring two cows, in the interest of generality, we provide a formal statement of the n-cow CPRG.
Definition 5.1 (Cow-Path Ring Game). Consider n hungry cows, C_i for i ∈ {1, ..., n}, initially positioned at q_i(0) ∈ R, respectively. Each C_i knows that a clover patch, T, is located somewhere on R, but has only a prior f_T : R → R_{≥0} (the same for each cow) on T's position. Each C_i can move at unit speed and change directions instantaneously. On account of T's small footprint, C_i can discover T only when standing directly over it. Finally, C_i has a closed-loop information model, i.e., I_i = I_i^cl, allowing her to track the movement of her rivals in real time. What search strategy should C_i use to maximize her probability of finding T?
A few comments are in order. First, Definition 5.1 is presented in a conversational tone. This approach was favored to retain ties with the CPP and elucidate the simplicity of the formulation. Alternatively, the CPRG may be expressed more compactly by the tuple G_CPRG = (C_i, R, I_i, S_i, U_i)_{i=1,...,n}. Here, the meaning of each component is to be understood from previous discussion. In later chapters, where the formulations become more involved, we will rely on tuple notation to streamline the presentation. Second, we reiterate that the CPRG is a game, not only in the vernacular but, more importantly, in the game-theoretic sense. Accordingly, we will use game theory to develop strategic algorithms the cows may use to search for T. Third, in any instance of the game, nature determines T's location and, in this way, contributes to the outcome of the game. Table 5.1 summarizes the notation used to describe the CPRG.
5.4 A remark about Cow-Path games on the line
Given the CPRG originated as an adversarial re-imagining of the CPP, it is worth commenting on our rationale for studying encounters on a ring, as opposed to a line. The primary reason for this selection follows from the fact that the majority of the games studied in this thesis involve two cows. As it turns out, a contest between two cows to find a target on a line (finite, i.e., [a, b], or infinite, i.e., R) admits a rather uninteresting NE. Specifically, the unique NE of the game takes the form (ŝ, ŝ), where the functionality of ŝ is described in Algorithm 5.1.
Algorithm 5.1: functionality of s_i = ŝ
   input: q_1(0) and q_2(0)
1  align to face C_{-i};
2  while q_i(t) ≠ q_{-i}(t) and q_i(t) ≠ q_{-i}(0) do
3      go straight;
4  turn around;
5  go straight;
Figure 5-1 illustrates how a competitive search game between two cows on a line would resolve under (ŝ, ŝ). Justification that (ŝ, ŝ) ∈ S_NE follows from the fact that any deviation on the part of C_i must involve surrendering the privilege to visit territory first or attempting to switch the relative orientation of the cows on the line, i.e., the leftmost cow becomes the rightmost cow. Neither of these options can ever be both appealing to C_i and realizable under s_{-i} = ŝ. What is uneventful about ŝ is that it is entirely independent of f_T and only loosely dependent on q_1(0) and q_2(0). Based on these observations, we chose to study the CPRG because we felt R was the simplest topological workspace that afforded interesting options for strategic play.
Figure 5-1: A visualization of the Cow-Path Line Game. The target density, f_T, is shown in blue. The unique equilibrium search strategy of each cow, s* = (ŝ, ŝ), is indicated by a directed gray line. Under s*, C_i heads toward C_{-i} and, just before meeting, reverses direction and visits any previously unexplored territory.
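As a sanity check on this claim, under (ŝ, ŝ) the landclaims split at the midpoint between the two starting positions, so each utility reduces to one integral of f_T. A minimal numerical sketch, assuming the workspace is [0, 1], unit-speed cows with x1 < x2, and our own helper name `line_game_utilities`:

```python
def line_game_utilities(x1, x2, density, n=100_000):
    """Utilities of the two cows on the segment [0, 1] under the
    line-game equilibrium (s_hat, s_hat): each cow heads toward her
    rival, reverses just before meeting, and sweeps back over her own
    side.  Starting at x1 < x2 and moving at unit speed, the cows meet
    at m = (x1 + x2) / 2, so C_1's landclaim is [0, m]."""
    m = 0.5 * (x1 + x2)
    h = 1.0 / n
    total = u1 = 0.0
    for j in range(n):                 # midpoint-rule integration of f_T
        x = (j + 0.5) * h
        w = density(x) * h
        total += w
        if x < m:
            u1 += w
    u1 /= total
    return u1, 1.0 - u1
```

Note how the outcome depends only on the midpoint (x1 + x2)/2 and the mass of f_T on either side of it, echoing the observation that ŝ itself ignores f_T entirely.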
5.5 CPRG-specific notation and terminology
We assume that the CPRG ends when each (intelligent) cow can deduce that R has been completely explored and, thus, T has been found. In other words, we are interested in the behavior of the cows only up to and including the time that T is captured. The notation C_i ← T denotes the event that C_i finds T. Recalling the definition of a NE provided in Chapter 3, let S_NE denote the set of NE search profiles and s* ∈ S_NE a NE. Figure 5-2, provided below, gives a visual representation of the CPRG.
Figure 5-2: An instance of the CPRG illustrating the initial positions and initial headings of cows C_1 and C_2. The trajectories of both cows, right up to the point of capture, are shown in dark gray. The target density f_T achieves a global maximum in [-π/4, 0]. In the instance shown, T is located along the north-west portion of R. The site at which T is found, in this case by C_2, is indicated with a red exclamation mark.
The following concepts relate to the territory on R that C_i is the first cow to search, which, in turn, directly affects C_i's chances of finding T. Specifically, define the landclaim of C_i under s ∈ S as

    L_i(s) := {q ∈ R : C_i is first to visit q under s}.    (5.3)

The utility that C_i derives from s may then be expressed as

    U_i(s) = P(C_i ← T under s) = ∫_{q ∈ L_i(s)} f_T(q) dq.    (5.4)

Intuitively, effective search strategies will, as a function of the initial conditions, guide C_i toward unexplored regions of R where f_T is large. Of course, they will also factor in where rival cows are located. The notation and terminology introduced thus far will allow us to speak concisely and definitively when analyzing competitive search games on rings.
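The definitions (5.3) and (5.4) suggest a direct numerical approximation: discretize R, simulate the cows, and credit each cell to its first visitor. The sketch below is our own illustration, not code from the thesis; it restricts attention to simple open-loop behaviors in which each cow fixes a heading and reverses at most once, at a prescribed time.

```python
import math

TWO_PI = 2.0 * math.pi

def utilities(starts, headings, turn_times, density, n_cells=2048):
    """Approximate U_i(s) of (5.4) for unit-speed cows on the ring.

    Cow i starts at starts[i], heads in headings[i] (+1 ccw, -1 cw),
    and reverses once at turn_times[i] (2*pi means never turn).  Each
    cell goes to its first visitor, i.e., the landclaim of (5.3); ties
    are broken toward the lower index.  A sketch of ours."""
    cell = TWO_PI / n_cells
    dt = cell / 4.0
    owner = [-1] * n_cells
    unclaimed = n_cells
    pos = list(starts)
    head = list(headings)
    turned = [False] * len(starts)
    t = 0.0
    while unclaimed > 0 and t < 4.0 * TWO_PI:   # horizon: "long enough"
        for i in range(len(pos)):
            j = int((pos[i] % TWO_PI) / cell) % n_cells
            if owner[j] < 0:
                owner[j] = i                    # first to visit q (5.3)
                unclaimed -= 1
        for i in range(len(pos)):
            if (not turned[i] and t >= turn_times[i]
                    and turn_times[i] < TWO_PI):   # 2*pi: never turn
                head[i] = -head[i]
                turned[i] = True
            pos[i] += head[i] * dt
        t += dt
    mass = [0.0] * len(starts)
    total = 0.0
    for j in range(n_cells):
        w = density((j + 0.5) * cell) * cell
        total += w
        if owner[j] >= 0:
            mass[owner[j]] += w
    return [m / total for m in mass]
```

For example, two cows starting diametrically opposite and never turning split a uniform f_T evenly, and the utilities always sum to one once the ring is fully claimed, consistent with the constant-sum structure discussed below.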
5.6 Search strategies in the CPRG
Having defined the CPRG and introduced a suite of notation and terminology, we are finally in a position to consider search strategies for the cows. We begin by qualitatively describing a collection of possible maneuvers that may be integrated into the search activities of C_i. In so doing, the intent is to provide evidence for why CPRGs are challenging objects of study and use the insight gained to guide our analysis. In this respect, we will focus on CPRGs featuring two cows: C_1 and C_2.

To gain an appreciation for the types of complex decisions C_1 and C_2 face in the CPRG, we return to the scenario in Figure 5-2. In particular, focus on C_1 and the arc R' ⊂ R described by R' = {q ∈ R : -π/4 ≤ q ≤ 0}. Clearly, f_T achieves its global maximum on R', and R' is a region each cow aspires to visit first. We are tempted to ask: should C_1 explore this region immediately, or is she better off to set it aside for "safe keeping" and return to it later? The answer, of course, is highly contingent on C_2's strategy. For example, C_2 could threaten, and perhaps occasionally follow through with, raids into territories that C_1 values highly and has her own aspirations of searching first, e.g., R'. In the next section, we consider a restricted version of the CPRG that is more amenable to a first analysis of the underlying strategic dynamics at play.
5.7 The one-turn, two-cow CPRG
This section considers the class of feedback CPRGs in which each cow may turn at most once. At first glance, this stipulation may appear overly restrictive. It turns out, however, that there is much to be gained by analyzing precisely this class of game. Moreover, in the next section, we will consider games in which the cows may turn up to a finite number of times. The exposition of optimal strategies in these finite-turn games is greatly expedited by the forthcoming analysis of one-turn games. Formally, the feedback, one-turn, two-cow CPRG, or simply the 1T-CPRG, is defined as follows.
Definition 5.2 (One-Turn Cow-Path Ring Game). The 1T-CPRG is a special case of the CPRG that features the following amendments:

(1) n = 2, i.e., the game involves two cows, referred to as C_1 and C_2.

(2) define Φ = {cw, ccw} to be the set of directions in which a cow can travel on R. C_i, i = 1, 2, has initial heading φ_i(0) ∈ Φ.

(3) C_i, i = 1, 2, may turn at most once during the game, where a turn consists of C_i changing her heading from cw to ccw or from ccw to cw.
In analyzing the 1T-CPRG, it proves advantageous to work in terms of turning times, rather than turn-around points. Given that it can be assumed the cows travel at unit speed, each of these quantities may be readily determined from the other given the initial condition q_i(0). Two specific player strategies, denoted s_pass and s_reac, feature prominently in the analysis that follows. We describe the finer points of each at this time.
Definition 5.3 (always-turn passive search strategies). Let S_{i,pass} ⊂ S_i denote the set of strategies that mandate C_i turn around at a specific time, irrespective of C_{-i}'s past positions and current location, i.e., independent of q_{-i,t}. Specifically,

    S_{i,pass} = {s ∈ S_i : s = pass(t) for some t ∈ [0, 2π]},    (5.5)

where

    (s_i = pass(t)) ≡ C_i turns around at time t, irrespective of q_{-i,t}.    (5.6)

In (5.6), pass(t) reads "always turn at time t". The stipulation that t ∈ [0, 2π] for pass(t) ∈ S_{i,pass} stems from C_i's selfish nature; namely, she has no good reason for turning around if she has already circumnavigated R. We remark that within the confines of this notational system, pass(2π) is the strategy in which C_i never turns. In this case, C_i's initial heading, φ_i(0), determines the direction she travels around R.
While strategies in S_{i,pass} essentially ignore the actions of C_{-i}, the strategies in S_{i,reac} make use of the game-theoretic concept of best response. The specification of S_{i,reac} requires a few additional notions. Let I_{i≽-i}(s_i, s_{-i}) be the indicator function defined by

    I_{i≽-i}(s_i, s_{-i}) = 1 if C_i turns no sooner than C_{-i} under s = (s_i, s_{-i}),
                            0 otherwise.

Also, let S_{i≽-i}(s_{-i}) = {s ∈ S_i : I_{i≽-i}(s, s_{-i}) = 1}. For the CPRG, define the follower best-response of C_i to s_{-i} ∈ S_{-i} as

    BR_i^f(s_{-i}) = {s ∈ S_{i≽-i}(s_{-i}) : U_i(s, s_{-i}) ≥ U_i(s', s_{-i}) ∀s' ∈ S_{i≽-i}(s_{-i})}.    (5.7)
Definition 5.4 (eventually-turn reactive search strategies). Let S_{i,reac} ⊂ S_i denote the set of reactive strategies in which C_i turns by no later than a specific time, but may turn earlier depending on C_{-i}'s actions, i.e., dependent on q_{-i,t}. Specifically,

    S_{i,reac} = {s ∈ S_i : s = reac(t) for some t ∈ [0, 2π]},

where

    (s_i = reac(t)) ≡ C_i turns per BR_i^f if C_{-i} turns at τ < t,
                      C_i turns at t otherwise.    (5.8)

In (5.8), reac(t) reads "eventually turn by time t". Having defined s_pass and s_reac,
we are now in a position to begin characterizing an equilibrium of the 1T-CPRG.
To begin, consider the profile s = (s_1, s_2) where s_1 = pass(2π) and s_2 = pass(2π), i.e., the profile in which neither cow turns. If neither cow can improve her utility by more than ε by unilaterally deviating from s, then s is an ε-NE of the 1T-CPRG. On the other hand, if one of the cows, say C_i, has an incentive to deviate, then C_i must favor turning at some time t ∈ [0, 2π) over her current approach. Moreover, if C_i could guarantee that she will turn no later than C_{-i}, i.e., that I_{-i≽i} = 1, then her optimal turning time would be given by

    t_{i,1} = min argmax_{t_i ∈ [0,2π]} U_i(pass(t_i), argmax_{t_i ≤ t_{-i} ≤ 2π} U_{-i}(pass(t_i), pass(t_{-i}))),    (5.9)
and C_{-i}'s follower best-response would be to turn at time

    t_{-i,1} = min argmax_{t_{i,1} ≤ t_{-i} ≤ 2π} U_{-i}(pass(t_{i,1}), pass(t_{-i})).    (5.10)

Note that knowledge of f_T and the assumption that cows are intelligent permit C_i and C_{-i} to compute (5.9) and (5.10), respectively. The feedback nature of the game, i.e., I_i = I_i^cl, i = 1, 2, allows C_i to implement the desired search strategies. For s ∈ S, T will eventually be found by a cow, implying U_1(s) + U_2(s) = 1. For this reason, the 1T-CPRG is said to be a constant-sum game, such that if shifting strategy profiles, say from s to s', causes U_i to increase by u > 0, then it necessarily also causes U_{-i} to decrease by u. From this perspective, (5.9) is reminiscent of C_i using a maximin strategy: an approach aimed at safeguarding her utility from C_{-i}. (5.9) is, however, not a true maximin strategy because the second argmax is taken over a restricted set of times, namely those that ensure C_{-i} turns no sooner than C_i, rather than being taken over the set of all turning times, i.e., [0, 2π].
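Discretizing the turning times makes (5.9) and (5.10) directly computable: each "min argmax" becomes "earliest maximizer" over a finite grid. The sketch below is a toy rendering of ours, not the thesis's implementation, taking an abstract constant-sum payoff u1(t1, t2) in place of the utilities induced by f_T.

```python
def earliest_max(cands, f, tol=1e-12):
    """min argmax: the earliest candidate attaining the maximum of f."""
    best = max(f(t) for t in cands)
    return min(t for t in cands if f(t) >= best - tol)

def leader_follower(times, u1):
    """Sketch of (5.9)-(5.10): cow 1 picks the earliest turning time
    maximizing her utility, assuming her rival then best-responds
    among times no earlier than hers.  Returns (t_{i,1}, t_{-i,1}).
    u1(t1, t2) is cow 1's utility; the game is constant sum, so cow 2
    receives 1 - u1.  All names here are our own."""
    def follower_time(t1):                         # (5.10) for given t1
        cand = [t for t in times if t >= t1]
        return earliest_max(cand, lambda t2: 1.0 - u1(t1, t2))
    t1 = earliest_max(times, lambda t: u1(t, follower_time(t)))  # (5.9)
    return t1, follower_time(t1)
```

In the real game, u1 would be computed from the landclaims induced by the two pass strategies; here any constant-sum payoff illustrates the mechanics.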
Although (5.9) ensures C_i is doing the best she can to find T, given the privilege of turning no later than her rival, there remains the question of whether or not C_{-i} would be content turning second. If the answer is yes, then

    s = (s_i, s_{-i}) = (reac(t_{i,1}), reac(2π))    (5.11)

is an ε-NE. If the answer is no, then C_{-i}'s optimal deviation is to turn at time

    t_{-i,2} = min argmax_{0 ≤ t_{-i} ≤ t_{i,1}} U_{-i}(argmax_{t_{-i} ≤ t_i ≤ 2π} U_i(pass(t_i), pass(t_{-i})), pass(t_{-i})),    (5.12)

yielding the profile (reac(t_{i,1}), reac(t_{-i,2})), in which C_i will turn, no sooner than C_{-i}, at a time given by

    t_{i,2} = min argmax_{t_{-i,2} ≤ t_i ≤ 2π} U_i(pass(t_i), pass(t_{-i,2})).    (5.13)
We are then left to ponder if C_i is content turning no sooner than C_{-i} in the profile (reac(t_{i,1}), reac(t_{-i,2})), or if she is partial to a unilateral deviation in which she would turn first, and so on. To this end, let T = {t_{i,1}, t_{-i,2}, t_{i,3}, ...} be the sequence of first turning times generated by this procession of one-upmanship. As a notational shorthand, we define

    H_{-i}(t_{-i}) = U_{-i}(argmax_{t_{-i} ≤ t' ≤ 2π} U_i(pass(t'), pass(t_{-i})), pass(t_{-i})).    (5.14)
Algorithm 5.2, provided below, keeps track of T and the strategies the cows adopt while jockeying for position using this back-and-forth mechanism in the 1T-CPRG. We now describe properties of Algorithm 5.2 that speak to two important issues: 1) the algorithm's termination, and 2) which cow or cows, if any, have an incentive to unilaterally deviate from the strategy profiles prescribed at various stages.
Proposition 5.1. Consider applying Algorithm 5.2 to an instance of the 1T-CPRG. Let {a, b, c} be a subsequence of three consecutive turning times in T. Assume s_1 = reac(a) and s_2 = reac(b) are two of the associated strategies prescribed by the algorithm. Then C_1 is the only cow with a unilateral incentive to deviate from the search profile s = (s_1, s_2), and her only profitable deviations involve preemptively turning no later than C_2.
Proof. To begin, the logic of Algorithm 5.2 implies that T is non-increasing and a ≥ b ≥ c. Now consider the strategies s_1 = reac(a), s_2 = reac(b), and profile s = (s_1, s_2). Note that in s, C_2 turns no later than C_1. Because c immediately follows b in T, the clause on line 13 of Algorithm 5.2 must fail for s, indicating C_1 can increase her utility (by more than ε) by playing reac(c) instead of reac(a), such that C_1 turns no later than C_2 in (reac(c), reac(b)) ∈ S. To establish that C_1 must turn no later than C_2 to improve her utility, note that in s, C_1 is already playing her best response to C_2 turning at time b, implying the absence of any profitable unilateral deviations in which C_1 remains the second cow to turn.
Algorithm 5.2: determine ε-NE search profile
1   s_1 = pass(2π), s_2 = pass(2π);
2   if (s_1, s_2) ∈ S_{ε-NE} then
3       break
4   i ← index of cow with incentive to deviate from (s_1, s_2);
5   T = ∅;
6   k = 1;
7   t_{i,k} = min argmax_{t_i ∈ [0,2π]} U_i(pass(t_i), argmax_{t_i ≤ t_{-i} ≤ 2π} U_{-i}(pass(t_i), pass(t_{-i})));
8   s_i = reac(t_{i,k});
9   t_{-i,k} = min argmax_{t_{i,k} ≤ t_{-i} ≤ 2π} U_{-i}(pass(t_{i,k}), pass(t_{-i}));
10  s_{-i} = reac(t_{-i,k});
11  T ← {t_{i,k}};
12  while no ε-NE established do
13      if U_{-i}(pass(t_{i,k}), pass(t_{-i,k})) + ε ≥ max_{0 ≤ t_{-i} ≤ t_{i,k}} H_{-i}(t_{-i}) then
14          break
15      else
16          t_{-i,k+1} ← min argmax_{0 ≤ t_{-i} ≤ t_{i,k}} H_{-i}(t_{-i});
17          s_{-i} ← reac(t_{-i,k+1});
18          t_{i,k+1} ← t_{i,k};
19          k ← k + 1;
20          T ← {T, t_{-i,k}};
21          i ← -i;
22  return profile (s_1, s_2)
Now consider C_2, the first cow to turn in s. Using similar reasoning, we conclude that since b immediately follows a in T, C_2 must prefer turning no later than C_1, at or before time a, rather than responding to C_1 turning at a. Therefore, C_2 has no incentive to deviate from s to a strategy in which she responds to C_1 turning at time a. Furthermore, since the strategy s_2 = reac(b) was selected using the assignments in lines 16 and 17 of Algorithm 5.2, C_2 selects her turning time optimally over [0, a], implying there are no profitable deviations that involve turning no later than C_1 in [0, a]. We conclude that C_1 is the only cow with a unilateral incentive to deviate from s, and any profitable deviations involve C_1 turning no later than C_2. ∎
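The jockeying that Algorithm 5.2 formalizes can be mimicked on a discrete grid of turning times. The sketch below is our own toy rendering, not the thesis's implementation: it takes an abstract constant-sum payoff u1(t1, t2), alternates profitable deviations until no cow can gain more than ε by preempting her rival, and uses a round cap as a crude stand-in for the termination guarantee established below.

```python
def earliest_max(cands, f, tol=1e-12):
    """min argmax: the earliest candidate attaining the maximum of f."""
    best = max(f(t) for t in cands)
    return min(t for t in cands if f(t) >= best - tol)

def one_upmanship(times, u1, eps=0.05, max_rounds=1000):
    """Toy rendering of Algorithm 5.2 on a discrete time grid.  Returns
    the sequence T of first turning times.  u1(t1, t2) is cow 1's
    utility when cow 1 turns at t1 and cow 2 at t2; being constant
    sum, cow 2 receives 1 - u1.  All names here are our own."""
    def u(i, ti, tj):            # cow i turns at ti, her rival at tj
        return u1(ti, tj) if i == 1 else 1.0 - u1(tj, ti)
    def follow(j, t_first):      # follower j responds no earlier
        cand = [t for t in times if t >= t_first]
        return earliest_max(cand, lambda t: u(j, t, t_first))
    def lead_value(j, t_first):  # j turns first, rival best-responds
        return u(j, t_first, follow(3 - j, t_first))
    i = 1                        # cow currently committed to turn first
    t_i = earliest_max(times, lambda t: lead_value(i, t))
    t_j = follow(3 - i, t_i)
    T = [t_i]
    for _ in range(max_rounds):
        j = 3 - i                # can the follower gain > eps by
        cand = [t for t in times if t <= t_i]   # preempting (line 13)?
        t_dev = earliest_max(cand, lambda t: lead_value(j, t))
        if u(j, t_j, t_i) + eps >= lead_value(j, t_dev):
            break                # no: an eps-NE has been reached
        i, t_i = j, t_dev        # yes: roles swap (lines 16-21)
        t_j = follow(3 - i, t_i)
        T.append(t_i)
    return T
```

Because every accepted deviation must gain more than ε of a utility bounded in [0, 1], the loop stops after finitely many rounds, mirroring (in miniature) the argument of Proposition 5.2.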
From Proposition 5.1, Algorithm 5.2 assigns s_1 and s_2 such that it is only ever the cow that turns no sooner than her rival in (s_1, s_2) that, by preferring to turn no later than her rival, could have a unilateral incentive to deviate. This realization begs the question: can this succession of one-upmanship continue indefinitely? The following proposition asserts that, for a large class of target densities, the answer is no.
Proposition 5.2. Let f_T be a bounded target density satisfying f_T(q) ≤ M for finite M > 0, for all q ∈ R. For any combination of initial cow positions q_i(0), initial cow headings φ_i(0), i = 1, 2, and ε > 0, T is finite, i.e., |T| is finite.
Proof. Assume, to obtain a contradiction, that there exist initial positions and headings, q_i(0) and φ_i(0), i = 1, 2, respectively, such that Algorithm 5.2 fails to terminate, i.e., T is infinite. Let T = {t_1, t_2, t_3, ...}. From Algorithm 5.2, T is a non-increasing sequence. Moreover, because, by the rules of the game, the cows cannot turn before time zero (when the game starts), T is non-negative. It follows that T must approach a limiting value v ≥ 0, and for any δ > 0, there exists a sufficiently large n_0(δ) ∈ N such that 0 ≤ t_n - t_{n+1} ≤ δ for all n ≥ n_0(δ). For δ > 0, let a ≥ b ≥ c be three consecutive elements of T such that 0 ≤ a - b ≤ δ and 0 ≤ b - c ≤ δ. From the assumption that T is infinite, such times are guaranteed to exist.
Now consider the following search profiles:

    s^1 = (reac(a), reac(b)),    (5.15)
    s^2 = (pass(a), reac(b)),    (5.16)
    s^3 = (reac(c), reac(b)),    (5.17)
    s^4 = (reac(c), pass(b)).    (5.18)
In s1, C1 best responds to C2 turning first at time b. Since a > b, U1(s1) ≥ U1(s2),
because the option of turning at time a is considered when best responding
to s2 = reac(b). Moreover, because c immediately follows b in T, it must be that
U1(s3) > U1(s1) + ε and, subsequently, that U1(s3) > U1(s2) + ε. Now consider the
search profile s4, in which C2 responds to C1 turning first at time c by turning at
time b. However, because C2 best responds to C1 turning first at time c in s3, we have
that

U2(s3) ≥ U2(s4).   (5.19)
Since the game is zero-sum, the inequality chain implies that

U1(s3) > U1(s2) + ε   (5.20)
⇒ 1 − U2(s3) > 1 − U2(s2) + ε   (5.21)
⇒ U2(s2) > U2(s3) + ε   (5.22)
⇒ U2(s2) > U2(s4) + ε,   (5.23)

where the final implication uses (5.19).
The inequality in (5.23) indicates the difference in utility that C2 sees between s2
and s4 strictly exceeds ε. However, because f^T is bounded and 0 < a − b < δ and
0 < b − c < δ, we also have that

|U2(s2) − U2(s4)| = | ∫_{L2(s2)} f^T(q) dq − ∫_{L2(s4)} f^T(q) dq |.   (5.24)
Define

L2^Δ = L2(s2) Δ L2(s4)   (5.25)
     = {q ∈ R : q ∈ L2(s2) ∪ L2(s4), q ∉ L2(s2) ∩ L2(s4)},   (5.26)

the symmetric difference of L2(s2) and L2(s4), i.e., the collection of all elements
that are in L2(s2) or L2(s4), but not both. We then have

|U2(s2) − U2(s4)| ≤ ∫_{L2^Δ} f^T(q) dq   (5.27)
≤ (max_q f^T(q)) ∫_{L2^Δ} dq   (5.28)
≤ 4Mδ.   (5.29)
The transition from (5.28) to (5.29) follows from three observations. First, C2
turns at b in both s2 and s4. Second, C1 turns at most 2δ earlier in s4 than in s2;
hence, accounting for backtracking, the integral in (5.28) is less than or equal to 4δ.
Third, and finally, f^T is, by assumption, bounded everywhere on R by M.

By choosing δ such that 0 < δ < ε/(4M), we have, from (5.27)-(5.29), that

|U2(s2) − U2(s4)| < ε,   (5.30)

which contradicts (5.23), thereby refuting the initial assumption and establishing
that T is indeed a finite sequence. □
Combining Propositions 5.1 and 5.2 gives the following result.
Theorem 5.5. Consider the 1T-CPRG with bounded target density f^T. For any
ε > 0, the profile (s1, s2) produced as output from Algorithm 5.2 is an ε-NE of the
game.
Proof. For any ε > 0, Proposition 5.2 ensures that Algorithm 5.2 terminates and T
is finite. Let the search strategies of C1 and C2 that emerge from the algorithm be s1
and s2, respectively, and s = (s1, s2). Let i ∈ {1, 2} be the index of a cow that turns
no later than her rival in s. From Proposition 5.1, C−i is the only cow that could
potentially have an incentive to unilaterally deviate from s. However, for this to be
the case, T would have to contain at least one more element than it actually does, which
is a contradiction. Therefore, neither C1 nor C2 has an incentive to unilaterally deviate
from s, implying s is an ε-NE. □
The fact that the 1T-CPRG admits an ε-NE for any ε > 0 allows us to comment
on the existence of general NE.
Theorem 5.6. A 1T-CPRG with bounded f^T has a NE.
Proof. Assume, to obtain a contradiction, that the 1T-CPRG does not have a NE.
Then, for any strategy profile s, one of the cows, say Ci, can improve her
utility by some amount ε > 0 by unilaterally deviating from si, i.e., the strategy she
employs in s. However, from Theorem 5.5, for precisely this value of ε, there exists a
strategy profile s* such that s* is an ε-NE of the 1T-CPRG. This fact contradicts the
initial assumption and, hence, establishes that the 1T-CPRG does indeed have a NE. □
We remark that although Theorem 5.6 guarantees the existence of a NE, it does
not provide a direct algorithm to compute one. However, an approximation of arbitrary
accuracy can be obtained by running Algorithm 5.2 with a sufficiently
small ε > 0, rendering the point moot for all practical intents and purposes. Shortly,
we will parlay our understanding of the 1T-CPRG into a methodology to solve CPRGs
in which the cows may turn up to a finite number of times. In the interim, we take
the opportunity to discuss two practical considerations.
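The ε-NE notion used above can be made concrete with a small numerical sketch. The following is a toy discretized surrogate (the payoff functions, the grid of candidate turning times, and all function names are illustrative assumptions, not the thesis's Algorithm 5.2): it checks every profile of turning times for a unilateral deviation that gains more than ε.

```python
import itertools

def epsilon_ne(U1, U2, grid, eps):
    """Return the (t1, t2) profiles on `grid` at which neither player can
    gain more than eps by a unilateral deviation, i.e., the discretized
    eps-NE of the two-player game with payoffs U1 and U2."""
    profiles = []
    for t1, t2 in itertools.product(grid, repeat=2):
        u1, u2 = U1(t1, t2), U2(t1, t2)
        if all(U1(s, t2) <= u1 + eps for s in grid) and \
           all(U2(t1, s) <= u2 + eps for s in grid):
            profiles.append((t1, t2))
    return profiles

# Toy zero-sum surrogate: each player picks a "turning time" in [0, 1].
U1 = lambda t1, t2: -(t1 - 0.5) ** 2 + (t2 - 0.5) ** 2
U2 = lambda t1, t2: -U1(t1, t2)
```

On the grid {0, 0.25, 0.5, 0.75, 1} with a tiny ε, the only surviving profile is (0.5, 0.5), the mutual best response; enlarging ε admits more profiles, mirroring the role the termination threshold plays in Algorithm 5.2.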
5.8
1T-CPRG: computational considerations
This section provides a brief discussion of some of the computational issues associated
with the 1T-CPRG and, specifically, Algorithm 5.2, where a number of maximizations
over potential turning points must be calculated. Fortunately, the difficulty
in evaluating these expressions is eased by the circular geometry of R and the fact
that both cows travel at the same speed. For example, assume, as in Figure 5-2, that C1
and C2 are initially heading toward one another. In this setting, if t1 = t2, then the
landclaims L1 and L2 from (5.3) may be readily calculated from symmetry. However,
should C1 unilaterally deviate and turn, instead, at time t1 + Δt, Δt > 0, then C1's
other frontier is eroded by an amount Δt. This realization streamlines the process of
calculating deviations in landclaims which, in turn, alleviates some of the difficulties
in computing the associated utilities. In the event the cows are chasing each other
(e.g., both cows initially have a cw heading), the strategy of the cow that turns second
is simple: turn the instant before meeting the other cow.
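The symmetry argument lends itself to a quick numerical check. The sketch below is an illustrative discretization (grid size, time step, and the first-visitor tie-breaking rule are our assumptions, not part of the text): it simulates two unit-speed cows with prescribed turn schedules on a ring of circumference 2π and tallies the fraction of the ring each cow visits first.

```python
import math

def landclaims(q0, phi0, turns, T=2 * math.pi, n=2000, dt=None):
    """Simulate two unit-speed cows on a ring of circumference 2*pi and
    return the fraction of n grid cells each cow visits first.
    q0, phi0: initial positions and headings (+1 = ccw, -1 = cw);
    turns: per-cow sorted lists of turning times."""
    two_pi = 2 * math.pi
    dt = dt or two_pi / (4 * n)        # quarter-cell steps: no cell skipped
    q, phi = list(q0), list(phi0)
    nturn = [0, 0]
    owner = [None] * n                 # first visitor of each grid cell
    t = 0.0
    while t < T:
        for i in range(2):
            # reverse heading when the next scheduled turning time arrives
            if nturn[i] < len(turns[i]) and t >= turns[i][nturn[i]]:
                phi[i] = -phi[i]
                nturn[i] += 1
            q[i] = (q[i] + phi[i] * dt) % two_pi
            cell = int(q[i] / two_pi * n) % n
            if owner[cell] is None:
                owner[cell] = i
        t += dt
    return [owner.count(0) / n, owner.count(1) / n]
```

For two cows initially heading toward one another that turn at the same time, the resulting landclaims come out (approximately) equal, as the symmetry argument predicts.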
5.9
The 1T-CPRG for different cow speeds
In the CPRG, it was assumed C1 and C2 each travel at unit speed. It is worth
remarking that the analysis in this chapter still applies in the case C1 and C2 have
speeds v1 and v2, respectively, with v1 ≠ v2. Assuming Ci knows v−i (which is
reasonable given she knows q−i,t), the only change v1 ≠ v2 introduces is in computing
the landclaims associated with specific turning times, which feature in the various
optimizations in Algorithm 5.2. Fortunately, the arguments in Section 5.8 are easily
amended by continuing to leverage the circular symmetry of R and the ratio v1/v2.
5.10
Finite-turn CPRGs
In the preceding analysis, we assumed C1 and C2 may turn at most once. This is
a rather severe limitation to impose on the hungry cows. In this section, we study
CPRGs in which C1 and C2 may turn up to a (pre-specified) finite number of times. In
this regard, the (n1, n2)-CPRG is equivalent to the 1T-CPRG defined in Definition 5.2,
except for the important distinction that Ci may now turn up to ni times. More
specifically, let
specifically, let
G
=
GCPRG(f7 q(0),
i(0),
ni)i=1, 2
(5.31)
denote the CPRG that has target density fW and in which Ci, with initial condition
(q (0), 0i(0)) (E '9Z x 1, may turn up to ni E Z>o times. Let s E S be the search
profile in which Cj turns no later than her rival at a time tj E [0, 27r) in g. Also, let
(s,t,) =
1(s,tj)
U
2
C 'Z denote the set of all points visited by at least one
(s,t)
cow over [0, tj] under s. The decisions the cows face in the remainder of the game,
i.e., the game unfolding for t > tj, are precisely those captured by the game
Oi(tj),
,qi(tj),
,2GCPRG(f
5.32)
i)=
1=
9
where
fr(q)
=
0
f '(q)
if q E Z(s, tj) , for q E 'R and
otherwise
ni =
(5.33)
(5.34)
ni
otherwise.
A remark is in order as it relates to f̃^T in G'. Equation (5.33) implies f̃^T is deficient,
i.e., ∫_R f̃^T(q) dq < 1, if, by time tj, the cows have, collectively, visited a subset of R in
the support of f^T. However, with respect to maximizing (5.4), i.e., Ci's probability of
capturing T, we see that what is important to Ci is the value of ∫_{Li(s)} f^T(q) dq, where
Li(s) is the landclaim acquired by Ci at the end of the game. From this perspective,
we can think of the cows as accruing density throughout the search, an interpretation
that is well-defined even in the case of deficient target densities. Henceforth, when
we refer to a game using the notation above, it will be with the tacit understanding
that this interpretation is in place.
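On a discretized ring, the construction in (5.33) and the accrued-density interpretation can be sketched as follows (the grid representation and helper names are our own, not from the thesis):

```python
def restrict_density(f, visited):
    """Per (5.33): zero the (discretized) target density over cells the cows
    have already visited; the result is generally deficient (sums to < 1)."""
    return [0.0 if v else p for p, v in zip(f, visited)]

def accrued_density(f, claim):
    """Density a cow accrues over the cells in her landclaim."""
    return sum(p for p, c in zip(f, claim) if c)
```

Because utilities are computed by summing density over landclaims, the deficiency of the restricted density causes no difficulty: the accrued mass is well-defined either way.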
We are now in a position to address when Ci should turn in the (n1, n2)-CPRG.
Starting from G, the optimal time for Ci to turn is given by

ti* = min argmax_{ti ∈ [0, 2π]} {h(ti)}, where   (5.35)

h(ti) = ∫_{Li(ti)} f^T(q) dq + Ui*(G_CPRG(f̃^T, qj(ti), φj(ti), ñj)_{j=1,2}),   (5.36)

with Li(ti) the landclaim Ci acquires in [0, ti], and Ui* the optimal utility that Ci
can acquire in G_CPRG(f̃^T, qj(ti), φj(ti), ñj)_{j=1,2}. Therefore,
in scheduling her turns, Ci considers not only the density she acquires prior to turning,
but also the density gathered in the equilibria associated with the resultant game. In
this way, we can think of the games as reducing to simpler games, in the sense that
there are fewer overall turns that can be made, each time a cow turns. This viewpoint
is depicted in Figure 5-3.
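The reduction structure depicted in Figure 5-3 is a small directed acyclic graph over turn budgets. The sketch below (an illustrative enumeration, not thesis code) generates the nodes and turn-induced edges that a dynamic program built on (5.35)-(5.36) would traverse once the base cases are solved:

```python
def family_dag(n1, n2):
    """Enumerate the reduction DAG of finite-turn game families (cf.
    Figure 5-3): node (a, b) is the family in which C1 and C2 may still
    turn a and b times; the edge (a, b) -> (a - 1, b) corresponds to a turn
    by C1, and (a, b) -> (a, b - 1) to a turn by C2. A dynamic program
    solves the zero-budget base cases first, then works back up this DAG."""
    edges, seen, stack = [], set(), [(n1, n2)]
    while stack:
        a, b = stack.pop()
        if (a, b) in seen:
            continue
        seen.add((a, b))
        for child in ((a - 1, b), (a, b - 1)):
            if min(child) >= 0:            # budgets cannot go negative
                edges.append(((a, b), child))
                stack.append(child)
    return sorted(seen), edges
```

For the (2, 2)-CPRG this yields the nine families and twelve turn-transitions of Figure 5-3; the value of each interior node is then assembled from its children via the maximization in (5.35)-(5.36).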
To solve (5.35)-(5.36) using dynamic programming, it is necessary to first solve
the CPRG for the relevant base case scenarios, i.e., the family of games shaded in red
in Figure 5-3. Having studied the 1T-CPRG in Section 5.7, the remaining base cases
are those in which one cow, say Ci, may turn ni ≥ 2 times, and the other cow, C−i,
has expended all of her turns, i.e., the (n1, 0)-CPRGs and (0, n2)-CPRGs, for n1 ≥ 2
and n2 ≥ 2, respectively. Any general (n1, n2)-CPRG will degenerate to one of these
base case games once the cows have made a sufficient number of turns.
Fortunately, characterizing equilibrium play in the (ni, n−i)-CPRG for ni ≥ 2 and
n−i = 0 is straightforward. In these encounters, C−i has expended her turn budget
and is rendered strategically inert. However, Ci's best strategy is to, first, orient
herself so that she is traveling toward C−i, i.e., φi = Φ \ {φ−i}, as if to set up a
head-on meeting. Establishing this alignment takes at most one turn (in the event
Ci is already on a collision course with C−i after C−i makes her last turn, no turn
is required to establish the necessary heading). Subsequently, Ci proceeds to travel
toward C−i before turning the instant before she would collide with C−i. This last
turn ensures Ci is positioned to capture any unclaimed density still remaining on R
and brings the number of turns made by Ci, since C−i exhausted her last turn, to no
more than two. For completeness, this search strategy is outlined in Algorithm 5.3.
Algorithm 5.3: functionality of si
1  φi(0) ← Φ \ {φ−i(0)};
2  δ ← 0+;
3  while D(qi, q−i) > δ do
4      go straight;
5  turn around;
6  go straight;
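A minimal executable rendering of Algorithm 5.3 might look as follows (the time step, the meeting tolerance, and the ±1 heading encoding are our assumptions):

```python
import math

def inert_rival_strategy(q_i, q_j, phi_i, phi_j, dt=1e-3):
    """Sketch of Algorithm 5.3 for a cow C_i facing a strategically inert
    rival C_j whose turn budget is spent. Headings are encoded as +1 (ccw)
    and -1 (cw). C_i aligns herself toward C_j (at most one turn), closes
    the gap, and turns once more just before the two would meet. Returns
    the number of turns C_i makes, which is at most two per the text."""
    two_pi = 2 * math.pi
    turns = 0
    if phi_i == phi_j:              # chasing, not head-on: one turn to align
        phi_i = -phi_i
        turns += 1
    # distance from C_i to C_j along C_i's direction of travel
    gap = (phi_i * (q_j - q_i)) % two_pi
    while gap > 2 * dt:             # both cows move, so the gap closes at rate 2
        q_i = (q_i + phi_i * dt) % two_pi
        q_j = (q_j + phi_j * dt) % two_pi
        gap = (phi_i * (q_j - q_i)) % two_pi
    phi_i = -phi_i                  # final turn, the instant before meeting
    return turns + 1
```

Starting diametrically opposed, a chasing configuration costs two turns (align, then the final reversal), while a head-on configuration costs only the final one, matching the "no more than two" count above.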
We remark that although finite-turn CPRGs degenerate to more manageable
CPRGs as the cows turn, the approach suggested by Figure 5-3 requires families
of games be solved for a variety of initial cow headings and positions. Therefore,
employing a dynamic programming-based approach to study finite-turn CPRGs may
exact a rather steep computational burden. Nevertheless, it is reassuring to know
there exists a well-developed methodology to address finite-turn CPRGs at a
theoretical level, and that, for CPRGs permitting a modest number of turns, a practical
approach exists to numerically compute equilibrium search profiles in a continuous workspace.
5.11
Summary of the CPRG
This chapter introduced the Cow-Path Ring Game as a means to study strategic
decision-making in systems where multiple, self-interested, mobile agents compete to
find a target on a ring. Salient features of the CPRG included the fact that each cow
had minimal sensing capabilities and limited prior knowledge of the target's location.
Reminiscent of probabilistic search problems, each cow used prior knowledge on the
target's position to structure her search. However, owing to the self-interested nature
of the participants, each cow's search strategy also factored in the movement and
ambitions of rival cows, making her search strategic in a game-theoretic sense.
Figure 5-3: A diagram showing associations between families of finite-turn CPRGs. The
node labelled with the pair (i, j) denotes the family of games in which C1 and C2 may
turn up to i and j times, respectively. The numbers above and beside the arrows indicate
which cow turns to bring about the indicated transition. The nodes representing base case
games, for which equilibrium strategies may be found using the methods discussed in previous
sections, are colored in red. The nodes representing all other games are colored in gray.
The arrows indicate how one family of games reduces to a simpler family of games when a
cow turns. For example, the (2,2)-CPRG becomes an instance of the (1,2)-CPRG when C1
turns, and an instance of the (2,1)-CPRG when C2 turns.
Given the inaugural nature of the work, our analysis focused on CPRGs involving
two cows. Because of the strategic options available to each cow, we argued that it
was challenging to determine both the number of times and the locations at which
each cow would turn around in an equilibrium search profile. On account of these
difficulties, we considered the 1T-CPRG and, through an iterative algorithm, showed
that any such game with a bounded target density admits a NE. This analysis was
extended, using a dynamic programming framework, to address games in which each
cow may turn up to a pre-specified finite number of times. By re-envisioning the
task of capturing the target as the equivalent goal of maximizing the accumulation
of target density, successive turns transition the game into simpler CPRGs that,
ultimately, reduce to an instance of either the 1T-CPRG or a simple game in which
only one cow has strategic options.
By focusing on CPRGs that take place on a ring and involve two cows, we inevitably
introduced many avenues by which to extend the basic framework laid out in this
chapter. For example, it remains to provide a full treatment of search games involving
n ≥ 3 cows on the ring. Alternatively, it is likely the case that insight can be gained
from studying competitive search games that unfold in alternate environments, for
example, on a graph or within a polygon. Although we consider these to be perfectly
valid pursuits, and go on to elaborate on them and other future work items in the
final chapter, we do not pursue them explicitly in this thesis. Rather, the next
chapter considers variants of the CPRG with a focus on characterizing games with
asymmetric information models. Finally, its successor considers games that unfold in
dynamic environments in which targets arrive on an ongoing basis. In each of these
scenarios, we again limit the analysis to the case of two cows and continue to focus
on encounters that take place in ring environments.
Chapter 6
Games with Asymmetric Information:
Life as a Cow Gets Harder
This chapter continues the investigation of two-cow CPRGs by considering a number
of variations of the standard CPRG. We study games in which Ci is subjected to a
penalty each time she turns. We develop an upper bound on the number of turns
a hungry cow would ever make when playing such a CPRG. Subsequently, we study
an intriguing variant of the CPRG: one in which a single cow has superior
situational awareness with respect to the clover's location and her rival's intentions. The
chapter begins by providing a motivational example to illustrate how asymmetries in
the information available to each cow can arise in a competitive search setting. To
precisely articulate these discrepancies, we also supplement our existing library of
notation. Subsequently, we formally define the asymmetric information game. We then
characterize equilibria for this game by developing strategic algorithms that allow
the less-informed cow to retain a respectable chance of capturing the target and the
more-informed cow to leverage her superior situational awareness. Finally, we provide
an interpretation of social welfare for search games with asymmetric information and,
for one such family of games, specify a socially optimal search policy.
6.1
Searching with asymmetric information:
a motivating example
Chapter 1 cited the example of two rival shipwreck recovery boats searching a coral
reef for the remnants of a treasure ship lost at sea. There, it was assumed that
each boat had the same prior on the sunken ship's location. This would be the case
if, for example, the boats had access to a shared sonar map of the waters. Now,
instead, imagine the first boat's prior on the ship's whereabouts is based on cutting-edge
sonar imagery, while the second boat's prior is distinct and based on historical
maps and word-of-mouth accounts of the sinking. In this setting, each boat would,
in general, have different valuations for being the first to visit a specific region of the
workspace. In this chapter, we will consider a competitive search game with a similar
informational infrastructure.
As an added twist, suppose that, owing to an unscrupulous and easily-bought
deckhand, the first boat is privy to the prior of the second boat. This insider information allows her, i.e., the boat's crew, to forecast her rival's intentions in an
unreciprocated manner. Despite the more elaborate preamble, the ensuing contest is
once again a competitive search game played between two recovery boats. However,
in this case, the search strategy of the first boat should, in addition to reflecting her
prior and the movement of the rival boat, exploit, to whatever extent possible, the
added information at her disposal. The previous chapter began the process of populating a toolbox of initial results from which to branch out and tackle more elaborate
competitive search games. By extending earlier results pertaining to the CPRG and
analyzing search games with asymmetric information, we seek to continue this vision.
This chapter begins by bounding the number of turns a hungry cow would ever
make in a two-cow CPRG under the added assumption that a cost is levied each time
she turns. Subsequently, we formally define the Cow-Path Ring Game with asymmetric
information and proceed to characterize equilibrium strategies for one such family
of games. Finally, the asymmetric information framework provides an opportunity
to complement the predominantly competition-oriented results discussed to date by
providing a notion of social utility and characterizing a family of socially optimal
search strategies.
6.2
Supplementary notation and terminology
This section lays the groundwork to support the forthcoming discussion of CPRGs
with asymmetric information. Naturally, we retain all conventions previously
introduced in Section 5.5. However, the richer and more nuanced nature of these games
calls for a suite of supplementary notation aimed at (i) providing a more expressive
system for relating one segment of R to another, (ii) extending Ci's information model
to reflect where she believes C−i suspects T is located, i.e., C−i's prior, and (iii)
developing a behavioral model that describes how Ci behaves based on the knowledge
that is and, equally important, is not available to her.

To refer to a specific segment of R, let, for q1, q2 ∈ R and d ∈ Φ, [q1, q2]d denote
the segment obtained by tracing an arc from q1 to q2 along R in direction d. So
that we may be equally adept at specifying one point on R in relation to another,
let, for x ∈ R≥0, q ∈ R, d ∈ Φ, (q + x)d denote the point obtained by traveling
distance x, along R, from q in direction d. The functionality of this notational system
is illustrated below in Figure 6-1.
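Since R is a circle of circumference 2π, the notation [q1, q2]d and (q + x)d reduces to modular arithmetic. A small helper sketch (the function names are ours):

```python
import math

TWO_PI = 2 * math.pi

def advance(q, x, d):
    """(q + x)_d: the point reached by traveling distance x along the ring
    from q in direction d ('ccw' increases the angle, 'cw' decreases it)."""
    return (q + x if d == "ccw" else q - x) % TWO_PI

def arc_length(q1, q2, d):
    """Length of the segment [q1, q2]_d traced from q1 to q2 in direction d."""
    return (q2 - q1) % TWO_PI if d == "ccw" else (q1 - q2) % TWO_PI

def on_arc(q, q1, q2, d):
    """True if q lies on the segment [q1, q2]_d."""
    return arc_length(q1, q, d) <= arc_length(q1, q2, d)
```

The flexibility noted in the caption of Figure 6-1, e.g., that (q3 + x)ccw and (q3 + 2π − x)cw coincide and that [q1, q2]cw equals [q2, q1]ccw, falls out of the modulo operation.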
6.3
Information models for situational awareness
In Section 5.2, Ci's information model was given as Ii(t) = (f^T, F(t)), such that her
search strategy was a function of (i) her initial position on R, (ii) her prior on T's
location, and (iii) information she has regarding the position of rival cows. With
asymmetric information, Ci's search strategy will also depend on any knowledge she
has of where her rival, C−i, suspects T may be. Accordingly, we will have to augment
Ii to reflect this relationship.
Figure 6-1: Visualization of the functionality of notation used for describing subregions of R
and one point relative to another on R. Due to the circular topology of R, there is flexibility
in the notational system. For example, [q1, q2]cw and [q2, q1]ccw refer to the same arc of R.
Similarly, (q3 + x)ccw and (q3 + 2π − x)cw refer to the same point on R.
To begin, Ci's unique prior on T's location is denoted fi^T : R → R≥0. To describe
what information, if any, Ci has regarding where her rivals suspect T may be, let, for
i, j ∈ C, i ≠ j, fij^T denote the prior density that Ci believes Cj has on T. The special
case where fij^T ≡ 0 is taken to imply that Ci has no idea where Cj suspects T may
be. In terms of this notation, the CPRG introduced in Chapter 5 fits within this
notational system under the assignments fi^T = fij^T = f^T, ∀i, j ∈ C. Summarizing, we
can express Ci's information model in the AI-CPRG by the tuple

Ii(t) = (F(t), fi^T, fij^T)_{j ∈ C\i},   (6.1)

and Ci's search strategies, still of a feedback nature, again take the form

Si = {si | si(t) : Ii(t) → Ω}.   (6.2)

We remark that the ability of Ci to accurately maintain the set {fij^T}_{j ∈ C\i} is likely
beyond the abilities of our bovine participants. Nevertheless, for legacy reasons, we
elect to continue framing the search problems in terms of cows, even if the assumed
capabilities are more in line with those of a human or robotic agent.
6.4
Behavioral models for asymmetric games
Under the closed-loop information model, it is possible, at least conceptually, for an
intelligent Ci to predict how C−i should behave.¹ Indeed, this capability was central
to the analysis of the CPRG in Chapter 5. However, in the AI-CPRG, Ci may notice
inconsistencies between fi,−i^T and q−i,t in the event fi,−i^T ≠ f−i^T. To proceed
with her search, Ci must, at some level, resolve this discrepancy. Moreover, this type
of resolution may be required at each time t > 0 and, to complicate matters, C−i may
be going through a similar exercise on her end. Precisely how Ci operates given
fi^T, fi,−i^T, and q−i,t is determined by her behavioral model. In general,
characterizing Ci's behavioral model requires a rule for updating fi,−i^T based on F(t).
Unfortunately, protocols for performing these updates, e.g., Bayesian belief schemes,
significantly complicate the analysis. In response, we focus on the extreme scenarios
in which fi,−i^T ≡ 0 or fi,−i^T = f−i^T.

In the event fi,−i^T = f−i^T, Ci can perfectly forecast how C−i would respond to her
actions. Accordingly, Ci's behavioral model continues to assume that C−i is hungry
and thus acts as a rational utility-maximizing player. At the opposite end of the
spectrum, if fi,−i^T ≡ 0, then Ci has no idea of where C−i suspects T may be, and,
therefore, how C−i will explore R. Accordingly, it is assumed Ci adopts a defensive
approach to search and uses a maximin strategy so as to maximize her minimum
achievable utility. A maximin strategy was defined in the context of a generic game
in Definition 3.3. Later in this chapter, we will outline exactly what constitutes a
maximin strategy in the context of a CPRG.
6.5
A bound on the maximum number of turning
points in the CPRG
In this section, we develop a bound on the maximum number of times Ci would ever
turn around when playing the CPRG. Our analysis assumes a slightly altered
formulation of the CPRG in which a fixed cost, ct > 0, is levied against Ci each time she
turns. Ideally, such a bound could be developed without this additional stipulation;
however, because C1 and C2 can, in theory, jostle for position during the game, it
is possible for each cow to turn an arbitrary number of times at the beginning of
the game, collectively explore negligible territory, and have no impact on the overall
outcome. We remark that the inclusion of a turning cost in the formulation is
reminiscent of the work in [36], which appended turning costs to the CPP. Clearly, the
similarities end here, as the arguments necessary to coerce out the bounds presented
in this section emerge from an analysis that reflects the competitive nature of the
CPRG.
When Ci reverses direction during a CPRG, she effectively commits to backtracking
across previously explored territory. Intuitively, by turning too frequently, Ci
could provide C−i with an opening to increase her utility. Consequently, it is
tempting to speculate that the number of turns Ci would ever make is subject to an upper
bound. In Chapter 5, an iterative algorithm was specified for computing NE search
strategies for games in which each cow may turn at most a pre-specified number of
times. Clearly, a definitive upper bound, ni^ub, on the number of turns Ci would ever
make could be used to conclusively solve the CPRG by directly solving the
(n1^ub, n2^ub)-CPRG in Figure 5-3.
For the remainder of this section, it is assumed a cost ct is levied against Ci each
time she turns. Let ni(s) be the number of turns made by Ci under s ∈ S. Then Ci's
utility under s, subject to turning costs, is

Ūi(s) = Ui(s) − ct ni(s)   (6.3)
      = ∫_{Li(s)} f^T(q) dq − ct ni(s).   (6.4)

We seek an upper bound on ni(s*) for s* ∈ S_NE. The following result will prove
useful in this pursuit.
Proposition 6.1. Consider the standard CPRG from Definition 5.1. For any initial
conditions (qi(0), φi(0))_{i=1,2}, the size of the landclaim secured by Ci satisfies

max_{si ∈ Si} min_{s−i ∈ S−i} {|Li(si, s−i)|} = π, for i = 1, 2.   (6.5)

In words, (6.5) says that Ci can always, if she so desires, be the first cow to visit
at least half of R. However, it is important to remember that maximizing |Li| is not
necessarily consistent with maximizing Ui; for example, f^T may be negligible over a
large portion of R, and Ci better served to concentrate on securing a smaller, but
more lucrative, Li. This idea will be revisited in the next section.
Proof. We prove (6.5) constructively by specifying a search strategy for Ci that
establishes the bound. To this end, we sidebar briefly and introduce the following search
strategy.

Definition 6.1. For i ∈ {1, 2}, let Θi denote the set of points on R that are closer,
via travel in either the cw or ccw direction, to Ci than C−i. The mirroring-search
strategy, denoted sm, implies the following functionality:

(si = sm) ⟹ φi(t) = Φ \ {φ−i(t)}, t ≥ 0.   (6.6)

In other words, sm is the search strategy that involves Ci always traveling toward,
but in the opposite direction to, C−i. In so doing, Ci's motion "mirrors" that of her
rival C−i.

For si = sm, symmetry stipulates that when Ci is exploring new territory, so is C−i,
and vice versa. It follows that |Li(sm, s−i)| = |L−i(sm, s−i)| = π for all s−i ∈ S−i,
and

max_{si} min_{s−i} {|Li(si, s−i)|} ≥ min_{s−i} {|Li(sm, s−i)|} = π.   (6.7)

Similarly, for s−i = sm,

max_{si} min_{s−i} {|Li(si, s−i)|} ≤ max_{si} {|Li(si, sm)|} = π.   (6.8)

The result is established upon noting that (6.7) and (6.8) sandwich the quantity of
interest, namely

max_{si} min_{s−i} {|Li(si, s−i)|},   (6.9)

at the value π. □
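The equal-split property of the mirroring strategy can be probed numerically. In the sketch below (the discretization and the rival's turn schedule are illustrative assumptions), C1 always holds the heading opposite to C2's, whatever schedule of turns C2 follows:

```python
import math

def mirrored_coverage(q0, phi0, rival_turns, T=2 * math.pi, n=2000):
    """Simulate the mirroring strategy of Definition 6.1: C1 always holds
    the heading opposite to C2's. Returns the fraction of the ring each cow
    visits first; by the symmetry argument of Proposition 6.1, the two
    fractions should be (approximately) equal."""
    two_pi = 2 * math.pi
    dt = two_pi / (4 * n)
    q1, q2 = q0
    phi2 = phi0                       # rival's initial heading (+1 ccw, -1 cw)
    owner = [None] * n
    k, t = 0, 0.0
    while t < T:
        if k < len(rival_turns) and t >= rival_turns[k]:
            phi2, k = -phi2, k + 1    # rival turns per her schedule
        phi1 = -phi2                  # mirroring: always opposite the rival
        q1 = (q1 + phi1 * dt) % two_pi
        q2 = (q2 + phi2 * dt) % two_pi
        for i, q in ((0, q1), (1, q2)):
            cell = int(q / two_pi * n) % n
            if owner[cell] is None:
                owner[cell] = i
        t += dt
    return owner.count(0) / n, owner.count(1) / n
```

Whatever turn schedule the rival adopts, the two first-visit fractions come out equal (up to discretization error), reflecting the mirror symmetry of the two trajectories.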
Let ni(s) denote the number of turns Ci makes under s ∈ S. If we impose a cost
ct > 0 on Ci each time she turns, and deduct the total cost from (5.4), then we may
parlay Proposition 6.1 into an upper bound on ni(s*).
Proposition 6.2. Consider a CPRG with turning cost ct > 0. For any s* ∈ S_NE,
ni(s*) satisfies

ni(s*) ≤ ⌈ (1/ct) max_x ∫_{w(x)} f^T(q) dq ⌉,   (6.10)

where w(x) = [x, (x + π)cw], and ⌈x⌉ is the ceiling function of x, i.e., the minimum z ∈ Z
no less than x.
Proof. We begin by bounding Ui(s*). Assume, to obtain a contradiction, there exists
s* ∈ S_NE with utility

Ui(s*) > max_x ∫_{w(x)} f^T(q) dq.   (6.11)

Because the CPRG is constant-sum, (6.11) implies one of three alternatives must hold:
(i) |L−i(s*)| > π and, consequently, |Li(s*)| < π, (ii) Li(s*) is non-convex, or (iii)
both of the above. However, under s−i = sm, C−i ensures, from Proposition 6.1, that
|L−i(s*)| ≥ π ⇒ |Li(s*)| ≤ π. Moreover, s−i = sm ensures Li(s*) and L−i(s*) form a
convex partition of R. Consequently, s*−i ≠ sm and C−i has a unilateral incentive to
deviate from s*−i (to sm), implying that s* ∉ S_NE. The contradiction establishes that
Ui(s*) is less than or equal to the right-hand side of (6.11).

By not turning around, Ci accrues positive utility and zero cost; consequently, she
will only turn a number of times ni(s) such that ni(s*) ct ≤ Ui(s*). Upon solving for
ni(s*) ∈ Z≥0, the bound in (6.10) follows. □
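The bound (6.10) is straightforward to evaluate numerically. The following sketch (uniform-grid sampling is an assumption, and w(x) is taken as the half-ring window starting at x) slides a half-ring window over a sampled density and applies the ceiling:

```python
import math

def turn_bound(f, ct):
    """Numerically evaluate the bound (6.10): an upper limit on equilibrium
    turn counts given turning cost ct. `f` is the target density sampled on
    a uniform grid over the ring; w(x) ranges over half-ring windows."""
    n = len(f)
    dq = 2 * math.pi / n
    half = n // 2
    # max over x of the density mass inside the half-ring window w(x)
    best = max(sum(f[(i + j) % n] for j in range(half)) * dq
               for i in range(n))
    return math.ceil(best / ct)
```

For a uniform density the best half-ring captures half the unit mass, so the bound is ⌈0.5/ct⌉; for a density concentrated at a single point, the best half-ring captures everything and the bound is ⌈1/ct⌉, illustrating how lucrative priors permit more equilibrium turns.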
6.6
CPRGs with asymmetric information
In the traditional CPRG, each cow maintains the same known prior, f^T, on T's
location. Consequently, fi^T = fi,−i^T = f^T and Ci knows C−i's prior, i = 1, 2.
In this section, we consider search scenarios in which Ci has a distinct prior on T's
location and may or may not have knowledge of f−i^T. Henceforth, we will refer to
an encounter of this type as an Asymmetric Information Cow-Path Ring Game, or
simply an AI-CPRG.
Definition 6.2 (AI-CPRG). Consider two hungry cows, Ci, for i ∈ {1, 2}, initially
located on R at qi(0), respectively. Ci retains her movement and sensing capabilities
from the CPRG in Definition 5.1, but has a distinct prior fi^T : R → R≥0 on T's
location, with f1^T ≠ f2^T. In addition, Ci maintains fi,−i^T as the prior she believes
C−i has on T. Ci's information model is therefore specified by the triple
Ii(t) = (F(t), fi^T, fi,−i^T), t ≥ 0, and her search strategies are mappings of the form
si : Ii(t) → Ω(t). What search strategy should Ci use to maximize her perceived
probability of finding T? [90].

A few remarks are in order. First, as a reminder, F(t) is the set of all trajectories
traveled by cows up to time t ≥ 0 and Ω(t) is the set of steering commands available
to Ci at t, i.e., {cw, ccw}. Second, Ci's perceived probability of finding T is given
by the expression for Ui in (5.4) with f^T replaced by fi^T. The relevant geometric
attributes of an instance of an AI-CPRG are illustrated below in Figure 6-2.
6.7
AI-CPRGs with perfect knowledge
AI-CPRGs in which each cow is aware of her rival's prior, i.e., fi,−i^T = f−i^T
for i = 1, 2, provide a level of situational awareness reminiscent of that in the
traditional CPRG. In these instances, we may extend earlier results and make an
immediate statement regarding the existence of equilibrium strategies.
Figure 6-2: Initial positions, qi(0); initial headings, φi(0); and target priors, fi^T, of C1 and
C2 for an instance of an AI-CPRG. (a) f1^T, shown in blue, has local maxima along the South-East
and North-West regions of R. (b) f2^T, shown in green, is more evenly distributed and
contains three modest peaks along R. For q ∈ R such that f1^T(q) ≠ f2^T(q), C1 and C2 have
different valuations for visiting q first.
Proposition 6.3. Consider an AI-CPRG in which Ci is aware of C−i's prior, i.e.,
fi,−i^T = f−i^T, i ∈ {1, 2}, and may turn at most a finite number of times. The game
admits a NE s* ∈ S_NE.

Proof. With fi,−i^T = f−i^T, Ci can perfectly forecast the best response C−i will make
to any action Ci takes. Leveraging this ability was key to the functionality of
Algorithm 5.2, specifically lines 7, 9, and 16, and the analysis of the CPRG conducted
in Section 5.7. With only minor modifications to the associated proofs of
Propositions 5.1, 5.2 and Theorem 5.6, the same line of reasoning may be used, for the case
at hand, to establish the existence of an ε-NE s* ∈ S_NE. Owing to these similarities,
we elect to forgo retracing the requisite arguments, trusting that an appreciation of
the aforementioned arguments is sufficient to handle the necessary amendments, the
extent of which involves using fi,−i^T in place of f^T in calculating Ui. □
The full extent to which informational asymmetries influence search strategies
becomes apparent only when one cow has an unreciprocated knowledge of her rival's
prior. The following example explores how this extra knowledge can, in select
instances, be exploited to the benefit of the more-informed cow. Subsequently, the key
ideas will be formalized to develop theoretical results for a broader family of games.
Example 6.3. Consider the AI-CPRG in which f1^T, f2^T, q1(0), and q2(0) are as
illustrated in Figure 6-3. Furthermore, assume f1,2^T(q) = f2^T(q) = 1, ∀q ∈ R, and
f2,1^T ≡ 0. Let R1 = [b, c]ccw and R2 = [e, f]ccw.

In this example, C1 is keen to be the first cow to visit R1 and R2, as, in her eyes,
there is little chance of T residing in R \ {R1 ∪ R2}. Owing to f1,2^T = f2^T, C1 knows
where C2 suspects T may be, and can forecast C2's behavior accordingly. In contrast,
f2^T is uniform over R, indicating C2 has little idea of where T may be. Compounding
her situation, C2 has no idea where C1 suspects the target may be, making it difficult
for her to postulate as to how C1's actions, i.e., q1,t, relate to C1's intentions. Given
f2,1^T ≡ 0, we assume, as per our behavioral model, C2 adopts a maximin strategy. For
the time being, denote the maximin strategy C2 plays as s2 = som. We will say more
about som shortly; for now, it suffices to know that the subscript in som stands for
opportunistic mirroring.
In the event $C_1$ and $C_2$ share the prior $f^T = f_1^T$, as in the CPRG, $C_1$ could not secure $\mathcal{R}_1$ and $\mathcal{R}_2$ in any equilibrium search profile; rather, these territories would be divided between $C_1$ and $C_2$, respectively. However, because $C_1$ is in the auspicious position of possessing superior situational awareness, she can, in this case, lure $C_2$ away from $\mathcal{R}_1$, thereby increasing her perceived probability of finding $\mathcal{T}$. To see how, consider the following strategy: $C_1$ turns immediately and travels to $d$; upon reaching $d$, $C_1$ turns again and travels toward $b$; upon reaching $b$, $C_1$ turns once more and heads toward $f$, or, more compactly,

$$s_1 : q_1(0) \xrightarrow{ccw} d \xrightarrow{cw} b \xrightarrow{ccw} f. \qquad (6.12)$$
Meanwhile, by using $s_{om}$, $C_2$ begins the game by mirroring $C_1$ and traveling toward $f$; upon reaching $f$, $C_2$ (who is carefully observing $C_1$) turns and heads toward $g$, where, upon observing $C_1$ seemingly forfeiting $[g, b]_{cw}$ and recognizing that $|[g, b]_{cw}| > |[f, e]_{cw}|$, $C_2$ continues on and explores $[g, b]_{cw}$ while $C_1$ explores $[f, e]_{cw}$. At the end of the search, $\mathcal{Z}_1 = [b, f]_{ccw} \supseteq \{\mathcal{R}_1 \cup \mathcal{R}_2\}$ and $\mathcal{Z}_2 = [f, b]_{ccw}$, with $|\mathcal{Z}_2| > |\mathcal{Z}_1|$.
Figure 6-3: An instance of an AI-CPRG. The cows (depicted as cars) $C_1$ and $C_2$ are initially diametrically opposed at the top and bottom of $\mathcal{R}$, respectively. $C_1$'s prior on $\mathcal{T}$, namely $f_1^T$, is shown in blue. Owing to $f_1^T$, $C_1$ is motivated to, if possible, be the first cow to explore segments $\mathcal{R}_1$ and $\mathcal{R}_2$. Shown in green, it is assumed that $f_2^T(q) = 1$, $\forall q \in \mathcal{R}$, such that any two segments of $\mathcal{R}$ having equal length are equally valuable to $C_2$. The points $a$, $b$, $c$, $d$, $e$, $f$, and $g$ are points of interest in Example 6.3.
The insight gained from the previous example may be extended to represent a broader class of games. To this end, let $m_d$ denote the midpoint between $q_1(0)$ and $q_2(0)$ on $\mathcal{R}$ that, starting from $q_1(0)$, would be reached the fastest by traveling in direction $d \in \Phi$. Furthermore, and similar to the notation used in Chapter 4 to describe search plans in the CPRP, let

$$(q_1, \ldots, q_n)^{d_1}, \quad d_i = \neg d_{i-1},\ i = 2, \ldots, n, \qquad (6.13)$$

be the search strategy in which $C_i$, starting from $q_i(0)$ and traveling in direction $d_1$, travels in alternating directions between the points $q_1, \ldots, q_n$. The strategy below
applies to a less-informed cow that is impervious to utility differences of less than $\Delta x > 0$.
Definition 6.4 (opportunistic mirroring). The $\Delta x$-opportunistic mirroring strategy, $s_{\Delta x,om} \in \mathcal{S}$, is the search strategy in which $C_i$ uses $s_m$ unless, at some time $t_1 > 0$,
she is presented with the opportunity to exclusively explore along an unexplored segment of length at least $\Delta x$ greater than the length of any other unexplored segment. Should such an opportunity arise, $C_i$ explores the longer segment, reverting to $s_m$ if, at some future time $t_2 > t_1$, she witnesses $C_{-i}$ turn.
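For concreteness, the trigger in Definition 6.4 can be written as a small predicate. The sketch below is our own abstraction, not code from the thesis; the function name and its arguments are illustrative, and the segment bookkeeping a full implementation would require is omitted.

```python
# Predicate form of the Delta-x opportunistic-mirroring trigger: deviate
# from plain mirroring only if the exclusively explorable segment is at
# least delta_x longer than every other unexplored segment.
def should_deviate(exclusive_len, other_lens, delta_x):
    """exclusive_len: length of the segment C_i could explore exclusively.
    other_lens: lengths of all other unexplored segments."""
    return all(exclusive_len >= length + delta_x for length in other_lens)
```

For example, `should_deviate(5.0, [3.0, 4.0], 1.0)` returns `True`, while `should_deviate(5.0, [4.5], 1.0)` returns `False`.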
The following theorem combines many of the aforementioned ideas to encapsulate
equilibrium play for an AI-CPRG, quantifying the extent to which the more-informed
cow can benefit from her superior situational awareness.
Theorem 6.5. Consider an AI-CPRG with $\hat{f}_2^1 = 0$ and $\hat{f}_1^2(q) = f_2^T(q) = 1$, $\forall q \in \mathcal{R}$. Assume the less-informed cow, i.e., $C_2$, is impervious to utility differences less than $\Delta x$. There exists $s^* \in \mathcal{S}_{NE}$ with

$$s_1^* = q_1(0) \to (m_{\neg d^*} + x^*)_{\neg d^*} \to (m_{d^*} + x^* + \tfrac{\Delta x}{2})_{d^*} \to (m_{d^*} + x^*)_{d^*}, \qquad (6.14)$$

$s_2^* = s_{\Delta x,om}$, and with $x^* \in \mathbb{R}$ and $d^* \in \Phi$ given by

$$(x^*, d^*) = \operatorname*{argmax}_{x \in \mathbb{R},\, d \in \Phi} \int_{[z_d, z_{\neg d}]_d} f_1^T(q)\,dq \qquad (6.15)$$

$$\text{s.t. } q_1(0) \in [z_d, z_{\neg d}]_d, \quad q_2(0) \in [\bar{z}_d, \bar{z}_{\neg d}]_{\neg d}, \qquad (6.16)$$

where $z_d = (m_d + x)_d$ and $\bar{z}_d = (m_d + x + \tfrac{\Delta x}{2})_d$.
Proof. Figure 6-4, provided below, illustrates key points on $\mathcal{R}$ that are involved in the arguments that follow. To begin, note that because $\hat{f}_2^1 = 0$, $C_2$, in accord with our behavioral model, adopts a defensive mindset and plays a maximin strategy, of which $s_{\Delta x,om}$ is weakly dominant, i.e., for any maximin strategy $s'$, $U_2(s_1, s_{\Delta x,om}) \ge U_2(s_1, s')$ for all $s_1 \in \mathcal{S}_1$, and $U_2(s_1, s_{\Delta x,om}) > U_2(s_1, s')$ for some $s_1 \in \mathcal{S}_1$. Apprised of $C_2$'s situation, $C_1$ can, by adopting the strategy in (6.14), ensure a utility at least as great as the integral in (6.15), as $C_1$ offers $C_2$ the chance to explore the segment

$$\mathcal{R}' = [(m_{d^*} + x^* + \tfrac{\Delta x}{2})_{d^*},\ (m_{\neg d^*} + x^* + \tfrac{\Delta x}{2})_{\neg d^*}]_{\neg d^*} \qquad (6.17)$$
Figure 6-4: Visualization of key quantities used in the proof of Theorem 6.5, including the points $(m_{d^*} + x^*)_{d^*}$, $(m_{d^*} + x^* + \Delta x/2)_{d^*}$, $(m_{\neg d^*} + x^*)_{\neg d^*}$, and $(m_{\neg d^*} + x^* + \Delta x/2)_{\neg d^*}$. The points labelled 1, 2, and 3 in red correspond to the three points visited by $C_1$ in (6.14). In the instance shown, $d^* = ccw$.
of length $2x^* + \Delta x$. As this region is $\Delta x$ longer than the segment

$$\mathcal{R}'' = [(m_{d^*} + x^*)_{d^*},\ (m_{\neg d^*} + x^*)_{\neg d^*}]_{d^*}, \qquad (6.18)$$

$C_2$ realizes the opportunity when she is at $(m_{\neg d^*} + x^* + \tfrac{\Delta x}{2})_{\neg d^*}$ and witnesses $C_1$ turn at $(m_{d^*} + x^* + \tfrac{\Delta x}{2})_{d^*}$, and explores the larger segment, leaving $C_1$ free to search the smaller, but more personally valuable, segment $\mathcal{R}''$. To see that $C_1$ can do no better than the optimal value in (6.15), consider that to improve upon $U_1(s_1^*, s_2^*)$, $C_1$ must (i) increase $|\mathcal{Z}_1|$, (ii) realize a $\mathcal{Z}_1$ that is the union of disjoint convex sets, or (iii) do both of the above.
We examine each possibility in turn. First, because $C_1$ offers $C_2$ the minimum extra territory needed to shift the land claims about $m_{d^*}$ and $m_{\neg d^*}$, and $s_2 = s_{\Delta x,om}$ guarantees $|\mathcal{Z}_2| \ge |\mathcal{Z}_1|$, a deviation from $s_1^*$ cannot increase $|\mathcal{Z}_1|$. Second, because $s_{\Delta x,om}$ prevents $C_2$ from crossing paths with $C_1$, $\mathcal{Z}_1$ and $\mathcal{Z}_2$ are each convex sets, implying both the second and third prospects for $C_1$ to increase $U_1$ are infeasible. Therefore, the bound in (6.15) is in fact tight. In effect, it is only through $C_2$ playing $s_{\Delta x,om}$ that $C_1$ is capable of luring $C_2$ into shifting the boundaries of $\mathcal{Z}_1$ and $\mathcal{Z}_2$ relative to what would be achieved had $s_2 = s_m$, i.e., the demarcation points $m_{cw}$ and $m_{ccw}$.

Therefore, neither $C_1$ nor $C_2$ has an incentive to unilaterally deviate from the profile $(s_1^*, s_{\Delta x,om})$, and $(s_1^*, s_2^*)$, with $s_1^*$ the strategy specified by (6.14) and (6.15), is a NE of the game. As a final remark, we note that under $s^*$, $\mathcal{Z}_1(s^*) = [z_{d^*}, z_{\neg d^*}]_{d^*}$ and $\mathcal{Z}_2(s^*) = [\bar{z}_{d^*}, \bar{z}_{\neg d^*}]_{\neg d^*}$. $\square$
We note that $\hat{f}_1^2$ being uniform over $\mathcal{R}$ and $\hat{f}_2^1 = 0$ featured prominently in the analysis of Theorem 6.5; namely, these attributes made $s_2 = s_{\Delta x,om}$ $C_2$'s weakly dominant maximin strategy. In addition, they made it relatively straightforward for $C_1$ to forecast $C_2$'s behavior and exploit her superior situational awareness through (6.14) and (6.15). For nonuniform $f_2^T$, specifying $C_2$'s maximin strategy, given the full-information nature of the game, is considerably more involved. Unsurprisingly, it is also more difficult for $C_1$ to predict $C_2$'s actions and exploit her knowledge of $f_2^T$ in these circumstances. For these reasons, we relegate a full treatment of this general class of AI-CPRGs, which is likely to involve a full-fledged expedition into more contemporary fields of game theory, to future work, opting, instead, to switch gears and consider the societal gains possible in a cooperative re-imagining of AI-CPRGs.
6.8 Socially Optimal Resource Gathering
Thus far, the cows have faced off against one another in an all-out search for $\mathcal{T}$. It is intriguing to contrast the collective well-being of the cows in an AI-CPRG having $f_1^T \ne f_2^T$ with that which may be achieved via cooperation. To this end, note that by maximizing (5.4), $C_i$ is, by visiting select segments of $\mathcal{R}$ first, striving to capture as much area under $f_i^T$ as possible. If we interpret $f_i^T$ as the density of a commodity, say resource $i$, that $C_i$ harvests, and further assume that in harvesting resource $i$ from $\mathcal{R}' \subseteq \mathcal{R}$ any deposits of resource $-i$, i.e., the resource collected by $C_{-i}$, are destroyed, then we can recast our target-capture search games as a functionally equivalent resource-collection game. Within this context, a natural interpretation of
the social utility afforded by search profile $s \in \mathcal{S}$ is the total weighted value of all resources collected, i.e.,

$$U_S(s) := \alpha_1 \int_{\mathcal{Z}_1(s)} f_1^T(q)\,dq + \alpha_2 \int_{\mathcal{Z}_2(s)} f_2^T(q)\,dq, \qquad (6.19)$$

where $\alpha_i$ is the per-unit societal value of resource $i$. Without loss of generality, and to ease the exposition, we will focus on the case where $\alpha_1 = \alpha_2 = 1$. To encapsulate these ideas, a summary of the aforementioned problem, which we will refer to as cooperative resource collection, is provided below.
Definition 6.6 (Cooperative resource collection problem). Consider an AI-CPRG with the amendment that $f_i^T$ no longer represents a target prior, but rather the unit density of a commodity, referred to as resource $i$, along $\mathcal{R}$. For $i = 1, 2$, $C_i$ is aware of $f_i^T$ and has perfect knowledge of $f_{-i}^T$, such that $\hat{f}_i^{-i} = f_{-i}^T$. When $C_i$ is the first to explore a segment of $\mathcal{R}$, she collects all of resource $i$ along the segment and, in the process, destroys all of resource $-i$ along the segment. How should $C_1$ and $C_2$ search to maximize (6.19)?
To understand how $C_1$ and $C_2$ ought to search $\mathcal{R}$, let

$$\mathcal{S}_{SO} := \{s \in \mathcal{S} : U_S(s) \ge U_S(s'),\ \forall s' \in \mathcal{S}\} \qquad (6.20)$$
denote the set of socially optimal search profiles. Recognizing that, as per the rules of the game, at each $q \in \mathcal{R}$, only a single resource can be harvested, we have that, $\forall s \in \mathcal{S}$ and $\forall \hat{s} \in \mathcal{S}_{SO}$,

$$U_S(s) \le U_S(\hat{s}) \qquad (6.21)$$

$$\le \int_{\mathcal{R}} \max\big(f_1^T(q), f_2^T(q)\big)\,dq \qquad (6.22)$$

$$\le \int_{\mathcal{R}} \big(f_1^T(q) + f_2^T(q)\big)\,dq \qquad (6.23)$$

$$= 2. \qquad (6.24)$$
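The chain (6.21)–(6.24) is easy to sanity-check numerically. The snippet below is our own toy discretization, not a construct from the thesis; `social_utility`, the cell layout, and the densities are illustrative assumptions.

```python
# Toy check of (6.21)-(6.24): any split of the ring between the cows
# harvests at most the integral of the pointwise max of the densities,
# which is at most 2 when each density integrates to 1.
def social_utility(f1, f2, claim1):
    """claim1[k] is True if C1 harvests cell k (width 1/n), else C2 does."""
    n = len(f1)
    return sum(f1[k] if claim1[k] else f2[k] for k in range(n)) / n

f1 = [2, 2, 0, 0]   # resource 1; integrates to 1
f2 = [0, 1, 2, 1]   # resource 2; integrates to 1, overlapping at cell 1
upper = sum(max(a, b) for a, b in zip(f1, f2)) / len(f1)
for claim1 in ([True, True, False, False], [True, False, True, False]):
    assert social_utility(f1, f2, claim1) <= upper <= 2.0
```

Here the overlap at cell 1 makes the upper bound strictly less than 2, mirroring factor (ii) discussed below.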
Given the current interest in socially optimal foraging strategies, assume $C_i$ has complete awareness of all resource densities, i.e., $\hat{f}_i^{-i} = f_{-i}^T$, $i \in \{1, 2\}$, thereby allowing for unobstructed coordination between the cows. Let

$$Su(f_i^T) := \{q \in \mathcal{R} : f_i^T(q) > 0\} \qquad (6.25)$$

be the support of $f_i^T$. The following definition is useful in describing the connectivity of subsets of $\mathcal{R}$.
Definition 6.7 (Convex subset of $\mathcal{R}$). A subset $\mathcal{R}' \subseteq \mathcal{R}$ is convex if, for all $q_1, q_2 \in \mathcal{R}'$, at least one of the following conditions holds: (i) $[q_1, q_2]_{cw} \subseteq \mathcal{R}'$ or (ii) $[q_1, q_2]_{ccw} \subseteq \mathcal{R}'$. An $\mathcal{R}'' \subseteq \mathcal{R}$ that is not convex is said to be non-convex.
For $\hat{s} \in \mathcal{S}_{SO}$, any shortfall from the upper bound in (6.24), i.e., $2 - U_S(\hat{s})$, is a consequence of three, possibly intertwined, factors: (i) the initial positions $q_i(0)$, $i \in \{1, 2\}$, being unfavorable, (ii) the resource deposits overlapping, i.e., $Su(f_1^T) \cap Su(f_2^T) \ne \emptyset$, and (iii) a non-convex $Su(f_i^T)$. Each of these scenarios is depicted in Figure 6-5.
Figure 6-5: Illustration of the three ways in which $U_S$ can fall short of the maximum value of 2. In each figure, $f_1^T$ and $f_2^T$ are shown in blue and green, respectively. In (a), $C_1$ and $C_2$ are initially positioned on the "wrong" sides of $\mathcal{R}$, resulting in a shortfall from 2. Were the cows able to switch positions, the shortfall could be avoided. In (b), overlap between $Su(f_1^T)$ and $Su(f_2^T)$ creates unavoidable inefficiency. In (c), the shortfall results from the lack of convexity of $Su(f_1^T)$ and $Su(f_2^T)$.
To appreciate these points, consider that in the event $Su(f_1^T)$ and $Su(f_2^T)$ are convex, mutually disjoint, and $q_i(0) \in Su(f_i^T)$, $i \in \{1, 2\}$, it is possible for $C_i$ to harvest all of resource $i$ on $\mathcal{R}$, implying $U_i = 1$, $i \in \{1, 2\}$, and $U_S = 2$. Similarly, if $q_1(0) = q_2(0)$, then it is possible, $\forall q \in \mathcal{R}$, to harvest resource $i$ at $q$, where $i = \operatorname{argmax}_j \{f_j^T(q), j \in \{1, 2\}\}$. To capture the essence of this latter point, consider the following cooperative search strategy:

Definition 6.8 (Tandem search). Two cows, $C_1$ and $C_2$, collocated at time $t$, i.e., $q_1(t) = q_2(t)$, use a tandem search strategy, $s_1 = s_2 = s_{tdm}$, if, $\forall t' > t$, the cows move along $\mathcal{R}$ such that $q_1(t') = q_2(t')$ and, $\forall q(t') \in \mathcal{R}$, resource $i(q(t'))$ is harvested from $q(t')$, where $i(q(t')) = \min\{i \in \{1, 2\} : f_i^T(q(t')) \ge f_{-i}^T(q(t'))\}$.
In words, the tandem search strategy is the strategy in which the cows move along $\mathcal{R}$ in the same direction and, at each point, harvest the resource with the highest density. For convenience, for $y \in \mathcal{R}$ and $d \in \Phi$, define the function $h : \mathcal{R} \times \Phi \to \mathbb{R}_{\ge 0}$ as

$$h(y; d) := \int_{\mathcal{R}_1} f_1^T(q)\,dq + \int_{\mathcal{R}_2} f_2^T(q)\,dq + \int_{\mathcal{R}_3} \max\big(f_1^T(q), f_2^T(q)\big)\,dq, \qquad (6.26)$$

where $\mathcal{R}_1$, $\mathcal{R}_2$, and $\mathcal{R}_3$ are given by

$$\mathcal{R}_1 = [q_1(0), y]_d, \quad \mathcal{R}_2 = [q_2(0), y]_{\neg d}, \quad \mathcal{R}_3 = [q_1(0), q_2(0)]_{\neg d}. \qquad (6.27)$$
The following result captures the essence of the cooperative-based approach to search discussed above.

Proposition 6.4. Consider an AI-CPRG with $\hat{f}_i^{-i} = f_{-i}^T$. For $i \in \{1, 2\}$, $C_i$ is initially positioned at $q_i(0)$. The socially optimal utility is given by

$$U_S^* = \max_{y \in \mathcal{R},\, d \in \Phi} h(y; d). \qquad (6.28)$$

Furthermore, with $(y^*, d^*)$ the maximizers of (6.28), a socially optimal strategy, $\hat{s} \in \mathcal{S}_{SO}$, is for $C_i$ to travel from $q_i(0)$ to $y^*$ in direction $d(i)$, where $d(1) = d^*$ and $d(2) = \neg d^*$, wait, if necessary, to rendezvous with $C_{-i}$ at $y^*$, and proceed to explore $\mathcal{R}_3$, in either direction, using a tandem search strategy.
Proof. We begin by showing the social optimum can be achieved by a search strategy of the proposed form; namely, by a strategy in which $C_1$ and $C_2$ rendezvous at a particular point $y \in \mathcal{R}$ and, subsequently, search, in tandem, the portion of $\mathcal{R}$ that remains unexplored using $s_{tdm}$.
It is clear that if $C_1$ and $C_2$ cross paths at any point, the socially optimal strategy going forward is for the cows to search the remainder of $\mathcal{R}$ in tandem. To this end, let $\hat{s}$ be an optimal strategy of the aforementioned form. Assume, to obtain a contradiction, there exists a strategy $s'$ in which (i) at no point do $C_1$ and $C_2$ explore any subset of $\mathcal{R}$ in tandem, and (ii) $U_S(s') > U_S(\hat{s})$, i.e., $s'$ is strictly socially superior to $\hat{s}$.
In light of (i) and (ii), it must be the case that each of the two segments $[q_1(0), q_2(0)]_d$, $d \in \Phi$, contains at most one convex subset belonging to $\mathcal{Z}_i$, for $i \in \{1, 2\}$. To see why, consider that if $[q_1(0), q_2(0)]_d$ contained two or more convex subsets belonging to $\mathcal{Z}_i$, the only way for $\mathcal{Z}_i$ to be realizable is if the cows crossed paths at some point, which would violate assumption (i). Therefore, there exists a point $z \in \mathcal{R}$ at the boundary of $\mathcal{Z}_1(s')$ and $\mathcal{Z}_2(s')$, i.e., $z \in \overline{\mathcal{Z}_1(s')} \cap \overline{\mathcal{Z}_2(s')}$.

Now consider the strategy $\tilde{s}$ in which $C_1$ and $C_2$ travel, from $q_1(0)$ and $q_2(0)$, respectively, along their respective shortest paths toward $z$ and, subsequently, explore any remaining territory in tandem. Let $\mathcal{R}_1 = [q_1(0), q_2(0)]_d$ be such that $z \in \mathcal{R}_1$, and let $\mathcal{R}_2 = \mathcal{R} \setminus \mathcal{R}_1$. Additionally, for $\mathcal{R}' \subseteq \mathcal{R}$, let $U_S(s; \mathcal{R}')$ be the social utility generated, using $s$, over $\mathcal{R}'$, i.e.,

$$U_S(s; \mathcal{R}') = \int_{\mathcal{Z}_1(s) \cap \mathcal{R}'} f_1^T(q)\,dq + \int_{\mathcal{Z}_2(s) \cap \mathcal{R}'} f_2^T(q)\,dq. \qquad (6.29)$$

From previous arguments,

$$U_S(s'; \mathcal{R}_2) = \int_{\mathcal{Z}_1(s') \cap \mathcal{R}_2} f_1^T(q)\,dq + \int_{\mathcal{Z}_2(s') \cap \mathcal{R}_2} f_2^T(q)\,dq \qquad (6.30)$$

$$\le \int_{\mathcal{R}_2} \max\big(f_1^T(q), f_2^T(q)\big)\,dq \qquad (6.31)$$

$$= U_S(\tilde{s}; \mathcal{R}_2). \qquad (6.32)$$
Moreover, because $\mathcal{Z}_i(\tilde{s}) \cap \mathcal{R}_1 = \mathcal{Z}_i(s') \cap \mathcal{R}_1$, $i \in \{1, 2\}$, it follows that $U_S(\tilde{s}; \mathcal{R}_1) = U_S(s'; \mathcal{R}_1)$ and, given (6.30)–(6.32), that $U_S(\tilde{s}) \ge U_S(s')$. Because $\tilde{s}$ is of the proposed form, with rendezvous point $z$, this result stands in opposition to our earlier assumption, namely, that $U_S(s') > U_S(\hat{s})$, since $\hat{s}$ was assumed to be socially optimal among the proposed class of search strategies. The contradiction allows us to streamline our search for a socially optimal strategy to the class of strategies specified in the proposition.
It remains to show the expression in (6.28) is the social utility of a strategy in which the cows rendezvous and conduct the remainder of their search in tandem. To this end, note that the first two terms in (6.26) represent the contributions to $U_S$ accrued by $C_1$ and $C_2$, respectively, as they travel toward one another before meeting at $y$. The final term represents the contribution to $U_S$ that results from $C_1$ and $C_2$ searching all remaining territory in tandem. Therefore, for any $y \in \mathcal{R}$ and $d \in \Phi$, (6.26) is the social utility of a search strategy of the proposed form, of which we know at least one such strategy is socially optimal. By searching over all such profiles, as in (6.28), an $\hat{s} \in \mathcal{S}_{SO}$ is guaranteed to be found. $\square$
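To make the maximization in (6.28) concrete, the following sketch (ours, not the thesis's) discretizes a unit-circumference ring into n cells and maximizes a cell-level analogue of h(y; d) from (6.26)–(6.27) by brute force. The function name, indexing scheme, and densities are illustrative assumptions.

```python
# Brute-force maximization of a discretized h(y; d), per (6.26)-(6.27).
# Unit-circumference ring, n cells of width 1/n; f1, f2 are per-cell
# densities; i1, i2 index the cells containing q1(0) and q2(0).
def socially_optimal_rendezvous(f1, f2, i1, i2):
    """Return (best_h, y_cell, direction) maximizing the discretized h."""
    n = len(f1)
    best = (-1.0, None, None)
    for d, step in (("ccw", 1), ("cw", -1)):
        # cells on the arc from q1(0) to q2(0), traversed in direction d
        arc, k = [], i1
        while k != i2:
            arc.append(k)
            k = (k + step) % n
        arc.append(i2)
        in_arc = set(arc)
        r3 = [k for k in range(n) if k not in in_arc]  # tandem region R3
        for m, y in enumerate(arc):
            r1 = arc[: m + 1]   # C1 explores q1(0) -> y in direction d
            r2 = arc[m + 1 :]   # C2 explores q2(0) -> y in direction -d
            h = (sum(f1[k] for k in r1) + sum(f2[k] for k in r2)
                 + sum(max(f1[k], f2[k]) for k in r3)) / n
            if h > best[0]:
                best = (h, y, d)
    return best

f1 = [2, 2, 2, 2, 0, 0, 0, 0]   # resource 1 on one half; integrates to 1
f2 = [0, 0, 0, 0, 2, 2, 2, 2]   # resource 2 on the other half
h, y, d = socially_optimal_rendezvous(f1, f2, 0, 4)
print(round(h, 6))  # -> 2.0
```

On this disjoint-support instance, a rendezvous at the boundary between the two supports attains the upper bound of 2 from (6.24).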
The following example illustrates how $C_1$ and $C_2$ would search cooperatively to maximize social utility for a specific instance of $f_1^T$ and $f_2^T$.

Example 6.9. Reconsider the initial positions and target distributions from Example 6.3. In this case, $C_2$ values all $q \in \mathcal{R}$ equally, while $C_1$ has a preference for $[b, c]_{ccw}$ and $[e, f]_{ccw}$. Here, the socially optimal strategies of each cow are given by

$$s_1 : q_1(0) \xrightarrow{cw} b, \text{ then } s_{tdm}, \qquad (6.33)$$

$$s_2 : q_2(0) \xrightarrow{ccw} b, \text{ then } s_{tdm}. \qquad (6.34)$$

In this way, $C_1$ harvests $[b, c]_{ccw}$ and $[e, f]_{ccw}$, where $f_1^T > f_2^T$, and $C_2$ harvests the remainder of $\mathcal{R}$ where $f_2^T > f_1^T$, excluding the small arc $[q_1(0), c]_{cw}$. Recall that in the competitive context of an AI-CPRG, $C_1$ was first to visit $[b, f]_{ccw}$. In this sense, the social cost of competition, i.e., the inefficiency introduced by competition, is represented by the difference between $f_2^T(q)$ and $f_1^T(q)$ integrated over $[c, e]_{ccw}$, because $[c, e]_{ccw}$ shifts from being harvested by $C_2$ to $C_1$ when transitioning from a cooperative to a competitive formulation of the search problem. This tradeoff is illustrated in Figure 6-6.
Figure 6-6: A socially optimal search strategy for the scenario considered in Example 6.3. The socially optimal search strategy is illustrated by the purple line: $C_1$ and $C_2$ rendezvous at $b$, having travelled there in the cw and ccw directions, respectively, and proceed to explore $[q_1(0), q_2(0)]_{ccw}$ in tandem. The segment $\mathcal{R}_1$, shown in red, is the portion of the ring that transitions from being explored by $C_2$ in a cooperative search to being visited first by $C_1$ in a competitive search.
6.9 Conclusions
This chapter considered variants of the standard two-cow CPRG. For CPRGs with
a turning cost, we developed an upper bound on the number of times an intelligent
cow would ever reverse direction while searching. Subsequently, we introduced the
AI-CPRG, in which each cow maintains a unique prior on $\mathcal{T}$'s location. When each
cow knows her rival's prior, previous equilibria results extend naturally. However,
when just one cow has an unreciprocated knowledge of her rival's prior, necessary
amendments to the information model of the game introduce new complexities. An
equilibrium profile is provided for one such class of game, in which the less-informed
cow has no idea of the target's location and, accordingly, takes a defensive approach
to search. Here, the more-informed cow leverages her superior situational awareness
to lure, to the extent possible, her rival away from regions of the workspace coveted
by the more-informed cow. Finally, by re-interpreting the goal of finding the target
with the equivalent objective of capturing as much target density as possible, we
provided an interpretation for the social welfare of an AI-CPRG. In this context, the
cooperative notion of a tandem search strategy was introduced to characterize a class
of socially optimal search profiles.
The treatment of AI-CPRGs was restricted to the case where the uninformed
agent maintained a uniform prior on the target's location. It remains to study the
broader class of games in which the second cow maintains a non-uniform prior and
to fully characterize the strategic options available to each cow. We conjecture that
the complexities that arise in these scenarios, which are compounded by the feedback
nature of the game, would make a detailed analysis especially challenging. We elaborate on this issue in the final chapter of the thesis. For now, rather than overindulge
in such a pursuit, we change course once again and, in the next chapter, consider
CPRGs in which targets arrive dynamically.
Chapter 7
Dynamic Cow-Path Games: Search
Strategies for a Changing World
This chapter investigates multi-agent systems in which the agents compete to capture
targets that arrive dynamically. Cows reprise their roles as intrepid search agents,
but now operate in an environment where targets have transport requirements and
continually repopulate the workspace. To address these problems, we introduce the
Dynamic-Environment CPRG, or simply DE-CPRG. We show that greedy searching
(in the myopic sense), although optimal in select instances, can perform very poorly
in others. To bound performance, we establish a condition on the utility each cow
receives in an equilibrium. Recognizing that assessing the long-term effects of short
term actions in an equilibrium setting is, in general, prefixed by an assortment of
unique challenges, we provide a strategy to search the ring that possesses a number
of attractive attributes and, in the long-run, offers some performance guarantees in
the face of inter-agent competition in an ever-changing world.
This chapter is organized as follows. First, we supplement existing CPRG terminology so that we may describe the process by which targets arrive. Subsequently,
we define CPRGs in dynamic environments. From here, we consider greedy strategies, which emphasize finding the next target to appear, potentially at the expense of
securing future targets. We argue that greedy strategies, while occasionally optimal,
have the potential to induce severe capture droughts. We then establish that in any equilibrium, each cow captures half of the targets. Next, we revisit the worst-case performance of greedy search strategies on a more formal level, amassing further
evidence that greedy searching can provide poor performance in persistent settings.
Motivated by the desire to pass along a search strategy that avoids prolonged capture
droughts, we provide a defensive-minded strategy that possesses a number of desirable
attributes. Finally, we characterize a worst-case performance bound for this strategy
and compare it to the theoretical equilibrium value, i.e., one half, for a variety of
target distributions.
7.1 A motivation for dynamic environments
Thus far, the scenarios considered involved just a single target that had been randomly
placed prior to commencement of the game. Accordingly, each cow put a premium on
being the first to find the target. A natural next step, and new challenge for the cows,
is to consider search games in which targets arrive dynamically. In introducing these
encounters, the aim is to provide (i) an abstraction that captures the competitive-search undertones of relevant, real-world scenarios with dynamic componentry and
(ii) a framework to explore and evaluate persistent decision-making strategies.
As an example of a persistent target-capture scenario, consider the operation of
Yellow Cabs, also called Medallions, in Manhattan, New York. Legally, these cabs
may take a job only after being hailed by a passenger. In Chapter 1, we highlighted
how taxi operations of this form could be viewed as a competitive search game. Less
emphasized in that discussion was the long-run nature of the game. For example, in
2013, the average shift length of a taxi driver in New York City was 9.5 hours [23].
To support themselves, drivers must use a customer-foraging strategy that accounts
not only for (i) the distribution of potential patrons and (ii) the presence of nearby
rivals, as in traditional CPGs, but also (iii) the extended duration of a typical shift.
In this regard, a taxi driver should not be overly concerned if a prospective passenger
is preemptively snatched up by a rival driver. However, a driver should be worried
if, upon reflecting on the day's events, they spent a significant fraction of their shift
without a passenger on board.
7.2 Dynamic Cow-Path Games with target transport requirements
To facilitate the forthcoming discussion of DE-CPRGs, we begin by extending our existing suite of notation to handle scenarios involving multiple targets. To start, targets are enumerated according to the order in which they arrive on $\mathcal{R}$, i.e., $\mathcal{T}_1, \mathcal{T}_2, \ldots$, and the spatio-temporal process governing their arrival is denoted by $\phi_T$. In the games we consider, targets have transport requirements. Specifically, each $\mathcal{T}_j$ is described by a pair $(\mathcal{O}_j, \mathcal{D}_j) \in \mathcal{R}^2$, where $\mathcal{O}_j$ and $\mathcal{D}_j$ are $\mathcal{T}_j$'s origin point and destination point, respectively. For $C_i$ to receive credit for capturing $\mathcal{T}_j$, she must (i) be the first to find $\mathcal{T}_j$ at $\mathcal{O}_j$ and (ii) transport $\mathcal{T}_j$ to $\mathcal{D}_j$. Games with this mechanic are motivated by taxi systems in which drivers compete with other vacant cabs to locate customers and provide transport before being paid. The notation $\mathcal{O}_j \sim f_{\mathcal{O}}$ indicates that $\mathcal{O}_j$ is distributed according to the spatial density function $f_{\mathcal{O}} : \mathcal{R} \to \mathbb{R}_{\ge 0}$. In an analogous way, we write $\mathcal{D}_j \sim f_{\mathcal{D}}$. More will be said about the temporal process governing $\mathcal{T}_j$'s arrival shortly.
For $j \ge 1$, we associate three specific times with $\mathcal{T}_j$: the time $\mathcal{T}_j$ arrives on $\mathcal{R}$, the time $\mathcal{T}_j$ is captured by a cow at $\mathcal{O}_j$, and the time $\mathcal{T}_j$ is dropped off at $\mathcal{D}_j$. These times are denoted by $t_a(\mathcal{T}_j)$, $t_c(\mathcal{T}_j)$, and $t_d(\mathcal{T}_j)$, respectively, and, following the natural ordering of events, satisfy $t_a(\mathcal{T}_j) \le t_c(\mathcal{T}_j) \le t_d(\mathcal{T}_j)$. $C_1$ and $C_2$, via the strategy profile they employ, are partly responsible for $t_c(\mathcal{T}_j)$ and $t_d(\mathcal{T}_j)$. $\phi_T$ is fully described by $f_{\mathcal{O}}$, $f_{\mathcal{D}}$, and $t_a(\mathcal{T}_j)$ for $j \ge 1$.

In all games, it is assumed that when $C_i$ discovers a target $\mathcal{T}_j$, she immediately transports $\mathcal{T}_j$ from $\mathcal{O}_j$ to $\mathcal{D}_j$ along the shortest path, either $[\mathcal{O}_j, \mathcal{D}_j]_{cw}$ or $[\mathcal{O}_j, \mathcal{D}_j]_{ccw}$. Furthermore, we assume that at time $t_c(\mathcal{T}_j)$, both $C_1$ and $C_2$ are apprised of $\mathcal{O}_j$ and $\mathcal{D}_j$. This would be the case, for example, in a taxi system where taxis update their status with a central computer at the beginning of each job. The target summary at time $t$, denoted $\mathcal{T}(t)$, is a listing of all target arrival, capture, and delivery times up to time $t$, i.e.,

$$\mathcal{T}(t) = \{t_a(\mathcal{T}_j), t_c(\mathcal{T}_j), t_d(\mathcal{T}_j),\ j \ge 1\}_{\le t}, \qquad (7.1)$$

where, given a set of times $T = \{t_1, t_2, \ldots\}$, $T_{\le t} = \{t_i \in T : t_i \le t\}$. In this way, $\mathcal{T}(t)$ encapsulates $C_i$'s knowledge of all target activity on $\mathcal{R}$ up to $t$. To reflect this information, $\mathcal{I}_i(t)$, $C_i$'s information model of the game at time $t$, is augmented to include $\mathcal{T}(t)$, such that $\mathcal{I}_i(t) = (f_{\mathcal{O}}, q_{[0,t]}, \mathcal{T}(t))$, and the set of $C_i$'s search strategies is given by

$$\mathcal{S}_i = \{s_i : s_i(q_{j,[0,t]}, f_{\mathcal{O}}, \mathcal{T}(t), t)_{j=1,2} \to \Phi\}. \qquad (7.2)$$
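As a minimal illustration of the bookkeeping behind (7.1), the sketch below (our own data layout, not the thesis's) stores per-target event times and filters them to those at most t; `target_summary` and the tuple format are illustrative.

```python
# A minimal sketch of the target summary T(t) in (7.1): keep per-target
# event times and return the ones that have occurred by time t.
def target_summary(events, t):
    """events: list of (t_a, t_c, t_d) tuples per target (None if pending).
    Returns all event times up to and including time t, sorted."""
    return sorted(x for triple in events for x in triple
                  if x is not None and x <= t)

events = [(0.0, 1.2, 2.0), (2.0, 3.5, None)]
print(target_summary(events, 3.0))  # -> [0.0, 1.2, 2.0, 2.0]
```

Note that the second target's capture time (3.5) and still-pending delivery are excluded at t = 3.0, matching the definition of the truncation $T_{\le t}$.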
With multiple targets peppering $\mathcal{R}$ in a DE-CPRG, it no longer makes sense to take $U_i(s) = \mathbb{P}(C_i \leftarrow \mathcal{T}_j \text{ under } s)$ for a given $j \ge 1$, as it ignores a hungry $C_i$'s interest in capturing $\mathcal{T}_j$ for $j \ge 2$. To describe a more appropriate $U_i$, refer to $[t_a(\mathcal{T}_j), t_d(\mathcal{T}_j)]$, i.e., the time during which $\mathcal{T}_j$ is in the system, as stage $j$ of the DE-CPRG and define

$$U_i^j(s) = \mathbb{P}(C_i \leftarrow \mathcal{T}_j \text{ under } s) \qquad (7.3)$$

to be $C_i$'s utility during stage $j$. Note that at this point we have said nothing to rule out the possibility of stages overlapping. Assuming targets are homogeneous, it is sensible for $C_i$ to focus on the fraction of targets she captures in the long run, as illustrated in Figure 7-1. For $i \in \{1, 2\}$, $j \in \mathbb{Z}_{\ge 1}$, define the indicator random variable

$$\mathbb{1}_i^j(s) = \begin{cases} 1 & \text{if } C_i \leftarrow \mathcal{T}_j \text{ under } s \\ 0 & \text{otherwise.} \end{cases} \qquad (7.4)$$
The aggregate utility of $C_i$ is defined as the expected fraction of targets she captures in steady state, i.e.,

$$U_i^{ag}(s) = \mathbb{E}[\text{fraction of targets } C_i \text{ captures under } s] \qquad (7.5)$$

$$= \lim_{m \to \infty} \mathbb{E}\Big[\frac{1}{m} \sum_{j=1}^m \mathbb{1}_i^j(s)\Big] \qquad (7.6)$$

$$= \lim_{m \to \infty} \frac{1}{m} \sum_{j=1}^m \mathbb{E}[\mathbb{1}_i^j(s)] \qquad (7.7)$$

$$= \lim_{m \to \infty} \frac{1}{m} \sum_{j=1}^m \mathbb{P}(C_i \leftarrow \mathcal{T}_j \text{ under } s). \qquad (7.8)$$
Figure 7-1: A sample sequence of target capture times associated with the early stages of a DE-CPRG. In the instance shown, $C_1$ captures targets $\mathcal{T}_1$, $\mathcal{T}_2$, and $\mathcal{T}_4$, while $C_2$ captures targets $\mathcal{T}_3$ and $\mathcal{T}_5$. If the statistics shown are representative of steady-state behavior, then the aggregate utilities of the cows would be $U_1^{ag}(s) = 0.6$ and $U_2^{ag}(s) = 0.4$, respectively.
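The aggregate utility in (7.5)–(7.8) has a simple empirical counterpart: the fraction of stages each cow wins. The sketch below is ours, not the thesis's; applied to the capture sequence of Figure 7-1, it recovers the fractions quoted in the caption.

```python
# Empirical counterpart of the aggregate utility in (7.5)-(7.8): the
# fraction of stages each cow wins over a run of m targets.
def aggregate_utilities(winners):
    """winners: list whose j-th entry is 1 or 2, the cow capturing T_j."""
    m = len(winners)
    return winners.count(1) / m, winners.count(2) / m

# The capture sequence of Figure 7-1: C1 takes T1, T2, T4; C2 takes T3, T5.
print(aggregate_utilities([1, 1, 2, 1, 2]))  # -> (0.6, 0.4)
```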
We are now in a position to formally define the DE-CPRG with transport requirements. Because the problem features a number of components, we opt to use the
more streamlined tuple notation.
Definition 7.1. A DE-CPRG with target transport requirements is described by the tuple

$$\mathcal{G}_{DE\text{-}CPRG} = \big((C_i, \mathcal{I}_i, \mathcal{S}_i, U_i^{ag})_{i=1,2}, \mathcal{R}, \phi_T\big), \qquad (7.9)$$

where $C_i$ retains her usual sensing and movement capabilities; $\mathcal{R}$ is the ring; for $i \in \{1, 2\}$, $\mathcal{I}_i(t) = (q_{j,[0,t]}, \mathcal{T}(t))_{j=1,2}$, $\mathcal{S}_i = \{s_i : s_i(f_{\mathcal{O}}, \mathcal{I}_i(t)) \to \Phi\}$, and $U_i^{ag}$ is given by (7.8); and $\phi_T$ is the process by which, for $j \ge 1$, (i) $\mathcal{T}_j = (\mathcal{O}_j, \mathcal{D}_j)$, (ii) the $\mathcal{O}_j$ are independent and identically distributed (i.i.d.) with $\mathcal{O}_j \sim f_{\mathcal{O}}$, (iii) the $\mathcal{D}_j$ are i.i.d. with $\mathcal{D}_j \sim f_{\mathcal{D}}$, (iv) $\mathcal{O}_j$ and $\mathcal{D}_j$ are independent, (v) $\mathcal{T}_1$ arrives at $t = 0$, and (vi) $t_a(\mathcal{T}_{j+1}) = t_d(\mathcal{T}_j)$. Upon finding $\mathcal{T}_j$, $C_i$ is obligated to travel directly to $\mathcal{D}_j$ along the shortest route. Finally, $C_i$ may transport at most one target at a time. How should $C_i$ search so as to maximize $U_i^{ag}$?
Figure 7-2: An isometric visualization of an instance of the DE-CPRG. The prior $f_{\mathcal{O}}$ is shown in blue as a function of position along $\mathcal{R}$. Also shown are the origin and destination points associated with targets $\mathcal{T}_j$, $\mathcal{T}_{j+1}$, and $\mathcal{T}_{j+2}$. At the instant shown, $C_1$ and $C_2$ are searching $\mathcal{R}$ for $\mathcal{T}_j$. According to $\phi_T$, once $\mathcal{T}_j$ is discovered and transported from $\mathcal{O}_j$ to $\mathcal{D}_j$, $\mathcal{T}_{j+1}$ is popped from the queue of targets and appears on $\mathcal{R}$.
A snapshot of a DE-CPRG is shown in Figure 7-2. Based on $\phi_T$, a DE-CPRG is effectively a sequence of CPRGs played in immediate succession, with the important caveat that the initial conditions of any game are determined by the preceding trajectories of $C_1$ and $C_2$. Accordingly, we refer to stage $j$ of a DE-CPRG as CPRG$_j$, with the time span of CPRG$_j$ being $[t_a(\mathcal{T}_j), t_d(\mathcal{T}_j)]$. It is worth reinforcing that, as stated by the rules of the game, when transporting a target, say $\mathcal{T}_j$, $C_i$'s behavior is unaffected by her strategy $s_i$. Consequently, $C_i$ can use this time to advantageously preposition herself in preparation for CPRG$_{j+1}$. Indeed, as we will see, judicious use of this time is a defining attribute of efficient search strategies.
7.3
Greedy search strategies for the DE-CPRG
With the goal of maximizing (7.8), how should Ci go about searching for targets? To
get the ball rolling, recall that an algorithm to compute ε-NE search strategies for
a CPRG was discussed in Chapter 5. Recognizing that a DE-CPRG is effectively a
sequence of CPRGs played in immediate succession, there is an incentive to investigate
if repurposing earlier results can provide added analytic mileage at little overhead. To
this end, the following search strategy builds upon the equilibria results of Chapter 5
while largely ignoring the complexities introduced by a dynamic environment.
Definition 7.2 (greedy searching). For a DE-CPRG, the greedy search strategy, denoted $s_{gr}$, is the search strategy with the following functionality ($s_i = s_{gr}$):

$$s_i = s^*(q_1(t_j), q_2(t_j)) \text{ in stage } j,\ j \ge 1, \qquad (7.10)$$

where $t_j = t_a(\mathcal{T}_j)$ and $s^*(q_1, q_2)$ is a NE search strategy of the static-environment CPRG, discussed in [89], in which $C_i$ has initial position $q_i$, $i = 1, 2$.
In words, using $s_i = s_{gr}$, $C_i$ tries to maximize her probability of capturing the most recent target to arrive on $\mathcal{R}$. In doing so, $C_i$ has no regard for where on $\mathcal{R}$ she may end up in the event $C_{-i} \leftarrow \mathcal{T}_j$. Should $s_i = s_{gr}$ offer reasonable performance guarantees to $C_i$, it would be tempting to propose $(s_{gr}, s_{gr})$ as a NE. For example, in the case of uniform $f_{\mathcal{O}}$, it is easy to see that $(s_{gr}, s_{gr})$ is a NE profile. Unfortunately, the following example, unsurprisingly, debunks both of these claims for general $f_{\mathcal{O}}$ and $f_{\mathcal{D}}$.
Example 7.3. To better understand the potential pitfalls of $C_i$ using $s_i = s_{gr}$, consider the scenario depicted in Figure 7-3. At the beginning of CPRG$_j$, $C_2$ is separated from region $\mathcal{E}_1$ by $C_1$ and is unlikely to capture $\mathcal{T}_j$. Nevertheless, $C_2$'s best hope of capturing $\mathcal{T}_j$ is to search $\mathcal{R}$ by traveling in the ccw direction. However, in pursuing this agenda, which is consistent with $s_2 = s_{gr}$, not only is $C_2$ unlikely to capture $\mathcal{T}_j$, she is also prone to being stranded along $\mathcal{R} \setminus \{\mathcal{E}_1 \cup \mathcal{E}_2\}$ at the beginning of CPRG$_{j+1}$. Consequently, it is also unlikely that $C_2$ will capture $\mathcal{T}_{j+1}$. Intuitively, it is preferable for $C_2$ to, instead, (i) follow $C_1$ to the edge of $\mathcal{E}_1$, (ii) wait there until the beginning of stage $j + 1$, then (iii) explore $\mathcal{E}_1$, during which time she stands an excellent chance of capturing $\mathcal{T}_{j+1}$. Although this queueing-centric search strategy requires thinking
(a) $f_{\mathcal{O}}$ shown in blue. (b) $f_{\mathcal{D}}$ shown in green.

Figure 7-3: A snapshot of a DE-CPRG taken at the start of CPRG$_j$. The target-origin density, $f_{\mathcal{O}}$, and target-destination density, $f_{\mathcal{D}}$, are shown in (a) and (b), respectively. Targets are significantly more likely to (i) arrive in $\mathcal{E}_1$ rather than $\mathcal{R} \setminus \mathcal{E}_1$ and (ii) seek transport to $\mathcal{E}_2$ rather than $\mathcal{R} \setminus \mathcal{E}_2$.
just one stage ahead, it is beyond the scope of $s_i = s_{gr}$.
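The stranding effect in Example 7.3 can be explored numerically. The simulation below is a deliberately simplified toy model of ours, not the thesis's formal game: both cows race to each origin along their shortest arcs, the nearer cow captures and is relocated to the destination, and the loser halts wherever she is at the capture instant. The densities, initial positions, and tie-breaking rule are all illustrative assumptions.

```python
import random

# Toy greedy-play loop for a DE-CPRG on a unit-circumference ring.
def circ_dist(a, b):
    """Shortest arc length between points a and b on the unit ring."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def toward(p, target, step):
    """Move p a distance step along the shorter arc toward target."""
    if ((target - p) % 1.0) <= 0.5:
        return (p + step) % 1.0
    return (p - step) % 1.0

def simulate(stages, seed=0):
    """Return empirical capture fractions (U1, U2) after `stages` stages."""
    rng = random.Random(seed)
    p = [0.1, 0.6]                    # illustrative initial positions
    wins = [0, 0]
    for _ in range(stages):
        o = rng.uniform(0.0, 0.2)     # origins cluster in E1 = [0, 0.2]
        dest = rng.uniform(0.3, 0.5)  # destinations cluster elsewhere
        d = [circ_dist(p[0], o), circ_dist(p[1], o)]
        w = 0 if d[0] <= d[1] else 1  # nearer cow captures T_j
        p[1 - w] = toward(p[1 - w], o, d[w])  # loser halts mid-pursuit
        p[w] = dest                   # winner delivers T_j to D_j
        wins[w] += 1
    return wins[0] / stages, wins[1] / stages
```

Sweeping the concentrations of $f_{\mathcal{O}}$ and $f_{\mathcal{D}}$ in this toy model gives a quick feel for when greedy racing leaves one cow stranded far from $Su(f_{\mathcal{O}})$, in the spirit of the capture droughts discussed above.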
7.4 Equilibria utilities of cows in the DE-CPRG
Example 7.3, from the previous section, revealed that s_i = s_gr can lead to prolonged capture droughts for C_i. Moreover, it is not obvious how C_i should go about striking an optimal balance between being (i) sufficiently opportunistic so as to capitalize on short-term advantages and (ii) sufficiently disciplined so as to not lose sight of long-term objectives, i.e., (7.8). To this end, imagine C_i, by using s_i, can ensure that she captures at least forty percent of all targets. Is this an impressive tally, or should C_i be doing much better in an equilibrium? In this regard, it would be useful to have a benchmark against which to compare, for a particular s_i ∈ S_i, min_{s_{-i}} U_i^ag(s_i, s_{-i}) with U_i^ag(s*) for s* ∈ S_NE. The following definitions will prove useful in this regard.
Definition 7.4. Let Su(f_O) = {q ∈ Z | f_O(q) > 0} denote the support of f_O. For a DE-CPRG and search profile s ∈ S, s_i is an itinerant search strategy under s if, in the ensuing stages of the game, i.e., CPRG_1, CPRG_2, ..., for all q ∈ Su(f_O), C_i is the first cow to visit q infinitely often.
Definition 7.5. An s ∈ S is an itinerant search profile if s_1 and s_2 are each itinerant search strategies under s.

In words, an itinerant strategy is one that actively explores all regions of the workspace. In this way, an itinerant strategy elicits associations with the notion of recurrent states in Markov chains and renewal theory in general. However, an itinerant strategy is defined in a competitive context not present in these more traditional notions. Owing to the transport requirements on T_j and the independence of O_j and D_j, search strategies that do not expressly prohibit visiting select points on Z are itinerant. The following proposition establishes that the initial conditions of a DE-CPRG play no role in the payoff C_i receives in any itinerant equilibrium search profile.
Proposition 7.1. Consider a DE-CPRG. If there exists an itinerant search profile s* = (s_1*, s_2*) ∈ S_NE, then, irrespective of the initial cow positions, U_1^ag(s*) = U_2^ag(s*) = 1/2.
Proof. The key idea of the proof is that if, in finite time, the cows "switch" positions on Z, then the inherent symmetry in the game mandates the cows receive equal utility going forward. For a more streamlined treatment, we focus on the case where f_O and f_D each have discrete support, i.e., Su(f_O) = Su(f_D) = {θ_1, ..., θ_m} ⊂ Z, where the θ_k are points on Z. Figure 7-4, provided below, captures the essence of the arguments to follow.
Let the initial position of C_i be q_i(0), i = 1, 2. Because (s_1*, s_2*) ∈ S_NE, C_i captures a target infinitely often; otherwise, U_i^ag(s_1*, s_2*) = 0 and C_i could simply camp out at any station in Su(f_O) and improve her utility, which contradicts (s_1*, s_2*) ∈ S_NE. Also, by similar arguments, Su(f_O) = {θ_1, ..., θ_m} implies any equilibrium profile, including (s_1*, s_2*), is itinerant. Therefore, in finite time, C_2 will capture a target, say T_j, at a station, say θ_a, that requires transport to station θ_b, with

    D(θ_a, θ_b) = max_{a',b'} D(θ_{a'}, θ_{b'}) ≤ L/2,

where L = 2π is the length of Z. While C_2 transports T_j to θ_b, C_1 has sufficient time to optimally position herself on Z, say at station θ_c, in preparation for CPRG_{j+1}, which, owing to (s_1*, s_2*) ∈ S_NE, she does. Because CPRG_{j+1}, in which C_1 and C_2 start from θ_c and θ_b, respectively, emerged in finite time, the outcomes of CPRG_1, ..., CPRG_j, in terms of which cow captured which targets, contribute only transitorily to U_1^ag(s*) and

    U_1^ag(s_1*, s_2*, q_1(0), q_2(0)) = U_1^ag(s_1*, s_2*, θ_c, θ_b),   (7.11)

where U_i^ag(s_1, s_2, q_1, q_2) is the utility, i.e., (7.8), C_i derives when C_1 and C_2 search using s_1 and s_2, starting from q_1 and q_2 on Z, respectively.

By similar arguments, there exists k < ∞, k > j, such that CPRG_k starts with C_1 and C_2 at θ_b and θ_c, respectively, implying

    U_2^ag(s_1*, s_2*, q_1(0), q_2(0)) = U_2^ag(s_1*, s_2*, θ_b, θ_c).   (7.12)

Because U_1^ag and U_2^ag are both given by (7.8), the symmetry in (7.11) and (7.12), in conjunction with the constant-sum nature of the game, implies C_1 and C_2 receive equal utility, i.e., U_1^ag(s*) = U_2^ag(s*) = 1/2, which is the requested result. ∎
Figure 7-4: Illustration of two scenarios used in the proof of Proposition 7.1. In (a), C_2 discovers a target at θ_a that requires transport to θ_b, a distance L/2 away. During transport, C_1 has time to optimally preposition herself at θ_c in preparation for the next stage. A finite time later, in (b), the roles reverse: C_1 discovers a target at θ_a that also requires transport to θ_b, allowing C_2 to optimally position herself at θ_c for the next game.
We remark that, while necessary, the condition U_1^ag(s) = U_2^ag(s) = 1/2 is not sufficient to guarantee s ∈ S_NE. For example, reconsider the game depicted in Figure 7-3. Assume the cows begin at diametrically opposed points on Z. If, for i = 1, 2, C_i uses the strategy s_i = s_ccw when the current target in play has yet to be discovered, and travels in the same direction as C_{-i} when C_{-i} is delivering a target, then U_1^ag(s_ccw, s_ccw) = U_2^ag(s_ccw, s_ccw) = 1/2. However, the door is left (wide) open for C_{-i} to take advantage of C_i's actions. Namely, by deviating to a strategy s' in which she travels in the clockwise direction to Θ_1 immediately after dropping off each target she captures, C_{-i} is likely to capture and deliver multiple targets in the time it takes C_i to return to Θ_1.

Beyond providing a necessary condition for s ∈ S_NE, Proposition 7.1 has another use. Namely, it states that 1/2 is an upper bound on min_{s_{-i}} U_i^ag(s_i, s_{-i}), the worst-case performance of s_i ∈ S_i, as a function of s_i. Therefore, the closer this value is to 1/2, the more legitimate the case for C_i to use s_i. This provides useful insight in selecting an s_i in the event that (i) an equilibrium cannot be found, or (ii) an equilibrium is known, but it is difficult for C_i to execute, e.g., it requires significant computational overhead or planning on the part of C_i. In the next section, we revisit greedy searching, consider the worst-case performance of s_i = s_gr for C_i, and compare this quantity with the value 1/2.
7.5 An aggregate worst-case analysis of greedy searching
Owing to the persistent nature of DE-CPRGs, quantifying the long-term effects of short-term actions is a challenging pursuit. This is especially true in an equilibrium setting. Moreover, such strategies, should they exist, may prove to be computationally intensive for C_i to implement. In light of these challenges, we proceed with our analysis along the following lines. First, we pursue a high-level analysis predicated on aggregate statistics of the DE-CPRG. Second, we focus on defensive search strategies that can be implemented by C_i with limited complexity and study the performance guarantees they provide, i.e., the performance of s_i in min_{s_{-i}} U_i^ag(s_i, s_{-i}). To begin, we define the average transport distance of a target as

    d_a = lim_{j→∞} E[D(O_j, D_j)] = ∫∫ f_O(q_1) f_D(q_2) D(q_1, q_2) dq_1 dq_2.   (7.13)

From the i.i.d. properties of O_j and D_j, E[D(D_j, O_{j+1})] = d_a. Hence, the average rate at which C_i, searching uncontested on Z, can deliver targets is no more than one target in time 2 d_a. Similarly, we define the minimum average delivery distance as

    d_m = min_q E[D(q, D_j)] = min_q ∫ f_D(q_1) D(q, q_1) dq_1,   (7.14)
    q_m = argmin_q E[D(q, D_j)] = argmin_q ∫ f_D(q_1) D(q, q_1) dq_1,   (7.15)

with d_m := E[D(q_m, D_j)]. In words, q_m ∈ Z is the point or, more generally, the set of points on Z where, upon finding T_j, the time required to transport T_j to D_j is the smallest. In this way, d_m serves as a worst-case lower bound on the average amount of time C_i will, having lost CPRG_j, have to position herself, from her current location, in preparation for CPRG_{j+1}. Because we will frequently refer to an optimal quantity and its optimizing argument(s), we emphasize the association through the following operator:

    (arg, ·) max_x f(x) = (argmax_x f(x), max_x f(x)).   (7.16)

For example, we may express (7.14)-(7.15) as

    (q_m, d_m) = (arg, ·) min_q E[D(q, D_j)].   (7.17)
The following result quantifies the potentially poor performance of greedy searching, first alluded to in Example 7.3.
Proposition 7.2. Consider a DE-CPRG. If C_i uses a greedy strategy, i.e., s_i = s_gr, then her utility satisfies

    min_{s_{-i}} U_i^ag(s_gr, s_{-i}) ≥ β / (1 + β − α) ≥ α,   (7.18)

where α and β, β ≥ α, are the utility constants

    α = min_{q_i, q_{-i}} Ū_i(q_i, q_{-i}),   (7.19)
    β = min_{q_i, q_{-i}} max_{q ∈ B(q_i, d_m)} Ū_i(q, q_{-i}),   (7.20)

where B(q_i, d) := {q ∈ Z | D(q, q_i) ≤ d}, and Ū_i(q_1, q_2) is the stage utility afforded to C_i in a CPRG where C_1 and C_2 have initial positions q_1 and q_2, respectively, and play an equilibrium strategy of the form outlined in Chapter 5.
Proof. Before proceeding, we remark that α and β are especially pessimistic quantities from the perspective of C_i. Namely, α and β are determined by minimizing Ū_i(q_i, q_{-i}) with respect to both q_i and q_{-i}. Although it is certainly conceivable for C_{-i} to maximize Ū_{-i}(q_i, q_{-i}), and therefore minimize Ū_i(q_i, q_{-i}), it is less common to see Ū_i(q_i, q_{-i}) minimized with respect to q_i. The reason for minimizing Ū_i over q_i is to emphasize worst-case positioning when C_i uses s_i = s_gr and therefore does not plan for how q_i, at the beginning of CPRG_j, affects q_i at the beginning of CPRG_{j+1}. In this sense, α represents the scenario in which C_i has captured and delivered T_j, but is subsequently "stranded" on a region of Z that affords a poor chance of capturing T_{j+1}. Similarly, β corresponds to the scenario in which C_i does not capture T_j and is poorly positioned, even with repositioning, to contest T_{j+1}.
Returning to the proof of Proposition 7.2, define C_i's worst-case utility, in the DE-CPRG, from using a greedy search, i.e., s_i = s_gr, as

    w = min_{s_{-i}} U_i^ag(s_gr, s_{-i}).   (7.21)

Given C_1 and C_2 have initial positions q_1 and q_2 at the beginning of CPRG_j, C_i's probability of capturing T_j is at least Ū_i(q_1, q_2), since if C_{-i} were to deviate from equilibrium play, it could, given CPRG_j is constant-sum, only improve C_i's chance of capturing T_j.

Therefore, for j ≥ 1, α is the worst possible chance C_i has of capturing T_j using s_i = s_gr. In the event C_i fails to capture T_j, she has, on average, at least time d_m to relocate herself before CPRG_{j+1} begins. Therefore, by similar reasoning, β is an approximation of the worst possible chance C_i has of capturing T_{j+1} in CPRG_{j+1} following a failed capture bid in CPRG_j. Because w accounts for targets that are captured after both successes and failures in the previous stage, we write the steady-state expression

    w = lim_{j→∞} ( w min_{s_{-i}} P(C_i ← T_{j+1} | C_i ← T_j; s_gr, s_{-i})   (7.22)
          + (1 − w) min_{s_{-i}} P(C_i ← T_{j+1} | C_{-i} ← T_j; s_gr, s_{-i}) ).   (7.23)

In minimizing over s_{-i}, the conditional probabilities for which steady-state limits are taken in (7.23) lose their dependence on j, ensuring the limits exist and allowing us to write

    w = w ( min_{s_{-i}} P(C_i ← T_{j+1} | C_i ← T_j; s_gr, s_{-i}) )   (7.24)
          + (1 − w) ( min_{s_{-i}} P(C_i ← T_{j+1} | C_{-i} ← T_j; s_gr, s_{-i}) )   (7.25)
      ≥ w α + (1 − w) β.   (7.26)

Solving for w in (7.26) yields (7.18), the desired quantity. ∎
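The fixed-point step at the end of the proof is easy to check numerically: replacing the inequality in (7.26) with equality and iterating the resulting affine recursion converges, from any starting point, to the closed form in (7.18). A minimal sketch with illustrative (not thesis-derived) values of α and β:

```python
def greedy_lower_bound(alpha, beta):
    """Closed-form fixed point of w = w*alpha + (1 - w)*beta, cf. (7.18)."""
    return beta / (1.0 + beta - alpha)

def iterate_recursion(alpha, beta, steps=200):
    """Iterate the steady-state recursion (7.26), taken with equality,
    from an arbitrary starting point."""
    w = 0.5
    for _ in range(steps):
        w = w * alpha + (1.0 - w) * beta
    return w

alpha, beta = 0.10, 0.35  # illustrative utility constants with beta >= alpha
w_closed = greedy_lower_bound(alpha, beta)
w_iter = iterate_recursion(alpha, beta)
print(f"closed form: {w_closed:.6f}, iterated: {w_iter:.6f}")
```

When β = α the closed form collapses to α, consistent with the bound degenerating in the stranded scenario of Example 7.3.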
A couple of remarks are in order. First, (7.18) is, in general, a highly pessimistic result. It is predicated on the fact that C_i and C_{-i} are arranged in the worst possible configuration, from the perspective of C_i, at the beginning of each stage. Naturally, motion constraints of the cows and the stochastic nature of target arrivals make this unlikely for general distributions. Nevertheless, the assumptions are, given the current state of the art, necessary to establish the inequality chain given the general setting and competitive nature of the game. While (7.18) is conservative in general, it can become tight in extreme scenarios. Reconsidering Example 7.3, we have that β = α and (7.18) regresses to α as well. This result confirms our earlier intuition that by using s_i = s_gr, C_i can become stranded on Z \ {Θ_1 ∪ Θ_2} and realize worst-case capture rates over an arbitrarily large number of games. Second, if f_O is uniform over Z, then α = β = min_{s_{-i}} U_i^ag(s_gr, s_{-i}) = 1/2, making movement memoryless, and, as noted earlier, it is easy to see that (s_gr, s_gr) ∈ S_NE.
For general f_O, greedy searching illustrates, in the extreme, the potential pitfalls of focusing on short-term gains at the expense of long-term ambitions. How, then, can this tradeoff be optimized in the context of a DE-CPRG? We argue that precisely characterizing the long-term effects of short-term actions, given the competitive and stochastic nature of a DE-CPRG, is a challenging pursuit. Moreover, equilibrium strategies, should they exist, may place an unrealistic computational burden on C_i. In recognition, we change course and advocate for defensive strategies with quantifiable and respectable capture rates. First, we step back from pursuing equilibrium strategies and seek, instead, to identify search strategies that avoid prolonged capture droughts and provide reasonable performance guarantees.

As a first step in this direction, we define the following search strategy.
Definition 7.6. Consider a DE-CPRG. Define C_i's maximin starting point and maximin utility of the game as

    (q_i^*, Ū_i^*) = (arg, ·) max_{q_i} min_{q_{-i}} Ū_i(q_i, q_{-i}),   (7.27)

respectively. C_i is said to use a maximin search strategy, denoted s_i = s_i^*, if (i) C_i travels to q_i^* after each successful delivery and after each failed capture bid, capturing any target found along the way, and (ii) upon reaching q_i^*, C_i plays an equilibrium strategy for the remainder of the current game.

By playing s_i = s_i^*, C_i ensures herself a reasonable chance, i.e., Ū_i^*, of capturing T_j in each CPRG_j she begins from q_i^*. In this way, s_i^*, by avoiding prolonged searching over low-valued regions of Z, directly addresses the major pitfall of s_gr. Unfortunately, s_i^* is, itself, poorly suited to a dynamic world because, in a setting where games are played in immediate succession, C_i may spend a significant fraction of time relocating herself to q_i^*. Extending earlier notation, let

    E_i(q_i(t), q_{-i}(t)) = {q ∈ Z | D(q, q_i(t)) ≤ D(q, q_{-i}(t))}   (7.28)

denote the set of points on Z closer to C_i than C_{-i} at time t. For convenience, we will frequently write E_i(t) as a shorthand for E_i(q_i(t), q_{-i}(t)). The following strategy addresses the limitations of s_i^*. At a high level, it ensures C_i maintains a reasonable chance of finding a target in the stages ensuing a loss.
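The set E_i of (7.28) is simple to materialize on a discretized ring. The sketch below is our own illustration (grid size and cow positions are arbitrary); it marks the points at least as close to C_i as to C_{-i}, and with the cows diametrically opposed each cow's share is half the ring:

```python
import numpy as np

L = 2 * np.pi  # length of the ring Z

def ring_dist(a, b):
    """Shortest arc distance on a ring of length L."""
    d = np.abs(a - b) % L
    return np.minimum(d, L - d)

def closer_set(grid, q_i, q_mi):
    """Boolean mask of E_i(q_i, q_mi): grid points at least as close
    to C_i as to C_-i, cf. (7.28) (ties awarded to C_i)."""
    return ring_dist(grid, q_i) <= ring_dist(grid, q_mi)

grid = np.linspace(0.0, L, 1000, endpoint=False)
mask = closer_set(grid, 1.0, 1.0 + np.pi)  # cows diametrically opposed
print(f"fraction of ring closer to C_i: {mask.mean():.3f}")
```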
Definition 7.7 (dynamic mirroring strategy, s_i = s_dm). Define the following time-dependent, land-claim-related quantities for t ≥ t_a(T_j):

    V_{i,j}(t) = {q ∈ Z : C_i is the first cow to visit q in the time t − t_a(T_j)},   (7.29)
    V_j(t) = V_{1,j}(t) ∪ V_{2,j}(t),   (7.30)
    V̄_j(t) = Z \ V_j(t).   (7.31)

The dynamic mirroring search strategy, denoted s_dm, has the following functionality:

    φ_i(t) = −φ_{-i}(t),  if q_i(t) ∉ V̄_j(t),
           = θ̂_i,        if D(q, q_i(t)) ≤ D(q, q_{-i}(t)), ∀q ∈ V̄_j(t),   (7.32)
           = φ_{-i}(t),   otherwise,

where φ_i(t) denotes C_i's heading at time t and θ̂_i is determined by Algorithm 7.1, provided below.

In words, s_i = s_dm specifies that C_i head toward C_{-i} as if to set up a head-on collision. However, if C_{-i} should venture into E_i, then C_i (i) stays just ahead of C_{-i}, unless (ii) C_{-i} ventures far enough into E_i that, by continuing with her current heading, as stipulated in line 3 of Algorithm 7.1, C_i is guaranteed to capture T_j.
Algorithm 7.1: determine θ̂_i for s_i = s_dm
1: θ̂_i = undefined;
2: while true do
3:   if D(q, q_i(t)) ≤ D(q, q_{-i}(t)), ∀q ∈ V̄_j(t) then
4:     θ̂_i = φ_i(t);
5:     break
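The guard on line 3 of Algorithm 7.1 is a dominance check over the as-yet-unexplored portion of the ring: once every unexplored point is at least as close to C_i as to her rival, C_i can lock in her current heading. The sketch below is our own phrasing of that check (function and variable names are ours; `unexplored` stands in for the unexplored set):

```python
import numpy as np

L = 2 * np.pi  # length of the ring Z

def ring_dist(a, b):
    """Shortest arc distance on a ring of length L."""
    d = np.abs(a - b) % L
    return np.minimum(d, L - d)

def guaranteed_capture(q_i, q_mi, unexplored):
    """Line 3 of Algorithm 7.1: C_i dominates every unexplored point,
    so continuing her current heading reaches the target first."""
    unexplored = np.asarray(unexplored)
    if unexplored.size == 0:
        return False  # nothing left to search; guard is moot
    return bool(np.all(ring_dist(unexplored, q_i) <= ring_dist(unexplored, q_mi)))

# Unexplored region clustered just ahead of C_i and far from C_-i.
unexplored = np.array([0.2, 0.4, 0.6])
print(guaranteed_capture(0.0, np.pi, unexplored))  # C_i closer to all -> True
print(guaranteed_capture(np.pi, 0.0, unexplored))  # roles swapped -> False
```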
The attractive feature of s_dm is that it allows C_i to be the first to visit all points she is closest to at the beginning of each stage. Moreover, it is a strategy that allows her to guard territory on Z, such that, if it is in her favor, she can continue to be the first to explore this territory from one game to the next, until such time as she finds the target. Of course, controlling a particular half of Z is only useful if there is a decent chance it will contain a target in the near future. To this end, define C_i's persistent maximin point, q_i^p, as

    (q_i^p, U_i^p) = (arg, ·) max_{q_i} min_{q_{-i}} ∫_{E_i(q_i, q_{-i})} f_O(q) dq.   (7.33)

To chronicle repositioning efforts during the time C_{-i} is busy transporting T_j from O_j to D_j, we also consider a related notion,

    q_i^z(q_i, q_{-i}, q_d) = argmax_{q ∈ B} ∫_{E_i(q, q_d)} f_O(q') dq',   (7.34)

where B is shorthand for B(q_i, D(q_{-i}, q_d)), i.e., the set of points C_i can reach, from q_i, in the time it takes C_{-i} to travel from q_{-i} to q_d. Having introduced s_dm and q_i^z, we may, finally, fuse these ideas to provide a search strategy tailored to the persistent nature of a DE-CPRG [91].
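The persistent maximin point of (7.33) can be brute-forced on a grid: for each candidate q_i, the worst-case opponent position minimizes the f_O-mass of E_i(q_i, q_{-i}). A rough sketch, with an illustrative unimodal density; all names, grid sizes, and parameters here are our own choices, not the thesis's:

```python
import numpy as np

L = 2 * np.pi  # length of the ring Z
n = 160
grid = np.linspace(0.0, L, n, endpoint=False)
step = L / n

def ring_dist(a, b):
    """Shortest arc distance on a ring of length L."""
    d = np.abs(a - b) % L
    return np.minimum(d, L - d)

# Illustrative origin density, unimodal around q = 1.0.
f_O = np.exp(-2.0 * ring_dist(grid, 1.0) ** 2)
f_O /= f_O.sum() * step

def claimed_mass(q_i, q_mi):
    """Probability mass of f_O on E_i(q_i, q_mi), the half C_i controls."""
    mask = ring_dist(grid, q_i) <= ring_dist(grid, q_mi)
    return float((f_O * mask).sum() * step)

# Brute-force (arg, .) max-min of (7.33) over the grid.
inner_min = np.array([min(claimed_mass(qi, qmi) for qmi in grid) for qi in grid])
k = int(np.argmax(inner_min))
q_p, U_p = grid[k], inner_min[k]
print(f"q_i^p = {q_p:.3f}, U_i^p = {U_p:.3f}")
```

For a density concentrated around a single mode, the computed q_i^p lands near that mode, in line with the intuition that C_i should guard the half of the ring most likely to generate targets.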
Definition 7.8 (dynamic maximin search strategy). Enumerate the targets captured by C_i in a DE-CPRG as T_{i,1}, T_{i,2}, .... The functionality of the dynamic maximin search strategy, s_i = s_m, is described by Algorithm 7.2.
Algorithm 7.2: functionality of s_i = s_m
1: j = k = 1;
2: at t = 0, travel to q_i^p;
3: while true do
4:   while C_i still searching for next target T_{i,j} do
5:     s_i = s_dm;
6:     if C_{-i} captures T_{-i,k} then
7:       C_i travels to q_i^z(q_i(t), q_{-i}(t), D(T_{-i,k}));
8:       k ← k + 1;
9:   % C_i has found T_{i,j};
10:  travel to D(T_{i,j});
11:  j ← j + 1;
12:  travel to q_i^p, capturing any target T_{i,j} found along the way;
In words, s_i = s_m affords C_i the following benefits: (1) C_i has a reasonable chance of capturing T_{i,j} once she begins searching from q_i^p, and (2) C_i improves her chance of capturing a target when a positional advantage she possesses can be exploited. The next proposition cements the latter point.
Proposition 7.3. Consider a DE-CPRG in which s_i = s_m. For k ≥ 2, define t_{-i,k} = t_d(T_{-i,k-1}), i.e., the time C_{-i} delivers T_{-i,k-1} and begins searching for T_{-i,k}. For j ∈ Z_{≥1}, define K(j) to be the set of k ∈ Z such that t_a(T_{-i,k}) ∈ [t_d(T_{i,j}), t_c(T_{i,j+1})] and C_i has visited q_i^p in [t_d(T_{i,j}), t_{-i,k}]. For j ∈ Z_{≥1}, k ∈ K(j), C_i's probability of capturing T_{k+j} satisfies:

(i) P(C_i ← T_{k+j}) ≥ U_i^p,
(ii) P(C_i ← T_{k+j}) is non-decreasing over [t_d(T_{i,j}), t_c(T_{i,j+1})],
(iii) P(C_i ← T_{k'+j}) ≥ P(C_i ← T_{k+j}) for all k' ≥ k, k' ∈ K(j),

where t_d(T_{i,j}) and t_c(T_{i,j+1}) are the times at which C_i delivers T_{i,j} and captures T_{i,j+1}, respectively.
Proof. By traveling to q_i^p, C_i has probability at least U_i^p of capturing the target in the next stage of the DE-CPRG. Moreover, by using s_i = s_dm during all subsequent stages until C_i captures her next target, and only shifting E_i when it is profitable to do so, (i) is established. If C_i shifts E_i(t_{-i,k}) → E_i(t_{-i,k+1}), then it must be that (1) doing so was profitable for C_i, or (2) C_{-i} delivered T_{-i,k} inside E_i(t_{-i,k}). In the first case, C_i improves P(C_i ← T_{k+j+1}). In the second case, C_i ensures P(C_i ← T_{k+j+1}) ≥ P(C_i ← T_{k+j}) by positioning herself on whichever side of D(T_{-i,k}) is the more advantageous, i.e., controls the larger share of f_O, for C_i, before the next stage commences. This argument establishes (ii). Finally, (iii) follows from the conjunction of the previous arguments. ∎
Proposition 7.3 speaks to C_i's ability to continually improve her capture probability following each stage in which she was unsuccessful under s_m. The next result addresses C_i's associated long-run capture rate.
Proposition 7.4. Consider a DE-CPRG. By using s_i = s_m, C_i can ensure that her utility satisfies

    min_{s_{-i}} U_i^ag(s_m, s_{-i}) ≥ 1 / (2 + max(0, d̄ − d_a)/(2 d_a) + 1/U_i^p)   (7.35)
                                    ≥ U_i^p / (3 + max(0, d̄ − d_a)/(2 d_a)),   (7.36)

where d̄ = ∫ f_D(q) D(q, q_i^p) dq is the average time C_i takes to travel from D(T_{i,j}) to q_i^p.
q
Proof. To establish the bounds in (7.35)-(7.36), we again structure our analysis according to a combination of aggregate statistics and worst-case arguments. Figures 7-5
(above) and 7-6 (below) provide visual support for the arguments that follow. For
s E S, denote the set of games Ci loses after delivering
as 'Pij(s).
i,;_ 1 , but before reaching 4
We will refer to 'Pij(s) as Ci's j-th positioning phase. Similarly, denote
the set of games Ci loses after having started actively searching from 4 , but before
finding 73 as 8, (s). Likewise, we will refer to 83,(s) as Ci's j-th searching phase.
131
Figure 7-5: Visual breakdown of a typical interval spanning the time between successive target captures for C_i using s_i = s_m. On average, it takes C_i time d̄ to return to q_i^p after delivering T_{i,j}. From the perspective of C_i, in the worst case, C_{-i} finds a target at time t_d(T_{i,j})+, which, on average, is delivered in time d_a.
The number of games that constitute P_{i,j}(s) and S_{i,j}(s) are denoted by |P_{i,j}(s)| and |S_{i,j}(s)|, respectively. On occasion, and when the meaning is clear, we will drop the dependence on s.

Assuming C_{-i} is a worst-case rival to C_i, C_{-i} would be immediately positioned to capture a target at t_d(T_{i,j-1})+, i.e., the instant C_i delivers T_{i,j-1}, which she could deliver, on average, in time d_a. From the geometry of Z, E[D(q_i^p, D_{j-1})] = d̄, so that during P_{i,j}, C_i travels an average distance of d̄. This leaves, on average, max(0, d̄ − d_a) time remaining until C_i reaches q_i^p. As noted earlier, when searching uncontested for a target on Z, the distance a cow must travel from delivery to pickup, and vice versa, is, on average, at least d_a. Therefore, while C_i travels to q_i^p, C_{-i} can capture, on average, no more than max(0, d̄ − d_a)/(2 d_a) targets in addition to the target captured at t_d(T_{i,j-1})+, i.e.,

    lim_{j→∞} max_{s_{-i}} E[|P_{i,j}(s_m, s_{-i})|] ≤ 1 + max(0, d̄ − d_a)/(2 d_a).   (7.37)

A visualization for this most recent line of reasoning is provided in Figure 7-5.
Continuing, for each game in S_{i,j}, C_i's chance of capturing T_{i,j} is, by Proposition 7.3, a Bernoulli random variable with a probability of success of at least U_i^p. Hence, the duration of S_{i,j} is, on average, no more than 1/U_i^p − 1 stages. However, it is possible that, at the instant C_i reaches q_i^p, C_{-i} has already explored part of E_i. In a worst-case setting, we can resolve the situation by assuming the target in any partially completed game is found by C_{-i}. In this case,

    lim_{j→∞} max_{s_{-i}} E[|S_{i,j}(s_m, s_{-i})|] ≤ 1 + (1/U_i^p − 1) = 1/U_i^p.   (7.38)
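The bound (7.38) rests on a standard geometric-distribution fact: if each stage is won independently with probability at least U_i^p, the expected number of losses before a win is at most 1/U_i^p − 1. A quick Monte Carlo check of this fact (illustrative code, not from the thesis):

```python
import random

def expected_losses_before_success(p, trials=200_000, seed=0):
    """Monte Carlo estimate of the mean number of failed stages before a
    success, when each stage succeeds independently with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        losses = 0
        while rng.random() >= p:  # stage lost
            losses += 1
        total += losses
    return total / trials

p = 0.4
est = expected_losses_before_success(p)
exact = 1.0 / p - 1.0  # geometric distribution: E[losses] = (1 - p) / p
print(f"estimate: {est:.3f}, exact: {exact:.3f}")
```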
Referring to Figure 7-6, the fraction of targets C_i captures using s_i = s_m, i.e., U_i^ag(s_m, s_{-i}), is the fraction of green circles in an infinitely long sample run, or, mathematically,

    U_i^ag(s) = 1 / ( lim_{j→∞} E[|P_{i,j}(s)|] + lim_{j→∞} E[|S_{i,j}(s)|] + 1 ),   (7.39)

where the additional 1 in the denominator of (7.39) represents a green circle, i.e., a capture, in the sequence of games. Then, for s_i = s_m, C_i's worst-case performance satisfies

    min_{s_{-i}} U_i^ag(s_m, s_{-i}) ≥ min_{s_{-i}} 1 / ( lim_{j→∞} E[|P_{i,j}(s_m, s_{-i})|] + lim_{j→∞} E[|S_{i,j}(s_m, s_{-i})|] + 1 )   (7.40)
                                    ≥ 1 / ( 2 + max(0, d̄ − d_a)/(2 d_a) + 1/U_i^p ).   (7.41)
∎
The second inequality, i.e., (7.36), follows from the fact that U_i^p ∈ (0, 1), with the rightmost term representing the fraction of games C_i would win if she were to travel to q_i^p following each stage, i.e., both successful and unsuccessful capture bids. The fact that C_i can, in most instances, guarantee a utility strictly greater than this value is a testament to the persistent guarantees of s_m.
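The bounds (7.35)-(7.36) are cheap to evaluate once d̄, d_a, and U_i^p are known. A small helper (ours, with illustrative inputs): evaluating an instance with d̄ = d_a and U_i^p = 1/2 gives 1/4 for (7.35).

```python
def dynamic_maximin_bounds(d_bar, d_a, U_p):
    """Evaluate the worst-case capture-rate bounds (7.35)-(7.36)."""
    reposition = max(0.0, d_bar - d_a) / (2.0 * d_a)
    lower = 1.0 / (2.0 + reposition + 1.0 / U_p)  # (7.35)
    looser = U_p / (3.0 + reposition)             # (7.36)
    return lower, looser

lower, looser = dynamic_maximin_bounds(d_bar=1.0, d_a=1.0, U_p=0.5)
print(f"(7.35) = {lower:.4f}, (7.36) = {looser:.4f}")  # 0.2500, 0.1667
```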
As a quick remark, note that for games in which f_O and f_D are more-or-less unimodal, U_i^p ≈ 1/2 and d̄ ≈ d_a, such that (7.35) is approximately 1/4. Conversely, for games in which f_O and f_D are uniform over Z, q_i^p is any point on Z, such that repositioning is unnecessary, i.e., d̄ = 0, and U_i^p = 1/2, such that min_{s_{-i}} U_i^ag(s_i, s_{-i}) is also 1/2. Naturally, general cases, i.e., general f_O and f_D, require customized analysis
(a) A segment of games, from a sample run, illustrating the outcome of each game from the perspective of C_i. Green circles denote games in which C_i captures a target. Red circles denote games during C_i's positioning phase, P_{i,j}, in which C_{-i} captures the target. Finally, blue circles denote games during C_i's searching phase, S_{i,j}, in which C_{-i} captures the target.

(b) A segment of games, from a sample run, in which C_i captures a target during P_{i,j}, i.e., en route to q_i^p. In this case, S_{i,j} = ∅, i.e., C_i transitions from P_{i,j} to P_{i,j+1} without a search phase.

Figure 7-6: Segments from possible sample runs, from the perspective of C_i, of a DE-CPRG.
to understand how distribution specifics affect the relationship between d̄ and U_i^p that drives (7.35). Nevertheless, it is comforting to know that, for select distributions, (7.35) is not overly removed from 1/2. In instances where this discrepancy is acceptable, perhaps owing to the inability to implement highly complex search strategies or because better-performing strategies are difficult to identify altogether, it is reasonable to advocate C_i adopt s_m when playing a DE-CPRG.
7.6 Conclusions and Future Directions
This chapter investigated scenarios in which agents compete to capture targets that
arrive dynamically in an environment. For games in which targets must be discovered
and then delivered to random locations, it was established that greedy strategies, in
general, perform poorly. In response, we specified a search strategy with maximin
undertones that is tailored to the persistent nature of the game and, using a high-level
aggregate-oriented analysis, provided performance bounds on the worst-case capture
ability of the strategy. For select target distributions, we showed this search policy was
able to guarantee a capture rate that was either on par with or within a respectable
factor of what would be achieved in an equilibrium.
Dynamic environment competitive search games are defined, in large part, by
the rules governing the arrival and departure of targets to and from the workspace.
Permuting solely over the various mechanisms by which targets can (i) enter, (ii)
accumulate, and (iii) exit the environment, one quickly generates a collection of games
that is both sizeable and functionally diverse. By way of the DE-CPRG, this chapter
has studied a number of issues likely to be of overarching relevance to an assortment of
these games. However, there remains a bevy of encounters yet to receive a treatment
tailored to the specifics by which their environments evolve.
For more elaborate
formulations, analyzing these games on an agent-by-agent basis may, as with the
DE-CPRG, prove difficult. Developing the appropriate modeling formalisms and the
necessary analytic techniques to study these games is a natural course for future work.
We speak to these points in more specific terms in the next chapter.
Chapter 8
Summary and Future Directions
This thesis was motivated by the study of systems in which multiple mobile, self-interested agents compete to capture targets. In the scenarios considered, each agent
had minimal sensing capabilities and limited prior knowledge of each target's location.
It was argued that many real-world systems, including taxi fleets, are, in large part,
driven by similar inter-agent search dynamics.
For a variety of related scenarios,
we asked the question, "What strategy should a particular agent use to search for
targets?" To provide an answer, we introduced Cow-Path games, scenarios in which
hungry cows compete to capture edible targets, as a framework to understand agent
decision-making in adversarial search settings.
It was argued that the most basic environment for which the Cow-Path game
affords interesting options for strategic play was a ring. As a prelude to our study of
these competitive search encounters, we considered the Cow-Path Ring Problem or
CPRP. Here, a single hungry cow, guided by prior information, searches the ring to
find a target in minimal expected time. It was shown that the CPRP is a variant of
the well-known Cow-Path Problem, and many of the conditions necessary for search
plan optimality hold for both scenarios. Additional stipulations for optimal searching
in the CPRP were derived by exploiting the ring's circular topology. Our analysis of
the CPRP was encapsulated by two algorithms that, during execution, inspect the
locations of adjacent turning points to return an optimal search plan.
The primary contribution of the thesis was our analysis of the Cow-Path Ring
Game, or CPRG, a scenario in which two cows compete to capture a target. Key
features of the formulation included a shared prior on the target's location and the
ability of each cow to track her rival's motion. Strategic options available to each cow
made it difficult to determine the location and number of times a cow should turn. To
gain analytic traction, we first considered a simplification of the game in which each
cow may turn at most once. For any E > 0, a strategic algorithm was presented to
determine an E-Nash equilibrium, i.e., a search profile from which unilateral deviation
cannot appreciably improve a cow's probability of capturing the target. When each
cow may turn a finite number of times, it was shown the game may be cast as a
dynamic programming problem. In this way, a cow searches by considering not only
her chance of finding the target before the next turn, but also her chance of finding
the target in the equilibria of the ensuing subgame.
Recognizing that the CPRG hinges critically on a number of modeling assumptions, the remainder of the thesis varied a feature of the standard CPRG and studied
the associated effects on each cow's decision-making. To this end, the CPRG with
asymmetric information examined search encounters, also on the ring, in which each
cow had a unique target prior. Additionally, each cow maintained a prior on what
she believed her rival's prior to be. When the cows have perfect knowledge, previous results were shown to extend readily. However, for a family of games in which
one cow had superior situational awareness, a strategic algorithm was presented that
allowed the more informed cow to leverage informational asymmetries whenever possible. As a change of pace, we presented a definition of social welfare for games with
asymmetric information and characterized a socially optimal search policy.
Finally, in the dynamic environment CPRG, targets arrived on the ring dynamically, and each target represented a request for transport from an origin to a destination point. Here, the goal of each cow was to maximize the fraction of targets she
captured in steady-state, with capture requiring a target's transport requirement be
fulfilled. Upon introducing the necessary machinery, we argued that greedy searching,
in which a cow searches to maximize her probability of finding the most recent target to appear in the workspace, can, in particular instances, provide arbitrarily poor
performance. The parity of dynamic encounters on a ring is formalized by showing
that each cow captures half of all targets in any equilibrium. Moreover, we argue
that this quantity is useful as an upper bound on the worst-case utility of any search
strategy. Recognizing that it is difficult to resolve the long-term effect of short-term
actions in an equilibrium setting, we advocated for defensive strategies that offer
reasonable worst-case guarantees. We provided one such search strategy, bounded
its performance in the worst case, and showed that while conservative in general,
it provides performance that, for select target distributions, is within a respectable
constant factor of the utility achieved in any equilibrium.
Taking stock, our research efforts focused on search encounters that took place on
a ring and involved two cows. Looking forward, there are a number of open problems
that may be of interest to the decision-theory community. Natural extensions include
games that take place in alternate environments, feature three or more cows, or both.
For example, recognizing that graph environments are a representative abstraction
of the road networks on which taxis drive, they are an intriguing venue on which
to stage Cow-Path games. In this direction, known results from graph theory may
provide useful constructs for characterizing effective search strategies. It may also
prove insightful to extend the defensive search strategies from Chapter 7 to graphs.
In crafting that approach for rings, we benefited from only having to manage search
efforts along two frontiers, a luxury absent in graphs. Nevertheless, there ought to be
some means of extending the notion that when in low-valued regions of the workspace,
it is better to relocate to more favorable confines and execute some type of maximin
sweep.
As mentioned, Cow-Path games contested between three or more cows is another
avenue of future research. Given the feedback nature of Cow-Path games, we speculate that stepping from two- to three-cow games may increase the complexity of the
strategic analysis considerably. However, a third cow may also initiate new forms
of strategic play. For example, it may be prudent for two of the cows to collude at
the expense of the third. Characterizing this mechanism, the conditions under which
it occurs, or refuting its existence entirely would shed fundamental insight into the
dynamics of general multi-cow games. More optimistically, we anticipate the marginal
complications imposed by adding a fourth, fifth, or n-th cow to the ring will quickly
saturate, as each cow need only concern herself with justifying her actions relative
to her two immediate neighbors, suggesting the potential to develop a general theory
for n-cow games on the ring.
In Chapter 6, we provided a notion of social utility that relates to maximizing
the collective perceived probability of finding the target. Alternatively, we could
prescribe a temporal notion of social utility. For example, in a DE-CPRG, we could
take the social utility to be the negative of the average amount of time a target spends
in the environment before being discovered. From this temporal perspective, socially
optimal strategies are those that involve two cows cooperating to minimize the average
system time of targets. This would be justifiable if the cows were assured, in advance,
that all targets would be split evenly between them. Clearly, competition to secure
targets in the DE-CPRG is not necessarily aligned with ensuring targets are found
in a timely manner. By contrasting the average time a target spends before being
picked up in these two competing frameworks, one could develop a second notion of
price of anarchy associated with competitive search games, and one that is likely more
pertinent from the target's perspective.
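To make the temporal comparison concrete, the following minimal sketch shows how one might estimate such a temporal price of anarchy from simulated pickup times. The function and data below are purely illustrative assumptions (the thesis defines no such code, and the pickup times are invented), but the ratio they compute is exactly the quantity described above: the average system time of targets under equilibrium search divided by that under cooperative, socially optimal search.

```python
def average_system_time(arrival_times, pickup_times):
    """Mean time a target spends in the environment before discovery."""
    waits = [p - a for a, p in zip(arrival_times, pickup_times)]
    return sum(waits) / len(waits)

# Hypothetical pickup times for the same arrival stream under two regimes:
# equilibrium (competitive) search vs. cooperative, socially optimal search.
arrivals = [0.0, 1.0, 2.0, 3.0]
pickups_competitive = [2.0, 4.5, 5.0, 7.5]
pickups_cooperative = [1.5, 3.0, 4.0, 5.5]

t_comp = average_system_time(arrivals, pickups_competitive)  # 3.25
t_coop = average_system_time(arrivals, pickups_cooperative)  # 2.0

# A temporal price of anarchy: at least 1 whenever competition delays discovery.
temporal_poa = t_comp / t_coop  # 1.625
```

A ratio above one quantifies, from the target's perspective, how much discovery is delayed by competition relative to cooperation.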
The DE-CPRGs studied in this thesis represent only a small portion of
what is a large and diverse family of problems. In a dynamic setting, entertaining new options for strategic play and novel models to describe the evolution of the
workspace may provide plentiful returns. To this point, recall that in Example 7.3, an
equilibrium strategy was for each cow to "take-turns" exploring a particular portion
of the ring that has a high chance of containing the target. This type of queueing
phenomenon may be observed, more overtly, in taxi systems, where vehicles line up at
dedicated stands. A more precise characterization of queueing-based searching may
provide guidance as to when and where this type of strategy is most appropriate. Similarly, in many dynamic settings, targets accumulate in the workspace with multiple
targets often up for grabs at the same time. Investing in new formulations, e.g., those
that represent the accumulation of targets and movement of agents using continuum
models, would likely be necessary to provide a tractable analytic base. Although such
an approach favors a more macroscopic view of search operations, in contrast to the
agent-based formulations studied in this thesis, any results to emerge would naturally
invite opportunities for validation and integration with real-world data.
To conclude, this thesis considered search scenarios in which agents compete to
capture targets given a prior on target locations. To understand the decision-making process of agents in these settings, we introduced and analyzed a collection of
stylized scenarios that emphasized adversarial searching between two agents on a ring.
While this thesis provides both a useful venue in which to frame competitive search games and an
initial body of results, there remains a panoply of open research directions worthy of
future investigation. Progress in the identified areas would serve to both broaden and
deepen the state of the art in this relevant, and we believe promising, branch of search
theory. With these ideas in mind, we close with the following remark. Although the
study of competitive search games is not without its share of challenges, work in this
area has the potential to shed valuable insight into the competitive tension at play in
a host of relevant systems. Through the study of Cow-Path Games, this thesis has
demonstrated that, under appropriate modeling abstractions, it is possible to formally
analyze the decision-making process of agents that compete with one another to find
targets. Moreover, these results may serve as a valuable stepping stone in the pursuit
of many of the aforementioned future work items.