Cow-Path Games: Tactical Strategies to Search for Scarce Resources

by Kevin Spieser

Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, October 2014.

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Kevin Spieser, Department of Aeronautics and Astronautics, October 7, 2014
Certified by: Emilio Frazzoli, Professor of Aeronautics and Astronautics, Thesis Supervisor
Certified by: Patrick Jaillet, Professor of Electrical Engineering, Committee Member
Certified by: Hamsa Balakrishnan, Associate Professor of Aeronautics and Astronautics, Committee Member
Accepted by: Paulo C. Lozano, Associate Professor of Aeronautics and Astronautics, Chair, Graduate Program Committee

Cow-Path Games: Tactical Strategies to Search for Scarce Resources

by Kevin Spieser

Submitted to the Department of Aeronautics and Astronautics on October 7, 2014, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Abstract

This thesis investigates search scenarios in which multiple mobile, self-interested agents, cows in our case, compete to capture targets. The problems considered in this thesis address search strategies that reflect (i) the need to efficiently search for targets given a prior on their location, and (ii) an awareness that the environment in which searching takes place contains other self-interested agents. Surprisingly, problems that feature these elements are largely under-represented in the literature. Granted, the scenarios of interest inherit the challenges and complexities of search theory and game theory alike.
Undeterred, this thesis makes a contribution by considering competitive search problems that feature a modest number of agents and take place in simple environments. These restrictions permit an in-depth analysis of the decision-making involved, while preserving interesting options for strategic play. In studying these problems, we report a number of fundamental competitive search game results and, in so doing, begin to populate a toolbox of techniques and results useful for tackling more elaborate scenarios. The thesis begins by introducing a collection of problems that fit within the competitive search game framework. We use taxi systems, in which drivers compete to find passengers and garner fares, as a motivating example throughout. Owing to connections with a well-known problem, called the Cow-Path Problem, the agents of interest, which could represent taxis or robots depending on the scenario, will be referred to as cows. We first consider a one-sided search problem in which a hungry cow, left to her own devices, tries to efficiently find a patch of clover located on a ring. Subsequently, we consider a game in which two cows, guided only by limited prior information, compete to capture a target. We begin with a version in which each cow can turn at most once and show that this game admits an equilibrium. A dynamic-programming-based approach is then used to extend the result to games featuring at most a finite number of turns. Subsequent chapters consider games that add one or more elements to this basic construct. We consider games where one cow has additional information on the target's location, and games where targets arrive dynamically. For a number of these variants, we characterize equilibrium search strategies. In settings where this proves overly difficult, we characterize search strategies that provide performance within a known factor of the utility that would be achieved in an equilibrium.
The thesis closes by highlighting the key ideas discussed and outlining directions for future research.

Thesis Supervisor: Emilio Frazzoli, Professor of Aeronautics and Astronautics
Committee Member: Patrick Jaillet, Professor of Electrical Engineering
Committee Member: Hamsa Balakrishnan, Associate Professor of Aeronautics and Astronautics

Acknowledgments

This thesis has been a long time in the making. As with many lengthy endeavors, the road has not always been smooth. However, it is also true that, looking back, I am grateful for the experience, the knowledge gained, the doors that have been opened, and the many acquaintances and friends made along the way. Of course, I am also very appreciative of the frequent encouragement and timely distractions provided by those who have seen me through this degree. My advisor, Emilio Frazzoli, afforded me tremendous freedom to pursue a wide range of research topics throughout my studies. This flexibility to think freely across a breadth of problems not only kept my work engaging, but also made me a more independent, well-rounded, and altogether better researcher. I must also point out the opportunities I had to visit NASA Ames in California and SMART in Singapore, as well as the various venues at which I have been fortunate enough to present my research. He helped make all of these ventures possible. Finally, as I got off to somewhat of a rocky start at MIT, I owe him a special thanks for sticking with me. The remaining members of my thesis committee, Professors Patrick Jaillet and Hamsa Balakrishnan, have provided thoughtful commentary and a fresh perspective on my research during our meetings and the writing of this document. I am grateful for their time and feedback. Lastly, I have seen many fellow lab mates come and go in my time at MIT. I have taken classes and collaborated on research projects with a number of these individuals.
Their assistance with solving homework problems, studying for exams, marking exams, writing papers, re-writing papers, and brainstorming ideas is much appreciated. These interactions have been one of the defining features of my graduate school experience. I hope that, on the whole, they have found my contributions to these efforts as insightful and formative as I have found theirs. Thank you!

Sincerely, Kevin Spieser

This thesis is dedicated to Briggs and my parents. No cows were harmed in the writing of this thesis.

Contents

1 Introduction 21
  1.1 Motivation 22
  1.2 Contributions 27
  1.3 Organization 28
2 Background Material 31
  2.1 Search Theory: An Introduction 32
  2.2 Probabilistic Search 34
    2.2.1 Search with an imperfect sensor 35
    2.2.2 Search with a perfect sensor 37
  2.3 Pursuit-evasion games 39
  2.4 Persistent planning problems 42
3 Mathematical Preliminaries 45
  3.1 Game theory 45
4 The Cow-Path Ring Problem 51
  4.1 Introducing The Cow-Path Ring Problem 51
  4.2 CPRP Notation and Terminology 53
  4.3 The number of turns in the CPRP 58
  4.4 An iterative algorithm for s* 59
  4.5 A direct algorithm for finding s* 64
  4.6 Summary of the CPRP 64
5 The Cow-Path Ring Game 67
  5.1 Adding a second cow to the ring 67
  5.2 A model for informed cows 68
  5.3 Defining the Cow-Path Ring Game 70
  5.4 A remark about Cow-Path games on the line 71
  5.5 CPRG-specific notation and terminology 72
  5.6 Search strategies in the CPRG 74
  5.7 The one-turn, two-cow CPRG 74
  5.8 1T-CPRG: computational considerations 83
  5.9 The 1T-CPRG for different cow speeds 84
  5.10 Finite-turn CPRGs 84
  5.11 Summary of the CPRG 87
6 Games with Asymmetric Information: Life as a Cow Gets Harder 91
  6.1 Searching with asymmetric information: a motivating example 92
  6.2 Supplementary notation and terminology 93
  6.3 Information models for situational awareness 93
  6.4 Behavioral models for asymmetric games 95
  6.5 A bound on the maximum number of turning points in the CPRG 95
  6.6 CPRGs with asymmetric information 99
  6.7 AI-CPRGs with perfect knowledge 99
  6.8 Socially Optimal Resource Gathering 105
  6.9 Conclusions 112
7 Dynamic Cow-Path Games: Search Strategies for a Changing World 113
  7.1 A motivation for dynamic environments 114
  7.2 Dynamic Cow-Path Games with target transport requirements 115
  7.3 Greedy search strategies for the DE-CPRG 118
  7.4 Equilibria utilities of cows in the DE-CPRG 120
  7.5 An aggregate worst-case analysis of greedy searching 123
  7.6 … 134
8 Conclusions and Future Directions 137
  Summary and Future Directions

List of Figures

1-1 Depiction of real-world scenarios where agents compete against one another to capture targets given limited information on the location of targets. (a) Snapshot of taxi operations in Manhattan. In busy urban cities, the operation of taxi drivers trying to find passengers can be represented as a competitive search game. (b) A sunken treasure ship wrecked along a coral reef. The exploits that result from two rival recovery boats each trying to find the ship using a crude sonar map of the area constitute a competitive search game. (c) A Kittyhawk P-40 that crash-landed in the Sahara desert during World War II. In the event rescue and enemy forces had some idea of the aircraft's location, e.g., from (intercepted) radio communications, and had each party launched a recovery operation, the resulting race to locate the plane would be a competitive search game. 26

2-1 A partial taxonomy of select sub-disciplines within the field of search theory. It is worth reinforcing that the families of games represented are only a relevant sampling of select research areas in the field, and not an exhaustive listing. The box shaded in blue represents agent-vs-agent or competitive search games, the class of problems that will be the focus of this thesis. The location of CSGs in the tree indicates these problems share fundamental attributes with probabilistic search problems and pursuit-evasion search games. The image was inspired by similar figures reported in [17], [33]. 34

2-2 Illustration of the key features of the stochastic or average-case Cow-Path Problem. Starting from the origin, the cow explores the real line R in search of a patch of clover. A hypothetical search plan is shown in gray.
In the instance depicted, the cow makes four turns before finding the target at the point marked with a red exclamation mark. 38

4-1 A visualization of the Cow-Path Ring Problem. The target density f, as a function of radial position q, is shown in blue. In the instance depicted, the target T is located on the North-West portion of the ring R. In the search plan shown, the cow (yellow triangle) travels in the ccw direction toward q1, where, having not found T, she reverses direction and travels in the cw direction toward q2. Upon reaching q2, having still not found T, she again reverses direction and continues searching in the ccw direction until ultimately finding T at the site indicated with a red exclamation mark. 55

5-1 A visualization of the Cow-Path Line Game. The target density f_T is shown in blue. The unique equilibrium search strategy of each cow, s*, is indicated by a directed gray line. Under s*, each cow C_i heads toward the other cow and, just before meeting, reverses direction and visits any previously unexplored territory. 72

5-2 An instance of the CPRG illustrating the initial positions and initial headings of cows C1 and C2. The trajectories of both cows, right up to the point of capture, are shown in dark gray. The target density f achieves a global maximum in [-π/4, 0]. In the instance shown, T is located along the North-West portion of R. The site at which T is found, in this case by C2, is indicated with a red exclamation mark. 73

5-3 A diagram showing associations between families of finite-turn CPRGs. The node labelled with the pair (i, j) denotes the family of games in which C1 and C2 may turn up to i and j times, respectively. The numbers above and beside the arrows indicate which cow turns to bring about the indicated transition. The nodes representing base-case games, for which equilibria strategies may be found using the methods discussed in previous sections, are colored in red. The nodes representing all other games are colored in gray. The arrows indicate how one family of games reduces to a simpler family of games when a cow turns. For example, the (2,2)-CPRG becomes an instance of the (1,2)-CPRG when C1 turns, and an instance of the (2,1)-CPRG when C2 turns. 88

6-1 Visualization of the notation used for describing subregions of R and one point relative to another on R. Due to the circular topology of R, there is flexibility in the notational system. For example, [q1, q2]_cw and [q2, q1]_ccw refer to the same arc of R. Similarly, (q3 + x)_cw and (q3 + 2π - x)_ccw refer to the same point on R. 94

6-2 Initial positions, qi(0); initial headings, φi(0); and target priors, f_T^i, of C1 and C2 for an instance of an AI-CPRG. (a) f_T^1, shown in blue, has local maxima along the South-East and North-West regions of R. (b) f_T^2, shown in green, is more evenly distributed and contains three modest peaks along R. For q in R such that f_T^1(q) ≠ f_T^2(q), C1 and C2 have different valuations for visiting q first. 100

6-3 An instance of an AI-CPRG. The cows (depicted as cars) C1 and C2 are initially diametrically opposed at the top and bottom of R, respectively. C1's prior on T, namely f_T^1, is shown in blue. Owing to f_T^1, C1 is motivated to, if possible, be the first cow to explore segments R1 and R2. Shown in green, it is assumed that f_T^2(q) = 1 for all q in R, such that any two segments of R having equal length are equally valuable to C2. The points a, b, c, d, e, f, and g are points of interest in Example 6.3. 102

6-4 Visualization of key quantities used in the proof of Theorem 6.5. The points labelled 1, 2, and 3 in red correspond to the three points visited by C1 in (6.14). In the instance shown, d* = ccw. 104

6-5 Illustration of the three ways in which U_SO can fall short of the maximum value of 2. In each figure, f_T^1 and f_T^2 are shown in blue and green, respectively. In (a), C1 and C2 are initially positioned on the "wrong" sides of R, resulting in a shortfall from 2. Were the cows able to switch positions, the shortfall could be avoided. In (b), overlap between the supports of f_T^1 and f_T^2 creates unavoidable inefficiency. In (c), the shortfall results from the lack of convexity of f_T^1 and f_T^2. 107

6-6 A socially optimal search strategy for the scenario considered in Example 6.3. The socially optimal search strategy is illustrated by the purple line: C1 and C2 rendezvous at b, having travelled there in the cw and ccw directions, respectively, and proceed to explore [q1(0), q2(0)]_cw in tandem. The segment R1, shown in red, is the portion of the ring that transitions from being explored by C2 in a cooperative search to being visited first by C1 in a competitive search. 111

7-1 A sample sequence of target capture times associated with the early stages of a DE-CPRG. In the instance shown, C1 captures targets T1, T2, and T4, while C2 captures targets T3 and T5. If the statistics shown are representative of steady-state behavior, then the aggregate utilities of the cows would be U_agg^1(s) = 0.6 and U_agg^2(s) = 0.4, respectively. 117

7-2 An isometric visualization of an instance of the DE-CPRG. The prior f_O is shown in blue as a function of position along R. Also shown are the origin and destination points associated with targets Tj, Tj+1, and Tj+2. At the instant shown, C1 and C2 are searching R for Tj. Once Tj is discovered and transported from Oj to Dj, Tj+1 is popped from the queue of targets and appears on R. 118

7-3 A snapshot of a DE-CPRG taken at the start of CPRGj. The origin-target density, f_O, and destination-target density, f_D, are shown in (a) and (b), respectively. Targets are significantly more likely to (i) arrive in Θ1 rather than R \ Θ1 and (ii) seek transport to Θ2 rather than R \ Θ2. 120

7-4 Illustration of two scenarios used in the proof of Proposition 7.1. In (a), C2 discovers a target at θa that requires transport to θb, a distance L away. During transport, C1 has time to optimally preposition herself at θc in preparation for the next stage. A finite time later, in (b), the roles reverse: C1 discovers a target at θa that also requires transport to θb, allowing C2 to optimally position herself at θc for the next game. 122

7-5 Visual breakdown of a typical interval spanning the time between successive target captures for Ci. On average, it takes Ci time d to return to her search position after delivering Tj. From the perspective of Ci, in the worst case, Ci finds a target at time td(Tj)+, which, on average, is delivered in time da. 132

7-6 Segments from possible sample runs, from the perspective of Ci, of a DE-CPRG. 134

List of Tables

3.1 Utility payoffs for the simple two-agent search game in Example 3.4. The possible actions of agent 1 are displayed in the leftmost column. The possible actions of agent 2 are displayed along the top row. Given each agent's search strategy, the first and second entry in each cell represent the utility of agent 1 and agent 2, respectively. For example, if agent 1 searches using s1 = a and agent 2 searches using s2 = b, then the probability of agent 1 finding the target is 1 and the probability of agent 2 finding the target is …. 48

4.1 Summary of general and CPRP-specific notation used in the thesis. 54

5.1 Summary of CPRG-specific notation. 70

Chapter 1

Introduction

"When eating an elephant, take one bite at a time."
— Creighton Abrams

This thesis considers the decision-making process of mobile, self-interested search agents that compete against one another to find targets in a spatial environment. We are quick to point out that the adversarial scenarios of interest are fundamentally distinct from the cooperative formulations that dominate much of multi-agent search theory. By and large, these existing works study the exploits of a team of searchers that cooperate to efficiently locate a target. Unsurprisingly, the study of search scenarios that stress inter-agent competition among searchers calls for a new evaluative framework and a customized assortment of analytic methods. Providing these elements and putting them to use is the central contribution of this thesis. The role of this preliminary chapter is to introduce, at a high level, the types of problems that will be of interest. To this end, we recount a number of real-world examples in which the competitive search framework features prominently. The aim is to whet the reader's appetite and motivate why the problems considered are both intriguing and relevant. In particular, we will provide an example based on the operation of a taxi system that will be revisited and serve as a motivational aid at various points throughout the thesis. A contributions section discusses how we see the work that comprises this document supplementing and extending the field of search theory. Finally, this chapter provides an overview of the thesis's organizational structure. This outline is useful as both a navigational aid and a preview of the story that follows.

1.1 Motivation

The problems considered in this thesis result from the fusion of two fundamental, yet previously disparate, ideas. The first key idea is that a mobile agent that wishes to locate a target, but does not know the target's exact location, is obligated to search for it.
The second key idea is that competition naturally arises when multiple self-interested agents each vie to acquire a scarce resource. The first point is the premise of search theory. The second point, somewhat more subtly, touches on the competitive undertones of game theory. This work considers scenarios that incorporate both of these notions through the study of multi-agent systems in which the agents compete to capture targets given only limited knowledge of where the targets are located. To understand the void that the problems investigated in this thesis begin to fill, it is useful to, very briefly, highlight the types of search problems considered to date by those in the community. A more detailed account of the relevant literature will be provided in the next chapter. In probabilistic search problems, one or more search agents attempt to (efficiently) capture a target that is indifferent to their actions. In many of these cases, search plans must be devised given only limited knowledge of a target's location. An example of a probabilistic search problem is the case of an explorer trying to find buried treasure given a raggedy and faded map of the area. When multiple agents are involved in the search, their plans are often formulated in a cooperative context in order to improve efficiency. For example, coordinated planning may increase the probability the target is found or reduce the expected time required to capture the target. It also has the benefit of avoiding redundancies that could emerge if the agents planned independently. In pursuit-evasion games, the target assumes a more animated role and actively chooses a fixed hiding location (immobile target) or trajectory (mobile target) to evade capture by a team of searchers. Again, the searchers act cooperatively in order to capture the target efficiently, e.g., in minimum time or minimum expected time.
In other words, competitive tension exists only between the target and, collectively, the team of search agents. The agents themselves have no preference for which agent, if any, ultimately finds the target. An example of a pursuit-evasion game is the case of an escaped convict who tries to evade capture by a team of police officers. Pursuit-evasion games naturally incorporate an element of game theory, as it is reasonable, and often necessary, for each party to factor in their adversary's actions when planning routes and making search-related decisions. Probabilistic search problems and pursuit-evasion games address decision-making in a host of applications. However, they offer little guidance about how agents should search when the agents themselves compete against one another to capture targets. Pragmatically, it is fair to ask: why might one be interested in these scenarios? The answer, in the author's opinion, is that there are a number of relevant venues where the agent-versus-agent search dynamic features prominently. Before providing a collection of examples, it is useful to first give an informal description of the exact relationship that exists between agents and targets in the problems of interest. Naturally, a formal discussion of each of these components will follow in later chapters of the thesis. In the search games considered in this thesis, each agent is adversarially aligned with every other agent. Agents do not form teams, nor do they cooperate, unless doing so expressly benefits all parties. Agents are, unless otherwise stated, and aside from their initial conditions, homogeneous. Each agent has a prior on the location of targets, but the exact locations of the targets are unknown. Moreover, an agent can discover a target only when standing directly over it. Unlike pursuit-evasion games, the targets in the games we consider are artifacts of the environment, not strategic decision-makers. The targets are purely immobile and the locations at which they appear in the environment are determined by a random process. Instead, the game, as it were, is played among the search agents, with each agent trying to capture as many targets as possible. To emphasize the stark differences that exist between this search framework and the cooperative formulations previously described, we refer to the problems considered in this thesis as competitive search games. Returning to scenarios that emphasize the incentive for strategic decision-making in a competitive search setting, consider the role of yellow cabs in Manhattan [73]. These taxis operate in what is called a "hail market". By law, yellow cab drivers may only pick up passengers that have hailed them from the side of the street [69]. They cannot schedule jobs in advance, nor can they respond to call-in requests. (Jobs that originate under these circumstances are handled by a separate fleet of vehicles.) We argue that yellow cab operations constitute a competitive search game. Abstractly, the road network on which taxis drive may be viewed as a graph, whose edges and vertices represent roadways and intersections, respectively. The targets are the passengers; they arrive dynamically according to an, albeit fairly complex, socially driven and time-variant spatio-temporal process. The game is played among the taxi drivers, with each driver trying to maximize their individual revenue, which clearly requires getting passengers onboard. To operate effectively, drivers must plan their routes by accounting for the spatial demand pattern of passengers, as well as the locations of nearby cabs. An interesting feature of this system is that targets must be transported from a pickup location to a dropoff location. That is, there is a service component associated with capturing a target. The logistics of taxi operations will be revisited at various points throughout this thesis to motivate specific problems of interest.
As a second example, consider the case of two rival shipwreck-recovery boats searching the outskirts of a jagged coral reef for the remnants of a treasure ship lost at sea. Once more, we argue that this encounter has all the makings of a competitive search game. The environment is the subset of R^2 that represents the waters surrounding the reef. The lone target is the sunken ship. The game is played between the two recovery boats, with each boat trying to discover the wreck (and any treasure that may be onboard) first. Given a priori knowledge of where the ship sank, perhaps from crude sonar images, historical maps, word-of-mouth accounts of the sinking, etc., each boat must chart a course to search the coastal waters surrounding the reef. Once again, prudent search strategies must factor in not only probabilistic information about where the target is likely to reside, but also the presence of a rival salvage boat that harbors similar ambitions. A version of this scenario will motivate the work in Chapter 6. The preceding scenarios differ markedly in terms of workspace geometry, the number of agents involved, the processes by which targets arrive, and the time scales over which searching takes place. This suggests competitive search games encompass a broad class of search problems, and that there are potentially many other practical applications that fit naturally within the framework. For example, the same ideas emerge in the prospecting and mining industries, where rival firms survey vast swaths of land for gold, minerals, or oil deposits. Given preliminary geographic information, how one of a handful of firms should prioritize testing potential mining sites, in order to be the first to file claims on the most lucrative locations, fits well within the domain of competitive search games. Along similar lines, imagine a military aircraft carrying sensitive information over hostile enemy territory were to crash-land in a desert.
How should authorities search for the aircraft, knowing other individuals, perhaps with unscrupulous intentions, are also looking for the plane? Once again, the competitive search game framework is a natural venue to pursue this question. Finally, competitive search games also have relevant connections to the foraging behavior of animals in the wild and the diffusion of bacterial colonies over a nutrient-laden agar plate. A sampling of these scenarios is illustrated in Figure 1-1. Given the apparent relevance of competitive search games, it is surprising, at least to the author, that few results, even for scenarios involving just two search agents, have been reported in the literature. At this stage, we have, hopefully, convinced the reader that the competitive search game framework captures the incentive for strategic decision-making in a host of meaningful search applications. However, as alluded to, the scenarios mentioned above differ markedly with respect to key features and environmental parameters, e.g., the number of search agents involved, the workspace geometries, the processes by which targets arrive, and the time scales over which searching takes place. Unsurprisingly, it would be rather ambitious to expect a single formulation to capture the nuances and peculiarities of each scenario. In this thesis, we will consider a collection of idealized search games, with each encounter emphasizing one or more of the elements that punctuate the aforementioned scenarios. In analyzing these encounters, this thesis provides the first rigorous analysis of competitive search games. A more detailed exposition of our research philosophy and the contributions of our work is discussed in the next section.

Figure 1-1: Depiction of real-world scenarios where agents compete against one another to capture targets given limited information on the location of targets. (a) Snapshot of taxi operations in Manhattan.
In busy urban cities, the operation of taxi drivers trying to find passengers can be represented as a competitive search game. (b) A sunken treasure ship wrecked along a coral reef. The exploits that result from two rival recovery boats each trying to find the ship using a crude sonar map of the area constitute a competitive search game. (c) A Kittyhawk P-40 that crash-landed in the Sahara desert during World War II. In the event rescue and enemy forces had some idea of the aircraft's location, e.g., from (intercepted) radio communications, and had each party launched a recovery operation, the resulting race to locate the plane would be a competitive search game.

1.2 Contributions

This section provides a high-level synopsis of the contributions of the thesis. A detailed account of specific advancements can be found in the next section, where the thesis is deconstructed on a chapter-by-chapter basis. Here, the focus is on conveying the spirit of the thesis, defining the scope of the work, and remarking on the value of the work going forward. The major contribution of this thesis is a collection of algorithmic strategies that agents can use to search for targets in an environment. However, unlike existing constructs, which have no competitive tension or pit a cooperative team of pursuers against a target, the strategies presented herein are designed for situations in which search agents compete against one another to find targets. As mentioned, we believe this framework encapsulates the inter-agent search dynamics in many real-world scenarios, yet acknowledge that, in their fullest form, these problems introduce a number of analytic and computational complexities. To make headway, we will often restrict ourselves to encounters that involve two agents and that take place in topologically simple environments. For example, many of the problems considered involve two agents contesting targets on a ring.
Despite these restrictions, the search scenarios nevertheless support an assortment of interesting and sometimes surprising behaviors. Moreover, the modest furnishings of these problems allow us to conduct a formal analysis of the constituent decision-making, often in the form of quantifiable performance bounds, if not equilibrium strategies for the agents involved. This approach not only caters to the author's research style, but also serves to compile a set of initial competitive search game contributions. In this way, our efforts begin the process of populating a toolbox of competitive search game results that may prove useful in tackling more elaborate problems. The next section outlines the content and contributions of the thesis on a chapter-by-chapter basis.

1.3 Organization

This thesis is organized as follows. Chapter 2 begins our investigation by providing an overview of the relevant literature. By and large, this consists of contributions to the fields of probabilistic search, pursuit-evasion games, and persistent planning problems. Included here is an overview of the Cow-Path Problem. Many of the scenarios considered in this thesis are an adversarial twist on this well-known problem, so there is a vested interest in detailing its finer points. Throughout, we adopt the philosophy that by understanding the pillars currently in place, one can better define and appreciate the contributions of the work in this thesis. Still in a preparatory role, Chapter 3 provides a brief overview of some of the technical details used in subsequent chapters. Specifically, the game-theoretic terms discussed will be used to frame our discussion of algorithmic search strategies. With the requisite background material in place, the thesis moves straight into novel material. The environment of interest in Chapter 4, as well as in many of the chapters that follow, is a ring.
Rather than plunge headfirst into a treatment of competitive search games on a ring, we adopt a more tempered approach. As a prelude, we consider the problem of how a hungry cow should search the ring to find a patch of clover in the minimum expected time, given only a prior on the clover's location. This problem is thematically very similar to the Cow-Path Problem, and serves as a stepping stone to the adversarial encounters that follow. Iterative algorithms are provided for finding optimal search strategies and, for games with bounded target densities, a bound is given on the maximum number of times an intelligent cow would ever turn around. In this latter case, we show that an optimal search plan may be expressed as the end product of a nonlinear program followed by a trimming algorithm. This result is noteworthy, as the Cow-Path Problem has no known closed-form solutions for general target distributions. Without further ado, Chapter 5 considers the canonical problem of this thesis: the Cow-Path Ring Game, a scenario in which two hungry cows compete, given a prior, to find a patch of clover on a ring. Upon introducing the necessary notation and terminology, the problem is formally defined. Many of the games discussed in later chapters will be variations of this formulation. A concerted effort is made to explain why, for a number of reasons, including the real-time and feedback nature of the game, the Cow-Path Ring Game is a challenging problem. In response, a simplified version of the game is presented, one in which each cow may turn at most once. With this restriction in place, an iterative algorithm is developed that establishes the existence of an ε-Nash equilibrium. Subsequently, a dynamic-programming-based approach is used to extend this result to games in which each cow may turn at most a finite number of times.
Moving on, Chapter 6 considers a potpourri of interesting scenarios that are sufficiently different from the Cow-Path Ring Game to justify a chapter of their own. By imposing a cost each time a cow turns, a bound is developed on the maximum number of times an intelligent cow would ever reverse direction. Continuing, we consider games in which each cow maintains a unique prior on the target's location. Carrying this momentum forward, we consider games where one cow is in the advantageous position of knowing where her rival suspects the target is located. This dichotomy requires specifying a behavioral model for both the more-informed and the less-informed cow. Assuming the less-informed cow behaves conservatively, it is shown the game admits a Nash equilibrium. Moreover, for select distributions, the more-informed cow is able to increase her utility by leveraging her situational advantage. The asymmetric nature of these games invites the opportunity to study the social welfare of competitive search games. To this end, a cooperative search strategy is presented that is socially optimal for any set of target priors. This treatment transitions naturally into a discussion, albeit a brief one, of the price of anarchy in competitive search games. Chapter 7 considers competitive search games in which targets arrive dynamically on a ring. The persistent nature of these games places an emphasis on strategies that ensure targets are captured efficiently in the long run. Among the dynamic games introduced are scenarios with transport requirements, in which targets, once found, must be delivered to a destination point. A defining attribute of the search strategies in these games concerns how a cow should position herself while her rival is preoccupied delivering a target. As a first contribution, we show that in any equilibrium of a two-cow dynamic search game, each cow captures, in steady state, half of all targets.
With this benchmark established, it is shown that greedy search strategies can, for select target distributions, lead to arbitrarily poor capture rates. Because it is difficult to quantify the long-term effect of short-term actions in an equilibrium setting, we instead focus on defensive or conservative search strategies and lower-bound the expected fraction of targets captured using these methods. This bound is then compared with the theoretically optimal value, i.e., one half, for select target distributions. Finally, Chapter 8 summarizes the prominent ideas of the thesis, reflects on the contributions made, and evaluates the resulting state of competitive search games. It also outlines an assortment of open research directions that have arisen during the development of this work, but that, on account of time constraints, their tangential nature, or both, have received only modest deliberation.

Chapter 2
Background Material

This chapter provides an overview of related work. Recall that competitive search games stress the goal of finding targets in an agent-vs-agent setting. We view search theory as the natural domain of our work, and game theory as the appropriate framework in which to study the problems of interest. Consistent with this mindset, we will not attempt to summarize works from the field of game theory. Rather, a short compendium of the basic game-theoretic ideas, which constitute tools for our study, will be provided in Chapter 3. These ideas are well established and covered in any introductory text on the subject. The major contribution of this thesis lies in using these ideas to understand search in a novel setting. To this end, this chapter surveys relevant contributions to the field of search theory. The intent is to provide an overview of the literature, both pre-existing and ongoing, that has relevant connections to competitive search games, be the association in terms of application, methodology, or both.
This highly focused survey of select works in the fields of probabilistic search, pursuit-evasion games, and what we will refer to as persistent planning problems fosters an appreciation for the state of the art and assists in defining the ultimate identity of the thesis. Unlike competitive search games, the vast majority of works covered in this chapter do not pit one search agent against another. Nevertheless, they share salient features with the problems we consider or serve to better position the thesis work within a larger search narrative.

2.1 Search Theory: An Introduction

Searching for a lost or hidden item is an age-old problem. People still misplace their keys, passports, phones, and cash, and, when they do, they typically search to find them. Searching is also associated with larger-scale recovery operations, e.g., rescuing a camper lost in the wilderness. In this thesis, search theory is defined as the study of problems that take place in a spatial environment and involve n agents trying to find m targets. Note this definition is quite broad. For example, it says nothing about the interaction between agents, which could be of a cooperative or competitive nature, nor about the manner in which targets are distributed and arrive. In many cases, the task of searching is constrained by the need to locate targets efficiently. For example, in some applications, there is the possibility a target goes undiscovered. Here, it makes sense to use a strategy that maximizes the probability of finding it. In other applications, it is known that a target resides somewhere in a bounded environment. Although a lawnmower-style sweep of the workspace is guaranteed to find the target, it is more appropriate to use a search strategy that minimizes the expected discovery time [31]. In short, interesting search problems combine the goal of finding targets with the need to provide some form of performance guarantee.
When people misplace basic everyday items, they typically launch small-scale, ad hoc campaigns to relocate them. These undisciplined approaches generally suffice for finding low-value items in small spaces. However, as the value of the object increases, e.g., a human life, or the size of the environment grows, e.g., a forest hundreds of square kilometers in size, or both, successful searching demands a more structured treatment. The first rigorous approach to solving search problems was undertaken during World War II, when the Anti-Submarine Warfare Operations Group was tasked with finding submarines in the Atlantic [58], [63]. The declassification of these efforts kicked off a mathematical investigation of search-related problems. Today, search theory is an established field within operations research. More recently, advances in autonomy have initiated efforts to revisit traditional search paradigms from a robotics perspective. This movement has attracted interest from control theorists, roboticists, and computer scientists. These researchers have raised the profile of new and emerging applications within the search community, and advocated for more algorithmic and pragmatic approaches to solving many existing search problems [17], [33]. Auspiciously, search theory researchers have surveyed their field with remarkable regularity; the authors of [33], for example, recount many of the efforts detailed here. Owing to these efforts, search problems generally obey a well-defined taxonomy. Figure 2-1 provides a visualization of select disciplines within the hierarchy of search problems. Rather than reiterate existing surveys, the focus of this chapter is on conveying advancements that help to frame the contributions of our work. Accordingly, we freely pick and choose to cover the topics we deem most relevant. In doing so, we often ignore, or only briefly touch on, contributions that, while extremely significant, are of minimal importance for future discussion.
For completeness, the interested reader can find detailed accounts of the classical period of search theory, spanning the first forty-or-so years of the field, which we touch on only briefly, in [17], [37], [62], [92]. The last section of the chapter highlights problems that stress persistent planning for applications that feature indefinite horizons of operation and time-varying environments. Many of the results reported have emerged only recently and, so far, have had only tenuous affiliations with searching. Nevertheless, their long-term approach to planning is pertinent to the work in Chapter 7, where we consider games with dynamically arriving targets. In briefly surveying these works, the intent is to convey how our efforts contribute to the state of the art.

Figure 2-1: A partial taxonomy of select sub-disciplines within the field of search theory (branches include search games, probabilistic search with imperfect or perfect sensors, static and mobile evaders, and cow-path problems). It is worth reinforcing that the families of games represented are only a relevant sampling of select research areas in the field, and not an exhaustive listing. The box shaded in blue represents agent-vs-agent or competitive search games, the class of problems that will be the focus of this thesis. The location of competitive search games in the tree indicates these problems share fundamental attributes with probabilistic search problems and pursuit-evasion search games. The image was inspired by similar figures reported in [17], [33].

2.2 Probabilistic Search

In probabilistic search problems, a search agent, located in an environment Q, attempts to capture a target, T, whose initial position and movement are independent of the agent's actions [33]. That is, T is impervious or indifferent to being captured. In many cases, the agent must devise a search plan given only a prior density, f_T : Q → R≥0, on T's position.
Initial efforts to place probabilistic search problems on a firm theoretical foundation were undertaken in [58], and later expanded upon in [59], [60]. Here, geometric arguments are used to characterize the sensor footprint and detection efficiency of various sensor rigs, e.g., aerial surveillance with a human spotter. Included is a result linking random search to a detection probability function that is exponential in the time spent searching a bounded region. As Figure 2-1 indicates, probabilistic search problems are naturally categorized according to the sensing capabilities of the agents involved. For most problems, the natural distinction is between sensors that are (i) imperfect, i.e., may generate false negatives, and (ii) perfect, but have a finite sensing range.

2.2.1 Search with an imperfect sensor

When an agent's sensor is imperfect, it may fail to detect a target within its sensing zone. Accordingly, whether the target is found must be discussed in probabilistic terms. For the time being, assume the target is stationary. The detection function b : Q × R≥0 → [0, 1] assigns the probability b(q, z) to finding the target at point q ∈ Q, given that z amount of effort is spent searching q and the target is in fact at q. Given f_T, b, and a budget C on the resources (e.g., time) that can be devoted to searching, the canonical imperfect-sensor search problem is to find z*(q), the optimal expenditure of search effort at each point q ∈ Q, that maximizes the probability of finding the target subject to the budget constraint, i.e.,

    z* = argmax_{z : Q → R≥0}  ∫_Q f_T(q) b(q, z(q)) dq        (2.1)
         subject to  ∫_Q z(q) dq ≤ C.                          (2.2)

In continuous environments, solutions to (2.1)-(2.2) typically employ Lagrange multiplier techniques [60], [92], [95]. Hypothesis testing [40], [74] and convex programming [28] methods have also been used.
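To make the Lagrange-multiplier approach to (2.1)-(2.2) concrete, the following minimal sketch (ours, not from the thesis) solves a discretized instance under the illustrative exponential detection model b(q, z) = 1 - exp(-αz), the model the random-search result above suggests. The KKT conditions then give the classic water-filling form z_i = max(0, ln(α f_i / λ)/α), with the multiplier λ found by bisection so the budget binds; the function name and interface are our own.

```python
import math

def allocate_effort(f, alpha, C):
    """Discretized sketch of (2.1)-(2.2) with b(q, z) = 1 - exp(-alpha*z).

    f     : list of prior masses f_T over discrete cells
    alpha : detection-rate parameter of the exponential model
    C     : total search-effort budget

    Returns the per-cell effort z_i = max(0, ln(alpha*f_i/lam)/alpha),
    where the multiplier lam is chosen by bisection so sum(z) = C.
    """
    def total(lam):  # total effort implied by a candidate multiplier
        return sum(max(0.0, math.log(alpha * fi / lam) / alpha)
                   for fi in f if fi > 0)

    lo, hi = 1e-12, alpha * max(f)   # total(lo) > C >= total(hi) = 0
    for _ in range(200):             # bisection on the multiplier
        lam = 0.5 * (lo + hi)
        if total(lam) > C:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [max(0.0, math.log(alpha * fi / lam) / alpha) if fi > 0 else 0.0
            for fi in f]
```

At the optimum, every cell receiving positive effort has the same marginal detection rate f_i α e^{-α z_i} = λ, which is exactly the stationarity condition the Lagrange techniques cited above impose pointwise.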
If an optimal search plan z*, with search budget T, has the property that for all t < T, z* restricted to [0, t] is optimal for search budget t, then z* is said to be uniformly optimal, a desirable property. In [7], [61], it is shown that uniformly optimal search plans exist for a broad class of problems, provided b is a regular function, i.e., b is increasing but provides diminishing returns in its argument [33], [93]. Unfortunately, the search plans these methods generate are not always feasible, and may require that agents jump or teleport between points in order to realize them [10], [35], [76]. Even today, search plans often neglect this fundamental requirement, providing high-level search instructions but little guidance on how these may be transformed into drivable search routes, i.e., realizable paths [26], [66]. The search plans considered in this thesis all correspond to continuous, i.e., realizable, trajectories. From a search perspective, an environment Q is discrete if it may be partitioned into regions 1, ..., K such that b(k, z) is the probability of finding the target in region k, given that z amount of effort is spent searching region k (as a whole) and the target is in fact in region k. Unfortunately, Lagrange methods do not extend directly to discrete Q. The necessary amendments are discussed in [50]. As an analog to uniform optimality, it was shown in [30] that optimal search plans in discrete Q are temporally greedy: at any point during the search, the next cell to be further inspected is the one that maximizes the marginal gain in detection probability per marginal effort expended. Operating within discrete environments makes it easier to formulate and analyze variants on the basic problem structure. [17] and [33] provide a solid overview of these and other endeavors. In this and the next paragraph, we reiterate a number of these efforts to provide a sense of the scope of problems that have been addressed.
Returning to discretized search paradigms, partitioning the environment and the effort expended into discrete components is a much more natural framework in which to model, among other things, scenarios where a quantized level of effort must be expended each time a region is searched and the rewards for capture are discounted in time or search effort. More discussion of these problems, often referred to as sequential search, can be found in [52], [68], [93], [94]. A recent trend in one-sided searching is to consider operations in physically expansive environments. In these settings, streamlining the numerics is an important consideration to alleviate computational bottlenecks. Efforts in this vein are recounted in [82]. Other works have focused on search problems with false contacts [53], sensors that generate false positives [45], or environments that contain obstacles. In the optimal stopping problem, it is not known whether or not a target actually resides in the environment [51], [52]. Rather, after some amount of exploration, the agent must report that (i) the target is not in Q, or (ii) the target is in Q, alongside its most likely location. Of course, the agent may reply incorrectly. These problems introduce a new class of performance metrics, e.g., the probability the agent makes a correct decision about the target's inclusion in Q. Dynamic formulations of this problem use observations to update target belief functions within a Bayesian setting [34]. This latter class of problems results in feedback search plans that use recorded observations to guide future decision-making. Motivated by an interest in developing autonomous robotic search platforms, a number of recent efforts have considered cooperative, multi-agent formulations to solve variants of the probabilistic search problem.
For example, [32] uses a Bayesian approach that allows a team of pursuers to quickly decide (though not necessarily correctly) whether or not a target is in the environment, and dynamic-programming-based solutions to similar problems are discussed in [65]. However, as these methods often scale unfavorably with respect to key factors, e.g., the number of search agents or the workspace complexity, research efforts have also been directed at developing suboptimal algorithms with known, and acceptable, performance guarantees.

2.2.2 Search with a perfect sensor

A sensor is perfect if it never registers false negatives. Occasionally, it makes sense to rule out the possibility of false positives as well. It is important to remember that perfect sensors are still assumed to have a finite sensing radius. If all targets are stationary, then there is no need for an agent equipped with a perfect sensor to scan any point more than once. For this reason, perfect-sensor problems focus not only on finding targets, but on doing so in the minimum expected time. Of course, if targets arrive dynamically, it is likely necessary to revisit points throughout the workspace. This section considers a collection of perfect-sensor, probabilistic search problems in static environments. The Cow-Path Problem, or simply CPP, was proposed independently by Beck [12] and Bellman [16] in the 1960s. As the reader may have surmised, the CPP has strong thematic connections with the encounters studied in this thesis. This is reflected, most notably, by the fact that the players or agents in our games are also cows. Since its inception, the CPP has become a canonical problem in the fields of probabilistic path-planning, robotics, and operations research. The exact statement of the CPP varies from field to field. For example, those in the online algorithms community typically concentrate on a version that emphasizes search performance in the worst case [35].
In this thesis, however, we will be interested in the following stochastic version, which places a premium on performance in the average case.

Definition 2.1 (The Cow-Path Problem). A hungry cow is positioned at the origin of a fence represented by the real line R. The cow knows that a patch of clover resides somewhere along R, but has only a prior, f_T : R → R≥0, on its location. The cow can move at unit speed and reverse direction instantaneously. On account of being severely nearsighted, the cow can locate the target only when she is standing directly over it. How should the cow search to find the clover in the minimum expected time?

Figure 2-2: Illustration of the key features of the stochastic or average-case Cow-Path Problem. Starting from the origin, the cow explores R in search of clover T. A hypothetical search plan is shown in gray. In the instance depicted, the cow makes four turns before finding the target at the point marked with a red exclamation mark.

In the CPP, the cow's sensory capabilities are represented by a sensor that has zero sensing radius, but perfect accuracy. Figure 2-2 provides a visualization of the CPP. In the decades since the CPP's inception, various analytic conditions necessary for search plan optimality have been reported, e.g., [9], [12], [13], [14], [15], [54]. Given these contributions, it is perhaps surprising that exact (analytic) solutions are known for only a handful of target distributions; specifically, the rectangular, triangular, and normal density functions [42]. For general distributions, the common approach remains to discretize the workspace and rely on dynamic-programming techniques. Recently, a variant of the CPP, in which costs are levied against the cow each time she turns, was investigated in [36].

2.3 Pursuit-evasion games

Pursuit-evasion games involve one or more pursuers trying to capture an evader.
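A CPP search plan like the zig-zag in Figure 2-2 is just a sequence of turning points, and its objective, the expected discovery time, is straightforward to evaluate against a discretized prior. The sketch below is our own illustration (not the thesis's algorithm): it simulates a unit-speed cow following a given turn sequence and records when each candidate clover location is first visited. It assumes the plan eventually covers every point carrying prior mass.

```python
def expected_discovery_time(turns, points, probs):
    """Evaluate one candidate CPP search plan.

    turns  : successive turning points with alternating sign,
             e.g. [1, -2, 4, -8]; the cow walks 0 -> 1 -> -2 -> 4 -> ...
    points : discretized support of the prior f_T
    probs  : prior mass at each point

    Returns the expected time until the cow first stands over the clover.
    """
    t_found = {}          # first-visit time of each point
    pos, t = 0.0, 0.0
    for target in turns:
        lo, hi = min(pos, target), max(pos, target)
        for p in points:
            if p not in t_found and lo <= p <= hi:
                t_found[p] = t + abs(p - pos)   # time of first visit
        t += abs(target - pos)
        pos = target
    return sum(pr * t_found[p] for p, pr in zip(points, probs)
               if p in t_found)
```

Nesting this evaluator inside a numerical optimizer over turn sequences recovers the discretize-and-optimize approach described above for general target distributions.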
Typically, these games can be effectively classified according to which aspects of the pursuit receive the greatest emphasis, which, in turn, drive the mathematical techniques and solution methodologies employed. Unless otherwise stated, it is assumed throughout the section that both the pursuer and the evader are mobile. The Princess-and-Monster and Homicidal-Chauffeur games [11] involve one player attempting to capture a more agile adversary. In these and other games where it is critical to model player dynamics, e.g., a finite turning radius, acceleration constraints, etc., the interplay between pursuer and evader may be cast as a differential game. The players' objectives and dynamics are incorporated in the Hamilton-Jacobi-Isaacs equation, which, when solved, gives each player's equilibrium strategy in the form of a full-state (position) feedback control law. Unfortunately, as more pursuers are recruited for searching, or the workspace becomes more complex, e.g., obstacles or irregular perimeter boundaries are introduced, these methods quickly become intractable [33]. Differential games are sufficiently distinct from competitive search games that, to avoid unnecessary digression, the interested reader is referred to [47], [48] for a detailed account of the subject. Combinatoric search games strip away details of the problem deemed superfluous for the application at hand. Instead, player motion and workspace geometry are modeled using simplified dynamics and basic topological structures. These abstractions permit an insightful high-level study of pursuit-evasion scenarios that would not otherwise be possible. Within this class, ambush games stress the need for algorithms that ensure the evader is captured, even under conditions that are highly unfavorable or statistically rare. In the cops-and-robbers game, a robber (evader) and one or more cops (pursuers) take turns moving between vertices of a graph [3], [75].
The cops win the game whenever a cop and the robber are collocated at a vertex. When capture can always be avoided by judicious play on the part of the robber, the robber wins. The cop number of a game is the minimum number of cops needed to ensure capture from any initial conditions of the game. Generally speaking, the bulk of research in this area is aimed at characterizing winning conditions, the cop number, and monotonicity, i.e., the property that the number of safe vertices decreases in the number of cop moves, in relation to graph topology. For example, a famous result is that every planar graph has a cop number of three or less. Extensions of these ideas to games played on hypergraphs are discussed in [43] for the marshalls-and-robbers game. Parson's game considers the problem of clearing a building (a graph with nodes and edges representing rooms and hallways, respectively) that has been infiltrated by an infinitely fast and agile trespasser [78], [79]. The problem is similar to the art-gallery problem [76], except the superhuman speed of the assailant requires rooms be swept in a manner that ensures previously cleared spaces are not recontaminated [78]. This is achieved by placing guards at topologically inspired locations to ensure the perpetrator is confined to an ever-diminishing region of the workspace. Research efforts have focused almost exclusively on the search number, i.e., the minimum number of guards needed to locate the evader in the worst case. Once again, much less effort has been devoted to developing search algorithms to carry out such a sweep, or to characterizing the time complexity of these schemes. The GRAPH-CLEAR problem is an extension of Parson's game in which multiple agents are, in general, required to guard doorways and sweep rooms [56].
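For the cops-and-robbers game described above, the one-cop case admits a clean algorithmic characterization due to Nowakowski and Winkler: a single cop wins if and only if the graph is dismantlable, i.e., it can be reduced to one vertex by repeatedly deleting a "cornered" vertex whose closed neighborhood is contained in another vertex's. The sketch below (our own illustration; computing general cop numbers is substantially harder) checks this property.

```python
def is_copwin(adj):
    """Check whether one cop suffices on the graph `adj` (vertex -> set of
    neighbors), via the dismantlability characterization: repeatedly delete
    a vertex u dominated by some v, i.e., N[u] is a subset of N[v]."""
    verts = set(adj)
    nbrs = {u: set(adj[u]) | {u} for u in adj}   # closed neighborhoods
    while len(verts) > 1:
        for u in verts:
            if any(v != u and (nbrs[u] & verts) <= (nbrs[v] & verts)
                   for v in verts):
                verts = verts - {u}   # u is cornered: delete and repeat
                break
        else:
            return False   # no cornered vertex: the robber can evade forever
    return True
```

A path graph dismantles from its leaves inward, so one cop wins there, whereas a 4-cycle has no cornered vertex and needs two cops, consistent with the winning-condition results surveyed above.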
Determining the minimum number of pursuers is known to be computationally challenging, but efficient algorithms have been reported for graphs possessing special structure, e.g., tree graphs [55], [57]. The lion-and-man game is thematically similar to cops-and-robbers, but unfolds in two dimensions, typically a circle or a polygon. The hungry lion tries to eat the man, while, as one may have guessed, the man tries to escape. Conditions for the lion to capture the man are reported in [84] for a turn-based version of the game. Extensions to higher dimensions, most meaningfully R^3, are reported in [4], [64]. A robotics-inspired variant of the game, in which the pursuer has line-of-sight visibility, but cannot see through obstacles or boundaries, is proposed in [44], [67]. Strategies for two lions to capture the man in any simple polygon, and for three lions to capture the man in a two-dimensional polygon with obstacles, are reported in [49] and [22], respectively. In many of these problems, randomized search schemes are employed to minimize the search number. Variations of the game involving finite sensing radii are reported in [4], [24]. The problem of coordinating multiple agents that each have a finite sensing radius remains open [33], as does quantifying the time complexity of possible capture schemes. As noted, the essential elements of Parson's game extend to hypergraphs [43], in what is referred to as a marshalls-and-robbers game; here the graph-theoretic notion of tree width features prominently in the analysis. Establishing connections to other notions from graph theory, various complexity measures for pursuit-evasion games are characterized in [25], [83]. A more complete survey of pursuit-evasion on graphs can be found in [6], [39], and more recently in [33], which originally pointed the author to these works. A second class of combinatoric search games concerns the exploits of a pursuer who tries to capture an evader in minimal time.
The evader, not being one to go quietly, strives to delay capture for as long as possible [5], [42]. Owing to these competing objectives, optimal player strategies are best understood using the language and formalism of game theory. Equilibrium strategies, which are typically highly dependent on workspace geometry, are reported in [5] for games taking place in an assortment of environments, including line segments, specialized graphs, and compact regions of R^2. Once again, the use of mixed strategies is critical to describing equilibrium play. Also in [5], the authors consider team search games in which multiple pursuers scour the environment in an effort to locate the evader quickly. In this respect, the game is played between the team of pursuers and the lone evader; individual pursuers have no preference for which of them, if any, ultimately succeeds in capturing the evader. As in the problems we consider, the structure of player strategies in adversarial search games depends heavily on a number of key factors, including the sequencing of player moves, e.g., simultaneous versus turn-based; the information made available to the players, e.g., finite versus infinite sensing radius; the geometry of the environment; and the number of pursuers, i.e., search agents, participating in the game. Depending on the specifics of the problem, solutions and efficient algorithms for computing them are either known, known to scale poorly with problem complexity, or remain open research problems [33]. It is useful to take stock of the body of work discussed thus far in the context of competitive search games. Recall that competitive search games emphasize the decision-making process of search agents that compete (rather than cooperate) to capture a target. In this setting, agents must individually design and execute search plans given only a prior on the target's position. This requirement recalls the basic premise of probabilistic search problems.
However, due to inter-agent competition for targets, prudent search strategies must account for the presence of other agents in the workspace, and thus prove efficient in a game-theoretic sense. This latter point recalls the methodology used to analyze select pursuit-evasion games. The next section considers problems that unfold in dynamically changing environments. Although not directly search-related, these problems help to frame the work in Chapter 7, where targets are contested on an ongoing basis in a time-varying environment.

2.4 Persistent planning problems

When an environment is dynamic or contains an inherent level of uncertainty, it is often necessary to employ agents that perform a task indefinitely. This requirement is present, for example, in surveillance and patrol applications. Here, agents must inspect or provide service to specific regions of the workspace on an ongoing basis [21]. Describing policies that realize this long-term functionality is best achieved by using iterative constructs and adopting an algorithmic perspective. This section draws attention to research, much of which is quite recent, that stresses the need for precisely this type of persistent planning. In dynamic vehicle routing (DVR) problems, one or more service vehicles mobilize within a workspace to accomplish a task, usually to satisfy a set of demands that arrive dynamically at times and locations that are unknown in advance. However, unlike the search scenarios we consider, service vehicles in DVR problems are alerted to each new demand and its location at the time of the demand's arrival. To provide a high quality of service, agents adopt adaptive policies that reflect the current state of the system and, to some extent, a projection of how the set of outstanding demands will evolve [80]. A typical quality-of-service metric is the expected amount of time, in steady state, that a demand spends in the system before receiving service [18].
In the dynamic traveling repairman problem (DTRP), vehicles are notified of each demand's location, provide on-site service, and the service times are modeled as random variables [19], [20]. More exotic variants of the DTRP framework that capture features such as customer impatience, various demand priority levels, and nonholonomic vehicular constraints are studied using queueing-based policies in [27]. A decentralized DTRP policy that performs optimally under light-load conditions is considered in [8]. More relevant to the research at hand, however, is the fact that the policy supports a game-theoretic interpretation in which a socially optimal Nash equilibrium exists. Service vehicles treat the centroid of all previously serviced demands as a home base, and return to wait at this location when the environment is void of outstanding demands. When demands do appear, the vehicles venture forth to provide service, adjusting their home bases as necessary. In this case, the utility function of each vehicle places a premium on being much closer to a demand than the next closest vehicle in the workspace. The problem of finding a target that intermittently emits a low-power distress beacon is presented in [88]. A vehicle with finite sensing radius is capable of locating the target only when it senses a signal that originated from within its sensing zone. The expected time for a vehicle to discover the target using periodic search paths is analyzed. In [46], the persistent patrol problem assumes that targets enter the environment dynamically according to a renewal process, and the vehicle has a prior over each target's location. Persistent search plans are designed to, in steady-state, and as the sensing radius becomes small, locate targets in minimal expected time. Extensions to multi-robot persistent patrol scenarios are provided in [38] in the context of optimal foraging strategies.
The main result is that the frequency of visits to a region should be proportional to the cube root of the target density in that particular region to ensure targets are collected in a timely manner. The task of providing persistent surveillance to a finite set of feature points within a region is considered in [87]. The points in question could represent, for example, sites at which hazardous waste materials have accrued and must be collected for safe disposal. A periodic speed controller for a fixed vehicle route is designed to minimize the maximum steady-state accumulation of waste across the sites. A closely related problem is addressed in [86], where observation points are classified based on the required visitation frequency and a control policy is enacted that ensures each point is visited sufficiently often. Similar themes are explored in the context of perimeter patrol, coverage, and surveillance applications, e.g., [1], [2], [26], [29], [31], [81], [85]. The collection of persistent planning problems outlined above emphasizes the need to continually revisit sites within the environment. This involves translating the operational objectives of the problem, in conjunction with the geometric and temporal parameters of the system, into a visitation schedule for the various sites in the workspace. In Chapter 7, we consider games in which targets enter the environment dynamically. Here again, agent planning requires deciding how often to visit specific points within the workspace. Of course, the competitive nature of our problem requires integrating these considerations with the recent travel histories of nearby search agents.

Chapter 3
Mathematical Preliminaries

This chapter provides a focused summary of the main theoretical tools used in the thesis. Because we are interested in characterizing search strategies that find targets efficiently in the face of inter-agent competition, this amounts to a presentation of key game-theoretic ideas.
As mentioned in the previous chapter, search theory draws upon a range of techniques, but has few transcendent results. Rather, contributions tend to be application-specific. In contrast, game theory has a number of core ideas that permeate the field and provide a framework to characterize and study a range of multi-agent encounters, search-related or otherwise. Here, we recount three such concepts: Nash equilibria, best-response strategies, and maximin strategies. A full discussion of these ideas may be found in any of the many textbooks on the subject, including [11], [41], [70], [77]. Finally, a short example is provided to better illustrate each concept. Although well-established, these ideas, taken collectively, provide an able toolset for characterizing strategic play in competitive search games.

3.1 Game theory

For the purpose of this thesis, a game constitutes an encounter involving n players in which the utility, i.e., well-being or payoff, of each player depends on the collective actions of all players. Game theory studies the decision-making process and ultimate outcome of players in a game. In competitive search games, the number of targets a cow captures depends not only on her own search strategy, but also on the search strategy of each rival cow. Therefore, the scenarios of interest are games not only in the vernacular, but also in the game-theoretic sense. Moreover, game theory is the natural tool to describe and analyze these encounters. No concept is more central to game theory than that of a Nash equilibrium, or simply NE: a strategy profile, i.e., a collection of player strategies, from which no player has a unilateral incentive to deviate. For our work, the closely related concept of an ε-NE also proves useful [71], [72]. To this end, we present the following definition, which closely follows the presentation in [41].

Definition 3.1 (Pure Strategy ε-Nash Equilibrium). Let G be a game with n players, V = {1, ..., n}.
Let S_i be the strategy set of player i and S = S_1 × ... × S_n the set of strategy profiles. For s ∈ S, let s_i be the strategy played by player i in s and s_{-i} the strategy profile of all players other than i in s. Let U_i(s) be the utility of player i under s. For ε ≥ 0, a strategy profile s* ∈ S is a pure strategy ε-Nash equilibrium of G if, for all i ∈ V,

U_i(s*_i, s*_{-i}) + ε ≥ U_i(s_i, s*_{-i}) for all s_i ∈ S_i.    (3.1)

In words, (3.1) says that s* is an ε-Nash equilibrium, or ε-NE, if no player can unilaterally deviate from s* and improve her utility by more than ε. In this thesis, we will be interested in cases where ε is small. The more traditional notion of a Nash equilibrium (NE) corresponds to an ε-NE with ε = 0. In this way, it is useful to think of ε in (3.1) as a buffer or measure of indifference, i.e., it is not worth going to the trouble of shifting strategies if the gains do not exceed ε. In most games, it is assumed that player strategies are selected prior to the game and then revealed, simultaneously, when the game begins. Nevertheless, it is useful to have a notion for how player i would respond were she to know the strategy of each of her rivals beforehand. The following idea meets these requirements.

Definition 3.2 (Best-Response Strategy). Let G be a game with n players. Retain all of the notation introduced in Definition 3.1. For i ∈ V and s_{-i} ∈ S_{-i}, the strategy s_i ∈ S_i is a best response of player i to s_{-i} if

U_i(s_i, s_{-i}) ≥ U_i(s'_i, s_{-i}) for all s'_i ∈ S_i.    (3.2)

In other words, if player i were to (omnisciently) know her opponents would, collectively, play s_{-i}, she could maximize U_i by playing s_i. To capture the fact that player i may have multiple best responses to a given s_{-i}, the best-response relation of player i to s_{-i} ∈ S_{-i} is defined as

BR_i(s_{-i}) = {s_i ∈ S_i : U_i(s_i, s_{-i}) ≥ U_i(s'_i, s_{-i}) for all s'_i ∈ S_i}.    (3.3)

For game G, let S_NE denote the set of Nash equilibria.
Nash equilibria and best-response strategies are related by the following condition:

s* ∈ S_NE  ⟺  s*_i ∈ BR_i(s*_{-i}) for all i ∈ V.    (3.4)

This stipulation is consistent with the idea that, in any equilibrium, no player can unilaterally deviate from s* and increase her utility. Although strategies that comprise a NE profile are intuitively appealing, in some cases it makes sense to focus on strategies that offer guarantees regardless of what strategies one's rivals play. The following idea serves this purpose.

Definition 3.3 (Maximin Strategy). Let G be a game with n players. The strategy s_i ∈ S_i is a maximin strategy for player i if

min_{s_{-i} ∈ S_{-i}} U_i(s_i, s_{-i}) ≥ max_{s'_i ∈ S_i} min_{s_{-i} ∈ S_{-i}} U_i(s'_i, s_{-i}).    (3.5)

In other words, if player i were concerned that her fellow players may act (intentionally or otherwise) to minimize U_i, then playing a maximin strategy is the intelligent thing for her to do. In general, however, playing a maximin strategy may be overly conservative. In many cases, player utilities are at least partially aligned, and player i may do herself a disservice by guarding against a worst-case scenario. Nevertheless, in certain scenarios, including zero-sum games, congestion games, and games where it is difficult to resolve utility payoffs for overly complex strategy profiles, playing a maximin strategy may be a level-headed and reasonable course of action.

Table 3.1: Utility payoffs for the simple two-agent search game in Example 3.4. The possible actions of agent 1 are displayed in the leftmost column; the possible actions of agent 2 are displayed along the top row. Given each agent's search strategy, the first and second entries in each cell represent the utility of agent 1 and agent 2, respectively. For example, if agent 1 searches using s_1 = a and agent 2 searches using s_2 = b, then the probability of agent 1 finding the target is 1/5 and the probability of agent 2 finding the target is 4/5.

s_1\s_2 |     a      |     b      |     c
   a    | (1/3, 2/3) | (1/5, 4/5) | (3/5, 2/5)
   b    | (3/4, 1/4) | (2/3, 1/4) | (2/3, 1/3)
   c    | (1/3, 1/3) | (5/6, 1/6) | (3/5, 1/3)
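The three solution concepts above are straightforward to compute exhaustively for small finite games. The following sketch does so for a hypothetical 3x3 two-player game: the payoff values in `U` below are illustrative placeholders chosen only to exercise the code (they are not the values of Table 3.1), and the helper names are our own. It enumerates pure ε-Nash equilibria via (3.1), best-response sets via (3.3), and maximin strategies via (3.5).

```python
from itertools import product

strategies = ["a", "b", "c"]

# Hypothetical payoffs: U[(s1, s2)] = (utility of agent 1, utility of agent 2).
U = {
    ("a", "a"): (0.4, 0.6), ("a", "b"): (0.2, 0.8), ("a", "c"): (0.5, 0.5),
    ("b", "a"): (0.7, 0.3), ("b", "b"): (0.6, 0.3), ("b", "c"): (0.6, 0.4),
    ("c", "a"): (0.3, 0.4), ("c", "b"): (0.8, 0.2), ("c", "c"): (0.5, 0.4),
}

def utility(i, s1, s2):
    return U[(s1, s2)][i]

def best_responses(i, s_other):
    """BR_i(s_{-i}) as in (3.3): the strategies maximizing player i's utility."""
    def u(s):
        return utility(i, s, s_other) if i == 0 else utility(i, s_other, s)
    best = max(u(s) for s in strategies)
    return {s for s in strategies if u(s) == best}

def pure_epsilon_nash(eps=0.0):
    """Profiles satisfying (3.1): no unilateral deviation gains more than eps."""
    profiles = []
    for s1, s2 in product(strategies, repeat=2):
        ok1 = utility(0, s1, s2) + eps >= max(utility(0, d, s2) for d in strategies)
        ok2 = utility(1, s1, s2) + eps >= max(utility(1, s1, d) for d in strategies)
        if ok1 and ok2:
            profiles.append((s1, s2))
    return profiles

def maximin(i):
    """(3.5): the strategy maximizing player i's worst-case utility, and that value."""
    def worst(s):
        if i == 0:
            return min(utility(0, s, d) for d in strategies)
        return min(utility(1, d, s) for d in strategies)
    best = max(strategies, key=worst)
    return best, worst(best)

print(pure_epsilon_nash())     # [('b', 'c')]
print(best_responses(0, "a"))  # {'b'}
print(maximin(0), maximin(1))  # ('b', 0.6) ('c', 0.4)
```

Calling `pure_epsilon_nash(eps)` with a positive ε enlarges the set of equilibria, which is exactly the buffer interpretation of ε discussed after (3.1).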
Example 3.4 (A Simple Search Game). To cement the ideas of a NE, best-response strategy, and maximin strategy, consider a very simple search game, G, involving two search agents (players). In G, a target arrives at a random location in the environment. The probability that an agent captures the target is a function of the agents' initial conditions and the search strategies used by each participant. For simplicity, assume the initial position of each agent is fixed and that there are three abstract search strategies, denoted a, b, and c, from which each agent may choose. Each agent's utility is the probability that she succeeds in capturing the target. Agent utilities for each combination of search strategies are shown in Table 3.1. By inspecting each combination of search strategies in Table 3.1, we see the profile s* = (s*_1, s*_2) = (b, c) is the only pure strategy NE of G. Under this search profile, the probability that agent 1 finds the target is two-thirds, and the probability that agent 2 finds the target is one-third. From here, if agent 1 deviates, she has at best a sixty percent chance of finding the target. Similarly, if agent 2 deviates, she has at best a twenty-five percent chance of finding the target. To illustrate the notion of best response, observe that if agent 2 were to play s_2 = a, then agent 1's best response is to play s_1 = b, i.e., BR_1(a) = {b}. Similarly, BR_2(c) = {a, c}. Finally, consider that

min_{s_2} U_1(a, s_2) = 1/5,    (3.6)
min_{s_2} U_1(b, s_2) = 2/3,    (3.7)
min_{s_2} U_1(c, s_2) = 1/3.    (3.8)

Consequently, max_{s_1} min_{s_2} U_1(s_1, s_2) = 2/3, agent 1's maximin strategy is s_1 = b, and by playing b agent 1 has at least a two-in-three chance of finding the target. Following a similar line of reasoning for agent 2, we have

min_{s_1} U_2(s_1, a) = 1/4,    (3.9)
min_{s_1} U_2(s_1, b) = 1/6,    (3.10)
min_{s_1} U_2(s_1, c) = 1/3.    (3.11)
Consequently, max_{s_2} min_{s_1} U_2(s_1, s_2) = 1/3, agent 2's maximin strategy is s_2 = c, and by playing c agent 2 guarantees she has at least a one-in-three chance of finding the target. Note that, in this particular example, s* = (s_1^m, s_2^m), where s_i^m denotes the maximin strategy of player i. In general, two-player, zero-sum games, i.e., games in which U_1(s) + U_2(s) is constant for all s ∈ S, have the property that s* ∈ S_NE if and only if s*_i is a maximin strategy of player i, i = 1, 2.

Chapter 4
The Cow-Path Ring Problem

This chapter considers a variant of the Cow-Path Problem that takes place on a ring, instead of the real line. It turns out that pitting two agents against one another in a contest to capture a target on the real line is not especially interesting. In this case, there is a unique and very simple equilibrium profile, which is independent of each agent's initial position and the target distribution. We will discuss this strategy in its entirety in the next chapter. Interesting strategic play can, however, emerge when the venue shifts to a simple, closed curve, i.e., a ring or a perimeter. Rather than study search games on rings straight away, this chapter broaches the topic more gradually, by first considering how a single cow should search for a patch of clover on the ring. The rationale is to first understand how the (intelligent) cow should search in a relaxed and solitary setting. The insight gleaned from this analysis will aid in understanding how the cow should search once a second hungry cow is added to the ring, i.e., in a competitive environment. Equally important, the problem is, in its own right, an interesting variant of the Cow-Path Problem.

4.1 Introducing the Cow-Path Ring Problem

The Cow-Path Ring Problem, or CPRP for short, was originally considered by the author in [89]. As the name suggests, the CPRP is closely affiliated with the CPP, the key difference being that the venue in which searching takes place has shifted from the line to a ring.
Because a ring has no endpoints, if and when to double back during the search is a particularly nebulous proposition for the cow in the CPRP. Whereas reversing direction is critical to formulating a search plan in the CPP, in the CPRP it is possible to find the target by fixing a heading and exhaustively searching the ring. Of course, there is nothing to say this is an efficient method of finding clover for a given target distribution. Ultimately, this chapter is about understanding at which points on the ring to turn and when it is best to simply sweep any remaining territory. Formally, the stochastic, or average-case, CPRP is defined as follows.

Definition 4.1 (The Cow-Path Ring Problem). A hungry cow, C, is located at an origin point, q_o, on the ring R. C knows that a patch of clover, T, is located somewhere on R, but has only a prior, f_T : R → R≥0, on T's position. C can move at unit speed and can change direction instantaneously. Finally, on account of the clover's small footprint, C can discover T only when she is standing directly over it. How should C search to find the clover in minimum expected time?

A few comments are in order. First, as mentioned, the only difference between the CPRP and the CPP is the environment in which searching takes place. C's sensor is, once again, perfectly accurate, but has zero sensing radius. Second, in much of the analysis that follows, it will be assumed that f_T is bounded, i.e., f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. While it is perfectly acceptable to consider CPRPs for arbitrary densities f_T, we will develop many of our results for bounded f_T. Moving along, in the CPRP, C has the option, at any point in time, of fixing her heading, once and for all, and exhaustively searching R until she finds T. This approach has no direct analog in the CPP, where C must have a turning contingency or risk never finding the target.
(In the event the target density is zero along one half-line of the real line, C's optimal search plan is to explore the other half-line until T is found.) To formally investigate the consequences of this new ability, we require a notational system that is both descriptive and concise. The next section provides these elements. For convenience, Table 4.1 provides a summary of the CPRP-specific notation used in this chapter.

4.2 CPRP Notation and Terminology

The exploration rule C uses to search for T is referred to as a search plan. A generic search plan is represented by s, and the set of all feasible search plans by S. To be feasible, s must specify a feasible location for C at every point in time. However, because C is interested in finding T quickly, we can safely assume she will always travel at her maximum speed, i.e., one. A search plan s may then be specified by listing its sequence of turning points, points on R at which C is to reverse direction, i.e.,

S = {(q_1, ..., q_n) : n ∈ Z≥0, q_i ∈ R for all i ∈ {1, ..., n}}.    (4.1)

In this system, should C reach q_n, i.e., her last scheduled turning point in s, having not found T, she reverses direction one last time before exhaustively searching for, and eventually finding, T. Let S_n denote the set of all search plans that specify n ∈ Z≥0 turning points, so that S = ∪_{n=0}^∞ S_n. We now provide a means to describe specific points on R. The notation q ∈ R denotes that point q is on R. A specific q ∈ R is referenced by the number in [0, 2π] that represents the counter-clockwise (ccw) distance from q_o to q, where q_o is C's position when the search begins at time zero. Owing to the cyclic nature of R, q may also be referenced by the number in [−2π, 0] whose magnitude represents the clockwise (cw) distance from q_o to q. This circular parity affords some flexibility when referring to points on R; for example, for q ∈ [0, 2π], the labels q and q − 2π refer to the same point on R.
We will frequently switch between these two labeling schemes, opting to employ whichever is most convenient at a given time. Although listing a sequence of turning points allows us to describe all search plans of interest, it makes sense to place some restrictions on turning points so as to proactively exclude search plans that are clearly suboptimal. As a first effort in this direction, we enforce the following condition on s ∈ S_n:

q_i · q_{i+1} < 0, for 1 ≤ i ≤ n − 1.    (4.2)

In words, (4.2) says that after each turn, C must cross q_o before making her next turn. With a little thought, it is clear that any search plan aiming to find T quickly satisfies (4.2), since turning while still in previously explored territory is clearly wasteful.

Table 4.1: Summary of general and CPRP-specific notation used in the thesis.

Symbol/Acronym           | Meaning/Definition
Q                        | the environment
R                        | the ring (unit radius)
T                        | the target (clover)
q ∈ Q                    | a point in Q
f_T : Q → [f_min, f_max] | the target density function
C_i, i = 1, ..., n       | the i-th cow, or cow i
CPP                      | Cow-Path Problem
CPRP                     | Cow-Path Ring Problem
CPRG                     | Cow-Path Ring Game

To illustrate the semantics of the notational system implied by (4.1) and (4.2), consider a search plan s = (q_1, q_2, ..., q_n) ∈ S_n with q_1 > 0 and n even. In this case, the search for T would evolve as follows:

q_o →(ccw) q_1 →(cw) q_2 →(ccw) q_3 →(cw) ... →(cw) q_n →(ccw) (2π + q_n).    (4.3)

That is, starting from q_o, C travels in the ccw direction toward q_1. Should C reach q_1 having not found T, she reverses direction and travels toward q_2. Should C reach q_2, still having not found T, she again reverses direction and travels toward q_3, and so on. Finally, should C reach q_n, i.e., her last scheduled turning point in s, with T still proving elusive, she reverses direction one last time before exhaustively searching R and, eventually, finding T. Similar arguments apply when n is odd, q_1 < 0, or both. The pertinent geometric features of the CPRP, along with a portion of a sample search plan, are illustrated in Figure 4-1.
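The turning-point notation and the excursion semantics of (4.2)–(4.3) translate directly into a small simulator. The sketch below is our own illustration (the thesis defines no code, and the helper names are hypothetical): it checks the alternation condition (4.2) and computes the time at which the cow, executing a given plan and finishing with the exhaustive sweep, first stands over a given target location.

```python
import math

TWO_PI = 2 * math.pi

def satisfies_4_2(plan):
    # condition (4.2): consecutive turning points lie on opposite sides of q_o
    return all(plan[i] * plan[i + 1] < 0 for i in range(len(plan) - 1))

def discovery_time(plan, target):
    """Time for the cow, starting at q_o = 0 on a ring of circumference 2*pi,
    to first stand over `target` (a ccw distance in [0, 2*pi)) while executing
    the signed turning points in `plan`, then sweeping the rest of the ring."""
    reps = (target, target - TWO_PI)   # the two signed labels of the point
    last = plan[-1] if plan else 0.0
    # final sweep: reverse one last time and cover the remainder of the ring
    final = last + TWO_PI if (last < 0 or not plan) else last - TWO_PI
    t, pos = 0.0, 0.0
    for w in list(plan) + [final]:
        lo, hi = min(pos, w), max(pos, w)
        hits = [t + abs(x - pos) for x in reps if lo <= x <= hi]
        if hits:
            return min(hits)           # first pass over the target
        t += abs(w - pos)
        pos = w
    return t  # not reached for a valid plan: the final sweep covers the ring

# the plan of (4.3) with n = 2: ccw to pi/2, cw to -pi/2, then sweep ccw
plan = [math.pi / 2, -math.pi / 2]
assert satisfies_4_2(plan)
print(discovery_time(plan, 3 * math.pi / 4))  # 11*pi/4 ~ 8.639
print(discovery_time([], 3 * math.pi / 4))    # 3*pi/4 ~ 2.356
```

For a target at ccw distance 3π/4, the two-turn plan first overshoots in both directions and finds the target on the final sweep at time 3π/2 + 5π/4 = 11π/4, whereas the empty plan (a pure ccw sweep) reaches it directly at time 3π/4.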
With a notational system for describing search plans in place, we are finally in a position to consider the performance of s ∈ S.

Figure 4-1: A visualization of the Cow-Path Ring Problem. The target density, f_T, as a function of radial position q, is shown in blue. In the instance depicted, the target, T, is located on the north-west portion of R. In the search plan shown, the cow (yellow triangle) travels in the ccw direction toward q_1, where, having not found T, she reverses direction and travels in the cw direction toward q_2. Upon reaching q_2, having still not found T, she again reverses direction and continues searching in the ccw direction until ultimately finding T at the site indicated with a red exclamation mark.

Let T_D(s) denote the time it takes for C to discover T using s. Because the location of T is random, so too is T_D(s). Consequently, we focus on E[T_D(s)], the derivation of which closely follows [12]. Assume, once again, that q_1 > 0 and n is even. By the end of the derivation, extensions to the cases where n is odd, q_1 < 0, or both will be obvious. To begin, note that E[T_D(s)] can be broken into two components: (i) the amount of time C spends traveling from the origin toward T on the excursion in which T is discovered, and (ii) the amount of time C spends backtracking before finding T. Let these times, each a random variable, be denoted by T_{D,1}(s) and T_{D,2}(s), respectively. First, consider E[T_{D,1}(s)]. Given n is even and q_1 > 0, it must be that q_n < 0, and all points in [0, 2π + q_n] ⊂ R will be visited, if required, for the first time when C is moving in the ccw direction. Similarly, all points in [q_n, 0] ⊂ R will be visited, if required, for the first time when C is moving in the cw direction. Therefore,

E[T_{D,1}(s)] = ∫_{q=0}^{2π+q_n} q f_T(q) dq + ∫_{q=0}^{|q_n|} q f_T(−q) dq.
(4.4)

Before considering E[T_{D,2}(s)], define, for any y, z ∈ [−2π, 2π] such that yz ≤ 0, F_T(y, z) to be the probability that T is located along the arc of R that contains q_o and has endpoints y and z, i.e.,

F_T(y, z) = F_T(z, y) = ∫_{min(y,z)}^{max(y,z)} f_T(q) dq.    (4.5)

Hence, F_T may be thought of as a type of distribution function taken over arcs of R. Returning to E[T_{D,2}(s)], if T ∉ [0, q_1], which happens with probability 1 − F_T(q_1, 0), then C will reach q_1 having not found T and will subsequently have to travel back to q_o, contributing 2q_1 to T_{D,2}(s). Generalizing, if T is not on the arc containing q_o and having endpoints q_{i−1} and q_i, which happens with probability 1 − F_T(q_i, q_{i−1}), then C will reach q_i having not found T and will have to head back to q_o empty-handed, contributing an additional 2|q_i| to T_{D,2}(s). Putting these ideas together gives

E[T_D(s)] = E[T_{D,1}(s)] + E[T_{D,2}(s)]    (4.6)
          = ∫_{q=0}^{2π+q_n} q f_T(q) dq + ∫_{q=0}^{|q_n|} q f_T(−q) dq + 2 Σ_{i=1}^{n} |q_i| (1 − F_T(q_i, q_{i−1})),    (4.7)

where q_0 = 0. Let S* denote the set of optimal search plans, i.e., the set of search plans that minimize (4.7). Previously, it was argued that any s* ∈ S* satisfies (4.2). Two additional conditions that s* satisfies are given in the following lemma.

Lemma 4.2. Let s* = (q*_1, q*_2, ..., q*_n) ∈ S* be an optimal search plan for the CPRP. Then s* satisfies the following conditions:

|q*_i| < |q*_{i+2}|,        for 1 ≤ i ≤ n − 2,    (4.8)
|q*_i| + |q*_{i+1}| ≤ π,    for 1 ≤ i ≤ n − 1.    (4.9)

Proof. We address the validity of (4.8) and (4.9) in turn. Assume, to obtain a contradiction, that s* violates (4.8). Then there must be an i ∈ Z>0 such that |q*_{i+2}| ≤ |q*_i| in s*. Assuming F_T(q*_i, q*_{i+1}) < 1, there is positive probability that C will have to turn at both q*_i and q*_{i+1}. However, in traveling from q*_{i+1} to q*_{i+2}, C fails to explore any new territory. Consequently, if T has not been found upon C reaching q*_{i+1}, it is guaranteed to remain undiscovered upon C reaching q*_{i+2}.
Therefore, by deleting q*_{i+1} and q*_{i+2} from s*, it is possible to realize a strict reduction in E[T_D], an improvement that contradicts the optimality of s*. It follows that (4.8) must hold. Although this analysis has been customized to the ring R, it is essentially the same as the associated proof for the Cow-Path Problem in [12]. To verify (4.9), assume, again to obtain a contradiction, that there exists an s* ∈ S* satisfying |q*_i| + |q*_{i+1}| > π for some i ∈ Z>0. In the event C turns at q*_{i+1}, she must then backtrack along a previously explored arc of R having length greater than π before exploring new territory. Conversely, by forgoing the turn at q*_{i+1} and simply maintaining course until T is discovered, C is guaranteed to find T in additional time less than π. Since this alternate strategy realizes a strict reduction in cost, we again have a result that contradicts the optimality of s*. It follows that (4.9) must hold. ∎

In summary, (4.2), (4.8), and (4.9) state that an optimal search plan tasks C with alternately exploring ever-expanding frontiers of R until such time as half of the ring has been explored, at which point C continues searching in her current direction until T is found. The following strategies are useful in establishing bounds on C's achievable performance.

Definition 4.3 (No-Turn Search Strategies). Let s_cw (respectively, s_ccw) denote the search plan in which C departs from q_o and travels exclusively in the cw (ccw) direction. That is, {s_cw, s_ccw} is the collection of search plans in which C has no built-in contingency for turning. Recall that these strategies have no sensible analogue in the CPP.

Given that C moves at unit speed, s_cw and s_ccw immediately yield the following preliminary upper bound:

E[T_D(s*)] ≤ min(E[T_D(s_cw)], E[T_D(s_ccw)]) ≤ 2π.    (4.10)

The proof of (4.10) follows from the simple fact that, using either s_cw or s_ccw, C is guaranteed to find T after sweeping R, and this sweep can be completed in time at most 2π.
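Expression (4.7) can be evaluated numerically for any bounded density. The following sketch is our own (the helper names are hypothetical): it assumes, as in the derivation above, that n is even and q_1 > 0 (so q_n ≤ 0), takes the density as a function of signed position, and uses a simple midpoint-rule quadrature, which lets us check concrete plans against the 2π sweep bound of (4.10).

```python
import math

TWO_PI = 2 * math.pi

def integrate(g, a, b, n=20000):
    # midpoint rule; adequate for a smooth, bounded integrand
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def expected_discovery_time(plan, f):
    """E[T_D(s)] via (4.6)-(4.7) for plan = (q_1, ..., q_n) with n even and
    q_1 > 0, mirroring the case treated in the text. `f` is the target
    density, queried at signed positions in [-2*pi, 2*pi]."""
    qn = plan[-1] if plan else 0.0     # with n even, q_n <= 0
    travel = (integrate(lambda q: q * f(q), 0.0, TWO_PI + qn)
              + integrate(lambda q: q * f(-q), 0.0, abs(qn)))
    backtrack, prev = 0.0, 0.0
    for q in plan:
        # F_T(q_i, q_{i-1}): mass of the arc through q_o between them
        mass = integrate(f, min(q, prev), max(q, prev))
        backtrack += 2.0 * abs(q) * (1.0 - mass)
        prev = q
    return travel + backtrack

uniform = lambda q: 1.0 / TWO_PI

# no-turn (ccw sweep) plan under the uniform density: E[T_D] = pi
print(expected_discovery_time([], uniform))   # ~3.1416

# the two-turn plan (pi/2, -pi/2): working (4.7) by hand gives 15*pi/8,
# below the 2*pi bound of (4.10) but worse than simply sweeping
print(expected_discovery_time([math.pi / 2, -math.pi / 2], uniform))
```

The comparison illustrates the point made after Definition 4.3: under a uniform density, adding turns only adds backtracking, so the no-turn sweep is the plan to beat.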
4.3 The number of turns in the CPRP

We now turn our attention to the task of characterizing the optimal number of turning points in the CPRP. Given a search plan s, let #s denote the number of turning points listed in s, i.e., the maximum number of times that C could conceivably turn under s. For example, for s = (q_1, ..., q_n), #s = n. Note that #s is, by definition, a property of s, while #s* is a property of f_T. In the CPP, it may be the case that the optimal search plan contains an infinite number of terms. For CPRPs with bounded f_T, the following result states that this is not the case.

Proposition 4.1. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Then #s* is finite.

Before embarking on a proof, we comment on the boundedness assumption on f_T, i.e., that f_T(q) ∈ [f_min, f_max] for q ∈ R. In many applications, it is reasonable to bound the probability of finding T in any interval of finite length away from zero, e.g., when there is no location for which T's absence can be guaranteed. Moreover, it is often the case that it is only over intervals of finite length that one can associate a positive probability with finding T, i.e., f_T has no atoms. For the class of problems for which both of these conditions hold, the assumption that f_T : R → [f_min, f_max] is a reasonable one.

Proof. The key idea of the proof is that, after turning a sufficiently large number of times, the burden of backtracking becomes less appealing than simply using a no-turn strategy, e.g., s_cw. Assume, to obtain a contradiction, that for a given target density f_T : R → [f_min, f_max], we have #s* = ∞. From (4.9), for any n ∈ Z>0, |q*_{2n}| + |q*_{2n+1}| ≤ π, so the explored arc when C reaches any turning point has length at most π. Consequently, with probability p ≥ π f_min > 0, T remains undiscovered when C reaches each scheduled turning point, and each turn at q*_i contributes a backtracking penalty of at least 2|q*_i| p to (4.7). By (4.8), the |q*_i| are strictly increasing along the even- and odd-indexed subsequences, so these penalties are eventually bounded below by a positive constant. It follows that E[T_D(s*)] grows without bound in the number of turns and, for a sufficiently large number of them, exceeds 2π, which violates (4.10). Yet, from the earlier discussion, (4.10) must hold.
This contradiction proves the claim. ∎

4.4 An iterative algorithm for s*

The discussion to follow centers on determining the minimum number of turn-around points required to realize an optimal search plan, i.e., min{#s* : s* ∈ S*}. To this end, with S_n = {s ∈ S : #s = n}, let

S*_n = {s ∈ S_n : E[T_D(s)] ≤ E[T_D(s')] for all s' ∈ S_n}    (4.11)

be the set of n-turn optimal search plans. For convenience, let n* = min{#s* : s* ∈ S*}. The next result captures the intuitive idea that search plans employing more than n* turn-around points can recover optimal performance.

Proposition 4.2. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Let s be a search plan in S*_n with #s = n > n*. Then, for s* ∈ S*, E[T_D(s)] = E[T_D(s*)].

Proof. Intuitively, because s* is optimal, the need to specify additional turning points in s confers no advantage, i.e., we would expect E[T_D(s)] = E[T_D(s*)]. For the time being, consider the case #s = n* + 1. A sensible strategy for selecting s is to downplay the need to specify an extra turning point by ensuring it stands only an arbitrarily negligible chance of impacting E[T_D(s)]. For example, for small ε > 0, let s = {s*, q_{n*+1}}, where, for s' ∈ S and q ∈ R, {s', q} denotes the concatenation resulting from adding q to the end of s', and

q_{n*+1} = −2π sgn(q_{n*}) + q_{n*} + ε sgn(q_{n*}).    (4.12)

In words, (4.12) says that when using s, and should it prove necessary, C turns for the last time just before she would have finished sweeping R under s*. Consequently, T_D(s) ≠ T_D(s*) only in the rare instances when T ∈ (q_{n*}, q_{n*} + ε sgn(q_{n*})) ⊂ R. Given f_T(q) ≤ f_max and that we are free to take ε arbitrarily small,

lim_{ε→0+} P(T ∈ (q_{n*}, q_{n*} + ε sgn(q_{n*}))) = 0,    (4.13)

implying lim_{ε→0+} E[T_D(s)] = E[T_D(s*)]. For #s ≥ n* + 2, similar arguments ensure the extra turning points are largely superfluous. Namely, for k = 0, 1, ..., #s − n*, selecting the extra turning points as
q'_{n*+k} = q_{n*} − 0.5^k ε sgn(q_{n*}),                      k even,
q'_{n*+k} = −2π sgn(q_{n*}) + q_{n*} + 0.5^k ε sgn(q_{n*}),    k odd,    (4.14)

and taking the search strategy as s = (q*_1, ..., q*_{n*−1}, q'_{n*}, ..., q'_{#s}) ensures that lim_{ε→0+} E[T_D(s)] = E[T_D(s*)]. ∎

The proof of Proposition 4.2 provides insight into how turning points beyond the necessary n* can be mitigated to ensure optimality. It also suggests the following straightforward approach, provided below in Algorithm 4.1, to compute n*. A few comments are in order. First, s*_n in line 3 is found by minimizing (4.7) which, for a given n, is nonlinear in the decision variables q_1, ..., q_n. A natural question concerning Algorithm 4.1 is whether or not termination can be guaranteed. If it can, then both n* and, more importantly, again from line 3, an s* ∈ S* are known. Intuitively, using more turning points when searching cannot increase E[T_D(s*_n)]. Along the lines of Proposition 4.2, it seems reasonable that E[T_D(s*_n)] will decrease monotonically in n for n ≤ n* and, based on Proposition 4.2, E[T_D(s*_n)] = E[T_D(s*)] for n ≥ n*. The following discussion provides further insight into the CPRP.

Algorithm 4.1: iterative algorithm for finding s*
1  n ← 2
2  while true do
3      find s*_n ∈ S*_n
4      if |q_{n−1}| + |q_n| ≈ 2π then
5          n* ← n − 1
6          break
7      else
8          n ← n + 1
9  return s*_{n*}

Lemma 4.4. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Suppose an s* = (q*_1, ..., q*_{n*}) ∈ S* has a first turning point that satisfies |q*_1| ≥ m for some m > 0. Then the optimal number of turns satisfies n* ≤ ⌊2/(m f_min)⌋.

Proof. Assume the hypothesis of the lemma holds, i.e., there exists an s* = (q*_1, ..., q*_{n*}) ∈ S* with |q*_1| ≥ m. Let A be the event that, under s*, C has to make all n* of her scheduled turns before finding T, i.e., C finds T having made all of her scheduled turns. From (4.9) and the optimality of s*, C will make no further turns once the explored segment of R reaches length π. Given f_T(q) ≥ f_min for all q ∈ R, it follows that P(A) ≥ π f_min.
Assuming A occurs, C must travel a distance no less than m between each pair of successive turns. Conditioning on A, this implies that

E[T_D(s*)] ≥ E[T_D(s*) | A] P(A)    (4.15)
           ≥ (m n*)(π f_min)    (4.16)
           = π m f_min n*.

Given E[T_D(s_cw)] ≤ 2π, the optimality of s* requires that

π m f_min n* ≤ E[T_D(s*)] ≤ E[T_D(s_cw)] ≤ 2π.    (4.17)

Working from the left- and rightmost terms of (4.17) and solving for n*, subject to the constraint n* ∈ Z≥0, gives the required result. ∎

A key component in the development of Lemma 4.4 was the fact that the first turning point was at least distance m > 0 away from q_o, i.e., D(q_o, q_1) ≥ m. The following lemma provides a bound on m as a function of the variation in f_T along R.

Lemma 4.5. Consider a CPRP with target density f_T : R → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. Let s* = (q*_1, q*_2, ...) be an optimal search plan. Then at least one of the following cases is true:

|q*_1| ≥ 1/(4 f_max),    (4.18)
or
|q*_2| ≥ 1/(2 f_max).    (4.19)

Proof. The only terms in (4.7) that involve q_1 originate within the summation and are associated with doubling back. Without loss of generality, we may assume that q*_1 > 0 and, therefore, q*_2 < 0. In this case, the terms involving q*_1 in E[T_D(s)] are 2q*_1(1 − F_T(q*_1, 0)) and 2|q*_2|(1 − F_T(q*_2, q*_1)). Because s* is optimal, there is no advantage to making small adjustments in q*_1, such that

d E[T_D(s)] / d q*_1 = 1 − F_T(q*_1, 0) − q*_1 f_T(q*_1) + q*_2 f_T(q*_1) = 0,    (4.20)

where (4.20) makes use of the fact that q*_2 < 0 implies |q*_2| = −q*_2. Given f_T(q) ∈ [f_min, f_max], it follows that F_T(q*_1, 0) ≤ f_max q*_1 and f_T(q*_1) ≤ f_max. Applying these bounds to (4.20) gives

2 f_max q*_1 ≥ 1 + f_max q*_2.    (4.21)

With q*_2 < 0, 1 + f_max q*_2 < 1. Now, if 1 + f_max q*_2 ≥ 1/2, then (4.21) immediately gives |q*_1| ≥ 1/(4 f_max), i.e., (4.18). Otherwise, 1 + f_max q*_2 < 1/2, which implies |q*_2| > 1/(2 f_max), i.e., (4.19). ∎

The preceding result says that by the end of the second turn, C has ventured a distance of at least 1/(4 f_max) from q_o. Second, both (4.18) and (4.19) depend on f_max, but not on f_min.
Intuitively, larger f_max values are indicative of f_T exhibiting more region-specific characteristics over ℛ, which may encourage more aggressive turning early in the search. Finally, Lemma 4.4 came with the stipulation that |q*_1| ≥ m for positive m. Clearly, if (4.18) holds, m may be taken as 1/(4·f_max). In the event (4.19) holds, C must travel a distance of at least |q*_2| after each of at least n* − 1 of her turns; in this case, the proof of Lemma 4.4 is easily tailored to develop a similar upper bound on n*. Moreover, the derivation of m suggests that it may be possible to use similar techniques to bound |q*_2|, |q*_3|, etc. The following theorem pairs Lemma 4.4 with Lemma 4.5 to establish an upper bound on n*.

Theorem 4.6. Consider a CPRP with target density f_T : ℛ → [f_min, f_max], where 0 < f_min ≤ f_max < ∞. The optimal number of turning points satisfies

n* ≤ n_ub = ⌊8·f_max/f_min⌋.   (4.22)

Proof. The result follows from successive application of Lemmas 4.4 and 4.5. Assume (4.18) holds, such that |q*_1| ≥ 1/(4·f_max). Then applying Lemma 4.4 with m = 1/(4·f_max) gives n* ≤ ⌊8·f_max/f_min⌋. Otherwise, from Lemma 4.5, (4.19) must hold, such that |q*_2| ≥ 1/(2·f_max). Applying Lemma 4.4, this time for q*_2, and remembering to count the first turn, gives n* ≤ ⌊4·f_max/f_min⌋ + 1. Moreover, because f_max ≥ f_min, we have that ⌊4·f_max/f_min⌋ + 1 ≤ ⌊8·f_max/f_min⌋, and hence n* ≤ ⌊8·f_max/f_min⌋. ∎

A few remarks are in order. First, the bound on n* is proportional to the ratio f_max/f_min. An f_T with one or more peaks will have a higher f_max/f_min than an f_T possessing little variation, i.e., one that is more uniform over ℛ. Peaks in f_T make it easier to justify turning around and, in this respect, the proportional dependence on f_max/f_min is in line with our intuition. Second, the bound in Theorem 4.6 is, at least in certain cases, rather conservative. To see why, consider that when f_T is uniform, i.e., f_min = f_max = 1/(2π), the only optimal search plans are s_cw and s_ccw.
Hence, for this case, although it is clear that n* = 1, the bound in Theorem 4.6 can only assert that the maximum number of turning times satisfies n* ≤ 8.

4.5 A direct algorithm for finding s*

The iterative nature of Algorithm 4.1 materialized as a consequence of not knowing whether or not the luxury of an extra turning point could reduce E[T_D(s)]. Given the upper bound in Theorem 4.6, we can approach the CPRP from a more informed vantage point. Specifically, we now know that C requires no more than n_ub turns to efficiently find T. Algorithm 4.2, provided below, combines the main results of Proposition 4.2 and Theorem 4.6 to provide a direct means of computing s*. It is prudent to comment on a couple of the finer points of the algorithm. First, minimizing (4.7) in line 2 involves solving a nonlinear, non-convex optimization problem in the decision variables q_1, q_2, ..., q_{n_ub}, subject to constraints (4.2), (4.8), and (4.9). Consequently, the optimization is potentially challenging. Second, the pruning in lines 3 and 4 aims to counteract the potential looseness of the upper bound on n* by removing superfluous turning points, yielding a concise search plan that finds T in the minimum expected time.

Algorithm 4.2: direct algorithm for finding s*
1  let n_ub equal the upper bound in Theorem 4.6;
2  find an s*_{n_ub} ∈ S*_{n_ub} that minimizes (4.7);
3  let n* = min{n ∈ {1, ..., n_ub} : |q_n| + |q_{n+1}| ≥ 2π};
4  let s* equal the first n* turning points in s*_{n_ub}

4.6 Summary of the CPRP

To summarize, the CPRP considered how a cow should search, unencumbered, to find a target on a ring in minimum expected time. We showed that her optimal search plan satisfies (4.2), (4.8), and (4.9). Moreover, this plan requires her to turn around at no more than n_ub points. The exact locations of the turn-around points may be determined by solving a nonlinear optimization problem in the decision variables q_1, q_2, ..., q_{n_ub} and applying the pruning measures outlined in Algorithm 4.2.
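Both computational routes just described can be sketched compactly. The sketch below is illustrative only: `optimal_plan(n)` stands in for the (nonlinear, non-convex) solver that minimizes (4.7) for a fixed number of turns n, a name invented here, and the 2π wrap test plays the role of lines 4-5 of Algorithm 4.1 and line 3 of Algorithm 4.2.

```python
import math

def iterative_search_plan(optimal_plan):
    """Sketch of Algorithm 4.1: grow the number of turning points n until
    the last two turn-around points wrap the ring, i.e.
    |q_{n-1}| + |q_n| >= 2*pi, at which point n* = n - 1 turns suffice."""
    n = 2
    while True:
        q = optimal_plan(n)                  # hypothetical fixed-n solver
        if abs(q[-2]) + abs(q[-1]) >= 2 * math.pi:
            return optimal_plan(n - 1)       # the n*-turn optimal plan
        n += 1

def direct_search_plan(optimal_plan, f_min, f_max):
    """Sketch of Algorithm 4.2: solve once for n_ub = floor(8*f_max/f_min)
    turning points (Theorem 4.6), then prune superfluous turns."""
    n_ub = math.floor(8 * f_max / f_min)
    q = optimal_plan(n_ub)
    n_star = n_ub
    for n in range(1, n_ub):                 # first index whose pair wraps the ring
        if abs(q[n - 1]) + abs(q[n]) >= 2 * math.pi:
            n_star = n
            break
    return q[:n_star]
```

In both routines the cost of each call to `optimal_plan` dominates; the direct route makes a single call, at the price of a larger, n_ub-dimensional optimization.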
Alternatively, an optimal search plan may be found, as in Algorithm 4.1, by finding the optimal search plan associated with a fixed number of turning points and increasing this number by one until no further reductions in the discovery time are possible. We remark that the CPP is infinite-dimensional, and obtaining numeric solutions typically involves using dynamic programming to solve a discretized version of the problem. In contrast, the cyclic topology of ℛ, coupled with the boundedness assumption f_min ≤ f_T(q) ≤ f_max for all q ∈ ℛ, permitted us to bound the maximum number of turns required to solve the CPRP in its native continuous domain using nonlinear optimization techniques. In the next chapter, a second hungry cow is added to ℛ and the search for clover becomes competitive in a game-theoretic sense.

Chapter 5

The Cow-Path Ring Game

This chapter considers the problem in which two or more hungry cows compete to find a clover patch located somewhere on ℛ. Because many elements of this scenario draw heavily from the CPRP, we refer to it as the Cow-Path Ring Game, or simply the CPRG. Introducing and analyzing this problem is the core contribution of the thesis. Moreover, the subsequent chapters draw heavily from this problem and consider variations of the standard framework. Accordingly, we invest in describing the particulars of the encounter and formulating the problem before considering effective search strategies for the cows to use.

5.1 Adding a second cow to the ring

Adding a second hungry cow to ℛ calls for a number of notational extensions and operational clarifications. This section provides the necessary additions. First, the clover is once again referred to as a target and denoted by T. The cows contesting T are identified by an index, 1, ..., n, with cow i denoted by C_i. The set of all cows is given by C = {C_1, C_2, ..., C_n} and the set of all cows excluding C_i by C_{−i} := C \ {C_i}.
The movement and sensing capabilities of C_i in the CPRG carry over directly from the CPRP. The position of C_i at time t ≥ 0 is denoted by q_i(t) ∈ ℛ and the trajectory of C_i over [0, t] by q_{i,t} : [0, t] → ℛ, such that q_{i,t}(τ) = q_i(τ) for all τ ∈ [0, t]. As in the CPRP, C_i has a prior f_T : ℛ → R_{≥0} on T's location, but she does not know T's exact position. Note that, for the time being, it is assumed each cow has the same prior on T.

In general, how C_i searches for T in the CPRG will reflect an awareness of the other hungry cows on ℛ. To emphasize the strategic nature of the encounter, the approach that C_i uses to search for T will be referred to as her search strategy. C_i's search strategy is denoted by s_i and the set of all possible search strategies at her disposal by S_i. The collection of search strategies in a game is given by the search profile s ∈ S = S_1 × ... × S_n. Given this chapter and all future discussions focus on CPRGs, we trust there will be no confusion between this convention and the similar notation used to describe search plans for the CPRP in Chapter 4.

Naturally, how C_i should search when competing to find T depends heavily on her awareness of the environment. As with the CPRP, this includes the prior information, e.g., f_T, she has on the clover's location. However, C_i's search strategy now also depends on the knowledge she has regarding the recent whereabouts of rival cows. The level of awareness a cow has regarding these elements is characterized by her information model of the game. Given the pronounced role this knowledge plays in shaping search strategies, the following section is devoted to specifying the information model of each cow in the CPRG.

5.2 A model for informed cows

In a general n-player game, player i's information model specifies the information available to her throughout the course of the game.
In a continuous-time game, player i has a closed-loop, or feedback, information model if, at each instant t ≥ 0, she has complete knowledge of all previous actions taken by every other player [11]. Conversely, player i has an open-loop information model if, at each instant t ≥ 0, she has no knowledge of the previous actions of any other player. In a feedback game, each player has a feedback information model. Likewise, in an open-loop game, each player has an open-loop information model.

The CPRG we will consider is a continuous-time, closed-loop game. Denoting C_i's information model by I_i, the latter point is emphasized by writing I_i = I_cl, i = 1, ..., n. Because the CPRG centers on searching a continuous workspace and evolves in continuous time, it is worth refining the generic notion of a closed-loop game to more closely reflect the situation at hand. First, stemming from I_i = I_cl, for each t ≥ 0 and for each C_i ∈ C, C_i knows q_{j,t} for all C_j ∈ C \ {C_i}. Of course, it is assumed that C_i also knows her own trajectory, i.e., q_{i,t}. Let

Γ(t) := {q_{i,t} : C_i ∈ C}   (5.1)

denote the set of all cow trajectories taken up to time t ≥ 0. Then, in the CPRG, C_i's knowledge at any instant t ≥ 0 is comprised of the target prior and the previous and current positions of all cows, i.e., I_i(t) = (f_T, Γ(t)).

The feedback nature of the CPRG allows C_i to use high-performance search strategies capable of responding to the actions of her rivals in real time. That is, with Ω = {turn, straight}, or equivalently {cw, ccw}, the set of steering commands available to C_i, C_i's search strategies in the CPRG take the form of mappings from I_i(t) to Ω, i.e.,

S_i = {s_i : s_i(f_T, Γ(t)) → Ω, t ≥ 0}.   (5.2)

Conversely, in an open-loop CPRG, for each instant t ≥ 0 and for each C_i ∈ C, C_i has no knowledge of q_j(t) for any C_j ∈ C \ {C_i}.
An open-loop model effectively corresponds to cows that are extremely nearsighted, such that C_i can gather knowledge of C_j's position, j ≠ i, only in the event C_i and C_j collide while searching. Unsurprisingly, the restricted sensory infrastructure of these encounters necessitates that much of the planning be done offline, in advance. A treatment of open-loop CPRGs can be found in [96]. The results reported therein chronicle an independent analysis of open-loop Cow-Path Games. The nature of those results stands in stark contrast to, and thus complements, the contributions of this thesis, underlining the defining role I_i(t) has in shaping not only the search strategies cows ought to use, but also the analytic techniques used to study these games.

Symbol/Acronym | Meaning/Definition
s_i | the search strategy of C_i
S_i | the set of all of C_i's search strategies
S = S_1 × ... × S_n | the set of search strategy profiles
q_{i,t} | trajectory of C_i over [0, t]
L_i(s) | the landclaim of C_i under s ∈ S
U_i(s) | the utility of C_i under s ∈ S
I_i | information model of C_i
s* = (s*_1, s*_2, ..., s*_n) | an equilibrium search profile in S

Table 5.1: Summary of CPRG-specific notation

5.3 Defining the Cow-Path Ring Game

In this section, we formally define the CPRG. The preceding sections have provided much of the machinery needed to discuss competitive search on ℛ. However, we have yet to update one very important detail: the objective of C_i. A game featuring n cows, but just a single target, necessitates a reexamination of bovine logic. To this end, we assume that, although C_i would prefer to discover T early in the encounter rather than late, it is more important, given the scarcity of clover, that she be the first to find it. In other words, C_i searches to maximize her chance of finding the clover, irrespective of the time in the game at which capture may occur.
Although this thesis will focus primarily on CPRGs featuring two cows, in the interest of generality, we provide a formal statement of the n-cow CPRG.

Definition 5.1 (Cow-Path Ring Game). Consider n hungry cows, C_i for i ∈ {1, ..., n}, initially positioned at q_i(0) ∈ ℛ, respectively. Each C_i knows that a clover patch, T, is located somewhere on ℛ, but has only a prior f_T : ℛ → R_{≥0} (the same for each cow) on T's position. Each C_i can move at unit speed and change directions instantaneously. On account of T's small footprint, C_i can discover T only when standing directly over it. Finally, C_i has a closed-loop information model, i.e., I_i = I_cl, allowing her to track the movement of her rivals in real time. What search strategy should C_i use to maximize her probability of finding T?

A few comments are in order. First, Definition 5.1 is presented in a conversational tone. This approach was favored to retain ties with the CPP and elucidate the simplicity of the formulation. Alternatively, the CPRG may be expressed more compactly by the tuple G_CPRG = (C_i, ℛ, T, I_i, S_i, U_i)_{i=1,...,n}. Here, the meaning of each component is to be understood from the previous discussion. In later chapters, where the formulations become more involved, we will rely on tuple notation to streamline the presentation. Second, we reiterate that the CPRG is a game, not only in the vernacular, but, more importantly, in the game-theoretic sense. Accordingly, we will use game theory to develop strategic algorithms the cows may use to search for T. Third, in any instance of the game, nature determines T's location and, in this way, contributes to the outcome of the game. Table 5.1 provides a summary of the notation used to describe the CPRG.

5.4 A remark about Cow-Path games on the line

Given the CPRG originated as an adversarial re-imagining of the CPP, it is worth commenting on our rationale for studying encounters on a ring, as opposed to a line.
The primary reason for this selection follows from the fact that the majority of the games studied in this thesis involve two cows. As it turns out, a contest between two cows to find a target on a line (finite, i.e., R_[a,b], or infinite, i.e., R) admits a rather uninteresting NE. Specifically, the unique NE of the game takes the form (ŝ, ŝ), where the functionality of ŝ is described in Algorithm 5.1.

Algorithm 5.1: functionality of s_i = ŝ
   input: q_1(0) and q_2(0)
1  align to face C_{−i};
2  while q_i(t) ≠ q_{−i}(t) and q_i(t) ≠ q_{−i}(0) do
3      go straight;
4  turn around;
5  go straight;

Figure 5-1 illustrates how a competitive search game between two cows on a line would resolve under (ŝ, ŝ). Justification that (ŝ, ŝ) ∈ S_NE follows from the fact that any deviation on the part of C_i must involve either surrendering the privilege of visiting territory first or attempting to switch the relative orientation of the cows on the line, i.e., the leftmost cow becomes the rightmost cow. Neither of these options can ever be both appealing to C_i and realizable under s_{−i} = ŝ. What is uneventful about ŝ is that it is entirely independent of f_T and only loosely dependent on q_1(0) and q_2(0). Based on these truths, we chose to study the CPRG because we felt ℛ was the simplest topological workspace that afforded interesting options for strategic play.

Figure 5-1: A visualization of the Cow-Path Line Game. The target density, f_T, is shown in blue. The unique equilibrium search profile, s* = (ŝ, ŝ), is indicated by directed gray lines. Under s*, C_i heads toward C_{−i} and, just before meeting, reverses direction and visits any previously unexplored territory.

5.5 CPRG-specific notation and terminology

We assume that the CPRG ends when each (intelligent) cow can deduce that ℛ has been completely explored and, thus, T has been found. In other words, we are interested in the behavior of the cows only up to, and including, the time at which T is captured.
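Returning briefly to the line game above, its uneventful resolution under (ŝ, ŝ) is easy to compute. A toy sketch, assuming x1 < x2 and unit speeds (the function and parameter names are ours, not thesis notation): the cows meet at the midpoint, so each cow's landclaim is simply her side of it.

```python
def line_game_utilities(f, a, b, x1, x2, n=10_000):
    """Toy outcome of the line game on [a, b] under (s_hat, s_hat):
    the cows head toward each other, turn just before meeting at the
    midpoint m, and sweep outward, so C1's landclaim is [a, m] and C2's
    is [m, b] (taking x1 < x2).  Utilities are midpoint Riemann sums of
    the target density f over each landclaim."""
    m = (x1 + x2) / 2.0
    def mass(lo, hi):
        h = (hi - lo) / n
        return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h
    return mass(a, m), mass(m, b)
```

With a uniform density on [0, 1] and starts at 0.2 and 0.6, the midpoint is 0.4 and the utilities are 0.4 and 0.6: the density plays no role in the strategy, only in the resulting payoffs, which is precisely what makes ŝ uneventful.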
The notation C_i ← T denotes the event that C_i finds T. Recalling the definition of a NE provided in Chapter 3, let S_NE denote the set of NE search profiles and s* ∈ S_NE a NE. Figure 5-2, provided below, gives a visual representation of the CPRG.

Figure 5-2: An instance of the CPRG illustrating the initial positions and initial headings of cows C_1 and C_2. The trajectories of both cows, right up to the point of capture, are shown in dark gray. The target density f_T achieves a global maximum in [−π/4, 0]. In the instance shown, T is located along the north-west portion of ℛ. The site at which T is found, in this case by C_2, is indicated with a red exclamation mark.

The following concepts relate to the territory on ℛ that C_i is the first cow to search, which, in turn, directly affects C_i's chances of finding T. Specifically, define the landclaim of C_i under s ∈ S as

L_i(s) := {q ∈ ℛ : C_i is first to visit q under s}.   (5.3)

The utility that C_i derives from s may then be expressed as

U_i(s) = P(C_i ← T under s) = ∫_{q ∈ L_i(s)} f_T(q) dq.   (5.4)

Intuitively, effective search strategies will, as a function of the initial conditions, guide C_i toward unexplored regions of ℛ where f_T is large. Of course, they will also factor in where rival cows are located. The notation and terminology introduced thus far will allow us to speak concisely and definitively when analyzing competitive search games on rings.

5.6 Search strategies in the CPRG

Having defined the CPRG and introduced a suite of notation and terminology, we are finally in a position to consider search strategies for the cows. We begin by qualitatively describing a collection of possible maneuvers that may be integrated into the search activities of C_i. In so doing, the intent is to provide evidence for why CPRGs are challenging objects of study and to use the insight gained to guide our analysis.
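For intuition about (5.3) and (5.4), the landclaims and utilities of a particular profile can be computed numerically. Below is a minimal sketch for the no-turn profile (pass(2π), pass(2π)) with the cows heading toward one another, assuming unit speeds; the tie-break rule and all names are our own conveniences, not thesis notation.

```python
import math

TWO_PI = 2 * math.pi

def never_turn_utility(f, q1, q2, bins=4000):
    """Illustration of (5.3)-(5.4) for the no-turn, head-on profile:
    C1 travels ccw from q1 and C2 travels cw from q2, so each point q
    belongs to the landclaim of whichever cow reaches it first.  The
    utilities are Riemann sums of f over the two landclaims.  Exact
    ties are awarded to C2 purely as a modelling convenience."""
    w = TWO_PI / bins
    u1 = u2 = 0.0
    for b in range(bins):
        q = (b + 0.5) * w
        t1 = (q - q1) % TWO_PI        # C1's ccw arrival time at q (unit speed)
        t2 = (q2 - q) % TWO_PI        # C2's cw arrival time at q
        if t1 < t2:
            u1 += f(q) * w
        else:
            u2 += f(q) * w
    return u1, u2
```

With a uniform prior and antipodal starting points the ring splits evenly; concentrating f_T just ahead of one cow's frontier tilts the utilities in her favor, which is exactly the effect the strategic analysis below must weigh.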
In this respect, we will focus on CPRGs featuring two cows: C_1 and C_2.

To gain an appreciation for the types of complex decisions C_1 and C_2 face in the CPRG, we return to the scenario in Figure 5-2. In particular, focus on C_1 and the arc ℛ' ⊂ ℛ described by ℛ' = {q ∈ ℛ : −π/4 ≤ q ≤ 0}. Clearly, f_T achieves its global maximum on ℛ', and ℛ' is a region each cow aspires to visit first. We are tempted to ask: should C_1 explore this region immediately, or is she better off setting it aside for "safe keeping" and returning to it later? The answer, of course, is highly contingent on C_2's strategy. For example, C_2 could threaten, and perhaps occasionally follow through with, raids into territories that C_1 values highly and has her own aspirations of searching first, e.g., ℛ'. In the next section, we consider a restricted version of the CPRG that is more amenable to a first analysis of the underlying strategic dynamics at play.

5.7 The one-turn, two-cow CPRG

This section considers the class of feedback CPRGs in which each cow may turn at most once. At first glance, this stipulation may appear overly restrictive. It turns out, however, that there is much to be gained by analyzing precisely this class of game. Moreover, in the next section, we will consider games in which the cows may turn up to a finite number of times. The exposition of optimal strategies in these finite-turn games is greatly expedited by the forthcoming analysis of one-turn games. Formally, the feedback, one-turn, two-cow CPRG, or simply the 1T-CPRG, is defined as follows.

Definition 5.2 (One-Turn Cow-Path Ring Game). The 1T-CPRG is a special case of the CPRG that features the following amendments:

(1) n = 2, i.e., the game involves two cows, referred to as C_1 and C_2;

(2) define Φ = {cw, ccw} to be the set of directions in which a cow can travel on ℛ; C_i, i = 1, 2, has initial heading φ_i(0) ∈ Φ;
(3) C_i, i = 1, 2, may turn at most once during the game, where a turn consists of C_i changing her heading from cw to ccw, or from ccw to cw.

In analyzing the 1T-CPRG, it proves advantageous to work in terms of turning times, rather than turn-around points. Given that the cows travel at unit speed, each of these quantities may be readily determined from the other given the initial condition q_i(0). Two specific classes of player strategies, denoted s_pass and s_reac, feature prominently in the analysis that follows. We describe the finer points of each at this time.

Definition 5.3 (always-turn passive search strategies). Let S_{i,pass} ⊂ S_i denote the set of strategies that mandate C_i turn around at a specific time, irrespective of C_{−i}'s past positions and current location, i.e., independent of q_{−i,t}. Specifically,

S_{i,pass} = {s ∈ S_i : s = pass(t) for some t ∈ [0, 2π]},   (5.5)

where

(s_i = pass(t)) ⟺ C_i turns around at time t, irrespective of q_{−i,t}.   (5.6)

In (5.6), pass(t) reads "always turn at time t". The stipulation that t ∈ [0, 2π] for pass(t) ∈ S_{i,pass} stems from C_i's selfish nature; namely, she has no good reason for turning around if she has already circumnavigated ℛ. We remark that, within the confines of this notational system, pass(2π) is the strategy in which C_i never turns. In this case, C_i's initial heading, φ_i(0), determines the direction she travels around ℛ.

While strategies in S_{i,pass} essentially ignore the actions of C_{−i}, the strategies in S_{i,reac} make use of the game-theoretic concept of best response. The specification of S_{i,reac} requires a few additional notions. Let I_{i≻−i}(s_i, s_{−i}) be the indicator function defined by

I_{i≻−i}(s_i, s_{−i}) = { 1 if C_i turns no sooner than C_{−i} under s = (s_i, s_{−i}); 0 otherwise }.

Also, let S_{i,i≻−i}(s_{−i}) = {s ∈ S_i : I_{i≻−i}(s, s_{−i}) = 1}. For the CPRG, define the follower best-response of C_i to s_{−i} ∈ S_{−i} as
BR_f^i(s_{−i}) = {s ∈ S_{i,i≻−i}(s_{−i}) : U_i(s, s_{−i}) ≥ U_i(s', s_{−i}) for all s' ∈ S_{i,i≻−i}(s_{−i})}.   (5.7)

Definition 5.4 (eventually-turn reactive search strategies). Let S_{i,reac} ⊂ S_i denote the set of reactive strategies in which C_i turns by no later than a specific time, but may turn earlier depending on C_{−i}'s actions, i.e., dependent on q_{−i,t}. Specifically,

S_{i,reac} = {s ∈ S_i : s = reac(t) for some t ∈ [0, 2π]},

where

(s_i = reac(t)) ⟺ { C_i turns according to a strategy in BR_f^i(pass(τ)) if C_{−i} turns at some τ < t; C_i turns at t otherwise }.   (5.8)

In (5.8), reac(t) reads "eventually turn by time t".

Having defined S_pass and S_reac, we are now in a position to begin characterizing an equilibrium of the 1T-CPRG. To begin, consider the profile s = (s_1, s_2), where s_1 = pass(2π) and s_2 = pass(2π), i.e., the profile in which neither cow turns. If neither cow can improve her utility by more than ε by unilaterally deviating from s, then s is an ε-NE of the 1T-CPRG. On the other hand, if one of the cows, say C_i, has an incentive to deviate, then C_i must favor turning at some time t ∈ [0, 2π) over her current approach. Moreover, if C_i could guarantee that she will turn no later than C_{−i}, i.e., that I_{−i≻i} = 1, then her optimal turning time would be given by

t_{i,1} = min argmax_{t_i ∈ [0, 2π]} U_i( pass(t_i), pass(argmax_{t_i ≤ t_{−i} ≤ 2π} U_{−i}(pass(t_i), pass(t_{−i}))) ),   (5.9)

and C_{−i}'s follower best-response would be to turn at time

t_{−i,1} = min argmax_{t_{i,1} ≤ t_{−i} ≤ 2π} U_{−i}(pass(t_{i,1}), pass(t_{−i})).   (5.10)

Note that knowledge of f_T and the assumption that cows are intelligent permit C_i and C_{−i} to compute (5.9) and (5.10), respectively. The feedback nature of the game, i.e., I_i = I_cl, i = 1, 2, allows each C_i to implement the desired search strategies.

For any s ∈ S, T will eventually be found by a cow, implying U_1(s) + U_2(s) = 1. For this reason, the 1T-CPRG is said to be a constant-sum game, such that if shifting strategy profiles, say from s to s', causes U_i to increase by u > 0, then it necessarily also causes U_{−i} to decrease by u.
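The nested optimization in (5.9)-(5.10) can be approximated by a grid search over candidate turning times. The sketch below casts C1 as the cow that turns first and treats `U1`, `U2` as black-box oracles for the utilities of the profile (pass(t1), pass(t2)); everything here is an illustrative stand-in for the continuous problem, not thesis code.

```python
def leader_follower_turn_times(U1, U2, grid):
    """Grid-search sketch of (5.9)-(5.10), with C1 cast as the leader.
    U1(t1, t2) and U2(t1, t2) are hypothetical utility oracles for the
    profile (pass(t1), pass(t2)) with t1 <= t2.  For each candidate
    leader time t1, the follower best-responds over t2 >= t1; the leader
    then picks the t1 maximizing her resulting utility.  Since max()
    returns the first maximizer of an ascending grid, ties resolve to
    the earliest time, mirroring the 'min argmax' convention."""
    def follower_best(t1):
        return max((t2 for t2 in grid if t2 >= t1), key=lambda t2: U2(t1, t2))
    t1_star = max(grid, key=lambda t1: U1(t1, follower_best(t1)))
    return t1_star, follower_best(t1_star)
```

For a toy oracle in which the leader's payoff peaks at an interior time and the follower gains nothing by waiting, the search recovers that interior leader time with the follower matching it.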
From this perspective, (5.9) is reminiscent of C_i using a maximin strategy: an approach aimed at safeguarding her utility from C_{−i}. (5.9) is, however, not a true maximin strategy, because the inner argmax is taken over a restricted set of times, namely those that ensure C_{−i} turns no sooner than C_i, rather than over the set of all turning times, i.e., [0, 2π].

Although (5.9) ensures C_i is doing the best she can to find T, given the privilege of turning no later than her rival, there remains the question of whether or not C_{−i} would be content turning second. If the answer is yes, then

s = (s_i, s_{−i}) = (reac(t_{i,1}), reac(2π))   (5.11)

is an ε-NE. If the answer is no, then C_{−i}'s optimal deviation is to turn at time

t_{−i,2} = min argmax_{0 ≤ t_{−i} ≤ t_{i,1}} U_{−i}( pass(argmax_{t_{−i} ≤ t_i ≤ 2π} U_i(pass(t_i), pass(t_{−i}))), pass(t_{−i}) ),   (5.12)

yielding the profile (reac(t_{i,1}), reac(t_{−i,2})), in which C_i will turn, no sooner than C_{−i}, at a time given by

t_{i,2} = min argmax_{t_{−i,2} ≤ t_i ≤ 2π} U_i(pass(t_i), pass(t_{−i,2})).   (5.13)

We are then left to ponder whether C_{−i} is content turning no sooner than C_i in the profile (reac(t_{i,1}), reac(t_{−i,2})), or whether she is partial to a unilateral deviation in which she would turn first, and so on. To this end, let T = {t_{i,1}, t_{−i,2}, t_{i,3}, ...} be the sequence of first turning times generated by this procession of one-upmanship. As a notational shorthand, we define

H_{−i}(t_{−i}) = U_{−i}( pass(argmax_{t_{−i} ≤ t' ≤ 2π} U_i(pass(t'), pass(t_{−i}))), pass(t_{−i}) ).   (5.14)

Algorithm 5.2, provided below, keeps track of T and the strategies the cows adopt while jockeying for position using this back-and-forth mechanism in the 1T-CPRG. We now describe properties of Algorithm 5.2 that speak to two important issues: (1) the algorithm's termination, and (2) which cow or cows, if any, have an incentive to unilaterally deviate from the strategy profiles prescribed at various stages.

Proposition 5.1. Consider applying Algorithm 5.2 to an instance of the 1T-CPRG.
Let {a, b, c} be a subsequence of three consecutive turning times in T. Assume s_1 = reac(a) and s_2 = reac(b) are two of the associated strategies prescribed by the algorithm. Then C_1 is the only cow with a unilateral incentive to deviate from the search profile s = (s_1, s_2), and her only profitable deviations involve preemptively turning no later than C_2.

Proof. To begin, the logic of Algorithm 5.2 implies that T is non-increasing and a ≥ b ≥ c. Now consider the strategies s_1 = reac(a) and s_2 = reac(b), and the profile s = (s_1, s_2). Note that in s, C_2 turns no later than C_1. Because c immediately follows b in T, the clause on line 13 of Algorithm 5.2 must fail for s, indicating C_1 can increase her utility (by more than ε) by playing reac(c) instead of reac(a), such that C_1 turns no later than C_2 in (reac(c), reac(b)) ∈ S. To establish that C_1 must turn no later than C_2 to improve her utility, note that in s, C_1 is already playing her best response to C_2 turning at time b, implying the absence of any profitable unilateral deviations in which C_1 remains the second cow to turn.

Algorithm 5.2: determine ε-NE search profile
 1  s_1 = pass(2π), s_2 = pass(2π);
 2  if (s_1, s_2) ∈ S_{ε-NE} then
 3      break
 4  i ← index of a cow with an incentive to deviate from (s_1, s_2);
 5  T = ∅;
 6  k = 1;
 7  t_{i,k} = min argmax_{t_i ∈ [0,2π]} U_i(pass(t_i), pass(argmax_{t_i ≤ t_{−i} ≤ 2π} U_{−i}(pass(t_i), pass(t_{−i}))));
 8  s_i = reac(t_{i,k});
 9  t_{−i,k} = min argmax_{t_{i,k} ≤ t_{−i} ≤ 2π} U_{−i}(pass(t_{i,k}), pass(t_{−i}));
10  s_{−i} = reac(t_{−i,k});
11  T ← {t_{i,k}};
12  while no ε-NE established do
13      if U_{−i}(pass(t_{i,k}), pass(t_{−i,k})) + ε ≥ max_{0 ≤ t_{−i} ≤ t_{i,k}} H_{−i}(t_{−i}) then
14          break
15      else
16          t_{−i,k+1} ← min argmax_{0 ≤ t_{−i} ≤ t_{i,k}} H_{−i}(t_{−i});
17          s_{−i} ← reac(t_{−i,k+1});
18          t_{i,k+1} ← t_{i,k};
19          k ← k + 1;
20          T ← {T, t_{−i,k}};
21          i ← −i;
22  return profile (s_1, s_2)

Now consider C_2, the first cow to turn in s.
Using similar reasoning, we conclude that, since b immediately follows a in T, C_2 must prefer turning no later than C_1, at or before time a, rather than responding to C_1 turning at a. Therefore, C_2 has no incentive to deviate from s to a strategy in which she responds to C_1 turning at time a. Furthermore, since the strategy s_2 = reac(b) was selected using the assignments in lines 16 and 17 of Algorithm 5.2, C_2 selects her turning time optimally over [0, a], implying there are no profitable deviations that involve turning no later than C_1 in [0, a]. We conclude that C_1 is the only cow with a unilateral incentive to deviate from s, and any profitable deviations involve C_1 turning no later than C_2. ∎

From Proposition 5.1, Algorithm 5.2 assigns s_1 and s_2 such that it is only ever a cow that turns no sooner than her rival in (s_1, s_2) that, by preferring to turn no later than her rival, could have a unilateral incentive to deviate. This realization begs the question: can this succession of one-upmanship continue indefinitely? The following proposition asserts that, for a large class of target densities, the answer is no.

Proposition 5.2. Let f_T be a bounded target density satisfying f_T(q) ≤ M, for finite M > 0, for all q ∈ ℛ. For any combination of initial cow positions q_i(0) and initial cow headings φ_i(0), i = 1, 2, and any ε > 0, T is finite, i.e., |T| < ∞.

Proof. Assume, to obtain a contradiction, that there exist initial positions and headings, q_i(0) and φ_i(0), i = 1, 2, respectively, such that Algorithm 5.2 fails to terminate, i.e., T is infinite. Let T = {t_1, t_2, t_3, ...}. From Algorithm 5.2, T is a non-increasing sequence. Moreover, because, by the rules of the game, the cows cannot turn before time zero (when the game starts), T is non-negative. It follows that T must approach a limiting value v ≥ 0, and for any δ > 0, there exists a sufficiently large n_0(δ) ∈ N such that 0 ≤ t_n − t_{n+1} ≤ δ for all n ≥ n_0(δ).
For δ > 0, let a ≥ b ≥ c be three consecutive elements of T such that 0 ≤ a − b ≤ δ and 0 ≤ b − c ≤ δ. From the assumption that T is infinite, such times are guaranteed to exist.

Now consider the following search profiles:

s1 = (reac(a), reac(b)),   (5.15)
s2 = (pass(a), reac(b)),   (5.16)
s3 = (reac(c), reac(b)),   (5.17)
s4 = (reac(c), pass(b)).   (5.18)

In s1, C_1 best-responds to C_2 turning first at time b. Since a ≥ b, U_1(s1) ≥ U_1(s2), because the option of turning at time a is considered when best responding to s_2 = reac(b). Moreover, because c immediately follows b in T, it must be that U_1(s3) > U_1(s1) + ε and, subsequently, that U_1(s3) > U_1(s2) + ε. Now consider the search profile s4, in which C_2 responds to C_1 turning first at time c by turning at time b. Because C_2 best-responds to C_1 turning first at time c in s3, we have that

U_2(s3) ≥ U_2(s4).   (5.19)

Since the game is constant-sum, the preceding inequalities imply that

U_1(s3) > U_1(s2) + ε   (5.20)
⟹ 1 − U_2(s3) > 1 − U_2(s2) + ε   (5.21)
⟹ U_2(s2) > U_2(s3) + ε   (5.22)
⟹ U_2(s2) > U_2(s4) + ε.   (5.23)

The inequality in (5.23) indicates that the difference in utility C_2 sees between s2 and s4 strictly exceeds ε. However, because f_T is bounded, 0 ≤ a − b ≤ δ, and 0 ≤ b − c ≤ δ, we also have that

|U_2(s2) − U_2(s4)| = | ∫_{L_2(s2)} f_T(q) dq − ∫_{L_2(s4)} f_T(q) dq |.   (5.24)

Define

L_2(s2) Δ L_2(s4) := {q ∈ ℛ : q ∈ L_2(s2) ∪ L_2(s4), q ∉ L_2(s2) ∩ L_2(s4)},   (5.25)

the symmetric difference of L_2(s2) and L_2(s4), i.e., the collection of all elements that are in L_2(s2) or L_2(s4), but not both. We then have

|U_2(s2) − U_2(s4)| ≤ ∫_{L_2(s2) Δ L_2(s4)} f_T(q) dq   (5.27)
≤ (max_{q ∈ ℛ} f_T(q)) ∫_{L_2(s2) Δ L_2(s4)} dq   (5.28)
≤ 4δM.   (5.29)

The transition from (5.28) to (5.29) follows from three observations. First, C_2 turns at b in both s2 and s4. Second, C_1 turns at most 2δ earlier in s4 than in s2; hence, accounting for backtracking, the integral in (5.28) is less than or equal to 4δ. Third, and finally, f_T is, by assumption, bounded everywhere on ℛ by M.
By choosing δ such that 0 < δ < ε/(4M), we have, from (5.27)-(5.29), that

|U_2(s2) − U_2(s4)| ≤ ε,   (5.30)

which contradicts (5.23), thereby refuting the initial assumption and establishing that T is indeed a finite sequence. ∎

Combining Propositions 5.1 and 5.2 gives the following result.

Theorem 5.5. Consider the 1T-CPRG with bounded target density f_T. For any ε > 0, the profile (s_1, s_2) produced as output from Algorithm 5.2 is an ε-NE of the game.

Proof. For any ε > 0, Proposition 5.2 ensures that Algorithm 5.2 terminates and T is finite. Let the search strategies of C_1 and C_2 that emerge from the algorithm be s_1 and s_2, respectively, and let s = (s_1, s_2). Let i ∈ {1, 2} be the index of a cow that turns no later than her rival in s. From Proposition 5.1, C_i is the only cow that could potentially have an incentive to unilaterally deviate from s. However, for this to be the case, T would have to contain at least one more element than it actually does, which is a contradiction. Therefore, neither C_1 nor C_2 has an incentive to unilaterally deviate from s, implying s is an ε-NE. ∎

The fact that the 1T-CPRG admits an ε-NE for any ε > 0 allows us to comment on the existence of general NE.

Theorem 5.6. A 1T-CPRG with bounded f_T has a NE.

Proof. Assume, to obtain a contradiction, that the 1T-CPRG does not have a NE. Then, for any strategy profile s̄, one of the cows, say C_i, can improve her utility by some amount ε > 0 by unilaterally deviating from s̄_i, i.e., the strategy she employs in s̄. However, from Theorem 5.5, for precisely this value of ε, there exists a profile s* such that s* is an ε-NE of the 1T-CPRG. This fact contradicts the initial assumption and, hence, establishes that the 1T-CPRG does indeed have a NE. ∎

We remark that although Theorem 5.6 guarantees the existence of a NE, it does not provide a direct algorithm to compute such an equilibrium.
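The back-and-forth mechanism of Algorithm 5.2 can be mimicked on a grid of turning times. In the toy sketch below, `U1(t1, t2)` is a hypothetical oracle for C1's utility under (pass(t1), pass(t2)), U2 = 1 − U1 by the constant-sum property, and the round cap is merely a safeguard that Proposition 5.2 renders unnecessary for bounded densities; none of the names are thesis notation.

```python
def one_upmanship(U1, grid, eps):
    """Sketch of the jockeying loop in Algorithm 5.2.  The current leader's
    turning time is repeatedly undercut by her rival whenever preempting
    (turning first) beats following by more than eps.  Returns the sequence
    T of successive first turning times."""
    U = {1: U1, 2: lambda t2, t1: 1.0 - U1(t1, t2)}   # constant-sum game

    def follow(j, t_lead):
        # Cow j's best response turning no earlier than the leader's time.
        return max((t for t in grid if t >= t_lead),
                   key=lambda t: U[j](t, t_lead))

    def preempt(j, t_lead):
        # Cow j's best time to turn first (at or before t_lead), anticipating
        # that her rival will then best-respond as the follower.
        return max((t for t in grid if t <= t_lead),
                   key=lambda t: U[j](t, follow(3 - j, t)))

    i, t = 1, preempt(1, grid[-1])   # C1 opens by claiming the first turn
    T = [t]
    for _ in range(10_000):          # finite by Proposition 5.2 for bounded f_T
        j = 3 - i
        t_pre = preempt(j, t)
        if U[j](t_pre, follow(i, t_pre)) <= U[j](follow(j, t), t) + eps:
            return T                 # rival is content to follow: eps-NE reached
        i, t = j, t_pre
        T.append(t)
    raise RuntimeError("no eps-NE reached; oracle may violate boundedness")
```

For a featureless oracle in which each cow simply prefers to turn later, the loop stops immediately with a single entry at the last grid time (the pass(2π) "never turn" strategy), mirroring the opening check of Algorithm 5.2.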
However, an approximation of arbitrary accuracy can be obtained by running Algorithm 5.2 with a sufficiently small ε > 0, rendering the point moot for all practical intents and purposes. Shortly, we will parlay our understanding of the 1T-CPRG into a methodology to solve CPRGs in which the cows may turn up to a finite number of times. In the interim, we take the opportunity to discuss two practical considerations.

5.8 1T-CPRG: computational considerations

This section provides a brief discussion of some of the computational issues associated with the 1T-CPRG and, specifically, Algorithm 5.2, where a number of maximizations over potential turning points must be calculated. Fortunately, the difficulty in evaluating these expressions is eased by the circular geometry of R and the fact that both cows travel at the same speed. For example, assume, as in Figure 5-2, that C_1 and C_2 are initially heading toward one another. In this setting, if t_1 = t_2, then the landclaims L_1 and L_2 from (5.3) may be readily calculated from symmetry. However, should C_1 unilaterally deviate and turn, instead, at time t_1 + Δt, Δt > 0, then C_1's other frontier is eroded by an amount Δt. This realization streamlines the process of calculating deviations in landclaims which, in turn, alleviates some of the difficulties in computing the associated utilities. In the event the cows are chasing each other (e.g., both cows initially have a cw heading), the strategy of the cow that turns second is simple: turn the instant before meeting the other cow.

5.9 The 1T-CPRG for different cow speeds

In the CPRG, it was assumed C_1 and C_2 each travel at unit speed. It is worth remarking that the analysis in this chapter still applies in the case where C_1 and C_2 have speeds v_1 and v_2, respectively, with v_1 ≠ v_2.
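Both the equal-speed symmetry arguments of Section 5.8 and the unequal-speed generalization can be explored numerically by computing first-visit times on a discretized ring. The sketch below is purely illustrative; the initial conditions, the Euler-style discretization, and all function names are choices of this example rather than anything prescribed by the analysis:

```python
import math

TWO_PI = 2 * math.pi

def first_visit_times(q0, heading, speed, turn_times, horizon, n=2000, dt=1e-3):
    """First time each of n ring samples is visited by a cow.

    The cow starts at angle q0 on a ring of circumference 2*pi, moves at
    the given speed in direction heading (+1 = ccw, -1 = cw), and reverses
    direction at each time in turn_times.
    """
    visit = [math.inf] * n
    q, h, t, k = q0, heading, 0.0, 0
    while t <= horizon:
        idx = int((q % TWO_PI) / TWO_PI * n) % n
        visit[idx] = min(visit[idx], t)
        if k < len(turn_times) and t >= turn_times[k]:
            h, k = -h, k + 1
        q += h * speed * dt
        t += dt
    return visit

def landclaim_sizes(v1, v2):
    """Number of ring samples claimed (strictly visited first) by each cow."""
    c1 = sum(1 for a, b in zip(v1, v2) if a < b)
    c2 = sum(1 for a, b in zip(v1, v2) if b < a)
    return c1, c2

# Head-on, diametrically opposed, unit-speed cows turning at the same time:
# by the symmetry noted in Section 5.8, the two landclaims have
# (essentially) equal size, up to discretization error.
v1 = first_visit_times(0.0, +1, 1.0, [1.0], horizon=TWO_PI)
v2 = first_visit_times(math.pi, -1, 1.0, [1.0], horizon=TWO_PI)
c1, c2 = landclaim_sizes(v1, v2)
```

Rerunning the same sketch with unequal speeds illustrates how the claim boundaries shift with the ratio v_1/v_2, in line with the remark above.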
Assuming C_i knows v_{-i} (which is reasonable given she knows q_{-i,t}), the only change v_1 ≠ v_2 introduces is in computing the landclaims associated with specific turning times, which feature in the various optimizations in Algorithm 5.2. Fortunately, the arguments in Section 5.8 are easily amended by continuing to leverage the circular symmetry of R and the ratio v_1/v_2.

5.10 Finite-turn CPRGs

In the preceding analysis, we assumed C_1 and C_2 may turn at most once. This is a rather severe limitation to impose on the hungry cows. In this section, we study CPRGs in which C_1 and C_2 may turn up to a (pre-specified) finite number of times. In this regard, the (n_1, n_2)-CPRG is equivalent to the 1T-CPRG defined in Definition 5.2, except for the important distinction that C_i may now turn up to n_i times. More specifically, let

G = G_CPRG(f^T, q_i(0), φ_i(0), n_i)_{i=1,2}   (5.31)

denote the CPRG that has target density f^T and in which C_i, with initial condition (q_i(0), φ_i(0)) ∈ R × Φ, may turn up to n_i ∈ Z_{≥0} times. Let s ∈ S be the search profile in which C_j turns no later than her rival, at a time t_j ∈ [0, 2π), in G. Also, let L(s, t_j) = L_1(s, t_j) ∪ L_2(s, t_j) ⊆ R denote the set of all points visited by at least one cow over [0, t_j] under s. The decisions the cows face in the remainder of the game, i.e., the game unfolding for t > t_j, are precisely those captured by the game

G̃ = G_CPRG(f̃^T, q_i(t_j), φ_i(t_j), ñ_i)_{i=1,2},   (5.32)

where, for q ∈ R,

f̃^T(q) = 0 if q ∈ L(s, t_j), and f̃^T(q) = f^T(q) otherwise,   (5.33)

and

ñ_i = n_i − 1 if i = j, and ñ_i = n_i otherwise.   (5.34)

A remark is in order as it relates to f̃^T in G̃. Equation (5.33) implies f̃^T is deficient, i.e., ∫_R f̃^T(q) dq < 1, if, by time t_j, the cows have, collectively, visited a subset of R in the support of f^T. However, with respect to maximizing (5.4), i.e., C_i's probability of capturing T, we see that what is important to C_i is the value of ∫_{L_i(s)} f^T(q) dq, where L_i(s) is the landclaim acquired by C_i at the end of the game.
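As a small illustration of (5.33), the reduced density is obtained by zeroing the prior over the visited set; the resulting mass deficit is exactly the density already swept up. The uniform prior, grid resolution, and quarter-ring visited set below are assumptions of this sketch, not part of the formulation:

```python
import math

def reduced_density(f, visited):
    """Eq. (5.33): zero out the target density over the visited set.

    f: density values on a uniform grid over [0, 2*pi);
    visited: boolean list of the same length, True where some cow has
    already been.  Returns the (possibly deficient) reduced density.
    """
    return [0.0 if v else fi for fi, v in zip(f, visited)]

def mass(f, n):
    # Riemann-sum approximation of the integral of f over the ring.
    return sum(f) * (2 * math.pi / n)

n = 1000
f = [1.0 / (2 * math.pi)] * n             # uniform prior, integrates to 1
visited = [i < n // 4 for i in range(n)]  # cows have swept a quarter of R
f_red = reduced_density(f, visited)
# f_red is deficient: its total mass is 0.75 rather than 1, the missing
# 0.25 being the density the cows have already accrued.
```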
From this perspective, we can think of the cows as accruing density throughout the search, an interpretation that is well defined even in the case of deficient target densities. Henceforth, when we refer to a game using the notation above, it will be with the tacit understanding that this interpretation is in place. We are now in a position to address when C_i should turn in the (n_1, n_2)-CPRG. Starting from G, the optimal time for C_i to turn is given by

t_i* = min argmax_{t_i ∈ [0, 2π]} { h(t_i) }, where   (5.35)

h(t_i) = ∫_{L_i(t_i)} f^T(q) dq + Ũ_i*,   (5.36)

with L_i(t_i) the landclaim C_i acquires in [0, t_i], and Ũ_i* the optimal utility that C_i can acquire in G̃ = G_CPRG(f̃^T, q_j(t_i), φ_j(t_i), ñ_j)_{j=1,2}. Therefore, in scheduling her turns, C_i considers not only the density she acquires prior to turning, but also the density gathered in the equilibria associated with the resultant game. In this way, we can think of the games as reducing to simpler games, in the sense that there are fewer overall turns that can be made, each time a cow turns. This viewpoint is depicted in Figure 5-3. To solve (5.35)–(5.36) using dynamic programming, it is necessary to first solve the CPRG for the relevant base-case scenarios, i.e., the family of games shaded in red in Figure 5-3. Having studied the 1T-CPRG in Section 5.7, the remaining base cases are those in which one cow, say C_i, may turn n_i ≥ 2 times, and the other cow, C_{-i}, has expended all of her turns, i.e., the (n_1, 0)-CPRGs and (0, n_2)-CPRGs, for n_1 ≥ 2 and n_2 ≥ 2, respectively. Any general (n_1, n_2)-CPRG will degenerate to one of these base-case games once the cows have made a sufficient number of turns. Fortunately, characterizing equilibrium play in the (n_i, n_{-i})-CPRG for n_i ≥ 2 and n_{-i} = 0 is straightforward. In these encounters, C_{-i} has expended her turn budget and is rendered strategically inert.
However, C_i's best strategy is to first orient herself so that she is traveling toward C_{-i}, i.e., φ_i = Φ \ {φ_{-i}}, as if to set up a head-on meeting. Establishing this alignment takes at most one turn (in the event C_i is already on a collision course with C_{-i} after C_{-i} makes her last turn, no turn is required to establish the necessary heading). Subsequently, C_i proceeds to travel toward C_{-i} before turning the instant before she would collide with C_{-i}. This last turn ensures C_i is positioned to capture any unclaimed density still remaining on R, and brings the number of turns made by C_i, since C_{-i} exhausted her last turn, to no more than two. For completeness, this search strategy is outlined in Algorithm 5.3.

Algorithm 5.3: functionality of s_i
1: φ_i ← Φ \ {φ_{-i}(0)};
2: δ ← 0⁺;
3: while D(q_i, q_{-i}) > δ do
4:   go straight;
5: turn around;
6: go straight;

We remark that although finite-turn CPRGs degenerate to more manageable CPRGs as the cows turn, the approach suggested by Figure 5-3 requires families of games to be solved for a variety of initial cow headings and positions. Therefore, employing a dynamic-programming-based approach to study finite-turn CPRGs may exact a rather steep computational burden. Nevertheless, it is reassuring to know there exists a well-developed methodology to address finite-turn CPRGs at a theoretical level, and that for CPRGs permitting a modest number of turns, there is a practical approach to numerically compute equilibrium search profiles in a continuous workspace.

5.11 Summary of the CPRG

This chapter introduced the Cow-Path Ring Game as a means to study strategic decision-making in systems where multiple self-interested, mobile agents compete to find a target on a ring. Salient features of the CPRG included the fact that each cow had minimal sensing capabilities and limited prior knowledge of the target's location.
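The turn-budget reduction just summarized, and depicted in Figure 5-3, can be sketched as a small recursion. The sketch below only enumerates the families of games a given (n_1, n_2)-CPRG can reduce to; it is an illustrative skeleton of the bookkeeping, not the continuous-time solver, and the function names are ours:

```python
def is_base_case(n1, n2):
    """Base-case families of Figure 5-3: the 1T-CPRG (each cow may turn
    at most once) and the one-sided games in which one turn budget is 0."""
    return (n1 <= 1 and n2 <= 1) or n1 == 0 or n2 == 0

def reachable_families(n1, n2):
    """All game families the (n1, n2)-CPRG can reduce to as turns are expended."""
    seen = set()
    def visit(a, b):
        if (a, b) in seen:
            return
        seen.add((a, b))
        if is_base_case(a, b):
            return
        visit(a - 1, b)  # C1 turns: (a, b) -> (a - 1, b)
        visit(a, b - 1)  # C2 turns: (a, b) -> (a, b - 1)
    visit(n1, n2)
    return seen

# E.g., the (2, 2)-CPRG becomes an instance of the (1, 2)-CPRG when C1
# turns and of the (2, 1)-CPRG when C2 turns, as in Figure 5-3.
fams = reachable_families(3, 2)
```

A full dynamic-programming solver would attach to each family an optimization of the form (5.35)–(5.36) over a grid of positions and headings, which is the source of the computational burden noted above.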
Reminiscent of probabilistic search problems, each cow used prior knowledge of the target's position to structure her search. However, owing to the self-interested nature of the participants, each cow's search strategy also factored in the movement and ambitions of rival cows, making her search strategic in a game-theoretic sense.

Figure 5-3: A diagram showing associations between families of finite-turn CPRGs. The node labelled with the pair (i, j) denotes the family of games in which C_1 and C_2 may turn up to i and j times, respectively. The numbers above and beside the arrows indicate which cow turns to bring about the indicated transition. The nodes representing base-case games, for which equilibrium strategies may be found using the methods discussed in previous sections, are colored in red. The nodes representing all other games are colored in gray. The arrows indicate how one family of games reduces to a simpler family of games when a cow turns. For example, the (2,2)-CPRG becomes an instance of the (1,2)-CPRG when C_1 turns, and an instance of the (2,1)-CPRG when C_2 turns.

Given the inaugural nature of the work, our analysis focused on CPRGs involving two cows. Because of the strategic options available to each cow, we argued that it was challenging to determine both the number of times and the locations at which each cow would turn around in an equilibrium search profile. On account of these difficulties, we considered the 1T-CPRG and, through an iterative algorithm, showed that any such game with a bounded target density admits a NE. This analysis was extended, using a dynamic programming framework, to address games in which each cow may turn up to a pre-specified finite number of times.
By re-envisioning the task of capturing the target as the equivalent goal of maximizing the accumulation of target density, successive turns transition the game into simpler CPRGs that, ultimately, reduce to an instance of either the 1T-CPRG or a simple game in which only one cow has strategic options. By focusing on CPRGs that take place on a ring and involve two cows, we inevitably introduced many avenues with which to extend the basic framework laid out in this chapter. For example, it remains to provide a full treatment of search games involving n ≥ 3 cows on the ring. Alternatively, it is likely the case that insight can be gained from studying competitive search games that unfold in alternate environments, for example, on a graph or within a polygon. Although we consider these to be perfectly valid pursuits, and go on to elaborate on them and other future work items in the final chapter, we do not pursue them explicitly in this thesis. Rather, the next chapter considers variants of the CPRG with a focus on characterizing games with asymmetric information models. Finally, its successor considers games that unfold in dynamic environments in which targets arrive on an ongoing basis. In each of these scenarios, we again limit the analysis to the case of two cows and continue to focus on encounters that take place in ring environments.

Chapter 6

Games with Asymmetric Information: Life as a Cow Gets Harder

This chapter continues the investigation of two-cow CPRGs by considering a number of variations of the standard CPRG. We study games in which C_i is subjected to a penalty each time she turns. We develop an upper bound on the number of turns a hungry cow would ever make when playing such a CPRG. Subsequently, we study an intriguing variant of the CPRG: one in which a single cow has superior situational awareness with respect to the clover's location and her rival's intentions.
The chapter begins by providing a motivational example to illustrate how asymmetries in the information available to each cow can arise in a competitive search setting. To precisely articulate these discrepancies, we also supplement our existing library of notation. Subsequently, we formally define the asymmetric information game. We then characterize equilibria for this game by developing strategic algorithms that allow the less-informed cow to retain a respectable chance of capturing the target and the more-informed cow to leverage her superior situational awareness. Finally, we provide an interpretation of social welfare for search games with asymmetric information and, for one such family of games, specify a socially optimal search policy.

6.1 Searching with asymmetric information: a motivating example

Chapter 1 cited the example of two rival shipwreck recovery boats searching a coral reef for the remnants of a treasure ship lost at sea. There, it was assumed that each boat had the same prior on the sunken ship's location. This would be the case if, for example, the boats had access to a shared sonar map of the waters. Now, instead, imagine the first boat's prior on the ship's whereabouts is based on cutting-edge sonar imagery, while the second boat's prior is distinct and based on historical maps and word-of-mouth accounts of the sinking. In this setting, each boat would, in general, have different valuations for being the first to visit a specific region of the workspace. In this chapter, we will consider a competitive search game with a similar informational infrastructure. As an added twist, suppose that, owing to an unscrupulous and easily-bought deckhand, the first boat is privy to the prior of the second boat. This insider information allows her, i.e., the boat's crew, to forecast her rival's intentions in an unreciprocated manner.
Despite the more elaborate preamble, the ensuing contest is once again a competitive search game played between two recovery boats. However, in this case, the search strategy of the first boat should, in addition to reflecting her prior and the movement of the rival boat, exploit, to whatever extent possible, the added information at her disposal. The previous chapter began the process of populating a toolbox of initial results from which to branch out and tackle more elaborate competitive search games. By extending earlier results pertaining to the CPRG and analyzing search games with asymmetric information, we seek to continue this vision. This chapter begins by bounding the number of turns a hungry cow would ever make in a two-cow CPRG under the added assumption that a cost is levied each time she turns. Subsequently, we formally define the Cow-Path Ring Game with asymmetric information and proceed to characterize equilibrium strategies for one such family of games. Finally, the asymmetric information framework provides an opportunity to complement the predominantly competition-oriented results discussed to date by providing a notion of social utility and characterizing a family of socially optimal search strategies.

6.2 Supplementary notation and terminology

This section lays the groundwork to support the forthcoming discussion of CPRGs with asymmetric information. Naturally, we retain all conventions previously introduced in Section 5.5. However, the richer and more nuanced nature of these games calls for a suite of supplementary notation aimed at (i) providing a more expressive system for relating one segment of R to another, (ii) extending C_i's information model to reflect where she believes C_{-i} suspects T is located, i.e., her estimate of C_{-i}'s prior, and (iii) developing a behavioral model that describes how C_i behaves based on the knowledge that is and, equally important, is not available to her.
To refer to a specific segment of R, let, for q_1, q_2 ∈ R and d ∈ Φ, [q_1, q_2]_d denote the segment obtained by tracing an arc from q_1 to q_2 along R in direction d. So that we may be equally adept at specifying one point on R in relation to another, let, for x ∈ R_{≥0}, q ∈ R, and d ∈ Φ, (q + x)_d denote the point obtained by traveling distance x along R from q in direction d. The functionality of this notational system is illustrated below in Figure 6-1.

6.3 Information models for situational awareness

In Section 5.2, C_i's information model was given as I_i(t) = (f^T, F(t)), such that her search strategy was a function of (i) her initial position on R, (ii) her prior on T's location, and (iii) information she has regarding the position of rival cows. With asymmetric information, C_i's search strategy will also depend on any knowledge she has of where her rival, C_{-i}, suspects T may be. Accordingly, we will have to augment I_i to reflect this relationship.

Figure 6-1: Visualization of the functionality of notation used for describing subregions of R and one point relative to another on R. Due to the circular topology of R, there is flexibility in the notational system. For example, [q_1, q_2]_cw and [q_2, q_1]_ccw refer to the same arc of R. Similarly, (q_3 + x)_ccw and (q_3 + 2π − x)_cw refer to the same point on R.

To begin, C_i's unique prior on T's location is denoted f^T_i : R → R_{≥0}. To describe what information, if any, C_i has regarding where her rivals suspect T may be, let, for i, j ∈ C, i ≠ j, f^T_{i,j} denote the prior density that C_i believes C_j has on T. The special case where f^T_{i,j} ≡ 0 is taken to imply that C_i has no idea where C_j suspects T may be. In terms of this notation, the CPRG introduced in Chapter 5 fits within this notational system under the assignments f^T_i = f^T_{i,j} = f^T, ∀ i, j ∈ C.
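As an aside, the segment and offset notation of Section 6.2 is straightforward to operationalize. The sketch below (angles in radians on a ring of circumference 2π; the helper names are ours) reproduces the equivalences noted in Figure 6-1:

```python
import math

TWO_PI = 2 * math.pi

def point(q, x, d):
    """(q + x)_d: the point reached from q after traveling distance x
    along the ring in direction d ('ccw' increases the angle, 'cw'
    decreases it)."""
    return (q + x) % TWO_PI if d == 'ccw' else (q - x) % TWO_PI

def in_segment(p, q1, q2, d):
    """Membership test for [q1, q2]_d, the arc traced from q1 to q2 in
    direction d."""
    length = (q2 - q1) % TWO_PI if d == 'ccw' else (q1 - q2) % TWO_PI
    offset = (p - q1) % TWO_PI if d == 'ccw' else (q1 - p) % TWO_PI
    return offset <= length

# [q1, q2]_cw and [q2, q1]_ccw denote the same arc:
assert in_segment(1.0, 2.0, 0.5, 'cw') == in_segment(1.0, 0.5, 2.0, 'ccw')
# (q + x)_ccw and (q + 2*pi - x)_cw denote the same point:
assert abs(point(1.0, 0.5, 'ccw') - point(1.0, TWO_PI - 0.5, 'cw')) < 1e-9
```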
Summarizing, we can express C_i's information model in the AI-CPRG by the tuple

I_i(t) = (F(t), f^T_i, (f^T_{i,j})_{j ∈ C\{i}}),   (6.1)

and C_i's search strategies, still of a feedback nature, again take the form

S_i = {s_i | s_i(t) : I_i(t) → Ω}.   (6.2)

We remark that the ability of C_i to accurately maintain the set {f^T_{i,j}}_{j ∈ C\{i}} is likely beyond the abilities of our bovine participants. Nevertheless, for legacy reasons, we elect to continue framing the search problems in terms of cows, even if the assumed capabilities are more in line with those of a human or robotic agent.

6.4 Behavioral models for asymmetric games

Under the closed-loop information model, it is possible, at least conceptually, for an intelligent C_i to predict how C_{-i} should behave. Indeed, this capability was central to the analysis of the CPRG in Chapter 5. However, in the AI-CPRG, C_i may notice inconsistencies between f^T_{i,-i} and q_{-i,t} in the event f^T_{i,-i} ≠ f^T_{-i}. To proceed with her search, C_i must, at some level, resolve this discrepancy. Moreover, this type of resolution may be required at each time t > 0 and, to complicate matters, C_{-i} may be going through a similar exercise on her end. Precisely how C_i operates given f^T_i, f^T_{i,-i}, and q_{-i,t} is determined by her behavioral model. In general, characterizing C_i's behavioral model requires a rule for updating f^T_{i,-i} based on F(t). Unfortunately, protocols for performing these updates, e.g., Bayesian belief schemes, significantly complicate the analysis. In response, we focus on the extreme scenarios in which f^T_{i,-i} ≡ 0 or f^T_{i,-i} = f^T_{-i}. In the event f^T_{i,-i} = f^T_{-i}, C_i can perfectly forecast how C_{-i} would respond to her actions. Accordingly, C_i's behavioral model continues to assume that C_{-i} is hungry and thus acts as a rational utility-maximizing player. At the opposite end of the spectrum, if f^T_{i,-i} ≡ 0, then C_i has no idea of where C_{-i} suspects T may be and, therefore, of how C_{-i} will explore R.
Accordingly, it is assumed C_i adopts a defensive approach to search and uses a maximin strategy so as to maximize her minimum achievable utility. A maximin strategy was defined in the context of a generic game in Definition 3.3. Later in this chapter, we will outline exactly what constitutes a maximin strategy in the context of a CPRG.

6.5 A bound on the maximum number of turning points in the CPRG

In this section, we develop a bound on the maximum number of times C_i would ever turn around when playing the CPRG. Our analysis assumes a slightly altered formulation of the CPRG in which a fixed cost, c_t > 0, is levied against C_i each time she turns. Ideally, such a bound could be developed without this additional stipulation; however, because C_1 and C_2 can, in theory, jostle for position during the game, it is possible for each cow to turn an arbitrary number of times at the beginning of the game, collectively explore negligible territory, and have no impact on the overall outcome. We remark that the inclusion of a turning cost in the formulation is reminiscent of the work in [36], which appended turning costs to the CPP. Clearly, the similarities end here, as the arguments necessary to coerce out the bounds presented in this section emerge from an analysis that reflects the competitive nature of the CPRG. When C_i reverses direction during a CPRG, she effectively commits to backtracking across previously explored territory. Intuitively, by turning too frequently, C_i could provide C_{-i} with an opening to increase her utility. Consequently, it is tempting to speculate that the number of turns C_i would ever make is subject to an upper bound. In Chapter 5, an iterative algorithm was specified for computing NE search strategies for games in which each cow may turn at most a pre-specified number of times.
Clearly, a definitive upper bound, n_ub, on the number of turns C_i would ever make could be used to conclusively solve the CPRG by directly solving the (n_ub, n_ub)-CPRG in Figure 5-3. For the remainder of this section, it is assumed a cost c_t is levied against C_i each time she turns. Let n_i(s) be the number of turns made by C_i under s ∈ S. Then C_i's utility under s, subject to turning costs, is

U_i(s) = ∫_{L_i(s)} f^T(q) dq − c_t n_i(s).   (6.3)–(6.4)

We seek an upper bound on n_i(s*) for s* ∈ S_NE. The following result will prove useful in this pursuit.

Proposition 6.1. Consider the standard CPRG from Definition 5.1. For any initial conditions (q_i(0), φ_i(0))_{i=1,2}, the size of the landclaim secured by C_i satisfies

max_{s_i ∈ S_i} min_{s_{-i} ∈ S_{-i}} |L_i(s_i, s_{-i})| = 1/2, for i = 1, 2.   (6.5)

In words, (6.5) says that C_i can always, if she so desires, be the first cow to visit at least half of R. However, it is important to remember that maximizing |L_i| is not necessarily consistent with maximizing U_i; for example, f^T may be negligible over a large portion of R, and C_i better served to concentrate on securing a smaller, but more lucrative, L_i. This idea will be revisited in the next section.

Proof. We prove (6.5) constructively by specifying a search strategy for C_i that establishes the bound. To this end, we sidebar briefly and introduce the following search strategy.

Definition 6.1. For i ∈ {1, 2}, let Θ_i denote the set of points on R that are closer, via travel in either the cw or ccw direction, to C_i than to C_{-i}. The mirroring search strategy, denoted s_m, implies the following functionality:

(s_i = s_m) ⟹ φ_i(t) = Φ \ {φ_{-i}(t)}, t ≥ 0.   (6.6)

In other words, s_m is the search strategy in which C_i always travels toward, but in the opposite direction to, C_{-i}. In so doing, C_i's motion "mirrors" that of her rival C_{-i}. For s_i = s_m, symmetry stipulates that when C_i is exploring new territory, so is C_{-i}, and vice versa.
It follows that |L_i(s_m, s_{-i})| = |L_{-i}(s_m, s_{-i})| = 1/2 for all s_{-i} ∈ S_{-i}, and

max_{s_i} min_{s_{-i}} |L_i(s_i, s_{-i})| ≥ min_{s_{-i}} |L_i(s_m, s_{-i})| = 1/2.   (6.7)

Similarly, for s_{-i} = s_m,

max_{s_i} min_{s_{-i}} |L_i(s_i, s_{-i})| ≤ max_{s_i} |L_i(s_i, s_m)| = 1/2.   (6.8)

The result is established upon noting that (6.7) and (6.8) sandwich the quantity of interest, namely

max_{s_i} min_{s_{-i}} |L_i(s_i, s_{-i})|,   (6.9)

between 1/2 and 1/2. ∎

Let n_i(s) denote the number of turns C_i makes under s ∈ S. If we impose a cost c_t > 0 on C_i each time she turns, and deduct the total cost from (5.4), then we may parlay Proposition 6.1 into an upper bound on n_i(s*).

Proposition 6.2. Consider a CPRG with turning cost c_t > 0. For any s* ∈ S_NE, n_i(s*) satisfies

n_i(s*) ≤ ⌈ (1/c_t) max_x ∫_{w(x)} f^T(q) dq ⌉,   (6.10)

where w(x) = [x, x + π] is a half-ring window, and ⌈x⌉ is the ceiling function of x, i.e., the minimum z ∈ Z no less than x.

Proof. We begin by bounding U_i(s*). Assume, to obtain a contradiction, that there exists s* ∈ S_NE with utility

U_i(s*) > max_x ∫_{w(x)} f^T(q) dq.   (6.11)

Because the CPRG is constant-sum, (6.11) implies one of three alternatives must hold: (i) |L_i(s*)| > 1/2 and, consequently, |L_{-i}(s*)| < 1/2; (ii) L_i(s*) is non-convex; or (iii) both of the above. However, under s_{-i} = s_m, C_{-i} ensures, from Proposition 6.1, that |L_{-i}(s*)| ≥ 1/2 and, hence, |L_i(s*)| ≤ 1/2. Moreover, s_{-i} = s_m ensures L_i(s*) and L_{-i}(s*) form a convex partition of R. Consequently, s*_{-i} ≠ s_m and C_{-i} has a unilateral incentive to deviate from s*_{-i} (to s_m), implying that s* ∉ S_NE. The contradiction establishes that U_i(s*) is less than or equal to the right-hand side of (6.11). By not turning around, C_i accrues positive utility and zero cost; consequently, she will only turn a number of times n_i(s*) such that n_i(s*) c_t ≤ U_i(s*). Upon solving for n_i(s*) ∈ Z_{≥0}, the bound in (6.10) follows. ∎

6.6 CPRGs with asymmetric information

In the traditional CPRG, each cow maintains the same known prior, f^T, on T's location. Consequently, f^T_i = f^T_{i,-i} = f^T and C_i knows C_{-i}'s prior, for i = 1, 2.
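Before developing the asymmetric-information setting further, we note in passing that the bound in (6.10) is simple to evaluate numerically. The sketch below, which reads w(x) as a half-ring window and adopts a uniform prior and a fixed grid resolution as assumptions of the example, computes the maximal half-ring mass and the resulting turn budget:

```python
import math

def max_half_ring_mass(f, n):
    """max_x of the integral of f over a half-ring window w(x) = [x, x + pi],
    approximated on a uniform n-point grid over [0, 2*pi)."""
    dq = 2 * math.pi / n
    half = n // 2
    best = 0.0
    for start in range(n):
        m = sum(f[(start + k) % n] for k in range(half)) * dq
        best = max(best, m)
    return best

def turn_bound(f, n, ct):
    """Upper bound of the form (6.10) on the number of turns in any NE,
    given turning cost ct."""
    return math.ceil(max_half_ring_mass(f, n) / ct)

n = 360
f = [1.0 / (2 * math.pi)] * n  # uniform prior: any half-ring holds mass 1/2
# With ct = 0.06, a cow would never turn more than ceil(0.5 / 0.06) = 9 times.
```

For a sharply peaked prior, the maximal half-ring mass approaches 1 and the bound loosens accordingly, consistent with the intuition that lucrative territory is worth more backtracking.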
In this section, we consider search scenarios in which C_i has a distinct prior on T's location and may or may not have knowledge of f^T_{-i}. Henceforth, we will refer to an encounter of this type as an Asymmetric Information Cow-Path Ring Game, or simply an AI-CPRG.

Definition 6.2 (AI-CPRG). Consider two hungry cows, C_i, for i ∈ {1, 2}, initially located on R at q_i(0), respectively. C_i retains her movement and sensing capabilities from the CPRG in Definition 5.1, but has a distinct prior f^T_i : R → R_{≥0} on T's location, with f^T_i ≠ f^T_{-i}. In addition, C_i maintains f^T_{i,-i} as the prior she believes C_{-i} has on T. C_i's information model is therefore specified by the triple I_i(t) = (F(t), f^T_i, f^T_{i,-i}), t ≥ 0, and her search strategies are mappings of the form s_i : I_i(t) → Ω(t). What search strategy should C_i use to maximize her perceived probability of finding T? [90]

A few remarks are in order. First, as a reminder, F(t) is the set of all trajectories traveled by cows up to time t ≥ 0 and Ω(t) is the set of steering commands available to C_i at t, i.e., {cw, ccw}. Second, C_i's perceived probability of finding T is given by the expression for U_i in (5.4) with f^T replaced by f^T_i. The relevant geometric attributes of an instance of an AI-CPRG are illustrated below in Figure 6-2.

6.7 AI-CPRGs with perfect knowledge

AI-CPRGs in which each cow is aware of her rival's prior, i.e., f^T_{i,-i} = f^T_{-i} for i = 1, 2, provide a level of situational awareness reminiscent of that in the traditional CPRG. In these instances, we may extend earlier results and make an immediate statement regarding the existence of equilibrium strategies.

Figure 6-2: Initial positions, q_i(0); initial headings, φ_i(0); and target priors, f^T_i, of C_1 and C_2 for an instance of an AI-CPRG. (a) f^T_1, shown in blue, has local maxima along the South-East and North-West regions of R. (b) f^T_2, shown in green, is more evenly distributed and contains three modest peaks along R.
For q ∈ R such that f^T_1(q) ≠ f^T_2(q), C_1 and C_2 have different valuations for visiting q first.

Proposition 6.3. Consider an AI-CPRG in which C_i is aware of C_{-i}'s prior, i.e., f^T_{i,-i} = f^T_{-i}, i ∈ {1, 2}, and may turn at most a finite number of times. The game admits a NE s* ∈ S_NE.

Proof. With f^T_{i,-i} = f^T_{-i}, C_i can perfectly forecast the best response C_{-i} will make to any action C_i takes. Leveraging this ability was key to the functionality of Algorithm 5.2, specifically lines 7, 9, and 16, and to the analysis of the CPRG conducted in Section 5.7. With only minor modifications to the associated proofs of Propositions 5.1 and 5.2 and Theorem 5.6, the same line of reasoning may be used, for the case at hand, to establish the existence of a NE s* ∈ S_NE. Owing to these similarities, we elect to forgo retracing the requisite arguments, trusting that an appreciation of the aforementioned arguments is sufficient to handle the necessary amendments, the extent of which involves using f^T_{i,-i} in place of f^T in calculating U_i. ∎

The full extent to which informational asymmetries influence search strategies becomes apparent only when one cow has an unreciprocated knowledge of her rival's prior. The following example explores how this extra knowledge can, in select instances, be exploited to the benefit of the more-informed cow. Subsequently, the key ideas will be formalized to develop theoretical results for a broader family of games.

Example 6.3. Consider the AI-CPRG in which f^T_1, q_1(0), and q_2(0) are as illustrated in Figure 6-3. Furthermore, assume f^T_2(q) = 1, ∀ q ∈ R, f^T_{1,2} = f^T_2, and f^T_{2,1} ≡ 0. Let R_1 = [b, c]_ccw and R_2 = [e, f]_ccw. In this example, C_1 is keen to be the first cow to visit R_1 and R_2, as, in her eyes, there is little chance of T residing in R \ {R_1 ∪ R_2}. Owing to f^T_{1,2} = f^T_2, C_1 knows where C_2 suspects T may be, and can forecast C_2's behavior accordingly. In contrast, f^T_2 is uniform over R, indicating C_2 has little idea of where T may be.
Compounding her situation, C_2 has no idea where C_1 suspects the target may be, making it difficult for her to postulate as to how C_1's actions, i.e., q_{1,t}, relate to C_1's intentions. Given f^T_{2,1} ≡ 0, we assume, as per our behavioral model, that C_2 adopts a maximin strategy. For the time being, denote the maximin strategy C_2 plays as s_2 = s_om. We will say more about s_om shortly; for now, it suffices to know that the subscript in s_om stands for opportunistic mirroring. In the event C_1 and C_2 share the prior f^T_1 = f^T_2 = f^T, as in the CPRG, C_1 could not secure both R_1 and R_2 in any equilibrium search profile; rather, these territories would be divided between C_1 and C_2, respectively. However, because C_1 is in the auspicious position of possessing superior situational awareness, she can, in this case, lure C_2 away from R_1, thereby increasing her perceived probability of finding T. To see how, consider the following strategy: C_1 turns immediately and travels to d; upon reaching d, C_1 turns again and travels toward b; upon reaching b, C_1 turns once more and heads toward f, or, more compactly,

s_1 : q_1(0) → d → b → f.   (6.12)

Meanwhile, by using s_om, C_2 begins the game by mirroring C_1 and traveling toward f; upon reaching f, C_2 (who is carefully observing C_1) turns and heads toward g, where, upon observing C_1 seemingly forfeiting [g, b]_ccw and recognizing that |[g, b]_cw| > |[f, e]_cw|, C_2 continues on and explores [g, b]_cw while C_1 explores [f, e]_cw. At the end of the search, L_1 = [b, f]_ccw ⊃ {R_1 ∪ R_2} and L_2 = [f, b]_ccw, with |L_2| > 1/2.

Figure 6-3: An instance of an AI-CPRG. The cows (depicted as cars) C_1 and C_2 are initially diametrically opposed at the top and bottom of R, respectively. C_1's prior on T, namely f^T_1, is shown in blue. Owing to f^T_1, C_1 is motivated to, if possible, be the first cow to explore segments R_1 and R_2.
Shown in green, it is assumed that f^T_2(q) = 1, ∀ q ∈ R, such that any two segments of R having equal length are equally valuable to C_2. The points a, b, c, d, e, f, and g are points of interest in Example 6.3.

The insight gained from the previous example may be extended to represent a broader class of games. To this end, let m_d denote the midpoint between q_1(0) and q_2(0) on R that, starting from q_1(0), would be reached the fastest by traveling in direction d ∈ Φ. Furthermore, and similar to the notation used in Chapter 4 to describe search plans in the CPRP, let

(q_1, . . . , q_n)_{d_1}, with d_i ≠ d_{i−1}, i = 2, . . . , n,   (6.13)

be the search strategy in which C_i, starting from q_i(0) and traveling in direction d_1, travels in alternating directions between the points q_1, . . . , q_n. The strategy below applies to a less-informed cow that is impervious to utility differences of less than Δx > 0.

Definition 6.4 (opportunistic mirroring). The Δx-opportunistic mirroring strategy, s_{Δx,om} ∈ S, is the search strategy in which C_i uses s_m unless, at some time t_1 > 0,
q_1(0) ∈ [z_{¬d}, z_d]_d, q_2(0) ∈ [z_d, z_{¬d}]_d, (6.16)

where z_d = (m_d + x)_d and z_{¬d} = (m_{¬d} + x + Δx/2)_{¬d}.

Proof. Figure 6-4, provided below, illustrates key points on R that are involved in the arguments that follow. To begin, note that because f_{2|1} = 0, C_2, in accord with our behavioral model, adopts a defensive mindset and plays a maximin strategy, of which s_{Δx,om} is weakly dominant, i.e., for any maximin strategy s', U_2(s_1, s_{Δx,om}) ≥ U_2(s_1, s') for all s_1 ∈ S_1 and U_2(s_1, s_{Δx,om}) > U_2(s_1, s') for some s_1 ∈ S_1. Apprised of C_2's situation, C_1 can, by adopting the strategy in (6.14), ensure a utility at least as great as the integral in (6.15), as C_1 offers C_2 the chance to explore the segment

R' = [(m_{¬d*} + x* + Δx/2)_{¬d*}, (m_{d*} + x* + Δx/2)_{d*}]_{¬d*} (6.17)

of length 2x* + Δx.

Figure 6-4: Visualization of key quantities used in the proof of Theorem 6.5. The points labelled 1, 2, and 3 in red correspond to the three points visited by C_1 in (6.14). In the instance shown, d* = ccw.

As this region is Δx longer than the segment

R'' = [(m_{d*} + x*)_{d*}, (m_{¬d*} + x*)_{¬d*}]_{d*}, (6.18)

C_2 seizes the opportunity when she is at (m_{¬d*} + x* + Δx/2)_{¬d*} and witnesses C_1 turn at (m_{d*} + x* + Δx/2)_{d*}, and explores the larger segment, leaving C_1 free to search the smaller, but more personally valuable, segment R''. To see that C_1 can do no better than the optimal value in (6.15), consider that to improve upon U_1(s_1*, s_2*), C_1 must (i) increase |L_1|, (ii) realize an L_1 that is the union of disjoint convex sets, or (iii) do both of the above. We examine each possibility in turn. First, because C_1 offers C_2 the minimum extra territory needed to shift the land claims about m_{d*} and m_{¬d*}, and s_2 = s_{Δx,om} guarantees |L_2| ≥ |L_1|, a deviation from s_1* cannot increase |L_1|. Second, because s_{Δx,om} prevents C_2 from crossing paths with C_1, L_1 and L_2 are each convex sets, implying both the second and third prospects for C_1 to increase U_1 are infeasible.
Therefore, equality in (6.15) is in fact tight. In effect, it is only through C_2 playing s_{Δx,om} that C_1 is capable of luring C_2 into shifting the boundaries of L_1 and L_2 relative to what would be achieved had s_2 = s_m, i.e., the demarcation points m_{d*} and m_{¬d*}. Therefore, neither C_1 nor C_2 has an incentive to unilaterally deviate from the profile (s_1*, s_{Δx,om}), and (s_1*, s_2*), with s_1* the strategy specified by (6.14) and (6.15), is a NE of the game. As a final remark, we note that under s*, L_1(s*) = [z_{¬d*}, z_{d*}]_{d*} and L_2(s*) = [z_{d*}, z_{¬d*}]_{d*}.

We note that f_{1|2} being uniform over R and f_{2|1} = 0 featured prominently in the analysis of Theorem 6.5; namely, these attributes made s_2 = s_{Δx,om} C_2's weakly dominant maximin strategy, and they made it relatively straightforward for C_1 to forecast C_2's behavior and exploit her superior situational awareness through (6.14) and (6.15). For nonuniform f_2^T, specifying C_2's maximin strategy, given the full-information nature of the game, is considerably more involved. Unsurprisingly, it is also more difficult for C_1 to predict C_2's actions and exploit her knowledge of f_2^T in these circumstances. For these reasons, we relegate a full treatment of this general class of AI-CPRGs, which is likely to involve a full-fledged expedition into more contemporary fields of game theory, to future work, opting, instead, to switch gears and consider the societal gains possible in a cooperative re-imagining of AI-CPRGs.

6.8 Socially Optimal Resource Gathering

Thus far, the cows have faced off against one another in an all-out search for T. It is intriguing to contrast the collective well-being of the cows in an AI-CPRG having f_1^T ≠ f_2^T with that which may be achieved via cooperation. To this end, note that by maximizing (5.4), C_i is, by visiting select segments of R first, striving to capture as much area under f_i^T as possible.
If we interpret f_i^T as the density of a commodity, say resource i, that C_i harvests, and further assume that in harvesting resource i from R' ⊆ R any deposits of resource -i, i.e., the resource collected by C_{-i}, are destroyed, then we can recast our target-capture search games as functionally equivalent resource-collection games. Within this context, a natural interpretation of the social utility afforded by search profile s ∈ S is the total weighted value of all resources collected, i.e.,

U_S(s) := a_1 ∫_{L_1(s)} f_1^T(q) dq + a_2 ∫_{L_2(s)} f_2^T(q) dq, (6.19)

where a_i is the per-unit societal value of resource i. Without loss of generality, and to ease the exposition, we will focus on the case where a_1 = a_2 = 1. To encapsulate these ideas, a summary of the aforementioned problem, which we will refer to as cooperative resource collection, is provided below.

Definition 6.6 (Cooperative resource collection problem). Consider an AI-CPRG with the amendment that f_i^T no longer represents a target prior, but rather the unit density of a commodity, referred to as resource i, along R. For i = 1, 2, C_i is aware of f_i^T and has perfect knowledge of f_{-i}^T, such that f_{-i|i} = f_{-i}^T. When C_i is the first to explore a segment of R, she collects all of resource i along the segment and, in the process, destroys all of resource -i along the segment. How should C_1 and C_2 search to maximize (6.19)?

To understand how C_1 and C_2 ought to search R, let

S_SO := {s ∈ S : U_S(s) ≥ U_S(s'), ∀s' ∈ S} (6.20)

denote the set of socially optimal search profiles. Recognizing that, as per the rules of the game, at each q ∈ R, only a single resource can be harvested, we have that, ∀s ∈ S, ∀s̄ ∈ S_SO,

U_S(s) ≤ U_S(s̄) (6.21)

≤ ∫_R max(f_1^T(q), f_2^T(q)) dq (6.22)

≤ ∫_R (f_1^T(q) + f_2^T(q)) dq (6.23)

= 2. (6.24)

Given the current interest in socially optimal foraging strategies, assume C_i has complete awareness of all resource densities, i.e., f_{-i|i} = f_{-i}^T, i ∈ {1, 2}, thereby allowing for unobstructed coordination between the cows.
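To make the bound chain concrete, the following sketch evaluates the social utility (6.19) and the upper bound on a discretized ring. The cell count and the two densities below are illustrative assumptions, not quantities from the thesis.

```python
import numpy as np

# Discretize the ring R into n cells (an assumed discretization).
n = 1000
L = 2 * np.pi            # circumference of R
dq = L / n
q = np.linspace(0.0, L, n, endpoint=False)

f2 = np.full(n, 1.0 / L)                                  # uniform density
f1 = np.where((q < 1.0) | ((q > 3.0) & (q < 4.0)), 1.0, 0.0)
f1 /= f1.sum() * dq                                       # integrates to 1

def social_utility(f1, f2, mask1, dq):
    """U_S in (6.19) with a1 = a2 = 1: resource 1 is harvested on the
    cells in mask1, resource 2 on the complementary cells."""
    return (f1[mask1].sum() + f2[~mask1].sum()) * dq

# Harvesting the pointwise-larger density everywhere attains the
# integral of max(f1, f2), which never exceeds 2, mirroring (6.21)-(6.24).
u_best = social_utility(f1, f2, f1 >= f2, dq)
u_bound = np.maximum(f1, f2).sum() * dq
```

Any other assignment of cells to the two cows can only do worse than `u_bound`, which is the discrete analogue of the chain (6.21)-(6.24).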
Let

Su(f_i^T) := {q ∈ R : f_i^T(q) > 0} (6.25)

be the support of f_i^T. The following definition is useful in describing the connectivity of subsets of R.

Definition 6.7 (Convex subset of R). A subset R' ⊆ R is convex if for all q_1, q_2 ∈ R', at least one of the following conditions holds: (i) [q_1, q_2]_cw ⊆ R' or (ii) [q_1, q_2]_ccw ⊆ R'. A R'' ⊆ R that is not convex is said to be non-convex.

For s̄ ∈ S_SO, any shortfall from the upper bound in (6.24), i.e., 2 - U_S(s̄), is a consequence of three, possibly intertwined, factors: (i) the initial positions q_i(0), i ∈ {1, 2}, being unfavorable, (ii) the resource deposits overlapping, i.e., Su(f_1^T) ∩ Su(f_2^T) ≠ ∅, and (iii) a non-convex Su(f_i^T). Each of these scenarios is depicted in Figure 6-5.

Figure 6-5: Illustration of the three ways in which U_S can fall short of the maximum value of 2. In each figure, f_1^T and f_2^T are shown in blue and green, respectively. In (a), C_1 and C_2 are initially positioned on the "wrong" sides of R, resulting in a shortfall from 2. Were the cows able to switch positions, the shortfall could be avoided. In (b), overlap between Su(f_1^T) and Su(f_2^T) creates unavoidable inefficiency. In (c), the shortfall results from the lack of convexity of Su(f_1^T) and Su(f_2^T).

To appreciate these points, consider that in the event Su(f_1^T) and Su(f_2^T) are convex, mutually disjoint, and q_i(0) ∈ Su(f_i^T), i ∈ {1, 2}, it is possible for C_i to harvest all of resource i on R, implying U_i = 1, i ∈ {1, 2}, and U_S = 2. Similarly, if q_1(0) = q_2(0), then it is possible, ∀q ∈ R, to harvest resource i at q, where i = argmax_j {f_j^T(q), j ∈ {1, 2}}. To capture the essence of this latter point, consider the following cooperative search strategy:

Definition 6.8 (Tandem search).
Two cows, C_1 and C_2, collocated at time t, i.e., q_1(t) = q_2(t), use a tandem search strategy, s_1 = s_2 = s_tdm, for t' ≥ t if, ∀t' ≥ t, the cows move along R such that q_1(t') = q_2(t') and, ∀q(t') ∈ R, resource i(q(t')) is harvested from q(t'), where i(q(t')) = min{i ∈ {1, 2} : f_i^T(q(t')) ≥ f_{-i}^T(q(t'))}.

In words, the tandem search strategy is the strategy in which the cows move along R in the same direction and, at each point, harvest the resource with the highest density. For convenience, define, for y ∈ R, d ∈ Φ, and R_1, R_2, R_3 ⊆ R, the function h : R × Φ → R_{≥0} as

h(y; d) := ∫_{R_1} f_1^T(q) dq + ∫_{R_2} f_2^T(q) dq + ∫_{R_3} max(f_1^T(q), f_2^T(q)) dq, (6.26)

where R_1, R_2, and R_3 are given by

R_1 = [q_1(0), y]_d, R_2 = [q_2(0), y]_{¬d}, R_3 = [q_1(0), q_2(0)]_{¬d}. (6.27)

The following result captures the essence of the cooperative approach to search discussed above.

Proposition 6.4. Consider an AI-CPRG with f_{1|2} = f_1^T and f_{2|1} = f_2^T. For i ∈ {1, 2}, C_i is initially positioned at q_i(0). The socially optimal utility is given by

U_S* = max_{y ∈ R, d ∈ Φ} h(y; d). (6.28)

Furthermore, with (y*, d*) the maximizers of (6.28), a socially optimal strategy, s̄ ∈ S_SO, is for C_i to travel from q_i(0) to y* in direction d(i), where d(1) = d* and d(2) = ¬d*, wait, if necessary, to rendezvous with C_{-i} at y*, and proceed to explore R_3, in either direction, using a tandem search strategy.

Proof. We begin by showing the social optimum can be achieved by a search strategy of the proposed form; namely, by a strategy in which C_1 and C_2 rendezvous at a particular point y ∈ R and, subsequently, search, in tandem, the portion of R that remains unexplored using s_tdm. It is clear that if C_1 and C_2 cross paths at any point, the socially optimal strategy going forward is for the cows to search the remainder of R in tandem. To this end, let s̄ be an optimal strategy of the aforementioned form. Assume, to obtain a contradiction, there exists a strategy ŝ in which (i) at no point do C_1 and C_2 explore any subset of R in tandem, and (ii) U_S(ŝ) > U_S(s̄), i.e., ŝ is strictly socially superior. In light of (i) and (ii), it must be the case that for each of the two segments [q_1(0), q_2(0)]_d, d ∈ Φ, the segment contains at most one convex subset belonging to L_i, for i ∈ {1, 2}. To see why, consider that if [q_1(0), q_2(0)]_d contained two or more convex subsets belonging to L_i, the only way for L_i to be realizable is if the cows crossed paths at some point, which would violate assumption (i). Therefore, there exists a point z ∈ R at the boundary of L_1(ŝ) and L_2(ŝ), i.e., z ∈ L_1(ŝ) ∩ L_2(ŝ). Now consider the strategy s̃ in which C_1 and C_2 travel, from q_1(0) and q_2(0), respectively, along their respective shortest paths toward z and, subsequently, explore any remaining territory in tandem. Let R_1 = [q_1(0), q_2(0)]_d, such that z ∈ R_1, and R_2 = R \ R_1. Additionally, for R' ⊆ R, let U_S(s; R') be the social utility generated, using s, over R', i.e., U_S(s; R') := ∫_{L_1'(s)} f_1^T(q) dq + ∫_{L_2'(s)} f_2^T(q) dq, where L_i'(s) := L_i(s) ∩ R'. From previous arguments,

U_S(ŝ; R_2) = ∫_{L_1(ŝ) ∩ R_2} f_1^T(q) dq + ∫_{L_2(ŝ) ∩ R_2} f_2^T(q) dq (6.30)

≤ ∫_{R_2} max(f_1^T(q), f_2^T(q)) dq (6.31)

= U_S(s̃; R_2). (6.32)

Moreover, because L_i(s̃) ∩ R_1 = L_i(ŝ) ∩ R_1, i ∈ {1, 2}, U_S(s̃; R_1) = U_S(ŝ; R_1), and, given (6.30)-(6.32), it follows that U_S(s̃) ≥ U_S(ŝ). Because s̃ is of the proposed form, with rendezvous point z, this result stands in opposition to our earlier assumption; namely, that U_S(ŝ) > U_S(s̄), since s̄ was assumed to be socially optimal among the proposed class of search strategies. The contradiction allows us to streamline our search for a socially optimal strategy to the class of strategies specified in the proposition. It remains to show the expression in (6.28) is the social utility of a strategy in which the cows rendezvous and conduct the remainder of their search in tandem. To this end, note that the first two terms in (6.26) represent the contributions to U_S accrued by C_1 and C_2, respectively, as they travel toward one another before meeting at y.
The final term represents the contributions to U_S that result from C_1 and C_2 searching all remaining territory in tandem. Therefore, for any y ∈ R and d ∈ Φ, (6.26) is the social utility of a search strategy of the proposed form, of which we know at least one such strategy is socially optimal. By searching over all such profiles, as in (6.28), an s̄ ∈ S_SO is guaranteed to be found.

The following example illustrates how C_1 and C_2 would search cooperatively to maximize social utility for a specific instance of f_1^T and f_2^T.

Example 6.9. Reconsider the initial positions and target distributions from Example 6.3. In this case, C_2 values all q ∈ R equally, while C_1 has a preference for [b, c]_ccw and [e, f]_ccw. In this case, the socially optimal strategies of each cow are given by

s_1 = C_1 : q_1(0) →^{cw} b →^{s_tdm}, (6.33)

s_2 = C_2 : q_2(0) →^{ccw} b →^{s_tdm}. (6.34)

In this way, C_1 harvests [b, c]_ccw and [e, f]_ccw, where f_1^T > f_2^T, and C_2 harvests the remainder of R, where f_2^T > f_1^T, excluding the small arc [q_1(0), c]_cw. Recall that in the competitive context of an AI-CPRG, C_1 was first to visit [b, f]_ccw. In this sense, the social cost of competition, i.e., the inefficiency introduced by competition, is represented by the difference between f_2^T(q) and f_1^T(q) integrated over [c, e]_ccw, because [c, e]_ccw shifts from being harvested by C_2 to C_1 when transitioning from a cooperative to a competitive formulation of the search problem. This tradeoff is illustrated in Figure 6-6.

Figure 6-6: A socially optimal search strategy for the scenario considered in Example 6.3. The socially optimal search strategy is illustrated by the purple line: C_1 and C_2 rendezvous at b, having travelled there in the cw and ccw directions, respectively, and proceed to explore [q_1(0), q_2(0)]_ccw in tandem. The segment R_1, shown in red, is the portion of the ring that transitions from being explored by C_2 in a cooperative search to being visited first by C_1 in a competitive search.
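The maximization in (6.28) is straightforward to carry out numerically. The sketch below discretizes the ring and searches over candidate rendezvous cells between the cows and both directions; the densities f1, f2 and the starting cells i1, i2 are illustrative assumptions, not values from the thesis.

```python
import numpy as np

n = 360
L = 2 * np.pi
dq = L / n
q = np.linspace(0.0, L, n, endpoint=False)

f1 = np.exp(-8.0 * (np.cos(q - 1.0) - 1.0) ** 2)  # bump-like density (assumed)
f1 /= f1.sum() * dq                               # integrates to 1
f2 = np.full(n, 1.0 / L)                          # uniform density

i1, i2 = 0, n // 2  # cells holding q1(0) and q2(0) (assumed positions)

def arc(a, b, d):
    """Indices of the cells on the arc from cell a to cell b traveling in
    direction d (+1 for increasing index order, -1 for decreasing)."""
    out = [a]
    while out[-1] != b:
        out.append((out[-1] + d) % n)
    return np.array(out, dtype=int)

def h(y, d):
    """Discretized analogue of (6.26): cow 1 alone sweeps the arc from i1
    to y in direction d, cow 2 alone sweeps from i2 to y the other way
    (cell y credited once), and the rest is searched in tandem."""
    r1 = arc(i1, y, d)
    r2 = arc(i2, y, -d)[:-1]          # drop y; it is already in r1
    solo = np.zeros(n, dtype=bool)
    solo[r1] = True
    solo[r2] = True
    return (f1[r1].sum() + f2[r2].sum()
            + np.maximum(f1, f2)[~solo].sum()) * dq

# Brute-force search over rendezvous cells and directions, as in (6.28).
u_star, y_star, d_star = max(
    ((h(y, d), y, d) for d in (1, -1) for y in arc(i1, i2, d)),
    key=lambda t: t[0])
```

By construction h(y, d) can never exceed the integral of max(f1, f2), so u_star respects the upper bound of 2 from (6.24); the maximizing (y_star, d_star) plays the role of (y*, d*) in the proposition.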
6.9 Conclusions

This chapter considered variants of the standard two-cow CPRG. For CPRGs with a turning cost, we developed an upper bound on the number of times an intelligent cow would ever reverse direction while searching. Subsequently, we introduced the AI-CPRG, in which each cow maintains a unique prior on T's location. When each cow knows her rival's prior, previous equilibria results extend naturally. However, when just one cow has an unreciprocated knowledge of her rival's prior, necessary amendments to the information model of the game introduce new complexities. An equilibrium profile was provided for one such class of game, in which the less-informed cow has no idea of the target's location and, accordingly, takes a defensive approach to search. Here, the more-informed cow leverages her superior situational awareness to lure, to the extent possible, her rival away from the regions of the workspace that she herself covets. Finally, by replacing the goal of finding the target with the equivalent objective of capturing as much target density as possible, we provided an interpretation of the social welfare of an AI-CPRG. In this context, the cooperative notion of a tandem search strategy was introduced to characterize a class of socially optimal search profiles.

The treatment of AI-CPRGs was restricted to the case where the uninformed agent maintained a uniform prior on the target's location. It remains to study the broader class of games in which the second cow maintains a non-uniform prior, and to fully characterize the strategic options available to each cow. We conjecture that the complexities that arise in these scenarios, which are compounded by the feedback nature of the game, would make a detailed analysis especially challenging. We elaborate on this issue in the final chapter of the thesis.
For now, rather than overindulge in such a pursuit, we change course once again and, in the next chapter, consider CPRGs in which targets arrive dynamically.

Chapter 7

Dynamic Cow-Path Games: Search Strategies for a Changing World

This chapter investigates multi-agent systems in which the agents compete to capture targets that arrive dynamically. Cows reprise their roles as intrepid search agents, but now operate in an environment where targets have transport requirements and continually repopulate the workspace. To address these problems, we introduce the Dynamic-Environment CPRG, or simply DE-CPRG. We show that greedy searching (in the myopic sense), although optimal in select instances, can perform very poorly in others. To bound performance, we establish a condition on the utility each cow receives in an equilibrium. Recognizing that assessing the long-term effects of short-term actions in an equilibrium setting is, in general, beset by an assortment of unique challenges, we provide a strategy to search the ring that possesses a number of attractive attributes and, in the long run, offers some performance guarantees in the face of inter-agent competition in an ever-changing world.

This chapter is organized as follows. First, we supplement existing CPRG terminology so that we may describe the process by which targets arrive. Subsequently, we define CPRGs in dynamic environments. From here, we consider greedy strategies, which emphasize finding the next target to appear, potentially at the expense of securing future targets. We argue that greedy strategies, while occasionally optimal, have the potential to induce severe capture droughts. We then establish that in any equilibrium, each cow captures half of the targets. Next, we revisit the worst-case performance of greedy search strategies on a more formal level, amassing further evidence that greedy searching can provide poor performance in persistent settings.
Motivated by the desire to pass along a search strategy that avoids prolonged capture droughts, we provide a defensive-minded strategy that possesses a number of desirable attributes. Finally, we characterize a worst-case performance bound for this strategy and compare it to the theoretical equilibrium value, i.e., one half, for a variety of target distributions.

7.1 A motivation for dynamic environments

Thus far, the scenarios considered involved just a single target that had been randomly placed prior to commencement of the game. Accordingly, each cow put a premium on being the first to find the target. A natural next step, and a new challenge for the cows, is to consider search games in which targets arrive dynamically. In introducing these encounters, the aim is to provide (i) an abstraction that captures the competitive-search undertones of relevant, real-world scenarios with dynamic componentry and (ii) a framework to explore and evaluate persistent decision-making strategies.

As an example of a persistent target-capture scenario, consider the operation of Yellow Cabs, also called Medallions, in Manhattan, New York. Legally, these cabs may take a job only after being hailed by a passenger. In Chapter 1, we highlighted how taxi operations of this form could be viewed as a competitive search game. Less emphasized in that discussion was the long-run nature of the game. For example, in 2013, the average shift length of a taxi driver in New York City was 9.5 hours [23]. To support themselves, drivers must use a customer-foraging strategy that accounts not only for (i) the distribution of potential patrons and (ii) the presence of nearby rivals, as in traditional CPGs, but also (iii) the extended duration of a typical shift. In this regard, a taxi driver should not be overly concerned if a prospective passenger is preemptively snatched up by a rival driver.
However, a driver should be worried if, upon reflecting on the day's events, they spent a significant fraction of their shift without a passenger on board.

7.2 Dynamic Cow-Path Games with target transport requirements

To facilitate the forthcoming discussion of DE-CPRGs, we begin by extending our existing suite of notation to handle scenarios involving multiple targets. To start, targets are enumerated according to the order in which they arrive on R, i.e., T_1, T_2, ..., and the spatio-temporal process governing their arrival is denoted by φ_T. In the games we consider, targets have transport requirements. Specifically, each T_j is described by a pair (O_j, D_j) ∈ R², where O_j and D_j are T_j's origin point and destination point, respectively. For C_i to receive credit for capturing T_j, she must (i) be first to find T_j at O_j and (ii) transport T_j to D_j. Games with this mechanic are motivated by taxi systems in which drivers compete with other vacant cabs to locate customers and provide transport before being paid. The notation O_j ~ f_O indicates that O_j is distributed according to the spatial density function f_O : R → R_{≥0}. In an analogous way, we write D_j ~ f_D. More will be said about the temporal process governing T_j's arrival shortly. For j ≥ 1, we associate three specific times with T_j: the time T_j arrives on R, the time T_j is captured by a cow at O_j, and the time T_j is dropped off at D_j. These times are denoted by t_a(T_j), t_c(T_j), and t_d(T_j), respectively, and, following the natural ordering of events, satisfy t_a(T_j) ≤ t_c(T_j) ≤ t_d(T_j). C_1 and C_2, via the strategy profile they employ, are partly responsible for t_c(T_j) and t_d(T_j). φ_T is fully described by f_O, f_D, and t_a(T_j) for j ≥ 1. In all games, it is assumed that when C_i discovers a target T_j, she immediately transports T_j from O_j to D_j along the shortest path, either [O_j, D_j]_cw or [O_j, D_j]_ccw. Furthermore, we assume that at time t_c(T_j), both C_1 and C_2 are apprised of O_j and D_j.
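The transport mechanic above is straightforward to encode. The sketch below is an illustrative model, not code from the thesis; the class and field names are our own. It captures the pair (O_j, D_j), the event ordering t_a ≤ t_c ≤ t_d, and the shortest-path transport rule on the ring.

```python
import math
import random
from dataclasses import dataclass

L = 2 * math.pi  # circumference of the ring R

def ring_dist(a, b):
    """Shortest-path distance on R between points a, b in [0, L):
    the shorter of the cw and ccw arcs."""
    gap = abs(a - b) % L
    return min(gap, L - gap)

@dataclass
class Target:
    """One target T_j = (O_j, D_j); field names are our own shorthand."""
    origin: float                  # O_j, drawn from f_O
    dest: float                    # D_j, drawn from f_D
    t_arrive: float                # t_a(T_j)
    t_capture: float = math.inf    # t_c(T_j), set once a cow finds T_j
    t_deliver: float = math.inf    # t_d(T_j)

    def transport_time(self, speed=1.0):
        # The finder must carry T_j along the shorter arc from O_j to D_j.
        return ring_dist(self.origin, self.dest) / speed

    def times_consistent(self):
        # Natural ordering of events: t_a <= t_c <= t_d.
        return self.t_arrive <= self.t_capture <= self.t_deliver

# Example: one target drawn with uniform f_O and f_D (an assumption; the
# thesis allows general densities).
rng = random.Random(0)
tj = Target(origin=rng.uniform(0, L), dest=rng.uniform(0, L), t_arrive=0.0)
```

Note that `transport_time` is always at most L/2, the longest possible shortest-path trip on the ring.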
This would be the case, for example, in a taxi system where taxis update their status with a central computer at the beginning of each job. The target summary at time t, denoted T(t), is a listing of all target arrival, capture, and delivery times up to time t, i.e.,

T(t) = {t_a(T_j), t_c(T_j), t_d(T_j), j ≥ 1}_{≤t}, (7.1)

where, given a set of times T = {t_1, t_2, ...}, T_{≤t} = {t_i ∈ T : t_i ≤ t}. In this way, T(t) encapsulates C_i's knowledge of all target activity on R up to t. To reflect this information, I_i(t), C_i's information model of the game at time t, is augmented to include T(t), such that I_i(t) = (f_i^T, F_i(t), T(t)), and the set of C_i's search strategies is given by

S_i = {s_i : s_i(q_{j,[0,t]}, f_O, f_D, T(t), t)_{j=1,2} → Φ}. (7.2)

With multiple targets peppering R in a DE-CPRG, it no longer makes sense to take U_i(s) = P(C_i ← T_j under s) for a given j ≥ 1, as it ignores a hungry C_i's interest in capturing T_j for j ≥ 2. To describe a more appropriate U_i, refer to [t_a(T_j), t_d(T_j)], i.e., the time during which T_j is in the system, as stage j of the DE-CPRG and define

U_i^j(s) = P(C_i ← T_j under s) (7.3)

to be C_i's utility during stage j. Note that at this point we have said nothing to rule out the possibility of stages overlapping. Assuming targets are homogeneous, it is sensible for C_i to focus on the fraction of targets she captures in the long run, as illustrated in Figure 7-1. For i ∈ {1, 2}, j ∈ Z_{≥1}, define the indicator random variable

1_{i,j}(s) = 1 if C_i ← T_j under s, and 0 otherwise. (7.4)

The aggregate utility of C_i is defined as the expected fraction of targets she captures in steady state, i.e.,

U_i^ag(s) = E[fraction of targets C_i captures under s] (7.5)

= lim_{m→∞} E[(1/m) Σ_{j=1}^m 1_{i,j}(s)] (7.6)

= lim_{m→∞} (1/m) Σ_{j=1}^m E[1_{i,j}(s)] (7.7)

= lim_{m→∞} (1/m) Σ_{j=1}^m P(C_i ← T_j under s). (7.8)

Figure 7-1: A sample sequence of target capture times associated with the early stages of a DE-CPRG.
In the instance shown, C_1 captures targets T_1, T_2, and T_4, while C_2 captures targets T_3 and T_5. If the statistics shown are representative of steady-state behavior, then the aggregate utilities of the cows would be U_1^ag(s) = 0.6 and U_2^ag(s) = 0.4, respectively.

We are now in a position to formally define the DE-CPRG with transport requirements. Because the problem features a number of components, we opt to use the more streamlined tuple notation.

Definition 7.1. A DE-CPRG with target transport requirements is described by the tuple

G^{DE-CPRG} = (C_i, R, I_i, S_i, U_i^ag, φ_T)_{i=1,2}, (7.9)

where C_i retains her usual sensing and movement capabilities; R is the ring; for i ∈ {1, 2}, I_i(t) = (q_{j,[0,t]}, T(t))_{j=1,2}; S_i = {s_i : s_i(f_O, f_D, I_i(t)) → Φ}; U_i^ag is given by (7.8); and φ_T is the process by which, for j ≥ 1, (i) T_j = (O_j, D_j), (ii) O_j are independent and identically distributed (i.i.d.) with O_j ~ f_O, (iii) D_j are i.i.d. with D_j ~ f_D, (iv) O_j and D_j are independent, (v) T_1 arrives at t = 0, and (vi) t_a(T_{j+1}) = t_d(T_j). Upon finding T_j, C_i is obligated to travel directly to D_j along the shortest route. Finally, C_i may transport at most one target at a time. How should C_i search so as to maximize U_i^ag?

Figure 7-2: An isometric visualization of an instance of the DE-CPRG. The prior f_O is shown in blue as a function of position along R. Also shown are the origin and destination points associated with targets T_j, T_{j+1}, and T_{j+2}. At the instant shown, C_1 and C_2 are searching R for T_j. According to φ_T, once T_j is discovered and transported from O_j to D_j, T_{j+1} is popped from the queue of targets and appears on R.

A snapshot of a DE-CPRG is shown in Figure 7-2. Based on φ_T, a DE-CPRG is effectively a sequence of CPRGs played in immediate succession, with the important caveat that the initial conditions of any game are determined by the preceding trajectories of C_1 and C_2.
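Over any finite horizon, the limits in (7.5)-(7.8) reduce to an empirical capture fraction, which is easy to compute from a realized capture sequence. The sketch below (plain Python; the capture sequence is the Figure 7-1 instance, in which C_1 takes T_1, T_2, T_4 and C_2 takes T_3, T_5) makes this bookkeeping concrete.

```python
# Capture sequence from the Figure 7-1 instance: entry j-1 records which
# cow won stage j of the DE-CPRG.
captures = [1, 1, 2, 1, 2]

def aggregate_utility(captures, i):
    """Empirical counterpart of U_i^ag in (7.5)-(7.8): the fraction of
    targets cow i captured (a sum of the indicators 1_{i,j} over m stages)."""
    return sum(1 for c in captures if c == i) / len(captures)

u1 = aggregate_utility(captures, 1)  # 0.6, matching the text
u2 = aggregate_utility(captures, 2)  # 0.4
```

Because every target is captured by exactly one cow, these empirical fractions always sum to one, mirroring the constant-sum character of the stage games.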
In light of this stage-by-stage structure, we refer to stage j of a DE-CPRG as CPRG_j, with the time span of CPRG_j being [t_a(T_j), t_d(T_j)]. It is worth reinforcing that, as stated by the rules of the game, when transporting a target, say T_j, C_i's behavior is unaffected by her strategy s_i. Consequently, C_i can use this time to advantageously preposition herself in preparation for CPRG_{j+1}. Indeed, as we will see, judicious use of this time is a defining attribute of efficient search strategies.

7.3 Greedy search strategies for the DE-CPRG

With the goal of maximizing (7.8), how should C_i go about searching for targets? To get the ball rolling, recall that an algorithm to compute ε-NE search strategies for a CPRG was discussed in Chapter 5. Recognizing that a DE-CPRG is effectively a sequence of CPRGs played in immediate succession, there is an incentive to investigate whether repurposing earlier results can provide added analytic mileage at little overhead. To this end, the following search strategy builds upon the equilibria results of Chapter 5 while largely ignoring the complexities introduced by a dynamic environment.

Definition 7.2 (Greedy searching). For a DE-CPRG, the greedy search strategy, denoted s_gr, is the search strategy with the following functionality:

(s_i = s_gr) ⟹ s_i = s*(q_1(t_j), q_2(t_j)) in stage j, j ≥ 1, (7.10)

where t_j = t_a(T_j) and s*(q_1, q_2) is a NE search strategy of the static-environment CPRG, discussed in [89], in which C_i has initial position q_i, i = 1, 2.

In words, using s_i = s_gr, C_i tries to maximize her probability of capturing the most recent target to arrive on R. In doing so, C_i has no regard for where on R she may end up in the event C_{-i} ← T_j. Should s_i = s_gr offer reasonable performance guarantees to C_i, it would be tempting to propose (s_gr, s_gr) as a NE. For example, in the case of uniform f_O, it is easy to see that (s_gr, s_gr) is a NE profile. Unfortunately, the following example, unsurprisingly, debunks both of these claims for general f_O and f_D.

Example 7.3.
To better understand the potential pitfalls of C_i using s_i = s_gr, consider the scenario depicted in Figure 7-3. At the beginning of CPRG_j, C_2 is separated from region Θ_1 by C_1 and is unlikely to capture T_j. Nevertheless, C_2's best hope of capturing T_j is to search R by traveling in the ccw direction. However, in pursuing this agenda, which is consistent with s_2 = s_gr, not only is C_2 unlikely to capture T_j, she is also prone to being stranded along R \ {Θ_1 ∪ Θ_2} at the beginning of CPRG_{j+1}. Consequently, it is also unlikely that C_2 will capture T_{j+1}. Intuitively, it is preferable for C_2 to, instead, (i) follow C_1 to the edge of Θ_1, (ii) wait there until the beginning of stage j + 1, then (iii) explore Θ_1, during which time she stands an excellent chance of capturing T_{j+1}. Although this queueing-centric search strategy requires thinking just one stage ahead, it is beyond the scope of s_i = s_gr.

Figure 7-3: A snapshot of a DE-CPRG taken at the start of CPRG_j. The origin-target density, f_O (shown in blue in (a)), and the destination-target density, f_D (shown in green in (b)). Targets are significantly more likely to (i) arrive in Θ_1 rather than R \ Θ_1 and (ii) seek transport to Θ_2 rather than R \ Θ_2.

7.4 Equilibria utilities of cows in the DE-CPRG

Example 7.3, from the previous section, revealed that s_i = s_gr can lead to prolonged capture droughts for C_i. Moreover, it is not obvious how C_i should go about striking an optimal balance between being (i) sufficiently opportunistic so as to capitalize on short-term advantages and (ii) sufficiently disciplined so as to not lose sight of long-term objectives, i.e., (7.8). To this end, imagine C_i, by using s_i, can ensure that she captures at least forty percent of all targets. Is this an impressive tally, or should C_i be doing much better in an equilibrium?
In this regard, it would be useful to have a benchmark against which to compare, for a particular s_i ∈ S_i, min_{s_{-i}} U_i^ag(s_i, s_{-i}) with U_i^ag(s*) for s* ∈ S_NE. The following definitions will prove useful in this regard.

Definition 7.4. Let Su(f_O) = {q ∈ R : f_O(q) > 0} denote the support of f_O. For a DE-CPRG and search profile s ∈ S, s_i is an itinerant search strategy under s if, in the ensuing stages of the game, i.e., CPRG_1, CPRG_2, ..., for all q ∈ Su(f_O), C_i is the first cow to visit q infinitely often.

Definition 7.5. An s ∈ S is an itinerant search profile if s_1 and s_2 are each itinerant search strategies under s.

In words, an itinerant strategy is one that actively explores all regions of the workspace. In this way, an itinerant strategy elicits associations with the notion of recurrent states in Markov chains and renewal theory in general. However, an itinerant strategy is defined in a competitive context not present in these more traditional notions. Owing to the transport requirements on T_j and the independence of O_j and D_j, search strategies that do not expressly prohibit visiting select points on R are itinerant. The following result establishes that the initial conditions of a DE-CPRG play no role in the payoff C_i receives in any itinerant equilibrium search profile.

Proposition 7.1. Consider a DE-CPRG. If there exists an itinerant search profile s* = (ŝ, ŝ) ∈ S_NE, then, irrespective of the initial cow positions, U_1^ag(s*) = U_2^ag(s*) = 1/2.

Proof. The key idea of the proof is that if, in finite time, the cows "switch" positions on R, then the inherent symmetry in the game mandates the cows receive equal utility going forward. For a more streamlined treatment, we focus on the case where f_O and f_D each have discrete support, i.e., Su(f_O) = Su(f_D) = Θ_S = {Θ_1, ..., Θ_m} ⊂ R, where the Θ_k are points on R. Figure 7-4, provided below, captures the essence of the arguments to follow. Let the initial position of C_i be q_i(0), i = 1, 2. Because (ŝ, ŝ) ∈ S_NE, C_i captures a target infinitely often; otherwise, U_i^ag(ŝ, ŝ) = 0 and C_i could simply camp out at any station in Su(f_O) and improve her utility, which contradicts (ŝ, ŝ) ∈ S_NE. Also, from similar arguments, Su(f_O) = Θ_S implies any equilibrium strategy, including (ŝ, ŝ), is itinerant. Therefore, in finite time, C_2 will capture a target, say T_j, at a station, say Θ_a, that requires transport to station Θ_b, with D(Θ_a, Θ_b) = max_{a',b'} D(Θ_{a'}, Θ_{b'}) =: L̄ ≤ L/2, where L = 2π is the length of R. While C_2 transports T_j to Θ_b, C_1 has sufficient time to optimally position herself on R, say at station Θ_c, in preparation for CPRG_{j+1}, which, owing to (ŝ, ŝ) ∈ S_NE, she does. Because CPRG_{j+1}, in which C_1 and C_2 start from Θ_c and Θ_b, respectively, emerged in finite time, the outcomes of CPRG_1, ..., CPRG_j, in terms of which cow captured which targets, contribute only transitorily to (7.8), and

U_1^ag(ŝ, ŝ, q_1(0), q_2(0)) = U_1^ag(ŝ, ŝ, Θ_c, Θ_b), (7.11)

where U_i^ag(s_1, s_2, q_1, q_2) is the utility, i.e., (7.8), C_i derives when C_1 and C_2 search using s_1 and s_2, respectively, starting from q_1 and q_2 on R, respectively. By similar arguments, ∃k < ∞, k > j, such that CPRG_k starts with C_1 and C_2 at Θ_b and Θ_c, respectively, implying

U_2^ag(ŝ, ŝ, q_1(0), q_2(0)) = U_2^ag(ŝ, ŝ, Θ_b, Θ_c). (7.12)

Because U_1^ag and U_2^ag are both given by (7.8), the symmetry in (7.11) and (7.12), in conjunction with the constant-sum nature of the game, implies C_1 and C_2 receive equal utility, i.e., U_1^ag(s*) = U_2^ag(s*) = 1/2, which is the requested result.

Figure 7-4: Illustration of two scenarios used in the proof of Proposition 7.1. In (a), C_2 discovers a target at Θ_a that requires transport to Θ_b, a distance L̄ away. During transport, C_1 has time to optimally preposition herself at Θ_c in preparation for the next stage. A finite time later, in (b), the roles reverse: C_1 discovers a target at Θ_a that also requires transport to Θ_b, allowing C_2 to optimally position herself at Θ_c for the next game.
We remark that, while necessary, the condition $U_1(s) = U_2(s) = \frac{1}{2}$ is not sufficient to guarantee $s \in S^{NE}$. For example, reconsider the game depicted in Figure 7-3, and assume the cows begin at diametrically opposed points on $\mathcal{R}$. If, for $i = 1, 2$, $C_i$ uses the strategy $s_i = s_{ccw}$, traveling counterclockwise when the current target in play has yet to be discovered and traveling in the same direction as $C_{-i}$ when $C_{-i}$ is delivering a target, then $U_1(s_{ccw}, s_{ccw}) = U_2(s_{ccw}, s_{ccw}) = \frac{1}{2}$. However, the door is left (wide) open for $C_{-i}$ to take advantage of $C_i$'s actions. Namely, by using $s'$, in which $C_{-i}$ travels in the clockwise direction to $\theta_1$ immediately after dropping off each target she captures, $C_{-i}$ is likely to capture and deliver multiple targets in the time it takes $C_i$ to return to $\theta_1$.

Beyond providing a necessary condition for $s \in S^{NE}$, Proposition 7.1 has another use. Namely, it states that $\frac{1}{2}$ is an upper bound on $\min_{s_{-i}} U_i^{ag}(s_i, s_{-i})$, the worst-case performance of $s_i \in S_i$, as a function of $s_i$. Therefore, the closer this value is to $\frac{1}{2}$, the more legitimate the case for $C_i$ to use $s_i$. This provides useful insight for selecting an $s_i$ in the event that (i) an equilibrium cannot be found, or (ii) an equilibrium is known but is difficult for $C_i$ to execute, e.g., because it requires significant computational overhead or planning on the part of $C_i$. In the next section, we revisit greedy searching, consider the worst-case performance of $s_i = s_{gr}$ for $C_i$, and compare this quantity with the value $\frac{1}{2}$.

7.5 An aggregate worst-case analysis of greedy searching

Owing to the persistent nature of DE-CPRGs, quantifying the long-term effects of short-term actions is a challenging pursuit. This is especially true in an equilibrium setting. Moreover, equilibrium strategies, should they exist, may prove computationally intensive for $C_i$ to implement. In light of these challenges, we proceed with our analysis along the following lines. First, we pursue a high-level analysis predicated on aggregate statistics of the DE-CPRG.
Second, we focus on defensive search strategies that can be implemented by $C_i$ with limited complexity and study the performance guarantees they provide, i.e., the performance of $s_i$ in $\min_{s_{-i}} U_i^{ag}(s_i, s_{-i})$. To begin, we define the average transport distance of a target as

$d_a = \lim_{j \to \infty} \mathbb{E}\left[D(\mathcal{O}_j, \mathcal{D}_j)\right] = \int_{\mathcal{R}} \int_{\mathcal{R}} f_O(q_1) f_D(q_2) D(q_1, q_2) \, dq_1 \, dq_2.$   (7.13)

From the i.i.d. properties of $\mathcal{O}_j$ and $\mathcal{D}_j$, $\mathbb{E}[D(\mathcal{D}_j, \mathcal{O}_{j+1})] = d_a$. Hence, the average rate at which $C_i$, searching uncontested on $\mathcal{R}$, can deliver targets is no more than one target in time $2 d_a$. Similarly, we define the minimum average delivery distance as

$d_m = \min_q \mathbb{E}\left[D(q, \mathcal{D}_j)\right] = \min_q \int_{\mathcal{R}} f_D(q_1) D(q, q_1) \, dq_1,$   (7.14)

$q_m = \arg\min_q \mathbb{E}\left[D(q, \mathcal{D}_j)\right] = \arg\min_q \int_{\mathcal{R}} f_D(q_1) D(q, q_1) \, dq_1,$   (7.15)

with $d_m := \mathbb{E}[D(q_m, \mathcal{D}_j)]$. In words, $q_m \in \mathcal{R}$ is the point or, more generally, the set of points on $\mathcal{R}$ where, upon finding $\mathcal{T}_j$, the time required to transport $\mathcal{T}_j$ to $\mathcal{D}_j$ is, on average, smallest. In this way, $d_m$ serves as a worst-case lower bound on the average amount of time $C_i$ will, having lost $\mathrm{CPRG}_j$, have to position herself, from her current location, in preparation for $\mathrm{CPRG}_{j+1}$. Because we will frequently refer to an optimal quantity and its optimizing argument(s), we emphasize the association through the following operator:

$(\arg, \cdot) \max_x f(x) = \left(\arg\max_x f(x), \; \max_x f(x)\right).$   (7.16)

For example, we may express (7.14)-(7.15) as

$(q_m, d_m) = (\arg, \cdot) \min_q \mathbb{E}\left[D(q, \mathcal{D}_j)\right].$   (7.17)

The following result quantifies the potentially poor performance of greedy searching first alluded to in Example 7.3.

Proposition 7.2. Consider a DE-CPRG.
If $C_i$ uses a greedy strategy, i.e., $s_i = s_{gr}$, then her utility satisfies

$\min_{s_{-i}} U_i^{ag}(s_{gr}, s_{-i}) \geq \dfrac{\beta}{1 + \beta - \alpha} \geq \alpha,$   (7.18)

where $\alpha$ and $\beta$, $\beta \geq \alpha$, are the utility constants

$\alpha = \min_{q_i, q_{-i}} \bar{U}_i(q_i, q_{-i}),$   (7.19)

$\beta = \min_{q_i, q_{-i}} \max_{q \in \mathcal{B}(q_i, d_m)} \bar{U}_i(q, q_{-i}),$   (7.20)

where $\mathcal{B}(q_i, d) := \{q \in \mathcal{R} \mid D(q, q_i) \leq d\}$, and $\bar{U}_i(q_1, q_2)$ is the stage utility afforded to $C_i$ in a CPRG where $C_1$ and $C_2$ have initial positions $q_1$ and $q_2$, respectively, and play an equilibrium strategy of the form outlined in Chapter 5.

Proof. Before proceeding, we remark that $\alpha$ and $\beta$ are especially pessimistic quantities from the perspective of $C_i$. Namely, $\alpha$ and $\beta$ are determined by minimizing $\bar{U}_i(q_i, q_{-i})$ with respect to both $q_i$ and $q_{-i}$. Although it is certainly conceivable for $C_{-i}$ to maximize $\bar{U}_{-i}(q_i, q_{-i})$, and therefore minimize $\bar{U}_i(q_i, q_{-i})$, it is less common to see $\bar{U}_i(q_i, q_{-i})$ minimized with respect to $q_i$. The reason for minimizing $\bar{U}_i$ over $q_i$ is to emphasize worst-case positioning when $C_i$ uses $s_i = s_{gr}$ and therefore does not plan for how $q_i$, at the beginning of $\mathrm{CPRG}_j$, affects $q_i$ at the beginning of $\mathrm{CPRG}_{j+1}$. In this sense, $\alpha$ represents the scenario in which $C_i$ has captured and delivered $\mathcal{T}_j$ but is subsequently "stranded" in a region of $\mathcal{R}$ that affords a poor chance of capturing $\mathcal{T}_{j+1}$. Similarly, $\beta$ corresponds to the scenario in which $C_i$ does not capture $\mathcal{T}_j$ and is poorly positioned, even with repositioning, to contest $\mathcal{T}_{j+1}$.

Returning to the proof of Proposition 7.2, define $C_i$'s worst-case utility, in the DE-CPRG, from using a greedy search, i.e., $s_i = s_{gr}$, as

$w = \min_{s_{-i}} U_i^{ag}(s_{gr}, s_{-i}).$   (7.21)

Given $C_1$ and $C_2$ have initial positions $q_1$ and $q_2$ at the beginning of $\mathrm{CPRG}_j$, $C_i$'s probability of capturing $\mathcal{T}_j$ is at least $\bar{U}_i(q_1, q_2)$, since if $C_{-i}$ were to deviate from equilibrium play, she could, given $\mathrm{CPRG}_j$ is constant-sum, only improve $C_i$'s chance of capturing $\mathcal{T}_j$. Therefore, for $j \geq 1$, $\alpha$ is the worst possible chance $C_i$ has of capturing $\mathcal{T}_j$ using $s_i = s_{gr}$.
In the event $C_i$ fails to capture $\mathcal{T}_j$, she has, on average, at least time $d_m$ to relocate before $\mathrm{CPRG}_{j+1}$ begins. Therefore, by similar reasoning, $\beta$ approximates the worst possible chance $C_i$ has of capturing $\mathcal{T}_{j+1}$ in $\mathrm{CPRG}_{j+1}$ following a failed capture bid in $\mathrm{CPRG}_j$. Because $w$ accounts for targets that are captured after both successes and failures in the previous stage, we write the steady-state expression

$w = \lim_{j \to \infty} \Big( w \, \min_{s_{-i}} P\left(C_i \leftarrow \mathcal{T}_{j+1} \mid C_i \leftarrow \mathcal{T}_j; \, s_{gr}, s_{-i}\right) + (1 - w) \, \min_{s_{-i}} P\left(C_i \leftarrow \mathcal{T}_{j+1} \mid C_{-i} \leftarrow \mathcal{T}_j; \, s_{gr}, s_{-i}\right) \Big).$   (7.22)-(7.23)

In minimizing over $s_{-i}$, the conditional probabilities for which steady-state limits are taken in (7.23) lose their dependence on $j$, ensuring the limits exist and allowing us to write

$w = w \left( \min_{s_{-i}} P\left(C_i \leftarrow \mathcal{T}_{j+1} \mid C_i \leftarrow \mathcal{T}_j; \, s_{gr}, s_{-i}\right) \right) + (1 - w) \left( \min_{s_{-i}} P\left(C_i \leftarrow \mathcal{T}_{j+1} \mid C_{-i} \leftarrow \mathcal{T}_j; \, s_{gr}, s_{-i}\right) \right)$   (7.24)-(7.25)

$\geq w \alpha + (1 - w) \beta.$   (7.26)

Solving for $w$ in (7.26) yields (7.18), the desired inequality. ∎

A couple of remarks are in order. First, (7.18) is, in general, a highly pessimistic result. It is predicated on the assumption that $C_i$ and $C_{-i}$ are arranged in the worst possible configuration, from the perspective of $C_i$, at the beginning of each stage. Naturally, the motion constraints of the cows and the stochastic nature of target arrivals make this unlikely for general distributions. Nevertheless, the assumptions are, given the current state of the art, necessary to establish the inequality chain given the general setting and competitive nature of the game. While (7.18) is conservative in general, it can become tight in extreme scenarios. Reconsidering Example 7.3, we have that $\beta = \alpha$, and (7.18) regresses to $\alpha$ as well. This result confirms our earlier intuition that by using $s_i = s_{gr}$, $C_i$ can become stranded on $\mathcal{R} \setminus \{\theta_1 \cup \theta_2\}$ and realize worst-case capture rates over an arbitrarily large number of games.
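The fixed point behind (7.18) can be sanity-checked numerically: iterating the worst-case stage recursion $w \leftarrow w\alpha + (1-w)\beta$ converges to $\beta/(1+\beta-\alpha)$, which dominates $\alpha$ whenever $\beta \geq \alpha$. The values of $\alpha$ and $\beta$ below are illustrative assumptions; a minimal sketch:

```python
# Illustrative (assumed) utility constants with beta >= alpha, per (7.19)-(7.20).
alpha, beta = 0.2, 0.35

# Iterate the worst-case stage recursion w <- w*alpha + (1 - w)*beta. The map
# is affine with slope (alpha - beta), so |slope| < 1 makes it a contraction
# and the iteration converges geometrically to its unique fixed point.
w = 0.5
for _ in range(200):
    w = w * alpha + (1.0 - w) * beta

w_closed = beta / (1.0 + beta - alpha)  # closed-form fixed point, as in (7.18)
print(w, w_closed)
```

Since the capture chance after a failed bid ($\beta$) is no worse than after a success ($\alpha$), the steady-state rate settles strictly above $\alpha$ unless $\beta = \alpha$, matching the tightness discussion above.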
Second, if $f_O$ is uniform over $\mathcal{R}$, then $\alpha = \beta = \min_{s_{-i}} U_i^{ag}(s_{gr}, s_{-i}) = \frac{1}{2}$, making movement memoryless, and, as noted earlier, it is easy to see that $(s_{gr}, s_{gr}) \in S^{NE}$. For general $f_O$, greedy searching illustrates, in the extreme, the potential pitfalls of focusing on short-term gains at the expense of long-term ambitions. How, then, can this tradeoff be optimized in the context of a DE-CPRG? We argue that precisely characterizing the long-term effects of short-term actions, given the competitive and stochastic nature of a DE-CPRG, is a challenging pursuit. Moreover, equilibrium strategies, should they exist, may place an unrealistic computational burden on $C_i$. In recognition of this, we change course and advocate for defensive strategies with quantifiable and respectable capture rates. First, we step back from pursuing equilibrium strategies and seek, instead, to identify search strategies that avoid prolonged capture droughts and provide reasonable performance guarantees. As a first step in this direction, we define the following search strategy.

Definition 7.6. Consider a DE-CPRG. Define $C_i$'s maximin starting point and maximin utility of the game as

$(\hat{q}_i, \hat{U}_i) = (\arg, \cdot) \max_{q_i} \min_{q_{-i}} \bar{U}_i(q_i, q_{-i}),$   (7.27)

respectively. $C_i$ is said to use a maximin search strategy, denoted $s_i = \hat{s}_i$, if (i) $C_i$ travels to $\hat{q}_i$ after each successful delivery and after each failed capture bid, capturing any target found along the way, and (ii) upon reaching $\hat{q}_i$, $C_i$ plays an equilibrium strategy for the remainder of the current game.

By playing $s_i = \hat{s}_i$, $C_i$ ensures herself a reasonable chance, i.e., $\hat{U}_i$, of capturing $\mathcal{T}_j$ in each $\mathrm{CPRG}_j$ she begins from $\hat{q}_i$. In this way, $\hat{s}_i$, by avoiding prolonged searching over low-valued regions of $\mathcal{R}$, directly addresses the major pitfall of $s_{gr}$. Unfortunately, $\hat{s}_i$ is, itself, poorly suited to a dynamic world because, in a setting where games are played in immediate succession, $C_i$ may spend a significant fraction of her time relocating to $\hat{q}_i$.
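For concreteness, the aggregate quantities $d_a$ and $(q_m, d_m)$ defined in (7.13)-(7.15) can be approximated on a discretized ring. The densities $f_O$ and $f_D$ below are illustrative assumptions, not distributions from the text; a minimal sketch:

```python
import numpy as np

TWO_PI = 2.0 * np.pi
n = 720                                        # grid resolution
theta = np.linspace(0.0, TWO_PI, n, endpoint=False)
dq = TWO_PI / n

def ring_dist(a, b):
    """Shortest arc distance on a ring of length 2*pi (vectorized)."""
    d = np.abs(a - b) % TWO_PI
    return np.minimum(d, TWO_PI - d)

# Assumed densities: f_O peaked at 0, f_D peaked at pi, normalized on the grid.
f_O = np.exp(np.cos(theta))
f_O /= f_O.sum() * dq
f_D = np.exp(np.cos(theta - np.pi))
f_D /= f_D.sum() * dq

Dmat = ring_dist(theta[:, None], theta[None, :])

# (7.13): average transport distance d_a = E[D(O_j, D_j)].
d_a = (f_O[:, None] * f_D[None, :] * Dmat).sum() * dq * dq

# (7.14)-(7.15): minimum average delivery distance d_m and its minimizer q_m.
avg_delivery = (Dmat * f_D[None, :]).sum(axis=1) * dq   # E[D(q, D_j)] per grid point
q_m = theta[np.argmin(avg_delivery)]
d_m = avg_delivery.min()
print(d_a, q_m, d_m)
```

For these assumed densities, $q_m$ sits at the mode of $f_D$, and $d_m \leq d_a$, consistent with $d_m$'s role as a lower bound on the loser's average repositioning window.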
Extending earlier notation, let

$\mathcal{E}_i(q_i(t), q_{-i}(t)) = \{q \in \mathcal{R} \mid D(q, q_i(t)) \leq D(q, q_{-i}(t))\}$   (7.28)

denote the set of points on $\mathcal{R}$ closer to $C_i$ than to $C_{-i}$ at time $t$. For convenience, we will frequently write $\mathcal{E}_i(t)$ as shorthand for $\mathcal{E}_i(q_i(t), q_{-i}(t))$. The following strategy addresses the limitations of $\hat{s}_i$. At a high level, it ensures $C_i$ maintains a reasonable chance of finding a target in the stages ensuing a loss.

Definition 7.7 (dynamic mirroring strategy). Define the following time-dependent, land-claim-related quantities for $t \geq t_a(\mathcal{T}_j)$:

$\mathcal{C}_{i,j}(t) = \{q \in \mathcal{R} : C_i \text{ is the first cow to visit } q \text{ in the time } t - t_a(\mathcal{T}_j)\},$   (7.29)

$\mathcal{C}_j(t) = \mathcal{C}_{1,j}(t) \cup \mathcal{C}_{2,j}(t),$   (7.30)

$\bar{\mathcal{C}}_j(t) = \mathcal{R} \setminus \mathcal{C}_j(t).$   (7.31)

The dynamic mirroring search strategy, denoted $s_i = s_{dm}$, has the following functionality: for all $t \geq 0$, $C_i$ heads toward $C_{-i}$ if $q_{-i}(t) \notin \mathcal{E}_i(t)$; $C_i$ adopts the heading $\hat{\theta}_i$ determined by Algorithm 7.1 if $D(q, q_i(t)) \leq D(q, q_{-i}(t))$ for all $q \in \bar{\mathcal{C}}_j(t)$; and $C_i$ matches $C_{-i}$'s heading otherwise.   (7.32)

In words, $s_i = s_{dm}$ specifies that $C_i$ head toward $C_{-i}$ as if to set up a head-on collision. However, if $C_{-i}$ should venture into $\mathcal{E}_i(t_j)$, then $C_i$ (i) stays just ahead of $C_{-i}$, unless (ii) $C_{-i}$ ventures far enough into $\mathcal{E}_i(t_j)$ that, by continuing with her current heading, as stipulated in line 3 of Algorithm 7.1, $C_i$ is guaranteed to capture $\mathcal{T}_j$.

Algorithm 7.1: determine $\hat{\theta}_i$ for $s_i = s_{dm}$
1: $\hat{\theta}_i$ = undefined;
2: while true do
3:   if $D(q, q_i(t)) \leq D(q, q_{-i}(t))$, $\forall q \in \bar{\mathcal{C}}_j(t)$ then
4:     $\hat{\theta}_i = \theta_i(t)$; break

The attractive feature of $s_{dm}$ is that it allows $C_i$ to be the first to visit all points she is closest to at the beginning of each stage. Moreover, it is a strategy that allows her to guard territory on $\mathcal{R}$, such that, if it is in her favor, she can continue to be the first to explore this territory from one game to the next, until such time as she finds the target. Of course, controlling a particular half of $\mathcal{R}$ is only useful if there is a decent chance it will contain a target in the near future.
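A minimal sketch of the closer-set $\mathcal{E}_i$ in (7.28) on a discretized ring. For any two distinct cow positions, $\mathcal{E}_i$ always covers half the ring's length, which is what makes guarding "a particular half" meaningful; the cow positions below are illustrative assumptions:

```python
import numpy as np

TWO_PI = 2.0 * np.pi
n = 1000
theta = np.linspace(0.0, TWO_PI, n, endpoint=False)

def ring_dist(a, b):
    """Shortest arc distance on a ring of length 2*pi (vectorized)."""
    d = np.abs(a - b) % TWO_PI
    return np.minimum(d, TWO_PI - d)

def closer_set(q_i, q_minus_i):
    """Boolean mask of grid points in E_i(q_i, q_{-i}), per (7.28)."""
    return ring_dist(theta, q_i) <= ring_dist(theta, q_minus_i)

mask = closer_set(1.0, 4.0)   # assumed cow positions on the ring
frac = mask.mean()            # fraction of the ring closer to C_i
print(frac)
```

The boundary of $\mathcal{E}_i$ consists of the two arc midpoints between the cows, so each cow's set is an arc of length $\pi$ regardless of where the cows stand; only the $f_O$ mass of that arc varies.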
To this end, define $C_i$'s persistent maximin point, $q_i^p$, via

$(q_i^p, U_i^p) = (\arg, \cdot) \max_{q_i} \min_{q_{-i}} \int_{\mathcal{E}_i(q_i, q_{-i})} f_O(q) \, dq.$   (7.33)

To chronicle repositioning efforts during the time $C_{-i}$ is busy transporting $\mathcal{T}_j$ from $\mathcal{O}_j$ to $\mathcal{D}_j$, we also consider a related notion,

$q_i^z(q_i, q_{-i}, q_d) = \arg\max_{q \in \mathcal{B}} \int_{\mathcal{E}_i(q, q_d)} f_O(q') \, dq',$   (7.34)

where $\mathcal{B}$ is shorthand for $\mathcal{B}(q_i, D(q_{-i}, q_d))$, i.e., the set of points $C_i$ can reach, from $q_i$, in the time it takes $C_{-i}$ to travel from $q_{-i}$ to $q_d$. Having introduced $s_{dm}$ and $q_i^z$, we may, finally, fuse these ideas to provide a search strategy tailored to the persistent nature of a DE-CPRG [91].

Definition 7.8 (dynamic maximin search strategy). Enumerate the targets captured by $C_{-i}$ in a DE-CPRG as $\mathcal{T}_{-i,1}, \mathcal{T}_{-i,2}, \ldots$. The functionality of the dynamic maximin search strategy, $s_{dmm}$, is described by Algorithm 7.2.

Algorithm 7.2: functionality of $s_i = s_{dmm}$
1: $j = k = 1$;
2: at $t = 0$, travel to $q_i^p$;
3: while true do
4:   while $C_i$ is still searching for her next target $\mathcal{T}_{i,j}$ do
5:     $s_i = s_{dm}$;
6:     if $C_{-i}$ captures $\mathcal{T}_{-i,k}$ then
7:       $C_i$ travels to $q_i^z(q_i(t), q_{-i}(t), \mathcal{D}(\mathcal{T}_{-i,k}))$;
8:       $k \leftarrow k + 1$;
9:   % $C_i$ has found $\mathcal{T}_{i,j}$;
10:  travel to $\mathcal{D}(\mathcal{T}_{i,j})$;
11:  $j \leftarrow j + 1$;
12:  travel to $q_i^p$, capturing any target $\mathcal{T}_{i,j}$ found along the way;

In words, $s_i = s_{dmm}$ affords $C_i$ the following benefits: (1) $C_i$ has a reasonable chance of capturing $\mathcal{T}_{i,j}$ once she begins searching from $q_i^p$, and (2) $C_i$ improves her chance of capturing a target when a positional advantage she possesses can be exploited. The next proposition cements the latter point.

Proposition 7.3. Consider a DE-CPRG in which $s_i = s_{dmm}$. For $k \geq 2$, define $t_{-i,k} = t_d(\mathcal{T}_{-i,k-1})$, i.e., the time $C_{-i}$ delivers $\mathcal{T}_{-i,k-1}$ and begins searching for $\mathcal{T}_{-i,k}$. For $j \in \mathbb{Z}_{\geq 1}$, define $K(j)$ to be the set of $k \in \mathbb{Z}$ such that $t_a(\mathcal{T}_{-i,k}) \in [t_d(\mathcal{T}_{i,j}), t_c(\mathcal{T}_{i,j+1})]$ and $C_i$ has visited $q_i^p$ in $[t_d(\mathcal{T}_{i,j}), t_{-i,k}]$. For $j \in \mathbb{Z}_{\geq 1}$, $k \in K(j)$, $C_i$'s probability of capturing $\mathcal{T}_{k+j}$ satisfies:

(i) $P(C_i \leftarrow \mathcal{T}_{k+j}) \geq U_i^p$;
(ii) $P(C_i \leftarrow \mathcal{T}_{k+j+1}) \geq P(C_i \leftarrow \mathcal{T}_{k+j})$;
(iii) $P(C_i \leftarrow \mathcal{T}_{k+j})$ is non-decreasing
in $k$, i.e., $P(C_i \leftarrow \mathcal{T}_{k'+j}) \geq P(C_i \leftarrow \mathcal{T}_{k+j})$ for all $k' \geq k$, $k' \in K(j)$, where $t_d(\mathcal{T}_{i,j})$ and $t_c(\mathcal{T}_{i,j+1})$ are the times at which $C_i$ delivers $\mathcal{T}_{i,j}$ and captures $\mathcal{T}_{i,j+1}$, respectively.

Proof. By traveling to $q_i^p$, $C_i$ has probability at least $U_i^p$ of capturing the target in the next stage of the DE-CPRG. Moreover, by using $s_i = s_{dm}$ during all subsequent stages until $C_i$ captures her next target, and only shifting $\mathcal{E}_i$ when it is profitable to do so, (i) is established. If $C_i$ shifts $\mathcal{E}_i(t_{-i,k})$ to $\mathcal{E}_i(t_{-i,k+1})$, then it must be that (1) doing so was profitable for $C_i$, or (2) $C_{-i}$ delivered $\mathcal{T}_{-i,k}$ inside $\mathcal{E}_i(t_{-i,k})$. In the first case, $C_i$ improves $P(C_i \leftarrow \mathcal{T}_{k+j+1})$. In the second case, $C_i$ ensures $P(C_i \leftarrow \mathcal{T}_{k+j+1}) \geq P(C_i \leftarrow \mathcal{T}_{k+j})$ by positioning herself on whichever side of $\mathcal{D}(\mathcal{T}_{-i,k})$ is the more advantageous, i.e., controls the larger share of $f_O$, for $C_i$ before the next stage commences. This argument establishes (ii). Finally, (iii) follows from the conjunction of the previous arguments. ∎

Proposition 7.3 speaks to $C_i$'s ability to continually improve her capture probability following each stage in which she was unsuccessful under $s_{dmm}$. The next result addresses $C_i$'s associated long-run capture rate.

Proposition 7.4. Consider a DE-CPRG. By using $s_i = s_{dmm}$, $C_i$ can ensure that her utility satisfies

$\min_{s_{-i}} U_i^{ag}(s_{dmm}, s_{-i}) \geq \dfrac{1}{2 + \frac{\max(0, \bar{d} - d_a)}{2 d_a} + \frac{1}{U_i^p}}$   (7.35)

$\geq \dfrac{U_i^p}{3 + \frac{\max(0, \bar{d} - d_a)}{2 d_a}},$   (7.36)

where $\bar{d} = \int_{\mathcal{R}} f_D(q) D(q, q_i^p) \, dq$ is the average time $C_i$ takes to travel from $\mathcal{D}(\mathcal{T}_{i,j})$ to $q_i^p$.

Proof. To establish the bounds in (7.35)-(7.36), we again structure our analysis around a combination of aggregate statistics and worst-case arguments. Figures 7-5 and 7-6 provide visual support for the arguments that follow. For $s \in S$, denote the set of games $C_i$ loses after delivering $\mathcal{T}_{i,j-1}$, but before reaching $q_i^p$, as $\mathcal{P}_{i,j}(s)$. We will refer to $\mathcal{P}_{i,j}(s)$ as $C_i$'s $j$-th positioning phase. Similarly, denote the set of games $C_i$ loses after having started actively searching from $q_i^p$, but before finding $\mathcal{T}_{i,j}$, as $\mathcal{S}_{i,j}(s)$.
Likewise, we will refer to $\mathcal{S}_{i,j}(s)$ as $C_i$'s $j$-th searching phase.

Figure 7-5: Visual breakdown of a typical interval spanning the time between successive target captures for $C_i$ using $s_i = s_{dmm}$. On average, it takes $C_i$ time $\bar{d}$ to return to $q_i^p$ after delivering $\mathcal{T}_{i,j}$. From the perspective of $C_i$, in the worst case, $C_{-i}$ finds a target at time $t_d(\mathcal{T}_{i,j})^+$, which, on average, is delivered in time $d_a$.

The numbers of games that constitute $\mathcal{P}_{i,j}(s)$ and $\mathcal{S}_{i,j}(s)$ are denoted by $|\mathcal{P}_{i,j}(s)|$ and $|\mathcal{S}_{i,j}(s)|$, respectively. On occasion, and when the meaning is clear, we will drop the dependence on $s$. Assuming $C_{-i}$ is a worst-case rival to $C_i$, $C_{-i}$ would be immediately positioned to capture a target at $t_d(\mathcal{T}_{i,j-1})^+$, i.e., the instant $C_i$ delivers $\mathcal{T}_{i,j-1}$, which she could deliver, on average, in time $d_a$. From the geometry of $\mathcal{R}$, $\mathbb{E}[D(q_i^p, \mathcal{D}_{j-1})] = \bar{d}$, so that during $\mathcal{P}_{i,j}$, $C_i$ travels an average distance of $\bar{d}$. This leaves, on average, $\max(0, \bar{d} - d_a)$ time remaining until $C_i$ reaches $q_i^p$. As noted earlier, when searching uncontested for a target on $\mathcal{R}$, the distance a cow must travel from delivery to pickup, and vice versa, is, on average, at least $d_a$. Therefore, while $C_i$ travels to $q_i^p$, $C_{-i}$ can capture, on average, no more than $\max(0, \bar{d} - d_a)/(2 d_a)$ targets in addition to the target captured at $t_d(\mathcal{T}_{i,j-1})^+$, i.e.,

$\lim_{j \to \infty} \max_{s_{-i}} \mathbb{E}\left[|\mathcal{P}_{i,j}(s_{dmm}, s_{-i})|\right] \leq 1 + \dfrac{\max(0, \bar{d} - d_a)}{2 d_a}.$   (7.37)

A visualization of this most recent line of reasoning is provided in Figure 7-5. Continuing, for each game in $\mathcal{S}_{i,j}$, $C_i$'s chance of capturing $\mathcal{T}_{i,j}$ is, by Proposition 7.3, a Bernoulli random variable with a probability of success of at least $U_i^p$. Hence, the duration of $\mathcal{S}_{i,j}$ is, on average, no more than $(1/U_i^p) - 1$ stages. However, it is possible that, at the instant $C_i$ reaches $q_i^p$, $C_{-i}$ has already explored part of $\mathcal{E}_i$. In a worst-case setting, we can resolve the situation by assuming the target in any partially completed game is found by $C_{-i}$. In this case,

$\lim_{j \to \infty} \max_{s_{-i}} \mathbb{E}\left[|\mathcal{S}_{i,j}(s_{dmm}, s_{-i})|\right] \leq 1 + \left(\dfrac{1}{U_i^p} - 1\right) = \dfrac{1}{U_i^p}.$
(7.38)

Referring to Figure 7-6, the fraction of targets $C_i$ captures using $s_i = s_{dmm}$, i.e., $U_i^{ag}(s_{dmm}, s_{-i})$, is the fraction of green circles in an infinitely long sample run, or, mathematically,

$U_i^{ag}(s) = \dfrac{1}{\lim_{j \to \infty} \mathbb{E}\left[|\mathcal{P}_{i,j}(s)|\right] + \lim_{j \to \infty} \mathbb{E}\left[|\mathcal{S}_{i,j}(s)|\right] + 1},$   (7.39)

where the additional 1 in the denominator of (7.39) represents a green circle, i.e., a capture, in the sequence of games. Then, for $s_i = s_{dmm}$, $C_i$'s worst-case performance satisfies

$\min_{s_{-i}} U_i^{ag}(s_{dmm}, s_{-i}) \geq \min_{s_{-i}} \dfrac{1}{\lim_{j \to \infty} \mathbb{E}\left[|\mathcal{P}_{i,j}(s_{dmm}, s_{-i})|\right] + \lim_{j \to \infty} \mathbb{E}\left[|\mathcal{S}_{i,j}(s_{dmm}, s_{-i})|\right] + 1}$   (7.40)

$\geq \dfrac{1}{2 + \frac{\max(0, \bar{d} - d_a)}{2 d_a} + \frac{1}{U_i^p}}.$   (7.41)

The second inequality, i.e., (7.36), follows from the fact that $U_i^p \in (0, 1)$, with the rightmost term representing the fraction of games $C_i$ would win if she were to travel to $q_i^p$ following each stage, i.e., after both successful and unsuccessful capture bids. The fact that $C_i$ can, in most instances, guarantee a utility strictly greater than this value is a testament to the persistent guarantees of $s_{dmm}$. ∎

As a quick remark, note that for games in which $f_O$ and $f_D$ are more-or-less unimodal, $U_i^p \approx \frac{1}{2}$ and $\bar{d} \approx d_a$, such that (7.35) is approximately $\frac{1}{4}$. Conversely, for games in which $f_O$ and $f_D$ are uniform over $\mathcal{R}$, $q_i^p$ is any point on $\mathcal{R}$, such that repositioning is unnecessary, i.e., $\bar{d} = 0$, and $U_i^p = \frac{1}{2}$, such that $\min_{s_{-i}} U_i^{ag}(s_i, s_{-i})$ is also approximately $\frac{1}{4}$. Naturally, general cases, i.e., general $f_O$ and $f_D$, require customized analysis

(a) A segment of games, from a sample run, illustrating the outcome of each game from the perspective of $C_i$. Green circles denote games in which $C_i$ captures a target. Red circles denote games during $C_i$'s repositioning phase, $\mathcal{P}_{i,j}$, in which $C_{-i}$ captures the target. Finally, blue circles denote games during $C_i$'s search phase, $\mathcal{S}_{i,j}$, in which $C_{-i}$ captures the target.

(b) A segment of games, from a sample run, in which $C_i$ captures a target during $\mathcal{P}_{i,j}$, i.e., en route to $q_i^p$. In this case, $\mathcal{S}_{i,j} = \emptyset$, i.e., $C_i$ transitions from $\mathcal{P}_{i,j}$ to $\mathcal{P}_{i,j+1}$ without a search phase.
Figure 7-6: Segments from possible sample runs, from the perspective of $C_i$, of a DE-CPRG.

to understand how distribution specifics affect the relationship between $\bar{d}$ and $U_i^p$ that drives (7.35). Nevertheless, it is comforting to know that, for select distributions, (7.35) is not overly removed from $\frac{1}{2}$. In instances where this discrepancy is acceptable, perhaps owing to the inability to implement highly complex search strategies or because better-performing strategies are difficult to identify altogether, it is reasonable to advocate that $C_i$ adopt $s_{dmm}$ when playing a DE-CPRG.

7.6 Conclusions and Future Directions

This chapter investigated scenarios in which agents compete to capture targets that arrive dynamically in an environment. For games in which targets must be discovered and then delivered to random locations, it was established that greedy strategies, in general, perform poorly. In response, we specified a search strategy with maximin undertones that is tailored to the persistent nature of the game and, using a high-level, aggregate-oriented analysis, provided performance bounds on the worst-case capture ability of the strategy. For select target distributions, we showed this search policy was able to guarantee a capture rate that was either on par with or within a respectable factor of what would be achieved in an equilibrium.

Dynamic-environment competitive search games are defined, in large part, by the rules governing the arrival and departure of targets to and from the workspace. Permuting solely over the various mechanisms by which targets can (i) enter, (ii) accumulate, and (iii) exit the environment, one quickly generates a collection of games that is both sizeable and functionally diverse. By way of the DE-CPRG, this chapter has studied a number of issues likely to be of overarching relevance to an assortment of these games. However, there remains a bevy of encounters yet to receive a treatment tailored to the specifics by which their environments evolve.
For more elaborate formulations, analyzing these games on an agent-by-agent basis may, as with the DE-CPRG, prove difficult. Developing the appropriate modeling formalisms and the necessary analytic techniques to study these games is a natural course for future work. We speak to these points in more specific terms in the next chapter.

Chapter 8

Summary and Future Directions

This thesis was motivated by the study of systems in which multiple mobile, self-interested agents compete to capture targets. In the scenarios considered, each agent had minimal sensing capabilities and limited prior knowledge of each target's location. It was argued that many real-world systems, including taxi fleets, are, in large part, driven by similar inter-agent search dynamics. For a variety of related scenarios, we asked the question, "What strategy should a particular agent use to search for targets?" To provide an answer, we introduced Cow-Path games, scenarios in which hungry cows compete to capture edible targets, as a framework to understand agent decision-making in adversarial search settings. It was argued that the most basic environment for which the Cow-Path game affords interesting options for strategic play is a ring. As a prelude to our study of these competitive search encounters, we considered the Cow-Path Ring Problem, or CPRP. Here, a single hungry cow, guided by prior information, searches the ring to find a target in minimal expected time. It was shown that the CPRP is a variant of the well-known Cow-Path Problem, and many of the conditions necessary for search-plan optimality hold for both scenarios. Additional stipulations for optimal searching in the CPRP were derived by exploiting the ring's circular topology. Our analysis of the CPRP was encapsulated by two algorithms that, during execution, inspect the locations of adjacent turning points to return an optimal search plan.
The primary contribution of the thesis was our analysis of the Cow-Path Ring Game, or CPRG, a scenario in which two cows compete to capture a target. Key features of the formulation included a shared prior on the target's location and the ability of each cow to track her rival's motion. Strategic options available to each cow made it difficult to determine the location and number of times a cow should turn. To gain analytic traction, we first considered a simplification of the game in which each cow may turn at most once. For any ε > 0, a strategic algorithm was presented to determine an ε-Nash equilibrium, i.e., a search profile from which unilateral deviation cannot appreciably improve a cow's probability of capturing the target. When each cow may turn a finite number of times, it was shown the game may be cast as a dynamic programming problem. In this way, a cow searches by considering not only her chance of finding the target before the next turn, but also her chance of finding the target in the equilibria of the ensuing subgame. Recognizing that the CPRG hinges critically on a number of modeling assumptions, the remainder of the thesis varied a feature of the standard CPRG and studied the associated effects on each cow's decision-making. To this end, the CPRG with asymmetric information examined search encounters, also on the ring, in which each cow had a unique target prior. Additionally, each cow maintained a prior on what she believed her rival's prior to be. When the cows have perfect knowledge, previous results were shown to extend readily. However, for a family of games in which one cow had superior situational awareness, a strategic algorithm was presented that allowed the more informed cow to leverage informational asymmetries whenever possible. As a change of pace, we presented a definition of social welfare for games with asymmetric information and characterized a socially optimal search policy.
Finally, in the dynamic-environment CPRG, targets arrived on the ring dynamically, and each target represented a request for transport from an origin to a destination point. Here, the goal of each cow was to maximize the fraction of targets she captured in steady state, with capture requiring that a target's transport requirement be fulfilled. Upon introducing the necessary machinery, we argued that greedy searching, in which a cow searches to maximize her probability of finding the most recent target to appear in the workspace, can, in particular instances, provide arbitrarily poor performance. The parity of dynamic encounters on a ring was formalized by showing that each cow captures half of all targets in any equilibrium. Moreover, we argued that this quantity is useful as an upper bound on the worst-case utility of any search strategy. Recognizing that it is difficult to resolve the long-term effects of short-term actions in an equilibrium setting, we advocated for defensive strategies that offer reasonable worst-case guarantees. We provided one such search strategy, bounded its performance in the worst case, and showed that, while conservative in general, it provides performance that, for select target distributions, is within a respectable constant factor of the utility achieved in any equilibrium. Taking stock, our research efforts focused on search encounters that took place on a ring and involved two cows. Looking forward, there are a number of open problems that may be of interest to the decision-theory community. Natural extensions include games that take place in alternate environments, feature three or more cows, or both. For example, recognizing that graph environments are a representative abstraction of the road networks on which taxis drive, they are an intriguing venue on which to stage Cow-Path games. In this direction, known results from graph theory may provide useful constructs for characterizing effective search strategies.
It may also prove insightful to extend the defensive search strategies from Chapter 7 to graphs. In crafting that approach for rings, we benefited from only having to manage search efforts along two frontiers, a luxury absent in graphs. Nevertheless, there ought to be some means of extending the notion that, when in low-valued regions of the workspace, it is better to relocate to more favorable confines and execute some type of maximin sweep. As mentioned, Cow-Path games contested between three or more cows are another avenue of future research. Given the feedback nature of Cow-Path games, we speculate that stepping from two- to three-cow games may increase the complexity of the strategic analysis considerably. However, a third cow may also initiate new forms of strategic play. For example, it may be prudent for two of the cows to collude at the expense of the third. Characterizing this mechanism, the conditions under which it occurs, or refuting its existence entirely would shed fundamental insight into the dynamics of general multi-cow games. More fortuitously, we anticipate the marginal complications imposed by adding a fourth, fifth, or n-th cow to the ring will quickly saturate, as each cow need only concern herself with justifying her actions relative to her two immediate neighbors, suggesting the potential to develop a general theory for n-cow games on the ring. In Chapter 6, we provided a notion of social utility that relates to maximizing the collective perceived probability of finding the target. Alternatively, we could also prescribe a temporal notion of social utility. For example, in a DE-CPRG, we could take the social utility to be the negative of the average amount of time a target spends in the environment before being discovered. From this temporal perspective, socially optimal strategies are those that involve two cows cooperating to minimize the average system time of targets.
This would be justifiable if the cows were assured, in advance, that all targets would be split evenly between them. Clearly, competition to secure targets in the DE-CPRG is not necessarily aligned with ensuring targets are found in a timely manner. By contrasting the average time a target spends waiting to be picked up in these two competing frameworks, one could develop a second notion of price of anarchy associated with competitive search games, and one that is likely more pertinent from the target's perspective. The DE-CPRGs studied in this thesis represent only a small portion of what is a large and diverse family of problems. In a dynamic setting, entertaining new options for strategic play and novel models to describe the evolution of the workspace may provide plentiful returns. To this point, recall that in Example 7.3, an equilibrium strategy was for each cow to "take turns" exploring a particular portion of the ring that has a high chance of containing the target. This type of queueing phenomenon may be observed, more overtly, in taxi systems, where vehicles line up at dedicated stands. A more precise characterization of queueing-based searching may provide guidance as to when and where this type of strategy is most appropriate. Similarly, in many dynamic settings, targets accumulate in the workspace, with multiple targets often up for grabs at the same time. Investing in new formulations, e.g., those that represent the accumulation of targets and the movement of agents using continuum models, would likely be necessary to provide a tractable analytic base. Although such an approach favors a more macroscopic view of search operations, in contrast to the agent-based formulations studied in this thesis, any results to emerge would naturally invite opportunities for validation and integration with real-world data. To conclude, this thesis considered search scenarios in which agents compete to capture targets given a prior on the location of targets.
To understand the decision-making process of agents in these settings, we introduced and analyzed a collection of stylized scenarios that emphasized adversarial searching between two agents on a ring. Despite providing both a useful venue in which to frame competitive search games, as well as an initial body of results, there remains a panoply of open research directions worthy of future investigation. Progress in the identified areas would serve to both broaden and deepen the state of the art in this relevant, and we believe promising, branch of search theory. With these ideas in mind, we close with the following remark. Although the study of competitive search games is not without its share of challenges, work in this area has the potential to shed valuable insight into the competitive tension at play in a host of relevant systems. Through the study of Cow-Path games, this thesis has demonstrated that, under appropriate modeling abstractions, it is possible to formally analyze the decision-making process of agents that compete with one another to find targets. Moreover, these results may serve as a valuable stepping stone in the pursuit of many of the aforementioned future work items.

KS

References

[1] N. Agmon, S. Kraus, and G. Kaminka. Multi-robot perimeter patrol in adversarial settings. Proceedings of the International Conference on Robotics and Automation, 2008.

[2] M. Ahmadi and P. Stone. A multi-robot system for continuous area sweeping tasks. Proceedings of the International Conference on Robotics and Automation, 2006.

[3] M. Aigner and M. Fromme. A game of cops and robbers. Discrete Applied Mathematics, 1984.

[4] S. Alexander, R. Bishop, and R. Ghrist. Capture pursuit games on unbounded domains. L'Enseignement Mathématique, 2009.

[5] S. Alpern and S. Gal. The theory of search games and rendezvous. Springer, 2010.

[6] B. Alspach. Searching and sweeping graphs: a brief survey. Le Matematiche, 2004.

[7] V.I. Arkin.
A problem of optimum distribution of search effort. Probability Applications, 1964. Theory of [8] A. Arsie, K. Savla, and E. Frazzoli. Efficient routing algorithms for multiple vehicles with no explicit communications. Transactions on Automatic Control, 2009. [9] R. Baeza-Yates, J. Culberson, and G.J.E. Rawlins. Searching the plane. Information and Computation, 106:234-252, 1993. [10] L. Barriere, P. Fraigniaud, N. Santoro, and D. Thilikos. Searching is not jumping. Proceedings of the 29th Workshop on Graph Theoretic Concepts in Computer Science, 2003. [11] T. Basar and G.J. Olsder. Dynamic Noncooperative Game Theory. SIAM, 1999. [12] A. Beck. On the linear search problem. Israel Journal of Mathematics, 1964. [13] A. Beck. More on the linear search problem. Israel Journal of Mathematics, 1965. 143 [14] A. Beck and M. Beck. Mathematics, 1984. Son of the linear search problem. Israel Journal of [15] A. Beck and D. Newman. Yet more on the linear search problem. Israel Journal of Mathematics, 8, 1970. [16] R. Bellman. An optimal search problem. SIAM Review, 1963. [17] S.J. Benkoski, M.G. Monticino, and J.R. Weisinger. A survey of the search theory literature. Naval Research Logistics, 1991. [18] D.J. Bertsimas and G.J. van Ryzin. A stochastic and dynamic vehicle routing problem in the Euclidean plane. OperationsResearch, pages 601-615, 1991. [19] D.J. Bertsimas and G.J. van Ryzin. Stochastic and dynamic vehicle routing in the Euclidean plane with multiple capacitated vehicles. Operations Research, pages 60-76, 1993. [201 D.J. Bertsimas and G.J. van Ryzin. Stochastic and dynamic vehicle routing with general interarrival time distributions. Advances in Applied Probability, pages 947-978, 1993. [21] B. Bethke, J.P. How, and J. Vian. Group health management of UAV teams with applications to persistent surveillance. Proceedings of the American Control Conference, 2008. [22] D. Bhadauria and V. Isler. Capturing an evader in a polygonal environment with obstacles. 
In Proceeding of the Joint Conference on Artificial Intelligence, 2011. [23] M. Bloomberg and D. Yassky. Taxi cab fact book. New York City Taxi and Limousine Commission, 2014. [24] S. D. Bopardikar, F. Bullo, and J. P. Hespanha. Sensing limitations in the lion and man problem. Proceedings of the American Control Conference, 2007. [25] R. Borie, C. Tovey, and S. Koenig. Algorithms and complexity results for pursuitevasion problems. Proceedings of the InternationalJoint Conference on Artificial Intelligence, 2009. [26] F. Bullo, J. Cortes, and S. Martinez. Distributed Control of Robotic Networks: A mathematical approach to motion coordination algorithms. Princeton University Press, 2009. [27] F. Bullo, E. Frazzoli, M. Pavone, K. Savla, and S.L. Smith. Dynamic vehicle routing for robotic systems. Proceedings of the IEEE, 2011. [28] A. Charnes and W.W. Cooper. The theory of search: optimum distribution of search effort. Management Science, 1958. 144 [29] Y. Chevalyre. Theoretical analysis of the multi-agent patrolling problem. Proceedings of Intelligent Agent Technology, 2004. [30] M.C. Chew. A sequential search procedure. The Annals of Mathematical Statistics, 1967. [31] H. Choset. Coverage for robotics - a survey of recent results. Annals of Mathematics and Artificial Intelligence, 31:113-126, 2001. [32] T.H. Chung and J.W. Burdick. Multi-agent probabilistic search in a sequential decision-making framework. Proceedings of the InternationalConference on Robotics and Automation, 2008. [33] T.H. Chung, G.A. Hollinger, and V. Isler. Search and pursuit-evasion in mobile robotics: a survey. Autonomous Robots, 2011. [34] T.H. Chung and R.T. Silvestrini. Modeling and analysis of exhaustive probabilistic search. Naval Research Logistics, 2014. [35] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2009. [36] E Demaine, S.P. Fekete, and S. Gal. Online searching with turn cost. Theoretical Computer Science, 2006. [37] J.M. Dobbie. 
A survey of search theory. Operations Research, 1968. [38] J. Enright and E. Frazzoli. Optimal foraging of renewable resources. Proceedings of the American Control Conference, 2012. [39] F.V. Fomin and D.M. Thilikos. An annotated bibliography on guaranteed graph searching. Theoretical Computer Science, 2008. [40] R.L. Francis and H.D. Meeks. On saddle point conditions and the generalized Neyman-Pearson problem. Australian Journal of Statistics, 1972. [41] D. Fudenberg and J. Tirole. Game Theory. MIT Press, 1991. [42] S. Gal. Search Games. Academic Press, New York, 1980. [43] G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. Journal of Computer and System Sciences, 2003. [44] L.J. Guibas, J. Latombe, S.M. LaValle, D. Lin, and R. Motwani. Visibility-based pursuit-evasion in a polygonal environment. InternationalJournal of Computational Geometry and Applications, 1999. [45] R. Hohzaki. Discrete search allocation game with false contacts. Naval Research Logistics, 2007. 145 146] V.A. Huynh, J.J. Enright, and E. Frazzoli. Persistent patrol in stochastic environments with limited sensors. In Proceedings of AIAA Conference on Guidance, Navigation, and Control, 2010. 1471 R. Isaacs. Games of pursuit: A technical report, 1951. 148] R. Isaacs. Differential Games. Dover Publications, 1965. [49] V. Isler, S. Kannan, and S. Khanna. Randomized pursuit evasion in a polygonal environment. IEEE Transactions on Robotics, 2005. [50] J.B. Kadane. Discrete search and the Neyman-Pearson lemma. Journal of Mathematical Analysis and Applications, 1968. [51] J.B. Kadane. Optimal whereabouts search. Operations Research, 1971. [52] J.B. Kadane and H.A. Simon. Optimal strategies for a class of constrained sequential problems. The Annals of Statistics, 1977. [53] D.V. Kalbaugh. Optimal search among false contacts. SIAM Journal of Applied Mathematics, 1992. [54] M. Kao, J.H. Rief, and S.R. Tate. 
Searching in an unknown environment: an optimal randomized algorithm for the Cow-Path Problem. Information and Computation, 1996. [551 A. Kolling and S. Carpin. The GRAPH-CLEAR problem: definition, theoretical properties and its connections to multi-robot aided surveillance. Proceedings of the Conference on Intelligent Robots and Systems, 2007. [56] A. Kolling and S. Carpin. Probabilistic graph-clear. Proceedings of International Conference on Robotics and Automation, 2009. [57] A. Kolling and S. Carpin. Pursuit-evasion on trees by robot teams. IEEE Transactions on Robotics, 2010. [58] B.O. Koopman. The Summary Reports Groups of the Columbia University Division of War Research. OEG Report No. 56, 1946. [59] B.O. Koopman. The theory of search part I: kinematic bases. Operations Research, 1956. [60] B.O. Koopman. Research, 1956. The theory of search part II: target detection. Operations [61] B.O. Koopman. The theory of search part III: the optimum distribution of searching effort. Operations Research, 1957. [62] B.O. Koopman. Monthly, 1979. Search and its optimization. 146 The American Mathematical [631 B.O. Koopman. Search and Screening: General Principles with Historical Applications. Pergamon Press, 1980. [64] S. Kopparty and C.V. Ravishankar. A framework for pursuit evasion games in Rn. Information Processing Letters, 2005. [65] H. Lau, S. Huang, and G. Dissanayake. Probabilistic search for a moving target in an indoor environment. Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006. 166] S.M. LaValle. Planning Algorithms. Cambridge University Press, 2006. [67] S.M. LaValle, D. Lin, L.j. Guibas, J. Latombe, and R. Motwani. Finding an unpredictable target in a workspace with obstacles. Proceedings of International Conference on Robotics and Automation, 1997. [68] M. Lehnerdt. On the structure of discrete sequential search problems and of their solutions. Mathematische Operationsforschungund Statistik Series Optimization, 1982. 
[69] L. Liu, C. Andris, and C. Ratti. Uncovering cabdriver's behavior patterns from their digital traces. Computers, Engineers, and Urban Systems, 2010. [70] M. Maschler, E. Solan, and S. Zamir. Press, 2013. Game Theory. Cambridge University [71] J. Nash. Equilibrium points in n-person games. Academy of Sciences, 1950. Proceedings of the National [72] J. Nash. Non-cooperative games. The Annals of Mathematics, 1951. [731 New York City Taxi and Limousine Commission. 2014 taxicab fact book. Promotional brochure, 2013. [74] J. Neyman and E.S. Pearson. On the problem of the most efficient tests of statistical hypothesis. Philosophical Transactions A: Mathematical, Physical, and Engineering Sciences, 1933. [75] R. Nowakowski and P. Winkler. Vertex-to-vertex pursuit in a graph. Discrete Mathematics, 1983. [76] J. O'Rourke. Art Gallery Theorems and Algorithms. Oxford University Press, 1987. [77] M. Osborne. An Introduction to Game Theory. Oxford University Press, 2003. [78] T. Parsons. Pursuit-evasion in a graph. lecture notes in mathematics, 1978. 147 Theory and applications of graphs: [79] N.N. Petrov. A problem of pursuit in the absence of information on the pursued. Differentsial'nye Uravneniya, 1982. [80] H.N. Psaraftis. Vehicle Routing: Methods and Studies, chapter Dynamic Vehicle Routing Problems, pages 223-248. Elsevier (North-Holland), 1988. [81] P. Root. Persistent Patrolling in the Presence of Adversarial Observers. PhD thesis, Massachusetts Institute of Technology, 2014. [82] H. Sato and J.O. Royset. Path optimization for the resource constrained searcher. Naval Research Logistics, 2010. [83] P.D. Seymour and R. Thomas. Graph searching and a min-max theorem for tree width. Journal of Combinatorial Theory, 1993. [84] J. Sgall. Solution of David Gale's Lion and Man problem. Theoretical Computer Science, 2001. [85] S. Smith, S. Bopardikar, and F. Bullo. A dynamic boundary guarding problem with translating targets. 
Proceedings of the Conference on Decision and Control, 2009. [86] S. Smith and D. Rus. Multi-robot monitoring in dynamic environments with guaranteed currency of observations. Proceedings of the Conference on Decision and Control, 2010. [87] S. Smith, M. Schwager, and D. Rus. Persistent robotic tasks: monitoring and sweeping in changing environments. IEEE Transactions on Robotics, 2012. [88] D. Song, C.Y. Kim, and J. Yi. Stochastic modeling of the expected time to search for an intermittent signal source under a limited sensing range. In Proceedings of Robotics Science and Systems, 2010. [891 K. Spieser and E. Frazzoli. The Cow-Path Game: a competitive vehicle routing problem. Proceedings of the Conference on Decision and Control, 2012. [90] K. Spieser and E. Frazzoli. Cow-Path Games with asymmetric information: life as a cow gets harder. Proceedings of the Conference on Decision and Control, 2013. [91] K. Spieser and E. Frazzoli. Dynamic Cow-Path Games: search strategies for a changing world. Proceedings of the American Control Conference, 2014. [92] L. D. Stone. Theory of Optimal Search. Academic Press, New York, 1975. [93] L.D. Stone. Theory of Optimal Search: 2nd Edition. Operations Research Society of America, 1989. [941 C.W. Sweat. Sequential search with discounted income, the discount a function of the cell searched. The Annals of Mathematical Statistics, 1970. 148 [95] S. Zahl. An allocation problem with applications to operations research and statistics. Operations Research, 1963. [96] M. Zhu and E. Frazzoli. On competitive search games for multiple vehicles. Proceedings of the Conference on Decision and Control, 2012. 149