Decentralized Mixing Function Control Strategy for Multi-Robot Informative Persistent Sensing Applications

by

Gavin Chase Hall

B.S. Mathematics, B.S. Mechanical Engineering, B.S. Physics
West Virginia University, 2009

Submitted to the Department of Mechanical Engineering in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology

June 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Mechanical Engineering, May 15, 2014 (signature redacted)

Certified by: Daniela Rus, Professor of Electrical Engineering and Computer Science, Thesis Supervisor (signature redacted)

Certified by: Jean-Jacques E. Slotine, Professor of Mechanical Engineering, Thesis Supervisor (signature redacted)

Accepted by: David E. Hardt, Chair, Department Committee on Graduate Students (signature redacted)

Decentralized Mixing Function Control Strategy for Multi-Robot Informative Persistent Sensing Applications

by Gavin Chase Hall

Submitted to the Department of Mechanical Engineering on May 15, 2014, in partial fulfillment of the requirements for the degree of Master of Science

Abstract

In this thesis, we present a robust adaptive control law that enables a team of robots to generate locally optimal closed-path persistent sensing trajectories through information-rich areas of a dynamic, unknown environment. This controller is novel in that it allows the robots to combine their global sensor estimates of the environment using a mixing function to opt for one of three sensing interpretations and resulting coverage strategies: (1) minimum variance (probabilistic), (2) Voronoi approximation, or (3) Voronoi (geometric).
As the robots travel along their paths, they continuously sample the environment and reshape their paths according to one of these three control strategies so that, ultimately, they travel only through regions where sensory information is nonzero. This approach builds on previous work that used a Voronoi-based control strategy to generate coverage paths [32]. Unlike the Voronoi-based coverage controller, the mixing-function-based coverage controller captures the intuition that globally integrated sensor measurements capture information about an environment more thoroughly than a collection of independent, localized measurements. Using a non-linear Lyapunov function candidate, we prove that the robots' coverage path configurations converge to a locally optimal equilibrium between minimizing sensing error and path length. A path satisfying this equilibrium is called an informative path. We extend the informative path controller to include a stability margin and to be used in conjunction with a speed controller, so that a robot or a group of robots equipped with a finite sensing footprint can stabilize a persistent task by keeping all growing fields within the environment bounded for all time. Finally, we leverage our informative persistent paths to generate a dynamic patrolling policy that minimizes the distance between instantaneous vehicle position and incident customer demand for a large fleet of service vehicles operating in an urban transportation network. We evaluate the performance of the policy by conducting large-scale simulations to show global stability of the model and by comparing it against a greedy service policy and historical data from a fleet of 16,000 vehicles.

Thesis Supervisor: Daniela Rus
Title: Professor of Electrical Engineering and Computer Science

Acknowledgments

I would like to thank Professor Daniela Rus for two fantastic years of my life.
As I was new to research, she taught me many tricks of the trade to help me hit the ground running. Among them were her vision for the application of theory and her knack for clearly conveying complex solutions to select audiences. I like to think of her as a mother figure, except that she avoids me like the plague outside of school. She has been a great friend, and I can always count on her to give me the straight scoop about any issue.

I would like to thank Professor Jean-Jacques E. Slotine for making the technical portions of this research a tad less gruesome. His course lectures single-handedly prepped me for the majority of the theoretical work between the covers of this thesis.

I would also like to thank all of my two friends in the Distributed Robotics Laboratory for making me realize that performing amazing work can actually be somewhat enjoyable.

My biggest thank you goes to my family. Whether it has been music, science, baseball, or trick-or-treating until I was 28, you've made every step of the way so much better. I hope my progress helps to make up for how much my sister let you down.

Dedicated to Dana M. Hall, Michael A. Hall, & Philip E. Harner

Contents

1 Introduction
1.1 Motivation and Goals
1.2 Contribution to Robotics
1.3 Relation to Previous Work
1.4 Thesis Organization

2 Informative Path Controller Using a Mixing Function
2.1 Problem Setup
2.2 Mixing Function Cost
2.3 Mixing Function Control Law
2.4 Deriving Common Control Strategies
2.4.1 Voronoi Control Strategy (α = −∞)
2.4.2 Minimum Variance Probabilistic Control Strategy (α = −1)
2.5 Mixing Function Controller Convergence
2.5.1 Sensory Function Parameterization
2.5.2 Coverage Convergence in a Known Environment
2.5.3 Coverage Convergence in an Unknown Environment
2.6 Single Robot Coverage Algorithm and Simulation
2.6.1 Learning Phase
2.6.2 Path Shaping Phase
2.6.3 Varying Sensing Weight W and Path Weight Wn
2.6.4 Computational Complexity of Single Robot Algorithm
2.7 Multi-Robot Coverage Algorithm and Simulation
2.7.1 Learning Phase
2.7.2 Path Shaping Phase
2.7.3 Varying Sensing Weight W and Path Weight Wn
2.7.4 Computational Complexity of Multi-Robot Algorithm
2.7.5 Robustness Considerations for Initial Waypoint Configurations

3 Informative Persistence Controller for Multiple Robots using Mixing Function Coverage Approach
3.1 Relation to Persistent Sensing Tasks
3.2 Informative Persistence Controller
3.3 Single Robot Informative Persistence Algorithm and Simulation
3.3.1 Learning Phase
3.3.2 Path Shaping Phase
3.3.3 Single Robot Persistence Simulation Discussion
3.4 Multi-Robot Informative Persistence Algorithm and Simulation
3.4.1 Learning Phase
3.4.2 Path Shaping Phase
3.4.3 Multi-Robot Persistence Simulation Discussion

4 Dynamic Patrolling Policy Using an Informative Path Controller
4.1 Motivation
4.2 Problem Formulation
4.2.1 Using Informative Paths
4.2.2 Multi-Agent Controller Extension
4.2.3 Operational Stability
4.3 Dynamic Patrolling Policy
4.3.1 Computational Complexity
4.3.2 Solution Outline
4.3.3 Algorithm Description
4.4 Modeling Historical Data
4.4.1 Arrival and Destination Distributions
4.5 Experiments
4.5.1 Greedy Policy Experiments
4.5.2 Single Loop Experiments
4.5.3 Multi-Loop Experiments
4.5.4 Unseen Test Data
4.6 Results
4.6.1 Unseen Test Data Results
4.7 Discussion

5 Conclusion and Lessons Learned

A Tables of Mathematical Symbols

List of Figures

1-1 Path reshaping process for three robots
1-2 Example of persistent sensing by two robots
1-3 Informative patrolling loops over historical demand distribution in Singapore
2-1 Mixing function sensing behaviors
2-2 Mixing function supermodularity
2-3 Approximation to indicator function
2-4 Mean integral parameter error and Lyapunov-like function in single robot learning phase
2-5 Single robot learning phase with an informative path controller
2-6 Mean waypoint position error under the informative path controller for a single robot
2-7 Single robot path shaping phase with an informative path controller
2-8 Single robot W vs. Wn
2-9 Consensus and integral parameter errors for multiple robots
2-10 Multi-robot learning phase with informative path controller
2-11 Mean waypoint position error under the informative path controller for multiple robots
2-12 Multi-robot path shaping with informative path controller
2-13 Informative path configurations for two environments with different mixing functions
2-14 Multiple robot W vs. Wn
2-15 Sensitivity of Voronoi path separation
2-16 Sensitivity of Voronoi smoothing path separation
2-17 Relaxed path separation sensitivity of minimum variance coverage strategy
2-18 Setup for testing initial waypoint position sensitivity
3-1 Integral parameter error and Lyapunov function candidate of the informative persistence controller for a single robot
3-2 Single robot learning phase with informative persistence controller
3-3 Mean waypoint position error for a single robot
3-4 Persistent task stability margin for a single robot
3-5 Single robot path shaping phase with the informative persistence controller
3-6 Integral parameter error under the informative persistence controller for multiple robots
3-7 Lyapunov-like function in learning phase under the informative persistence controller for multiple robots
3-8 Consensus error under the informative persistence controller for multiple robots
3-9 Multi-robot learning phase with informative persistence controller
3-10 Mean waypoint position error and persistent sensing task's stability margin for multiple robots
3-11 Multi-robot path shaping phase with informative persistence controller
4-1 Arrival and destination distribution and map of Singapore with informative loops
4-2 Dynamic patrolling policy simulator
4-3 Single-loop (CBD) and multi-loop (Singapore-wide) simulation results

List of Algorithms

1 Mixing function controller for a single robot in an unknown environment: robot level
2 Mixing function controller for a single robot in an unknown environment: waypoint level
3 Mixing function controller for multiple robots in an unknown environment: robot level
4 Mixing function controller for multiple robots in an unknown environment: waypoint level
5 Informative persistence controller for a single robot: waypoint level
6 Informative persistence controller for multiple robots: waypoint level
7 Patrolling policy pseudocode

List of Tables

4.1 Total patrolling policy ON CALL distance ratios and utilization factor over 24-hour simulations
A.1 Common symbols for each control strategy
A.2 Single-robot controller symbols
A.3 Multi-robot controller symbols

Chapter 1

Introduction

1.1 Motivation and Goals

When monitoring an unfamiliar and changing environment, robots face two significant challenges: (1) the robots must identify the regions and rates of change corresponding to important sensory information in the environment, and (2) the robots must determine how to optimally position and allocate themselves to cooperatively collect this information. Given a group of robots, each equipped with a sensor to measure the environment, our goal is to derive an adaptive multi-robot control strategy that enables robots to employ multiple sensing strategies to generate a configuration of closed paths that the robots will travel along to maximize their collection of sensory information. These paths are called informative paths, because they drive the robots through locations in the environment where the sensory information is important.
Informative path planning brings the notion of sensory value into the planning problem, adding information to the geometric formulation. This has many applications. For example, it can be used by a team of robots operating in an underground mining environment to estimate regions of significant CH4 accumulation and generate paths that enable continual monitoring of these locations to prevent future mining disasters. Another example would be to use this controller to deploy a team of robots to monitor a wildfire over a large environment. The robots would be able to learn the regions where the fire has spread on-line and generate paths such that, between all robots, all locations with fire damage are constantly sensed, while regions with no fire damage are avoided.

In this thesis, we present a decentralized adaptive control algorithm for generating informative paths in unknown environments for multi-robot systems. This algorithm takes as input sensory information collected by each robot over a dynamic environment and outputs locally optimal paths whose trajectories cover the regions in the environment where sensory information is nonzero. The first feature of this informative path algorithm is a parameter estimation adaptation law the robots use to learn how sensory information is distributed throughout the environment. The second feature is a mixing-function-based coverage controller that reconfigures the paths based on a gradient optimization of a cost function comprised of a class of parameterized sensor mixing functions [25]. Mixing functions dictate how sensor measurements from different robots are combined in order to represent different assumptions about the coverage task.
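To make the first feature concrete, the sketch below illustrates one common way such a parameter estimation adaptation law can work: the sensory function is parameterized as φ(q) = K(q)ᵀa with known basis functions K and unknown weights a, and sampling drives an estimate toward the true weights. The basis centers, gain, and sampling pattern are hypothetical, and this is a toy discrete-time analogue, not the thesis's actual continuous-time controller.

```python
import numpy as np

# Hypothetical sketch: sensory function phi(q) = K(q)^T a, with Gaussian radial
# basis functions K and unknown weights a. A gradient (LMS-style) adaptation law
# drives the estimate a_hat toward values consistent with the robot's samples.
rng = np.random.default_rng(0)
centers = np.array([[0.25, 0.25], [0.75, 0.75], [0.25, 0.75], [0.75, 0.25]])

def K(q, sigma=0.18):
    """Gaussian basis vector evaluated at point q (sigma is an assumption)."""
    d2 = np.sum((centers - q) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

a_true = np.array([1.0, 0.0, 0.0, 0.5])    # unknown ground-truth weights
a_hat = np.zeros(4)                        # robot's running estimate
gamma = 0.5                                # adaptation gain (assumed)

for _ in range(5000):
    q = rng.random(2)                      # sample location along the robot's sweep
    k = K(q)
    # gradient step on the squared prediction error (phi(q) - phi_hat(q))^2
    a_hat += gamma * k * (k @ a_true - k @ a_hat)

print(np.round(a_hat, 2))                  # approaches a_true given rich sampling
```

With persistently exciting samples (here, uniform random positions), the estimate converges; in the thesis this role is played by the robots' motion along their paths.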
By varying the value of a free parameter, and consequently the mixing function class and the robots' aggregate sensor mixing behaviors, the mixing function control strategy can recover multiple common control strategies, including minimum variance (probabilistic), Voronoi smoothing, and strictly Voronoi (geometric) approaches. These control strategies represent a wide range of robot sensor combination abilities. Whereas the probabilistic control strategy promotes global sensor mixing by minimizing the expected variance of all robots' sensor measurements of a point of interest, the geometric control strategy employs no sensor mixing and considers only the sensor measurements of the robot closest to a point of interest [7, 25]. Voronoi smoothing bridges these two strategies by either increasing or decreasing the amount of sensor synthesis between robots in accordance with the mixing function's free parameter value.

The mixing function controller is an extension of the probabilistic and geometric unifying controller introduced in [25]. It consists of placing the waypoints of a path in locally optimal positions that achieve an equilibrium between minimizing the mixing function cost (sensing error) and the informative path length. Minimizing sensing errors allows the robots to be close to the points of interest that they are sensing. Minimizing path length reduces the robots' travel time through the regions of the environment with no sensory value. As the robots discover the structure of the environment, they reshape their paths according to this equilibrium to avoid visiting static areas and focus on sensing dynamic areas. An example of the reshaping process for three robots using a probabilistic control strategy is shown in Figure 1-1. A number of task-specific multi-robot control strategies have been proposed to accomplish informative path planning in a distributed and efficient way [7, 26].
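One concrete family with these limiting behaviors, of the kind used in [25], is g_α(f) = (Σᵢ fᵢ^α)^(1/α) over positive per-robot sensing costs fᵢ. The snippet below (an illustrative sketch with made-up cost values) shows α = −1 producing the harmonic, minimum-variance-style combination, and a large negative α approaching the Voronoi rule minᵢ fᵢ:

```python
import numpy as np

def mixing_function(costs, alpha):
    """Combine per-robot sensing costs f_i > 0 at a point of interest.

    One standard family (cf. [25]) is g_alpha(f) = (sum_i f_i^alpha)^(1/alpha):
      alpha = -1     -> harmonic combination, as in minimum-variance sensor fusion
      alpha -> -inf  -> min_i f_i, i.e. only the closest robot counts (Voronoi)
      -inf < alpha < 0 -> Voronoi smoothing, between the two extremes
    """
    costs = np.asarray(costs, dtype=float)
    return np.power(np.sum(np.power(costs, alpha)), 1.0 / alpha)

costs = [4.0, 1.0, 2.0]                     # illustrative per-robot costs
print(mixing_function(costs, -1))           # 1/(1/4 + 1/1 + 1/2) = 4/7 ≈ 0.571
print(mixing_function(costs, -50))          # ≈ 1.0 = min(costs): Voronoi limit
```

Note that g_α ≤ minᵢ fᵢ for α < 0, which reflects the intuition in the text: combining measurements from several robots never looks worse than relying on the single best robot alone.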
In [27, 33], geometric Voronoi-based controllers with no sensor measurement synthesis between robots were used. As a result of the robustness of the mixing function control strategy, not only can we directly recover the Voronoi-based controller, we can also smoothly approximate Voronoi-based coverage arbitrarily well, while preserving its asymptotic stability and convergence guarantees. Another benefit of the mixing function control strategy is that we can employ a probabilistic approach to circumvent localized sensing sensitivities intrinsic to geometric Voronoi-based controllers, which cause significant topological variance in informative path configurations for nearly identical initial robot configurations. Finally, because the mixing function controller does not require the expensive computation of a Voronoi tessellation over an environment for any of its derived control strategies, informative paths can be generated by robots with limited computational resources.

Generating informative paths using the mixing function control strategy is the first step in stabilizing a persistent sensing task [30, 31] in unknown environments, where the robots are assumed to have sensors with finite sensing radii and are required to revisit a location of interest at a specific calculated frequency. A persistent sensing task is defined as a monitoring scenario that can never be completed due to the continual change of the states of the environment. For example, if the robots were to stop monitoring, the information at some points in the environment would grow unbounded. For a persistent sensing task, we also calculate the speed at which robots with finite sensor footprints collect sensory information along an informative path by instituting a stability margin that guarantees a bound on the difference between the robots' current estimate of the environment and the actual state of the environment for all time and all locations.
A consequence of having finite sensing radii is that the robots are unable to collect data over the entire environment at once. Therefore, as the data over a dynamic region become outdated, the robots must return to that region to collect new data. In order to prevent the robots' model of the environment from becoming too outdated, [30] presented a persistent sensing controller that calculates the speeds of the robots at each point along given paths; these speeds are fittingly referred to as speed profiles.

Figure 1-1: Starting with initial sweeping paths, the robots learn about the environment by observation; the observations are then used to transform the paths so that they are aligned with the important parts of the environment. The paths correspond to the trajectories of the three robots, where the robots' positions along these paths are represented by the black arrows. The important regions of the environment are shown in green. Panels (a)-(f) show iterations 1, 10, 25, 50, 75, and 100.

Figure 1-2: Example of persistent sensing by two robots: (a) stable, (b) unstable. Each robot with a finite sensing radius (red and blue circles around the robots' positions) travels through its path, with the objective of keeping the accumulation function (green dots) low everywhere. The robots collect data at the dynamic regions and shrink the accumulation function. The size of each green dot is proportional to the value of the accumulation function at that location. On the left, a stable speed profile keeps the accumulation function bounded everywhere for all time, whereas on the right, the speed profile is not stable, and the accumulation function grows unbounded in some locations.
This speed controller enables robots to visit faster-changing areas more frequently than slower-changing areas. The persistent sensing problem is defined in [30] as an optimization problem whose goal is to keep a time-varying field as small in magnitude as possible everywhere. This field is referred to as the accumulation function. Where it is not covered by the robots' sensors, the accumulation function grows unbounded, thus indicating a rising need to collect data at that location. Likewise, the accumulation function shrinks where it is covered by the robots' sensors, indicating a decreasing need for data collection. A stable speed profile is defined as one that bounds the largest magnitude of the accumulation function. In this thesis, we extend the computed informative path configurations obtained from the mixing function controller to be used in conjunction with stabilizing speed profiles from the persistent sensing controller to create an informative persistence controller [33] that locally optimizes a persistent sensing task. Figure 1-2 shows an example of two robots performing a persistent sensing task with both a stable and an unstable speed profile.

By assigning the accumulation function to a physical parameter such as oil spill levels, airborne particulate matter accumulation, or aggregate sensor errors, persistent sensing becomes a very practical approach to a wide array of real-world monitoring scenarios. In this thesis, we consider an informative persistent sensing approach to urban Mobility-on-Demand (MOD) systems, where the accumulation function represents the historical number of passenger arrivals at discrete sets of locations over a defined period of time. In our previous work [22], we showed that autonomous driving can be used to mitigate the rebalancing problem current MOD systems face. Our objective is to minimize the waiting time of the passengers and the amount of time the vehicles in the system drive empty between subsequent customer requests.
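The accumulation-function idea can be illustrated with a toy one-dimensional simulation. The growth rates, consumption rate, and footprint below are invented for illustration and are not the model of [30], but they show the key dichotomy: if the robot's sensor removes more accumulation per visit than builds up between visits, the maximum of the accumulation function saturates; otherwise it grows with the horizon.

```python
import numpy as np

def max_accumulation(c, steps, dt=0.01, n=100, speed=0.5, footprint=0.05):
    """Largest value of the accumulation function Z after `steps` time steps.

    Z grows at rate p(q) where a point q on a unit-length closed path is outside
    the robot's sensor footprint, and changes at rate p(q) - c where covered
    (c is the sensor's consumption rate). All rates are illustrative assumptions.
    """
    q = np.linspace(0.0, 1.0, n, endpoint=False)   # points on the closed path
    p = 0.5 + 0.5 * np.sin(np.pi * q) ** 2         # growth rates in [0.5, 1.0]
    Z = np.zeros(n)
    x = 0.0                                        # robot position on the loop
    for _ in range(steps):
        dist = np.abs(q - x)
        covered = np.minimum(dist, 1.0 - dist) < footprint
        Z = np.maximum(Z + dt * np.where(covered, p - c, p), 0.0)
        x = (x + speed * dt) % 1.0                 # travel along the closed path
    return Z.max()

# Strong enough consumption: max Z saturates, i.e. a stable speed profile...
print(max_accumulation(c=20.0, steps=5000), max_accumulation(c=20.0, steps=20000))
# ...consumption weaker than growth at some covered points: Z grows unbounded.
print(max_accumulation(c=0.8, steps=5000), max_accumulation(c=0.8, steps=20000))
```

In the stable case the reported maximum is the same for both horizons, matching the definition of stability above; in the unstable case it scales with simulation length, as in the right panel of Figure 1-2.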
The critical question is: where should each vehicle go once a delivery is complete? To solve this problem, we leverage our informative path and persistent sensing controllers to develop optimized task allocation algorithms in the form of a dynamic patrolling policy for a fleet of MOD service vehicles such as taxis. By using historical arrival distributions as input to our control algorithms, we can compute patrolling loops that minimize the distance driven by the vehicles to get to the next request. The algorithm was trained using one month of data from a fleet of 16,000 taxis in Singapore. The resulting informative loops are used to redistribute the vehicles among stationary virtual taxi stand locations along each loop. We compare the policy computed by our algorithm against a greedy policy, as well as against the ground-truth redistribution of taxis observed on the same dates, and show up to a 6× reduction over historical data in customer waiting time and taxi distance driven without a passenger. These metrics represent two key evaluation criteria: (1) quality of customer service and (2) fuel efficiency.

Figure 1-3: Evolution of six informative patrol loops over the historical demand distribution in Singapore. Service vehicles travel along these loops to minimize the driving distance between subsequent customer pick-ups. Each loop is updated 96 times over a 24-hour period to account for differences in customer demand throughout the day. The peak amplitude in the customer demand distribution represents the central business district of Singapore. Panels: (a) Patrol Loops, (b) Customer Demand Distribution.

1.2 Contribution to Robotics

This thesis makes the following contributions:

• A decentralized robust informative control strategy. We extend the probabilistic and geometric unifying controller presented in [25], so that instead of statically partitioning themselves, a team of robots can adaptively compute closed online paths that continually travel through regions discovered to be important by observation in an unknown and dynamic environment. The provably stable and decentralized adaptive coverage controller uniquely combines the robots' global sensor measurements based on a mixing function to learn the location of dynamic events in the environment, and simultaneously computes closed informative paths based on these aggregated sensor behaviors. A mixing function controller is advantageous because it is amenable to geometric, probabilistic, and analytical interpretations, all of which have previously been presented separately [25]. We introduce a family of mixing functions with a free parameter, α, and show that different values of the parameter correspond to different assumptions about the coverage task: a minimum variance solution (probabilistic strategy) is obtained for α = −1, Voronoi coverage (geometric strategy) is recovered in the limit α → −∞, and Voronoi smoothing coverage is recovered for −∞ < α < 0. Using a minimum variance controller (α = −1), we offer an improvement in informative path stability over the Voronoi approach presented in [33] by showing that small differences in the robots' initial waypoint positions do not result in significantly different informative path configurations. We derive both single-robot and multi-robot cases for each informative path control strategy and prove asymptotic stability and convergence using Lyapunov stability theorems. We develop and analyze single-robot and multi-robot informative path algorithms, and perform simulations in MATLAB for both cases.

• An informative persistence path controller. As [33] did for a Voronoi-based control strategy, we extend our robust adaptive coverage controller so that persistent sensing tasks can now be performed in unknown environments when the robots' stabilizing path configurations are unknown a priori.
Combining a stability metric from persistent sensing tasks with our robust informative control strategy for the single-robot and multi-robot cases, we develop informative persistence controllers that locally optimize the persistent sensing task by generating informative paths for the robots and subsequently increase the stability metric of the persistent sensing task. Lyapunov proofs are used to prove stability of both the single-robot and multi-robot cases. We evaluate and simulate both single-robot and multi-robot informative persistence algorithms in MATLAB.

• A dynamic patrolling policy for a fleet of service vehicles. We instantiate the informative persistence controller in a traffic application: matching supply and demand for taxis or a Mobility-on-Demand transportation system. The dynamic patrolling policy is comprised of multiple patrol loops and a provably stable vehicle redistribution model. Patrol loops are generated using actual historical data from a fleet of 16,000 vehicles over multiple days as the input to our mixing function informative path controller. In line with our objective to match supply and demand, the computed patrol loops minimize the instantaneous distance between customer requests and taxi positions, as well as the length of each patrol loop. Once a configuration of patrol loops has been computed, a centralized scheduling algorithm is implemented to manage request allocation and vehicle redistribution for large-scale (> 500 agents) MATLAB simulations. Our dynamic patrolling algorithm is tested against a greedy service policy and actual historical taxi performance in Singapore.

1.3 Relation to Previous Work

This work builds on several bodies of work: (1) adaptive control, (2) informative path planning, (3) coverage control, and (4) multi-agent systems.
Adaptive path planning algorithms traditionally consider the real-time mapping of paths to a set of desired states in an unknown or dynamic environment in continuous-time systems. For example, in [34], an optimization for path planning was presented for the case of partially known environments. In [8], the authors present a path planning algorithm for deploying unmanned aerial vehicle systems in an unknown environment. Most of the previous work in path planning focuses on computing an optimal path, according to some metric, to reach a destination [14, 23]. In this thesis, the objective of the robots is not to reach a final destination, but instead to continually travel along their computed closed path trajectories through regions of the environment where sensory information is nonzero. We prioritize generating paths that allow the robots to travel through regions of interest in an unknown environment, using adaptive control strategies to create a novel algorithm for computing informative paths.

Informative path sensing extends adaptive path planning algorithms with an emphasis on efficiently measuring and monitoring a dynamic environment. Such a method for computing paths that provide the most information about an environment was presented in [28], with the aim of adaptively learning and traversing through regions of interest with multiple robots. Informative sensing while maintaining periodic connectivity for the robots to share information and synchronize was examined in [12]. Our work considers adaptive path planning and informative sensing in a similar context, by using a robust control strategy that can use both geometric and probabilistic sensor measurement behavior to optimize a coverage task in a dynamic environment, as opposed to [25], where a non-adaptive probabilistic and geometric unifying control strategy was implemented for a known, static environment. Cortes et al.
[7] introduced a geometric control strategy for multi-robot coverage in a known environment that continually drives the robots toward the centroids of their Voronoi cells, or centroidal Voronoi configuration. Schwager et al. [27] extended this work by enabling the robots to sample and adaptively learn an unknown environment before they began reshaping their paths. Similar work in Voronoi coverage includes [11], where the objective was to design a sampling trajectory that minimizes the uncertainty of an estimated random field at the end of a time frame. Another common approach in coverage control is a probabilistic strategy. For example, [13] proposes an algorithm for positioning robots to maximize the probability of detecting an event that occurs in the environment and minimizes the predictive variance in a time frame. Both geometric and probabilistic control strategies are based on an optimization that the controllers solve through the evolution of a dynamical system. Geometric and probabilistic control strategies were unified in a mixing function controller introduced in [25]. This work is the most relevant to our thesis, because it enables a group of agents to position themselves statically in locally optimal locations according to either a probabilistic or Voronoi sensing coverage interpretation of a known environment. In this thesis, we build upon the mixing function controller by defining an agent's closed path as a set of waypoints that can distributedly execute a parameter adaptation law and a decentralized gradient control law to learn an unknown environment from robots' estimates and compute informative sensing paths, respectively. A similar extension of a pre-existing control law to enable informative path generation was presented in [32], where a Voronoi-based control strategy introduced in [27] served as the inspiration.
The resulting informative paths computed by our mixing function control strategy locally optimize the sensing position of each waypoint while minimizing the length of the informative path traveled by the robots. Our mixing function control strategy can also be used in conjunction with a governing region revisit policy or speed controller to achieve persistent sensing. The persistent sensing concept motivating this thesis was introduced in [30], where a linear program was designed to calculate the robots' speeds at each point along given paths, in order for them to stabilize a persistent sensing task. A persistent sensing task entails bounding the growth of sensory information within the environment for all time. Examples of growing sensory information could include the amount of rainfall accumulated in a given area, or the amount of measurement uncertainty at a point of interest in the environment. In [30], the robots were assumed to have full knowledge of the environment and were given pre-designed paths. Following the method introduced in [33] for Voronoi coverage, in this thesis we remove all prior environment assumptions by having the robots learn the environment through parameter estimation, and then use this information to shape their paths into informative persistence paths. By removing these constraints, we create a viable persistent sensing strategy for unknown and dynamic environments. Persistent sensing is related to sweep coverage [5], where robots with finite sensor footprints must sweep their sensor over every point in the environment. The problem is also related to environmental monitoring research such as [4, 6, 15, 18, 37]. In this prior work, the authors often use a probabilistic model of the environment, and estimate the state of that model using a Kalman filter. The robots are then controlled so as to maximize a metric on the quality of the state estimate. Due to the complexity of the models, performance guarantees are difficult to obtain.
In this thesis, based on our fully connected robot network, we can provide guarantees on the boundedness of the accumulation function. By likening the concept of informative persistence sensing to patrolling problems [9, 21], we are able to propose a control strategy that distributedly uses an informative persistence patrolling loop to locally optimize a task allocation scenario in a dynamic transportation network. Distributed dynamic vehicle routing scenarios are considered in [1], where events occur according to a random process and are serviced by the robot closest to them. Work on optimal task allocation dates to [19] and [10]. Mobility-on-Demand (MOD) is a similar paradigm for dealing with increasing urban congestion. Generally speaking, the objective of MOD problems is to provide on-demand rental facilities of convenient and efficient modes of transportation [20]. Load balancing in DTA problems essentially reduces to the Pickup and Delivery Problem (PDP), whereby passengers arriving into a network are transported to a delivery site by vehicles. Autonomous load balancing in MOD systems has been studied in [22], where a fluid model was used to represent supply and demand. In this thesis, we employ a PDP problem formulation to model an urban transportation network. Socially-motivated optimization criteria have also been considered in prior work. In [24, 36], social optimum planning models were used to compute vehicle paths. Optimization of driving routes subject to congestion was considered in [17]. In a broader context, [16] observed the effect that multiple service policies had on logistic taxi optimization. More recently, in [35] we studied both system-level and social optimization criteria, showing a relationship between urban planning, fuel consumption, and quality of service metrics. In this work we consider similar evaluation models, showing how we can achieve an improvement with respect to all three of these aforementioned points of interest.
1.4 Thesis Organization This thesis is divided into five chapters. Chapter 2 provides the main theoretical foundation of the thesis and derives the mixing function informative path controller for both single and multi-robot systems. Simulations and validations of the control algorithms are shown for a wide array of robot control strategies including minimum variance, Voronoi smoothing, and strictly Voronoi approaches. Chapter 3 extends the informative path controller to persistent sensing and introduces stability margin requirements to the mixing function controller. Simulations are shown for a minimum variance informative persistence control algorithm. Chapter 4 presents a dynamic patrolling policy for a fleet of service vehicles in a MOD system using informative paths derived from Voronoi-based controllers. Chapter 5 concludes the thesis with final reflections and lessons learned. Chapter 2 Informative Path Controller Using a Mixing Function The idea of using a mixing function for static coverage in a known environment with multi-robot systems was introduced in [25] and was shown to produce results that were more stable numerically as compared to a geometric Voronoi-based approach. In this chapter, we build on this robustness insight and show that we can use a mixing-function-based approach to create an informative path controller that is more stable numerically than the results in [33]. The mixing function control strategy consists of an adaptation law for parameter estimation and a gradient optimization of a coverage cost function consisting of a sensing error cost, a robot path length cost, and a parameterized mixing function. We show that informative paths generated by this control strategy can be altered by varying a free parameter \alpha to enable different sensor estimate mixing behaviors between robots that can be interpreted as either probabilistic, geometric approximation, or geometric.
The resulting informative paths computed by the mixing function control strategy, regardless of the sensing interpretation, locally optimize the coverage task. A mathematical formulation of the problem follows.

2.1 Problem Setup

Remark 1 All mathematical symbols used in this chapter are defined in Appendix A.

A sensory function, defined as a map \phi : Q \to \mathbb{R}_{\geq 0}, where Q is a convex, compact environment, determines the constant rate of change of the environment at an arbitrary point q \in Q. Let there be N \in \mathbb{Z}_+ robots identified by r \in \{1, \ldots, N\}. Robot r is equipped with a sensor to make a point measurement \phi(p_r) at its position p_r \in \mathbb{R}^2 while traveling along its closed path f_r : [0,1] \to Q \subset \mathbb{R}^2, consisting of a finite number n_r of waypoints. Note that the n_r waypoints corresponding to robot r are different from the n_{r'} waypoints corresponding to robot r', \forall r' \neq r. The position of the ith waypoint on f_r is denoted p_i^r \in \mathcal{P} \subset \mathbb{R}^{d_{\mathcal{P}}}, where i \in \{1, \ldots, n_r\}, \mathcal{P} is the state space of a single waypoint, and d_{\mathcal{P}} is the dimension of the state space. We define P_r = [p_1^r, \ldots, p_{n_r}^r] \in \mathcal{P}^{n_r} and P = [P_1, \ldots, P_N] \in \mathcal{P}^{N n_r} as the configuration vectors of robot r and of all waypoints, respectively. Because f_r is closed, each waypoint i has a previous waypoint i-1 and a next waypoint i+1 related to it, which are called the neighbor waypoints of i. Note that i+1 = 1 for i = n_r, and i-1 = n_r for i = 1. A robot moves between sequential waypoints in a straight-line interpolation. For each waypoint, the cost of the sensing estimate of a point q \in Q from its position p_i^r is given by the function

f(p_i^r, q) = \|q - p_i^r\|^2, (2.1)

where f(p_i^r, q) \in \mathbb{R}_{\geq 0} and is differentiable with respect to p_i^r. The sensor measurement estimates of the N n_r waypoints are combined in a function g(f(p_1^1, q), \ldots, f(p_{n_N}^N, q)), called the mixing function [25]. The mixing function g_\alpha : \mathbb{R}^{N n_r} \to \mathbb{R} defines how sensory information from different robots is combined to give an aggregate cost of the waypoints' estimate of q.
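The sensing cost (2.1) and the closed-path neighbor indexing can be sketched in a few lines of Python (an illustrative sketch, not code from the thesis; the function names are our own):

```python
import numpy as np

def sensing_cost(p, q):
    """Sensing error cost f(p, q) = ||q - p||^2 of estimating point q from waypoint p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((q - p) ** 2))

def neighbor_indices(i, n):
    """Previous and next waypoint indices on a closed path of n waypoints (0-indexed),
    so the last waypoint wraps forward to the first and vice versa."""
    return (i - 1) % n, (i + 1) % n
```

With 0-indexing, the wrap-around conditions i+1 = 1 for i = n_r and i-1 = n_r for i = 1 become the modulo operations above.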
We propose a mixing function of the form

g_\alpha = \Big( \sum_{r=1}^{N} \sum_{i=1}^{n_r} f(p_i^r, q)^\alpha \Big)^{1/\alpha}, (2.2)

where \alpha is a free parameter. The mixing function manifests assumptions about the coverage task; in that, by changing the mixing function we can derive a variety of distributed controllers, including Voronoi coverage control (\alpha = -\infty), probabilistic coverage control (\alpha = -1), and Voronoi smoothing coverage control (-1 > \alpha > -\infty) [25]. Consider a sensing task in which an event of interest occurs randomly at a point q and is sensed at a distance by sensors located on different robots. The mixing function (2.2) assumes that different waypoints positioned at p_i^r and p_{i'}^{r'} may both have some sensory information about the event, instead of only counting the information from the waypoint that is closest to q as in the Voronoi approach. Unlike the geometric Voronoi approach, the mixing function captures the intuition that using more sensor estimates may provide a more accurate estimate of a point of interest in the environment than a single localized sensor estimate of the same point. Mixing function coverage for -1 \geq \alpha > -\infty is shown in Figure 2-1a, where the overlap of sensor estimates at two waypoint locations is shown as the intersection of two circles. Figure 2-1b shows that the Voronoi coverage case only considers robots' sensor estimates of q within their Voronoi partition [7], thus allowing for no sensor estimate mixing between waypoints.

Figure 2-1 (figure omitted; panel (a) Mixing Function Schematic, -1 \geq \alpha > -\infty; panel (b) Voronoi Schematic, \alpha = -\infty): The mixing function defines how sensor measurements of the convex environment are shared by the waypoints. For probabilistic and Voronoi smoothing cases (-1 \geq \alpha > -\infty), waypoints combine sensor estimates.
For the Voronoi case (\alpha = -\infty), only sensor measurements of points of interest within a waypoint's Voronoi partition are considered, and waypoints do not combine sensor estimates.

The mixing function has several important properties. For \alpha \geq 1, g_\alpha becomes the \ell_p-norm of the vector [f_1, \ldots, f_{N n_r}]^T. When \alpha < 1, g_\alpha is non-convex and not a norm, because it violates the triangle inequality. When \alpha < 1, g_\alpha is smaller than any of its arguments alone. Therefore, the cost of sensing at a point q with different waypoints positioned at p_i^r and p_{i'}^{r'}, \forall (r', i') \neq (r, i), is less than the cost of sensing with only one of the waypoints individually. Furthermore, the decrease in g_\alpha from the addition of a second waypoint is greater than that from the addition of a third waypoint, and so on. Thus, there is a successively smaller benefit to adding more robots. This property is called supermodularity, and is shown in Figure 2-2.

Figure 2-2 (figure omitted): As more sensing estimates are considered, the property of supermodularity dictates that the amount by which the mixing function is decreased becomes increasingly less.

2.2 Mixing Function Cost

Building upon [25, 33] and using (2.2), we propose a generalized, non-convex cost function of the form

H(P) = w_s \int_Q g(f(p_1^1, q), \ldots, f(p_{n_N}^N, q)) \phi(q)\,dq + w_n \sum_{r=1}^{N} \sum_{i=1}^{n_r} \|p_{i+1}^r - p_i^r\|^2, (2.3)

where \|\cdot\| denotes the \ell_2-norm, and the integrand g(f(p_1^1, q), \ldots, f(p_{n_N}^N, q)) represents the aggregated sensing estimate of all waypoints at a single arbitrary point q, with a corresponding weight w_s \in \mathbb{Z}_+. Integrating over all points in Q, weighted by \phi(q), gives the first term of the cost function. The second term of the cost function represents the cost of positioning neighboring waypoints of the same robot too far from one another. Ultimately, this term dictates the cost assigned to the final length of the informative path, and it is given a corresponding weight w_n \in \mathbb{Z}_+.
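The limiting behaviors of g_\alpha described above, the Voronoi limit \alpha \to -\infty and supermodularity for \alpha < 1, can be checked numerically. The following is an illustrative Python sketch under our own naming, not code from the thesis:

```python
import numpy as np

def g_alpha(costs, alpha):
    """Mixing function (2.2): g_alpha = (sum_k f_k^alpha)^(1/alpha)."""
    c = np.asarray(costs, dtype=float)
    return float(np.sum(c ** alpha) ** (1.0 / alpha))

f = [4.0, 9.0, 25.0]          # sensing costs of three waypoints at a fixed point q
# Voronoi limit: as alpha -> -inf, g_alpha approaches min(f).
assert abs(g_alpha(f, -40) - min(f)) < 1e-3
# For alpha < 1, the aggregate cost is below every individual cost alone.
assert g_alpha(f, -1) < min(f)
# Supermodularity: each additional sensor estimate helps less than the last.
drop2 = g_alpha(f[:1], -1) - g_alpha(f[:2], -1)
drop3 = g_alpha(f[:2], -1) - g_alpha(f, -1)
assert drop2 > drop3 > 0
```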
Our goal is to develop a controller that stabilizes the waypoints around configurations P* that minimize H [25]. The general mixing function cost (2.3) can be shown to recover several common existing coverage cost functions. Drawing out the relations between these different coverage algorithms will suggest new insights into when one algorithm should be preferred over another. Substituting (2.1) and (2.2) into the general cost function from (2.3), we explicitly derive the mixing function cost

H_\alpha(P) = w_s \int_Q \Big( \sum_{r=1}^{N} \sum_{i=1}^{n_r} (\|q - p_i^r\|^2)^\alpha \Big)^{1/\alpha} \phi(q)\,dq + w_n \sum_{r=1}^{N} \sum_{i=1}^{n_r} \|p_{i+1}^r - p_i^r\|^2. (2.4)

This robust cost function consists of a sensing cost, a robot path length cost, and a mixing function cost. An equilibrium is reached between these individual costs when \partial H_\alpha(P)/\partial p_i^r = 0. This optimization defines how the mixing function control strategy generates informative paths. A formal definition of informative paths for multiple robots follows.

Definition 1 (Informative Paths for Multiple Robots using a Mixing Function) A collection of informative paths for a multi-robot system corresponds to the set of waypoint locations for each robot that locally minimizes (2.4).

2.3 Mixing Function Control Law

Because H_\alpha is non-convex, a gradient-based controller of the form

u_i^r = -K_i^r \frac{\partial H_\alpha(P)}{\partial p_i^r} (2.5)

yields locally optimal waypoint configurations P* for a control input u_i^r with integrator dynamics \dot{p}_i^r = u_i^r, and a strictly positive definite gain matrix K_i^r [25].
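As an illustrative sketch (our own code, assuming a uniform \phi and the unit square as Q), the cost (2.4) can be approximated by grid quadrature and used to compare waypoint configurations:

```python
import numpy as np

def H_alpha(P, alpha, phi, ws=1.0, wn=1.0, grid=40):
    """Grid-quadrature approximation of the mixing function cost (2.4)
    for a single robot (N = 1) whose closed path has waypoints P (n x 2)
    in the unit square."""
    P = np.asarray(P, dtype=float)
    xs = (np.arange(grid) + 0.5) / grid
    qx, qy = np.meshgrid(xs, xs)
    Q = np.stack([qx.ravel(), qy.ravel()], axis=1)
    f = ((Q[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # f(p_i, q) = ||q - p_i||^2
    g = (f ** alpha).sum(axis=1) ** (1.0 / alpha)        # mixing function g_alpha
    sensing = ws * np.mean(g * phi(Q))                   # integral over the unit area
    length = wn * np.sum((np.roll(P, -1, axis=0) - P) ** 2)
    return float(sensing + length)

uniform = lambda Q: np.ones(len(Q))
# A waypoint at the center senses a uniform environment more cheaply than one in a corner.
assert H_alpha([[0.5, 0.5]], -1, uniform) < H_alpha([[0.05, 0.05]], -1, uniform)
```

A gradient controller such as (2.5) descends exactly this landscape; here the comparison only illustrates that centrally placed waypoints lower the sensing term.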
By substituting the explicit value of H_\alpha(P), (2.5) becomes

\dot{p}_i^r = -K_i^r \Big[ w_s \int_Q \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} \frac{\partial f(p_i^r, q)}{\partial p_i^r} \phi(q)\,dq + w_n \frac{\partial}{\partial p_i^r} \sum_{j=1}^{n_r} \|p_{j+1}^r - p_j^r\|^2 \Big]. (2.6)

By expanding f(p_i^r, q) = \|q\|^2 - 2 q^T p_i^r + \|p_i^r\|^2, the general gradient-based controller (2.6) becomes

\dot{p}_i^r = -K_i^r \Big[ w_s \int_Q \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} (-q + p_i^r) \phi(q)\,dq - w_n (p_{i+1}^r + p_{i-1}^r - 2 p_i^r) \Big]. (2.7)

Using this gradient descent approach, we propose the following generalized mixing function control law to locally minimize (2.4) and enable waypoints to converge to an equilibrium configuration

\dot{p}_i^r = K_i^r \Big[ w_s \int_Q \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} (q - p_i^r) \phi(q)\,dq + w_n (p_{i+1}^r + p_{i-1}^r - 2 p_i^r) \Big]. (2.8)

By substituting the values of the sensor cost (2.1) and the mixing function (2.2), the mixing function control law is explicitly defined as

\dot{p}_i^r = K_i^r \Bigg[ w_s \int_Q \frac{(\|q - p_i^r\|^2)^{\alpha-1}}{\big( \sum_{r'=1}^{N} \sum_{i'=1}^{n_{r'}} (\|q - p_{i'}^{r'}\|^2)^\alpha \big)^{(\alpha-1)/\alpha}} (q - p_i^r) \phi(q)\,dq + w_n (p_{i+1}^r + p_{i-1}^r - 2 p_i^r) \Bigg]. (2.9)

It follows from (2.8) and (2.9) that the term inside the integral, (\|q - p_i^r\|^2)^{\alpha-1} \big( \sum_{r'} \sum_{i'} (\|q - p_{i'}^{r'}\|^2)^\alpha \big)^{(1-\alpha)/\alpha}, is equivalent to (f(p_i^r, q)/g_\alpha)^{\alpha-1}. This term is important because it gives an approximation to the indicator function of the Voronoi partition of waypoint i of robot r. The approximation improves as \alpha \to -\infty. In addition to giving an approximation to a Voronoi partition, (f(p_i^r, q)/g_\alpha)^{\alpha-1} defines how the sensor estimates of different robots are combined over the environment. As shown in Section 2.4, Voronoi coverage is defined as \lim_{\alpha \to -\infty} (f(p_i^r, q)/g_\alpha)^{\alpha-1}. At this limit, there is no sensor mixing between different robots. Using this intuition, we are able to use our coverage controller to approximate a Voronoi coverage controller with a higher degree of accuracy as we decrease \alpha towards -\infty. Even for values of \alpha far from that limit, e.g. \alpha = -10 or \alpha = -15, the smoothing controller approximates the Voronoi partition arbitrarily well. The resulting contours of (f(p_i^r, q)/g_\alpha)^{\alpha-1} for various \alpha values are shown in Figure 2-3.

Figure 2-3 (four contour panels omitted, at \alpha values from -0.5 down to -15): Contour plots of (f(p_i^r, q)/g_\alpha)^{\alpha-1} are shown for two robots with five waypoints each. As \alpha \to -\infty, the contours approach a Voronoi partition.

Next, we define three substitution variables analogous to mass moments of rigid bodies. The mass M_i^r, first mass moment Y_i^r, and centroid C_i^r of environment Q for a mixing function are defined as

M_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} \phi(q)\,dq, (2.10)

Y_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} q\, \phi(q)\,dq, (2.11)

C_i^r = \frac{Y_i^r}{M_i^r}. (2.12)

Let e_i^r = C_i^r - p_i^r. Note that f strictly increasing in \|q - p_i^r\| and \phi(q) strictly positive imply both M_i^r > 0, \forall Q \neq \emptyset, and that C_i^r is in the interior of Q. Thus M_i^r and C_i^r have properties intrinsic to physical masses and centroids. Using these inertial property substitutions, the mixing function control law (2.9) is defined as

\dot{p}_i^r = \frac{K_i^r}{\beta_i^r} (M_i^r e_i^r + \alpha_i^r), (2.13)

where

\alpha_i^r = w_n (p_{i+1}^r + p_{i-1}^r - 2 p_i^r), (2.14)

\beta_i^r = M_i^r + 2 w_n > 0. (2.15)

Remark 2 \beta_i^r > 0 normalizes the weight distribution between sensing and staying close to neighboring waypoints.

2.4 Deriving Common Control Strategies

In this section, we use the mixing function cost (2.4) and coverage control law (2.13) to derive Voronoi and minimum variance control strategies. These strategies represent the range of robot sensor behaviors that can be produced by using the mixing function. Whereas a Voronoi coverage strategy does not combine sensor measurements from different robots, a minimum variance strategy combines all robots' sensor measurements to minimize the expected variance of the global sensor estimate.

2.4.1 Voronoi Control Strategy (\alpha = -\infty)

From [27], we define the Voronoi partition of the ith waypoint along f_r as

V_i^r = \{ q \in Q : \|q - p_i^r\| \leq \|q - p_{i'}^{r'}\|, \forall (r', i') \neq (r, i) \}, (2.16)

where r, r' \in \{1, \ldots, N\}, i \in \{1, \ldots, n_r\}, and i' \in \{1, \ldots, n_{r'}\}. Because \lim_{\alpha \to -\infty} g_\alpha(f(p_1^1, q), \ldots, f(p_{n_N}^N, q)) = \min_{r,i} f(p_i^r, q), it follows that for \alpha = -\infty, g_{-\infty} = \min_{r,i} \|q - p_i^r\|^2, which implies the Voronoi indicator function (f(p_i^r, q)/g_\alpha)^{\alpha-1} = 1 for q \in V_i^r. Intuitively, the min stipulates that there is no sharing of sensor measurements between waypoints over the environment, and consequently, waypoint i of robot r considers only q \in V_i^r. As a result of no sensor mixing between different robots, the cost incurred by all the robots due to the event at q is the same as that incurred by the robot that is closest to q. Thus, for g_{-\infty}, the coverage cost of the mixing function controller is equivalent to the coverage cost of a Voronoi controller, which is defined in [32] as

H_V = w_s \sum_{r=1}^{N} \sum_{i=1}^{n_r} \int_{V_i^r} \|q - p_i^r\|^2 \phi(q)\,dq + w_n \sum_{r=1}^{N} \sum_{i=1}^{n_r} \|p_{i+1}^r - p_i^r\|^2. (2.17)

By redefining the mass M_i^r, first mass moment Y_i^r, and centroid C_i^r of waypoint i's Voronoi partition V_i^r as

M_i^r = \int_{V_i^r} w_s \phi(q)\,dq, (2.18)

Y_i^r = \int_{V_i^r} w_s \phi(q)\, q\,dq, (2.19)

C_i^r = \frac{Y_i^r}{M_i^r}, (2.20)

and by setting e_i^r = C_i^r - p_i^r, the resulting gradient descent Voronoi control law is derived as

u_i^r = \frac{K_i^r}{\beta_i^r} (M_i^r e_i^r + \alpha_i^r), (2.21)

where \alpha_i^r and \beta_i^r are the same as in (2.14) and (2.15), respectively. From (2.21), we can see that the Voronoi control law differs from our proposed control law due to the absence of a mixing function in the mass properties M_i^r and Y_i^r of the Voronoi partition V_i^r.

2.4.2 Minimum Variance Probabilistic Control Strategy (\alpha = -1)

In this section, we use a mixing function with free parameter \alpha = -1 to derive a control strategy that minimizes the robots' expected variance of their measurements of a point of interest q. We will formulate an optimal Bayesian estimator for the location of q given the aggregate measurements of the waypoints. Assume that the waypoints have noisy measurements of the position of a point of interest in the environment. Let the point of interest be given by a random variable q that takes on values in Q, where waypoint i of robot r has the noisy measurement x_i^r = q + z of the point of interest.
Here, z \sim \mathcal{N}(0, I_{2 \times 2} f(p_i^r, q)) is a bi-variate normally distributed random variable, and I_{2 \times 2} is an identity matrix. The variance of the measurement, f(p_i^r, q), is a function of the position of the sensor estimate and the point of interest. Given this variance, the measurement likelihood of waypoint i of robot r is

\Pr(x_i^r \mid q : p_i^r) = \frac{1}{2\pi f(p_i^r, q)} \exp\Big( -\frac{\|x_i^r - q\|^2}{2 f(p_i^r, q)} \Big). (2.22)

Assuming the measurement estimates obtained by different waypoints conditioned on q are independent, and that \phi(q) is the prior distribution of q's position, Bayes' Theorem gives the posterior distribution

\Pr(q \mid x_1^1, \ldots, x_{n_N}^N) = \frac{\prod_{r=1}^{N} \prod_{i=1}^{n_r} \Pr(x_i^r \mid q : p_i^r)\, \phi(q)}{\int_Q \prod_{r=1}^{N} \prod_{i=1}^{n_r} \Pr(x_i^r \mid q : p_i^r)\, \phi(q)\,dq}. (2.23)

Our goal is to position the waypoints so that their total estimate of q is as accurate as possible. To achieve this, we want to position the robots so that they minimize the variance of their combined sensor measurements. The product of measurement likelihoods in the numerator of (2.23) can be simplified to a single likelihood function, which has the form of an un-normalized Gaussian

\prod_{r=1}^{N} \prod_{i=1}^{n_r} \Pr(x_i^r \mid q : p_i^r) = \bar{\gamma} \exp\Big( -\frac{\|\bar{x} - q\|^2}{2\, g_{-1}} \Big), (2.24)

whose variance is equivalent to our mixing function g_{-1} = \big( \sum_{r=1}^{N} \sum_{i=1}^{n_r} f(p_i^r, q)^{-1} \big)^{-1}. The values of \bar{x} and \bar{\gamma} are given by

\bar{x} = g_{-1} \sum_{r=1}^{N} \sum_{i=1}^{n_r} f(p_i^r, q)^{-1} x_i^r \quad \text{and} \quad \bar{\gamma} = \prod_{r=1}^{N} \prod_{i=1}^{n_r} \frac{1}{2\pi f(p_i^r, q)} \exp\Big( \frac{1}{2} \Big( \frac{\|\bar{x}\|^2}{g_{-1}} - \sum_{r=1}^{N} \sum_{i=1}^{n_r} \frac{\|x_i^r\|^2}{f(p_i^r, q)} \Big) \Big), (2.25)

respectively. Finally, the expectation over q of the likelihood variance recovers our original general mixing cost function (2.3),

w_s E_q[g_{-1}(f(p_1^1, q), \ldots, f(p_{n_N}^N, q))] + w_n \sum_{r=1}^{N} \sum_{i=1}^{n_r} \|p_{i+1}^r - p_i^r\|^2 = w_s \int_Q g_{-1}(f(p_1^1, q), \ldots, f(p_{n_N}^N, q))\, \phi(q)\,dq + w_n \sum_{r=1}^{N} \sum_{i=1}^{n_r} \|p_{i+1}^r - p_i^r\|^2. (2.26)

From this derivation, we can interpret the coverage control optimization as finding the waypoint positions that minimize the expected variance of the likelihood function for an optimal Bayesian estimator of the position of the point of interest q.
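The equivalence between the variance of a product of independent Gaussian likelihoods and the mixing function g_{-1} can be verified directly; a minimal sketch (our own code, not from the thesis):

```python
import numpy as np

def g_alpha(costs, alpha):
    """Mixing function (2.2) evaluated at a fixed point q."""
    c = np.asarray(costs, dtype=float)
    return float(np.sum(c ** alpha) ** (1.0 / alpha))

def product_gaussian_variance(variances):
    """Variance of the product of independent Gaussian likelihoods with the
    given variances: (sum_k f_k^(-1))^(-1), the standard information-form fusion."""
    v = np.asarray(variances, dtype=float)
    return float(1.0 / np.sum(1.0 / v))

f = [4.0, 9.0, 25.0]   # measurement variances f(p_i^r, q) of three waypoints
assert abs(product_gaussian_variance(f) - g_alpha(f, -1)) < 1e-12
# The fused variance is smaller than the best single sensor's variance.
assert product_gaussian_variance(f) < min(f)
```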
2.5 Mixing Function Controller Convergence

In this section, we introduce sensory function parameterization and prove that the proposed mixing function control law in (2.13) causes the set of robot path configurations to converge to a locally optimal configuration according to (2.4) for both known and unknown environments.

2.5.1 Sensory Function Parameterization

The sensory function \phi(q) can be parameterized as a linear combination of a set of known basis functions spanning \mathbb{R}^d, where d is the dimension of the function space.

Assumption 1 (Sensory Basis Functions) \exists a \in \mathbb{R}^d_{\geq 0} and B : Q \to \mathbb{R}^d_{\geq 0}, where \mathbb{R}^d_{\geq 0} denotes vectors with nonnegative entries, such that

\phi(q) = B(q)^T a, (2.27)

where the vector of basis functions B(q) is known by the robots, and the nonnegative parameter vector a is estimated in an unknown environment.

In the adaptive control literature [29], B represents the set of parameters that can be measured, given full state feedback and observable system dynamics. The parameter vector a represents the set of estimated system parameters, such as unknown trajectories or inertias. Denoting \hat{a}_r(t) as robot r's estimate of a, it follows that \hat{\phi}_r(q) = B(q)^T \hat{a}_r is the robot's approximation of \phi(q), and the mass moment approximations can now be defined as

\hat{M}_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} \hat{\phi}_r(q)\,dq, (2.28)

\hat{Y}_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} q\, \hat{\phi}_r(q)\,dq, (2.29)

\hat{C}_i^r = \frac{\hat{Y}_i^r}{\hat{M}_i^r}. (2.30)

Using the parameter estimation error \tilde{a}_r = \hat{a}_r - a for the mixing function control strategy, the sensory function error and mass moment errors of the mixing function are defined as

\tilde{\phi}_r(q) = \hat{\phi}_r(q) - \phi(q) = B(q)^T \tilde{a}_r, (2.31)

\tilde{M}_i^r = \hat{M}_i^r - M_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} B(q)^T dq\; \tilde{a}_r, (2.32)

\tilde{Y}_i^r = \hat{Y}_i^r - Y_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} q\, B(q)^T dq\; \tilde{a}_r, (2.33)

\tilde{C}_i^r = \hat{C}_i^r - C_i^r = \frac{\hat{Y}_i^r}{\hat{M}_i^r} - \frac{Y_i^r}{M_i^r}. (2.34)

When \alpha = -1, we recover the minimum variance mass moment errors

\tilde{M}_i^r = \hat{M}_i^r - M_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_{-1}} \Big)^{-2} B(q)^T dq\; \tilde{a}_r, (2.35)

\tilde{Y}_i^r = \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_{-1}} \Big)^{-2} q\, B(q)^T dq\; \tilde{a}_r, (2.36)

\tilde{C}_i^r = \frac{\hat{Y}_i^r}{\hat{M}_i^r} - \frac{Y_i^r}{M_i^r}. (2.37)

Similarly, when \alpha = -\infty, (f(p_i^r, q)/g_\alpha)^{\alpha-1} becomes the indicator of V_i^r, and we recover the Voronoi mass moment errors

\tilde{M}_i^r = \hat{M}_i^r - M_i^r = \int_{V_i^r} w_s B(q)^T dq\; \tilde{a}_r, (2.38)

\tilde{Y}_i^r = \int_{V_i^r} w_s q\, B(q)^T dq\; \tilde{a}_r, (2.39)

\tilde{C}_i^r = \frac{\hat{Y}_i^r}{\hat{M}_i^r} - \frac{Y_i^r}{M_i^r}. (2.40)

In order to compress the notation in all three mass moment approximation notations, we set the terms B_{p_r}(t) and \phi_{p_r}(t) as the value of the basis function vector and the value of \phi at the robot's position p_r(t), respectively.

2.5.2 Coverage Convergence in a Known Environment

We assume that the robots and waypoints have full knowledge of the sensory parameter vector, i.e., \hat{a}_r = a, and therefore their sensory function estimate \hat{\phi}_r(q) = \phi(q).

Theorem 1 (Mixing Function Convergence Theorem in a Known Environment) The configuration of all waypoint positions P converges to a locally optimal configuration according to \partial H_\alpha / \partial p_i^r = 0.

Proof 1 We define a Lyapunov-like function based on the agent's path and environment measurement. Because the system is autonomous, we use LaSalle's Invariance Principle and invariant set theory to prove asymptotic stability of the system to a locally optimal equilibrium. Let H_\alpha be the Lyapunov function candidate. Because it is comprised of two squared 2-norms, H_\alpha is positive definite. Additionally, H_\alpha \to \infty as \|P\| \to \infty, and H_\alpha has continuous first partial derivatives. Domain Q is bounded, and therefore the state space of all waypoints \mathcal{P}^{N n_r} is bounded. Let \Omega = \{P^* \mid \dot{H}_\alpha(P^*) = 0\} \subset \mathcal{P}^{N n_r} be the invariant set of all critical points of H_\alpha over \mathcal{P}^{N n_r}. Taking the time derivative of H_\alpha, we obtain

\dot{H}_\alpha = \sum_{r=1}^{N} \sum_{i=1}^{n_r} \Big( \frac{\partial H_\alpha}{\partial p_i^r} \Big)^T \dot{p}_i^r = -\sum_{r=1}^{N} \sum_{i=1}^{n_r} (M_i^r e_i^r + \alpha_i^r)^T \frac{K_i^r}{\beta_i^r} (M_i^r e_i^r + \alpha_i^r) \leq 0.

The first derivative of the Lyapunov candidate satisfies \dot{H}_\alpha \leq 0, because K_i^r is strictly positive definite and \beta_i^r > 0. Because \dot{H}_\alpha is negative semi-definite and H_\alpha is positive definite, H_\alpha is non-increasing and lower bounded; thus \exists s < \infty such that \lim_{t \to \infty} H_\alpha = s. Now \Omega is explicitly defined as the set of solutions of \sum_{r=1}^{N} \sum_{i=1}^{n_r} (M_i^r e_i^r + \alpha_i^r)^T \frac{K_i^r}{\beta_i^r} (M_i^r e_i^r + \alpha_i^r) = 0. Let S be the largest invariant set within \Omega. By definition, \dot{p}_i^r = \frac{K_i^r}{\beta_i^r} (M_i^r e_i^r + \alpha_i^r) = 0, \forall i, r, from which it follows that S = \Omega, the set of all critical points of H_\alpha. Thus, \Omega is an invariant set, and all trajectories converge to \Omega as t \to \infty using LaSalle's Invariance Principle. From (2.13), M_i^r e_i^r + \alpha_i^r \to 0 implies \partial H_\alpha / \partial p_i^r = 0.

2.5.3 Coverage Convergence in an Unknown Environment

We now extend the mixing function control law in (2.13) to include a parameterized adaptation law that ensures each robot's independently synthesized path converges to a locally optimal configuration according to (2.4), while each of the robots' estimates of the environment converges to the real environment. The presence of a consensus term in the adaptation law enables all of the robots' estimates of the environment to converge to the same estimate [27]. In order for robots' estimates of the environment to converge, the consensus term requires that each robot has knowledge of the states of all the other robots. Thus, we assume that our network of robots is fully connected. In an unknown environment, robots only have sensory estimates \hat{\phi}_r(q) of \phi(q), so the control law from (2.13) becomes

\dot{p}_i^r = \frac{K_i^r}{\hat{\beta}_i^r} (\hat{M}_i^r \hat{e}_i^r + \alpha_i^r), (2.41)

where \hat{e}_i^r = \hat{C}_i^r - p_i^r, \alpha_i^r = w_n (p_{i+1}^r + p_{i-1}^r - 2 p_i^r), and \hat{\beta}_i^r = \hat{M}_i^r + 2 w_n. Parameter vector \hat{a}_r is adjusted according to the following features of the adaptation law:

\Lambda_r = \int_0^t w_r(\tau)\, B_{p_r}(\tau) B_{p_r}(\tau)^T d\tau, (2.42)

\lambda_r = \int_0^t w_r(\tau)\, B_{p_r}(\tau)\, \phi_{p_r}(\tau)\, d\tau, (2.43)

where the data collection weight w_r(t) [27] is defined as

w_r(t) = positive constant scalar if t < \tau_{w_r}, and 0 otherwise, (2.44)

where \tau_{w_r} represents the time at which part of the adaptation for robot r shuts down to maintain \Lambda_r and \lambda_r bounded.
Let

b_r = \sum_{i=1}^{n_r} \int_Q w_s \Big( \frac{f(p_i^r, q)}{g_\alpha} \Big)^{\alpha-1} B(q) (q - p_i^r)^T dq\; \dot{p}_i^r, (2.45)

\dot{\hat{a}}_{pre_r} = -b_r - \gamma (\Lambda_r \hat{a}_r - \lambda_r) - \zeta \sum_{r'=1}^{N} l_{r,r'} (\hat{a}_r - \hat{a}_{r'}), (2.46)

where \gamma > 0 is an adaptation gain, \zeta > 0 is a consensus scalar gain, and l_{r,r'} can be interpreted as the strength of the communication between robots r and r' and is defined as

l_{r,r'} = D_{max} - \|p_r - p_{r'}\| if \|p_r - p_{r'}\| \leq D_{max}, and 0 otherwise. (2.47)

We select l_{r,r'} so that our assumption of a connected network is satisfied, and such that (2.46) maintains continuity on the right-hand side, so we can apply Barbalat's Lemma [29] to prove asymptotic stability of a positive-definite Lyapunov function candidate. Explicitly, we define l_{r,r'} to be a constant for all r, r', which corresponds to a fully connected network between robots. Because a(j) \geq 0, \forall j, we require \hat{a}_r(j) \geq 0, \forall r, \forall j, by the adaptation projection law [27],

\dot{\hat{a}}_r = \Gamma (\dot{\hat{a}}_{pre_r} - I_{proj_r} \dot{\hat{a}}_{pre_r}), (2.48)

where \Gamma \in \mathbb{R}^{d \times d} is a diagonal positive definite adaptation gain matrix, and the diagonal matrix I_{proj_r} is defined element-wise as

I_{proj_r}(j) = 0 if \hat{a}_r(j) > 0; 0 if \hat{a}_r(j) = 0 and \dot{\hat{a}}_{pre_r}(j) \geq 0; and 1 otherwise, (2.49)

where (j) denotes the jth element for a vector and the jth diagonal element for a matrix. The adaptation law (2.48) includes a weight function w_r(t) in the calculations of \Lambda_r and \lambda_r. The weight function must maintain \Lambda_r and \lambda_r bounded in order for our stability proof to hold [27]. Because the sensory function is assumed to be time-invariant, we use a step function to represent the weight. Initially, the weight is set at a nonzero constant value, before it is set to zero in order for \Lambda_r and \lambda_r to remain bounded. This choice of weighting function enables robots to spend a finite amount of time sampling the environment sensory function before they stop sampling. Their estimates of the sensory function are based on the finite amount of sampled points.
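The accumulators (2.42)-(2.43) can be maintained recursively as the robot samples. The sketch below (our own Euler discretization, for illustration only) also checks that, given weighted samples of \phi, the true parameter vector satisfies \Lambda_r a - \lambda_r = 0, the fixed point that the -\gamma(\Lambda_r \hat{a}_r - \lambda_r) term in (2.46) drives the estimate toward:

```python
import numpy as np

class AdaptationData:
    """Euler-integrated accumulators Lambda_r = int w B B^T dt, lambda_r = int w B phi dt."""
    def __init__(self, d):
        self.Lam = np.zeros((d, d))
        self.lam = np.zeros(d)

    def record(self, B_q, phi_q, w=1.0, dt=0.1):
        # One integration step of (2.42) and (2.43) at the current robot position.
        self.Lam += w * np.outer(B_q, B_q) * dt
        self.lam += w * B_q * phi_q * dt

# Scalar environment with basis B(q) = [1, q] and true parameters a = [1, 2].
a_true = np.array([1.0, 2.0])
acc = AdaptationData(2)
for q in np.linspace(0.0, 1.0, 50):
    B_q = np.array([1.0, q])
    acc.record(B_q, B_q @ a_true)
# a_true satisfies Lambda a - lambda = 0.
assert np.allclose(acc.Lam @ a_true - acc.lam, 0.0, atol=1e-9)
```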
This weight function is defined as if t < w, 0, Wr(t) where w is a positive constant scalar, and (2.50) 'rWr otherwise, is some positive time at which robot r stops zr sampling the environment's sensory function to guarantee Ar and Ar bounded. Theorem 2 (Mixing Function Convergence Theorem in an Unknown Environment) With waypoint dynamics specified by (2.5), control law specified by (2.41) and adaptive law specified by (2.48), we have 1. Waypoint EquilibriumConvergence limt, JJA=(t)j (t)+ yf(t) 0, Vr E {1,...,N}, Vi E , ... nrl, 2. Environment Estimate Error Vr E { 1,...,N}, Vr Iwr(,r) > 0, liMt--- O||Pr(r) II= 0, 3. Robot Consensus Error limtaoo(* - 40) = 0, Vr, 'E { 1,...,IN}. Proof 2 We define a positive-definite Lyapunov function Va based on the robots'pathsand sensory function estimates and use the three criterionof Barbalat'slemma [29]: 1. Va is positive-definite 2. Va is negative-semi-definite 3. Va is bounded (Va is uniformly continuous) to imply Va -+ 0 and the subsequent asymptotic convergence of the multi-robot system to a locally optimal equilibrium. Let Va be defined as 1 N Va = Ha+ jiTP-'Ar. r=1 47 (2.51) Taking the time derivative of Va, we obtain N nr E Y. Y/a dHa T i r=1 i=1 N N r=1 nr r=1 i=1 N r. F a)+ -(Me + (2.52) r=1 From (2.32), (2.33), (2.34), MiC =MrcT+MTrcT-cT). yr Pluggingthis into (2.52), N Ya nr (M = ({Sf!I + ,r r=1 i=1 Y.rNr + E iF1 ~r. 
r=1 Using (2.32), we have N nr Va = E E -(fie- + ) T pr +i' .'rp ) T p - r=1i=1 N r=1 Substituting the dynamics specified by (2.5) and control law specified by (2.41), we obtain N 1 1: Va nr _ r=li=l N nr r=1 i=1 hi pir (~ r + ,r) T Kir(k!ir+ ,r) N _ Kp T K T r=1 48 -r r- Using (2.32) and (2.33), N Va nr 1 -E -,ie+ i)TK $!+V pir W f( +N r=1 i=1 Q )a-1B(q)(q - pu) T dq ' ga N &Tr- Ir r=1 Plugging in the adaptationlaw from (2.45), (2.46) and (2.48), N nr Va 1 T E----(AjrK+r) Kf (M+i+4) r=1i=1 i- = N -T 1 (Ar~r - Ar) r=1 N - N r rr r- r=1 N E - r=1 fprojrAprer Using (2.42) and (2.43), the second term in (2.53) becomes t N -= N -- =1 jr Wr()B(Pr('))B(Pr()) dr j wr('r)B(Pr(z))(Pr(,r))dz] t 0Wr (T) Or(Pr(T))T [r(Pr(r))- O(Pr(r))]dr N I t r= _yI0wr ( r (Pr(r)))2dr 49 (2.53) Plugging this expression back into (2.53) we obtain Ya N nr I N fr l ^rA6+ = r=1 i=1 fi -Er=1 0 N r=1 N r'=1 r=1 rI Wr(T) (Or(Pr(T)))2dr N X - Tr(i 7KfA6+ (2.54) NrIprojraprer. Let 1= [ 1,..., i]T. From [27], we can representthe thirdterm in (2.54) as where!Qj = a(j)1, Qj = ,N (j)]T ... and %j =f0; - 1 i L(t)nj, j. L(t) is the weighted graph Laplacianof the system at time t and is defined entry-wise by -(r', for r -pr', L(r, r') =(2.55) Eir, r', for r = r'. This Laplaciangraph is positive-semi-definite, because it has exactly one zero eigenvalue whose eigenvector is 1 [27], as a result of the network of robots being fully connected. Thus, xTLx = 0 only if x = v, for some v E- R. Consequently, TL = a(j)1TL = 0, Vj. Therefore, d -C Td T 2 i LK^2j, 91 L92j = - 1 j=1 j=1 and itfollows that N nr Va = 1 E - =($re(+yr) K, r=l i=1 N Wr (T) (Or(Pr(r)))2dr -YE r=1 fo d ^T -- I Pi t n N Lnj - E &TIprojApre,. j=1 r=1 50 (2.56) We denote the four remaining terms in (2.56) as 61 (t), V92(t), t93 (t) and 64(t), respectively, so that Va(t) = t1(t) + V 2 (t) + tb (t) + definite and J3[ > 0, 01 (t) 0. t2(t) quantity, while [27] proved that 04 (t) 4 (t). 
Because we assume a fully connected network to allow for robot estimate consensus, it follows that $L(t) \succeq 0$, $\forall t$, implying $\vartheta_3(t) \le 0$. Consider the time integral of each of these four terms, $\int_0^t \vartheta_k(\tau)\, d\tau$, $k = 1, 2, 3, 4$. Because each of the terms is negative semi-definite, $\int_0^t \vartheta_k(\tau)\, d\tau \le 0$, $\forall k$, and because $V_a$ is positive definite, each integral is lower bounded by $\int_0^t \vartheta_k(\tau)\, d\tau \ge -V_a(0)$, where $V_a(0)$ is the initial value of $V_a$. Therefore, these integrals are lower bounded and non-increasing, and hence $\lim_{t\to\infty} \int_0^t \vartheta_k(\tau)\, d\tau$ exists and is finite for all $k$. We now show that $\dot V_a$ is uniformly continuous ($\ddot V_a$ bounded). It was shown in [27] (Lemma 1) that $\dot\vartheta_1(t)$ and $\dot\vartheta_2(t)$ are uniformly bounded, implying $\vartheta_1(t)$ and $\vartheta_2(t)$ are uniformly continuous. Thus, by Barbalat's lemma, $\vartheta_1(t) \to 0$ and $\vartheta_2(t) \to 0$. This implies propositions (i) and (ii). It was shown in [27, 32] that $\dot\vartheta_3(t)$ is uniformly bounded, and it is non-differentiable only at isolated points on $[0,\infty)$. Because the network is fully connected, $\vartheta_3(t)$ is uniformly bounded and uniformly continuous in time. Therefore, by Barbalat's lemma [29], $\vartheta_3(t) \to 0$, which implies that $\tilde\alpha_j^T L\,\tilde\alpha_j \to 0$, $\forall j$. Given the definition of the weighted Laplacian, this implies that $\alpha_j \to \hat a_{\mathrm{final}}(j)\mathbf{1}$, $\forall j$, where $\hat a_{\mathrm{final}}$ is the final common parameter estimate vector shared by all robots. This implies proposition (iii).

Remark 3 This multi-robot convergence proof encompasses the single robot case $N = 1$. The single robot proof does not include the term $\vartheta_3(t)$, because there is no need to consider consensus error if there is only one robot in the environment. Consequently, $\zeta = 0$.
Remark 4 Proposition (i) from Theorem 2 implies that the paths reach a locally optimal configuration for sensing, when the waypoints reach a stable balance between being close to their neighbor waypoints and minimizing the mixing function and estimated sensing error. Propositions (ii) and (iii) from Theorem 2 imply that $\tilde\phi_r(q) \to 0$, $\forall r$, for all points on any robot's trajectory while the weight $w_r(t) > 0$ over the environment.

2.6 Single Robot Coverage Algorithm and Simulation

In this section we present the algorithms used by a single robot to generate an informative path in an unknown environment. The coverage algorithms for these simulations are divided into a two-level hierarchy: (1) robot level and (2) waypoint level. The robot level corresponds to the robot traveling and sampling the environment along its initial path. This algorithm assumes that each robot knows the positions of the waypoints along its path. The robot travels between waypoints in a straight line and makes a measurement of the environment at each waypoint. With each robot measurement $\phi(p_r)$, the robot updates the estimated parameter vector $\hat a$ as shown in line 11 of Algorithm 1. The complete robot level algorithm is shown in Algorithm 1.

Algorithm 1 Mixing Function Controller for a Single Robot in an Unknown Environment: Robot Level
Require: $\phi(q)$ can be parametrized as in (2.27)
Require: $a > 0$
Require: Waypoints cannot reconfigure faster than the robot traveling the path
Require: Robot knows the location of $p_i$, $\forall i \in \{1,\dots,n_r\}$
1: Initially robot is moving towards $p_1$
2: Initialize $\Lambda$ and $\lambda$ to zero
3: Initialize $\hat a$ element-wise to some bounded nonnegative value
4: loop
5: if robot reached $p_i$ then
6: move towards $p_{i+1}$ in a straight line from $p_i$
7: else
8: move towards $p_i$ in a straight line from $p_r$
9: end if
10: Make measurement $\phi(p_r)$
11: Update $\hat a$ according to (2.48)
12: Update $\Lambda$ and $\lambda$
according to (2.42) and (2.43)
13: end loop

The waypoint level algorithm corresponds to the waypoints reconfiguring themselves into locally optimal positions to generate an informative path based on the optimization of the mixing function cost (2.4). In the waypoint level algorithm, each waypoint uses the parameter vector $\hat a$ updated by the robot as an input to the mixing function controller that drives its dynamics, as shown in line 5 of Algorithm 2. The complete waypoint level algorithm is detailed in Algorithm 2.

Algorithm 2 Mixing Function Controller for a Single Robot in an Unknown Environment: Waypoint Level
Require: Each waypoint knows $\hat a$ from Algorithm 1
Require: Each waypoint knows the position of all other waypoints on the path, $[p_1,\dots,p_{n_r}]$
Require: Each waypoint knows $\alpha$
1: loop
2: Compute $g_\alpha$ according to $\alpha$
3: Compute $\hat C_i$ according to (2.30)
4: Obtain all waypoint locations $[p_1,\dots,p_{n_r}]$
5: Compute $u_i$ according to (2.41)
6: Update $p_i$ according to (2.5)
7: end loop

The informative path algorithm for a single robot was tested in MATLAB using minimum variance, Voronoi smoothing, and Voronoi coverage strategies, corresponding to $\alpha = \{-1, -10, -\infty\}$ in the mixing function controller, respectively. For each coverage approach we consider one robot with $n = 50$ waypoints. A fixed-time-step numerical solver is used to integrate the equations of motion and the adaptation law using a time step of 0.01 seconds. The environment $Q$ is a unit square. The sensory function $\phi(q)$ is parametrized as a Gaussian network of 25 truncated Gaussians, i.e. $B = [B(1) \cdots B(25)]^T$, where
\[
G(j) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{\|q-\mu_j\|^2}{2\sigma^2}\right), \qquad
B(j) = \begin{cases} G(j) - G_{\mathrm{trunc}}, & \text{if } \|q - \mu_j\| < \rho_{\mathrm{trunc}}, \\ 0, & \text{otherwise,} \end{cases} \tag{2.57}
\]
with $\sigma = 0.3$ and $\rho_{\mathrm{trunc}} = 0.2$. The unit square is divided into a $5 \times 5$ discrete grid and each $\mu_j$ is selected so that each of the 25 Gaussians is centered at its corresponding grid square. The parameter vector $a$ is chosen as $a(4) = 80$, $a(14) = 60$, $a(22) = 70$, and $a(j) = 0$ otherwise.
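The truncated Gaussian parametrization above is straightforward to evaluate numerically. The following Python sketch builds $\phi(q) = a^T B(q)$ with the simulation's constants ($\sigma = 0.3$, $\rho_{\mathrm{trunc}} = 0.2$, 25 grid-centered Gaussians, $a(4) = 80$, $a(14) = 60$, $a(22) = 70$); the normalization and the row-major ordering of the grid centers are assumptions for illustration.

```python
import numpy as np

def basis(q, centers, sigma=0.3, rho_trunc=0.2):
    """Truncated Gaussian basis B(q) as in (2.57): each entry is
    G(j) - G_trunc inside radius rho_trunc of its center, 0 outside."""
    d = np.linalg.norm(centers - q, axis=1)                    # ||q - mu_j||
    G = np.exp(-d**2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    G_trunc = np.exp(-rho_trunc**2 / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return np.where(d < rho_trunc, G - G_trunc, 0.0)

# 25 Gaussians centered on a 5 x 5 grid over the unit square (ordering assumed)
g = (np.arange(5) + 0.5) / 5.0
centers = np.array([[x, y] for y in g for x in g])

# True parameter vector: a(4) = 80, a(14) = 60, a(22) = 70 (1-indexed in the text)
a = np.zeros(25)
a[3], a[13], a[21] = 80.0, 60.0, 70.0

phi = lambda q: a @ basis(np.asarray(q, dtype=float), centers)  # phi(q) = a^T B(q)
print(phi(centers[3]))    # positive: q sits at an active Gaussian's center
print(phi([0.0, 0.0]))    # zero: no active basis function covers the corner
```

Because the basis functions are compactly supported, only waypoints within $\rho_{\mathrm{trunc}}$ of an active center observe nonzero sensory information, which is why a rich initial sweeping path is needed during the learning phase.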
The environment created with these parameters is shown in Figure 2-5c. The parameters $\hat a$, $\Lambda$ and $\lambda$ are initialized to zero. The parameters for the controller are $K_i = 70$, $\forall i$, $\Gamma = $ identity, $\gamma = 2000$, $W_n = 10$, $W_s = 100$, $w = 30$. The spatial integrals are approximated by summing integral contributions over a $10 \times 10$ discretized environment grid. In order for the robot's estimate of the environment to converge to the real environment, we design an initial sweeping path so that the robot will have a rich enough trajectory to observe the entire environment. The robot travels along its initial path without any reconfiguring, so that it can sample the space and learn the distribution of sensory information. This process is referred to as the learning phase. The path shaping phase begins immediately after the robot has traveled its entire initial trajectory once. During the path shaping phase, $w = 0$ according to (2.50), and we show that the mean waypoint position error converges to zero, thus experimentally verifying that the mixing function controller (2.41) reconfigures the waypoints of the path into an informative path.

2.6.1 Learning Phase

The robot travels its initial path trajectory once to measure and estimate the environment according to the adaptation law (2.48). As the robot travels along its path, the adaptation law causes the robot's estimate of the environment to converge to the real environment description, as quantified by $\tilde\phi(q) \to 0$. Figure 2-4a validates this convergence by showing that the mean integral parameter error $\int_0^t w(\tau)(\tilde\phi(P(\tau)))^2\, d\tau \to 0$ as $t \to \infty$, in accordance with proposition (ii) from Theorem 2. To complete the stability criteria from Section 2.5.3 for this learning phase, Figure 2-4b shows that the Lyapunov function candidate $V_a$ is monotonically non-increasing and $\inf(V_a) > 0$. The learning phase simulation for a single robot is shown in Figure 2-5.
As the robot travels along its initial sweeping path, Figures 2-5a to 2-5d show that the adaptation law causes $\tilde\phi(q) \to 0$, $\forall q \in Q$, and the robot's estimate of the environment converges to the real environment description.

Figure 2-4: (a) $\phi(q)$ mean integral parameter error and (b) bounded Lyapunov function $V_a$, both versus path iterations. The mean integral parameter error shows that the robot's estimate $\hat\phi(q) \to \phi(q)$. Peaks occur when the robot encounters a point of interest while its sensing estimate is currently zero. As required by our convergence proof, we also show that $V_a$ is positive-definite and bounded.

Figure 2-5: Single robot learning phase with informative path controller; panels (a)-(d) show learning iterations 1, 30, 90 and 160. 1st column: the initial learning path connects all the waypoints, shown as black circles. The black arrow represents the sensing position of the robot. 2nd column: the translucent environment represents the true environment and the solid environment represents the estimated environment. This figure directly correlates to Figure 2-4a.

2.6.2 Path Shaping Phase

Immediately after the robot travels through its initial sweeping path once and learns the environment, controller (2.41) is activated. The robot provides its estimates of the locations of the points of interest to its waypoints through $\hat a$, so that the mixing function controller can drive them into an informative path configuration to cover these points. Figure 2-6 confirms that the path has converged to an equilibrium configuration according to (2.4), by showing that the mean waypoint position error after the path shaping phase is zero.
Figure 2-6: Mean waypoint position error versus path iterations. $\|\hat\Lambda_i(t)\hat e_i(t) + \hat\alpha_i(t)\| \to 0$. Waypoints converge to an equilibrium defined by (2.4) that balances thorough sensing and short coverage paths. The peak at 200 iterations corresponds to the beginning of the path shaping phase.

The path evolution using this controller is shown in Figure 2-7 for $\alpha = \{-1, -10, -\infty\}$. After 100 iterations (not counting initial learning iterations), the paths already go through all dynamic regions of the environment. It is important to note how well the Voronoi smoothing controller approximates the Voronoi controller even for $\alpha \gg -\infty$.

Figure 2-7: Single robot path shaping phase with an informative path controller; panels show iterations 5, 40 and 100 for $\alpha = -1$, $\alpha = -10$ and $\alpha = -\infty$. Rows 1-3 show the path evolution of the minimum variance, Voronoi smoothing, and Voronoi controllers, respectively. For each of these strategies, $W_s > W_n$, thus we expect longer paths with more thorough environment coverage. The paths connect all the waypoints, shown as black circles. The black arrow represents the robot's position. For $\alpha = -10$ we already begin to visualize a very close approximation of the Voronoi controller, as shown by the similarities of their resulting informative paths.

2.6.3 Varying Sensing Weight $W_s$ and Path Weight $W_n$

The informative paths in Figure 2-7 were given by $W_s = 100$ and $W_n = 10$.
Given that the ratio $W_s / W_n = 10$, the weights are heavily skewed to encourage thorough sensing instead of shorter path configurations. By increasing the neighbor waypoint distance weight $W_n$, the controller will provide greater attractive forces between neighboring waypoints, causing the paths to be shorter. Therefore, the system evolves in a way that the environment is covered with short paths, with less of an emphasis placed on sensing. In Figure 2-8, we compare two weighting scenarios: (1) $W_n > W_s$ and (2) $W_s > W_n$. In scenario 1, $W_n$ is high, providing high attractive forces between neighboring waypoints. As a result, we see more of an emphasis placed on neighbor waypoint weights and shorter coverage paths. In scenario 2, $W_n$ is low, thus resulting in low attractive forces between neighboring waypoints. This causes the waypoints to be more distant from each other and to focus more on the coverage task. Depending on the application, the weights $W_n$ and $W_s$ can be selected to satisfy either sensing or size constraints.

Figure 2-8: Single robot $W_s$ vs. $W_n$; panels for $\alpha = -1$, $\alpha = -10$ and $\alpha = -\infty$. Top row: $W_n > W_s$. Bottom row: $W_s > W_n$. Voronoi smoothing ($\alpha = -10$) is indistinguishable here from Voronoi coverage behavior.

2.6.4 Computational Complexity of Single Robot Algorithm

The gradient controllers described in this work are discretized and implemented in a discrete-time control loop. At each iteration of the loop the controller computes spatial integrals over the region. A discretized approximation is used to compute the integral of $\phi(q)$ over $Q$. The two parameters that impact the computation time are the number of waypoints $n_r$ and the number of grid squares in the integral computation, $d$.
Unlike a Voronoi approach, we do not need to check if a point is in a polygon. However, the integrand $g_\alpha$ we integrate over is linear in $n_r$. The time complexity for computing a discretized integral is linear in the number of grid squares, which is an $O(d)$ operation. Therefore, the total time complexity of the controller is $O(d \cdot n_r)$ during each iteration. This controller is significantly less expensive than a Voronoi controller, because it does not require a Voronoi tessellation computation. As $\alpha$ decreases, the behavior of the controller approaches Voronoi-based coverage. This implies that the upper bound of the time complexity for the mixing function controller is reached at $\alpha = -\infty$, with a corresponding order $O(n_r(d+1))$ at each iteration.

2.7 Multi-Robot Coverage Algorithm and Simulation

Similar to the single robot coverage algorithm, the multi-robot coverage algorithm has a hierarchy of two levels: (1) robot level and (2) waypoint level. Again, the robot level corresponds to the robots traveling along their paths, sampling and estimating the environment. However, for the multi-robot case, the robots must now obtain and broadcast $\hat a_r$, $\forall r$, to achieve a consensus estimate of the environment between all robots. This algorithm can be executed in a distributed way by each robot. Thus, each robot can execute this algorithm independently, while sharing just its current parameter estimate vector $\hat a_r$. The robot level algorithm is shown in Algorithm 3. The multi-robot waypoint level algorithm is executed such that each waypoint acts as an agent with its own controller. As a result of this distributed approach, each waypoint computes its mixing function, thus requiring that each waypoint now know the state of all other waypoints (even those not on its respective robot path).
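The behavior described here, where decreasing $\alpha$ drives the controller toward Voronoi coverage, can be illustrated with a power-form mixing function of the kind used in mixing-function coverage controllers, $g_\alpha(f) = (\sum_i f_i^\alpha)^{1/\alpha}$; the sample cost values below are placeholders, not taken from the thesis.

```python
import numpy as np

def g_alpha(f, alpha):
    """Power-form mixing of per-waypoint sensing costs f_i > 0:
    g_alpha(f) = (sum_i f_i^alpha)^(1/alpha).  alpha = -inf is evaluated
    as its limit min(f), i.e. only the nearest waypoint counts (Voronoi)."""
    f = np.asarray(f, dtype=float)
    if np.isneginf(alpha):
        return f.min()
    return np.sum(f ** alpha) ** (1.0 / alpha)

f = np.array([0.5, 1.0, 4.0])   # e.g. f_i = 0.5 * ||q - p_i||^2 (illustrative)
for alpha in (-1, -10, float("-inf")):
    print(alpha, g_alpha(f, alpha))
```

For $\alpha = -1$ every waypoint's cost contributes (the minimum-variance interpretation), while at $\alpha = -10$ the value is already close to $\min_i f_i$, mirroring how well the Voronoi smoothing controller approximates the Voronoi controller in Figures 2-7 and 2-8. Evaluating the sum over all $n_r$ waypoints at each of the $d$ grid squares is what gives the $O(d \cdot n_r)$ per-iteration cost.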
Algorithm 3 Informative Path Controller for Multiple Robots in an Unknown Environment: Robot Level for robot r
Require: $\phi(q)$ can be parametrized as in (2.27)
Require: $a > 0$
Require: Waypoint dynamics are slower than the robots'
Require: The network of robots is fully connected
Require: Robot knows the location of $p_i^r$, $\forall i \in \{1,\dots,n_r\}$
1: Initially robot is moving towards $p_1^r$
2: Initialize $\Lambda_r$ and $\lambda_r$ to zero
3: Initialize $\hat a_r$ to some nonnegative value
4: loop
5: if robot reached $p_i^r$ then
6: move towards $p_{i+1}^r$ in a straight line from $p_i^r$
7: else
8: move towards $p_i^r$ in a straight line from $p_r$
9: end if
10: Make measurement $\phi(p_r)$
11: Obtain $\hat a_{r'}$, $\forall r'$ that can communicate with $r$
12: Update $\hat a_r$ according to (2.48)
13: Update $\Lambda_r$ and $\lambda_r$ according to (2.42) and (2.43)
14: end loop

Using the parameter vector $\hat a_r$ obtained by robot consensus as an input to the control strategy, the result of the waypoint level algorithm is that the waypoints reposition themselves into locally optimal informative path configurations. The waypoint level algorithm is shown in Algorithm 4. The informative path controller for multiple robots was tested in MATLAB for eleven test cases, varying $N$, $\alpha$, and $\phi(q)$. First, we present a case for $N = 2$ robots, with $n_r = 22$ waypoints, $\forall r$. The same fixed-time-step numerical solver is used with a time step of 0.01 seconds. The environment parameters are $\sigma = 0.3$ and $\rho_{\mathrm{trunc}} = 0.2$. The parameters $\hat a_r$, $\Lambda_r$ and $\lambda_r$, for all $r$, are initialized to zero. The parameters for the controller are $K_i^r = 70$, $\forall i, r$, $\Gamma = $ identity, $\gamma = 2000$, $W_n = 10$, $W_s = 100$, and $w_r = 10$, $\forall r$. $D_{\max}$ is assumed to be very large, so that $l_{r,r'}(t) = 10$, $\forall r, r'$, $\forall t$. We chose the same rich initial sweeping trajectory as in the single robot case, for all robots, so that they have the opportunity to initially observe the entire environment. We present results in an initial learning phase and in the path shaping phase. During the path shaping phase, $w_r = 0$, $\forall r$, and (2.41) is used to reshape the paths.
Algorithm 4 Informative Path Controller for Multiple Robots in an Unknown Environment: Waypoint Level
Require: Each waypoint knows $\hat a_r$ from Algorithm 3
Require: Each waypoint knows the position of all other waypoints $[p_1,\dots,p_N]$
Require: Each waypoint knows $\alpha$
1: loop
2: Compute the value of the mixing function at the waypoint's position $p_i^r$
3: Compute $\hat C_i$ according to (2.30), but integrating over $Q$ from (2.16)
4: Obtain all other waypoint locations $[p_1,\dots,p_N]$
5: Compute $u_i^r$ according to (2.41)
6: Update $p_i^r$ according to (2.5)
7: end loop

2.7.1 Learning Phase

The robots travel their paths in their entirety once, measuring the environment as they travel and using the adaptation law (2.48) to estimate the environment. Figure 2-9a shows that the consensus error, referring to $\sum_{r=1}^{N}\sum_{r'=1}^{N}(\hat a_r - \hat a_{r'})$, converges to zero, in accordance with proposition (iii) from Theorem 2, indicating that all robots have the same estimate of the environment. While this shows that each robot converges to the same estimate of the environment, we still must ensure that this estimate is accurate. Figure 2-9b ensures that the robots' estimates of the environment are accurate by showing that the total mean integral parameter error for all robots, $\int_0^t w_r(\tau)(\tilde\phi_r(P_r(\tau)))^2\, d\tau$, does indeed converge to zero, in accordance with proposition (ii) from Theorem 2. Therefore, as the robots travel their paths, the adaptation laws cause $\tilde\phi_r(q) \to 0$, $\forall q \in Q$, $\forall r$. This means that all robots' trajectories are rich enough to generate accurate estimates for the whole environment. Using two robots, we show in simulation that as the robots travel their paths, the adaptation laws cause $\tilde\phi_r(q) \to 0$, $\forall q \in Q$, $\forall r$. This simulated learning phase is shown in Figure 2-10, where the updated environment knowledge for the two robots is presented in the 2nd column. As shown, the estimated environments converge to the real environment for sampling conducted along the initial sweeping paths.
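The consensus behavior plotted in Figure 2-9a can be sketched with a toy discrete-time model. The following Python snippet is illustrative only: the consensus gain `zeta`, the unit edge weights, and the three two-parameter estimates are assumptions, not the thesis's adaptation law, but the mechanism (Laplacian averaging on a fully connected network drives the pairwise estimate differences to zero while preserving the average) is the same one the proof relies on.

```python
import numpy as np

def consensus_error(A):
    """Total pairwise estimate difference over all robot pairs
    (the quantity tracked in Figure 2-9a)."""
    N = len(A)
    return sum(np.linalg.norm(A[r] - A[s]) for r in range(N) for s in range(N))

def consensus_step(A, zeta=0.05):
    """One discrete Laplacian-averaging step on a fully connected network
    with unit edge weights l_{r,r'} = 1 (illustrative gain zeta)."""
    A = np.asarray(A, dtype=float)
    N = len(A)
    L = N * np.eye(N) - np.ones((N, N))   # weighted graph Laplacian of (2.55)
    return A - zeta * (L @ A)

# three robots' parameter estimates (rows), two basis coefficients each
A = np.array([[80.0, 0.0], [60.0, 10.0], [70.0, 5.0]])
for _ in range(100):
    A = consensus_step(A)
print(consensus_error(A))   # decays toward zero; the average estimate is preserved
```

Because the Laplacian's only zero eigenvalue has eigenvector $\mathbf{1}$, the disagreement components contract geometrically while the common component, here the mean estimate, is untouched; in the full controller this common limit is additionally pulled toward the true parameters by the robots' measurements.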
Figure 2-9: (a) consensus error and (b) integral parameter error, both versus path iterations. The total consensus error shows that every robot converges to the same estimate of the environment. The mean integral parameter error shows that the estimate that each robot converges to is indeed an accurate representation of the environment.

Figure 2-10: Multi-robot learning phase with informative path controller; panels (a)-(d) show learning iterations 1, 20, 135 and 180. 1st column: the paths connect all the waypoints, shown as black circles, and each robot has a different color assigned to it. The black arrows represent the robots. 2nd column: the translucent environment represents the true environment and the solid environments represent the individual estimated environments.

2.7.2 Path Shaping Phase

Once the robots travel through their paths once, the controller from Section 2.5.3 is activated. Figure 2-11 shows the estimated mean waypoint position errors, where the estimated error refers to the quantity $\|\hat\Lambda_i^r(t)\hat e_i^r(t) + \hat\alpha_i^r(t)\|$. As shown, $\lim_{t\to\infty}\|\hat\Lambda_i^r(t)\hat e_i^r(t) + \hat\alpha_i^r(t)\| = 0$, $\forall i, r$, in accordance with proposition (i) from Theorem 2. Therefore all waypoints have converged to an equilibrium in accordance with (2.4).

Figure 2-11: The mean waypoint position error shows that the waypoints converge to the equilibrium defined by (2.4), where there is a balance between sensing error and informative path length. The peak at 160 iterations indicates when the path shaping phase begins.
The simulated path evolution using this controller with $\alpha = -1$ is shown in Figures 2-12a to 2-12e. The waypoints minimize the expected variance of their sensor estimates over the environment to create an informative path that ultimately visits no static areas of the environment. Figure 2-13 presents additional simulations showing the final informative path configurations for two different environments with several different mixing function classes. As opposed to the previous simulations, which used sensing and neighbor weights $W_s = 100$ and $W_n = 10$, respectively, these two new environment simulations use $W_s = 10$ and $W_n = 5$ for all control strategies. Therefore, a very slight emphasis is placed on sensing over path length. Figures 2-13e and 2-13f show how well $\alpha = -10$ approximates the Voronoi controller, where the resulting informative paths are nearly identical.

Figure 2-12: Multi-robot path shaping with informative path controller at $\alpha = -1$; panels show the environment and iterations 30, 110, 220 and 300. The resulting informative paths travel through only dynamic regions of the environment.

Figure 2-13: Informative path configurations for two environments with different mixing functions ($\alpha = -1$, $\alpha = -10$, $\alpha = -\infty$). The top row represents an environment with some regions of nonzero sensory information. The bottom row represents an environment where every region contains important sensory information.
Notice that the Voronoi smoothing controller approximates the Voronoi controller fairly well for both environments.

2.7.3 Varying Sensing Weight $W_s$ and Path Weight $W_n$

The informative paths in Figure 2-12 were given by $W_s = 100$ and $W_n = 10$. Given that the ratio $W_s / W_n = 10$, the weights are heavily skewed to encourage thorough sensing instead of shorter path configurations. By increasing the neighbor distance weight $W_n$, this higher gain will provide greater attractive forces between neighboring waypoints, causing the paths to be shorter. Therefore, the system evolves in a way that the environment is covered with short paths, with less of an emphasis placed on sensing. In Figure 2-14, we compare two weighting scenarios: (1) $W_n > W_s$ and (2) $W_s > W_n$. In scenario 1, $W_n$ is high, providing high attractive forces between neighboring waypoints. As a result, we see more of an emphasis placed on neighbor waypoint weights and shorter coverage paths. In scenario 2, $W_n$ is low, providing low attractive forces between neighboring waypoints. This causes the waypoints to be more distant from each other and to focus more on the coverage task, rather than generating short paths. Depending on the application, the weights $W_n$ and $W_s$ can be selected to satisfy either sensing or size constraints.

Figure 2-14: Multi-robot $W_s$ vs. $W_n$ for $\alpha = -1$, $\alpha = -10$ and $\alpha = -\infty$. Top row: $W_n > W_s$, yielding shorter paths. Bottom row: $W_s > W_n$, yielding more thorough sensing.

2.7.4 Computational Complexity of Multi-Robot Algorithm

The three parameters that impact the computation time are the number of robots $N$, the number of waypoints $n_r$, and the number of grid squares in the integral computation, $d$.
Again, we do not need to check if a point is in a polygon, but the integrand $g_\alpha$ we integrate over is linear in $N \cdot n_r$. The time complexity for computing a discretized integral is linear in the number of grid squares, which is an $O(d)$ operation. Therefore, the total time complexity of the controller is $O(d \cdot N \cdot n_r)$ during each iteration. As $\alpha$ decreases, the behavior of the mixing function controller approaches Voronoi coverage, which is considerably more computationally expensive, because it requires the additional computation of a Voronoi partition. This implies that the upper bound of the time complexity for the mixing function controller occurs at $\alpha = -\infty$, with a value $O(N \cdot n_r(d+1))$ at each iteration.

2.7.5 Robustness Considerations for Initial Waypoint Configurations

Because it only considers sensor estimates within each robot's localized Voronoi partition, informative path configurations computed using a Voronoi coverage controller depend heavily on the initial relative position of a waypoint with respect to other waypoints. As shown in Figure 2-15, a very slight variation in initial waypoint position can have a large effect on the final informative path configuration. As a result of its ability to approximate the Voronoi controller, the Voronoi smoothing controller also exhibits localized waypoint sensitivities during path reconfiguration, even for free parameter values of $\alpha = -3$, as shown in Figure 2-16. Based on its convergence behavior shown in Section 2.5, the mixing function controller can be used to relax the dependence on initial waypoint conditions by using a probabilistic control strategy.
Because it aims to minimize the variance of a collection of sensor measurements of the same point of interest, instead of only considering the closest measurement to the point of interest, the mixing function controller with free parameter value $\alpha = -1$ does not exhibit extreme topological differences in informative path configurations corresponding to small differences in initial waypoint positions. This correspondence is shown in Figure 2-17. In each of these figures, the two robots have the same number of waypoints ($n = 30$), and the waypoints of the second robot are positioned a distance of 0.001, in the unit environment, away from the corresponding waypoints of Robot 1. Intuitively, if a waypoint of Robot 1 were considered the center of a circle with radius 0.001, the corresponding waypoint positions tested for Robot 2 lie on the circumference of the circle surrounding Robot 1's waypoints, as shown in Figure 2-18. One example of this positioning is $[p_1^2,\dots,p_n^2]_x = [p_1^1,\dots,p_n^1]_x + 0.001$ and $[p_1^2,\dots,p_n^2]_y = [p_1^1,\dots,p_n^1]_y$ along the x-direction. 500 different positions of Robot 2 were considered for this specific environment simulation, each yielding the same general informative path configurations as shown in Figure 2-17.

Figure 2-15: Due to sensing sensitivity, small changes in initial waypoint positions can result in significantly different informative path configurations for the same environment using a Voronoi coverage method. Panels: (a), (b) nearly identical waypoint positions of two robots, scenarios 1 and 2; (c), (d) Voronoi informative path separation for scenarios 1 and 2.
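The perturbation scheme of Figure 2-18 can be sketched in a few lines of Python. This is an illustrative reconstruction of the test setup only: the random Robot 1 waypoints, the seed, and the assumption that each trial applies one common offset direction to all waypoints (consistent with the x-shift example above) are mine, not the thesis's.

```python
import numpy as np

def perturbed_waypoints(p1, radius=0.001, trials=500, seed=0):
    """Generate candidate Robot 2 waypoint sets: each trial shifts every
    Robot 1 waypoint by the same offset of magnitude `radius`, with the
    offset direction sampled on the circle (as in Figure 2-18).
    Returns an array of shape (trials, n, 2)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=trials)
    offsets = radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return p1[None, :, :] + offsets[:, None, :]

n = 30
p1 = np.random.default_rng(1).uniform(0.0, 1.0, size=(n, 2))  # Robot 1 waypoints
p2 = perturbed_waypoints(p1)

# every trial keeps each Robot 2 waypoint exactly `radius` from its counterpart
d = np.linalg.norm(p2 - p1[None, :, :], axis=2)
print(p2.shape, bool(np.allclose(d, 0.001)))
```

Each of the 500 generated waypoint sets would then be used as Robot 2's initial condition, and the resulting final paths compared; the claim tested is that the $\alpha = -1$ controller maps all of these nearly identical initial conditions to topologically equivalent informative paths.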
In addition to this environment, 47 other environments were tested, and in each case the minimum variance controller produced consistent paths, regardless of the presence of small initial numeric differences in waypoint positions across multiple robots.

Figure 2-16: The Voronoi smoothing controller retains the local sensitivities of the Voronoi controller. Due to sensing sensitivity, small changes in initial waypoint positions can result in significantly different informative path configurations for the same environment. Panels: (a), (b) nearly identical waypoint positions of two robots, scenarios 1 and 2; (c), (d) Voronoi approximation ($\alpha = -3$) informative path separation for scenarios 1 and 2.

Figure 2-17: Because it aims to minimize the variance of a collection of sensor measurements of the same point of interest, the mixing function controller with free parameter value $\alpha = -1$ does not exhibit the extreme topological differences in informative path configurations shown by the Voronoi and Voronoi smoothing approaches. Panels: (a), (b) nearly identical waypoint positions of two robots, scenarios 1 and 2; (c), (d) minimum variance informative path separation for scenarios 1 and 2.
Figure 2-18: Within a unit square, the initial relative waypoint positions of two robots were varied 500 times for each environment according to this diagram (Robot 2 waypoint tests placed on a circle of radius 0.001 around each Robot 1 waypoint) to show that for small changes in initial relative waypoint positions, the minimum variance controller still outputs nearly identical informative paths.

Chapter 3

Informative Persistence Controller for Multiple Robots using a Mixing Function Coverage Approach

The objective of this chapter is to derive a persistent control strategy that is a combination of an informative path controller and a speed controller. During a persistent sensing task, robots equipped with a finite sensor radius collect information in a dynamic environment in order to guarantee a bound on the difference between the robots' current models of the environment $\hat\phi(Q,t)$ and the actual state of the environment $\phi(Q,t)$ for all time. Due to their limited sensing range, the robots cannot collect all the data of the environment at one time iteration. As a result, the data of a dynamic region can become outdated, and the robots must return to that region at a given frequency to collect a necessary amount of new information. Because different parts of the environment can change at different rates, the robots must visit different areas in proportion to their rates of change to ensure a bounded uncertainty in the estimation. Thus, we extend the previous sensory function $\phi(q)$ to include a rate of growth. This extended sensory function is referred to as an accumulation function in the environment. The accumulation function grows where it is not covered by any of the robots' sensors, indicating a growing demand to collect data at that location. Similarly, the function shrinks where it is covered by any of the robots' sensors, indicating a decreasing demand for data collection.
In accordance with the objective of the sensing task, we superimpose the mixing function informative path controller with the speed controller from [30] to generate an informative persistence controller. We show that this controller enables the robots to stably complete a persistent task. This encompasses learning the environment dynamics, in the form of growth rates of the field over the environment, and subsequently generating an informative path for a robot with a finite sensor radius to follow and sense the accumulation function, to guarantee that the height of the field remains bounded.

3.1 Relation to Persistent Sensing Tasks

Each robot is equipped with a sensor with a finite radius $\rho$, with footprint $F_r(p_r) = \{q \in Q : \|q - p_r\| < \rho\}$. The stability criterion for a persistent sensing task executed by multiple robots, when given the speed profile for each robot [30], is
\[
\phi(q) - \sum_{r=1}^{N} c(q)\,\frac{\tau_r^c(q,t)}{T_r(t)} = s(q,t) < 0, \qquad \forall q \mid \phi(q) > 0, \tag{3.1}
\]
where $\phi(q)$ is now the rate at which the accumulation function grows at point $q$, and $c(q)$ is the consumption rate at which the accumulation function shrinks when robot $r$'s sensor is covering point $q$. Note that $c(q) > \phi(q)$, $\forall q$. $T_r(t)$ is the remaining amount of time it takes robot $r$ to complete the path at time $t$, and $\tau_r^c(q,t)$ is the duration of time that robot $r$'s sensor covers point $q$ along the path at time $t$. $T_r(t)$ and $\tau_r^c(q,t)$ are directly calculated from the speed profiles. The stability margin of the system is given as $S(t) = -\max_q s(q,t)$, where a stable persistent task is defined by $S > 0$. The persistent sensing task only considers points $q$ that satisfy $\phi(q) > 0$, because a point with zero sensory information does not affect the stability of the controller. Points with nonzero sensory information are called points of interest. In an unknown environment, robots use the estimated version of (3.1), defined as
\[
\hat\phi_r(q,t) - \sum_{r'=1}^{N} c(q)\,\frac{\tau_{r'}^c(q,t)}{T_{r'}(t)} = \hat s_r(q,t) < 0, \qquad \forall q \mid \hat\phi_r(q,t) > 0, \tag{3.2}
\]
where $c(q)$ and the speed profile that maximizes $\hat S_r(t)$ [30] are both known by the robot.
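The estimated stability criterion (3.2) reduces to a grid computation once the coverage times and path periods are known. The Python sketch below evaluates $\hat s_r(q,t)$ and the margin $\hat S_r(t) = -\max_q \hat s_r(q,t)$ on a small illustrative grid; all array values (growth rates, consumption rate, coverage times, path times) are made-up placeholders, since in the real system $\tau^c$ and $T$ come from the LP speed profiles of [30].

```python
import numpy as np

def stability_margin(phi_hat, c, tau_c, T):
    """Estimated persistent-task stability margin from (3.2):
    s_hat(q) = phi_hat(q) - sum_r c(q) * tau_c[r](q) / T[r], over points of
    interest (phi_hat > 0); S_hat = -max_q s_hat(q).  Shapes (illustrative):
    phi_hat, c: (d,) grid samples; tau_c: (N, d); T: (N,)."""
    s = phi_hat - (c * (tau_c / T[:, None])).sum(axis=0)
    interest = phi_hat > 0            # zero-information points are ignored
    return -s[interest].max()

phi_hat = np.array([2.0, 0.0, 1.0, 3.0])    # estimated growth rates on 4 points
c = np.array([10.0, 10.0, 10.0, 10.0])      # consumption rate, c(q) > phi(q)
tau_c = np.array([[3.0, 0.0, 2.0, 4.0],     # robot 1 coverage time per point
                  [1.0, 0.0, 1.0, 2.0]])    # robot 2 coverage time per point
T = np.array([10.0, 10.0])                  # remaining time to complete each path

S = stability_margin(phi_hat, c, tau_c, T)
print(S)   # S > 0: every point of interest is drained faster than it grows
```

A positive margin means the field height at every point of interest stays bounded over repeated traversals; the informative persistence controller of Section 3.2 only allows waypoint motion that does not decrease this margin.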
This implies that robot r's estimated stability margin at time t is defined as \hat{S}_r(t) = -(\max_q \hat{s}_r(q,t)). In [30], a linear program (LP) calculated the speed profile for each robot's path at a time t that maximized \hat{S}_r(t). Using the speed profiles obtained with this LP in conjunction with (3.2), we can derive an informative path controller for a persistent task that causes both the robots' paths and speed profiles to locally optimize persistent monitoring tasks. As noted in (3.2), we assume that the maximizing speed profile for \hat{S}_r(t) is known by each robot r and is used to obtain \hat{s}_r(q,t), \forall q, \forall t.

3.2 Informative Persistence Controller

To include a stability margin, we assign the waypoints new dynamics of the form

\dot{p}_i^r = I_i^r u_i^r,   (3.3)

where u_i^r is defined in (2.41), and

I_i^r = 1, if (\partial \hat{s}_r / \partial p_i^r)^T u_i^r < 0 and t - t_i^{r,rise} > \tau_dwell; 0, otherwise,   (3.4)

where \tau_dwell is a design parameter, and t_i^{r,rise} is the most recent time at which I_i^r stepped from zero to one. This controller ensures that the estimated stability margin \hat{S}_r(t) is monotonically non-decreasing for all r.

Remark 5 I_i^r is a binary function that is used to stop the coverage control action on the waypoints if their reconfiguration is not beneficial to the desired task. Therefore, if the imposed dynamics of the informative path controller do not improve the stability margin, the coverage controller is temporarily suspended.

Using Lyapunov theory, we now prove that the system is stable for persistent sensing tasks.

Theorem 3 (Convergence Theorem for Multi-Robot Persistent Sensing) Under Assumption 1, with waypoint dynamics specified by (3.3), control law specified by (2.41), and adaptive law specified by (2.48), we have

1. Waypoint equilibrium convergence: \lim_{t \to \infty} I_i^r(t) \|\hat{M}_i^r(t) \hat{e}_i^r(t) + \hat{\alpha}_i^r(t)\| = 0, \forall r \in {1,...,N}, \forall i \in {1,...,n(r)},

2. Environment estimate error: \lim_{t \to \infty} \tilde{\phi}_r(\tau) = 0, \forall r \in {1,...,N}, \forall \tau \mid w_r(\tau) > 0,

3. Consensus error: \lim_{t \to \infty} (\hat{a}_r - \hat{a}_{r'}) = 0, \forall r, r' \in {1,...,N}.
Proof 3 We prove asymptotic stability of the system to a locally optimal equilibrium using a Lyapunov function candidate based on virtual energies. Again, we define a positive-definite Lyapunov function V_a based on the robots' paths and sensory function estimates and use the three criteria of Barbalat's lemma [29]:

1. V_a is positive-definite,
2. \dot{V}_a is negative-semi-definite,
3. \ddot{V}_a is bounded,

to imply \dot{V}_a \to 0 and the subsequent asymptotic convergence of the multi-robot system to a locally optimal equilibrium. Let V_a be the Lyapunov function candidate, identical to the candidate from (2.51). Following the procedure from Section 2.7, but with \dot{p}_i^r now defined by (3.3), we obtain

\dot{V}_a = -\vartheta_1(t) - \vartheta_2(t) - \vartheta_3(t) - \vartheta_4(t),   (3.5)

where the four terms are those derived in Section 2.7, except that the first term now carries the switching function I_i^r. In Section 2.7, we showed that \vartheta_2(t) \to 0 and \vartheta_4(t) \to 0. This implies (ii) and (iii). We still need to show that \lim_{t \to \infty} \vartheta_1(t) = 0. Let \vartheta_1^i be defined such that \vartheta_1 = \sum_i \vartheta_1^i. Assume that \lim_{t \to \infty} \vartheta_1^i(t) \ne 0, so that there exists \epsilon > 0 such that for every t there is a t_j > t with \vartheta_1^i(t_j) > \epsilon. Let {t_j}_{j=1}^\infty be an infinite sequence of such times separated by more than 2\tau_dwell, that is, |t_j - t_{j'}| > 2\tau_dwell, \forall j \ne j'. Because, from [27] (Lemma 1), \dot{\vartheta}_1^i(t) is uniformly bounded by some value M when I_i = 1 (i.e. |\dot{\vartheta}_1^i(t)| \le M), and whenever I_i = 1 it remains so for at least \tau_dwell, we have that \forall t_j \in {t_j}_{j=1}^\infty,

\int_{t_j}^{t_j + \delta} \vartheta_1^i(\tau) d\tau \ge \epsilon\delta/2 > 0,   (3.6)

where

\delta = \min{\epsilon/(2M), \tau_dwell} > 0.   (3.7)

Hence

\int_0^\infty \vartheta_1^i(\tau) d\tau \ge \sum_{t_j \in {t_j}_{j=1}^\infty} \int_{t_j}^{t_j + \delta} \vartheta_1^i(\tau) d\tau \ge \sum_{j=1}^\infty \epsilon\delta/2,   (3.8)

which must be infinite, and therefore contradicts the fact that \int_0^\infty \vartheta_1(\tau) d\tau = \sum_i \int_0^\infty \vartheta_1^i(\tau) d\tau exists and is bounded. Therefore, by contradiction, \lim_{t \to \infty} \vartheta_1^i(t) = 0. This implies (i).

Remark 6 Theorem 3(i) stipulates that \lim_{t \to \infty} \|\hat{M}_i^r(t) \hat{e}_i^r(t) + \hat{\alpha}_i^r(t)\| = 0 iff this benefits the persistent task. Otherwise, \lim_{t \to \infty} I_i^r(t) = 0, which stipulates that the persistent task does not benefit if the ith waypoint in robot r's path reconfigures its position.

3.3 Single Robot Informative Persistence Algorithm and Simulation

The single robot informative persistence algorithm has the same two-level hierarchy as the mixing function control algorithm presented in Section 2.6. While the robot-level algorithm of the informative persistence algorithm remains identical to the algorithm in Section 2.6, the waypoint level of the persistent sensing algorithm changes due to the new dynamics introduced by the stability margin requirement. Now, once a waypoint uses the parameter estimate \hat{a} from the robot-level algorithm to compute its dynamics in accordance with the mixing function control strategy (line 7 of Algorithm 5), it must immediately determine whether these dynamics increase the stability margin of the persistent task. If the margin is not increased, the dynamics are deemed not beneficial to the sensing task, and the binary function I_i on line 8 temporarily suspends the coverage control action. The waypoint algorithm is shown in Algorithm 5.

Algorithm 5 Informative Persistence Controller for a Single Robot: Waypoint Level
Require: Each waypoint knows \hat{a} from Algorithm 1
Require: Each waypoint knows \alpha
Require: Each waypoint knows the position of all other waypoints on f
Require: Each waypoint has knowledge of \hat{S}
1: Initially compute the value of \hat{S}
2: loop
3:   Compute the waypoint's mixing function
4:   Compute C_i according to (2.30)
5:   Obtain all other waypoint locations [p_1, ..., p_n]
6:   Compute u_i according to (2.41)
7:   Compute I_i according to (3.4)
8:   Update p_i according to (3.3)
9: end loop

The informative persistence controller for a single robot was tested in MATLAB. Overall, ten cases, each considering a different environment, were performed. We present a case for n = 36 waypoints and \alpha = -1.
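The switching step at the heart of the waypoint-level loop can be sketched as a single update: the candidate control input is applied only when it does not decrease the estimated stability margin. The function name and the direct margin comparison below are illustrative stand-ins for (3.3) and (3.4), which additionally include the dwell time \tau_dwell:

```python
import numpy as np

def waypoint_step(p, u, margin_before, margin_after, dt):
    """One waypoint-level iteration of the persistence controller (a sketch
    of Algorithm 5, lines 6-8): the candidate input u is applied only if it
    does not decrease the estimated stability margin; otherwise the waypoint
    holds its position (I = 0)."""
    I = 1 if margin_after >= margin_before else 0
    return p + I * dt * np.asarray(u, float), I

# A move that improves the margin is applied; one that degrades it is not.
p_new, I = waypoint_step(np.array([0.5, 0.5]), [1.0, 0.0],
                         margin_before=-0.2, margin_after=-0.1, dt=0.01)
```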
For the remainder of this chapter, we will continue to examine only the minimum variance approach. The logic behind this decision derives from the fact that the stability margin criteria applied to the Voronoi, Voronoi smoothing, and minimum variance informative persistence strategies are identical. Therefore, the only difference between these three approaches to persistent sensing control is the coverage control algorithm, which we previously evaluated. Voronoi informative persistence was thoroughly evaluated in [33], and we have shown that Voronoi smoothing approximates Voronoi coverage control reasonably well.

In these tests, a time step of 0.01 seconds is used, and \tau_dwell = 0.01. The environment parameters are \sigma = 0.2 and \mu_trunc = 0.1, with a(6) = 20, a(14) = 60, a(17) = 40, and a(j) = 0 otherwise. The parameters \hat{a}, \Lambda and \lambda are initialized to zero. The controller parameters are K_i = 70, \forall i, \Gamma = identity, \gamma = 2000, W_n = 10, W = 100, w = 100, and \rho = 0.05. The environment is discretized into a 10 x 10 grid and, according to the control law, only points in this grid that satisfy \phi(q) > 0 are used as points of interest in (3.2). As in the informative path controller case, results are presented in an initial learning phase and a path shaping phase. The learning phase employs a rich initial sweeping path so that the robots are able to measure and sample the entire environment. The path shaping phase begins immediately following the learning phase and drives the waypoints to reconfigure the robot's path into an informative persistence path according to both the mixing function coverage controller and the stability margin.

3.3.1 Learning Phase

Initially, the robot travels its static path in its entirety, measuring and using the adaptation law (2.48) as it travels to estimate the environment. As the robot travels its path trajectory, its adaptation law causes its estimate of the environment to converge to the real environment, \tilde{\phi}(q) \to 0, \forall q \in Q.
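The flavor of this adaptation can be illustrated with a toy estimator: the environment is a weighted sum of known basis functions, the robot takes point measurements along its path, and a gradient update drives the parameter estimate (and hence \hat{\phi}) toward the true \phi. The basis, gain, and sampling pattern below are illustrative only and are not the adaptation law (2.48):

```python
import numpy as np

def basis(q, centers, sigma=0.2):
    """Gaussian basis functions, in the spirit of the thesis's Gaussian
    network parametrization of the sensory function."""
    q = np.asarray(q, float)
    return np.exp(-((q - centers) ** 2).sum(axis=1) / (2 * sigma ** 2))

# True environment phi(q) = K(q)^T a_true over the unit square.
centers = np.array([[0.3, 0.3], [0.7, 0.7]])
a_true = np.array([1.0, 2.0])
rng = np.random.default_rng(0)
a_hat = np.zeros(2)
gamma = 0.5                                # adaptation gain (illustrative)
for _ in range(4000):
    q = rng.uniform(0.0, 1.0, size=2)      # a point sampled along a rich path
    K = basis(q, centers)
    err = K @ a_hat - K @ a_true           # measurement error phi_hat - phi
    a_hat -= gamma * err * K               # gradient-descent adaptation step
```

Because the sampling path is rich (persistently exciting), the parameter error decays and \hat{\phi} converges to \phi everywhere.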
This result is shown in Figure 3-1a, where the mean integral parameter error converges to zero, in accordance with proposition (ii) from Theorem 3. To complete the convergence criteria for the learning phase, Figure 3-1b shows that the Lyapunov function candidate V_a is monotonically non-increasing, as required by Barbalat's lemma. The learning phase simulation, as seen in Figure 3-2, shows that the robot uses a rich initial sweeping trajectory to initially observe the entire environment. As the robot travels its path trajectory, its adaptation law causes its estimate of the environment to converge to the real environment, \tilde{\phi}(q) \to 0, \forall q \in Q.

Figure 3-1: Mean integral parameter error (a) and Lyapunov function candidate (b) of the informative persistence controller for a single robot. The parameter error shows that the robot's estimate of the environment converges to the actual environment, while a positive-definite Lyapunov function was required by Barbalat's lemma.

Figure 3-2: Single robot learning phase with informative persistence controller, shown at learning iterations 0, 70, 210, and 300. 1st column: the path connects all the waypoints. The points of interest are shown as green regions and the black arrow represents the robot. The robot's sensing radius is represented by the blue circle around its position. 2nd column: the translucent environment represents the true environment and the solid environment represents the estimated environment.
3.3.2 Path Shaping Phase

Figure 3-3 shows the mean waypoint error for persistent sensing, that is, the mean of the quantity I_i(t) \|\hat{M}_i(t) \hat{e}_i(t) + \hat{\alpha}_i(t)\|. We see that \lim_{t \to \infty} I_i(t) \|\hat{M}_i(t) \hat{e}_i(t) + \hat{\alpha}_i(t)\| = 0, in accordance with proposition (i) from Theorem 3. This implies that the waypoints reach the equilibrium defined by the informative path controller. Most importantly for task stabilization and system dynamics, Figure 3-4 shows the temporal evolution of the persistent task's stability margin. Both the estimated and true stability margins are shown. The true plot provides ground truth, showing that the robot's estimates were a good representation of the true values. Because the persistent sensing task is initially unstable, the stability margin starts off with a negative value, and then increases with t, implying that the path is reconfiguring to stabilize the persistent task. Note that the stability margin is positive at the conclusion of the simulation. The path evolution using this controller is shown in Figure 3-5. The robot learns the regions of the points of interest in the learning phase and then reconfigures its path to encompass these regions. At 300 iterations (Figure 3-5f), the path locally optimizes the persistent sensing task.

Figure 3-3: The mean waypoint position error converges to zero. This implies that the waypoints are in equilibrium as defined by the informative persistence controller.

Figure 3-4: The stability margin for the persistent task is positive, thus implying the task is stabilized.
Figure 3-5: Single robot path shaping phase with informative persistence controller, shown at iterations 0, 5, 20, 30, 80, and 300. The path connects all the waypoints, shown as black circles. The points of interest are shown as green regions. The black arrow represents the robot, and its sensor radius is represented by the blue circle around the robot's position.

3.3.3 Single Robot Persistence Simulation Discussion

The persistent informative controller is nearly identical to the informative path controller (2.13), with the exception of the stability margin switching function I_i in the waypoint dynamics. In some scenarios, the informative path generated by the controller in Section 3.2 was very similar to the informative path generated by the controller from Section 2.7; in others, the informative paths from the two controllers were very different. This is due to the additional restriction of the non-decreasing stability margin. The controller weights W and W_n can also impact how the controllers behave and should be tuned to achieve the desired behavior.

3.4 Multi-Robot Informative Persistence Algorithm and Simulation

The multi-robot informative persistence algorithm uses the same two-level hierarchy as in Sections 2.7 and 3.3. Again, the robot-level algorithm does not change from the algorithm presented in Section 2.7. However, the waypoint-level informative persistence algorithm dynamics must be updated to account for the stability margin switching variable I_i^r. Once the waypoints receive the parameter estimate of the environment \hat{a}_r from the robot-level algorithm, they compute the mixing function dynamics and determine whether they increase the stability margin. If the margin is not increased, control action is temporarily suspended by I_i^r. These changes are seen in Algorithm 6. We test the mixing function controller for persistent tasks using multiple robots in MATLAB. Overall, we tested 12 cases, each with a different environment.
We present a case for N = 2 robots, \alpha = -1, and n_r = 22 waypoints, \forall r. A fixed-time-step numerical solver is used with a time step of 0.01 seconds and \tau_dwell = 0.01. The region Q is taken to be the unit square. The sensory function \phi(q) is parametrized as a Gaussian network, with B defined in (2.27), \sigma = 0.2 and \mu_trunc = 0.2. The parameter vector a is defined as a(j) = 60 for j \in {8, 13, 19}, and a(j) = 0 otherwise. The environment's sensory function (growth rates) created with these parameters can be seen in Figure 3-9.

Algorithm 6 Mixing Function Informative Persistence Controller for Multiple Robots: Waypoint Level
Require: Each waypoint knows \hat{a}_r from Algorithm 3
Require: Each waypoint knows \alpha
Require: Each waypoint knows the location of all its neighboring waypoints
Require: Each waypoint has knowledge of \hat{S}_r
1: Initially compute the value of \hat{S}_r
2: loop
3:   Compute the waypoint's mixing function
4:   Compute C_i^r according to (2.30), but integrating over Q from (2.16)
5:   Obtain all other waypoint locations [p_1, ..., p_n]
6:   Compute u_i^r according to (2.41)
7:   Compute I_i^r according to (3.4)
8:   Update p_i^r according to (3.3)
9: end loop

The parameters \hat{a}_r, \Lambda_r and \lambda_r, for all r, are initialized to zero. The controller parameters are \Gamma = identity, \gamma = 2000, W_n = 10, W = 100, w = 100 and \rho = 0.05. The two robots were assumed to be fully connected, with l_{rr'}(t) = 20, \forall r, r', \forall t. The environment is discretized into a 10 x 10 grid. Again, only points in this grid with \hat{\phi}_r(q) > 0 are considered as points of interest. The environment's accumulation function grows at a growth rate \phi(q) at point q and is consumed by robot r at a consumption rate of c_r(q) = 10 if q \in F_r(p_r(t)). This accumulation function is represented by the green regions in Figure 3-9. The goal of this simulation is to guarantee that the size of this region is bounded. We present results from the initial learning phase and from the path shaping phase, as we did in Section 2.6.
In the path shaping phase, the new dynamics (3.3) induced by the stability margin are used to reconfigure the robots' paths.

3.4.1 Learning Phase

In the learning phase, the adaptation laws drive \tilde{\phi}_r(q) \to 0, \forall q \in Q, \forall r. This implies that the robots' estimates of the environment converge to the actual environment and that the robots' initial trajectories were rich enough to accurately estimate the entire environment. Figure 3-6 shows that the mean over all robots of \int_0^t w_r(\tau) (\tilde{\phi}_r(\tau))^2 d\tau converges to zero, as suggested by proposition (ii) from Theorem 3. Figure 3-8 shows that the consensus error, referring to \sum_{r'=1}^N (\hat{a}_r - \hat{a}_{r'}), converges to zero, which corresponds to proposition (iii) from Theorem 3. To complete the experimental validation of the convergence criteria for the learning phase, Figure 3-7 shows that the Lyapunov function candidate V_a is monotonically non-increasing and that inf(V_a) \ge 0.

Figure 3-6: The mean integral parameter error shows that the robots' estimate of the environment converges to the actual environment at the end of the learning phase.

Figure 3-7: The Lyapunov-like function in the learning phase is positive-definite and bounded, as required by Barbalat's lemma.

Figure 3-8: We showed that the robots' total estimate of the environment converges to the actual environment. The consensus error shows that each robot's estimate of the environment is identical at the conclusion of the learning phase.

The learning phase simulation is shown in Figure 3-9, where each robot has a rich trajectory over the environment so that they can sample and learn the entire environment.
Figure 3-9: Multi-robot learning phase with informative persistence controller, shown at learning iterations 1, 25, 100, and 200. The paths travel through the points of interest shown as green regions. The robots' sensing radii are shown as circles around the robots' positions. The translucent environment represents the true environment, and the solid environment represents the estimated environment.

3.4.2 Path Shaping Phase

In this phase K_i^r = 70, \forall i, r. Upon the completion of the learning phase, the controller from Section 3.3 is activated. Figure 3-10a shows that as the paths reconfigure, the quantity I_i^r(t) \|\hat{M}_i^r(t) \hat{e}_i^r(t) + \hat{\alpha}_i^r(t)\| converges to zero, \forall i, r, in accordance with proposition (i) from Theorem 3, implying that the waypoints reach the coverage equilibrium defined by the informative persistence control strategy. To complete the convergence criteria, Figure 3-10b shows that the persistent task's stability margin increases with t as the robots' paths approach informative paths for persistent sensing. Thus the persistent task is stabilized.

Figure 3-10: The mean waypoint position error (a) and persistent task stability margin (b) for multiple robots show that the waypoints converge to the sensing equilibrium and the persistent task is stabilized.

The path evolution using the dynamics of this controller is shown in Figures 3-11a to 3-11f.
Figure 3-11: Multi-robot path shaping phase with informative persistence controller, shown at iterations 0, 10, 30, 60, 100, and 300. The paths connect all the corresponding waypoints, shown as black circles. The points of interest are shown as green regions. The black arrows represent the robots, and their sensor radii are represented by the colored circles around the robots' positions.

3.4.3 Multi-Robot Persistence Simulation Discussion

This informative persistence controller is identical to the informative path controller from Section 2.3 while path shaping is required by the persistent sensing task. When the reconfiguration of waypoints based on the new dynamics is not required by the persistent task, the control action switches according to the stability margin and does not allow the waypoints to move. By including the step variable (3.4) in the informative persistent sensing Algorithm 6 to enforce a non-decreasing stability margin, we can prevent the paths from shaping the same way that they would under the mixing function informative path controller given by Algorithm 4. The controller weights W and W_n can also impact how the controllers behave, and should be tuned to achieve the desired behavior. When performing persistent sensing, each robot uses the speed profile it precomputed to cooperatively stabilize the task. When using a speed profile, it is important that the robots first undergo a learning phase so that they can use the consensus term in their adaptive laws to globally converge to the same environment estimates. Using this approach, the robots are able to work together during the path reshaping phase to cooperatively stabilize the persistent sensing task. If the robots' estimates of the environment were different, then the speed profiles of the robots would not globally stabilize the persistent sensing task.
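The consensus term mentioned above can be illustrated with a standard linear consensus update: each robot's parameter estimate is pulled toward its neighbors', driving all estimates to agreement. The weights, gain, and integration step below are illustrative, not the thesis's values:

```python
import numpy as np

def consensus_step(a, weights, gain, dt):
    """One Euler step of the standard consensus dynamics
    a_r_dot = -gain * sum_r' l_rr' (a_r - a_r'), the kind of coupling
    that drives the consensus error of Figure 3-8 to zero."""
    a = np.asarray(a, float)
    da = np.zeros_like(a)
    for r in range(len(a)):
        for rp in range(len(a)):
            da[r] -= gain * weights[r][rp] * (a[r] - a[rp])
    return a + dt * da

# Two fully connected robots with different initial estimates of one parameter.
a = np.array([1.0, 3.0])
W = [[0, 1], [1, 0]]
for _ in range(200):
    a = consensus_step(a, W, gain=1.0, dt=0.05)
```

With a symmetric, connected weighting, the estimates converge to a common value (here the average of the initial estimates).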
In this chapter we have discussed how multiple robots shape their paths into informative paths that are useful for persistent sensing. In the following chapter we show how informative paths and persistent sensing can be used together to develop a dynamic patrolling policy that optimizes a fleet of service vehicles operating in an urban environment.

Chapter 4

Dynamic Patrolling Policy Using an Informative Path Controller

4.1 Motivation

In this chapter, we apply the mixing function control strategy to a transportation problem. We are interested in generating informative paths along which an existing fleet of service vehicles can patrol to increase their efficiency in servicing instantaneous customer demand. It is our vision that once this solution is implemented, service vehicles such as taxis can become viable platforms for Mobility-on-Demand (MOD). The primary task for many current MOD systems is to propose vehicle allocation solutions that complement public transportation by optimizing metrics that make the route to and from a public transportation station more efficient in time, distance, and fuel economy [3]. The objective of our informative path application is to minimize the waiting time of the passengers and the amount of time the vehicles in the system drive empty. In our previous work [22], we showed how autonomous driving can be used to mitigate the rebalancing problem current MOD systems face. We consider the task allocation problem in a MOD scenario. In MOD transportation, we assume historical knowledge of passenger arrivals at discrete sets of locations. We present a dynamic patrolling policy that allocates vehicles to pickup and delivery tasks. Using historical arrival distributions, we compute patrolling loops that minimize the distance driven by the vehicles to get to the next request. These loops are used to redistribute the vehicles along stationary virtual taxi stand locations on the loop.
The algorithm was trained using one month of data from a fleet of 16,000 taxis. We compare the policy computed by our algorithm against a greedy policy, as well as against the ground truth redistribution of taxis observed on the same dates, and show an improvement with respect to three key evaluation criteria: (1) minimizing the number of vehicles in the system, (2) quality of service, and (3) distance traveled empty. We show that our policy is robust by evaluating it on previously unseen test data. The main contributions of this policy are:

- a patrolling loop and redistribution model of an unmanaged fleet operation using historical data,
- a provably stable dynamic redistribution policy for a large number of vehicles using informative paths,
- a centralized scheduling algorithm for request allocation and vehicle redistribution,
- large-scale simulations and evaluations using real data from a fleet of 16,000 vehicles.

4.2 Problem Formulation

We consider a pickup and delivery problem (PDP) in a convex bounded planar area Q \subset R^2. This area, or service environment, is subject to incident request arrivals at continuous points q \in Q. The environment is patrolled by N vehicles that drive along closed loops at constant speed. For simplicity we normalize speed to 1, so that time and distance are equivalent within our model. Vehicles are assigned to service requests by a centralized server, which is assumed to know the locations of all vehicles at any time. A vehicle v_i that has been assigned a request q_j will travel in a straight line to q_j to pick up the request, and then deliver it to its destination s_i \in Q. We assume a continuous time model, i.e. time t \in R_{\ge 0}. Requests arrive according to a Poisson process with an arrival rate \lambda and are distributed throughout the region according to a historically derived arrival distribution Z_a. The destination of incident requests is determined by a destination distribution Z_d.
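The request model above can be sketched as a small generator: exponential inter-arrival gaps produce a Poisson arrival stream, and each request gets a pickup and a dropoff location. Uniform locations here are a stand-in for the historical distributions Z_a and Z_d; the function name and parameters are illustrative:

```python
import random

def sample_requests(lam, horizon, region=1.0, seed=0):
    """Sample request arrival times from a Poisson process with rate lam
    (exponential inter-arrival gaps) and uniform pickup/dropoff locations
    in a square region of the given side length."""
    rng = random.Random(seed)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(lam)          # next inter-arrival gap
        if t > horizon:
            break
        pickup  = (rng.uniform(0, region), rng.uniform(0, region))
        dropoff = (rng.uniform(0, region), rng.uniform(0, region))
        requests.append((t, pickup, dropoff))
    return requests

reqs = sample_requests(lam=2.0, horizon=100.0)
```

With rate 2 requests per unit time over a 100-unit horizon, the stream contains on the order of 200 requests.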
4.2.1 Using Informative Paths

In this work we use informative loops generated by a Voronoi informative path controller [32] and extend them to a pickup and delivery problem. We only consider the Voronoi control strategy for this application because it only requires local robot state information (Voronoi neighbors) to compute an informative path solution. This localized sampling approach is similar to how a taxi or other service vehicle would determine which regions of an environment to patrol. We observe that the regions of dynamic change are analogous to regions of pickup demand, and the act of sampling to reduce uncertainty in coverage is analogous to the act of picking up and delivering incident requests. The difference is that in PDP problems we have discrete rather than continuous events. Also, delivery differs from sampling in that the vehicle has to deliver the request and then return to the informative path. In our model, the resulting informative path is a patrolling loop whose route is locally optimized such that it traverses along or very near the areas where pickup requests originate. By utilizing waypoints along a patrolling loop, informative paths can be visualized as a method to locally optimize the locations of virtual taxi stands across the environment. The goal is to compute the path and placement of the patrolling loops so that the distance from the patrolling loop to the requests is optimized. A mathematical description of this algorithm for multiple agents follows.

4.2.2 Multi-Agent Controller Extension

There are X agents identified by r \in {1, ..., X} in a convex environment Q \subset R^2. In this derivation, note that the number of agents represents the number of patrolling loops that are computed by our patrolling policy. A point in Q is denoted q. Agent r is positioned at p_r \in Q and travels along its closed path f_r : [0,1] \to R^2, consisting of a finite number n of waypoints. The ith waypoint on f_r is located at p_i^r, i \in {1, ..., n}.
Define a vector P \in Q^{X n} \subset R^{dim(P)} as the vector obtained by making an array of the agents' waypoint positions, P = {p_1^1, ..., p_n^X}, where Q^{X n} is the state space of the waypoints for all agents. Note that the controller for a single agent is derived by setting X = 1. Let V_i^r be the Voronoi partition of Q for the ith waypoint position in agent r's path. Agents can compute the Voronoi partitions based on their waypoint positions. Because each path is closed, f_r(0) = f_r(1), and each waypoint i along the path has a corresponding previous waypoint i - 1 and next waypoint i + 1. An agent travels between sequential waypoints in a straight line. The sensory function \phi(q) in this scenario is updated every fifteen minutes over the course of 24 hours in order to reflect the change in sensory information generated by customer requests throughout the day. The agent knows \phi(q); however, it is equipped with a sensor with sensing radius \rho to make a point measurement of \phi(p_r) at its position p_r, so that it ensures the stability criterion for the persistent task [30]. We define the collection of informative paths for our multi-agent system as the set of waypoint locations for each agent that locally minimize the Voronoi coverage cost function given by (2.17). We proved the convergence of this coverage controller in Section 2.5.

4.2.3 Operational Stability

In addition to the controller stability proven in Section 2.5, a necessary condition for a functional deployment of any fleet of service vehicles is operational stability. Informally, we understand operational stability to mean the condition whereby the number of outstanding requests remains bounded in steady state. Formally, we define operational stability as the condition

\int_0^t (\lambda(\tau) - \mu(\tau)) d\tau \le k\lambda, \forall t > 0, k < \infty.   (4.1)

To motivate this requirement, consider events arriving into a queue according to a standard Poisson process with rate parameter \lambda.
Then the integral gives us the total number of arrivals from the beginning of time until the current time t. If the events are also being serviced at a sufficient rate according to some other process, then the number of events in the queue will be less than some constant times the rate parameter for any time window. The service rate \mu(t) is defined as the rate at which incident customer requests are being serviced by vehicles. In steady state, the stability requirement in (4.1) is satisfied by the simplified expression \bar{\mu}(t) > \bar{\lambda}(t), where \bar{\lambda}(t) and \bar{\mu}(t) denote the average arrival and service rates, respectively, over the open interval (0, \infty).

Lemma 1 Let C be a closed curve in Euclidean space composed of n waypoints {w_1, w_2, ..., w_n} \subset Q \subset R^2 that are connected by straight lines. Let x and y be any two points on C. Then d(x,y) \le L(C)/2, where L(C) is the arc length of C and d(x,y) is the Euclidean distance between x and y.

Proof 4 The arc length of C is given by L(C) = \sum_{i=1}^n d(w_i, w_{i+1}), where waypoint w_{n+1} = w_1 and d(w_i, w_{i+1}) is the Euclidean distance between consecutive waypoints. Let A and B be the unique segments of curve C that connect point x to point y, such that L(A) + L(B) = L(C). Without loss of generality, assume L(A) \le L(B), which implies that L(A) \le L(C)/2. Assume, for the sake of contradiction, that d(x,y) > L(C)/2. Because d(x,y) is by definition the minimum distance between the two points, there exists no path P \subset Q connecting x and y such that L(P) < d(x,y). However, L(A) \le L(C)/2 < d(x,y), thus giving the contradiction.

Theorem 4 (Steady State Stability) An informative path service policy gives \bar{\mu}(t) > \bar{\lambda}(t) if and only if X > \lambda (L(C)/2 + \rho + 2\sqrt{2} l) / v, where X is the number of service vehicles, \rho is the sensor radius of a service vehicle, l is the dimension of the square environment Q, and v is the constant speed of the service vehicles.

Proof 5 (\Leftarrow) X > \lambda (L(C)/2 + \rho + 2\sqrt{2} l) / v implies X v / (L(C)/2 + \rho + 2\sqrt{2} l) > \lambda. Let a = v / (L(C)/2 + \rho + 2\sqrt{2} l), which has units s^{-1}, assuming the time step \tau of the experiment is 1 s.
Using the convergence stability of the informativepath controllerfrom Sections ?? and 4.2.3, no incident customer request is locatedfurther than p from the informativepath. By Lemma 1, sup{d : d = ||x- y||, Vx,y E Q} = L(C)/2. Thus, the maximum distance any vehicle on the path is from an incident customer request located at q is L(C)/2 + p. Because Q is a square environment, the maximum distance it takes a service vehicle to drive the customer to its destination, and return to the informative path is 2v/l. Thus the maximum distance 99 traveled by a service vehicle whose trip originates on the informative path to service a request is L(C)/2 + p + 2V2-1. Therefore a is now the smallest rate at which a vehicle can service a request, and hence X a is the smallest y (t) that satisfies y (t) > ;L (t). (=>) By the contrapositve law, X < X (L(f) + p + 2x/2l) /v =- Xv/ (L(C) + p + 2V 1) < .k. Again, by letting a denote v/(L(C)2 + p + 2v/l), the service ratefor a single service vehicle, we have X a < A, which implies y (t) < X (t). This result shows that our approach to task allocation for PDP in MOD systems is stable. Next we present a patrolling policy for the delivery vehicles. 4.3 Dynamic Patrolling Policy Our case study for this work is a PDP in a MOD system and uses real data provided by a fleet of 16,000 taxis in Singapore. We will refer to vehicles as taxis for the rest of the thesis. Collectively, these taxis deliver approximately 340,000 trips per day. We illustrate the operation of our algorithm for the Central Business District (CBD) and extend it for the entire island of Singapore. We evaluate how effective our solution is at minimizing the amount of time taxis drive empty by comparing against a greedy policy as well as what actual taxi drivers do based on historical data. Algorithm 7 Informative Path Controller Pseudocode Parameters: arrival distribution Za, patrol loop f, waypoints W = {wI, w2, cated at P = {Pi,P2,--.,Pn}, vector of taxis Se {Ss2, ... 
..., s_m}
1: loop
2:    while ∂H/∂p_i^ℓ > 0 do
3:        Compute neighbor waypoint locations p_{i-1}^ℓ, p_{i+1}^ℓ
4:        Compute Voronoi partition V_i^ℓ from Z_a
5:        Compute centroid C_i^ℓ by integrating over V_i^ℓ
6:        Compute control input u_i^ℓ according to (2.13)
7:        Update waypoint position p_i^ℓ according to ṗ_i^ℓ = u_i^ℓ
8:    end while
9:    Check for incoming requests R = {r_1, r_2, ..., r_k}
10:   Assign requests R_ℓ ⊆ R within ∪_i V_i^ℓ to ℓ
11:   Assign nearest taxis S* ⊆ S_ℓ to requests R_ℓ
12:   Rebalance remaining S_ℓ \ S* taxis s.t. Σ_i Φ(w_i) = 0
13: end loop

4.3.1 Computational Complexity
For the dynamic patrolling policy, at each iteration the controller must compute the Voronoi cell for each waypoint and the spatial integrals over the region. Thus the parameters affecting the computation time are the number of agents X, the number of waypoints n, and the number of grid squares in the integral computation m. A decentralized algorithm for a single agent to compute its Voronoi cell [11] runs in O(n) time. The time complexity for computing a discretized integral is linear in the number of grid squares, and each grid square requires a check whether its center point is within the Voronoi cell, which is O(n). Therefore, the time complexity of the integral is O(nm). If the Voronoi cell is computed first, followed by the discretized integral, the total time complexity is O(n(m + 1)) at each step of the control loop. Therefore, in the multi-agent case a single Voronoi controller has time complexity O(X · n(m + 1)).

4.3.2 Solution Outline
The service region is subject to incident customer requests located at points q ∈ Q and is patrolled by N taxis whose task is to service these requests in a manner that minimizes the distance driven to every request. Requests arrive at a rate λ, representing the sensory function φ(q). Each patrol loop is defined by a fixed number of waypoints whose positions are computed using historical customer request distributions.
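The waypoint-update stage of Algorithm 7 reduces, on a discretized arrival surface, to assigning weighted grid squares to their nearest waypoint and computing cell centroids. The grid-based sketch below is an illustration of the O(nm) integral computation from Section 4.3.1, not the thesis implementation:

```python
import math

def voronoi_centroids(waypoints, grid):
    """Assign each grid square (weighted by the arrival surface Za) to its
    nearest waypoint and return the weighted centroid of each resulting
    Voronoi cell. grid is a dict {(x, y): weight}; the double scan over
    m grid squares and n waypoints is O(n*m), as in Section 4.3.1."""
    mass = [0.0] * len(waypoints)
    moment = [[0.0, 0.0] for _ in waypoints]
    for (x, y), w in grid.items():
        i = min(range(len(waypoints)),
                key=lambda k: math.dist(waypoints[k], (x, y)))
        mass[i] += w
        moment[i][0] += w * x
        moment[i][1] += w * y
    # A waypoint whose cell carries no mass keeps its current position.
    return [(mx / m, my / m) if m > 0 else wp
            for (mx, my), m, wp in zip(moment, mass, waypoints)]

wps = [(0.0, 0.0), (4.0, 0.0)]
za = {(0.0, 0.0): 1.0, (1.0, 0.0): 1.0, (3.0, 0.0): 1.0, (4.0, 0.0): 1.0}
cents = voronoi_centroids(wps, za)
```

Each waypoint would then be driven toward its centroid by the control input of line 6, in the style of a Lloyd iteration.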
Our patrolling policy is adaptive in time and benefits from a finer discretization time period. In this work we use 15-minute time periods to compute 96 patrol loops for simulations over a 24-hour period (Figures 4-1b, 4-1c). The simplest scheduling and path planning protocols are used to ensure that performance is attributable to the patrolling policy only; we emphasize that neither of these elements is crucial to the operation of the patrolling policy. In Section 4.5.1 we present control experiments that evaluate our policy against another with identical scheduling and path planning, but using greedy redistribution.

[Figure 4-1; panel (c): Singapore loops.] Figure 4-1a shows a surface plot of an example arrival distribution Z_a for the CBD, overlaid on a map of the region; the destination distribution Z_δ has a similar format. Figures 4-1b and 4-1c show the temporal progression of the patrol loops, overlaid on longitude/latitude plots of the GPS coordinates that form the service region. Patrol loops are shown changing dynamically in 15-minute time periods throughout the day (for a total of 96 iterations for each loop), with a darker shade indicating the most recent configuration.

4.3.3 Algorithm Description
Algorithm 7 contains the pseudocode for the informative path patrolling policy. The first stage of the patrolling algorithm describes how the patrol loop waypoints reposition themselves in locally optimal locations. The algorithm calculates the Voronoi region for each waypoint, computes the centroid of the region based on Z_a, and subsequently repositions the waypoints based on (2.17). This algorithm can be implemented in a distributed way such that it can be computed for each waypoint independently, sharing only information with neighboring waypoints and enough information for all waypoints to compute their Voronoi regions.
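The 15-minute discretization above maps each time of day to one of 96 epochs; a one-line sketch (the function name is ours):

```python
def epoch_index(hour, minute, period_min=15):
    """Map a time of day to its discretization epoch: with 15-minute
    periods there are 96 epochs per 24-hour day (Section 4.3.2)."""
    return (hour * 60 + minute) // period_min
```

For example, `epoch_index(0, 0)` is epoch 0 and `epoch_index(23, 59)` is epoch 95, the last of the day.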
Once the waypoints have converged to a locally optimal configuration, the second stage of the algorithm assigns requests to taxis and rebalances taxis within each patrol loop. Each taxi is initially assigned a home patrol loop in a round-robin manner. Taxis cycle through four successive modes of operation: FREE, ONCALL, POB ("passenger on board"), and RETURN. The service model assumes a dispatch center that controls all incoming requests. Scheduling is performed by matching incoming requests with the nearest available taxis. Once assigned a request, the taxi picks up the customer (ONCALL), delivers them to their destination (POB), and returns to its home loop (RETURN). Taxis flagged FREE use an intra-loop redistribution policy that is analogous to flow equilibrium at waypoints. Taxis patrol around their home loop until every waypoint is serviced by a taxi. Thereafter, taxis remain stationed at their waypoints (treating them as virtual taxi stands), with any remainder of taxis (modulo the number of waypoints) continuing to patrol around the loop. This ensures that all waypoints receive equal service in steady state, while also ensuring that taxis do not waste fuel unnecessarily. Assuming we have N_ℓ taxis and n waypoints along loop ℓ, we require that the net flux (rate of inflow and outflow) Φ of taxis at waypoints converge to zero over the entire loop, i.e., Σ_{i=1}^{n} Φ_i = 0. The following scenarios serve as the basis for all possible taxi dynamics along loop ℓ:

1. N_ℓ < n: each taxi continuously patrols along the loop.
2. N_ℓ = n: taxis redistribute to the nearest waypoint until there are N_ℓ/n taxis at each waypoint, and remain stationary queueing for a customer request.
3. N_ℓ > n: N_ℓ − (N_ℓ mod n) taxis queue at their respective waypoints waiting for a customer, while the remaining N_ℓ mod n taxis continue to patrol around the loop, ensuring that Σ_{i=1}^{n} Φ_i = 0.
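The three rebalancing scenarios can be written down directly. A sketch (function and key names are ours) that splits the taxis of loop ℓ into stationed and patrolling groups:

```python
def rebalance(num_taxis, num_waypoints):
    """Split N_l taxis among n waypoints per Section 4.3.3: with fewer taxis
    than waypoints everyone patrols; otherwise N_l - (N_l mod n) taxis are
    stationed evenly at waypoints and the remaining N_l mod n keep patrolling,
    so the net flux at the waypoints stays zero."""
    n = num_waypoints
    if num_taxis < n:
        return {"stationed_per_waypoint": 0, "patrolling": num_taxis}
    patrolling = num_taxis % n
    return {"stationed_per_waypoint": (num_taxis - patrolling) // n,
            "patrolling": patrolling}
```

For example, 12 taxis on a 5-waypoint loop station 2 taxis at each waypoint and leave 2 patrolling.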
Our service policy ensures that each patrolling loop is in a locally optimal configuration within the environment while bounding the number of outstanding customer requests. In this work we do not allow taxis to exchange home loops, as this could lead to scenarios whereby a patrolling loop loses all of its taxis to neighboring loops. We solve this by ensuring that requests are assigned to the loop in whose Voronoi region they originate.

4.4 Modeling Historical Data

[Figure 4-2: (a) CBD simulation, (b) Singapore simulation, (c) Singapore simulation (overlaid on Z_a).] Figures 4-2a through 4-2c show screenshots of the simulator in action. Taxis are indicated by a colored triangle. A taxi can be in one of four states: traveling along the patrol loop (FREE, black), servicing a pickup request (ONCALL, yellow), driving a passenger to their destination (POB, red or cyan), or returning to the patrol loop (RETURN, blue). Pending requests are shown with a yellow o, and outstanding requests are shown with a circled red o.

We use data collected by a fleet of 16,000 taxis in Singapore. The dataset is one month (August 2010) of trips, consisting of millions of data points at thousands of GPS locations. Each entry records time, location, ID, etc., as well as the status (FREE, ONCALL, POB, etc.). The data serves several purposes. First, we use a subset of the data to train our dynamic patrolling algorithm. Second, we use two subsets of the dataset as test data for conducting simulations: the same day (Monday, August 16) for real-time data simulations, which do not require training, and the same day of the following week (Monday, August 23) for unseen data simulations. Finally, we use the data to quantify ground truth redistribution of taxis in Singapore. Because the actual taxi operation is unmanaged, there is no direct comparison against an existing policy.
Instead, we analyze the distribution of the fleet throughout the day and record statistics such as odometry, status of operation, etc., that can be used as quantifiable metrics in our analysis.

4.4.1 Arrival and Destination Distributions
Training the policy and conducting simulations both require knowledge of customer arrivals and destinations. Historical data is used to compute spatial arrival and destination distribution surfaces, denoted by Z_a and Z_δ, respectively. The region is discretized into a 50 x 50 grid, with the height of the surface at each location representing the probability of either a customer arrival (a) or a request destination (δ). We use a 15-minute discretization to construct the surfaces. Figure 4-1a shows an example of the arrival surface Z_a. Data sparsity is almost always an issue in statistical modeling. Considering that the one-month dataset spans a 50 x 50 x 96 x 31 space, we see that even a large amount of data will be very sparse. Two stages of smoothing were used to improve the performance of our model. First, each 24-hour dataset was smoothed temporally using a simple averaging filter to reduce noise caused by the temporal discretization. The resulting surfaces for each time window were then normalized and smoothed using a Gaussian filter to reduce noise caused by the 50 x 50 spatial discretization.

4.5 Experiments
A simulation framework was implemented in MATLAB. The model implements the spatial PDP formulation presented in Section 4.2. Customer requests arrive in a Poisson process with rate parameter λ(t) and are distributed according to Z_a(t). Taxis traverse the space in straight lines and at constant speed. As the simulation evolves, customers are serviced by N(t) taxis that respond to the incoming pickup requests. Customer destinations are distributed according to the destination distribution surface Z_δ(t). Figure 4-2 shows annotated screenshots of a typical simulation.
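The first smoothing stage described above (temporal averaging across epochs, followed by normalization) can be sketched in a few lines; the window size and function names are illustrative, not the thesis' exact parameters:

```python
def smooth_temporal(surfaces, window=3):
    """Average each grid cell over neighboring epochs (a simple moving-average
    filter) to reduce noise from the 15-minute temporal discretization.
    surfaces is a list of 2-D grids, one per epoch."""
    T = len(surfaces)
    rows, cols = len(surfaces[0]), len(surfaces[0][0])
    out = []
    for t in range(T):
        lo, hi = max(0, t - window // 2), min(T, t + window // 2 + 1)
        out.append([[sum(surfaces[k][r][c] for k in range(lo, hi)) / (hi - lo)
                     for c in range(cols)] for r in range(rows)])
    return out

def normalize(surface):
    """Scale a surface so its entries sum to one (a probability surface)."""
    total = sum(sum(row) for row in surface)
    return [[v / total for v in row] for row in surface] if total else surface
```

The second stage, spatial Gaussian smoothing of each normalized 50 x 50 surface, would follow the same pattern with a 2-D Gaussian kernel.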
Our simulation engine can incorporate any path planning mechanism that maps the locomotion of the taxi onto the road network. This additional complexity was omitted in this work in order to evaluate the effect of the informative path policy in isolation. Because the recorded data from the Singapore taxi fleet use the underlying road network, there is clearly a cost associated with assuming a straight-line path planner. We evaluate this cost by leveraging Google's extensive geocoding API. For a given pair of coordinates p_1 and p_2 we can calculate the driving distance from p_1 to p_2, taking into account the time, day, road conditions, and road directionality. Distances recorded in simulation incur a cost factor β(p_1, p_2, t) = d_drive(p_1, p_2, t)/‖p_1 − p_2‖, by which the recorded straight-line distance must be scaled to recover the true driving distance. 2,500 Monte Carlo simulations were carried out to approximate β under steady-state conditions by sampling point pairs and computing the driving distance between them. The average value was calculated to be β = 1.974. Thus, on average, a taxi would drive approximately twice the distance quoted by the simulation if it had made the same journey along the Singapore road network.

Three different types of simulation were conducted: (1) greedy policy simulations, which establish the benefit of the patrolling policy as a benchmark against a simplistic redistribution strategy, (2) ground truth simulations, which use the same number of taxis N that were recorded for a corresponding scenario from historical data, and (3) stability-based simulations, which aim to find the minimum number of taxis N_min that ensures a k-tight stability guarantee defined by (4.1). A stability margin of k = 0.01 (i.e., 1 percent) was chosen for our experiments.
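The Monte Carlo estimation of β can be sketched as follows. The driving-distance function here is a stand-in (the thesis queried Google's geocoding API for real road distances), so the resulting value is purely illustrative:

```python
import math
import random

def estimate_beta(drive_dist, side, samples=2500, seed=0):
    """Monte Carlo estimate of the distance cost factor
    beta = drive_dist(p1, p2) / ||p1 - p2||, averaged over random point
    pairs in a square region of the given side length."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    while count < samples:
        p1 = (rng.uniform(0, side), rng.uniform(0, side))
        p2 = (rng.uniform(0, side), rng.uniform(0, side))
        d = math.dist(p1, p2)
        if d == 0.0:          # skip degenerate coincident pairs
            continue
        total += drive_dist(p1, p2) / d
        count += 1
    return total / count

# Stand-in road model: Manhattan driving on a grid-like network, for which
# the ratio to straight-line distance lies between 1 and sqrt(2).
manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
beta = estimate_beta(manhattan, side=10.0)
```

With the real Singapore road network, the same procedure yielded the reported β = 1.974.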
[Figure 4-3: Single-loop (CBD) and multi-loop (Singapore-wide) simulation results. Panels: (a) CBD average on-call distance per trip (patrolling policy vs greedy policy); (b) Singapore average on-call distance per trip (patrolling policy vs greedy policy); (c) CBD average on-call distance per trip (patrolling policy vs historical data); (d) Singapore average on-call distance per trip (patrolling policy vs historical data); (e) CBD average on-call distance per taxi (patrolling policy vs historical data); (f) Singapore average on-call distance per taxi (patrolling policy vs historical data).]

4.5.1 Greedy Policy Experiments
We first show the utility of our policy by comparing it against a simplistic patrolling policy implemented in the same simulation framework. This "like to like" comparison is important for any simulation-based work, since the assumptions made in the simulation engine can affect the generality of the results.
Further, these simulations serve as control experiments: if the greedy policy performs poorly, then we can conclude that the improvement was due to the informative path redistribution (the only difference between the two policies) and not due to path-planning or scheduling simplifications, which are trivial by comparison. Greedy policy simulations were carried out for the CBD and for the whole of Singapore. Different numbers of loops (5, 10, 15, 20, 25) were used in multi-loop experiments to benchmark the best number of loops to use in the main experiments.

4.5.2 Single Loop Experiments
The CBD was chosen for the single-loop experiments because (1) it has a high volume of customer requests throughout the day and thus presents a lot of scope for optimization, (2) the CBD is representative of Singapore as a whole in terms of request arrival and destination flow throughout the day, and (3) an area the size of the CBD is an appropriate service region to consider for a single loop. Both ground truth and stability-based simulations were carried out for the CBD. We use a 15-minute discretization epoch both for updating the patrol loops and for determining the corresponding number of taxis N and arrival rate λ. Due to the fine discretization and a relatively small arrival rate, we add smoothing to reduce noise in the results.

4.5.3 Multi-Loop Experiments
Although a single-loop policy can be constructed for a service region of any size, its interpretation becomes increasingly questionable when applied to larger areas, all the way up to an entire city. Undoubtedly, the most interesting application is to consider multiple loops that are constructed to work together to service customer requests. Our multi-loop experiments scale up to consider the whole of Singapore. We use a larger time discretization for the arrival rate λ and for the number of taxis in ground truth simulations.
This gives us six four-hour epochs to consider throughout the course of the day, which are representative of different time periods throughout a typical day (night, early morning, morning, afternoon, evening, late evening). Using a finer discretization at such a large scale would not be meaningful, and would simply average results over several time periods in order to contextualize them. We maintain the 15-minute discretization for updating the patrol loops to ensure that the loops can adapt to the changing arrival rate (also discretized into 15-minute time steps). Both ground truth and stability-based simulations were carried out for the whole of Singapore. Based on preliminary results we consider 25 loops the optimal choice for our experiments. There are 28 postal districts in Singapore, so the 25-loop case gives us an approximation for scaling up a single-loop policy of the kind used for the CBD (which covers around 1-2 postal districts).

4.5.4 Unseen Test Data
In previous work [2] we demonstrated how to accurately infer traffic volume from historical data collected on different days. To determine whether our policy is still useful in the absence of real-time data, we conduct simulations using unseen historical test data. Our algorithm was first trained by pre-constructing dynamic patrolling loops using historical data from the same day (August 16). Experiments were then carried out using historical data from the same day in the following week (August 23). A full replication of all the preceding experiments was conducted for both the CBD and Singapore-wide multi-loop scenarios.

4.6 Results
First, we consider the implications of the greedy policy simulations. Figures 4-3a and 4-3b show the greedy simulation results. In the case of the CBD we observe an overall increase in ONCALL distance per trip by a factor of 1.42. For the entire region of Singapore, the greedy policy performs even worse, increasing the overall ONCALL distance by a factor of 3.71.
This supports our intuition that our policy is useful: because the control experiments employing a simple policy performed much worse using the same simulation engine, the increase in performance is due to the patrolling policy only. We are interested in examining solutions from three different points of view: (1) customer, (2) taxi driver, and (3) urban planning.

Customer (quality of service). Quality of service is represented by customer waiting time, which is equivalent to the ONCALL distance of individual taxi trips. Thus, the average ONCALL distance per trip is an indicator of the expected waiting time. Figure 4-3c shows the average ONCALL distance per trip for our single-loop policy in the CBD. The average ONCALL distance per trip for August 16 is 0.45 km, as compared to 1.2 km from the historical data. Thus our single-loop policy reduces the total customer waiting time by a factor of 2.66. Figure 4-3d shows the ONCALL distance per trip for the multi-loop case. The total average ONCALL distance per trip computed using our model is 0.13 km, as compared to 1.7 km from the historical data. With a distance cost factor β ≈ 2, we see that our 25-loop patrolling policy in Singapore reduces the total customer waiting time by a factor of 6.84. For this scenario, a customer request that would historically have taken an hour to service can now be serviced in under 10 minutes.

Taxi Driver (distance traveled empty). We assume that the goal of the taxi driver is to minimize the amount of time driving empty. Figure 4-3e shows the ONCALL distance per taxi for our single-loop policy in the CBD. The average total ONCALL distance per taxi for August 16 is 0.06 km, as compared to 0.18 km from the historical data. Our single-loop policy reduces the average total distance driven empty by a factor of 3. This reduction corresponds to a 67 percent decrease in fuel consumption between subsequent customer requests. Figure 4-3f shows the ONCALL distance per taxi for the multi-loop case.
The total average ONCALL distance per taxi computed using our model is 0.12 km, as compared to 1.6 km from the historical data. With a distance cost factor β ≈ 2, we see that our 25-loop patrolling policy in Singapore reduces the average total distance driven empty by a factor of 6.74.

Urban Planning (reducing congestion). We assume that the goal of the municipal authority is to reduce congestion by reducing the number of taxis on the road. The minimum number of taxis N_min necessary to maintain stability is given by (4.1) for some stability margin k. For any number of taxis N ≥ N_min we define the utilization factor as

η = N_min / N.    (4.2)

The utilization factor η is the fraction of the N available taxis that can service all requests while maintaining stability for a given λ. The total utilization factor on August 16 in the CBD using our model is η = 0.05; thus our model requires only 5 percent of the total taxis available throughout the day to maintain stability. The total utilization factor for the multi-loop case is η = 0.14, similarly implying that the taxi network is over-utilized by an order of magnitude.

Table 4.1: Total patrolling policy ONCALL distances and utilization factor over 24-hour simulations.

                              CBD                   Singapore
                        Aug. 16   Aug. 23     Aug. 16   Aug. 23
On-call per taxi (km)    0.06      0.06        0.12      0.13
On-call per trip (km)    0.45      0.44        0.13      0.13
Utilization factor η     0.05      0.07        0.14      0.14

4.6.1 Unseen Test Data Results
All of the preceding experiments were conducted on unseen test data, as described in Section 4.5.4. By evaluating it on previously unseen test data from August 23, we see that our model performs well and maintains its robustness in the absence of real-time data. Figure 4-3 shows the results for August 23 overlaid in green. We see that the ONCALL distance results maintain nearly the same magnitude and provide the same caliber of improvement over the historical data as described in Section 4.5.4.
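The utilization factor η = N_min/N from (4.2) is straightforward to compute; a minimal sketch (the function name is ours):

```python
def utilization_factor(n_min, n_available):
    """eta = N_min / N (Equation 4.2): the fraction of the available fleet
    needed to keep the system stable at the observed arrival rate."""
    if not 0 < n_min <= n_available:
        raise ValueError("need 0 < N_min <= N")
    return n_min / n_available
```

For instance, the reported CBD figure η = 0.05 means 5 percent of the fleet suffices, i.e. the network is over-provisioned by a factor of 1/η = 20.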
We conclude that our policy results in a comparable improvement in performance in the absence of real-time data.

4.7 Discussion
In this chapter we presented a novel patrolling policy for a fleet of service vehicles responding to requests in a PDP scenario. Our policy uses patrol loops based on informative path planning to minimize the distance driven by the vehicles to an incident request. We formalized the notion of stability in our problem context and proved guarantees for our policy. We used historical data from a fleet of 16,000 taxis in Singapore to (1) infer the current ground truth behavior of the unmanaged taxi fleet, (2) train our algorithm, and (3) conduct simulations using both real-time and unseen test data. We evaluated the performance of our policy in terms of customer waiting time, distance driven empty, and congestion. The experiments show that we can achieve a substantial improvement in customer waiting time and expected distance driven empty. Further, we observe that the taxi network is over-utilized by showing that a similar level of service is possible with far fewer taxis. Finally, we show that our policy generalizes well to unseen test data, offering an improvement in performance that is on par with results from real-time simulations.

Chapter 5
Conclusion and Lessons Learned
Building upon previous work in [27, 33], in this thesis we presented an informative path controller for both single-robot and multi-robot cases. This controller is decentralized, adaptive, and based on a mixing-function coverage approach that enables robots to combine sensor estimates to learn the distribution of sensory information in the environment and reshape their paths into informative paths. A mixing-function-based coverage approach is robust in that different classes of mixing functions can be used to recover multiple common control strategies, including minimum variance and Voronoi approaches.
Additionally, as the free parameter α decreases from −1 toward −∞, the mixing function approximates the Voronoi approach arbitrarily well. A Lyapunov stability proof shows that the controller will reshape the paths to locally optimal configurations and drive the estimated parameter vector error to zero, provided that the robots' initial trajectories are sufficiently rich. We observed how the free parameter α affected the behavior of the paths. While α ∈ (−∞, −1) generally reproduces the same informative paths given by the Voronoi approach, α = −1 reduces the sensitivity of the Voronoi controller, so that extremely small perturbations in initial waypoint positioning do not result in significantly different informative path configurations for a multi-robot system. We also showed that the weights assigned to the sensing task W_s and the neighbor distance W_n can affect the informative path configuration. Thus, these parameters can be used to tune the system according to the desired behavior. For example, if shorter paths are desired, then setting W_n higher is recommended. However, if sensing is very important and short paths are not necessary, then a high W_s would generate the desired result. Additionally, these weights can have a big effect on whether the final paths intersect, as it was observed in our simulations and experiments that a high W_s tends to generate non-intersecting paths. The informative path control algorithm enables robots to sample and generate useful paths for applications such as surveillance and persistent sensing. In this thesis, we extended the informative path controller to be used in conjunction with a speed controller presented in [30], to drive the paths into locally optimal configurations that are beneficial for persistent sensing tasks.
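The limiting behavior of the mixing function discussed above can be illustrated with a power-mean form; this is one standard member of the mixing-function family (the thesis' exact g_α is defined in Chapter 2), shown here only to make the α → −∞ limit concrete:

```python
def mixing(values, alpha):
    """Power-mean mixing function g_alpha(f) = (sum_i f_i**alpha)**(1/alpha)
    for alpha < 0 and positive inputs. As alpha -> -inf this approaches
    min(values), the Voronoi (geometric) interpretation; at alpha = -1 it
    blends all values smoothly, the less sensitive variant."""
    if alpha >= 0 or any(v <= 0 for v in values):
        raise ValueError("need alpha < 0 and positive values")
    return sum(v ** alpha for v in values) ** (1.0 / alpha)

f = [1.0, 2.0, 4.0]
soft = mixing(f, -1.0)    # smooth blend, lies below min(f)
hard = mixing(f, -50.0)   # very close to min(f) = 1.0
```

This makes the tradeoff visible: a strongly negative α reproduces the winner-take-all Voronoi behavior, while α = −1 keeps every sensor's contribution in play.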
Using this extended controller, referred to as an informative persistence controller, the robots drive their paths in a direction in which the stability margin of the persistent sensing task does not decrease, hence improving the performance of the robots executing the persistent sensing task. Increasing the stability margin has the additional benefit of allowing the controller to more easily overcome unmodeled errors in the persistent sensing task, such as tracking errors by the robots and discretization of the path. Although it uses the same coverage algorithms, in general the final paths produced by the informative persistence controller were different from the paths produced by the informative path controller. This is due to the additional restriction of the non-decreasing stability margin. Finally, we used the informative persistence controller to derive a novel patrolling policy for a fleet of service vehicles responding to requests in a PDP scenario. Our policy used patrol loops based on a Voronoi coverage control strategy to minimize the distance driven by the vehicles to an incident request. We formalized the notion of stability in our problem context and proved guarantees for our policy. We used historical data from a fleet of 16,000 taxis in Singapore to (1) infer the current ground truth behavior of the unmanaged taxi fleet, (2) train our algorithm, and (3) conduct simulations using both real-time and unseen test data. We evaluated the performance of our policy in terms of customer waiting time, distance driven empty, and congestion. The experiments show that we can substantially reduce customer waiting time and the expected distance driven empty. Further, we observed that the taxi network is over-utilized by showing that a similar level of service is possible with far fewer taxis. Finally, we show that our policy generalizes well to unseen test data, offering an improvement in performance that is on par with results from real-time simulations.
Appendix A
Tables of Mathematical Symbols

Table A.1: Common symbols for each control strategy

Symbol        Definition
Q             Convex bounded environment
q             An arbitrary point in Q
d             Number of parameters
α             Mixing function free parameter
φ(q)          Sensory function at point q
B(q)          Vector of basis functions for the sensory function at point q
a             True parameter vector for the sensory function, φ(q) = B(q)^T a
c(q)          Consumption rate at point q for persistent sensing
σ_j           Standard deviation of the jth Gaussian basis function
μ_j           Mean of the jth Gaussian basis function
ρ_trunc       Truncation distance for Gaussian basis functions
s(q)          Stability margin of point q for persistent sensing
S             Stability margin of the persistent sensing task
G(j)          Gaussian function used to calculate the truncated Gaussian basis
W_n           Weight assigned to neighboring waypoint distance
W_s           Weight assigned to the mixing function coverage task
ρ             Radius of the circular sensor footprint for persistent sensing
T_dwell       Dwelling time between switching from one to zero
v             Arbitrary real constant scalar
Γ             Diagonal, positive-definite adaptation gain matrix
γ             Gain for adaptation law

Table A.2: Single-robot controller symbols

Symbol        Definition
p_r(t)        The robot's position at time t
n             Number of waypoints in the robot's path
p_i           The robot's ith waypoint position
V_i           Voronoi partition of the ith waypoint
M_i           Mass of Q
M̂_i          Approximation of M_i
Y_i           First mass moment of Q
Ŷ_i          Approximation of Y_i
C_i           Centroid of Q
Ĉ_i          Approximation of C_i
e_i, ê_i     C_i − p_i, Ĉ_i − p_i
v_i           W_n(p_{i+1} + p_{i−1} − 2p_i)
β_i, β̂_i    M_i + 2W_n, M̂_i + 2W_n
φ_{p_r(t)}    Sensory function at the robot's position, φ(p_r(t))
φ̂(q, t)      The robot's approximation of φ(q)
B_{p_r(t)}    Vector of basis functions at the robot's position, B(p_r(t))
â             The robot's parameter estimate
ã             The robot's parameter error, â − a
u_i           Control input for the ith waypoint
H(p_1, ..., p_n)   Locational cost function
Λ             The robot's weighted integral of basis functions
λ             The robot's weighted integral of sensory
measurements
K_i           Control gain matrix for the ith waypoint in the robot's path
w             The robot's data weighting function
V             Lyapunov-like function for informative path and persistent sensing
b             Term in adaptation laws for purposes of the Lyapunov proof
â_pre         Time derivative of the robot's parameter before projection
I_proj        Projection matrix
T(t)          Time it takes the robot to complete its path at time t
τ^c(q, t)     Time the robot covers q along its path at time t
F(p_r)        The robot's sensor footprint
ŝ(q)          The robot's estimated stability margin of point q
Ŝ             The robot's estimated stability margin of the persistent task
T_w           Time at which the adaptation data weighting function is set to zero
I_i           Boolean control input for waypoint movement in persistent sensing
t_i           Most recent time the boolean input I_i switches to one for the ith waypoint

Table A.3: Multi-robot controller symbols

Symbol        Definition
p_r(t)        Robot r's position at time t
n_r           Number of waypoints in robot r's path for the multi-robot system
N             Number of robots in the multi-robot system
p_i^r         Robot r's ith waypoint position
V_i^r         Voronoi partition of the ith waypoint in robot r's path
M_i^r         Mass of Q
M̂_i^r        Approximation of M_i^r
Y_i^r         First mass moment of Q
Ŷ_i^r        Approximation of Y_i^r
C_i^r         Centroid of Q
Ĉ_i^r        Approximation of C_i^r
e_i^r         C_i^r − p_i^r
v_i^r         W_n(p_{i+1}^r + p_{i−1}^r − 2p_i^r)
β_i^r, β̂_i^r   M_i^r + 2W_n, M̂_i^r + 2W_n
φ_{p_r(t)}    Sensory function at robot r's position, φ(p_r(t)).
φ̂_r(q, t)    Robot r's approximation of φ(q)
B_{p_r(t)}    Vector of basis functions at robot r's position, B(p_r(t))
â_r           Robot r's parameter estimate
ã_r           Robot r's parameter error for the multi-robot system, â_r − a
u_i^r         Control input for the ith waypoint in robot r's path
H_α           Locational mixing-function-based cost function
Λ_r           Robot r's weighted integral of basis functions
λ_r           Robot r's weighted integral of sensory measurements
K_i^r         Control gain matrix for the ith waypoint in robot r's path
w_r           Robot r's data weighting function
V             Lyapunov-like function for multi-robot coverage and persistent sensing
l_{r,r'}      Weighting between parameters for robots r and r'
L             Graph Laplacian of the robot network
b_r           Terms in adaptation laws for purposes of the Lyapunov proof
â_pre,r       Time derivative of robot r's parameter before projection
D_max         Maximum distance the robots can have and still communicate
I_proj,r      Projection matrix
Q_j           Vector containing the jth parameter of each robot
ζ             Consensus gain
T_r(t)        Time it takes robot r to complete its path at time t
τ_r^c(q, t)   Time robot r covers q along its path at time t
F_r(p_r)      Robot r's sensor footprint
ŝ_r(q)        Robot r's estimated stability margin of point q for persistent sensing
Ŝ_r           Robot r's estimated stability margin of the persistent sensing task
T_w,r         Time at which the adaptation data weighting function is set to zero for robot r
I_i^r         Boolean control input for shutting down waypoint movement in persistent sensing
t_i^r         Most recent time the boolean input I_i^r switches to one for the ith waypoint in robot r's path

Bibliography

[1] A. Arsie and E. Frazzoli. Efficient routing of multiple vehicles with no explicit communications. International Journal of Robust and Nonlinear Control, 18(2):154-164, January 2007.

[2] J. Aslam, S. Lim, X. Pan, and D. Rus. City-scale traffic estimation from a roving sensor network. In Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems, pages 141-154. ACM, 2012.

[3] R. Balakrishna, M.
Ben-Akiva, and H.N. Koutsopoulos. Offline calibration of dynamic traffic assignment: simultaneous demand-and-supply estimation. Transportation Research Record: Journal of the Transportation Research Board, 2003:50-58, 2007.
[4] F. Bourgault, A.A. Makarenko, S.B. Williams, B. Grocholsky, and H.F. Durrant-Whyte. Information based adaptive robotic exploration. In Intelligent Robots and Systems, 2002 IEEE/RSJ International Conference on, volume 1, pages 540-545, 2002.
[5] H. Choset. Coverage for robotics - a survey of recent results. Annals of Mathematics and Artificial Intelligence, 31(1-4):113-126, 2001.
[6] J. Cortes. Distributed kriged Kalman filter for spatial estimation. Automatic Control, IEEE Transactions on, 54(12):2816-2827, December 2009.
[7] J. Cortes, S. Martinez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. Robotics and Automation, IEEE Transactions on, 20(2):243-255, April 2004.
[8] C.T. Cunningham and R.S. Roberts. An adaptive path planning algorithm for cooperating unmanned air vehicles. In Robotics and Automation, 2001. Proceedings 2001 ICRA, IEEE International Conference on, volume 4, pages 3981-3986, 2001.
[9] Y. Elmaliach, N. Agmon, and G.A. Kaminka. Multi-robot area patrol under frequency constraints. In Robotics and Automation, 2007 IEEE International Conference on, pages 385-390, April 2007.
[10] T.L. Friesz, J. Luque, R.L. Tobin, and B.W. Wie. Dynamic network traffic assignment considered as a continuous time optimal control problem. Operations Research, pages 893-901, 1989.
[11] R. Graham and J. Cortes. Adaptive information collection by robotic sensor networks for spatial estimation. Automatic Control, IEEE Transactions on, PP(99):1, 2011.
[12] G. Hollinger and S. Singh. Multi-robot coordination with periodic connectivity. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 4457-4462. IEEE, 2010.
[13] L.E. Kavraki, P. Svestka, J.-C. Latombe, and M.H. Overmars.
Probabilistic roadmaps for path planning in high-dimensional configuration spaces. Robotics and Automation, IEEE Transactions on, 12(4):566-580, August 1996.
[14] K.J. Kyriakopoulos and G.N. Saridis. Minimum jerk path generation. In Robotics and Automation, 1988 IEEE International Conference on, pages 364-369, April 1988.
[15] J. Le Ny and G.J. Pappas. On trajectory optimization for active sensing in Gaussian process models. In Decision and Control, 2009, held jointly with the 2009 28th Chinese Control Conference (CDC/CCC 2009), Proceedings of the 48th IEEE Conference on, pages 6286-6292, December 2009.
[16] S. Li. Multi-attribute Taxi Logistics Optimization. PhD thesis, Massachusetts Institute of Technology, 2006.
[17] S. Lim, H. Balakrishnan, D. Gifford, S. Madden, and D. Rus. Stochastic motion planning and applications to traffic. The International Journal of Robotics Research, 30(6):699-712, 2011.
[18] K.M. Lynch, I.B. Schwartz, Peng Yang, and R.A. Freeman. Decentralized environmental modeling by mobile sensor networks. Robotics, IEEE Transactions on, 24(3):710-724, June 2008.
[19] D.K. Merchant and G.L. Nemhauser. Optimality conditions for a dynamic traffic assignment model. Transportation Science, 12(3):200-207, 1978.
[20] W.J. Mitchell, C.E. Borroni-Bird, and L.D. Burns. Reinventing the Automobile: Personal Urban Mobility for the 21st Century. MIT Press, Cambridge, MA, 2010.
[21] N. Nigam and I. Kroo. Persistent surveillance using multiple unmanned air vehicles. In Aerospace Conference, 2008 IEEE, pages 1-14, March 2008.
[22] M. Pavone, S.L. Smith, E. Frazzoli, and D. Rus. Robotic load balancing for mobility-on-demand systems. The International Journal of Robotics Research, 31(7):839-854, 2012.
[23] Yuan-Qing Qin, De-Bao Sun, Ning Li, and Yi-Gang Cen. Path planning for mobile robot using the particle swarm optimization with mutation operator. In Machine Learning and Cybernetics, 2004.
Proceedings of 2004 International Conference on, volume 4, pages 2473-2478, August 2004.
[24] T. Roughgarden and E. Tardos. How bad is selfish routing? Journal of the ACM (JACM), 49(2):236-259, 2002.
[25] M. Schwager, D. Rus, and J.-J. Slotine. Unifying geometric, probabilistic, and potential field approaches to multi-robot deployment. International Journal of Robotics Research, 30(3):371-383, March 2011.
[26] M. Schwager, J.-J. Slotine, and D. Rus. Decentralized, adaptive control for coverage with networked robots. In Robotics and Automation, 2007 IEEE International Conference on, pages 3289-3294, April 2007.
[27] Mac Schwager, Daniela Rus, and Jean-Jacques Slotine. Decentralized, adaptive coverage control for networked robots. The International Journal of Robotics Research, 28(3):357-375, 2009.
[28] A. Singh, A. Krause, C. Guestrin, and W.J. Kaiser. Efficient informative sensing using multiple robots. Journal of Artificial Intelligence Research, 34(2):707, 2009.
[29] J.-J.E. Slotine and W. Li. Applied Nonlinear Control. Prentice Hall, 1991.
[30] S.L. Smith, M. Schwager, and D. Rus. Persistent robotic tasks: Monitoring and sweeping in changing environments. Robotics, IEEE Transactions on, PP(99):1-17, 2011.
[31] S.L. Smith, M. Schwager, and D. Rus. Persistent monitoring of changing environments using a robot with limited range sensing. In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages 5448-5455, May 2011.
[32] Daniel E. Soltero, Stephen L. Smith, and Daniela Rus. Collision avoidance for persistent monitoring in multi-robot systems with intersecting trajectories. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 3645-3652, September 2011.
[33] Daniel E. Soltero, Mac Schwager, and Daniela Rus. Generating informative paths for persistent sensing in unknown environments. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, 2012. Submitted.
[34] A. Stentz.
Optimal and efficient path planning for partially-known environments. In Robotics and Automation, 1994 IEEE International Conference on, pages 3310-3317, May 1994.
[35] M. Volkov, J. Aslam, and D. Rus. Markov-based redistribution policy model for future urban mobility networks. In Intelligent Transportation Systems (ITSC), 2012 15th International IEEE Conference on, pages 1906-1911. IEEE, 2012.
[36] J.G. Wardrop. Some theoretical aspects of road traffic research. In Proceedings of the Institution of Civil Engineers, London, UK, 1952.
[37] Fumin Zhang and N.E. Leonard. Cooperative filters and control for cooperative exploration. Automatic Control, IEEE Transactions on, 55(3):650-663, March 2010.