Decentralized Mixing Function Control Strategy for
Multi-Robot Informative Persistent Sensing Applications
by
Gavin Chase Hall
B.S. Mathematics, B.S. Mechanical Engineering, B.S. Physics
West Virginia University, 2009
Submitted to the Department of Mechanical Engineering
in partial fulfillment of the requirements for the degree of
Master of Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Signature redacted
Author ....
Department of Mechanical Engineering
May 15, 2014
Signature redacted
Certified by
Daniela Rus
Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Signature redacted
Certified by .........
Jean-Jacques E. Slotine
Professor of Mechanical Engineering
Thesis Supervisor
Signature redacted
Accepted by.............
David E. Hardt
Chair, Department Committee on Graduate Students
Decentralized Mixing Function Control Strategy for Multi-Robot
Informative Persistent Sensing Applications
by
Gavin Chase Hall
Submitted to the Department of Mechanical Engineering
on May 15, 2014, in partial fulfillment of the
requirements for the degree of
Master of Science
Abstract
In this thesis, we present a robust adaptive control law that enables a team of robots to generate locally optimal closed path persistent sensing trajectories through information rich
areas of a dynamic, unknown environment. This controller is novel in that it allows the
robots to combine their global sensor estimates of the environment using a mixing function to opt for either: (1) minimum variance (probabilistic), (2) Voronoi approximation, or
(3) Voronoi (geometric) sensing interpretations and resulting coverage strategies. As the
robots travel along their paths, they continuously sample the environment and reshape their
paths according to one of these three control strategies so that ultimately, they only travel
through regions where sensory information is nonzero. This approach builds on previous
work that used a Voronoi-based control strategy to generate coverage paths [32]. Unlike the
Voronoi-based coverage controller, the mixing-function-based coverage controller captures
the intuition that globally integrated sensor measurements more thoroughly capture information about an environment than a collection of independent, localized measurements.
Using a non-linear Lyapunov function candidate, we prove that the robots' coverage
path configurations converge to a locally optimal equilibrium between minimizing sensing
error and path length. A path satisfying this equilibrium is called an informative path. We
extend the informative path controller to include a stability margin and to be used in conjunction with a speed controller so that a robot or a group of robots equipped with a finite
sensing footprint can stabilize a persistent task by keeping all growing fields within the
environment bounded for all time. Finally, we leverage our informative persistent paths
to generate a dynamic patrolling policy that minimizes the distance between instantaneous
vehicle position and incident customer demand for a large fleet of service vehicles operating in an urban transportation network. We evaluate the performance of the policy by
conducting large-scale simulations to show global stability of the model and by comparing
it against a greedy service policy and historical data from a fleet of 16,000 vehicles.
Thesis Supervisor: Daniela Rus
Title: Professor of Electrical Engineering and Computer Science
Acknowledgments
I would like to thank Professor Daniela Rus for two fantastic years of my life. Being new to
research, she taught me many tricks of the trade to help me hit the ground running. Among
them were her vision for the application of theory and her knack for clearly conveying
complex solutions to select audiences. I like to think of her as a mother figure, except that
she avoids me like the plague outside of school. She has been a great friend, and I can
always count on her to give me the straight scoop about any issue. I would like to thank
Professor Jean-Jacques E. Slotine for making the technical portions of this research a tad
less gruesome. His course lectures single-handedly prepped me for the majority of the
theoretical work between the covers of this thesis.
I would also like to thank all of my two friends in the Distributed Robotics Laboratory
for making me realize that performing amazing work can actually be somewhat enjoyable.
My biggest thank you goes to my family. Whether it has been music, science, baseball, or
trick-or-treating until I was 28, you've made every step of the way so much better. I hope
my progress helps to make up for how much my sister let you down.
Dedicated to Dana M. Hall, Michael A. Hall, & Philip E. Harner
Contents

1 Introduction 19
1.1 Motivation and Goals . . . 19
1.2 Contribution to Robotics . . . 24
1.3 Relation to Previous Work . . . 27
1.4 Thesis Organization . . . 30

2 Informative Path Controller Using a Mixing Function 31
2.1 Problem Setup . . . 31
2.2 Mixing Function Cost . . . 34
2.3 Mixing Function Control Law . . . 35
2.4 Deriving Common Control Strategies . . . 38
2.4.1 Voronoi Control Strategy (α = -∞) . . . 39
2.4.2 Minimum Variance Probabilistic Control Strategy (α = -1) . . . 40
2.5 Mixing Function Controller Convergence . . . 41
2.5.1 Sensory Function Parameterization . . . 41
2.5.2 Coverage Convergence in a Known Environment . . . 43
2.5.3 Coverage Convergence in an Unknown Environment . . . 44
2.6 Single Robot Coverage Algorithm and Simulation . . . 52
2.6.1 Learning Phase . . . 54
2.6.2 Path Shaping Phase . . . 57
2.6.3 Varying Sensing Weight W and Path Weight Wn . . . 59
2.6.4 Computational Complexity of Single Robot Algorithm . . . 60
2.7 Multi-Robot Coverage Algorithm and Simulation . . . 60
2.7.1 Learning Phase . . . 62
2.7.2 Path Shaping Phase . . . 65
2.7.3 Varying Sensing Weight W and Path Weight Wn . . . 67
2.7.4 Computational Complexity of Multi-Robot Algorithm . . . 68
2.7.5 Robustness Considerations for Initial Waypoint Configurations . . . 69

3 Informative Persistence Controller for Multiple Robots using Mixing Function Coverage Approach 75
3.1 Relation to Persistent Sensing Tasks . . . 76
3.2 Informative Persistence Controller . . . 77
3.3 Single Robot Informative Persistence Algorithm and Simulation . . . 79
3.3.1 Learning Phase . . . 81
3.3.2 Path Shaping Phase . . . 84
3.3.3 Single Robot Persistence Simulation Discussion . . . 86
3.4 Multi-Robot Informative Persistence Algorithm and Simulation . . . 86
3.4.1 Learning Phase . . . 87
3.4.2 Path Shaping Phase . . . 91
3.4.3 Multi-Robot Persistence Simulation Discussion . . . 93

4 Dynamic Patrolling Policy Using an Informative Path Controller 95
4.1 Motivation . . . 95
4.2 Problem Formulation . . . 96
4.2.1 Using Informative Paths . . . 97
4.2.2 Multi-Agent Controller Extension . . . 97
4.2.3 Operational Stability . . . 98
4.3 Dynamic Patrolling Policy . . . 100
4.3.1 Computational Complexity . . . 101
4.3.2 Solution Outline . . . 101
4.3.3 Algorithm Description . . . 102
4.4 Modeling Historical Data . . . 104
4.4.1 Arrival and Destination Distributions . . . 105
4.5 Experiments . . . 106
4.5.1 Greedy Policy Experiments . . . 107
4.5.2 Single Loop Experiments . . . 108
4.5.3 Multi-Loop Experiments . . . 108
4.5.4 Unseen Test Data . . . 109
4.6 Results . . . 109
4.6.1 Unseen Test Data Results . . . 111
4.7 Discussion . . . 112

5 Conclusion and Lessons Learned 113

A Tables of Mathematical Symbols 115
List of Figures

1-1 Path reshaping process for three robots . . . 22
1-2 Example of persistent sensing by two robots . . . 23
1-3 Informative patrolling loops over historical demand distribution in Singapore . . . 25
2-1 Mixing function sensing behaviors . . . 33
2-2 Mixing function supermodularity . . . 34
2-3 Approximation to indicator function . . . 37
2-4 Mean integral parameter error and Lyapunov-like function in single robot learning phase . . . 55
2-5 Single robot learning phase with an informative path controller . . . 56
2-6 Mean waypoint position error under the informative path controller for a single robot . . . 57
2-7 Single robot path shaping phase with an informative path controller . . . 58
2-8 Single robot W vs. Wn . . . 59
2-9 Consensus and integral parameter errors for multiple robots . . . 63
2-10 Multi-robot learning phase with informative path controller . . . 64
2-11 Mean waypoint position error under the informative path controller for multiple robots . . . 65
2-12 Multi-robot path shaping with informative path controller . . . 66
2-13 Informative path configurations for two environments with different mixing functions . . . 67
2-14 Multiple robot W vs. Wn . . . 68
2-15 Sensitivity of Voronoi path separation . . . 70
2-16 Sensitivity of Voronoi smoothing path separation . . . 71
2-17 Relaxed path separation sensitivity of minimum variance coverage strategy . . . 72
2-18 Setup for testing initial waypoint position sensitivity . . . 73
3-1 Integral parameter error and Lyapunov function candidate of the informative persistence controller for a single robot . . . 82
3-2 Single robot learning phase with informative persistence controller . . . 83
3-3 Mean waypoint position error for a single robot . . . 84
3-4 Persistent task stability margin for a single robot . . . 85
3-5 Single robot path shaping phase with the informative persistence controller . . . 85
3-6 Integral parameter error under the informative persistence controller for multiple robots . . . 88
3-7 Lyapunov-like function in learning phase under the informative persistence controller for multiple robots . . . 89
3-8 Consensus error under the informative persistence controller for multiple robots . . . 89
3-9 Multi-robot learning phase with informative persistence controller . . . 90
3-10 Mean waypoint position error and persistent sensing task's stability margin for multiple robots . . . 91
3-11 Multi-robot path shaping phase with informative persistence controller . . . 92
4-1 Arrival and destination distribution and map of Singapore with informative loops . . . 102
4-2 Dynamic patrolling policy simulator . . . 104
4-3 Single-loop (CBD) and multi-loop (Singapore-wide) simulation results . . . 107
List of Algorithms

1 Mixing function controller for a single robot in an unknown environment: robot level . . . 52
2 Mixing function controller for a single robot in an unknown environment: waypoint level . . . 53
3 Mixing function controller for multiple robots in an unknown environment: robot level . . . 61
4 Mixing function controller for multiple robots in an unknown environment: waypoint level . . . 62
5 Informative persistence controller for a single robot: waypoint level . . . 80
6 Informative persistence controller for multiple robots: waypoint level . . . 87
7 Patrolling policy pseudocode . . . 100
List of Tables

4.1 Total patrolling policy on-call distance ratios and utilization factor over 24-hour simulations . . . 111
A.1 Common symbols for each control strategy . . . 115
A.2 Single-robot controller symbols . . . 116
A.3 Multi-robot controller symbols . . . 117
Chapter 1
Introduction
1.1
Motivation and Goals
When monitoring an unfamiliar and changing environment, robots face two significant
challenges: (1) the robots must identify the regions and rates of change corresponding
to important sensory information in the environment, and (2) the robots must determine
how to optimally position and allocate themselves to cooperatively collect this information.
Given a group of robots, each equipped with a sensor to measure the environment, our goal
is to derive an adaptive multi-robot control strategy that enables robots to employ multiple
sensing strategies to generate a configuration of closed paths that the robots will travel along
to maximize their collection of sensory information. These paths are called informative
paths, because they drive the robots through locations in the environment where the sensory
information is important.
Informative path planning brings the notion of sensory value into the planning problem, thus adding information to the geometric formulation. This has many applications.
For example, it can be used by a team of robots operating in an underground mining environment to estimate regions of significant CH4 accumulation and generate paths that enable
continual monitoring of these locations to prevent future mining disasters. Another example would be to use this controller to deploy a team of robots to monitor a wildfire over a
large environment. The robots would be able to learn the regions where the fire has spread
on-line and generate paths such that between all robots, all locations with fire damage are
constantly sensed, while regions with no fire damage are avoided.
In this thesis, we present a decentralized adaptive control algorithm for generating informative paths in unknown environments for multi-robot systems. This algorithm takes as
input sensory information collected by each robot over a dynamic environment and outputs
locally optimal paths whose trajectories cover the regions in the environment where sensory
information is nonzero. The first feature of this informative path algorithm is a parameter
estimation adaptation law the robots use to learn how sensory information is distributed
throughout the environment. The second feature is a mixing-function-based coverage controller that performs the reconfiguring of the paths based on a gradient optimization of a
cost function composed of a class of parameterized sensor mixing functions [25].
Mixing functions dictate how sensor measurements from different robots are combined
in order to represent different assumptions about the coverage task. By varying the value
of a free parameter, and consequently the mixing function class and the robots' aggregate
sensor mixing behaviors, the mixing function control strategy can recover multiple common control strategies, including minimum variance (probabilistic), Voronoi smoothing,
and strictly Voronoi (geometric) based approaches. These control strategies represent a
wide range of robot sensor combination abilities. Whereas the probabilistic control strategy promotes global sensor mixing by minimizing the expected variance of all robots'
sensor measurements of a point of interest, the geometric control strategy employs no sensor mixing and only considers the sensor measurements of the robot closest to a point
of interest [7, 25]. Voronoi smoothing bridges these two strategies by either increasing
or decreasing the amount of sensor synthesis between robots in accordance to the mixing
function's free parameter value.
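As a concrete illustration, one common family of mixing functions with exactly this behavior is the power mean g_α(f) = (Σ_i f_i^α)^(1/α) over the robots' per-point sensing costs f_i. The sketch below is illustrative only, not the thesis's exact formulation; the function and variable names are our own assumptions.

```python
import numpy as np

def mix(costs, alpha):
    """Hypothetical power-mean mixing function sketch:
    g_alpha(f) = (sum_i f_i^alpha)^(1/alpha), for per-robot sensing
    costs f_i > 0 and alpha < 0. A smaller cost means a robot senses
    the point of interest better."""
    f = np.asarray(costs, dtype=float)
    return np.sum(f ** alpha) ** (1.0 / alpha)

costs = [1.0, 2.0, 4.0]   # per-robot costs of sensing one point

# alpha -> -inf recovers the Voronoi (geometric) strategy: only the
# best-positioned robot counts, so the mixed cost approaches min(costs).
assert abs(mix(costs, -100.0) - min(costs)) < 1e-3

# alpha = -1 gives the minimum variance (probabilistic) combination:
# every robot's measurement contributes, so the mixed cost is smaller
# than any individual cost, rewarding redundant coverage.
print(mix(costs, -1.0))
```

Intermediate negative values of α interpolate between these two extremes, which is the Voronoi smoothing behavior described above.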
The mixing function controller is an extension to the probabilistic and geometric unifying controller introduced in [25]. It consists of placing the waypoints of a path in locally
optimal positions that achieve an equilibrium between minimizing the cost of a class of
mixing functions, sensing errors, and informative path lengths. Minimizing sensing errors
allows the robots to be close to the points of interest that they are sensing. Minimizing path
length reduces the robots' travel time through the regions of the environment with no sensory value. As the robots discover the structure of the environment, they reshape their paths
according to this equilibrium to avoid visiting static areas and focus on sensing dynamic
areas. An example of the reshaping process for three robots using a probabilistic control
strategy is shown in Figure 1-1.
A number of task-specific multi-robot control strategies have been proposed to accomplish informative path planning in a distributed and efficient way [7, 26]. In [27, 33], geometric-based Voronoi controllers with no sensor measurement synthesis between robots
were used. As a result of the robustness of the mixing function control strategy, not only
can we directly recover the Voronoi-based controller, we can also smoothly approximate
Voronoi-based coverage arbitrarily well, while preserving its asymptotic stability and convergence guarantees. Another benefit of the mixing function control strategy is that we can
employ a probabilistic approach to circumvent localized sensing sensitivities intrinsic to
geometric Voronoi-based controllers, which cause significant topological variance in informative path configurations for nearly identical initial robot configurations.
Finally, because the mixing function controller does not require the expensive computation of a Voronoi tessellation over an environment for any of its derived control strategies,
informative paths can be generated by robots with limited computational resources.
Generating informative paths using the mixing function control strategy is the first step
in stabilizing a persistent sensing task [30, 31] in unknown environments, where the robots
are assumed to have sensors with finite sensing radii and are required to revisit a location
of interest at a specific calculated frequency. A persistent sensing task is defined as a
monitoring scenario that can never be completed due to the continual change of the states
of the environment. For example, if the robot were to stop monitoring, the information at
some points in the environment would grow unbounded.
For a persistent sensing task, we also calculate the speed at which robots with finite
sensor footprints collect sensory information along an informative path by instituting a stability margin that guarantees a bound on the difference between the robots' current estimate
of the environment and the actual state of the environment for all time and all locations.
A consequence of having finite sensing radii is that the robots are unable to collect data
over the entire environment at once. Therefore, as the data over a dynamic region become
outdated, the robots must return to that region to collect new data. In order to prevent the
[Figure 1-1: six snapshots of the path reshaping process, panels (a) Iteration: 1, (b) Iteration: 10, (c) Iteration: 25, (d) Iteration: 50, (e) Iteration: 75, (f) Iteration: 100]
Figure 1-1: Starting with initial sweeping paths, the robots learn about the environment by
observation; the observations are then used to transform the paths so that they are aligned
with the important parts of the environment. The paths correspond to the trajectories of
the three robots, where the robots' positions along these paths are represented by the black
arrows. The important regions of the environment are shown in green.
robots' model of the environment from becoming too outdated, [30] presented a persistent
sensing controller that calculates the speeds of the robots at each point along given paths,
[Figure 1-2 panels: (a) Stable, (b) Unstable]
Figure 1-2: Example of persistent sensing by two robots. Each robot with a finite sensing
radius (red and blue circles around the robots' positions) travels through its path, with the
objective of keeping the accumulation function (green dots) low everywhere. The robots
collect data at the dynamic regions and shrink the accumulation function. The size of the
green dot is proportional to the value of the accumulation function at that location. On the
left, a stable speed profile maintains the accumulation function bounded everywhere for all
time, whereas on the right, the speed profile is not stable, and the accumulation function
grows unbounded in some locations.
that are fittingly referred to as speed profiles. This speed controller enables robots to visit
faster changing areas more frequently than slower changing areas.
The persistent sensing problem is defined in [30] as an optimization problem whose
goal is to keep a time varying field as small in magnitude as possible everywhere. This field
is referred to as the accumulation function. Where it is not covered by the robots' sensors,
the accumulation function grows unbounded, thus indicating a rising need to collect data
at that location. Likewise, the accumulation function shrinks where it is covered by the
robots' sensors, indicating a decreasing need for data collection. A stable speed profile is
defined as one that bounds the largest magnitude of the accumulation function.
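The accumulation-function dynamics described above can be sketched numerically. The rates, footprint model, and names below are illustrative assumptions rather than the thesis's exact model: each point of interest grows at its own rate when uncovered and shrinks at a faster rate whenever a robot's sensor footprint covers it.

```python
import numpy as np

def simulate_accumulation(production, consumption, covered, dt=0.1, steps=2000):
    """Hypothetical sketch of accumulation-function dynamics: at each
    point q, Z grows at rate p(q) while uncovered and changes at rate
    p(q) - c(q) while covered, clipped at zero. `covered(t)` returns a
    boolean array marking which points are inside a footprint at time t."""
    z = np.zeros_like(production)
    peak = 0.0
    for k in range(steps):
        cov = covered(k * dt)
        z += dt * (production - consumption * cov)
        z = np.maximum(z, 0.0)            # accumulation cannot go negative
        peak = max(peak, z.max())
    return z, peak

p = np.array([1.0, 1.0])   # growth rates at two points of interest
c = np.array([4.0, 4.0])   # sensing (consumption) rate inside the footprint

# A robot alternating between the two points (half its time on each)
# plays the role of a stable speed profile: the peak stays bounded.
half = lambda t: np.array([t % 2 < 1, t % 2 >= 1])
_, peak_stable = simulate_accumulation(p, c, half)

# A robot that parks at point 0 forever never covers point 1, so the
# accumulation there grows without bound: an unstable "speed profile".
park = lambda t: np.array([True, False])
z_unstable, _ = simulate_accumulation(p, c, park)

print(peak_stable)      # remains bounded
print(z_unstable[1])    # grows with simulation time
```

The stable case works because, over each visit, the robot consumes at least as much accumulation as was produced while it was away, which is the intuition behind the stability margin used in later chapters.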
In this thesis, we extend the computed informative path configurations obtained from
the mixing function controller to be used in conjunction with stabilizing speed profiles
from the persistent sensing controller to create an informative persistence controller [33]
that locally optimizes a persistent sensing task. Figure 1-2 shows an example of two robots
performing a persistent sensing task with both a stable and unstable speed profile.
By assigning the accumulation function to a physical parameter such as oil spill levels, airborne particulate matter accumulation, or aggregate sensor errors, persistent sensing
becomes a very practical approach to a wide array of real world monitoring scenarios. In
this thesis, we consider an informative persistent sensing approach to urban Mobility-on-Demand (MOD) systems, where the accumulation function represents the historical number of passenger arrivals at discrete sets of locations over a defined period of time. In our
previous work [22], we showed that autonomous driving can be used to mitigate the rebalancing problem current MOD systems face. Our objective is to minimize the waiting time
of the passengers and the amount of time the vehicles in the system drive empty between
subsequent customer requests. The critical question is: where should each vehicle go once
a delivery is complete? To solve this problem, we leverage our informative paths and persistent sensing controllers to develop optimized task allocation algorithms in the form of a
dynamic patrolling policy for a fleet of MOD service vehicles such as taxis.
By using historical arrival distributions as input to our control algorithms, we can compute patrolling loops that minimize the distance driven by the vehicles to get to the next
request. The algorithm was trained using one month of data from a fleet of 16,000 taxis in
Singapore. The resulting informative loops are used to redistribute the vehicles along stationary virtual taxi stand locations along the loop. We compare the policy computed by our
algorithm against a greedy policy as well as against the ground truth redistribution of taxis
observed on the same dates and show up to a 6× reduction over historical data in customer
waiting time and taxi distance driven without a passenger. These metrics represent two key
evaluation criteria: (1) quality of customer service and (2) fuel efficiency.
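For comparison, the greedy baseline policy can be stated in a few lines. This is a hedged sketch of the idea only, with a data layout and names of our own choosing: each request is served by the nearest idle vehicle, with no patrol loops or redistribution between requests.

```python
import math

def greedy_dispatch(vehicles, request):
    """Hypothetical sketch of a greedy service policy: serve each
    incoming request with the nearest idle vehicle; if none is idle,
    the request waits. No patrolling or rebalancing is performed."""
    idle = [v for v in vehicles if v["idle"]]
    if not idle:
        return None                                   # request queues
    best = min(idle, key=lambda v: math.dist(v["pos"], request))
    best["idle"] = False                              # departs to serve it
    return best

fleet = [{"pos": (0.0, 0.0), "idle": True},
         {"pos": (5.0, 5.0), "idle": True}]
assigned = greedy_dispatch(fleet, (4.0, 4.0))
print(assigned["pos"])   # (5.0, 5.0) — the nearer idle vehicle
```

Because idle vehicles simply wait where their last delivery ended, the greedy policy makes no attempt to anticipate demand, which is exactly the gap the informative patrol loops are designed to close.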
1.2
Contribution to Robotics
This thesis makes the following contributions:
* A decentralized robust informative control strategy. We extend the probabilistic and
geometric unifying controller presented in [25], so that now instead of statically partitioning themselves, a team of robots can adaptively compute closed online paths that
continually travel through regions discovered to be important by observation in an
[Figure 1-3 panels: (a) Patrol Loops, (b) Customer Demand Distribution]
Figure 1-3: Evolution of six informative patrol loops over historical demand distribution
in Singapore. Service vehicles travel along these loops to minimize the driving distance
between subsequent customer pick-ups. Each loop is updated 96 times over a 24-hour period
to account for differences in customer demand throughout the day. The peak amplitude in
the customer demand distribution represents the central business district of Singapore.
unknown and dynamic environment. The provably stable and decentralized adaptive
coverage controller uniquely combines robots' global sensor measurements based on
a mixing function to learn the location of dynamic events in the environment and simultaneously computes closed informative paths based on these aggregated sensor
behaviors.
A mixing function controller is advantageous because it is amenable to geometric,
probabilistic, and analytical interpretations, all of which have previously been presented separately [25]. We introduce a family of mixing functions with a free parameter, α, and show that different values of the parameter correspond to different assumptions about the coverage task: a minimum variance solution (probabilistic strategy) is obtained with α = -1, Voronoi coverage (geometric strategy) is recovered in the limit α → -∞, and Voronoi smoothing coverage is recovered for -∞ < α < 0.
Using a minimum variance controller (α = -1), we offer an improvement in informative path stability over the Voronoi approach presented in [33], by showing that small differences in the robots' initial waypoint positions do not result in significantly different informative path configurations. We derive both single robot and
multi-robot cases for each informative path control strategy and prove asymptotic stability and convergence using Lyapunov stability theorems. We develop and analyze
single robot and multi-robot informative path algorithms, and perform simulations in
MATLAB for both cases.
* An informative persistence path controller. As [33] did for a Voronoi-based control
strategy, we extend our robust adaptive coverage controller so that persistent sensing
tasks can now be performed in unknown environments when the robots' stabilizing
path configurations are unknown a priori. Combining a stability metric from persistent sensing tasks with our robust informative control strategy for the single-robot
and multi-robot cases, we develop informative persistence controllers that locally optimize the persistent sensing task by generating informative paths for the robots and
subsequently increase the stability metric of the persistent sensing task. Lyapunov
proofs are used to prove stability of both the single-robot and multi-robot cases. We
evaluate and simulate both single robot and multi-robot informative persistence algorithms in MATLAB.
* A dynamic patrolling policy for a fleet of service vehicles. We instantiate the informative persistence controller to an application in traffic: matching supply and demand
for taxis or a Mobility-on-Demand transportation system. The dynamic patrolling
policy is comprised of multiple patrol loops and a provably stable vehicle redistribution model. Patrol loops are generated using actual historical data from a fleet
of 16,000 vehicles over multiple days as the input to our mixing function informative
path controller. In line with our objective to match supply and demand, the computed
patrol loops minimize the instantaneous distance between customer requests and taxi
position, as well as minimize the length of each patrol loop. Once a configuration of
patrol loops has been computed, a centralized scheduling algorithm is implemented
to manage request allocation and vehicle redistribution for large-scale (> 500 agents)
MATLAB simulations. Our dynamic patrolling algorithm is tested against a greedy
service policy and actual historical taxi performance in Singapore.
1.3
Relation to Previous Work
This work builds on several bodies of work: (1) adaptive control, (2) informative path
planning, (3) coverage control, and (4) multi-agent systems. Adaptive path planning algorithms traditionally consider the real time mapping of paths to a set of desired states in an
unknown or dynamic environment in continuous-time systems. For example, in [34], an
optimization for path planning was presented for the case of partially known environments.
In [8], the authors present a path planning algorithm for deploying unmanned aerial vehicle
systems in an unknown environment. Most of the previous work in path planning focuses
on computing an optimal path according to some metric to reach a destination [14], [23].
In this thesis, the objective of the robots is not to reach a final destination, but instead to continually travel along their computed closed path trajectories through regions of the environment where sensory information is nonzero. We highly prioritize generating paths
that allow the robots to travel through regions of interest in an unknown environment using
adaptive control strategies to create a novel algorithm for computing informative paths.
Informative path sensing extends adaptive path planning algorithms with an emphasis
on efficiently measuring and monitoring a dynamic environment. Such a method for computing paths that provide the most information about an environment was presented in [28],
with the aim of adaptively learning and traversing through regions of interest with multiple
robots. Informative sensing while maintaining periodic connectivity for the robots to share
information and synchronize was examined in [12]. Our work considers adaptive path planning and informative sensing in a similar context, by using a robust control strategy that can
use both geometric and probabilistic sensor measurement behavior to optimize a coverage
task in a dynamic environment, as opposed to [25], where a non-adaptive probabilistic and
geometric unifying control strategy was implemented for a known, static environment.
Cortes et al. [7] introduced a geometric control strategy for multi-robot coverage in a
known environment that continually drives the robots toward the centroids of their Voronoi
cells, or centroidal Voronoi configuration. Schwager et al. [27] extended this work by
enabling the robots to sample and adaptively learn an unknown environment before they
began reshaping their paths. Similar work in Voronoi coverage includes [11], where the
objective was to design a sampling trajectory that minimizes the uncertainty of an estimated
random field at the end of a time frame.
Another common approach in coverage control is a probabilistic strategy. For example, [13] proposes an algorithm for positioning robots to maximize the probability of detecting an event that occurs in the environment and minimizes the predictive variance in
a time frame. Both geometric and probabilistic control strategies are based on an optimization that the controllers solve through the evolution of a dynamical system.
Geometric and probabilistic control strategies were unified in a mixing function controller introduced in [25]. This work is the most relevant to our thesis, because it enables
a group of agents to position themselves statically in locally optimal locations according to
either a probabilistic or Voronoi sensing coverage interpretation of a known environment.
In this thesis, we build upon the mixing function controller by defining an agent's closed
path as a set of waypoints that can distributedly execute a parameter adaptation law and a
decentralized gradient control law to learn an unknown environment from robots' estimates
and compute informative sensing paths, respectively. A similar extension of a pre-existing
control law to enable informative path generation was presented in [32], where a Voronoi-based control strategy introduced in [27] served as the inspiration. The resulting informative paths computed by our mixing function control strategy locally optimize the sensing
position of each waypoint while minimizing the length of the informative path traveled by
the robots. Our mixing function control strategy can also be used in conjunction with a
governing region revisit policy or speed controller to achieve persistent sensing.
The persistent sensing concept motivating this thesis was introduced in [30], where
a linear program was designed to calculate the robots' speeds at each point along given
paths, in order for them to stabilize a persistent sensing task. A persistent sensing task
entails bounding the growth of sensory information within the environment for all times.
Examples of growing sensory information could include the amount of rainfall accumulated
in a given area, or the amount of measurement uncertainty at a point of interest in the
environment.
In [30], the robots were assumed to have full knowledge of the environment and were
given pre-designed paths. Following the method introduced in [33] for Voronoi coverage,
in this thesis, we remove all prior environment assumptions by having the robots learn the
environment through parameter estimation, and then use this information to shape their
paths into informative persistence paths. By removing these constraints, we create a viable
persistent sensing strategy for unknown and dynamic environments.
Persistent sensing is related to sweep coverage [5], where robots with finite sensor
footprints must sweep their sensor over every point in the environment. The problem is also
related to environmental monitoring research such as [4, 6, 15, 18, 37]. In this prior work,
the authors often use a probabilistic model of the environment, and estimate the state of
that model using a Kalman filter. The robots are then controlled so as to maximize a metric
on the quality of the state estimate. Due to the complexity of the models, performance
guarantees are difficult to obtain. In this thesis, based on our fully connected robot network,
we can provide guarantees on the boundedness of the accumulation function.
By likening the concept of informative persistent sensing to patrolling problems [9, 21], we are able to propose a control strategy that distributedly uses an informative persistence patrolling loop to locally optimize a task allocation scenario in a dynamic transportation network. Distributed dynamic vehicle routing scenarios are considered in [1], where
events occur according to a random process and are serviced by the robot closest to them.
Work on optimal task allocation dates to [19] and [10]. Mobility-on-demand (MOD) is a
similar paradigm for dealing with increasing urban congestion. Generally speaking, the
objective of MOD problems is to provide on-demand rental facilities of convenient and efficient modes of transportation [20]. Load balancing in DTA problems essentially reduces
the Pickup and Delivery problem (PDP), whereby passengers arriving into a network are
transported to a delivery site by vehicles. Autonomous load balancing in MOD systems has
been studied in [22], where a fluid model was used to represent supply and demand. In this
thesis, we employ a PDP problem formulation to model an urban transportation network.
Socially-motivated optimization criteria have also been considered in prior work. In
[24,36], social optimum planning models were used to compute vehicle paths. Optimization of driving routes subject to congestion was considered in [17]. In a broader context, [16] observed the effect that multiple service policies had on logistic taxi optimization. More recently, in [35] we studied both system-level and social optimization criteria,
showing a relationship between urban planning, fuel consumption, and quality of service
metrics. In this work we consider similar evaluation models, showing how we can achieve
an improvement with respect to all three of these aforementioned points of interest.
1.4 Thesis Organization
This thesis is divided into five chapters. Chapter 2 provides the main theoretical foundation
of the thesis and derives the mixing function informative path controller for both single and
multi-robot systems. Simulations and validations of the control algorithms are shown for
a wide array of robot control strategies including minimum variance, Voronoi smoothing, and strictly Voronoi approaches. Chapter 3 extends the informative path controller
to persistent sensing and introduces stability margin requirements to the mixing function
controller. Simulations are shown for a minimum variance informative persistence control
algorithm. Chapter 4 presents a dynamic patrolling policy for a fleet of service vehicles in
a MOD system using informative paths derived from Voronoi based controllers. Chapter 5
concludes the thesis with final reflections and lessons learned.
Chapter 2
Informative Path Controller Using a Mixing Function
The idea of using a mixing function for static coverage in a known environment with multi-robot systems was introduced in [25] and was shown to produce results that were more
stable numerically as compared to a geometric Voronoi-based approach. In this chapter,
we build on this robustness insight and show that we can use a mixing-function-based approach to create an informative path controller that is more stable numerically than the
results in [33]. The mixing function control strategy consists of an adaptation law for parameter estimation and a gradient optimization of a coverage cost function consisting of a
sensing error cost, a robot path length cost, and a parameterized mixing function. We show
that informative paths generated by this control strategy can be altered by varying a free parameter α to enable different sensor estimate mixing behaviors between robots that can
be interpreted as either probabilistic, geometric approximation, or geometric. The resulting informative paths computed by the mixing function control strategy, regardless of the
sensing interpretation, locally optimize the coverage task. A mathematical formulation of
the problem follows.
2.1 Problem Setup
Remark 1 All mathematical symbols used in this chapter are defined in Appendix A.

A sensory function, defined as a map φ : Q → R_{≥0}, where Q ⊂ R² is a convex, compact environment, determines the constant rate of change of the environment at an arbitrary point q ∈ Q. Let there be N ∈ Z₊ robots identified by r ∈ {1, …, N}. Robot r is equipped with a sensor to make a point measurement φ(p_r) at its position p_r ∈ Q while traveling along its closed path Γ_r : [0, 1] → R², consisting of a finite number n_r of waypoints. Note that the n_r waypoints corresponding to robot r are different from the n_{r'} waypoints corresponding to robot r', ∀ r' ≠ r.

The position of the ith waypoint on Γ_r is denoted p_i^r ∈ P ⊂ R^{d_P}, where i ∈ {1, …, n_r}, P is the state space of a single waypoint, and d_P is the dimension of the state space. We define P_r = [p_1^r, …, p_{n_r}^r] ∈ P^{n_r} and P = [P_1, …, P_N] ∈ P^{N n_r} as the configuration vectors of robot r and of all waypoints, respectively. Because Γ_r is closed, each waypoint i has a previous waypoint i − 1 and a next waypoint i + 1 related to it, which are called the neighbor waypoints of i. Note that i + 1 = 1 for i = n_r, and i − 1 = n_r for i = 1. A robot moves between sequential waypoints in a straight-line interpolation.
For each waypoint, the cost of the sensing estimate of a point q ∈ Q from its position p_i^r is given by the function

f(p_i^r, q) = ||q − p_i^r||²,   (2.1)
where f(p_i^r, q) ∈ R_{≥0} and is differentiable with respect to p_i^r. The sensor measurement estimates of the N · n_r waypoints are combined in a function g(f(p_1^1, q), …, f(p_{n_N}^N, q)), called the mixing function [25]. The mixing function g_α : R^{N n_r}_{≥0} → R defines how sensory information from different robots is combined to give an aggregate cost of the waypoints' estimate of q. We propose a mixing function of the form

g_α = ( Σ_{r=1}^N Σ_{i=1}^{n_r} f(p_i^r, q)^α )^{1/α},   (2.2)

where α is a free parameter. The mixing function manifests assumptions about the coverage task; by changing the mixing function we can derive a variety of distributed controllers, including Voronoi coverage control (α = −∞), probabilistic coverage control (α = −1), and Voronoi smoothing coverage control (−1 > α > −∞) [25].
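As a concrete numerical check of (2.2), the short sketch below (plain Python, with two hypothetical scalar sensor costs) evaluates g_α in two regimes: α = −1 gives the harmonic, minimum-variance combination, while a large negative α approaches the smallest cost, the Voronoi interpretation.

```python
def g_alpha(costs, alpha):
    """Mixing function (2.2): g_alpha = (sum_j f_j^alpha)^(1/alpha), costs f_j > 0."""
    return sum(f ** alpha for f in costs) ** (1.0 / alpha)

costs = [4.0, 1.0]                 # hypothetical sensor costs for one point q

g_prob = g_alpha(costs, -1)        # alpha = -1: harmonic combination, 0.8
g_voro = g_alpha(costs, -50)       # large negative alpha: approaches min(costs)
```

Note that g_prob is smaller than either cost alone, which is the property discussed below for α < 1.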
Consider a sensing task in which an event of interest occurs randomly at a point q and is sensed at a distance by sensors located on different robots. The mixing function (2.2) assumes that different waypoints positioned at p_i^r and p_{i'}^{r'} may both have some sensory information about the event, instead of only counting the information from the waypoint that is closest to q as in the Voronoi approach. Unlike the geometric Voronoi approach, the mixing function captures the intuition that using more sensor estimates may provide a more accurate estimate of a point of interest in the environment than a single localized sensor estimate of the same point. Mixing function coverage for −1 ≥ α > −∞ is shown in Figure 2-1a, where the overlap of sensor estimates at two waypoint locations is shown as the intersection of two circles. Figure 2-1b shows that the Voronoi coverage case only considers robots' sensor estimates of q within their Voronoi partition [7], allowing for no sensor estimate mixing between waypoints.
[Figure 2-1 panels: (a) Mixing Function Schematic, −1 ≥ α > −∞; (b) Voronoi Schematic, α = −∞.]

Figure 2-1: The mixing function defines how sensor measurements of the convex environment are shared by the waypoints. For probabilistic and Voronoi smoothing cases (−1 ≥ α > −∞), waypoints combine sensor estimates. For the Voronoi case (α = −∞), only sensor measurements of points of interest within a waypoint's Voronoi partition are considered, and waypoints do not combine sensor estimates.
The mixing function has several important properties. For α ≥ 1, g_α becomes the p-norm of the vector [f_1 ⋯ f_{N n_r}]^T. When α < 1, g_α is non-convex and not a norm, because it violates the triangle inequality. When α < 1, g_α is smaller than any of its arguments alone. Therefore, the cost of sensing at a point q with different waypoints positioned at p_i^r and p_{i'}^{r'}, ∀(i', r') ≠ (i, r), is less than the cost of sensing with only one of the waypoints individually. Furthermore, the decrease in g_α from the addition of a second waypoint is greater than that from the addition of a third waypoint, and so on. Thus, there is a successively smaller benefit to adding more robots. This property is called supermodularity, and is shown in Figure 2-2.

Figure 2-2: As more sensing estimates are considered, the property of supermodularity dictates that the amount by which the mixing function decreases becomes successively smaller.
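The diminishing benefit of additional estimates can be illustrated numerically; a minimal sketch, assuming every waypoint sees the point q at the same hypothetical cost f = 2:

```python
def g_alpha(costs, alpha):
    # Mixing function (2.2) over waypoint sensor costs.
    return sum(f ** alpha for f in costs) ** (1.0 / alpha)

f = 2.0   # every waypoint sees q at the same hypothetical cost

# g_alpha after the 1st, 2nd, 3rd, 4th identical sensing estimate (alpha = -1).
values = [g_alpha([f] * k, -1) for k in range(1, 5)]      # 2.0, 1.0, 2/3, 0.5
drops = [values[k] - values[k + 1] for k in range(3)]     # 1.0, 1/3, 1/6

# Each extra estimate lowers the cost, but by successively less (supermodularity).
assert all(d > 0 for d in drops) and drops[0] > drops[1] > drops[2]
```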
2.2 Mixing Function Cost
Building upon [25, 33] and using (2.2), we propose a generalized, non-convex cost function of the form

H(P) = W_s ∫_Q g(f(p_1^1, q), …, f(p_{n_N}^N, q)) φ(q) dq + W_n Σ_{r=1}^N Σ_{i=1}^{n_r} ||p_{i+1}^r − p_i^r||²,   (2.3)

where ||·|| denotes the ℓ₂-norm, and the integrand g(f(p_1^1, q), …, f(p_{n_N}^N, q)) represents the aggregated sensing estimate of all waypoints at a single arbitrary point q, with a corresponding weight W_s ∈ Z₊. Integrating over all points in Q, weighted by φ(q), gives the first term of the cost function. The second term of the cost function represents the cost of positioning neighboring waypoints of the same robot too far from one another. Ultimately, this term dictates the cost assigned to the final length of the informative path, and it is given a corresponding weight W_n ∈ Z₊.
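The two terms of (2.3) can be evaluated approximately on a grid. The sketch below is a minimal sketch for a single robot on the unit square with a uniform sensory function; the helper names and parameter values are illustrative, not the author's implementation.

```python
def g_alpha(costs, alpha):
    # Mixing function (2.2) over per-waypoint sensor costs.
    return sum(c ** alpha for c in costs) ** (1.0 / alpha)

def cost_H(waypoints, phi, alpha, Ws=1.0, Wn=1.0, n_grid=20):
    """Discretized version of the cost (2.3)/(2.4) for one robot on the unit square.

    waypoints: closed path as a list of (x, y); phi: sensory function."""
    h = 1.0 / n_grid
    sensing = 0.0
    for gx in range(n_grid):
        for gy in range(n_grid):
            q = ((gx + 0.5) * h, (gy + 0.5) * h)   # cell centers avoid zero costs here
            costs = [(q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 for p in waypoints]
            sensing += g_alpha(costs, alpha) * phi(q) * h * h
    # Path-length term: neighbors on a closed path (i + 1 wraps around to 1).
    n = len(waypoints)
    length = sum((waypoints[i][0] - waypoints[(i + 1) % n][0]) ** 2 +
                 (waypoints[i][1] - waypoints[(i + 1) % n][1]) ** 2 for i in range(n))
    return Ws * sensing + Wn * length

square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
H = cost_H(square, lambda q: 1.0, alpha=-1)   # path-length term alone contributes 1.0
```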
Our goal is to develop a controller that stabilizes the waypoints around configurations P* that minimize H [25]. The general mixing function cost (2.3) can be shown to recover several common existing coverage cost functions. Drawing out the relations between these different coverage algorithms will suggest new insights into when one algorithm should be preferred over another.
Substituting (2.1) and (2.2) into the general cost function from (2.3), we explicitly derive the mixing function cost

H_α(P) = W_s ∫_Q ( Σ_{r=1}^N Σ_{i=1}^{n_r} (||q − p_i^r||²)^α )^{1/α} φ(q) dq + W_n Σ_{r=1}^N Σ_{i=1}^{n_r} ||p_{i+1}^r − p_i^r||².   (2.4)

This robust cost function consists of a sensing cost, a robot path length cost, and a mixing function cost. An equilibrium is reached between these individual costs when ∂H_α(P)/∂p_i^r = 0. This optimization defines how the mixing function control strategy generates informative paths. A formal definition of informative paths for multiple robots follows.
Definition 1 (Informative Paths for Multiple Robots using a Mixing Function) A collection of informative paths for a multi-robot system corresponds to the set of waypoint locations for each robot that locally minimizes (2.4).
2.3 Mixing Function Control Law
Because H_α is non-convex, a gradient-based controller of the form

ṗ_i^r = u_i^r = −K_i^r ∂H_α(P)/∂p_i^r   (2.5)

yields locally optimal waypoint configurations P* for a control input u_i^r with integrator dynamics and a strictly positive definite gain matrix K_i^r [25].
By substituting the explicit value of H_α(P), (2.5) becomes

ṗ_i^r = −K_i^r ( W_s ∫_Q (f(p_i^r, q)/g_α)^{α−1} (∂f(p_i^r, q)/∂p_i^r) φ(q) dq + W_n ∂/∂p_i^r Σ_{r'=1}^N Σ_{i'=1}^{n_{r'}} ||p_{i'+1}^{r'} − p_{i'}^{r'}||² ).   (2.6)

By expanding f(p_i^r, q) = ||q||² − 2 q^T p_i^r + ||p_i^r||², the general gradient-based controller (2.6) becomes

ṗ_i^r = −K_i^r ( 2 W_s ∫_Q (f(p_i^r, q)/g_α)^{α−1} (p_i^r − q) φ(q) dq + 2 W_n (2 p_i^r − p_{i+1}^r − p_{i−1}^r) ).   (2.7)

Using this gradient descent approach, we propose the following generalized mixing function control law to locally minimize (2.4) and enable waypoints to converge to an equilibrium configuration:

ṗ_i^r = −K_i^r ( W_s ∫_Q (f(p_i^r, q)/g_α)^{α−1} (p_i^r − q) φ(q) dq − W_n (p_{i+1}^r + p_{i−1}^r − 2 p_i^r) ).   (2.8)

By substituting the values of the sensor cost (2.1) and the mixing function (2.2), the mixing function control law is explicitly defined as

ṗ_i^r = K_i^r ( W_s ∫_Q ( Σ_{r'=1}^N Σ_{i'=1}^{n_{r'}} (||q − p_{i'}^{r'}||²)^α )^{(1−α)/α} (||q − p_i^r||²)^{α−1} (q − p_i^r) φ(q) dq + W_n (p_{i+1}^r + p_{i−1}^r − 2 p_i^r) ).   (2.9)

It follows from (2.8) and (2.9) that the term inside the integral, ( Σ_{r'=1}^N Σ_{i'=1}^{n_{r'}} (||q − p_{i'}^{r'}||²)^α )^{(1−α)/α} (||q − p_i^r||²)^{α−1}, is equivalent to (f(p_i^r, q)/g_α)^{α−1}. This term is important, because it gives an approximation to the indicator function of the Voronoi partition of waypoint i of robot r. The approximation improves as α → −∞. In addition to giving an approximation to a Voronoi partition, (f(p_i^r, q)/g_α)^{α−1} defines how the sensor estimates of different robots are combined over the environment. As shown in Section 2.4, Voronoi coverage is defined as lim_{α→−∞} (f(p_i^r, q)/g_α)^{α−1}. At this limit, there is no sensor mixing between different robots. Using this intuition, we are able to use our coverage controller to approximate a Voronoi coverage controller with a higher degree of accuracy as we decrease α towards −∞. Even for values of α ≫ −∞, e.g. α ∈ {−10, −15}, the smoothing controller approximates the Voronoi partition arbitrarily well. The resulting contours of (f(p_i^r, q)/g_α)^{α−1} for various α values are shown in Figure 2-3.
[Figure 2-3 contour panels: (a) α = −0.5 through (d) α = −15; axis data omitted.]

Figure 2-3: Contour plots of (f(p_i^r, q)/g_α)^{α−1} are shown for two robots with five waypoints each. As α → −∞, the contours approach a Voronoi partition.
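The approximation to the Voronoi indicator can also be checked pointwise. The sketch below evaluates (f_i/g_α)^{α−1} for a point that is closer to waypoint 0 than waypoint 1 (hypothetical costs), and the two weights tend toward 1 and 0 as α decreases:

```python
def indicator_approx(costs, i, alpha):
    """(f_i / g_alpha)^(alpha - 1): approximate Voronoi indicator for waypoint i."""
    g = sum(c ** alpha for c in costs) ** (1.0 / alpha)
    return (costs[i] / g) ** (alpha - 1)

# Hypothetical costs: q is closer to waypoint 0 (f = 1.0) than waypoint 1 (f = 3.0).
costs = [1.0, 3.0]
weights = {a: (indicator_approx(costs, 0, a), indicator_approx(costs, 1, a))
           for a in (-0.5, -4, -10, -15)}
# As alpha decreases, the pair tends to (1, 0): the Voronoi indicator of waypoint 0.
```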
Next, we define three substitution variables analogous to the mass moments of rigid bodies. The mass M_i^r, first mass moment Y_i^r, and centroid C_i^r of environment Q for a mixing function are defined as

M_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} φ(q) dq,   (2.10)

Y_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} φ(q) q dq,   (2.11)

C_i^r = Y_i^r / M_i^r.   (2.12)

Let e_i^r = C_i^r − p_i^r. Note that f(p_i^r, q) strictly increasing in ||q − p_i^r|| and φ(q) strictly positive imply both M_i^r > 0 for Q ≠ ∅ and that C_i^r is in the interior of Q. Thus M_i^r and C_i^r have properties intrinsic to physical masses and centroids. Using these inertial property substitutions, the mixing function control law (2.9) is defined as

ṗ_i^r = (K_i^r / β_i^r)(M_i^r e_i^r + α_i^r),   (2.13)

where

α_i^r = W_n (p_{i+1}^r + p_{i−1}^r − 2 p_i^r),   (2.14)

β_i^r = M_i^r + 2 W_n > 0.   (2.15)

Remark 2 β_i^r > 0 normalizes the weight distribution between sensing and staying close to neighboring waypoints.
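One discrete-time step of (2.13) can be sketched as follows, assuming a single robot on the unit square, grid quadrature for the mass moments (2.10)-(2.12), and a scalar stand-in for the gain matrix; all helper names and parameter values are illustrative.

```python
def control_step(waypoints, r_idx, phi, alpha, Ws=1.0, Wn=1.0, K=1.0, dt=0.1, n_grid=20):
    """One Euler step of the control law (2.13) for waypoint r_idx of one closed path.

    Mass moments (2.10)-(2.12) are approximated by grid quadrature on the unit
    square; K is a scalar stand-in for the gain matrix K_i^r."""
    h = 1.0 / n_grid
    p = waypoints[r_idx]
    M, Yx, Yy = 0.0, 0.0, 0.0
    for gx in range(n_grid):
        for gy in range(n_grid):
            q = ((gx + 0.5) * h, (gy + 0.5) * h)   # cell centers avoid the waypoints
            costs = [(q[0] - w[0]) ** 2 + (q[1] - w[1]) ** 2 for w in waypoints]
            g = sum(c ** alpha for c in costs) ** (1.0 / alpha)
            kern = Ws * (costs[r_idx] / g) ** (alpha - 1) * phi(q) * h * h
            M += kern
            Yx += kern * q[0]
            Yy += kern * q[1]
    C = (Yx / M, Yy / M)                            # centroid (2.12)
    e = (C[0] - p[0], C[1] - p[1])                  # e = C - p
    n = len(waypoints)
    prev, nxt = waypoints[r_idx - 1], waypoints[(r_idx + 1) % n]   # closed-path neighbors
    a_term = (Wn * (nxt[0] + prev[0] - 2 * p[0]), Wn * (nxt[1] + prev[1] - 2 * p[1]))
    beta = M + 2 * Wn                               # (2.15)
    return (p[0] + dt * (K / beta) * (M * e[0] + a_term[0]),
            p[1] + dt * (K / beta) * (M * e[1] + a_term[1]))

square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
new_p0 = control_step(square, 0, lambda q: 1.0, alpha=-1)   # pulls waypoint 0 inward
```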
2.4 Deriving Common Control Strategies
In this section, we use the mixing function cost (2.4) and coverage control law (2.13) to derive Voronoi and minimum variance control strategies. These strategies represent the range of robot sensor behaviors that can be produced by using the mixing function. Whereas a Voronoi coverage strategy does not combine sensor measurements from different robots, a minimum variance strategy combines all robots' sensor measurements to minimize the expected variance of the global sensor estimate.
2.4.1 Voronoi Control Strategy (α = −∞)
From [27], we define the Voronoi partition of the ith waypoint along Γ_r as

V_i^r = {q ∈ Q : ||q − p_i^r|| ≤ ||q − p_{i'}^{r'}||, ∀(r', i') ≠ (r, i)},   (2.16)

where r, r' ∈ {1, …, N}, i ∈ {1, …, n_r}, and i' ∈ {1, …, n_{r'}}.

Because lim_{α→−∞} g_α(f(p_1^1, q), …, f(p_{n_N}^N, q)) = min_{r,i} f(p_i^r, q), it follows that for α = −∞, g_{−∞} = min_{r,i} ||q − p_i^r||², which implies that the Voronoi indicator function satisfies (f(p_i^r, q)/g_{−∞})^{α−1} = 1 for q ∈ V_i^r and 0 otherwise. Intuitively, the minimum in g_{−∞} stipulates that there is no sharing of sensor measurements between waypoints over the environment, and consequently, waypoint i of robot r considers only q ∈ V_i^r.

As a result of no sensor mixing between different robots, the cost incurred by all the robots due to the event at q is the same as that incurred by the robot that is closest to q. Thus, for g_{−∞}, the coverage cost of the mixing function controller is equivalent to the coverage cost of a Voronoi controller, which is defined in [32] as

H_V = Σ_{r=1}^N Σ_{i=1}^{n_r} ∫_{V_i^r} W_s ||q − p_i^r||² φ(q) dq + W_n Σ_{r=1}^N Σ_{i=1}^{n_r} ||p_{i+1}^r − p_i^r||².   (2.17)
By redefining the mass M_i^r, first mass moment Y_i^r, and centroid C_i^r of waypoint i's Voronoi partition V_i^r as

M_i^r = ∫_{V_i^r} W_s φ(q) dq,   (2.18)

Y_i^r = ∫_{V_i^r} W_s φ(q) q dq,   (2.19)

C_i^r = Y_i^r / M_i^r,   (2.20)

and by setting e_i^r = C_i^r − p_i^r, the resulting gradient descent Voronoi control law is derived:

ṗ_i^r = (K_i^r / β_i^r)(M_i^r e_i^r + α_i^r),   (2.21)

where α_i^r and β_i^r are the same as in (2.14) and (2.15), respectively. From (2.21), we can see that the Voronoi control law differs from our proposed control law due to the absence of a mixing function in the mass properties M_i^r and Y_i^r of the Voronoi partition V_i^r.
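The Voronoi mass moments (2.18)-(2.20) can be approximated by assigning grid cells to their nearest waypoint, which realizes the partition (2.16) numerically. A minimal sketch under a uniform sensory function (helper names hypothetical):

```python
def voronoi_mass_moments(waypoints, phi, Ws=1.0, n_grid=40):
    """Mass and centroid (2.18)-(2.20) of each waypoint's Voronoi cell.

    Each grid cell center is assigned to its nearest waypoint, which realizes
    the partition (2.16) approximately on the unit square."""
    h = 1.0 / n_grid
    n = len(waypoints)
    M = [0.0] * n
    Y = [[0.0, 0.0] for _ in range(n)]
    for gx in range(n_grid):
        for gy in range(n_grid):
            q = ((gx + 0.5) * h, (gy + 0.5) * h)
            i = min(range(n),
                    key=lambda j: (q[0] - waypoints[j][0]) ** 2 + (q[1] - waypoints[j][1]) ** 2)
            w = Ws * phi(q) * h * h
            M[i] += w
            Y[i][0] += w * q[0]
            Y[i][1] += w * q[1]
    C = [(Y[i][0] / M[i], Y[i][1] / M[i]) if M[i] > 0 else waypoints[i] for i in range(n)]
    return M, C

# Two waypoints split a uniform unit square into equal left and right halves.
M, C = voronoi_mass_moments([(0.25, 0.5), (0.75, 0.5)], lambda q: 1.0)
```

With a uniform φ, each half carries mass 0.5 and its centroid coincides with the waypoint, so the sensing part of (2.21) vanishes at this configuration.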
2.4.2 Minimum Variance Probabilistic Control Strategy (α = −1)
In this section we use a mixing function with free parameter α = −1 to derive a control strategy that minimizes the robots' expected variance of their measurements of a point of interest q. We will formulate an optimal Bayesian estimator for the location of q given the aggregate measurements of the waypoints.

Assume that the waypoints have noisy measurements of the position of a point of interest in the environment. Let the point of interest be given by a random variable q that takes on values in Q, where waypoint i of robot r has the current environment measurement x_i^r = q − z. Here, z ~ N(0, I_{2×2} f(p_i^r, q)) is a bi-variate normally distributed random variable, and I_{2×2} is an identity matrix. The variance of the measurement, f(p_i^r, q), is a function of the position of the sensor estimate and the point of interest. Given this variance, the measurement likelihood of waypoint i of robot r is

Pr(x_i^r | q : p_i^r) = (1/(2π f(p_i^r, q))) exp( −||x_i^r − q||² / (2 f(p_i^r, q)) ).

Assuming the measurement estimates obtained by different waypoints conditioned on q are independent, and that φ(q) is the prior distribution of q's position, Bayes' Theorem gives the posterior distribution

Pr(q | x_1^1, …, x_{n_N}^N) = Π_{r=1}^N Π_{i=1}^{n_r} Pr(x_i^r | q : p_i^r) φ(q) / ∫_Q Π_{r=1}^N Π_{i=1}^{n_r} Pr(x_i^r | q : p_i^r) φ(q) dq.   (2.22)
Our goal is to position the waypoints so that their total estimate of q is as accurate as possible. To achieve this, we want to position the robots so that they minimize the variance of their combined sensor measurements. The product of measurement likelihoods in the numerator of (2.22) can be simplified to a single likelihood function, which has the form of an un-normalized Gaussian,

Π_{r=1}^N Π_{i=1}^{n_r} Pr(x_i^r | q : p_i^r) = γ exp( −||x̄ − q||² / (2 g_{−1}) ),   (2.23)
whose variance is equivalent to our mixing function g_{−1} = ( Σ_{r=1}^N Σ_{i=1}^{n_r} f(p_i^r, q)^{−1} )^{−1}. The values of x̄ and γ are given by

x̄ = g_{−1} Σ_{r=1}^N Σ_{i=1}^{n_r} f(p_i^r, q)^{−1} x_i^r   (2.24)

and

γ = ( Π_{r=1}^N Π_{i=1}^{n_r} (2π f(p_i^r, q))^{−1} ) exp( −(1/2) ( Σ_{r=1}^N Σ_{i=1}^{n_r} ||x_i^r||² / f(p_i^r, q) − ||x̄||² / g_{−1} ) ),   (2.25)

respectively.
Finally, the expectation over q of the likelihood variance recovers our original general mixing cost function (2.3),

H_{−1} = W_s E_q[ g_{−1}(f(p_1^1, q), …, f(p_{n_N}^N, q)) ] + W_n Σ_{r=1}^N Σ_{i=1}^{n_r} ||p_{i+1}^r − p_i^r||²
      = W_s ∫_Q g_{−1}(f(p_1^1, q), …, f(p_{n_N}^N, q)) φ(q) dq + W_n Σ_{r=1}^N Σ_{i=1}^{n_r} ||p_{i+1}^r − p_i^r||².   (2.26)

From this derivation, we can interpret the coverage control optimization as finding the waypoint positions that minimize the expected variance of the likelihood function for an optimal Bayesian estimator of the position of the point of interest q.
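The claim that the product of the measurement likelihoods is a Gaussian with variance g_{−1} can be verified numerically in one dimension; a sketch with two hypothetical measurement variances and values:

```python
# Two scalar Gaussian measurement likelihoods with variances f1 and f2: their
# product is an un-normalized Gaussian whose variance is g_{-1} = (1/f1 + 1/f2)^{-1}
# and whose mean is the combined estimate xbar, as in (2.24). Values are hypothetical.
f1, f2 = 2.0, 0.5
x1, x2 = 1.0, 3.0

g_minus1 = 1.0 / (1.0 / f1 + 1.0 / f2)            # mixing function for alpha = -1
xbar = g_minus1 * (x1 / f1 + x2 / f2)             # combined estimate

def log_lik(q):
    # Log of the product of the two likelihoods, dropping constants in q.
    return -0.5 * (x1 - q) ** 2 / f1 - 0.5 * (x2 - q) ** 2 / f2

# The difference from -0.5 * (xbar - q)^2 / g_minus1 is constant in q, so the
# product really is a Gaussian in q with variance g_minus1.
c0 = log_lik(0.0) + 0.5 * (xbar - 0.0) ** 2 / g_minus1
c1 = log_lik(5.0) + 0.5 * (xbar - 5.0) ** 2 / g_minus1
```

Since g_minus1 is smaller than both f1 and f2, combining the two sensors always reduces the variance of the estimate.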
2.5 Mixing Function Controller Convergence
In this section we introduce sensory function parameterization and prove that the proposed mixing function control law in (2.13) causes the set of robot path configurations to converge to a locally optimal configuration according to (2.4) for both known and unknown environments.
2.5.1 Sensory Function Parameterization
The sensory function φ(q) can be parameterized as a linear combination of a set of known basis functions spanning R^d, where d is the dimension of the function space.
Assumption 1 (Sensory Basis Functions) ∃ a ∈ R^d_{≥0} and B : Q → R^d_{≥0}, where R^d_{≥0} denotes vectors with nonnegative entries, such that

φ(q) = B(q)^T a,   (2.27)

where the vector of basis functions B(q) is known by the robots, and the nonnegative parameter vector a is estimated in an unknown environment.
In the adaptive control literature [29], B represents the set of parameters that can be measured, given full state feedback and observable system dynamics. The parameter vector â represents the set of estimated system parameters, such as unknown trajectories or inertias. Denoting â_r(t) as robot r's estimate of a, it follows that φ̂(q) = B(q)^T â_r is the robot's approximation of φ(q), and the mass moment approximations can now be defined as

M̂_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} φ̂(q) dq,   (2.28)

Ŷ_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} φ̂(q) q dq,   (2.29)

Ĉ_i^r = Ŷ_i^r / M̂_i^r,   (2.30)

for the mixing function control strategy.
Using ã_r = â_r − a, the sensory function error and mass moment errors of the mixing function are defined as

φ̃(q) = φ̂(q) − φ(q) = B(q)^T ã_r,   (2.31)

M̃_i^r = M̂_i^r − M_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} B(q)^T dq ã_r,   (2.32)

Ỹ_i^r = Ŷ_i^r − Y_i^r = ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} q B(q)^T dq ã_r,   (2.33)

C̃_i^r = Ĉ_i^r − C_i^r.   (2.34)

When α = −1, we recover the minimum variance mass moment errors

M̃_i^r = M̂_i^r − M_i^r = ∫_Q W_s (g_{−1}/f(p_i^r, q))² B(q)^T dq ã_r,   (2.35)

Ỹ_i^r = ∫_Q W_s (g_{−1}/f(p_i^r, q))² q B(q)^T dq ã_r,   (2.36)

C̃_i^r = Ĉ_i^r − C_i^r.   (2.37)

Similarly, when α = −∞, (f(p_i^r, q)/g_α)^{α−1} → 1 on V_i^r, and we recover the Voronoi mass moment errors

M̃_i^r = M̂_i^r − M_i^r = ∫_{V_i^r} W_s B(q)^T dq ã_r,   (2.38)

Ỹ_i^r = ∫_{V_i^r} W_s q B(q)^T dq ã_r,   (2.39)

C̃_i^r = Ĉ_i^r − C_i^r.   (2.40)
In order to compress the notation in all three mass moment approximations, we set the terms B_{p_r}(t) and φ_{p_r}(t) as the value of the basis function vector and the value of φ at the robot's position p_r(t), respectively.
2.5.2 Coverage Convergence in a Known Environment
We assume that the robots and waypoints have full knowledge of the sensory parameter vector, i.e. â_r = a, and therefore their sensory function estimate satisfies φ̂(q) = φ(q).
Theorem 1 (Mixing Function Convergence Theorem in a Known Environment) The configuration of all waypoint positions P converges to a locally optimal configuration according to ∂H_α/∂p_i^r = 0.
Proof 1 We define a Lyapunov-like function based on the agents' paths and environment measurements. Because the system is autonomous, we use LaSalle's Invariance Principle and invariant set theory to prove asymptotic stability of the system to a locally optimal equilibrium.
Let H_α be the Lyapunov function candidate. Because it is composed of two squared 2-norms, H_α is positive definite. Additionally, H_α → ∞ as p → ∞, and it has continuous first partial derivatives. Domain Q is bounded, and therefore the state space of all waypoints P^{N n_r} is bounded. Let Ω = {P* | ∂H_α/∂P* = 0} ⊂ P^{N n_r} be the invariant set of all critical points of H_α over P^{N n_r}. Taking the time derivative of H_α, we obtain

Ḣ_α = Σ_{r=1}^N Σ_{i=1}^{n_r} (∂H_α/∂p_i^r)^T ṗ_i^r = −Σ_{r=1}^N Σ_{i=1}^{n_r} (1/β_i^r)(M_i^r e_i^r + α_i^r)^T K_i^r (M_i^r e_i^r + α_i^r) ≤ 0.

The first derivative of the Lyapunov candidate satisfies Ḣ_α ≤ 0, because K_i^r is strictly positive definite and β_i^r > 0. Because Ḣ_α is negative semi-definite and H_α is positive definite, H_α is non-increasing and lower bounded; thus ∃ s < ∞ such that lim_{t→∞} H_α = s. Now Ω is explicitly defined as the set of solutions of Ḣ_α = 0, i.e. ṗ_i^r = 0, ∀i, r. Let S be the largest invariant set within Ω. By definition, ṗ_i^r = (K_i^r/β_i^r)(M_i^r e_i^r + α_i^r), from which it follows that S = Ω, the set of all critical points of H_α. Thus, Ω is an invariant set, and all trajectories converge to Ω as t → ∞ by LaSalle's Invariance Principle. From (2.13), M_i^r e_i^r + α_i^r → 0 implies ∂H_α/∂p_i^r = 0.
2.5.3 Coverage Convergence in an Unknown Environment
We now extend the mixing function control law in (2.13) to include a parameterized adaptation law that ensures each robot's independently synthesized path converges to a locally optimal configuration according to (2.4), while each of the robots' estimates of the environment converges to the real environment. The presence of a consensus term in the adaptation law enables all of the robots' estimates of the environment to converge to the same estimate [27]. In order for the robots' estimates of the environment to converge, the consensus term requires that each robot has knowledge of the states of all the other robots. Thus, we assume that our network of robots is fully connected.
In an unknown environment, robots only have sensory estimates φ̂_r(q) of φ(q), so the control law from (2.13) becomes

ṗ_i^r = (K_i^r / β̂_i^r)(M̂_i^r ê_i^r + α_i^r),   (2.41)

where

α_i^r = W_n (p_{i+1}^r + p_{i−1}^r − 2 p_i^r),

β̂_i^r = M̂_i^r + 2 W_n,

and ê_i^r = Ĉ_i^r − p_i^r.
Parameter vector â_r is adjusted according to the following features of the adaptation law:

Λ_r = ∫_0^t w_r(τ) B_{p_r}(τ) B_{p_r}(τ)^T dτ,   (2.42)

λ_r = ∫_0^t w_r(τ) B_{p_r}(τ) φ_{p_r}(τ) dτ,   (2.43)

where the data collection weight w_r(τ) [27] is defined as

w_r(t) = { positive constant scalar, if t < τ_{w_r}; 0, otherwise },   (2.44)

where τ_{w_r} represents the time at which part of the adaptation for robot r shuts down to maintain Λ_r and λ_r bounded. Let

b_r = Σ_{i=1}^{n_r} ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} B(q) (q − p_i^r)^T dq ṗ_i^r,   (2.45)

ȧ_{pre_r} = −b_r − γ(Λ_r â_r − λ_r) − ζ Σ_{r'=1}^N l_{r,r'}(â_r − â_{r'}),   (2.46)

where ζ > 0 is a consensus scalar gain, and l_{r,r'} can be interpreted as the strength of the communication between robots r and r' and is defined as

l_{r,r'} = { D_max − ||p_r − p_{r'}||, if ||p_r − p_{r'}|| ≤ D_max; 0, otherwise }.   (2.47)

We select l_{r,r'} so that our assumption of a connected network is satisfied, and such that (2.46) maintains continuity on the right-hand side, so we can apply Barbalat's Lemma [29] to prove asymptotic stability of a positive-definite Lyapunov function candidate. Explicitly, we define l_{r,r'} to be a constant for all r, r', which corresponds to a fully connected network between robots.
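The accumulation terms (2.42)-(2.43) and the pre-projection update (2.46) can be sketched in discrete time as follows; the b_r term of (2.45) is omitted for brevity, the sampled values are hypothetical, and the function name is illustrative rather than the author's implementation.

```python
def adaptation_terms(history, w, gamma, zeta, a_hats, ell):
    """Discrete-time sketch of (2.42), (2.43), and (2.46) for one robot r.

    history: (B(p_r), phi(p_r)) samples along robot r's trajectory, weight w each;
    a_hats:  parameter estimates of all robots, a_hats[0] belonging to robot r;
    ell:     constant communication strength l_{r,r'} (fully connected network).
    The b_r term from (2.45) is omitted for brevity."""
    d = len(a_hats[0])
    # Lambda_r (d x d) and lambda_r (d x 1) accumulated over the sampled points.
    Lam = [[w * sum(B[i] * B[j] for B, _ in history) for j in range(d)] for i in range(d)]
    lam = [w * sum(B[i] * phi for B, phi in history) for i in range(d)]
    a_r = a_hats[0]
    grad = [sum(Lam[i][j] * a_r[j] for j in range(d)) - lam[i] for i in range(d)]
    consensus = [sum(ell * (a_r[i] - other[i]) for other in a_hats[1:]) for i in range(d)]
    # a_dot_pre_r = -gamma * (Lambda_r a_hat_r - lambda_r) - zeta * consensus
    return [-gamma * grad[i] - zeta * consensus[i] for i in range(d)]
```

If robot r's estimate already reproduces the sampled φ values and all robots agree, the update is zero, which is the equilibrium the adaptation law drives toward.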
Because a(j) ≥ 0, ∀j, we require â_r(j) ≥ 0, ∀r, ∀j, via the adaptation projection law [27],

ȧ_r = Γ(ȧ_{pre_r} − I_{proj_r} ȧ_{pre_r}),   (2.48)

where Γ ∈ R^{d×d} is a diagonal positive definite adaptation gain matrix, and the diagonal matrix I_{proj_r} is defined element-wise as

I_{proj_r}(j) = { 0, if â_r(j) > 0; 0, if â_r(j) = 0 and ȧ_{pre_r}(j) ≥ 0; 1, otherwise },   (2.49)

where (j) denotes the jth element for a vector and the jth diagonal element for a matrix.
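The projection law (2.48)-(2.49) can be sketched as a discrete-time update that keeps each parameter estimate nonnegative; this is an Euler step with illustrative clipping, not the author's exact continuous-time law.

```python
def project_update(a_hat, a_dot_pre, Gamma_diag, dt):
    """Discrete-time sketch of the projection law (2.48)-(2.49).

    An entry in the interior (a_hat[j] > 0), or at the boundary with an
    inward-pointing update, follows the Euler step; an entry at zero with an
    outward-pointing update is held at zero (I_proj zeroes that component)."""
    new = []
    for aj, dj, gj in zip(a_hat, a_dot_pre, Gamma_diag):
        if aj > 0 or (aj == 0 and dj >= 0):
            new.append(max(0.0, aj + dt * gj * dj))   # clip guards the discrete step
        else:
            new.append(0.0)
    return new

# The first entry sits at the boundary with a negative update and stays at zero.
updated = project_update([0.0, 1.0], [-2.0, -0.5], [1.0, 1.0], dt=0.1)
```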
The adaptation law (2.48) includes a weight function w_r(t) in the calculations of Λ_r and λ_r. The weight function must maintain Λ_r and λ_r bounded in order for our stability proof to hold [27]. Because the sensory function is assumed to be time-invariant, we use a step function to represent the weight. Initially the weight is set at a nonzero constant value, before it is set to zero in order for Λ_r and λ_r to remain bounded. This choice of weighting function enables robots to spend a finite amount of time sampling the environment's sensory function before they stop sampling. Their estimates of the sensory function are based on the finite set of sampled points. This weight function is defined as

w_r(t) = { w, if t < τ_{w_r}; 0, otherwise },   (2.50)

where w is a positive constant scalar, and τ_{w_r} is some positive time at which robot r stops sampling the environment's sensory function to guarantee Λ_r and λ_r bounded.
Theorem 2 (Mixing Function Convergence Theorem in an Unknown Environment) With waypoint dynamics specified by (2.5), control law specified by (2.41), and adaptation law specified by (2.48), we have

1. Waypoint Equilibrium Convergence:
lim_{t→∞} ||M̂_i^r(t) ê_i^r(t) + α_i^r(t)|| = 0, ∀r ∈ {1, …, N}, ∀i ∈ {1, …, n_r},

2. Environment Estimate Error:
lim_{t→∞} ||φ̃_r(p_r(τ))|| = 0, ∀r ∈ {1, …, N}, ∀τ | w_r(τ) > 0,

3. Robot Consensus Error:
lim_{t→∞} (â_r − â_{r'}) = 0, ∀r, r' ∈ {1, …, N}.
Proof 2 We define a positive-definite Lyapunov function V_α based on the robots' paths and sensory function estimates and use the three criteria of Barbalat's lemma [29]:

1. V_α is positive-definite,
2. V̇_α is negative semi-definite,
3. V̈_α is bounded (V̇_α is uniformly continuous),

to imply V̇_α → 0 and the subsequent asymptotic convergence of the multi-robot system to a locally optimal equilibrium. Let V_α be defined as

V_α = H_α + Σ_{r=1}^N (1/2) ã_r^T Γ^{−1} ã_r.   (2.51)
Taking the time derivative of V_α, we obtain

V̇_α = Σ_{r=1}^N Σ_{i=1}^{n_r} (∂H_α/∂p_i^r)^T ṗ_i^r + Σ_{r=1}^N ã_r^T Γ^{−1} ȧ_r = −Σ_{r=1}^N Σ_{i=1}^{n_r} (M_i^r e_i^r + α_i^r)^T ṗ_i^r + Σ_{r=1}^N ã_r^T Γ^{−1} ȧ_r.   (2.52)

From (2.32), (2.33), and (2.34),

M_i^r e_i^r = Y_i^r − M_i^r p_i^r = M̂_i^r ê_i^r − Ỹ_i^r + M̃_i^r p_i^r.

Plugging this into (2.52),

V̇_α = −Σ_{r=1}^N Σ_{i=1}^{n_r} (M̂_i^r ê_i^r + α_i^r)^T ṗ_i^r + Σ_{r=1}^N Σ_{i=1}^{n_r} (Ỹ_i^r − M̃_i^r p_i^r)^T ṗ_i^r + Σ_{r=1}^N ã_r^T Γ^{−1} ȧ_r.

Using (2.32) and (2.33), we have

Σ_{i=1}^{n_r} (Ỹ_i^r − M̃_i^r p_i^r)^T ṗ_i^r = ã_r^T Σ_{i=1}^{n_r} ∫_Q W_s (f(p_i^r, q)/g_α)^{α−1} B(q) (q − p_i^r)^T dq ṗ_i^r = ã_r^T b_r.

Substituting the dynamics specified by (2.5) and the control law specified by (2.41), we obtain

V̇_α = −Σ_{r=1}^N Σ_{i=1}^{n_r} (1/β̂_i^r)(M̂_i^r ê_i^r + α_i^r)^T K_i^r (M̂_i^r ê_i^r + α_i^r) + Σ_{r=1}^N ã_r^T b_r + Σ_{r=1}^N ã_r^T Γ^{−1} ȧ_r.

Plugging in the adaptation law from (2.45), (2.46), and (2.48),

V̇_α = −Σ_{r=1}^N Σ_{i=1}^{n_r} (1/β̂_i^r)(M̂_i^r ê_i^r + α_i^r)^T K_i^r (M̂_i^r ê_i^r + α_i^r) − γ Σ_{r=1}^N ã_r^T (Λ_r â_r − λ_r) − ζ Σ_{r=1}^N ã_r^T Σ_{r'=1}^N l_{r,r'}(â_r − â_{r'}) − Σ_{r=1}^N ã_r^T I_{proj_r} ȧ_{pre_r}.   (2.53)

Using (2.42) and (2.43), the second term in (2.53) becomes

−γ Σ_{r=1}^N ã_r^T (Λ_r â_r − λ_r) = −γ Σ_{r=1}^N ã_r^T [ ∫_0^t w_r(τ) B(p_r(τ)) B(p_r(τ))^T dτ â_r − ∫_0^t w_r(τ) B(p_r(τ)) φ(p_r(τ)) dτ ]
= −γ Σ_{r=1}^N ∫_0^t w_r(τ) φ̃_r(p_r(τ)) [φ̂_r(p_r(τ)) − φ(p_r(τ))] dτ
= −γ Σ_{r=1}^N ∫_0^t w_r(τ) (φ̃_r(p_r(τ)))² dτ.

Plugging this expression back into (2.53), we obtain

V̇_α = −Σ_{r=1}^N Σ_{i=1}^{n_r} (1/β̂_i^r)(M̂_i^r ê_i^r + α_i^r)^T K_i^r (M̂_i^r ê_i^r + α_i^r) − γ Σ_{r=1}^N ∫_0^t w_r(τ) (φ̃_r(p_r(τ)))² dτ − ζ Σ_{r=1}^N ã_r^T Σ_{r'=1}^N l_{r,r'}(â_r − â_{r'}) − Σ_{r=1}^N ã_r^T I_{proj_r} ȧ_{pre_r}.   (2.54)
Let 1= [ 1,..., i]T. From [27], we can representthe thirdterm in (2.54) as where!Qj = a(j)1, Qj =
,N (j)]T
...
and %j =f0; -
1
i
L(t)nj,
j. L(t) is the weighted graph
Laplacianof the system at time t and is defined entry-wise by
-(r',
for r -pr',
L(r, r') =(2.55)
Eir, r', for r = r'.
This Laplaciangraph is positive-semi-definite, because it has exactly one zero eigenvalue
whose eigenvector is 1 [27], as a result of the network of robots being fully connected.
Thus, xTLx = 0 only if x = v, for some v E- R. Consequently,
TL = a(j)1TL = 0, Vj.
Therefore,
d
-C
Td
T
2 i LK^2j,
91 L92j = -
1
j=1
j=1
and itfollows that
N nr
Va
=
1
E - =($re(+yr) K,
r=l i=1
N
Wr (T) (Or(Pr(r)))2dr
-YE
r=1 fo
d ^T
--
I
Pi
t
n
N
Lnj - E &TIprojApre,.
j=1
r=1
50
(2.56)
We denote the four remaining terms in (2.56) as $\vartheta_1(t)$, $\vartheta_2(t)$, $\vartheta_3(t)$, and $\vartheta_4(t)$, respectively, so that $\dot V_a(t) = \vartheta_1(t) + \vartheta_2(t) + \vartheta_3(t) + \vartheta_4(t)$. Because $K_i^r$ is uniformly positive-definite and $\hat M_i^r > 0$, $\vartheta_1(t) \le 0$. $\vartheta_2(t) \le 0$ because it is the negative integral of a squared quantity, while [27] proved that $\vartheta_4(t) \le 0$. Because we assume a fully connected network to allow for robot estimate consensus, it follows that $L(t) \ge 0$, $\forall t$, thus implying $\vartheta_3(t) \le 0$.

Consider the time integral of each of these four terms, $\int_0^t \vartheta_k(\tau)\,d\tau$, $k = 1,\dots,4$. Because each of the terms is negative semi-definite, $\int_0^t \vartheta_k(\tau)\,d\tau \le 0$, $\forall k$, and because $V_a$ is positive definite, each integral is lower bounded by $\int_0^t \vartheta_k(\tau)\,d\tau \ge -V_a(0)$, where $V_a(0)$ is the initial value of $V_a$. Therefore, these integrals are lower bounded and non-increasing, and hence $\lim_{t\to\infty}\int_0^t \vartheta_k(\tau)\,d\tau$ exists and is finite for all $k$.
We now show that $\dot V_a$ is uniformly continuous ($\ddot V_a$ bounded). It was shown in [27] (Lemma 1) that $\dot\vartheta_1(t)$ and $\dot\vartheta_2(t)$ are uniformly bounded, implying $\vartheta_1(t)$ and $\vartheta_2(t)$ are uniformly continuous. Thus, by Barbalat's Lemma, $\vartheta_1(t) \to 0$ and $\vartheta_2(t) \to 0$. This implies propositions (i) and (ii). It was shown in [27, 32] that $\vartheta_3(t)$ is uniformly bounded, and it is non-differentiable only at isolated points on $[0,\infty)$. Because the network is fully connected, $\vartheta_3(t)$ is uniformly bounded and uniformly continuous in time. Therefore, by Barbalat's Lemma [29], $\vartheta_3(t) \to 0$, which implies that $\hat\alpha_j^T L\,\hat\alpha_j \to 0$, $\forall j$. Given the definition of the weighted Laplacian, this implies that $\hat\alpha_j \to \hat a_{\mathrm{final}}(j)\mathbf 1$, $\forall j$, where $\hat a_{\mathrm{final}}$ is the final common parameter estimate vector shared by all robots. This implies proposition (iii).
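The two Laplacian facts the proof relies on, that $L(t)$ is positive semi-definite and that the all-ones vector spans its null space when the network is connected, can be checked numerically. A minimal sketch with illustrative edge weights:

```python
def weighted_laplacian(weights):
    """Build the weighted graph Laplacian L from (2.55).

    weights[r][rp] is the edge weight l_{r,r'} (0 if no edge); off-diagonal
    entries are -l_{r,r'} and each diagonal entry sums the incident weights.
    """
    n = len(weights)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for rp in range(n):
            if r != rp:
                L[r][rp] = -weights[r][rp]
                L[r][r] += weights[r][rp]
    return L


def quadratic_form(L, x):
    """Compute x^T L x, the disagreement energy used in the proof."""
    n = len(L)
    return sum(x[r] * L[r][rp] * x[rp] for r in range(n) for rp in range(n))
```

For a fully connected network the quadratic form vanishes exactly on multiples of the all-ones vector, which is the fact used to conclude that the parameter estimates reach consensus.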
Remark 3. This multi-robot convergence proof encompasses the single robot case $N = 1$. The single robot proof does not include the term $\vartheta_3(t)$, because there is no need to consider consensus error if there is only one robot in the environment. Consequently, $\zeta = 0$.
Remark 4. Proposition (i) from Theorem 2 implies that the paths reach a locally optimal configuration for sensing, when the waypoints reach a stable balance between being close to their neighbor waypoints and minimizing the mixing function and estimated sensing error. Propositions (ii) and (iii) from Theorem 2 imply that $\tilde\phi_r(q) \to 0$, $\forall r$, for all points on any robot's trajectory sampled while the weight $w_r(t) > 0$ over the environment.
2.6 Single Robot Coverage Algorithm and Simulation
In this section we present the algorithms used by a single robot to generate an informative path in an unknown environment. The coverage algorithms for these simulations are divided into a two-level hierarchy: (1) robot level and (2) waypoint level. The robot level corresponds to the robot traveling and sampling the environment along its initial path. This algorithm assumes that each robot knows the positions of the waypoints along its path. The robot travels between waypoints in a straight line and makes a measurement of the environment at each waypoint. With each measurement $\phi(p_r)$, the robot updates the estimated parameter vector $\hat a$ as shown in line 11 of Algorithm 1. The complete robot level algorithm is shown in Algorithm 1.
Algorithm 1 Mixing Function Controller for a Single Robot in an Unknown Environment: Robot Level
Require: $\phi(q)$ can be parametrized as in (2.27)
Require: $a > 0$
Require: Waypoints cannot reconfigure faster than the robot traveling the path
Require: Robot knows location of $p_i$, $\forall i \in \{1,\dots,n_r\}$
1: Initially robot is moving towards $p_1$
2: Initialize $\Lambda$ and $\lambda$ to zero
3: Initialize $\hat a$ element-wise to some bounded nonnegative value
4: loop
5:   if robot reached $p_i$ then
6:     move towards $p_{i+1}$ in a straight line from $p_i$
7:   else
8:     move towards $p_i$ in a straight line from $P_r$
9:   end if
10:  Make measurement $\phi(P_r)$
11:  Update $\hat a$ according to (2.48)
12:  Update $\Lambda$ and $\lambda$ according to (2.42) and (2.43)
13: end loop
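The travel logic in lines 5-9 of Algorithm 1 can be sketched as follows, assuming a constant-speed point robot on a closed waypoint path; the adaptation updates of lines 11-12 are left out since they follow (2.42), (2.43) and (2.48):

```python
import math

def robot_level_step(robot_pos, waypoints, i, speed=0.05, tol=1e-3):
    """One iteration of Algorithm 1's travel logic: move toward waypoint i
    in a straight line; advance to the next waypoint once it is reached.

    Returns the new position and the (possibly advanced) waypoint index;
    the index wraps around because the path is a closed loop.
    """
    target = waypoints[i]
    dx = target[0] - robot_pos[0]
    dy = target[1] - robot_pos[1]
    dist = math.hypot(dx, dy)
    if dist < tol:                      # reached p_i: head to p_{i+1}
        return robot_pos, (i + 1) % len(waypoints)
    step = min(speed, dist)             # don't overshoot the waypoint
    new_pos = (robot_pos[0] + step * dx / dist,
               robot_pos[1] + step * dy / dist)
    return new_pos, i
```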
The waypoint level algorithm corresponds to the waypoints reconfiguring themselves into locally optimal positions to generate an informative path based on the optimization of the mixing function cost (2.4). In the waypoint level algorithm, each waypoint uses the parameter vector $\hat a$ updated by the robot as an input to the mixing function controller that drives its dynamics, as shown in line 5 of Algorithm 2. The complete waypoint level algorithm is detailed in Algorithm 2.
Algorithm 2 Mixing Function Controller for a Single Robot in an Unknown Environment: Waypoint Level
Require: Each waypoint knows $\hat a$ from Algorithm 1
Require: Each waypoint knows the position of all other waypoints on the path, $[p_1,\dots,p_{n_r}]$
Require: Each waypoint knows $\alpha$
1: loop
2:   Compute $g_a$ according to $\hat a$
3:   Compute $\hat C_i$ according to (2.30)
4:   Obtain all waypoint locations $[p_1,\dots,p_{n_r}]$
5:   Compute $u_i$ according to (2.41)
6:   Update $p_i$ according to (2.5)
7: end loop
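For intuition on the mixing computation in line 2, the mixing function studied in this thesis has a power-mean form over nonnegative per-robot sensing costs, with $\alpha \to -\infty$ recovering the minimum, i.e. the geometric (Voronoi) interpretation, and $\alpha = -1$ the minimum-variance (probabilistic) one. A sketch of just this evaluation, with placeholder cost inputs:

```python
def mixing(costs, alpha):
    """Evaluate g_alpha(f_1,...,f_N) = (sum_r f_r^alpha)^(1/alpha), alpha < 0.

    For alpha = -inf this returns min(costs) exactly (the Voronoi limit);
    for finite negative alpha it is a smooth approximation of the minimum
    from below.
    """
    if alpha == float("-inf"):
        return min(costs)
    return sum(f ** alpha for f in costs) ** (1.0 / alpha)
```

Already at $\alpha = -10$ the smooth value is within about one percent of the true minimum for well-separated costs, which matches how closely the Voronoi smoothing controller tracks the Voronoi controller in the simulations below.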
The informative path algorithm for a single robot was tested in MATLAB using minimum variance, Voronoi smoothing, and Voronoi coverage strategies, where $\alpha = \{-1, -10, -\infty\}$ in the mixing function controller, respectively. For each coverage approach we consider one robot with $n = 50$ waypoints. A fixed-time step numerical solver is used to integrate the equations of motion and adaptation law using a time step of 0.01 seconds. The environment $Q$ is a unit square.
The sensory function $\phi(q)$ is parametrized as a Gaussian network with 25 truncated Gaussians, i.e. $\mathcal B = [\mathcal B(1) \dots \mathcal B(25)]^T$, where
$$\mathcal B(j) = \begin{cases} G(j) - G_{\mathrm{trunc}}, & \text{if } \|q - \mu_j\| < \rho_{\mathrm{trunc}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (2.57)$$
with
$$G(j) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\|q-\mu_j\|^2}{2\sigma^2}\right), \qquad G_{\mathrm{trunc}} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{\rho_{\mathrm{trunc}}^2}{2\sigma^2}\right),$$
$\sigma = 0.3$ and $\rho_{\mathrm{trunc}} = 0.2$. The unit square is divided into a 5 x 5 discrete grid and each $\mu_j$ is selected so that each of the 25 Gaussians is centered at its corresponding grid square. The parameter vector $a$ is chosen as $a(4) = 80$, $a(14) = 60$, $a(22) = 70$, and $a(j) = 0$ otherwise. The environment created with these parameters is shown in Figure 2-5c. The parameters $\hat a$, $\Lambda$ and $\lambda$ are initialized to zero. The parameters for the controller are $K_i = 70$, $\forall i$, $\Gamma =$ identity, $\gamma = 2000$, $W_n = 10$, $W_s = 100$, $w = 30$. The spatial integrals are approximated by summing integral contributions over a 10 x 10 discretized environment grid.
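The truncated Gaussian basis in (2.57) can be sketched directly; in this sketch the normalization constant is dropped for clarity, and the 25 grid-centered means are built inline (layout details are illustrative):

```python
import math

def basis(q, centers, sigma=0.3, rho_trunc=0.2):
    """Evaluate the truncated Gaussian basis vector B(q) in the spirit of (2.57).

    Each component is a Gaussian bump centered at mu_j, shifted down by its
    value at the truncation radius so that B_j(q) falls continuously to zero
    at ||q - mu_j|| = rho_trunc and is zero outside.
    """
    g_trunc = math.exp(-rho_trunc ** 2 / (2.0 * sigma ** 2))
    out = []
    for (mx, my) in centers:
        d2 = (q[0] - mx) ** 2 + (q[1] - my) ** 2
        if d2 < rho_trunc ** 2:
            out.append(math.exp(-d2 / (2.0 * sigma ** 2)) - g_trunc)
        else:
            out.append(0.0)
    return out


def sensory_function(q, a, centers):
    """phi(q) = a^T B(q): the Gaussian-network parametrization of the field."""
    return sum(aj * bj for aj, bj in zip(a, basis(q, centers)))
```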
In order for the robot's estimate of the environment to converge to the real environment, we design an initial sweeping path so that the robot will have a rich enough trajectory to observe the entire environment. The robot travels along its initial path without any reconfiguring, so that it can sample the space and learn the distribution of sensory information. This process is referred to as the learning phase. The path shaping phase begins immediately after the robot has traveled its entire initial trajectory once. During the path shaping phase, $w = 0$ according to (2.50), and we show that the mean waypoint position error converges to zero, thus experimentally verifying that the mixing function controller (2.41) reconfigures the waypoints of the robot's path into an informative path.
2.6.1 Learning Phase
The robot travels its initial path trajectory once to measure and estimate the environment according to the adaptation law (2.48). As the robot travels along its path, the adaptation law causes the robot's estimate of the environment to converge to the real environment description, as defined by $\tilde\phi(q) \to 0$. Figure 2-4a validates this convergence by showing that the mean integral parameter error $\int_0^t w(\tau)\big(\tilde\phi(p_r(\tau))\big)^2 d\tau$ converges as $t \to \infty$, in accordance with proposition (ii) from Theorem 2. To complete the stability criteria from Section 2.5.3 for this learning phase, Figure 2-4 shows that the Lyapunov function candidate $V_a$ is monotonically non-increasing and $\inf(V_a) > 0$.

The learning phase simulation for a single robot is shown in Figure 2-5. As the robot travels along its initial sweeping path, Figures 2-5a-2-5d show that the adaptation law causes $\tilde\phi(q) \to 0$, $\forall q \in Q$, and the robot's estimate of the environment converges to the real environment description.
[Figure: two plots against path iterations. (a) $\tilde\phi(q)$ mean integral parameter error. (b) Bounded Lyapunov function $V_a$.]

Figure 2-4: The mean integral parameter error shows that the robot's estimate $\hat\phi(q) \to \phi(q)$. Peaks occur when the robot encounters a point of interest while its sensing estimate is currently zero. As required by our convergence proof, we also show that $V_a$ is positive-definite and bounded.
[Figure: four rows of path and environment-estimate snapshots at learning iterations 1, 30, 90, and 160.]

Figure 2-5: Single robot learning phase with informative path controller. 1st column: the initial learning path connects all the waypoints, shown as black circles. The black arrow represents the sensing position of the robot. 2nd column: the translucent environment represents the true environment and the solid environment represents the estimated environment. This figure directly correlates to Figure 2-4a.
2.6.2 Path Shaping Phase
Immediately after the robot travels through its initial sweeping path once and learns the environment, controller (2.41) is activated. The robot provides its estimates of the locations of the points of interest to its waypoints through $\hat a$, so that the mixing function controller can drive them into an informative path configuration to cover these points. Figure 2-6 confirms that the path has converged to an equilibrium configuration according to (2.4) by showing that the mean waypoint position error after the path shaping phase is zero.
[Figure: mean waypoint position error against path iterations.]

Figure 2-6: Mean waypoint position error. $\|\hat M_i(t)\hat e_i(t) + \hat a_i(t)\| \to 0$. Waypoints converge to an equilibrium defined by (2.4) that balances thorough sensing and short coverage paths. The peak at 200 iterations corresponds to the beginning of the path shaping phase.
The path evolution using this controller is shown in Figure 2-7 for $\alpha = \{-1, -10, -\infty\}$. After 100 iterations (not counting initial learning iterations), the paths already go through all dynamic regions of the environment. It is important to note how well the Voronoi smoothing controller approximates the Voronoi controller even for $\alpha \gg -\infty$.
[Figure: 3 x 3 grid of path snapshots. Row 1: $\alpha = -1$ at iterations 5, 40, 100. Row 2: $\alpha = -10$ at iterations 5, 40, 100. Row 3: $\alpha = -\infty$ at iterations 5, 40, 100.]

Figure 2-7: Single robot path shaping phase with an informative path controller. Rows 1-3 show the path evolution of the minimum variance, Voronoi smoothing, and Voronoi controllers, respectively. For each of these strategies, $W_s > W_n$, thus we expect longer paths with more thorough environment coverage. The paths connect all the waypoints, shown as black circles. The black arrow represents the robot's position. For $\alpha = -10$ we already begin to visualize a very close approximation of the Voronoi controller, as shown by the similarities of their resulting informative paths.
2.6.3 Varying Sensing Weight $W_s$ and Path Weight $W_n$
The informative paths in Figure 2-7 were given by $W_s = 100$ and $W_n = 10$. Given that the ratio $W_s/W_n = 10$, the weights are heavily skewed to encourage thorough sensing instead of shorter path configurations. By increasing the neighbor waypoint distance weight $W_n$, the controller will provide greater attractive forces between neighboring waypoints, causing the paths to be shorter. Therefore, the system evolves in a way that the environment is covered with low-length paths, with less of an emphasis placed on sensing. In Figure 2-8, we compare two weighting scenarios: (1) $W_n > W_s$ and (2) $W_s > W_n$. In scenario 1, $W_n$ is high, providing high attractive forces between neighboring waypoints. As a result, we see more of an emphasis placed on neighbor waypoint weights and shorter coverage paths. In scenario 2, $W_n$ is low, thus resulting in low attractive forces between neighboring waypoints. This causes the waypoints to be more distant from each other and focus more on the coverage task. Depending on the application, the weights $W_n$ and $W_s$ can be selected to satisfy either sensing or size constraints.
[Figure: 2 x 3 grid of final paths. Top row: $W_n > W_s$ with $\alpha = -1, -10, -\infty$. Bottom row: $W_s > W_n$ with $\alpha = -1, -10, -\infty$.]

Figure 2-8: Single robot $W_s$ vs. $W_n$. Top row: $W_n > W_s$. Bottom row: $W_s > W_n$. Voronoi smoothing ($\alpha = -10$) is indistinguishable here from Voronoi coverage behavior.
2.6.4 Computational Complexity of Single Robot Algorithm
The gradient controllers described in this work are discretized and implemented in a discrete time control loop. At each iteration of the loop the controller computes spatial integrals over the region. A discretized approximation is used to compute the integral of $\phi(q)$ over $Q$. The two parameters that impact the computation time are the number of waypoints $n_r$ and the number of grid squares in the integral computation, $d$. Unlike a Voronoi approach, we do not need to check if a point is in a polygon. However, the integrand $g_a$ we integrate over is linear in $n_r$. The time complexity for computing a discretized integral is linear in the number of grid squares, which is an $O(d)$ operation. Therefore, the total time complexity of the controller is $O(d \cdot n_r)$ during each iteration. This controller is significantly less expensive than a Voronoi controller, because it does not require a Voronoi tessellation computation. As $\alpha$ decreases, the behavior of the controller approaches Voronoi-based coverage. This implies that the upper bound of time complexity for the mixing function controller is reached at $\alpha = -\infty$, with a corresponding order $O(n_r(d + 1))$ at each iteration.
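The cost structure described above can be made concrete: a discretized spatial integral whose integrand touches every waypoint costs $O(d \cdot n_r)$ per evaluation. A sketch, using a stand-in integrand rather than the controller's actual $g_a$:

```python
def discretized_integral(waypoints, d):
    """Approximate an integral over the unit square Q on a grid of d cells.

    For each of the d grid cells, the stand-in integrand loops over all n_r
    waypoints, so one evaluation costs O(d * n_r) operations in total.
    """
    side = int(d ** 0.5)
    cell_area = 1.0 / (side * side)
    total = 0.0
    for gi in range(side):
        for gj in range(side):
            q = ((gi + 0.5) / side, (gj + 0.5) / side)
            # stand-in integrand: sum of smooth distance kernels to waypoints
            val = sum(1.0 / (1.0 + (q[0] - px) ** 2 + (q[1] - py) ** 2)
                      for (px, py) in waypoints)
            total += val * cell_area
    return total
```

Doubling the grid resolution quadruples $d$ and hence the work, while adding waypoints scales the cost linearly, exactly the $O(d \cdot n_r)$ behavior claimed above.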
2.7 Multi-Robot Coverage Algorithm and Simulation
Similar to the single robot coverage algorithm, the multi-robot coverage algorithm has a hierarchy of two levels: (1) robot level and (2) waypoint level. Again, the robot level corresponds to the robots traveling along their paths, sampling and estimating the environment. However, for the multi-robot case, the robots must now obtain and broadcast $\hat a_r$, $\forall r$, to achieve a consensus estimate of the environment between all robots. This algorithm can be executed in a distributed way by each robot. Thus, each robot can execute this algorithm independently, while sharing just its current parameter estimate vector $\hat a_r$. The robot level algorithm is shown in Algorithm 3.

The multi-robot waypoint level algorithm is executed such that each waypoint acts as an agent with its own controller. As a result of this distributed approach, each waypoint computes its mixing function, thus requiring that each waypoint now know the state of all other waypoints (even those not on its respective robot path). Using the parameter vector
Algorithm 3 Informative Path Controller for Multiple Robots in an Unknown Environment: Robot Level for Robot r
Require: $\phi(q)$ can be parametrized as in (2.27)
Require: $a > 0$
Require: Waypoint dynamics are slower than the robots'
Require: The network of robots is fully connected
Require: Robot knows location of $p_i^r$, $\forall i \in \{1,\dots,n_r\}$
1: Initially robot is moving towards $p_1^r$
2: Initialize $\Lambda_r$ and $\lambda_r$ to zero
3: Initialize $\hat a_r$ to some nonnegative value
4: loop
5:   if robot reached $p_i^r$ then
6:     move towards $p_{i+1}^r$ in a straight line from $p_i^r$
7:   else
8:     move towards $p_i^r$ in a straight line from $P_r$
9:   end if
10:  Make measurement $\phi(P_r)$
11:  Obtain $\hat a_{r'}$, $\forall r'$ that can communicate with $r$
12:  Update $\hat a_r$ according to (2.48)
13:  Update $\Lambda_r$ and $\lambda_r$ according to (2.42) and (2.43)
14: end loop
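The parameter exchange in line 11 feeds the consensus term of the adaptation law (2.48). A sketch of that term in isolation, assuming unit-weight links and a small consensus gain (both values illustrative):

```python
def consensus_term(a_hats, r, weights, zeta=0.1):
    """Consensus contribution to robot r's parameter update: pull a_hat_r
    toward neighbor estimates, -zeta * sum_{r'} l_{r,r'} (a_hat_r - a_hat_r').
    """
    m = len(a_hats[r])
    delta = [0.0] * m
    for rp, a_rp in enumerate(a_hats):
        if rp == r:
            continue
        l = weights[r][rp]
        for j in range(m):
            delta[j] -= zeta * l * (a_hats[r][j] - a_rp[j])
    return delta


def consensus_step(a_hats, weights, zeta=0.1):
    """One synchronous consensus update applied to every robot's estimate."""
    return [[a + d for a, d in zip(a_hats[r], consensus_term(a_hats, r, weights, zeta))]
            for r in range(len(a_hats))]
```

With symmetric weights the average estimate is preserved while the disagreement contracts every step, which is the mechanism behind proposition (iii): all robots approach a common parameter vector.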
$\hat a_r$ obtained by robot consensus as an input to the control strategy, the result of the waypoint level algorithm is that the waypoints reposition themselves into locally optimal informative path configurations. The waypoint level algorithm is shown in Algorithm 4.
The informative path controller for multiple robots was tested in MATLAB for eleven test cases, varying $N$, $\alpha$, and $\phi(q)$. First, we present a case for $N = 2$ robots, with $n_r = 22$ waypoints, $\forall r$. The same fixed-time step numerical solver is used with a time step of 0.01 seconds. The environment parameters are $\sigma = 0.3$ and $\rho_{\mathrm{trunc}} = 0.2$. The parameters $\hat a_r$, $\Lambda_r$ and $\lambda_r$, for all $r$, are initialized to zero. The parameters for the controller are $K_i^r = 70$, $\forall i, r$, $\Gamma =$ identity, $\gamma = 2000$, $W_n = 10$, $W_s = 100$, and $w_r = 10$, $\forall r$. $D_{\max}$ is assumed to be very large, so that $l_{r,r'}(t) = 10$, $\forall r, r'$, $\forall t$.

We chose the same rich initial sweeping trajectory as in the single robot case, for all robots, so that they have the opportunity to initially observe the entire environment. We present results for an initial learning phase and for the path shaping phase. During the path shaping phase, $w_r = 0$, $\forall r$, and (2.41) is used to reshape the paths.
Algorithm 4 Informative Path Controller for Multiple Robots in an Unknown Environment: Waypoint Level
Require: Each waypoint knows $\hat a_r$ from Algorithm 3
Require: Each waypoint knows the position of all other waypoints $[p_1^1,\dots,p_{n_N}^N]$
Require: Each waypoint knows $\alpha$
1: loop
2:   Compute the value of the mixing function at the waypoint's position $p_i^r$
3:   Compute $\hat C_i^r$ according to (2.30), but integrating over $Q$ from (2.16)
4:   Obtain all other waypoint locations $[p_1^1,\dots,p_{n_N}^N]$
5:   Compute $u_i^r$ according to (2.41)
6:   Update $p_i^r$ according to (2.5)
7: end loop
2.7.1 Learning Phase
The robots travel their paths in their entirety once, measuring the environment as they travel and using the adaptation law (2.48) to estimate the environment. Figure 2-9a shows that the consensus error, referring to $\sum_{r=1}^{N}\sum_{r'=1}^{N}(\hat a_r - \hat a_{r'})$, converges to zero, in accordance with proposition (iii) from Theorem 2, indicating that all robots have the same estimate of the environment. While this shows that each robot converges to the same estimate of the environment, we still must ensure that this estimate is accurate. Figure 2-9b ensures that the robots' estimates of the environment are accurate by showing that the total mean integral parameter error for all robots, $\sum_r \int_0^t w_r(\tau)\big(\tilde\phi_r(p_r(\tau))\big)^2 d\tau$, does indeed converge to zero, in accordance with proposition (ii) from Theorem 2. Therefore, as the robots travel their paths, the adaptation laws cause $\tilde\phi_r(q) \to 0$, $\forall q \in Q$, $\forall r$. This means that all robots' trajectories are rich enough to generate accurate estimates for all of the environment.

This simulated learning phase for two robots is shown in Figure 2-10, where the updated environment knowledge for the two robots is presented in the 2nd column. As shown, the estimated environments converge to the real environment for sampling conducted along the initial sweeping paths.
[Figure: two plots against path iterations. (a) Consensus error. (b) Integral parameter error.]

Figure 2-9: The total consensus error shows that every robot converges to the same estimate of the environment. The mean integral parameter error shows that the estimate that each robot converges to is indeed an accurate representation of the environment.
[Figure: path and environment-estimate snapshots at learning iterations 1, 20, 135, and 180.]

Figure 2-10: Multi-robot learning phase with informative path controller. 1st column: the paths connect all the waypoints, shown as black circles, and each robot has a different color assigned to it. The black arrows represent the robots. 2nd column: the translucent environment represents the true environment and the solid environments represent the individual estimated environments.
2.7.2 Path Shaping Phase
Once the robots travel through their paths once, the controller from Section 2.5.3 is activated. Figure 2-11 shows the estimated mean waypoint position errors, where the estimated error refers to the quantity $\|\hat M_i^r(t)\hat e_i^r(t) + \hat a_i^r(t)\|$. As shown, $\lim_{t\to\infty}\|\hat M_i^r(t)\hat e_i^r(t) + \hat a_i^r(t)\| = 0$, $\forall i, r$, in accordance with proposition (i) from Theorem 2. Therefore all waypoints have converged to an equilibrium in accordance with (2.4).
[Figure: mean waypoint position error against path iterations.]

Figure 2-11: The mean waypoint position error shows that the waypoints converge to the equilibrium defined by (2.4), where there is a balance between sensing error and informative path length. The peak at 160 iterations indicates when the path shaping phase begins.
The simulated path evolution using this controller with $\alpha = -1$ is shown in Figures 2-12a to 2-12e. The waypoints minimize the expected variance of their sensor estimates over the environment to create an informative path that ultimately visits no static areas in the environment.
Figure 2-13 presents additional simulations that show the final informative path configurations for two different environments with several different mixing function classes. As opposed to previous simulations, which used sensing and neighbor weights $W_s = 100$ and $W_n = 10$, respectively, these two new environment simulations use $W_s = 10$ and $W_n = 5$ for all control strategies. Therefore, a very slight emphasis is placed on sensing over path length. Figures 2-13e and 2-13f show how well $\alpha = -10$ approximates the Voronoi controller, where the resulting informative paths are nearly identical.
[Figure: (a) environment; (b)-(e) path snapshots at iterations 30, 110, 220, and 300.]

Figure 2-12: Multi-robot path shaping with informative path controller at $\alpha = -1$. The resulting informative paths travel through only dynamic regions of the environment.
[Figure: 2 x 3 grid of final paths for two environments with $\alpha = -1, -10, -\infty$.]

Figure 2-13: Informative path configurations for two environments with different mixing functions. The top row represents an environment with some regions of nonzero sensory information. The bottom row represents an environment where every region contains important sensory information. Notice that the Voronoi smoothing controller approximates the Voronoi controller fairly well for both environments.
2.7.3 Varying Sensing Weight $W_s$ and Path Weight $W_n$
The informative paths in Figure 2-12 were given by $W_s = 100$ and $W_n = 10$. Given that the ratio $W_s/W_n = 10$, the weights are heavily skewed to encourage thorough sensing instead of shorter path configurations. By increasing the neighbor distance weight $W_n$, this higher gain will provide greater attractive forces between neighboring waypoints, causing the paths to be shorter. Therefore, the system evolves in a way that the environment is covered with low-length paths, with less of an emphasis placed on sensing. In Figure 2-14, we compare two weighting scenarios: (1) $W_n > W_s$ and (2) $W_s > W_n$. In scenario 1, $W_n$ is high, providing high attractive forces between neighboring waypoints. As a result, we see more of an emphasis placed on neighbor waypoint weights and shorter coverage paths. In scenario 2, there is a low $W_n$, providing low attractive forces between neighboring waypoints. This causes the waypoints to be more distant from each other and focus more on the coverage task, rather than generating short paths. Depending on the application, the weights $W_n$ and $W_s$ can be selected to satisfy either sensing or size constraints.
[Figure: 2 x 3 grid of final paths. Top row: $W_n > W_s$ with $\alpha = -1, -10, -\infty$. Bottom row: $W_s > W_n$ with $\alpha = -1, -10, -\infty$.]

Figure 2-14: Multiple robot $W_s$ vs. $W_n$. Top row: $W_n > W_s$, giving shorter paths. Bottom row: $W_s > W_n$, giving more thorough sensing.
2.7.4 Computational Complexity of Multi-Robot Algorithm
The three parameters that impact the computation time are the number of robots $N$, the number of waypoints $n_r$, and the number of grid squares in the integral computation, $d$. Again, we do not need to check if a point is in a polygon, but the integrand $g_a$ we integrate over is linear in $N \cdot n_r$. The time complexity for computing a discretized integral is linear in the number of grid squares, which is an $O(d)$ operation. Therefore, the total time complexity of the controller is $O(d \cdot N \cdot n_r)$ during each iteration. As $\alpha$ decreases, the behavior of the mixing function controller approaches Voronoi coverage, which is considerably more computationally expensive, because it requires the additional computation of a Voronoi partition. This implies that the upper bound of time complexity for the mixing function controller occurs at $\alpha = -\infty$ and has a value $O(N \cdot n_r(d + 1))$ at each iteration.
2.7.5 Robustness Considerations for Initial Waypoint Configurations
Because it only considers sensor estimates within each robot's localized Voronoi partition, informative path configurations computed using a Voronoi coverage controller depend heavily on the initial relative position of a waypoint with respect to other waypoints. As shown in Figure 2-15, a very slight variation in initial waypoint position can have a large effect on the final informative path configuration. As a result of its ability to approximate the Voronoi controller, the Voronoi smoothing controller also exhibits localized waypoint sensitivities during path reconfiguration, even for free parameter values of $\alpha = -3$, as shown in Figure 2-16.

Based on its convergence behavior shown in Section 2.5, the mixing function controller can be used to relax the dependence on initial waypoint conditions by using a probabilistic control strategy. Because it aims to minimize the variance of a collection of sensor measurements of the same point of interest, instead of only considering the closest measurement to the point of interest, the mixing function controller for free parameter value $\alpha = -1$ does not exhibit extreme topological differences in informative path configurations corresponding to small differences in initial waypoint positions. This correspondence is shown in Figure 2-17.
In each of these figures, the two robots have the same number of waypoints ($n = 30$), and the waypoints of the second robot are positioned a distance of 0.001, in the unit environment, away from the corresponding waypoints of Robot 1. Intuitively, if a waypoint of Robot 1 were to be considered the center of a circle with radius 0.001, the corresponding waypoint positions tested for Robot 2 lie on the circumference of the circle surrounding Robot 1's waypoint, as shown in Figure 2-18. One example of this positioning, shifted along the x-direction, is $[p_1^2,\dots,p_n^2]_x = [p_1^1,\dots,p_n^1]_x + 0.001$ and $[p_1^2,\dots,p_n^2]_y = [p_1^1,\dots,p_n^1]_y$. 500 different positions of Robot 2 were considered for this specific environment simulation,
[Figure: (a), (b) nearly identical waypoint positions of two robots, Scenarios 1 and 2; (c), (d) Voronoi informative path separation for Scenarios 1 and 2.]

Figure 2-15: Due to sensing sensitivity, small changes in initial waypoint positions can result in significantly different informative path configurations for the same environment using a Voronoi coverage method.
each yielding the same general informative path configurations, as shown in Figure 2-17. In addition to this environment, 47 other environments were tested, and in each case the minimum variance controller produced consistent paths, regardless of the presence of small initial numeric differences in waypoint positions across multiple robots.
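The perturbation scheme used in these robustness tests can be sketched directly: Robot 2's waypoints are Robot 1's, all shifted by the same radius-0.001 offset at a chosen test angle (the angle sweep and waypoint layout here are illustrative):

```python
import math

def perturbed_waypoints(waypoints, angle, radius=0.001):
    """Offset every waypoint of Robot 1 by one small displacement lying on a
    circle of the given radius, producing Robot 2's initial waypoints.

    For angle = 0 this reproduces the x-direction shift example in the text.
    """
    dx = radius * math.cos(angle)
    dy = radius * math.sin(angle)
    return [(px + dx, py + dy) for (px, py) in waypoints]
```

Sweeping `angle` through 500 evenly spaced values would reproduce the 500 Robot 2 initializations tested per environment.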
[Figure: (a), (b) nearly identical waypoint positions of two robots, Scenarios 1 and 2; (c), (d) Voronoi approximation ($\alpha = -3$) informative path separation for Scenarios 1 and 2.]

Figure 2-16: The Voronoi smoothing controller retains the local sensitivities of the Voronoi controller. Due to sensing sensitivity, small changes in initial waypoint positions can result in significantly different informative path configurations for the same environment using a Voronoi coverage method.
[Figure: (a), (b) nearly identical waypoint positions of two robots, Scenarios 1 and 2; (c), (d) minimum variance informative path separation for Scenarios 1 and 2.]

Figure 2-17: Because it aims to minimize the variance of a collection of sensor measurements of the same point of interest, the mixing function controller for free parameter value $\alpha = -1$ does not exhibit the extreme topological differences in informative path configurations shown by the Voronoi and Voronoi smoothing approaches.
[Figure: diagram of Robot 2 waypoint test positions on a circle of radius 0.001 centered at a Robot 1 waypoint.]

Figure 2-18: Within a unit square, the initial relative waypoint positions of two robots were varied 500 times for each environment according to this diagram to show that for small changes in initial relative waypoint positions, the minimum variance controller still outputs nearly identical informative paths.
Chapter 3

Informative Persistence Controller for Multiple Robots using a Mixing Function Coverage Approach
The objective of this chapter is to derive a persistent control strategy that combines an informative path controller with a speed controller. During a persistent sensing task, robots equipped with a finite sensor radius collect information in a dynamic environment in order to guarantee a bound on the difference between the robots' current model of the environment φ̂(q, t) and the actual state of the environment φ(q, t) for all time. Due to their limited sensing range, the robots cannot collect all the data in the environment in a single time step. As a result, the data describing a dynamic region can become outdated, and the robots must return to that region at a given frequency to collect a sufficient amount of new information.
Because different parts of the environment can change at different rates, the robots must
visit different areas in proportion to their rates of change to ensure a bounded uncertainty
in the estimation. Thus, we extend the previous sensory function φ(q) to include a rate of growth. This extended sensory function is referred to as an accumulation function over the environment. The accumulation function grows where it is not covered by any of the robots' sensors, indicating a growing demand to collect data at that location. Similarly, the function shrinks where it is covered by any of the robots' sensors, indicating a decreasing demand for data collection.
In accordance with the objective of the sensing task, we combine the mixing function informative path controller with the speed controller from [30] to generate an informative persistence controller. We show that this controller enables the robots to stably complete a persistent task. This encompasses learning the environment dynamics, in the form of growth rates of the field over the environment, and subsequently generating an informative path for a robot with a finite sensor radius to follow and sense the accumulation function, guaranteeing that the height of the field remains bounded.
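To build intuition for this boundedness requirement, the accumulation-function dynamics at a single point can be sketched numerically: the field grows at its growth rate while uncovered and shrinks while a robot's sensor covers it, and it stays bounded exactly when consumption outpaces growth over each lap. The model and all names below are ours, a minimal illustration rather than the thesis's simulation.

```python
def accumulation_trace(phi0, phi_dot, c, tau_c, T, cycles=50, dt=0.01):
    """Simulate the accumulation function at a single point q.

    The field grows at rate phi_dot when uncovered and shrinks at net
    rate (phi_dot - c) while a robot's sensor covers q.  The robot's
    path has period T and covers q for tau_c seconds per lap.
    (Illustrative model; all names are ours, not the thesis's.)
    """
    phi, trace = phi0, []
    steps = int(T / dt)
    covered_steps = int(tau_c / dt)
    for _ in range(cycles):
        for k in range(steps):
            rate = phi_dot - (c if k < covered_steps else 0.0)
            phi = max(0.0, phi + rate * dt)  # the field never goes negative
        trace.append(phi)
    return trace

# Stable: phi_dot - c*tau_c/T = 1 - 10*0.2/1 = -1 < 0 -> field stays bounded
stable = accumulation_trace(phi0=5.0, phi_dot=1.0, c=10.0, tau_c=0.2, T=1.0)
# Unstable: 1 - 10*0.05/1 = 0.5 > 0 -> field grows without bound
unstable = accumulation_trace(phi0=5.0, phi_dot=1.0, c=10.0, tau_c=0.05, T=1.0)
```

The per-lap balance `phi_dot*T - c*tau_c` is exactly the quantity the stability criterion of Section 3.1 constrains to be negative at every point of interest.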
3.1 Relation to Persistent Sensing Tasks
Each robot is equipped with a sensor with a finite footprint F_r(p_r) = {q ∈ Q : ||q − p_r|| ≤ ρ}. The stability criterion for a persistent sensing task executed by multiple robots, when given the speed profile for each robot [30], is

    φ̇(q) − Σ_{r=1}^{N} c_r(q) τ_r^c(q, t) / T_r(t) = s(q, t) < 0,   ∀q | φ(q) > 0,   (3.1)

where φ̇(q) is now the rate at which the accumulation function grows at point q, and c_r(q) is the consumption rate at which the accumulation function shrinks when robot r's sensor is covering point q. Note that c_r(q) > φ̇(q), ∀q. T_r(t) is the remaining amount of time it takes robot r to complete its path at time t, and τ_r^c(q, t) is the duration of time that robot r's sensor covers point q along the path at time t. Both T_r(t) and τ_r^c(q, t) are calculated directly from the speed profiles. The stability margin of the system is given by S(t) = −(max_q s(q, t)), where a stable persistent task is defined as S > 0. The persistent sensing task only considers points q that satisfy φ(q) > 0, because a point with zero sensory information does not affect the stability of the controller. Points with nonzero sensory information are called points of interest.
In an unknown environment, robots use the estimated version of (3.1), defined as

    φ̂̇_r(q, t) − Σ_{r'=1}^{N} c_{r'}(q) τ_{r'}^c(q, t) / T_{r'}(t) = ŝ_r(q, t) < 0,   ∀q | φ̂_r(q, t) > 0,   (3.2)

where c_{r'}(q) and the speed profile that maximizes Ŝ_r(t) [30] are both known by the robot. This implies that robot r's estimated stability margin at time t is defined as Ŝ_r(t) = −(max_q ŝ_r(q, t)).

In [30], a linear program (LP) calculated the speed profile for each robot's path at time t that maximized Ŝ_r(t). Using the speed profiles obtained with this LP in conjunction with (3.2), we can derive an informative path controller for a persistent task that causes both the robots' paths and speed profiles to locally optimize persistent monitoring tasks. As noted in (3.2), we assume that the maximizing speed profile for Ŝ_r(t) is known by each robot r and is used to obtain ŝ_r(q, t), ∀q, ∀t.
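The criterion (3.2) reduces to simple arithmetic once the growth rates, consumption rates, covering times, and lap times are tabulated over the points of interest. The sketch below computes ŝ_r(q, t) and the resulting margin; the array shapes and values are illustrative assumptions, not the thesis's data.

```python
import numpy as np

def stability_margin(phi_dot_hat, c, tau_c, T):
    """Estimated stability margin S_r = -(max_q s_r(q, t)), per (3.2).

    phi_dot_hat: (n_points,) estimated growth rate at each point of interest
    c:           (N, n_points) consumption rate of each robot at each point
    tau_c:       (N, n_points) covering time of each robot at each point
    T:           (N,) time remaining for each robot to complete its path
    (Array shapes are our choice for illustration.)
    """
    # s(q,t) = phi_dot_hat(q) - sum_r c_r(q) * tau_c_r(q,t) / T_r(t)
    s = phi_dot_hat - np.sum(c * tau_c / T[:, None], axis=0)
    return -np.max(s)

growth = np.array([1.0, 2.0, 0.5])        # points of interest only (phi > 0)
c      = np.full((2, 3), 10.0)            # two robots, c_r(q) = 10
tau_c  = np.array([[0.3, 0.1, 0.0],
                   [0.0, 0.2, 0.4]])      # seconds of coverage per lap
T      = np.array([1.0, 1.0])             # lap times
S = stability_margin(growth, c, tau_c, T)  # S > 0 -> persistent task stable
```

Here the worst point is the second one, where s = 2 − 3 = −1, so the margin is S = 1 and the task is stable.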
3.2 Informative Persistence Controller
To include a stability margin, we assign the waypoints new dynamics of the form

    ṗ_i = I_i u_i,   (3.3)

where u_i is defined in (2.41), and

    I_i = 1,  if (∂ŝ_r/∂p_i)ᵀ u_i < 0 and t − t_i^{us} > τ_dwell,
    I_i = 0,  otherwise,   (3.4)

where τ_dwell is a design parameter, and t_i^{us} is the most recent time at which I_i was unit stepped from zero to one. This controller ensures that the estimated stability margin Ŝ_r(t) is monotonically non-decreasing for all r.
Remark 5 I_i is a binary function that is used to stop the coverage control action on the waypoints if their reconfiguration is not beneficial to the desired task. Therefore, if the imposed dynamics of the informative path controller do not improve the stability margin, the coverage controller is temporarily suspended.
Using Lyapunov theory, we now prove that the system is stable for persistent sensing
tasks.
Theorem 3 (Convergence Theorem for Multi-Robot Persistent Sensing) Under Assumption 1, with waypoint dynamics specified by (3.3), control law specified by (2.41), and adaptive law specified by (2.48), we have

1. Waypoint Equilibrium Convergence:
   lim_{t→∞} I_i^r(t) ||M̂_i^r(t) e_i^r(t) + â_i^r(t)|| = 0, ∀r ∈ {1, ..., N}, ∀i ∈ {1, ..., n_r},

2. Environment Estimate Error:
   lim_{t→∞} ||φ̃_r(τ)|| = 0, ∀r ∈ {1, ..., N}, ∀τ | w_r(τ) > 0,

3. Consensus Error:
   lim_{t→∞} (â_r − â_{r'}) = 0, ∀r, r' ∈ {1, ..., N}.
Proof 3 We prove asymptotic stability of the system to a locally optimal equilibrium using a Lyapunov function candidate based on virtual energies.

Again, we define a positive-definite Lyapunov function V_a based on the robots' paths and sensory function estimates and use the three criteria of Barbalat's lemma [29]:

1. V_a is positive-definite,
2. V̇_a is negative semi-definite,
3. V̈_a is bounded,

to imply V̇_a → 0 and the subsequent asymptotic convergence of the multi-robot system to a locally optimal equilibrium.
Let V_a be the Lyapunov function candidate. V_a is identical to the Lyapunov candidate from (2.51). Following the procedure from Section 2.7, but with ṗ_i now defined by (3.3), we get

    V̇_a(t) = −ϑ_1(t) − ϑ_2(t) − ϑ_3(t) − ϑ_4(t),   (3.5)

where ϑ_1(t) = Σ_{r=1}^{N} Σ_{i=1}^{n_r} I_i^r (M̂_i^r e_i^r + â_i^r)ᵀ K_i^r (M̂_i^r e_i^r + â_i^r) is the waypoint term, now gated by I_i^r, and ϑ_2(t), ϑ_3(t), and ϑ_4(t) are the adaptation, projection, and consensus terms, which carry over unchanged from (2.51) in Section 2.7. In Section 2.7, we showed that ϑ_2(t) → 0 and ϑ_4(t) → 0. This implies (ii) and (iii).
We still need to show that lim_{t→∞} ϑ_1(t) = 0. Let ϑ_1^i be defined such that ϑ_1 = Σ_i ϑ_1^i. Assume, for the sake of contradiction, that ϑ_1^i(t) does not converge to zero: then there exists ε > 0 such that for every t there is some t_j ≥ t with ϑ_1^i(t_j) > ε. Let {t_j}_{j=1}^{∞} be an infinite sequence of such times separated by more than 2τ_dwell, that is, |t_j − t_{j'}| > 2τ_dwell, ∀j ≠ j'. Because, from [27] (Lemma 1), ϑ̇_1^i(t) is uniformly bounded by some value M when I_i = 1 (i.e., |ϑ̇_1^i(t)| ≤ M), and whenever I_i = 1 it remains so for at least τ_dwell, we have that ∀t_j ∈ {t_j}_{j=1}^{∞}

    ∫_{t_j − τ_dwell}^{t_j + τ_dwell} ϑ_1^i(τ) dτ ≥ εδ > 0,   (3.6)

where

    δ = min{ε/(2M), τ_dwell} > 0.   (3.7)

Summing over the sequence gives

    ∫_0^∞ ϑ_1^i(τ) dτ ≥ Σ_{t_j ∈ {t_j}_{j=1}^{∞}} ∫_{t_j − τ_dwell}^{t_j + τ_dwell} ϑ_1^i(τ) dτ ≥ Σ_{j=1}^{∞} εδ,   (3.8)

which must be infinite, and therefore contradicts the fact that ∫_0^∞ ϑ_1(τ) dτ = Σ_i ∫_0^∞ ϑ_1^i(τ) dτ exists and is bounded. Therefore, by contradiction, lim_{t→∞} ϑ_1^i(t) = 0. This implies (i).
Remark 6 Theorem 3(i) stipulates that lim_{t→∞} ||M̂_i^r(t) e_i^r(t) + â_i^r(t)|| = 0 iff this benefits the persistent task. Otherwise, lim_{t→∞} I_i^r(t) = 0, which stipulates that the persistent task does not benefit if the ith waypoint in robot r's path reconfigures its position.
3.3 Single Robot Informative Persistence Algorithm and Simulation
The single robot informative persistence algorithm has the same two-level hierarchy as the mixing function control algorithm presented in Section 2.6. While the robot level algorithm of the informative persistence algorithm remains identical to the algorithm in Section 2.6, the waypoint level of the persistent sensing algorithm changes due to the new dynamics introduced by the stability margin requirement. Now, once a waypoint uses the parameter estimate â from the robot level algorithm to compute its dynamics in accordance with the mixing function control strategy (line 7 of Algorithm 5), it must immediately determine whether these dynamics increase the stability margin of the persistent task. If the margin is not increased, the dynamics are deemed not beneficial to the sensing task, and the binary function I_i on line 8 temporarily suspends the coverage control action. The waypoint algorithm is shown in Algorithm 5.
Algorithm 5 Informative Persistence Controller for a Single Robot: Waypoint Level
Require: Each waypoint knows â from Algorithm 1
Require: Each waypoint knows α
Require: Each waypoint knows the position of all other waypoints on the path f
Require: Each waypoint has knowledge of Ŝ
1: Initially compute the value of Ŝ
2: loop
3:   Compute the waypoint's mixing function
4:   Compute Ĉ_i according to (2.30)
5:   Obtain all other waypoint locations [p_1, ..., p_n]
6:   Compute u_i according to (2.41)
7:   Compute I_i according to (3.4)
8:   Update p_i according to (3.3)
9: end loop
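The gated waypoint update at the heart of Algorithm 5 can be sketched as follows. Here `u_fn` stands in for the coverage control law (2.41) and `margin_gain_fn` for the stability-margin test of (3.4); both names, and the toy dynamics, are ours, not the thesis's.

```python
# Hedged sketch of Algorithm 5's waypoint-level loop: the coverage update
# is applied only when it improves the estimated stability margin and the
# dwell time has elapsed, mirroring the switching function I_i of (3.4).
TAU_DWELL = 0.01

def waypoint_step(p, i, u_fn, margin_gain_fn, t, t_us, dt):
    """One pass of the loop: gate the coverage update by the margin test."""
    u = u_fn(p, i)                       # coverage control input u_i
    # I_i = 1 only if the motion would raise the estimated stability
    # margin and the dwell time has elapsed
    I = 1 if (margin_gain_fn(p, i, u) > 0.0 and t - t_us >= TAU_DWELL) else 0
    if I:                                # p_i <- p_i + I_i * u_i * dt
        p[i] = [p[i][0] + u[0] * dt, p[i][1] + u[1] * dt]
        t_us = t                         # record the switching time
    return p, t_us

# Toy usage: pull waypoint 0 toward the origin while the gate allows it.
path = [[1.0, 1.0], [0.5, 0.0]]
u_fn = lambda p, i: (-p[i][0], -p[i][1])
gain = lambda p, i, u: 1.0               # pretend the motion always helps
path, t_us = waypoint_step(path, 0, u_fn, gain, t=1.0, t_us=0.0, dt=0.1)
```

When `margin_gain_fn` returns a non-positive value, the waypoint simply holds its position, which is the temporary suspension described in Remark 5.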
The informative persistence controller for a single robot was tested in MATLAB. Overall, ten cases, each with a different environment, were performed. We present a case for n = 36 waypoints and α = -1. For the remainder of this chapter, we examine only the minimum variance approach. The logic behind this decision is that the stability margin criterion applied to the Voronoi, Voronoi smoothing, and minimum variance informative persistence strategies is identical. Therefore, the only difference between these three approaches to persistent sensing control is the coverage control algorithm, which we evaluated previously in Chapter 2. Voronoi informative persistence was thoroughly evaluated in [33], and we have shown that Voronoi smoothing approximates Voronoi coverage control reasonably well.
In these tests, a time step of 0.01 seconds is used, and τ_dwell = 0.010. The environment parameters are σ = 0.2 and ρ_trunc = 0.1, with a(6) = 20, a(14) = 60, a(17) = 40, and a(j) = 0 otherwise. The parameters â, Λ and λ are initialized to zero. The controller parameters are K_i = 70, ∀i, Γ = identity, γ = 2000, W_n = 10, W = 100, w = 100, and ρ = 0.05. The environment is discretized into a 10 × 10 grid and, according to the control law, only points in this grid that satisfy φ̂(q) > 0 are used as points of interest in (3.2).
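Constructing the grid of points of interest from the Gaussian-network parameters can be sketched as below. The basis-function centers are hypothetical, since the text specifies only σ and the nonzero entries of the parameter vector a.

```python
import numpy as np

def sensory_field(a, centers, sigma, grid_n=10):
    """Evaluate a Gaussian-network sensory function on a grid and return
    the points of interest, i.e. grid points where phi(q) > 0.

    The basis centers are a hypothetical layout (our assumption); the
    thesis specifies only sigma and the nonzero entries of a.
    """
    xs = np.linspace(0.0, 1.0, grid_n)
    qx, qy = np.meshgrid(xs, xs)
    phi = np.zeros_like(qx)
    for a_j, (cx, cy) in zip(a, centers):
        phi += a_j * np.exp(-((qx - cx) ** 2 + (qy - cy) ** 2)
                            / (2 * sigma ** 2))
    pts = np.argwhere(phi > 1e-6)   # grid indices of points of interest
    return phi, pts

a = [20.0, 60.0, 40.0]                            # a(6), a(14), a(17)
centers = [(0.2, 0.6), (0.5, 0.3), (0.8, 0.7)]    # hypothetical centers
phi, pts = sensory_field(a, centers, sigma=0.2)
```

Only the indices in `pts` would enter the stability test (3.2); everywhere else the sensory information is (numerically) zero and cannot affect the margin.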
As in the informative path controller case, results are presented for an initial learning phase and a path shaping phase. The learning phase employs a rich initial sweeping path so that the robot is able to measure and sample the entire environment. The path shaping phase begins immediately following the learning phase and drives the waypoints to reconfigure the robot's path into an informative persistence path according to both the mixing function coverage controller and the stability margin.
3.3.1 Learning Phase
Initially, the robot travels its static path in its entirety, measuring and using the adaptation law (2.48) as it travels to estimate the environment. As the robot travels its path, the adaptation law causes its estimate of the environment to converge to the real environment, φ̂(q) → φ(q), ∀q ∈ Q. This result is shown in Figure 3-1a, where lim_{t→∞} ∫_0^t w(τ) (φ̃(p(τ)))² dτ = 0, in accordance with proposition (ii) from Theorem 3. To complete the convergence criteria for the learning phase, Figure 3-1b shows that the Lyapunov function candidate V_a is monotonically non-increasing, as required by Barbalat's lemma. The learning phase simulation, shown in Figure 3-2, illustrates the rich initial sweeping trajectory the robot uses to observe the entire environment.
(a) Integral Parameter Error
(b) Lyapunov Function
Figure 3-1: Mean integral parameter error and Lyapunov function candidate of the informative persistence controller for a single robot. The parameter error shows that the robot's estimate of the environment converges to the actual environment, while the positive-definite Lyapunov function is monotonically non-increasing, as required by Barbalat's lemma.
(a) Learning iteration:
(b) Learning iteration: 70
(c) Learning iteration: 210
(d) Learning iteration: 300
Figure 3-2: Single robot learning phase with informative persistence controller. 1st column: the path connects all the waypoints. The points of interest are shown as green regions and the black arrow represents the robot. The robot's sensing radius is represented by the blue circle around its position. 2nd column: the translucent environment represents the true environment and the solid environment represents the estimated environment.
3.3.2 Path Shaping Phase
Figure 3-3 shows the mean waypoint error for persistent sensing, i.e., the mean of the quantity I_i(t) ||M̂_i(t) e_i(t) + â_i(t)||. We observe lim_{t→∞} I_i(t) ||M̂_i(t) e_i(t) + â_i(t)|| = 0, in accordance with proposition (i) from Theorem 3. This implies that the waypoints reach the equilibrium defined by the informative path controller. Most importantly for task stabilization and system dynamics, Figure 3-4 shows the temporal evolution of the persistent task's stability margin. Both estimated and true stability margins are shown. The true plot provides ground truth, showing that the robot's estimates were a good representation of the true values. Because the persistent sensing task is initially unstable, the stability margin starts off with a negative value and then increases with t, implying that the path is reconfiguring to stabilize the persistent task. Note that the stability margin is positive at the conclusion of the simulation. The path evolution using this controller is shown in Figure 3-5. The robot learns the regions of the points of interest in the learning phase and then reconfigures its path to encompass these regions. At 300 iterations (Figure 3-5f), the path locally optimizes the persistent sensing task.
Figure 3-3: The mean waypoint position error converges to zero. This implies that the waypoints are in equilibrium as defined by the informative persistence controller.
Figure 3-4: The stability margin (estimated and true) for the persistent task becomes positive, thus implying the task is stabilized.
(a) Iteration:
(b) Iteration: 5
(c) Iteration: 20
(d) Iteration: 30
(e) Iteration: 80
(f) Iteration: 300
Figure 3-5: Single robot path shaping phase with informative persistence controller. The path connects all the waypoints, shown as black circles. The points of interest are shown as green regions. The black arrow represents the robot, and its sensor radius is represented by the blue circle around the robot's position.
3.3.3 Single Robot Persistence Simulation Discussion
The persistent informative controller is nearly identical to the informative path controller (2.13), with the exception of the stability margin switching function I_i in the waypoint dynamics. In some scenarios, the informative path generated by the controller in Section 3.2 was very similar to the informative path generated by the controller from Section 2.7; in others, the two paths were very different. This is due to the additional restriction of the non-decreasing stability margin. The controller weights W and W_n can also impact how the controllers behave and should be tuned to achieve the desired behavior.
3.4 Multi-Robot Informative Persistence Algorithm and Simulation
The multi-robot informative persistence algorithm uses the same two-level hierarchy as in Sections 2.7 and 3.3. Again, the robot level algorithm does not change from the algorithm presented in Section 2.7. However, the waypoint level informative persistence dynamics must be updated to account for the stability margin switching variable I_i^r. Once the waypoints receive the parameter estimate of the environment â_r from the robot level algorithm, they compute the mixing function dynamics and determine whether these dynamics increase the stability margin. If the margin is not increased, control action is temporarily suspended by I_i^r. These changes are seen in Algorithm 6.
We test the mixing function controller for persistent tasks using multiple robots in MATLAB. Overall, we tested 12 cases, each with a different environment. We present a case for N = 2 robots, α = -1, and n_r = 22 waypoints, ∀r. A fixed-time-step numerical solver is used with a time step of 0.01 seconds and τ_dwell = 0.01. The region Q is taken to be the unit square. The sensory function φ(q) is parametrized as a Gaussian network, with B defined in (2.27), σ = 0.2 and ρ_trunc = 0.2. The parameter vector a is defined as a(j) = 60 for j ∈ {8, 13, 19}, and a(j) = 0 otherwise. The environment's sensory function (growth rates) created with these parameters can be seen in Figure 3-9.
Algorithm 6 Mixing Function Informative Persistence Controller for Multiple Robots: Waypoint Level
Require: Each waypoint knows â_r from Algorithm 3
Require: Each waypoint knows α
Require: Each waypoint knows the location of all its neighboring waypoints
Require: Each waypoint has knowledge of Ŝ_r
1: Initially compute the value of Ŝ_r
2: loop
3:   Compute the waypoint's mixing function
4:   Compute Ĉ_i^r according to (2.30), but integrating over Q from (2.16)
5:   Obtain all other waypoint locations [p_1^r, ..., p_n^r]
6:   Compute u_i^r according to (2.41)
7:   Compute I_i^r according to (3.4)
8:   Update p_i^r according to (3.3)
9: end loop
The parameters â_r, Λ_r and λ_r, for all r, are initialized to zero. The controller parameters are Γ = identity, γ = 2000, W_n = 10, W = 100, w = 100 and ρ = 0.05. The two robots are assumed to be fully connected, so that l_rr'(t) = 20, ∀r, r', ∀t. The environment is discretized into a 10 × 10 grid. Again, only points in this grid with φ̂_r(q) > 0 are considered as points of interest.

The environment's accumulation function grows at rate φ̇(q) at point q and is consumed by robot r at a consumption rate of c_r(q) = 10 if q ∈ F_r(p_r(t)). This accumulation function is represented by the green regions in Figure 3-9. The goal of this simulation is to guarantee that the size of this region remains bounded.
We present results from the initial learning phase and from the path shaping phase, as we did in Section 2.6. In the path shaping phase, the new dynamics (3.3), induced by the stability margin, are used to reconfigure the robots' paths.
3.4.1 Learning Phase
In the learning phase, the adaptation laws drive φ̃_r(q) → 0, ∀q ∈ Q, ∀r. This implies that the robots' estimates of the environment converge to the actual environment and that the robots' initial trajectories were rich enough to accurately estimate the entire environment. Figure 3-6 shows that the mean over all robots of ∫_0^t w_r(τ) (φ̃_r(p_r(τ)))² dτ converges to zero, as suggested by proposition (ii) from Theorem 3. Figure 3-8 shows that the consensus error, referring to Σ_{r'=1}^{N} l_rr' (â_r − â_{r'}), converges to zero, which corresponds to proposition (iii) from Theorem 3. To complete the experimental validation of the convergence criteria for the learning phase, Figure 3-7 shows that the Lyapunov function candidate V_a is monotonically non-increasing and bounded below by zero.
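The effect of the consensus term can be illustrated with a minimal discretization: for a fully connected network, each robot pulls its parameter estimate toward the network average, and the disagreement decays geometrically. This is an illustrative sketch, not the thesis's exact adaptive law.

```python
import numpy as np

def consensus_rounds(a_hats, step=0.2, rounds=100):
    """Discrete-time consensus on the robots' parameter estimates a_r.

    For a fully connected network, the Laplacian update reduces to
    pulling each estimate toward the current average; the disagreement
    shrinks by a factor (1 - step) per round.  (Our discretization,
    not the thesis's continuous-time adaptive law.)
    """
    A = np.array(a_hats, dtype=float)        # shape (N, n_params)
    for _ in range(rounds):
        A -= step * (A - A.mean(axis=0))     # a_r <- a_r - step*(a_r - mean)
    return A

A = consensus_rounds([[60.0, 0.0, 40.0],
                      [50.0, 10.0, 60.0]])
disagreement = float(np.abs(A[0] - A[1]).max())  # decays toward zero
```

Because the update preserves the network average, both robots converge to the same estimate, which is what the consensus error in Figure 3-8 measures.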
Figure 3-6: The mean integral parameter error shows that the robots' estimate of the environment converges to the actual environment by the end of the learning phase.
Figure 3-7: The Lyapunov-like function in the learning phase is positive-definite and bounded, as required by Barbalat's lemma.
Figure 3-8: We showed that the robots' total estimate of the environment converges to the actual environment. The consensus error shows that each robot's estimate of the environment is identical at the conclusion of the learning phase.
The learning phase simulation is shown in Figure 3-9, where each robot follows a rich trajectory over the environment so that it can sample and learn the entire environment.
(a) Learning iteration: 1
(b) Learning iteration: 25
(c) Learning iteration: 100
(d) Learning iteration: 200
Figure 3-9: Multi-robot learning phase with informative persistence controller. The paths travel through the points of interest shown as green regions. The robots' sensing radii are shown as circles around the robots' positions. The translucent environment represents the true environment, and the solid environment represents the estimated environment.
3.4.2 Path Shaping Phase
In this phase, K_i^r = 70, ∀i, r. Upon the completion of the learning phase, the controller from Section 3.3 is activated. Figure 3-10a shows that as the paths reconfigure, the quantity I_i^r(t) ||M̂_i^r(t) e_i^r(t) + â_i^r(t)|| converges to zero, ∀i, r, in accordance with proposition (i) from Theorem 3, implying that the waypoints reach the coverage equilibrium defined by the informative persistence control strategy. To complete the convergence criteria, Figure 3-10b shows that the persistent task's stability margin increases with t as the robots' paths approach informative paths for persistent sensing. Thus the persistent task is stabilized.
(a) Mean Waypoint Position Error for Multiple Robots
(b) Persistent Task Stability Margin for Multiple Robots
Figure 3-10: The mean waypoint position error and stability margin for multiple robots show that the waypoints converge to sensing equilibrium and the persistent task is stabilized.
The path evolution under the dynamics of this controller is shown in Figures 3-11a to 3-11f.
(a) Iteration:
(b) Iteration: 10
(c) Iteration: 30
(d) Iteration: 60
(e) Iteration: 100
(f) Iteration: 300
Figure 3-11: Multi-robot path shaping phase with informative persistence controller. The paths connect all the corresponding waypoints, shown as black circles. The points of interest are shown as green regions. The black arrows represent the robots, and their sensor radii are represented by the colored circles around the robots' positions.
3.4.3 Multi-Robot Persistence Simulation Discussion
This informative persistence controller is identical to the informative path controller from Section 2.3 whenever path shaping is required by the persistent sensing task. When the reconfiguration of waypoints based on the new dynamics is not required by the persistent task, the control action switches according to the stability margin and does not allow the waypoints to move. By including the step variable (3.4) in the informative persistent sensing Algorithm 6 to enforce a non-decreasing stability margin, we can prevent the paths from shaping the same way that they would under the mixing function informative path controller given by Algorithm 4. The controller weights W and W_n can also impact how the controllers behave, and should be tuned to achieve the desired behavior.
When performing persistent sensing, each robot uses the speed profile it precomputed to cooperatively stabilize the task. When using a speed profile, it is important that the robots first undergo a learning phase so that they can use the consensus term in their adaptive laws to globally converge to the same environment estimates. Using this approach, the robots are able to work together during the path reshaping phase to cooperatively stabilize the persistent sensing task. If the robots' estimates of the environment were different, then the speed profiles of each robot would not globally stabilize the persistent sensing task.
In this chapter we have discussed how multiple robots shape their paths into informative
paths that are useful for persistent sensing. In the following chapter we show how informative paths and persistent sensing can be used together to develop a dynamic patrolling
policy to optimize a fleet of service vehicles operating in an urban environment.
Chapter 4

Dynamic Patrolling Policy Using an Informative Path Controller
4.1 Motivation
In this section, we apply the mixing function control strategy to a transportation problem.
We are interested in generating informative paths along which an existing fleet of service
vehicles can patrol to increase their efficiency of servicing instantaneous customer demand.
It is our vision that once this solution is implemented, service vehicles such as taxis can become viable platforms for Mobility-on-Demand (MOD).

The primary task of many current MOD systems is to propose vehicle allocation solutions that complement public transportation by optimizing metrics that make the route to and from a public transportation station more efficient in time, distance, and fuel economy [3]. The objective of our informative path application is to minimize the waiting time
of the passengers and the amount of time the vehicles in the system drive empty. In our
previous work [22], we showed how autonomous driving can be used to mitigate the rebalancing problem current MOD systems face. We consider the task allocation problem in
a MOD scenario. In MOD transportation, we assume historical knowledge of passenger
arrival at discrete sets of locations.
We present a dynamic patrolling policy that allocates vehicles to pickup and delivery
tasks. Using historical arrival distributions, we compute patrolling loops that minimize the
95
distance driven by the vehicles to get to the next request. These loops are used to redistribute the vehicles along stationary virtual taxi stand locations on the loop. The algorithm
was trained using one month of data from a fleet of 16,000 taxis. We compare the policy
computed by our algorithm against a greedy policy, as well as against the ground truth
redistribution of taxis observed on the same dates and show an improvement with respect
to three key evaluation criteria: (1) minimizing the number of vehicles in the system, (2)
quality of service, and (3) distance traveled empty. We show that our policy is robust by evaluating it on previously unseen test data.
The main contributions of this policy are:
- a patrolling loop and redistribution model of an unmanaged fleet operation using historical data,
- a provably stable dynamic redistribution policy for a large number of vehicles using informative paths,
- a centralized scheduling algorithm for request allocation and vehicle redistribution,
- large-scale simulations and evaluations using real data from a fleet of 16,000 vehicles.
4.2 Problem Formulation
We consider a pickup and delivery problem (PDP) in a convex bounded planar area Q ⊂ R². This area, or service environment, is subject to incident request arrivals at continuous points q ∈ Q. The environment is patrolled by N vehicles that drive along closed loops at constant speed. For simplicity we normalize speed to 1, so that time and distance are equivalent within our model. Vehicles are assigned to service requests by a centralized server, which is assumed to know the locations of all vehicles at any time. A vehicle v_i that has been assigned a request q_j will travel in a straight line to q_j to pick up the request, and then deliver it to its destination s_j ∈ Q. We assume a continuous time model, i.e., time t ∈ R≥0. Requests arrive according to a Poisson process with an arrival rate λ and are distributed throughout the region according to a historically derived arrival distribution Z_a. The destination of incident requests is determined by a destination distribution Z_d.
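This arrival model can be sketched directly: inter-arrival times are exponential with rate λ (a Poisson process), and the pickup and destination points are drawn from Z_a and Z_d, abstracted here as sampler callables. The names and the uniform placeholder distributions are ours, not the thesis's.

```python
import random

def generate_requests(lam, horizon, arrival_dist, dest_dist, seed=0):
    """Sample pickup/delivery requests for the PDP model.

    Inter-arrival times are exponential with rate lam, so arrivals form
    a Poisson process; pickup and destination points are drawn from the
    historically derived distributions Z_a and Z_d, abstracted here as
    sampler callables (our simplification).
    """
    rng = random.Random(seed)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(lam)          # next arrival time
        if t > horizon:
            break
        requests.append((t, arrival_dist(rng), dest_dist(rng)))
    return requests

# Placeholder distributions: uniform over the unit square service area.
unit_square = lambda rng: (rng.random(), rng.random())
reqs = generate_requests(lam=5.0, horizon=100.0,
                         arrival_dist=unit_square, dest_dist=unit_square)
```

With λ = 5 requests per unit time over a horizon of 100, about 500 requests are generated, each carrying an arrival time, a pickup point, and a destination.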
4.2.1 Using Informative Paths
In this work we use informative loops generated by a Voronoi informative path controller
[32] and extend them to a pickup and delivery problem. We only consider the Voronoi
control strategy for this application, because it only requires local robot state information
(Voronoi neighbors) to compute an informative path solution. This localized sampling
approach is similar to how a taxi or other service vehicle would determine which regions
of an environment to patrol.
We observe that the regions of dynamic change are analogous to regions of pickup
demand, and the act of sampling to reduce uncertainty in coverage is analogous to the act
of picking up and delivering incident requests. The difference is that in PDP problems we
have discrete rather than continuous events. Also, delivery differs from sampling in that the vehicle must deliver the request and then return to the informative path.
In our model, the resulting informative path is a patrolling loop whose route is locally optimized so that it traverses along, or very near, the areas where pickup requests originate. By utilizing waypoints along a patrolling loop, informative paths can be visualized as a method to locally optimize the locations of virtual taxi stands across the environment. The goal is to compute the path and placement of the patrolling loops so that the distance from the patrolling loop to the requests is minimized. A mathematical description of this algorithm for multiple agents follows.
4.2.2 Multi-Agent Controller Extension

There are X ∈ Z>0 agents, identified by r ∈ {1, ..., X}, in a convex environment Q ⊂ R². In this derivation, note that the number of agents represents the number of patrolling loops that are computed by our patrolling policy. A point in Q is denoted q. Agent r is positioned at p_r ∈ Q and travels along its closed path f_r : [0, 1] → R², consisting of a finite number n of waypoints. The ith waypoint on f_r is located at p_i^r, i ∈ {1, ..., n}. Define a vector P ∈ Q^{Xn} ⊂ R^{dim(P)} as the vector obtained by stacking the agents' waypoint positions, P = {p_1^1, ..., p_n^X}, where Q^{Xn} is the state space of the waypoints for all agents. Note that the controller for a single agent is obtained by setting X = 1.
Let V_i^r be the Voronoi partition of Q for the ith waypoint position in agent r's path. Agents can compute the Voronoi partitions based on their waypoint positions. Because each path is closed, f_r(0) = f_r(1), and each waypoint i along the path has a corresponding previous waypoint i − 1 and next waypoint i + 1. An agent travels between sequential waypoints in a straight line.
The sensory function φ(q) in this scenario is updated every fifteen minutes over the course of 24 hours in order to reflect the change in sensory information generated by customer requests throughout the day. The agent knows φ(q); however, it is equipped with a sensor with sensing radius ρ to make a point measurement of φ(p_r) at its position p_r, so that it ensures the stability criterion for the persistent task [30].

We define the collection of informative paths for our multi-agent system as the set of waypoint locations for each agent that locally minimize the Voronoi coverage cost function given by (2.17). We proved the convergence of this coverage controller in Section 2.5.
4.2.3 Operational Stability
In addition to the controller stability proven in Section 2.5, a necessary condition for a functional deployment of any fleet of service vehicles is operational stability. Informally, we understand operational stability to mean the condition whereby the number of outstanding requests remains bounded in steady state. Formally, we define operational stability as the condition

    ∫_0^t (λ(τ) − μ(τ)) dτ ≤ kλ,  ∀ t > 0,  k < ∞.    (4.1)
To motivate this requirement, consider events arriving into a queue according to a standard Poisson process with rate parameter λ. Then the integral of λ(τ) from 0 to t gives the total number of arrivals from the beginning of time until the current time t. If the events are also being serviced at a sufficient rate according to some other process, then the number of events in the queue will be less than some constant times the rate parameter for any time window.

The service rate μ(t) is defined as the rate at which incident customer requests are being serviced by vehicles. In steady state, the stability requirement in (4.1) is satisfied by the simplified expression μ̄(t) ≥ λ̄(t), where λ̄(t) and μ̄(t) denote the average arrival and service rates, respectively, over the open interval (0, ∞).
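The boundedness requirement can be checked numerically. Below is a minimal sketch, assuming discretized per-step arrival and service rate series and our own function name; it tests the discretized analogue of the stability condition:

```python
import numpy as np

def is_operationally_stable(arrival_rates, service_rates, lam, k=0.01, dt=1.0):
    """Discretized analogue of condition (4.1): the cumulative excess of
    arrivals over services must stay below k * lambda at every time step."""
    excess = np.cumsum((np.asarray(arrival_rates) - np.asarray(service_rates)) * dt)
    return bool(np.all(excess <= k * lam))

# A service rate that keeps pace with arrivals keeps the queue bounded;
# an undersized service rate does not (rates here are illustrative).
stable = is_operationally_stable([2.0] * 10, [2.0] * 10, lam=2.0)
unstable = is_operationally_stable([2.0] * 10, [1.0] * 10, lam=2.0)
```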
Lemma 1 Let C be a closed curve in Euclidean space, Q ⊂ R^2, composed of n waypoints {w_1, w_2, ..., w_n} that are connected by straight lines. Let x and y be any two points on C. Then d(x, y) ≤ L(C)/2, where L(C) is the arc length of C and d(x, y) is the Euclidean distance between x and y.

Proof 4 The arc length of C is given by L(C) = Σ_{i=1}^{n} d(w_i, w_{i+1}), where waypoint w_{n+1} = w_1, and d(w_i, w_{i+1}) is the Euclidean distance between consecutive waypoints. Let A and B be unique segments of curve C that connect point x to point y, such that L(A) + L(B) = L(C). Without loss of generality, assume L(A) ≤ L(B), which implies that L(A) ≤ L(C)/2. Assume for the sake of contradiction that d(x, y) > L(C)/2. Because d(x, y) is by definition the minimum distance between any two points in Q, there exists no path P ⊂ Q connecting x and y with length less than d(x, y). However, L(A) ≤ L(C)/2 < d(x, y), thus giving the contradiction.
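As a quick numeric illustration of Lemma 1 (a sketch, not part of the proof), one can verify the bound d(x, y) ≤ L(C)/2 on a simple closed polygonal path:

```python
import numpy as np

def path_length(waypoints):
    """Arc length of the closed polygonal curve through the given waypoints."""
    w = np.asarray(waypoints, dtype=float)
    diffs = np.roll(w, -1, axis=0) - w  # w[i+1] - w[i], with w[n+1] = w[1]
    return float(np.sum(np.linalg.norm(diffs, axis=1)))

# Unit square traversed as a 4-waypoint loop: L(C) = 4, so the lemma bounds
# any pairwise distance between points on C by L(C)/2 = 2; the farthest
# pair of points (opposite corners, distance sqrt(2)) respects the bound.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
L = path_length(square)
assert np.hypot(1.0, 1.0) <= L / 2
```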
Theorem 4 (Steady State Stability) An informative path service policy gives μ̄(t) ≥ λ̄(t) if and only if X ≥ λ(L(C)/2 + p + 2√2 l)/v, where X is the number of service vehicles, p is the sensor radius of a service vehicle, l is the dimension of the square environment Q, and v is the constant speed of the service vehicle.

Proof 5 (⇐) X ≥ λ(L(C)/2 + p + 2√2 l)/v ⇒ Xv/(L(C)/2 + p + 2√2 l) ≥ λ. Let α = v/(L(C)/2 + p + 2√2 l), which has units s⁻¹, assuming the time step τ of the experiment is 1 s. Using the convergence stability of the informative path controller from Sections 2.5 and 4.2.3, no incident customer request is located further than p from the informative path. By Lemma 1, sup{d : d = ||x − y||, ∀ x, y ∈ C} = L(C)/2. Thus, the maximum distance any vehicle on the path is from an incident customer request located at q is L(C)/2 + p. Because Q is a square environment, the maximum distance it takes a service vehicle to drive the customer to its destination and return to the informative path is 2√2 l. Thus the maximum distance traveled by a service vehicle whose trip originates on the informative path to service a request is L(C)/2 + p + 2√2 l. Therefore α is the smallest rate at which a vehicle can service a request, and hence Xα is the smallest μ̄(t) that satisfies μ̄(t) ≥ λ̄(t).

(⇒) By the contrapositive law, X < λ(L(C)/2 + p + 2√2 l)/v ⇒ Xv/(L(C)/2 + p + 2√2 l) < λ. Again, by letting α denote v/(L(C)/2 + p + 2√2 l), the service rate for a single service vehicle, we have Xα < λ, which implies μ̄(t) < λ̄(t).
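The bound in Theorem 4 translates directly into a fleet-sizing rule. The sketch below computes the smallest integer X satisfying the theorem's inequality; the function name and the numeric values are purely illustrative, not taken from the Singapore deployment:

```python
import math

def min_fleet_size(lam, loop_length, sensor_radius, env_dim, speed):
    """Smallest integer X satisfying X >= lam * (L(C)/2 + p + 2*sqrt(2)*l) / v,
    the bound from Theorem 4 (a 1 s time step assumed, as in the proof)."""
    worst_trip = loop_length / 2 + sensor_radius + 2 * math.sqrt(2) * env_dim
    return math.ceil(lam * worst_trip / speed)

# e.g. 0.1 requests/s on a 10 km loop, 0.5 km sensing radius,
# 5 km square environment, vehicles at 10 m/s (hypothetical numbers):
X = min_fleet_size(lam=0.1, loop_length=10_000, sensor_radius=500,
                   env_dim=5_000, speed=10.0)
```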
This result shows that our approach to task allocation for PDP in MOD systems is
stable. Next we present a patrolling policy for the delivery vehicles.
4.3
Dynamic Patrolling Policy
Our case study for this work is a PDP in a MOD system and uses real data provided by
a fleet of 16,000 taxis in Singapore. We will refer to vehicles as taxis for the rest of the
thesis. Collectively, these taxis deliver approximately 340,000 trips per day. We illustrate
the operation of our algorithm for the Central Business District (CBD) and extend it for
the entire island of Singapore. We evaluate how effective our solution is at minimizing the
amount of time taxis drive empty by comparing against a greedy policy as well as what
actual taxi drivers do based on historical data.
Algorithm 7 Informative Path Controller Pseudocode
Parameters: arrival distribution Za, patrol loop ℓ, waypoints W = {w_1, w_2, ..., w_n} located at P = {p_1, p_2, ..., p_n}, vector of taxis S_ℓ = {s_1, s_2, ..., s_m}
 1: loop
 2:   while (∂H/∂p_i^ℓ > 0) do
 3:     Compute Voronoi partition V_i^ℓ from Za
 4:     Compute centroid C_i^ℓ by integrating over V_i^ℓ
 5:     Compute neighbor waypoint locations p_{i−1}^ℓ, p_{i+1}^ℓ
 6:     Compute control input u_i according to (2.13)
 7:     Update waypoint position p_i^ℓ according to ṗ_i^ℓ = u_i
 8:   end while
 9:   Check for incoming requests R = {r_1, r_2, ..., r_k}
10:   Assign requests R_ℓ ⊆ R within ∪_i V_i^ℓ to ℓ
11:   Assign nearest taxis S* ⊆ S_ℓ to requests R_ℓ
12:   Rebalance remaining S_ℓ \ S* taxis s.t. Σ_i Φ(w_i) = 0
13: end loop
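The waypoint-repositioning stage of Algorithm 7 (steps 2–8) can be sketched as a discretized Lloyd-style iteration. This is a simplified stand-in with our own function name: grid points approximate the environment, nearest-waypoint assignment approximates the Voronoi partition, and a proportional step toward the weighted centroid replaces the full control law of (2.13):

```python
import numpy as np

def update_waypoints(waypoints, grid_pts, weights, gain=0.5, iters=20):
    """Sketch of Algorithm 7, steps 2-8: assign grid points to their nearest
    waypoint (a discrete Voronoi partition), compute each cell's weighted
    centroid from the arrival density, and step the waypoint toward it."""
    P = np.asarray(waypoints, dtype=float).copy()
    for _ in range(iters):
        # distance from every grid point to every waypoint -> cell membership
        d = np.linalg.norm(grid_pts[:, None, :] - P[None, :, :], axis=2)
        owner = np.argmin(d, axis=1)
        for i in range(len(P)):
            m = weights[owner == i]
            if m.sum() > 0:
                c = (grid_pts[owner == i] * m[:, None]).sum(axis=0) / m.sum()
                P[i] += gain * (c - P[i])  # move waypoint toward its centroid
    return P

# Uniform arrival density over the unit square, two waypoints:
xs, ys = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20))
pts = np.column_stack([xs.ravel(), ys.ravel()])
P = update_waypoints([(0.2, 0.5), (0.8, 0.5)], pts, np.ones(len(pts)))
```

Because each grid point interacts only with its owning waypoint and that waypoint's neighbors, the same computation distributes across waypoints as described below.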
4.3.1
Computational Complexity
For the dynamic patrolling policy, at each iteration the controller must compute the Voronoi
cell for each waypoint and the spatial integrals over the region. Thus the parameters affecting the computation time are the number of agents X, the number of waypoints n, and the
number of grid squares in the integral computation m.
A decentralized algorithm for a single agent to compute its Voronoi cell [11] runs in O(n) time. The time complexity for computing a discretized integral is linear in the number of grid squares, and each grid square requires a check whether its center point is within the Voronoi cell, which is O(n). Therefore, the time complexity of the integral is O(nm). If the Voronoi cell is computed first, followed by the discretized integral, the total time complexity is O(n(m + 1)) at each step of the control loop. Therefore, in the multi-agent case a single Voronoi controller has time complexity O(X · n(m + 1)).
4.3.2
Solution Outline
The service region is subject to incident customer requests located at points q ∈ Q and is patrolled by N taxis whose task is to service these requests in a manner that will minimize the distance driven to every request. Requests arrive at a rate λ, representing the sensory function φ(q). Each patrol loop is defined by a fixed number of waypoints whose positions are computed using historical customer request distributions. Our patrolling policy is adaptive in time and benefits from a finer discretization time period. In this work we use 15-minute time periods to compute 96 patrol loops for simulations over a 24-hour period (Figures 4-1b, 4-1c).
The simplest scheduling and path planning protocols are used to ensure that performance is attributable to the patrolling policy only. We emphasize that neither of these elements is crucial to the operation of the patrolling policy. In Section 4.5.1 we present
control experiments that evaluate our policy against another with identical scheduling and
path planning, but using greedy redistribution.
(c) Singapore loops
Figure 4-1: Figure 4-1a shows a surface plot of an example arrival distribution Za for the CBD, overlayed on a map of the region; the destination distribution Zδ has a similar format. Figures 4-1b and 4-1c show the temporal progression of the patrol loops, overlayed on longitude/latitude plots of the GPS coordinates that form the service region. Patrol loops are shown changing dynamically in 15-minute time periods throughout the day (for a total of 96 iterations for each loop), with a darker shade indicating the most recent configuration.
4.3.3
Algorithm Description
Algorithm 7 contains the pseudocode for the informative path patrolling policy. The first
stage of the patrolling algorithm describes how the patrol loop waypoints reposition themselves in locally optimal locations. The algorithm calculates the Voronoi region for each
waypoint, computes the centroid of the region based on Za, and subsequently repositions
the waypoints based on (2.17). This algorithm can be implemented in a distributed way such that it can be computed for each waypoint independently, sharing information only with its neighboring waypoints plus enough information for all waypoints to compute their Voronoi regions.
Once the waypoints have converged to a locally optimum configuration, the second
stage of the algorithm assigns requests to taxis and rebalances taxis within each patrol
loop. Each taxi is initially assigned a home patrol loop in a round-robin manner. Taxis cycle
through four successive modes of operation: FREE, ONCALL, POB ("passenger on board")
and RETURN. The service model assumes a dispatch center that controls all incoming requests. Scheduling is performed by matching incoming requests with the nearest available
taxis. Once assigned a request, the taxi picks up the customer (ONCALL), delivers them to
their destination (POB), and returns to its home loop (RETURN).
Taxis flagged FREE use an intra-loop redistribution policy that is analogous to flow
equilibrium at waypoints. Taxis patrol around their home loop until such time that every waypoint is serviced by a taxi. Thereafter, taxis remain stationed at their waypoints
(treating them as virtual taxi stands), with any remainder of taxis (modulo the number of
waypoints) continuing to patrol around the loop. This ensures that all waypoints receive
equal service in steady state, while also ensuring that taxis do not waste fuel unnecessarily.
Assuming we have N_ℓ taxis and n waypoints along loop ℓ, we require that the net flux (rate of inflow and outflow) Φ of taxis at waypoints converge to zero over the entire loop, i.e. Σ_{i=1}^{n} Φ_i = 0. The following scenarios serve as the basis for all possible taxi dynamics along loop ℓ:

1. N_ℓ < n: each taxi continuously patrols along the loop.

2. N_ℓ = n: taxis redistribute to the nearest waypoint until there are N_ℓ/n taxis at each waypoint, and remain stationary queueing for a customer request.

3. N_ℓ > n: N_ℓ − (N_ℓ mod n) taxis queue at their respective waypoints waiting for a customer, while the remaining N_ℓ mod n taxis continue to patrol around the loop, ensuring that Σ_{i=1}^{n} Φ_i = 0.
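The three scenarios above reduce to a simple counting rule, sketched here (the function name and return convention are our own):

```python
def rebalance(num_taxis, num_waypoints):
    """Intra-loop redistribution rule: with N_l taxis and n waypoints,
    N_l - (N_l mod n) taxis queue at waypoints (N_l // n each) and the
    remaining N_l mod n taxis keep patrolling, so the net flux at the
    waypoints sums to zero over the loop."""
    if num_taxis < num_waypoints:
        return 0, num_taxis            # scenario 1: everyone patrols
    queued_per_waypoint = num_taxis // num_waypoints
    patrolling = num_taxis % num_waypoints
    return queued_per_waypoint, patrolling

# 7 taxis on a 3-waypoint loop: 2 queue at each waypoint, 1 keeps patrolling.
assert rebalance(7, 3) == (2, 1)
assert rebalance(3, 3) == (1, 0)       # scenario 2: one taxi per waypoint
```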
Our service policy ensures that each patrolling loop is in a locally optimal configuration
within the environment while bounding the number of outstanding customer requests. In
this work we do not allow taxis to exchange home loops, as this could lead to scenarios
whereby a patrolling loop loses all of its taxis to neighboring loops. We solve this by ensuring that requests are assigned to the loop in whose Voronoi region they originate.
4.4
Modeling Historical Data
(a) CBD simulation
(b) Singapore simulation
(c) Singapore simulation (overlayed on Za)
Figure 4-2: Figures 4-2a-4-2c show screenshots of the simulator in action. Taxis are indicated by a colored triangle. A taxi can be in one of four states: traveling along the patrol loop (FREE, black), servicing a pickup request (ONCALL, yellow), driving a passenger to their destination (POB, red or cyan), or returning to the patrol loop (RETURN, blue). Pending requests are shown with a yellow o, and outstanding requests are shown with a circled red o.
We use data collected by a fleet of 16,000 taxis in Singapore. The dataset is one month
(August 2010) of trips, consisting of millions of data points at thousands of GPS locations.
Each entry records time, location, ID, etc., as well as the status (FREE, ONCALL, POB,
etc.). The data serves several purposes. First, we use a subset of the data to train our
dynamic patrolling algorithm. Second, we use two subsets of the dataset as test data for
conducting simulations. We use the same day (Monday, August 16) for real-time data
simulations which do not require training and the same day of the following week (Monday,
August 23) for unseen data simulations. Finally, we use the data to quantify ground truth
redistribution of taxis in Singapore. Because the actual taxi operation is unmanaged, there
is no direct comparison against an existing policy. Instead, we analyze the distribution of
the fleet throughout the day and record statistics such as odometry, status of operation, etc.,
that can be used as quantifiable metrics in our analysis.
4.4.1
Arrival and Destination Distributions
Training the policy and conducting simulations both require knowledge of customer arrivals and destinations. Historical data is used to compute spatial arrival and destination distribution surfaces, denoted by Za and Zδ, respectively. The region is discretized into a 50 x 50 grid, with the height of the surface at each location representing the probability of either a customer arrival (α) or request destination (δ). We use a 15-minute discretization to construct the surfaces. Figure 4-1a shows an example of the arrival surface Za.
Data sparsity is almost always an issue in statistical modeling. Considering that the
one month dataset spans a 50 x 50 x 96 x 31 space, we see that even a large amount of
data will be very sparse. Two stages of smoothing were used to improve performance of
our model. First, each 24-hour dataset was smoothed temporally using a simple averaging
filter to reduce noise caused by temporal discretization. The resulting surfaces for each
time window were then normalized and smoothed using a Gaussian filter to reduce noise
caused by the 50 x 50 spatial discretization.
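The two-stage smoothing can be sketched as follows; the window length and Gaussian width here are illustrative choices, and the input is synthetic Poisson count data rather than the actual taxi records:

```python
import numpy as np

def smooth_surface(counts, temporal_window=3, spatial_sigma=1.0):
    """Two-stage smoothing sketch for the time-by-grid arrival counts:
    a moving average over the time slices, then a separable Gaussian blur
    over the spatial grid, with each slice renormalized to a probability
    surface."""
    c = np.asarray(counts, dtype=float)                      # (T, H, W)
    kernel = np.ones(temporal_window) / temporal_window
    c = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 0, c)
    r = int(3 * spatial_sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * spatial_sigma**2))
    g /= g.sum()
    for ax in (1, 2):                                        # separable blur
        c = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), ax, c)
    sums = c.sum(axis=(1, 2), keepdims=True)
    return c / np.where(sums > 0, sums, 1.0)

# 96 time slices of a 50 x 50 grid, synthetic counts:
Za = smooth_surface(np.random.default_rng(0).poisson(1.0, (96, 50, 50)))
```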
4.5
Experiments
A simulation framework was implemented in MATLAB. The model implements the spatial
PDP formulation presented in Section 4.2. Customer requests arrive in a Poisson process with rate parameter λ(t) and are distributed according to Za(t). Taxis traverse the space in straight lines and at constant speed. As the simulation evolves, customers are serviced by N(t) taxis that respond to the incoming pickup requests. Customer destinations are distributed according to the destination distribution surface Zδ(t). Figure 4-2 shows annotated
screenshots of a typical simulation.
Our simulation engine can incorporate any path planning mechanism that maps the
locomotion of the taxi onto the road network. This additional complexity was omitted in
this work in order to evaluate the effect of the informative path policy in isolation. Because
the recorded data from the Singapore taxi fleet use the underlying road network, there is
clearly a cost associated with assuming a straight line path planner. We evaluate this cost
by leveraging Google's extensive geocoding API. For a given pair of coordinates p_1 and p_2 we can calculate the driving distance from p_1 to p_2, taking into account the time, day, road conditions, and road directionality. Generally, distances recorded in simulation incur a cost factor

    β(p_1, p_2, t) = d(p_1, p_2, t)_drive / ||p_1 − p_2||,

by which the recorded straight-line distance scales to the true driving distance. 2,500 Monte Carlo simulations were carried out to approximate β under steady-state conditions by sampling points and computing the driving distance. The average value was calculated to be β = 1.974. Thus on average a taxi would drive approximately twice the distance quoted by the simulation if it had made the same journey along the Singapore road network.
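The Monte Carlo estimate of the cost factor can be sketched as below. Since the geocoding API is not available here, a Manhattan (grid-road) distance stands in for the true driving distance; with real driving distances the same procedure would yield an estimate like the β = 1.974 reported above:

```python
import numpy as np

def estimate_beta(road_distance, n_samples=2500, seed=0):
    """Monte Carlo sketch of the distance cost factor: sample point pairs in
    the unit square and average driving distance over straight-line distance.
    `road_distance` stands in for the driving-distance query."""
    rng = np.random.default_rng(seed)
    p1, p2 = rng.random((2, n_samples, 2))
    straight = np.linalg.norm(p1 - p2, axis=1)
    drive = np.array([road_distance(a, b) for a, b in zip(p1, p2)])
    mask = straight > 1e-9
    return float(np.mean(drive[mask] / straight[mask]))

# On an idealized grid road network, driving distance is roughly the L1
# (Manhattan) distance, so the ratio lies between 1 and sqrt(2):
manhattan = lambda a, b: float(np.abs(a - b).sum())
beta = estimate_beta(manhattan)
```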
Three different types of simulation were conducted: (1) greedy policy simulations, which establish the benefit of the patrolling policy as a benchmark against a simplistic redistribution strategy, (2) ground truth simulations, which use the same number of taxis N that were recorded for a corresponding scenario from historical data, and (3) stability-based simulations, which aim to find the minimum number of taxis Nmin that ensure a k-tight stability
guarantee defined by (4.1). A stability margin of k = 0.01 (i.e. 1 percent) was chosen for
our experiments.
(a) CBD average on-call distance per trip (patrolling policy vs greedy policy)
(b) Singapore average on-call distance per trip (patrolling policy vs greedy policy)
(c) CBD average on-call distance per trip (patrolling policy vs historical data)
(d) Singapore average on-call distance per trip (patrolling policy vs historical data)
(e) CBD average on-call distance per taxi (patrolling policy vs historical data)
(f) Singapore average on-call distance per taxi (patrolling policy vs historical data)

Figure 4-3: Single-loop (CBD) simulation results and multi-loop (Singapore-wide) simulation results
4.5.1
Greedy Policy Experiments
We first show the utility of our policy by comparing it against a simplistic patrolling policy
implemented in the same simulation framework. This "like to like" comparison is important for any simulation-based work, because the assumptions made in the simulation engine can affect the generality of the results. Further, these simulations serve as control experiments: if a greedy policy performs poorly then we can conclude that the improvement was due to the
informative path redistribution (the only difference between the two policies) and not due
to path-planning or scheduling simplifications, which are trivial by comparison. Greedy
policy simulations were carried out for the CBD and for the whole of Singapore. Different numbers of loops (5, 10, 15, 20, 25) were used for the multi-loop experiments as a benchmark for determining the best number of loops to use in the main experiments.
4.5.2
Single Loop Experiments
The CBD was chosen for the single-loop experiments because (1) it has a high volume of
customer requests throughout the day and thus presents a lot of scope for optimization, (2)
the CBD is representative of Singapore as a whole in terms of request arrival and destination
flow throughout the day, and (3) an area the size of the CBD is an appropriate service region
to consider for a single loop.
Both ground truth and stability-based simulations were carried out for the CBD. We use a 15-minute discretization epoch both for updating the patrol loops and for determining the corresponding number of taxis N and arrival rate λ. Due to the fine discretization and a relatively small arrival rate, we add smoothing to reduce noise in the results.
4.5.3
Multi-Loop Experiments
Although a single-loop policy can be constructed for a service region of any size, its interpretation becomes increasingly questionable when applied to larger areas, all the way up to an entire city. Undoubtedly, the most interesting application is to consider multiple loops that are constructed to work together to service customer requests. Our multi-loop experiments scale up to consider the whole of Singapore. We use a larger time discretization for the arrival rate λ and for the number of taxis in ground truth simulations. This gives us 6 four-hour epochs to consider throughout the course of the day, which are representative of different time periods throughout a typical day (night, early morning, morning, afternoon, evening, late evening). Using a finer discretization at such a large scale would not be meaningful, and would simply end up averaging results over several time periods in order to contextualize our results. We maintain the 15-minute discretization for updating the patrol loops to ensure that the loops can adapt to the changing arrival rate (also discretized into 15-minute time steps).
Both ground truth and stability-based simulations were carried out for the whole of
Singapore. Based on preliminary results we consider 25 loops as the optimal choice for
our experiments. There are 28 postal districts in Singapore, so the 25-loop case gives us
an approximation for scaling up a single-loop policy in the case of the CBD (which covers
around 1-2 postal regions).
4.5.4
Unseen Test Data
In previous work [2] we demonstrate how to accurately infer traffic volume from historical
data collected on different days. To determine if our policy is still useful in the absence
of real-time data we conduct simulations using unseen historical test data. Our algorithm
was first trained by pre-constructing dynamic patrolling loops using historical data from
the same day (August 16). Experiments were then carried out using historical data from
the same day in the following week (August 23). A full replication of all the preceding
experiments was conducted for both the CBD and Singapore-wide multi-loop scenarios.
4.6
Results
First, we consider the implications of the greedy policy simulations. Figures 4-3a and 4-3b
show the greedy simulation results. In the case of the CBD we observe an overall increase
in ONCALL distance per trip by a factor of 1.42. For the entire region of Singapore, the
greedy policy performs even worse, increasing the overall ONCALL distance by a factor of
3.71. This supports our intuition that our policy is useful: because the control experiments
employing a simple policy performed much worse using the same simulation engine, the
increase in performance is due to the patrolling policy alone.
We are interested in examining solutions from three different points of view: (1) customer, (2) taxi driver, and (3) urban planning.
Customer (quality of service)
Quality of service is represented by customer waiting time, which is equivalent to the
ONCALL distance of individual taxi trips. Thus, the average ONCALL distance per trip
is an indicator of the expected waiting time. Figure 4-3c shows the average ONCALL per trip distance for our single-loop policy in the CBD. The average ONCALL per trip distance for August 16 is 0.45 km, as compared to 1.2 km from the historical data. Thus our single-loop policy reduces the total customer waiting time by a factor of 2.66. Figure 4-3d shows the ONCALL per trip distance for the multi-loop case. The total average ONCALL per trip distance computed using our model is 0.13 km as compared to 1.7 km from the historical data. With a distance cost factor β ≈ 2, we see that our 25-loop patrolling policy in Singapore reduces the total customer waiting time by a factor of 6.84. For this scenario, a customer request that would have historically taken an hour to service can now be serviced in under 10 minutes.
Taxi Driver (distance traveled empty)
We assume that the goal of the taxi driver is to minimize the amount of time driving empty.
Figure 4-3e shows the ONCALL per taxi distance for our single-loop policy in the CBD.
The average total ONCALL per taxi distance for August 16 is 0.06 km, as compared to
0.18 km from the historical data. Our single-loop policy reduces the average total distance
driven empty by a factor of 3. This reduction corresponds to a 67 percent decrease in fuel consumption between subsequent customer requests. Figure 4-3f shows the ONCALL per taxi distance for the multi-loop case. The total average ONCALL per taxi distance computed using our model is 0.12 km as compared to 1.6 km from the historical data. With a distance cost factor β ≈ 2, we see that our 25-loop patrolling policy in Singapore reduces the average total distance driven empty by a factor of 6.74.
Urban Planning (reducing congestion)
We assume that the goal of the municipal authority is to reduce congestion by reducing
the number of taxis on the road. The minimum number of taxis Nmin that are necessary to
maintain stability is given by (4.1) for some stability margin k. For any number of taxis
N > Nmin we define the utilization factor as

    η = Nmin/N.    (4.2)

The utilization factor η is the fraction of those N taxis that can service all requests while maintaining stability for a given λ. The total utilization factor on August 16 in the CBD using our model is η = 0.05; thus our model requires only 5 percent of the total taxis available throughout the day to maintain stability. The total utilization factor for the multi-loop case is η = 0.14, similarly implying that the taxi network is over-utilized by an order of magnitude.
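The utilization factor computation is straightforward; a small sketch with illustrative numbers (not the experimental fleet sizes):

```python
def utilization_factor(n_min, n_total):
    """Utilization factor (4.2): the fraction of the deployed fleet actually
    needed to service all requests while maintaining stability."""
    if not 0 < n_min <= n_total:
        raise ValueError("require 0 < Nmin <= N")
    return n_min / n_total

# If 5 of 100 deployed taxis suffice for stability, eta = 0.05.
eta = utilization_factor(5, 100)
```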
                          CBD                 Singapore
                     Aug. 16  Aug. 23     Aug. 16  Aug. 23
On-call per taxi (km)   0.06     0.06        0.12     0.13
On-call per trip (km)   0.45     0.44        0.13     0.13
Utilization factor η    0.05     0.07        0.14     0.14

Table 4.1: Total patrolling policy ONCALL distance ratios and utilization factor over 24-hour simulations.
4.6.1
Unseen Test Data Results
All of the preceding experiments were conducted on unseen test data, as described in Section 4.5.4. By evaluating it on previously unseen test data from August 23, we see that our model performs well and maintains its robustness in the absence of real-time data. Figure 4-3 shows the results for August 23 overlayed in green. We see that the ONCALL distance results maintain nearly the same magnitude and provide the same caliber of improvement over the historical data as described in Section 4.5.4. We conclude that our policy results in a comparable improvement in performance in the absence of real-time data.
4.7
Discussion
In this chapter we presented a novel patrolling policy for a fleet of service vehicles responding to requests in a PDP scenario. Our policy uses patrol loops based on informative path planning to minimize the distance driven by the vehicles to an incident request. We formalized the notion of stability in our problem context, and proved guarantees for our policy. We used historical data from a fleet of 16,000 taxis in Singapore to (1) infer the current ground truth behavior of the unmanaged taxi fleet, (2) train our algorithm, and (3) conduct simulations using both real-time and unseen test data. We evaluated the performance of our policy by evaluating customer waiting time, distance driven empty, and congestion. The experiments show that we can achieve substantial improvements in customer waiting time and expected distance driven empty. Further, we observe that the taxi network is over-utilized by showing that a similar level of service is possible with far fewer taxis. Finally, we show that our policy generalizes well to unseen test data, offering an improvement in performance that is on par with results from real-time simulations.
Chapter 5
Conclusion and Lessons Learned
Building upon previous work in [27,33], in this thesis we presented an informative path
controller for both single-robot and multi-robot cases. This controller is decentralized,
adaptive, and based on a mixing-function coverage approach that enables robots to combine
sensor estimates to learn the distribution of sensory information in the environment and
reshape their paths into informative paths. A mixing-function-based coverage approach is
robust in that different classes of mixing functions can be used to recover multiple common
control strategies including minimum variance and Voronoi approaches. Additionally, as
the free parameter α approaches −∞ from −1, the mixing function can approximate the
Voronoi approach arbitrarily well. A Lyapunov stability proof shows that the controller
will reshape the paths to locally optimal configurations and drive the estimated parameter
vector error to zero, assuming that the robots' initial trajectories are rich enough.
We observed how the free parameter α affected the behavior of the paths. While α ∈ (−∞, −1) generally reproduces the same informative paths given by the Voronoi approach, α = −1 reduces the sensitivity of the Voronoi controller so that extremely small perturbations in initial waypoint positioning do not result in significantly different informative path configurations for a multi-robot system. We also showed that the weights assigned to the sensing task, Ws, and the neighbor distance, Wn, can affect the informative path configuration. Thus, these parameters can be used to tune the system according to the desired behavior. For example, if shorter paths are desired, then setting Wn higher is recommended.
However, if sensing is very important and short paths are not necessary, then a high Ws would generate the desired result. Additionally, these weights can have a large effect on whether the final paths intersect, as it was observed in our simulations and experiments that a high Ws tends to generate non-intersecting paths.
The informative path control algorithm enables robots to sample and generate useful paths for applications such as surveillance and persistent sensing. In this thesis, we extended the informative path controller to be used in conjunction with a speed controller presented in [30], to drive the paths into locally optimal configurations that are beneficial for persistent sensing tasks. Using this extended controller, referred to as an informative persistence controller, the robots drive their paths in a direction where the stability margin of the persistent sensing task does not decrease, hence improving the performance of the robots executing the persistent sensing task. Increasing the stability margin has the additional benefit of allowing the controller to more easily overcome unmodeled errors in the persistent sensing task, such as tracking errors by the robots and discretization of the path. Although it uses the same coverage algorithms, in general, the final paths using the informative persistence controller were different from the paths using the informative path controller. This is due to the additional restriction of the non-decreasing stability margin.
Finally, we used the informative persistence controller to derive a novel patrolling policy for a fleet of service vehicles responding to requests in a PDP scenario. Our policy used
patrol loops based on a Voronoi coverage control strategy to minimize the distance driven
by the vehicles to an incident request. We formalized the notion of stability in our problem context, and proved guarantees for our policy. We used historical data from a fleet of
16,000 taxis in Singapore to (1) infer the current ground truth behavior of the unmanaged
taxi fleet, (2) to train our algorithm, and (3) to conduct simulations using both real-time
and unseen test data. We evaluated the performance of our policy by evaluating customer
waiting time, distance driven empty and congestion. The experiments show that we can
substantially reduce customer waiting time and the expected distance driven empty. Further, we observed that the taxi network is over-utilized by showing that a similar level of
service is possible with much fewer taxis. Finally, we show that our policy generalizes well
to unseen test data, offering an improvement in performance that is on par with results from
real-time simulations.
Appendix A
Tables of Mathematical Symbols
Table A.1: Common symbols for each control strategy

Symbol      Definition
Q           Convex bounded environment
q           An arbitrary point in Q
d           Number of parameters
α           Mixing function free parameter
φ(q)        Sensory function at point q
B(q)        Vector of basis functions for the sensory function at point q
a           True parameter vector for the sensory function, φ(q) = B(q)^T a
c(q)        Consumption rate at point q for persistent sensing
σ_j         Standard deviation of the jth Gaussian basis function
μ_j         Mean of the jth Gaussian basis function
p_trunc     Truncation distance for Gaussian basis functions
s(q)        Stability margin of point q for persistent sensing
S           Stability margin of the persistent sensing task
G(j)        Gaussian function used to calculate the truncated Gaussian basis
W_n         Weight assigned to neighboring waypoint distance
W_s         Weight assigned to the mixing function coverage task
ρ           Radius for circular sensor footprint for persistent sensing
T_dwell     Dwelling time between switching from one to zero
v           Arbitrary real constant scalar
Γ           Diagonal, positive-definite adaptation gain matrix
γ           Gain for adaptation law
115
Table A.2: Single-robot controller symbols

Symbol            Definition
p_r(t)            The robot's position at time t
n                 Number of waypoints in the robot's path
p_i               The robot's ith waypoint position
V_i               Voronoi partition of the ith waypoint
M_i               Mass of Q
M̂_i              Approximation of M_i
Y_i               First mass moment of Q
Ŷ_i              Approximation of Y_i
C_i               Centroid of Q
Ĉ_i              Approximation of C_i
e_i, ê_i         C_i − p_i, Ĉ_i − p_i
ν_i               W_n(p_{i+1} + p_{i−1} − 2p_i)
β_i, β̂_i        M_i + 2W_n, M̂_i + 2W_n
φ_{p_r(t)}        Sensory function at the robot's position, φ(p_r(t))
φ̂(q, t)          The robot's approximation of φ(q)
B_{p_r(t)}        Vector of basis functions at the robot's position, B(p_r(t))
â                The robot's parameter estimate
ã                The robot's parameter error, â − a
u_i               Control input for the ith waypoint
H(p_1, ..., p_n)  Locational cost function
Λ                 The robot's weighted integral of basis functions
λ                 The robot's weighted integral of sensory measurements
K_i               Control gain matrix for the ith waypoint in the robot's path
w                 The robot's data weighting function
V                 Lyapunov-like function for informative path and persistent sensing
b                 Term in adaptation laws for purposes of the Lyapunov proof
â_pre            Time derivative of the robot's parameter estimate before projection
I_proj            Projection matrix
T(t)              Time it takes the robot to complete its path at time t
τ_c(q, t)         Time the robot covers q along its path at time t
F(p_r)            The robot's sensor footprint
ŝ(q)             The robot's estimated stability margin of point q
Ŝ                The robot's estimated stability margin of the persistent sensing task
T_w               Time at which the adaptation data weighting function is set to zero
I_i               Boolean control input for waypoint movement in persistent sensing
t_i               Most recent time the boolean input I_i switches to one for the ith waypoint
Table A.3: Multi-robot controller symbols

Symbol            Definition
p_r(t)            Robot r's position at time t
n_r               Number of waypoints in robot r's path for the multi-robot system
N                 Number of robots in the multi-robot system
p_i^r             Robot r's ith waypoint position
V_i^r             Voronoi partition of the ith waypoint in robot r's path
M_i^r             Mass of Q
M̂_i^r            Approximation of M_i^r
Y_i^r             First mass moment of Q
Ŷ_i^r            Approximation of Y_i^r
C_i^r             Centroid of Q
Ĉ_i^r            Approximation of C_i^r
e_i^r             C_i^r − p_i^r
ν_i^r             W_n(p_{i+1}^r + p_{i−1}^r − 2p_i^r)
β_i^r, β̂_i^r    M_i^r + 2W_n, M̂_i^r + 2W_n
φ_{p_r(t)}        Sensory function at robot r's position, φ(p_r(t))
φ̂_r(q, t)        Robot r's approximation of φ(q)
B_{p_r(t)}        Vector of basis functions at robot r's position, B(p_r(t))
â_r              Robot r's parameter estimate
ã_r              Robot r's parameter error for the multi-robot system, â_r − a
u_i^r             Control input for the ith waypoint in robot r's path
H_α               Locational mixing function based cost function
Λ_r               Robot r's weighted integral of basis functions
λ_r               Robot r's weighted integral of sensory measurements
K_i^r             Control gain matrix for the ith waypoint in robot r's path
w_r               Robot r's data weighting function
V                 Lyapunov-like function for multi-robot coverage and persistent sensing
l_{r,r'}          Weighting between parameters for robots r and r'
L                 Graph Laplacian of the robot network
b_r               Terms in adaptation laws for purposes of the Lyapunov proof
â_pre_r          Time derivative of robot r's parameter estimate before projection
D_max             Maximum distance the robots can have while still communicating
I_proj_r          Projection matrix
Q_j               Vector containing the jth parameter of each robot
ζ                 Consensus gain
T_r(t)            Time it takes robot r to complete its path at time t
τ_c^r(q, t)       Time robot r covers q along its path at time t
F_r(p_r)          Robot r's sensor footprint
ŝ_r(q)           Robot r's estimated stability margin of point q for persistent sensing
Ŝ_r              Robot r's estimated stability margin of the persistent sensing task
T_{w_r}           Time at which the adaptation data weighting function is set to zero for robot r
I_i^r             Boolean control input for shutting down waypoint movement in persistent sensing
t_i^r             Most recent time the boolean input I_i^r switches to one for the ith waypoint in robot r's path