DISTRIBUTED OPTIMIZATION FOR COOPERATIVE MISSIONS IN

advertisement
DISTRIBUTED
OPTIMIZATION
FOR
COOPERATIVE MISSIONS IN
UNCERTAIN
ENVIRONMENTS
C. G. Cassandras
Center for Information and Systems Engineering
Boston University
Christos G. Cassandras
CODES Lab. - Boston University
OUTLINE
¾ COOPERATIVE “MISSION” SETTING
¾ REWARD MAXIMIZATION MISSIONS
¾ COOPERATIVE RECEDING HORIZON (CRH) CONTROL
¾ COVERAGE CONTROL MISSIONS
¾ DEMOS: Applets and Movies
ACKNOWLEDGEMENTS:
PhD Students: Wei Li (PhD 2006), Ning Xu, Minyi Zhong
Sponsors: NSF, AFOSR, ARO, DOE, Honeywell
Members of Boston U. Sensor Networks Consortium
Christos G. Cassandras
CODES Lab. - Boston University
COOPERATIVE MISSION SETTING
TARGETS
MOBILE NODES
Christos G. Cassandras
CODES Lab. - Boston University
DIFFERENT COOPERATIVE MISSION TYPES
¾ RENDEZ-VOUS AT SOME TARGET POINT
¾ FORMATION MAINTENANCE
¾ REWARD MAXIMIZATION
¾ COVERAGE CONTROL
Christos G. Cassandras
CODES Lab. - Boston University
RENDEZ-VOUS MISSION
RENDEZ-VOUS
AT THIS TARGET
Christos G. Cassandras
CODES Lab. - Boston University
FORMATION MAINTAINANCE MISSION
Christos G. Cassandras
CODES Lab. - Boston University
REWARD MAXIMIZATION MISSION
TARGETS
WITH DIFFERENT
REWARDS AND DEADLINES
MISSION OBJECTIVE: MAXIMIZE TOTAL REWARD COLLECTED
BY VISITING TARGETS BEFORE THEIR “DEADLINES” EXPIRE
Christos G. Cassandras
CODES Lab. - Boston University
REWARD MAXIMIZATION MISSION
X
X
X
X
X
X
CONTINUED
?
UNKNOWN
TARGETS
?
?
Node 4 “repelled” by Node 3
⇒ Search task performed
Christos G. Cassandras
CODES Lab. - Boston University
COVERAGE CONTROL MISSION
SENSOR FIELD WITH
UNKNOWN DATA SOURCES
- ONLY DENSITY
FUNCTION ASSUMED
--Meguerdichian
Meguerdichianetetal,
al,INFOCOM,
INFOCOM,2001,
2001,
--Cortes
Cortesetetal,
al,IEEE
IEEETrans.
Trans.on
onRobotics
Roboticsand
andAuto.,
Auto.,2004
2004
--Cassandras
Cassandrasand
andLi,
Li,Euro.
Euro.J.J.ofofControl,
Control,2005
2005
Christos G. Cassandras
CODES Lab. - Boston University
COOPERATIVE REWARD MAXIMIZATION MISSION
R1φ1(t)
r6i, j = 1,…,3
R6φ6(t)
R3φ3(t)
r3i, j = 1,…,3
r1i, j = 1,…,3
R4φ4(t)
R4
r4i, j = 1,…,3
R5φ5(t)
R5
r5i, j = 1,…,3
C1
p1i, i = 1,…,6
C3
p3i, i = 1,…,6
Christos G. Cassandras
C2
r2i, j = 1,…,3
p2i, i = 1,…,6
CODES Lab. - Boston University
R
R2φ22(t)
COOP. REWARD MAXIMIZATION MISSION
CONTINUED
This is like the notorious TRAVELING SALESMAN
problem, except that…
¾ … there are multiple (cooperating) salesmen
¾ … there are deadlines + time-varying costs
¾ … environment is stochastic
(vehicles may fail, threats damage vehicles, etc.)
Christos G. Cassandras
CODES Lab. - Boston University
SOLUTION APPROACHES
¾ Stochastic Dynamic Programming – Wohletz et al, 2001
Extremely complex…
¾ Functional Decomposition:
ƒ Dynamic Resource Allocation – Castanon and Wohletz, 2002
ƒ Assignment Problems through Mixed Integer Linear
Programming – Bellingham et al, 2002
Combinatorially complex…
ƒ Path Planning – Hu and Sastry, 2001, Lian and Murray 2002, Gazi and
Passino, 2002, Bachmayer and Leonard, 2002
Christos G. Cassandras
CODES Lab. - Boston University
COMBINATORIAL + STOCHASTIC COMPLEXITY
R2
R3
R4
R1
R5
1. Target Assignment
Christos G. Cassandras
2. Routing
3. Path Control
CODES Lab. - Boston University
RECEDING HORIZON (RH) CONTROL: MAIN IDEA
• Do not attempt to assign nodes to targets
• Cooperatively steer nodes
towards
“high expected reward” regions
PLANNING
HORIZON,
H periodically/on-event
• Repeat
process
• Worry about final node-target assignment Turns out nodes
at the last possible instant
converge to targets
ACTION
on their own!
HORIZON, h
Solve optimization problem
by selecting all ui to maximize
total expected rewards over H
u1
u2
Christos G. Cassandras
u3
CODES Lab. - Boston University
CONTRAST APPROACHES
HEDGE-AND-REACT
HEDGE-AND-REACT
¾ Delay decisions until
last possible instant
¾ No stochastic model
¾ Simpler opt. problems
VS
ESTIMATE-AND-PLAN
ESTIMATE-AND-PLAN
¾ Need accurate
stochastic models
¾ Curse of dimensionality
Compare to
Model Predictive Control (MPC)
Christos G. Cassandras
CODES Lab. - Boston University
CRH CONTROL PROBLEM FORMULATION
ƒ Target positions (i = 1,…,N): yi ∈R²
ƒ Node dynamics (j = 1,…,M):
• State:
xj(t) ∈R²
position of jth node at time t
• Control:
uj(t)
Node heading at time t
⎡cos u j (t )⎤
0
x& j (t ) = V j ⎢
,
x
(
0
)
=
x
⎥
j
j
⎢⎣sin u j (t ) ⎥⎦
ƒ At kth iteration, time tk (k=1,2,…):
• Planning Horizon:
Hk
• Node position at time tk+Hk:
x j (t k + H k ) = x j (t k ) + x& j (t k ) H k
Christos G. Cassandras
CODES Lab. - Boston University
RH PROBLEM FORMULATION
CONTINUED
ƒ At kth iteration (k=1,2,…):
Earliest time node j can reach target i under control uj(tk):
τ ij (u j (t k ), t k ) = (t k + H k ) + || x j (t k + H k ) − yi || V j
τ ij
VjHk
xj(tk)
yi
qij
uj(tk)
xj (tk +Hk)
qkj
τ kj
Christos G. Cassandras
yk
Probability vehicle j
assigned to target
CODES Lab. - Boston University
CRH PROBLEM FORMULATION
CONTINUED
ƒ Objective at kth iteration:
Maximize EXPECTED REWARD over horizon Hk
Target i value attainable by node j
[depends on uj(t)]
M
N
node j value
M
N
max ∑∑ Riφi (τ ij ) pij (τ ij ) qij (t k + H k ) − ∑∑ C j rij (t k + H k )
u
Control
node
headings
i =1 j =1
Prob. time
nodewhen
j node Prob. node j
Earliest
collects
targetreward
i reward assigned to target i
j can collect
[depends on uj(t)]
[depends
on
u
(t)]
j
from target i
[depends on uj(t)]
Christos G. Cassandras
i =1 j =1
Prob. node j
destroyed by target i
[depends on uj(t)]
CODES Lab. - Boston University
THE FUNCTION φi(t) [REWARD DISCOUNTING FUNCTION]
• Targets with deadlines:
1
• Targets with time
windows:
Christos G. Cassandras
1
CODES Lab. - Boston University
THE FUNCTION φi(t)
CONTINUED
• Sequencing targets:
1
φ1(t)
• A general purpose
φ−function:
ln(1−α )
t
Di
⎧
⎪e
φi (t ) = ⎨ ln(1−α )
⎪e Di t e − β (t − Di )
⎩
α
1
0.8
β
0.6
if t ≤ Di
if t > Di
0.4
0.2
0
Christos G. Cassandras
φ2(t)
20
40
t 60
80
CODES Lab. - Boston University
100
THE FUNCTION qij
[TARGET ASSIGNMENT FUNCTION]
• Node-to-target distance: dij
• Relative distance:
δ ij =
= x j − yi
d ij
∑
M
m =1
or: b closest nodes
to j only
d im
• Target assignment function qij(δij):
Monotonically non-increasing and s.t.
qij (0) = 1, qij (1) = 0
Christos G. Cassandras
CODES Lab. - Boston University
THE FUNCTION qij
CONTINUED
• A example of qij function (M=2):
⎧1
⎪⎪ 1
qij (δij ) = ⎨
(1− Δ) −δij
⎪1− 2Δ
⎪⎩0
[
1
]
if δij ≤ Δ
if Δ < δij ≤1− Δ
otherwise
Capture Radius (Δ)
0.8
0.6
0.4
0.2
0
Christos G. Cassandras
0.2
0.4
0.6
0.8
1
1.2
1.4
CODES Lab. - Boston University
THE FUNCTION qij
CONTINUED
qij(t) defines DYNAMIC RESPONSIBILITY REGIONS for vehicle j
• Sj – Full Responsibility Region (FR) δij≤Δ
• Cj – Cooperative Region (CR)
Δ <δij≤1-Δ
• Ij – Invisibility Region (IR)
δij> 1-Δ
Vehicle
Targets in FR
committed to
node
Christos G. Cassandras
FR x
j
CR
CR
IR
Targets in IR
ignored by
node
CODES Lab. - Boston University
THE FUNCTION qij
CONTINUED
10
8
6
4
2
1
1
0
2
2
-2
3
3
-4
-6
4
4
-8
-10
-10
Δ=0.1
Δ=0.2
What happens as
parameter Δ increases ?
Christos G. Cassandras
-5
0
5
10
Voronoi partition
Partition of a plane with into n convex polygons such that
each polygon contains exactly one point and every point
in a given polygon is closer to its central point than to any other.
CODES Lab. - Boston University
2-VEHICLE CASE – DYNAMIC PARTITIONING
Possible
Target
Location
Vehicle
Locations
II: Only vehicle 1 goes to target
III: Both vehicles go to target
IV: Only vehicle 2 goes to target (1 is repelled !)
Christos G. Cassandras
CODES Lab. - Boston University
PLANNING AND ACTION HORIZONS
PLANNING Horizon H(t):
H (t ) = d min (t ) ≡ min d ij (t )
i, j
ACTION Horizon h(t):
h(t ) = α H + β H H (t ), α H ≥ 0, 0 ≤ β H ≤ 1
OR: Whenever next EVENT occurs
Christos G. Cassandras
CODES Lab. - Boston University
TARGET ASSIGNMENT
MAIN IDEA IN CRH APPROACH:
Replace complex Discrete Stochastic Optimization problem
by a sequence of simpler Continuous Optimization problems
But how do we guarantee that vehicles ultimately
head for the desired DISCRETE TARGET POINTS?
Christos G. Cassandras
CODES Lab. - Boston University
STABILITY ANALYSIS
• TARGETS: yi
• UAVs: xj
DEFINITION: Node trajectory x(t ) = [x1 (t ),K, xM (t )]
generated by a controller is stationary, if there
exists some tV < ∞, such that x j (tv ) − yi ≤ si for
some i = 1, K , N , j = 1, K , M .
QUESTION:
Under what conditions is a CRH-generated
trajectory stationary ?
Wei Li
Target Size
CISE - Boston University
STABILITY ANALYSIS
CONTINUED
Recall objective function:
M
M
N
N
max ∑∑ Riφi (τ ij ) pij (τ ij )qij (t k + H k ) − ∑∑ C j rij (t k + H k )
u
i =1 j =1
i =1 j =1
α
T
φi (t ) = 1- t t ∈ [ 0 ,T]
rij=0
⎧1
⎪⎪ 1
(1− Δ) −δij
qij (δij ) = ⎨
⎪1− 2Δ
⎪⎩0
1
[
α
0.8
pij=1
0.6
0.4
0.2
]
if j ∈Bi ,δij ≤ Δ
if j ∈Bi , Δ < δij ≤1− Δ
otherwise
1
0
20
40
t 60
80
100
0.8
0.6
0.4
0.2
0
Wei Li
0.2
0.4
0.6
0.8
1
1.2
1.4
CISE - Boston University
STABILITY ANALYSIS
CONTINUED
N
M
J (x) = ∑∑ Ri x j − yi qij
Objective function reduces to:
i =1 j =1
CRH controller solves
optimization problem:
⎧⎪min J (x)
x∈Fk
⎨
⎪⎩ Fk = w : w j − x j (t k ) = VH k
{
}
i.e., minimize the potential function J(x) over a set of M circles:
min J(x)
…
F
1
k
Fk2
Wei Li
FkM
CISE - Boston University
MAIN STABILITY RESULT
Local minima of J(x): x l = ( x1l ,..., xMl ) ∈ R 2 M , l = 1, K , L
Vector of node positions
at kth iteration of CRH controller:
xk
Theorem: Suppose H k = min d ij (t k ) .
i, j
l
If, for all l = 1,…,L, x j = yi for some i = 1,…,N, j = 1,…,M,
then J ( x k ) − J ( x k +1 ) > b (b > 0 is a constant).
If all local minima coincide with targets,
the CRH-generated trajectory is stationary
Wei Li
CISE - Boston University
MAIN STABILITY RESULT
QUESTION:
When do all local minima coincide with target points?
1 Vehicle, N targets
If there exists a yi s.t. Ri −
N
∑
j =1, j ≠ i
Rj
yi − y j
yi − y j
>0
2 Vehicles, 1 target
2 Vehicles, 2 targets
Wei Li
CISE - Boston University
TO RECAP…
• Limited look-ahead – control optimizes expectation
over “planning horizon”
• Control updates – event-driven (events are deterministic or random)
or time-driven (for a given “action horizon”)
• Target assignment – done implicitly, not explicitly:
No combinatorial problem involved
• Assignment + Routing + Path Control – all done together
Christos G. Cassandras
CODES Lab. - Boston University
RH CONTROLLER FEATURES
CONTINUED
• Target values change – deadlines, target sequencing, return to base
• Node capabilities change - resource depletion, failures, damage
• Threat capabilities change - radar on/off, threat damage
• Target locations change - new targets, moving targets
• Obstacle avoidance - targets with negative values
• Randomness - new control actions in response to random events
• Constraints – heading change, heading-dependent costs,
sensing tasks
Christos G. Cassandras
CODES Lab. - Boston University
DISTRIBUTED COOPERATIVE CONTROL
Construct GRADIENT FIELD instead of artificial potential field
∂J i
Fij =
= ci ( x j ) ⋅ f i 0 ( x j )
∂x j
if yi ∈ S j
⎧1
⎪⎪
(1 − δ ij )(2δ ij − 1)
ci ( x j ) = ⎨qij −
if yi ∈ C j
(1 − 2Δ)
⎪
if yi ∈ I j
⎪⎩0
Cooperation coefficient
Christos G. Cassandras
⎧ Ri x j − yi
if x j ≠ yi
⎪−
f i 0 ( x j ) = ⎨ Di || x j − yi ||
⎪0
otherwise
⎩
Force exerted by target i
on node j given that it is the
only node in the mission space
CODES Lab. - Boston University
DISTRIBUTED COOPERATIVE CONTROL
• 2 examples (M=2, N=10)
Christos G. Cassandras
CODES Lab. - Boston University
OTHER ISSUES
¾ Local optima in the CRH optimization problem
¾ Oscillatory vehicle behavior (instabilities)
¾ Additional path constraints,
e.g., rendez-vous at targets
¾ Does CRH control generate optimal
assignments?
Christos G. Cassandras
CODES Lab. - Boston University
REWARD MAXIMIZATION MISSION DEMO
MOVIES OF SUCH MISSIONS WITH SMALL ROBOTS:
3 Khepera robots
http://frontera.bu.edu/CoopCtrl.html
executing mission:
visiting 8 targets with
different rewards and
deadlines. Robots
communicate wirelessly
and dynamically update
headings. Overhead
vision system provides
location data.
Inner ring - “reward":
R - 200, G - 300, B - 600
Outer ring - "deadline":
R - 80s, G - 50s, B - 20s
Christos G. Cassandras
CODES Lab. - Boston University
COVERAGE CONTROL MISSION
GOAL: Deploy mobile nodes to maximize data source detection
probability
– unknown data sources
– data sources may be mobile
R(x)
(Hz/m2)
50
40
30
20
10
0
10
8
6
?
? ??Ω??? ?
?
4
2
0
10
5
0
Perceived data source density
over mission space
Christos G. Cassandras
CODES Lab. - Boston University
PROBLEM FORMULATION
ƒ N mobile sensors, each located at si∈R2
ƒ Data source at x emits signal with energy E
ƒ Signal observed by sensor node i (at si )
ƒ Sensing model:
pi ( x) ≡ p (Detected by i | A( x), si )
( A(x) = data source emits at x )
ƒ Sensing attenuation:
pi(x) is a decreasing function of di(x) ≡ ||x - si||
(distance between x and si )
Christos G. Cassandras
CODES Lab. - Boston University
PROBLEM FORMULATION
CONTINUED
ƒ Joint detection prob. assuming sensor independence:
N
P( x) = 1 − ∏ [1 − pi ( x)]
i =1
ƒ OBJECTIVE:
Determine locations si (i=1,…,N)
to maximize total detection probability:
N
⎫
⎧
max ∫ R( x)⎨1 − ∏ [1 − pi ( x)]⎬dx
si ∈Ω
⎭
⎩ i =1
Ω
Perceived data source density
Christos G. Cassandras
CODES Lab. - Boston University
DISTRIBUTED COOPERATIVE SCHEME
CONTINUED
ƒ Denote
N
⎧
⎫
F ( s1 , K, s N ) = ∫ R( x)⎨1 − ∏ [1 − pi ( x)]⎬dx
⎩ i =1
⎭
Ω
ƒ Maximize F(s1,…,sN) by forcing nodes to move using
gradient information:
N
∂pk ( x) sk − x
∂F
dx
= ∫ R ( x) ∏ [1 − pi ( x)]
∂sk Ω
∂d k ( x) d k ( x)
i =1,i ≠ k
Christos G. Cassandras
CODES Lab. - Boston University
DISTRIBUTED COOPERATIVE SCHEME
CONTINUED
N
∂pk ( x) sk − x
∂F
dx
= ∫ R ( x) ∏ [1 − pi ( x)]
∂sk Ω
∂d k ( x) d k ( x)
i =1,i ≠ k
This has to be evaluated numerically.
Not doable for a mobile sensor with limited
computation capacity.
¾ Approximate pi(x) by truncating sensing attenuation
¾ Discretize pi(x) using a grid
Details in
Christos G. Cassandras
--Cassandras
Cassandrasand
andLi,
Li,Euro.
Euro.J.J.ofofControl,
Control,2005
2005
CODES Lab. - Boston University
COVERAGE CONTROL MISSION DEMO
SOFTWARE DEMO OF COVERAGE CONTROL ALGORITHM:
http://frontera.bu.edu/Applets/CoverageContr/index.html
No communication cost
With communication cost
Sensing Range
Christos G. Cassandras
CODES Lab. - Boston University
POLYGONAL OBSTACLES…
CONTINUED
• Constrain the navigation of mobile nodes
• Interfere with the sensing
⎧ pi ( x, si ) if x is visible from si
pˆ i ( x, si ) = ⎨
otherwise
⎩ 0
Visibility
Region
Christos G. Cassandras
CODES Lab. - Boston University
GRADIENT CALCULATION WITH OBSTACLES
CONTINUED
N
∂pˆ i ( x, si ) si − x
∂H
= ∫ R( x) ∏ [1 − pˆ k ( x, sk )]
dx +
∂si V ( si )
∂di ( x) di ( x)
k =1, k ≠ i
⎧ pi
pˆ i = ⎨
⎩0
visible
invisible
Q ( si )
∑A
j =1
j
Q(si): # of occluding corner points
New term captures change in visibility region of si
Mathematically: use an extension of the
Leibnitz rule for differentiating an integral
where both the integrand and the integration
domain are functions of the control variable
Christos G. Cassandras
CODES Lab. - Boston University
DEPLOYMENT DEMO – WITH OBSTACLES
http://codescolor.bu.edu/coverage
Christos G. Cassandras
CODES Lab. - Boston University
A “FAIRNESS” ISSUE…
CONTINUED
Some areas covered extremely well, while others not
covered at all
SOLUTION: Assign higher reward to the same amount of marginal gain
in P(x,s) in low coverage region
H (s) = ∫ R( x) P( x, s)dx
F
H M (s) = ∫ R( x) M ( P( x, s) )dx
F
M(·) : [0,1] → R
concave non-decreasing function
Christos G. Cassandras
CODES Lab. - Boston University
DEPLOYMENT DEMO – REACTION TO EVENTS
Event
detected
Christos G. Cassandras
CODES Lab. - Boston University
ONGOING WORK:
SCALABLE, ASYNCHRONOUS, DISTRIBUTED OPTIMIZATION
¾ Small, cheap cooperating devices cannot handle complexity
⇒ we need DISTRIBUTED control and optim. algorithms
¾ Cooperating agents operate asynchronously
⇒ we need ASYNCHRONOUS control/optimization schemes
¾ Too much communication kills node energy sources
⇒ communicate ONLY when necessary
⇒ we need ASYNCHRONOUS control/optimization schemes
¾ Networks grow large, sensing tasks grow large
⇒ we need SCALABLE control and optim. algorithms
Christos G. Cassandras
CODES Lab. - Boston University
Download