DISTRIBUTED OPTIMIZATION FOR COOPERATIVE MISSIONS IN UNCERTAIN ENVIRONMENTS C. G. Cassandras Center for Information and Systems Engineering Boston University Christos G. Cassandras CODES Lab. - Boston University OUTLINE ¾ COOPERATIVE “MISSION” SETTING ¾ REWARD MAXIMIZATION MISSIONS ¾ COOPERATIVE RECEDING HORIZON (CRH) CONTROL ¾ COVERAGE CONTROL MISSIONS ¾ DEMOS: Applets and Movies ACKNOWLEDGEMENTS: PhD Students: Wei Li (PhD 2006), Ning Xu, Minyi Zhong Sponsors: NSF, AFOSR, ARO, DOE, Honeywell Members of Boston U. Sensor Networks Consortium Christos G. Cassandras CODES Lab. - Boston University COOPERATIVE MISSION SETTING TARGETS MOBILE NODES Christos G. Cassandras CODES Lab. - Boston University DIFFERENT COOPERATIVE MISSION TYPES ¾ RENDEZ-VOUS AT SOME TARGET POINT ¾ FORMATION MAINTENANCE ¾ REWARD MAXIMIZATION ¾ COVERAGE CONTROL Christos G. Cassandras CODES Lab. - Boston University RENDEZ-VOUS MISSION RENDEZ-VOUS AT THIS TARGET Christos G. Cassandras CODES Lab. - Boston University FORMATION MAINTAINANCE MISSION Christos G. Cassandras CODES Lab. - Boston University REWARD MAXIMIZATION MISSION TARGETS WITH DIFFERENT REWARDS AND DEADLINES MISSION OBJECTIVE: MAXIMIZE TOTAL REWARD COLLECTED BY VISITING TARGETS BEFORE THEIR “DEADLINES” EXPIRE Christos G. Cassandras CODES Lab. - Boston University REWARD MAXIMIZATION MISSION X X X X X X CONTINUED ? UNKNOWN TARGETS ? ? Node 4 “repelled” by Node 3 ⇒ Search task performed Christos G. Cassandras CODES Lab. - Boston University COVERAGE CONTROL MISSION SENSOR FIELD WITH UNKNOWN DATA SOURCES - ONLY DENSITY FUNCTION ASSUMED --Meguerdichian Meguerdichianetetal, al,INFOCOM, INFOCOM,2001, 2001, --Cortes Cortesetetal, al,IEEE IEEETrans. Trans.on onRobotics Roboticsand andAuto., Auto.,2004 2004 --Cassandras Cassandrasand andLi, Li,Euro. Euro.J.J.ofofControl, Control,2005 2005 Christos G. Cassandras CODES Lab. - Boston University COOPERATIVE REWARD MAXIMIZATION MISSION R1φ1(t) r6i, j = 1,…,3 R6φ6(t) R3φ3(t) r3i, j = 1,…,3 r1i, j = 1,…,3 R4φ4(t) R4 r4i, j = 1,…,3 R5φ5(t) R5 r5i, j = 1,…,3 C1 p1i, i = 1,…,6 C3 p3i, i = 1,…,6 Christos G. Cassandras C2 r2i, j = 1,…,3 p2i, i = 1,…,6 CODES Lab. - Boston University R R2φ22(t) COOP. REWARD MAXIMIZATION MISSION CONTINUED This is like the notorious TRAVELING SALESMAN problem, except that… ¾ … there are multiple (cooperating) salesmen ¾ … there are deadlines + time-varying costs ¾ … environment is stochastic (vehicles may fail, threats damage vehicles, etc.) Christos G. Cassandras CODES Lab. - Boston University SOLUTION APPROACHES ¾ Stochastic Dynamic Programming – Wohletz et al, 2001 Extremely complex… ¾ Functional Decomposition: Dynamic Resource Allocation – Castanon and Wohletz, 2002 Assignment Problems through Mixed Integer Linear Programming – Bellingham et al, 2002 Combinatorially complex… Path Planning – Hu and Sastry, 2001, Lian and Murray 2002, Gazi and Passino, 2002, Bachmayer and Leonard, 2002 Christos G. Cassandras CODES Lab. - Boston University COMBINATORIAL + STOCHASTIC COMPLEXITY R2 R3 R4 R1 R5 1. Target Assignment Christos G. Cassandras 2. Routing 3. Path Control CODES Lab. - Boston University RECEDING HORIZON (RH) CONTROL: MAIN IDEA • Do not attempt to assign nodes to targets • Cooperatively steer nodes towards “high expected reward” regions PLANNING HORIZON, H periodically/on-event • Repeat process • Worry about final node-target assignment Turns out nodes at the last possible instant converge to targets ACTION on their own! HORIZON, h Solve optimization problem by selecting all ui to maximize total expected rewards over H u1 u2 Christos G. Cassandras u3 CODES Lab. - Boston University CONTRAST APPROACHES HEDGE-AND-REACT HEDGE-AND-REACT ¾ Delay decisions until last possible instant ¾ No stochastic model ¾ Simpler opt. problems VS ESTIMATE-AND-PLAN ESTIMATE-AND-PLAN ¾ Need accurate stochastic models ¾ Curse of dimensionality Compare to Model Predictive Control (MPC) Christos G. Cassandras CODES Lab. - Boston University CRH CONTROL PROBLEM FORMULATION Target positions (i = 1,…,N): yi ∈R² Node dynamics (j = 1,…,M): • State: xj(t) ∈R² position of jth node at time t • Control: uj(t) Node heading at time t ⎡cos u j (t )⎤ 0 x& j (t ) = V j ⎢ , x ( 0 ) = x ⎥ j j ⎢⎣sin u j (t ) ⎥⎦ At kth iteration, time tk (k=1,2,…): • Planning Horizon: Hk • Node position at time tk+Hk: x j (t k + H k ) = x j (t k ) + x& j (t k ) H k Christos G. Cassandras CODES Lab. - Boston University RH PROBLEM FORMULATION CONTINUED At kth iteration (k=1,2,…): Earliest time node j can reach target i under control uj(tk): τ ij (u j (t k ), t k ) = (t k + H k ) + || x j (t k + H k ) − yi || V j τ ij VjHk xj(tk) yi qij uj(tk) xj (tk +Hk) qkj τ kj Christos G. Cassandras yk Probability vehicle j assigned to target CODES Lab. - Boston University CRH PROBLEM FORMULATION CONTINUED Objective at kth iteration: Maximize EXPECTED REWARD over horizon Hk Target i value attainable by node j [depends on uj(t)] M N node j value M N max ∑∑ Riφi (τ ij ) pij (τ ij ) qij (t k + H k ) − ∑∑ C j rij (t k + H k ) u Control node headings i =1 j =1 Prob. time nodewhen j node Prob. node j Earliest collects targetreward i reward assigned to target i j can collect [depends on uj(t)] [depends on u (t)] j from target i [depends on uj(t)] Christos G. Cassandras i =1 j =1 Prob. node j destroyed by target i [depends on uj(t)] CODES Lab. - Boston University THE FUNCTION φi(t) [REWARD DISCOUNTING FUNCTION] • Targets with deadlines: 1 • Targets with time windows: Christos G. Cassandras 1 CODES Lab. - Boston University THE FUNCTION φi(t) CONTINUED • Sequencing targets: 1 φ1(t) • A general purpose φ−function: ln(1−α ) t Di ⎧ ⎪e φi (t ) = ⎨ ln(1−α ) ⎪e Di t e − β (t − Di ) ⎩ α 1 0.8 β 0.6 if t ≤ Di if t > Di 0.4 0.2 0 Christos G. Cassandras φ2(t) 20 40 t 60 80 CODES Lab. - Boston University 100 THE FUNCTION qij [TARGET ASSIGNMENT FUNCTION] • Node-to-target distance: dij • Relative distance: δ ij = = x j − yi d ij ∑ M m =1 or: b closest nodes to j only d im • Target assignment function qij(δij): Monotonically non-increasing and s.t. qij (0) = 1, qij (1) = 0 Christos G. Cassandras CODES Lab. - Boston University THE FUNCTION qij CONTINUED • A example of qij function (M=2): ⎧1 ⎪⎪ 1 qij (δij ) = ⎨ (1− Δ) −δij ⎪1− 2Δ ⎪⎩0 [ 1 ] if δij ≤ Δ if Δ < δij ≤1− Δ otherwise Capture Radius (Δ) 0.8 0.6 0.4 0.2 0 Christos G. Cassandras 0.2 0.4 0.6 0.8 1 1.2 1.4 CODES Lab. - Boston University THE FUNCTION qij CONTINUED qij(t) defines DYNAMIC RESPONSIBILITY REGIONS for vehicle j • Sj – Full Responsibility Region (FR) δij≤Δ • Cj – Cooperative Region (CR) Δ <δij≤1-Δ • Ij – Invisibility Region (IR) δij> 1-Δ Vehicle Targets in FR committed to node Christos G. Cassandras FR x j CR CR IR Targets in IR ignored by node CODES Lab. - Boston University THE FUNCTION qij CONTINUED 10 8 6 4 2 1 1 0 2 2 -2 3 3 -4 -6 4 4 -8 -10 -10 Δ=0.1 Δ=0.2 What happens as parameter Δ increases ? Christos G. Cassandras -5 0 5 10 Voronoi partition Partition of a plane with into n convex polygons such that each polygon contains exactly one point and every point in a given polygon is closer to its central point than to any other. CODES Lab. - Boston University 2-VEHICLE CASE – DYNAMIC PARTITIONING Possible Target Location Vehicle Locations II: Only vehicle 1 goes to target III: Both vehicles go to target IV: Only vehicle 2 goes to target (1 is repelled !) Christos G. Cassandras CODES Lab. - Boston University PLANNING AND ACTION HORIZONS PLANNING Horizon H(t): H (t ) = d min (t ) ≡ min d ij (t ) i, j ACTION Horizon h(t): h(t ) = α H + β H H (t ), α H ≥ 0, 0 ≤ β H ≤ 1 OR: Whenever next EVENT occurs Christos G. Cassandras CODES Lab. - Boston University TARGET ASSIGNMENT MAIN IDEA IN CRH APPROACH: Replace complex Discrete Stochastic Optimization problem by a sequence of simpler Continuous Optimization problems But how do we guarantee that vehicles ultimately head for the desired DISCRETE TARGET POINTS? Christos G. Cassandras CODES Lab. - Boston University STABILITY ANALYSIS • TARGETS: yi • UAVs: xj DEFINITION: Node trajectory x(t ) = [x1 (t ),K, xM (t )] generated by a controller is stationary, if there exists some tV < ∞, such that x j (tv ) − yi ≤ si for some i = 1, K , N , j = 1, K , M . QUESTION: Under what conditions is a CRH-generated trajectory stationary ? Wei Li Target Size CISE - Boston University STABILITY ANALYSIS CONTINUED Recall objective function: M M N N max ∑∑ Riφi (τ ij ) pij (τ ij )qij (t k + H k ) − ∑∑ C j rij (t k + H k ) u i =1 j =1 i =1 j =1 α T φi (t ) = 1- t t ∈ [ 0 ,T] rij=0 ⎧1 ⎪⎪ 1 (1− Δ) −δij qij (δij ) = ⎨ ⎪1− 2Δ ⎪⎩0 1 [ α 0.8 pij=1 0.6 0.4 0.2 ] if j ∈Bi ,δij ≤ Δ if j ∈Bi , Δ < δij ≤1− Δ otherwise 1 0 20 40 t 60 80 100 0.8 0.6 0.4 0.2 0 Wei Li 0.2 0.4 0.6 0.8 1 1.2 1.4 CISE - Boston University STABILITY ANALYSIS CONTINUED N M J (x) = ∑∑ Ri x j − yi qij Objective function reduces to: i =1 j =1 CRH controller solves optimization problem: ⎧⎪min J (x) x∈Fk ⎨ ⎪⎩ Fk = w : w j − x j (t k ) = VH k { } i.e., minimize the potential function J(x) over a set of M circles: min J(x) … F 1 k Fk2 Wei Li FkM CISE - Boston University MAIN STABILITY RESULT Local minima of J(x): x l = ( x1l ,..., xMl ) ∈ R 2 M , l = 1, K , L Vector of node positions at kth iteration of CRH controller: xk Theorem: Suppose H k = min d ij (t k ) . i, j l If, for all l = 1,…,L, x j = yi for some i = 1,…,N, j = 1,…,M, then J ( x k ) − J ( x k +1 ) > b (b > 0 is a constant). If all local minima coincide with targets, the CRH-generated trajectory is stationary Wei Li CISE - Boston University MAIN STABILITY RESULT QUESTION: When do all local minima coincide with target points? 1 Vehicle, N targets If there exists a yi s.t. Ri − N ∑ j =1, j ≠ i Rj yi − y j yi − y j >0 2 Vehicles, 1 target 2 Vehicles, 2 targets Wei Li CISE - Boston University TO RECAP… • Limited look-ahead – control optimizes expectation over “planning horizon” • Control updates – event-driven (events are deterministic or random) or time-driven (for a given “action horizon”) • Target assignment – done implicitly, not explicitly: No combinatorial problem involved • Assignment + Routing + Path Control – all done together Christos G. Cassandras CODES Lab. - Boston University RH CONTROLLER FEATURES CONTINUED • Target values change – deadlines, target sequencing, return to base • Node capabilities change - resource depletion, failures, damage • Threat capabilities change - radar on/off, threat damage • Target locations change - new targets, moving targets • Obstacle avoidance - targets with negative values • Randomness - new control actions in response to random events • Constraints – heading change, heading-dependent costs, sensing tasks Christos G. Cassandras CODES Lab. - Boston University DISTRIBUTED COOPERATIVE CONTROL Construct GRADIENT FIELD instead of artificial potential field ∂J i Fij = = ci ( x j ) ⋅ f i 0 ( x j ) ∂x j if yi ∈ S j ⎧1 ⎪⎪ (1 − δ ij )(2δ ij − 1) ci ( x j ) = ⎨qij − if yi ∈ C j (1 − 2Δ) ⎪ if yi ∈ I j ⎪⎩0 Cooperation coefficient Christos G. Cassandras ⎧ Ri x j − yi if x j ≠ yi ⎪− f i 0 ( x j ) = ⎨ Di || x j − yi || ⎪0 otherwise ⎩ Force exerted by target i on node j given that it is the only node in the mission space CODES Lab. - Boston University DISTRIBUTED COOPERATIVE CONTROL • 2 examples (M=2, N=10) Christos G. Cassandras CODES Lab. - Boston University OTHER ISSUES ¾ Local optima in the CRH optimization problem ¾ Oscillatory vehicle behavior (instabilities) ¾ Additional path constraints, e.g., rendez-vous at targets ¾ Does CRH control generate optimal assignments? Christos G. Cassandras CODES Lab. - Boston University REWARD MAXIMIZATION MISSION DEMO MOVIES OF SUCH MISSIONS WITH SMALL ROBOTS: 3 Khepera robots http://frontera.bu.edu/CoopCtrl.html executing mission: visiting 8 targets with different rewards and deadlines. Robots communicate wirelessly and dynamically update headings. Overhead vision system provides location data. Inner ring - “reward": R - 200, G - 300, B - 600 Outer ring - "deadline": R - 80s, G - 50s, B - 20s Christos G. Cassandras CODES Lab. - Boston University COVERAGE CONTROL MISSION GOAL: Deploy mobile nodes to maximize data source detection probability – unknown data sources – data sources may be mobile R(x) (Hz/m2) 50 40 30 20 10 0 10 8 6 ? ? ??Ω??? ? ? 4 2 0 10 5 0 Perceived data source density over mission space Christos G. Cassandras CODES Lab. - Boston University PROBLEM FORMULATION N mobile sensors, each located at si∈R2 Data source at x emits signal with energy E Signal observed by sensor node i (at si ) Sensing model: pi ( x) ≡ p (Detected by i | A( x), si ) ( A(x) = data source emits at x ) Sensing attenuation: pi(x) is a decreasing function of di(x) ≡ ||x - si|| (distance between x and si ) Christos G. Cassandras CODES Lab. - Boston University PROBLEM FORMULATION CONTINUED Joint detection prob. assuming sensor independence: N P( x) = 1 − ∏ [1 − pi ( x)] i =1 OBJECTIVE: Determine locations si (i=1,…,N) to maximize total detection probability: N ⎫ ⎧ max ∫ R( x)⎨1 − ∏ [1 − pi ( x)]⎬dx si ∈Ω ⎭ ⎩ i =1 Ω Perceived data source density Christos G. Cassandras CODES Lab. - Boston University DISTRIBUTED COOPERATIVE SCHEME CONTINUED Denote N ⎧ ⎫ F ( s1 , K, s N ) = ∫ R( x)⎨1 − ∏ [1 − pi ( x)]⎬dx ⎩ i =1 ⎭ Ω Maximize F(s1,…,sN) by forcing nodes to move using gradient information: N ∂pk ( x) sk − x ∂F dx = ∫ R ( x) ∏ [1 − pi ( x)] ∂sk Ω ∂d k ( x) d k ( x) i =1,i ≠ k Christos G. Cassandras CODES Lab. - Boston University DISTRIBUTED COOPERATIVE SCHEME CONTINUED N ∂pk ( x) sk − x ∂F dx = ∫ R ( x) ∏ [1 − pi ( x)] ∂sk Ω ∂d k ( x) d k ( x) i =1,i ≠ k This has to be evaluated numerically. Not doable for a mobile sensor with limited computation capacity. ¾ Approximate pi(x) by truncating sensing attenuation ¾ Discretize pi(x) using a grid Details in Christos G. Cassandras --Cassandras Cassandrasand andLi, Li,Euro. Euro.J.J.ofofControl, Control,2005 2005 CODES Lab. - Boston University COVERAGE CONTROL MISSION DEMO SOFTWARE DEMO OF COVERAGE CONTROL ALGORITHM: http://frontera.bu.edu/Applets/CoverageContr/index.html No communication cost With communication cost Sensing Range Christos G. Cassandras CODES Lab. - Boston University POLYGONAL OBSTACLES… CONTINUED • Constrain the navigation of mobile nodes • Interfere with the sensing ⎧ pi ( x, si ) if x is visible from si pˆ i ( x, si ) = ⎨ otherwise ⎩ 0 Visibility Region Christos G. Cassandras CODES Lab. - Boston University GRADIENT CALCULATION WITH OBSTACLES CONTINUED N ∂pˆ i ( x, si ) si − x ∂H = ∫ R( x) ∏ [1 − pˆ k ( x, sk )] dx + ∂si V ( si ) ∂di ( x) di ( x) k =1, k ≠ i ⎧ pi pˆ i = ⎨ ⎩0 visible invisible Q ( si ) ∑A j =1 j Q(si): # of occluding corner points New term captures change in visibility region of si Mathematically: use an extension of the Leibnitz rule for differentiating an integral where both the integrand and the integration domain are functions of the control variable Christos G. Cassandras CODES Lab. - Boston University DEPLOYMENT DEMO – WITH OBSTACLES http://codescolor.bu.edu/coverage Christos G. Cassandras CODES Lab. - Boston University A “FAIRNESS” ISSUE… CONTINUED Some areas covered extremely well, while others not covered at all SOLUTION: Assign higher reward to the same amount of marginal gain in P(x,s) in low coverage region H (s) = ∫ R( x) P( x, s)dx F H M (s) = ∫ R( x) M ( P( x, s) )dx F M(·) : [0,1] → R concave non-decreasing function Christos G. Cassandras CODES Lab. - Boston University DEPLOYMENT DEMO – REACTION TO EVENTS Event detected Christos G. Cassandras CODES Lab. - Boston University ONGOING WORK: SCALABLE, ASYNCHRONOUS, DISTRIBUTED OPTIMIZATION ¾ Small, cheap cooperating devices cannot handle complexity ⇒ we need DISTRIBUTED control and optim. algorithms ¾ Cooperating agents operate asynchronously ⇒ we need ASYNCHRONOUS control/optimization schemes ¾ Too much communication kills node energy sources ⇒ communicate ONLY when necessary ⇒ we need ASYNCHRONOUS control/optimization schemes ¾ Networks grow large, sensing tasks grow large ⇒ we need SCALABLE control and optim. algorithms Christos G. Cassandras CODES Lab. - Boston University