Nonmyopic Adaptive Informative Path Planning for Multiple Robots
Amarjeet Singh (UCLA), Andreas Krause (Caltech), William Kaiser (UCLA)
rsrg@caltech: "..where theory and practice collide"

Monitoring rivers and lakes [IJCAI '07]
- Need to monitor large spatial phenomena: temperature, nutrient distribution, fluorescence, ...
- NIMS robotic sensing platform, Kaiser et al. (UCLA)
- Can only make a limited number of measurements!
- [Figure: actual vs. predicted temperature as a function of depth and location across the lake]
- Use robotic sensors to cover large areas; predict at unobserved locations
- Where should we sense to get the most accurate predictions?

Urban Search & Rescue
- [Figure: search helicopters with detection ranges and detected survivors]
- How can we coordinate multiple search & rescue helicopters to quickly locate moving survivors?

Related work
- Information gathering problems considered in:
  - Experimental design (Lindley '56, Robbins '52, ...)
  - Value of information (Howard '66)
  - Spatial statistics (Cressie '91, ...)
  - Machine learning (MacKay '92, ...)
  - Robotics (Sim & Roy '05, ...)
  - Sensor networks (Zhao et al. '04, ...)
  - Operations research (Nemhauser '78, ...)
- Existing algorithms typically either:
  - use heuristics: no guarantees! Can do arbitrarily badly; or
  - find optimal solutions (mixed integer programming, POMDPs): very difficult to scale to bigger problems.
- Want algorithms that have theoretical guarantees and scale to large problems!

How to quantify collected information?
- A sensing quality function F(A) assigns a utility to a set A of locations, e.g., the expected reduction in mean squared error (MSE) for predictions based on a Gaussian process (GP) model
- [Figure: two candidate sensing sets, F(A1) = 4, F(A2) = 10]
- Want to pick sensing locations A ⊆ V to maximize F(A)

Selecting sensing locations
- Given: a finite set V of locations
- Want: A* ⊆ V with A* = argmax_{|A| ≤ k} F(A)
- Typically NP-hard!
- Greedy algorithm:
  - Start with A = ∅
  - For i = 1 to k:
    - s* := argmax_s F(A ∪ {s})
    - A := A ∪ {s*}
- How well does the greedy algorithm do?
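The greedy loop above can be sketched in a few lines of Python. The coverage-style objective used here is a hypothetical stand-in for a real sensing-quality function such as MSE reduction under a GP model:

```python
def greedy_select(V, F, k):
    """Greedy algorithm from the slide: repeatedly add the location
    with the largest marginal gain in sensing quality F."""
    A = set()
    for _ in range(min(k, len(V))):
        # s* := argmax_s F(A ∪ {s})
        best = max((s for s in V if s not in A), key=lambda s: F(A | {s}))
        A.add(best)
    return A

# Toy stand-in for F: how many distinct regions a set of sensors covers.
coverage = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {5}}

def F(A):
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)
```

For example, `greedy_select(['a', 'b', 'c'], F, 2)` first picks `'a'` (covering three regions) and then either remaining sensor, for a total coverage of 4.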
Key observation: Diminishing returns
- Compare selection A = {Y1, Y2} with selection B = {Y1, ..., Y5}
- For a new observation Y': adding Y' to the small set A helps a lot (large improvement); adding Y' to the large set B doesn't help much (small improvement)
- Many sensing quality functions are submodular*:
  - Information gain [Krause & Guestrin '05]
  - Expected mean squared error [Das & Kempe '08]
  - Detection time / likelihood [Krause et al. '08]
- Submodularity: for A ⊆ B, F(A ∪ {Y'}) − F(A) ≥ F(B ∪ {Y'}) − F(B)
- *See paper for details

Selecting sensing locations (continued)
- Theorem [Nemhauser et al. '78]: F(A_G) ≥ (1 − 1/e) F(OPT). The greedy algorithm is near-optimal!

Challenges for informative path planning
- Use robots to monitor the environment
- Not just selecting the best k locations A for a given F(A). Need to:
  - take into account the cost of traveling between locations
  - cope with environments that change over time
  - efficiently coordinate multiple agents
- Want to scale to very large problems and have guarantees

Outline and Contributions
- Path constraints
- Dynamic environments
- Multi-robot coordination

Informative path planning
- So far: max F(A) s.t. |A| ≤ k, but the most informative locations might be far apart! The robot needs to travel between the selected locations.
- Locations V are nodes in a graph; C(A) = cost of the cheapest path connecting the nodes A
- New problem: max F(A) s.t. C(A) ≤ B
- Known as the submodular orienteering problem
- The greedy algorithm fails arbitrarily badly!
- The best known algorithms (Chekuri & Pal '05, Singh et al. '07) are superpolynomial!
- Can we exploit additional structure to get better algorithms?
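The diminishing-returns inequality above can be verified numerically for small ground sets. This brute-force checker is an illustration, not part of the talk; it tests F(A ∪ {y}) − F(A) ≥ F(B ∪ {y}) − F(B) for every nested pair A ⊆ B:

```python
from itertools import combinations

def is_submodular(V, F):
    """Brute-force test of diminishing returns on every pair A ⊆ B ⊆ V.
    Exponential in |V|, so only useful as a sanity check on tiny examples."""
    V = list(V)
    subsets = [frozenset(c) for r in range(len(V) + 1)
               for c in combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for y in V:
                if y in B:
                    continue
                # marginal gain at A must be at least the gain at B
                if F(A | {y}) - F(A) < F(B | {y}) - F(B) - 1e-9:
                    return False
    return True

coverage = {'a': {1, 2}, 'b': {2, 3}, 'c': {4}}

def F(A):
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)
```

Coverage-style objectives pass this check, while a function with increasing returns such as F(A) = |A|² fails it.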
Additional structure: Locality
- If A, B are observation sets close by, then F(A ∪ B) < F(A) + F(B)
- If A, B are observation sets at least r apart, then F(A ∪ B) ≈ F(A) + F(B) [we only assume F(A ∪ B) ≥ γ (F(A) + F(B))]
- Call such an F (r, γ)-local
- Sensors that are far apart are approximately independent
- Holds for many objective functions (e.g., GPs with decaying covariance)
- We showed locality is empirically valid!

The pSPIEL_OR algorithm (based on the sensor placement algorithm by Krause, Guestrin, Gupta & Kleinberg, IPSN '06)
- pSPIEL: an efficient nonmyopic algorithm (padded Sensor Placements at Informative and cost-Effective Locations)
- Select starting and ending locations s_1 and s_B
- Decompose the sensing region into small, well-separated clusters
- Solve the cardinality-constrained problem per cluster (greedy)
- Combine the cluster solutions using an orienteering algorithm
- Smooth the resulting path

Guarantees for pSPIEL_OR (based on results by Krause, Guestrin, Gupta & Kleinberg, IPSN '06)
- Theorem: for (r, γ)-local submodular F, pSPIEL finds a path A with
  - submodular utility F(A) ≥ Ω(γ) OPT_F
  - path length C(A) ≤ O(r) OPT_C
- *See paper for details

pSPIEL results: Search & Rescue
- Sensor Planning Research Challenge: coordination of multiple mobile sensors to detect survivors of a major urban disaster
- Buildings obstruct the camera's field of view
- F(A) = expected # of people detected
- [Plot: expected number of survivors rescued vs. number of timesteps; pSPIEL ≈ 80, Greedy ≈ 60, with the heuristic of Chao et al. below both]
- pSPIEL outperforms existing algorithms for informative path planning

Outline and Contributions
- Path constraints: pSPIEL_OR exploits (r, γ)-locality to near-optimally solve submodular orienteering
- Dynamic environments
- Multi-robot coordination
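The pSPIEL steps above can be sketched at a high level. This is a greatly simplified illustration under assumed interfaces (a precomputed cluster list and per-cluster path costs): the real algorithm combines clusters with an orienteering subroutine and smooths the path, which is replaced here by a simple benefit-per-cost ordering:

```python
def pspiel_sketch(clusters, F, cluster_cost, budget, k):
    """Simplified pSPIEL skeleton: greedy selection inside each
    well-separated cluster, then chain cluster solutions while the
    path budget allows (the real algorithm uses an orienteering
    algorithm for the chaining step)."""
    # Step 1: cardinality-constrained greedy selection per cluster.
    picks = []
    for cluster in clusters:
        chosen = set()
        for _ in range(min(k, len(cluster))):
            best = max((s for s in cluster if s not in chosen),
                       key=lambda s: F(chosen | {s}))
            chosen.add(best)
        picks.append(chosen)

    # Step 2: add cluster solutions by marginal benefit per unit cost.
    path, spent = set(), 0.0
    remaining = list(range(len(clusters)))
    while remaining:
        def ratio(i):
            return (F(path | picks[i]) - F(path)) / cluster_cost[i]
        i = max(remaining, key=ratio)
        if spent + cluster_cost[i] > budget:
            break
        path |= picks[i]
        spent += cluster_cost[i]
        remaining.remove(i)
    return path

# Toy objective: number of distinct regions covered.
coverage = {'a': {1, 2}, 'b': {2, 3}, 'c': {4, 5, 6}}

def F(A):
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)
```

With two clusters `[['a', 'b'], ['c']]`, costs `[2.0, 1.0]`, budget 3.0 and one pick per cluster, the sketch first takes the cheap high-value cluster and then the remaining one, covering all five regions.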
Dynamic environments
- So far: max_A F(A) s.t. C(A) ≤ B
- Assumes we know the sensing quality F in advance
- Plan a fixed (nonadaptive) path / placement A
- In practice: the model is unknown, so we need to learn as we go, and the environment changes dynamically
- Active learning: find an adaptive policy that modifies the solution based on observations
- This is a gigantic POMDP (intractable). Can we efficiently find a good solution?

Sequential sensing
- [Figure: a sensing policy as a decision tree; e.g., observing X5 = 17, X3 = 16, X7 = 19 yields F(X5 = 17, X3 = 16, X7 = 19) = 3.4]
- F(π) = expected utility over the outcomes of the observations
- Want to pick a sensing policy π to maximize F(π)

NAÏVE algorithm [Singh, Krause, Kaiser, IJCAI '09]
- At each timestep t:
  - Plan a nonadaptive solution A* = argmax_A F_t(A) (efficient! e.g., using pSPIEL)
  - Execute the first step of the nonadaptive solution
  - Receive observations obs
  - Update the sensing quality: F_{t+1}(A) = F_t(A | obs) for all A
- Defines a Nonmyopic Adaptive informatIVE policy: NAIVE
- How well does this policy compare to the optimal policy?

Guarantees for NAÏVE-pSPIEL [Singh, Krause, Kaiser, IJCAI '09]
- Theorem (see paper for details): at every timestep t it holds that F_t(NAIVE) = Ω(1) F_t(OPT) − O(H(Θ | obs)), where F_t(OPT) is the value of the optimal policy and H(Θ | obs) is the remaining uncertainty about the model parameters Θ
- Key idea: replace F_t by G_t(π) = F_t(π) + λ I(Θ; observations in π), where λ ≥ 0 is an application-specific learning-rate parameter
- Need to trade off exploration (reducing H(Θ)) and exploitation (maximizing F(A))

Exploration-exploitation tradeoff
- [Plot: expected number of survivors rescued vs. number of timesteps for λ = 0, 0.1, 0.5, 0.9; λ = 0.1 performs best]
- Intermediate values of λ lead to the best performance

Results: Search & Rescue
- [Plot: expected number of survivors rescued vs. number of timesteps; NAIVE-pSPIEL_OR > NAIVE-Greedy > pSPIEL_OR > Greedy]
- Adaptive planning leads to a significant performance improvement!
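The per-timestep NAÏVE loop above can be sketched as a receding-horizon controller. The planner, executor, and update callbacks here are hypothetical stand-ins; in the talk, planning is done with pSPIEL and the update conditions the model on the observation:

```python
def naive_policy(plan, execute_first_step, update, F0, horizon):
    """Sketch of the NAÏVE loop: replan a full nonadaptive solution at
    every timestep, execute only its first step, then condition the
    sensing-quality function on the received observation."""
    F, visited = F0, []
    for t in range(horizon):
        A = plan(F)                     # A* = argmax_A F_t(A), e.g. via pSPIEL
        s, obs = execute_first_step(A)  # move one step, collect data
        visited.append(s)
        F = update(F, obs)              # F_{t+1}(A) = F_t(A | obs)
    return visited

# Toy stand-ins: sensing quality is a per-location weight; observing a
# location drives its residual value to zero.
def plan(F):
    return sorted(F, key=F.get, reverse=True)

def execute_first_step(A):
    s = A[0]
    return s, s  # the "observation" is just the visited location

def update(F, obs):
    F = dict(F)
    F[obs] = 0
    return F
```

With weights `{'x': 3, 'y': 2, 'z': 1}` and a horizon of 3, the policy visits `x`, `y`, `z` in order, replanning after each step.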
Example paths
- [Figure: two example paths over a 400 x 400 pixel map with the initial survivor locations and the starting location marked; greedy algorithm (left) vs. pSPIEL_OR (right)]

Results: environmental monitoring
- Monitor photosynthetically active regions under a forest canopy
- F(A) = # of "critical" regions covered
- [Plot: % of critical locations observed vs. number of timesteps; NAIVE-pSPIEL above pSPIEL]
- Adaptive planning leads to a significant performance improvement!

Outline and Contributions
- Path constraints: pSPIEL_OR exploits (r, γ)-locality to near-optimally solve submodular orienteering
- Dynamic environments: NAÏVE-pSPIEL implicitly trades off exploration and exploitation to obtain a near-optimal adaptive policy
- Multi-robot coordination

Multi-robot coordination
- max_{π1, ..., πk} F(π1 ∪ π2 ∪ ... ∪ πk) s.t. C(π1) ≤ B, C(π2) ≤ B, ..., C(πk) ≤ B
- Could use the single-robot algorithm to plan the joint policy, but this causes an exponential increase in complexity with the number of robots

Sequential allocation
- Use pSPIEL to find a policy π1 for the first robot: max_{π1} F(π1) s.t. C(π1) ≤ B
- Optimize for the second robot (π2), committing to the nodes in π1: max_{π2} F(π1 ∪ π2) s.t. C(π2) ≤ B
- ...
- Optimize for the k-th robot (πk), committing to the nodes in π1, ..., π(k−1): max_{πk} F(π1 ∪ π2 ∪ ... ∪ πk) s.t. C(πk) ≤ B

Performance comparison
- Greedy selection of nodes with no path cost constraint: arbitrarily poor
- Sequential allocation with NAÏVE-pSPIEL_OR for multiple robots (policy planning): the single-robot planner guarantees Reward_PS / Reward_Opt ≥ 1/η with η = O(1/γ)
- Theorem: Reward_SA / Reward_Opt ≥ 1/(1 + η)
- Works for any single-robot adaptive path planning algorithm!
- Independent of the number of robots used!
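The sequential allocation scheme above can be sketched as a wrapper around any single-robot planner: each robot optimizes the objective conditioned on the nodes earlier robots committed to. The planner interface and toy objective below are illustrative assumptions, with a tiny greedy planner standing in for pSPIEL:

```python
def sequential_allocation(plan_single, F, num_robots):
    """Plan the robots one at a time; robot i maximizes F conditioned on
    the nodes already committed to by robots 1..i-1."""
    committed = set()
    paths = []
    for _ in range(num_robots):
        def G(A):
            # objective given earlier robots' commitments
            return F(committed | set(A))
        path = plan_single(G)
        paths.append(path)
        committed |= set(path)
    return paths

# Toy single-robot planner: greedily pick 2 nodes maximizing G
# (a stand-in for pSPIEL with a path budget).
coverage = {'a': {1, 2}, 'b': {1, 2}, 'c': {3}, 'd': {4}}

def F(A):
    covered = set()
    for s in A:
        covered |= coverage[s]
    return len(covered)

def plan_single(G):
    chosen = []
    for _ in range(2):
        best = max((s for s in coverage if s not in chosen),
                   key=lambda s: G(chosen + [s]))
        chosen.append(best)
    return chosen
```

Because the second robot plans against the first robot's commitments, it skips the redundant sensor `'b'` and heads straight for the uncovered region at `'d'`.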
Key tool for the analysis: an extension of submodular functions to adaptive policies.

Multi-robot results
- [Plot: average number of survivors rescued vs. number of timesteps; 3 robots ≈ 120, 2 robots ≈ 100, 1 robot below]
- Diminishing returns as the number of robots increases

Conclusions
- New algorithm pSPIEL_OR for nonadaptive informative path planning with (r, γ)-local submodular functions
- New algorithm NAÏVE-pSPIEL_OR for adaptive informative path planning using an implicit exploration-exploitation analysis
- Extensions to multiple robots by sequential allocation
- The algorithms perform well on real-world problems