Overview: Generalizations of Multi-Agent Path Finding to Real-World Scenarios Hang Ma, Sven Koenig, Nora Ayanian, Liron Cohen Wolfgang Hönig, T. K. Satish Kumar, Tansel Uras, Hong Xu Department of Computer Science University of Southern California {hangma,skoenig,ayanian,lironcoh,whoenig,turas,hongx}@usc.edu, tkskwork@gmail.com Craig Tovey School of ISyE Georgia Institute of Technology ctovey@isye.gatech.edu Abstract Multi-agent path finding (MAPF) is well-studied in artificial intelligence, robotics, theoretical computer science and operations research. We discuss issues that arise when generalizing MAPF methods to real-world scenarios and four research directions that address them. We emphasize the importance of addressing these issues as opposed to developing faster methods for the standard formulation of the MAPF problem. 1 Introduction Multi-agent path finding (MAPF) has been well-studied by researchers from artificial intelligence, robotics, theoretical computer science and operations research. The task of (standard) MAPF is to find the paths for multiple agents in a given graph from their current vertices to their targets without colliding with other agents, while at the same time optimizing a cost function. Existing MAPF methods use, for example, reductions to problems from satisfiability, integer linear programming or answer set programming [Yu and LaValle, 2013b; Erdem et al., 2013; Surynek, 2015] or optimal, bounded-suboptimal or suboptimal search methods [Silver, 2005; Sturtevant and Buro, 2006; Ryan, 2008; Wang and Botea, 2008; Standley, 2010; Standley and Korf, 2011; Wang and Botea, 2011; Luna and Bekris, 2011; Sharon et al., 2013; de Wilde et al., 2013; Barer et al., 2014; Goldenberg et al., 2014; Wagner and Choset, 2015; Boyarski et al., 2015; Sharon et al., 2015]. We have recently studied various issues that arise when generalizing MAPF to real-world scenarios, including Kiva (Amazon Robotics) warehouse systems [Wurman et al., 2008] (Figure 1) and autonomous aircraft towing vehicles [Morris et al., 2016]. These issues can be categorized into two general concerns: 1. Developing faster methods for the standard formulation of the MAPF problem is insufficient because, in many real-world scenarios, new structure can be exploited or new problem formulations are required. 2. Studying MAPF or its new formulations only as Guni Sharon Department of Computer Science The University of Texas at Austin gunisharon@gmail.com combinatorial optimization problems is insufficient because the resulting MAPF solutions also need to be executed. We discuss four research directions that address both concerns from different perspectives: 1. In many real-world multi-agent systems, agents are partitioned into teams, targets are given to teams, and each agent in a team needs to get assigned a target from the team, before one finds paths for all agents. We have formulated the combined target assignment and path finding (TAPF) problem for teams of agents to address this issue. We have also developed an optimal TAPF method that scales to dozens of teams and hundreds of agents [Ma and Koenig, 2016]. 2. In many real-world multi-agent systems, agents are anonymous (exchangeable), but their payloads are non-anonymous (non-exchangeable) and need to be delivered to given targets. The agents can often exchange their payloads in such systems. We have formulated the package-exchange robot routing (PERR) problem as a first attempt to tackle more general transportation problems where payload transfers are allowed [Ma et al., 2016]. In this context, we have also proved the hardness of approximating optimal MAPF solutions. 3. In many real-world multi-agent systems, the consistency of agent motions and the resulting predictability of agent motions is important (especially in work spaces shared by humans and agents), which is not taken into account by existing MAPF methods. We have exploited the problem structure of given MAPF instances in two stages: In the first stage, we have developed a scheme for finding paths for the agents that include many edges from user-provided highways, which achieves consistency and predictability of agent motions [Cohen et al., 2015]. In the second stage, we have developed methods that automatically generate highways [Cohen et al., 2016]. 4. MAPF is mostly motivated by navigation or motion planning for multi-robot systems. However, the • The anonymous variant of MAPF (also called goalinvariant MAPF) results from TAPF if only one team exists (that consists of all agents). The agents can get assigned any target and are thus exchangeable. It can be solved optimally in polynomial time with flow-based MAPF methods [Yu and LaValle, 2013a; Turpin et al., Figure 3: A small region of a Kiva layout. The green cells represent pod storage locations, the orange ovals the robots (with 2014]. pods not pictured), and the purple and pink regions the queues around the inventory stations. Figure 1: Autonomous drive units and storage pods that to move the inventory pods with the correct bins from The state-of-the-art optimal TAPF method, called the contain products and can be moved byused drive units theirthe storage locations to the inventory stations where a pick worker removes the desired products from the desired bin. Min-Cost Flow [Ma and Koenig, 2016], Note that the pod has four system faces, and the drive unit mayConflict-Based need (left) and the layout of a typical Kiva warehouse to rotate the pod in order to present the correct face. When a picker is done with a pod, the drive unit stores it in an empty combines search and flow-based MAPF methods. It (right) [Wurman et al., 2008]. storage location. Each station is equipped with a desktop computer that generalizes to dozens of teams and hundreds of agents. controls pick lights, barcode scanners, and laser pointers that Figure 5: The Kiva demonstration facility. Acknowledgments Konolige, K.; Fox, D.; Ortiz, C.; Agno, A.; Eriksen, M.; Limketkai, B.; Ko, J.; Morisset, B.; Schulz, D.; Stewart, B.; and Vincent, R. 2004. Centibots: Very large scale distributed robotic teams. In Proceedings of the International Symposium on Experimental Robotics. Lesser, V. R. 1999. Cooperative multiagent systems: A personal view of the state of the art. IEEE Transactions on Knowledge and Data Engineering 11(1):133–142. Malone, T. W.; Fikes, R. E.; Grant, K. R.; and Howard, M. T. 1988. Enterprise: A market-like task scheduler for distributed computing environments. In Huberman, B. A., ed., The Ecology of Computation. North Holland. Rosenschein, J. S., and Zlotkin, G. 1994. Rules of Encounter. Cambridge: The MIT Press. Simmons, R.; Smith, T.; Dias, M. B.; Goldberg, D.; Hershberger, D.; Stentz, A.; and Zlot, R. 2002. A layered architecture for coordination of mobile robots. In Schultz, A., and Parker, L., eds., Multi-Robot Systems: From Swarms to Intelligent Automata. Kluwer. Wellman, M. P., and Wurman, P. R. 1998. Market-aware agents for a multiagent world. Robotics and Autonomous Systems 24:115–25. Building a working MVS requires a core set of great mechanical, electrical, and software engineers. It is yet another thing to turn it into a commercial product and manage the manufacture, assembly, and deployment of these systems. We thank the world-class Kiva employees who have breathed life into this vision. References Boutilier, C.; Shoham, Y.; and Wellman, M. P. 1997. Economic principles of multi-agent systems. Artificial Intelligence 94(1):1–6. Butenko, S.; Murphey, R.; and Pardos, P. M., eds. 2003. Cooperative Control: Models, Applications and Algorithms. Springer. Gilmour, K. 2003. Amazon warehouse, amazon adventure. Internet Magazine. Hazard, C. J.; Wurman, P. R.; and D’Andrea, R. 2006. Alphabet soup: A testbed for studying resource allocation in multi-vehicle systems. In Proceedings of the 2006 AAAI Workshop on Auction Mechanisms for Robot Coordination, 23–30. Jennings, N. R., and Bussmann, S. 2003. Agent-based control systems: Why are they suited to engineering complex systems? IEEE Control Systems Magazine 61–73. Jennings, N. R. 1996. Coordination techniques for distributed artificial intelligence. In O’Hare, G. M. P., and Jennings, N. R., eds., Foundations of Distributed Artificial Intelligence. Wiley. 187–210. are used to identify the pick and put locations. Because every product is scanned in and out of the system, overall pick- errors go down, which potentially eliminates the need optimality or bounded-suboptimality ofing solutions forMAPF post-picking quality control. In general, every station is capable of being either a picking station or a replenishment 3 Package-Exchange Robot Routing (PERR) station. In practice, pick stations will be located near outdoes not necessarily entail their robustness, especially bound conveyors, and replenishment stations will be located near pallet drop off points. and New Complexity Results for MAPF given the imperfect plan-execution capabilities realThe power of the Kivaof solution comes from the fact that it allows every worker to have random access to any invenin the warehouse. Moreover, that inventory can be retrieved world robots. We have developed atory framework in parallel. When the picker is filling several boxes Agents at the are often anonymous but carry payloads (packages) time, the parallel, random access ensures that she is efficiently postprocesses the output of same awaiting MAPF method not on pods to arrive. In fact, by keeping a small that are assigned targets and are thus non-anonymous. For of work at the station, the Kiva system delivers a new pod can face everybe six seconds, which sets a baseline picking to create a plan-execution schedule thatqueue executed lines rate of 600 lines per hour. Peak rates can exceed 600example, in a Kiva warehouse system, the drive units are per hour when the operator can pick more off by real-world multi-robot systems [Hönig et al., 2016]. than one itemanonymous a pod. but the storage pods they carry are assigned 1759 2 3 For a large warehouse, the savings in personnel can be Consider, for example, what a Kiva implemenlocations and are thus non-anonymous. If each tation these of the book warehouse would involve. A busy storage bookWe now showcase the practicality ofsignificant. research seller may ship 100,000 boxes a day. With existing automation, this level of output would employ perhaps 75 workers agent carries one package, the problem is equivalent to directions to demonstrate that, in order to generalize MAPF (standard) MAPF. In reality, the packages can often be methods to real-world addressing both concerns is Figure 2:scenarios, A Kiva drive unit and storage pod. transferred among agents, which results in more general as important, if not more, than developing faster methods for transportation problems, for example, ride-sharing with the standard formulation of the MAPF problem. passenger transfers [Coltin and Veloso, 2014] and package 1755 delivery with robots in offices [Veloso et al., 2015]. We have 2 Combined Target Assignment and Path formulated PERR as a first step toward understanding these Finding (TAPF) for Teams of Agents problems [Ma et al., 2016]. In PERR, each agent carries one package, any two agents in adjacent vertices can exchange Targets are often given to teams of agents. Each agent in their packages, and each package needs to be delivered to a a team needs to get assigned a target given to its team so given target. PERR can thus be viewed as a modification of that the paths of the agents from their current vertices to (standard) MAPF: their target optimize a cost function. For example, in a Kiva warehouse system, the drive units that relocate storage • Packages in PERR can be viewed as agents in (standard) pods from the inventory stations to the storage locations MAPF which move by themselves. form a team because each of them needs to get assigned an • Two packages in adjacent vertices are allowed to available storage location. Previous MAPF methods assume exchange their vertices in PERR but two agents in that each agent is assigned a target in advance by some targetadjacent vertices are not allowed to exchange their assignment procedure but, to achieve optimality, we have vertices in (standard) MAPF. formulated TAPF, which couples the target-assignment and the path-finding problems and defines one common objective K-PERR is a generalization of PERR where packages for both of them. In TAPF, the agents are partitioned into are partitioned into K types and packages of the same type teams. Each team is given the same number of unique targets are exchangeable. Since, in TAPF, agents are partitioned as there are agents in the team. The task of TAPF is to assign into teams and agents in the same team are exchangeable, the targets to the agents and plan collision-free paths for the K-PERR can be viewed as a modification of TAPF with agents from their current vertices to their targets in a way K teams in the same sense that PERR can be viewed as such that each agent moves to exactly one target given to its a modification of (standard) MAPF. We have proved the team, all targets are visited and the makespan (the earliest hardness of approximating optimal PERR and K-PERR time step when all agents have reached their targets and stop solutions (for K ≥ 2). One corollary of our study is that moving) is minimized. Any agent in a team can get assigned both MAPF and TAPF are NP-hard to approximate within a target of the team, and the agents in the same team are any factor less than 4/3 for makespan minimization, even thus exchangeable. However, agents in different teams are when there are only two teams for TAPF. We have also not exchangeable. TAPF can be viewed as a generalization of demonstrated that the addition of exchange operations to (standard) MAPF and an anonymous variant of MAPF: MAPF does not reduce its complexity theoretically but makes PERR easier to solve than MAPF experimentally. There is • (Standard) MAPF results from TAPF if every team a continuum of problems that arise in different real-world consists of exactly one agent and the number of teams scenarios: “One agent with many packages” yields the classic thus equals the number of agents. The assignments of rural postman problem; “as many agents as packages” yields targets to agents are pre-determined and the agents are MAPF, TAPF or PERR. Understanding both extremes helps thus non-anonymous (non-exchangeable). 2 This statistic is based on single unit picks and has been reproduced for extended periods in the Kiva test facility. 3 This statistic was verified when a small Kiva demonstration system was brought to a drugstore distribution center where operators picked at nearly 700 lines per hour. Area1 10,577 10,396 10,272 Area2 = 1.5 SolCost 10,588 10,603 10,313 10,639 (1.5) and algorithm 5, 2.0}, es. The shows WY(2.0) untime g again neficial. olution hich is uts and but no paramrmance w2 ∈ of the ing the t-most, ouragee edges se with to cirSecond, es well e paths, ve over costs or =3 SolCost 11,354 11,707 11,414 11,319 11,363 11,245 Y(w2 ) for . Cells are minate ike ine highe corriaths for Figure 4: Kiva-like domain on which we compare ECBS(w) and Figure 2: User-provided highways in a simulated Kiva ECBS(w1 )+HWY(2.0). warehouse system. one attackinside the middle, required 150 to agents Area1as and Area2.by Inmany thoseother areas,real-world CBS has scenarios. less flexibility than ECBS(w) to avoid collisions by moving agents around other agents, which could explain why it fails solutions within runtime Structure limit. 4to find Exploitation of the Problem and Predictability of Motions Conclusions Agents share their work spaces with humans, and the We presented newmotions bounded-suboptimal MAPF approach consistency of atheir and the resulting predictability that takes advantage of additional inputs that represent a of their motions are important for the safety of the humans, highway and a parameter w. It uses the highway to dewhich existing MAPF methods do not take into account. rive new w-admissible heuristic that encourage path This motivates us to exploit the values problem structure of given finding to return paths that include the edges the highMAPF instances and design a scheme that ofencourages way. The level of encouragement with w. Our agents to move along user-providedincreases sets of edges (called new bounded-suboptimal variants of CBS and ECBS(w), [ ] highways) Cohen et al., 2015 . We use highways in the called CBS+HWY(w) and ECBS(w encour1 )+HWY(w context of a simple inflation scheme based on2 ),the ideas age a global behavior of the agentsetthat [Phillips behind experience graphs al.,avoids 2012]collisions. to derive On the theoretical side, weencourage developed MAPF a simple approach new heuristic values that methods to that uses highways for MAPF and provides suboptimality return paths that include the edges of the highways, which guarantees. On the experimental side, we demonstrated that avoids head-to-head collisions among agents and achieves ECBS(w1 )+HWY(w the runtimes and so2 ) can decrease consistency and predictability of their motions. For example, lution costs of ECBS(w) in Kiva-like domains with many in a Kiva warehouse system, we can design highways along agents if the highway captures the problem structure well.as the narrow passageways between the storage locations In future work, we in plan to develop that deshown by the arrows Figure 2. Weapproaches have demonstrated termine good highways automatically, whether in a simulated Kiva warehouse system investigate that such highways inflating the edge methods costs of the given graph (bymaintaining increasing accelerate MAPF significantly while costs of bounded-suboptimality highway edges to w) in to inflating the desired of addition the MAPF solution the heuristic values provides benefits, out costs. The problem structure additional of TAPF and PERRfigure instances can also be exploitedmovement with the same We haveedges also whether penalizing costs methods. against highway developed thatmaps automatically generate highways that (similar tomethods direction (Jansen and Sturtevant 2008)) are competitive user-provided in ourapproaches feasibility helps to improvewith the performance of ones our MAPF ]. suboptimality guarantees, exstudy Cohen et al.,to2016 while [continuing provide tend ECBS(w1 )+HWY(w2 ) to split the user-provided subbound w dynamically w1 and w2 (sim5optimality Dealing with Imperfect between Plan-Execution ilar to how ECBS(w) splits the suboptimality bound w dyCapabilities namically between the high-level and low-level searches), and explore highways context of other MAPF algoState-of-the-art MAPF in or the TAPF methods can find collisionrithms, such for as M* and inflated free paths hundreds of M*. agents optimally or with user-provided sub-optimality guarantees in a reasonable amount of computation time. They perform well even in Acknowledgments cluttered and tight environments, such as Kiva warehouse We thank Maxim Likhachev Michael for helpsystems. However, agentsand often have Phillips imperfect planful discussions. We also thank Ariel Felner, Guni Sharon, execution capabilities and are not able to synchronize their and Maxim Barer for theircan CBSresult and ECBS(w) source motions perfectly, which in frequent and code, timewhich we used in our experiments. was sup-a expensive replanning. Therefore, Our we research have proposed ported by NSF numbers 1409987 andnetwork 1319966. framework that under makesgrant use of a simple temporal to The views and conclusions contained in this document are postprocess a MAPF solution efficiently to create a planexecution schedule that works for non-holonomic robots, takes their maximum translational and rotational velocities into account, provides a guaranteed safety distance between them and exploits slack (defined as the difference of the latest and earliest entry times of locations) to absorb imperfect plan executions and avoid time-intensive replanning in many cases [Hönig et al., 2016]. This framework has been evaluated in simulation and on real robots. TAPF and PERR methods can also be applied in the same framework. Issues to be addressed in future work include adding user-provided safety distances, additional kinematic constraints, planning with uncertainty and replanning. 6 Conclusions We discussed four research directions that address issues that arise when generalizing MAPF methods to real-world scenarios and exploit either the problem structure or existing MAPF methods. Our goal was to point out interesting research directions for researchers working in the field of MAPF. 7 Acknowledgments The research at USC was supported by ARL under grant number W911NF-14-D-0005, ONR under grant numbers N00014-14-1-0734 and N00014-09-1-1031, NASA via Stinger Ghaffarian Technologies and NSF under grant numbers 1409987 and 1319966. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies or the U.S. government. References [Barer et al., 2014] M. Barer, G. Sharon, R. Stern, and A. Felner. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In Annual Symposium on Combinatorial Search, pages 19– 27, 2014. [Boyarski et al., 2015] E. Boyarski, A. Felner, R. Stern, G. Sharon, D. Tolpin, O. Betzalel, and S. E. Shimony. ICBS: improved conflict-based search algorithm for multiagent pathfinding. In International Joint Conference on Artificial Intelligence, pages 740–746, 2015. [Cohen et al., 2015] L. Cohen, T. Uras, and S. Koenig. Feasibility study: Using highways for bounded-suboptimal multi-agent path finding. In Annual Symposium on Combinatorial Search, pages 2–8, 2015. [Cohen et al., 2016] L. Cohen, T. Uras, T. K. S. Kumar, H. Xu, N. Ayanian, and S. Koenig. Improved bounded-suboptimal multi-agent path finding solvers. In International Joint Conference on Artificial Intelligence, 2016. [Coltin and Veloso, 2014] B. Coltin and M. Veloso. Scheduling for transfers in pickup and delivery problems with very large neighborhood search. In AAAI Conference on Artificial Intelligence, pages 2250–2256, 2014. [de Wilde et al., 2013] B. de Wilde, A. W. ter Mors, and C. Witteveen. Push and rotate: Cooperative multiagent path planning. In International Conference on Autonomous Agents and Multi-agent Systems, pages 87– 94, 2013. [Erdem et al., 2013] E. Erdem, D. G. Kisa, U. Oztok, and P. Schueller. A general formal framework for pathfinding problems with multiple agents. In AAAI Conference on Artificial Intelligence, pages 290–296, 2013. [Goldenberg et al., 2014] M. Goldenberg, A. Felner, R. Stern, G. Sharon, N. R. Sturtevant, R. C. Holte, and J. Schaeffer. Enhanced partial expansion A*. Journal of Artificial Intelligence Research, 50:141–187, 2014. [Hönig et al., 2016] W. Hönig, T. K. S. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, and S. Koenig. Multi-agent path finding with kinematic constraints. In International Conference on Automated Planning and Scheduling, pages 477–485, 2016. [Luna and Bekris, 2011] R. Luna and K. E. Bekris. Push and swap: Fast cooperative path-finding with completeness guarantees. In International Joint Conference on Artificial Intelligence, pages 294–300, 2011. [Ma and Koenig, 2016] H. Ma and S. Koenig. Optimal target assignment and path finding for teams of agents. In International Conference on Autonomous Agents and Multiagent Systems, pages 1144–1152, 2016. [Ma et al., 2016] H. Ma, C. Tovey, G. Sharon, T. K. S. Kumar, and S. Koenig. Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI Conference on Artificial Intelligence, pages 3166–3173, 2016. [Morris et al., 2016] R. Morris, C. Pasareanu, K. Luckow, W. Malik, H. Ma, S. Kumar, and S. Koenig. Planning, scheduling and monitoring for airport surface operations. In AAAI-16 Workshop on Planning for Hybrid Systems, 2016. [Phillips et al., 2012] M. Phillips, B. J. Cohen, S. Chitta, and M. Likhachev. E-graphs: Bootstrapping planning with experience graphs. In Robotics: Science and Systems, 2012. [Ryan, 2008] M. Ryan. Exploiting subgraph structure in multi-robot path planning. Journal of Artificial Intelligence Research, 31:497–542, 2008. [Sharon et al., 2013] G. Sharon, R. Stern, M. Goldenberg, and A. Felner. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence, 195:470– 495, 2013. [Sharon et al., 2015] G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant. Conflict-based search for optimal multiagent pathfinding. Artificial Intelligence, 219:40–66, 2015. [Silver, 2005] D. Silver. Cooperative pathfinding. In Artificial Intelligence and Interactive Digital Entertainment, pages 117–122, 2005. [Standley and Korf, 2011] T. S. Standley and R. E. Korf. Complete algorithms for cooperative pathfinding problems. In International Joint Conference on Artificial Intelligence, pages 668–673, 2011. [Standley, 2010] T. S. Standley. Finding optimal solutions to cooperative pathfinding problems. In AAAI Conference on Artificial Intelligence, pages 173–178, 2010. [Sturtevant and Buro, 2006] N. R. Sturtevant and M. Buro. Improving collaborative pathfinding using map abstraction. In Artificial Intelligence and Interactive Digital Entertainment, pages 80–85, 2006. [Surynek, 2015] P. Surynek. Reduced time-expansion graphs and goal decomposition for solving cooperative path finding sub-optimally. In International Joint Conference on Artificial Intelligence, pages 1916–1922, 2015. [Turpin et al., 2014] M. Turpin, K. Mohta, N. Michael, and V. Kumar. Goal assignment and trajectory planning for large teams of interchangeable robots. Autonomous Robots, 37(4):401–415, 2014. [Veloso et al., 2015] M. Veloso, J. Biswas, B. Coltin, and S. Rosenthal. CoBots: Robust Symbiotic Autonomous Mobile Service Robots. In International Joint Conference on Artificial Intelligence, pages 4423–4429, 2015. [Wagner and Choset, 2015] G. Wagner and H. Choset. Subdimensional expansion for multirobot path planning. Artificial Intelligence, 219:1–24, 2015. [Wang and Botea, 2008] K. Wang and A. Botea. Fast and memory-efficient multi-agent pathfinding. In International Conference on Automated Planning and Scheduling, pages 380–387, 2008. [Wang and Botea, 2011] K. Wang and A. Botea. MAPP: a scalable multi-agent path planning algorithm with tractability and completeness guarantees. Journal of Artificial Intelligence Research, 42:55–90, 2011. [Wurman et al., 2008] P. R. Wurman, R. D’Andrea, and M. Mountz. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 29(1):9–20, 2008. [Yu and LaValle, 2013a] J. Yu and S. M. LaValle. Multiagent path planning and network flow. In E. Frazzoli, T. Lozano-Perez, N. Roy, and D. Rus, editors, Algorithmic Foundations of Robotics X, Springer Tracts in Advanced Robotics, volume 86, pages 157–173. Springer, 2013. [Yu and LaValle, 2013b] J. Yu and S. M. LaValle. Planning optimal paths for multiple robots on graphs. In IEEE International Conference on Robotics and Automation, pages 3612–3617, 2013.