Multi Robot Systems Course
Bar Ilan University
Mor Vered

Motivation

In a multi-robot environment, path planning (or collision avoidance) is an important problem. Multi-robot systems researchers have been investigating distributed coordination methods for improving spatial coordination in teams, a.k.a. collision avoidance. Such methods adapt the coordination method to dynamic changes in the density of the robots. Basically, the goal is to help avoid collisions with static obstacles and with dynamic objects, such as other moving robots.

Spatial conflicts can cause the team’s productivity to drop as robots are added. We will see that this phenomenon is impacted by the coordination methods used by the team members, as different coordination methods yield radically different productivity results.

Heuristic Coordination Approaches

Noise method - If I am on a trajectory that is in danger of colliding with another agent's, add random noise to my direction vector.
“Behavior Based Formation Control for Multi-robot Teams”. Balch, Tucker and Arkin, Ronald C.

Aggression method - Describes a controller which breaks deadlocks in favour of the most ‘aggressive’ robot. The robots compete and only one gains access to the resource. When robots come too close to each other, each robot chooses an aggression level at random; the robot with the lower level concedes its position, preventing a collision. Later work showed that it might be best to choose an aggression level proportional to the robot's task.

In every cycle in which a robot found itself within 2 radii of a teammate, it selected either an aggressive or a timid behavior, each with probability 0.5. If the robot selected to become timid, it backed away for 100 cycles (10 simulated seconds). Otherwise it proceeded forward, executing the aggressive behavior.
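The per-cycle aggression rule described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' controller: the class name `AggressionController`, the string motion commands, and the `distance_to_nearest` / `robot_radius` parameters are hypothetical, and "within 2 radii" is read here as a distance of at most twice the robot's radius.

```python
import random

TIMID_CYCLES = 100    # a timid robot backs away for 100 cycles (10 simulated seconds)
CONFLICT_RADII = 2.0  # a conflict exists within 2 robot radii of a teammate


class AggressionController:
    """Hypothetical per-robot decision logic; returns a motion command string."""

    def __init__(self, rng=None):
        # rng only needs a random() method returning a float in [0, 1).
        self.rng = rng if rng is not None else random.Random()
        self.timid_cycles_left = 0

    def step(self, distance_to_nearest, robot_radius):
        # Finish a retreat that is already in progress.
        if self.timid_cycles_left > 0:
            self.timid_cycles_left -= 1
            return "back_away"

        in_conflict = distance_to_nearest <= CONFLICT_RADII * robot_radius
        if in_conflict and self.rng.random() < 0.5:
            # Timid choice: concede and retreat for the full timid period
            # (this cycle counts as the first of the 100).
            self.timid_cycles_left = TIMID_CYCLES - 1
            return "back_away"

        # Aggressive choice, or no conflict at all: keep going.
        return "forward"
```

Because each robot re-draws its choice independently every cycle while in conflict, two robots facing each other are unlikely to both stay aggressive for long, which matches the observation that collisions were rare in this implementation.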
As robots chose to continue being “aggressive” or to become “timid” every cycle, the probability that two robots would collide in this implementation was near zero.
“Go ahead, Make my day: Robot Conflict Resolution by Aggressive Competition”. Vaughan, Richard and Støy, Kasper and Sukhatme, Gaurav and Mataric, Maja

Repel method - When sensing a possible collision on its course, a robot in the Repel group backtracked for 500 cycles (50 seconds), moving in a direction 180 degrees away from the closest robot, so that nearby robots mutually repelled.

TimeRand method - This method contained no repulsion mechanism to directly avoid collisions, but after a collision it could free the immobile robot. When robots sensed that they had not moved significantly for 100 cycles (10 seconds), they moved with a random walk for 150 cycles (15 seconds).

TimeRepel method - This method also contained no repulsion mechanism to directly avoid collisions and only reacted to collisions after the fact. Once these robots had not moved for 150 cycles (15 seconds), they moved backwards for 50 cycles (5 seconds).

All methods taken from “A Study of Mechanisms for Improving Robotic Group Performance”. Rosenfeld, Avi and Kaminka, Gal A and Kraus, Sarit and Shehory, Onn

Homework

Come up with your own heuristic coordination approach by next week’s lesson and e-mail it to me.

Selecting the Best Approach

Assume we have come up with several solutions. A new problem then arises: how to select the best coordination method. As stated before, spatial conflicts can cause the team’s productivity to drop as robots are added. This phenomenon is impacted by the coordination methods used by the team members, as different coordination methods yield radically different productivity results. No single collision avoidance method is best in all domains and for all group sizes.
The effectiveness of coordination methods in a given context is not known in advance.

CCC - Combined Coordination Cost Method

How to select the best coordination method? CCC quantifies the production resources spent on coordination conflicts, that is, the cost of group interactions. It is a multi-attribute cost measure that quantifies resources, such as time and fuel, that each group member spends in coordination behaviors during task execution. It facilitates comparison between different group methods.

The authors contended that if robots dynamically reduce their CCC, group productivity will improve. To demonstrate this, they created robotic groups which dynamically adapt their coordination techniques based on each robot’s CCC estimate.

Problem with this method: it ignores the gains accumulated from long periods with no coordination needs. The next method tries to fix that.
“A Study of Mechanisms for Improving Robotic Group Performance”. Rosenfeld, Avi and Kaminka, Gal A and Kraus, Sarit and Shehory, Onn

Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective

Used a reinforcement-learning approach to coordination algorithm selection. Supported it both empirically, in experiments (foraging), and analytically, with mathematical equations.
“Adaptive Multi-Robot Coordination: A Game-Theoretic Perspective”. Gal A. Kaminka, Dan Erusalimchik and Sarit Kraus

Problem Definition

The normal routine of a robot's operation is to carry out its primary task until it is interrupted by a conflict with another robot, which must be resolved by a coordination algorithm. This is called a conflict event. The event triggers a coordination algorithm to handle the conflict. Once the algorithm finishes successfully, the robots involved go back to their primary task.
Problem Definition

Defined several kinds of tasks:
1) Loose coordination between the robots (only an occasional need for spatial or temporal coordination). For example: multi-robot foraging.
2) Cooperative tasks: the robots seek to maximize group utility. For example: exploration.
3) Timed tasks: the task is bound in time. For example: exploration (completely explore a new area as quickly as possible) or patrolling.

Divided time into:
1) Active interval, in which the robot is actively investing resources in coordination.
2) Passive interval, in which the robot no longer needs to invest in coordination.

The robot has a nonempty set of coordination algorithms to select from. The choice of coordination algorithm affects the duration of the active and passive intervals.

Reinforcement Learning

Computes how an agent ought to take actions in an environment so as to maximize some definition of cumulative reward. The basic reinforcement learning model consists of:
1) A set of environment states - S.
2) A set of actions - A.
3) Rules of transitioning between states.
4) Rules that determine the immediate reward of a transition.
5) Rules that describe what the agent observes.

The trick is how to define the reward. They introduced a reward function, the effectiveness index (EI), which favors reducing the time and resources spent coordinating and maximizing the time between conflicts that require coordination. It takes into consideration both the time spent on coordination and the time not spent on coordination.

Rather than acting on the immediate reward alone, they used Q-learning (a variant of reinforcement learning). This technique works by learning an action-value function that gives the expected utility of taking a given action in a given state and following a fixed policy thereafter. It takes into account an accumulated value of all results observed up until now.
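To make the action-value idea concrete, here is a minimal sketch of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', a') − Q(s, a)). It is an illustration only, not the configuration used in the cited work: the class name `QLearner`, the epsilon-greedy selection, and the default values of `alpha`, `gamma`, and `epsilon` are assumptions.

```python
import random
from collections import defaultdict


class QLearner:
    """Hypothetical tabular Q-learner over a fixed set of actions."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = rng if rng is not None else random.Random()
        self.q = defaultdict(float)  # (state, action) -> estimated utility

    def choose(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Move the estimate toward reward + discounted best value of the next state.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In the coordination setting, the actions could be the available coordination methods, for example `QLearner(["noise", "repel"])`, with `update` called once a conflict has been resolved and its reward is known.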
“Q-learning”. Watkins, Christopher JCH and Dayan, Peter

General reward

The general reward is comprised of:
1) The total cost of coordination: the cost of internal resources such as battery life and fuel.
2) Time spent on coordinating.
3) Frequency of coordinating.
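One way to fold these three components into a single number is sketched below. The slides do not give a formula, so the ratio form, the function name `effectiveness_index`, and the convention that resource cost is expressed in the same units as time are all assumptions made for illustration. Frequency enters through the passive interval: the more often conflicts occur, the shorter the passive time between them.

```python
def effectiveness_index(active_time, passive_time, resource_cost=0.0):
    """Assumed ratio: coordination effort divided by total elapsed time.

    active_time   -- time spent resolving the conflict
    passive_time  -- conflict-free time that followed, until the next conflict
    resource_cost -- battery or fuel spent coordinating, in time-equivalent units
    Lower values mean coordination was cheaper relative to the time it bought.
    """
    total_time = active_time + passive_time
    if total_time <= 0:
        raise ValueError("active_time + passive_time must be positive")
    return (active_time + resource_cost) / total_time


def reward(active_time, passive_time, resource_cost=0.0):
    # A learner that maximizes reward should minimize the index, so negate it.
    return -effectiveness_index(active_time, passive_time, resource_cost)
```

Under this assumed form, a method that takes 10 time units to resolve a conflict and is followed by 90 conflict-free units scores better (lower) than one that takes the same 10 units but is followed by only 10 conflict-free units.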