From: AAAI Technical Report SS-95-02. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

An Experimental Approach to Cooperative Learning of Multi-Agent Systems

Hitoshi MATSUBARA, Kazuo HIRAKI, Yoichi MOTOMURA, Yasuo KUNIYOSHI, Isao HARA and Hideki ASOH
Electrotechnical Laboratory
Tsukuba, Ibaraki, JAPAN 305
e-mail: matsubar@etl.go.jp

Abstract

We have built a multiple-autonomous-robots simulator in order to study cooperative learning in multi-agent systems. Using the simulator, experiments can be run for any task, with any learning method, under a variety of conditions. The simulator thus makes it possible to approach cooperative learning empirically. 'Learning of multi-agent systems' has not yet been established as a field of study, and certain points remain unclear: What is learning of multi-agent systems? What questions should be raised? What should be achieved? What should be counted as cost (for example, should the cost of communication between agents be counted)? We intend to found such a field through experiments on a concrete example of appropriate complexity.

1 Introduction

We have built a multiple-autonomous-robots simulator in order to study cooperative learning in multi-agent systems. Using the simulator, experiments can be run for any task, with any learning method, under a variety of conditions. There has been little research on the cooperative learning of multiple agents in realistic domains. The simulator introduced here, however, makes it possible to approach cooperative learning empirically.

In a multi-agent system, a task is carried out by cooperation among agents. The desirable type of cooperation varies with the complexity of the outer world beyond the agents, the abilities of the agents, the quality and quantity of the task, and dynamic changes of the environment, which includes both the agents and the task. The importance of learning in multi-agent systems has often been pointed out, and some studies have dealt with the matter, but they are limited to simple domain settings. 'Learning of multi-agent systems' has not yet been established as a field of study. Certain points have not been made clear: What is learning of multi-agent systems? What questions should be raised? What should be achieved? What should be counted as cost (for example, should the cost of communication between agents be counted)? We intend to found such a field through experiments on a concrete example of appropriate complexity.

2 A Multiple Autonomous Robot Simulator: MARS

As a subject of appropriate complexity we have chosen the case in which multiple behavior-based autonomous robots cooperate to carry out a task in a physical environment. The final goal of the study should be realized on real robots, but operating cooperative tasks on real robots alone still poses many problems. Therefore we have built a simulator that provides an experimental environment for cooperative learning. It extends MARS (Multiple Autonomous Robot Simulator)[1], which is under development for robotics research, so that it can be applied to experiments on cooperative learning. The simulator simplifies the real world, but it is designed on a realistic basis. The architecture of MARS is based on the Subsumption Architecture[2] proposed by R. Brooks. The current model is shown in Fig.1.

Figure 1: Current model of MARS

For this study we have made some improvements to the original model of MARS. A simulated robot is designed to have its body and brain conceptually separated, a framework proposed by M. Inaba and called Remote-Brained robotics[3]. The body is the main constituent for taking action, and it acts in the physical environment; with its multiple sensors, the body gathers information from that environment. The brain controls the action by receiving sensor information from the body, and it consists of stratified control modules.

A simulated robot (see Fig.2) can be built using the GUI (Graphical User Interface) of MARS. The simulator is written in EusLisp[4], an object-oriented Lisp language for robotics research.

Figure 2: A simulated robot in MARS

In the simulator, sub-control modules are provided, such as (1) random walk, (2) avoidance of obstacles, and (3) escape from deadlock. They can be incorporated when the robots are defined. The sub-control modules are in charge of low-level control (such as conditioned reflexes), and they come into action when no decision is made at the higher level. The sub-control modules are thus subsumed under the higher modules, which are in charge of advanced, broad decision-making, as sketched below.
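The layered fallback just described can be pictured with a short sketch. This is only an illustration, not MARS code: the module names, the priority order among the sub-control modules, and the convention that a module returns None when it makes no decision are our own assumptions for the example.

    # Minimal sketch of the layered (subsumption-style) control described
    # above. Each module either returns an action list or None ("no
    # decision"); lower sub-control modules act only when every higher
    # module abstains. All names and conditions here are hypothetical.
    import random

    def escape_from_deadlock(sensors):
        # Hypothetical trigger: back away when touched on several sides.
        if len(sensors.get("touch", [])) >= 3:
            return [(":spd", -1.0)]
        return None

    def avoid_obstacles(sensors):
        # Hypothetical trigger: turn away when the infrared sensor fires.
        if sensors.get("infrared"):
            return [(":avel", 30.0)]
        return None

    def random_walk(sensors):
        # Default behavior: always produces an action.
        return [(":avel", random.uniform(-20.0, 20.0)), (":spd", 0.5)]

    def decide(higher_modules, sensors):
        """Try the higher decision-making modules first; fall through to
        the built-in sub-control modules when none makes a decision."""
        layers = list(higher_modules) + [escape_from_deadlock,
                                         avoid_obstacles,
                                         random_walk]
        for module in layers:
            action = module(sensors)
            if action is not None:
                return action

    # Usage: with no higher modules attached, the robot simply wanders.
    print(decide([], {"touch": [], "infrared": False}))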
To allow experiments on cooperative learning with various approaches and models, the higher control modules should preferably be writable in any programming language, independently of the simulator; we call such modules EXT-modules. To connect EXT-modules to the simulator, we have developed interface modules. This enables distributed processing over the network through TCP/IP datagram-type sockets: the higher decision-making agent for each robot can run on a different workstation. Since cooperative learning needs a number of complicated, large-scale EXT-modules, this interface can distribute the computation and reduce the load on the simulator. The EXT-modules are connected to the interface module by UNIX standard I/O, so any programming language can be used. The EXT-modules are written to the following format:

OUTPUT: Robot_No (Action list)
INPUT:  Robot_No Time (Value of Sensor 1) (Value of Sensor 2) ...

Robot_No (an integer) is an identifier of the robot, decided at the time of connection (it can be learned from the sensor input). Time is the value of the time counter in MARS.

OUTPUT: The action list is a sequence of the following action commands.

- (:spd x) ..... Move forward x in a unit time.
- (:avel x) ..... Change direction by x degrees in a unit time.
- (:nop) ..... Do nothing in particular (stop).
- (:press x) ..... Press an object with strength x.
- (:grasp X) ..... Grasp an object X.
- (:release) ..... Release a grasped object.
- (:send-sensor x (:rotate y)) ..... Rotate the loaded sensor x by y degrees.

INPUT: The sensor values are those returned by the following sensors.

- eye-sensor ..... Returns the label list of the objects in sight (scope angle a degrees; range L). This sensor can rotate.
- radar-sensor ..... Returns the list of distances to the nearest objects along X (a number) radiating lines.
- touch-sensor ..... Makes a touch judgment and returns the list of touching points (in local coordinates).
- infrared-sensor ..... An infrared sensor; returns t or nil according to whether there is an object within the specified scope.
- reward-sensor ..... Returns the evaluation value of an action on a task object, e.g. (:press), as a real number determined by the treated object.

In addition, quantities such as the robot's moving speed can also be obtained. A minimal EXT-module skeleton is sketched below.
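Because the interface is plain standard I/O, an EXT-module can be prototyped in a few lines of any language. The sketch below, in Python for concreteness, assumes a simplified one-line-per-cycle rendering of the format above; the exact tokenization used by the MARS interface module may differ in detail.

    # Skeleton of an EXT-module speaking the standard-I/O protocol
    # described above. The line format is an assumption for illustration:
    # INPUT  "Robot_No Time (sensor 1 values) (sensor 2 values) ..."
    # OUTPUT "Robot_No (Action list)"
    import sys

    def parse_input(line):
        # Split off "Robot_No Time", keep the raw sensor s-expressions.
        head, _, rest = line.partition("(")
        robot_no, time = head.split()
        return int(robot_no), int(time), "(" + rest

    def choose_action(sensors):
        # Placeholder decision-making: just creep forward every cycle.
        return "(:spd 0.5)"

    for line in sys.stdin:
        if not line.strip():
            continue
        robot_no, time, sensors = parse_input(line)
        action = choose_action(sensors)
        # Wrap the command sequence as the parenthesized action list.
        sys.stdout.write("%d (%s)\n" % (robot_no, action))
        sys.stdout.flush()  # the interface module reads line by line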
An EXT-module operates in the following way: it judges the situation from the sensor information obtained on standard input, makes a decision, selects an action, and writes it to standard output. If at this point some kind of reward is arranged to be given for the action, learning may be realized. Learning with a teacher is also possible, by receiving the action list together with the sensor input, when a researcher navigates the robot with an operation panel on X-Window (see Fig.3).

Figure 3: Operation panel for teaching of MARS

This system is effective for learning in any EXT-module (with a teacher) by a stochastic model or a neural-network model.
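To make the reward hook concrete, the fragment below sketches one way an EXT-module might exploit the reward-sensor value: a small tabular Q-learning update over a coarsely discretized sensor state. This is only an illustration of the kind of learner that could sit behind the interface; MARS does not prescribe it, and the state abstraction, action set, and parameters are invented for the example.

    # Sketch of a reward-driven EXT-module core: tabular Q-learning over a
    # coarse, hand-made state abstraction. Everything here (features,
    # action set, parameters) is illustrative, not part of MARS itself.
    import random
    from collections import defaultdict

    ACTIONS = ["(:spd 1)", "(:avel 30)", "(:avel -30)", "(:press 1)", "(:nop)"]
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    q = defaultdict(float)  # (state, action) -> estimated value

    def abstract_state(sensors):
        # Invented discretization: is an object in sight, is something
        # touching us, does the infrared sensor fire?
        return (bool(sensors.get("eye")),
                bool(sensors.get("touch")),
                bool(sensors.get("infrared")))

    def select_action(state):
        # Epsilon-greedy choice over the Q table.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[(state, a)])

    def learn(state, action, reward, next_state):
        # One-step Q-learning update, driven by the reward-sensor value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - q[(state, action)])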
As an experimental task, we have selected a Cleanup Room Problem. In a room there are many objects, some heavy and some light, and several robots. Some robots can pick up heavy objects that others cannot. The robots have to pick up as many objects as they can within a limited time. If a robot cannot pick up an object alone, two or three robots may pick it up in cooperation. After cleaning up an object, the robots receive a reward according to the weight of the object. The goal for the robots is to collect as much reward as possible within the time limit. An example of an initial setup of this task on MARS is illustrated in Fig.4.

Figure 4: Cleanup Room Problem

3 An Example of Cooperation of Multiple Autonomous Robots

For the task described in the last section, multiple robots carry off the objects in cooperation, following a program written by a researcher. Fig.5 shows a case in which a researcher writes such a program in advance to make the robots cooperate; the goal of our experiments, however, is to make the robots cooperate in various ways through the process of learning.

    %--- top level
    loop :- sense(DATA),           % sensing
            plan(DATA, Plan),      % planning
            act(Plan), !,          % acting
            add_hist(DATA, Plan),  % add history of Data and Plan
            loop.
    loop :- loop.                  % in case of failing in sensing
                                   % and/or planning
    %--- planning part
    plan(SenseData, Action) :-
        ( touchable_mb(SenseData) ->        % If touchable to a MB (target)
            grasp_mb(SenseData, Action)     %   grasp the MB
        ; exist_m_with_r(SenseData) ->      % If there is a robot near a MB
            go_to_m_w_r(SenseData, Action)  %   go to the MB
        ; exist_mb_obj(SenseData) ->        % If there is a MB
            turn_to_MB(SenseData, Action)   %   turn to the MB
        ; exist_obstacle(SenseData) ->      % If there is an obstacle
            avoid_obj(SenseData, Action)    %   avoid it
        ; exist_ob(SenseData) ->            % If there is an object far away
            turn_to_obj(SenseData, Action)  %   turn to the object
        ; random_walk(Action)               % Default behavior (random walk)
        ).

Figure 5: An example program for cooperation

4 Toward Cooperative Learning

Through these experiments, we are investigating what kinds of "learning by multi-agents" are possible. The main points of investigation are as follows.

- Sharing and distribution of knowledge: Learning is supposed to develop with some knowledge shared by all agents (sharing) and other knowledge possessed by individual agents (distribution). Where, then, should the boundary between sharing and distribution be drawn?

- Sharing of satisfaction: The learning of each agent must reflect not only the evaluation of the achievement of its own task but also the achievement of all agents as a whole. How can satisfaction be shared among agents?

- Redundancy and robustness: Each agent develops its individuality through learning, so it is possible to end up with a lazy agent that does not tend to carry out the task. Such an agent, however, may show its ability in abnormal situations. How are redundancy and robustness brought about through learning?

- Learning methods: Which learning methods are valuable candidates for multi-agent systems? For example, is reinforcement learning valid? Further, what should serve as the feedback on which learning is based?

5 Concluding Remarks

In this paper we have presented a simulator as a tool for studying cooperative learning. Its characteristics include: support for multiple robots; convenience due to the standard-I/O interface (modules can be written in any language and can run on other machines through the network); and wide applicability as a tool for cooperative learning (any task and any learning method can be dealt with). The simulator is to be released to the public as PDS software after further revision. We then intend to conduct experiments on cooperative learning in this environment, which is the main subject of the study.

References

[1] Y. Kuniyoshi, Y. Mizuno, and M. Kakikura: Multiple Autonomous Robot Simulator on EusLisp -MARS-, The 12th Annual Conference of the Robotics Society of Japan (1994)

[2] R. Brooks: A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, vol. RA-2, no. 1, pp. 14-23 (1986)

[3] M. Inaba: Remote-brained robotics: interfacing AI with real world behaviors, Proc. of ISRR-93 (1993)

[4] T. Matsui: EusLisp Reference Manual, ETL Technical Report ETL-TR-92-35, Electrotechnical Laboratory, Japan (1992)