An experimental approach to cooperative learning of multi-agent systems

From: AAAI Technical Report SS-95-02. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Hitoshi MATSUBARA, Kazuo HIRAKI, Yoichi MOTOMURA, Yasuo KUNIYOSHI, Isao HARA and Hideki ASOH
Electrotechnical Laboratory
Tsukuba, Ibaraki, JAPAN 305
e-mail: matsubar@etl.go.jp
Abstract

We have built a multiple-autonomous-robots simulator in order to study cooperative learning of multi-agent systems. Using the simulator, experiments can be done for any task, with any learning method, under various conditions. The simulator makes an empirical approach to cooperative learning possible. 'Learning of multi-agent systems' has not yet been founded as a field of study. Certain points have not been made clear: What is learning of multi-agent systems? What questions should be raised? What should be achieved? What should be considered as the cost (for example, should the cost of communication between agents be counted)? We intend to found the field of 'learning of multi-agent systems' through experiments on a concrete example of appropriate complicatedness.
1 Introduction

We have built a multiple-autonomous-robots simulator in order to study cooperative learning of multi-agent systems. Using the simulator, experiments can be done for any task, with any learning method, under various conditions. There has been little research dealing with the cooperative learning of multiple agents in realistic domains. The simulator introduced here, however, makes an empirical approach to cooperative learning possible.

In a multi-agent system, a task is carried out by cooperation among agents. The desirable type of cooperation varies according to the complicatedness of the outer world apart from the agents, the abilities of the agents, the quality and quantity of the task, and dynamic changes of the environment, which includes the agents and the task. The importance of learning in multi-agent systems has often been pointed out, and some studies have dealt with the matter, but they are limited to simple domain settings. 'Learning of multi-agent systems' has not yet been founded as a field of study. After all, certain points have not been made clear: What is learning of multi-agent systems? What questions should be raised? What should be achieved? What should be considered as the cost (for example, should the cost of communication between agents be counted)? We intend to found the field of 'learning of multi-agent systems' through experiments on a concrete example of appropriate complicatedness.

2 A Multiple Autonomous Robot Simulator: MARS

The example chosen as a subject of appropriate complicatedness is the case in which multiple behavior-based autonomous robots cooperate to carry out a task in a physical environment. The final goal of the study should be realized on real robots, but carrying out cooperative tasks on real robots alone still poses many problems. Therefore, for this experiment a simulator is built, on which an experimental environment for cooperative learning is prepared. The simulator is extended from MARS (Multiple Autonomous Robot Simulator) [1], which is now under development for robotics research, so as to apply to experiments on cooperative learning. The simulator simplifies the real world, but it is designed practically on a realistic basis.

Figure 1: Current model of MARS

The architecture of MARS is based on the Subsumption
Architecture [2], proposed by R. Brooks. The current model is shown in Fig. 1. For this study, some improvements have been made to the original model of MARS. A simulated robot is designed to have its body and brain conceptually separated; this framework, proposed by M. Inaba, is called Remote-Brained robotics [3]. The body is the main constituent for taking action, and it acts in the physical environment. With its multiple sensors, the body is capable of getting information from the physical environment. The brain is designed to control the action by receiving sensor information from the body. The brain consists of stratified control modules. A simulated robot (see Fig. 2) can be made by using the GUI (Graphical User Interface) of MARS. The simulator is written in EUSLISP [4], which is an object-oriented LISP language for robotics research.
In the simulator, sub-control modules are prepared, such as (1) random walk, (2) avoidance of obstacles, and (3) escape from deadlock. They can be incorporated when the robots are defined. The sub-control modules are in charge of low-level control (such as conditioned reflexes), and they are brought into action when no decision is made at the higher level. The sub-control modules are therefore subsumed under the higher modules, which are in charge of advanced and broad decision-making. In order to evaluate experiments on cooperative learning with various approaches or models, it is preferable that the higher control modules can be written in any programming language, independently of the simulator.

Figure 2: A simulated robot in MARS
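The layering just described, where lower-level reflexes act only when no higher-level decision is made, can be sketched in a few lines. This is a minimal illustration of the subsumption idea, not the MARS API: the module names, the sensor dictionary, and the action strings are all illustrative assumptions.

```python
# A toy sketch of subsumed control layers: higher modules are tried first,
# and a lower module acts only when every module above it makes no decision.

def random_walk(sensors):
    return "(:avel 15)"          # lowest level: wander by turning a little

def avoid_obstacles(sensors):
    if sensors.get("obstacle"):
        return "(:avel 90)"      # conditioned reflex: turn away
    return None                   # no decision at this level

def higher_decision(sensors):
    if sensors.get("target"):
        return "(:spd 1.0)"      # advanced decision: approach the target
    return None

def select_action(sensors, modules):
    # Try modules from highest to lowest; the first decision wins,
    # so the lower modules are "subsumed" under the higher ones.
    for module in modules:
        action = module(sensors)
        if action is not None:
            return action

LAYERS = [higher_decision, avoid_obstacles, random_walk]
```

In this sketch `random_walk` never returns `None`, so the robot always has a default behavior even when the higher modules stay silent.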
In order to connect these higher control modules (EXT-modules) to the simulator, we have developed interface modules. They make distributed processing through the network possible via TCP/IP datagram-type sockets: the higher decision-making agent for each of the robots can be operated on a different workstation. Since cooperative learning needs a number of complicated and large-scale EXT-modules, this interface system can distribute and reduce the computational load on the simulator. The EXT-modules are connected to the interface module by UNIX standard I/O, so any programming language can be used. The EXT-modules are written to the following format.
OUTPUT .... Robot_No (Action list)
INPUT  .... Robot_No Time (Value of Sensor 1) (Value of Sensor 2) .....

Robot_No (an integer) is an identifier of the robot, which is decided at the time of connection (it can be known from the sensor input). Time is the value of the time counter in MARS.
OUTPUT: The action list is a sequence of the following action commands.

- (:spd x) ..... Move forward x in a unit time.
- (:avel x) ..... Change direction by x degrees in a unit time.
- (:nop) ..... Do nothing (stop).
- (:press x) ..... Press an object with strength x.
- (:grasp X) ..... Grasp an object X.
- (:send-sensor x (:rotate y)) ..... Rotate a loaded sensor x by y degrees.
- (:release) ..... Release a grasped object.

INPUT: Values of sensors ..... values returned from the following sensors.

- eye-sensor .... Returns the label list of objects in sight (scope angle a degrees; distance L). This sensor can rotate.
- radar-sensor .... Returns the list of distances to the nearest object on each of X (a number) radiating lines.
- touch-sensor .... Makes a touch judgment and returns the list of touching points (local coordinate values).
- infrared-sensor .... An infrared sensor. Returns t or nil according to whether there is an object in the specified scope.
- reward-sensor .... Returns the evaluation value (a real number decided by the treated object) of an action (:press) that treats a task.

In addition, the moving speed of a robot and so on can be obtained.

The EXT-modules function in the following way: judge the situation by the sensor information obtained from the standard input; do decision-making; select an action; write it into the standard output. If, at this point, some kind of reward is arranged to be offered for the action, learning may be realized. Learning with a teacher is also possible, by recording the action list and the sensor input while a researcher navigates a robot with an operation panel on X-Window (see Fig. 3). This system is effective for learning of all the EXT-modules (with a teacher) by a stochastic model or a neural-network model.

Figure 3: Operation panel for teaching of MARS

As an experimental task, we have selected a Cleanup Room Problem. There are a lot of heavy or light objects in the room, and there are several robots in the room. Some robots can pick up heavy objects, which others cannot. The robots have to pick up as many objects as they can within a limited time. If a robot cannot pick up an object alone, two or three robots may pick up the object by cooperation. After cleaning up an object, the robots receive rewards according to the weight of the object. The goal for the robots is to receive as many rewards as possible in the limited time. An example of an initial setup of this task on MARS is illustrated in Fig. 4.

Figure 4: Cleanup Room Problem

3 An Example of Cooperation of Multiple Autonomous Robots

As for the task mentioned in the last section, multiple robots carry the objects by cooperation, following a program written by a researcher. Fig. 5 shows the case in which a researcher makes a program in advance to let the robots cooperate, but the goal of our experiments is to make robots cooperate in various ways through the process of learning.
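The cooperative pickup rule of the Cleanup Room Problem can be sketched as follows. This is a toy model for illustration only: the additive-strength assumption and the "reward equals weight" rule are simplifying assumptions consistent with, but not specified by, the task description above.

```python
# A toy sketch of the Cleanup Room reward rule: a group of robots can
# pick up an object only if it is jointly strong enough, and cleaning
# up an object yields a reward according to its weight.

def can_pick_up(object_weight, robot_strengths):
    # Several weaker robots may lift together what one cannot lift alone
    # (assumed here: strengths simply add up).
    return sum(robot_strengths) >= object_weight

def reward(object_weight, robot_strengths):
    # Reward according to the weight of the object (here, the weight
    # itself); no reward if the group cannot pick the object up.
    return object_weight if can_pick_up(object_weight, robot_strengths) else 0
```

Under this model a lone robot of strength 3 earns nothing from an object of weight 5, while two such robots cooperating earn 5, which is the incentive for cooperation that the learning experiments aim to exploit.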
4 Toward Cooperative Learning

Through the experiments, we are investigating what kinds of "learning by multi-agents" are possible. The main points of investigation are as follows.
- Sharing and distribution of knowledge
  Learning is supposed to develop with some knowledge being shared by all agents (sharing) and other knowledge being possessed by individual agents (distribution). Where, then, shall the boundary be formed between sharing and distribution?

- Learning methods
  Which methods of learning are valuable as candidates for multi-agent systems? For example, is reinforcement learning valid? Further, what should serve as the feedback on which learning is based?

- Sharing of satisfaction
  Not only the evaluation of the achievement of its own task, but the learning of each agent must also reflect the achievement of all the agents as a whole. How could satisfaction be shared among agents?

- Redundancy and robustness
  Each agent develops its individuality through learning, so it is possible to have a lazy agent which does not tend to carry out a task. This agent, however, may show its ability in abnormal situations. How are redundancy and robustness brought about through learning?

5 Concluding Remarks

In this paper we have dealt with a simulator as a tool for studying cooperative learning. The characteristics of the simulator include: support for multiple robots; convenience due to an interface by standard I/O (the EXT-modules can be written in any language and can be run on other machines through a network); and wide applicability as a tool for cooperative learning (any task or any learning method can be dealt with). This simulator is to be released to the public as PDS software after making further
%--- top level
loop :- sense(DATA),          % sensing
        plan(DATA, Plan),     % planning
        act(Plan), !,         % acting
        add_hist(DATA, Plan), % add history of Data and Plan
        loop.
loop :- loop.                 % in case of failing in sensing and/or planning

%--- planning part
plan(SenseData, Action) :-
    ( touchable_mb(SenseData) ->        % If touchable to a MB (target)
        grasp_mb(SenseData, Action)     %   grasp the MB
    ; exist_m_with_r(SenseData) ->      % If there is a robot near a MB
        go_to_m_w_r(SenseData, Action)  %   go to the MB
    ; exist_mb_obj(SenseData) ->        % If there is a MB far away
        turn_to_MB(SenseData, Action)   %   turn to the MB
    ; exist_obstacle(SenseData) ->      % If there is an obstacle
        avoid_obj(SenseData, Action)    %   avoid it
    ; exist_ob(SenseData) ->            % If there is an object
        turn_to_obj(SenseData, Action)  %   turn to the object
    ; random_walk(Action)               % Default behavior (random walk)
    ).

Figure 5: An example program for cooperation
revisions. After this, we intend to conduct experiments on cooperative learning in this environment, which is the main subject of the study.
References

[1] Y. Kuniyoshi, Y. Mizuno, and M. Kakikura: Multiple Autonomous Robot Simulator on Euslisp - MARS -, The 12th Annual Conference of the Robotics Society of Japan (1994)

[2] R. Brooks: A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, vol. RA-2, no. 1, pp. 14-23 (1986)

[3] M. Inaba: Remote-brained robotics: interfacing AI with real world behaviors, Proc. of ISRR-93 (1993)

[4] T. Matsui: Euslisp Reference Manual, ETL technical report ETL-TR-92-35, Electrotechnical Laboratory, Japan (1992)