From: Proceedings of the First International Conference on Multiagent Systems. Copyright © 1995, AAAI (www.aaai.org). All rights reserved.
Developing Industrial Multi-Agent Systems†
N. R. Jennings
Dept. Electronic Engineering, Queen Mary & Westfield College, Mile End Road, London E1 4NS, UK.
N.R.Jennings@qmw.ac.uk
J. M. Corera
IBERDROLA S. A., c/Gardoqui 8, 48008 Bilbao, Spain.
jose.corera@iberdrola.es
I. Laresgoiti
LABEIN, Parque Technologico de Zamudio 101, 48016 Zamudio, Spain.
lares@labein.es
(INVITED PAPER)
Abstract
The development and deployment of multi-agent systems in real world settings raises a number of important research issues and problems which must be overcome if Distributed AI (DAI) is to become a widespread solution technology. Work undertaken in the context of the ARCHON project has provided a number of important insights into these issues. By providing an in depth analysis of ARCHON's electricity transportation management application, this paper draws together many of the experiences obtained when building one of the world's first operational DAI systems.
Introduction
In many industrial applications a substantial amount of time, effort and finance has been devoted to developing complex and sophisticated software systems. These systems are often viewed in a piecemeal manner as isolated islands of automation, when, in reality, they should be seen as components of a much larger business function (Jennings, 1994a). The main benefit of taking a holistic perspective is that the partial subsystems can be integrated into a coherent and consistent super-system in which they work together to better meet the needs of the entire application. By the very fact that they are integrated, the finite budgets available for information technology development can be made to go further: consistent and up-to-date versions of the data can be shared by all the problem solvers, basic functionalities need only be implemented in one place, problem solving can make use of timely information which might not otherwise be available, and so on.
Two components are required to develop a well-structured DAI system: a software framework which provides assistance for interaction between the constituent subcomponents and a design methodology which provides a means of structuring these interactions. ARCHON addresses both of these facets: providing a decentralised software platform which offers the necessary control and level of integration to help the subcomponents to work together and devising a concomitant methodology which offers guidance on how to decompose the overall application and how to distribute the constituent tasks throughout the community to make best use of the capabilities of the ARCHON framework. Both of these facets have been applied to a number of real world industrial applications (Jennings, 1994b); here, however, the electricity transportation management application developed and run on-line at Iberdrola¹ is the main focus.
ARCHON's individual problem solving entities are called agents; these agents have the ability to control their own problem solving and to interact with other community members. The interactions typically involve agents cooperating and communicating with one another in order to enhance their individual problem solving and to better solve the overall application problem. Each agent consists of an ARCHON Layer (AL) and an application program (known as an Intelligent System (IS)). Purpose-built ISs can make use of the ARCHON functionality to enhance their problem solving and to improve their robustness. However, pre-existing ISs can also be incorporated, with a little adaptation, and can experience similar benefits (Jennings et al., 1993). This latter point is important because in many cases developing the entire application afresh would be considered too expensive or too large a change away from proven technology (Jennings and Wittig, 1992).
† The work described in this paper was supported by the ESPRIT II project P2256 (ARCHON).
To successfully incorporate both purpose-built and pre-existing systems, community design must be carried out from two different perspectives simultaneously. A top down approach is needed to look at the overall needs of the application and a bottom up approach is needed to look at the capabilities of the existing systems. Once the gap between what is required and what is available has been identified, the system designer can choose to provide the additional functionality through new systems, through additions to the existing systems, or through the ARCHON software itself. This methodology, which is described more thoroughly in Varga et al. (1994), shapes the design process by providing guidelines for problem decomposition and distribution which reduce inefficiencies.
This paper is organised along the following lines: section two provides an overview of the ARCHON architecture. Section three describes the Iberdrola application; it involves seven heterogeneous agents, a substantial number of which were purpose built, which perform three main types of activity (data acquisition, fault diagnosis, and service restoration) and cooperate in a number of different styles. Finally, section four highlights some key experiences which impact upon the design and implementation of future DAI systems.
The Archon Framework
The ARCHON software has been used to integrate a wide variety of application program types under the general assumption that the ensuing agents will be loosely coupled and semi-autonomous. The ISs themselves can be heterogeneous - in terms of their programming language, their algorithm, their problem solving paradigm, and their hardware platform - as their differences are masked by a standard AL-IS interface. An AL views its IS in a purely functional manner: it expects to invoke functions (tasks) which return results, and there is a fixed language (Cockburn and Jennings, 1995) for managing this interaction.
1. Iberdrola is a large Spanish electric utility. Their transport network, on which this application operates, is controlled by the North Dispatch Control Room (DCR) located in Bilbao.
In an ARCHON community there is no centrally located global authority; each agent controls its own IS and mediates its own interactions with other agents (acquaintances). The system's overall objectives are expressed in the separate local goals of each community member. Because the agents' goals are often interrelated, social interactions are required to meet global constraints and to provide the necessary services and information. Such interactions are controlled by the agent's AL.
In more detail, an agent's AL needs to: control tasks within its local IS (monitor), decide when to interact with other agents (planning and coordination module) (for which it needs to model the capabilities of its own IS and the ISs of the other agents - agent information management (AIM) module), and communicate with its acquaintances (high level communication module) (fig. 1).
Figure 1: ARCHON Agent Architecture (labels shown: Intelligent System; AL-IS Interface; ARCHON Layer; Monitor; Planning & Coordination Module (PCM); Self Model; Acquaintance Models; High Level Communication Manager (HLCM); Messages to/from acquaintances)
Monitor
The Monitor is responsible for controlling the local IS. Each IS task is represented in the Monitor by a monitoring unit (MU). MUs present a standard interface to the Monitor whatever the host programming language and hardware platform of the underlying IS. Figure 2 shows a graphical representation of a MU called UpdateTopologyWithAlarms which takes ALARM-MESSAGES and DISTURBANCE-ID as inputs and produces UPDATED-TOPOLOGY as an output. The IS task associated with this MU is called UpdateTopologyWithAlarms in the AL and topology_update in the IS.
MUs can send and receive messages (directives, confirmations and requests) to and from the IS. All messages have to pass through the AL-IS interface which performs the translation and interpretation required for the IS to understand the AL directives and for the AL to understand the IS messages.
Figure 2: UpdateTopologyWithAlarms Monitoring Unit (labels shown: MU Name UpdateTopologyWithAlarms; Input DISTURBANCE-ID; Result UPDATED-TOPOLOGY; IS task topology_update; AL-IS Interface carrying control, requests and confirmations)
MUs represent the finest level of control in the AL; at the next level of granularity there are plans. Plans are pre-specified acyclic OR-graphs in which the nodes are MUs and the arcs are conditions. These conditions can: be dependent on data already available from previously executed MUs in the plan, be dependent on data input to the plan when it started, make use of the locking mechanism for critical sections of the plan, or be used to return intermediate results before a plan has completed.
A sample plan which starts the fault diagnosis activity is shown in figure 3. Firstly, ALARM-MESSAGES are used as an input to the SetNewFault MU which notes that there is a new fault in the network and generates a new identifier (DISTURBANCE-ID) for it - this identifier is returned as one of the plan's intermediate results. The alarm messages and the disturbance identifier are then used as inputs to the previously described UpdateTopologyWithAlarms MU. When this MU is complete, the model of the network on which the diagnosis will be based is up to date and hence the process of identifying a potential list of faults can commence. There are two ways this activity can proceed: firstly, it is checked whether a list of generated hypotheses has already been provided (and stored in the domain data component of AIM) by an agent called BRS, in which case these should form the start point (the HypothesisGenerationFromForeignSource MU should be executed). If no pertinent information is available then the list should be generated from scratch (the HypothesisGeneration MU should be executed). In the latter case, the plan returns the list of generated hypotheses as an intermediate result so that they can be used elsewhere within the agent or even disseminated to relevant acquaintances.
Figure 3: StartNewDiagnosis Plan (input: ALARM-MESSAGES; intermediate result: DISTURBANCE-ID)
The plan mechanism has an inbuilt backtracking facility which can be used to express preferences and deal with complex alternatives. Consider the plan ReceiveAlarms (figure 4) which determines what course of action an agent should take when it receives alarm messages. Here there are three cases: (i) see whether the alarms correspond to an ongoing fault which is known about; (ii) see whether the messages have been generated by planned maintenance (manoeuvres) on the network; or (iii) see whether the alarms correspond to a new fault. The Monitor first tries the leftmost branch and executes the OngoingFault MU - if this is unable to give a disturbance identifier for the alarms then they cannot correspond to a known fault and so this branch fails; if, on the other hand, an identifier is found then the rest of the branch is traversed. In the case of failure, the plan mechanism backtracks up to the last successful execution (MU CollectAlarms) and tries the next branch (the Manoeuvres MU) - this branch fails if the disturbance identifier associated with the alarms is not given the tag "MANOEUVRES". Finally, if the alarms are not generated by manoeuvres then they correspond to a new fault and so the StartNewDiagnosis branch, as described in figure 3, is invoked (this branch never fails and so the ReceiveAlarms plan will successfully terminate when the plan has been completed).
Figure 4: ReceiveAlarms Plan (with backtracking)
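The backtracking behaviour described for ReceiveAlarms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARCHON implementation: the classes `Node` and `Arc`, the functions `execute` and `receive_alarms`, and the dictionary keys `raw`, `known_faults`, `planned` and `now` are all hypothetical names introduced here. Only the MU names and the DISTURBANCE-ID and ALARM-MESSAGES items come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Data = Dict[str, object]


@dataclass
class Node:
    """A plan node: one monitoring unit (MU) and its outgoing conditional arcs."""
    name: str
    run: Callable[[Data], Optional[Data]]  # returns produced data, or None on failure
    arcs: List["Arc"] = field(default_factory=list)


@dataclass
class Arc:
    """An arc of the OR-graph, guarded by a condition on the data gathered so far."""
    condition: Callable[[Data], bool]
    target: Node


def execute(node: Node, data: Data, trace: List[str]) -> Optional[Data]:
    """Run a plan depth-first; return the final data, or None if every branch fails."""
    produced = node.run(data)
    if produced is None:
        return None  # this MU failed, so the caller backtracks
    trace.append(node.name)
    merged = {**data, **produced}
    if not node.arcs:
        return merged  # leaf reached: the plan has completed
    for arc in node.arcs:  # branches are tried from left to right
        if arc.condition(merged):
            result = execute(arc.target, merged, trace)
            if result is not None:
                return result
    return None


def ongoing_fault(data: Data) -> Optional[Data]:
    # Succeeds only if some alarm belongs to a fault that is already known.
    known = data["known_faults"]
    for alarm in data["ALARM-MESSAGES"]:
        if alarm in known:
            return {"DISTURBANCE-ID": known[alarm]}
    return None


def manoeuvres(data: Data) -> Optional[Data]:
    # Succeeds only if every alarm stems from planned maintenance.
    if all(alarm in data["planned"] for alarm in data["ALARM-MESSAGES"]):
        return {"DISTURBANCE-ID": "MANOEUVRES"}
    return None


def start_new_diagnosis(data: Data) -> Data:
    # Never fails: a new identifier is taken from the supplied time stamp.
    return {"DISTURBANCE-ID": data["now"]}


def receive_alarms(data: Data):
    """Build and run the plan; return (result, names of the MUs that succeeded)."""
    always = lambda _data: True
    root = Node("CollectAlarms", lambda d: {"ALARM-MESSAGES": list(d["raw"])}, [
        Arc(always, Node("OngoingFault", ongoing_fault)),
        Arc(always, Node("Manoeuvres", manoeuvres)),
        Arc(always, Node("StartNewDiagnosis", start_new_diagnosis)),
    ])
    trace: List[str] = []
    return execute(root, data, trace), trace
```

Because the rightmost branch never fails, `receive_alarms` always returns a result. The trace simply records which MUs succeeded, which makes the order of the attempted branches visible.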
The highest level at which the IS's activities are represented is the behaviour level. Behaviours contain a plan, a trigger condition for activating the behaviour, descriptions of the inputs needed by the activity and the results which will be produced, and any children of the behaviour. There are two types of behaviour: those that are visible to the PCM (and the other AL components) and those that are purely internal to the Monitor. The former type are called skills, and they may be triggered by new data (either arriving from other agents or which the agent has generated itself) or by direct requests from other agents.
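As a rough illustration of the behaviour level, the record below holds the elements listed above and selects the skills that newly arrived data would activate. The field names, the `is_skill` flag and the function `triggered_skills` are assumptions made for this sketch; the paper does not give ARCHON's actual representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Data = Dict[str, object]


@dataclass
class Behaviour:
    name: str
    trigger: Callable[[Data], bool]  # condition for activating the behaviour
    inputs: List[str]                # data the activity needs
    results: List[str]               # data the activity will produce
    plan: Callable[[Data], Data]     # stand-in for a plan of MUs
    is_skill: bool = True            # True if visible to the PCM
    children: List["Behaviour"] = field(default_factory=list)


def triggered_skills(behaviours: List[Behaviour], new_data: Data) -> List[str]:
    """Names of the skills whose inputs are all present and whose trigger holds."""
    return [
        b.name
        for b in behaviours
        if b.is_skill
        and all(key in new_data for key in b.inputs)
        and b.trigger(new_data)
    ]
```

Behaviours that are internal to the Monitor are filtered out here because only skills are visible outside it.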
Planning and Coordination Module (PCM)
The PCM is the reflective part of the AL, reasoning about the agent's role in terms of the wider cooperating community (Jennings and Pople, 1993). This module has to assess the agent's current status and decide which actions should be taken in order to exploit interactions with others whilst ensuring that the agent contributes to the community's overall well being. Specific examples of the PCM's functionality include: deciding which skills should be executed locally and which should be delegated to others, directing requests for cooperation to appropriate agents, determining how to respond to requests from other agents, and identifying when to disseminate timely information to acquaintances who would benefit from receiving it.
The PCM is composed of generic rules about cooperation and situation assessment which are applicable in all industrial applications - all the domain specific information needed to define individual behaviour is stored in the self and acquaintance models. The former contains information about the local IS and the latter contain information about the other agents in the system with which the modelling agent will interact. For example, in order to determine how to obtain information which is needed to execute a behaviour but which is not currently available, the PCM will make reference to its self model to see if the information can be provided locally by executing an appropriate skill. If the information cannot be provided locally then the acquaintance models are checked to see if another community member can provide it.
The final major role of the PCM is to deal with requests arriving from other agents. By reference to its self model, it will decide whether to honour the request and will then activate the necessary skill to provide the requested data; when the information is available it will ensure that a reply is directed to the source of the request.
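The lookup order described above (self model first, acquaintance models second) can be sketched as follows. The function name `locate_provider` and the representation of both models as dictionaries mapping a name to the set of information it can produce are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set, Tuple


def locate_provider(
    info: str,
    self_model: Dict[str, Set[str]],
    acquaintance_models: Dict[str, Set[str]],
) -> Optional[Tuple[str, str]]:
    """Return ("local", skill) or ("remote", agent) for the wanted information, else None."""
    # First preference: a local skill that produces the information.
    for skill in sorted(self_model):
        if info in self_model[skill]:
            return ("local", skill)
    # Otherwise: an acquaintance that is modelled as able to provide it.
    for agent in sorted(acquaintance_models):
        if info in acquaintance_models[agent]:
            return ("remote", agent)
    return None
```

Names are visited in sorted order only to keep the sketch deterministic; how a real PCM chooses between several capable acquaintances is not covered here.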
Agent Information Management Module
The AIM module is a distributed object management system which was designed to provide information management services to cooperating agents (Tuijnman and Afsarmanesh, 1993). Within ARCHON, it is used to store both the agent models and the domain level data.
As an illustration of the agent models, consider an agent which is capable of producing information about ALARM-MESSAGES. The interest slots of its acquaintance models contain those agents who are interested in receiving this information and the conditions under which they are interested. The following portion of the acquaintance model specifies that an agent called BRS is interested in ALARM-MESSAGES which contain chronological information, that an agent called AAA is interested only in non-chronological alarm messages, and that an agent called BAI is only interested in non-chronological alarm messages which have the string INT within their ALARMS field:
INTEREST-DESCRIPTOR
  INFORMATION-NAME: ALARM-MESSAGES
  INFORMATION-CONDITION:
    [ ("BRS", (CONTAIN (ALARM-MESSAGES "CHRONO" "YES")));
      ("AAA", (CONTAIN (ALARM-MESSAGES "CHRONO" "NO")));
      ("BAI", (AND (CONTAIN (ALARM-MESSAGES "CHRONO" "NO"))
                   (CONTAIN (ALARM-MESSAGES.ALARMS "INT")))); ]
In many industrial applications the domain level data which the agents need to exchange has a complex internal structure. In ARCHON, this structure is specified and maintained by AIM. For example, the information type ALARM-MESSAGES is defined in the following manner:
ALARM-MESSAGES
  AGENT-ID: AGENT
  DISTURBANCE-ID: SOURCE
  CHRONO: Y-N-FLAG
  BLOCK-TYPE: BLOCK-TYPE
  BLOCK-ID: ID-TYPE
  ALARMS: LIST-OF-ALARMS
where each of the following types has the following set of permissible values: agent (CSI | AAA), source (yymmddhhmmss | MANOEUVRES | UNKNOWN), y-n-flag (YES | NO), block-type (UNIQUE | UNKNOWN | MIXED), id-type (yymmddhhmmss), and list-of-alarms (alarm1, ..., alarmn).
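To make the type definition and the interest conditions concrete, the sketch below encodes ALARM-MESSAGES as a validated Python record and rewrites the three interest conditions for BRS, AAA and BAI as predicates over it. It is a minimal illustration, not AIM's object model: the class name `AlarmMessages`, the table `INTERESTS`, the function `interested_agents` and the reading of "contains the string INT" as a substring test on each alarm are assumptions, and the alarm strings in any usage are invented.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

TIMESTAMP = re.compile(r"\d{12}")  # yymmddhhmmss


@dataclass(frozen=True)
class AlarmMessages:
    agent_id: str        # CSI | AAA
    disturbance_id: str  # yymmddhhmmss | MANOEUVRES | UNKNOWN
    chrono: str          # YES | NO
    block_type: str      # UNIQUE | UNKNOWN | MIXED
    block_id: str        # yymmddhhmmss
    alarms: List[str]

    def __post_init__(self) -> None:
        # Reject any field whose value lies outside its permissible set.
        checks = [
            self.agent_id in {"CSI", "AAA"},
            self.disturbance_id in {"MANOEUVRES", "UNKNOWN"}
            or bool(TIMESTAMP.fullmatch(self.disturbance_id)),
            self.chrono in {"YES", "NO"},
            self.block_type in {"UNIQUE", "UNKNOWN", "MIXED"},
            bool(TIMESTAMP.fullmatch(self.block_id)),
        ]
        if not all(checks):
            raise ValueError("value outside the permissible set")


# Interest conditions from the acquaintance model, rewritten as predicates.
INTERESTS: Dict[str, Callable[[AlarmMessages], bool]] = {
    "BRS": lambda m: m.chrono == "YES",
    "AAA": lambda m: m.chrono == "NO",
    "BAI": lambda m: m.chrono == "NO" and any("INT" in a for a in m.alarms),
}


def interested_agents(message: AlarmMessages) -> List[str]:
    """Acquaintances whose interest condition holds for this message."""
    return [agent for agent, wants in INTERESTS.items() if wants(message)]
```

A producing agent could call `interested_agents` on each new message to decide where to send it unprompted, which is the kind of dissemination decision the PCM is described as making.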
High Level Communication Module
The High Level Communication Module (HLCM) allows agents to communicate with one another using services based on the TCP/IP protocol. The HLCM incorporates the functionality of the ISO/OSI Session Layer which continuously checks communication links and provides automatic recovery of connection breaks when possible.
Electricity Transport Management
Energy management is the process of monitoring and controlling the cycle of generating, transporting and distributing electrical energy to industrial and domestic customers. Generation transforms raw energy into a more accessible form. Energy is then transported from its generation site to the consumer. To minimise losses during transportation, the electrical voltage is made high (132 kV or above) before it is placed on a transport network and sent over many hundreds of kilometres. Finally, the voltage is lowered and electricity is delivered to the consumers using a distribution network.
To ensure the transportation network remains within the desired safety and economical constraints, it is equipped with a sophisticated data acquisition system (SCADA) and several conventional application programs which help the operator (control engineer) to analyse it (these programs are primarily designed for normal operating conditions). The network's operation is monitored from a DCR and whenever an unexpected event occurs hundreds of alarms are automatically sent to it by the SCADA system. Under these circumstances the operator has to rely on experiential knowledge to analyse the information, diagnose the situation, and take appropriate remedial actions to return the network to a safe state. To reduce the operators' cognitive load in such circumstances, and to help them make better decisions faster, Iberdrola decided that a number of decision support systems should be developed. These systems were then interconnected and subsequently extended using ARCHON technology - for a more detailed analysis refer to Jennings et al. (1995).
Why use DAI techniques?
When Iberdrola decided, in 1988, to implement decision support tools to ease the workload of their control engineers during disturbances, several technical factors affected their design choices. Firstly, the control system itself was a proprietary product from a control systems supply company - thus it was considered too risky and too difficult to embed the additional functionality directly within it. Secondly, the state of the art for commercial systems in this domain meant that the diverse support functions could only be realised through a number of standalone systems. Consequently, Iberdrola built separate decision support systems to assist with different aspects of the control engineer's job - the one which is most relevant to the subsequent discussion is the alarms analysis expert system which diagnosed faults produced in the network based on the alarm messages which arrived at the DCR. Their decision support systems were unconnected apart from the fact that they retrieved information about the network from the same source (the control system's real time database). To make this information available to the non-proprietary software products a number of interfaces to the control system had to be written - as well as providing access, these interfaces could filter and pre-process network information.
By 1991, however, some important changes had occurred: (i) the evolution of IT hardware and software had significantly increased the quantity and quality of the data which could be acquired from the transport network; (ii) distributed computing had become commercially viable because of improvements in local area network technology; and (iii) the prices of computers had decreased significantly so that powerful machines were no longer prohibitively expensive. Taken together, these changes meant that better and more powerful tools could be built to assist the control engineer. In particular it was considered important to be able to actually perform and dynamically monitor the service restoration process and also to exploit the new data sources, such as chronological information or faster rate snapshots, which became available. However, the tried and tested decision support tools were still needed. Thus it was decided to adopt a system upgrading strategy which enabled the previously operational components to be used in conjunction with the new functionality. Two means of realising this strategy were considered: extend the existing systems to cover the new features or follow a distributed approach and allow the new functionality to be expressed as distinct computational entities which could interact with the pre-existing systems through a common distribution platform. The second option was chosen because it was considered to be the most effective means of:
(i) Permitting reasoning based on information of different granularity. Two types of alarm, chronological and non-chronological, now needed to be dealt with. In non-chronological alarms, the time stamped is coincident with the time of acquisition by the control system, whereas in chronological alarms the time stamped is coincident with the actual occurrence of the event. As chronological alarms represent a more accurate picture of events in the network they generally lead to a swifter diagnosis; however, they have the disadvantage that chronological information has a low priority in Iberdrola's communication channels. Thus when the channels are saturated (as can happen during a disturbance) their availability time is unpredictable. For these reasons it was decided to build a new alarm analysis expert system which utilised chronological information and could subsequently integrate its results with those of the pre-existing system, rather than construct a monolithic system which received both types of data and had to embody both types of diagnostic knowledge. A similar situation occurs when considering service restoration. Two types of information are relevant to this activity: snapshots (which provide a comprehensive picture of the current state of all the components in the network) and alarm messages (which show how the state of the components has changed over a period of time). The former can be produced relatively quickly and give a complete picture of the system's state, whereas the latter may take several minutes for a large disturbance but are needed to indicate the type of fault from which the system must be restored. Rather than trying to place both types of information and reasoning in a single system it seemed more natural to develop a service restoration subsystem which dealt mainly with snapshots and received the necessary high-level information about the equipment at fault from a diagnosis subsystem (rather than trying to deal with the raw alarm messages itself).
(ii) Allowing different network models to be included within the same system. Some of the problem solvers needed to work on the SCADA model of the network, while others needed the applications network model (a model which permits network equations to be solved and takes the physical characteristics of all its components into account). Rather than trying to combine and harmonise these complex and disparate models at design time, it was decided that each subsystem should work on whichever model was most appropriate for its task. Then the various components should be able to interact at runtime to resolve any inconsistencies which arise from their use of different network models.
(iii) Enabling a number of different problem solving paradigms to be utilized. The diverse range of activities which needed to be performed in this application meant that there was no universally best problem solving paradigm: procedural techniques were required for algorithmic calculations like connectivity (to know which component is connected to which other) and load-flow analysis (solution of the network equations), whereas symbolic reasoning based on heuristic search was the best approach to diagnosis. A distributed approach enabled each component to be encoded in the most appropriate method.
(iv) Meeting the application's performance criteria. Transportation management is a time-critical application and as many different types of information can be processed in parallel, with only a small synchronisation overhead, the response time of the overall system can be improved through the use of a number of interconnected machines.
Having decided upon a distributed approach, a choice had to be made between using more conventional distributed processing techniques or DAI techniques - here the latter was adopted for the following reasons (Barandiaran et al., 1991; Abel et al., 1993):
(i) Economy: The alarms analysis expert system was already operational; however, new functionality needed to be added and new information needed to be treated. It was estimated that the cost of modifying the extant system was significantly larger than that of implementing a new one. However, it was also judged that as the new functionalities and data were so diverse it would be an extremely expensive activity to put them within a single system. Therefore it turned out to be more economical to build smaller systems, and re-use the existing alarm analysis expert system, and allow them to be integrated through a DAI framework. A DAI framework was needed because the interactions between these subsystems were both sophisticated and context dependent; therefore run time reasoning based on dynamic data needed to be performed.
(ii) Robustness: As the subsystems have overlapping domains of expertise, the failure of one of them to produce an answer does not necessarily mean that no solution will be forthcoming because one of the other systems may be able to produce at least a partial solution. However, to achieve this back-up functionality in a flexible manner, the different problem solving components need to be intelligently coordinated - a task beyond present generation distributed processing systems.
(iii) Reliability: The solutions of the systems that overlap can be cross-referenced to enable the operator to be presented with more reliable information. Again, however, this cross-referencing functionality needs to be properly managed according to the prevailing circumstances and so requires dynamic and flexible reasoning to take place.
(iv) Natural representation of the domain: A DAI approach accurately represents the way the control engineers work when a large disturbance occurs. They specialize their roles - one looks after restoration, another tries to diagnose the problem based on different sources of information, and so on - and they then communicate relevant information to one another to ensure they are following a coherent course of action towards the overall objective of restoring the service (Jennings and Wittig, 1992).
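The robustness and reliability arguments above both rest on combining the outputs of overlapping diagnosis systems. The fragment below is a deliberately simple illustration of that idea and should not be read as the coordination logic actually used in the application, which this part of the paper does not spell out. The function name `cross_reference` and the representation of a diagnosis as an ordered list of hypothesis names without duplicates are assumptions.

```python
from typing import List, Tuple


def cross_reference(first: List[str], second: List[str]) -> List[Tuple[str, bool]]:
    """Merge two hypothesis lists into (hypothesis, corroborated) pairs.

    Hypotheses proposed by both systems are placed first; within each group
    the original ordering is kept. If one list is empty, the other is returned
    with every entry marked as uncorroborated, which is the back-up case.
    """
    in_first, in_second = set(first), set(second)
    merged = list(first) + [h for h in second if h not in in_first]
    tagged = [(h, h in in_first and h in in_second) for h in merged]
    # sorted() is stable, so the relative order inside each group is preserved.
    return sorted(tagged, key=lambda pair: not pair[1])
```

Even this toy version shows why static merging is not enough: deciding which list to trust when the two disagree depends on circumstances such as data quality, which is the dynamic reasoning the text attributes to a DAI framework.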
Specification of the Agents
During normal working conditions, management of the network by the operator in the DCR consists mainly of topology changes (operation on breakers and switches), generation scheduling, and control of the energy interchange with other utilities (Corera et al., 1993). However, during emergency situations management becomes considerably more difficult because of the large number of constraints which have to be taken into consideration and the insufficient quality of the information which is available to make these decisions. Emergency situations typically originate from a short circuit in a line, bus-bar or transformer. They can be exacerbated by equipment malfunctioning (e.g. a breaker failing to open) or subsequent overloads (a domino effect can cause one line to fail because of an overload; this in turn increases the load on neighbouring lines so they become overloaded and subsequently fail, and so on). The situation can become even worse if power stations become disconnected as this will cause an imbalance in the network's power. Consequently, actions to restore service must be taken rapidly and accurately, so that what starts as a relatively minor problem does not escalate into a major disaster. In these circumstances, the actions which the operator can perform consist mainly of breaker operations, topology changes, and activation/deactivation of automatisms and protective relays. For larger disturbances, however, actions on power plants may also be required.
From this description of the control engineer's job, a top-down analysis identified that a comprehensive decision support system should cover the following activities: (i) Detect the existence of disturbances; sometimes the operation of protective relays and breakers can be caused by routine maintenance and this should not be confused with a genuine disturbance situation; (ii) Determine the cause, location and type of the disturbance, including identifying if any equipment is permanently damaged; (iii) Analyse the situation of the network once it arrives at a steady state; and (iv) Prepare a restoration plan to return the network to its original operational state.
Allying this top-down analysis with the bottom-up perspective of examining the extant systems, it was decided to encapsulate the following pre-existing systems as agents: the alarms analysis expert system and the interface to the control system. As discussed earlier, the availability of chronological alarm messages necessitated a new diagnosis system which it was decided to make available as an agent. Finally, it was always known that information about the initial area out of service (the black-out area) could help constrain the search for the faulty equipment; however, it was never deemed cost effective to develop a dedicated stand alone system for this purpose since the original alarm analysis expert system's performance was considered satisfactory (if somewhat slow). However, through the use of DAI technology much of the basic infrastructure to implement this functionality was now available from other agents and so it was considered economically viable to develop a system capable of producing this information (in terms of the ARCHON methodology, this decision corresponds to providing additional functionality through the development of new systems).
In more detail, the operational DAI system consists of seven agents running on five different machines.
• BAI (Black-out Area Identifier) When a fault occurs, the network's protective relays and breakers automatically try to isolate the minimum amount of equipment possible; in an ideal case only the element at fault would be isolated. The BAI's objective is to identify which elements of the network are initially out of service as the actual element at fault must be within this region. It uses non chronological alarm messages as its information source and cooperates with the BRS and the AAA to increase the efficiency of the overall diagnosis process.
• CSI (pre-existing Control System Interface) The CSI acts as the application's front end to the control system computers. Its objectives are to acquire and distribute network data to the other agents, to interface to the conventional management system application programs, and to monitor the restoration process to detect any unexpected deviations. It is split into two physical agents: CSI-D which detects the occurrence of disturbances and preprocesses the chronological and non chronological alarm messages which are used by the AAA, the BAI and the BRS; and CSI-R which detects and corrects inconsistencies in the snapshot data file of the network, calculates the power flowing through it and makes this information available to the SRA and the UIA. CSI-D is primarily concerned with the system's diagnosis activities and CSI-R with its restoration activities.
• BRS (Breakers and Relays Supervisor) The new alarms analysis expert system which detects the occurrence of a disturbance, determines the type of fault and its extent, generates an ordered list of fault hypotheses, validates hypotheses, and identifies malfunctioning equipment. In order to perform its analysis, it takes two types of inputs: chronological alarm messages and snapshots of the network which give the status of every breaker and switch.
• AAA (pre-existing, non-chronological Alarms Analysis
Agent expert system) This agent pursues similar goals to
the BRS; however, the quality of information it receives is
inferior to that of the BRS. Although the alarm messages
received by both systems relate to the same physical operations, those received by the AAA are accurate only to within ±5 seconds,
while those received by the BRS are precise.
This means that if the data is error free, then the BRS performs a better diagnosis than the AAA. However, if some of
the chronological information is lost (a distinct possibility
when the SCADA system is busy) then the BRS may perform worse than the AAA. Therefore, whenever incomplete
or erroneous information exists, which is in most interesting cases, there is a need for cooperation between the two
systems to make the overall system more robust and
reliable.
• SRA (Service Restoration Agent) This agent devises a service restoration plan to return the network to a steady state
after a blackout has occurred. To do this it takes into account
the constraints imposed by the damaged equipment, as identified by the diagnosis agents.
• UIA (User Interface Agent) This agent implements the
interface between the users and the community of agents. It
gives the user the facility to inspect the results produced by
the diagnosis agents, display the alarms received, and browse
through the log of analysed disturbances. From the point of
view of restoration, the user can see the plan produced, modify it, run it in a simulated environment to see its predicted
effect, and request the development of a new restoration plan
which takes into account some actions which he deems pertinent. Through the use of a distributed windowing system,
the UIA presents the appropriate information on the consoles
of the various control engineers who are working on the
system.
This system design ensures that all the tasks identified by
the top-down analysis are performed by at least one agent.
Robustness is achieved by having multiple agents that are
able to provide the same (or at least some) overlapping
results. Efficiency is obtained by the parallel activation of
tasks. Reliability is increased because even if one of the
agents breaks down the rest of the agents can often produce
a result which, although not as good as the one provided by
the complete system, is still of use to the operator.
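
The robustness argument above can be illustrated with a small sketch. The capability table is a simplification read off the agent descriptions; the result name RESTORATION-PLAN, the table name `PROVIDERS` and the function `available_results` are assumptions made for illustration and are not part of ARCHON.

```python
# Which agents can produce which results (simplified from the agent
# descriptions above; entries marked "hypothetical" are not from the paper).
PROVIDERS = {
    "GENERATED-HYPOTHESES": {"AAA", "BRS"},
    "VALIDATED-HYPOTHESES": {"AAA", "BRS"},
    "INITIAL-BLACK-OUT-AREA": {"BAI"},
    "RESTORATION-PLAN": {"SRA"},  # hypothetical result name
}


def available_results(running_agents):
    """Return the results that at least one running agent can still produce."""
    running = set(running_agents)
    return {result for result, agents in PROVIDERS.items() if agents & running}


# With the BRS down, the AAA can still supply both hypothesis lists.
print(sorted(available_results({"AAA", "BAI", "SRA"})))
```

Because two agents overlap on the hypothesis results, losing either the AAA or the BRS leaves those results obtainable, although (as the text notes) possibly at lower quality.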
Cooperative Scenarios
An important example of cooperation in this system
involves the information interchange between the AAA,
BRS and BAI agents. The AAA and the BRS produce the
same result from different information sources, while the
BAI applies different knowledge to produce a result that
should be coherent with that of the AAA and the BRS.
Assume a block of non-chronological alarm messages has
been provided by the SCADA system and these alarm messages have been identified as related to a disturbance by the
CSI. Using the interest descriptors of its acquaintance models - see the AIM section - the CSI will realise that this
information is relevant to the AAA and the BAI. Application of the appropriate PCM generic rule will result in this
information voluntarily being sent to the specified agents as
unsolicited data. Some time later, the same process will be
repeated and the BRS will receive the corresponding chronological alarm messages. At this point, the AAA, BAI and
BRS are all operating in parallel.
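
The interest-driven distribution of unsolicited data described above can be sketched as follows. This is a minimal illustration only: the class `AcquaintanceModel`, the function `recipients` and the information-type strings are hypothetical names, and ARCHON's actual acquaintance models and PCM rules are richer than a set lookup.

```python
from dataclasses import dataclass, field


@dataclass
class AcquaintanceModel:
    """What one agent believes another agent is interested in."""
    agent: str
    interests: set = field(default_factory=set)


def recipients(info_type, acquaintances):
    """Return the agents whose interest descriptors match info_type."""
    return [a.agent for a in acquaintances if info_type in a.interests]


# Hypothetical interest descriptors held by the CSI.
csi_acquaintances = [
    AcquaintanceModel("AAA", {"non-chronological-alarms"}),
    AcquaintanceModel("BAI", {"non-chronological-alarms"}),
    AcquaintanceModel("BRS", {"chronological-alarms"}),
]

print(recipients("non-chronological-alarms", csi_acquaintances))  # ['AAA', 'BAI']
print(recipients("chronological-alarms", csi_acquaintances))      # ['BRS']
```

The sender needs no request from the receivers: it consults its own model of their interests and volunteers the data, which is what allows the three diagnosis agents to start work in parallel.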
When the AAA receives the alarm messages, and the corresponding disturbance identifier marks them as being a
consequence of a new fault (see figure 4), the StartNewDiagnosis plan is executed and a preliminary set of
hypotheses (GENERATED-HYPOTHESES) is produced.
During this time, the BAI would also have received the
alarm messages and would have started its skill for producing the Initial-Black-Out-Area. When this plan is
complete, the BAI’s PCM checks whether any other agents
are interested in this information - it finds out that the AAA
is and so it sends out the information.
Simultaneously, although after a certain delay, the BRS
agent starts working on the analysis of the chronological
alarm messages. This will also result in a list of GENERATED-HYPOTHESES
being produced. The BRS checks
whether any agents are interested in this information - again
the AAA is noted and the generated hypotheses are sent to
it. The BRS then continues with its diagnosis to try and validate the cause of the fault.
After producing its tentative list of hypotheses, the AAA
proceeds with a detailed analysis to try and ascertain the
precise cause of the fault (i.e. to produce VALIDATED-HYPOTHESES).
The following situations may then occur:
(i) the Initial-Black-Out-Area is available to the
AAA; this triggers a refinement behaviour which may
reduce the number of hypotheses to be validated because
the BAI has given a focused view of the situation; (ii) the
generated hypotheses provided by the BRS are available
to the AAA; this triggers another refinement behaviour
which reorders the hypotheses to be validated so that the element at fault is found more readily; (iii) the
validated hypotheses provided by the BRS are available to
the AAA; this triggers yet another refinement behaviour,
which has the same functionality as the previous one, but
the reordering is based on validated hypotheses which are
more accurate; (iv) if no information is available from the
BAI or BRS, the AAA proceeds with its hypotheses validation as a standalone agent. Therefore, if the other agents
are down or they are too slow to provide the information,
the AAA will continue and find a faulty element, although
its diagnosis will be less reliable and will take longer.
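
The four situations above can be sketched as a single refinement function. This is an illustrative simplification, not the AAA's actual behaviour: the function name `refine_hypotheses`, the representation of hypotheses as element names, and the specific pruning and reordering rules are all assumptions.

```python
def refine_hypotheses(own, blackout_area=None, brs_generated=None, brs_validated=None):
    """Return the AAA's hypotheses, pruned and reordered using whatever
    information from the BAI and BRS has arrived so far."""
    hypotheses = list(own)

    # (i) Focus on elements inside the BAI's initial black-out area
    # (assumed here to be a simple membership filter).
    if blackout_area is not None:
        hypotheses = [h for h in hypotheses if h in blackout_area]

    # (ii)/(iii) Reorder using BRS hypotheses, preferring validated ones.
    brs = brs_validated if brs_validated is not None else brs_generated
    if brs is not None:
        rank = {h: i for i, h in enumerate(brs)}
        # sorted() is stable, so hypotheses unknown to the BRS keep their
        # original relative order after the ranked ones.
        hypotheses = sorted(hypotheses, key=lambda h: rank.get(h, len(rank)))

    # (iv) With no outside information the list is returned unchanged.
    return hypotheses


own = ["line-1", "bus-2", "transformer-3", "line-4"]  # hypothetical elements
print(refine_hypotheses(own))  # standalone: unchanged
print(refine_hypotheses(own, {"line-1", "bus-2", "line-4"}, ["line-4", "line-1"]))
# ['line-4', 'line-1', 'bus-2']
```

Every argument other than the AAA's own list is optional, which mirrors the point of the paragraph: cooperation improves the result when it is available, but its absence never blocks the diagnosis.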
The restoration process is activated whenever a disturbance is detected. Once the disturbance is identified, the
disturbance identifier is sent to the CSI-R which acquires
the snapshot of the network, corrects any inconsistencies
which have arisen in its representation, and calculates the
power flow solution of the current state. This information
is then passed on to the SRA so that it can prepare for its restoration planning. The SRA waits until the diagnosis agents
have informed it of the element suspected of being at fault
(VALIDATED-HYPOTHESES) and then proceeds to prepare a restoration plan. If, during this plan preparation, the
SRA is informed that the equipment at fault is different
from that originally indicated by either the AAA or the
BRS, then it replans the restoration taking this information
into account.
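
The SRA's wait, plan and replan behaviour described above might be sketched as below. The class `RestorationPlanner` and its method are hypothetical; real plan construction is replaced by a counter so that only the triggering logic is shown.

```python
class RestorationPlanner:
    """Toy stand-in for the SRA's plan/replan trigger (illustrative only)."""

    def __init__(self):
        self.faulty_element = None
        self.plans_made = 0

    def on_validated_hypothesis(self, element):
        """Plan on the first report; replan only if the suspect changes."""
        if element == self.faulty_element:
            return False  # nothing new, keep the current plan
        self.faulty_element = element
        self.plans_made += 1  # a real SRA would rebuild its plan here
        return True


sra = RestorationPlanner()
sra.on_validated_hypothesis("line-7")  # first report: plan
sra.on_validated_hypothesis("line-7")  # second agent agrees: keep plan
sra.on_validated_hypothesis("bus-2")   # suspect changes: replan
print(sra.plans_made)  # 2
```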
The UIA is the interface through which the user accesses
the results produced by the agent community. During the
diagnosis phase, the user is presented with both the tentative (early) list of suspected hypotheses and the final
(validated) list. During the restoration phase, the UIA supports a more participatory interaction between the user and
the agent community. The user is presented with the restoration plan and can then decide to modify it, run a detailed
simulation to see the effects of the plan on the state of the
network, or ask for a new plan to be devised taking into
account new constraints which he specifies. The UIA also
supports a reporting functionality in that the control engineer can ask for the logs of the disturbances to be presented
and analysed.
Observations and Reflections
This application has been in operation in Iberdrola’s North
DCR since the beginning of 1994 and has afforded a
number of benefits. Firstly, the agent system gives better
results than its stand alone counterparts because it takes
multiple types of knowledge and data into account and then
integrates them in a consistent manner. Secondly, the agent
system is more robust because there are overlapping functionalities, which means partial results can be produced in
the case of component (agent) failure. Thirdly, some results
can be provided more quickly because cooperation provides a short cut (see previous section). Fourthly, the
functionalities of the different domain systems can be
increased independently, which makes them easier to maintain (see, for example, the argument for developing the BAI).
Fifthly, the control engineer is provided with an integrated
view of the results he is interested in. Finally, the system has
been designed to be open so that new agents can be added in
an incremental manner.
One of the key features of this multi-agent system is the
way it handles fault diagnosis by using two different types of
data (the non-chronological alarms used by the AAA and the
chronological alarms used by the BRS) and two different
points of view (the typical diagnosis approach of hypothesis
generation and validation used by the AAA and BRS, and the
BAI’s monitoring approach which provides a high level view
of the status of the network). With this set-up, it is possible
to dynamically select the solution method which is best
suited to the current situation. For example, if the BRS is
operational, but the AAA is not, the solution provided to the
control engineer is the one created by the BRS; but if both the
BRS and the AAA are running, the solution provided is the
one which is mutually agreed between them. Also, the fact
that multiple agents are trying to generate the same results
can be exploited to avoid repetition of certain tasks if it is
deemed desirable in a particular context. For example, both
the AAA and the BRS can provide GENERATED-HYPOTHESES; consequently, if the generated hypotheses that are
provided by the BRS are available to the AAA before it starts
its own generation task, then this task need not be executed
and the hypotheses provided by the BRS can be used instead
(see figure 3). This ability to flexibly manage, at runtime,
multiple sources of data and multiple problem solving perspectives provides enormous robustness to the overall
system because if one of the agents crashes the others will
still be able to provide some form of solution.
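
The dynamic selection of a solution described above can be sketched as follows. The paper does not define how a "mutually agreed" solution is computed; the intersection used here is purely an assumption for illustration, as is the function name `select_solution`.

```python
def select_solution(aaa=None, brs=None):
    """Pick the diagnosis to present, given each agent's result.

    A value of None means the corresponding agent is not operational.
    """
    if aaa is None and brs is None:
        return None  # no diagnosis agent is running
    if aaa is None:
        return brs   # only the BRS is operational
    if brs is None:
        return aaa   # only the AAA is operational
    # Both running: keep the hypotheses proposed by both agents. This is an
    # ASSUMED reading of "mutually agreed", kept in the AAA's order.
    return [h for h in aaa if h in brs]


print(select_solution(brs=["line-4"]))                              # ['line-4']
print(select_solution(aaa=["bus-2", "line-4"], brs=["line-4", "line-9"]))  # ['line-4']
```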
As a consequence of the experience obtained during the
development and installation of this multi-agent system,
some important application design improvements are foreseen for the future. The first drawback of the current system
is caused by the fact that the energy transport network covers
a vast geographic area (meaning there is a huge amount of
topological information) and that it encompasses a number
of different voltage levels. As the network’s behaviour
depends both on the voltage level and the geographic location, the main problem solving agents (the AAA, BRS, BAI
and SRA) have to contain and manage information about
Iberdrola’s entire transport network. This means the agents
require a substantial amount of memory and computing
resource because their searches are through such a large
problem space. To combat this problem, the next version of
the system will be designed so that the agents work with
smaller portions of the network; this will make them easier
to debug and maintain, faster in execution, and more cost
effective in that they could run on PCs instead of
workstations.
The second drawback of the current system is that all the
agents have uniform knowledge of the network. For instance,
the AAA applies virtually the same knowledge about protective relays to its 400 kV, 220 kV and 132 kV voltage levels.
However, if there were one AAA (or BRS or SRA) per voltage level, it would be possible to customise their domain
knowledge. For example, the following pieces of potentially
useful knowledge could be reflected: the fact that protective
relays on the 400 kV voltage level are more reliable than the
ones at the 132 kV voltage level, and the fact that the 400 kV network is more interconnected than the 132 kV network and
has more complex breaker structures (like central breakers or
rings of breakers). A further source of heterogeneity which is
currently masked is that the network itself is the result of
the fusion of a number of smaller transport networks that
were developed by different companies before coming
under the overarching umbrella of Iberdrola. Thus, for
example, the protective relays of the Northwestern
Iberdrola network are different from the protective relays
in the rest of the network. Again, this information could be
exploited if smaller and more specialised agents were
developed.
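
The proposed move to smaller, more specialised agents can be illustrated by partitioning the network model by voltage level, so that each agent instance holds only its own portion. This is a sketch under assumptions: the element names and the function `partition_by_voltage` are hypothetical, and only the three voltage levels come from the text.

```python
from collections import defaultdict


def partition_by_voltage(elements):
    """Group (name, kV) pairs by voltage level, one group per specialised agent."""
    groups = defaultdict(list)
    for name, kv in elements:
        groups[kv].append(name)
    return dict(groups)


# Hypothetical element names; the voltage levels are those named in the text.
network = [("line-A", 400), ("line-B", 132), ("bus-C", 220), ("line-D", 400)]
print(partition_by_voltage(network))
# {400: ['line-A', 'line-D'], 132: ['line-B'], 220: ['bus-C']}
```

Each group could then be handed to an agent carrying knowledge specific to that level, such as the differing relay reliability mentioned above; a further partition by originating regional network would follow the same pattern.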
On a more general note, although the ARCHON
approach was influenced to a large extent by the need to
incorporate pre-existing industrial control software into a
multi-agent community, it is felt that cooperative interworking, heterogeneity, semi-autonomy, and loose
coupling are attributes that are likely to be encountered
when building most complex systems (even if they are built
entirely from scratch and are not deliberately conceived as
DAI systems). This belief is based on three main observations. Firstly, most large organisations, where the majority
of complex systems reside, have departmental structures
which need to be observed, but the individuals within these
structures often need to work together in a coherent manner. Secondly, the components (including both the humans
and the software) within an organisation, and even within a
department, are likely to be heterogeneous simply because
each one has to be based upon a different modelling paradigm in order to be effective. Finally, complexity is best
handled by devolving responsibility for decisions to the
level at which the actions are performed (hence the problem solvers will be loosely coupled and semi-autonomous).
In most cases the domain systems that already existed
were each fairly complex and had been designed to encompass those aspects of the domain that could be expressed
within a coherent modelling paradigm. They were
designed in this manner because conventional wisdom dictates that when building a system using a particular
modelling approach, one should include everything that is
known about the system’s world that can be expressed
within that model. However, after due reflection and observation it is noted that once several such conventional
systems are brought together into a cooperative framework, completeness and comprehensiveness are no
longer the key criteria for allocating agents. In a multi-agent system, questions of the efficiency of the overall system are likely to be paramount. This, in turn, may dictate
that an agent is identified with the smallest possible coherent and autonomous entity. As a consequence the system
may have a large number of such agents which may, in
turn, have an implication on the performance of the overall
system. The ARCHON experience as yet does not extend
to systems with many (hundreds) of agents; however, it is
likely that in such situations the designer may need to consider whether some of the smaller agents need to be coalesced
again.
Other important experiences from this work are that: (i)
speed of operation is a factor even where the real-time
requirements of the underlying processes are longer than
milliseconds (because a great many concurrent processes
are active and their interactions are cumulative); (ii) passing of intermediate results (or progress reporting) is an
effective means of increasing system parallelism; (iii) data
which is exchanged between agents ought to have a degree
of persistence so that troubleshooting can take place and
audit records can be maintained; (iv) significant improvements in the overall application can be obtained through
relatively straightforward cooperative interactions. Finally,
it is important that a clear and detailed DAI methodology is
worked out, especially in view of the above discussion about
agent granularity - ARCHON’s informal hybrid approach is
an important first step in this direction but it needs to be made
more rigorous and have an even clearer link to the software
development process.
References
Abel, E., Laresgoiti, I., Perez, J., Corera, J., and Echavarri, J., 1993. A multi-agent approach to analyse
disturbances in electrical networks. Proc. Fourth Int. Conf.
on Expert Systems Applications to Power Systems, Melbourne, Australia.
Barandiaran, J., Laresgoiti, I., Perez, J., Corera, J., and Echavarri, J., 1991. Diagnosing faults in electrical networks. Proc.
EXPERSYS 91, Paris, France.
Corera, J., Echavarri, J., Laresgoiti, I., Lazaro, J. M., and
Perez, J., 1993. On-line expert system for service restoration.
Proc. Fourth Int. Conf. on Expert Systems Applications to
Power Systems, Melbourne, Australia.
Cockburn, D., and Jennings, N. R., 1995. ARCHON: A DAI
System for Industrial Applications. In Foundations of DAI
(eds. G. M. P. O’Hare & N. R. Jennings), Wiley Interscience.
Jennings, N. R., Corera, J., Laresgoiti, I., Mamdani, E. H.,
Perriolat, F., Skarek, P., and Varga, L. Z., 1995. Using
ARCHON to develop real-world DAI applications for
electricity transportation management and particle
accelerator control. IEEE Expert.
Jennings, N. R., 1994a. Cooperation in Industrial Multi-Agent Systems. Series in Computer Science - Vol 43, World
Scientific Press.
Jennings, N. R., 1994b. The ARCHON system and its Applications. Proc. Second Int. Working Conf. on Cooperating
Knowledge Based Systems, Keele, UK, 13-29.
Jennings, N. R., and Pople, J. A., 1993. Design and Implementation of ARCHON’s Coordination Module. Proc.
Workshop on Cooperating Knowledge Based Systems,
Keele, UK, 61-82.
Jennings, N. R., Varga, L. Z., Aarnts, R., Fuchs, J., and
Skarek, P., 1993. Transforming Standalone Expert Systems
into a Community of Cooperating Agents. Int. Journal of
Engineering Applications of AI 6 (4) 317-331.
Jennings, N. R., and Wittig, T., 1992. ARCHON: Theory
and Practice. In Distributed Artificial Intelligence: Theory
and Praxis (eds. N. M. Avouris and L. Gasser), Kluwer Academic Press, 179-195.
Tuijnman, F., and Afsarmanesh, A., 1993. Distributed
Objects in a Federation of Autonomous Cooperating Agents.
Proc. Int. Conf. on Intelligent and Cooperative Information
Systems, Rotterdam, Netherlands, 256-265.
Varga, L. Z., Jennings, N. R., and Cockburn, D., 1994. Integrating Intelligent Systems into a Cooperating Community
for Electricity Distribution Management. Expert Systems
with Applications 7 (4) 563-579.