ResQ Freiburg: Team Description and Evaluation

Alexander Kleiner Michael Brenner Tobias Bräuer
Christian Dornhege Moritz Göbelbecker Mathias Luber
Johann Prediger Jörg Stückler
Institut für Informatik
Universität Freiburg
79110 Freiburg, Germany
www.informatik.uni-freiburg.de/~rescue/
Abstract
ResQ Freiburg is the world champion of the 2004 RoboCup competition in the Rescue simulation league. RoboCupRescue is a large-scale multi-agent simulation of urban
disasters where, in order to save lives and minimize damage, rescue teams must effectively cooperate despite sensing and communication limitations. To accomplish this,
ResQ Freiburg introduced new methods for hierarchical path planning, death-time prediction of civilians, coordination of multi-agent city exploration, as well as an any-time
rescue sequence optimization based on genetic algorithms. To evaluate the usefulness
of these techniques we performed an extensive evaluation of the log files of the best
participating teams in the competition. Our analysis explains the reasons for our team’s
success, and thus could also provide an evaluation tool for future competitions.
1 Introduction
The RoboCupRescue simulation league is a part of the RoboCup competitions and aims at
simulating large scale disasters and exploring new ways for the autonomous coordination
of rescue teams [6]. These goals are socially highly significant and feature challenges unknown to other RoboCup leagues, like the coordination of heterogeneous teams with more
than 30 agents, the search for civilians by the exploration of a large scale environment, as
well as the scheduling of time critical rescue missions. Moreover, challenges similar to
those found in other RoboCup leagues are inherent to the domain: The simulated environment is highly dynamic and partially observable by a single agent. Agents have to plan and
decide their actions asynchronously in real-time.
This paper presents the approach of the ResQ Freiburg team, the winner in the
RoboCupRescue simulation league at RoboCup 2004. The contributions of this work are
methods for hierarchical real-time path planning, the prediction of the life-time of civilians
based on CART [1] regression and AdaBoost [4], the efficient coordination of an active
disaster space exploration based on senses, communication and reasoning, as well as an
any-time rescue sequence optimization based on genetic algorithms.
We compare the performance of our team with that of other teams in terms
of their capabilities in extinguishing fires, clearing roads of debris, exploring the disaster space, and rescuing civilians. The evaluation is carried out with information extracted from
simulation log files that were gathered during the RoboCup competition 2004. Our results
clearly explain the success of our team and also support the approaches proposed in this
paper.
The remainder of this paper is structured as follows. We present the general agent architecture in Section 2. The following three sections concern general abilities of all agents:
Path planning is described in Section 3, communication in Section 4, and exploration in
Section 5. Specific capabilities are described in Section 6 (civilian rescue) and Section 7
(extinguishing of fires). In Section 8 we present some of the tools for developing our agents.
Finally, an extensive evaluation and analysis of the 2004 RoboCupRescue competition is
given in Section 9.
2 Agent architecture
The basic task of every agent (platoons as well as centers) is to collect, store and evaluate information, to choose actions fitting best to the situation and finally to execute these
actions.
For storing information the agent maintains a detailed world model. Once connected
to the kernel the world model consists of the initial information about the buildings, nodes
and roads. It is updated in every cycle by the sensing process. The agent “perceives”
changes within its hearing and seeing radius, i.e. they are automatically communicated by the
kernel; additional updates result from communication with other agents. Active sensing for
specific tasks such as exploration is explained in Section 5.
In order to flexibly handle the different duties of agents all of their abilities are specified
in terms of tasks. A Task consists of its preconditions (describing executability of a task),
the process itself (including actions to perform and messages to send), and a priority value
(which may be updated as a result of new information). For example, firebrigade agents
are able to extinguish buildings. Consequently, there is an ExtuingishTask which describes
moving to the firescene, calculation and prediction of fire evaluation, and coordination with
the other platoons. However, there are also more basic tasks (like Move) which are created
by higher level tasks.
Creation of a task is triggered by agents (or existing tasks) as a result of updates to the
world model and the agent’s reasoning. However, there is no fixed decision hierarchy or
control flow, but the decision process results from the ranking of tasks collected in a priority
queue. To prevent frequent changes of decisions the current task can only be overruled
by a task of a markedly higher priority or by a command from the station or a leading
agent (cf. Section 7). Once a task has been performed, the next one in the queue is activated
immediately. If a task has had no effect for more than 3 rounds, it is destroyed so that the agent is
not stuck for too long.
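The decision process described above can be illustrated with a small sketch. It is not the team's implementation: the class name `TaskScheduler`, the numeric margin that stands in for a "markedly higher" priority, and the `had_effect` flag are assumptions made for this example; only the limit of 3 rounds comes from the text.

```python
import heapq
import itertools

OVERRULE_MARGIN = 2.0  # assumed threshold for a "markedly higher" priority
MAX_IDLE_ROUNDS = 3    # from the text: destroy a task without effect for more than 3 rounds


class TaskScheduler:
    """Priority queue of tasks with hysteresis against frequent task switches."""

    def __init__(self):
        self._heap = []                    # entries: (-priority, insertion order, name)
        self._counter = itertools.count()  # tie-breaker for equal priorities
        self.current = None                # (priority, name) of the active task
        self.idle_rounds = 0

    def add(self, name, priority):
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def finish(self):
        """Mark the current task as performed so the next one can be activated."""
        self.current = None

    def step(self, had_effect=True):
        """Advance one cycle and return the name of the task to execute."""
        if self.current is not None:
            self.idle_rounds = 0 if had_effect else self.idle_rounds + 1
            if self.idle_rounds > MAX_IDLE_ROUNDS:
                self.current = None  # the stuck task is destroyed
        if self._heap:
            best_priority = -self._heap[0][0]
            if (self.current is None
                    or best_priority >= self.current[0] + OVERRULE_MARGIN):
                neg_priority, _, name = heapq.heappop(self._heap)
                if self.current is not None:
                    # the overruled task goes back into the queue
                    self.add(self.current[1], self.current[0])
                self.current = (-neg_priority, name)
                self.idle_rounds = 0
        return self.current[1] if self.current else None
```

With this sketch, a running task with priority 1.0 is not interrupted by a new task with priority 2.0, but it is overruled by one with priority 5.0.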
3 Path planning
Path planning is a basic capability for every rescue agent wanting to reach its selected
target. However, ResQ Freiburg agents heavily rely on their path planning capabilities
already during target selection. They determine travel durations for almost all possible
targets in advance, which enables them to evaluate the trade-off between a target’s utility
and travel duration. This allows the agents to take future travel costs into account very
early during target selection. However, it also means that an agent may query the path planner
hundreds or sometimes thousands of times during one cycle!
We achieved the necessary efficiency for such frequent path planner queries with the
three techniques described in this section. First and foremost, planning is done on a
graph considerably smaller than the original city map which consists only of the combined
road segments between crossings, so-called “longroads”. Secondly, instead of heuristic
search we use an efficient implementation of Dijkstra’s algorithm [2] on the longroad graph.
Counterintuitive as that may seem, this turns out to be effective as in every cycle every agent
issues a great number of path queries all starting at the same point (its current position)
which the path planner can answer in constant time after having computed and cached
Dijkstra’s algorithm just once. A simple extension on top of the standard algorithm enables
agents to plan to and from locations that are not crossings. Thirdly, we use caching and
object pools to prevent repeated computations of the same paths and, more importantly,
to prevent the Java virtual machine from creating new objects repeatedly. Since these are
low-level methods specific to the Java language they are not covered in this paper.
The specific usage of the path planner that results from the cycle-based movements of
agents in the RoboCupRescue domain is described at the end of this section.
3.1 Planning with longroads
There are many possible locations between which agents can move on a RoboCupRescue
map: buildings, streets, and street nodes. Paths between these positions are sequences of
road-node tuples, possibly starting or ending with buildings (for a more accurate description
cf. [7]). The most obvious way to compute such plans (and the way implemented in the
YabAPI library [9]) is to search the graph spanned by those connected roads, nodes, and
buildings. Note that edges in the transition graph do not correspond to roads, but to the
adjacency information from the GIS which is communicated to the agents at the beginning
of the simulation. In the transition graph, roads are just another kind of vertex! Using
the given representation, the transition graph usually consists of several thousand nodes and
edges. A (slightly simplified) example is shown in Fig. 1.
Figure 1: Roads, longroads, and crossings (figure labels: b1, n1, r1, r2, c11, c12)
In the non-hierarchical GIS representation that was chosen for RoboCupRescue, many
nodes (e. g. n1) connect exactly two road segments. This is mainly due to the fact that
buildings are represented as having an entry node in the Rescue simulation through which
they are entered from the adjacent road or left again. We will call these nodes “internal” in
the following.
The key realization underlying the ResQ Freiburg planner is that for an agent entering
an internal node from one road segment there is no choice but to move to the second connected road segment (unless he wants to reach a building connected to the internal node).
Intuitively, a set of road segments that are interlinked by internal nodes can be considered
to be one single road, a “longroad”. If agents enter a longroad and do not want to reach
a building situated on the longroad they have no choice but to move on to the end of it.
Longroads are terminated by crossings (more than two adjacent roads; for simplicity dead
end nodes with only one adjacent road will be called crossings, too). In Fig. 1 internal
nodes are shown in black and crossings in white. The route between crossings c11 and c12
constitutes a longroad which we will name lr1 in the following.
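The contraction of road segments into longroads can be sketched as follows. This is a simplified illustration, not the ResQ planner itself: it assumes an undirected graph given as a dictionary `adj` in which only nodes are vertices and road segments are weighted edges (in the kernel representation roads are vertices, too), and the function name `build_longroads` is hypothetical. Buildings and closed loops that contain no crossing are ignored.

```python
def build_longroads(adj):
    """Contract chains of internal nodes into longroads.

    adj: {node: {neighbour: segment_length}} for an undirected road graph.
    Returns (crossings, longroads); each longroad is a tuple
    (head, tail, total_length, list_of_nodes_from_head_to_tail).
    """
    # Crossings have more than two adjacent roads; dead ends (one road) count too.
    crossings = {n for n, nbrs in adj.items() if len(nbrs) != 2}
    longroads = []
    for head in sorted(crossings):
        for first in sorted(adj[head]):
            path, length = [head, first], adj[head][first]
            prev, node = head, first
            while node not in crossings:
                # An internal node offers no choice: continue to its other neighbour.
                nxt = next(n for n in adj[node] if n != prev)
                length += adj[node][nxt]
                path.append(nxt)
                prev, node = node, nxt
            # Every longroad is discovered from both ends; keep one orientation.
            if path <= path[::-1]:
                longroads.append((head, node, length, path))
    return crossings, longroads
```

For a chain c1, n1, n2, c2 in which n1 and n2 are internal nodes, the sketch returns a single longroad from c1 to c2 whose length is the sum of the three segments.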
Since the only points of choice for the agent are crossings the ResQ planner builds a new
transition graph consisting of crossings as vertices and longroads as edges. A longroad r
has two endpoints, its head and tail. An optimal path $P(c_1, c_2)$ between vertices on the new
graph, i.e. crossings on the original one, can be found with any shortest-path algorithm and
is easily recompiled to the original RoboCupRescue format by replacing longroads with
the sequence of their internal nodes and roads, either from head to tail or from tail to head.
However, since the desired paths usually neither start nor end at crossings, some additional processing on top is required. With the exception of buildings directly connected to crossings
and crossings themselves, every location $l$ on a map lies on exactly one longroad (see footnote 1). Consequently, the head and tail of this longroad, called $c^1_l$ and $c^2_l$ in the following, are the
only crossings directly reachable from $l$ without passing another one.
Any route between two locations $s$ and $e$ (where $s$ and $e$ are assumed not to
be on the same longroad) must include either $c^1_s$ or $c^2_s$ and, respectively, either $c^1_e$ or $c^2_e$. The
length of an optimal path from $s$ to $e$ therefore is:
$$\min_{i,j \in \{1,2\}} \left( |s\,c^i_s| + P(c^i_s, c^j_e) + |c^j_e\,e| \right)$$
where $|s\,c^i_s|$ and $|c^j_e\,e|$ denote the lengths of the direct routes between a location and its adjacent crossings.
To evaluate this formula efficiently, the ResQ planner precomputes and stores the direct
routes from a location to its adjacent crossings. The optimal paths from the crossings $c^1_s$ and
$c^2_s$ to all other crossings are computed and stored during two runs of Dijkstra’s algorithm.
As mentioned above, this turns out to be quite efficient because during one cycle the agent
usually queries many paths to locations all over the map. Single-target search could be
sped up by the use of heuristics but would have to be repeated for every target. On the very
sparse longroad graphs (see footnote 2), running Dijkstra’s algorithm twice turned out to be more efficient.
Moreover, the longroad graphs computed from the maps currently used in the competition
are so small (a few hundred nodes and edges) that the search process takes less time than
the re-transformation of the path found to the kernel format.
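The query scheme can be sketched in a few lines: two cached Dijkstra runs per cycle, followed by a minimum over the four head/tail combinations for each target. The class `LongroadPlanner`, its method names and the `access` table (direct distances from a location to its two adjacent crossings) are assumptions made for this illustration; as in the text, start and target are assumed not to lie on the same longroad.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest path lengths on {node: {neighbour: length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # outdated queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


class LongroadPlanner:
    def __init__(self, crossing_graph, access):
        # crossing_graph: {crossing: {crossing: longroad_length}}
        # access: {location: [(crossing, direct_distance), (crossing, direct_distance)]}
        self.graph = crossing_graph
        self.access = access
        self._start = None
        self._cache = {}

    def start_cycle(self, start):
        """Run Dijkstra once from each crossing adjacent to the agent's position."""
        self._start = start
        self._cache = {c: dijkstra(self.graph, c) for c, _ in self.access[start]}

    def cost(self, target):
        """min over i, j of |s c_s^i| + P(c_s^i, c_e^j) + |c_e^j e|, using cached results."""
        inf = float("inf")
        return min(ds + self._cache[cs].get(ce, inf) + de
                   for cs, ds in self.access[self._start]
                   for ce, de in self.access[target])
```

After `start_cycle`, every call to `cost` needs only four dictionary lookups, which is what makes thousands of queries per cycle affordable.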
3.2 Cycle-based planning
Efficiency is not the only criterion for path planning in the RoboCupRescue domain. Since
the simulation is cycle-based, finding paths with minimal lengths or even minimal duration
may not be the wisest choice. For example, two paths differing only by a few meters or,
respectively, a few seconds can often be considered as equivalent as long as they take the
same number of cycles to travel.
Therefore the ResQ path planner supports planning with cost functions that return real-valued travel costs that can be interpreted as (fractions of) cycles. We have provided
several such functions, accounting for or ignoring aspects like partially blocked roads, other
agents, unknown road states, etc., that give good approximations of the travel time an agent
has to expect according to its current knowledge of the world.
This allows the agents to build equivalence classes among paths and, consequently,
targets. Several selection mechanisms make it possible to optimize other criteria when the numbers
of expected cycles of the paths to a set of targets are equal or in some specific range. It is
thus possible for an agent to select the most important target among the ones most easily
reachable or, vice-versa, the closest among the most important targets.
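As an illustration of such a selection mechanism, the following sketch groups targets by the number of whole cycles needed to reach them and then picks the most important target among the most easily reachable ones. The function name, the tuple layout and the `max_extra_cycles` tolerance are assumptions for this example, not part of the ResQ planner interface.

```python
import math


def select_target(targets, max_extra_cycles=0):
    """Pick the most important target among the most easily reachable ones.

    targets: list of (name, importance, travel_cost), with travel_cost given in
    (fractions of) cycles. Costs that round up to the same number of cycles are
    treated as equivalent.
    """
    best_cycles = min(math.ceil(cost) for _, _, cost in targets)
    reachable = [t for t in targets
                 if math.ceil(t[2]) <= best_cycles + max_extra_cycles]
    return max(reachable, key=lambda t: t[1])[0]
```

Two targets with travel costs of 2.2 and 2.9 cycles fall into the same class (both need three cycles), so the more important one wins even though it is slightly farther away.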
4 Communication and world modeling
Communication fulfills two purposes. Primarily it is used for exchanging information about
the current world state. Thus the agents obtain information that is observed by others and
so improve their local world model. The second task of communication is the coordination
of the agents. Agents can send instructions to other agents in order to coordinate and
optimize their actions.
Footnote 1: In principle, buildings can have several entrances and thus connect to several longroads, but to the
best of our knowledge, not even the Rescue kernel currently supports this feature.
Footnote 2: The standard RoboCupRescue maps never feature more than four longroads meeting at a crossing.
In the RoboCupRescue simulation, platoon agents can only hear messages from platoons and the center of their own type. Furthermore, center agents can only hear messages
from their platoons and other centers. As in the real world, the number of messages that
can be sent or received in parallel is limited. A platoon agent can send or receive at most
4 messages in each cycle. The length of a message is limited to 256 bytes. A center agent
can send or receive at most 2*n messages in each cycle, where n is the number of platoon
agents of the same type as the center.
In order to communicate as many different pieces of information as possible in a single message
and to allow easy dispatching of (parts of) its content to other agents, each message sent
by a ResQ agent is composed of small self-contained units of information, called tokens.
Each token consists of a header section and a data section. The header section stores type,
priority, length and receivers of the token. The data section contains the information, which
the token transfers. For instance there is a token of the type ROAD_BLOCKED, which
transmits lists of blocked roads.
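The token scheme can be sketched as follows. The byte layout is purely an assumption for illustration (one byte each for type, priority, data length and a receiver bit mask); the paper does not specify the actual encoding, and the names `Token`, `pack_messages` and `unpack_message` are hypothetical. Only the 256-byte message limit is taken from the text.

```python
import struct
from dataclasses import dataclass

MAX_MESSAGE_BYTES = 256          # message length limit stated in the text
HEADER = struct.Struct("BBBB")   # assumed layout: type, priority, data length, receivers


@dataclass
class Token:
    type_id: int
    priority: int
    receivers: int   # assumed bit mask of receiving agent types
    data: bytes

    def encode(self):
        return HEADER.pack(self.type_id, self.priority,
                           len(self.data), self.receivers) + self.data


def pack_messages(tokens):
    """Greedily fill messages with whole tokens, highest priority first."""
    messages, current = [], b""
    for token in sorted(tokens, key=lambda t: -t.priority):
        if HEADER.size + len(token.data) > MAX_MESSAGE_BYTES:
            raise ValueError("token does not fit into a single message")
        blob = token.encode()
        if len(current) + len(blob) > MAX_MESSAGE_BYTES:
            messages.append(current)
            current = b""
        current += blob
    if current:
        messages.append(current)
    return messages


def unpack_message(message):
    """Decompose a message into its self-contained tokens again."""
    tokens, offset = [], 0
    while offset < len(message):
        type_id, priority, length, receivers = HEADER.unpack_from(message, offset)
        offset += HEADER.size
        tokens.append(Token(type_id, priority, receivers,
                            message[offset:offset + length]))
        offset += length
    return tokens
```

Because every token carries its own header, a station can decompose a message, keep some tokens and forward others without understanding the rest of the payload.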
Both platoons and centers may initiate communication. Most commonly this is the case
when the agent perceives changes in the world, i.e. receives updates to its world model from
the kernel. For example, a platoon may initiate communication by composing a message
of tokens and sending it to the respective station. The station, upon receiving the message,
decomposes it to tokens again, interprets their content and, if necessary, updates its own
local world model. After that the message is passed on to the other platoons of the same
type and to other stations which may again route the information to their platoons.
In order to prevent the repeated passing on of a token, the stations store for each token
whether it was already sent. Thus, although agents may receive the same information from
different sources, they will only pass it on once, avoiding infinite cycles of communication.
Since platoons may accept only four messages per cycle it is impossible for platoons
to rely on obtaining all information from their peers. Therefore most information is routed
through the center agents. Direct communication between platoons takes place only in
time-critical situations where instructions have to be passed quickly, e.g. between the fire brigades at a specific fire site.
5 Exploration
5.1 Knowledge Base
The Knowledge Base (KB) maintains the knowledge of an agent on the relation between
the set of civilians $C$ and the set of locations $L$. This is carried out by maintaining for
each civilian $c \in C$ a set of locations $L_c$ that contains all possible locations of the civilian.
Furthermore, we maintain for each location $l \in L$ a set of civilians $C_l$ that contains all
civilians that are possibly situated at location $l$. Initially, $\forall c \in C: L_c = L$ and $\forall l \in L: C_l = C$.
The KB allows us to calculate the expected number of civilians situated
at any location $l$. This is achieved by calculating the probability that civilian $c$ is situated at
location $l$, given the current state of the knowledge base:
$$P(loc(c) = l \mid KB_t) = \begin{cases} \frac{1}{|L_c|} & \text{if } l \in L_c \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
This yields the expected number of civilians situated at location $l$:
$$E[|C_l|] = \sum_{i=0}^{|C|} P(loc(c_i) = l \mid KB_t) \qquad (2)$$
Note that it follows from the above that initially the expectation for each location $l$ is given by
$E[|C_l|] = \frac{|C|}{|L|}$, that is, we expect civilians to be uniformly and independently distributed on
the map. This is clearly not the case if buildings differ in size or in their degree
of destruction. However, in order to incorporate this information, one would have to
switch to a belief-based internal representation of the KB.
The KB is updated by visual or auditory perception, by perceptions communicated by other agents, and by reasoning. Both communicated and experienced perceptions are
inserted into the KB with respect to the agent’s sensor model. The sensor model returns for
any location $l$ the sets of locations $V_l$ and $A_l$ that are in visual (10m) or auditory (30m) range
of $l$, respectively. Based on the sensor model, one can perform either positive or negative
update operations on the KB:
1. Positive updates:
(a) Civilian c seen at l: $c = l \wedge c \notin L \setminus l$
(b) Civilian c heard at l: $c \in A_l \wedge c \notin L \setminus A_l$
2. Negative updates:
(a) No civilian seen at l: $\forall c \in C \Rightarrow c \neq l$
(b) No civilian heard at l: not implemented
The KB is implemented as a $|C| \times |L|$ boolean matrix, where $C$ is the set of civilians and $L$
the set of locations. An entry $\langle c, l \rangle$ is set to false if civilian $c$ is definitely not at location
$l$, and set to true otherwise (including the case of uncertainty). Initially, all entries are set
to true.
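A minimal sketch of such a knowledge base is given below. The class and method names are assumptions for illustration; the matrix, the uniform probability of equation 1, the expectation of equation 2 and the three implemented update operations follow the description above.

```python
class KnowledgeBase:
    """Boolean |C| x |L| matrix: entry [c][l] is False if c is definitely not at l."""

    def __init__(self, civilians, locations):
        self.civ_index = {c: i for i, c in enumerate(civilians)}
        self.loc_index = {l: j for j, l in enumerate(locations)}
        self.matrix = [[True] * len(self.loc_index) for _ in self.civ_index]

    def probability(self, civilian, location):
        """Equation 1: 1/|L_c| if the location is still possible, else 0."""
        row = self.matrix[self.civ_index[civilian]]
        if not row[self.loc_index[location]]:
            return 0.0
        return 1.0 / sum(row)

    def expected_civilians(self, location):
        """Equation 2: expected number of civilians at the location."""
        return sum(self.probability(c, location) for c in self.civ_index)

    def seen_at(self, civilian, location):
        """Positive update: the civilian is at this location and nowhere else."""
        row = self.matrix[self.civ_index[civilian]]
        for j in range(len(row)):
            row[j] = (j == self.loc_index[location])

    def heard_at(self, civilian, audible):
        """Positive update: the civilian is somewhere in the audible set A_l."""
        row = self.matrix[self.civ_index[civilian]]
        for l, j in self.loc_index.items():
            if l not in audible:
                row[j] = False

    def none_seen_at(self, location):
        """Negative update: no civilian is at this location."""
        j = self.loc_index[location]
        for row in self.matrix:
            row[j] = False
```

With two civilians and four locations, the initial expectation for every location is 2/4 = 0.5; after a negative update on one location the expectation of each remaining location rises to 2/3.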
5.2 District exploration
District exploration is a multi-agent behavior for the coordinated search of buried civilians.
The behavior guarantees that at any time each agent is assigned to a reachable and unexplored district on the map. The search is carried out by all agent teams (ambulance teams,
police forces and fire brigades) if they do not have to perform tasks with higher priority, such
as extinguishing or rescuing.
Coordinated exploration is difficult due to the fact that the blockage of roads and hence
the reachability of regions is unknown to the agents in advance. To solve this problem, we
implemented various clustering techniques, such as agglomerative clustering and KD-tree
based clustering. These methods calculate from a given connectivity graph $G = \langle V, E \rangle$ of
a city, where V represents the set of locations and E the set of connections between them,
a hierarchical clustering. The hierarchical clustering, represented by a binary tree, provides
at each level a partitioning of the city into n districts, reflecting the reachability of locations
on the map. Note that the clustering has to be revised continuously, because the state of the
roads and thus the connectivity changes during each cycle of the simulation.
Since agent teams are assigned arbitrarily to the exploration task (see footnote 3), the total number of
assigned agents is usually unknown. Therefore each team performs a separate clustering
of the map that overlaps with the clusterings of other teams. From the clustering, agents
are uniquely assigned to a district within a team; if there are fewer districts than agents,
districts are assigned multiple times. In order to prevent agents from exploring locations twice, it
is necessary to further coordinate the search by the negotiation of exploration targets (see
section 5.5).
The exploration within a district can be accelerated significantly if exploration targets
are selected with respect to the agent’s sensor model. The sensor model provides for each
location on the map the set of locations that can be observed from that location. In order
to foster the selection of locations with high entropy, a utility value $U(l)$ is calculated that is
equal to the number of locations $|O_l|$ observable from location $l$:
$$U(l) = |O_l| \qquad (3)$$
Footnote 3: Agents might be busy with more important tasks, such as extinguishing fires.
The overall sum of utilities over time can be maximized by selecting targets with
high utility as well as targets that are reachable within a short amount of time. Hence, from
the set of locations $L_D$ that are within the agent’s district, a target location $l_t$ is chosen
based on the trade-off between utility $U(l)$ and travel cost $C(l)$:
$$l_t = \operatorname{argmax}_{l \in L_D} \; U(l) - \alpha \, C(l) \qquad (4)$$
where $\alpha$ is a constant regulating the trade-off between the estimated travel costs and the
exploration utility and has to be determined experimentally. The estimated travel costs
$C(l)$ are provided by the path planner described in section 3.
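Equations 3 and 4 translate directly into a few lines of code. The sketch below assumes that the sensor model is available as a dictionary `observable` and the planner's estimates as a dictionary `travel_cost`; both names, like the function name, are hypothetical.

```python
def select_exploration_target(district, observable, travel_cost, alpha):
    """Equation 4: argmax over the district of U(l) - alpha * C(l), with U(l) = |O_l|."""
    def score(location):
        utility = len(observable[location])       # equation 3
        return utility - alpha * travel_cost[location]
    return max(district, key=score)
```

A small α favours locations with a wide field of view even if they are far away, while a large α makes the agent prefer nearby locations.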
5.3 Active Exploration
Active exploration is an extension of the previously described district exploration task in
that the search focuses on locations with strong evidence of civilian whereabouts. This is
carried out by exploiting the knowledge collected in the KB from senses, communication and reasoning (see section 5.1). Evidence from the KB is utilized by calculating the
utility value $U(l)$ in equation 4 with respect to the number of civilians expected to be found
at all observable locations $O_l$:
$$U(l) = \sum_{k \in O_l} E[|C_k|] \qquad (5)$$
which yields, after inserting equation 2:
$$U(l) = \sum_{k \in O_l} \sum_{i=0}^{|C|} P(loc(c_i) = k \mid KB_t) \qquad (6)$$
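The evidence-based utility can be computed directly from the sets of possible locations. The following sketch assumes the KB is given as a dictionary `possible_locations` mapping each civilian to its set $L_c$, and the sensor model as a dictionary `observable`; these names are illustrative only.

```python
def expected_civilians(possible_locations, location):
    """Equation 2: each civilian contributes 1/|L_c| if the location is in L_c."""
    return sum(1.0 / len(lc)
               for lc in possible_locations.values()
               if location in lc)


def active_utility(possible_locations, observable, location):
    """Equations 5 and 6: expected civilians summed over all observable locations."""
    return sum(expected_civilians(possible_locations, k)
               for k in observable[location])
```

For example, if civilian c1 is known to be at a or b and civilian c2 at one of b, c, d or e, then a location from which a and b can be observed has utility 0.5 + (0.5 + 0.25) = 1.25.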
5.4 Active surveillance
Furthermore, it is important for the rescue team to have up-to-date information on the injured civilians that have been found by the exploration task. The prediction module, described in section 6.1, can only provide predictions of the civilian life time that are as accurate as the information on buriedness, damage and hitpoints is up to date. As we will describe in section
6.2, the number of civilians that can be rescued depends on the efficiency of the rescue
team, which in turn depends on the accuracy of predictions. Hence we extended the active
exploration behavior so that it assigns agents to the surveillance of known civilian locations
after the map has been explored sufficiently. The surveillance behavior is carried out by
sampling interesting locations randomly from the set of known civilian locations, where
locations with obsolete information are selected with high probability.
The number of agents that are assigned to active search is limited to $\frac{|L|}{k}$, where $L$ is the
set of open locations and $k$ a constant that has to be determined experimentally. A small
$k$ might cause the agents to waste resources on the search, whereas a large $k$ increases the
time needed to finish the search. All agents above the assignment limit perform
active surveillance.
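The split between search and surveillance can be sketched as follows. Several details are assumptions made for this example: the limit $|L|/k$ is rounded down, agents are split in list order, and the sampling weight of a location grows linearly with the age of its information. The paper only states that obsolete locations are selected with high probability.

```python
import random


def split_agents(agents, open_locations, k):
    """Assign at most |L| / k agents to active search, the rest to surveillance."""
    limit = len(open_locations) // k   # rounding down is an assumption
    return agents[:limit], agents[limit:]


def sample_surveillance_target(last_update, now, rng=random):
    """Sample a known civilian location, preferring outdated information.

    last_update: {location: cycle in which the information was last refreshed}
    """
    locations = list(last_update)
    weights = [1 + now - last_update[l] for l in locations]  # assumed linear weighting
    return rng.choices(locations, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible, which is convenient when replaying a simulation run.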
5.5 Team coordination
Besides the agent distribution due to the assignment of districts, it is necessary to further
coordinate the multi-agent search in order to prevent the multiple exploration of locations.
This is carried out by communicating the information on found civilians, as well as locations that have been visited. However, if agents select exploration targets from
the same district (e.g. due to the overlap or the shortage of available targets), it might still
occur that they explore locations twice. We implemented two methods for reducing the
probability of multiple target exploration. Firstly, agents select exploration targets from a
probability distribution. Secondly, agents negotiate targets they plan to explore in the next
cycle via the short range communication channel (say and hear).
It turned out that the latter performs poorly if agents are able to move much longer
distances in a cycle than they are able to observe, which is true for the current parameter
setting of the RoboCupRescue kernel. The problem could be solved by performing the
negotiation via the long range communication. Unfortunately, this does not pay off, since
communication is a limited resource. Hence agents decide their exploration targets by a
random selection that prefers targets that yield a high score according to equation 4.
6 Civilian Rescue
6.1 Lifetime prediction
To achieve good results in the civilian rescue process, it is necessary to know when a
civilian will die. If there is a reliable prediction for the life time of a certain civilian the
scheduling of the rescue operation can be adapted accordingly. On the one hand, it is
possible that a civilian does not need to be rescued at all, because it is alive at the end of
the simulation. On the other hand, it is possible that a civilian will die within a short time
and has to be rescued as soon as possible in order to survive.
For the ResQ Freiburg agents, machine learning was used to gain a prediction for the
civilian’s life time and classification into survivors and victims. We created an autorun tool
that starts the kernel and the agents simultaneously in order to collect arbitrary data. The
tool was used for several simulation runs on the Kobe, VC and Foligno maps, from which
a large number of data sets was generated. A data set consists of the values for health and
damage of each civilian at each time step gained during the simulation. In order to reduce
the noise in the data, simulations were carried out under the following conditions:
• Simulation time: 400 rounds
• No rescue processes by the agents
• No fires
The latter two conditions are necessary in order to prevent unexpected changes of the damage of a
civilian due to its rescue, resulting in zero damage, or due to fires, resulting in unpredictably
high damage. For the calculation of the life time, a time of death has to be determined
for each data set. Hence, the simulation time was chosen to be 400 rounds, which seemed to
be a good compromise between an ideal simulation time of ∞ and the standard simulation
time of 300 rounds that would lead to a non-uniform distribution of the data sets.
Regression and classification were carried out with the WEKA [10] machine learning
tool. We utilized the C4.5 algorithm (decision trees) for the classification task. The regression of the simulation time is based on Adaptive Boosting (AdaBoost) [4]. Since the
current implementation of the WEKA tool only provides AdaBoost for classification,
we had to extend this implementation for regression [3], which then has been applied with
regression trees (CART) [1].
The regression trees have been evaluated on test data sets in order to learn the confidence of a prediction as a function of the civilian’s damage and the distance between the
timestamp of the data set and the predicted time of death. Confidence values are necessary,
since predictions become less accurate the larger the difference between the observation and the
civilian’s actual time of death. The sequence optimization, described in section 6.2, relies
on the confidence values in order to minimize sequence fluctuations.
6.2 Genetic Sequence Optimization
If the time needed for rescuing civilians and the life time of civilians is predictable, one can
estimate the overall number of survivors after executing a rescue sequence by a simulation.
For each rescue sequence $S = \langle t_1, t_2, ..., t_n \rangle$ of $n$ rescue targets, a utility $U(S)$ is calculated that is equal to the number of civilians that are expected to survive. Unfortunately an
exhaustive search over all $n!$ possible rescue sequences is intractable. A straightforward
solution to the problem is, for example, to sort the list of targets by the time necessary to
reach and rescue them and to subsequently rescue targets from the top of the list. However,
as shown in section 9, this might lead to sub-optimal solutions. Hence we decided to utilize a Genetic Algorithm (GA) for the optimization of sequences and thus the subsequent
improvement of existing solutions [5].
The time for rescuing civilians is approximated by a linear regression based on the
buriedness of a civilian and the number of ambulance teams participating in the rescue.
Travel costs between two targets are estimated by averaging over costs sampled during
previous simulation runs. This is much more efficient than the calculation of exact travel
costs, involving in the worst case the calculation of the Floyd-Warshall matrix (see footnote 4).
The GA is initialized with heuristic solutions, for example solutions that greedily
prefer targets that can be rescued within a short time or urgent targets that have a short
lifetime. The fitness function of solutions is set equal to the previously described utility
$U(S)$. In order to guarantee that solutions in the genetic pool are at least as good as the
heuristic solutions, the so-called elitism mechanism, which forces the permanent existence
of the best solution in the pool, has been used. Furthermore we utilized a simple one-point-crossover strategy, a uniform mutation probability of $p \approx 1/n$ and a population size of 10.
Within each cycle, 500 populations of solutions are calculated by the ambulance station,
from which the best sequence is broadcast to the ambulance teams, which synchronously
start to rescue the first civilian in the sequence.
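The optimization loop can be sketched as follows. This is an illustration under simplifying assumptions, not the team's implementation: travel times are ignored, a civilian that cannot be rescued in time is skipped at no cost, parents are chosen by a binary tournament, the "500 populations" are read as 500 generations, and the one-point crossover is made permutation-safe by filling the tail from the second parent. All function names are hypothetical.

```python
import random


def survivors(sequence, civilians):
    """U(S): civilians rescued before their predicted time of death.

    civilians: {name: (rescue_duration, predicted_time_of_death)}.
    """
    time, saved = 0.0, 0
    for name in sequence:
        duration, death_time = civilians[name]
        if time + duration <= death_time:
            time += duration
            saved += 1
    return saved


def crossover(a, b, rng):
    """One-point crossover: head of a, remaining targets in the order of b."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    return head + [t for t in b if t not in head]


def mutate(sequence, rng):
    """Swap each position with a random one with probability 1/n."""
    seq = list(sequence)
    for i in range(len(seq)):
        if rng.random() < 1.0 / len(seq):
            j = rng.randrange(len(seq))
            seq[i], seq[j] = seq[j], seq[i]
    return seq


def optimize(civilians, generations=500, pop_size=10, seed=0):
    rng = random.Random(seed)
    names = sorted(civilians)

    def fitness(seq):
        return survivors(seq, civilians)

    # Heuristic seeds: shortest rescue first and most urgent first.
    population = [sorted(names, key=lambda c: civilians[c][0]),
                  sorted(names, key=lambda c: civilians[c][1])]
    while len(population) < pop_size:
        population.append(rng.sample(names, len(names)))
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism keeps the best solution
        while len(children) < pop_size:
            a = max(rng.sample(population, 2), key=fitness)
            b = max(rng.sample(population, 2), key=fitness)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=fitness)
```

Because the best sequence is always copied into the next generation, the result can never be worse than the better of the two heuristic seeds, which is the guarantee the elitism mechanism is meant to provide.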
One difficulty of the sequence optimization is that the information stored
in the KB on civilians changes dynamically during each round and thus might cause fluctuations of the rescue sequence. There are two reasons for this: Firstly, civilians are
discovered by the active exploration, which is executed by other agents at the same time.
Secondly, predictions vary due to information updates from active or passive surveillance.
The latter effect can be weakened by updating the sequence with respect to the confidence
of predictions. Updates of the information on civilians are ignored if they are not statistically significant with respect to their confidence interval.
The effect of information updates due to exploration has to be controlled by trading off rescue latency against rescue permanence. Therefore we implemented a reactive
mechanism that recognizes emergency rescue targets from information updates and, if
one has been detected, causes a revision of the current target sequence. An
emergency situation is given if another target has to be rescued immediately in order to
survive while the current target would still survive if its rescue were postponed.
7 Extinguishing fires
7.1 Prediction
To fight fires effectively a good prediction of the fire spread is essential. But in order to
make this prediction, we need two pieces of information for each building: a) the set of
surrounding buildings and b) the time it takes for a fire to spread to those neighbors.
For the latter, we determined that the time until a building reaches fieriness 2 (which we
dubbed orange time) is a useful quantity. It not only allows us to compute the fire-start
time but is also a good indicator of the difficulty of the building. To compute the orange
time, we used automated data-mining techniques as outlined in section 6.1.
Footnote 4: The Floyd-Warshall algorithm calculates the shortest paths from all sources to all targets in $O(n^3)$.
To compute the fire neighbourhood, we use an algorithm which checks
the distances between the walls of the source building and those of another building. If the distance is
smaller than a value d, it checks whether other buildings lie in between. If this is not the case, the
second building is added to the fire neighbourhood of the source building. To improve
the speed, only buildings whose center is inside a radius r from the source are considered.
However, as this calculation is performed only once at connect time, performance is not a
high priority.
To predict the fire spread, we use a modified Dijkstra’s algorithm in which all burning buildings
are put into the queue at the start. The distance d(A, B) between two buildings A
and B is defined as the time it takes for building A to set building B on fire. If fire cannot
jump over from A to B, d(A, B) is infinite.
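A multi-source variant of Dijkstra's algorithm along these lines could look as follows. This is a minimal sketch: the input format (a nested dictionary of spread times d(A, B)) and the function name `predict_ignition_times` are assumptions made for illustration.

```python
import heapq
import math


def predict_ignition_times(spread_time, burning):
    """Predict when each building catches fire.

    spread_time: dict building -> {neighbour: d(building, neighbour)};
                 a missing entry or math.inf means fire cannot jump over.
    burning:     buildings that are already on fire (ignition time 0).
    """
    ignition = {b: 0.0 for b in burning}
    # Modification of Dijkstra: all burning buildings start in the queue.
    queue = [(0.0, b) for b in ignition]
    heapq.heapify(queue)
    while queue:
        t, a = heapq.heappop(queue)
        if t > ignition.get(a, math.inf):
            continue  # stale queue entry, a better time is already known
        for b, d in spread_time.get(a, {}).items():
            if math.isinf(d):
                continue
            if t + d < ignition.get(b, math.inf):
                ignition[b] = t + d
                heapq.heappush(queue, (t + d, b))
    return ignition
```

Buildings that never appear in the result are not reached by any fire; the returned values are relative to the current cycle.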
Additionally, the algorithm is used to implement a clustering of the fires. Using union-find
structures, two fire sites that will touch within N turns are merged. Thus, at the beginning
of the simulation every fire is in its own site, and merging takes place as the fire
spreads.
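The merging step can be sketched with a small union-find structure. The sketch assumes that the spread prediction has already produced, for each fire, the set of buildings it will reach within N turns; this `reach` input and the function name `cluster_fires` are hypothetical.

```python
class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_fires(reach):
    """reach: dict fire -> set of buildings it will reach within N turns."""
    uf = UnionFind(reach)
    owner = {}  # building -> first fire that claims it
    for fire, buildings in reach.items():
        for b in buildings | {fire}:
            if b in owner:
                uf.union(owner[b], fire)  # two fires touch: merge their sites
            else:
                owner[b] = fire
    sites = {}
    for fire in reach:
        sites.setdefault(uf.find(fire), set()).add(fire)
    return sorted(sorted(site) for site in sites.values())
```

For example, `cluster_fires({"A": {"B", "C"}, "D": {"C"}, "E": {"F"}})` groups A and D into one site because both reach building C, while E remains a site of its own.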
7.2 Target selection
Target selection is implemented in two stages: first, a fire site is selected, then a specific
building in that site.
Site selection is based on the cost/utility ratio C/U, where C is the sum of the orange times
of all buildings in the fire site and U = b + α · civs, where b is the number of buildings that
will be set on fire by the site in the next turns and civs is the number of civilians in these
buildings.
Additionally, buildings that are predicted to catch fire before the
agent can reach the fire site are added to the costs.
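A minimal sketch of the site selection follows. Two points are assumptions rather than statements from the text: that the site with the lowest ratio C/U is preferred, and that late-igniting buildings enter the cost with their orange times. The class `FireSite`, its fields and the value of α are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FireSite:
    name: str
    orange_times: list           # orange time of each building in the site
    threatened_buildings: int    # b: buildings set on fire in the next turns
    threatened_civilians: int    # civs: civilians in those buildings
    late_orange_times: list = field(default_factory=list)  # ignite before arrival


def cost_utility_ratio(site, alpha):
    cost = sum(site.orange_times) + sum(site.late_orange_times)
    utility = site.threatened_buildings + alpha * site.threatened_civilians
    return math.inf if utility == 0 else cost / utility


def select_site(sites, alpha=2.0):
    """Pick the site with the lowest cost per unit of utility (assumed rule)."""
    return min(sites, key=lambda s: cost_utility_ratio(s, alpha))
```

A small site that threatens many civilians thus wins over a large site whose extinction would protect little.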
In the second stage, a fire on the selected site is chosen. To do this, we first remove all
fires from the site that have no unburned fire neighbourhood or have fieryness = 3. Fires
that cannot be extinguished with the available agents are dropped, as are fires that are
not reachable. The remaining fires are then ordered by a priority function
P(b) = (α · unburned + β · extinguished + γ · civs + f(dist)) · g(direction)
where unburned and extinguished are the respective numbers of buildings in the fire
neighbourhood, civs is the number of living civilians in unburned buildings, and f is a
function that computes a bonus or malus depending on the distance between the agent and
the fire. g is a function that penalises fires that will spread towards the edge of the map,
with g approaching zero the nearer the building is to the edge.
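The priority function can be written down directly once f and g are fixed. The text does not specify their shape, so the linear `distance_bonus` and `edge_factor` below, the use of the distance to the map edge as a stand-in for the direction argument, and all weights are purely hypothetical choices for illustration.

```python
def distance_bonus(dist, scale=100.0):
    """Hypothetical f: bonus for near fires, falling linearly to a malus of -1."""
    return max(-1.0, 1.0 - dist / scale)


def edge_factor(edge_dist, margin=50.0):
    """Hypothetical g: approaches 0 near the map edge, 1 beyond `margin`."""
    return max(0.0, min(1.0, edge_dist / margin))


def priority(unburned, extinguished, civs, dist, edge_dist,
             alpha=1.0, beta=0.5, gamma=2.0):
    """P(b) = (alpha*unburned + beta*extinguished + gamma*civs + f(dist)) * g."""
    base = (alpha * unburned + beta * extinguished
            + gamma * civs + distance_bonus(dist))
    return base * edge_factor(edge_dist)


def order_fires(fires):
    """fires: list of (name, kwargs for priority); highest priority first."""
    return sorted(fires, key=lambda f: priority(**f[1]), reverse=True)
```

With these illustrative parameters, a fire with four unburned neighbours, two extinguished neighbours and one civilian at distance 50 and edge distance 25 obtains (4 + 1 + 2 + 0.5) · 0.5 = 3.75.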
7.3 Coordination
Due to world-model differences and different starting positions, agents might select different buildings to extinguish. As this is usually unwanted, we introduced several mechanisms
to coordinate the fire brigades.
Every time a fire brigade agent switches to a different fire site (or chooses one for the
first time) it sends a message to the station containing the site. The station will collect these
messages and send a summary of the number of brigades at every site to the agents every
few turns. The agents will then try to switch to a reachable site with the largest number of
other fire brigades.
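The exchange between station and brigades can be sketched as follows. The message format is simplified to plain dictionaries, and the tie-breaking rule (stay at the current site when counts are equal) is an assumption that is not stated in the text.

```python
from collections import Counter


def brigade_summary(assignments):
    """Station side: count brigades per site from the collected messages.

    assignments: dict agent id -> site the agent last announced.
    """
    return Counter(assignments.values())


def choose_site(own_site, summary, reachable):
    """Agent side: pick the reachable site with the most other brigades."""
    others = Counter(summary)
    if own_site is not None:
        others[own_site] -= 1  # do not count ourselves
    candidates = [s for s in reachable if others[s] > 0]
    if not candidates:
        return own_site  # nobody else to join: keep the current choice
    # Prefer larger groups; on a tie, stay where we are (assumed rule).
    return max(candidates, key=lambda s: (others[s], s == own_site))
```

Because every agent applies the same rule to the same summary, the brigades tend to converge on a common site within a few turns.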
In addition to this, each agent tells the station for every site whether it is reachable or not. If
an agent is refuelling, no site is reachable. From these messages the station computes the
maximum number of available agents for each site and sends it to the fire brigades every
turn. The brigades can then use this information to avoid extinguishing buildings they
cannot put out with the available number of agents.
Figure 2: The ResQ-Viewer with several plug-ins
We also implemented a leader mechanism. Each fire site – if there are any agents in it
– has a platoon leader who sends his current and next target to the other agents. As these
orders are usually very time-critical, they are not relayed over the fire station. The other
agents will then try to extinguish the same fire as their leader. If this is not possible for
some reason, or the building has already been put out, they will fall back to the next building the
leader sent – more fall-backs are usually not necessary, as the message latency is normally
only one turn.
The leaders are assigned by the station using the following scheme:
• If a site has no leader, the first agent who switches to this site becomes leader.
• If a leader has to refuel, he resigns leadership two turns beforehand but continues to
send orders in this time. The other agents that are not in the refuge then send their
remaining water quantity to the station, which then announces the agent with the
most water as the new leader.
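The assignment scheme can be sketched as a small piece of station-side bookkeeping. The class and method names are hypothetical, and the two-turn resignation notice is not modelled; the sketch only shows the two rules from the list.

```python
class FireStation:
    def __init__(self):
        self.leader = {}  # site -> id of the current platoon leader

    def on_site_switch(self, agent, site):
        """Rule 1: the first agent switching to a leaderless site leads it."""
        self.leader.setdefault(site, agent)
        return self.leader[site]

    def on_leader_refuel(self, site, water_reports):
        """Rule 2: the agent with the most water becomes the new leader.

        water_reports: dict agent id -> remaining water, containing only
        agents of this site that are not in the refuge.
        """
        if not water_reports:
            self.leader.pop(site, None)  # nobody left to lead this site
            return None
        new_leader = max(water_reports, key=water_reports.get)
        self.leader[site] = new_leader
        return new_leader
```

For instance, after agent 7 opens site "A" and later announces refuelling, reports of `{3: 4000, 9: 6500}` would make agent 9 the new leader.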
8 Development and debugging tools
8.1 Viewer tools
In order to be able to program and debug a system as complex as a RoboCupRescue agent
system, we developed an extensive tool set based on the viewer by T. Morimoto [8].
Every module in our agents is able to write status and debug data to a number of log files
in either human- or machine-readable form. Additionally, every agent writes each change
in its world model – updates from the kernel, from communication with other agents, and
from its own reasoning – to a world log.
We heavily modified the Morimoto viewer so that our viewer is able to maintain several
world models and several viewer windows at the same time. This enables us, using the
world log from the agents, to display the internal world model of each agent alongside the
real kernel view.
Another important development was the design of a plug-in API that allowed us to
extend the capabilities of the viewer at any time without running into maintainability problems.
Every plug-in is attached to a viewer window and has access to its world model. It can also
interact with the viewer by receiving mouse clicks from the window and marking objects in
different colours and styles. By reading the log files, a plug-in can then show additional
information about the internal state of an agent, either in the viewer window (e.g. its current
target or the planned path) or in its own window (e.g. the task queue). Furthermore, the
plug-in system allows us to implement or modify some of our algorithms in the viewer by
writing an appropriate plug-in. This makes development much easier, as one does not have to
start the agents and evaluate their performance after every change, and one gets direct
feedback on parameter changes without having to change the source code and recompile.
8.2 Batchcontroller
The batchController software (see footnote 5) is designed for data mining in the RoboCupRescue domain.
The module starts simulation runs automatically and logs formatted data specific to a given
learning task.
A task receives pre- and postcondition calls in each cycle to evaluate task-specific conditions on the world and thus determine which objects are to be logged and for which objects logging is to be stopped. Each task has a name that is used to produce unique log file names
automatically.
9 Results
During the competition, teams are evaluated by an overall score that is calculated based on
the state of civilian health and building destruction. However, since this score incorporates
the total performance of all agent skills, such as exploration, extinguishing and rescuing, it
is difficult to assess single agent skills directly. In order to compare our agents with agents
from other teams, we evaluated the performance of typical agent skills from the log files that were collected during the 2004 competition. The following tables provide
results from all rounds of all teams that passed the preliminaries. All values refer to
the end of a run, e.g. the percentage of clean roads at cycle 300. Bold numbers denote the
best results that have been achieved during the respective round.
Table 1 shows the percentage of blockades that have been removed by the police agents.
The results show that the teams Damas Rescue and The Black Sheep in particular removed
blockades from the roads most efficiently.
Table 2 shows the percentage of buildings that have been saved by the fire brigades.
The team Damas Rescue achieved the best result in the largest number of rounds, whereas SBC showed
robust behavior, as reflected by its good average value.
The efficiency of exploration is another important criterion for the team evaluation.
The more locations of civilians are known, the more efficiently rescue operations can be
scheduled. Table 3 shows the percentage of buildings that were visited by agents (see footnote 6).
The result shows that Caspian explored most of the buildings. However, the percentage
of explored buildings does not necessarily correlate with the percentage of found civilians,
5 Full documentation at http://kaspar.informatik.uni-freiburg.de/project2/batchController/batchController.html
6 Note that full communication of visited locations as well as exploitation of a sensor model was assumed.
Table 1: Percentage of clean roads

| Round | ResQ | Damas | Caspian | BAM | SOS | SBC | ARK | B.Sheep |
|---|---|---|---|---|---|---|---|---|
| Final-VC | 74.68 | 82.22 | 71.79 | 70.43 | N/A | N/A | N/A | N/A |
| Final-Random | 77.84 | 86.51 | 77.66 | 63.10 | N/A | N/A | N/A | N/A |
| Final-Kobe | 92.25 | 93.74 | 92.08 | 92.05 | N/A | N/A | N/A | N/A |
| Final-Foligno | 96.41 | 97.72 | 97.22 | 96.07 | N/A | N/A | N/A | N/A |
| Semi-VC | 67.93 | 79.57 | 68.86 | 57.90 | 67.22 | 57.85 | 53.27 | 80.53 |
| Semi-Random | 82.53 | 87.44 | 77.47 | 81.93 | 82.26 | 79.53 | 80.30 | 78.76 |
| Semi-Kobe | 92.40 | 93.65 | 92.71 | 92.51 | 92.62 | 92.56 | 93.55 | 99.72 |
| Semi-Foligno | 95.45 | 97.08 | 95.58 | 96.37 | 96.93 | 97.07 | 95.92 | 83.44 |
| Round2-Kobe | 92.52 | 93.52 | 91.46 | 92.46 | 92.78 | 93.45 | 92.25 | 99.50 |
| Round2-Random | 87.74 | 90.03 | 87.62 | 87.71 | 87.86 | 88.73 | 85.03 | 99.97 |
| Round2-VC | 91.34 | 91.62 | 90.74 | 89.87 | 91.40 | 90.92 | N/A | 98.86 |
| Round1-Kobe | 89.19 | 89.51 | 87.78 | 88.21 | 88.30 | 87.70 | 91.12 | 81.17 |
| Round1-VC | 91.90 | 92.13 | 91.74 | 91.84 | N/A | 91.81 | 91.54 | 99.82 |
| Round1-Foligno | 95.84 | 96.92 | 96.52 | 96.36 | 94.19 | 96.62 | 97.63 | 80.15 |
| Number of wins | 0 | 7 | 0 | 0 | 0 | 0 | 2 | 5 |
| AVG % | 87.72 | 90.83 | 87.09 | 85.49 | 88.17 | 87.62 | 86.73 | 90.19 |
| STD % | 8.25 | 5.09 | 8.59 | 11.25 | 8.93 | 11.59 | 13.63 | 9.96 |
Table 2: Percentage of saved buildings

| Round | ResQ | Damas | Caspian | BAM | SOS | SBC | ARK | B.Sheep |
|---|---|---|---|---|---|---|---|---|
| Final-VC | 47.21 | 54.13 | 81.67 | 43.19 | N/A | N/A | N/A | N/A |
| Final-Random | 24.04 | 26.38 | 15.03 | 12.35 | N/A | N/A | N/A | N/A |
| Final-Kobe | 38.24 | 61.89 | 38.38 | 13.51 | N/A | N/A | N/A | N/A |
| Final-Foligno | 91.15 | 62.77 | 60.92 | 34.56 | N/A | N/A | N/A | N/A |
| Semi-VC | 23.45 | 23.60 | 25.49 | 27.14 | 19.12 | 25.10 | 26.36 | 27.22 |
| Semi-Random | 23.18 | 28.73 | 18.09 | 19.55 | 22.82 | 21.45 | 17.09 | 18.91 |
| Semi-Kobe | 96.49 | 76.76 | 94.32 | 95.41 | 24.32 | 90.54 | 55.27 | 94.19 |
| Semi-Foligno | 36.22 | 38.06 | 32.72 | 37.79 | 31.89 | 28.48 | 26.82 | 23.23 |
| Round2-Kobe | 70.27 | 37.03 | 59.73 | 95.41 | 48.38 | 61.49 | 10.54 | 95.54 |
| Round2-Random | 99.04 | 60.91 | 54.68 | 99.16 | 63.55 | 97.60 | 80.70 | 99.52 |
| Round2-VC | 10.23 | 11.57 | 10.23 | 13.53 | 12.67 | 71.99 | N/A | 36.51 |
| Round1-Kobe | 99.46 | 98.92 | 99.73 | 99.73 | 99.05 | 98.78 | 67.16 | 91.89 |
| Round1-VC | 97.25 | 99.53 | 79.70 | 99.76 | N/A | 98.90 | 99.53 | 99.53 |
| Round1-Foligno | 98.99 | 98.99 | 36.13 | 45.99 | 32.53 | 54.29 | 43.59 | 29.86 |
| Number of wins | 3 | 5 | 2 | 2 | 0 | 1 | 0 | 3 |
| AVG % | 61.09 | 55.66 | 50.49 | 52.65 | 39.37 | 64.86 | 47.45 | 61.64 |
| STD % | 37.80 | 34.11 | 31.83 | 37.50 | 27.28 | 31.63 | 30.49 | 36.70 |
as shown by Table 4 (see footnote 7). This is due to the fact that communication as well as reasoning
might increase the efficiency of exploration. In the end, more civilians were found by
ResQ Freiburg than by Caspian, even though the latter explored more buildings.
Also important for efficient rescue operations is the point in time at which civilian whereabouts
become known. The earlier civilians are found, the better their rescue can be scheduled. Figures
3 and 4 show the number of civilians found during each cycle on the RandomMap. The
results confirm the efficiency of ResQ Freiburg’s exploration: at any time, the agents knew
about more civilians than the agents of any other team.
Figure 5 documents the difference between a greedy rescue target selection, i.e. preferring targets that can be rescued quickly, and a selection based on optimization by a genetic
algorithm. It can be seen that optimizing the rescue sequence clearly increases the
number of rescued civilians.
Finally, Table 5 shows the number of civilians saved by each team: ResQ Freiburg saved
620 civilians over all rounds, which is 35 more than the second-best and 59
more than the third-best team in the competition.
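These totals can be checked directly against the per-round values in Table 5. The short script below is only a verification aid; the lists are copied from the ResQ, Damas and Caspian columns of that table, in the row order Final-VC through Round1-Foligno.

```python
# Saved civilians per round, copied from Table 5 (14 rounds per team).
saved = {
    "ResQ":    [42, 32, 46, 66, 18, 22, 57, 37, 57, 52, 31, 45, 62, 53],
    "Damas":   [43, 25, 45, 54, 15, 26, 47, 46, 37, 48, 33, 51, 62, 53],
    "Caspian": [52, 29, 46, 50, 17, 16, 54, 44, 43, 39, 32, 47, 55, 37],
}

totals = {team: sum(values) for team, values in saved.items()}
lead_over_second = totals["ResQ"] - totals["Damas"]
lead_over_third = totals["ResQ"] - totals["Caspian"]

print(totals)            # per-team totals over all rounds
print(lead_over_second)  # ResQ minus Damas
print(lead_over_third)   # ResQ minus Caspian
```

The sums reproduce the "Σ TOTAL" row of Table 5 (620, 585 and 561) and hence the stated leads of 35 and 59 civilians.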
7 Note that civilians are considered as being found if one of the agents was within their visual range.
Table 3: Percentage of explored buildings

| Round | ResQ | Damas | Caspian | BAM | SOS | SBC | ARK | B.Sheep |
|---|---|---|---|---|---|---|---|---|
| Final-VC | 83.48 | 83.24 | 87.02 | 67.27 | N/A | N/A | N/A | N/A |
| Final-Random | 69.62 | 72.62 | 78.13 | 49.92 | N/A | N/A | N/A | N/A |
| Final-Kobe | 89.19 | 92.97 | 89.73 | 94.19 | N/A | N/A | N/A | N/A |
| Final-Foligno | 84.15 | 85.25 | 86.73 | 74.29 | N/A | N/A | N/A | N/A |
| Semi-VC | 69.39 | 72.86 | 77.42 | 45.08 | 52.01 | 52.87 | 47.92 | 59.72 |
| Semi-Random | 78.91 | 68.73 | 71.91 | 54.36 | 59.36 | 70.27 | 46.18 | 46.18 |
| Semi-Kobe | 85.41 | 96.22 | 92.97 | 95.54 | 66.62 | 97.30 | 99.46 | 91.89 |
| Semi-Foligno | 74.75 | 89.12 | 84.98 | 62.49 | 65.35 | 92.53 | 79.08 | 20.74 |
| Round2-Kobe | 87.16 | 90.68 | 95.00 | 91.76 | 80.54 | 94.19 | 99.46 | 92.43 |
| Round2-Random | 81.18 | 80.94 | 88.61 | 84.53 | 60.67 | 94.24 | 82.61 | 87.89 |
| Round2-VC | 83.40 | 70.18 | 84.58 | 40.44 | 67.74 | 87.88 | N/A | 89.54 |
| Round1-Kobe | 87.43 | 90.27 | 94.05 | 96.08 | 96.62 | 97.70 | 97.84 | 80.95 |
| Round1-VC | 85.37 | 90.48 | 95.28 | 94.26 | N/A | 97.72 | 100.00 | 91.35 |
| Round1-Foligno | 83.78 | 90.05 | 90.05 | 60.00 | 54.65 | 88.57 | 67.37 | 13.00 |
| Number of wins | 1 | 1 | 4 | 1 | 0 | 2 | 4 | 1 |
| AVG % | 81.66 | 83.83 | 86.89 | 72.16 | 67.06 | 87.33 | 79.99 | 67.37 |
| STD % | 5.82 | 9.98 | 7.87 | 22.21 | 13.87 | 14.59 | 21.84 | 30.80 |
Table 4: Percentage of found civilians

| Round | ResQ | Damas | Caspian | BAM | SOS | SBC | ARK | B.Sheep |
|---|---|---|---|---|---|---|---|---|
| Final-VC | 97.22 | 94.44 | 100.00 | 81.94 | N/A | N/A | N/A | N/A |
| Final-Random | 90.91 | 85.71 | 81.82 | 70.13 | N/A | N/A | N/A | N/A |
| Final-Kobe | 98.77 | 97.53 | 95.06 | 98.77 | N/A | N/A | N/A | N/A |
| Final-Foligno | 96.67 | 96.67 | 96.67 | 72.22 | N/A | N/A | N/A | N/A |
| Semi-VC | 77.92 | 77.92 | 85.71 | 45.45 | 53.25 | 53.25 | 50.65 | 63.64 |
| Semi-Random | 88.51 | 73.56 | 72.41 | 63.22 | 67.82 | 80.46 | 52.87 | 55.17 |
| Semi-Kobe | 100.00 | 100.00 | 100.00 | 98.61 | 79.17 | 100.00 | 100.00 | 97.22 |
| Semi-Foligno | 90.12 | 95.06 | 86.42 | 81.48 | 83.95 | 97.53 | 85.19 | 30.86 |
| Round2-Kobe | 98.89 | 98.89 | 97.78 | 95.56 | 91.11 | 100.00 | 100.00 | 98.89 |
| Round2-Random | 98.89 | 95.56 | 98.89 | 81.11 | 70.00 | 96.67 | 85.56 | 94.44 |
| Round2-VC | 92.22 | 78.89 | 90.00 | 45.56 | 72.22 | 88.89 | N/A | 87.78 |
| Round1-Kobe | 94.29 | 100.00 | 100.00 | 98.57 | 100.00 | 100.00 | 94.29 | 78.57 |
| Round1-VC | 100.00 | 100.00 | 100.00 | 97.14 | N/A | 100.00 | 100.00 | 98.57 |
| Round1-Foligno | 100.00 | 97.14 | 94.29 | 77.14 | 74.29 | 92.86 | 77.14 | 14.29 |
| Number of wins | 9 | 4 | 7 | 1 | 1 | 5 | 3 | 0 |
| AVG % | 94.60 | 92.24 | 92.79 | 79.06 | 76.87 | 90.97 | 82.85 | 71.94 |
| STD % | 7.17 | 10.53 | 9.03 | 20.75 | 13.73 | 14.69 | 19.35 | 30.25 |
10 Conclusion
The results presented in Section 9 clearly show the strengths of the ResQ Freiburg team – as
well as some points for future improvement. We are confident that efficient exploration
and rescue sequence optimization are crucial components of the overall team performance.
It is interesting to note that during the final on the RandomMap, which decided the ranking between Damas Rescue and ResQ Freiburg by a margin of only 0.4 points of the total score, ResQ
Freiburg was able to rescue seven more civilians than the second-best team.
However, the Knowledge Base, and thus the exploration, is built upon the
agents’ world model and efficient information exchange by communication. In order to
obtain a reliable world model, one has to implement various tools to optimize and
understand the information processing in the background. The implementation of communication and world modeling was probably one of the tasks on which we spent most of our
time.
Another problem that had to be tackled by our team was the strong limitation on computational resources. Without an efficient implementation of the path planner, our team,
whose code is written entirely in Java, would not have been able to compete.
In summary, the results presented provide an interesting insight into the 2004 competition: besides strategies for extinguishing fires and the removal of blockades, exploration and sequence optimization are also crucial subproblems of the RoboCupRescue simulation league.
[Figure 3 plot. Title: Round: Semifinal, Map: RandomMap. x-axis: Time (0 to 300); y-axis: Percent Civilians Found. Curves: ARK, BAM, Caspian, DAMAS, ResQFreiburg, SBCe, SOS, TheBlackSheep.]
Figure 3: The number of civilians found by exploration on a randomly generated map during the semi-final
[Figure 4 plot. Title: Round: Final, Map: RandomMap. x-axis: Time (0 to 300); y-axis: Percent Civilians Found. Curves: BAM, Caspian, DAMAS, ResQFreiburg.]
Figure 4: The number of civilians found by exploration on a randomly generated map during the final
References
[1] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and regression trees. Wadsworth & Brooks, 1984.
[2] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1992.
[3] H. Drucker. Improving regressors using boosting techniques. In Proc. 14th International Conference on Machine Learning, pages 107–115. Morgan Kaufmann, 1997.
[4] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In International Conference on Machine Learning, pages 148–156, 1996.
[Figure 5 plot. Title: Number of saved civilians. y-axis: # Civilians (30 to 70); x-axis: maps KobeEasy, KobeHard, KobeMedium, KobeVeryHard, RandomMapFinal, VCEasy, VCFinal, VCVeryHard. Series: Greedy-Heuristic, Genetic Algorithm.]
Figure 5: The number of rescued civilians under two different strategies
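Table 5: Number of saved civilians

| Round | ResQ | Damas | Caspian | BAM | SOS | SBC | ARK | B.Sheep |
|---|---|---|---|---|---|---|---|---|
| Final-VC | 42 | 43 | 52 | 34 | N/A | N/A | N/A | N/A |
| Final-Random | 32 | 25 | 29 | 16 | N/A | N/A | N/A | N/A |
| Final-Kobe | 46 | 45 | 46 | 30 | N/A | N/A | N/A | N/A |
| Final-Foligno | 66 | 54 | 50 | 29 | N/A | N/A | N/A | N/A |
| Semi-VC | 18 | 15 | 17 | 12 | 11 | 12 | 12 | 14 |
| Semi-Random | 22 | 26 | 16 | 14 | 20 | 14 | 15 | 15 |
| Semi-Kobe | 57 | 47 | 54 | 52 | 20 | 39 | 34 | 44 |
| Semi-Foligno | 37 | 46 | 44 | 43 | 42 | 28 | 29 | 24 |
| Round2-Kobe | 57 | 37 | 43 | 50 | 43 | 35 | 28 | 43 |
| Round2-Random | 52 | 48 | 39 | 45 | 47 | 44 | 50 | 37 |
| Round2-VC | 31 | 33 | 32 | 24 | 37 | 51 | N/A | 34 |
| Round1-Kobe | 45 | 51 | 47 | 43 | 47 | 31 | 25 | 34 |
| Round1-VC | 62 | 62 | 55 | 57 | N/A | 51 | 54 | 44 |
| Round1-Foligno | 53 | 53 | 37 | 33 | 37 | 41 | 30 | 23 |
| #Wins | 9 | 5 | 2 | 0 | 0 | 1 | 0 | 0 |
| Σ TOTAL | 620 | 585 | 561 | 482 | 304 | 346 | 277 | 312 |
| Σ SEMI+PREM | 434 | 418 | 384 | 373 | 304 | 346 | 277 | 312 |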
[5] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
[6] H. Kitano, S. Tadokoro, I. Noda, H. Matsubara, T. Takahashi, A. Shinjou, and S. Shimada. RoboCup Rescue: Search and rescue in large-scale disasters as a domain for autonomous agents research. In IEEE Conf. on Man, Systems, and Cybernetics (SMC99), 1999.
[7] T. Morimoto. How to Develop a RoboCupRescue Agent, 2002. http://ne.cs.uec.ac.jp/~morimoto/rescue/manual/.
[8] T. Morimoto. RoboCup Rescue Viewer, 2002. http://ne.cs.uec.ac.jp/~morimoto/rescue/viewer/index.html.
[9] T. Morimoto. YabAPI agent development kit, 2002. http://ne.cs.uec.ac.jp/~morimoto/rescue/yabapi/.
[10] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco, 2000.