A Hybrid Approach to Data Distribution Management*
Gary Tan, Yusong Zhang, and Rassul Ayani
School of Computing
National University of Singapore
Singapore 119260
Corresponding Email: gtan@comp.nus.edu.sg
Fax: (+65) 779 4580
Abstract
One of the services provided by the High Level Architecture (HLA) Run-time Infrastructure (RTI) is
data distribution management (DDM), which aims to make data communication more efficient by
sending data to only those federates requiring the data. Several DDM methods, notably region-based and
grid-based, have been proposed and some of them have been implemented in RTI. This paper first
briefly discusses grid-based and region-based DDM and their advantages and disadvantages, and then
proposes a new hybrid approach. Our simulations show that in some situations, the hybrid
approach can reduce both the number of irrelevant messages of the grid-based DDM and the number
of matching operations of the region-based approach.
Keywords: high level architecture, data distribution management, grid-based filtering, region-based filtering, hybrid filtering
1 Introduction
One of the services provided by the High Level Architecture (HLA) Run-time Infrastructure (RTI) is data
distribution management (DDM), which aims to make data communication more efficient by sending data
to only those federates requiring the data [1]. In an HLA simulation, federates become publishers and/or
subscribers. For example, in a war-game simulation, tanks publish their positions, and other objects, say
fighter planes, that need to detect the tanks subscribe to receive the position updates published by
the tanks.
DDM serves to make the data communication more efficient by sending the data to only those
federates who need the data. This is in contrast to the broadcasting mechanism employed by Distributed
Interactive Simulation (DIS). The approaches used by DDM are aimed at reducing the message traffic
over the network, and the data set required to be processed by the receiving federates [2, 3, 4]. Several
DDM filtering mechanisms have been proposed and some of them have been implemented in RTI. The
two main types of DDM filtering are region-based and grid-based filtering.
* This research is supported by the NUS-MINDEF collaboration GR6757
The region-based filtering method uses a fundamental construct called the routing space (RS), which
is usually a two or three-dimensional coordinate system through which a federate expresses its interest in
receiving data (subscription region) or declares its intention to send data (update region). When an update
region and a subscription region of different federates overlap, the RTI establishes communications
connectivity between the publishing and subscribing federates, and the updates of the publishing federate
are then routed to the subscribing federate. Figure 1 shows one subscription region (S) and two update
regions U1 and U2 belonging to three different federates. In this example, S and U1 overlap and the
updates from the object associated with U1 will be routed to the federate of S. However, S and U2 do not
overlap and thus U2’s updates will not be routed to S.
Figure 1: Region-based Filtering in a Routing Space (subscription region S overlaps update region U1 but not update region U2)
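The overlap test that drives region-based filtering can be sketched in a few lines of Python. This is an illustrative sketch only, not the RTI implementation: the `Region` class, the `overlaps` function and all coordinates are assumptions chosen to mimic the layout of Figure 1, with regions modelled as axis-aligned rectangles in a two-dimensional routing space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in a 2-D routing space (hypothetical helper)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def overlaps(a: Region, b: Region) -> bool:
    """True when the two rectangles share a non-empty area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


# Hypothetical coordinates: S overlaps U1, while U2 lies apart from S.
S = Region(2.0, 5.0, 2.0, 5.0)
U1 = Region(4.0, 6.0, 4.0, 6.0)
U2 = Region(7.0, 8.0, 0.5, 1.5)

subscription_regions = {"S": S}

# Updates of a region are routed to every subscriber whose region overlaps it.
receivers_of_U1 = [name for name, s in subscription_regions.items() if overlaps(U1, s)]
receivers_of_U2 = [name for name, s in subscription_regions.items() if overlaps(U2, s)]
```

With these assumed coordinates, the updates of U1 are routed to S and those of U2 are routed to nobody.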
In region-based filtering, S will receive all updates of federates within U1, even those outside the
shaded area in figure 1. Thus, S will receive some data that lies within U1 but outside the
shaded area. This data is irrelevant, because S is not interested in it. Furthermore,
every update region must be compared with all subscription regions in the routing space to ascertain its
receivers, so if there are many subscription regions in the routing space, the number of matching operations is
high. The grid-based filtering approach was adopted to reduce the amount of irrelevant data and the
matching cost.
In grid-based filtering, the routing space is partitioned into a grid of cells. Now instead of direct
matching between the update regions and the subscription regions, matching is only performed within
each cell and only updates in the cell are sent to the interested subscribers in that cell, thereby reducing
the amount of irrelevant data. In our grid-based approach, two lists are associated with each grid-cell: a
list of publishers (objects that fall within the cell at a certain point in time) and a list of subscribers
(objects that are interested in receiving data about objects within the cell).
With grid-based filtering, the size of the grid-cell becomes an important issue. Large cell sizes will
produce larger multicast groups associated with each cell, while smaller cell sizes will produce smaller
lists but require more frequent updating of the group lists. Much research has focused on grid-based
filtering. Cohen and Kemkes discussed the impact of the update/subscription rate on the performance of
DDM [5, 6]. Van Hook [7] and Rizik [8] studied the performance of grid-based filtering algorithms and
showed the impact of grid cell size on communication costs and performance. Berrached et al. [9]
compared region-based and grid-based filtering.
In a previous paper [10], we described a simulation platform used to investigate the grid-based
filtering mechanism, and reported on the experimental results. We showed that the size of the grid cells
had a substantial impact on the performance of grid-based distributed simulations.
In this paper, we present a new DDM approach, a hybrid approach which combines both region-based
and grid-based concepts, and compare the cost of this approach with that of the grid-based and region-based approaches. From the results and comparisons, we find that the hybrid approach has a lower cost in
some situations.
In order to compare the cost of these DDM approaches, we use the same simulation platform as
described in [10]. The simulation platform is written in the object-oriented programming language C++
and runs on a Fujitsu AP3000 distributed-memory system with 32 nodes. The platform comprises three
sub-models: the SimObj, the DDM manager (local DDM manager and DDM coordinator), and the
Communication Layer.
The SimObj submodel simulates the movements of the simulation entities from the simulation data or
real data (trace file). The DDM manager simulates the declaration management and data distribution
management services of HLA/RTI while the communication layer provides the networking services using
TCP or UDP.
The rest of the paper is organized as follows. In section 2, we review the resource costs of region-based filtering and grid-based filtering. In section 3, we present the hybrid approach. Section 4 discusses
the results and analysis. The conclusion is presented in section 5.
2 Resource costs of DDM filtering
In the previous section, we briefly discussed the region-based filtering and the grid-based filtering. In this
section, we discuss the resource costs of these two approaches.
2.1 Cost of grid-based approach
In our simulation platform, there is a submodel named DDM manager. The DDM manager is in charge of
the data filtering mechanisms. It comprises the local DDM manager and the DDM coordinator. The local
DDM manager exists in every federate model. The multicast group is determined by the DDM
coordinator and is sent back to the publisher’s local DDM manager. Then the publisher’s local DDM
manager will connect with the subscriber’s local DDM manager and send the data to the subscribers
directly. The subscriber’s local DDM manager then filters out the irrelevant messages. The
DDM coordinator divides the whole routing space into a number of cells, and each cell will keep the two
lists, the publishers’ list and the subscribers’ list. There is no direct contact between the publishing
regions and the subscription regions; the publisher and the subscriber communicate by registering their
interest with the related cells’ publishers’ list or subscribers’ list.
For example, in a time-stepped simulation, the publication and subscription regions U and S in figure
2 are interpreted in the following way:
At a point in time, the subscriber S registers its interest in {C13, C14, C23, C24, C33, C34} and the
publisher U delivers its data to {C32, C33, C42, C43}.
So at this time, the DDM will maintain a structure looking like this:
C11: {{NULL}, {NULL}}, (first list is for publishers, second list is for subscribers)
C12: {{NULL}, {NULL}},
C13: {{NULL}, {S}},
C14: {{NULL}, {S}},
C15: {{NULL}, {NULL}}, …
C31: {{NULL}, {NULL}},
C32: {{U}, {NULL}},
C33: {{U}, {S}},
C34: {{NULL}, {S}},
C35: {{NULL}, {NULL}}, …
C45: {{NULL}, {NULL}}
Figure 2: Grid-cells in Grid-based DDM (a 4 by 5 grid of cells C11 to C45 with subscription region S and update region U)
At this time, both the publishing region and the subscription region have an overlap in C33. So the
DDM manager will notify the publisher (U) to send its updates in C33 to the subscriber (S).
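The per-cell bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of our simulation platform: the unit cell size, the `cells_covered` helper and the rectangle coordinates of S and U are hypothetical, chosen so that the regions fall into the same cells as in Figure 2.

```python
from collections import defaultdict
from math import floor

ROWS, COLS, CELL = 4, 5, 1.0   # 4 by 5 grid as in Figure 2; the unit cell size is an assumption


def cells_covered(x_min, x_max, y_min, y_max):
    """Names of the cells Cij (row i, column j, both 1-based) touched by a rectangle."""
    first_row, last_row = max(floor(y_min / CELL), 0), min(floor(y_max / CELL), ROWS - 1)
    first_col, last_col = max(floor(x_min / CELL), 0), min(floor(x_max / CELL), COLS - 1)
    return [f"C{r + 1}{c + 1}"
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


publishers = defaultdict(set)    # cell name -> publishers registered in that cell
subscribers = defaultdict(set)   # cell name -> subscribers registered in that cell

s_cells = cells_covered(2.2, 3.8, 0.3, 2.6)   # hypothetical extent of S
u_cells = cells_covered(1.4, 2.6, 2.4, 3.5)   # hypothetical extent of U

for cell in s_cells:
    subscribers[cell].add("S")
for cell in u_cells:
    publishers[cell].add("U")

# A cell that holds at least one publisher and one subscriber produces a match.
matches = {cell: (sorted(publishers[cell]), sorted(subscribers[cell]))
           for cell in sorted(publishers) if subscribers.get(cell)}
```

With these assumed coordinates, S registers in C13, C14, C23, C24, C33 and C34, U registers in C32, C33, C42 and C43, and the only cell with both a publisher and a subscriber is C33.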
In this approach, if a subscription/update region changes, the publishers’ list or the subscribers’
list of some cells will change too, and so will the internal database of the DDM coordinator. This
modification generates a cost, which we call the “list update cost”. The value of this cost depends
on the characteristics of the system hosting the DDM coordinator, such as memory size and CPU speed.
Since the DDM filtering is used in a distributed environment, there is another cost of the
communication messages within the system. We call this the “message cost”. From figure 3, which
shows the functionality of the DDM manager, we find that the message cost is generated in 3 stages.
Figure 3: DDM diagram (the publisher and the subscriber send requests to update their publishing and subscription regions to the DDM coordinator, which updates the publishing and subscription lists, performs the matching and replies with the multicast members; the publisher then sends the data)
1) Update/subscription region modification request
When the update regions or the subscription regions are changed, the information must be transferred
from the federate models to the DDM coordinator. The cost of this stage depends on the number of
federates and the update frequency. If the update frequency is high, the number of messages passed
in this stage will be large.
2) The information of multicast groups
When the DDM coordinator establishes the multicast group for every publisher by the grid-based or
region-based approach, the DDM coordinator should send this information to the publishers.
3) The multicast message
When the publisher gets the multicast group information from the DDM coordinator, it will multicast
its attributes to all members of this group. The cost at this stage depends on the size of the multicast
group.
2.2 Cost of the region-based approach
Region-based filtering also incurs the message cost. The difference is
that the grid-based approach incurs the “list update cost”, while the region-based approach incurs
the “matching cost”.
As mentioned in the previous section, in the grid-based approach, the update regions need not
compare their regions with all subscription regions. The DDM coordinator maps these regions into the
grid cells in the routing space. However, in the region-based approach, to establish the multicast group for
each update region, the DDM coordinator must compare and match this region with all subscription
regions in the routing space. In region-based filtering, if the number of objects is large, this
matching cost may be high and must be considered. In our model, the value of this cost depends on the
processing capability of the node that hosts the DDM coordinator.
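To make the source of the matching cost concrete, the sketch below counts the comparisons made by a brute-force region-based matcher. It is an illustration under stated assumptions: the function names, the tuple layout (x_min, x_max, y_min, y_max) and the coordinates are hypothetical and are not taken from our platform.

```python
def overlaps(a, b):
    """Rectangles are (x_min, x_max, y_min, y_max) tuples."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def region_based_match(update_regions, subscription_regions):
    """Return each update region's multicast group and the number of comparisons made."""
    groups, comparisons = {}, 0
    for u_name, u in update_regions.items():
        groups[u_name] = []
        for s_name, s in subscription_regions.items():
            comparisons += 1              # one matching operation per (update, subscription) pair
            if overlaps(u, s):
                groups[u_name].append(s_name)
    return groups, comparisons


# Hypothetical regions, used only to exercise the function.
updates = {"U1": (0.0, 2.0, 0.0, 2.0), "U2": (5.0, 6.0, 5.0, 6.0)}
subscriptions = {"S1": (1.0, 3.0, 1.0, 3.0),
                 "S2": (5.5, 7.0, 5.5, 7.0),
                 "S3": (8.0, 9.0, 8.0, 9.0)}

groups, comparisons = region_based_match(updates, subscriptions)
```

Every update region is compared with every subscription region, so the number of comparisons is the product of the two counts (2 * 3 = 6 here), regardless of how few regions actually overlap.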
3 Hybrid approach
The basic region-based approach generates matching cost while the grid-based approach generates list
update and message costs. We therefore propose a hybrid approach to reduce the update/message cost of
grid-based filtering and the matching cost of region-based filtering. The basic idea of this approach is
to implement the DDM services in two phases:
1. Divide the routing space into grid-cells with fixed size and map the publish/subscription regions into
the cells’ publisher list or subscriber list through the grid-based approach.
2. Use the region-based approach to perform an exact match between every update region and the
subscribers found in phase 1, which decides the multicast group for that update region.
Figure 4: The hybrid approach (a 4 by 5 grid of cells C11 to C45 with subscription regions S1, S2 and S3 and update region U1)
For example, in Figure 4, there are three subscription regions (S1, S2, S3) and one update region (U1)
in the routing space. In the first phase, using the grid-based approach, the routing space is divided into 4 by 5
cells. The cell lists look like the following:
C11{{NULL}, {S2}}, C12{{NULL}, {S2}}, C13{{NULL}, {S1,S2}}, C14{{NULL}, {S1}}, C15{{NULL}, {NULL}},
C21{{NULL}, {S2}}, C22{{NULL}, {S2}}, C23{{NULL}, {S1,S2}}, C24{{NULL}, {S1}}, C25{{NULL}, {NULL}},
C31{{NULL}, {S3}}, C32{{U1}, {S3}}, C33{{U1}, {S1}}, C34{{NULL}, {S1}}, C35{{NULL}, {NULL}},
C41{{NULL}, {S3}}, C42{{U1}, {S3}}, C43{{U1}, {NULL}}, C44{{NULL}, {NULL}}, C45{{NULL}, {NULL}}
From this, the DDM manager establishes the multicast group for U1 as {U1: S1, S3}. Then in the
second phase, the DDM manager applies the region-based approach to these candidates only. Since U1 and S3
do not overlap, the DDM manager removes S3 from U1’s multicast group. Finally, the group of U1 is
{U1: S1}.
For this example, if we had solely used the grid-based approach, the publisher’s multicast group
would have been {U1: S1, S3} (S3 is irrelevant); if we had used the region-based approach, the update
region U1 would have to be matched with all subscription regions - S1, S2 and S3.
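The two phases of this example can be sketched as follows. This is a minimal sketch and not the code of our simulation platform: the unit cell size, the helper names and the rectangle coordinates, given as (x_min, x_max, y_min, y_max), are assumptions chosen so that each region occupies the same cells as in Figure 4.

```python
from math import floor

ROWS, COLS, CELL = 4, 5, 1.0      # grid of Figure 4; the unit cell size is an assumption


def overlaps(a, b):
    """Exact overlap test for (x_min, x_max, y_min, y_max) rectangles."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def cells_covered(region):
    """Set of 0-based (row, column) indices of the cells touched by a rectangle."""
    x_min, x_max, y_min, y_max = region
    rows = range(max(floor(y_min / CELL), 0), min(floor(y_max / CELL), ROWS - 1) + 1)
    cols = range(max(floor(x_min / CELL), 0), min(floor(x_max / CELL), COLS - 1) + 1)
    return {(r, c) for r in rows for c in cols}


def hybrid_group(update, subscriptions):
    # Phase 1 (grid-based): register subscribers per cell, then collect the
    # subscribers found in the cells touched by the update region.
    cell_subscribers = {}
    for name, region in subscriptions.items():
        for cell in cells_covered(region):
            cell_subscribers.setdefault(cell, set()).add(name)
    candidates = sorted(set().union(
        *(cell_subscribers.get(cell, set()) for cell in cells_covered(update))))
    # Phase 2 (region-based): exact match against the candidates only.
    group = [name for name in candidates if overlaps(update, subscriptions[name])]
    return candidates, group


# Hypothetical coordinates mimicking Figure 4.
subscriptions = {"S1": (2.2, 3.8, 0.3, 2.6),
                 "S2": (0.2, 2.5, 0.2, 1.7),
                 "S3": (0.3, 1.3, 2.2, 3.6)}
U1 = (1.5, 2.6, 2.4, 3.5)

candidates, group = hybrid_group(U1, subscriptions)
```

With these assumed coordinates, phase 1 yields the candidates S1 and S3, and phase 2 performs two exact comparisons (instead of the three needed by pure region-based matching) and keeps only S1.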
4 Simulation results and analysis
In order to evaluate the performance of the hybrid approach, we simulate a tank dogfight scenario and run
it on a Fujitsu AP3000 32-node multiprocessor running SunOS 5.5.1. In this scenario, a two-dimensional
battlefield is mapped into a routing space and two federates, each of which contains a number of tank
objects, are simulated. It is assumed that each tank moves with a constant velocity in several time-steps.
Initially, the tanks are placed at random in the routing space and their directions are also determined at
random (North, South, East or West). This scenario is dynamic in the sense that the objects dynamically
modify their update and subscription regions.
Table 1 shows the parameters of this scenario (performed in a time-stepped simulation).
Parameter                      Value
C1                             1.0
C2                             5.0
C3                             1.0
C4                             1.0
K                              1.0
Number of Objects              80
Max Speed (km/h)               60.0
Sensor Range (km)              3.0
Routing Space (km^2)           40*40
Simulation time (time-steps)   100
Table 1. Assumptions for dog-fight scenario
Where C1 = cost of one publishing or subscription update,
C2 = cost of sending one message,
C3 = cost of filtering the irrelevant message in the grid-based approach and
C4 = matching cost in the region-based approach.
Here, K is the simulation time between two consecutive updates. K=1 means that there is one update
in each simulation time step. K=2 means there is one update after two time steps. So the smaller K is, the
higher the update frequency.
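The following sketch shows one way these parameters could be combined. It rests on two assumptions that the text does not state explicitly: that an object issues one update every K time steps over the whole run, and that the total cost is a weighted sum of event counts using C1 to C4. The function names and the event counts in the example are hypothetical and are not measured results.

```python
def updates_performed(time_steps, k):
    """Number of region updates one object issues when it updates every k time steps (assumed)."""
    return int(time_steps // k)


def total_cost(list_updates, messages, filtered, matches,
               c1=1.0, c2=5.0, c3=1.0, c4=1.0):
    """Assumed weighted-sum cost model; defaults are the unit costs of Table 1."""
    return c1 * list_updates + c2 * messages + c3 * filtered + c4 * matches


# Update counts for a 100-step run at the three update intervals used in section 4.
per_object_updates = {k: updates_performed(100, k) for k in (1.0, 2.0, 3.0)}

# Hypothetical event counts, for illustration only.
example_cost = total_cost(list_updates=100, messages=40, filtered=10, matches=0)
```

Under this reading, a message weighs five times as much as a list update, a filtering operation or a match, which is why the number of messages dominates the total when the counts are comparable.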
Figure 5 illustrates the number of messages in the hybrid and the grid-based approaches.
Figure 5: Comparison of the number of messages between grid-based and hybrid approaches (We
assume that no. of tanks=80, Update distance=4.0km, subscription distance=64.0km). Horizontal axis: number of grid cells, from (100*100) down to (1*1); vertical axis: number of messages, from 0 to 700000.
From Figure 5, we find that the hybrid approach generates fewer messages than the grid-based
approach. In the grid-based approach, the number of messages depends strongly on the grid
cell size: the larger the cell, the more irrelevant messages there are. In this scenario, the number of
irrelevant messages of the grid-based approach increases sharply after the number of cells is reduced to
100 (10*10). In the hybrid approach, since we use the region-based phase to remove the irrelevant
messages, the number of messages does not change significantly with the grid cell size
and remains low.
As mentioned previously, we expect the hybrid approach to require fewer matching operations than the region-based approach. Figure 6 compares the number of matches in the hybrid approach (with
100 grid cells) and in the region-based approach for different update frequencies (K), with 80
tanks in the simulation system.
Figure 6: Comparison of matching number between the hybrid approach and the region-based
approach. Horizontal axis: simulation time for each update (K = 1.0, 2.0, 3.0); vertical axis: matching number, from 0 to 700000.
The results presented in figure 6 support our hypothesis: using the hybrid approach reduces the
number of matching operations in the system. When K is 1, the matching number of the hybrid approach is much
lower than that of the region-based approach. With increasing K, the matching number of the region-based approach decreases, because the update frequency decreases when K is larger. The
matching number of the hybrid approach, however, increases with increasing K. The reason is that the
update/subscription regions enlarge as K increases, which affects the grid-based
phase of the hybrid approach. So from figure 6, we see that the hybrid approach performs better when the update
frequency is high, and that this advantage shrinks as K grows.
Figure 7 shows the cost comparison of the grid-based and hybrid approaches when C1=1, C2=5,
C3=C4=1, and K=1 in the dogfight scenario. When the number of cells is 10,000 (100*100), the hybrid
approach costs more than the grid-based approach. From figure 5, we know that the message costs of
the two approaches are almost the same. But because the hybrid approach has a large matching cost while the
grid-based approach’s irrelevant message filtering cost is lower, the total cost of the hybrid approach is higher
than that of the grid-based approach. As the number of cells decreases, the number of irrelevant
messages in the grid-based approach grows. Although the matching cost of the hybrid
approach also increases as the number of cells decreases, C2 (message sending cost) is larger than
C4 (matching cost), so the total cost of the grid-based approach begins to exceed the cost of the hybrid
approach.
Figure 7: Cost comparison between grid-based and hybrid approach. Horizontal axis: number of cells, from (100*100) down to (1*1); vertical axis: cost, from 0 to 4000000.
So in a distributed system, if the network cost is much higher than the system cost of the nodes
(C2>>C4), we suggest using the hybrid approach with large grid-cells instead of the grid-based approach.
5 Conclusion
Data distribution is an important issue in large scale distributed simulations with several thousands of
entities. The broadcasting mechanism employed in Distributed Interactive Simulation (DIS) generates
unnecessary network traffic and is not suitable for large scale and dynamic simulations. An efficient data
distribution mechanism should filter the data and forward only the needed data to each federate. DDM
filtering approaches are aimed at reducing the unnecessary network traffic.
In this paper, we discussed the cost of two DDM filtering approaches. The grid-based approach
generates list update and message costs, and Figure 5 suggests that one should avoid using very large
cell sizes. The region-based approach generates a high matching cost, so it is best
considered when the update frequency is low (when K is large in Figure 6); otherwise,
the matching cost becomes prominent. To overcome this limitation, the approach can be extended with other techniques
such as clustering and hierarchies, or with fully distributed models.
The hybrid approach proposed in this paper is an improvement over the region-based and the grid-based approaches. Preliminary results show that with this method, the matching cost is lower than that of
the region-based approach, and this advantage is more apparent if the update frequency is high. It also
produces a lower number of irrelevant messages than that of the grid-based approach using large cell
sizes. Finally, if the network communications cost is high, the hybrid method with large grid cell sizes is
recommended.
In this paper, only one scenario was used. More scenarios and experiments must be run to test the
general applicability of this hybrid approach.
References
1. Daniel J. Van Hook and James O. Calvin, Data distribution management in RTI 1.3, in Proceedings
of the Simulation Interoperability Workshop (SIW), Spring 1998.
2. HLA Data Distribution Management: Design Documents Version 0.7 (November 12, 1997).
3. Sundir Srinivasan and Paul F. Reynolds, Communications, Data Distribution and other Goodies in
the HLA Performance Model, in Proceedings of the Winter Simulation Conference, 1997.
4. Ivan Tacic and Richard Fujimoto, Synchronized data distribution management in distributed
simulation, in Proceedings of the Simulation Interoperability Workshop (SIW), Spring 1997.
5. Danny Cohen and Andreas Kemkes, User-Level Measurement of DDM Scenarios, in Proceedings of
the Simulation Interoperability Workshop (SIW), Spring 1997.
6. Danny Cohen and Andreas Kemkes, Applying user-level measurements to RTI 1.3 Release 2, in
Proceedings of the Simulation Interoperability Workshop (SIW), Fall 1998.
7. Steven J. Rak and Daniel J. Van Hook, Evaluation of grid-based relevance filtering for multicast
group assignment, in Proceedings of the Distributed Interactive Simulation, 1996.
8. Pete Rizik et al., Optimal geographic routing space cell size in the FEDEP for prey-centric models, in
Proceedings of the Simulation Interoperability Workshop (SIW), Spring 1998.
9. A. Berrached, M. Beheshti, and O. Sirisaengtaksin, Evaluation of Grid-based Data Distribution in
the HLA, in Proceedings of the 1998 Conference on Simulation Methods and Applications, Orlando
FL, November 1-3 1998, pp. 209-215.
10. G. Tan, R. Ayani, Y.S. Zhang and F. Moradi, Grid-based Data Management in Distributed
Simulation, to appear in Proceedings of 33rd Annual Simulation Symposium, Washington, U.S.A.,
April 2000.