Characterizing Scenarios for DDM Performance and Benchmarking RTIs
Katherine L. Morse
Science Applications International Corporation
10260 Campus Point Drive, MS B-1-E
San Diego, CA 92121
619-552-5442, 619-552-5112
katherine_morse@cpqm.saic.com

Department of Information & Computer Science
University of California
Irvine, CA 92697-3425
kmorse@ics.uci.edu
Dr. Lubomir Bic
Dr. Michael Dillencourt
Department of Information & Computer Science
University of California
Irvine, CA 92697-3425
{bic, dillenco}@ics.uci.edu
Keywords: DDM, benchmark, RTI
ABSTRACT: As more High Level Architecture (HLA) Run Time Infrastructures (RTIs) become available, potential
users are faced with the prospect of choosing one which best fits the performance characteristics of their applications.
This prospect is particularly complex when evaluating the performance of HLA Data Distribution Management
(DDM) services because detailed design information for DDM systems is not widely available. Furthermore, it is not
clear that the users can reasonably be expected to extrapolate expected performance of DDM on their applications
from a detailed DDM design. This problem is not new to computer and compiler users in general. The solution to
this problem has long been the development and wide exercise of benchmarks. For a benchmark set to be effective in
its domain, it must precisely identify quantifiable characteristics of interest and exercise them independently and in
combinations which are directly comparable to potential applications. This paper describes just such a set of
quantifiable characteristics for scenarios using DDM, a set of benchmarks which exercise them, and the results of
running the benchmarks on RTI 1.3.
The work presented in this paper is part of the High Level Architecture development process underway in the Defense
Modeling and Simulation Office (DMSO) and the DoD Architecture Management Group.¹
1 Introduction
The primary goal of HLA [1] DDM services is to
reduce the amount of data received by federates. But
they should not do so at the cost of excessive overhead,
i.e. they should not use more CPU cycles and/or delay
data delivery more than the federates would if they only
used Declaration Management (DM) and performed
final filtering themselves. It goes without saying that
DDM services should also deliver the correct data, i.e.
they must deliver at least the required data but may deliver
more. This paper describes a benchmark algorithm
designed to exercise DDM implementations in a way
which is controllable by users so they may evaluate
RTIs against tests which approximate their own
intended scenarios. Section 2 describes performance
measures which can be used to evaluate an RTI’s
ability to meet these goals for its DDM implementation.
Section 3 analyzes the characteristics of scenarios
which are relevant strictly to DDM. The benchmark
algorithm which exercises these characteristics is
described in section 4. Section 5 lays out the range of
experiments which exercise key steps in the process of
using DDM, followed by the results of running the
experiments on RTI 1.3 v5 in section 6. Finally, in
section 7 we state the simplifying assumptions we made
for this algorithm and accompanying experiments,
suggesting optimizations for future DDM
implementations.
¹ Initial work on this project was funded under DARPA ASTT contract MDA9972-97-C-0023.
2 Performance Measures
In section 1 we stated that DDM’s primary goal is to
reduce the amount of data received by federates, but
this requirement is derived from a higher requirement
to enable federates to perform their jobs of simulating
their models in a timely manner. To do so, federates
must receive data in a timely manner and they must
have enough CPU cycles available to process the data
when they receive it. Balancing these two requirements
is central to the performance of a DDM
implementation.
Suppose a federate is only using DM. The RTI may
not have to use as many CPU cycles because it doesn’t
have to manage regions, but the receiving federate will
have to expend additional cycles to discard irrelevant
data.² This situation may also impact the timeliness of
the federate’s receipt of desired data in two ways.
First, the data may be delayed in the network because
the network is flooded with data which will only be
discarded by the receiver. Second, the federate has to
expend time as well as CPU cycles to discard unwanted
data when it’s received, delaying the federate’s
processing of the desired data. So, two measures are of
interest:
1. Efficiency = CPU cycles expended per desired
update; an efficient DDM implementation will use
fewer CPU cycles across the federation than the
federates would discarding unwanted data.
2. Latency = time to receive wanted data; to be fair in
evaluating DDM implementations this is the time
between the sending federate calling the RTI and
the receiving federate determining that the data is
of interest, i.e. including the time it would take the
federate to discard enough unwanted data to get to
the desired data.
See [5] for a description of the measurement method
for CPU usage.
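For illustration, the two measures could be computed from benchmark logs along the following lines; the record layout and field names are hypothetical, not part of the RTI API or of the measurement framework of [5]:

```python
# Hypothetical post-processing of benchmark logs.  Each received update is a
# record with the CPU cycles spent handling it, whether it was wanted, the
# sender's RTI-call time, and the time the receiver decided the update was
# of interest (so time spent discarding unwanted data counts toward latency).

def efficiency(records):
    """CPU cycles expended per desired update across the federation."""
    total_cycles = sum(r["cpu_cycles"] for r in records)
    wanted = sum(1 for r in records if r["wanted"])
    return total_cycles / wanted if wanted else float("inf")

def latency(records):
    """Mean time from the sender's RTI call to the receiver determining
    that the data is of interest."""
    deltas = [r["decided_time"] - r["send_time"]
              for r in records if r["wanted"]]
    return sum(deltas) / len(deltas)
```

A DM-only run and a DDM run can then be compared directly on these two numbers.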
3 Characterizing Scenarios
Since this work is targeted at analyzing DDM, a
characterization of scenarios should not focus on the
semantic relationship between the modeled world and
its routing spaces and regions³, but on their effect on
the performance of DDM. The following taxonomy
characterizes scenarios strictly with respect to DDM in
a way that users can be reasonably expected to be able
to characterize their scenarios.
3.1 Number of Regions (r)
More regions will require more memory to store within
RTI components and more CPU cycles to search when
regions are updated.
3.2 Rate of Region Modification (Δr)
Logically, whenever an update region or subscription
region in a routing space is modified, the RTI must
recalculate the region’s intersections with all other
regions of the opposite type. In practice, optimizations
could subdivide the routing space and improve this
performance, but in general we expect that frequent
region modifications will require more CPU cycles to
recalculate intersections.
3.3 Number of Intersections (i)
Since all region intersections for a given object class
and routing space must be rechecked when either an
update or subscription region is modified, a large
number of intersections will require more CPU cycles
to recalculate.
3.4 Rate of Region Intersection Change (Δi)
DDM services are not directly responsible for sending
and receiving updates; they only establish connectivity
between federates for the data, and they do so in
parallel with other federate and federation activities
such as sending and receiving data. The effects of
region modifications may not be instantaneous. If the
DDM implementation's time to establish connectivity
is high, then rapid changes in the intersections will not
be reflected immediately, resulting in a larger
percentage of unwanted data. Note that this is different
from Δr because regions can change without the
intersections between them changing.
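A toy interval-overlap check (coordinates invented for illustration) makes the distinction concrete: an update region can be modified without its intersection set changing.

```python
def intersections(u_lo, u_hi, bands):
    """Indices of subscription bands [lo, hi) overlapped by an update
    region spanning [u_lo, u_hi)."""
    return {i for i, (lo, hi) in enumerate(bands) if u_lo < hi and u_hi > lo}

# Three subscription bands stacked in y, as in the benchmark layout.
bands = [(0, 10), (10, 20), (20, 30)]

# The update region moves (a region modification, counted in delta-r) but
# still overlaps the same two bands, so delta-i for this move is zero.
assert intersections(2, 12, bands) == intersections(5, 15, bands) == {0, 1}
```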
4 The Benchmark Algorithm
Having established in section 3 the characteristics we
wish the benchmark to manifest, we now describe the
algorithm for the benchmark. The size, shape, and
pattern of regions in this algorithm are intentionally
artificial. The goal is to be able to control accurately
the DDM-specific characteristics of the scenario, not to
represent a realistic simulation scenario. See [4, 5] for
DDM benchmarking which more closely approximates
a realistic scenario. Figure 4-1 illustrates a sample
region layout for a test with 5 federates, 12 regions, and
86 intersections.
² This assumes an RTI architecture roughly like the
current DMSO architecture described in [2], in which
RTI components are hosted on the same processors as
the federates.

³ This is the subject of a previous paper [3].
Subscription regions are laid out in non-overlapping
horizontal bands. Update regions are laid out in
vertical bands. Subscription regions extend the entire
width of the x dimension from RTI_MIN to
RTI_MAX, and their extents never change. This is the
same usage scenario labeled as (e) in [5]. Update
regions are responsible for calculating and modifying
their extents to create and manage the number of
intersections requested.⁴ Within a federate, they start
from RTI_MIN in the y dimension and extend as far
toward RTI_MAX as necessary to create the number of
intersections required.
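As a sketch of this layout (not RTI 1.3 code; the numeric bounds below stand in for RTI_MIN and RTI_MAX), the disjoint subscription bands follow directly from the total region count:

```python
RTI_MIN, RTI_MAX = 0.0, 1.0  # illustrative routing-space bounds

def subscription_bands(total_regions):
    """Disjoint horizontal bands covering the y dimension.  Each band
    spans the full x extent [RTI_MIN, RTI_MAX] and never changes."""
    height = (RTI_MAX - RTI_MIN) / total_regions
    return [(RTI_MIN + i * height, RTI_MIN + (i + 1) * height)
            for i in range(total_regions)]
```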
There's only one small problem with the layout as
described. "Intersections" between your own
subscription and update regions don't "count", e.g.
even though f0u1 and f0s1 overlap in the figure, that's
not an intersection from the perspective of DDM. So,
we need to adjust the update region lengths to account
for overlaps with their own federate's subscription
region. Figure 4-2 illustrates this extension for the
example. The dark blocks mark the bands of local
subscription regions that the update regions must
extend around.

Figure 4-1. Sample Region Layout

Figure 4-2. Accounting for Local Subscription Regions

Regions are uniformly assigned to federates to the
extent possible given the number of regions and
federates. In most cases the algorithm is exactly
accurate, but assigning intersection changes may be off
by one or two depending on the divisibility of the user's
parameters. Notice in Figure 4-1 that specifying 12
regions means that federates 0 and 1 each have three
regions while the remaining three federates each have
two. As a result, when the 86 intersections are
allocated, all federates get 17 intersections except for
federate 0, which gets 18, but federates 2 through 4
have to allocate each of their 17 intersections across
only two update regions rather than three.

So far we have only accounted for the initial placement
of regions. We must also account for region
modifications and intersection changes: Δr and Δi. Δr
is specified in modifications per minute. Δi is specified
as a percentage of i. Intersection changes are made by
sliding the update regions "up" and "down" across the
subscription regions, always accounting for the black
hole of local subscription regions. The number of
changes is controlled by how far the region slides.
Observe that you can't have a region intersection
change without modifying a region, so region
modifications happen automatically with intersection
changes.

Notice that each federate can independently calculate
region location, size, and movement based on knowing
how many other federates are participating and how
many total regions and intersections there are. In fact,
within each federate the modification of each update
region is independent of the modification of every other
one: once each update region "knows" how often it
must change, and how many intersections and
intersection changes it's responsible for maintaining, it
performs these actions against the static subscription
regions without interaction with the other update
regions.

⁴ We don't have to worry about whether or not update
regions overlap each other because the update regions
control all region modifications leading to
intersections.
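The uniform "plus or minus one" assignment described above can be sketched with a single helper (illustrative only), which reproduces the Figure 4-1 example counts:

```python
def uniform_split(total, parts):
    """Assign `total` items to `parts` bins as evenly as possible, with
    the first `total % parts` bins receiving one extra item."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]

# The Figure 4-1 example: 5 federates, 12 regions, 86 intersections.
assert uniform_split(12, 5) == [3, 3, 2, 2, 2]        # federates 0 and 1 get three regions
assert uniform_split(86, 5) == [18, 17, 17, 17, 17]   # federate 0 gets the extra intersection
```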
4.1 Inputs
The benchmark takes the following inputs:
• Number of federates – there is no specified limit
on the number of federates which can participate.
• Federate number – 0 ≤ federate number < number
of federates.
• Total regions – there is no specified limit on the
number of regions which can be created, but there
is a practical limit based on the RTI range for a
dimension because it must be subdivided to form
disjoint subscription regions. The benchmark
actually creates twice this many regions: one set of
update regions this size and one set of subscription
regions this size. Regions are uniformly assigned
to federates, plus or minus one, based on the ratio of
regions to federates.
• Δr – region modifications are given in
modifications per minute. The default cycle time
for modifications is one minute, but if more region
modifications are specified per minute than can be
accommodated by the number of regions, the cycle
time is divided until it will accommodate them.
• Total intersections – total intersections ≤ total
regions * (total regions – local regions), i.e. each
update region can intersect with every subscription
region except those assigned to the local federate.
• Δi – region intersection changes are specified as a
percentage of i. Δi cannot always be exactly the
number requested by the user because of the
relationship between Δi and the number of
intersections. If the number of intersections is to
remain constant, intersection changes must occur
in pairs; sliding an update region up or down by
one subscription region results in one new
intersection and the deletion of an existing one.
Once the algorithm uniformly assigns intersection
changes to the federates and the federates assign
them to update regions, the federates attempt to
even out assignments to update regions, adjusting
adjacent regions up and down by one. However,
the number assigned to the federate in the first
place could have been odd.
• Number of minutes to run; number of seconds to
allow the federation to run before beginning
measurements; number of milliseconds to allocate
for a single measurement loop.
• DM/DDM switch – this compile-time switch is
provided to enable measurement of the difference
between using just DM and using DDM as
described in section 2. Even when the algorithm is
just using DM, it performs all the region
modification calculations because the federate
would still have to use this information to identify
unwanted data. These calculations also represent
the normal federate activity of moving objects with
which the update regions are associated.
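A minimal sketch of the Δr cycle-time rule, assuming the cycle is repeatedly halved until the per-cycle modification budget (one modification per local update region) accommodates the requested rate; the halving divisor is an assumption for illustration:

```python
def cycle_time_seconds(delta_r_per_min, local_update_regions):
    """Shrink the default one-minute modification cycle until the
    requested modification rate fits the number of update regions.
    The repeated halving is an assumed dividing scheme."""
    cycle = 60.0
    while delta_r_per_min * (cycle / 60.0) > local_update_regions:
        cycle /= 2.0
    return cycle
```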
5 Experiments
The experiments described in this section are designed
to exercise general types of scenarios, e.g. when
regions are established statically and don’t change for
the duration of the scenario vs. update regions changing
dynamically as objects move. For all experiments, a
single object is associated with each update region and
each workstation supports only one federate. Each
experiment is run for 7 minutes, with results recorded
after 2 minutes.
The first experiment measures the cost of using just
DM, relying solely on the federates to discard
irrelevant data. This is the same usage scenario labeled
as (a) in [5].
Table 5-1. DM Only
  f = [2, 5, 10]
  r = N/A
  Δr = N/A
  i = N/A
  Δi = N/A
The second experiment measures the cost of using
DDM with static regions. It is designed to demonstrate
how efficient DDM is over just using DM and
discarding irrelevant data without introducing the other
effects of DDM overhead.
Table 5-2. Static Regions
  f = [2, 5, 10]
  r = [50, 200, 1000]
  Δr = 0
  i = [r²/50, r²/10]
  Δi = 0
The third test isolates the cost of making region
modifications without any intersection changes. If a
DDM implementation were cleverly optimized, for this
test we would expect to see only slightly higher CPU
usage but little other effect since the connectivity
doesn’t change, so the flow of data should remain
unchanged. Since regions are uniformly assigned to
federates, Δr per federate = Δr/f, which ranges from 5 to
500 per minute.
Table 5-3. Isolated Region Modifications
  f = [2, 5, 10]
  r = [50, 200, 1000]
  Δr = r
  i = [r²/50, r²/10]
  Δi = 0
The final test exercises DDM under “realistic”
circumstances, i.e. regions are modified fairly
frequently resulting in region intersection changes.
Table 5-4. Fully Dynamic DDM Usage
  f = [2, 5, 10]
  r = [50, 200, 1000]
  Δr = r
  i = [r²/50, r²/10]
  Δi = [i, 2i]

6 Results
The experiments in section 5 will be run with RTI 1.3
v5 on Windows NT and SunOS 5.5. Federates will
send 600 updates per minute per region.
7 Simplifying Assumptions and Future Work
To expedite initial development of the benchmark, we
made some simplifying assumptions, most of which
could be relaxed in the future. While we don’t expect
that these assumptions have measurable impact on the
results of the benchmark, we can envision
optimizations in DDM implementations under which
performance could change when these static
assumptions are allowed to vary.
• Only one routing space⁵ is used.
• Only one object class and set of attributes is used.
• Only update regions change.
• Regions, region modifications, intersections, and
intersection changes are uniformly assigned to
federates.
• Federates have an equal number of update and
subscription regions.
• Interactions are not included.
• We did not measure performance differences using
the advisory switches.
• Are other measures more appropriate for RTI
implementations which do not have architectures
similar to the current one, with some computation
done at the local RTI component?

⁵ If the proposed IEEE HLA Federate Specification is
approved, separate routing spaces will go away and this
issue with it.
8 Conclusions
We have developed a DDM benchmark which allows
users to roughly approximate the anticipated DDM
usage in their intended scenarios and demonstrated how
to use it to measure performance as it’s relevant to
federate and federation developers.
While the
benchmark is not a substitute for actually implementing
and evaluating DDM within a federation, it can serve
the intended purpose of benchmarks, allowing potential
users to quantitatively evaluate vendor offerings with a
standardized measure.
9 Acknowledgements
We wish to extend our thanks to Andreas Kemkes,
Danny Cohen, and their team at Perceptronics for
setting us straight on performance measurements and
especially for integrating the benchmark algorithm into
their performance framework so we could get
measures.
10 References
[1] Defense Modeling and Simulation Office: “High
Level Architecture RTI Interface Specification,
Version 1.3,” April 1998.
[2] R. Weatherly: "RTI 1.3 Architecture," Proceedings
of the 1998 Fall Simulation Interoperability
Workshop, 98F-SIW-132, September 1998.
[3] K. Morse and J. Steinman: "Data Distribution
Management in the HLA: Multidimensional
Regions and Physically Correct Filtering,"
Proceedings of the 1997 Spring Simulation
Interoperability Workshop, 97S-SIW-052, March
1997.
[4] D. Cohen and A. Kemkes: "User-Level
Measurements of DDM Scenarios," Proceedings of
the 1998 Spring Simulation Interoperability
Workshop, 98S-SIW-072, March 1998.
[5] D. Cohen and A. Kemkes: "Applying User-Level
Measurements to RTI 1.3 Release 2," Proceedings
of the 1998 Fall Simulation Interoperability
Workshop, 98F-SIW-132, September 1998.
Author Biographies
KATHERINE L. MORSE is a Senior Computer
Scientist with SAIC. She received her B.S. in mathematics
(1982), B.A. in Russian (1983), and M.S. in computer
science (1986) from the University of Arizona. Ms.
Morse has worked in industry for over 20 years in the
areas of compilers, operating systems, neural networks,
simulation, speech recognition, image processing,
computer security, and engineering process
development. She received an M.S. in information &
computer science from the University of California,
Irvine in 1995 and is currently pursuing her Ph.D. in
the same field, focusing on the application of mobile
agents to dynamic DDM.
LUBOMIR F. BIC received the MS degree in
Computer Science from the Technical University
Darmstadt, Germany, in 1976 and the Ph.D. in
Information and Computer Science from the University
of California, Irvine, in 1979. He is currently Professor
of Information and Computer Science at the University
of California, Irvine. His general research interests lie
in the areas of parallel and distributed computing. He
is focusing primarily on programming paradigms and
environments for loosely-coupled multicomputers and
distributed simulation/modeling systems.