Characterizing Scenarios for DDM Performance and Benchmarking RTIs

Katherine L. Morse
Science Applications International Corporation, 10260 Campus Point Drive, MS B-1-E, San Diego, CA 92121
Department of Information & Computer Science, University of California, Irvine, CA 92697-3425
619-552-5442, 619-552-5112
kmorse@ics.uci.edu, katherine_morse@cpqm.saic.com

Dr. Lubomir Bic, Dr. Michael Dillencourt
Department of Information & Computer Science, University of California, Irvine, CA 92697-3425
{bic, dillenco}@ics.uci.edu

Keywords: DDM, benchmark, RTI

ABSTRACT: As more High Level Architecture (HLA) Run Time Infrastructures (RTIs) become available, potential users are faced with the prospect of choosing the one which best fits the performance characteristics of their applications. This prospect is particularly complex when evaluating the performance of HLA Data Distribution Management (DDM) services, because detailed design information for DDM systems is not widely available. Furthermore, it is not clear that users can reasonably be expected to extrapolate the expected performance of DDM on their applications from a detailed DDM design. This problem is not new to computer and compiler users in general; the solution has long been the development and wide exercise of benchmarks. For a benchmark set to be effective in its domain, it must precisely identify quantifiable characteristics of interest and exercise them independently and in combinations which are directly comparable to potential applications. This paper describes just such a set of quantifiable characteristics for scenarios using DDM, a set of benchmarks which exercise them, and the results of running the benchmarks on RTI 1.3. The work presented in this paper is part of the High Level Architecture development process underway in the Defense Modeling and Simulation Office (DMSO) and the DoD Architecture Management Group.
1 Introduction

The primary goal of HLA [1] DDM services is to reduce the amount of data received by federates. But they should not do so at the cost of excessive overhead, i.e. they should not use more CPU cycles and/or delay data delivery more than the federates would if they used only Declaration Management (DM) and performed final filtering themselves. It goes without saying that DDM services should also deliver the correct data, i.e. they must deliver at least the required data but may deliver more.

This paper describes a benchmark algorithm designed to exercise DDM implementations in a way which is controllable by users, so they may evaluate RTIs against tests which approximate their own intended scenarios. Section 2 describes performance measures which can be used to evaluate an RTI's ability to meet these goals for its DDM implementation. Section 3 analyzes the characteristics of scenarios which are relevant strictly to DDM. The benchmark algorithm which exercises these characteristics is described in section 4. Section 5 lays out the range of experiments which exercise key steps in the process of using DDM, followed by the results of running the experiments on RTI 1.3 v5 in section 6. Finally, in section 7 we state the simplifying assumptions we made for this algorithm and the accompanying experiments, suggesting optimizations for future DDM implementations. Initial work on this project was funded under DARPA ASTT contract MDA9972-97-C-0023.

2 Performance Measures

In section 1 we stated that DDM's primary goal is to reduce the amount of data received by federates, but this requirement is derived from a higher requirement: to enable federates to perform their jobs of simulating their models in a timely manner. To do so, federates must receive data in a timely manner, and they must have enough CPU cycles available to process the data when they receive it. Balancing these two requirements is central to the performance of a DDM implementation.
Suppose a federate is using only DM. The RTI may not have to use as many CPU cycles because it doesn't have to manage regions, but the receiving federate will have to expend additional cycles to discard irrelevant data². This situation may also impact the timeliness of the federate's receipt of desired data in two ways. First, the data may be delayed in the network because the network is flooded with data which will only be discarded by the receiver. Second, the federate has to expend time as well as CPU cycles to discard unwanted data when it's received, delaying the federate's processing of the desired data. So, two measures are of interest:

1. Efficiency = CPU cycles expended per desired update. An efficient DDM implementation will use fewer CPU cycles across the federation than the federates would spend discarding unwanted data.

2. Latency = time to receive wanted data. To be fair in evaluating DDM implementations, this is the time between the sending federate calling the RTI and the receiving federate determining that the data is of interest, i.e. it includes the time it would take the federate to discard enough unwanted data to get to the desired data.

See [5] for a description of the measurement method for CPU usage.

3 Characterizing Scenarios

Since this work is targeted at analyzing DDM, a characterization of scenarios should focus not on the semantic relationship between the modeled world and its routing spaces and regions³, but on their effect on the performance of DDM. The following taxonomy characterizes scenarios strictly with respect to DDM, in a way that users can reasonably be expected to be able to characterize their own scenarios.

3.1 Number of Regions (r)

More regions will require more memory to store within RTI components and more CPU cycles to search when regions are updated.
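As a concrete illustration, the two measures defined in section 2 could be computed from raw measurement records along these lines. This is a minimal sketch in Python; the function names and record formats are ours, not part of any RTI API or of the benchmark itself.

```python
def efficiency(cpu_cycles_used, desired_updates_received):
    """Measure 1: CPU cycles expended per desired update received."""
    return cpu_cycles_used / desired_updates_received

def latency(send_times, accept_times):
    """Measure 2: mean time from the sender calling the RTI to the
    receiver determining the data is of interest (this interval includes
    any time the receiver spent discarding unwanted data ahead of it)."""
    deltas = [accept - send for send, accept in zip(send_times, accept_times)]
    return sum(deltas) / len(deltas)

efficiency(1_000_000, 400)         # cycles per desired update
latency([0.0, 1.0], [0.25, 1.35])  # mean of the two per-update delays
```

Defining latency at the point where the receiver determines the data is of interest, rather than at RTI delivery, is what makes the DM-only and DDM configurations directly comparable.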
3.2 Rate of Region Modification (ṙ)

Logically, whenever an update region or subscription region in a routing space is modified, the RTI must recalculate the region's intersections with all other regions of the opposite type. In practice, optimizations could subdivide the routing space and improve this performance, but in general we expect that frequent region modifications will require more CPU cycles to recalculate intersections.

3.3 Number of Intersections (i)

Since all region intersections for a given object class and routing space must be rechecked when either an update or subscription region is modified, a large number of intersections will require more CPU cycles to recalculate.

3.4 Rate of Region Intersection Change (i̇)

DDM services are not directly responsible for sending and receiving updates; they only establish connectivity between federates for the data. And they do so in parallel with other federate and federation activities such as sending and receiving data, so the effects of region modifications may not be instantaneous. If the DDM implementation's time to establish connectivity is high, then rapid changes in the intersections will not take effect immediately, resulting in a larger percentage of unwanted data. Note that i̇ is different from ṙ because regions can change without the intersections between them changing.

4 The Benchmark Algorithm

Having established in section 3 the characteristics we wish the benchmark to manifest, we now describe the benchmark algorithm. The size, shape, and pattern of regions in this algorithm are intentionally artificial: the goal is to control accurately the DDM-specific characteristics of the scenario, not to represent a realistic simulation scenario. See [4, 5] for DDM benchmarking which more closely approximates a realistic scenario. Figure 4-1 illustrates a sample region layout for a test with 5 federates, 12 regions, and 86 intersections.
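The uniform assignment underlying that example can be sketched as follows. This reproduces the example's counts (12 regions and 86 intersections divided among 5 federates); the function name is ours for illustration, not taken from the benchmark.

```python
def uniform_assign(total, federates):
    """Divide `total` items among `federates` as evenly as possible;
    the first (total % federates) federates receive one extra item."""
    base, extra = divmod(total, federates)
    return [base + 1 if f < extra else base for f in range(federates)]

uniform_assign(12, 5)  # regions: federates 0 and 1 get 3, the rest get 2
uniform_assign(86, 5)  # intersections: federate 0 gets 18, the rest get 17
```

The same rule is applied again within each federate to spread its intersections across its update regions, which is why federates with only two update regions carry more intersections per region.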
² This assumes an RTI architecture roughly like the current DMSO architecture described in [2], in which RTI components are hosted on the same processors as the federates.
³ This is the subject of a previous paper [3].

Subscription regions are laid out in non-overlapping horizontal bands; update regions are laid out in vertical bands. Subscription regions extend the entire width of the x dimension, from RTI_MIN to RTI_MAX, and their extents never change. This is the same usage scenario labeled as (e) in [5]. Update regions are responsible for calculating and modifying their extents to create and manage the number of intersections requested⁴. Within a federate, they start from RTI_MIN in the y dimension and extend as far toward RTI_MAX as necessary to create the number of intersections required.

⁴ We don't have to worry about whether or not update regions overlap each other, because the update regions control all region modifications leading to intersections.

[Figure 4-1. Sample Region Layout: subscription regions f0s1–f4s2 as horizontal bands and update regions f0u1–f4u2 as vertical bands, spanning RTI_MIN to RTI_MAX in X and Y.]

Regions are assigned to federates as uniformly as possible given the number of regions and federates. In most cases the algorithm is exact, but the assignment of intersection changes may be off by one or two depending on the divisibility of the user's parameters. Notice in Figure 4-1 that specifying 12 regions means that federates 0 and 1 each have three regions while the remaining three federates each have two. As a result, when the 86 intersections are allocated, all federates get 17 intersections except federate 0, which gets 18; federates 2 through 4, however, have to allocate each of their 17 intersections across only two update regions rather than three.

So far we have only accounted for the initial placement of regions. We must also account for region modifications and intersection changes: ṙ and i̇. ṙ is specified in modifications per minute; i̇ is specified as a percentage of i. Intersection changes are made by sliding the update regions "up" and "down" across the subscription regions, always accounting for the "black hole" of local subscription regions described below. The number of changes is controlled by how far the region slides. Observe that you can't have a region intersection change without modifying a region, so region modifications happen automatically with intersection changes.

There's only one small problem with the layout as described. "Intersections" between a federate's own subscription and update regions don't "count"; e.g., even though f0u1 and f0s1 overlap in the figure, that's not an intersection from the perspective of DDM. So, we need to adjust the update region lengths to account for overlaps with their own federate's subscription regions. Figure 4-2 illustrates this extension for the example; the dark blocks mark the bands of local subscription regions that the update regions must extend around.

[Figure 4-2. Accounting for Local Subscription Regions: update regions lengthened so they extend around the bands of their own federate's subscription regions.]

Notice that each federate can independently calculate region location, size, and movement based on knowing how many other federates are participating and how many total regions and intersections there are. In fact, within each federate the modification of each update region is independent of the modification of every other one: once each update region "knows" how often it must change and how many intersections and intersection changes it's responsible for maintaining, it performs these actions against the static subscription regions without interaction with the other update regions.
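The intersection bookkeeping each update region performs against the static subscription bands could look roughly like this. This is a hedged sketch assuming one-dimensional y-extents for the bands; all names, and the half-open interval convention, are ours.

```python
def count_intersections(update_extent, subscription_bands, own_federate):
    """Count DDM intersections of one update region's y-extent against
    the horizontal subscription bands. Overlaps with the owning
    federate's own bands do not count (the "black hole")."""
    lo, hi = update_extent
    count = 0
    for federate, (band_lo, band_hi) in subscription_bands:
        if federate == own_federate:
            continue  # local subscription regions never count
        if lo < band_hi and band_lo < hi:  # half-open interval overlap
            count += 1
    return count

# Two federates with two bands each on an illustrative [0, 100) y range:
bands = [(0, (0, 25)), (0, (25, 50)), (1, (50, 75)), (1, (75, 100))]
count_intersections((0, 60), bands, own_federate=0)  # only (50, 75) counts
```

Sliding an update region is then just a matter of shifting `update_extent` and re-counting, which is why a slide of one band up or down simultaneously creates one intersection and destroys another.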
4.1 Inputs

The benchmark takes the following inputs:

Number of federates – there is no specified limit on the number of federates which can participate.

Federate number – 0 ≤ federate number < number of federates.

Total regions – there is no specified limit on the number of regions which can be created, but there is a practical limit based on the RTI range for a dimension, because the range must be subdivided to form disjoint subscription regions. The benchmark actually creates twice this many regions: one set of update regions of this size and one set of subscription regions of this size. Regions are assigned to federates uniformly, plus or minus one, based on the ratio of regions to federates.

ṙ – region modifications are given in modifications per minute. The default cycle time for modifications is one minute, but if more region modifications are specified per minute than can be accommodated by the number of regions, the cycle time is divided until it can accommodate them.

Total intersections – total intersections ≤ total regions × (total regions – local regions), i.e. each update region can intersect with every subscription region except those assigned to the local federate.

i̇ – region intersection changes are specified as a percentage of i. i̇ cannot always be exactly the number requested by the user because of the relationship between i̇ and the number of intersections: if the number of intersections is to remain constant, intersection changes must occur in pairs, since sliding an update region up or down by one subscription region creates one new intersection and deletes an existing one. Once the algorithm uniformly assigns intersection changes to the federates and the federates assign them to update regions, the federates attempt to even out the assignments to update regions, adjusting adjacent regions up and down by one; however, the number assigned to the federate in the first place could have been odd.
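A federate might sanity-check these inputs along the lines below. This is a sketch of the constraints stated above; the function is ours for illustration and is not part of the benchmark.

```python
def inputs_valid(num_federates, federate_number, total_regions,
                 total_intersections, local_regions):
    """Check the documented input constraints for one federate:
    the federate number must be in range, and the requested total
    intersections cannot exceed what the layout can produce, since each
    update region may intersect every subscription region except those
    belonging to the local federate."""
    if not (0 <= federate_number < num_federates):
        return False
    return total_intersections <= total_regions * (total_regions - local_regions)

# The section 4 example: 5 federates, 12 regions, 86 intersections,
# with federate 0 holding 3 local regions.
inputs_valid(5, 0, 12, 86, 3)  # 86 <= 12 * (12 - 3) = 108, so valid
```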
Number of minutes to run; number of seconds to allow the federation to run before beginning measurements; number of milliseconds to allocate for a single measurement loop.

DM/DDM switch – this compiler switch is provided to enable measurement of the difference between using just DM and using DDM, as described in section 2. Even when the algorithm is using just DM, it performs all the region modification calculations, because the federate would still have to use this information to identify unwanted data. These calculations also represent the normal federate activity of moving the objects with which the update regions are associated.

5 Experiments

The experiments described in this section are designed to exercise general types of scenarios, e.g. regions established statically which don't change for the duration of the scenario vs. update regions changing dynamically as objects move. For all experiments, a single object is associated with each update region and each workstation supports only one federate. Each experiment is run for 7 minutes, with results recorded after 2 minutes.

The first experiment measures the cost of using just DM, relying solely on the federates to discard irrelevant data. This is the same usage scenario labeled as (a) in [5].

Table 5-1. DM Only

  f           r    ṙ    i    i̇
  [2, 5, 10]  N/A  N/A  N/A  N/A

The second experiment measures the cost of using DDM with static regions. It is designed to demonstrate how much more efficient DDM is than just using DM and discarding irrelevant data, without introducing the other effects of DDM overhead.

Table 5-2. Static Regions

  f           r                ṙ   i               i̇
  [2, 5, 10]  [50, 200, 1000]  0   [r²/50, r²/10]  0

The third test isolates the cost of making region modifications without any intersection changes. If a DDM implementation were cleverly optimized, for this test we would expect to see only slightly higher CPU usage but little other effect: since the connectivity doesn't change, the flow of data should remain unchanged.
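The parameter sweep in Table 5-2, for example, can be enumerated mechanically. The following harness sketch is our own illustration, not part of the benchmark; the dictionary keys are ours.

```python
from itertools import product

def static_regions_grid():
    """Enumerate the Table 5-2 (static regions) configurations:
    i is derived from r, while the rates ṙ and i̇ are held at zero."""
    configs = []
    for f, r in product([2, 5, 10], [50, 200, 1000]):
        for i in (r * r // 50, r * r // 10):
            configs.append({"f": f, "r": r, "r_dot": 0, "i": i, "i_dot": 0})
    return configs

len(static_regions_grid())  # 3 federate counts x 3 region counts x 2 i values
```

The grids for Tables 5-3 and 5-4 follow the same pattern, substituting the nonzero ṙ and i̇ values.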
Since regions are uniformly assigned to federates, ṙ per federate = ṙ/f, which ranges from 5 to 500 modifications per minute.

Table 5-3. Isolated Region Modifications

  f           r                ṙ   i               i̇
  [2, 5, 10]  [50, 200, 1000]  r   [r²/50, r²/10]  0

The final test exercises DDM under "realistic" circumstances, i.e. regions are modified fairly frequently, resulting in region intersection changes.

Table 5-4. Fully Dynamic DDM Usage

  f           r                ṙ   i               i̇
  [2, 5, 10]  [50, 200, 1000]  r   [r²/50, r²/10]  [i, 2i]

6 Results

The experiments in section 5 will be run with RTI 1.3 v5 on Windows NT and Sun OS 5.5. Federates will send 600 updates per minute per region.

7 Simplifying Assumptions and Future Work

To expedite initial development of the benchmark, we made some simplifying assumptions, most of which could be relaxed in the future. While we don't expect these assumptions to have a measurable impact on the results of the benchmark, we can envision optimizations in DDM implementations under which performance could change when these static assumptions are allowed to vary:

- Only one routing space⁵ is used.
- Only one object class and set of attributes is used.
- Only update regions change.
- Regions, region modifications, intersections, and intersection changes are uniformly assigned to federates.
- Federates have an equal number of update and subscription regions.
- Interactions are not included.
- We did not measure performance differences using the advisory switches.

An open question remains: are other measures more appropriate for RTI implementations which do not have architectures similar to the current one, with some computation done at the local RTI component?

⁵ If the proposed IEEE HLA Federate Specification is approved, separate routing spaces will go away and this issue with it.

8 Conclusions

We have developed a DDM benchmark which allows users to roughly approximate the anticipated DDM usage in their intended scenarios, and we have demonstrated how to use it to measure performance as it's relevant to federate and federation developers.
While the benchmark is not a substitute for actually implementing and evaluating DDM within a federation, it can serve the intended purpose of benchmarks, allowing potential users to quantitatively evaluate vendor offerings with a standardized measure.

9 Acknowledgements

We wish to extend our thanks to Andreas Kemkes, Danny Cohen, and their team at Perceptronics for setting us straight on performance measurements, and especially for integrating the benchmark algorithm into their performance framework so we could get measures.

10 References

[1] Defense Modeling and Simulation Office: "High Level Architecture RTI Interface Specification, Version 1.3," April 1998.

[2] R. Weatherly: "RTI 1.3 Architecture," Proceedings of the 1998 Fall Simulation Interoperability Workshop, 98F-SIW-132, September 1998.

[3] K. Morse and J. Steinman: "Data Distribution Management in the HLA: Multidimensional Regions and Physically Correct Filtering," Proceedings of the 1997 Spring Simulation Interoperability Workshop, 97S-SIW-052, March 1997.

[4] D. Cohen and A. Kemkes: "User-Level Measurements of DDM Scenarios," Proceedings of the 1998 Spring Simulation Interoperability Workshop, 98S-SIW-072, March 1998.

[5] D. Cohen and A. Kemkes: "Applying User-Level Measurements to RTI 1.3 Release 2," Proceedings of the 1998 Fall Simulation Interoperability Workshop, 98F-SIW-132, September 1998.

Author Biographies

KATHERINE L. MORSE is a Senior Computer Scientist with SAIC. She received her B.S. in mathematics (1982), B.A. in Russian (1983), and M.S. in computer science (1986) from the University of Arizona. Ms. Morse has worked in industry for over 20 years in the areas of compilers, operating systems, neural networks, simulation, speech recognition, image processing, computer security, and engineering process development. She received an M.S. in information & computer science from the University of California, Irvine in 1995 and is currently pursuing her Ph.D.
in the same field, focusing on the application of mobile agents to dynamic DDM.

LUBOMIR F. BIC received the M.S. degree in Computer Science from the Technical University Darmstadt, Germany, in 1976 and the Ph.D. in Information and Computer Science from the University of California, Irvine, in 1979. He is currently a Professor of Information and Computer Science at the University of California, Irvine. His general research interests lie in the areas of parallel and distributed computing. He focuses primarily on programming paradigms and environments for loosely-coupled multicomputers and distributed simulation/modeling systems.