Memory System Behavior of Java-Based Middleware
Martin Karlsson, Kevin E. Moore, Erik Hagersten and David A. Wood
February 11, 2003
Ninth International Symposium on High Performance Computer Architecture

Java-Based Middleware: An Important New Workload for Multiprocessor Servers
• Java-based middleware connects Web pages to databases
• Web-based applications are deployed in 3-tier systems
  – Clients
  – Middleware (e.g. application servers)
  – Databases
• Rapid growth
• Diverse clients will increase the role of middleware
[Diagram labels: Browsers/Thin Clients; LAN/WAN; Web Server; Middleware; Business Logic; Databases]

HPCA February 11, 2003 Memory System Performance of Java-Based Middleware

Java Middleware Benchmarks
• SPECjbb2000
  – Approximates a 3-tier system in a single application
  – Will run on any JVM without any 3rd-party software
  – Easy to install, tune and run (setup time measured in hours)
• ECperf (now SPECjAppServer2001)
  – Runs on a real 3-tier system
  – Easy to isolate the behavior of individual tiers
  – Requires expensive 3rd-party software (application server and database)
  – Difficult to install, tune and run (setup time measured in weeks)

Outline
• Background
  – 3-Tiered Systems
  – ECperf and SPECjbb2000
• Hardware monitoring experiments
  – System size scaling
  – Benchmark scaling
• Simulation Experiments
  – Cache Performance
• Design decisions
  – Shared Caches

Application Servers & 3-Tiered Systems
• 3-tiered systems are common in e-commerce and B2B applications
• Application servers provide a framework for middle-tier applications
  – Presentation
  – Business Rules
• Services include
  – Database connectivity
  – Client connectivity
  – Resource management
• Application servers are often implemented in Java
[Diagram labels: Users/Customers (e-commerce); Other Businesses (B2B); Tier 1; Application Server; Presentation Logic; Business Rules; Tier 2; Database; Tier 3]

ECperf
• Runs on top of existing commercial applications (Database and Application Server)
  – Adds cost, tuning effort
  – Restricted source code
• Consists of 4 networked programs
  – Application Server
  – Database
  – Supplier Emulator
  – Driver
• Runs on multiple machines
  – Easy to isolate tiers
• Measurements on middle tier only
[Diagram labels: Driver; Order Agents; Mfg Agents; Supplier Emulator; Emulator Servlet; Application Server; Servlet Host; Presentation Logic; Orders & Mfg Servlets; Supplier Servlets; EJB Container; Java Beans; Business Rules; Database; Mfg; Supplier; Corp; Orders]

SPECjbb2000
• Single JVM
• Database emulated by trees of Java objects
• Easy to install, tune and run
• Available source code
• Difficult to measure behavior of individual tiers
• Measurements include database and client code
[Diagram labels: Client Threads; Business Logic Engine; Object Trees; Benchmark Process]

Monitoring Experiments
• Hardware
  – Sun E6000 (SPECjbb2000, Application Server, Database)
    • 16 × 248 MHz UltraSparc II processors
    • 2 GB RAM
    • 1 MB unified L2 cache
  – Sun Netra (Emulator, Driver)
    • 1 × 500 MHz UltraSparc IIe
• Software
  – HotSpot 1.3.1 JVM
  – Solaris 8

Benchmark Settings and Alterations
• SPECjbb2000
  – Increased warm-up and measurement intervals
    • 60 s warm-up and 6 min measurement
  – Picked 1 value for the number of warehouses
    • #warehouses = #processors
• ECperf
  – Relaxed response time requirements
• JVM Options
  – Heap Size = 1424 MB
  – ISM
  – New Generation = 400 MB

Performance Scaling
[Figure: speedup vs. number of processors (0 to 16); series: Linear, SPECjbb, ECperf]

Data Sharing
[Figure: cache-to-cache transfer ratio (%) vs. number of processors (0 to 16); series: ECperf, SPECjbb]

Memory Use vs. Scale Factor (8 p)
[Figure: memory use (MB) vs. scale factor (0 to 40); series: ECperf, SPECjbb]

Scaling Effects
• Scaling System Size
  – Increased system size from 1 to 15 processors
  – High idle times for both benchmarks on large systems
  – Contention inside the application or JVM
  – High fraction of sharing misses on large systems
  – Very few misses to main memory despite large heap
  – CPI (ECperf 2.0-2.8, SPECjbb2000 1.8-2.3)
• Benchmark Scaling
  – Increased transaction input rate and database size
    • ECperf: Orders Input Rate
    • SPECjbb2000: Warehouses
  – Affects SPECjbb2000 more than ECperf

Cache Simulations
• Experiments conducted with Virtutech Simics with an extended memory system simulator
  – 4-way set associative caches
  – 64 byte cache lines
• Cache Miss Rates
  – Uniprocessor simulations
  – Split 1-level caches
• Sharing Analysis
  – 8-processor simulations
  – Unified cache

Data Cache Misses/1000 Instructions
[Figure: data cache misses per 1000 instructions vs. cache size (KB, 32 to 16384); series: ECperf, SPECjbb-25, SPECjbb-10, SPECjbb-1]

Instruction Cache Misses/1000 Instructions
[Figure: instruction cache misses per 1000 instructions vs. cache size (KB, 32 to 16384); series: ECperf, SPECjbb-25, SPECjbb-10, SPECjbb-1]

Communication Distribution
[Figure: percent of cache-to-cache transfers (%) vs. percent of all cache lines (%); series: ECperf, SPECjbb-25; annotated points: (12.3%, 100%) and (20%, 88.5%)]

Shared Caches
• Potentially a good fit for Java-based middleware
  – High cache-to-cache transfer ratio
  – Small working sets
  – Low memory bandwidth
• Important design point for CMPs
• Experiment: Measured data miss rate for a simulated 8-processor system running each benchmark
  – All caches are 1 MB
  – Varied number of caches and degree of sharing

Data Miss Rate vs. Sharing Degree
[Figure: misses per 1000 instructions vs. processors per cache (1, 2, 4, 8); series: ECperf, SPECjbb-25]

Paper Summary
• Descriptions of ECperf and SPECjbb2000
• Combination of hardware monitoring and full-system simulation
  – Scalability
  – Execution time breakdown
  – I/D Cache performance
  – Input rate scaling
• Effects of garbage collection
• Data sharing analysis and shared-cache performance
• Conclusion
  – Benchmark differences can lead to opposite design conclusions

Conclusions
• Both SPECjbb2000 and ECperf
  – Have small data sets
  – Have a high rate of sharing misses
• SPECjbb2000 approximates ECperf well except for 2 important differences
  – ECperf has a much larger instruction footprint and a higher instruction miss rate
  – The memory footprint of SPECjbb2000 is larger than that of ECperf, especially on large systems
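
Backup sketch (not from the talk): the cache configuration named on the Cache Simulations slide (4-way set associative, 64 byte lines) and the misses/1000 instructions metric used in the miss-rate plots can be illustrated with a toy model. This is only a minimal sketch and is not the extended Simics memory system simulator used for the experiments. The class name, function name, and the LRU replacement policy are all assumptions made for illustration; the slides do not state a replacement policy.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (from the Cache Simulations slide)
WAYS = 4        # associativity (from the Cache Simulations slide)


class SetAssociativeCache:
    """Toy set-associative cache. LRU replacement is an assumption."""

    def __init__(self, size_bytes, ways=WAYS, line_size=LINE_SIZE):
        if size_bytes % (ways * line_size) != 0:
            raise ValueError("size must be a multiple of ways * line_size")
        self.ways = ways
        self.line_size = line_size
        self.num_sets = size_bytes // (ways * line_size)
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, address):
        """Touch one byte address; return True on a hit, False on a miss."""
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return False


def misses_per_1000_instructions(misses, instructions):
    """Normalize a miss count to the metric used on the miss-rate slides."""
    return 1000.0 * misses / instructions
```

To use it, feed a memory-address trace through `access` and pass the resulting `misses` count, together with the number of instructions executed, to `misses_per_1000_instructions`. Repeating this for cache sizes from 32 KB to 16384 KB would give one curve in the style of the miss-rate plots. The trace itself must come from a real workload; none is supplied here.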