Evaluating a $2M Commercial Server on a $2K PC and Related Challenges

Mark D. Hill
Multifacet Project (www.cs.wisc.edu/multifacet)
Computer Sciences Department
University of Wisconsin-Madison
February 2003
(C) 2003 Multifacet Project, University of Wisconsin-Madison

Context & Summary
• Commercial Servers
  – Processors, memory, disks: $2M
  – Run large multithreaded transaction-oriented workloads
  – Use commercial applications on commercial OS
• To Simulate on $2K PC
  – Scale & tune workloads: keep L2 miss rates, etc.
  – Manage simulation complexity: separate timing & function
  – Cope with workload variability: use randomness & statistics
• NSF Challenges in Computer Architecture Evaluation
  – Advice that researchers, program committees, & funders basically “know,” but often forget to heed

Methods 2 Wisconsin Multifacet Project

Multifacet: Commercial Server Design
• Wisconsin Multifacet Project
  – Directed by Mark D. Hill & David A. Wood
  – Sponsors: NSF, WI, IBM, Intel, & Sun
  – Current Contributors: Alaa Alameldeen, Brad Beckmann, Milo Martin, Mike Marty, Kevin Moore, & Min Xu
• Commercial Server Availability
  – SafetyNet tolerates some transient faults [ISCA 2002]
• Commercial Server Software Complexity
  – Flight Data Recorder aids debugging of multithreaded programs [ISCA 2003]
• Commercial Server Design Complexity
  – Token Coherence eases coherence protocol design [IEEE Micro Top Picks, Nov-Dec 2003]

Outline
• Workload & Simulation Methods
  – Select, scale, & tune workloads
  – Transition workload to simulator
  – Specify & test the proposed design
  – Evaluate design with simple/detailed processor models
• Separate Timing & Functional Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation

Multifacet Simulation Overview
[Figure: simulation infrastructure. Components: Full Workloads, Commercial Server (Sun Fire V880), Scaled Workloads, Memory Protocol Generator (SLICC), Pseudo-Random Protocol Checker, Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), Processor Timing Simulator (Opal). Group labels: Workload Development, Protocol Development, Timing Simulator.]
• Virtutech Simics (www.virtutech.com)
• Rest is Multifacet software

Select Important Workloads
[Figure highlight: Full Workloads]
• Online Transaction Processing: DB2 w/ TPC-C-like
• Java Server Workload: SPECjbb
• Static web content serving: Apache
• Dynamic web content serving: Slashcode
• Java-based Middleware

Setup & Tune Workloads (on real hardware)
[Figure highlight: Full Workloads, Commercial Server (Sun Fire V880)]
• Tune workload, OS parameters
• Measure transaction rate, speed-up, miss rates, I/O
• Compare to published results

Scale & Re-tune Workloads
[Figure highlight: Commercial Server (Sun Fire V880), Scaled Workloads]
• Scale down for PC memory limits
• Retain similar behavior (e.g., L2 cache miss rate)
• Re-tune to achieve higher transaction rates (OLTP: raw disk, multiple disks, more users, etc.)
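The "retain similar behavior" step in Scale & Re-tune Workloads can be pictured as a simple comparison of metrics measured on the full and the scaled workload. The following is a minimal sketch only: the metric names, the 10% tolerance, and the numbers are hypothetical placeholders, not values or tooling from the Multifacet project.

```python
def relative_difference(full_value, scaled_value):
    """Relative difference of the scaled metric with respect to the full one."""
    if full_value == 0:
        raise ValueError("full-workload metric must be non-zero")
    return abs(scaled_value - full_value) / abs(full_value)


def check_scaled_workload(full_metrics, scaled_metrics, tolerance=0.10):
    """Return {metric: (relative_difference, within_tolerance)} for shared metrics.

    The 10% default tolerance is an assumption made for this sketch.
    """
    report = {}
    for name, full_value in full_metrics.items():
        if name not in scaled_metrics:
            continue  # only compare metrics measured on both configurations
        diff = relative_difference(full_value, scaled_metrics[name])
        report[name] = (diff, diff <= tolerance)
    return report


# Placeholder numbers for illustration only; they are not measurements.
full = {"l2_misses_per_1k_instr": 10.0, "speedup_8p": 6.0}
scaled = {"l2_misses_per_1k_instr": 10.5, "speedup_8p": 5.0}

report = check_scaled_workload(full, scaled)
for name, (diff, ok) in report.items():
    status = "similar" if ok else "differs, re-tune"
    print(f"{name}: {diff:.1%} ({status})")
```

A metric flagged as differing would send the workload back through the re-tune step before it is moved to the simulator.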

Transition Workloads to Simulation
[Figure highlight: Scaled Workloads, Full System Functional Simulator (Simics)]
• Create disk dumps of tuned workloads
• In simulator: Boot OS, start, & warm application
• Create Simics checkpoint (snapshot)

Specify Proposed Computer Design
[Figure highlight: Memory Protocol Generator (SLICC), Memory Timing Simulator (Ruby)]
• Coherence Protocol (control tables: states X events)
• Cache Hierarchy (parameters & queues)
• Interconnect (switches & queues)
• Processor (later)

Test Proposed Computer Design
[Figure highlight: Pseudo-Random Protocol Checker, Memory Timing Simulator (Ruby)]
• Randomly select write action & later read check
• Massive false-sharing for interaction
• Perverse network stresses design
• Transient error & deadlock detection
• Sound but not complete

Simulate with Simple Blocking Processor
[Figure highlight: Scaled Workloads, Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby)]
• Warm-up caches or sometimes sufficient (SafetyNet)
• Run for fixed number of transactions
  – Some transactions partially done at start
  – Other transactions partially done at end
• Cope with workload variability (later)

Simulate with Detailed Processor
[Figure highlight: Scaled Workloads, Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), Processor Timing Simulator (Opal)]
• Accurate (future) timing & (current) function
• Simulation complexity decoupled (discussed soon)
• Same transaction methodology & work variability issues

Simulation Infrastructure & Workload Process
[Figure: Full Workloads, Commercial Server (Sun Fire V880), Scaled Workloads, Memory Protocol Generator (SLICC), Pseudo-Random Protocol Checker, Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), Processor Timing Simulator (Opal)]
• Select important workloads: run, tune, scale, & re-tune
• Specify system & pseudo-randomly test
• Create warm workload checkpoint
• Simulate with simple or detailed processor
• Fixed #transactions, manage simulation complexity (next), cope with workload variability (next next)

Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
  – Simulation Challenges & Complexity
  – Timing-First Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation

Simulating Function Getting Harder!
[Figure: simulated target system. Target applications: Web Server, Database, Kernels, SPEC Benchmarks. Operating System. Hardware: Processor, MMU, Status Registers, RAM, Real Time Clock, Serial Port, Terminal, I/O MMU Controller, DMA Controller, IRQ Controller, PCI Bus, Graphics Card, Ethernet Controller, CDROM, SCSI Controller, SCSI Disks, Fiber Channel Controller.]

Simulating Timing Getting Harder!
• Micro-architecture complexity
  – Multiple “in-flight” instructions
  – Speculative execution
  – Out-of-order execution
• Thread-level parallelism
  – Hardware Multi-threading
  – Traditional Multi-processing

Managing Simulator Complexity
• Integrated (SimOS): one combined timing and functional simulator
  – Complex
• Functional-First (Trace-driven): functional simulator feeds timing simulator
  – Lacks timing feedback
• Timing-Directed: timing simulator (complete timing, no? function) coupled with functional simulator (no timing, complete function)
  + Timing feedback
  – Tight coupling
  – Performance?
• Timing-First (Multifacet): timing simulator (complete timing, partial function) with functional simulator (no timing, complete function)

Timing-First Operation
[Figure: timing simulator and functional simulator linked by Commit, Verify, and Reload steps; labels include add, load, Execute, Cache, Network, CPU, RAM, System]
• Timing Simulator runs speculatively ahead
• On commit, calls Functional Simulator to verify
• Reload Timing Simulator state if necessary, e.g., interrupt, unimplemented instruction

Timing-First Discussion
[Figure: Timing-First Simulation. Timing Simulator (complete timing, partial function); Functional Simulator (no timing, complete function)]
• Supports speculative multi-processor timing models
• Leverages existing simulators
• Rapid development time (e.g., immediate checks)
• Has low simulation overhead (18% uniprocessor)
• Introduces relatively little performance error (< 3%)
• BUT duplicates some code & function

Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
  – Variability in Multithreaded Workloads
  – Coping in Simulation
• NSF Challenges in Computer Architecture Evaluation

What is Happening Here?
[Figure: OLTP]

What is Happening Here?
• How can slower memory lead to a faster workload?
• Answer: Multithreaded workload takes a different path
  – Different lock race outcomes
  – Different scheduling decisions
• (1) Does this happen on real hardware?
• (2) If so, what should we do about it?

One Second Intervals (on real hardware)
[Figure: OLTP]

60 Second Intervals (on real hardware)
[Figure: OLTP; annotation: 16-day simulation]

Coping with Workload Variability
• Running (simulating) long enough not appealing
• Need to separate coincidental & real effects
• Standard statistics on real hardware
  – Variation within base system runs vs. variation between base & enhanced system runs
  – But deterministic simulation has no “within” variation
• Solution with deterministic simulation
  – Add pseudo-random delay on L2 misses
  – Simulate base (enhanced) system many times
  – Use simple or complex statistics

Confidence Interval Example
[Figure: ROB]
• Estimate #runs to get non-overlapping confidence intervals

Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation
  – Advice that researchers, program committees, & funders basically “know,” but often forget to heed

NSF Challenges in Computer Architecture Evaluation
• Dec 2001 NSF Computer Systems Architecture Workshop
  – Report in IEEE Computer, Aug 2003
  – By Kevin Skadron, Margaret Martonosi, David August, Mark Hill, David Lilja, & Vijay Pai
• Simulation Frameworks
  – P (Problem): Need more modularity, portability, & reuse
  – R (Recommendation): More simulation frameworks, e.g., ASIM & Liberty
• Benchmarking
  – P: Benchmarks for too few domains
  – R: Reward benchmark development & characterization; consider micro- and synthetic benchmarks

NSF Challenges in Computer Architecture Evaluation
• Abstractions & Methodology
  – P: Believe simulation too much; use other methods insufficiently
    • 1985 ISCA: 30% simulation & 30% modeling
    • 2001 ISCA: 90% simulation & 0% modeling
  – R: Push analytic models for insight, cross validation, & far-reaching research
• Metrics, Accuracy, & Validation
  – P: Too dependent on relative & aggregate metrics
  – R: More metrics & statistical methods, especially when balancing multiple dimensions (e.g., performance & power)

Talk Summary
• Simulations of $2M Commercial Servers must
  – Complete in reasonable time (on $2K PCs)
  – Handle OS, devices, & multithreaded hardware
  – Cope with variability of multithreaded software
• Multifacet
  – Scale & tune transactional workloads
  – Separate timing & functional simulation
  – Cope w/ workload variability via randomness & statistics
• References (www.cs.wisc.edu/multifacet/papers)
  – Simulating a $2M Commercial Server on a $2K PC [Computer 2/03]
  – Full-System Timing-First Simulation [Sigmetrics 02]
  – Variability in Architectural Simulations … [HPCA 03]
• NSF Panel
  – Challenges in Computer Architecture Evaluation [Computer 8/03]

Backup Slides

Other Multifacet Methods Work
• Specifying & Verifying Coherence Protocols
  – [SPAA98], [HPCA99], [SPAA99], & [TPDS02]
• Workload Analysis & Improvement
  – Database systems [VLDB99] & [VLDB01]
  – Pointer-based [PLDI99] & [Computer00]
  – Middleware [HPCA03]
• Modeling & Simulation
  – Commercial workloads [Computer02] & [HPCA03]
  – Decoupling timing/functional simulation [Sigmetrics02]
  – Simulation generation [PLDI01]
  – Analytic modeling [Sigmetrics00] & [TPDS TBA]
  – Micro-architectural slack [ISCA02]
  – Interaction costs [Micro02]
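
Sketch: coping with workload variability
The talk's recipe for deterministic simulators (add a pseudo-random delay on L2 misses, simulate base and enhanced systems many times, then apply statistics) can be illustrated with a toy model. Everything below is an assumption made for illustration: `simulate_run` is a hypothetical stand-in for a real simulation, the latencies and jitter range are made up, and the interval uses a normal approximation rather than any method confirmed by the talk. The toy also does not model lock races or scheduling, which are the real source of divergent paths.

```python
import random
import statistics


def simulate_run(mean_l2_latency, seed, num_misses=2000, max_jitter=4):
    """Toy stand-in for one simulation run: total cycles spent on L2 misses.

    Each miss gets a pseudo-random extra delay of 0..max_jitter cycles, so
    runs started with different seeds produce different totals.
    """
    rng = random.Random(seed)
    return sum(mean_l2_latency + rng.randint(0, max_jitter)
               for _ in range(num_misses))


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the samples."""
    mean = statistics.mean(samples)
    std_error = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error


def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical base and enhanced systems, 20 perturbed runs each.
base_runs = [simulate_run(80, seed) for seed in range(20)]
enhanced_runs = [simulate_run(78, seed) for seed in range(100, 120)]

base_ci = confidence_interval(base_runs)
enhanced_ci = confidence_interval(enhanced_runs)
print("base 95% CI:    ", base_ci)
print("enhanced 95% CI:", enhanced_ci)
print("overlap:", intervals_overlap(base_ci, enhanced_ci))
```

If the two intervals overlap, the observed difference may be coincidental, and more runs are needed before claiming a real effect. With few runs, a Student's t interval would be somewhat wider than this approximation.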