Methods - Computer Sciences Dept.

Evaluating a $2M Commercial
Server on a $2K PC
and Related Challenges
Mark D. Hill
Multifacet Project (www.cs.wisc.edu/multifacet)
Computer Sciences Department
University of Wisconsin—Madison
February 2003
(C) 2003 Multifacet Project
University of Wisconsin-Madison
Context & Summary
• Commercial Servers
– Processors, memory, disks: $2M
– Run large multithreaded transaction-oriented workloads
– Use commercial applications on commercial OS
• To Simulate on $2K PC
– Scale & tune workloads: keep L2 miss rates, etc.
– Manage simulation complexity: separate timing & function
– Cope with workload variability: use randomness & statistics
• NSF Challenges in Computer Architecture Evaluation
– Advice that researchers, program committees, & funders basically “know,” but often forget to heed
Methods
2
Wisconsin Multifacet Project
Multifacet: Commercial Server Design
• Wisconsin Multifacet Project
– Directed by Mark D. Hill & David A. Wood
– Sponsors: NSF, WI, IBM, Intel, & Sun
– Current Contributors: Alaa Alameldeen, Brad Beckmann, Milo Martin, Mike Marty, Kevin Moore, & Min Xu
• Commercial Server Availability
– SafetyNet tolerates some transient faults [ISCA 2002]
• Commercial Server Software Complexity
– Flight Data Recorder aids debugging of multithreaded programs
[ISCA 2003]
• Commercial Server Design Complexity
– Token Coherence eases coherence protocol design
[IEEE Micro Top Picks, Nov-Dec 2003]
Outline
• Workload & Simulation Methods
– Select, scale, & tune workloads
– Transition workload to simulator
– Specify & test the proposed design
– Evaluate design with simple/detailed processor models
• Separate Timing & Functional Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation
Multifacet Simulation Overview
[Diagram components: Full Workloads; Commercial Server (Sun Fire V880); Scaled Workloads; Memory Protocol Generator (SLICC); Pseudo-Random Protocol Checker; Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Processor Timing Simulator (Opal)]
[Diagram groups: Workload Development; Protocol Development; Timing Simulator]
• Virtutech Simics (www.virtutech.com)
• Rest is Multifacet software
Select Important Workloads
[Diagram: Full Workloads]
• Online Transaction Processing: DB2 w/ TPC-C-like
• Java Server Workload: SPECjbb
• Static web content serving: Apache
• Dynamic web content serving: Slashcode
• Java-based Middleware
Setup & Tune Workloads (on real hardware)
[Diagram: Full Workloads; Commercial Server (Sun Fire V880)]
• Tune workload, OS parameters
• Measure transaction rate, speed-up, miss rates, I/O
• Compare to published results
Scale & Re-tune Workloads
[Diagram: Commercial Server (Sun Fire V880); Scaled Workloads]
• Scale-down for PC memory limits
• Retaining similar behavior (e.g., L2 cache miss rate)
• Re-tune to achieve higher transaction rates
(OLTP: raw disk, multiple disks, more users, etc.)
Transition Workloads to Simulation
[Diagram: Scaled Workloads; Full System Functional Simulator (Simics)]
• Create disk dumps of tuned workloads
• In simulator: Boot OS, start, & warm application
• Create Simics checkpoint (snapshot)
Specify Proposed Computer Design
[Diagram: Memory Protocol Generator (SLICC); Memory Timing Simulator (Ruby)]
• Coherence Protocol (control tables: states × events)
• Cache Hierarchy (parameters & queues)
• Interconnect (switches & queues)
• Processor (later)
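The "control tables: states × events" idea can be sketched as a lookup table. This is a hypothetical, much-simplified MSI cache controller in Python. It is not SLICC syntax and not an actual Multifacet protocol. All state, event, and action names are assumptions for illustration, and transient states are left out.

```python
# Hypothetical, simplified MSI cache-controller table (illustration only).
# Keys are (state, event); values are (actions, next_state).
TABLE = {
    ("I", "Load"):       (["issue_GetS"], "S"),
    ("I", "Store"):      (["issue_GetM"], "M"),
    ("I", "Other_GetS"): ([], "I"),
    ("I", "Other_GetM"): ([], "I"),
    ("S", "Load"):       (["hit"], "S"),
    ("S", "Store"):      (["issue_GetM"], "M"),
    ("S", "Other_GetS"): ([], "S"),
    ("S", "Other_GetM"): (["invalidate"], "I"),
    ("M", "Load"):       (["hit"], "M"),
    ("M", "Store"):      (["hit"], "M"),
    ("M", "Other_GetS"): (["send_data"], "S"),
    ("M", "Other_GetM"): (["send_data", "invalidate"], "I"),
}


def step(state, event):
    """Look up one table cell; an empty cell is reported as a specification error."""
    try:
        return TABLE[(state, event)]
    except KeyError:
        raise ValueError(f"unspecified transition: state={state} event={event}")


def run(events, state="I"):
    """Apply a sequence of events to one cache block; return final state and action log."""
    log = []
    for event in events:
        actions, state = step(state, event)
        log.extend(actions)
    return state, log


if __name__ == "__main__":
    print(run(["Load", "Store", "Other_GetS"]))
```

Keeping the protocol as data makes an unspecified (state, event) cell fail loudly instead of silently doing nothing.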
Test Proposed Computer Design
[Diagram: Pseudo-Random Protocol Checker; Memory Timing Simulator (Ruby)]
• Randomly select write action & later read check
• Massive false-sharing for interaction
• Perverse network stresses design
• Transient error & deadlock detection
• Sound but not complete
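A minimal sketch of the random write-then-check idea, in Python. ToyMemory is a hypothetical stand-in for the memory system under test, not Ruby's interface, and the drop_every bug injection is an assumption added only so that the checker has something to find.

```python
import random


class ToyMemory:
    """Stand-in for the memory system under test (hypothetical interface)."""

    def __init__(self, drop_every=0):
        self.data = {}
        self.writes = 0
        self.drop_every = drop_every  # injected bug: silently drop every Nth write

    def write(self, addr, value):
        self.writes += 1
        if self.drop_every and self.writes % self.drop_every == 0:
            return  # injected bug: this write is lost
        self.data[addr] = value

    def read(self, addr):
        return self.data.get(addr, 0)


def random_test(mem, steps=10_000, num_addrs=8, seed=1):
    """Randomly interleave write actions and later read checks.

    A small address pool concentrates traffic on a few locations, loosely
    mirroring the false-sharing idea. Returns the step index of the first
    mismatch, or None if no error was observed.
    """
    rng = random.Random(seed)
    expected = {}
    for i in range(steps):
        addr = rng.randrange(num_addrs)
        if rng.random() < 0.5:
            value = rng.randrange(1 << 16)
            mem.write(addr, value)
            expected[addr] = value
        elif mem.read(addr) != expected.get(addr, 0):
            return i
    return None


if __name__ == "__main__":
    print(random_test(ToyMemory()))              # correct memory
    print(random_test(ToyMemory(drop_every=7)))  # memory with injected bug
```

A reported mismatch is a real error, but a clean run proves nothing about untested interleavings, which is the "sound but not complete" point.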
Simulate with Simple Blocking Processor
[Diagram: Scaled Workloads; Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby)]
• Warm-up caches or sometimes sufficient (SafetyNet)
• Run for fixed number of transactions
– Some transactions partially done at start
– Other transactions partially done at end
• Cope with workload variability (later)
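One way to read "run for a fixed number of transactions" is to report cycles per transaction over a fixed count of completions. The Python sketch below assumes a hypothetical list of transaction completion times in cycles. The function name and input format are illustrative and not taken from the Multifacet tools.

```python
def cycles_per_transaction(completion_cycles, start_cycle, num_transactions):
    """Cycles per transaction over a fixed number of completed transactions.

    completion_cycles: sorted cycle counts at which transactions completed.
    Measurement begins at start_cycle and ends when the num_transactions-th
    later completion is seen. Work already in flight at start_cycle and work
    still in flight at the end is only partially covered by the interval.
    """
    done = [c for c in completion_cycles if c > start_cycle]
    if len(done) < num_transactions:
        raise ValueError("run too short for the requested transaction count")
    end_cycle = done[num_transactions - 1]
    return (end_cycle - start_cycle) / num_transactions


if __name__ == "__main__":
    completions = [50, 120, 200, 310, 400, 520]
    print(cycles_per_transaction(completions, start_cycle=100, num_transactions=4))
```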
Simulate with Detailed Processor
[Diagram: Scaled Workloads; Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Processor Timing Simulator (Opal)]
• Accurate (future) timing & (current) function
• Simulation complexity decoupled (discussed soon)
• Same transaction methodology & workload variability issues
Simulation Infrastructure & Workload Process
[Diagram components: Full Workloads; Commercial Server (Sun Fire V880); Scaled Workloads; Memory Protocol Generator (SLICC); Pseudo-Random Protocol Checker; Full System Functional Simulator (Simics); Memory Timing Simulator (Ruby); Processor Timing Simulator (Opal)]
• Select important workloads: run, tune, scale, & re-tune
• Specify system & pseudo-randomly test
• Create warm workload checkpoint
• Simulate with simple or detailed processor
• Fixed #transactions, manage simulation complexity (next), cope with workload variability (next next)
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
– Simulation Challenges & Complexity
– Timing-First Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation
Simulating Function Getting Harder!
[Diagram: Target Application on (Simulated) Target System]
• Target Application: Kernels, SPEC Benchmarks, Web Server, Database, Operating System
• (Simulated) Target System: Processor, MMU, Status Registers, RAM, Real Time Clock, Serial Port, Terminal, I/O MMU Controller, DMA Controller, IRQ Controller, PCI Bus, Graphics Card, Ethernet Controller, Fiber Channel Controller, SCSI Controller, SCSI Disk, …, SCSI Disk, CDROM
Simulating Timing Getting Harder!
• Micro-architecture complexity
– Multiple “in-flight” instructions
– Speculative execution
– Out-of-order execution
• Thread-level parallelism
– Hardware Multi-threading
– Traditional Multi-processing
Managing Simulator Complexity
• Integrated (SimOS): single Timing and Functional Simulator
  - Complex
• Functional-First (Trace-driven): Functional Simulator, then Timing Simulator
  - Timing feedback
• Timing-Directed: Timing Simulator (Complete Timing, No? Function) with Functional Simulator (No Timing, Complete Function)
  + Timing feedback
  - Tight Coupling
  - Performance?
• Timing-First (Multifacet): Timing Simulator (Complete Timing, Partial Function) with Functional Simulator (No Timing, Complete Function)
Timing-First Operation
[Diagram labels: Timing Simulator; Functional Simulator; CPU; Cache; Network; RAM; System; steps Execute, Commit, Verify, Reload; example instructions add, load]
• Timing Simulator runs speculatively ahead
• On commit, calls Functional Simulator to verify
• Reload Timing Simulator state if necessary,
e.g., interrupt, unimplemented instruction
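The commit, verify, and reload steps can be sketched as a loop. This is a toy Python illustration under stated assumptions. The three-instruction ISA, the state layout, and the choice to compare only registers are hypothetical, and neither function reflects Simics or Opal interfaces.

```python
import copy


def functional_execute(state, instr):
    """Complete (trusted) semantics for a tiny hypothetical ISA."""
    op, rd, a, b = instr
    regs, mem = state["regs"], state["mem"]
    if op == "add":
        regs[rd] = regs[a] + regs[b]
    elif op == "mul":
        regs[rd] = regs[a] * regs[b]
    elif op == "load":
        regs[rd] = mem[regs[a] + b]
    state["pc"] += 1


def timing_execute(state, instr):
    """Partial semantics: this toy timing model does not implement 'mul'."""
    op, rd, a, b = instr
    regs, mem = state["regs"], state["mem"]
    if op == "add":
        regs[rd] = regs[a] + regs[b]
    elif op == "load":
        regs[rd] = mem[regs[a] + b]
    # 'mul' falls through: rd keeps a stale value, to be caught at commit.
    state["pc"] += 1


def run_timing_first(program, init_state):
    """Timing model runs first; each commit is verified against the functional model."""
    timing = copy.deepcopy(init_state)
    functional = copy.deepcopy(init_state)
    reloads = 0
    for instr in program:
        timing_execute(timing, instr)            # timing simulator executes ahead
        functional_execute(functional, instr)    # on commit, step functional simulator
        if timing["regs"] != functional["regs"]:  # verify architected registers
            timing = copy.deepcopy(functional)    # reload timing simulator state
            reloads += 1
    return timing, reloads


if __name__ == "__main__":
    init = {"regs": [0, 2, 3, 0], "mem": [10, 20, 30, 40], "pc": 0}
    prog = [("add", 0, 1, 2), ("mul", 3, 0, 1), ("load", 1, 2, 0)]
    print(run_timing_first(prog, init))
```

The timing model can stay incomplete because any instruction it gets wrong is corrected at commit, at the cost of one reload.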
Timing-First Discussion
[Diagram: Timing-First Simulation; Timing Simulator (Complete Timing, Partial Function); Functional Simulator (No Timing, Complete Function)]
• Supports speculative multi-processor timing models
• Leverages existing simulators
• Rapid development time (e.g., immediate checks)
• Has low simulation overhead (18% uniprocessor)
• Introduces relatively little performance error (< 3%)
• BUT duplicates some code & function
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
– Variability in Multithreaded Workloads
– Coping in Simulation
• NSF Challenges in Computer Architecture Evaluation
What is Happening Here?
[Figure: OLTP plot]
What is Happening Here?
• How can slower memory lead to faster workload?
• Answer: Multithreaded workload takes different path
– Different lock race outcomes
– Different scheduling decisions
• (1) Does this happen for real hardware?
• (2) If so, what should we do about it?
One Second Intervals (on real hardware)
[Figure: OLTP plot]
60 Second Intervals (on real hardware)
[Figure: OLTP plot, annotated "16-day simulation"]
Coping with Workload Variability
• Running (simulating) long enough not appealing
• Need to separate coincidental & real effects
• Standard statistics on real hardware
– Variation within base system runs
vs. variation between base & enhanced system runs
– But deterministic simulation has no “within” variation
• Solution with deterministic simulation
– Add pseudo-random delay on L2 misses
– Simulate base (enhanced) system many times
– Use simple or complex statistics
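The solution steps above can be sketched in Python. toy_simulate is a hypothetical stand-in for one simulation run. It only adds a small pseudo-random delay to each L2 miss and does not model the lock races or scheduling changes that cause real variability. The normal-approximation interval is one simple choice of statistic, not necessarily the one used in the talk.

```python
import random
import statistics


def toy_simulate(l2_latency, seed, misses=1000, max_jitter=4):
    """Hypothetical stand-in for one simulation run; returns total miss cycles.

    Each L2 miss gets a small pseudo-random extra delay, so runs with
    different seeds produce different results.
    """
    rng = random.Random(seed)
    return sum(l2_latency + rng.randint(0, max_jitter) for _ in range(misses))


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem


def compare(base_latency, enhanced_latency, runs=20):
    """Simulate base and enhanced systems many times; report CIs and overlap."""
    base = [toy_simulate(base_latency, seed) for seed in range(runs)]
    enhanced = [toy_simulate(enhanced_latency, seed + 1000) for seed in range(runs)]
    ci_base = confidence_interval(base)
    ci_enhanced = confidence_interval(enhanced)
    overlap = ci_base[0] <= ci_enhanced[1] and ci_enhanced[0] <= ci_base[1]
    return ci_base, ci_enhanced, overlap


if __name__ == "__main__":
    print(compare(80, 90))
```

If the two intervals overlap, the measured difference may be coincidental, and more runs or a stronger test are needed before claiming a real effect.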
Confidence Interval Example
[Figure: ROB confidence-interval plot]
• Estimate #runs to get
non-overlapping confidence intervals
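A rough way to estimate the run count, assuming both configurations have the same standard deviation s and a normal approximation: each interval has half-width z·s/√n, so the intervals separate once 2·z·s/√n is smaller than the difference in means d, i.e. n > (2·z·s/d)². The Python sketch below applies that formula. The function name and the pilot-run inputs are hypothetical.

```python
import math
import statistics


def runs_needed(std_dev, mean_diff, level=0.95):
    """Smallest n with 2 * z * std_dev / sqrt(n) < mean_diff.

    std_dev and mean_diff would come from a few pilot runs; equal variance
    in both configurations and a normal approximation are assumed.
    """
    if std_dev < 0 or mean_diff <= 0:
        raise ValueError("std_dev must be >= 0 and mean_diff must be > 0")
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.floor((2 * z * std_dev / mean_diff) ** 2) + 1


if __name__ == "__main__":
    print(runs_needed(std_dev=5.0, mean_diff=2.0))
```

Because n grows with (s/d)², halving the effect size roughly quadruples the number of runs.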
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation
– Advice that researchers, program committees, & funders basically “know,” but often forget to heed
NSF Challenges in Computer Architecture Evaluation
• Dec 2001 NSF Computer Systems Architecture Workshop
– Report in IEEE Computer, Aug 2003
– By Kevin Skadron, Margaret Martonosi, David August, Mark Hill, David Lilja, & Vijay Pai
• Simulation Frameworks
– P (Problem): Need more modularity, portability, & reuse
– R (Recommendation): More simulation frameworks, e.g., ASIM & Liberty
• Benchmarking
– P: Benchmarks for too few domains
– R: Reward benchmark development & characterization; consider
micro- and synthetic benchmarks
NSF Challenges in Computer Architecture Evaluation
• Abstractions & Methodology
– P: Simulation is trusted too much; other methods are used too little
• 1985 ISCA: 30% simulation & 30% modeling
• 2001 ISCA: 90% simulation & 0% modeling
– R: Push analytic models for insight, cross validation, & far-reaching research
• Metrics, Accuracy, & Validation
– P: Too dependent on relative & aggregate metrics
– R: More metrics & statistical methods, especially when
balancing multiple dimensions (e.g., performance & power)
Talk Summary
• Simulations of $2M Commercial Servers must
– Complete in reasonable time (on $2K PCs)
– Handle OS, devices, & multithreaded hardware
– Cope with variability of multithreaded software
• Multifacet
– Scale & tune transactional workloads
– Separate timing & functional simulation
– Cope w/ workload variability via randomness & statistics
• References (www.cs.wisc.edu/multifacet/papers)
– Simulating a $2M Commercial Server on a $2K PC [Computer 2/03]
– Full-System Timing-First Simulation [Sigmetrics 02]
– Variability in Architectural Simulations … [HPCA 03]
• NSF Panel
– Challenges in Computer Architecture Evaluation [Computer 8/03]
Backup Slides
Other Multifacet Methods Work
• Specifying & Verifying Coherence Protocols
– [SPAA98], [HPCA99], [SPAA99], & [TPDS02]
• Workload Analysis & Improvement
– Database systems [VLDB99] & [VLDB01]
– Pointer-based [PLDI99] & [Computer00]
– Middleware [HPCA03]
• Modeling & Simulation
– Commercial workloads [Computer02] & [HPCA03]
– Decoupling timing/functional simulation [Sigmetrics02]
– Simulation generation [PLDI01]
– Analytic modeling [Sigmetrics00] & [TPDS TBA]
– Micro-architectural slack [ISCA02]
– Interaction costs [Micro02]