IBM Research – Austin
Heather Hanson, Karthick Rajamani
What computer architects need to know about memory throttling
WEED 2010, June 20, 2010
© 2010 IBM Corporation

Outline
Memory throttling overview
Experimental platform
 – System configuration
 – Memory throttling implementation
Memory throttling characterization
 – Bandwidth
 – Power
 – Performance
Summary

Memory throttling in a nutshell
Memory throttling is a power-performance knob that:
 – impacts memory reference rates of both instruction and data streams
 – controls power
 – can be used for safety or optimization
   • regulate DIMM temperatures
   • enforce memory power budgets
Memory throttling restricts read & write traffic
 – directly controls memory power
 – indirectly affects processors and other components
Several implementation styles exist in commercial systems
 – insert periodic idle cycles
 – allow an arbitrary number of transactions up to an (estimated) power threshold
 – run + hold windows
 – enforce read & write quotas [this paper]
   • the first N transactions in each time window proceed
   • any further requests wait until the next time window

Comparison to clock throttling
Run-hold clock throttling
 – regular frequency during the run portion; clock halted during the hold portion
Quota-style memory throttling
 – reads & writes proceed as requested, up to N requests per period
 – requests beyond the Nth in each period are queued for later service
 – Example: N = 6, so up to 6 transactions are serviced per period, regardless of request timing

POWER6 Memory Throttling
IBM JS12 blade system
 – Processor
   • POWER6
   • 1 socket, 2 cores per socket
   • 3.8 GHz frequency (fixed in these experiments)
   • SLES10 Linux
 – Memory
   • 16 GB capacity
   • 8 DIMMs x 2 GB each
   • DDR2
   • 667 MHz bus
Quota-style memory throttling
 – N transactions per M memory cycles
 – 100% throttle level == unthrottled
 – the throttle time period is shorter than thermal and power-supply timescales

Memory throttle characterization methodology
1. Sweep throttle settings
   • Set throttle
   • Run a steady-behavior benchmark
     – DAXPY (double-precision A * X plus Y)
     – FPMAC (floating-point multiply-accumulate)
     – RandomMemory (accesses randomly generated addresses)
     – SPECPower_ssj2008 calibration phase (peak throughput for warehouse transactions)
   • Record sensor data, 256 ms per sample
     – memory power
     – memory reads & writes
     – instruction throughput
     – other sensors not shown here
   • Decrement throttle
   • Repeat for the full range of throttle settings
2. Repeat the throttle sweep for multiple benchmarks and memory footprints
   – Microbenchmarks: L1-cache-contained and main-memory footprints
   – SPECPower_ssj2008: behaves as if nearly contained in on-chip caches
3. Calculate the median of the sensor data for each combination {benchmark, footprint, throttle}

Memory throttle effect on bandwidth
[Figure: memory bandwidth vs. throttle level, showing a linear region, a transition region between the linear & saturated regions, and a saturated region]

Subtle but very important point about the transition region
Actual bandwidth < max bandwidth: bandwidth restrictions → pipeline starvation → reduced request rate
A closer look at RandomMemory-DIMM:
 • uses less bandwidth than the other benchmarks at the same throttle levels
 • also uses less bandwidth than its own saturation level
Simply measuring bandwidth at a single (current) throttle level is not enough to identify the region of operation
 – bandwidth below the maximum could mean either the saturated or the transition region
…so a controller will not be able to accurately predict the effect of a throttle-level change on bandwidth
…or predict its effect on power or performance

Memory power is basically linear with bandwidth, so this chart looks familiar…

Throttling effects relative to each benchmark
Generally more performance reduction than power reduction (in %)
 – Throttling alone doesn't affect the static portion of memory power
   • Leveraging idle low-power memory modes can improve the power-performance curve of memory request-rate throttling.
 – Possible to waste energy because of longer execution time
Larger bandwidth demand → larger effect from throttling
 – Conversely, power is reduced only when performance is impacted
L1-contained DAXPY: throttling has no effect on performance or power
DIMM-sized DAXPY: drastic effect

Summary
Memory throttling is a power-performance knob available in commercial systems
The memory controller restricts read & write bandwidth
 – caps memory power
 – controls DIMM temperature
Mileage may vary
 – power and performance management depend on bandwidth demand
   • throttling a low-bandwidth workload doesn't reduce much power
 – potential to use more energy due to increased execution time
   • use highly throttled settings with caution
Effective tool for power capping
 – power-constrained configurations
 – thermal safety
 – power shifting

Acknowledgements
IBM Research – Austin
IBM Systems & Technology Group
 – Memory characterization: Joab Henderson, Kenneth Wright
 – EnergyScale firmware: Guillermo Silva, Andrew Geissler
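The quota-style scheme described for POWER6 (N transactions per M memory cycles, where the first N proceed and the rest wait) can be illustrated with a toy discrete-time model. This is a sketch under stated assumptions, not IBM's memory-controller logic: the function name simulate_quota_throttle, the FIFO wait queue, and the limit of one serviced transaction per cycle are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import List, Tuple


def simulate_quota_throttle(
    n_quota: int, m_cycles: int, arrival_cycles: List[int], total_cycles: int
) -> List[Tuple[int, int]]:
    """Toy model of quota-style throttling (hypothetical, not IBM firmware).

    In every window of m_cycles memory cycles, at most n_quota transactions
    are serviced. Requests beyond the quota wait in a FIFO queue until a
    later window. Assumes at most one transaction is serviced per cycle.
    Returns (arrival_cycle, service_cycle) pairs in service order; requests
    still waiting when total_cycles is reached are not returned.
    """
    arrivals = sorted(arrival_cycles)
    pending: deque = deque()
    next_idx = 0
    serviced: List[Tuple[int, int]] = []
    for window_start in range(0, total_cycles, m_cycles):
        budget = n_quota  # the quota resets at the start of every window
        window_end = min(window_start + m_cycles, total_cycles)
        for cycle in range(window_start, window_end):
            # Enqueue every request that has arrived by this cycle.
            while next_idx < len(arrivals) and arrivals[next_idx] <= cycle:
                pending.append(arrivals[next_idx])
                next_idx += 1
            # Service the oldest waiting request if quota remains in this window.
            if budget > 0 and pending:
                serviced.append((pending.popleft(), cycle))
                budget -= 1
    return serviced


if __name__ == "__main__":
    # N = 6 per 20-cycle window; a burst of 10 requests arrives at cycle 0.
    result = simulate_quota_throttle(6, 20, [0] * 10, 40)
    print([service for _, service in result])
    # prints [0, 1, 2, 3, 4, 5, 20, 21, 22, 23]
```

Six requests are serviced in the first window and the remaining four slip to the next one, matching the "N = 6" example. Spreading the same requests out in time changes nothing about the per-window cap. Note that this model is open-loop: it does not capture the pipeline-starvation feedback that produces the transition region seen in the measurements.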
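Steps 1 and 3 of the characterization methodology (sweep the throttle downward from unthrottled, then take the median sensor value for each {benchmark, footprint, throttle} combination) can be sketched in a few lines. The hooks set_throttle and read_samples, the record layout, and the integer throttle levels are hypothetical placeholders for whatever firmware and sensor interface a real platform exposes.

```python
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (benchmark, footprint, throttle_level, sensor, value)
Sample = Tuple[str, str, int, str, float]
Key = Tuple[str, str, int, str]


def sweep_throttle(
    benchmark: str,
    footprint: str,
    levels: Iterable[int],
    set_throttle: Callable[[int], None],
    read_samples: Callable[[str, str, int], Iterable[Tuple[str, float]]],
) -> List[Sample]:
    """Step 1: start unthrottled and decrement through every throttle level,
    collecting the sensor samples recorded at each level.

    set_throttle and read_samples are hypothetical hooks supplied by the
    caller (e.g. wrappers around platform firmware and sensor readout)."""
    samples: List[Sample] = []
    for level in sorted(levels, reverse=True):  # highest (unthrottled) first
        set_throttle(level)
        for sensor, value in read_samples(benchmark, footprint, level):
            samples.append((benchmark, footprint, level, sensor, value))
    return samples


def median_by_permutation(samples: Iterable[Sample]) -> Dict[Key, float]:
    """Step 3: median of each sensor for every {benchmark, footprint, throttle}."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for benchmark, footprint, level, sensor, value in samples:
        groups[(benchmark, footprint, level, sensor)].append(value)
    return {key: median(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Synthetic stand-ins for real hardware hooks; the values are not measurements.
    applied: List[int] = []

    def fake_read(benchmark: str, footprint: str, level: int):
        return [("mem_power", float(level)), ("mem_power", float(level) + 2.0)]

    data = sweep_throttle("DAXPY", "DIMM", [25, 50, 100], applied.append, fake_read)
    print(applied)  # throttle levels in the order they were applied
    print(median_by_permutation(data))
```

Using the median rather than the mean keeps a few outlying 256 ms samples (for example, during a phase change) from skewing the per-setting value; step 2 amounts to calling sweep_throttle once per benchmark and footprint and concatenating the results.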
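The warning that heavy throttling can use more energy follows from the observation that memory power is roughly linear with bandwidth. The worked model below is a simplified sketch: P_static (static memory power), k (dynamic energy per byte), and a bandwidth-bound job that must move D bytes at achieved bandwidth B are symbols introduced here for illustration, not quantities from the measurements.

```latex
\begin{align}
P_{\mathrm{mem}}(B) &= P_{\mathrm{static}} + k\,B \\
T(B) &= \frac{D}{B} \\
E_{\mathrm{mem}}(B) &= P_{\mathrm{mem}}(B)\,T(B) = P_{\mathrm{static}}\,\frac{D}{B} + k\,D \\
\frac{dE_{\mathrm{mem}}}{dB} &= -\frac{P_{\mathrm{static}}\,D}{B^{2}} < 0
\end{align}
```

Throttling lowers B, so memory power drops, but the dynamic energy kD is unchanged while the static term grows as execution stretches out, and the rest of the system also draws power for the longer T. Conversely, if a workload already runs below the throttle cap (the saturated region), B and T do not change, which is why throttling a low-bandwidth workload saves little power.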