Accurate Power and Energy Measurement on Kepler-based Tesla GPUs
Martin Burtscher
Department of Computer Science
Introduction
GPU-based accelerators
Quickly spreading in PCs and even handheld devices
Widely used in high-performance computing
Power and energy efficiency
Heat dissipation is a problem
Electric bill and battery life are of growing concern
Exascale requires 50x boost in performance per watt
Important research area
Need to develop techniques to reduce power and energy
Have to be able to measure power/energy of programs
GPU Power Sensors
Hardware
High-end compute GPUs include power sensors
For example, K20/K40 Tesla cards have built-in sensor
These cards are the target of this talk
Software
Can query sensor with the NVIDIA Management Library (NVML): http://developer.nvidia.com/nvidia-management-library-nvml
Problems
Power sensor data behaves strangely
Running the same kernel twice yields different energy
First launch: 114 J, second launch: 147 J (29% more energy)
Running a kernel 2x as long more than doubles energy
1x input: 732 J, 2x input: 1579 J (8% above doubling)
Power sensor sampling rate varies greatly
Interval ranges from 0.266 ms to 130 ms (3760 Hz down to 7.7 Hz)
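The percentages and rates quoted on this slide can be re-derived from the raw numbers. The following is a small arithmetic check, not part of the original talk; all variable names are illustrative.

```python
# Figures quoted on the slide.
first_launch_j, second_launch_j = 114.0, 147.0
energy_1x_j, energy_2x_j = 732.0, 1579.0
shortest_interval_s, longest_interval_s = 0.266e-3, 130e-3

# Extra energy of the second launch relative to the first.
extra_second_launch = second_launch_j / first_launch_j - 1.0

# Energy of the 2x input relative to an exact doubling of the 1x energy.
above_doubling = energy_2x_j / (2.0 * energy_1x_j) - 1.0

# The sampling rate is the reciprocal of the sampling interval.
fastest_rate_hz = 1.0 / shortest_interval_s
slowest_rate_hz = 1.0 / longest_interval_s

print(f"second launch: {extra_second_launch:.0%} more energy")
print(f"2x input: {above_doubling:.0%} above doubling")
print(f"sampling rate: {slowest_rate_hz:.1f} Hz to {fastest_rate_hz:.0f} Hz")
```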
Methodology
Hardware
Two K20c, two K20m, two K20X, and two K40m GPUs
Measurement
Query power and time in loop on “idle” CPU core
Test code
Compute-intensive regular n-body kernel
Constant computation rate of over 2 TFlops on a K20c
No data dependences; vary n to adjust kernel runtime
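The measurement loop described above (query power and time in a loop on an otherwise idle CPU core) might look roughly like the sketch below. It is not the code used in the talk: `sample_power`, `read_power_w`, and `min_interval_s` are hypothetical names, and the reader here is a stand-in that returns a constant value so the example runs without a GPU.

```python
import time

def sample_power(read_power_w, duration_s, min_interval_s=0.0):
    """Poll a power reader in a busy loop and record (timestamp, watts) pairs.

    read_power_w is a hypothetical zero-argument callable that returns the
    current sensor reading in watts. Readings closer together than
    min_interval_s are skipped; 0.0 samples as fast as the loop allows.
    """
    samples = []
    start = time.perf_counter()  # high-resolution monotonic clock
    last = None
    while True:
        now = time.perf_counter()
        if now - start >= duration_s:
            break
        if last is None or now - last >= min_interval_s:
            samples.append((now - start, read_power_w()))
            last = now
    return samples

# Stand-in reader (assumption): always reports 50 W.
samples = sample_power(lambda: 50.0, duration_s=0.05, min_interval_s=0.01)
print(f"collected {len(samples)} samples")
```

On real hardware the reader could wrap an NVML query, for example `pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0` from the pynvml bindings, which reports milliwatts. That wiring is an untested assumption here and requires `nvmlInit()` and a device handle first.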
Expected Power Profile
Kernel starts executing
Kernel stops executing
GPU idle power
Measurement loop runtime
Measured Power Profile
[Figure: measured power profile with annotated durations of 5 s, 3 s, and 4 s]
Power ramps up slowly
Power ramps down slowly
Macroscopic phenomena
Switch to step shape
Idle power reached
Energy = Area Under Power Curve
Missing energy?
Unclear how big energy is
Delayed energy?
Integrate to where?
Ramp-up Behavior of 2 Short Runs
Ramp down doesn’t follow
2nd run starts higher but also follows curve
Short run same as longer run
Ramp-down Behavior of Several Runs
[Figure: power profiles of several runs; x-axis: Shifted Runtime [s], 16.2 to 23.2; y-axis: 0 to 160; time markers t2, t3, t4]
Driver lowers power level
Shape depends on power at t2
Shape always the same
Steps down every second
Power increases after kernel done
Sampling Interval Lengths
[Figure: power profile with sampling interval lengths overlaid; x-axis: Runtime [s], 10.7 to 23.7; two y-axes: 0 to 160 and 0 to 80; time markers t1, t2, t3, t4]
Very long interval
Driver activity can prevent sampling
Wide range of intervals
Short intervals
Sampling Interval Lengths (zoomed-in)
[Figure: zoomed-in view of sampled power and interval lengths; x-axis: Runtime [s], 12.030 to 12.060; two y-axes: 0 to 120 and 0 to 12]
Identical values
Sampled power only ever changes after long interval
Very long interval
Many short intervals
Correcting the Measurements
Sampling Frequency
Eliminate redundant samples
Only sample once every 15 ms (66.7 Hz)
Cannot accurately measure kernels under ~150 ms
Account for the variation in interval length
Use high-resolution time stamps
Example: energy from t1 to t4
Dotted (fixed intervals): 1205 J
Solid (variable intervals): 1066 J
13% discrepancy
[Figure: power profile integrated with fixed (dotted) and variable (solid) intervals; x-axis: Runtime [s], 10.7 to 23.7; y-axis: 0 to 160; time markers t1, t4]
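To illustrate why the time stamps matter, the sketch below integrates the same samples twice: once with the recorded time stamps and once pretending the samples are evenly spaced. The data are made up for illustration and do not reproduce the 1205 J and 1066 J figures; the function names are hypothetical.

```python
def energy_variable_intervals(times_s, power_w):
    """Trapezoidal integration using the recorded time stamp of every sample."""
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j

def energy_fixed_intervals(times_s, power_w):
    """Same integration, but pretending all samples are evenly spaced."""
    n = len(times_s)
    dt = (times_s[-1] - times_s[0]) / (n - 1)
    return sum(0.5 * (power_w[i] + power_w[i - 1]) * dt for i in range(1, n))

# Synthetic, made-up samples: dense sampling at high power, then one long gap.
times_s = [0.0, 0.1, 0.2, 0.3, 0.4, 2.0]
power_w = [100.0, 100.0, 100.0, 100.0, 100.0, 20.0]

e_var = energy_variable_intervals(times_s, power_w)
e_fix = energy_fixed_intervals(times_s, power_w)
print(f"variable intervals: {e_var:.1f} J, fixed intervals: {e_fix:.1f} J")
```

With these samples the fixed-interval version gives the densely sampled high-power readings too much weight and therefore overestimates the energy.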
True Power
Sensor hardware
Seems to asymptotically approach true power
Reminiscent of capacitor charging
True instant power
$P_{true}$ is a function of the slope of the power profile, $dP_{sensor}/dt$, and the power measured by the sensor, $P_{sensor}$
$P_{true} = P_{sensor} + C \times \frac{dP_{sensor}}{dt}$
“Capacitance” of sensor
C ≈ 0.84 s on all tested K20 GPUs
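The correction formula can be tried on synthetic data. The sketch below assumes the sensor behaves like a first-order lag with time constant C (the capacitor analogy from the slide), generates such a trace for a made-up 25 W to 150 W step, and applies the formula with slopes estimated from neighboring samples. The names `slopes` and `correct_power` are hypothetical, and the power levels are not measurements.

```python
import math

C_SECONDS = 0.84  # sensor "capacitance" reported on the slide for K20 GPUs

def slopes(times_s, power_w):
    """Approximate dP/dt at every sample from its neighboring samples."""
    n = len(times_s)
    result = []
    for i in range(n):
        # Central difference inside the trace, one-sided at both ends.
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        result.append((power_w[hi] - power_w[lo]) / (times_s[hi] - times_s[lo]))
    return result

def correct_power(times_s, power_w, c=C_SECONDS):
    """Apply P_true = P_sensor + C * dP_sensor/dt to every sample."""
    return [p + c * s for p, s in zip(power_w, slopes(times_s, power_w))]

# Synthetic sensor trace (assumption): a 25 W -> 150 W step seen through a
# first-order lag with time constant C, sampled every 15 ms.
times_s = [0.015 * i for i in range(200)]
sensor_w = [25.0 + 125.0 * (1.0 - math.exp(-t / C_SECONDS)) for t in times_s]
true_w = correct_power(times_s, sensor_w)
print(f"sensor at 0.3 s: {sensor_w[20]:.1f} W, corrected: {true_w[20]:.1f} W")
```

On this idealized trace the corrected values sit close to the 150 W step level even while the sensor reading is still ramping up. Real traces are noisier, so the slope estimate is the weak point of the method.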
Back-calculated from Expected Profile
Minimized absolute errors to determine C
‘Capacitor’ function matches measured values perfectly
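One way to back-calculate C from an expected profile is a simple search for the value that minimizes the summed absolute error. The sketch below does this by grid search on a synthetic trace generated with C = 0.84 s. The search method, the candidate grid, and the function names are assumptions for illustration; the slide does not say how the minimization was carried out.

```python
import math

def corrected(times_s, power_w, c):
    """P_sensor + C * dP_sensor/dt at interior samples (central differences)."""
    out = []
    for i in range(1, len(times_s) - 1):
        slope = (power_w[i + 1] - power_w[i - 1]) / (times_s[i + 1] - times_s[i - 1])
        out.append(power_w[i] + c * slope)
    return out

def fit_capacitance(times_s, power_w, expected_w, candidates):
    """Return the candidate C with the smallest summed absolute error."""
    def total_abs_error(c):
        est = corrected(times_s, power_w, c)
        return sum(abs(e - x) for e, x in zip(est, expected_w[1:-1]))
    return min(candidates, key=total_abs_error)

# Synthetic data (assumption): sensor trace of a 25 W -> 150 W step through a
# first-order lag, and the rectangular profile expected during the kernel.
times_s = [0.015 * i for i in range(200)]
sensor_w = [25.0 + 125.0 * (1.0 - math.exp(-t / 0.84)) for t in times_s]
expected_w = [150.0] * len(times_s)

candidates = [k / 100.0 for k in range(10, 201)]  # 0.10 s to 2.00 s
best_c = fit_capacitance(times_s, sensor_w, expected_w, candidates)
print(f"fitted C = {best_c:.2f} s")
```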
Corrected Power Profile
[Figure: measured and corrected power profiles; x-axis: Time [s], 13 to 21; y-axis: 0 to 160; time markers t1, t2, t3]
Wobbles due to sampling errors
‘Active idle’ power level
Corrected profile matches expected rectangular profile
Correction of 2 Short Runs
[Figure: measured and corrected power profiles of two short runs; x-axis: Time [s], 111 to 119; y-axis: 0 to 160; time markers t1a, t2a, t1b, t2b, t3b]
Corrected power profile matches expected profile
Second K20c GPU
[Figure: power profile on the second K20c; x-axis: Time [s], 16.5 to 23.5; y-axis: 0 to 160; time markers t1, t2, t3]
Identical to original K20c
K20m GPU
[Figure: power profile on a K20m; x-axis: Time [s], 62.7 to 69.7; y-axis: 0 to 180; time markers t1, t2, t3]
Similar profile but higher power level
K20X GPU
[Figure: power profile on a K20X; x-axis: Time [s], 128 to 137; y-axis: 0 to 200; time markers t1, t2, t4]
Profile is good, no correction needed!
Huge 600 ms gap
K40m GPU
K40m again requires correction
Application to Full CUDA Program
Implementation of Barnes Hut n-body algorithm
Taken from LonestarGPU benchmark suite
Contains multiple regular and irregular kernels
Highly optimized, but still suffers from load imbalance, divergence, and uncoalesced accesses
Main kernel is ‘regularized’ (warp-based)
Image credit: NASA/JPL-Caltech/SSC
Barnes Hut Power Profile (1 Step)
“Wave” in profile
Slow then fast drop-off
Original profile is hard to interpret
Barnes Hut Power Profile (Kernels)
“Wave” in profile
Slow then fast drop-off
Original profile is hard to interpret
Corrected Barnes Hut Power Profile
[Figure: corrected Barnes Hut power profile; x-axis: Time [s], 61.7 to 68.7; y-axis: 0 to 160; kernel markers a, b, c, d, e, f]
Two similar irregular kernels
One more irregular kernel
Regularized main kernel
Decrease due to load imbalance
Very short regular kernel
Corrected profile reveals important info
K20Power Tool
Output
Corrected profile and corresponding ‘active’ energy
Features
Computes instant power using ‘capacitor’ formula
Employs high-resolution time stamps
Samples at true frequency of 66.7 Hz
Dissemination
Open source, research license
http://cs.txstate.edu/~burtscher/research/K20power/
Marcher System
Tool will be part of Marcher system at Texas State
NSF-funded green computing infrastructure
Marcher is a power-measurable cluster system
832 general-purpose cores
12,000 GPU and MIC cores
1.2 TB of DDR3 with power throttling and scaling
50 TB of hybrid storage with hard drives and SSDs
Component-level power measurement tools (e.g., CPU, DRAM, Disk, GPU, Xeon Phi)
Summary
Correctly measuring K20/K40 power and energy
Sample at 66.7 Hz and include time stamps
Compute true power with presented formula
Use neighboring power samples to approximate slope
Compute true energy by integrating true power
Over intervals where power is above ‘active idle’
K20Power tool
Software tool that implements this methodology
Paper at http://cs.txstate.edu/~burtscher/papers/gpgpu14.pdf
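The last measurement step in the summary, integrating true power only where it exceeds the ‘active idle’ level, could be sketched as follows. This is not the K20Power implementation: the function name, the 50 W threshold, the sample data, and the rule for deciding which intervals count as active are all assumptions for illustration.

```python
def active_energy(times_s, true_power_w, active_idle_w):
    """Integrate power (trapezoidal rule) over intervals above the threshold.

    An interval counts as active only if both of its end samples exceed
    active_idle_w; this rule is an assumption of the sketch.
    """
    energy_j = 0.0
    for i in range(1, len(times_s)):
        p0, p1 = true_power_w[i - 1], true_power_w[i]
        if p0 > active_idle_w and p1 > active_idle_w:
            energy_j += 0.5 * (p0 + p1) * (times_s[i] - times_s[i - 1])
    return energy_j

# Made-up corrected profile: idle, 2 s of activity at 150 W, then idle again.
times_s = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
true_power_w = [25.0, 150.0, 150.0, 150.0, 25.0, 25.0]
energy_j = active_energy(times_s, true_power_w, active_idle_w=50.0)
print(f"active energy: {energy_j:.1f} J")
```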
Acknowledgments
Collaborators
Ivan Zecena and Ziliang Zong
U.S. National Science Foundation
DUE-1141022, CNS-1217231, and CNS-1305359
NVIDIA Corporation
Grants and equipment donations
Texas State University
Research Enhancement Program