Lecture 1: Course Introduction and Review

EENG 449bG/CPSC 439bG
Computer Systems
Lecture 11
Power Issues and DVS
February 17, 2005
Prof. Andreas Savvides
Spring 2005
http://www.eng.yale.edu/courses/eeng449bG
2/17/05
EENG449b/Savvides
Lec 10.1
Announcements
• Reading reference for this lecture
– J. Pouwelse, K. Langendoen, H. Sips, “Dynamic Voltage
Scaling on a Low-Power Microprocessor”, posted on the
class website
• Midterm date discussion & conflicts with
other classes
Why worry about power?
Intel vs. Duracell
[Figure: improvement compared to year 0 (1x to 16x) versus time (0 to 6 years) for Processor (MIPS), Hard Disk (capacity), Memory (capacity), and Battery (energy stored)]
• No Moore’s Law in batteries: 2-3%/year growth
Current Battery Technology is
Inadequate
Battery         Rechargeable?   Wh/lb    Wh/litre
Alkaline MnO2   NO              65.8     347
Silver Oxide    NO              60       500
Li/MnO2         NO              105      550
Zinc Air        NO              140      1150
NiCd            YES             23       125
Li-Polymer      YES             65-90    300-415
• Example: 20-watt battery
» NiCd weighs 0.5 kg, lasts 1 hr, and costs $20
» Comparable Li-Ion lasts 3 hrs, but costs > 4x more
Comparison of Energy Sources
Source                   Power (Energy) Density                     Source of Estimates
Batteries (Zinc-Air)     1050-1560 mWh/cm^3 (1.4 V)                 Published data from manufacturers
Batteries (Lithium ion)  300 mWh/cm^3 (3-4 V)                       Published data from manufacturers
Solar (Outdoors)         15 mW/cm^2 (direct sun)                    Published data and testing
                         0.15 mW/cm^2 (cloudy day)
Solar (Indoor)           0.006 mW/cm^2 (my desk)                    Testing
                         0.57 mW/cm^2 (12 in. under a 60 W bulb)
Vibrations               0.001-0.1 mW/cm^3                          Simulations and testing
Acoustic Noise           3E-6 mW/cm^2 at 75 dB sound level          Direct calculations from acoustic theory
                         9.6E-4 mW/cm^2 at 100 dB sound level
Passive Human Powered    1.8 mW (shoe inserts >> 1 cm^2)            Published study
Thermal Conversion       0.0018 mW (10 deg. C gradient)             Published study
Nuclear Reaction         80 mW/cm^3, 1E6 mWh/cm^3                   Published data
Fuel Cells               300-500 mW/cm^3, ~4000 mWh/cm^3            Published data
Assume 1mW Average as definition of “Scavenged Energy”
Trends in Total Power Consumption
[Figure: microprocessor power dissipation trend, including the DEC 21164; source: ARPA-ESTO]
• Frightening: proportional to area & frequency
Power Metrics in Microprocessors
nJ/Instruction
– Mostly for processors with the same instruction sets
– Does not capture the effect of operand size (e.g. 8-bit
addition vs. 32-bit addition operations)
MIPS/Watt
mA – common among component data sheets
Remember:
Power (Watts): $P = I \cdot V$
Energy (Joules): $E = P \cdot t$
Energy (Joules): $E = \frac{1}{2} C V^2$
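The metrics on this slide can be tied together numerically. The following is a minimal Python sketch, not part of the original lecture; the 100 mA, 3.3 V and 200 MIPS figures are hypothetical values chosen only for illustration.

```python
def power_watts(current_a, voltage_v):
    """Power (W) = I * V."""
    return current_a * voltage_v


def nj_per_instruction(power_w, mips):
    """Energy per instruction in nJ: power divided by instruction rate."""
    instructions_per_second = mips * 1e6
    return power_w / instructions_per_second * 1e9


def mips_per_watt(mips, power_w):
    """Throughput delivered per watt of power."""
    return mips / power_w


def capacitor_energy_joules(capacitance_f, voltage_v):
    """Energy (J) = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


# Hypothetical processor (assumed numbers): 100 mA at 3.3 V, 200 MIPS.
p = power_watts(0.100, 3.3)
print(f"power = {p:.2f} W")
print(f"energy per instruction = {nj_per_instruction(p, 200):.2f} nJ")
print(f"efficiency = {mips_per_watt(200, p):.0f} MIPS/W")
```

Note that nJ/instruction and MIPS/Watt carry the same information: one is the reciprocal of the other, up to a unit factor.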
Modeling the Battery Behavior
• Theoretical capacity of battery is decided by the
amount of the active material in the cell
» batteries often modeled as buckets of constant energy
• e.g. halving the power by halving the clock frequency is assumed to double the
computation time while maintaining constant computation per battery life
• In reality, delivered or nominal capacity depends on
how the battery is discharged
» discharge rate (load current)
» discharge profile and duty cycle
» operating voltage and power level drained
Battery Capacity
from [Powers95]
• Current in “C” rating: load current
normalized to battery’s capacity
» e.g. a discharge current of 1C for a capacity of
500 mA-hrs is 500 mA
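As a quick illustration of the “C” rating, here is a small Python sketch. The function names are made up for this example, and the runtime function assumes the idealized constant-capacity ("bucket") battery model, which real batteries deviate from at high discharge rates.

```python
def c_rate_to_current_ma(c_rate, capacity_mah):
    """Load current in mA for a discharge rate expressed in units of C."""
    return c_rate * capacity_mah


def ideal_runtime_hours(c_rate):
    """Runtime under the idealized constant-capacity model (an assumption)."""
    return 1.0 / c_rate


capacity_mah = 500  # the 500 mA-hr battery from the slide
for c_rate in (1.0, 1 / 10, 1 / 20, 1 / 100):
    current = c_rate_to_current_ma(c_rate, capacity_mah)
    hours = ideal_runtime_hours(c_rate)
    print(f"{c_rate:.2f}C -> {current:.0f} mA, ideal runtime {hours:.0f} h")
```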
Battery Capacity vs. Discharge
Current
• Amount of energy delivered is decreased as the
current (rate at which power is drawn) is increased
» rated as ampere hours or watt hours when discharged at a
specific rate to a specific cut-off voltage
– primary cells rated at a current which is 1/100th of the capacity in
ampere hours (C/100)
– secondary cells are rated at C/20 or C/10
• At high currents, the diffusion process that moves
new active material from electrolytes to the electrode
cannot keep up
» concentration of active material at cathode drops to zero, and
cell voltage goes down below cut-off
» even though active material in cell is not exhausted!
Battery Energy Consumers
Where does the Power Go?
[Figure: block diagram of power consumers. Power Supply: Battery, DC-DC Converter. Processing: Programmable µPs & DSPs (apps, protocols etc.), ASICs, Memory. Communication: Radio Modem, RF Transceiver. Peripherals: Disk, Display.]
Power Consumption for a Computer
with Wireless NIC
Display        36%
CPU/Memory     21%
Hard Drive     18%
Wireless LAN   18%
Other           7%
Energy Consumption of
Wireless NICs (Wavelan)
                    Specs      Measured
2 Mbps (Bronze)
  Sleep Mode        9 mA       14 mA
  Idle Mode         -------    178 mA
  Receive Mode      280 mA     200 mA
  Transmit Mode     330 mA     280 mA
11 Mbps (Silver)
  Sleep Mode        10 mA      10 mA
  Idle Mode         -------    156 mA
  Receive Mode      180 mA     190 mA
  Transmit Mode     280 mA     284 mA
Example: Power Consumption for
Compaq’s iPAQ
206MHz StrongArm SA-1110
processor
320x240 resolution color TFT LCD
Touch screen
32MB SDRAM / 16MB Flash memory
USB/RS-232/IrDA connection
Speaker/Microphone
Lithium Polymer battery
PCMCIA card expansion pack & CF
card expansion pack
* Note
CPU is in the idle state most of the time
Audio, IrDA, RS232 power is measured when
each part is idling
Etc includes CPU, flash memory, touch
screen and all other devices
Frontlight brightness was 16
Microprocessor Power Consumption
CMOS Circuits
(Used in most microprocessors)
Static Component
Bias and leakage currents
O(1mW)
Dynamic Component
Digital circuit switching inside
the processor
$P = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
Static: $I_{standby} V_{dd} + I_{leakage} V_{dd}$
Dynamic: $I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
Power Consumption in Digital CMOS
Circuits
$Power = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
I_standby - current constantly drawn from the power supply
I_leakage - determined by fabrication technology
I_sc - short circuit current due to the DC path between the
supply rails during output transitions
C_l - load capacitance at the output node
f_clk - clock frequency
V_dd - power supply voltage
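The power equation on this slide can be evaluated directly. The sketch below is illustrative only: it groups the standby and leakage terms as static and the short-circuit and switching terms as dynamic, and all numeric values are hypothetical, chosen just to show how the switching term responds to the supply voltage.

```python
def cmos_power(i_standby, i_leakage, i_sc, c_load, v_dd, f_clk):
    """Return (static, dynamic) power in watts for the four-term CMOS model.

    Currents in amperes, c_load in farads, v_dd in volts, f_clk in hertz.
    """
    static = i_standby * v_dd + i_leakage * v_dd
    dynamic = i_sc * v_dd + c_load * v_dd ** 2 * f_clk
    return static, dynamic


# Hypothetical values (assumptions, not measurements from the lecture).
static_hi, dynamic_hi = cmos_power(1e-4, 1e-4, 0.0, 1e-9, 2.0, 100e6)
static_lo, dynamic_lo = cmos_power(1e-4, 1e-4, 0.0, 1e-9, 1.0, 100e6)

print(f"V_dd = 2.0 V: static {static_hi * 1e3:.2f} mW, dynamic {dynamic_hi * 1e3:.1f} mW")
print(f"V_dd = 1.0 V: static {static_lo * 1e3:.2f} mW, dynamic {dynamic_lo * 1e3:.1f} mW")
```

With the short-circuit current set to zero, halving the supply voltage at a fixed clock cuts the dynamic term by a factor of four, while the static term only halves.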
Dynamic Voltage Scaling
• What can you do to conserve power on a
processor?
• Dynamic power consumption is the dominant
component
• Example: Transmeta’s Crusoe processor
DVS on Low Power Processor
Dynamic power component:
$P_{dynamic} = \sum_{k=1}^{M} C_k \cdot f \cdot V_{dd}^2$
M - number of gates
C_k - load capacitance of gate k
Maximum gain when voltage is lowered BUT
lower voltage increases circuit delay
Propagation delay:
$\tau \propto \frac{V_{DD}}{\beta (V_{DD} - V_T)^2}$
β - transistor gain factor
V_T - CMOS transistor threshold voltage
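The trade-off between lower voltage and longer delay can be sketched numerically. This Python fragment assumes the propagation delay scales as V_DD / (V_DD - V_T)^2 with the gain factor held constant (so it cancels in ratios), and that the maximum clock frequency is inversely proportional to the delay. The threshold voltage of 0.5 V is a hypothetical value, not one from the lecture.

```python
def relative_delay(v_dd, v_t):
    """Propagation delay up to a constant factor: V_DD / (V_DD - V_T)^2."""
    if v_dd <= v_t:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return v_dd / (v_dd - v_t) ** 2


def scale_voltage(v_new, v_ref, v_t):
    """Return (frequency ratio, dynamic power ratio) relative to v_ref."""
    delay_ratio = relative_delay(v_new, v_t) / relative_delay(v_ref, v_t)
    freq_ratio = 1.0 / delay_ratio              # max clock ~ 1 / delay
    power_ratio = (v_new / v_ref) ** 2 * freq_ratio  # P ~ V^2 * f
    return freq_ratio, power_ratio


V_T = 0.5  # hypothetical threshold voltage in volts (assumption)
freq_ratio, power_ratio = scale_voltage(1.0, 1.5, V_T)
print(f"1.5 V -> 1.0 V: max frequency x{freq_ratio:.3f}, dynamic power x{power_ratio:.3f}")
```

Under these assumptions the clock must slow down faster than the voltage drops, which is why the voltage and frequency have to be changed together.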
Voltage Scaling on LART
• Dynamically lower the processor voltage and
frequency to reduce power consumption
• LART wearable board
– StrongARM 1100 Processor 190MHz
– Various I/O capabilities
– 32 MB volatile memory
– 4 MB non-volatile memory
– Programmable voltage regulator
Processor Envelope
At 1.5V Max clock frequency 251MHz
Min frequency the processor functions correctly is 59MHz
LART Power Measurement
Based on dhrystone benchmark
• Note the measurement setup at
different levels on the board
• Always provide hooks for
measurement, testing and debugging
during your design. Both for
software and hardware!
[Figure: Total Power Consumption on the LART Platform]
System Support Requirements
• To manage DVS effectively, the
computation requirements must be known in
advance
• Predictive scheme
– Try to learn that behavior based on the computation
profile
• Better scheme: Applications should be power
aware
• Processor frequency and voltage should be
changed without much delay
– This is specific to each processor
– 150 µs for the LART processor
Example: Power Aware Video Playback
• Annotate a H.263 video decoder with
information on the clock speed required to
decode a known video sequence
• Using a 12.6s video, 15fps
• Power consumption measurements for LART
– No-DVS: 198mW for CPU, 207mW for memory
subsystem
– DVS: 100mW for CPU and 204mW for the memory
subsystem
– 2X improvement for the CPU alone, but only about 25% improvement when the
memory subsystem is included (405mW total without DVS vs. 304mW with DVS)
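The two improvement figures follow from the measurements quoted on this slide. A short Python check, using only those four numbers:

```python
def saving(before_mw, after_mw):
    """Fractional power reduction going from before_mw to after_mw."""
    return 1.0 - after_mw / before_mw


# Measurements quoted on the slide (mW).
cpu_no_dvs, mem_no_dvs = 198.0, 207.0
cpu_dvs, mem_dvs = 100.0, 204.0

cpu_saving = saving(cpu_no_dvs, cpu_dvs)
total_saving = saving(cpu_no_dvs + mem_no_dvs, cpu_dvs + mem_dvs)

print(f"CPU only:     {cpu_saving:.1%} saved ({cpu_no_dvs / cpu_dvs:.2f}x)")
print(f"CPU + memory: {total_saving:.1%} saved")
```

DVS barely touches the memory subsystem, so the memory power sets a floor on the total saving.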
LART Memory Performance
• Memory access is optimal when high
resolution memory access timing is available
• For LART the optimal memory pattern:
– 148MHz
– 92 MB/s memory bandwidth
– Power consumption 514.2mW
– Energy cost 5.6mJ/MB
Memory Subsystem Power
Consumption – Read Operation
[Figure: optimal memory access waveforms; power consumption versus memory bandwidth]
Energy breakdown for read
(based on 1MB read)
[Figure: energy breakdown, including the regulator loss factor]
Power Breakdown for H.263 Decoder
Reducing Power Consumption
is a multilevel task!
• Physical layer
– Technology – reduce the surface of CMOS circuits
• Architecture/IC level
– Several optimizations in the design (e.g. parallelism and
pipelining)
– Provide hooks for software driven power management (e.g.
different power modes and clock speeds)
• OS Level
– Smart schedulers, interval schedulers, DVS
• Application Level
– Power aware applications that warn the OS and the hardware
about the features needed during application lifetime
– Sleep modes and DVS driven by applications
• Network Level
– Networked devices may be able to apply low duty cycles, in
which some of the devices are asleep and others are awake
Conclusions
• Interval-based schedulers are not so efficient
– Interval scheduler: reduces the voltage after a prespecified idle period is detected
• Better leverage of DVS when the processor
is aware of the application requirements
– Illustrated with the H.263 decoder
• Monitor different power consumption
profiles across different sections of the
platform and use them to make clever
decisions about power-management
• What is missing:
– Comments on power regulator efficiencies…
Announcements
• Need to start deciding on the final
projects.
• We need to discuss these with you
individually at the end of class
• One page detailed proposal by March 3
• This should include
– 1 paragraph motivation and description of your project
– 1 paragraph on the approach you are going to use and
the tools
– 1 paragraph on evaluation
» What is the strategy you will use to evaluate the
performance of your project.
DVS Example
• Consider a processor with DVS
• Frequency range 59 – 250MHz
• Supply voltage range 0.8V (@59MHz) to
1.5V (@250MHz)
• Assume that the processor can compute at
1 MIPS per MHz.
DVS Example 1
• What is the maximum energy saving the
processor can achieve with dynamic voltage
scaling?
$P = C V^2 f$
$\text{Energy Saving} = \frac{c \cdot 1.5^2 \cdot 250\,\text{MHz}}{c \cdot 0.8^2 \cdot 59\,\text{MHz}} = \frac{562.5}{37.76} \approx 14.9$
• What is missing?
Task Execution Energy Cost
• A certain task needs to run on the processor. The task
requires 200 Million Instructions to complete.
• Which power level will be the most efficient?
$P_{250} = c \cdot 1.5^2 \cdot 250 = 562.5c$
$P_{59} = c \cdot 0.8^2 \cdot 59 = 37.76c$
$T_{250} = \frac{200}{250} = 0.8\,\text{s}$
$T_{59} = \frac{200}{59} = 3.39\,\text{s}$
$E_{250} = 562.5c \cdot 0.8 = 450c$
$E_{59} = 37.76c \cdot 3.39 = 128c$
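The task-energy calculation can be reproduced with a few lines of Python. This is a sketch under the slide's own assumptions (P = cV²f, 1 MIPS per MHz); power and energy are expressed in units of the unknown constant c, and the function names are made up for the example.

```python
def relative_power(v_dd, f_mhz):
    """Power in units of c, from P = c * V^2 * f (f in MHz)."""
    return v_dd ** 2 * f_mhz


def execution_time_s(million_instructions, f_mhz, mips_per_mhz=1.0):
    """Time to run the task, assuming 1 MIPS per MHz as on the slide."""
    return million_instructions / (f_mhz * mips_per_mhz)


def task_energy(v_dd, f_mhz, million_instructions):
    """Energy in units of c: power multiplied by execution time."""
    return relative_power(v_dd, f_mhz) * execution_time_s(million_instructions, f_mhz)


TASK_MI = 200  # task size in millions of instructions
for v_dd, f_mhz in ((1.5, 250), (0.8, 59)):
    p = relative_power(v_dd, f_mhz)
    t = execution_time_s(TASK_MI, f_mhz)
    e = task_energy(v_dd, f_mhz, TASK_MI)
    print(f"{f_mhz} MHz @ {v_dd} V: P = {p:.2f}c, T = {t:.2f} s, E = {e:.0f}c")

power_ratio = relative_power(1.5, 250) / relative_power(0.8, 59)
energy_ratio = task_energy(1.5, 250, TASK_MI) / task_energy(0.8, 59, TASK_MI)
print(f"power ratio = {power_ratio:.1f}, energy ratio = {energy_ratio:.2f}")
```

The frequency cancels out of the energy for a fixed amount of work, so the energy ratio (about 3.5) depends only on the square of the voltage ratio. The 14.9 figure is a ratio of power, not of energy for the task.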
Power Consumption on Embedded
Processors
• Core voltage differs from peripheral I/O voltage
– Cores scaling down to 0.8V. 1.8V devices are becoming
common
– General Purpose I/O interfaces still at 3.0 – 3.3V
» Makes power supply harder, additional regulator
inefficiency
• Sleep modes and the associated cost of sleep and
recovery, e.g. the SA-1100 modes
– Need time and energy to transition between states
Example: SA-1100 CPU
• RUN: 400 mW
• IDLE: 50 mW
– CPU stopped when not in use
– Monitoring for interrupts
• SLEEP: 0.16 mW
– Shutdown on-chip activity
[Figure: power state transitions. RUN to IDLE and IDLE to RUN: 10 µs each; RUN to SLEEP and IDLE to SLEEP: 90 µs each; SLEEP to RUN: 160 ms]
Duty Cycling: Exploiting Sleep Modes
• Imagine a processor with max power
consumption 120mW
• Power supply voltage 2.5V
• We need to power the device from a
2000mAh battery for 1 year
• Sleep mode draws 20uA current
• What is the duty cycle the device needs to
operate at to last for at least 1 year?
Duty cycling
• 1 year has 365 x 24 = 8760 hours
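$I_{avg} = \frac{2000\,\text{mAh}}{8760\,\text{h}} = 228\,\mu\text{A}$
$P_{avg} = I_{avg} \cdot V = 228\,\mu\text{A} \cdot 2.5\,\text{V} = 570\,\mu\text{W}$
$I_{on} = \frac{120\,\text{mW}}{2.5\,\text{V}} = 48\,\text{mA} = 48000\,\mu\text{A}$
$I_{avg} = T_{on} \cdot I_{on} + (1 - T_{on}) \cdot I_{sleep}$
$T_{on} = \frac{I_{avg} - I_{sleep}}{I_{on} - I_{sleep}} = \frac{228 - 20}{48000 - 20} = 0.434\% \approx 38\,\text{h/year}$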
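The duty-cycle derivation can be checked with a short Python sketch. It uses the numbers from the problem statement and the idealized constant-capacity battery model; the function and variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def max_duty_cycle(capacity_mah, lifetime_h, p_on_mw, v_supply, i_sleep_ua):
    """Largest fraction of time the device may be on and still meet the lifetime."""
    i_avg_ua = capacity_mah / lifetime_h * 1000.0  # average current budget
    i_on_ua = p_on_mw / v_supply * 1000.0          # active current from P = I * V
    return (i_avg_ua - i_sleep_ua) / (i_on_ua - i_sleep_ua)


duty = max_duty_cycle(
    capacity_mah=2000,
    lifetime_h=HOURS_PER_YEAR,
    p_on_mw=120,
    v_supply=2.5,
    i_sleep_ua=20,
)
hours_on = duty * HOURS_PER_YEAR
print(f"duty cycle = {duty:.3%}, about {hours_on:.0f} h of active time per year")
```

Without rounding the average current to 228 µA, the result is marginally higher than the slide's 0.434%, but it still comes to about 38 hours per year.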
Voltage Reduction is Better
• Example: task with 100ms deadline, requires 50ms CPU
time at full speed
– normal system gives 50ms computation, 50ms idle/stopped time
– half speed/voltage system gives 100ms computation, 0ms idle
– same number of CPU cycles but energy reduced to 1/4
[Figure: speed versus time for tasks T1 and T2, running at full speed with idle time compared with running at reduced speed with no idle time. Same work, lower energy.]
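The 100 ms deadline example can be worked through in Python. The sketch assumes that energy per cycle scales with the square of the supply voltage and that the idle or stopped processor consumes nothing, which is an idealization; all names are illustrative.

```python
def run_time_ms(cpu_time_full_speed_ms, speed_scale):
    """Execution time when the clock runs at speed_scale of full speed."""
    return cpu_time_full_speed_ms / speed_scale


def relative_energy(voltage_scale):
    """Energy for a fixed number of cycles, relative to full voltage (E ~ V^2)."""
    return voltage_scale ** 2


DEADLINE_MS = 100
CPU_TIME_FULL_MS = 50

for scale in (1.0, 0.5):
    t = run_time_ms(CPU_TIME_FULL_MS, scale)
    idle = DEADLINE_MS - t
    e = relative_energy(scale)
    print(f"speed/voltage x{scale}: run {t:.0f} ms, idle {idle:.0f} ms, energy x{e:.2f}")
```

At half speed and half voltage the task finishes exactly at the deadline and uses a quarter of the energy.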
Problem with Voltage Reduction
• Voltage gets dictated by the tightest (critical) timing
constraint
» not a problem if latency not important
– throughput can always be improved by pipelining, parallelism etc.
» but, real systems have bursty throughput and latency critical
tasks
Solution: dynamically vary the voltage!
Varying the Supply Voltage
[Figure: active and idle time within a frame T_frame, fixed supply versus variable supply]
$E_{fixed} = \frac{1}{2} C V_{dd}^2$
$E_{var} = \frac{1}{2} C (V_{dd}/2)^2 = \frac{1}{4} E_{fixed}$
[Figure: normalized power versus normalized workload (0 to 1.0) for fixed supply and variable supply, from [Gutnik96] (VLSI Symposium)]
XYZ Node Frequency Scaling
Code Optimizations for Low Power
• High-level operations (e.g. C statement) can be compiled into
different instruction sequences
» different instructions & ordering have different power
• Instruction Selection
– Select a minimum-power instruction mix for executing a piece of high
level code
• Instruction Packing & Dual Memory Loads
– Two on-chip memory banks
» Dual load vs. two single loads
» Almost 50% energy savings
Code Optimizations for Low Power
(contd.)
• Reorder instructions to reduce switching effect at functional units and
I/O buses
– E.g. Cold scheduling minimizes instruction bus transitions
• Operand swapping
– Swap the operands at the input of multiplier
– Result is unaltered, but power changes significantly!
• Other standard compiler optimizations
– Intermediate level: Software pipelining, dead code elimination, redundancy
elimination
– Low level: Register allocation and other machine specific optimizations
• Use processor-specific instruction styles
– e.g. on ARM the default int type is ~ 20% more efficient than char or short as
the latter result in sign or zero extension
– e.g. on ARM the conditional instructions can be used instead of branches
Minimizing Memory Access Costs
• Reduce memory access, make better use of registers
– Register access consumes much less power than memory access
• Straightforward way: minimize number of read-write
operations, e.g.
• Cache optimizations
– Reorder memory accesses to improve cache hit rates
• Can use existing techniques for high-performance code
generation
Low-power Software Strategies
• Code running on CPU
– Code optimizations for low power
• Code accessing memory objects
– SW optimizations for memory
• Data flowing on the buses
– I/O coding for low power
• Compiler controlled power management
[Figure: CPU, Cache, Memory]
How can power consumption be reduced
at the circuit design level inside a
processor?
Example: Reference Datapath
• Critical path delay: $T_{adder} + T_{comparator} = 25$ ns
• Frequency: $f_{ref} = 40$ MHz
• Total switched capacitance = $C_{ref}$
• $V_{dd} = V_{ref} = 5$ V
• Power for reference datapath = $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
from “Digital Integrated Circuits” by Rabaey
Parallel Datapath
• The clock rate can be reduced by 2x with the same
throughput: $f_{par} = f_{ref}/2 = 20$ MHz
• Total switched capacitance = $C_{par} = 2.15 C_{ref}$
• $V_{par} = V_{ref}/1.7$
• $P_{par} = (2.15 C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) = 0.36 P_{ref}$
from “Digital Integrated Circuits” by Rabaey
Pipelined Datapath
• $f_{pipe} = f_{ref}$
• $C_{pipe} = 1.1 C_{ref}$
• $V_{pipe} = V_{ref}/1.7$
• Voltage can be dropped while maintaining the original
throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref})(V_{ref}/1.7)^2 f_{ref} = 0.37 P_{ref}$
from “Digital Integrated Circuits” by Rabaey
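The power ratios for the parallel and pipelined datapaths follow from P = C·V²·f. The Python sketch below evaluates them with the 5 V and 2.9 V supply voltages listed in the trade-off summary (2.9 V is roughly 5 V / 1.7); the function name is made up for this example.

```python
def power_ratio(cap_scale, v_new, v_ref, freq_scale):
    """Power relative to the reference datapath, from P = C * V^2 * f."""
    return cap_scale * (v_new / v_ref) ** 2 * freq_scale


V_REF = 5.0     # reference supply voltage (V)
V_SCALED = 2.9  # reduced supply voltage (V), approximately V_REF / 1.7

parallel = power_ratio(cap_scale=2.15, v_new=V_SCALED, v_ref=V_REF, freq_scale=0.5)
pipelined = power_ratio(cap_scale=1.1, v_new=V_SCALED, v_ref=V_REF, freq_scale=1.0)

print(f"parallel:  P = {parallel:.2f} P_ref")
print(f"pipelined: P = {pipelined:.2f} P_ref")
```

Using the rounded divisor 1.7 instead of 2.9 V gives slightly larger values (about 0.37 and 0.38), so the quoted 0.36 and 0.37 appear to be computed from the 2.9 V figure.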
Datapath Architecture-Power
Trade-off Summary
Datapath Architecture   Voltage   Area   Power
Original                5V        1      1
Pipelined               2.9V      1.3    0.37
Parallel                2.9V      3.4    0.34
Pipeline-Parallel       2.0V      3.7    0.18
Back to Processor Architecture: ARM
Performance
• Some possible avenues of optimizing
performance and power consumption on the
ARM
– Use the on-chip cache
– Write code in 16-bit mode assembly
» Need only one memory access to fetch an
instruction
– Execution in RAM vs. Flash
– Write code in assembly
• Refer to the ARM assembly language
handout for more references