Lecture 1: Course Introduction and Review

EENG 449bG/CPSC 439bG
Computer Systems
Lecture 6
Overview of Power Issues in Computing Systems
January 29, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
1/29/04
EENG449b/Savvides
Lec 6.1
Announcements
• Reading for this lecture
– J. Pouwelse, K. Langendoen, H. Sips, “Dynamic Voltage Scaling
on a Low-Power Microprocessor”, posted on the class website
• Embedded Processor Programming Reading
– “Get By without an RTOS” – by Melkonian – link posted on
class website
• Project descriptions due tomorrow
– Email: andreas.savvides@yale.edu
• Microcontroller workshop tomorrow in AKW000. I
recommend you attend at least 1 of these according
to your project needs
– 1:00 – 2:00pm ARM/THUMB programming and tools
– 2:00 – 2:15pm Break
– 2:15 – 3:15pm Mote programming & tools
Why worry about power?
Intel vs. Duracell
[Figure: improvement relative to year 0 (1x to 16x) over time (0 to 6 years) for processor (MIPS), hard disk (capacity), memory (capacity), and battery (energy stored)]
• No Moore’s Law in batteries: 2-3%/year growth
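To put the 2-3%/year figure in perspective, here is a minimal sketch of the compound-growth arithmetic. The 16x target is an assumption taken from the top of the chart's axis, not a measured processor figure, and the function names are illustrative.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Improvement factor after `years` of compounding at `annual_rate`."""
    return (1.0 + annual_rate) ** years


def required_annual_rate(factor: float, years: int) -> float:
    """Annual growth rate needed to reach `factor` after `years`."""
    return factor ** (1.0 / years) - 1.0


# Battery energy after 6 years at the 2-3%/year quoted on the slide.
for rate in (0.02, 0.03):
    print(f"{rate:.0%}/year for 6 years -> {compound_growth(rate, 6):.2f}x")

# Assumed target: 16x in 6 years (top of the chart's axis).
print(f"16x in 6 years needs {required_annual_rate(16.0, 6):.0%}/year")
```

Under these assumptions battery capacity grows by less than 1.2x over the period, while a 16x improvement would need close to 60% growth per year.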
Current Battery Technology is
Inadequate
Battery          Rechargeable?   Wh/lb    Wh/litre
Alkaline MnO2    NO              65.8     347
Silver Oxide     NO              60       500
Li/MnO2          NO              105      550
Zinc Air         NO              140      1150
NiCd             YES             23       125
Li-Polymer       YES             65-90    300-415
• Example: 20-watt battery
» NiCd weighs 0.5 kg, lasts 1 hr, and costs $20
» Comparable Li-Ion lasts 3 hrs, but costs > 4x more
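As a rough cross-check of the NiCd example, the sketch below turns a Wh/lb figure into an ideal runtime at a constant load. It assumes the full rated energy is delivered, which the later slides on discharge behaviour show is optimistic; the function name and the unit constant are illustrative.

```python
LB_PER_KG = 2.20462  # pounds per kilogram


def ideal_runtime_hours(mass_kg: float, wh_per_lb: float, load_watts: float) -> float:
    """Ideal runtime assuming all rated energy is delivered at a constant load."""
    energy_wh = mass_kg * LB_PER_KG * wh_per_lb
    return energy_wh / load_watts


# NiCd from the table: 23 Wh/lb, 0.5 kg pack, 20 W load.
print(f"NiCd: {ideal_runtime_hours(0.5, 23.0, 20.0):.2f} h")
```

The result is a little over one hour, which is in line with the "lasts 1 hr" figure once real discharge losses are allowed for.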
Comparison of Energy Sources
Energy Source             Power (Energy) Density                     Source of Estimates
Batteries (Zinc-Air)      1050-1560 mWh/cm^3 (1.4 V)                 Published data from manufacturers
Batteries (Lithium ion)   300 mWh/cm^3 (3-4 V)                       Published data from manufacturers
Solar (Outdoors)          15 mW/cm^2 - direct sun                    Published data and testing.
                          0.15 mW/cm^2 - cloudy day
Solar (Indoor)            0.006 mW/cm^2 - my desk                    Testing
                          0.57 mW/cm^2 - 12 in. under a 60W bulb
Vibrations                0.001-0.1 mW/cm^3                          Simulations and Testing
Acoustic Noise            3E-6 mW/cm^2 at 75 dB sound level          Direct Calculations from Acoustic Theory
                          9.6E-4 mW/cm^2 at 100 dB sound level
Passive Human Powered     1.8 mW (Shoe inserts >> 1 cm^2)            Published Study.
Thermal Conversion        0.0018 mW - 10 deg. C gradient             Published Study.
Nuclear Reaction          80 mW/cm^3, 1E6 mWh/cm^3                   Published Data.
Fuel Cells                300-500 mW/cm^3, ~4000 mWh/cm^3            Published Data.
Assume 1mW Average as definition of “Scavenged Energy”
Trends in Total Power Consumption
source : arpa-esto
DEC 21164
microprocessor
power dissipation
• Frightening: proportional to area & frequency
Power Metrics in Microprocessors
nJ/Instruction
– Mostly for processors with the same instruction sets
– Does not capture the effect of operand size (e.g. 8-bit
addition vs. 32-bit addition operations)
MIPS/Watt
mA – common among component data sheets
Remember:
$\text{Power (Watts)} = I \times V$
$\text{Energy (Joules)} = \text{Power} \times \text{time}$
$\text{Energy (Joules)} = \frac{1}{2} C V^2$
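The metrics on this slide can be related with a few lines of arithmetic. The sketch below is illustrative only: the 100 mA, 3.3 V and 40 MIPS operating point is a made-up example, not a figure from the lecture.

```python
def power_watts(current_a: float, voltage_v: float) -> float:
    """P = I * V."""
    return current_a * voltage_v


def energy_joules(power_w: float, seconds: float) -> float:
    """E = P * t."""
    return power_w * seconds


def capacitor_energy_joules(capacitance_f: float, voltage_v: float) -> float:
    """E = 1/2 * C * V^2."""
    return 0.5 * capacitance_f * voltage_v ** 2


def nj_per_instruction(power_w: float, mips: float) -> float:
    """Energy per instruction in nanojoules at a given MIPS rate."""
    return power_w / (mips * 1e6) * 1e9


def mips_per_watt(mips: float, power_w: float) -> float:
    return mips / power_w


# Hypothetical data-sheet entry: 100 mA at 3.3 V while running 40 MIPS.
p = power_watts(0.100, 3.3)
print(f"{p:.2f} W, {nj_per_instruction(p, 40):.2f} nJ/instruction, "
      f"{mips_per_watt(40, p):.0f} MIPS/W")
```

A data sheet that quotes only mA needs the supply voltage and the instruction rate before it can be compared on a nJ/instruction or MIPS/Watt basis.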
Modeling the Battery Behavior
• Theoretical capacity of battery is decided by the
amount of the active material in the cell
» batteries often modeled as buckets of constant energy
– e.g. halving the power by halving the clock frequency is assumed to
double the computation time while maintaining constant computation per
battery life
• In reality, delivered or nominal capacity depends on
how the battery is discharged
» discharge rate (load current)
» discharge profile and duty cycle
» operating voltage and power level drained
Battery Capacity
from [Powers95]
• Current in “C” rating: load current
normalized to battery’s capacity
» e.g. a discharge current of 1C for a capacity of
500 mA-hrs is 500 mA
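The C-rating convention is a one-line conversion. The sketch below reproduces the 1C example from the slide and adds the ideal (constant-capacity) runtime for comparison; the function names are illustrative.

```python
def discharge_current_ma(c_rate: float, capacity_mah: float) -> float:
    """Load current for a given C rate, e.g. 1C on 500 mAh is 500 mA."""
    return c_rate * capacity_mah


def ideal_discharge_hours(c_rate: float) -> float:
    """Runtime if capacity did not depend on the discharge rate."""
    return 1.0 / c_rate


print(discharge_current_ma(1.0, 500))    # slide example: 1C on 500 mAh
print(discharge_current_ma(0.05, 500))   # C/20
print(ideal_discharge_hours(0.05))       # hours at C/20 in the ideal model
```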
Battery Capacity vs. Discharge
Current
• Amount of energy delivered is decreased as the
current (rate at which power is drawn) is increased
» rated as ampere hours or watt hours when discharged at a
specific rate to a specific cut-off voltage
– primary cells rated at a current which is 1/100th of the capacity in
ampere hours (C/100)
– secondary cells are rated at C/20 or C/10
• At high currents, the diffusion process that moves
new active material from electrolytes to the electrode
cannot keep up
» concentration of active material at cathode drops to zero, and
cell voltage goes down below cut-off
» even though active material in cell is not exhausted!
Battery Energy Consumers
An Example Wireless Platform
ARM/THUMB 40MHz
Running uCos-ii
PALOS
RS-485 &
External Power
ADXL 202E
MEMS Accelerometer
MCU I/F
Host Computer, GPS, etc
UI: Pushbuttons
Medusa MK-2
Where does the Power Go?
• Peripherals: Disk, Display
• Processing: Programmable µPs & DSPs (apps, protocols etc.), ASICs, Memory
• Power Supply: Battery, DC-DC Converter
• Communication: Radio Modem, RF Transceiver
Power Consumption for a Computer
with Wireless NIC
• Display: 36%
• CPU/Memory: 21%
• Hard Drive: 18%
• Wireless LAN: 18%
• Other: 7%
Energy Consumption of
Wireless NICs (Wavelan)
                     Specs      Measured
2 Mbps (Bronze)
  Sleep Mode         9 mA       14 mA
  Idle Mode          -------    178 mA
  Receive Mode       280 mA     200 mA
  Transmit Mode      330 mA     280 mA
11 Mbps (Silver)
  Sleep Mode         10 mA      10 mA
  Idle Mode          -------    156 mA
  Receive Mode       180 mA     190 mA
  Transmit Mode      280 mA     284 mA
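The per-mode currents only become useful once they are weighted by how long the card spends in each mode. The sketch below assumes the second group of numbers in the table is the measured column, and the time fractions are invented purely for illustration.

```python
# Measured currents in mA, as read from the table (an interpretation of the slide).
MEASURED_MA = {
    "2 Mbps (Bronze)": {"sleep": 14, "idle": 178, "receive": 200, "transmit": 280},
    "11 Mbps (Silver)": {"sleep": 10, "idle": 156, "receive": 190, "transmit": 284},
}


def average_current_ma(currents: dict, time_fractions: dict) -> float:
    """Time-weighted average current; the fractions must sum to 1."""
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(currents[mode] * frac for mode, frac in time_fractions.items())


# Hypothetical usage profile: mostly asleep, occasionally active.
profile = {"sleep": 0.90, "idle": 0.05, "receive": 0.04, "transmit": 0.01}
always_idle = {"idle": 1.0}

silver = MEASURED_MA["11 Mbps (Silver)"]
print(average_current_ma(silver, profile))      # duty-cycled
print(average_current_ma(silver, always_idle))  # left idle all the time
```

With this made-up profile the average draw is a small fraction of the idle current, which is why idle time, not transmit time, tends to dominate when the card is never put to sleep.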
Example: Power Consumption for
Compaq’s iPAQ
206MHz StrongArm SA-1110
processor
320x240 resolution color TFT LCD
Touch screen
32MB SDRAM / 16MB Flash memory
USB/RS-232/IrDA connection
Speaker/Microphone
Lithium Polymer battery
PCMCIA card expansion pack & CF
card expansion pack
* Note
CPU is in the idle state most of the time
Audio, IrDA, RS232 power is measured when
each part is idling
Etc includes CPU, flash memory, touch
screen and all other devices
Frontlight brightness was 16
Microprocessor Power Consumption
CMOS Circuits
(Used in most microprocessors)
Static Component
Bias and leakage currents
O(1mW)
Dynamic Component
Digital circuit switching inside
the processor
$P = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
Static: $I_{standby} V_{dd} + I_{leakage} V_{dd}$
Dynamic: $I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
Power Consumption in Digital CMOS
Circuits
$\text{Power} = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
Istandby - current constantly drawn from the power supply
Ileakage - determined by fabrication technology
Isc - short circuit current due to the DC path between the
supply rails during output transitions
Cl - load capacitance at the output node
fclk - clock frequency
Vdd - power supply voltage
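The four-term CMOS power expression can be evaluated directly. In the sketch below every numeric value is a hypothetical placeholder chosen only to show the relative size of the terms; none of them comes from a real processor data sheet.

```python
def switching_power_w(c_load_f: float, v_dd: float, f_clk_hz: float) -> float:
    """Capacitive switching term: C_l * Vdd^2 * f_clk."""
    return c_load_f * v_dd ** 2 * f_clk_hz


def cmos_power_w(i_standby_a: float, i_leakage_a: float, i_sc_a: float,
                 c_load_f: float, v_dd: float, f_clk_hz: float) -> float:
    """Total power: standby + leakage + short-circuit + switching."""
    current_terms = (i_standby_a + i_leakage_a + i_sc_a) * v_dd
    return current_terms + switching_power_w(c_load_f, v_dd, f_clk_hz)


# Hypothetical values: 0.1 mA standby, 0.1 mA leakage, 1 mA short-circuit,
# 1 nF total switched capacitance, 3.3 V supply, 40 MHz clock.
total = cmos_power_w(1e-4, 1e-4, 1e-3, 1e-9, 3.3, 40e6)
print(f"total {total * 1e3:.1f} mW")

# Halving Vdd cuts the switching term to one quarter.
ratio = switching_power_w(1e-9, 1.65, 40e6) / switching_power_w(1e-9, 3.3, 40e6)
print(ratio)
```

The quadratic dependence of the switching term on Vdd is what makes voltage scaling more attractive than frequency scaling alone.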
DVS on Low Power Processor
Dynamic Power Component
$P = \sum_{k=1}^{M} C_k \cdot f \cdot V_{dd}^2$
M - number of gates
C_k - load capacitance of gate k
Maximum gain when voltage is lowered BUT
lower voltage increases circuit delay
Propagation delay
$\tau \propto \frac{V_{DD}}{\beta (V_{DD} - V_T)^2}$
β - transistor gain factor
V_T - CMOS transistor threshold voltage
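The trade-off between voltage, delay and power can be sketched numerically. The code below assumes the delay is proportional to Vdd / (Vdd - VT)^2, so the gain factor cancels in ratios, and that the clock is always set to the fastest rate the delay allows. The voltages (1.5 V reference, 1.0 V scaled, 0.5 V threshold) are hypothetical.

```python
def relative_delay(v_dd: float, v_t: float) -> float:
    """Propagation delay up to a constant factor: Vdd / (Vdd - VT)^2."""
    return v_dd / (v_dd - v_t) ** 2


def max_frequency_scale(v_new: float, v_ref: float, v_t: float) -> float:
    """Highest usable clock at v_new relative to the clock at v_ref."""
    return relative_delay(v_ref, v_t) / relative_delay(v_new, v_t)


def dynamic_power_scale(v_new: float, v_ref: float, v_t: float) -> float:
    """Dynamic power (C * f * V^2) at v_new relative to v_ref, at max clock."""
    return max_frequency_scale(v_new, v_ref, v_t) * (v_new / v_ref) ** 2


def energy_per_cycle_scale(v_new: float, v_ref: float) -> float:
    """Energy per clock cycle scales with V^2 only."""
    return (v_new / v_ref) ** 2


# Hypothetical operating points.
print(max_frequency_scale(1.0, 1.5, 0.5))     # clock must drop
print(dynamic_power_scale(1.0, 1.5, 0.5))     # power drops faster
print(energy_per_cycle_scale(1.0, 1.5))       # energy per operation
```

In this example the clock falls to 0.375 of its original value while the energy per cycle falls to about 0.44, so the same work takes longer but costs less energy.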
Voltage Scaling on LART
• Dynamically lower the processor voltage and
frequency to reduce power consumption
• LART wearable board
– StrongARM 1100 processor, 190 MHz
– Various I/O capabilities
– 32 MB volatile memory
– 4 MB non-volatile memory
– Programmable voltage regulator
Processor Envelope
LART Power Measurement
Based on dhrystone benchmark
• Note the measurement setup at
different levels on the board
• Always provide hooks for
measurement, testing and debugging
during your design. Both for
software and hardware!!!
Total Power Consumption on the LART Platform
Memory Subsystem Power
Consumption – Read Operation
Optimal memory access waveforms
Power consumption
Memory Bandwidth
Energy breakdown for read
(based on 1MB read)
Regulator loss factor
Power Breakdown for H.263 Decoder
Reducing Power Consumption
is a multilevel task!
• Physical layer
– Technology – reduce the surface of CMOS circuits
• Architecture/IC level
– Several optimizations in the design (e.g. parallelism and
pipelining)
– Provide hooks for software-driven power management (e.g.
different power modes and clock speeds)
• OS Level
– Smart schedulers, interval schedulers, DVS
• Application Level
– Power-aware applications that warn the OS and the hardware
about the features needed during application lifetime
– Sleep modes and DVS driven by applications
• Network Level
– Networked devices may be able to apply low duty cycles, in
which some of the devices are asleep and others are awake
Conclusions
• Interval-based schedulers are not very efficient
– Interval scheduler: reduces the voltage after a prespecified idle period is detected
• Better leverage of DVS when the processor
is aware of the application requirements
– Illustrated with the H.263 decoder
• Monitor different power consumption
profiles across different sections of the
platform and use them to make clever
decisions about power-management
• What is missing:
– Comments on power regulator efficiencies…
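To make the first conclusion concrete, here is a toy model of an interval scheduler as described above: it drops to a low voltage setting only after a fixed number of consecutive idle time slots. This is an illustrative sketch, not the scheduler from the assigned reading; the slot model, the two-level setting and the threshold are all assumptions.

```python
def interval_scheduler(busy_slots, idle_threshold):
    """Return the voltage level ('high' or 'low') chosen for each time slot.

    The level drops to 'low' only once more than `idle_threshold` consecutive
    idle slots have been seen, and returns to 'high' as soon as work arrives.
    """
    levels = []
    idle_run = 0
    for busy in busy_slots:
        if busy:
            idle_run = 0
            levels.append("high")
        else:
            idle_run += 1
            levels.append("low" if idle_run > idle_threshold else "high")
    return levels


# One busy slot, four idle slots, one busy slot; threshold of two idle slots.
trace = [True, False, False, False, False, True]
print(interval_scheduler(trace, idle_threshold=2))
```

In this trace half of the idle slots are still spent at the high setting, because the scheduler has to wait out the threshold before reacting. An application that announces its requirements in advance avoids that detection delay.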
How can power consumption be reduced
at the circuit design level inside a
processor?
Example: Reference Datapath
Critical path delay: Tadder + Tcomparator = 25 ns
• Frequency: fref = 40 MHz
• Total switched capacitance = Cref
• Vdd = Vref = 5V
• Power for reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
from “Digital Integrated Circuits” by Rabaey
Parallel Datapath
The clock rate can be reduced by x2 with the same
throughput: fpar = fref/2 = 20 MHz
• Total switched capacitance = Cpar = 2.15Cref
• Vpar = Vref/1.7
• $P_{par} = (2.15 C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) = 0.36 P_{ref}$
from “Digital Integrated Circuits” by Rabaey
Pipelined Datapath
• fpipe = fref
• Cpipe = 1.1Cref
• Vpipe = Vref/1.7
• Voltage can be dropped while maintaining the original
throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref})(V_{ref}/1.7)^2 f_{ref} = 0.37 P_{ref}$
from “Digital Integrated Circuits” by Rabaey
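The parallel and pipelined power figures both follow from scaling C, V and f in P = C V^2 f. The sketch below recomputes them. Note that the quoted 0.36 and 0.37 are reproduced when the scaled supply is taken as 2.9 V (the value in the summary table), whereas using Vref/1.7 exactly gives slightly higher ratios. The function name is illustrative.

```python
def relative_power(cap_ratio: float, voltage_ratio: float, freq_ratio: float) -> float:
    """P / Pref for P = C * V^2 * f, given each quantity relative to the reference."""
    return cap_ratio * voltage_ratio ** 2 * freq_ratio


V_REF = 5.0      # reference supply from the slide
V_SCALED = 2.9   # rounded scaled supply, as listed in the summary table

# Parallel datapath: 2.15x capacitance, half the clock rate.
print(relative_power(2.15, V_SCALED / V_REF, 0.5))   # about 0.36
print(relative_power(2.15, 1 / 1.7, 0.5))            # about 0.37 with the exact 1/1.7

# Pipelined datapath: 1.1x capacitance, same clock rate.
print(relative_power(1.1, V_SCALED / V_REF, 1.0))    # about 0.37
print(relative_power(1.1, 1 / 1.7, 1.0))             # about 0.38 with the exact 1/1.7
```

Both architectures buy the voltage reduction differently: parallelism pays in switched capacitance and area, pipelining pays a small capacitance overhead for the extra registers.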
Datapath Architecture-Power
Trade-off Summary
Datapath Architecture   Voltage   Area   Power
Original                5V        1      1
Pipelined               2.9V      1.3    0.37
Parallel                2.9V      3.4    0.34
Pipeline-Parallel       2.0V      3.7    0.18
Power Consumption on Embedded
Processors
• Different core I/O from Peripheral I/O –
numbers here
– Cores scaling down to 0.8V. 1.8V devices are becoming
common
– General Purpose I/O interfaces still at 3.0 – 3.3V
» Makes power supply harder, additional regulator
inefficiency
• Sleep modes and the associated cost of sleep and
recovery (e.g. SA-1100 modes)
– Need time and energy to transition between states
Example: SA-1100 CPU
• RUN (400 mW)
• IDLE (50 mW)
– CPU stopped when not in use
– Monitoring for interrupts
• SLEEP (0.16 mW)
– Shutdown on-chip activity
[State diagram: RUN, IDLE and SLEEP with transition times of 10 µs and 10 µs (between RUN and IDLE), 90 µs and 90 µs (into SLEEP), and 160 ms (wake-up from SLEEP)]
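Because leaving SLEEP costs time and energy, sleeping only pays off when the idle period is long enough. The sketch below estimates that break-even point from the slide's power figures. It assumes the 160 ms figure is the wake-up time from SLEEP, that the CPU draws RUN power during wake-up, and that the microsecond-scale transitions can be ignored; these are simplifying assumptions, not data-sheet facts.

```python
P_RUN_MW = 400.0
P_IDLE_MW = 50.0
P_SLEEP_MW = 0.16
T_WAKE_S = 0.160  # assumed: SLEEP -> RUN wake-up time


def break_even_idle_time_s(p_idle_mw: float, p_sleep_mw: float,
                           p_wake_mw: float, t_wake_s: float) -> float:
    """Idle duration T at which sleeping and idling cost the same energy.

    Idling:   E = p_idle * T
    Sleeping: E = p_sleep * (T - t_wake) + p_wake * t_wake
    """
    return (p_wake_mw - p_sleep_mw) * t_wake_s / (p_idle_mw - p_sleep_mw)


t_be = break_even_idle_time_s(P_IDLE_MW, P_SLEEP_MW, P_RUN_MW, T_WAKE_S)
print(f"sleeping pays off for idle periods longer than about {t_be:.2f} s")
```

Under these assumptions the break-even idle period is a bit under 1.3 s; shorter gaps are better spent in IDLE.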
Low-power Software
• Wireless industry → constantly evolving
standards
• Systems have to be flexible and adaptable
– Significant portion of system functionality is
implemented as software running on a programmable
processor
• Software drives the underlying hardware
– Hence, it can significantly impact system power
consumption
• Significant energy savings can be obtained
by clever software design.
Low-power Software Strategies
• Code running on CPU
– Code optimizations for low power
• Code accessing memory objects
– SW optimizations for memory
• Data flowing on the buses
– I/O coding for low power
• Compiler controlled power management
[Diagram: CPU, Cache, Memory]
Code Optimizations for Low Power
• High-level operations (e.g. C statement) can be compiled into
different instruction sequences
» different instructions & ordering have different power
• Instruction Selection
– Select a minimum-power instruction mix for executing a piece of high
level code
• Instruction Packing & Dual Memory Loads
– Two on-chip memory banks
» Dual load vs. two single loads
» Almost 50% energy savings
Code Optimizations for Low Power
(contd.)
• Reorder instructions to reduce switching effect at functional units and
I/O buses
– E.g. Cold scheduling minimizes instruction bus transitions
• Operand swapping
– Swap the operands at the input of multiplier
– Result is unaltered, but power changes significantly!
• Other standard compiler optimizations
– Intermediate level: Software pipelining, dead code elimination, redundancy
elimination
– Low level: Register allocation and other machine specific optimizations
• Use processor-specific instruction styles
– e.g. on ARM the default int type is ~ 20% more efficient than char or short as
the latter result in sign or zero extension
– e.g. on ARM the conditional instructions can be used instead of branches
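The quantity that reordering techniques such as cold scheduling try to reduce is the number of bit flips between consecutive words on a bus. The sketch below only counts those transitions for two orderings of the same words; the 4-bit values are hypothetical, and a real scheduler must also respect data dependencies, which this example ignores.

```python
def bit_transitions(a: int, b: int) -> int:
    """Number of bus lines that toggle when word `a` is followed by word `b`."""
    return bin(a ^ b).count("1")


def total_bus_transitions(words) -> int:
    """Total toggles over a sequence of words driven onto the bus."""
    return sum(bit_transitions(x, y) for x, y in zip(words, words[1:]))


# Hypothetical 4-bit instruction encodings in two different orders.
original_order = [0b0000, 0b1111, 0b0001, 0b1110]
reordered = [0b0000, 0b0001, 0b1111, 0b1110]

print(total_bus_transitions(original_order))
print(total_bus_transitions(reordered))
```

Here the same four words cause 11 toggles in one order and 5 in the other, which is the kind of saving an instruction scheduler can look for when reordering is legal.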
Minimizing Memory Access Costs
• Reduce memory access, make better use of registers
– Register access consumes much less power than memory access
• Straightforward way: minimize number of read-write
operations, e.g.
• Cache optimizations
– Reorder memory accesses to improve cache hit rates
• Can use existing techniques for high-performance code
generation
Minimizing Memory Access Costs
(contd.)
• Loop optimizations such as loop unrolling, loop fusion
also reduce memory power consumption
• More effective: explicitly target minimization of
switching activity on I/O busses and exploiting
memory hierarchy
– Data allocation to minimize I/O bus transitions
» e.g. mapping large arrays with known access patterns to main
memory to minimize address bus transitions
» works in conjunction with coding of address busses
– Exploiting memory hierarchy
» e.g. organizing video and DSP data to maximize use of the higher levels
(lower power) of memory hierarchy
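Loop fusion is one of the simpler transformations to illustrate. The sketch below counts array reads for two separate loops versus one fused loop computing the same results. The `CountingArray` wrapper is a hypothetical stand-in for memory traffic, and a compiled language would be the realistic setting; the point is only the access count.

```python
class CountingArray:
    """List wrapper that counts element reads (a stand-in for memory accesses)."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        self.reads += 1
        return self._values[index]


def sums_separate(array):
    """Two passes over the data: one for the sum, one for the sum of squares."""
    total = 0
    for i in range(len(array)):
        total += array[i]
    total_sq = 0
    for i in range(len(array)):
        x = array[i]
        total_sq += x * x
    return total, total_sq


def sums_fused(array):
    """One fused pass: each element is read once and reused from a local."""
    total = 0
    total_sq = 0
    for i in range(len(array)):
        x = array[i]
        total += x
        total_sq += x * x
    return total, total_sq


a = CountingArray([1, 2, 3, 4])
b = CountingArray([1, 2, 3, 4])
print(sums_separate(a), a.reads)
print(sums_fused(b), b.reads)
```

The fused version halves the number of reads because each element is held in a local (register-like) variable and used twice.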
Computation & Communication
• Energy/bit >> Energy/op, large even for short ranges!
                Mote-class Node    WINS-class Node
Transmit        720 nJ/bit         6600 nJ/bit
Receive         110 nJ/bit         3300 nJ/bit
Processor       4 nJ/op            1.6 nJ/op
                ~ 200 ops/bit      ~ 6000 ops/bit
[Figures: energy breakdown for acoustic and for image data (Encode, Decode, Transmit, Receive)]
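The ops/bit figures can be reproduced from the per-bit and per-op energies. The sketch below assumes that the Mote-class processor figure is 4 nJ/op and the WINS-class figure is 1.6 nJ/op, and that "ops/bit" means the energy to transmit plus receive one bit divided by the energy per operation. That reading matches the ~200 and ~6000 figures but is an inference from the numbers.

```python
def ops_per_bit(tx_nj_per_bit: float, rx_nj_per_bit: float, nj_per_op: float) -> float:
    """Processor operations that cost as much as sending and receiving one bit."""
    return (tx_nj_per_bit + rx_nj_per_bit) / nj_per_op


mote = ops_per_bit(720.0, 110.0, 4.0)
wins = ops_per_bit(6600.0, 3300.0, 1.6)
print(f"Mote-class: {mote:.1f} ops/bit")
print(f"WINS-class: {wins:.1f} ops/bit")
```

If spending a few hundred operations on compression or local processing removes even one bit from the radio link, the node comes out ahead on energy.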
Next time
• ARM/THUMB Programming & Peripherals
• Embedded Operating Systems
• Don’t forget tomorrow’s workshop!
– 1:00pm AKW 000