EENG 449bG/CPSC 439bG Computer Systems
Lecture 6: Overview of Power Issues in Computing Systems
January 29, 2004
Prof. Andreas Savvides
Spring 2004
http://www.eng.yale.edu/courses/eeng449bG
1/29/04 EENG449b/Savvides Lec 6.1

Announcements
• Reading for this lecture
  – J. Pouwelse, K. Langendoen, H. Sips, “Dynamic Voltage Scaling on a Low-Power Microprocessor”, posted on the class website
• Embedded processor programming reading
  – “Get By without an RTOS” by Melkonian, link posted on the class website
• Project descriptions due tomorrow
  – Email: andreas.savvides@yale.edu
• Microcontroller workshop tomorrow in AKW 000. I recommend you attend at least one of these sessions, according to your project needs:
  – 1:00-2:00pm ARM/THUMB programming and tools
  – 2:00-2:15pm Break
  – 2:15-3:15pm Mote programming & tools

Why worry about power?
[Figure: “Intel vs. Duracell”: improvement relative to year 0 (1x to 16x) over years 0 to 6 for processor performance (MIPS), hard disk capacity, memory capacity, and battery energy stored]
• No Moore’s Law in batteries: 2-3%/year growth

Current Battery Technology is Inadequate
Battery | Rechargeable? | Wh/lb | Wh/litre
Alkaline MnO2 | No | 65.8 | 347
Silver Oxide | No | 60 | 500
Li/MnO2 | No | 105 | 550
Zinc Air | No | 140 | 1150
NiCd | Yes | 23 | 125
Li-Polymer | Yes | 65-90 | 300-415
• Example: 20-watt battery
  » NiCd weighs 0.5 kg, lasts 1 hr, and costs $20
  » A comparable Li-Ion battery lasts 3 hrs, but costs more than 4x as much

Comparison of Energy Sources
Source | Power (Energy) Density | Source of Estimates
Batteries (Zinc-Air) | 1050-1560 mWh/cm³ (1.4 V) | Published data from manufacturers
Batteries (Lithium ion) | 300 mWh/cm³ (3-4 V) | Published data from manufacturers
Solar (Outdoors) | 15 mW/cm² (direct sun); 0.15 mW/cm² (cloudy day) | Published data and testing
Solar (Indoor) | 0.006 mW/cm² (my desk); 0.57 mW/cm² (12 in. under a 60 W bulb) | Testing
Vibrations | 0.001-0.1 mW/cm³ | Simulations and testing
Acoustic Noise | 3E-6 mW/cm² at 75 dB sound level; 9.6E-4 mW/cm² at 100 dB sound level | Direct calculations from acoustic theory
Passive Human Powered | 1.8 mW (shoe inserts >> 1 cm²) | Published study
Thermal Conversion | 0.0018 mW (10 °C gradient) | Published study
Nuclear Reaction | 80 mW/cm³; 1E6 mWh/cm³ | Published data
Fuel Cells | 300-500 mW/cm³; ~4000 mWh/cm³ | Published data
• Assume 1 mW average as the definition of “scavenged energy”

Trends in Total Power Consumption
[Figure: DEC 21164 microprocessor power dissipation; source: arpa-esto]
• Frightening: power is proportional to area and frequency

Power Metrics in Microprocessors
• nJ/instruction
  – Mostly for comparing processors with the same instruction set
  – Does not capture the effect of operand size (e.g. 8-bit addition vs. 32-bit addition)
• MIPS/Watt
• mA: common in component data sheets
Remember:
  Power (watts): $P = IV$
  Energy (joules): $E = P \cdot t$
  Energy (joules): $E = \frac{1}{2} C V^2$

Modeling the Battery Behavior
• The theoretical capacity of a battery is determined by the amount of active material in the cell
  » Batteries are often modeled as buckets of constant energy
    – e.g. halving the power by halving the clock frequency is assumed to double the computation time while keeping the total computation per battery life constant
• In reality, the delivered (nominal) capacity depends on how the battery is discharged:
  » discharge rate (load current)
  » discharge profile and duty cycle
  » operating voltage and power level drained

Battery Capacity from [Powers95]
• Current in “C” rating: load current normalized to the battery’s capacity
  » e.g. a discharge current of 1C for a capacity of 500 mAh is 500 mA

Battery Capacity vs. Discharge Current
• The amount of energy delivered decreases as the current (the rate at which power is drawn) increases
  » Capacity is rated in ampere-hours or watt-hours when discharged at a specific rate to a specific cut-off voltage
    – Primary cells are rated at a current that is 1/100th of the capacity in ampere-hours (C/100)
    – Secondary cells are rated at C/20 or C/10
• At high currents, the diffusion process that moves new active material from the electrolyte to the electrode cannot keep up
  » The concentration of active material at the cathode drops to zero, and the cell voltage falls below the cut-off
  » ...even though the active material in the cell is not exhausted!

Battery Energy Consumers

An Example Wireless Platform
[Figure: Medusa MK-2 node: ARM/THUMB MCU at 40 MHz running uC/OS-II and PALOS; ADXL202E MEMS accelerometer; RS-485 and external power; MCU interface to a host computer, GPS, etc.; UI pushbuttons]

Where does the Power Go?
[Figure: system blocks: processing (programmable µPs & DSPs, ASICs for apps, protocols, etc., memory); peripherals (disk, display); communication (radio modem, RF transceiver); power supply (battery, DC-DC converter)]

Power Consumption for a Computer with Wireless NIC
• Display: 36%
• CPU/Memory: 21%
• Hard Drive: 18%
• Wireless LAN: 18%
• Other: 7%

Energy Consumption of Wireless NICs (WaveLAN)
Mode | 2 Mbps (Bronze) Spec | 2 Mbps (Bronze) Measured | 11 Mbps (Silver) Spec | 11 Mbps (Silver) Measured
Sleep | 9 mA | 14 mA | 10 mA | 10 mA
Idle | --- | 178 mA | --- | 156 mA
Receive | 280 mA | 200 mA | 180 mA | 190 mA
Transmit | 330 mA | 280 mA | 280 mA | 284 mA

Example: Power Consumption for Compaq’s iPAQ
• 206 MHz StrongARM SA-1110 processor
• 320x240 resolution color TFT LCD
• Touch screen
• 32 MB SDRAM / 16 MB Flash memory
• USB/RS-232/IrDA connection
• Speaker/Microphone
• Lithium Polymer battery
• PCMCIA card expansion pack & CF card expansion pack
* Note: the CPU is in the idle state most of the time
* Audio, IrDA, and RS-232 power is measured while each part is idling
* “Etc.” includes the CPU, flash memory, touch screen, and all other devices
* Frontlight brightness was 16

Microprocessor Power Consumption
CMOS circuits (used in most microprocessors):
• Static component: bias and leakage currents, O(1 mW)
• Dynamic component: digital circuit switching inside the processor
$P = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
(static: the first two terms; dynamic: the last two terms)

Power Consumption in Digital CMOS Circuits
$P = I_{standby} V_{dd} + I_{leakage} V_{dd} + I_{sc} V_{dd} + C_l V_{dd}^2 f_{clk}$
• $I_{standby}$: current constantly drawn from the power supply
• $I_{leakage}$: determined by the fabrication technology
• $I_{sc}$: short-circuit current due to the DC path between the supply rails during output transitions
• $C_l$: load capacitance at the output node
• $f_{clk}$: clock frequency
• $V_{dd}$: power supply voltage

DVS on Low Power Processor
Dynamic power component:
$P = \sum_{k=1}^{M} C_k f V_{dd}^2$
where M is the number of gates and $C_k$ is the load capacitance of gate k.
• The maximum gain comes from lowering the voltage, BUT a lower voltage increases circuit delay:
$\tau \propto \frac{V_{dd}}{\beta (V_{dd} - V_T)^2}$
where τ is the propagation delay, β is the transistor gain factor, and $V_T$ is the CMOS transistor threshold voltage.

Voltage Scaling on LART
• Dynamically lower the processor voltage and frequency to reduce power consumption
• LART wearable board
  – StrongARM 1100 processor, 190 MHz
  – Various I/O capabilities
  – 32 MB volatile memory
  – 4 MB non-volatile memory
  – Programmable voltage regulator

Processor Envelope

LART Power Measurement
• Based on the Dhrystone benchmark
• Note the measurement setup at different levels on the board
• Always provide hooks for measurement, testing, and debugging in your design, both for software and hardware!
[Figure: Total Power Consumption on the LART Platform]

Memory Subsystem Power Consumption: Read Operation
[Figure: optimal memory access waveforms; power consumption vs. memory bandwidth]

Energy Breakdown for Read (based on a 1 MB read)
[Figure: energy breakdown, including the regulator loss factor]

Power Breakdown for H.263 Decoder

Reducing Power Consumption is a multilevel task!
• Physical layer
  – Technology: reduce the area of CMOS circuits
• Architecture/IC level
  – Several design optimizations (e.g. parallelism and pipelining)
  – Provide hooks for software-driven power management (e.g. different power modes and clock speeds)
• OS level
  – Smart schedulers, interval schedulers, DVS
• Application level
  – Power-aware applications that inform the OS and the hardware about the features needed during the application lifetime
  – Sleep modes and DVS driven by applications
• Network level
  – Networked devices may be able to apply low duty cycles, in which some devices are asleep while others are awake

Conclusions
• Interval-based schedulers are not very efficient
  – Interval scheduler: reduce the voltage after a prespecified idle period is detected
• DVS is better leveraged when the processor is aware of the application requirements
  – Illustrated with the H.263 decoder
• Monitor the power consumption profiles of different sections of the platform and use them to make informed power-management decisions
• What is missing:
  – Comments on power regulator efficiencies…

How can power consumption be reduced at the circuit design level inside a processor?
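The datapath examples that follow all apply the dynamic-power term $C_l V_{dd}^2 f_{clk}$ from the CMOS power equation. As a minimal sketch, the helpers below compute a variant's power relative to the 5 V, 40 MHz reference datapath; `dynamic_power` and `relative_power` are hypothetical names, and the leakage and short-circuit terms are ignored, as on the datapath slides. The sketch assumes the 2.9 V supply listed in the trade-off summary table rather than exactly $V_{ref}/1.7 \approx 2.94$ V, which is what reproduces the 0.36 and 0.37 figures.

```python
def dynamic_power(c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Dynamic (switching) power of CMOS logic: P = C * Vdd**2 * f."""
    return c_farads * vdd_volts ** 2 * f_hz


def relative_power(c_ratio: float, vdd_volts: float, f_ratio: float,
                   v_ref: float = 5.0, f_ref: float = 40e6) -> float:
    """Power of a datapath variant divided by the reference datapath's power.

    c_ratio   -- switched capacitance relative to C_ref
    vdd_volts -- supply voltage of the variant
    f_ratio   -- clock frequency relative to f_ref
    """
    c_ref = 1.0  # placeholder value: C_ref cancels out of the ratio
    p_ref = dynamic_power(c_ref, v_ref, f_ref)
    p_var = dynamic_power(c_ratio * c_ref, vdd_volts, f_ratio * f_ref)
    return p_var / p_ref


# Parallel: duplicated datapath at half the clock, 2.15x switched capacitance, 2.9 V.
parallel = relative_power(c_ratio=2.15, vdd_volts=2.9, f_ratio=0.5)
# Pipelined: same clock, 1.1x switched capacitance, 2.9 V.
pipelined = relative_power(c_ratio=1.1, vdd_volts=2.9, f_ratio=1.0)
print(f"parallel: {parallel:.2f} P_ref, pipelined: {pipelined:.2f} P_ref")
```

Plugging in exactly $V_{ref}/1.7$ instead, `relative_power(2.15, 5.0 / 1.7, 0.5)` and `relative_power(1.1, 5.0 / 1.7, 1.0)` give about 0.37 and 0.38, so the slide values are best read as rounded.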
Example: Reference Datapath
• Critical path delay: $T_{adder} + T_{comparator} = 25$ ns
• Frequency: $f_{ref} = 40$ MHz
• Total switched capacitance: $C_{ref}$
• $V_{dd} = V_{ref} = 5$ V
• Power for the reference datapath: $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$
(from “Digital Integrated Circuits” by Rabaey)

Parallel Datapath
• The clock rate can be reduced by 2x with the same throughput: $f_{par} = f_{ref}/2 = 20$ MHz
• Total switched capacitance: $C_{par} = 2.15 C_{ref}$
• $V_{par} = V_{ref}/1.7$
• $P_{par} = (2.15 C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36 P_{ref}$
(from “Digital Integrated Circuits” by Rabaey)

Pipelined Datapath
• $f_{pipe} = f_{ref}$
• $C_{pipe} = 1.1 C_{ref}$
• $V_{pipe} = V_{ref}/1.7$
• The voltage can be dropped while maintaining the original throughput
• $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.37 P_{ref}$
(from “Digital Integrated Circuits” by Rabaey)

Datapath Architecture-Power Trade-off Summary
Datapath Architecture | Voltage | Area | Power
Original | 5 V | 1 | 1
Pipelined | 2.9 V | 1.3 | 0.37
Parallel | 2.9 V | 3.4 | 0.34
Pipeline-Parallel | 2.0 V | 3.7 | 0.18

Power Consumption on Embedded Processors
• Core supply voltage differs from the peripheral I/O voltage
  – Cores are scaling down to 0.8 V; 1.8 V devices are becoming common
  – General-purpose I/O interfaces are still at 3.0-3.3 V
    » Makes the power supply harder to design and adds regulator inefficiency
• Sleep modes have an associated cost of entering sleep and recovering (e.g. the SA-1100 modes)
  – Transitions between states take time and energy

Example: SA-1100
• RUN: CPU at 400 mW
• IDLE: 50 mW
  – CPU stopped when not in use
  – Monitoring for interrupts
• SLEEP: 0.16 mW
  – Shutdown of on-chip activity
[Figure: power state machine; transitions take 10 µs between RUN and IDLE, 90 µs into SLEEP, and 160 ms from SLEEP back to RUN]

Low-power Software
• Wireless industry: constantly evolving standards
• Systems have to be flexible and adaptable
  – A significant portion of system functionality is implemented as software running on a programmable processor
• Software drives the underlying hardware
  – Hence, it can significantly impact system power consumption
• Significant energy savings can be obtained by clever software design

Low-power Software Strategies
• Code running on the CPU
  – Code optimizations for a low-power CPU
• Code accessing memory objects
  – SW optimizations for memory
• Data flowing on the buses
  – I/O coding for low power
• Compiler-controlled power management
[Figure: CPU, cache, and memory connected by buses]

Code Optimizations for Low Power
• High-level operations (e.g. a C statement) can be compiled into different instruction sequences
  » Different instructions and orderings consume different power
• Instruction selection
  – Select a minimum-power instruction mix for executing a piece of high-level code
• Instruction packing & dual memory loads
  – Two on-chip memory banks
    » Dual load vs. two single loads
    » Almost 50% energy savings

Code Optimizations for Low Power (contd.)
• Reorder instructions to reduce switching activity at functional units and I/O buses
  – e.g. cold scheduling minimizes instruction bus transitions
• Operand swapping
  – Swap the operands at the input of a multiplier
  – The result is unaltered, but power changes significantly!
• Other standard compiler optimizations
  – Intermediate level: software pipelining, dead code elimination, redundancy elimination
  – Low level: register allocation and other machine-specific optimizations
• Use processor-specific instruction styles
  – e.g. on ARM the default int type is ~20% more efficient than char or short, as the latter result in sign or zero extension
  – e.g. on ARM, conditional instructions can be used instead of branches

Minimizing Memory Access Costs
• Reduce memory accesses; make better use of registers
  – Register access consumes much less power than memory access
• Straightforward way: minimize the number of read-write operations
• Cache optimizations
  – Reorder memory accesses to improve cache hit rates
• Existing techniques for high-performance code generation can be reused

Minimizing Memory Access Costs (contd.)
• Loop optimizations such as loop unrolling and loop fusion also reduce memory power consumption
• More effective: explicitly target minimizing switching activity on I/O buses and exploiting the memory hierarchy
  – Data allocation to minimize I/O bus transitions
    » e.g. mapping large arrays with known access patterns to main memory to minimize address bus transitions
    » Works in conjunction with coding of the address buses
  – Exploiting the memory hierarchy
    » e.g. organizing video and DSP data to make the most of the higher (lower-power) levels of the memory hierarchy

Computation & Communication
• Energy/bit >> Energy/op, even for short ranges!
• Mote-class node: transmit 720 nJ/bit, receive 110 nJ/bit, processor 4 nJ/op (~200 ops/bit)
• WINS-class node: transmit 6600 nJ/bit, receive 3300 nJ/bit, processor 1.6 nJ/op (~6000 ops/bit)
[Figure: energy breakdown (encode, decode, transmit, receive) for acoustic and image data]

Next time
• ARM/THUMB Programming & Peripherals
• Embedded Operating Systems
• Don’t forget tomorrow’s workshop!
  – 1:00pm AKW 000
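The ~200 and ~6000 ops/bit figures on the Computation & Communication slide are consistent with dividing the radio energy per bit (transmit plus receive) by the processor energy per operation. The sketch below makes that ratio explicit and uses it for a simple "compress before sending?" check. `NodeEnergy` and its methods are hypothetical illustrations, not from the lecture, and the sketch assumes each bit costs its transmit energy at the sender plus its receive energy at the receiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeEnergy:
    """Per-bit radio and per-operation processor energy for a node class."""
    name: str
    tx_nj_per_bit: float   # energy to transmit one bit
    rx_nj_per_bit: float   # energy to receive one bit
    cpu_nj_per_op: float   # energy per processor operation

    def radio_nj_per_bit(self) -> float:
        # Assumption: moving a bit costs its transmit plus its receive energy.
        return self.tx_nj_per_bit + self.rx_nj_per_bit

    def ops_per_bit(self) -> float:
        """Processor operations that cost as much energy as moving one bit."""
        return self.radio_nj_per_bit() / self.cpu_nj_per_op

    def compression_saves_energy(self, bits_saved: float, ops_spent: float) -> bool:
        """True if the radio energy saved exceeds the processing energy spent."""
        return bits_saved * self.radio_nj_per_bit() > ops_spent * self.cpu_nj_per_op


# Figures from the Computation & Communication slide.
MOTE = NodeEnergy("Mote-class", tx_nj_per_bit=720.0, rx_nj_per_bit=110.0, cpu_nj_per_op=4.0)
WINS = NodeEnergy("WINS-class", tx_nj_per_bit=6600.0, rx_nj_per_bit=3300.0, cpu_nj_per_op=1.6)

for node in (MOTE, WINS):
    print(f"{node.name}: {node.ops_per_bit():.0f} ops/bit break-even")
```

For example, `MOTE.compression_saves_energy(1000, 300_000)` is False: 1.2 mJ of processing would be spent to save 0.83 mJ of radio energy. The same trade is worthwhile on the WINS-class node (0.48 mJ of processing against 9.9 mJ of radio energy).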