Jon Stanley
EE382N-4 Spring 2011

Objectives
Tasks
Keil MCB2300 ARM7 Evaluation Board
◦ System overview
◦ Measurement setup
◦ Instruction power usage
◦ Code power comparison
Beagleboard-XM
◦ System overview
◦ Android and Ubuntu OS experience
◦ Measurement setup
◦ Code power comparison
Summary
Other notes

Objectives
Explore several points in Lecture 16 on Power Aware Computing
Characterize power of an ARM processor
Characterize energy use of code
◦ Low level (bare metal)
◦ High level (Linux OS)
◦ Coding methods

Tasks
Power and energy measurements
Software
◦ Instruction level
◦ Small routines
◦ C-code comparisons: Delays, timer interrupts, OS scheduling
◦ Linux OS impact
◦ Optimization impact
◦ Data type size impact
Hardware
◦ Keil MCB2300 and Beagleboard-XM

Keil MCB2300 ARM7 Evaluation Board
NXP LPC2387 microcontroller
◦ ARM7 processor core
◦ GPIO to LEDs, ADC I/O, etc.
uVision IDE and Flash Magic for programming
Bare metal code
◦ More accurate measure of power
◦ Can measure power of each instruction type
Measurement setup
◦ Ammeter with uA resolution, because different instructions change the current consumption only slightly
◦ Use a good shielded probe to the VBUS jumper on the board
◦ Be careful of current drift if the instrument isn’t high quality

Instruction power usage
Same instruction 1000 times to minimize effect of Branch on power
Consider data hazards when writing test code

  0    loop  ADD R0, R0, #1
  1          ADD R1, R1, #1
  2          ADD R2, R2, #1
  3          ADD R3, R3, #1
  4          ADD R4, R4, #1
  5          ADD R0, R0, #1
  …
  999        ADD R4, R4, #1
             B   loop

Instruction         Current (mA)
NOP                 41.94
MOV                 41.56
ADD                 41.59
ADD with stall      41.51
ADD with LSL        41.52
MOV with LSL        41.57
MUL                 41.65
B                   41.05
PC Load to Jump     40.86

NOP uses more power?
Stalled pipeline uses less power, but uses more energy due to longer execution time
ADD uses less power than MUL
MOV to PC uses less power than Branch

Code power comparison
Conditional:

  loop  ADD   R0, R0, #1
        CMP   R0, #1
        ADDLT R1, R1, #1
        ADDEQ R1, R1, #1
        ADDGT R1, R1, #1
        B     loop

Branch:

  loop  ADD   R0, R0, #1
        CMP   R0, #1
        BLT   jump
        BEQ   jump
  jump  ADD   R1, R1, #1
        B     loop

Method        Current (mA)
Conditional   42.33
Branch        42.33

No measurable difference at instruction level
Branch would be better when there are large amounts of code between each conditional statement, because it reduces execution time

C Code
LED PWM dimming
Timer interrupt PWM
◦ Interrupt handler
◦ LED control by interrupt
Delay PWM
◦ Main loop uses delays
◦ Main loop controls LED

Method      Current (mA)
Interrupt   92.31
Delay       92.31

Surprising result?
◦ Delay did not put processor in idle (like OS sleep).
◦ Tests were repeated later in Linux OS.

Beagleboard-XM
Released in 2010
Features:
◦ TI DM3730 microcontroller
◦ 1GHz ARM Cortex-A8 core
◦ 512MB DDR
◦ Ethernet 10/100
◦ 4-port USB
◦ RS232
◦ Stereo audio
◦ S-video and DVI
◦ Boots from 4GB MicroSD card
◦ Comes with Angstrom Linux OS verification image
Power usage is a realistic scenario of a system

Android OS experience
Installation files:
http://softwaredl.ti.com/dsps/dsps_public_sw/sdo_tii/TI_Android_DevKit/02_00_00/exports/AM37X.tar.gz
README is contained within the tarball
PC with Ubuntu 10 to build file system on SD card
Android boot-up and program execution were slow
Not friendly for developing code within OS

Ubuntu OS experience
Installation files and instructions: http://elinux.org/BeagleBoardUbuntu
Use instructions for “Demo Image - Maverick 10.10”
◦ Installation will take at least 4 hours
◦ Need internet connectivity - use “dhclient usb1” to get IP
◦ Note: the NetInstall method resulted in broken OS for me
PC with Ubuntu 10 to build file system on SD card
Ubuntu GUI runs faster than Android
◦ Built-in terminal and friendly for developing within OS
◦ “sudo apt-get install gcc” to install GCC compiler

Measurement setup
Hardware: BeagleBoard with Ubuntu
◦ Current consumption fluctuates, so a DMM
is not practical for an average current measurement
◦ NI cDAQ-9174 with 9227 analog current module
◦ Sample current consumption at 2 kS/s
◦ Measure average current with ~1.5 minute window
◦ Turn off user-defined LEDs to reduce noise

    echo none >/sys/class/leds/beagleboard\:\:usr0/trigger
    echo none >/sys/class/leds/beagleboard\:\:usr1/trigger

Software: 2 C-codes for each test
◦ First code has simple for loop for current measure
◦ Second code uses gettimeofday() and runs the same for loop 1000 times to determine average runtime

    int main(void) {
        int i, c = 0;              /* c starts at 0; its type is varied in the data type tests */
        while (1) {                /* runs until killed so the current can be averaged */
            for (i = 0; i < 10000; i++)
                c += i;            /* accumulate the loop index */
        }
    }

Simple code that accumulates iteration count

[Figure: current measurement with regions labeled “Idle OS” and “Running Test Program”]

Code power comparison
HW2 exercise: gcc creates poor asm code that performs unnecessary memory operations
Direct correlation to energy consumption
Energy = V * (I_code - I_IdleOS) * T

Test                        Current (mA)   Time (us)   Coulomb (mA*us)
Idle OS                     602
For loop sum in C           673            135         9585
Optimized for loop in asm   677            45          3375

Roughly 3x (2.8x) decrease in energy consumption by eliminating unnecessary memory operations

Coulomb usage comparison of for loop sum code using different data types for the variable “c”

Test      Bytes   Current (mA)   Time (us)   Coulomb (mA*us)
Idle OS           602
Short     2       657            184         10120
Integer   4       673            135         9585
Float     4       639            891         32967
Double    8       660            487         28246

Coulomb usage appears correlated to both the complexity of the library function that performs the add and the size of the data type

Test application:
◦ Loop that writes to variable. 50% duty. ~2 ms period.
Test 1: Delay
◦ Cannot depend on CPU clock period for delay timing due to OS, so poll using gettimeofday
Test 2: OS Scheduler
◦ Use usleep for timing with us resolution

Test        Current (mA)   Time (us)   Coulomb (mA*us)   Jitter min (us)   Jitter avg (us)   Jitter max (us)
Idle OS     598
Delay       621            2020        46460             7                 10                7209
Scheduler   598            2238        0                 37                180               9162

Summary
Data type affects energy: the native 4-byte integer was cheapest in these tests, short slightly higher, float and double roughly 3x higher
Use better compilers or check assembly code
Direct correlation between code performance and energy usage for ARM processors
◦ Beware of stalling the pipeline
◦ Look at Coulomb usage of different code approaches: fast algorithms may use more power than slow ones but use less energy overall
◦ Memory transactions are energy expensive
◦ Use interrupts or OS scheduler instead of polling in OS
Results may not entirely hold for x86 or other processors, but these can be characterized using the methods outlined here

Other notes
Issues:
◦ Difficult to find documentation
◦ Community-oriented development
◦ Fairly new board
Processor voltage and clock are adjustable, but the driver library is not fully merged in Ubuntu yet
◦ cpufrequtils package
Processor current consumption is accessible to software on the board via device on I2C

Processor current consumption is accessible to software on the board
◦ Not recommended for precision gauging because 10-bit ADC resolution yields ~53mA per LSB with series resistor
Hardware details in BeagleBoard user manual
Software:
◦ http://groups.google.com/group/beagleboard/browse_thread/thread/7810fb7a93e44a4e
◦ Software is currently broken
◦ /dev/i2c-1 file access issue in Ubuntu
“sudo i2cdump 1 0x4a” in terminal returns “Device or resource busy” error
Permission conflict with OS over i2c file access?
“sudo fuser -km /dev/i2c-1” will freeze the OS if attempting to kill any processes that use i2c-1

Current consumption comparison of various system energy saving or shutdown options accessible from the Ubuntu GUI

Test                          Current (mA)
Idle OS                       602
Blank screen                  598
Suspend                       566
Processor halted (shutdown)   380

Embedded System Power
◦ Lecture 16: Power Aware Programming
◦ http://www.newelectronics.co.uk/electronics-technology/optimising-thepower-consumption-of-embedded-systems/30528/
◦ http://www.wimserc.org/members/Papers/1215608754-PID632073.pdf
◦ http://www.netrino.com/node/178
◦ http://low-powerdesign.com/article_cypress_093010.html
Keil MCB2300
◦ http://www.keil.com/support/man/docs/mcb2300/mcb2300_intro.htm
Beagleboard-XM
◦ Beagleboard.org
◦ Google Groups on Beagleboard topics
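
Backup: recomputing the Coulomb columns
The Coulomb (mA*us) figures in the code power comparison tables follow from (I_code - I_IdleOS) * T. Below is a minimal sketch in C of that arithmetic. The function names net_charge_ma_us and net_energy_nj are hypothetical and do not come from the original test code. The supply voltage is left as a parameter because the slides do not state it.

```c
#include <assert.h>

/* Net charge in mA*us (equal to nC) drawn by the code under test,
 * above the idle-OS baseline: (I_code - I_idle) * T.
 * Hypothetical helper name, used here for illustration only. */
double net_charge_ma_us(double i_code_ma, double i_idle_ma, double time_us)
{
    assert(time_us >= 0.0);   /* a runtime cannot be negative */
    return (i_code_ma - i_idle_ma) * time_us;
}

/* Net energy in nJ: E = V * Q, with Q in mA*us (nC).
 * supply_v is an assumed input; the slides do not give the voltage. */
double net_energy_nj(double supply_v, double charge_ma_us)
{
    assert(supply_v > 0.0);   /* a powered board has a positive supply */
    return supply_v * charge_ma_us;
}
```

For example, net_charge_ma_us(673, 602, 135) gives 9585 mA*us, matching the "For loop sum in C" row, and net_charge_ma_us(677, 602, 45) gives 3375 mA*us for the optimized asm loop.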