ARM Energy Usage Report

Jon Stanley
EE382N-4
Spring 2011

Objectives
Tasks
Keil MCB2300 ARM7 Evaluation Board
◦ System overview
◦ Measurement setup
◦ Instruction power usage
◦ Code power comparison
Beagleboard-XM
◦ System overview
◦ Android and Ubuntu OS experience
◦ Measurement setup
◦ Code power comparison
Summary
Other notes



Explore several points in Lecture 16 on Power Aware Computing
Characterize power of an ARM processor
Characterize energy use of code
◦ Low level (bare metal)
◦ High level (Linux OS)
◦ Coding methods


Power and energy measurements
Software
◦ Instruction level
◦ Small routines
◦ C-code comparisons:
▪ Delays, timer interrupts, OS scheduling
◦ Linux OS impact
◦ Optimization impact
◦ Data type size impact

Hardware
◦ Keil MCB2300 and Beagleboard-XM

NXP LPC2387 microcontroller
◦ ARM7 processor core
◦ GPIO to LEDs, ADC I/O, etc.


uVision IDE and Flash Magic for programming
Bare metal code
◦ More accurate measure of power
◦ Can measure power of each instruction type

Measurement setup
◦ Ammeter with uA resolution, because different instructions change the power consumption only slightly
◦ Use a good shielded probe on the VBUS jumper on the board
◦ Beware of current drift if the instrument isn't high quality


Same instruction 1000 times to minimize effect of branch on power
Consider data hazards when writing test code

loop    ADD R0, R0, #1      ; instruction 0
        ADD R1, R1, #1      ; instruction 1
        ADD R2, R2, #1      ; instruction 2
        ADD R3, R3, #1      ; instruction 3
        ADD R4, R4, #1      ; instruction 4
        ADD R0, R0, #1      ; instruction 5
        …
        ADD R4, R4, #1      ; instruction 999
        B loop
Instruction        Current (mA)
NOP                41.94
MOV                41.56
ADD                41.59
ADD with stall     41.51
ADD with LSL       41.52
MOV with LSL       41.57
MUL                41.65
B                  41.05
PC Load to Jump    40.86




NOP uses more power?
Stalled pipeline uses less power, but uses more energy due to longer execution time
ADD uses less power than MUL
MOV to PC uses less power than Branch
Conditional:
loop    ADD R0, R0, #1
        CMP R0, #1
        ADDLT R1, R1, #1
        ADDEQ R1, R1, #1
        ADDGT R1, R1, #1
        B loop

Branch:
loop    ADD R0, R0, #1
        CMP R0, #1
        BLT jump
        BEQ jump
        ADD R1, R1, #1
jump    B loop


Method         Current (mA)
Conditional    42.33
Branch         42.33

No measurable difference at instruction level
Branch would be better when there are large amounts of code between each conditional statement, to reduce execution time



C Code: LED PWM dimming
Timer interrupt PWM
◦ Interrupt handler
◦ LED control by interrupt

Delay PWM
◦ Main loop uses delays
◦ Main loop controls LED

Method       Current (mA)
Interrupt    92.31
Delay        92.31
Surprising result?
◦ Delay did not put processor in idle (like OS sleep).
◦ Tests were repeated later in Linux OS.


Released in 2010
Features:
◦ TI DM3730 microcontroller
▪ 1GHz ARM Cortex-A8 core
◦ 512MB DDR
◦ Ethernet 10/100
◦ 4-port USB
◦ RS232
◦ Stereo audio
◦ S-video and DVI
◦ Boots from 4GB MicroSD card
◦ Comes with Angstrom Linux OS verification image
Power usage is a realistic scenario of a system





Installation files:
http://softwaredl.ti.com/dsps/dsps_public_sw/sdo_tii/TI_Android_DevKit/02_00_00/exports/AM37X.tar.gz
README is contained within the tarball
PC with Ubuntu 10 to build file system on SD card
Android boot-up and program execution were slow
Not friendly for developing code within OS


Installation files and instructions:
http://elinux.org/BeagleBoardUbuntu
Use instructions for “Demo Image - Maverick 10.10”
◦ Installation will take at least 4 hours
◦ Need internet connectivity - use “dhclient usb1” to get IP
◦ Note: the NetInstall method resulted in a broken OS for me


PC with Ubuntu 10 to build file system on SD card
Ubuntu GUI runs faster than Android
◦ Built-in terminal and friendly for developing within OS
◦ “sudo apt-get install gcc” to install GCC compiler

Hardware: BeagleBoard with Ubuntu
◦ Fluctuating current consumption, so a DMM is not practical for an average current measurement
◦ NI cDAQ-9174 with 9227 analog current module
◦ Sample current consumption at 2kS/sec
◦ Measure average current with ~1.5 minute window
◦ Turn off user-defined LEDs to reduce noise
▪ echo none >/sys/class/leds/beagleboard\:\:usr0/trigger
▪ echo none >/sys/class/leds/beagleboard\:\:usr1/trigger

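Software: 2 C programs for each test
◦ First program has a simple for loop for the current measurement
◦ Second program uses gettimeofday() and runs the same for loop 1000 times to determine average runtime
int main(void) {
    int i, c;
    while (1) {                        /* run forever while current is sampled */
        c = 0;                         /* reset so the sum never overflows int */
        for (i = 0; i < 10000; i++)
            c += i;                    /* accumulate the iteration count */
    }
}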

Simple code that accumulates iteration count
[Figure: regions labeled "Idle OS" and "Running Test Program"]



HW2 exercise: gcc creates poor asm code that performs unnecessary memory operations
Direct correlation to energy consumption
▪ Energy = V * (I_code - I_IdleOS) * T

Test                         Current (mA)   Time (us)   Coulomb (mA*us)
Idle OS                      602
For loop sum in C            673            135         9585
Optimized for loop in asm    677            45          3375

Nearly 3x decrease in energy consumption (9585 / 3375 ≈ 2.8) by eliminating unnecessary memory operations

Coulomb usage comparison of for loop sum code using different data types for the variable “c”

Test       Bytes   Current (mA)   Time (us)   Coulomb (mA*us)
Idle OS            602
Short      2       657            184         10120
Integer    4       673            135         9585
Float      4       639            891         32967
Double     8       660            487         28246

Coulomb usage appears correlated to both the complexity of the library function that performs the add and size of the data type

Test application:
◦ Loop that writes to variable. 50% duty. ~2ms period.

Test 1: Delay
◦ Cannot depend on CPU clock period for delay timing due to OS, so poll using gettimeofday

Test 2: OS Scheduler
◦ Use usleep for timing with us resolution

Test         Current (mA)   Time (us)   Coulomb (mA*us)   Jitter min (us)   Jitter avg (us)   Jitter max (us)
Idle OS      598
Delay        621            2020        46460             7                 10                7209
Scheduler    598            2238        0                 37                180               9162



Smallest data type generally uses less energy
Use better compilers or check assembly code
Direct correlation between code performance and energy usage for ARM processors
◦ Beware of stalling the pipeline
◦ Look at Coulomb usage of different code approaches
▪ Fast algorithms may use more power than slow ones but use less energy overall
◦ Memory transactions are energy expensive
◦ Use interrupts or OS scheduler instead of polling in OS

Results may not entirely hold for x86 or other processors, but these can be characterized using the methods outlined here

Issues:
◦ Difficult to find documentation
◦ Community-oriented development
◦ Fairly new board

Processor voltage and clock are adjustable, but the driver library is not fully merged in Ubuntu yet
◦ cpufrequtils package

Processor current consumption is accessible to software on the board via a device on I2C
◦ Not recommended for precision gauging because 10-bit ADC resolution yields ~53mA per LSB with series resistor


Hardware details in BeagleBoard user manual
Software:
◦ http://groups.google.com/group/beagleboard/browse_thread/thread/7810fb7a93e44a4e
◦ Software is currently broken
◦ /dev/i2c-1 file access issue in Ubuntu
▪ “sudo i2cdump 1 0x4a” in terminal returns “Device or resource busy” error
▪ Permission conflict with OS over i2c file access?
▪ “sudo fuser -km /dev/i2c-1” will freeze the OS if attempting to kill any processes that use i2c-1

Current consumption comparison of various system energy saving or shutdown options accessible from the Ubuntu GUI

Test                           Current (mA)
Idle OS                        602
Blank screen                   598
Suspend                        566
Processor halted (shutdown)    380

Embedded System Power
◦ Lecture 16 – Power Aware Programming
◦ http://www.newelectronics.co.uk/electronics-technology/optimising-thepower-consumption-of-embedded-systems/30528/
◦ http://www.wimserc.org/members/Papers/1215608754-PID632073.pdf
◦ http://www.netrino.com/node/178
◦ http://low-powerdesign.com/article_cypress_093010.html

Keil MCB2300
◦ http://www.keil.com/support/man/docs/mcb2300/mcb2300_intro.htm

Beagleboard-XM
◦ Beagleboard.org
◦ Google Groups on Beagleboard topics