Power Aware Embedded Operating System Design

by
Travis C. Furrer
Submitted to the Department of Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science in Electrical Engineering and Computer Science
and
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2000
© 2000 Massachusetts Institute of Technology. All Rights Reserved.
Signature of Author: Department of Electrical Engineering and Computer Science
May 19, 2000
Certified by: Anantha Chandrakasan
Associate Professor of Electrical Engineering
Thesis Supervisor
Accepted by: Arthur C. Smith
Professor of Electrical Engineering
Chairman, Department Committee on Graduate Theses
Power Aware Embedded Operating System Design
by
Travis C. Furrer
Submitted to the Department of Electrical Engineering and Computer Science on
May 19, 2000, in Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science in Electrical Engineering and Computer Science
and
Master of Engineering in Electrical Engineering and Computer Science
Abstract
The µAMPS low-power distributed wireless sensor project seeks to design small embedded systems that
will require an operating system (OS). I chose to port the Embedded Cygnus Operating System (eCos) to the
StrongARM 1100 microprocessor (SA-1100) so that it can be used on the µAMPS system. The OS was
debugged and tested for execution from both RAM and ROM on an SA-1100 evaluation board. A
description of how the OS was ported and debugged is given. A simple form of Dynamic Voltage Scaling
(DVS) was implemented and energy-efficiency experiments were done. FIR filtering with a variable-length
filter was used as a sample application to show that DVS provides a significant energy savings. Detailed
results from these experiments are presented. Additional ideas for ways to save energy in the OS, left as
future work, are included.
Thesis Supervisor: Anantha Chandrakasan
Title: Associate Professor of Electrical Engineering
Acknowledgements
I would like to acknowledge and thank Anantha for inspiring me to work with his group, for arranging
for my research assistantship funding, for giving me the freedom to organize my own project, and for his
enthusiasm about each success along the way. I feel privileged to have worked with him and his group.
I would also like to thank the following people and organizations for their invaluable help, without
which I could not have completed this thesis:
My parents, for their loving and wise direction that led me to pursue this degree, for always trying to
give me the best, and for letting me call them to talk anytime about anything. Without their persistent
encouragement I could not have made it through MIT. I hope they're able to stick around to see me through
many more successes! I would also like to thank the rest of my family for serving as an encouragement and
a good example to me while I was in college.
Rex, Amit, Manish, SeongHwan, Jim, Eugene, Wendy, Alice, PaulPeter, and everyone else in Anantha's
group who have contributed to the optimistic, courteous, friendly, humorous, and intelligent atmosphere
here. I am impressed with each of you and your research. Special thanks to Rex who provided the DVS test
board for use with the Brutus, and helped run the experiments. Special thanks also to Manish for help
designing the variable-length filter experiment.
Red Hat Software (formerly Cygnus Solutions), for giving me early access to their StrongARM 110
version of eCos. Thanks also to various people on the ecos-discuss@sourceware.cygnus.com
mailing list, such as Andrew Lunn who provided source code to measure CPU load under eCos.
The brains behind the zephyr help instance, who are available 24 hours a day to answer any question. It
was from them that I learned many important skills that allowed me to do a project of this kind.
Charles Leiserson, Harald Prokop, and Jamey Hicks for their involvement with my UROP research last
year, which initially led me onto the topic of low power software (or "cool" software, as Charles says).
My friend Ryan, for preventing much anguish for me by advising me (from experience!) to get my thesis written early.
"The harder I work, the luckier I get."
- Alvin Furrer
Table of Contents
1 Introduction
  1.1 Motivation: The µAMPS Project
  1.2 Background
    1.2.1 General Low Power Software Techniques
    1.2.2 Low Power OS Techniques
    1.2.3 The StrongARM 1100 Microprocessor
    1.2.4 Embedded Real-Time Operating Systems
    1.2.5 The Embedded Cygnus Operating System
  1.3 Overview
2 Porting the OS
  2.1 Initial Work on the Source Code
    2.1.1 Copying the SA-110 HAL as a Starting Point
    2.1.2 Porting of Self-Contained Functions and Macros
    2.1.3 Overview of the Boot Sequences
    2.1.4 Building the Virtual Memory Page Tables
    2.1.5 Enabling the MMU
  2.2 The Debugging Process
3 eCos Applications
  3.1 Early Demo Programs
  3.2 Developing a DVS Demo
4 Ideas for Energy Efficiency Improvements
  4.1 DVS-Related Techniques
    4.1.1 Choose the Right Boot Frequency
    4.1.2 Energy vs. Quality Scaling
    4.1.3 Thread-based Voltage Scheduling
  4.2 Non-DVS Techniques
    4.2.1 Know your Hardware
    4.2.2 Efficient/Restricted use of Memory
5 Conclusion
A DVS Test Board Specifications
  A.1 Control Signals from Brutus to DVS Test Board
  A.2 Specification for DVS Test Board Inputs
  A.3 AC Characteristics
B eCos Regression Test Results
C The Big Picture of StrongARM Power Consumption
D Instructions
  D.1 Building eCos for the Brutus
  D.2 Programming Flash Memories
E Raw Data From Experiments
  E.1 Data from Voltage/Frequency Scaling Experiment
  E.2 Data From Variable-Length Filter DVS Experiment
F Photos
Bibliography
List of Figures
Figure 1.1: Diagram of a µAMPS Node
Figure 1.2: SA-1100 Block Diagram
Figure 1.3: SA-1100 Power and Clock Supply Sources and States During Power-Down Modes
Figure 1.4: Scalability of eCos [5]
Figure 1.5: Modularity of eCos [5]
Figure 2.1: SA-1100 vs. SA-110
Figure 2.2: eCos Configuration Tool
Figure 2.3: eCos Configuration for Brutus Platform
Figure 2.4: SA-1100 Memory Map for the entire 4Gb 32-bit Address Space
Figure 2.5: Memory Layout for STUBS Startup
Figure 2.6: Memory Layouts for RAM and ROM Startup
Figure 2.7: Before Enabling the MMU
Figure 2.8: After Enabling the MMU
Figure 3.1: Measured Energy per Operation vs. Frequency and Supply Voltage
Figure 3.2: Energy per Operation with Voltage Scaling vs. without Voltage Scaling
Figure 3.3: Screen Shot from DVS Demo
Figure A.1: Transients in output voltage of DVS Board
Figure F.1: Photo of Brutus Board
Figure F.2: Photo of DVS Test Board
Figure F.3: Photo of Brutus with DVS Board connected
Figure F.4: Screen Shot of Graphical Demo on Brutus LCD
Figure F.5: StrongARM 1100 Chip Photo
List of Tables
Table 2.1: Boot Sequence for RAM Startup
Table 2.2: Boot Sequence for ROM Startup
Table 2.3: Boot Sequence for STUBS Startup
Table 2.4: Virtual Memory Mapping as specified in Page Tables
Table 3.1: Voltages used with DVS for each Clock Frequency
Table A.1: Voltage Control Signals
Table A.2: DVS Test Board Voltage Control Signals
Table B.1: eCos Regression Tests Results for Brutus Port
Table C.1: Breakdown of Power Consumption of SA-110 processor when running Dhrystone
Table E.1: Data from Voltage/Frequency Scaling Experiment
Table E.2: Data from Variable-Length Filter DVS Experiment
Chapter 1
Introduction
1.1 Motivation: The µAMPS Project
The micro-Adaptive Multi-domain Power-aware Sensors (µAMPS) project [19] is being carried out by students in the Integrated Circuits and Systems group of the MIT Microsystems Technologies Laboratory. The
project vision is to perform efficient distributed remote sensing using small wireless sensor nodes. Each
node is battery-powered and contains a microsensor (for collecting remote data), an embedded microprocessor (for local pre-processing of sensor data), and a radio transceiver (for wireless transmission of sensor data
to a central base station). The type of sensor and other details will depend on the application, but the
µAMPS system is being designed as a general substrate to be used in various remote sensing applications.
Although the long-term vision is for the nodes to be fully integrated into a custom system-on-chip (SOC),
the initial prototypes will be built with commercially available components, including the Intel StrongARM 1100¹ low-power microprocessor [9], assembled on a circuit board (Figure 1.1).
My contribution to the µAMPS project has been to provide the operating system (OS) to control the
software running on each node. The nodes have several special needs that constrain the type of OS that can
be used. For example, because the signal from a sensor must be sampled at a precise rate even while other
computing tasks are being performed, a real-time and multi-threaded OS is needed. More importantly,
though, the OS needs to operate under strict power/energy constraints because the nodes are battery-powered. The design and implementation of an OS that can meet the needs of the µAMPS project is the main
topic of this thesis.
1.2 Background
Some background will be helpful before discussing the actual project. I will first make some comments on
the concept of low power software in general, and introduce an important low power software technique that
can be implemented at the OS level specifically. After this comes a short look at the StrongARM 1100
microprocessor. Finally, I will comment on embedded real-time operating systems in general and explain
why I chose eCos as the OS on which to base my project.
1. StrongARM and ARM are registered trademarks of Advanced RISC Machines Limited.
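[Figure 1.1 is a block diagram; the image is not reproduced. Labels recoverable from the original: external stimulus (application-specific); sensor (acoustic, seismic, etc.); A/D converter; DRAM (16Mb EDO); link to/from base station or remote node; link to PC (for debugging); battery.]
Figure 1.1: Diagram of a µAMPS Node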
1.2.1 General Low Power Software Techniques
Many techniques for energy efficiency [1, 30] are applicable at the level of hardware, but there are also
methods that can be applied in software.² In this thesis, I use the term low power software to refer to software on which some sort of optimization has been done (whether at the algorithmic level in the source code,
at lower levels by a compiler, or both) with the energy consumption of the system in mind. Traditionally,
runtime has been used as the metric for most optimizations because performance is so important. In battery-powered embedded systems, however, battery lifetime is at stake and thus energy consumption is at
least as important as performance.
The difference between energy consumption and power consumption is a distinction that needs to be
highlighted. Although there are cases in which one might be interested in minimizing peak power consumption, for the µAMPS nodes we are interested in minimizing overall energy consumption (while maintaining
the required functionality) since that is what matters for battery life. The term "low-power software" seems
to imply simply that we are minimizing average power, but that would not be enough since time is also a factor in overall energy consumption. For this reason, the term "energy efficient software" might have been
more accurate.
2. It is worthwhile to make a special note of any techniques that are unique to software (no effort
in hardware could accomplish a similar improvement). However, it is still useful to consider
techniques that attempt to use software enhancements to make up for inefficient hardware. For
example, the hardware available on a satellite is fixed, but more energy-efficient system
software could perhaps be uploaded.
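The distinction between power and energy can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and chosen only to make the point; they are not measurements from this work.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy is average power multiplied by the time it is drawn: E = P_avg * t."""
    return avg_power_watts * runtime_seconds


# Two hypothetical versions of the same task (illustrative values only).
fast = energy_joules(avg_power_watts=0.40, runtime_seconds=1.0)
slow = energy_joules(avg_power_watts=0.25, runtime_seconds=2.0)

# The "slow" version draws less average power, yet it takes more energy
# from the battery because it runs for twice as long.
lower_power_but_more_energy = slow > fast
```

In this sketch, minimizing average power alone would favor the second version even though it shortens battery life, which is why energy is the metric of interest for battery-powered nodes.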
The most obvious low power software technique is to make the software "lean" by eliminating inefficient code or unnecessary functionality that increases code size, which affects energy consumption by wasting both memory and clock cycles. Other techniques include: compiler enhancements for more energy
efficient code generation, compiler enhancements for memory alignment of code and data for cache performance, algorithmic transformations for performance or for cache efficiency, algorithmic transformations for
trading off energy vs. quality, and so on [8, 13, 26, 35, 36].
Optimizing software for energy consumption is not unlike optimizing for performance. In fact, very
often these two can coexist: most transformations that decrease runtime (increase performance) will
also reduce energy consumption. This is an important point to keep in mind. The synergy between these
two types of optimizations is convenient. Also, when evaluating low power optimizations we need to be
careful to distinguish between those that would already have been done traditionally for performance's sake,
and those that are unique to low-power systems.
An example of an OS optimization that would be done for the sake of energy, but never for performance, is to coarsen the grain of scheduling at times when the performance of all threads is non-critical (i.e.
it is okay for threads to wait for a longer time while other threads run). Since context switches have a cost in
terms of energy (due to the flushing of caches, saving and restoring state, etc.), minimizing them by coarsening the scheduling would save energy. But since performance is non-critical in these situations, it would
make no sense to do coarsening for the sake of performance.
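The effect of coarsening can be sketched with a simple cost model. The function name, the 5 µJ cost per context switch, and the time slices below are assumptions chosen only for illustration; they are not measurements from eCos or the SA-1100.

```python
def context_switch_energy_uj(run_time_ms: int, time_slice_ms: int,
                             energy_per_switch_uj: float) -> float:
    """Energy (in microjoules) spent on context switches when the scheduler
    preempts the running thread once every time_slice_ms milliseconds."""
    switches = run_time_ms // time_slice_ms
    return switches * energy_per_switch_uj


# Hypothetical workload: 10 seconds of non-critical background threads.
fine_grained = context_switch_energy_uj(10_000, time_slice_ms=10,
                                        energy_per_switch_uj=5.0)
coarse_grained = context_switch_energy_uj(10_000, time_slice_ms=100,
                                          energy_per_switch_uj=5.0)

# Coarsening the time slice by 10x cuts the switching overhead by 10x,
# while the useful work done by the threads is unchanged.
savings_uj = fine_grained - coarse_grained
```

Under this model the total work is the same in both cases, so the change would not be made for performance; only the energy overhead of switching differs.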
Naturally, to perform low power software optimizations you need to have a detailed understanding of
how energy is consumed by the hardware. While some components consume energy at a constant power
level that cannot be significantly affected by software (e.g. leakage), the energy consumption of other components can be affected dramatically depending on how the software operates (for example, energy expended
in SRAM due to cache misses). While there are large similarities between embedded systems that make certain low power optimizations generally useful, one should watch out for details of energy consumption in
particular hardware that could affect optimization tradeoffs that are made with appropriate use of Amdahl's
law.³ For example, if your floating point unit is an off-chip coprocessor and communicating with it is expensive in terms of energy, it may actually be more efficient to perform certain floating point calculations using
software emulation instead. There is clearly a different tradeoff between energy and performance in this
case, as compared to one where floating point operations are energy-efficient, and thus the types of software
optimizations used should be different.
One of the frustrations in designing low-power software is that it almost always involves stripping away
functionality that is considered unnecessary, but that seems useful. We may wish to retain some of this functionality at least part of the time. A power aware system attempts to provide different levels of quality/functionality (and thus energy consumption) at different times by intelligently monitoring the available system
energy and activity, and user preferences (e.g. desired latency or quality). By providing high functionality/
performance sometimes and dynamically scaling down to more limited functionality at other times, the hope
is to have the best of both worlds: average energy consumption is reduced without sacrificing peak performance. This technique is useful as long as the overhead associated with scaling functionality does not outweigh the average energy savings it provides.⁴
1.2.2 Low Power OS Techniques
Since the OS is at the heart of the system, it is likely in some cases to be an important place to apply general
low power software techniques. We need to focus on the OS in addition to applications because it performs
certain privileged system management activities that can have an important impact on energy efficiency.
Also note that any techniques that happen to be platform specific (for example, those for which cache size or
other details of the hardware are important parameters) are more naturally applied to the OS code since it is
usually very platform-specific anyway (applications, on the other hand, may be compiled to run on several
different platforms and therefore such techniques may not be as feasible).
Research has been done on how to implement OS schedulers that make good use of the idle and sleep
modes of both the processor and peripherals [14-18, 27, 31]. This is a "power aware" technique because the
OS can actively manage power consumption by varying the set of enabled functional units while still meeting the needs of user applications. Implementing power awareness at the OS level is advantageous since it
can improve energy efficiency even for applications that are not themselves power aware.
3. Although Amdahl's law should always be kept in mind, we may sometimes need to consider
energy efficiency in almost every area simultaneously in order to meet an absolute energy
constraint.
4. There is overhead involved in executing the algorithms that are used to manage the scaling.
There may also be a small amount of energy used by any additional hardware needed to
support the scaling. If these overheads turn out to be large, it may have been better to use a
traditional low-power design approach.
There are several ways in which an OS can be power aware. Any of the common OS features might be
implemented to vary their functionality based on the available energy or user demands. For example, a
power aware file system/disk driver might provide synchronous disk access when necessary for robustness,
but switch to asynchronous disk access at other times so that energy could be saved by clustering disk
accesses together and minimizing the number of disk accesses (assuming that there is a constant component
to the cost in terms of energy of each separate disk access). Another example might be a power aware scheduler that modifies the scheduling policy in such a way that allows dynamic trade-offs between performance
and power consumption.
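The disk example can be made concrete with a toy cost model, assuming (as stated above) that each separate disk access has a constant energy cost in addition to a cost per unit of data moved. The function name and every number here are hypothetical.

```python
def disk_energy_mj(num_accesses: int, total_bytes: int,
                   fixed_mj_per_access: float, mj_per_kib: float) -> float:
    """Toy model: a constant cost per separate access plus a cost per KiB moved."""
    return num_accesses * fixed_mj_per_access + (total_bytes / 1024) * mj_per_kib


WRITES = 100                 # 512-byte writes issued by applications
TOTAL_BYTES = WRITES * 512   # the same data is written in both cases

# Synchronous policy: every write becomes its own disk access.
synchronous = disk_energy_mj(WRITES, TOTAL_BYTES,
                             fixed_mj_per_access=50.0, mj_per_kib=2.0)

# Asynchronous policy: writes are buffered and flushed in clusters of 10.
clustered = disk_energy_mj(WRITES // 10, TOTAL_BYTES,
                           fixed_mj_per_access=50.0, mj_per_kib=2.0)
```

The data-dependent term is identical for both policies, so all of the savings come from paying the constant per-access cost fewer times. The price is that buffered writes are less robust against a sudden loss of power.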
The most important example of power awareness in the OS for this thesis, however, is the use of a technique called dynamic voltage scaling (DVS)⁵ [22, 24, 33]. DVS takes advantage of an important property
of static digital CMOS logic: the energy consumed to perform a given computation (i.e. to execute a
sequence of instructions) is roughly proportional to the square of the supply voltage, as can be seen in the following formula [28]:

$$\mathrm{Energy} = C_{tot} V_{DD}^2 + I_{leak} V_{DD} \frac{N}{f}$$

where $C_{tot}$ is the total switched capacitance of the computation, $V_{DD}$ is the supply voltage, $I_{leak}$ is the
leakage current, $N$ is the number of clock cycles taken by the computation, and $f$ is the clock frequency.⁶
Note that $C_{tot}$ is the sum of the total switched capacitances in each individual clock cycle. The total
switched capacitance will be different for different clock cycles because different instructions are being executed and different functional units are being used. Here, we ignore these details by lumping everything into
the single value $C_{tot}$.
It has been shown previously [3, 38] that the quadratic dependency on voltage can be exploited to provide significant energy savings by statically scaling down frequency and voltage together. DVS, on the other
hand, takes advantage of this concept to scale frequency and voltage in the context of dynamically changing
system activity and requirements (i.e. at runtime). Because the fraction of time spent at different frequencies
is highly application dependent, so are the actual energy savings that can be achieved with DVS. In fact, there
are even unusual cases in which DVS can actually be worse in terms of battery life. Battery life is sometimes
shorter when drawing energy at a constant rate than when drawing the same amount of energy in periodic
pulses [21]. In many cases DVS might fall short of the best-case (in terms of battery life) pattern of power
consumption despite its energy efficiency.
5. It is interesting to note that Transmeta's LongRun™ technology [37] is essentially dynamic
voltage scaling. The TM5400 can apparently scale from 500 MHz at 1.2 V to 700 MHz at
1.6 V.
6. Recall that the maximum achievable frequency depends on voltage (this can be seen in Figure
3.1), which constrains the combinations of values we can use to reach a desired
energy efficiency.
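As a rough worked example, the energy model above (a switching term proportional to the square of the supply voltage, plus a leakage term proportional to the supply voltage and to the runtime N/f) can be evaluated directly. The function name and all parameter values below are hypothetical and are not taken from the experiments in this thesis.

```python
def computation_energy(c_tot: float, v_dd: float, i_leak: float,
                       n_cycles: float, f_hz: float) -> float:
    """Energy in joules: C_tot * V_DD^2 + I_leak * V_DD * (N / f)."""
    switching = c_tot * v_dd ** 2
    leakage = i_leak * v_dd * (n_cycles / f_hz)
    return switching + leakage


# Hypothetical computation: one million cycles, 1 mF total switched
# capacitance (about 1 nF per cycle), and 1 mA of leakage current.
N = 1e6
C_TOT = 1e-3
I_LEAK = 1e-3

# The same computation at full speed and at half the frequency and voltage.
full_speed = computation_energy(C_TOT, v_dd=1.5, i_leak=I_LEAK,
                                n_cycles=N, f_hz=200e6)
scaled_down = computation_energy(C_TOT, v_dd=0.75, i_leak=I_LEAK,
                                 n_cycles=N, f_hz=100e6)

ratio = full_speed / scaled_down   # close to 4 when switching energy dominates
```

With these assumed values the switching term dominates, so halving the supply voltage reduces the energy of the computation by nearly a factor of four, at the cost of doubling its runtime.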
The fundamental software problem of DVS is in implementing an algorithm that decides exactly what
frequency/voltage level to use at each instant in time. Algorithms that do this are called voltage scheduling
algorithms, and they are a topic of ongoing research [23, 25, 29]. The simplest such algorithms are interval-based. Time is divided into equal intervals and the algorithm uses data (such as processor utilization in terms
of idle cycles) gathered in previous intervals to determine the frequency/voltage for the next interval.
Although such algorithms provide enough energy savings in many cases to justify the use of DVS, they are
not optimal. Potential enhancements could involve mechanisms to dynamically vary the interval length, and
to better predict the processor usage in the next interval by independently considering the activity of each
thread (this is discussed at the end of [25]).
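A minimal sketch of an interval-based voltage scheduler is shown below. It is one simple policy among many, not the algorithm of [23, 25, 29] or the one implemented in this thesis. The frequency/voltage table, the saturation threshold, and the function names are all assumptions made for illustration.

```python
# Hypothetical (frequency in MHz, supply voltage in V) operating points,
# sorted from slowest to fastest. Real values depend on the hardware.
LEVELS = [(59, 0.8), (103, 1.0), (147, 1.2), (206, 1.5)]

SATURATED = 0.95  # assumed utilization above which demand is unknown


def next_level(current_mhz, utilization, levels=LEVELS):
    """Pick the operating point for the next interval from the utilization
    (fraction of non-idle cycles) measured in the interval that just ended."""
    if utilization >= SATURATED:
        # The processor was never idle, so the true demand cannot be
        # measured; fall back to the fastest level.
        return levels[-1]
    required_mhz = current_mhz * utilization
    for mhz, volts in levels:
        if mhz >= required_mhz:
            return mhz, volts
    return levels[-1]


def schedule(utilizations, start_mhz=206):
    """Return the operating point chosen after each measured interval."""
    mhz = start_mhz
    chosen = []
    for u in utilizations:
        mhz, volts = next_level(mhz, u)
        chosen.append((mhz, volts))
    return chosen


plan = schedule([0.25, 0.9, 1.0, 0.4])
```

The policy assumes that the next interval will need about as many busy cycles as the last one, which is exactly the prediction that the enhancements mentioned above (variable interval length, per-thread prediction) try to improve.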
1.2.3 The StrongARM 1100 Microprocessor
The Intel StrongARM 1100 processor was chosen for µAMPS for several reasons. Foremost is probably that
it has a very high performance/power ratio.⁷ However, it is also a good choice because it has built-in capability for software-controlled frequency scaling, which is needed for DVS. Supply voltage scaling, however,
requires off-chip hardware, which will be built into the µAMPS board.⁸
The SA-1100 is also attractive because it has several integrated peripheral units. As you can see from Figure
1.2, there are general-purpose I/O pins (GPIO), 5 specialized serial ports, and a built-in LCD display controller.
In addition to software-controllable clock frequency, the SA-1100 also has idle and sleep modes. The
idle mode stops the clock to the ARM core, but most of the peripherals remain active. Power consumption in
idle mode is lowered by about a factor of 5. The sleep mode shuts almost everything off, and power consumption is lowered by orders of magnitude. Figure 1.3 gives more detail about which functional units
remain active in which modes. For complete details on the SA-1100, consult the SA-1100 developer's manual [9].
7. The Hitachi SH3 processor has a higher performance/power ratio, but lower raw performance
than the StrongARM.
8. The second generation StrongARM chips, due in mid 2000, will supposedly have voltage
scaling built in also.
[Figure 1.2 is a block diagram; the image is not reproduced. Labels recoverable from the original: ARM SA-1 core; instruction cache (16 Kbytes) with IMMU; DMMU and minicache; read buffer; write buffer; oscillators and PLL; JTAG and misc test; system bus; system control module (SCM) with OS timer, general-purpose I/O, interrupt controller, power management, and reset controller; memory and PCMCIA control module (MPCM); DMA controller; LCD controller; peripheral control module (PCM); peripheral bus; serial channels 0 (USB), 1 (SDLC), 2 (IrDA), 3 (UART), and 4 (CODEC).]
(from Intel SA-1100 Developer's Manual)
Figure 1.2: SA-1100 Block Diagram
[Figure 1.3 is a table giving, for each SA-1100 module, its supply source and its power (Pwr) and clock (Clk) state in each power management mode (Run, Idle, Sleep); the individual cell values did not survive extraction. Modules listed with the VDD supply and the 3.6864 MHz clock: CPU, MMUs (I&D), write buffer, read buffer, JTAG, OS timer, LCD controller, serial channels 0-4, memory and PCMCIA control. Modules listed with the VDDX supply and the 32.768 kHz clock: real-time clock, interrupt controller, power manager, general-purpose I/O, pin pads.]
(from Intel SA-1100 Developer's Manual)
Figure 1.3: SA-1100 Power and Clock Supply Sources and States During Power-Down Modes
Software development for the StrongARM 1100 is done on the "Brutus" evaluation board [10, 11].
Once the software works on the Brutus, porting it to the initial μAMPS board should not be difficult since it
will be similar to the Brutus. The Brutus computer was designed as a test platform to demonstrate almost all
of the capabilities of the StrongARM 1100 microprocessor. Its components include:
- SA-1100 Microprocessor
- Memory System (16Mb DRAM, 512K SRAM, 256K Flash, 256K ROM)
- Two PCMCIA Slots
- 320x240 Color LCD Screen
- Audio Accessories (microphone, speaker)
- HEX LED Display (one digit)
- Touch Screen
- Keyboard
- Two RS-232 Serial I/O Interfaces
Since most of these components will not be present on the μAMPS board, direct OS support (i.e. drivers) for
them will not be needed. However, these peripherals (especially the LCD screen and keyboard) are helpful
for providing direct interaction with demonstration applications that are meant to be run only on the Brutus.
1.2.4 Embedded Real-Time Operating Systems
Some of the applications of μAMPS will be real-time.⁹ Rather than writing an embedded real-time operating system (RTOS) of our own from scratch, we chose to start with an existing RTOS and make modifications as needed. The current state of RTOS's is amazingly diverse; there are well over 100 different available
RTOS's [5, 34] with many distinguishing factors.
For μAMPS, we need an open source RTOS because we intend to modify the source code to add power
aware features. StrongARM support is preferable, so that we don't have to port it ourselves. We need scalability since we want to be able to have very lean code (no unneeded features). Preferences regarding other
distinguishing features are less important.
1.2.5 The Embedded Cygnus Operating System
After reviewing a long list of RTOS's, my conclusion was that the Embedded Cygnus¹⁰ Operating System
(eCos) [6], from Cygnus Solutions, would be suitable as a starting point for use with our μAMPS prototype.
The features of eCos that make it attractive for this project are:
- scalability: eCos has over 200 configuration options (which can be chosen using a handy configuration tool, seen in Figure 1.4) for fine grain scalability, and code size can be as small as a few kilobytes.
- compatibility: eCos has μITRON compatibility, and will soon have EL/IX [7] compatibility which
makes it more compatible with Linux.
- multi-platform: Should we ever choose to stray from StrongARM, eCos is more likely to support
our next choice of platform.
- modularity: eCos is implemented with a hardware abstraction layer (HAL) that makes it easier to
port to new platforms. It is also implemented in such a way that makes it easy to plug in a custom
scheduler, device driver, etc. (see Figure 1.5).
- open source: eCos source code is freely downloadable at sourceware.cygnus.com
- development toolchain: eCos uses the standard GNU toolchain
- support: There is an active mailing list (ecos-discuss) for free support. Increasing volume on this
mailing list indicates that eCos is becoming more popular.
9. For a concise introduction to the topic of real-time systems, look at [32]. For more than you
probably want to know about rate-monotonic real-time scheduling, refer to [12]. The topic of
real-time systems has been studied for decades and is rather advanced, and thus I do not
attempt to discuss it in this thesis.
10. eCos was named prior to the acquisition of Cygnus Solutions by Red Hat Software, which
happened in January 2000.
Figure 1.4: Scalability of eCos [5] (from Cygnus eCos Market Backgrounder)
Figure 1.5: Modularity of eCos [5]
Unfortunately, eCos does not currently allow the dynamic loading of code. Until eCos supports this feature it will not be easy for us to download additional application code onto a μAMPS node after
deployment. Instead, the entire OS and application (which are actually compiled together into the same
binary image) must be replaced together.
Other OS's that were considered did not appear to meet our needs as well as eCos. For example,
embedded Linux has real time support, but is not nearly lean enough and is not scalable (it is not trivial to
eliminate the file system, for example). The uClinux OS is leaner, but has not yet been ported to any ARM
platforms. ChorusOS (from Sun) is not open source. LynxOS has an interesting patented interrupt handling
mechanism, but is not open source and also doesn't support ARM. RTEMS is open source and actually has
advantages over eCos, but has no ARM support. Several other RTOS's were considered and all had similar
issues.
1.3 Overview
Chapter 2 gives details about what is involved in porting and debugging an embedded operating system such
as eCos. Chapter 3 describes the DVS demo application that I developed. Some ideas for energy efficiency
improvements are given in Chapter 4. After the conclusion in Chapter 5, several appendices give useful data
and more information about the OS and DVS experiments.
Chapter 2
Porting the OS
It is of course necessary to get the basic OS features working before any experimentation can be done with
special low-power OS features like DVS. This chapter describes how eCos was ported to run on the Brutus.
A lot of low-level details had to be taken care of to get eCos to boot properly.
2.1 Initial Work on the Source Code
Fortunately, eCos is designed with portability in mind. Most of the kernel is written in C++ and is entirely
portable. The platform dependent code is isolated in what is called the Hardware Abstraction Layer
(HAL) and is written in C. Porting eCos to a new platform means creating a HAL for that platform. The
HAL code contains the routines necessary to boot and initialize the system. It also contains several functions
and macros that are used by the rest of the eCos code. Although there are many features in common between
microprocessors, the method of accessing and controlling the features is different for each processor. For
example, most platforms have caches, interrupts, timers, and MMU's. The HAL presents a common interface for these features to the platform-independent part of eCos, through various functions and macros. Most
of these functions are short and self contained (it doesn't take very many instructions to mask an interrupt,
for example), and thus are simple to port.
Since there are often many different platforms based on the same architecture (for example, ARM, Intel,
and Cirrus Logic all make their own chips based on the ARM instruction set), the HAL is divided further
into sections that are architecture-specific and sections that are platform-specific. This made the job of porting eCos to the Brutus easier because the ARM architecture was already supported. The parts of the HAL
that actually needed to be ported included about 5000 lines of C and ARM Assembly code scattered across
no more than two dozen files.
There are two different ways to run eCos code: from RAM or from ROM. Running code from ROM
requires first programming the ROM, while running code in RAM requires first downloading the code into
RAM. Since downloading code into RAM is faster and easier than reprogramming a ROM, it is generally
desirable to perform debugging on code in RAM. However, since the SA-1100 always boots from ROM,¹¹
there is a small portion of code that must run from ROM and implement some sort of download protocol to
allow other code to be downloaded and run in RAM. For the ARM Software Development Toolkit (SDT),
this small portion of code is called Angel. For eCos, which uses the GNU tool chain, this code is called a
GDB Stub. The protocol is specific to the GNU Debugger (GDB) which is used on a PC to download the
code to the Brutus over an RS-232 connection. It is called a "stub" because it only implements a small subset
of the GDB remote debugging protocol, just enough to allow code to be downloaded and run from RAM.¹²
11. When the SA-1100 is powered on, it begins executing code starting from address zero, which
happens to fall in the ROM area of memory. This makes sense because RAM will normally
contain garbage at boot time, and thus the boot code could not be run from there.
The first task in porting eCos was to get a working set of GDB stub ROMs. Since much of the initialization code in the HAL is necessary even for GDB stubs, most of the HAL had to be ported before GDB stubs
would work. All 5000 lines of code had to be ported before any real testing or debugging could be done. Fortunately, porting the GDB stubs is the majority of the work that needs to be done to get all of eCos working
(except for several additional bugs that needed to be fixed), since the remaining code is platform independent
and already debugged on other platforms.
Since GDB has a feature for using the ARM debugging protocol (adp), it can connect to a Brutus running Angel. Therefore it would be possible to use Angel to debug eCos in RAM instead of first debugging
the GDB stubs. However, various issues in connecting to Angel with gdb make this undesirable. For example, after stopping execution at a breakpoint there are bugs that prevent the execution from continuing further. This renders breakpoints useless and one is left with only single-stepping, which severely limits
debugging capability. Also, Angel does not support multi-threaded debugging (the ability to control execution on a thread-by-thread basis). So I chose to port the GDB stubs and use them instead. However, the
Angel source code is a valuable reference when writing the GDB stubs (much of the early boot code is the
same).
Since GDB stubs are a subset of eCos (they both use the same HAL), the process of compiling them is
very similar to compiling an eCos kernel. Section 2.1.1 talks about how the Brutus HAL code was prepared
for configuration and compilation in the eCos configuration tool. Section 2.1.2 goes on to discuss how the
various functions and macros in the HAL were ported (as we discussed earlier in this section). Then, Section
2.1.3 gives an overview of the boot sequence, which is the final and most difficult part of the HAL porting.
Section 2.1.4 and Section 2.1.5 describe how the SA-1100's virtual memory system is initialized at boot
time.
12. It is the necessity of this separate downloading step that qualifies eCos as an embedded
operating system. Since embedded computers often have no text displays or keyboards, all
communication and debugging must be done from a remote host.
2.1.1 Copying the SA-110 HAL as a Starting Point
Since Cygnus had already created an eCos HAL for the EBSA285 (the SA-110/21285 evaluation board), I
chose to use this as a starting point for my Brutus/SA-1100 HAL. There is enough similarity between the
SA-110 and SA-1100 that this saved some work. The shaded areas in Figure 2.1 show the features of the
SA-1100 that are identical to the SA-110. Unshaded areas are units that either did not exist in the SA-110, or
were significantly changed, or used to be part of the 21285 companion chip but were integrated onto the
SA-1100. Any code that accesses features in the unshaded areas of the SA-1100 in this figure needs to be ported
or completely re-written.
Figure 2.1: SA-1100 vs. SA-110 (from Intel SA-1100 Developer's Manual)
Figure 2.2: eCos Configuration Tool
The eCos configuration tool, seen in Figure 2.2, provides a graphical user interface to the actual eCos
source code. At the top level (seen in the figure), the user can enable or disable entire sections of the code
called packages. At lower levels, the user can perform fine-grain configurations within each package. The
hierarchical menu is not hard-coded into the tool; the tool actually generates it by looking at the source code
repository. Once a specific set of configurations have been chosen, the tool copies the source code for the
enabled packages into a build tree (separate from the source code repository) and the configurations are
written into the build tree in the form of several header files containing C macros. The rest of the eCos source
code includes these header files and uses the macros to behave as configured.
After copying all the files of the EBSA285 HAL ($ECOS_SRC/packages/hal/arm/ebsa285/)
and doing string replacements to create a distinct Brutus HAL ($ECOS_SRC/packages/hal/arm/brutus/)
in the eCos source code repository ($ECOS_SRC), it is possible to build an eCos kernel with the
new Brutus HAL using the configuration tool (Figure 2.3).
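The way generated configuration headers steer the rest of the code can be sketched roughly as follows. This is only an illustration, not the actual eCos headers: the two macros stand in for a generated header, their exact spellings (CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK, CYG_HAL_STARTUP_RAM) are assumptions based on the option names visible in the configuration tool, and the two example_ functions are hypothetical.

```c
#include <assert.h>

/* Stand-in for a header that the configuration tool would write into the
   build tree. Macro spellings and values are assumptions for this sketch. */
#define CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK 206400  /* core clock in kHz */
#define CYG_HAL_STARTUP_RAM                            /* startup type: ram */

/* Hypothetical consumer: converts the configured clock to Hz. */
unsigned long example_core_clock_hz(void)
{
    assert(CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK > 0);
    return CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK * 1000UL;
}

/* Hypothetical consumer: code is selected at compile time, so the branch
   that was not configured never reaches the binary image. */
const char *example_startup_name(void)
{
#if defined(CYG_HAL_STARTUP_RAM)
    return "ram";
#else
    return "rom";
#endif
}
```

Because the choice is made by the preprocessor, unused alternatives cost no code space, which is what makes this style of configuration attractive for a lean embedded image.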
Figure 2.3: eCos Configuration for Brutus Platform
2.1.2 Porting of Self-Contained Functions and Macros
The first file to be ported is:
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/hal_brutus.h
because it contains a large number of one-line macros describing the register locations of the SA-1100 that
are used throughout the rest of the HAL code. The register locations in this file are copied from the SA-1100
developer's manual, as are several special bitmask macros for some of the registers. Browsing these macros is a
good way to become familiar with the SA-1100's features. For example, here are the definitions of the macros for the real time clock (RTC). The register names correspond to the names used in the SA-1100 manual:
/* SA-1100 Internal Registers for System Control Module
   Real-Time Clock Definitions */
#define SA1100_REGRCNR                  REG32_PTR(0x90010004)
#define SA1100_REGRTAR                  REG32_PTR(0x90010000)
#define SA1100_REGRTSR                  REG32_PTR(0x90010010)
#define SA1100_REGRTTR                  REG32_PTR(0x90010008)
/* RTSR */
#define SA1100_ALARMDETECTED            0x1
#define SA1100_1HZRISINGEDGEDETECTED    0x2
#define SA1100_ALARMINTERRUPTENABLE     0x4
#define SA1100_1HZINTERRUPTENABLE       0x8
/* RTTR */
#define SA1100_CLOCKDIVIDERCOUNTMASK    0x0000FFFF
#define SA1100_TRIMDELETE_COUNTMASK     0x03FF0000
Once these basic macros are defined, several larger (but still small) macros and functions can be ported.
For example, below are the macros used by the eCos code to mask or unmask interrupts. These are defined
in $ECOS_SRC/packages/hal/arm/brutus/v1_2_10/src/brutus_misc.c:
Original (for SA-110):
void hal_interrupt_mask(int vector) {
    *SA110_IRQCONT_IRQENABLECLEAR = 1 << vector;
}
void hal_interrupt_unmask(int vector) {
    *SA110_IRQCONT_IRQENABLESET = 1 << vector;
}
Ported (for SA-1100):
void hal_interrupt_mask(int vector) {
    *SA1100_REGICMR &= ~(1 << vector);
}
void hal_interrupt_unmask(int vector) {
    *SA1100_REGICMR |= (1 << vector);
}
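The difference between the two versions is that the SA-110 code writes a single bit to separate "set" and "clear" registers, while the SA-1100 code performs a read-modify-write on one mask register. The following host-side sketch exercises that read-modify-write logic. It is an illustration only: the sim_ names are hypothetical, and an ordinary variable stands in for the memory-mapped register so that the code can run on a development PC.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated stand-in for the SA-1100 interrupt mask register. On real
   hardware this would be a volatile pointer to the memory-mapped register;
   the plain variable is an assumption that lets the logic run on a host. */
static volatile uint32_t sim_icmr = 0;
#define SIM_REG_ICMR (&sim_icmr)

/* Clear the bit for this vector: the interrupt is masked (disabled). */
void sim_interrupt_mask(int vector)
{
    assert(vector >= 0 && vector < 32);
    *SIM_REG_ICMR &= ~(UINT32_C(1) << vector);
}

/* Set the bit for this vector: the interrupt is unmasked (enabled). */
void sim_interrupt_unmask(int vector)
{
    assert(vector >= 0 && vector < 32);
    *SIM_REG_ICMR |= (UINT32_C(1) << vector);
}

/* Returns nonzero when the vector is currently masked. */
int sim_interrupt_is_masked(int vector)
{
    assert(vector >= 0 && vector < 32);
    return ((*SIM_REG_ICMR >> vector) & 1u) == 0;
}
```

One consequence of the single-register design is that the update is no longer a single store, so a caller that can be interrupted between the read and the write would presumably need to guard it.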
This example is representative of the type of editing that needs to be done in a large number of macros and
functions throughout the HAL in the following files (this is not necessarily an exhaustive list):
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/hal_cache.h
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/hal_platform_ints.h
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/hal_diag.h
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/pkgconf/hal_arm_brutus.h
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/plf_stub.h
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/src/plf_stub.c
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/src/hal_diag.c
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/src/brutus_misc.c
$ECOS_SRC/packages/io/serial/v1_2_10/include/pkgconf/io_serial.h
$ECOS_SRC/packages/io/serial/v1_2_10/src/arm/brutus_serial.c
2.1.3 Overview of the Boot Sequences
The most complex part of the Brutus HAL is the initialization code. Fortunately, the main flow of the boot
sequence is the same for all ARM processors and thus did not need to be ported. The file that contains the
code that runs when the processor boots is:
$ECOS_SRC/packages/hal/arm/arch/v1_2_10/src/vectors.S
and this file did not need to be edited at all. However, the very first thing this code does is call a
PLATFORMSETUP macro. This macro, which is for platform specific initialization, is defined in:
$ECOS_SRC/packages/hal/arm/brutus/v1_2_10/include/hal_platform_setup.h
and this file required a complete rewrite.
Before we discuss any details of boot-time initialization, it would be good to overview the list of actions
that are performed when eCos boots. The boot sequence is different depending on whether eCos is running
from ROM or RAM. The boot sequence for the GDB stubs is similar to ROM startup, but there are differences.
During the boot sequence, the Brutus HEX LED display is used to display numbers which indicate visibly which part of the boot process is currently executing. If the SA-1100 crashes or hangs during the boot
sequence, the user can use the value on the HEX display for clues to debug the problem. The LED value at
the end of each boot sequence is zero. Normally, the boot sequence occurs so quickly that you see nothing
but a zero on the display. If you see any other number, you know immediately that the corresponding step
has failed.
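The idea of a progress code that is written before each step can be sketched as below. This is a host-side illustration under stated assumptions, not the actual boot code (which is ARM assembly in the HAL): the LED is modeled as a variable, and all names (sim_led, sim_run_boot, step_ok, step_fail) are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-in for the Brutus HEX LED: remembers the last value. */
static int sim_led = -1;

static void sim_led_write(int value)
{
    sim_led = value;
}

typedef int (*boot_step_fn)(void);   /* a step returns 0 on success */

/* Runs the steps in order, writing a countdown code before each one.
   With count steps the codes are count-1 down to 0. If a step fails,
   the LED keeps the code of that step; after a full boot it reads 0. */
int sim_run_boot(const boot_step_fn *steps, int count)
{
    assert(steps != 0 && count > 0);
    for (int i = 0; i < count; i++) {
        sim_led_write(count - 1 - i);   /* code shown while step i runs */
        if (steps[i]() != 0)
            return sim_led;             /* stuck here: LED keeps this code */
    }
    return sim_led;                     /* 0 after a complete boot */
}

/* Two trivial steps used to exercise the sketch. */
int step_ok(void)   { return 0; }
int step_fail(void) { return 1; }
```

Writing the code before the step rather than after it is the design choice that matters: if the processor hangs inside a step, the display already identifies it.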
Table 2.1, Table 2.2, and Table 2.3 list the actions performed as part of the boot sequence for each startup type (RAM, ROM, STUBS), along with the LED values that are displayed for each. Note that some
details have been left out from these tables. The best way to discover the complete details of what happens
when eCos boots is to read the source code directly.
There are a couple of details of the boot sequences that exist for energy-efficiency, as is recommended
by the SA-1100 manual. First, notice that the instruction cache is enabled early (even before the MMU is
enabled), in order to allow the boot code to run much more efficiently. This means that later, when the MMU
is enabled, the ICACHE needs to be temporarily disabled, flushed, and re-enabled. Second, the clock frequency is set and clock switching (a feature that allows the core clock to double in frequency relative to the
memory clock) is enabled early in the boot sequence, in order for the boot code to run at the desired frequency (which can be chosen to trade off boot time with boot energy).
Hex LED   Next Boot Action Performed
Readout
8         If already in supervisor mode, jumps to step 5 (below).
7         Sets up exception vectors for undefined instruction and software interrupt exceptions.
6         Switches to supervisor mode.
5         Sets up exception vectors for IRQ, FIQ, prefetch abort, data abort.
4         Initializes stack pointers.
          Initializes CPSR, SPSR.
          Clears BSS.
3         Platform specific hardware initialization in hal_hardware_init(). For Brutus, this sets up the
          interrupt environment by masking all interrupts and setting them all to do IRQ and not FIQ.
2         (Nothing)
1         Invokes static constructors (for all the C++ code of eCos).
0         Starts the eCos kernel by calling cyg_start().
Table 2.1: Boot Sequence for RAM Startup
Hex LED   Next Boot Action Performed
Readout
(blank)   Enters SVC mode, sets frequency, flushes caches.
          Initializes HEX LED display.
F         Enables instruction cache.
E         Initializes peripheral pins and GPIOs.
          Clears OS timer count register.
D         Sets clock frequency and enables clock switching.
C         Initializes memory interfaces (DRAM waveforms, ROM type, etc.).
B         Builds virtual memory page tables.
A         Disables domain access control.
          Sets page table base address register.
9         Enables MMU and caches.
8         (Nothing)
7         Sets up exception vectors for undefined instruction and software interrupt.
6         Makes sure we are in supervisor mode.
5         Sets up exception vectors for IRQ, FIQ, prefetch abort, data abort.
4         Sets up reset exception vector (for warm reset).
          Relocates data from ROM to RAM.
          Initializes stacks.
          Initializes CPSR and SPSR.
          Clears BSS.
3         Platform specific hardware initialization in hal_hardware_init(). For Brutus, this sets up the
          interrupt environment by masking all interrupts and setting them all to do IRQ and not FIQ.
2         (Nothing)
1         Invokes static constructors (for all the C++ code of eCos).
0         Starts the eCos kernel by calling cyg_start().
Table 2.2: Boot Sequence for ROM Startup
Hex LED   Next Boot Action Performed
Readout
(blank)   Enters SVC mode, sets frequency, flushes caches.
          Initializes HEX LED display.
F         Enables instruction cache.
E         Initializes peripheral pins and GPIOs.
          Clears OS timer count register.
D         Sets clock frequency and enables clock switching.
C         Initializes memory interfaces (DRAM waveforms, ROM type, etc.).
B         Builds virtual memory page tables.
A         Disables domain access control.
          Sets page table base address register.
9         Enables MMU and caches.
8         (Nothing)
7         (Nothing)
6         Makes sure we are in supervisor mode.
5         Sets up exception vectors for software interrupt, IRQ, FIQ, prefetch abort, data abort.
4         Sets up reset exception vector (for warm reset).
          Relocates data from ROM to RAM.
          Initializes stacks.
          Initializes CPSR and SPSR.
          Clears BSS.
3         Platform specific hardware initialization in hal_hardware_init(). For Brutus, this sets up the
          interrupt environment by masking all interrupts and setting them all to do IRQ and not FIQ.
2         Initializes stubs (initializes serial port, etc.).
1         Invokes static constructors (for all the C++ code of eCos).
0         Starts the eCos kernel by calling cyg_start().
Table 2.3: Boot Sequence for STUBS Startup
2.1.4 Building the Virtual Memory Page Tables
One step of the boot sequence implemented by the PLATFORMSETUP macro (mentioned in the previous
section) is to build the virtual memory page tables. The use of virtual memory allows for flexibility in defining the memory layout. Implementing and debugging the code that builds the page tables and enables the
MMU (next section) is one of the more difficult tasks of porting eCos to the Brutus.
The layout of physical memory, shown in Figure 2.4, is already determined by the SA-1100 platform
and cannot be altered:
Start Address   Region                                                          Size
0xE8000000      Reserved                                                        384 Mbyte
0xE0000000      Zeros Bank (cache flush replacement data; reads return zero)    128 Mbyte
0xD8000000      DRAM Bank 3                                                     128 Mbyte
0xD0000000      DRAM Bank 2                                                     128 Mbyte
0xC8000000      DRAM Bank 1                                                     128 Mbyte
0xC0000000      DRAM Bank 0                                                     128 Mbyte
0xB0000000      LCD and DMA Registers                                           256 Mbyte
0xA0000000      Memory and Expansion Registers                                  256 Mbyte
0x90000000      System Control Module Registers                                 256 Mbyte
0x80000000      Peripheral Module Registers                                     256 Mbyte
0x40000000      Reserved                                                        1 GB
0x30000000      PCMCIA Socket 1 Space                                           256 Mbyte
0x20000000      PCMCIA Socket 0 Space                                           256 Mbyte
0x18000000      Static Bank Select 3                                            128 Mbyte
0x10000000      Static Bank Select 2                                            128 Mbyte
0x08000000      Static Bank Select 1                                            128 Mbyte
0x00000000      Static Bank Select 0                                            128 Mbyte
The four DRAM banks form the Dynamic Memory area (512 Mbyte), the four register regions form the
Internal Registers area (1 GB), the two PCMCIA sockets form the PCMCIA Interface area (512 Mbyte), and
the four static banks form the Static Memory area (Flash, SRAM; 512 Mbyte).
Figure 2.4: SA-1100 Memory Map for the entire 4Gb 32-bit Address Space (from Intel SA-1100 Developer's Manual)
Note that a large portion of the address space is devoted to the "Internal Registers." When addresses in
this range are used, the SA-1100 routes the data to or from special internal registers instead of to off-chip
memory. This is the mechanism by which most of the features of the chip are accessed and controlled.
Also notice that ROM is found at address 0x00000000, while DRAM begins at address 0xC0000000.
Although the SA-1100 is not the only processor with a memory layout like this, it is somewhat uncommon
to have ROM at address zero. The reason is that the ARM architecture (as well as many
others) requires the exception vectors to begin at address zero, and generally the exception vectors need to
be in RAM so that software can exchange exception handlers at runtime. Since the SA-1100 has ROM at
address zero, if you want exception vectors to be in RAM you will have to enable the MMU and create a
non-flat¹³ virtual memory mapping.
The ARM architecture defines the page tables and other details of the virtual memory system (these are
not specific to the SA-1100). A two-level page table scheme is used. The L1 page table occupies 16Kb of
memory, contains 4096 entries, and each entry governs a 1Mb section of the address space (for a total of
4Gb). The L2 page tables each occupy 1Kb of memory, contain 256 entries, and each entry governs 4Kb of
the address space (for a total of 1Mb). There is also a special "sub-page" feature that allows access control
(but not address remapping) to extend to a resolution of 1Kb. As with any multi-level page table scheme, the
use of the lower level page tables is optional. Thus, I chose to avoid the use of L2 page tables because they
waste memory and make each page table walk (which is implemented in hardware) less efficient. By
using only an L1 page table, several "holes" are left in the address space where the latter part of a range of
addresses (that is less than the 1Mb resolution, such as the 256K of boot ROM) has a mapping in the virtual
memory system but has no underlying physical memory (and thus accesses to these addresses could be
unpredictable). This is a sacrifice that I chose to make for the sake of energy efficiency.¹⁴
13. By "non-flat," I simply mean that the predicate (virtual address == physical address) does not
hold true for all locations in memory, and thus address translation is actually necessary.
14. I make no claim here as to how much this affects energy efficiency because I did not perform
actual experiments to measure this.
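An L1-only table of 1Mb sections can be sketched in C as follows. This is a minimal host-side illustration, not the HAL code itself (which builds the table from assembly at boot): the descriptor bit positions follow the ARM section descriptor format, while the function names, the flag macro names, and the software translate() model of the hardware table walk are assumptions introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES      4096u         /* 4096 entries x 1 MB = 4 GB */
#define SECTION_SHIFT   20            /* each entry governs a 1 MB section */

/* Fields of an ARM section descriptor (names are local to this sketch). */
#define DESC_SECTION    0x2u          /* bits [1:0] = 10 marks a section */
#define DESC_BUFFERABLE (1u << 2)
#define DESC_CACHEABLE  (1u << 3)
#define DESC_BIT4       (1u << 4)     /* written as 1 on ARMv4-class cores */
#define DESC_AP_RW      (3u << 10)    /* AP = 11: read/write access */

/* 4096 x 4 bytes = 16 KB. A zero entry produces a fault. On hardware the
   table must also be aligned to 16 KB; that is not modeled here. */
static uint32_t l1_table[L1_ENTRIES];

/* Map `mbytes` consecutive 1 MB sections starting at virt onto phys. */
void map_sections(uint32_t virt, uint32_t phys, uint32_t mbytes, uint32_t flags)
{
    for (uint32_t i = 0; i < mbytes; i++) {
        uint32_t index = (virt >> SECTION_SHIFT) + i;
        uint32_t base = phys + (i << SECTION_SHIFT);
        assert(index < L1_ENTRIES);
        l1_table[index] = (base & 0xFFF00000u) | DESC_AP_RW | DESC_BIT4
                        | flags | DESC_SECTION;
    }
}

/* Software model of the hardware walk: returns 1 and stores the physical
   address on success, or returns 0 when the access would fault. */
int translate(uint32_t virt, uint32_t *phys)
{
    uint32_t desc = l1_table[virt >> SECTION_SHIFT];
    if ((desc & 0x3u) != DESC_SECTION)
        return 0;
    *phys = (desc & 0xFFF00000u) | (virt & 0x000FFFFFu);
    return 1;
}
```

With only one level, a translation needs a single table lookup, which is the efficiency argument made above; the cost is that nothing smaller than 1Mb can be mapped or left unmapped.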
Table 2.4 shows the memory layout that I chose to use for eCos on the Brutus:
Purpose of Memory Area                     Size     Physical Address Range    Virtual Address Range
Boot ROM                                   1 Mb     0x00000000..0x000FFFFF    0x04000000..0x040FFFFF
Peripheral Control Module (PCM) Registers  1 Mb     0x80000000..0x800FFFFF    0x80000000..0x800FFFFF
System Control Module (SCM) Registers      1 Mb     0x90000000..0x900FFFFF    0x90000000..0x900FFFFF
Memory Control Registers                   1 Mb     0xA0000000..0xA00FFFFF    0xA0000000..0xA00FFFFF
DMA/LCD Registers                          2 Mb     0xB0000000..0xB01FFFFF    0xB0000000..0xB01FFFFF
DRAM Bank 1                                4 Mb     0xC0000000..0xC03FFFFF    0x00000000..0x003FFFFF
DRAM Bank 2                                4 Mb     0xC8000000..0xC83FFFFF    0x00400000..0x007FFFFF
DRAM Bank 3                                4 Mb     0xD0000000..0xD03FFFFF    0x00800000..0x00BFFFFF
DRAM Bank 4                                4 Mb     0xD8000000..0xD83FFFFF    0x00C00000..0x00FFFFFF
Zeros Bank                                 128 Mb   0xE0000000..0xE80FFFFF    0xE0000000..0xE80FFFFF
Table 2.4: Virtual Memory Mapping as specified in Page Tables
RAM has been remapped to address zero, while ROM has been moved up to address 0x04000000.¹⁵
The rest of the address space has a flat mapping to avoid confusion. No mapping is created for SRAM or
Flash since they are not used at this time.¹⁶ Any reads or writes to virtual addresses outside any of the ranges
in the table will produce page faults. The access permissions and cacheability are not shown in the table, but
DRAM Bank 4 is mapped uncacheable and unbufferable, so that it can be used for things such as the LCD
frame buffer (which is required to be uncacheable and unbufferable). This memory mapping is one of the
few things that will need to be adjusted when eCos is finally ported from the Brutus to the μAMPS prototype.
Once the memory map is decided, eCos needs to be configured so that the linker knows where to place
code and data when compiling eCos. Normally, this would require the direct use of a linker script, which is
beyond the level of many programmers. Therefore the eCos configuration tool has a graphical feature to aid
in the automatic generation of a linker script (and a couple of other related files).
15. This location for ROM will be okay as long as there is less than 64Mb of DRAM (otherwise
the DRAM starting at address zero will occupy space beyond 0x04000000).
16. Before running eCos, it is important that the Brutus switches be set to enable DRAM instead of
SRAM, and to enable 32-bit wide ROM accesses.
Figure 2.5 and Figure 2.6 show the memory layouts as they were defined in the
eCos configuration tool for STUBS, RAM, and ROM startup. You can see in Figure 2.5 that the GDB stub
code resides in ROM starting at address 0x04000000 and can use the entire 256Kb range of ROM if
necessary, while (writable) data is limited to the lower 16Kb of RAM. The remaining part of RAM is reserved
for downloading code that is to be run from RAM. Notice that the data section appears in both the RAM and
ROM regions in the layout. This is because this section contains initialized data and is copied (relocated)
from ROM to RAM at boot time. The 800Kb reserved section at the bottom of RAM is there to leave room for the
exception vectors.
Looking at Figure 2.6, we see that eCos programs compiled for RAM startup only use the first three
banks of DRAM (as mentioned earlier, the fourth bank is mapped uncacheable and unbufferable and is for
special uses such as the LCD buffer). The 32Kb of reserved space in the lower part of DRAM is for the GDB
stubs and the virtual memory page tables (which reside at 0x00004000). Programs compiled for ROM
startup use RAM very similarly, and use ROM in the same way it was used for STUBS startup (except now
it is much more likely that the entire 256Kb range would be needed).
Figure 2.5: Memory Layout for STUBS Startup
Figure 2.6: Memory Layouts for RAM and ROM Startup
2.1.5 Enabling the MMU
Once the page tables have been created, the next step is to enable virtual memory using them. Enabling virtual memory should be as simple as setting the page table base address register to the physical address of the
page table (0xC0004000) and turning the MMU on. However, when the memory mapping is not flat (for the
region of code that actually enables the MMU) things get complicated because the address of the next
instruction "magically" changes when the MMU is enabled (which is effectively like a jump instruction). On
some pipelined architectures, it is possible to place a branch in exactly the right position so that execution
will continue seamlessly. However, this does not work on the SA-1100, so an interesting "hack" must be performed instead.
Figure 2.7 illustrates the steps of the hack that occur prior to the activation of the MMU, and Figure 2.8
shows the steps that happen afterward:
Figure 2.7: Before Enabling the MMU (Steps 1 and 2, showing the virtual and physical address spaces and the program counter)
Figure 2.8: After Enabling the MMU (Steps 3, 4, and 5)
In Step 1, the page tables have been built normally (DRAM mapped to 0x00000000, ROM mapped to
0x04000000, etc.) and the program counter points directly to the code that is executing from ROM. If we
enabled the MMU in this condition, the next instruction would be fetched from a nonsensical location in
DRAM because the address in the program counter would suddenly be mapped there. To prevent this from
happening, Step 2 temporarily overwrites the page table entry for the corresponding page in DRAM so that
it points to the page in ROM where the code is executing. Since the memory mapping for this page is now
flat, the MMU can be safely enabled.
In Step 3, the MMU is enabled and code is still running from the same place in ROM. We now need to
update the PC, however, so that the code will be running at its new virtual address. This is done by a branch
instruction (Step 4) that sends the PC to the next instruction at its correct virtual address. Now we need to
restore the page table entry that was temporarily overwritten so that we can again access the first page of
DRAM. This is done in Step 5, and now things are back to normal with the MMU enabled.
There is one detail that was glossed over, however. Although the steps as described would work in general, there is a complication as we have defined the memory map for eCos because the page table itself is in
the first page of DRAM - the very area of memory that becomes temporarily unavailable in step 2. The
page table must have a mapping in the virtual address space, however, in order for us to restore the page
table entry in step 5. I accomplished this by creating an alias to the first page of DRAM in the page table.
The very last page table entry (which was otherwise unused) points to the first page in DRAM where the
page tables are stored. When the temporarily overwritten page table entry is restored in step 5, the code
writes to the page table at the address 0xFFF04000, which points to the correct place in DRAM. By creating this alias to the page in memory where the page tables are stored, we are assured to always be able to
write to them if necessary.
Some subtle points need to be mentioned. First, the TLBs must be flushed each time the page tables are
edited, as is done in steps 2 and 5. Second, the caches are temporarily disabled during all of these steps to
prevent aliases from being created in them after the MMU is enabled. Finally, note that any registers that
contain addresses must be updated when the MMU is enabled. For example, if the link register (LR) contained a return address, it will need to be translated before returning.
If the MMU ever needed to be disabled, there would be a similar set of steps to follow. However, with
eCos on the Brutus it is never necessary to disable the MMU.
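The five steps can be checked with a small model. The following is only a sketch in Python, not the boot code itself: it assumes a first-level table of 1 MB sections held in a dictionary, the function and constant names are hypothetical, the code being executed is assumed to sit in the first megabyte of ROM, and TLB flushing and cache handling are not modeled.

```python
SECTION_SHIFT = 20                      # 1 MB sections (first-level entries only)
OFFSET_MASK = (1 << SECTION_SHIFT) - 1

DRAM_PHYS = 0xC0000000   # DRAM in the physical address space
ROM_PHYS = 0x00000000    # ROM in the physical address space
DRAM_VIRT = 0x00000000   # where the page tables map DRAM
ROM_VIRT = 0x04000000    # where the page tables map ROM
ALIAS_VIRT = 0xFFF00000  # last entry: alias of the first DRAM section
TABLE_OFFSET = 0x4000    # page table offset inside the first DRAM section


def section(addr):
    """Index of the 1 MB section containing addr."""
    return addr >> SECTION_SHIFT


def translate(table, vaddr, mmu_on=True):
    """Virtual-to-physical translation; identity while the MMU is off."""
    if not mmu_on:
        return vaddr
    return table[section(vaddr)] | (vaddr & OFFSET_MASK)


def enable_mmu(pc):
    """Follow Steps 1-5 for code executing from ROM at physical address pc."""
    # Step 1: tables built normally, MMU still off.
    table = {
        section(DRAM_VIRT): DRAM_PHYS,
        section(ROM_VIRT): ROM_PHYS,
        section(ALIAS_VIRT): DRAM_PHYS,
    }
    # Without the hack the next fetch would land in DRAM, not in ROM.
    assert translate(table, pc) != pc

    # Step 2: make the section holding the PC flat (virtual == physical).
    slot = section(pc)
    saved = table[slot]
    table[slot] = pc & ~OFFSET_MASK

    # Step 3: MMU on; the PC value is unchanged and still fetches from ROM.
    assert translate(table, pc) == pc

    # Step 4: branch to the same instruction at its real virtual address.
    pc = ROM_VIRT + (pc - ROM_PHYS)

    # Step 5: restore the entry. The table itself is reached via the alias,
    # because the first DRAM section is not visible at DRAM_VIRT right now.
    assert translate(table, ALIAS_VIRT + TABLE_OFFSET) == DRAM_PHYS + TABLE_OFFSET
    table[slot] = saved
    return table, pc


table, pc = enable_mmu(0x00001234)
print(hex(pc), hex(translate(table, pc)))
```

After the call, the program counter holds the ROM's virtual address, and both 0x00004000 and the alias 0xFFF04000 translate to the physical page table location.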
2.2 The Debugging Process
Once all of the code in the HAL was finally ported to the Brutus as described in the various parts of Section
2.1, the next step was to compile GDB stubs and begin debugging the code in ROM. The debugging of the
early parts of the boot sequence was the hardest because the only feedback available was from the HEX LED
display (serial ports are not initialized until late in the boot sequence). To debug the code that performs the
steps described in Section 2.1.5, for example, I actually had to write code to display the values in the SA-1100 registers one digit at a time.
Once the GDB stubs are working, it is possible to connect to the Brutus with gdb and download code
just as would be done with Angel. The next step is to compile a full eCos kernel and test an eCos application.
Since eCos comes with over 150 test programs, the existing test suite was used to verify the functionality of
eCos on the Brutus. Initially, most of the tests passed. Tests that failed were mostly due to a bug in the floating point code that gcc generates to pack or unpack doubles.
Chapter 3
eCos Applications
3.1 Early Demo Programs
To verify that eCos was working reasonably well on the Brutus, and to get some more experience writing
eCos applications, a few simple demo programs were written.
My earliest test programs print messages to the serial port, which are then displayed in the gdb terminal
on the PC. It wasn't long, however, before we wanted to be able to interact with the Brutus directly using its
own peripherals. I ported the LCD and keyboard drivers from some of Intel's code (that was written to run
on Angel), and used these to write a series of graphical demos. One of the demos, for example, displays colored bouncing squares on the LCD screen. Each square is animated by a separate thread. A screen shot of
this program is shown in Figure F.4 on page 72.
The next step before implementing DVS was to write a test program to allow the frequency of the SA-1100 to be changed at runtime. I wrote this program as an extension of the graphical colored squares demo
so that any runtime errors would be visibly obvious. The keyboard is used to control frequency as desired.
The SA-1100 manual actually recommends that clock frequency only be set at boot time. However, the
SA-1100 clock frequency can actually be safely changed at any time, as long as certain peripherals are not in
use (because the clock signal to some peripherals such as serial ports is unstable during the 150 μs that it
takes for the PLL to re-lock).
One issue that surfaced in developing this program is that some peripherals need to be re-configured
when the frequency is changed. For example, the LCD pixel clock is derived from the core clock using a
configurable divider. When the core clock frequency changes, the divisor must be updated in order to maintain the same LCD refresh rate.
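The divisor update amounts to re-deriving an integer ratio after each frequency change. The sketch below assumes the simplest possible relation (pixel clock = core clock / divisor) and a hypothetical 5 MHz target pixel clock; the actual SA-1100 register encoding of the divisor is not modeled here.

```python
def lcd_divisor(core_hz, pixel_hz):
    """Integer divisor that brings core_hz / divisor closest to pixel_hz.

    Assumes pixel clock = core clock / divisor, which is a simplification
    of whatever encoding the LCD controller register really uses.
    """
    return max(1, round(core_hz / pixel_hz))


TARGET_PIXEL_HZ = 5_000_000  # hypothetical target, for illustration only

for core_hz in (206_400_000, 59_000_000):
    divisor = lcd_divisor(core_hz, TARGET_PIXEL_HZ)
    print(core_hz, divisor, core_hz / divisor)
```

Because the divisor is an integer, the pixel clock (and so the refresh rate) is only approximately preserved across core frequencies.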
3.2 Developing a DVS Demo
Now that frequency scaling worked, I was ready to begin developing a program to do actual Dynamic Voltage Scaling. The first part that needed to be developed was code to determine the processor load. The average load value over each interval is what is used (in the simplest implementation of DVS) to determine what
frequency to use in the next interval (in this case, intervals are on the order of a second or two). The method
this program uses for determining load is to create a thread of lowest priority (so that it only runs when nothing else is ready to run) with an infinite loop. A counter is incremented inside the loop, so that the counter
value can be used to determine roughly how much time the processor has spent running the lowest priority
thread. A load-monitoring program implemented in this manner was posted to the ecos-discuss mailing
list (by Andrew Lunn, an eCos user from Switzerland). This load monitoring code has a cleverly written calibration step which temporarily makes the counter thread highest priority. The counter value after one second at highest priority is used as the 100% load reference point.
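The arithmetic behind this scheme is small. The sketch below is not the posted code; it assumes the calibrated count is the value the counter reaches when the counting thread has the processor to itself for one interval, and that measurement intervals have the same length as the calibration interval. The function name is hypothetical.

```python
def load_percent(idle_count, calibrated_count):
    """Processor load over one interval, from the idle-thread counter.

    calibrated_count: counter value reached when the counting thread runs
    unopposed for a whole interval (assumed interpretation of calibration).
    """
    idle_fraction = min(idle_count / calibrated_count, 1.0)
    return 100.0 * (1.0 - idle_fraction)


print(load_percent(250, 1000))  # counter got a quarter of the interval
```

A counter that reaches only a quarter of its calibrated value means the other threads used roughly three quarters of the interval.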
One problem with this method for determining processor load is that it requires a loop to be running
when the application would otherwise be idle. Since the SA-1100 idle mode consumes significantly less
energy, it would be nice to be able to use it. An alternate scheme for determining processor load (which has
not yet been implemented) would be to make the lowest priority thread simply idle the processor, and check
the value of the OS timer count register upon entering and exiting idle mode. By subtracting the two values
it can be determined how much time the processor spent idling. Since the OS timer counts at a known frequency (32.7KHz), a calibration step would not be required.
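A sketch of the subtraction is given below. It assumes the timer is a free-running 32-bit up-counter that may wrap between the two readings, and it takes the count frequency as a parameter rather than fixing it; the function names are hypothetical and nothing here has been run on the SA-1100.

```python
TIMER_BITS = 32  # assumed width of the free-running counter


def ticks_elapsed(start, end, bits=TIMER_BITS):
    """Ticks between two readings, allowing for one counter wrap-around."""
    return (end - start) % (1 << bits)


def idle_seconds(start, end, timer_hz):
    """Time spent idle, given readings taken on entering and leaving idle."""
    return ticks_elapsed(start, end) / timer_hz


print(ticks_elapsed(0xFFFFFFF0, 0x00000010))
```

The modulo keeps the result correct when the counter wraps once during the idle period; idle periods longer than one full counter cycle would need extra bookkeeping.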
The frequency scaling and load monitoring code was enough to create a dynamic frequency scaling
demo. I was able to write a program which adjusted the clock frequency depending on the processor utilization of the application. The program tried to keep the load at a certain value (around 95%) by raising or lowering the frequency. Since the application was performing very predictable periodic tasks, this demo worked
well.
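One way such a controller could choose the next frequency is sketched below. The thesis does not spell out the rule that was used, so this is an assumption: the amount of work per interval is taken to be constant, which makes load inversely proportional to frequency. The frequency steps are the eleven core frequencies used in this work; the function name is hypothetical.

```python
FREQS_MHZ = [59.0, 73.7, 88.5, 103.2, 118.0, 132.7,
             147.5, 162.2, 176.9, 191.7, 206.4]
TARGET_LOAD = 95.0  # percent


def next_frequency(current_mhz, load_percent, target=TARGET_LOAD):
    """Lowest available frequency expected to keep load at or below target.

    Assumes the same work repeats next interval, so the required frequency
    scales as current_mhz * load / target.
    """
    needed = current_mhz * load_percent / target
    for freq in FREQS_MHZ:
        if freq >= needed:
            return freq
    return FREQS_MHZ[-1]  # saturate at the maximum frequency


print(next_frequency(206.4, 40.0))
```

This kind of rule only works well when the next interval resembles the last one, which matches the observation that the demo worked because its tasks were predictable and periodic.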
The only remaining feature needed to complete a true DVS demo was now the voltage scaling code. To
determine which voltage to use at each processor frequency, the DVS Test Board (shown in Figure F.2 on
page 71) was hooked up to the Brutus and the experiment shown in Figure 3.1 was done. This experiment
shows the energy consumed at each frequency and voltage combination (a nice illustration of why we want
to do DVS in the first place), but also shows the lowest voltages at which the processor can run at each frequency without crashing. A small safety margin was added to these voltages and the values given in
Table 3.1 were decided upon.
Figure 3.1: Measured Energy per Operation vs. Frequency and Supply Voltage (axes: Core Voltage (V), Frequency (MHz))
Core Frequency (MHz)   Core Supply (V)   D4-D0 (Hex)
206.4                  1.500             0A
191.7                  1.400             0C
176.9                  1.350             0D
162.2                  1.275             10
147.5                  1.225             12
132.7                  1.175             14
118.0                  1.100             17
103.2                  1.025             1A
88.5                   0.975             1C
73.7                   0.925             1E
59.0                   0.900             1F
Table 3.1: Voltages used with DVS for each Clock Frequency
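In software, Table 3.1 reduces to a lookup from frequency to supply voltage and control code. The sketch below also adds a first-order estimate that is not taken from the measurements: dynamic switching energy per operation is assumed to scale with the square of the core voltage, ignoring leakage, the regulator, and the rest of the board. Function names are hypothetical.

```python
# (core frequency in MHz, core supply in V, D4-D0 code) from Table 3.1
OPERATING_POINTS = [
    (59.0, 0.900, 0x1F), (73.7, 0.925, 0x1E), (88.5, 0.975, 0x1C),
    (103.2, 1.025, 0x1A), (118.0, 1.100, 0x17), (132.7, 1.175, 0x14),
    (147.5, 1.225, 0x12), (162.2, 1.275, 0x10), (176.9, 1.350, 0x0D),
    (191.7, 1.400, 0x0C), (206.4, 1.500, 0x0A),
]


def operating_point(freq_mhz):
    """Return the (frequency, voltage, code) entry for a table frequency."""
    for point in OPERATING_POINTS:
        if point[0] == freq_mhz:
            return point
    raise ValueError("frequency is not one of the Table 3.1 entries")


def relative_energy(volts, reference_volts=1.500):
    """Assumed V-squared scaling of energy per operation (first order only)."""
    return (volts / reference_volts) ** 2


freq, volts, code = operating_point(59.0)
print(freq, volts, hex(code), relative_energy(volts))
```

The measured data should be preferred over this estimate wherever both are available, since the measurements include everything drawn from the board supply.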
The test application that was chosen to demonstrate DVS is variable-length filtering. The DVS demo
program performs FIR filtering of an input signal using simple convolution. By varying the length of the FIR
filter, differing amounts of computation are required to perform the filtering and the output signal quality
varies. Figure 3.2 shows how the resulting energy (per instruction) is affected by varying filter length/quality.
For this application, DVS provides a 60% energy savings over frequency scaling alone.
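For reference, direct-form FIR filtering by convolution can be sketched as follows. This is not the demo's code: how the demo chooses its shorter filters is not described here, so simple truncation of the tap list is used as a stand-in, and the names are hypothetical.

```python
def fir_filter(signal, taps):
    """Direct-form FIR: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:          # samples before the start count as zero
                acc += h * signal[n - k]
        out.append(acc)
    return out


def truncated(taps, length):
    """Shorter filter made by keeping only the first `length` taps (assumed)."""
    return taps[:length]


impulse = [1.0, 0.0, 0.0, 0.0]
print(fir_filter(impulse, [0.5, 0.25]))
```

The work per output sample grows with the number of taps, which is why a shorter filter lowers the processor load and lets DVS select a lower frequency and voltage.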
Figure 3.2: Energy per Operation with Voltage Scaling vs. without Voltage Scaling (horizontal axis: Filter Quality (Impulse Response Length))
Figure 3.3: Screen Shot from DVS Demo
Chapter 4
Ideas for Energy Efficiency Improvements
Although my experiments showed that even a simple implementation of DVS can give good results, this
work is only a beginning. More energy efficient techniques would need to be used to better meet the needs of
the μAMPS project. This chapter explores several ideas, some better than others.
4.1 DVS-Related Techniques
4.1.1 Choose the Right Boot Frequency
It was alluded to earlier (in Chapter 2) that the clock frequency and voltage that are used during system initialization can be traded off with the time the initialization takes. If initialization time is not critical and other
parts of the system are not burning significant amounts of energy during this time, the processor should be
kept at the lowest possible frequency (and voltage) while booting. However, in other cases it may be ideal to
boot at higher frequencies. This is a trade-off that should be considered once more details of a particular
μAMPS application become clear. This technique is only useful in applications where the nodes are for
some reason frequently rebooted (recall that the SA-1100 must reboot when coming out of sleep mode).
4.1.2 Energy vs. Quality Scaling
The OS can help DVS spend more time at lower voltages by appropriately varying the quality of certain features. For example, software floating point operations can be done with varying levels of accuracy as
required by the application.
4.1.3 Thread-based Voltage Scheduling
The simple interval-based DVS technique used in the experiment of Chapter 3 will perform poorly for certain applications with irregular load patterns. It has been suggested ([22]-[25]) that using information about
the execution patterns of individual threads could provide better input for voltage scheduling algorithms.
Perhaps the voltage scheduling could actually be integrated into the OS scheduler. Unfortunately, I did not
have time to further develop this idea.
4.2 Non-DVS Techniques
4.2.1 Know your Hardware
There is no substitute for understanding the details of the hardware that affect energy efficiency. Examples of
features significant for energy efficiency on the SA-1100 are:
- Be careful not to let any of the SA-1100 GPIO pins float (see page 11-184 of the SA-1100 manual).
Floating pins can cause unnecessary transitions to occur in the pads of the SA-1100, which are
powered by the 3.3V supply and thus can expend significant amounts of energy. All GPIOs should
either be configured as outputs, or driven by the devices at the other end. Since the reset state of the
SA-1100 is for all GPIOs to be inputs (to avoid contention), the boot sequence should configure the
correct pins as outputs as soon as possible.
- Use FIQs when appropriate. Avoiding the unnecessary saving and restoring of registers (necessary
for IRQs) can be good both for performance and for energy efficiency.
- In the special case of a system that uses an LCD panel that does not require the AC bias signal from
the SA-1100's LCD controller, the AC bias level should be set to the minimum value to save
energy.
- Make sure the DRAM waveform configuration registers are set up properly. The detailed timing of
the signals between the SA-1100 and DRAM can affect energy on the memory bus. There may be
different choices of timing configurations that work equally well, but some may be more energy
efficient than others.
- Use DMA-driven (instead of interrupt-driven) I/O when appropriate. Not only does this prevent the
ARM core from doing extra work, but also the DMA controller can do more efficient block transfers, saving energy both on the SA-1100 and also in the external memory system.
4.2.2 Efficient/Restricted use of Memory
In some systems the DRAM consumes a large percentage of overall power. Special attention to the use of
memory by the OS can affect this. For example, the memory map that is chosen will affect energy consumption in the TLBs and elsewhere. If half of the time the SA-1100 is accessing memory addresses in the range
0xFFFF#### and the other half of the time it is accessing memory in the range 0x0000####, there will
be a lot of transitions occurring in the TLBs and caches because of the differing upper address bits. Defining
the memory map so that the two ranges are adjacent might save energy. Keeping the memory map relatively
flat can also help avoid the use of L2 page tables, which take more memory and make page table walks take
more energy. Details in how memory allocation is performed will also make a difference in the memory
address access patterns that could affect energy consumption.
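The effect of the memory map on address bit transitions can be illustrated by counting how many bits differ between consecutive addresses. This is only a crude proxy chosen for illustration, not a measurement and not an energy model; the address sequences below are hypothetical.

```python
def bit_transitions(addresses):
    """Total number of address bits that flip between consecutive accesses."""
    return sum(bin(a ^ b).count("1")
               for a, b in zip(addresses, addresses[1:]))


# Alternating between 0xFFFF#### and 0x0000#### flips all 16 upper bits.
far_apart = [0xFFFF0000, 0x00000000, 0xFFFF0000, 0x00000000]
# Alternating between two adjacent 64 KB ranges flips a single bit.
adjacent = [0x00010000, 0x00000000, 0x00010000, 0x00000000]

print(bit_transitions(far_apart), bit_transitions(adjacent))
```

Under this proxy the adjacent layout causes far fewer transitions for the same access pattern, which is the intuition behind placing frequently alternating ranges next to each other.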
Chapter 5
Conclusion
Although I have laid the groundwork for the μAMPS OS, there is plenty of work left to do. Future work
might include:
- More research on DVS.
- μAMPS application fine-tuning: Do benchmarking on the actual μAMPS system and fine-tune OS
and application for best energy efficiency.
- Implement an ARMulator model for the μAMPS board to allow energy-true simulation of software.
- Compiler enhancements for energy efficiency, such as using the Thumb instruction set in addition
to ARM instructions for code compactness (Thumb is not supported on the StrongARM 1100 but
another ARM chip could be used).
- Consider how to make higher-level OS's, such as Linux, more energy efficient (file systems, virtual
memory paging, networking, dynamic memory allocation, process handling, parallel/distributed
processing, etc.).
I am glad to have had this opportunity to learn and grow in several areas before my MIT education is
drawn to a close. Through the course of writing this thesis I have been introduced to real-time operating systems, embedded software development, the ARM architecture, and more details of the GNU toolchain and C
programming. The μAMPS project now has a small OS to use on its prototype hardware and to begin
improving upon. We also now have some real experience with the energy savings that can be achieved by
DVS. This project has been challenging and even overwhelming at times, but the experience is valuable and
will perhaps help future projects go more smoothly.
Appendix A
DVS Test Board Specifications
This appendix gives some specifications for the DVS Test Board (see Figure F.2 on page 71) that was
designed by Rex Min for use with the Brutus.
A.1 Control Signals from Brutus to DVS Test Board
We used 5 GPIO signals to control the voltage level. Fortunately, the Brutus has exactly 5 GPIO pins connected to easily accessible test points. Wires were soldered to these test points for connection to the DVS test
board. Table A.1 shows how the control wires were connected:
Test Point   DVS Board Input   GPIO   Also Functions As
TP16         D4                9      left green LED
TP17         D3                8      right green LED
TP93         D2                20     red LED
TP25         D1                26     RCLK out
TP26         D0                27     32KHz Out
Table A.1: Voltage Control Signals
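Given the wiring in Table A.1, a 5-bit D4-D0 code can be turned into two GPIO bit masks, one for the pins to drive high and one for the pins to drive low. The sketch below assumes active-high signals and assumes the masks would then be written to the GPIO set and clear registers; the function name is hypothetical and this is not the code used on the Brutus.

```python
# D4-D0 bit position -> GPIO number, from Table A.1
GPIO_FOR_BIT = {4: 9, 3: 8, 2: 20, 1: 26, 0: 27}


def gpio_masks(code):
    """Return (set_mask, clear_mask) of GPIO bits for a 5-bit D4-D0 code."""
    if not 0 <= code <= 0x1F:
        raise ValueError("code must fit in five bits")
    set_mask = 0
    clear_mask = 0
    for bit, gpio in GPIO_FOR_BIT.items():
        if (code >> bit) & 1:
            set_mask |= 1 << gpio      # pin driven high (assumed active high)
        else:
            clear_mask |= 1 << gpio    # pin driven low
    return set_mask, clear_mask


set_mask, clear_mask = gpio_masks(0x0A)   # code for 1.500 V
print(hex(set_mask), hex(clear_mask))
```

Using separate set and clear masks avoids a read-modify-write of the other GPIO pins, which are not part of the voltage control.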
A.2 Specification for DVS Test Board Inputs
Table A.2 shows which values of the DVS board inputs correspond to which voltage output levels. This
information was used to write the code that allows the Brutus to set the voltage. There is more than one
value that produces 1.250 V, but only one of them is used by the software to produce this voltage level. If
inputs D4 or D3 to the DVS test board are left floating, the board will take the value from its switches instead of
from the control wires. The board was designed to honor the SA-1100's maximum safe voltage of 1.6 V,
even though the LTC1736 regulator chip that was used can output up to 2 V.
D4-D0 (Hex)   D4-D0 (Decimal)   D4-D0 (Binary)   Output Voltage (to SA-1100)
00            0                 00000            S
01            1                 00001            S
02            2                 00010            S
03            3                 00011            S
04            4                 00100            S
05            5                 00101            S
06            6                 00110            S
07            7                 00111            S
08            8                 01000            1.600
09            9                 01001            1.550
0A            10                01010            1.500
0B            11                01011            1.450
0C            12                01100            1.400
0D            13                01101            1.350
0E            14                01110            1.300
0F            15                01111            1.250*
10            16                10000            1.275
11            17                10001            1.250
12            18                10010            1.225
13            19                10011            1.200
14            20                10100            1.175
15            21                10101            1.150
16            22                10110            1.125
17            23                10111            1.100
18            24                11000            1.075
19            25                11001            1.050
1A            26                11010            1.025
1B            27                11011            1.000
1C            28                11100            0.975
1D            29                11101            0.950
1E            30                11110            0.925
1F            31                11111            0.900
Table A.2: DVS Test Board Voltage Control Signals
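The numeric rows of Table A.2 follow two linear ranges, so the table can be reproduced with a short function. The sketch below works in millivolts to avoid rounding problems and rejects codes 00 to 07, for which the table gives no numeric voltage; the function name is hypothetical.

```python
def output_millivolts(code):
    """Output voltage in mV for a D4-D0 code, per the rows of Table A.2."""
    if not 0x08 <= code <= 0x1F:
        raise ValueError("no numeric voltage is listed for this code")
    if code <= 0x0F:
        return 2000 - 50 * code           # 1.600 V down to 1.250 V, 50 mV steps
    return 1275 - 25 * (code - 0x10)      # 1.275 V down to 0.900 V, 25 mV steps


for code in (0x08, 0x0F, 0x10, 0x11, 0x1F):
    print(hex(code), output_millivolts(code))
```

Codes 0F and 11 both give 1250 mV, which is the duplicate mentioned above.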
A.3 AC Characteristics
Figure A.1 shows a scope plot of the output voltage of the DVS board (while driving the Brutus) as the voltage is changed between various levels. Zooming in revealed that the rising and falling transients are less than
100 μs in all cases. This is roughly the same amount of time that the SA-1100 takes to re-lock its PLL when
changing clock frequency (150 μs). Because these times are on the same order of magnitude, it is not necessary for the DVS software to introduce a waiting period between the adjustment of voltage and the adjustment of frequency.
Figure A.1: Transients in output voltage of DVS Board (horizontal axis: Seconds)
Appendix B
eCos Regression Test Results
Table B.1 is a list of the regression test programs that are included with eCos 1.3.1. For each test, the result
of running it on the debugged Brutus port is indicated. Note that the serial driver still contains a bug that
will need to be fixed later, and thus a few tests failed.
Test                                                Result
hal/arm/brutus/v1_3_1/tests/dram-test               P
hal/common/v1_3_1/tests/cache                       P
hal/common/v1_3_1/tests/context                     P
hal/common/v1_3_1/tests/intr                        N/A
io/serial/v1_3_1/tests/serial1                      P
io/serial/v1_3_1/tests/serial2                      FAIL
io/serial/v1_3_1/tests/serial3                      FAIL
io/serial/v1_3_1/tests/serial4                      FAIL
io/serial/v1_3_1/tests/serial5                      FAIL
io/serial/v1_3_1/tests/tty1                         FAIL
io/serial/v1_3_1/tests/tty2                         FAIL
kernel/v1_3_1/tests/bin_sem0                        P
kernel/v1_3_1/tests/bin_sem1                        P
kernel/v1_3_1/tests/bin_sem2                        P
kernel/v1_3_1/tests/clock0                          P
kernel/v1_3_1/tests/clock1                          P
kernel/v1_3_1/tests/clockcnv                        P
kernel/v1_3_1/tests/cnt_sem0                        P
kernel/v1_3_1/tests/cnt_sem1                        P
kernel/v1_3_1/tests/except1                         P
kernel/v1_3_1/tests/flag0                           P
kernel/v1_3_1/tests/flag1                           P
kernel/v1_3_1/tests/intr0                           P
kernel/v1_3_1/tests/kclock0                         P
kernel/v1_3_1/tests/kclock1                         P
kernel/v1_3_1/tests/kexcept1                        P
kernel/v1_3_1/tests/kintr0                          P
kernel/v1_3_1/tests/kmbox1                          P
kernel/v1_3_1/tests/kmemfix1                        P
kernel/v1_3_1/tests/kmemvar1                        P
kernel/v1_3_1/tests/kmutex0                         P
kernel/v1_3_1/tests/kmutex1                         P
kernel/v1_3_1/tests/ksched1                         P
kernel/v1_3_1/tests/ksem0                           P
kernel/v1_3_1/tests/ksem1                           P
kernel/v1_3_1/tests/kflag0                          P
kernel/v1_3_1/tests/kflag1                          P
kernel/v1_3_1/tests/kthread0                        P
kernel/v1_3_1/tests/kthread1                        P
kernel/v1_3_1/tests/mbox1                           P
kernel/v1_3_1/tests/memfix1                         P
kernel/v1_3_1/tests/memfix2                         P
kernel/v1_3_1/tests/memvar1                         P
kernel/v1_3_1/tests/memvar2                         P
kernel/v1_3_1/tests/mutex0                          P
kernel/v1_3_1/tests/mutex1                          P
kernel/v1_3_1/tests/mutex2                          P
kernel/v1_3_1/tests/mutex3                          P
kernel/v1_3_1/tests/sched1                          P
kernel/v1_3_1/tests/sync2                           P
kernel/v1_3_1/tests/sync3                           P
kernel/v1_3_1/tests/thread0                         P
kernel/v1_3_1/tests/thread1                         P
kernel/v1_3_1/tests/thread2                         P
kernel/v1_3_1/tests/release                         P
kernel/v1_3_1/tests/kill                            P
kernel/v1_3_1/tests/thread_gdb                      P
kernel/v1_3_1/tests/tm_basic                        P
kernel/v1_3_1/tests/dhrystone                       P
kernel/v1_3_1/tests/stress_threads                  P
kernel/v1_3_1/tests/kcache1                         P
kernel/v1_3_1/tests/kcache2                         P
language/c/libc/v1_3_1/tests/ctype/ctype            P
language/c/libc/v1_3_1/tests/i18n/setlocale         P
language/c/libc/v1_3_1/tests/setjmp/setjmp          P
language/c/libc/v1_3_1/tests/signal/signal1         P
language/c/libc/v1_3_1/tests/signal/signal2         N/A
language/c/libc/v1_3_1/tests/stdio/sprintf1         P
language/c/libc/v1_3_1/tests/stdio/sprintf2         P
language/c/libc/v1_3_1/tests/stdio/sscanf           P
language/c/libc/v1_3_1/tests/stdio/stdiooutput      P
language/c/libc/v1_3_1/tests/stdlib/abs             P
language/c/libc/v1_3_1/tests/stdlib/atexit          P
language/c/libc/v1_3_1/tests/stdlib/atoi            P
language/c/libc/v1_3_1/tests/stdlib/atol            P
language/c/libc/v1_3_1/tests/stdlib/bsearch         P
language/c/libc/v1_3_1/tests/stdlib/div             P
language/c/libc/v1_3_1/tests/stdlib/getenv          P
language/c/libc/v1_3_1/tests/stdlib/labs            P
language/c/libc/v1_3_1/tests/stdlib/ldiv            P
language/c/libc/v1_3_1/tests/stdlib/qsort           P
language/c/libc/v1_3_1/tests/stdlib/malloc1         P
language/c/libc/v1_3_1/tests/stdlib/malloc2         P
language/c/libc/v1_3_1/tests/stdlib/malloc3         P
language/c/libc/v1_3_1/tests/stdlib/rand1           P
language/c/libc/v1_3_1/tests/stdlib/rand2           P
language/c/libc/v1_3_1/tests/stdlib/rand3           P
language/c/libc/v1_3_1/tests/stdlib/rand4           P
language/c/libc/v1_3_1/tests/stdlib/realloc         P
language/c/libc/v1_3_1/tests/stdlib/srand           P
language/c/libc/v1_3_1/tests/stdlib/strtol          P
language/c/libc/v1_3_1/tests/stdlib/strtoul         P
language/c/libc/v1_3_1/tests/string/memchr          P
language/c/libc/v1_3_1/tests/string/memcmp1         P
language/c/libc/v1_3_1/tests/string/memcmp2         P
language/c/libc/v1_3_1/tests/string/memcpy1         P
language/c/libc/v1_3_1/tests/string/memcpy2         P
language/c/libc/v1_3_1/tests/string/memmove1        P
language/c/libc/v1_3_1/tests/string/memmove2        P
language/c/libc/v1_3_1/tests/string/memset          P
language/c/libc/v1_3_1/tests/string/strcat1         P
language/c/libc/v1_3_1/tests/string/strcat2         P
language/c/libc/v1_3_1/tests/string/strchr          P
language/c/libc/v1_3_1/tests/string/strcmp1         P
language/c/libc/v1_3_1/tests/string/strcmp2         P
language/c/libc/v1_3_1/tests/string/strcoll1        P
language/c/libc/v1_3_1/tests/string/strcoll2        P
language/c/libc/v1_3_1/tests/string/strcpy1         P
language/c/libc/v1_3_1/tests/string/strcpy2         P
language/c/libc/v1_3_1/tests/string/strcspn         P
language/c/libc/v1_3_1/tests/string/strlen          P
language/c/libc/v1_3_1/tests/string/strncat1        P
language/c/libc/v1_3_1/tests/string/strncat2        P
language/c/libc/v1_3_1/tests/string/strncpy1        P
language/c/libc/v1_3_1/tests/string/strncpy2        P
language/c/libc/v1_3_1/tests/string/strpbrk         P
language/c/libc/v1_3_1/tests/string/strrchr         P
language/c/libc/v1_3_1/tests/string/strspn          P
language/c/libc/v1_3_1/tests/string/strstr          P
language/c/libc/v1_3_1/tests/string/strtok          P
language/c/libc/v1_3_1/tests/string/strxfrm1        P
language/c/libc/v1_3_1/tests/string/strxfrm2        P
language/c/libc/v1_3_1/tests/time/asctime           P
language/c/libc/v1_3_1/tests/time/clock             P
language/c/libc/v1_3_1/tests/time/ctime             P
language/c/libc/v1_3_1/tests/time/gmtime            P
language/c/libc/v1_3_1/tests/time/localtime         P
language/c/libc/v1_3_1/tests/time/mktime            P
language/c/libc/v1_3_1/tests/time/strftime          P
language/c/libc/v1_3_1/tests/time/time              P
language/c/libm/v1_3_1/tests/vectors/acos           P
language/c/libm/v1_3_1/tests/vectors/asin           P
language/c/libm/v1_3_1/tests/vectors/atan           P
language/c/libm/v1_3_1/tests/vectors/atan2          P
language/c/libm/v1_3_1/tests/vectors/ceil           P
language/c/libm/v1_3_1/tests/vectors/cos            P
language/c/libm/v1_3_1/tests/vectors/cosh           P
language/c/libm/v1_3_1/tests/vectors/exp            P
language/c/libm/v1_3_1/tests/vectors/fabs           P
language/c/libm/v1_3_1/tests/vectors/floor          P
language/c/libm/v1_3_1/tests/vectors/fmod           P
language/c/libm/v1_3_1/tests/vectors/frexp          P
language/c/libm/v1_3_1/tests/vectors/ldexp          P
language/c/libm/v1_3_1/tests/vectors/log            P
language/c/libm/v1_3_1/tests/vectors/log10          P
language/c/libm/v1_3_1/tests/vectors/modf           P
language/c/libm/v1_3_1/tests/vectors/pow            P
language/c/libm/v1_3_1/tests/vectors/sin            P
language/c/libm/v1_3_1/tests/vectors/sinh           P
language/c/libm/v1_3_1/tests/vectors/sqrt           P
language/c/libm/v1_3_1/tests/vectors/tan            P
language/c/libm/v1_3_1/tests/vectors/tanh           P
devs/wallclock/v1_3_1/tests/wallclock               P
devs/watchdog/v1_3_1/tests/watchdog                 P
compat/uitron/v1_3_1/tests/test1                    P
compat/uitron/v1_3_1/tests/test2                    P
compat/uitron/v1_3_1/tests/test3                    P
compat/uitron/v1_3_1/tests/test4                    P
compat/uitron/v1_3_1/tests/test5                    P
compat/uitron/v1_3_1/tests/test6                    P
compat/uitron/v1_3_1/tests/test7                    P
compat/uitron/v1_3_1/tests/test8                    P
compat/uitron/v1_3_1/tests/test9                    P
compat/uitron/v1_3_1/tests/testcx2                  P
compat/uitron/v1_3_1/tests/testcx3                  P
compat/uitron/v1_3_1/tests/testcx4                  P
compat/uitron/v1_3_1/tests/testcx5                  P
compat/uitron/v1_3_1/tests/testcx6                  P
compat/uitron/v1_3_1/tests/testcx7                  P
compat/uitron/v1_3_1/tests/testcx8                  P
compat/uitron/v1_3_1/tests/testcx9                  P
compat/uitron/v1_3_1/tests/testcxx                  P
compat/uitron/v1_3_1/tests/testintr                 P
Table B.1: eCos Regression Tests Results for Brutus Port
Appendix C
The Big Picture of StrongARM Power Consumption
Although this data [1] is for the StrongARM 110 and not the 1100, it gives a good general idea of how much
power goes to each functional unit in a processor of this type. In the StrongARM 1100, we would expect a
considerable amount of power to also be consumed by certain peripheral controllers (such as the LCD controller) when they are enabled.
Unit            Power
I-cache         27%
I-box           18%
D-cache         16%
Clock           10%
IMMU            9%
E-box           8%
DMMU            8%
Write Buffer    2%
Bus Interface   2%
PLL             1%
Table C.1: Breakdown of Power Consumption of SA-110 processor when running Dhrystone
Appendix D
Instructions
D.1 Building eCos for the Brutus
The following directions are for building eCos on Linux. This is a bit harder than building on WinNT, since
there you can do everything inside the eCos Configuration Tool. When working on Linux, however, you are
doing mostly the same thing that the configuration tool does but without the convenience of the graphical
interface. There are a couple of exceptions: for example, you must use the configuration tool if you need to
edit a memory layout. Also note that when you edit the configurations manually on Linux, there is no automatic checking of the 'requires' and 'precludes' rules. You should be very careful not to violate these rules
unless you know what you are doing.
Use the following steps to guide you through the process:
1. Set the following variables appropriately (these values are only examples):
    setenv ECOS_SRC /opt/ecos/src/ecos-1.2.10
    setenv ECOS_BUILD /opt/ecos/build/ecos-ram-1.2.10
    setenv ECOS_INSTALL /opt/ecos/install/ecos-ram-1.2.10
2. Use one of the commands below to create an eCos build tree. Add options as necessary to control
which packages are included in the build.
    RAM:
    tclsh $ECOS_SRC/packages/pkgconf.tcl --force --target=arm --platform=brutus --startup=ram --prefix=$ECOS_INSTALL --builddir=$ECOS_BUILD --srcdir=$ECOS_SRC/packages --enable-CYGPKG_HAL --enable-CYGPKG_INFRA --enable-CYGPKG_KERNEL --enable-CYGPKG_LIBC --enable-CYGPKG_LIBM --enable-CYGPKG_ERROR --enable-CYGPKG_HAL_ARM --enable-CYGPKG_HAL_ARM_BRUTUS --enable-CYGPKG_IO --enable-CYGPKG_IO_SERIAL --enable-CYGPKG_DEVICES_WALLCLOCK
    ROM:
    tclsh $ECOS_SRC/packages/pkgconf.tcl --force --target=arm --platform=brutus --startup=rom --prefix=$ECOS_INSTALL --builddir=$ECOS_BUILD --srcdir=$ECOS_SRC/packages --enable-CYGPKG_HAL --enable-CYGPKG_INFRA --enable-CYGPKG_KERNEL --enable-CYGPKG_LIBC --enable-CYGPKG_LIBM --enable-CYGPKG_ERROR --enable-CYGPKG_HAL_ARM --enable-CYGPKG_HAL_ARM_BRUTUS --enable-CYGPKG_IO --enable-CYGPKG_IO_SERIAL --enable-CYGPKG_DEVICES_WALLCLOCK
    STUBS:
    tclsh $ECOS_SRC/packages/pkgconf.tcl --force --target=arm --platform=brutus --startup=stubs --prefix=$ECOS_INSTALL --builddir=$ECOS_BUILD --srcdir=$ECOS_SRC/packages --enable-CYGPKG_HAL --enable-CYGPKG_INFRA --enable-CYGPKG_HAL_ARM --enable-CYGPKG_HAL_ARM_BRUTUS
3. cd $ECOS_BUILD/pkgconf
4. emacs pkgconf.mak
Replace "1100" with "110" until a newer version of gcc has been installed.
5. emacs *.h
Edit the header files to configure the fine-grain options of eCos. Listed below with each type of configuration are some examples of variables you may need to change from their defaults. Note that
some header files are automatically generated and you should never edit them by hand.
Fine grain configuration for RAM startup:
    hal.h:
    #define CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS
    #define CYGDBG_HAL_DEBUG_GDB_BREAK_SUPPORT
    #undef  CYGDBG_HAL_DEBUG_GDB_CTRLC_SUPPORT
    #define CYGDBG_HAL_DEBUG_GDB_THREAD_SUPPORT
    hal_arm_brutus.h:
    #define CYGHWR_HAL_ARM_BRUTUS_STARTUP ram
    #define CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK 191700
    libc.h:
    #define CYGSEM_LIBC_PER_THREAD_RAND
    #define CYGNUM_LIBC_MALLOC_MEMPOOL_SIZE 4000000
    io_serial.h:
    #define CYGPKG_IO_SERIAL_ARM_BRUTUS
    #define CYGPKG_IO_SERIAL_ARM_BRUTUS_SERIAL1
Fine grain configuration for ROM startup:
    hal.h:
    #define CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS
    #define CYGDBG_HAL_DEBUG_GDB_BREAK_SUPPORT
    #undef  CYGDBG_HAL_DEBUG_GDB_CTRLC_SUPPORT
    #define CYGDBG_HAL_DEBUG_GDB_THREAD_SUPPORT
    hal_arm_brutus.h:
    #define CYGHWR_HAL_ARM_BRUTUS_STARTUP rom
    #define CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK 191700
    libc.h:
    #define CYGSEM_LIBC_PER_THREAD_RAND
    #define CYGNUM_LIBC_MALLOC_MEMPOOL_SIZE 4000000
Fine grain configuration for ROM stubs:
    hal.h:
    #undef  CYGFUN_HAL_COMMON_KERNEL_SUPPORT
    #undef  CYGPKG_HAL_EXCEPTIONS
    #define CYGDBG_HAL_DEBUG_GDB_INCLUDE_STUBS
    #define CYGDBG_HAL_DEBUG_GDB_BREAK_SUPPORT
    #undef  CYGDBG_HAL_DEBUG_GDB_CTRLC_SUPPORT
    #undef  CYGDBG_HAL_DEBUG_GDB_THREAD_SUPPORT
    #define CYG_HAL_ROM_MONITOR
    hal_arm_brutus.h:
    #define CYGHWR_HAL_ARM_BRUTUS_STARTUP stubs
    #define CYGHWR_HAL_ARM_BRUTUS_PROCESSOR_CLOCK 191700
6. cd ..
7. setenv CC arm-elf-gcc
8. make
This will generate the install tree, which will contain the eCos library that you need to link with
your eCos applications. Watch carefully for any warnings during the build.
For stubs only, to generate the actual image to burn into the flash, you now do this:
1. make -C hal/common/v1_2_10/src/stubrom
2. cd hal/common/v1_2_10/src/stubrom
3. arm-elf-objcopy -O binary stubrom stubrom.img
4. Now follow the directions for writing the stubrom.img into a flash.
If you compiled eCos for RAM or ROM start, now you can build your applications with this new kernel by
pointing the PKG_INSTALL_DIR variable in their makefiles to $ECOS_INSTALL.
D.2 Programming Flash Memories
If you compiled your eCos application as a ROM image or built eCos gdb stubs as a ROM image, you will
need to program the image onto a pair of flashes. You will need to have the ARM SDT installed to do this.
The installation CD contains a version for Solaris that you should install on a Sun somewhere.
Make sure that the flashes you want to program are in the left-hand pair of sockets (labelled U34, U44),
that Angel is in the right-hand sockets (labelled U35, U45), and that there is a serial cable with a null
modem connecting the Sun to the front-most serial port on the Brutus.
Now execute the following instructions on the Sun (assuming both fmu.axf and your compiled stubrom.img are in the current directory):
    armsd -remote -adp -port 1 -line 38400
    load fmu.axf
    go
    writeflash 262144 stubrom.img
    readflash 262144 stubrom.chk
    quit
    quit
    diff stubrom.img stubrom.chk
Appendix E
Raw Data From Experiments
E.1 Data from Voltage/Frequency Scaling Experiment
Table E.1 gives the measured data that was used to generate Figure 3.1 on page 49. This data was
acquired on February 18, 2000, using a Keithley 2400 Sourcemeter. The numbers are the current in mA drawn
from the nominal 5V supply (which actually averaged about 5.16V) to the DVS Test Board while it was
powering the Brutus. The Brutus was running an eCos application that keeps the processor at 100% load and
allows voltage and frequency to be adjusted from the keyboard.
Note that there are some additional voltage values in this data (0.925, 0.975, 1.025, 1.075, 1.125, 1.175,
1.225, and 1.275) that were excluded from Figure 3.1 because they would have caused a confusing scale
change in the voltage axis at 1.300V. The shaded area of the table corresponds to voltage/frequency
combinations at which the SA-1100 failed to run. By comparing the voltage values along the upper edge of the
shaded area to the values in Table 3.1 on page 49, you can see that we have used a suitable safety margin to
prevent DVS from causing the SA-1100 to crash under adverse conditions (such as high temperatures).
Finally, note that the upper bound on energy savings that can be achieved with our DVS on this hardware
follows from the energy per clock cycle (supply voltage times supply current divided by frequency) at the
lowest setting (59.0 MHz at 0.900V, 8.24 mA) relative to the highest setting (206.4 MHz at 1.500V, 61.02 mA):
$$1 - \frac{\frac{1}{59000000} \times 5.16 \times 0.00824}{\frac{1}{206400000} \times 5.16 \times 0.06102} = 52.8\%$$
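The arithmetic behind this bound can be checked with a short script. This is a sketch only: the helper name energy_per_cycle is hypothetical, and the script assumes that the two operating points are the 59.0 MHz / 0.900V and 206.4 MHz / 1.500V entries of Table E.1, with the measured 5.16V supply voltage used for both.

```python
def energy_per_cycle(supply_volts: float, current_amps: float, freq_hz: float) -> float:
    """Energy drawn from the supply per clock cycle, in joules."""
    return supply_volts * current_amps / freq_hz


SUPPLY_VOLTS = 5.16  # measured average of the nominal 5 V supply

# Lowest setting: 59.0 MHz at a 0.900 V core voltage, 8.24 mA supply current.
e_low = energy_per_cycle(SUPPLY_VOLTS, 0.00824, 59.0e6)
# Highest setting: 206.4 MHz at a 1.500 V core voltage, 61.02 mA supply current.
e_high = energy_per_cycle(SUPPLY_VOLTS, 0.06102, 206.4e6)

saving = 1.0 - e_low / e_high
print(f"upper bound on energy savings: {saving:.1%}")
```

Because the supply voltage appears in both the numerator and the denominator, it cancels; the bound depends only on the ratio of current to frequency at the two settings.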
Supply current (mA) by core voltage and frequency:

Voltage  Frequency (MHz)
(V)       59.0   73.7   88.5  103.2  118.0  132.7  147.5  162.2  176.9  191.7  206.4
1.600    25.35  29.91  34.45  38.98  43.58  48.20  52.54  57.35  61.16  65.10  69.52
1.550    23.62  27.90  32.23  36.46  40.80  45.11  49.22  53.75  57.35  61.08  65.15
1.500    22.01  26.05  30.13  34.09  38.17  42.17  46.15  50.35  53.64  57.22  61.02
1.450    20.46  24.21  28.03  31.75  35.57  39.33  43.00  46.96  50.11  53.46  57.04
1.400    19.16  22.67  26.20  29.71  33.28  36.81  40.22  43.87  46.89  50.07  53.33
1.350    17.71  20.99  ...
1.300    16.43  19.50  ...
1.275    16.09  19.03  ...
1.250    15.38  18.22  ...
1.225    14.80  17.54  ...
1.200    14.14  16.77  ...
1.175    13.70  16.25  ...
1.150    13.07  15.52  ...
1.125    12.54  14.90  ...
1.100    11.92  14.19  ...
1.075    11.66  13.83  ...
1.050    11.06  13.15  ...
1.025    10.59  12.59  ...
1.000    10.03  11.94  ...
0.975     9.71  11.53  ...
0.950     9.17  10.92  ...
0.925     8.76  10.42  ...
0.900     8.24   9.83  ...

Table E.1: Data from Voltage/Frequency Scaling Experiment
E.2 Data From Variable-Length Filter DVS Experiment
The measured data used to create Figure 3.2 on page 50 appears in Table E.2. This data was collected on
February 20, 2000, using a Keithley 2000 Sourcemeter. The sourcemeter was connected between the output
of the DVS test board and the power supply input of the SA-1100 core, so this experiment does not take into
account the power drawn by the DC-DC converter itself. The Brutus was running an application that
performs convolutional filtering of a signal using variable lengths of FIR filters (from 32 to 192 in steps of 8).
For each filter length, the application would lower the frequency and voltage as much as possible while still
filtering the input data at the rate it was being generated. The experiment was also done with varying
frequency but fixed voltage as a control. Again, some simple arithmetic shows that the amount by which we can
scale down the energy used to filter the same input data but with varying output quality is:
$$1 - \frac{\frac{1}{59000000} \times 0.900 \times 0.01359}{\frac{1}{206400000} \times 1.500 \times 0.10854} = 73.7\%$$
FIR Filter  Freq.   Voltage  Load  Current with          Current at
Length      (MHz)   (V)      (%)   varying voltage (mA)  fixed 1.5V (mA)
 32          59.0   0.900    91     13.59                 35.71
 40          59.0   0.900    95     13.69                 35.90
 48          73.7   0.950    92     17.64                 43.61
 56          73.7   0.950    96     17.71                 43.84
 64          88.5   0.975    93     22.98                 51.42
 72          88.5   0.975    97     23.09                 51.65
 80         103.2   1.000    94     28.80                 59.15
 88         103.2   1.000    98     28.94                 59.45
 96         118.0   1.025    95     36.87                 66.63
104         118.0   1.025    98     37.01                 66.89
112         132.7   1.100    95     46.42                 73.94
120         147.5   1.150    93     54.93                 80.98
128         147.5   1.150    96     55.05                 81.19
136         162.2   1.200    93     64.45                 87.99
144         162.2   1.200    97     64.59                 88.20
152         176.9   1.300    94     77.29                 94.93
160         176.9   1.300    97     77.41                 95.08
168         191.7   1.400    95     88.85                101.74
176         191.7   1.400    98     88.98                101.91
184         206.4   1.500    95    108.43                108.43
192         206.4   1.500    99    108.54                108.54

Table E.2: Data from Variable-Length Filter DVS Experiment
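To make the comparison between the two current columns concrete, the sketch below computes the fraction of core power saved by lowering the voltage at each filter length. It assumes, as the description of the control experiment suggests, that both columns were measured at the same frequency for a given filter length, so the power ratio equals the energy ratio. The helper name dvs_saving is hypothetical, and only a few rows of Table E.2 are copied in for illustration.

```python
FIXED_VOLTS = 1.5  # core voltage used in the fixed-voltage control run

# (filter length, core voltage in V, current with DVS in mA, current at 1.5 V in mA)
# A subset of the rows of Table E.2.
ROWS = [
    (32, 0.900, 13.59, 35.71),
    (96, 1.025, 36.87, 66.63),
    (144, 1.200, 64.59, 88.20),
    (192, 1.500, 108.54, 108.54),
]


def dvs_saving(volts: float, dvs_ma: float, fixed_ma: float) -> float:
    """Fraction of core power saved by scaling the voltage at the same frequency."""
    return 1.0 - (volts * dvs_ma) / (FIXED_VOLTS * fixed_ma)


for length, volts, dvs_ma, fixed_ma in ROWS:
    print(f"{length:3d} taps: {dvs_saving(volts, dvs_ma, fixed_ma):.1%} saved")
```

The saving is largest for the shortest filter, where the voltage can be lowered the most, and falls to zero at 192 taps, where both runs use 1.500V.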
Appendix F
Photos
Figure F.1: Photo of Brutus Board
Figure F.2: Photo of DVS Test Board
Figure F.3: Photo of Brutus with DVS Board connected
Figure F.4: Screen Shot of Graphical Demo on Brutus LCD
Figure F.5: StrongARM 1100 Chip Photo
References
[1] K. Asanovic, "Vector Microprocessors," Ph.D. thesis, University of California, Berkeley, Spring 1998.
[2] A. Chandrakasan, "Basics of Low Power Circuit and Logic Design," [Online tutorial], Available: http://www-mtl.mit.edu/research/icsystems/tutorials/
[3] A. Chandrakasan, R. Amirtharajah, S.H. Cho, J. Goodman, G. Konduri, J. Kulik, W. Rabiner, A. Wang, "Design Considerations for Distributed Microsensor Systems," In Proc. IEEE 1999 Custom Integrated Circuits Conference (CICC '99), May 1999, pp. 279-286.
[4] Cygnus Solutions, "Cygnus eCos Public License Version 1.0," [Online document], Available: http://www.cygnus.com/ecos/ecoslicense.html
[5] Cygnus Solutions, "Cygnus eCos Market Backgrounder," [Online document], Available: http://www.cygnus.com/ecos/mrktbgrnd.pdf
[6] Cygnus Solutions, "Cygnus eCos White Paper," [Online document], Available: http://www.cygnus.com/ecos/wp.pdf
[7] Cygnus Solutions, "EL/IX White Paper," [Online document], Available: http://sourceware.cygnus.com/elix/whitepaper.html
[8] M. Frigo, C. Leiserson, H. Prokop, S. Ramachandran, "Cache-Oblivious Algorithms," extended abstract submitted for publication, Available: http://supertech.lcs.mit.edu/cilk/papers/
[9] Intel, "Intel StrongARM SA-1100 Microprocessor Developer's Manual," August 1999.
[10] Intel, "StrongARM SA-1100 Developer Board Firmware Kit User's Guide," November 1998.
[11] Intel, "StrongARM SA-1100 Microprocessor Evaluation Platform User's Guide," October 1998.
[12] M. Klein, et al., A Practitioner's Handbook for Real-time Analysis: Guide to Rate Monotonic Analysis for Real-time Systems, Boston: Kluwer Academic Publishers, 1993.
[13] T. C. Lee, V. Tiwari, "A Memory Allocation Technique for Low-Energy Embedded DSP Software," Proceedings of the 1995 IEEE Symposium on Low Power Electronics, San Diego, CA, October 1995.
[14] J. Lorch, "A Complete Picture of the Energy Consumption of a Portable Computer," Masters Thesis, Computer Science, University of California at Berkeley, December 1995.
[15] J. Lorch, A. Smith, "Energy Consumption of Apple Macintosh Computers," IEEE Micro, 18(6):54-63, November/December 1998.
[16] J. Lorch, A. Smith, "Reducing Processor Power Consumption by Improving Processor Time Management in a Single-User Operating System," Proceedings of the Second ACM International Conference on Mobile Computing and Networking, Rye Brook, NY, 143-154, November 1996.
[17] J. Lorch, A. Smith, "Scheduling Techniques for Reducing Processor Energy Use in MacOS," Wireless Networks, 3(5):311-324, October 1997.
[18] J. Lorch, A. Smith, "Software Strategies for Portable Computer Energy Management," IEEE Personal Communications Magazine, 5(3):60-73, June 1998.
[19] R. Min, T. Furrer, A. Chandrakasan, "Dynamic Voltage Scaling Techniques for Distributed Microsensor Networks," WVLSI '00, April 2000.
[20] MIT MTL Integrated Circuits and Systems group, μAMPS project web site, Available: http://www-mtl.mit.edu/research/icsystems/uamps/
[21] M. Pedram, Q. Wu, "Design Considerations for Battery-Powered Electronics," In Proceedings of the 36th Design Automation Conference, 1999.
[22] T. Pering, "Dynamic Voltage Scaling and an Overview of Low-Power Microprocessors," Presentation given at the University of Washington, Nov. 1998, Available: http://infopad.eecs.berkeley.edu/~pering/lpsw/lpsw.html
[23] T. Pering, R. Brodersen, "Energy Efficient Voltage Scheduling for Real-Time Operating Systems," work in progress paper for RTAS'98. Available: http://infopad.eecs.berkeley.edu/~pering/lpsw/lpsw.html
[24] T. Pering, T. Burd, R. Brodersen, "Dynamic Voltage Scaling and the Design of a Low-Power Microprocessor System," Workshop on Low-Power Microprocessors at the 1998 International Symposium of Computer Architecture, June 1998.
[25] T. Pering, T. Burd, R. Brodersen, "The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms," In 1998 International Symposium on Low-Power Electronics and Design.
[26] H. Prokop, "Cache-Oblivious Algorithms," Masters Thesis, MIT Department of Electrical Engineering and Computer Science, June 1999.
[27] Q. Qiu, M. Pedram, "Dynamic Power Management Based on Continuous-Time Markov Decision Processes," In Proceedings of the 36th Design Automation Conference, 1999.
[28] A. Sinha, "Energy-Scalable Software," Masters Thesis, MIT Department of Electrical Engineering and Computer Science, 2000.
[29] Y. Shin, K. Choi, "Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems," In Proceedings of the 36th Design Automation Conference, 1999.
[30] W. Shiue, C. Chakrabarti, "Memory Exploration for Low Power Embedded Systems," In Proceedings of the 36th Design Automation Conference, 1999.
[31] M. Srivastava, A. Chandrakasan, R. Brodersen, "Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation," In IEEE Transactions on VLSI Systems, Vol. 4, No. 1, March 1996.
[32] D. Stepner, N. Rajan, D. Hui, "Embedded Application Design Using a Real-Time OS," In Proceedings of the 36th Design Automation Conference, 1999.
[33] A. Stratakos, T. Burd, R.W. Brodersen, "Integrated Voltage Regulator and Clock Generator for Dynamic Voltage and Frequency Scaling," [Online presentation], Available: http://bwrc.eecs.berkeley.edu/burd/gpp/slides/IcSeminar.11-96/
[34] H. Takada, "Designing Small-Scale Embedded Systems with uITRON Kernel," [Online presentation], Available: http://www.ertl.ics.tut.ac.jp/~hiro/escs98-ohp.pdf
[35] V. Tiwari, "Logic and System Design for Low Power Consumption," Ph.D. thesis, Princeton University, 1996.
[36] V. Tiwari, S. Malik, A. Wolfe, T.C. Lee, "Instruction Level Power Analysis and Optimization of Software," Journal of VLSI Signal Processing Systems, Vol. 13, No. 2, August 1996, Available: http://www.ee.princeton.edu/~vivek/publications.html
[37] Transmeta Corporation, "The Technology Behind Crusoe™ Processors," [Online Whitepaper], January 2000, pp. 16-17, Available: http://www.transmeta.com/crusoe/download/pdf/crusoetechwp.pdf
[38] A. Wang, W. Rabiner Heinzelman, and A. Chandrakasan, "Energy-Scalable Protocols for Battery-Operated Microsensor Networks," IEEE Workshop on Signal Processing Systems (SiPS '99), October 1999, Taipei, Taiwan.