Master Thesis in Computer Engineering

Mälardalen University
Department of Computer Engineering
Supervisor: Raimo Haukilahti
Examiner: Lennart Lindh
Energy Modelling of
Portable Embedded Systems
Lena Higberg
lhg98006@student.mdh.se
Västerås, 2002
Foreword
This document describes a master thesis in computer engineering and consists of two parts. The first, Power/Energy Simulators for Embedded Systems, is a state-of-the-art report that surveys existing power simulators for embedded systems. The second, Energy Simulation of OS kernel System Calls, describes the extension of a simulator and the results obtained during simulation.
Acknowledgement
I would sincerely like to thank my supervisor, Raimo Haukilahti, for all his help during this master thesis. I would also like to thank Johan Stärner at Mälardalen University, who helped me modify the simulator sim-cache.
Contents
Power/Energy Simulators for Embedded Systems
Energy Simulation of OS kernel System Calls
Lena Higberg
Energy Modelling of Portable Embedded Systems
Mälardalen University
Department of Computer Engineering
Supervisor: Raimo Haukilahti
Examiner: Lennart Lindh
Power/Energy Simulators
for Embedded Systems
Lena Higberg
lhg98006@student.mdh.se
Västerås, 2002
Abstract
This state-of-the-art report is the first part of a master thesis in computer engineering and consists of a survey of existing techniques for energy modelling of complete embedded systems and their components. The power simulators described in this report are Wattch, SimplePower, PowerTimer, Tempest, SimBed and three other, unnamed methods for estimating the power consumed in a system. SimpleScalar is also described, even though it is not a power simulator, because several of the power simulators in this report are based on it. The purpose of this survey is to gather information on different power simulators and to determine whether any simulator can simulate a whole system, that is, both an application and an operating system, and preferably report both the power consumption and the performance (i.e. the execution time) separately for the application and the operating system.
Table of Contents
1 Introduction
1.1 Background
1.2 Purpose
1.3 Limitations
2 Power simulators
2.1 Architectural level simulators
3 SimpleScalar
3.1 Simulator internals
4 Wattch
5 SimplePower
6 SimBed
6.1 Simulator description
7 TEM2P2EST
8 PowerTimer
9 A framework for energy analysis of embedded operating systems
10 An energy and performance profiler for embedded systems
11 A hardware/software approach to analyse the energy overhead
Conclusions
References
1 Introduction
This state-of-the-art report is the first part of a master thesis in computer engineering and consists of a survey of existing techniques for energy modelling of complete embedded systems and their components. The simulators described in this report are SimpleScalar, Wattch, PowerAnalyzer, SimplePower, PowerTimer, Tempest, SimBed and three unnamed methods for estimating power consumption. This section briefly describes the background to this master thesis and the purpose of this first part.
1.1 Background
“Recently the power consumption within embedded systems has gained attention and has become one
of the primary design constraints. Portable designs such as MP3-players, palmtops and cellular phones
contain batteries and have therefore a limited amount of energy. Operation time is then dependent on
the systems power consumption. To be able to make a good energy-efficient design, low-power design
techniques must be applied at many levels of abstraction and to each component.” Consequently, low-power design must be applied at the lowest levels of abstraction as well as at the architectural and system levels (i.e. for applications and operating systems).
Simulating the power consumption of a system is important: if the consumption turns out to be too high, the designers can make changes at a very early stage of the design, saving both time and money.
1.2 Purpose
The purpose of this state-of-the-art report is to survey existing power simulators at the architectural level, and in particular to find out whether any of them can simulate a whole system, that is, both an application and an operating system. Moreover, the simulator should report both the power consumption and the execution time, preferably separately for the operating system and the application.
1.3 Limitations
This report describes only architectural-level energy simulators. For example, three relatively well-known simulators, Rsim [1], Simics [2] and SimOS [3], are not described because they are performance simulators only. Neither methods for reducing power consumption nor optimisation techniques are discussed.
2 Power simulators
Low-power design can be classified into the following levels of abstraction [4], [5]:
- Application/System level: the energy consumed when running a particular program as well as an operating system (i.e. system calls) can be reduced at this level.
- Behavioral/Algorithm level: different algorithms for the same purpose consume different amounts of power.
- Architectural level: here the power consumption of, for example, caches, the core and processor buses is analysed and optimised.
- Logic (gate) level: at this level both the function and the style of the circuit are decided. There are various design styles, each with its own power-performance tradeoffs.
- Transistor (circuit) level: this is the lowest level of abstraction. Techniques unique to this level can be used to further limit the power dissipation in the circuit (e.g. changing the input voltage, reordering transistors).
Figure 1 gives an overview of the levels and of how they compare in analysis capacity, accuracy, speed and resources, and in achievable energy savings.
[Diagram: the levels are ordered from Application/System (top) through Behavioral/Algorithm, Architectural and Logic (Gate) down to Transistor (Circuit) (bottom). Moving from the top level to the bottom level, analysis capacity goes from most to least, analysis accuracy from worst to best, analysis speed from fastest to slowest, analysis resources from least to most, and energy savings from most to least.]
Figure 1: An overview of the levels of abstraction. This picture is from [5].
This report surveys power simulators at the architectural level.
2.1 Architectural level simulators
An architectural simulator is a tool that reproduces the behaviour of a computing device [6]. Figure 2 shows a taxonomy of hardware modeling tools.
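[Diagram: architectural simulators are divided into trace-driven and execution-driven simulators; execution-driven simulators are further divided into emulation and direct execution.]
Figure 2: A taxonomy of hardware modeling tools [10].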
A trace-driven simulator uses a trace of executed instructions, obtained by first executing the program on a real system. This trace is then used to drive a model of the system under test. Collecting the instruction trace requires a variety of hardware and software techniques. Unlike execution-driven simulators, trace-driven simulators cannot model mis-speculated code execution, because the trace records only the correct execution path of the program [6].
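The trace-driven approach can be illustrated with a minimal Python sketch, assuming a pre-recorded list of data addresses and a small direct-mapped cache. The function name `replay_trace`, the cache geometry and the trace are hypothetical and are not taken from any of the surveyed tools. Nothing flows back from the cache model to the trace: the sequence of references is fixed before the simulation starts.

```python
from typing import Iterable, List, Optional


def replay_trace(trace: Iterable[int], num_lines: int = 4, line_size: int = 16) -> float:
    """Replay a pre-recorded address trace through a direct-mapped cache model.

    The trace is assumed to have been collected beforehand on a real system;
    the model only consumes it and never influences which addresses come next.
    Returns the miss rate.
    """
    tags: List[Optional[int]] = [None] * num_lines  # None marks an invalid line
    accesses = misses = 0
    for addr in trace:
        block = addr // line_size     # memory block containing the address
        index = block % num_lines     # the only cache line this block may occupy
        tag = block // num_lines      # distinguishes blocks mapping to that line
        accesses += 1
        if tags[index] != tag:
            misses += 1
            tags[index] = tag         # fill the line on a miss
    return misses / accesses if accesses else 0.0


# Hypothetical recorded trace: two sequential sweeps over 64 bytes of data.
recorded = list(range(0, 64, 4)) * 2
print(replay_trace(recorded))
```

Because the model cannot change the trace, a miss only affects the reported statistics, never the order of later references; this is the limitation that execution-driven simulation removes.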
Execution-driven simulators, also called event-driven simulators, reproduce the execution of instructions on the simulated machine either by emulation or by direct execution. The processor and surrounding components such as memory are simulated. A reference generator simulates the activities of the processor and issues memory references or commands to a simulator of the memory system. When the memory system simulator receives a reference or command from the reference generator (which is now a program rather than a predetermined trace), it simulates the path of the reference through the extended memory hierarchy, including contention with other references, and returns to the reference generator the time the reference took to be satisfied. Execution-driven simulation is more accurate than trace-driven simulation because of this feedback from the memory system simulator to the reference generator [6, 8].
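The feedback loop can be sketched as follows, assuming a toy memory system in which a block costs a fixed miss latency on first use and a hit latency afterwards. The reference generator is modelled as a Python generator that receives the latency of each reference through `send()`; the class and function names and the latencies are illustrative assumptions, and contention between references is left out for brevity.

```python
from typing import Generator, List


class MemorySystemSimulator:
    """Toy memory system: a block costs miss_latency the first time, then hit_latency."""

    def __init__(self, hit_latency: int = 1, miss_latency: int = 10) -> None:
        self.hit_latency = hit_latency
        self.miss_latency = miss_latency
        self.resident = set()  # blocks already brought into the (unbounded) cache

    def access(self, block: int) -> int:
        """Simulate one reference and return the cycles it took to be satisfied."""
        if block in self.resident:
            return self.hit_latency
        self.resident.add(block)
        return self.miss_latency


def reference_generator(blocks: List[int]) -> Generator[int, int, None]:
    """Stands in for the simulated processor: yields references, receives latencies."""
    for block in blocks:
        latency = yield block  # timing fed back by the memory system simulator
        # a more detailed generator could adapt its behaviour to `latency` here


def run(blocks: List[int]) -> int:
    """Drive the generator and the memory simulator together; return total cycles."""
    memory = MemorySystemSimulator()
    generator = reference_generator(blocks)
    cycle = 0
    try:
        block = next(generator)
        while True:
            latency = memory.access(block)
            cycle += latency                 # simulated time advances by the returned latency
            block = generator.send(latency)  # return the timing to the reference generator
    except StopIteration:
        pass
    return cycle


print(run([1, 2, 1, 1]))
```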
Direct execution decouples functional and timing simulation [7]. Functional simulation generates the values (for memory and registers) and the control flow by executing the application on a real processor (the host) and storing the values. Timing simulation determines the number of cycles taken by the simulated execution (timing for non-memory instructions is determined mostly by static analysis).
Emulation, on the other hand, creates a model of the processor to be simulated, and the application is then executed on that model.
3 SimpleScalar
SimpleScalar is a cycle-accurate architectural level processor simulator [8, 9]. It is distributed free of charge to academic, non-commercial users together with all source code, which makes it relatively easy to extend. Since its release, SimpleScalar has become popular, as this survey shows: many of the simulators described here are based on it.
The SimpleScalar tool set includes several simulators, ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution and state-of-the-art branch prediction. SimpleScalar cannot simulate a whole system (it can only simulate applications), and it does not report power consumption. It is nevertheless described in this report, even though it is only a performance simulator, because several of the power simulators described later are based on it.
A new version of SimpleScalar is due some time this year; among other simulators, it includes one that can simulate a whole system, though only with respect to performance [10]. PowerAnalyzer [10], a power simulator based on SimpleScalar/ARM (also not yet released), will also be included in the new version.
3.1 Simulator internals
Figure 3 shows an overview of all the simulators that are included in SimpleScalar.
[Diagram: the simulators ordered from highest simulation performance and least detail to lowest performance and most detail:
- sim-fast: 420 lines, no timing, 4+ MIPS
- sim-safe: 350 lines, no timing, with checks
- sim-profile: 900 lines, no timing, lots of statistics
- sim-cache and sim-cheetah: about 1000 lines, functional, cache statistics
- sim-outorder: 3900 lines, performance (timing), out-of-order issue, branch prediction, mis-speculation, ALUs, cache, TLB, 150 KIPS]
Figure 3: SimpleScalar simulators, performance versus detail [2].
The fastest and least detailed simulator, sim-fast, does no time accounting, only functional simulation: it executes each instruction serially and simulates no parallel execution of instructions, no cache and no instruction checking. The simulation speed of sim-fast (on a 1.7 GHz Pentium 4) is 10+ million instructions per second (MIPS).
A separate version of sim-fast, called sim-safe, also performs functional simulation but checks each memory reference for correct alignment and access permissions. It produces the same results as sim-fast, except that it also reports the total numbers of executed loads and stores.
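The kind of per-reference checking done by sim-safe can be sketched as below. This is not SimpleScalar's code (which is written in C); it is a hypothetical word-addressed memory, assuming 4-byte words and simple readable/writable address ranges, that rejects misaligned or forbidden accesses and counts the loads and stores it performs.

```python
from typing import Dict


class MemoryFault(Exception):
    """Raised when a simulated access is misaligned or not permitted."""


class CheckedMemory:
    """Hypothetical functional word memory that validates every access first."""

    def __init__(self, readable: range, writable: range) -> None:
        self.words: Dict[int, int] = {}  # address -> 32-bit value (unwritten words read as 0)
        self.readable = readable         # addresses the program may load from
        self.writable = writable         # addresses the program may store to
        self.loads = 0                   # the extra statistics sim-safe reports
        self.stores = 0

    @staticmethod
    def _check_alignment(addr: int, size: int) -> None:
        if addr % size != 0:
            raise MemoryFault(f"misaligned {size}-byte access at {addr:#x}")

    def load_word(self, addr: int) -> int:
        self._check_alignment(addr, 4)
        if addr not in self.readable:
            raise MemoryFault(f"load from unreadable address {addr:#x}")
        self.loads += 1
        return self.words.get(addr, 0)

    def store_word(self, addr: int, value: int) -> None:
        self._check_alignment(addr, 4)
        if addr not in self.writable:
            raise MemoryFault(f"store to read-only address {addr:#x}")
        self.stores += 1
        self.words[addr] = value & 0xFFFFFFFF


mem = CheckedMemory(readable=range(0x0000, 0x2000), writable=range(0x1000, 0x2000))
mem.store_word(0x1000, 42)
print(mem.load_word(0x1000), mem.loads, mem.stores)
```

A faulting access raises before any counter is updated, so the reported load and store totals only include accesses that actually completed.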
The SimpleScalar tool set also includes two cache simulators, sim-cache and sim-cheetah. These simulators are intended for high-level cache studies and do not take cache access times into account (e.g. studies concerned only with miss rates). They simulate a level one instruction cache, a level one data cache, a level two unified cache, an instruction TLB and a data TLB. TLB stands for translation lookaside buffer; a TLB stores the most recently used page table entries.
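A TLB of the kind these simulators model can be sketched as a small fully associative cache of page table entries with least-recently-used (LRU) replacement. The entry count, the 4 KB page size, the LRU policy and the dictionary standing in for the page table are assumptions made for illustration, not sim-cache's actual configuration.

```python
from collections import OrderedDict
from typing import Dict


class TLB:
    """Fully associative TLB with LRU replacement (illustrative parameters)."""

    def __init__(self, entries: int = 4, page_size: int = 4096) -> None:
        self.entries = entries
        self.page_size = page_size
        self.cache: "OrderedDict[int, int]" = OrderedDict()  # vpn -> pfn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int, page_table: Dict[int, int]) -> int:
        """Translate a virtual address, consulting the page table only on a TLB miss."""
        vpn, offset = divmod(vaddr, self.page_size)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)         # now the most recently used entry
        else:
            self.misses += 1
            if len(self.cache) >= self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
            self.cache[vpn] = page_table[vpn]   # page-table walk, modelled as a lookup
        return self.cache[vpn] * self.page_size + offset


page_table = {vpn: vpn + 100 for vpn in range(16)}  # toy virtual -> physical page mapping
tlb = TLB(entries=2)
for vaddr in (0x0000, 0x1000, 0x0008, 0x2000, 0x1004):
    tlb.translate(vaddr, page_table)
print(tlb.hits, tlb.misses)
```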
A profiling simulator, sim-profile, is also included; it generates detailed profiles of instruction classes and addresses, text symbols, memory accesses, branches and data segment symbols.
The most complex simulator is sim-outorder, a detailed out-of-order issue simulator with a multi-level memory system. It produces all the statistics of the simulators above as well as the number of cycles taken by the simulated execution (i.e. timing simulation). The simulation speed of sim-outorder (on a 1.7 GHz Pentium 4) is 350+ thousand instructions per second (KIPS).
The software architecture of the SimpleScalar hardware model is shown in figure 4. Applications run on the model using execution-driven simulation, which requires an instruction-set emulator and an I/O emulation module.
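[Diagram: the target ISA emulator and the I/O emulator connect through the target ISA and I/O interfaces to the simulator core and its modules (branch predictor, resource, cache, loader, statistics, memory and register modules), which run on the host platform through the host interface.]
Figure 4: The SimpleScalar hardware model’s software architecture [4].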
The I/O emulation module provides simulated programs with access to external input and output
facilities. SimpleScalar supports several I/O emulation modules, ranging from system-call emulation
to full-system simulation. With system-call emulation, the simulator invokes the I/O module whenever a program attempts to execute a system call in the instruction set interpreter. The call is emulated by translating it into an equivalent host operating system call and directing the simulator to execute that call on the simulated program's behalf.
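System-call emulation can be sketched as a handler that the instruction interpreter calls when the simulated program executes a system-call instruction. The target system-call numbers, the MIPS-style register names (`v0`, `a0` to `a2`) and the class name are hypothetical; only `write` and `exit` are handled, and `write` is forwarded to the host through Python's `os.write`.

```python
import os
from typing import Dict, Optional

# Hypothetical target system-call numbers; a real simulator takes them from
# the target's ABI.
TARGET_SYS_EXIT = 1
TARGET_SYS_WRITE = 4


class SyscallEmulator:
    """Translates the simulated program's system calls into host OS calls."""

    def __init__(self, target_memory: bytearray) -> None:
        self.mem = target_memory  # the simulated program's memory image
        self.exited = False
        self.exit_code: Optional[int] = None

    def handle(self, regs: Dict[str, int]) -> None:
        """Called by the interpreter when the program executes a system-call instruction.

        Arguments are read from simulated registers (MIPS-style names, an
        assumption) and the result is written back to a simulated register.
        """
        number = regs["v0"]
        if number == TARGET_SYS_WRITE:
            fd, buf, count = regs["a0"], regs["a1"], regs["a2"]
            data = bytes(self.mem[buf:buf + count])  # copy the buffer out of target memory
            # simplification: the target descriptor is used directly as a host descriptor
            regs["v0"] = os.write(fd, data)          # perform the equivalent host call
        elif number == TARGET_SYS_EXIT:
            self.exited = True
            self.exit_code = regs["a0"]
        else:
            raise NotImplementedError(f"target system call {number} not emulated")
```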
The simulator core defines the simulator's main loop, which executes one iteration for each instruction of the program until the program finishes. For a timing model such as sim-outorder, the main loop must instead account for the progression of execution time measured in clock cycles rather than in instructions.
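The difference between a functional main loop and a timing main loop can be sketched with a hypothetical toy ISA of four instructions. Both loops use the same instruction semantics; the functional loop only counts instructions, as sim-fast does, while the timing loop advances a cycle counter using assumed per-instruction latencies. A real timing model such as sim-outorder overlaps instructions in its pipeline, which this in-order sketch does not attempt.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ISA, one tuple per instruction:
#   ("li", rd, imm)        rd = imm
#   ("add", rd, rs, rt)    rd = rs + rt
#   ("bnez", rs, target)   jump to target if rs != 0
#   ("halt",)              stop
Instr = Tuple

# Hypothetical latencies in cycles, used only by the timing loop.
LATENCY = {"li": 1, "add": 1, "bnez": 2, "halt": 1}


def execute(instr: Instr, regs: Dict[str, int], pc: int) -> int:
    """Functional semantics of one instruction; returns the next PC (-1 means halt)."""
    op = instr[0]
    if op == "li":
        regs[instr[1]] = instr[2]
    elif op == "add":
        regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
    elif op == "bnez" and regs[instr[1]] != 0:
        return instr[2]
    elif op == "halt":
        return -1
    return pc + 1


def functional_main_loop(program: List[Instr]) -> Tuple[Dict[str, int], int]:
    """sim-fast style: one iteration per instruction; returns (registers, instructions)."""
    regs: Dict[str, int] = {}
    pc, executed = 0, 0
    while pc != -1:
        pc = execute(program[pc], regs, pc)
        executed += 1
    return regs, executed


def timing_main_loop(program: List[Instr]) -> Tuple[Dict[str, int], int]:
    """Timing style: same semantics, but the loop advances a clock; returns (registers, cycles)."""
    regs: Dict[str, int] = {}
    pc, cycle = 0, 0
    while pc != -1:
        instr = program[pc]
        pc = execute(instr, regs, pc)
        cycle += LATENCY[instr[0]]  # in-order toy: no overlap between instructions
    return regs, cycle


# Computes r2 = 5 + 5 + 5, using r1 as the loop counter.
program: List[Instr] = [
    ("li", "r1", 3), ("li", "r2", 0), ("li", "r3", -1), ("li", "r4", 5),
    ("add", "r2", "r2", "r4"),  # pc 4: loop body
    ("add", "r1", "r1", "r3"),
    ("bnez", "r1", 4),
    ("halt",),
]
```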
SimpleScalar models several instruction sets, including SimpleScalar PISA and Alpha, and supports several host platforms, such as Windows/NT, Linux/x86 and Sparc/Solaris.
4 Wattch
Wattch is a simulator that estimates processor power consumption at the architectural level. It was developed at Princeton University and is one of the simulators based on SimpleScalar. Its power estimation is based on a suite of parameterizable power models for different hardware structures [11]. SimpleScalar is used as the cycle-level performance simulator, which keeps track of which units are accessed in each cycle and records the total energy consumed by an application. Wattch uses a modified version of SimpleScalar’s sim-outorder, extended with additional pipeline stages to bring it more in line with current microprocessors. Because it is based on sim-outorder, it models an Alpha processor. Figure 5 shows the structure of Wattch.
[Diagram: the inputs are the program binary and a hardware configuration; a cycle-level performance simulator produces a performance estimate and cycle-by-cycle hardware access counts, which the parameterizable power models turn into a power estimate.]
Figure 5: The structure of Wattch [11].
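The flow in figure 5, access counts from the cycle-level simulator combined with per-unit power models, can be sketched as below. The sketch uses the standard dynamic power relation P = C·Vdd²·a·f, treating the average number of accesses per cycle as the activity factor a. The unit names and capacitance values are hypothetical placeholders; Wattch derives its capacitances from parameterised circuit models rather than from a fixed table.

```python
from typing import Dict, List

# Hypothetical effective capacitance switched per access (farads). Wattch
# instead derives such values from parameterised models of array structures,
# CAMs, combinational logic and clocking.
CAPACITANCE_PER_ACCESS: Dict[str, float] = {
    "icache": 120e-12,
    "regfile": 30e-12,
    "alu": 45e-12,
}


def average_power(access_counts: List[Dict[str, int]], vdd: float,
                  freq_hz: float) -> Dict[str, float]:
    """Average power (W) per unit from cycle-by-cycle access counts.

    access_counts holds one {unit: accesses} dict per simulated cycle.
    Uses P = C * Vdd^2 * a * f with a = average accesses per cycle.
    """
    if not access_counts:
        raise ValueError("need at least one simulated cycle")
    totals = {unit: 0 for unit in CAPACITANCE_PER_ACCESS}
    for per_cycle in access_counts:
        for unit, n in per_cycle.items():
            totals[unit] += n
    cycles = len(access_counts)
    return {unit: cap * vdd ** 2 * (totals[unit] / cycles) * freq_hz
            for unit, cap in CAPACITANCE_PER_ACCESS.items()}


# Two simulated cycles at 1 V and 1 GHz (made-up counts).
power = average_power([{"icache": 1, "regfile": 2, "alu": 1}, {"icache": 1}], 1.0, 1e9)
```

Changing a hardware parameter in this scheme only changes the capacitance table, while changing the program only changes the access counts, which mirrors the two usage scenarios described next.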
Wattch can be used in three ways. First, the user can compare several design configurations that differ only in the parameters of the modelled hardware structures (microarchitectural tradeoffs). Second, for software or compiler development, a single hardware configuration is used and several programs are simulated and compared (compiler optimisation). Third, exploiting Wattch’s modularity, additional hardware modules can be added to the simulator (hardware optimisation).
The main processor units that are modelled fall into four categories [11]:
- Array structures: data and instruction caches, cache tag arrays, all register files, the register alias table, branch predictors and large portions of the instruction window and load/store queue.
- Fully associative content-addressable memories (CAMs): for example, instruction window/reorder buffer wakeup logic, load/store order checks and TLBs.
- Combinational logic and wires: functional units, instruction window selection logic, dependency check logic and result buses.
- Clocking: clock buffers, clock wires and capacitive loads.
Power simulation reduces the simulation speed by approximately 30% compared to performance simulation alone (sim-outorder). The accuracy is approximately ±13% [12]. Wattch is distributed freely, with source code, for non-commercial use.
5 SimplePower
SimplePower is an execution-driven, cycle-accurate architectural level energy estimation tool. The
simulated system consists of the processor core, on-chip instruction and data caches, off-chip memory
and the interconnect buses between the core and the caches and between the caches and the off-chip
memory, see figure 6. SimplePower can point out power hot spots in hardware and software before
systems are built. SimplePower was developed at the Pennsylvania State University.
[Diagram: SimplePower consists of the core (pipeline), a cache/bus simulator (I cache, D cache, main memory), a power estimation interface that uses switch capacitance tables to compute core, memory, bus and I/O energy, and an output module that reports the energy statistics.]
Figure 6: SimplePower structure [14].
SimplePower simulates an in-order processor with a 5-stage pipeline, and a perfect cache is assumed. SimplePower is also based on SimpleScalar and models a subset of the SimpleScalar instruction set (the integer part, excluding division). Neither clock power, system calls nor the processor's control unit is modelled.
The core simulates the execution of all active instructions in each clock cycle and calls the power estimation interface of every functional unit activated in that cycle. To keep the simulator technology-independent, a power estimation interface was developed for every architectural-level functional unit. In that way, only the table or the interface implementation needs to be changed if the implementation of a unit changes.
SimplePower uses both transition-sensitive and analytical energy models. Energy consumption depends on switching activity; a model that captures the switching activity is called transition-sensitive (in contrast to an analytical energy model). SimplePower uses the number of transitions in a given operation to calculate the power consumption, since transitions account for most of the processor's power consumption. The technique builds an energy model for each functional unit: a table giving the switch capacitance of the unit for each input transition, obtained from VLSI layouts and extensive HSPICE simulation. These switch capacitance tables are used to calculate the power the unit consumes in response to an input transition, which can be either a complete instruction or the toggling of a data line. The drawback is that the tables can grow large when the number of possible input combinations is high, so a clustering technique is used that groups similar transitions and energy patterns together.
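A transition-sensitive unit model of this kind can be sketched as a table lookup keyed by a cluster of input transitions. Here, as an assumption for illustration, transitions are clustered by the number of input bits that toggle between consecutive operand pairs of a two-input, 4-bit unit, and the table values are made up; SimplePower's real tables come from VLSI layouts and HSPICE simulation and use their own clustering.

```python
from typing import List, Tuple

# Hypothetical switch-capacitance table (pF) for a two-input, 4-bit functional
# unit, indexed by cluster = number of input bits that toggle (0..8).
SWITCH_CAP_PF = {toggles: 0.5 * toggles for toggles in range(9)}


def transition_cluster(prev: Tuple[int, int], curr: Tuple[int, int]) -> int:
    """Map an input transition (previous operands -> current operands) to a table index."""
    return sum(bin((p ^ c) & 0xF).count("1") for p, c in zip(prev, curr))


def unit_switch_capacitance(operands: List[Tuple[int, int]]) -> float:
    """Accumulate switch capacitance over the sequence of operand pairs the unit sees."""
    total = 0.0
    prev = (0, 0)  # assume the unit's inputs start at zero
    for curr in operands:
        total += SWITCH_CAP_PF[transition_cluster(prev, curr)]
        prev = curr
    return total


seq = [(0b0001, 0b0011), (0b0001, 0b0011), (0b1110, 0b0000)]
print(unit_switch_capacitance(seq))
```

Repeating the same operands costs nothing in this model, because no input bit toggles; an activity-independent analytical model cannot capture this effect.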
For the system buses, SimplePower uses a transition-sensitive energy model: the energy consumption of a bus depends on the switching activity on its lines and on the capacitive load of each line. For the datapath, it uses predefined transition-sensitive models for each functional unit (ALU, multiplier, divider). For the memories, by contrast, a simple analytical energy model is used.
SimplePower provides the following outputs: the final state of the register file, the total number of execution cycles, the number of transitions on the buses, switch capacitance statistics for the different functional units and the total switch capacitance [13]. The total energy consumption can then be calculated from the total switch capacitance as $E = C V^2$, where $C$ is the total switch capacitance and $V$ the supply voltage.
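The transition-sensitive bus model and the final energy calculation can be sketched together: every bus line that toggles between consecutive values contributes its own load capacitance, and the summed switch capacitance is converted to energy with the relation E = C·V² from the text. The function names, the assumption that all lines start low, and the example capacitances are hypothetical.

```python
from typing import List


def bus_switched_capacitance(values: List[int], line_caps: List[float]) -> float:
    """Sum the load capacitance of every bus line that toggles between successive values.

    line_caps[i] is the capacitive load of bus line i; all lines are assumed to start low.
    """
    total = 0.0
    prev = 0
    for value in values:
        changed = prev ^ value  # 1 bits mark toggling lines
        for bit, cap in enumerate(line_caps):
            if (changed >> bit) & 1:
                total += cap
        prev = value
    return total


def energy_from_capacitance(total_cap: float, vdd: float) -> float:
    """E = C * V^2, as in the text: total switch capacitance times supply voltage squared."""
    return total_cap * vdd ** 2


# Hypothetical 3-bit bus with per-line loads of 1, 2 and 4 pF, driven at 2 V.
cap_pf = bus_switched_capacitance([0b011, 0b110], [1.0, 2.0, 4.0])
energy_pj = energy_from_capacitance(cap_pf, 2.0)
```

With capacitances in pF and the supply voltage in volts, the result is in pJ.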
Compared with values obtained using HSPICE (a circuit-level simulator), the average error was found to be within 15% for all units. The simulation speed is not reported in these articles [13, 14]. SimplePower is distributed free of charge and can be found on the university’s homepage.
6 SimBed
SimBed [15, 16] is an execution-driven simulation testbed that measures the execution behaviour and power consumption of embedded applications and real-time operating systems (RTOSs) by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. The processor simulator measures the power consumption of the system with an accuracy within 10-15% of real measurements.
6.1 Simulator description
SimBed is a cycle-accurate processor model that emulates the Motorola M-CORE processor (a low-power CPU core) as the microcontroller. All devices, interrupts and interrupt handlers used by the operating system and the application are accurately simulated. To make the simulation more realistic, a background load (real-time stimuli) should be run. This was done by running two tasks: a periodic control loop task and an aperiodic inter-process communication task.
SimBed keeps track of real-time jitter (defined slightly differently from the traditional definition), response-time delay (defined significantly differently from the traditional definition) and the total CPU energy consumption, divided into user, kernel, handler, semaphore and idle components. Figure 7 shows the test-bed structure.
[Diagram: applications and RTOSs run on SimBed, which runs on the host platform and outputs statistics: power data, jitter data and delay data.]
Figure 7: Test-bed structure [16].
The emulator also includes a flash memory simulator, because microcontrollers often have flash memory. The emulator executes the program directly from the simulated flash memory, which cannot be written. The user of the test-bed downloads code to the flash memory with a download tool included in SimBed. A display emulator lets the application print to the screen. SimBed includes I/O simulation so that it can handle applications with input/output operations (such as MPEG), as well as an interrupt controller that handles all external interrupts.
The power consumption model used by SimBed is based on experimental data rather than simulation. For each individual instruction, the power consumption is measured by running an infinite loop containing only the instruction to be measured. The measured value becomes the base power consumption of that instruction, and multiplying it by the execution time of the instruction gives the basic energy consumption of the instruction. This number is, however, too small, because some overhead must also be accounted for. One explanation is that during the single-instruction test, the state of the modules inside the processor changes less than it would if the previous instruction had been different; this is called inter-instruction overhead. Another factor that influences the accuracy is the variation of the operands of each instruction. The total power consumption of a single instruction is therefore the sum of the base power consumption of the instruction, the approximated inter-instruction overhead and the average contribution of operand variation.
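The additive instruction-level model can be sketched as follows. All numbers are hypothetical rather than measured M-CORE data, and two simplifications are assumptions: the inter-instruction overhead is charged only when an instruction differs from its predecessor, and operand variation is a single average added to every instruction. Energy is power multiplied by execution time, so milliwatts times microseconds gives nanojoules.

```python
from typing import Dict, List, Optional

# Hypothetical per-instruction data (not real M-CORE measurements):
# base power in mW, measured with a loop containing only that instruction,
# and execution time in cycles.
BASE_POWER_MW: Dict[str, float] = {"add": 40.0, "ld": 55.0, "st": 50.0}
CYCLES: Dict[str, int] = {"add": 1, "ld": 2, "st": 2}

INTER_INSTRUCTION_OVERHEAD_MW = 5.0  # hypothetical; charged when the opcode changes
OPERAND_VARIATION_MW = 3.0           # hypothetical average operand-dependent term


def program_energy_nj(trace: List[str], clock_mhz: float) -> float:
    """Energy (nJ) of an executed instruction sequence under an additive instruction-level model."""
    cycle_time_us = 1.0 / clock_mhz
    energy_nj = 0.0
    prev: Optional[str] = None
    for op in trace:
        power_mw = BASE_POWER_MW[op] + OPERAND_VARIATION_MW
        if prev is not None and op != prev:
            power_mw += INTER_INSTRUCTION_OVERHEAD_MW
        energy_nj += power_mw * CYCLES[op] * cycle_time_us  # mW * us = nJ
        prev = op
    return energy_nj


print(program_energy_nj(["add", "add", "ld"], clock_mhz=10.0))
```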
7 TEM2P2EST
TEM2P2EST stands for Thermal Enabled Multi-Model Power/Performance ESTimator and can be used at both the architectural and the compiler level [17]. Tempest is a cycle-accurate microarchitectural power and performance estimator, and it is also based on the SimpleScalar simulator sim-outorder. It can estimate power consumption using either empirical data or analytical power models; the user selects which mode to use. The power models estimate both dynamic and leakage power, since leakage power is becoming increasingly important as process technology shrinks. Additional features include technology scaling options and dual-Vt technology support. The simulator also includes a thermal model that converts the power numbers into a temperature profile.
8 PowerTimer
PowerTimer [18] is a fast, cycle-accurate, parameterised research simulator developed by an IBM research group to aid in evaluating future PowerPC™ processors from the viewpoint of power-performance efficiency. PowerTimer extends a research simulator called Turandot [28, 29], which models a generic, parameterised, out-of-order superscalar processor with level 1 data and instruction caches, a level 2 cache, a branch predictor and main memory.
The energy models are derived from real, circuit-level power simulation data. They are controlled by two sets of parameters that can be set by the user of the simulation tool:
- Technology/circuit parameters, which allow appropriate scaling from one CMOS generation to the next.
- Microarchitectural-level parameters: various queue/buffer sizes, pipe latencies and bandwidth values.
PowerTimer can be used in two modes. In the first, the performance simulator runs standalone and its statistics are then processed through the energy models to generate average unit-wise power numbers. In the second, the energy models are embedded in the simulation code itself, which lets the user view the cycle-by-cycle energy characterisation as well as the same average unit-wise statistics as in the first mode.
The accuracy and the speed of the simulation are not presented in the article. PowerTimer is
distributed freely.
9 A framework for energy analysis of embedded operating systems
Robert P. Dick et al. [19, 20] have developed an energy analysis framework that can be used to analyse the energy consumption of RTOS functions and applications.
The internal operation of the SPARClite processor is simulated using a cycle-accurate instruction set simulator. To account for the effect of cache misses, an on-line SPARClite cache simulator is used. The framework also contains memory, timer, UART and bus interface models. Other peripherals (other hardware components such as brake sensors or ASICs) can also be added to the simulator.
The processor has a 5-stage pipeline, the SPARC v8 instruction set architecture and a power-down mode (which reduces the clock frequency) that can be used to reduce energy consumption.
The framework gives a detailed report of the energy consumed by the applications and the RTOS functions, using call trees. A call tree is a graphical description of the function-call hierarchy and contains power statistics in the form of a histogram. Each node corresponds to a function call and has a child node for each function called from within it; each node is annotated with the energy and time consumed by that call.
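A call tree annotated with energy and time can be sketched as a recursive data structure in which each node stores the energy and time spent in the call's own code, and the reported totals include all calls made from within it. The class name, the function names and the figures in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallNode:
    """One function call, annotated with the energy and time spent in its own code."""
    name: str
    self_energy_uj: float
    self_time_us: float
    children: List["CallNode"] = field(default_factory=list)  # calls made from within

    def total_energy_uj(self) -> float:
        """Energy of this call including every call made from within it."""
        return self.self_energy_uj + sum(c.total_energy_uj() for c in self.children)

    def total_time_us(self) -> float:
        return self.self_time_us + sum(c.total_time_us() for c in self.children)

    def report(self, depth: int = 0) -> List[str]:
        """Indented text view of the tree, one line per call with inclusive totals."""
        lines = [f"{'  ' * depth}{self.name}: {self.total_energy_uj():.1f} uJ, "
                 f"{self.total_time_us():.1f} us"]
        for child in self.children:
            lines.extend(child.report(depth + 1))
        return lines


# Hypothetical RTOS/application calls and numbers.
root = CallNode("main", 2.0, 10.0, [
    CallNode("sem_wait", 0.5, 1.0),
    CallNode("task", 3.0, 20.0, [CallNode("sem_post", 0.5, 1.0)]),
])
print("\n".join(root.report()))
```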
Each component in the simulator has a power model that observes the data flowing through the
component and computes the power as a function of the present and past values at its terminals. The
power consumed by all the components is aggregated to get the total power consumed in the system
per cycle. The processor power model uses the current and previous instruction codes among other
statistics to determine the processor power consumption. Memory energy consumption is derived from the manufacturer's data sheet.
New hardware is easy to add to the simulated system: if its implementation is known, its energy consumption can be computed using known energy analysis techniques [25][26][27].
The accuracy of this energy analysis framework is not reported in the articles.
10 An energy and performance profiler for embedded systems
In [21], T. Šimunić et al. present a source code optimisation methodology and a profiler for energy consumption and performance in embedded systems. The profiler was developed by Stanford University and the University of Bologna in collaboration with Hewlett-Packard Laboratories.
The profiler simulates embedded systems consisting of a microprocessor with two levels of cache, off-chip memory, a DC-DC converter and a battery.
The profiler extends earlier work by the same authors [22], which in turn extended the ARMulator, a proprietary instruction-level performance simulator from ARM, with cycle-accurate energy models for all system components. With that earlier tool, evaluating the energy efficiency of two different implementations required the designer to obtain cycle-by-cycle plots and then manually relate cycles to the software portion of interest. The profiler was created to remove this manual step.
The profiler runs concurrently with the cycle-accurate simulator and periodically samples the simulation results. It maps the energy and performance to the executed functions using information gathered at compile time. The user can choose how often the profiler gathers information; a longer sampling interval makes the simulation faster. Usually an interval of 1 μs is sufficient.
The profiler can output both the performance and the energy consumption of every component in the system, as well as for the source code. When profiling the source code, the output shows the total energy consumption or performance of each function, including the functions it calls. The program's total energy consumption is therefore the energy consumed by the main function.
Accuracy is within 5% of the hardware measurements for the tested system.
11 A hardware/software approach to analyse the energy overhead
L. Benini et al. from the University of Bologna propose in [23] a methodology to analyse the energy overhead caused by the presence of an embedded operating system in a portable device. They used a hardware/software system as a case study to analyse the energy consumed within the system under test. The hardware is the SmartBadgeIII, a prototype wearable device from Hewlett-Packard Laboratories. As operating system they used eCos, a real-time operating system from Red Hat that was ported to the target platform, the SmartBadge.
The SmartBadgeIII has a StrongARM 1100 processor [24]. The processor integrates on the same chip the ARM core, a memory management unit, interrupt and DMA controllers, and many I/O controllers such as UART, audio and LCD. The system also contains memory: data and instruction caches, flash and static RAM.
The experimental set-up consists of a hardware component and a software component, see figure 8. The hardware component is an I/V conversion board that converts the current (I) drawn by the SmartBadge into voltage (U) values. These values are sent to a data acquisition (DAQ) board, which communicates with a PC running a LABVIEW program that controls the measurement framework.
To obtain the energy consumption, both the current and the execution time must be known. For that reason an accurate software trigger is used. The DAQ board can start and stop the measurement on an external signal, and this signal is provided by driving a general-purpose pin of the processor. The LABVIEW program then computes the energy values by combining the power and time information.
Figure 8: The experimental set-up (SmartBadge → current I → I/V conversion board → voltage U → DAQ → voltage values and time → PC running LABVIEW).
Neither the accuracy nor the simulation speed of this method is presented in the article.
Conclusions
To give an overview of the simulators examined in this report and to make them easier to compare, they are summarised in table 1.
Simulator | Power estimation accuracy | Components simulated | Distributed freely? | Can handle OS and app? | Performance and/or power simulator? | Processors supported
Wattch | +/-13% | Cache, off-chip memory, I/O | Yes | Application | Both | Alpha (sim-outorder)
SimplePower | 15% | Perfect cache, off-chip memory and buses, I/O | Yes | Application | Energy | Integer subset of SimpleScalar PISA
SimBed | 10-15% | Flash memory, I/O | No | Both | Power | Motorola M-CORE
PowerTimer | - | L1, L2 cache, ext. memory | Yes | Application | Both | PowerPC
Tempest | - | Cache, ext. memory, I/O | No | Application | Both | Alpha (sim-outorder)
R. P. Dick et al. | - | Cache, DRAM, timer, UART | No | Both | Energy | Fujitsu SPARClite
T. Šimunić et al. | 5% | Cache, off-chip memory, DC-DC, battery | No | Application | Energy | ARM
L. Benini et al. | - | All components in SmartBadge | No | Both | Energy | StrongARM 1100
Table 1: A comparison of the simulators looked into in this report.
References
[1] C. J. Hughes, V. S. Pai, P. Ranganathan, S. V. Adve, Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors, 2002.
[2] Peter S. Magnusson et al., Simics: A Full System Simulation Platform, 2002.
[3] M. Rosenblum, S. A. Herrod, E. Witchel, A. Gupta, Complete Computer System Simulation: The SimOS Approach, 1995.
[4] Bengt Oelmann, Asynchronous and Mixed Synchronous/Asynchronous Design Techniques for Low Power, KTH, 2000.
[5] Pradip Bose et al., Power-Efficient Design: Modeling and Optimizations, tutorial, ISCA, 2001.
[6] David E. Culler and Jaswinder Pal Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, Inc., 1999, pages 231-234.
[7] M. Durbhakula, V. S. Pai, S. Adve, Improving the Accuracy vs. Speed Tradeoff for Simulating Shared-Memory Multiprocessors with ILP Processors.
[8] Todd Austin, Eric Larson and Dan Ernst, SimpleScalar: An Infrastructure for Computer System Modeling, IEEE, February 2002.
[9] Doug Burger and Todd M. Austin, The SimpleScalar Tool Set, Version 2.0, 1997.
[10] Todd Austin et al., SimpleScalar Tutorial (for release 4.0), held at MICRO-34, 2001.
[11] D. Brooks, V. Tiwari, M. Martonosi, Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, ISCA, 2000.
[12] S. Ghiasi, D. Grunwald, A Comparison of Two Architectural Power Models, 2000.
[13] W. Ye, N. Vijaykrishnan, M. Kandemir and M. J. Irwin, The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool, Microsystems Design Lab, The Pennsylvania State University, DAC, 2000.
[14] N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, W. Ye, Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower, ISCA, 2000.
[15] K. Baynes, C. Collins, E. Fitherman, B. Ganesh, P. Kohout, C. Smit, T. Zhang and B. Jacob, The Performance and Energy Consumption of Three Embedded Real-Time Operating Systems, November 2001.
[16] T. Zhang, RTOS Performance and Energy Consumption Analysis Based on an Embedded System Testbed, Master's Thesis, University of Maryland, May 2001.
[17] A. Dhodapkar, C. H. Lim, G. Cai, W. R. Daasch, TEM2P2EST: A Thermal Enabled Multi-model Power/Performance ESTimator, 2001.
[18] D. Brooks, M. Martonosi, J-D. Wellman, P. Bose, Power-Performance Modeling and Tradeoff Analysis for a High End Microprocessor, 2000.
[19] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, presented at ACM, 2000.
[20] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, to be presented at ACM, 2000.
[21] Tajana Šimunić, L. Benini, G. De Micheli, Mat Hans, Source Code Optimization and Profiling of Energy Consumption in Embedded Systems, 2000.
[22] Tajana Šimunić, L. Benini, G. De Micheli, Cycle-Accurate Simulation of Energy Consumption in Embedded Systems, Proceedings of DAC, 1999.
[23] A. Acquaviva, L. Benini, B. Riccò, Energy Characterization of Embedded Real-Time Operating Systems, 2001.
[24] Advanced RISC Machines Ltd., Advanced RISC Machines Architectural Reference Manual, July 1996.
[25] L. Benini, G. De Micheli, Dynamic Power Management: Design Techniques and CAD Tools, 1997.
[26] A. P. Chandrakasan, R. W. Brodersen, Low Power Digital CMOS Design, 1995.
[27] J. Rabaey, M. Pedram (Editors), Low Power Design Methodologies, 1996.
[28] M. Moudgill, P. Bose, J. Moreno, Validation of Turandot, a Fast Processor Model for Microarchitecture Exploration, IEEE, 1999.
[29] M. Moudgill, J. Wellman, J. Moreno, Environment for PowerPC Microarchitecture Exploration, 1999.
Mälardalen University
Department of Computer Engineering
Supervisor: Raimo Haukilahti
Examiner: Lennart Lindh
Energy Simulation of OS kernel System Calls
Lena Higberg
lhg98006@student.mdh.se
Västerås, 2002
Abstract
The SimpleScalar simulator sim-cache has been extended so that it can simulate an operating system as well as an application. The simulation produces the execution time and energy consumption of each system call in the operating system. The operating system used is SW Symo, somewhat rewritten so that it works with the simulator. The simulated system consists of a MIPS processor, caches and off-chip memory. The energy consumed in the CPU and caches is modelled using the energy simulator Wattch, and simple energy models have been added for the main memory and the off-chip bus. Minimum, maximum and average execution times and energy consumption for the system calls in SW Symo have been simulated using a simple application. The simulated execution times have been compared with those measured when running SW Symo on an M68000 system [1]. All but two of the system calls have a lower simulated execution time, probably partly because the M68000 system has no cache.
Table of contents
1 Introduction
1.1 Purpose
1.2 Motivation
1.3 Limitations
2 Related work
3 Problem description
4 Problem analysis
5 Method
6 Solutions
6.1 The operating system, SW Symo
6.1.1 Implemented system calls
6.2 MIPS instruction set architecture
6.3 Modifications in sim-cache
6.4 Problems occurred during implementation
6.5 Execution times for each system call
6.6 Energy modelling
7 Results
Summary
References
Appendix A
Appendix B
1 Introduction
This section gives a short description of the purpose, motivation and limitations of the implementation part of this master thesis. The background, that is, why energy is an important aspect, has already been described in the state-of-the-art report.
1.1 Purpose
The purpose of this second part of the master thesis was to choose and extend a simulator so that the energy consumption of the different system calls in an operating system could be obtained. The simulator should therefore meet the requirements identified in the state-of-the-art report: it should be able to simulate a whole system, that is, both an application and an operating system. The simulation should produce both the energy consumption and the performance (i.e. the execution time) of each system call in the operating system.
1.2 Motivation
Energy consumption has become an important aspect when designing a portable embedded system. Portable devices depend on battery power and therefore have a limited amount of energy, so the operating time depends on the system's power consumption. High power consumption also produces more heat.
There is therefore a need to study the energy consumed in embedded systems under different circumstances. One aspect to explore is the energy consumed by the operating system in an embedded system, comparing an operating system implemented completely in software with one implemented partly in hardware. The two cases can be compared by comparing the energy consumed by each system call in the operating system.
1.3 Limitations
Due to time restrictions, the execution time measurement and the energy modelling are somewhat simplified. Also, the simulator can only simulate one specific operating system.
2 Related work
Similar work can be found in articles [13], [14] and [15]. Article [13] presents modelling of embedded systems with SimBed and measures both the execution behaviour (jitter and delay) and the energy consumption of embedded applications and RTOSs. The energy consumption is not measured for each system call, as has been done here, but for the applications, the idle task, semaphores and so on. Article [14] also analyses the power consumption of an RTOS and an application. There, the energy consumption is measured for the application tasks and can be traced to each function in a task. This article does not measure the energy consumption of the individual system calls in the operating system either. In article [15] a hardware/software method is used to analyse the energy overhead caused by the presence of an embedded operating system in a wearable device. Unlike [13] and [14], this article measures the minimum, maximum and average energy consumed by each system call in the OS.
3 Problem description
This master thesis is about energy characterisation of the system calls of an operating system.
As an example, consider two systems:
- A CPU and memory, running an operating system and an application.
- A CPU, an RTU (Real Time Unit, a real-time kernel in hardware) and memory. Here (part of) the operating system is in hardware, and the system runs (the software part of) the operating system and an application.
How much energy is consumed by the operating system in each case, for each system call? This is the question of interest, but this master thesis handles only the first case.
4 Problem analysis
A simulator is needed to find out the energy consumption of each system call. Does a simulator that can simulate the energy consumption of each system call already exist, or is it necessary to extend one?
During the first part of this master thesis, a survey of existing energy simulators was made. The conclusion was that no existing simulator meets our requirements, so a simulator has to be extended.
When deciding which simulator to extend, the simulated processor and the results the simulation produces, among other things, must be considered. Preferably the simulator should simulate a processor for embedded systems, since these are often somewhat simpler than, for instance, processors for PCs.
The operating system and the application to be used must also be decided. As mentioned earlier, this master thesis handles only the case where an operating system in software is used, but it should preferably be possible to add an RTU to the simulator. What needs to be decided is illustrated in figure 1 below.
Figure 1: What needs to be decided before starting implementation (the application and OS, possibly with an RTU; the simulator; the modelled CPU; and the result).
5 Method
The first step was to decide which simulator to extend and, with that, which processor to model. An examination of the summary table in the state-of-the-art report above showed that most of the simulators are not possible choices, because they are either not accessible or model too complex a processor. Together with the supervisor, it was concluded that there are in fact only two possibilities:
- either a SimpleScalar [5,6,7] simulator can be extended, or
- Seamless, a simulator developed by Mentor Graphics, can be used together with the method of T. Šimunić et al. [8].
Extending a SimpleScalar simulator would require adding operating system functionality and energy modelling to the existing simulator. Sim-cache, which models the SimpleScalar PISA, would be the most suitable simulator to use. PISA contains most of the MIPS instruction set architecture, so that would be the architecture simulated.
If the T. Šimunić et al. method were used together with Seamless, instead of the ARMulator that they used, we would have a simulator that models an ARM processor and can handle an operating system. Only the energy modelling would need to be added.
An overview of the two different choices can be found in figure 2.
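Figure 2: The two possible choices of simulators to extend. Left: OS + application on sim-cache, modelling a MIPS processor with cache and memory, producing a result. Right: OS + application on Seamless, modelling an ARM processor and memory, with the possibility to add an RTU, producing a result.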
One advantage of sim-cache is that it executes at instruction level, while Seamless executes at bit level. One advantage of the T. Šimunić et al. method is that VHDL code can be added to Seamless, which makes it easier to add an RTU to the system. Another is that it simulates ARM code, which sim-cache cannot yet handle. A new version of SimpleScalar that models an ARM processor is being developed but is not yet complete.
SW Symo could be used as the operating system. It was developed at the university by a student as a master thesis [1]. It is a complete software version of the operating system Symo, which is partly implemented in hardware (RTU). The interface is exactly the same for both operating systems.
6 Solutions
A decision was made to use and extend the SimpleScalar simulator sim-cache. There was no strong reason for this choice. Both methods described in the previous chapter have their advantages and disadvantages, and neither seemed clearly better than the other. It did, however, seem easier to modify sim-cache, because SimpleScalar had already been studied and tested during the first part of this master thesis.
The SimpleScalar simulators model the portable instruction set architecture (PISA), SimpleScalar's own instruction set, which contains most of the 64-bit MIPS instruction set architecture. When Linux is used as the development environment, as in this case, it follows the big-endian byte order (i.e. byte zero is always the most significant byte). Before implementation started, the simulator code was studied to get an overview of the system. The SimpleScalar tool set, version 3, had already been installed; the simulators can be downloaded at no cost from the SimpleScalar homepage [7]. A website [3] containing all the files and some descriptions was a big help during this part.
6.1 The operating system, SW Symo
As mentioned in chapter 5, the operating system SW Symo [1] could be used, and a decision was made to do so. SW Symo had to be modified to some extent, because it was implemented to execute on a Motorola 68000 system. Its assembly code therefore had to be rewritten in MIPS assembly, which is what sim-cache simulates. Rewriting the assembly code for MIPS instead of the M68000 also made it necessary to modify the task control block (TCB) structure. In addition, the special input and output routines that depended on the original platform had to be changed.
The external interrupt routines included in the operating system are not implemented, mainly because they are not of interest in this case. The timer cannot be used with sim-cache, so the simulator must instead determine when it is time for the operating system's task switch routine to run.
When the operating system was written in M68000 assembler, the kernel code, which must not be interrupted, was protected by executing it in supervisor mode. When SW Symo is used with sim-cache, the kernel code cannot run in supervisor mode. It could therefore be interrupted when sim-cache is about to execute the task switch routine, SCHEDULER. This is prevented by having sim-cache examine the current address before it starts executing the task switch routine.
To run the operating system on the simulator, it must be compiled with the special SimpleScalar compiler, which produces code the simulator can execute. It must also be linked with the libraries and the crt0 file that accompany the SimpleScalar tool set. This made it necessary to change the makefile.
The linker script [4] that came with SW Symo could not be used, since it was made for an M68000 system and must now fit a MIPS system and memory map instead. The linker script also controls the order in which files are placed in memory, which makes it possible to separate the kernel code from the application code. The linker script also had to be modified because the address of the task switch routine must be known, and this address can be controlled in the linker script as well.
The modified files are, of course, the assembler files ass_support.S, symosysc.S, symo_off.inc and symo_ext.inc. Because the TCB has changed, the files symodef.h, symo_off.i and rtu_file.c also had to be modified, and the application file symo_os.c was naturally changed as well. Not all files are needed for use with sim-cache. For example, the hardware-dependent routines in the files basinout.obj and inpout.obj are not needed.
When writing an application for SW Symo, the user has to make sure that the system contains an idle task with the lowest priority (priority 7).
6.1.1 Implemented system calls
Here follows a very short description of the system calls that are still in use. Some system calls can no longer be used, for example those that handle external interrupts.
Initialisation routines:
os_init – initialises variables and lists.
Thread management routines:
thread_create – creates a thread and initialises the TCB for that thread.
thread_start – the first time this system call is made, it starts executing the first thread in the ready queue (i.e. the thread with the highest priority); in all other cases, it makes the specified thread ready.
thread_delete – deletes/terminates the currently running thread.
thread_block – blocks the currently running thread.
thread_yield – if another thread with the same priority exists in the system, switches execution to that thread.
thread_getinfo – returns information on the specified thread.
Time management routines:
init_period_time – initialises the period time for a periodic thread.
wait_for_next_period – makes the current periodic thread wait for its next period.
stop_period – disables periodic start for the specified periodic thread.
start_period – enables periodic start for the specified periodic thread.
delay – puts the executing thread to sleep (waiting) for a specified number of ticks.
remove_from_timeq – removes the specified thread from the delay/period queue and activates or terminates the thread, respectively.
read_timeq – returns the time left in the waiting queue for the specified thread.
Semaphore functions:
create_semaphore – creates and initialises a semaphore.
delete_semaphore – deletes the specified semaphore.
pend_semaphore – makes the currently running thread pending for the specified semaphore.
release_semaphore – releases the specified semaphore.
read_semaphore – returns the count value for the specified semaphore.
6.2 The MIPS instruction set architecture
A MIPS processor consists of an integer processing unit (the CPU) and a collection of coprocessors. Coprocessor 0 handles traps, exceptions and the virtual memory system. One of its registers is the EPC (Exception Program Counter) register, which normally contains the value of the program counter when an exception occurs. Here it is used to save the address at which the application is interrupted when the task switch routine is to be executed. The register can be read with the mfc0 (move from coprocessor 0) instruction.
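Figure 3: The MIPS architecture. The figure shows memory; the CPU with registers $0-$31, an arithmetic unit, and a multiply/divide unit with the LO and HI registers; the FPU (coprocessor 1) with registers $f0-$f31 and an arithmetic unit; and coprocessor 0 (traps and memory) with the BadVAddr, Cause, Status and EPC registers.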
The MIPS central processing unit contains 32 general-purpose registers numbered 0-31, and register n is designated $n. A set of conventions for how the registers should be used has been established and can be found in [2]. The pseudo-op codes and all instructions can also be found in [2], but not all of these instructions and op-codes are included in PISA.
6.3 Modifications in sim-cache
The timer routines cannot be used to decide when it is time to switch tasks (as already mentioned in section 6.1), so the simulator has to control this instead. After a certain number of instructions has been executed, sim-cache runs the task switch routine (SCHEDULER). This number is set to 35 000 instructions but can easily be changed. The value 35 000 was chosen to meet three conditions: the initialisation routines and the system start-up have had plenty of time to execute and finish, SCHEDULER is able to run, and there is still time left for the application to execute between task switches. When SCHEDULER is to be executed, the address where the application was interrupted is saved in the EPC register, which SCHEDULER can read using the mfc0 instruction.
To add operating system functionality to sim-cache, the address of the task switch routine must be known so that the execution path can be changed. As described in section 6.1, the linker script controls the address of this function. When the operating system is compiled, an option is used to save the addresses where functions and variables are placed in memory to a file (sumo.map). The address of the task switch routine can be found in this file.
SimpleScalar PISA does not include all MIPS instructions. In particular, it lacks the mfc0 instruction, which is needed to translate the assembler part of the operating system. This made it necessary to change sim-cache so that the simulator can handle this instruction. Johan Stärner at the university had already done this once, so there was no reason to repeat the work. The files changed for this reason were machine.c/h/def (pisa.c/h/def), dlite.c, regs.h and sim-cache.c. The compiler from SimpleScalar version 4 can be used, as the instruction has already been added to this compiler.
Some system calls require the task switch routine to be executed directly: thread_delete, thread_block, delay and wait_for_next_period. The simulator has to detect when the application makes one of these system calls, so that the address where the application was interrupted can be saved in the EPC register.
In the linker script for the operating system, all kernel code that must not be interrupted is placed in one address range. This lets the simulator check, before running SCHEDULER, that no such code is executing, as already described in section 6.1. These addresses can also be found in the file sumo.map. The main function of the extended sim-cache can be viewed in appendix C.
6.4 Problems occurred during implementation
One problem encountered during implementation was an error that occurred while linking the operating system:
File:rownr: Relocation truncated to fit: GPREL variable
After a great deal of searching, it was concluded that the global variables in the assembly code caused this error. The underlying reason was never found, but the error disappeared when the definitions of these global variables were moved to C code.
Another problem, which took a long time to find, seemed to be caused by the function printf not being re-entrant. An attempt was made to verify this theory by e-mailing the SimpleScalar help mailing list as well as SimpleScalar directly, but there was no response. The source code for printf is not available (at least I cannot find it), so to my knowledge the theory cannot be verified by examining the code. The function therefore has to be protected somehow. It was decided to leave this responsibility to the user who writes the application, by requiring that printf is only used together with semaphores. As a consequence, printf cannot be used in the idle task, since it would then interfere with the application tasks if the idle task held the semaphore. Another possible solution would be to make the address check that sim-cache performs before running the task switch routine more elaborate, so that it never interrupts printf.
6.5 Execution times for each system call
Execution times have been estimated for the system calls described in section 6.1. The execution time of each system call is obtained by counting the number of cycles needed to execute it.
The simulated cache structure consists of an 8 KB level 1 instruction cache, an 8 KB level 1 data cache and a 256 KB unified level 2 cache. The cache access times used in article [12], shown in table 1 below, seemed sensible, so they were adopted here. Thus, if there is a level 1 cache hit, the instruction is assumed to take 1 cycle to execute.
Parameter | Cycles
L1 cache access time | 1
L2 cache access time | 12
Memory latency | 54
Table 1: Microarchitectural parameters used during simulation.
The cycle time of the main memory is assumed to be 90 ns, a value taken from article [10]. This is the only article found that gives both the cycle time and the energy consumption of the memory, which in that article is a 1 MB SRAM. With a simulated clock frequency of 600 MHz (the reason for this frequency is given in section 6.6), this corresponds to a memory latency of 54 cycles.
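The memory latency in cycles is the memory cycle time multiplied by the clock frequency:

```latex
N_{\text{mem}} = t_{\text{mem}} \cdot f_{\text{clk}}
  = 90\,\text{ns} \times 600\,\text{MHz}
  = (90 \times 10^{-9}\,\text{s}) \times (600 \times 10^{6}\,\text{s}^{-1})
  = 54\ \text{cycles}
```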
Each system call has to be executed at least a few times so that the average number of cycles needed to execute it can be calculated. The average is needed because the number of cycles can differ depending on where in the code the system call is made. Each system call and its average number of cycles are written to a file called syscallstats.dat, together with the minimum and maximum values. The kernel code address range must be known here too, in order to detect when the system call has finished executing.
As mentioned earlier, some system calls require the task switch routine to run directly afterwards. In those cases, the execution of the task switch routine is included in the number of cycles counted for the system call.
Figure 4 below shows the execution times obtained from a simulation using the application in appendix B. The system call os_init is not included in the figure because its simulated execution time, 10782 cycles, is much higher than that of the other system calls. Since it is an OS initialisation routine, it is only executed once.
For some of the system calls, the minimum and maximum execution times and energy consumption obtained in the simulation differed considerably. For thread_start, this may be because it has a special function the first time it is executed. The system call thread_create is the first system call made in the application, which may explain why its values differ so much.
Figure 4: The minimum, average and maximum cycles needed to execute the system calls. The figure is a bar chart with system calls on the x-axis and cycles (0-3500) on the y-axis. It covers pend_semaphore, thread_create, thread_getinfo, thread_yield, thread_start, delay, thread_delete, wait_for_next_period, delete_semaphore, release_semaphore, thread_block, start_period, create_semaphore, read_timeq, read_semaphore, stop_period, init_period_time and remove_from_timeq.
The simulated results were compared with the results that Larisa Rizvanovic [1] obtained when executing SW Symo on the M68000 system. The comparison is shown in table 2 below. The execution times for SW Symo on the M68000 system were measured in microseconds. They have been converted to cycles assuming the default clock frequency of the system, i.e. 16 MHz. Cycles are compared rather than seconds because execution time in seconds depends even more on the clock frequency than the cycle count does. The comparison uses the average simulated execution times.
System call          M68000 (µs)  M68000 (cycles)  Sim. avg. (cycles)  Difference (cycles)
init_period          42.7         684              429                 255
start_period         41.6         666              268                 398
stop_period          40.1         642              437                 205
create_semaphore     44.3         709              750                 -41
pend_semaphore*      70           1120             537                 583
read_semaphore       40.8         653              256                 397
release_semaphore    70.2/71      1123/1136        619                 504/517
delete_semaphore     67           1072             378                 694
thread_create        127          2032             1028                1004
os_init              2735         43760            10732               33000
thread_yield         37.4         598              261                 337
thread_start         91/160       1456/2560        859                 597/1701
remove_from_timeq    70           1120             687                 433
thread_delete        79           1264             1317                -53
thread_block         43           688              533                 155

*pend_semaphore on the M68000: 70 µs, measured with the semaphore free and the semaphore queue empty.
Difference = M68000 cycles - simulated average cycles.
Table 2: A comparison between execution times.
The execution time for the system call pend_semaphore in [1] assumes that the semaphore is free and that the semaphore queue is empty. This is not the case for the simulated execution time. The system calls release_semaphore and thread_start have two different values in [1], depending on the circumstances under which they are called. This distinction was not taken into account in the simulation either.
Table 2 shows that the simulated execution time is lower for almost all of the system calls. Only create_semaphore and thread_delete are slower in the simulation. This was somewhat expected, since the M68000 system has no cache, whereas the simulated system has both level 1 and level 2 caches. For most of the system calls, however, the two sets of execution times follow each other quite well: system calls with a high execution time are high in both cases. For example, the execution time of os_init is much higher than that of all the other system calls in both cases.
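This conclusion can be checked mechanically from Table 2. The minimal sketch below uses table values copied in by hand. Where the table lists two M68000 values (release_semaphore, thread_start), it uses the first one. The array and function names are illustrative.

```c
#include <stddef.h>

/* Values copied from Table 2; for rows with two M68000 values the first is used. */
static const char *const syscall_names[] = {
    "init_period", "start_period", "stop_period", "create_semaphore",
    "pend_semaphore", "read_semaphore", "release_semaphore", "delete_semaphore",
    "thread_create", "os_init", "thread_yield", "thread_start",
    "remove_from_timeq", "thread_delete", "thread_block"
};
static const long m68000_cycles[] = {
    684, 666, 642, 709, 1120, 653, 1123, 1072, 2032, 43760, 598, 1456, 1120, 1264, 688
};
static const long sim_avg_cycles[] = {
    429, 268, 437, 750, 537, 256, 619, 378, 1028, 10732, 261, 859, 687, 1317, 533
};

#define N_SYSCALLS (sizeof sim_avg_cycles / sizeof sim_avg_cycles[0])

/* Name of table row i. */
const char *syscall_name(size_t i) { return syscall_names[i]; }

/* Table 2 difference: a positive value means the simulated call is faster. */
long cycle_difference(size_t i) { return m68000_cycles[i] - sim_avg_cycles[i]; }

/* Number of system calls whose simulated average is below the M68000 value. */
size_t count_faster_in_simulation(void)
{
    size_t n = 0;
    for (size_t i = 0; i < N_SYSCALLS; i++)
        if (cycle_difference(i) > 0)
            n++;
    return n;
}
```

With these values, 13 of the 15 system calls are faster in the simulation. The two negative differences are create_semaphore (-41 cycles) and thread_delete (-53 cycles).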
6.6 Energy modelling
Energy modelling must consider all the components in the system, i.e. the caches, the main memory and the bus have to be modelled as well as the processor itself, see figure 5.
[Figure 5 diagram: CPU with L1 Instr. Cache and L1 Data Cache, a unified L2 Instr./Data Cache, and off-chip MEMORY.]
Figure 5: The simulated architecture. The dotted lines indicate that there are two different chips.
The energy consumed in the caches and the CPU is modelled using the power simulator Wattch together with sim-cache. The supervisor of this thesis, Raimo Haukilahti, has connected the simulators and made the changes they needed for this purpose. The frequency used in Wattch is 600 MHz.
The processor and the caches are on one chip, while the memory is on another (off-chip memory). This requires an off-chip bus, which consumes a considerable amount of energy and therefore needs to be modelled. A 32-bit address bus and a 64-bit data bus are assumed. Each bus line is assumed to have a capacitive load of 20 pF [9], and the voltage is assumed to be 3.3 V, as in articles [12] and [10]. By using the equation:
$$E = \frac{1}{2} C V^{2} \qquad (1)$$
where E is the energy (J), C the capacitance (F) and V the voltage (V), the energy consumed for each bus line transition can be calculated. A bus line carries either a 1 or a 0, and energy is consumed every time the value on the line switches.
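Inserting the assumed values into equation (1) gives 0.5 × 20 pF × (3.3 V)² = 1.089·10⁻¹⁰ J, i.e. 0.1089 nJ per line transition. This is the constant E used in Appendix A. A one-function sketch of the calculation (the function name is illustrative):

```c
/* Energy (J) dissipated when one bus line with capacitance c_farad switches
   at supply voltage v_volt, following equation (1): E = 0.5 * C * V^2. */
double line_switch_energy(double c_farad, double v_volt)
{
    return 0.5 * c_farad * v_volt * v_volt;
}
```

Calling line_switch_energy(20e-12, 3.3) and multiplying the result by 1e9 converts it to nanojoules, approximately 0.1089 nJ.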
The energy consumed by the bus is evaluated from the switching activity on the bus lines. Due to time restrictions, the energy model is simplified by assuming that 50% of the lines switch every time the bus is used, instead of calculating exactly how many lines switch value.
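Under the 50% assumption, each bus transaction toggles half of the 32 + 64 = 96 lines. One access therefore costs 48 × 0.1089 nJ ≈ 5.23 nJ. The sketch below shows this simplified model next to the exact alternative mentioned above. The exact variant counts toggled lines as the population count of the XOR between consecutive bus values. It is only an illustration and was not implemented in the thesis. All names are hypothetical.

```c
#include <stdint.h>

#define ADDRBUS_WIDTH 32     /* address bus lines, as in Appendix A */
#define DATABUS_WIDTH 64     /* data bus lines, as in Appendix A */
#define E_LINE_NJ     0.1089 /* energy per switching line (nJ), from equation (1) */

/* Simplified model used in the thesis: half of all lines switch on every access. */
double bus_energy_simplified(unsigned long accesses)
{
    return (ADDRBUS_WIDTH + DATABUS_WIDTH) * 0.5 * E_LINE_NJ * (double)accesses;
}

/* Portable population count: number of set bits in x. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Alternative (not used in the thesis): charge only the lines that actually
   toggle between the previous and the new address and data values. */
double bus_energy_exact(uint32_t prev_addr, uint32_t addr,
                        uint64_t prev_data, uint64_t data)
{
    unsigned toggles = popcount64((uint64_t)(prev_addr ^ addr))
                     + popcount64(prev_data ^ data);
    return toggles * E_LINE_NJ;
}
```

The exact variant requires the simulator to keep the last value driven on each bus. The addressBus and dataBus arrays that Appendix A initialises would be a natural place to store these values.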
The energy needed to access the main memory is also somewhat simplified. Both read and write operations are assumed to consume 4.95 nJ per access, the same value used in articles [9] and [10]. Even when it is not accessed, the memory consumes some energy. The idle energy used for the memory is 0.000016667 nJ ((1/600 MHz) × 0.01 mW), as in article [10].
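The memory model can be summarised as follows. An access costs 4.95 nJ, and an idle period of one 600 MHz clock cycle costs (1/600 MHz) × 0.01 mW ≈ 1.67·10⁻⁵ nJ. The minimal sketch below assumes these constants. As in Appendix A, the idle term is added once per simulated step (one instruction in the main loop of sim-cache) in which the memory is not accessed. The names are illustrative.

```c
#define MEM_ACCESS_NJ 4.95    /* energy per read or write access (nJ) */
#define CLOCK_HZ      600e6   /* simulated clock frequency (Wattch setting) */
#define MEM_IDLE_W    0.01e-3 /* idle power of the memory (W), i.e. 0.01 mW */

/* Idle energy for one clock period in nJ: (1 / f) * P_idle, converted from J to nJ. */
double mem_idle_energy_nj(void)
{
    return (1.0 / CLOCK_HZ) * MEM_IDLE_W * 1e9;
}

/* Memory energy for one simulated step: each access costs MEM_ACCESS_NJ,
   and a step without accesses costs only the idle energy. */
double mem_energy_for_step_nj(unsigned accesses)
{
    return accesses > 0 ? MEM_ACCESS_NJ * accesses : mem_idle_energy_nj();
}
```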
Figure 6 shows the simulated energy consumption for the different system calls. The application used to obtain these results is listed in appendix B. Here too, os_init is not included in the figure because its energy consumption, 9262.2 nJ, is much higher than that of the other system calls.
[Figure 6 plot: y-axis nJ (0 to 900); x-axis System calls (pend_semaphore, thread_create, thread_getinfo, thread_yield, thread_start, delay, thread_delete, wait_for_next_period, delete_semaphore, release_semaphore, thread_block, start_period, create_semaphore, read_timeq, read_semaphore, stop_period, init_period_time, remove_from_timeq); series Min, Average, Max.]
Figure 6: The energy consumed for the different system calls.
As mentioned in section 2 (related work), the energy consumption of kernel functions has also been measured in article [15], using eCos as the operating system, at frequencies of 59 MHz and 221.2 MHz. The overall energy consumption reported there is higher at the lower frequency. At the higher frequency of 221.2 MHz, the measured energy consumption lies between 340 nJ and 13540 nJ. However, the operating systems differ and the simulated frequency is higher (600 MHz), so it is difficult to compare the energy consumption reported in [15] with the simulated values presented in this report.
7 Results and future work
The extended simulator developed in this master thesis can simulate a whole system. As a result of the simulation, it reports the execution time and energy consumption for the system calls in the operating system. The energy models for the main memory and the off-chip bus are somewhat simplified, however: when modelling the energy consumed in the off-chip bus, 50% of the bus lines are assumed to switch value each time the bus is used. Furthermore, the simulator models a MIPS processor, although an ARM processor would have been preferred. Version 4 of the SimpleScalar simulators, which can simulate an ARM processor, is being developed but was not finished when the implementation started. There has not been time to investigate adding an RTU to the simulator.
The operating system has been simulated together with a simple test application, and the execution time and energy consumption of each system call have been obtained. The simulated execution times have been compared to the execution times measured in [1] when running SW Symo on an M68000 system. Table 2 in section 6.5 shows that the simulated execution time is lower for almost all of the system calls. This was somewhat expected, since the M68000 system has no cache. Little can be said about the simulated energy consumption, since there are no values to compare the results with.
Summary
Of the two alternatives described in section 5, the SimpleScalar simulator sim-cache [5] was chosen for extension. Sim-cache models the SimpleScalar PISA, and PISA contains most of the MIPS instruction set architecture, so the simulated processor is effectively a MIPS processor. The operating system functionality was added to sim-cache by making sim-cache start executing the OS task switch routine after a given number of instructions have executed.
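Emulating a timer interrupt in the simulator amounts to redirecting the program counter every N instructions. The sketch below mirrors the logic in Appendix A under four assumptions. N is 35000 instructions. The task switch routine starts at 0x004010a0. Code at or above 0x00408000 is treated as non-kernel code. Instructions are 8 bytes wide, which is assumed here to be the size of a PISA instruction word. The register struct and the function name are hypothetical stand-ins for the data structures of sim-cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCHED_INTERVAL  35000u      /* instructions between scheduler runs (Appendix A) */
#define SCHEDULER_ADDR  0x004010a0u /* start address of the task switch routine */
#define USER_CODE_START 0x00408000u /* addresses at or above this are not kernel code */
#define INSN_SIZE       8u          /* assumption: sizeof(md_inst_t) for PISA */

/* Minimal register state needed for the sketch (hypothetical type). */
typedef struct {
    uint32_t pc, npc; /* current and next program counter */
    uint32_t epc;     /* exception PC: where to resume after the task switch */
} cpu_regs_t;

/* Advances the PC after one instruction. Returns true if a simulated timer
   interrupt redirected execution to the task switch routine. */
bool step_pc(cpu_regs_t *r, unsigned *insn_counter)
{
    (*insn_counter)++;
    if (*insn_counter >= SCHED_INTERVAL && r->npc >= USER_CODE_START) {
        r->epc = r->npc;                     /* resume point after the task switch */
        r->pc  = SCHEDULER_ADDR;
        r->npc = SCHEDULER_ADDR + INSN_SIZE;
        *insn_counter = 0;
        return true;
    }
    r->pc = r->npc;                          /* normal sequential advance */
    r->npc += INSN_SIZE;
    return false;
}
```

If the counter expires while kernel code is running, the counter is not reset. The redirect is then postponed until execution returns to application code, as in Appendix A.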
SW Symo [1] was chosen as the operating system for the simulation. It is a complete software version of the operating system Symo, which is partly implemented in hardware. The operating system was implemented to run on an M68000 system, so its assembler files had to be rewritten in MIPS assembler. This in turn required changes to the TCB structure. The makefile and the linker script had to be changed as well, since the special SimpleScalar compilers must be used and the linker script must now fit the MIPS memory map. The linker script also fixes the addresses of the functions, which sim-cache needs.
Execution times are measured for the different system calls in the operating system. On a level 1 cache hit, an instruction is assumed to take one cycle to execute. A level 1 cache miss means that the unified L2 cache must be accessed, which increases the execution time of the instruction by 12 cycles. The main memory access time is assumed to be 54 cycles. These figures are taken from the literature; the same cycle counts are used in articles [12] and [10].
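The cycle model can be written as a small function. This sketch assumes the latencies stated above (1, 12 and 54 cycles). It uses a hypothetical struct that records what happened in the caches for one instruction. Appendix A charges an instruction fetch miss 12 cycles instead of the single hit cycle, not in addition to it, and the sketch follows that code.

```c
#define L1_HIT_CYCLES     1  /* instruction executes in one cycle on an L1 hit */
#define L2_ACCESS_CYCLES  12 /* cost of going to the unified L2 cache */
#define MEM_ACCESS_CYCLES 54 /* cost of one main memory access */

/* Cache outcome for one instruction, as derived from the hit/miss counters. */
typedef struct {
    int      il1_hit;   /* instruction fetch hit in the L1 instruction cache */
    int      dl1_miss;  /* data access missed in the L1 data cache */
    unsigned l2_misses; /* L2 misses caused by this instruction (each goes to memory) */
} insn_cache_events_t;

/* Cycles charged for one instruction, following the counting in Appendix A. */
unsigned long instruction_cycles(const insn_cache_events_t *ev)
{
    unsigned long c = ev->il1_hit ? L1_HIT_CYCLES : L2_ACCESS_CYCLES;
    if (ev->dl1_miss)
        c += L2_ACCESS_CYCLES;
    c += (unsigned long)ev->l2_misses * MEM_ACCESS_CYCLES;
    return c;
}
```

For example, an instruction that hits in the instruction cache, misses in the data cache and then misses in L2 is charged 1 + 12 + 54 = 67 cycles.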
An energy modelling part was also added to the simulator. The caches and the CPU are modelled using the power simulator Wattch. The supervisor of this master thesis helped connect Wattch to sim-cache and made the necessary changes in the simulators. A simple energy model for the memory and the off-chip bus was added as well. For the off-chip bus, 50% of the bus lines are assumed to switch value each time the bus is used, instead of calculating exactly how many lines switch. This simplification was made mainly due to time restrictions. The memory energy per access and when idle is taken from article [10], and the capacitive load on each bus line is assumed to be the same as in article [9].
The simulation produces the minimum, maximum and average execution times and energy consumption for the different system calls in the operating system. Execution time is measured in cycles and energy in nJ.
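The minimum, average and maximum values can be accumulated incrementally, one sample per completed system call, so no per-call history has to be stored. The minimal sketch below uses hypothetical names. One such record would be kept per system call for cycles and another for energy. Appendix A starts the minimum at 30000. The sketch starts it at DBL_MAX instead, so no upper bound on a sample has to be assumed.

```c
#include <float.h>

/* Running statistics for one quantity (cycles or nJ) of one system call. */
typedef struct {
    unsigned long n; /* number of completed calls */
    double min, max; /* extreme values seen so far */
    double sum;      /* sum of all values, used for the average */
} run_stats_t;

void stats_init(run_stats_t *s)
{
    s->n = 0;
    s->min = DBL_MAX;  /* any real sample is smaller */
    s->max = -DBL_MAX; /* any real sample is larger */
    s->sum = 0.0;
}

/* Records one completed system call. */
void stats_add(run_stats_t *s, double x)
{
    s->n++;
    s->sum += x;
    if (x < s->min) s->min = x;
    if (x > s->max) s->max = x;
}

/* Average of the samples, or 0 if no call has completed yet. */
double stats_avg(const run_stats_t *s)
{
    return s->n ? s->sum / (double)s->n : 0.0;
}
```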
The execution times and energy consumption for the different system calls have been obtained by simulating a simple application together with the operating system; the application is listed in appendix B. The simulated execution times have been compared to the execution times measured for the system calls when running SW Symo on an M68000 system [1]. Table 2 in section 6.5 shows that all but two of the system calls (create_semaphore and thread_delete) have a lower simulated execution time. This is probably partly because the M68000 system has no cache.
References
[1] Larisa Rizvanovic, Symo HW/SW Real-Time Kernel for single processor system, master thesis, Mälardalen University, 2001.
[2] Larry Huffman, David Graves, MIPSpro™ Assembly Language Programmer’s Guide, http://www.mips.com/Documentation/MIPSPro_Ass._Lang_Vol1.pdf, 1996.
[3] Yuan Wei, Simplescalar Source Code Analysis, http://www.cs.virginia.edu/~yw3f/cs757/simplescalar/index.html, 2001.
[4] Steve Chamberlain, Ian Lance Taylor, Cygnus Solutions, Linker Scripts, http://www.geekgadgets.org/docs/ld_3.html, 2001.
[5] Doug Burger, Todd M. Austin, The SimpleScalar Tool Set, Version 2.0, 1997.
[6] Todd Austin et al., SimpleScalar Tutorial (for release 4.0), held at MICRO-34, 2001.
[7] The SimpleScalar LLC homepage, www.SimpleScalar.com, 2001.
[8] Tajana Šimunić, L. Benini, G. De Micheli, Mat Hans, Source Code Optimization and Profiling of Energy Consumption in Embedded Systems, 2000.
[9] N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, W. Ye, Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower, ISCA, 2000.
[10] Tajana Šimunić, L. Benini, G. De Micheli, Cycle-Accurate Simulation of Energy Consumption in Embedded Systems, proceedings of DAC, 1999.
[11] V. Srinivasan, E. S. Davidson, G. S. Tyson, M. J. Charney, T. R. Puzak, Branch History Guided Instruction Prefetching, 2001.
[12] D. Brooks, V. Tiwari, M. Martonosi, Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, ISCA, 2000.
[13] K. Baynes, C. Collins, E. Fitherman, B. Ganesh, P. Kohout, C. Smit, T. Zhang, B. Jacob, The Performance and Energy Consumption of Three Embedded Real-Time Operating Systems, November 2001.
[14] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, presented at ACM, 2000.
[15] A. Acquaviva, L. Benini, B. Riccò, Energy Characterization of Embedded Real-Time Operating Systems, 2001.
Appendix A
The main function of the extended file, sim-cache.c:
#define NROFSYSCALLS 19
#define ADDRBUS_WIDTH 32
#define DATABUS_WIDTH 64
void sim_main(void) {
int i, line;
md_inst_t inst;
register md_addr_t addr;
enum md_opcode op;
register int is_write;
enum md_fault_type fault;
struct stat_stat_t stat;
struct stat_stat_t * statpointer;
counter_t cache_dl1_accesses_old; /*Contains the old value of nr of accesses*/
counter_t cache_dl1_accesses_hits;/*Contains the old value of nr of accesses*/
counter_t cache_il1_hits_old, cache_il1_misses_old, cache_dl1_hits_old,
cache_dl1_misses_old, cache_dl2_hits_old, cache_dl2_misses_old;
counter_t cycles = 0, cycles_old = 0;
int calcSyscallStats = 0; /* Calculating syscall stats? (1=yes) */
long start_value = 0;
int syscall;
double syscallEnergy = 0;
syscallStats_t syscallStats[NROFSYSCALLS]; /* If the array size is changed, */
/* the func. initStrings must be changed too!! */
double totCacheCPU_energy = 0; /* Energy consumed in cache and CPU (nJ) */
double bus_energy = 0; /* Energy consumed in the off-chip bus (nJ). */
double mem_energy = 0; /* Energy consumed in the main memory (nJ). */
/* The capacitance for the lines in the bus, C=20pF, the voltage, V=3.3V. */
double E = 0.1089;
/* Energy consumed per busline, 0.5*C*V*V, (nJ) */
FILE *file, *file1, *totEnergyFile;
int instr_counter = 0;
fprintf(stderr, "sim: ** starting functional simulation w/ caches **\n");
/** Initialisation routines... **/
file1 = fopen("syscallstats.dat","w");
if (file1 == NULL){
printf("ERROR: Could not open file.");
exit(0);
}
fprintf(file1, "Syscall\t\t\tMin\tAvr\tMax\n");
for (syscall = 0; syscall <= NROFSYSCALLS-1; syscall++){
syscallStats[syscall].nrOfSyscalls = 0;
syscallStats[syscall].exectime.min = 30000;
syscallStats[syscall].exectime.sum = 0;
syscallStats[syscall].exectime.max = 0;
syscallStats[syscall].energy.min = 30000;
syscallStats[syscall].energy.sum = 0;
syscallStats[syscall].energy.max = 0;
}
initStrings(syscallStats);
totEnergyFile = fopen("energy.dat","w");
if (totEnergyFile == NULL){
printf("ERROR: Could not open file.");
exit(0);
}
fprintf(totEnergyFile, "Syscall\t\t\tBusEnergy\tMemEnergy\tCPUCacheEnergy\tTOTAL\n");
for (line=0; line<ADDRBUS_WIDTH; line++)
addressBus[line] = 0;
for (line=0; line<DATABUS_WIDTH; line++)
dataBus[line] = 0;
file = fopen("output.txt","w");
if (file == NULL){
printf("ERROR: Could not open file.");
exit(0);
}
fprintf(file, "#Instr Tot_cycles Instr_cycles Icache Dcache L2Cache CPU TOTAL\n"); /* output data header */
statpointer = &stat;
/* set up initial default next PC */
regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);
/* check for DLite debugger entry condition */
if (dlite_check_break(regs.regs_PC, /* no access */0, /* addr */0, 0, 0))
dlite_main(regs.regs_PC - sizeof(md_inst_t), regs.regs_PC,
sim_num_insn, &regs, mem);
while (TRUE)
{
/*rhi:added for Wattch to clear hardware access counters */
clear_access_stats();/* main_mem_access also cleared here. */
/* maintain $r0 semantics */
regs.regs_R[MD_REG_ZERO] = 0;
#ifdef TARGET_ALPHA
regs.regs_F.d[MD_REG_ZERO] = 0.0;
#endif /* TARGET_ALPHA */
cache_dl1_accesses_old = cache_dl1->hits + cache_dl1->misses;
cache_il1_hits_old=cache_il1->hits;
cache_il1_misses_old=cache_il1->misses;
cache_dl1_hits_old=cache_dl1->hits;
cache_dl1_misses_old=cache_dl1->misses;
cache_dl2_hits_old=cache_dl2->hits;
cache_dl2_misses_old=cache_dl2->misses;
cycles_old = cycles;
/* get the next instruction to execute */
if (itlb)
cache_access(itlb, Read, IACOMPRESS(regs.regs_PC),
NULL, ISCOMPRESS(sizeof(md_inst_t)), 0, NULL, NULL);
if (cache_il1){
/* rhi: added for wattch */
icache_access++;
cache_access(cache_il1, Read, IACOMPRESS(regs.regs_PC),
NULL, ISCOMPRESS(sizeof(md_inst_t)), 0, NULL, NULL);
}
MD_FETCH_INST(inst, mem, regs.regs_PC);
/* keep an instruction count */
sim_num_insn++;
/* set default reference address and access mode */
addr = 0; is_write = FALSE;
/* set default fault - none */
fault = md_fault_none;
/* decode the instruction */
MD_SET_OPCODE(op, inst);
/* execute the instruction */
switch (op)
{
#define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3) \
case OP: \
SYMCAT(OP,_IMPL); \
break;
#define DEFLINK(OP,MSK,NAME,MASK,SHIFT) \
case OP: \
panic("attempted to execute a linking opcode");
#define CONNECT(OP)
#define DECLARE_FAULT(FAULT) \
{ fault = (FAULT); break; }
#include "machine.def"
default:
panic("attempted to execute a bogus opcode");
}
if (fault != md_fault_none)
fatal("fault (%d) detected @ 0x%08p", fault, regs.regs_PC);
if (MD_OP_FLAGS(op) & F_MEM){
sim_num_refs++;
if (MD_OP_FLAGS(op) & F_STORE)
is_write = TRUE;
}
/* update any stats tracked by PC */
for (i=0; i < pcstat_nelt; i++){
counter_t newval;
int delta;
/* check if any tracked stats changed */
newval = STATVAL(pcstat_stats[i]);
delta = newval - pcstat_lastvals[i];
if (delta != 0){
stat_add_samples(pcstat_sdists[i], regs.regs_PC, delta);
pcstat_lastvals[i] = newval;
}
}
/* check for DLite debugger entry condition */
if (dlite_check_break(regs.regs_NPC, is_write ? ACCESS_WRITE : ACCESS_READ,
addr, sim_num_insn, sim_num_insn))
dlite_main(regs.regs_PC, regs.regs_NPC, sim_num_insn, &regs, mem);
/*** Counting cycles... ***/
if (cache_dl1->hits+cache_dl1->misses > cache_dl1_accesses_old){
dcache_access++;
}
/* Instruction cache is always accessed... */
if (cache_il1->hits > cache_il1_hits_old)
cycles++;
/* l1 hit. */
else
cycles += 12;
/* l1 miss. */
/* miss in dl1 cache ? */
if(cache_dl1->misses > cache_dl1_misses_old)
cycles += 12;
/* accessed main memory ? */
if(cache_dl2->misses > cache_dl2_misses_old)
cycles += (cache_dl2->misses - cache_dl2_misses_old)*54;/*could be several accesses */
if(main_mem_access != (cache_dl2->misses - cache_dl2_misses_old))
printf("\nWARNING !!!!! inconsistent data: memory accesses\n");
/*rhi:Added by Wattch to update per-cycle power statistics */
update_power_stats();
totCacheCPU_energy += report_totCache_energy();
/** Estimate bus and memory energy consumption... **/
/* Using a simplified bus energy model for now..
-> assume 50% of the lines switch each time the bus is used... */
/* Energy consumed per line that switches value: E = 0.1089 nJ */
/* Energy consumed in memory: on access 4.95nJ, else 0.0000166667nJ */
if ( main_mem_access > 0 ){
mem_energy += 4.95*main_mem_access;
bus_energy += ((ADDRBUS_WIDTH+DATABUS_WIDTH)*0.5*E) * main_mem_access;
}else{
mem_energy += 0.0000166667; /* Some energy consumed in memory anyway. */
/* No energy consumed in buses. */
}
/** Already estimating exectime and energy consumption for a syscall? **/
if (calcSyscallStats == 1){
if (regs.regs_NPC >= 0x00408000){
/* if syscall finished, save syscall stats... */
syscallStats[syscall].nrOfSyscalls++;
/* ..exectime stats */
syscallStats[syscall].exectime.sum += cycles - start_value;
if (syscallStats[syscall].exectime.min > cycles - start_value)
syscallStats[syscall].exectime.min = cycles - start_value;
if (syscallStats[syscall].exectime.max < cycles - start_value)
syscallStats[syscall].exectime.max = cycles - start_value;
/* ..and energy stats */
syscallEnergy = bus_energy+mem_energy+totCacheCPU_energy;
syscallStats[syscall].energy.sum += syscallEnergy;
if (syscallStats[syscall].energy.min > syscallEnergy)
syscallStats[syscall].energy.min = syscallEnergy;
if (syscallStats[syscall].energy.max < syscallEnergy)
syscallStats[syscall].energy.max = syscallEnergy;
/* print result to file. */
fprintf(totEnergyFile, "%f\t%f\t%f\t%f\n", bus_energy, mem_energy,
totCacheCPU_energy, bus_energy+mem_energy+totCacheCPU_energy);
calcSyscallStats = 0;
}
}
/** Is a new system call made? **/
if (strcmp(MD_OP_NAME(op),"jal") == 0 &&
regs.regs_NPC >= 0x004014c0 &&
regs.regs_NPC <= 0x00402690 &&
calcSyscallStats != 1){
syscall = check_syscall(regs.regs_NPC, totEnergyFile);
if (syscall==19)
/* False alarm... */
calcSyscallStats = 0;
/*fprintf(file1, "%x ", regs.regs_NPC);*/
else{
calcSyscallStats = 1; /* start tracking syscall stats. */
start_value = cycles;
mem_energy = 0;
bus_energy = 0;
totCacheCPU_energy = 0;
}
}
/** Check if syscall requires direct execution of taskswitch... **/
/* if so save address to EPC reg. */
if (strcmp(MD_OP_NAME(op),"jal")==0){
/* syscall addresses in sumo.map */
if (regs.regs_NPC == 0x00401ff8 /* delay */
|| regs.regs_NPC == 0x00402130
/* wait_for_next_period */
|| regs.regs_NPC == 0x00401be8
/* thread_block */
/*|| regs.regs_NPC == 0x004018b8
/* thread_delete */
|| regs.regs_NPC == 0x00401910
/* thread_yield */
|| regs.regs_NPC == 0x00402610
/* pend_sem */
) {
/* NPC is changed due to the fact that jal is a jump instruction. */
regs.regs_COPROC.epc = regs.regs_PC + sizeof(md_inst_t);
}
}
/** Run SCHEDULER? **/
instr_counter++;
/* After 35000 instr let SCHEDULER run, if OS kernel code is not already running. */
if (instr_counter >= 35000 &&
regs.regs_NPC >= 0x00408000/* &&
regs.regs_NPC <= 0x00408570 /*not kernel code*/){
printf("** 35 000 instr (%d) - run SCHEDULER **\n", instr_counter);
regs.regs_COPROC.epc = regs.regs_NPC;
/** 0x004010a0 is the startadress for SCHEDULER (see sumo.map). */
regs.regs_PC = 0x004010a0;
regs.regs_NPC = 0x004010a0 + sizeof(md_inst_t);
instr_counter = 0;
}
else{
/* go to the next instruction */
regs.regs_PC = regs.regs_NPC;
regs.regs_NPC += sizeof(md_inst_t);
}
/* finish early? */
if (max_insts && sim_num_insn >= max_insts){
/** When simulating a OS we always end up here... **/
/* so write result to file and close. */
for (syscall = 0; syscall <= NROFSYSCALLS-1; syscall++){
if (syscallStats[syscall].nrOfSyscalls > 0)
fprintf(file1, "%d %s%d\t%d\t%d\t%.2f\t%.2f\t%.2f\n",
syscallStats[syscall].nrOfSyscalls,
syscallStats[syscall].name,
syscallStats[syscall].exectime.min,
(syscallStats[syscall].exectime.sum/
syscallStats[syscall].nrOfSyscalls),
syscallStats[syscall].exectime.max,
syscallStats[syscall].energy.min,
(syscallStats[syscall].energy.sum/
syscallStats[syscall].nrOfSyscalls),
syscallStats[syscall].energy.max);
}
fclose(file);
fclose(file1);
fclose(totEnergyFile);
return;
}
}
}
Appendix B
The application (symo_os.c) used when retrieving the results (execution time and energy consumption
for the different system calls).
#define IDLE_ID 1
#define IDLE_PRIO 15
#define task1_ID 2
#define task1_PRIO 0
#define task2_ID 3
#define task2_PRIO 0
#define y_ID 4
#define y_PRIO 0
#define t3_ID 5
#define t3_PRIO 6
#define t4_ID 6
#define t4_PRIO 6
void main(void){
printf("STARTMAIN\n");
os_init();
thread_create(IDLE_ID, IDLE_PRIO,ready,idle, idle_stack,STACK_SIZE);
thread_create(y_ID, y_PRIO,blocked, y_idle, y_stack, STACK_SIZE);
thread_create(task2_ID, task2_PRIO, ready,task2, task2_stack,STACK_SIZE);
thread_create(task1_ID, task1_PRIO, ready,task1, task1_stack,STACK_SIZE);
thread_create(t3_ID, t3_PRIO, ready,task3, t3_stack,STACK_SIZE);
thread_create(t4_ID, t4_PRIO, ready,task4, t4_stack,STACK_SIZE);
if( create_semaphore(1,5))
printf("sem1 error\n");
if(create_semaphore(2,2))
printf("sem2 error\n");
thread_start(IDLE_ID);
while (1){}
}
void idle(void){
while(1){}
}
void y_idle(void){
int flag=0;
timeq_info_t info;
init_period_time(2); /* Must be called from the current task!! */
start_period(y_ID);
create_semaphore(3,3);
read_semaphore(3);
while (1){
pend_semaphore(1);
printf("Y\n");
release_semaphore(1);
pend_semaphore(3);
delete_semaphore(3);
info = read_timeq(0,y_ID);
read_semaphore(1);
/*delay(1): Periodic threads not allowed to use delay!!*/
if (flag==1){
thread_start(task1_ID);
remove_from_timeq(0,t4_ID);
}
pend_semaphore(1);
printf("Y - calling wait_for_next_period\n");
release_semaphore(1);
wait_for_next_period();
flag=1;
}
}
void task1(void){
thread_start(y_ID);
read_semaphore(1);
while(1){
pend_semaphore(1);
printf("t1\n");
release_semaphore(1);
thread_block();
pend_semaphore(1);
printf("t1 - Some task woke me up!!\n");
release_semaphore(1);
thread_yield();
}
}
void task2(void){
thread_info_t information;
while(1){
pend_semaphore(1);
read_semaphore(2);
printf("t2\n");
release_semaphore(1);
pend_semaphore(2);
delete_semaphore(2);
information =thread_getinfo(task2_ID);
if(information.priority == 1){
pend_semaphore(1);
printf("prio=1\n");
release_semaphore(1);
}
pend_semaphore(1);
printf("t2 - calling delay\n");
release_semaphore(1);
delay(1);
pend_semaphore(1);
printf("t2 - After delay!\n");
release_semaphore(1);
read_semaphore(1);
thread_start(task1_ID);
pend_semaphore(1);
printf("t2 - task 1 is now ready, this task will be deleted\n");
stop_period(t4_ID);
release_semaphore(1);
thread_delete();
}
}
void task3(){
init_period_time(3);
start_period(t3_ID);
pend_semaphore(1);
printf("t3\n");
release_semaphore(1);
while (1){
wait_for_next_period();
thread_getinfo(t3_ID);
/*stop_period(t3_ID);
remove_from_timeq(0,t3_ID); /* 1-activate, 0-terminate */
}
}
void task4(){
int i=0;
delay(1);
init_period_time(2);
start_period(t4_ID);
thread_getinfo(t4_ID);
thread_getinfo(t3_ID);
pend_semaphore(1);
printf("t4\n");
release_semaphore(1);
delay(1);
while(1){
i++;
if(i>3)
stop_period(t4_ID);
if(i>5){
delay(2);
remove_from_timeq(0,t3_ID);
thread_delete();
}
wait_for_next_period();
}
}