Mälardalen University
Department of Computer Engineering
Supervisor: Raimo Haukilahti
Examiner: Lennart Lindh

Master Thesis in Computer Engineering
Energy Modelling of Portable Embedded Systems
Lena Higberg
lhg98006@student.mdh.se
Västerås, 2002

Foreword

This document describes a master thesis in computer engineering and consists of two documents. The first document, Power/Energy Simulators for Embedded Systems, is a state-of-the-art report that surveys existing power simulators for embedded systems. The second document, Energy Simulation of OS kernel System Calls, describes the extension of a simulator and the results obtained during simulation.

Acknowledgement

I would sincerely like to thank my supervisor, Raimo Haukilahti, for all the help during this master thesis. I would also like to thank Johan Stärner at Mälardalen University, who helped me with the modifications of the simulator sim-cache.

Contents

Power/Energy Simulators for Embedded Systems
Energy Simulation of OS kernel System Calls

Power/Energy Simulators for Embedded Systems

Abstract

This state-of-the-art report is the first part of a master thesis in computer engineering and consists of a survey of existing techniques for energy modelling of complete embedded systems and their components. The power simulators described in this report are Wattch, SimplePower, PowerTimer, Tempest, SimBed and three other, unnamed methods for estimating the power consumed in a system. SimpleScalar is also described, even though it is not a power simulator, because several of the power simulators described in this report are based on it.
The purpose of this survey is to gather information on different power simulators and to find out whether any existing simulator can simulate a whole system, that is, both an application and an operating system, and preferably report both the power consumption and the performance (i.e. the execution time) of the application and the operating system separately.

Table of Contents

1 Introduction
1.1 Background
1.2 Purpose
1.3 Limitations
2 Power simulators
2.1 Architectural level simulators
3 SimpleScalar
3.1 Simulator internals
4 Wattch
5 SimplePower
6 SimBed
6.1 Simulator description
7 TEM2P2EST
8 PowerTimer
9 A framework for energy analysis of embedded operating systems
10 An energy and performance profiler for embedded systems
11 A hardware/software approach to analyse the energy overhead
Conclusions
References

1 Introduction

This state-of-the-art report is the first part of a master thesis in computer engineering and consists of a survey of existing techniques for energy modelling of complete embedded systems and their components. The simulators described in this report are SimpleScalar, Wattch, PowerAnalyzer, SimplePower, PowerTimer, Tempest, SimBed and three unnamed methods that estimate the power consumption. This section briefly describes the background to this master thesis and the purpose of its first part.

1.1 Background

“Recently the power consumption within embedded systems has gained attention and has become one of the primary design constraints. Portable designs such as MP3-players, palmtops and cellular phones contain batteries and have therefore a limited amount of energy. Operation time is then dependent on the systems power consumption.
To be able to make a good energy-efficient design, low-power design techniques must be applied at many levels of abstraction and to each component.” Consequently, low-power design must be applied at the lowest levels of abstraction as well as at the architectural and system levels (i.e. to applications and operating systems). Being able to simulate the power consumption of a system is important: if the power consumption turns out to be too high, the designers can make changes at a very early stage of the design, which saves both time and money.

1.2 Purpose

The purpose of this state-of-the-art report is to survey existing power simulators at the architectural level, and especially to find out whether any of them can simulate a whole system, that is, both an application and an operating system. Moreover, the simulator should report both the power consumption and the execution time, preferably for the operating system and the application separately.

1.3 Limitations

Only architectural energy simulators are described in this report. For example, three relatively well-known simulators, Rsim [1], Simics [2] and SimOS [3], are not described since they are only performance simulators. Methods for reducing power consumption and optimisation techniques are not covered either.

2 Power simulators

Low-power design can be classified into the following levels of abstraction [4], [5]:

Application/System level – the energy consumed by a particular program as well as by an operating system (i.e. system calls) can be reduced at this level.
Behavioral/Algorithm level – different algorithms for the same purpose consume different amounts of power.
Architectural level – here the power consumption of, for example, caches, the core and processor buses is analysed and optimised.
Logic (gate) level – at this level both the function and the style of the circuit are decided.
There are various design styles, each with its own power-performance tradeoffs.
Transistor (circuit) level – this is the lowest level of abstraction; techniques unique to this level can be used to further limit the power dissipation of the circuit (e.g. changing the input voltage or reordering transistors).

Figure 1 gives an overview of the levels and how they compare in terms of analysis capacity, accuracy, speed, resources and energy savings.

Figure 1: An overview of the levels of abstraction (from [5]). From the Application/System level through the Behavioral/Algorithm, Architectural and Logic (Gate) levels down to the Transistor (Circuit) level, analysis capacity goes from most to least, analysis accuracy from worst to best, analysis speed from fastest to slowest, analysis resources from least to most and energy savings from most to least.

In this report, power simulators at the architectural level are surveyed.

2.1 Architectural level simulators

An architectural simulator is a tool that reproduces the behaviour of a computing device [6]. Figure 2 shows a taxonomy of hardware modeling tools.

Figure 2: A taxonomy of hardware modeling tools [10]. Architectural simulators are divided into trace-driven and execution-driven simulators, and execution-driven simulators into emulation and direct execution.

A trace-driven simulator uses a trace of executed instructions, obtained by first executing the program on a real system. This trace is then used to drive a model of the system under test. Collecting the instruction trace requires a variety of hardware and software techniques. Unlike execution-driven simulators, trace-driven simulators cannot model mis-speculated code execution, because the trace only records the correct program execution [6].

Execution-driven simulators, also called event-driven simulators, reproduce the execution of instructions on the simulated machine either by emulation or by direct execution. The processor and surrounding components such as memory are simulated.
A reference generator simulates the activities of the processor and issues memory references or commands to a simulator of the memory system. When the memory system simulator receives a reference or command from the reference generator (which is now a program rather than a predetermined trace), it simulates the path of the reference through the extended memory hierarchy, including contention with other references, and returns to the reference generator the time that the reference took to be satisfied. Execution-driven simulation is more accurate than trace-driven simulation because of this feedback from the memory system simulator to the reference generator [6, 8].

Direct execution decouples functional and timing simulation [7]. Functional simulation generates values (for memory and registers) and control flow by executing the application on a real processor (the host) while storing the values. Timing simulation determines the number of cycles taken by the simulated execution (timing for non-memory instructions is determined mostly by static analysis).

Emulation, on the other hand, creates a model of the processor to be simulated, and the application is then executed on this model.

3 SimpleScalar

SimpleScalar is a cycle-accurate architectural level processor simulator [8, 9]. It is distributed free of charge, with all source code, to academic non-commercial users, which makes it relatively easy to extend. Since its release, SimpleScalar has become popular, as this survey illustrates: many of the simulators described here are based on it. The SimpleScalar tool set includes several simulators, ranging from a fast functional simulator to a detailed, dynamically scheduled processor model that supports non-blocking caches, speculative execution and state-of-the-art branch prediction.
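The feedback loop that distinguishes execution-driven from trace-driven simulation (Section 2.1) can be illustrated with a minimal sketch. It assumes a toy in-order processor, a single direct-mapped cache and made-up hit and miss latencies; the names `DirectMappedCache` and `run` are hypothetical and not part of SimpleScalar. The reference generator executes the program and, for every load and store, asks the memory-system simulator how long the reference took.

```python
class DirectMappedCache:
    """Toy memory-system simulator: one direct-mapped cache in front of memory.

    The latencies are illustrative assumptions, not values from any real tool.
    """

    def __init__(self, num_lines=64, line_size=32, hit_latency=1, miss_latency=20):
        self.num_lines = num_lines
        self.line_size = line_size        # bytes per cache line
        self.hit_latency = hit_latency    # cycles
        self.miss_latency = miss_latency  # cycles
        self.tags = [None] * num_lines    # one tag per line, None = empty

    def access(self, address):
        """Simulate one reference and return the cycles it took to be satisfied."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return self.hit_latency
        self.tags[index] = tag  # fill the line on a miss
        return self.miss_latency


def run(program, cache):
    """Reference generator: executes the program and advances simulated time
    using the latency that the memory simulator returns for each reference."""
    cycles = 0
    for op, address in program:
        if op in ("load", "store"):
            cycles += cache.access(address)  # feedback from the memory system
        else:
            cycles += 1  # assume every non-memory instruction takes one cycle
    return cycles
```

With the default parameters (a 2 KB cache), the program `[("load", 0), ("add", None), ("load", 4), ("load", 2048), ("load", 0)]` takes 62 cycles: address 2048 maps to the same line as address 0 and evicts it, so the final load misses again. In a real execution-driven simulator the returned latency would also influence which instructions are issued next; this sketch only uses it to advance time.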
SimpleScalar cannot simulate a whole system (it can only simulate applications), and it does not report the power consumption. It is described in this report, despite being only a performance simulator, because several of the power simulators described later are based on it. A new version of SimpleScalar is due some time this year; among other simulators, it includes one that can simulate a whole system, although only with respect to performance [10]. PowerAnalyzer [10], a power simulator based on SimpleScalar/ARM (also not yet released), will also be included in this new version.

3.1 Simulator internals

Figure 3 gives an overview of the simulators included in SimpleScalar.

Figure 3: SimpleScalar simulators, performance versus detail [2]. Ordered from highest performance (least detail) to most detail: sim-fast (420 lines, no timing, 4+ MIPS), sim-safe (350 lines, no timing, with checks), sim-profile (900 lines, no timing, lots of statistics), sim-cache and sim-cheetah (about 1000 lines, functional, cache statistics) and sim-outorder (3900 lines, performance model with out-of-order issue, branch prediction, mis-speculation, ALUs, cache and TLB, 150 KIPS).

The fastest and least detailed simulator, sim-fast, does no time accounting, only functional simulation: it executes each instruction serially, simulating no instructions in parallel, no cache and no instruction checking. The simulation speed of sim-fast (on a 1.7 GHz Pentium 4) is 10+ million instructions per second (MIPS). A separate version of sim-fast, called sim-safe, also performs functional simulation, but checks each memory reference for correct alignment and access permissions. Its results are the same as those of sim-fast, except that the total numbers of executed loads and stores are also reported. The SimpleScalar tool set also includes two cache simulators, sim-cache and sim-cheetah.
These simulators are intended for high-level cache studies that do not take the access time of the caches into account (e.g. studies concerned only with miss rates). They simulate a level-one instruction cache, a level-one data cache, a level-two unified cache, an instruction TLB and a data TLB. A TLB (translation lookaside buffer) stores the most recently used page table entries. The tool set also includes a profiling simulator, sim-profile, which generates detailed profiles of instruction classes and addresses, text symbols, memory accesses, branches and data segment symbols.

The most complex simulator is sim-outorder, a detailed out-of-order issue simulator with a multi-level memory system. It produces all the results of the simulators above and also the number of cycles taken by the simulated execution (i.e. timing simulation). The simulation speed of sim-outorder (on a 1.7 GHz Pentium 4) is 350+ KIPS.

The software architecture of the SimpleScalar hardware model is shown in figure 4. Applications run on the model using execution-driven simulation, which requires an instruction-set emulator and an I/O emulation module.

Figure 4: The SimpleScalar hardware model's software architecture [4]. The components shown are the target ISA emulator and the I/O emulator (reached through the target ISA and I/O interfaces), the simulator core with branch predictor, resource, cache, loader, statistics, memory and register modules, and a host interface to the host platform.

The I/O emulation module provides simulated programs with access to external input and output facilities. SimpleScalar supports several I/O emulation modules, ranging from system-call emulation to full-system simulation. For system-call emulation, the system invokes the I/O module whenever a program attempts to execute a system call in the instruction set interpreter.
The system emulates the call by translating it to an equivalent host operating-system call and directing the simulator to execute the call on the simulated program's behalf.

The simulator core defines the simulator's main loop, which executes one iteration for each instruction of the program until the program finishes. For a timing model (i.e. sim-outorder), the main loop must account for the progression of execution time measured in clock cycles instead of instructions. SimpleScalar models several instruction sets, including SimpleScalar PISA and Alpha, and supports several host platforms such as Windows/NT, Linux/x86 and Sparc/Solaris.

4 Wattch

Wattch is a simulator, developed at Princeton University, that estimates processor power consumption at the architectural level; it is one of the simulators based on SimpleScalar. Its power estimation is based on a suite of parameterizable power models for different hardware structures [11]. SimpleScalar is used as the cycle-level performance simulator, which keeps track of which units are accessed in each cycle and records the total energy consumed by an application. Wattch uses a modified version of SimpleScalar's sim-outorder, extended with additional pipeline stages to bring it more in line with current microprocessors. Since it is based on sim-outorder, it models an Alpha processor. Figure 5 shows the structure of Wattch.

Figure 5: The structure of Wattch [11]. A binary and a hardware configuration drive the cycle-level performance simulator, which produces a performance estimate and cycle-by-cycle hardware access counts; the parameterizable power models turn these counts into a power estimate.

There are three ways to use Wattch. In the first, the user compares several design configurations that can be obtained simply by varying the parameters of the modelled hardware structures (microarchitectural tradeoffs).
The second usage scenario is software or compiler development, where a single hardware configuration is used and several programs are simulated and compared (compiler optimisation). The third scenario highlights Wattch's modularity: additional hardware modules can be added to the simulator (hardware optimisation).

The main processor units that are modelled fall into four categories [11]:

Array structures: data and instruction caches, cache tag arrays, all register files, the register alias table, branch predictors and large portions of the instruction window and load/store queue.
Fully associative content-addressable memories (CAMs): for example instruction window/reorder buffer wakeup logic, load/store order checks and TLBs.
Combinational logic and wires: functional units, instruction window selection logic, dependency check logic and result buses.
Clocking: clock buffers, clock wires and capacitive loads.

The simulation speed is reduced by approximately 30% compared to performance simulation (sim-outorder alone). The accuracy is approximately ±13% [12]. Wattch is distributed freely, with source code, for non-commercial use.

5 SimplePower

SimplePower is an execution-driven, cycle-accurate architectural level energy estimation tool developed at the Pennsylvania State University. The simulated system consists of the processor core, on-chip instruction and data caches, off-chip memory and the interconnect buses between the core and the caches and between the caches and the off-chip memory, see figure 6. SimplePower can point out power hot spots in hardware and software before systems are built.

Figure 6: SimplePower structure [14]. The components shown are the core (pipeline), the cache/bus simulator with I-cache, D-cache and main memory, the power estimation interface with its energy switch capacitance tables, and an output module that reports energy statistics (core energy, memory energy and bus/I/O energy).
SimplePower simulates an in-order processor with a 5-stage pipeline. A perfect cache is assumed. SimplePower is also based on SimpleScalar and models a subset (the integer part, excluding division) of SimpleScalar's instruction set. Clock power, system calls and the processor's control unit are not modelled.

The core simulates the execution of all active instructions in each clock cycle and calls the power estimation interface of every activated functional unit. To keep the simulator technology-independent, a power estimation interface was developed for each architectural-level functional unit. If the implementation of a unit changes, only its table or interface implementation needs to be changed.

SimplePower uses both transition-sensitive and analytical energy models. Energy consumption depends on switching activity; an energy model that captures switching activity is called transition-sensitive (in contrast to an analytical energy model). SimplePower uses the number of transitions in a given operation to calculate the power consumption, since transitions account for the main part of the processor's power consumption. The technique builds an energy model for each functional unit. Each transition-sensitive model contains a table with the switch capacitance of the functional unit for each input transition, obtained from VLSI layouts and extensive HSPICE simulation. These switch capacitance tables are used to calculate the power consumed by the unit in response to an input transition. An input transition can be either a complete instruction or the toggling of a data line. The problem is that the tables can grow large if the number of possible input combinations is high. Therefore a clustering technique is used that groups similar transitions and energy patterns together.
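The transition-sensitive approach can be sketched as follows. The sketch assumes a hypothetical 4-bit functional unit whose switch capacitance table is clustered by the number of toggled input bits (the Hamming distance between consecutive inputs); this clustering criterion, the supply voltage and the capacitance values are illustrative assumptions, not SimplePower's actual tables. Energy is obtained from the accumulated switch capacitance with the relation E = C·V² that SimplePower uses for its total energy.

```python
VDD = 1.8  # supply voltage in volts (assumed value)


def toggled_bits(previous, current):
    """Number of input lines that switch between two consecutive inputs."""
    return bin(previous ^ current).count("1")


class TransitionSensitiveUnit:
    """Hypothetical functional unit with a clustered switch capacitance table.

    Rather than storing one capacitance per (previous, current) input pair,
    transitions are clustered by how many input bits toggle, so the table
    has only width + 1 entries.
    """

    def __init__(self, name, cap_per_cluster_pf):
        self.name = name
        self.cap_table = cap_per_cluster_pf  # index = toggled bits, value in pF
        self.prev_input = 0
        self.switched_cap_pf = 0.0

    def on_input(self, value):
        """Called once per activation: look up and accumulate the capacitance."""
        cluster = toggled_bits(self.prev_input, value)
        self.switched_cap_pf += self.cap_table[cluster]
        self.prev_input = value

    def energy_joules(self, vdd=VDD):
        """E = C * V^2, with C the accumulated switch capacitance."""
        return self.switched_cap_pf * 1e-12 * vdd ** 2
```

Feeding the inputs 0b0011, 0b0011 and 0b1100 to a unit created with the table [0.0, 0.5, 0.9, 1.2, 1.4] pF toggles 2, 0 and 4 bits, giving an accumulated switch capacitance of 2.3 pF. Clustering keeps the table at five entries instead of one per pair of inputs (16 × 16 = 256 for a 4-bit input), at the price of treating all transitions with the same number of toggled bits as equal.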
The energy model that SimplePower uses for the system buses is also transition-sensitive: the energy consumption of a bus depends on the switching activity on the bus lines and the capacitive load of each line. For the datapath, SimplePower uses predefined transition-sensitive models for each functional unit (ALU, multiplier, divider). In contrast, a simple analytical energy model is used to estimate the energy consumed in the memories. SimplePower provides the following outputs: the final state of the register file, the total number of execution cycles, the number of transitions on the buses, switch capacitance statistics for the different functional units and the total switch capacitance [13]. The total energy consumption can be calculated from the total switch capacitance as $E = C \cdot V^2$, where $C$ is the total switch capacitance and $V$ the supply voltage.

Compared with values obtained from HSPICE (a circuit-level simulator), the average error was found to be within 15% for all units. The simulation speed is not reported in [13, 14]. SimplePower is distributed free of charge and can be found on the university's homepage.

6 SimBed

SimBed [15, 16] is an execution-driven simulation testbed that measures the execution behaviour and power consumption of embedded applications and real-time operating systems (RTOSs) by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. The processor simulator measures the power consumption of the system to within 10-15% of real measurements.

6.1 Simulator description

SimBed is a cycle-accurate processor model that emulates the Motorola M-CORE processor (a low-power CPU core) as the microcontroller. All devices, interrupts and interrupt handlers used by the operating system and the application are accurately simulated. To make the simulation more realistic, some background load (real-time stimuli) should be run.
This was done by running two tasks: a periodic control loop task and an aperiodic inter-process communication task. SimBed keeps track of real-time jitter (which differs slightly from the traditional definition), response-time delay (which differs significantly from the traditional definition) and the total CPU energy consumption, divided into user, kernel, handler, semaphore and idle components. Figure 7 shows the structure of the testbed.

Figure 7: Test-bed structure [16]. Applications and RTOSs run on SimBed, which runs on the host platform and outputs statistics: power data, jitter data and delay data.

The emulator also includes a flash memory simulator, because microcontrollers often have flash memory. The emulator executes the program directly from “FLASH”, and writes to this memory are not allowed. The code is downloaded to the flash memory with a download tool included in SimBed. A display emulator makes it possible for the application to print on the screen. SimBed includes I/O simulation, so it can handle applications with input/output operations (such as MPEG). There is also an interrupt controller that handles all external interrupts.

The power consumption model used by SimBed is based on experimental data rather than simulation. The power consumption of each instruction is measured by running an infinite loop that contains only that instruction. This measured value is the base power consumption of the instruction, and multiplying it by the execution time of the instruction gives its basic energy consumption. This number is, however, too small, because some overhead must also be accounted for.
One reason is that during the single-instruction test, the state of the modules inside the processor does not change as much as it would if the previous instruction were different; this is called inter-instruction overhead. Another factor that influences the accuracy is the variation of the operands of each instruction. The total power consumption of a single instruction is therefore the sum of the base power consumption of the instruction, the approximated inter-instruction overhead and the average operand variation.

7 TEM2P2EST

TEM2P2EST (Tempest) stands for Thermal Enabled Multi-Model Power/Performance ESTimator and can be used at the architectural level as well as at the compiler level [17]. Tempest is a cycle-accurate microarchitectural power and performance estimator. This simulator is also based on SimpleScalar's sim-outorder. It can estimate power consumption either from empirical data or with analytical power models; the user selects the mode. The power models estimate both dynamic and leakage power, since leakage power is becoming increasingly important as process technology shrinks. Additional features include technology scaling options and dual-Vt technology support. A thermal model is also included that converts the power numbers into a temperature profile.

8 PowerTimer

PowerTimer [18] is a fast, cycle-accurate, parameterised research simulator developed by an IBM research group to aid in the evaluation of future PowerPC™ processors from the viewpoint of power-performance efficiency. PowerTimer extends a research simulator called Turandot [28, 29], which models a generic, parameterised, out-of-order superscalar processor with level-1 data and instruction caches, a level-2 cache, a branch predictor and main memory. The energy models are derived from real, circuit-level power simulation data.
These models are controlled by two sets of parameters:

Technology/circuit parameters – these allow appropriate scaling from one CMOS generation to the next.
Microarchitectural-level parameters – various queue/buffer sizes, pipe latencies and bandwidth values, which can be set by the user of the simulation tool.

PowerTimer can be used in two modes. In the first, the performance simulator is used standalone, and the statistics from the simulation are then processed through the energy models to generate average unit-wise power numbers. In the second, the energy models are embedded in the simulation code itself, which lets the user view the cycle-by-cycle energy characterization as well as the average unit-wise statistics of the first mode. The accuracy and speed of the simulation are not presented in the article. PowerTimer is distributed freely.

9 A framework for energy analysis of embedded operating systems

Robert P. Dick et al. [19, 20] have developed an energy analysis framework that can be used to analyse the energy consumption of the functions of RTOSs and applications. The internal operation of the SPARClite processor is simulated using a cycle-accurate instruction set simulator. To account for the effect of cache misses, an on-line SPARClite cache simulator is used. The framework also includes memory, timer, UART and bus interface models. Other peripherals (i.e. other hardware components, e.g. brake sensors and ASICs) can also be added to the simulator. The processor has a 5-stage pipeline, the SPARC v8 instruction set architecture and a power-down mode (reducing the frequency) that can be used to reduce energy consumption. The framework gives a detailed report, in the form of call trees, of the energy consumed by the applications and the RTOS functions.
A call tree is a graphical description of the function-call hierarchy that contains power statistics in the form of a histogram. Each tree node corresponds to a function call and has a child node for each function call made within that function; the energy and time consumed by every function call are annotated. Each component in the simulator has a power model that observes the data flowing through the component and computes the power as a function of the present and past values at its terminals. The power consumed by all the components is aggregated to obtain the total power consumed in the system per cycle. Among other statistics, the processor power model uses the current and previous instruction codes to determine the processor power consumption. Memory energy consumption is derived from the manufacturer's data sheet. It is easy to add new hardware to the simulated system (if the hardware implementation is known, its energy consumption can be computed using known energy analysis techniques [25][26][27]). The accuracy of this energy analysis framework is not presented in the article.

10 An energy and performance profiler for embedded systems

T. Šimunić et al. present in [21] a source code optimisation methodology and a profiler for energy consumption and performance in embedded systems. The profiler was developed by Stanford University and the University of Bologna in collaboration with Hewlett-Packard Laboratories. It simulates embedded systems consisting of a microprocessor with two levels of cache, off-chip memory, a DC-DC converter and a battery. The profiler extends previous work by the same authors [22], which in turn extended the ARMulator, a proprietary instruction-level performance simulator from ARM, with cycle-accurate energy models for all system components.
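The call-tree reports of the framework in Section 9 annotate each call with its energy and time, and a caller's total includes the calls made within it. The following is a minimal sketch of that aggregation, assuming each node stores only the energy and time spent in its own body; the class `CallNode`, the `report` helper and the function names in the usage example are hypothetical and not taken from either tool.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One function call in a call tree, annotated with self energy and time."""
    name: str
    self_energy_uj: float  # energy spent in the function body itself (uJ)
    self_time_us: float    # time spent in the function body itself (us)
    children: list = field(default_factory=list)

    def call(self, child):
        """Record a call made from within this function and return the child."""
        self.children.append(child)
        return child

    def total_energy_uj(self):
        """Inclusive energy: this call plus every call made within it."""
        return self.self_energy_uj + sum(c.total_energy_uj() for c in self.children)

    def total_time_us(self):
        """Inclusive time, aggregated the same way as energy."""
        return self.self_time_us + sum(c.total_time_us() for c in self.children)


def report(node, depth=0):
    """Print the tree as text, one line per call, indented by call depth."""
    print(f"{'  ' * depth}{node.name}: {node.total_energy_uj():.1f} uJ, "
          f"{node.total_time_us():.1f} us")
    for child in node.children:
        report(child, depth + 1)
```

For a tree where `main` (2.0 µJ and 10 µs of its own) calls `sem_wait` (1.5 µJ, 3 µs), which calls `context_switch` (4.0 µJ, 5 µs), and `main` also calls `uart_write` (3.0 µJ, 8 µs), `main.total_energy_uj()` returns 10.5 and `main.total_time_us()` returns 26.0; the energy of the whole program is thus the inclusive total of its root node.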
Without the profiler, evaluating the energy efficiency of two different implementations requires the designer to obtain cycle-by-cycle plots and then manually relate cycles to the software portion of interest; this is why the profiler was created. The profiler runs concurrently with the cycle-accurate simulator and periodically samples the simulation results. It maps energy and performance to the executed function using information gathered at compilation time. The user can choose how often the profiler gathers information; sampling less often makes the simulation faster. An interval of 1 μs is usually sufficient. The profiler can output both the performance and the energy consumption of every component in the system as well as of the source code. When profiling the source code, the output shows the total energy consumption or performance of each function, including the functions it calls. The total energy consumption of the program is the energy consumed by the main function. The accuracy is within 5% of the hardware measurements for the tested system.

11 A hardware/software approach to analyse the energy overhead

L. Benini et al. from the University of Bologna propose in [23] a methodology to analyse the energy overhead caused by the presence of an embedded operating system in a portable device. They used a hardware/software system as a case study to analyse the energy consumed within the system under test. The hardware is the SmartBadgeIII, a prototype wearable device from Hewlett-Packard Laboratories. As operating system they used eCos, a real-time operating system from Red Hat, ported to the target platform, the SmartBadge. The SmartBadgeIII has a StrongARM 1100 processor [24], which integrates on the same chip the ARM core, a memory management unit, interrupt and DMA controllers and many I/O controllers such as UART, audio and LCD.
The system also contains some memory: data and instruction caches, flash and static RAM. The experimental set-up consists of a hardware component and a software component, see figure 8. The hardware component consists of an I/V conversion board that converts the current (I) drawn by the SmartBadge into voltage (U) values. These values are sent to a data acquisition (DAQ) board that communicates with a PC running a LabVIEW program, which controls the measurement framework. To obtain the energy consumption, both the current and the execution time must be known; for that reason an accurate software trigger is used. The DAQ board allows an external signal to start and stop the measurement, and this signal is provided by driving a general-purpose pin on the processor. The LabVIEW program then computes the energy values by combining the power and time information.

Figure 8: The experimental set-up (SmartBadge, I/V conversion board, DAQ board and PC running LabVIEW).

Neither the accuracy nor the simulation speed of this method is presented in the article.

Conclusions

To give an overview of the simulators examined in this report and to make them easier to compare, they are summarised in Table 1.

Simulator | Power estimation accuracy | Components simulated | Distributed freely? | Handles OS and application? | Performance and/or power simulator? | Processors supported
Wattch | ±13% | Cache, off-chip memory, I/O | Yes | Application | Both | Alpha (sim-outorder)
SimplePower | 15% | Perfect cache, off-chip memory and buses, I/O | Yes | Application | Energy | Integer subset of SimpleScalar PISA
SimBed | 10-15% | Flash memory, I/O | No | Both | Power | Motorola M-CORE
PowerTimer | - | L1, L2 cache, ext. memory | Yes | Application | Both | PowerPC
Tempest | - | Cache, ext. memory, I/O | No | Application | Both | Alpha (sim-outorder)
R. P. Dick et al. | - | Cache, DRAM, timer, UART | No | Both | Energy | Fujitsu SPARClite
T. Šimunić et al. | 5% | Cache, off-chip memory, DC-DC converter, battery | No | Application | Energy | ARM
L. Benini et al. | - | All components in SmartBadge | No | Both | Energy | StrongARM 1100
Table 1: A comparison of the simulators examined in this report.

References

[1] C. J. Hughes, V. S. Pai, P. Ranganathan, S. V. Adve, Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors, 2002.
[2] Peter S. Magnusson et al., Simics: A Full System Simulation Platform, 2002.
[3] M. Rosenblum, S. A. Herrod, E. Witchel, A. Gupta, Complete Computer System Simulation: The SimOS Approach, 1995.
[4] Bengt Oelmann, Asynchronous and Mixed Synchronous/Asynchronous Design Techniques for Low Power, KTH, 2000.
[5] Pradip Bose et al., Power-Efficient Design: Modeling and Optimizations, tutorial, ISCA, 2001.
[6] David E. Culler and Jaswinder Pal Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, Inc., 1999, pages 231-234.
[7] M. Durbhakula, V. S. Pai, S. Adve, Improving the Accuracy vs. Speed Tradeoff for Simulating Shared-Memory Multiprocessors with ILP Processors.
[8] Todd Austin, Eric Larson and Dan Ernst, SimpleScalar: An Infrastructure for Computer System Modeling, IEEE, February 2002.
[9] Doug Burger and Todd M. Austin, The SimpleScalar Tool Set, Version 2.0, 1997.
[10] Todd Austin et al., SimpleScalar Tutorial (for release 4.0), held at MICRO-34, 2001.
[11] D. Brooks, V. Tiwari, M. Martonosi, Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, ISCA, 2000.
[12] S. Ghiasi, D. Grunwald, A Comparison of Two Architectural Power Models, 2000.
[13] W. Ye, N. Vijaykrishnan, M. Kandemir and M. J. Irwin, The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool, Microsystems Design Lab, The Pennsylvania State University, DAC, 2000.
[14] N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, W.
Ye, Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower, ISCA, 2000.
[15] K. Baynes, C. Collins, E. Fitherman, B. Ganesh, P. Kohout, C. Smit, T. Zhang and B. Jacob, The Performance and Energy Consumption of Three Embedded Real-Time Operating Systems, November 2001.
[16] T. Zhang, RTOS Performance and Energy Consumption Analysis Based on an Embedded System Testbed, Master's Thesis, University of Maryland, May 2001.
[17] A. Dhodapkar, C. H. Lim, G. Cai, W. R. Daasch, TEM2P2EST: A Thermal Enabled Multi-model Power/Performance ESTimator, 2001.
[18] D. Brooks, M. Martonosi, J-D. Wellman, P. Bose, Power-Performance Modeling and Tradeoff Analysis for a High End Microprocessor, 2000.
[19] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, presented at ACM, 2000.
[20] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, to be presented at ACM, 2000.
[21] Tajana Šimunić, L. Benini, G. De Micheli, Mat Hans, Source Code Optimization and Profiling of Energy Consumption in Embedded Systems, 2000.
[22] Tajana Šimunić, L. Benini, G. De Micheli, Cycle-Accurate Simulation of Energy Consumption in Embedded Systems, Proceedings of DAC, 1999.
[23] A. Acquaviva, L. Benini, B. Riccò, Energy Characterization of Embedded Real-Time Operating Systems, 2001.
[24] Advanced RISC Machines Ltd., Advanced RISC Machines Architectural Reference Manual, July 1996.
[25] L. Benini, G. De Micheli, Dynamic Power Management: Design Techniques and CAD Tools, 1997.
[26] A. P. Chandrakasan, R. W. Brodersen, Low Power Digital CMOS Design, 1995.
[27] J. Rabaey, M. Pedram (Editors), Low Power Design Methodologies, 1996.
[28] M. Moudgill, P. Bose, J. Moreno, Validation of Turandot, a Fast Processor Model for Microarchitecture Exploration, IEEE, 1999.
[29] M. Moudgill, J. Wellman, J.
Moreno, Environment for PowerPC Microarchitecture Exploration, 1999.

Mälardalen University
Department of Computer Engineering
Supervisor: Raimo Haukilahti
Examiner: Lennart Lindh

Energy Simulation of OS kernel System Calls
Lena Higberg
lhg98006@student.mdh.se
Västerås, 2002

Abstract

The SimpleScalar simulator sim-cache has been extended so that it can simulate not only an application but also an operating system. The simulation produces the execution times and energy consumption of the different system calls in the operating system. The operating system used is SW Symo, somewhat rewritten to work with the simulator. The simulated system consists of a MIPS processor, caches and off-chip memory. The energy consumed in the CPU and caches is modelled using the energy simulator Wattch, and simple energy models have been added for the main memory and the off-chip bus. Minimum, maximum and average execution times and energy consumption for the different system calls in SW Symo have been simulated using a simple application. The simulated execution times have been compared to those measured when running SW Symo on an M68000 system [1], and the comparison shows that all but two of the compared system calls (create_semaphore and thread_delete) have a lower simulated execution time, probably partly because the M68000 system has no cache.

Table of contents
1 Introduction
1.1 Purpose
1.2 Motivation
1.3 Limitations
2 Related work
3 Problem description
4 Problem analysis
5 Method
6 Solutions
6.1 The operating system, SW Symo
6.1.1 Implemented system calls
6.2 MIPS instruction set architecture
6.3 Modifications in sim-cache
6.4 Problems occurred during implementation
6.5 Execution times for each system call
6.6 Energy modelling
7 Results
Summary
References
Appendix A
Appendix B

1 Introduction

This section briefly describes the purpose, motivation and limitations of the implementation part of this master thesis. The background, i.e. why energy is an important aspect, has already been described in the state-of-the-art report.

1.1 Purpose

The purpose of this second part of the master thesis was to choose and extend a simulator so that the energy consumption of the different system calls in an operating system could be obtained. The simulator should therefore meet the requirements stated in the state-of-the-art report: it should be able to simulate a whole system, i.e. both an application and an operating system. As a result of the simulation, both the power consumption and the performance (i.e. the execution time) of each system call in the operating system should be produced.

1.2 Motivation

Energy consumption has become an important aspect when designing portable embedded systems. Portable devices depend on battery power and therefore have a limited amount of energy.
The operating time therefore depends on the system's power consumption. High power consumption also increases heat dissipation. Hence, there is a need to study the energy consumed in embedded systems under different circumstances. One aspect to explore is the energy consumed by the operating system in an embedded system, comparing an operating system implemented completely in software with one implemented partly in hardware. To compare these two cases, the energy consumed by each system call in the operating system can be compared.

1.3 Limitations

The execution time measurements and the energy modelling are somewhat simplified due to time restrictions. Also, the simulator can only simulate a specific operating system.

2 Related work

Similar work can be found in articles [13], [14] and [15]. Article [13] presents modelling of embedded systems with SimBed; both the execution behaviour (jitter and delay) and the energy consumption of embedded applications and RTOSs are measured. The energy consumption is not measured for each system call, as has been done here; instead it is measured for the applications, the idle task, semaphores and so on. Article [14] also analyses the power consumption of an RTOS and an application. The energy consumption is measured for the application tasks and can be traced for each function in a task. This article does not measure the energy consumption of the individual system calls in the operating system either. In article [15], a hardware/software method is used to analyse the energy overhead due to the presence of an embedded operating system in a wearable device. This article does, however, measure the minimum, maximum and average energy consumed by each system call in the OS.

3 Problem description

This master thesis is about energy characterization of the different system calls of an operating system.
As an example, consider two systems: (1) a CPU and memory, on which an operating system and an application run; and (2) a CPU, an RTU (Real-Time Unit, a real-time kernel in hardware) and memory, where (part of) the operating system is in hardware and the software part of the operating system runs together with an application. How much energy does the operating system consume in each case, for each system call? This is the question of interest, although this master thesis handles only the first case.

4 Problem analysis

To find the energy consumption of each system call, a simulator is needed. Does a simulator that can simulate the energy consumption of each system call already exist, or is it necessary to extend one? During the first part of this master thesis, a survey of existing energy simulators was made. The conclusion was that no existing simulator meets our requirements, so a simulator has to be extended. When deciding which simulator to extend, the simulated processor and the results produced by the simulation, among other things, must be considered. Preferably the simulator should simulate a processor for embedded systems; such processors are often somewhat simpler than, for instance, processors for PCs. In addition, the operating system and the application to be used must be decided. As mentioned earlier, this master thesis handles only the case where the operating system is entirely in software, but preferably it should be possible to add an RTU to the simulator. What needs to be decided is illustrated in figure 1.

Figure 1: What needs to be decided before starting implementation (application, RTU + OS, simulator, modelled CPU and result).

5 Method

The first thing that had to be done was to decide which simulator to extend and, with that, which processor to model.
An examination of the summary table in the state-of-the-art report showed that most of the simulators are not possible choices, because they are not accessible or because they model a processor that is too complex. In collaboration with the supervisor, it was concluded that there are in fact only two possibilities: either a SimpleScalar [5,6,7] simulator can be extended, or Seamless, a simulator developed by Mentor Graphics, can be used together with the method of T. Šimunić et al. [8]. If a SimpleScalar simulator were extended, operating system functionality and energy modelling would need to be added to the existing simulator. Sim-cache, which models the SimpleScalar PISA, would be the most suitable simulator to use. PISA contains most of the MIPS instruction set architecture, so MIPS would be the simulated architecture. If the T. Šimunić et al. method were used together with Seamless instead of the ARMulator that they used, we would have a simulator that models an ARM processor and can handle an operating system; only the energy modelling part would need to be added. An overview of the two choices is given in figure 2.

Figure 2: The two possible choices of simulators to extend: sim-cache (MIPS) and Seamless (ARM, with the possibility to add an RTU), each running OS + application.

One advantage of using sim-cache is that it executes at instruction level, while Seamless executes at bit level. An advantage of the T. Šimunić et al. method is that VHDL code can be added to Seamless, which makes it easier to add an RTU to the system. Another advantage is that it simulates ARM code, which sim-cache cannot handle yet; a new version of SimpleScalar that models an ARM processor is being developed but is not yet completed. As operating system, SW Symo could be used.
It is an operating system developed at the university by a student as a master thesis [1]. It is a complete software version of the operating system Symo, which is partly implemented in hardware (RTU); the interface is exactly the same for both operating systems.

6 Solutions

A decision was made to use and extend the SimpleScalar simulator sim-cache. There was no decisive reason for this choice: both methods described in the previous chapter have their advantages and disadvantages, and neither seemed clearly better than the other. It did, however, seem somewhat easier to modify sim-cache, because SimpleScalar had been studied and tested during the first part of this master thesis. The SimpleScalar simulators model the Portable Instruction Set Architecture (PISA), SimpleScalar's own instruction set, which contains most of the 64-bit MIPS instruction set architecture. With Linux as the development environment, as in this case, it follows the big-endian convention (i.e. byte zero is always the most significant byte). Before starting the implementation, the simulator code was studied to get an overview of the system. The SimpleScalar tool set, version 3, had already been installed; the simulators can be downloaded at no cost from the SimpleScalar homepage [7]. A website [3] that contains all the files and some descriptions has been a great help during this part.

6.1 The operating system, SW Symo

As mentioned in chapter 5, the operating system SW Symo [1] could be used, and a decision was made to do so. SW Symo had to be modified to some extent, because it was implemented to execute on a Motorola 68000 system. The assembly code therefore had to be rewritten in the MIPS assembly language simulated by sim-cache. Implementing the assembler code in MIPS instead of M68000 assembler also made it necessary to modify the task control block (TCB) structure.
Also, the special input and output routines that depended on the original platform had to be changed. The external interrupt routines included in the operating system have not been implemented, mainly because they are not of interest in this case. The timer cannot be used with sim-cache, so the simulator must instead determine when it is time for the operating system's task switch routine to run. When the operating system was written in M68000 assembler, the kernel code, which must not be interrupted, was protected by executing it in supervisor mode. When SW Symo is used with sim-cache, the kernel code cannot run in supervisor mode and could therefore be interrupted when sim-cache is about to execute the task switch routine, SCHEDULER. This is prevented by having sim-cache examine the current instruction address before it starts executing the task switch routine. To run the operating system on the simulator, it must be compiled with the special SimpleScalar compiler, which produces code that the simulator can execute, and linked with the libraries and the crt0 file that accompany the SimpleScalar tool set. This made it necessary to change the makefile. The linker script [4] that came with SW Symo could not be used, since it was made for an M68000 system and must now fit a MIPS system and its memory map instead. The order in which files are placed in memory can also be controlled in the linker script, which makes it possible to separate the kernel code from the application code. Another reason why the linker script had to be modified is that the address of the task switch routine must be known; this can also be controlled in the linker script. The modified files are, of course, the assembler files ass_support.S, symosysc.S, symo_off.inc and symo_ext.inc.
Because the TCB has changed, the files symodef.h, symo_off.i and rtu_file.c also had to be modified, and the application file, symo_os.c, has naturally been changed. Not all files are needed with sim-cache; this is the case, e.g., for the hardware-dependent routines in the files basinout.obj and inpout.obj. When writing an application for SW Symo, the user has to make sure that an idle task with the lowest priority (priority 7) exists in the system.

6.1.1 Implemented system calls

The following is a very short description of the system calls that are still in use (some system calls can no longer be used, for example those that handle external interrupts):

Initialisation routines:
os_init – initialises variables and lists.

Thread management routines:
thread_create – creates a thread and initialises the TCB of that thread.
thread_start – the first time this system call is made, it starts executing the first thread in the ready queue (i.e. the thread with the highest priority); in all other cases, it makes the specified thread ready.
thread_delete – deletes/terminates the currently running thread.
thread_block – blocks the currently running thread.
thread_yield – if another thread with the same priority exists in the system, switches execution to that thread.
thread_getinfo – returns information on the specified thread.

Time management routines:
init_period_time – initialises the period time of a periodic thread.
wait_for_next_period – makes the current periodic thread wait for its next period.
stop_period – disables periodic start of the specified periodic thread.
start_period – enables periodic start of the specified periodic thread.
delay – puts the executing thread to sleep (waiting) for a specified number of ticks/amount of time.
remove_from_timeq – removes the specified thread from the delay/period queue and activates or terminates the thread, respectively.
read_timeq – returns the time left in the waiting queue for a specified thread.

Semaphore functions:
create_semaphore – creates and initialises a semaphore.
delete_semaphore – deletes the specified semaphore.
pend_semaphore – makes the currently running thread pend on the specified semaphore.
release_semaphore – releases the specified semaphore.
read_semaphore – returns the count value of the specified semaphore.

6.2 The MIPS instruction set architecture

A MIPS processor consists of an integer processing unit (the CPU) and a collection of coprocessors. Coprocessor 0 handles traps, exceptions and the virtual memory system. One of the registers in coprocessor 0 is the EPC (Exception Program Counter) register, which normally holds the program counter value at the time an exception occurs. Here it is used to save the address at which the application is interrupted when the task switch routine is to be executed. The register can be read with the mfc0 (move from coprocessor 0) instruction.

Figure 3: The MIPS architecture: memory; the CPU with registers $0-$31, an arithmetic unit, a multiply/divide unit and the LO and HI registers; the FPU (coprocessor 1) with registers $f0-$f31 and an arithmetic unit; and coprocessor 0 (traps and memory) with the BadVAddr, Cause, Status and EPC registers.

The MIPS central processing unit contains 32 general-purpose registers, numbered 0-31; register n is designated $n. A set of conventions for how the registers should be used has been established and can be found in [2]. The pseudo-op codes and all instructions can also be found in [2], but not all of these instructions and op-codes are included in PISA.

6.3 Modifications in sim-cache

The timer routines cannot be used to decide when it is time to switch tasks (as already mentioned in section 6.1), so this has to be controlled by the simulator instead: after a certain number of instructions has been executed, sim-cache runs the task switch routine (SCHEDULER). This number is set to 35 000 instructions but can easily be changed.
The value of 35 000 instructions was chosen so that the initialisation routines and the system start-up have ample time to execute and finish, so that SCHEDULER can run, and so that there is still time left for the application to execute between task switches. When SCHEDULER is to be executed, the address at which the application was interrupted is saved in the EPC register, which SCHEDULER can read with the mfc0 instruction. To add operating system functionality to sim-cache, the address of the task switch routine must be known so that the execution path can be changed. The linker script, as described in section 6.1, controls the address of this function. When the operating system is compiled, an option is used to save the memory addresses of functions and variables to a file (sumo.map), where the address of the task switch routine can be found. SimpleScalar PISA does not include all the MIPS instructions; in particular, it lacks the mfc0 instruction needed to assemble the assembler part of the operating system. Some changes to sim-cache were therefore necessary so that the simulator can handle this instruction. This had already been done by Johan Stärner at the university, so there was no reason to do the work again. The files changed for this reason were machine.c/h/def (pisa.c/h/def), dlite.c, regs.h and simcache.c. The compiler from SimpleScalar version 4 can be used, as the instruction has already been added to this compiler. Some of the system calls, namely thread_delete, thread_block, delay and wait_for_next_period, require the task switch routine to be executed directly.
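The preemption mechanism described above can be modelled in a few lines of Python: once 35 000 instructions have executed since the last switch, the simulator checks that the current instruction address lies outside the protected kernel region, saves the interrupted address in EPC and redirects execution to SCHEDULER. This is an illustrative model, not the code of the modified sim-cache. The addresses are hypothetical placeholders for values that would be read from sumo.map, and it is assumed that a switch blocked by kernel code is simply retried at the next instruction.

```python
# Hypothetical addresses standing in for values taken from sumo.map.
KERNEL_START = 0x00400000    # first address of the protected kernel code
KERNEL_END = 0x00402000      # one past the last kernel address
SCHEDULER_ADDR = 0x00401000  # task switch routine, placed inside the kernel range
TIME_SLICE = 35_000          # instructions between forced task switches


class CpuState:
    """Minimal processor state: program counter and coprocessor 0's EPC."""

    def __init__(self, pc: int) -> None:
        self.pc = pc
        self.epc = 0


def in_kernel(pc: int) -> bool:
    return KERNEL_START <= pc < KERNEL_END


def maybe_switch(cpu: CpuState, executed_since_switch: int) -> bool:
    """Divert execution to SCHEDULER when the time slice has expired.

    Returns True if a switch was started. If the processor is executing
    kernel code, nothing happens and the caller retries at the next
    instruction (an assumption of this sketch).
    """
    if executed_since_switch < TIME_SLICE or in_kernel(cpu.pc):
        return False
    cpu.epc = cpu.pc           # SCHEDULER reads this back with mfc0
    cpu.pc = SCHEDULER_ADDR    # change the execution path
    return True


app = CpuState(pc=0x00410000)         # somewhere in application code
print(maybe_switch(app, 34_999))      # slice not yet expired: False
print(maybe_switch(app, 35_000), hex(app.epc), hex(app.pc))
```

Placing SCHEDULER inside the protected range means that, in this model, the scheduler itself is never preempted. The system calls that need an immediate switch (thread_delete, thread_block, delay and wait_for_next_period) would bypass the time-slice test but still record the return address in EPC.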
The simulator has to detect when the application makes one of these system calls, so that the address at which the application was interrupted can be saved in the EPC register. In the linker script of the operating system, all kernel code that must not be interrupted is placed in one address range, so that the simulator can check before running SCHEDULER that no such code is executing, as already described in section 6.1. These addresses can also be found in the file sumo.map. The main function of the extended sim-cache can be viewed in appendix C.

6.4 Problems occurred during implementation

One problem encountered during implementation was an error that occurred while linking the operating system:

File:rownr: Relocation truncated to fit: GPREL variable

After a great deal of searching, it was concluded that the global variables in the assembly code caused this error. The underlying reason was never found, but the error disappeared when the definitions of these global variables were moved to C code. Another problem, which took a long time to find, seemed to be caused by the function printf being non-re-entrant. An attempt was made to verify this theory by e-mailing the SimpleScalar help mailing list as well as SimpleScalar directly, but there was no response. The source code for printf is not available (at least I could not find it), so, to my knowledge, the theory cannot be verified by examining the code. The function therefore has to be protected somehow, and it was decided to leave this responsibility to the user who writes the application, by requiring that printf is only used together with semaphores. As a consequence, printf cannot be used in the idle task, as it would then interfere with the application (if the idle task holds the semaphore).
Another way to solve this problem would be to make the address check that sim-cache performs before running the task switch routine more complex, so that it does not interrupt printf.

6.5 Execution times for each system call

Execution times have been estimated for the system calls described in section 6.1.1. To obtain the execution time of each system call, the number of cycles needed to execute it is calculated. The simulated cache structure is an 8 KB level 1 instruction cache, an 8 KB level 1 data cache and a 256 KB unified level 2 cache. The cache access times used in article [12], listed in Table 1 below, seemed sensible, so a decision was made to use them. Thus, on a level 1 cache hit, the instruction is assumed to take 1 cycle to execute.

L1 cache access time (cycles) | L2 cache access time (cycles) | Memory latency (cycles)
1 | 12 | 54
Table 1: Microarchitectural parameters used during simulation.

The cycle time of the main memory is assumed to be 90 ns, taken from article [10], the only article found that gives both the cycle time and the energy consumption of the memory (in that article, a 1 MB SRAM). With the simulated frequency of 600 MHz (the frequency used in Wattch, see section 6.6), this gives a memory latency of 90 ns × 600 MHz = 54 cycles. Each system call has to be executed at least a few times so that an average number of cycles can be calculated. The average is needed because the number of cycles needed to execute a system call can differ depending on where in the code the call is made. Each system call and its average number of cycles are written to a file called syscallstats.dat, along with the calculated minimum and maximum values.
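The bookkeeping described above can be sketched in Python: the memory latency follows from converting the 90 ns cycle time at 600 MHz into clock cycles, and each completed system call contributes one cycle count from which the minimum, average and maximum are derived. The class name and the line layout of syscallstats.dat used here are assumptions for illustration; the actual file format produced by the modified sim-cache is not given in the text.

```python
from __future__ import annotations

import math
from collections import defaultdict


def time_to_cycles(time_ns: float, clock_mhz: float) -> int:
    """Convert a latency in nanoseconds to whole clock cycles, rounding up."""
    return math.ceil(time_ns * clock_mhz / 1000.0)


MEM_LATENCY = time_to_cycles(90, 600)  # 90 ns main-memory cycle time at 600 MHz


class SyscallStats:
    """Collects the cycle count of every invocation of each system call."""

    def __init__(self) -> None:
        self._samples: dict[str, list[int]] = defaultdict(list)

    def record(self, name: str, cycles: int) -> None:
        self._samples[name].append(cycles)

    def summary(self) -> dict[str, tuple[int, float, int]]:
        # (minimum, average, maximum) cycles per system call
        return {name: (min(c), sum(c) / len(c), max(c))
                for name, c in self._samples.items()}

    def write(self, path: str = "syscallstats.dat") -> None:
        # Hypothetical layout: one line per system call, "name min avg max".
        with open(path, "w") as f:
            for name, (lo, avg, hi) in sorted(self.summary().items()):
                f.write(f"{name} {lo} {avg:.1f} {hi}\n")


stats = SyscallStats()
for cycles in (100, 200, 600):  # invented cycle counts
    stats.record("example_call", cycles)
print(MEM_LATENCY)      # 54
print(stats.summary())  # {'example_call': (100, 300.0, 600)}
```

Rounding the latency up reflects that a memory access cannot complete in a fraction of a cycle; for 90 ns at 600 MHz the product is exactly 54 cycles, so the rounding does not change the result here.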
Here, too, the address range of the kernel code must be known in order to detect when a system call has finished executing. As mentioned earlier, some system calls require the task switch routine to run directly afterwards; in those cases the execution of the task switch routine is included in the number of cycles needed for the system call. Figure 4 shows the execution times obtained from a simulation using the application in appendix B. The system call os_init is not included in the figure because its simulated execution time is much higher than that of the other system calls: 10782 cycles. Since it is an OS initialisation routine, it is only executed once. For some system calls, the difference between the minimum and maximum execution times and energy consumption obtained in the simulation was rather large. For thread_start, this may be because the call has a special function the first time it is executed. The system call thread_create is the first system call made in the application, which may partly explain the large spread in its values.

Figure 4: The minimum, average and maximum cycles needed to execute the system calls (cycles per system call).

A comparison was made with the results that Lariza Rizvanovic [1] obtained when executing SW Symo on the M68000 system. The comparison can be viewed in table 2 below.
The execution times for SW Symo running on the M68000 system were measured in microseconds and have been converted to cycles assuming the system's default frequency, i.e. 16 MHz. Cycles are compared rather than seconds because execution times in seconds depend even more on the clock frequency than cycle counts do. The comparison uses the average simulated execution times.

System call | M68000 exec. time (μs) | M68000 exec. time (cycles) | Sim. avg. exec. time (cycles) | Difference (cycles)
init_period | 42.7 | 684 | 429 | 255
start_period | 41.6 | 666 | 268 | 398
stop_period | 40.1 | 642 | 437 | 205
create_semaphore | 44.3 | 709 | 750 | -41
pend_semaphore* | 70 | 1120 | 537 | 583
read_semaphore | 40.8 | 653 | 256 | 397
release_semaphore | 70.2/71 | 1123/1136 | 619 | 504/517
delete_semaphore | 67 | 1072 | 378 | 694
thread_create | 127 | 2032 | 1028 | 1004
os_init | 2735 | 43760 | 10732 | 33000
thread_yield | 37.4 | 598 | 261 | 337
thread_start | 91/160 | 1456/2560 | 859 | 597/1701
remove_from_timeq | 70 | 1120 | 687 | 433
thread_delete | 79 | 1264 | 1317 | -53
thread_block | 43 | 688 | 533 | 155
*pend_semaphore measured with the semaphore free and the semaphore queue empty: 70 μs.
Table 2: A comparison between execution times.

The execution time of pend_semaphore in [1] assumes that the semaphore is free and that the semaphore queue is empty; this assumption is not made for the simulated execution time. The system calls release_semaphore and thread_start have two different values in [1], depending on the circumstances in which they are called; again, this distinction was not made when simulating the execution times. Table 2 shows that for almost all system calls the simulated execution time is shorter. This was somewhat expected, since the M68000 system has no cache, in contrast to the simulated system, which has both level 1 and level 2 caches. Nevertheless, for most system calls the two sets of execution times follow each other fairly well: system calls with high execution times tend to be high in both cases.
For example, the execution time of os_init is much higher than that of all the other system calls in both cases.

6.6 Energy modelling

When doing energy modelling, all the components in the system must be considered; that is, the caches, the main memory and the bus have to be modelled as well as the processor itself, see Figure 5.

Figure 5: The simulated architecture: CPU, L1 instruction cache, L1 data cache, unified L2 instruction/data cache and main memory. The dotted lines show that there are two different chips.

The energy consumed in the caches and the CPU is modelled using the power simulator Wattch together with sim-cache. The supervisor of this thesis, Raimo Haukilahti, connected the two simulators and made the changes to them needed for this purpose. The frequency used in Wattch is 600 MHz.

The processor and the caches are on one chip, while the memory is on another (off-chip memory). This requires an off-chip bus, which consumes a considerable amount of energy and therefore must also be modelled. A 32-bit address bus and a 64-bit data bus are assumed. Each bus line is assumed to have a capacitive load of 20 pF [9], and the voltage is assumed to be 3.3 V, a value used in both [12] and [10]. The energy consumed each time a bus line switches can be calculated with the equation

$$E = \frac{1}{2} C V^{2} \tag{1}$$

where E is the energy (J), C is the capacitance (F) and V is the voltage (V). With C = 20 pF and V = 3.3 V, this gives E = 0.5 · 20 pF · (3.3 V)² = 108.9 pJ = 0.1089 nJ per transition. A bus line carries either a 1 or a 0, and energy is consumed every time the value on the line switches, so in principle the bus energy is evaluated by monitoring the switching activity on each bus line. Due to time constraints, the model is simplified by assuming that 50% of the lines switch every time the bus is used, instead of calculating exactly how many lines switch.

The energy needed to access the main memory is also modelled in a simplified way.
Both read and write operations are assumed to consume 4.95 nJ per access, a value used in both [9] and [10]. When it is not being accessed, the memory still consumes some energy. The idle energy used for the memory is 0.000016667 nJ per clock period ((1/600 MHz) · 0.01 mW), as in [10].

Figure 6 shows the simulated energy consumption of the different system calls. The application used to obtain these results is listed in Appendix B. Again, os_init is not included in the figure because its energy consumption, 9262.2 nJ, is much higher than that of the other system calls.

Figure 6: The minimum, average and maximum energy consumed by the different system calls (y-axis: energy in nJ; x-axis: system calls).

As mentioned in section 2 (related work), the energy consumption of kernel functions has also been measured in [15], using eCos as the operating system. There, the energy consumption was measured at 59 MHz and 221.2 MHz, and the overall energy consumption was higher at the lower frequency. At the higher frequency, 221.2 MHz, the measured energy consumption ranges from 340 nJ to 13540 nJ. However, since the operating systems differ and the simulated frequency is higher (600 MHz), it is difficult to compare the energy consumption reported in [15] with the simulated values presented in this report.
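The off-chip part of the energy model in section 6.6 can be written out as a small C sketch. It assumes exactly the parameters quoted above (20 pF per line, 3.3 V, 96 bus lines of which 50% switch per access, 4.95 nJ per memory access, 0.01 mW idle power at 600 MHz) and charges the idle energy once per simulation step without a memory access, as the loop in Appendix A does. The type `offchip_energy_t` and the function names are hypothetical, and the cache and CPU energy from Wattch is not modelled here.

```c
#include <assert.h>

/* Parameters as quoted in section 6.6. */
#define ADDRBUS_WIDTH   32        /* address bus lines                  */
#define DATABUS_WIDTH   64        /* data bus lines                     */
#define C_LINE_F        20e-12    /* capacitive load per line: 20 pF    */
#define V_BUS           3.3       /* bus voltage: 3.3 V                 */
#define SWITCH_FRACTION 0.5       /* simplification: 50% of lines flip  */
#define E_MEM_ACCESS_NJ 4.95      /* energy per read or write: 4.95 nJ  */
#define F_CLK_HZ        600e6     /* simulated clock: 600 MHz           */
#define P_MEM_IDLE_W    0.01e-3   /* idle memory power: 0.01 mW         */

/* Equation (1), E = 0.5*C*V^2, converted from J to nJ (= 0.1089 nJ). */
static double line_energy_nj(void)
{
    return 0.5 * C_LINE_F * V_BUS * V_BUS * 1e9;
}

/* Idle memory energy for one clock period, (1/f)*P, in nJ (~1.6667e-5 nJ). */
static double mem_idle_energy_nj(void)
{
    return P_MEM_IDLE_W / F_CLK_HZ * 1e9;
}

/* Running totals of off-chip energy in nJ (hypothetical type). */
typedef struct {
    double bus_nj;
    double mem_nj;
} offchip_energy_t;

/* Account for one simulation step that made `accesses` main-memory accesses. */
static void offchip_step(offchip_energy_t *e, unsigned accesses)
{
    assert(e != 0);
    if (accesses > 0) {
        /* 96 lines * 50% switching * 0.1089 nJ per switching line */
        double per_access_bus =
            (ADDRBUS_WIDTH + DATABUS_WIDTH) * SWITCH_FRACTION * line_energy_nj();
        e->mem_nj += E_MEM_ACCESS_NJ * accesses;
        e->bus_nj += per_access_bus * accesses;
    } else {
        e->mem_nj += mem_idle_energy_nj();  /* memory idles; no bus activity */
    }
}
```

With these parameters, one main-memory access costs 48 × 0.1089 nJ = 5.2272 nJ on the bus plus 4.95 nJ in the memory, about 10.18 nJ in total, while an idle step adds only about 1.7 × 10⁻⁵ nJ.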
7 Result, future work

The extended simulator developed in this master thesis can simulate a whole system and, as a result of the simulation, reports the performance and energy consumption of the system calls in the operating system. The energy models for the main memory and the off-chip bus are somewhat simplified, however: when modelling the energy consumed in the off-chip bus, it is assumed that 50% of the bus lines switch values. Furthermore, the simulator models a MIPS processor, although an ARM processor would have been preferred. Version 4 of the SimpleScalar simulators, which can simulate an ARM processor, is being developed but was not finished when the implementation started. There was no time to investigate the possibility of adding an RTU to the simulator.

The operating system has been simulated together with a simple test application, and the execution time and energy consumption of each system call have been obtained. The simulated execution times have been compared with the execution times measured in [1] when running SW Symo on an M68000 system. Table 2 in section 6.5 shows that for almost all of the system calls the simulated execution time is shorter. This was somewhat expected, since the M68000 system has no cache. As for the simulated energy consumption, little can be said about the results, since there are no directly comparable values to compare them with.

Summary

Of the two possibilities described in section 5, the simulator chosen for extension was the SimpleScalar simulator sim-cache [5]. Sim-cache models the SimpleScalar PISA, and since PISA contains most of the MIPS instruction set architecture, the simulated processor is effectively a MIPS processor.
Operating system functionality was added to sim-cache by making sim-cache start executing the OS task switch routine after a given number of instructions (35000 in the code in Appendix A) have been executed. SW Symo [1], a complete software version of the operating system Symo, which is partly implemented in hardware, was chosen as the operating system for the simulation. Since the operating system was implemented to run on an M68000 system, the assembler files had to be rewritten in MIPS assembly. This also made it necessary to change the TCB structure. The makefile and the linker script had to be changed as well, since the special SimpleScalar compilers must be used and the linker script must fit the MIPS memory map. In addition, the linker script determines the addresses of the functions, which sim-cache needs because it recognises the system calls and the scheduler by their addresses (see Appendix A).

Execution times are measured for the different system calls in the operating system. On a level 1 cache hit, an instruction is assumed to take one cycle to execute. A level 1 cache miss means that the unified L2 cache must be accessed, which increases the execution time of the instruction by 12 cycles. The memory access time is assumed to be 54 cycles. These cycle counts are taken from the literature; they have been used in [12] and [10].

An energy modelling part was also added to the simulator. The caches and the CPU are modelled using the power simulator Wattch. The supervisor of this master thesis helped connect Wattch to sim-cache and made the necessary changes to the simulators. A simple energy model for the memory and the off-chip bus was added as well. When modelling the off-chip bus, it is assumed that 50% of the bus lines switch values each time the bus is used, instead of calculating exactly how many lines switch. This simplification was made mainly due to time constraints.
The energy values for the memory, both when accessed and when idle, are taken from [10]. The capacitive load on each bus line is assumed to be the same as in [9]. As a result of the simulation, the minimum, maximum and average execution times and energy consumption of the different system calls in the operating system are produced. The execution time is measured in cycles and the energy in nJ. The execution times and energy consumption of the different system calls have been obtained by simulating a simple application together with the operating system; the application is listed in Appendix B. The simulated execution times have been compared with the execution times measured for the system calls when running SW Symo on an M68000 system [1]. Table 2 in section 6.5 shows that all but two of the system calls (create_semaphore and thread_delete) have a lower simulated execution time, probably partly because the M68000 system has no cache.

References

[1] Larisa Rizvanovic, Symo HW/SW Real-Time Kernel for Single Processor System, master thesis, Mälardalen University, 2001.
[2] Larry Huffman, David Graves, MIPSpro™ Assembly Language Programmer's Guide, http://www.mips.com/Documentation/MIPSPro_Ass._Lang_Vol1.pdf, 1996.
[3] Yuan Wei, SimpleScalar Source Code Analysis, http://www.cs.virginia.edu/~yw3f/cs757/simplescalar/index.html, 2001.
[4] Steve Chamberlain, Ian Lance Taylor, Cygnus Solutions, Linker Scripts, http://www.geekgadgets.org/docs/ld_3.html, 2001.
[5] Doug Burger, Todd M. Austin, The SimpleScalar Tool Set, Version 2.0, 1997.
[6] Todd Austin et al., SimpleScalar Tutorial (for release 4.0), held at MICRO-34, 2001.
[7] The SimpleScalar LLC homepage, www.SimpleScalar.com, 2001.
[8] Tajana Šimunić, L. Benini, G. De Micheli, Mat Hans, Source Code Optimization and Profiling of Energy Consumption in Embedded Systems, 2000.
[9] N. Vijaykrishnan, M. Kandemir, M. J. Irwin, H. S. Kim, W. Ye, Energy-Driven Integrated Hardware-Software Optimizations Using SimplePower, ISCA, 2000.
[10] Tajana Šimunić, L. Benini, G. De Micheli, Cycle-Accurate Simulation of Energy Consumption in Embedded Systems, Proceedings of DAC, 1999.
[11] V. Srinivasan, E. S. Davidson, G. S. Tyson, M. J. Charney, T. R. Puzak, Branch History Guided Instruction Prefetching, 2001.
[12] D. Brooks, V. Tiwari, M. Martonosi, Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, ISCA, 2000.
[13] K. Baynes, C. Collins, E. Fitherman, B. Ganesh, P. Kohout, C. Smit, T. Zhang, B. Jacob, The Performance and Energy Consumption of Three Embedded Real-Time Operating Systems, November 2001.
[14] R. P. Dick, G. Lakshminarayana, A. Raghunathan, N. K. Jha, Power Analysis of Embedded Operating Systems, presented at ACM, 2000.
[15] A. Acquaviva, L. Benini, B. Riccò, Energy Characterization of Embedded Real-Time Operating Systems, 2001.

Appendix A

The main function of the extended file sim-cache.c:

#define NROFSYSCALLS 19
#define ADDRBUS_WIDTH 32
#define DATABUS_WIDTH 64

void
sim_main(void)
{
  int i, line;
  md_inst_t inst;
  register md_addr_t addr;
  enum md_opcode op;
  register int is_write;
  enum md_fault_type fault;
  struct stat_stat_t stat;
  struct stat_stat_t *statpointer;
  counter_t cache_dl1_accesses_old;  /* Contains the old value of nr of accesses */
  counter_t cache_dl1_accesses_hits; /* Contains the old value of nr of accesses */
  counter_t cache_il1_hits_old, cache_il1_misses_old, cache_dl1_hits_old,
            cache_dl1_misses_old, cache_dl2_hits_old, cache_dl2_misses_old;
  counter_t cycles = 0, cycles_old = 0;
  int calcSyscallStats = 0;   /* Calculating syscall stats? (1=yes) */
  long start_value = 0;
  int syscall;
  double syscallEnergy = 0;
  syscallStats_t syscallStats[NROFSYSCALLS]; /* If the array size is changed, the
                                                function initStrings must be changed too!! */
  double totCacheCPU_energy = 0; /* Energy consumed in cache and CPU (nJ) */
  double bus_energy = 0;         /* Energy consumed in the off-chip bus (nJ). */
  double mem_energy = 0;         /* Energy consumed in the main memory (nJ). */
  /* The capacitance for the lines in the bus, C=20pF, the voltage, V=3.3V. */
  double E = 0.1089;             /* Energy consumed per bus line switch, 0.5*C*V*V (nJ) */
  FILE *file, *file1, *totEnergyFile;
  int instr_counter = 0;

  fprintf(stderr, "sim: ** starting functional simulation w/ caches **\n");

  /** Initialisation routines... **/
  file1 = fopen("syscallstats.dat", "w");
  if (file1 == NULL){
    printf("ERROR: Could not open file.");
    exit(1);
  }
  fprintf(file1, "Syscall\t\t\tMin\tAvr\tMax\n");
  for (syscall = 0; syscall <= NROFSYSCALLS-1; syscall++){
    syscallStats[syscall].nrOfSyscalls = 0;
    syscallStats[syscall].exectime.min = 30000;
    syscallStats[syscall].exectime.sum = 0;
    syscallStats[syscall].exectime.max = 0;
    syscallStats[syscall].energy.min = 30000;
    syscallStats[syscall].energy.sum = 0;
    syscallStats[syscall].energy.max = 0;
  }
  initStrings(syscallStats);

  totEnergyFile = fopen("energy.dat", "w");
  if (totEnergyFile == NULL){
    printf("ERROR: Could not open file.");
    exit(1);
  }
  fprintf(totEnergyFile, "Syscall\t\t\tBusEnergy\tMemEnergy\tCPUCacheEnergy\tTOTAL\n");

  for (line = 0; line < ADDRBUS_WIDTH; line++)
    addressBus[line] = 0;
  for (line = 0; line < DATABUS_WIDTH; line++)
    dataBus[line] = 0;

  file = fopen("output.txt", "w");
  if (file == NULL){
    printf("ERROR: Could not open file.");
    exit(1);
  }
  fprintf(file, "#Instr Tot_cycles Instr_cycles Icache Dcache L2Cache CPU TOTAL\n"); // output data header

  statpointer = &stat;

  /* set up initial default next PC */
  regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);

  /* check for DLite debugger entry condition */
  if (dlite_check_break(regs.regs_PC, /* no access */0, /* addr */0, 0, 0))
    dlite_main(regs.regs_PC - sizeof(md_inst_t), regs.regs_PC, sim_num_insn, &regs, mem);

  while (TRUE)
    {
      /* rhi: added for Wattch to clear hardware access counters */
      clear_access_stats(); /* main_mem_access also cleared here. */

      /* maintain $r0 semantics */
      regs.regs_R[MD_REG_ZERO] = 0;
#ifdef TARGET_ALPHA
      regs.regs_F.d[MD_REG_ZERO] = 0.0;
#endif /* TARGET_ALPHA */

      cache_dl1_accesses_old = cache_dl1->hits + cache_dl1->misses;
      cache_il1_hits_old = cache_il1->hits;
      cache_il1_misses_old = cache_il1->misses;
      cache_dl1_hits_old = cache_dl1->hits;
      cache_dl1_misses_old = cache_dl1->misses;
      cache_dl2_hits_old = cache_dl2->hits;
      cache_dl2_misses_old = cache_dl2->misses;
      cycles_old = cycles;

      /* get the next instruction to execute */
      if (itlb)
        cache_access(itlb, Read, IACOMPRESS(regs.regs_PC), NULL,
                     ISCOMPRESS(sizeof(md_inst_t)), 0, NULL, NULL);
      if (cache_il1){
        /* rhi: added for wattch */
        icache_access++;
        cache_access(cache_il1, Read, IACOMPRESS(regs.regs_PC), NULL,
                     ISCOMPRESS(sizeof(md_inst_t)), 0, NULL, NULL);
      }
      MD_FETCH_INST(inst, mem, regs.regs_PC);

      /* keep an instruction count */
      sim_num_insn++;

      /* set default reference address and access mode */
      addr = 0; is_write = FALSE;

      /* set default fault - none */
      fault = md_fault_none;

      /* decode the instruction */
      MD_SET_OPCODE(op, inst);

      /* execute the instruction */
      switch (op)
        {
#define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3) \
        case OP: \
          SYMCAT(OP,_IMPL); \
          break;
#define DEFLINK(OP,MSK,NAME,MASK,SHIFT) \
        case OP: \
          panic("attempted to execute a linking opcode");
#define CONNECT(OP)
#define DECLARE_FAULT(FAULT) \
          { fault = (FAULT); break; }
#include "machine.def"
        default:
          panic("attempted to execute a bogus opcode");
        }

      if (fault != md_fault_none)
        fatal("fault (%d) detected @ 0x%08p", fault, regs.regs_PC);

      if (MD_OP_FLAGS(op) & F_MEM){
        sim_num_refs++;
        if (MD_OP_FLAGS(op) & F_STORE)
          is_write = TRUE;
      }

      /* update any stats tracked by PC */
      for (i = 0; i < pcstat_nelt; i++){
        counter_t newval;
        int delta;

        /* check if any tracked stats changed */
        newval = STATVAL(pcstat_stats[i]);
        delta = newval - pcstat_lastvals[i];
        if (delta != 0){
          stat_add_samples(pcstat_sdists[i], regs.regs_PC, delta);
          pcstat_lastvals[i] = newval;
        }
      }

      /* check for DLite debugger entry condition */
      if (dlite_check_break(regs.regs_NPC, is_write ? ACCESS_WRITE : ACCESS_READ,
                            addr, sim_num_insn, sim_num_insn))
        dlite_main(regs.regs_PC, regs.regs_NPC, sim_num_insn, &regs, mem);

      /*** Counting cycles... ***/
      if (cache_dl1->hits + cache_dl1->misses > cache_dl1_accesses_old){
        dcache_access++;
      }
      /* Instruction cache is always accessed... */
      if (cache_il1->hits > cache_il1_hits_old)
        cycles++;           /* l1 hit. */
      else
        cycles += 12;       /* l1 miss. */

      /* miss in dl1 cache? */
      if (cache_dl1->misses > cache_dl1_misses_old)
        cycles += 12;

      /* accessed main memory? */
      if (cache_dl2->misses > cache_dl2_misses_old)
        cycles += (cache_dl2->misses - cache_dl2_misses_old)*54; /* could be several accesses */

      if (main_mem_access != (cache_dl2->misses - cache_dl2_misses_old))
        printf("\nWARNING !!!!! inconsistent data: memory accesses\n");

      /* rhi: Added by Wattch to update per-cycle power statistics */
      update_power_stats();
      totCacheCPU_energy += report_totCache_energy();

      /** Estimate bus and memory energy consumption... **/
      /* Using a simplified bus energy model for now: assume 50% of the
         lines switch each time the bus is used. */
      /* Energy consumed per line that switches value: E = 0.1089 nJ */
      /* Energy consumed in memory: on access 4.95 nJ, else 0.0000166667 nJ */
      if (main_mem_access > 0){
        mem_energy += 4.95*main_mem_access;
        bus_energy += ((ADDRBUS_WIDTH+DATABUS_WIDTH)*0.5*E) * main_mem_access;
      } else {
        mem_energy += 0.0000166667; /* Some energy consumed in memory anyway. */
        /* No energy consumed in buses. */
      }

      /** Already estimating exectime and energy consumption for a syscall? **/
      if (calcSyscallStats == 1){
        if (regs.regs_NPC >= 0x00408000){
          /* if syscall finished, save syscall stats... */
          syscallStats[syscall].nrOfSyscalls++;
          /* ..exectime stats */
          syscallStats[syscall].exectime.sum += cycles - start_value;
          if (syscallStats[syscall].exectime.min > cycles - start_value)
            syscallStats[syscall].exectime.min = cycles - start_value;
          if (syscallStats[syscall].exectime.max < cycles - start_value)
            syscallStats[syscall].exectime.max = cycles - start_value;
          /* ..and energy stats */
          syscallEnergy = bus_energy + mem_energy + totCacheCPU_energy;
          syscallStats[syscall].energy.sum += syscallEnergy;
          if (syscallStats[syscall].energy.min > syscallEnergy)
            syscallStats[syscall].energy.min = syscallEnergy;
          if (syscallStats[syscall].energy.max < syscallEnergy)
            syscallStats[syscall].energy.max = syscallEnergy;
          /* print result to file. */
          fprintf(totEnergyFile, "%f\t%f\t%f\t%f\n", bus_energy, mem_energy,
                  totCacheCPU_energy, bus_energy+mem_energy+totCacheCPU_energy);
          calcSyscallStats = 0;
        }
      }

      /** Is a new system call made? **/
      if (strcmp(MD_OP_NAME(op), "jal") == 0 && regs.regs_NPC >= 0x004014c0
          && regs.regs_NPC <= 0x00402690 && calcSyscallStats != 1){
        syscall = check_syscall(regs.regs_NPC, totEnergyFile);
        if (syscall == 19)   /* False alarm... */
          calcSyscallStats = 0;
          /*fprintf(file1, "%x ", regs.regs_NPC);*/
        else{
          calcSyscallStats = 1;  /* start tracking syscall stats. */
          start_value = cycles;
          mem_energy = 0;
          bus_energy = 0;
          totCacheCPU_energy = 0;
        }
      }

      /** Check if syscall requires direct execution of taskswitch... **/
      /* if so save address to EPC reg. */
      if (strcmp(MD_OP_NAME(op), "jal") == 0){
        /* syscall addresses in sumo.map */
        if (regs.regs_NPC == 0x00401ff8       /* delay */
            || regs.regs_NPC == 0x00402130    /* wait_for_next_period */
            || regs.regs_NPC == 0x00401be8    /* thread_block */
            /* || regs.regs_NPC == 0x004018b8    (thread_delete) */
            || regs.regs_NPC == 0x00401910    /* thread_yield */
            || regs.regs_NPC == 0x00402610    /* pend_sem */
           ){
          /* NPC is changed due to the fact that jal is a jump instruction. */
          regs.regs_COPROC.epc = regs.regs_PC + sizeof(md_inst_t);
        }
      }

      /** Run SCHEDULER? **/
      instr_counter++;
      /* After 35000 instr let SCHEDULER run, if OS kernel code is not already running. */
      if (instr_counter >= 35000 && regs.regs_NPC >= 0x00408000
          /* && regs.regs_NPC <= 0x00408570 (not kernel code) */){
        printf("** 35 000 instr (%d) - run SCHEDULER **\n", instr_counter);
        regs.regs_COPROC.epc = regs.regs_NPC;
        /** 0x004010a0 is the start address for SCHEDULER (see sumo.map). */
        regs.regs_PC = 0x004010a0;
        regs.regs_NPC = 0x004010a0 + sizeof(md_inst_t);
        instr_counter = 0;
      }
      else{
        /* go to the next instruction */
        regs.regs_PC = regs.regs_NPC;
        regs.regs_NPC += sizeof(md_inst_t);
      }

      /* finish early? */
      if (max_insts && sim_num_insn >= max_insts){
        /** When simulating an OS we always end up here... **/
        /* so write result to file and close. */
        for (syscall = 0; syscall <= NROFSYSCALLS-1; syscall++){
          if (syscallStats[syscall].nrOfSyscalls > 0)
            fprintf(file1, "%d %s%d\t%d\t%d\t%.2f\t%.2f\t%.2f\n",
                    syscallStats[syscall].nrOfSyscalls,
                    syscallStats[syscall].name,
                    syscallStats[syscall].exectime.min,
                    (syscallStats[syscall].exectime.sum / syscallStats[syscall].nrOfSyscalls),
                    syscallStats[syscall].exectime.max,
                    syscallStats[syscall].energy.min,
                    (syscallStats[syscall].energy.sum / syscallStats[syscall].nrOfSyscalls),
                    syscallStats[syscall].energy.max);
        }
        fclose(file);
        fclose(file1);
        fclose(totEnergyFile);
        return;
      }
    }
}

Appendix B

The application (symo_os.c) used when retrieving the results (execution time and energy consumption for the different system calls).
#define IDLE_ID     1
#define IDLE_PRIO  15
#define task1_ID    2
#define task1_PRIO  0
#define task2_ID    3
#define task2_PRIO  0
#define y_ID        4
#define y_PRIO      0
#define t3_ID       5
#define t3_PRIO     6
#define t4_ID       6
#define t4_PRIO     6

void main(void){
  printf("STARTMAIN\n");
  os_init();
  thread_create(IDLE_ID, IDLE_PRIO, ready, idle, idle_stack, STACK_SIZE);
  thread_create(y_ID, y_PRIO, blocked, y_idle, y_stack, STACK_SIZE);
  thread_create(task2_ID, task2_PRIO, ready, task2, task2_stack, STACK_SIZE);
  thread_create(task1_ID, task1_PRIO, ready, task1, task1_stack, STACK_SIZE);
  thread_create(t3_ID, t3_PRIO, ready, task3, t3_stack, STACK_SIZE);
  thread_create(t4_ID, t4_PRIO, ready, task4, t4_stack, STACK_SIZE);
  if (create_semaphore(1,5))
    printf("sem1 error\n");
  if (create_semaphore(2,2))
    printf("sem2 error\n");
  thread_start(IDLE_ID);
  while (1){}
}

void idle(void){
  while (1){}
}

void y_idle(void){
  int flag = 0;
  timeq_info_t info;
  init_period_time(2); /* Must be called from the current task!! */
  start_period(y_ID);
  create_semaphore(3,3);
  read_semaphore(3);
  while (1){
    pend_semaphore(1);
    printf("Y\n");
    release_semaphore(1);
    pend_semaphore(3);
    delete_semaphore(3);
    info = read_timeq(0, y_ID);
    read_semaphore(1);
    /* delay(1): Periodic threads are not allowed to use delay!! */
    if (flag == 1){
      thread_start(task1_ID);
      remove_from_timeq(0, t4_ID);
    }
    pend_semaphore(1);
    printf("Y - wait_for_next_period is called\n");
    release_semaphore(1);
    wait_for_next_period();
    flag = 1;
  }
}

void task1(void){
  thread_start(y_ID);
  read_semaphore(1);
  while (1){
    pend_semaphore(1);
    printf("t1\n");
    release_semaphore(1);
    thread_block();
    pend_semaphore(1);
    printf("t1 - Some task woke me up!!\n");
    release_semaphore(1);
    thread_yield();
  }
}

void task2(void){
  thread_info_t information;
  while (1){
    pend_semaphore(1);
    read_semaphore(2);
    printf("t2\n");
    release_semaphore(1);
    pend_semaphore(2);
    delete_semaphore(2);
    information = thread_getinfo(task2_ID);
    if (information.priority == 1){
      pend_semaphore(1);
      printf("prio=1\n");
      release_semaphore(1);
    }
    pend_semaphore(1);
    printf("t2 - delay is called\n");
    release_semaphore(1);
    delay(1);
    pend_semaphore(1);
    printf("t2 - After delay!\n");
    release_semaphore(1);
    read_semaphore(1);
    thread_start(task1_ID);
    pend_semaphore(1);
    printf("t2 - task 1 is now ready, this task will be deleted\n");
    stop_period(t4_ID);
    release_semaphore(1);
    thread_delete();
  }
}

void task3(){
  init_period_time(3);
  start_period(t3_ID);
  pend_semaphore(1);
  printf("t3\n");
  release_semaphore(1);
  while (1){
    wait_for_next_period();
    thread_getinfo(t3_ID);
    /* stop_period(t3_ID); remove_from_timeq(0,t3_ID);  (1-activate, 0-terminate) */
  }
}

void task4(){
  int i = 0;
  delay(1);
  init_period_time(2);
  start_period(t4_ID);
  thread_getinfo(t4_ID);
  thread_getinfo(t3_ID);
  pend_semaphore(1);
  printf("t4\n");
  release_semaphore(1);
  delay(1);
  while (1){
    i++;
    if (i > 3)
      stop_period(t4_ID);
    if (i > 5){
      delay(2);
      remove_from_timeq(0, t3_ID);
      thread_delete();
    }
    wait_for_next_period();
  }
}