Table of Contents

Abstract
1 Basic Concepts
1.1 Embedded systems
1.2 Microkernel
1.3 Virtualized Linux
1.4 Scheduling
1.5 Wombat - Iguana model
2 PerfMonitor Description
2.1 Architecture
2.2 Use cases
2.3 Concepts
2.3.1 Function
2.3.2 Channel
2.3.3 Data Hierarchies
2.3.4 Event Loop
2.3.5 Resource Manager
3 Implementation details
3.1 Client - Server Protocol
3.2 Client Library API
3.3 Configuration Files

Abstract

Combining the real-time guarantees required of embedded systems with the flexibility and high performance of general-purpose computers is a hard target for a single system to reach. Measuring performance on ARM processors and building profiles for applications is a starting point for a scheduler that can guarantee real-time deadlines while still maintaining high performance.
1 Basic Concepts

1.1 Embedded systems

An embedded system is a computer system designed to perform one or more dedicated functions, usually under real-time constraints. It can contain one or more processors, each dedicated to a particular task. A general-purpose computer, in contrast, is designed to be flexible and to meet end-user needs.

"Embedded system" is not a strictly definable term. Most such systems have some programmability and extensibility, which are characteristic of general-purpose computers. A handheld computer can have an embedded operating system and multiple specialized processors, such as a digital signal processor (DSP), while also allowing different applications to be loaded and peripherals to be connected.

The embedded market accounts for well over 99% of processors and is growing strongly, while the PC market is rather flat. Most of these systems have very small operating systems, whose kernel consists of some device drivers and libraries, and are based on 8-bit and 16-bit microcontrollers. The demand for more sophisticated devices, built on 32-bit or even 64-bit general-purpose microprocessors with memory management units (MMUs), has risen as more and more functionality is incorporated into a single device.

An embedded system is usually a real-time computing system: the correctness of an operation depends not only on the logical correctness of the code but also on meeting its time restrictions. Depending on their time constraints, these systems can be classified into:
◦ hard real time: the execution of a critical part of the code is guaranteed to complete within a given time interval. A deadline violation is considered a critical failure and can have disastrous consequences. An example is deploying the airbags in an automobile.
◦ soft real time: if the time constraints are violated, only the quality of service of the product is reduced. A deadline violation is not critical and may occur from time to time.
  For example, when displaying video, an occasional dropped frame has no disastrous consequence.

Standard Linux is not suitable for real-time operation because its response times are unpredictable. RTLinux (Real Time Linux) addresses this problem. RTLinux is a microkernel that runs the entire Linux operating system as a fully preemptible process, meaning that Linux runs on top of this core as a thread. This design is intended to provide real-time performance. However, the system is too complex to be fully analyzed, and tests under heavy load have shown deadline violations [MHH02].

Combining the real-time guarantees required of embedded systems with the flexibility and high performance of general-purpose computers is a hard target for a single system to reach. Measuring performance on ARM processors and building profiles for applications is a starting point for a scheduler that can guarantee real-time deadlines while still maintaining high performance.

1.2 Microkernel

A microkernel is a minimal kernel that provides only the mechanisms needed to implement an operating system. It does not provide any services itself. The microkernel is the only software running in privileged mode, and if the hardware provides multiple levels of privilege, then the microkernel runs at the most privileged one. The actual operating system is implemented in user mode (unprivileged mode). Services such as device drivers, protocol stacks, file systems and user interfaces all run in user mode. The microkernel provides mechanisms such as address space management, thread management and inter-process communication.

This approach has several benefits. The microkernel is very small and fast. Its code is easier to maintain and therefore more reliable. In addition, the failure of a service, which now runs in user mode, does not corrupt the kernel. For example, if a networking service crashes, only that service terminates, leaving the rest of the system functional.
On the other hand, the very large size of the Linux kernel makes its code hard to maintain, and a bug in the kernel is more likely to appear. This makes the Linux kernel a poor fit for embedded systems.

The microkernel can have servers running on top of it. Servers are essentially daemon programs to which the kernel can grant special privileges, such as access to physical memory. Device drivers can interact directly with the hardware. A basic set of servers for a general-purpose microkernel could consist of file system, device driver and networking servers. A crashed server can be recovered by simply restarting it. This can lose some system state, but in most cases this is not a problem.

Fig 1: Monolithic Kernel - Microkernel

1.3 Virtualized Linux

Virtualization is a framework or methodology for dividing the resources of a computer into multiple execution environments by applying one or more concepts or technologies such as hardware and software partitioning, time-sharing, partial or complete machine simulation, emulation, quality of service, and many others.

A virtualized Linux does not have direct access to the computer's resources. An abstraction layer is added over the resources; this layer is usually called the Virtual Machine Monitor (VMM). There are several ways to structure virtualization. The VMM can run directly on the hardware, without requiring a host operating system, or it can run as a top-level application on an existing host OS.

1.4 Scheduling

Scheduling is mainly concerned with allocating CPU resources to processes. The software entity responsible for this is called the scheduler. Because there are usually more processes requesting resources than there are resources, the resources need to be shared. Scheduling algorithms are designed to take the following factors into consideration:
◦ CPU utilization - keeping the CPU as busy as possible.
◦ Throughput - the number of processes that complete their execution per time unit.
◦ Turnaround - the total time between submission of a process and its completion.
◦ Waiting time - the amount of time a process has been waiting in the ready queue.
◦ Response time - the time from when a request is submitted until the first response is produced.
◦ Fairness - equal CPU time for each thread.

Scheduling algorithms can be classified into:
◦ first in first out (FIFO): the simplest algorithm; the first process to arrive in the ready queue is served first
◦ shortest remaining time: the scheduler orders the processes in the ready queue according to how much execution time they have left
◦ fixed priority preemptive scheduling: processes are given priorities and the scheduler orders them in the ready queue accordingly. If a process with a higher priority than the current one becomes ready, the scheduler interrupts the currently running process.
◦ round robin: every process is assigned a fixed time slice, and the scheduler cycles through them.

Scheduling in Linux since kernel version 2.5 is done with a multilevel feedback queue algorithm, which is a combination of multiple algorithms. There are 140 priority levels, numbered 0 to 139: 0 - 99 are real-time priorities and 100 - 139 are non-real-time. For real-time processes the fixed priority algorithm is applied, with round-robin among processes that have the same real-time priority. For non-real-time processes there are multiple FIFO ready queues. A newly started process is placed in the top-level FIFO queue. After a process uses up its time slice, it drops from its current FIFO queue to the one on the next level. This continues until the process reaches its base ready queue, where a round-robin algorithm applies. A process can also be promoted in the ready queues if it blocks for an I/O operation. This kind of scheduling favours short jobs and I/O-intensive ones. From version 2.6 a new scheduler was introduced, the O(1) scheduler.
This one reduces the overhead of the previous one and does the scheduling in constant time (O(1)), no matter how many processes are running on the system. From version 2.6.23 the Completely Fair Scheduler was introduced. It replaces the concept of run queues with a time-ordered red-black tree that builds a timeline of future task execution. It also uses nanosecond-granularity accounting, removing the notion of time slices and other heuristics.

1.5 Wombat - Iguana model

The Wombat - Iguana model is an implementation of a system on top of a microkernel. As the microkernel offers no services, the Iguana layer is needed for basic services: allocating and sharing memory, memory protection and general resource management.

The Iguana layer provides address-space management that reduces context-switching overhead on processors with virtually addressed caches. A virtual address is tied to a particular process. As different processes tend to use the same virtual addresses for specific code/data segments, a context switch also requires a cache flush. This can be avoided if the processes have non-overlapping address spaces, so Iguana tries to avoid overlap. Rather than every process having its own address space, processes share a single one, each getting its own protection domain. A process can access data in its virtual address space only if that data is inside its protection domain. On 32-bit processors the 4 GB address space may not be enough for all processes, so newly created processes can have their own virtual address space.

Fig 2: Wombat - Iguana

Wombat represents the Linux server. It runs in its own protection domain as a process. The Compatibility Mode Linux Process runs in a different virtual address space.
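The protection-domain rule from this section can be illustrated with a small sketch. It is not Iguana code: the structure names, the region table and the addresses used with it are assumptions made for the example. It shows only the rule stated above, that in a shared address space an access is allowed when the address range lies inside a region belonging to the caller's protection domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region of the shared virtual address space. */
struct region {
    uintptr_t base;
    size_t    size;
};

/* Hypothetical protection domain: the regions one process may touch. */
struct prot_domain {
    const struct region *regions;
    size_t               count;
};

/* True if [addr, addr + len) lies entirely inside one region of the domain.
 * The comparison is written so that addr + len is never computed and
 * therefore cannot overflow. */
static bool pd_may_access(const struct prot_domain *pd, uintptr_t addr, size_t len)
{
    assert(pd != NULL);
    for (size_t i = 0; i < pd->count; i++) {
        const struct region *r = &pd->regions[i];
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len)
            return true;
    }
    return false;
}
```

Two processes with disjoint regions never see each other's data even though their addresses come from the same space, which is why no cache flush is needed when switching between them.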
2 PerfMonitor Description

2.1 Architecture

The VMX Performance Monitoring library allows applications built on top of it to obtain performance information for selected Linux processes, process this information and send it to a remote analysis tool. The library has two components:
◦ core - used by both client and server to build channel/function hierarchies and to talk to each other.
◦ client - used to build applications with support for Performance Monitoring.

The client library exposes a C API to the user, described in detail in a later chapter. The API provides an easy way to define what information is to be extracted, how it is processed and when to read it and send it to the remote analysis tool. Communication with the Performance Monitoring Server (vPmon) is abstracted. The client also has the option of using a configuration file instead of hard-wiring function calls into the application. In this case the user inserts event hooks at the places with special meaning to the application and later, without recompiling, configures what is to be read.

The other component is the server. It is responsible for collecting messages from clients, interpreting them and sending them to a remote tool. It can also monitor applications remotely.

An overview of how the library is used to analyse performance data in a system is shown in the picture below:

2.2 Use cases

There are two main use cases:

◦ self monitored. The application source code is modified to use the performance monitoring library. Signaling an event triggers the collection, processing and forwarding of performance data to the Performance Monitor Server. This is the preferred method of using the library because it allows accurate profiling. In the picture above, the self-monitored application uses the C API to declare which performance counters are of interest and what kind of data is to be reported for each event. Counters are read within the application context using the PM Linux Syscall.
  This syscall is only a shim, as the Linux kernel does not store performance management information. It forwards the request to the microkernel, which reads the virtualized performance management registers and sends the data back to the Linux kernel and then to user space. Because counters are read using a microkernel syscall, read atomicity is guaranteed. When the syscall returns to the performance monitoring library, the data is processed according to the channel hierarchy and then sent to the vPmon server via a Unix socket. The server unpacks the event data, formats it according to the selected output method and sends it to the remote analysis tool vPerf.

◦ remote monitored. Allows performance management data to be collected without recompiling the application. The downside is that events are not necessarily correlated with the inner workings of the application, as there is no way of knowing when it starts or finishes a specific task. This case is selected when another application, built with the performance monitoring library, requests that the server start monitoring a specific Linux process ID. When an event is signalled, usually by a periodic timer, the server uses the PM Linux Syscall to retrieve counter data for the selected process, passes it through the associated channels, formats it according to the selected output method and sends it over Serial/UDP.

2.3 Concepts

2.3.1 Function

This object stands for a specific performance parameter: number of cycles, number of instructions, thread execution time, etc. It has a pair of PMC/PMD registers associated with it. The association is not fixed, since the same function can be accomplished with different PMC/PMD register pairs. The Platform Manager object handles the mapping of PMC/PMD registers to Function instances (object factory). A Function object supports the following:
◦ read/write counter value - the Write() call fails for read-only functions such as thread execution time. The counter value is not necessarily 8 bytes wide.
  Size() reports how many bytes are needed to store the value returned by Read().
◦ introspection - via the Type() function. Each type (cycles, instructions, time, etc.) has its own ID. Type() together with Size() defines a data type which can be processed by Channels.
◦ behaviour configuration - various flags can be set which change the way the Function behaves. For instance, cycle and instruction counters can be configured either to continue counting or to reset to zero after Read().

2.3.2 Channel

This object extends the Function interface, providing the same functionality while not being tied to a specific performance parameter. Channels are linked to Functions or to other Channels and process the data coming from the entities they are linked to in different ways: accumulate, add, group, etc. A Channel object supports the following:
◦ read/write - the value obtained depends on the type of channel: it can be the sum of two Functions, an accumulated value for one Function over multiple Read() calls, etc.
◦ introspection - via the Type() and Visit() functions. The library user can enumerate the entities linked to the selected channel and dynamically change the hierarchy of performance management data read upon an event.
◦ dynamic structure changes - via Link() and Unlink(). Functions and Channels can be added/removed at runtime.

2.3.3 Data Hierarchies

Here is an example of a configuration of channels and functions linked together in two tree-like structures assigned to two events. When an event is triggered, from an internal source or by a timeout, the channels are read. The values obtained by Read(), described by Type() and Size(), are sent to the Server to be processed further and forwarded to the remote monitoring application.

Functions/Channels can be linked to multiple channels, and a Channel, depending on its type, can have multiple Functions/Channels linked to it. The user must take care not to create loops, as the Link() method does not check for them.
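Since Link() does not check for loops, a caller may want to check before linking. The sketch below is one way to do that, under stated assumptions: struct pm_node, pm_reaches() and pm_checked_link() are hypothetical names that are not part of the library, and the node type only models "this entity reads from these entities". The check refuses a link when the parent is already reachable from the child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LINKS 8

/* Hypothetical node: stands in for a Function or a Channel. */
struct pm_node {
    struct pm_node *links[MAX_LINKS];  /* entities this node reads from */
    size_t          nlinks;
};

/* True if 'target' can be reached from 'from' by following links.
 * Terminates as long as the existing graph has no loops, which holds
 * if every link was made through pm_checked_link(). */
static bool pm_reaches(const struct pm_node *from, const struct pm_node *target)
{
    if (from == target)
        return true;
    for (size_t i = 0; i < from->nlinks; i++)
        if (pm_reaches(from->links[i], target))
            return true;
    return false;
}

/* Link 'child' under 'parent' unless that would create a loop. */
static bool pm_checked_link(struct pm_node *parent, struct pm_node *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->nlinks == MAX_LINKS)
        return false;              /* no room for another link */
    if (pm_reaches(child, parent))
        return false;              /* parent is already below child */
    parent->links[parent->nlinks++] = child;
    return true;
}
```

Linking the same child under two different parents is still allowed, which matches the statement that a Function or Channel can be linked to multiple channels.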
A mechanism is in place so that a Function is Read() only once when an Event is triggered, even if it is reachable from the top Channel through multiple paths.

2.3.4 Event Loop

The performance management library is event driven. The server handles messages received from clients as well as configuration commands received from the remote monitoring application. The server also handles the periodic reading of channels for remote-monitored applications. To make all this work, the library offers a generic event loop object. It supports the following:
◦ monitoring of file descriptors that are ready for read()/recv()
◦ configurable timeouts - also used to offer timer functionality for periodic channel reads
◦ every-loop callbacks - triggered at every loop, each time something happens; useful mostly for debugging
◦ configurable loop latency - a bound on how long to wait for a file descriptor to become ready. If an every-loop callback is configured, it is guaranteed to be called with at most loop-latency delay

Event Loop functionality is presented in the figure below:

2.3.5 Resource Manager

Resources are managed at two levels: the allocation of low-level performance registers is managed by Platform objects, and the availability of high-level Functions (performance register pairs) is managed by Resource Manager objects. Functions and low-level register allocation are not grouped together in the same class. In the case of a self-monitored thread that is also remote-monitored there is only one Platform instance but two Resource Managers: one in the application itself and one in the Performance Monitor server (in charge of remote monitoring). The Resource Manager supports the following:
◦ initialization - based on a PID and a Platform instance. All Functions are instantiated and made available.
◦ function reservation - the user requests that a specific function be made available.
  Low-level register contention may occur because another function is using the same PMC/PMD registers
◦ function release - marks the function and the associated Platform registers as available
◦ one-shot read - the specified set of Functions is read and the result is cached. This cancels the errors caused by the time delay in reading Channels.

Resource allocation for self-monitored threads is done using a Platform proxy. Requests to acquire/release PMC/PMD registers are sent to the Performance Monitoring server, which approves or rejects them.

3 Implementation details

3.1 Client - Server Protocol

Messages from the client to the PerfMonitor server are of TLV (Type, Length, Value) type:

0_____3_4______7_8______________
| Type | Length | Value         |
|______|________|_______________|

TLVs can be composed: the Value field can contain other TLVs. The Length field contains the length of the Value field. Messages are sent as commands and are triggered by different events.

Commands:

PM_CMD_CLIENT_REGISTER

0_______________________3_4______7_8_____11
| PM_CMD_CLIENT_REGISTER | 4      | pid    |
|________________________|________|________|

- registers the client with process id 'pid' with the server; after this, the client can send other messages

PM_CMD_CLIENT_UNREGISTER

0_________________________3_4______7_8_____11
| PM_CMD_CLIENT_UNREGISTER | 4      | pid    |
|__________________________|________|________|

- unregisters the client with process id 'pid', so that the server can clear the context

PM_CMD_CLIENT_ECHO

0___________________3_4______7
| PM_CMD_CLIENT_ECHO | 0      |
|____________________|________|

- the client can send an echo and get a reply, to check whether vPmon is still active

PM_CMD_CLIENT_PLATFORM

0_______________________3_4______7_8_____11_12____15_16_________
| PM_CMD_CLIENT_PLATFORM | length | pid    | type   | registers |
|________________________|________|________|________|___________|

- 'pid' is the pid of the client making the request
- 'type' can be: ACQUIRE or RELEASE
- 'registers' are pairs of uint_32 values representing the registers to acquire or release
- the client sends these messages when trying to reserve/release hardware functions.

The client will receive a reply:

0_______________________3_4______7_8_____11
| PM_CMD_CLIENT_PLATFORM | 4      | result |
|________________________|________|________|

- 'result' is the result of the client's request. It contains success or the error type.

PM_CMD_CLIENT_MONITOR_PERIODICALLY

0___________________________________3_4______7_8_____11_12____15_16____19_20____23
| PM_CMD_CLIENT_MONITOR_PERIODICALLY | 16     | pid    | type   | 4      | period |
|____________________________________|________|________|________|________|________|

- a client sends this request in order for the server to start or stop monitoring a remote process
- 'pid' is the pid of the process to start or stop monitoring.
- 'type' is the event type. It can be ET_START_THREAD to start monitoring or ET_END_THREAD to stop monitoring.
- 'period' is the period at which data is collected from the monitored process.

PM_CMD_CLIENT_EVENT

0____________________3_4______7_8_____11_12_____________
| PM_CMD_CLIENT_EVENT | length | pid    | Events
|_____________________|________|________|_______________

- a command can contain multiple client events

PM_EVENT

0_________3_4______7_8_____11_12_____________
| PM_EVENT | length | tag    | Channels
|__________|________|________|_______________

- 'tag' contains the event type. It can be one of the following:
  ET_START_SYSTEM_RUN
  ET_END_SYSTEM_RUN
  ET_START_APP
  ET_END_APP
  ET_START_THREAD
  ET_END_THREAD
  ET_START_THREAD_PERIOD
  ET_END_THREAD_PERIOD
  ET_END_SESSION
  ET_POWER
  ET_SYSINT_SY
  ET_SYSINT_IL
  ET_SYSINT_SK
  ET_SYSINT_SI

Channels apply different transformations to the Functions and Channels they are linked with. The TLVs returned by the channels can be simple or composed. Composed TLVs have this structure:

0_________________3_4______7_8______
| PM_FUNC_COMPOSED | length | TLVs
|__________________|________|_______

- the 'TLVs' field can be empty or contain other TLVs.
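The layout above (4-byte Type, 4-byte Length, then Value, with TLVs nested inside the Value of a composed TLV) can be sketched as follows. This is a minimal illustration and not the library's encoder: tlv_put() and tlv_count() are hypothetical helpers, the fields are written in host byte order on the assumption that client and server share a machine, and the numeric value used for PM_FUNC_COMPOSED is a placeholder because the text does not give it. The value 5 for PM_FUNC_CYCLES assumes the hardware function types are numbered in the order they are listed, starting from PM_FUNC_VOLTAGE = 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    EX_FUNC_CYCLES   = 5,      /* assumed: PM_FUNC_CYCLES, counting from PM_FUNC_VOLTAGE = 1 */
    EX_FUNC_COMPOSED = 0x100   /* placeholder value for PM_FUNC_COMPOSED */
};

#define TLV_HDR 8u             /* 4-byte Type + 4-byte Length */

/* Append one TLV at buf + off. Returns the new offset, or 0 if it does not
 * fit (a successful call always returns at least TLV_HDR). */
static size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
                      uint32_t type, const void *value, uint32_t len)
{
    assert(buf != NULL);
    if (off > cap || cap - off < TLV_HDR || cap - off - TLV_HDR < len)
        return 0;
    memcpy(buf + off, &type, 4);        /* host byte order: an assumption */
    memcpy(buf + off + 4, &len, 4);     /* Length covers the Value field only */
    if (len > 0)
        memcpy(buf + off + TLV_HDR, value, len);
    return off + TLV_HDR + len;
}

/* Count the TLVs packed back to back in [buf, buf + len).
 * Returns -1 if a header or value runs past the end. */
static int tlv_count(const uint8_t *buf, size_t len)
{
    int n = 0;
    size_t off = 0;
    while (off < len) {
        uint32_t vlen;
        if (len - off < TLV_HDR)
            return -1;
        memcpy(&vlen, buf + off + 4, 4);
        if (len - off - TLV_HDR < vlen)
            return -1;
        off += TLV_HDR + vlen;
        n++;
    }
    return n;
}
```

A composed TLV is built by first packing the inner TLVs into a scratch buffer and then passing that buffer as the Value of the outer TLV; walking the outer Value with tlv_count() recovers the inner entries.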
Channels that give composed TLVs are:
PM_CHAN_PASS_THROUGH_MULTIPLE - concatenates in the 'TLVs' field the TLVs returned by the Functions and Channels it is linked with. If no Function or Channel is linked, the 'TLVs' field remains empty
PM_CHAN_NULL - returns an empty 'TLVs' field. It reads the linked Functions/Channels but does not store the results.
PM_CHAN_REVERSE - reads the linked Functions/Channels in the reverse of the order in which their results are placed in the 'TLVs' field

The remaining Channels give simple TLVs:
PM_CHAN_PASS_THROUGH - can be linked to only one Function/Channel and returns exactly the TLV it reads
PM_CHAN_MICRO2MILI - can be linked to only one Function/Channel; if the TLV read is one of the uint_64 types, it divides the value by 100
PM_CHAN_AGGREGATE - can be linked to multiple Functions/Channels. If they are all of the same uint_64 type, it adds up the data of all the TLVs read and returns one TLV with the aggregated data.

The TLVs returned by hw Functions:

0_____3_4______7_8______
| type | length | data  |
|______|________|_______|

- the 'type' field can be:
  PM_FUNC_VOLTAGE = 1,
  PM_FUNC_CURRENT,
  PM_FUNC_POWER,
  PM_FUNC_INSTR,
  PM_FUNC_CYCLES,
  PM_FUNC_FREQ,
  PM_FUNC_TIME_STAMP,
  PM_FUNC_DUMMY,

The TLVs returned by user Functions:

0_____________3_4______7_8_____12_13_______
| PM_USER_FUNC | length | tag    | data    |
|______________|________|________|_________|

- the 'tag' field can be:
  PM_FUNC_ALGORITHM
  PM_FUNC_TEXT
  PM_FUNC_PPID
  PM_FUNC_INSTANCE_NO
  PM_FUNC_TIME_SCHEDULED
  PM_FUNC_TIME_DEADLINE
  PM_FUNC_TIME_EXECUTION
  PM_FUNC_REAL_TIME
  PM_FUNC_THREAD_TYPE

3.2 Client Library API

Functions in this API return values from this enum:

enum{
    PM_RESULT_OK = 0,
    PM_ERR_FAIL,
    PM_ERR_INVALID_PARAM,
    PM_ERR_OUT_OF_BOUNDS,
    PM_ERR_CONNECTED,
    PM_ERR_NOT_CONNECTED,
    PM_ERR_MISMATCH,
    PM_ERR_UNAVAILABLE,
    PM_ERR_UNKNOWN,
    PM_ERR_NO_MEMORY,
    PM_ERR_BIND,
    PM_ERR_TIMEOUT
}

PM_RESULT pmon_open( PM_HANDLE *out_HLibrary, uint32_t in_Flags, const char* in_AppName, int in_AppID, const char* in_CfgFile, ...);

Main PerfMonitor library function. It gives performance monitoring capabilities to any application that uses it. Opens a handle to the PerfMonitor library that will be used for all further performance monitoring activities.
out_HLibrary = reference to the handle that this function will open.
in_Flags = configuration flags related to the internal capabilities of the vPmon handle, e.g. buffered or non-buffered events.
in_CfgFile = if present (!= NULL), resource management in terms of channels, events and functions can be done 'automatically' by using a configuration file in which all the relationships between functions, events and channels are described. If NULL, all the associations between functions, channels and events have to be made manually.
in_AppName = the name of the desired data set inside the configuration file, e.g. decoder, display.
in_AppID = the identifier of the data set inside the configuration file, e.g. 1, 2 and so on.

PM_RESULT pmon_close( PM_HANDLE *in_HLibrary );

Closes the handle previously opened using pmon_open(). After this call the vPmon facilities are no longer available to the application.
in_HLibrary = a valid handle (a vPmon main resource previously opened using pmon_open()) has to be passed to this function.

PM_RESULT pmon_channel_open( PM_HANDLE in_HLibrary, int in_ChannelType, PM_HANDLE *out_HChannel );

Opens a channel. A channel must be open for the application to be able to perform resource monitoring. This entity must then be linked with other vPmon entities, the functions. Finally, a channel is instructed to do the measurements using the third kind of vPmon entity, the events.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_ChannelType = designates the channel's type. The available channel types are:
  PM_CHAN_PASS_THROUGH
  PM_CHAN_PASS_THROUGH_MULTIPLE
  PM_CHAN_MICRO2MILI
  PM_CHAN_AGGREGATE
  PM_CHAN_NULL
  PM_CHAN_REVERSE
out_HChannel = reference to the opened channel.
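Of the channel types listed above, PM_CHAN_AGGREGATE is described as adding up values of the same uint_64 type read from several Functions/Channels. The sketch below illustrates that behaviour only; struct ex_sample and ex_aggregate() are hypothetical names, and returning an error when the types differ is an assumption, since the text does not say what the channel does in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded TLV: a type ID and a 64-bit value. */
struct ex_sample {
    uint32_t type;
    uint64_t value;
};

/* Sum n samples of the same type into *out.
 * Returns 0 on success, -1 if there is nothing to add or the types differ
 * (the error behaviour is an assumption of this sketch). */
static int ex_aggregate(const struct ex_sample *in, size_t n, struct ex_sample *out)
{
    uint64_t sum = 0;

    assert(out != NULL);
    if (in == NULL || n == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (in[i].type != in[0].type)
            return -1;
        sum += in[i].value;
    }
    out->type  = in[0].type;   /* result keeps the common type */
    out->value = sum;
    return 0;
}
```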

PM_RESULT pmon_channel_close( PM_HANDLE in_HLibrary, PM_HANDLE *in_HChannel );

Closes a channel.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HChannel = reference to the channel to be closed.

PM_RESULT pmon_channel_link( PM_HANDLE in_HLibrary, PM_HANDLE in_HChannel, PM_HANDLE in_HFunction );

Links a function to a channel. This lets the channel get information from the platform's reserved performance management registers and pass it to the application level.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HChannel = reference to an open channel.
in_HFunction = reference to a function. The functions inside vPmon offer the functionality and flexibility the application needs, e.g. measuring the real time at which some event happens, identifying the number of cycles and/or instructions that some operation takes, reporting the CPU's voltage and/or current at a given moment, and so on.

PM_RESULT pmon_channel_unlink( PM_HANDLE in_HLibrary, PM_HANDLE in_HChannel, PM_HANDLE in_HFunction );

Unlinks (detaches) a function from a channel. The channel can then no longer offer the information related to the unlinked function.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HChannel = reference to an open channel.
in_HFunction = reference to a function.

PM_RESULT pmon_function_open( PM_HANDLE in_HLibrary, PM_HANDLE *out_HFunction, const char* in_FunctionType, ... );

Opens a function. The function is the entity that offers actual information to the application that uses the PerfMonitor library. Depending on its type, it performs the desired measurement and offers the information to the user level.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_FunctionType = the function type, which selects the functionality offered to the application that uses vPmon.
The available function types are:
Application related measurement functions:
  PM_FUNC_VOLTAGE
  PM_FUNC_CURRENT
  PM_FUNC_POWER
  PM_FUNC_INSTR
  PM_FUNC_CYCLES
  PM_FUNC_FREQ
  PM_FUNC_TIME_STAMP
  PM_FUNC_DUMMY

PM_RESULT pmon_function_close( PM_HANDLE in_HLibrary, PM_HANDLE *in_HFunction );

Closes a previously opened function.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HFunction = reference to a valid function, previously opened with pmon_function_open().

PM_RESULT pmon_function_enable( PM_HANDLE in_HLibrary, PM_HANDLE in_HFunction );

Enables a function. Once a function is open, it has to be enabled before it can be used with a channel.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HFunction = reference to a valid function, previously opened with pmon_function_open().

PM_RESULT pmon_function_disable( PM_HANDLE in_HLibrary, PM_HANDLE in_HFunction );

Disables a function. A function must be disabled before it is closed. The final operation on a function is closing it.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_HFunction = reference to a valid function, previously opened with pmon_function_open().

PM_RESULT pmon_event_create( PM_HANDLE in_HLibrary, int in_EventTag, PM_HANDLE *out_HEvent );

Creates an event. The event is the PerfMonitor entity that triggers an actual measurement. It relates to a channel which, in turn, relates to the entities that do the measuring, the functions. When an event occurs, that channel is activated and so are the functions related to the channel, after which the desired measurements are available. An event can be inserted at any place in the code for which the programmer wants data. Usually, events are named after the actions that trigger them, e.g. start thread, start period, end application and so on.
in_HLibrary = a valid handle, previously opened using pmon_open().
in_EventTag = the event's unique name.
Suggestive names can be chosen, so that no confusion arises when reading the instrumented code.
out_HEvent = reference to the newly created event.
Currently, the supported event tags are:
    ET_START_SYSTEM_RUN
    ET_END_SYSTEM_RUN
    ET_START_APP
    ET_END_APP
    ET_START_THREAD
    ET_END_THREAD
    ET_START_THREAD_PERIOD
    ET_END_THREAD_PERIOD
    ET_END_SESSION
    ET_POWER
    ET_SYSINT_SY
    ET_SYSINT_IL
    ET_SYSINT_SK
    ET_SYSINT_SI

PM_RESULT pmon_event_delete( PM_HANDLE in_HLibrary, PM_HANDLE *in_HEvent );

Deletes an event. This is the last operation to be performed on an event.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_HEvent = valid reference to a previously created event that is about to be deleted.

PM_RESULT pmon_event_target( PM_HANDLE in_HLibrary, PM_HANDLE in_HEvent, PM_RESULT (*in_Callback)(TLVEvent*, void*), void *in_Context );

When an event is signaled, this function gives the programmer the possibility to handle the data measured by the hardware counters directly. in_Callback (a function pointer) and in_Context (its context) are used together to call the function passed as argument.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_HEvent = a valid event handle, previously created with pmon_event_create().

PM_RESULT pmon_event_link( PM_HANDLE in_HLibrary, PM_HANDLE in_HEvent, PM_HANDLE in_HChannel );

Links a channel to an event. Once a channel is attached to an event, the channel is triggered whenever that event is signaled (using pmon_event_signal()). Consequently, the functions related to that channel perform their measurements and the data becomes available for processing.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_HEvent = reference to a valid event handle, previously created with pmon_event_create().
in_HChannel = reference to a valid channel, previously opened with pmon_channel_create().
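The order of calls implied by the descriptions above (open and enable a function, link it to a channel, create an event, link the channel to it, then signal) can be sketched as follows. This is only an illustration under stated assumptions: the typedefs for PM_HANDLE and PM_RESULT, the success code PM_OK, and the values of PM_FUNC_CYCLES and ET_START_APP are placeholders, and the pmon_* bodies are stand-in stubs that merely record the call order so the sketch compiles without the library. The helper measure_app_start() is a hypothetical name; the library and channel handles are taken as parameters because pmon_open() and pmon_channel_create() are documented elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED placeholders: the real types and constants come from the
 * PerfMonitor headers, which are not reproduced in this section. */
typedef void *PM_HANDLE;
typedef int PM_RESULT;
#define PM_OK 0
#define PM_FUNC_CYCLES "PM_FUNC_CYCLES"
#define ET_START_APP 1

/* Stand-in stubs: each one only appends its name to call_log and
 * reports success. They are NOT the library implementation. */
static char call_log[256];
static int dummy[4];

static PM_RESULT note(const char *name) {
    size_t used = strlen(call_log);
    snprintf(call_log + used, sizeof call_log - used, "%s ", name);
    return PM_OK;
}

PM_RESULT pmon_function_open(PM_HANDLE lib, PM_HANDLE *out, const char *type, ...) {
    (void)lib; (void)type; *out = &dummy[0]; return note("function_open");
}
PM_RESULT pmon_function_enable(PM_HANDLE lib, PM_HANDLE f) {
    (void)lib; (void)f; return note("function_enable");
}
PM_RESULT pmon_channel_link(PM_HANDLE lib, PM_HANDLE ch, PM_HANDLE f) {
    (void)lib; (void)ch; (void)f; return note("channel_link");
}
PM_RESULT pmon_event_create(PM_HANDLE lib, int tag, PM_HANDLE *out) {
    (void)lib; (void)tag; *out = &dummy[1]; return note("event_create");
}
PM_RESULT pmon_event_link(PM_HANDLE lib, PM_HANDLE ev, PM_HANDLE ch) {
    (void)lib; (void)ev; (void)ch; return note("event_link");
}
PM_RESULT pmon_event_signal(PM_HANDLE lib, PM_HANDLE ev) {
    (void)lib; (void)ev; return note("event_signal");
}

/* Hypothetical helper: wires function -> channel -> event, then signals
 * the event once. Teardown (unlink, disable, close, delete) is omitted. */
PM_RESULT measure_app_start(PM_HANDLE lib, PM_HANDLE channel) {
    PM_HANDLE func = NULL, event = NULL;
    PM_RESULT rc;

    if ((rc = pmon_function_open(lib, &func, PM_FUNC_CYCLES)) != PM_OK) return rc;
    /* A function has to be enabled before it is used with a channel. */
    if ((rc = pmon_function_enable(lib, func)) != PM_OK) return rc;
    if ((rc = pmon_channel_link(lib, channel, func)) != PM_OK) return rc;
    if ((rc = pmon_event_create(lib, ET_START_APP, &event)) != PM_OK) return rc;
    if ((rc = pmon_event_link(lib, event, channel)) != PM_OK) return rc;
    /* Signaling the event triggers the channel and its linked functions. */
    return pmon_event_signal(lib, event);
}
```

In real code the stubs would be dropped in favour of the library's own header, and each handle would be released in the reverse order once the measurement is no longer needed.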
PM_RESULT pmon_event_unlink( PM_HANDLE in_HLibrary, PM_HANDLE in_HEvent, PM_HANDLE in_HChannel );

Unlinks a channel from an event. After this call, the channel still has its functions attached, but it can no longer be triggered by the event.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_HEvent = reference to a valid event handle, previously created with pmon_event_create().
in_HChannel = reference to a valid channel, previously opened with pmon_channel_create().

PM_RESULT pmon_event_signal( PM_HANDLE in_HLibrary, PM_HANDLE in_HEvent );

Signals an event. The programmer calls this function at the point in the code that has to be instrumented (measured using vPmon); the call triggers the channel related to the event. The data from the functions linked to that channel is therefore available at that place in the code.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_HEvent = reference to a valid event handle, previously created with pmon_event_create().

PM_RESULT pmon_remote_monitor_start( PM_HANDLE in_HLibrary, int in_MonitorPid, int in_Period);

Besides the basic facility of a thread's self-monitoring, PerfMonitor can also monitor a remote process. This function identifies the remote process by its process id and starts querying it periodically, at regular intervals of in_Period.
in_HLibrary = a valid handle, previously opened with pmon_open().
in_MonitorPid = the remote process's id.
in_Period = the interval between two consecutive queries.

PM_RESULT pmon_remote_monitor_stop( PM_HANDLE in_HLibrary, int in_MonitorPid);

Stops the remote monitoring activity previously started by pmon_remote_monitor_start().
in_HLibrary = a valid handle, previously opened with pmon_open().
in_MonitorPid = the id of the remotely monitored process.

PM_RESULT pmon_echo( PM_HANDLE in_HLibrary );

Replies to a message from a monitored thread/process. It uses a Linux socket connection to send back a message at regular intervals.
It can be used to check Pmon's functionality/availability.
in_HLibrary = a valid handle, previously opened with pmon_open().

PM_RESULT pmon_flush( PM_HANDLE in_HLibrary );

Flushes all the messages buffered in the communication channel for the monitored thread up to the moment of the call. Uses Linux socket connections.
in_HLibrary = a valid handle, previously opened with pmon_open().

3.3 Configuration Files

PerfMonitor's client features are also available from a configuration file. Events, channels and functions can be declared and linked in the configuration file. In the application's code, the events still need to be triggered at the needed time. The benefits of the configuration file are that its syntax is a bit simpler than writing code, so there is less to write, and that a change in the configuration file does not require recompiling the application's sources. Parsing is done with the Flex and Bison tools, and the grammar is presented below:

<app_name> <id> {
    <instructions>
}

instructions:
    Function <func_name> hw <func_type> <flag>;
    Function <func_name> user <func_type> <user_function>;
    Function <func_name> const uint32 <func_type> <number>;
    Function <func_name> const string <func_type> <string>;
    Channel <chan_name> <chan_type>;
    Event <event_name> <event_type>;
    <ent> = {<ent1>, <ent2>, ...};

app_name, func_name, user_function, string, chan_name, event_name, ent, ent1, ent2:
    [a-zA-Z_][a-zA-Z_0-9]*

number, id:
    [0-9]+

func_type:
    // hw functions
    FUNC_INSTR             // (I) instructions
    FUNC_CYCLES            // (Y) cycles
    FUNC_TIME_STAMP        // (TE) time spent on processor by the thread
    // user functions
    FUNC_ALGORITHM         // (A) algorithm
    FUNC_TEXT              // (X) text
    FUNC_PPID              // (X) parent pid
    FUNC_INSTANCE_NO       // (N) frame number
    FUNC_TIME_SCHEDULED    // (TS)
    FUNC_TIME_DEADLINE     // (TD)
    FUNC_TIME_EXECUTION    // (TE) not used
    FUNC_REAL_TIME         // (R)
    FUNC_THREAD_TYPE       // (TT)

flag:
    FLAG_NONE
    FLAG_ENABLED           // flag for enabling the function
    FLAG_RESET_AFTER_READ  // flag for resetting the counter value after each read

chan_type:
    CHAN_PASS_THROUGH           // simple channel, can be linked to 1 function, does not modify the function's output
    CHAN_PASS_THROUGH_MULTIPLE  // can be linked to multiple functions, does not modify any function's output
    CHAN_MICRO2MILI             // can be linked to 1 function, divides its result by 1000
    CHAN_AGGREGATE              // can be linked to multiple functions, adds the results of the linked functions
    CHAN_NULL                   // can be linked to multiple functions, gives no output, used for resetting the counters
    CHAN_REVERSE                // can be linked to multiple functions, reads the linked functions in reverse order

event_type:
    START_SYSTEM_RUN
    END_SYSTEM_RUN
    START_APP
    END_APP
    START_THREAD
    END_THREAD
    START_THREAD_PERIOD
    END_THREAD_PERIOD
    END_SESSION
    POWER

The application registers itself with app_name and id. When the file is parsed, the instructions in the block identified by the pair (app_name, id) are run. The configuration file also accepts C style comments (/* comment ... */).

Linking can be done between functions and a channel, between channels, and between channels and events. So ent can be an event_name or a chan_name; ent1 and ent2 can each be a chan_name or a func_name. The user_function is defined inside the application and is identified by the user_function string.

Short example: here the block is identified by the pair test 1, with app_name test and id 1.

test 1 {
    Function F_I hw FUNC_INSTR FLAG_RESET_AFTER_READ;
    Function F_TT const uint32 FUNC_THREAD_TYPE 1;
    Channel C_MP CHAN_PASS_THROUGH_MULTIPLE;
    Event ev START_APP;
    C_MP = {F_I, F_TT};
    ev = {C_MP};
}
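The effect of the chan_type values listed in the grammar can be pictured with a small model. The sketch below is not PerfMonitor code: the enum, the function channel_output() and the choice of uint32_t values are assumptions made for illustration, following only the one-line comments in the grammar (pass-through, divide by 1000, add, no output, reverse order). In particular, it assumes integer division for CHAN_MICRO2MILI and that "reversed order" means the outputs appear in the opposite order of linking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the chan_type semantics; the names mirror the
 * configuration keywords, but none of this is PerfMonitor code. */
typedef enum {
    CHAN_PASS_THROUGH,
    CHAN_PASS_THROUGH_MULTIPLE,
    CHAN_MICRO2MILI,
    CHAN_AGGREGATE,
    CHAN_NULL,
    CHAN_REVERSE
} chan_type;

/* Applies a channel of the given type to the n values read from its
 * linked functions. Writes the channel output to out, which must hold
 * at least n values (and at least one), and returns the number of values
 * written, or -1 when a single-function channel is given n != 1. */
int channel_output(chan_type type, const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i;

    assert(in != NULL && out != NULL);

    switch (type) {
    case CHAN_PASS_THROUGH:            /* exactly one function, unchanged */
        if (n != 1) return -1;
        out[0] = in[0];
        return 1;
    case CHAN_MICRO2MILI:              /* exactly one function, value / 1000 */
        if (n != 1) return -1;
        out[0] = in[0] / 1000u;
        return 1;
    case CHAN_PASS_THROUGH_MULTIPLE:   /* several functions, unchanged */
        for (i = 0; i < n; i++) out[i] = in[i];
        return (int)n;
    case CHAN_AGGREGATE: {             /* several functions, one sum */
        uint32_t sum = 0;
        for (i = 0; i < n; i++) sum += in[i];
        out[0] = sum;
        return 1;
    }
    case CHAN_NULL:                    /* functions are read, nothing is output */
        return 0;
    case CHAN_REVERSE:                 /* several functions, reverse order */
        for (i = 0; i < n; i++) out[i] = in[n - 1 - i];
        return (int)n;
    }
    return -1;
}
```

With the example block above, C_MP is a CHAN_PASS_THROUGH_MULTIPLE channel, so under this model signaling ev would yield the values of F_I and F_TT unchanged and in the order in which they were linked.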