Design Patterns for Tasks in Real-Time Systems

Class Number: ETP-241
Presented By: Michael C. Grischy
Authors: David E. Simon & Michael C. Grischy of Octave Software Group, Inc.

This paper discusses how to divide your code into a set of tasks under a Real-Time Operating System (RTOS) by illustrating a few design patterns to help you in that effort, patterns that have turned up repeatedly in our design work. The patterns we'll discuss fall into three major groups: desynchronizing patterns, synchronizing patterns, and architectural patterns. In addition, we'll consider how best to write the code for a task in order to avoid many of the common pitfalls of multi-threaded programming.

We assume that the reader has a working knowledge of basic RTOS operation and services, along with a basic knowledge of C. In our examples, we assume that we're working with an RTOS that provides a few standard services: a priority-based preemptive scheduler, mutexes, message queues, and timing services.

Response Requirements

One very important characteristic that drives the division of code into tasks, and the associated patterns for those tasks, is the response time requirements of your system. If your system has no particular response requirements—that is, if nothing that the CPU needs to do has a deadline—then you're likely to write your software without using an RTOS, as a simple polling loop. If your system really has no response requirements whatever, you might even get away without using interrupts.

Few systems are that simple, however, since in most systems the hardware imposes at least a few deadlines. Hardware deadlines include, for example, retrieving a character from a serial port UART before the next character arrives and overwrites the first, or noticing which button a user has pressed before the user takes his finger off the button and that information is lost. Even simple systems, therefore, usually end up as a mixture of interrupt service routines (ISRs) and a polling loop.

Trouble arises when the response time requirements get a little too complicated for the interrupts-plus-polling-loop architecture. Imagine your system has a small display that needs to be updated every 100 milliseconds in order to smoothly run some animation. And suppose the polling loop, as part of its many jobs, must construct the next frame for the animation so that it is ready to be displayed when the 100 millisecond timer expires and the timer's ISR changes the display to show the new frame. Now imagine that the animation doesn't run smoothly because the CPU doesn't always get around the polling loop quickly enough to dependably construct the next frame in a timely manner.

If ISRs are your only mechanism for prioritizing CPU work, then you might "solve" this problem by moving the code that constructs the next frame for the animation into the end of the timer ISR in order to guarantee that the next frame will be ready when the timer interrupt next occurs. If enough features get added that the polling loop can't keep up with, say, the mechanical control your system needs, then, similarly, you might "solve" that problem by moving all of the code for mechanical control into ISRs. In really bad cases, most of the code ends up in ISRs, and the ISRs get so long that they have to poll one another's devices because the ISRs themselves are causing other ISRs to miss their deadlines.
You end up with things like the temperature change ISR (which now controls the whole factory) polling the serial port, because otherwise serial port characters get lost. This may seem like fantasy, but we've seen code like this.

At this point—ideally, rather before your code gets to this point—you introduce a preemptive RTOS into your architecture. This gives you a way to control priorities and thereby control response without moving more and more code into ISRs. That user interface animation moves into a high priority task, not into an ISR, and the serial port ISR, which will still execute before the animation task, meets its deadlines. The factory control code moves into a task whose priority is high enough that the factory runs smoothly, not into an ISR, where it interferes with noticing whether the user has pushed a button.

This logic brings us to the first and most fundamental task design pattern, found in almost every real-time RTOS-based system:

Task Patterns for Desynchronization

The principal problem with polling loops is that everything that the CPU does in that polling loop is synchronized, that is, things are invariably executed sequentially. Although this has its good side, as we'll discuss later, it's bad when the problem at hand is meeting deadlines. Everything in a polling loop waits for everything else. If your linear fit[1] code is in the same loop with the code that controls the anti-lock brakes, the braking code will have to wait until the linear fit is done before it gets the attention of the CPU. See Figure 1.

[1] A linear fit is a compute-intensive mathematical operation; it is used here and further on as a typical example of some CPU operation that might take long enough to cause your system to miss other deadlines.

pollingLoop.c

    main ()
    {
        while (TRUE)
        {
            /* A function that takes a long time. */
            DoLinearFit ();

            /* A function that needs to be called frequently. */
            CheckOnBrakes ();
        }
    }

Figure 1  Polling Loop Code

Yes, you can play some games to stop the linear fit in the middle and see what's going on with the brakes, but simple, maintainable, bug-free code is not a likely result of this approach. These two operations need to be desynchronized: put the linear fit in one task, a lower priority task, and the braking operation in another, higher priority task. Then the RTOS will ensure that the braking operation gets the CPU when it needs it, and the linear fit gets the CPU when it is otherwise idle. The high priority task with the braking code is rather like a junior ISR: it executes ahead of less-urgent things like the linear fit but behind the very urgent code that you put in the ISRs. This is the basic desynchronization pattern.
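To make the split concrete, here is a minimal sketch of the two-task version, written in the same pseudocode style as the figures in this paper; the priority names, the brakeQueue handle, and the exact RTOS... argument lists are placeholders for illustration, not the API of any particular RTOS:

desyncTasks.c

    void vLinearFitTask ()
    {
        while (TRUE)
        {
            /* Runs whenever nothing more urgent is ready to run. */
            DoLinearFit ();
        }
    }

    void vBrakeTask ()
    {
        while (TRUE)
        {
            /* Block until the brake ISR posts a request. */
            RTOSQueueRead (brakeQueue, ...);
            CheckOnBrakes ();
        }
    }

    void vStartTasks (void)
    {
        RTOSTaskStart (vLinearFitTask, LOW_PRIORITY, ...);
        RTOSTaskStart (vBrakeTask, HIGH_PRIORITY, ...);
    }

Because vBrakeTask has the higher priority, the RTOS preempts the linear fit the moment a braking request arrives; the linear fit simply resumes where it left off when the braking code blocks again.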
Here are some common desynchronization pattern variations that turn up:

• User Interface Operation Pattern. If your user interface does nothing more in response to some user input than turn on an LED or some other one- or two-line operation, you're probably just going to take care of it in the ISR that tells your system that the user input has arrived. However, if your user interface code has to remember what menu the user has been looking at, use the current system state and the identity of the button the user pressed to determine the next menu to present, determine a default choice for that menu, and then put up an elaborate display, then perhaps a lot of that work wants to get moved out of the ISR. If all that code stays in the ISR, your system may miss other deadlines. User interface work typically has a deadline on the order of 100 milliseconds, and 100 milliseconds is plenty of time for an ISR to pass a message to a user interface task, for the RTOS to switch to that task, for a few other ISRs to execute, and for your user interface code to do what it needs to do. Therefore, a task is the right place for that code. The priority for that task must reflect the deadline; it must have a higher priority than a task that, for example, does a linear fit that takes 250 milliseconds of CPU time.

• Millisecond Operation Pattern. More generally, any operation whose deadline is measured in milliseconds—not microseconds—is a candidate for a high priority task. Assuming that your system has some serious computing to do at least once in a while, anything that your system must do in milliseconds will miss its deadline if it has to wait for the serious computing to complete. (If your system never has any time-consuming CPU activity, then you're unlikely to have any deadline problems anyway and might well stick to a polling loop for your code.) Operations whose deadlines are measured in microseconds typically end up in ISRs in any case. Some examples of things that fall into the millisecond category are:

  o Responding after a complete message has been received from another system over the serial port.

  o Constructing a suitable response to a network frame and sending it.

  o Turning off the valve when sensors tell us that we've added enough of some ingredient to a manufacturing mixture.

• CPU Hog Pattern. Any operation that takes up an amount of CPU time measured in seconds, or perhaps a large number of milliseconds, is a candidate to be moved into a low priority task. The trouble with CPU-intensive operations is that they endanger every deadline in the system. Creating a separate, low-priority task for a CPU-intensive operation gets rid of the interference. Note that moving such an operation into a low priority task is, of course, equivalent to moving everything else into a higher priority task; however, "put the CPU-hogging operation into a low priority task" is often the more obvious way to state the pattern.

• Monitoring Function Pattern. If your system has some monitoring function, something that it always does when there's nothing else to be done, then this operation goes into a task that becomes the lowest priority task in your system. This may even be a "polling task," which never blocks, but which absorbs all leftover CPU time after all other operations have finished. For example, when there's nothing else to do, the polling task in a system that monitors the levels of the gasoline in the underground tanks at a gas station measures the levels one more time to see if anything new and interesting has happened; a sketch follows this list.
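As a sketch of that lowest-priority monitoring task (the tank-measurement routine and the levelChangeQueue handle are invented for illustration, in the same pseudocode style as the other figures):

tankMonitorTask.c

    /* The lowest priority task in the system.  It never blocks, so it
       soaks up whatever CPU time is left after every other task and
       ISR has had its turn. */
    void vTankMonitorTask ()
    {
        while (TRUE)
        {
            /* Measure the gasoline level in each underground tank. */
            MeasureTankLevels ();

            if (a level has changed in some interesting way)
            {
                /* Tell whoever cares, for example another task. */
                RTOSQueueWrite (levelChangeQueue, ...);
            }
        }
    }

Because every other task has a higher priority, this task steals no time from anything that has a deadline.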
Task Patterns for Synchronization

The problem with desynchronizing your code by moving parts of it into different tasks is that many of the resources your software wants to use—certain hardware, data structures in memory, files—can't be accessed asynchronously by multiple tasks in a safe way. For example, if one of your tasks is in the middle of updating a linked list when another of your tasks gains control of the CPU and tries to read along the list, your system is likely to crash when the reading task follows a not-yet-updated pointer into garbage memory. This brings us to the second major group of task design patterns: synchronization patterns, which provide ways to serialize access to shared resources.

Probably the most common—and perhaps also the most abused—way to deal with shared resources is to associate a mutex with each resource, declare everything global, and then take and release the mutex whenever and from wherever the resource is required, as shown here in Figure 2:[2]

    RTOSMutexTake (mutex);
    /* Code accessing the shared resource goes here. */
    RTOSMutexRelease (mutex);

Figure 2  Using a Mutex to Protect a Shared Resource

[2] Throughout, functions with names of the form RTOS... are functions in the RTOS API. Note that, when one of these routines is called, the calling task can lose control of the CPU if another task of a higher priority is ready to run.

The difficulties with this are:

• Mutexes work only if you use them correctly every time; that is, whenever any code anywhere in the system accesses the shared resource, the access must, without fail, be surrounded by code to take and release the proper—and not some other—mutex. Otherwise, mutexes lead to a host of subtle, infrequently appearing, hard-to-diagnose bugs in your system. When you use mutexes, every modification to your code presents another opportunity for one of these insidious problems to creep in.

• Mutexes can affect response time in unpredictable ways that can cause your system to miss deadlines sporadically. Your high priority task always meets its deadline...unless it just happens to try to take a mutex right after some low priority task has taken it and started to use the shared resource. These bugs can also be extraordinarily tough to find, since they seldom cooperate by showing themselves when you have your test equipment set up to find them. Though your RTOS may take care of this priority inversion problem[3] for you—if your RTOS supports this feature—you must be willing to take the performance hit that the feature incurs.

[3] "Priority inversion" occurs when a high priority task gets stuck waiting for a mutex that a low priority task holds and can't release because a medium priority task is using up all the CPU time, thereby preventing the low priority task from getting any CPU time to release the mutex. Some RTOSes implement a "priority inheritance" algorithm, which promotes the low priority task to high priority if the high priority task is waiting for a mutex held by the low priority task. Of course, this algorithm uses a little CPU time each time your code takes or releases a mutex.

A task whose job is to handle a shared resource is a task pattern that solves this problem. The basic synchronization pattern looks like the one in Figure 3:

MySharedResourceTask.c

    static MY_RESOURCE mySharedResource;

    void mySharedResourceTask ()
    {
        while (TRUE)
        {
            /* Wait for a request to do something with the resource. */
            RTOSQueueRead (...);

            /* Do the request. */
            mySharedResource += ...;
            /* etc. */
        }
    }

Figure 3  Using a Task to Protect a Shared Resource

In the pattern above, mySharedResource is a data structure that many other tasks need to access in some way or other. As you can see in the sample, the MySharedResourceTask module has declared this data structure static, thereby encapsulating it and preventing undisciplined access by other code in the system.
The mySharedResourceTask is declared to be a task within the RTOS; it calls RTOSQueueRead to wait for requests from other tasks (through an RTOS queue in this example, although many other RTOS mechanisms can do this job) about what needs to be done to mySharedResource. Then it does the operation. Note that all of the operations on mySharedResource, being done now within the context of mySharedResourceTask, are done sequentially; they are serialized in the task's message queue. The mutex and all of its attendant problems are gone. Although the task's message queue is now a shared resource, the burden is on the RTOS, not the programmer, to synchronize access to it.

Figure 4 has a more concrete example, a common variation of this pattern that we call the Hardware I/O Pattern, in which the resource to protect is a piece of hardware, in this case a display:

DisplayTask.c

    void DisplayTask ()
    {
        while (TRUE)
        {
            /* Wait for a request to update the display. */
            RTOSQueueRead (...);

            consider (in what may be a big cloud of logic code) whether to
                honor the new request or keep the current display on the screen

            if (the display needs changing)
            {
                update the display
            }
        }
    }

Figure 4  A Task to Protect an I/O Device, Such as a Display

In a system to control a pay phone, for example, various parts of the code may simultaneously think it would be a good idea to show the user what number he's called, how long he's been on the phone, an advertisement for the phone company's new rates, and an indication that his phone card is about to run out. If the display can't display all these things simultaneously, it makes no sense to allow all those parts of the code to fight over the display. Instead, the logic to decide which of these suggestions should actually get displayed and to handle the display hardware goes into the task in Figure 4. Other parts of the code send their suggestions to this task. This eliminates the need for a mutex, resolves concerns about sharing the display hardware, and confines to this one module all of the logic for determining what is most important to display, rather than scattering that logic throughout your system.

Figure 5 is a similar example, in which the shared resource is a flash memory:

WriteLogToFlashTask.c

    void WriteLogToFlashTask ()
    {
        while (TRUE)
        {
            /* Wait for a request to write a log entry to the flash. */
            RTOSQueueRead (...);

            /* Write the log entry to the flash part. */
            vFlashWriteStart (...);

            /* Block task here until flash has chance to recover. */
            RTOSSleep (FLASH_RECOVERY_TIME);
        }
    }

Figure 5  A Task Writes to a Log in Flash Memory

In Figure 5, WriteLogToFlashTask writes to the flash and then blocks itself while the flash recovers.[4] If other parts of the system find other events to log during the recovery period, they write their logging requests onto the RTOS queue, and the requests wait on the queue until WriteLogToFlashTask has finished with previous requests and the flash has had a chance to recover. Information about when it is OK to write the next log entry into the flash doesn't get spread around the system, and you will not have to code logic in the other tasks to deal with the recovery time.

[4] Most flash memories require a recovery time after data has been written to them. During this recovery time, software may neither read from nor write to the flash.
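On the client side of this pattern, the queue itself is best hidden behind a small wrapper routine in the same module, so that the rest of the system never touches the queue handle directly. A sketch, with the vLogWrite name, the LOG_ENTRY type, and the logQueue handle invented for illustration:

    /* In WriteLogToFlashTask.c: called from any task that wants to log
       an event.  It returns immediately; the entry is actually written
       whenever WriteLogToFlashTask gets to it and the flash has recovered. */
    void vLogWrite (LOG_ENTRY const *pEntry)
    {
        /* Queue the request for WriteLogToFlashTask. */
        RTOSQueueWrite (logQueue, pEntry, ...);
    }

Keeping vLogWrite in the same module as the task also keeps the queue handle out of the global namespace, a point we will return to in the coding pattern at the end of this paper.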
Responses from Synchronization Tasks

Synchronizing tasks fulfill a role much like that of a server, offering the services of the shared resource to the other tasks, the "client" tasks, within the system. One problem that must be resolved is that client tasks very often want a response from the server. In the examples in Figure 4 and Figure 5, the clients needed no response. If a client task sends an item for the log to the logging server task in Figure 5, for example, the behavior of the client task most likely will not depend upon when (or even if) the entry gets written into the log. In the pay phone, most of the software probably doesn't care what eventually gets displayed: the payment software will likely disconnect the call when the user runs out of money, whatever the display task has actually shown the user.

Often, however, the client task needs a response to its request. Sometimes a task needs this response before it can continue its own processing, a synchronous response. In other situations, the server task may not be able to provide a response immediately, and the client task doesn't want to block waiting for the response. So the client task must somehow get its response later, an asynchronous response.

Synchronous Response Pattern. A task asking a calibration subsystem to get a value may need that value before it can continue its calculations: it needs a synchronous response. Almost invariably, using a mutex is the easiest way to deal with this requirement, but that does not mean that you should simply make the calibration data global and let any task that wants a calibration value read it. Instead, encapsulate access to the data as shown in Figure 6:

calibration.c

    static struct
    {
        /* Memory-based copy of the calibration data. */
        int a_iValues [MAX_KEYS];
        MUTEX mutex;
    } sCalibration;

    void CalibrationInit (void)
    {
        Initialize the sCalibration struct
        RTOSTaskStart (vWriteCalibrationToFlashTask, ...);
    }

    int iCalibrationRead (int iKey)
    {
        int iReturnValue;

        RTOSMutexTake (sCalibration.mutex);
        iReturnValue = sCalibration.a_iValues [iKey];
        RTOSMutexRelease (sCalibration.mutex);

        return (iReturnValue);
    }

    void vCalibrationWrite (int iKey, int iValue)
    {
        RTOSMutexTake (sCalibration.mutex);
        sCalibration.a_iValues [iKey] = iValue;
        RTOSMutexRelease (sCalibration.mutex);

        /* Post a request to write the calibration entry to flash. */
        RTOSQueueWrite (queue, ..., iKey, iValue);
    }

    void vWriteCalibrationToFlashTask ()
    {
        int iKey, iValue;

        while (TRUE)
        {
            /* Wait for a request to write a calibration entry. */
            RTOSQueueRead (queue, ..., &iKey, &iValue);

            /* Write the entry to the flash part. */
            vFlashWriteStart (iKey, iValue);

            /* Block task here until flash has chance to recover. */
            RTOSSleep (FLASH_RECOVERY_TIME);
        }
    }

Figure 6  A Calibration Task

This code keeps a shadow of the calibration values in the sCalibration structure. When some part of your code wishes to write a new calibration entry, vCalibrationWrite puts the new value into sCalibration (using the mutex that protects that structure) and then sends a message to vWriteCalibrationToFlashTask to write the new calibration value into the flash. When some part of your code wishes to read a calibration entry, iCalibrationRead fetches the value from sCalibration (again, using the mutex). This code thus provides an immediate, synchronous response for tasks that need entries from the calibration, although any task that calls either vCalibrationWrite or iCalibrationRead must be able to wait for the mutex without missing deadlines.
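From a client task's point of view, the calls look like ordinary synchronous function calls. A short usage sketch (the key names and the surrounding routine are invented for illustration):

    /* Example client code.  Both calibration calls return quickly,
       because they only touch the RAM copy under the mutex; the flash
       update happens later, in vWriteCalibrationToFlashTask's context. */
    void vApplyCalibration (int iRawReading)
    {
        int iGain = iCalibrationRead (KEY_GAIN);

        use iGain and iRawReading to compute a corrected result

        /* Store a recomputed offset; this also queues the flash update. */
        vCalibrationWrite (KEY_OFFSET, iNewOffset);
    }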
Asynchronous Response Pattern. As an example of a task that requires an asynchronous response, consider a task that needs to move a motorized mechanical assembly some distance in a certain direction, but that needs to continue running the rest of the factory while the mechanical assembly is busy moving. If the software in this system contains a separate server task to handle the operation of the mechanical assembly, then this server task needs to provide a service that allows clients to request a move but not block waiting for the move to complete. When the move completes, the server task notifies the client task.

Since it is common for tasks to read from an RTOS queue to get information about what is going on in the outside world, one convenient way to provide an asynchronous response is to have the server task send a message back to the client task's message queue when the response is available. Some code to do that is shown here in Figure 7:

asynchServer.c

    static struct
    {
        /* Asynchronous notification information. */
        QUEUE q;        /* Queue on which to send notification */
        MESSAGE msg;    /* Notification message to send on queue */
    } sNotification;

    void AsynchServerInit (QUEUE q, MESSAGE msg)
    {
        sNotification.q = q;
        sNotification.msg = msg;
        RTOSTaskStart (vAsynchServerTask, ...);
    }

    void vStartProcessNotifyWhenDone (void)
    {
        /* Send the task a request to start the process. */
        RTOSQueueWrite (queue, MSG_START_PROCESS, ...);
    }

    void vAsynchServerTask ()
    {
        while (TRUE)
        {
            /* Wait for a request. */
            RTOSQueueRead (queue, &msgEvent, ...);

            switch (msgEvent)
            {
                case MSG_START_PROCESS:
                    start up the process
                    break;

                case MSG_PROCESS_DONE:
                    /* An ISR or another task has informed us that the
                       process has completed.  Notify our client. */
                    RTOSQueueWrite (sNotification.q, sNotification.msg, ...);
                    break;
            }
        }
    }

Figure 7  A Server Task Provides Asynchronous Response

When AsynchServerInit is called to initialize the server task, the client task's q and msg are remembered for later use. When the client task calls vStartProcessNotifyWhenDone, that routine writes a message to the server task's queue asking it to start the "process". When the server task reads the message, it starts the "process". Later, an ISR or another task sends a MSG_PROCESS_DONE message to the server task's queue, signaling that the "process" is finished. At this point, the server task sends an asynchronous response to the client task, using q and msg, to notify it that the "process" is finished.
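Seen from the client task's side, the whole exchange might look like the sketch below. The MSG_MOVE_DONE and MSG_NEED_TO_MOVE_ASSEMBLY message values, clientQueue, and the rest of the client-side names are invented for illustration:

clientOfAsynchServer.c

    void vClientTaskInit (void)
    {
        /* Tell the server which queue and which message to use for
           its asynchronous response. */
        AsynchServerInit (clientQueue, MSG_MOVE_DONE);
    }

    void vClientTask ()
    {
        while (TRUE)
        {
            RTOSQueueRead (clientQueue, &msgEvent, ...);

            switch (msgEvent)
            {
                case MSG_NEED_TO_MOVE_ASSEMBLY:
                    /* Ask the server to start the move; don't block. */
                    vStartProcessNotifyWhenDone ();
                    break;

                case MSG_MOVE_DONE:
                    /* The asynchronous response: the move has finished. */
                    do whatever was waiting for the move to complete
                    break;

                ...     /* other messages keep the factory running */
            }
        }
    }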
A more general way to accomplish the same result is to have the server task call a callback function in the client task instead of passing a message to a queue. In this case, instead of a queue and a message, AsynchServerInit takes a callback function pointer and perhaps a context value to pass to the callback function. The disadvantage of this method is that you often end up writing a whole raft of functions in your client tasks, each something like the one shown in Figure 8. The advantages are that the server task has no access to the client's queue, thereby encapsulating the queue and reducing the chance of bugs related to it, and that the client task may be able to handle the event in the callback function without having to incur the overhead imposed by using a queue.

clientTask.c

    ...

    /* This routine is supplied by the client task and is called by the
       server task to asynchronously notify the client task of some event. */
    void vEventCallback (CONTEXT context)
    {
        RTOSQueueWrite (clientQueue, context, ...);
    }

Figure 8  Typical Callback Function for an Asynchronous Server Task to Call

Architectural Patterns

The two patterns discussed in this section are patterns that we've seen often in our work. They can sometimes make your coding life simpler.

Periodic Task Pattern. Periodic tasks deal with things that your system needs to do on a regular basis. Suppose, for example, that your system sometimes needs to blink an LED on and off once per second to indicate some error condition to the user. You could no doubt piggyback this onto some task in your system, but you could also write the simple code shown in Figure 9:

led.c

    /* We assume that writing to a boolean variable is atomic. */
    static BOOL fBlinkingEnabled;

    void vStartBlinking (void)
    {
        fBlinkingEnabled = TRUE;
    }

    void vStopBlinking (void)
    {
        fBlinkingEnabled = FALSE;
    }

    void vLedTask ()
    {
        static BOOL fLedOn = FALSE;

        while (TRUE)
        {
            RTOSSleep (ONE_HALF_SECOND);

            if (fBlinkingEnabled)
            {
                if (fLedOn)
                    LedOff ();      /* Turn the LED off */
                else
                    LedOn ();       /* Turn the LED on */
                fLedOn = !fLedOn;
            }
            else
            {
                LedOff ();          /* Turn the LED off */
                fLedOn = FALSE;
            }
        }
    }

Figure 9  A Periodic Task

A few points about this pattern:

• Using RTOSSleep as shown here may not be particularly accurate, depending upon your RTOS and depending upon the rest of your system. Most RTOSes offer more sophisticated ways, such as a callback from a timer service, to get accurate timings if they are important for your periodic task.

• This periodic task wakes up and uses some CPU time even when the LED is not blinking. If this is a problem in your system, the code in Figure 10 deals with it (at the penalty of some slight complication to your code). Note that most RTOSes give you a way to get a message into a queue at some future time (although you'll probably have to write a callback function such as the one shown in Figure 8 above).

Led2.c

    static BOOL fBlinkingEnabled;

    void vStartBlinking (void)
    {
        fBlinkingEnabled = TRUE;
        RTOSQueueWrite (queue, ...);
    }

    void vStopBlinking (void)
    {
        fBlinkingEnabled = FALSE;
    }

    void vLedTask ()
    {
        static BOOL fLedOn = FALSE;

        while (TRUE)
        {
            /* Wait for msg from RTOS timer or from vStartBlinking. */
            RTOSQueueRead (queue, ...);

            if (fBlinkingEnabled)
            {
                if (fLedOn)
                    LedOff ();      /* Turn the LED off */
                else
                    LedOn ();       /* Turn the LED on */
                fLedOn = !fLedOn;

                /* Solicit a msg from the RTOS 0.5 seconds from now. */
                RTOSNotify (ONE_HALF_SECOND, queue, ...);
            }
            else
            {
                LedOff ();          /* Turn the LED off */
                fLedOn = FALSE;
            }
        }
    }

Figure 10  A Periodic Task That Runs Only When It Needs To

State Machine Pattern. State machine tasks are a very simple way to implement state machines in a real-time environment.
State machines keep track of the state of the outside world in one or more "state variables" and perform actions and change their states in response to events that arrive from the outside world. In an RTOS environment, the basic pattern for a state machine task is the code shown in Figure 11:

stateMachineTask.c

    void vStateMachineTask ()
    {
        STATE state;
        EVENT event;

        state = idle state

        while (TRUE)
        {
            /* Wait for an event that drives the state machine. */
            RTOSQueueRead (..., &event);

            switch (event)
            {
                case event1:
                    if (state == ...)
                    {
                        state = next appropriate state
                        start next appropriate action
                    }
                    else if (state == ...)
                    {
                        ...
                    }
                    ...
                    break;

                case event2:
                    if (state == ...)
                    {
                        state = next appropriate state
                        start next appropriate action
                    }
                    else if (state == ...)
                    {
                        ...
                    }
                    ...
                    break;

                case event3:
                    ...

                ...
            }
        }
    }

Figure 11  State Machine Task

Whenever something happens that affects this state machine, the ISR or task that notices the event writes a message to vStateMachineTask's message queue indicating what has happened. The vStateMachineTask code uses this message as an event to drive the state machine forward. Even though the real-world events that drive the state machine may happen so quickly that the code can't process one before the next has occurred, vStateMachineTask's queue will serialize those events so that the code can handle them in an orderly way.

State machines are common in mechanical control systems. The state variables of such a state machine contain the condition of the mechanical hardware. Events are sent to the task when a sensor triggers, when a stepper motor has completed a certain number of steps, or when a given amount of time has passed. When it receives one of these events, the state machine code changes its internal state variables to reflect the new state of the hardware and does whatever is needed to start the next necessary hardware action.
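As a small, concrete instance of Figure 11 (the states, events, queue handle, and motor routines here are invented for illustration), consider a task that drives a motorized valve between its open and closed positions:

valveTask.c

    void vValveTask ()
    {
        VALVE_STATE state = VALVE_CLOSED;
        EVENT event;

        while (TRUE)
        {
            /* Wait for an event that drives the state machine. */
            RTOSQueueRead (valveQueue, &event, ...);

            switch (event)
            {
                case EVENT_OPEN_REQUEST:
                    if (state == VALVE_CLOSED)
                    {
                        state = VALVE_OPENING;
                        StartValveMotorForward ();
                    }
                    /* In any other state, ignore the request. */
                    break;

                case EVENT_CLOSE_REQUEST:
                    if (state == VALVE_OPEN)
                    {
                        state = VALVE_CLOSING;
                        StartValveMotorReverse ();
                    }
                    break;

                case EVENT_LIMIT_SWITCH:    /* sent by a sensor ISR */
                    if (state == VALVE_OPENING)
                    {
                        state = VALVE_OPEN;
                        StopValveMotor ();
                    }
                    else if (state == VALVE_CLOSING)
                    {
                        state = VALVE_CLOSED;
                        StopValveMotor ();
                    }
                    break;
            }
        }
    }

The ISR that watches the valve's limit switch does nothing but write EVENT_LIMIT_SWITCH to valveQueue; all of the decisions about what that event means live in one place, in the task's state machine.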
Coding Pattern

In the sections above, we have discussed various patterns that can guide you to reasonable ways of dividing the work of your system into separate tasks under an RTOS. In this last section, we discuss a pattern for coding tasks. Once we have decided how to break a system up into a set of tasks, we use the coding pattern discussed below to help us avoid many of the common pitfalls of multi-threaded programming.

While a preemptive RTOS can help your system meet its deadlines, it also introduces, by its very nature, a whole raft of potential problems that are brought about because one thread of execution can lose control of the processor to another thread of execution at essentially any time. All RTOSes supply tools, such as mutexes and queues, to help solve these problems, but you must use these tools flawlessly or your system will have bugs that appear only intermittently and that are very difficult to analyze and find. The coding pattern helps systematize the use of these tools in order to minimize problems of this nature.

The coding pattern is a generic pattern that can be used as a starting point for tasks that don't fit the task patterns already discussed. It combines and generalizes some of the concepts introduced earlier. The coding pattern provides a task that makes efficient use of CPU time. It also eliminates reentrancy problems through the use of a simple, mechanical coding strategy. And, in the spirit of object-oriented programming, it encapsulates all of the resources that are used by a task and provides an API for controlling the task. Specifically, here are the salient characteristics of the coding pattern:

• The task is either blocked waiting for a message telling it to do something or it is busy doing something. It is never blocked waiting for some particular message or event. It never spins in a loop waiting for something to finish. It never polls variables to figure out whether there is something to do. The task is either blocked, or it is busy.

• The coding pattern systematizes the use of the mutex to serialize access to the task's data. Every API routine—that is, every routine that is called by a thread of execution different from the task's own—that asynchronously accesses the task's data has a preamble and postamble that take and release the mutex protecting that data. Likewise, there is a preamble and postamble to take and release the mutex in each and every case statement for the messages that the task processes from its message queue.

• Any asynchronous event that must be processed by the task, such as an interrupt, a timer callback, or a request from another thread to start some process managed by the task, is handled by a dedicated routine that turns the event into a message sent to the task's queue. In this way, all asynchronous events are serialized in the task's queue, where they are handled one at a time by the task's thread in the order in which they were received.

• The mutex and the data structures it protects are always encapsulated in one module. The data may be accessed in various task contexts using the mutex for protection, but access to the data and the use of the mutex are encapsulated. This gives you more control over how long the mutex is held, since all the code that holds it is in one module. It also helps prevent the data sharing bugs that arise when some piece of code somewhere in your system modifies the shared data but fails to use the mutex properly.

• A task's queue is encapsulated in one module. A queue handle is never a global variable, since that would allow anybody working on any code in your system to write any miscellaneous collection of bytes into the queue. Instead, all messages are written to a task's queue and read from the queue in just the one place.

The pattern that embodies these characteristics is shown here in Figure 12:

genericTask.c

    static DATA_STRUCTURE sData;    // this task's encapsulated data
    static MUTEX mutex;             // protects access to task's data
    static QUEUE queue;             // this task's message queue
    static CALLBACK_FUNC * pCallBackProcess1Done;   // client callback

    /* API functions (subsets of these routines are possible). */
    void vGenericTaskInit (CALLBACK_FUNC * pCallBack)
    {
        pCallBackProcess1Done = pCallBack;  // set up client's callback
        mutex = RTOSMutexCreate (...);
        queue = RTOSQueueCreate (...);
        RTOSTaskStart (vGenericTask, ...);
        install vProcess1DoneIsr as the handler for the hardware's
            Process1-Done interrupt
    }

    int iGetSynchResponse (void)
    {
        int iValue;

        RTOSMutexTake (mutex);
        iValue = get value out of sData
        RTOSMutexRelease (mutex);

        return (iValue);
    }

    void vStartProcess1GetAsyncResponse (...)
    {
        RTOSMutexTake (mutex);
        manipulate sData
        RTOSMutexRelease (mutex);

        /* This will elicit an asynchronous response when done. */
        RTOSQueueWrite (queue, MSG_START_PROCESS1, ...);
    }

    /* Notification routine called in task context. */
    void vNotifyEvent1 (void)
    {
        /* Tell the task that event 1 has occurred. */
        RTOSQueueWrite (queue, MSG_EVENT1, ...);
    }

    /* ISR called in interrupt context. */
    static void vProcess1DoneIsr (void)
    {
        RTOSEnterInterrupt ();
        service the hardware to turn off the interrupt
        RTOSQueueWrite (queue, MSG_PROCESS1_DONE, ...);
        RTOSExitInterrupt ();   // Task rescheduling can occur here.
    }

    /* The task. */
    static void vGenericTask ()
    {
        while (TRUE)
        {
            /* Block here waiting to be told what to do next. */
            RTOSQueueRead (queue, &msgEvent, ...);

            switch (msgEvent)
            {
                case MSG_EVENT1:
                    RTOSMutexTake (mutex);
                    write to or read from the shared data as needed
                    RTOSMutexRelease (mutex);
                    break;

                case MSG_START_PROCESS1:
                    RTOSMutexTake (mutex);
                    write to or read from the shared data as needed
                    RTOSMutexRelease (mutex);
                    do whatever is necessary to start process1
                    break;

                case MSG_PROCESS1_DONE:
                    RTOSMutexTake (mutex);
                    write to or read from the shared data as needed
                    RTOSMutexRelease (mutex);
                    /* Notify client that process1 has completed. */
                    pCallBackProcess1Done ();
                    break;

                ...
            }
        }
    }

Figure 12  Generic Task Code Pattern

Using Patterns

All of the patterns that we have discussed here apply to a broad variety of applications. To use them, you must see the characteristics in your systems that match the characteristics of the patterns. These patterns are not mutually exclusive; sometimes a little mixing and matching is called for as you put together a system. However, we have found that thinking along the lines described here gets us a long way toward building reliable systems that meet their deadlines and that we can code reasonably quickly.

Michael Grischy is one of the founders of Octave Software Group, Inc., a software development consulting firm. David Simon, also a founder, has recently retired from Octave Software.

Octave Software Group, Inc.
1855 Hamilton Ave., #203
San Jose, CA 95125
650-941-3797
www.OctaveSW.com
michael@OctaveSW.com