Multicore Processors Make Real-Time Embedded Systems More Realistic

Abstract: With the ever-increasing demands placed on embedded devices, multicore solutions are becoming more prevalent. The use of multiple cores increases the complexity of software design in many respects. Every developer of multicore system software will ask which hardware architectures can be implemented and which of them are the most realistic and cost-effective. Most will want a straightforward, simple approach to accomplishing the task.

Background :

An Embedded System:
Is a system built to perform its duty completely or partially independent of human intervention.
Is specially designed to perform a few tasks in the most efficient way.
Interacts with physical elements in our environment, e.g. controlling and driving a motor or sensing temperature.

An embedded system can be defined as a control system or computer system designed to perform a specific task. Common examples include MP3 players, aircraft navigation systems and intruder alarm systems. An embedded system can also be defined as a single-purpose computer.

Embedded systems are often required to provide real-time response. A real-time system is one whose correctness depends on the timeliness of its response. Examples are the flight control systems of an aircraft and the sensor systems in nuclear reactors and power plants. For these systems, a delayed response is a fatal error. In a more relaxed form of real-time system, a timely response with small delays is acceptable. An example is the schedule display system on a railway platform.

In technical terminology, real-time systems can be classified as:
Hard Real-Time Systems - systems with severe constraints on the timeliness of the response.
Soft Real-Time Systems - systems which tolerate small variations in response times.
Hybrid Real-Time Systems - systems which exhibit both hard and soft constraints on their performance.

Problem Statement : Multicore for Real-Time Embedded Applications

Methodology : There are three main types of multiprocessing architecture: distributed processing (DP), symmetric multiprocessing (SMP) and asymmetric multiprocessing (AMP). Each has its own set of characteristics, advantages and disadvantages.

Distributed processing (DP) is based on independent nodes. With DP, each node has its own processor and memory, and the nodes communicate over buses or a fabric. Each DP node may have different peripherals, and a separate copy of the operating system runs on each node. Advantages of the DP approach include predictable performance and higher memory bandwidth, since memory is not shared. The DP approach often works well for multi-channel applications. Its disadvantages are that load balancing must be performed by the application and that the application is tied to a particular number of nodes. DP also typically supports a smaller quantity of memory per node than SMP and AMP designs.

In SMP architectures, each node may have two or more processors, and memory is global to all processors. The processors may also have both local and shared cache, and the cache is kept coherent between all processors and memory. A single O/S controls all the nodes. The advantages of SMP include a large global memory and, thanks to the use of fewer memory controllers, better performance per watt, which is important for SWaP (size, weight and power) sensitive applications. Instead of being split between multiple CPUs, SMP's large global memory is accessible to all of the processor cores. Data-intensive applications, such as image processing and data acquisition systems, often prefer large global memories that can be accessed at data rates of up to hundreds of Mbytes/sec.
These large-memory applications benefit from the single large memory common in most multicore designs. SMP also provides simpler node-to-node communication, and SMP applications can be programmed to be independent of node count. SMP lends itself especially well to new multicore processor designs. The disadvantages of SMP are that the memory latency and bandwidth of a given node can be affected by other nodes, and that cache "thrashing" may occur in some applications.

SMP architectures differ from AMP in that the single block of memory shared by the multiple processors, or by the multiple cores of a single multicore processor, is managed by a single OS image that runs across all the cores, enabling truly parallel processing. A big advantage of SMP operating systems is that they balance the load of tasks across all available cores.

Asymmetric multiprocessing designs use an SMP hardware architecture in which a common global memory is shared between the various processors. To the system software, however, an AMP design makes the SMP hardware look like a DP architecture. In AMP designs, application tasks are sent to the system's separate processors. These processors may be located on different boards or collocated on the same board, but each is essentially a separate computing system with its own OS and its own memory partition within the common global memory. One advantage of an AMP design is that asymmetric memory partitions can be assigned from one large global memory, making more efficient use of memory resources and potentially reducing system cost. Asymmetric multiprocessing thus provides a hybrid approach between DP and SMP by implementing distributed processing on an SMP architecture. In AMP, application memory is partitioned between the nodes, and independent copies of the O/S can run on each node.
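
The AMP idea of assigning unequal partitions from one global memory can be illustrated with a small sketch. This is a toy model only, not how any particular RTOS or hypervisor configures memory: the names `Partition`, `partition_global_memory` and `owner_of`, the node names, and the sizes are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MIB = 1024 * 1024  # bytes per mebibyte


@dataclass(frozen=True)
class Partition:
    """One node's private slice of the shared global memory (hypothetical model)."""
    node: str
    base: int  # start offset in bytes
    size: int  # length in bytes

    @property
    def end(self) -> int:
        # First byte offset past this partition.
        return self.base + self.size


def partition_global_memory(total_bytes: int, requests: Dict[str, int]) -> List[Partition]:
    """Carve unequal, non-overlapping partitions out of one global memory.

    `requests` maps a node name to the number of bytes its OS image needs.
    Partitions are laid out back to back in the order given.
    """
    if sum(requests.values()) > total_bytes:
        raise ValueError("requested partitions exceed the global memory")
    partitions = []
    base = 0
    for node, size in requests.items():
        if size <= 0:
            raise ValueError(f"partition size for {node} must be positive")
        partitions.append(Partition(node, base, size))
        base += size
    return partitions


def owner_of(address: int, partitions: List[Partition]) -> Optional[str]:
    """Return the node whose partition contains `address`, or None if unassigned."""
    for p in partitions:
        if p.base <= address < p.end:
            return p.node
    return None


# Assumed example: a small real-time control node and a large signal-processing node.
layout = partition_global_memory(
    512 * MIB, {"rt_control": 64 * MIB, "signal_proc": 384 * MIB}
)
for p in layout:
    print(p.node, hex(p.base), hex(p.end))
```

In this model each node sees only its own address range, much as a DP node would, while the sizes can differ to match each node's workload. Any memory left over (64 MiB in the assumed example) stays unassigned.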

Advantages of AMP include the fact that it is simple to migrate existing (non-SMP) O/Ss to the model, and that it offers superior node-to-node communication compared to a distributed architecture. AMP also supports the asymmetric sharing of a large global memory between nodes. The disadvantages of AMP combine some of the downsides of both DP and SMP: load balancing must be performed by the application, memory latency and bandwidth can be affected by other nodes, cache "thrashing" may occur in some applications, and the application is tied to a particular number of nodes.

Key Results

For single board computers (SBCs), integrating two or more processors onto one device frees board real estate for other important I/O features, such as an integrated mass storage module or a high-performance serial backplane interface. Embedded system developers can consolidate multiple embedded systems onto a single hardware platform by dedicating CPU cores of a multicore processor to real-time tasks.

Multicore processors are especially well suited to SMP because they are ideal for the intensive multitasking applications common in signal processing, mission computing and industrial control, which typically have multiple processes and multiple tasks or threads running in parallel within a process. These types of applications are often best addressed with an SMP operating system.

Recent industry trends have combined to make SMP very attractive. Processor performance gains that were once obtained by increasing the chip's clock frequency have become more difficult to achieve. Meanwhile, higher currents are required to drive signals faster, increasing the power consumed in ever smaller chip real estate. A related issue is that leakage current becomes problematic at higher frequencies, and the associated thermal solutions are problematic for embedded applications. The trend reflects the inexorable march of Moore's Law as silicon density continues to double every couple of years.
Unfortunately, smaller geometries no longer lend themselves to faster frequencies, but rather to more circuitry. As a result, the major processor manufacturers are now moving to multicore processors, which feature larger on-chip caches and enhanced instruction sets. This trend suits SMP and real-time system architectures.

Discussion

The key to making multi-OS embedded systems work on a multicore CPU is an RTOS that supports virtualization. Virtualization provides isolation between multiple operating environments and also enables legacy real-time systems to be integrated with new functionality with minimal impact on the legacy software. The latest Intel multicore processors include a feature called Intel Virtualization Technology, or Intel VT, that enables hardware-enforced isolation of the processor's I/O and memory. With virtualization, multiple control loops can run simultaneously.

Scope for Future Work: With the increasing demands on embedded devices, it is no wonder that more processing power is required. The move to multicore platforms is a natural evolution for embedded devices, considering the convergence of functionality being placed on what have traditionally been known as single-purpose devices. Take the phone, for example. It has evolved from a device whose main purpose was to place a simple point-to-point call into an all-in-one device that functions as a mobile phone, gaming unit, camera, media server, web browser and more. While the convergence of functionality on embedded devices may not in and of itself necessitate the move to multicore, additional considerations such as footprint, energy consumption, heat dissipation and other consequences of driving a single processor at higher and higher frequencies demand the transition. The addition of multiple cores also creates the need for communication and synchronization. Software developers want an easy-to-use mechanism that allows them to take advantage of multicore systems efficiently.
They also want that mechanism to be extensible in the future. With all these factors in mind, future work should explore the need for a communication system, the different types of hardware architecture and RTOS, what they should support, and how they should be implemented.

Conclusion: Multicore platforms for real-time embedded systems are here to stay and will only become more prevalent in the future. Multicore solutions today mostly contain two to four cores on a chip, but have the potential to grow to a very large number in the future. Some devices today have more than 300 DSPs per chip. This number will continue to grow and will be accompanied by combinations of different specialized cores. Whether one opts for an AMP or an SMP RTOS architecture, the cost of developing real-time embedded systems will fall and response times will decrease, making such systems more effective.

Acknowledgement

We thank all those who have contributed to this research article. We thank our friends and family members for their support and encouragement. We would also like to give a special mention to the efforts and encouragement of the faculty members of the Department of Computer Science and Engineering, CMRIT, and especially our faculty mentor Mr. Sudhakar K.N., without whose able guidance we could not have completed this article. Last but not least, we would like to thank Intel Corporation for conducting this contest, through which we have learnt so much.

References

Web References:
Wikipedia, Intel Software Network Resources, IEEE articles and journals, MIT OpenCourseWare
Michael Barr, "Embedded Systems Glossary", Netrino Technical Library.
Embedded.com - Under the Hood: Robot Guitar embeds autotuning
John Catsoulis, Designing Embedded Hardware, O'Reilly, May 2005, ISBN 0-596-00755-8.
Anoop MS, Security needs in embedded systems, Tata Elxsi, India, May 2008.
International Journal of Critical Computer-Based Systems

Other References:
Operating Systems, 3rd edition, by Gary Nutt
Operating System Concepts, 7th edition, by Galvin