Improving Digital Signal Processing Performance and Lowering Costs With the IntervalZero RTX Real-Time Platform Used on Multi-Core-Based Intel® Processors White Paper Executive Summary Intel Corporation Digital signal processors (DSPs), with their specialized architectures optimized for heavy math computation, IntervalZero Inc. have been a mainstay of the real-time digital signal processing marketplace as a cost-effective solution for many complex designs. Although DSPs remain a viable choice for many systems, developers are finding that proprietary DSPs are no longer required to perform real-time digital signal processing. Instead, developers are migrating to the IntervalZero RTX Real-Time Platform, which comprises Intel® multi-core processors, the Microsoft Windows* operating system and RTX real-time software. This platform is designed to outperform DSPs, reduce costs and streamline development cycles. This paper compares DSP-based systems to systems based on the RTX Real-Time Platform and provides an overview of the economic and operational benefits that can serve as a catalyst for the migration away from DSPs. Among other findings, this paper presents benchmark test results showing that in terms of floating-point operations per second, the performance of a multi-core DSP does not match the raw performance and speed of a device based on the 2nd generation Intel® Core™ i7 processor. IntervalZero’s symmetric multiprocessing-enabled RTX software provides hard real-time functionality for Windows operating systems, including Windows 7 with its touch-and-gesture capabilities. White Paper: Improving Digital Signal Processing Performance and Lowering Costs Contents System Diagrams. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 DSP-Based Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 RTX Real-Time Platform. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Hardware Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Processor Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Board Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Communication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Memory Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Software Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 General-Purpose Operating System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Real-Time Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Development Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Code Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 DSP Libraries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2 White Paper: Improving Digital Signal Processing Performance and Lowering Costs System Diagrams DSP-Based Platform DSPs rely on specialized architectures designed for heavy math computation, but not for general processing. Due to the computational focus of DSPs, general-purpose processors (GPPs) are typically used for human machine interface (HMI) and general-purpose functions, TRADITIONAL DSP SYSTEM DESIGN Dual-Core System DSP Subsystem Windows OS Human Machine Interface (HMI) UNUSED Real-Time Processing Real-Time Processing Real-Time Processing Processor 1 DSP 1 DSP 2 DSP 3 such as input and output (see Figure 1). In addition, it should be noted that DSPs require separate memory devices for each device, and separate buses are needed for all inter-processor communications (i.e., PCIe or serial interface). Processor 0 Figure 1. Traditional DSP-based system. RTX REAL-TIME PLATFORM RTX Real-Time Platform The RTX Real-Time Platform takes a different approach, dividing up the multiple cores for dedicated real-time processing and generalpurpose processing. Figure 2 illustrates an Intel® processor-based Application Platform World-Class User Interface Real-Time Application multi-core device with cores dedicated to Windows and cores dediRTX Platform cated to RTX for real-time signal processing functions. Unlike the DSP design, memory is shared using the same high-speed bus, allowing for data sharing and inter-processor communications. The following chart compares the two architectures (see Table 1). Subsequent sections of this paper will discuss the merits of the architectures in detail. Windows Real-Time Kernel Extension − RTX Windows Kernel Core 0 Intel® Atom™ Processor Core X Core X+1 Core X+2 x86 Single or Multi-core ... Core X+31 x86 64-bit Multi-core Figure 2. RTX real-time platform. 3 White Paper: Improving Digital Signal Processing Performance and Lowering Costs Table 1. Comparing features and benefits of DSP and RTX platform architectures. Category Hardware Design Drivers/BSP System Communication Memory RTOS Development Environment Code Base DSP Libraries Benchmarks DSP Platform • Custom board design • Long development cycle • Difficult-to-modify hardware • Requires In Circuit Emulators (ICE) • Custom drivers – device-specific RTX Real-time Platform • COTS • Short development cycle • Easy-to-change hardware • Local Debug – no ICE needed • Standard drivers • Custom protocol • Custom buses • Chip down – hard to change • Limited selection • Limited memory reach (1 GB) • No MMU • Proprietary • Standard memory • Standard APIs • PnP/COTS Modules – easy to change • Large selection • Large memory reach (>4 GB) • MMU – Virtual memory • winAPI • Proprietary tools • Manufacture-specific • Proprietary compilers • Assembly and C • Architecture-specific • Processing libraries from OEM • Standard x86 tools • Microsoft Visual Studio* • Intel and Microsoft compilers • C, C++, C# • Universal/Portable • Intel-supplied IPP library • 1.25 GHz – Max core speed • 20 GFLOPS/Device max • 3.0 GHz – Max core speed • >50 GFLOPS/Device max RTX Benefit • Reduced cost • Reduced risk • Faster time to market • Cost • Productivity • Ease-of-use • Future proof • Reduced system size, cost and complexity • Familiar API • Common tool chain throughout the product • Use other higher level languages for non real-time • No requirement to use OEM proprietary libraries • Faster • Higher speed Hardware Design Board Design Intel’s multi-core advances are changing the way that systems develop- In DSP-based systems, engineers spend a great deal of time selecting ers meet real-time signal processing needs. The latest Intel processors the right processor so that the board can be designed and built in time not only have multiple cores with extremely high clock speeds, but also for the software. The need to make hardware commitments early in perform complex math efficiently. They can deliver several times the the design process often creates challenges. Any miscalculation can performance of standard DSPs, as can be seen from the benchmark lead to a hardware redesign, which will have a negative impact on the numbers in Table 1. delivery schedule. In addition, Intel multi-core devices are delivered with commercial- Using the IntervalZero RTX Real-time Platform imposes no strict dead- off-the-shelf (COTS) hardware, which is not typically an option for lines for choosing and designing the hardware. Because RTX is Intel® DSP users. COTS hardware, when compared to custom hardware, architecture-based, standard COTS hardware is available for develop- translates into reduced costs, shorter time to market and less risk. ment and production. Since no custom boards or drivers are required for development, the final hardware selection can be performed later in the Processor Selection design cycle — which lowers risk and improves the chances of releasing When replacing DSPs with Intel multi-core devices, attention should be products on time. focused on the number of cores needed, as well as which Intel processor family best fits the system requirements. As a rough guideline, developers should aim for a 1:1 relationship between DSPs and Intel cores. Because Intel cores are capable of extremely high clock speeds, the design will likely require fewer cores. However, the 1:1 ratio is a conservative start that allows room for optimization. Intel offers a variety of processor families in a range of cost and power options. For example, Intel® Atom™ processor-based devices are focused on embedded systems that require low power and minimized cost. 4 Also, because RTX is based on Intel architecture and Windows, engineers can use PCs as both development and target machines. No separate target system with specialized in-circuit emulators is needed to develop and debug code. As the chart in Table 1 shows, the use of COTS hardware in RTX reduces engineering costs and lowers risks. White Paper: Improving Digital Signal Processing Performance and Lowering Costs Communication support to satisfy most complex system requirements. As a result, GPPs DSP-based systems rely on custom hardware and software interfaces running general-purpose operating systems are typically found next to when communicating between devices. Serial lines and sometimes PCI DSPs to handle all of the I/O demands and the complex user interfaces. buses are used for inter-processor communications. These custom interfaces can be troublesome to design on custom hardware, and they can be difficult to upgrade as requirements change. The strong demand for powerful GUIs is making Windows the operating system of choice due to its abundant tools support and standardization. The use of standardized tools such as Visual Studio, combined with the The RTX platform uses shared memory and formalized APIs to commu- large support structure of the Microsoft Developer Network (MSDN), nicate between processes. Because everything runs on a single device, help make transitioning from DSPs to RTX a straightforward process. sharing data and messages between threads running on different cores is easier. This communication architecture helps make programming easy and scalable. Real-Time Operating System DSPs typically use proprietary real-time operating systems (RTOSs) from device manufacturers such as DSP/BIOS* from Texas Instruments Inc. or Memory Selection VDK from Analog Devices Inc. These DSP-based RTOSs are capable, but Whereas DSPs require separate memory devices, the RTX platform they have significant limitations because of the lack of communications consolidates memory into a single memory device. In addition, an Intel among the various DSP cores. Each DSP runs its own application, and processor-based platform can use standard off-the-shelf memory little inter-processor communication is used. modules (e.g., 8 GB modules), which can be easily changed as system requirements dictate. DSPs require on-board memory chips because of strict timing and routing requirements. Switching out memory chips is difficult and requires board changes to add or remove memory. These differences make the DSP-based approach a more costly, more complex and less flexible alternative than the RTX platform. Software Design The IntervalZero RTX Real-Time Platform implements a single symmetric multiprocessing scheduler across all of the cores in the real-time subsystem (see Figure 3). Instead of a separate RTOS and application running on each real-time core, the RTX platform uses a single real-time scheduler to schedule threads across a number of cores. This flexible SMP scheduling model helps make synchronization and communications among the various cores and threads simple and easy to implement. Software advances have also had a significant impact on DSP-based Load balancing can also be easily implemented from the APIs that the systems. The demand for standardized tooling in lieu of proprietary RTX subsystem provides. The RTX APIs enable threads to be moved toolsets is a recent change affecting DSP developers. Programming between processors during runtime. DSPs requires proprietary tools and lower-level languages such as C and Assembly. Many engineering teams find that proprietary DSP tools and low-level languages require specialized expertise, which is often difficult to find and expensive to acquire. By contrast, developers are discovering that the use of standardized development tools and Quad-Core System Windows OS Core RTX Core Core Core higher-level languages, such as Microsoft Visual Studio* and C++, can increase productivity and greatly reduce engineering costs. Figure 3. RTX SMP-enabled scheduler. General-Purpose Operating System The demand for complex graphical user interfaces (GUIs) has also affected DSP-based system development. Increasingly, customers are seeking elaborate touch-and-gesture-based user interfaces on top of a real-time subsystem. DSPs have traditionally not had the GUI or I/O 5 White Paper: Improving Digital Signal Processing Performance and Lowering Costs Development Tools The RTX platform’s strong support for C++ is designed to provide DSP manufacturers provide proprietary tools such as Texas Instruments’ distinct benefits. Because RTX makes use of powerful Intel-based Code Composer Studio* and Analog Devices’ VisualDSP++*. These tools devices, programmers can use C++ to enable increased productivity. are useful for developing DSP-based applications, but market demand By keeping the code base in higher-level languages such as C and C++, and cost pressures are providing impetus for more standardized tool- programmers can take advantage of portability and code reuse to ing. In addition, the low-level tooling and proprietary coding practices reduce their costs and risks. inherent in DSPs do not promote code reuse and portability. This drawback typically translates into longer development times and higher risks. DSP Libraries Many developers have discovered that working with proprietary DSP Texas Instruments and Analog Devices both offer optimized DSP tools is not only challenging, but also costly to maintain. libraries to help developers with performance and time to market. As an alternative to these DSP libraries, Intel created the Intel® Integrated The RTX platform uses Visual Studio, a standardized tool for program- Performance Primitives (IPP). The Intel IPP library is essentially a collec- ming in an Intel architecture environment. Finding engineering talent for tion of optimized functions (i.e., DSP, imaging, video and so forth) for Intel Microsoft tools is generally easier and more cost-effective than is the multi-core architectures. The Intel IPP library is designed to help increase case with DSP tools. Many engineering teams prefer to use standardized productivity and performance by providing optimized routines for the tools such as Visual Studio and MSDN for their robustness and support, most common DSP-based functions. and they find that taking advantage of these tools enables increased productivity and reduced costs. Benchmarks Full development tool suites that integrate into Visual Studio are also Performance differences between processors can be measured in available from Intel, including proven C/C++ compilers; performance and parallel libraries; and tools for error checking, code robustness and performance profiling. Examples include the Intel® Parallel Studio XE and Intel® C++ Studio XE Suites. Code Base a number of ways. Because the focus of this paper is on DSPs, the following comparison measures the number of floating-point operations per second (GFLOPS). Table 2. Performance differential: DSP vs. Intel® multi-core processor Benchmarks Migrating code base to the RTX platform is a straightforward process. 1.25 GHz - Max core speed 3.0 GHz - Max core speed 20 GFLOPS per core 32 GFLOPS per core When considering a port, the developer needs to be aware that an The benchmark test results shown in Table 2 compare the performance RTX application has two parts — general-purpose, or non-real-time of a Texas Instruments C66xx multi-core DSP and a device based on the processing, and DSP real-time processing. The non-real-time, general- 2nd generation Intel® Core™ i7 processor. purpose code will directly port over using a standard Visual Studio project, and it will run on Windows. The DSP real-time code will also be built using Visual Studio, but it will run on the RTX-controlled cores (the real-time subsystem). As the chart indicates, the TI C66xx at 1.25 GHz outputs 20 singleprecision GFLOPS per core. The 2nd generation Intel Core i7 processor outputs 32 double-precision GFLOPS per core. Although the GFLOPS performance is strong with the TI DSP, these benchmark tests indi- DSPs are usually programmed in both C and Assembly. The Assembly cate that it does not match the raw performance and speed of code is not portable and will need to be written in C or C++, but the C Intel processors. code can be ported over using Visual Studio. The RTX platform enables real-time users to code efficiently in C++ rather than having to start with C and Assembly programming. Although DSPs do have some C++ support, object-oriented languages typically lose too much efficiency to be useful when compiled for DSP architectures. 6 White Paper: Improving Digital Signal Processing Performance and Lowering Costs Conclusion The IntervalZero RTX Real-time Platform marries the power of the Windows 7 interface with the real-time signal processing capabilities of Intel multi-core architectures. The use of standardized COTS hardware and standardized tooling is designed to greatly simplify engineering efforts and reduce cost. The RTX Platform is also designed to increase innovation, portability and scalability. The Intel® Embedded Design Center provides qualified developers with Web-based access to technical resources. Access Intel Confidential design materials, step-by step guidance, application reference solutions, training, and Intel’s tool loaner program, and connect with an e-help desk and the embedded community. Design Fast. Design Smart. Get started today. edc.intel.com. The need for digital signal processing can be expected to continue, but recent hardware and software advances have made dedicated DSPs an option instead of a requirement. For complex systems requiring powerful GUIs and real-time signal processing, other options are available that offer higher performance, more scalability and less cost. 7 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web site at www.intel.com. Copyright © 2011 Intel Corporation. All rights reserved. Intel, the Intel logo, Core, and Atom are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Printed in USA 0911/MGA/OCG/XX/PDF Please Recycle 326162-001US