Tech Recon: FPGAs vs. GPGPUs for Signal Processing Systems (November COTS Journal)

FPGA Board Advances Tighten Up System Capabilities

Integrated alongside faster converter technologies, FPGA solutions are feeding today’s huge signal processing appetites. Meanwhile, GPUs are becoming accepted as a solid choice for parallel processing in military systems.

Jeff Child, Editor-in-Chief

Gone are the days when even the term “Digital Signal Processor” occupied mainstream mindshare in military system design. That’s because the signal processing functionality on board today’s FPGA chips is much more interesting for the kind of system-oriented DSP functions used in defense. And the signal processing capabilities of FPGAs continue to climb, feeding the insatiable appetite such systems have for more digital signal processing muscle.

The requirements for such systems continue to call for ever more data collection capacity. The ability to process that data, for example in the form of radar-captured video or images, presents major system design challenges for developers of military platforms. Board-level FPGA computing solutions have grown to become key enablers for waveform-intensive applications like sonar, radar, SIGINT and SDR.

FPGAs a System Level Technology

Faster FPGA-based DSP capabilities, combined with an expanding array of IP cores and development tools for FPGAs, are enabling new system architectures. Today FPGAs are complete systems on a chip. The high-end lines of the major FPGA vendors even have general-purpose CPU cores on them. And the military is hungry to use FPGAs to fill processing roles. Devices like the Xilinx Virtex-6 and -7 and the Altera Stratix IV and V are examples that have redefined the FPGA as a complete processing engine in its own right.
While FPGAs remain a mainstay of military signal processing, the alternative of using GPUs as general-purpose processing engines (GPGPU) has been gaining momentum since 2007. GPGPU offers a simpler way to do complex multiprocessing by putting high-performance graphics processors to work on general-purpose processing tasks. This fits well into the theme of doing more while keeping complexity at bay. Graphics chip vendor NVIDIA developed a parallel computing architecture called CUDA. System developers can also use AMD GPUs, programming them with OpenCL instead of CUDA. Frameworks like CUDA and OpenCL let programmers use conventional computing languages to access the massively parallel processing capabilities of the GPU. Aside from serving applications in radar, signals intelligence and video surveillance and interpretation, GPUs have potential in other application areas, including target tracking, image stabilization and SAR (synthetic aperture radar) simulation.

FPGAs Tie Close with ADCs/DACs

Back on the FPGA side, one big advantage of FPGAs lies in their ample, programmable, high-speed I/O, which is why they are often found close to the analog-to-digital converters (ADCs) behind radar phased arrays. Board-level vendors continue to roll out integrated solutions that tie the latest and greatest ADCs and DACs to FPGA processing. In an example along those lines, Curtiss-Wright last month announced a collaboration with Tektronix Component Solutions to develop technology that doubles the analog-to-digital (ADC) and digital-to-analog (DAC) data bandwidth performance supported by its CHAMP-WB OpenVPX board family. The new receiver and transmitter products will deliver 25 Gsamples/s, and the combined board set will enable direct RF sampling of bandwidths up to 12 GHz using open architecture COTS modules. The board set’s ultra-high sampling rate will enable applications to scan huge swaths of bandwidth for signals of interest.
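As a rough sanity check on the sampling figures quoted above, the Nyquist criterion says a converter running at a given sample rate can capture at most half that rate as alias-free bandwidth in a single Nyquist zone. The short Python sketch below applies that rule to the announced numbers. The function name is purely illustrative and does not come from any vendor tool.

```python
def nyquist_bandwidth_ghz(sample_rate_gsps: float) -> float:
    """Upper bound on alias-free bandwidth (GHz) for real-valued sampling."""
    if sample_rate_gsps <= 0:
        raise ValueError("sample rate must be positive")
    return sample_rate_gsps / 2.0


quoted_rate_gsps = 25.0      # sample rate quoted in the announcement
quoted_bandwidth_ghz = 12.0  # direct RF sampling bandwidth quoted in the announcement

limit_ghz = nyquist_bandwidth_ghz(quoted_rate_gsps)
margin_ghz = limit_ghz - quoted_bandwidth_ghz

print(f"Nyquist limit: {limit_ghz} GHz, margin over quoted bandwidth: {margin_ghz} GHz")
```

By this rule the quoted 12 GHz sits just under the theoretical limit for a 25 Gsamples/s converter, leaving a small margin for analog filtering.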
The CHAMP-WB is the first entry in Curtiss-Wright Defense Solutions’ family of user-programmable Xilinx Virtex-7 FPGA-based computing products. It is targeted specifically at wide-band applications that require large FPGA processing resources and wide input/output bandwidth with minimal latency. When it is combined with the TADF-4300 module, featuring 12 GS/s 8-bit ADC technology and 12 GS/s 10-bit DAC technology from Tektronix, an extremely high-performance wide-band DRFM (digital RF memory) system can be created. The combined card set is called the CHAMP-WB-DRFM. The CHAMP-WB complements this processing capability with a data plane directly connected to the FPGA with support for Gen2 Serial RapidIO (SRIO). Aurora links running at 10.3 Gbps can also be supported between FPGA cards, and alternate fabrics can be supported with different FPGA cores.

Integrated FPGA Solution

Pushing the performance envelope in a similar way, Pentek in September rolled out new members of its Onyx family of high-speed data converter XMC FPGA modules: the 3-channel Onyx Model 71721 and the 4-channel Onyx Model 71761, both 200 MHz 16-bit A/D XMC modules based on the high-density Xilinx Virtex-7 FPGA. Each has a programmable digital down converter and a suite of built-in programmable cores. Each module has a front-end A/D converter stage that accepts three (Model 71721) or four (Model 71761) analog HF or IF inputs on front panel SSMC connectors, each transformer-coupled to a Texas Instruments ADS5485 200 MHz, 16-bit A/D converter (Figure 1). The 200 MHz sampling rate handles the needed bandwidth for a wide range of signal processing applications. The Model 71721 also includes a two-channel 16-bit 800 MHz D/A converter. The Model 71721 and Model 71761 come preconfigured with a suite of built-in functions for digital down conversion, data capture, synchronization, time tagging and formatting, making them ideal turn-key interfaces for radar, communications or general data acquisition applications.
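For readers less familiar with digital down conversion, the Python sketch below shows the generic technique in its simplest form: mix the sampled signal with a complex oscillator at the center frequency, low-pass filter, and decimate. It is not a description of the IP cores in these modules, which are not detailed here. The boxcar average stands in for a real FIR or CIC filter, and the 50 MHz center frequency and decimation factor of 16 are hypothetical values chosen for illustration. Only the 200 MHz sample rate comes from the text above.

```python
import cmath
import math


def ddc(samples, fs_hz, f_center_hz, decimation):
    """Mix real samples to complex baseband, then boxcar-average and decimate."""
    # Step 1: multiply by a complex local oscillator at -f_center_hz.
    mixed = [
        x * cmath.exp(-2j * math.pi * f_center_hz * n / fs_hz)
        for n, x in enumerate(samples)
    ]
    # Steps 2 and 3: average each block of `decimation` samples (a crude
    # low-pass filter) and emit one output sample per block.
    out = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        out.append(sum(block) / decimation)
    return out


fs_hz = 200e6       # sample rate quoted for the A/D converters
f_center_hz = 50e6  # hypothetical IF center frequency (assumption)
decimation = 16     # hypothetical decimation factor (assumption)

# Test input: a unit-amplitude cosine exactly at the center frequency.
tone = [math.cos(2 * math.pi * f_center_hz * n / fs_hz) for n in range(1024)]
baseband = ddc(tone, fs_hz, f_center_hz, decimation)

print(len(baseband), abs(baseband[0]))
```

A tone sitting exactly at the center frequency comes out as a constant at baseband with half the input amplitude, because the mixing image at twice the center frequency is removed by the averaging step.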
An A/D acquisition IP module is included for easy data capture and delivery to system memory. Building on the design of the Cobalt Virtex-6 family, architectural enhancements in the Onyx family include a doubling of the DDR3 memory in both size and speed, to 4 Gbytes and 1600 MHz, respectively. The PCIe interface has been upgraded to Gen 3, delivering peak transfer rates up to 8 Gbytes/s. The Virtex-7 is more power efficient than previous generations, making it easier to utilize larger FPGAs. Optional LVDS and gigabit serial connections to the Virtex-7 are available for connecting to custom high-performance I/O.

Altera Weighs In

Although Xilinx FPGAs tend to dominate in terms of the number of board products on the market, Altera FPGA technology offers interesting alternatives. According to Altera, initiatives from Naval Research Lab and other sources are seeking out new ways to develop flexible, multimission RADAR, or 'FlexDAR', capabilities. Figure 2 shows a block diagram of such an implementation using Altera Arria 10 FPGAs.

Board-level Arria 10 products are emerging too. Exemplifying that trend, Bittware’s latest family of boards is the A10 family, based on Altera’s Arria 10 FPGAs and SoCs. The A10 board family features flexible memory configurations, sophisticated clocking and timing options, QSFP28 cages that support 100 Gbps (including 100GigE) optical transceivers, FPGA Mezzanine Card (FMC) and support for the network-enabled Altera SDK for OpenCL. Built on 20nm process technology, Arria 10 FPGAs and SoCs are the industry’s first FPGAs to integrate hardened floating-point (IEEE 754-compliant) DSP blocks, which deliver floating-point performance of up to 1.5 TFLOPS. Arria 10 SoCs are also the industry’s only 20nm FPGAs to integrate a dual-core ARM Cortex-A9 MPCore hard processor system (HPS). Bittware’s A10 family consists of 11 board variants in PCIe, AMC and VPX form factors.
At a higher level of the signal processing food chain, there has long been a lack of any kind of standards-based approach to military signal processing that encompasses RF architectures. Along such lines, Mercury last month announced an initiative called OpenRFM to streamline the integration of RF and digital subsystems in advanced sensor processing applications, with the goal of creating more affordable, flexible and open standards-based solutions. According to Mercury, this initiative will directly address DoD procurement mandates including open systems architecture, interoperability, technology re-use and affordability. The goal for OpenRFM is to provide state-of-the-art design, test and control practices for interfacing RF and digital subsystems in an embedded architecture such as OpenVPX. This will in theory enable seamless integration of RF and microwave elements within electronic warfare (EW) and signals intelligence (SIGINT) systems.

HPC Levels of GPU Performance

Back on the GPGPU side, the parallel processing capabilities of GPUs have made them a building block in a number of High Performance Computing (HPC) solutions introduced in the past 12 months. A number of solutions are available in HPC categories where the goal is pure performance more than ruggedness. Performance levels of these systems are in the teraflop range, and they usually make use of GPGPU or FPGA technologies.

Along such lines, One Stop Systems offers a PCIe Gen3 expansion appliance that supports up to 16 high-end accelerator boards from a single server or multiple servers. The 3U High-Density Compute Accelerator (CA16000) provides up to 73.3 teraflops of computational power using NVIDIA Tesla K10 GPU accelerators. The CA16000 is a complete appliance, solving integration issues and making installation easy. The user simply connects the cable or cables to the host server(s) and has hundreds or thousands of additional compute cores readily available.
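The 73.3 teraflops figure for the CA16000 is consistent with simply multiplying the board count by a per-board peak. The Python sketch below does that arithmetic. The per-board figure of 4.58 single-precision teraflops is NVIDIA's published peak for the Tesla K10. It does not appear in the text above, so treat it as an assumption. Peak figures of this kind are theoretical ceilings, not sustained application throughput.

```python
def aggregate_peak_tflops(boards: int, tflops_per_board: float) -> float:
    """Theoretical aggregate peak: board count times per-board peak."""
    if boards < 0 or tflops_per_board < 0:
        raise ValueError("inputs must be non-negative")
    return boards * tflops_per_board


boards = 16             # maximum accelerator boards quoted for the CA16000
k10_peak_tflops = 4.58  # assumed single-precision peak per Tesla K10 board

total_tflops = aggregate_peak_tflops(boards, k10_peak_tflops)
print(f"Aggregate theoretical peak: {total_tflops:.1f} TFLOPS")
```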
Even though there have been surprisingly few new rugged board-level GPGPU products released this calendar year, GPGPU continues to become a more accepted technology in defense applications. And system-level solutions are becoming available as well. Along those lines, last month at AUSA, GE Intelligent Platforms announced that it had received a $2.6 million order from BAE Systems Platforms and Services for a quantity of its latest generation 3U VPX COTS Rugged Systems. The systems will be deployed as part of the US Army’s CETU (Common Embedded Training Unit), which sees in-vehicle training and simulation incorporated into the Bradley Fighting Vehicle (Figure 3).

Housed in a rugged, 5-slot enclosure, the system includes a GE 3U VPX single board computer featuring an Intel Core i7 processor and a rugged graphics board that takes advantage of the performance of an NVIDIA 384-core ‘Kepler’ GPU. The graphics board is a result of GE’s close working relationship with NVIDIA, which has allowed GE to incorporate truly rugged technology rather than technology intended for commercial or benign environments. GE’s expertise in developing sophisticated GPU-based solutions also allows the system to support non-standard and legacy display formats.

Figure 1. The 4-channel Onyx Model 71761, 200 MHz 16-bit A/D XMC module is based on the high-density Xilinx Virtex-7 FPGA.

Figure 2. Naval Research Lab and other sources are seeking out new ways to develop flexible, multimission RADAR, or 'FlexDAR', capabilities based on FPGAs.

Figure 3. The US Army’s CETU (Common Embedded Training Unit) implements in-vehicle training and simulation in the Bradley Fighting Vehicle. An M6 Linebacker air defense variant of the Bradley is shown here.