FINAL COTS11 GPU FPGAs Signal Proc Jeff

{FINAL COTS11 GPU FPGAs Signal Proc Jeff Lead.doc 3 page lead by Jeff for Tech Recon:
FPGAs vs. GPGPUs for Signal Processing Systems for November COTS Journal.}
Editor’s Notes:
1. Figure 2 should be redrawn replacing the stylized Altera logos to just say “Altera 10
FPGAs and SoCs” in plain text.
FPGA Board Advances Tighten Up System Capabilities
Integrated alongside faster converter technologies, FPGA solutions are feeding today’s huge
signal processing appetites. Meanwhile, GPUs are becoming accepted as a solid choice for
parallel processing in military systems.
Jeff Child, Editor-in-Chief
Gone are the days when even the term “Digital Signal Processor” occupied mainstream
mindshare in military system design. That’s because the signal processing functionality on board
today’s FPGA chips is much more interesting for the kind of system-oriented DSP functions used
in defense. The signal processing capabilities of FPGAs continue to climb, feeding the insatiable
appetite such systems have for more digital signal processing muscle. The requirements for such
systems continue to call for ever more data collection capacity. The ability to process that data,
for example in the form of radar-captured video or images, presents major system design
challenges for developers of military platforms. Board-level FPGA computing solutions have
grown to become key enablers for waveform-intensive applications like sonar, radar, SIGINT
and SDR.
FPGAs a System Level Technology
Faster FPGA-based DSP capabilities combined with an expanding array of IP cores and
development tools for FPGAs are enabling new system architectures. Today FPGAs are
complete systems on a chip. The high-end lines of the major FPGA vendors even have
general-purpose CPU cores on them. And the military is hungry to use FPGAs to fill processing roles.
Devices like the Xilinx Virtex-6 and -7 and the Altera Stratix IV and V are examples that have
redefined an FPGA as a complete processing engine in its own right.
While FPGAs remain a mainstay of military signal processing, the alternative of using GPUs as
general-purpose processing engines (GPGPU) has been gaining momentum since 2007. GPGPU offers a
simpler way to do complex multiprocessing by putting high-performance graphics processors to
work on general-purpose processing tasks. This fits well into the theme of doing more while
keeping complexity at bay. Graphics chip vendor NVIDIA developed a parallel computing
architecture called CUDA. System developers can also use AMD GPUs, programming them with OpenCL
instead of CUDA. Frameworks like CUDA and OpenCL let programmers use conventional programming
languages to access the massively parallel processing capabilities of the GPU. Aside from
serving applications in radar, signals intelligence and video surveillance and interpretation,
GPUs have potential in other application areas, including target tracking, image stabilization and
SAR (synthetic aperture radar) simulation.
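To make the GPGPU programming model concrete, here is a minimal sketch of the "one small function per data element" style that CUDA and OpenCL kernels share. It is a CPU-side emulation in plain Python, not GPU code: the names `scale_and_offset_kernel` and `launch` are hypothetical, and the loop runs sequentially where a GPU would run the iterations concurrently.

```python
def scale_and_offset_kernel(i, gain, offset, src, dst):
    """Body executed once per index, like one GPU thread handling one element."""
    dst[i] = gain * src[i] + offset


def launch(kernel, n, *args):
    """Hypothetical stand-in for a GPU kernel launch over indices 0..n-1.

    A GPU would execute these iterations in parallel; this loop is sequential.
    """
    for i in range(n):
        kernel(i, *args)


samples = [0.0, 1.0, 2.0, 3.0]
result = [0.0] * len(samples)
launch(scale_and_offset_kernel, len(samples), 2.0, 1.0, samples, result)
print(result)  # [1.0, 3.0, 5.0, 7.0]
```

The point of the sketch is that the kernel contains no loop of its own. The programmer describes the work for a single element, and the runtime decides how many copies run at once.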
FPGAs Tie Close with ADCs/DACs
Back on the FPGA side, one big advantage of FPGAs lies in their ample, programmable, high-speed
I/O, which is why they are often found close to the analog-to-digital converters (ADCs)
behind radar phased arrays. Board-level vendors continue to roll out integrated solutions that tie
the latest and greatest ADCs and DACs to FPGA processing. In an example along those lines,
Curtiss-Wright last month announced a collaboration with Tektronix Component Solutions to
develop technology that doubles the ADC and digital-to-analog converter (DAC) data
bandwidth performance supported by its CHAMP-WB OpenVPX board family.
The new receiver and transmitter products will deliver 25 Gsamples/s, and the combined board-set
will enable direct RF sampling of bandwidths up to 12 GHz using open architecture COTS
modules. The board-set’s ultra-high sampling rate will enable applications to scan huge
swaths of bandwidth for signals of interest. The CHAMP-WB is the first entry in Curtiss-Wright
Defense Solutions’ family of user-programmable Xilinx Virtex-7 FPGA-based computing
products and is targeted specifically at wide-band applications that require large amounts of
FPGA processing and wide input/output with minimal latency. Combining it with
the TADF-4300 module, which features 12 GS/s 8-bit ADC technology and 12 GS/s 10-bit DAC
technology from Tektronix, creates an extremely high performance wide-band digital RF memory
(DRFM) system. The combined card-set is called the CHAMP-WB-DRFM. The CHAMP-WB
complements this processing capability with a data plane directly connected to the FPGA with
support for Gen2 Serial RapidIO (SRIO). Aurora links running at 10.3 Gbps can also be supported
between FPGA cards. Alternate fabrics can also be supported with different FPGA cores.
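The sampling figures quoted here can be sanity-checked with basic arithmetic. The sketch below assumes ideal real sampling in the first Nyquist zone and ignores any framing or encoding overhead; the helper names are hypothetical and nothing here comes from Curtiss-Wright or Tektronix documentation.

```python
def nyquist_bandwidth_hz(sample_rate_hz: float) -> float:
    """Widest alias-free bandwidth for real sampling: half the sample rate."""
    return sample_rate_hz / 2.0


def raw_data_rate_bits_per_s(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Unpacked converter data rate, ignoring framing or encoding overhead."""
    return sample_rate_hz * bits_per_sample


# Figures quoted in the article.
new_rate = 25e9        # 25 Gsamples/s receiver and transmitter products
tadf_adc_rate = 12e9   # TADF-4300 ADC: 12 GS/s, 8-bit
tadf_dac_rate = 12e9   # TADF-4300 DAC: 12 GS/s, 10-bit

print(nyquist_bandwidth_hz(new_rate) / 1e9)               # 12.5 (GHz)
print(raw_data_rate_bits_per_s(tadf_adc_rate, 8) / 1e9)   # 96.0 (Gbit/s)
print(raw_data_rate_bits_per_s(tadf_dac_rate, 10) / 1e9)  # 120.0 (Gbit/s)
```

Under these assumptions, 25 Gsamples/s gives an ideal ceiling of 12.5 GHz, which is consistent with the quoted "up to 12 GHz" once some margin is left for filtering. The raw converter rates also suggest why such modules sit directly next to a large FPGA with wide, fast I/O.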
Integrated FPGA Solution
Pushing the performance envelope in a similar way, Pentek in September rolled out new
members of its Onyx family of high-speed data converter XMC FPGA modules: the 3-channel
Onyx Model 71721 and the 4-channel Onyx Model 71761, 200 MHz 16-bit A/D XMC modules
based on the high density Xilinx Virtex-7 FPGA. Each has a programmable digital down
converter and a suite of built-in programmable cores. Each module has a front end A/D converter
stage that accepts three (Model 71721) or four (Model 71761) analog HF or IF inputs on front
panel SSMC connectors, each transformer-coupled to a Texas Instruments ADS5485 200
MHz, 16-bit A/D converter (Figure 1). The 200 MHz sampling rate handles the needed
bandwidth for a wide range of signal processing applications. The Model 71721 also includes a
two-channel 16-bit 800 MHz D/A converter.
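For readers unfamiliar with what a digital down converter does, the following is a minimal software sketch of the three classic steps: mix to baseband, low-pass filter, decimate. It is not Pentek's IP core. The IF frequency and decimation factor are hypothetical values, and a simple block average stands in for the multi-stage filters that hardware DDCs typically use.

```python
import cmath
import math


def digital_down_convert(samples, sample_rate_hz, tune_hz, decimation):
    """Mix real samples to baseband, low-pass with a block average, decimate.

    Returns one complex baseband sample per `decimation` input samples.
    """
    # Step 1: multiply by a complex oscillator at -tune_hz to shift the
    # band of interest down to 0 Hz.
    mixed = [
        x * cmath.exp(-2j * math.pi * tune_hz * n / sample_rate_hz)
        for n, x in enumerate(samples)
    ]
    # Steps 2 and 3: average each block of `decimation` samples (a crude
    # low-pass filter) and keep one output per block (decimation).
    out = []
    for start in range(0, len(mixed) - decimation + 1, decimation):
        block = mixed[start:start + decimation]
        out.append(sum(block) / decimation)
    return out


fs = 200e6    # 200 MHz sampling rate, as quoted for the ADS5485
f_if = 50e6   # hypothetical IF carrier frequency
decim = 8     # hypothetical decimation factor

tone = [math.cos(2 * math.pi * f_if * n / fs) for n in range(64)]
baseband = digital_down_convert(tone, fs, f_if, decim)

print(len(baseband))     # 8 output samples, at fs / decim = 25 MHz
print(abs(baseband[0]))  # about 0.5: half the tone's amplitude lands at DC
```

Tuning the oscillator to the carrier moves the signal of interest to 0 Hz, and decimation then cuts the output rate, here from 200 MHz to 25 MHz. Doing this in the FPGA means the host only has to move the narrow band it cares about.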
The Model 71721 and Model 71761 come preconfigured with a suite of built-in functions for
digital down conversion, data capture, synchronization, time tagging, and formatting, making
them ideal turn-key interfaces for radar, communications, or general data acquisition
applications. An A/D acquisition IP module is included for easy data capture and delivery to
system memory. Building on the design in the Cobalt Virtex-6 family, architectural
enhancements in the Onyx family include a doubling of the DDR3 memory in both size and
speed to 4 Gbytes and 1600 MHz, respectively. The PCIe interface has been upgraded to Gen 3,
delivering peak transfer rates up to 8 Gbytes/s. The Virtex-7 is more power efficient than
previous generations, making it easier to use larger FPGAs. Optional LVDS and gigabit serial
connections to the Virtex-7 are available for connecting to custom high performance I/O.
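A quick throughput budget shows how the quoted PCIe figure relates to the raw converter output. The sketch assumes a x8 Gen 3 link (the lane count is not stated in the text) and counts only line-encoding overhead, not packet or protocol overhead; the helper names are hypothetical.

```python
def pcie_gen3_bytes_per_s(lanes: int) -> float:
    """Peak one-direction PCIe Gen 3 rate after 128b/130b line encoding."""
    transfers_per_s = 8e9   # 8 GT/s per lane (PCIe Gen 3)
    encoding = 128 / 130    # 128b/130b line encoding efficiency
    return lanes * transfers_per_s * encoding / 8  # bits to bytes


def adc_bytes_per_s(channels: int, sample_rate_hz: float, bits: int) -> float:
    """Raw, undecimated A/D output rate across all channels."""
    return channels * sample_rate_hz * bits / 8


link = pcie_gen3_bytes_per_s(lanes=8)  # x8 link width is an assumption
raw = adc_bytes_per_s(4, 200e6, 16)    # Model 71761: four 200 MHz, 16-bit channels

print(round(link / 1e9, 2))  # 7.88 (Gbytes/s)
print(raw / 1e9)             # 1.6 (Gbytes/s)
print(raw < link)            # True
```

Under these assumptions, a x8 Gen 3 link lands just under the quoted 8 Gbytes/s peak, and four undecimated channels need about a fifth of it, before any reduction from digital down conversion.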
Altera Weighs In
Although Xilinx FPGAs tend to dominate in terms of number of board products on the market,
Altera FPGA technology offers interesting alternatives. According to Altera, initiatives from
the Naval Research Lab and other sources are seeking out new ways to develop flexible,
multi-mission radar, or 'FlexDAR', capabilities. Figure 2 shows a block diagram of an
implementation using Altera Arria 10 FPGAs.
Board-level Arria 10 products are emerging too. Exemplifying that trend, Bittware’s latest family
of boards is the A10 family based on Altera’s Arria 10 FPGAs and SoCs. The A10 board family
features flexible memory configurations, sophisticated clocking and timing options, QSFP28
cages that support 100 Gbps (including 100GigE) optical transceivers, FPGA Mezzanine Card
(FMC), and support for the network-enabled Altera SDK for OpenCL. Built on 20 nm process
technology, Arria 10 FPGAs and SoCs are the industry’s first FPGAs to integrate hardened
floating-point (IEEE 754-compliant) DSP blocks, which deliver floating-point
performance of up to 1.5 TFLOPS. Arria 10 SoCs are also the industry’s only 20 nm FPGAs to
integrate a dual-core ARM Cortex-A9 MPCore hard processor system (HPS). Bittware’s A10 family
consists of 11 board variants spanning PCIe, AMC and VPX form factors.
At a higher level of the signal processing food chain, there’s long been a lack of any kind of
standards-based approach to military signal processing that encompasses RF architectures. Along
such lines, Mercury last month announced an initiative called OpenRFM to streamline the
integration of RF and digital subsystems in advanced sensor processing applications, with the
goal of creating more affordable, flexible and open standards-based solutions. According to
Mercury, this initiative will directly address DoD procurement mandates including open systems
architecture, interoperability, technology re-use and affordability. The goal for OpenRFM is to
provide state-of-the-art design, test, and control practices for interfacing RF and digital
subsystems in an embedded architecture, such as OpenVPX. This will in theory enable
seamless integration of RF and microwave elements within electronic warfare (EW) and signals
intelligence (SIGINT) systems.
HPC Levels of GPU Performance
Returning to GPGPUs, their parallel processing capabilities have made them a building
block in a number of High Performance Computing (HPC) solutions introduced in the past 12 months.
A number of solutions are available under the HPC category, where the goal is pure
performance more than ruggedness. Performance levels of these systems are in the teraflop range,
and they usually make use of GPGPU or FPGA technologies. Along such lines, One Stop Systems offers
a PCIe Gen3 expansion appliance that lets one or more servers drive up to 16 high-end
accelerator boards. The 3U High-Density Compute Accelerator (CA16000) provides up to
73.3 teraflops of computational power using NVIDIA Tesla K10 GPU accelerators. The
CA16000 is a complete appliance, solving integration issues and making installation easy. The
user simply connects the cable or cables to the host server(s) and has hundreds or thousands of
additional compute cores readily available.
Even though surprisingly few new rugged board-level GPGPU products have been released
this calendar year, GPGPU continues to become a more accepted technology in defense applications.
And system-level solutions are becoming available as well. Along those lines, last month at
AUSA, GE Intelligent Platforms announced that it had received a $2.6 million order from BAE
Systems Platforms and Services for a quantity of its latest generation 3U VPX COTS Rugged
Systems. The systems will be deployed as part of the US Army’s CETU (Common Embedded
Training Unit), which incorporates in-vehicle training and simulation into the Bradley
Fighting Vehicle (Figure 3).
Housed in a rugged, 5-slot enclosure, the system includes a GE 3U VPX single board computer
featuring an Intel Core i7 processor and a rugged graphics board that takes advantage of the
performance of an NVIDIA 384-core ‘Kepler’ GPU. The graphics board is a result of GE’s close
working relationship with NVIDIA, which has allowed GE to incorporate truly rugged
technology rather than technology designed for commercial, benign environments. GE’s expertise in
developing sophisticated GPU-based solutions also allows the system to support non-standard
and legacy display formats.
Figure 1.
The 4-channel Onyx Model 71761, 200 MHz 16-bit A/D XMC module is based on the high
density Xilinx Virtex-7 FPGA.
Figure 2.
The Naval Research Lab and other sources are seeking out new ways to develop flexible,
multi-mission radar, or 'FlexDAR', capabilities based on FPGAs.
Figure 3.
US Army’s CETU (Common Embedded Training Unit) implements in-vehicle training and
simulation in the Bradley Fighting Vehicle. An M6 Linebacker air defense variant of the Bradley
is shown here.