Development of an FPGA-Based Two Transform Pulse Compressor

Integrated Sensors, Inc. (315)798-1377 www.sensors.com

GOALS

Create a high-throughput Two Transform Pulse Compressor for use in wideband real-time Radar Signal Processor applications using Commercial Off-The-Shelf (COTS) Field Programmable Gate Array (FPGA) processor boards.

SYSTEM DESIGN GOALS

• Perform Pulse Compressions on input data in real time
    • Up to 64K (16K, 32K, 64K) sample input pulses
    • Up to 500 MHz data sample rate
    • Data samples are complex, up to 8 bits per sample
• FPGA Processors
    • Offer high throughput, much higher density than DSP processors
    • Reconfigurable processing
    • COTS solution
    • Low cost and much faster alternative to ASICs
• Implemented in Annapolis Wildstar COTS Boards
    • Powerful core design tools and libraries available for fast development and prototyping
    • Includes high speed WSDP data interconnects
• FPGAs offer growth path to improved processors
    • 50 million gate parts
    • Platform FPGAs including PPC processors, I/O and RAM are currently available

CONCEPTS

[Figure: Two-Transform Pulse Compressor Algorithm. Block diagram: Ref and Ch1 inputs each pass through a Phase Multiply Correction (MC) by $e^{j\Phi(k)}$ and a 1D FFT; the Ref spectrum is conjugated (Conj) and multiplied (*) with the Ch1 spectrum; the product passes through Error Compensation (MC), a 1D IFFT, and Rng Select.]

• Perform a Two-Transform Pulse Compression using a Received reflected signal and a Reference signal
• Input signals are first phase corrected using a complex phase factor multiply
• Range Compression is achieved by a cross-correlation of the Received signal with the Reference signal, which is implemented as multiplication of the Received signal by the conjugate of the Reference in the frequency domain
• Both input signals are first transformed to the frequency domain using Fast Fourier Transforms (FFTs)
• Provisions for a frequency domain correction are included as a complex multiply after the cross-correlation
• Following cross-correlation and error correction, an Inverse Fast Fourier Transform (IFFT) is used to obtain the time domain compressed signal
• An optional swath selection is used to select a desired portion of the output compressed signal

SYSTEM ARCHITECTURE

PULSE COMPRESSOR ARCHITECTURE

[Figure: Pulse compressor architecture. Ch1, Ch2, Ch3 and REF A/D inputs feed a Router/Time Align/Interface (Custom) board, followed by WSDP / FPGA stages of the Range Compression Processor (~6 boards) and Swath Selection.]

• Three input channel architecture illustrated
• Router / Time Alignment / Interface (Custom) Board required
    • Signal Processor requires WSDP data input interfaces due to high data rates
    • Time align Ref and Ch1, Ch2, Ch3... channels
    • Buffer and rate reduce each channel into lower rate channels for WSDP capabilities (800 MB/sec)
    • Provide WSDP compatible output interfaces
• FPGA growth path includes increased gate density and increased features for smaller designs with improved precision and capabilities

[Figure: FPGA growth path. V6000 Parts: ECP Demo (Xilinx V6000 FPGA, 6M gates). V8000 Parts: Currently Available (Xilinx V8000 FPGA, 8M gates; Xilinx 2VP50 FPGA with 4 PPCs and High Speed I/O). 2007 Technology: "Platform" FPGA with Power PC, 50M gates; Part integration (4X), Improved PPC speed (2X). Block labels: FPGA Logic, Xilinx FPGA, RAM Buffers, RAM, PowerPC Processor.]

SYSTEM COMPONENTS

WILDSTAR II™ ARCHITECTURE

[Figure: WILDSTAR II™ PCI board block diagram. PE 0 and PE 1 (VIRTEX™ II XC2V 6000, 8000), each with DDR2 SRAM 4 MB banks and a DDR SDRAM 64 MB bank; I/O #0 and I/O #1 connectors; programmable oscillators, Master Clock Generator (PCLK, MCLK, ICLK), Flash; PCI bus, 66/133 MHz, 64 Bits. Copyright 2002 Annapolis Micro Systems, Inc.]

Wildstar II Board provides up to 16 Million FPGA gates and 4.8 GBytes/sec I/O on WSDP ports.

Benefits
• 12 to 16 Million System Gates
• Virtex™ E FPGA is larger, faster, and uses less power than Virtex™ FPGA
• 150 MHz Board, FPGA and Memory Speed
• 4.8 GBytes/Sec Memory Band Width
• I/O Band Width
    • 66 MHz PCI - Up to Theoretical Maximum of 512 MBytes/Sec with 64 Bits
    • WILDSTAR™ PE to I/O Board - 3 GBytes/Sec
    • LAD Bus - 256 MBytes/Sec at 66 MHz/32 Bits
• Supports Internet Reconfiguration
• Program from Flash on Power Up
• Commercial Off the Shelf Product (COTS)

Currently Available Features
• 2 Virtex™ II FPGA Processing Elements, XC2V6000 or XC2V8000
• 0 to 48 MBytes of Synchronous ZBT SRAM in 6 Memory Banks
• 0 to 256 MBytes of Synchronous DRAM in 1 Memory Bank
• PCI Bus - Rev 2.2 Compliant
    • 5V Board - 32/64 Bit, 33 MHz, 5V or 3.3V Slot
    • 3.3V Board - 32/64 Bit, 33/66 MHz, 3.3V Slot
    • Automatic 32/64 Bit PCI Bus Recognition
• Host Software: NT 4.0 and 2000, Linux, Solaris
    • API and Device Drivers
    • VHDL Model of the System for Easy Development
• Accepts COTS High speed WILDSTAR™ I/O Cards
    • WILDSTAR™ Data Port (WSDP™), FPDP, Myrinet™, 65 MHz A/D, and 1 GHz A/D

DEMONSTRATION HARDWARE

• 3 Wildstar II Virtex V6000 Board Nodes
    • IBM PC servers to host boards
    • (6) 6 Million gate parts
• Status: Operational
    • 3 Wildstar II assemblies complete and operating
        • 1 Data driver and collector board
        • 2 Processing boards
    • Four node parallel processor implemented and operating; investigation continues for faster operating clock and larger parts for increased bit widths and data precision

2 Wildstar II Boards Process 4 Simultaneous Pulses

[Figure: Four-pulse data flow. A Router/Interface delivers 4 pulses, 64K each, at 500 Msps. WILDSTAR Board #1 and WILDSTAR Board #2 each receive 2 pulses, 64K each, at 250 Msps over WSDP0 and WSDP1. On each board, PE0 and PE1 each process 1 pulse of 64K at 125 Msps through PREPROC, FFT, Xmpy, EC, IFFT and Select stages, with Ch1a, Ch1b, Refa and Refb inputs. Results are collected over PCI: 4 processed pulses.]

• FPGA Node configuration
    • Processing node of 2 FPGA boards performs complete Range Compression on 2 range pulses using a combination of V2000E Xilinx FPGAs on the WSDP I/O cards and V6000 FPGAs on the base cards
    • Pass through concept: each iteration node strips off the 1st 64K pulse samples to process and passes the remaining pulse data on to succeeding iteration nodes

FPGA IMPLEMENTATION via COREFIRE™

• Corefire™
    • Annapolis Micro Systems design tool
    • Allows fast development of FPGA core designs using libraries of functional blocks
• Interface board design illustrated
    • Receives data pulses from upstream splitter
    • Performs FFT pre-processing

DESIGN ANALYSIS

• Floating Point vs Fixed Point Sizing
    • Fixed Point Cores offer ~5:1 size advantage over Floating Point cores
    • Fairly consistent 5:1 ratio
    • Observed with FFT, complex mpy and add, divide, sqrt cores
• Fixed Point complex FFT core
    • Approximately 5:1 size reduction over Floating Point core
    • Multiply/accumulators not driving factor in size
    • Can fit ~4 x 8-bit FFT cores in a single V6000 FPGA
        • 4:1 hardware improvement over Floating Point
        • 64K vector length; 8 bit input; 18 bit max bit width
    • 4 FFT points/clock + latency
        • 64K complex FFT @ 150 MHz: 109 µs
        • 32K complex FFT @ 150 MHz: 55 µs
• Signal to Quantization Noise Ratio (SQNR)
    • Analyzed using MATLAB FFT models using specified bit widths and truncations
    • Uses uniform distributed noise input to FFT
    • $\mathrm{SQNR} = \sum |X_{\mathrm{float}}|^2 \,/\, \sum |X_{\mathrm{float}} - X_{\mathrm{fixed}}|^2$, plotted in dB

[Figure: Fixed Point FFT SQNRs vs Bit Width & FFT Length (Mode 1). Horizontal axis: FFT Input/Max Bit Width; vertical axis: SQNR, dB; curves for 16K, 32K and 64K FFT lengths.]

• Bit growth from 8/10 bit inputs appears to give reasonable SQNRs
• Bit growth through FFT added
• 3 dB difference for each doubling of FFT length (1/2 bit)

PERFORMANCE

• FFT designs analyzed using metrics
    • Bit widths and growth specified in models
    • Performance considered using synthetic bandlimited pulse
• Sidelobe Ratio
    • Energy in peak compared to energy in sidelobes
    • Degradation in pulse compression will manifest itself with higher sidelobe levels
• Impulse Response
    • Shape and quantization effects considered for compressed pulses

[Figure: XmitSig ISLR Loss vs FFT Bit Width (64K, Mode2). Horizontal axis: Mpy Bit Width; vertical axis: Xcorr ISLR Loss, dB; curves for FFT bit widths of 10, 12, 14, 16, 18 and 20 bits and for floating point.]

FFT maximum bit widths of 18 bits appear to give less than 0.1 dB of ISLR loss. This corresponds to space-efficient fixed point FPGA FFT core implementations using Xilinx parts with embedded 18x18 multipliers.

[Figure: Xcorr XmitSig IPR Comparison (64K, Mode2); Rngcomp MATLAB vs Wildstar (scaling: 1051.1). Horizontal axis: Cell; vertical axis: Xcorr Signal2 Power, 10log10, dB. MATLAB: blue; Wildstar: red.]

ISLR loss alone can be a deceiving metric; factors such as IPR shape must also be considered, since the IPR can show severe truncations with apparently good ISLR.

• Multiple pulse design currently running
    • 4 simultaneous pulses processed on 2 Wildstar II boards
• FFT Throughput Performance
    • One point per clock
    • Clock currently running at 81 MHz on speed grade -4 parts
    • Anticipate speeds up to 133 MHz on speed grade -6 parts
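
The two-transform compression chain listed under CONCEPTS (phase correction, FFT of both inputs, multiply by the conjugate Reference spectrum, error compensation, IFFT, swath selection) can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the fixed-point Corefire design described on the poster. The function and parameter names (`pulse_compress`, `rx_phase`, `ref_phase`, `error_comp`, `swath`), the small radix-2 FFT helper, and the 256-sample test chirp are all assumptions made for the example.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with the usual 1/N scaling."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def pulse_compress(rx, ref, rx_phase=None, ref_phase=None,
                   error_comp=None, swath=None):
    """Two-transform pulse compression (hypothetical floating-point model).

    rx, ref      : complex sample lists of equal power-of-two length
    rx_phase,
    ref_phase    : optional per-sample complex phase factors (time domain)
    error_comp   : optional per-bin complex correction (frequency domain)
    swath        : optional (start, stop) slice of the compressed output
    """
    # Phase correction: complex multiply on each input.
    if rx_phase is not None:
        rx = [s * p for s, p in zip(rx, rx_phase)]
    if ref_phase is not None:
        ref = [s * p for s, p in zip(ref, ref_phase)]
    # Transform both inputs to the frequency domain.
    rx_spec = fft(rx)
    ref_spec = fft(ref)
    # Cross-correlation: multiply by the conjugate of the Reference spectrum.
    prod = [a * b.conjugate() for a, b in zip(rx_spec, ref_spec)]
    # Frequency-domain error compensation after the cross-correlation.
    if error_comp is not None:
        prod = [p * e for p, e in zip(prod, error_comp)]
    # Back to the time domain: the compressed pulse.
    compressed = ifft(prod)
    # Optional swath (range) selection.
    if swath is not None:
        start, stop = swath
        compressed = compressed[start:stop]
    return compressed


# Assumed test case: a 64-sample unit-amplitude chirp in a 256-sample window,
# received with a circular delay of 40 samples.
N, L, delay = 256, 64, 40
ref = [cmath.exp(1j * math.pi * i * i / (2 * L)) for i in range(L)] + [0j] * (N - L)
rx = ref[-delay:] + ref[:-delay]

compressed = pulse_compress(rx, ref)
peak = max(range(N), key=lambda i: abs(compressed[i]))
print("peak range cell:", peak, "magnitude:", round(abs(compressed[peak]), 3))
```

In this model the compressed peak falls in the range cell equal to the delay, with magnitude equal to the chirp energy. A fixed-point study like the one on the poster would replace the floating-point FFT with a bit-width-limited model and compare the two outputs.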