Development of an FPGA-Based Two Transform Pulse Compressor

CONCEPTS
[Figure: two-transform pulse compressor block diagram. Recoverable block labels: Ref and Ch1 inputs; Phase Multiply Correction (MC) by e^{jΦ(k)}; 1D FFT on each input; Conj * on the Ref path; Error Compensation (MC); 1D IFFT; Rng Select.]
Perform a Two-Transform Pulse Compression using a
Received reflected signal and a Reference signal
Input signals are first phase corrected using a
complex phase factor multiply
Range Compression is achieved by a cross-correlation of the
Received signal with the Reference signal, which is
implemented as multiplication of the Received signal by the
conjugate of the Reference in the frequency domain

Both input signals are first transformed to
the frequency domain using Fast Fourier
Transforms (FFTs)
Provisions for a frequency domain correction are
included as a complex multiply after the cross-correlation
Following cross-correlation and error correction, an
Inverse Fast Fourier Transform (IFFT) is used to
obtain the time domain compressed signal
An optional swath selection is used to select a
desired portion of the output compressed signal
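The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FPGA implementation: the names `fft`, `ifft`, and `pulse_compress` and all parameters are hypothetical, the phase factors are assumed to be applied before the forward transforms, the FFT is a plain floating-point radix-2 routine, and the test signal is a synthetic chirp delayed by 40 cells.

```python
import cmath

def fft(x, inverse=False):
    """Unscaled recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT with 1/N scaling."""
    return [v / len(x) for v in fft(x, inverse=True)]

def pulse_compress(received, reference, phase_rx=None, phase_ref=None,
                   error_corr=None, swath=None):
    """Two-transform pulse compression (illustrative, hypothetical API)."""
    # 1. Optional phase correction: complex phase factor multiply (assumed pre-FFT).
    if phase_rx is not None:
        received = [s * p for s, p in zip(received, phase_rx)]
    if phase_ref is not None:
        reference = [s * p for s, p in zip(reference, phase_ref)]
    # 2. Transform both inputs to the frequency domain.
    rx_f = fft(received)
    ref_f = fft(reference)
    # 3. Cross-correlation: multiply by the conjugate of the reference spectrum.
    prod = [r * f.conjugate() for r, f in zip(rx_f, ref_f)]
    # 4. Optional frequency-domain error correction multiply.
    if error_corr is not None:
        prod = [p * e for p, e in zip(prod, error_corr)]
    # 5. Inverse transform gives the time-domain compressed pulse.
    compressed = ifft(prod)
    # 6. Optional swath selection (a slice of range cells).
    return compressed[swath] if swath is not None else compressed

# Synthetic test case (assumed values): a 64-sample chirp delayed by 40 cells.
N, PULSE_LEN, DELAY = 256, 64, 40
chirp = [cmath.exp(1j * cmath.pi * 0.5 * i * i / PULSE_LEN) for i in range(PULSE_LEN)]
reference = chirp + [0j] * (N - PULSE_LEN)
received = [0j] * DELAY + chirp + [0j] * (N - PULSE_LEN - DELAY)

compressed = pulse_compress(received, reference)
peak_cell = max(range(N), key=lambda i: abs(compressed[i]))
swath_out = pulse_compress(received, reference, swath=slice(32, 48))
```

In this sketch the compressed peak lands at the delay cell, and the swath slice simply keeps the cells around it.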
[Figure: WILDSTAR II™ PCI board block diagram. Recoverable labels: FPGA logic; DDR2 SRAM banks, 4 MB each; DDR SDRAM, 64 MB each; programmable oscillators (Prog Osc); I/O #0; PREPROC; WSDP / FPGA; single-ended buses.]
A single Wildstar II board provides up to 16 million
FPGA gates and 4.8 GBytes/sec I/O on WSDP ports
[Figure: 2 Wildstar II boards process 4 simultaneous pulses. Recoverable labels: Router/Interface; WSDP0 and WSDP1 ports, each with 2 pulses, 64K each, 250 Msps; PREPROC with 1 pulse, 64K, 125 Msps; Pulse #1, Pulse #2; Ch1b, Refa, Refb; collected results: 4 processed pulses.]
FPGA IMPLEMENTATION via COREFIRE™

Corefire™
• Annapolis Micro Systems design tool
• Allows fast development of FPGA core designs using libraries of functional blocks

Interface board design illustrated
• Receives data pulses from upstream splitter
• Performs FFT pre-processing
[Board diagram labels: Master Clock Generator (PCLK, MCLK, ICLK); Flash; 66/133 MHz; PCI bus. Diagram copyright 2002 Annapolis Micro Systems, Inc.]
DEMONSTRATION HARDWARE
PERFORMANCE
[Plot title: Rngcomp MATLAB vs Wildstar (scaling: 1051.1). Axis label: FFT Input/Max Bit Width.]
FFT designs analyzed using metrics
• Bit widths and growth specified in models
Performance considered using synthetic bandlimited pulse
Sidelobe Ratio
• Energy in peak compared to energy in sidelobes
• Degradation in pulse compression will manifest itself with higher sidelobe levels
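As an illustration of the sidelobe ratio metric, the following sketch computes an integrated sidelobe ratio (ISLR) and an ISLR loss between two compressed pulses. It rests on assumptions that are not in the poster: the function names are hypothetical, the mainlobe extent is a caller-supplied half-width in cells, and the ratio is taken as sidelobe energy over mainlobe energy (so more negative is better).

```python
import math

def islr_db(compressed, mainlobe_halfwidth):
    """ISLR in dB: sidelobe energy divided by mainlobe energy (assumed convention)."""
    energy = [abs(v) ** 2 for v in compressed]
    peak = max(range(len(energy)), key=energy.__getitem__)
    lo = max(0, peak - mainlobe_halfwidth)
    hi = min(len(energy), peak + mainlobe_halfwidth + 1)
    main = sum(energy[lo:hi])
    side = sum(energy) - main
    return 10 * math.log10(side / main)

def islr_loss_db(fixed_point, floating_point, mainlobe_halfwidth):
    """Positive values mean the fixed-point output has higher sidelobes."""
    return (islr_db(fixed_point, mainlobe_halfwidth)
            - islr_db(floating_point, mainlobe_halfwidth))

# Toy magnitudes (made-up values): one peak cell with four sidelobe cells.
ideal = [0.1, 0.1, 1.0, 0.1, 0.1]
degraded = [0.2, 0.2, 1.0, 0.2, 0.2]
ideal_islr = islr_db(ideal, mainlobe_halfwidth=0)
loss = islr_loss_db(degraded, ideal, mainlobe_halfwidth=0)
```

Doubling the sidelobe amplitudes in the toy data raises the sidelobe energy fourfold, which shows up as roughly 6 dB of ISLR loss.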

Impulse Response
• Shape and quantization effects considered for compressed pulses
[Figure: Xcorr XmitSig IPR Comparison (64K, Mode2).]

Multiple pulse design currently running
• 4 simultaneous pulses processed on 2 Wildstar II boards

FFT Throughput Performance
• One point per clock
• Clock currently running at 81 MHz on speed grade -4 parts
• Anticipate speeds up to 133 MHz on speed grade -6 parts
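As a rough worked example, assuming the core streams exactly one complex point per clock and ignoring pipeline latency (both simplifications, not measured results), the time to pass one 64K-point transform through the core is the point count divided by the clock rate:

```latex
T = \frac{N}{f_{\mathrm{clk}}}, \qquad
T_{81\,\mathrm{MHz}} = \frac{65536}{81 \times 10^{6}\ \mathrm{s^{-1}}} \approx 809\ \mu\mathrm{s}, \qquad
T_{133\,\mathrm{MHz}} = \frac{65536}{133 \times 10^{6}\ \mathrm{s^{-1}}} \approx 493\ \mu\mathrm{s}
```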
[Plot legends and axis labels: MATLAB: blue; Wildstar: red. Bit-width curves: 20, 18, 16, 14, 12, 10 bits, and floating pt. Axes: Xcorr Signal2 Power 10log10 dB; Xcorr ISLR Loss, dB.]

FFT maximum bit widths of 18 bits appear to give less than 0.1 dB of
ISLR loss; this corresponds to space-efficient fixed-point FPGA FFT core
implementations using Xilinx parts with embedded 18x18 multipliers
[Diagram and plot labels: PREPROC with 1 pulse, 64K, 125 Msps; PE 1, VIRTEX™ II XC2V 6000, 8000; 64 bits; WILDSTAR Board #2. Axis label: Mpy Bit Width.]
FPGA Node configuration
• Processing node of 2 FPGA boards performs complete Range Compression on 2 range pulses using combination of V2000E Xilinx FPGAs on the WSDP I/O cards and V6000 FPGAs on the base cards
• Pass through concept: each iteration node strips off 1st 64k pulse samples to process, passes remaining pulse data onto succeeding iteration nodes
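The pass-through concept can be pictured with a short software analogy. This is only a sketch: `pass_through`, its parameters, and the use of Python lists in place of streaming hardware links are all assumptions made for illustration.

```python
def pass_through(stream, num_nodes, pulse_len=64 * 1024, process=lambda pulse: pulse):
    """Each node keeps the first pulse_len samples and forwards the rest downstream."""
    results = []
    remaining = stream
    for _ in range(num_nodes):
        pulse, remaining = remaining[:pulse_len], remaining[pulse_len:]
        results.append(process(pulse))  # stand-in for range compression
    return results, remaining

# Tiny demonstration: 4 nodes, 4-sample "pulses", summing as a placeholder process.
stream = list(range(16))
results, leftover = pass_through(stream, num_nodes=4, pulse_len=4, process=sum)
```

With four nodes and four pulses in the stream, every node receives exactly one pulse and nothing is left over for further nodes.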
[Figure: node processing chains on WILDSTAR Board #1. Recoverable labels: PREPROC with 1 pulse, 64K, 125 Msps; FFT; Xmpy; EC; IFFT; Select; PE1; PCI; differential buses. Plot legend: 16K, 32K, 64K.]
[Figure: XmitSig ISLR Loss vs FFT Bit Width (64K, Mode2).]
[Diagram labels: PE0, FFT; 4 pulses, 64K each, 500 Msps; PE 0, VIRTEX™ II XC2V 6000, 8000; Flash.]
[Figure: Fixed Point FFT SQNRs vs Bit Width & FFT Length (Mode 1). Axis label: SQNR dB.]
Bit growth from 8/10 bit inputs appears to give reasonable SQNRs
• Bit growth through FFT added
Range Compression
Processor (~6 boards)
• 12 to 16 Million System Gates
• Virtex™ E FPGA is larger, faster, and uses less power than Virtex™ FPGA
• 150 MHz Board, FPGA and Memory Speed
• 4.8 GBytes/Sec Memory Bandwidth
• I/O Bandwidth
  • 66 MHz PCI - Up to Theoretical Maximum of 512 MBytes/Sec with 64 Bits
  • WILDSTAR™ PE to I/O Board - 3 GBytes/Sec
  • LAD Bus - 256 MBytes/Sec at 66 MHz/32 Bits
• Supports Internet Reconfiguration
• Program from Flash on Power Up
• Commercial Off the Shelf Product (COTS)
WILDSTAR
Fixed Point Cores offer ~5:1 size advantage over Floating Point cores
• 3 dB difference for each doubling of FFT length (1/2 bit)
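The "(1/2 bit)" equivalence follows from the usual rule of thumb that each added bit of word length changes quantization SNR by about 6.02 dB. The short derivation below assumes that rule applies to these FFT models:

```latex
\Delta\mathrm{SQNR}_{\text{per bit}} = 20\log_{10} 2 \approx 6.02\ \mathrm{dB}
\quad\Longrightarrow\quad
\frac{3\ \mathrm{dB}}{6.02\ \mathrm{dB/bit}} \approx 0.5\ \mathrm{bit}
```

Under that assumption, each doubling of FFT length costs about half a bit of effective precision.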
Three input channel
architecture illustrated
Benefits
WILDSTAR II™ ARCHITECTURE
Floating Point vs Fixed Point Sizing
• Fairly consistent 5:1 ratio
• Observed with FFT, complex mpy and add, divide, sqrt cores
Fixed Point complex FFT core
• Approximately 5:1 size reduction over Floating Point core
• Multiply/accumulators not driving factor in size
• Can fit ~4 x 8-bit FFT cores in a single V6000 FPGA
  • 4:1 hardware improvement over Floating Point
  • 64K vector length; 8 bit input; 18 bit max bit width
• 4 FFT points/clock + latency
  • 64K complex FFT @ 150 MHz: 109 µs
  • 32K complex FFT @ 150 MHz: 55 µs
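The quoted transform times are consistent with dividing the point count by the aggregate point rate. The helper below is a hypothetical sketch of that arithmetic; it ignores the pipeline latency mentioned in the bullet.

```python
def fft_stream_time_us(n_points, clock_hz, points_per_clock):
    """Time in microseconds to stream n_points through the core, latency ignored."""
    return n_points / (points_per_clock * clock_hz) * 1e6

t_64k = fft_stream_time_us(64 * 1024, 150e6, 4)  # about 109.2 us
t_32k = fft_stream_time_us(32 * 1024, 150e6, 4)  # about 54.6 us
```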
Signal to Quantization Noise Ratio (SQNR)
• Analyzed using MATLAB FFT models using specified bit widths and truncations
• Signal to Quantization Noise Ratio
• Uses uniform distributed noise input to FFT
• $\mathrm{SQNR} = \sum |X_{\mathrm{float}}|^2 \,/\, \sum |X_{\mathrm{float}} - X_{\mathrm{fixed}}|^2$
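The SQNR ratio above can be tried out with a small model. This sketch is not the MATLAB model used for the poster: it assumes a recursive radix-2 FFT whose outputs are truncated to a chosen number of fractional bits after every butterfly stage, with no overflow or scaling modeled, and all names are hypothetical. The input is uniformly distributed noise, as in the analysis described.

```python
import cmath
import math
import random

def quantize(z, frac_bits):
    """Truncate real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return complex(math.floor(z.real * scale) / scale,
                   math.floor(z.imag * scale) / scale)

def fft(x, frac_bits=None):
    """Recursive radix-2 FFT; truncates after each stage when frac_bits is set."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], frac_bits)
    odd = fft(x[1::2], frac_bits)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    if frac_bits is not None:
        out = [quantize(v, frac_bits) for v in out]
    return out

def sqnr_db(x_float, x_fixed):
    """10*log10( sum|Xfloat|^2 / sum|Xfloat - Xfixed|^2 )."""
    signal = sum(abs(a) ** 2 for a in x_float)
    noise = sum(abs(a - b) ** 2 for a, b in zip(x_float, x_fixed))
    return 10 * math.log10(signal / noise)

random.seed(0)
n = 256  # small length so the pure-Python model runs quickly
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
reference = fft(x)
sqnr_by_bits = {b: sqnr_db(reference, fft(x, frac_bits=b)) for b in (8, 12, 16)}
```

In this toy model the SQNR rises by roughly 6 dB for each added fractional bit.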
Router / Time Alignment / Interface (Custom) Board required
• Signal Processor requires WSDP data input interfaces due to high data rates
• Time align Ref and Ch1, Ch2, Ch3... channels
• Buffer and rate reduce each channel into lower rate channels for WSDP capabilities (800 MB/sec)
• Provide WSDP compatible output interfaces
[Architecture diagram labels: Ch2 and Ch3 inputs, each with an A/D; PE0.]

FPGA growth path includes increased gate density
and increased features for smaller designs with
improved precision and capabilities

• 2 Virtex™ II FPGA Processing Elements
  • XC2V6000 or XC2V8000
• 0 to 48 MBytes of Synchronous ZBT SRAM in 6 Memory Banks
• 0 to 256 MBytes of Synchronous DRAM in 1 Memory Bank
• PCI Bus - Rev 2.2 Compliant
  • 5V Board - 32/64 Bit, 33 MHz, 5V or 3.3V Slot
  • 3.3V Board - 32/64 Bit, 33/66 MHz, 3.3V Slot
• Automatic 32/64 Bit PCI Bus Recognition
• Host Software: NT 4.0 and 2000, Linux, Solaris
• API and Device Drivers
• VHDL Model of the System for Easy Development
• Accepts COTS High Speed WILDSTAR™ I/O Cards
  • WILDSTAR™ Data Port (WSDP™), FPDP, Myrinet™, 65 MHz A/D, and 1 GHz A/D
Perform Pulse Compressions on input data in real time
• Up to 64K (16K, 32K, 64K) sample input pulses
• Up to 500 MHz data sample rate
• Data samples are complex, up to 8 bits per sample
PULSE COMPRESSOR ARCHITECTURE
DESIGN ANALYSIS

Currently
Available
Features
SYSTEM DESIGN GOALS

[Diagram labels: Xilinx 2VP50 FPGA, 4 PPCs, High Speed I/O; Swath Selection.]

FPGA Processors
• Offer high throughput, much higher density than DSP processors
• Reconfigurable processing
• COTS solution
• Low cost and much faster alternative to ASICs
Implemented in Annapolis Wildstar COTS Boards
• Powerful core design tools and libraries available for fast development and prototyping
• Includes high speed WSDP data interconnects
FPGAs offer growth path to improved processors
• 50 million gate parts
• Platform FPGAs including PPC processors, I/O and RAM are currently available

[Figure: FPGA growth path. Recoverable labels: V6000 parts (Xilinx V6000 FPGA, 6M gates): ECP Demo; V8000 parts (Xilinx V8000 FPGA, 8M gates): currently available; 2007 technology: “Platform” FPGA, 50M gates, with PowerPC processor, RAM buffers, and A/D; part integration (4X); improved PPC speed (2X); Router/Time Align/Interface; REF; WSDP / FPGA.]
Two-Transform Pulse Compressor Algorithm
SYSTEM ARCHITECTURE
SYSTEM COMPONENTS
GOALS
Create a high-throughput Two Transform Pulse Compressor for use in
wideband real-time Radar Signal Processor applications using Commercial Off-The-Shelf (COTS) Field Programmable Gate Array (FPGA) processor boards.
[Plot axis label: Cell (x 10^4).]
3 Wildstar II Virtex V6000 Board Nodes
• IBM PC servers to host boards
• (6) 6 Million gate parts
Status: Operational
• 3 Wildstar II assemblies complete and operating
• 1 Data driver and collector board
• 2 Processing boards
Four node parallel processor implemented and operating;
investigation continues for faster operating clock and larger parts for
increased bit widths and data precision
ISLR loss alone can be a deceiving metric; factors such as IPR shape,
which can show severe truncations despite apparently good ISLR,
also need to be considered
Integrated Sensors, Inc. (315)798-1377 www.sensors.com