Tracking Radar Digital Matched-Filter ASIC Design

Zhenyu Liu Zhimei Zhou
Electronic Engineering Department
Beijing Institute of Technology
Beijing 100081
People’s Republic of China
Abstract: The matched-filter is widely used in real-time signal processing, especially in radar signal processing.
This paper presents a novel structure for a digital matched-filter used in tracking radar systems. The design applies
block-floating-point arithmetic to improve precision. The whole digital matched-filter is implemented in
a single FPGA chip. This ASIC has two work modes: 512-point pulse compression and 256-point pulse
compression. It completes three channels of 512-point complex-signal pulse compression in 102 us.
Key-Words: matched-filter, parallel processing, low power dissipation, FFT, FPGA
1 Introduction
Modern radar systems generally use FM signals to obtain a large
time-bandwidth product, so a matched-filter must be used in the
system to perform pulse compression [1]. Compared with the analog
matched-filter, the digital one is programmable, more accurate and
more compact. The digital matched-filter, a crucial component of
modern radar systems, is receiving more and more attention.
In a tracking radar system, three parameters must be estimated to
track a target: the relative target position, the elevation angle
and the azimuth angle. To realize the pulse compression of these
signals, a bank of matched-filters is needed. In this paper, a novel
digital matched-filter that processes these three signals
simultaneously is designed. The chip has two work modes: the first
is a 512-point matched-filter and the second is a 256-point
matched-filter. The design is implemented on a Virtex-II XC2V500.
2 Matched-filter basic concept
The basic concept of the matched-filter evolved from the effort to
obtain a better theoretical understanding of the factors leading to
optimum performance of a radar system [1]. Matched filtering
constitutes the optimum linear processing of radar signals. This
form of signal processing transforms the raw radar data, available
at the receiver input and assumed to be corrupted by white Gaussian
noise, into a form that is suitable for making optimum detection
decisions (i.e., target or no target), for estimating target
parameters (e.g., range, velocity) with minimum rms errors, or for
obtaining maximum resolution among a group of targets.
In a tracking radar system, a single matched-filter is no longer
sufficient. To track a target, its relative range, elevation and
azimuth are needed. Three channels of complex signal are fed into
the matched-filters and processed simultaneously within one PRT
interval. The system block diagram is shown in figure 1. The
required parameters are derived from $y_1(n)$, $y_2(n)$ and $y_3(n)$.
[Figure 1: three parallel channels; each input $x_i(n)$, i = 1, 2, 3, passes through IFFT(FFT($x_i(n)$)·S*(w)) to give the output $y_i(n)$.]
Fig.1 tracking radar matched-filter bank
Two characteristics of this procedure should be noted. First, the
three matched-filters have the same frequency coefficients, denoted
$S^*(\omega)$. Second, the complex signals $x_1(n)$, $x_2(n)$ and
$x_3(n)$ are sampled and processed simultaneously, and they undergo
the identical processing algorithm. These characteristics are
exploited in the design to reduce hardware complexity.
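The filter bank of figure 1 can be sketched in software as follows. This is only an illustrative model, not the hardware described in this paper: the function names, the 8-point reference pulse and the channel gains are assumptions chosen for the example, and the rotation coefficient convention exp(-j2πk/N) is the usual one rather than something stated here.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 transform (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse transform with the 1/N scaling applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def matched_filter_bank(channels, s_conj):
    """Apply y_i = IFFT(FFT(x_i) * S*) to every channel with one shared S*."""
    outputs = []
    for x in channels:
        spectrum = fft(x)
        outputs.append(ifft([a * b for a, b in zip(spectrum, s_conj)]))
    return outputs

# Hypothetical 8-point reference pulse and three scaled copies of it.
ref = [1, 1j, -1, -1j, 0, 0, 0, 0]
s_conj = [v.conjugate() for v in fft(ref)]          # shared coefficients S*
channels = [[g * v for v in ref] for g in (1.0, 0.5, 0.25)]
y1, y2, y3 = matched_filter_bank(channels, s_conj)
```

All three channels read the same `s_conj` list, which mirrors the point that one set of frequency coefficients serves the whole bank.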
3 Matched-filter ASIC Design
The original design [2][4] consists of two identical processors for
the direct and inverse fast Fourier transforms, and a complex
multiplier for fast convolution in the frequency domain. In this
design, a single dual-butterfly unit performs the FFT, the complex
multiplication and the IFFT. The hardware is reduced to one third
of the original, while the throughput is doubled.
3.1 Improved Radix-2 Butterfly Unit
In this design, a “butterfly operation processor” performs the FFT,
the complex multiplication and the IFFT. The butterfly unit is
therefore the core of the whole design, and we first introduce the
architecture of the improved radix-2 butterfly unit.
The radix-2 butterfly algorithm in the Decimation In Time (DIT)
form of the FFT is expressed as follows:
$$A_m(i) = A_{m-1}(i) + A_{m-1}(j)\,W_N^k$$
$$A_m(j) = A_{m-1}(i) - A_{m-1}(j)\,W_N^k \qquad (1)$$
where:
$A_{m-1}(i)$, $A_{m-1}(j)$: (m-1) stage data
$A_m(i)$, $A_m(j)$: (m) stage data
$W_N^k$: Rotation coefficient
[Figure 2: inputs $A_{m-1}(i)$, $A_{m-1}(j)$ and the serial coefficient parts $W_{NR}^k$, $W_{NI}^k$; delay elements, an adder and an ADD/SUB unit; control signals Phi1, Phi2 and Phi3.]
Fig2. Block Diagram of Radix-2 Butterfly
In every butterfly operation there is only one complex
multiplication. This property can be used to reduce the number of
multipliers.
The block diagram of the improved butterfly unit is illustrated in
figure 2 and the timing diagram of its control signals is shown in
figure 3. In this architecture $A_{m-1}(j)$ is multiplied by
$W_N^k$, while $A_{m-1}(i)$ simply passes through. This is realized
by controlling the phases of “Phi1”, “Phi2” and “Phi3”, as
illustrated in figure 3. In this way, one complex multiplication is
completed in two clock periods using just two multipliers: in the
first clock period, $A_{m-1}(j)$ is multiplied by $W_{NR}^k$; in the
second clock period, $A_{m-1}(j)$ is multiplied by $W_{NI}^k$.
(Note: $W_{NR}^k$ represents the real part of $W_N^k$ and $W_{NI}^k$
represents the imaginary part of $W_N^k$.)
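The two-period multiplication can be modelled in a few lines. The sketch below is a behavioural illustration only: the function names are hypothetical, the assignment of products to the two multipliers is an assumption, and the coefficient convention exp(-j2πk/N) is the usual one rather than something stated in the paper.

```python
import cmath

def two_cycle_multiply(a, w):
    """Complex product a*w using two real multipliers over two clock periods."""
    # Clock period 1: both multipliers receive the real part of W.
    p_rr = a.real * w.real
    p_ir = a.imag * w.real
    # Clock period 2: both multipliers receive the imaginary part of W.
    p_ri = a.real * w.imag
    p_ii = a.imag * w.imag
    # The add/subtract stage combines the delayed and the current products.
    return complex(p_rr - p_ii, p_ir + p_ri)

def butterfly(a_i, a_j, w):
    """Radix-2 butterfly of equation (1): A(i) passes by, A(j) is rotated."""
    t = two_cycle_multiply(a_j, w)
    return a_i + t, a_i - t

w = cmath.exp(-2j * cmath.pi * 3 / 16)      # example coefficient W_16^3
top, bottom = butterfly(1 + 2j, 3 - 1j, w)
```

Four real products are still needed per complex multiplication; spreading them over two clock periods is what allows two multipliers to suffice.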
When $W_N^k$ is replaced with $W_N^{-k}$, the butterfly unit
performs the inverse fast Fourier transform. The butterfly unit can
also perform complex multiplication: if $A_{m-1}(i)$ is set to zero
and the “ADD/SUB” unit is controlled to perform only addition, the
output is the product of $A_{m-1}(j)$ and the coefficient. One
complex multiplication is completed in two clock periods. This idea
is applied in this design: the butterfly units perform the FFT, the
complex multiplication and the IFFT.
Fig3. Timing Diagram of Control Signal
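The reuse of one butterfly datapath for the three operations can be illustrated as follows. This is a minimal behavioural sketch; the function names and the `subtract_enabled` flag are hypothetical stand-ins for the ADD/SUB control described in the text, not signals taken from the design.

```python
def butterfly(a_i, a_j, w, subtract_enabled=True):
    """Radix-2 butterfly; with subtract_enabled=False only the sum is formed."""
    t = a_j * w
    upper = a_i + t
    lower = a_i - t if subtract_enabled else upper
    return upper, lower

def complex_multiply(a, b):
    """Multiplication mode: A(i) forced to zero, ADD/SUB fixed to addition."""
    product, _ = butterfly(0j, a, b, subtract_enabled=False)
    return product

def inverse_butterfly(a_i, a_j, w):
    """Inverse-transform mode: same datapath, coefficient W^-k instead of W^k."""
    # For a unit-magnitude coefficient, W^-k is the complex conjugate of W^k.
    return butterfly(a_i, a_j, w.conjugate())
```

For example, `complex_multiply(1 + 2j, 3 - 1j)` returns `(5+5j)`, the same value as the direct product.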
3.2 Dual-Butterfly Operation Unit
The bottleneck of the whole system is the butterfly, because it
carries out most of the arithmetic operations. This bottleneck is
relieved by using two improved butterfly units to construct a
dual-butterfly operation unit, whose block diagram is shown in
figure 4.
There is a 180-degree phase difference between CLK0 and CLK1; only
in this way can the raw data be dispatched to the correct butterfly
processor. CLK0 and CLK1 have the same frequency, which is only half
that of the address unit clock. This is the advantage of this kind
of architecture: all of the arithmetic components in the system,
such as adders, subtracters and multipliers, are inside the
butterfly units, so this architecture halves their working clock
frequency. Reducing the clock frequency also helps to reduce power
dissipation [7]. Moreover, the power supply voltage of the
dual-butterfly unit could be reduced when the design is fabricated
as an ASIC; according to reference [7], this contributes greatly to
reducing the system power consumption.
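For reference, the usual first-order model of dynamic power in CMOS logic, as found in standard texts on digital integrated circuits, is sketched below. The symbols $\alpha$ (switching activity) and $C_L$ (switched capacitance) are not defined in this paper and are introduced here only for illustration.

```latex
P_{\mathrm{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Under this model, halving $f$ for the arithmetic units halves their dynamic power at a fixed supply voltage, and any reduction of $V_{DD}$ that the lower clock rate permits enters quadratically.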
[Figure 4: blocks Butterfly_0 and Butterfly_1, each with ports In_Real, In_Img, Out_Real, Out_Img and CLK; clocks Clk0 and Clk1; inputs $A_{m-1}R$ and $A_{m-1}I$; output multiplexers (MUX) producing $A_m R$ and $A_m I$.]
Fig 4. Block Diagram of Dual Butterfly Unit
The timing diagram in figure 5 shows how the source data flow into
these two butterfly units. “CLK_SYS” is the system clock; the
“address generator” and the dual-port SRAM blocks all work under
this clock. The subscripts ‘R’ and ‘I’ denote the real and the
imaginary part of the data respectively. The subscripts ‘0’ and ‘1’
denote the butterfly unit to which the data should be sent,
“butterfly_0” or “butterfly_1”.
Then the signal spectrum data are multiplied by $S^*(\omega)$ and
the result is still stored in reverse sequence. After the IFFT, the
data are stored in the dual-port RAM in normal sequence again. This
is more convenient for the processing that follows the
matched-filter: input and output data are both in normal sequence,
which is another merit of this design.
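The data ordering described here (normal sequence in, reverse sequence between the transforms, normal sequence out) can be sketched in software. The model below makes several assumptions that are not confirmed by the paper: “reverse sequence” is read as bit-reversed order, the forward transform uses the common in-place flow graph with natural-order input and bit-reversed output, and all function names are hypothetical. The actual flow graph and address sequences of this design are those of references [5][6].

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def forward_natural_to_reversed(x):
    """In-place radix-2 FFT: natural-order input, bit-reversed-order output."""
    a, n = list(x), len(x)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    return a

def inverse_reversed_to_natural(a_rev):
    """In-place radix-2 IFFT: bit-reversed-order input, natural-order output."""
    a, n = list(a_rev), len(a_rev)
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(2j * cmath.pi * k / (2 * span))
                u, v = a[start + k], a[start + k + span] * w
                a[start + k] = u + v
                a[start + k + span] = u - v
        span *= 2
    return [v / n for v in a]

def pulse_compress(x, s_conj_reversed):
    """FFT, multiply by S* (also held in reversed order), IFFT; no reordering."""
    spectrum = forward_natural_to_reversed(x)
    product = [p * q for p, q in zip(spectrum, s_conj_reversed)]
    return inverse_reversed_to_natural(product)

ref = [1, 1j, -1, -1j, 0, 0, 0, 0]              # hypothetical 8-point pulse
s_conj_reversed = [v.conjugate() for v in forward_natural_to_reversed(ref)]
y = pulse_compress(ref, s_conj_reversed)        # peak of 4 at y[0]
```

Because the multiplication is element by element, it works directly on the reversed-order spectrum as long as the coefficients are stored in the same order, so no separate reordering pass is needed anywhere in the chain.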
The address unit structures are depicted in figures 7-10. The
methods for deriving the correct data and rotation coefficient
addresses are discussed in detail in references [5][6]. Compared
with the original design, the address unit fetches data and
coefficients for two butterfly units, so some modifications have
been made to let the two butterfly units work simultaneously.
From figure 2, it can be seen that the real part and the imaginary
part of the coefficients are fed into the processor serially, so
all complex coefficients are stored serially in one 4k × 12-bit ROM.
Figure 6 depicts the map of the coefficient ROM.
[Figure 5: waveforms of CLK_SYS, CLK0 and CLK1; $A_{m-1}(j)_{R0}$ and $A_{m-1}(j)_{I0}$ are sent to Butterfly_0 on CLK0, and $A_{m-1}(j)_{R1}$ and $A_{m-1}(j)_{I1}$ are sent to Butterfly_1 on CLK1.]
Fig 5. Timing Diagram of the input data
Although the radix-4 butterfly [5][6] performs fewer
multiplications, in this application the radix-2 dual-butterfly
unit has the following advantages. First, it can process any
$2^n$-point data, while radix-4 can only process $4^n$-point data;
this application needs 512-point pulse compression. Second, as
pointed out previously, the dual radix-2 butterfly unit can be
devised to perform not only the FFT and IFFT but also complex
multiplication, whereas the radix-4 butterfly cannot realize this
hardware sharing. Third, compared with the radix-4 butterfly, the
radix-2 butterfly has a more compact structure, and its
interconnections between blocks are fewer and shorter. In
deep-submicrometer technology, reducing interconnection delay is
crucial to performance. So applying the dual-butterfly structure is
a great advantage for this application.
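The first point is simple arithmetic: 512 = 2^9 is a power of two but not a power of four, whereas 256 = 4^4 is both. A small check, with a hypothetical helper name, makes this explicit:

```python
def is_power_of(n, base):
    """True if n == base**m for some integer m >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1

for points in (256, 512):
    print(points, is_power_of(points, 2), is_power_of(points, 4))
# prints:
# 256 True True
# 512 True False
```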
3.3 FFT_IFFT Address Unit
For this architecture to work properly, the data addresses and the
rotation coefficient addresses must be generated correctly. It is
assumed that the input raw data are stored in the dual-port RAM in
normal sequence. These data first undergo the fast Fourier
transform. After the FFT, the spectrum data are stored in the
dual-port RAM in reverse sequence.
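[Figure 6: the 4k-word ROM (addresses 000H to FFFH) is divided at 3FFH/400H, 7FFH/800H and BFFH/C00H into four segments carrying the labels “IFFT coefficient”, “FFT coefficient”, “512 points S*(w)” and “256 points S*(w)”. Within a coefficient segment the words are stored serially as ..., $W_{512R}^{k}$, $W_{512I}^{k}$, $W_{512R}^{k+1}$, $W_{512I}^{k+1}$, ...]
Fig 6. The map of the coefficient ROM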
The 256-point FFT and IFFT coefficients are expressed as
$W_{256}^{\pm k}$. This can be expressed in another form:
$W_{256}^{\pm k} = W_{512}^{\pm 2k}$. That means the 256-point FFT
and IFFT use the even coefficients of the 512-point FFT and IFFT,
so they can share the same rotation coefficients.
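The identity can be checked numerically. In the sketch below the table layout and the function names are hypothetical, and the convention $W_N^k = e^{-j2\pi k/N}$ is assumed; the real ROM stores real and imaginary parts serially as 12-bit words.

```python
import cmath

def twiddle(n, k):
    """Rotation coefficient W_n^k = exp(-j*2*pi*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)

# Hypothetical table holding the 256 distinct 512-point FFT coefficients.
table_512 = [twiddle(512, k) for k in range(256)]

def twiddle_256_from_table(k):
    """Read W_256^k from the 512-point table by doubling the index."""
    return table_512[2 * k]

max_error = max(abs(twiddle_256_from_table(k) - twiddle(256, k))
                for k in range(128))
```

`max_error` is at the level of floating-point rounding, so a 256-point transform only needs to address every second entry of the 512-point table.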
In figure 8 and figure 10, only the least significant bits of the
rotation coefficient address are depicted. The current operation
and work mode decide the segment address of the coefficients.
In our design, the ROM that contains the coefficients is
implemented with the built-in dual-port block RAM of the Virtex-II
FPGA. At startup, the content of the ROM is initialized from an
EPROM chip outside the FPGA. For each chosen FM radar pulse, the
related compressor spectrum is stored in the EPROM, so the system
is fully programmable.
[Figure 7: labels FFT, IFFT, count, Stage=0 ... Stage=8, Data address.]
Fig 7 512 data address generator
[Figure 10: labels FFT, IFFT, count, Stage=0 ... Stage=7, coef address.]
Fig 10 256 coefficients address generator
As pointed out before, because the three channels of signal are
processed simultaneously and their coefficients are identical, they
can share a single address generator and coefficient ROM. This
dramatically reduces the hardware overhead.
3.4 Block-Floating-Point Arithmetic
To trade off precision against performance, block-floating-point
arithmetic [5][8] is applied in this design. According to reference
[9], block-floating-point arithmetic gives a much higher SNR than
fixed-point arithmetic, and its implementation is very simple.
The block-floating-point logic comprises two parts, the
“overflow-detector” and the “scale-counter”. The
“overflow-detector” is a finite-state machine. If the word length
of the data fed into the “dual-butterfly-unit” is N bits, the word
length of the output is N+2 bits. The state of the
“overflow-detector” is decided by the three most significant bits
of the output, bits N+2 to N. They have the following possibilities:
000 (or 111) [no overflow, output of “overflow-detector” = 0]
001 (or 110) [1-bit overflow, output of “overflow-detector” = 1]
01x (or 10x) [2-bit overflow, output of “overflow-detector” = 2]
The state diagram of the “overflow-detector” is shown in figure 11.
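The three cases can be written as a small classification function. This is a sketch under stated assumptions: the function name is hypothetical, the value is taken as a signed integer that fits in N+2 bits, and bit positions are counted from zero in the code, so the three most significant bits of the (N+2)-bit word are bits N+1 down to N-1.

```python
def overflow_bits(value, n):
    """Overflow of an (n+2)-bit two's complement word, from its top 3 bits."""
    word = value & ((1 << (n + 2)) - 1)     # two's complement, n+2 bits wide
    top3 = word >> (n - 1)                  # the three most significant bits
    if top3 in (0b000, 0b111):
        return 0                            # still fits in n bits
    if top3 in (0b001, 0b110):
        return 1                            # needs n+1 bits
    return 2                                # 01x or 10x: needs n+2 bits

# With n = 12 the n-bit range is -2048 .. 2047.
levels = [overflow_bits(v, 12) for v in (2047, 2048, 4096, -2048, -2049, -4097)]
# levels == [0, 1, 2, 0, 1, 2]
```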
[Figure 8: labels FFT, IFFT, count, Stage=0 ... Stage=8, coef address.]
Fig 8 512 coefficients address generator
[Figure 9: labels FFT, IFFT, count, Stage=0 ... Stage=7, Data address.]
Fig 9 256 data address generator
[Figure 11: states S1, S2 and S3 with transitions labelled ①, ② and ③.]
Fig.11 State Diagram of “Overflow-Detector”
At the beginning of every stage, the “overflow-detector” is reset
to S1. At the end of the stage, the “scale-counter” is activated
and accumulates the output of the “overflow-detector”; at the same
time, the “overflow-detector” output is registered to control the
input data shifter of the next stage, indicating how many bits the
data should be shifted right.
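The per-stage bookkeeping can be sketched as follows. Several points are assumptions made for illustration: the detector is modelled as keeping the largest overflow seen during a stage (one plausible reading of a three-state diagram that is reset to S1), the stage outputs are made-up numbers rather than butterfly results, and all names are hypothetical.

```python
def overflow_bits(value, n):
    """0, 1 or 2 bits of overflow, from the top 3 bits of an (n+2)-bit word."""
    top3 = (value & ((1 << (n + 2)) - 1)) >> (n - 1)
    return {0b000: 0, 0b111: 0, 0b001: 1, 0b110: 1}.get(top3, 2)

def process_stages(stage_outputs, n):
    """Return the accumulated group exponent and the shift chosen per stage."""
    scale_counter = 0
    shifts = []
    for outputs in stage_outputs:
        state = 0                           # detector reset at stage start
        for value in outputs:
            # Assumed behaviour: the state only moves towards more overflow.
            state = max(state, overflow_bits(value, n))
        scale_counter += state              # accumulate at the end of the stage
        shifts.append(state)                # registered for the next stage
    return scale_counter, shifts

def rescale(outputs, shift):
    """Arithmetic right shift applied to the data entering the next stage."""
    return [v >> shift for v in outputs]

# Hypothetical outputs of three stages for N = 12.
stages = [[100, -2049, 900], [5000, -30, 7], [12, -7, 3]]
exponent, shifts = process_stages(stages, 12)   # (3, [1, 2, 0])
next_input = rescale(stages[1], shifts[1])      # [1250, -8, 1]
```

The accumulated `exponent` plays the role of the group exponent that accompanies the final mantissas.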
4 Design Verification
This design is implemented on a Virtex-II FPGA (XC2V500). The
working clock frequency is 100 MHz and a 512-point pulse
compression is completed within 102 us. The input and output
signals are both in 12-bit two's complement representation. In
addition, the result has a 5-bit group exponent.
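A rough operation count is consistent with the reported time. The model below is an assumption made for illustration, not a description taken from the design: each channel is taken to have its own dual-butterfly unit, every butterfly or multiplication occupies one butterfly unit for two periods of the half-rate clock (four system clocks), and pipeline latency and data input/output are ignored.

```python
import math

def butterfly_operations(points):
    """Butterfly-unit operations for FFT + spectrum multiply + IFFT."""
    stages = int(math.log2(points))
    per_transform = stages * points // 2    # radix-2 butterflies per transform
    return 2 * per_transform + points       # FFT + IFFT + one multiply per bin

def estimated_time_us(points, f_sys_mhz=100.0, sys_clocks_per_op=4, units=2):
    """Assumed model: operations are shared evenly between the butterfly units."""
    cycles = butterfly_operations(points) * sys_clocks_per_op / units
    return cycles / f_sys_mhz

print(estimated_time_us(512))   # 102.4
```

The estimate of about 102.4 us for 512 points is close to the 102 us quoted above, which suggests, without confirming, that the butterfly units are kept almost fully occupied.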
To verify the correctness of this design, we choose two typical
groups of data as test vectors and compare the processed results
with the “Matlab” calculation results.
The first group of test vectors is used to test the 512-point work
mode of the matched-filter. The signal is linear FM.
$$x(n) = \begin{cases} \exp\left(j \cdot 2\pi\left(-\dfrac{3}{8}n + \dfrac{1}{512}\cdot\dfrac{n^2}{2}\right)\right) & 0 \le n \le 383 \\ 0 & 384 \le n \le 511 \end{cases}$$
$S^*(\omega)$ is set as $FFT^*[x(n)]$.
The result processed by the matched-filter is shown in figure 12.
Because 512 < 384 × 2, overlap occurs during the (circular)
convolution. But because x(n) is a chirp signal, the correlation
between the beginning and the end of the signal is weak, and this
overlap does not affect the pulse compression.
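The first test vector and its ideal compressed output can be reproduced with a short floating-point sketch. It evaluates the circular correlation directly, which is mathematically the same as IFFT(FFT(x)·FFT*(x)); the function names are hypothetical, and the model has none of the 12-bit block-floating-point effects of the hardware.

```python
import cmath
import math

N, PULSE = 512, 384

def chirp(n):
    """Linear FM test signal of the 512-point mode, zero after sample 383."""
    if n >= PULSE:
        return 0j
    return cmath.exp(2j * cmath.pi * (-3 * n / 8 + n * n / (2 * 512)))

x = [chirp(n) for n in range(N)]

def circular_autocorrelation(signal, lag):
    """y(lag) = sum over i of signal[i] * conj(signal[i - lag]), indices mod N."""
    n = len(signal)
    return sum(signal[i] * signal[(i - lag) % n].conjugate() for i in range(n))

def level_db(lag):
    """Output magnitude at the given lag, relative to the peak at lag 0."""
    peak = abs(circular_autocorrelation(x, 0))
    return 20 * math.log10(abs(circular_autocorrelation(x, lag)) / peak)

# level_db(1) and level_db(-1) both come out at about -10.40 dB.
```

The level at lags ±1 agrees with the calculated values listed for X(1) and X(−1) in table 1.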
Table 1, X(-6) to X(10)
X(n)      process(dB)   calculate(dB)
X(-6)     -23.2767      -23.2189
X(-5)     -23.1839      -23.2864
X(-4)     -40.5492      -39.6585
X(-3)     -20.5537      -20.5036
X(-2)     -13.4264      -13.4672
X(-1)     -10.4001      -10.4015
X(0)      0             0
X(1)      -10.3836      -10.4015
X(2)      -13.4377      -13.4672
X(3)      -20.4741      -20.5036
X(4)      -39.9253      -39.6585
X(5)      -23.2885      -23.2864
X(6)      -23.1968      -23.2189
X(7)      -31.1193      -30.9762
X(8)      -33.8152      -33.8457
X(9)      -26.8170      -26.8910
X(10)     -29.1364      -29.1881
The second group of test vectors is applied to test the 256-point
work mode. The input signal is:
$$x(n) = \begin{cases} \exp\left(j \cdot 2\pi\left(-\dfrac{3}{8}n + \dfrac{3}{256}\cdot\dfrac{n^2}{2}\right)\right) & 0 \le n \le 63 \\ 0 & 64 \le n \le 255 \end{cases}$$
The processed result is shown in figure 13 and the
relative comparison is listed in table 2. In this test,
there is no overlap in convolution.
Fig.12 512-points pulse compression result
To further verify the result, the processed results at the twenty-one
most important points are listed in table 1. Compared with the
“Matlab” calculation results, the processed results agree to within
0.9 dB at every listed point; the largest deviation, 0.89 dB, occurs
at X(-4), where the output is about 40 dB below the peak.
Table 1, X(-10) to X(-7)
X(n)      process(dB)   calculate(dB)
X(-10)    -28.9965      -29.1881
X(-9)     -27.0478      -26.8910
X(-8)     -33.7113      -33.8457
X(-7)     -30.8718      -30.9762
Fig.13 256-points pulse compression result
Table 2
X(n)      process(dB)   calculate(dB)
X(-10)    -28.6051      -28.5801
X(-9)     -30.8746      -31.0146
X(-8)     -28.3943      -28.3904
X(-7)     -25.6623      -25.6476
X(-6)     -35.3313      -35.2251
X(-5)     -21.3956      -21.4539
X(-4)     -24.6855      -24.5592
X(-3)     -24.1825      -24.1273
X(-2)     -13.4969      -13.5515
X(-1)     -10.1513      -10.1443
X(0)      0             0
X(1)      -10.1301      -10.1443
X(2)      -13.5157      -13.5515
X(3)      -24.0544      -24.1273
X(4)      -24.5103      -24.5592
X(5)      -21.4588      -21.4539
X(6)      -34.8020      -35.2251
X(7)      -25.5586      -25.6476
X(8)      -28.5885      -28.3904
X(9)      -31.1649      -31.0146
X(10)     -28.3975      -28.5801
5 Conclusion
The matched-filter presented in this paper is specially designed
for tracking radar systems. It can process three channels of radar
backscattering signal simultaneously. The design has the following
characteristics: 1) It applies block-floating-point arithmetic and
achieves very high precision. 2) Through parallel processing, its
performance is improved significantly: a 512-point pulse
compression takes only 102 us. 3) Its peripheral circuitry is very
simple. Except for an EPROM for coefficient storage, no other
auxiliary chips are needed, so it is well suited to the airborne
environment. 4) The chip has low power dissipation. The leak power
consumption of the chip, estimated with the XPOWER tool provided by
Xilinx, is 405 mW. After processing a group of data, the chip
returns to the idle state automatically.
References:
[1] Charles E. Cook and Marvin Bernfeld, Radar Signals: An
Introduction to Theory and Application, Artech House, Inc., 1993.
[2] Tortoli, P., Guidi, F. and Atzeni, C., Digital vs SAW matched
filter implementation for radar pulse compression, Proceedings of
the IEEE Ultrasonics Symposium 1, Nov 1-4, 1994, sponsored by the
IEEE Ultrasonics, Ferroelectrics, and Frequency Control Society,
pp. 199-202, 1051-0117.
[3] Sarkar, Tapan K. and Brown, Russell D., Ultra-low sidelobe
pulse compression technique for high performance radar systems,
IEEE National Radar Conference - Proceedings, May 13-15, 1997,
sponsored by IEEE, pp. 111-114.
[4] Huang Ruojian, The Design of Match Filter in Pulse Compression,
1997 (Master dissertation of Beijing Institute of Technology).
[5] Liu Zhaohui, The Design of Application Specific FFT Processors
and The Study of CFAR Detectors Based on Systolic Array, 1999
(Ph.D dissertation of Beijing Institute of Technology).
[6] Brigham, E. Oran, The Fast Fourier Transform and Its
Application, Englewood Cliffs, N.J., Prentice Hall, 1988.
[7] Rabaey, Jan M., Digital Integrated Circuits: A Design
Perspective [M], Prentice Hall Inc., Simon & Schuster / A Viacom
Company, 1996: 522-533.
[8] Hu Guorong and Lee Tak Kwan, Asynchronous FFT-ASIC
Architecture, Chinese Journal of Electronics, Oct 1998, Vol. 7
No. 4, pp. 333-337.
[9] A. V. Oppenheim and C. J. Weinstein, Effects of finite register
length in digital filtering and the fast Fourier transform, Proc.
of IEEE, 1972, pp. 957-976.