Rapid Prototyping of RADAR Signal Processing Systems using

advertisement
Rapid Prototyping
of
RADAR Signal Processing Systems
using
Ptolemy Classic
Ptolemy MiniConference UCB
Denis Aulagnier, Patrick Meyer, Hans Schurer, Xavier Warzee,
THALES
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p1
CONTENTS
*
The ESPADON programme & methodology
– Environment & development process used for the benchmark
*
ESPADON Ptolemy developments & the benchmark
– Benchmarking application for ESPADON
– Ptolemy developments
• Library set-up and features
– Improvements done after first use
• MERCURY target development
– Benchmark iterations and results
*
Conclusions
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p2
ESPADON & INDUSTRIAL PARTNERS
*
*
ESPADON: Environment for Signal Processing Application Development
and PrOtotypiNg
EUROFINDER PROGRAMME in France, UK, Netherlands:
– FRANCE
• THALES (Former THOMSON-CSF)
–
–
–
–
THALES AIRBORNE SYSTEMS,
THALES COMMUNICATION,
THALES OPTRONIC,
THALES AIR DEFENCE SYSTEMS
• THOMSON MARCONI SONAR SAS
• MATRA BAe Dynamics
– UNITED KINGDOM
• BAE SYSTEMS Advanced Technology Centres
• THOMSON MARCONI SONAR Ltd
– NETHERLANDS
• THALES Naval Netherlands (former: THOMSON-CSF SIGNAAL)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p3
THE ESPADON METHODOLOGY
Phase 1:
Analysis and Selection of the requirements
allocated to SP Subsystem
Plan SP Development
From System Development
From Previous Process
Requirements
Spiral Model Representation
Risk Analysis
Definition
Development
Validation
Risk Register
•
•
•
•
Risk driven development life cycle
Model Year approach
Reuse and capitalisation
Support for:
- Traceability
- Cost performance trade off
Phase 2:
Definition of SP Subsystem
INCREASING LEVEL OF REFINEMENT
SP production
Hardware/Software
description
D
Mapping
R Development
R
eExample
Specification
Plan of risk:
Software development
description
e
i
v
Example of risk:
Refinement of
q
s
e
Real time performance
architecture choice
u
Review
k
l
Computer
Example of risk:
i
o
architecture
Computing powerFunctional Design
choice
r
R
p
Example of risk:
SP Functional definition
e
e
m
SP algorithms, ...
GO/NO GO
m
g
e
e To Next/Previous
i
n
Architectural Design
Process
Functional
Simulation
n
s
t
modelling
Placement
t
t
Choice
of functions
validation
s
e
P
Development of
r
l Validation
Implementation
performance model
of performance model
Software/Hardware
a
Validation
development
ofn
virtual prototype
(synthesis)
Validation of
manufactured computer
Phase 4:
Validation of SP
Subsystem
System Review
To System Development
Production
Integration
Phase 3:
Development
of SP Subsystem
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p4
ESPADON DESIGN ENVIRONMENT (EDE)
Matlab
• Algorithm
Prototyping
EDE Framework
Simulink/RTW
PTOLEMY
(or GEDAE)
•Tools
Target/
Porting Kit
VSIP
HANDEL-C
• Libraries
• Standards
ICS
FPGA board
Target H/W
Rapid prototyping machine
Mercury G4/RACE++
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p5
PTOLEMY DEVELOPMENT PROCESS
Reusable Components
Functional simulation SDF
STIMULI
STIMULI
(BDF, FSM)
FUNCTIONAL
FUNCTIONAL
RESULTS
RESULTS
F2
F1
Functional design
library:
F5
F3
F4
• SDF VSIP Stars
Architecture Design :
Performance analysis:
• CGC VSIP Stars with
performance info
MATLAB
PE1
STIMULI
GENERATOR F1
PE 1
F1receiveF3F2
PE 2
F2
PE 3
Code
generation
Code
generation
libraries:
libraries:
•CGC
• CGCVSIP
VSIPStars
Stars
•Target
• VHDLoptimised
library
library
• VHDL Drivers for
•Communication
Communication
library
CG
TATL
Target selection / Partitioning / Performance analysis
Real time trace Display
FPGA
PE2
SEND/RECEIVE
Send
F5
F4
F3
F4
MATLAB
POST-PROC
POST-PROC
PE3
DISPLAY
DISPLAY
COMPARASON
F5
PTOLEMY
Gantt Chart Display
Implementation: CGC with an “Handel-C” syntax
Implementation:
PE 1 F1 CGC
F3
C code for the target
HANDEL-C
-------------------------
C code generation
F2
PE 2 for Target
Run on the Target
PE 3
PROTOTYPE
RESULTS
Target B ------------------------Target C
F5
C to VHDL/EDIF
F4 conversion
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p6
BEAMFORMER APPLICATION
*
Channel 1
Receive antennas
*
From a vertical array, e.g. 8 antenna channels, to 6 beams
High level set-up of the radar beamformer application:
*
*
*
Beam 1
Window:
Stabilization,
Tapering,
Calibration...
Channel 8
N-point
FFT/FIR
Elevation beams output towards
velocity filtering & detection
Beam 6
Waveform: 16 pulses, PRF=3-6 kHz, Fsample=2.5MHz
Input: 8 IQ-channels 32 bits complex float: 160 MB/sec
Output: 6 beams 32 bits complex float: 120 MB/sec
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p7
BEAMFORMER CALIBRATION
*
*
Normal burst pattern is one clutter sweep + 16 air pulses
Calibration is performed instead of clutter measurement
using 48 pulses (mode switch):
Clutter Pulse
Air-Burst Pulses
0
1
s= 0
2
Clutter Measurement
s=1
t=0
3
s=2
s=3
T2
T1
4
T3
16
s=4 ...........
T4
s=16
T 16
Burst k
Test Pulse
1
Calibration Pulses
2
3
4
5
6
7
8
9
10
etc.
First incoherent integration
sum of 4 R Q s
Second incoherent
integration sum of 4 R Q s
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p8
BEAMFORMER DESIGN
BEAMFORMER FUNCTIONAL DIAGRAM
SIGNAL
WEIGHTING
(CMPLX MUL)
I/Q VIDEO
level_stab_shift, CVE_phase, RF, receiver_STC
CALCULATE
WEIGHTING
FUNCTION
INCOHERENT
INTEGRATION
CALCULATE
PHASE
DIFFERENCE
COHERENT
INTEGRATION
CALCULATE
GAIN
DIFFERENCE
NOISE
MEASUREMENT
BEAM CALCULATION
weighting _control
Input
interface
to
the File
system
BEAM
FORMING
(FFT)
Output
interface
to
the File
system
CALIBRATION
BF_control
BF_status
BEAMFORMER MONITOR & CONTROL
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p9
ESPADON PTOLEMY
*
Within Ptolemy we only use:
– SDF (or BDF) Domain for functional simulation
– CGC Domain for Code Generation (and implementation)
*
What we have developed for the benchmark is:
– An extension of the Library of stars (both in SDF/BDF and CGC
available, total: 70)
• Radar Library (5 components)
• VSIP Core Light Library (partially, 11 components)
• Support Library (e.g. components for parallel operation, 19 components)
– Target for the MERCURY Machine (G2 and G4 processor)
• VSIP vectors are allocated in one buffer (per processor)
• Synchronized Inter-Processor Communication for Complex Vector (The
Burst Message is always sent along with the data)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p10
PTOLEMY LIBRARY
*
Use of VSIPL standard library
– Pass pointers of VSIPL views between stars instead of data (‘int’-type)
*
*
*
Develop multi- and complex-interleave star needed for
corner-turn process (in HOF domain)
Extent CGC-BDF to handle multiprocessor architecture
Important requirements to developed elements:
– Keep library platform independent, dependency is only in the target
– Make control flow explicit in the data-flow graphs
*
Stars with vector output are provided with 2 extra
parameters:
– MAX_BUF_LENGTH: Maximum length of a vector
– OUT_BUF_OPT: Number of output buffers used for each vector
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p11
PTOLEMY LIBRARY FEATURES
*
*
*
Support in-place operation (if possible)
Support rate change i.e. the output buffer is automatically
duplicated as many times as needed
Colours of the stars highlight the different kind of stars
used in the design:
– Standard Ptolemy stars (WHITE) that use only std C library,
– VSIPL stars (GREEN) that use the std C library and the VSIPL Core
Light library,
– Application specific stars (RED) that also use MERCURY library
(ICS) and/or are specific to the ESPADON radar benchmark.
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p12
LIBRARY SET-UP (CGC)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p13
PTOLEMY LIBRARY IMPROVEMENT
PARAMETERS
DATA & BURST MESSAGE
SLOT 1
SLOT 2
CHANNEL 1
CHANNEL 2
CHANNEL N
SLOT 3
SLOT 4
SECOND STAR
ONE SLOT
CHANNEL 1
SIGNAL DATA
GLOBAL BUFFER (SMAB)
SYNCHRONISATION FLAGS
FIRST STAR
LIBRARY STARS
OFFSET 0
COMMUNICATION CHANNELS
All the stars allocate the required buffers in the “Global
Buffer” during the setup phase:
GROWING OFFSETS
*
CHANNEL 2
CHANNEL N
LAST STAR
FREE SPACE
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p14
PTOLEMY MERCURY TARGET (1)
*
Features
– Generate a C-file for each processor, compile, load and run the
application on the machine
– Use MERCURY ICS Library and VSIPL (exclusively)
⇒ Make it portable to any MERCURY machine
– Arrange synchronisation and data transfer between PPCs
– Data transfer uses DMA ⇒ efficient
• Synchronisation protocol uses simple flags
• Support Variable Vector Length: each communication buffer is duplicated
N times (user defined) and the effective transfer length is set in real time
• Memory is allocated for the maximum vector length (user defined)
• Support both complex storage types (interleaved & split)
• Support complex float vectors (only)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p15
PTOLEMY MERCURY TARGET (2)
*
Features (continued)
– Implement TATL Trace Tool from MERCURY
– Overview of the main parameters (to be set by the user):
• Number of processors
• CE id for each processor
• Size of the Shared Memory Buffer (SMB) for each processor (only one
SMB is created in each processor)
• Size of the “heap” is set for all processors
• Communication buffer length (only one parameter for all the
communication channels)
• ON/OFF switches for debug messages and TATL (trace for all stars
possible)
• Give any ‘runmc’ command line option
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p16
PTOLEMY MERCURY TARGET (3)
*
Interface with VSIPL issues
– If the input vector is already allocated inside the SMB and the stride of
the vector view is equal to one, then the copy is not needed.
⇒ efficient transfer is possible (using In-Place operation)
(Vector view with a stride > 1 are not supported. A 2D DMA is required).
– But according to VSIPL policy, any VSIPL function is allowed to move
the data to the more appropriate place (e.g. to internal memory for a
DSP). Therefore the copy is always needed if we use the ‘VSIPL data’
space.
– This problem is solved if we use only ‘User data’ space. In doing this
we do not follow the defined VSIPL standard, however!
⇒ VSIPL does not fit well on a multi-processor machine like the
MERCURY machine (interface VSIPL - ICS not efficient).
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p17
ESPADON PTOLEMY ISSUES (1)
*
Future work to solve known problems:
– The same buffer size is applied to all communication channels
⇒ Memory allocation overhead
– The Burst Message structure is hard-coded
⇒ Application dependent stars are used in the design
– The BDF stars are available only for galaxies with single input & single
output, and multi-rate is not supported
⇒ Strong design constraint
– The BDF stars can only be used inside a processor
⇒ Design constraint
– The CGC library elements are not calibrated in terms of execution time
⇒ Automatic mapping may fail
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p18
ESPADON PTOLEMY ISSUES (2)
*
Future work to solve known problems (continued):
– The Memory boards are implemented inside the I/O stars
⇒ Memory boards are not really integrated in the design environment
– The inter-processor communication functions support only VSIPL
complex float vectors
⇒ Design constraint
– The TATL Tool cannot be used if the design counts more than 384
different stars (due to the limited number of event types)
⇒ Design constraint
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p19
ITERATION 1: BARE BEAMFORMER DESIGN
*
Iteration 1 (6 processor design):
Data in
+
Distribute
2 striplines
NofRQ/NofProcs
2 striplines
NofRQ/NofProcs
2 striplines
NofRQ/NofProcs
2 striplines
NofRQ/NofProcs
Collect
+
Data out
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p20
TATL RESULTS FOR ITERATION 4
*
Bare beamformer on 8 processors
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p21
BENCHMARK PERFORMANCE METRICS
Iteration 1
Iteration 2
Iteration 3
Iteration 4
NofChannel
8
8
8
8
NofSweep
17
17
17
17
NofProc
4 (+2)
5 (+2)
4+4
8
Possible NofProc
1, 2, 4, 8 (+2) 2, 3, 5, 9, 17 (+2)
>1
1, 2, 4, 8
Input data
DMA
DMA
PRE-LOADED PRE-LOADED
Output data
(DMA)
(DMA)
(DMA)
- (PbMCS)
CORNER-TURN
4->4
NO
4->4
8->8
RACE++ peak load
?
?
?
53 %
LATENCY
1 burst
1 burst
2 bursts
1 burst
PERFORMANCE*
25 ms
21 ms
9.5 ms
9 ms
Support Var. Burst L.
YES
YES
YES
YES
Design Time#
72 H
16 H
12 H
16 H
* The performance is the average processing time for one burst. The measurement has been done with
TATL on 10 bursts of 19000 RQ of 400 ns (i.e. 7.6 ms).
#
Time is without extensive functional testing.
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p22
BENCHMARK FINAL DESIGN (1)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p23
BENCHMARK FINAL DESIGN (2)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p24
FINAL BURST TIMING RESULTS
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p25
CONCLUSIONS (1)
*
*
*
Main functional requirements are met by the final design
(12 of the 19 requirements)
Throughput and latency requirements are almost met;
expected to be met in case of full speed G4 daughter
cards and/or VSIPL functions redesign
Review of graphical Ptolemy designs seems faster and
more efficient than code reviews
– Disadvantage is parameter handling and scope.
– Design is highly multi-rate, but this is difficult to see
– Some functionality is inside stars (hidden)
*
Total design, validate & test time for bare beamformer
was 354.5 hours, while normal development takes 481
hours: Approximately 36% faster (improvement ~1.36)
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p26
CONCLUSIONS (2)
*
*
Development time from functional/architectural design to
implementation is very short: matter of days
For which purpose can we use it?
– Mainly for rapid prototyping of new algorithms
– Rapid prototyping of demonstrators
– Open source approach enables us to adapt the tool to our needs
*
Many improvements are needed before it can be used for
a complete application/project
D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p27
Download