A KASSPER Real-Time Embedded Signal Processor Testbed

Glenn Schrader
Massachusetts Institute of Technology
Lincoln Laboratory
28 September 2004
This work is sponsored by the Defense Advanced Research Projects Agency, under Air Force Contract F19628-00-C-0002. Opinions,
interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Outline

• KASSPER Program Overview
• KASSPER Processor Testbed
• Computational Kernel Benchmarks
  – STAP Weight Application
  – Knowledge Database Pre-processing
• Summary
Knowledge-Aided Sensor Signal Processing and Expert Reasoning (KASSPER)

MIT LL Program Objectives: develop a systems-oriented approach to revolutionize ISR signal processing for operation in difficult environments

• Develop and evaluate high-performance integrated radar system/algorithm approaches
• Incorporate knowledge into radar resource scheduling, signal, and data processing functions
• Define, develop, and demonstrate an appropriate real-time signal processing architecture

[Figure: radar operating modes: SAR Spot, Synthetic Aperture Radar (SAR) Stripmap, Ground Moving Target Indication (GMTI) Search, and GMTI Track]
Representative KASSPER Problem

• Radar System Challenges
  – Heterogeneous clutter environments
  – Clutter discretes
  – Dense target environments
  – Results in decreased system performance (e.g. higher Minimum Detectable Velocity, degraded Probability of Detection vs. False Alarm Rate)
• KASSPER System Characteristics
  – Knowledge Processing
  – Knowledge-enabled Algorithms
  – Knowledge Repository
  – Multi-function Radar with Multiple Waveforms
  – Intelligent Scheduling
• Knowledge Types
  – Mission Goals
  – Static Knowledge
  – Dynamic Knowledge

[Figure: ISR platform carrying the KASSPER system (algorithms + system) surveilling a scene containing Convoy A, Convoy B, and a launcher]
Outline

• KASSPER Program Overview
• KASSPER Processor Testbed
• Computational Kernel Benchmarks
  – STAP Weight Application
  – Knowledge Database Pre-processing
• Summary
High Level KASSPER Architecture

[Block diagram: sensor data flows into Signal Processing, which produces target detects for the Data Processor (Tracker, etc.), which in turn produces track data. A Look-ahead Scheduler directs the chain using Inertial Navigation System (INS) and Global Positioning System (GPS) inputs. The Knowledge Database feeds the scheduler and processing stages, drawing on static (i.e. "off-line") knowledge sources such as mission knowledge and terrain knowledge, as well as real-time intelligence from external systems and the operator.]
Baseline Signal Processing Chain

[Block diagram: raw sensor data enters the GMTI signal processing functional pipeline (Data Input Task → Beamformer Task → Filter Task → Detect Task → Data Output Task), which emits target detects to the Data Processor (Tracker, etc.). The Look-ahead Scheduler, Knowledge Processing, and Knowledge Database coordinate the pipeline using INS, GPS, and track data.]

Baseline data cube sizes:
• 3 channels x 131 pulses x 1676 range cells
• 12 channels x 35 pulses x 1666 range cells
KASSPER Real-Time Processor Testbed

[Processing architecture diagram: INS/GPS inputs and sensor data feed the KASSPER Signal Processor, which hosts the Look-ahead Scheduler, Signal Processing, Data Processor (Tracker, etc.), and Knowledge Database. Sensor data storage acts as a surrogate for an actual radar system; "knowledge" storage holds the off-line knowledge sources (DTED, VMAP, etc.).]

Software architecture: applications are built on the Parallel Vector Library (PVL), which is layered over optimized computational kernels.

• Rapid Prototyping
• High Level Abstractions
• Scalability
• Productivity
• Rapid Algorithm Insertion

Hardware: 120 PPC G4 CPUs @ 500 MHz, 4 GFlop/sec/CPU = 480 GFlop/sec.
KASSPER Knowledge-Aided Processing Benefits

[Figure: knowledge-aided processing benefits]
Outline

• KASSPER Program Overview
• KASSPER Processor Testbed
• Computational Kernel Benchmarks
  – STAP Weight Application
  – Knowledge Database Pre-processing
• Summary
Space Time Adaptive Processing (STAP)

[Diagram: training samples 1 … N, drawn from range cells around the range cell of interest, form the training data A. Weights are solved from A and the steering vector v (R = A^H A, W = R^(-1) v), and the resulting STAP weights W are multiplied with the range cell of interest to produce a range cell with suppressed clutter.]

• Training samples are chosen to form an estimate of the background
• Multiplying the STAP weights by the data at a range cell suppresses features that are similar to the features that were in the training data
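The weight solve and application above reduce to a pair of standard linear algebra steps. Below is a minimal NumPy sketch of them; the problem sizes, random training data, and variable names are illustrative assumptions, not the testbed's code.

```python
# Minimal sketch of the STAP weight solve (R = A^H A, W = R^(-1) v) and
# weight application; sizes and data are illustrative assumptions.
import numpy as np

def stap_weights(A, v):
    """Solve R = A^H A, W = R^(-1) v for the adaptive weights."""
    R = A.conj().T @ A              # covariance estimate from the training data
    return np.linalg.solve(R, v)   # W = R^(-1) v without an explicit inverse

rng = np.random.default_rng(0)
dof = 36                            # space-time degrees of freedom (channels x pulses)
n_train = 3 * dof                   # common rule of thumb for training support
A = rng.standard_normal((n_train, dof)) + 1j * rng.standard_normal((n_train, dof))
v = np.ones(dof, dtype=complex)     # steering vector for the look direction
x = rng.standard_normal(dof) + 1j * rng.standard_normal(dof)  # range cell of interest

W = stap_weights(A, v)
y = np.vdot(W, x)                   # W^H x: clutter-suppressed output sample
```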
Space Time Adaptive Processing (STAP): Breaks Assumptions

[Diagram: the same processing chain, but the clutter in the training samples does not match the clutter in the range cell of interest.]

• Simple training selection approaches may lead to degraded performance since the clutter in the training set may not match the clutter in the range cell of interest
Space Time Adaptive Processing (STAP)

[Diagram: the same processing chain, with terrain knowledge used to select which range cells contribute training samples.]

• Use of terrain knowledge to select training samples can mitigate the degradation
STAP Weights and Knowledge

• STAP weights are computed from training samples within a training region
• To improve processor efficiency, many designs apply the weights across a swath of range (i.e. a matrix multiply)
• With knowledge-enhanced STAP techniques, one MUST assume that adjacent range cells will not use the same weights (i.e. weight selection/computation is data dependent); a sketch of the two cases follows below

[Diagram, left: the same weights are applied to every column of the STAP input, which is equivalent to a matrix multiply since the same weights are used for all columns. Right: training plus external data select 1 of N weight sets for each column, which cannot use a matrix multiply since the weights applied to each column are data dependent.]
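To make the contrast concrete, here is a hedged NumPy sketch of the two application patterns; the array shapes, helper names, and the select callback are assumptions for exposition rather than the benchmark implementation.

```python
# Sketch of the two weight-application patterns; names and shapes are assumed.
import numpy as np

def apply_single_weights(W, X):
    """Same weights for every range cell: one (J,K) x (K,L) matrix multiply."""
    return W @ X

def apply_selected_weights(W_bank, X, select):
    """Data dependent weights: choose 1 of N weight sets per column."""
    N, J, K = W_bank.shape
    L = X.shape[1]
    Y = np.empty((J, L), dtype=X.dtype)
    for l in range(L):              # per-column gather defeats a single GEMM call
        Y[:, l] = W_bank[select(l)] @ X[:, l]
    return Y
```

The per-column gather in the second function is precisely what prevents the compiler, or an optimized BLAS routine, from fusing the work into one matrix multiply.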
Benchmark Algorithm Overview

Benchmark Reference Algorithm (Single Weight Multiply): STAP Output = Weights x STAP Input.

Benchmark Data Dependent Algorithm (Multiple Weight Multiply): N pre-computed weight vectors are applied across the STAP input, with the weight set applied to each column selected sequentially modulo N.

The benchmark's goal is to determine the achievable performance compared to the well-known matrix multiply algorithm.
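A minimal timing harness for the two kernels might look like the sketch below, reusing the helpers from the previous sketch; N = 10 and the modulo-N selection follow the slide, while the sizes and random data are assumptions.

```python
# Timing sketch for the two benchmark kernels; reuses apply_single_weights
# and apply_selected_weights from the previous sketch.
import time
import numpy as np

J = K = 32; L = 4096; N = 10        # N pre-computed weight matrices (N = 10)
rng = np.random.default_rng(1)
X = (rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))).astype(np.complex64)
W_bank = (rng.standard_normal((N, J, K))
          + 1j * rng.standard_normal((N, J, K))).astype(np.complex64)

t0 = time.perf_counter()
Y_ref = apply_single_weights(W_bank[0], X)                 # reference matrix multiply
t1 = time.perf_counter()
Y_dd = apply_selected_weights(W_bank, X, lambda l: l % N)  # sequential modulo-N selection
t2 = time.perf_counter()
print(f"single weight: {t1 - t0:.4f} s   multiple weight: {t2 - t1:.4f} s")
```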
Test Case Overview

Each data set is characterized as JxKxL: J and K set the weight matrix dimensions, and L is the data set length.

Single Weight Multiply: one JxK weight matrix is applied to a KxL data set.

Multiple Weight Multiply: N JxK weight matrices are applied across the KxL data set, one selected per column (N = # of weight matrices = 10).

• Complex data
  – Single precision, interleaved
• Two test architectures
  – PowerPC G4 (500 MHz)
  – Pentium 4 (2.8 GHz)
• Four test cases (a cache flush sketch follows below)
  – Single weight multiply w/o cache flush
  – Single weight multiply w/ cache flush
  – Multiple weight multiply w/o cache flush
  – Multiple weight multiply w/ cache flush
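The "w/ cache flush" variants presumably evict the benchmark's working set between timed runs. One common way to do this, sketched below, is to stream through a buffer much larger than the last-level cache; the 64 MiB size is an assumption, not a figure from the source.

```python
# Evict the benchmark working set by streaming a large buffer through the
# cache between timed runs; the 64 MiB buffer size is an assumption.
import numpy as np

_scratch = np.zeros(64 * 1024 * 1024 // 8, dtype=np.float64)

def flush_cache():
    """Touch every cache line of a buffer larger than the last-level cache."""
    _scratch += 1.0   # in-place update forces each line through the CPU
```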
Data Set #1 – LxLxL

[Plots: MFLOP/sec vs. length L for the single weight multiply (an LxL weight matrix applied to an LxL input) and the multiple weight multiply, each with and without cache flush]
Data Set #2 – 32x32xL

[Plots: MFLOP/sec vs. length L for the single weight multiply (a 32x32 weight matrix applied to a 32xL input) and the multiple weight multiply, each with and without cache flush]
Data Set #3 – 20x20xL

[Plots: MFLOP/sec vs. length L for the single weight multiply (a 20x20 weight matrix applied to a 20xL input) and the multiple weight multiply, each with and without cache flush]
Data Set #4 – 12x12xL

[Plots: MFLOP/sec vs. length L for the single weight multiply (a 12x12 weight matrix applied to a 12xL input) and the multiple weight multiply, each with and without cache flush]
Data Set #5 – 8x8xL

[Plots: MFLOP/sec vs. length L for the single weight multiply (an 8x8 weight matrix applied to an 8xL input) and the multiple weight multiply, each with and without cache flush]
Performance Observations

• Data dependent processing cannot be optimized as well
  – Branching prevents compilers from "unrolling" loops to improve pipelining
• Data dependent processing does not allow efficient "reuse" of already loaded data
  – Algorithms cannot make simplifying assumptions about already having the data that is needed
• Data dependent processing does not vectorize
  – Using either AltiVec or SSE is very difficult since data movement patterns become data dependent
• Higher memory bandwidth reduces the impact of cache flushes
• Actual performance scales roughly with clock rate rather than with theoretical peak performance

Approximately 3x to 10x performance degradation; processing efficiency of 2-5% depending on CPU architecture.
Outline

• KASSPER Program Overview
• KASSPER Processor Testbed
• Computational Kernel Benchmarks
  – STAP Weight Application
  – Knowledge Database Pre-processing
• Summary
Knowledge Database Architecture

[Block diagram of the Knowledge Database: a Knowledge Manager fields lookup/update requests (GPS, INS, user inputs, etc.) and new knowledge (time critical targets, etc.), coordinates with the Look-ahead Scheduler, and issues load/store/send commands. A Knowledge Cache exchanges stored data with the Knowledge Store, feeds Knowledge Processing and off-line Pre-Processing, and sends knowledge to Signal Processing, which combines it with sensor data and passes results to Downstream Processing (Tracker, etc.). An Off-line Knowledge Reformatter converts raw a priori raster and vector knowledge (DTED, VMAP, etc.) into the Knowledge Store format.]
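The diagram implies a load/store/send interface between the knowledge manager, cache, and store. The class below is a speculative Python sketch of such an interface; the names, LRU policy, and dict-backed store are assumptions, not the testbed's design.

```python
# Speculative sketch of a knowledge cache with a load/store interface;
# all names and the LRU eviction policy are assumptions.
from collections import OrderedDict

class KnowledgeCache:
    """Bounded cache between the knowledge store and the processing stages."""
    def __init__(self, store, capacity=128):
        self.store = store                    # backing knowledge store (dict-like)
        self.capacity = capacity
        self._cache = OrderedDict()

    def load(self, key):
        """Fetch a knowledge tile, promoting it to most recently used."""
        if key not in self._cache:
            if len(self._cache) >= self.capacity:
                self._cache.popitem(last=False)   # evict least recently used
            self._cache[key] = self.store[key]
        self._cache.move_to_end(key)
        return self._cache[key]

    def store_result(self, key, value):
        """Write new or processed knowledge through to the backing store."""
        self._cache[key] = value
        self._cache.move_to_end(key)
        self.store[key] = value
```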
Static Knowledge

• Vector Data
  – Each feature is represented by a set of point locations:
    Points – longitude and latitude (i.e. towers, etc.)
    Lines – list of points; first and last points are not connected (i.e. roads, rail, etc.)
    Areas – list of points; first and last points are connected (i.e. forest, urban, etc.)
  – Standard vector format: Vector Smart Map (VMAP)
• Matrix Data
  – Rectangular arrays of evenly spaced data points
  – Standard raster format: Digital Terrain Elevation Data (DTED)

[Figure: VMAP trees (area) and roads (line) overlaid on DTED terrain]
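These knowledge types map naturally onto simple data structures. The Python sketch below is purely illustrative; its class names and fields are assumptions rather than the testbed's actual schema.

```python
# Illustrative containers for vector and raster static knowledge;
# names and fields are assumptions for exposition.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

LonLat = Tuple[float, float]      # (longitude, latitude) in degrees

@dataclass
class PointFeature:               # e.g. towers
    location: LonLat

@dataclass
class LineFeature:                # e.g. roads, rail: endpoints not connected
    points: List[LonLat]

@dataclass
class AreaFeature:                # e.g. forest, urban: first/last points connected
    boundary: List[LonLat]

@dataclass
class RasterTile:                 # e.g. a DTED elevation tile
    origin: LonLat                # grid corner
    spacing_deg: float            # even grid spacing
    values: np.ndarray            # rectangular array of evenly spaced samples
```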
Terrain Interpolation

[Diagram: geometry relating geographic position, slant range, terrain height, and Earth radius to the sampled terrain height and type data.]

Converting from radar to geographic coordinates requires iterative refinement of the estimate.
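The iteration arises because the ground point implied by a slant range depends on the terrain height at that point, which is known only once the ground point is known. The sketch below shows this fixed-point refinement under a flat-earth simplification (the real geometry uses the Earth radius, per the diagram); all names and tolerances are illustrative.

```python
# Fixed-point refinement of the ground point for a given slant range;
# flat-earth simplification, illustrative names and tolerances.
import math

def radar_to_ground(platform_alt_m, slant_range_m, terrain_height_at,
                    tol_m=1.0, max_iter=10):
    """Iterate until the assumed terrain height matches the lookup."""
    h_terrain = 0.0                           # initial guess: sea level
    ground_range = 0.0
    for _ in range(max_iter):
        dz = platform_alt_m - h_terrain       # platform height above terrain
        ground_range = math.sqrt(max(slant_range_m**2 - dz**2, 0.0))
        h_new = terrain_height_at(ground_range)   # sampled terrain height lookup
        if abs(h_new - h_terrain) < tol_m:
            break
        h_terrain = h_new
    return ground_range, h_terrain
```

Each call performs a data dependent terrain lookup per iteration, which is the access pattern behind the low efficiencies in the results that follow.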
Terrain Interpolation Performance Results

CPU Type      Time per interpolation      Processing Rate (efficiency)
P4 2.8 GHz    1.27 µsec/interpolation     144 MFlop/sec (1.2%)
G4 500 MHz    3.97 µsec/interpolation     46 MFlop/sec (2.3%)

• Data dependent processing does not vectorize
  – Using either AltiVec or SSE is very difficult
• Data dependent processing cannot be optimized as well
  – Compilers cannot "unroll" loops to improve pipelining
• Data dependent processing does not allow efficient "reuse" of already loaded data
  – Algorithms cannot make simplifying assumptions about already having the data that is needed

Processing efficiency of 1-3% depending on CPU architecture.
Outline

• KASSPER Program Overview
• KASSPER Processor Testbed
• Computational Kernel Benchmarks
  – STAP Weight Application
  – Knowledge Database Pre-processing
• Summary
Summary

• Observations
  – The use of knowledge processing provides a significant system performance improvement
  – Compared to traditional signal processing algorithms, the implementation of knowledge-enabled algorithms can result in a factor of 4 or 5 lower CPU efficiency (as low as 1%-5%)
  – Next-generation systems that employ cognitive algorithms are likely to have similar performance issues
• Important Research Directions
  – Heterogeneous processors (i.e. a mix of COTS CPUs, FPGAs, GPUs, etc.) can improve efficiency by better matching hardware to the individual problems being solved. For what class of systems is the current technology adequate?
  – Are emerging architectures for cognitive information processing needed (e.g. Polymorphous Computing Architecture (PCA), Application Specific Instruction-set Processors (ASIPs))?