DSPs Vs General Purpose Microprocessors

AOE 5984 – Real Time Systems
Common DSP Applications…
• Communications
• Audio and video processing
• Graphics, 3-D rendering
• Navigation, radar, GPS
• Control: robotics, guidance, machine vision
• Filtering
• Time-frequency transforms (FFT/IFFT)
Common DSP Tasks…
• Modulation/demodulation, error correction
• Noise reduction, equalization, echo cancellation
• Audio compression
• Vector and matrix calculations
• Control algorithms
DSPs Need to…
• Perform repetitive numerical calculations efficiently
• Maintain numeric fidelity
• Provide high memory bandwidth
• Handle streaming data
• Process data in real time
DSPs Need to Minimize…
• Real-time execution unpredictability
• Memory use
• Power consumption
• Cost
• Development time
What Do DSPs Have?
• Specialized memory architecture (Harvard)
• Specialized parallel execution units
• Specialized addressing modes
• Specialized instruction sets for parallel execution
• Specialized peripherals
FIR Filtering…
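[Figure: FIR filter as a tapped delay line. The input x passes through a chain of delay elements D; the taps are weighted by coefficients h0, h1, …, hn and summed to form the output y.]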
Each tap requires:
1. Two data fetches
2. A multiply operation
3. An accumulate operation
4. Input vector shifting
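The four per-tap steps above can be sketched in C as a direct-form FIR that computes one output sample per call. This is an illustrative sketch, not DSP-specific code: `fir_step` is a hypothetical name, floating point is used for readability, and the delay line is assumed to start zero-filled with `delay[k]` holding x[n-k].

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: one output sample per call (illustrative sketch).
 * h:     coefficients h[0..ntaps-1]
 * delay: delay line; after the shift, delay[k] holds x[n-k] */
double fir_step(const double *h, double *delay, size_t ntaps, double x_new)
{
    assert(ntaps > 0);

    /* Step 4 (input vector shifting): move every stored sample one slot older. */
    for (size_t k = ntaps - 1; k > 0; --k) {
        delay[k] = delay[k - 1];
    }
    delay[0] = x_new;

    double acc = 0.0;
    for (size_t k = 0; k < ntaps; ++k) {
        /* Steps 1-3: fetch h[k] and x[n-k], multiply, accumulate. */
        acc += h[k] * delay[k];
    }
    return acc;
}
```

Feeding a unit impulse (1.0 followed by zeros) returns the coefficients one by one, which is a quick sanity check. The explicit shifting loop is exactly the overhead that the hardware circular buffers described below remove.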
Multiply-Accumulate (MAC)
• Multiplication in a single cycle
• Execution time ~ 200 ns
[Figure: MAC unit block diagram with register, multiplier, ALU and accumulator.]
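A software model of the MAC data path may help make the register → multiplier → ALU → accumulator flow concrete. This is a sketch under stated assumptions: 16-bit operands, `mac_dot` is a hypothetical name, and `int64_t` merely stands in for the extended-precision accumulator (for example 40 bits on some fixed-point DSPs) whose extra guard bits keep long sums from overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a MAC unit: each iteration multiplies two 16-bit
 * operands into a 32-bit product and adds it to a wider accumulator.
 * int64_t stands in for the DSP's extended-precision accumulator. */
int64_t mac_dot(const int16_t *x, const int16_t *h, size_t n)
{
    assert(n == 0 || (x != NULL && h != NULL));
    int64_t acc = 0;                                     /* accumulator register */
    for (size_t i = 0; i < n; ++i) {
        int32_t product = (int32_t)x[i] * (int32_t)h[i]; /* multiplier output */
        acc += product;                                  /* ALU add into accumulator */
    }
    return acc;
}
```

Summing three products of 32767 × 32767 already exceeds the 32-bit range; the wide accumulator holds the result exactly, which is why DSP accumulators are wider than the multiplier output.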
Special Hardware Units…
• Hardware shifter
• Hardware circular buffers
• Special hardware for zero-overhead looping
• Special address generation units
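To illustrate what a hardware circular buffer saves, here is the same FIR computation with a circular delay line written out in software. It is a sketch only: the filter length `NTAPS`, the `circ_line` type and `fir_circ_step` are hypothetical names, and the modulo wrap done here with `%` is what a DSP's address generator would perform for free.

```c
#include <assert.h>
#include <stddef.h>

#define NTAPS 4  /* illustrative filter length (assumption) */

/* Circular delay line: instead of shifting samples, only the write
 * index moves, wrapping modulo NTAPS. */
typedef struct {
    double buf[NTAPS];
    size_t head;      /* index of the newest sample x[n] */
} circ_line;

double fir_circ_step(const double h[NTAPS], circ_line *line, double x_new)
{
    assert(line != NULL);
    line->head = (line->head + 1) % NTAPS;   /* advance write index with wrap-around */
    line->buf[line->head] = x_new;           /* overwrite the oldest sample */

    double acc = 0.0;
    size_t idx = line->head;
    for (size_t k = 0; k < NTAPS; ++k) {
        acc += h[k] * line->buf[idx];        /* h[k] * x[n-k] */
        idx = (idx + NTAPS - 1) % NTAPS;     /* step backwards, wrapping */
    }
    return acc;
}
```

A `circ_line` initialized as `{{0.0}, 0}` behaves like a zero-filled delay line. Per sample, only one store is needed instead of NTAPS - 1 moves.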
Address Generation Units…
• Work in parallel with the DSP core's execution units.
• Generate new addresses without pausing execution to calculate them.
• Exploit the predictable data access patterns of DSP algorithms through special addressing modes, e.g., register-indirect addressing with post-increment, circular (modulo) addressing, and bit-reversed addressing, all implemented in hardware.
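Bit-reversed addressing is mainly used to reorder data for radix-2 FFTs. The following sketch shows the index computation a GPP has to do in software; `bit_reverse` and `bit_reverse_permute` are hypothetical names, and the array length is assumed to be exactly 2^bits.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of index i (e.g. 3 bits: 001 -> 100).
 * A DSP address generator produces this sequence with no extra instructions. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);  /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}

/* In-place bit-reversal permutation of n = 2^bits samples,
 * as used before or after a radix-2 FFT. */
void bit_reverse_permute(double *data, unsigned bits)
{
    assert(bits < 8 * sizeof(unsigned));
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; ++i) {
        unsigned j = bit_reverse(i, bits);
        if (j > i) {             /* swap each pair only once */
            double t = data[i];
            data[i] = data[j];
            data[j] = t;
        }
    }
}
```

For 8 samples the resulting order is 0, 4, 2, 6, 1, 5, 3, 7. The inner loop in `bit_reverse` is the per-access cost that hardware bit-reverse addressing eliminates.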
Von Neumann Architecture…
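[Figure: processor core connected to a single memory holding both code and data, over one address bus and one data bus.]
Steps for one MAC operation:
1. Fetch the MAC instruction
2. Read the value of ‘x’
3. Read the value of ‘h’
4. Multiply x and h and accumulate
5. Write the result to memory
• 4 memory access operations, all sharing one bus
• One multiplication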
Harvard Architecture…
1. Data and code in separate memory segments
2. Multiple address and data buses
3. Double memory bandwidth
4. Simultaneous code and data fetch
[Figure: processor core connected to Memory A via address/data buses AB1/DB1 and to Memory B via AB2/DB2.]
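A back-of-the-envelope comparison of the two architectures, using the four memory accesses counted for one MAC on the Von Neumann slide. The model is a simplifying assumption, not a cycle-accurate simulation: each access takes one bus cycle, each bus carries one access per cycle, and conflicts over which memory holds which operand are ignored. `min_bus_cycles` is a hypothetical name.

```c
#include <assert.h>

/* Lower bound on bus cycles needed for `accesses` memory accesses
 * when `buses` independent buses can each carry one access per cycle. */
unsigned min_bus_cycles(unsigned accesses, unsigned buses)
{
    assert(buses > 0);
    return (accesses + buses - 1) / buses;   /* ceiling division */
}
```

Under this model one MAC needs at least 4 bus cycles with a single shared bus but only 2 with two bus pairs, which is the "double memory bandwidth" point. Real DSPs often add further buses or dual-ported memory to reach one MAC per cycle.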
Caches in DSP and GPP…
1. GPPs normally contain two on-chip caches: one for data and one for instructions.
2. This allows full-speed retrieval of instructions and data without accessing slower off-chip memory.
3. DSPs contain a very small instruction cache and no data cache.
4. GPPs use control logic to decide which code and data go into the cache, while in DSPs this decision is the programmer’s job.
Fixed-Point Arithmetic…
• Most DSPs use fixed-point rather than floating-point arithmetic.
• Fixed-point hardware is faster.
• Fixed-point hardware is cheaper.
• DSPs provide hardware support for saturation arithmetic, rounding and shifting.
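Saturation and rounding, which DSPs do in hardware, can be sketched in C for the common Q15 format (16-bit values representing [-1, 1)). The helper names `sat16`, `q15_mul` and `q15_add` are hypothetical, and the code assumes the compiler implements `>>` on negative integers as an arithmetic shift, which is implementation-defined in C but true of mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 format: int16_t value v represents v / 32768, range [-1, 1). */

/* Clamp a 32-bit intermediate into the int16_t range (saturation). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Q15 multiply with round-to-nearest and saturation.
 * The 16x16 product is Q30; adding 2^14 before shifting right by 15
 * rounds to nearest. Saturation catches the one overflow case,
 * (-1.0) * (-1.0) = +1.0, which is not representable in Q15. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    return sat16((p + (1 << 14)) >> 15);   /* round, rescale to Q15, saturate */
}

/* Q15 add that saturates instead of wrapping around. */
int16_t q15_add(int16_t a, int16_t b)
{
    return sat16((int32_t)a + (int32_t)b);
}
```

Without saturation, adding 30000 and 10000 in 16-bit two's complement would wrap to a large negative value, a far worse error in a filter than clipping at full scale.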
Special Instructions
Why special instructions?
• Multiple operations per instruction cycle.
• Minimize program memory space.
• Specify several parallel operations in a single instruction.
• Trade-off: these instructions permit only restricted access to registers and do not allow arbitrary combinations of operations.
Special Instructions…
MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+N4,Y0
• Multiply the contents of X0 and Y0.
• Add the result to accumulator A.
• Load register X0 from the X memory location pointed to by R0.
• Load register Y0 from the Y memory location pointed to by R4.
• Post-increment R0 by 1.
• Post-increment R4 by the contents of register N4.
This instruction calculates one tap of the FIR filter in a single instruction cycle.
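To show how much work the single MAC instruction above packs together, here is a hypothetical C model of it. Assumptions: X and Y memories are separate arrays (Harvard style), R0, R4 and N4 are plain array indices, word sizes are simplified to `int32_t` data and an `int64_t` accumulator, and fractional arithmetic is ignored. `dsp_regs` and `mac_parallel` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register state for the model. */
typedef struct {
    int32_t x0, y0;        /* data input registers */
    int64_t a;             /* accumulator A */
    size_t r0, r4, n4;     /* address registers and offset register */
} dsp_regs;

/* Model of: MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+N4,Y0 */
void mac_parallel(dsp_regs *s, const int32_t *xmem, const int32_t *ymem)
{
    assert(s != NULL && xmem != NULL && ymem != NULL);
    /* On the DSP all of this happens in one instruction: the current X0
     * and Y0 feed the multiplier while the next operands are loaded. */
    int64_t product = (int64_t)s->x0 * s->y0;
    int32_t next_x0 = xmem[s->r0];
    int32_t next_y0 = ymem[s->r4];

    s->a += product;       /* A = A + X0*Y0 */
    s->x0 = next_x0;       /* X0 <- X:(R0) */
    s->y0 = next_y0;       /* Y0 <- Y:(R4) */
    s->r0 += 1;            /* (R0)+   : post-increment by 1 */
    s->r4 += s->n4;        /* (R4)+N4 : post-increment by N4 */
}
```

In a filter loop, X0 and Y0 would be preloaded with the first sample and coefficient, the modeled instruction repeated once per remaining tap, and a final multiply-accumulate would use the last loaded pair. In C this takes a function call and several statements; the DSP issues it as one instruction.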
Execution Time Predictability…
• Many non-DSP applications only need a bounded average response time (soft real time).
• DSP applications are hard real time.
• It is important to be able to calculate the required processing time exactly, or at least its worst case.
• GPPs have poor execution time predictability.
• This lack of execution time predictability makes code optimization harder.
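The hard real-time requirement above can be turned into a simple budget check. The sketch assumes the deterministic DSP case: one MAC per cycle with zero-overhead looping, plus a fixed per-sample overhead in cycles, so worst-case and actual time coincide. The clock rate, sample rate and overhead values in the usage note are hypothetical, and `fir_meets_deadline` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does an N-tap FIR finish within one sample period?
 * Assumes one tap per cycle plus a fixed per-sample overhead. */
bool fir_meets_deadline(uint64_t clock_hz, uint64_t sample_rate_hz,
                        uint64_t ntaps, uint64_t overhead_cycles)
{
    assert(sample_rate_hz > 0);
    uint64_t budget = clock_hz / sample_rate_hz;   /* cycles available per sample */
    uint64_t needed = ntaps + overhead_cycles;     /* deterministic cycle count */
    return needed <= budget;
}
```

For example, at a hypothetical 100 MHz clock and 48 kHz sample rate the budget is 2083 cycles, so 2000 taps plus 50 overhead cycles fit while 2100 taps do not. On a GPP the same check would need a worst-case execution time bound, which, as the next points explain, can be far above the typical case.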
• GPPs use complex algorithms for branch prediction and caching.
• Code is executed speculatively, depending on branch prediction.
• The programmer does not know which instructions and data will be in the cache, or when.
• The worst-case execution time may be an order of magnitude greater than the typical execution time.
• DSPs do not use branch prediction algorithms.
• The programmer decides which instructions go into the cache.
• Most DSPs have no data cache.
Other Features of DSPs and GPPs
• VLIW (Very Long Instruction Word):
• Combines several different instructions into one long instruction word, e.g., a 256-bit word holding eight 32-bit instructions.
• Requires more MACs, ALUs and other execution units.
• GPPs use SIMD (Single Instruction, Multiple Data).