REAL TIME DIGITAL SIGNAL PROCESSING
Seminario de Electrónica: Sistemas Embebidos
Eng. Julian S. Bruno
FI-UBA 2010
Introduction
Why Digital? A brief comparison with analog.
Advantages
Flexibility. Easily modifiable and upgradeable.
Reproducibility. Does not depend on component tolerances. Exactly reproduced from one unit to another.
Reliability. No age or environmental drift.
Complexity. Allows sophisticated applications in only one chip.
The BIG picture
[Figure: Data, Real time algorithms, Results, FAST]
Real Time DSP System
Sampling signals: A very important first step.
Sampling low-pass signals (CT)
The sampling theorem indicates that a continuous signal
can be properly sampled, only if it does not contain
frequency components above one-half of the sampling
rate.
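The consequence of violating the sampling theorem can be sketched numerically. The helper below is an illustration only (the name `alias_frequency` and the 8 kHz sampling rate are assumptions, not taken from the slides): it folds a real tone into the band from 0 to half the sampling rate, which is where the tone appears after sampling.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency, in [0, fs/2], of a real tone at f_hz sampled at fs_hz."""
    f_mod = f_hz % fs_hz               # spectral replicas repeat every fs
    return min(f_mod, fs_hz - f_mod)   # real signals mirror around fs/2


if __name__ == "__main__":
    fs = 8000.0  # hypothetical sampling rate
    for f in (1000.0, 3000.0, 5000.0, 7000.0, 9000.0):
        print(f"{f:7.0f} Hz sampled at {fs:.0f} Hz looks like {alias_frequency(f, fs):.0f} Hz")
```

Tones above half the sampling rate become indistinguishable from lower-frequency tones, which is the frequency ambiguity that an anti-aliasing filter is meant to prevent.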
Nyquist sampling theorem
$$f_S \ge 2 f_N$$
Aliasing and frequency ambiguity
Sampling band-pass signals
Also known as IF sampling, harmonic sampling, sub-Nyquist sampling or undersampling.
$$\frac{2 f_c - B}{m} \ge f_s \ge \frac{2 f_c + B}{m + 1}$$
for any positive integer m for which fs ≥ 2B still holds.
Sampling band-pass signals
m    (2Fc-B)/m    (2Fc+B)/(m+1)    Optimum Fs
1    35.0 MHz     22.5 MHz         22.5 MHz
2    17.5 MHz     15.0 MHz         17.5 MHz
3    11.66 MHz    11.25 MHz        11.25 MHz
4    8.75 MHz     9.0 MHz          -
5    7.0 MHz      7.5 MHz          -
Optimum Fs is defined here as the sampling frequency at which the spectral replications do not butt up against each other except at zero Hz.
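The acceptable band-pass sampling ranges can be reproduced with a short script. This is a sketch under assumptions: the function name is hypothetical, and the values Fc = 20 MHz and B = 5 MHz are inferred from the table entries (2Fc - B = 35 MHz, 2Fc + B = 45 MHz) rather than stated on the slides.

```python
def bandpass_sampling_ranges(fc: float, b: float):
    """Return (m, fs_min, fs_max) for every positive integer m that satisfies
    (2*fc + b)/(m + 1) <= fs <= (2*fc - b)/m.
    Any m that yields a non-empty range also gives fs_min >= 2*b."""
    ranges = []
    m = 1
    while True:
        fs_max = (2.0 * fc - b) / m
        fs_min = (2.0 * fc + b) / (m + 1)
        if fs_min > fs_max:   # the range is empty for this m and all larger m
            break
        ranges.append((m, fs_min, fs_max))
        m += 1
    return ranges


if __name__ == "__main__":
    # Assumed example values, in MHz (inferred from the table, not given explicitly).
    for m, fs_min, fs_max in bandpass_sampling_ranges(20.0, 5.0):
        print(f"m={m}: {fs_min:.2f} MHz <= fs <= {fs_max:.2f} MHz")
```

For m = 4 and m = 5 the lower bound exceeds the upper bound, which is why those table rows have no usable sampling frequency.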
Reconstruction signals
Analog signal X can be reconstructed from its samples by using the following formula:
$$X(t) = \sum_{k=-\infty}^{\infty} X(kT_S)\,\operatorname{sinc}\!\left(\frac{t - kT_S}{T_S}\right)$$
The reconstruction is based on the interpolation of shifted sinc functions.
It is very difficult to generate sinc functions by electronic circuitry.
An approximation of a sinc function is a pulse. A sample and hold circuit performs this approximation.
Sample and hold circuits
Reconstruction Errors
The gain in the desired central band is not constant.
There are high-frequency replicas of the signal spectrum.
Reconstruction Errors Example
Reconstruction Solutions
The gain in the desired central band is not constant. It is possible to compensate for this non-ideality by using an inverse filter as part of the DSP component.
There are high-frequency replicas of the signal spectrum, which can be removed by using a lowpass filter.
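The non-constant in-band gain of a sample and hold output can be quantified with a small sketch. It assumes an ideal zero-order hold, whose magnitude response follows |sinc(f/fs)|; the function names are illustrative, not from the slides.

```python
import math


def zoh_gain(f_hz: float, fs_hz: float) -> float:
    """Magnitude response of an ideal zero-order hold, normalised to 1 at DC."""
    x = f_hz / fs_hz
    if x == 0.0:
        return 1.0
    return abs(math.sin(math.pi * x) / (math.pi * x))


def inverse_filter_gain(f_hz: float, fs_hz: float) -> float:
    """Gain an inverse (compensation) filter would need at f_hz.
    Only meaningful inside the band 0..fs/2, where zoh_gain is non-zero."""
    return 1.0 / zoh_gain(f_hz, fs_hz)


if __name__ == "__main__":
    fs = 48000.0  # hypothetical sampling rate
    for f in (0.0, 6000.0, 12000.0, 24000.0):
        g = zoh_gain(f, fs)
        print(f"f={f:7.0f} Hz  hold gain={g:.4f}  compensation={inverse_filter_gain(f, fs):.4f}")
```

At half the sampling rate the hold attenuates the signal to 2/pi of its DC gain, which is the droop the inverse filter in the DSP compensates for.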
Real Time constraints
Signal Path
The algorithm time (tA) MUST fit within one sampling period (tS), that is, between two consecutive samples.
Thus tA limits the maximum sampling frequency at which a system can work.
The definition of real time is VERY application dependent (it depends on how fast the system being processed evolves).
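The constraint tA ≤ tS can be turned into a quick budget check. The sketch below assumes the algorithm cost is known in processor cycles per sample; the function names and the 100 MHz / 2000-cycle figures are hypothetical.

```python
def max_sampling_rate(clock_hz: float, cycles_per_sample: int) -> float:
    """Highest sampling rate for which the algorithm still finishes in time."""
    t_a = cycles_per_sample / clock_hz   # algorithm time per sample, in seconds
    return 1.0 / t_a


def meets_real_time(clock_hz: float, cycles_per_sample: int, fs_hz: float) -> bool:
    """True when tA <= tS, written as cycles_per_sample * fs <= clock to avoid division."""
    return cycles_per_sample * fs_hz <= clock_hz


if __name__ == "__main__":
    clock = 100e6    # hypothetical 100 MHz core
    cycles = 2000    # hypothetical cost of the algorithm per sample
    print("max fs:", max_sampling_rate(clock, cycles), "Hz")
    print("48 kHz ok:", meets_real_time(clock, cycles, 48000.0))
    print("96 kHz ok:", meets_real_time(clock, cycles, 96000.0))
```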
Real time constraints
Block Processing Mode
4 memory buffers of length N are required for the double-buffering method.
2 memory buffers (in and out) are needed for internal processing by the processor.
A delay of 2NTs is incurred in block processing.
More complicated programming is needed to manage the switching between buffers.
The ADC and DAC can be configured to transfer data samples into the internal memory of the processor using the serial ports and the DMA.
DSP hardware
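The double-buffering scheme and its 2N-sample delay can be illustrated with a small simulation. This is a sketch, not the mechanism of any particular processor: the function name is hypothetical, the ADC/DMA and DAC transfers are modelled as plain list copies, and the three activities that run concurrently in hardware are executed one after another here.

```python
def run_block_processing(samples, n, process):
    """Simulate double-buffered block processing with block length n.
    Uses 2 input and 2 output buffers (4 in total) and returns the DAC output."""
    in_bufs = [[0] * n, [0] * n]
    out_bufs = [[0] * n, [0] * n]
    output = []
    io = 0  # buffer pair currently owned by the ADC/DAC; the core owns the other pair
    for blk in range(len(samples) // n):
        # 1) DAC drains the output buffer it owns (results of block blk-2).
        output.extend(out_bufs[io])
        # 2) ADC/DMA fills the input buffer it owns with block blk.
        in_bufs[io] = list(samples[blk * n:(blk + 1) * n])
        # 3) Core processes the other input buffer (block blk-1).
        out_bufs[1 - io] = process(in_bufs[1 - io])
        io = 1 - io  # swap buffer ownership at the block boundary
    return output


if __name__ == "__main__":
    data = list(range(1, 13))
    print(run_block_processing(data, 2, lambda block: list(block)))
```

With a pass-through `process`, the output equals the input delayed by 2N samples, matching the 2NTs delay stated above.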
What can we do with a DSP?
Almost any linear and nonlinear system (PID controller).
Digital filters (FIR, IIR).
Adaptive systems (LMS algorithm).
Modulators and demodulators.
Any mathematically intensive algorithm (FFT, DCT, WT).
Linear systems implementation
x(n) and h(n) are arrays of numbers. If we want to compute y(n), we have to multiply and sum the last M samples, M being the length of h(n).
This is repeated for every new sample received from the ADC.
As you can see, any linear system uses multiplications, accumulations (sums), and loops intensively.
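The multiply, accumulate and loop pattern described above can be sketched as a sample-by-sample FIR filter. The function names and the circular delay line are illustrative choices, not code from the slides.

```python
def fir_step(h, delay_line, pos, x_new):
    """Store one new sample and compute y(n) = sum over k of h(k) * x(n - k).
    delay_line is used as a circular buffer of the last M samples; pos is the write index."""
    m = len(h)
    delay_line[pos] = x_new
    acc = 0.0
    idx = pos
    for k in range(m):
        acc += h[k] * delay_line[idx]   # one multiply-accumulate (MAC) per tap
        idx = (idx - 1) % m             # step back to the previous sample, wrapping around
    return acc, (pos + 1) % m


def fir_filter(h, x):
    """Run fir_step over a whole input sequence, starting from a zeroed delay line."""
    delay_line = [0.0] * len(h)
    pos = 0
    y = []
    for sample in x:
        out, pos = fir_step(h, delay_line, pos, sample)
        y.append(out)
    return y


if __name__ == "__main__":
    print(fir_filter([0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))
```

Each output sample costs M multiply-accumulates plus loop and addressing overhead, which is why MAC units, zero-overhead loops and circular addressing matter on a DSP.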
Fast Fourier Transform FFT
So, those are DSP math features
Multiply and Accumulate (MAC) units.
ALUs (fixed and floating point).
Barrel shifters.
Depending on the DSP application, more than one of these units is present in modern DSPs, allowing parallelism.
The (modified) Harvard architecture provides multiple operations per cycle.
Summary of desirable features of a DSP
Fast mathematical operations and combinations of them (especially multiply and sum).
Flexible addressing modes (bit reversal, circular buffers, zero-overhead loops).
DSP-specific instruction set (arithmetic shifting, saturating arithmetic, rounding, normalization).
Minimum-overhead peripherals (especially communications devices).
DSP instructions for specific applications (Video, Control, Audio).
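Saturating arithmetic and rounding, two of the instruction-set features listed above, can be illustrated in software. The sketch assumes the common Q15 fixed-point format (16-bit signed values representing fractions in [-1, 1)); the function names are hypothetical and do not correspond to any processor's instructions.

```python
Q15_MAX = 32767
Q15_MIN = -32768


def saturate_q15(value: int) -> int:
    """Clamp an integer to the 16-bit signed range instead of letting it wrap."""
    return max(Q15_MIN, min(Q15_MAX, value))


def add_wrap_q15(a: int, b: int) -> int:
    """Plain two's complement 16-bit addition (wraps on overflow)."""
    return ((a + b + 32768) & 0xFFFF) - 32768


def add_sat_q15(a: int, b: int) -> int:
    """Saturating 16-bit addition."""
    return saturate_q15(a + b)


def mul_q15(a: int, b: int) -> int:
    """Q15 multiply with rounding: add half an LSB before shifting, then saturate."""
    return saturate_q15((a * b + (1 << 14)) >> 15)


if __name__ == "__main__":
    print("wrap:", add_wrap_q15(30000, 10000))
    print("sat: ", add_sat_q15(30000, 10000))
    print("0.5 * 0.5 in Q15:", mul_q15(16384, 16384))
```

On overflow the wrapped result changes sign, while the saturated result stays at full scale, which is usually the less damaging error in a signal path.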
Architectural Features for Efficient Programming
Specialized addressing modes
Hardware Loop Constructs
Cacheable memories
Multiple operations per cycle
Interlocked pipeline
Other important features
Specialized Addressing Modes
Circular buffering
B0 = 0x00; L0 = 44; // Base and length
I0 = 0x00; M0 = 16; // Index and increment
R0 = [I0++M0]; // R0=1 & I0=0x10
R0 = [I0++M0]; // R0=5 & I0=0x20
R0 = [I0++M0]; // R0=9 & I0=0x04
R0 = [I0++M0]; // R0=2 & I0=0x14
R0 = [I0++M0]; // R0=6 & I0=0x24
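The index sequence in the circular buffering listing can be checked with a software model of the wrap-around rule. This is a sketch under assumptions: the buffer is taken to hold eleven 32-bit words with values 1 to 11 at byte addresses 0x00 to 0x28 (inferred from the R0 values in the comments), and the function names are hypothetical.

```python
def circular_increment(index: int, modifier: int, base: int, length: int) -> int:
    """Post-modify an index register with wrap-around inside [base, base + length)."""
    index += modifier
    if length > 0:
        if index >= base + length:
            index -= length
        elif index < base:
            index += length
    return index


def simulate_loads(memory, base, length, index, modifier, count):
    """Model `count` loads of the form R0 = [I0++M0]; return (value, index after load) pairs."""
    trace = []
    for _ in range(count):
        value = memory[index]
        index = circular_increment(index, modifier, base, length)
        trace.append((value, index))
    return trace


if __name__ == "__main__":
    # Assumed buffer contents: word k (value k + 1) lives at byte address 4 * k.
    memory = {4 * k: k + 1 for k in range(11)}
    for value, idx in simulate_loads(memory, base=0x00, length=44, index=0x00, modifier=16, count=5):
        print(f"R0={value} & I0=0x{idx:02X}")
```

The third load wraps because 0x20 + 16 = 48 exceeds the 44-byte length, so the index becomes 48 - 44 = 0x04.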
Bit-Reversal
I0=0; M0=1; // Index and increment
I2=256; P0 = 8;
LOOP(start, end) LC0 = P0;
start: // I0 automatically incremented in bit-reversed progression
R0 = [I0] || I0 += M0 (BREV);
end: // I2 points to the bit-reversed buffer
[I2++] = R0;
Hardware Loop Constructs
Looping is a critical feature in communications processing algorithms.
There are two key looping-related features that can improve performance on a wide variety of algorithms:
- “zero-overhead hardware loop”
- “hardware loop buffers”
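The bit-reversed progression used by the Bit-Reversal listing (needed to reorder FFT data) can be generated in software as follows. The function names are illustrative; a DSP performs this reordering in its address generator at no extra cost.

```python
def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_order(n: int):
    """Index sequence for a buffer of n elements, where n must be a power of two."""
    bits = n.bit_length() - 1
    if n <= 0 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


if __name__ == "__main__":
    print(bit_reversed_order(8))
```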
Cacheable memories
Without caches, today’s high-speed processors would effectively run at much slower speeds because larger applications would only fit in slower external memory.
Programmers would be forced to manually move key code in and out of internal SRAM.
By adding data and instruction caches to the architecture, external memory becomes much more manageable.
Multiple operations per cycle
In addition to performing multiple ALU/MAC operations each core processor cycle, additional data loads and stores can also be completed in the same cycle.
The memory is typically partitioned into sub-banks that can be dual-accessed by the core and optionally by a DMA controller.
There are two multi-issue architectures: VLIW and superscalar.
Interlocked pipeline
In order to increase throughput, DSPs are designed to be pipelined.
When assembly programming is required, the pipeline can make programming more challenging.
The processor automatically handles stalls and bubbles.
Other important features
RISC-like registers and instruction set.
Multiple data/program buses.
DMA controller for handling peripherals.
In traditional fixed-point DSPs, word sizes are usually fixed. However, there is an advantage to having data registers that can be treated as:
- One 64-bit word
- Two 32-bit words
- Four 16-bit words
- Eight 8-bit words
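Treating one wide register as several narrower words can be modelled in software. The sketch below packs four 16-bit lanes into one 64-bit integer and adds them lane by lane; the function names are hypothetical, and real processors do this in a single instruction.

```python
LANE_MASK = 0xFFFF


def pack16x4(lanes) -> int:
    """Pack four 16-bit values into one 64-bit word (lane 0 in the lowest bits)."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & LANE_MASK) << (16 * i)
    return word


def unpack16x4(word: int):
    """Split a 64-bit word into its four 16-bit lanes."""
    return [(word >> (16 * i)) & LANE_MASK for i in range(4)]


def add16x4(a: int, b: int) -> int:
    """Lane-wise 16-bit addition: a carry out of one lane never reaches the next."""
    return pack16x4([(x + y) & LANE_MASK for x, y in zip(unpack16x4(a), unpack16x4(b))])


if __name__ == "__main__":
    a = pack16x4([0xFFFF, 2, 3, 4])
    b = pack16x4([1, 20, 30, 40])
    print("lane-wise add:", unpack16x4(add16x4(a, b)))
    print("plain 64-bit add:", unpack16x4((a + b) & 0xFFFFFFFFFFFFFFFF))
```

The plain 64-bit addition lets the carry from lane 0 leak into lane 1, which is why sub-word operations need dedicated hardware support.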
DSP classification
Fixed or Floating point arithmetic.
Millions of multiply–accumulate operations per second, MMACs.
Millions of floating-point operations per second, MFLOPS.
Application-specific features (video, audio, control, communications).
Memory.
Why DSP hardware?
Special-purpose (custom) chips such as application-specific integrated circuits (ASIC).
Field-programmable gate arrays (FPGA).
General-purpose microprocessors or microcontrollers (μP/μC).
General-purpose digital signal processors (DSP processors).
DSP processors with application-specific hardware (HW) accelerators.
TI Processors
C6000™ High Performance DSPs
Ideal for imaging, broadband infrastructure and performance audio applications.
C6000™ Performance Value DSPs
Ideal for broadband infrastructure and performance audio applications. Lower cost.
C6000™ Floating-point DSPs
Ideal for professional audio products, biometrics, medical, industrial, digital imaging, speech recognition, conference phones and voice-over-packet.
C5000™ Power-Efficient DSPs
Optimized for power- and cost-efficient embedded signal processing solutions.
C2000™ 32-bit Real-time MCUs
Optimized core can run multiple complex control algorithms at speeds necessary for demanding control applications.
C5000™ DSP Platform Roadmap
C6000™ DSP Platform Roadmap
TI’s ARM Processor-Based Products
Sitara™ ARM Microprocessors
Cortex™-A8 and ARM9™-based embedded microprocessors
Clock Speed: 300 MHz to 1.5 GHz
3D Graphics Accelerator and Power technology (OMAP)
Stellaris® MCU
ARM Cortex-M3
Clock Speed: Up to 100 MHz
Up to 125 MIPS (at 100 MHz)
Advanced integration: Serial interfaces, motion control, system, analog
DaVinci™ Digital Media Processors
Optimized for digital video systems
ARM9 only, ARM9 + DSP and DSP only.
OMAP™ Applications Processors
ARM9-based devices. LP, general-purpose, multimedia and graphics processing
ARM® Cortex™-A8 core
C64x+™ DSP and Video Accelerators
3-D Graphics Acceleration
ADI Processors
TigerSHARC® Processors
32-bit fixed-point as well as floating-point
Clock Speed: 250MHz to 600MHz
4.8 GMACs of 16-bit performance / 3.6 GFLOPs
24 Mbits of on-chip memory
5 Gbytes of I/O bandwidth
SHARC® Processors
32-Bit floating-point
Clock Speed: 150MHz to 400MHz / 2.4 GFLOPs.
Accelerator Architecture: FIR, IIR, FFT.
Blackfin® Processors
16/32-bit fixed point
Clock Speed: 200MHz to 756MHz / 1.5 GMACs
Very low power consumption: 0.23mW/MHz
RTOS supported. Multicore 600MHz / 2.4 GMACs.
ADSP-21xx Processors
16/32-bit fixed point
Clock Speed: 75MHz to 160MHz
Analog Devices brought its first programmable processor to market in 1986
ADSP-21xx Processors
ADSP-2191 BLOCK DIAGRAM
Blackfin Processors
ADSP-BF536/ADSP-BF537 BLOCK DIAGRAM
SHARC Processors
ADSP-2146x BLOCK DIAGRAM
TigerSHARC Processor
ADSP-TS201S BLOCK DIAGRAM
Markets and Applications
Recommended bibliography
R. G. Lyons, Understanding Digital Signal Processing, 2nd ed. Prentice Hall, 2004.
Ch2: Periodic Sampling
S. W. Smith, The Scientist and Engineer’s Guide to DSP. California Tech. Pub., 1997.
Ch1: The Breadth and Depth of DSP
Ch3: ADC and DAC
S. M. Kuo, B. H. Lee, Real-Time Digital Signal Processing, 2nd ed. John Wiley and Sons, 2006.
Ch1: Introduction to Real-Time Digital Signal Processing
NOTE: Many images used in this presentation were extracted from the recommended bibliography.
Questions?
Thank you!