DSP—Why So Hard?
November 2010
Who ?
 Peter.Eastty@Oxford-Digital.com
 Design and sell processor cores and matching
programming environments.
 Program strange algorithms onto stranger
processors with the strangest tools.
 Customers
 NDAs
 Client lists.
 You
Why ?
 People expensive, Silicon cheap.
 People slow, Silicon fast.
 People slow, Computers fast.
 Programmer efficiency is
everything: it gets you time to
market, which gets you the
market.
Why ?
 Targets move, right up to the last minute.
 Never, ever build a fixed function device.
 Three stories.
 Third party algorithms will have to be
adopted whether public domain or highly
secret.
What is an audio signal?
 Known bandwidth
 Known resolution
 Known number of channels
 So why don’t I enumerate them?
Where ?
 Large mixing consoles
 Cell Phones
 Hi Fi
 TVs
 iPod docks
 PCs
 etc.
What processing do we want to do to
the audio?
 Continuously varying value against time
 Filtering
 Polynomial
 Non-linear
 Decisions
 DSP = Low Delay
 Not block structured
What hardware resources are
available?
 Memory
 Multipliers
 Connections
 Adders etc.
Data types for DSP
 See RBJ on headroom and floating point
 Want a fixed point data type.
 Computer word lengths (16, 32 etc.) have
nothing to do with audio; set up the
word length to suit the audio, not a computer.
Languages for DSP
 C is not a DSP language; the data types
are all wrong and it has no concept of
time.
 C++ could be a DSP language but it
doesn’t want to be one; it too has no
concept of time.
Languages for DSP
 With modern hardware design and
compiler technology there is never a need
for assembler. NEVER EVER.
 Of course if you’re tied to old hardware for
legacy code reasons you still might have
to hack in assembler.
Languages for DSP
main (…)
{ ASM(…);
ASM(…);
ASM(…);
}
/* This is not C. */
Languages for DSP
main (…)
{ Clear_Acc();
MAC(…);
Store_Acc_to_Register(…);
}
/* This isn’t either. */
Languages for DSP
main (…)
{ Multiply_by_Coefficient();
Biquad(…);
Do_FFT(…);
}
/* Neither is this. */
Languages for DSP
 Beware of ‘optional extensions’.
 They can become mandatory.
 There is still at least one University
teaching DSP using FORTRAN and
assembler …
 ...sad to say they apologized about the
FORTRAN.
Languages for DSP
 I don’t know the perfect DSP language.
 But any high level language is better than
any machine specific language.
Multiple Memory Banks
 If there are multiple memories then
memory allocation is NOT the
programmer’s job; the tool-chain should do
this for you.
 But it might be nice to be able to do some
if you want to.
Multiply-Add
 Source level individual operations (add,
multiply etc.) should be independent,
hardware instructions can combine
multiple operations (like Multiply-Add).
 Make sure the operations in a combined
instruction are exactly the same as those
in individual instructions.
Limiting
 Whatever number system you use it will
have a range, even floating point.
 Limiting will be required after every
operation that can exceed the range,
multiply, add, subtract and absolute value.
 This includes the multiply in a multiply-add.
 -1 x -1 = -1 ????????
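The -1 x -1 case can be made concrete with a minimal sketch, assuming a 16-bit Q15 fraction (the slides do not name a format; `q15_t` and the function names are illustrative only). In Q15 the product of the two most negative values is +1.0, which is not representable, so a multiplier without limiting wraps back to -1.0.

```c
#include <stdint.h>

/* Assumed format: Q15, a 16-bit signed fraction in [-1.0, +1.0 - 2^-15]. */
typedef int16_t q15_t;

/* Clamp a wide intermediate result into the Q15 range. */
q15_t q15_limit(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (q15_t)x;
}

/* Multiply with no limiting: the Q30 product is cut down to 16 bits.
   The narrowing conversion is implementation-defined in ISO C; mainstream
   compilers wrap, which is the behaviour being illustrated. Right shift of
   a negative product is assumed to be arithmetic. */
q15_t q15_mul_wrap(q15_t a, q15_t b)
{
    int32_t p = ((int32_t)a * b) >> 15;
    return (q15_t)p;
}

/* Multiply with limiting applied to the product. */
q15_t q15_mul_sat(q15_t a, q15_t b)
{
    return q15_limit(((int32_t)a * b) >> 15);
}

/* Add with limiting applied to the sum. */
q15_t q15_add_sat(q15_t a, q15_t b)
{
    return q15_limit((int32_t)a + b);
}
```

With these definitions, `q15_mul_wrap(INT16_MIN, INT16_MIN)` gives -1.0 again, while `q15_mul_sat` limits to the largest positive value. The same limiter would sit after the multiply inside a combined multiply-add.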
Pipelines
 User should never have to think about
pipelines.
 Variable pipelines are wrong.
 Pipeline is not a panacea for timing
problems, it limits the processing in a loop.
 Pushing code through a branch.
 Using the pipe for parameter passing.
Pipelines
 Definition of pipeline length: the count
between the instruction that generates an
item and an instruction that may use it.
 Short circuiting the pipe. Useful, but not
very useful.
 Can unwind the execution by having
pipeline-length prime relative to instruction
count, but this adds to delay, which in turn
adds to storage requirement.
Branching
 If you can find another way, avoid
branches.
 If you have to have jumps and a pipeline,
keep it all away from the programmer.
 If you do have jumps they’ll likely break
the guaranteed timing.
Conditional Execution
 Conditional execution doesn’t break
pipeline etc.
 But you’ll need as many condition code
stores as you have pipeline length.
 Timing is identical for conditional execution
and multiplexer.
Multiplexer
 y = (a < 0.0) ? b : c;
 Timing is identical for conditional execution
and multiplexer.
 With multiplexer you can use any variable
as a control so no condition code store is
required.
 y = (a <= 0.0) ? b : c;
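One way to picture the multiplexer in software is a select whose control is an ordinary data value, with no branch and no separate condition code. This is only a sketch on 32-bit integers; `mux_neg` is a hypothetical name, and on a DSP with a multiplexer instruction the whole function would be one operation.

```c
#include <stdint.h>

/* y = (a < 0) ? b : c, built without a branch.
   The comparison result (0 or 1) is turned into an all-zeros or
   all-ones mask, which then selects between b and c bit by bit. */
int32_t mux_neg(int32_t a, int32_t b, int32_t c)
{
    int32_t mask = -(int32_t)(a < 0);   /* 0 or -1 (all ones) */
    return (b & mask) | (c & ~mask);
}
```

Because every path executes the same operations, the timing does not depend on the data.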
ABS
 For simple bends in an input/output
relationship, Absolute Value plus some
Addition and Subtraction is more
economical than most other methods.
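Two common bends built from absolute value, addition and subtraction are sketched here, using plain `int` for readability. The function names are illustrative, and the intermediate sums are assumed to have one bit of headroom.

```c
#include <stdlib.h>

/* Symmetric clip to [-t, +t], for t >= 0:
   clip(x) = (|x + t| - |x - t|) / 2
   x + t and x - t always differ by an even amount, so the division
   by two is exact. */
int clip_abs(int x, int t)
{
    return (abs(x + t) - abs(x - t)) / 2;
}

/* Half-wave rectifier: (x + |x|) / 2 gives x for x >= 0, else 0. */
int halfwave_abs(int x)
{
    return (x + abs(x)) / 2;
}
```

Neither function contains a comparison or a branch, which is what makes the approach attractive on a processor with guaranteed timing.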
Truncation, Rounding, Dither and
Noise Shaping
 For every instruction that needs it …
… and just for Output
 Assume fixed point
 Floating point is hard
Truncation, Rounding and Truncation
Towards Zero!
 Truncation is easy but has DC offset
 Truncation Towards Zero!
 ½ LSB offset number systems
 Rounding wins and is not much more
complex.
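The DC offset can be checked with a small experiment. The sketch below drops two bits from every value in -16..15 and sums the error of each method. Function names are illustrative, and `>>` on a negative value is assumed to be an arithmetic shift.

```c
#include <stdint.h>

/* Assumption: >> on a negative value is an arithmetic shift (true on
   mainstream compilers, implementation-defined in ISO C). */

/* Plain truncation: drop the low n bits (rounds towards minus infinity). */
int32_t trunc_floor(int32_t x, unsigned n) { return x >> n; }

/* Truncation towards zero: integer division discards the fraction. */
int32_t trunc_zero(int32_t x, unsigned n) { return x / (1 << n); }

/* Rounding: add half an output LSB, then truncate (n >= 1). */
int32_t round_half_up(int32_t x, unsigned n)
{
    return (x + (1 << (n - 1))) >> n;
}

/* Sum of (requantised - original) over x = -16..15 when dropping
   two bits, in units of the input LSB. A nonzero sum is a DC offset. */
int32_t error_sum(int32_t (*q)(int32_t, unsigned))
{
    int32_t sum = 0;
    for (int32_t x = -16; x <= 15; ++x)
        sum += q(x, 2) * 4 - x;
    return sum;
}
```

Over these 32 values plain truncation accumulates an error of -48 input LSBs and rounding +16. Truncation towards zero sums to zero, but only because its errors have opposite signs on either side of zero: every input from -3 to 3 maps to 0, so the step at zero is twice as wide as the others.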
Dither
 How do we make it?
 Truly random, pseudo random, hash?
 What colour do we want it to be?
 What PDF do we want?
 Make sure it’s un-correlated.
 Want repeatability for test.
 Problems with infinite gain components.
 Rounding wins.
Noise shaping
 What shape?
 What order?
 Want repeatability for test.
 Problems with infinite gain components.
 Rounding wins.
 Make sure your instruction set can do
dither and noise shaping.
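As one possible answer to the questions above, here is a sketch of a requantiser with pseudo-random triangular-PDF dither and first-order error feedback. The choices are assumptions made for illustration, not taken from the slides: a linear congruential generator (seeded, so test runs repeat), TPDF dither, and a (1 - z^-1) noise shape. All names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t rng;  /* generator state; a fixed seed gives repeatable runs */
    int32_t  err;  /* previous total quantisation error, in input LSBs */
} requant_t;

/* Linear congruential generator (Numerical Recipes constants). */
uint32_t lcg_next(uint32_t *s)
{
    *s = *s * 1664525u + 1013904223u;
    return *s;
}

/* Reduce x by `drop` bits (1..15). Arithmetic right shift is assumed. */
int32_t requantize(requant_t *q, int32_t x, unsigned drop)
{
    int32_t step = (int32_t)1 << drop;
    int32_t mask = step - 1;

    /* TPDF dither: sum of two uniform values, centred on zero. */
    int32_t r1 = (int32_t)((lcg_next(&q->rng) >> 16) & (uint32_t)mask);
    int32_t r2 = (int32_t)((lcg_next(&q->rng) >> 16) & (uint32_t)mask);
    int32_t d  = r1 + r2 - mask;

    /* First-order noise shaping: subtract the previous error. */
    int32_t shaped = x - q->err;

    /* Add dither, round, and drop the low bits. */
    int32_t y = (shaped + d + step / 2) >> drop;

    /* Remember the error made this time, for the next sample. */
    q->err = y * step - shaped;
    return y;
}
```

Because each sample's error is subtracted from the next input, the errors telescope: over any run, the sum of the outputs (scaled back up) differs from the sum of the inputs only by the final stored error. That is the sense in which the noise is pushed away from DC.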
Coefficient Interpolation
 Coefficients as a sampled system
 SRC called interpolation
 HW or SW, 2-3 instructions to feed one.
 Only in exceptional circumstances is it
worth a hardware solution.
 Linear is possible, first order filter is easy
and works for many applications.
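Both software options can be sketched in a few lines. `double` is used here only for readability (the slides argue for fixed point), and the type and function names are hypothetical. Each tick is a subtract, a multiply and an add, or just an add for the linear ramp, in line with the 2-3 instructions quoted above.

```c
/* First-order filter: the coefficient moves a fraction k of the
   remaining distance towards its target every sample (0 < k <= 1). */
typedef struct { double value, target, k; } coef_smooth_t;

double coef_smooth_tick(coef_smooth_t *c)
{
    c->value += c->k * (c->target - c->value);
    return c->value;
}

/* Linear interpolation: a fixed step per sample for a fixed count. */
typedef struct { double value, step; unsigned remaining; } coef_ramp_t;

void coef_ramp_start(coef_ramp_t *c, double target, unsigned samples)
{
    if (samples == 0) {          /* no ramp requested: jump to the target */
        c->value = target;
        c->step = 0.0;
    } else {
        c->step = (target - c->value) / samples;
    }
    c->remaining = samples;
}

double coef_ramp_tick(coef_ramp_t *c)
{
    if (c->remaining > 0) {
        c->value += c->step;
        c->remaining--;
    }
    return c->value;
}
```

The first-order version never needs to know how long the change should take, which is part of why it is the easy one; the ramp reaches its target in a known number of samples.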
Coefficient Synchronization
 Coefficient synchronisation.
 Lots of people ignore it or treat it on a per
use basis.
 Can be done for linear or first order filters
with ease.
 This is really a synchronous sampling
problem.
Coefficient Synchronization,
Synchronization Jitter
Scaling, multiprocessors,
synchronisation & segmentation
 Not all solutions fit in a single processor.
 Automatic segmentation of programs
across multiple processors is possible.
 But it is hard.
 If the processors are not identical, and
identically connected it’s very, very hard.
Scaling, multiprocessors,
synchronisation & segmentation
 If you have multiple processors and no
branches then you can run them in
lockstep, many examples.
 For data transfer between processors
simply send from one processor and
receive by the other at the same time.
 Disastrous for assembler, easy for
compiler.
Scaling, multiprocessors,
synchronisation & segmentation
 How do you connect multiple processors,
series or parallel?
 If you choose either then you can’t do some
algorithms. Use mesh or router instead.
 Small routers are actually cheap and
relatively easy to generate code for.
 Multiple-processor I/O: dedicated
processor connections, or is I/O a full
member of the clan?
Constant folding and common code
removal.
 Easy in a compiler, often missed by an assembly
language programmer.
 Keep everything as source until the last possible
moment.
 That way common parts can be taken advantage
of: constants, but more importantly data and
instructions.
 Leads to documentation of library functions
requiring “at most X data memories and Y
instructions”.
Libraries
 Binary libraries don’t work well with
pipelined processors; the cost of getting
into or out of them is usually too great.
 A binary library (like a dll) is NOT a secure
method of distributing intellectual property.
 Encrypted source going through a trusted
tool-chain to generate encrypted binaries
is the way to go.
Hardware with problems….
 Let’s just have one continuous data type
(and maybe one integer type).
 Different widths for different memories
makes horrible problems.
 Private instruction sets and ‘Useful’
instructions.
Hardware with problems….
 Do not chisel a digital analogue of an
analogue circuit out of digits.
 Sample rate to silicon clock ratio
Hardware with problems….
 Bi-quad coefficient ranges.
 Feedback coefficient ranges need to be
big enough.
 Feed-forward coefficient ranges are not
limited, they can get big. If there’s
nowhere in your system to make gain,
you’re in trouble.
Hardware with problems….
 The accumulator is dead. When hardware
was expensive and DSP engineers were
cheap it made sense to get performance
this way, but that is no longer true.
 Most of today’s algorithms aren’t sums of
products anyway.
 And it makes a high level description
difficult.
Hardware with problems….
 Double precision is probably not the right
thing for LF filters.
 Choosing the right filter structure and
adding a few bits is a financially better
solution.
Hardware with problems….
 If you must have an accumulator make
sure you can load and store it!
Hardware with problems….
 Shifting is required to get gain into the
system.
 There are few reasons for a shift of greater
than 2^7 and very few for more than 2^15.
 Shift after the multiplier, it’s the only place
where there are the bits to shift.
 Shift in the wrong place is common.
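A small worked case shows why the shift belongs after the multiplier. The sketch assumes Q15 operands and a gain of 2^k; the function names are illustrative. The full Q30 product still holds the low-order bits, so a shorter right shift yields the gain without losing them. Once the result has been cut to 16 bits, shifting left can only bring in zeros.

```c
#include <stdint.h>

/* Limit a wide value to the 16-bit range. */
int16_t limit16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Gain of 2^k taken from the full product: Q15 x Q15 = Q30, so
   shifting right by (15 - k) instead of 15 applies the gain while
   the extra bits are still there. k in 0..15; arithmetic shift assumed. */
int16_t mul_gain_in_product(int16_t x, int16_t c, unsigned k)
{
    int32_t p = (int32_t)x * c;
    return limit16(p >> (15 - k));
}

/* Gain applied in the wrong place: the product is first cut to
   Q15, then scaled up, so the low k bits of the result are zero. */
int16_t mul_gain_after_store(int16_t x, int16_t c, unsigned k)
{
    int32_t y = ((int32_t)x * c) >> 15;
    return limit16(y * ((int32_t)1 << k));
}
```

For x = 3277 (about 0.1), c = 16384 (0.5) and k = 2, the first version returns 6554 and the second 6552: the two low bits have been lost.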
Hardware with problems….
 If a standard 5 coefficient bi-quad takes
more than 5 instructions there’s something
wrong.
 A simple z^-1 delay, and cascades thereof,
should not consume instructions.
 Simple rotating memory, and language
support.
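The idea can be sketched in C, with the caveat that a C compiler will not necessarily emit five instructions; the point is the structure. The histories live in small rotating buffers addressed relative to a sample counter, so a z^-1 delay is only a different address, not a copy. The direct form I layout, `double` arithmetic and all names are assumptions made for illustration.

```c
/* Biquad with five coefficients:
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
typedef struct {
    double b0, b1, b2, a1, a2;
    double x[4], y[4];  /* rotating histories; length is a power of two */
    unsigned n;         /* sample counter, used as the rotating base */
} biquad_t;

double biquad_tick(biquad_t *f, double in)
{
    unsigned n = f->n;

    f->x[n & 3] = in;

    /* Five multiply-adds; the delays are just address offsets.
       Unsigned arithmetic makes (n - 1) & 3 wrap correctly at n = 0. */
    double out = f->b0 * f->x[n & 3]
               + f->b1 * f->x[(n - 1) & 3]
               + f->b2 * f->x[(n - 2) & 3]
               - f->a1 * f->y[(n - 1) & 3]
               - f->a2 * f->y[(n - 2) & 3];

    f->y[n & 3] = out;
    f->n = n + 1;       /* rotate the memory by one sample */
    return out;
}
```

A filter is set up with zeroed state, for example `biquad_t f = { .b0 = 1.0, .a1 = -0.5 };`, and then fed one sample per call.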
Hardware with problems….
 A pipeline needs to be started cleanly.
 This is not always easy.
Debugging
 Source level debugging is perfectly
standard in almost every general purpose
processor toolset, why is it missing from
DSP toolsets?
Debugging
 If you do add a debugger, remember that
the objects you are processing are signals,
thus they vary with time.
 A numerical display of a signal is generally
useless, like using a DVM to analyse
audio, necessary but not sufficient.
 Provide a scope and signal generator.
Debugging
 Debugging Input or Output is a signal.
 Easiest done by the instruction NOT the
location.
How do we make DSP easier?
 Get the algorithm away from the hardware
 Use DSP that is compiler compatible
DSP – Why So Hard?
 Program
 Only
 Signals.
 Easy?
 Yes!
Program Only Signals. Easy? Yes!
DSP—Made Easy!
November 2010