DSP—Why So Hard?
November 2010

Who?
Peter.Eastty@Oxford-Digital.com
Design and sell processor cores and matching programming environments.
Program strange algorithms onto stranger processors with the strangest tools.
Customers, NDAs, client lists.

You

Why?
People expensive, silicon cheap.
People slow, silicon fast.
People slow, computers fast.
Programmer efficiency is everything: it gets you time to market, which gets you the market.
Targets move, right up to the last minute.
Never, ever build a fixed-function device.
Three stories.
Third-party algorithms will have to be adopted, whether public domain or highly secret.

What is an audio signal?
Known bandwidth.
Known resolution.
Known number of channels.
So why don’t I enumerate them?

Where?
Large mixing consoles, cell phones, hi-fi, TVs, iPod docks, PCs, etc.

What processing do we want to do to the audio?
A continuously varying value against time.
Filtering, polynomial, non-linear, decisions.
DSP = low delay, not block structured.

What hardware resources are available?
Memory, multipliers, connections, adders, etc.

Data types for DSP
See RBJ on headroom and floating point.
We want a fixed-point data type.
Word lengths (16, 32, etc.) have nothing to do with audio; set up the word length to suit the audio, not a computer.

Languages for DSP
C is not a DSP language: the data types are all wrong and it has no concept of time.
C++ could be a DSP language, but it doesn’t want to be one; it too has no concept of time.
With modern hardware design and compiler technology there is never a need for assembler. NEVER EVER.
Of course, if you’re tied to old hardware for legacy-code reasons, you might still have to hack in assembler.
main (…) { ASM(…); ASM(…); ASM(…); } /* This is not C. */
main (…) { Clear_Acc(); MAC(…); Store_Acc_to_Register(…); } /* This isn’t either. */
main (…) { Multiply_by_Coefficient(); Biquad(…); Do_FFT(…); } /* Neither is this. */
Beware of ‘optional extensions’. They can become mandatory.
There is still at least one university teaching DSP using FORTRAN and assembler… sad to say, they apologized for the FORTRAN.
I don’t know the perfect DSP language. But any high-level language is better than any machine-specific language.

Multiple Memory Banks
If there are multiple memories, then memory allocation is NOT the programmer’s job; the tool chain should do it for you.
But it might be nice to be able to do some of it yourself if you want to.

Multiply-Add
Individual source-level operations (add, multiply, etc.) should be independent; hardware instructions can combine multiple operations (like multiply-add).
Make sure the operations in a combined instruction behave exactly the same as the corresponding individual instructions.

Limiting
Whatever number system you use, it will have a range, even floating point.
Limiting is required after every operation that can exceed the range: multiply, add, subtract and absolute value.
This includes the multiply in a multiply-add.
-1 × -1 = -1 ???
In a fixed-point format with range [-1, 1), the true product +1 is not representable, so without limiting it wraps round to -1.

Pipelines
The user should never have to think about pipelines.
Variable pipelines are wrong.
A pipeline is not a panacea for timing problems; it limits the processing in a loop.
Pushing code through a branch.
Using the pipe for parameter passing.
Definition of pipeline length: the instruction count between the instruction that generates an item and an instruction that may use it.
Short-circuiting the pipe: useful, but not very useful.
You can unwind the execution by making the pipeline length relatively prime to the instruction count, but this adds delay, which in turn adds to the storage requirement.

Branching
If you can find another way, avoid branches.
If you have to have jumps and a pipeline, keep it all away from the programmer.
If you do have jumps, they’ll likely break the guaranteed timing.

Conditional Execution
Conditional execution doesn’t break the pipeline, etc.
But you’ll need as many condition-code stores as the pipeline length.
Timing is identical for conditional execution and a multiplexer.

Multiplexer
y = (a < 0.0) ? b : c;
With a multiplexer you can use any variable as the control, so no condition-code store is required.
y = (a <= 0.0) ? b : c;

ABS
For simple bends in an input/output relationship, absolute value plus some addition and subtraction is more economical than most other methods.

Truncation, Rounding, Dither and Noise Shaping
For every instruction that needs it… and just for the output.
Assume fixed point; floating point is hard.

Truncation, Rounding and Truncation Towards Zero
Truncation is easy but has a DC offset.
Truncation towards zero!
½ LSB offset number systems.
Rounding wins and is not much more complex.

Dither
How do we make it? Truly random, pseudo-random, hash?
What colour do we want it to be?
What PDF do we want?
Make sure it’s uncorrelated.
We want repeatability for testing.
Problems with infinite-gain components.
Rounding wins.

Noise Shaping
What shape? What order?
We want repeatability for testing.
Problems with infinite-gain components.
Rounding wins.
Make sure your instruction set can do dither and noise shaping.

Coefficient Interpolation
Coefficients form a sampled system; interpolating them is SRC (sample-rate conversion) by another name.
HW or SW: 2-3 instructions to feed one.
Only in exceptional circumstances is a hardware solution worth it.
Linear interpolation is possible; a first-order filter is easy and works for many applications.

Coefficient Synchronization
Lots of people ignore coefficient synchronisation or treat it on a per-use basis.
It can be done with ease for linear or first-order-filter interpolation.
This is really a synchronous sampling problem.

Coefficient Synchronization: Synchronization Jitter

Scaling, multiprocessors, synchronisation & segmentation
Not all solutions fit in a single processor.
Automatic segmentation of programs across multiple processors is possible. But it is hard.
If the processors are not identical and identically connected, it’s very, very hard.
If you have multiple processors and no branches, you can run them in lockstep; there are many examples.
For data transfer between processors, simply send from one processor and receive on the other at the same time.
Disastrous for assembler, easy for a compiler.
How do you connect multiple processors, in series or in parallel? If you choose either, there are algorithms you can’t do.
Use a mesh or a router instead. Small routers are actually cheap and relatively easy to generate code for.

Multiple processors
I/O: dedicated processor connections, or is I/O a full member of the clan?

Constant folding and common code removal
Easy in a compiler, often missed by an assembly-language programmer.
Keep everything as source until the last possible moment. That way common parts can be shared: constants but, more importantly, data and instructions.
This leads to library functions being documented as requiring “at most X data memories and Y instructions”.

Libraries
Binary libraries don’t work well with pipelined processors; the cost of getting into or out of them is usually too great.
A binary library (like a DLL) is NOT a secure method of distributing intellectual property.
Encrypted source going through a trusted tool chain to generate encrypted binaries is the way to go.

Hardware with problems…
Let’s just have one continuous data type (and maybe one integer type).
Different widths for different memories make horrible problems.
Private instruction sets and ‘useful’ instructions.

Do not chisel a digital analogue of an analogue circuit out of digits.
Sample-rate to silicon-clock ratio.

Bi-quad coefficient ranges.
Feedback coefficient ranges need to be big enough (for a stable bi-quad, a1 can approach ±2).
Feed-forward coefficient ranges are not limited; they can get big.
If there’s nowhere in your system to make gain, you’re in trouble.

The accumulator is dead. When hardware was expensive and DSP engineers were cheap, it made sense to get performance this way, but that is no longer true.
Most of today’s algorithms aren’t sums of products anyway.
It also makes a high-level description difficult.

Double precision is probably not the right thing for low-frequency (LF) filters. Choosing the right filter structure and adding a few bits is a financially better solution.

If you must have an accumulator, make sure you can load and store it!

Shifting is required to get gain into the system. There are few reasons for a shift (gain) of more than 2^7 and very few for more than 2^15.
Shift after the multiplier: it’s the only place where there are bits to shift.
Shifting in the wrong place is common.

If a standard 5-coefficient bi-quad takes more than 5 instructions, there’s something wrong.
A simple z^-1 delay, and cascades of them, should not consume instructions: simple rotating memory, plus language support.

A pipeline needs to be started cleanly. This is not always easy.

Debugging
Source-level debugging is perfectly standard in almost every general-purpose processor toolset, so why is it missing from DSP toolsets?
If you do add a debugger, remember that the objects you are processing are signals, so they vary with time.
On its own, a numerical display of a signal is generally useless, like using a DVM to analyse audio: necessary but not sufficient.
Provide a scope and a signal generator.
A debugging input or output is a signal. It is easiest to attach by instruction, NOT by location.

How do we make DSP easier?
Get the algorithm away from the hardware.
Use DSP that is compiler compatible.

DSP – Why So Hard?
Program Only Signals. Easy? Yes!

DSP—Made Easy!
November 2010