REAL TIME DIGITAL SIGNAL PROCESSING
Introduction
Eng. Julian S. Bruno
www.electron.frba.utn.edu.ar/dplab
UTN - FRBA 2011
Simposio Argentino de Sistemas Embebidos

Why Digital? A brief comparison with analog.

Advantages
- Flexibility. Easily modified and upgraded.
- Reproducibility. Does not depend on component tolerances; exactly reproduced from one unit to another.
- Reliability. No aging or environmental drift.
- Complexity. Allows sophisticated applications on a single chip.

The BIG picture
Data, real-time algorithms, results: FAST.
[Figure: Real Time DSP System; labels: t, Fc, N, LF, Fs, MIPS, MFLOPS]

Sampling signals: a very important first step.

Sampling low-pass signals (CT)
The sampling theorem indicates that a continuous signal can be properly sampled only if it does not contain frequency components above one-half of the sampling rate.

Nyquist sampling theorem:
$$f_S \ge 2 f_N$$
where $f_N$ is the highest frequency component of the signal.

Aliasing and frequency ambiguity
[Figure: aliasing example, Fs = 6 kHz]

Sampling band-pass signals
Also known as IF sampling, harmonic sampling, sub-Nyquist sampling, or undersampling.
$$\frac{2 f_c - B}{m} \ge f_s \ge \frac{2 f_c + B}{m + 1}$$
for any positive integer m, provided that $f_s \ge 2B$ is still satisfied.

m    (2Fc-B)/m    (2Fc+B)/(m+1)    Optimum Fs
1    35.0 MHz     22.5 MHz         22.5 MHz
2    17.5 MHz     15.0 MHz         17.5 MHz
3    11.66 MHz    11.25 MHz        11.25 MHz
4    8.75 MHz     9.0 MHz          -
5    7.0 MHz      7.5 MHz          -

Optimum Fs is defined here as the frequency at which the spectral replications do not butt up against each other except at zero Hz.
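The band-pass sampling ranges in the table above can be reproduced with a few lines of code. The following is a minimal Python sketch and not part of the original slides: the function name `bandpass_fs_ranges` is hypothetical, and the values fc = 20 MHz and B = 5 MHz are inferred from the table itself (2fc - B = 35 MHz and 2fc + B = 45 MHz).

```python
def bandpass_fs_ranges(fc_hz, b_hz, max_m=5):
    """Evaluate (2*fc - B)/m >= fs >= (2*fc + B)/(m + 1) for m = 1..max_m.

    Returns a list of (m, upper_hz, lower_hz, valid) tuples.
    """
    rows = []
    for m in range(1, max_m + 1):
        upper = (2 * fc_hz - b_hz) / m
        lower = (2 * fc_hz + b_hz) / (m + 1)
        # Usable only if the range is not empty and fs >= 2B can still hold.
        valid = upper >= lower and upper >= 2 * b_hz
        rows.append((m, upper, lower, valid))
    return rows


if __name__ == "__main__":
    # Assumed example values, inferred from the table (not stated on the slide).
    for m, upper, lower, valid in bandpass_fs_ranges(20e6, 5e6):
        status = "ok" if valid else "no valid fs"
        print(f"m={m}: {upper / 1e6:.2f} MHz >= fs >= {lower / 1e6:.2f} MHz ({status})")
```

For m = 4 and m = 5 the upper limit falls below the lower limit, which is why the table shows no optimum Fs for those rows.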
Reconstruction signals
An analog signal X can be reconstructed from its samples by using the following formula:
$$X(t) = \sum_{k=-\infty}^{\infty} X(kT_S)\,\operatorname{sinc}\!\left(\frac{t - kT_S}{T_S}\right)$$
- The reconstruction is based on the interpolation of shifted sinc functions.
- It is very difficult to generate sinc functions with electronic circuitry.
- An approximation of a sinc function is a pulse.
- A sample and hold circuit performs this approximation.
[Figure: Sample and hold circuits]

Reconstruction Errors
- The gain in the desired central band is not constant.
- There are high-frequency replicas of the signal spectrum.
[Figure: Reconstruction errors example]

Reconstruction Solutions
- The gain in the desired central band is not constant. It is possible to compensate for this non-ideality by using an inverse filter as part of the DSP component.
- There are high-frequency replicas of the signal spectrum, which can be removed by using a lowpass filter.

Real time constraints
- The algorithm's execution time (tA) MUST fit between two consecutive sampling instants, i.e. within one sampling period (tS).
- Thus tA limits the maximum frequency at which a system can work.
- The definition of real time is VERY application dependent (it depends on how fast the system evolves).

Real time constraints: Signal Path

Real time constraints: DSP hardware
Block Processing Mode
- 4 memory buffers of length N are required for the double-buffering method.
- 2 memory buffers (in and out) are needed for internal processing by the processor.
- A delay of 2NTs is incurred in block processing.
- More complicated programming is needed to manage the switching between buffers.
- The ADC and DAC can be configured to transfer data samples into the internal memory of the processor using the serial ports and the DMA.
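The timing rules above (tA must fit within tS, and block processing adds a delay of 2NTs) reduce to simple arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the assumption that a block of N samples must be processed within N sampling periods is the usual reading of block mode, not something stated on the slide.

```python
def max_sampling_rate_hz(t_a_s):
    """Highest sampling rate for sample-by-sample processing (tA <= tS = 1/fs)."""
    return 1.0 / t_a_s


def meets_real_time(t_a_s, fs_hz, block_len=1):
    """True if processing finishes before the next block of N samples is ready.

    Assumption: in block mode the time budget is N * Ts (N = block_len).
    """
    return t_a_s <= block_len / fs_hz


def block_delay_s(block_len, fs_hz):
    """Input-to-output delay of double-buffered block processing: 2 * N * Ts."""
    return 2 * block_len / fs_hz


if __name__ == "__main__":
    # Hypothetical numbers: a 10 us algorithm, 48 kHz sampling, blocks of 256 samples.
    print(max_sampling_rate_hz(10e-6))      # sample-by-sample limit in Hz
    print(meets_real_time(10e-6, 48e3))     # does 10 us fit in one 48 kHz period?
    print(block_delay_s(256, 48e3) * 1e3)   # block-mode delay in ms
```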
What can we do with a DSP?
- Almost any linear and nonlinear system (PID controller).
- Digital filters (FIR-IIR).
- Adaptive systems (LMS algorithm).
- Modulators and demodulators.
- Any mathematically intensive algorithm (FFT-DCT-WT).
- Image compression algorithms.
- Real time video processing.

Linear systems implementation
- x(n) and h(n) are arrays of numbers.
- To compute y(n) we have to multiply the last M input samples by the coefficients of h(n) and sum the products, where M is the length of h(n).
- This is repeated for every new sample received from the ADC.
- As you can see, any linear system uses multiplications, accumulations (sums), and loops intensively.

Fast Fourier Transform (FFT)

So, those are DSP math features
- Multiply and Accumulate (MAC) units.
- ALUs (fixed and floating point).
- Barrel shifters.
- Depending on the DSP application, more than one unit of each kind is present in modern DSPs, allowing parallelism.
- The (modified) Harvard architecture provides multiple operations per cycle.

Summary of desirable features of a DSP
- Fast mathematical operations and combinations of them (multiply and sum especially).
- Flexible addressing modes (bit reversal, circular buffers, zero-overhead loops).
- DSP-specific instruction set (arithmetic shifting, saturating arithmetic, rounding, normalization).
- Minimum-overhead peripherals (communication devices especially).
- DSP instructions for specific applications (video, control, audio).

Architectural Features for Efficient Programming
- Specialized addressing modes
- Hardware Loop Constructs
- Cacheable memories
- Multiple operations per cycle
- Interlocked pipeline
- Other important features
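The multiply-and-sum loop described under "Linear systems implementation" is the usual convolution sum, y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(M-1)x(n-M+1). The following Python sketch shows one multiply-accumulate (MAC) per coefficient for every new sample; the class name `FirFilter` and the example coefficients are hypothetical and only serve as an illustration of why MAC units and fast loops matter.

```python
from collections import deque


class FirFilter:
    """Sample-by-sample FIR filter: y(n) = sum over k of h(k) * x(n - k)."""

    def __init__(self, h):
        self.h = list(h)
        # Delay line holding the last M input samples, newest first.
        self.x = deque([0.0] * len(self.h), maxlen=len(self.h))

    def process(self, sample):
        """Accept one new sample (e.g. from the ADC) and return one output."""
        self.x.appendleft(sample)      # the oldest sample is discarded
        acc = 0.0
        for hk, xk in zip(self.h, self.x):
            acc += hk * xk             # one multiply-accumulate per tap
        return acc


if __name__ == "__main__":
    # An impulse at the input reproduces the coefficients at the output.
    fir = FirFilter([1.0, 2.0, 3.0])
    print([fir.process(s) for s in [1.0, 0.0, 0.0, 0.0]])
```

Each output costs M multiplications and M additions, so a filter with M taps running at a sampling rate fs needs M x fs MACs per second.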
Specialized Addressing Modes: Circular buffering

    B0 = 0x00; L0 = 44;    // Base and length
    I0 = 0x00; M0 = 16;    // Index and increment
    R0 = [I0++M0];         // R0=1 & I0=0x10
    R0 = [I0++M0];         // R0=5 & I0=0x20
    R0 = [I0++M0];         // R0=9 & I0=0x04
    R0 = [I0++M0];         // R0=2 & I0=0x14
    R0 = [I0++M0];         // R0=6 & I0=0x24

Bit-Reversal

    B0 = 0x00; L0 = 0;     // Base and length
    I0 = 0; M0 = 1;        // Index and increment
    I2 = 256; P0 = 8;
    LOOP(start, end) LC0 = P0;
    start:  // I0 automatically incremented in bit-reversed progression
        R0 = [I0] || I0 += M0 (BREV);
    end:    // I2 points to the bit-reversed buffer
        [I2++] = R0;

Hardware Loop Constructs
- Looping is a critical feature in communications processing algorithms.
- There are two key looping-related features that can improve performance on a wide variety of algorithms:
  - "zero-overhead hardware loop"
  - "hardware loop buffers"
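The index sequences produced by the circular-buffering and bit-reversal listings above can be checked with a small software model. This is a hedged Python sketch, not a description of the actual hardware: the helper names are hypothetical, the buffer contents (the values 1 to 11 stored as 32-bit words from address 0x00) are inferred from the comments in the listing, and `bit_reverse` only illustrates the index order, not the carry-reversed addition a processor would use.

```python
def circular_post_modify(index, modifier, base, length):
    """Next index after a post-modify that wraps inside [base, base + length)."""
    nxt = index + modifier
    if nxt >= base + length:
        nxt -= length
    elif nxt < base:
        nxt += length
    return nxt


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value` (e.g. 0b001 -> 0b100)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


if __name__ == "__main__":
    # Assumed memory image: 11 words holding 1..11 at byte addresses 0x00..0x28.
    memory = {addr: addr // 4 + 1 for addr in range(0, 44, 4)}
    i0 = 0x00
    for _ in range(5):
        r0 = memory[i0]
        i0 = circular_post_modify(i0, modifier=16, base=0x00, length=44)
        print(f"R0={r0} & I0=0x{i0:02X}")
    # Read order for an 8-entry buffer accessed in bit-reversed progression.
    print([bit_reverse(n, 3) for n in range(8)])
```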
Cacheable memories
- Without caches, today's high-speed processors would effectively run at much slower speeds, because larger applications would only fit in slower external memory.
- Programmers would be forced to manually move key code in and out of internal SRAM.
- By adding data and instruction caches to the architecture, external memory becomes much more manageable.

Multiple operations per cycle
- In addition to performing multiple ALU/MAC operations each core processor cycle, additional data loads and stores can also be completed in the same cycle.
- The memory is typically partitioned into sub-banks that can be dual-accessed by the core and optionally by a DMA controller.
- There are two multi-issue architectures: VLIW and superscalar.

Interlocked pipeline
- In order to increase throughput, DSPs are designed to be pipelined.
- When assembly programming is required, the pipeline can make programming more challenging.
- The processor automatically handles stalls and bubbles.

Other important features
- RISC-like registers and instruction set.
- Multiple data/program buses.
- DMA controller for handling peripherals.
- In traditional fixed-point DSPs, word sizes are usually fixed. However, there is an advantage to having data registers that can be treated as:
  - One 64-bit word
  - Two 32-bit words
  - Four 16-bit words
  - Eight 8-bit words
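The idea of treating one data register as several narrower words can be illustrated in software. The following Python sketch uses the standard `struct` module to view a 64-bit value as independent lanes and to add four 16-bit lanes at once; the function names are hypothetical, little-endian lane order is an arbitrary choice, and the wrap-around addition is only one possible behaviour (a real DSP may saturate instead).

```python
import struct

# struct format for each supported lane width (little-endian, unsigned).
_FORMATS = {8: "<8B", 16: "<4H", 32: "<2I", 64: "<Q"}


def split_lanes(word64, lane_bits):
    """View a 64-bit word as 1, 2, 4 or 8 unsigned lanes (lowest lane first)."""
    return list(struct.unpack(_FORMATS[lane_bits], struct.pack("<Q", word64)))


def join_lanes(lanes, lane_bits):
    """Pack lanes (lowest lane first) back into a single 64-bit word."""
    return struct.unpack("<Q", struct.pack(_FORMATS[lane_bits], *lanes))[0]


def packed_add16(a, b):
    """Add four 16-bit lanes independently; no carry crosses a lane boundary."""
    sums = [(x + y) & 0xFFFF for x, y in zip(split_lanes(a, 16), split_lanes(b, 16))]
    return join_lanes(sums, 16)


if __name__ == "__main__":
    a = 0x0004000300020001
    b = 0x000100010001FFFF
    print(split_lanes(a, 16))
    print(hex(packed_add16(a, b)))
```

With a single 64-bit addition the overflow of the lowest lane would carry into its neighbour; the lane-wise version keeps the four results independent, which is what makes packed operations useful for 16-bit audio or 8-bit video data.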
DSP classification
- Fixed or floating point arithmetic.
- Millions of multiply-accumulate operations per second, MMACs.
- Millions of floating-point operations per second, MFLOPS.
- Application-specific features (video, audio, control, communications).
- Memory.

Why DSP hardware?
- Special-purpose (custom) chips such as application-specific integrated circuits (ASIC).
- Field-programmable gate arrays (FPGA).
- General-purpose microprocessors or microcontrollers (μP/μC).
- General-purpose digital signal processors (DSP processors).
- DSP processors with application-specific hardware (HW) accelerators.

TI Processors
- C6000™ High Performance DSPs: ideal for imaging, broadband infrastructure and performance audio applications.
- C6000™ Performance Value DSPs: ideal for broadband infrastructure and performance audio applications. Lower cost.
- C6000™ Floating-point DSPs: ideal for professional audio products, biometrics, medical, industrial, digital imaging, speech recognition, conference phones and voice-over-packet.
- C5000™ Power-Efficient DSPs: optimized for power- and cost-efficient embedded signal processing solutions.
- C2000™ 32-bit Real-time MCUs: optimized core can run multiple complex control algorithms at the speeds necessary for demanding control applications.

[Figure: C5000™ DSP Platform Roadmap]
[Figure: C6000™ DSP Platform Roadmap]

TI's ARM Processor-Based Products
- Sitara™ ARM Microprocessors
- Stellaris® MCU
- ARM9-based devices.
OMAP™ Applications Processors
- Low power (LP), general-purpose, multimedia and graphics processing
- ARM® Cortex™-A8 core
- C64x+™ DSP and Video Accelerators
- 3-D Graphics Acceleration

Stellaris® MCU
- ARM Cortex-M3
- Clock Speed: up to 100 MHz
- Up to 125 MIPS (at 100 MHz)
- Advanced integration: serial interfaces, motion control, system, analog

Sitara™ ARM Microprocessors
- Cortex™-A8 and ARM9™-based embedded microprocessors
- Clock Speed: 300 MHz to 1.5 GHz
- 3D Graphics Accelerator and Power technology (OMAP)

DaVinci™ Digital Media Processors
- Optimized for digital video systems
- ARM9 only, ARM9 + DSP and DSP only.

ADI Processors

TigerSHARC® Processors
- 32-bit fixed-point as well as floating-point
- Clock Speed: 250 MHz to 600 MHz
- 4.8 GMACs of 16-bit performance / 3.6 GFLOPs
- 24 Mbits of on-chip memory
- 5 Gbytes of I/O bandwidth

SHARC® Processors
- 32-bit floating-point
- Clock Speed: 150 MHz to 400 MHz / 2.4 GFLOPs
- Accelerator Architecture: FIR, IIR, FFT

Blackfin® Processors
- 16/32-bit fixed point
- Clock Speed: 200 MHz to 756 MHz / 1.5 GMACs
- Very low power consumption: 0.23 mW/MHz
- RTOS supported
- Multicore: 600 MHz / 2.4 GMACs

ADSP-21xx Processors
- 16/32-bit fixed point
- Clock Speed: 75 MHz to 160 MHz
- Analog Devices brought the first programmable processor to market in 1986

[Figure: ADSP-21xx Processors, ADSP-2191 block diagram]
[Figure: Blackfin Processors, ADSP-BF536/ADSP-BF537 block diagram]
[Figure: SHARC Processors, ADSP-2146x block diagram]
[Figure: TigerSHARC Processor, ADSP-TS201S block diagram]

Markets and Applications

Recommended bibliography
RG Lyons, Understanding Digital Signal Processing, 2nd ed. Prentice Hall, 2004.
SW Smith, The Scientist and Engineer's Guide to DSP. California Tech. Pub.
1997.
  Ch1: The Breadth and Depth of DSP
  Ch3: ADC and DAC
SM Kuo, BH Lee, Real-Time Digital Signal Processing, 2nd ed. John Wiley and Sons, 2006.
  Ch1: Introduction to Real-Time Digital Signal Processing
RG Lyons, Ch2: Periodic Sampling

NOTE: Many images used in this presentation were extracted from the recommended bibliography.

Questions?
Thank you!