DSP C5000
Chapter 14: Finite Impulse Response (FIR) Filter Implementation
Copyright © 2003 Texas Instruments. All rights reserved. ESIEE

Outline
- Digital filters and FIR filters
- Implementation of FIR filters on C54x
- Implementation of FIR filters on C55x
- Comparison of C54x and C55x

Outline of FIR Filters
- Generalities on digital filters
- FIR filters with Matlab
- Implementation of FIR filters

Digital Filters
Signal chain: x(t) → analog antialiasing filter → ADC (sampling frequency fS) → x_n → digital filter → y_n → DAC → analog smoothing filter → y(t).

Linear, Time-Invariant Digital Systems
Linearity: for all λ1, λ2 ∈ ℝ, if x1(n) → y1(n) and x2(n) → y2(n), then λ1 x1(n) + λ2 x2(n) → λ1 y1(n) + λ2 y2(n).
Time invariance: if x(n) → y(n), then x(n − n0) → y(n − n0).

Impulse Response
Impulse sequence: u_0 = 1 and u_n = 0 for n ≠ 0.
The impulse response h_n is the output of the digital filter when its input is u_n.

Input-Output Relationship, Convolution
Any input can be written as a sum of shifted, weighted impulses:
$$x_n = \dots + x_{-1} u_{n+1} + x_0 u_n + x_1 u_{n-1} + x_2 u_{n-2} + \dots = \sum_k x_k u_{n-k}$$
Using linearity and time invariance:
$$y_n = \mathrm{output}\Big(\sum_k x_k u_{n-k}\Big) = \sum_k x_k h_{n-k} = \sum_k h_k x_{n-k}$$

Output for a Single Frequency Input
Single frequency input: $x_n = e^{j\omega_0 n T_e}$. The output is a single frequency output:
$$y_n = x_n H(\omega_0), \quad H(\omega_0) = \sum_k h_k e^{-j\omega_0 k T_e} = |H(\omega_0)|\, e^{j \arg(H(\omega_0))} = A(\omega_0)\, e^{j\varphi(\omega_0)}$$

Frequency Transfer Function
For a digital filter the frequency transfer function is periodic.
$$H(\omega + \omega_e) = H(\omega), \quad \omega_e = 2\pi f_e = \frac{2\pi}{T_e}$$
$$h_n = \frac{1}{\omega_e} \int_{-\omega_e/2}^{\omega_e/2} H(\omega)\, e^{j n \omega T_e}\, d\omega$$
$$H(\omega) = |H(\omega)|\, e^{j \arg H(\omega)} = A(\omega)\, e^{j\varphi(\omega)}$$
Amplitude: A(ω) = |H(ω)|. Phase: φ(ω) = arg H(ω). Group delay: $\tau(\omega) = -\dfrac{d\varphi(\omega)}{d\omega}$.

Relationship Between Fourier Transforms of Input and Output
$$X(\omega) = \sum_n x_n e^{-j n \omega T_e}, \quad Y(\omega) = \sum_n y_n e^{-j n \omega T_e}, \quad Y(\omega) = H(\omega) X(\omega)$$

Z Transfer Function
$$H(z) = \sum_n h_n z^{-n}, \quad H(\omega) = \sum_n h_n e^{-j n \omega T_e} = H(z)\big|_{z = e^{j\omega T_e}}, \quad Y(z) = X(z) H(z)$$

Basic Relationships of a Digital Filter
$$y_n = \sum_k x_k h_{n-k} = \sum_k h_k x_{n-k}, \quad Y(\omega) = H(\omega) X(\omega), \quad Y(z) = X(z) H(z)$$

Rational z Transfer Function
$$H(z) = \frac{N(z)}{D(z)} = \frac{\sum_{i=0}^{Q} b_i z^{-i}}{1 + \sum_{k=1}^{P} a_k z^{-k}}$$
This corresponds to a linear difference equation with constant coefficients:
$$y_n = \sum_{i=0}^{Q} b_i x_{n-i} - \sum_{k=1}^{P} a_k y_{n-k}$$

IIR and FIR Filters
IIR = Infinite Impulse Response. FIR = Finite Impulse Response.
FIR: the denominator D(z) is constant, so
$$H(z) = \sum_{i=0}^{Q} b_i z^{-i} = \sum_n h_n z^{-n}, \quad h_n = b_n \text{ for } n \in [0, Q], \quad h_n = 0 \text{ for } n \notin [0, Q]$$
IIR: H(z) = N(z)/D(z) with D(z) not constant.

FIR and IIR
FIR: the output y_n is a linear combination of a finite number of input samples:
$$y_n = \sum_{i=0}^{Q} h_i x_{n-i} = \sum_{i=0}^{Q} b_i x_{n-i}, \quad b_i = h_i$$
IIR: the output y_n is a linear combination of a finite number of input samples and of past output samples (recursive form):
$$y_n = \sum_{i=0}^{Q} b_i x_{n-i} - \sum_{k=1}^{P} a_k y_{n-k}$$

Causality and Stability
A filter is causal if h_n = 0 for n < 0.
A filter is stable if the output is bounded for any bounded input. The condition for stability is that all the poles of H(z) are inside the unit circle, or, in terms of the impulse response:
$$\sum_n |h_n| \le A < \infty$$
FIR filters are always stable: their impulse response is finite, so this sum is finite.
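The recursive difference equation can be run directly to see the FIR/IIR distinction on the impulse response. The following is a minimal sketch in plain Python; the function name `difference_equation` is ours (hypothetical), and it assumes a_0 = 1 and zero initial conditions, matching the form of H(z) used above.

```python
def difference_equation(b, a, x):
    """y[n] = sum_i b[i] x[n-i] - sum_{k>=1} a[k] y[n-k].
    a[0] is assumed to be 1; initial conditions are zero."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 9  # u_n for n = 0..9

# FIR (D(z) = 1): the impulse response is the coefficient vector, then zeros.
h_fir = difference_equation([0.5, 1.0, 0.5], [1.0], impulse)

# IIR example: H(z) = 1 / (1 - 0.5 z^-1), i.e. a_1 = -0.5.
# Its impulse response h_n = 0.5**n never becomes exactly zero.
h_iir = difference_equation([1.0], [1.0, -0.5], impulse)
```

The FIR response stops after Q+1 samples. The IIR response decays but stays nonzero, and it is stable because its only pole (0.5) lies inside the unit circle.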
Representation of Poles and Zeroes of H(z) in the Complex Plane
[Figure: poles and zeros of H(z) in the complex plane (real part vs imaginary part, from −1 to 1), with the unit circle.]

Some Useful Matlab Functions
Example for a FIR filter:
$$N(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3}, \quad b = [b_0\ b_1\ b_2\ b_3] = [1\ 1\ 1\ 1]$$
Enter the filter coefficients vector b (a = 1 for a FIR):
    b=[1 1 1 1]; a=1;
Calculate the transfer function Hf, its amplitude and its phase on 256 samples, with fs=1:
    [Hf,f]=freqz(b,a,256,1);
    HfA=abs(Hf);
    Hfphi=angle(Hf);

Plot the impulse response: stem(b)
Plot the amplitude and the phase of the transfer function: plot(f,HfA) and plot(f,Hfphi)
[Figure: amplitude (0 to 4) and phase of the transfer function versus frequency, 0 to 0.5, FS=1.]

Generate a test signal made of a sum of cosines, apply the filter to x (the output is y) and plot the results:
    x=cos(2*pi*[0:99]*0.25)+2*cos(2*pi*[0:99]*0.1);
    y=filter(b,a,x);
    plot(x); plot(y)
[Figure: input x and output y versus time, 100 samples.]
x is the sum of 2 frequencies, 0.25 and 0.1. The filter cancels the frequency 0.25, so y contains only the frequency 0.1.

Calculation of a FIR using Matlab
For given attenuation and frequency response characteristics, the transfer function can be calculated using different methods:
- Least mean square error, and minimax (Chebyshev): Matlab functions firls and remez (remez is called firpm in later Matlab releases).
- Empirical window method: Matlab functions fir1 and fir2.
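To check the freqz/filter example without Matlab, here is a plain-Python sketch of what those two calls compute for an FIR (a = 1). The helper names `freqz_fir` and `filter_fir` are hypothetical; the 256 frequency points from 0 to fs/2 (excluded) follow Matlab's freqz convention for the call above.

```python
import cmath
import math


def freqz_fir(b, n_points=256, fs=1.0):
    """Like [Hf,f]=freqz(b,1,n_points,fs): H on n_points frequencies in [0, fs/2)."""
    freqs = [k * fs / (2 * n_points) for k in range(n_points)]
    hf = [sum(bn * cmath.exp(-2j * math.pi * f * n / fs) for n, bn in enumerate(b))
          for f in freqs]
    return hf, freqs


def filter_fir(b, x):
    """Like y=filter(b,1,x): direct-form FIR with zero initial conditions."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
            for n in range(len(x))]


b = [1, 1, 1, 1]
hf, f = freqz_fir(b)
hf_a = [abs(h) for h in hf]            # HfA = abs(Hf)
hf_phi = [cmath.phase(h) for h in hf]  # Hfphi = angle(Hf)

x = [math.cos(2 * math.pi * n * 0.25) + 2 * math.cos(2 * math.pi * n * 0.1)
     for n in range(100)]
y = filter_fir(b, x)
```

At f = 0.25 the four terms 1, −j, −1, j cancel, which is why the 0.25 component vanishes from y once the filter has seen 4 samples.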
Example using Matlab
Design a low-pass filter with:
- Sampling frequency: 9600 Hz
- Maximum attenuation in the passband: 0.1 dB
- Minimum attenuation in the stopband: 50 dB
- Limit frequencies of the passband and of the stopband: 1200 Hz and 2600 Hz
[Figure: attenuation template in dB versus f in Hz, with the passband edge at 1200 Hz and the stopband edge at 2600 Hz.]

Vector of limit frequencies (normalized to fs/2 = 4800 Hz):
    F=[0 1200 2600 4800]/4800;
Vector of required amplitudes:
    A=[1 1 0 0];
Least square calculation of the filter:
    Bls=firls(23,F,A);
Minimax calculation of the filter:
    Bre=remez(21,F,A);
Window method (Hamming), with the cutoff halfway between 1200 Hz and 2600 Hz, normalized to fs/2:
    Bwin=fir1(25,(1200+2600)/9600);

Results of Matlab Example
The minimum orders that satisfy the constraints are 23 for least squares, 21 for minimax and 25 for the window method. A filter of order M has M+1 coefficients, so Bre has 22 coefficients.
[Figure: attenuation in dB = 20·log10(1/|H(f)|) versus frequency (0 to 5000 Hz) for the least square, window and minimax designs.]
[Figure: impulse response h_n versus n, from 0 to 25.]

FIR Filters with Constant Group Delay or Linear Phase
For many applications, it is desirable to use a filter with a constant group delay (independent of the frequency). The phase is then linear or affine. Two cases are possible, a symmetrical or an antisymmetrical FIR with N coefficients:
- Symmetrical: h(n) = h(N−1−n)
- Antisymmetrical: h(n) = −h(N−1−n)
In both cases the constant group delay is T_S (N−1)/2.
Symmetrical case, linear phase: φ(f) = −k f
Antisymmetrical case, affine phase: φ(f) = −k f + π/2
with k = π T_S (N−1), up to jumps of π where the real-valued amplitude changes sign.
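The linear-phase property can be checked numerically: once the pure delay of (N−1)/2 samples is removed from H(f), a symmetrical FIR leaves a real number and an antisymmetrical one leaves a purely imaginary number (the extra π/2). This is a sketch with small example coefficients of our own choosing, not the Matlab designs above, and T_S = 1.

```python
import cmath
import math


def freq_response(h, f):
    """H(f) = sum_n h[n] exp(-j 2 pi f n), f normalized to the sampling rate (Ts = 1)."""
    return sum(hn * cmath.exp(-2j * math.pi * f * n) for n, hn in enumerate(h))


def without_delay(h, f):
    """H(f) * exp(+j 2 pi f (N-1)/2): removes a pure delay of (N-1)/2 samples."""
    n_taps = len(h)
    return freq_response(h, f) * cmath.exp(2j * math.pi * f * (n_taps - 1) / 2)


sym = [1, 2, 3, 2, 1]        # h(n) = h(N-1-n)
antisym = [1, 2, 0, -2, -1]  # h(n) = -h(N-1-n)
freqs = [k / 64 for k in range(33)]  # 0 to 0.5

# Symmetrical: the remainder is real, so phi(f) = -2*pi*f*(N-1)/2 (plus 0 or pi).
sym_imag = [without_delay(sym, f).imag for f in freqs]
# Antisymmetrical: the remainder is purely imaginary, hence the extra +/- pi/2.
antisym_real = [without_delay(antisym, f).real for f in freqs]
```

The reason is pairwise: the terms n and N−1−n combine into 2h(n)cos θ (symmetrical) or −2j h(n) sin θ (antisymmetrical).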
Fixed Point Implementation of FIR Filters, Numerical Issues
Fixed point representation of data:
- 16 bits for data and coefficients; accumulators have a size of 40 bits.
- Size B = 16 bits, format Qk: k fractional bits.
Quantization of coefficients:
- Maximum magnitude coefficient: hmax.
- Number of bits of the integer part of the coefficients, sign bit included: Bi, the smallest integer such that hmax < 2^(Bi−1).
- The coefficients are then in Qk' format with k' = 16 − Bi. For example, hmax < 1 gives Bi ≤ 1, so Q15 can always be used.

Matlab Example
The coefficients Bre can be quantized to 16-bit fixed point with 15 fractional bits (Q15), and the result stored in a text file for CCS:
    Bre=round(Bre*2^15);
    fp=fopen('coef.asm','wt');
    for i=1:22
        fprintf(fp,' .word %d\n',Bre(i));
    end
    fclose(fp);

File coef.asm (can be edited to be used with CCS):
    .word 39
    .word -92
    .word -242
    .word 25
    .word 668
    .word 579
    .word -978
    .word -2229
    .word 86
    .word 6374
    .word 12127
    .word 12127
    .word 6374
    .word 86
    .word -2229
    .word -978
    .word 579
    .word 668
    .word 25
    .word -242
    .word -92
    .word 39

FIR Implementation, Numerical Issues, FRCT Bit
Common case:
- Data and coefficients are in Q15 format.
- The product h(i)x(n−i) is in Q30 format (2 sign bits).
- By shifting the products 1 bit left, they are in Q31 format with only 1 sign bit.
- If the FRCT (fractional mode) bit is set to 1, products are automatically shifted 1 bit left.

Structures for FIR Implementation
Common structures for FIR filters:
- Transversal structures, using:
  - linear buffers
  - circular buffers
  - a special case for symmetrical or antisymmetrical FIRs
- Lattice structure: useful in some adaptive situations.
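The Q-format choice and the FRCT shift amount to simple integer arithmetic. In this sketch (helper names are hypothetical), Bi is taken as the smallest number of integer bits, sign bit included, with hmax < 2^(Bi−1), and the high-part store is modeled as keeping bits 31..16 of the accumulator.

```python
import math


def integer_bits(hmax):
    """Smallest Bi (sign bit included) such that hmax < 2**(Bi - 1)."""
    return math.floor(math.log2(hmax)) + 2


def q_format(hmax, word_bits=16):
    """Number of fractional bits k' = 16 - Bi for a 16-bit word."""
    return word_bits - integer_bits(hmax)


def quantize(h, k):
    """Qk integers, like round(h*2^k) in Matlab. Python's round() sends exact
    halves to the even integer; Matlab's round() sends them away from zero."""
    return [round(c * 2 ** k) for c in h]


def frct_product(a_q15, b_q15):
    """Q15 x Q15 gives Q30 (2 sign bits); FRCT = 1 shifts it left by 1 -> Q31."""
    return (a_q15 * b_q15) << 1


def high_part(acc_q31):
    """Bits 31..16 of a Q31 value, as a high-part store keeps them: back to Q15.
    Python's >> is an arithmetic shift, so negative values keep their sign."""
    return acc_q31 >> 16
```

For example, 0.5 × 0.5 in Q15 is 0x4000 × 0x4000 = 0x10000000 (Q30). After the FRCT shift this becomes 0x20000000 (Q31), and the high part, 0x2000, is 0.25 in Q15.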
Transversal Structures of FIR
Structure with a delay line: x_n passes through a chain of unit delays that produces x_{n−1}, x_{n−2}, …, x_{n−N+1}. Each tap is multiplied by b_0, b_1, …, b_{N−1}, and the products are summed to give y_n.
Transposed structure: x_n is multiplied by all the coefficients b_0, …, b_{N−1} at once. The products are summed through a chain of adders separated by unit delays, running from b_{N−1} to b_0, which produces y_n.
[Figure: block diagrams of the delay-line structure and of the transposed structure.]

Implementation of a FIR with a Delay Line
This is the most common structure used in DSP. The delay line can be implemented using a linear or a circular buffer.
Basic operations:
    Read a new data value x(n) every TS
    ACCU = 0
    for i = 0 to N−1:
        multiply h(i) by x(n−i) and add it to ACCU
    Output y(n)

Implementation of FIR Filters on C54x
- Implementation of general transversal FIR filters:
  - using linear buffers
  - using circular buffers
- Implementation of symmetrical FIR filters

Operations using a Linear Buffer for a FIR with N Coefficients
Length of the delay line = N samples.
    Read a new sample x(n) and store it in the first position of the delay line
    ACCU = 0
    for i = 0 to N−1:
        read h(i) and x(n−i)
        multiply h(i) by x(n−i) and add it to ACCU
    Output y(n)
    Perform N−1 shifts in the delay line

Linear Buffer, MACD Mode
Instead of shifting N−1 samples at the end, do the shifts one by one inside the loop:
    Read a new sample x(n) and store it in the first position of the delay line
    ACCU = 0
    for i = N−1 down to 0:
        read h(i) and x(n−i)
        multiply h(i) by x(n−i) and add it to ACCU
        shift x(n−i) one position further in the delay line
    Output y(n)

MACD Instruction
MACD: Multiply, Accumulate and Delay (move).
    MACD Smem, pmad, src
performs src = src + Smem × pmad; T = Smem; (Smem+1) = Smem.
If MACD is used in a loop with RPT, the program memory address (pmad) is automatically incremented.
MACD alone takes 3 cycles; in a RPT loop it takes 1 cycle.
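The three formulations (direct sum, linear buffer with in-loop shifts as in MACD mode, and transposed structure) give the same output. This floating-point sketch (function names are ours) makes that concrete. The extra slot at the end of `line` plays the role of the word that receives the copy of the oldest sample.

```python
def fir_direct(h, x):
    """Reference: y(n) = sum_i h(i) x(n-i), zero initial conditions."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir_macd_linear(h, x):
    """Linear delay line, shifts done inside the MAC loop (MACD style).
    line[i] holds x(n-i); line[N] receives the copy of the oldest sample."""
    n_taps = len(h)
    line = [0.0] * (n_taps + 1)
    y = []
    for sample in x:
        line[0] = sample                    # x(n) in the first position
        acc = 0.0
        for i in range(n_taps - 1, -1, -1):  # i = N-1 down to 0
            acc += h[i] * line[i]           # h(i) * x(n-i)
            line[i + 1] = line[i]           # delay: move x(n-i) one position further
        y.append(acc)
    return y


def fir_transposed(h, x):
    """Transposed structure: state[k] is the output of the (k+1)-th delay."""
    n_taps = len(h)
    state = [0.0] * (n_taps - 1)
    y = []
    for sample in x:
        y.append(h[0] * sample + (state[0] if state else 0.0))
        for k in range(n_taps - 1):         # increasing k: state[k+1] is still old
            nxt = state[k + 1] if k + 1 < n_taps - 1 else 0.0
            state[k] = h[k + 1] * sample + nxt
    return y


h = [0.5, -1.0, 2.0, 0.25]
x = [1.0, 0.0, 0.0, 3.0, -2.0, 0.5, 0.0, 1.5, 4.0, -1.0]
```

Going from i = N−1 down to 0 is what makes the in-loop shift safe: each x(n−i) is used and copied before the next step overwrites its slot.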
Implementing a FIR with MACD
Memory organization of data and coefficients:

    Program memory                  Data memory
    Address     Content             Address    Content
    i = pmad    b(N−1)              k          x(n)
    i+1         b(N−2)              k+1        x(n−1)
    i+2         b(N−3)              k+2        x(n−2)
    …           …                   …          …
    i+N−1       b(0)                k+N−1      x(n−N+1)
                                    k+N        dummy place for the copy of x(n−N+1)

The first MACD of the loop multiplies x(n−N+1) (Smem = k+N−1) by b(N−1) (pmad = i).

Initialization of Registers
- STM stores #value to a memory-mapped register (MMR) early in the pipeline to avoid latencies (2 words, 2 cycles).
- Initialization of the FRCT bit (fractional mode): instructions SSBX (Set Status Bit) and RSBX (Reset Status Bit).
- Initialization of ACCU: with RPTZ (RePeaT after initializing ACCU to 0), or with LD #0,A.

RPT, RPTZ Instructions
RPT #n: repeat the next instruction n+1 times. The repeat counter is set to n and decremented until 0. 1 or 2 cycles, not interruptible.
RPTZ src, #n: same as RPT, except that the accumulator src is cleared to zero before the repeat. 2 cycles, not interruptible.
Some instructions execute faster in repeat mode (pipeline).

Implementing a FIR Filter with MACD
            .bss  adr_debut_dat, N+1
adr_fin_dat .set  adr_debut_dat+N-1
            .text
* Initialization of AR1 and FRCT
            STM   #adr_fin_dat, AR1
            SSBX  FRCT
* Filter loop
            RPTZ  A, #N-1
            MACD  *AR1-, adr_coef, A

Test with CCS: filter with N=32 coefficients all equal to 1/32. Create a file fircoef.asm; the address of the coefficients in program memory is adr_coef.

File containing the coefficients, fircoef.asm (0x400 = 1/32 in Q15):
            .global adr_coef
            .sect   ".coef"
adr_coef    .word   0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
            .word   0x400, 0x400, 0x400, 0x400, 0x400
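A rough integer model of the RPTZ/MACD loop helps predict what the CCS test should produce. It assumes the memory layout above, with FRCT = 1 and the output taken as bits 31..16 of the accumulator. The 40-bit guard bits and saturation are not modeled, and the function names are hypothetical.

```python
def to_int16(v):
    """Keep the low 16 bits as a signed value (what a 16-bit store writes)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def macd_fir_q15(coef_prog, samples):
    """Model of: STL A,*AR2 / RPTZ A,#N-1 / MACD *AR1-,adr_coef,A / STH A,DXR0.
    coef_prog is program memory from adr_coef: coef_prog[j] = b(N-1-j).
    data[0..N-1] is the delay line at adr_debut_dat, data[N] the dummy word."""
    n_taps = len(coef_prog)
    data = [0] * (n_taps + 1)
    out = []
    for x in samples:
        data[0] = x                         # new sample x(n) at adr_debut_dat
        acc = 0                             # RPTZ clears accumulator A
        ar1 = n_taps - 1                    # AR1 = adr_fin_dat
        for j in range(n_taps):             # pmad = adr_coef + j (auto-increment)
            acc += (data[ar1] * coef_prog[j]) << 1   # FRCT = 1: Q15*Q15 -> Q31
            data[ar1 + 1] = data[ar1]       # delay: (Smem+1) = Smem
            ar1 -= 1                        # *AR1-
        out.append(to_int16(acc >> 16))     # STH: high part of A
    return out


# 32 coefficients 0x400 (1/32), impulse 0x4000 (0.5) followed by 40 zeros.
impulse_response = macd_fir_q15([0x400] * 32, [0x4000] + [0] * 40)
```

With these values each output is 0.5/32 = 1/64, i.e. 0x200, for 32 samples, followed by zeros. This is the flat impulse response the CCS plot should show.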
Implementing a FIR Filter with MACD: Test Setup
Program file: firmacd.asm.
Two files to compile and link: fircoef.asm and firmacd.asm.
Test by associating files with the ports DRR0 and DXR0:
- file infir.dat attached to DRR0
- file outfir.dat attached to DXR0

Program file firmacd.asm: initializations
            .mmregs
            .global adr_debut_dat
            .global adr_fin_dat
            .global adr_coef
N           .set    32
            .bss    adr_debut_dat, N+1
adr_fin_dat .set    adr_debut_dat+N-1
            .text
* Initialization of DP and FRCT
            LD      #0, DP
            SSBX    FRCT
* Initialization of AR0, AR1, AR2
            STM     #(adr_debut_dat), AR2
            STM     #(adr_debut_dat-1), AR1
            STM     #N, AR0

Program file firmacd.asm: endless loop
debut:
* Set AR1 to adr_fin_dat (AR1 + AR0 = adr_debut_dat-1+N)
            MAR     *AR1+0
* Read x(n) at DRR
            LDM     DRR0, A
            STL     A, *AR2
* Filter loop (N MACD; AR1 ends at adr_debut_dat-1)
            RPTZ    A, #N-1
            MACD    *AR1-, adr_coef, A
* Write y(n) in DXR by saving the high part of ACCU in DXR
            STH     A, DXR0
* Go back to the beginning of the loop
            B       debut
See files firmacd.asm and fircoef.asm in directory tutorial for the test.

FIR with MACD, Test with CCS
Create the project and the command file, then compile and link.
To test the impulse response:
- Create a file infir.dat containing a value 0.5 (0x4000) followed by zeros (at least 40).
- Set 2 probe points: one at the reading of DRR (LDM DRR0, A) and one at the end of the loop (B debut).
- Attach the files to the probe points: infir.dat to the first probe point (the value read is stored at address 0x20, DRR) and outfir.dat to the second probe point (the data at address 0x21, DXR, is stored in the file).

Results
Let the program run until the end of file infir.dat.
Load file outfir.dat at some address in the DSP data memory (File > Data > Load).
Plot the content of this memory area (View > Graph > Time/Frequency):
- a time graph (Single Time)
- a frequency graph (FFT Magnitude and Phase)

Results for the Impulse Response and its FFT
[Figure: CCS time plot of the impulse response and plot of its FFT.]

Second Test
New test with a sine input:
- Replace infir.dat by file insinus.dat, containing 80 samples of a sine with 40 samples per period.
- Name the result file outsine.dat.
- Repeat the same operations as in the preceding test.
Observe that the output is attenuated and phase shifted by the values given by H(f) at fS/40.

Implementation using a Circular Buffer
A circular buffer of length N is a block of contiguous memory words addressed by a pointer using a modulo N addressing mode.
Characteristics of a circular buffer:
- The 2 extreme words of the memory block are considered contiguous.
- Instead of moving the N data in memory, only the pointers are modified.
- When a new data x(n) arrives, the pointer is incremented and the new data is written in place of the oldest one.

Trace of Memory and Pointer in a Circular Buffer of Length 3
(* marks the word the pointer designates, i.e. the newest sample)

    Word    Time n     Time n+1    Time n+2    Time n+3
    0       x(n−1)     x(n−1)      x(n+2)*     x(n+2)
    1       x(n)*      x(n)        x(n)        x(n+3)*
    2       x(n−2)     x(n+1)*     x(n+1)      x(n+1)

FIR with Circular Buffers
Two circular buffers, one for the data and one for the coefficients:
- Data memory: buffer from adr_deb_data to adr_fin_data, addressed by pnt_data.
- Coefficient memory: buffer from adr_deb_coef to adr_fin_coef, holding b(N−1), b(N−2), …, b(0), addressed by pnt_coef.
[Figure: data and coefficient circular buffers with their pointers pnt_data and pnt_coef.]
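The length-3 trace can be reproduced with a tiny ring-buffer model. This is a sketch; `circular_write` and `newest_first` are illustrative names. Writing advances the pointer modulo the length and overwrites the oldest sample, and reading backwards from the pointer yields the samples from newest to oldest, which is the order a FIR loop needs.

```python
def circular_write(buf, ptr, sample):
    """Advance the pointer modulo len(buf), then overwrite the oldest sample."""
    ptr = (ptr + 1) % len(buf)
    buf[ptr] = sample
    return ptr


def newest_first(buf, ptr):
    """Read x(n), x(n-1), ... by decrementing the pointer modulo len(buf)."""
    return [buf[(ptr - i) % len(buf)] for i in range(len(buf))]


buf = ["x(n-1)", "x(n)", "x(n-2)"]  # state at time n
ptr = 1                             # pointer on the newest sample
trace = [(list(buf), ptr)]
for label in ["x(n+1)", "x(n+2)", "x(n+3)"]:
    ptr = circular_write(buf, ptr, label)
    trace.append((list(buf), ptr))
```

No sample ever moves in memory. Only `ptr` changes, which is the point of using a circular buffer instead of a linear delay line.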
Operation of FIR with Circular Buffer
    Read a new input sample x(n)
    Store it at the address pnt_data
    ACCU = 0
    for i = 0 to N−1:
        multiply the data pointed to by pnt_data by the coefficient pointed to by pnt_coef
        add the product to ACCU
        decrement pnt_data and pnt_coef (modulo N)
    end
    Output y(n) from ACCU
    Increment pnt_data by 1 (it then points to the oldest sample, which the next input replaces)

Instruction MAC with 2 Operands in Indirect Addressing Mode
MAC: Multiply and Accumulate.
    MAC Xmem, Ymem, src[, dst]
performs dst = src + Xmem × Ymem; T = Xmem. It can be executed in 1 cycle.
With dual operands (Xmem, Ymem), indirect addressing is restricted to:
- the auxiliary registers AR2, AR3, AR4 and AR5;
- the modes *ARi, *ARi+, *ARi− and *ARi+0%.

Circular Buffer with C54x
Circular indirect addressing modes: *ARi−%, *ARi+%, *ARi−0%, *ARi+0%, *+ARi(lk)%.
In dual-operand mode (Xmem, Ymem), *ARi+0% is the only valid circular mode. To perform a decrement, store a negative value in AR0.
The BK register stores the size N of the circular buffer and must be initialized before use. Several circular buffers may exist at the same time at different addresses, but they all have the same length.

Limitations on Start Addresses of Circular Buffers
If N is written on nb bits in binary, the start address must have its nb LSBs at 0. Examples:
- for N = 32 (100000b), the 6 LSBs of the start address must be 0;
- for N = 30 (11110b), the 5 LSBs of the start address must be 0.
To access a circular buffer:
- initialize BK with N (nb bits);
- choose one ARi as a pointer.
The effective start address of the buffer is the value in ARi with its nb LSBs set to 0. The end address is start address + N − 1.

Circular Buffer on C54x
Example with BK = N = 30 = 11110b, so nb = 5:
    Start address = xxxxxxxxxxx00000
    ARi           = xxxxxxxxxxx00010
    End address   = xxxxxxxxxxx11101 (start address + 29)
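The C54x rules above (start address = ARi with its nb LSBs cleared, modulo-BK pointer updates) can be modeled to check the two-buffer FIR loop. This is a sketch under stated assumptions: it only covers the in-range case where the pointer is inside its buffer and |step| ≤ BK, it models AR0 = −1 for both pointers, and all names and base addresses are hypothetical.

```python
def circ_start(ar, bk):
    """Effective start address: ARi with its nb LSBs cleared, nb = bit length of BK."""
    nb = bk.bit_length()
    return ar & ~((1 << nb) - 1)


def circ_update(ar, step, bk):
    """Post-modify ARi by step, wrapping inside [start, start + BK - 1]."""
    start = circ_start(ar, bk)
    return start + (ar - start + step) % bk


def fir_two_circular(h, x, data_base=0x40, coef_base=0x80):
    """Data buffer and coefficient buffer, both circular with BK = N.
    Mirrors MAC *AR2+0%,*AR3+0%,A with AR0 = -1 (both pointers decrement)."""
    n = len(h)
    align = 1 << n.bit_length()
    assert data_base % align == 0 and coef_base % align == 0, "unaligned buffer"
    mem = {coef_base + j: h[n - 1 - j] for j in range(n)}  # b(N-1) first, b(0) last
    mem.update({data_base + j: 0 for j in range(n)})       # delay line starts at zero
    ar2 = data_base                 # pnt_data
    ar3 = coef_base + n - 1         # pnt_coef = adr_fin_coef -> b(0)
    y = []
    for sample in x:
        mem[ar2] = sample           # x(n) replaces the oldest sample
        acc = 0                     # RPTZ A,#N-1
        for _ in range(n):
            acc += mem[ar2] * mem[ar3]          # x(n-i) * b(i)
            ar2 = circ_update(ar2, -1, n)
            ar3 = circ_update(ar3, -1, n)
        y.append(acc)
        ar2 = circ_update(ar2, +1, n)           # MAR *AR2+% : oldest sample
    return y


h = [1, 2, 3, 4, 5]
x = [1, 0, 0, 0, 0, 0, 0, 2, -1, 3, 0, 4]
```

After N decrements modulo N, both pointers are back where they started. The final +1 then lands on the oldest sample, so the next input overwrites exactly the value that is no longer needed.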
Implementation of FIR Filter with 2 Circular Buffers
Same filter as in the preceding example: N = 32, coefficients in section .coef (in program memory) in file fircoef.asm.
Two buffers are allocated in data memory, one for the coefficients and one for the data of the filter. Their start addresses must be multiples of 64, because N = 32 is written on 6 bits.
First step of the program after initialization: transfer the coefficients from program memory to data memory, from adr_coef to adr_debut_coef.

Move Instructions
MVPD pmad, Smem: copy values from program memory to data memory. In RPT mode, pmad is automatically incremented.
- Program ↔ Data: MVPD, MVDP, READA, WRITEA
- Data → Data: MVKD, MVDK, MVDD
- MMR ↔ Data: MVMD, MVDM
- MMR → MMR: MVMM

Implementation of FIR with 2 Circular Buffers, Initializations
                .mmregs
                .global adr_debut_dat
                .global adr_fin_dat
                .global adr_debut_coef
                .global adr_fin_coef
                .global adr_coef
N               .set    32
adr_debut_dat   .usect  "buf_data", N
adr_debut_coef  .usect  "buf_coef", N
adr_fin_dat     .set    adr_debut_dat+N-1
adr_fin_coef    .set    adr_debut_coef+N-1
                .text
* Initialization of BK, AR0, FRCT
                STM     #N, BK
                STM     #-1, AR0
                SSBX    FRCT
* Initialization of AR2, AR3
                STM     #(adr_debut_dat), AR2
                STM     #(adr_fin_coef), AR3

Implementation of FIR with 2 Circular Buffers, Program
* Transfer of the coefficients from program to data memory
                STM     #adr_debut_coef, AR4
                RPT     #N-1
                MVPD    adr_coef, *AR4+
* Endless loop
debut:
* Read x(n) at DRR
                LDM     DRR0, A
                STL     A, *AR2
* Calculation of y(n)
                RPTZ    A, #N-1
                MAC     *AR2+0%, *AR3+0%, A
* Write y(n) in DXR by saving the high part of ACCU
                STH     A, DXR0
* Move AR2 circularly to the oldest sample, then go back to the beginning of the loop
                MAR     *AR2+%
                B       debut
See files fircirc.asm and fircoef.asm for the test.
Command File for the Circular Buffer Addressing Constraint
In this example, the addresses adr_debut_dat and adr_debut_coef must be aligned on a multiple of 64:
- adr_debut_dat is the start address of the uninitialized section buf_data;
- adr_debut_coef is the start address of the uninitialized section buf_coef.
To align the 2 sections on a multiple of 64, add align(64) after the name of each section in the SECTIONS directive of the command file, for example:
    buf_data : align(64) > DATA PAGE 1
    buf_coef : align(64) > DATA PAGE 1

Implementation of a Symmetrical FIR Filter
The symmetry of the coefficients, b(n) = b(N−1−n), is used to decrease the computational load:
- a general FIR filter with N coefficients takes N cycles (in good conditions);
- a symmetrical FIR filter takes N/2 cycles, using the specific instruction FIRS.
N even:
$$y(n) = \sum_{i=0}^{N/2-1} b(i)\,\big[x(n-i) + x(n-N+1+i)\big]$$
N odd:
$$y(n) = \sum_{i=0}^{(N-1)/2-1} b(i)\,\big[x(n-i) + x(n-N+1+i)\big] + b\Big(\frac{N-1}{2}\Big)\, x\Big(n - \frac{N-1}{2}\Big)$$

FIRS Instruction to Work with RPT(Z)
    FIRS Xmem, Ymem, pmad
Xmem and Ymem correspond to x(n−i) and x(n−N+1+i). The coefficients are in program memory at pmad.
Operations of FIRS:
    PAR = pmad
    while RC ≠ 0:
        B = B + A(32:16) × Pmem (addressed by PAR)
        A = (Xmem + Ymem) << 16
        PAR = PAR + 1
        RC = RC − 1

Using FIRS for a Symmetrical FIR Filter
Three arrays: the N/2 first coefficients, the N/2 newest data and the N/2 oldest data.
Example for N = 6, with 2 circular buffers:
- program memory, from adr_debut_coef (PAR): b(0), b(1), b(2);
- data memory, first buffer (adr_debut_dat0, pointer AR2): x(n), x(n−1), x(n−2);
- data memory, second buffer (adr_debut_dat1, pointer AR3): x(n−3), x(n−4), x(n−5).
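The folded formulas for N even and N odd can be checked against the direct convolution. The sketch below (names are ours) counts N//2 multiplications, plus one for the middle tap when N is odd, instead of N.

```python
def direct_fir(b, x, n):
    """Reference: y(n) = sum_i b(i) x(n-i), zero outside the input."""
    return sum(b[i] * x[n - i] for i in range(len(b)) if 0 <= n - i < len(x))


def folded_fir(b, x, n):
    """Symmetrical FIR using the sums x(n-i) + x(n-N+1+i)."""
    N = len(b)

    def xs(k):
        return x[k] if 0 <= k < len(x) else 0

    y = sum(b[i] * (xs(n - i) + xs(n - N + 1 + i)) for i in range(N // 2))
    if N % 2 == 1:                      # middle tap b((N-1)/2) x(n-(N-1)/2)
        mid = (N - 1) // 2
        y += b[mid] * xs(n - mid)
    return y


x = [3, -1, 4, 1, -5, 9, 2, -6]
odd_b = [1, 2, 3, 2, 1]        # N = 5
even_b = [1, 2, 3, 3, 2, 1]    # N = 6
```

The additions are done first, so each coefficient is multiplied once per pair. FIRS exploits this by computing the next pair sum in A while it multiplies the previous one into B.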
Using FIRS for a Symmetrical FIR Filter: Pointers
BK = N/2. At the beginning, AR2 points to the newest data x(n) and AR3 to the oldest data x(n−N+1).
[Figure: contents of the two circular buffers, x(n), x(n−1), …, x(n−N/2+1) in the first and x(n−N/2), …, x(n−N+2), x(n−N+1) in the second, at the beginning and after N/2+1 increments of AR2 and AR3.]

Using FIRS for a Symmetrical FIR Filter: Loop
- The first sum x(n) + x(n−N+1) is done before entering the loop.
- FIRS is then repeated N/2 times, with AR2 and AR3 incremented by 1 at each step. At the first iteration AR2 points to x(n−1) and AR3 to x(n−N+2).
- After the N/2 iterations, AR2 is decremented by 2 and AR3 by 1. AR2 then points to x(n−N/2+1), the oldest sample of the first buffer, and AR3 points to x(n−N+1).
- x(n−N/2+1) is copied into the second buffer in place of x(n−N+1). AR3 is then incremented by 1 and points to the new oldest sample x(n−N+2).
- The next sample x(n+1) is stored where AR2 points, in place of x(n−N/2+1).

Symmetrical FIR Implementation with FIRS, Initializations
                .mmregs
                .global adr_debut_coef
                .global adr_debut_dat0
                .global adr_debut_dat1
                .global adr_coef
N               .set    32
Nsur2           .set    16
adr_debut_coef  .set    adr_coef
adr_debut_dat0  .usect  "buf_data0", N
adr_debut_dat1  .usect  "buf_data1", N
                .text
* Initialization of BK, AR0, FRCT
* (AR0 = +1: both pointers move forward by 1 in *ARi+0%)
                STM     #Nsur2, BK
                STM     #1, AR0
                SSBX    FRCT
* Initialization of AR2, AR3
                STM     #(adr_debut_dat0), AR2
                STM     #(adr_debut_dat1), AR3

Symmetrical FIR Implementation using FIRS, Program
* Endless loop
debut:
* Read x(n) at DRR
                LDM     DRR0, A
                STL     A, *AR2
* Calculation of y(n)
* Calculation of the first sum
                ADD     *AR2+0%, *AR3+0%, A
* Repeat FIRS N/2 times
                RPTZ    B, #(Nsur2-1)
                FIRS    *AR2+0%, *AR3+0%, adr_coef
* Write y(n) in DXR by saving the high part of ACCU B
                STH     B, DXR0
* Transfer of the oldest value of the 1st array
* to the oldest value of the 2nd array
                MAR     *+AR2(-2)%
                MAR     *AR3-%
                MVDD    *AR2, *AR3+0%
* Go back to the beginning of the loop
                B       debut
See files firsym.asm and fircoef.asm for the test.
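The pointer choreography of the FIRS loop is easy to get wrong, so here is a sketch that replays it on two circular arrays of length N/2 and compares the result with direct convolution. Assumptions of this sketch: plain integers (no Q15 scaling), buffers starting at offset 0, and both pointers stepping by +1, which is what the description "AR2 and AR3 incremented by 1" requires. The function name is hypothetical.

```python
def firs_symmetric(b_half, x):
    """Replay of the FIRS loop for an even-length symmetrical FIR, N = 2*len(b_half).
    buf0 holds the N/2 newest samples, buf1 the N/2 oldest; BK = N/2."""
    half = len(b_half)
    buf0 = [0] * half
    buf1 = [0] * half
    ar2 = 0                                  # AR2 -> slot of x(n) in buf0
    ar3 = 0                                  # AR3 -> slot of x(n-N+1) in buf1
    y = []
    for sample in x:
        buf0[ar2] = sample                   # STL A,*AR2
        a = buf0[ar2] + buf1[ar3]            # ADD *AR2+0%,*AR3+0%,A
        ar2, ar3 = (ar2 + 1) % half, (ar3 + 1) % half
        acc_b = 0                            # RPTZ B,#(Nsur2-1)
        for i in range(half):                # FIRS, N/2 times
            acc_b += a * b_half[i]           # B += A * Pmem(PAR)
            a = buf0[ar2] + buf1[ar3]        # A = Xmem + Ymem
            ar2, ar3 = (ar2 + 1) % half, (ar3 + 1) % half
        y.append(acc_b)                      # STH B,DXR0
        ar2 = (ar2 - 2) % half               # MAR *+AR2(-2)%
        ar3 = (ar3 - 1) % half               # MAR *AR3-%
        buf1[ar3] = buf0[ar2]                # MVDD *AR2,*AR3+0% (copy ...
        ar3 = (ar3 + 1) % half               # ... then AR3 += 1)
    return y


b_half = [1, 2, 3]                           # N = 6: h = [1, 2, 3, 3, 2, 1]
h = b_half + b_half[::-1]
x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 0, 0, 0, 0, 0, 0]
```

The last FIRS iteration computes one pair sum too many. That sum is simply discarded, and it is why the pointers have advanced N/2+1 times in total before the correction.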
Tutorial
The listing files for the preceding examples can be found in directory tutorial:
    Tutorial > Dsk5416 > Chapter 14 > Labs_fir

Implementation of FIR Filters on C55x
- Implementation of block filters
- Implementation of symmetrical or antisymmetrical FIR filters

Implementation of FIR Filters using C55x
The 2 MAC units, fed by the 3 data read buses B, C and D, make it possible to:
- calculate 2 output samples y at a time, using the same set of coefficients and different data x;
- calculate 2 output samples y at a time, using the same input data x but 2 sets of coefficients.
[Figure: data read buses feeding two MAC units, whose results go to AC0 and AC1.]

Using the 2 MAC Units
Block filtering is used in order to calculate 2 output samples at a time:
$$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}$$
$$y_{n+1} = b_0 x_{n+1} + b_1 x_n + b_2 x_{n-1} + b_3 x_{n-2}$$
C55x (one MAC for each of the 2 outputs in the same cycle):
    MAC *AR2+, *CDP+, AC0
 :: MAC *AR3+, *CDP+, AC1
C54x (one MAC per cycle, for y_n only):
    MAC *AR2+, *AR3+, A

Block Filter
Calculate a block of M output samples:
- this avoids interrupts sample by sample;
- it allows the calculation of 2 samples at a time.
$$y_{n-m} = \sum_{i=0}^{N-1} b_i\, x_{n-m-i}, \quad m = 0, \dots, M-1$$
M+N−1 inputs are necessary to calculate M output samples, because of the N−1 initial conditions.

Block Filter, Example N=4, M=3
CDP points to the coefficients b0, b1, b2, b3. AR2 and AR3 point to the input data x_n, x_{n−1}, x_{n−2}, x_{n−3}, x_{n−4}, x_{n−5}, …
$$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}$$
$$y_{n-1} = b_0 x_{n-1} + b_1 x_{n-2} + b_2 x_{n-3} + b_3 x_{n-4}$$
$$y_{n-2} = b_0 x_{n-2} + b_1 x_{n-3} + b_2 x_{n-4} + b_3 x_{n-5}$$
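The arithmetic of dual-MAC block filtering can be sketched in a few lines. Each pass reads every coefficient once and feeds it to two accumulators, one per output. This models the computation only, not the exact pointer walk; `block_fir_dual_mac` is a hypothetical name, and in this sketch an odd last output is left out.

```python
def block_fir_dual_mac(b, x):
    """Block FIR producing outputs two at a time.
    x is in time order; output t is sum_i b[i] * x[t - i], t = N-1 .. len(x)-1.
    One coefficient read per step feeds AC0 (output t) and AC1 (output t+1)."""
    n_taps = len(b)
    n_out = len(x) - n_taps + 1              # M outputs from M + N - 1 inputs
    y = []
    for t in range(n_taps - 1, n_taps - 1 + n_out - 1, 2):
        ac0 = ac1 = 0
        for i in range(n_taps):
            coef = b[i]                      # shared coefficient fetch (CDP)
            ac0 += coef * x[t - i]
            ac1 += coef * x[t + 1 - i]
        y.extend([ac0, ac1])
    return y


b = [1, 2, 3, 4]
x = list(range(1, 12))                       # 11 inputs -> M = 8 outputs
```

Compared with one MAC per cycle, the coefficient stream is read half as often for the same number of products. This is what lets the single B bus keep both MAC units busy.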
Block Filter Example
A double loop is needed: on the coefficients and on m.
The coefficients are accessed by CDP:
- CDP (Cmem) modifications are limited to *CDP, *CDP+, *CDP− and *(CDP+T0).
- For the dual MAC, CDP uses the B bus only. Because the B bus is internal only, the coefficients must also be in internal memory.
- Place the data operands carefully to avoid memory conflicts (SARAM/DARAM).

Using Dual MAC
$$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}$$
$$y_{n+1} = b_0 x_{n+1} + b_1 x_n + b_2 x_{n-1} + b_3 x_{n-2}$$
CDP (coefficients, bus B), AR2 and AR3 (input data, buses C and D) feed the two MAC units, which accumulate in AC0 and AC1:
    MAC *AR2+, *CDP+, AC0
 :: MAC *AR3+, *CDP+, AC1

Initialization of Pointers
Use AMOV to do the transfers during the AD (address) pipeline phase.
- Initialize AR2 to point to the 1st value of the input data: x
- Initialize AR3 to point to the 2nd value of the input data: x+1
- Initialize CDP to point to the coefficient array: a0
    AMOV #x, XAR2
    AMOV #(x+1), XAR3
    AMOV #a0, XCDP

Inner Loop on Coefficients
    RPT #3
    MAC *AR2+, *CDP+, AC0
 :: MAC *AR3+, *CDP+, AC1
At the end of the repeat, CDP points just past b3, and AR2 and AR3 have each advanced by 4 data words.
Reinitialization of the pointers for the next pair of output samples:
    ASUB #2, AR2
    ASUB #2, AR3
    MOV  #a0, CDP

Circular Addressing Mode for Coefficients
- Initialize the size of the circular buffer: BK
- Set up the buffer start address: BSA and Xeven
- Set up ARi or CDP
There is no memory alignment constraint.
[Figure: coefficient buffer b0…b3 located at Xeven:BSAxx, of length BKzz, addressed by ARn/CDP.]
Circular Buffer Addressing Mode
- Buffer start address = Xeven[22:16] : BSAxx[15:0]
- Offset into the buffer = ARn/CDP
- Calculated address = Xeven[22:16] : (BSAxx + ARn/CDP)
- Buffer length = BKzz[15:0]

Circular Buffer Addressing Mode: Registers

    Offset   Xeven (64K page)   Buffer start address   Block size register
    AR0      XAR0[22:16]        BSA01                  BK03
    AR1      XAR0[22:16]        BSA01                  BK03
    AR2      XAR2[22:16]        BSA23                  BK03
    AR3      XAR2[22:16]        BSA23                  BK03
    AR4      XAR4[22:16]        BSA45                  BK47
    AR5      XAR4[22:16]        BSA45                  BK47
    AR6      XAR6[22:16]        BSA67                  BK47
    AR7      XAR6[22:16]        BSA67                  BK47
    CDP      XCDP[22:16]        BSAC                   BKC

The even XARn (i.e. XAR0, XAR2, XAR4, XAR6) determines the 64K page.

Selecting Circular or Linear Addressing Mode
Use the LSBs of status register ST2_55: bits 0 to 7 are AR0LC to AR7LC, bit 8 is CDPLC (bits 15 to 9 are other bits or reserved).
- 0 = linear mode (default)
- 1 = circular mode
Set or reset the status bits:
    BSET AR5LC    ;AR5 in circular mode
    BCLR AR3LC    ;AR3 in linear mode

Circular Buffer Exercise
Use AR4 as a circular pointer to x{5}:
        .sect "data"
x       .int  7,1,9,6,2          ;init data
        .sect "code"
        AMOV  #x, XAR4           ;init XAR
        MOV   #x, BSA45          ;init start addr
        MOV   #5, BK47           ;init length
        MOV   #0, AR4            ;init AR4 to top
        BSET  AR4LC              ;set AR4 to circ
        MOV   #3, T0             ;index
        MOV   *(AR4+T0), AC0     ;AC0 = 7, AR4 = 3
        MOV   *+AR4(#4h), AC1    ;AC1 = 9, AR4 = 2
        MOV   *AR4(T0), AC2      ;AC2 = 7, AR4 = 2
Buffer x: offsets 0 to 4 hold 7, 1, 9, 6, 2. The results are cumulative.

Circular Buffer for Coefficients
The table of coefficients b0 … b3 is a circular buffer addressed by CDP:
- Initialize XCDP (its 7 MSBs give the page).
- Initialize CDP to 0: offset in the buffer.
- Set CDP to circular addressing mode.
s1: AMOV #x, XAR2
    AMOV #a0, XCDP
    AMOV #(x+1), XAR3
    MOV  #a0, BSAC
    MOV  #0, CDP
    MOV  #4, BKC
    BSET CDPLC
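The exercise answers can be replayed with a small model of C55x circular addressing, in which the offset register wraps modulo BK inside the buffer starting at BSA. In this sketch, `run_exercise` and `circular_address` are hypothetical names. The address helper is only a sketch of the "Xeven[22:16] : (BSA + offset)" formula, with a 16-bit wrap assumed for the low part.

```python
def circular_address(page, bsa, offset):
    """Sketch of Xeven[22:16] : (BSAxx + ARn), assuming a 16-bit wrap of the low part."""
    return (page << 16) | ((bsa + offset) & 0xFFFF)


def run_exercise():
    """Replays the AR4 exercise: BK47 = 5, BSA45 = x, AR4 starts at offset 0.
    Returns (value read, AR4 afterwards) for each of the three MOV instructions."""
    x = [7, 1, 9, 6, 2]          # .int 7,1,9,6,2
    bk = 5                       # MOV #5,BK47
    ar4 = 0                      # MOV #0,AR4
    t0 = 3                       # MOV #3,T0
    trace = []
    # MOV *(AR4+T0),AC0 : read at AR4, then AR4 = (AR4 + T0) mod BK
    ac0 = x[ar4]
    ar4 = (ar4 + t0) % bk
    trace.append((ac0, ar4))
    # MOV *+AR4(#4h),AC1 : AR4 = (AR4 + 4) mod BK first, then read
    ar4 = (ar4 + 4) % bk
    ac1 = x[ar4]
    trace.append((ac1, ar4))
    # MOV *AR4(T0),AC2 : read at (AR4 + T0) mod BK, AR4 unchanged
    ac2 = x[(ar4 + t0) % bk]
    trace.append((ac2, ar4))
    return trace
```

Because the offset is always reduced modulo BK, the buffer can start at any address, unlike the C54x, where the start address must have its nb LSBs at 0.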
Store Results, 32-bit Moves
Assuming fractional mode, the 2 results are in the high parts of AC0 and AC1.
AC0 and AC1 can be saved separately:
    MOV HI(AC0), *AR4+
    MOV HI(AC1), *AR4+
or at the same time:
    MOV pair(hi(AC0)), dbl(*AR4+)
- Pairs: (AC0, AC1), (AC2, AC3).
- ARi is incremented by 2.
- y must be even-aligned.

Block Filter Inner Loop
s1: AMOV #x, XAR2
    AMOV #a0, XCDP
    AMOV #(x+1), XAR3
    AMOV #y, XAR4
    MOV  #a0, BSAC
    MOV  #0, CDP
    MOV  #4, BKC
    BSET CDPLC

    MOV  #0, AC0
    MOV  #0, AC1
    RPT  #3
    MAC  *AR2+, *CDP+, AC0
 :: MAC  *AR3+, *CDP+, AC1
    ASUB #2, AR2
    ASUB #2, AR3
e1: MOV  pair(hi(AC0)), dbl(*AR4+)

Outer Loop Using RPTB or RPTBlocal
Use RPTB, the Repeat Block instruction. We must specify:
- the start address of the block: the next instruction;
- the end address: a label on the last instruction;
- the number of repetitions: counter BRC0, initialized with count − 1 (minimum count = 2).
RPTBlocal executes the block from the IBU (instruction buffer unit):
- 56 bytes maximum (for a larger block, use RPTB);
- it reduces power consumption.

Outer Loop on m: Calculate M Outputs y(n−m)
s1: AMOV #x, XAR2
    AMOV #a0, XCDP
    AMOV #(x+1), XAR3
    AMOV #y, XAR4
    MOV  #a0, BSAC
    MOV  #0, CDP
    MOV  #4, BKC
    BSET CDPLC
    MOV  #((samps-taps)/2), BRC0
    RPTBLOCAL e1
    MOV  #0, AC0
    MOV  #0, AC1
    RPT  #3
    MAC  *AR2+, *CDP+, AC0
 :: MAC  *AR3+, *CDP+, AC1
    ASUB #2, AR2
    ASUB #2, AR3
e1: MOV  pair(hi(AC0)), dbl(*AR4+)

More Nested Loops?
Nesting RPTB or RPTBlocal: 2 levels are supported, using BRC0 (outer loop) and BRC1/BRS1 (inner loop). No register saving is required for nested block repeats.
        MOV        #outer_cnt, BRC0   ;load outer loop count
        MOV        #inner_cnt, BRC1   ;load BRC1, auto-load BRS1
        RPTBLOCAL  outer              ;use BRC0
        ...
        RPTBLOCAL  inner              ;BRC1: decrements, BRS1: no change
        ...
inner:  last_inner
        ...
outer:  last_outer

Laboratory on Block Filter
Implement a block FIR with 16 coefficients and an input block size of 200, as a subroutine.
[Figure: C5510 memory map for the lab (64Kx8 ROM, SARAM0, DARAM2, DARAM3 and external CE0 memory), showing where table{16}, code, vectors, a{16}, x{200}, the stacks SP/SSP and y are placed. All addresses and lengths are shown in bytes.]

Using the Stack and Subroutines
Subroutines require CALL and RET. During a call, the return address is stored on the stack (SP). For a subroutine named fir:
    CALL fir

Initialize the Stack
- Declare an uninitialized section (.usect) of appropriate length to reserve space.
- Initialize the stack pointer to point to the top of the stack + 1.
- Recommendation: place the stack in internal memory and align it on a 4-byte boundary (the ALIGN= option specifies bytes).
size    .set    100h
stack   .usect  "STK", size
        AMOV    #(stack+size), XSP
[Figure: section STK in memory, with SP pointing just past its top.]

The System Stack SSP
- When a call occurs, PC[15:0] is pushed on the stack (SP).
- The upper 8 bits, PC[23:16], are pushed on the system stack, accessed by SSP, the System Stack Pointer.
- CFCT is used to store the active loop context.
- XSP and XSSP share the same upper 7 bits.
- Place SP and SSP with care to avoid dual-access delays.

Data Types
- Byte: 8 bits
- Word: 16 bits
- Long: 32 bits
A long access assumes that the address points to the MSW. The LSW is read from the same address with its LSB toggled:
For example:
Ptr = 100h: MSW at 100h, LSW at 101h
Ptr = 101h: MSW at 101h, LSW at 100h
To ensure proper alignment, constants (int, long) are automatically aligned on type boundaries. For variables, 16-bit values need no special care; for 32-bit values, use the even-align flag:
        .usect "vars",Nwords,,1

Solution: Declarations
        .sect "indata"
x0
        .copy in7.dat
        .def start
        .cpl_off
        .arms_off
        .c54cm_off
stklen  .set 100
a0      .usect "coeffs",16,1,1
y0      .usect "results",200,1,1
BOS     .usect "STK",stklen,1,1
BOSS    .usect "SSTK",stklen,1,1
        .sect "init"
table   .int 7FCh, 7FDh, 7FEh, 7FFh
        .int 800h, 801h, 802h, 803h
        .int 803h, 802h, 801h, 800h
        .int 7FFh, 7FEh, 7FDh, 7FCh

Solution: Code
        .sect "code"
        .DP a0
start:  AMOV #BOS+stklen,XSP    ;set up Stack +
        MOV  #BOSS+stklen,SSP   ;System Stack Ptrs
        CALL copy               ;copy coeffs
        BSET FRCT               ;turn on mult. shift
        BSET M40                ;turn on 40 bit math
        BSET SXMD               ;turn on sign exten.
        CALL fir                ;perform fir
        nop
here:   B    here               ;stop

Solution: Subroutine copy
copy:   AMOV #table,XAR2                ;load pointers
        AMOV #a0,XAR3
        RPT  #7                         ;8 long moves = 16 words
        MOV  dbl(*AR2+),dbl(*AR3+)      ;move from table to a
        RET

Solution: Subroutine fir
fir:    MOV  #92,BRC0                   ;block repeat count
        AMOV #x0,XAR2                   ;initialize pointers
        AMOV #x0+1,XAR3                 ;for data,
        AMOV #y0,XAR4                   ;results
        AMOV #a0,XCDP                   ;and coefficients
        MOV  #a0,BSAC                   ;buffer start address
        MOV  #16,BKC                    ;buffer size
        MOV  #0,CDP                     ;index
        BSET CDPLC                      ;turn on circ adr CDP
        RPTBlocal end
        MPYM *AR2+,*CDP,AC0             ;AC0 1st product
        MPYM *AR3+,*CDP+,AC1            ;AC1 gets 2nd prd
        RPT  #14
        MAC  *AR2+,*CDP+,AC0            ;form results
     :: MAC  *AR3+,*CDP+,AC1
        MOV  pair(hi(AC0)),dbl(*AR4+)   ;store AC0/AC1
        ASUB #14,AR2                    ;wrap data pointers
end:    ASUB #14,AR3                    ;next calculation
        RET
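The pointer movement in the fir subroutine can be checked against a direct computation. This is a minimal Python sketch, assuming plain integer arithmetic (no FRCT shift, 40-bit guard bits or saturation); `block_fir_dual_mac` and `fir_direct` are hypothetical helper names. It models two outputs per pass, with AR2 and AR3 one sample apart sharing each coefficient read through the circular CDP, followed by the ASUB rewind of taps-2. Output m is the sum of a[k]·x[m+k], i.e. the convolution sum with the coefficient table stored in reverse order, which makes no difference for the symmetric lab table.

```python
def block_fir_dual_mac(x, coeffs, n_pairs):
    """Model of the RPTBlocal loop: two outputs per pass, circular coefficients.

    Integer arithmetic only; FRCT, 40-bit guard bits and saturation are ignored.
    """
    taps = len(coeffs)
    y = []
    ar2, ar3 = 0, 1                      # AMOV #x0,XAR2 / AMOV #x0+1,XAR3
    cdp = 0                              # MOV #0,CDP (offset in coefficient buffer)
    for _ in range(n_pairs):             # BRC0 + 1 passes on the DSP
        ac0 = ac1 = 0
        for _ in range(taps):            # MPYM for tap 0, then RPT #(taps-2) dual MACs
            c = coeffs[cdp]              # one coefficient read shared by both MACs
            ac0 += x[ar2] * c
            ac1 += x[ar3] * c
            ar2 += 1
            ar3 += 1
            cdp = (cdp + 1) % taps       # BKC = taps: CDP wraps back to the first coefficient
        y.extend((ac0, ac1))             # MOV pair(hi(AC0)),dbl(*AR4+)
        ar2 -= taps - 2                  # ASUB #(taps-2): net advance of 2 samples
        ar3 -= taps - 2
    return y


def fir_direct(x, coeffs, n_out):
    """Reference: y[m] = sum_k coeffs[k] * x[m + k]."""
    return [sum(c * x[m + k] for k, c in enumerate(coeffs)) for m in range(n_out)]


table = [0x7FC, 0x7FD, 0x7FE, 0x7FF, 0x800, 0x801, 0x802, 0x803,
         0x803, 0x802, 0x801, 0x800, 0x7FF, 0x7FE, 0x7FD, 0x7FC]
x = [(7 * n) % 23 - 11 for n in range(200)]   # hypothetical input, not in7.dat
y = block_fir_dual_mac(x, table, n_pairs=92)
assert y == fir_direct(x, table, len(y))
```

Here n_pairs plays the role of BRC0 + 1. The sketch uses 92 passes (184 outputs) so every read stays inside a 200-sample list; with BRC0 = 92, as in the solution, the DSP makes 93 passes and, assuming x{200} holds exactly 200 samples, its last pass reads one word past the end of x.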
Implementation of Symmetrical and Anti-symmetrical FIR Filters on ‘C55x
[Figure: example coefficients b0 … b7 of an 8-tap symmetrical filter and of an 8-tap anti-symmetrical filter.]
These filters may be "folded": the two input samples that share a coefficient are added (symmetrical) or subtracted (anti-symmetrical) before the multiplication, so the filter is performed with N/2 additions or subtractions and N/2 MACs. The filter must be designed with an even length N:
$$ y(n) = \sum_{i=0}^{N/2-1} b(i)\,\bigl[x(n-i) \pm x(n-N+1+i)\bigr], \qquad N \text{ even}, $$
with + for symmetrical coefficients, b(N-1-i) = b(i), and - for anti-symmetrical coefficients, b(N-1-i) = -b(i).

Instructions FIRSADD and FIRSSUB
        FIRSADD Xmem,Ymem,coef,ACx,ACy  ;ACy = ACy + (ACx * (*CDP)) || ACx = Xmem + Ymem
                                        ;for a symmetrical FIR
        FIRSSUB Xmem,Ymem,coef,ACx,ACy  ;ACy = ACy + (ACx * (*CDP)) || ACx = Xmem - Ymem
                                        ;for an anti-symmetrical FIR
The two operations separated by || are performed in parallel.
For a block FIR, the dual-MAC approach performs better than FIRS-type instructions: a design consideration when migrating code from the ‘C54x.

Comparison of C54x and C55x
2 MACs in ‘C55x versus 1 in ‘C54x: well suited for block filtering, with 2 taps per cycle instead of 1 (for large N).
Circular addressing modes:
3 BK registers in ‘C55x instead of 1 in ‘C54x, allowing several simultaneous circular buffers of different sizes.
In ‘C54x, circular addressing is specified in the instruction, with the % indirect addressing modifier. In ‘C55x, the mode (linear or circular) is set for each pointer register in status register ST2_55.
No memory alignment constraint in ‘C55x (a ‘C54x circular buffer must be aligned on a power-of-2 boundary).

Comparison of C54x and C55x: Symmetrical and Anti-symmetrical FIR Filters
In ‘C54x, the FIRS instruction allows 2 taps/cycle for a symmetrical FIR.
In ‘C55x, the FIRSADD and FIRSSUB instructions allow both symmetrical and anti-symmetrical FIRs to be implemented efficiently. Despite the 2 MACs, there is only 1 ALU, so the rate is again 2 taps/cycle for symmetrical or anti-symmetrical FIRs.
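The folding identity can be checked numerically. This is a minimal Python sketch with hypothetical helper names `fir_output_direct` and `fir_output_folded`, using integer arithmetic and no fixed-point effects. The folded version forms x(n-i) ± x(n-N+1+i) first, as FIRSADD/FIRSSUB do in ACx, then multiplies by b(i) and accumulates; it does not model the one-instruction skew between the pre-add and the multiply on the DSP.

```python
def fir_output_direct(x, b, n):
    """Direct form: y(n) = sum_{i=0}^{N-1} b(i) x(n-i), N multiplies."""
    return sum(b[i] * x[n - i] for i in range(len(b)))


def fir_output_folded(x, b, n, antisymmetric=False):
    """Folded form: y(n) = sum_{i=0}^{N/2-1} b(i) [x(n-i) +/- x(n-N+1+i)].

    Only b(0) .. b(N/2-1) are used; the second half of b is implied by the
    symmetry b(N-1-i) = b(i) or the anti-symmetry b(N-1-i) = -b(i).
    """
    N = len(b)
    if N % 2 != 0:
        raise ValueError("the folded form here assumes an even length N")
    if n < N - 1:
        raise ValueError("need samples x(n-N+1) .. x(n)")
    acc = 0
    for i in range(N // 2):
        newer, older = x[n - i], x[n - N + 1 + i]
        pre = newer - older if antisymmetric else newer + older   # FIRSSUB / FIRSADD pre-step
        acc += b[i] * pre                                         # MAC with coefficient b(i)
    return acc


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
b_sym = [1, 2, 3, 4, 4, 3, 2, 1]          # b(N-1-i) = b(i)
b_anti = [1, 2, 3, 4, -4, -3, -2, -1]     # b(N-1-i) = -b(i)
for n in range(len(b_sym) - 1, len(x)):
    assert fir_output_folded(x, b_sym, n) == fir_output_direct(x, b_sym, n)
    assert fir_output_folded(x, b_anti, n, antisymmetric=True) == fir_output_direct(x, b_anti, n)
```

The folded loop runs N/2 times instead of N, which is where the 2 taps per cycle come from when each iteration maps to one FIRSADD or FIRSSUB.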
Follow-On Activities on 5416 DSK
Laboratory 3 for TMS320C5416 DSK: determine by experiment how many FIR coefficients are required for acceptable audio quality.
Laboratory 4 for TMS320C5416 DSK: determine by practical experiment the best FIR window functions for audio.
Application 4 for TMS320C5416 DSK: electronic crossover for a multiple-loudspeaker system. It divides the audio signal into treble and bass at 16 different selectable frequencies using FIR filters.

Follow-On Activities on 5510 DSK
Application "delays and echo" for TMS320C5510 DSK: simulates delays in communications networks and the reflection of sound heard in a canyon. It introduces circular buffers and the configuration used for a Finite Impulse Response (FIR) filter.