DSP C5000, Chapter 13: Numerical Issues
ESIEE. Copyright © 2003 Texas Instruments. All rights reserved.

Learning Objectives
- Data formats
- Fixed point: integer and fractional numbers
- Methods for handling multiplicative and accumulative overflow
- Floating point
- Block floating point
- Comparison of formats

Data Formats and Numerical Issues
- Common data sizes: 8, 16, 24, 32 bits.
- Fixed or floating point. For a given technology:
  - Fixed point is faster and less expensive.
  - But fixed point programming is more difficult.
- Processors of the 'C5000 family are fixed point processors.
  - But they can also execute floating point operations through software.

Digital Representation of a Signal
- Sampling
- ADC (Analog to Digital Conversion):
  - Quantization
  - Coding of the quantized value
- Digital representation used in DSP

Digital Coding of Data and Arithmetic
- Finite precision: the representation uses a given number of bits.
  - Fixed point
  - Floating point
  - Block floating point

Interface ADC - DSP - DAC
- [Figure: signal chain ADC, DSP, DAC.]
- Possible conversions:
  - fixed point to/from floating point
  - A-law or mu-law to/from linear law (compression-expansion)

Binary Representation of Signed Integers (used in ADC-DAC, or in DSP in Fixed Point Format)
- 2's complement (digital processors)
- 1's complement
- Sign, magnitude
- Offset binary

Fixed Point Arithmetic: 2's Complement Representation
Example of Size 3 bits for Integers, Decimal and Binary Representations (1 of 2)

  Positive integers     Signed, offset binary    Signed, sign + magnitude
  Decimal  Binary       Decimal  Binary          Decimal  Binary
  7        111          3        111             3        011
  6        110          2        110             2        010
  5        101          1        101             1        001
  4        100          0        100             0        000
  3        011          -1       011             0        100
  2        010          -2       010             -1       101
  1        001          -3       001             -2       110
  0        000          -4       000             -3       111

- Bit weights: 2^2, 2^1, 2^0.
- Sign + magnitude has two codes for zero (000 and 100).

Example of Size 3 bits for Integers, Decimal and Binary Representations (2 of 2)

  Decimal   1's complement   2's complement
  3         011              011
  2         010              010
  1         001              001
  0         000 or 111       000
  -1        110              111
  -2        101              110
  -3        100              101
  -4        (none)           100

- For a negative x, the code word read as an unsigned integer y is:
  - 1's complement: $y = 2^N - 1 - |x|$
  - 2's complement: $y = 2^N - |x|$

Representation of Signed Integers in 2's Complement Format
- x is represented by the N bits $b_{N-1} b_{N-2} \dots b_0$.
- For $x \ge 0$: $x = \sum_{k=0}^{N-1} b_k 2^k$
- For $x < 0$: $y = 2^N - |x|$, with $y = \sum_{k=0}^{N-1} b_k 2^k$
- In both cases: $x = -b_{N-1} 2^{N-1} + \sum_{k=0}^{N-2} b_k 2^k$

Non-Integer Numbers Using Fixed Point Format
- Qk: k fractional bits, associated with negative powers of 2.
- The binary representation of a number x in format Qk is the 2's complement representation of the integer $y = 2^k x$.
- Bit layout: integer part $b_{N-1-k} \dots b_1 b_0$, fractional part $b_{-1} \dots b_{-k}$.
- $x = -b_{N-1-k} 2^{N-1-k} + b_{N-2-k} 2^{N-2-k} + \dots + b_0 + b_{-1} 2^{-1} + \dots + b_{-k} 2^{-k}$

Some Properties of 2's Complement Representation
- Max number = $2^{N-1} - 1$
- Min number = $-2^{N-1}$
- Circular representation: $(2^{N-1} - 1) + 1$ wraps around to $-2^{N-1}$ (related mode bits: OVM, SATD).
- Sign bit extension (related mode bits: SXM, SXMD).
- Related status bits in C5000 DSPs:
  - OVM = OVerflow Mode on C54 DSPs
  - SATD = SATuration mode of the D unit on C55 DSPs
  - SXM = Sign eXtension Mode on C54 DSPs
  - SXMD = Sign eXtension Mode of the D unit on C55 DSPs
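The 2's complement and Qk definitions above can be checked with a few lines of Python. This is only a sketch: the function names (`to_code`, `from_code`, `encode_qk`, `decode_qk`) are invented for illustration, and Python's built-in `round` (round half to even) is assumed to be an acceptable rounding rule.

```python
def to_code(y: int, n_bits: int) -> int:
    """Return the N-bit 2's complement code word of y, as an unsigned integer."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    assert lo <= y <= hi, "value does not fit in n_bits"
    # Masking yields y itself for y >= 0 and 2**N - |y| for y < 0.
    return y & ((1 << n_bits) - 1)


def from_code(code: int, n_bits: int) -> int:
    """Inverse of to_code: the sign bit carries the weight -2**(N-1)."""
    return code - (1 << n_bits) if code >> (n_bits - 1) else code


def encode_qk(x: float, k: int, n_bits: int) -> int:
    """Code word of x in format Qk: 2's complement of y = round(x * 2**k)."""
    return to_code(round(x * (1 << k)), n_bits)


def decode_qk(code: int, k: int, n_bits: int) -> float:
    """Value represented by an N-bit code word in format Qk."""
    return from_code(code, n_bits) / (1 << k)
```

For instance, `to_code(-4, 3)` gives the 3-bit word 100 from the table, and `encode_qk(0.75, 2, 3)` gives 011, the largest value of a 3-bit Q2 format.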
Addition and Subtraction Using 2's Complement
- Simple hardware operator: adds 2 signed N-bit integers with a result of size N bits.
- Whatever the signs of the numbers, it is sufficient to add the 2's complement code words.
- Examples on 3 bits (the carry is shown separately):
    111 + 111 = 1 110    (-1 + -1 = -2, carry = 1)
    010 + 001 = 0 011    (2 + 1 = 3, carry = 0)
    110 + 011 = 1 001    (-2 + 3 = 1, carry = 1)
    110 + 001 = 0 111    (-2 + 1 = -1, carry = 0)
- [Figure: number circle 0, 1, 2, 3, -4, -3, -2, -1, marking the carry and the (intermediate) overflow, OV = 1.]

Multiplying and Shifting in 2's Complement
- Simple hardware operator, but more difficult than with a sign-magnitude representation.
- The product of 2 N-bit numbers needs support for 2N-bit results.
- Generally, the product register is of size 2N bits, so the result has 2 identical MSBs (removed by a 1-bit left shift).
- Booth algorithm (on 3 bits): AB = -4A(b2 - b1) - 2A(b1 - b0) - A(b0 - 0)
- Arithmetic shifting by k bits to the right: sign bit extension is necessary.

Sign eXtension Mode SXM or SXMD
- With 2's complement, when 16-bit data are loaded into a 32-bit accumulator, the sign bit is also extended.
- Sign extension is sometimes unwanted, e.g. when calculating 16-bit addresses.
- The user can choose whether or not to use sign bit extension mode.
  - SXM = Sign eXtension Mode bit in the status word ST1 in C54 DSPs.
  - SXMD = Sign eXtension Mode bit for the D unit in the status word ST1_55 in C55 DSPs.

Sign Bit Extension
- Example: data size 6 bits, accumulator size 12 bits.
    Data:                                      1 0 1 0 0 1
    Loading of ACCU with sign extension:       1 1 1 1 1 1 1 0 1 0 0 1
    Loading of ACCU without sign extension:    0 0 0 0 0 0 1 0 1 0 0 1

Addition Overflow
- When adding 2 numbers of size N bits, the result may need N+1 bits.
- Example for integers of N=3 bits: 3 + 3 = 6 cannot be represented using 3 bits, but can be expressed using 4 bits.
- In format Q2 with N=3 bits, 0.75 + 0.5 = 1.25 cannot be represented using 3 bits; it needs 4 bits.
- When adding M numbers of N bits, the result potentially needs N + log2(M) bits.

Using Saturation
- Overflows in 2's complement create unexpected sign changes and peaks that are difficult to filter.
- Saturation arithmetic detects the overflow and replaces the result with a saturation value.
- [Figure: output versus input for a maximum value of 0.75, comparing saturation at 0.75 with 2's complement overflow.]

Setting saturation modes with OVM or SATD
- The user can choose whether or not to use saturation mode by setting the corresponding mode bits.
- OVM = OVerflow Mode bit in status word ST1 in C54 DSPs. If OVM = 1:
  - Positive results are saturated to 00 7FFF FFFF.
  - Negative results are saturated to FF 8000 0000.
- SATD = SATuration mode bit for the D unit in the status word ST1_55 in C55 DSPs.
  - If SATD = 1 and M40 = 0: same as for the C54 DSP.
  - If SATD = 1 and M40 = 1:
    - Positive results are saturated to 7F FFFF FFFF.
    - Negative results are saturated to 80 0000 0000.

Saturation mode for the A unit in C55 DSPs
- SATA = SATuration mode bit for the A-unit ALU in the status word ST3_55 in C55 DSPs.
- If SATA = 1 and a calculation in the A unit results in an overflow:
  - Positive results are saturated to 7FFF.
  - Negative results are saturated to 8000.
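As a rough illustration of the accumulator saturation values listed above, the following Python sketch clamps an exact result to 32 bits (the OVM = 1 case, or SATD = 1 with M40 = 0) or to 40 bits (SATD = 1 with M40 = 1). The function names and the `m40` parameter are hypothetical; this models only the documented saturation values, not the actual hardware data path.

```python
def saturate_acc(value: int, m40: bool = False) -> int:
    """Clamp an exact result to the 32-bit (m40=False) or 40-bit (m40=True) range."""
    bits = 40 if m40 else 32
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def acc_hex(value: int) -> str:
    """Format a value as a 40-bit 2's complement accumulator: 'GG HHHH LLLL'."""
    digits = f"{value & ((1 << 40) - 1):010X}"
    return f"{digits[:2]} {digits[2:6]} {digits[6:]}"
```

With these helpers, `acc_hex(saturate_acc(0x7FFFFFFF + 1))` yields `00 7FFF FFFF` and `acc_hex(saturate_acc(-(1 << 39) - 1, m40=True))` yields `80 0000 0000`, matching the values on the slide.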
Effect of 2's Complement Overflow
- As 2's complement is a circular representation, if the final result fits in N bits, intermediate overflows do not alter it.
- This is not the case with saturation.
- Example for N = 3 bits: calculate x = 3 + 2 - 4; the theoretical result is 1.
  - With 2's complement overflow:
    - First y = (3 + 2) = 011 + 010 = 101 = -3 (overflow).
    - Then (y - 4) = 101 + 100 = 1 001 = 1 with carry = 1: correct result.
  - With saturation:
    - First y = (3 + 2) = 3 (saturation).
    - Then (y - 4) = 011 + 100 = 111 = -1: wrong result.
- If a system has unity gain (the final result is known to fit in N bits), saturation should not be used.

Example of 2's Complement Binary Representations
- Represent x = 1.75 using N=6 bits in format Q3.
  - Answer: 001.110 = 1 + 1/2 + 1/4
- Represent x = -1.75 using N=6 bits in format Q3.
  - Answer: 110.010 = -4 + 2 + 1/4
- Represent x = 1.805 using N=6 bits in format Q3.
  - Answer: 001.110 = 1 + 1/2 + 1/4 (the nearest representable value, 1.75)

Operations with Fractional Numbers using Fixed Point Format
- Addition: bring both operands to the same size N and align bits of the same weight. Qk + Qk gives Qk.
- Multiplication: the product requires 2N bits. Qk x Qk' gives Q(k+k').

Example of 2's Complement Binary Operations
- Data size N=6, format Q3.
- Product on 12 bits, format Q6: 1.75 x 2.5 = 4.375
  - Binary representation: 001.110 x 010.100 = 000100.011000
- Sum on 6 bits, format Q3: 1.75 + 1.5 = 3.25
  - Binary representation: 001.110 + 001.100 = 011.010

Accumulator and size of the result
- The final result of a calculation usually uses more than 16 bits (the size of memory words).
- Accumulators use 32, 40, 56, ... bits.
- If we want to save the result in a single memory word, the question is: which group of N bits must be saved from the accumulator?
  - Possibility of overflow and underflow.
  - Overflow during accumulation or during saving.
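The 3-bit example x = 3 + 2 - 4 above can be replayed in Python to see why wraparound keeps the final result and saturation does not. This is a minimal sketch; `wrap`, `saturate` and `accumulate` are illustrative names, not C5000 operations.

```python
def wrap(value: int, n_bits: int) -> int:
    """Keep the low N bits and read them as a 2's complement number."""
    code = value & ((1 << n_bits) - 1)
    return code - (1 << n_bits) if code >> (n_bits - 1) else code


def saturate(value: int, n_bits: int) -> int:
    """Clamp to the N-bit 2's complement range."""
    hi = (1 << (n_bits - 1)) - 1
    lo = -(1 << (n_bits - 1))
    return max(lo, min(hi, value))


def accumulate(terms, n_bits: int, fix) -> int:
    """Add terms one by one, applying `fix` (wrap or saturate) after each addition."""
    acc = 0
    for t in terms:
        acc = fix(acc + t, n_bits)
    return acc
```

Here `accumulate([3, 2, -4], 3, wrap)` returns 1 (the intermediate -3 is corrected by the second addition), while `accumulate([3, 2, -4], 3, saturate)` returns -1.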
Accumulator
- Possibility of overflow and underflow.
- Scaling when adding many products.
- Accumulator layout: guard bits (39-32), ACCU high (31-16), ACCU low (15-0); 16 bits to save.

Saturation on store mode, SST bit
- SST = mode bit in the PMST (C54) or ST3_55 (C55) status word.
- If SST is set, the CPU saturates a shifted or unshifted accumulator value before storing it.
- The saturation value depends on the value of the sign extension mode bit.
- The accumulator itself remains unchanged.

Example of Fixed Point Processing
- y(n) = x(n) + a1 y(n-1)
- Data size N=16, product size 32 bits, accumulator size 40 bits.
- The coefficient a1 is smaller than 1: format Q15.
- Format of data = Q15.
- [Figure: 40-bit accumulator (bits 39-32, 31-16, 15-0). The product a1 y(n-1) is in Q30; x(n), in Q15, is added to it; the 16 bits to save hold y(n) in Q15.]

Representation of Sum of Products
- The basic sum of M products operation, for data and coefficients of size N bits:
  $y(n) = \sum_{k=0}^{M-1} b_k \, x(n-k)$
- Needs 2N bits for each product plus log2(M) bits for the sum of M products, i.e. at most 2N + log2(M) bits.
- The C5000 DSP has accumulators of size 32+8 bits that allow for the sum of 256 products without overflow.
- If M > 256, scaling of the data may be necessary.

Solutions to Overflow
- Multiplication overflow can be prevented by using pure fractional numbers (< 1).
- Saturation of the result.
- Scaling of the inputs and use of fractional arithmetic.
  - But loss of precision.
- Use double precision or double word.
  - But decreases speed of calculation.
- Use a DSP with larger accumulators.
  - 8 guard bits in the 'C5000 accumulators.
- Design the system with unity gain.
- Use floating point.
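The bit-growth rule for a sum of products can be illustrated with a small Python model. This is a sketch under simplifying assumptions: products are plain integer products (no fractional left shift), the accumulator is modelled as an exact integer checked against its range, and the names `accumulator_bits` and `mac` are made up for this example.

```python
def accumulator_bits(n_bits: int, m_products: int) -> int:
    """Upper bound 2N + ceil(log2(M)) on the bits needed for M products of N-bit data."""
    assert n_bits > 0 and m_products >= 1
    # (M - 1).bit_length() equals ceil(log2(M)) for M >= 1, without floating point.
    return 2 * n_bits + (m_products - 1).bit_length()


def mac(xs, bs, acc_bits: int = 40):
    """Multiply-accumulate; returns (exact sum, True if the accumulator range was ever exceeded)."""
    lo, hi = -(1 << (acc_bits - 1)), (1 << (acc_bits - 1)) - 1
    acc, overflowed = 0, False
    for x, b in zip(xs, bs):
        acc += x * b
        overflowed = overflowed or not (lo <= acc <= hi)
    return acc, overflowed
```

For N = 16 and M = 256, `accumulator_bits(16, 256)` gives 40, the size of a 32+8-bit accumulator. In this model, accumulating 256 worst-case products (-32768 x -32768) stays within 40 bits, whereas a 32-bit accumulator already overflows on the second product.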
Products of Size 2N or 2N-1 Bits? (1 of 3)
- The product of 2 data values of size N bits can be stored using 2N-1 bits, except where the two most negative numbers are multiplied together.
- Example of size N=3 bits for integer values:
  - The integer values are between -4 and +3.
  - All the products fit in the range -16 to +15 and can be written on 2N-1 = 5 bits, except -4 x -4 = 16.
- Example for N=16 bits and Q15 format:
  - -1 x -1 = +1 cannot be written on 31 bits in Q30.

Products of Size 2N or 2N-1 Bits? (2 of 3)
- Consider the case of data < 1 using N=16 bits, Q15 format.
- Their products are < 1 and can be expressed using 32 bits in format Q30, with 2 sign bits.
- With the C5000 DSP it is possible to eliminate one sign bit automatically by a left shift of 1 bit, thus obtaining a Q31 result.
  - If bit FRCT in ST1 is set to 1, products are automatically shifted left by 1 bit.

Products of Size 2N or 2N-1 Bits? (3 of 3)
- The exception -1 x -1 can be handled using the SMUL status bit, which saturates the result of the multiplication before accumulation.
- -1 is equal to 8000 in hexadecimal on 16 bits.
- If SMUL = 1, SATD or OVM = 1, and FRCT = 1:
  - The product (1)8000 x (1)8000 is saturated to the positive number 7FFF FFFF after the multiplication and before accumulation in MAC or MAS instructions.
  - This is consistent with the ETSI-GSM specifications.

Fixed Point Programming
- Perpetual compromise between dynamic range and precision constraints:
  - Keep enough bits to represent the integer part of the result.
  - Keep enough bits in the fractional part to satisfy the precision.
- Rounding results.
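The fractional multiply described in the three slides above can be sketched in Python as follows. The flags `frct` and `smul` are hypothetical parameters named after the status bits; the function models only the arithmetic effect described in the text (1-bit left shift, saturation of -1 x -1), not the exact behaviour of the hardware multiplier in every mode combination.

```python
def q15_mul(a: int, b: int, frct: bool = True, smul: bool = True) -> int:
    """Multiply two Q15 values given as signed 16-bit integers.

    Returns the product as an integer in Q31 if frct is True, else in Q30.
    """
    assert -0x8000 <= a <= 0x7FFF and -0x8000 <= b <= 0x7FFF
    product = a * b                  # Q30: two identical sign bits, except for -1 x -1
    if frct:
        product <<= 1                # drop the redundant sign bit: Q31
        if smul and product > 0x7FFFFFFF:
            product = 0x7FFFFFFF     # only reachable for 0x8000 x 0x8000
    return product
```

For example, 0.5 x 0.5 is `q15_mul(0x4000, 0x4000)`, which gives 0x20000000 (0.25 in Q31). With `smul=False`, `q15_mul(-0x8000, -0x8000)` returns 2**31, a value that no longer fits in a signed 32-bit word; with `smul=True` it is saturated to 0x7FFFFFFF.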
Entering Non-Integer Values using the Software Development Tools
- The tools do not support fractions.
- To store 0.707 in Q15 use:
    .word 32768*707/1000
- To store 3.252 in Q13 use:
    .word 8192*3252/1000
- Generally, to convert a real number x using 2's complement representation with size N bits and format Qk:
  - Calculate the integer $y = \mathrm{round}(x \, 2^k)$.
  - The 2's complement representation of y is the 2's complement representation of x in format Qk.

Some more stuff on Saturation
- Two saturation methods exist.
- Manual: using the SAT instruction (ACx only).
  - [Figure: effect of SAT AC0 on the value of AC0.]
- Auto: using the SATA/SATD or OVM control bits.
  - SATA affects the TAx registers (T0-3/AR0-7) in the A unit.
    - e.g. 7FFFh + 2 = 7FFFh
    - e.g. 8001h - 3 = 8000h
  - SATD affects the AC0-3 registers in the D unit.
    - With M40 = 0 (in ST1_55): 00.7FFF.FFFF or FF.8000.0000
    - With M40 = 1 (in ST1_55): 7F.FFFF.FFFF or 80.0000.0000
    - Affects the ACxOV bits in ST0_55, which can be tested.

Rounding
- How do you round this amount to the nearest $?
    $1.53   amount
    $0.50   add $0.50
    $2.03   partial result
    $2.     truncate the result (to the nearest $)
- The instruction RND in C54 DSPs, or ROUND in C55 DSPs, rounds the content of the accumulator.
- For the C55, there are 2 kinds of rounding, biased or unbiased, depending on the bit RDM in ST2_55.
- Biased rounding (RDM = 0 in ST2_55), or rounding to infinity:
  - Direct: ROUND AC0
  - Store: MOV uns(rnd(HI(saturate(AC0)))),*AR1
- rnd() and ROUND perform the following operation: add 1 to bit 15, then truncate:
    (ACx + 0x8000) & 0xFFFF0000

Other Useful Stuff...
- Absolute value: ABS AC0,AC1
- 2's complement: NEG AC0,AC1
- 1's complement: NOT AC0,AC1
- 1-bit division: SUBC Smem,ACx
- Normalization: MANT; EXP
- Setting SMUL, FRCT and SATD to 1 in ST1_55 will saturate (-1 x -1) to 7FFF_FFFFh prior to adding/subtracting to/from the accumulator.
- This ensures a 1-cycle, ETSI-compatible operation and prevents temporary overflow.

Floating Point Arithmetic

Floating Point Representation
- A number x is represented by a mantissa M and an exponent E: $x = M \, 2^E$
- If M is of size m bits and E is of size e bits, then x is of size N = m + e bits.
- Range of positive numbers for $0.5 \le |M| < 1$, with M and E in 2's complement representation:
  $\left[ \tfrac{1}{2} \, 2^{-2^{e-1}}, \; (1 - 2^{1-m}) \, 2^{2^{e-1}-1} \right]$

Normalization of the mantissa
- The decomposition of a real value x into the product of a mantissa and an exponent term is not unique: $x = M_1 2^{E_1} = M_2 2^{E_2} = \dots$
  - Example: 12.8 = 0.8 x 2^4 and also 12.8 = 1.6 x 2^3.
- M must be normalized to make the decomposition unique.
- The normalization is a constraint applied to M, for example $0.5 \le |M| < 1$.
- The ratio of the limits of the interval must not exceed 2, so that each value has a single exponent.

Floating Point Representation
- Non-linear scale: the absolute precision decreases geometrically as the magnitude of the data increases.
- [Figure: number line with 0, xmin, 2xmin, 4xmin, 8xmin; each interval between consecutive powers of 2 contains 2^(m-2) values.]
- For a given number of bits:
  - the number of bits of the mantissa determines the precision;
  - the number of bits of the exponent determines the dynamic range.

Floating Point Overflow or Underflow
- Very unlikely to occur.
- Overflow: $|x| > 2^{2^{e-1}-1}$
- Underflow: $|x| < \tfrac{1}{2} \, 2^{-2^{e-1}}$

Floating Point Addition Operator
- $A + B = M_a 2^{E_a} + M_b 2^{E_b} = \left( M_a + M_b 2^{E_b - E_a} \right) 2^{E_a}$
- It is necessary to denormalize the smallest number (B).
- Its mantissa is multiplied by $2^{E_b - E_a}$ before being added to $M_a$.
- Loss of precision due to the rounding of the mantissa.

Floating Point Multiplication Operator
- $A \times B = M_a M_b \, 2^{E_a + E_b}$
- It is necessary to normalize $M_a M_b$.
- 1 extra bit would be necessary to prevent overflow of $E_a + E_b$.
- 2m-1 bits are necessary to represent $M_a M_b$.
- If M is truncated to m bits, the absolute error increases rapidly.

Examples of Floating Point DSP
- Some DSP devices of the C6000 family: the C67xx support both single and double precision formats.
- The C5000 DSPs are fixed point DSPs but can be programmed in floating point if necessary.

Example of Floating Point Representation
- Represent x = 1.75 in floating point.
- Solution:
  - Use N=8, mantissa size m=5 bits, exponent size e=3 bits, M and E in 2's complement.
  - Mantissa normalized to $0.5 \le |M| < 1$.
  - E = 1, in binary representation: 001
  - M = 0.875, in binary representation: 0.1110

Comparison of Fixed and Floating Point Formats
- Fixed point: linear scale.
  - Absolute error more or less constant.
  - SNR decreases when the input decreases.
- Floating point: non-linear scale with a geometrical progression.
  - Relative error more or less constant.
  - SNR more or less constant over the full data range.

Quantization Error and SNR with Fixed Point
- $d = \hat{x} - x$ (rounding), with $|d| \le q/2$
- $E(d) = 0$, $E(d^2) = \sigma_d^2 = q^2/12$
- $\mathrm{SNR}_{dB} = 10 \log_{10} \left( \sigma_x^2 / \sigma_d^2 \right)$
- For $|x| \le x_{max}$: $\mathrm{SNR}_{dB} \approx 6N - 10 \log_{10} \left( x_{max}^2 / 3 \right) + 10 \log_{10} \sigma_x^2$

Quantization Error with Floating Point
- $d = \hat{x} - x$ (rounding)
- $d_m$ = rounding error on the mantissa: $0 \le |d_m| \le \tfrac{1}{2} \, 2^{-(m-1)}$
- $d_r$ = relative error on x: $d_r = d / x = d_m / M$
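To tie the 8-bit example (x = 1.75, m = 5, e = 3) to the rounding error on the mantissa, here is a small Python model of that toy format. It is a sketch only: it handles positive x, relies on `math.frexp` (which happens to use the same normalization, 0.5 <= |M| < 1), and the names `encode` and `decode` are invented for this illustration.

```python
import math


def encode(x: float, m: int = 5, e: int = 3):
    """Return (mantissa, exponent code) for x > 0; mantissa is an integer in Q(m-1)."""
    assert x > 0, "this sketch only handles positive values"
    mant, exp = math.frexp(x)              # x = mant * 2**exp with 0.5 <= mant < 1
    q = round(mant * (1 << (m - 1)))       # round the mantissa to m-1 fractional bits
    if q == 1 << (m - 1):                  # rounding reached 1.0: renormalize
        q >>= 1
        exp += 1
    if not -(1 << (e - 1)) <= exp <= (1 << (e - 1)) - 1:
        raise ValueError("exponent out of range (overflow or underflow)")
    return q, exp & ((1 << e) - 1)         # exponent stored in e-bit 2's complement


def decode(q: int, exp_code: int, m: int = 5, e: int = 3) -> float:
    """Value represented by a (mantissa, exponent code) pair."""
    exp = exp_code - (1 << e) if exp_code >> (e - 1) else exp_code
    return q / (1 << (m - 1)) * 2.0 ** exp
```

`encode(1.75)` returns `(0b01110, 0b001)`, i.e. M = 0.1110 and E = 001 as in the example. With a 4-bit exponent, `encode(12.8, m=5, e=4)` stores 13/16 x 2^4 = 13.0; the relative error of about 1.6 % is within the bound 2^(1-m) that follows from |d_m| <= 2^(-m) and |M| >= 0.5.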
Quantization Error and SNR with Floating Point
- $|d_r| \le 2^{-(m-1)}$
- For x random with "fast variations", $d_r$ is a white noise uncorrelated with x.
- $d = \hat{x} - x = x \, d_r$
- $\sigma_d^2 = \sigma_x^2 \, \sigma_{d_r}^2$
- $\mathrm{SNR}_{dB} \approx 6m + 1.44$

Comparison of Fixed Point and Floating Point SNR
- [Figure: SNR in dB versus signal power in dB, for N=16 bits. Marked levels: 86 dB (fixed point SNR) and 73 dB (floating point SNR, m=12, e=4).]

Comparison of Fixed Point and Floating Point
- For N bits, with the floating point format there is a compromise between dynamic range (E) and precision (M).
- Example for N=32 bits:

                    Fixed point, 32 bits    Floating point, m=24, e=8
    Dynamic range   10^9                    10^77
    Precision       9 digits (max)          7 digits

- Dynamic range is defined as the ratio of the largest positive value to the smallest non-zero positive value.

Fixed Point vs. Floating Point
- Fixed point:
  - Simple operators for addition and multiplication.
  - But it is necessary to monitor overflow and underflow in order to keep precision and dynamic range at their best.
- Floating point:
  - Greater dynamic range and simpler programming.
  - More complex operators, so performance in terms of speed or power consumption is not as good as that of fixed point DSPs.

IEEE 754 Floating Point Format (1 of 4)
- Most processors follow the IEEE 754 format for floating point representation of numbers.
- IEEE format for N=32 bits: 32 bits = 1 bit (sign) + 8 bits (exponent) + 23 bits (fraction).
- Exponent: offset binary, offset = 127, exponent = expo - 127.
- Mantissa: sign-magnitude, normalized between 1.0...0 and 1.1...1.
- Hidden bit: the leading 1 is implicit; only the fractional part (fraction) is stored.
- When the exponent field is not equal to 0, |mantissa| = 1.fraction.
- Example:
  x = 28 = 1.75 x 2^4 is coded as 0 10000011 1100...0
- $\text{value} = (-1)^{sign} \times (1.\text{fraction}) \times 2^{(expo - 127)}$ for a non-zero exponent field.

Dynamic Range of IEEE 754 Single Precision Floating Point Format (2 of 4)
- $\text{value} = (-1)^{sign} \times (1.\text{fraction}) \times 2^{(expo - 127)}$ for a non-zero exponent field.
- Largest positive number:
  - Max exponent = 254 - 127 = 127
  - Max mantissa = $2 - 2^{-23}$
  - Max positive value = $(2 - 2^{-23}) \times 2^{127} \approx 2^{128}$
- Smallest (non-zero) positive number:
  - Min exponent = 1 - 127 = -126
  - Min mantissa = 1.0
  - Min positive value = $1.0 \times 2^{-126}$

IEEE 754 Single Precision Floating Point Format, Special Cases (3 of 4)
- Zero: all 32 bits are 0.
- Underflow: exponent < 1.
- Overflow: exponent > 254.

IEEE 754 Floating Point Format (4 of 4)
- Double precision, 64 bits: 1+11+52.
  - Exponent in offset binary: offset = 1023.
- Extended single precision, 43 bits: 1+11+31.
- Extended double precision, 79 bits: 1+15+63.

Block Floating Point

Block Floating Point
- This is not a DSP format.
- This is a way of doing floating point operations efficiently on a fixed point DSP.
- Natural approach for block operations such as the Fast Fourier Transform (FFT).
  - See details in chapter 19.

Block Floating Point
- A register contains the value of the exponent (constant) to be applied to a block of data: the BLOCK EXPONENT.
- The mantissa is of size N bits.
- Each data block is tested and scaled by the exponent in order to avoid overflows.
- Useful when N is small (e.g. N=16 bits).
- Limits the loss of precision due to the increase in dynamic range of floating point.
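The block exponent idea can be sketched in Python as follows. This shows one possible variant only, not necessarily the scheme detailed in chapter 19: the whole block is shifted left by the largest amount that keeps every sample within N bits, and the common shift is recorded as the block exponent. The function name `block_normalize` and the sign convention of the exponent are assumptions made for this example.

```python
def block_normalize(block, n_bits: int = 16):
    """Scale a block of N-bit integers by a common power of 2.

    Returns (mantissas, exponent) such that sample = mantissa * 2**exponent.
    """
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    assert all(lo <= v <= hi for v in block), "samples must fit in n_bits"
    if not any(block):                     # all-zero block: nothing to normalize
        return list(block), 0
    shift = 0
    # Grow the shift while every sample still fits in N bits after shifting.
    while all(lo <= (v << (shift + 1)) <= hi for v in block):
        shift += 1
    return [v << shift for v in block], -shift
```

For the block `[100, -200, 50]` on 16 bits, the largest magnitude (200) allows a shift of 7, so the result is `([12800, -25600, 6400], -7)`: every sample shares the exponent -7, and the mantissas use the available 16 bits as fully as the largest sample permits.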