Quantization Effects in Digital Filters

Distribution of Truncation Errors

In two's complement representation an exact number would, in general, require infinitely many bits. When we limit the number of bits to some finite value, we truncate all the lower-order bits. Suppose the number is 23/64. In two's complement binary this is 0.0101110000... If we now limit the representation to 5 bits we get 0.0101, which represents exactly 20/64 = 5/16. The truncation error is 5/16 - 23/64 = -3/64. If the exact number is -15/64, this is 1.110001000... in two's complement. Truncating to 5 bits yields 1.1100, which exactly represents -1/4. The error is -1/4 - (-15/64) = -1/64. This illustrates that in two's complement truncation the error is never positive.

In the analysis of truncation errors we make the following assumptions:

1. The errors are uniformly distributed over a range equal to the value of one LSB.
2. The error sequence is a white noise process; that is, each error is uncorrelated with every other error.
3. The truncation error and the signal whose values are being truncated are uncorrelated.

Quantization of Filter Coefficients

Let a digital filter's ideal transfer function be

    H(z) = \frac{\sum_{k=0}^{N} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}

If the a's and b's are quantized, the poles and zeros move, creating a new transfer function

    \hat{H}(z) = \frac{\sum_{k=0}^{N} \hat{b}_k z^{-k}}{1 + \sum_{k=1}^{N} \hat{a}_k z^{-k}}

where \hat{a}_k = a_k + \Delta a_k and \hat{b}_k = b_k + \Delta b_k.

The denominator can be factored into the form

    D(z) = 1 + \sum_{k=1}^{N} a_k z^{-k} = \prod_{k=1}^{N} \left(1 - p_k z^{-1}\right)

and, with quantization,

    \hat{D}(z) = 1 + \sum_{k=1}^{N} \hat{a}_k z^{-k} = \prod_{k=1}^{N} \left(1 - \hat{p}_k z^{-1}\right)

where \hat{p}_k = p_k + \Delta p_k.
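The two truncation examples above can be reproduced with exact rational arithmetic; a minimal sketch (5 bits = 1 sign bit + 4 fraction bits, so the LSB is 1/16):

```python
import math
from fractions import Fraction

def truncate(x: Fraction, frac_bits: int = 4) -> Fraction:
    # floor() implements two's complement truncation: it always rounds
    # toward -infinity, so the error lies in (-LSB, 0].
    scale = 2 ** frac_bits
    return Fraction(math.floor(x * scale), scale)

for x in (Fraction(23, 64), Fraction(-15, 64)):
    xt = truncate(x)
    print(f"{x} -> {xt}, error = {xt - x}")   # errors: -3/64 and -1/64
```

Note that both errors come out negative, illustrating the claim that two's complement truncation error is never positive.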
The errors in the pole locations are related to the errors in the coefficients by

    \Delta p_i = \frac{\partial p_i}{\partial a_1}\Delta a_1 + \frac{\partial p_i}{\partial a_2}\Delta a_2 + \cdots + \frac{\partial p_i}{\partial a_N}\Delta a_N = \sum_{k=1}^{N} \frac{\partial p_i}{\partial a_k}\Delta a_k

The partial derivatives can be found by implicitly differentiating D(p_i) = 0:

    \frac{\partial p_i}{\partial a_k} = -\frac{\left[\partial D(z)/\partial a_k\right]_{z=p_i}}{\left[\partial D(z)/\partial z\right]_{z=p_i}}

The numerator is

    \left[\frac{\partial D(z)}{\partial a_k}\right]_{z=p_i} = \left[\frac{\partial}{\partial a_k}\left(1 + \sum_{k=1}^{N} a_k z^{-k}\right)\right]_{z=p_i} = \left[z^{-k}\right]_{z=p_i} = p_i^{-k}

For the denominator, use the factored form:

    \left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \left[\frac{\partial}{\partial z}\prod_{q=1}^{N}\left(1 - p_q z^{-1}\right)\right]_{z=p_i}

The derivative of the qth factor is

    \frac{\partial}{\partial z}\left(1 - p_q z^{-1}\right) = \frac{p_q}{z^2}

and the overall derivative is

    \left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \left[\sum_{k=1}^{N}\frac{p_k}{z^2}\prod_{\substack{q=1 \\ q \neq k}}^{N}\left(1 - p_q z^{-1}\right)\right]_{z=p_i} = \sum_{k=1}^{N}\frac{p_k}{p_i^2}\, p_i^{-(N-1)}\prod_{\substack{q=1 \\ q \neq k}}^{N}\left(p_i - p_q\right) = \frac{1}{p_i^{N+1}}\sum_{k=1}^{N} p_k \prod_{\substack{q=1 \\ q \neq k}}^{N}\left(p_i - p_q\right)

In the sum over k, for any k other than i the iterated product \prod_{q \neq k}(p_i - p_q) contains the factor p_i - p_i = 0. Therefore the only term in the sum that survives is the k = i term and

    \left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \frac{1}{p_i^{N+1}}\, p_i \prod_{\substack{q=1 \\ q \neq i}}^{N}\left(p_i - p_q\right) = \frac{1}{p_i^{N}}\prod_{\substack{q=1 \\ q \neq i}}^{N}\left(p_i - p_q\right)

Then

    \frac{\partial p_i}{\partial a_k} = -\frac{p_i^{N-k}}{\prod_{\substack{q=1 \\ q \neq i}}^{N}\left(p_i - p_q\right)}

and

    \Delta p_i = -\sum_{k=1}^{N} \frac{p_i^{N-k}}{\prod_{\substack{q=1 \\ q \neq i}}^{N}\left(p_i - p_q\right)}\,\Delta a_k

The error in a pole location caused by errors in the coefficients is strongly affected by the denominator factor \prod_{q \neq i}(p_i - p_q), a product of differences between pole locations. If the poles are closely spaced, these distances are small and the error in the pole location is large. Therefore systems with widely-spaced poles are less sensitive to errors in the coefficients.
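The sensitivity formula can be checked numerically. The sketch below uses a hypothetical pair of closely spaced real poles (0.90 and 0.91, chosen only for illustration), perturbs a_1 slightly, and compares the predicted pole shift against the exact shift computed by re-finding the roots:

```python
import numpy as np

def pole_sensitivity(poles, k):
    # dp_i/da_k = -p_i**(N-k) / prod_{q != i} (p_i - p_q), for each pole p_i
    N = len(poles)
    out = []
    for i, pi in enumerate(poles):
        denom = np.prod([pi - pq for q, pq in enumerate(poles) if q != i])
        out.append(-pi ** (N - k) / denom)
    return np.array(out)

poles = np.array([0.90, 0.91])      # closely spaced -> large sensitivity
a = np.poly(poles)                  # D(z) coefficients: [1, a1, a2]
da1 = 1e-6                          # small error in coefficient a1

predicted = pole_sensitivity(poles, k=1) * da1

a_pert = a.copy()
a_pert[1] += da1                    # perturb a1 and re-solve for the poles
actual = np.sort(np.roots(a_pert)) - np.sort(poles)

print(predicted)    # a coefficient error of 1e-6 moves each pole by ~9e-5
print(actual)
```

With a pole spacing of only 0.01 the coefficient error is amplified by a factor of about 90, consistent with the small \prod(p_i - p_q) denominator.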
Quantization of Pole Locations

Consider a 2nd-order system with transfer function

    H(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}

and quantize a_1 and a_2 with 5 bits. If we quantize a_1 in the range -2 \le a_1 < 2 and a_2 in the range -1 \le a_2 < 1, we get a finite set of possible pole locations. If we look at only the pole locations for stable systems, the pole locations are obviously non-uniformly distributed. [Figure of the quantized pole-location grid not reproduced.]

Consider this alternative 2nd-order system design:

    H(z) = \frac{d z^{-1}}{\left(1 - (c + jd) z^{-1}\right)\left(1 - (c - jd) z^{-1}\right)}

The poles lie at c \pm jd, so quantizing c and d directly yields pole locations that are uniformly distributed. [Figures comparing the two pole-location grids not reproduced.]

Scaling to Prevent Overflow

Let y_k[n] be the response at node k to an input signal x[n] and let h_k[n] be the impulse response at node k. Then

    \left|y_k[n]\right| = \left|\sum_{m=-\infty}^{\infty} h_k[m]\, x[n-m]\right| \le \sum_{m=-\infty}^{\infty} \left|h_k[m]\right| \left|x[n-m]\right|

Let the upper bound on |x[n]| be A_x. Then

    \left|y_k[n]\right| \le A_x \sum_{m=-\infty}^{\infty} \left|h_k[m]\right|

If we want y_k[n] to lie in the range (-1, 1), then

    A_x < \frac{1}{\sum_{n=-\infty}^{\infty} \left|h_k[n]\right|}

guarantees that overflow will not occur.

This condition may be too severe in some cases. Another type of scaling that may be more appropriate in some cases is

    A_x = \frac{1}{\max_{0 \le \Omega \le \pi} \left|H_k\left(e^{j\Omega}\right)\right|}

where H_k(e^{j\Omega}) is the discrete-time Fourier transform of h_k[n].

Scaling Example

Assume that the input signal x is white noise bounded between -1 and +1 and that all additions and multiplications are done using 8-bit two's complement arithmetic. Also let all numbers (signals and coefficients) be in the range -1 to 1. The filter is

    H(z) = \frac{1 - z^{-1} + 0.5 z^{-2}}{1 + 0.2 z^{-1} - 0.15 z^{-2}} = \frac{z^2 - z + 0.5}{z^2 + 0.2 z - 0.15}

Quantize the K's (the lattice reflection coefficients) to 8 bits:

    K_1 = 0.2353 \to [Q] \to 0.2344 \quad (0.0011110)
    K_2 = -0.15 \to [Q] \to -0.1562 \quad (1.1101100)

Scale and quantize the v's.
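The [Q] operation in this example behaves as two's complement truncation to 8 bits (1 sign bit + 7 fraction bits) with saturation; the quantized values shown for the K's, and for the v's below, are consistent with truncation toward minus infinity rather than rounding to nearest. A minimal sketch of such a quantizer:

```python
import math

def q8(x: float) -> float:
    # 8-bit two's complement quantizer: truncate toward -infinity,
    # then saturate to the representable range [-1, 127/128].
    n = math.floor(x * 128)
    n = max(-128, min(127, n))
    return n / 128

print(q8(0.2353))   # K1 -> 0.234375
print(q8(-0.15))    # K2 -> -0.15625
```

Note that q8(1.0) saturates to 127/128 = 0.9922, which is why v_0 below quantizes to 0.9922 rather than 1.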
    v_0 = 1.338 \to 1.338/1.338 = 1 \to [Q] \to 0.9922 \quad (0.1111111)
    v_1 = -1.1 \to -1.1/1.338 = -0.8221 \to [Q] \to -0.8281 \quad (1.0010110)
    v_2 = 0.5 \to 0.5/1.338 = 0.3737 \to [Q] \to 0.3672 \quad (0.0101111)

Scaling the v's scales the output signal, but the input signal may still need scaling to prevent overflow. The exact transfer function is

    H(z) = \frac{1}{A_2(z)} \sum_{m=0}^{N} v_m B_m(z)

where A_2(z) = 1 + 0.2 z^{-1} - 0.15 z^{-2}. The actual A_2(z), after quantizing the K's, is

    A_2(z) = 1 + 0.1978 z^{-1} - 0.1562 z^{-2}

and B_0(z) = 1, B_1(z) = 0.2344 + z^{-1}, B_2(z) = -0.1562 + 0.1978 z^{-1} + z^{-2}. Therefore the actual transfer function is

    H(z) = \frac{0.7407 - 0.7555 z^{-1} + 0.3672 z^{-2}}{1 + 0.1978 z^{-1} - 0.1562 z^{-2}}

The maximum magnitude of H(e^{j\Omega}) in this example is 2.882 and the maximum magnitude of H_2(e^{j\Omega}) is 1.5455, so the scaling factor should be 1/2.882 = 0.347. The simplest scaling is by shifting right. In this case that would require a shift of two bits to the right, scaling by 0.25. This is significantly less than 0.347, so a more complicated scaling might be preferred. We could instead scale by 0.25 + 0.0625 + 0.03125 = 0.34375 by shifting two bits, four bits and five bits and adding the results.

Statistical Analysis of Quantization Effects

The exact computation of quantization effects requires numerical simulation and is not amenable to exact analytical methods. But an approach that has proven useful is to treat the quantization noise as a random process. In doing this we get an approximate analytical estimate instead of an exact numerical simulation. We do this by treating quantization noise as an added noise signal injected into the system everywhere quantization occurs.

The probability density function of the quantization noise for two's complement truncation is

    f_Q(q) = \begin{cases} 1/\mathrm{LSB}, & -\mathrm{LSB} < q < 0 \\ 0, & \text{otherwise} \end{cases}

The expected value of the quantization noise is E(Q) = -\mathrm{LSB}/2 and its variance is \sigma_Q^2 = \mathrm{LSB}^2/12. Therefore the mean-squared quantization noise is

    E\left(Q^2\right) = \sigma_Q^2 + \left[E(Q)\right]^2 = \mathrm{LSB}^2/12 + \left(-\mathrm{LSB}/2\right)^2 = \mathrm{LSB}^2/3

The power spectral density G_Q(F) of this quantization noise has an impulse of strength \mathrm{LSB}^2/4 at F = 0 and is otherwise flat with a value K in the range -1/2 < F < 1/2; it repeats periodically outside this range. Integrating over one period,

    E\left(Q^2\right) = \int_{-1/2}^{1/2} G_Q(F)\, dF = \int_{-1/2}^{1/2} \left[\left(\mathrm{LSB}^2/4\right) \sum_{k=-\infty}^{\infty} \delta(F - k) + K\right] dF = \mathrm{LSB}^2/4 + K = \mathrm{LSB}^2/3

Solving, K = \mathrm{LSB}^2/12 and

    G_Q(F) = \mathrm{LSB}^2/12 + \left(\mathrm{LSB}^2/4\right) \sum_{k=-\infty}^{\infty} \delta(F - k)

This is the power spectral density of both quantization noise sources. (In the first-order example system y[n] = a\, y[n-1] + x[n], quantization occurs both at the input and at the feedback multiplication, so there are two noise injection points.) The power spectral density of the quantization-noise effect on the output signal is

    G_y(F) = G_{y,x}(F) + G_{y,f}(F)

where

    G_{y,x}(F) = G_Q(F) \left|H_{xy}(F)\right|^2 \qquad G_{y,f}(F) = G_Q(F) \left|H_{fy}(F)\right|^2

and both injection points see the same transfer function to the output:

    H_{xy}(F) = H_{fy}(F) = \frac{e^{j2\pi F}}{e^{j2\pi F} - a}

The two quantization noise sources have the same power spectral density and are independent, so the output quantization noise power is just twice what would be produced by one of them acting alone:

    P_{Q,y} = 2 \int_{-1/2}^{1/2} G_Q(F) \left|\frac{e^{j2\pi F}}{e^{j2\pi F} - a}\right|^2 dF = \frac{\mathrm{LSB}^2}{2} \left(\frac{1}{1-a}\right)^2 + \frac{\mathrm{LSB}^2}{6} \int_{-1/2}^{1/2} \left|\frac{e^{j2\pi F}}{e^{j2\pi F} - a}\right|^2 dF
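The remaining integral has the closed form 1/(1 - a^2) for |a| < 1, which can be checked numerically; a sketch using a simple midpoint-rule integration:

```python
import numpy as np

def noise_gain(a, n=200_000):
    # Midpoint-rule estimate of  integral_{-1/2}^{1/2} |e^{j2πF} / (e^{j2πF} - a)|^2 dF
    # (interval length is 1, so the integral equals the mean of the samples)
    F = (np.arange(n) + 0.5) / n - 0.5
    H = np.exp(2j * np.pi * F) / (np.exp(2j * np.pi * F) - a)
    return np.mean(np.abs(H) ** 2)

for a in (0.5, 0.9):
    print(a, noise_gain(a), 1 / (1 - a ** 2))   # numerical vs. closed form
```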
Evaluating the integral,

    P_{Q,y} = \frac{\mathrm{LSB}^2}{2} \left(\frac{1}{1-a}\right)^2 + \frac{\mathrm{LSB}^2}{6} \cdot \frac{1}{1-a^2}

In this expression, the term \left(\mathrm{LSB}^2/2\right)\left(1/(1-a)\right)^2 is the effect of the expected value of the input quantization noise, and the term \left(\mathrm{LSB}^2/6\right) \cdot 1/\left(1-a^2\right) is the effect of the variance of the input quantization noise. Both depend on a: the closer a is to one, the larger the output quantization noise.
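The dependence on a can be made concrete by evaluating this expression for a few feedback coefficients. The sketch below assumes an 8-bit word (LSB = 2^-7), chosen only for illustration:

```python
# Output quantization-noise power for the first-order example:
#   P = (LSB^2/2) * (1/(1-a))^2 + (LSB^2/6) * 1/(1-a^2)

LSB = 2 ** -7   # assumed 8-bit word: 1 sign bit + 7 fraction bits

def p_qy(a: float) -> float:
    return (LSB ** 2 / 2) * (1 / (1 - a)) ** 2 + (LSB ** 2 / 6) / (1 - a ** 2)

for a in (0.5, 0.9, 0.99):
    print(a, p_qy(a))   # noise power grows rapidly as a approaches 1
```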