Image Compression System
Megan Fuller and Ezzeldin Hamed

Transforms of Images
[Figure panels: Original Image; Image Reconstructed from 25% of DFT coefficients; Magnitude of DFT of (Image − 128) (otherwise the DC component is about 8e6)]

The 2D Discrete Fourier Transform
$$X(k_1, k_2) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} x(n_1, n_2)\, W_{N_1}^{n_1 k_1}\, W_{N_2}^{n_2 k_2}$$
where $W_N = e^{-j 2\pi / N}$. This can be computed separably by rearranging:
$$X(k_1, k_2) = \sum_{n_2=0}^{N_2-1} W_{N_2}^{n_2 k_2} \sum_{n_1=0}^{N_1-1} x(n_1, n_2)\, W_{N_1}^{n_1 k_1}$$

The 2D Discrete Cosine Transform
$$C_x(k_1, k_2) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} 4\, x(n_1, n_2) \cos\!\left(\frac{\pi}{2 N_1} k_1 (2 n_1 + 1)\right) \cos\!\left(\frac{\pi}{2 N_2} k_2 (2 n_2 + 1)\right)$$
• Computed separably
• Computed as a DFT + 1 multiply
• Generally gives better energy compaction than the DFT

High Level Architecture
[Block diagram: Input Memory; Separable, in-place 2D DFT/DCT; Output Module (sending data to PC); Coefficient > Threshold?]
• The choice between DFT and DCT is made at compile time
• The threshold is provided by the user at run time

What’s Interesting?
• Reducing the computation required
• Sharing resources in the DCT case
• Some memory organization tricks
• Reducing bit width

Number of FFTs
• Using the FFT to calculate the 1D DFT reduces the cost from $N^2$ to $N \log N$
• We need 2N FFTs to calculate the 2D DFT (N row transforms plus N column transforms)
• Can we reduce the number of FFTs?
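The separable rearrangement of the 2D DFT can be checked numerically. The sketch below is an illustration only, not the hardware design: it uses a direct O(N^2) 1D DFT in place of an FFT, and the function names and the `img[n2][n1]` layout are assumptions made for this example. It runs one 1D transform per row and then one per column (2N transforms for an N × N image) and compares the result with the direct double sum.

```python
import cmath


def dft_1d(x):
    """Direct 1D DFT: X(k) = sum_n x(n) * W_N^(n*k), with W_N = exp(-j*2*pi/N)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(x[n] * w ** (n * k) for n in range(n_pts)) for k in range(n_pts)]


def dft_2d_separable(img):
    """2D DFT as 1D DFTs over n1 (along each row), then over n2 (down each column).

    img is indexed as img[n2][n1]; the result is indexed as X[k2][k1].
    """
    rows = [dft_1d(list(row)) for row in img]          # one transform per row
    cols = [dft_1d(list(col)) for col in zip(*rows)]   # one transform per column
    return [list(r) for r in zip(*cols)]               # transpose back to X[k2][k1]


def dft_2d_direct(img):
    """2D DFT evaluated straight from the double-sum definition (for checking only)."""
    n2_pts, n1_pts = len(img), len(img[0])
    w1 = cmath.exp(-2j * cmath.pi / n1_pts)
    w2 = cmath.exp(-2j * cmath.pi / n2_pts)
    return [
        [
            sum(
                img[n2][n1] * w1 ** (n1 * k1) * w2 ** (n2 * k2)
                for n2 in range(n2_pts)
                for n1 in range(n1_pts)
            )
            for k1 in range(n1_pts)
        ]
        for k2 in range(n2_pts)
    ]


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two 2D arrays."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    diff = max_abs_diff(dft_2d_separable(image), dft_2d_direct(image))
    print("max difference between separable and direct 2D DFT:", diff)
```

The separable form replaces one large double sum per output coefficient with short 1D transforms, which is what makes an FFT-based implementation practical.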

Reduction for the DFT case
• Using the DFT properties
[Figure: 4×4 example array with elements S00 to S33, divided into Real and Imag parts]
• N/2 FFTs of the rows, followed by even/odd decomposition
– Input is real: $x(n)$
– Output is symmetric: $X(k) = X^*(-k)$
– Combining rows: FFT of $S_i + jS_{i+N/2}$ (row $i$ as the real part, row $i + N/2$ as the imaginary part)
– Even/odd decomposition
• Output is symmetric (discard half the columns)
• N/2 FFTs of the columns
• Total of N FFT computations

Reduction in the DCT case
[Figure: the same 4×4 example array, with elements paired into Real and Imag parts]
• Again combine the rows in the same way as for the DFT (N/2 FFTs)
• Even/odd decomposition, then an extra multiplication to calculate the DCT
• Results are not symmetric
• But the DCT is real
• We can combine the columns the same way we combined the rows (N/2 FFTs)
• The same multiplier inside the FFT is used
• Another even/odd decomposition is required here, with an extra complex multiplier
• Total of N FFT computations plus a few extra multiplications

In-Place Radix-4 FFT
• Critical path
• Fixed-point arithmetic
• Bit width?
– Quantization noise: rounding instead of truncation
– Avoid any overflow: N additions need $\log_2 N$ extra bits
• Can we do better?

Static Scaling vs. Dynamic Scaling
Static scaling:
• Shift when you expect an overflow
– Shift after each addition
• The location of the fraction point is fixed at each computation step
• Almost no overhead compared to fixed point
• Higher effective bit width only in the first computation steps
Dynamic scaling:
• Shift only when overflow occurs
– Track overflows and account for them
• The location of the fraction point is the same for each 1D-FFT frame
• Needs simple circuitry to track the overflow and shift when required
• Effective bit width depends on the data
• Neither static nor dynamic scaling affects the critical path

Design Space Explored
| Dynamic Scaling | Transform | Bit widths |
|---|---|---|
| Yes | DFT | 8, 12, 16 |
| Yes | DCT | 8, 12, 16 |
| No | DFT | 8, 12, 16 |
| No | DCT | 8, 12, 16 |
• 8 bits with dynamic scaling is considered later
• 8 bits without dynamic scaling (and 12 bits for the DCT) performs too poorly to be considered
• With dynamic scaling, 12 bits performs as well as 16 bits for the DFT

Dynamic Scaling of DFT
• 50% of the coefficients are sufficient for perfect reconstruction because of the symmetry of the DFT
• 16 bits without dynamic scaling does as well as floating point
• 12 bits with dynamic scaling also does nearly as well as floating point

Dynamic Scaling of DFT (continued)
• The improvement in performance from dynamic scaling more than makes up for the reduced compression caused by having to save the scaling bits
• 12 bits with dynamic scaling does nearly as well as 16 bits

DCT vs. DFT
• All cases use dynamic scaling
• The DCT provides better energy compaction
• For the DCT, 12 bits gives a lower MSE for a given compression ratio (this was not the case for the DFT)

8 Bits
• Image reconstructed from 50% of the DFT coefficients, computed with 8 bits, using dynamic scaling: MSE = 452
• Image reconstructed from 6% of the DFT coefficients, computed with 16 bits: MSE = 129

Physical Considerations
| Transform | # of Bits | Dynamic Scaling? | Critical Path | Slice Registers | Slice LUTs | BRAM | DSP48Es |
|---|---|---|---|---|---|---|---|
| DFT | 16 | No | 11.458 ns | 16% | 23% | 29% | 7 |
| DFT | 16 | Yes | 11.763 ns | 17% | 24% | 29% | 7 |
| DFT | 12 | No | 11.273 ns | 15% | 22% | 24% | 7 |
| DFT | 12 | Yes | 11.464 ns | 16% | 23% | 24% | 7 |
| DFT | 8 | Yes | 11.287 ns | 15% | 22% | 18% | 6 |
| DCT | 16 | Yes | 11.458 ns | 19% | 26% | 29% | 10 |
| DCT | 12 | Yes | 11.273 ns | 18% | 25% | 24% | 10 |
| DCT | 8 | Yes | 11.066 ns | 17% | 23% | 18% | 8 |
• The critical path is about the same for all designs and could probably be improved with tighter synthesis constraints
• Resource usage increases with bit width, the addition of dynamic scaling, and the DCT, but overall doesn’t change much
• The DCT uses extra DSP blocks because of the extra multiplication

Latency
| Component | Latency (clock cycles) | Potential Frame Rate with 50 MHz Clock |
|---|---|---|
| Initialization | 870,000 | - |
| DCT | 263,900 | 189 images/second |
| DFT | 262,200 | 191 images/second |
$$\text{Number of Cycles} \cong N^2 \log_4 N$$

Future Work
• Use of DRAM to allow compression of larger images
• Support for color images
• Support for rectangular images of arbitrary edge length
• Combining the DCT and DFT into a single core that could compute either transform, as selected by the user at run time

Relationship Between the DFT and the DCT
The N-point DFT of a sequence gives the Fourier series coefficients of that sequence made periodic with period N.

Relationship Between the DFT and the DCT (continued)
The N-point DCT of a sequence is a twiddle factor multiplied by the first N Fourier series coefficients of the 2N-point sequence y(n) made periodic with period 2N.
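The statement above can be checked numerically. The sketch below rests on assumptions that are not spelled out at this point in the deck: y(n) is taken to be the symmetric extension y(n) = x(n) + x(2N-1-n) (x is zero outside 0..N-1), the twiddle factor is taken to be $W_{2N}^{k/2}$, and the 1D DCT is taken as the one-dimensional analogue of the 2D definition given earlier, with a factor of 2. The function names are hypothetical, and a direct DFT sum stands in for an FFT.

```python
import cmath
import math


def dct_direct(x):
    """Assumed 1D DCT: C(k) = sum_n 2*x(n)*cos(pi*k*(2n+1)/(2N))."""
    n_pts = len(x)
    return [
        sum(2 * x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def dct_from_2n_point_dft(x):
    """DCT from the first N coefficients of the 2N-point DFT of y(n).

    y(n) = x(n) + x(2N-1-n) is built by appending the reversed sequence,
    which is the assumed symmetric extension.
    """
    n_pts = len(x)
    y = list(x) + list(reversed(x))              # length 2N
    w_2n = cmath.exp(-2j * cmath.pi / (2 * n_pts))
    coeffs = []
    for k in range(n_pts):                       # keep only the first N coefficients
        y_k = sum(y[n] * w_2n ** (n * k) for n in range(2 * n_pts))
        twiddle = cmath.exp(-1j * cmath.pi * k / (2 * n_pts))   # W_2N^(k/2)
        coeffs.append((twiddle * y_k).real)      # imaginary part is rounding error for real x
    return coeffs


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    for direct, via_dft in zip(dct_direct(samples), dct_from_2n_point_dft(samples)):
        print(f"{direct:12.6f} {via_dft:12.6f}")
```

Under these assumptions the two columns printed by the usage example agree to within floating-point rounding.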
[Figure: x(n) and y(n) = x(n) + x(2N-1-n)]

Relationship Between the DFT and the DCT (continued)
The DCT can be computed from the DFT as follows:
• Define the sequences
– $y(n) = x(n) + x(2N-1-n)$
– $v(n) = y(2n)$
• Compute the N-point DFT of $v(n)$, $V(k)$
• $C_x(k) = W_{2N}^{k/2}\, V(k) + W_{2N}^{-k/2}\, V(-k)$

Rounding
| Design | MSE Decrease with Rounding |
|---|---|
| 12 bits, no dynamic scaling, DFT | 20 |
| 16 bits, no dynamic scaling, DFT | 0 |
| 12 bits, dynamic scaling, DFT | 2 |
| 16 bits, no dynamic scaling, DCT | 0 |
| 12 bits, dynamic scaling, DCT | 2 |
| 16 bits, dynamic scaling, DCT | 0 |
Conclusion: rounding never hurt and often helped. It is free in hardware (just a register initialization), so always use it. All subsequent results use rounding.

Dynamic Scaling of DCT
[Figure]

Dynamic Scaling of DCT (continued)
[Figure]

Limitations of MSE
• Image reconstructed from 5.7% of the DCT coefficients, computed with dynamic scaling: MSE = 193
• Image reconstructed from 6.1% of the DCT coefficients, computed without dynamic scaling: MSE = 338

Performance of 8 Bit Systems
[Figure]

More Limitations of MSE
• (Left) 8 bit DFT coefficients, computed with rounding: compression ratio = 2.3, MSE = 869
• (Right) 8 bit DFT coefficients, computed without rounding: compression ratio = 2.1, MSE = 664
• (Left) 8 bit DCT coefficients, computed with rounding: compression ratio = 2.2, MSE = 517
• (Right) 8 bit DCT coefficients, computed without rounding: compression ratio = 2.4, MSE = 563
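The rounding-versus-truncation comparison used throughout these results can be illustrated with integer arithmetic. The sketch below is one plausible reading of "just a register initialization", not the authors' circuit: the accumulator starts at half of the least significant bit that survives the final right shift, so the shift rounds instead of truncating. All function names and the choice of a 4-bit shift are assumptions for illustration.

```python
def truncate(acc, shift):
    """Drop the low `shift` bits (arithmetic shift, i.e. floor division by 2**shift)."""
    return acc >> shift


def round_half_up(acc, shift):
    """Add half an output LSB before shifting, so the result is rounded."""
    return (acc + (1 << (shift - 1))) >> shift


def mac_then_shift(pairs, shift, rounding):
    """Multiply-accumulate, then shift.

    With rounding enabled, the accumulator is initialised to half an output
    LSB instead of zero; no extra adder is needed after the accumulation.
    """
    acc = (1 << (shift - 1)) if rounding else 0
    for a, b in pairs:
        acc += a * b
    return acc >> shift


def mean_error(requantize, shift, lo=-1024, hi=1024):
    """Average of (requantized value - exact value) over a range of accumulator values."""
    step = 1 << shift
    errors = [requantize(acc, shift) - acc / step for acc in range(lo, hi)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print("mean truncation error:", mean_error(truncate, 4))
    print("mean rounding error:  ", mean_error(round_half_up, 4))
    print("MAC, truncated:", mac_then_shift([(3, 4), (3, 4)], 4, rounding=False))
    print("MAC, rounded:  ", mac_then_shift([(3, 4), (3, 4)], 4, rounding=True))
```

Truncation always errs downward, so its error has a bias of roughly half an output LSB that accumulates across FFT stages, while rounding leaves only a small residual bias from the half-way case.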