Image Compression System
Megan Fuller and Ezzeldin Hamed
Transforms of Images
Original Image
Image Reconstructed from 25% of DFT coefficients
Magnitude of the DFT of (image - 128); otherwise the DC component would be about 8e6

The 2D Discrete Fourier Transform
$$X(k_1, k_2) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} x(n_1, n_2)\, W_{N_1}^{k_1 n_1}\, W_{N_2}^{k_2 n_2}$$
where $W_N = e^{-j 2\pi / N}$.
This can be computed separably by rearranging:
$$X(k_1, k_2) = \sum_{n_2=0}^{N_2-1} W_{N_2}^{k_2 n_2} \sum_{n_1=0}^{N_1-1} x(n_1, n_2)\, W_{N_1}^{k_1 n_1}$$

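The separable rearrangement above can be checked numerically with a small sketch. The helper names (`dft_1d`, `dft_2d_direct`, `dft_2d_separable`) and the 3 x 4 test array are illustrative assumptions, and a naive O(N^2) 1D DFT stands in for the FFT core of the actual design.

```python
import cmath

def dft_1d(seq):
    """Naive 1D DFT with W_N = exp(-j*2*pi/N)."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dft_2d_direct(x):
    """Double sum over n1 and n2, straight from the definition."""
    n1_len, n2_len = len(x), len(x[0])
    return [[sum(x[n1][n2]
                 * cmath.exp(-2j * cmath.pi * k1 * n1 / n1_len)
                 * cmath.exp(-2j * cmath.pi * k2 * n2 / n2_len)
                 for n1 in range(n1_len) for n2 in range(n2_len))
             for k2 in range(n2_len)]
            for k1 in range(n1_len)]

def dft_2d_separable(x):
    """Inner sum over n1 for each fixed n2, then outer sum over n2."""
    n1_len, n2_len = len(x), len(x[0])
    # Inner sum: 1D DFT along n1 for every n2, giving cols[n2][k1]
    cols = [dft_1d([x[n1][n2] for n1 in range(n1_len)]) for n2 in range(n2_len)]
    # Reorder to f[k1][n2] so the outer transform runs along n2
    f = [[cols[n2][k1] for n2 in range(n2_len)] for k1 in range(n1_len)]
    # Outer sum: 1D DFT along n2 for every k1, giving X[k1][k2]
    return [dft_1d(f[k1]) for k1 in range(n1_len)]

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # toy 3 x 4 "image", x[n1][n2]
direct = dft_2d_direct(x)
separable = dft_2d_separable(x)
max_diff = max(abs(direct[a][b] - separable[a][b])
               for a in range(3) for b in range(4))
```

Here `max_diff` stays at floating-point noise level, which is what allows the 2D transform to be built from 1D transforms of the rows and columns.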
The 2D Discrete Cosine Transform
$$C_x(k_1, k_2) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} 4\, x(n_1, n_2) \cos\!\left(\frac{\pi}{2N_1} k_1 (2n_1 + 1)\right) \cos\!\left(\frac{\pi}{2N_2} k_2 (2n_2 + 1)\right)$$
• Computed separably
• Computed as a DFT + 1 multiply
• Generally gives better energy compaction than DFT

High Level Architecture
[Block diagram: Input; Memory; Separable, in-place 2D DFT/DCT; Coefficient > Threshold?; Output Module (sending data to PC)]
• The choice between DFT and DCT is made at compile time
• The threshold is provided by the user at run time

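The "Coefficient > Threshold?" stage can be sketched in software as follows. This is only an illustration: comparing the coefficient magnitude, and emitting `(k1, k2, value)` triples, are assumptions, since the slides do not say how the comparison or the output format is implemented. The function name `threshold_coefficients` is hypothetical.

```python
def threshold_coefficients(coeffs, threshold):
    """Keep (k1, k2, value) for coefficients whose magnitude exceeds threshold."""
    kept = []
    for k1, row in enumerate(coeffs):
        for k2, value in enumerate(row):
            if abs(value) > threshold:  # assumed: magnitude comparison
                kept.append((k1, k2, value))
    return kept

# Toy 3 x 3 coefficient array (made-up values, not from the slides)
coeffs = [[120.0, -3.5, 0.4],
          [8.0, 0.2, -0.1],
          [-1.5, 0.05, 0.0]]
kept = threshold_coefficients(coeffs, threshold=1.0)
kept_fraction = len(kept) / 9  # fraction of coefficients that would be sent
```

Raising the threshold sends fewer coefficients, trading reconstruction error for a higher compression ratio.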
What’s Interesting?
• Reducing the computation required
• Sharing resources in the DCT case
• Some memory organization tricks
• Reducing bit width

Number of FFTs
• Using the FFT to calculate the 1D DFT reduces the cost from $N^2$ to $N \log N$
• We need $2N$ FFTs to calculate the 2D DFT ($N$ rows plus $N$ columns)
• Can we reduce the number of FFTs?

Reduction for the DFT case
• Using the DFT properties
[Figure: 4 x 4 grid of samples S00 to S33, shown as real and imaginary parts]
• N/2 FFTs of the rows, followed by Even/Odd decomposition
– Input is real: $x(n)$
– Output is symmetric: $X(k) = X^*(-k)$
– Combining rows: FFT of $R_i + jR_{i+N/2}$
– Even/Odd decomposition
• Output is symmetric (discard half the columns)
• N/2 FFTs of the columns
• Total of N FFT computations

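The row-combining step above relies on a standard identity: two real sequences packed as real and imaginary parts of one complex input can be separated after a single transform. A minimal sketch, assuming a naive DFT in place of the FFT core; the names `dft_1d` and `two_real_dfts` are illustrative.

```python
import cmath

def dft_1d(seq):
    """Naive 1D DFT with W_N = exp(-j*2*pi/N)."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def two_real_dfts(r_a, r_b):
    """DFTs of two real rows from one complex DFT of r_a + j*r_b."""
    n = len(r_a)
    z = dft_1d([a + 1j * b for a, b in zip(r_a, r_b)])
    spec_a, spec_b = [], []
    for k in range(n):
        z_neg = z[-k % n].conjugate()       # Z*(-k), index taken modulo N
        spec_a.append((z[k] + z_neg) / 2)   # conjugate-symmetric part: DFT of r_a
        spec_b.append((z[k] - z_neg) / 2j)  # conjugate-antisymmetric part: DFT of r_b
    return spec_a, spec_b

row_i = [1.0, 2.0, 3.0, 4.0]        # plays the role of R_i
row_j = [0.0, -1.0, 5.0, 2.0]       # plays the role of R_{i+N/2}
spec_i, spec_j = two_real_dfts(row_i, row_j)
err = max(max(abs(p - q) for p, q in zip(spec_i, dft_1d(row_i))),
          max(abs(p - q) for p, q in zip(spec_j, dft_1d(row_j))))
```

One complex transform therefore handles two rows, which is how N rows need only N/2 FFTs.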
Reduction in the DCT case
• Again combining the rows in the same way as in the DFT (N/2 FFTs)
• Even/Odd decomposition, then an extra multiplication to calculate the DCT
[Figure: samples S00 to S33 split between real and imaginary parts]
• Results are not symmetric
• But the DCT is real
• We can combine the columns the same way we combined the rows (N/2 FFTs)
• The same multiplier inside the FFT is used
• Another Even/Odd decomposition is required here with an extra complex multiplier
• Total of N FFT computations + a few extra multiplications

In-Place Radix-4 FFT
• Critical path
• Fixed point arithmetic
• Bit Width?
• Quantization noise
• Rounding instead of Truncation
• Avoid any overflow
– $N$ additions
– Needs $\log_2 N$ extra bits
• Can we do better?

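The "$\log_2 N$ extra bits" figure above follows from the worst case of adding $N$ full-scale values. A short arithmetic sketch; the 8-bit word and N = 256 are illustrative assumptions, and `extra_bits_for_sum` is a hypothetical helper name.

```python
import math

def extra_bits_for_sum(n_terms):
    """Headroom bits needed so that summing n_terms full-scale values cannot overflow."""
    return math.ceil(math.log2(n_terms))

bits = 8      # assumed signed word length of each input sample
n = 256       # assumed number of additions per output

worst_case = n * (2 ** (bits - 1) - 1)   # every sample at positive full scale
needed = worst_case.bit_length() + 1     # magnitude bits plus one sign bit
```

With these numbers `needed` equals `bits + extra_bits_for_sum(n)`, so avoiding overflow statically costs 8 extra bits on top of the 8-bit samples.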
Static Scaling Vs. Dynamic Scaling
Static scaling:
• Shift when you expect an overflow
– Shift after each addition
• The location of the fraction point is fixed at each computation step
• Almost no overhead compared to fixed point
• Higher effective bit width only in the first computation steps
• No effect on the critical path
Dynamic scaling:
• Shift only when overflow occurs
– Track overflows and account for them
• The location of the fraction point is the same for each 1D-FFT frame
• Needs simple circuitry to track the overflow and shift when required
• Effective bit width depends on the data
• No effect on the critical path

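The difference between the two schemes above can be sketched in a few lines. This is a simplified model, not the hardware described in the slides: it uses a radix-2 style add/subtract stage instead of the radix-4 butterfly, detects overflow on unbounded Python integers after the fact, and all function names are hypothetical.

```python
def butterfly_stage(values):
    """One stage of additions: each pair (a, b) becomes (a + b, a - b)."""
    out = []
    for i in range(0, len(values), 2):
        a, b = values[i], values[i + 1]
        out += [a + b, a - b]
    return out

def static_scaled(values, stages):
    """Static scaling: shift after every stage, whether or not it is needed."""
    for _ in range(stages):
        values = [v >> 1 for v in butterfly_stage(values)]
    return values, stages  # the total shift is known in advance

def dynamic_scaled(values, stages, bits):
    """Dynamic scaling: shift the whole frame only when a result does not fit."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    shifts = 0
    for _ in range(stages):
        values = butterfly_stage(values)
        if any(v < lo or v > hi for v in values):
            values = [v >> 1 for v in values]
            shifts += 1  # tracked so the frame can be rescaled later
    return values, shifts

small = [3, 1, -2, 4]
static_out = static_scaled(small, stages=2)
dynamic_out = dynamic_scaled(small, stages=2, bits=8)
loud_out = dynamic_scaled([100, 100, 100, 100], stages=2, bits=8)
```

For the small-valued frame the static version discards two bits that the dynamic version keeps, while the large-valued frame triggers one tracked shift. That is the sense in which the effective bit width depends on the data.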
Design Space Explored
[Tree diagram: dynamic scaling (yes / no), then transform (DFT / DCT), then bit width (8 / 12 / 16)]
• 8 bits with dynamic scaling is considered later
• 8 bits without dynamic scaling (and 12 bits for the DCT) performs too poorly to be considered
• 12 bits does as well as 16 bits with dynamic scaling in the DFT

Dynamic Scaling of DFT
• 50% of the coefficients are sufficient for perfect reconstruction because of the symmetry of the DFT
• 16 bits without dynamic scaling does as well as floating point
• 12 bits with dynamic scaling also does nearly as well as floating point

Dynamic Scaling of DFT (continued)
• The improvement in performance when dynamic scaling is used more than makes up for the reduced compression caused by having to save the scaling bits
• 12 bits with dynamic scaling does nearly as well as 16 bits

DCT Vs. DFT
• All cases use dynamic scaling
• DCT provides better energy compaction
• For the DCT, 12 bits gives a lower MSE for a given compression ratio (this was not the case for the DFT)

8 Bits
Image reconstructed from 50% of the DFT coefficients, computed with 8 bits, using dynamic scaling. MSE = 452.
Image reconstructed from 6% of the DFT coefficients, computed with 16 bits. MSE = 129.

Physical Considerations
| Transform | # of Bits | Dynamic Scaling? | Critical Path | Slice Registers | Slice LUTs | BRAM | DSP48Es |
|---|---|---|---|---|---|---|---|
| DFT | 16 | No | 11.458 ns | 16% | 23% | 29% | 7 |
| DFT | 16 | Yes | 11.763 ns | 17% | 24% | 29% | 7 |
| DFT | 12 | No | 11.273 ns | 15% | 22% | 24% | 7 |
| DFT | 12 | Yes | 11.464 ns | 16% | 23% | 24% | 7 |
| DFT | 8 | Yes | 11.287 ns | 15% | 22% | 18% | 6 |
| DCT | 16 | Yes | 11.458 ns | 19% | 26% | 29% | 10 |
| DCT | 12 | Yes | 11.273 ns | 18% | 25% | 24% | 10 |
| DCT | 8 | Yes | 11.066 ns | 17% | 23% | 18% | 8 |
• Critical path is about the same for all designs; it could probably be improved with tighter synthesis constraints
• Resource usage increases with bit width, the addition of dynamic scaling, and the DCT, but overall doesn’t change much
• DCT uses extra DSP blocks because of the extra multiplication

Latency
| Component | Latency (clock cycles) | Potential Frame Rate with 50 MHz Clock |
|---|---|---|
| Initialization | 870,000 | - |
| DCT | 263,900 | 189 images/second |
| DFT | 262,200 | 191 images/second |
Number of cycles $\cong N^2 \log_4 N$

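The frame rates in the latency table follow directly from the cycle counts and the 50 MHz clock, as the sketch below reproduces. The image edge length N = 256 is an assumption: it is not stated in the slides, but it makes the quoted approximation land close to the measured DFT latency. The function names are illustrative.

```python
import math

CLOCK_HZ = 50_000_000  # 50 MHz clock from the latency table

def frames_per_second(cycles, clock_hz=CLOCK_HZ):
    """Images per second if one transform takes `cycles` clock cycles."""
    return clock_hz / cycles

def approx_cycles(n):
    """The approximation N^2 * log4(N) quoted above."""
    return n * n * math.log(n, 4)

dct_fps = frames_per_second(263_900)   # about 189.5
dft_fps = frames_per_second(262_200)   # about 190.7
cycles_256 = approx_cycles(256)        # about 262,144 for the assumed N = 256
```

Under that assumption the approximation is within about 60 cycles of the DFT figure, and the DCT costs roughly 1,700 cycles more.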
Future Work
• Use of DRAM to allow compression of larger images
• Support for color images
• Support for rectangular images of arbitrary edge length
• Combining the DCT and DFT into a single core that could compute either transform, as selected by the user at runtime

Relationship Between the DFT and the DCT
The N-point DFT of a sequence gives the Fourier Series coefficients of that sequence made periodic with period N.

Relationship Between the DFT and the DCT (continued)
The N-point DCT of a sequence is a twiddle factor multiplied by the first N Fourier Series coefficients of the 2N-point sequence y(n) made periodic with period 2N.
[Figure: x(n) and y(n) = x(n) + x(2N-1-n)]

Relationship Between the DFT and the DCT (continued)
The DCT can be computed from the DFT as follows:
• Define the sequences
y(n) = x(n) + x(2N-1-n)
v(n) = y(2n)
• Compute the N-point DFT of v(n), V(k)
• $C_x(k) = W_{2N}^{k/2} V(k) + W_{2N}^{-k/2} V(-k)$

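The three steps above can be checked against a direct evaluation of the cosine sum. This sketch assumes the 1D DCT definition $C_x(k) = \sum_n 2x(n)\cos\big(\pi k(2n+1)/(2N)\big)$, which is the one-dimensional counterpart of the 2D definition given earlier, takes x(n) as zero outside 0..N-1, and reads V(-k) modulo N. The function names are illustrative.

```python
import cmath
import math

def dft_1d(seq):
    """Naive 1D DFT with W_N = exp(-j*2*pi/N)."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def dct_direct(x):
    """Cosine sum evaluated term by term."""
    n_len = len(x)
    return [sum(2 * x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len))
                for n in range(n_len))
            for k in range(n_len)]

def dct_via_dft(x):
    """DCT from one N-point DFT, following the steps listed above."""
    n_len = len(x)
    y = list(x) + list(reversed(x))            # y(n) = x(n) + x(2N-1-n), length 2N
    v = [y[2 * n] for n in range(n_len)]       # v(n) = y(2n)
    big_v = dft_1d(v)                          # V(k), N-point DFT
    out = []
    for k in range(n_len):
        w = cmath.exp(-2j * cmath.pi * (k / 2) / (2 * n_len))  # W_{2N}^{k/2}
        # W_{2N}^{-k/2} is the conjugate of w; V(-k) is V((N - k) mod N)
        out.append(w * big_v[k] + w.conjugate() * big_v[-k % n_len])
    return out

x = [1.0, 2.0, 3.0, 4.0, 0.5, -1.0, 2.5, 7.0]
reference = dct_direct(x)
fast = dct_via_dft(x)
max_err = max(abs(f - r) for f, r in zip(fast, reference))
max_imag = max(abs(f.imag) for f in fast)
```

For real input the two terms are complex conjugates, so the result is real up to rounding, consistent with the remark that the DCT is real.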
Rounding
| Design | MSE Decrease with Rounding |
|---|---|
| 12 bits, no dynamic scaling, DFT | 20 |
| 16 bits, no dynamic scaling, DFT | 0 |
| 12 bits, dynamic scaling, DFT | 2 |
| 16 bits, no dynamic scaling, DCT | 0 |
| 12 bits, dynamic scaling, DCT | 2 |
| 16 bits, dynamic scaling, DCT | 0 |
Conclusion: Rounding never hurt and often helped. It is free in hardware (just a register initialization), so always use it. All subsequent results use rounding.

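One common way to get rounding at the cost of a constant is to add half an LSB before the right shift, which may be what "just a register initialization" refers to. That reading is an assumption, as are the function names and the 3-bit shift in this sketch. It compares the average error of truncation and rounding over a uniform range of inputs.

```python
def shift_truncate(value, shift):
    """Drop the low bits (floor division by 2**shift)."""
    return value >> shift

def shift_round(value, shift):
    """Add half an LSB of the result before shifting (round half up)."""
    return (value + (1 << (shift - 1))) >> shift

shift = 3
samples = range(-64, 64)  # covers every 3-bit remainder equally often

trunc_bias = sum(shift_truncate(v, shift) - v / 8 for v in samples) / len(samples)
round_bias = sum(shift_round(v, shift) - v / 8 for v in samples) / len(samples)
```

Truncation is biased by almost half an LSB on average, while rounding leaves only a small residual bias, which is in line with the MSE decreases reported in the table.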
Dynamic Scaling of DCT

Dynamic Scaling of DCT (continued)

Limitations of MSE
Image reconstructed from 5.7% of the DCT coefficients, computed with dynamic scaling. MSE = 193.
Image reconstructed from 6.1% of the DCT coefficients, computed without dynamic scaling. MSE = 338.

Performance of 8 Bit Systems

More Limitations of MSE
(Left) 8 bit DFT coefficients, computed with rounding. Compression ratio = 2.3, MSE = 869.
(Right) 8 bit DFT coefficients, computed without rounding. Compression ratio = 2.1, MSE = 664.
(Left) 8 bit DCT coefficients, computed with rounding. Compression ratio = 2.2, MSE = 517.
(Right) 8 bit DCT coefficients, computed without rounding. Compression ratio = 2.4, MSE = 563.