Trainable Block Transform
Kyong Hwan Jin
Dept. of Electrical Engineering & Computer Science, Image Processing Lab.

Overview
■ Introduction to block transform coding
■ Deep block transform
■ Trainable block transform
■ Application to autoencoders
■ Summary

Block Transform | Basics
■ Block transform (transform coding)
  *Bernd Girod: EE368b Image and Video Compression
■ Variable-block-size transform coding in the video codec domain (HEVC)
  Schwarz, Heiko, Thomas Schierl, and Detlev Marpe. "Block Structures and Parallelism Features in HEVC." High Efficiency Video Coding (HEVC). Springer, Cham, 2014. 49-90.

Previous Research
■ JPEG encoding: block splitting -> 8x8 DCT
  https://www.edn.com/baseline-jpeg-compression-juggles-image-quality-and-size/

Equivalence Between Convolution and Block Transform
■ We discover a block transform inside a convolutional layer with stride ≥ 2 and kernel size ≥ stride.
■ s: stride, k: 2x2 kernel, x: input discrete signal
■ No overlap occurs when stride == kernel size
  >> the convolution becomes a block transform
  >> and a trainable block transform when a trainable convolution layer with the same stride and kernel size is used

Padding-Free Backpropagation
■ Typical convolution layer: the forward pass is a block Toeplitz matrix applied to a zero-padded input (a matrix producing the zero-padded vector). The local gradient for C picks up additional errors from the inserted zero rows (4I + 2HI + 2WI).
■ Proposed block transform: the forward and backward passes are a block Toeplitz matrix and its transpose. Ba and Bb are non-overlapping convolution/transposed-convolution matrices, so they are full rank, and no additional errors arise from a rank-deficient matrix such as Z.
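The stride == kernel-size observation above can be sketched numerically. A minimal NumPy illustration (random data and a single filter, purely illustrative): a stride-2 "valid" convolution with a 2x2 kernel gives the same result as splitting the input into non-overlapping 2x2 blocks and applying the flattened kernel as a linear transform.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))   # input image
k = rng.standard_normal((2, 2))   # 2x2 kernel, stride 2 (stride == kernel size)

# Strided "valid" convolution: kernel positions do not overlap
s = 2
conv = np.array([[np.sum(x[i:i + s, j:j + s] * k)
                  for j in range(0, 6, s)]
                 for i in range(0, 6, s)])

# The same operation as a block transform: split x into 2x2 blocks,
# flatten each block, and multiply by the flattened kernel
blocks = x.reshape(3, 2, 3, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
bt = (blocks @ k.reshape(-1)).reshape(3, 3)

print(np.allclose(conv, bt))      # True: strided conv == block transform
```

Making `k` a learnable parameter is exactly what turns this into the trainable block transform described above.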
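The padding-free full-rank claim can likewise be checked numerically. A sketch (helper name `nonoverlap_conv_matrix` is hypothetical, not from the paper): build the matrix of a stride-2, kernel-2 non-overlapping convolution on a 4x4 input and confirm it has full row rank, since its rows have disjoint supports; the transpose therefore backpropagates without the errors a rank-deficient zero-padding matrix would introduce.

```python
import numpy as np

def nonoverlap_conv_matrix(k, h, w):
    """Matrix B such that B @ x.ravel() equals the non-overlapping
    (stride == kernel size) convolution of an (h, w) input x. Sketch only."""
    s = k.shape[0]
    rows = []
    for i in range(0, h, s):
        for j in range(0, w, s):
            r = np.zeros((h, w))
            r[i:i + s, j:j + s] = k   # each row touches one disjoint block
            rows.append(r.ravel())
    return np.array(rows)

rng = np.random.default_rng(0)
k = rng.standard_normal((2, 2))
B = nonoverlap_conv_matrix(k, 4, 4)      # shape (4, 16)

# Rows have disjoint supports, so B has full row rank (for nonzero k)
print(np.linalg.matrix_rank(B) == B.shape[0])   # True
```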
DC Term (Zero-Frequency) Removal
■ In K-SVD, one atom is set to all ones > it averages values, like the DC basis of a DCT patch > a DC term
■ In the DCT, a DC basis is explicitly present
■ The discrete wavelet transform goes further: residuals are subtracted from the output of the scaling function >> no DC term needs to be considered at further dyadic levels

Nonlinear Activations
■ CLIP
■ Ablation studies

Architecture of the Deep Block Transform

Autoencoders
"Autoencoding" is a data compression algorithm where the compression and decompression functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than engineered by a human. Additionally, in almost all contexts where the term "autoencoder" is used, the compression and decompression functions are implemented with neural networks.
https://blog.keras.io/building-autoencoders-in-keras.html

Experiments
■ We build autoencoders with the trainable block transform.
■ AE: Cv(st:1,kr:3), M(st:2), Cv(st:1,kr:3), M(st:2), Cv(st:1,kr:3), B(st:2), Cv(st:1,kr:3), B(st:2), Cv(st:1,kr:3,c:1); M: max pooling, B: bilinear interpolation
■ BTN-K-S: Cv(st:S,kr:K), Cv(st:S,kr:K), CvT(st:S,kr:K), CvT(st:S,kr:K,c:1)
■ BTN/DC-2-2 = proposed trainable block transform

Experiments – Dataset
■ Study 1
■ 64×64 numerical images with 4 lines of different widths at every boundary.
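The DC-removal idea above can be sketched with a small helper. This is an illustrative NumPy version of subtracting each block's zero-frequency component (the function name and data are assumptions, not the authors' code):

```python
import numpy as np

def remove_dc(blocks):
    """Subtract each flattened block's mean (its DC / zero-frequency
    component), returning the AC residual and the DC values separately."""
    dc = blocks.mean(axis=-1, keepdims=True)
    return blocks - dc, dc

blocks = np.arange(8.0).reshape(2, 4)   # two flattened 2x2 blocks
ac, dc = remove_dc(blocks)
# each AC residual now sums to zero, so no DC term remains
print(np.allclose(ac.sum(axis=-1), 0))  # True
```

The residual `ac` is what a subsequent trainable transform would operate on, mirroring how the DWT carries no DC term into further dyadic levels.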
Six generated images were ternary (0, 0.5, 1).
■ L2 loss, 50 epochs, Adam with learning rate 10⁻³
■ Fashion MNIST
■ L2 loss, batch size 32, 10 epochs, Adam with learning rate 10⁻³
■ Training-to-validation split ratio: 5:1
■ BSDS500
■ Luma channel processed only
■ L2 loss, batch size 8, 100 epochs, Adam with learning rate 10⁻³
■ Training-to-validation split ratio: 4:1

Experiment – Study 1
■ Baselines

Experiment – Fashion MNIST

Experiment – Summary

Experiment – BSDS500
■ LC:8, C:16

Experiment – BSDS500
■ BSDS500 (LC:8)

Conclusion
■ We discover a trainable block transform in convolutional networks with stride > 1 and kernel size equal to the stride.
■ We apply the trainable block transform to autoencoders and obtain better representations than a standard autoencoder with 3×3 kernels and stride 1.
■ We observe that, with simple changes, convolutional neural networks become trainable block transforms.