Manipulating Lossless Video in the Compressed Domain

William Thies (1), Steven Hall (2), Saman Amarasinghe (2)
(1) Microsoft Research India
(2) Massachusetts Institute of Technology

ACM Multimedia
October 20, 2009

Processing in the Compressed Domain

• Multimedia archives are growing rapidly
  – Monsters vs. Aliens production: 100 TB
  – Facebook photos: 400 TB
  – YouTube: 600 TB
  (lossless prior to distribution)
• How to analyze or modify the data?
  – Typical practice: Compressed Input -> Uncompress -> Process -> Recompress -> Compressed Output
  – Compressed-domain transformation: Compressed Input -> Process -> Compressed Output

Prior Work: Focus on Lossy Formats

• DCT-based spatial compression (JPEG, MPEG stills)
  – Resizing [Dugad & Ahuja 2001] [Mukherjee & Mitra 2002]
  – Edge detection [Shen & Sethi 1996]
  – Image segmentation [Feng & Jiang 2003]
  – Shearing and rotating inner blocks [Shen & Sethi 1998]
  – Linear combinations of pixels [Smith & Rowe 1996]
• DCT-based temporal compression (MPEG video)
  – Captioning [Nang, Kwon, & Hong 2000]
  – Reversal [Vasudev 1998]
  – Distortion detection [Dorai, Ratha, & Bolle 2000]
  – Transcoding [Acharya & Smith 1998]
• Almost no work on lossless formats
  – Transpose and rotation of black/white images [Shoji 1995; Misra et al. 1999]
  – Pattern matching in compressed text [Farach & Thorup 1998; Navarro 2003]
  – Modifying pitch and playback of audio [Levine 1998]

Our Focus: Regular Processing of LZ77-Compressed Data Streams

Example

Input: O O O O L A L A L A
Transformation: to lowercase
Output: o o o o l a l a l a

Compressing the input, one repeat at a time:
  O O O O L A L A L A
  O O O O [4 2] L A
  [3 1] O [4 2] L A
Each bracketed pair [Count Distance] is a "repeat token".

Compressed Input: [3 1] O [4 2] L A
Compressed Output: [3 1] o [4 2] l a

Example: Compressed Domain Transformation

Compressed Input: [3 1] O [4 2] L A
Compressed Output: [3 1] o [4 2] l a

Our Contributions

• Handle the general case
  – Produce and consume more than one data item
  – Split and join data streams
• Implement in a compiler
  – Programmer thinks in terms of uncompressed data
  – Compiler translates to work on compressed data
  – Relies on StreamIt programming language
• Evaluate on video processing tasks
  – 12 videos in Apple Animation format
  – Adjust colors or overlay two videos
  – Speedups proportional to compression ratio (median 15x)

In This Talk

• StreamIt Language
• Compressed Domain Transformation
• Experimental Evaluation

The StreamIt Language

    void->void pipeline FMRadio(float freq1, float freq2, int N) {
      add AtoD();
      add FMDemod();
      add splitjoin {
        split duplicate;
        for (int i=0; i<N; i++) {
          add pipeline {
            add LowPassFilter(freq1 + i*(freq2-freq1)/N);
            add HighPassFilter(freq2 + i*(freq2-freq1)/N);
          }
        }
        join roundrobin();
      }
      add Adder();
      add Speaker();
    }

[Stream graph: AtoD -> FMDemod -> Duplicate -> (LPF1 -> HPF1 | LPF2 -> HPF2 | LPF3 -> HPF3) -> RoundRobin -> Adder -> Speaker]

The StreamIt Language

• Applications
  – DES and Serpent [PLDI 05]
  – MPEG-2 [IPDPS 06]
  – SAR, DSP benchmarks, JPEG, …
• Programmability
  – StreamIt Language (CC 02)
  – Teleport Messaging (PPOPP 05)
  – Programming Environment in Eclipse (P-PHEC 05)
• Domain Specific Optimizations
  – Linear Analysis and Optimization (PLDI 03)
  – Optimizations for bit streaming (PLDI 05)
  – Linear State Space Analysis (CASES 05)
• Architecture Specific Optimizations
  – Compiling for Communication-Exposed Architectures (ASPLOS 02 & 06, dasCMP 07)
  – Phased Scheduling (LCTES 03)
  – Cache Aware Optimization (LCTES 05)
  – Load-Balanced Rendering (Graphics Hardware 05)
• Migrating Legacy Code to a Stream Representation
  – Using a Dynamic Analysis (MICRO 07)

Language Primitives

[Diagram: a Filter with rates pop N, push M (example: pop 2, push 1); a Splitter and a Joiner with weights roundrobin(N,M) (examples: roundrobin(1,1), roundrobin(2,2))]

Model of computation also known as cyclo-static dataflow

Example: Video Compositing

[Stream graph: Source 1 and Source 2 -> roundrobin(1,1) joiner -> MultiplyPixels (pop 2, push 1) -> Output]

Transforming Windows of Data

Input: O O O O L A L A L A
Transformation: Hyphenate Pairs
Output: O O – O O – L A – L A – L A –

Compressed Input: [3 1] O [4 2] L A
Coarsened, Expanded: [2 2] O O [4 2] L A
Compressed Output: [3 3] O O – [6 3] L A –

General Case: Filters

• Input: a repeat token with count N and distance D; the filter pops I items and pushes O items
• Coarsen
  – D' = LCM(D, I)
  – N' = N – (D' – D)
• Translate
  – N'' = N' – N' % I
  – N' % I items fall outside the translated repeat
  – Output repeat token: count N'' * O / I, distance D' * O / I

Splitting Streams

Input: L A L A L A L A L A
Compressed Input: [8 2] L A

[Diagram: a splitter divides the compressed input between two output streams; later builds show a "coarsened, expanded" input. The token layout of the outputs did not survive extraction.]

Splitting and Joining: Transpose

[Diagram: transposing the rows O O O O and X O O O by splitting and joining in the compressed domain. The token layout did not survive extraction.]

General Case: Joiners

• Inputs: repeat tokens with (count N1, distance D1) and (count N2, distance D2); joiner weights W1 and W2
• Output: a repeat token with count N' and distance D1 * (W1 + W2) / W1
• Applies if D1 % W1 = 0 and D2 % W2 = 0 and D1/W1 = D2/W2

Implementation

• Implemented subset of transformations in StreamIt
  – 1-to-1 filter (pop 1, push 1)
  – 1-to-1 joiner, roundrobin(1,1), with 2-to-1 filter (pop 2, push 1)
  – User can change graph connectivity + filter functions
• Supported file format: Apple Animation (part of .MOV)
  – Standard format for interchange of lossless video
  – Compression: run-length encoding within a line + difference encoding between frames
• Emit executable plugins for MEncoder and Blender
  – Allows integration with standard video editing workflow

Experimental Methodology

• Evaluated on 12 videos drawn from Internet video, computer animation, and stock digital television content
• Two classes of transformations:
  1. Color adjustment: inverse, brightness, contrast
  2. Composite transformations: alpha-under, multiply

[Figure: sample frames for alpha-under and multiply compositing, with the corresponding stream graphs]

Results: Execution Time

[Plot: Speedup (1x to 1000x) versus Compression Factor (1x to 1000x) for Brightness, Contrast, Inverse, and Compositing]

• Color Adjustment: 2.5x to 471x (median 17x)
• Compositing: 1.1x to 32x (median 6.6x)
• Compression factor was low (≤1.1x) for one of the source videos

Results: File Bloat

[Plot: File Bloat Relative to Recompression (0x to 6x) versus Compression Factor Following Re-compression (1x to 1000x) for Brightness, Contrast, Inverse, and Compositing]

• Masked out areas not re-compressed
• Saturated colors not re-compressed

Opportunity: Ignoring "Dead" Data

• Some pixels in composite frames do not depend on both input frames
  – Example: digital television mask (a low-performance case)
• If two data streams are multiplied, and one of them is repeatedly zero, then the repeat can be copied to the output (regardless of the values in the other stream)
  – We expect this would fix performance of our outlier cases
  – Requires pattern matching on stream graph

Extension to Other File Formats

• High-efficiency mappings
  – Flic Video
  – Microsoft RLE
  – Targa (with run-length encoding)
• Medium-efficiency mappings (re-arranges data by color or by byte)
  – OpenEXR
  – Planar RGB
• Low-efficiency mappings (performs Huffman coding prior to LZ77)
  – ZIP
  – GZIP
  – PNG

Conclusions

• New method for direct processing of lossless-encoded data streams
  – Relies on LZ77 compression and stream programming model
  – Supports operations on windows of data
  – Supports splitting, joining, and reordering data
• Preliminary implementation in an automatic compiler
  – Write program on uncompressed data, run on compressed data
• Good speedups in the context of video processing
  – 15x speedup (median) on color adjustment and compositing
  – Across 12 videos in Apple Animation format
  – May prove useful as more content is authored in lossless formats
• Scope for extending technique, finding new applications

Extra Slides

General Case: Splitters

• Input: a repeat token with count N and distance D; splitter weights U and V
• Coarsen
  – D' = LCM(D, U+V)
  – N' = N – (D' – D)
• Translate
  – N'' = N' – N' % (U+V)
  – N' % (U+V) items fall outside the translated repeat
  – Output repeat token on the V branch: count N'' * V / (U+V), distance D' * V / (U+V)
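
Illustration (not from the talk)

The repeat-token examples and the "General Case: Filters" formulas can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the StreamIt compiler's implementation: the names Repeat, decode, map_compressed and translate_repeat are hypothetical, tokens are listed in arrival order (an assumption about how the slide layout should be read), and translate_repeat assumes the repeat starts on a window boundary once D' – D items have been expanded.

```python
from math import gcd
from typing import Callable, List, NamedTuple, Tuple, Union


class Repeat(NamedTuple):
    """LZ77 repeat token: copy `count` items, starting `distance` items back."""
    count: int
    distance: int


Token = Union[str, Repeat]


def decode(tokens: List[Token]) -> List[str]:
    """Expand literals and repeat tokens into the uncompressed stream."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, Repeat):
            # Copy item by item so that overlapping repeats
            # (count > distance) behave as in LZ77.
            for _ in range(tok.count):
                out.append(out[-tok.distance])
        else:
            out.append(tok)
    return out


def map_compressed(tokens: List[Token], f: Callable[[str], str]) -> List[Token]:
    """Apply a 1-to-1 filter without decompressing.

    Literals are transformed; repeat tokens pass through unchanged,
    because a repeat of transformed data is the transformed repeat.
    """
    return [tok if isinstance(tok, Repeat) else f(tok) for tok in tokens]


def translate_repeat(n: int, d: int, pop: int, push: int) -> Tuple[int, int, Repeat]:
    """Coarsen, then translate, one repeat token (count n, distance d)
    for a filter that pops `pop` items and pushes `push` items.

    Returns (expanded, leftover, output_token):
      expanded = D' - D   items to process as ordinary literals first
      leftover = N' % I   items that fall outside the translated repeat
    Assumption: the repeat is window-aligned after the expansion.
    """
    d2 = d * pop // gcd(d, pop)   # D' = LCM(D, I)
    expanded = d2 - d
    n2 = n - expanded             # N' = N - (D' - D)
    if n2 < 0:
        raise ValueError("repeat too short to coarsen; decode it instead")
    leftover = n2 % pop           # N' % I
    n3 = n2 - leftover            # N'' = N' - N' % I
    return expanded, leftover, Repeat(n3 * push // pop, d2 * push // pop)


if __name__ == "__main__":
    compressed = ["O", Repeat(3, 1), "L", "A", Repeat(4, 2)]
    print("".join(decode(compressed)))
    print(map_compressed(compressed, str.lower))
    print(translate_repeat(3, 1, pop=2, push=3))
    print(translate_repeat(4, 2, pop=2, push=3))
```

With pop = 2 and push = 3 (the "Hyphenate Pairs" filter), the two calls to translate_repeat produce the counts and distances 3 3 and 6 3 that appear in the compressed output of the "Transforming Windows of Data" example. A full compressed-domain executor would also have to track window alignment and feed the expanded and leftover items through the filter, which this sketch leaves out.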