Video Compression using Computer Vision

Presented by Yehuda Dar
Advanced Topics in Computer Vision (048921)
Winter 2011-2012

Video Compression Basics
• Fundamental tradeoff among:
    • Bit-rate
    • Distortion
    • Computational complexity

Video Compression Basics
• Utilized redundancies:
    • Spatial
    • Temporal
    • Psycho-visual
    • Statistical

H.264 Overview

H.264 Redundancy Utilization

Redundancy      Utilization   Means
Spatial         High          • Transform coding
                              • Intra coding (spatial prediction)
Temporal        High          Motion estimation & compensation
Psycho-visual   Medium        • YCbCr color space
                              • 4:2:0 sampling
                              • DC / AC coefficient quantization
Statistical     High          Entropy coding

Compression using Computer Vision
Motivation:
• Better utilization of the psycho-visual redundancy
• Application-specific compression methods
• Exploring new approaches

A Review of:
A Scheme for Attentional Video Compression
R. Gupta and S. Chaudhury
PAMI 2011

Method Outline
• Salient region detection
• Foveated video coding
• Integration into H.264

Foveated image coding demonstration
[Figure from Guo & Zhang, Trans. Image Process., 2010]

Saliency Map
Step 1: Creating a 3D Feature Map

Feature type   Calculation method                                    Based on
Global         Color spatial variance                                Liu et al, CVPR 2007
Local          Center-surround multi-scale ratio of dissimilarity    Huang et al, ICPR 2010
Rarity         Pulse-DCT                                             Yu et al, ICDL 2009

Relevance Vector Machine (RVM)
• Used here as a binary classifier
• Advantages over the support vector machine (SVM):
    • Provides posterior probabilities
    • Better generalization ability
    • Faster decisions

Saliency Map
Step 2: Unify Features using RVM
Training procedure for macroblocks (MBs):
• Sample: the feature vector (avg_global, avg_local, avg_rarity)^T, holding the MB's averages of the global, local, and rarity feature maps
• Label: 'salient' / 'non salient', obtained by counting pixels in the ground truth
• Each (sample, label) pair is fed to the RVM

Saliency Map
Step 2: Unify Features using RVM
Trained RVM usage:
• New input: the feature vector (avg_global, avg_local, avg_rarity)^T
• Outputs:
    • Binary label: 'salient' / 'non salient'
    • Probability, used as the relative saliency

Saliency Map: Result Comparison
Panels: input; global; local [Huang et al, ICPR 2010]; rarity [Yu et al, ICDL 2009]; proposed; [Harel et al, NIPS 2006]; [Bruce & Tsotsos, NIPS 2006]
[Figures from Gupta & Chaudhury, PAMI 2011]

Saliency Map: ROC Curve
Curves: proposed; [Harel et al, NIPS 2006]
[Figure from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Calculation of Saliency Values
• The saliency map is recalculated only when it changes significantly
• Mutual information between successive frames indicates changes in saliency
[Figures from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Propagation of Saliency Values
• For inter-coded MBs, the saliency value is a weighted average of the saliency values of the MBs pointed to by the motion vector
[Figures from Gupta & Chaudhury, PAMI 2011]

Integration Into H.264: Salient-Adaptive Quantization
• Non-uniform bit allocation
• Smaller saliency value => coarser quantization

Integration Into H.264
[Figure from Gupta & Chaudhury, PAMI 2011]

Paper Evaluation
• Novelty:
    • Methods for:
        • Saliency map
        • Saliency value propagation
• Assumption:
    • All the MBs in P-frames are inter-coded (problematic)
• Writing level:
    • Good
    • Partially self-contained

Paper Evaluation
• Feasibility:
    • Higher complexity than H.264 encoders
    • Not for real-time encoders
    • Useful at low bit-rates
    • Objects entering the scene may be considered unimportant
• Experimental evaluation:
    • Saliency:
        • Visual comparison: good
        • ROC curve comparison: partial
    • Compression:
        • None (authors' future direction)

Future Directions
• Improving encoding complexity
    • Less complex saliency method
• Better treatment of objects entering the scene
    • Using mutual information of frame areas
• Treat intra-coded MBs in P-frames

A Review of:
3D Models Coding and Morphing for Efficient Video Compression
F. Galpin, R. Balter, L. Morin, K. Deguchi
CVPR 2004

Method Outline
• 3D model extraction
• 3D model-based video coding
• Reconstruction using adaptive geometric morphing

3D Models Stream Generation
[Figure from Galpin et al, CVPR 2004]

Stream Compression
• Three data types to compress:
    • 3D model
    • Texture images
    • Camera parameters

Texture Image Compression
Reconstruction process:
[Figure from Galpin et al, CVPR 2004]

3D Model Compression
• The 3D model originates in a decimated depth map
• Compressed by:
    • Wavelet transform
    • Depth-adaptive quantization
[Figures from Galpin et al, CVPR 2004]

Video Reconstruction: Texture Fading
[Figure from Galpin et al, CVPR 2004]

Video Reconstruction: Texture Fading
Panels: without texture fading; with texture fading
[Figures from Galpin et al, CVPR 2004]

Video Reconstruction: Geometric Morphing
• Improving 3D model interpolation
[Figure from Galpin et al, CVPR 2004]

Video Reconstruction: Geometric Morphing
Panels: regular interpolation; interpolation with geometric morphing
[Figures from Galpin et al, CVPR 2004]

Result Comparison with H.264

Paper Evaluation
• Novelty:
    • Compression using an unknown 3D model
• Assumptions:
    • Static scene
    • Moving monocular camera
    • Neglected camera rotation
    • GOP intrinsic parameters are fixed
• Writing level:
    • Good
    • Not self-contained

Paper Evaluation
• Feasibility:
    • Only for static-scene video
    • High encoder/decoder complexity
        • Unsuitable for real-time use
    • Useful at very low bit-rates
• Experimental evaluation:
    • Sufficient visual comparison with H.264
    • No run-time information

Future Directions
• Treat moving objects
• Improve complexity
    • At least for real-time decoding

Approach Comparison

                        Attention    3D model
Video type              Any          Static scene
Bit-rates useful at     Low          Very low
Encoder complexity      High         High
Decoder complexity      Regular      High
Integration in H.264    Possible     Unsuitable
Overall evaluation      Promising    Inferior
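
Illustrative sketch (not from the reviewed papers)
Two steps of the attentional scheme's H.264 integration, the mutual-information test for recalculating the saliency map and the salient-adaptive quantization, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the histogram-based mutual-information estimate, the rule that low mutual information triggers recalculation, the linear saliency-to-QP mapping, and every name and default value (bins, threshold, base_qp, max_offset) are hypothetical. Only the 0 to 51 QP range is taken from H.264 (8-bit video).

```python
import math
from collections import Counter


def mutual_information(frame_a, frame_b, bins=16, levels=256):
    """Histogram estimate of mutual information (in bits) between two
    equal-size grayscale frames, given as flat sequences of ints in
    [0, levels). The estimator and bin count are assumptions."""
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    step = levels // bins
    qa = [p // step for p in frame_a]  # quantize intensities into bins
    qb = [p // step for p in frame_b]
    n = len(qa)
    joint = Counter(zip(qa, qb))       # joint histogram of co-located pixels
    pa, pb = Counter(qa), Counter(qb)  # marginal histograms
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (pa[a] * pb[b]))
    return mi


def should_recompute_saliency(prev_frame, curr_frame, threshold):
    """Assumed decision rule: low mutual information between successive
    frames suggests the content (and so the saliency) changed."""
    return mutual_information(prev_frame, curr_frame) < threshold


def saliency_to_qp(saliency, base_qp=26, max_offset=10):
    """Assumed linear mapping from a saliency value in [0, 1] to a QP:
    smaller saliency gives a larger QP, i.e. coarser quantization."""
    s = min(max(saliency, 0.0), 1.0)
    qp = base_qp + round((1.0 - s) * max_offset)
    return min(max(qp, 0), 51)         # clamp to the H.264 QP range


if __name__ == "__main__":
    prev = [0] * 8 + [255] * 8
    curr = [0, 255] * 8
    print(should_recompute_saliency(prev, curr, threshold=0.5))
    print([saliency_to_qp(s) for s in (0.0, 0.5, 1.0)])
```

In a real encoder the threshold would have to be tuned per sequence, and the QP offset would be applied per macroblock on top of the rate controller's choice; both are outside what the slides specify.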
