Video Compression using Computer Vision

Presented by Yehuda Dar
Advanced Topics in Computer Vision (048921)
Winter 2011-2012
Video Compression Basics
 Fundamental tradeoff among:
 Bit-rate
 Distortion
 Computational complexity
Video Compression Basics
 Utilized redundancies:
 Spatial
 Temporal
 Psycho-visual
 Statistical
H.264 Overview
H.264 Redundancy Utilization
Redundancy | Utilization | Means
Spatial | High | Transform coding; intra coding (spatial prediction)
Temporal | High | Motion estimation & compensation
Psycho-visual | Medium | YCbCr color space; 4:2:0 sampling; DC \ AC coefficients quantization
Statistical | High | Entropy coding
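As a rough illustration of one of the psycho-visual means listed above, 4:2:0 sampling keeps luma at full resolution and stores each chroma plane at half resolution in both directions. The sketch below assumes plain 2x2 block averaging as the downsampling filter (real encoders may use other filters); the function name `subsample_420` is illustrative only.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (list of rows, even dimensions)."""
    out = []
    for r in range(0, len(plane), 2):
        row = []
        for c in range(0, len(plane[0]), 2):
            block_sum = (plane[r][c] + plane[r][c + 1]
                         + plane[r + 1][c] + plane[r + 1][c + 1])
            row.append(block_sum / 4.0)
        out.append(row)
    return out

# Toy 4x4 Cb plane; the result is a 2x2 plane.
cb = [[100, 102, 50, 52],
      [104, 106, 54, 56],
      [10, 10, 20, 20],
      [10, 10, 20, 20]]
cb_sub = subsample_420(cb)
# Samples per 4x4 frame: 16 (Y) + 4 (Cb) + 4 (Cr) = 24, versus 48 for 4:4:4.
```

Halving the sample count this way is possible because the eye is less sensitive to chroma detail than to luma detail.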
Compression using Computer Vision
Motivation:
 Better utilization of the psycho-visual redundancy
 Application-specific compression methods
 Exploring new approaches
A Review of:
A Scheme for Attentional Video Compression
R. Gupta and S. Chaudhury
PAMI 2011
Method Outline
 Salient region detection
 Foveated video coding
 Integration into H.264
Foveated image coding demonstration
Figure from Guo & Zhang, Trans. Image Process., 2010
Saliency Map
Step 1: Creating a 3D Feature Map
Feature type | Calculation method | Based on
Global | Color spatial variance | Liu et al, CVPR 2007
Local | Center-surround multi-scale ratio of dissimilarity | Huang et al, ICPR 2010
Rarity | Pulse-DCT | Yu et al, ICDL 2009
Relevance Vector Machine (RVM)
 Used here as a binary classifier
 Advantages over support-vector-machine (SVM):
 Provides posterior probabilities
 Better generalization ability
 Faster decisions
Saliency Map
Step 2: Unify Features using RVM
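Training Procedure for MBs:
 Sample: per-MB averages of the global, local and rarity feature maps, $x = (\mathrm{avg}_{\mathrm{global}}, \mathrm{avg}_{\mathrm{local}}, \mathrm{avg}_{\mathrm{rarity}})^T$
 Label: ‘salient’ \ ‘non salient’, obtained by counting ground-truth pixels in the MB
 The (sample, label) pairs are fed to the RVM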
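The training samples described above could be assembled roughly as follows. This is a minimal sketch, not the authors' code: the three feature maps are assumed to be given as 2D lists of equal size, and the labeling threshold `min_fraction` is a hypothetical choice, since the slides only say that ground-truth pixels are counted.

```python
MB = 16  # H.264 macroblock size in pixels

def mb_pixels(mb_row, mb_col, mb=MB):
    """Yield (row, col) pixel coordinates covered by one macroblock."""
    for r in range(mb_row * mb, (mb_row + 1) * mb):
        for c in range(mb_col * mb, (mb_col + 1) * mb):
            yield r, c

def mb_sample(global_map, local_map, rarity_map, mb_row, mb_col, mb=MB):
    """Feature vector (avg_global, avg_local, avg_rarity) for one macroblock."""
    n = mb * mb
    return [sum(m[r][c] for r, c in mb_pixels(mb_row, mb_col, mb)) / n
            for m in (global_map, local_map, rarity_map)]

def mb_label(gt_mask, mb_row, mb_col, mb=MB, min_fraction=0.5):
    """1 ('salient') if enough ground-truth pixels are set; threshold is assumed."""
    count = sum(gt_mask[r][c] for r, c in mb_pixels(mb_row, mb_col, mb))
    return 1 if count >= min_fraction * mb * mb else 0

# Toy example with a 2x2 "macroblock" to keep the data readable.
global_map = [[0.25, 0.5], [0.75, 1.0]]
local_map = [[0.5, 0.5], [0.5, 0.5]]
rarity_map = [[0.0, 0.0], [1.0, 1.0]]
gt_mask = [[1, 1], [1, 0]]
sample = mb_sample(global_map, local_map, rarity_map, 0, 0, mb=2)
label = mb_label(gt_mask, 0, 0, mb=2)
```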
Saliency Map
Step 2: Unify Features using RVM
Trained RVM Usage:
 New input: $x = (\mathrm{avg}_{\mathrm{global}}, \mathrm{avg}_{\mathrm{local}}, \mathrm{avg}_{\mathrm{rarity}})^T$
 RVM outputs:
 Binary label: ‘salient’ \ ‘non salient’
 Probability, used as the relative saliency
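To make the two outputs concrete, the sketch below evaluates a generic RVM-style classifier: a sparse kernel expansion over a few relevance vectors passed through a logistic sigmoid. The Gaussian kernel, the relevance vectors, the weights and the bias are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel between two feature vectors (kernel choice is assumed)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * d2)

def rvm_posterior(x, relevance_vectors, weights, bias, gamma=1.0):
    """Posterior probability of the 'salient' class for feature vector x."""
    activation = bias + sum(w * rbf_kernel(x, rv, gamma)
                            for w, rv in zip(weights, relevance_vectors))
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical trained model: one 'salient-like' and one 'non-salient-like' vector.
relevance_vectors = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1]]
weights = [3.0, -3.0]
bias = 0.0

p = rvm_posterior([0.8, 0.8, 0.6], relevance_vectors, weights, bias)
label = 'salient' if p >= 0.5 else 'non salient'
```

The probability `p` plays the role of the relative saliency, while thresholding it at 0.5 gives the binary label.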
Saliency Map: Result Comparison
Figure panels: input; global; local [Huang et al, ICPR 2010]; rarity [Yu et al, ICDL 2009]; proposed; [Harel et al, NIPS 2006]; [Bruce & Tsotsos, NIPS 2006]
Figures from Gupta & Chaudhury, PAMI 2011
Saliency Map: ROC Curve
Curves: proposed; [Harel et al, NIPS 2006]
Figure from Gupta & Chaudhury, PAMI 2011
Integration Into H.264:
Calculation of Saliency Values
 Recalculating saliency map only when it significantly changes
 Mutual information between successive frames indicates changes in saliency
Figures from Gupta & Chaudhury, PAMI 2011
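The mutual information between two successive frames can be estimated from the joint histogram of co-located pixel intensities. The sketch below is one simple way to do this, assuming 8-bit grayscale frames given as 2D lists and a coarse quantization into `levels` bins; how the encoder thresholds the result to trigger a saliency recalculation is not specified in these slides.

```python
import math
from collections import Counter

def mutual_information(frame_a, frame_b, levels=8, max_value=256):
    """MI in bits between co-located pixel intensities of two equal-size frames."""
    step = max_value // levels
    a = [p // step for row in frame_a for p in row]
    b = [p // step for row in frame_b for p in row]
    n = len(a)
    joint = Counter(zip(a, b))
    hist_a = Counter(a)
    hist_b = Counter(b)
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
        mi += (count / n) * math.log2(count * n / (hist_a[i] * hist_b[j]))
    return mi

frame_1 = [[0, 0], [255, 255]]
frame_2 = [[0, 0], [255, 255]]   # identical content
frame_3 = [[0, 255], [0, 255]]   # statistically independent of frame_1

mi_same = mutual_information(frame_1, frame_2)
mi_diff = mutual_information(frame_1, frame_3)
```

A high value suggests the scene is largely unchanged, so the previous saliency map can be kept; a low value suggests it should be recalculated.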
Integration Into H.264:
Propagation of Saliency Values
 For inter-coded MBs, the saliency value is a weighted average of the saliency values of the reference-frame MBs pointed to by the motion vector
Figures from Gupta & Chaudhury, PAMI 2011
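A sketch of this propagation step, under the assumption that the weights are the overlap areas between the motion-compensated block and the reference-frame MBs it covers (the slides do not state the weights explicitly). Motion vectors are taken as integer pixel offsets, and all names are illustrative.

```python
MB = 16  # macroblock size in pixels

def propagate_saliency(ref_saliency, mb_row, mb_col, mv_x, mv_y, mb=MB):
    """Area-weighted average of reference MB saliency under the displaced block.

    ref_saliency: 2D list with one saliency value per MB of the reference frame.
    (mv_x, mv_y): motion vector in integer pixels.
    """
    x0 = mb_col * mb + mv_x   # top-left corner of the referenced block
    y0 = mb_row * mb + mv_y
    rows, cols = len(ref_saliency), len(ref_saliency[0])
    total, area = 0.0, 0
    for r in range(y0 // mb, (y0 + mb - 1) // mb + 1):
        for c in range(x0 // mb, (x0 + mb - 1) // mb + 1):
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # part of the block lies outside the frame
            overlap_w = min(x0 + mb, (c + 1) * mb) - max(x0, c * mb)
            overlap_h = min(y0 + mb, (r + 1) * mb) - max(y0, r * mb)
            weight = overlap_w * overlap_h
            total += weight * ref_saliency[r][c]
            area += weight
    return total / area if area else 0.0

ref = [[0.0, 1.0],
       [0.0, 1.0]]
s_shift4 = propagate_saliency(ref, 0, 0, 4, 0)   # 12/16 on MB(0,0), 4/16 on MB(0,1)
s_shift8 = propagate_saliency(ref, 0, 0, 8, 0)
```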
Integration Into H.264:
Salient-Adaptive Quantization
 Non-uniform bit-allocation
 Smaller saliency value => coarser quantization
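One way to realize "smaller saliency, coarser quantization" is to offset the quantization parameter (QP) of each MB according to its saliency. The linear mapping, `base_qp` and `max_offset` below are hypothetical choices for illustration; only the 0 to 51 QP range comes from H.264 itself.

```python
def saliency_to_qp(saliency, base_qp=30, max_offset=8):
    """Map a saliency value in [0, 1] to an H.264 QP (higher QP = coarser)."""
    s = min(max(saliency, 0.0), 1.0)
    # s = 1 lowers the QP by max_offset, s = 0 raises it by max_offset.
    qp = base_qp + round(max_offset * (1.0 - 2.0 * s))
    return min(max(qp, 0), 51)

qp_salient = saliency_to_qp(1.0)
qp_neutral = saliency_to_qp(0.5)
qp_background = saliency_to_qp(0.0)
```

With these assumed parameters, salient MBs receive more bits (lower QP) at the expense of the background.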
Integration Into H.264
Figure from Gupta & Chaudhury, PAMI 2011
Paper Evaluation
 Novelty:
 Methods for creating the saliency map and for saliency value propagation
 Assumption:
 All the MBs in P-frames are inter-coded (problematic)
 Writing level:
 Good
 Partially self-contained
Paper Evaluation
 Feasibility:
 Higher complexity than H.264 encoders
 Not for real-time encoders
 Useful at low bit-rates
 Objects entering the scene may be considered unimportant
 Experimental evaluation:
 Saliency:
 Visual comparison: good
 ROC curve comparison: partial
 Compression:
 None (authors’ future direction)
Future Directions
 Improving encoding complexity
 less complex saliency method
 Better object entrance treatment
 Using mutual-information of frame areas
 Treat intra-coded MBs in P-frames
A Review of:
3D Models Coding and Morphing for Efficient Video Compression
F. Galpin, R. Balter, L. Morin, K. Deguchi
CVPR 2004
Method Outline
 3D model extraction
 3D model-based video coding
 Reconstruction using adaptive geometric morphing
3D Models Stream Generation
Figure from Galpin et al, CVPR 2004
Stream Compression
 Three data types to compress:
 3D model
 Texture images
 Camera parameters
Texture Image Compression
Reconstruction Process:
Figure from Galpin et al, CVPR 2004
3D Model Compression
 The 3D model originates from a decimated depth map
 Compressed by:
 Wavelet transform
 Depth-adaptive quantization
Figures from Galpin et al, CVPR 2004
Video Reconstruction:
Texture Fading
Figure from Galpin et al, CVPR 2004
Video Reconstruction:
Texture Fading
without texture fading
with texture fading
Figures from Galpin et al, CVPR 2004
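Texture fading can be pictured as a cross-fade between the textures of two successive models, so that the displayed texture changes gradually rather than abruptly. The sketch below assumes a simple linear blend controlled by a parameter `t` in [0, 1]; the exact weighting used in the paper is not given in these slides.

```python
def fade_textures(tex_prev, tex_next, t):
    """Linear cross-fade of two equal-size grayscale textures, t in [0, 1]."""
    return [[(1.0 - t) * p + t * n for p, n in zip(row_prev, row_next)]
            for row_prev, row_next in zip(tex_prev, tex_next)]

tex_prev = [[100, 200]]
tex_next = [[200, 100]]
faded = fade_textures(tex_prev, tex_next, 0.25)  # 25% of the way to tex_next
```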
Video Reconstruction:
Geometric Morphing
 Improving 3D model interpolation
Figure from Galpin et al, CVPR 2004
Video Reconstruction:
Geometric Morphing
regular interpolation
interpolation with geometric morphing
Figures from Galpin et al, CVPR 2004
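In its simplest form, geometric morphing moves each vertex gradually from its position in one 3D model to its position in the next. The sketch below assumes that vertex correspondences between the two models are already known and that the interpolation is linear; both are simplifying assumptions, not details confirmed by these slides.

```python
def morph_vertices(verts_a, verts_b, t):
    """Linearly interpolate corresponding 3D vertices, t in [0, 1]."""
    return [tuple((1.0 - t) * a + t * b for a, b in zip(va, vb))
            for va, vb in zip(verts_a, verts_b)]

model_a = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
model_b = [(2.0, 0.0, 14.0), (1.0, 2.0, 12.0)]
halfway = morph_vertices(model_a, model_b, 0.5)
```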
Result Comparison with H.264
Paper Evaluation
 Novelty:
 Compression using unknown 3D model
 Assumptions:
 Static scene
 Moving monocular camera
 Neglected camera rotation
 GOP intrinsic parameters are fixed
 Writing level:
 Good
 Not self-contained
Paper Evaluation
 Feasibility:
 Only for static scene video
 High encoder\decoder complexity
 Unsuitable for real-time use
 Useful at very low bit-rates
 Experimental evaluation:
 Sufficient visual comparison with H.264
 No run-time information
Future Directions
 Treat moving objects
 Improve complexity
 At least for real-time decoding
Approach Comparison
Criterion | Attention | 3D model
Video type | Any | Static scene
Bit-rates useful at | Low | Very low
Encoder complexity | High | High
Decoder complexity | Regular | High
Integration in H.264 | Possible | Unsuitable
Overall evaluation | Promising | Inferior