Video pre-analysis

Pre-analysis of video for its advanced coding
Application to HDTV coding in H.264 streams
Olivier Brouard
July 20th, 2010
Supervisors: Dominique Barba and Vincent Ricordel
École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)
Speciality: Automatic Control, Robotics, Signal Processing and Applied Computer Science
Introduction
Motivations
• Emergence of HDTV
  - New displays
• From SDTV to HDTV
  - SDTV: 720x576 pixels
  - HDTV: 1920x1080 pixels
  - from 4% to 20% of the visual field
  => better immersion for the users
  => more pixels (5x)
• Need for a new video coding standard
  => H.264 (or MPEG-4 AVC)
13 April, 2015
Olivier Brouard
Slide 3/47
Introduction
H.264
• Advanced video coder
  - Reference frames (asymmetrical coding)
  - Rich set of prediction modes
  - Advanced entropy coding
  => higher bit rate reduction (up to 50% vs. MPEG-2)
• But
  - short-term decisions, based on the « low level » signal
  - no coding consistency
Introduction
Human as the final observer
Needs
• Control the perceptual quality
• Ensure the temporal coherence of the coding of objects
  => the rendering of an object has to be temporally consistent
  => avoid perceptible distortions
     - blocking effects
     - flickering effects
Introduction
Objectives & proposals
• How?
  => medium/long-term decisions
  => « high level » considerations
  => no such tools within current encoders
• Solution
  => perform a video pre-analysis before the encoding step
  => guide the encoder in its decisions
Outline
1. Video pre-analysis
1.1 Advanced motion estimation
1.2 Spatio-temporal segmentation
1.3 Visual attention modeling
2. Applications: H.264 video coding
2.1 GOP structure adaptation
2.2 Adaptive quantization
1- Video pre-analysis
Video pre-analysis
• Based on HVS properties
  => « high level » information for the encoder
• The Human Visual System (HVS)
  - Luminance perception
  - Color perception
  - Contrast sensitivity
  - Masking effects
• Visual attention
  - Bottom-up => guided by saliency
  - Top-down => guided by the task
1- Video pre-analysis
Visual attention
• Attributes guiding the deployment of visual attention [Wolfe 04]
  - Contrast, Motion, Color, Orientation, …
• Visual attention modeling [Itti 01; Le Meur 07; Marat 10]
  => based on the Koch and Ullman model [Koch 85]
• Perceptually important regions => most salient objects (physically and semantically)
• Shapes of regions (saliency maps) => shapes of objects [Milanese 1993]
  => moving objects attract our visual attention
1- Video pre-analysis
Video pre-analysis
1- Video pre-analysis – Advanced motion estimation
Spatio-temporal tube (1)
• Visual fixation time in the HVS: ~200 ms
• Next generation of HDTV
  => 1920x1080 in progressive mode at 50 Hz
  => temporal segment of 9 frames: 180 ms [Péchard 2007]
• Assumption
  - uniform motion
  => spatio-temporal tube
  => coherence of the motion over a perceptually significant duration
  => more homogeneous motion vector field
1- Video pre-analysis – Advanced motion estimation
Spatio-temporal tube (2)
• Implementation
  - spatial down-sampling
  - temporal down-sampling
    - central frame => current frame
    - 4 reference frames
• The spatio-temporal tube minimizes a global error MSEG over the reference frames k = -4, -2, +2, +4
  => MSEk is based on the 3 YUV components
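The tube search described above can be sketched as block matching under the uniform-motion assumption: a candidate displacement v per frame moves the block by k·v in frame t + k, and the error is accumulated over k = -4, -2, +2, +4. This is a minimal sketch, not the author's implementation. It assumes that MSEG is the sum of the four MSEk (the slide's formula did not survive extraction), works on a single component instead of the 3 YUV components, uses an integer-pel full search, and the names tube_error and best_tube are hypothetical.

```python
import numpy as np

OFFSETS = (-4, -2, 2, 4)  # reference frames around the central frame


def tube_error(frames, t, y, x, size, v):
    """Sum of MSE_k between the block of frame t and its displaced copies.

    v = (dy, dx) is the displacement per frame; with uniform motion the
    block sits at (y + k*dy, x + k*dx) in frame t + k.
    Assumption: the global error is the plain sum of the four MSE_k.
    """
    h, w = frames.shape[1:]
    block = frames[t, y:y + size, x:x + size].astype(np.float64)
    total = 0.0
    for k in OFFSETS:
        yy, xx = y + k * v[0], x + k * v[1]
        if yy < 0 or xx < 0 or yy + size > h or xx + size > w:
            return np.inf  # the tube leaves the picture: reject this candidate
        ref = frames[t + k, yy:yy + size, xx:xx + size].astype(np.float64)
        total += np.mean((block - ref) ** 2)
    return total


def best_tube(frames, t, y, x, size=8, search=2):
    """Full search over integer displacements in [-search, search]^2."""
    candidates = [(dy, dx)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates, key=lambda v: tube_error(frames, t, y, x, size, v))
```

Here frames is an array of shape (9, H, W) holding one temporal segment, with t = 4 as the central frame. At 50 Hz, the 9 frames span 9 x 20 ms = 180 ms, which matches the duration quoted above.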
1- Video pre-analysis – Spatio-temporal segmentation
Global motion
• Apparent motion is due to
  - moving objects
  - camera motion
• Motion segmentation
  => based on the residual motion
• Affine model
  - a1, a2, a3, a4: deformation parameters
  - tx, ty: translation parameters
  - Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube)
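The model equation itself is not in the extracted text. The sketch below assumes the usual six-parameter affine form, Vx = tx + a1·x + a2·y and Vy = ty + a3·x + a4·y; the assignment of a1..a4 to the individual terms is an assumption, and the function names are hypothetical. It evaluates the global (camera) motion at each site and subtracts it from the measured MVs to obtain the residual motion used for segmentation.

```python
import numpy as np


def affine_motion(params, xs, ys):
    """Global motion predicted at positions (xs, ys).

    Assumed parameter order: (a1, a2, a3, a4, tx, ty).
    """
    a1, a2, a3, a4, tx, ty = params
    vx = tx + a1 * xs + a2 * ys
    vy = ty + a3 * xs + a4 * ys
    return vx, vy


def residual_motion(mv_x, mv_y, params, xs, ys):
    """Measured MVs minus the global motion: what remains is object motion."""
    gx, gy = affine_motion(params, xs, ys)
    return mv_x - gx, mv_y - gy
```

With a perfectly estimated global motion, the residual is zero on the background and non-zero only on independently moving objects.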
1- Video pre-analysis – Spatio-temporal segmentation
Global motion parameters estimation
• Motion vector fields => parameter estimation [Coudray 2005]
• Global motion estimation in 2 steps:
  1. For each MV (tube)
     => calculation of the derivatives
     - accumulation of the parameter hypotheses
     - localization of the main peak
  2. Accumulation of the residual MVs (tubes)
     => 2-D histogram (tx, ty)
1- Video pre-analysis – Spatio-temporal segmentation
Motion segmentation
• 2-D histogram of the translation parameters
  => residual MVs (tx, ty)
• Each histogram peak => a moving object
  => analysis of all the peaks
• Iterative approach
  1. Initialization
     => detection of the main peak
     => greedy approach (local gradient)
  2. Detection of the other peaks
     => greedy approach
[Figure: accumulation histogram with the main peak and a secondary peak, and the resulting segmented space]
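One way to picture the peak analysis is the sketch below: the residual MVs are accumulated in a 2-D histogram, and every non-empty bin climbs greedily to its nearest local maximum, so that bins reaching the same maximum form one region of the segmented space. This is a simplified illustration, not the author's procedure. The steepest-ascent rule, the bin count, the range and the function names are all assumptions.

```python
import numpy as np


def climb(hist, i, j):
    """Follow the steepest ascent from bin (i, j) up to a local maximum."""
    while True:
        i0, i1 = max(i - 1, 0), min(i + 2, hist.shape[0])
        j0, j1 = max(j - 1, 0), min(j + 2, hist.shape[1])
        window = hist[i0:i1, j0:j1]
        di, dj = np.unravel_index(np.argmax(window), window.shape)
        ni, nj = i0 + di, j0 + dj
        if hist[ni, nj] <= hist[i, j]:
            return i, j  # no strictly higher neighbour: (i, j) is a peak
        i, j = ni, nj


def segment_histogram(tx, ty, bins=16, span=8.0):
    """Return the peaks (strongest first) and the accumulation histogram."""
    hist, _, _ = np.histogram2d(tx, ty, bins=bins,
                                range=[[-span, span], [-span, span]])
    peaks = {}
    for i, j in zip(*np.nonzero(hist)):
        peaks.setdefault(climb(hist, i, j), []).append((i, j))
    # The first entry plays the role of the main peak, the rest are secondary.
    return sorted(peaks, key=lambda p: hist[p], reverse=True), hist
```

Each returned peak would correspond to one candidate motion, hence one candidate moving object.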
1- Video pre-analysis – Spatio-temporal segmentation
Motion segmentation – results
=> a spatial and temporal regularization is needed
1- Video pre-analysis – Spatio-temporal segmentation
Spatio-temporal regularization
• Motion-based segmentation
  => some blocks are misclassified
  => more criteria are needed to improve the segmentation
     - connectivity
     - color
     - texture
     - motion
• Markovian approach
1- Video pre-analysis – Spatio-temporal segmentation
Markovian approach
• The Hammersley-Clifford theorem [Besag 1974]
  => Gibbs distribution <=> Markov Random Field
  => the optimal label configuration
  => minimizes a global energy function
     - E: label field
     - O: observation field
• Markovian property
  => U(o, e): sum of potential functions defined on cliques
     - site => spatio-temporal tube
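The equations of this slide were lost in extraction. For orientation, the standard MAP-MRF formulation that the bullets describe can be written as follows, using the slide's symbols E, O and U(o, e). The partition function Z, the clique set C and the clique potentials V_c are notational assumptions, not taken from the source.

```latex
P(E = e \mid O = o) = \frac{1}{Z} \exp\bigl(-U(o, e)\bigr),
\qquad
\hat{e} = \arg\min_{e} U(o, e),
\qquad
U(o, e) = \sum_{c \in \mathcal{C}} V_c(o, e)
```

Maximizing the posterior probability is then the same as minimizing the global energy U(o, e), because the exponential is monotonic and Z does not depend on e.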
1- Video pre-analysis – Spatio-temporal segmentation
Spatial regularization
• Spatial connectivity
  - segmented region => locally homogeneous
• Color features
  - color distributions
    => Bhattacharyya coefficient between discrete densities
• Texture features
  - texture distributions
    => 2 spatial gradients (Sobel filters)
    => Bhattacharyya coefficient
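For two discrete densities p and q, the Bhattacharyya coefficient is the sum over bins of sqrt(p_i · q_i): it equals 1 for identical distributions and 0 for distributions with no common bin. A minimal sketch follows; the function name is hypothetical, and how the coefficient enters the potential functions is not shown because the source does not give it.

```python
import numpy as np


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two histograms (normalized here)."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

The same function applies to the color histograms and to the histograms of the two Sobel gradient responses used as texture features.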
1- Video pre-analysis – Spatio-temporal segmentation
Temporal regularization
• Motion features
  => distance between the MVs
• Temporal connectivity
  - segmented region => temporally homogeneous
    => segmentation map of the previous temporal segment
• Region tracking
  - criteria: color, texture, overlap
  => video object tracking
1- Video pre-analysis – Spatio-temporal segmentation
Energy minimization
• The global energy function
  - potential functions
  - weighting factors
• Sequential site processing
  => instability stack
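The energy function itself did not survive extraction, so the following is only an illustration of sequential site processing driven by a stack of unstable sites. It assumes a deterministic, ICM-style relaxation, a per-site data cost and a Potts potential weighted by beta on 4-neighbour cliques. None of these choices, nor the function names, are confirmed by the source.

```python
import numpy as np


def neighbours(i, j, shape):
    """4-connected neighbours of site (i, j) inside the grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < shape[0] and 0 <= nj < shape[1]:
            yield ni, nj


def local_energy(labels, data_cost, i, j, label, beta):
    """Data potential plus an assumed Potts potential on the neighbour cliques."""
    disagree = sum(labels[n] != label for n in neighbours(i, j, labels.shape))
    return data_cost[i, j, label] + beta * disagree


def relax(labels, data_cost, beta=1.0):
    """Process sites one by one until the stack of unstable sites is empty."""
    labels = labels.copy()
    stack = [(i, j) for i in range(labels.shape[0])
             for j in range(labels.shape[1])]
    while stack:
        i, j = stack.pop()
        energies = [local_energy(labels, data_cost, i, j, l, beta)
                    for l in range(data_cost.shape[2])]
        best = int(np.argmin(energies))
        if energies[best] < energies[labels[i, j]]:
            labels[i, j] = best
            # A label change may destabilize the neighbours: revisit them.
            stack.extend(neighbours(i, j, labels.shape))
    return labels
```

Because a site is relabelled only when its local energy strictly decreases, the global energy decreases at every change and the loop terminates.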
1- Video pre-analysis – Spatio-temporal segmentation
Results
[Figure: motion segmentation only vs. regularized spatio-temporal segmentation]
1- Video pre-analysis – Visual attention modeling
Spatial saliency
• Spatial saliency based on color contrast [Aziz 2008]
  => color transformation: YUV to HSV
  - color features influencing visual attention
    1. Saturation contrast
    2. Intensity contrast
    3. Hue contrast
    4. Opponent-color contrast
    5. Warm and cold color contrast
    6. Dominance of warm colors
    7. Dominance of luminance and saturation
• Spatial saliency SSP => combination of these 7 features
1- Video pre-analysis – Visual attention modeling
Temporal saliency
• Temporal saliency based on the relative motion
  - MV of the site s
  - dominant motion
  - relative motion of s (MV of s relative to the dominant motion)
• Maximum velocity of smooth pursuit of the eye [Daly 1998] => 80°/s
=> temporal saliency ST
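The expression of ST is missing from the extracted text. The sketch below only illustrates the ingredients listed above: the relative motion is the MV minus the dominant motion, its speed is converted to degrees per second, and it is bounded by the 80°/s smooth-pursuit limit. The clip-and-normalize law, the 60 pixels-per-degree viewing geometry and the function name are assumptions; only the 50 Hz frame rate and the 80°/s value come from the slides.

```python
import numpy as np

V_MAX = 80.0  # deg/s, maximum smooth-pursuit velocity quoted from [Daly 1998]


def temporal_saliency(mv, dominant, fps=50.0, px_per_deg=60.0):
    """Saliency in [0, 1] from the relative motion of each site.

    mv, dominant: arrays of shape (H, W, 2), in pixels per frame.
    Assumed law: speed clipped at V_MAX, then normalized by V_MAX.
    """
    relative = mv - dominant
    speed = np.linalg.norm(relative, axis=-1) * fps / px_per_deg  # deg/s
    return np.clip(speed, 0.0, V_MAX) / V_MAX
```

A site moving with the camera gets a saliency of 0, whatever its absolute motion.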
1- Video pre-analysis – Visual attention modeling
Spatio-temporal saliency
• Fusion of the spatial and temporal saliency maps
• Observers => focus on the center of the screen [Le Meur 2005]
  => weighting by a 2-D Gaussian function
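The center weighting can be sketched as a separable 2-D Gaussian applied to the fused map. The fusion rule is not given in the extracted text, so the weighted average with alpha, the Gaussian widths (a fraction of the picture size) and the function names below are assumptions made for illustration.

```python
import numpy as np


def centre_weights(height, width, sigma_frac=0.25):
    """2-D Gaussian centred on the picture; equals 1 at the exact centre."""
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_frac * height, sigma_frac * width
    return np.exp(-((ys - cy) ** 2 / (2 * sy ** 2)
                    + (xs - cx) ** 2 / (2 * sx ** 2)))


def fuse(spatial, temporal, alpha=0.5):
    """Assumed fusion: weighted average, then centre-bias weighting."""
    combined = alpha * spatial + (1.0 - alpha) * temporal
    return combined * centre_weights(*combined.shape)
```

With equal saliency everywhere, the output decreases smoothly from the center toward the borders, which reproduces the observers' center bias.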
1- Video pre-analysis – Visual attention modeling
Results
1- Video pre-analysis
Possible applications
• Video pre-analysis
  => information
     - moving object segmentation, object tracking
     - color, texture
     - salient regions
  => applications
     - advanced video coding
     - video transmission with priority (saliency maps)
     - video summarization, indexing
     - …
• ArchiPEG (ANR project)
  - real-time HD MPEG-4 AVC compression
  - video pre-analysis resource
Outline
1. Video pre-analysis
1.1 Advanced motion estimation
1.2 Spatio-temporal segmentation
1.3 Visual attention modeling
2. Applications: H.264 video coding
2.1 GOP structure adaptation
2.2 Adaptive quantization
2- Applications: H.264 video coding – GOP structure adaptation
GOP structure
• Three kinds of frames: I, P, B
  - a GOP begins with an I frame => intra coded
  - P frames at regular intervals => predicted
  - B frames between P frames => bi-predicted
• Fixed interval between I frames
  - not adapted to scene changes and temporal variations of the video => more bits
  => dynamic GOP size => irregular I-frame insertion
• Typically: number of B frames = 1 or 2
  => good trade-off between bitrate and quality
  - low motion or camera panning
    => increase the number of B frames
2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (1)
• Analysis of the video sequences
  => x264 encoder
  => different fixed numbers of B frames: 0, 1, 2, 3

Video sequence              Optimal GOP configuration
New Mobile and Calendar     2 B frames
Night                       2 B frames
Knightshields               2 B frames
Crew                        1 B frame
Park run                    1 B frame
Park joy                    no B frame
Tractor                     no B frame
Umbrella                    no B frame

=> the optimal number of B frames is content dependent
=> classify videos according to their content
2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (2)
• Spatio-temporal characterization
  -> 2 indices to evaluate the spatio-temporal activity
     - IT: temporal activity => MVs
     - IS: spatial activity => MSEG
[Formulas: IT and IS for each temporal segment, and for the entire sequence]
2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (3)
• Classification space as a function of IT and IS
  - class Ci => i B frames between P-P or I-P frames
  => IT constant between P-P or I-P frames
  => same rule for IS
2- Applications: H.264 video coding – GOP structure adaptation
GOP size adaptation (1)
• Change detection within a video shot
  - high motion
    => significant changes
    => reduce the interval
  - low motion
    => little variation
    => increase the interval
  - mid-range motion
    => classical approach => fixed GOP size
• 2 thresholds to detect critical changes
  - sh => high motion
  - sb => low motion
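The three cases above amount to a two-threshold rule on the temporal activity index IT. In the sketch below, the function name, the threshold values passed by the caller and the shortened and lengthened intervals (12 and 50 frames) are hypothetical. Only the base GOP size of 25 appears in the source, as the x264 reference configuration.

```python
def gop_size(it, s_low, s_high, base=25, short=12, long=50):
    """Choose the I-frame interval from the temporal activity index IT.

    s_high and s_low stand for the thresholds sh and sb of the slide;
    short and long are illustrative values, not taken from the source.
    """
    if it > s_high:   # high motion: significant changes, reduce the interval
        return short
    if it < s_low:    # low motion: little variation, increase the interval
        return long
    return base       # mid-range motion: classical fixed GOP size
```

For example, with s_low = 1.0 and s_high = 4.0, a segment with IT = 5.0 would get the short interval and a segment with IT = 0.5 the long one.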
2- Applications: H.264 video coding – GOP structure adaptation
GOP size adaptation (2)
• Analysis of the evolution of IT => 3 cases
  - mid-range motion
  - high motion
  - low motion
2- Applications: H.264 video coding – GOP structure adaptation
Performance
• 8 video sequences
• 4 different bitrates
  => defined by a group of experts
• Comparison between
  - x264 encoder: GOP size = 25, 2 B frames
  - a modified version => GOP structure adaptation
2- Applications: H.264 video coding – GOP structure adaptation
Results
• Rate-distortion (PSNR) [Bjontegaard 2001]

Video sequence              Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar      9.15               0.32
Night                        2.45               0.09
Knightshields                1.68               0.06
Park run                    -0.10              -0.01
Umbrella                     4.11               0.13
Park joy                     2.83               0.09
Crew                         4.50               0.13
Tractor                     10.94               0.48
Average                      4.45               0.16
2- Applications: H.264 video coding – GOP structure adaptation
Subjective tests
• Setup
  - display => resolution 1920x1080
  - normalized room [BT.500-11]
  - ~30 naïve observers
  - 72 video sequences (72 = 8x4x2+8)
• Methodology => ACR
  - for each sequence
    => observers have to assess the quality
2- Applications: H.264 video coding – GOP structure adaptation
Results
• QGOP: MOS => modified coder
• Qx264: MOS => x264 coder

Video sequence              MOS difference (QGOP vs. Qx264)
New Mobile and Calendar      0.31
Night                       -0.02
Knightshields                0.24
Park run                     0.04
Umbrella                    -0.09
Park joy                     0.14
Crew                         0.48
Tractor                      0.33
Average                      0.18

• sequences with a high IT value => high motion
  => GOP structure adaptation
2- Applications: H.264 video coding – Adaptive quantization
Adaptive quantization
• Objective
  => control the distribution of the binary resources (bits)
  => saliency maps
  => increase the perceived visual quality
• Modification of the saliency maps
  => quantization and morphological filtering
• Modification of the coder
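As an illustration of how a saliency map could steer the quantizer, the sketch below quantizes a per-macroblock saliency map into a few levels, applies a 3x3 grey-level dilation as the morphological filter, and maps the levels linearly to QP offsets, so that salient blocks get a lower QP and therefore a finer quantization. The number of levels, the choice of dilation, the linear law, the offset range and all function names are assumptions. The actual modification of the x264 coder is not described in the extracted text.

```python
import numpy as np


def quantize(saliency, levels=4):
    """Map saliency values in [0, 1] to integer levels 0 .. levels-1."""
    return np.minimum((saliency * levels).astype(int), levels - 1)


def dilate(levels_map):
    """3x3 grey-level dilation: each block takes the maximum of its neighbourhood."""
    padded = np.pad(levels_map, 1, mode="edge")
    h, w = levels_map.shape
    shifted = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.max(shifted, axis=0)


def qp_offsets(saliency, levels=4, max_delta=3):
    """Assumed linear law: top level gets -max_delta, bottom level +max_delta."""
    lv = dilate(quantize(saliency, levels))
    return np.rint(max_delta - 2.0 * max_delta * lv / (levels - 1)).astype(int)
```

The dilation slightly enlarges the salient regions, which avoids a visible quality step exactly on object boundaries.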
2- Applications: H.264 video coding – Adaptive quantization
Results (1)
• Rate-distortion (PSNR) [Bjontegaard 2001]

                            Entire sequence                     Region of interest
Video sequence              Bitrate gain (%)   PSNR gain (dB)   Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar     -2.49              -0.09            -0.67              -0.03
Night                       -3.38              -0.12            -0.39              -0.02
Knightshields               -3.02              -0.12            -0.84              -0.03
Park run                    -0.81              -0.03             0.25               0.01
Umbrella                     2.34               0.07             4.17               0.14
Park joy                     2.68               0.09             4.42               0.14
Crew                        -0.36              -0.01             2.74               0.09
Tractor                     10.94               0.05             4.35               0.20
Average                     -0.52              -0.02             1.75               0.06
2- Applications: H.264 video coding – Adaptive quantization
Subjective assessments
• Results
  - QQA: MOS => modified coder (adaptive quantization)
  - Qx264: MOS => x264 coder

Video sequence              MOS difference (QQA vs. Qx264)
New Mobile and Calendar     -0.13
Night                        0.09
Knightshields               -0.02
Park run                    -0.06
Umbrella                     0.06
Park joy                     0.17
Crew                        -0.06
Tractor                      0.04
Average                      0.04

=> no specific type of content for which the method is suited
=> unsuitable for coding and broadcasting of HDTV at high bitrate
=> overhead, linear law?
Conclusion
Conclusion (1)
• Video pre-analysis
  - spatio-temporal segmentation
    => detection of moving objects
    => object tracking
  - visual attention modeling
    => saliency maps
• Applications
  - advanced video coding
  - video transmission with priority based on the saliency maps [Boulos 2010]
  - video summarization, indexing
  - …
Conclusion
Conclusion (2)
• Applications of the video pre-analysis
  - GOP structure adaptation
    - dynamic variation of the number of B frames
      => temporal segment classification: IT and IS
    - GOP size adaptation
      => I frame insertion
      => change detection: IT
  - Adaptive quantization based on the saliency maps
Conclusion
Conclusion (3)
• Subjective quality assessment tests
  - GOP structure adaptation
    => no significant differences
    => +0.18 (on a scale of 1 to 5)
    => well suited to sequences with high motion
  - Adaptive quantization
    => no clear content suitability
    => seems unsuitable for coding and broadcasting of HDTV at high bitrate
    … the adaptation law could be modified …
Conclusion
Perspectives
• Better performance evaluation of our visual attention model
  => eye-tracking experiments
• Psychophysical experiments to optimize the model parameters
  => improve the fusion process [Marat 2010]
• Add high-level visual information
  => faces, skin tone, …
Thank you.
Questions?