Pre-analysis of video for its advanced coding
Application to the HDTV coding in H.264 streams
Olivier Brouard
July 20th 2010
Supervisors: Dominique Barba and Vincent Ricordel
École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)
Specialty: Automatique, Robotique, Traitement du Signal et Informatique Appliquée

Introduction
Motivations
Emergence of the HDTV
• New displays
• From SDTV to HDTV
  - SDTV: 720x576 pixels
  - HDTV: 1920x1080 pixels
  - from 4% to 20% of the visual field
  - better immersion for the users
  - more pixels (5x)
Need for a new video coding standard
• H.264 (or MPEG-4 AVC)
13 April, 2015 Olivier Brouard Slide 3/47

Introduction
H.264
Advanced video coder
• Reference frames (dissymmetrical coding) + richness of the prediction modes + advanced entropy coding
  - higher bit rate reduction (up to 50% compared with MPEG-2)
But
• short-term decisions, « low level » signal based
  - no coding consistency

Introduction
Human as the final observer
Needs
• Control the perceptual quality
• Ensure the temporal coherence of the coding of the objects
  - the rendering of an object has to be temporally consistent
  - avoid perceptible distortions: blocking effects, flickering effects

Introduction
Objectives & proposals
How to do it?
• medium/long-term decisions
• « high level » considerations
  - no such tools within the current encoders
Solution
• perform a video pre-analysis before the encoding step
• guide the encoder in its decisions

Outline
1.
Video pre-analysis
   1.1 Advanced motion estimation
   1.2 Spatio-temporal segmentation
   1.3 Visual attention modeling
2. Applications: H.264 video coding
   2.1 GOP structure adaptation
   2.2 Adaptive quantization

1- Video pre-analysis
Video pre-analysis
• Based on HVS properties
• « high level » information to the encoder
The Human Visual System (HVS)
• Luminance perception
• Color perception
• Contrast sensitivity
• Masking effects
Visual Attention
• Bottom-Up: guided by the saliency
• Top-Down: guided by the tasks

1- Video pre-analysis
Visual attention
Attributes guiding the deployment of visual attention [Wolfe 04]
• Contrast, Motion, Color, Orientation, …
Visual attention modeling [Itti 01; Le Meur 07; Marat 10]
• based on the Koch and Ullman model [Koch 85]
Perceptually important regions
• most salient objects (physically and semantically)
Shapes of regions (saliency maps)
• shape of objects [Milanese 1993]
• moving objects attract our visual attention

1- Video pre-analysis
Video pre-analysis

1- Video pre-analysis – Advanced motion estimation
Spatio-temporal tube (1)
• Visual fixation time in the HVS: ~200 ms
• Next generation of HDTV: 1920x1080 in progressive mode at 50 Hz
  - temporal segment of 9 frames: 180 ms [Péchard 2007]
• Assumption: uniform motion
  - spatio-temporal tube
  - coherence of the motion along a perceptually significant duration
  - more homogeneous motion vector field

1- Video pre-analysis – Advanced motion estimation
Spatio-temporal tube (2)
Implementation
• spatial down-sampling
• temporal down-sampling
  - central frame = current frame
  - 4 reference frames
The spatio-temporal tube minimizes a global error MSEG over the reference frames k = -4, -2, +2, +4
• each MSEk is based on the 3 YUV components

1- Video pre-analysis – Spatio-temporal segmentation
Global motion
Apparent
motions due to
• moving objects
• camera motion
Motion segmentation based on the residual motion
Affine model
• a1, a2, a3, a4: deformation parameters
• tx, ty: translation parameters
• Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube)

1- Video pre-analysis – Spatio-temporal segmentation
Global motion parameters estimation
Estimation of the parameters from the motion vector fields [Coudray 2005]
Global motion estimation in 2 steps:
1. For each MV (tube), calculation of the derivatives
   • accumulation of the parameter hypotheses
   • localization of the main peak
2. Accumulation of the residual MVs (tubes)
   • 2-D histogram (tx, ty)

1- Video pre-analysis – Spatio-temporal segmentation
Motion segmentation
2-D histogram of the translation parameters of the residual MVs (tx, ty)
Each histogram peak => a moving object
• analysis of all the peaks
Iterative approach
1. Initialization: detection of the main peak
   • greedy approach (local gradient)
2.
Detection of the other peaks
   • greedy approach
[Figure: accumulation histogram showing the main peak, a secondary peak, and the segmented space]

1- Video pre-analysis – Spatio-temporal segmentation
Motion segmentation – results
• need for a spatial and temporal regularization

1- Video pre-analysis
Video pre-analysis

1- Video pre-analysis – Spatio-temporal segmentation
Spatio-temporal regularization
Motion-based segmentation
• some blocks are misclassified
• more criteria to improve the segmentation:
  - connectivity
  - color
  - texture
  - motion
Markovian approach

1- Video pre-analysis – Spatio-temporal segmentation
Markovian approach
The Hammersley-Clifford theorem [Besag 1974]
• equivalence between a Gibbs distribution and a Markov Random Field
• the optimal label configuration minimizes a global energy function
  - E: label field
  - O: observation field
Markovian property
• U(o, e): sum of potential functions defined on cliques
• site = spatio-temporal tube

1- Video pre-analysis – Spatio-temporal segmentation
Spatial regularization
Spatial connectivity
• segmented region => locally homogeneous
Color features
• color distributions: discrete densities compared with the Bhattacharyya coefficient
Texture features
• texture distributions: 2 spatial gradients (Sobel filters), compared with the Bhattacharyya coefficient

1- Video pre-analysis – Spatio-temporal segmentation
Temporal regularization
Motion features
• distance between the MVs
Temporal connectivity
• segmented region => temporally homogeneous
  - segmentation map of the previous temporal segment
Regions tracking
• criteria: color, texture, overlap
  - video objects tracking

1- Video pre-analysis – Spatio-temporal segmentation
Energy minimization
The global energy function
• potential functions
• weighting factors
Sequential processing of the sites
• stack of
instability

1- Video pre-analysis – Spatio-temporal segmentation
Results
[Figure: motion segmentation only vs. regularized spatio-temporal segmentation]

1- Video pre-analysis
Video pre-analysis

1- Video pre-analysis – Visual attention modeling
Spatial saliency
Spatial saliency based on the color contrast [Aziz 2008]
• color transformation: YUV to HSV
• color features influencing the visual attention:
  1- Saturation contrast
  2- Intensity contrast
  3- Hue contrast
  4- Opponents contrast
  5- Warm and cold colors contrast
  6- Dominance of the warm colors
  7- Dominance of the luminance and saturation
Spatial saliency SSP => combination of these 7 features

1- Video pre-analysis – Visual attention modeling
Temporal saliency
Temporal saliency based on the relative motion
• computed from the MV of the site s and the dominant motion
• relative motion of s: motion of s with respect to the dominant motion
• maximum velocity of smooth pursuit of the eye [Daly 1998]: 80°/s
=> temporal saliency ST

1- Video pre-analysis – Visual attention modeling
Spatio-temporal saliency
• Fusion of the spatial saliency and temporal saliency maps
• Observers => focus on the center of the screen [Le Meur 2005]
  - weighting by a 2-D Gaussian function

1- Video pre-analysis – Visual attention modeling
Results

1- Video pre-analysis
Possible applications
Information from the video pre-analysis
- moving objects segmentation, objects tracking
- color, texture
- salient regions
Applications
- advanced video coding
- video transmission with priority (saliency maps)
- video summarization, indexing
- …
ArchiPEG (ANR Project)
- HD MPEG-4 AVC real-time compression
- pre-analysis video resource

Outline
1.
Video pre-analysis
   1.1 Advanced motion estimation
   1.2 Spatio-temporal segmentation
   1.3 Visual attention modeling
2. Applications: H.264 video coding
   2.1 GOP structure adaptation
   2.2 Adaptive quantization

2- Applications: H.264 video coding – GOP structure adaptation
GOP structure
Three kinds of frames: I, P, B
• a GOP begins with an I frame: intra coded
• P frames at regular intervals: predicted
• B frames between P frames: bi-predicted
Fixed interval between I frames
• not adapted to changing scenes and temporal variations of the video => more bits
• dynamic GOP size: irregular insertion of I frames
Typically: number of B frames = 1 or 2
• good trade-off between bitrate and quality
• low motion or panning of the camera: increase the number of B frames

2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (1)
Analysis of the video sequences
• x264 encoder
• different fixed numbers of B frames: 0, 1, 2, 3

Video sequence             Optimal GOP configuration
New Mobile and Calendar    2 B frames
Night                      2 B frames
Knightshields              2 B frames
Crew                       1 B frame
Park run                   1 B frame
Park joy                   no B frame
Tractor                    no B frame
Umbrella                   no B frame

optimal number of B frames => content dependent
• classify videos according to their content

2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (2)
Spatio-temporal characterization
-> 2 indices to evaluate the spatio-temporal activity
- IT: temporal activity => MVs
- IS: spatial activity => MSEG
Both indices are computed for each temporal segment and for the entire sequence

2- Applications: H.264 video coding – GOP structure adaptation
B frames adaptation (3)
Classification space: function of IT and IS
• class Ci => i B frames between P-P or I-P frames
• IT constant between P-P or I-P frames
• same rule for IS

2- Applications: H.264 video
coding – GOP structure adaptation
GOP size adaptation (1)
Change detection within a video shot
• high motion: significant changes => reduce the interval
• low motion: little variation => increase the interval
• mid-range motion: classical approach => fixed GOP size
2 thresholds to detect critical changes
- sh => high motion
- sb => low motion

2- Applications: H.264 video coding – GOP structure adaptation
GOP size adaptation (2)
Analysis of the IT evolution: 3 cases
• mid-range motion
• high motion
• low motion

2- Applications: H.264 video coding – GOP structure adaptation
Performances
• 8 video sequences
• 4 different bitrates defined by an experts group
Comparison between
- x264 encoder: GOP size = 25, 2 B frames
- a modified version => GOP structure adaptation

2- Applications: H.264 video coding – GOP structure adaptation
Results
Rate – Distortion (PSNR) [Bjontegaard 2001]

Video sequence             Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar     9.15               0.32
Night                       2.45               0.09
Knightshields               1.68               0.06
Park run                   -0.1               -0.01
Umbrella                    4.11               0.13
Park joy                    2.83               0.09
Crew                        4.5                0.13
Tractor                    10.94               0.48
Average                     4.45               0.16

2- Applications: H.264 video coding – GOP structure adaptation
Subjective tests
Setup
• display resolution 1920x1080
• normalized room [BT.500-11]
• ~30 naïve observers
• 72 (= 8x4x2+8) video sequences
Methodology: ACR
• for each sequence, observers have to assess the quality

2- Applications: H.264 video coding – GOP structure adaptation
Results
• QGOP: MOS modified coder
• Qx264: MOS x264 coder

Video sequence             QGOP - Qx264
New Mobile and Calendar     0.31
Night                      -0.02
Knightshields               0.24
Park run                    0.04
Umbrella                   -0.09
Park joy                    0.14
Crew                        0.48
Tractor                     0.33
Average                     0.18

• sequences with a high IT value (high motion) => GOP structure adaptation

2- Applications: H.264
video coding – Adaptive quantization
Adaptive quantization
Objective
• control the distribution of the bit resources with the saliency maps
• increase the perceived visual quality
Modification of the saliency maps
• quantization and morphological filtering
Modification of the coder

2- Applications: H.264 video coding – Adaptive quantization
Results (1)
Rate – Distortion (PSNR) [Bjontegaard 2001]

                           Entire sequence                     Region of Interest
Video sequence             Bitrate gain (%)   PSNR gain (dB)   Bitrate gain (%)   PSNR gain (dB)
New Mobile and Calendar    -2.49              -0.09            -0.67              -0.03
Night                      -3.38              -0.12            -0.39              -0.02
Knightshields              -3.02              -0.12            -0.84              -0.03
Parkrun                    -0.81              -0.03             0.25               0.01
Umbrella                    2.34               0.07             4.17               0.14
Parkjoy                     2.68               0.09             4.42               0.14
Crew                       -0.36              -0.01             2.74               0.09
Tractor                    10.94               0.05             4.35               0.20
Average                    -0.52              -0.02             1.75               0.06

2- Applications: H.264 video coding – Adaptive quantization
Subjective assessments
Results
• QQA: MOS modified coder (adaptive quantization)
• Qx264: MOS x264 coder

Video sequence             QQA - Qx264
New Mobile and Calendar    -0.13
Night                       0.09
Knightshields              -0.02
Park run                   -0.06
Umbrella                    0.06
Park joy                    0.17
Crew                       -0.06
Tractor                     0.04
Average                     0.04

• no specific content is suitable
• unsuitable for coding and broadcasting of HDTV at high bitrate
• overhead, linear law?
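
The adaptive quantization idea (saliency maps steering how bits are distributed) can be illustrated with a minimal sketch. It assumes a linear law that maps a per-macroblock saliency value in [0, 1] to a QP offset. The function names `qp_offsets` and `apply_offsets`, the `max_offset` parameter, and the linear form itself are illustrative assumptions, not the adaptation law of the modified coder. The only fixed fact used is that H.264 quantization parameters lie in the range 0 to 51.

```python
def qp_offsets(saliency, max_offset=4):
    """Map per-macroblock saliency values to integer QP offsets.

    Hypothetical linear law (an assumption, not the coder's actual law):
    saliency 1.0 -> -max_offset (finer quantization, more bits),
    saliency 0.0 -> +max_offset (coarser quantization, fewer bits).
    """
    offsets = []
    for row in saliency:
        out_row = []
        for s in row:
            s = min(max(s, 0.0), 1.0)  # clamp saliency to [0, 1]
            out_row.append(round(max_offset * (1.0 - 2.0 * s)))
        offsets.append(out_row)
    return offsets


def apply_offsets(base_qp, offsets, qp_min=0, qp_max=51):
    """Add the offsets to a base QP and clip to the valid H.264 QP range."""
    return [
        [min(max(base_qp + o, qp_min), qp_max) for o in row]
        for row in offsets
    ]


if __name__ == "__main__":
    # One row of three macroblocks: not salient, neutral, very salient.
    saliency_map = [[0.0, 0.5, 1.0]]
    print(apply_offsets(26, qp_offsets(saliency_map)))
```

A non-linear law would only change the expression inside `qp_offsets`; the clipping step stays the same.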

Conclusion
Conclusion (1)
Video pre-analysis
• spatio-temporal segmentation
  - detection of moving objects
  - objects tracking
• visual attention modeling
  - saliency maps
Applications
• advanced video coding
• video transmission with priority based on the saliency maps [Boulos 2010]
• video summarization, indexing
• …

Conclusion
Conclusion (2)
Applications of the video pre-analysis
• GOP structure adaptation
  - dynamic variation of the B frames: temporal segment classification (IT and IS)
  - GOP size adaptation: I frame insertion, change detection (IT)
• Adaptive quantization based on the saliency maps

Conclusion
Conclusion (3)
Subjective quality assessment tests
• GOP structure adaptation
  - no significant differences: +0.18 (on a scale of 1 to 5)
  - well suited for sequences with high motion
• Adaptive quantization
  - no clear content suitability
  - seems unsuitable for coding and broadcasting of HDTV at high bitrate
  - … adaptation law could be modified …

Conclusion
Perspectives
• Better performance evaluation of our visual attention model
  - eye-tracking experiments
• Psychophysical experiments to optimize the model parameters
  - improve the fusion process [Marat 2010]
• Add high-level visual information
  - face, flesh hue, …

Thank you.
Questions?
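
Backup: the spatial regularization step compares color and texture distributions with the Bhattacharyya coefficient. The sketch below shows the standard definition of that coefficient for two discrete densities, BC(p, q) = sum over bins of sqrt(p_i * q_i). The function names and the choice to normalize raw histograms inside the function are assumptions made for illustration; how the coefficient enters the potential functions is not specified here.

```python
import math


def normalize(hist):
    """Turn a histogram of non-negative counts into a discrete density."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have a positive sum")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient of two histograms with the same bins.

    Returns a value in [0, 1]: 1 for identical densities,
    0 for densities with no overlapping bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p), normalize(q)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Hypothetical 4-bin hue histograms of two neighboring regions.
    region_a = [10, 30, 40, 20]
    region_b = [12, 28, 35, 25]
    print(bhattacharyya_coefficient(region_a, region_b))
```

A coefficient close to 1 suggests the two regions have similar color (or texture) content, which is the kind of evidence a regularization step can use to give them the same label.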