Video pre-analysis

Pre-analysis of video for its advanced coding
Application to HDTV coding in H.264 streams
Olivier Brouard
July 20th, 2010
Supervisors: Dominique Barba and Vincent Ricordel
École Doctorale Sciences et Technologie de l’Information et Mathématiques (EDSTIM)
Specialty: Automatic Control, Robotics, Signal Processing and Applied Computer Science

Introduction: Motivations
• Emergence of HDTV
  - new displays
• From SDTV to HDTV
  - SDTV: 720x576 pixels
  - HDTV: 1920x1080 pixels
  - from 4% to 20% of the visual field => better immersion for the users
  - more pixels (5x)
• Need for a new video coding standard
  - H.264 (or MPEG-4 AVC)
13 April, 2015 Olivier Brouard Slide 3/47

Introduction: H.264
• Advanced video coder: reference frames (asymmetric coding) + rich prediction modes + advanced entropy coding
  => higher bit rate reduction (up to 50% compared with MPEG-2)
• But
  - short-term decisions, based on the "low level" signal
  - no coding consistency

Introduction: Human as the final observer
Needs
• Control the perceptual quality
• Ensure the temporal coherence of the coding of objects
  - the rendering of an object has to be temporally consistent
  - avoid perceptible distortions: blocking effects, flickering effects

Introduction: Objectives & proposals
• How?
  - medium/long-term decisions
  - "high level" considerations
  - no such tools within the current encoders
• Solution
  - perform a video pre-analysis before the encoding step
  - guide the encoder in its decisions

Outline
1. Video pre-analysis
   1.1 Advanced motion estimation
   1.2 Spatio-temporal segmentation
   1.3 Visual attention modeling
2. Applications: H.264 video coding
   2.1 GOP structure adaptation
   2.2 Adaptive quantization

1. Video pre-analysis

Video pre-analysis
• Based on HVS properties => "high level" information to the encoder
• The Human Visual System (HVS)
  - luminance perception
  - color perception
  - contrast sensitivity
  - masking effects
• Visual attention
  - bottom-up => guided by the saliency
  - top-down => guided by the tasks

Visual attention
• Attributes guiding the deployment of visual attention [Wolfe 04]
  - contrast, motion, color, orientation, …
• Visual attention modeling [Itti 01; Le Meur 07; Marat 10]
  - based on the Koch and Ullman model [Koch 85]
• Perceptually important regions => most salient objects (physically and semantically)
• Shapes of regions (saliency maps) => shapes of objects [Milanese 1993]
• Moving objects attract our visual attention

Video pre-analysis
[Figure: overview of the video pre-analysis]

1.1 Advanced motion estimation

Spatio-temporal tube (1)
• Visual fixation time in the HVS: ~200 ms
• Next generation of HDTV => 1920x1080 in progressive mode at 50 Hz
  - temporal segment of 9 frames: 180 ms [Péchard 2007]
• Assumption: uniform motion => spatio-temporal tube
  - coherence of the motion over a perceptually significant duration
  - more homogeneous motion vector field

Spatio-temporal tube (2)
• Implementation
  - spatial down-sampling
  - temporal down-sampling: central frame => current frame, plus 4 reference frames
• The spatio-temporal tube minimizes MSEG, computed from the MSEk with k = -4, -2, +2, +4
  - each MSEk is based on the 3 YUV components
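A minimal sketch of the tube search described in "Spatio-temporal tube (2)", under stated assumptions that are not taken from the slides: the motion vector is expressed per frame, so the block is displaced by k times the vector in the reference frame at offset k; MSEG is taken as the mean of the four MSEk; only luminance is compared (the slides use the 3 YUV components); and an exhaustive search stands in for the actual search strategy. The names block_mse and tube_motion_vector are hypothetical.

```python
def block_mse(ref, cur, x, y, dx, dy, size):
    """MSE between the block of `cur` at (x, y) and the block of `ref`
    displaced by (dx, dy). Returns infinity if the block leaves the frame."""
    h, w = len(ref), len(ref[0])
    if not (0 <= x + dx and x + dx + size <= w and
            0 <= y + dy and y + dy + size <= h):
        return float("inf")
    err = 0.0
    for j in range(size):
        for i in range(size):
            d = cur[y + j][x + i] - ref[y + dy + j][x + dx + i]
            err += d * d
    return err / (size * size)


def tube_motion_vector(frames, x, y, size=4, search=2, offsets=(-4, -2, 2, 4)):
    """frames: dict mapping a temporal offset k (0 = central frame) to a
    2-D luminance array. Returns ((vx, vy), mse_g) for the block at (x, y).

    Assumption: uniform motion, so one vector (vx, vy) per frame describes
    the whole tube and the displacement at offset k is (k * vx, k * vy)."""
    cur = frames[0]
    best_v, best_err = (0, 0), float("inf")
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            # Assumed definition: MSE_G is the mean of the MSE_k.
            mse_g = sum(block_mse(frames[k], cur, x, y, k * vx, k * vy, size)
                        for k in offsets) / len(offsets)
            if mse_g < best_err:
                best_v, best_err = (vx, vy), mse_g
    return best_v, best_err
```

Because a single vector has to explain all four reference frames at once, a candidate that matches one frame by chance is penalized by the other three, which is one way to read the claim that the resulting vector field is more homogeneous.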
1.2 Spatio-temporal segmentation

Global motion
• Apparent motions are due to
  - moving objects
  - camera motion
• Motion segmentation => based on the residual motion
• Affine model
  - a1, a2, a3, a4: deformation parameters
  - tx, ty: translation parameters
  - Vx, Vy: horizontal and vertical components of each MV (spatio-temporal tube)

Global motion parameters estimation
• Motion vector fields => parameters estimation [Coudray 2005]
• Global motion estimation in 2 steps:
  1. For each MV (tube) => calculation of the derivatives
     - accumulation of the parameter hypotheses
     - localization of the main peak
  2. Accumulation of the residual MVs (tubes) => 2-D histogram (tx, ty)

Motion segmentation
• 2-D histogram of the translation parameters => residual MVs (tx, ty)
• Each histogram peak => a moving object => analysis of all the peaks
• Iterative approach
  1. Initialization => detection of the main peak => greedy approach (local gradient)
  2. Detection of the other peaks => greedy approach
[Figure: accumulation histogram with main peak and secondary peak; segmented space]

Motion segmentation: results
[Figure: motion segmentation results]
=> need for a spatial and temporal regularization

Spatio-temporal regularization
• Motion-based segmentation => some blocks are misclassified
• More criteria to improve the segmentation
  - connectivity
  - color
  - texture
  - motion
• Markovian approach

Markovian approach
• The Hammersley-Clifford theorem [Besag 1974] => equivalence between Gibbs distribution and Markov Random Field
• The optimal label configuration => minimizes a global energy function
  - E: label field
  - O: observation field
• Markovian property => U(o, e): sum of potential functions defined on cliques
  - site => spatio-temporal tube

Spatial regularization
• Spatial connectivity
  - segmented region => locally homogeneous
• Color features
  - color distributions => Bhattacharyya coefficient (discrete densities)
• Texture features
  - texture distributions => 2 spatial gradients (Sobel filters) => Bhattacharyya coefficient

Temporal regularization
• Motion features => distance between the MVs
• Temporal connectivity
  - segmented region => temporally homogeneous => segmentation map of the previous temporal segment
• Region tracking
  - criteria: color, texture, overlap
  - video object tracking

Energy minimization
• The global energy function: potential functions => weighting factors
• Sequential processing of the sites => instability stack

Results
[Figure: motion segmentation only; regularized spatio-temporal segmentation]

1.3 Visual attention modeling

Spatial saliency
• Spatial saliency based on the color contrast [Aziz 2008]
  - color transformation: YUV to HSV
• Color features influencing the visual attention
  1. Saturation contrast
  2. Intensity contrast
  3. Hue contrast
  4. Opponent contrast
  5. Warm and cold colors contrast
  6. Dominance of the warm colors
  7. Dominance of the luminance and saturation
• Spatial saliency SSP => combination of these 7 features

Temporal saliency
• Temporal saliency based on the relative motion
  - MV of the site s
  - dominant motion
  - relative motion of s
• Maximum velocity of smooth pursuit of the eye [Daly 1998]: 80°/s
=> temporal saliency ST

Spatio-temporal saliency
• Fusion of the spatial saliency and temporal saliency maps
• Observers focus on the center of the screen [Le Meur 2005] => weighting by a 2-D Gaussian function

Results
[Figure: saliency results]

Possible applications
• Video pre-analysis => information
  - moving objects segmentation, objects tracking
  - color, texture
  - salient regions
• Applications
  - advanced video coding
  - video transmission with priority (saliency maps)
  - video summarization, indexing, …
• ArchiPEG (ANR project)
  - HD MPEG-4 AVC real-time compression
  - pre-analysis video resource
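A sketch of the motion segmentation idea of section 1.2: remove the global (camera) motion from each tube's motion vector, accumulate the residual vectors in a 2-D histogram, and read each peak as a candidate moving object. Two assumptions are made here. First, the affine model is written in its usual form, Vx = a1*x + a2*y + tx and Vy = a3*x + a4*y + ty, since the slide's own equation is not reproduced in this text. Second, peaks are found with a plain local-maximum test, not the greedy local-gradient approach named on the slides. All function names are hypothetical.

```python
from collections import Counter


def affine_motion(params, x, y):
    """Global motion at position (x, y) for params = (a1, a2, a3, a4, tx, ty).
    Assumed form: Vx = a1*x + a2*y + tx, Vy = a3*x + a4*y + ty."""
    a1, a2, a3, a4, tx, ty = params
    return a1 * x + a2 * y + tx, a3 * x + a4 * y + ty


def residual_vectors(mvs, params):
    """mvs: list of ((x, y), (vx, vy)), one entry per spatio-temporal tube.
    Returns the same list with the global motion subtracted from each MV."""
    out = []
    for (x, y), (vx, vy) in mvs:
        gx, gy = affine_motion(params, x, y)
        out.append(((x, y), (vx - gx, vy - gy)))
    return out


def histogram_peaks(residuals, bin_size=1.0, min_count=2):
    """Accumulate residual MVs in a 2-D histogram and return its peaks as
    [((bin_x, bin_y), count), ...], largest first.
    A bin is a peak if it holds at least `min_count` vectors and no
    8-connected neighbour holds more (simple stand-in for the greedy search)."""
    hist = Counter((round(rx / bin_size), round(ry / bin_size))
                   for _, (rx, ry) in residuals)
    peaks = []
    for (bx, by), count in hist.items():
        if count < min_count:
            continue
        neighbours = [hist.get((bx + i, by + j), 0)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)
                      if (i, j) != (0, 0)]
        if all(count >= n for n in neighbours):
            peaks.append(((bx, by), count))
    peaks.sort(key=lambda p: -p[1])
    return peaks
```

With a well-estimated global motion, the main peak sits near (0, 0) and corresponds to the background; the secondary peaks give the residual translation of each moving object, which is the labeling that the Markovian regularization then cleans up.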
2. Applications: H.264 video coding

2.1 GOP structure adaptation

GOP structure
• Three kinds of frames: I, P, B
  - a GOP begins with an I frame => intra coded
  - P frames at regular intervals => predicted
  - B frames between P frames => bi-predicted
• Fixed interval between I frames
  - not adapted to scene changes and temporal variations of the video => more bits
  => dynamic GOP size => irregular I-frame insertion
• Typically: number of B frames = 1 or 2 => good trade-off between bitrate and quality
  - low motion or camera panning => increase the number of B frames

B frames adaptation (1)
• Analysis of the video sequences => x264 encoder => different fixed numbers of B frames: 0, 1, 2, 3

  Video sequence            Optimal GOP configuration
  New Mobile and Calendar   2 B frames
  Night                     2 B frames
  Knightshields             2 B frames
  Crew                      1 B frame
  Park run                  1 B frame
  Park joy                  no B frame
  Tractor                   no B frame
  Umbrella                  no B frame

• Optimal number of B frames => content dependent => classify videos according to their content

B frames adaptation (2)
• Spatio-temporal characterization => 2 indices to evaluate the spatio-temporal activity
  - IT: temporal activity => MVs
  - IS: spatial activity => MSEG
• Computed for each temporal segment and for the entire sequence

B frames adaptation (3)
• Classification space: function of IT and IS
  - class Ci => i B frames between P-P or I-P frames
  - IT constant between P-P or I-P frames
  - same rule for IS

GOP size adaptation (1)
• Change detection within a video shot
  - high motion => significant changes => reduce the interval
  - low motion => little variation => increase the interval
  - mid-range motion => classical approach => fixed GOP size
• 2 thresholds to detect critical changes
  - sh => high motion
  - sb => low motion

GOP size adaptation (2)
• Analysis of the evolution of IT => 3 cases
[Figure: mid-range motion; high motion; low motion]

Performances
• 8 video sequences
• 4 different bitrates => defined by an experts group
• Comparison between
  - x264 encoder: GOP size = 25, 2 B frames
  - a modified version => GOP structure adaptation

Results
• Rate-distortion (PSNR) [Bjontegaard 2001]

  Video sequence            Bitrate gain (%)   PSNR gain (dB)
  New Mobile and Calendar    9.15               0.32
  Night                      2.45               0.09
  Knightshields              1.68               0.06
  Park run                  -0.1               -0.01
  Umbrella                   4.11               0.13
  Park joy                   2.83               0.09
  Crew                       4.5                0.13
  Tractor                   10.94               0.48
  Average                    4.45               0.16

Subjective tests
• Setup
  - display => resolution 1920x1080
  - normalized room [BT.500-11]
  - ~30 naïve observers
  - 72 video sequences (72 = 8x4x2 + 8)
• Methodology => ACR
  - for each sequence, observers have to assess the quality

Results
• QGOP: MOS => modified coder
• Qx264: MOS => x264 coder

  Video sequence            QGOP - Qx264
  New Mobile and Calendar    0.31
  Night                     -0.02
  Knightshields              0.24
  Park run                   0.04
  Umbrella                  -0.09
  Park joy                   0.14
  Crew                       0.48
  Tractor                    0.33
  Average                    0.18

• Sequences with a high IT value => high motion => GOP structure adaptation

2.2 Adaptive quantization

Adaptive quantization
• Objective => control the distribution of the bit resources => saliency maps => increase the perceived visual quality
• Modification of the saliency maps => quantization and morphological filtering
• Modification of the coder

Results (1)
• Rate-distortion (PSNR) [Bjontegaard 2001]

                            Entire sequence                     Region of interest
  Video sequence            Bitrate gain (%)   PSNR gain (dB)   Bitrate gain (%)   PSNR gain (dB)
  New Mobile and Calendar   -2.49              -0.09            -0.67              -0.03
  Night                     -3.38              -0.12            -0.39              -0.02
  Knightshields             -3.02              -0.12            -0.84              -0.03
  Park run                  -0.81              -0.03             0.25               0.01
  Umbrella                   2.34               0.07             4.17               0.14
  Park joy                   2.68               0.09             4.42               0.14
  Crew                      -0.36              -0.01             2.74               0.09
  Tractor                   10.94               0.05             4.35               0.20
  Average                   -0.52              -0.02             1.75               0.06

Subjective assessments
• Results
  - QQA: MOS => modified coder (adaptive quantization)
  - Qx264: MOS => x264 coder

  Video sequence            QQA - Qx264
  New Mobile and Calendar   -0.13
  Night                      0.09
  Knightshields             -0.02
  Park run                  -0.06
  Umbrella                   0.06
  Park joy                   0.17
  Crew                      -0.06
  Tractor                    0.04
  Average                    0.04

• No specific content is clearly suitable
• Unsuitable for coding and broadcasting of HDTV at high bitrate => overhead, linear law?
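The slides do not give the adaptation law used for the adaptive quantization; they only hint that it is linear. The sketch below is therefore an illustration under assumptions, not the method of the thesis: one saliency value in [0, 1] per macroblock, a uniform quantization of the saliency into a few levels, and a linear mapping from level to QP offset, so that salient blocks get a lower QP and non-salient blocks a higher one. The morphological filtering step is left out. The function name and its parameters (base_qp, max_delta, levels) are hypothetical; the only fixed fact is the H.264 QP range of 0 to 51.

```python
def qp_map(saliency, base_qp=30, max_delta=4, levels=4):
    """Map a per-macroblock saliency map to per-macroblock QP values.

    saliency: 2-D list of values in [0, 1], one per macroblock.
    levels:   number of quantization levels of the saliency (>= 2).
    Linear law (assumption): the least salient level gets +max_delta,
    the most salient level gets -max_delta."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    out = []
    for row in saliency:
        out_row = []
        for s in row:
            s = min(max(s, 0.0), 1.0)                   # clamp the saliency
            level = min(int(s * levels), levels - 1)    # quantize to 0..levels-1
            t = level / (levels - 1)                    # normalized level in [0, 1]
            delta = round(max_delta * (1 - 2 * t))      # linear QP offset
            out_row.append(min(max(base_qp + delta, 0), 51))  # H.264 QP range
        out.append(out_row)
    return out
```

The symmetric offset is a design choice that keeps the average QP roughly unchanged when salient and non-salient areas are balanced; replacing the line that computes delta is where a non-linear law, as considered in the conclusion, would go.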
Conclusion

Conclusion (1)
• Video pre-analysis
  - spatio-temporal segmentation => detection of moving objects => objects tracking
  - visual attention modeling => saliency maps
• Applications
  - advanced video coding
  - video transmission with priority based on the saliency maps [Boulos 2010]
  - video summarization, indexing
  - …

Conclusion (2)
• Applications of the video pre-analysis
  - GOP structure adaptation
    - dynamic variation of the B frames => temporal segment classification: IT and IS
    - GOP size adaptation => I frame insertion => change detection: IT
  - Adaptive quantization based on the saliency maps

Conclusion (3)
• Subjective quality assessment tests
  - GOP structure adaptation => no significant differences => +0.18 (on a scale of 1 to 5) => well suited for sequences with high motion
  - Adaptive quantization => no content is clearly better suited => seems unsuitable for coding and broadcasting of HDTV at high bitrate … the adaptation law could be modified …

Perspectives
• Better performance evaluation of our visual attention model => eye-tracking experiments
• Psychophysical experiments to optimize the model parameters => improve the fusion process [Marat 2010]
• Add high-level visual information => face, flesh hue, …

Thank you. Questions?
