TIME-CONSTANT HISTOGRAM MATCHING FOR COLOUR COMPENSATION OF MULTI-VIEW VIDEO SEQUENCES

Ulrich Fecker, Marcus Barkowsky, and André Kaup

Multimedia Communications and Signal Processing
University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany
{fecker, barkowsky, kaup}@LNT.de

ABSTRACT

Significant advances have recently been made in the coding of video data recorded with multiple cameras. However, luminance and chrominance variations between the camera views may deteriorate the performance of multi-view video codecs and renderers. In this paper, the usage of time-constant histogram matching is proposed to compensate these differences in a pre-filtering step. It is shown that the usage of histogram matching prior to multi-view video coding leads to significant gains for the coding efficiency of both the luminance and the chrominance components. Histogram matching can also be useful for image-based rendering to avoid incorrect illumination and colour reproduction resulting from miscalibrations in the recording setup. It can be shown that the algorithm is further improved by additionally using RGB colour conversion.

Index Terms: Multi-view video, video coding, video signal processing, image-based rendering

1. INTRODUCTION

Multi-view video is a technique where an object or a scene is recorded using a setup of several synchronous cameras from different positions. The resulting multi-view video sequence consists of multiple video streams, one for each camera view. Such a dataset can also be referred to as dynamic light field. Applications for multi-view video techniques include three-dimensional television (3D TV) and free-viewpoint television (FTV), where the viewer is able to watch the scene individually from his desired viewpoint [1].

Multi-view video involves huge amounts of data, for which efficient compression is necessary.
In the recent past, several coding schemes have been proposed which exploit not only the temporal correlation between subsequent frames but also the spatial correlation between frames from neighbouring cameras. Such a coding scheme, based on hierarchical B pictures, is currently being standardised by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG [2, 3].

When multi-view video data is recorded, significant variations can often be observed between the luminance and chrominance components of the different camera views. The aim of this work is therefore to compensate these differences in a pre-filtering step.

Based on an idea outlined by Hekstra et al. in [4], the authors suggested in [5] to use histogram matching prior to encoding multi-view data. One camera view, close to the centre of the camera setup, is chosen as a reference. All other camera views are corrected in such a way that their cumulative histograms fit the cumulative histogram of the reference view. This correction is done individually for each time step of the sequence.

A statistical evaluation based on the method described in [6] showed that in multi-view video coding, the spatial prediction efficiency across the camera views can be improved when histogram matching is used [5]. However, applying the algorithm does not lead to a coding gain. This can be explained by the fact that the algorithm is applied to the whole sequence on a frame-by-frame basis. This leads to a good fit of the different camera views at each particular point in time, but it may also lead to variations between subsequent time steps, which may deteriorate the coding performance and might also affect the visual quality. In addition, the algorithm works in the YCbCr colour space because this colour space is used in common video codecs. However, the cameras causing the miscalibrations usually operate in the RGB colour space.
For this reason, the histogram matching algorithm is extended and improved in this paper to overcome these problems. A time-constant algorithm is presented which avoids variations between subsequent time steps. Furthermore, RGB colour conversion is considered to achieve a better adaptation to the distortions introduced by the camera setup.

One aim is to achieve a higher coding efficiency for multi-view video. The performance of the algorithm is therefore evaluated using the Joint Multiview Video Model (JMVM) reference software. In addition, histogram matching may also be useful for image-based rendering, as it frees the rendering result from incorrect illumination and colour reproduction.

2. TIME-CONSTANT HISTOGRAM MATCHING

In this section, it is explained how the luminance and chrominance of a distorted sequence are adapted to a reference sequence using time-constant histogram matching. Each camera view is adapted separately to the reference view in the centre of the camera setup.

The calculations can be done in any colour space (e.g. YCbCr or RGB) and are applied individually to each of the three components. Here, the procedure is shown by way of example for the YCbCr colour space and the luminance component Y. For the Cb and Cr components, the procedure is analogous.

$y_R[m, n, t]$ denotes the amplitude of the luminance image at time step $t$ of the reference sequence. As a first step, the histogram of the reference sequence is calculated as follows:

$$h_R[v] = \frac{1}{\ell \cdot w \cdot h} \sum_{t=0}^{\ell-1} \sum_{m=0}^{w-1} \sum_{n=0}^{h-1} \delta\big[v, y_R[m, n, t]\big] \quad \text{with} \quad \delta[a, b] = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{else} \end{cases} \tag{1}$$

Fig. 1. Details of the mapping algorithm shown in a section of the cumulative histogram (horizontal axis: amplitude of the luminance signal; vertical axis: sum of occurrences; curves: $c_D[i]$ and $c_R[i]$).

In Eq. (1), $w$ denotes the width and $h$ the height of the image. The histogram is summed up over all frames of the sequence, and the length of the sequence is denoted by $\ell$.
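As an illustration of Eq. (1), the following is a minimal sketch of a histogram pooled over all frames of one component. It is not the authors' implementation: the function name `pooled_histogram`, the nested-list frame layout and the toy data are assumptions made for this example only.

```python
from typing import List, Sequence

# Assumed layout: one frame of a single component is a list of rows,
# each row a list of integer amplitudes in the range 0 .. levels - 1.
Frame = Sequence[Sequence[int]]


def pooled_histogram(frames: Sequence[Frame], levels: int = 256) -> List[float]:
    """Normalised histogram of one component, pooled over all frames (cf. Eq. 1)."""
    counts = [0] * levels
    total = 0
    for frame in frames:          # sum over time steps t
        for row in frame:         # sum over rows n
            for value in row:     # sum over columns m
                counts[value] += 1
                total += 1
    if total == 0:
        raise ValueError("at least one sample is required")
    # Division by the number of samples corresponds to 1 / (l * w * h).
    return [c / total for c in counts]


# Hypothetical toy data: two 2x2 frames with four amplitude levels.
frames = [
    [[0, 1], [1, 3]],
    [[1, 1], [3, 3]],
]
hist = pooled_histogram(frames, levels=4)
print(hist)
```

Because the counts are accumulated over every frame before normalisation, a single histogram describes the whole sequence (or interval), which is what makes the resulting correction constant over time.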
If the histogram is to be calculated based on a part of the sequence only, $\ell$ denotes the length of this interval.

Next, the cumulative histogram $c_R[v]$ of the reference sequence is created:

$$c_R[v] = \sum_{i=0}^{v} h_R[i] \tag{2}$$

The histogram $h_D[v]$ and the cumulative histogram $c_D[v]$ of the distorted sequence are calculated in the same manner.

Based on the cumulative histograms $c_R[v]$ and $c_D[v]$, a mapping function $M$ is derived. The mapping is found by matching the number of occurrences in the distorted image to the number of occurrences in the reference image:

$$M[v] = u \quad \text{with} \quad c_R[u] \le c_D[v] < c_R[u + 1] \tag{3}$$

In other words, $M[v]$ is the largest amplitude $u$ for which $c_R[u]$ does not exceed $c_D[v]$. This process is illustrated in Fig. 1. An example for a resulting mapping function is shown in Fig. 2. The mapping may then be applied to the distorted sequence $y_D[m, n, t]$, resulting in the corrected sequence $y_C[m, n, t]$:

$$y_C[m, n, t] = M\big[y_D[m, n, t]\big] \tag{4}$$

Please note that a single mapping function is created for the whole sequence. This mapping function is then applied to all frames in the distorted view. The algorithm is however applied separately for each camera view of a multi-view video sequence except for the centre view, which serves as a correction basis for all other views. If $N$ is the number of camera views, $N - 1$ mapping functions are therefore created.

Fig. 2. Example for a mapping function in the RGB colour space and its inverse (Race1, view 0, R component; horizontal axis: $v$; vertical axis: $M[v]$; curves: mapping function and inverse mapping function).

Depending on the application, luminance and chrominance compensation may also be helpful for the renderer and does therefore not need to be reversed. This means that no additional data needs to be transmitted. If necessary, the compensation could however be approximately reversed by applying the inverse of the mapping function (see Fig. 2 for an example) to the decoded data. The amount of additional data in this case is limited, as only one mapping function per corrected view is involved for the whole sequence.

3.
RGB COLOUR SPACE

The original algorithm operates in the YCbCr colour space, correcting the Y, Cb and Cr components individually. This choice is based on the fact that common video codecs operate in this colour space. In addition, the test sequences used are mostly stored in the YCbCr colour space with 4:2:0 colour subsampling.

However, the cameras originally recording the sequences will in most cases operate in the RGB colour space. Therefore, the ability of the correction algorithm to compensate camera miscalibrations is improved if it works in this colour space.

If the video data is stored as YCbCr sequences, a conversion to RGB needs to be done. As the resolution of the Cb and Cr components is reduced by a factor of 2 horizontally and vertically, their original resolution is additionally restored by bilinear interpolation. After the conversion, the R, G and B components are processed using the histogram matching algorithm. After that, the corrected sequences are converted back to the YCbCr colour space to be passed to the multi-view encoder.

Although the data is converted from YCbCr to RGB and back in floating-point arithmetic, a quantisation error is still introduced by the histogram matching algorithm. This effect is however rather small and hardly affects the quality of the video sequences.

4. CODING RESULTS

In this section, the effect of the described algorithm on the performance of multi-view video coding is analysed. For that, several test sequences have been compensated using histogram matching and have thereafter been coded using the JMVM reference software (version 2.4).

The PSNR values have been calculated based on the difference between the encoder input and the decoder output. The reference therefore is the compensated sequence when histogram matching is used and the original sequence when the algorithm is not used.

The algorithm can be applied either in the YCbCr colour space or in the RGB colour space, as described in the previous section. Coding experiments clearly showed that the performance is better when RGB colour conversion is used. As an example, the coding performance is shown in Figs. 3 and 4 for the “Ballroom” sequence using both colour spaces.

Fig. 3. Coding performance using histogram matching in the YCbCr colour space (Ballroom, 8 views). Axes: PSNR [dB] versus bit rate [kbit/s]; curves: original and histogram-matched sequences for the Y, Cb and Cr components.

Fig. 4. Coding performance using histogram matching in the RGB colour space (Ballroom, 8 views). Axes and curves as in Fig. 3.

Figures 5 to 8 show the coding performance for several other test sequences when time-constant histogram matching with RGB colour conversion is used compared to the performance without histogram matching.

Fig. 5. Coding performance using histogram matching in the RGB colour space (Race1, 8 views). Axes and curves as in Fig. 3.

As can be seen from the plots, the coding performance is in most cases improved for the luminance as well as the chrominance components when histogram matching is applied. The PSNR of the Y component is typically about 0.2 dB to 0.7 dB higher using histogram matching. For the Cb and Cr components, even larger gains of up to 1.9 dB can be observed. For the “Breakdancers” and “Uli” sequences, the PSNR of the luminance component stays the same or is slightly deteriorated, but the PSNR of both chrominance components is improved.

5.
SUMMARY AND CONCLUSIONS

A time-constant histogram matching algorithm was proposed for luminance and chrominance compensation of multi-view video sequences. The algorithm was described, and it was further improved by additionally using RGB colour conversion. For most of the tested sequences, the multi-view coding performance could be increased by up to 0.7 dB for the luminance component and by up to 1.9 dB for the chrominance components.

As the distorted sequence and the reference sequence originate from cameras at different positions, they do not show exactly the same content. Instead, there is a certain displacement between the two camera views. One might therefore think of calculating the histograms based on the overlapping area only. This has also been tested, using a phase correlation algorithm [7] to determine the global disparity between the two sequences before the histogram calculation is performed. However, using global disparity compensation did not further improve the coding performance for the particular test sequences used. Nevertheless, it might lead to an improvement for sequences with larger disparities between the camera views.

If the whole sequence is not available for filtering, e.g. during a real-time transmission, time-constant histogram matching could be applied individually to small parts of the sequence, such as single GOPs or groups of subsequent GOPs. Detecting scene changes and restarting the histogram calculation after each scene change might also be beneficial. If the video data is recorded using a fixed camera setup, mapping functions for each camera could also be generated in advance during a calibration step and could then be applied in real time during the recording and transmission of the multi-view video.

First results of a recent investigation indicate that a further gain is possible when histogram matching is combined with block-based illumination compensation techniques modifying the coder and decoder themselves, especially the approach implemented in the JMVM reference software [8].

Fig. 6. Coding performance using histogram matching in the RGB colour space (Crowd, 5 views). Axes: PSNR [dB] versus bit rate [kbit/s]; curves: original and histogram-matched sequences for the Y, Cb and Cr components.

Fig. 7. Coding performance using histogram matching in the RGB colour space (Breakdancers, 8 views). Axes and curves as in Fig. 6.

Fig. 8. Coding performance using histogram matching in the RGB colour space (Uli, 8 views). Axes and curves as in Fig. 6.

6. ACKNOWLEDGEMENTS

This work was funded by the German Research Foundation (DFG) within the Collaborative Research Centre “Model-based analysis and visualisation of complex scenes and sensor data” under grant SFB 603/TP C8. Only the authors are responsible for the content.

The authors would like to thank Peter Prokein for his valuable assistance with the implementation of the algorithm and the simulations. Furthermore, the authors express their thanks for providing test sequences to KDDI Corporation, the Interactive Visual Media Group at Microsoft Research, Mitsubishi Electric Research Laboratories (MERL) and Fraunhofer HHI.

7. REFERENCES

[1] M. Tanimoto, “Free viewpoint television - FTV,” in Picture Coding Symposium (PCS 2004), San Francisco, CA, USA, Dec. 2004.

[2] A. Vetro, P. Pandit, H. Kimata, and A. Smolic, “Joint multiview video model (JMVM) 3.0,” in Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Document JVT-V207, Marrakech, Morocco, Jan. 2007.

[3] K. Mueller, P. Merkle, H. Schwarz, T. Hinz, A. Smolic, T. Oelbaum, and T. Wiegand, “Multi-view video coding based on H.264/MPEG4-AVC using hierarchical B pictures,” in Picture Coding Symposium (PCS 2006), Beijing, China, Apr. 2006.

[4] A. P. Hekstra, J. G. Beerends, D. Ledermann, F. E. de Caluwe, S. Kohler, R. H. Koenen, S. Rihs, M. Ehrsam, and D. Schlauss, “PVQM - a perceptual video quality measure,” Signal Processing: Image Communication, vol. 17, no. 10, pp. 781–798, Nov. 2002.

[5] U. Fecker, M. Barkowsky, and A. Kaup, “Improving the prediction efficiency for multi-view video coding using histogram matching,” in Picture Coding Symposium (PCS 2006), Beijing, China, Apr. 2006.

[6] A. Kaup and U. Fecker, “Analysis of multi-reference block matching for multi-view video coding,” in Proc. 7th Workshop Digital Broadcasting, Erlangen, Germany, Sept. 2006, pp. 33–39.

[7] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications, Prentice Hall, 2001.

[8] Y.-L. Lee, J.-H. Hur, Y.-K. Lee, K.-H. Han, S. Cho, N. Hur, J. Kim, J.-H. Kim, P.-L. Lai, A. Ortega, Y. Su, P. Yin, and C. Gomila, “CE11: illumination compensation,” in Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Document JVT-U052r2, Hangzhou, China, Oct. 2006.