TIME-CONSTANT HISTOGRAM MATCHING FOR COLOUR
COMPENSATION OF MULTI-VIEW VIDEO SEQUENCES
Ulrich Fecker, Marcus Barkowsky, and André Kaup
Multimedia Communications and Signal Processing
University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany
{fecker, barkowsky, kaup}@LNT.de
ABSTRACT
Significant advances have recently been made in the coding of
video data recorded with multiple cameras. However, luminance and chrominance variations between the camera views
may deteriorate the performance of multi-view video codecs
and renderers. In this paper, the usage of time-constant histogram matching is proposed to compensate these differences
in a pre-filtering step. It is shown that applying histogram
matching prior to multi-view video coding leads to significant
gains for the coding efficiency of both the luminance and the
chrominance components. Histogram matching can also be
useful for image-based rendering to avoid incorrect illumination and colour reproduction resulting from miscalibrations in
the recording setup. It can be shown that the algorithm is further improved by additionally using RGB colour conversion.
Index Terms— Multi-view video, video coding, video
signal processing, image-based rendering
1. INTRODUCTION
Multi-view video is a technique where an object or a scene is
recorded using a setup of several synchronous cameras from
different positions. The resulting multi-view video sequence
consists of multiple video streams, one for each camera view.
Such a dataset can also be referred to as a dynamic light field.
Applications for multi-view video techniques include three-dimensional television (3D TV) and free-viewpoint television (FTV), where the viewer is able to watch the scene individually from his desired viewpoint [1].
Multi-view video involves huge amounts of data, for
which efficient compression is necessary. In the recent past,
several coding schemes have been proposed which exploit not
only the temporal correlation between subsequent frames but
also the spatial correlation between frames from neighbouring cameras. Such a coding scheme, based on hierarchical
B pictures, is currently being standardised by the Joint Video
Team (JVT) of ISO/IEC MPEG and ITU-T VCEG [2, 3].
When multi-view video data is recorded, significant variations can often be observed between the luminance and
chrominance components of the different camera views. The
aim of this work is therefore to compensate these differences
in a pre-filtering step. Based on an idea outlined by Hekstra
et al. in [4], the authors suggested in [5] using histogram
matching prior to encoding multi-view data. One camera
view — close to the centre of the camera setup — is chosen as a reference. All other camera views are corrected in
such a way that their cumulative histograms fit the cumulative histogram of the reference view. This correction is done
individually for each time step of the sequence.
A statistical evaluation based on the method described in
[6] showed that in multi-view video coding, the spatial prediction efficiency across the camera views can be improved when
histogram matching is used [5]. However, applying the algorithm does not lead to a coding gain. This can be explained by
the fact that the algorithm is applied to the whole sequence
on a frame-by-frame basis. This leads to a good fit of the
different camera views at each particular point in time, but
it may also lead to variations between subsequent time steps,
which may deteriorate the coding performance and might also
affect the visual quality. In addition, the algorithm works in
the YCbCr colour space because this colour space is used in
common video codecs. However, the cameras causing the
miscalibrations usually operate in the RGB colour space.
In this paper, the histogram matching algorithm is therefore extended and improved to overcome these problems.
A time-constant algorithm is presented which avoids variations between subsequent time steps. Furthermore, RGB
colour conversion is considered to achieve a better adaptation
to the distortions introduced by the camera setup.
One aim is to achieve a higher coding efficiency for multi-view video. The performance of the algorithm is therefore
evaluated using the Joint Multiview Video Model (JMVM)
reference software. In addition, histogram matching may also
be useful for image-based rendering, as it frees the rendering
result from incorrect illumination and colour reproduction.
2. TIME-CONSTANT HISTOGRAM MATCHING
In this section, it is explained how the luminance and chrominance of a distorted sequence are adapted to a reference se-
quence using time-constant histogram matching. Each camera view is adapted separately to the reference view in the
centre of the camera setup. The calculations can be done in
any colour space (e.g. YCbCr or RGB) and are applied individually to each of the three components. As an example, the procedure is shown here for the YCbCr colour space and
the luminance component Y. For the Cb and Cr components,
the procedure is done in an analogous manner.
$y_R[m, n, t]$ denotes the amplitude of the luminance image
at time step t of the reference sequence. As a first step, the
histogram of the reference sequence is calculated as follows:
$$h_R[v] = \frac{1}{\ell \cdot w \cdot h} \sum_{t=0}^{\ell-1} \sum_{m=0}^{h-1} \sum_{n=0}^{w-1} \delta\bigl[v, y_R[m, n, t]\bigr] \quad (1)$$

$$\text{with } \delta[a, b] = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{else} \end{cases}$$

Fig. 1. Details of the mapping algorithm shown in a section of the cumulative histogram (sum of occurrences plotted over the amplitude of the luminance signal; the mapping $M[v] = u$ is found where $c_D[v]$ lies between $c_R[u]$ and $c_R[u+1]$)
In this equation, w denotes the width and h the height of
the image. The histogram is summed up over all frames of
the sequence, and the length of the sequence is denoted by
ℓ. If the histogram is to be calculated based on only a part of the sequence, ℓ denotes the length of this interval. Next,
the cumulative histogram cR [v] of the reference sequence is
created:
$$c_R[v] = \sum_{i=0}^{v} h_R[i] \quad (2)$$
The histogram hD [v] and the cumulative histogram cD [v]
of the distorted sequence are calculated in the same manner.
Based on the cumulative histograms $c_R[v]$ and $c_D[v]$, a mapping function $M$ is derived. We find the mapping by matching
the number of occurrences in the distorted image to the number of occurrences in the reference image:
$$M[v] = u \quad \text{with} \quad c_R[u] \le c_D[v] < c_R[u+1] \quad (3)$$
This process is illustrated in Fig. 1. An example of a resulting mapping function is shown in Fig. 2. The mapping may then be applied to the distorted sequence $y_D[m, n, t]$, resulting in the corrected sequence $y_C[m, n, t]$:
$$y_C[m, n, t] = M\bigl[y_D[m, n, t]\bigr] \quad (4)$$
Please note that a single mapping function is created for
the whole sequence. This mapping function is then applied to
all frames in the distorted view. The algorithm is however applied separately for each camera view of a multi-view video
sequence except for the centre view, which serves as a correction basis for all other views. If N is the number of camera
views, N − 1 mapping functions are therefore created.
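As an illustration, the following Python sketch computes and applies the time-constant mapping according to Eqs. (1) to (4) for one 8-bit component of one view; the array layout and function names are assumptions of this example and are not taken from the authors' implementation.

```python
import numpy as np

def cumulative_histogram(frames: np.ndarray) -> np.ndarray:
    """Normalised cumulative histogram over all frames of one
    8-bit component (Eqs. 1 and 2); frames has shape (ell, h, w)."""
    hist = np.bincount(frames.ravel(), minlength=256).astype(np.float64)
    hist /= frames.size                 # division by ell * w * h
    return np.cumsum(hist)              # c[v] = sum of h[i] for i <= v

def derive_mapping(c_ref: np.ndarray, c_dist: np.ndarray) -> np.ndarray:
    """Mapping M[v] = u with c_R[u] <= c_D[v] < c_R[u+1] (Eq. 3)."""
    # searchsorted locates, for each value c_D[v], the interval of the
    # monotone cumulative histogram c_R that fulfils the condition
    u = np.searchsorted(c_ref, c_dist, side='right') - 1
    return np.clip(u, 0, 255).astype(np.uint8)

def match_view(ref: np.ndarray, dist: np.ndarray) -> np.ndarray:
    """Correct one component of a distorted view against the reference
    view using a single, time-constant mapping (Eq. 4)."""
    mapping = derive_mapping(cumulative_histogram(ref),
                             cumulative_histogram(dist))
    return mapping[dist]                # lookup applied to every frame
```

For a multi-view sequence, match_view would be called once per non-reference view and component, yielding the N − 1 mapping functions mentioned above.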
Depending on the application, luminance and chrominance compensation may also be helpful for the renderer and therefore does not need to be reversed. This means that no
Fig. 2. Example of a mapping function in the RGB colour space and its inverse (Race1, view 0, R component; mapping value $M[v]$ plotted over the amplitude $v$)
additional data needs to be transmitted. If necessary, it could
however be approximately reversed by applying the inverse
of the mapping function (see Fig. 2 for an example) to the
decoded data. The amount of additional data in this case is
limited, as only one mapping function per corrected view is
involved for the whole sequence.
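Since the mapping function is in general not injective, its inverse can only be approximated. Below is a hypothetical sketch of one possible pseudo-inverse, assuming the 256-entry mapping array from the sketch above; resolving ties by the mean pre-image is an assumption of this example and is not specified in the paper.

```python
import numpy as np

def invert_mapping(mapping: np.ndarray) -> np.ndarray:
    """Approximate inverse of a 256-entry mapping function."""
    inverse = np.zeros(256, dtype=np.uint8)
    last = 0
    for u in range(256):
        pre = np.flatnonzero(mapping == u)  # all amplitudes v with M[v] = u
        if pre.size > 0:
            last = int(round(pre.mean()))   # representative pre-image
        inverse[u] = last                   # reuse the last value for gaps
    return inverse
```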
3. RGB COLOUR SPACE
The original algorithm operates in the YCbCr colour space,
correcting the Y, Cb and Cr components individually. This
choice is based on the fact that common video codecs operate in this colour space. In addition, the test sequences used
are mostly stored in the YCbCr colour space with a colour
subsampling according to 4:2:0. However, the cameras originally recording the sequences will in most cases operate in
the RGB colour space. Therefore, the ability of the correction
algorithm to compensate camera miscalibrations is improved
if it works in this colour space.
If the video data is stored as YCbCr sequences, a conversion to RGB needs to be performed. As the resolution of the Cb and Cr components is reduced by a factor of 2 horizontally and vertically, their original resolution is first restored
by bilinear interpolation. After the conversion, the R, G and
B components are processed using the histogram matching algorithm. After that, the corrected sequences are converted back to the YCbCr colour space to be passed to the multi-view encoder. Although the data is converted from YCbCr to RGB and back in floating-point arithmetic, a quantisation error is still introduced by the histogram matching algorithm. This effect is, however, rather small and hardly affects the quality of the video sequences.

Fig. 3. Coding performance using histogram matching in the YCbCr colour space (Ballroom, 8 views): PSNR [dB] over bit rate [kbit/s] for the Y, Cb and Cr components, with and without histogram matching

Fig. 4. Coding performance using histogram matching in the RGB colour space (Ballroom, 8 views): PSNR [dB] over bit rate [kbit/s] for the Y, Cb and Cr components, with and without histogram matching
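A minimal sketch of the conversion step described above for a single frame, assuming ITU-R BT.601 limited-range YCbCr with 4:2:0 subsampling; the exact conversion matrix and interpolation routine are assumptions of this example, as the paper does not specify them.

```python
import numpy as np
from scipy.ndimage import zoom  # order=1 gives bilinear interpolation

def ycbcr420_to_rgb(y: np.ndarray, cb: np.ndarray, cr: np.ndarray) -> np.ndarray:
    """Restore full-resolution chroma bilinearly, then convert one
    frame to RGB in floating-point arithmetic."""
    cb_full = zoom(cb.astype(np.float64), 2, order=1)  # factor 2 upsampling
    cr_full = zoom(cr.astype(np.float64), 2, order=1)
    y_lin = 1.164 * (y.astype(np.float64) - 16.0)
    r = y_lin + 1.596 * (cr_full - 128.0)
    g = y_lin - 0.813 * (cr_full - 128.0) - 0.391 * (cb_full - 128.0)
    b = y_lin + 2.018 * (cb_full - 128.0)
    # kept in floating point; the histogram matching step then
    # quantises the corrected values back to 8 bit
    return np.stack([r, g, b])
```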
4. CODING RESULTS
In this section, the effect of the described algorithm on the
performance of multi-view video coding is analysed. For
that, several test sequences have been compensated using histogram matching and have thereafter been coded using the
JMVM reference software (version 2.4). The PSNR values
have been calculated based on the difference between the encoder input and the decoder output. The reference is therefore
the compensated sequence when histogram matching is used
and the original sequence when the algorithm is not used.
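For reference, a minimal sketch of this PSNR measure for one 8-bit component, with the encoder input serving as the reference as described above:

```python
import numpy as np

def psnr(encoder_input: np.ndarray, decoder_output: np.ndarray) -> float:
    """PSNR in dB between encoder input and decoder output."""
    diff = encoder_input.astype(np.float64) - decoder_output.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0.0 else 10.0 * np.log10(255.0 ** 2 / mse)
```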
The algorithm can be applied either in the YCbCr colour space or in the RGB colour space, as described in the previous
section. Coding experiments clearly showed that the performance is better when RGB colour conversion is used. As an
example, the coding performance is shown in Fig. 3 and 4 for
the “Ballroom” sequence using both colour spaces.
Figures 5 to 8 show the coding performance for several
other test sequences when time-constant histogram matching
with RGB colour conversion is used compared to the performance without histogram matching.
As can be seen from the plots, the coding performance
is in most cases improved for the luminance as well as the
chrominance components when histogram matching is applied. The PSNR of the Y component is typically about
0.2 dB to 0.7 dB higher using histogram matching. For the
Cb and Cr components, even larger gains of up to 1.9 dB can
be observed. For the “Breakdancers” and “Uli” sequences, the PSNR of the luminance component stays the same or is slightly reduced, but the PSNR of both chrominance components is improved.
Fig. 5. Coding performance using histogram matching in the RGB colour space (Race1, 8 views): PSNR [dB] over bit rate [kbit/s]
5. SUMMARY AND CONCLUSIONS
A time-constant histogram matching algorithm was proposed
for luminance and chrominance compensation of multi-view
video sequences. The algorithm was described, and it was shown that it can be further improved by additionally using RGB colour conversion. For most of the tested sequences, the multi-view
coding performance could be increased by up to 0.7 dB for
the luminance component and by up to 1.9 dB for the chrominance components.
As the distorted sequence and the reference sequence
originate from cameras at different positions, they do not
show exactly the same content. Instead, there is a certain displacement between the two camera views. One might therefore think of calculating the histograms based on the overlapping area only. This has also been tested, using a phase correlation algorithm [7] to determine the global disparity between
the two sequences before the histogram calculation is performed. However, using global disparity compensation did
not further improve the coding performance for the particular
test sequences used. Nevertheless, it might lead to an improvement for sequences with larger disparities between the
camera views.
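A hedged sketch of such a global disparity estimate by phase correlation, following the standard formulation (cf. [7]); windowing and subpixel refinement are omitted, and the implementation details are assumptions of this example.

```python
import numpy as np

def global_disparity(ref: np.ndarray, dist: np.ndarray):
    """Return the global (dy, dx) shift that best aligns dist to ref."""
    f_ref = np.fft.fft2(ref.astype(np.float64))
    f_dist = np.fft.fft2(dist.astype(np.float64))
    cross = f_ref * np.conj(f_dist)
    cross /= np.abs(cross) + 1e-12     # normalised cross-power spectrum
    corr = np.fft.ifft2(cross).real    # phase correlation surface
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks in the upper half of each axis correspond to negative shifts
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx
```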
If the whole sequence is not available for filtering, e.g. during real-time transmission, time-constant histogram matching could be applied individually to small parts of the sequence, such as single GOPs or groups of subsequent GOPs.
Fig. 6. Coding performance using histogram matching in the RGB colour space (Crowd, 5 views): PSNR [dB] over bit rate [kbit/s]

Fig. 7. Coding performance using histogram matching in the RGB colour space (Breakdancers, 8 views): PSNR [dB] over bit rate [kbit/s]

Fig. 8. Coding performance using histogram matching in the RGB colour space (Uli, 8 views): PSNR [dB] over bit rate [kbit/s]
Detecting scene changes and restarting the histogram
calculation after each scene change might also be beneficial.
If the video data is recorded using a fixed camera setup, mapping functions for each camera could also be generated in advance during a calibration step and could then be applied in
real time during the recording and transmission of the multi-view video.
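An illustrative sketch of the GOP-wise variant mentioned above, reusing cumulative_histogram and derive_mapping from the sketch in Section 2; the GOP size is an assumed parameter, not specified in the paper.

```python
import numpy as np

def match_view_per_gop(ref: np.ndarray, dist: np.ndarray,
                       gop_size: int = 16) -> np.ndarray:
    """Apply histogram matching with one mapping per GOP, i.e.
    time-constant within each GOP only."""
    corrected = np.empty_like(dist)
    for start in range(0, dist.shape[0], gop_size):
        part = slice(start, start + gop_size)
        mapping = derive_mapping(cumulative_histogram(ref[part]),
                                 cumulative_histogram(dist[part]))
        corrected[part] = mapping[dist[part]]
    return corrected
```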
First results of a recent investigation indicate that a further gain is possible when histogram matching is combined
with block-based illumination compensation techniques modifying the coder and decoder themselves, especially the approach implemented in the JMVM reference software [8].
6. ACKNOWLEDGEMENTS
This work was funded by the German Research Foundation
(DFG) within the Collaborative Research Centre “Model-based analysis and visualisation of complex scenes and sensor
data” under grant SFB 603/TP C8. Only the authors are responsible for the content.
The authors would like to thank Peter Prokein for his valuable assistance with the implementation of the algorithm and
the simulations. Furthermore, the authors express their thanks
for providing test sequences to KDDI Corporation, the Interactive Visual Media Group at Microsoft Research, Mitsubishi
Electric Research Laboratories (MERL) and Fraunhofer HHI.
7. REFERENCES

[1] M. Tanimoto, “Free viewpoint television — FTV,” in Picture Coding Symposium (PCS 2004), San Francisco, CA, USA, Dec. 2004.

[2] A. Vetro, P. Pandit, H. Kimata, and A. Smolic, “Joint multiview video model (JMVM) 3.0,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Document JVT-V207, Marrakech, Morocco, Jan. 2007.
[3] K. Mueller, P. Merkle, H. Schwarz, T. Hinz, A. Smolic,
T. Oelbaum, and T. Wiegand, “Multi-view video coding
based on H.264/MPEG4-AVC using hierarchical B pictures,” in Picture Coding Symposium (PCS 2006), Beijing, China, Apr. 2006.
[4] A. P. Hekstra, J. G. Beerends, D. Ledermann, F. E.
de Caluwe, S. Kohler, R. H. Koenen, S. Rihs, M. Ehrsam,
and D. Schlauss, “PVQM — a perceptual video quality measure,” Signal Processing: Image Communication,
vol. 17, no. 10, pp. 781–798, Nov. 2002.
[5] U. Fecker, M. Barkowsky, and A. Kaup, “Improving
the prediction efficiency for multi-view video coding using histogram matching,” in Picture Coding Symposium
(PCS 2006), Beijing, China, Apr. 2006.
[6] A. Kaup and U. Fecker, “Analysis of multi-reference
block matching for multi-view video coding,” in Proc.
7th Workshop Digital Broadcasting, Erlangen, Germany,
Sept. 2006, pp. 33–39.
[7] Y. Wang, J. Ostermann, and Y.-Q. Zhang, Video Processing and Communications, Prentice Hall, 2001.
[8] Y.-L. Lee, J.-H. Hur, Y.-K. Lee, K.-H. Han, S. Cho,
N. Hur, J. Kim, J.-H. Kim, P.-L. Lai, A. Ortega, Y. Su,
P. Yin, and C. Gomila, “CE11: illumination compensation,” Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, Document JVT-U052r2, Hangzhou, China,
Oct. 2006.