Video Tonal Stabilization via Color States Smoothing

Yinting Wang, Dacheng Tao, Senior Member, IEEE, Xiang Li, Mingli Song, Senior Member, IEEE,
Jiajun Bu, Member, IEEE, and Ping Tan, Member, IEEE
Abstract— We address the problem of removing video color
tone jitter that is common in amateur videos recorded with handheld devices. To achieve this, we introduce color state to represent
the exposure and white balance state of a frame. The color state
of each frame can be computed by accumulating the color transformations of neighboring frame pairs. Then, the tonal changes
of the video can be represented by a time-varying trajectory
in color state space. To remove the tone jitter, we smooth the
original color state trajectory by solving an L1 optimization
problem with PCA dimensionality reduction. In addition, we
propose a novel selective strategy to remove small tone jitter
while retaining extreme exposure and white balance changes
to avoid serious artifacts. Quantitative evaluation and visual
comparison with previous work demonstrate the effectiveness of
our tonal stabilization method. This system can also be used as
a preprocessing tool for other video editing methods.
Index Terms— Tonal stabilization, color state, L1 optimization,
selective strategy.
I. INTRODUCTION

A video captured with a hand-held device, such as a
cell-phone or a portable camcorder, often suffers from
undesirable exposure and white balance changes between
successive frames. This is caused mainly by the continuous
automatic exposure and white balance control of the device
in response to illumination and content changes of the scene.
We use “tone jitter” to describe these undesirable exposure and
white balance changes. The first row of Fig. 1 shows an example of a video with tone jitter; it can be seen that some surfaces
(e.g., leaves, chairs and glass windows) in frames extracted
from the video have different exposures and white balances.
Manuscript received September 24, 2013; revised March 23, 2014 and
July 23, 2014; accepted September 5, 2014. Date of publication September 17,
2014; date of current version September 30, 2014. This work was supported
in part by the National Natural Science Foundation of China under Grant
61170142, in part by the Program of International Science and Technology Cooperation under Grant 2013DFG12840, National High Technology
Research and Development Program of China (2013AA040601), and in part
by the Australian Research Council under Project Grant FT-130101457 and
Project DP-120103730. The associate editor coordinating the review of this
manuscript and approving it for publication was Prof. Joseph P. Havlicek.
(Corresponding author: Mingli Song.)
Y. Wang, X. Li, M. Song, and J. Bu are with the College of
Computer Science, Zhejiang University, Hangzhou 310027, China (e-mail:
brooksong@ieee.org).
D. Tao is with the Centre for Quantum Computation and Intelligent
Systems, Faculty of Engineering and Information Technology, University of
Technology, Sydney, NSW 2007, Australia (e-mail: dacheng.tao@uts.edu.au).
P. Tan is with the School of Computing Science, Simon Fraser University,
Burnaby, BC V5A 1S6, Canada (e-mail: pingtan@sfu.ca).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2014.2358880
It is of great importance to create a tonally stabilized video by
removing tone jitter for online sharing or further processing.
In this paper, therefore, we address the video tonal stabilization
problem of removing undesirable tone jitter from a video.
Farbman and Lischinski [1] have proposed a method to
stabilize the tone of a video. One or more frames of the input
video are first designated as anchors, and then an adjustment
map is computed for each frame to make all frames appear to
be filmed with the same exposure and white balance settings as
the corresponding anchor. The adjustment map is propagated
from one frame to its neighbor, based on an assumption that a
large number of the pixel grid points from the two neighboring
frames will sample the same scene surfaces. However, this
assumption is usually erroneous, especially when the camera
undergoes sudden motion or the scene has complex textures.
In this case, a very small corresponding pixel set will be
produced, and erratic color changes will occur in some regions
of the final output. The performance of this method also
depends on the anchor selection. Therefore, it is tedious for
users to carefully examine the entire video and select several
frames as anchors following strict rules. If we simply set one
anchor to the middle frame, or two anchors to the first and
last frames, the resulting video might suffer from over-exposure
artifacts or contrast loss, especially in videos of a scene with
high dynamic range.
Exposure and white balance changes in an image sequence
have been studied in panoramic image construction before
the work of Farbman and Lischinski on tonal stabilization.
To compensate for these changes, earlier approaches compute
a linear model that matches the averages of each channel over
the overlapping area in RGB [2] or YCbCr color spaces [3],
while Zhang et al. [4] constructed a mapping function
between the color histograms in the overlapping area. However, these models are not sufficiently accurate to represent
tonal changes between frames and may result in unwanted
compensation results. Other methods have been proposed to
perform color correction using non-linear models, such as a
polynomial mapping function [5] and linear correction for
chrominance and gamma correction for luminance [6]. However, these models have the limitations of large accumulation
errors and high computational complexities when adapted to
video tonal stabilization.
If the camera response function is known, the video tone
can be stabilized by applying the camera response function inversely to each frame. Several attempts have been
made to model the camera response function by utilizing
Fig. 1. Five still frames extracted from a video with unstable tone and the results of tonal stabilization. Top: the original frames. Middle: the result of
removing all tone jitter. Bottom: the result using our tonal stabilization strategy.
a gamma curve [7], polynomial [8], semi-parametric [9] or
PCA model [10]. However, most of these methods require
perfect pixel-level image alignment, which is unrealistic
in practice for amateur videos. The work proposed by
Kim et al. [11], [12] jointly tracks features and estimates the
radiometric response function of the camera as well as exposure differences between the frames. Grundmann et al. [13]
employed the KLT feature tracker to find the pixel correspondences for alignment. After alignment, they locally computed
the response curves for key frames and then interpolated these
curves to generate the pixel-to-irradiance mapping. These two
methods adjust all frames to have the same exposure and white
balance according to the estimated response curves, without
taking into account any changes in the illumination and content
of the scene; this leads to artifacts of over-exposure, contrast
loss or erratic color in the results.
Color transfer is a topic that is highly related to this paper.
It is possible to stabilize a video using a good color transformation model to make all frames have a tone similar to the
selected reference frame. Typical global color transformation
models are based on a Gaussian distribution [14], [15] or
histogram matching [16]. An and Pellacini [17] proposed a
joint model that utilizes an affine model for chrominance
and a mapping curve for luminance. Chakrabarti et al. [18]
extended the six-parameter color model introduced in [19]
to three nonlinear models, independent exponentiation, independent polynomial and general polynomial, and proved that
the general polynomial model has the smallest RMS errors.
Local model-based methods [20]–[23] either segment the
images and then compute a local model for each corresponding segment pair or estimate the local mapping between a
small region of the source image and the target image and
then propagate the adjustment to the whole image. While
these global and local models are powerful for color transfer
between a pair of images, stabilizing the tone of a video
by using frame-to-frame color transfer is still impractical
because of error accumulation. Furthermore, they cannot handle the large exposure and white balance changes contained in
some videos.
Commercial video editing tools, such as Adobe Premiere or
After Effects, can be used to remove tone jitter. However, too
much user interaction is required to manually select the key
frames and edit their exposure and white balance.
In summary, there are two major difficulties in stabilizing
the tone of a video:
• How to represent the tone jitter? A robust model is
required to describe the tonal change between frames.
Because the video contains camera motion and the exposure and white balance settings are not constant, it is very
challenging to model the exposure and white balance
changes accurately.
• How can the tone jitter be removed selectively? A good
strategy should be proposed for tonal stabilization.
It should be able to remove tonal jitter caused by imperfect automatic exposure and white balance control, while
preserving necessary tonal changes due to illumination
and content change of the scene. Videos captured in
complex light conditions may have a wide exposure
and color range, and neighboring frame pairs from such
videos may exhibit very sharp color or exposure changes.
Removing these sharp changes will produce artifacts of
over-exposure, contrast loss or erratic colors (refer to
the second row of Fig. 1). A perfect tonal stabilization
strategy will eliminate small tone jitter while preserving
sharp exposure and white balance changes, as in the result
shown in the last row of Fig. 1.
To overcome these two difficulties, a novel video tonal
stabilization framework is proposed in this paper. We introduce
a new concept of color state, which is a parametric representation of the exposure and white balance of a frame. The tone
jitter can then be represented by the change of the color states
between two successive frames. To remove the tone jitter, a
smoothing technique is applied to the original color states to
obtain the optimal color states. We then adjust each frame to its
new color state and generate the final output video. In this way,
our method stabilizes the tone of the input video and increases
its visual quality. Additionally, the proposed method can also
serve as a pre-processing step for other video processing and
Fig. 2. Flowchart of our tonal stabilization method. (a) The input frames. (b) The aligned frames. (c) The correspondence masks. (d) The original color
states St . (e) Mt . (f) The new color states Pt . (g) The update matrices Bt . (h) The output frames.
computer vision applications, such as video segmentation [24],
object tracking [25], etc.
Inspired by the camera shake removal work of
Grundmann et al. [26], in which the camera pose of
each frame in an input video is first recovered and then
smoothed to produce a stabilized result, our method further
extends the framework and applies it to this new video tonal
stabilization problem. Specifically, the contributions of our
work are as follows:
• We use color state, a parametric representation, to
describe the exposure and white balance of an image.
With this representation, the tone of a frame is described
as a point in a high dimensional space, and then the
video tonal stabilization problem can be modeled as a
smoothing process of the original color states;
• For the first time, we propose a selective strategy to
remove undesirable tone jitter while preserving exposure
and white balance changes due to sharp illumination and
scene content changes. This strategy can help to avoid
the artifacts of over-exposure, contrast loss or erratic
color when processing videos with high dynamic ranges
of tone;
• To achieve tonal stabilization, we combine PCA
dimensionality reduction with linear programming
for color state smoothing. This not only significantly
improves the stabilization results but also greatly reduces
the computational cost.
II. OVERVIEW
In this paper, we use color state to represent the exposure
and white balance of each frame in the input video. With this
representation, the tonal changes between successive frames
form a time-varying trajectory in the space of the color states.
Undesirable tone jitter in the video can then be removed by
adaptively smoothing the trajectory.
Fig. 2 shows the flowchart of our method. We first conduct
a spatial alignment and find the corresponding pixel pairs
between successive frames. This helps us estimate the original
color states, denoted as S. The path of S is then smoothed
by an L1 optimization with PCA dimensionality reduction to
obtain the stabilized color states P. An update matrix Bt is
then estimated, and by applying it, each frame t is transferred
from the original state St to the new color state Pt to generate
the final output video.
We propose a selective strategy to implement video tonal
stabilization. Because some videos have sharp exposure and
white balance changes, transferring all of the frames to have
the same tone will result in serious artifacts. Our goal is to
keep the color states constant in the sections of the video with
little tone jitter and give the color states a smooth transition
between the sections with sharp tone changes. Thus, we adopt
the idea in [26] to smooth the path of color states so that
it contains three types of motion corresponding to different
situations of exposure and white balance changes:
• Static: A static path means the final color states stay
unchanged, i.e., D_t^1(P) = 0, where D_t^n(·) is the n-th
derivative at t.
• Constant speed: A constant rate of change allows the
tone of the video to change uniformly from one color
state to another, i.e., D_t^2(P) = 0.
• Constant acceleration: The segments of static and constant-speed motion are both stable; constant acceleration in color
state space is needed to connect two discrete stable
segments. A transition with constant acceleration from one stable segment
to another will make the video tone change smoothly, i.e., D_t^3(P) = 0.
To obtain the optimal path composed of distinct constant,
linear and parabolic segments instead of a superposition of
them, we use L1 optimization to minimize the derivatives of
the color states. Our main reason for choosing L1 rather than
L2 optimization is that the solution induced by the L1 cost
function is sparse, i.e., it will attempt to satisfy many of the
above motions along the path exactly. The computed path
therefore has derivatives that are exactly zero for most segments, which is very suitable for our selective strategy. On the
other hand, the L2 optimization will satisfy the above motions
on average (in the least-squares sense), which results in small
but non-zero gradients. Qualitatively, the L2 optimized color
state path always has some small non-zero motion (most likely
in the direction of the original color state motion), while the
L1 optimized path is composed only of segments resembling
static, constant speed and constant acceleration.
The rest of this paper is organized as follows. In Section III,
we introduce a clear definition of the color state and show how
to estimate it. A color state smoothing method is presented in
Section IV to stabilize the path of color states. We show our
experimental results in Section V and conclude the paper in
Section VI.
III. DEFINITION OF COLOR STATE
A. Frame Color State
In this paper, we use the term “color state” to represent the
exposure and white balance of an image. Let St denote the
color state of frame t of the video. The change to color state St
from St −1 is considered to be the exposure and white balance
changes between these two frames. We use the following affine
transformation to model the color state change between two
successive frames,
A = \begin{bmatrix} a00 & a01 & a02 & b0 \\ a10 & a11 & a12 & b1 \\ a20 & a21 & a22 & b2 \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (1)
An affine transformation includes a series of linear transformations, such as a translation, scaling, rotation or similarity
transformation. These transformations can model the exposure
and white balance changes well. An affine model has been
successfully applied to user-controllable color transfer [17]
and color correction [27]. In practice, although most cameras
contain non-linear processing components, we find in our
experiment that an affine model can approximate a non-linear
transformation well and produce results with negligible errors.
Given a pair of images I and J of different tones, a color
transformation function A_i^j can be applied to transfer the
pixels in I to have the same exposure and white balance
as their corresponding pixels in J. Let x and x′ denote a
pair of corresponding pixels in I and J, respectively, and
Ix = [Ix^R, Ix^G, Ix^B]^T and Jx′ = [Jx′^R, Jx′^G, Jx′^B]^T represent the
colors of these two pixels. Then, Jx′ = A_i^j(Ix):

\begin{bmatrix} Jx′^R \\ Jx′^G \\ Jx′^B \end{bmatrix} = \begin{bmatrix} a00 & a01 & a02 \\ a10 & a11 & a12 \\ a20 & a21 & a22 \end{bmatrix} \begin{bmatrix} Ix^R \\ Ix^G \\ Ix^B \end{bmatrix} + \begin{bmatrix} b0 \\ b1 \\ b2 \end{bmatrix}.    (2)
Note that the color transfer process in all our experiments is
performed in the log domain due to the gamma correction of
input videos.
Let A_{t−1}^t denote the color transformation between frames
t−1 and t. Then, the color state of frame t can be represented
by St = A_{t−1}^t St−1. The color transformation between S0 and
St can be computed by accumulating all of the transformation
matrices from frame 0 to frame t, i.e.,

St = A_{t−1}^t ⋯ A_1^2 A_0^1 S0.    (3)
Fig. 3. Color transfer results by our affine model. (a) The source image I.
(b) The target image J. (c) The aligned image. (d) The correspondence mask.
The white pixels are the corresponding pixels used to estimate A, and the
black ones are the expelled outliers. (e) The result without the constraint
term. (f) The result with the identity constraint and using a uniform weight
ωc = 2 × 10^3.
Thus St is a 4 × 4 affine matrix and has 12 degrees of
freedom. We can further set the color state of the first frame
S0 to be the identity matrix. Therefore, to compute the color
state of each video frame, we need only to estimate the color
transformation matrices for all neighboring frame pairs. In the
next subsection, we will present how to estimate the color
transformation model.
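As a concrete illustration of Equation (3), the sketch below accumulates per-pair 4 × 4 affine color transforms into color states, with S0 fixed to the identity. It assumes the pairwise transforms have already been produced by some routine (their estimation is described in the next subsection); it is not the authors' C++ implementation.

```python
import numpy as np

def accumulate_color_states(pairwise_transforms):
    """Accumulate per-pair 4x4 affine color transforms into color states (Eq. 3).

    pairwise_transforms[t-1] is assumed to hold A_{t-1}^{t}, the transform that
    maps the colors of frame t-1 to the exposure/white balance of frame t.
    """
    states = [np.eye(4)]                  # S_0 is fixed to the identity matrix
    for A in pairwise_transforms:
        states.append(A @ states[-1])     # S_t = A_{t-1}^{t} S_{t-1}
    return np.stack(states)               # shape: (number of frames, 4, 4)
```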
B. Color Transformation Model Estimation
For a common scene point projected in two different frames
I and J at pixel locations x and x′, respectively, the colors of
these two pixels Ix and Jx′ should be the same. Therefore,
if the corresponding pixels can be found, the matrix A_i^j
describing the color transformation between frames I and J
can be estimated by minimizing

Σ_x ‖ Jx′ − A_i^j(Ix) ‖².    (4)
To estimate the color transformation matrix, we first need to
find the pixel correspondences. However, we cannot use the
sparse matching of local feature points directly to estimate
the transformation because local feature points are usually
located at corners, where the surface color is not well defined.
The positive aspect is that local feature descriptors (such as
SIFT [28] and LBP [29]) are robust to tone differences. Thus,
we can use the sparse matching of local feature points to align
frames. To achieve that, we track the local feature points using
pyramidal Lucas-Kanade [30] and then compute a homography
between two successive frames to align them. Fig. 3 shows a
challenging alignment case in which (c) is the aligned result
from (a) to (b). After alignment, two pixels having the same
coordinates in the two images can be treated as a candidate
corresponding pixel pair.
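A minimal OpenCV-based sketch of this alignment step follows: it picks Shi-Tomasi corners, tracks them with pyramidal Lucas-Kanade, fits a homography with RANSAC, and warps the previous frame onto the current one. The parameter values are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def align_previous_frame(prev_gray, curr_gray, prev_bgr):
    """Warp the previous frame onto the current one using a homography
    estimated from tracked feature points (illustrative parameters)."""
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=7)
    pts_curr, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                      pts_prev, None)
    ok = status.ravel() == 1
    H, _mask = cv2.findHomography(pts_prev[ok], pts_curr[ok], cv2.RANSAC, 3.0)
    h, w = curr_gray.shape
    return cv2.warpPerspective(prev_bgr, H, (w, h))
```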
To estimate the color transformation more accurately, we
further process the candidate correspondence set to remove
outliers. Firstly, the video frames may contain noise, which
will affect the model computation. To avoid that, we employ
a bilateral filter [31] to smooth the video frames. Secondly, the
colors of pixels on edges are not well defined and cannot be
used to estimate the color transformation. Therefore, we conduct edge detection and exclude pixels around edges. Thirdly,
because under- and over-exposure truncation may affect estimating the model, we discard all under- and over-exposed pixels from the candidate correspondence set. Fourthly, we adopt
the RANSAC algorithm to further expel outliers during the
model estimation, avoiding the effect caused mainly by noise
and dynamic objects (such as moving persons, cars or trees,
in the scene). Fig. 3(d) shows the correspondence mask. Note
that in our implementation, frames are downsampled to a small
size (shorter side of 180 pixels), so that the computational cost
is reduced while an accurate color transformation can still be
estimated.
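The outlier-removal steps above can be sketched as a mask over the aligned frame pair: smooth both frames, exclude pixels near edges, and exclude clipped pixels; RANSAC during model fitting then rejects the remaining outliers. The thresholds and filter sizes below are assumptions for illustration only.

```python
import cv2
import numpy as np

def correspondence_mask(aligned_prev, curr, low=10, high=245):
    """Return smoothed frames and a boolean mask of candidate correspondences.
    In practice the frames would first be downsampled (shorter side of 180 px)."""
    a = cv2.bilateralFilter(aligned_prev, 5, 25, 25)   # suppress sensor noise
    b = cv2.bilateralFilter(curr, 5, 25, 25)
    edges = cv2.Canny(cv2.cvtColor(b, cv2.COLOR_BGR2GRAY), 50, 150)
    near_edge = cv2.dilate(edges, np.ones((5, 5), np.uint8)) > 0
    clipped = ((a.min(axis=2) < low) | (a.max(axis=2) > high) |
               (b.min(axis=2) < low) | (b.max(axis=2) > high))
    return a, b, ~(near_edge | clipped)
```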
We notice that if we estimate the color transformation by
directly minimizing Equation (4), the affine model tends to
over-fit the data and accumulate errors at each stage, especially
for scenes that contain large regions of similar color. To avoid
the over-fitting problem, we can add a regularization term to
Equation (4). Because the color tone of a natural video is
unlikely to change much between two successive frames, the
color transformation of two successive frames should be very
close to an identity matrix. Based on this observation, the
estimation of color transformation can be re-formulated as
(ωc/|X|) Σ_x ‖ Jx′ − A_i^j(Ix) ‖² + ‖ A_i^j − I4×4 ‖².    (5)

Here, I4×4 is a 4 × 4 identity matrix. ωc/|X| is the weight used
to combine the two terms, where |X| denotes the number of
corresponding pixel pairs and ωc was set to 2 × 10^3 in our
experiments.
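A minimal sketch of the regularized fit in Equation (5) is given below; it solves each row of the affine model in closed form via normal equations, with the data term weighted by ωc/|X| and the solution pulled toward the identity. The log-domain inputs and the exact weighting follow the reconstruction above and are assumptions of this sketch, not the authors' solver.

```python
import numpy as np

def fit_color_transform(src_rgb, dst_rgb, w_c=2e3):
    """Fit dst ~ M*src + b with a penalty toward the identity (Eq. 5).
    src_rgb, dst_rgb: (N, 3) arrays of log-domain colors at correspondences."""
    n = len(src_rgb)
    w = w_c / n                                # omega_c / |X|
    X = np.hstack([src_rgb, np.ones((n, 1))])  # (N, 4) design matrix
    A = np.eye(4)                              # last row stays [0 0 0 1]
    for r in range(3):                         # one row per output channel
        e = np.zeros(4); e[r] = 1.0            # corresponding row of the identity
        lhs = w * X.T @ X + np.eye(4)
        rhs = w * X.T @ dst_rgb[:, r] + e
        A[r, :] = np.linalg.solve(lhs, rhs)    # [m_r0, m_r1, m_r2, b_r]
    return A
```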
The identity constraint matters only when minimizing the first
term of Equation (5) yields several solutions with similarly
small errors; in that case it selects the solution closest to the
identity matrix, which reduces the over-fitting problem and
improves the estimation accuracy. Taking the scene in Fig. 3 as
an example, the estimated model without using the regularizer
over-fits to the creamy-white pixels (table, bookshelf and wall)
and causes large errors for the highlighted regions. To provide
a numerical comparison, we placed a color check chart into
this scene. The accuracy of color transfer is measured by the
angular error [32] in RGB color space, i.e., the angle
between the mean color of the pixels inside a color quad (c)
in the transfer result and their mean (ĉ) in the target image,

e_ANG = arccos( (c^T ĉ) / (‖c‖ ‖ĉ‖) ).

The average angular errors of each color quad from (e) and (f) are 5.943° and 1.897°, respectively.
The quad in the color of Red (Row 3, Col 3) in (e) has the
highest error, 22.03◦. For comparison, the highest error of the
color check chart in (f) is 4.476◦ from Light Blue (Row 3,
Col 6). Note that the color check chart is not used to aid
model estimation.
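For reference, the angular-error metric used above can be computed as follows (a small sketch; the clip only guards against floating-point overshoot of the cosine).

```python
import numpy as np

def angular_error_deg(c, c_hat):
    """Angle in degrees between two mean RGB colors (the e_ANG metric of [32])."""
    cos = np.dot(c, c_hat) / (np.linalg.norm(c) * np.linalg.norm(c_hat))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```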
IV. COLOR STATES SMOOTHING
To remove tone jitter, we can transfer all video frames
to have a constant color state. However, as shown in
Section I, large tone changes need to be retained in videos
that contain large illumination or scene content changes;
otherwise, artifacts may arise after removing tone jitter, such
as over-exposure, contrast loss or erratic color. In this paper,
we propose a selective strategy for video tonal stabilization.
Under this selective strategy, we generate a sequence of frames
with smooth color states and meanwhile constrain the new
color states close to the original. Here, ‘smooth’ means that
the color states remain static or are in uniform or uniformly
accelerated motion.
We utilize an L1 optimization to find the smooth color states
by minimizing the energy function,
E = ωs E(P, S) + E(P).    (6)

E(P, S) reflects our selective strategy, which is a soft constraint and ensures that the new color states P do not deviate
from the original states S,

E(P, S) = |P − S|_1.    (7)
E(P) smoothes the frame color states. As mentioned in
Section II, the first, second and third derivatives of Pt should
be minimized to make the path of Pt consist of segments of
static, constant speed or constant acceleration motion.
E(P) = ω1 |D^1(P)|_1 + ω2 |D^2(P)|_1 + ω3 |D^3(P)|_1,    (8)

where D^n(P) represents the n-th derivative of the new color
states P. Minimizing |D^1(P)|_1 causes the color states to tend
to be static. Likewise, |D^2(P)|_1 constrains the color states to
uniform motion, and |D^3(P)|_1 relates to the acceleration
of the color state motion. The weights ω1, ω2 and ω3 balance
these three derivatives. In our experiments, ω1 , ω2 and ω3 were
set to 50, 1 and 100, respectively. The weight ωs combining the
two terms is the key parameter in our method. It makes the new
color states either have an ideal smooth path or remain very
close to the original states. We conducted many experiments
to analyze ωs and found that [0.01, 0.3] is the tunable range
of ωs . When ωs = 0.01, the new color states remain constant.
In contrast, if ωs = 0.3, the new paths of color states retain
part of the initial motion. A detailed discussion of parameter
setting is presented in Section V.
Here we discuss the smoothness of color states in different
frames. We can optimize Pt similarly to the L1 camera
shake removal method [26], using forward differencing to
derive |D^n(P)|_1 and minimizing their residuals. However, this
method has limitations. From its optimization objective, the
12 elements of the color state are smoothed independently, with the
result that the 12 elements do not change synchronously.
Fig. 4 shows the curves of the new states generated by an
L1 optimization-based method as in [26]. To relieve this
problem in removing camera shake, an inclusion constraint is
utilized in [26] that the four corners of the crop rectangle must
reside inside the frame rectangle. However, we cannot find the
corners or boundaries of color states. So optimizing the new
color states by the L1 optimization directly will result in some
new color states being outside the clusters of the original color
states, and the corresponding output frames will have erratic
colors (shown in the middle row of Fig. 5).
We therefore seek to improve the smoothing method so that
all 12 elements of color state can change in the same way. The
path of original color states is a curve in a 12D space; if we
Algorithm 1 LP for Color States Optimization
Fig. 4. The curves of the 12 color state elements before and after the
L1 optimization without PCA. Green curves: original color state elements.
Red curves: new color state elements. From top to bottom, left to right, each
curve corresponds to an element of color state. The vertical axis plots the value
of the corresponding element, and the horizontal axis plots the frame index.
Fig. 6. An example of the paths of original color states (Red curve) and the
linear subspace (Green curve) generated by PCA. Note that the plotted dots
are not the real color states but simulation values. We choose a point in the
scene and use the color curve over time to simulate the color state path.
Fig. 5. The comparison of stabilization results generated by the L1 optimization methods without and with PCA. Top: the input video. Middle: the result
generated by the L1 optimization without PCA, which contains erratic color in
some output frames. Bottom: the result using the L1 optimization with PCA.
can constrain all of the new color states to be along a straight
line, the above problem will be solved. We employ Principal
Component Analysis (PCA) [33], [34] to find a linear subspace
near the original color state path. Using PCA, the color states
can be represented by St = S̄ + Σ_i βt^{ci} S^{ci}, where S̄ denotes
the mean color state over t, and S^{ci} and βt^{ci} denote the
eigenvector and the projection coefficient of the i-th component, respectively.
S̄ and S^{ci} are 4 × 4 matrices, and βt^{ci} is a scalar. The mean
color state S̄ and the eigenvector of the first component S^{c1}
are used to build this linear subspace, and the new color states
are encoded as

Pt = S̄ + Mt S^{c1}.    (9)
Here, Mt denotes the new coefficient, which is a scalar.
Because S^{c1} is the first principal component, corresponding to
the largest eigenvalue, the line of the new color states
will not deviate much from the original color state path,
as shown in Fig. 6. In this way, we limit the degrees of
freedom of the solution to a first order approximation. Then,
in the smoothing method we only need to take into account
minimizing the magnitude of the velocity and acceleration of
the color state path in the L1 optimization and do not need
to consider their direction changes. Our method is different from
the dimensionality-reduction-based smoothing methods that
directly smooth the coefficients of the first or several major
components with a low-pass filter; we will find a smoothly
changing function (a smooth curve) with respect to t, and then all
12 elements of the new color states will have the same motion
as the curve of Mt . After PCA, our L1 optimization objective
is re-derived and minimized based on Equation (9).
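A sketch of this subspace construction is shown below: the 4 × 4 color states are flattened, centered, and decomposed with an SVD; the fixed entries of the last row simply carry zero variance. This is an illustration assuming plain PCA, not the authors' exact routine.

```python
import numpy as np

def color_state_subspace(states):
    """Return the mean state S_bar, the first principal direction S_c1,
    and the per-frame coefficients beta_t (Eq. 9). states: (n, 4, 4)."""
    flat = states.reshape(len(states), -1)      # (n, 16)
    s_bar = flat.mean(axis=0)
    centered = flat - s_bar
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    s_c1 = vt[0]                                # first principal direction
    beta = centered @ s_c1                      # projection coefficients
    return s_bar.reshape(4, 4), s_c1.reshape(4, 4), beta
```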
Minimizing E(P, S): The new formulation of Pt is substituted into Equation (7),

E(P, S) = Σ_t | S̄ + Mt S^{c1} − St |_1.    (10)
Minimizing E(P): Forward differencing is used to compute
the derivatives of P. Then

|D^1(P)| = Σ_t |Pt+1 − Pt| = Σ_t |(S̄ + Mt+1 S^{c1}) − (S̄ + Mt S^{c1})| ≤ Σ_t |S^{c1}| |Mt+1 − Mt|.

Because |S^{c1}| is known, we only need to minimize |Mt+1 − Mt|,
i.e., |D^1(M)|. Similarly, we can prove that |D^2(P)| and
|D^3(P)| are equivalent to |D^2(M)| and |D^3(M)|, respectively.
Then, our goal is to find a curve Mt such that (1) it changes
smoothly and (2) after mapping along this curve, the new color
states are close to the original states. Algorithm 1 summarizes the entire process of our L1 optimization. To minimize
Equation (6), we introduce non-negative slack variables to
bound each dimension of the color state derivatives and solve
a linear programming problem as described in [26]. Using Mt
Fig. 7. The curves of the 12 color state elements before and after our
L1 optimization with PCA. Green curves: original color state elements.
Red curves: new color state elements.
to represent the color states, the number of unknown variables
to be estimated for each frame becomes 1, and the number
of introduced slack variables also declines substantially. This
significantly reduces the time and space costs of the linear programming. In our implementation, we employed COIN CLP1
to minimize the energy function and generate a sequence of
stable states.
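The following sketch shows one way to pose this L1 objective over Mt as a linear program with non-negative slack variables, using SciPy's HiGHS solver in place of COIN CLP; the constraint layout is an assumption consistent with Equations (6)-(8), not the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import linprog

def smooth_coeffs_l1(beta, w_s=0.1, w1=50.0, w2=1.0, w3=100.0):
    """Smooth the PCA coefficients beta_t with the L1 objective of Eqs. (6)-(8),
    posed as a linear program with non-negative slack variables."""
    beta = np.asarray(beta, dtype=float)
    n = len(beta)
    eye = np.eye(n)
    diffs = [np.diff(eye, k, axis=0) for k in (1, 2, 3)]    # D1, D2, D3
    ops = [eye] + diffs                                     # data term, then derivatives
    targets = [beta] + [np.zeros(d.shape[0]) for d in diffs]
    weights = [w_s, w1, w2, w3]
    sizes = [op.shape[0] for op in ops]                     # slack count per term
    nvar = n + sum(sizes)                                   # variables: [M, slacks...]
    c = np.concatenate([np.zeros(n)] +
                       [w * np.ones(m) for w, m in zip(weights, sizes)])
    rows, rhs = [], []
    offset = n
    for op, tgt, m in zip(ops, targets, sizes):
        slack = np.zeros((m, nvar)); slack[:, offset:offset + m] = np.eye(m)
        full = np.zeros((m, nvar)); full[:, :n] = op
        rows += [full - slack, -full - slack]               # |op @ M - tgt| <= slack
        rhs += [tgt, -tgt]
        offset += m
    bounds = [(None, None)] * n + [(0, None)] * sum(sizes)
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  bounds=bounds, method="highs")
    return res.x[:n]                                        # the smoothed M_t
```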
The curves of our optimal color states and the original color
states are shown in Fig. 7. In contrast to the result from the
L1 optimization without PCA, all 12 elements of color state
obey the same motion. The last row of Fig. 5 shows examples
of the output frames generated by our L1 optimization with
PCA; the problem of unusual color has been avoided.
After obtaining the new color states, an update matrix Bt
is calculated to transfer each frame from the original color
state to the new color state. From the definition of color state,
Pt = Bt St. The update matrix Bt can then be computed as

Bt = Pt St^{−1}.    (11)
It is applied to the original frame t pixel by pixel to generate
the new frame.
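The re-toning step can be sketched as follows, applying Bt = Pt St^{−1} to each pixel in the log domain; the normalization to [0, 1] and the small epsilon that avoids log(0) are assumptions of this sketch.

```python
import numpy as np

def retone_frame(frame_rgb, S_t, P_t, eps=1e-4):
    """Transfer frame t from color state S_t to the smoothed state P_t (Eq. 11)."""
    B = P_t @ np.linalg.inv(S_t)                           # update matrix B_t
    log_rgb = np.log(frame_rgb.astype(np.float64) / 255.0 + eps)
    h, w, _ = log_rgb.shape
    hom = np.hstack([log_rgb.reshape(-1, 3), np.ones((h * w, 1))])
    out = (hom @ B.T)[:, :3].reshape(h, w, 3)              # apply B_t per pixel
    return np.clip((np.exp(out) - eps) * 255.0, 0, 255).astype(np.uint8)
```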
V. EXPERIMENTS AND EVALUATION

A. Parameter Setting

Fig. 8. The curves of Mt with different ωs for the video in Fig. 1.
Red curves: Mt. Green curves: βt^{c1}. ω1 = 50, ω2 = 1 and ω3 = 100.
(a) ωs = 0.01. (b) ωs = 0.1. (c) ωs = 0.3. (d) ωs = 1.0.

Fig. 9. The curves of Mt with different ω1, ω2 and ω3 for the video in
Fig. 1. Red curves: Mt. Green curves: βt^{c1}. In this experiment, ωs = 0.3.
(a) ω1 = 100, ω2 = ω3 = 1. (b) ω2 = 100, ω1 = ω3 = 1. (c) ω3 = 100,
ω1 = ω2 = 1. (d) ω1 = 50, ω2 = 1, ω3 = 100.
The weight that balances the constraint and smoothness
terms is the most important parameter in our system. As this
weight is varied, the system generates different results. If ωs
is small (such as 0.01), Mt tends to be a straight line, which
maps all of the frames to the same color state. This keeps
the exposure and white balance constant throughout the video. For videos
whose exposure and color ranges are too wide, a straight line
mapping will cause some frames to have artifacts, such as
over-exposure, contrast loss or erratic color. If ωs is large
(such as 1.0), Pt will be very close to St . Then, most color
state changes will be retained. The weight ωs balances these
two aspects, and it is difficult to find a general value suitable
for all types of videos. We leave this parameter to be tuned
by users. We suggest that users tune ωs within three levels,
0.01, 0.1 and 0.3, which were widely used in our experiments.
Various weights were tried to stabilize the video in Fig. 1;
the curves of Mt are shown in Fig. 8. The comparison of
the output videos with different parameters is shown in our
supplementary material.2
Three other parameters affecting color state trajectory are
ω1 , ω2 and ω3 . We explored different weights for stabilizing
the video in Fig. 1 and plotted the curves of Mt in Fig. 9. If
we set one of the three parameters to a large value and suppress
the other two, it is apparent that the new color state path
tends to be (a) piecewise constant but discontinuous, (b) linear with sudden
changes, or (c) a smooth parabola, but it is rarely static. A
more agreeable viewing experience is produced by setting ω1
and ω3 larger than ω2 because we hope that the optimal path
can sustain longer static segments and be absolutely smooth.
For this paper, we set ω1 = 50, ω2 = 1 and ω3 = 100; the
corresponding curve of Mt is shown in Fig. 9(d).
In practical situations, users may prefer the exposure and
white balance of some frames and hope to keep the tone of
these frames unchanged. Our system can provide this function
1 COIN CLP is an Open Source linear programming solver that can be freely
downloaded from http://www.coin-or.org/download/source/Clp/.
2 The supplementary material can be found in our project page,
http://eagle.zju.edu.cn/~wangyinting/TonalStabilization/.
Fig. 10. The stabilization results generated by keeping the exposure and white
balance of the ‘preferred frame’ unchanged. Top: the original video. Middle:
the stabilization result with the first frames fixed. Bottom: the stabilization
result with the last frame fixed. ωs,t is set to 100 for the preferred frame and
0.01 for the others.
by using a non-uniform weight ωs,t instead of ωs . We ask the
users to point out one or several frames as ‘preferred frames’
and set higher ωs,t for these frames. The weights ωs,t for
the other frames are chosen by our selective strategy. Then,
the new optimal color states for the ‘preferred frames’ will
be very close to the original ones. Fig. 10 gives an example
in which the second row is the result generated by setting
the first frame as the ‘preferred frame’, and the third row
is the result with the last frame fixed. In this example, we
set ωs,t equal to 100 for the ‘preferred frame’ and 0.01 for
the others.
In a similar way, the weight ωc in Equation (5) can be
changed to a spatially varying one, ωc,x, and then the estimated
affine model would be more accurate for the pixels with larger
weights. We can extract the salient regions [35], [36] of each
frame or ask the users to mark their ‘preferred regions’ and
track these regions in each frame. In this way, our system
will generate more satisfying results for the regions to which
users pay more attention. A numerical comparison experiment
is described in the next subsection.
B. Quantitative Evaluation
To illustrate the effectiveness of our tonal stabilization
method, we employed a low-reflectance grey card to perform
quantitative evaluation, as in [1]. We placed an 18% reflectance
grey card into a simple indoor scene and recorded a video with
obvious tone jitter with an iPhone 4S; five example frames
from the video are shown in the first row of Fig. 11. This
video was then processed by our tonal stabilization method.
Note that the grey card was not used to aid and improve
the stabilization. We compared our results with Farbman and
Lischinski’s pixel-wise color alignment method (PWCA) [1].
For PWCA, we set the first frame as the anchor. To reach a
similar processing result, we set ωs,t to 100 for the first frame
and 0.01 for the other frames, so that the exposure and white
balance of the first frame was fixed and propagated to the
others. Both uniform weight ωc and non-uniform weight ωc,x
were tried in this experiment. For uniform weight, we set
ωc = 2 × 10^3. For the non-uniform weight, we set ωc,x to 10^4 for
all of the corresponding pixel pairs inside the grey card and
2 × 10^3 for the others. We measured the angular error [32]
in RGB color space between the mean color of the pixels
inside the grey card of each frame and the first frame. The
plot in Fig. 11(a) shows the angular errors from the first
frame of the original video (Red curve), the video stabilized by
PWCA (Blue curve), the results generated by our method with
uniform weight (Green curve) and non-uniform weight (Dark
Green curve). The second column of Table I is the average
errors over each frame of the original video and the results
generated by PWCA and our method. Both PWCA and our
method performed well, and our method with non-uniform
weight came out slightly ahead.
To assess the benefits of tonal stabilization for white
balance estimation, we conducted a similar experiment to
that presented in [1]; we applied the Grey-Edge family of
algorithms [37] to a video and its stabilization results and compared the performance of white balance estimation. The two
white balance estimation methods chosen assume that some
properties of the scene, such as the average reflectance (Grey-World) or the average derivative (Grey-Edge), are achromatic.
We computed the angular error [32] of the mean color inside
the grey card of all frames to the ground truth; the plots of the
estimation error are shown in Figs. 11(b) and (c). The third
and fourth columns of Table I are the average angular errors
of each frame after white balance estimation by Grey-World
and Grey-Edge, respectively.
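For context, the simpler of the two assumptions, Grey-World, corresponds to the correction sketched below (a minimal illustration only; the experiments use the Grey-Edge family of [37]).

```python
import numpy as np

def grey_world_correct(img_rgb):
    """Scale each channel so its mean matches the overall mean (Grey-World)."""
    img = img_rgb.astype(np.float64)
    means = img.reshape(-1, 3).mean(axis=0)     # per-channel illuminant estimate
    gains = means.mean() / means
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```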
The grey card requires that the camera motion during video
shooting not be large and that the camera not be
very far from the scene; thus, the video used for evaluation
will be relatively simple. On the other hand, PWCA performs
extremely well for large homogeneous regions (grey card).
These two factors explain why our method only slightly
outperformed PWCA in the quantitative evaluations.
C. Comparison
From the discussion above, we find that both PWCA and our
method can generate good results for simple videos. However,
PWCA sometimes does not work well for videos that include
scenes with complex textures or that contain sudden camera
shake. In these cases, it produces a very small robust correspondence set,
and the final output is not completely stable.
Our method aligns the successive frames and detects the
corresponding pixels in a more robust way. Fig. 12 compares
the result of PWCA with an anchor set to the central frame
and our result with ωs = 0.01. We can see that our result is
more stable. Another advantage of our method is the selective
Fig. 11. Quantitative evaluation of tonal stabilization and consequent white balance correction. The first row shows several still frames from the input video. The
second row compares the numerical errors of the original video and the results generated by PWCA and our method. (a) The angular errors in RGB color space
of the average color within the grey card of each frame with respect to the first frame. (b) and (c) The angular errors from the ground truth after white balance
estimation by Grey-World and Grey-Edge. Red curves: original video. Blue curves: the result of PWCA. Green curves: the result of our method with uniform
ωc = 2 × 10^3. Dark green curves: the result of our method with non-uniform ωc,x, 10^4 for the pixels within the grey card and 2 × 10^3 for the other pixels.
TABLE I
MEAN ANGULAR ERRORS OF EACH FRAME. THE SECOND COLUMN IS THE ERRORS WITH RESPECT TO THE FIRST FRAME OF THE ORIGINAL VIDEO
AND ITS STABILIZATION RESULTS. THE THIRD AND FOURTH COLUMNS ARE THE ERRORS FROM THE GROUND TRUTH AFTER
WHITE BALANCE ESTIMATION BY GREY-WORLD AND GREY-EDGE, RESPECTIVELY
Fig. 12. Comparison of the stabilization results generated by PWCA and
our method. Top: the original video. Middle: the result of PWCA, which still
contains tone jitter. Bottom: the result of our method.
stabilization strategy. It allows us to adaptively smooth the
path of color states according to the original color states.
Users need only tune the parameter ωs to generate results
with different degrees of stability and choose the best of them.
Even for a video with a large dynamic range, our method
is still very convenient to use. Benefitting from the selective
strategy, the stabilized result will remove the small tone jitter
and retain the sharp tone changes. In contrast, to generate
a comparable result, PWCA requires that the user choose the
anchors carefully. If one or two anchors are set automatically,
the result may include artifacts. The second row of Fig. 13 is
the result of PWCA with two anchors set to the first and last
frames. It is clear that some frames of the output video are
over-exposed.
Fig. 13. Another comparison of PWCA and our method. The right two
frames of the result generated by PWCA are over-exposed.
D. Challenging Cases
A video containing noise or dynamic objects is very challenging for affine model estimation because of the outliers
from the noise and dynamic objects. Our robust correspondence selection helps to handle these challenging cases by
discarding most of the outliers. Figs. 14 and 15 show two
examples of these challenging cases and their stabilization
results from our method with a uniform ωc (2 × 10^3) and
a small ωs (0.01). Because noise and dynamic objects do
not affect PWCA, we choose its stabilization results as the baseline; they are shown in the second row of each figure.
These two examples demonstrate that our method can perform well for videos containing noise and dynamic objects,
and our results are even a little better than those generated
by PWCA for exposure and color consistency (refer to the
third and fourth columns of Fig. 14 and the second column
of Fig. 15).
Fig. 14. A video containing dynamic object and its stabilization results.
Top: the original frames from the video. Middle: the result of PWCA. Some
tone jitter can be found in the 3rd and 4th columns. Bottom: the result of our
method.
Fig. 15. A noisy video and its stabilization results with PWCA and our
method. The exposure and color of the 2nd column is a little different from
the other columns in the PWCA result.
E. Computational Overhead
The running time of our system depends on the length of
the input video. When we compute the transformation matrix
for two neighboring frames, the images are resized to a fixed
small size, so the size of frame does not significantly affect the
running time. For the 540-frame video (1920 × 1080 in size)
shown in Fig. 1, it took about 511 seconds to complete the
entire stabilization process. Computing the color transformation matrix for each pair of successive frames took approximately 0.88 seconds, and the system spent 7.28 seconds on
optimizing the color states. Our system is implemented in
C++ without code optimization, running on a Dell Vostro
420 PC with an Intel Core2 Quad 2.40GHz CPU and 8GB of
memory.
The running time of our method can be shortened by
parallelization. Approximately 90% of running time is spent
on computing the color transformation matrix between neighboring frames, which is easily parallelizable because the frame
registration and affine model estimation for each pair of
neighboring frames can be carried out independently. On the
other hand, when processing a long video, we can first cut
the video into several short sub-videos and set the frame shared
by two adjacent sub-videos as a ‘preferred frame’; these sub-videos can then be stabilized in parallel. In future work,
we plan to accelerate the performance of our method by
using GPUs.

F. Discussion
Our method depends on alignment of consecutive frames,
so feature tracking failures will affect our stabilization results.
There are several situations that may cause feature tracking
failures. a) The neighboring frames are very homogeneous
(e.g., wall, sky) and have too few matching feature points.
In this situation, we assume these two source frames are
aligned. Because the frames are homogeneous, this will not
result in large errors from misalignment; our correspondence
selection algorithm helps to discard the outliers from noise,
edges, etc. Therefore, our method also performs well in this
situation. b) Very sharp brightness changes between consecutive frames may cause feature tracking to fail. We can adopt
an iteration-based method to solve this problem. The two
neighboring frames are denoted as I and J. We first assume
the two images are aligned, i.e., H_i^j = I3×3. We estimate the
color transfer model A_i^j that is applied to frame I to make the
exposure and white balance of I and J closer. Then, a new
homography matrix H_i^j is estimated to align the modified
frame I and J. We repeat computing A_i^j and H_i^j; usually
two or three iterations are sufficient to generate a good result
(a minimal sketch of this loop is given after this paragraph).
Most of the neighboring frames in natural videos will not have
brightness changes sharp enough to affect feature tracking.
We tested approximately 150 videos and never encountered
this problem. c) Most local features are located on non-rigid
moving objects. This is the most challenging case, and the vast
majority of feature-tracking-based vision algorithms, such as [38],
cannot handle it. Because our method needs to track
a feature for only two adjacent frames, if the dynamic objects
do not move too quickly, serious artifacts will not result.
Otherwise, our method will fail.
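A minimal sketch of the iteration described in case b) follows; the four callables are assumed helpers (for example, wrappers around the alignment and color-transform routines of Section III), and the loop structure illustrates the described strategy rather than the authors' code.

```python
import numpy as np

def align_and_transfer(I, J, estimate_homography, estimate_color_transform,
                       apply_color_transform, warp, iters=3):
    """Alternate color-transfer estimation and homography alignment for frame
    pairs with very sharp brightness changes between them."""
    H = np.eye(3)                                   # start by assuming alignment
    A = np.eye(4)
    for _ in range(iters):                          # two or three iterations suffice
        A = estimate_color_transform(warp(I, H), J)     # color model A_i^j
        I_toned = apply_color_transform(I, A)           # re-tone the source frame
        H = estimate_homography(I_toned, J)             # re-align with the new tone
    return A, H
```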
In addition to feature tracking, our method has another
limitation. If the camera is moved from one scene to a
new scene and then returned to the former, the color of the
same surface in different sections of a video stabilized by
our method may not be coherent. An example is shown in
Fig. 16, in which a large stone appears, is passed by, and
then reappears. We can see from the figure that the color of
the stone has changed slightly in our output frames. This is
caused by the error arising during color state computation.
When we estimate the color transformation model for two
neighboring images, if a surface with a particular color exists
only in one frame, then the computed model may be unsuitable
for the region of this surface. Because the color state is
the accumulation of color transformation matrices, the error
of color transformation for two frames will be propagated
to all later images. Another possible reason for this artifact
is that our trajectory smoothing method cannot ensure that
two similar original color states remain similar after stabilization. We leave these unsolved problems for future work.

Fig. 16. Failure case. These four frames are extracted from our stabilization
result. The stone appears in two different sections of the video, and the colors
in our result are not coherent.
VI. CONCLUSION
In this paper, we utilize a 4 × 4 matrix to model the
exposure and white balance state of a video frame, which we
refer to as the color state. PCA dimensionality reduction is
then applied to find a linear subspace of color states, and an
L1 optimization-based method is proposed to generate smooth
color states in the linear subspace and achieve the goal of video
tonal stabilization. Our experimental results and quantitative
evaluation show the effectiveness of our stabilization method.
Compared to previous work, our method performs better at
finding pixel correspondences between two neighboring frames.
In addition, we use a new stabilization strategy that retains
some tone changes due to sharp illumination and scene content
changes, so our method can handle videos with an extreme
dynamic range of exposure and color. Our system can remove
tone jitter effectively and thus increase the visual quality of
amateur videos. It also can be used as a pre-processing tool
for other video editing methods.
REFERENCES
[1] Z. Farbman and D. Lischinski, “Tonal stabilization of video,”
ACM Trans. Graph., vol. 30, no. 4, pp. 1–89, 2011.
[2] G. Y. Tian, D. Gledhill, D. Taylor, and D. Clarke, “Colour correction
for panoramic imaging,” in Proc. 6th Int. Conf. Inf. Visualisat., 2002,
pp. 483–488.
[3] S. J. Ha, H. I. Koo, S. H. Lee, N. I. Cho, and S. K. Kim, “Panorama
mosaic optimization for mobile camera systems,” IEEE Trans. Consum.
Electron., vol. 53, no. 4, pp. 1217–1225, Nov. 2007.
[4] Z. Maojun, X. Jingni, L. Yunhao, and W. Defeng, “Color histogram
correction for panoramic images,” in Proc. 7th Int. Conf. Virtual Syst.
Multimedia, Oct. 2001, pp. 328–331.
[5] B. Pham and G. Pringle, “Color correction for an image sequence,” IEEE
Comput. Graph. Appl., vol. 15, no. 3, pp. 38–42, May 1995.
[6] Y. Xiong and K. Pulli, “Color matching of image sequences with
combined gamma and linear corrections,” in Proc. Int. Conf. ACM
Multimedia, 2010, pp. 261–270.
[7] S. Mann and R. W. Picard, “On being ‘undigital’ with digital cameras:
Extending dynamic range by combining differently exposed pictures,”
in Proc. IS&T, 1995, pp. 442–448.
[8] T. Mitsunaga and S. K. Nayar, “Radiometric self calibration,” in Proc.
Comput. Vis. Pattern Recognit., vol. 1. Jun. 1999, pp. 374–380.
[9] F. M. Candocia and D. A. Mandarino, “A semiparametric model for
accurate camera response function modeling and exposure estimation
from comparametric data,” IEEE Trans. Image Process., vol. 14, no. 8,
pp. 1138–1150, Aug. 2005.
[10] M. D. Grossberg and S. K. Nayar, “Modeling the space of camera
response functions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26,
no. 10, pp. 1272–1282, Oct. 2004.
[11] S. J. Kim, J.-M. Frahm, and M. Pollefeys, “Joint feature tracking and
radiometric calibration from auto-exposure video,” in Proc. IEEE 11th
Int. Conf. Comput. Vis., Oct. 2007, pp. 1–8.
[12] S. J. Kim and M. Pollefeys, “Robust radiometric calibration and
vignetting correction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30,
no. 4, pp. 562–576, Apr. 2008.
[13] M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa, “Postprocessing approach for radiometric self-calibration of video,” in Proc.
IEEE Int. Conf. Comput. Photography, Apr. 2013, pp. 1–9.
[14] E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, “Color transfer
between images,” IEEE Comput. Graph. Appl., vol. 21, no. 5, pp. 34–41,
Sep./Oct. 2001.
[15] F. Pitié, A. C. Kokaram, and R. Dahyot, “Automated colour grading
using colour distribution transfer,” Comput. Vis. Image Und., vol. 107,
nos. 1–2, pp. 123–137, Jul. 2007.
[16] T. Oskam, A. Hornung, R. W. Sumner, and M. Gross, “Fast and stable
color balancing for images and augmented reality,” in Proc. 2nd Int.
Conf. 3D Imag., Modeling, Process., Visualizat. Transmiss., Oct. 2012,
pp. 49–56.
[17] X. An and F. Pellacini, “User-controllable color transfer,” Comput.
Graph. Forum, vol. 29, no. 2, pp. 263–271, May 2010.
[18] A. Chakrabarti, D. Scharstein, and T. Zickler, “An empirical camera
model for internet color vision,” in Proc. Brit. Mach. Vis. Conf., vol. 1.
2009, pp. 1–4.
[19] G. Finlayson and R. Xu, “Illuminant and gamma comprehensive normalisation in log RGB space,” Pattern Recognit. Lett., vol. 24, no. 11,
pp. 1679–1690, Jul. 2003.
[20] S. Kagarlitsky, Y. Moses, and Y. Hel-Or, “Piecewise-consistent color
mappings of images acquired under various conditions,” in Proc. IEEE
12th Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 2311–2318.
[21] Y.-W. Tai, J. Jia, and C.-K. Tang, “Local color transfer via probabilistic
segmentation by expectation-maximization,” in Proc. IEEE Comput. Soc.
Conf. Comput. Vis. Pattern Recognit., vol. 1. Jun. 2005, pp. 747–754.
[22] D. Lischinski, Z. Farbman, M. Uyttendaele, and R. Szeliski, “Interactive
local adjustment of tonal values,” ACM Trans. Graph., vol. 25, no. 3,
pp. 646–653, Jul. 2006.
[23] Y. Li, E. Adelson, and A. Agarwala, “ScribbleBoost: Adding classification to edge-aware interpolation of local image and video adjustments,”
Comput. Graph. Forum, vol. 27, no. 4, pp. 1255–1264, 2008.
[24] Q. Zhu, Z. Song, Y. Xie, and L. Wang, “A novel recursive Bayesian
learning-based method for the efficient and accurate segmentation of
video with dynamic background,” IEEE Trans. Image Process., vol. 21,
no. 9, pp. 3865–3876, Sep. 2012.
[25] S. Das, A. Kale, and N. Vaswani, “Particle filter with a mode tracker
for visual tracking across illumination changes,” IEEE Trans. Image
Process., vol. 21, no. 4, pp. 2340–2346, Apr. 2012.
[26] M. Grundmann, V. Kwatra, and I. Essa, “Auto-directed video stabilization with robust L1 optimal camera paths,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit., Jun. 2011, pp. 225–232.
[27] H. Siddiqui and C. A. Bouman, “Hierarchical color correction for
camera cell phone images,” IEEE Trans. Image Process., vol. 17, no. 11,
pp. 2138–2155, Nov. 2008.
[28] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”
Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.
[29] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale
and rotation invariant texture classification with local binary patterns,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 971–987,
Jul. 2002.
[30] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Comput.
Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 1994, pp. 593–600.
[31] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color
images,” in Proc. 6th Int. Conf. Comput. Vis., Jan. 1998, pp. 839–846.
[32] S. D. Hordley, “Scene illuminant estimation: Past, present, and future,”
Color Res. Appl., vol. 31, no. 4, pp. 303–314, Aug. 2006.
[33] I. T. Jolliffe, Principal Component Analysis. New York, NY, USA:
Springer-Verlag, 1986, p. 487.
[34] B.-K. Bao, G. Liu, C. Xu, and S. Yan, “Inductive robust principal
component analysis,” IEEE Trans. Image Process., vol. 21, no. 8,
pp. 3794–3800, Aug. 2012.
[35] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu,
“Global contrast based salient region detection,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., Jun. 2011, pp. 409–416.
[36] C. Jung and C. Kim, “A unified spectral-domain approach for
saliency detection and its application to automatic object segmentation,”
IEEE Trans. Image Process., vol. 21, no. 3, pp. 1272–1283,
Mar. 2012.
[37] J. van de Weijer, T. Gevers, and A. Gijsenij, “Edge-based color constancy,” IEEE Trans. Image Process., vol. 16, no. 9, pp. 2207–2214,
Sep. 2007.
[38] F. Liu, M. Gleicher, J. Wang, H. Jin, and A. Agarwala, “Subspace video
stabilization,” ACM Trans. Graph., vol. 30, no. 1, pp. 1–4, 2011.
Yinting Wang received the B.E. degree in software
engineering from Zhejiang University, Hangzhou,
China, in 2008, where he is currently pursuing
the Ph.D. degree in computer science. His research
interests include computer vision and image/video
enhancement.
Dacheng Tao (M’07–SM’12) is currently a
Professor of Computer Science with the Centre
for Quantum Computation and Intelligent Systems,
and the Faculty of Engineering and Information
Technology, University of Technology, Sydney,
NSW, Australia. He mainly applies statistics and
mathematics for data analysis problems in data
mining, computer vision, machine learning, multimedia, and video surveillance. He has authored
over 100 scientific articles at top venues, including the IEEE TRANSACTIONS ON PATTERN
ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON
NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS
ON IMAGE PROCESSING, the IEEE Conference on Neural Information
Processing Systems, the International Conference on Machine Learning,
the International Conference on Artificial Intelligence and Statistics, the
IEEE International Conference on Data Mining series (ICDM), the IEEE
Conference on Computer Vision and Pattern Recognition, the International
Conference on Computer Vision, the European Conference on Computer
Vision, the ACM Transactions on Knowledge Discovery from Data, the
ACM Multimedia Conference, and the ACM Conference on Knowledge Discovery and Data Mining, with the Best Theory/Algorithm Paper
Runner Up Award in IEEE ICDM’07 and the Best Student Paper Award
in IEEE ICDM’13.
Xiang Li received the B.E. degree in computer
science from Zhejiang University, Hangzhou, China,
in 2013. He is currently pursuing the M.S. degree
in information technology with Carnegie Mellon
University, Pittsburgh, PA, USA. His research interests include machine learning and computer vision.
Mingli Song (M’06–SM’13) is currently an
Associate Professor with the Microsoft Visual Perception Laboratory, Zhejiang University, Hangzhou,
China. He received the Ph.D. degree in computer
science from Zhejiang University in 2006. He was
a recipient of the Microsoft Research Fellowship in
2004. His research interests include face modeling
and facial expression analysis.
Jiajun Bu is currently a Professor with the
College of Computer Science, Zhejiang University,
Hangzhou, China. His research interests include
computer vision, computer graphics, and embedded
technology.
Ping Tan is currently an Assistant Professor with the
School of Computing Science, Simon Fraser University, Burnaby, BC, Canada. He was an Associate
Professor with the Department of Electrical and
Computer Engineering, National University of
Singapore, Singapore. He received the Ph.D. degree
in computer science and engineering from the
Hong Kong University of Science and Technology,
Hong Kong, in 2007, and the bachelor’s and master’s degrees from Shanghai Jiao Tong University,
Shanghai, China, in 2000 and 2003, respectively. He
has served as an Editorial Board Member of the International Journal of
Computer Vision and the Machine Vision and Applications. He has served
on the Program Committees of SIGGRAPH and SIGGRAPH Asia. He was
a recipient of the inaugural MIT TR35@Singapore Award in 2012, and the
Image and Vision Computing Outstanding Young Researcher Award and the
Honorable Mention Award in 2012.