Interactive Image Matting for Multiple Layers

advertisement
Interactive Image Matting for Multiple Layers
Dheeraj Singaraju
René Vidal
Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218, USA
Abstract
Image matting deals with finding the probability that
each pixel in an image belongs to a user specified ‘object’
or to the remaining ‘background’. Most existing methods
estimate the mattes for two groups only. Moreover, most
of these methods estimate the mattes with a particular bias
towards the object and hence the resulting mattes do not
sum up to 1 across the different groups. In this work, we
propose a general framework to estimate the alpha mattes
for multiple image layers. The mattes are estimated as the
solution to the Dirichlet problem on a combinatorial graph
with boundary conditions. We consider the constrained optimization problem that enforces the alpha mattes to take
values in [0, 1] and sum up to 1 at each pixel. We also analyze the properties of the solution obtained by relaxing either of the two constraints. Experiments demonstrate that
our proposed method can be used to extract accurate mattes of multiple objects with little user interaction.
1. Introduction
Image matting refers to the problem of assigning to each
pixel in an image, a probabilistic measure of whether it belongs to a desired object or not. This problem finds numerous applications in image editing, where the user is interested only in the pixels corresponding to a particular object,
rather than in the whole image. In such cases, one prefers
assigning soft values to the pixels rather than a hard classification. This is because there can be ambiguous areas where
one cannot make clear cut decisions about the pixels’ membership. Matting therefore deals with assigning a partial
opacity value α ∈ [0, 1] to each pixel, such that pixels that
definitely belong to the object or background are assigned a
value α = 1 or α = 0 respectively. More specifically, the
matting problem tries to estimate the value αi at each pixel
i, such that its intensity Ii can be expressed in terms of the
true foreground and background intensities Fi and Bi as
Ii = αi Fi + (1 − αi )Bi .
(1)
Note that in (1), the number of unknowns is more than
the number of independent equations, thereby making the
matting problem ill posed. Therefore, matting algorithms
typically require some user interaction that specifies the
object of interest and the background, thereby embedding
Figure 1. Typical example of the image matting problem. Left:
Given image with a toy object. Right: Desired matte of the toy.
some constraints in the image. User interaction is provided
in the form of a trimap that specifies the regions that definitely belong to the object and the background, and an intermediate region where the mattes need to be estimated. Early
methods use the trimap to learn intensity models for the object as well as the background [2, 11, 3]. These learned
intensity models are subsequently used to predict the alpha
mattes in the intermediate region. Also, Sun et al. [12] use
the image gradients to estimate the mattes as the solution to
the Dirichlet problem with boundary conditions.
A common criticism of these methods is that they typically require an accurate trimap with a fairly narrow intermediate region to produce good results. Moreover, they
give erroneous results if the background and object have
similar intensity distributions or if the image intensities do
not vary smoothly in the intermediate region. These issues
have motivated the development of methods that extract the
mattes for images with complex intensity variations. One
line of work employs popular segmentation techniques for
the problem of image matting [10, 5, 1]. However, these
methods enforce the regions in the image to be connected.
This is a major limitation, as these methods cannot deal with
images such as Figure 1, where the actual matte has holes.
More recent work addresses the aforementioned issues
and lets the user extract accurate mattes for challenging images with low levels of interaction [14, 8, 6, 16, 15, 13].
However, a fundamental limitation of these methods is that
they extract mattes for two groups only. The only existing
work dealing with multiple groups is the Spectral Matting
algorithm proposed by Levin et al. [9]. They propose to use
the eigenvectors of the so-called Matting Laplacian matrix
to obtain the mattes for several layers. In reality, the algorithm uses multiple eigenvectors to estimate the matte for a
single object. Moreover, the experiments are restricted to
extracting the mattes for two layers only.
Two additional areas of concern with the aforementioned
methods are the following. First, some methods do not enforce the estimated mattes to take values in [0, 1]. If the
matte at a pixel does not lie in [0, 1], its original interpretation as a probabilistic measure breaks down. Second,
most existing methods exhibit a bias towards the object. In
particular, consider repeating the estimation of mattes after switching the labels of the object and background in the
trimap. Then, the mattes estimated at a pixel do not necessarily sum up to 1 across both trials. Consequently, one
needs to postprocess the estimated mattes in order to enforce these constraints. One would like to devise a method
such that these constraints are automatically satisfied, because postprocessing can degrade the quality of the mattes.
Paper contributions: In this work, we present what is to
the best of our knowledge, the first method that can be used
to extract accurate mattes of multiple layers in an image. We
demonstrate that the algorithms of [8] and [15] that are designed for 2 layers, admit an equivalent electrical network
construction that provides a unifying framework for analyzing them. We show how this framework can be used to
generalize the algorithms to estimate the mattes for multiple
image layers. We pose the matting problem as a constrained
optimization that enforces that the alpha mattes at each pixel
(a) take values in [0, 1], and (b) sum up to 1 across the different image layers. Subsequently, we discuss two cases
where either of these two constraints can be relaxed and we
analyze the properties of the resulting solutions. Finally, we
present quantitative and qualitative evaluations of our algorithms’ performance on a database of nearly 25 images.
2. Image Matting for Two Groups
In this section, we discuss the previous algorithms of
Levin et al. [8] and Wang et al. [15] that approach the matting problem via optimization on a combinatorial graph. We
describe the construction of equivalent electrical networks,
the physics of which exhibit the same optimization scheme
as in these methods. In this process, we provide a unifying
framework for studying these algorithms and appreciating
the fine differences between them. Before delving into the
details, we need to introduce some notation.
A weighted graph G consists of a pair G = (V, E) with
nodes v i ∈ V and undirected edges eij ∈ E. The nodes on
the graph typically correspond to pixels in the image. An
edge that spans two vertices v i and v j is denoted by eij .
The neighborhood of a node v i is given by all the nodes
v j that share an edge with v i , and is denoted by N (v i ).
Each edge is assigned a value wij that is referred to as its
weight. Since the edges are undirected, we have wij =
wji . These edge weights
P are used to define the degree di
of a node v i as di = vj ∈N (vi ) wij . One can use these
definitions to construct a Laplacian matrix L for the graph
as L = D − W , where D = diag(d1 , d2 , . . . , d|V| ) and W
is a |V| × |V| matrix whose (i, j) entry is given by the edge
weight wij . By construction, the Laplacian matrix has the
constant vector of 1s in its null space. In fact, when the
graph is connected, the vector of 1s is the only null vector.
The algorithms in [8] and [15] require the user to mark
representative seed nodes for the different image layers.
These seeds embed hard constraints on the mattes and are
subsequently used to predict the mattes of the remaining unmarked nodes. The set S ⊂ V contains the locations of the
nodes marked as seeds and the set U ⊂ V contains the locations of the unmarked nodes. By construction S ∩ U = ∅
and S ∪ U = V. We further split the set S into the sets
O ⊂ S and B ⊂ S that contain the locations of the seeds
for the foreground object and the background, respectively.
By construction, we have O ∩ B = ∅ and O ∪ B = S.
The alpha matte at the pixel v i is denoted as αi and represents the probability that v i belongs to the object. The
mattes of all pixels in the image can be stacked into a vector
>
α>
as α = [α>
S ] , where αU and αS correspond to the
U
mattes of the unmarked and marked pixels, respectively.
2.1. Closed Form Solution for Image Matting
Levin et al. [8] assume that the image intensities satisfy
the line color model. According to this model, the intensities of the object and background vary linearly in small
patches of the image. More specifically, let (IjR , IjB , IjG )
denote the red, blue and green components of the intensity
I j ∈ R3 at the pixel v j . The line color model assumes
B
G
that there exist constants (aR
i , ai , ai , bi ) for every pixel
v i ∈ V, such that the mattes αj of the pixels v j in a small
window W(v i ) around v i can be expressed as
R
B B
G G
αj = aR
i Ij + ai Ij + ai Ij + bi , ∀v j ∈ Wi .
(2)
The algorithm of [8] then proceeds by minimizing the fitting error with respect to this model in every window, while
enforcing the additional constraint that the model parameters vary smoothly across the image. In particular, for some
> 0, the algorithm aims to minimize the cost function
Xh X
X
2 X c 2 i
J(α, a, b) =
αj −
aci Ijc −bi +
ai .
v i ∈V j∈W(v i )
c
c
(3)
The parameter > 0 prevents the algorithm from estiB
G
mating the trivial model (aR
i , ai , ai ) = (0, 0, 0). Note that
the quadratic cost function J(α, a, b) involves two sets of
unknowns, namely, the mattes and the parameters of the line
color model. If one assumes that the mattes α are known,
B
G
N
the model parameters {(aR
i , ai , ai , bi )}i=1 can be linearly
estimated in terms of the image intensities and the mattes by
minimizing (3). Substituting these model parameters back
in to (3) makes the cost J a quadratic function of the mattes.
In particular, the cost function can be expressed as
J(α) = α> Lα, where L is a |V| × |V| Laplacian matrix. This Laplacian is referred to as the Matting Lapla-
cian and essentially captures the local statistics of the intensity variations in a small window around each pixel [8].
We shall now introduce some terms needed for defining the
Matting Laplacian. Consider a window W(v k ) around a
pixel v k ∈ V and let the number of pixels in this window be nk . Let the mean and the covariance of the intensities of the pixels in W(v k ) be denoted as µk ∈ R3 and
Σk ∈ R3×3 . Denoting an m × m identity matrix as Im , [8]
showed that the Matting Laplacian can be constructed from
the edge weights w̃ij that are defined as
X
n−1
1+(I i−µk )> (Σk+ I3 )−1 (I j−µk ) . (4)
k
nk
k|vi ,v j ∈W(v k )
Note that these weights can be positive or negative. The
neighborhood structure however ensures that the graph is
connected. Recall that the user marks some pixels in the
image as seeds that are representative of the object and the
background. To this effect, we define a vector s ∈ R|S| ,
such that an entry of s is set to 1 or 0 depending on whether
it corresponds to the seed of an object or background, respectively. Also, for some λ > 0, we define a matrix
Λ =λI|S| . The mattes in the image are then estimated as
X
X
2
2
α = argmin
wij (αi − αj ) +
λ(αi − si )
α
eij ∈E
i∈S
>
>
= argmin α Lα + λ(αS − s) (αS − s)
(5)
α




αU
B>
0
> > > LU
= argmin αU αS s  B LS + Λ −Λ αS  .
α
s
0
−Λ
Λ
Noticing that the expression in (3) is non-negative, [8]
showed that the Matting Laplacian is positive semi-definite.
This makes the cost function in (5) convex and therefore
the optimization has a unique minimizer that can be linearly
estimated in closed form as
αS = [A + Λ]−1 Λs,
αU =
>
−L−1
U B αS
and
>
−1
= −L−1
Λs,
U B [A + Λ]
(6)
>
where A = LS − BL−1
U B . The algorithm of Levin et al.
[8] employs the above optimization by choosing a large finite valued λ, while the algorithm of Wang et al. [15] works
in the limiting case by choosing λ = ∞. We note that [15]
employs a modification of the Matting Laplacian. However,
it suffices for the purpose of our analysis that the modified
matrix is also a positive semi-definite Laplacian matrix.
2.2. Electrical Networks for Image Matting
It is interesting to note that there exists an equivalent
electrical network that solves the optimization problem in
(5). In particular, one constructs a network as shown in Figure 2, such that each node in the graph associated with the
image is equivalent to a node on the network.
O
O
i
j
i
j
B
wij
B
Figure 2. Equivalent construction of combinatorial graphs and
electrical networks.
The edge weights correspond to the conductance values of resistors connected between neighboring nodes, i.e.
1
Rij = wij . Since the edge weights wij are not all positive, one can argue that this system might not be physically realizable. However, we note that the network is
constrained to dissipate positive energy due to the positive
semi-definiteness of the Laplacian. Therefore, we can think
of the image as a real resistive load that dissipates energy.
The seeded nodes are connected to the network ground and
unit voltage sources by resistive impedances of value λ1 .
In particular, the background seeds are connected to the
ground and the object seeds are connected to the unit voltage sources. Hence, we have s = 1V at the voltage sources
and s = 0V at the ground, where all measurements are with
respect to the network’s ground.
From network theory, we know that the potentials x at
the nodes in the network distribute themselves such that
they minimize the energy dissipated by the network. The
potentials can therefore be estimated as
X 1
X
2
2
x̄ = argmin
(xi −xj ) +
λ(xi −si ) . (7)
Rij
x̄
eij ∈E
i∈S
We note that this is exactly the same expression as in (5).
Therefore, if one were to construct the equivalent network
and measure the potentials at the nodes they would give us
the required alpha mattes.
In what follows, we shall show that as λ → ∞, the
fraction (η) of work done by the electrical sources that is
used to drive the load is maximized. Note that one can
use the positive definiteness of L and the connectedness
of the graph to show that the matrix A is positive definite.
Now, consider the singular value decomposition of the matrix A. We have A = U ΣA U > , where U = [u1 . . . u|S| ]
and ΣA = diag(σ1 , . . . , σ|S| ), σ1 ≥ σ2 ≥ · · · ≥ σ|S| > 0.
We can then explicitly evaluate η as
x> Lx
Eload
= >
(8)
η=
Etotal
x Lx + (xS − s)> Λ(xS − s)
=
|S|
2
X
σi (u>
i s)
σi
( + 1)2
i=1 λ
(s> Λ) [A + Λ]−1 A[A + Λ]−1 (Λs)
=
.
|S|
2
X
(s> Λ) Λ−1 − [A + Λ]−1 (Λs)
σi (u>
s)
i
σi
(
+
1)
λ
i=1
Since λ > 0 and ∀i = 1, . . . , |S|, σi > 0, we have that
1 + σλi > 1. Given this observation, it can be verified that
∀λ ∈ (0, ∞), η < 1 and lim η = 1. The fractional enλ→∞
ergy delivered to the load is therefore maximized by setting
λ = ∞. In the network, this limiting case corresponds to
setting the values of impedances between the sources and
the load to zero. In terms of the image, this forces the mattes at the pixels marked by the scribbles to be 1 for the foreground object and 0 for the background.
The algorithm of [8] solves the optimization problem in
(5) by setting λ to be a large finite valued number. Since
there is always a finite potential drop across the resistors
connecting the voltage source and the grounds to the image,
we note that the mattes (potentials) at the seeds are close
to the desired values but not equal. The algorithm of Wang
et al. [15] corresponds to the limiting case of λ = ∞ and
forces the mattes at the seeds to be equal to the desired values. The numerical framework of the latter is referred to
as the solution to the combinatorial Dirichlet problem with
boundary conditions. Based on the argument given above,
we favor the optimization of [15] over that of [8] because
it helps in utilizing the scribbles to the maximum extent to
provide knowledge about the unknown mattes.
3. Image Matting for Multiple Groups
In this section, we show how to solve the matting problem for n ≥ 2 image layers, by using generalizations of
the methods discussed in Section 2. Essentially, we assume
that the intensity of each pixel in the query image can be expressed as the convex linear combination of the intensities
at the same pixel location in the n image layers. We are then
interested in estimating partial opacity values at each pixel
as the probability of belonging to one of the image layers.
In what follows, we denote the intensity of a pixel v i ∈
V in the query image by Ii . Similarly, the intensity of a pixel
v i ∈ V in the j th image layer is denoted by Fij , where 1 ≤
j ≤ n. The alpha matte at a pixel v i ∈ V with respect to the
j th image layer is denoted by αij . Given these definitions,
the matting problem requires us to estimate alpha mattes
{αij }nj=1 and the intensities {Fij }nj=1 of the n image layers
at each pixel v i ∈ V, such that we have
Ii =
n
X
j=1
αij Fij , s.t.
n
X
αij = 1, and {αij }nj=1 ∈ [0, 1].
(9)
j=1
We shall pose this matting problem as an optimization
problem on combinatorial graphs. While we retain most
of the notations from Section 2, we need to introduce new
notation for the sets that contain the seeds’ locations. In
particular, we split the set S that contains the locations of all
the seeds into the sets R1 , R2 , . . . , Rn , where Ri contains
the seed locations for the ith layer. By construction, we have
∪ni=1 Ri = S and ∀1 ≤ i < j ≤ n, Ri ∩ Rj = ∅.
Algorithm 1 (Image matting for n ≥ 2 image layers)
1: Given an image, construct the matting Laplacian L for
the image as described in Section 2.
2: For each image layer j ∈ {1, · · · , n}, fix the mattes at
the seeds as αji = 1 if v i ∈ Rj and αji = 0 if v i ∈ S \ Rj .
3: Estimate the alpha mattes {αjU }nj=1 for the unmarked
pixels with respect to the n image layers as
n h
i
X
>
(10)
{αjU }nj=1 =argmin
αj Lαj
{αjU }n
j=1 j=1
such that
(a) ∀j ∈ {1, . . . , n}, 0 ≤ αjU ≤ 1, and
n
X
(b)
αjU = 1.
(11)
j=1
We propose to solve this problem by minimizing the sum
of the cost functions associated with the estimation of mattes for each image layer. Unlike [8] and [15], we propose
to impose constraints that the mattes sum up to 1 at each
pixel, and take values in [0, 1]. In particular, we propose
that the matting problem can be solved using Algorithm 1.
Notice that this is a standard optimization of minimizing a
quadratic function subject to linear constraints. Recall from
Section 2 that the cost function in (10), is a convex function of the alpha mattes. Also, notice that the set of feasible
solutions of the mattes {αjU }nj=1 is compact and convex.
Therefore, the optimization problem posed in Algorithm 1
is guaranteed to have a unique solution. However, this optimization can be computationally cumbersome due to the
large number of unknown variables. In what follows, we
shall discuss relaxing either one of the constraints and analyze its effect on the solution to the optimization problem.
3.1. Image matting for multiple layers, without constraining the sum of the mattes at a pixel
We analyze the properties of the mattes obtained by solving the optimization problem of (10) without enforcing the
mattes to take values between 0 and 1. In particular, Theorem 1 states an important consequence of this case.
Theorem 1 The alpha mattes obtained as the solution to
Algorithm 1 without imposing the constraint that they take
values between 0 and 1, are naturally constrained to sum
up to 1 at every pixel.
Proof. Notice that the set of feasible solutions for the mattes
is convex, even when we do not impose constraint (a). The
optimization problem is hence guaranteed to have a unique
solution. Now, we know that this solution must satisfy the
Karusch-Kuhn-Tucker (KKT) conditions [7]. In particular,
the KKT conditions guarantee the existence of a vector Γ ∈
R|U | such that the solution αU satisfies
∀j ∈ {1, . . . , n}, LU αjU + B > αjS + Γ = 0.
(12)
The entries of Γ are the Lagrange multipliers for the constraints that the mattes sum up to one at each pixel. We now
assume that the mattes sum up to one, and show that Γ= 0.
This is equivalent to proving that the constraint (b) in (11) is
automatically
Pn satisfied without explicitly enforcing it. Assume that j=1 αjU = 1. Also, recall that by construction,
Pn
j
j=1 αS = 1. Now summing up the KKT conditions in
(12) across all the image layers gives us
n
X
[LU αjU + B > αjS + Γ] = LU 1 + B > 1 + nΓ = 0 (13)
j=1
Recall that the vector of 1s lies in the null space of L,
and hence we have LU 1 + B > 1 = 0 Therefore, we can
conclude from (13) that Γ = 0. This implies that the solution automatically satisfies the constraint that the mattes
sum up to 1 at each unmarked pixel. In fact, if one does not
enforce constraint (a) in (11), then the solution is the same,
irrespective of whether constraint (b) is enforced or not.
This is an important result which follows intuitively from
the well known Superposition Theorem in electrical network theory [4]. It is important to notice that the estimation of mattes for the ith image layer is posed as the solution
to the Dirichlet problem with boundary conditions. In fact,
this is a simple extension of [15], where the ith image layer
is treated as the foreground and the rest of the layers are
treated as the background. Theorem 1 guarantees that the
estimation of the mattes for any n − 1 of the n image layers
automatically determines the mattes for the remaining layer.
Wang et al. [15] claim that this optimization scheme
finds analogies in the theory of random walks. Therefore,
a standard result states that the mattes are constrained to lie
between 0 and 1. However, this result is derived specifically for graphs with positive edge weights. This is not true
for the graphs constructed for the matting problem since the
graphs’ edge weights are both positive and negative.
In fact, Figure 3 gives an example where the system has
a positive semi-definite Laplacian matrix, but the obtained
potentials (mattes) do not all lie in [0, 1]. Note that xc takes
the value 43 on switching the locations of the voltage source
and the ground. An easy fix employed by [8] and [15] to resolve this issue, is to clip all the alpha values such that they
take values in [0, 1]. This scheme might give results that are
visually pleasing, but they might not obey the KKT conditions. Moreover, the postprocessed mattes at each pixel
might not sum up to 1 across all the layers anymore.
the general scheme for n ≥ 2 layers, we consider the simple case of estimating the mattes for n = 2 layers. In this
case, the mattes at the unmarked pixels are estimated as the
solution of the quadratic programming problem given by
h >
i
>
αjU = argmin αjU LU αjU + 2αjU BαS + α>
S LS αS ,
αjU
subject to 0 ≤ αjU ≤ 1, for j = 1, 2.
(14)
Note that the set of feasible solutions for the mattes of
the unmarked pixels is given by [0, 1]|U | , which is a compact
convex set. Also, the objective function being minimized in
(14) is a convex function of the alpha mattes. Hence, the
optimization problem is guaranteed to have a unique minimizer. In this case, the KKT conditions guarantee the existence of |U| × |U| diagonal matrices {Λj0 }2j=1 and {Λj1 }2j=1 ,
with non-positive entries such that the solution αjU for the
j th image layer satisfies
LU αU + B > αS + Λ0 − Λ1 = 0, and
Λj0 αjU = 0, Λji (1 − αjU ) = 0, j = 1, 2.
(15)
In what follows, we denote the (i, i) diagonal entry of
j=1,2
{Λjk }k=0,1
as λjki ≤ 0. Notice that if the matte αij at the
ith unmarked pixel with respect to the j th image layer lies
between 0 and 1, then the KKT conditions state that the associated slack variables λj0i and λj1i are equal to zero. Consequently, the ith row P
of the first equation in (15) gives us
the result that αij =
v k ∈N (v k )
P
wik αjk
v k ∈N (v i )
wik
. In particular, this
implies that αi can be expressed as the weighted average of
the mattes of the neighboring pixels. But, when αij = 0 or
αij = 1, we see that αij is not the weighted average of the
mattes at the neighboring pixels and this is accounted for by
the slack variables λj0i and λj1i . Recall from Section 3.1 that
the methods of [8] and [15] clip the estimated mattes to take
values between 0 and 1, when necessary. This step is somewhat equivalent to using such slack variables. However, one
also needs to re-adjust the alpha mattes at the unmarked pixels that lie in the neighborhoods of the pixels whose mattes
are clipped. In particular, they need to be adjusted so that
they still satisfy the KKT conditions. However, this step is
neglected in the methods of [8] and [15].

3.2. Image matting for multiple layers, without constraining the limits of the mattes
In this section, we discuss the properties of the solution
of Algorithm 1, when we do not enforce the constraint that
the mattes sum up to 1 at each pixel. Before we discuss

0.75 −1 0.25
2
−1 
L =  −1
0.25 −1 0.75
Figure 3. Example where all the potentials do not lie in [0, 1].
We now consider the problem of solving this constrained
optimization for n ≥ 2 image layers. In particular, we want
to analyze whether the sum of the mattes at each pixel is
constrained to be 1 or not. Theorem 2 states an important
property of this optimization that the sum of mattes is enforced to be 1, only when n = 2 and not otherwise.
Theorem 2 The alpha mattes obtained as the solution to
Algorithm 1 without imposing the constraint that they sum
up to 1 at each pixel, are in fact naturally constrained to
sum up to 1 at each pixel when n = 2, but not when n > 2.
Proof. Consider the following optimization problem for estimating the mattes at the unmarked nodes.
{αjU }nj=1
= argmin
n h
X
{αjU }n
j=1 j=1
>
αjU LU αjU
s.t. 0 ≤ {αjU }nj=1 ≤ 1, and
n
X
+
>
2αjU BαjS
αjU = 1.
i
,
(16)
j=1
As discussed earlier, the KKT conditions guarantee the
existence of diagonal matrices {Λj0 }ni=1 and {Λj1 }ni=1 , and
a vector Γ, such that the solution {αjU }nj=1 satisfies
∀1 ≤ j ≤ n, LU αjU + B > αjS + Λj0 − Λj1 + Γ = 0, and
∀1 ≤ j ≤ n, Λj0 αju = 0, Λji (1 − αjU ) = 0.
j=1
Λj0 −
n
X
Λj1 + nΓ = 0
4. Experiments
We denote the variants of Algorithm 1 discussed in Sections 3.1 and 3.2 as Algorithm 1a and Algorithm 1b respectively. We now evaluate these algorithms’ performance on
the database proposed by Levin et al. [9]. The database
consists of images of three different toys, namely a lion, a
monkey and a monster. Each toy is imaged against 8 different backgrounds and so the database has 24 distinct images.
The database provides the ground truth mattes for the toys.
All the images are of size 560 × 820 and are normalized to
take values between 0 and 1. In our experiments, the Matting Laplacian for each image is estimated as described in
Section 2.1, using windows of size 3×3 and with = 10−6 .
(17)
4.1. Quantitative Evaluation for n = 2 Layers
Γ acts as a Lagrange multiplier for the constraint that the
mattes sum up to 1. We now assume that the mattes sum
up to 1 at each pixel and inspect the value of Γ. Essentially,
if Γ = 0, it means that the estimated mattes are naturally
constrained to sum up to 1 at each pixel. Assuming that the
mattes sum up to 1, we sum up the first expression in (17)
across all the image layers to conclude that
n
X
However, this case cannot arise when n = 2. In fact
for n = 2, we see that both the mattes take values in
either (0, 1) or in {0, 1}. In the first case, we see that
λ10i = λ20i = λ11i = λ11i = 0. For the second case, assume without loss of generality that αi1 = 0 and αi2 = 1.
We then have λ11i = λ20i = 0. Moreover, we notice that setting λ10i = λ21i satisfies the KKT conditions. Hence, we can
conclude that γi = 0 in both cases. Consequently, the mattes are naturally constrained to sum up to 1 at each pixel,
for n = 2.
(18)
j=1
This relationship is derived using the fact that LU 1 +
B > 1 = 0. We now inspect the mattes at a pixel v i ∈ U
and analyze all the possible solutions. Firstly, consider the
case when the mattes {αij }kj=1 of the first k < n − 1 image
layers are identically equal to 0 and the remaining mattes
{αij }nj=k+1 take values in (0, 1). We can always consider
such reordering of the image layers without any loss of generality. We see that the variables {λj1i }nj=1 and {λj0i }nj=k+1
are equal to zero. Substituting these values in (18), gives
Pk
γi + j=1 λj0i = 0. Therefore, γi is non-negative, and
is equal to zero if and only if the variables {λj0i }kj=1 are
identically equal to zero. Since these variables are not constrained to be zero valued, we see that γi is not equal to
zero. It is precisely due to this case exactly that the mattes
are not constrained to sum up to 1, when n > 2.
We present a quantitative evaluation of our algorithms
on the described database. We follow the same procedure
as in [9] and define the trimap by considering the pixels
whose true mattes lie between 0.05 and 0.95 and dilating
this region by 4 pixels. The algorithms performance is then
evaluated by considering the SSD (sum of squared differences) error between the estimated mattes and the ground
truth. Table 1 shows the statistics of performance.
We see that the performance of both variants is on par.
Recall that the numerical framework of Algorithm 1a is very
similar to that of the closed form matting solution proposed
by Levin et al. [8]. Consequently, both these algorithms
would have similar performances. The analysis in [9] shows
that the algorithm of [8] performs better than the matting
algorithms of [12, 5, 14] and is outperformed only by the
Spectral Matting algorithm. Therefore, we conclude that
both variants of Algorithm 1 potentially match up to the
state of the art algorithms.
Algorithm 1a
Toy
Mean Median
Lion
532.06 486.24
Monkey 273.34 277.03
Monster 822.07 958.06
Algorithm 1b
Toy
Mean Median
Lion
547.00 509.18
Monkey 273.40 278.00
Monster 796.03 864.42
Table 1. Statistics of SSD errors in estimating mattes of 2 layers.
4.2. Qualitative Evaluation for n ≥ 2 Layers
We now present results of using Algorithm 1a to extract
mattes for n > 2 image layers with low levels of user interaction. Since there is no ground truth data available for
multiple layers, we validate the performance visually. In
particular, we first estimate the matte for the multiple layers
using Algorithm 1a. We then use the scheme outlined in [8]
to reconstruct the intensities of each image layer. Finally,
we show the contribution of each image layer to the query
image. Figures 4 and 5 show the results for the extraction of
mattes for 3 and 5 image layers, respectively. The scribbles
are color coded to demarcate between the different image
layers. We note that the results visually appear to be quite
good. The results highlight a disadvantage of our method
that the mattes can be erroneous if the intensities near the
boundary of adjoining image layers are similar. This problem can be fixed by adding more seeds, and hence at an
expense of increased levels of user interaction.
5. Conclusions
We have proposed a constrained optimization problem
to extract accurate mattes for multiple image layers with
low levels of user interaction. We discussed two variants of
the method where we relaxed the constraints of the problem and presented a theoretical analysis of the properties of
the estimated mattes. Experimental evaluation of both these
variants shows that they provide visually pleasing results.
Acknowledgments. Work supported by JHU startup funds,
by grants NSF CAREER ISS-0447739, NSF EHS-0509101,
and ONR N00014-05-1083, and by contract APL-934652.
[3] Y.-Y. Chuang, B. Curless, D. Salesin, and R. Szeliski. A Bayesian
Approach to Digital Matting. In CVPR (2), pages 264–271, 2001.
[4] P. Doyle and L. Snell. Random walks and electric networks. Number 22 in The Carus Mathematical Monographs. The Mathematical
Society of America, 1984.
[5] L. Grady, T. Schiwietz, S. Aharon, and R. Westermann. Random
Walks for Interactive Alpha-Matting. In ICVIIP, pages 423–429,
September 2005.
[6] Y. Guan, W. Chen, X. Liang, Z. Ding, and Q. Peng. Easy Matting:
A Stroke Based Approach for Continuous Image Matting. In Eurographics 2006, volume 25, pages 567–576, 2006.
[7] H. W. Kuhn and A. W. Tucker. Nonlinear Programming. In 2nd
Berkeley Symposium, pages 481–492, 1951.
[8] A. Levin, D. Lischinski, and Y. Weiss. A Closed Form Solution to
Natural Image Matting. IEEE Trans. on PAMI, 30(2): 228–242, 2008
[9] A. Levin, A. Rav-Acha, and D. Lischinski. Spectral Matting. In
CVPR, Pages 1–8, 2007.
[10] C. Rother, V. Kolmogorov, and A. Blake. ”Grabcut”: Interactive
Foreground Extraction Using Iterated Graph Cuts. ACM Trans.
Graph., 23(3):309–314, 2004.
[11] M. A. Ruzon and C. Tomasi. Alpha Estimation in Natural Images.
In CVPR, pages 1018–1025, 2000.
[12] J. Sun, J. Jia, C.-K. Tang, and H.-Y. Shum. Poisson Matting. ACM
Trans. Graph., 23(3):315–321, 2004.
[13] J. Wang, M. Agrawala, and M. F. Cohen. Soft Scissors: An Interactive Tool for Realtime High Quality Matting. ACM Trans. Graph.,
26(3):9, 2007.
[14] J. Wang and M. F. Cohen. An Iterative Optimization Approach for
Unified Image Segmentation and Matting. In ICCV, pages 936–943,
2005.
[15] J. Wang and M. F. Cohen. Optimized Color Sampling for Robust
Matting. In CVPR, Pages 1–8, 2007.
[16] J. Wang and M. F. Cohen. Simultaneous Matting and Compositing.
In CVPR, Pages 1–8, 2007.
References
[1] X. Bai and G. Sapiro. A Geodesic Framework for Fast Interactive
Image and Video Segmentation and Matting. In ICCV, pages 1–8,
2007.
[2] A. Berman, A. Dadourian, and P. Vlahos. Method for Removing
from an Image the Background Surrounding a Selected Object. US
Patent no. 6,134,346, October 2000.
(a) Scribbles for marking seeds
(b) αmonster Fmonster
(c) αsky Fsky
(d) αwater Fwater
Figure 4. Extracting 3 image layers using Algorithm 1a.
(a) Scribbles for marking seeds
(b) αlion Flion
(c) αsky Fsky
(d) αhill Fhill
(e) αsand Fsand
(f) αwater Fwater
Figure 5. Extracting 5 image layers using Algorithm 1a.
Download