Higher-Order Correlation Clustering for Image Segmentation

Sungwoong Kim
Department of EE, KAIST
Daejeon, South Korea
sungwoong.kim01@gmail.com
Sebastian Nowozin
Microsoft Research Cambridge
Cambridge, UK
Sebastian.Nowozin@microsoft.com
Pushmeet Kohli
Microsoft Research Cambridge
Cambridge, UK
pkohli@microsoft.com
Chang D. Yoo
Department of EE, KAIST
Daejeon, South Korea
cdyoo@ee.kaist.ac.kr
Abstract
For many of the state-of-the-art computer vision algorithms, image segmentation is an important preprocessing step. As such, several image segmentation algorithms have been proposed; however, they are used with certain reservations due to their high computational load and many hand-tuning parameters. Correlation clustering, a graph-partitioning algorithm often used in natural language processing and document clustering, has the potential to perform better than previously proposed image segmentation algorithms. We improve the basic correlation clustering formulation by taking into account higher-order cluster relationships. This improves clustering in the presence of local boundary ambiguities. We first apply pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels. Fast inference is possible by linear programming relaxation, and effective parameter learning is possible by the structured support vector machine. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.
1 Introduction
Image segmentation, the partitioning of an image into disjoint regions such that each region is homogeneous, is an important preprocessing step for many state-of-the-art algorithms for high-level image/scene understanding, for three reasons. First, the coherent support of a region, commonly assumed to carry a single label, serves as a good prior for many labeling tasks. Second, these coherent regions allow more consistent feature extraction that can incorporate surrounding contextual information by pooling many feature responses over the region. Third, compared to pixels, a small number of larger homogeneous regions significantly reduces the computational cost of a subsequent labeling task.
Image segmentation algorithms can be categorized into non-graph-based and graph-based algorithms. Well-known non-graph-based algorithms, represented by mode-seeking algorithms, include K-means [1], mean-shift [2], and EM [3], while well-known graph-based algorithms include min-cuts [4], normalized cuts [5], and the Felzenszwalb-Huttenlocher (FH) segmentation algorithm [6]. In comparison to non-graph-based segmentations, graph-based segmentations have been shown to produce consistent segmentations by adaptively balancing local judgements of similarity [7]. Moreover, graph-based segmentation algorithms with global objective functions, such as min-cuts and normalized cuts, have been shown to perform better than the FH algorithm, which is based on a local objective function, since the global-objective algorithms benefit from the global nature of the information [7]. However, in contrast to min-cuts and normalized cuts, which are node-labeling algorithms, the FH algorithm benefits from edge labeling in that it leads to faster inference and does not require a pre-specified number of segments per image [7].
Correlation clustering is a graph-partitioning algorithm [8] that simultaneously maximizes intra-cluster similarity and inter-cluster dissimilarity by optimizing a global objective (discriminant) function. In comparison with previous image segmentation algorithms, correlation clustering is a graph-based, global-objective, edge-labeling algorithm and therefore has the potential to perform better for image segmentation. Furthermore, correlation clustering leads to a linear discriminant function, which allows approximate polynomial-time inference by linear programming (LP) and large margin training based on the structured support vector machine (S-SVM) [9]. A framework that uses the S-SVM for training the parameters of correlation clustering has been considered previously by Finley et al. [10]; however, that framework was applied to noun-phrase and news-article clustering. Taskar derived a max-margin formulation for learning the edge scores for correlation clustering [11]; however, his learning criterion is different from the S-SVM and is limited to applications involving two different segmentations of a single image. Furthermore, Taskar does not provide any experimental comparisons or quantitative results.
Even though the previous (pairwise) correlation clustering can consider global aspects of an image using the discriminatively-trained discriminant function, it is limited in resolving the segment boundary ambiguities caused by the purely local pairwise relations captured by the pairwise graph. Therefore, to capture long-range dependencies of distant nodes in a global context, this paper proposes a novel higher-order correlation clustering that incorporates higher-order relations. We first apply pairwise correlation clustering to image segmentation over a pairwise superpixel graph and then develop higher-order correlation clustering over a hypergraph that considers higher-order relations among superpixels.
The proposed higher-order correlation clustering is defined over a hypergraph in which an edge can connect two or more nodes [12]. Hypergraphs have been used previously to lift certain limitations of conventional pairwise graphs [13, 14, 15]. However, previously proposed hypergraphs for image segmentation are restricted to partitioning based on generalizations of the normalized cut framework, which suffers from a number of problems. First, inference is slow and difficult, especially with increasing graph size. A number of algorithms to approximate the inference process have been introduced, based on a coarsening algorithm [14] and hypergraph Laplacian matrices [13]; these are heuristic approaches and therefore sub-optimal. Second, incorporating a supervised learning algorithm for parameter estimation under the spectral hypergraph partitioning framework is difficult. This is in line with the difficulties in learning spectral graph partitioning, which requires a complex and unstable eigenvector approximation that must be differentiable [16, 17]. Third, the use of rich region-based features is restricted: almost all previous hypergraph-based image segmentation algorithms use only color variances as region features.
The proposed higher-order correlation clustering overcomes all of these problems, since it is a generalization of pairwise correlation clustering, and makes it possible to take full advantage of a hypergraph. The proposed higher-order correlation clustering algorithm takes a hypergraph as its input and leads to a linear discriminant function. A rich feature vector is defined based on several visual cues involving higher-order relations among superpixels. For fast inference, an LP relaxation is used to approximately solve the higher-order correlation clustering problem, and for supervised training of the parameter vector by the S-SVM, we apply a decomposable structured loss function to handle unbalanced classes. We incorporate this loss function into the cutting plane procedure for S-SVM training. Experimental results on various datasets show that the proposed higher-order correlation clustering outperforms other state-of-the-art image segmentation algorithms.
The rest of the paper is organized as follows. Section 2 presents the higher-order correlation clustering for image segmentation. Section 3 describes large margin training for supervised image segmentation based on the S-SVM and the cutting plane algorithm. A number of experimental and
comparative results are presented and discussed in Section 4, followed by a conclusion in Section 5.
Figure 1: Illustrations of a part of (a) the pairwise graph and (b) the triplet graph built on superpixels, with edge labels y_{ij}, y_{jk}, y_{ik} and triplet label y_{ijk}.
2 Higher-order correlation clustering
The proposed image segmentation is based on superpixels, which are small coherent regions preserving almost all boundaries between different regions; superpixels significantly reduce the computational cost and allow feature extraction to be conducted over larger homogeneous regions. The proposed correlation clustering merges superpixels into disjoint homogeneous regions over a superpixel graph.
2.1 Pairwise correlation clustering over a pairwise superpixel graph
Define a pairwise undirected graph G = (V, E) where a node corresponds to a superpixel and a link between adjacent superpixels corresponds to an edge (see Figure 1(a)). A binary label y_{jk} for an edge (j, k) ∈ E between nodes j and k is defined such that

    y_{jk} = \begin{cases} 1, & \text{if nodes } j \text{ and } k \text{ belong to the same region,} \\ 0, & \text{otherwise.} \end{cases}    (1)
A discriminant function, which is the negative energy function, is defined over an image x and label
y of all edges as
    F(x, y; w) = \sum_{(j,k) \in E} \mathrm{Sim}_w(x, j, k)\, y_{jk} = \sum_{(j,k) \in E} \langle w, \phi_{jk}(x) \rangle\, y_{jk} = \Big\langle w, \sum_{(j,k) \in E} \phi_{jk}(x)\, y_{jk} \Big\rangle = \langle w, \Phi(x, y) \rangle    (2)
where the similarity measure between nodes j and k, Sim_w(x, j, k), is parameterized by w and takes values of both signs, such that a large positive value means strong similarity while a large negative value means a high degree of dissimilarity. Note that the discriminant function F(x, y; w) is assumed to be linear in both the parameter vector w and the joint feature map Φ(x, y), and ϕ_{jk}(x) is a pairwise feature vector which reflects the correspondence between the j-th and the k-th superpixels. Image segmentation then amounts to inferring the edge labeling ŷ over the pairwise superpixel graph G by maximizing F:
    \hat{y} = \operatorname*{argmax}_{y \in Y} F(x, y; w)    (3)
where Y is the set of labelings in {0, 1}^{E} that correspond to a valid segmentation, the so-called multicut polytope. However, solving (3) over this Y is generally NP-hard. Therefore, we approximate Y by means of a common multicut LP relaxation [18] with the following two families of constraints: (1) cycle inequalities and (2) odd-wheel inequalities. When producing segmentation results from the approximate LP solutions, we simply take the floor of the fractionally-predicted label of each edge independently, which yields valid integer solutions that may be sub-optimal.
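To make the rounding step concrete, here is a minimal Python/NumPy sketch (the function and variable names are ours, not from the paper): it keeps only edges whose relaxed label is (numerically) one, i.e. the floor of each fractional label, and then reads segments off as connected components of the kept edges.

```python
import numpy as np

def segments_from_edge_labels(num_superpixels, edges, y_frac, tol=1e-6):
    """Round fractional edge labels (floor, i.e. keep only labels equal to 1)
    and return a segment id per superpixel via connected components."""
    parent = list(range(num_superpixels))   # union-find over superpixels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    # An edge label of 1 means "same region"; merge its endpoints.
    for (j, k), y in zip(edges, y_frac):
        if y >= 1.0 - tol:                  # floor: only (near-)integral 1s survive
            rj, rk = find(j), find(k)
            if rj != rk:
                parent[rj] = rk

    roots = [find(i) for i in range(num_superpixels)]
    _, segment_ids = np.unique(roots, return_inverse=True)
    return segment_ids

# Tiny usage example with 4 superpixels and 3 boundary edges.
edges = [(0, 1), (1, 2), (2, 3)]
y_frac = [1.0, 0.4, 1.0]                    # fractional LP output
print(segments_from_edge_labels(4, edges, y_frac))   # -> [0 0 1 1]
```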
Even though this pairwise correlation clustering takes a rich pairwise feature vector and a trained
parameter vector (which will be presented later), it often produces incorrectly predicted segments
due to the segment boundary ambiguities caused by limited pairwise relations of neighboring superpixels (see Figure 2). Therefore, to incorporate higher-order relations, we develop higher-order
correlation clustering by generalizing the correlation clustering over a hypergraph.
2.2 Higher-order correlation clustering over a hypergraph
The proposed higher-order correlation clustering is defined over a hypergraph in which an edge
called a hyperedge can connect two or more nodes.

Figure 2: Example of a segmentation result by pairwise correlation clustering. (a) Original image. (b) Ground-truth. (c) Superpixels. (d) Segments obtained by pairwise correlation clustering.

For example, as shown in Figure 1(b), one can introduce a binary label for adjacent vertices forming a triplet such that y_{ijk} = 1 if all vertices in the triplet {i, j, k} are in the same cluster and y_{ijk} = 0 otherwise. Define a hypergraph
HG = (V, E), where V is a set of nodes (superpixels) and E is a set of hyperedges (subsets of V) such that ∪_{e∈E} e = V. Here, a hyperedge e has at least two nodes, i.e. |e| ≥ 2. Therefore, the hyperedge set E can be divided into two disjoint subsets: the pairwise edge set E_p = {e ∈ E : |e| = 2} and the higher-order edge set E_h = {e ∈ E : |e| > 2}, such that E_p ∪ E_h = E. Note that in the proposed hypergraph for higher-order correlation clustering, all hyperedges containing just two nodes (∀e_p ∈ E_p) connect adjacent superpixels. The pairwise superpixel graph is thus a special hypergraph in which all hyperedges contain just two (neighboring) superpixels: E_p = E. A binary label y_e for a hyperedge e ∈ E is defined such that

    y_e = \begin{cases} 1, & \text{if all nodes in } e \text{ belong to the same region,} \\ 0, & \text{otherwise.} \end{cases}    (4)
Similar to the pairwise correlation clustering, a linear discriminant function is defined over an image
x and label y of all hyperedges as
    F(x, y; w) = \sum_{e \in E} \mathrm{Hom}_w(x, e)\, y_e = \sum_{e \in E} \langle w, \phi_e(x) \rangle\, y_e = \sum_{e_p \in E_p} \langle w_p, \phi_{e_p}(x) \rangle\, y_{e_p} + \sum_{e_h \in E_h} \langle w_h, \phi_{e_h}(x) \rangle\, y_{e_h} = \langle w, \Phi(x, y) \rangle    (5)
where the homogeneity measure among the nodes in e, Hom_w(x, e), is also the inner product of the parameter vector w and the feature vector ϕ_e(x), and takes values of both signs such that a large positive value means strong homogeneity while a large negative value means a high degree of non-homogeneity. Note that the proposed discriminant function for higher-order correlation clustering is decomposed into two terms by assigning different parameter vectors to the pairwise edge set E_p and the higher-order edge set E_h, such that w = [w_p^T, w_h^T]^T. Thus, in addition to the pairwise similarity between neighboring superpixels, the proposed higher-order correlation clustering considers broad homogeneous regions reflecting higher-order relations among superpixels.
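As a small illustration of this decomposition, the following NumPy sketch (our own helper names and toy dimensions, not the authors' code) evaluates the discriminant function in (5) as the sum of pairwise and higher-order edge scores.

```python
import numpy as np

def discriminant_score(w_p, w_h, phi_pairwise, y_pairwise, phi_higher, y_higher):
    """F(x, y; w) for the decomposed parameter vector w = [w_p; w_h]:
    sum over pairwise edges of <w_p, phi_ep(x)> * y_ep plus
    sum over higher-order edges of <w_h, phi_eh(x)> * y_eh."""
    joint_p = phi_pairwise.T @ y_pairwise      # sum_ep phi_ep(x) * y_ep
    joint_h = phi_higher.T @ y_higher          # sum_eh phi_eh(x) * y_eh
    return float(w_p @ joint_p + w_h @ joint_h)

# Toy example: in the paper the pairwise features are 611-dim and the
# higher-order features 160-dim; tiny dimensions are used here for brevity.
rng = np.random.default_rng(0)
phi_p = rng.normal(size=(3, 4))    # one feature row per pairwise edge
phi_h = rng.normal(size=(2, 5))    # one feature row per higher-order edge
y_p = np.array([1.0, 0.0, 1.0])
y_h = np.array([1.0, 0.0])
print(discriminant_score(rng.normal(size=4), rng.normal(size=5),
                         phi_p, y_p, phi_h, y_h))
```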
Now the problem is how to build our hypergraph from a given image. Here, we use unsupervised
multiple partitionings (quantizations) from baseline superpixels. We obtain unsupervised multiple
partitionings by merging not pixels but superpixels with different image quantizations using the
ultrametric contour maps [19]. For example, in Figure 3, there are three region layers, one superpixel
(pairwise) layer and two higher-order layers, from which a hypergraph is constructed by defining
hyperedges as follows: all edges (black lines) in the pairwise superpixel graph from the first layer are incorporated into the pairwise edge set E_p, while hyperedges (yellow lines) corresponding to regions (groups of superpixels) in the second and third layers are included in the higher-order edge set E_h. Note that we could further decompose the higher-order term in (5) into two terms associated with the second layer and the third layer, respectively, by assigning different parameter vectors; however, for simplicity, this paper aggregates the higher-order edges from all higher-order layers into a single higher-order edge set with a single shared parameter vector.
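The bookkeeping behind this construction can be sketched as follows, assuming each higher-order layer is already given as a mapping from baseline superpixels to coarse regions (the paper obtains these layers from ultrametric contour maps [19]; the data layout and names below are our own).

```python
def build_hyperedges(pairwise_edges, coarse_layers):
    """Build the hyperedge sets of the proposed hypergraph.

    pairwise_edges: list of (j, k) pairs of adjacent baseline superpixels.
    coarse_layers:  list of dicts, one per higher-order layer, mapping each
                    baseline superpixel id to the id of its coarse region.
    Returns (E_p, E_h): pairwise edges and higher-order edges, each hyperedge
    being a frozenset of superpixel ids (higher-order edges need |e| > 2)."""
    E_p = [frozenset(e) for e in pairwise_edges]
    E_h = []
    for layer in coarse_layers:
        regions = {}
        for superpixel, region in layer.items():
            regions.setdefault(region, set()).add(superpixel)
        E_h.extend(frozenset(s) for s in regions.values() if len(s) > 2)
    return E_p, E_h

# Toy example: 5 superpixels, one coarse layer grouping {0,1,2} and {3,4}.
E_p, E_h = build_hyperedges([(0, 1), (1, 2), (2, 3), (3, 4)],
                            [{0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}])
print(E_h)   # [frozenset({0, 1, 2})]  -- region 'B' has only 2 superpixels
```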
2.2.1 LP relaxation for inference
Image segmentation then amounts to inferring the hyperedge labeling ŷ over the hypergraph HG by maximizing the discriminant function F:

    \hat{y} = \operatorname*{argmax}_{y \in Y} F(x, y; w)    (6)

Figure 3: Hypergraph construction from multiple partitionings. (a) Multiple partitionings from baseline superpixels, consisting of a superpixel (pairwise) layer and two higher-order layers. (b) Hyperedge (yellow line) corresponding to a region in the second layer. (c) Hyperedge (yellow line) corresponding to a region in the third layer.
where Y is again the set of labelings in {0, 1}^{E} that correspond to a valid segmentation. Since the inference problem (6) is also NP-hard, we relax Y by (facet-defining) linear inequalities. In addition to the constraints on pairwise labels, the cycle inequalities and odd-wheel inequalities from pairwise correlation clustering, we add constraints on the labels of the higher-order edges, called higher-order inequalities, for a valid segmentation: a region whose higher-order edge is labeled zero (non-homogeneous) cannot have all of its pairwise labels equal to one, and when a region is labeled one (homogeneous), all pairwise labels in that region must be one.
These higher-order inequalities can be formulated as
    y_{e_h} \le y_{e_p}, \quad \forall e_p \in E_p \text{ with } e_p \subset e_h, \qquad (1 - y_{e_h}) \le \sum_{e_p \in E_p :\, e_p \subset e_h} (1 - y_{e_p}).    (7)
Indeed, the LP relaxation to approximately solve (6) is formulated as

    \operatorname*{argmax}_{y} \; \sum_{e_p \in E_p} \langle w_p, \phi_{e_p}(x) \rangle\, y_{e_p} + \sum_{e_h \in E_h} \langle w_h, \phi_{e_h}(x) \rangle\, y_{e_h}    (8)
    \text{s.t.} \;\; \forall e \in E\,(= E_p \cup E_h): \; y_e \in [0, 1];
                \;\; \forall e_p \in E_p: \; \text{cycle inequalities, odd-wheel inequalities [18]};
                \;\; \forall e_h \in E_h: \; \text{higher-order inequalities (7)}.
Note that the proposed higher-order correlation clustering follows the concept of soft constraints:
superpixels within a hyperedge are encouraged to merge if a hyperedge is highly homogeneous.
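As a hedged sketch of how the higher-order inequalities (7) enter the LP (8), the code below (our own helper, not the authors' implementation) builds the corresponding constraint rows and solves the relaxation with scipy.optimize.linprog; for brevity it keeps only the box constraints and (7), whereas the cycle and odd-wheel inequalities would in practice be added as well, e.g. by a separation routine.

```python
import numpy as np
from scipy.optimize import linprog

def relaxed_higher_order_cc(scores_p, scores_h, pairwise_in_hyperedge):
    """Relaxed higher-order correlation clustering with only the box
    constraints and the higher-order inequalities (7).

    scores_p, scores_h:     <w_p, phi_ep(x)> and <w_h, phi_eh(x)> per edge.
    pairwise_in_hyperedge:  for each higher-order edge, the indices of the
                            pairwise edges contained in it."""
    n_p, n_h = len(scores_p), len(scores_h)
    c = -np.concatenate([scores_p, scores_h])      # linprog minimizes
    A, b = [], []
    for h, members in enumerate(pairwise_in_hyperedge):
        for p in members:                          # y_eh - y_ep <= 0
            row = np.zeros(n_p + n_h)
            row[n_p + h], row[p] = 1.0, -1.0
            A.append(row)
            b.append(0.0)
        row = np.zeros(n_p + n_h)                  # sum_p y_ep - y_eh <= |members| - 1
        row[list(members)] = 1.0
        row[n_p + h] = -1.0
        A.append(row)
        b.append(len(members) - 1.0)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(0.0, 1.0)] * (n_p + n_h), method="highs")
    return res.x[:n_p], res.x[n_p:]

# Toy example: 3 pairwise edges contained in one hyperedge.
y_p, y_h = relaxed_higher_order_cc(np.array([2.0, 1.5, -0.5]),
                                   np.array([1.0]), [[0, 1, 2]])
print(np.round(y_p, 3), np.round(y_h, 3))   # homogeneous hyperedge pulls labels to 1
```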
2.2.2 Feature vector
We construct a 771-dimensional feature vector ϕ_e(x) by concatenating several visual cues with different quantization levels and thresholds. The pairwise feature vector ϕ_{e_p}(x) reflects the correspondence between neighboring superpixels, and the higher-order feature vector ϕ_{e_h}(x) characterizes more complex relations among superpixels in a broader region to measure homogeneity. The magnitude of w determines the importance of each feature, and this importance is task-dependent. Thus, w is estimated by the supervised training described in Section 3.
1. Pairwise feature vector (611-dim): ϕ_{e_p} = [ϕ^c_{e_p}; ϕ^t_{e_p}; ϕ^s_{e_p}; ϕ^e_{e_p}; ϕ^v_{e_p}; 1].
   • Color difference ϕ^c: The 26 RGB/HSV color distances (absolute differences, χ²-distances, earth mover's distances) between two adjacent superpixels.
   • Texture difference ϕ^t: The 64 texture distances (absolute differences, χ²-distances, earth mover's distances) between two adjacent superpixels using 15 Leung-Malik (LM) filter banks [19].
   • Shape/location difference ϕ^s: The 5-dimensional shape/location feature proposed in [20].
   • Edge strength ϕ^e: The 1-of-15 coding of the quantized edge strength proposed in [19].
   • Joint visual word posterior ϕ^v: The 100-dimensional vector holding the joint visual word posteriors for a pair of neighboring superpixels using 10 visual words and the 400-dimensional vector holding the joint posteriors based on 20 visual words [21].
2. Higher-order feature vector (160-dim): ϕ_{e_h} = [ϕ^{va}_{e_h}; ϕ^e_{e_h}; ϕ^{tm}_{e_h}; 1].
   • Variance ϕ^{va}: The 14 color variances and 30 texture variances among superpixels in a hyperedge.
   • Edge strength ϕ^e: The 1-of-15 coding of the quantized edge strength proposed in [19].
   • Template matching score ϕ^{tm}: The color/texture and shape/location features of all regions in the training images are clustered using k-means with k = 100 to obtain 100 representative templates of distinct regions. The 100-dimensional template matching feature vector is composed of the matching scores between a region defined by a hyperedge and the templates using the Gaussian RBF kernel.
In each feature vector, a bias term (= 1) is appended so that the similarity/homogeneity measure can take either positive or negative values.
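To illustrate the variance cue ϕ^{va}, here is a small sketch (our own simplification; the paper's 14 color and 30 texture variances use its specific channels and quantizations) that computes per-channel variances of superpixel descriptors within a hyperedge.

```python
import numpy as np

def hyperedge_variance_feature(descriptors, hyperedge):
    """Simplified variance cue for one hyperedge: per-channel variance of the
    superpixel descriptors (e.g. mean color / texture responses) over the
    superpixels contained in the hyperedge.  Low variance suggests a
    homogeneous region."""
    D = descriptors[list(hyperedge)]          # (|e|, num_channels)
    return D.var(axis=0)

# Toy example: 5 superpixels described by 3 channels each.
descs = np.array([[0.1, 0.2, 0.9],
                  [0.1, 0.2, 0.8],
                  [0.2, 0.1, 0.9],
                  [0.9, 0.8, 0.1],
                  [0.8, 0.9, 0.2]])
print(hyperedge_variance_feature(descs, frozenset({0, 1, 2})))   # small variances
print(hyperedge_variance_feature(descs, frozenset({0, 3, 4})))   # large variances
```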
3 Structural learning
The proposed discriminant function is defined over the superpixel graph, and therefore the ground-truth segmentation needs to be transformed into ground-truth edge labels on the superpixel graph. For this, we first assign a single dominant segment label to each superpixel by majority voting over the superpixel's constituent pixels and then obtain the ground-truth edge labels.
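A minimal sketch of this ground-truth construction, assuming integer superpixel and ground-truth segment maps of the same size (array conventions and names are ours):

```python
import numpy as np

def ground_truth_edge_labels(superpixel_map, gt_segment_map, edges):
    """Assign each superpixel its dominant ground-truth segment by majority
    voting over its pixels, then label an edge 1 iff its two superpixels get
    the same dominant segment."""
    sp_ids = superpixel_map.ravel()
    gt_ids = gt_segment_map.ravel()
    n_sp, n_gt = sp_ids.max() + 1, gt_ids.max() + 1
    # votes[s, g] = number of pixels of superpixel s carrying ground-truth label g
    votes = np.zeros((n_sp, n_gt), dtype=np.int64)
    np.add.at(votes, (sp_ids, gt_ids), 1)
    dominant = votes.argmax(axis=1)
    return np.array([1 if dominant[j] == dominant[k] else 0 for j, k in edges])

# Toy example: a 2x4 image with four superpixels and two ground-truth segments.
sp = np.array([[0, 0, 1, 1],
               [2, 2, 3, 3]])
gt = np.array([[5, 5, 7, 7],
               [5, 5, 7, 7]])
print(ground_truth_edge_labels(sp, gt, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # -> [0 1 1 0]
```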
Using these ground-truth edge labels of the training data, the S-SVM [9] is used to estimate the parameter vector. Given N training samples {(x^n, y^n)}_{n=1}^{N}, where y^n is the ground-truth edge labeling for the n-th training image, the S-SVM optimizes w by minimizing a quadratic objective function subject to a set of linear margin constraints:

    \min_{w, \xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n    (9)
    \text{s.t.} \;\; \langle w, \Delta\Phi(x^n, y) \rangle \ge \Delta(y^n, y) - \xi_n, \quad \forall n, \; \forall y \in Y \setminus y^n;
                \;\; \xi_n \ge 0, \quad \forall n
where ∆Φ(x^n, y) = Φ(x^n, y^n) − Φ(x^n, y), and C > 0 is a constant that controls the trade-off between margin maximization and training error minimization. In the S-SVM, the margin is scaled with a loss ∆(y^n, y), which measures the difference between a prediction y and the ground-truth labeling y^n of the n-th image. The S-SVM offers good generalization ability as well as the flexibility to choose any loss function [9].
The cutting plane algorithm [9, 18] with LP relaxation for loss-augmented inference is used to solve the optimization problem of the S-SVM, since the cutting plane algorithm is well known for fast convergence and for robustly handling a large number of margin constraints [22, 23].
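For intuition only, the following simplified sketch (not the authors' procedure) shows the two ingredients shared with the cutting plane method: loss-augmented inference with a decomposable, edge-wise loss of the form used in (10) below, and an update of w driven by the most violating labeling. Here the inference step maximizes each relaxed edge label independently (box relaxation only) and the update is a plain structured-hinge subgradient step, whereas the paper solves the LP (8) with augmented scores and re-solves the S-SVM QP over a working set of constraints.

```python
import numpy as np

def loss_augmented_scores(scores, y_true, R):
    """Per-edge coefficients of <w, Phi(x, y)> + Delta(y_true, y) as a function
    of y: with a decomposable loss of the form R*y^n + y - (R+1)*y^n*y, edge e
    contributes (score_e + 1 - (R + 1) * y_true_e) * y_e plus a constant, so
    the augmented problem has the same structure as the original inference."""
    return scores + 1.0 - (R + 1.0) * y_true

def ssvm_subgradient_step(w, phi, y_true, R, learning_rate=1e-3, C=1.0):
    """One structured-hinge subgradient step for a single training image.
    Simplified stand-in for the cutting plane procedure: inference below
    ignores cycle/odd-wheel constraints and maximizes each relaxed edge label
    independently over [0, 1] (so the maximizer is 0/1 per edge)."""
    scores = phi @ w                                     # <w, phi_e(x)> per edge
    y_hat = (loss_augmented_scores(scores, y_true, R) > 0).astype(float)
    # Subgradient of 0.5*||w||^2 + C*[Delta + <w, Phi(x, y_hat)> - <w, Phi(x, y_true)>]
    grad = w + C * (phi.T @ (y_hat - y_true))
    return w - learning_rate * grad

# Toy usage: 6 edges with 4-dimensional pairwise features.
rng = np.random.default_rng(1)
phi = rng.normal(size=(6, 4))
y_true = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])
w = np.zeros(4)
for _ in range(100):
    w = ssvm_subgradient_step(w, phi, y_true, R=0.1)
print(np.round(w, 3))
```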
A loss function is usually non-negative, and a decomposable loss function is preferred since it enables the loss-augmented inference in the cutting plane algorithm to be performed efficiently. The most popular decomposable loss function is the Hamming distance, which in this correlation clustering setting is the number of edge-level mismatches between y^n and y. Unfortunately, in the proposed correlation clustering for image segmentation, the number of edges labeled 1 is considerably higher than the number of edges labeled 0.
This imbalance makes other learning methods such as the perceptron algorithm inappropriate, since the imbalance leads them to cluster the whole image as a single segment.

Figure 4: Evaluation measures (PRI, VOI, SCO, BDE) obtained from segmentation results on the SBD, plotted against the average number of regions, for Multi-NCut, gPb-owt-ucm, gPb-Hoiem, Corr-Cluster-Pairwise, and Corr-Cluster-Higher.

The same problem due to the imbalance also
occurs when we use the Hamming loss in the S-SVM. Therefore, we use the following loss function:
    \Delta(y^n, y) = \sum_{e_p \in E_p} \big( R_p y^n_{e_p} + y_{e_p} - (R_p + 1) y^n_{e_p} y_{e_p} \big) + D \sum_{e_h \in E_h} \big( R_h y^n_{e_h} + y_{e_h} - (R_h + 1) y^n_{e_h} y_{e_h} \big)    (10)
where D is the relative weight of the loss at the higher-order edge level with respect to the loss at the pairwise edge level. In addition, R_p and R_h control the relative importance of incorrectly merging superpixels versus incorrectly separating them by imposing different weights on false negatives and false positives. Here, we set both R_p and R_h to be less than 1 to counteract the class imbalance.
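Evaluating (10) on given edge labelings is straightforward; the sketch below (our own function name) also makes the asymmetry controlled by R_p and R_h explicit: a disagreement on an edge whose ground-truth label is 1 costs R, while a disagreement on an edge whose ground-truth label is 0 costs 1.

```python
import numpy as np

def decomposable_loss(y_true_p, y_pred_p, y_true_h, y_pred_h, R_p, R_h, D):
    """Loss (10). A single edge contributes 0 when the labels agree, R when the
    ground-truth label is 1 but the prediction is 0, and 1 when the ground-truth
    label is 0 but the prediction is 1; R < 1 down-weights errors on the
    dominant label-1 edges, and D weights the higher-order part."""
    term_p = R_p * y_true_p + y_pred_p - (R_p + 1.0) * y_true_p * y_pred_p
    term_h = R_h * y_true_h + y_pred_h - (R_h + 1.0) * y_true_h * y_pred_h
    return float(term_p.sum() + D * term_h.sum())

# The four label combinations on a single pairwise edge (R_p = 0.1, no hyperedges):
empty = np.zeros(0)
for yt, yp in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(yt, yp, decomposable_loss(np.array([yt], float), np.array([yp], float),
                                    empty, empty, R_p=0.1, R_h=0.5, D=10.0))
# -> 0.0, 1.0, 0.1, 0.0
```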
4 Experiments
To evaluate segmentations obtained by various algorithms against the ground-truth segmentations, we conducted image segmentation on three benchmark datasets: the Stanford background dataset (SBD) [24], the Berkeley segmentation dataset (BSDS) [25], and the MSRC dataset [26]. For image segmentation based on correlation clustering, we initially obtain baseline superpixels (438 superpixels per image on average) by the gPb contour detector and the oriented watershed transform [19] and then construct a hypergraph. The parameters are initially set to zero, and structured output learning based on the S-SVM is then used to estimate the parameter vectors. Note that relaxed solutions are used for loss-augmented inference during training, while at test time our simple rounding method is used to produce valid segmentation results. Rounding is only necessary when we obtain fractional solutions from the LP-relaxed correlation clustering.
We compared the proposed pairwise/higher-order correlation clustering to the following state-of-the-art image segmentation algorithms: multiscale NCut [27], gPb-owt-ucm [19], and gPb-Hoiem [20], which groups the same superpixels based on pairwise same-label likelihoods. The pairwise same-label likelihoods were learnt independently from the training data with the same 611-dimensional pairwise feature vector. We consider four performance measures: the probabilistic Rand index (PRI) [28], the variation of information (VOI) [29], the segmentation covering (SCO) [19], and the boundary displacement error (BDE) [30]. The closer the predicted segmentation is to the ground-truth segmentation, the higher the PRI and SCO and the lower the VOI and BDE.
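For reference, here is a small sketch of the plain Rand index between two integer label maps (our own implementation; the probabilistic Rand index used above additionally averages the pairwise agreement over multiple ground-truth segmentations).

```python
import numpy as np
from scipy.special import comb

def rand_index(labels_a, labels_b):
    """Plain Rand index between two labelings of the same pixels: the fraction
    of pixel pairs on which the two segmentations agree (same segment in both,
    or different segments in both)."""
    labels_a, labels_b = np.ravel(labels_a), np.ravel(labels_b)
    n = labels_a.size
    # Contingency table between the two segmentations.
    cont = np.zeros((labels_a.max() + 1, labels_b.max() + 1))
    np.add.at(cont, (labels_a, labels_b), 1)
    same_both = comb(cont, 2).sum()
    same_a = comb(cont.sum(axis=1), 2).sum()
    same_b = comb(cont.sum(axis=0), 2).sum()
    total = comb(n, 2)
    diff_both = total - same_a - same_b + same_both
    return (same_both + diff_both) / total

print(rand_index(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 2])))  # 5/6 ≈ 0.833
```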
4.1 Stanford background dataset
The SBD consists of 715 outdoor images with corresponding pixel-wise annotations. We employed
5-fold cross-validation with the dataset randomly split into 572 training images and 143 test images
for each fold. Figure 4 shows the four evaluation measures obtained from segmentation results as a function of the average number of regions. Note that the performance varies with the number of regions, and for this reason we designed each algorithm to produce multiple segmentations (20 to 40 regions). Specifically, multiple segmentations in the proposed algorithm were obtained by varying R_p (0.001~0.2) and R_h (0.1~1.0) in the loss function during training (D = 10). Irrespective of the measure, the proposed higher-order correlation clustering (Corr-Cluster-Higher) performed better than the other algorithms, including the pairwise correlation clustering (Corr-Cluster-Pairwise). Figure 5 shows some example segmentations. The proposed higher-order correlation clustering yielded the best segmentation results. In particular, segments incorrectly predicted by pairwise correlation clustering were reduced in the results obtained by higher-order correlation clustering, owing to the consideration of higher-order relations in broad regions.
Figure 5: Results of image segmentation. Columns, left to right: original image, ground-truth, Multi-NCut, gPb-owt-ucm, gPb-Hoiem, Corr-Cluster-Pairwise, Corr-Cluster-Higher.
Table 1: Quantitative results on the BSDS test set and on the MSRC test set.

                            |             BSDS              |             MSRC
    Test set                |  PRI    VOI    SCO    BDE     |  PRI    VOI    SCO    BDE
    Multi-NCut              | 0.728  3.043  0.315  14.257   | 0.628  2.765  0.341  11.941
    gPb-owt-ucm             | 0.794  1.909  0.571  11.461   | 0.779  1.675  0.628   9.800
    gPb-Hoiem               | 0.724  3.194  0.316  14.795   | 0.614  2.847  0.353  13.533
    Corr-Cluster-Pairwise   | 0.806  1.829  0.585  11.194   | 0.773  1.648  0.632   9.194
    Corr-Cluster-Higher     | 0.814  1.743  0.599  10.377   | 0.784  1.594  0.648   9.040
Regarding the runtime of our algorithm, we observed that test-time inference took on average around 15 seconds per image (graph construction and feature extraction: 14 s, LP: 1 s) on a 2.67 GHz processor, whereas the overall training took 10 hours on the training set. Note that other segmentation algorithms such as the multiscale NCut and the gPb-owt-ucm took on average a few minutes per image.
4.2 Berkeley segmentation dataset and MSRC dataset
The BSDS contains 300 natural images split into 200 training images and 100 test images. Since
each image is segmented by multiple human subjects, we defined a single probabilistic (real-valued)
ground-truth segmentation of each image for training in the proposed correlation clustering. The
MSRC dataset is composed of 591 natural images. We split the data into 45% training, 10% validation, and 45% test sets, following [26]. The performance was evaluated using the clean ground-truth
object instance labeling of [31]. On average, all segmentation algorithms were set to produce 30
disjoint regions per image on the BSDS and 15 disjoint regions per image on the MSRC dataset.
As shown in Table 1, the proposed higher-order correlation clustering gave the best results on both datasets. Notably, the results obtained on the BSDS are similar to or even better than the best results previously reported on the BSDS [32, 19].
5 Conclusion
This paper proposed higher-order correlation clustering over a hypergraph to merge superpixels into homogeneous regions. The LP relaxation was used to approximately solve the higher-order correlation clustering problem over a hypergraph, on which a rich feature vector was defined based on several visual cues involving higher-order relations among superpixels. The S-SVM was used for supervised training of the parameters in correlation clustering, and the cutting plane algorithm with LP-relaxed inference was applied to solve the optimization problem of the S-SVM. Experimental results showed that the proposed higher-order correlation clustering outperformed other image segmentation algorithms on various datasets. The proposed framework is applicable to a variety of other areas.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the
Korea government (MEST) (No.2011-0018249).
References
[1] T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Wu, “An efficient k-means
clustering algorithm: Analysis and implementation,” PAMI, vol. 24, pp. 881–892, 2002.
[2] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” PAMI, vol.
24, pp. 603–619, 2002.
[3] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: image segmentation using expectationmaximization and its application to image querying,” PAMI, vol. 24, pp. 1026–1038, 2002.
[4] F. Estrada and A. Jepson, “Spectral embedding and mincut for image segmentation,” in BMVC, 2004.
[5] J. Shi and J. Malik, “Normalized cuts and image segmentation,” PAMI, vol. 22, pp. 888–905, 2000.
[6] P. Felzenszwalb and D. Huttenlocher, “Efficient graph-based image segmentation,” IJCV, vol. 59, pp.
167–181, 2004.
[7] F. Estrada and A. Jepson, “Benchmarking image segmentation algorithms,” IJCV, vol. 85, 2009.
[8] N. Bansal, A. Blum, and S. Chawla, “Correlation clustering,” Machine Learning, vol. 56, 2004.
[9] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, “Large margin methods for structured and
independent output variables,” JMLR, vol. 6, 2005.
[10] T. Finley and T. Joachims, “Supervised clustering with support vector machines,” in ICML, 2005.
[11] B. Taskar, “Learning structured prediction models: a large margin approach,” Ph.D. thesis, Stanford
University, 2004.
[12] C. Berge, Hypergraphs, North-Holland, Amsterdam, 1989.
[13] L. Ding and A. Yilmaz, “Image segmentation as learning on hypergraphs,” in Proc. ICMAL, 2008.
[14] S. Rital, “Hypergraph cuts and unsupervised representation for image segmentation,” Fundamenta Informaticae, vol. 96, pp. 153–179, 2009.
[15] A. Ducournau, S. Rital, A. Bretto, and B. Laget, “A multilevel spectral hypergraph partitioning approach
for color image segmentation,” in Proc. ICSIPA, 2009.
[16] F. Bach and M. I. Jordan, “Learning spectral clustering,” in NIPS, 2003.
[17] T. Cour, N. Gogin, and J. Shi, “Learning spectral graph segmentation,” in AISTATS, 2005.
[18] S. Nowozin and S. Jegelka, “Solution stability in linear programming relaxations: Graph partitioning and
unsupervised learning,” in ICML, 2009.
[19] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik, “Contour detection and hierarchical image segmentation,” PAMI, vol. 33, pp. 898–916, 2011.
[20] D. Hoiem, A. A. Efros, and M. Hebert, “Recovering surface layout from an image,” IJCV, 2007.
[21] D. Batra, R. Sukthankar, and T. Chen, “Learning class-specific affinities for image labelling,” in CVPR,
2008.
[22] T. Finley and T. Joachims, “Training structural SVMs when exact inference is intractable,” in ICML,
2008.
[23] A. Kulesza and F. Pereira, “Structured learning with approximate inference,” in NIPS, 2007.
[24] S. Gould, R. Fulton, and D. Koller, “Decomposing a scene into geometric and semantically consistent
regions,” in ICCV, 2009.
[25] C. Fowlkes, D. Martin, and J. Malik, The Berkeley Segmentation Dataset and Benchmark (BSDB),
http://www.cs.berkeley.edu/projects/vision/grouping/segbench/.
[26] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in ECCV, 2006.
[27] T. Cour, F. Benezit, and J. Shi, “Spectral segmentation with multiscale graph decomposition,” in CVPR,
2005.
[28] W. M. Rand, “Objective criteria for the evaluation of clustering methods,” Journal of the American
Statistical Association, vol. 66, pp. 846–850, 1971.
[29] M. Meila, “Computing clusterings: An axiomatic view,” in ICML, 2005.
[30] J. Freixenet, X. Munoz, D. Raba, J. Marti, and X. Cufi, “Yet another survey on image segmentation:
Region and boundary information integration,” in ECCV, 2002.
[31] T. Malisiewicz and A. A. Efros, “Improving spatial support for objects via multiple segmentations,” in
BMVC, 2007.
[32] T. Kim, K. Lee, and S. Lee, “Learning full pairwise affinities for spectral segmentation,” in CVPR, 2010.