
MINCUTS
SI/EECS 767
Yang Liu
Apr 2, 2010
INTRODUCTION
• A minimum cut is the smallest cut that will disconnect a graph into two disjoint subsets.
• Applications:
  • Graph partitioning
  • Data clustering
  • Graph-based machine learning
BACKGROUND KNOWLEDGE

• Cut
  • A cut C = (S, T) is a partition of the vertices V of a graph G = (V, E).
  • An s-t cut C = (S, T) of a network N = (V, E) is a cut of N such that s∈S and t∈T, where s and t are the source and the sink of N respectively.
  • The cut-set of a cut C = (S, T) is the set {(u,v)∈E | u∈S, v∈T}.
  • The size of a cut C = (S, T) is the number of edges in the cut-set. If the edges are weighted, the value of the cut is the sum of their weights.
(http://en.wikipedia.org/wiki/Cut_(graph_theory))
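To make the definitions concrete, here is a minimal Python sketch; the graph, weights, and partition are illustrative choices, not from the slides:

```python
# Illustrating the definitions above on a tiny weighted graph.
edges = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2, ("c", "d"): 4}

S = {"a", "b"}   # one side of the cut C = (S, T)
T = {"c", "d"}   # the other side

# Cut-set: edges with one endpoint in S and the other in T.
cut_set = {(u, v) for (u, v) in edges
           if (u in S) != (v in S)}

size = len(cut_set)                     # size of the cut (unweighted)
value = sum(edges[e] for e in cut_set)  # value of the cut (weighted)
print(cut_set, size, value)             # {('a','c'), ('b','c')}, 2, 3
```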

BACKGROUND KNOWLEDGE

• Minimum cut
  • A cut is minimum if the size of the cut is not larger than the size of any other cut.
• Max-flow min-cut theorem
  • The maximum flow between two vertices is always equal to the size of the minimum cut times the capacity of a single pipe.
  • Also applies to weighted networks in which individual pipes can have different capacities.
BACKGROUND KNOWLEDGE
• The max-flow min-cut theorem is very useful because there are simple computer algorithms that can calculate maximum flows quite quickly (in polynomial time) for any given network.
• We can use these same algorithms to quickly calculate the size of a minimum cut as well.
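A sketch of this in practice, assuming the networkx library; the network below is illustrative:

```python
# Computing a max flow and a min cut with networkx
# (the equality asserted is the theorem above).
import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
assert flow_value == cut_value    # max-flow min-cut theorem
print(flow_value, S, T)           # 5, and the two sides of a minimum cut
```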

THE AUGMENTING PATH ALGORITHM

• Basic idea:
  • First find a path from source s to sink t using breadth-first search;
  • Then find another path from s to t among the remaining edges, and repeat this procedure until no more paths can be found.
(Figure: an example network with source s and sink t)
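A compact pure-Python sketch of this idea (Edmonds-Karp: BFS for augmenting paths in a residual network); the function name and example network are illustrative:

```python
# Pure-Python sketch of the augmenting-path (Edmonds-Karp) algorithm.
from collections import deque

def max_flow(capacity, s, t):
    """capacity maps directed edges (u, v) to their capacities."""
    # Residual capacities; the zero-capacity reverse edges let later
    # augmenting paths cancel earlier flow (see the next slide's fix).
    residual, neighbors = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    flow = 0
    while True:
        # Breadth-first search for an s-t path with spare capacity.
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow            # no augmenting path remains
        # Walk back from t to s, find the bottleneck, push flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

caps = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
        ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(caps, "s", "t"))    # 5
```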
A SIMPLE FIX

• Allow fluid to flow simultaneously both ways down an edge in the network.
(Source: Mark Newman's textbook, preprint version)
Graph Clustering and Minimum Cut Trees
(Flake et al. 2004)
INTRODUCTION
• Clustering data into disjoint groups
• Data sets can be represented as weighted graphs
  • Nodes = entities to be clustered
  • Edges = a similarity measure between entities
• Presents a new clustering algorithm based on maximum flow (in particular, the minimum cut tree).
MINIMUM CUT TREE
• Also known as the Gomory–Hu tree
• A weighted tree whose edges represent the minimum s-t cuts between all pairs of vertices in the graph
• For every undirected graph, there always exists a min-cut tree.
• See [Gomory and Hu 61] for details and the algorithm for calculating min-cut trees.
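A sketch of building a min-cut tree with networkx's gomory_hu_tree; the graph is illustrative:

```python
# Min-cut (Gomory-Hu) tree of a small weighted graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 3), ("b", "c", 2), ("c", "d", 4), ("a", "d", 1)],
    weight="capacity")

T = nx.gomory_hu_tree(G, capacity="capacity")
# The min s-t cut value for ANY pair is the smallest edge weight
# on the path between them in the tree.
path = nx.shortest_path(T, "a", "c")
cut_value = min(T[u][v]["weight"] for u, v in zip(path, path[1:]))
print(cut_value)   # minimum a-c cut value (3 for this graph)
```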

CUT CLUSTERING ALGORITHM
• Connect an artificial sink t to every node of G with an edge of weight α.
• Calculate the minimum cut tree T of the expanded graph.
• Remove t from T; the connected components that remain are the clusters of G.
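A minimal sketch of this procedure, assuming networkx; the helper name cut_clustering and the example graph are illustrative:

```python
# Sketch of cut clustering (Flake et al. 2004).
import networkx as nx

def cut_clustering(G, alpha, capacity="capacity"):
    expanded = G.copy()
    t = "__artificial_sink__"        # assumed not to clash with real node names
    for v in G.nodes:
        expanded.add_edge(v, t, **{capacity: alpha})
    # Min-cut tree of the expanded graph; removing t leaves the clusters.
    T = nx.gomory_hu_tree(expanded, capacity=capacity)
    T.remove_node(t)
    return list(nx.connected_components(T))

G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 5), ("b", "c", 5), ("c", "a", 5),   # one dense triangle
     ("c", "d", 1),                                 # weak bridge
     ("d", "e", 5), ("e", "f", 5), ("f", "d", 5)],  # another dense triangle
    weight="capacity")
print(cut_clustering(G, alpha=2))   # expect the two triangles as clusters
```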
CHOOSING α
• α → 0: the trivial cut ({t}, V)
• α → ∞: n trivial clusters, all singletons
• The exact value of α depends on the structure of G and the distribution of the weights over the edges.
• The algorithm finds all clusterings in either increasing or decreasing order of α, so we can stop as soon as a desired clustering has been found.

HIERARCHICAL CUT-CLUSTERING ALGORITHM
• Once a clustering is produced, contract the clusters into single nodes and apply the same algorithm to the resulting graph.
• When contracting a set of nodes, they get replaced by a single new node; any loops get deleted, and parallel edges are combined into a single edge with weight equal to the sum of their weights (see the sketch below).
• break if
    ((clusters returned are of desired number and size) or
     (clustering failed to create nontrivial clusters))
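A minimal sketch of the contraction step; contract_clusters and the example graph are illustrative:

```python
# Replace each cluster by one node, drop loops,
# and merge parallel edges by summing their weights.
import networkx as nx

def contract_clusters(G, clusters, weight="capacity"):
    rep = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    H = nx.Graph()
    H.add_nodes_from(range(len(clusters)))
    for u, v, data in G.edges(data=True):
        cu, cv = rep[u], rep[v]
        if cu == cv:
            continue                      # intra-cluster edge becomes a loop: delete
        w = data.get(weight, 1)
        if H.has_edge(cu, cv):
            H[cu][cv][weight] += w        # parallel edge: combine weights
        else:
            H.add_edge(cu, cv, **{weight: w})
    return H

G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 5), ("c", "d", 5), ("c", "e", 5),   # intra-cluster edges
     ("b", "c", 1), ("b", "d", 1)],                 # parallel after contraction
    weight="capacity")
H = contract_clusters(G, [{"a", "b"}, {"c", "d", "e"}])
print(list(H.edges(data=True)))   # one edge 0-1 with capacity 2
```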
EXPERIMENTAL RESULTS

• CiteSeer
  • Citation network (documents as nodes, citations as edges)
(Figure: example clusterings at a low level and at a high level of the hierarchy)
CONCLUSION
• Minimum cut trees, based on expanded graphs, provide a means for producing quality clusterings and for extracting heavily connected components.
• A single parameter, α, can be used as a strict bound on the expansion of the clustering while simultaneously serving to bound the intercluster weight as well.
Bipartite Graph Partitioning and Data Clustering
(Zha et al. 2001)
INTRODUCTION

• Bipartite graph
  • Two kinds of vertices
  • One kind representing the original vertices and the other representing the groups to which they belong
  • Examples: terms and documents, or authors and the articles they write
• Adapts partitioning criteria for undirected graphs to the bipartite setting, and thereby solves the bi-clustering problem.
BIPARTITE GRAPH PARTITIONING
• Bipartite graph G(X, Y, W)
• In the context of document clustering:
  • X represents the set of terms
  • Y represents the set of documents
  • W = (wij), where wij is the term frequency of term i in document j
• The plain cut criterion tends to produce unbalanced clusters
• The problem then becomes the following optimization problem (the objective is given as an equation in the original slides)
• Computational complexity: generally linear in the number of documents to be clustered
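Since the objective itself is not preserved in this extract, here is only a hedged sketch of the SVD-based relaxation commonly used for such bipartite cut objectives (as in Zha et al. and related spectral co-clustering work): scale the term-document matrix by the square roots of its row and column sums, then split terms and documents by the sign of the second singular vectors. The tiny matrix below is illustrative:

```python
# Spectral relaxation sketch for bipartite partitioning.
import numpy as np

W = np.array([[2.0, 1.0, 0.1, 0.0],   # rows: terms, columns: documents
              [1.0, 2.0, 0.0, 0.1],
              [0.1, 0.0, 2.0, 1.0],
              [0.0, 0.1, 1.0, 2.0]])

d1 = W.sum(axis=1)    # term degrees
d2 = W.sum(axis=0)    # document degrees
Wn = W / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

U, s, Vt = np.linalg.svd(Wn)
term_split = U[:, 1] >= 0     # second left singular vector partitions the terms
doc_split = Vt[1, :] >= 0     # second right singular vector partitions the documents
print(term_split, doc_split)  # the two blocks separate (up to a global sign flip)
```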
EXPERIMENTS

• 20 newsgroups
Learning from Labeled and Unlabeled Data using Graph Mincuts
(Blum & Chawla 2001)
INTRODUCTION
• Many application domains suffer from not having enough labeled training data for learning.
• Large amounts of unlabeled examples are available.
• How can unlabeled data be used to aid classification?
THE GRAPH MINCUT LEARNING ALGORITHM
• A set L of labeled examples
• A set U of unlabeled examples
• Binary classification
  • L+ denotes the set of positive examples
  • L- denotes the set of negative examples
• Construct a weighted graph G = (V, E), where V = L∪U∪{v+, v-} and each edge e ∈ E has a weight w(e).
  • v+, v-: classification vertices; all other vertices: example vertices
  • w(v, v+) = ∞ for all v ∈ L+, and w(v, v-) = ∞ for all v ∈ L-
  • The edges between example vertices are assigned weights based on some relationship (similarity/distance) between the examples
• Determine a minimum (v+, v-) cut for the graph, i.e. the minimum total weight set of edges whose removal disconnects v+ and v- (using a max-flow algorithm in which v+ is the source and v- is the sink).
• Assign a positive label to all unlabeled examples on the v+ side of the cut (the set V+) and a negative label to all unlabeled examples on the v- side (the set V-).
• *Edges between examples which are similar to each other should be given a high weight (see the sketch below).
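A minimal sketch of the whole algorithm, assuming networkx; mincut_classify, the similarity weights, and the labels are illustrative:

```python
# Sketch of the Blum-Chawla graph mincut learner.
import networkx as nx

def mincut_classify(sim, L_pos, L_neg, U):
    G = nx.Graph()
    for v in L_pos:
        G.add_edge("v+", v, capacity=float("inf"))   # w(v, v+) = infinity
    for v in L_neg:
        G.add_edge("v-", v, capacity=float("inf"))   # w(v, v-) = infinity
    for (u, v), w in sim.items():                    # similarity edges
        G.add_edge(u, v, capacity=w)
    # Min (v+, v-) cut via max flow with v+ as source and v- as sink.
    _, (V_pos, V_neg) = nx.minimum_cut(G, "v+", "v-")
    return {v: "+" if v in V_pos else "-" for v in U}

sim = {("a", "b"): 2.0, ("b", "c"): 0.5, ("c", "d"): 2.0}
print(mincut_classify(sim, L_pos=["a"], L_neg=["d"], U=["b", "c"]))
# expected: b -> +, c -> -  (the weight-0.5 edge is the one cut)
```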

POTENTIAL PROBLEM
• If there are few labeled examples, mincut may assign all of the unlabeled examples to one class or the other.
• If the graph is too sparse, it could have a number of disconnected components.
• Therefore it is important to use a proper weighting function.

EXPERIMENTAL ANALYSIS


• Datasets: UCI, 2000
• The mincut algorithm has many degrees of freedom in terms of how the edge weights are defined:
  • Mincut-3: each example is connected to its nearest labeled example and its two nearest other examples overall
  • Mincut-δ: two nodes are connected if they are closer than δ
  • Mincut-δ0: the maximum δ for which the graph has a cut of value 0
  • Mincut-δ1/2: the δ at which the size of the largest connected component in the graph is half the number of datapoints
  • Mincut-δopt: the value of δ that corresponds to the least classification error in hindsight

REVIEW
• The basic idea of this algorithm is to build a graph on all the data, with edges between examples that are sufficiently similar,
• then to partition the graph into a positive and a negative set in a way that
  • (a) agrees with the labeled data
  • (b) cuts as few edges as possible
Semi-supervised Learning using Randomized Mincuts
(Blum et al. 2004)
INTRODUCTION

• The drawbacks of the graph mincut approach:
  • A graph may have many minimum cuts, and the mincut algorithm produces just one, typically the “leftmost” one when using standard network-flow algorithms.
  • The output is a joint labeling rather than per-node probabilities.
• Can be improved by averaging over many small cuts.
BASIC IDEA
• Repeatedly add artificial random noise to the edge weights.
• Solve for the minimum cut in each resulting graph.
• Output a fractional label for each example, corresponding to the fraction of the time it was on one side or the other.

RANDOMIZED MINCUTS WITH SANITY CHECK
• Given a graph G, produce a collection of cuts by repeatedly adding random noise to the edge weights and then solving for the minimum cut in the perturbed graph.
• Sanity check: remove cuts that are highly unbalanced (in this paper, any cut with less than 5% of the vertices on one side).
• Predict based on a majority vote (see the sketch below).
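A minimal sketch, assuming networkx; the multiplicative noise scheme, round count, and threshold parameterization are illustrative choices, not the paper's exact procedure:

```python
# Sketch of randomized mincut with the sanity check.
import random
import networkx as nx

def randomized_mincut(sim, L_pos, L_neg, U, rounds=50, noise=0.2, min_side=0.05):
    votes = {v: 0 for v in U}
    kept = 0
    n = len(U) + len(L_pos) + len(L_neg)
    for _ in range(rounds):
        G = nx.Graph()
        for v in L_pos:
            G.add_edge("v+", v, capacity=float("inf"))
        for v in L_neg:
            G.add_edge("v-", v, capacity=float("inf"))
        for (u, v), w in sim.items():
            # Perturb each edge weight with random noise.
            G.add_edge(u, v, capacity=w * (1.0 + noise * random.random()))
        _, (V_pos, V_neg) = nx.minimum_cut(G, "v+", "v-")
        # Sanity check: discard highly unbalanced cuts
        # (less than 5% of the vertices on one side).
        smaller = min(len(V_pos), len(V_neg)) - 1   # exclude v+ / v- themselves
        if smaller < min_side * n:
            continue
        kept += 1
        for v in U:
            votes[v] += v in V_pos
    # Fractional labels: fraction of kept cuts placing each example positive.
    return {v: votes[v] / kept if kept else 0.5 for v in U}
```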

EXAMPLE
• Overcomes some of the limitations of the plain mincut algorithm.
• Consider a graph which simply consists of a line, with a positively labeled node at one end, a negatively labeled node at the other end, and the rest unlabeled.
  • Plain mincut: the cut will be the leftmost or rightmost one.
  • Randomized mincut: ends up cutting near the middle of the line, with confidence that increases linearly out to the endpoints.

UNIFORM DISTRIBUTION OF MINIMUM CUTS
GRAPH DESIGN CRITERIA
• The graph should either be connected or at least have the property that a small number of connected components cover nearly all the examples.
• It is good to create a graph that at least has some small balanced cuts.

TWO GRAPH CONSTRUCTION METHODS
• MST: simply construct a minimum spanning tree on the entire dataset (see the sketch below).
• δ-MST: connect two points with an edge if they are within a radius δ; then view the components produced as supernodes and connect them via an MST.
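A sketch of the MST construction, assuming networkx; the points and the Euclidean metric are illustrative:

```python
# Build a complete graph weighted by pairwise distance, keep only an MST.
import networkx as nx

points = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (1.0, 1.0), "d": (1.1, 1.0)}

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

complete = nx.Graph()
names = list(points)
for i, u in enumerate(names):
    for v in names[i + 1:]:
        complete.add_edge(u, v, weight=dist(points[u], points[v]))

mst = nx.minimum_spanning_tree(complete, weight="weight")
print(sorted(mst.edges))   # a-b, c-d, and one short edge bridging the pairs
```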

EXPERIMENTAL ANALYSIS
• Handwritten digits
• 20 newsgroups
• Various UCI datasets

CONCLUSION
• Improves performance when the number of labeled examples is small.
• Provides a confidence score for accuracy-coverage curves.

A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on Minimum Cuts
(Pang & Lee 2004)
INTRODUCTION

• A machine-learning method that applies text-categorization techniques to determine the sentiment polarity: positive (“thumbs up”) or negative (“thumbs down”).
• Previous approaches focused on selecting indicative lexical features.
• Their approach:
  • Label the sentences as either subjective or objective;
  • Apply a standard machine-learning classifier to the resulting extract.
METHODS
SUBJECTIVITY DETECTION

• n items x1, . . . , xn to divide into two classes C1 and C2
• Individual scores indj(xi): non-negative estimates of each xi's preference for being in Cj, based on the features of xi alone
• Association scores assoc(xi, xk): non-negative estimates of how important it is that xi and xk be in the same class
• Minimize the partition cost:
    Σ_{x∈C1} ind2(x) + Σ_{x∈C2} ind1(x) + Σ_{xi∈C1, xk∈C2} assoc(xi, xk)



• Build an undirected graph G with vertices {v1, . . . , vn, s, t}; the last two are, respectively, the source and the sink.
• Add n edges (s, vi), each with weight ind1(xi), and n edges (vi, t), each with weight ind2(xi).
• Finally, add edges (vi, vk), each with weight assoc(xi, xk) (see the sketch below).
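A minimal sketch of this construction, assuming networkx; the scores are illustrative, and setting ind2 = 1 - ind1 is an assumption matching the Naive Bayes setup later in these slides:

```python
# Sketch of the subjectivity graph and its minimum cut.
import networkx as nx

ind1 = {1: 0.9, 2: 0.6, 3: 0.1}               # preference for C1 (subjective)
ind2 = {i: 1.0 - p for i, p in ind1.items()}  # preference for C2 (objective)
assoc = {(1, 2): 0.5, (2, 3): 0.2}            # association scores

G = nx.Graph()
for i in ind1:
    G.add_edge("s", i, capacity=ind1[i])      # edge (s, vi) with weight ind1(xi)
    G.add_edge(i, "t", capacity=ind2[i])      # edge (vi, t) with weight ind2(xi)
for (i, k), a in assoc.items():
    G.add_edge(i, k, capacity=a)              # edge (vi, vk) with weight assoc(xi, xk)

cost, (C1, C2) = nx.minimum_cut(G, "s", "t")
print(cost, C1 - {"s"}, C2 - {"t"})           # partition minimizing the cost above
```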

EVALUATION FRAMEWORK
• Classifying movie reviews as either positive or negative
• The correct label can be extracted automatically from rating information (number of stars)

CONSTRUCTION OF THE GRAPH
• The source s and the sink t correspond to the classes of subjective and objective sentences.
• Each internal node vi corresponds to the document's ith sentence si.
• Set ind1(si) to the Naive Bayes estimate of the probability that sentence si is subjective.


EXPERIMENTAL RESULTS
• NB as a subjectivity detector in conjunction with an NB document-level polarity classifier: 86.4% accuracy vs. 82.8% without extraction.
• SVM: 87.15% vs. 86.4%.
• Using the sentences labeled as objective as input instead: 71% for NB and 67% for SVMs.
• Taking just the N most subjective sentences: the 5 most subjective sentences are almost as informative as the full review, while containing only about 22% of the source words.
CONCLUSION
• Subjectivity detection can compress reviews into shorter extracts that still retain polarity information.
• The minimum-cut framework results in the development of efficient algorithms for sentiment analysis.


Questions?