Robust Multi-Class Transductive Learning with Graphs
Wei Liu and Shih-Fu Chang
Columbia University
June 19, 2009

Outline
- Introduction
- Graph Construction
- Graph Learning
- Robust Multi-Class Graph Transduction (RMGT)
- Experiments

What is Semi-Supervised Learning (SSL)?
- In the narrow sense, SSL refers particularly to semi-supervised classification using labeled data and unlabeled data, which often includes transductive and inductive cases.
Figure: Narrow-sense semi-supervised learning (figure labels: seen data, unseen data, inductive learning, transductive learning).

What is Semi-Supervised Learning (SSL)?
- In the wide sense, SSL covers all learning tasks where prior knowledge about a few data is known and knowledge about the remaining data can be inferred. The knowledge may be labels, response values, vector representations, and pairwise relations.
Figure: Wide-sense semi-supervised learning (figure labels: regression, clustering).

Survey and Book
Xiaojin Zhu.
Semi-Supervised Learning Literature Survey, Computer Sciences Technical Report 1530, University of Wisconsin-Madison, 2005.
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning, MIT Press, 2006.

Binary-Class SSL Setting
- A data set $X = \{x_1, \cdots, x_l, \cdots, x_n\} \subset \mathbb{R}^d$ in which the first $l$ samples are labeled and the remaining $u = n - l$ ones are unlabeled. Prior labels are saved in $y \in \mathbb{R}^n$ such that $y_i \in \{1, -1\}$ if $x_i$ is labeled and $y_i = 0$ if unlabeled. Use the graph Laplacian matrix $L$ or its normalized variant $\bar{L}$ to infer the overall labeling $f \in \mathbb{R}^n$.
- Graph Laplacian: $L = D - W$, where $W$ is the weight matrix of the graph $G(V, E, W)$ built on the dataset $X$, and $D_{ii} = \sum_j W_{ij}$.
- Normalized Graph Laplacian: $\bar{L} = D^{-1/2} L D^{-1/2}$.

State of the Art
Label Propagation: the key is the Laplacian-shaped regularizer.
- Gaussian Fields and Harmonic Functions (GFHF), Zhu et al. 2003: $\min_f\ f^T L f$ s.t. $f_l = y_l$
- Local and Global Consistency (LGC), Zhou et al. 2004: $\min_f\ \|f - y\|^2 + \mu f^T \bar{L} f$
- Quadratic Criterion (QC), Bengio et al. 2006: $\min_f\ \|f_l - y_l\|^2 + \mu f^T L f + \mu\epsilon\|f\|^2$

Remarks
1. All these methods are akin to each other. I found that X. Zhu's method GFHF gives more robust performance because of the hard constraint and the absence of trade-off parameters.
2. All these methods heavily depend on graph structures.
3. All these methods naturally generalize to multi-class problems.
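
The GFHF and LGC objectives listed above both have simple linear-system solutions, which the following minimal NumPy sketch illustrates. It assumes the labeled nodes come first (as in the setting slide), that $W$ is symmetric with positive degrees, and that $L_{uu}$ is invertible; the function names `graph_laplacian`, `gfhf`, and `lgc` are illustrative and not taken from the talk.

```python
import numpy as np

def graph_laplacian(W):
    """Return L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def gfhf(W, y_l):
    """Harmonic solution of min f^T L f s.t. f_l = y_l (first l nodes labeled)."""
    l = len(y_l)
    L = graph_laplacian(W)
    # Stationarity on the unlabeled block: L_uu f_u + L_ul y_l = 0.
    f_u = np.linalg.solve(L[l:, l:], -L[l:, :l] @ y_l)
    return np.concatenate([y_l, f_u])

def lgc(W, y, mu=1.0):
    """Minimizer of ||f - y||^2 + mu f^T Lbar f, i.e. (I + mu Lbar) f = y."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Normalized Laplacian Lbar = D^{-1/2} L D^{-1/2}.
    L_bar = d_inv_sqrt[:, None] * graph_laplacian(W) * d_inv_sqrt[None, :]
    return np.linalg.solve(np.eye(len(y)) + mu * L_bar, y)

# Toy path graph a - c - d - b, with a labeled +1 and b labeled -1.
# Node order (labeled first): a=0, b=1, c=2, d=3.
W = np.zeros((4, 4))
for i, j in [(0, 2), (2, 3), (3, 1)]:
    W[i, j] = W[j, i] = 1.0

f_gfhf = gfhf(W, np.array([1.0, -1.0]))
f_lgc = lgc(W, np.array([1.0, -1.0, 0.0, 0.0]), mu=1.0)
```

On this path graph the harmonic solution interpolates linearly between the two labeled ends, and the hard constraint keeps the labeled entries fixed, whereas LGC only pulls them toward their labels.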
Motivation
1. "Several graph-based methods listed here are similar to each other. They differ in the particular choice of the loss function and the regularizer. We believe it is more important to construct a good graph than to choose among the methods. However graph construction, as we will see later, is not a well studied area." X. Zhu, the SSL survey 2005.
2. The two most commonly used kinds of graphs are the k-NN graph and the h-neighborhood graph. Empirically, a weighted k-NN graph with small k tends to perform better.

A Simple Toy Problem: Noisy Two Moons
Figure: Noisy two moons given two labeled points (legend: unlabeled, noise, labeled: +1, labeled: −1). We only have ground truth labels for the points on two moons, so we evaluate classification performance on these on-manifold points.

A Simple Toy Problem: Noisy Two Moons
Panels: (a) LGC (13.55%); (b) GFHF (14.21%); (c) GFHF with sGraph (0%).
Figure: Error rates over unlabeled points.
(a) LGC with 13.55% error rate using a 10-NN graph; (b) GFHF with 14.21% error rate using a 10-NN graph; (c) GFHF with zero error rate using a symmetry-favored 10-NN graph.

Illumination
- Using the traditional k-NN graph, LGC and GFHF cause many errors. But GFHF achieves perfect results when using the proposed symmetry-favored k-NN graph. This illustrates that graph quality is critical to SSL, and the same SSL method leads to very different results using different graph construction schemes.

k-NN Graph
- Let us define an asymmetric $n \times n$ matrix:
$$A_{ij} = \begin{cases} \exp\left(-\dfrac{d(x_i, x_j)^2}{\sigma^2}\right), & \text{if } j \in N_i \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$
where the set $N_i$ saves the indexes of the $k$ nearest neighbors of point $x_i$ and $d(x_i, x_j)$ is some distance measure (e.g. Euclidean distance) between $x_i$ and $x_j$.
- The parameter $\sigma$ is empirically estimated by $\sigma = \sum_{i=1}^n d(x_i, x_{i_k})/n$, where $x_{i_k}$ is the $k$-th nearest neighbor of $x_i$. This estimate is simple and has proved sufficiently effective.

k-NN sGraph
- Let us define a symmetric $n \times n$ matrix:
$$W_{ij} = \begin{cases} A_{ij} + A_{ji}, & \text{if } j \in N_i \text{ and } i \in N_j \\ A_{ji}, & \text{if } j \notin N_i \text{ and } i \in N_j \\ A_{ij}, & \text{otherwise} \end{cases} \qquad (2)$$
Obviously, $W = A + A^T$ and $W$ is symmetric with $W_{ii} = 0$ (to avoid self loops). This weighting scheme favors the symmetric edges $\langle x_i, x_j \rangle$ such that $x_i$ is in the neighborhood of $x_j$ and $x_j$ is simultaneously in the neighborhood of $x_i$.
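
Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is an illustration rather than the authors' implementation: it assumes Euclidean distance and no distance ties, and the function name `knn_sgraph` is hypothetical. For comparison it also returns the traditional k-NN weighting $\max\{A, A^T\}$.

```python
import numpy as np

def knn_sgraph(X, k):
    """Return (A, W_s, W_max): eq. (1), the sGraph A + A^T of eq. (2), and max{A, A^T}."""
    n = X.shape[0]
    # Pairwise Euclidean distances d(x_i, x_j).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Exclude each point from its own neighborhood so that W_ii = 0.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)
    nbrs = np.argsort(masked, axis=1)[:, :k]          # index sets N_i
    # sigma: mean distance from each point to its k-th nearest neighbor.
    sigma = dist[np.arange(n), nbrs[:, -1]].mean()
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-dist[rows, cols] ** 2 / sigma ** 2)
    return A, A + A.T, np.maximum(A, A.T)

# Four points on a line; with k = 1, points 0 and 1 are mutual nearest neighbors.
X = np.array([[0.0], [1.0], [2.5], [10.0]])
A, W_s, W_max = knn_sgraph(X, k=1)
```

In this toy example the mutual edge between points 0 and 1 gets twice the weight in `W_s` that it gets in `W_max`, while the one-directional edge between points 1 and 2 has the same weight in both.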
Remark
1. The weights of those symmetric edges are doubled explicitly, on the reasonable consideration that two points connected by a symmetric edge are prone to be on the same submanifold.
2. In contrast, the weighting scheme adopted by traditional k-NN graphs treats all edges in the same manner, defining the weighted adjacency matrix by $\max\{A, A^T\}$.
3. We call the graph constructed through eq. (2) the symmetry-favored k-NN graph, or k-NN sGraph for short. The proposed graph is relatively robust to noise as it reinforces the similarities between points on manifolds.

Comparison
Figure: 2-NN Graph vs. 2-NN sGraph. Thicker edges represent larger edge weights.

Graph Laplacian
- Given the constructed graph $G(V, E, W)$, the smooth semi-norm used in most graph-based approaches is
$$\|f\|_G^2 = \frac{1}{2} \sum_{i,j=1}^n \left(f(v_i) - f(v_j)\right)^2 W_{ij} = f^T L f, \qquad (3)$$
which yields the graph Laplacian matrix $L = D - W$.
- The degree matrix $D \in \mathbb{R}^{n \times n}$ is a diagonal matrix such that $D_{ii} = \sum_{j=1}^n W_{ij}$. $D_{ii}$ approximates the local density of the neighborhood at $x_i$.

Doubly-Stochastic Matrix
- Theorem 1 (in paper) implies that the smooth norm emphasizes neighborhoods of high densities (large $D_{ii}$).
However, sampling is usually not uniform in practice, so over-emphasizing the neighborhoods of high densities may occlude the information in sparse regions.
Figure: Non-uniform sampling.

Doubly-Stochastic Matrix
- To fully exploit the power of unlabeled data, we wouldn't expect sparse densities from all unlabeled data. Thus, we choose to enforce the equal degree constraint $D_{ii} = 1$ by setting $W\mathbf{1} = \mathbf{1}$, which makes the adjacency matrix $W$ a doubly-stochastic matrix.

How to learn?
- We try to learn $W$ from training data without any presumed function form. We only assume that $W$ is close to the initial $W_0$ calculated via eq. (2).
- We can infuse semi-supervised information into $W$. Consider a pair set $\mathcal{T} = \{(i, j) \mid i = j \text{ or } (x_i, x_j) \text{ differ in labels}\}$ and define its matrix form $T$. In particular, we require $W_{ij} = 0$ for $(i, j) \in \mathcal{T}$, or equivalently require $\sum_{(i,j) \in \mathcal{T}} W_{ij} = 0$ due to $W_{ij} \ge 0$. This constraint is intuitive since it removes self loops and erroneous edges.

Learning W
- We formulate learning a doubly-stochastic $W$ subject to the differently labeled information $T$ as follows:
$$\min_W\ G(W) = \frac{1}{2}\|W - W_0\|_F^2 \quad \text{s.t.} \quad \sum_{(i,j) \in \mathcal{T}} W_{ij} = 0,\quad W\mathbf{1} = \mathbf{1},\quad W = W^T,\quad W \ge 0 \qquad (4)$$
where $\|\cdot\|_F$ stands for the Frobenius norm. Eq. (4) is an instance of quadratic programming (QP).
Learning W
- For efficient computation, we divide this QP problem into two convex sub-problems:
$$\min_W\ G(W) = \frac{1}{2}\|W - W_0\|_F^2 \quad \text{s.t.} \quad \sum_{(i,j) \in \mathcal{T}} W_{ij} = 0,\quad W\mathbf{1} = \mathbf{1},\quad W = W^T \qquad (5)$$
and
$$\min_W\ G(W) = \frac{1}{2}\|W - W_0\|_F^2 \quad \text{s.t.} \quad W \ge 0 \qquad (6)$$

Learning W
- We find a simple solution to the sub-problem eq. (6): $W = \lceil W_0 \rceil_{\ge 0}$, in which the operator $\lceil W_0 \rceil_{\ge 0}$ zeros out all negative entries of $W_0$. The operator is essentially a conic subspace projection operator.
- We solve the sub-problem eq. (5) by
$$W = P(W_0, T) = W_0 - \left(t_0 + \frac{2\,\mathbf{1}^T T \mu_0}{|\mathcal{T}|}\right) T + \mu_0 \mathbf{1}^T + \mathbf{1}\mu_0^T, \qquad (7)$$
where $P(W_0, T)$ behaves as an affine subspace projection operator. $t_0$ and $\mu_0$ are also computed based on $W_0$.

Successive Projection
- We tackle the original QP problem eq. (4) by successive projection using the two subspace projection operators.
- Von Neumann's successive projection lemma: the successively alternating projection process converges onto the intersection of the affine and conic subspaces. The lemma ensures that alternately solving sub-problems eq. (5) and (6) is theoretically guaranteed to converge to the globally optimal solution of the target problem eq. (4).

Algorithm 1.
Doubly-Stochastic Adjacency Matrix Learning
INPUT: the initial adjacency matrix $W_0$, the differently labeled information $T$, the maximum iteration number MaxIter.
LOOP: for $m = 1, \cdots,$ MaxIter
    $W_m = P(W_{m-1}, T)$
    if $W_m \ge 0$, stop LOOP; else $W_m = \lceil W_m \rceil_{\ge 0}$.
OUTPUT: $W = W_m$.

Two Rings Toy Problem
Figure: Two rings toy data.

Two Rings Toy Problem
Panels: (a) k-NN Graph; (b) b-Matching Graph.
Figure: k = 10. The b-matching graph is a regular graph where each node has k adjacent nodes.

Two Rings Toy Problem
Panels: (c) unit-degree Graph; (d) unit-degree Graph given two labeled points.
Figure: These two graphs have doubly-stochastic matrices learned based on the 10-NN sGraph. The former doesn't use the differently labeled information $T$ (good enough!), while the latter does.
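
Algorithm 1 can be sketched as follows, under stated assumptions. The matrix $T$ is taken to be a symmetric 0/1 indicator of the pair set, and $W_0$ is taken to be symmetric. Instead of the slide's closed-form $t_0$ and $\mu_0$, the affine projection below solves a small linear system for the multipliers of the form $W = W_0 - sT + \mu\mathbf{1}^T + \mathbf{1}\mu^T$, which is the same structure as eq. (7). The function names and the stopping tolerance `tol` are illustrative.

```python
import numpy as np

def affine_projection(W0, T):
    """Project symmetric W0 onto {W : W = W^T, W 1 = 1, sum_{(i,j) in T} W_ij = 0}."""
    n = W0.shape[0]
    ones = np.ones(n)
    t_row = T.sum(axis=1)                               # T 1
    # Unknowns (mu, s) in W = W0 - s*T + mu 1^T + 1 mu^T.
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = n * np.eye(n) + np.ones((n, n))         # row sums: (nI + 11^T) mu
    K[:n, n] = -t_row                                   # row sums: - s * T 1
    K[n, :n] = 2.0 * t_row                              # T-sum: 2 (T 1)^T mu
    K[n, n] = -T.sum()                                  # T-sum: - s * |T|
    rhs = np.concatenate([ones - W0.sum(axis=1), [-(T * W0).sum()]])
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]
    mu, s = sol[:n], sol[n]
    return W0 - s * T + np.outer(mu, ones) + np.outer(ones, mu)

def learn_doubly_stochastic(W0, T, max_iter=1000, tol=1e-10):
    """Alternate the affine projection with clipping of negative entries."""
    W = W0.copy()
    for _ in range(max_iter):
        W = affine_projection(W, T)
        if W.min() >= -tol:          # W_m >= 0 up to tolerance: stop
            break
        W = np.maximum(W, 0.0)       # conic projection: zero out negatives
    return np.maximum(W, 0.0)

# Complete graph on 4 nodes with unit weights; T contains only the self pairs.
W0 = np.ones((4, 4)) - np.eye(4)
W = learn_doubly_stochastic(W0, np.eye(4))
```

For this symmetric toy input a single affine projection already lands on the nonnegative matrix with every off-diagonal entry equal to 1/3, so the loop stops at once; less regular inputs generally need several alternations before the negative entries vanish.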
Merits of Doubly-Stochastic Matrix
- It offers a nonparametric form for $W$, flexibly representing data lying in compact clusters or intrinsic low-dimensional submanifolds.
- It is highly robust to noise, e.g., when a noisy sample $x_j$ invades the neighborhood of $x_i$, the unit-degree constraint makes the weight $W_{ij}$ absolutely small compared to the weights between $x_i$ and closer neighbors.
- It provides the "balanced" graph Laplacian with which the smooth norm penalizes label prediction functions on each sample (node) uniformly, resulting in uniform label propagation.

Goal
- Solve a soft label matrix $F \in \mathbb{R}^{n \times c}$ for any multi-class SSL task.
$$F = \begin{bmatrix} F_l \\ F_u \end{bmatrix} = [F_{\cdot 1}, F_{\cdot 2}, \ldots, F_{\cdot c}]$$
The block $F_l$ corresponds to the known class assignments $Y_l$, the block $F_u$ is unknown, and each column $F_{\cdot k}$ accounts for one class.
Figure: Provided $Y_l$, infer $F_u$.

Multi-Class Constraints
- It suffices to suppose the class posteriors for the labeled data to be $p(C_k|x_i) = Y_{ik} = 1$ if $x_i \in C_k$ and $p(C_k|x_i) = Y_{ik} = 0$ otherwise. Importantly, if we knew the class priors $\omega = [p(C_1), \cdots, p(C_c)]^T$ ($\omega^T \mathbf{1}_c = 1$) and regarded the soft labels $F_{ik}$ as $p(C_k|x_i)$, we would have the equation
$$p(C_k) = \sum_{i=1}^n p(x_i)\,p(C_k|x_i) = \sum_{i=1}^n \frac{p(C_k|x_i)}{n} \cong \frac{\mathbf{1}^T F_{\cdot k}}{n}, \qquad (8)$$
where the marginal probability density $p(x_i) \propto D_{ii} = 1$ is assumed to be $1/n$. Eq. (8) induces a hard constraint $\mathbf{1}^T F = n\omega^T$ ($F^T \mathbf{1} = n\omega$).

Multi-Class Label Propagation
- To address multi-class problems, our motivation is to let the soft labels $F_{ik}$ carry the main properties of $p(C_k|x_i)$. Hence, we impose two hard constraints $F^T \mathbf{1} = n\omega$ and $F\mathbf{1}_c = \mathbf{1}$ (due to $\sum_k p(C_k|x_i) = 1$; $\mathbf{1}_c$ is a $c$-dimensional 1-entry vector) to obtain a constrained multi-class label propagation:
$$\min_F\ \mathrm{tr}(F^T L F) \quad \text{s.t.} \quad F_l = Y_l,\quad F\mathbf{1}_c = \mathbf{1},\quad F^T \mathbf{1} = n\omega \qquad (9)$$

Multi-Class Label Propagation
- Eq. (9) reduces to
$$\min_{F_u}\ Q(F_u) = \mathrm{tr}(F_u^T L_{uu} F_u) + 2\,\mathrm{tr}(F_u^T L_{ul} Y_l) \quad \text{s.t.} \quad F_u \mathbf{1}_c = \mathbf{1}_u,\quad F_u^T \mathbf{1}_u = n\omega - Y_l^T \mathbf{1}_l \qquad (10)$$
where $L_{uu}$ and $L_{ul}$ are sub-matrices of $L = \begin{bmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{bmatrix}$, and $\mathbf{1}_l$ and $\mathbf{1}_u$ are $l$- and $u$-dimensional 1-entry vectors, respectively.

Multi-Class Label Propagation
- Theorem 2 (in paper) shows a closed-form solution to eq. (10). The formulated multi-class label propagation succeeds in incorporating class priors, different from all existing label propagation methods.

Flowchart of RMGT
Flowchart components: input feature vectors, prior labels, k-NN sGraph, doubly-stochastic adjacency matrix learning, unit-degree Graph, multi-class label propagation, global classification.
Figure: The RMGT algorithm.
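
The equality-constrained problem in eq. (10) can be solved from its stationarity (KKT) conditions, as the NumPy sketch below illustrates. This is a derivation under stated assumptions, not a transcription of Theorem 2, whose statement is not reproduced in these slides. Writing $F_0 = -L_{uu}^{-1} L_{ul} Y_l$ and $g = L_{uu}^{-1}\mathbf{1}_u$, the solution has the form $F_u = F_0 + g\nu^T$, with $\nu$ chosen so that the column sums match $n\omega - Y_l^T\mathbf{1}_l$. The row-sum constraint then holds automatically because $L\mathbf{1} = 0$. The function name `multiclass_propagation` is illustrative, and $L_{uu}$ is assumed invertible.

```python
import numpy as np

def multiclass_propagation(W, Y_l, omega):
    """Solve eq. (10) for F_u and return the full soft label matrix F = [Y_l; F_u]."""
    n, l = W.shape[0], Y_l.shape[0]
    L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian L = D - W
    L_uu, L_ul = L[l:, l:], L[l:, :l]
    ones_u = np.ones(n - l)
    F0 = np.linalg.solve(L_uu, -L_ul @ Y_l)          # harmonic part (no prior constraint)
    g = np.linalg.solve(L_uu, ones_u)                # L_uu^{-1} 1_u
    target = n * omega - Y_l.sum(axis=0)             # required column sums of F_u
    nu = (target - F0.sum(axis=0)) / g.sum()         # multiplier for the prior constraint
    F_u = F0 + np.outer(g, nu)
    return np.vstack([Y_l, F_u])

# Six points on a line; the first two are labeled, one per class.
pts = np.array([0.0, 5.0, 1.0, 2.0, 4.0, 6.0])
W = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / 4.0)
np.fill_diagonal(W, 0.0)
Y_l = np.eye(2)
omega = np.array([0.7, 0.3])                         # assumed class priors
F = multiclass_propagation(W, Y_l, omega)
```

Each row of `F` sums to one and the column sums equal $n\omega$, so the class priors are met exactly; individual soft labels are not forced into $[0, 1]$ by these equality constraints alone.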
Experimental Setup
Data            #Features   #Samples   #Classes
USPS (test)     256         2007       10
FRGC (subset)   4608        3160       316
Figure: Digit and face images.
RMGT: without graph adjacency matrix learning. RMGT(W): with graph adjacency matrix learning.

Performance Curves
Figure: Performance curves. USPS panel: Error Rate (%) vs. # Labeled Samples. FRGC panel: Recognition Rate (%) vs. # Labeled Samples/100. Methods compared: LGC, SGT, GFHF+CMN, RMGT, RMGT(W).

Conclusions
- All compared SSL algorithms achieve performance gains when switching from k-NN graphs to k-NN sGraphs.
- RMGT performs better than the other methods, thus demonstrating the success of multi-class label propagation with class priors.
- RMGT(W) is significantly superior to the others, showing that the proposed graph learning technique (doubly-stochastic adjacency matrix learning) boosts graph-based SSL performance.

Thanks!
For any questions, please email wliu@ee.columbia.edu.