Relevance Aggregation Projections for Image Retrieval
CIVR 2008
Wei Liu, Wei Jiang, Shih-Fu Chang
wliu@ee.columbia.edu
Columbia University

Outline
- Motivations and Formulation
- Our Approach: Relevance Aggregation Projections
- Experimental Results
- Conclusions

Motivations and Formulation
- Relevance feedback is used to close the semantic gap: it explores knowledge about the user's intention and helps select features and refine models.
- Relevance feedback mechanism:
  - The user selects a query image.
  - The system presents the highest-ranked images to the user, excluding those already labeled.
  - In each iteration, the user marks images as "relevant" (positive) or "irrelevant" (negative).
  - The system gradually refines the retrieval results.

Problems
- Small sample learning: the number of labeled images is extremely small.
- High dimensionality: the feature dimension exceeds 100, while the number of labeled samples is below 100.
- Asymmetry: relevant data are coherent, whereas irrelevant data are diverse.

Asymmetry in CBIR
(Figure: a query image, its coherent relevant images, and its diverse irrelevant images.)

Possible Solutions
- Asymmetry: (Figure: aggregate the relevant images around the query while keeping the irrelevant images outside a unit margin, margin = 1.)
- Small sample learning → semi-supervised learning.
- Curse of dimensionality → dimensionality reduction.

Previous Work
(image dimension: d, total sample number: n, labeled sample number: l; in CBIR, n > d > l)

  Method            labeled   unlabeled   asymmetry   dimension bound
  LPP (NIPS'03)                  ✓                          d
  ARE (ACM MM'05)      ✓         ✓           ✓             l-1
  SSP (ACM MM'06)      ✓         ✓                         l-1
  SR  (ACM MM'07)      ✓         ✓                          2

Disadvantages
- LPP: unsupervised.
- SSP and SR: fail to exploit the asymmetry. SSP emphasizes the irrelevant set; SR treats the relevant and irrelevant sets equally.
- ARE, SSP, and SR: produce very low-dimensional subspaces (at most l-1 dimensions, and only a 2D subspace for SR).

Our Approach: Relevance Aggregation Projections (RAP)

Symbols
- n: total number of samples; l: number of labeled samples.
- d: original dimension; r: reduced dimension.
- X = [x_1, ..., x_l, x_{l+1}, ..., x_n] ∈ R^{d×n}: all samples; X_l = [x_1, ..., x_l] ∈ R^{d×l}: labeled samples.
- F^+: relevant set; F^-: irrelevant set; l^+: number of relevant samples; l^-: number of irrelevant samples.
- A ∈ R^{d×r}: subspace (projection matrix); a ∈ R^d: a projecting vector.
- G(V, E, W): graph; L = D - W: graph Laplacian.

Graph Construction
- Build a k-NN graph with weights

  W_{ij} = \begin{cases} \exp\!\left(-\|x_i - x_j\|^2 / \sigma^2\right), & x_i \in N_k(x_j) \;\vee\; x_j \in N_k(x_i) \\ 0, & \text{otherwise} \end{cases}

- An edge is established if x_i is among the k nearest neighbors of x_j or x_j is among the k nearest neighbors of x_i.
- The graph Laplacian L = D - W ∈ R^{n×n} is used in the smoothness regularizer.

Our Approach

  \min_{A \in \mathbb{R}^{d \times r}} \ \mathrm{tr}(A^\top X L X^\top A)                                                        (1.1)
  \text{s.t.}\quad A^\top x_i = \sum_{j \in F^+} A^\top x_j / l^+, \quad \forall i \in F^+                                        (1.2)
  \qquad\quad \left\| A^\top \left( x_i - \sum_{j \in F^+} x_j / l^+ \right) \right\|^2 \ge r, \quad \forall i \in F^-            (1.3)

- Target: a subspace A that reduces the raw data from d dimensions to r dimensions.
- Objective (1.1): minimize the local scatter, using both labeled and unlabeled data.
- Constraint (1.2): aggregate the positive data (in F^+) at the positive center.
- Constraint (1.3): push each negative sample (in F^-) away from the positive center by a squared distance of at least r (one unit per projected dimension).
- Constraints (1.2) and (1.3) directly address the asymmetry in CBIR.
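To make the graph construction and the scatter term in (1.1) concrete, here is a minimal NumPy/SciPy sketch (not from the original slides; the function name and the defaults for k and sigma are illustrative assumptions). It builds the symmetric k-NN affinity W, the Laplacian L = D - W, and the local scatter S = X L X^T, so that the objective (1.1) becomes tr(A^T S A).

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_graph_laplacian(X, k=5, sigma=1.0):
    """Build the k-NN affinity W, Laplacian L = D - W, and scatter S = X L X^T.

    X : (d, n) array, one sample per column (as in the slides).
    Sketch of the "Graph Construction" slide; k and sigma are free parameters,
    not values taken from the paper.
    """
    d, n = X.shape
    dist2 = cdist(X.T, X.T, metric="sqeuclidean")   # pairwise squared distances
    # indices of the k nearest neighbors of each sample (excluding itself)
    nn = np.argsort(dist2, axis=1)[:, 1:k + 1]
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nn[i]] = np.exp(-dist2[i, nn[i]] / sigma**2)
    W = np.maximum(W, W.T)                          # edge if i is a k-NN of j OR j is a k-NN of i
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacian
    S = X @ L @ X.T                                 # local scatter used in (1.1)
    return W, L, S
```

With a candidate subspace A of shape (d, r), the objective value in (1.1) is then simply np.trace(A.T @ S @ A).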
Core Idea: Relevance Aggregation
- An ideal subspace is one in which the relevant examples are aggregated into a single point while the irrelevant examples are simultaneously separated from that point by a large margin.

Relevance Aggregation Projections
- We transform problem (1) into problem (2) in terms of each column vector a of A (a is a projecting vector):

  \min_{a \in \mathbb{R}^d} \ a^\top X L X^\top a                                                 (2.1)
  \text{s.t.}\quad a^\top x_i = a^\top c^+, \quad \forall i \in F^+                                (2.2)
  \qquad\quad \left( a^\top (x_i - c^+) \right)^2 \ge 1, \quad \forall i \in F^-                   (2.3)

  where c^+ = \sum_{j \in F^+} x_j / l^+ is the positive center.

Solution
- Problem (2.1)-(2.3) is a quadratically constrained quadratic program and is therefore hard to solve directly.
- We want to remove the constraints first and then minimize the cost function.
- We adopt a heuristic to explore the solution:
  - Find ideal 1D projections that satisfy the constraints.
  - Remove the constraints and solve for one part of the solution.
  - Solve for the remaining part of the solution.

Solution: Find Ideal Projections
- Run PCA to get the r principal eigenvectors and renormalize them to obtain V = [v_1, ..., v_r] ∈ R^{d×r} such that V^T X X^T V = I.
- Along each vector v in V, |v^T x_i - v^T x_j| < 2 for all i, j = 1, ..., n.
- Form the ideal 1D projection targets on each projecting direction v:

  y_i = \begin{cases}
    v^\top c^+, & i \in F^+ \\
    v^\top x_i, & i \in F^- \ \wedge\ |v^\top x_i - v^\top c^+| \ge 1 \\
    v^\top c^+ + 1, & i \in F^- \ \wedge\ 0 \le v^\top x_i - v^\top c^+ < 1 \\
    v^\top c^+ - 1, & i \in F^- \ \wedge\ -1 < v^\top x_i - v^\top c^+ < 0
  \end{cases}                                                                                      (3)

  y = [y_1, ..., y_l]^T ∈ R^l

  (Figure: the labeled projections v^T X_l ∈ R^{1×l} are mapped to the targets y^T ∈ R^{1×l}: relevant samples collapse onto v^T c^+; irrelevant samples with |v^T x_i - v^T c^+| ≤ 1 are moved to exactly unit distance from v^T c^+, while those already farther than 1 keep their projections. The vector y is formed for each PCA vector v.)

Solution: QR Factorization
- Remove constraints (2.2)-(2.3) by solving the linear system

  X_l^\top a = y                                                                                    (4)

- Because l < d, system (4) is underdetermined and can therefore be satisfied exactly.
- Perform the QR factorization X_l = [Q_1\ Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R.
- Every solution of (4) is the sum of a particular solution and a complementary (null-space) solution:

  a = Q_1 b_1 + Q_2 b_2, \quad \text{where } b_1 = (R^\top)^{-1} y                                  (5)

Solution: Regularization
- We want the final solution not to deviate too much from the PCA solution, so we adopt a regularization framework:

  f(a) = \|a - v\|^2 + \gamma\, a^\top X L X^\top a                                                 (6)

- γ > 0 controls the trade-off between staying close to the PCA solution and preserving data locality (the original loss function); the second term behaves as a regularizer.
- Plugging a = Q_1 b_1 + Q_2 b_2 into (6) and minimizing over b_2 gives

  b_2 = (I + \gamma Q_2^\top X L X^\top Q_2)^{-1} (Q_2^\top v - \gamma Q_2^\top X L X^\top Q_1 b_1)

Algorithm (a code sketch of these steps follows below)
1. Construct the k-NN graph: W, L, and S = X L X^T.
2. PCA initialization: V = [v_1, ..., v_r].
3. QR factorization of X_l: Q_1, Q_2, R.
4. Transductive regularization: for j = 1, ..., r, form y from v_j, set b_1 = (R^T)^{-1} y, b_2 = (I + γ Q_2^T S Q_2)^{-1} (Q_2^T v_j - γ Q_2^T S Q_1 b_1), and a_j = Q_1 b_1 + Q_2 b_2.
5. Projecting: map each sample x to [a_1, ..., a_r]^T x.
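The following NumPy sketch traces steps 2-5 above. It is not the authors' code: the function and variable names, the default values of r and gamma, the SVD-based PCA initialization, and the tie-breaking of the boundary case v^T x_i = v^T c^+ toward +1 are assumptions. S is the local scatter matrix X L X^T, for example as returned by the knn_graph_laplacian sketch earlier.

```python
import numpy as np

def rap_projections(X, pos_idx, neg_idx, S, r=10, gamma=1.0):
    """Sketch of Relevance Aggregation Projections (steps 2-5 of the Algorithm slide).

    X       : (d, n) data matrix, one sample per column.
    pos_idx : column indices of relevant (F+) samples; neg_idx: irrelevant (F-) samples.
    S       : local scatter matrix X L X^T, e.g. from knn_graph_laplacian above.
    Returns A of shape (d, r) whose columns are the projecting vectors a_1, ..., a_r.
    """
    pos_idx, neg_idx = np.asarray(pos_idx), np.asarray(neg_idx)
    d, n = X.shape
    labeled = np.concatenate([pos_idx, neg_idx])
    Xl = X[:, labeled]                                   # labeled samples, shape (d, l)
    l, l_pos = Xl.shape[1], len(pos_idx)
    c_pos = X[:, pos_idx].mean(axis=1)                   # positive center c+

    # Step 2: PCA directions, renormalized so that V^T X X^T V = I
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    V = U[:, :r] / s[:r]

    # Step 3: full QR factorization X_l = [Q1 Q2] [R; 0]
    Q, R = np.linalg.qr(Xl, mode="complete")
    Q1, Q2, R = Q[:, :l], Q[:, l:], R[:l, :]

    A = np.zeros((d, r))
    for j in range(r):
        v = V[:, j]
        # Step 4a: ideal 1D targets y, eq. (3)
        diff = v @ Xl[:, l_pos:] - v @ c_pos             # v^T x_i - v^T c+ for irrelevant i
        y = np.empty(l)
        y[:l_pos] = v @ c_pos                            # relevant samples collapse to the center
        y[l_pos:] = v @ c_pos + np.where(np.abs(diff) >= 1, diff,
                                         np.sign(diff) + (diff == 0))
        # Step 4b: particular solution b1 and regularized null-space part b2, eqs. (4)-(6)
        b1 = np.linalg.solve(R.T, y)
        M = np.eye(d - l) + gamma * Q2.T @ S @ Q2
        b2 = np.linalg.solve(M, Q2.T @ v - gamma * Q2.T @ S @ (Q1 @ b1))
        A[:, j] = Q1 @ b1 + Q2 @ b2
    return A
```

Projecting a sample x into the learned subspace is then A.T @ x, as in step 5.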
Experimental Results

Experimental Setup
- Corel image database: 10,000 images, 100 images per category.
- Features: two types of color features and two types of texture features, 91 dimensions in total.
- Five feedback iterations, labeling the top-10 ranked images in each iteration.
- The statistical average top-N precision is used for performance evaluation.

Evaluation
(Two slides of evaluation figures; no textual content to reproduce.)

Conclusions
- We develop RAP to simultaneously address three fundamental issues in relevance feedback: the asymmetry between classes, the small sample size (by incorporating unlabeled samples), and the high dimensionality.
- RAP learns a semantic subspace in which the relevant samples collapse while the irrelevant samples are pushed outward with a large margin.
- RAP can also be used to solve imbalanced semi-supervised learning problems with few labeled data.
- Experiments on Corel demonstrate that RAP achieves significantly higher precision than the state of the art.

Thanks!
http://www.ee.columbia.edu/~wliu/