Support Vector Machine Active Learning for Image Retrieval
Authors: Simon Tong & Edward Chang
Presented by: Navdeep Dandiwal 800810102

Content
- Motivation
- Introduction
- SVM
- Version Space
- Active Learning
- Image Characterization
- Experimental Data
- Conclusions

Motivation
- Relevance feedback is often a critical component when designing image databases.
- It interactively determines a user's desired output by asking the user to label images.
- Can it become tedious for the user?
- How do we create effective relevance feedback?

Abstract
The proposed Support Vector Machine active learning algorithm provides:
- Effective relevance feedback, by grasping the user's query concept accurately and quickly while asking the user to label only a small number of images
- Selection of the most informative images to query a user
- Quick learning of the boundary that separates the images satisfying the user's query concept from the rest of the dataset

Introduction
- The user should be able to implicitly inform a database of his or her desired output, or query concept.
- Relevance feedback can be used as a query refinement scheme to learn the user's query concept.
- Based on the user's answers, another image set is brought up for the user to label.
- We call such a refinement scheme a query-concept learner.

Introduction (contd…)
Refinement scheme:
- Fetch a few image instances
- The user labels each instance as relevant or irrelevant

Introduction (contd…)
- Most machine learning algorithms are passive, in the sense that they are generally applied to a randomly selected training set.
- Key idea of active learning: the learner should be able to choose its next pool-query based upon the answers to previous pool-queries.

Introduction (contd…)
The Support Vector Machine Active Learner (SVMActive) works on the following ideas:
- As in learning an SVM binary classifier, a hyperplane separates relevant and irrelevant images in a projected space.
- It learns the classifier quickly via active learning.
- It returns the top-k most relevant images: the ones farthest from the hyperplane on the relevant side.

Support Vector Machines
- In their simplest form, SVMs are hyperplanes that separate the training data by a maximal margin.
- All vectors on one side of the hyperplane are labeled −1, and all vectors on the other side are labeled +1.
- The training instances that lie closest to the hyperplane are called support vectors.

Support Vector Machines (contd…)
[Figure: separating hyperplane with the support vectors highlighted]
- We are given training data {x_1, …, x_n} that are vectors in some space X.
- We are also given their labels {y_1, …, y_n}, where y_i ∈ {−1, +1}.

Support Vector Machines (contd…)
- SVMs allow one to project the original training data in space X to a higher-dimensional feature space F via a Mercer kernel operator K.
- We consider classifiers of the form f(x) = Σ_{i=1..n} α_i K(x_i, x).
- When f(x) ≥ 0 we classify x as +1, otherwise as −1.

Support Vector Machines (contd…)
- When K satisfies Mercer's condition, it can be written as K(u, v) = Φ(u) · Φ(v), where Φ: X → F and "·" denotes an inner product.
- We can then write f as f(x) = w · Φ(x), where w = Σ_{i=1..n} α_i Φ(x_i).
- Thus, by using K we are implicitly projecting the training data into a different (often higher-dimensional) space F.

Version Space
- Given labeled training data and a Mercer kernel K, the set of consistent hyperplanes that separate the data in the induced feature space F is called the version space.
- Our set of possible hypotheses is given as H = { f | f(x) = (w · Φ(x)) / ‖w‖, w ∈ W }, where the parameter space W is simply equal to F.

Version Space (contd…)
- The version space V is defined as V = { f ∈ H | y_i f(x_i) > 0 for all i ∈ {1, …, n} }.
- Notice that since H is a set of hyperplanes, there is an exact correspondence between unit vectors w and hypotheses f in H.
- Thus we will redefine V as V = { w ∈ W | ‖w‖ = 1, y_i (w · Φ(x_i)) > 0 for i = 1, …, n }.
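To make the kernel machinery above concrete, here is a minimal sketch (an assumption on my part: the deck names no tooling, so scikit-learn and NumPy are used here) verifying that a trained kernel SVM's decision function equals the dual expansion f(x) = Σ_i α_i y_i K(x_i, x), plus the bias term b that scikit-learn's formulation adds.

```python
# Minimal sketch (assumes scikit-learn and NumPy; not from the original slides):
# check that the trained SVM's decision function matches the dual expansion
# f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(40, 2)                          # training vectors x_1..x_n
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # labels y_i in {-1, +1}

clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

x_new = rng.randn(5, 2)
# dual_coef_ stores alpha_i * y_i for the support vectors only.
K = rbf_kernel(clf.support_vectors_, x_new, gamma=1.0)
f_manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()
f_sklearn = clf.decision_function(x_new)
assert np.allclose(f_manual, f_sklearn)       # classify x_new by sign(f)
```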
Version Space (contd…)
- SVMs find the hyperplane that maximizes the margin in feature space F.
- One way to pose this is: maximize over w ∈ F the quantity min_i { y_i (w · Φ(x_i)) }, subject to ‖w‖ = 1 and y_i (w · Φ(x_i)) > 0 for i = 1, …, n.
- The constraints cause the solution to lie in the version space.

Version Space (contd…)
- We want to find the point in the version space that maximizes the minimum distance to any of the delineating hyperplanes.
- This is the largest sphere whose center lies in the version space and whose surface does not intersect the hyperplanes.
- Its center corresponds to the SVM, and its radius is the margin of the SVM in feature space F.

Active Learning
- In pool-based active learning we have a pool of unlabeled instances.
- Instances x are independently and identically distributed according to some underlying distribution F(x).
- Labels are distributed according to some conditional distribution P(y | x).

Active Learning (contd…)
Given an unlabeled pool U, an active learner ℓ has three components: (f, q, X)
- f: the classifier
- q(X): the querying function, which, given the current labeled set X, decides which instance in U to query next
- X: the labeled data
- The learner can return a classifier f after each pool-query, or after a fixed number of pool-queries.
- The querying function is the difference between an active and a passive learner.

Active Learning (contd…)
- How do we choose the next unlabeled instance in the pool to query?
- Use an approach that queries points so as to reduce the size of the version space as much as possible.

Active Learning (contd…)
[Figure: version space as a segment on the surface of a hypersphere]
- The surface of the hypersphere represents unit weight vectors.
- Each of the two hyperplanes corresponds to a labeled training instance.
- The version space is the surface segment closest to the camera.

Active Learning (contd…)
[Figure: largest sphere embedded in the version space]
- A large sphere can be embedded so that its center lies in the version space and its surface does not intersect the hyperplanes.
- The center is the SVM; the radius is the margin.

Active Learning (contd…)
- Reduce the version space as fast as possible by choosing a pool-query that halves V.
[Figure: next pool-query among unlabeled and labeled instances w_i, with the SVM at the center of the largest hypersphere that fits inside the version space]

Active Learning (contd…)
- SVMActive takes a simple approach: it chooses a pool-query of the twenty images closest to its separating hyperplane.
- It can be unstable during the first round of relevance feedback; therefore it chooses random images for the first round.

SVMActive Algorithm
1. Learn an SVM on the current labeled data.
2. If it is the first feedback round, ask the user to label 20 randomly selected images; otherwise, ask the user to label the 20 pool images closest to the SVM boundary.
3. Repeat steps 1-2 for the remaining relevance feedback rounds.
4. Learn a final SVM on the labeled data and display the top-k relevant images: those farthest from the SVM boundary on the relevant side.
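A compact sketch of this feedback loop, under stated assumptions: scikit-learn provides the SVM, `features` holds one feature vector per database image, and `ask_user` is a hypothetical oracle standing in for the relevance-feedback interface (none of these names appear in the slides).

```python
# Sketch of the SVMActive loop above (assumptions: scikit-learn for the SVM;
# ask_user(i) is a hypothetical oracle returning +1 if the user marks image i
# relevant, -1 otherwise).
import numpy as np
from sklearn.svm import SVC

def svm_active(features, ask_user, rounds=4, per_round=20, top_k=50):
    pool = np.arange(len(features))        # unlabeled pool U (image indices)
    labeled, labels = [], []
    for r in range(rounds):
        if r == 0:
            # First round: the learner is unstable, so sample at random.
            # (The labels must contain at least one relevant and one
            # irrelevant image for the SVM to be trainable.)
            query = np.random.choice(pool, per_round, replace=False)
        else:
            clf = SVC(kernel="rbf").fit(features[labeled], labels)
            # Simple-margin querying: ask about the pool images closest
            # to the current separating hyperplane.
            dist = np.abs(clf.decision_function(features[pool]))
            query = pool[np.argsort(dist)[:per_round]]
        labeled.extend(int(i) for i in query)
        labels.extend(ask_user(int(i)) for i in query)
        pool = np.setdiff1d(pool, query)
    # Final SVM: return the top-k images farthest from the boundary
    # on the relevant side.
    clf = SVC(kernel="rbf").fit(features[labeled], labels)
    scores = clf.decision_function(features[pool])
    return pool[np.argsort(-scores)[:top_k]]
```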
Image Characterization
- Our system employs a multi-resolution image representation scheme.
- In this scheme, we characterize images by two main features: color and texture.
- We consider shape as an attribute of these main features.

Image Characterization (contd…)
[Figure: multi-resolution color features]

Image Characterization (contd…)
Multi-resolution texture features
- Three characterizing texture features: structuredness, orientation, and scale
- Texture is extracted with the Discrete Wavelet Transformation (DWT) using quadrature mirror filters, chosen for its computational efficiency.

Image Characterization (contd…)
[Figure: multi-resolution texture features]

Image Characterization (contd…)
- Color and texture extraction together yield a 144-dimensional vector.
- The input space for SVMActive is therefore a 144-dimensional space.
- Each image in the database corresponds to a point in this space.

Experiments
- Datasets: 4-category, 10-category, and 15-category sets.
- To enable an objective measure of performance, it is assumed that a query concept is an image category.
- Accuracy is computed as the fraction of the k returned results that belong to the target image category.
- All SVM algorithms require at least one relevant and one irrelevant image to function.

Experiments (contd…)
[Figure: sample images from the 4-category, 10-category, and 15-category sets]

Experiments (contd…)
- SVMActive displays 20 images per pool-querying round.
- There is a trade-off between the number of images in one round and the number of querying rounds: keeping the number of images per round constant, fewer querying rounds mean lower performance.

Experiments (contd…)
- 20 random + 2 rounds of 10 outperforms 20 random + 1 round of 20, because the active learner has more control and freedom to adapt when asking two rounds of 10 images than one round of 20.
- 20 random + 2 rounds of 20 vs. 20 random + 2 rounds of 10: the increase in cost to the user of asking 20 images per round is negligible, since the user can pick out the relevant images easily, and there is virtually no additional computational cost in calculating the 20 images to query.

Experiments (contd…)
- SVMActive displays 20 images per pool-querying round.
- Keeping the number of images per round constant, conducting more querying rounds increases performance.

Experiments (contd…)
[Figure: active vs. regular passive learning on the 15-category dataset, after three and after five rounds of querying]

Experiments (contd…)
[Figure: average top-50 accuracy over the 4-category dataset using a regular SVM trained on 30 images; accuracy on the 4-category dataset after three querying rounds using various kernels]

Experiments (contd…)
Scheme comparison: other schemes (QPM: query point movement; QEX: query expansion)
- Traditional information retrieval schemes require a large number of image instances to achieve any substantial refinement.
- They tend to be fairly localized in their exploration of the image space and hence rather slow in exploring the entire space.

Experiments (contd…)
Scheme comparison: SVMActive
- During relevance feedback, it takes both the relevant and the irrelevant images into account when choosing the next pool-queries.
- It asks the user to label the images regarded as most informative for learning the query concept, rather than relying on their likelihood of being relevant.

Experiments (contd…)
[Figure: average top-k accuracy over the 15-category dataset]
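As a small illustration of the evaluation protocol above, a sketch of the top-k accuracy measure (the function name and arguments are illustrative; the paper gives no code):

```python
# Top-k accuracy as defined in the experiments: the fraction of the k
# returned images whose true category matches the target query concept.
def top_k_accuracy(returned, categories, target):
    hits = sum(1 for i in returned if categories[i] == target)
    return hits / len(returned)

# e.g. top_k_accuracy(results, categories, "flowers") for a top-50 result list
```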
Conclusions
In a nutshell, the contributions of this study are:
- SVMActive can produce a well-suited learner that significantly outperforms traditional methods.
- Organizing image features at different resolutions gives the learner the flexibility to model subjective perception and to satisfy a variety of search tasks.

Conclusions (contd…)
- The running time of the SVMActive algorithm scales linearly with the size of the image database.
- Solution: subsampling the database, i.e., using a few thousand images as the pool with which to query the user.
- Designing methods to seed the algorithm: it would be beneficial to make SVMActive independent of needing a starting relevant image.

Resources
- http://courses.cms.caltech.edu/cs101.2/slides/cs101.2-09-svm-active-learning.pdf
- http://airccse.org/journal/sipij/papers/3112sipij04.pdf