Multiple Instance Learning with Query Bags

Boris Babenko, Piotr Dollar, Serge Belongie
[In prep. for ICML’09; feedback appreciated!]

Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion

Multiple Instance Learning (MIL)
• Ambiguity in training data
• Instead of instance/label pairs, get bag-of-instances/label pairs
• Bag is positive if one or more of its members is positive

Multiple Instance Learning (MIL)
• Supervised learning training input: instance/label pairs
• MIL training input: bag/label pairs
• Goal: learn an instance classifier

MIL Assumptions
• Bags are predefined/fixed & finite (size m)
• Bag label determined by: y_i = max_j (y_ij)
• Typical assumption: instances all drawn i.i.d.
• Refer to this as a classical bag

MIL Theory
• Best known PAC bound due to Blum et al. [...]
• Bound depends on the dimensionality, the bag size, and the desired error
• Problem harder for larger bags
• Result relies on i.i.d. assumption

MIL Applications
• Most MIL apps: bag generated by breaking an object into many overlapping pieces
• Let’s see some examples…

Vision
• Image known to contain object, but precise location unknown [Andrews et al. 02, Viola et al. 05]

Audio
• Audio wave can be broken up spatially or in frequency domain [Saul et al. 01]

Biological Sequences
• Known to contain short subsequence of interest
• ACTGTGTGACATGTAGC { ACTG, CTGT, TGTG…} … [Ray et al. 05]

Text
• Text document broken down into smaller pieces [Andrews et al. 02]

Observations
• Sliding windows: bags are large/infinite
• In practice, bag is sub-sampled
  – Could violate the assumption
• Instances of bag not independent
  – Often lie on low-dim. manifold (e.g. image patches)
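The classical bag-label rule and the sub-sampling concern from the Observations slide can be illustrated with a small sketch. It assumes binary instance labels and uniform sub-sampling without replacement; the function names (`bag_label`, `subsample_keeps_label`, `miss_rate`) and all parameter values are hypothetical and not taken from the slides.

```python
import random


def bag_label(instance_labels):
    """Classical MIL rule: a bag is positive if any instance is positive."""
    return int(any(instance_labels))


def subsample_keeps_label(instance_labels, k, rng):
    """True if a uniform size-k subsample has the same label as the full bag."""
    sample = rng.sample(instance_labels, k)
    return bag_label(sample) == bag_label(instance_labels)


def miss_rate(bag_size, n_positive, k, trials=10000, seed=0):
    """Estimate how often sub-sampling k instances flips the bag label.

    Assumption (illustrative only): the full bag has `n_positive` positive
    instances out of `bag_size`, and the subsample is drawn uniformly.
    """
    rng = random.Random(seed)
    labels = [1] * n_positive + [0] * (bag_size - n_positive)
    misses = sum(
        not subsample_keeps_label(labels, k, rng) for _ in range(trials)
    )
    return misses / trials


if __name__ == "__main__":
    # One positive instance in a bag of 100: small subsamples usually miss it.
    for k in (10, 50, 100):
        print(k, miss_rate(bag_size=100, n_positive=1, k=k))
```

With a single positive instance among 100, a subsample of 10 contains it with probability 10/100, so the sub-sampled bag is labeled incorrectly about 90% of the time. This is one way sub-sampling can break the bag-label rule.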
Query Bags for MIL
• Bag not fixed: can query an oracle to get an arbitrary number of instances
• Each query bag is represented by an object
• To retrieve instances, use a query function with a location parameter

Query Bags for MIL
• Instances often lie on a low-dim. manifold
• Can query for nearby instances

Query Bags for MIL
• Can express bag as [...]
• Define bag label as [...]

Distribution of Locations
• Assume for each bag there is some distribution over locations (known or unknown)
• Could provide some prior information
• Let [...]; how informative is [...]?

Query Bag Size
• To determine bag label with confidence, need [...]
• Bigger bag = better: less chance of missing a correct positive instance
• Note the difference between query bags and classical bags

Example: Line Bags
• Instances of a bag lie on a line

Example: Hypercube Bags
• Instances of a bag lie in a hypercube

Example: Image Translation Bags
• Let the bag object be a large image and the instance at a location be the patch centered there
• Could easily extend this to rotations, scale changes, etc.

Experiments
• Goal: compare behavior of synthetic classical bags and query bags to a real dataset (MNIST)
• Use MILBoost (Viola et al. ’05)
• Expect qualitatively similar results for other MIL algorithms
• For query bags, subsample instances

Results
• [Figure: result plots]

Experiment: Variance
• How does the location distribution affect error?
• Repeat Line Bag experiment, increase variance of the location distribution, which spreads points out along the line

Observations
• PAC results not applicable to query bags: performance increases as bag size increases
• MNIST results closely resemble synthetic query bag examples
• Need computational strategy for dealing with large bags
• Take advantage of relationships between instances
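A minimal sketch of a line bag may make the query-bag idea concrete. Everything here is an assumption for illustration: the `LineBag` class, the Gaussian location distribution, and the ball-shaped positive concept are hypothetical choices, not the setup used in the experiments above.

```python
import random


class LineBag:
    """Hypothetical query bag whose instances lie on a line in R^d."""

    def __init__(self, origin, direction):
        self.origin = origin
        self.direction = direction

    def query(self, loc):
        """Return the instance at scalar location `loc` along the line."""
        return [o + loc * d for o, d in zip(self.origin, self.direction)]


def sample_instances(bag, m, sigma, rng):
    """Draw m locations from N(0, sigma^2) and query the bag at each one."""
    return [bag.query(rng.gauss(0.0, sigma)) for _ in range(m)]


def is_positive(x, center, radius):
    """Assumed ground-truth concept: an instance is positive inside a ball."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) <= radius ** 2


def observed_bag_label(instances, center, radius):
    """Bag label seen by the learner: positive if any sampled instance is."""
    return int(any(is_positive(x, center, radius) for x in instances))


def detection_rate(m, sigma=1.0, trials=2000, seed=0):
    """Fraction of trials in which m queried instances reveal the positive."""
    rng = random.Random(seed)
    bag = LineBag(origin=[0.0, 0.0], direction=[1.0, 0.0])
    center, radius = [1.5, 0.0], 0.25  # positive region crosses the line
    hits = sum(
        observed_bag_label(sample_instances(bag, m, sigma, rng), center, radius)
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for m in (1, 4, 16, 32):
        print(m, detection_rate(m))
```

In this toy setting the true bag is always positive, and the chance of observing that grows with the number of queried instances, in line with the "bigger bag = better" point. Changing `sigma` changes how often sampled locations fall in the positive region, which is the kind of effect the variance experiment looks at.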
MILBoost Review
• Train a strong classifier (just like AdaBoost)
• Optimize log likelihood of bags [...]
• Use Gradient Boosting (Friedman ’01)
  – In each iteration, add a weak classifier close to the gradient of the log likelihood

MILBoost w/ Query Bags
• Bag probability is defined over all instances [...]
• In practice, subsample the bag [...]
• Could subsample once in the beginning, or do something more clever…

Filtering Strategies
• Recently, Bradley & Schapire proposed FilterBoost, which learns from a continuous source of data
• Alternates between training a weak classifier and querying an oracle for more data
• Apply this idea to MILBoost

Filtering Strategies
• Want highest-probability instances
• Parameters:
  – T = number of boosting iterations
  – R = number of instances to evaluate
  – F = frequency of filtering

Filtering Strategies
• Random Sampling (RAND)
  – Query R instances, keep best m
• Memory (MEM)
  – Query R new instances, combine with old ones, keep best m
• [Figure: RAND vs. MEM]

Filtering Strategies
• Search (SRCH)
  – Assume instances lie on a low-dimensional manifold
  – Search for nearby [...] such that [...]
  – Test nearby locations
• [Figure: RAND, MEM, SRCH]

MNIST Filtering Experiments
• Turn SRCH and MEM on and off
• Sweep through:
  – R = sampling amount (16)
  – m = bag size (4)
  – F = sampling frequency (1)
  – T = boosting iterations (64)

MNIST Filtering Exp: m
• Filtering converges w/ smaller memory usage

MNIST Filtering Exp: R & F
• MEM is very effective
• SRCH helps when MEM is OFF; less of a difference when MEM is ON

MNIST Filtering Exp: T
• Without MEM, filtering does not converge
• Positive region becomes sparse

Why MEM Works
• Let [...] be the log likelihood with [...]
• Can show (for a fixed classifier H): [...]
  – Using MEM, we add new instances per bag in each iteration, so [...]
• In reality H is not fixed; hard to show convergence
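The RAND and MEM strategies can be sketched as follows. The sketch assumes the noisy-OR bag model usually associated with MILBoost (instance probability as a sigmoid of the classifier score, bag probability as one minus the product of instance miss probabilities); the slides' own formulas did not survive, so treat that as an assumption. The function names, the `query` callable, and the fixed scoring function `H` are hypothetical.

```python
import math
import random


def instance_prob(score):
    """Sigmoid of the classifier score H(x); assumes moderate score values."""
    return 1.0 / (1.0 + math.exp(-score))


def bag_prob(scores):
    """Noisy-OR: the bag is positive unless every instance is negative."""
    prod = 1.0
    for s in scores:
        prod *= 1.0 - instance_prob(s)
    return 1.0 - prod


def keep_best(instances, H, m):
    """Keep the m instances with the highest classifier score."""
    return sorted(instances, key=H, reverse=True)[:m]


def rand_filter(query, H, R, m, rng):
    """RAND: query R fresh instances and keep the best m."""
    fresh = [query(rng) for _ in range(R)]
    return keep_best(fresh, H, m)


def mem_filter(query, H, R, m, rng, memory):
    """MEM: query R fresh instances, merge with retained ones, keep best m."""
    fresh = [query(rng) for _ in range(R)]
    return keep_best(memory + fresh, H, m)


def run_mem(query, H, R, m, rounds, seed=0):
    """Apply MEM repeatedly with a fixed H and record the bag probability."""
    rng = random.Random(seed)
    memory, history = [], []
    for _ in range(rounds):
        memory = mem_filter(query, H, R, m, rng, memory)
        history.append(bag_prob([H(x) for x in memory]))
    return history


if __name__ == "__main__":
    # Toy setup: instances are scalars, H prefers values near 1.0.
    H = lambda x: -abs(x - 1.0)
    query = lambda rng: rng.uniform(-3.0, 3.0)
    print(run_mem(query, H, R=16, m=4, rounds=8))
```

In this sketch, with `H` held fixed, the retained set under MEM can only improve from round to round, so the recorded bag probability never decreases. RAND gives no such guarantee because it discards earlier instances. Once `H` changes between rounds, as it does during boosting, this simple argument no longer applies.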
Summary
• Current assumptions for MIL are not appropriate for typical MIL applications
• We proposed the query bag model, which fits real data better
• For query bags, sampling more instances is better
• We proposed some simple strategies for dealing with large/infinite query bags

Future Work
• Develop more theory for the query bag model
• Experiments with other domains (audio, bioinformatics)
• MCL: learning pedestrian parts automatically

Questions?

Filtering Query Bags

MILBoost with Filtering
