Multiple Instance Learning with Query Bags
Boris Babenko, Piotr Dollar, Serge Belongie
[In prep. for ICML’09 – feedback appreciated!]

Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion

Multiple Instance Learning (MIL)
• Ambiguity in training data
• Instead of instance/label pairs, the learner gets bag-of-instances/label pairs
• A bag is positive if one or more of its members is positive

Multiple Instance Learning (MIL)
• Supervised learning training input: instance/label pairs
• MIL training input: bag/label pairs, where each bag is a set of instances
• Goal: learn an instance classifier

MIL Assumptions
• Bags are predefined/fixed & finite (size m)
• Bag label determined by the instance labels: positive if at least one instance is positive
• Typical assumption: instances all drawn i.i.d.
• Refer to this as a classical bag

MIL Theory
• Best known PAC bound due to Blum et al., stated in terms of:
  – the dimensionality
  – the bag size
  – the desired error
• Problem harder for larger bags
• Result relies on i.i.d. assumption

MIL Applications
• Most MIL apps: bag generated by breaking an object into many overlapping pieces.
• Let’s see some examples…

Vision
• Image known to contain object, but precise location unknown [Andrews et al. 02, Viola et al. 05]

Audio
• Audio wave can be broken up spatially or in frequency domain [Saul et al. 01]

Biological Sequences
• Known to contain short subsequence of interest [Ray et al. 05]
• ACTGTGTGACATGTAGC → { ACTG, CTGT, TGTG…} …

Text
• Text document broken down into smaller pieces [Andrews et al. 02]

Observations
• Sliding windows: bags are large/infinite.
• In practice, the bag is sub-sampled
  – Could violate the bag label assumption
• Instances of a bag are not independent
  – often lie on a low dim. manifold (e.g. image patches)
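The classical bag-label rule and the sliding-window observation above can be illustrated with a minimal Python sketch. It builds a bag from the slide's example sequence and shows how sub-sampling can drop the only positive instance. The motif `GACA`, the window width of 4, and the every-fifth-window sub-sampling are assumptions chosen for illustration; they do not come from the slides.

```python
def sliding_window_bag(sequence, width):
    """Return every contiguous window of `width` characters (one instance each)."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def bag_label(instance_labels):
    """Classical MIL rule: the bag is positive iff at least one instance is positive."""
    return max(instance_labels)


MOTIF = "GACA"  # hypothetical subsequence of interest (assumption, not from the slides)


def instance_label(instance):
    """Toy instance-level concept: positive only for the motif itself."""
    return int(instance == MOTIF)


sequence = "ACTGTGTGACATGTAGC"          # example sequence from the slide
bag = sliding_window_bag(sequence, 4)   # ACTG, CTGT, TGTG, ...
full_label = bag_label([instance_label(x) for x in bag])

# Sub-sample the bag by keeping every fifth window (an arbitrary choice).
subsampled = bag[::5]
subsampled_label = bag_label([instance_label(x) for x in subsampled])

print("full bag:", len(bag), "instances, label", full_label)
print("sub-sampled bag:", subsampled, "label", subsampled_label)
```

In this toy case the full bag is positive but the sub-sampled bag looks negative, which is the failure mode the Observations slide points at.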
Query Bags for MIL
• Bag not fixed – can query an oracle to get an arbitrary number of instances
• Each query bag is represented by an object
• To retrieve instances, use a query function with a location parameter

Query Bag for MIL
• Instances often lie on a low dim. manifold
• Can query for nearby instances

Query Bags for MIL
• Can express the bag as the set of instances the query function returns over all locations
• Define the bag label as positive if the instance at any location is positive

Distribution of locations
• Assume for each bag there is some distribution over locations (known or unknown)
• Could provide some prior information.
• Let [formula missing], how informative is [formula missing]

Query Bag Size
• To determine the bag label with confidence, a sufficiently large sample of instances is needed
• Bigger bag = better. Less chance of missing the correct positive instance
• Note the difference between query bags and classical bags

Example: Line Bags
• Instances of a bag lie on a line.

Example: Hypercube Bags
• Instances of a bag lie in a hypercube

Example: Image Translation Bags
• Let the object be a large image; the instance at a location is the patch centered at that location
• Could easily extend this to rotations, scale changes, etc.

Experiments
• Goal: compare behavior of synthetic classical bags and query bags to a real dataset (MNIST).
• Use MILBoost (Viola et al. ’05).
• Expect qualitatively similar results for other MIL algorithms.
• For query bags, subsample instances

Results

Experiment: Variance
• How does the location distribution affect error?
• Repeat the Line Bag experiment, increasing the variance of the location distribution, which spreads points out along the line.

Observations
• PAC results not applicable to query bags – performance increases as the number of sampled instances increases.
• MNIST results closely resemble synthetic query bag examples.
• Need a computational strategy for dealing with large bags.
• Take advantage of relationships between instances.
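The query bag idea above (an object, a query function, and a location parameter) can be sketched in Python for the Line Bag example. Everything concrete here is an assumption made for illustration: the class name `LineBag`, the two-dimensional line, the Gaussian location distribution, and the "positive inside a small ball" instance concept are not taken from the slides.

```python
import random


class LineBag:
    """Hypothetical query bag whose instances lie on a line:
    instance = origin + location * direction."""

    def __init__(self, origin, direction):
        self.origin = origin
        self.direction = direction

    def query(self, location):
        """Query function: return the instance at a scalar location."""
        return tuple(o + location * d for o, d in zip(self.origin, self.direction))


def instance_label(x, center=(2.0, 2.0), radius=0.5):
    """Toy concept (assumption): an instance is positive inside a small ball."""
    dist2 = sum((a - c) ** 2 for a, c in zip(x, center))
    return int(dist2 <= radius ** 2)


def sampled_bag_label(bag, locations):
    """Bag label estimated from a finite sample of locations (max over instances)."""
    return max(instance_label(bag.query(loc)) for loc in locations)


rng = random.Random(0)
bag = LineBag(origin=(0.0, 0.0), direction=(1.0, 1.0))

# Draw locations from an assumed Gaussian location distribution.
locations = [rng.gauss(0.0, 2.0) for _ in range(256)]

# Nested samples: each larger sample contains the smaller ones, so the
# estimated label can only switch from 0 to 1 as more instances are queried.
labels_by_size = {m: sampled_bag_label(bag, locations[:m]) for m in (1, 4, 16, 64, 256)}
print(labels_by_size)
```

Because the samples are nested, a larger sample can never lose a positive instance that a smaller one found, which mirrors the "bigger bag = better" point on the Query Bag Size slide.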
MILBoost Review
• Train a strong classifier (just like AdaBoost)
• Optimize the log likelihood of bags, where the bag probability is computed from the instance probabilities and the instance probabilities from the classifier output
• Use Gradient Boosting (Friedman ’01)
  – In each iteration add a weak classifier close to the gradient of the log likelihood

MILBoost w/ Query Bags
• Bag probability is defined over all instances
• In practice, subsample the bag and compute the probability over the sampled instances
• Could subsample once in the beginning, or do something more clever…

Filtering Strategies
• Recently, Bradley & Schapire proposed FilterBoost, which learns from a continuous source of data.
• Alternates between training a weak classifier and querying an oracle for more data.
• Apply this idea to MILBoost

Filtering Strategies
• Want highest probability instances
• Parameters:
  – T = number of boosting iterations
  – R = number of instances to evaluate
  – F = frequency of filtering

Filtering Strategies
• Random Sampling (RAND)
  – Query R instances, keep best m
• Memory (MEM)
  – Query R new instances, combine with old ones, keep best m
[Figure: MEM vs. RAND]

Filtering Strategies
• Search (SRCH)
  – Assume instances lie on a low dimensional manifold
  – Search for nearby locations that yield higher-probability instances
  – Test nearby locations
[Figure: MEM, SRCH, RAND]

MNIST Filtering Experiments
• Turn SRCH and MEM on and off.
• Sweep through:
  – R = sampling amount (16)
  – m = bag size (4)
  – F = sampling frequency (1)
  – T = boosting iterations (64)

MNIST Filtering Exp: m
• Filtering converges w/ smaller memory usage

MNIST Filtering Exp: R & F
• MEM is very effective
• SRCH helps when MEM is OFF; it makes less of a difference when MEM is ON

MNIST Filtering Exp: T
• w/o MEM, filtering does not converge
• Positive region becomes sparse

Why MEM Works
• Let [formula missing] be the log likelihood with [formula missing]
• Can show (for a fixed classifier H):
  – [formula missing]
  – Using MEM, new instances are added to each bag in every iteration
• In reality H is not fixed; hard to show convergence.
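The RAND and MEM strategies above can be sketched in Python for a single bag and a fixed classifier. This is a minimal sketch under stated assumptions, not the authors' implementation: the noisy-OR bag probability follows the usual MILBoost formulation (Viola et al. ’05), while the one-dimensional instances, the uniform location sampling, the toy score function `classifier`, and the helper names are hypothetical. The boosting update itself is omitted, so the classifier never changes.

```python
import math
import random


def instance_prob(score):
    """Instance probability as a sigmoid of the classifier score H(x)."""
    return 1.0 / (1.0 + math.exp(-score))


def bag_prob(scores):
    """Noisy-OR bag probability: 1 - prod_j (1 - p_ij)."""
    prod = 1.0
    for s in scores:
        prod *= 1.0 - instance_prob(s)
    return 1.0 - prod


def query(location):
    """Hypothetical query function: the instance is just the location."""
    return location


def classifier(x):
    """Fixed toy classifier (assumption): scores peak at x = 1."""
    return -abs(x - 1.0)


def filter_bag(kept, R, m, rng, use_mem):
    """Query R new instances, optionally merge with the kept ones (MEM),
    and keep the m highest-scoring instances."""
    candidates = [query(rng.uniform(-3.0, 3.0)) for _ in range(R)]
    if use_mem:
        candidates += kept
    return sorted(candidates, key=classifier, reverse=True)[:m]


def run_filtering(T=64, R=16, m=4, F=1, use_mem=True, seed=0):
    """Run T iterations, filtering every F iterations; record the best score."""
    rng = random.Random(seed)
    kept, best_scores = [], []
    for t in range(T):
        if t % F == 0:
            kept = filter_bag(kept, R, m, rng, use_mem)
        best_scores.append(max(classifier(x) for x in kept))
    return kept, best_scores


kept_mem, best_mem = run_filtering(use_mem=True)
kept_rand, best_rand = run_filtering(use_mem=False)
print("MEM  final bag probability:", bag_prob([classifier(x) for x in kept_mem]))
print("RAND final bag probability:", bag_prob([classifier(x) for x in kept_rand]))
```

With a fixed classifier, the best score kept by MEM can never decrease from one iteration to the next, because the previous best is always among the candidates. RAND gives no such guarantee, since each filtering step discards the old instances.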
Summary
• Current assumptions for MIL are not appropriate for typical MIL applications.
• We proposed the query bag model, which fits real data better.
• For query bags, sampling more instances is better.
• We proposed some simple strategies for dealing with large/infinite query bags.

Future Work
• Develop more theory for the query bag model.
• Experiments with other domains (audio, bioinformatics).
• MCL – learning pedestrian parts automatically.

Questions?

Filtering Query Bags

MILBoost with Filtering