Multiple Instance Learning with Query Bags

Boris Babenko, Piotr Dollar, Serge Belongie
[In prep. for ICML’09 – feedback appreciated!]
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Multiple Instance Learning (MIL)
• Ambiguity in training data
• Instead of instance/label pairs, get bag-of-instances/label pairs
• Bag is positive if one or more of its members is positive
Multiple Instance Learning (MIL)
• Supervised learning training input: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, instances $x_i$ with labels $y_i \in \{0, 1\}$
• MIL training input: $\{(X_1, y_1), \ldots, (X_n, y_n)\}$, where each bag $X_i = \{x_{i1}, \ldots, x_{im}\}$ gets a single label
• Goal: learn an instance classifier
MIL Assumptions
• Bags are predefined/fixed & finite (size $m$)
• Bag label determined by its instances: $y_i = \max_j y_{ij}$
• Typical assumption: instances are all drawn i.i.d.
• Refer to this as a classical bag
MIL Theory
• Best known PAC bound due to Blum et al., expressed in terms of:
  – $d$ = dimensionality
  – $m$ = bag size
  – $\epsilon$ = the desired error
• Problem is harder for larger bags
• Result relies on the i.i.d. assumption
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
MIL Applications
• Most MIL applications: bag generated by breaking an object into many overlapping pieces
• Let’s see some examples…
Vision
• Image known to contain an object, but precise location unknown
[Andrews et al. 02, Viola et al. 05]
Audio
• Audio wave can be broken up in time or in the frequency domain
[Saul et al. 01]
Biological Sequences
• Known to contain short subsequence of interest
ACTGTGTGACATGTAGC → { ACTG, CTGT, TGTG, … }
[Ray et al. 05]
Text
• Text document broken down into smaller pieces
[Andrews et al. 02]
Observations
• Sliding windows: bags are large/infinite.
• In practice, the bag is sub-sampled
  – Could violate the assumption that the bag contains a positive instance
• Instances of a bag are not independent – they often lie on a low-dimensional manifold (e.g. image patches)
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Query Bags for MIL
• Bag not fixed – can query an oracle to get an arbitrary number of instances
• Each query bag is represented by an object $b$
• To retrieve instances, use a query function $q(b, \theta)$ with location parameter $\theta$
Query Bags for MIL
• Instances often lie on a low-dimensional manifold
• Can query for nearby instances
Query Bags for MIL
• Can express a bag as $X_i = \{\, q(b_i, \theta) : \theta \in \Theta \,\}$
• Define the bag label as $y_i = \max_{\theta \in \Theta} y\bigl(q(b_i, \theta)\bigr)$
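A minimal sketch of this interface in Python (the class name QueryBag, the sample_theta prior, and the max-based labeling helper are illustrative assumptions, not code from the paper):

class QueryBag:
    """A bag defined implicitly by an object b and a query function q(b, theta)."""

    def __init__(self, b, q, sample_theta):
        self.b = b                        # underlying object (e.g. a large image)
        self.q = q                        # query function: (b, theta) -> instance
        self.sample_theta = sample_theta  # draws a location theta from p_i(theta)

    def sample_instances(self, m, rng):
        """Sub-sample m instances from the (possibly infinite) bag."""
        thetas = [self.sample_theta(rng) for _ in range(m)]
        return [self.q(self.b, theta) for theta in thetas]

def bag_label(instance_labels):
    """Bag is positive iff any queried instance is positive: y_i = max_j y_ij."""
    return int(max(instance_labels))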
Distribution of locations
• Assume for each bag there is some distribution $p_i(\theta)$ over locations (known or unknown)
• Could provide some prior information
• Let $\theta \sim p_i(\theta)$ – how informative is $p_i(\theta)$?
Query Bag Size
• To determine the bag label with high confidence, need a large enough sample size $m$
• Bigger bag = better: less chance of missing a positive instance
• Note the difference from classical bags, where larger bags make the learning problem harder
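One way to make the "bigger bag = better" intuition concrete (an illustrative back-of-the-envelope calculation, not a bound from the slides): if the positive locations have probability mass $p$ under $p_i(\theta)$, then for $m$ i.i.d. queries

\Pr[\text{all $m$ samples miss the positive region}] = (1 - p)^m \le \delta
\quad\Longleftrightarrow\quad
m \ge \frac{\log(1/\delta)}{-\log(1 - p)},

so the number of queries needed to catch a positive instance with confidence $1 - \delta$ grows only logarithmically in $1/\delta$.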
Example: Line Bags
• Instances of a bag lie on a line.
Example: Hypercube Bags
• Instances of a bag lie in a hypercube
Example: Image Translation Bags
• Let $b$ be a large image, and let $q(b, \theta)$ be the patch centered at location $\theta$
• Could easily extend this to rotations, scale changes, etc.
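A minimal sketch of such a translation query function in Python (the patch size, 2-D array layout, and clamping behavior are assumptions for illustration):

import numpy as np

def q_translation(image, theta, patch_size=24):
    """Return the patch of `image` centered at location theta = (row, col)."""
    r, c = theta
    h = patch_size // 2
    # Clamp the center so the patch stays inside the image bounds.
    r = int(np.clip(r, h, image.shape[0] - h))
    c = int(np.clip(c, h, image.shape[1] - h))
    return image[r - h:r + h, c - h:c + h]

def sample_translation_bag(image, m, rng, patch_size=24):
    """Sub-sample m instances from the (infinite) translation query bag."""
    rows = rng.integers(0, image.shape[0], size=m)
    cols = rng.integers(0, image.shape[1], size=m)
    return [q_translation(image, (r, c), patch_size) for r, c in zip(rows, cols)]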
Experiments
• Goal: compare behavior of synthetic classical bags and query bags to a real dataset (MNIST).
• Use MILBoost (Viola et al. ’05).
• Expect qualitatively similar results for other MIL algorithms.
• For query bags, subsample $m$ instances per bag.
Results
Experiment: Variance
• How does the distribution $p_i(\theta)$ affect error?
• Repeat the Line Bag experiment, increasing the variance of $p_i(\theta)$ – this spreads points out along the line.
Observations
• PAC results not applicable to query bags – performance increases as the bag size $m$ increases.
• MNIST results closely resemble the synthetic query bag examples.
• Need a computational strategy for dealing with large bags.
• Take advantage of relationships between instances.
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
MILBoost Review
• Train a strong classifier $H(x) = \sum_t \alpha_t h_t(x)$ (just like AdaBoost)
• Optimize the log likelihood of bags
  $\mathcal{L} = \sum_i \bigl( y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr)$,
  where $p_i = 1 - \prod_j (1 - p_{ij})$ (noisy-OR) and $p_{ij} = \sigma\bigl(H(x_{ij})\bigr)$
• Use Gradient Boosting (Friedman ’01)
  – In each iteration add a weak classifier $h_t$ close to the functional gradient $\partial \mathcal{L} / \partial H$
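A small sketch of the noisy-OR bag likelihood above in Python (H is any function returning a real-valued score; an illustration, not the authors' implementation):

import numpy as np

def bag_log_likelihood(H, bags, labels):
    """Noisy-OR MIL log likelihood for a scalar-scoring classifier H(x)."""
    ll = 0.0
    for X_i, y_i in zip(bags, labels):
        scores = np.array([H(x) for x in X_i])
        p_ij = 1.0 / (1.0 + np.exp(-scores))     # sigma(H(x_ij))
        p_i = 1.0 - np.prod(1.0 - p_ij)          # noisy-OR bag probability
        p_i = np.clip(p_i, 1e-12, 1.0 - 1e-12)   # numerical safety
        ll += y_i * np.log(p_i) + (1 - y_i) * np.log(1.0 - p_i)
    return ll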
MILBoost w/ Query Bags
• Bag probability is defined over all instances in the (possibly infinite) bag
• In practice, subsample the bag: $X_i = \{\, q(b_i, \theta_1), \ldots, q(b_i, \theta_m) \,\}$
• Could subsample once in the beginning, or do something more clever…
Filtering Strategies
• Recently, Bradley & Schapire proposed FilterBoost, which learns from a continuous source of data.
• Alternates between training a weak classifier and querying the oracle for more data.
• Apply this idea to MILBoost.
Filtering Strategies
• Want the highest-probability instances in each bag
• Parameters:
  – $T$ = number of boosting iterations
  – $R$ = number of instances to evaluate
  – $F$ = frequency of filtering
Filtering Strategies
• Random Sampling (RAND)
  – Query $R$ instances, keep the best $m$
• Memory (MEM)
  – Query $R$ new instances, combine with the old ones, keep the best $m$
[Figure: illustration of the RAND and MEM filtering strategies]
Filtering Strategies
• Search (SRCH)
  – Assume instances lie on a low-dimensional manifold
  – For a kept location $\theta$, search for a nearby location $\theta'$ whose instance scores higher
  – Test $R$ nearby locations
  – (a sketch of all three strategies follows the figure below)
[Figure: illustration of the RAND, MEM, and SRCH filtering strategies]
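A minimal sketch of the three filtering strategies, assuming the QueryBag interface sketched earlier and a per-instance score function (e.g. sigma(H(x))); the neighbor() perturbation and all names here are illustrative assumptions:

import numpy as np

def _keep_best(bag, score, thetas, m):
    """Keep the m locations whose instances score highest under the current classifier."""
    scores = np.array([score(bag.q(bag.b, th)) for th in thetas])
    order = np.argsort(-scores)[:m]
    return [thetas[i] for i in order]

def rand_filter(bag, score, R, m, rng):
    """RAND: query R fresh locations, keep the best m."""
    thetas = [bag.sample_theta(rng) for _ in range(R)]
    return _keep_best(bag, score, thetas, m)

def mem_filter(bag, score, R, m, rng, old_thetas):
    """MEM: query R new locations, pool them with the old ones, keep the best m."""
    thetas = list(old_thetas) + [bag.sample_theta(rng) for _ in range(R)]
    return _keep_best(bag, score, thetas, m)

def srch_filter(bag, score, R, m, rng, old_thetas, neighbor):
    """SRCH: test R locations near the currently kept ones, keep the best m."""
    thetas = list(old_thetas)
    for _ in range(R):
        base = old_thetas[rng.integers(len(old_thetas))]
        thetas.append(neighbor(base, rng))  # perturb a kept location on the manifold
    return _keep_best(bag, score, thetas, m)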
MNIST Filtering Experiments
• Turn SRCH and MEM on and off.
• Sweep through:
– R = sampling amount (16)
– m = bag size (4)
– F = sampling frequency (1)
– T = boosting iterations (64)
MNIST Filtering Exp: m
• Filtering converges w/ smaller memory usage
MNIST Filtering Exp: R & F
• MEM is very effective
• SRCH helps when MEM is OFF, not as big of a
difference when MEM is ON
MNIST Filtering Exp: T
• Without MEM, filtering does not converge
• Positive region becomes sparse
Why MEM Works
• Let $\mathcal{L}_m$ be the log likelihood computed with $m$ instances per bag
• Can show (for a fixed classifier $H$) that sub-sampling with MEM is safe:
  – using MEM, we add $R$ new instances per bag in each iteration, so each bag's kept instances can only improve
• In reality $H$ is not fixed; hard to show convergence.
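One way to make the "can only improve" claim precise for positive bags under the noisy-OR model (a sketch assuming a fixed classifier $H$; the notation $S_t$ for the kept locations is mine, not the slides'):

Let $S_t$ be the $m$ locations kept for a positive bag $i$ at iteration $t$, with
  $p_i(S) = 1 - \prod_{\theta \in S} \bigl(1 - \sigma(H(q(b_i, \theta)))\bigr).$
MEM sets $S_{t+1}$ to the $m$ highest-scoring locations in $S_t$ together with $R$ fresh samples,
so the sorted scores of $S_{t+1}$ dominate those of $S_t$ elementwise, and hence
  $p_i(S_{t+1}) \ge p_i(S_t),$
i.e. for a fixed $H$ the positive-bag terms of $\mathcal{L}_m$ are non-decreasing in $t$.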
Outline
• Multiple Instance Learning (MIL) Review
• Typical MIL Applications
• Query Bag Model for MIL
• Filtering Strategies
• Conclusion
Summary
• Current assumptions for MIL are not appropriate for typical MIL applications.
• We proposed the query bag model, which fits real data better.
• For query bags, sampling more instances is better.
• We proposed some simple strategies for dealing with large/infinite query bags.
Future Work
• Develop more theory for the query bag model.
• Experiments with other domains (audio, bioinformatics).
• MCL (Multiple Component Learning) – learning pedestrian parts automatically.
Questions?
Filtering Query Bags
MILBoost with Filtering