Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles

Pinar Donmez and Jaime Carbonell
Language Technologies Institute,
School of Computer Science,
Carnegie Mellon University
CIKM ’08, Napa Valley, October 2008
Active Learning Assumptions and the Real World
Active Learning
► unique oracle
► perfect oracle
 always right
 never tired
► works for free or charges uniformly
Real World
► multiple sources of information
► imperfect oracles
 unreliable
 reluctant
► expensive or charges non-uniformly
Solution: Proactive Learning
► Proactive learning is a generalization of active learning that relaxes these assumptions
► decision-theoretic framework to jointly optimize the instance-oracle pair
► utility optimization problem under a fixed budget constraint
Outline
► Methodology
 3 Scenarios
► Reluctance
► Fallibility
► Variable and Fixed Cost
► Evaluation
 Problem Setup
 Datasets
 Results
► Conclusion
Scenario 1: Reluctance
► 2 oracles:
 reliable oracle: expensive but always answers with a correct label
 reluctant oracle: cheap but may not respond to some queries
► Define a utility score as the expected value of information at unit cost:

U(x, k) = P(ans | x, k) * V(x) / C_k
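A minimal sketch of selecting the best instance-oracle pair by the utility score U(x, k) = P(ans | x, k) * V(x) / C_k; the answer probabilities, values V(x), and oracle costs below are illustrative, not taken from the paper.

```python
def utility(p_ans, value, cost):
    """Expected value of information per unit cost for querying oracle k on x."""
    return p_ans * value / cost

# Two oracles: a reliable one (always answers, cost 3) and a reluctant one (cost 1).
oracles = {"reliable": 3.0, "reluctant": 1.0}

def best_pair(instances):
    """instances: list of (x_id, V(x), {oracle: P(ans | x, oracle)})."""
    return max(
        ((x, k, utility(p_ans[k], v, oracles[k]))
         for x, v, p_ans in instances
         for k in oracles),
        key=lambda t: t[2],
    )

instances = [
    ("x1", 0.9, {"reliable": 1.0, "reluctant": 0.8}),
    ("x2", 0.6, {"reliable": 1.0, "reluctant": 0.2}),
]
x, k, u = best_pair(instances)  # x1 via the reluctant oracle: 0.8 * 0.9 / 1 = 0.72
```

Note that the cheap reluctant oracle wins here despite its lower answer probability, because the reliable oracle's threefold cost divides its utility down.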
How to simulate oracle unreliability?
► Unreliability may depend on factors such as query difficulty (hard to classify), complexity of the data (requires long and time-consuming analysis), etc. In this work, we model it based on query difficulty.
► Assumptions
 Perfect oracle ~ classifier having zero training error on the entire data
 Imperfect oracle ~ weak classifier trained on a subset of the entire data
► Train a logistic regression classifier on the subset to obtain P̂(y | x)
► Identify instances with P̂(y | x) ∈ [0.45, 0.5]
► These are the unreliable instances
► Challenge: tradeoff between the information value of an instance and the reliability of the oracle
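The simulation step above can be sketched as follows; the posteriors are illustrative stand-ins for a logistic regression model trained on a data subset, and the identifiers are hypothetical.

```python
# Instances where the weak classifier's posterior is near 0.5 are marked
# "unreliable": the simulated reluctant oracle may not answer them.

def is_unreliable(p_pos, lo=0.45, hi=0.55):
    """True if the weak classifier is near-maximally uncertain about x.

    For a binary problem, min(p, 1 - p) in [0.45, 0.5] is equivalent to
    the positive-class posterior lying in [0.45, 0.55].
    """
    return lo <= p_pos <= hi

# Illustrative posteriors P(y = +1 | x) from the weak classifier.
posteriors = {"x1": 0.48, "x2": 0.91, "x3": 0.52, "x4": 0.10}
unreliable = sorted(x for x, p in posteriors.items() if is_unreliable(p))
# → ["x1", "x3"]
```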
How to estimate P̂(ans | x, k)?
► Cluster unlabeled data using k-means
► Ask the reluctant oracle for the label of each cluster centroid. If
 label received: increase P̂(ans | x, reluctant) of nearby points
 no label: decrease P̂(ans | x, reluctant) of nearby points

P̂(ans | x, reluctant) = (1/Z) * exp( h(x_{c_t}, y_{c_t}) * ln(0.5) * ||x_{c_t} − x||² / maxd² ), where maxd = max_{x ∈ C_t} ||x_{c_t} − x||

► h(x_{c_t}, y_{c_t}) ∈ {+1, −1}: equals +1 when a label is received, −1 otherwise
► The number of clusters depends on the clustering budget and the oracle fee
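A sketch of the centroid-probing update factor, assuming it has the exponential form exp(h * ln(0.5) * dist² / maxd²) with the normalizer Z omitted for clarity; distances here are illustrative.

```python
import math

def answer_prob_update(dist, maxd, h):
    """Unnormalized update factor exp(h * ln(0.5) * dist**2 / maxd**2).

    h = +1 when the centroid query was answered, -1 otherwise. After dividing
    by the normalizer Z, points near the queried centroid end up with a
    relatively higher (h = +1) or lower (h = -1) P(ans | x, reluctant) than
    points at the cluster boundary.
    """
    return math.exp(h * math.log(0.5) * (dist / maxd) ** 2)

at_centroid = answer_prob_update(0.0, 2.0, +1)  # factor 1.0 at the centroid
at_boundary = answer_prob_update(2.0, 2.0, +1)  # factor 0.5 at the farthest point
```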
► Algorithm works in rounds until the budget is exhausted
► At each round, sampling continues until a label is obtained
► Be careful: you may spend the entire budget on a single attempt
► If no label, decrease the utility of the remaining instances:

Û(x, reluctant) = P̂(ans | x, reluctant) * V(x) / C_round

where C_round is the amount spent so far in the given round
► This is adaptive penalization of the reluctant oracle
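One elicitation round against the reluctant oracle might be sketched as below; oracle responses are scripted and all probabilities, values, and costs are illustrative. The cost spent so far in the round (C_round) enters the denominator, which lowers the reluctant oracle's utilities relative to other oracles as failed attempts accumulate.

```python
def run_round(candidates, responds, cost_per_query=1.0, budget=10.0):
    """candidates: {x: (P(ans | x, reluctant), V(x))}; responds: {x: bool}."""
    spent = 0.0
    pool = dict(candidates)
    while pool and spent + cost_per_query <= budget:
        spent += cost_per_query
        # Adaptive penalization: U(x) = P(ans | x) * V(x) / C_round.
        x = max(pool, key=lambda x: pool[x][0] * pool[x][1] / spent)
        if responds[x]:
            return x, spent      # label obtained; round ends
        del pool[x]              # no answer; try the next-best instance
    return None, spent           # budget exhausted without a label

label, spent = run_round(
    {"x1": (0.9, 0.8), "x2": (0.5, 0.9), "x3": (0.3, 1.0)},
    responds={"x1": False, "x2": True, "x3": True},
)
# x1 is tried first (highest utility) but gives no answer; x2 then responds.
```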
Algorithm for Scenario 1
Scenario 2: Fallibility
► 2 oracles:
 one perfect but expensive oracle
 one fallible but cheap oracle that always answers
► Algorithm is similar to Scenario 1, with slight modifications
► During exploration:
 the fallible oracle provides the label together with its confidence
 confidence = P̂(y | x) of the fallible oracle
 if P̂(y | x) ∈ [0.45, 0.5], we don't use the label but still update P̂(correct | x, k)
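The exploration rule above can be sketched as a small helper; the function name and return convention are hypothetical, not from the paper.

```python
# Scenario 2 exploration: the fallible oracle always answers, but when its
# reported confidence P(y | x) falls in [0.45, 0.5] the label is discarded --
# only the correctness estimate P(correct | x, fallible) is still updated.

def handle_fallible_answer(label, confidence):
    """Returns (label_to_use_or_None, update_correctness_estimate)."""
    if 0.45 <= confidence <= 0.5:
        return None, True   # too unsure: drop the label, still update estimate
    return label, True

use, _ = handle_fallible_answer(+1, 0.47)   # label dropped
keep, _ = handle_fallible_answer(-1, 0.93)  # label kept
```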
Outline of Scenario 2
Scenario 3: Non-uniform Cost
► Uniform cost: fraud detection, face recognition, etc.
► Non-uniform cost: text categorization, medical diagnosis, protein structure prediction, etc.
► 2 oracles:
 Fixed-cost Oracle
 Variable-cost Oracle

C_nonunif(x) = 1 − ( max_{y ∈ Y} P̂(y | x) − 1/|Y| ) / ( 1 − 1/|Y| )
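The variable-cost oracle's normalized cost C_nonunif(x) = 1 − (max_y P̂(y | x) − 1/|Y|) / (1 − 1/|Y|) can be computed directly; the posteriors below are illustrative. The cost is 0 when the classifier is certain about x and 1 at maximal uncertainty.

```python
def nonuniform_cost(posteriors):
    """posteriors: P(y | x) over the label set Y; assumed to sum to 1."""
    n = len(posteriors)  # |Y|
    return 1.0 - (max(posteriors) - 1.0 / n) / (1.0 - 1.0 / n)

easy = nonuniform_cost([0.99, 0.01])  # ~0.02: nearly certain, nearly free
hard = nonuniform_cost([0.5, 0.5])    # 1.0: maximally uncertain, full price
```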
Outline of Scenario 3
Evaluation
► Datasets: Face detection, UCI Letter (V-vs-Y), Spambase, and UCI Adult
Oracle Properties and Costs
► The cost is inversely proportional to reliability
► Higher costs for the fallible oracle, since a noisy label should be penalized more than no label at all
► The cost ratio creates an incentive to choose between oracles
Underlying Sampling Strategy
► Conditional entropy based sampling, weighted by a density measure
► Captures the information content of a close neighborhood

U(x) = log( min_{y ∈ {−1,+1}} P̂(y | x, ŵ) * ∑_{k ≠ x, k ∈ N(x)} exp( −||x − k||₂² ) * min_{y ∈ {−1,+1}} P̂(y | k, ŵ) )

where N(x) denotes the close neighbors of x
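A sketch of such a density-weighted uncertainty score, assuming the min-class posterior serves as the uncertainty surrogate and each neighbor k is discounted by exp(−||x − k||²); the posteriors and squared distances below are illustrative.

```python
import math

def score(p_x, neighbors):
    """p_x: min-class posterior of x (in (0, 0.5]).

    neighbors: list of (squared_distance, min_class_posterior) for the close
    neighbors of x. The score grows when x itself is uncertain and when its
    nearby neighbors are both close (small squared distance) and uncertain.
    """
    return math.log(p_x * sum(math.exp(-d2) * p_k for d2, p_k in neighbors))

hood = [(0.1, 0.45), (0.2, 0.40)]     # a dense, uncertain neighborhood
uncertain = score(0.50, hood)         # x near the decision boundary
confident = score(0.05, hood)         # x far from the boundary scores lower
```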

Results: Overall and Reluctance on Spambase Data
Results: Reluctance
► Cost varies non-uniformly
► Statistically significant results (p < 0.01)
A closer look at the clustering step
► Run each baseline without the clustering step
► The entire budget is spent in rounds for data elicitation
► No separate clustering budget
► Results on Spambase under Scenario 1, cost ratio 1:3
Conclusion
► Addressed issues with the assumptions of active learning
► Introduced the Proactive Learning framework
► Analyzed imperfect oracles with differing properties and costs
► Expected utility maximization across oracle-instance pairs
► More effective than exploiting any single oracle alone