Speeding Up Relational Data
Mining by Learning to Estimate
Candidate Hypothesis Scores
Frank DiMaio and Jude Shavlik
UW-Madison Computer Sciences
ICDM Foundations and New Directions of Data Mining Workshop
19 November 2003
Rule-Based Learning
[Figure: sets of positive and negative examples]
• Goal: Induce a rule (or rules) that explains ALL
positive examples and NO negative examples
Inductive Logic Programming (ILP)
• Encode background knowledge in first-order logic as
facts…
containsBlock(ex1,block1A).
containsBlock(ex1,block1B).
isRed(block1A).
isSquare(block1A).
isBlue(block1B).
isRound(block1B).
onTopOf(block1B,block1A).
and logical relations …
above(A,B) :- onTopOf(A,B).
above(A,B) :- onTopOf(A,Z), above(Z,B).
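A sketch (not from the slides) of how the recursive `above/2` relation behaves, written as a Python check over `onTopOf` facts; block names follow the slide's example:

```python
def above(a, b, on_top_of):
    """above(A,B) :- onTopOf(A,B).
       above(A,B) :- onTopOf(A,Z), above(Z,B)."""
    if (a, b) in on_top_of:
        return True  # base case: directly on top
    # recursive case: a is on top of some z that is above b
    return any(above(z, b, on_top_of) for (x, z) in on_top_of if x == a)

facts = {("block1B", "block1A")}
above("block1B", "block1A", facts)  # True: direct onTopOf
```

With a three-block tower `{("C","B"), ("B","A")}`, `above("C","A", ...)` succeeds through the recursive clause.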
Inductive Logic Programming (ILP)
• Covering algorithm applied to explain all data
[Figure: scattered positive (+) and negative (−) examples]

Repeat
  Choose some positive example
  Generate best rule that covers this example
  Remove every positive example covered by this rule
until all examples covered
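A minimal Python sketch of the covering loop above; `learn_rule` is a hypothetical stand-in for ILP's per-example rule search, assumed to return a rule that covers at least its seed example (otherwise the loop would not terminate):

```python
def covering(positives, negatives, learn_rule):
    rules = []
    remaining = set(positives)
    while remaining:                      # repeat until all examples covered
        seed = next(iter(remaining))      # choose some positive example
        rule = learn_rule(seed, remaining, negatives)  # best rule covering it
        rules.append(rule)
        remaining = {e for e in remaining if not rule(e)}  # remove covered
    return rules
```

With a toy `learn_rule` that covers only its seed, the loop terminates after one rule per positive example.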
Inductive Logic Programming (ILP)
• Saturate an example by writing everything true about it
• The saturation of an example is the bottom clause (⊥)

[Figure: example ex2 — block C stacked on block B, which is stacked on block A]
positive(ex2) :- containsBlock(ex2,block2A),
containsBlock(ex2,block2B),
containsBlock(ex2,block2C),
isRed(block2A),
isBlue(block2B),
isBlue(block2C),
isRound(block2A),
isRound(block2B),
isSquare(block2C),
onTopOf(block2B,block2A),
onTopOf(block2C,block2B),
above(block2B,block2A),
above(block2C,block2B),
above(block2C,block2A).
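A hypothetical sketch of saturation: collect every ground literal (fact or derived relation) that mentions only the example's own objects. The fact tuples mirror the slide's `ex2`:

```python
facts = [
    ("containsBlock", "ex2", "block2A"), ("containsBlock", "ex2", "block2B"),
    ("containsBlock", "ex2", "block2C"),
    ("isRed", "block2A"), ("isBlue", "block2B"), ("isBlue", "block2C"),
    ("isRound", "block2A"), ("isRound", "block2B"), ("isSquare", "block2C"),
    ("onTopOf", "block2B", "block2A"), ("onTopOf", "block2C", "block2B"),
    ("above", "block2B", "block2A"), ("above", "block2C", "block2B"),
    ("above", "block2C", "block2A"),
]

def saturate(example, facts):
    """Body of the bottom clause: all literals about this example's objects."""
    objs = {t[2] for t in facts if t[0] == "containsBlock" and t[1] == example}
    return [t for t in facts
            if (t[0] == "containsBlock" and t[1] == example)
            or all(arg in objs for arg in t[1:])]
```

For `ex2` this recovers the fourteen body literals shown above.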
Inductive Logic Programming (ILP)
• Candidate clauses are generated by
   choosing literals from ⊥
   converting ground terms to variables
• Search through the space of candidate clauses using a standard AI search algorithm
• Bottom clause ensures the search is finite

Selected literals from ⊥:
containsBlock(ex2,block2B)
onTopOf(block2B,block2A)
isRed(block2A)

Candidate Clause:
positive(A) :- containsBlock(A,B),
onTopOf(B,C),
isRed(C).
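A sketch of the variablization step: each distinct ground term in the selected literals is mapped to a fresh variable (A, B, C, ...). The function name is mine, not from the slides:

```python
from itertools import count

def variablize(example, literals):
    """Turn selected ground literals from ⊥ into a candidate clause."""
    fresh = (chr(c) for c in count(ord("A")))   # A, B, C, ...
    varmap = {}
    def var(term):
        if term not in varmap:                  # first occurrence gets
            varmap[term] = next(fresh)          # the next fresh variable
        return varmap[term]
    head = ("positive", var(example))
    body = [(pred,) + tuple(var(a) for a in args) for (pred, *args) in literals]
    return head, body
```

Applied to the three selected literals above, this yields the slide's candidate clause `positive(A) :- containsBlock(A,B), onTopOf(B,C), isRed(C)`.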
ILP Time Complexity
• Time complexity of ILP systems depends on
   Size of bottom clause |⊥|
   Maximum clause length c
   Number of examples |E|
   Search algorithm Π
• O(|⊥|^c |E|) for exhaustive search
• O(|⊥| |E|) for greedy search
• Assumes constant-time clause evaluation!
Ideas in Speeding Up ILP
• Search algorithm improvements
 Better heuristic functions, search strategy
 Srinivasan’s (2000) random uniform sampling
(consider O(1) candidate clauses)
• Faster clause evaluations
 Evaluation time of a clause (on one example) is
exponential in the number of variables
 Clause reordering & optimizing
(Blockeel et al 2002, Santos Costa et al 2003)
• Evaluation of a candidate still O(|E|)
A Faster Clause Evaluation
• Our idea: predict clause’s evaluation in O(1)
time (i.e., independent of number of examples)
• Use multilayer feed-forward neural network to
approximately score candidate clauses
• NN inputs specify which bottom clause literals are selected
• There is a unique input for every candidate
clause in the search space
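An illustrative sketch of this encoding: one binary input per bottom-clause literal, set to 1 when the candidate clause selects that literal. The literal list and all weights below are made up for illustration; the real network is trained on scored clauses:

```python
import numpy as np

bottom_clause = ["containsBlock(ex2,block2B)", "onTopOf(block2B,block2A)",
                 "isRound(block2A)", "isRed(block2A)"]

def encode(candidate_literals):
    """One binary input per bottom-clause literal: 1 if selected, else 0."""
    return np.array([float(lit in candidate_literals) for lit in bottom_clause])

# Toy one-hidden-layer feed-forward network standing in for the trained
# score estimator (random, untrained weights).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, len(bottom_clause))), rng.normal(size=4)
w2 = rng.normal(size=4)

def predicted_score(candidate_literals):
    h = np.tanh(W1 @ encode(candidate_literals) + b1)  # hidden layer
    return float(w2 @ h)                               # predicted clause score
```

Every candidate clause in the search space maps to a distinct 0/1 input vector, so the network can score any candidate without touching the data.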
Neural Network Topology
Selected literals from 
Candidate Clause
containsBlock(ex2,block2B)
positive(A) :containsBlock(A,B),
onTopOf(B,C),
isRed(C).
onTopOf(block2B,block2A)
isRed(block2A)
containsBlock(ex2,block2B)
1
onTopOf(block2B,block2A)
1
Σ
isRound(block2A)
0
isRed(block2A)
1
predicted
output
Speeding Up ILP
• Trained neural network provides a tool for
approximate evaluation in O(1) time
• Given enough examples (large |E|), approximate
evaluation is essentially free compared with evaluation on the data
• During ILP’s search over hypothesis space …
 Approximately evaluate every candidate explored
 Only evaluate a clause on data if it is “promising”
 Adaptive Sampling – use real evaluations to improve
approximation during search
When to Evaluate Approximated Clauses?
• Treat the neural network-predicted score as a
Gaussian distribution over the true score
• Only evaluate a clause on the data when there is sufficient
likelihood it is the best seen so far, e.g. with current best score = 22:

[Figure: clause-score axis showing potential moves around the current best (22); a candidate with predicted score 11.1 has P(Best) = 0.03 → don't evaluate, while one with predicted score 18.9 has P(Best) = 0.24 → evaluate]
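The decision rule can be sketched with the Gaussian tail probability (one minus the normal CDF, via the error function). The noise estimate `sigma` and the threshold are assumptions of this sketch, not values from the talk:

```python
from math import erf, sqrt

def p_best(pred, best_so_far, sigma):
    """P(true score > best so far) when the predicted score is
    treated as N(pred, sigma^2)."""
    z = (best_so_far - pred) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))  # Gaussian upper-tail probability

def should_evaluate(pred, best_so_far, sigma, threshold=0.05):
    """Evaluate on data only if the clause is plausibly the new best."""
    return p_best(pred, best_so_far, sigma) >= threshold
```

With a best-so-far of 22, a prediction of 11.1 gets a small tail probability (skip the evaluation), while 18.9 gets a much larger one (evaluate on the data), matching the slide's intuition.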
Results
• Trained and tested the score estimator on benchmark datasets
Carcinogenesis
Mutagenesis
Protein Metabolism
Nuclear Smuggling
• Clauses generated by random sampling
• Clause evaluation metric:

  compression = (posCovered − negCovered − length + 1) / totalPositives
• 10-fold c.v. learning curves
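The compression metric above as a one-line Python function:

```python
def compression(pos_covered, neg_covered, length, total_positives):
    """Clause score: (posCovered - negCovered - length + 1) / totalPositives."""
    return (pos_covered - neg_covered - length + 1) / total_positives
```

A clause covering 10 positives and 2 negatives with 3 literals, out of 20 total positives, scores (10 − 2 − 3 + 1) / 20 = 0.3.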
Results
[Figure: 10-fold c.v. RMS error vs. training set size (0 to 1000) for Protein Metabolism, Nuclear Smuggling, Mutagenesis, and Carcinogenesis; y-axis from 0.00 to 0.16]
• Test in an ILP system
 Potential for speedup in datasets
with many examples
 Will inaccuracy hurt search?
Future Work

[Figure: predicted score as a function over the space of clauses]

• The trained network defines a function over the space of
candidate clauses
• We can use this function …
 Extract concepts
 Escape local maxima in heuristic search
Acknowledgements
Funding provided by
 NLM grant 1T15 LM007359-01
 NLM grant 1R01 LM07050-01
 DARPA EELD grant F30602-01-2-0571