Optimized Backward Chaining
Reasoning System for a Semantic Web
Hui Shi, Kurt Maly, and Steven Zeil
Contact: maly@cs.odu.edu
WIMS 2014, June 2-4, Thessaloniki, Greece
Outline
• Problem
– Semantic web subject to changes
– How to scale a reasoner to big data?
• Background
– Knowledge base using ontologies
– Inference strategies
– Benchmarks
– Query optimization
• Integrated optimized backward chaining
– Selection function
– Switching resolution methods
– Avoidance of non-termination – OLDT
– owl:sameAs optimization
• Evaluation
• Conclusions
Problem
Efficiency of reasoning in the face of large scale
and frequent changes within a question/answer
system over a semantic web
• Issue
– Forward chaining scales well for fixed knowledge
bases
– Backward chaining can handle changes in knowledge
base but does not scale
Background
• Existing semantic application: question/answer systems
– Libra, Cimple, Arnetminer
• Semantic Web
– Resource Description Framework (RDF)
– Web Ontology Language (OWL) for specific knowledge domains
– SPARQL query language for RDF
– SWRL rule language
• Reasoning systems
– Jena (proprietary Jena rules)
– Pellet and KAON2
– Oracle 11g
– OWLIM
Background
• Knowledge base (KB)
– Ontologies
– Representation formalism: Description Logic (DL)
• Inference methods for First Order Logic
– Materialization and forward chaining
• pre-computes inferred truths, starting from the known data
• suitable for frequent queries over data that are relatively static
• OWLIM and Oracle
– Query-rewriting and backward chaining
• expands the queries, starting from the goals
• suitable for infrequent queries over data that are dynamic
• Virtuoso
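The contrast between the two strategies can be sketched in a few lines of Python. This is a toy illustration only, not the reasoner described in these slides: the triples, the class names, and the single subclass rule are all assumptions made for the example.

```python
# Toy facts as (subject, predicate, object) triples; all names are illustrative.
FACTS = {
    ("GraduateStudent", "subClassOf", "Student"),
    ("Student", "subClassOf", "Person"),
    ("alice", "type", "GraduateStudent"),
}

def materialize(facts):
    """Forward chaining: apply the subclass rule until no new triple appears."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p, c1) in list(closure):
            if p != "type":
                continue
            for (a, q, c2) in list(closure):
                if q == "subClassOf" and a == c1 and (x, "type", c2) not in closure:
                    closure.add((x, "type", c2))
                    changed = True
    return closure

def prove_type(facts, x, cls, seen=frozenset()):
    """Backward chaining: start from the goal (x type cls) and work back to facts."""
    if (x, "type", cls) in facts:
        return True
    if cls in seen:  # guard against cyclic subClassOf chains
        return False
    # Rule: (x type cls) holds if (sub subClassOf cls) and (x type sub).
    return any(
        prove_type(facts, x, sub, seen | {cls})
        for (sub, p, sup) in facts
        if p == "subClassOf" and sup == cls
    )

closure = materialize(FACTS)
print(("alice", "type", "Person") in closure)  # lookup after materialization
print(prove_type(FACTS, "alice", "Person"))    # reasoning at query time
```

With materialization, every change to the facts requires recomputing (part of) the closure; with the goal-directed version, new facts are picked up by the next query at the price of doing the reasoning then.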
Background
• Benchmarks evaluate and compare the performance of
different reasoning systems
– The Lehigh University Benchmark (LUBM)
– The University Ontology Benchmark (UOBM)
Background
• Query optimization – issues
– Query (conjunction of individual clauses) optimization over
databases – well understood
– With a reasoner in the loop, the size of the solution space
for resolving each individual clause is uncertain
– Query optimization in the presence of such uncertainty
• Dynamic Optimization with an Interposed Reasoner
• A greedy ordering of the proofs of the individual clauses
according to estimated sizes anticipated for the proof results
• Deferring joins of results from individual clauses where such
joins are likely to result in excessive combinatorial growth of
the intermediate solution
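A greedy ordering of this kind might look like the sketch below. The clause tuples, the base size estimates, and the rule that a clause sharing an already-bound variable is ten times more selective are all hypothetical; the slides do not state how the estimates are obtained.

```python
def variables(clause):
    """Variables are the terms that start with '?'."""
    return {term for term in clause if term.startswith("?")}

def greedy_order(clauses, estimate):
    """Repeatedly pick the clause with the smallest estimated result size."""
    remaining = list(clauses)
    bound = set()
    ordered = []
    while remaining:
        best = min(remaining, key=lambda c: estimate(c, bound))
        remaining.remove(best)
        ordered.append(best)
        bound |= variables(best)  # later estimates can use these bindings
    return ordered

# Hypothetical base estimates for three clauses of one query.
BASE = {
    ("?x", "type", "Student"): 8000,
    ("?x", "takesCourse", "?y"): 20000,
    ("?y", "type", "Course"): 1500,
}

def toy_estimate(clause, bound):
    # Assumption: sharing an already-bound variable shrinks the result 10x.
    size = BASE[clause]
    return size / 10 if variables(clause) & bound else size

for clause in greedy_order(BASE, toy_estimate):
    print(clause)
```

Because the estimate is re-evaluated after every pick, the ordering is dynamic: the large takesCourse clause moves ahead of the Student clause once ?y is bound.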
Hybrid reasoner
Motivation example
• Assume a fully materialized KB
• Harvester adds a new fact: student0 enrolledIn course0
• Query "Who is enrolled in course0?" is ok: a simple lookup
• Assume the fact Prof0 teaches course0 is in the KB
• Query "Who is being taught by Prof0?" is not ok as a
simple lookup; it needs reasoning with a rule such as:
isTaughtBy(?Student, ?Faculty) :-
enrolledIn(?Student, ?Course),
teaches(?Faculty, ?Course)
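The motivating example can be replayed as a small Python sketch. The triple encoding and the helper `taught_by` are illustrative assumptions; the point is only that the isTaughtBy answer is derived from the rule at query time, so a newly harvested fact is visible without re-materializing.

```python
FACTS = {
    ("student0", "enrolledIn", "course0"),
    ("Prof0", "teaches", "course0"),
}

def taught_by(facts, faculty):
    """Students ?s such that isTaughtBy(?s, faculty), derived at query time."""
    courses = {c for (f, p, c) in facts if p == "teaches" and f == faculty}
    return {s for (s, p, c) in facts if p == "enrolledIn" and c in courses}

# No isTaughtBy triple is stored, so a plain lookup finds nothing.
print([f for f in FACTS if f[1] == "isTaughtBy"])  # []

# Applying the rule backward from the goal finds the student.
print(taught_by(FACTS, "Prof0"))  # {'student0'}

# A newly harvested fact is picked up by the next query.
updated = FACTS | {("student1", "enrolledIn", "course0")}
print(sorted(taught_by(updated, "Prof0")))  # ['student0', 'student1']
```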
Optimized Backward Chaining
• Problem
– Generate a query response for a given query
pattern based on a specific rule set (RDFS,
Horst, custom)
• Four Optimizations
– Ordered Selection Function
– Switching between Binding Propagation and Free Variable
Resolution
– Avoid Repetition and Non-Termination (OLDT)
– owl:sameAs Optimization
Dynamic Selection of Propagation Mode
• Suppose that:
– we have a rule body containing clauses (?x
p1 ?y) and (?y p2 ?z)
– we have already proven that the first clause
can be satisfied using value pairs {(x1, y1),
(x2, y2), …, (xn, yn)}.
Dynamic Selection of Propagation Mode
• Binding propagation mode
– the bindings from the earlier solutions are substituted
into the upcoming clause to yield multiple instances of
that clause as goals for subsequent proof
– (y1 p2 ?z), (y2 p2 ?z), …, (yn p2 ?z)
• Free variable resolution mode
– a single proof is attempted of the upcoming clause in
its original form, with no restriction upon the free
variables in that clause
– (?y p2 ?z)
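The two modes can be compared on a toy fact set. In this sketch `lookup` is a stand-in for a full proof of a goal (an assumption; the real system proves goals by backward chaining), and the facts reuse the x, y, z, p1, p2 names from the slide.

```python
FACTS = {
    ("x1", "p1", "y1"), ("x2", "p1", "y2"),
    ("y1", "p2", "z1"), ("y2", "p2", "z2"), ("y3", "p2", "z3"),
}

def lookup(goal, facts):
    """Stand-in for a proof: facts matching the goal; '?' terms match anything."""
    return [f for f in facts
            if all(g.startswith("?") or g == t for g, t in zip(goal, f))]

# Solutions of the first clause (?x p1 ?y).
first = sorted((s, o) for (s, _, o) in lookup(("?x", "p1", "?y"), FACTS))

# Binding propagation: one specialized goal (yi p2 ?z) per earlier solution.
bp = set()
for x, y in first:
    for (_, _, z) in lookup((y, "p2", "?z"), FACTS):
        bp.add((x, y, z))

# Free variable resolution: one unrestricted proof of (?y p2 ?z), then a join.
second = lookup(("?y", "p2", "?z"), FACTS)
fv = {(x, y, z) for (x, y) in first for (y2, _, z) in second if y2 == y}

print(sorted(bp))
print(bp == fv)
```

Both modes give the same answers; they differ in cost. Binding propagation issues n small proofs, while free variable resolution issues one larger proof (here it also retrieves the unused y3 row) followed by a join.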
Dynamic Selection of Propagation Mode: Example
• Suppose we have an earlier body clause 1: “?y
type Course” and a subsequent body clause 2:
“?x takesCourse ?y”.
– 1.749 seconds to prove body clause 1
– average of 0.235 seconds to prove body clause 2 for
a given value of ?y from the proof of body clause 1.
– 86,361 students satisfying variable ?x
– 0.235 × 86,361 ≈ 20,295 seconds with binding
propagation mode
– 2.612 seconds to resolve the second clause in free
variable resolution mode
Dynamic Selection of Propagation Mode
– Dynamically switch between modes based
upon the size of the partial solutions obtained
• Let n denote the number of solutions that satisfy an already proven clause
• Let t denote the threshold used to dynamically select between modes
• If n≤t, then the binding propagation mode will be
selected
• If n>t, then the free variable resolution mode will
be selected
• The larger the threshold is, the more likely
binding propagation mode will be selected.
Calculation of Threshold t
– Let join1 denote the time spent on the join operations in binding propagation mode
– Let join2 denote the time spent on the join operations in free variable resolution mode
– Let proof1_i denote the time of proving the first clause with i free variables, and proof2_j the
average time of proving the new specialized form with j free variables (i ∈ [1,3], j ∈ [0,2])
– Let proof3_k denote the time of proving the second clause with k free variables (k ∈ [1,3])
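• Compare the time spent in binding propagation
mode and in free variable resolution mode to
determine t. Binding propagation is favored
when
proof1_i + proof2_j × n + join1 < proof1_i + proof3_k + join2
• If the join times are taken as comparable (join1 ≈ join2), this
reduces to n < proof3_k / proof2_j, which gives
t = floor(proof3_k / proof2_j)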
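As a worked check, the threshold formula can be applied to the timings from the Course/takesCourse example (2.612 seconds for the unrestricted second clause, 0.235 seconds per specialized goal, 86,361 earlier solutions). The function names are illustrative, and the sketch assumes the two join times roughly cancel so that only the proof times matter.

```python
import math

def threshold(proof3_k, proof2_j):
    """t = floor(proof3_k / proof2_j); join-time differences are ignored."""
    return math.floor(proof3_k / proof2_j)

def select_mode(n, t):
    """Binding propagation when n <= t, free variable resolution otherwise."""
    return "binding propagation" if n <= t else "free variable resolution"

t = threshold(2.612, 0.235)
print(t)                        # 11
print(select_mode(86_361, t))   # free variable resolution
print(select_mode(5, t))        # binding propagation
```

With these numbers, binding propagation only pays off when the first clause has at most 11 solutions, which is why the 86,361-solution case is far cheaper in free variable resolution mode.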
Calculation of Threshold t
• To estimate proof3k and proof2j
– we record the time spent on proving goals with different numbers
of free variables
– after we have recorded a sufficient number of proof times, we
compute the average time spent on goals with k free variables
and j free variables respectively
• Start with historical default value
• Update the threshold several times when
answering a particular query
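One way to organize this bookkeeping is sketched below. The class name, the default threshold of 10, and the minimum of 3 samples are hypothetical values chosen for illustration; the slides only say that a historical default is used first and that the threshold is updated as proof times accumulate.

```python
import math
from collections import defaultdict

class ThresholdEstimator:
    """Running averages of proof time, keyed by number of free variables."""

    def __init__(self, default_t=10, min_samples=3):
        self.default_t = default_t      # hypothetical historical default
        self.min_samples = min_samples  # hypothetical "sufficient number"
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def record(self, free_vars, seconds):
        """Record one observed proof time for a goal with `free_vars` variables."""
        self.total[free_vars] += seconds
        self.count[free_vars] += 1

    def average(self, free_vars):
        """Average proof time, or None until enough samples are recorded."""
        if self.count[free_vars] < self.min_samples:
            return None
        return self.total[free_vars] / self.count[free_vars]

    def threshold(self, k, j):
        """t = floor(avg time with k free vars / avg time with j free vars)."""
        proof3_k = self.average(k)
        proof2_j = self.average(j)
        if proof3_k is None or proof2_j is None or proof2_j == 0:
            return self.default_t
        return math.floor(proof3_k / proof2_j)

est = ThresholdEstimator(default_t=10, min_samples=2)
print(est.threshold(k=2, j=1))  # 10, the default: nothing recorded yet
for seconds in (4.0, 6.0):
    est.record(2, seconds)
for seconds in (0.25, 0.25):
    est.record(1, seconds)
print(est.threshold(k=2, j=1))  # 20, from the recorded averages
```

Calling `threshold` again after further `record` calls gives the repeated updates within a single query that the slide describes.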
Evaluation
Time (ms) per query under each propagation mode

Query      Dynamic selection   Binding propagation only   Free variable resolution only
Query1                   343                        343                             296
Query2                 1,060                      1,341                          21,278
Query3                    15                         20                              15
Query4                   858                        961                          42,572
Query5                    15                         16                          22,323
Query6                 1,170                    592,944                          19,968
Query7                 1,341                    551,822                          20,217
Query8                 1,684                    513,773                          40,061
Query9                 1,591                    524,787                          20,841
Query10                  982                    509,078                          19,734
Query11                   93                        109                          19,141
Query12                  109                        156                          38,313
Query13                    0                         10                          21,528
Query14                  156                        140                             140
Overall Performance
Time (ms); LUBM(1) = 100,839, LUBM(40) = 5,307,754

                LUBM(1)                     LUBM(40)
                Opt. Backwd   OWLIM-SE      Opt. Backwd   OWLIM-SE
Loading time          2,900      9,600           95,000    350,000
Query1                  260         27            1,400         26
Query2                  490        3.4            9,100      5,100
Query3                   56        1.0               36        2.5
Query4                  470        8.4            5,900         14
Query5                   33         59               15         41
Query6                  180        240           43,000      5,300
Query7                  190        4.4           51,000         54
Query8                  540        460           57,000      3,000
Query9                  250         63           87,000      4,400
Query10                 140       0.10           51,000       0.60
Query11                 190        4.9              200        5.4
Query12                 220        1.0            3,600         11
Query13                  28       0.20               33         17
Query14                  24         23            1,200      2,500
Conclusions
• We have developed optimizations for a backward chaining
algorithm
• The new optimized algorithm outperformed one of the best
forward-chaining reasoners in scenarios where the
knowledge base is subject to frequent change