Optimized Backward Chaining Reasoning System for a Semantic Web
Hui Shi, Kurt Maly, and Steven Zeil
Contact: maly@cs.odu.edu
WIMS 2014, June 2-4, Thessaloniki, Greece

Outline
• Problem
  – The semantic web is subject to change
  – How can a reasoner scale to big data?
• Background
  – Knowledge bases using ontologies
  – Inference strategies
  – Benchmarks
  – Query optimization
• Integrated optimized backward chaining
  – Selection function
  – Switching resolution methods
  – Avoidance of non-termination
  – OLDT
  – owl:sameAs optimization
• Evaluation
• Conclusions

Problem
Efficient reasoning in the face of large scale and frequent changes, within a question/answer system over a semantic web
• Issue
  – Forward chaining scales well for fixed knowledge bases
  – Backward chaining can handle changes in the knowledge base but does not scale

Background
• Existing semantic applications: question/answer systems
  – Libra, Cimple, Arnetminer
• Semantic Web
  – Resource Description Framework (RDF)
  – Web Ontology Language (OWL) for specific knowledge domains
  – SPARQL query language for RDF
  – SWRL rule language
• Reasoning systems
  – Jena (proprietary Jena rules)
  – Pellet and KAON2
  – Oracle 11g
  – OWLIM

Background
• Knowledge base (KB)
  – Ontologies
  – Representation formalism: Description Logic (DL)
• Inference methods for first-order logic
  – Materialization and forward chaining
    • pre-computes inferred truths, starting from the known data
    • suitable for frequent computation of answers over data that are relatively static
    • e.g. OWLIM and Oracle
  – Query rewriting and backward chaining
    • expands the queries, starting from the goals
    • suitable for efficient computation of answers when the data are dynamic and queries are infrequent
    • e.g. Virtuoso

Background
• Benchmarks evaluate and compare the performance of different reasoning systems
  – The Lehigh University Benchmark (LUBM)
  – The University Ontology Benchmark (UOBM)

Background
• Query optimization issues
  – Optimization of queries (conjunctions of individual clauses) over databases is well understood
  – Having a reasoner introduces uncertainty about the size of the solution space associated with resolving each individual clause
  – Query optimization in the presence of such uncertainty
• Dynamic optimization with an interposed reasoner
  – A greedy ordering of the proofs of the individual clauses according to the estimated sizes of their proof results
  – Deferring joins of results from individual clauses where such joins are likely to cause excessive combinatorial growth of the intermediate solution

Hybrid Reasoner: Motivating Example
• Assume a fully materialized KB
• A harvester adds a new fact: student0 enrolledIn course0
• The query "Who is enrolled in course0?" works: it is a simple lookup of the new fact
• Assume the KB contains the fact Prof0 teaches course0
• The query "Who is being taught by Prof0?" does not work as a simple lookup, because isTaughtBy(student0, Prof0) was never materialized; it needs reasoning with a rule such as:
  isTaughtBy(?Student, ?Faculty) :- enrolledIn(?Student, ?Course), teaches(?Faculty, ?Course)

Optimized Backward Chaining
• Problem
  – Generate a query response for a given query pattern based on a specific rule set (RDFS, Horst, custom)
• Four optimizations
  – Ordered selection function
  – Switching between binding propagation and free variable resolution
  – Avoiding repetition and non-termination (OLDT)
  – owl:sameAs optimization

Dynamic Selection of Propagation Mode
• Suppose that:
  – we have a rule body containing clauses (?x p1 ?y) and (?y p2 ?z)
  – we have already proven that the first clause can be satisfied using value pairs {(x1, y1), (x2, y2), …, (xn, yn)}
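The motivating isTaughtBy query shows why a goal-directed (backward chaining) reasoner tolerates newly harvested facts: nothing has to be re-materialized, because the rule is applied only when a query asks for it. Below is a toy prover, a minimal sketch and not the authors' system. Its rule body has the same two-clause shape as the (?x p1 ?y), (?y p2 ?z) setup above, joined on ?c, and it proves the second clause by substituting each binding from the first. The triple encoding and the names `RULES`, `prove`, `bind_head`, `match` and `substitute` are hypothetical. It has no OLDT tabling and no owl:sameAs handling, so a self-recursive rule would make it loop.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical rule set, written as (head, body) over triple patterns:
# (?s isTaughtBy ?f) :- (?s enrolledIn ?c), (?f teaches ?c)
RULES: List[Tuple[Triple, List[Triple]]] = [
    (("?s", "isTaughtBy", "?f"),
     [("?s", "enrolledIn", "?c"), ("?f", "teaches", "?c")]),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def substitute(pattern: Triple, binding: Binding) -> Triple:
    """Replace bound variables in a triple pattern by their values."""
    s, p, o = pattern
    return (binding.get(s, s), binding.get(p, p), binding.get(o, o))


def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Match a pattern against a ground triple, extending a copy of `binding`."""
    result = dict(binding)
    for term, value in zip(pattern, fact):
        term = result.get(term, term)
        if is_var(term):
            result[term] = value
        elif term != value:
            return None
    return result


def bind_head(head: Triple, goal: Triple) -> Optional[Binding]:
    """Bind rule-head variables to the constants that appear in the goal."""
    binding: Binding = {}
    for h, g in zip(head, goal):
        if is_var(g):
            continue                      # goal leaves this position open
        if is_var(h):
            if binding.setdefault(h, g) != g:
                return None               # one head variable, two constants
        elif h != g:
            return None                   # e.g. a different predicate
    return binding


def prove(goal: Triple, facts: Set[Triple]) -> Set[Triple]:
    """Return the ground triples matching `goal`, stored or derived on demand.
    No tabling: a self-recursive rule would make this loop forever."""
    answers = {f for f in facts if match(goal, f, {}) is not None}
    for head, body in RULES:
        start = bind_head(head, goal)
        if start is None:
            continue
        partials = [start]
        for clause in body:               # prove body clauses left to right
            extended = []
            for b in partials:
                # Substitute earlier bindings, then prove the specialized clause.
                for triple in prove(substitute(clause, b), facts):
                    nb = match(clause, triple, b)
                    if nb is not None:
                        extended.append(nb)
            partials = extended
        for b in partials:
            derived = substitute(head, b)
            if match(goal, derived, {}) is not None:
                answers.add(derived)
    return answers


facts = {("Prof0", "teaches", "course0")}
print(prove(("?who", "isTaughtBy", "Prof0"), facts))  # set()
facts.add(("student0", "enrolledIn", "course0"))       # harvester adds a fact
print(prove(("?who", "isTaughtBy", "Prof0"), facts))  # {('student0', 'isTaughtBy', 'Prof0')}
```

The second query sees the harvested fact immediately. A forward-chaining store would need to re-run materialization before the same lookup could succeed.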
Dynamic Selection of Propagation Mode
• Binding propagation mode
  – The bindings from the earlier solutions are substituted into the upcoming clause, yielding multiple instances of that clause as goals for subsequent proof
  – (y1 p2 ?z), (y2 p2 ?z), …, (yn p2 ?z)
• Free variable resolution mode
  – A single proof of the upcoming clause is attempted in its original form, with no restriction on the free variables in that clause
  – (?y p2 ?z)

Dynamic Selection of Propagation Mode: Example
• Suppose we have an earlier body clause 1, "?y type Course", and a subsequent body clause 2, "?x takesCourse ?y"
  – Proving body clause 1 takes 1.749 seconds
  – Proving body clause 2 for a given value of ?y from the proof of body clause 1 takes 0.235 seconds on average
  – 86,361 students satisfy variable ?x
  – Binding propagation: 0.235 × 86,361 ≈ 20,295 seconds
  – Free variable resolution: 2.612 seconds to resolve the second clause

Dynamic Selection of Propagation Mode
• Dynamically switch between modes based on the size of the partial solutions obtained
  – Let n denote the number of solutions that satisfy an already proven clause
  – Let t denote the threshold used to select between modes
  – If n ≤ t, binding propagation mode is selected
  – If n > t, free variable resolution mode is selected
  – The larger the threshold, the more likely binding propagation mode is to be selected

Calculation of Threshold t
• Let join1 denote the time spent on join operations in binding propagation mode
• Let join2 denote the time spent on join operations in free variable resolution mode
• Let proof1_i denote the time of proving the first clause with i free variables, and let proof2_j be the average time of proving the new specialized form with j free variables
  (i ∈ [1, 3], j ∈ [0, 2])
• Let proof3_k denote the time of proving the second clause with k free variables (k ∈ [1, 3])
• Compare the time spent in binding propagation mode with the time spent in free variable resolution mode to determine t. Binding propagation is favored when
$$\mathit{proof1}_i + n \cdot \mathit{proof2}_j + \mathit{join1} < \mathit{proof1}_i + \mathit{proof3}_k + \mathit{join2}$$
• proof1_i appears on both sides and cancels; neglecting the difference between join1 and join2, binding propagation is favored when n < proof3_k / proof2_j, so
$$t = \left\lfloor \frac{\mathit{proof3}_k}{\mathit{proof2}_j} \right\rfloor$$

Calculation of Threshold t
• To estimate proof3_k and proof2_j:
  – we record the time spent on proving goals with different numbers of free variables
  – after recording a sufficient number of proof times, we compute the average time spent on goals with k free variables and with j free variables, respectively
• Start with a historical default value
• Update the threshold several times while answering a particular query

Evaluation
Query time (ms) by propagation mode
Query      Dynamic selection   Binding propagation only   Free variable resolution only
Query1     343                 343                        296
Query2     1,060               1,341                      21,278
Query3     15                  20                         15
Query4     858                 961                        42,572
Query5     15                  16                         22,323
Query6     1,170               592,944                    19,968
Query7     1,341               551,822                    20,217
Query8     1,684               513,773                    40,061
Query9     1,591               524,787                    20,841
Query10    982                 509,078                    19,734
Query11    93                  109                        19,141
Query12    109                 156                        38,313
Query13    0                   10                         21,528
Query14    156                 140                        140

Overall Performance
All times in ms; Opt. Backwd = optimized backward chaining. Dataset sizes: LUBM(1) = 100,839; LUBM(40) = 5,307,754
               LUBM(1)                     LUBM(40)
               Opt. Backwd   OWLIM-SE      Opt. Backwd   OWLIM-SE
Loading time   2,900         9,600         95,000        350,000
Query1         260           27            1,400         26
Query2         490           3.4           9,100         5,100
Query3         56            1.0           36            2.5
Query4         470           8.4           5,900         14
Query5         33            59            15            41
Query6         180           240           43,000        5,300
Query7         190           4.4           51,000        54
Query8         540           460           57,000        3,000
Query9         250           63            87,000        4,400
Query10        140           0.10          51,000        0.60
Query11        190           4.9           200           5.4
Query12        220           1.0           3,600         11
Query13        28            0.20          33            17
Query14        24            23            1,200         2,500

Conclusions
• We have developed optimizations for a backward chaining algorithm
• The new optimized algorithm outperformed one of the best forward-chaining reasoners (OWLIM-SE) in scenarios where the knowledge base is subject to frequent change
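The dynamic selection rule (binding propagation if n ≤ t, free variable resolution otherwise) and the threshold t = floor(proof3_k / proof2_j) from the "Calculation of Threshold t" slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `ThresholdEstimator`, `prove_p2`, `join_second_clause`, the `min_samples` cutoff standing in for "a sufficient number of proof times", and the hash join used in free variable resolution mode are all hypothetical. Proof times are keyed only by the number of free variables in the goal, as the slides describe.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class ThresholdEstimator:
    """Hypothetical helper: averages recorded proof times per number of free
    variables and derives t = floor(proof3_k / proof2_j)."""

    def __init__(self, default_t: int, min_samples: int = 5) -> None:
        self.default_t = default_t      # historical default value (assumed parameter)
        self.min_samples = min_samples  # assumed meaning of "a sufficient number"
        self.times: Dict[int, List[float]] = defaultdict(list)

    def record(self, free_vars: int, seconds: float) -> None:
        """Log the time taken to prove one goal with `free_vars` free variables."""
        self.times[free_vars].append(seconds)

    def _average(self, free_vars: int) -> Optional[float]:
        samples = self.times.get(free_vars, [])
        if len(samples) < self.min_samples:
            return None
        return sum(samples) / len(samples)

    def threshold(self, k: int, j: int) -> int:
        """t = floor(proof3_k / proof2_j), or the default until enough data exists."""
        proof3_k = self._average(k)
        proof2_j = self._average(j)
        if proof3_k is None or proof2_j is None or proof2_j <= 0:
            return self.default_t
        return math.floor(proof3_k / proof2_j)


def prove_p2(facts: Set[Tuple[str, str]], y: Optional[str]) -> List[Tuple[str, str]]:
    """Stand-in for proving (?y p2 ?z); y=None leaves ?y free."""
    return [(a, b) for a, b in facts if y is None or a == y]


def join_second_clause(
    first: List[Tuple[str, str]],    # (x, y) pairs already proven for (?x p1 ?y)
    p2_facts: Set[Tuple[str, str]],  # (y, z) pairs that make (?y p2 ?z) true
    t: int,
) -> Tuple[str, List[Tuple[str, str, str]]]:
    if len(first) <= t:
        # Binding propagation: one specialized goal (y_i p2 ?z) per earlier solution.
        mode = "binding propagation"
        rows = [(x, y, z) for x, y in first for _, z in prove_p2(p2_facts, y)]
    else:
        # Free variable resolution: prove (?y p2 ?z) once, then hash-join on ?y.
        mode = "free variable resolution"
        by_y: Dict[str, List[str]] = defaultdict(list)
        for y, z in prove_p2(p2_facts, None):
            by_y[y].append(z)
        rows = [(x, y, z) for x, y in first for z in by_y.get(y, [])]
    return mode, rows


# Plugging in the timings from the takesCourse example (default_t is arbitrary here).
est = ThresholdEstimator(default_t=10, min_samples=1)
est.record(2, 2.612)  # "?x takesCourse ?y" in its original form: 2 free variables
est.record(1, 0.235)  # specialized form with ?y bound: 1 free variable
print(est.threshold(k=2, j=1))  # 11
```

With those timings, t = floor(2.612 / 0.235) = 11. Any partial solution larger than 11 rows would therefore switch to free variable resolution, which is consistent with the example's 2.612 s versus roughly 20,295 s. Both branches of `join_second_clause` return the same rows; only the number of proofs attempted differs.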