SLUBM: An Extended LUBM Benchmark for Stream Reasoning
Tu Ngoc Nguyen, Wolf Siberski
L3S Research Center, Universität Hannover, Germany
{tunguyen, siberski}@l3s.de

Outline
1. Motivation
2. Benchmark
   • Dataset
   • Methodology
3. Tested Systems
4. Evaluation
   • Settings and Results
5. Conclusion

Motivation
• RDF streams are everywhere
  - social networks, feeds, financial markets, sensor networks
• Heterogeneous and noisy RDF needs to be processed
  - stream-based reasoners
• Application developers have to choose among them
  - best practice: a benchmark

Benchmark
Extended Lehigh University Benchmark [LUBM]
• Synthetic data, fixed list of 14 queries
  - Can be scaled to arbitrary sizes
• Generates data for the university domain
  - Familiar but non-trivial ontology
  - University, Faculty, Professors, Students, Courses, …
  - Realistic structural properties
  - Artificial literal data: “Professor1”, “GraduateStudent216”, “Course7”

Dataset
• Simulates temporal university data
  - Data is partitioned by semester
  - RDF triples + time annotations
  - e.g., (<GraduateStudent31, ub:takesCourse, GraduateCourse1>, semester2)
• Dynamic classification of predicates
  - Three classes: dynamic, near-dynamic and static
  - Examples:
    Dynamic: teaches, takes course
    Near-dynamic: has a member
    Static: has a degree from

Methodology
[Figure: system pipeline]
• Data Generator:
  - Re-generates university-domain facts
  - A semester counter drives the generation loop
  [Figure: generator loop; labels: ub:Student, ub:GradStudent, ub:Undergrad, ub:GradCourse, rdfs:subClassOf, ub:takesCourse, semester++]
• RDF Handler:
  - Parses the RDF stream into RDF triples
  - Annotates each triple with a timestamp taken from the semester counter
• Outdated facts need to be removed before new facts are added
• Rules for dynamic facts (facts with dynamic predicates):
  - each fact has a time-to-live Δt
  - a produced fact is removed after Δt

Tested Systems
1. BaseVISor
   • Forward-chaining inference engine
   • Based on the Rete algorithm
2. Pellet
   • OWL-DL reasoner
3. (Pellet)+Jena
   • RDF framework, supports a triple-based abstraction
4. (Pellet)+OWLAPI
   • RDF framework, supports a higher-level OWL abstraction syntax (axioms)
5. C-SPARQL
   • Language for continuous queries over streams of RDF data
   • Promising, but no reasoning support yet

Evaluation Settings
• Intel(R) Xeon(R) E7520 1.87GHz processor
• 80GB memory
• OpenJDK 1.6.0_24
• Linux 2.6.x, 64 bit
• 14 LUBM queries
• 1 dynamic predicate: takesCourse (approx. 10 percent of the generated data is dynamic)
• Metrics: load time, query response time

Evaluation Results
[Figure: BaseVISor query time for the LUBM queries on extended LUBM(1,0,5), i.e. LUBM(1,0) over 5 semesters]
• Query 5: (type Person ?X) (memberOf ?X http://www.Department0.University0.edu)
• Query 6: (type Student ?X)
• Query 13: (type Person ?X) (hasAlumnus http://www.University0.edu ?X)
• Query 14: (type UndergraduateStudent ?X)
[Figure: BaseVISor query time for Query 14 on extended LUBM(1,0,5), (5,0,5), (10,0,5) and (50,0,5)]
[Figure: Query time for Query 14 on extended LUBM(10,0,5)]
[Figure: Load time for extended LUBM(5,0,5), (10,0,5), (20,0,5) and (50,0,5)]
[Figure: Query time for Query 14 on extended LUBM(1,0,5), (5,0,5), (10,0,5), (20,0,5) and (50,0,5)]

Conclusion
• Identified a strong need for a stream-based reasoning benchmark
  - For developers of stream-based applications and stream-based reasoners
• Extended LUBM towards a stream-based benchmark
  - Other benchmarks can be extended similarly
• Preliminary experiments with (adapted) stream-based reasoners
  - BaseVISor shows promising performance
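
Supplementary sketch (Methodology)
The Methodology slides describe two steps: annotating each triple with the current value of the semester counter, and removing a dynamic fact Δt after it was produced, before new facts are added. The Python sketch below illustrates one possible reading of those steps; it is not the authors' implementation. The names AnnotatedTriple, StreamStore and DYNAMIC_PREDICATES, the choice Δt = 1 semester, and the use of ub:degreeFrom as the static example predicate are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumption: only takesCourse is treated as dynamic, as in the evaluation settings.
DYNAMIC_PREDICATES = {"ub:takesCourse"}


@dataclass(frozen=True)
class AnnotatedTriple:
    """An RDF triple plus the semester in which it was produced."""
    subject: str
    predicate: str
    obj: str
    semester: int


@dataclass
class StreamStore:
    """Hypothetical fact store with a time-to-live for dynamic facts."""
    ttl: int = 1  # Δt, measured in semesters (assumed value)
    facts: set = field(default_factory=set)

    def expire(self, current_semester: int) -> None:
        # Keep static facts; keep dynamic facts only while younger than ttl.
        self.facts = {
            f for f in self.facts
            if f.predicate not in DYNAMIC_PREDICATES
            or current_semester - f.semester < self.ttl
        }

    def add_semester(self, triples, semester: int) -> None:
        # Outdated facts are removed before the new facts are added.
        self.expire(semester)
        for s, p, o in triples:
            self.facts.add(AnnotatedTriple(s, p, o, semester))


store = StreamStore(ttl=1)
store.add_semester(
    [("GraduateStudent31", "ub:takesCourse", "GraduateCourse1"),
     ("GraduateStudent31", "ub:degreeFrom", "University0")],
    semester=1,
)
store.add_semester(
    [("GraduateStudent31", "ub:takesCourse", "GraduateCourse2")],
    semester=2,
)
remaining = sorted((f.predicate, f.obj, f.semester) for f in store.facts)
print(remaining)
```

After the second semester is added, the semester-1 takesCourse fact has been dropped, while the static degreeFrom fact and the new takesCourse fact remain. A reasoner would then be reloaded or incrementally updated from the surviving facts; how each tested system handles that step is not specified on the slides.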