Toward Scalable Reasoning over Annotated RDF Data Using

Chang Liu¹, Guilin Qi²
¹Shanghai Jiao Tong University  ²Southeast University, China

Motivation
- Growing interest in representing additional information on top of RDF: time, uncertainty, trust, and provenance => annotated RDF
- Large amounts of data, e.g. YAGO2
- Problem: large-scale reasoning

Motivation (cont'd)
- Recent work on scalable reasoning using MapReduce
  - WebPIE (ISWC '09, ESWC '10)
  - Fuzzy pD* (ISWC '11)
- Our idea: a large-scale annotated RDF reasoner using MapReduce

Background: Annotated RDF
- Syntax: $(s, p, o) : \lambda$
- Deductive rules: subproperty, subclass, domain, range, generalization
- Example, Subproperty (a): $(A, \mathrm{sp}, B) : \lambda_1,\ (B, \mathrm{sp}, C) : \lambda_2 \Rightarrow (A, \mathrm{sp}, C) : \lambda_1 \otimes \lambda_2$
- Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)

Background: MapReduce

Naïve Implementation
- Subproperty (b): $(X, P, Y) : \lambda_1,\ (P, \mathrm{sp}, Q) : \lambda_2 \Rightarrow (X, Q, Y) : \lambda_1 \otimes \lambda_2$
[Figure: the mappers emit key $P$ with value $(1, X, Y, \lambda_1)$ for $(X, P, Y) : \lambda_1$ and value $(2, Q, \lambda_2)$ for $(P, \mathrm{sp}, Q) : \lambda_2$; the reducer for key $P$ joins the two and outputs $(X, Q, Y) : \lambda_1 \otimes \lambda_2$]

Challenges and solutions
- Generalization rule: $(s, p, o) : \lambda_1,\ (s, p, o) : \lambda_2 \Rightarrow (s, p, o) : \lambda_1 \oplus \lambda_2$
  - Requires deleting the premises $(s, p, o) : \lambda_1$ and $(s, p, o) : \lambda_2$ from the data set
  - Large data reconstruction cost
- Solution
  - Apply the rule only at the beginning and at the end
  - Combine the generalization rule with the other rules: e.g. when a reducer generates $(s, p, o) : \lambda_1$ and $(s, p, o) : \lambda_2$, it outputs $(s, p, o) : \lambda_1 \oplus \lambda_2$ instead

Challenges and solutions (cont'd)
- Unnecessary derivations, e.g. $(s, p, o) : [1, 2],\ (p, \mathrm{sp}, q) : [3, 4] \Rightarrow (s, q, o) : \emptyset$ (the intervals $[1, 2]$ and $[3, 4]$ do not overlap)
  - They waste a lot of computation time
- Solution: incorporate the annotation into the map key, e.g.
  - Map $(s, p, o) : [1, 2]$ to ((t1, p), (1, s, o, [1, 2]))
  - Map $(p, \mathrm{sp}, q) : [3, 4]$ to ((t3, p), (2, q, [3, 4]))
  - The two are never grouped together!
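The naïve subproperty join, together with the idea of folding the generalization rule into the reducer, can be simulated in a single Python process. This is an illustrative sketch, not the authors' code. It assumes a fuzzy annotation domain with degrees in (0, 1], where ⊗ is min and ⊕ is max. The shuffle is emulated with a dictionary, and the names `map_phase`, `reduce_one`, `run_subproperty_job` and `generalize` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Annotated = Tuple[Triple, float]  # ((s, p, o), fuzzy degree in (0, 1])

SP = "sp"  # placeholder name for the subPropertyOf property


def otimes(a: float, b: float) -> float:
    """Assumed conjunction of annotations (minimum)."""
    return min(a, b)


def oplus(a: float, b: float) -> float:
    """Assumed combination used by the generalization rule (maximum)."""
    return max(a, b)


def map_phase(data: Iterable[Annotated]) -> List[Tuple[str, tuple]]:
    """Emit (key, value) pairs keyed on the join property P."""
    pairs = []
    for (s, p, o), lam in data:
        if p == SP:
            # Schema triple (P, sp, Q) : lam2  ->  key P, value (2, Q, lam2)
            pairs.append((s, (2, o, lam)))
        else:
            # Instance triple (X, P, Y) : lam1  ->  key P, value (1, X, Y, lam1)
            pairs.append((p, (1, s, o, lam)))
    return pairs


def reduce_one(values: List[tuple]) -> Dict[Triple, float]:
    """Join one key group; merge duplicate conclusions with oplus inside the reducer."""
    instances = [v for v in values if v[0] == 1]
    schemas = [v for v in values if v[0] == 2]
    out: Dict[Triple, float] = {}
    for _, x, y, lam1 in instances:
        for _, q, lam2 in schemas:
            triple = (x, q, y)
            lam = otimes(lam1, lam2)
            out[triple] = oplus(out[triple], lam) if triple in out else lam
    return out


def run_subproperty_job(data: Iterable[Annotated]) -> List[Annotated]:
    """Simulate map, shuffle (group by key) and one reducer call per key."""
    groups: Dict[str, List[tuple]] = defaultdict(list)
    for key, value in map_phase(data):
        groups[key].append(value)
    derived: List[Annotated] = []
    for values in groups.values():
        derived.extend(reduce_one(values).items())
    return derived


def generalize(triples: Iterable[Annotated]) -> Dict[Triple, float]:
    """Final generalization pass: merge copies produced by different reducers."""
    merged: Dict[Triple, float] = {}
    for triple, lam in triples:
        merged[triple] = oplus(merged[triple], lam) if triple in merged else lam
    return merged


if __name__ == "__main__":
    data = [
        (("alice", "hasAdvisor", "bob"), 0.8),
        (("hasAdvisor", SP, "knows"), 0.6),
        (("alice", "worksWith", "bob"), 0.9),
        (("worksWith", SP, "knows"), 0.5),
    ]
    derived = run_subproperty_job(data)
    print(sorted(derived))    # [(('alice', 'knows', 'bob'), 0.5), (('alice', 'knows', 'bob'), 0.6)]
    print(generalize(derived))  # {('alice', 'knows', 'bob'): 0.6}
```

The two derivations of (alice, knows, bob) come from different keys (hasAdvisor and worksWith), so they land in different reducers. Merging inside each reducer cannot collapse them, which is why the sketch still runs a generalization pass at the end.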
Challenges and solutions (cont’d)  Fixpoint Calculation  Subproperty/subclass rules require fixpoint iteration  Solution  Load subproperty/subclass schema triples into memory  Calculate the closure  Shortest path calculation ⇒ Floyd-Warshall style algorithm 𝑥1 , sp, 𝑥2 : 𝜆1 , 𝑥2 , sp, 𝑥3 : 𝜆2 , … , 𝑥𝑛 , sp, 𝑥𝑛+1 : 𝜆𝑛 ⇒ 𝑥1 , sp, 𝑥𝑛+1 : 𝜆1 ⊗ ⋯ ⊗ 𝜆𝑛 “Shortest” path 𝑥1 𝑥2 … 𝑥𝑛+1 Experiment setup  Dataset  Fuzzified DBPedia core ontology  fpdLUBM 1000, 2000, 4000, 8000  Cluster  25 machine with 75 mapper/reducer slots  Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. Computational Intelligence Magazine, IEEE 7(2), 54-66 (2012) Experiment result - fuzzy DBPedia Dataset: fuzzified DBPedia core ontology Results: #units 128 32 16 8 4 2 Time(sec.) 122.653 136.861 146.393 170.859 282.802 446.917 822.269 Speedup 5.62 4.81 2.91 1.84 1.00 6.70 64 6.01 Experiment result – fpdLUBM Experimental results of FuzzyPD and WebPIE Number of Universities Time of FuzzyPD (minutes) Time of WebPIE (minutes) 1000 38.8 41.32 2000 66.97 74.57 4000 110.40 130.87 8000 215.48 210.01 Experiment result– fpdLUBM (cont’d) Scalability over number of units Number of units Time(minutes) Speedup 128 38.80 4.01 64 53.15 2.93 32 91.58 1.70 16 155.47 1.00 Experiment result– fpdLUBM (cont’d) Scalability over number of units Experiment result– fpdLUBM (cont’d) Scalability over data volume Number of universities Input (Mtriples) Output (Mtriples) Time (minutes) Throughput (Ktriples/secon d) 1000 155.51 92.01 38.8 39.52 2000 310.71 185.97 66.97 46.28 4000 621.46 380.06 110.40 57.37 8000 1243.20 792.54 215.50 61.29 Conclusion and Future work  We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning  Several challenges along with solutions  Future work  More experiments on annotated RDFS ontologies  Annotated OWL 2 RL Q&A

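The experiment tables report speedup and throughput without defining them. The reported values are consistent with two formulas: speedup = baseline time / time, where the baseline is the smallest unit count in each table, and throughput = output triples / time in seconds. The short Python check below recomputes them from the fpdLUBM and fuzzy DBpedia tables. The function names are hypothetical.

```python
def speedup(baseline_time: float, time: float) -> float:
    """Speedup relative to the smallest configuration in a table."""
    return baseline_time / time


def throughput_ktriples_per_sec(output_mtriples: float, minutes: float) -> float:
    """Output triples (millions) divided by wall-clock time, in thousands per second."""
    return output_mtriples * 1e6 / (minutes * 60) / 1e3


if __name__ == "__main__":
    # fpdLUBM, scalability over number of units (16 units is the baseline).
    for units, minutes in [(128, 38.80), (64, 53.15), (32, 91.58), (16, 155.47)]:
        print(units, round(speedup(155.47, minutes), 2))
    # fpdLUBM, scalability over data volume.
    rows = [(1000, 92.01, 38.8), (2000, 185.97, 66.97), (4000, 380.06, 110.40), (8000, 792.54, 215.50)]
    for universities, output_m, minutes in rows:
        print(universities, round(throughput_ktriples_per_sec(output_m, minutes), 2))
```

The recomputed values match the tables up to the last reported digit.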