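Chang Liu¹, Guilin Qi²
¹Shanghai Jiao Tong University
²Southeast University, China

Motivation
- There is growing interest in representing additional information on top of RDF: time, uncertainty, trust, and provenance. This leads to annotated RDF.
- Such data can be very large, e.g., YAGO2.
- Problem: large-scale reasoning over annotated RDF.

Motivation (cont’d)
- Recent work on scalable reasoning using MapReduce:
  - WebPIE (ISWC ’09, ESWC ’10)
  - Fuzzy pD* (ISWC ’11)
- Our idea: a large-scale annotated RDF reasoner using MapReduce.

Background: Annotated RDF
- Syntax: $(s, p, o) : \lambda$, i.e., a triple $(s, p, o)$ with annotation $\lambda$.
- Deductive rules: subproperty, subclass, domain, range, generalization.
- Example, Subproperty (a):
  $(A, \mathrm{sp}, B) : \lambda_1,\ (B, \mathrm{sp}, C) : \lambda_2 \Rightarrow (A, \mathrm{sp}, C) : \lambda_1 \otimes \lambda_2$
Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)

Background: MapReduce

Naïve Implementation
- Subproperty (a): $(X, P, Y) : \lambda_1,\ (P, \mathrm{sp}, Q) : \lambda_2 \Rightarrow (X, Q, Y) : \lambda_1 \otimes \lambda_2$
- Map: both triples are keyed on the property $P$. The instance triple $(X, P, Y) : \lambda_1$ becomes the value $(1, X, Y, \lambda_1)$, and the schema triple $(P, \mathrm{sp}, Q) : \lambda_2$ becomes the value $(2, Q, \lambda_2)$.
- Reduce: each reducer receives all values for one key $P$ and outputs $(X, Q, Y) : \lambda_1 \otimes \lambda_2$.

Challenges and solutions
Generalization Rule
- $(s, p, o) : \lambda_1,\ (s, p, o) : \lambda_2 \Rightarrow (s, p, o) : \lambda_1 \oplus \lambda_2$
- Applying the rule means deleting $(s, p, o) : \lambda_1$ and $(s, p, o) : \lambda_2$ from the data set, which incurs a large data-reconstruction cost.
- Solution:
  - Apply the rule as a separate pass only at the beginning and at the end.
  - Combine the generalization rule with the other rules: e.g., when a reducer would generate both $(s, p, o) : \lambda_1$ and $(s, p, o) : \lambda_2$, it generates $(s, p, o) : \lambda_1 \oplus \lambda_2$ instead.

Challenges and solutions (cont’d)
Unnecessary Derivation
- E.g., $(s, p, o) : [1, 2],\ (p, \mathrm{sp}, q) : [3, 4] \Rightarrow (s, q, o) : \emptyset$ (with temporal annotations, $\otimes$ intersects the intervals, and $[1, 2] \cap [3, 4] = \emptyset$).
- Such derivations waste a lot of computation time.
- Solution: incorporate the annotation into the map key. E.g.,
  - map $(s, p, o) : [1, 2]$ to $((t_1, p), (1, s, o, [1, 2]))$;
  - map $(p, \mathrm{sp}, q) : [3, 4]$ to $((t_3, p), (2, q, [3, 4]))$.
- The two triples get different keys, so they are never grouped together in a reducer.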
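To make the naïve subproperty join concrete, here is a minimal in-memory sketch of its map, shuffle, and reduce steps. It assumes fuzzy annotations with ⊗ = min and ⊕ = max (chosen only for illustration), a hypothetical string "sp" for the subproperty predicate, and plain Python dictionaries standing in for Hadoop's shuffle. The reducer folds the generalization rule into its output by merging duplicate conclusions with ⊕, and the final merge in `run` stands in for the generalization pass at the end of the job.

```python
from collections import defaultdict

SP = "sp"  # hypothetical stand-in for the subproperty predicate (rdfs:subPropertyOf)


def otimes(a, b):
    """Assumed ⊗ for fuzzy degrees: minimum."""
    return min(a, b)


def oplus(a, b):
    """Assumed ⊕ for fuzzy degrees: maximum."""
    return max(a, b)


def map_phase(triples):
    """Key both triple kinds on the property P; tag values 1 (instance) or 2 (schema)."""
    for (s, p, o), lam in triples:
        if p == SP:
            yield s, (2, o, lam)      # (P, sp, Q):λ2 -> (P, (2, Q, λ2))
        else:
            yield p, (1, s, o, lam)   # (X, P, Y):λ1 -> (P, (1, X, Y, λ1))


def shuffle(pairs):
    """Group values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Join instance and schema values sharing key P; merge duplicates with ⊕."""
    instances = [v for v in values if v[0] == 1]
    schemas = [v for v in values if v[0] == 2]
    derived = {}
    for _, x, y, lam1 in instances:
        for _, q, lam2 in schemas:
            lam = otimes(lam1, lam2)
            triple = (x, q, y)
            derived[triple] = oplus(derived[triple], lam) if triple in derived else lam
    return derived


def run(triples):
    """One naive subproperty step; the final ⊕ merge plays the end-of-job generalization pass."""
    result = {}
    for values in shuffle(map_phase(triples)).values():
        for triple, lam in reduce_phase(values).items():
            result[triple] = oplus(result[triple], lam) if triple in result else lam
    return result


data = [
    (("alice", "worksFor", "acme"), 0.8),
    (("worksFor", SP, "memberOf"), 0.6),
    (("worksFor", SP, "affiliatedWith"), 0.9),
]
result = run(data)
print(result)  # {('alice', 'memberOf', 'acme'): 0.6, ('alice', 'affiliatedWith', 'acme'): 0.8}
```

The names `worksFor`, `memberOf`, and `affiliatedWith` are made up; a real job would read triples from distributed storage rather than a Python list.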
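The annotation-aware key from the unnecessary-derivation slide can be sketched for temporal annotations as follows. The slide only shows keys such as $(t_1, p)$ and $(t_3, p)$ and does not say how time points map to keys, so this sketch assumes closed integer intervals, ⊗ as interval intersection, and a hypothetical `BUCKET_WIDTH` that splits the timeline into fixed buckets. Each triple is emitted once per bucket its interval touches, so two triples meet in a reducer only if they touch a common bucket, and overlapping intervals always do (both contain the start of their overlap). Because an overlapping pair may share several buckets, the reducer emits a conclusion only in the bucket holding the start of the overlap. This bucketing is an illustration, not necessarily the authors' scheme.

```python
from collections import defaultdict

SP = "sp"         # hypothetical stand-in for the subproperty predicate
BUCKET_WIDTH = 1  # hypothetical: number of time points per bucket


def intersect(a, b):
    """Assumed ⊗ for closed integer intervals: their overlap, or None if empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def buckets(interval):
    """Ids of every bucket the closed interval [start, end] touches."""
    return range(interval[0] // BUCKET_WIDTH, interval[1] // BUCKET_WIDTH + 1)


def map_phase(triples):
    """Put the time bucket into the key: ((t, P), value), once per touched bucket."""
    for (s, p, o), iv in triples:
        for t in buckets(iv):
            if p == SP:
                yield (t, s), (2, o, iv)
            else:
                yield (t, p), (1, s, o, iv)


def reduce_phase(key, values):
    """Join within one (bucket, P) group; skip empty overlaps."""
    bucket = key[0]
    instances = [v for v in values if v[0] == 1]
    schemas = [v for v in values if v[0] == 2]
    for _, x, y, iv1 in instances:
        for _, q, iv2 in schemas:
            iv = intersect(iv1, iv2)
            # Emit only in the bucket holding the overlap's start, so a pair that
            # meets in several buckets is derived exactly once.
            if iv is not None and iv[0] // BUCKET_WIDTH == bucket:
                yield (x, q, y), iv


def run(triples):
    """Return (derived triples, number of groups where a join was attempted)."""
    groups = defaultdict(list)
    for key, value in map_phase(triples):
        groups[key].append(value)
    derived = [d for key, vals in groups.items() for d in reduce_phase(key, vals)]
    joins = sum(1 for vals in groups.values()
                if any(v[0] == 1 for v in vals) and any(v[0] == 2 for v in vals))
    return derived, joins


# The slide's example: [1, 2] and [3, 4] never share a key, so no join is attempted.
print(run([(("s", "p", "o"), (1, 2)), (("p", SP, "q"), (3, 4))]))  # ([], 0)
# Overlapping intervals do meet: [1, 3] ⊗ [2, 4] = [2, 3].
print(run([(("s", "p", "o"), (1, 3)), (("p", SP, "q"), (2, 4))]))  # ([(('s', 'q', 'o'), (2, 3))], 2)
```

A wider bucket means fewer copies of each triple but more reducer groups that contain pairs with empty overlaps; the intersection check in the reducer keeps the result correct either way.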
Challenges and solutions (cont’d)
Fixpoint Calculation
- The subproperty and subclass rules require fixpoint iteration.
- Solution:
  - Load the subproperty/subclass schema triples into memory.
  - Calculate their closure.
- Computing the closure is a "shortest path" calculation, handled with a Floyd-Warshall-style algorithm:
  $(x_1, \mathrm{sp}, x_2) : \lambda_1,\ (x_2, \mathrm{sp}, x_3) : \lambda_2,\ \ldots,\ (x_n, \mathrm{sp}, x_{n+1}) : \lambda_n \Rightarrow (x_1, \mathrm{sp}, x_{n+1}) : \lambda_1 \otimes \cdots \otimes \lambda_n$
  The annotation of the derived triple comes from the "shortest" path $x_1 \to x_2 \to \cdots \to x_{n+1}$.

Experiment setup
- Datasets:
  - Fuzzified DBPedia core ontology
  - fpdLUBM 1000, 2000, 4000, 8000
- Cluster: 25 machines with 75 mapper/reducer slots
Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. IEEE Computational Intelligence Magazine 7(2), 54-66 (2012)

Experiment result – fuzzy DBPedia
Dataset: fuzzified DBPedia core ontology
Number of units   Time (sec.)   Speedup
128               122.653       6.70
64                136.861       6.01
32                146.393       5.62
16                170.859       4.81
8                 282.802       2.91
4                 446.917       1.84
2                 822.269       1.00

Experiment result – fpdLUBM
Experimental results of FuzzyPD and WebPIE
Number of universities   FuzzyPD time (minutes)   WebPIE time (minutes)
1000                     38.8                     41.32
2000                     66.97                    74.57
4000                     110.40                   130.87
8000                     215.48                   210.01

Experiment result – fpdLUBM (cont’d)
Scalability over number of units
Number of units   Time (minutes)   Speedup
128               38.80            4.01
64                53.15            2.93
32                91.58            1.70
16                155.47           1.00
[Figure: scalability over number of units]

Experiment result – fpdLUBM (cont’d)
Scalability over data volume
Number of universities   Input (Mtriples)   Output (Mtriples)   Time (minutes)   Throughput (Ktriples/second)
1000                     155.51             92.01               38.8             39.52
2000                     310.71             185.97              66.97            46.28
4000                     621.46             380.06              110.40           57.37
8000                     1243.20            792.54              215.50           61.29

Conclusion and Future work
- We show how to design MapReduce algorithms that achieve scalable annotated RDFS reasoning.
- We identify several challenges and present a solution for each.
- Future work:
  - More experiments on annotated RDFS ontologies
  - Annotated OWL 2 RL

Q&A
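Backup: the in-memory closure from the fixpoint slide can be sketched as a Floyd-Warshall-style triple loop. This sketch assumes fuzzy degrees with ⊗ = min along a path and ⊕ = max across paths (an assumption for illustration). Under that choice the "shortest" path is the one whose weakest link is strongest (a widest, or bottleneck, path). The edge dictionary and property names are hypothetical.

```python
def sp_closure(edges):
    """Floyd-Warshall-style closure of annotated sp edges.

    edges: dict mapping (a, b), meaning (a, sp, b), to a degree in (0, 1].
    Assumed semiring: ⊗ = min along a path, ⊕ = max across paths.
    """
    nodes = sorted({n for pair in edges for n in pair})
    best = dict(edges)
    for k in nodes:                  # allow k as an intermediate node
        for i in nodes:
            ik = best.get((i, k))
            if ik is None:
                continue
            for j in nodes:
                kj = best.get((k, j))
                if kj is None:
                    continue
                via_k = min(ik, kj)                  # ⊗ along i -> k -> j
                if via_k > best.get((i, j), 0.0):    # ⊕ keeps the better path
                    best[(i, j)] = via_k
    return best


schema = {("a", "b"): 0.9, ("b", "c"): 0.6, ("a", "c"): 0.5, ("c", "d"): 0.8}
closed = sp_closure(schema)
print(sorted(closed.items()))
```

Here the direct edge a→c (0.5) is improved to 0.6, because the path a→b→c has weakest link 0.6. Since the schema is small enough to fit in memory, the cubic loop runs once per job instead of requiring repeated MapReduce iterations.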
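Backup: the speedup and throughput columns can be recomputed from the other columns. The slides do not define them. The numbers are consistent with speedup = time on the smallest configuration (16 units) divided by time on n units, and throughput = output triples divided by wall-clock time. This reading is our inference, checked here for the fpdLUBM tables.

```python
# fpdLUBM scalability over number of units: units -> time in minutes
times_min = {16: 155.47, 32: 91.58, 64: 53.15, 128: 38.80}
baseline = times_min[16]
speedup = {n: round(baseline / t, 2) for n, t in times_min.items()}

# fpdLUBM scalability over data volume: universities -> (output Mtriples, time in minutes)
volume = {
    1000: (92.01, 38.8),
    2000: (185.97, 66.97),
    4000: (380.06, 110.40),
    8000: (792.54, 215.50),
}


def throughput_ktps(out_mtriples, minutes):
    """Output triples per second, in thousands (Ktriples/second)."""
    return out_mtriples * 1000 / (minutes * 60)


throughput = {u: throughput_ktps(*v) for u, v in volume.items()}

for n in sorted(speedup):
    print(f"{n:>4} units: speedup {speedup[n]:.2f}")
for u in sorted(throughput):
    print(f"{u:>5} universities: {throughput[u]:.2f} Ktriples/s")
```

The recomputed speedups match the table exactly. The throughputs agree with the reported values to within 0.01 Ktriples/s; for 4000 universities the recomputed value rounds to 57.38 rather than 57.37.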