Toward Scalable Reasoning over Annotated RDF Data Using

Chang Liu1, Guilin Qi2
1Shanghai Jiao Tong University
2Southeast University, China
Motivation
 Growing interest in representing additional information
on top of RDF
 Time, uncertainty, trust, and provenance
 => Annotated RDF
 Large amount of data
 YAGO2
 Problem: Large Scale Reasoning
Motivation (cont’d)
 Recent work on scalable reasoning using MapReduce
 WebPIE (ISWC ‘09, ESWC ‘10)
 Fuzzy pD* (ISWC ‘11)
 Our idea
 Large-scale annotated RDF reasoner using MapReduce
Background: Annotated RDF
 Syntax: (s, p, o) : λ
 Deductive rules:
 Subproperty, Subclass, Domain, Range, Generalization
 Example:

Subproperty (a): (A, sp, B) : λ1, (B, sp, C) : λ2 ⇒ (A, sp, C) : λ1 ⊗ λ2
 Zimmermann et al.: A general framework for representing,
reasoning and querying with annotated Semantic Web
data. Journal of Web Semantics 11, 72-95 (2012)
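A minimal sketch of how the Subproperty (a) rule combines annotations, assuming a fuzzy annotation domain in which ⊗ is min and ⊕ is max. The names AnnotationDomain, FUZZY and subproperty_a are hypothetical and chosen only for illustration; the slides do not prescribe them.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class AnnotationDomain:
    """An annotation domain reduced to its two operations (illustrative only)."""
    otimes: Callable[[float, float], float]  # combines annotations along a derivation
    oplus: Callable[[float, float], float]   # merges annotations of the same triple


# Assumption: fuzzy degrees in [0, 1] with min as ⊗ and max as ⊕.
FUZZY = AnnotationDomain(otimes=min, oplus=max)


def subproperty_a(t1: Triple, l1: float, t2: Triple, l2: float,
                  domain: AnnotationDomain = FUZZY) -> Optional[Tuple[Triple, float]]:
    """(A, sp, B) : l1, (B, sp, C) : l2  =>  (A, sp, C) : l1 ⊗ l2, else None."""
    a, p1, b = t1
    b2, p2, c = t2
    if p1 == "sp" and p2 == "sp" and b == b2:
        return (a, "sp", c), domain.otimes(l1, l2)
    return None
```

Swapping in another domain (for example temporal intervals with intersection as ⊗) only requires a different pair of operations.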
Background: MapReduce
Naïve Implementation
 Subproperty (a): (X, P, Y) : λ1, (P, sp, Q) : λ2 ⇒ (X, Q, Y) : λ1 ⊗ λ2
[Figure: MapReduce dataflow. Mappers read (X, P, Y) : λ1 and (P, sp, Q) : λ2 and emit both under key P, with values (1, X, Y, λ1) and (2, Q, λ2); the reducer for key P outputs (X, Q, Y) : λ1 ⊗ λ2.]
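The naive join can be sketched as a single-process simulation of the map, shuffle and reduce phases. This is only an illustration under stated assumptions: annotations are fuzzy degrees with ⊗ = min, and the function names map_triple, reduce_key and run are hypothetical, not taken from the authors' system.

```python
from collections import defaultdict


def map_triple(triple, lam):
    """Emit (key, value) pairs; both kinds of triple are keyed on the property P."""
    s, p, o = triple
    if p == "sp":
        # Schema triple (P, sp, Q): key P, value tagged 2.
        yield s, (2, o, lam)
    else:
        # Data triple (X, P, Y): key P, value tagged 1.
        yield p, (1, s, o, lam)


def reduce_key(key, values, otimes=min):
    """Join data triples with schema triples that share the same key P."""
    data = [v for v in values if v[0] == 1]
    schema = [v for v in values if v[0] == 2]
    for _, x, y, l1 in data:
        for _, q, l2 in schema:
            yield (x, q, y), otimes(l1, l2)


def run(annotated_triples):
    """Simulate shuffle by grouping mapper output on the key, then reduce."""
    groups = defaultdict(list)
    for triple, lam in annotated_triples:
        for key, value in map_triple(triple, lam):
            groups[key].append(value)
    derived = []
    for key, values in groups.items():
        derived.extend(reduce_key(key, values))
    return derived
```

For example, run([(("alice", "worksFor", "acme"), 0.9), (("worksFor", "sp", "affiliatedWith"), 0.7)]) derives ("alice", "affiliatedWith", "acme") with degree 0.7.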
Challenges and solutions
 Generalization Rule
 (s, p, o) : λ1, (s, p, o) : λ2 ⇒ (s, p, o) : λ1 ⊕ λ2
 Must delete the original triples from the data set

(s, p, o) : λ1 and (s, p, o) : λ2
 Large data reconstruction cost
 Solution
 Only perform at the beginning and at the end
 Combine Generalization Rule with other rules

E.g. when a reducer would generate both (s, p, o) : λ1 and (s, p, o) : λ2, it
generates (s, p, o) : λ1 ⊕ λ2 instead.
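Folding the Generalization Rule into a reducer can be sketched as follows: before emitting, the reducer merges every duplicate triple with ⊕. The sketch assumes fuzzy degrees with ⊕ = max, and merge_generalization is a hypothetical helper name.

```python
def merge_generalization(derived, oplus=max):
    """Collapse duplicate triples in a reducer's output using ⊕.

    `derived` is an iterable of ((s, p, o), annotation) pairs.
    Returns a dict with one annotation per distinct triple.
    """
    merged = {}
    for triple, lam in derived:
        if triple in merged:
            merged[triple] = oplus(merged[triple], lam)
        else:
            merged[triple] = lam
    return merged
```

Because each triple leaves the reducer only once, no later pass is needed to delete the two weaker copies from the data set.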
Challenges and solutions (cont’d)
 Unnecessary Derivation
 E.g. (s, p, o) : [1, 2], (p, sp, q) : [3, 4] ⇒ (s, q, o) : ∅
 Wastes a lot of computation time
 Solution
 Incorporate the annotation into mapped key
 E.g.
Map (s, p, o) : [1, 2] to ((t1, p), (1, s, o, [1, 2]))
Map (p, sp, q) : [3, 4] to ((t3, p), (2, q, [3, 4]))
They will not be grouped together!
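One way to read the example above is sketched below. The partitioning scheme is an assumption: the slide shows a single key per triple, so here each triple is simply keyed on every integer time point its closed interval covers. The names map_with_time_key and keys_of are hypothetical.

```python
def map_with_time_key(triple, interval):
    """Emit ((time_point, property), value) pairs for a temporally annotated triple.

    Assumption: `interval` is a closed integer interval (lo, hi) and one key
    is emitted per covered time point.
    """
    s, p, o = triple
    lo, hi = interval
    for t in range(lo, hi + 1):
        if p == "sp":
            # Schema triple (p, sp, q): keyed on the subproperty p.
            yield (t, s), (2, o, interval)
        else:
            # Data triple (s, p, o): keyed on its property p.
            yield (t, p), (1, s, o, interval)


def keys_of(triple, interval):
    """Set of keys a triple is shuffled to (for inspection only)."""
    return {key for key, _ in map_with_time_key(triple, interval)}
```

Under this assumption, (s, p, o) : [1, 2] and (p, sp, q) : [3, 4] share no key, so no reducer ever sees both and the empty-annotation derivation is never attempted.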
Challenges and solutions (cont’d)
 Fixpoint Calculation
 Subproperty/subclass rules require fixpoint iteration
 Solution
 Load subproperty/subclass schema triples into memory
 Calculate the closure

Shortest path calculation ⇒ Floyd-Warshall style algorithm
(x1, sp, x2) : λ1, (x2, sp, x3) : λ2, …, (xn, sp, xn+1) : λn ⇒ (x1, sp, xn+1) : λ1 ⊗ ⋯ ⊗ λn
[Figure: “shortest” path x1 → x2 → … → xn+1]
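The in-memory closure can be sketched as a Floyd-Warshall style triple loop in which path composition uses ⊗ and alternative paths are merged with ⊕. The sketch assumes fuzzy degrees (⊗ = min, ⊕ = max), for which in-place updates are safe; sp_closure is a hypothetical name and the authors' actual implementation may differ.

```python
def sp_closure(edges, otimes=min, oplus=max):
    """Annotated transitive closure of subproperty (or subclass) edges.

    `edges` maps (x, y) to the annotation of (x, sp, y).
    Returns a dict mapping (x, y) to the best annotation over all paths.
    """
    closure = dict(edges)
    nodes = {n for pair in edges for n in pair}
    for k in nodes:            # allow k as an intermediate node
        for i in nodes:
            if (i, k) not in closure:
                continue
            for j in nodes:
                if (k, j) not in closure:
                    continue
                via_k = otimes(closure[(i, k)], closure[(k, j)])
                if (i, j) in closure:
                    closure[(i, j)] = oplus(closure[(i, j)], via_k)
                else:
                    closure[(i, j)] = via_k
    return closure
```

With edges a→b (0.9), b→c (0.5), a→c (0.3) and c→d (0.8), the closure gives a→c the degree 0.5, because the two-step path through b beats the direct edge.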
Experiment setup
 Dataset
 Fuzzified DBPedia core ontology
 fpdLUBM 1000, 2000, 4000, 8000
 Cluster
 25 machines with 75 mapper/reducer slots
 Liu et al.: Reasoning with Large Scale Ontologies in
Fuzzy pD* Using MapReduce. Computational
Intelligence Magazine, IEEE 7(2), 54-66 (2012)
Experiment result - fuzzy DBPedia
Dataset: fuzzified DBPedia core ontology
Results:
#units    Time (sec.)    Speedup
128       122.653        6.70
64        136.861        6.01
32        146.393        5.62
16        170.859        4.81
8         282.802        2.91
4         446.917        1.84
2         822.269        1.00
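The speedup figures are consistent with dividing the running time on the smallest configuration (2 units) by the running time on each larger one. A short check, using the times reported above; the function name speedups is hypothetical.

```python
def speedups(times_by_units):
    """Speedup relative to the configuration with the fewest units."""
    baseline = times_by_units[min(times_by_units)]
    return {units: round(baseline / t, 2) for units, t in times_by_units.items()}


# Running times in seconds on the fuzzified DBPedia core ontology.
dbpedia_times = {
    128: 122.653, 64: 136.861, 32: 146.393, 16: 170.859,
    8: 282.802, 4: 446.917, 2: 822.269,
}
```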
Experiment result – fpdLUBM
Experimental results of FuzzyPD and WebPIE
Number of universities    Time of FuzzyPD (minutes)    Time of WebPIE (minutes)
1000                      38.80                        41.32
2000                      66.97                        74.57
4000                      110.40                       130.87
8000                      215.48                       210.01
Experiment result– fpdLUBM (cont’d)
Scalability over number of units
Number of units    Time (minutes)    Speedup
128                38.80             4.01
64                 53.15             2.93
32                 91.58             1.70
16                 155.47            1.00
Experiment result– fpdLUBM (cont’d)
Scalability over data volume
Number of universities    Input (Mtriples)    Output (Mtriples)    Time (minutes)    Throughput (Ktriples/second)
1000                      155.51              92.01                38.80             39.52
2000                      310.71              185.97               66.97             46.28
4000                      621.46              380.06               110.40            57.37
8000                      1243.20             792.54               215.50            61.29
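The throughput column matches the number of output (derived) triples divided by the running time, not the input size. A quick check of that reading against the reported numbers; throughput_ktriples_per_sec is a hypothetical helper name.

```python
def throughput_ktriples_per_sec(output_mtriples, minutes):
    """Derived triples per second, in thousands."""
    return output_mtriples * 1000 / (minutes * 60)


# (output Mtriples, time in minutes, reported throughput) per dataset size.
fpdlubm_rows = {
    1000: (92.01, 38.8, 39.52),
    2000: (185.97, 66.97, 46.28),
    4000: (380.06, 110.40, 57.37),
    8000: (792.54, 215.50, 61.29),
}
```

Each computed value agrees with the reported one to within 0.01 Ktriples/second.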
Conclusion and Future work
 We show how to design MapReduce algorithms to
achieve scalable annotated RDFS reasoning
 Several challenges along with solutions
 Future work
 More experiments on annotated RDFS ontologies
 Annotated OWL 2 RL
Q&A