Efficient Physical Operators for a
cost-based XPath Execution Engine
Haris Georgiadis
Minas Charalambides
Vasilis Vassalos
Athens University of Economics and Business
Motivation (1)
• XPath query: /s/r/*/it[mb/m/to='x']//k
• Three navigation alternatives (among others):
  • Straightforward navigation: retrieve all it elements under /s/r/*/it; keep those which have at least one to descendant under /mb/m/to with text value 'x'. For the it elements left, return their k descendants.
  • Starting from to: retrieve all to elements under /s/r/*/it/mb/m/to, keep only those with text value 'x', then go backward via parent::m/parent::mb/parent::it and, for the it elements left, return their k descendants.
  • Starting from k: return all k elements with at least one it ancestor, which in turn:
    • has a to descendant under /mb/m/to with text value 'x', and
    • has an s document element ancestor via relative path parent::*/parent::r/parent::s.
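The three alternatives can be compared on a toy document. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a made-up sample document (only the tag names come from the query). The function names are hypothetical, and the sample contains no nested it elements, so no duplicate elimination is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the tag names come from the query.
DOC = """<s><r>
  <g>
    <it><mb><m><to>x</to></m></mb><d><k>k1</k></d></it>
    <it><mb><m><to>y</to></m></mb><k>k2</k></it>
  </g>
  <h>
    <it><mb><m><to>x</to></m></mb><k>k3</k></it>
    <k>k4</k>
  </h>
</r></s>"""

def parent_map(root):
    # ElementTree has no parent axis, so build child -> parent links once.
    return {child: parent for parent in root.iter() for child in parent}

def has_to_x(it):
    # Predicate [mb/m/to='x'] evaluated on one it element.
    return any(to.text == "x" for to in it.findall("./mb/m/to"))

def straightforward(root):
    # Forward navigation: /s/r/*/it, filter, then descend to k.
    return [k.text for it in root.findall("./r/*/it")
            if has_to_x(it) for k in it.iter("k")]

def from_to(root):
    # Start from the to elements, then go backward to their it ancestors.
    parent = parent_map(root)
    its = []
    for to in root.findall("./r/*/it/mb/m/to"):
        if to.text == "x":
            it = parent[parent[parent[to]]]  # parent::m/parent::mb/parent::it
            if not any(it is seen for seen in its):
                its.append(it)
    return [k.text for it in its for k in it.iter("k")]

def from_k(root):
    # Start from every k and check its it ancestors.
    parent = parent_map(root)

    def anchored(it):
        # parent::*/parent::r/parent::s, with s being the document element
        r = parent.get(parent.get(it))
        return (r is not None and r.tag == "r"
                and parent.get(r) is root and root.tag == "s")

    out = []
    for k in root.iter("k"):
        anc = parent.get(k)
        while anc is not None:
            if anc.tag == "it" and has_to_x(anc) and anchored(anc):
                out.append(k.text)
                break
            anc = parent.get(anc)
    return out

root = ET.fromstring(DOC)
results = [straightforward(root), from_to(root), from_k(root)]
```

All three functions return the same k elements on this sample. Which one is cheapest depends on how many it, to, and k elements the document holds, which is the kind of choice a cost-based optimizer is meant to make.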
Motivation (2)
• Many XPath processing algorithms
  • PPFS+, Staircase Join, Sort Merge-based structural joins, PathStack, Twig2Stack, etc.
• Many physical data models and storage techniques:
  • Shredding on relations:
    • Schema-based mapping vs. edge-based mapping
  • Storage into disk pages preserving XML hierarchy
  • Structural encodings:
    • Region encoding vs. prefix-based encoding
  • Data structures: XB-trees, F&B Index, Path indexes
Contribution I
• GeCOEX: the first generic XPath cost-based execution and optimization framework
  • Agnostic to the underlying XML storage system and the access methods it supports
  • Independent of the techniques and algorithms available for XPath processing
    • Encapsulated in operator implementations and rewriting rules
  • Cost-based optimization
Contribution II
• XPalgebra: a novel XPath logical algebra
  • Good fit with many XPath processing techniques
• Lookup and SM: two novel and efficient families of physical operators for XPath
• Multiple storage engines
• Experimental evaluation: direct comparison of operator implementations
GeCOEX System Architecture
[Figure: architecture diagram. Labeled components: XPath query, Parser, Query Optimization, Physical Plan Selector, Rewriting Rules, Query Execution, Physical Plan Executor, result, Database Statistics, Descriptors, Physical Operator Descriptors, Cost Models, Physical Operators, Primitive Access Method Cost Models, XPA Driver, Primitive Access Methods, Data Model, XPA API.]
XPalgebra
• Generic sequence-based logical algebra for a subset of XPath
  • Forward and backward axes
  • Non-positional predicates involving conjunctive boolean expressions
• Maintains the navigational nature of XPath
• Data Model
  • Element
  • Sequence
    • Duplicate-free list of elements in document order
• Sequence Operators: (mainly) navigation
  • Input and Output: Sequence
• Boolean Operators: used for filtering
  • Input: Element
  • Output: True or False
XPalgebra – Sequence Operators
• Both the input and the output of a Sequence operator are sequences of nodes
• The input sequence is called the context sequence
BoolExpr: const | Ъ1 ^ Ъ2 ^ … ^ Ъn, where Ъi: Boolean Operator
XPalgebra – Boolean Operators
• Applied on single nodes only
• The input element is called the context element
• Return boolean values
Example: the predicate …[d//c] is expressed as f(S, Ъfp/d//c)
BoolExpr: const | Ъ1 ^ Ъ2 ^ … ^ Ъn, where Ъi: Boolean Operator
XPalgebra - examples
XPath: /s/r/*/it[mb/m/to='x']//k
XPalgebra: dk(f(fp/s/r/*/it(root), Ъfp/mb/m/to(Ъvftext()=x)))
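The composition in this example can be mimicked with a few small functions. This is only a sketch under stated assumptions: the class XPAlgebraSketch and its method names (fp, d, f, bfp, bvf) are hypothetical stand-ins for the sequence and boolean operators, built on Python's standard xml.etree.ElementTree. It is not the GeCOEX implementation. A synthetic "#document" element plays the role of root so that the path can start at /s.

```python
import xml.etree.ElementTree as ET

class XPAlgebraSketch:
    """Toy versions of the logical operators over an in-memory tree."""

    def __init__(self, document_element):
        # Synthetic node standing in for the document root above <s>.
        self.root = ET.Element("#document")
        self.root.append(document_element)
        # Position of every element in document order.
        self._order = {e: i for i, e in enumerate(self.root.iter())}

    def _seq(self, elements):
        # A Sequence: duplicate-free list of elements in document order.
        return sorted(set(elements), key=self._order.__getitem__)

    # Sequence operators: Sequence in, Sequence out.
    def fp(self, path, S):
        return self._seq(e for n in S for e in n.findall(path))

    def d(self, tag, S):
        return self._seq(e for n in S for e in n.iter(tag) if e is not n)

    def f(self, S, b):
        return [n for n in S if b(n)]

    # Boolean operators: Element in, True or False out.
    def bfp(self, path, inner=None):
        return lambda n: any(inner is None or inner(e) for e in n.findall(path))

    def bvf(self, value):
        return lambda n: n.text == value

doc = ET.fromstring(
    "<s><r><g>"
    "<it><mb><m><to>x</to></m></mb><k>k1</k></it>"
    "<it><mb><m><to>y</to></m></mb><k>k2</k></it>"
    "</g></r></s>"
)
alg = XPAlgebraSketch(doc)
its = alg.fp("./s/r/*/it", [alg.root])                      # fp/s/r/*/it(root)
kept = alg.f(its, alg.bfp("./mb/m/to", alg.bvf("x")))       # f(..., Ъfp/mb/m/to(Ъvf text()=x))
texts = [k.text for k in alg.d("k", kept)]                  # dk(...)
```

Each sequence operator takes and returns a document-ordered, duplicate-free list, while the boolean operators are plain predicates on one element, which mirrors the split between the two operator kinds.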
Physical Operators
• Implement the Sequence interface of the XPA API
• Access the XML data using the AccessMethods interface of the XPA API
  • That is how physical operators stay agnostic to the physical data model
Example: a physical operator implementation
Physical Operators
• Large number of physical operators, divided roughly into four ‘families’:
  • Lookup operators (LU)
    • Inspired by indexed nested loops join
    • dLUa: for each element n from input sequence S, make a lookup using XPAAPI.Descs(n, a)
  • SortMerge-based operators (SM)
    • Inspired by sort-merge join
    • dSMa: scan all elements from input sequence S and all a elements (using XPAAPI.Descs(root, a)) and find ‘ancestor-descendant’ matches
  • Staircase Join operators [Grust 2003]
  • PathStack operators [Bruno 2002]
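The contrast between dLUa and dSMa can be illustrated on region-encoded nodes. The sketch below assumes a hypothetical Node with (start, end) region labels and an in-memory index per tag name; descs is only a stand-in for a call like Descs(n, a). The duplicate handling in d_lookup (skipping context elements nested in an earlier one) is a simplification chosen for this sketch, not the technique used by the actual operators.

```python
from bisect import bisect_right
from typing import NamedTuple

class Node(NamedTuple):
    start: int  # region label: position of the opening tag
    end: int    # region label: position of the closing tag
    tag: str

def build_index(nodes):
    # One sorted list per tag name, plus the start positions for bisecting.
    by_tag = {}
    for n in sorted(nodes):
        by_tag.setdefault(n.tag, []).append(n)
    return {t: ([n.start for n in ns], ns) for t, ns in by_tag.items()}

def descs(index, n, tag):
    # Hypothetical stand-in for Descs(n, a): `tag` elements inside n's region.
    starts, ns = index.get(tag, ([], []))
    return ns[bisect_right(starts, n.start):bisect_right(starts, n.end)]

def d_lookup(index, context, tag):
    # LU style: one index lookup per context element.
    out, covered_until = [], -1
    for n in context:                # context is in document order
        if n.start < covered_until:  # nested in an earlier context element
            continue
        out.extend(descs(index, n, tag))
        covered_until = n.end
    return out

def d_sort_merge(index, context, tag):
    # SM style: one pass over the context and over all `tag` elements.
    _, right = index.get(tag, ([], []))
    out, i, max_end = [], 0, -1
    for a in right:
        while i < len(context) and context[i].start < a.start:
            max_end = max(max_end, context[i].end)
            i += 1
        # Regions nest properly, so an earlier-starting region that is still
        # open at a.start must contain a.
        if max_end > a.start:
            out.append(a)
    return out

nodes = [Node(1, 16, "r"), Node(2, 11, "b"), Node(3, 4, "f"), Node(5, 10, "b"),
         Node(6, 7, "f"), Node(8, 9, "f"), Node(12, 13, "f"), Node(14, 15, "b")]
index = build_index(nodes)
context = [n for n in sorted(nodes) if n.tag == "b"]
lu = d_lookup(index, context, "f")
sm = d_sort_merge(index, context, "f")
```

Both functions return the same descendants. The lookup variant touches only the windows of the context elements, so it tends to win when the context is small, whereas the merge variant always scans every element with the requested tag.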
Physical Operators
[Table: logical operators (rows) versus physical operator families (columns); each cell marks whether an implementation exists.
Columns: LU*, SM*, Staircase [Grust 2003], PathStack [Bruno 2002].
Rows: c (child), d (descendant), a (ancestor), p (parent), fp (forward path), bp (backward path), cs (cousin).]
**: inspired by original
5 XML Storage Systems and their XPA drivers
• The PE-basic Native XML storage system
  • Dewey encoding, 1 B-Tree per tag name
• The RE-basic Native XML storage system
  • Pre/Post/Level encoding, 1 B-Tree per tag name
• The PE-Path Native XML storage system
  • Dewey encoding, 1 B-Tree per tag name, Paths B-Tree
• The RE-Path Native XML storage system
  • Pre/Post/Level encoding, 1 B-Tree per tag name, Paths B-Tree
• The Edge-RE Native XML storage system
  • Pre/Post/Level encoding, 1 B-Tree for all elements
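The two structural encodings named on this slide can be sketched in a few lines. The code below is an illustration only, assuming an in-memory tree from Python's xml.etree.ElementTree. The function names and the dictionary layout are hypothetical, and no B-Tree storage is modeled.

```python
import xml.etree.ElementTree as ET
from itertools import count

def encode(root):
    """Label every element with a Dewey path and pre/post/level numbers."""
    enc, pre, post = {}, count(), count()

    def visit(e, dewey, level):
        p = next(pre)                        # preorder rank: on entry
        for i, child in enumerate(e, start=1):
            visit(child, dewey + (i,), level + 1)
        enc[e] = {"dewey": dewey, "pre": p,
                  "post": next(post),        # postorder rank: on exit
                  "level": level}

    visit(root, (1,), 1)
    return enc

def is_ancestor_dewey(a, d):
    # Prefix encoding: a is an ancestor of d iff a's label is a proper prefix.
    return len(a["dewey"]) < len(d["dewey"]) and d["dewey"][:len(a["dewey"])] == a["dewey"]

def is_ancestor_region(a, d):
    # Region encoding: a starts before d and ends after d.
    return a["pre"] < d["pre"] and a["post"] > d["post"]

root = ET.fromstring("<r><b><c><f/></c></b><b/></r>")
enc = encode(root)
f = root[0][0][0]
labels = list(enc.values())
agree = all(is_ancestor_dewey(a, d) == is_ancestor_region(a, d)
            for a in labels for d in labels)
```

With Dewey labels an ancestor's label can be derived from the descendant's label alone, while with pre/post numbers the ancestor test is two integer comparisons. That difference is one reason the same operator can have different costs on different storage systems.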
Lookup Operators
• Novel efficient algorithms for holistically evaluating forward and backward multi-step paths
  • Based on root-to-node filtering
• Buffered-leaping: a new technique for pipelined duplicate elimination and document order preservation
  • Search a minimum window of elements for each element in the context sequence
  • Window: the result of calling the method from the AccessMethods interface of the XPA API (e.g. Descs(), Ancs()) corresponding to the XPath axis (e.g. descendant, ancestor) for a given context element
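Root-to-node filtering can be pictured as matching a path pattern against the tag path of a candidate element. The sketch below is an assumption-laden illustration: rtn_filter and path_to_regex are hypothetical names, and the third parameter is taken to be the depth of the context element in the root-to-node path, which the slides do not spell out.

```python
import re

def path_to_regex(rel_path):
    """Translate a simple forward path such as '/c//f' into a regex
    over '/'-separated tag paths. Only '/', '//' and '*' are handled."""
    parts = []
    for axis, name in re.findall(r"(//|/)([^/]+)", rel_path):
        step = "[^/]+" if name == "*" else re.escape(name)
        if axis == "//":
            parts.append(r"(?:/[^/]+)*/" + step)  # any number of levels in between
        else:
            parts.append("/" + step)              # exactly one level down
    return "".join(parts)

def rtn_filter(rtn_path, rel_path, context_depth):
    """True if the part of the root-to-node path below the context element
    matches rel_path. context_depth counts steps from the root up to and
    including the context element (an assumption of this sketch)."""
    steps = rtn_path.strip("/").split("/")
    below_context = "/" + "/".join(steps[context_depth:])
    return re.fullmatch(path_to_regex(rel_path), below_context) is not None

# Context element b at depth 2 (path /r/b), pattern /c//f.
checks = [
    rtn_filter("/r/b/c/f", "/c//f", 2),
    rtn_filter("/r/b/c/d/f", "/c//f", 2),
    rtn_filter("/r/b/d/c/f", "/c//f", 2),
]
```

Because the test only inspects the path string of the candidate, a multi-step path can be checked in one step for every element in a window, without navigating the intermediate elements.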
Example: fpLU/c/f
[Figure: animated walkthrough over a sample XML tree with root r, elements b1 to b9 and f1 to f17, and intermediate c, d, and e elements. For each context element a window is opened, e.g. window = XPAPI.Descs(b1, 'f'), and each candidate is tested with a call such as regExprFilter(f1.getRTNPath(), /c//f, 1), which returns true or false. The operator state shown consists of contextEl, chain, and rootAnc. The next() calls shown return f1, f3, f5, f7, f12, f13, f17 and, once the context sequence is exhausted, null.]
The size of the chain at any time is very small and upper bounded by the depth of the XML document.
Example: bpLU parent::c/ancestor::b
[Figure: animated walkthrough over the same sample XML tree. The context elements shown are f2, f3, f5, f6, f8, and f11. For each context element a window is opened, e.g. window = XPAPI.Ancs(f2, 'b'), and the candidates are kept in a sortedElements buffer as virtual elements (#b1, #b2, #b3, #b4, #b5, #b7). The next() calls shown return b1, b2, b4 and, at the end, null.]
reverseOf(parent::c/ancestor::b) = /c//f
V: regExprFilter(f3.getRTNPath(), /c//f, 1) = true
Cheap implementation of Ancs() in the PE-Path driver
• Dewey(f2) = 1.1.2.1.1
• RTN(f2) = /r/b/c/f => there is a 'b' ancestor b' at level 2
• Dewey(b') = substr(dewey(f2), …) = 1.1
• RTN(b') = substr(RTN(f2), …) = /r/b
• Ancs() outputs n without actually retrieving b1 from the database. n is the virtual representation of b1, denoted as #b1
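The cheap Ancs() idea, deriving an ancestor's labels by slicing the descendant's Dewey code and root-to-node path, can be sketched as follows. The function name virtual_ancestors is hypothetical, and the sample values are chosen for this sketch so that the Dewey code has exactly one component per path step.

```python
def virtual_ancestors(dewey, rtn, tag):
    """Return (dewey, rtn) label pairs for every proper ancestor named `tag`,
    computed from the descendant's own labels without any database access."""
    ids = dewey.split(".")
    tags = rtn.strip("/").split("/")
    if len(ids) != len(tags):
        raise ValueError("expected one Dewey component per path step")
    found = []
    for level in range(1, len(tags)):        # proper ancestors only
        if tags[level - 1] == tag:
            found.append((".".join(ids[:level]),
                          "/" + "/".join(tags[:level])))
    return found

# Assumed labels for an f element: Dewey 1.1.2.1 with path /r/b/c/f.
b_ancestors = virtual_ancestors("1.1.2.1", "/r/b/c/f", "b")
```

The result is a list of label pairs only. An operator can order and deduplicate such virtual elements and fetch the real element from storage later, or never, if only its identity is needed.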
SM Operators
• Inspired by sort-merge join algorithms
• Traverse two sequences of elements, left and right
  • left: the context sequence (the input sequence)
  • right: always consists of all the elements of the requested tag name
• Keeping track of the current elements on left and right, try to find matching pairs according to the appropriate navigation axis and condition
• Novel techniques for holistic SM-based forward path and backward path operators with guaranteed low memory requirements
Performance Comparison
[Figures: performance comparison charts, including sensitivity to context selectivity for the descendant, forward path, and ancestor operators.]
Conclusions I
• Novel techniques for evaluating forward and backward multi-step paths
  • Pipelined duplicate elimination and document order preservation
  • Lookup fp, Lookup bp, Lookup cs, SM fp, SM bp, SM cs
• Fast backward navigation that fully exploits the capabilities of the underlying storage system
• Algorithms perform well across a variety of different physical storage models
• First steps towards building cost models for XPath
Conclusions II
• Operator-based XPath processing provides significant optimization opportunities
• Different implementations of logical operators can provide benefits in different circumstances
  • E.g. context selectivity
• Query plans can be much more efficient than (existing) monolithic (twig) techniques in most circumstances
Thank you!