CSE 636 Data Integration XML Distributed Query Processing Slides by Yannis Papakonstantinou

Overview
• The Virtual XML View Approach towards Data Integration
• Query Processing in XML Mediators
– Issues Overview
– An Algebra-Based Architecture
– Navigation-driven Evaluation
2
Data Integration Requirements in
eBusiness Applications
• It starts with …
“Provide Application X to customers, partners, and
employees”, where X may be in Business
Intelligence, Customer Support, …
• Then the problem comes up…
“The application uses information assets widely
distributed across my enterprise”
• If only….
“Give to the application a single place to go to
access all the information required. Requirements
are evolving so make sure the system can be
easily maintained and upgraded”
3
View-Based Approach: Wrappers
Export Basic Source Views
[Figure: the Client Application queries the Integrated (XML) View of the Mediator. The Mediator sits on two (XML) Views, each exported by a Wrapper over a relational database (Customers, Orders).]
The Customers wrapper exports customer_table as an XML view:
<customer_table>
  <customer>
    <name>John</name>
    <id>56</id>
    <city>Chicago</city>
  </customer>
  <customer>
    <name>George</name>
    <id>58</id>
    <city>Chicago</city>
  </customer>
  …
</customer_table>
4
Wrappers Export Basic Source Views
[Figure: the same architecture: Client Application, Integrated (XML) View, Mediator, two (XML) Views, and two Wrappers over the Customers and Orders relational databases.]
The Orders wrapper exports order_table:
order_table
  order
    id: 1034
    cid: 56
    item: chips
  order
    id: 1567
    cid: 56
    item: salsa
  …
5
Mediators Export Integrated Views,
Tailored to Application Needs
[Figure: the Mediator combines the customer_table and order_table source views, exported by the Wrappers over the Customers and Orders relational databases, into one Integrated (XML) View for the Client Application.]
Integrated view:
customers
  customer
    name: John
    id: 56
    city: Chicago
    orders
      order
        id: 1034
        item: chips
      order
        …
  customer
    …
Source views: customer_table (customer with name John, id 56, …) and order_table (order with id 1034, cid 56, …).
6
Virtual Views: Query-Driven Mediator
Operation
[Figure: query flow through the architecture.]
Application to Mediator: Find all Chicago customer names, along with their ordered items
Mediator to Wrapper (Customers Database): Retrieve Chicago customer names and id’s
Mediator to Wrapper (Orders Database): Retrieve all cid’s and item names of orders
7
On-Demand (Query-Driven)
Mediator Operation
[Figure: the data flowing back up through the architecture.]
Wrapper (Customers Database) to Mediator:
customers
  customer
    name: John
    id: 56
  …
Wrapper (Orders Database) to Mediator:
order
  cid: 56
  item: chips
order
  cid: 56
  item: salsa
…
Mediator to Application:
customer
  name: John
  ordered_items
    item: chips
    item: salsa
customer
  …
8
Multiple Plans are Possible
• Retrieve customers
• For each customer find matching orders
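The plan listed above (retrieve customers, then find matching orders for each one) is only one option; another is to fetch each source view once and join inside the mediator. The following is a minimal Python sketch of both, assuming in-memory lists as stand-ins for the two wrappers. The function names `plan_per_customer` and `plan_bulk_join` and the call counters are illustrative and do not come from the slides.

```python
# Hypothetical in-memory stand-ins for the two wrapped sources.
CUSTOMERS = [
    {"name": "John", "id": 56, "city": "Chicago"},
    {"name": "George", "id": 58, "city": "Chicago"},
]
ORDERS = [
    {"id": 1034, "cid": 56, "item": "chips"},
    {"id": 1567, "cid": 56, "item": "salsa"},
]

def plan_per_customer(city):
    """Plan 1: retrieve customers, then query orders once per customer."""
    source_calls = 1  # one query to the Customers wrapper
    result = []
    for c in (c for c in CUSTOMERS if c["city"] == city):
        source_calls += 1  # one parameterized query to the Orders wrapper
        items = [o["item"] for o in ORDERS if o["cid"] == c["id"]]
        result.append({"name": c["name"], "ordered_items": items})
    return result, source_calls

def plan_bulk_join(city):
    """Plan 2: fetch each source once and hash-join in the mediator."""
    source_calls = 2  # one query per wrapper
    items_by_cid = {}
    for o in ORDERS:
        items_by_cid.setdefault(o["cid"], []).append(o["item"])
    result = [
        {"name": c["name"], "ordered_items": items_by_cid.get(c["id"], [])}
        for c in CUSTOMERS if c["city"] == city
    ]
    return result, source_calls

r1, calls1 = plan_per_customer("Chicago")
r2, calls2 = plan_bulk_join("Chicago")
```

Both plans return the same answer but contact the sources a different number of times. Which one is cheaper depends on how many customers qualify and on what each wrapper can evaluate.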
9
A New Kind of Query Processing
Problem
• Build and Run “Optimal” Plan
– Consisting of operators that
– Collect source info using supported queries and
commands
– Combine info into XML result
10
Challenges in Query Processing &
Optimization
• Operate within the Limited and Different Capabilities of the
Sources
– Describe sets of supported queries
– Use most efficient supported queries
• Optimize plans/queries sent to sources
– Estimate Costs of Plans
– Adapt Plans Along the Way
– Beyond Conjunctive Queries
– Compose Queries/Views Efficiently
• Schema inference & optimization
• Combine navigation & querying
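One way to picture "describe sets of supported queries" is a per-source capability description that the mediator consults before sending a query. The sketch below is a deliberately simple assumption: each capability is just the set of attributes on which a wrapper accepts equality selections. The `CAPABILITIES` table and the helper names are hypothetical, and real capability description languages are far richer.

```python
# Hypothetical capability descriptions: attributes on which each wrapper
# accepts equality selections.
CAPABILITIES = {
    "customers": {"city"},
    "orders": set(),  # assumed: this wrapper only supports full scans
}
DATA = {
    "customers": [
        {"name": "John", "id": 56, "city": "Chicago"},
        {"name": "George", "id": 58, "city": "Chicago"},
    ],
    "orders": [
        {"id": 1034, "cid": 56, "item": "chips"},
        {"id": 1567, "cid": 56, "item": "salsa"},
    ],
}

def split_predicates(source, predicates):
    """Split equality predicates into those the source can evaluate and
    those the mediator must evaluate itself after fetching."""
    supported = CAPABILITIES[source]
    pushed = {a: v for a, v in predicates.items() if a in supported}
    local = {a: v for a, v in predicates.items() if a not in supported}
    return pushed, local

def run(source, predicates):
    """Return the answer and the number of rows shipped by the source."""
    pushed, local = split_predicates(source, predicates)
    # Stand-in for the source evaluating the pushed selection.
    fetched = [r for r in DATA[source]
               if all(r[a] == v for a, v in pushed.items())]
    # Mediator-side filtering for what the source could not do.
    answer = [r for r in fetched
              if all(r[a] == v for a, v in local.items())]
    return answer, len(fetched)

salsa, shipped = run("orders", {"item": "salsa"})
```

Here the Orders stand-in ships every row because it supports no selections, so the mediator filters locally. A selection on `city` against the Customers stand-in would be pushed to the source instead.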
11
From Limited Wrappers to Efficient
Plans for Extended Query Sets
• Answering Queries Using Views
• But with Infinite Sets of Views
• Increasing Relevance due to Web Services
[Figure: nested query sets. The queries supported by a wrapper lie within the queries supported by the mediator, which lie within all queries over the schema. Each wrapper sits on a Source with its Data & Schema.]
12
Challenges in Query Processing &
Optimization
• Operate within the Limited and Different Capabilities of the
Sources
– Describe sets of supported queries
– Use most efficient supported queries
• Optimize plans/queries sent to sources
– Estimate Costs of Plans
– Adapt Plans Along the Way
– Beyond Conjunctive Queries
– XQuery processing
• Schema inference & optimization
• Combine navigation & querying
– Build iterator models for low memory footprint
13
Navigation-Driven Evaluation of Query
Result
[Figure: the integrated customers view (customer John, id 56, city Chicago, with orders such as order 1034, item chips) shown next to the customer_table and order_table source views it is computed from (customers John, 56, Chicago and George, 58; orders 1034 and 1567, both with cid 56).]
14
Navigation-Driven Evaluation
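[Figure: a Client holds a pointer p into the result and issues navigation commands such as down(p) and right(p). The Lazy Mediator, which implements the view definition ans = q( s1 … sn ), translates them into navigations over the XML sources s1 ... sn.]
Input: client navigations
Output: source navigations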
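The idea of turning client navigations into source navigations can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Node`, `Source` and `LazyMediator` classes are hypothetical, the view is a simple "children of the root with a given label" filter rather than a general query q, and the source tree mixes customers and orders under one root only so that the filter has something to skip.

```python
class Node:
    """An XML-like element: a label, optional text, and child nodes."""
    def __init__(self, label, text=None, children=()):
        self.label, self.text, self.children = label, text, list(children)
        self.parent, self.index = None, 0
        for i, c in enumerate(self.children):
            c.parent, c.index = self, i

class Source:
    """Stand-in for an XML source that answers and counts navigations."""
    def __init__(self, root):
        self.root, self.navigations = root, 0
    def down(self, p):
        self.navigations += 1
        return p.children[0] if p.children else None
    def right(self, p):
        self.navigations += 1
        sibs = p.parent.children if p.parent else []
        return sibs[p.index + 1] if p.index + 1 < len(sibs) else None

class LazyMediator:
    """View: the children of the source root labelled `label`.
    Nothing is fetched until the client navigates."""
    def __init__(self, source, label):
        self.source, self.label = source, label
    def _skip(self, p):
        # Advance to the next sibling that belongs to the view.
        while p is not None and p.label != self.label:
            p = self.source.right(p)
        return p
    def down_root(self):
        return self._skip(self.source.down(self.source.root))
    def right(self, p):
        return self._skip(self.source.right(p))

db = Node("db", children=[
    Node("customer", children=[Node("name", "John")]),
    Node("order", children=[Node("item", "chips")]),
    Node("customer", children=[Node("name", "George")]),
])
src = Source(db)
view = LazyMediator(src, "customer")
first = view.down_root()       # one source navigation
after_first = src.navigations
second = view.right(first)     # two source navigations: skips the order
```

A single client navigation may cost several source navigations, and parts of the source the client never visits are never read.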
15
Mixing Querying & Navigation
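[Figure: while navigating the customers view, the client issues a query relative to its current position: “Find details of all salsa orders below visited node”.]
customers
  customer
    name: John
    id: 56
    city: Chicago
    orders
      order
        id: 1034
        item: chips
      order
        …
  customer
    …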
20
Challenges in Mixing Querying &
Navigation
• Two-dimensional navigation
– Reminiscent of cursors, but with multiple continuation
points
• Controlling size + shape
• Contextualizing queries by navigation
21
An Algebra-Based Query Processor
Architecture
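[Figure: processing pipeline, top to bottom.]
Client: sends XQuery and Navigation Requests, receives Results
XQuery Views
Translation to Algebra
  produces: Algebra Plan
Rewriter/Optimizer
  produces: Physical Algebra Plan
Plan Execution Engine
  issues: Queries & Fetch Requests to Sources
Side inputs shown next to the pipeline: Source Description, Function Description, Source Schemas & Types, Functions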
23
Query Processing on Tuple-Oriented
Algebra Enables…
• Well-known efficient physical implementations of
the operators
• Join optimization
• Nested data by nested plans or group-by
• Efficient iterator model
24
XQuery: Queries & Views for XML
<customers>
{
  for $cust in document("db")/customer
  return
    <customer>
    {
      $cust/id,
      for $order in document("db")/order
      where $order/cid = $cust/id
      return <order> { $order/id } </order>
    }
    </customer>
}
</customers>
25
Access and Navigation
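[Figure: a plan of three operators with the tuples each one outputs, evaluated bottom-up over the source db: customer_table (ct) with customers c1 (name John, id i1 = 56) and c2 (name George, id i2 = 58).]
getD $cust, id → $cust_id
    output ($db1, $cust, $cust_id): (ct, c1, i1), (ct, c2, i2)
getD $db1, customer → $cust
    output ($db1, $cust): (ct, c1), (ct, c2)
source db, [$db1]
    output ($db1): (ct)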
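The source and getD operators in this figure can be imitated with Python generators over binding tuples. This is a sketch under stated assumptions: tuples are dictionaries from variable names to nodes, nodes are `(label, children-or-text)` pairs, and `getD` follows a single child step, which is simpler than whatever path semantics the actual algebra defines.

```python
# Source tree from the figure, as (label, children-or-text) pairs.
db = ("customer_table", [
    ("customer", [("name", "John"), ("id", "56")]),
    ("customer", [("name", "George"), ("id", "58")]),
])

def source(tree, var):
    """source db, [$var]: emit one tuple binding var to the source root."""
    yield {var: tree}

def getD(tuples, in_var, label, out_var):
    """getD $in, label -> $out: for each input tuple, emit one output
    tuple per child of the $in node that carries the given label."""
    for t in tuples:
        _, children = t[in_var]
        for child in children:
            if child[0] == label:
                yield {**t, out_var: child}

# The three-operator plan, composed bottom-up.
plan = getD(
    getD(source(db, "$db1"), "$db1", "customer", "$cust"),
    "$cust", "id", "$cust_id",
)
ids = [t["$cust_id"][1] for t in plan]
```

Because each operator is a generator, tuples flow through the plan one at a time, which matches the iterator model mentioned earlier in the deck.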
26
Simplification Using Schema Inference
Since $cust_id  $cust and
$cust is “useless” otherwise
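[Figure: the two getD operators are merged into one that follows the path customer/id. The source db is the customer_table (ct) with customers John (id i1 = 56) and George (id i2 = 58).]
getD $db1, customer/id → $cust_id
    output ($db1, $cust_id): (ct, i1), (ct, i2)
source db, [$db1]
    output ($db1): (ct)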
27
Nested Plans
[Figure: a plan with a nested subplan. Operator labels: apply $part, p → $orders; nestedSrc $part (the leaf of nested plan p); for. Each partition $part holds one tuple ($db1, $cust_id), namely (ct, i1) or (ct, i2). Output tuples ($db1, $cust_id, $orders): (ct, i1, [o11…]), (ct, i2, [o21…]).]
28
Joins and Selections
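[Figure: an operator tree with the tuples flowing between operators, for the partition ($db1 = ct, $cust_id = i1). Columns at the top: $db1, $cust_id, $db2, $order, $cust_id2, $order_id. Columns on the orders branch: $db2, $order, $cust_id2, $order_id. Operator labels, top to bottom:]
$cust_id2=?
getD $order, id → $order_id
getD $order, cid → $cust_id2
getD $db2, order → $order
source db, [$db2]
nestedSrc $part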
29
Constructors
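[Figure: construction of the $orders list from order ids, evaluated bottom-up.]
listify $oidE → $orders
    output: $orders = [e1, e2]
crEl order, $oidL → $oidE
    output: $oidE = e1, e2 (e1 is an order element containing o1, e2 an order element containing o2)
crList $order_id → $oidL
    output: $oidL = [o1], [o2]
input: $order_id = o1, o2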
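The three constructor operators can be imitated with small Python generators. The semantics below are inferred from the values shown in the figure and should be read as assumptions: `crList` wraps each value in a singleton list, `crEl` builds an element (here a `(label, children)` pair) from a list, and `listify` collects one variable over all input tuples into a single list.

```python
def crList(tuples, in_var, out_var):
    """crList $in -> $out: wrap each value in a singleton list."""
    for t in tuples:
        yield {**t, out_var: [t[in_var]]}

def crEl(tuples, label, list_var, out_var):
    """crEl label, $list -> $out: build an element with children $list."""
    for t in tuples:
        yield {**t, out_var: (label, t[list_var])}

def listify(tuples, in_var, out_var):
    """listify $in -> $out: collect $in over all input tuples into a
    single list, emitted as one tuple."""
    yield {out_var: [t[in_var] for t in tuples]}

rows = [{"$order_id": "o1"}, {"$order_id": "o2"}]
plan = listify(
    crEl(crList(rows, "$order_id", "$oidL"), "order", "$oidL", "$oidE"),
    "$oidE", "$orders",
)
orders = next(plan)["$orders"]
```

The result is one tuple whose `$orders` value lists the two constructed order elements, mirroring `[e1, e2]` in the figure.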
30
Algebra Example
31
Plan Decomposition
• Within Rewriting Optimizer
• Rules replacing “leaf” trees
• May move commutable parts
• Catch: No projection limitation
32
Plan After Decomposition
33
Replacing Nested Plans with
GroupBy/Outerjoin Combinations
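[Figure: two plans side by side, one using a nested plan under apply and one using a groupBy/outerjoin combination. Operator and subplan labels in the figure: apply $part, p → $R; nestedSrc $part; for; groupBy; S(p1) → $part; p1; p2; p3.]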
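The rewriting named in the slide title can be illustrated on the running example: collecting the orders of each customer either by running a nested plan per customer, or by a left outer join followed by a group-by. The Python sketch below only demonstrates that the two formulations produce the same result on sample data. It is not the rewriting rule itself, and the function names are hypothetical.

```python
CUSTOMER_IDS = [56, 58]
ORDERS = [(1034, 56), (1567, 56)]  # (order id, cid)

def nested_plan():
    """Run the nested plan once per outer tuple."""
    out = []
    for cid in CUSTOMER_IDS:
        orders = [oid for oid, ocid in ORDERS if ocid == cid]  # nested plan
        out.append((cid, orders))
    return out

def groupby_outerjoin_plan():
    """Left outer join of customers with orders, then group by customer."""
    by_cid = {}
    for oid, ocid in ORDERS:
        by_cid.setdefault(ocid, []).append(oid)
    joined = []
    for cid in CUSTOMER_IDS:
        matches = by_cid.get(cid)
        if matches:
            joined.extend((cid, oid) for oid in matches)
        else:
            joined.append((cid, None))  # outer join keeps unmatched customers
    groups = {}
    for cid, oid in joined:
        groups.setdefault(cid, [])
        if oid is not None:
            groups[cid].append(oid)
    return list(groups.items())

nested = nested_plan()
flat = groupby_outerjoin_plan()
```

The outer join matters: with an inner join, customer 58 (who has no orders) would disappear instead of appearing with an empty list.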
34
Multiple Possible Plans
35
Building Navigation-Driven Evaluation
on the Algebra
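[Figure: the Client navigates the result at the root of an operator tree. The leaves of the tree are Source access operators, each reading from a Source.]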
37
Think of Each Operator as a Lazy
Mediator
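[Figure: the tuple table produced by getD $cust, id → $cust_id, with columns $db1, $cust, $cust_id and rows (ct, c1, i1), (ct, c2, i2), viewed as a navigable tree: a root with one tuple child per row, and under each tuple one child per variable ($db1, $cust, $cust_id). The variable nodes lead into the source tree: customer_table with customers c1 (name John, id 56) and c2 (name George, id 58).]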
38
Navigation-Driven Evaluation of
Operators
Augmented with
• nextTuple(p)
• p.attr
[Figure: a Lazy Operator takes client navigations on its result as input and issues source navigations as output on its inputs s1 ... sn, each of which is the result of an operator below.]
39
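Use of Semantic Id’s in Navigation-Driven Evaluation
[Figure: a navigation request r/d(<f1, f2, …, fn>) carries a semantic id. The Operator sets its State from the id (V1: f1, V2: f2, …, Vn: fn, Other: …), proceeds down/right, and reaches the State V1: f’1, V2: f’2, …, Vn: f’n, Other: …, which is returned as the id <f’1, f’2, …, f’n>.]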
40
Fragments Reduce the “Set State” –
“Produce State” Overhead
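[Figure: a fragment of the result tree. Delivered nodes: root, a customer with name “John”, an order with oid 123, and three lineitem nodes. Parts not yet delivered are marked Hole 1, Hole 2 and Hole 3.]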
41
Fragments Reduce the “Set State” –
“Produce State” Overhead
[Figure: the same tree after further navigation. Delivered nodes now also include a second order (ordnum=16) and two more lineitem nodes. The holes shown are Hole 1, Hole 3, Hole 4 and Hole 5.]
42
Controlling the Size and Shape of
Fragments
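[Figure: Client-Server Interaction. The Client side shows a listify operator. The server side shows a Controller over a plan with listify on top and Source access operators at the leaves, each reading from a Source.]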
43
Fragment Size affects Memory Footprint,
which in turn affects Performance
44
Fragmentation Strategies
• Fixed Fragment Size
– Ideal for depth-first, left-to-right navigation
• Adaptive Fragment Size
– Assign larger pieces to those who use them
45
Response Performance for
Breadth-First and Depth-First
Depth First traversal
Breadth First traversal
46
References
• Navigation-Driven Evaluation of Virtual Mediated
Views
– Bertram Ludäscher, Yannis Papakonstantinou, Pavel
Velikhov
– EDBT 2000
• Architecture and Implementation of an XQuery-based Information Integration Platform
– Yannis Papakonstantinou, Vasilis Vassalos
– IEEE Data Eng. Bull. 25(1), 2002
• XML queries and algebra in the Enosys
integration platform
– Yannis Papakonstantinou, Vinayak R. Borkar, Maxim
Orgiyan, Konstantinos Stathatos, Lucian Suta, Vasilis
Vassalos, Pavel Velikhov
– Data Knowl. Eng. 44(3), 2003
47