Adaptive Query Processing for Wide-Area Distributed Data

Michael Franklin
UC Berkeley

Joint work with Tolga Urhan, Laurent Amsaleg, and Anthony Tomasic
Motivation

- The Internet enables access to globally distributed data sources...
- But, current search and data access technology is primitive:
  - Discovering relevant sources and data is difficult.
  - Simple text-based searches.
  - Navigation through link clicking.
  - Collecting, aggregating, and manipulating data from multiple sources is not supported.
QP on the Internet? — Issues

- Semantic Interoperability
  - Wrapper/Mediator Architecture.
  - XML, XMI, CWMI, OLE-DB, ...
- Source Discovery
  - Metadata Repositories and Directories.
- Performance
  - Distributed database technology, caching, etc.
- Responsiveness and Availability
  - Unpredictability: how to build responsive systems?
  - This is the focus of this talk.
Databases to the Rescue?

- DB query languages used to be navigational.
- Relational languages are more useful for many tasks.
  - Powerful, and (more or less) declarative.
  - Queries are written without regard to the physical structure/location/etc. of the data. (Data Independence)
  - Easily extended to distributed systems.
- DB query languages and optimization techniques have been developed over decades.
- This technology is unavailable to the Internet user.
Distributed Query Processing (QP)

SELECT eid, ename, title, salary
FROM Emp, Proj, Assign
WHERE Emp.eid = Assign.eid
AND Proj.pid = Assign.pid
AND Emp.loc <> Proj.loc

- The system handles query plan generation & optimization and ensures correct execution.
- Originally conceived for corporate networks.

[Figure: distributed execution of the query above across multiple sites, ©1998 Ozsu and Valduriez]
Wide-area + Wrapped sources → Unpredictability

- Sources may be unreachable or slow to respond.
- Data delivery may be:
  - slower than expected
  - bursty
  - interrupted
- Data statistics/cost estimates may be unavailable or unreliable.

Traditional, static query processing approaches cannot cope with such problems at run-time.
Some Solutions

- Adaptive Query Processing
  - Query Scrambling - "Reactive Query Execution".
  - XJoin - a non-blocking, reactive query operator.
  - and beyond!
- Risk-Aware Query Planning
  - Producing robust plans.
- Exploiting Alternative Sources
  - Mirrors or "not exactly".
- Relaxing Query Semantics
  - Partial, Fuzzy, or Alternative answers.
Query Scrambling - Introduction

- Goal: Overcome the limitations of static QP for unexpected delays.
- A Reactive Approach:
  - Start with an optimized plan.
  - Modify the plan on-the-fly if problems are detected.
  - Hide delays by performing other useful work.
- Assumptions:
  - Focus on initial delay.
  - Query processing at the client; iterator model.
  - No replication.
Query Scrambling - Overview

- An iterative algorithm.
- Monitor input and scramble when problems are detected.

[Figure: control flow: Normal Execution moves to Scrambling Phase 1 when source(s) are delayed; execution returns to normal when the source(s) respond, and proceeds to Phase 2 if still delayed.]

- Phase 1: Reschedule "runnable" operators.
- Phase 2: Operator synthesis: create new operators.
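For concreteness, here is a minimal sketch of this two-phase loop, assuming a hypothetical Plan interface (finished, stalled_sources, runnable_subtrees, synthesize_operators, and so on); it illustrates the control flow only and is not the actual Query Scrambling engine.

# Hedged sketch of the two-phase scrambling loop; Plan's methods
# (finished, stalled_sources, run_normally, runnable_subtrees,
# synthesize_operators, wait_for) are hypothetical stand-ins.

def execute_with_scrambling(plan, timeout_sec=5.0):
    while not plan.finished():
        stalled = plan.stalled_sources(timeout_sec)
        if not stalled:
            plan.run_normally()                    # ordinary iterator execution
            continue
        # Phase 1: reschedule subtrees whose inputs are available and
        # materialize their results for later use.
        subtrees = plan.runnable_subtrees(excluding=stalled)
        if subtrees:
            for subtree in subtrees:
                subtree.run_and_materialize()
            continue
        # Phase 2: nothing is runnable, so synthesize new operators
        # (e.g., new joins over already-materialized intermediate results).
        new_op = plan.synthesize_operators(avoiding=stalled)
        if new_op is not None:
            new_op.run_and_materialize()
        else:
            plan.wait_for(stalled)                 # no useful work remains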
Query Scrambling Example

[Figure: an initial join plan over relations A, B, C, D, E; when A is delayed, runnable subtrees over the remaining relations are rescheduled, new operators are synthesized, and the plan is rescheduled again.]
Building a Scrambling Engine

- A thread per operator.
- Monitoring and scheduling.

[Figure: per-operator state machine with states Not Started, Active, Stalled, Suspended, and Closed, and transitions open, timeout, data_arrival, de-schedule, resume, and done.]

- A "smart" materialization operator.
- Multi-threaded query operators?
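A small, self-contained sketch of such a per-operator state machine: the states and events come from the figure above, while the transition table and class structure are illustrative assumptions rather than the engine's actual code.

# Sketch of the per-operator state machine used for monitoring/scheduling.
# States and events follow the figure; everything else is illustrative.

from enum import Enum, auto

class State(Enum):
    NOT_STARTED = auto()
    ACTIVE = auto()
    STALLED = auto()
    SUSPENDED = auto()
    CLOSED = auto()

# (current state, event) -> next state
TRANSITIONS = {
    (State.NOT_STARTED, "open"):         State.ACTIVE,
    (State.ACTIVE,      "timeout"):      State.STALLED,
    (State.STALLED,     "data_arrival"): State.ACTIVE,
    (State.ACTIVE,      "de-schedule"):  State.SUSPENDED,
    (State.SUSPENDED,   "resume"):       State.ACTIVE,
    (State.ACTIVE,      "done"):         State.CLOSED,
}

class OperatorMonitor:
    """Tracks one operator's scheduling state for the scrambling scheduler."""
    def __init__(self):
        self.state = State.NOT_STARTED

    def on_event(self, event: str) -> State:
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state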
Directing Scrambling [SIGMOD 98]

- The original formulation [PDIS 96] was based on heuristics.
- Demonstrated the ability of QS to hide delays, but was susceptible to making bad choices.
- Query optimizers are able to choose good plans, but how to use an optimizer to do scrambling?
  - Phase I
    - Issue: where to place the materialization operator?
    - Answer: Choose the subtree with the best overhead/useful-work ratio.
  - Phase II is trickier.
Phase II - Operator Synthesis

- If no runnable subtrees, create new ones.
- Needed: an optimizer that:
  1) is lightweight & incremental, and
  2) understands delays.
- Most QP systems optimize for total work.
- But, delay is inherently a response-time issue.
- Response-time optimization can "magically" move delayed operators to the "best" point in the plan, but only if it knows the duration of the delay!
Include Delayed (ID) Algorithm

- Invokes the optimizer with a very large delay value.
- The optimizer pushes the delayed relation as far back as is useful.
- Large delay estimate → aggressive.
Estimated Delay (ED) Algorithm

- Initially calls the RT optimizer with a small delay.
  - Small value = 25% of the RT of the original query.
- Successively increases the delay estimate.
  - 50% and then 100% of the original RT.
- Increasing estimates → adaptive.
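A minimal sketch of the ED escalation described above, assuming hypothetical rt_optimize, run_until_source_responds, and finish helpers in place of the real optimizer and execution APIs.

# Illustrative sketch of the ED strategy: re-plan with successively larger
# delay estimates (25%, 50%, then 100% of the original query's estimated
# response time). The helper callables are hypothetical stand-ins.

def scramble_with_ed(query, delayed_source, original_rt,
                     rt_optimize, run_until_source_responds, finish):
    plan = None
    for fraction in (0.25, 0.50, 1.00):
        delay_estimate = fraction * original_rt
        plan = rt_optimize(query, delayed=delayed_source, delay=delay_estimate)
        if run_until_source_responds(plan, delayed_source):
            return finish(plan)      # source arrived; complete this plan normally
    # Delay exceeded every estimate: stay with the most pessimistic plan.
    return finish(plan)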
Experimental Environment

- Workload: queries derived from the TPC-D benchmark.
  - TPC-D(5), TPC-D(8), TPC-D(9); 1 GB base data.
- Optimizer (built from scratch):
  - Two-phase randomized optimizer a la [Ioannidis 90].
  - Optimizes for Total Work or Response Time (GHK 92).
  - Search space = bushy plans.
- Studied the algorithms in a simulated environment.
  - Network, remote sites, query engine, etc.
  - Subsequently validated with a PREDATOR-based implementation.
National Market Share Query (TPC-D 8)

- Experiments with several memory sizes.
- The delayed relation (Part) is an important relation.
- Used hash joins only.

[Figure: join graph over Part, LineItem, Order, Customer, Supplier, Nation, Nation, and Region, with join selectivities including 1/150, 2/7, and 1/5.]

- LineItem is the largest relation; Part is a "reducer".
- The optimizer initially chooses to go left-to-right.
National Market Share Query (large memory, > 4 MB)

[Figure: response time (sec) vs. delay (sec) for PAIR, ID, and ED.]
National Market Share Query (small memory)

[Figure: response time (sec) vs. delay (sec) for PAIR, ID, and ED.]

- Scrambling becomes more expensive.
- PAIR: local decisions, lack of global view.
- ID: poor performance for short delays.
- ED: good for a wide range of delay values.
Cost-Based Query Scrambling

Summary:
- Traditional static query processing does not scale to the wide-area environment.
- A reactive approach is needed.
- This requires a multi-threaded engine and a scrambling-enabled optimizer.

Experimental Results:
- Avoids many of the problems of heuristic algorithms.
- Response time-based optimization is needed.
- Fundamental tradeoffs arise in the absence of good delay predictions.
XJoin - Improving Responsiveness

- QS can speed up the delivery of the entire answer.
- But, its ability to hide delays is limited by the amount of useful work that can be done in the query.
- XJoin is a new query operator that:
  - Produces results incrementally as they become available.
  - Allows progress to be made in highly erratic situations.
  - Has a small memory footprint.
  - Tolerates bursty and slow behavior.
Hash Join vs. Symmetric Hash Join

[Figure: a traditional hash join builds a hash table on Source A (Build) and probes it with Source B (Probe); a symmetric hash join maintains hash tables on both Source A and Source B.]

- Traditional hash joins block when one input stalls.
- The Symmetric Hash Join (SHJ) blocks only if both inputs stall.
  - Processes tuples as they arrive from sources.
  - Produces all tuples in the join and no duplicates.
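To make the SHJ idea concrete, a minimal, self-contained Python sketch follows; the class and join-key extractors are illustrative and are not PREDATOR's or XJoin's actual code.

# Minimal symmetric hash join sketch: each arriving tuple is inserted into
# its own side's hash table and immediately probes the other side's table,
# so results stream out as inputs arrive.

from collections import defaultdict

class SymmetricHashJoin:
    def __init__(self, key_a, key_b):
        self.key_a, self.key_b = key_a, key_b      # join-key extractors
        self.table_a = defaultdict(list)           # hash table on Source A
        self.table_b = defaultdict(list)           # hash table on Source B

    def insert_a(self, tup):
        """Called when a tuple arrives from Source A; returns join results."""
        k = self.key_a(tup)
        self.table_a[k].append(tup)
        return [(tup, other) for other in self.table_b.get(k, ())]  # probe B

    def insert_b(self, tup):
        """Called when a tuple arrives from Source B; returns join results."""
        k = self.key_b(tup)
        self.table_b[k].append(tup)
        return [(other, tup) for other in self.table_a.get(k, ())]  # probe A

# Example: join on the first field of each tuple.
shj = SymmetricHashJoin(key_a=lambda t: t[0], key_b=lambda t: t[0])
print(shj.insert_a((1, "emp")))     # [] -- no match yet
print(shj.insert_b((1, "proj")))    # [((1, 'emp'), (1, 'proj'))]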
Memory Utilization

- As originally specified, SHJ requires both inputs to be memory resident.
- For a complex query, this means all intermediate results must be in memory.
- This is wasteful and can result in thrashing.
- XJoin extends SHJ to allow it to work with limited memory (like "Hybrid Hash").
- Spilled tuples are processed by a reactively scheduled background thread.
Partitioning

- XJoin is a partitioned hash join method.
- When allocated memory is exhausted, a partition is flushed to disk.
- Join processing continues on memory-resident data.
- Disk-resident tuples are handled in the background.
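A rough illustration of the flush-on-memory-pressure idea; the partition count, memory accounting, and largest-partition victim policy are assumptions made for this sketch, not the paper's exact design.

# Sketch of partitioned input handling: tuples hash to partitions; when the
# memory budget is exceeded, one partition's in-memory portion is appended
# to its disk file and dropped, while joining continues on what remains.

import pickle, tempfile

class PartitionedInput:
    def __init__(self, num_partitions=8, memory_budget=10_000):
        self.parts = [[] for _ in range(num_partitions)]   # in-memory tuples
        self.files = [tempfile.TemporaryFile() for _ in range(num_partitions)]
        self.memory_budget = memory_budget                 # counted in tuples, for simplicity
        self.in_memory = 0

    def add(self, key, tup):
        p = hash(key) % len(self.parts)
        self.parts[p].append(tup)
        self.in_memory += 1
        if self.in_memory > self.memory_budget:
            self._flush_largest()

    def _flush_largest(self):
        # Assumed policy: spill the largest memory-resident partition.
        p = max(range(len(self.parts)), key=lambda i: len(self.parts[i]))
        for tup in self.parts[p]:
            pickle.dump(tup, self.files[p])                # append to disk-resident portion
        self.in_memory -= len(self.parts[p])
        self.parts[p] = []                                 # joining continues on memory-resident data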
The 3 Stages of XJoin

- Stage 1 - Symmetric hash join (memory-to-memory).
- Stage 2 - Disk-to-memory.
  - Separate thread; runs when stage 1 blocks.
  - Stages 1 and 2 trade off until all input has been received.
- Stage 3 - Clean-up stage.
  - Stage 1 misses pairs that were not in memory concurrently.
  - Stage 2 misses pairs when both are on disk, and may not get to run to completion.
XJoin - Details

- The asynchronous, multi-threaded nature of XJoin combined with its small footprint allows it to be fully pipelined, but...
- Duplicate result tuples can be introduced during stages 2 and 3. These are avoided using timestamps.
- Each tuple is given an Arrival Timestamp (ATS) and a Departure Timestamp (DTS).
  - Two tuples with overlapping ATS-DTS ranges have already been matched in stage 1.
  - The timestamp of when a disk-resident partition was used allows detection of tuples matched during stage 2.
- The second stage can be further optimized, at the expense of a bit of memory and some additional duplicate detection.
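A minimal sketch of the overlap test implied by the ATS/DTS scheme; the tuple layout and names are assumptions for illustration.

# A tuple's [ATS, DTS] interval is the window during which it was
# memory-resident. If two tuples' intervals overlap, they were in memory
# at the same time and were already joined by stage 1, so stages 2 and 3
# skip that pair to avoid duplicates.

from collections import namedtuple

TSTuple = namedtuple("TSTuple", ["value", "ats", "dts"])   # arrival / departure timestamps

def already_matched_in_stage1(t1: TSTuple, t2: TSTuple) -> bool:
    """True iff the two tuples were memory-resident concurrently."""
    return t1.ats <= t2.dts and t2.ats <= t1.dts

# Example: overlapping windows -> skip; disjoint windows -> stage 2/3 must join them.
a = TSTuple("r1", ats=10, dts=50)
b = TSTuple("s1", ats=40, dts=90)    # overlaps [10, 50]
c = TSTuple("s2", ats=60, dts=95)    # arrived after r1 was flushed
print(already_matched_in_stage1(a, b))   # True  -> would be a duplicate
print(already_matched_in_stage1(a, c))   # False -> must be joined in stage 2 or 3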
XJoin - Performance

- We implemented XJoin in our multi-threaded version of the PREDATOR ORDBMS (from Cornell).
- We modeled network delays using traces obtained from accessing sites across the Internet.
  - Replaying these traces provides repeatable results.
- Focus on a "slow" (24.1 KB/sec) and a "fast" (132.8 KB/sec) trace; both exhibit bursty behavior.
- Workload is simple join queries on Wisconsin Benchmark relations.
Results - 2-Way Joins
(Time in seconds to nth tuple)

[Figure: four panels comparing Hybrid Hash (H), XJoin, and XJ-2 for the cases Slow Build/Slow Probe, Fast Build/Slow Probe, Slow Build/Fast Probe, and Fast Build/Fast Probe.]
Taming the Second Stage

- The impact of the second stage decreases during the execution of an XJoin. Scheduling can be adjusted to account for this.

[Figure: Hybrid Hash (H), XJoin, and XJoin-A for the Fast Build/Fast Probe case.]
Results – Multiway Joins

[Table: delivery times (in seconds) to the 1st, 5,000th, 50,000th, and last result tuple for XJoin (XJ) vs. hybrid hash join (HHJ) on 2-, 4-, and 6-relation joins; for the 2-way join, XJ delivers the first tuple in roughly 5 seconds vs. roughly 823 seconds for HHJ.]
XJoin - Summary

- A non-blocking, small-footprint join operator.
- It is multi-threaded, consisting of three stages.
  - These stages allow XJoin to make progress when input blocks, but they can introduce duplicates.
- XJoin is optimized for streaming results to users as fast as they are created.
- Like QS, XJoin hides delays with useful work, but at the operator level rather than at the plan level.
- Experiments showed order-of-magnitude improvements in time to get initial results.
Eddy – Continuous Optimization

[Figure: an eddy routing tuples from relations R, S, and T through two join operators (R join S, and S join T).]

- Flow-based ("Rivers").
- Tuples are routed via a ticket-based scheme and back-pressure.
- Hellerstein and Avnur 99
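A much-simplified sketch of ticket-based routing with back-pressure, loosely following the Eddies idea; the ticket accounting, lottery weights, and class interface here are illustrative assumptions, not the actual implementation.

# Simplified eddy routing sketch: an operator earns a ticket when it takes a
# tuple and loses one when it returns the tuple to the eddy, so selective
# operators accumulate tickets and win the routing lottery more often.
# Back-pressure is modeled by skipping operators whose input queues are full.

import random

class Eddy:
    def __init__(self, operators, queue_capacity=100):
        self.tickets = {op: 1 for op in operators}       # start everyone with one ticket
        self.queues = {op: 0 for op in operators}        # pending tuples per operator
        self.capacity = queue_capacity

    def route(self, remaining_ops):
        """Pick the next operator for a tuple among those it still must visit."""
        eligible = [op for op in remaining_ops
                    if self.queues[op] < self.capacity]  # back-pressure
        if not eligible:
            return None
        weights = [self.tickets[op] for op in eligible]
        op = random.choices(eligible, weights=weights, k=1)[0]
        self.queues[op] += 1
        self.tickets[op] += 1                            # credit on consume
        return op

    def tuple_returned(self, op):
        """The operator sends a surviving tuple back to the eddy."""
        self.queues[op] -= 1
        self.tickets[op] = max(1, self.tickets[op] - 1)  # debit on return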
Adaptive Approaches

[Figure: a spectrum of adaptivity from static plans to anarchy: static plans (current DBMS); late binding (Dynamic, Parametric, Competitive, ...); re-optimization (Query Scrambling, Kabra/DeWitt, XJoin); continuous optimization (Eddy); anarchy (???).]

- Increased uncertainty argues for increased adaptivity.
  - Wide-area nets and admin domains introduce uncertainty.
  - Pesky users introduce uncertainty.
  - Non-traditional data sources introduce uncertainty.
- Implications for data-intensive Internet services.
The Telegraph Project

- Adaptive data management for Internet-scale composition of services.
  - Dataflow-based scheduling.
  - Cross-domain negotiation.
  - "User-in-the-loop"
  - Adaptation and learning over varying granularities:
    - individual long-running jobs
    - many similar short jobs
    - continuous data flows and filters.
Conclusions

- Current static query processing technology cannot cope with the wide-area environment.
- A key concern is unpredictability.
  - Query Scrambling is a reactive execution approach.
  - XJoin is a pipelined operator that streams answers.
  - Even more adaptive approaches are possible.
- Complementary approaches (and future work): alternative sources, optimizing for robustness, relaxing semantics.
- These ideas extend to the composition of Internet services.
The End
Future Work

- Investigating the properties of query plans that make them robust in the presence of network problems.
  - Will use these properties in the objective function for query optimization.
- Next step is to use alternative, but not necessarily equivalent, sources.
- Further progress will involve relaxing the guarantees on semantics that the query system provides.
  - The WWW has shown us that users will accept this!
Conclusions

- Current Internet querying and data manipulation capabilities are too limited.
  - Unexpressive, too coarse-grained, etc.
  - Do not support manipulating data from multiple sites.
- Distributed querying technology addresses these concerns but is not applicable on the Internet.
- A key concern is unpredictability.
  - Query Scrambling is a reactive execution approach.
  - XJoin is a pipelined operator that streams answers.
  - Lots more interesting work to be done in this area.
Motivation

- Pervasive network connectivity enables global-scale federated DBMSs.
- Improvements in heterogeneous DBMS and emerging standards enable Internet query processing.
- Telegraph: flow-based composition of data-intensive Internet services.