Massive Scale-out of
Expensive Continuous Queries
Erik Zeitler and Tore Risch
Uppsala Database Laboratory
Uppsala University
1. Introduction
2. Stream splitting strategies for scale-out
3. Evaluating stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work
31 Aug 2011 Erik Zeitler and Tore Risch 2
[Figure: a data stream management system. Labels: user or programmer, query, input data streams, query processing software, stream data access software, result data stream, metadata, stored data.]
How to ensure scalable CQ execution
• with growing input stream rate?
• with high CQ execution cost?
By scale-out.
CQs are scaled out by splitting the input stream.
• applications require customizable input stream splitting, called splitstream
• both tuple route and broadcast allowed
[Figure: the input stream is split over parallel CQs, whose outputs are merged.]
How to split massive streams over massively parallel CQs?
• By parallelization of splitstream
[Figure: the input stream is split over many parallel CQs, whose outputs are merged.]
2. Stream splitting strategies for scale-out
splitstream(stream s, integer q, function rfn, function bfn) -> vector of stream sv
[Figure: splitstream splits the input stream s into the vector sv of q output streams.]
User defines rfn and bfn:
• rfn(object tpl, integer q) -> integer
  Example: rfnLRB(event e, integer q) -> integer as select expressway(e) where eventtype(e) = 0;
• bfn(object tpl) -> boolean
  Example: bfnLRB(event e) -> boolean as select eventtype(e) = 2;
rfn and bfn for streams are analogous to fragmentation and replication conditions in a distributed DBMS (DDBMS).
Unlike in a DDBMS, execution of rfn and bfn is parallelized.
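The splitting semantics of splitstream can be sketched in Python. This is a minimal illustration, not SCSQ code: streams are plain lists, events are dicts with the hypothetical fields "type" and "xway", and both the precedence of bfn over rfn and the reduction of the rfn result modulo q are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, int]  # hypothetical event layout: {"type": ..., "xway": ...}


def rfn_lrb(e: Event, q: int) -> Optional[int]:
    """Route events of type 0 by expressway; return None for all other events."""
    return e["xway"] if e["type"] == 0 else None


def bfn_lrb(e: Event) -> bool:
    """Broadcast events of type 2 to every output stream."""
    return e["type"] == 2


def splitstream(
    s: List[Event],
    q: int,
    rfn: Callable[[Event, int], Optional[int]],
    bfn: Callable[[Event], bool],
) -> List[List[Event]]:
    """Split stream s into a vector sv of q output streams."""
    sv: List[List[Event]] = [[] for _ in range(q)]
    for tpl in s:
        if bfn(tpl):
            # Broadcast: the tuple is copied to all q output streams.
            for out in sv:
                out.append(tpl)
        else:
            i = rfn(tpl, q)
            if i is not None:
                # Assumption: the routing result is reduced modulo q.
                sv[i % q].append(tpl)
            # Tuples that are neither broadcast nor routed are omitted.
    return sv


events = [
    {"type": 0, "xway": 0},
    {"type": 0, "xway": 1},
    {"type": 2, "xway": 0},
    {"type": 3, "xway": 1},
]
sv = splitstream(events, 2, rfn_lrb, bfn_lrb)
```

In this example the two type 0 events are routed to different output streams, the type 2 event reaches both streams, and the type 3 event is omitted.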
Naïve (flat) splitstream implementation: fsplit
fsplit(stream s, integer q, function rfn, function bfn) -> vector of stream sv
[Figure: a single fsplit process feeds all CQs.]
Expensive stream splitting computations
Bottleneck!
Tree shaped splitstream implementation: maxtree
maxtree(stream s, integer q, function rfn, function bfn) -> vector of stream sv
[Figure: a tree of fsplit processes feeds the CQs.]
• Bottleneck is alleviated [Zeitler and Risch, DASFAA 2010]
• but still problematic
Scaled-out splitstream: parasplit
parasplit(stream s, integer q, function rfn, function bfn) -> vector of stream sv
[Figure: a window router (PR) distributes entire windows to parallel window splitters (fsplit); a stream merge in front of each CQ combines the splitter outputs.]
Parasplit: route – //fsplit – //(merge – CQ)
[Figure: the window router (PR) distributes entire windows to parallel window splitters (fsplit); each CQ is preceded by a stream merge.]
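The route, split, and merge stages of parasplit can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: windows are counted in tuples rather than bytes, whole windows are dealt round-robin to the splitters, the modulo on the rfn result is hypothetical, and the merge simply concatenates splitter outputs, so the tuple order of the real system is not modeled.

```python
from typing import Callable, List, Optional, Sequence


def window_router(s: Sequence[int], w: int, p: int) -> List[List[Sequence[int]]]:
    """Cut s into windows of w tuples and deal whole windows round-robin to p splitters."""
    inputs: List[List[Sequence[int]]] = [[] for _ in range(p)]
    for k, start in enumerate(range(0, len(s), w)):
        inputs[k % p].append(s[start:start + w])
    return inputs


def fsplit_window(window, q, rfn, bfn) -> List[List[int]]:
    """Window splitter: apply bfn and rfn to every tuple of one window."""
    out: List[List[int]] = [[] for _ in range(q)]
    for tpl in window:
        if bfn(tpl):
            for stream in out:
                stream.append(tpl)
        else:
            i = rfn(tpl, q)
            if i is not None:
                out[i % q].append(tpl)  # assumption: result reduced modulo q
    return out


def parasplit(
    s: Sequence[int],
    q: int,
    rfn: Callable[[int, int], Optional[int]],
    bfn: Callable[[int], bool],
    p: int,
    w: int,
) -> List[List[int]]:
    """Route windows to p splitters, split each window, merge per output stream."""
    merged: List[List[int]] = [[] for _ in range(q)]
    for windows in window_router(s, w, p):  # one entry per fsplit process
        for window in windows:
            for j, part in enumerate(fsplit_window(window, q, rfn, bfn)):
                merged[j].extend(part)  # stream merge in front of CQ j
    return merged


stream = list(range(20))
result = parasplit(stream, q=3, rfn=lambda t, q: t, bfn=lambda t: t % 10 == 0, p=2, w=4)
```

Each output stream receives the same set of tuples as a single fsplit would produce; only the work of evaluating rfn and bfn is spread over p splitters.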
Tree shaped window routing: parasplit*
[Figure: a tree of window routers (PR) feeds the parallel fsplit processes.]
3. Scale-up of stream splitting strategies
Linear Road Benchmark (LRB): www.cs.brandeis.edu/~linearroad
Hardware
• Linux cluster
• Up to 70 nodes
• Each node has 2x quad-core Intel® Xeon® E5430 @ 2.66 GHz, 6 MB L2 cache
• TCP/IP over GbE
Performance number L: number of expressways (xways) the DSMS can handle
name             | org                  | year | L   | cores | comment
Aurora           | Brandeis, Brown, MIT | 2004 | 2.5 | 1     |
Commercial sys A |                      | 2004 | 0.5 | 1     |
SPC              | IBM                  | 2006 | 2.5 | 170   | 3GHz Xeon
XQuery           | ETHZ                 | 2007 | 1.5 | 1     |
DataCell         | CWI                  | 2009 | 1   | 4     | 1.4s avg RT
stream schema    | ETHZ                 | 2010 | 5   | 4     |
SCSQ maxtree     | UU                   | 2010 | 64  | 48    | D disabled (later verified in mySQL)
SCSQ parasplit   | UU                   | 2011 | 512 | 560   | D disabled
[Figure: plot over CQ parallelism q (0 to 500) comparing parasplit*, parasplit, maxtree, and fsplit against the 1 Gbps wire speed; y axis from 0 to 1000.]
Window router stream rate
[Figure: the window router (PR) sends windows of size W to p parallel fsplit processes, which feed the CQs.]
W – physical window size
p – number of parallel fsplit
Impact of window size W in window router
• network bound for large enough windows
[Figure: plot over window size W [kB] (0 to 15) for p=4 and p=64; y axis from 0 to 1000.]
Impact of window size W in window router when scaling p
[Figure: plot over window size W [kB] (0 to 15) for p=4, p=64, p=128, p=256, and p=512; y axis from 0 to 1000.]
Parasplit*
Tree shaped window router
[Figure: plot over fsplit parallelism p (0 to 500) at W = 16 kB comparing the window router tree (parasplit*) with the single window router (parasplit); y axis from 0 to 1000.]
4. Cost model and heuristic
Eliminate p
parasplit(stream s, integer q, function rfn, function bfn) -> vector of stream sv
Given
• Input stream rate $\Phi_D$
• Parallelism of continuous query q
Automatically determine
• fsplit parallelism p
[Figure: the window router (PR) feeds p parallel fsplit processes, which feed q CQs.]
[Figure: an fsplit process performs consume, split, and emit($R_1$) ... emit($R_q$), producing the output streams $R_1$ ... $R_q$.]
$C_{fsplit} = cr + cs \cdot (o + r + q \cdot b) + ce \cdot (r + q \cdot b)$
cr – read cost per tpl (read + de-marshal)
cs – split cost per tpl (execute rfn and bfn)
ce – emit cost per tpl (marshal + print)
o – omit %
r – routing %
b – broadcast %
(o, r, and b according to rfn and bfn)
q – number of output streams
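A small Python sketch shows how the per-tuple fsplit cost grows with q. It assumes the cost terms on the slide combine as C_fsplit = cr + cs·(o + r + q·b) + ce·(r + q·b); the numeric cost values are made up for illustration and are not measurements.

```python
def c_fsplit(cr: float, cs: float, ce: float,
             o: float, r: float, b: float, q: int) -> float:
    """Per-tuple cost of fsplit under the assumed form of the cost model.

    cr, cs, ce: read, split, and emit cost per tuple.
    o, r, b: fractions of omitted, routed, and broadcast tuples.
    q: number of output streams.
    """
    return cr + cs * (o + r + q * b) + ce * (r + q * b)


# Hypothetical costs in microseconds per tuple, with 1% broadcast tuples.
for q in (1, 100, 500):
    print(q, c_fsplit(cr=1.0, cs=2.0, ce=3.0, o=0.0, r=0.99, b=0.01, q=q))
```

Even a small broadcast fraction makes the cost grow linearly in q, which is why a single fsplit becomes a bottleneck at high CQ parallelism.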
[Figure: a CQ process performs consume($S_1$) ... consume($S_p$), merge, compute, split, and emit($R_1$) ... emit($R_w$); input streams $S_1$ ... $S_p$, output streams $R_1$ ... $R_w$.]
$C_{CQ} = cr + p \cdot cp + cm + O$
cr – read cost per tpl (read + de-marshal)
cp – poll cost per tpl
cm – merge cost per tpl
O – cost of executing the CQ and emitting its result
[Figure: parasplit dataflow annotated with the per-tuple costs $C_{PR}$ (window router), $C_{fsplit}$ (window splitter), and $C_{CQ}$ (stream merge and CQ), expressed in cr, cs, ce, cp, cm, O, W, p, q, o, r, and b.]
p can be eliminated using the cost model, but this requires extensive profiling everywhere.
Assume
• 1% broadcast tuples (configurable)
• 0% omitted tuples (configurable)
$C_{fsplit} = cr/W + cs \cdot (o + r + q \cdot b) + ce \cdot (r + q \cdot b) \approx (cs + ce) \cdot (0.99 + 0.01 \cdot q)$
Measure $\Phi_{fsplit}^{(1)}$, the stream rate of fsplit with rfn and bfn for q = 1: $cs + ce = 1/\Phi_{fsplit}^{(1)}$
Estimate p by
$p = \frac{\Phi_D}{\Phi_{fsplit}^{(1)}} \cdot (0.99 + 0.01 \cdot q)$
[Figure: the window router (PR) feeds p parallel fsplit processes, which feed q CQs.]
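The heuristic can be written as a short Python function. It is a sketch under stated assumptions: the estimate is taken as p = (Φ_D / Φ_fsplit^(1)) · ((1 − b) + b·q) with no omitted tuples, rounding up to a whole number of processes is an addition not stated on the slide, and the stream rates in the example are hypothetical.

```python
import math


def estimate_p(phi_d: float, phi_fsplit_1: float, q: int,
               broadcast: float = 0.01) -> int:
    """Estimate the fsplit parallelism p.

    phi_d: input stream rate (tuples per second).
    phi_fsplit_1: measured stream rate of one fsplit with rfn and bfn at q = 1.
    q: parallelism of the continuous query.
    broadcast: assumed fraction of broadcast tuples (0% omitted tuples assumed).
    """
    factor = (1.0 - broadcast) + broadcast * q
    # Assumption: round up, and never use fewer than one fsplit process.
    return max(1, math.ceil(phi_d / phi_fsplit_1 * factor))


# Hypothetical rates: the input is twice as fast as one fsplit at q = 1.
p_100 = estimate_p(phi_d=100_000, phi_fsplit_1=50_000, q=100)
p_500 = estimate_p(phi_d=100_000, phi_fsplit_1=50_000, q=500)
```

Only one measurement, the rate of a single fsplit at q = 1, is needed, in contrast to the full cost model that requires profiling every process.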
p according to heuristics vs. p using exact cost model
[Figure: plot over CQ parallelism q (0 to 500); legend: parasplit, cost model, too high p (p=q), too low p (p=1); y axis from 0 to 1000.]
5. Energy efficiency
Energy efficiency η
How much extra energy does parasplit consume in comparison to fsplit?
[Figure: parasplit dataflow with the window router (PR), p parallel fsplit processes, and q CQs.]
Conservatively assume energy consumption proportional to CPU usage:
Useful work
• $p \cdot C_{fsplit}$
Overhead
• $C_{PR}$
• $q \cdot C_{CQ}$ (with O = 0)
$\eta = \frac{p \cdot C_{fsplit}}{C_{PR} + p \cdot C_{fsplit} + q \cdot C_{CQ}(O = 0)}$
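The ratio of useful work to total work can be computed as in the following Python sketch. It assumes η = p·C_fsplit / (C_PR + p·C_fsplit + q·C_CQ) with the CQ cost taken at O = 0; all cost values in the example are hypothetical.

```python
def energy_efficiency(p: int, q: int, c_pr: float,
                      c_fsplit: float, c_cq: float) -> float:
    """Fraction of CPU work spent on useful stream splitting.

    p: number of parallel fsplit processes.
    q: number of parallel CQs.
    c_pr: cost of the window router.
    c_fsplit: cost of one fsplit process.
    c_cq: cost of one CQ process with O = 0 (read, poll, and merge only).
    """
    useful = p * c_fsplit
    overhead = c_pr + q * c_cq
    return useful / (useful + overhead)


# Hypothetical costs: 4 splitters serving 100 CQs.
eta = energy_efficiency(p=4, q=100, c_pr=1.0, c_fsplit=10.0, c_cq=0.5)
print(f"{eta:.1%}")
```

Because the merge cost of each CQ grows with p, choosing p larger than needed lowers η; choosing it too small limits the stream rate.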
[Figure: plot in percent (0% to 100%) over CQ parallelism q (0 to 500); legend: parasplit*, parasplit, cost model, too high p (p=q), too low p (p=1).]
Nobody else has investigated strategies for scalable customizable stream splitting
IBM SPADE/System S [Andrade et al 2009]
• Splitstream operator with broadcast capabilities
• Streaming throughput degrades when scaling q
Event based systems [Brenna et al 2009]
• Custom stream splitting shown to be a bottleneck
Gigascope [Johnson et al 2008]
• Assumes specialized stream splitting hardware
• No customizable stream splitting
GSDM [Ivanova, Risch 2005]
• Parallel execution of expensive UDFs
• More limited parallelization
Streaming MapReduce [Condie et al 2010]
• Does not handle scalable stream splitting
[Balkesen, Tatbul 2011]
• Distributing entire windows over CQs
• q ≤ 4
Naïve stream splitting is prohibitive for scale-out of CQs
Parasplit
• eliminates the bottleneck of stream splitting, providing network bound stream rates
Parasplit*
• provides network bound stream rates for highly scaled-out stream splitting
Future work
• Push selection predicates from CQ to rfn of splitstream
• Improve energy efficiency
• High availability
SCSQ home page
• http://www.it.uu.se/research/group/udbl/SCSQ.html
Window router tree
Cost model
LRB
• Parallelization of LRB
Additional related work
[Figure: single process window router, p = 64.]
[Figure: tree shaped window router, p = 64.]
Parasplit + tree shaped window router = parasplit*
Given
• Input stream rate $\Phi_D$
• Parallelism of continuous query q
Determine fsplit parallelism p
• If the max stream rate of fsplit is $\Phi_{fsplit}$, choose p such that $p \cdot \Phi_{fsplit} \geq \Phi_D$
• $C_{CQ} = cr + p \cdot cp + cm + O$ increases with p
Must choose p carefully
[Figure: the window router (PR) receives the input stream at rate $\Phi_D$ and feeds p parallel fsplit processes, which feed q CQs.]
Linear Road Benchmark (LRB)
Simulates vehicles travelling (and colliding)
• on a number of expressways
• using variable tolling
• based on traffic conditions and accident proximity
Input: One stream of position reports and historical queries (account balance, daily tolls)
Continuous queries: Toll notifications, accident notifications
Output: Four result streams of responses to historical and continuous queries:
0. toll alerts
1. accident alerts
2. account balance responses
3. daily expenditure responses
L-rating: Number of expressways (xways) processed within response time (RT) constraints
Parallelization of LRB using fsplit
[Figure: fsplit processes feed parallel CQs (scale up q); unions and a groupby produce the result streams: toll alerts, accident alerts, and account balance answers.]
Daily expenditure queries (D) are excluded here.
Daily expenditure data is managed by a regular DBMS.
Medusa [Balazinska et al 2004]
• Parallel DSMS
• Dynamic migration of operators between nodes
• Without scale-out, heavy operators are bottlenecks
Dryad [Isard et al 2007]
• User defined process graphs in QL (edges + vertices)
• SCSQ automatically generates such graphs from splitstream
SCOPE [Chaiken et al 2008], Map-reduce-merge [Yang et al 2007]
• All these are batch systems, not DSMSs
Distributed DBMS
• For streams, rfn and bfn are analogous to fragmentation and replication conditions in DDBMS
• DDBMS do not scale out fragmentation and replication, while splitstream parallelizes rfn and bfn.