
Massive Scale-out of Expensive Continuous Queries

Erik Zeitler and Tore Risch

Uppsala Database Laboratory

Uppsala University

Outline

1. Introduction
2. Stream splitting strategies for scale-out
3. Evaluating stream splitting strategies
4. Cost model and heuristic
5. Energy efficiency
6. Related work
7. Conclusions and future work

31 Aug 2011 Erik Zeitler and Tore Risch 2

[Figure: DSMS architecture with a user or programmer, input data streams, stream data access software, query processing software, metadata, stored data, and a query result data stream]

Research Questions

How to ensure scalable execution of continuous queries (CQs)
  • with growing input stream rate?
  • with high CQ execution cost?

By scale-out.

CQs are scaled out by splitting the input stream.
  • Applications require customizable input stream splitting, called splitstream.
  • Both tuple routing and broadcasting are allowed.

How to split massive streams over massively parallel CQs?
  • By parallelizing the splitstream CQ itself

[Figure: an input stream split over many parallel CQs whose results are merged]


Defining stream splitting

splitstream(stream s, integer q, function rfn, function bfn) → vector of stream sv

[Figure: splitstream splits the input stream s into the vector of streams sv]

User defines rfn and bfn:

rfn(object tpl, integer q) → integer
  e.g. rfnLRB(event e, integer q) → integer as select expressway(e) where eventtype(e) = 0;

bfn(object tpl) → boolean
  e.g. bfnLRB(event e) → boolean as select eventtype(e) = 2;

For streams, rfn and bfn are analogous to the fragmentation and replication conditions of a distributed DBMS (DDBMS).

Unlike in a DDBMS, the execution of rfn and bfn is parallelized.
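The routing semantics of rfn and bfn can be illustrated with a minimal single-process sketch in Python. It rests on assumptions not stated on the slide: bfn is evaluated first and a broadcast tuple is copied to all q outputs; otherwise rfn picks exactly one output; a tuple for which rfn returns nothing is omitted. The Event class, the functions rfn_lrb, bfn_lrb and flat_split, and the "% q" in rfn_lrb are hypothetical illustrations, not SCSQ code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    """Hypothetical LRB-style event with only the fields used below."""
    eventtype: int  # 0 = position report, 2 = account balance query, ...
    xway: int       # expressway number


def rfn_lrb(e: Event, q: int) -> Optional[int]:
    # Route position reports by expressway. The "% q" is an assumption that
    # keeps the index in 0..q-1; other event types get no route (None).
    return e.xway % q if e.eventtype == 0 else None


def bfn_lrb(e: Event) -> bool:
    # Broadcast account balance queries to every output stream.
    return e.eventtype == 2


def flat_split(stream: List[Event], q: int,
               rfn: Callable[[Event, int], Optional[int]],
               bfn: Callable[[Event], bool]) -> List[List[Event]]:
    """Sequential splitstream sketch: returns q output streams as lists."""
    outputs: List[List[Event]] = [[] for _ in range(q)]
    for tpl in stream:
        if bfn(tpl):                  # broadcast: copy to all q outputs
            for out in outputs:
                out.append(tpl)
        else:
            i = rfn(tpl, q)
            if i is not None:         # route: exactly one output
                outputs[i].append(tpl)
            # else: the tuple is omitted
    return outputs
```

With q = 4, a position report on expressway 3 lands only in output 3, an account balance query is copied to all four outputs, and an event of type 3 is omitted because neither function claims it.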

Naïve (flat) splitstream implementation: fsplit

fsplit(stream s, integer q, function rfn, function bfn) → vector of stream sv

[Figure: a single fsplit process feeding all q CQs]

Expensive stream splitting computations in a single process: bottleneck!

Tree shaped splitstream implementation: maxtree

maxtree(stream s, integer q, function rfn, function bfn) → vector of stream sv

[Figure: a tree of fsplit processes feeding the CQs]

  • Bottleneck is alleviated [Zeitler and Risch, DASFAA 2010]
  • but still problematic

Scaled-out splitstream: parasplit

parasplit(stream s, integer q, function rfn, function bfn) → vector of stream sv

[Figure: window router PR, parallel fsplit processes (window splitters), and stream merge in front of each CQ]

The window router distributes entire windows.

Parasplit: route – //fsplit – //(merge – CQ)

[Figure: the window router PR distributes entire windows to parallel fsplit window splitters; each CQ merges the streams it receives from all fsplit processes (stream merge)]
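A sequential Python sketch of the route – //fsplit – //(merge – CQ) pipeline may make the data flow concrete. It assumes windows of a fixed number of tuples (a stand-in for physical windows), round-robin window routing, and tuples shaped as (timestamp, key) with unique, increasing timestamps; split_window and parasplit_sim are hypothetical names, and the p splitters run one after another rather than in parallel. Under these assumptions each CQ's merged input equals what a single flat splitter would have sent it.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Tpl = Tuple[int, int]  # hypothetical tuple layout: (timestamp, key)
Rfn = Callable[[Tpl, int], Optional[int]]
Bfn = Callable[[Tpl], bool]


def split_window(window: List[Tpl], q: int, rfn: Rfn, bfn: Bfn,
                 outs: List[List[Tpl]]) -> None:
    """fsplit logic for one window: broadcast, route, or omit each tuple."""
    for tpl in window:
        if bfn(tpl):
            for out in outs:
                out.append(tpl)
        else:
            i = rfn(tpl, q)
            if i is not None:
                outs[i].append(tpl)


def parasplit_sim(stream: List[Tpl], q: int, p: int, rfn: Rfn, bfn: Bfn,
                  window_len: int = 4) -> List[List[Tpl]]:
    # Window router (PR): cut the stream into windows and hand out whole
    # windows round-robin, so splitter j gets windows j, j+p, j+2p, ...
    wins = [stream[i:i + window_len] for i in range(0, len(stream), window_len)]
    assigned = [wins[j::p] for j in range(p)]
    # //fsplit: each of the p splitters produces its own q output streams.
    split_out = []
    for j in range(p):
        outs: List[List[Tpl]] = [[] for _ in range(q)]
        for w in assigned[j]:
            split_window(w, q, rfn, bfn, outs)
        split_out.append(outs)
    # //(merge - CQ): CQ i merges its p timestamp-ordered inputs by timestamp.
    return [list(heapq.merge(*(split_out[j][i] for j in range(p)),
                             key=lambda t: t[0]))
            for i in range(q)]
```

Because every splitter emits its tuples in timestamp order, a timestamp-ordered merge in front of each CQ is enough to restore the order a single fsplit would have produced.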

Tree shaped window routing: parasplit*

[Figure: a root window router PR feeding three window routers PR, each feeding three fsplit processes]
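The tree shape can be sketched as nested round-robin routing. Assuming each window router forwards whole windows round-robin to its fanout children and the tree has depth levels (fanout = 3 and depth = 2 match the slide's nine fsplit leaves), the hypothetical Python function below computes which leaf fsplit receives each window. In such a tree each router maintains only fanout outgoing streams instead of p.

```python
from typing import List


def tree_route(num_windows: int, fanout: int, depth: int) -> List[List[int]]:
    """Return, for each of fanout**depth leaf fsplits, the ids of its windows.

    Every router forwards whole windows round-robin to its children.
    """
    assigned: List[List[int]] = [[] for _ in range(fanout ** depth)]
    for k in range(num_windows):
        node = 0   # index of the current router within its tree level
        seq = k    # how many windows this router has seen before window k
        for _ in range(depth):
            child = seq % fanout             # round-robin choice at this router
            node = node * fanout + child     # index of the child one level down
            seq //= fanout                   # arrival count at that child
        assigned[node].append(k)
    return assigned
```

With fanout 3 and depth 2, windows 0, 1 and 2 go to leaves 0, 3 and 6, so consecutive windows are spread over different subtrees and every leaf ends up with the same share.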


Experimental set-up

Benchmark: Linear Road, www.cs.brandeis.edu/~linearroad

Hardware: Linux cluster
  • up to 70 nodes
  • each node has 2× quad-core Intel® Xeon® E5430 @ 2.66 GHz, 6 MB L2 cache
  • TCP/IP over GbE

Performance number L: number of expressways (xways) the DSMS can handle

LRB result

Performance number L: number of xways the DSMS can handle

Name               Org                   Year  L    Cores  Comment
Aurora             Brandeis, Brown, MIT  2004  2.5  1
Commercial sys A                         2004  0.5  1
SPC                IBM                   2006  2.5  170    3GHz Xeon
Xquery             ETHZ                  2007  1.5  1
DataCell           CWI                   2009  1    4      1.4s avg RT
stream schema      ETHZ                  2010  5    4
SCSQ maxtree       UU                    2010  64   48     D disabled (later verified in mySQL)
SCSQ parasplit     UU                    2011  512  560    D disabled

[Figure: splitstream stream rate vs. q (0 to 500) for parasplit*, parasplit, maxtree, and fsplit; the 1 Gbps wire speed is marked]

Window router stream rate

[Figure: window router PR sending windows of size W to p parallel fsplit processes, which feed the CQs]

W: physical window size
p: number of parallel fsplit processes
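W is a physical window size, i.e. measured in bytes rather than in tuples. A minimal Python sketch of how a window router might cut a stream of already marshalled tuples into such windows, assuming a window is closed as soon as its accumulated size reaches W bytes (the function name physical_windows is hypothetical):

```python
from typing import Iterable, Iterator, List


def physical_windows(tuples: Iterable[bytes], w_bytes: int) -> Iterator[List[bytes]]:
    """Group marshalled tuples into windows of about w_bytes each.

    A window is closed as soon as its accumulated size reaches w_bytes.
    """
    window: List[bytes] = []
    size = 0
    for t in tuples:
        window.append(t)
        size += len(t)
        if size >= w_bytes:          # window full: ship it as one message
            yield window
            window, size = [], 0
    if window:                       # flush the last, partially filled window
        yield window
```

A larger W means fewer and larger messages per tuple, which is consistent with the window router becoming network bound for large enough windows.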

Impact of window size W in the window router

Network bound for large enough windows

[Figure: window router stream rate vs. W (0 to 15 kB) for p = 4 and p = 64]

Impact of window size W in the window router when scaling p

[Figure: window router stream rate vs. W (0 to 15 kB) for p = 4, 64, 128, 256, and 512]

Parasplit*

Tree shaped window router

[Figure: stream rate vs. p (0 to 500) at W = 16 kB, comparing a window router tree (parasplit*) with a single window router (parasplit)]


Eliminate p

parasplit(stream s, integer q, function rfn, function bfn) → vector of stream sv

[Figure: window router PR feeding p fsplit processes, which feed q CQs]

Given
  • input stream rate Φ_D
  • parallelism q of the continuous query

Automatically determine
  • fsplit parallelism p

Cost model for fsplit

[Figure: fsplit consumes input tuples, splits them, and emits them to the output streams R_1 … R_q]

$$C_{fsplit} = c_r + c_s\,(o + r + q\,b) + c_e\,(r + q\,b)$$

c_r: read cost per tuple (read + de-marshal)
c_s: split cost per tuple (execute rfn and bfn)
c_e: emit cost per tuple (marshal + print)
o: omitted tuples (%)
r: tuples routed according to rfn and bfn (%)
b: broadcast tuples (%)
q: number of output streams
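To make the formula concrete, the Python function below evaluates C_fsplit. The unit costs in the example are made-up numbers, not measurements, and the check that o, r and b sum to 100% reflects that every tuple is either omitted, routed, or broadcast.

```python
def c_fsplit(c_r: float, c_s: float, c_e: float,
             o: float, r: float, b: float, q: int) -> float:
    """Per-tuple cost of fsplit: C = c_r + c_s(o + r + q b) + c_e(r + q b)."""
    if abs(o + r + b - 1.0) > 1e-9:
        raise ValueError("o, r and b are fractions of all tuples and must sum to 1")
    return c_r + c_s * (o + r + q * b) + c_e * (r + q * b)


# Hypothetical unit costs (e.g. microseconds per tuple), 1% broadcast tuples.
for q in (1, 100, 500):
    cost = c_fsplit(c_r=1.0, c_s=2.0, c_e=3.0, o=0.0, r=0.99, b=0.01, q=q)
    print(f"q={q:3d}  C_fsplit={cost:.2f}  max rate={1 / cost:.3f} tuples per unit")
```

Even with only 1% broadcast tuples, the q·b terms grow to dominate the cost once q reaches the hundreds.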

Cost model for merge in CQ

[Figure: a CQ consumes its p input streams S_1 … S_p, merges them, computes, splits, and emits its results to R_1 … R_w]

$$C_{CQ} = c_r + p\,c_p + c_m + O$$

c_r: read cost per tuple (read + de-marshal)
c_p: poll cost per tuple
c_m: merge cost per tuple
O: cost of executing the CQ and emitting its result
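The merge formula implies that the per-tuple cost of each CQ grows linearly with the number p of fsplit streams it polls, so p cannot simply be made as large as possible. A small Python illustration with hypothetical unit costs (exec_cost stands for O on the slide):

```python
def c_cq(c_r: float, c_p: float, c_m: float, exec_cost: float, p: int) -> float:
    """Per-tuple cost of a CQ merging p input streams: c_r + p*c_p + c_m + O."""
    return c_r + p * c_p + c_m + exec_cost


# Hypothetical costs: only the polling term p*c_p grows with p.
for p in (1, 16, 64):
    print(p, c_cq(c_r=1.0, c_p=0.05, c_m=0.5, exec_cost=10.0, p=p))
```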

Cost model for parasplit

[Figure: window router PR feeding p fsplit processes, which feed q CQs]

$$C_{PR} = c_r^{W} + c_s^{W} + c_e^{W}$$

$$C_{fsplit} = c_r^{W} + c_s\,(o + r + q\,b) + c_e\,(r + q\,b)$$

$$C_{CQ} = c_r + p\,c_p + c_m + O$$

p can be eliminated using the cost model, but this requires extensive profiling everywhere.

Heuristic for estimating p

Assume
  • 1% broadcast tuples (configurable)
  • 0% omitted tuples (configurable)

Then

$$C_{fsplit} = c_r^{W} + c_s\,(o + r + q\,b) + c_e\,(r + q\,b) \approx (c_s + c_e)(0.99 + 0.01\,q)$$

[Figure: window router PR feeding p fsplit processes, which feed the CQs]

Measure Φ_fsplit(1), the stream rate of fsplit executing rfn and bfn with q = 1:

$$c_s + c_e = \frac{1}{\Phi_{fsplit}(1)}$$

Estimate p by

$$p \approx \frac{\Phi_D}{\Phi_{fsplit}(1)}\,(0.99 + 0.01\,q)$$
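The heuristic reduces to one measurement and one formula. A Python sketch, assuming (as the slide does) that no tuples are omitted, that the read term is negligible, and that the broadcast share b defaults to 1%; rounding p up to an integer and clamping it to at least 1 are additions of this sketch, and estimate_p is a hypothetical name.

```python
import math


def estimate_p(phi_d: float, phi_fsplit_1: float, q: int, b: float = 0.01) -> int:
    """Estimate the number of parallel fsplit processes p.

    phi_d        -- input stream rate to sustain (same units as phi_fsplit_1)
    phi_fsplit_1 -- measured rate of one fsplit with rfn, bfn and q = 1
    q            -- number of CQ instances (output streams)
    b            -- broadcast share; omitted share assumed 0, so r = 1 - b
    """
    cost_factor = (1.0 - b) + q * b   # (r + q*b); equals 1 when q = 1
    return max(1, math.ceil(phi_d / phi_fsplit_1 * cost_factor))
```

For example, if one fsplit sustains 100,000 tuples per second with q = 1, an input rate of 1,000,000 tuples per second and q = 100 give 10 · 1.99 = 19.9, so p = 20.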

[Figure: stream rate vs. CQ parallelism q (0 to 500) with p chosen by the heuristic vs. by the exact parasplit cost model, compared with too high p (p = q) and too low p (p = 1)]


Estimating energy efficiency, η

How much extra energy does parasplit consume in comparison to fsplit?

[Figure: window router PR feeding p fsplit processes, which feed q CQs]

Conservatively assume that energy consumption is proportional to CPU usage:

Useful work
  • p · C_fsplit

Overhead
  • C_PR
  • q · C_CQ (with O = 0)

$$\eta = \frac{p\,C_{fsplit}}{C_{PR} + p\,C_{fsplit} + q\,C_{CQ}\big|_{O=0}}$$
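Under the same proportional-to-CPU assumption, η is a simple ratio of useful work to total work. The Python sketch below only restates the formula for hypothetical per-tuple costs; it is not a measurement, and c_cq_merge stands for C_CQ with O = 0.

```python
def energy_efficiency(c_pr: float, c_fsplit: float, c_cq_merge: float,
                      p: int, q: int) -> float:
    """eta = p*C_fsplit / (C_PR + p*C_fsplit + q*C_CQ), with C_CQ taken at O = 0.

    c_cq_merge is C_CQ without the query work O, i.e. c_r + p*c_p + c_m.
    """
    useful = p * c_fsplit
    overhead = c_pr + q * c_cq_merge
    return useful / (useful + overhead)


# Hypothetical costs: more CQs (larger q) add merge overhead and lower eta.
for q in (10, 100, 500):
    print(q, round(energy_efficiency(c_pr=1.0, c_fsplit=10.0, c_cq_merge=2.0, p=4, q=q), 3))
```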

Measuring energy efficiency

[Figure: energy efficiency (0% to 100%) vs. CQ parallelism q (0 to 500) for parasplit*, parasplit, and the cost model, compared with too high p (p = q) and too low p (p = 1)]

Related work

Nobody else has investigated strategies for scalable, customizable stream splitting.

IBM SPADE/System S [Andrade et al 2009]
  • Splitstream operator with broadcast capabilities
  • Streaming throughput degrades when scaling q

Event-based systems [Brenna et al 2009]
  • Custom stream splitting shown to be a bottleneck

Gigascope [Johnson et al 2008]
  • Assumes specialized stream splitting hardware
  • No customizable stream splitting

GSDM [Ivanova, Risch 2005]
  • Parallel execution of expensive UDFs
  • More limited parallelization

Streaming MapReduce [Condie et al 2010]
  • Does not handle scalable stream splitting

[Balkesen, Tatbul 2011]
  • Distributing entire windows over CQs
  • q ≤ 4

Conclusions and future work

Naïve stream splitting is prohibitive for the scale-out of CQs.

Parasplit
  • eliminates the stream splitting bottleneck, providing network-bound stream rates

Parasplit*
  • provides network-bound stream rates for highly scaled-out stream splitting

Push selection predicates from the CQ to the rfn of splitstream

Improve energy efficiency

High availability

SCSQ home page
  • http://www.it.uu.se/research/group/udbl/SCSQ.html


Extra material

Window router tree

Cost model

LRB
  • Parallelization of LRB

Additional related work

Single process window router, p = 64

Tree shaped window router, p = 64

Parasplit + tree shaped window router = parasplit*

Heuristics for estimating p

Given
  • input stream rate Φ_D
  • parallelism q of the continuous query

Determine fsplit parallelism p
  • If the maximum stream rate of fsplit is Φ_fsplit, choose p such that p · Φ_fsplit ≥ Φ_D
  • C_CQ = c_r + p · c_p + c_m + O increases with p

Must choose p carefully

[Figure: input stream at rate Φ_D, window router PR, p fsplit processes, q CQs]

Linear Road Benchmark

Simulates vehicles travelling (and colliding)
  • on a number of expressways
  • using variable tolling based on traffic conditions and accident proximity

Input: one stream of position reports and historical queries (account balance, daily tolls)

Continuous queries: toll notifications, accident notifications

Output: four result streams of responses to historical and continuous queries:
  0. toll alerts
  1. accident alerts
  2. account balance responses
  3. daily expenditure responses

L-rating: number of expressways (xways) processed within the response-time (RT) constraints

Parallelization of LRB using fsplit

[Figure: fsplit processes split the input over the LRB CQs, scaled up with q; their outputs are combined by union and groupby into toll alerts, accident alerts, and account balance answers]

Daily expenditure queries (D) are excluded here; daily expenditure data is managed by a regular DBMS.


Other related work

Medusa [Balazinska et al 2004]
  • Parallel DSMS
  • Dynamic migration of operators between nodes
  • Without scale-out, heavy operators are bottlenecks

Dryad [Isard et al 2007]
  • User-defined process graphs in QL (edges + vertices)
  • SCSQ automatically generates such graphs from splitstream

SCOPE [Chaiken et al 2008], Map-reduce-merge [Yang et al 2007]
  • All these are batch systems, not DSMSs

Distributed DBMS
  • For streams, rfn and bfn are analogous to the fragmentation and replication conditions of a DDBMS
  • A DDBMS does not scale out fragmentation and replication, whereas splitstream parallelizes rfn and bfn
