Introduction to epiC E3 engine

epiC: an Extensible and Scalable System for Processing Big Data
Why do we need another MapReduce-like system?

MapReduce:
The M/R framework cannot handle iterative processing efficiently.
Everything needs to be transformed into map and reduce functions.

Pregel/GraphLab/Dryad:
DAG-based data flow.
Users must design how the graph is constructed and how different operators are linked.

Can we combine the advantages of both types of systems?
Overview of epiC




Units work independently.
Units communicate via "email".
The master works as a mail server that forwards the messages.
epiC is based on the actor model.
[Figure: epiC architecture. A master network of several masters (message service, schedule service, naming service) exchanges control messages with the units. Each unit (here "Unit: PageRank") consists of a message queue, epiC code, and an I/O library. All units sit on top of a distributed storage system (DFS, key-value store, distributed database, ...).]
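
The "mail server" idea above can be sketched in a few lines of Python. This is a minimal illustration only, not epiC's actual API: the class names `Master` and `Unit`, the message layout, and the `ping`/`pong` exchange are all assumptions made for the example.

```python
from collections import deque


class Master:
    """Hypothetical stand-in for the master network: it only routes mail."""

    def __init__(self):
        self.units = {}  # naming service: unit name -> unit object

    def register(self, unit):
        self.units[unit.name] = unit

    def send(self, recipient, message):
        # Message service: forward the message to the recipient's queue.
        self.units[recipient].queue.append(message)

    def run(self):
        # Schedule service: keep activating units that have pending mail.
        while any(u.queue for u in self.units.values()):
            for unit in list(self.units.values()):
                while unit.queue:
                    unit.handle(unit.queue.popleft(), self)


class Unit:
    """A unit works independently and talks to others only via messages."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.log = []

    def handle(self, message, master):
        self.log.append(message)
        # Reply through the master; units never call each other directly.
        if message["body"] == "ping":
            master.send(message["sender"], {"sender": self.name, "body": "pong"})


master = Master()
a, b = Unit("a"), Unit("b")
master.register(a)
master.register(b)
master.send("b", {"sender": "a", "body": "ping"})
master.run()
print(a.log)  # the reply that unit "b" sent back via the master
```

The only coupling between units is the message format, which is the point of the actor-style design.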
Compare epiC to MapReduce and Pregel


Using PageRank as an example.
MapReduce:
Multiple iterations are required.
The second job loads the output of the first job to continue the processing.
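[Figure: PageRank on MapReduce. The graph data and the score vector are the input. In each iteration, map divides the score to the neighbors and reduce computes the new score of each vertex. The map/reduce pair repeats for every iteration and finally outputs the score vector.]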
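
The iteration pattern above can be illustrated with a small in-memory sketch. It is not Hadoop code: `map_phase`, `reduce_phase`, and the damping factor of 0.85 are assumptions (the slides do not state the scoring formula), and every vertex is assumed to have at least one outgoing edge.

```python
from collections import defaultdict


def map_phase(graph, scores):
    """Map: divide each vertex's score among its neighbors."""
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            yield neighbor, scores[vertex] / len(neighbors)


def reduce_phase(pairs, vertices, damping=0.85):
    """Reduce: sum the contributions and compute each vertex's new score."""
    sums = defaultdict(float)
    for vertex, contribution in pairs:
        sums[vertex] += contribution
    n = len(vertices)
    return {v: (1 - damping) / n + damping * sums[v] for v in vertices}


def pagerank_mapreduce(graph, iterations=20):
    scores = {v: 1.0 / len(graph) for v in graph}
    for _ in range(iterations):
        # One pass = one job; its output is the input of the next job.
        scores = reduce_phase(map_phase(graph, scores), list(graph))
    return scores


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank_mapreduce(graph)
print(scores)
```

In a real deployment each pass writes its score vector to the DFS and the next job reads it back, which is the per-iteration overhead the slide refers to.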

Pregel:
In each super-step, each vertex computes its new PageRank value and broadcasts the value to its neighbors.
[Figure: one Pregel super-step, repeated: receive scores from neighbors, compute new score, broadcast scores to neighbors.]
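
A vertex-centric version of the same computation might look as follows. This is a single-process sketch of the super-step idea, not Pregel's API; the function name `pregel_pagerank`, the inbox/outbox dictionaries, and the damping factor of 0.85 are assumptions.

```python
def pregel_pagerank(graph, supersteps=20, damping=0.85):
    n = len(graph)
    scores = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    # Super-step 0 only broadcasts the initial scores.
    for step in range(supersteps + 1):
        outbox = {v: [] for v in graph}
        for vertex, neighbors in graph.items():
            if step > 0:
                # Receive scores from neighbors and compute the new score.
                scores[vertex] = (1 - damping) / n + damping * sum(inbox[vertex])
            if step < supersteps:
                # Broadcast this vertex's share to each of its neighbors.
                for neighbor in neighbors:
                    outbox[neighbor].append(scores[vertex] / len(neighbors))
        # Messages sent in this super-step are delivered in the next one.
        inbox = outbox
    return scores


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pregel_pagerank(graph)
print(scores)
```

Unlike the MapReduce version, state stays with the vertices between super-steps; only the messages move.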

epiC:
0. Messages are sent to the unit to activate it.
1. The unit loads a partition of the graph data and the score vector based on the received message.
2. The unit computes the new score vector of the vertices.
3. The unit generates new score vector files.
4. The unit sends messages to the master network.
[Figure: a PageRank unit (message queue, epiC code, I/O library) placed between the master network and the storage system, annotated with steps 0 to 4.]
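
Steps 0 to 4 can be walked through with a toy unit. Everything named here is hypothetical: `PageRankUnit`, the message fields, the dictionary used as the storage system, and the list used as the master network's inbox. As a simplification, the unit reads the whole graph and owns only a subset of the vertices, and the damping factor of 0.85 is an assumption.

```python
DAMPING = 0.85  # assumed damping factor; the slides do not give the formula


class PageRankUnit:
    def __init__(self, storage, master_inbox):
        self.storage = storage            # stand-in for the storage system
        self.master_inbox = master_inbox  # stand-in for the master network

    def on_message(self, msg):
        # Step 1: load the graph data and score vector named in the message.
        graph = self.storage[msg["graph"]]
        scores = self.storage[msg["scores"]]
        # Step 2: compute new scores for the vertices of this partition.
        n = len(graph)
        new_scores = {}
        for v in msg["vertices"]:
            incoming = sum(scores[u] / len(out)
                           for u, out in graph.items() if v in out)
            new_scores[v] = (1 - DAMPING) / n + DAMPING * incoming
        # Step 3: generate the new score vector "file".
        self.storage[msg["output"]] = new_scores
        # Step 4: report the output location to the master network.
        self.master_inbox.append({"done": msg["output"]})


storage = {
    "graph": {"a": ["b", "c"], "b": ["c"], "c": ["a"]},
    "scores.0": {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3},
}
master_inbox = []
unit = PageRankUnit(storage, master_inbox)
# Step 0: messages activate the unit, one message per partition.
unit.on_message({"graph": "graph", "scores": "scores.0",
                 "vertices": ["a", "b"], "output": "scores.1.part0"})
unit.on_message({"graph": "graph", "scores": "scores.0",
                 "vertices": ["c"], "output": "scores.1.part1"})
print(master_inbox)
```

The master only learns where the new score files are; the data itself never passes through it.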

Flexibility:
MR is not designed for such jobs. Pregel and epiC can express the algorithm more effectively. A unit in epiC is equivalent to a worker in Pregel.

Optimization:
Both MR and epiC support customized optimization, e.g., buffering the intermediate results on local disk.

Extensibility:
MR and Pregel have their pre-defined programming models, while in epiC, users can create their own.
Using epiC to simulate MR



Create two basic units: MapUnit and ReduceUnit.
MapUnit loads a partition of data and sends messages to all ReduceUnits.
ReduceUnit gets its input from the DFS. The locations of the input are obtained from the messages of the MapUnits.
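
The two units above could be sketched as a word count. This is an illustration under assumptions, not epiC code: the `dfs` dictionary stands in for the DFS, the paths and message fields are made up, and the MapUnits hand their messages to the ReduceUnits directly instead of going through the master network.

```python
import zlib
from collections import Counter

dfs = {}  # hypothetical in-memory stand-in for the DFS: path -> data


class ReduceUnit:
    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.locations = []

    def receive(self, message):
        # The message only carries the location of the input, not the data.
        self.locations.append(message["location"])

    def run(self):
        counts = Counter()
        for path in self.locations:
            for word, n in dfs[path]:  # input is fetched from the DFS
                counts[word] += n
        return dict(counts)


class MapUnit:
    def __init__(self, unit_id, reducers):
        self.unit_id = unit_id
        self.reducers = reducers

    def run(self, partition_path):
        buckets = [[] for _ in self.reducers]
        for line in dfs[partition_path]:  # load one partition of data
            for word in line.split():
                # Deterministic hash so a word always reaches the same reducer.
                target = zlib.crc32(word.encode()) % len(self.reducers)
                buckets[target].append((word, 1))
        for reducer, bucket in zip(self.reducers, buckets):
            path = f"/tmp/map{self.unit_id}/part{reducer.unit_id}"
            dfs[path] = bucket
            # Send a message to every ReduceUnit.
            reducer.receive({"sender": self.unit_id, "location": path})


dfs["/input/0"] = ["a b a"]
dfs["/input/1"] = ["b c"]
reducers = [ReduceUnit(0), ReduceUnit(1)]
for i in range(2):
    MapUnit(i, reducers).run(f"/input/{i}")
result = {}
for reducer in reducers:
    result.update(reducer.run())
print(result)
```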
Using epiC to simulate Relational DB

Three units are created:
SingleTableUnit: handles all processing on a single table.
JoinUnit: joins two or more tables.
AggregateUnit: applies the group-by operator and computes the aggregation results.
Example: TPC-H Q3
Five steps are required.

TPC-H Q3 (Step 1)
[Figure: the master network sends the partition info of Customer, Orders, and Lineitem to three SingleTableUnits, which run the queries below over the relational data of Customer/Orders/Lineitem.]
SingleTableUnit (Customer): select c_custkey from Customer where c_mktsegment = ':1'
SingleTableUnit (Orders): select o_orderdate, o_custkey, o_orderkey, o_shippriority from Orders where o_orderdate < date ':2'
SingleTableUnit (Lineitem): select l_orderkey, l_extendedprice, l_discount from Lineitem where l_shipdate > date ':2'
TPC-H Q3 (Step 2 and 3)
Step 2: the master network sends the partition info of the partial results of Lineitem and Orders to a JoinUnit.
JoinUnit: Create Partition JoinView1 as (Lineitem join Orders)
Storage (as labeled on the slide): Partial Results of Customer/Lineitem
Step 3: the master network sends the partition info of the partial results of Customer and JoinView1 to a JoinUnit.
JoinUnit: Create Partition JoinView2 as (Customer join JoinView1)
Storage: Partial Results of Customer/JoinView1
TPC-H Q3 (Step 4 and 5)
Step 4: the master network sends the partition info of the partial results of JoinView2 to a SingleTableUnit.
SingleTableUnit: Select * from JoinView2 Group By o_orderdate, o_shippriority
Storage: Partial Results of JoinView2
Step 5: the master network sends the partition info of the groups to an AggregateUnit.
AggregateUnit: compute the aggregation results for each group.
Storage: Partial Results of Group By
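
The five steps can be traced on a few in-memory rows. This is a sketch under assumptions: the helper functions merely mimic what the three units do, the sample rows and parameter values are made up, and the aggregate `sum(l_extendedprice * (1 - l_discount))` is taken from the TPC-H Q3 definition rather than from the slides.

```python
from collections import defaultdict
from datetime import date


def single_table(rows, predicate, columns):
    """SingleTableUnit: filter and project one table."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


def join(left, right, left_key, right_key):
    """JoinUnit: hash join of two partial results."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def group_by(rows, keys):
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)
    return groups


def aggregate(groups):
    """AggregateUnit: revenue per group (expression from the Q3 definition)."""
    return {key: sum(r["l_extendedprice"] * (1 - r["l_discount"]) for r in rows)
            for key, rows in groups.items()}


customer = [{"c_custkey": 1, "c_mktsegment": "BUILDING"},
            {"c_custkey": 2, "c_mktsegment": "AUTOMOBILE"}]
orders = [
    {"o_orderkey": 10, "o_custkey": 1, "o_orderdate": date(1995, 3, 1), "o_shippriority": 0},
    {"o_orderkey": 11, "o_custkey": 2, "o_orderdate": date(1995, 3, 2), "o_shippriority": 0},
    {"o_orderkey": 12, "o_custkey": 1, "o_orderdate": date(1995, 4, 1), "o_shippriority": 0},
]
lineitem = [
    {"l_orderkey": 10, "l_extendedprice": 100.0, "l_discount": 0.25, "l_shipdate": date(1995, 3, 20)},
    {"l_orderkey": 10, "l_extendedprice": 50.0, "l_discount": 0.0, "l_shipdate": date(1995, 3, 10)},
    {"l_orderkey": 11, "l_extendedprice": 80.0, "l_discount": 0.0, "l_shipdate": date(1995, 3, 20)},
]
segment, cutoff = "BUILDING", date(1995, 3, 15)  # stand-ins for ':1' and ':2'

# Step 1: three single-table scans.
c = single_table(customer, lambda r: r["c_mktsegment"] == segment, ["c_custkey"])
o = single_table(orders, lambda r: r["o_orderdate"] < cutoff,
                 ["o_orderdate", "o_custkey", "o_orderkey", "o_shippriority"])
l = single_table(lineitem, lambda r: r["l_shipdate"] > cutoff,
                 ["l_orderkey", "l_extendedprice", "l_discount"])
# Step 2: JoinView1 = Lineitem join Orders.
join_view1 = join(l, o, "l_orderkey", "o_orderkey")
# Step 3: JoinView2 = Customer join JoinView1.
join_view2 = join(c, join_view1, "c_custkey", "o_custkey")
# Step 4: group by o_orderdate, o_shippriority.
groups = group_by(join_view2, ["o_orderdate", "o_shippriority"])
# Step 5: aggregation result for each group.
result = aggregate(groups)
print(result)
```

Each intermediate list plays the role of a partial result that a real unit would write to the storage layer before the master network triggers the next step.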