kaust - The Stanford University InfoLab

Mizan: Optimizing Graph Mining in Large Parallel Systems
Panos Kalnis
King Abdullah University of Science and Technology (KAUST)
H. Jamjoom (IBM Watson) and Z. Khayyat, K. Awara (KAUST)
Graphs: Are they Important?

Graphs are everywhere
  Internet Web graph
  Social networks
  Biological networks
Processing graphs
  Find patterns, rules, anomalies
  Rank web pages
  'Viral' or 'word-of-mouth' marketing
  Identify interactions among proteins
  Computer security: anomalies in email traffic

Graph Research in InfoCloud

FD3:
  RDF query engine
  Distributed
  On-the-fly placement and indexing
GraMi: Graph mining
  E.g., find frequent subgraphs
Mizan
  Framework for executing graph algorithms
  Distributed, large-scale
GOAL: Graph DBMS
[Figure: example RDF graph with nodes Panos, Yasser, KAUST, professor, student and edge labels isA, works, studies]

Existing Graph-processing Frameworks

Map-Reduce based
  HADI, Pegasus
Message passing
  Pregel
Specialized graph engines
  Parallel Boost Graph Library (pBGL)

PageRank with Map-Reduce

[Figure: PageRank on a five-vertex graph (v1 to v5) run as two successive Map-Reduce rounds. Each round uses Map-1 to Map-3 and Reduce-1 to Reduce-3 and ends with a write on HDFS.]

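The round structure in the figure can be sketched in a few lines of Python. This is only an illustration of the general Map-Reduce formulation of PageRank, not the code of HADI or Pegasus. The graph `GRAPH`, the function names, and the damping factor 0.85 are assumptions made for the example; the edges of the slide's graph are not recoverable. Vertices without out-edges are not handled.

```python
from collections import defaultdict

# Hypothetical five-vertex graph: vertex -> list of out-neighbors.
GRAPH = {
    "v1": ["v2", "v3"],
    "v2": ["v3"],
    "v3": ["v1"],
    "v4": ["v1", "v5"],
    "v5": ["v4"],
}

def pagerank_map(vertex, rank, out_neighbors):
    """Map: re-emit the adjacency list and send an equal rank share to each out-neighbor."""
    yield vertex, ("graph", out_neighbors)
    for target in out_neighbors:
        yield target, ("rank", rank / len(out_neighbors))

def pagerank_reduce(values, num_vertices, damping):
    """Reduce: sum the incoming rank shares and apply the damping factor."""
    incoming = sum(payload for kind, payload in values if kind == "rank")
    return (1.0 - damping) / num_vertices + damping * incoming

def pagerank_iteration(graph, ranks, damping=0.85):
    """One Map-Reduce round. On Hadoop the result would be written to HDFS
    and read back by the next round, which is the cost the figure highlights."""
    shuffled = defaultdict(list)  # stands in for the shuffle phase
    for vertex, out_neighbors in graph.items():
        for key, value in pagerank_map(vertex, ranks[vertex], out_neighbors):
            shuffled[key].append(value)
    return {v: pagerank_reduce(vals, len(graph), damping) for v, vals in shuffled.items()}

ranks = {v: 1.0 / len(GRAPH) for v in GRAPH}
ranks = pagerank_iteration(GRAPH, ranks)
```

Running more rounds means calling `pagerank_iteration` again on its own output; every such call corresponds to a full Map, shuffle, Reduce, and write cycle.
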
Pregel [1]

Bulk Synchronous Parallel model
Stateful model: long-lived processes compute, communicate, and modify local state
  vs. data-flow model: each process computes solely on its input data and produces output data
[1] G. Malewicz et al., Pregel: a system for large-scale graph processing, SIGMOD, 2010

Pregel Example: MAX

[Figure: maximum-value propagation over successive supersteps. Vertices start with the values 3, 6, 2, 1 and all end with 6. Example from [Malewicz et al., SIGMOD, 2010].]

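A minimal single-process sketch of the MAX example in the Bulk Synchronous Parallel style follows. It is not Pregel's API: the function name `pregel_max`, the dictionaries, and the chain topology a-b-c-d are assumptions for illustration. Each vertex keeps local state across supersteps, sends its value only when it changed, and otherwise votes to halt.

```python
def pregel_max(initial_values, neighbors):
    """Propagate the maximum value; returns (final values, supersteps executed)."""
    values = dict(initial_values)
    inbox = {v: [] for v in values}
    active = set(values)  # every vertex is active in superstep 0
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in active:
            changed = False
            if inbox[v] and max(inbox[v]) > values[v]:
                values[v] = max(inbox[v])  # update local, long-lived state
                changed = True
            if superstep == 0 or changed:
                for n in neighbors[v]:
                    outbox[n].append(values[v])
            # otherwise the vertex votes to halt by sending nothing
        # Barrier: messages sent now are visible only in the next superstep.
        inbox = outbox
        active = {v for v in values if inbox[v]}  # incoming messages reactivate a vertex
        superstep += 1
    return values, superstep

# Hypothetical topology: a chain with edges in both directions.
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_values, supersteps = pregel_max({"a": 3, "b": 6, "c": 2, "d": 1}, NEIGHBORS)
```

The loop stops when no messages are in flight, which mirrors the termination rule of the model: all vertices have voted to halt and no vertex has been reactivated.
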
Mizan - Overview

α:
  Min-cut partitioning of input graph
  Point-to-point message passing
  Good for power-law graphs
γ:
  Random partitioning of input
  Ring overlay message passing
  Good for non-power-law graphs

α – Minimum-Cut Partitioning

METIS [2]
[2] Karypis and Kumar, “Multilevel k-way Partitioning Scheme for Irregular Graphs”, JPDC, 1998

α – Percentage of Edge Cuts with Minimum-Cut Partitioning

[Figure: two panels, Power-law and Non-Power-law]

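The metric on this slide, the percentage of edges whose endpoints land in different partitions, can be computed as sketched below. The function names, the toy graph, and the hand-made two-way split are assumptions for illustration; the split stands in for the output of a min-cut partitioner and is not produced by METIS.

```python
import random

def edge_cut_percentage(edges, partition_of):
    """Percentage of edges whose two endpoints are assigned to different partitions."""
    cut = sum(1 for u, v in edges if partition_of[u] != partition_of[v])
    return 100.0 * cut / len(edges)

def random_partition(vertices, k, seed=0):
    """Assign every vertex to one of k partitions uniformly at random."""
    rng = random.Random(seed)
    return {v: rng.randrange(k) for v in vertices}

# Toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Hand-made split that cuts only the bridging edge.
GOOD_SPLIT = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

good_cut = edge_cut_percentage(EDGES, GOOD_SPLIT)                    # 1 of 7 edges cut
random_cut = edge_cut_percentage(EDGES, random_partition(range(6), 2))
```

Every cut edge becomes a message that crosses the network, so a lower percentage means less communication per superstep.
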
α – Node Replication

α – Percentage of Edge Cuts with Node Replication

[Figure: two panels, Power-law and Non-Power-law]

Cost of Min-Cut Partitioning

[Figure: chart with the series Partition and User’s code]

γ – Message-passing in a Ring

[Figure: Point-to-Point communication vs. Ring-based communication (Mizan-γ)]

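The trade-off between the two communication patterns can be illustrated with a small model. This is a sketch under the assumption that a ring message is forwarded from each worker to its successor until it has gone around once; the slides do not give Mizan-γ's actual protocol, and both function names are hypothetical.

```python
def ring_delivery_hops(num_workers, origin):
    """Map each other worker to the hop count at which a message from `origin` reaches it."""
    hops = {}
    current = origin
    for hop in range(1, num_workers):
        current = (current + 1) % num_workers  # forward to the successor on the ring
        hops[current] = hop
    return hops

def point_to_point_sends(num_workers, origin):
    """List the (sender, receiver) pairs needed to reach every other worker directly."""
    return [(origin, w) for w in range(num_workers) if w != origin]

hops = ring_delivery_hops(4, origin=2)      # {3: 1, 0: 2, 1: 3}
direct = point_to_point_sends(4, origin=2)  # [(2, 0), (2, 1), (2, 3)]
```

In this model the origin sends one message on the ring instead of three direct ones, but the last worker waits three hops. The ring therefore only helps when many workers need the same message.
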
Optimizer

α: Partitioning cost (min-cut)
  Pays off for power-law graphs
γ: Latency due to the ring
  Each message must be needed by many nodes
  Good for non-power-law graphs
Is the input power-law?
  Take a random sample
  Use [3] to compare with theoretical power-law distribution
  Compute pValue
  0.1 ≤ pValue < 0.9: power-law
[3] A. Clauset et al., Power-Law Distributions in Empirical Data. SIAM Review, 51(4), 2009.

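The power-law check can be sketched as follows. This is a simplified version of the goodness-of-fit procedure described by Clauset et al., not the optimizer's code: it uses the continuous maximum-likelihood estimate, keeps `xmin` fixed instead of estimating it, and all function names and default parameters are assumptions. Only the final decision rule is taken from the slide.

```python
import math
import random

def fit_alpha(sample, xmin):
    """Continuous maximum-likelihood estimate of the exponent for values >= xmin."""
    tail = [x for x in sample if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(sample, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(x for x in sample if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst

def draw_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def power_law_p_value(sample, xmin=1.0, trials=200, seed=0):
    """Fraction of synthetic power-law samples that fit worse than the observed sample."""
    rng = random.Random(seed)
    alpha = fit_alpha(sample, xmin)
    observed = ks_distance(sample, xmin, alpha)
    n = sum(1 for x in sample if x >= xmin)
    worse = 0
    for _ in range(trials):
        synthetic = draw_power_law(n, xmin, alpha, rng)
        if ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin)) >= observed:
            worse += 1
    return worse / trials

def looks_power_law(p_value):
    """Decision rule as stated on the slide."""
    return 0.1 <= p_value < 0.9
```

A caller would pass a random sample of vertex degrees as `sample` and then choose α when `looks_power_law` returns true and γ otherwise.
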
Datasets & Optimizer’s Decisions

[Table: Real and Synthetic datasets]

Example: Diameter Estimation

Non-Power-law: 8 EC2 instances, Diameter estimation

Power-law: 8 EC2 instances, Diameter estimation

Cloud Computing in KAUST
Scientific & commercial Applications

IBM BlueGene/P – 3D Torus Network

IBM BlueGene/P vs. Amazon EC2
  BlueGene/P: 850 MHz
  EC2: 2.4 GHz

Points to remember

Mizan: Framework for graph algorithms in large-scale computing infrastructures
  α: Power-law graphs
  γ: Non-power-law graphs
  Runs on cloud and on supercomputers
To do list:
  Dynamic graph placement
  Hybrid (alpha and gamma)
  Better optimizer

Questions?
http://cloud.kaust.edu.sa