Pregel: A System for Large-Scale Graph Processing

Presented by Dylan Davis
Authors: Grzegorz Malewicz, Matthew H. Austern, Aart J.C. Bik, James C. Dehnert,
Ilan Horn, Naty Leiser, Grzegorz Czajkowski
(GOOGLE, INC.)
Overview
• What is a graph?
• Graph Problems
• The Purpose of Pregel
• Model of Computation
• C++ API
• Implementation
• Applications
• Experiments
What is a graph?
G = (V, E)
Binary Tree
Graph Problems
Network Routing
Social Network Connections
The Purpose of Pregel
• Google was interested in applications that could
perform internet-related graph algorithms, such
as PageRank, so they designed Pregel to perform
these tasks efficiently.
• It is a scalable, general-purpose system for
implementing graph algorithms in a distributed
environment.
• Focus on “Thinking Like a Vertex” and parallelism
Model of Computation
Model of Computation (Vertex)
[Figure: vertices labeled with Vertex ID and Vertex Value, connected by an edge labeled Edge Value]
Model of Computation (Superstep)
[Figure: execution time split into Superstep 0, Superstep 1, and Superstep 2, with vertices running Compute() in each superstep]
Model of Computation (Vertex Actions)
A vertex can:
• Modify its values
• Receive messages sent in the previous superstep
• Send messages
• Request topology changes
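The vertex actions above can be illustrated with a minimal single-process sketch. This is not the Pregel API: `ToyVertex` and `PropagateMax` are hypothetical names, and the example task (spreading the largest vertex value through the graph) is chosen only because it is small. It assumes vertex IDs are indices into a vector, that every vertex votes to halt at the end of each superstep, and that an incoming message reactivates a halted vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a vertex; all names here are hypothetical.
struct ToyVertex {
  int value = 0;               // vertex value, which the vertex may modify
  std::vector<int> out_edges;  // target vertex IDs (indices into the graph)
  bool active = true;          // false once the vertex has voted to halt
};

// Runs supersteps until every vertex is halted and no messages are pending.
// Each vertex keeps the largest value it has seen and forwards it when it
// changes. Returns the number of supersteps that were executed.
int PropagateMax(std::vector<ToyVertex>& graph) {
  const std::size_t n = graph.size();
  std::vector<std::vector<int>> inbox(n);  // messages sent in the previous superstep
  int superstep = 0;
  for (;; ++superstep) {
    std::vector<std::vector<int>> outbox(n);  // messages for the next superstep
    bool any_ran = false;
    for (std::size_t id = 0; id < n; ++id) {
      ToyVertex& v = graph[id];
      if (!inbox[id].empty()) v.active = true;  // a message wakes the vertex
      if (!v.active) continue;
      any_ran = true;

      // Body of the per-vertex "Compute()" step.
      bool changed = (superstep == 0);  // everyone announces itself once
      for (int m : inbox[id]) {
        if (m > v.value) {
          v.value = m;  // modify own value
          changed = true;
        }
      }
      if (changed) {
        for (int target : v.out_edges) {
          assert(target >= 0 && static_cast<std::size_t>(target) < n);
          outbox[target].push_back(v.value);  // send a message
        }
      }
      v.active = false;  // vote to halt until a new message arrives
    }
    if (!any_ran) break;
    inbox = std::move(outbox);  // superstep barrier: deliver all messages
  }
  return superstep;
}
```

Calling `PropagateMax` on a small chain of vertices leaves every vertex holding the maximum of the initial values. Topology changes are left out of this sketch.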
Model of Computation (State Machine)
C++ API
C++ API (Message Passing)
[Figure: a message addressed by Destination Vertex ID and carrying a Message Value, delivered to a Message Buffer]
C++ API (Combiners & Aggregators)
Combiner
Aggregator
C++ API (Topology Mutations)
C++ API (Input and Output)
    0  1  2  3  4
0   0  0  1  0  1
1   0  0  1  1  1
2   1  0  0  1  1
3   1  1  1  0  0
4   0  1  1  1  0
Implementation
Implementation (Basic Architecture)
Implementation (Program Execution)
Flow:
1. Copy the user program to the cluster: one master copy and multiple worker copies
2. Master assigns graph partitions to workers
3. Master splits the user input data among the workers, which load the vertex data
4. Workers run supersteps (Compute() and send messages)
5. Workers save the output
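Step 2 of the flow can be sketched as follows. The Pregel paper describes its default partitioning function as hash(ID) mod N, where N is the number of partitions. Everything else here is an assumption made for illustration: the helper names `PartitionOf` and `AssignPartitions` are hypothetical, `std::hash` stands in for whatever hash the real system uses, and handing partitions to workers round-robin is just one simple policy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: maps a vertex ID to a partition as hash(ID) mod N.
std::size_t PartitionOf(const std::string& vertex_id,
                        std::size_t num_partitions) {
  assert(num_partitions > 0);
  return std::hash<std::string>{}(vertex_id) % num_partitions;
}

// Hypothetical helper: the master hands out partitions to workers
// round-robin (an assumed policy). Element w of the result lists the
// partition indices owned by worker w.
std::vector<std::vector<std::size_t>> AssignPartitions(
    std::size_t num_partitions, std::size_t num_workers) {
  assert(num_workers > 0);
  std::vector<std::vector<std::size_t>> owned(num_workers);
  for (std::size_t p = 0; p < num_partitions; ++p) {
    owned[p % num_workers].push_back(p);
  }
  return owned;
}
```

Because the partition depends only on the vertex ID, any worker can work out which partition a destination vertex belongs to without asking the master.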
Implementation (Fault Tolerance)
[Figure: Checkpoint panel with workers calling Save(); Recover panel with a failed worker (X) and workers calling Recompute()]
Implementation (Worker)
Implementation (Master)
[Figure: the Master with its list of workers and partitions]
Applications
Applications (Shortest Path)
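A vertex-centric single-source shortest path computation can be sketched in one process as below. It follows the general idea of each vertex taking the minimum of its incoming messages and, if that improves its distance, sending the new distance plus the edge weight to its neighbors. The names `Edge`, `kInf`, and `ShortestPaths` are hypothetical, vertex IDs are assumed to be vector indices, edge weights are assumed to be non-negative, and this is a simulation rather than Pregel code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Marker for "no path found yet" (hypothetical name).
constexpr long kInf = std::numeric_limits<long>::max();

struct Edge {
  std::size_t target;  // target vertex ID
  long weight;         // edge value, assumed non-negative
};

// Returns the distance from `source` to every vertex, or kInf if unreachable.
std::vector<long> ShortestPaths(
    const std::vector<std::vector<Edge>>& out_edges, std::size_t source) {
  const std::size_t n = out_edges.size();
  assert(source < n);
  std::vector<long> dist(n, kInf);          // vertex values
  std::vector<std::vector<long>> inbox(n);  // messages from the previous superstep
  for (bool first = true;; first = false) {
    std::vector<std::vector<long>> outbox(n);
    bool any_message = false;
    for (std::size_t id = 0; id < n; ++id) {
      // After superstep 0, only vertices that received messages run.
      if (!first && inbox[id].empty()) continue;
      long min_dist = (id == source) ? 0 : kInf;
      for (long m : inbox[id]) min_dist = std::min(min_dist, m);
      if (min_dist < dist[id]) {
        dist[id] = min_dist;
        for (const Edge& e : out_edges[id]) {
          assert(e.target < n && e.weight >= 0);
          outbox[e.target].push_back(min_dist + e.weight);
          any_message = true;
        }
      }
      // The vertex implicitly votes to halt here.
    }
    if (!any_message) break;    // all halted and nothing in flight
    inbox = std::move(outbox);  // superstep barrier
  }
  return dist;
}
```

The loop ends once a superstep produces no messages, which corresponds to every vertex having voted to halt with nothing left to deliver.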
Experiments
Experiments (Description)
• Test the execution times of Pregel running the Single-Source Shortest Path algorithm.
• Use a cluster of 300 multicore commodity PCs.
• Run Pregel with Binary Tree graphs, and with a more
realistic, randomly-distributed graph.
• Results do not include initialization, graph generation,
and result verification times.
• Failure recovery (checkpointing) is disabled, which reduces overhead.
Conclusion
• Pregel is a model suitable for large-scale graph computing,
with a production-quality, scalable, and fault-tolerant
implementation.
• Programs are expressed as a sequence of iterations, in each
of which a vertex can receive messages sent in the previous
iteration, send messages to other vertices, and modify its
own state and that of its outgoing edges.
• This implementation is flexible enough to express a broad
set of algorithms.