Large-Scale Network Analysis with the Boost Graph Libraries

Douglas Gregor
Open Systems Lab
Indiana University
dgregor@osl.iu.edu

What are the BGLs?

- A collection of libraries for computation on graphs/networks:
  - Graph data structures
  - Graph algorithms
  - Graph input/output
- Common design:
  - Flexibility/customizability throughout
  - Obsessed with performance
  - Common interfaces throughout the collection
- All open source, freely available online

The BGL Family

- The Original (sequential) BGL
- BGL-Python
- The Parallel BGL
- Parallel BGL-Python

The Original BGL

- The largest and most mature BGL
  - ~7 years of research and development
  - Many users, contributors outside of the OSL
  - Steadily evolving
- Written in C++
  - Generic
  - Highly customizable
  - Efficient (both storage and execution)

BGL: Graph Data Structures

- Graphs:
  - adjacency_list: highly configurable with user-specified containers for vertices and edges
  - adjacency_matrix
  - compressed_sparse_row_graph
- Adaptors:
  - subgraphs, filtered graphs, reverse graphs
  - LEDA and Stanford GraphBase
- Or, use your own…
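
To make the compressed sparse row idea concrete, here is a minimal sketch in plain Python 3 (standard library only, not BGL or BGL-Python code). It packs a directed edge list into two flat arrays so that the out-neighbours of a vertex form one contiguous slice. The names `build_csr`, `out_neighbors`, `row_start`, and `targets` are illustrative assumptions, and the 4-vertex graph is hypothetical.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Pack a directed edge list into (row_start, targets) arrays."""
    # Count the out-degree of every source vertex, shifted by one slot.
    row_start = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_start[src + 1] += 1
    # A prefix sum turns the counts into offsets into `targets`.
    for v in range(num_vertices):
        row_start[v + 1] += row_start[v]
    # Scatter each target into the next free slot of its source vertex.
    targets = [0] * len(edges)
    next_slot = row_start[:-1].copy()
    for src, dst in edges:
        targets[next_slot[src]] = dst
        next_slot[src] += 1
    return row_start, targets


def out_neighbors(row_start, targets, v):
    """The out-neighbours of v are one contiguous slice of `targets`."""
    return targets[row_start[v]:row_start[v + 1]]


# Hypothetical 4-vertex directed graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
row_start, targets = build_csr(4, edges)
print(out_neighbors(row_start, targets, 0))
```

The trade-off this sketch illustrates is the usual one: the layout is compact and fast to traverse, but adding an edge later means rebuilding the arrays, whereas an adjacency list with per-vertex containers accepts insertions cheaply.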
Original BGL: Algorithms

- Searches (breadth-first, depth-first, A*)
- Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
- All-pairs shortest paths (Johnson, Floyd-Warshall)
- Minimum spanning tree (Kruskal, Prim)
- Components (connected, strongly connected, biconnected)
- Maximum cardinality matching
- Max-flow (Edmonds-Karp, push-relabel)
- Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
- Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
- Betweenness centrality
- PageRank
- Isomorphism
- Vertex coloring
- Transitive closure
- Dominator tree

Task: Biconnected Components

[Figure: input graph and output graph]
Articulation points: B G A

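Define a Graph Type

  #include <string>
  #include <boost/graph/adjacency_list.hpp>
  using namespace std;
  using namespace boost;

Determine vertex/edge properties:
  struct Vertex { string name; };      // each vertex carries a name
  struct Edge { int bicomponent; };    // each edge records its bicomponent number

Determine the graph type:
  typedef adjacency_list<
      /*OutEdgeListS=*/   vecS,
      /*VertexListS=*/    vecS,
      /*DirectedS=*/      undirectedS,
      /*VertexProperty=*/ Vertex,
      /*EdgeProperty=*/   Edge> Graph;
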
Read in a GraphViz DOT File

  // requires <fstream> and <boost/graph/graphviz.hpp>;
  // read_graphviz also needs the compiled Boost.Graph library at link time

Build an empty graph:
  Graph g;

Map vertex properties:
  dynamic_properties dyn;
  dyn.property("node_id", get(&Vertex::name, g));

Read in the GraphViz graph:
  ifstream in("biconnected_components.dot");
  read_graphviz(in, g, dyn);

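Run Biconnected Components

  // requires <vector>, <iterator>, and <boost/graph/biconnected_components.hpp>

Keep track of the articulation points:
  vector<Graph::vertex_descriptor> art_points;

Compute biconnected components:
  biconnected_components(g,
                         get(&Edge::bicomponent, g),   // edge -> bicomponent number
                         back_inserter(art_points));   // collects articulation points
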
Output results

Attach bicomponent number to the “label” property of edges:
  dyn.property("label", get(&Edge::bicomponent, g));

Write results to another GraphViz file:
  ofstream out("bc_out.dot");
  write_graphviz(out, g, dyn);  // newer Boost releases name this overload write_graphviz_dp

Show articulation points:
  cout << "Articulation points: ";
  for (size_t i = 0; i < art_points.size(); ++i) {
    cout << g[art_points[i]].name << ' ';
  }

Task: Biconnected Components

[Figure: input graph and output graph]
Articulation points: B G A
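
For readers curious about what an articulation-point computation does internally, here is a minimal sketch in plain Python 3 (standard library only, not BGL or BGL-Python code) of the classic depth-first-search approach. It assumes a simple undirected graph with no parallel edges. The function name `articulation_points` is illustrative, and the graph is a hypothetical example, not the one pictured on the slide.

```python
from collections import defaultdict


def articulation_points(edges):
    """Return the set of articulation points of a simple undirected graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    discovery = {}  # DFS discovery time of each vertex
    low = {}        # lowest discovery time reachable from the vertex's DFS subtree
    cut = set()
    clock = [0]

    def dfs(u, parent):
        discovery[u] = low[u] = clock[0]
        clock[0] += 1
        children = 0
        for w in adj[u]:
            if w not in discovery:
                children += 1
                dfs(w, u)
                low[u] = min(low[u], low[w])
                # A non-root u separates w's subtree if that subtree
                # has no back edge reaching above u.
                if parent is not None and low[w] >= discovery[u]:
                    cut.add(u)
            elif w != parent:
                low[u] = min(low[u], discovery[w])  # back edge
        # The DFS root is a cut vertex only with two or more DFS children.
        if parent is None and children > 1:
            cut.add(u)

    for v in list(adj):
        if v not in discovery:
            dfs(v, None)
    return cut


# Hypothetical graph: triangle A-B-C, bridge C-D, triangle D-E-F.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "D")]
points = articulation_points(edges)
print(sorted(points))
```

The recursive form keeps the sketch short; for graphs with long paths an explicit stack would be needed to stay within Python's recursion limit.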
Original BGL Summary

- The original BGL is large, stable, efficient
  - Lots of algorithms, graph types
  - Peer-reviewed code with many users, nightly regression testing, etc.
  - Performance comparable to FORTRAN.
- Who should use the BGL?
  - Programmers comfortable with C++
  - Users with graph sizes from tens of vertices to millions of vertices

BGL-Python

- Python is ideal for rapid prototyping:
  - It’s a scripting language (no compiler)
  - Dynamically typed means less typing for you
  - Easy to use: you already know Python…
- BGL-Python provides access to the BGL from within Python
  - Similar interfaces to C++ BGL
  - Easier to learn than C++
  - Great for scripting, GUI applications
  - Interactive help, e.g. help(bgl.dijkstra_shortest_paths)

Example: Biconnected Components

  import boost.graph as bgl  # Pull in the BGL bindings

  g = bgl.Graph.read_graphviz("biconnected_components.dot")

  # Compute biconnected components and articulation points
  bicomponent = g.edge_property_map('int')
  art_points = bgl.biconnected_components(g, bicomponent)

  # Save results with bicomponent numbers as edge labels
  g.edge_properties['label'] = bicomponent
  g.write_graphviz("biconnected_components_out.dot")

  # Print the articulation points (Python 2 print statements)
  print "Articulation points: ",
  node_id = g.vertex_properties['node_id']
  for v in art_points:
      print node_id[v], ' ',
  print ""

Wrapping the BGL in Python

- BGL-Python is not a…
  - “port”
  - reimplementation
- BGL-Python wraps the C++ BGL
  - Python calls translate to C++ calls
  - C++ can call back into Python
  - Most of the speed of C++
  - Most of the flexibility of Python
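
The callback idea can be illustrated with a visitor: the search algorithm owns the main loop and calls user-supplied hooks at fixed event points. The following is a minimal sketch in plain Python 3 (standard library only), not the BGL-Python API. The hook names `examine_vertex` and `edge_relaxed` are borrowed from the event points of the C++ BGL Dijkstra visitor concept, while `CountingVisitor`, `dijkstra`, and the small graph are hypothetical.

```python
import heapq


class CountingVisitor:
    """Hypothetical visitor that counts events without altering the search."""

    def __init__(self):
        self.examined = 0
        self.relaxed = 0

    def examine_vertex(self, u):
        self.examined += 1

    def edge_relaxed(self, u, v):
        self.relaxed += 1


def dijkstra(adj, source, visitor=None):
    """adj maps vertex -> list of (neighbor, non-negative weight)."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry left over from an earlier relaxation
        done.add(u)
        if visitor is not None:
            visitor.examine_vertex(u)  # the algorithm calls back into user code
        for v, w in adj.get(u, []):
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
                if visitor is not None:
                    visitor.edge_relaxed(u, v)
    return dist


# Hypothetical weighted directed graph.
adj = {"s": [("a", 4), ("b", 1)], "b": [("a", 2)], "a": [("t", 5)]}
vis = CountingVisitor()
dist = dijkstra(adj, "s", vis)
print(dist, vis.examined, vis.relaxed)
```

In a wrapped library the loop would run in compiled code and only the hooks would run in the interpreter, so each hook invocation adds interpreter overhead to an otherwise fast search.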
Performance: Shortest Paths

[Chart: running time in seconds (axis 0 to 30) for BGL Dijkstra, BGL Dijkstra with Python Visitor, and Python Dijkstra]

BGL-Python Summary

- BGL-Python is all about tradeoffs:
  - More gradual learning curve
  - Faster time-to-solution
  - Lower performance
- Our typical approach:
  1. Prototype in Python to get your ideas down
  2. Port to C++ when performance matters

The Parallel BGL

- A version of the C++ BGL for computational clusters
  - Distributed memory for huge graphs
  - Parallel processing for improved performance
- An active research project
- Closely related to the original BGL
  - Parallelizing BGL programs should be “easy”

Parallel BGL: Distributed Graphs

A simple, directed graph… distributed across 3 processors.
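
One way to picture a distributed graph is to give every vertex an owning process and store each edge with the owner of its source vertex. The sketch below, in plain Python 3 (standard library only), uses a simple block distribution chosen purely for illustration. The slide does not say which distribution the Parallel BGL uses, and `owner`, `partition`, and the 9-vertex ring graph are all hypothetical.

```python
def owner(vertex, num_vertices, num_procs):
    """Block distribution: consecutive vertex ids share an owning process."""
    block = (num_vertices + num_procs - 1) // num_procs  # ceiling division
    return vertex // block


def partition(num_vertices, edges, num_procs):
    """Each process stores the out-edges of the vertices it owns."""
    local = [{"vertices": [], "edges": [], "remote_targets": 0}
             for _ in range(num_procs)]
    for v in range(num_vertices):
        local[owner(v, num_vertices, num_procs)]["vertices"].append(v)
    for src, dst in edges:
        p = owner(src, num_vertices, num_procs)
        local[p]["edges"].append((src, dst))
        # An edge whose target lives on another process implies communication.
        if owner(dst, num_vertices, num_procs) != p:
            local[p]["remote_targets"] += 1
    return local


# Hypothetical 9-vertex directed ring split across 3 processes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
         (5, 6), (6, 7), (7, 8), (8, 0)]
parts = partition(9, edges, 3)
for rank, part in enumerate(parts):
    print(rank, part["vertices"], part["remote_targets"])
```

The count of remote targets hints at why distribution matters: every edge that crosses a process boundary turns a memory access into a message.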
Parallel Graph Algorithms

- Breadth-first search
- Eager Dijkstra’s single-source shortest paths
- Crauser et al. single-source shortest paths
- Depth-first search
- Minimum spanning tree (Boruvka, Dehne & Götz)
- Connected components
- Strongly connected components
- Biconnected components
- PageRank
- Graph coloring
- Fruchterman-Reingold layout
- Max-flow (Dinic’s)

Performance: Sparse graphs

[Plot: wall clock time in seconds (axis ticks 1, 10, 100, 1000) vs. # of processors (axis ticks 1, 10, 100). Series: Breadth-First Search, Crauser et al., Eager Dijkstra 0.1, Dense Boruvka, Merging Local MSFs, Boruvka-Then-Merge, Boruvka-Mixed-Merge, Boman et al. Coloring]

Scalability (~547k vertices/node)

Small-world graph, up to 70M vertices and 1B edges

[Plot: wall clock time in seconds (axis 0 to 400) vs. # of processors (axis 0 to 150). Series: Breadth-First Search, Crauser et al. Shortest Paths, Eager Dijkstra Shortest Paths, Connected Components, Vertex Coloring]

Performance vs. CGMgraph

Erdős-Rényi graph, 96k vertices, 10M edges

[Chart annotations: 17x, 30x]

Parallel BGL Summary

- The Parallel BGL is built for huge graphs
  - Millions to hundreds of millions of nodes
  - Distributed-memory parallel processing on clusters
  - Future work will permit larger graphs…
- Parallel programming has a learning curve
  - Parallel graph algorithms much harder to write
  - Distributed graph manipulation can be tricky
  - Parallel BGL is an active research library

Distributed Graph Layout
Parallel BGL in Python

- Preliminary support for the Parallel BGL in Python
  - Just import boost.graph.distributed
  - Similar interface to sequential BGL-Python
- Several options for usage with MPI:
  - Straight MPI: mpirun -np 2 python script.py
  - pyMPI: allows interactive use of the interpreter
- Initially used to prototype our distributed Fruchterman-Reingold implementation.

Porting for Performance
Which BGL is Right for You?

- Is any BGL right for you?
- Depends on how large your networks are:
  - Up to 1/2 million vertices, any BGL will do
  - C++ BGL can push to a couple million vertices
  - For tens of millions or larger, Parallel BGL only
- Other considerations:
  - You can prototype in Python, port to C++
  - Algorithm authors might prefer the original BGL
  - Parallelism is very hard to manage

Conclusion

- The Boost Graph Library family is a collection of full-featured graph libraries
  - All are flexible, customizable, efficient
  - Easy to port from Python to C++
  - Can port from sequential to parallel
  - Always growing, improving
- Is one of the BGLs right for you?
  - A typical “build or buy” decision

For More Information…

- (Original) Boost Graph Library
  http://www.boost.org/libs/graph/doc
- Parallel Boost Graph Library
  http://www.osl.iu.edu/research/pbgl
- Python Bindings for (Parallel) BGL
  http://www.osl.iu.edu/~dgregor/bgl-python
- Contact us!
  - Douglas Gregor <dgregor@osl.iu.edu>
  - Andrew Lumsdaine <lums@osl.iu.edu>

Other BGL Variants

- QuickGraph (C#)
  http://www.codeproject.com/cs/miscctrl/quickgraph.asp
- Ruby Graph Library
  http://rubyforge.org/projects/rgl/
- Rooster Graph (Scheme)
  http://savannah.nongnu.org/projects/rgraph/
- RBGL (an R interface to the C++ BGL)
  http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html

Disclaimer: These are all separate projects. We do not maintain them.

Comparative Performance

BC Clustering Performance: BGL vs. JUNG

[Plot: wall clock time in minutes (axis 0 to 60) vs. # of movies (axis 200 to 400). Series: BGL, JUNG]