Large-Scale Network Analysis with the Boost Graph Libraries
Douglas Gregor
Open Systems Lab, Indiana University
dgregor@osl.iu.edu

What are the BGLs?
  A collection of libraries for computation on graphs/networks
    Graph data structures
    Graph algorithms
    Graph input/output
  Common design
    Flexibility/customizability throughout
    Obsessed with performance
    Common interfaces throughout the collection
  All open source, freely available online

The BGL Family
  The Original (sequential) BGL
  BGL-Python
  The Parallel BGL
  Parallel BGL-Python

The Original BGL
  The largest and most mature BGL
    ~7 years of research and development
    Many users, contributors outside of the OSL
    Steadily evolving
  Written in C++
    Generic
    Highly customizable
    Efficient (both storage and execution)

BGL: Graph Data Structures
  Graphs:
    adjacency_list: highly configurable with user-specified containers for vertices and edges
    adjacency_matrix
    compressed_sparse_row
  Adaptors:
    subgraphs, filtered graphs, reverse graphs
    LEDA and Stanford GraphBase
  Or, use your own…

Original BGL: Algorithms
  Searches (breadth-first, depth-first, A*)
  Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
  All-pairs shortest paths (Johnson, Floyd-Warshall)
  Minimum spanning tree (Kruskal, Prim)
  Components (connected, strongly connected, biconnected)
  Maximum cardinality matching
  Max-flow (Edmonds-Karp, push-relabel)
  Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
  Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
  Betweenness centrality
  PageRank
  Isomorphism
  Vertex coloring
  Transitive closure
  Dominator tree

Task: Biconnected Components
  [Figure: input graph and output graph. Articulation points: B, G, A]

Define a Graph Type
  Determine vertex/edge properties:
    struct Vertex { string name; };
    struct Edge { int bicomponent; };
  Determine the graph type:
    typedef adjacency_list<
        /*EdgeListS=*/ vecS,
        /*VertexListS=*/ vecS,
        /*DirectedS=*/ undirectedS,
        /*VertexProperty=*/ Vertex,
        /*EdgeProperty=*/ Edge> Graph;

Read in a GraphViz DOT File
  Build an empty graph:
    Graph g;
  Map vertex properties:
    dynamic_properties dyn;
    dyn.property("node_id", get(&Vertex::name, g));
  Read in the GraphViz graph:
    ifstream in("biconnected_components.dot");
    read_graphviz(in, g, dyn);

Run Biconnected Components
  Keep track of the articulation points:
    vector<Graph::vertex_descriptor> art_points;
  Compute biconnected components:
    biconnected_components(g, get(&Edge::bicomponent, g),
                           back_inserter(art_points));

Output results
  Attach bicomponent number to the "label" property of edges:
    dyn.property("label", get(&Edge::bicomponent, g));
  Write results to another GraphViz file:
    ofstream out("bc_out.dot");
    write_graphviz(out, g, dyn);
  Show articulation points:
    cout << "Articulation points: ";
    for (int i = 0; i < art_points.size(); ++i) {
      cout << g[art_points[i]].name << ' ';
    }

Task: Biconnected Components
  [Figure: input graph and output graph. Articulation points: B, G, A]

Original BGL Summary
  The original BGL is large, stable, efficient
    Lots of algorithms, graph types
    Peer-reviewed code with many users, nightly regression testing, etc.
    Performance comparable to FORTRAN.
  Who should use the BGL?
    Programmers comfortable with C++
    Users with graph sizes from tens of vertices to millions of vertices

BGL-Python
  Python is ideal for rapid prototyping:
    It's a scripting language (no compiler)
    Dynamically typed means less typing for you
    Easy to use: you already know Python…
  BGL-Python provides access to the BGL from within Python
    Similar interfaces to C++ BGL
    Easier to learn than C++
    Great for scripting, GUI applications
    help(bgl.dijkstra_shortest_paths)

Example: Biconnected Components
  import boost.graph as bgl  # Pull in the BGL bindings
  g = bgl.Graph.read_graphviz("biconnected_components.dot")

  # Compute biconnected components and articulation points
  bicomponent = g.edge_property_map('int')
  art_points = bgl.biconnected_components(g, bicomponent)

  # Save results with bicomponent numbers as edge labels
  g.edge_properties['label'] = bicomponent
  g.write_graphviz("biconnected_components_out.dot")

  print "Articulation points: ",
  node_id = g.vertex_properties['node_id']
  for v in art_points:
      print node_id[v], ' ',
  print ""

Wrapping the BGL in Python
  BGL-Python is not a…
    "port"
    reimplementation
  BGL-Python wraps the C++ BGL
    Python calls translate to C++ calls
    C++ can call back into Python
    Most of the speed of C++
    Most of the flexibility of Python

Performance: Shortest Paths
  [Figure: bar chart of running time in seconds (axis 0 to 30) for BGL Dijkstra, BGL Dijkstra with Python Visitor, and Python Dijkstra]

BGL-Python Summary
  BGL-Python is all about tradeoffs:
    More gradual learning curve
    Faster time-to-solution
    Lower performance
  Our typical approach:
    1. Prototype in Python to get your ideas down
    2. Port to C++ when performance matters

The Parallel BGL
  A version of the C++ BGL for computational clusters
    Distributed memory for huge graphs
    Parallel processing for improved performance
  An active research project
  Closely related to the original BGL
    Parallelizing BGL programs should be “easy”

Parallel BGL: Distributed Graphs
  A simple, directed graph… distributed across 3 processors.
  [Figures not reproduced]

Parallel Graph Algorithms
  Breadth-first search
  Eager Dijkstra's single-source shortest paths
  Crauser et al. single-source shortest paths
  Depth-first search
  Minimum spanning tree (Boruvka, Dehne & Götz)
  Connected components
  Strongly connected components
  Biconnected components
  PageRank
  Graph coloring
  Fruchterman-Reingold layout
  Max-flow (Dinic's)

Performance: Sparse graphs
  [Figure: wall clock time in seconds (axis 0.1 to 1000) vs. # of processors (axis 1 to 100). Series: Breadth-First Search, Crauser et al., Eager Dijkstra, Dense Boruvka, Merging Local MSFs, Boruvka-Then-Merge, Boruvka-Mixed-Merge, Boman et al. Coloring]

Scalability (~547k vertices/node)
  Small-World Graph, up to 70M vertices and 1B edges
  [Figure: wall clock time in seconds (axis 0 to 400) vs. # of processors (axis 0 to 150). Series: Breadth-First Search, Crauser et al. Shortest Paths, Eager Dijkstra Shortest Paths, Connected Components, Vertex Coloring]

Performance vs.
CGMgraph
  [Figure: comparison on an Erdős-Rényi graph with 96k vertices and 10M edges; annotations: 17x, 30x]

Parallel BGL Summary
  The Parallel BGL is built for huge graphs
    Millions to hundreds of millions of nodes
    Distributed-memory parallel processing on clusters
    Future work will permit larger graphs…
  Parallel programming has a learning curve
    Parallel graph algorithms much harder to write
    Distributed graph manipulation can be tricky
    Parallel BGL is an active research library

Distributed Graph Layout

Parallel BGL in Python
  Preliminary support for the Parallel BGL in Python
    Just import boost.graph.distributed
    Similar interface to sequential BGL-Python
  Several options for usage with MPI:
    Straight MPI: mpirun -np 2 python script.py
    pyMPI: allows interactive use of the interpreter
  Initially used to prototype our distributed Fruchterman-Reingold implementation.

Porting for Performance

Which BGL is Right for You?
  Is any BGL right for you?
  Depends on how large your networks are:
    Up to 1/2 million vertices, any BGL will do
    C++ BGL can push to a couple million vertices
    For tens of millions or larger, Parallel BGL only
  Other considerations:
    You can prototype in Python, port to C++
    Algorithm authors might prefer the original BGL
    Parallelism is very hard to manage

Conclusion
  The Boost Graph Library family is a collection of full-featured graph libraries
    All are flexible, customizable, efficient
    Easy to port from Python to C++
    Can port from sequential to parallel
    Always growing, improving
  Is one of the BGLs right for you?
    A typical “build or buy” decision

For More Information…
  (Original) Boost Graph Library
    http://www.boost.org/libs/graph/doc
  Parallel Boost Graph Library
    http://www.osl.iu.edu/research/pbgl
  Python Bindings for (Parallel) BGL
    http://www.osl.iu.edu/~dgregor/bgl-python
  Contact us!
    Douglas Gregor <dgregor@osl.iu.edu>
    Andrew Lumsdaine <lums@osl.iu.edu>

Other BGL Variants
  QuickGraph (C#)
    http://www.codeproject.com/cs/miscctrl/quickgraph.asp
  Ruby Graph Library
    http://rubyforge.org/projects/rgl/
  Rooster Graph (Scheme)
    http://savannah.nongnu.org/projects/rgraph/
  RBGL (an R interface to the C++ BGL)
    http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
  Disclaimer: These are all separate projects. We do not maintain them.

Comparative Performance
  BC Clustering Performance: BGL vs. JUNG
  [Figure: wall clock time in minutes (axis 0 to 60) vs. # of movies (axis 200 to 400). Series: BGL, JUNG]
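
Supplementary sketch: what the biconnected components task computes
  The running example in these slides asks for a bicomponent number on every edge plus the list of articulation points. For readers without the bindings installed, the following is a minimal, library-free Python sketch of that computation, using a depth-first search with an edge stack. It is not the BGL's implementation and does not call BGL-Python; the function name bicomponents_sketch, the dict-of-lists graph format, and the six-vertex sample graph are all assumptions made for illustration.

```python
def bicomponents_sketch(adj):
    """Return (edge -> bicomponent number, set of articulation points).

    `adj` is a hypothetical input format: a dict mapping each vertex to a
    list of neighbours of a simple undirected graph. The search is
    recursive, so this sketch is only suitable for small graphs.
    """
    disc = {}         # discovery time of each visited vertex
    low = {}          # lowest discovery time reachable from the subtree
    edge_stack = []   # edges seen since the last completed bicomponent
    component = {}    # frozenset({u, v}) -> bicomponent number
    art_points = set()
    counter = [0, 0]  # [next discovery time, next bicomponent number]

    def visit(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:                  # tree edge
                children += 1
                edge_stack.append((u, v))
                visit(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:          # u separates v's subtree
                    if parent is not None or children > 1:
                        art_points.add(u)
                    while True:                # pop one whole bicomponent
                        e = edge_stack.pop()
                        component[frozenset(e)] = counter[1]
                        if e == (u, v):
                            break
                    counter[1] += 1
            elif disc[v] < disc[u]:            # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for start in adj:
        if start not in disc:
            visit(start, None)
    return component, art_points


# Illustrative graph (not the one from the slides): two triangles sharing
# vertex C, plus a pendant edge E-F.
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D", "F"],
    "F": ["E"],
}

component, art_points = bicomponents_sketch(adj)
print("Articulation points:", " ".join(sorted(art_points)))
# prints: Articulation points: C E
```

  On this sample graph the sketch finds three bicomponents (each triangle and the pendant edge). For real workloads the library calls shown in the earlier slides remain the intended route; this sketch only illustrates the shape of the result.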