A High-Performance Interactive Tool for Exploring Large Graphs

John R. Gilbert (University of California, Santa Barbara)
Aydin Buluc & Viral Shah (UCSB), Brad McRae (NCEAS), Steve Reinhardt (Interactive Supercomputing)
With thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)
Support: DOE Office of Science, NSF, DARPA, SGI, ISC

3D Spectral Coordinates
[figure]

2D Histogram: RMAT Graph
[figure]

Strongly Connected Components
[figure]

Social Network Analysis in Matlab: 1993
[figure: co-author graph from the 1993 Householder symposium]

Social Network Analysis in Matlab: 1993
[figure: sparse adjacency matrix of the co-author graph]
Which author has the most collaborators?
>> [count, author] = max(sum(A))
count = 32
author = 1
>> name(author,:)
ans = Golub

Social Network Analysis in Matlab: 1993
Have Gene Golub and Cleve Moler ever been coauthors?
>> A(Golub, Moler)
ans = 0
No. But how many coauthors do they have in common?
>> AA = A^2;
>> AA(Golub, Moler)
ans = 2
And who are those common coauthors?
>> name( find( A(:,Golub) .* A(:,Moler) ), : )
ans =
Wilkinson
VanLoan

Outline
• Infrastructure: array-based sparse graph computation
• An application: computational ecology
• Some nuts and bolts: sparse matrix multiplication

Combinatorial Scientific Computing
Emerging large-scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• ...
How will combinatorial methods be used by nonexperts?

Analogy: Matrix Division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square? No => use QR to solve the least squares problem
• Is A triangular or permuted triangular? Yes => sparse triangular solve
• Is A symmetric with positive diagonal elements? Yes => attempt Cholesky after symmetric minimum degree ordering
• Otherwise => use LU on A(:, colamd(A))

Matlab*P
% Power iteration for the principal eigenvector of A;
% p marks distributed dimensions in Star-P.
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
    y = x;
    x = A*x;
    x = x / norm(x);
end;

Star-P Architecture
[diagram: a MATLAB® client holding ordinary Matlab variables talks to the Star-P server; the server manager, matrix manager, and package manager coordinate processors #0 through #n-1, with server-side packages including dense/sparse, sort, ScaLAPACK, FFTW, an FPGA interface, and MPI/UPC user code]

Distributed Sparse Array Structure
[diagram: block rows of the matrix distributed across processors P0, P1, P2, …, Pn]
Each processor stores its local vertices and edges in a compressed row structure.
Has been scaled to >10^8 vertices and >10^9 edges in an interactive session.

The sparse( ) Constructor
A = sparse(I, J, V, nr, nc);
• Input: dense vectors I, J, V and dimensions nr, nc
• Output: A(I(k), J(k)) = V(k)
• Sums values with duplicate indices
• Sorts triples <i, j, v> by <i, j>
• Inverse: [I, J, V] = find(A);

Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation:
  A(1:3, [4 5 2]) = [ B(:, J) C ];
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)

Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs (see the sketch below).
• Leverage the mature techniques and tools of high-performance numerical computation.
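To make the adjacency-matrix viewpoint concrete, here is a minimal Matlab sketch in the spirit of the 1993 example; the edge list E and size n are hypothetical stand-ins, not data from the talk:

% Build a symmetric 0/1 adjacency matrix from an edge list.
n = 10;                               % hypothetical number of vertices
E = [1 2; 2 3; 1 3; 3 4];             % hypothetical edge list of (i, j) pairs
A = sparse(E(:,1), E(:,2), 1, n, n);  % sparse() sums duplicate triples
A = spones(A + A');                   % symmetrize; reset weights to 0/1
deg = full(sum(A));                   % degree = number of collaborators
[count, v] = max(deg);                % busiest vertex, as in the Golub query
common = find(A(:,1) .* A(:,2));      % common neighbors of vertices 1 and 2

The same one-line sparse() call bridges graph input (edge triples) and matrix algebra, which is why it reappears in the SSCA#2 and clustering slides below.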
Sparse Adjacency Matrix and Graph
[diagram: a 7-vertex graph, its sparse adjacency matrix A^T, and a vertex vector x]
• Adjacency matrix: sparse array with nonzeros for graph edges
• Storage-efficient implementation from sparse data structures

Breadth-First Search: Sparse mat * vec
[diagram, animated across three slides: multiplying the frontier vector x by A^T reaches the neighbor set A^T x, then the two-hop neighbors (A^T)^2 x]
• Multiply by the adjacency matrix to step to neighbor vertices
• Work-efficient implementation from sparse data structures

SSCA#2: "Graph Analysis" Benchmark (spec version 1)
• Fine-grained, irregular data access
• Searching and clustering
• Many tight clusters, loosely interconnected
• Input data is edge triples <i, j, label(i,j)>
• Vertices and edges permuted randomly

Clustering by Breadth-First Search
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths

% Grow each seed to the vertices reached
% by at least k paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;

Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities

Typical Application Stack
• Applications: computational ecology, CFD, data exploration
• Preconditioned iterative methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
• Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, ...
• Distributed sparse matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)

Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning

Pumas in Southern California
[map: habitat quality model of the region spanning L.A., Palm Springs, and Joshua Tree N.P.]

Predicting Gene Flow with Resistive Networks
[figure: genetic vs. geographic distance compared with circuit model predictions; N = 100, m = 0.01]

Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation
• Needed: finer resolution, larger landscapes, faster interaction
[map: 5 km resolution, too coarse]

Combinatorics in Circuitscape
• The initial grid models connections to 4 or 8 neighbors.
• Partition the landscape into connected components with GAPDT.
• Graph contraction from GAPDT contracts habitats into single nodes in the resistive network, since we need current flow between entire habitats (see the sketch after this list).
• Data-parallel computation on large graphs: graph construction, querying, and manipulation.
• Ideally, model the landscape at 100 m resolution (for pumas); there is a tradeoff between resolution and time.
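As an illustration of the contraction step, here is a minimal Matlab sketch expressing graph contraction as a sparse triple product; the conductance matrix G and the component-label vector comp are hypothetical inputs, and this stands in for, rather than reproduces, the GAPDT routine:

% Contract a weighted graph G (n x n sparse) onto k habitat components.
% comp(i) in 1..k labels the component containing grid node i.
n = size(G, 1);
k = max(comp);
S = sparse(1:n, comp, 1, n, k);  % indicator matrix: S(i, comp(i)) = 1
Gc = S' * G * S;                 % sums edge weights between components
Gc = Gc - diag(diag(Gc));        % drop self-loops within a component

Each habitat becomes a single node of the k-node resistive network Gc, which is then handed to the numerical solvers described next.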
Numerics in Circuitscape
• Resistance computations for pairs of habitats in the landscape
• Direct methods are too slow for the largest problems
• Use iterative solvers via Star-P:
  – Hypre (PCG + AMG)
  – Experimenting with support-graph preconditioners

Parallel Circuitscape Results
Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS

Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
  – Parallel breadth-first search
  – Shortest paths
  – Graph contraction
  – Subgraph / submatrix indexing
  – Etc.
• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.
• Often we do not want to optimize for one repeated operation, as in matvec for iterative methods.

Sparse Matrix times Sparse Matrix
• Current work:
  – Parallel algorithms with 2D data layout
  – Sequential hypersparse algorithms
  – Matrices over semirings

ParSpGEMM
[diagram: 2D block distribution; processor (I,J) accumulates C(I,J) += A(I,K) * B(K,J) over the inner block index K]
• Based on SUMMA
• Simple for non-square matrices, etc.

How Sparse? HyperSparse!
If the whole matrix averages c nonzeros per column (nnz(j) = c), then under a 2D distribution onto √p × √p processor blocks each local submatrix averages nnz(j) = c/√p → 0 as p grows.
Any local data structure that depends on the local submatrix dimension n (such as CSR or CSC) is too wasteful.

SparseDComp Data Structure
• "Doubly compressed" data structure
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)

Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f·log k)
Here f is the number of required nonzero operations (flops), nzc(A) is the number of columns of A containing at least one nonzero, and nzr(B) is the corresponding count of nonempty rows of B.
[plot: break-even point between the two algorithms]

Parallel Timings
[plot: time vs. n/nnz, log-log]
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, 0.5} * 2^20

Matrices over Semirings
• Matrix multiplication C = AB (or matrix/vector):
  C(i,j) = A(i,1)B(1,j) + A(i,2)B(2,j) + ··· + A(i,n)B(n,j)
• Replace the scalar operations × and + by ⊗ and ⊕:
  ⊗ is associative and distributes over ⊕, with identity 1;
  ⊕ is associative and commutative, with identity 0;
  0 annihilates under ⊗.
• Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ··· ⊕ A(i,n)⊗B(n,j)
• Examples: (×,+); (and,or); (+,min); ...
• Same data reference pattern and control flow

Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
  – Rapid prototyping tool for algorithm development
  – Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*) (a small sketch follows)
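To make the (min,+) case concrete, here is a minimal Matlab sketch of one semiring matrix-vector step; the function name minplus_mv is hypothetical, and a dense distance matrix is used only for brevity (a real implementation would exploit sparsity exactly as ordinary matvec does):

function d = minplus_mv(D, d0)
% One (min,+) semiring mat-vec: d(i) = min over j of ( D(i,j) + d0(j) ).
% D(i,j) is the length of edge j -> i, with Inf where there is no edge.
n = size(D, 1);
d = inf(n, 1);
for j = 1:n
    d = min(d, D(:,j) + d0(j));   % here ⊕ = min and ⊗ = +
end
end

Starting from a source vector d (0 at the source, Inf elsewhere) and iterating d = min(d, minplus_mv(D, d)) relaxes every edge each step and converges to single-source shortest paths within n-1 iterations, Bellman-Ford style; the data reference pattern and control flow are exactly those of an ordinary matvec, as the slide claims.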