Parallel Sparse Operations in Matlab: Exploring Large Graphs

John R. Gilbert, University of California at Santa Barbara
Aydin Buluc (UCSB), Brad McRae (NCEAS), Steve Reinhardt (Interactive Supercomputing), Viral Shah (ISC & UCSB)
with thanks to Alan Edelman (MIT & ISC) and Jeremy Kepner (MIT-LL)
Support: DOE, NSF, DARPA, SGI, ISC

[Figures: 3D spectral coordinates; 2D histogram of an RMAT graph; strongly connected components]

Social Network Analysis in Matlab: 1993
[Figure: co-author graph from the 1993 Householder symposium]

Combinatorial Scientific Computing
Emerging large-scale, high-performance applications:
• Web search and information retrieval
• Knowledge discovery
• Computational biology
• Dynamical systems
• Machine learning
• Bioinformatics
• Sparse matrix methods
• Geometric modeling
• ...
How will combinatorial methods be used by nonexperts?

Outline
• Infrastructure: array-based sparse graph computation
• An application: computational ecology
• Some nuts and bolts: sparse matrix multiplication

Matlab*P

A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
    y = x;
    x = A*x;
    x = x / norm(x);
end;

Star-P Architecture
[Figure: the MATLAB® client talks to a server manager coordinating processors #0 through #n-1 via matrix, package, and processor managers; server-side libraries include ScaLAPACK, FFTW, dense/sparse sort, MPI and UPC user code, and an FPGA interface. Ordinary Matlab variables live on the client; distributed matrices live on the server.]

Distributed Sparse Array Structure
[Figure: a sparse matrix laid out by rows across processors P0, P1, P2, ..., Pn]
Each processor stores its local vertices and edges in a compressed row structure.
Has been scaled to >10^8 vertices, >10^9 edges in an interactive session.

Sparse Array and Matrix Operations
• dsparse layout, same semantics as ordinary full & sparse
• Matrix arithmetic: +, max, sum, etc.
• matrix * matrix and matrix * vector
• Matrix indexing and concatenation:
      A(1:3, [4 5 2]) = [ B(:, J) C ];
• Linear solvers: x = A \ b; using SuperLU (MPI)
• Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)

Large-Scale Graph Algorithms
• Graph theory, algorithms, and data structures are ubiquitous in sparse matrix computation.
• Time to turn the relationship around!
• Represent a graph as a sparse adjacency matrix.
• A sparse matrix language is a good start on primitives for computing with graphs.
• Leverage the mature techniques and tools of high-performance numerical computation.

Sparse Adjacency Matrix and Graph
[Figure: a 7-vertex graph alongside its sparse adjacency matrix A^T and a vector x]
• Adjacency matrix: sparse array with nonzeros for graph edges
• Storage-efficient implementation from sparse data structures

Breadth-First Search: sparse mat * vec
[Figure sequence: the same graph with frontier vectors x, A^T x, (A^T)^2 x]
• Multiplying by the adjacency matrix steps to the neighbor vertices
• Work-efficient implementation from sparse data structures

HPCS Graph Clustering Benchmark
Fine-grained, irregular data access; searching and clustering.
• Many tight clusters, loosely interconnected
• Input data is edge triples <i, j, label(i,j)>
• Vertices and edges permuted randomly

Clustering by Breadth-First Search
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths

% Grow each seed to vertices
% reached by at least k
% paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;

Toolbox for Graph Analysis and Pattern Discovery
Layer 1: Graph Theoretic Tools
• Graph
operations
• Global structure of graphs
• Graph partitioning and clustering
• Graph generators
• Visualization and graphics
• Scan and combining operations
• Utilities

Typical Application Stack
Applications: computational ecology, CFD, data exploration
Preconditioned Iterative Methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, ...
Distributed Sparse Matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)

Landscape Connectivity Modeling
• Landscape type and features facilitate or impede movement of members of a species
• Different species have different criteria, scales, etc.
• Habitat quality, gene flow, population stability
• Corridor identification, conservation planning

Pumas in Southern California
[Figure: habitat quality model spanning Joshua Tree N.P., L.A., and Palm Springs]

Predicting Gene Flow with Resistive Networks
[Figure: genetic vs. geographic distance compared with circuit model predictions; N = 100, m = 0.01]

Early Experience with Real Genetic Data
• Good results with wolverines, mahogany, pumas
• Matlab implementation at 5 km resolution (too coarse)
• Needed: finer resolution, larger landscapes, faster interaction

Circuitscape: Combinatorics and Numerics
• Model landscape (ideally at 100 m resolution for pumas).
• Initial grid models connections to 4 or 8 neighbors.
• Partition landscape into connected components via GAPDT.
• Use GAPDT to contract habitats into single graph nodes.
• Compute resistance for pairs of habitats.
• Direct methods are too slow for the largest problems.
• Use iterative solvers via Star-P: Hypre (PCG+AMG).

Parallel Circuitscape Results
Pumas in southern California:
– 12 million nodes
– Under 1 hour (16 processors)
– Original code took 3 days at coarser resolution
Targeting much larger problems:
– Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS

Sparse Matrix times Sparse Matrix
• A primitive in many array-based graph algorithms:
– Parallel breadth-first search
– Shortest paths
– Graph contraction
– Subgraph / submatrix indexing
– Etc.
• Graphs are often not mesh-like, i.e. they lack geometric locality and good separators.
• Often do not want to optimize for one repeated operation, as in matvec for iterative methods.
• Current work:
– Parallel algorithms with 2D data layout
– Sequential and parallel hypersparse algorithms
– Matrices over semirings

ParSpGEMM
C(I,J) += A(I,K) * B(K,J)
• Based on SUMMA
• Simple for non-square matrices, etc.

How Sparse? HyperSparse!
[Figure: distributing a matrix with nnz(j) = c nonzeros per column over a sqrt(p)-by-sqrt(p) grid of blocks leaves each block with nnz(j) = c/sqrt(p) nonzeros per column on average, which tends to 0 as p grows]
Any local data structure that depends on the local submatrix dimension n (such as CSR or CSC) is too wasteful.
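As a concrete reference for the sequential kernel underneath ParSpGEMM, here is a stdlib-only Python sketch of Gustavson-style column-by-column sparse matrix multiplication. The dict-of-dicts column storage and the function name are illustrative stand-ins for DCSC, not the actual Star-P implementation:

```python
# Column-wise sparse GEMM (Gustavson's algorithm): column j of C is the
# linear combination of the columns of A selected by column j of B.
# Matrices are stored column-sparse as {col: {row: value}} -- an
# illustrative stand-in for DCSC, not the real data structure.
def spgemm(A, B):
    C = {}
    for j, bcol in B.items():
        acc = {}                              # sparse accumulator for C(:,j)
        for k, bkj in bcol.items():           # for each nonzero B(k,j) ...
            for i, aik in A.get(k, {}).items():
                acc[i] = acc.get(i, 0) + aik * bkj   # C(i,j) += A(i,k)*B(k,j)
        if acc:
            C[j] = acc
    return C

# A = [1 0; 2 0; 0 3] and B = [4 0; 5 6], stored by columns:
A = {0: {0: 1, 1: 2}, 1: {2: 3}}
B = {0: {0: 4, 1: 5}, 1: {1: 6}}
```

The same skeleton, with + and * replaced by semiring operations, covers the graph uses listed above: breadth-first search by matrix-matrix product, contraction, and subgraph indexing.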
SparseDComp Data Structure
• "Doubly compressed" data structure
• Maintains both DCSC and DCSR
• C = A*B needs only A.DCSC and B.DCSR
• 4*nnz values communicated for A*B in the worst case (though we usually get away with much less)

Sequential Operation Counts
• Matlab: O(n + nnz(B) + f)
• SpGEMM: O(nzc(A) + nzr(B) + f*log k)
Here f is the number of required nonzero operations (flops), and nzc(A) is the number of columns of A containing at least one nonzero.
[Figure: break-even point]

Parallel Timings
[Figure: time vs. n/nnz, log-log plot]
• 16-processor Opteron, HyperTransport, 64 GB memory
• R-MAT * R-MAT
• n = 2^20
• nnz = {8, 4, 2, 1, .5} * 2^20

Matrices over Semirings
• Ordinary matrix multiplication C = AB (or matrix/vector):
      C(i,j) = A(i,1)*B(1,j) + A(i,2)*B(2,j) + ... + A(i,n)*B(n,j)
• Replace the scalar operations * and + by ⊗ and ⊕:
      ⊗: associative, distributes over ⊕, identity 1
      ⊕: associative, commutative, identity 0; 0 annihilates under ⊗
• Then C(i,j) = A(i,1)⊗B(1,j) ⊕ A(i,2)⊗B(2,j) ⊕ ... ⊕ A(i,n)⊗B(n,j)
• Examples: (*,+); (and,or); (+,min); ...
• Same data reference pattern and control flow

Remarks
• Tools for combinatorial methods built on parallel sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)
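To make the semiring remark concrete, here is a stdlib-only Python sketch: one matvec loop parameterized by (⊗, ⊕). With (*, +) it is ordinary arithmetic; with (+, min), iterating to a fixed point relaxes every edge each step, which is Bellman-Ford shortest paths. The function names and the tiny example graph are invented for illustration:

```python
# y = x^T A over an arbitrary semiring: y(j) = oplus_i ( x(i) (x) A(i,j) ).
INF = float('inf')

def semiring_matvec(A, x, otimes, oplus, zero):
    n = len(A[0])
    y = [zero] * n
    for i, xi in enumerate(x):
        for j in range(n):
            y[j] = oplus(y[j], otimes(xi, A[i][j]))
    return y

def shortest_paths(A, source):
    """A[i][j]: weight of edge i->j, INF if absent, 0 on the diagonal.
    Repeated (+, min) matvecs relax every edge until a fixed point --
    Bellman-Ford, with the same control flow as ordinary matvec."""
    x = [INF] * len(A)
    x[source] = 0
    while True:
        y = semiring_matvec(A, x, otimes=lambda a, b: a + b, oplus=min, zero=INF)
        if y == x:
            return y
        x = y
```

With (and, or) on a 0/1 adjacency matrix, the same loop computes one breadth-first step, as on the BFS slides.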