Parallel Sparse Operations in Matlab: John R. Gilbert Exploring Large Graphs

advertisement
Parallel Sparse Operations in Matlab:
Exploring Large Graphs
John R. Gilbert
University of California at Santa Barbara
Aydin Buluc (UCSB)
Brad McRae (NCEAS)
Steve Reinhardt (Interactive Supercomputing)
Viral Shah (ISC & UCSB)
with thanks to Alan Edelman (MIT & ISC)
and Jeremy Kepner (MIT-LL)
1
Support: DOE, NSF, DARPA, SGI, ISC
3D Spectral Coordinates
2
2D Histogram: RMAT Graph
3
Strongly Connected Components
4
Social Network Analysis in Matlab: 1993
Co-author graph
from 1993
Householder
symposium
5
Combinatorial Scientific Computing
Emerging large scale, high-performance applications:
•
Web search and information retrieval
•
Knowledge discovery
•
Computational biology
•
Dynamical systems
•
Machine learning
•
Bioinformatics
•
Sparse matrix methods
•
Geometric modeling
•
...
How will combinatorial methods be used by nonexperts?
6
Outline
7
•
Infrastructure: Array-based sparse graph computation
•
An application: Computational ecology
•
Some nuts and bolts: Sparse matrix multiplication
Matlab*P
A = rand(4000*p, 4000*p);
x = randn(4000*p, 1);
y = zeros(size(x));
while norm(x-y) / norm(x) > 1e-11
y = x;
x = A*x;
x = x / norm(x);
end;
8
Star-P Architecture
Star-P
client manager
package manager
processor #1
dense/sparse
sort
processor #2
ScaLAPACK
processor #3
FFTW
Ordinary Matlab variables
processor #0
FPGA interface
MPI user code
UPC user code
...
MATLAB®
processor #n-1
server manager
matrix manager
9
Distributed matrices
Distributed Sparse Array Structure
P0
31
41
59
26
53
1
3
2
3
1
31
1
41
53
59
26
2
3
P1
P2
Pn
10
Each processor stores
local vertices & edges
in a compressed row structure.
Has been scaled to >108 vertices,
>109 edges in interactive session.
Sparse Array and Matrix Operations
•
dsparse layout, same semantics as ordinary full & sparse
•
Matrix arithmetic: +, max, sum, etc.
•
matrix * matrix and matrix * vector
•
Matrix indexing and concatenation
A (1:3, [4 5 2]) = [ B(:, J) C ] ;
11
•
Linear solvers: x = A \ b; using SuperLU (MPI)
•
Eigensolvers: [V, D] = eigs(A); using PARPACK (MPI)
Large-Scale Graph Algorithms
12
•
Graph theory, algorithms, and data structures are
ubiquitous in sparse matrix computation.
•
Time to turn the relationship around!
•
Represent a graph as a sparse adjacency matrix.
•
A sparse matrix language is a good start on primitives
for computing with graphs.
•
Leverage the mature techniques and tools of highperformance numerical computation.
Sparse Adjacency Matrix and Graph

1
2
4
7
3
AT
x
5
6
ATx
• Adjacency matrix: sparse array w/ nonzeros for graph edges
• Storage-efficient implementation from sparse data structures
13
Breadth-First Search: sparse mat * vec

1
2
4
7
3
AT
14
x
5
6
ATx
•
Multiply by adjacency matrix  step to neighbor vertices
•
Work-efficient implementation from sparse data structures
Breadth-First Search: sparse mat * vec

1
2
4
7
3
AT
15
x
5
6
ATx
•
Multiply by adjacency matrix  step to neighbor vertices
•
Work-efficient implementation from sparse data structures
Breadth-First Search: sparse mat * vec


1
2
4
7
3
AT
16
x
5
6
ATx (AT)2x
•
Multiply by adjacency matrix  step to neighbor vertices
•
Work-efficient implementation from sparse data structures
HPCS Graph Clustering Benchmark
Fine-grained, irregular data access
Searching and clustering
17
•
Many tight clusters, loosely interconnected
•
Input data is edge triples < i, j, label(i,j) >
•
Vertices and edges permuted randomly
Clustering by Breadth-First Search
• Grow local clusters from many seeds in parallel
• Breadth-first search by sparse matrix * matrix
• Cluster vertices connected by many short paths
% Grow each seed to vertices
%
reached by at least k
%
paths of length 1 or 2
C = sparse(seeds, 1:ns, 1, n, ns);
C = A * C;
C = C + A * C;
C = C >= k;
18
Toolbox for Graph Analysis
and Pattern Discovery
Layer 1: Graph Theoretic Tools
19
•
Graph operations
•
Global structure of graphs
•
Graph partitioning and clustering
•
Graph generators
•
Visualization and graphics
•
Scan and combining operations
•
Utilities
Typical Application Stack
Computational ecology, CFD, data exploration
Applications
CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
Preconditioned Iterative Methods
Graph querying & manipulation, connectivity, spanning trees,
geometric partitioning, nested dissection, NNMF, . . .
Graph Analysis & PD Toolbox
Arithmetic, matrix multiplication, indexing, solvers (\, eigs)
Distributed Sparse Matrices
20
Landscape Connnectivity Modeling
21
•
Landscape type and features facilitate or impede
movement of members of a species
•
Different species have different criteria, scales, etc.
•
Habitat quality, gene flow, population stability
•
Corridor identification, conservation planning
Pumas in Southern California
Habitat quality model
Joshua Tree N.P.
L.A.
Palm Springs
22
Predicting Gene Flow with Resistive Networks
N = 100 m = 0.01
Genetic vs. geographic distance:
Circuit model predictions:
23
Early Experience with Real Genetic Data
•
Good results with wolverines,
mahogany, pumas
•
Matlab implementation
•
Needed:
24
–
Finer resolution
–
Larger landscapes
–
Faster interaction
5km resolution(too coarse)
Circuitscape: Combinatorics and Numerics
•
Model landscape (ideally at 100m resolution for pumas).
•
Initial grid models connections to 4 or 8 neighbors.
•
Partition landscape into connected components via GAPDT
•
Use GAPDT to contract habitats into single graph nodes.
•
Compute resistance for pairs of habitats .
•
Direct methods are too slow for largest problems.
•
Use iterative solvers via Star-P:Hypre (PCG+AMG)
25
Parallel Circuitscape Results
Pumas in southern California:
–
12 million nodes
–
Under 1 hour (16 processors)
–
Original code took 3 days at
coarser resolution
Targeting much larger problems:
–
26
Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS
Sparse Matrix times Sparse Matrix
•
27
A primitive in many array-based graph algorithms:
–
Parallel breadth-first search
–
Shortest paths
–
Graph contraction
–
Subgraph / submatrix indexing
–
Etc.
•
Graphs are often not mesh-like, i.e. geometric locality
and good separators.
•
Often do not want to optimize for one repeated
operation, as in matvec for iterative methods
Sparse Matrix times Sparse Matrix
•
28
Current work:
–
Parallel algorithms with 2D data layout
–
Sequential and parallel hypersparse algorithms
–
Matrices over semirings
ParSpGEMM
J
K
B(K,J)
K
I
*
=
C(I,J)
A(I,K)
C(I,J) += A(I,K)*B(K,J)
• Based on SUMMA
• Simple for non-square
matrices, etc.
29
How Sparse? HyperSparse !
nnz(j) =
p
c
0
p
nnz(j) = c
p blocks
 Any local data structure that depends on local submatrix
dimension n (such as CSR or CSC) is too wasteful.
30
SparseDComp Data Structure
31
•
“Doubly compressed” data structure
•
Maintains both DCSC and DCSR
•
C = A*B needs only A.DCSC and B.DCSR
•
4*nnz values communicated for A*B in the worst case
(though we usually get away with much less)
Sequential Operation Counts
•
Matlab: O(n+nnz(B)+f)
•
SpGEMM: O(nzc(A)+nzr(B)+f*logk)
Required non- zero
operations (flops)
Number of columns
of A containing at
least one non-zero
Break-even
point
32
Parallel Timings
time vs n/nnz, log-log plot
• 16-processor Opteron,
hypertransport,
64 GB memory
• R-MAT * R-MAT
• n = 220
• nnz = {8, 4, 2, 1, .5} * 220
33
Matrices over Semirings
•
Matrix multiplication C = AB (or matrix/vector):
Ci,j = Ai,1B1,j + Ai,2B2,j + · · · + Ai,nBn,j
•
Replace scalar operations  and + by
 : associative, distributes over , identity 1
 : associative, commutative, identity 0 annihilates under 
34
•
Then Ci,j = Ai,1B1,j  Ai,2B2,j  · · ·  Ai,nBn,j
•
Examples: (,+) ; (and,or) ; (+,min) ; . . .
•
Same data reference pattern and control flow
Remarks
• Tools for combinatorial methods built on parallel
sparse matrix infrastructure
• Easy-to-use interactive programming environment
– Rapid prototyping tool for algorithm development
– Interactive exploration and visualization of data
• Sparse matrix * sparse matrix is a key primitive
• Matrices over semirings like (min,+) as well as (+,*)
35
Download