Graph Algorithms in
Numerical Linear Algebra:
Past, Present, and Future
John R. Gilbert
MIT and UC Santa Barbara
September 28, 2002
Outline
• Past (one extended example)
• Present (two quick examples)
• Future (6 or 8 examples)
A few themes
• Paths
• Locality
• Eigenvectors
• Huge data sets
• Multiple scales
PAST
Graphs and sparse Gaussian elimination (1961-)
Fill: new nonzeros in factor
[Figure: graph G(A) on vertices 1-10 and its filled graph G+(A), which is chordal]
Cholesky factorization:
for j = 1 to n
  add edges between j's higher-numbered neighbors
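The elimination rule above can be simulated directly on the adjacency structure. A hypothetical Python sketch (not from the talk), working on an edge list for G(A):

```python
def symbolic_cholesky_fill(n, edges):
    # Build the undirected adjacency structure of G(A), vertices 1..n.
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    fill = set()
    # Eliminate vertices in order; each step connects the current
    # vertex's higher-numbered neighbors pairwise (the rule above).
    for j in range(1, n + 1):
        higher = sorted(v for v in adj[j] if v > j)
        for a in higher:
            for b in higher:
                if a < b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.add((a, b))
    return fill

# On the 4-cycle 1-2-3-4, eliminating vertex 1 joins neighbors 2 and 4:
print(symbolic_cholesky_fill(4, [(1, 2), (2, 3), (3, 4), (1, 4)]))  # {(2, 4)}
```

On a chordal graph (e.g. a triangle) the returned fill set is empty, which is exactly why G+(A) is chordal.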
Fill-reducing matrix permutations
[Figures: matrix reordered by nested dissection (nz = 844); vertex separator in graph of matrix; elimination tree with nested dissection]
• Theory: approx optimal separators => approx optimal fill and flop count
• Orderings: nested dissection, minimum degree, hybrids
• Graph partitioning: spectral, geometric, multilevel
Directed graph
[Figure: matrix A and its directed graph G(A)]
• A is square, unsymmetric, nonzero diagonal
• Edges from rows to columns
• Symmetric permutations PAPT
Symbolic Gaussian elimination
[Figure: matrix A, its factors L+U, and the filled graph G+(A)]
• Add fill edge a → b if there is a path from a to b through lower-numbered vertices.
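The same path rule can be sketched in Python (hypothetical code, eliminating vertices in order; each elimination of k turns every pair a → k → b with a, b > k into a fill edge a → b):

```python
def symbolic_lu_fill(n, edges):
    # succ[v] = targets of directed edges v -> w in G(A), vertices 1..n.
    succ = {v: set() for v in range(1, n + 1)}
    for a, b in edges:
        succ[a].add(b)
    fill = set()
    # Eliminating k splices paths through k: a -> k -> b becomes a -> b,
    # which accumulates exactly the "path through lower-numbered
    # vertices" fill edges described above.
    for k in range(1, n + 1):
        preds = [a for a in succ if a > k and k in succ[a]]
        for a in preds:
            for b in succ[k]:
                if b > k and b != a and b not in succ[a]:
                    succ[a].add(b)
                    fill.add((a, b))
    return fill

# Path 3 -> 1 -> 2 runs through lower-numbered vertex 1, so 3 -> 2 fills in:
print(symbolic_lu_fill(3, [(3, 1), (1, 2)]))  # {(3, 2)}
```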
Strongly connected components
[Figure: directed graph G(A) and its symmetric permutation PAPT to block triangular form]
• Symmetric permutation to block triangular form
• Solve Ax = b by block back substitution
• Irreducible (strong Hall) diagonal blocks
• Row and column partitions are independent of choice of nonzero diagonal
• Find P in linear time by depth-first search
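That depth-first search step can be sketched in Python. This hypothetical sketch uses Kosaraju's two-pass variant (the talk's linear-time bound is usually stated for Tarjan's one-pass algorithm, but both are O(n + nnz)):

```python
def strongly_connected_components(succ):
    # succ: {vertex: list of successors} for the directed graph G(A).
    # Pass 1: iterative DFS, recording vertices in order of finishing.
    order, seen = [], set()
    for s in succ:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(succ[s]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    advanced = True
                    break
            if not advanced:
                order.append(v)
                stack.pop()
    # Pass 2: DFS on the reverse graph in decreasing finish order;
    # each search tree is one strongly connected component.
    pred = {v: [] for v in succ}
    for v, ws in succ.items():
        for w in ws:
            pred[w].append(v)
    comps, assigned = [], set()
    for v in reversed(order):
        if v in assigned:
            continue
        comp, frontier = [], [v]
        assigned.add(v)
        while frontier:
            u = frontier.pop()
            comp.append(u)
            for w in pred[u]:
                if w not in assigned:
                    assigned.add(w)
                    frontier.append(w)
        comps.append(comp)
    return comps

# Vertices 1 and 2 form a cycle; vertex 3 is its own component:
print(strongly_connected_components({1: [2], 2: [1, 3], 3: []}))  # [[1, 2], [3]]
```

The components come out in topological order of the condensation, so listing them in this order gives the diagonal blocks of the block upper triangular form, and reversing it gives the back-substitution order.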
Sparse-sparse triangular solve
[Figure: sparse triangular system Lx = b and the graph G(LT)]
1. Symbolic: predict structure of x by depth-first search from nonzeros of b
2. Numeric: compute values of x in topological order
Time = O(flops)
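A hypothetical Python sketch of both steps, with L stored column-wise as a dict of dicts (any names here are illustrative, not from the talk):

```python
def sparse_lower_solve(L, b):
    # L: sparse lower triangular matrix as {col j: {row i: value}},
    # with every diagonal entry L[j][j] present; b: sparse {row: value}.
    # 1. Symbolic step: depth-first search from the nonzeros of b
    #    predicts which entries of x can become nonzero.
    reach, stack = set(), list(b)
    while stack:
        j = stack.pop()
        if j in reach:
            continue
        reach.add(j)
        stack.extend(i for i in L[j] if i != j and i not in reach)
    # 2. Numeric step: compute values in topological order (ascending
    #    index order is topological for a lower triangular L).
    x = dict(b)
    for j in sorted(reach):
        x[j] = x.get(j, 0.0) / L[j][j]
        for i, lij in L[j].items():
            if i != j:
                x[i] = x.get(i, 0.0) - lij * x[j]
    return {j: x[j] for j in sorted(reach)}

L = {1: {1: 2.0, 3: 1.0}, 2: {2: 1.0}, 3: {3: 2.0}}
print(sparse_lower_solve(L, {1: 4.0}))  # {1: 2.0, 3: -1.0}
```

Only the reached rows are ever touched, so the work is proportional to the flops, not to n.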
Column intersection graph
[Figure: matrix A, product ATA, and column intersection graph G∩(A)]
• G∩(A) = G(ATA) if no cancellation (otherwise G∩(A) ⊇ G(ATA))
• Permuting the rows of A does not change G∩(A)
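A hypothetical Python sketch that builds the column intersection graph row by row (columns are adjacent exactly when they share a row with nonzeros):

```python
from itertools import combinations

def column_intersection_graph(cols):
    # cols: {column index: set of row indices of its nonzeros}.
    # Group columns by the rows they occupy; each row's columns
    # form a clique, which is the structure of A'A (barring cancellation).
    by_row = {}
    for c, rows in cols.items():
        for r in rows:
            by_row.setdefault(r, []).append(c)
    edges = set()
    for cs in by_row.values():
        edges.update(combinations(sorted(cs), 2))
    return edges

# Columns 1 and 2 share row 2; column 3 touches no shared row:
print(column_intersection_graph({1: {1, 2}, 2: {2}, 3: {3}}))  # {(1, 2)}
```

Permuting the rows of A only relabels the keys of `by_row`, which is why the graph is invariant under row permutations, as the bullet above notes.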
Filled column intersection graph
[Figure: matrix A, factor chol(ATA), and filled column intersection graph G∩+(A)]
• G∩+(A) = symbolic Cholesky factor of ATA
• In PA = LU, G(U) ⊆ G∩+(A) and G(L) ⊆ G∩+(A)
• Tighter bound on L from symbolic QR
• Bounds are best possible if A is strong Hall
Column elimination tree
[Figure: matrix A, factor chol(ATA), and column elimination tree T∩(A)]
• Elimination tree of ATA (if no cancellation)
• Depth-first spanning tree of G∩+(A)
• Represents column dependencies in various factorizations
That story continues . . .
• Left-looking column-by-column LU factorization [Matlab 4.0, 1992]
• Depth-first search to predict structure of each column
• Slow search limited speed
• BLAS-1 limited cache reuse
• . . .
• SuperLU: supernodal BLAS-2.5 LU
• UMFPACK: multifrontal BLAS-3 LU [Matlab 6.5, 2002]
• Ordering for nonsymmetric LU is still not well understood
PRESENT
Support graph preconditioning
[http://www.cs.sandia.gov/~bahendr/support.html]
• Define a preconditioner B for matrix A
• Explicitly compute the factorization B = LU
• Choose nonzero structure of B to make factoring cheap
(using combinatorial tools from direct methods)
• Prove bounds on condition number using both
algebraic and combinatorial tools
Support graph preconditioner: example
[Figure: graph G(A) and its spanning tree G(B)] [Vaidya]
• A is symmetric positive definite with negative off-diagonal nonzeros
• B is a maximum-weight spanning tree for A (with diagonal modified to preserve row sums)
• Preconditioning costs O(n) time per iteration
• Eigenvalue bounds from graph embedding: product of congestion and dilation
• Condition number at most O(n²), independent of coefficients
• Many improvements exist
Support graph preconditioner: example
[Figure: graph G(A) and augmented spanning tree G(B)] [Vaidya]
• Can improve congestion and dilation by adding a few strategically chosen edges to B
• Cost of factor + solve is O(n^1.75), or O(n^1.2) if A is planar
• In experiments by Chen & Toledo, often better than drop-tolerance MIC for 2D problems, but not for 3D
Algebraic framework
[Gremban/Miller/Boman/Hendrickson]
• The support of B for A is
σ(A, B) = min{τ : xT(tB − A)x ≥ 0 for all x, all t ≥ τ}
• In the SPD case, σ(A, B) = max{λ : Ax = λBx} = λmax(A, B)
• Theorem 1: If A, B are SPD then κ(B⁻¹A) = σ(A, B) · σ(B, A)
• Theorem 2: If V·W = U, then σ(U·UT, V·VT) ≤ ||W||₂²
Example (a triangle with edge weights a, b, c; the tree B drops the c edge):
A = U·UT = [ a²+b²   −a²     −b²
             −a²     a²+c²   −c²
             −b²     −c²     b²+c² ]
B = V·VT = [ a²+b²   −a²   −b²
             −a²     a²    0
             −b²     0     b² ]
U = [ a   b   0 ]      V = [ a   b  ]      W = [ 1   0   −c/a ]
    [ −a  0   c ]          [ −a  0  ]          [ 0   1    c/b ]
    [ 0  −b  −c ]          [ 0  −b ]
so that V·W = U, and
σ(A, B) ≤ ||W||₂² ≤ ||W||∞ × ||W||₁
        = (max row sum) × (max col sum)
        ≤ (max congestion) × (max dilation)
Open problems I
• Other subgraph constructions for better bounds on ||W||₂²?
• For example [Boman], ||W||₂² ≤ ||W||F² = Σ wij² = sum of (weighted) dilations, and [Alon, Karp, Peleg, West] show there exists a spanning tree with average weighted dilation exp(O((log n loglog n)^1/2)) = o(n^ε); this gives condition number O(n^(1+ε)) and solution time O(n^(1.5+ε)), compared to Vaidya's O(n^1.75) with an augmented spanning tree
• Is there a construction that minimizes ||W||₂² directly?
Open problems II
• Make spanning tree methods more effective in 3D?
• Vaidya: O(n^1.75) in general, O(n^1.2) in 2D
• Issue: 2D uses bounded excluded minors, not just separators
• Analyze a multilevel method in general?
• Extend to more general finite element matrices?
Link analysis of the world-wide web
[Figure: a seven-page web graph and its 7×7 link matrix]
• Web page = vertex
• Link = directed edge
• Link matrix: Aij = 1 if page i links to page j
Web graph: PageRank (Google)
[Brin, Page]
An important page is one that
many important pages point to.
• Markov process: follow a random link most of the time;
otherwise, go to any page at random.
• Importance = stationary distribution of Markov process.
• Transition matrix is p*A + (1-p)*ones(size(A)),
scaled so each column sums to 1.
• Importance of page i is the i-th entry in the principal
eigenvector of the transition matrix.
• But, the matrix is 2,000,000,000 by 2,000,000,000.
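A toy Python sketch of the process (hypothetical code; a real implementation never forms the dense transition matrix, and this sketch assumes every page has at least one out-link):

```python
def pagerank(links, p=0.85, iters=100):
    # links: {page: list of pages it links to}.
    pages = sorted(links)
    n = len(pages)
    rank = {v: 1.0 / n for v in pages}
    for _ in range(iters):
        # With probability p follow a random out-link; otherwise jump
        # to a uniformly random page: the Markov process above.
        nxt = {v: (1.0 - p) / n for v in pages}
        for v in pages:
            share = p * rank[v] / len(links[v])
            for w in links[v]:
                nxt[w] += share
        rank = nxt
    return rank
```

On the 3-cycle `{1: [2], 2: [3], 3: [1]}` the stationary distribution is uniform, so every page gets importance 1/3; the power iteration touches only the sparse link structure, which is what makes the 2,000,000,000-page case feasible.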
Web graph: Hubs and authorities
[Kleinberg]
Hubs: bikereviews.com, phillybikeclub.org, yahoo.com/cycling
Authorities: trekbikes.com, shimano.com, campagnolo.com
A good hub cites good authorities; a good authority is cited by good hubs.
• Each page has hub score xi and authority score yi
• Repeat: y ← A*x; x ← AT*y; normalize;
• Converges to principal eigenvectors of AAT and ATA
(left and right singular vectors of A)
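A hypothetical Python sketch of the iteration (the page names echo the cycling example above; max-normalization stands in for whichever norm one prefers):

```python
def hits(links, iters=50):
    # links: {page: list of pages it links to} (the link matrix A).
    pages = sorted(set(links) | {w for ws in links.values() for w in ws})
    hub = {v: 1.0 for v in pages}
    auth = {v: 1.0 for v in pages}
    for _ in range(iters):
        # Authority score sums hub scores of pages linking in;
        # hub score sums authority scores of pages linked to.
        auth = {v: sum(hub[u] for u in pages if v in links.get(u, ()))
                for v in pages}
        hub = {v: sum(auth[w] for w in links.get(v, ())) for v in pages}
        amax = max(auth.values()) or 1.0
        hmax = max(hub.values()) or 1.0
        auth = {v: s / amax for v, s in auth.items()}
        hub = {v: s / hmax for v, s in hub.items()}
    return hub, auth
```

Up to normalization, the hub and authority vectors converge to the principal eigenvectors of AAT and ATA, i.e. the left and right singular vectors of A.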
FUTURE
Combinatorial issues in numerical linear algebra
• Approximation theory for nonsymmetric LU ordering
• Preconditioning
• Complexity of matrix multiplication
Biology as an information science
“The rate-limiting step of genomics is
computational science.”
- Eric Lander
• Sequence assembly, gene identification, alignment
• Large-scale data mining
• Protein modeling: discrete and continuous
Linear and sublinear algorithms for huge problems
“Can we understand anything interesting about our data
when we do not even have time to read all of it?”
- Ronitt Rubinfeld
Fast Monte Carlo algorithms for finding
low-rank approximations
[Frieze, Kannan, Vempala]
• Describe a rank-k matrix B0 that is within ε of the best
rank-k approximation to A:
||A – B0||F ≤ minB ||A - B||F + ε ||A||F
• Correct with probability at least 1 – δ
• Time polynomial in k, 1/ε, log(1/δ); independent of size of A
• Idea: using a clever distribution, sample an
O(k-by-k) submatrix of A and compute its SVD
(Need to be able to sample A with the right distribution)
Approximating the MST weight in sublinear time
[Chazelle, Rubinfeld, Trevisan]
• Key subroutine: estimate number of connected
components of a graph, in time depending on
expected error but not on size of graph
• Idea: for each vertex v define
f(v) = 1/(size of component containing v)
• Then Σv f(v) = number of connected components
• Estimate Σv f(v) by breadth-first search from a few vertices
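A hypothetical Python sketch of the estimator; for clarity it does a full breadth-first search per sample, whereas the real sublinear algorithm truncates each search:

```python
import random

def estimate_components(adj, samples=200, seed=0):
    # adj: {vertex: list of neighbors}. For each sampled vertex v,
    # f(v) = 1 / |component containing v|; the f-values inside one
    # component sum to exactly 1, so E[n * f] = #components.
    rng = random.Random(seed)
    verts = list(adj)
    total = 0.0
    for _ in range(samples):
        v = rng.choice(verts)
        # BFS to find the size of v's component (the sublinear version
        # stops early, trading accuracy for running time).
        seen, frontier = {v}, [v]
        while frontier:
            u = frontier.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
        total += 1.0 / len(seen)
    return len(verts) * total / samples
```

On `{1: [2], 2: [1], 3: []}` (one edge plus an isolated vertex) the estimate concentrates around the true count of 2, with accuracy governed by the number of samples rather than the size of the graph.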
Modeling and distributed control
[Figure: imagining a molecular-scale crossbar switch (Heath et al., UCLA)]
• Multiresolution modeling for nanomaterials and nanosystems
• Distributed control for systems on micro to global scales
“Active surface” airjet paper mover [Berlin, Biegelsen et al., PARC]
[Figure: 12” x 12” board; PC-hosted DSP control @ 1 kHz; 576 valves (144 per direction); sensors: 32,000 gray-level pixels in 25 linear arrays]
A hard question
How will combinatorial methods be used
by people who don’t understand them
in detail?
Matrix division in Matlab
x = A \ b;
• Works for either full or sparse A
• Is A square?
no => use QR to solve least squares problem
• Is A triangular or permuted triangular?
yes => sparse triangular solve
• Is A symmetric with positive diagonal elements?
yes => attempt Cholesky after symmetric minimum degree
• Otherwise
=> use LU on A(:, colamd(A))
Matching and depth-first search in Matlab
• dmperm: Dulmage-Mendelsohn decomposition
• Bipartite matching followed by strongly connected components
• Square, full rank A:
• [p, q, r] = dmperm(A);
• A(p,q) has nonzero diagonal and is in block upper triangular form
• also, strongly connected components of a directed graph
• also, connected components of an undirected graph
• Arbitrary A:
• [p, q, r, s] = dmperm(A);
• maximum-size matching in a bipartite graph
• minimum-size vertex cover in a bipartite graph
• decomposition into strong Hall blocks
A few themes
• Paths
• Locality
• Eigenvectors
• Huge data sets
• Multiple scales
• Usability
Morals
• Things are clearer if you look at them from two directions
• Combinatorial algorithms are pervasive in scientific
computing and will become more so
• What are the implications for teaching?
• What are the implications for software development?
Thanks
Patrick Amestoy, Erik Boman, Iain Duff, Mary Ann Branch Freeman, Bruce Hendrickson, Esmond Ng, Alex Pothen, Padma Raghavan, Ronitt Rubinfeld, Rob Schreiber, Sivan Toledo, Paul Van Dooren, Steve Vavasis, Santosh Vempala, ...