Solving SDD Linear Systems in Nearly mlog1/2n Time Richard Peng M.I.T. A&C Seminar, Sep 12, 2014 OUTLINE • • • • The Problem Approach 1 Approach 2 Combining these LARGE GRAPHS Images Roads Social networks Meshes Algorithmic challenges: How to store? How to analyze? How to optimize? LAPLACIAN PARADIGM Graph Electrical network 1Ω 1Ω 2Ω Linear systems GRAPH LAPLACIAN Row/column vertex Off-diagonal -weight Diagonal weighted degree 1Ω 1Ω 2Ω Electrical network/ Weighted graph Properties • Symmetric • Row/column sums = 0 • Non-positive off-diagonals CORE ROUTINE Input: graph Laplacian L, vector b Output: vector x s.t. Lx ≈ b To measure performance: n: dimension, # vertices m: nnz, O(# of edges) APPLICATIONS Directly related: SDD, M matrices Elliptic PDEs Few iterations: eigenvectors, heat kernels Clustering Many iterations: Combinatorial graph problems Inference Image processing Random trees Flow / matching GENERAL LINEAR SYSTEM SOLVES Oldest studied algorithmic problem • [Liu 179]…[Newton `1710] [Gauss 1810]: O(n3) • [HS `52]: conjugate gradient, O(nm)* • [Strassen `69] O(n2.8) • [Coopersmith-Winograd `90] O(n2.3755) • [Stothers `10] O(n2.3737) • [Vassilevska Williams`11] O(n2.3727) COMBINATORIAL PRECONDITIONING 1Ω 1Ω 2Ω Use the connection to graphs to design numerical solvers • [Vaidya`91]: O(m7/4) Subsequent improvements: • [Elkin-Emek-Spielman-Teng `05] • [Batson-Spielman-Srivastava `09] • [Boman-Hendrickson`01] O(mn) • [Andersen-Chung-Lang `06] • `03] [Kolla-Makarychev-Saberi-Teng `10] • [Spielman-Teng O(m1.31) • [Koutis-Miller `07] [Koutis-Miller-P • [Spielman-Teng• `04] O(mlogc`10, n) `11] • • • [Spielman-Srivastava `08] [Abraham-Bartal-Neiman `08] [Andersen-Perez `09] … • • • [Orecchia-Vishnoi `11] [Abraham-Neiman `12] [OveisGharan-Trevisan `12] … ZENO’S DICHOTOMY PARADOX O(mlogcn) Fundamental theorem of Laplacian solvers: improvements decrease c by factor between [2,3] 2004: 70 2010: 6 2006: 32 2009: 15 c 2010: 2 2011: 1 2014: 1/2 OPT: 0? [Miller]: Instance of speedup theorem? OUTLINE • • • • The Problem Approach 1 Approach 2 Combining these WHAT IS A SOLVE? What is b = Lx? 3x3 – x2 – 2x1 = (x3 – x2) + 2(x3 – x1) 1Ω x2 1Ω 2Ω x3 Flows on edges Ohm’s law: current = voltage × conductance b: residue of electrical current x: voltages at vertices WHAT IS A SOLVE? find voltages x whose flow meets demand b Intermediate object: flows on edge, f [Kirchoff 1845, Doyle-Snell `84]: f minimizes energy dissipation Energy of f = Σer(e)f(e)2 WHAT MAKES SOLVES HARD Densely connected graphs, need: • sampling • error reduction Long paths, need: • Divide-and-conquer • Data structures Solvers need to combine both KELNER-ORECCHIA-SIDFORD-ZHU `13: ELECTRICAL FLOW SOLVER Start with some flow f meeting demand b Push flows along cycles in the graph to reduce energy ` [KOSZ `13]: energy of f approaches optimum Algorithmic challenges: How many cycles? Ω(m) How big is each cycle? Ω(n) HOW TO FIND CYCLES? ` ` Cycle = tree + edge: • Pick a tree • Sample off-tree edges ` HOW TO SAMPLE EDGES [KMP `10, `11, KOSZ `13]: Sample edges w.p. proportional to stretch Stretch = 4 • Stretch = length of edge / length of path in tree • Unweighted graph: length of tree path ` Stretch = 3 Key quantity: total stretch, S [KOSZ `13]: O(m + S) cycles halves error in expectation WHAT’S A LOW STRETCH TREE? n1/2-by-n1/2 unit weighted mesh Candidate 1: 2: ‘haircomb’: recursive C ‘fractal’ Stretch stretch(e)= = n1/2 O(1) 1/2) But stretch(e)=Ω(n only n1/2 such edges, Contribution = O(n) 3/2 total stretch = Ω(n ) • shortestO(logn) path tree such layers due to fractal, • max weight Total spanning = O(nlogn) tree LOW STRETCH TREES Runtime AKPW `91 O(m log log n) exp((log n log log n)1/2) Bartal 96, 98 O(m log n) FRT `03 EEST `05 ABN `08 AN `12 Stretch O(log n log log n) [CKMPX`14]: O(m log3 n) O(log n) Construction in O(mloglogn) time, log 2Sn= O(mlogn) O(log2 n log log n) willO(m assume (actual bounds more intricate) O(m log n) O(log n (log log n)c) O(m log n) O(log n log log n) Bartal / FRT trees have steiner vertices [KOSZ `13]: embeddable trees are ok! [KOSZ `13] IN A NUTSHELL [KOSZ `13] Converges on Flows Sampling Distribution Stretch Sample size 1 # updates O(S + m) Cost per update O(n) O(logn) using data structures Total (S = O(mlogn)) O(mnlogn) O(mlog2n) OUTLINE • • • • The Problem Approach 1 Approach 2 Combining these NUMERICAL VIEW OF SOLVERS Iterative methods: given H similar to G, solve LGx = b by solving several LHy = r Similar: LG ≼ LH ≼ kLG ≼ : Loewner ordering, A ≼ B xTAx ≤ xTBx for all x Chebyshev iteration: If LG ≼ LH ≼ kLG, can halve error in LGx = b by solving O(k1/2) problems in H to O(1/poly(k)) accuracy NUMERICAL VIEWS OF SOLVERS Chebyshev iteration (ignoring errors): If LG ≼ LH ≼ kLG, can solve problem in G by solving O(k1/2) problems in H Preconditioner construction view: • Given G, find H that’s easier to solve such that LG ≼ LH ≼ kLG Perturbation stability view: • Can adjust edge weights by factor of [1,k] • Need to solve O(k1/2) problems on resulting graph GENERATING H [KOSZ `13]: sample 1 edge, reduce error by 1-1/(m + S) O(m + S) samples halves error Matrix Chernoff: O(S logn) samples gives LG ≼ 2LH ≼ 3LG . ` ` Can halve error in LGx=b via. O(1) solves in H HOW DOES THIS HELP? Go from G to H with m’ = O(Slogn) off tree edges S = O(mlogn), m’ > m ` ` [KMP `10]: take perturbation stability view, scale up tree by some factor k • factor k distortion, O(k1/2) iterations • S’ = S/k, total stretch decrease by factor of k SIZE REDUCTION Scale up tree by factor of k so S’ = S/k Sample m’ = O(S’logn) = O(Slogn/k) edges • • ` ` • • • n - 1 tree edges + m’ off-tree edges Repeatedly remove degree 1 and 2 vertices New size: O(Slogn/k) T(m, s) = O(k1/2)(m + T(Slogn / k, S/k)) (can show: errors don’t accumulate in recursion) TWO TERM RECURRENCE? W-cycle algorithm: T(m, s) = O(k1/2)(m + T(Slogn / k, S/k)) These are really the same parameter! Key invariant from [KMP `11]: ‘spine-heavy’: S = m/O(logn) TIME ON SPINE HEAVY GRAPHS T(m) = O(k1/2)(m + T(m / k)) = O(m) Low stretch spanning tree: Sini = mlogn Initial scaling: O(log2n), O(logn) overhead More important runtime parameter: how small of S to get O(m) runtime? NUMERICAL SOLVER [KMP `10, `11] Converges on Vectors Sampling Distribution Stretch Cost per update O(1) amortized* O(m) ‘steps’ when S = O(m/logn) *All updates are matrix-vector multiplications Increase when SkS O(k1/2) Total (S = O(mlogn)) O(mlogn) Byproduct of this view: solving to O(logcn) error is `easy’, expected convergence become w.h.p. via. checking with O((loglogn)c) overhead NUMERICAL VS. COMBINATORIAL Converges on [KMP `10, `11] [KOSZ `13] Vectors Flows Sampling Distribution Stretch Stretch Cost per update O(logn) O(m) ‘steps’ when O(1) amortized S = O(m/logn) S = O(m) Increase when SkS O(k1/2) O(k) Total (S = O(mlogn)) O(mlogn) O(mlog2n) COMMONALITY: RANDOMNESS Reason: need to handle complete graph, easiest expander constructions are randomized [KMP `10, `11] [KOSZ `13] Sampling Distribution Stretch Stretch Analysis Method Matrix Chernoff + black box numerical methods Expected convergence, single potential function Sample Count arbitrary, O(m) 1 Overhead needed for convergence O(logn) (from union bound on n dimensions) O(1) Analogs Multigrid methods Preconditioning Stochastic methods Kaczmarz iteration Randomized descent Stretch is the ‘right’ parameter when we use trees due to the Sherman-Morrison formula NUMERICAL VS. COMBINATORIAL [KMP `10, `11] [KOSZ `13] Cost per update O(1) amortized O(logn) O(m) ‘steps’ when S = O(m/logn) S = O(m) Increase when SkS O(k1/2) O(k) • • • • • + Sublinear sample count + Recursive, O(1) per udpate - LG ≼ LH ≼ kLG, O(logn) overhead + Overhead: (S / m)1/2 - Recursive error propagation • • • • • - Ω(m) samples - Data structure, O(logn) + Adaptive convergence - Linear dependence on S + Simple error propagation COMBINE THESE? Can the adaptive convergence guarantees work with the recursive Chebyshev? Consequence : T(m, s) = O(k1/2)(m + T(S / k, S / k))) This Talk Converges on Vectors and Flows Sampling Distribution Stretch Cost per update O(1) amortized O(m) ‘steps’ when S = O(m) Increase when SkS O(k1/2) Total (S = O(mlogn)) O(mlog1/2n) OUTLINE • • • • The Problem Approach 1 Approach 2 Combining these DIRECT MODIFICATION Use Chebyshev iteration Thought`13]? Experiment [KOSZoutside `13] of [KOSZ Converges on Flows ???????????? Sampling Distribution Stretch Stretch Cost per update O(logn) O(logn) O(m) ‘steps’ when S = O(m) S = O(m) Increase when SkS O(k) O(k1/2) Total (S = O(mlogn)) O(mlog2n) O(mlog3/2n) Within (loglogn)c of [LS`13], which also uses Chebyshev-like methods Our analyses do make this rigorous CHALLENGES [KMP `10, `11] [KOSZ `13] Sample size m/k 1 Converges on Vectors Flows Cost per O(1) O(logn) update amortized 1. Multiple updates, instead of a few 2. Flows vs. voltages 3. Remove data structure overhead BATCHED CYCLE UPDATES This only gives flows, ` Chebyshev iteration works with voltages ` • • • Each cycle updates lowers flow energy Changing one after another is still valid flow update when considering both Updating together can only be better! MULTI-EDGE SAMPLING ` Expected error reduction from [KOSZ `13] extends to multiple edges, in voltage space Major modifications: • Voltages instead of flows • Sublinear # of samples • O(m + S) O(S) • O(m) term goes into iterative method • Only prove this for iterative refinement • Reduce to black-box Chebyshev • Simpler analysis, at cost of (loglogn)c WHAT HAPPENED TO THE DATA STRUCTURE? Path updates/queries in a tree in O(logn) time Amortized into: • Vertex elimination • Recursion GETTING RID OF LOG Recursive Chebyshev: T(m) = k1/2(m + T(m / k)) Call structure is ‘bottom light’: • Total size of bottom level calls is o(m) • Cost dominated by top level size, O(m), instead of being same at all O(logn) levels MLOG1/2N [KMP `10, `11] [KOSZ `13] [CKMPPRX`14] Converges on Vectors Flows Vectors/Flows Sampling Distribution Stretch Stretch Stretch Cost per update O(1) amortized O(logn) O(1) amortized O(m) ‘steps’ when S = O(m/logn) S = O(m) S = O(m/(loglogn)c) Increase when SkS O(k1/2) O(k) O(k1/2) Total (S = O(mlogn)) O(mlog2n) O(mlog1/2 (loglogn)cn) O(mlogn) Expected convergence as shown in [KOSZ `13] • Is a general phenomenon • Interacts well with numerical algorithms in recursive settings HIDDEN UNDER THE RUG • Error propagation in recursion • Vector based analysis of Chebyshev iteration (instead of operators) • Numerical stability analysis • Construction of better trees by discounting higher stretch OPEN PROBLEMS loglogn factors: how many are real? Extend this analysis? Numerical stability: double precision ok? O(m) time algorithm? (maybe with O(mlogn) pre-processing?) • Other notions of stretch? • Beyond Laplacians • Code packages 2.0. • • • • THANK YOU! Questions?