Introduction Background Chunky Implementation Between Sparse and Dense Arithmetic Daniel S. Roche Computer Science Department United States Naval Academy NARC Seminar November 28, 2012 Interpolation Introduction Background Chunky Implementation Interpolation The Problem People want to compute with really big numbers and polynomials. Two basic choices for representation: • Dense — wasted space, but fast algorithms • Sparse — compact storage, slower algorithms The goal: Alternative representations and algorithms that go smoothly between these two options Introduction Background Chunky Implementation Interpolation Application: Cryptography Public key cryptography is used extensively in communications. There are two popular flavors: RSA Requires integer computations modulo a large integer (thousands of bits). Long integer multiplication algorithms are generally the same as those for (dense) polynomials. ECC Usually requires computations in a finite extension field — i.e. computations modulo a polynomial (degree in the hundreds). In both cases, sparse integers/polynomials are used to make schemes more efficient. Introduction Background Chunky Implementation Application: Nonlinear Systems Nonlinear systems of polynomial equations can be used to describe and model a variety of physical phenomena. Numerous methods can be used to solve nonlinear systems, but usually: • Inputs are sparse multivariate polynomials • Intermediate values become dense. One approach (used in triangular sets) simply switches from sparse to dense methods heuristically. Interpolation Introduction Background Chunky Implementation Current Focus: Polynomial Multiplication • Addition/subtraction of polynomials is trivial. • Division uses multiplication as a subroutine. • Multiplication is the most important basic computational problem on polynomials. More application areas • Coding theory • Symbolic computation • Scientific computing • Experimental mathematics Interpolation Introduction Background Chunky Implementation What is a polynomial? A polynomial is any formula involving +, −, × on indeterminates and constants from a ring R. Examples with integer coefficients (R = Z) x10 + x9 + x8 + x7 + x6 + 1 Interpolation Introduction Background Chunky Implementation What is a polynomial? A polynomial is any formula involving +, −, × on indeterminates and constants from a ring R. Examples with integer coefficients (R = Z) x10 + x9 + x8 + x7 + x6 + 1 4x10 − 3x8 − x7 + 3x6 + x5 − 2x4 + 2x3 + 5x2 Interpolation Introduction Background Chunky Implementation What is a polynomial? A polynomial is any formula involving +, −, × on indeterminates and constants from a ring R. Examples with integer coefficients (R = Z) x10 + x9 + x8 + x7 + x6 + 1 4x10 − 3x8 − x7 + 3x6 + x5 − 2x4 + 2x3 + 5x2 x451 − 9x324 − 3x306 + 9x299 + 4x196 − 9x155 − 2x144 + 10x27 Interpolation Introduction Background Chunky Implementation What is a polynomial? A polynomial is any formula involving +, −, × on indeterminates and constants from a ring R. Examples with integer coefficients (R = Z) x10 + x9 + x8 + x7 + x6 + 1 4x10 − 3x8 − x7 + 3x6 + x5 − 2x4 + 2x3 + 5x2 x451 − 9x324 − 3x306 + 9x299 + 4x196 − 9x155 − 2x144 + 10x27 x426 − 6x273 y399 z2 + 10x246 yz201 − 10x210 y401 − 3x21 yz − 9z12 Interpolation Introduction Background Chunky Implementation Polynomial Representations Let f = 7 + 5xy8 + 2x6 y2 + 6x6 y5 + x10 . Dense representation: 0 0 0 0 0 0 0 0 7 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Degree d n variables t nonzero terms Dense size: O(dn ) coefficients Interpolation Introduction Background Chunky Implementation Polynomial Representations Let f = 7 + 5xy8 + 2x6 y2 + 6x6 y5 + x10 . Recursive dense representation: 0 0 0 0 0 0 0 0 7 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 Degree d n variables t nonzero terms Recursive dense size: O(tdn) coefficients Interpolation Introduction Background Chunky Implementation Polynomial Representations Let f = 7 + 5xy8 + 2x6 y2 + 6x6 y5 + x10 . Sparse representation: Degree d n variables t nonzero terms 0 8 2 5 0 0 1 6 6 10 7 5 2 6 1 Sparse size: O(t) coefficients O(tn log d) bits Interpolation Introduction Background Chunky Aside: Cost Measures Measure of Success The “cost” of an algorithm is measured as its rate of growth as the input size increases. Most relevant costs: • O(n): Linear time Intractable when n ≥ 1012 or so. • O(n log n): Linearithmic time Intractable when n ≥ 1010 or so. • O(n2 ): Quadratic time Intractable when n ≥ 106 or so. Implementation Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × 332 213 Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × 332 213 6 Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × 332 213 96 Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × 332 213 996 Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × 332 213 996 332 664 Interpolation Introduction Background Chunky Implementation Direct Multiplication Most methods work by directly multiplying coefficients, adding them up, and so on. Example: “School” Multiplication × + 332 213 996 332 664 70716 Total cost: O(n2 ) (quadratic) Interpolation Introduction Background Chunky Implementation Interpolation Indirect Multiplication Some faster methods do their work in an alternate representation: 1 Convert input polynomials to alternate representation 2 Multiply in the alternate representation 3 Convert the product back to the original form FFT-Based Multiplication mod 5 f = 2x + 3 g = x2 + 2 x + 3 Introduction Background Chunky Implementation Interpolation Indirect Multiplication Some faster methods do their work in an alternate representation: 1 Convert input polynomials to alternate representation 2 Multiply in the alternate representation 3 Convert the product back to the original form FFT-Based Multiplication mod 5 f = 2x + 3 1 g = x2 + 2 x + 3 Evaluate each polynomial at x = 1, 3, 4, 2: falt = [0, 4, 1, 2] galt = [1, 3, 2, 1] Introduction Background Chunky Implementation Interpolation Indirect Multiplication Some faster methods do their work in an alternate representation: 1 Convert input polynomials to alternate representation 2 Multiply in the alternate representation 3 Convert the product back to the original form FFT-Based Multiplication mod 5 f = 2x + 3 1 g = x2 + 2 x + 3 Evaluate each polynomial at x = 1, 3, 4, 2: falt = [0, 4, 1, 2] 2 galt = [1, 3, 2, 1] Multiply the evaluations pairwise: (fg)alt = [0, 2, 2, 2] Introduction Background Chunky Implementation Interpolation Indirect Multiplication Some faster methods do their work in an alternate representation: 1 Convert input polynomials to alternate representation 2 Multiply in the alternate representation 3 Convert the product back to the original form FFT-Based Multiplication mod 5 f = 2x + 3 1 g = x2 + 2 x + 3 Evaluate each polynomial at x = 1, 3, 4, 2: falt = [0, 4, 1, 2] 2 galt = [1, 3, 2, 1] Multiply the evaluations pairwise: (fg)alt = [0, 2, 2, 2] 3 Interpolate at x = 1, 3, 4, 2: fg = 2 x3 + 2 x2 + 2 x + 4 Introduction Background Chunky Implementation Interpolation Dense Multiplication Algorithms Cost (in ring operations) of multiplying two univariate dense polynomials with degrees less than d: Cost Method Classical Method O(d2 ) Direct Divide-and-Conquer Karatsuba ’63 O(dlog2 3 ) or O(d1.59 ) Direct FFT-based Schönhage/Strassen ’71 Cantor/Kaltofen ’91 O(d log d llog d) Indirect We write M(d) for this cost. Introduction Background Chunky Implementation Interpolation Sparse Multiplication Algorithms Cost of multiplying two univariate sparse polynomials with degrees less than d and at most t nonzero terms: Cost Method Naı̈ve O(t3 log d) Direct Geobuckets (Yan ’98) O(t2 log t log d) Direct Heaps (Johnson ’74) (Monagan & Pearce ’07) O(t2 log t log d) Direct Introduction Background Chunky Implementation Interpolation Adaptive multiplication Goal: Develop new algorithms whose cost smoothly varies between existing dense and sparse methods. Ground rules 1 The cost must never be greater than any standard dense or sparse algorithm. 2 The cost should be less than both in many “easy cases”. Primary technique: Develop indirect methods for the sparse case. Introduction Background Overall Approach Overall Steps 1 Recognize structure 2 Change rep. to exploit structure 3 Multiply 4 Convert back • Step 3 cost depends on instance difficulty. • Steps 1, 2, 4 must be fast (linear time). Chunky Implementation Interpolation Introduction Background Chunky Implementation Chunky Polynomials Example • f = 5x6 + 6x7 − 4x9 − 7x52 + 4x53 + 3x76 + x78 Interpolation Introduction Background Chunky Implementation Chunky Polynomials Example • f = 5x6 + 6x7 − 4x9 − 7x52 + 4x53 + 3x76 + x78 • f1 = 5 + 6x − 4x3 , • f = f1 x6 + f2 x52 + f3 x76 f2 = −7 + 4x, f3 = 3 + x 2 Interpolation Introduction Background Chunky Implementation Interpolation Chunky Multiplication Sparse algorithms on the outside, dense algorithms on the inside. • Exponent arithmetic stays the same. • Coefficient arithmetic is more costly. • Terms in product may have more overlap. Theorem Given f = f1 xe1 + f2 xe2 + · · · + ft xet g = g1 xd1 + g2 xd2 + · · · + gs xds , the cost of chunky multiplication (in ring operations) is deg gj O (deg fi ) · M deg fi deg f ≥deg g X i j 1≤i≤t, 1≤j≤s ! + deg fi (deg gj ) · M deg gj deg f <deg g X i j 1≤i≤t, 1≤j≤s !! . Introduction Background Chunky Implementation Conversion to the Chunky Representation Initial idea: Convert each operand independently, then multiply in the chunky representation. But how to minimize the nasty cost measure? Theorem Any independent conversion algorithm must result in slower multiplication than the dense or sparse method in some cases. Interpolation Introduction Background Chunky Implementation Two-Step Chunky Conversion First Step: Optimal Chunk Size Suppose every chunk in both operands was forced to have the same size k. This simplifies the cost to t(k) · s(k) · M(k), where t(k) is the least number of size-k chunks to make f . First conversion step: Compute the optimal value of k to minimize this simplified cost measure. Interpolation Introduction Background Chunky Implementation Computing the optimal chunk size Optimal chunk size computation algorithm: 1 Create two min-heaps with all “gaps” in f and g, ordered on the size of resulting chunk if gap were removed. 2 Remove all gaps of smallest priority and update neighbors 3 Approximate t(k), s(k) by the size of the heaps, and compute t(k) · s(k) · M(k) 4 Repeat until no gaps remain. With careful implementations, this can be made linear-time in either the dense or sparse representation. Constant factor approximation; ratio is 4. Interpolation Introduction Background Chunky Implementation Computing the optimal chunk size Optimal chunk size computation algorithm: 1 Create two min-heaps with all “gaps” in f and g, ordered on the size of resulting chunk if gap were removed. 2 Remove all gaps of smallest priority and update neighbors 3 Approximate t(k), s(k) by the size of the heaps, and compute t(k) · s(k) · M(k) 4 Repeat until no gaps remain. With careful implementations, this can be made linear-time in either the dense or sparse representation. Constant factor approximation; ratio is 4. Observe: We must compute the cost of dense multiplication! Interpolation Introduction Background Chunky Implementation Two-Step Chunky Conversion Second Step: Conversion given chunk size After computing “optimal chunk size”, conversion proceeds independently. We compute the optimal chunky representation for multiplying by a single size-k chunk. Idea: For each gap, maintain a linked list of all previous gaps to include if the polynomial were truncated here. Algorithm: Increment through gaps, each time finding the last gap that should be included. Interpolation Introduction Background Chunky Implementation Interpolation Conversion given optimal chunk size The algorithm uses two key properties: • Chunks larger than k bring no benefit. • For smaller chunks, we want to minimize X M(deg fi ) i deg fi . Theorem Our algorithm computes the optimal chunky representation for multiplying by a single size-k chunk. Its cost is linear in the dense or sparse representation size. Introduction Background Chunky Implementation Chunky Multiplication Overview Input: f , g ∈ R[x], either in the sparse or dense representation Output: Their product f · g, in the same representation 1 Compute approximation to “optimal chunk size” k, looking at both f and g simultaneously. 2 Convert f to optimal chunky representation for multiplying by a single size-k chunk. 3 Convert g to optimal chunky representation for multiplying by a single size-k chunk. 4 Multiply pairwise chunks using dense multiplication. 5 Combine terms and write out the product. Interpolation Introduction Background Chunky Implementation Interpolation Timings vs “Chunkiness” 1.4 Time over dense time 1.2 1 0.8 0.6 0.4 Dense Sparse Adaptive 0.2 0 0 5 10 15 20 Chunkiness 25 30 35 40 Introduction Background Chunky Implementation Interpolation Timings without imposed chunkiness 100 Time (seconds) 10 1 0.1 0.01 0.001 Degree 10000 Sparsity Dense Sparse Adaptive 100000 1e+06 10000 3000 1000 1e+07 300 1e+08 100 Introduction Background Chunky Implementation Interpolation The Ultimate Goal? Adaptive methods go between linearithmic-time dense algorithms and quadratic-time sparse algorithms. • Output size of dense multiplication is always linear. • Output size of sparse multiplication is at most quadratic, but might be much less • Can we have linearithmic output-sensitive cost? Note: This is the best we can hope for and would generalize the “chunky” approach. Introduction Background Chunky Implementation Interpolation Summary • New indirect multiplication methods to go between existing sparse and dense multiplication algorithms. • Chunky multiplication is never (asymptotically) worse than existing approaches, but can be much better in well-structured cases. • Recent results on sparse FFTs and sparse interpolation may lead to a significant breakthrough in theory. • Much work remains to be done!