Between Sparse and Dense Arithmetic
Daniel S. Roche
Computer Science Department
United States Naval Academy
NARC Seminar
November 28, 2012
Outline: Introduction · Background · Chunky · Implementation · Interpolation
The Problem
People want to compute with really big numbers and polynomials.
Two basic choices for representation:
• Dense — wasted space, but fast algorithms
• Sparse — compact storage, slower algorithms
The goal: Alternative representations and algorithms that go
smoothly between these two options
Application: Cryptography
Public key cryptography is used extensively in communications.
There are two popular flavors:
RSA: requires integer computations modulo a large integer (thousands of bits). Long integer multiplication algorithms are generally the same as those for (dense) polynomials.

ECC: usually requires computations in a finite extension field, i.e. computations modulo a polynomial (degree in the hundreds).
In both cases, sparse integers/polynomials are used to make
schemes more efficient.
Application: Nonlinear Systems
Nonlinear systems of polynomial equations can be used to
describe and model a variety of physical phenomena.
Numerous methods can be used to solve nonlinear systems, but
usually:
• Inputs are sparse multivariate polynomials
• Intermediate values become dense.
One approach (used in triangular sets) simply switches from
sparse to dense methods heuristically.
Current Focus: Polynomial Multiplication
• Addition/subtraction of polynomials is trivial.
• Division uses multiplication as a subroutine.
• Multiplication is the most important basic computational
problem on polynomials.
More application areas
• Coding theory
• Symbolic computation
• Scientific computing
• Experimental mathematics
What is a polynomial?
A polynomial is any formula involving +, −, × on indeterminates
and constants from a ring R.

Examples with integer coefficients (R = Z)
x^10 + x^9 + x^8 + x^7 + x^6 + 1
4x^10 − 3x^8 − x^7 + 3x^6 + x^5 − 2x^4 + 2x^3 + 5x^2
x^451 − 9x^324 − 3x^306 + 9x^299 + 4x^196 − 9x^155 − 2x^144 + 10x^27
x^426 − 6x^273 y^399 z^2 + 10x^246 y z^201 − 10x^210 y^401 − 3x^21 y z − 9z^12
Polynomial Representations
Let f = 7 + 5xy^8 + 2x^6y^2 + 6x^6y^5 + x^10.

Dense representation: a (d+1) × (d+1) coefficient array indexed by the exponents of x and y; here only 5 of its 121 entries (7, 5, 2, 6, 1) are nonzero.

Degree d, n variables, t nonzero terms.
Dense size: O(d^n) coefficients.
Polynomial Representations
Let f = 7 + 5xy^8 + 2x^6y^2 + 6x^6y^5 + x^10.

Recursive dense representation: a dense coefficient array in x whose entries are themselves dense polynomials in y; zero entries need no nested array.

Degree d, n variables, t nonzero terms.
Recursive dense size: O(tdn) coefficients.
Polynomial Representations
Let f = 7 + 5xy^8 + 2x^6y^2 + 6x^6y^5 + x^10.

Sparse representation: store only the nonzero terms, each as a coefficient with its exponent vector.

    x exponents:   0  1  6  6  10
    y exponents:   0  8  2  5   0
    coefficients:  7  5  2  6   1

Degree d, n variables, t nonzero terms.
Sparse size: O(t) coefficients, O(tn log d) bits.
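For concreteness, both representations of this example can be built and their storage counted; a minimal Python sketch (the variable names are illustrative, not from the talk):

```python
# f = 7 + 5xy^8 + 2x^6y^2 + 6x^6y^5 + x^10, with d = 10, n = 2, t = 5.
D = 10

# Sparse: one ((deg_x, deg_y), coefficient) pair per nonzero term -- O(t) entries.
sparse = [((0, 0), 7), ((1, 8), 5), ((6, 2), 2), ((6, 5), 6), ((10, 0), 1)]

# Dense: a (D+1) x (D+1) array holding every coefficient -- O(d^n) entries.
dense = [[0] * (D + 1) for _ in range(D + 1)]
for (ex, ey), c in sparse:
    dense[ex][ey] = c

print(len(sparse))                 # 5 stored terms
print(sum(len(r) for r in dense))  # 121 stored coefficients
```

The gap between 5 and 121 entries is exactly the dense representation's "wasted space" from the opening slide.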
Aside: Cost Measures
Measure of Success
The “cost” of an algorithm is measured as its rate of growth as the input size increases.
Most relevant costs:
• O(n): linear time — intractable when n ≥ 10^12 or so.
• O(n log n): linearithmic time — intractable when n ≥ 10^10 or so.
• O(n^2): quadratic time — intractable when n ≥ 10^6 or so.
Direct Multiplication
Most methods work by directly multiplying coefficients, adding
them up, and so on.
Example: “School” Multiplication

        332
      × 213
      -----
        996
       332
      664
      -----
      70716

Total cost: O(n^2) (quadratic)
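The same O(n^2) pattern applies directly to dense polynomials; a minimal sketch with coefficient lists (constant term first):

```python
def classical_mul(f, g):
    """Multiply dense polynomials given as coefficient lists (constant term first).

    Every coefficient of f meets every coefficient of g: O(n^2) ring operations,
    exactly like school multiplication of integers, minus the carries."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            prod[i + j] += fi * gj
    return prod

# (2x + 3)(x^2 + 2x + 3) = 2x^3 + 7x^2 + 12x + 9
print(classical_mul([3, 2], [3, 2, 1]))  # → [9, 12, 7, 2]
```

Treating the digits of 332 and 213 as coefficients and evaluating the product at x = 10 recovers 70716, which is why the integer and dense-polynomial algorithms coincide.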
Indirect Multiplication

Some faster methods do their work in an alternate representation:
1. Convert the input polynomials to the alternate representation
2. Multiply in the alternate representation
3. Convert the product back to the original form

FFT-Based Multiplication mod 5

f = 2x + 3,  g = x^2 + 2x + 3
1. Evaluate each polynomial at x = 1, 3, 4, 2:
   f_alt = [0, 4, 1, 2],  g_alt = [1, 3, 2, 1]
2. Multiply the evaluations pairwise:
   (fg)_alt = [0, 2, 2, 2]
3. Interpolate at x = 1, 3, 4, 2:
   fg = 2x^3 + 2x^2 + 2x + 4
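The three steps can be replayed in Python. This sketch uses naive O(n^2) evaluation and Lagrange interpolation at the powers of ω = 3 (a primitive 4th root of unity mod 5) just to make the transform concrete; a real FFT does both conversions in O(n log n):

```python
P = 5                    # coefficient field Z/5
POINTS = [1, 3, 4, 2]    # powers of omega = 3, a primitive 4th root of unity mod 5

def evaluate(f, x):
    """Evaluate coefficient list f (constant term first) at x, mod P."""
    return sum(c * pow(x, i, P) for i, c in enumerate(f)) % P

def fft_style_mul(f, g):
    # 1. Convert: evaluate both inputs at the four points.
    f_alt = [evaluate(f, x) for x in POINTS]
    g_alt = [evaluate(g, x) for x in POINTS]
    # 2. Multiply in the alternate representation: pointwise products.
    fg_alt = [(a * b) % P for a, b in zip(f_alt, g_alt)]
    # 3. Convert back: Lagrange interpolation through the four values.
    prod = [0, 0, 0, 0]
    for xk, yk in zip(POINTS, fg_alt):
        basis, denom = [1], 1      # expand prod over j != k of (x - xj)
        for xj in POINTS:
            if xj == xk:
                continue
            denom = denom * (xk - xj) % P
            new = [0] * (len(basis) + 1)
            for i, c in enumerate(basis):
                new[i] += -xj * c    # constant part of (x - xj)
                new[i + 1] += c      # x part of (x - xj)
            basis = [v % P for v in new]
        inv = pow(denom, P - 2, P)   # modular inverse via Fermat's little theorem
        for i, c in enumerate(basis):
            prod[i] = (prod[i] + yk * c * inv) % P
    return prod

f = [3, 2]     # 2x + 3
g = [3, 2, 1]  # x^2 + 2x + 3
print(fft_style_mul(f, g))  # → [4, 2, 2, 2], i.e. 2x^3 + 2x^2 + 2x + 4
```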
Dense Multiplication Algorithms
Cost (in ring operations) of multiplying two univariate dense polynomials with degrees less than d:

    Method                                                    Cost                           Type
    Classical                                                 O(d^2)                         Direct
    Divide-and-conquer (Karatsuba '63)                        O(d^(log2 3)) ≈ O(d^1.59)      Direct
    FFT-based (Schönhage/Strassen '71; Cantor/Kaltofen '91)   O(d log d loglog d)            Indirect

We write M(d) for this cost.
Sparse Multiplication Algorithms
Cost of multiplying two univariate sparse polynomials with degrees less than d and at most t nonzero terms:

    Method                                      Cost                 Type
    Naïve                                       O(t^3 log d)         Direct
    Geobuckets (Yan '98)                        O(t^2 log t log d)   Direct
    Heaps (Johnson '74; Monagan & Pearce '07)   O(t^2 log t log d)   Direct
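A sketch of the heap-based approach (a simplified illustration in the spirit of Johnson's method, not Monagan & Pearce's optimized version): the t·s term products are merged through a min-heap keyed on exponent, so the output emerges in sorted order with like terms combined.

```python
import heapq

def sparse_mul(f, g):
    """Multiply sparse polynomials given as sorted (exponent, coefficient) lists.

    A min-heap holds one candidate product per term of f; each pop yields the
    next-smallest output exponent, and equal exponents are combined on the fly."""
    if not f or not g:
        return []
    # Heap entries: (product exponent, index into f, index into g).
    heap = [(f[i][0] + g[0][0], i, 0) for i in range(len(f))]
    heapq.heapify(heap)
    result = []
    while heap:
        e, i, j = heapq.heappop(heap)
        c = f[i][1] * g[j][1]
        if result and result[-1][0] == e:
            result[-1] = (e, result[-1][1] + c)   # combine like terms
        else:
            result.append((e, c))
        if j + 1 < len(g):                        # advance this row of products
            heapq.heappush(heap, (f[i][0] + g[j + 1][0], i, j + 1))
    return [(e, c) for e, c in result if c != 0]

# (1 + x^100)(2 + 3x^100) = 2 + 5x^100 + 3x^200
print(sparse_mul([(0, 1), (100, 1)], [(0, 2), (100, 3)]))  # → [(0, 2), (100, 5), (200, 3)]
```

Note that the exponent 100 never expands into an array of coefficients: the cost depends on t and s, with the degree d entering only through exponent arithmetic.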
Adaptive multiplication
Goal: Develop new algorithms whose cost smoothly varies
between existing dense and sparse methods.
Ground rules
1. The cost must never be greater than any standard dense or sparse algorithm.
2. The cost should be less than both in many “easy cases”.
Primary technique: Develop indirect methods for the sparse case.
Overall Approach

Overall Steps
1. Recognize structure
2. Change representation to exploit structure
3. Multiply
4. Convert back

• Step 3's cost depends on instance difficulty.
• Steps 1, 2, and 4 must be fast (linear time).
Chunky Polynomials

Example
• f = 5x^6 + 6x^7 − 4x^9 − 7x^52 + 4x^53 + 3x^76 + x^78
• f = f1·x^6 + f2·x^52 + f3·x^76, where
  f1 = 5 + 6x − 4x^3,  f2 = −7 + 4x,  f3 = 3 + x^2
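One simple way to recover such a decomposition (a hypothetical sketch; the real conversion algorithm, described later, chooses chunk boundaries to minimize a cost measure) is to start a new chunk whenever the gap between consecutive exponents exceeds a threshold:

```python
def to_chunky(terms, gap):
    """Split a sparse polynomial, given as sorted (exponent, coefficient) pairs,
    into chunks: a list of (offset, dense coefficient list) pairs.
    A new chunk starts whenever consecutive exponents differ by more than `gap`."""
    chunks = []   # entries: (offset, dense coefficients, last exponent seen)
    for e, c in terms:
        if chunks and e - chunks[-1][2] <= gap:
            offset, coeffs, _ = chunks[-1]
            coeffs.extend([0] * (e - offset - len(coeffs)))  # pad interior zeros
            coeffs.append(c)
            chunks[-1] = (offset, coeffs, e)
        else:
            chunks.append((e, [c], e))
    return [(offset, coeffs) for offset, coeffs, _ in chunks]

# f = 5x^6 + 6x^7 - 4x^9 - 7x^52 + 4x^53 + 3x^76 + x^78
f = [(6, 5), (7, 6), (9, -4), (52, -7), (53, 4), (76, 3), (78, 1)]
print(to_chunky(f, gap=5))
# → [(6, [5, 6, 0, -4]), (52, [-7, 4]), (76, [3, 0, 1])]
```

The three chunks are exactly f1, f2, f3 from the slide, each stored densely at its offset.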
Chunky Multiplication

Sparse algorithms on the outside, dense algorithms on the inside.
• Exponent arithmetic stays the same.
• Coefficient arithmetic is more costly.
• Terms in the product may have more overlap.

Theorem
Given
  f = f1·x^e1 + f2·x^e2 + ··· + ft·x^et,
  g = g1·x^d1 + g2·x^d2 + ··· + gs·x^ds,
the cost of chunky multiplication (in ring operations) is

  O(  Σ_{1≤i≤t, 1≤j≤s, deg fi ≥ deg gj}  (deg fi) · M(deg gj) / (deg gj)
    + Σ_{1≤i≤t, 1≤j≤s, deg fi < deg gj}  (deg gj) · M(deg fi) / (deg fi)  ).

(Each pairwise chunk product of sizes a ≥ b costs O((a/b) · M(b)): split the larger chunk into blocks of the smaller one's size.)
Conversion to the Chunky Representation
Initial idea: Convert each operand independently,
then multiply in the chunky representation.
But how to minimize the nasty cost measure?
Theorem
Any independent conversion algorithm must result in slower
multiplication than the dense or sparse method in some cases.
Two-Step Chunky Conversion
First Step: Optimal Chunk Size
Suppose every chunk in both operands were forced to have the same size k.
This simplifies the cost to t(k) · s(k) · M(k),
where t(k) and s(k) are the least numbers of size-k chunks needed to make f and g, respectively.
First conversion step: compute the value of k that minimizes this simplified cost measure.
Computing the optimal chunk size

Optimal chunk size computation algorithm:
1. Create two min-heaps with all “gaps” in f and g, ordered on the size of the resulting chunk if the gap were removed.
2. Remove all gaps of smallest priority and update their neighbors.
3. Approximate t(k), s(k) by the sizes of the heaps, and compute t(k) · s(k) · M(k).
4. Repeat until no gaps remain.

With a careful implementation, this can be made linear-time in either the dense or sparse representation.
It is a constant-factor approximation; the ratio is 4.
Observe: We must compute the cost of dense multiplication!
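The quantity being minimized can also be made concrete by brute force. The sketch below is a hypothetical, quadratic-time illustration, not the linear-time heap algorithm above: it tries every k, counts covering chunks greedily, and assumes a linearithmic model M(k) = k · log2(k) for the dense cost.

```python
def chunks_needed(exponents, k):
    """Least number of length-k intervals covering all exponents (greedy)."""
    count, covered_to = 0, None
    for e in sorted(exponents):
        if covered_to is None or e >= covered_to:
            count += 1
            covered_to = e + k   # one chunk covers exponents e .. e+k-1
    return count

def M(k):
    """Assumed linearithmic model of the dense multiplication cost M(k)."""
    return k * k.bit_length()

def best_chunk_size(f_exps, g_exps, max_k):
    """Pick k minimizing the simplified cost t(k) * s(k) * M(k)."""
    def cost(k):
        return chunks_needed(f_exps, k) * chunks_needed(g_exps, k) * M(k)
    return min(range(1, max_k + 1), key=cost)

# Two operands, each with dense runs of 20 terms separated by a large gap.
f_exps = list(range(0, 20)) + list(range(100, 120))
g_exps = list(range(0, 20)) + list(range(100, 120))
print(best_chunk_size(f_exps, g_exps, 40))  # → 20
```

With this model the minimizer is k = 20, the run length: smaller k pays for more chunks, larger k pays for padding zeros, which is the trade-off the real algorithm approximates in linear time.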
Two-Step Chunky Conversion
Second Step: Conversion given chunk size
After computing the “optimal chunk size” k, the operands are converted independently:
we compute the optimal chunky representation for multiplying by a single size-k chunk.
Idea: For each gap, maintain a linked list of all previous gaps to include if the polynomial were truncated here.
Algorithm: Increment through the gaps, each time finding the last gap that should be included.
Conversion given optimal chunk size
The algorithm uses two key properties:
• Chunks larger than k bring no benefit.
• For smaller chunks, we want to minimize Σ_i M(deg fi) / (deg fi).

Theorem
Our algorithm computes the optimal chunky representation for multiplying by a single size-k chunk.
Its cost is linear in the dense or sparse representation size.
Chunky Multiplication Overview
Input: f, g ∈ R[x], either in the sparse or dense representation
Output: Their product f · g, in the same representation
1. Compute an approximation to the “optimal chunk size” k, looking at both f and g simultaneously.
2. Convert f to the optimal chunky representation for multiplying by a single size-k chunk.
3. Convert g to the optimal chunky representation for multiplying by a single size-k chunk.
4. Multiply pairwise chunks using dense multiplication.
5. Combine terms and write out the product.
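The whole pipeline fits in a short sketch. This hypothetical version takes the chunk size k as a fixed parameter rather than computing the adaptive approximation, and uses classical dense multiplication on the chunks:

```python
def chunky_mul(f, g, k):
    """Multiply sparse polynomials (sorted (exponent, coefficient) lists) by
    splitting each into dense chunks spanning at most k positions, multiplying
    chunks pairwise with dense multiplication, and recombining."""
    def split(terms):
        # Greedy chunking: start a new chunk when the next exponent
        # would not fit within k positions of the current chunk's offset.
        chunks = []
        for e, c in terms:
            if not chunks or e - chunks[-1][0] >= k:
                chunks.append((e, []))
            offset, coeffs = chunks[-1]
            coeffs.extend([0] * (e - offset - len(coeffs)))  # pad interior zeros
            coeffs.append(c)
        return chunks

    def dense_mul(a, b):            # classical O(n^2) dense multiplication
        prod = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                prod[i + j] += ai * bj
        return prod

    out = {}
    for off_f, cf in split(f):      # sparse loop on the outside...
        for off_g, cg in split(g):
            for i, c in enumerate(dense_mul(cf, cg)):   # ...dense on the inside
                if c:
                    e = off_f + off_g + i
                    out[e] = out.get(e, 0) + c
    return sorted((e, c) for e, c in out.items() if c != 0)

# (5x^6 + 6x^7 - 4x^9) * (1 + x^52)
f = [(6, 5), (7, 6), (9, -4)]
g = [(0, 1), (52, 1)]
print(chunky_mul(f, g, k=4))
```

Here f becomes the single dense chunk (5 + 6x − 4x^3)·x^6, g stays as two one-term chunks, and only two small dense products are ever formed.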
Timings vs “Chunkiness”
[Figure: running time relative to dense time (0–1.4) versus imposed “chunkiness” (0–40), comparing the Dense, Sparse, and Adaptive methods.]
Timings without imposed chunkiness
[Figure: running time in seconds (log scale, 0.001–100) for the Dense, Sparse, and Adaptive methods, over degrees from 10^4 to 10^8 and sparsities from 100 to 10^5.]
The Ultimate Goal?
Adaptive methods go between linearithmic-time dense algorithms
and quadratic-time sparse algorithms.
• Output size of dense multiplication is always linear.
• Output size of sparse multiplication is at most quadratic, but might be much less.
• Can we have linearithmic output-sensitive cost?
Note: This is the best we can hope for, and it would generalize the “chunky” approach.
Summary
• New indirect multiplication methods to go between existing
sparse and dense multiplication algorithms.
• Chunky multiplication is never (asymptotically) worse than
existing approaches, but can be much better in well-structured
cases.
• Recent results on sparse FFTs and sparse interpolation may
lead to a significant breakthrough in theory.
• Much work remains to be done!