Turnstile Streaming Algorithms Might as Well Be Linear Sketches

Yi Li
Huy L. Nguyen
David Woodruff
Turnstile Streaming Model
• Underlying n-dimensional vector x initialized to $0^n$
• Long stream of updates $x \leftarrow x + e_i$ or $x \leftarrow x - e_i$ for standard unit vector $e_i$
• At end of the stream, $x \in \{-m, -m+1, \ldots, m-1, m\}^n$ for some bound $m \le \mathrm{poly}(n)$
• Output an approximation to f(x) whp
• Goal: use as little space (in bits) as possible
Example: Norms
• Suppose you want $\|x\|_p^p = \sum_{i=1}^n |x_i|^p$
• Want Z for which $(1-\epsilon)\,\|x\|_p^p \le Z \le (1+\epsilon)\,\|x\|_p^p$
• Many applications
• p=2
– Geometry, linear algebra
• p=1
– Distances between distributions, network monitoring
Algorithm for 2-Norm
• Let $r = 1/\epsilon^2$
• Choose an $r \times n$ matrix A of i.i.d. N(0, 1/r) normal random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output $\|Ax\|_2^2$
• Proof: Johnson-Lindenstrauss Lemma
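The steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_gaussian_sketch`, `update`, and `estimate_f2`, the fixed seeds, the toy parameters, and the use of floating-point Gaussians in place of entries rounded to precision 1/poly(n) are all choices made for the example.

```python
import random

def make_gaussian_sketch(r, n, seed=0):
    """Return an r x n matrix of i.i.d. N(0, 1/r) entries (hypothetical helper)."""
    rng = random.Random(seed)
    sigma = (1.0 / r) ** 0.5  # standard deviation, so the variance is 1/r
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(r)]

def update(A, y, i, delta):
    """Apply x <- x + delta * e_i to the maintained sketch y = Ax."""
    for row in range(len(A)):
        y[row] += delta * A[row][i]

def estimate_f2(y):
    """Return |Ax|_2^2, the estimate of |x|_2^2."""
    return sum(v * v for v in y)

# Assumed toy parameters: n = 20 coordinates, eps = 0.1, so r = 1/eps^2 = 100.
n, eps = 20, 0.1
r = round(1 / eps ** 2)
A = make_gaussian_sketch(r, n)
y = [0.0] * r
x = [0] * n  # kept only to compare against the estimate

stream_rng = random.Random(1)
for _ in range(500):
    i, delta = stream_rng.randrange(n), stream_rng.choice((+1, -1))
    update(A, y, i, delta)
    x[i] += delta

true_f2 = sum(v * v for v in x)
print(true_f2, estimate_f2(y))
```

The algorithm itself never stores x; only the r numbers in `y` are kept, and each stream update touches one column of A.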
Algorithm for 1-Norm [Indyk]
• Let $r = 1/\epsilon^2$
• Choose an $r \times n$ matrix A of i.i.d. Cauchy random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output $\mathrm{median}(|(Ax)_1|, \ldots, |(Ax)_r|)$
• Proof: 1-stability of Cauchy distribution
– If $C_1, C_2$ are independent Cauchy r.v.s, then $a C_1 + b C_2 \sim (|a| + |b|)\, C_3$ for a Cauchy r.v. $C_3$
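A minimal Python sketch of the Cauchy-based estimator, under stated assumptions: the helper names, the seed, the toy stream, and the choice r = 400 are illustrative, and Cauchy samples are drawn by inverse-CDF sampling in floating point rather than with precision 1/poly(n). By 1-stability each coordinate of Ax is distributed as the 1-norm of x times a standard Cauchy, and the median of the absolute value of a standard Cauchy is 1, which is why the median is the output.

```python
import math
import random
import statistics

def make_cauchy_sketch(r, n, seed=0):
    """Return an r x n matrix of i.i.d. standard Cauchy entries (hypothetical helper)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for uniform U.
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(r)]

def update(A, y, i, delta):
    """Apply x <- x + delta * e_i to the maintained sketch y = Ax."""
    for row in range(len(A)):
        y[row] += delta * A[row][i]

def estimate_l1(y):
    """Median of |(Ax)_j| over the r sketch coordinates."""
    return statistics.median([abs(v) for v in y])

# Assumed toy instance: coordinate i receives (i % 5) + 1 updates of one sign.
n, r = 20, 400
A = make_cauchy_sketch(r, n)
y = [0.0] * r
true_l1 = 0
for i in range(n):
    delta = +1 if i % 2 == 0 else -1
    for _ in range(i % 5 + 1):
        update(A, y, i, delta)
        true_l1 += 1  # every update here moves x_i away from 0

print(true_l1, estimate_l1(y))
```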
Common Features
Algorithms for 2-norm and 1-norm have the following form:
1. Choose a random matrix A independent of x
2. Maintain Ax in the stream
3. Output a function of Ax
Some functions f(x) may be weird.
Question (?!): does the optimal algorithm for
approximating any function in the turnstile
model have this form?
Our Results
• Yes, up to a factor of log n
• Theorem: for computing a relation f for x in $\{-m, -m+1, \ldots, m\}^n$ in the turnstile model, there is a correct (whp) algorithm which:
1. samples an integer matrix A uniformly from O(n log m) hardwired matrices, independent of x
2. outputs a function of Ax
Logarithm of the number of states of Ax, for x in $\{-m, -m+1, \ldots, m\}^n$, plus amount of randomness, is optimal up to a log n factor
Consequences
Alice: $a \in \{0,1\}^n$, creates stream s(a)
Bob: $b \in \{0,1\}^n$, creates stream s(b)
Lower Bound Technique
1. Alice runs Alg on s(a), transmits state of Alg(s(a)) to Bob
2. Bob computes Alg(s(a), s(b))
3. If Bob solves g(a,b), space complexity of Alg is at least the 1-way communication complexity of g
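The reduction can be illustrated with a toy simulation. Everything named here is an assumption for the example: `ExactVector` is a hypothetical streaming algorithm that stores x explicitly, s(a) is taken to be one +e_i update per 1-coordinate of a, and g(a, b) is taken to be "do a and b share a 1-coordinate?". The point is only that Alice's message is exactly the memory contents of Alg, so the message length equals the space used.

```python
import json

class ExactVector:
    """Toy turnstile algorithm (hypothetical): its state is the whole vector x."""
    def __init__(self, n, state=None):
        self.x = list(state) if state is not None else [0] * n

    def update(self, i, delta):
        self.x[i] += delta

    def output(self):
        return max(self.x)

def stream_of(bits):
    """s(a): one +e_i update for every i with a_i = 1 (assumed encoding)."""
    return [(i, +1) for i, bit in enumerate(bits) if bit]

def alice(a):
    alg = ExactVector(len(a))
    for i, delta in stream_of(a):
        alg.update(i, delta)
    return json.dumps(alg.x)  # the message is exactly the state of Alg

def bob(message, b):
    alg = ExactVector(len(b), state=json.loads(message))  # resume from Alice's state
    for i, delta in stream_of(b):
        alg.update(i, delta)
    return alg.output() == 2  # some coordinate was incremented by both players

a = [1, 0, 1, 0]
print(bob(alice(a), [0, 0, 1, 1]))  # True: both have a 1 in position 2
print(bob(alice(a), [0, 1, 0, 1]))  # False: the supports are disjoint
```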
Consequences
Our main theorem implies:
If Bob can solve g(a,b), then the space of Alg is at least the simultaneous communication complexity of g
This is a weaker public-coin model, in which Alice and Bob simultaneously send a message to a referee
The log n Factor Loss
• Main Theorem: The logarithm of the number of states of Ax, as x ranges over $\{-m, -m+1, \ldots, m\}^n$, plus the amount of randomness to store A, is optimal up to a log n factor
• The log n loss is necessary
• Consider $f(x) = x_1 \bmod 2$
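One way to see where the gap comes from, as a sketch of the intuition rather than the authors' proof: a two-state automaton computes the parity of x_1, while an integer sketch without modular reduction that can see x_1 at all takes 2m + 1 distinct values as x_1 ranges over {-m, ..., m}, which costs about log(2m + 1) = O(log n) bits when m is at most poly(n). The function names and the one-row sketch below are assumptions made for the illustration.

```python
def parity_automaton(updates):
    """Two-state automaton for f(x) = x_1 mod 2: flip a bit on every update to x_1."""
    state = 0
    for i, delta in updates:
        if i == 0:  # coordinate x_1 is index 0 here
            state ^= 1
    return state

def sketch_states(coeff, m):
    """Distinct values of the one-row integer sketch coeff * x_1 for x_1 in {-m, ..., m}."""
    return len({coeff * x1 for x1 in range(-m, m + 1)})

m = 8
print(sketch_states(1, m))  # 17 = 2m + 1 states
print(sketch_states(0, m))  # 1 state: this sketch cannot see x_1 at all
print(parity_automaton([(0, +1), (3, -1), (0, +1), (0, -1)]))  # 1, since x_1 = 1
```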
Non-Uniformity Restriction
• Careful wording: “samples an integer matrix A
uniformly from O(n log m) hardwired matrices,
independent of x”
• Algorithm is non-uniform
– Output of each state for each A also hardwired
• Alternatively, allow algorithm to use more space
to process a stream update, provided it only
retains Ax and its randomness
– Regenerate A during each stream update
Comment on the Model
• For each random seed, algorithm is a deterministic
automaton with a finite number of states
• Main theorem only requires correctness for $x \in \{-m, -m+1, \ldots, m\}^n$
– It counts the number of states as x varies in this range
• While processing the stream, may have $\|x\|_\infty > m$
• The algorithm can’t abort if this happens. It must still be correct at the end of the stream for x in $\{-m, -m+1, \ldots, m\}^n$
Related Work
• Ganguly
– Deterministic algorithms
– Specific to heavy hitters problem
– Shows algorithm might as well be a linear
sketch over the reals
– Dimension lower bound over the reals
Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions
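Stream Automaton for Fixed Randomness
[Figure: automaton with a Start state and transitions labeled by stream updates such as +e1, -e1, +e5, (-e1, +e2), …, +en, -en; the vector $0^n$ ends up in two different states]
• Want each state of the automaton to only depend on x, not how it got there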
Path-Independent Automaton
• Undirected connected graph
• Each $x \in \mathbb{Z}^n$ is in a unique state
• For each randomness, can we modify the
automaton to make it path-independent?
• Rule out algorithms that remember how they
arrived at a state, e.g., an algorithm that stores
the last 5 stream updates
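Path-independence can be checked by brute force on a tiny universe. The sketch below is illustrative only: the checker `is_path_independent`, the two toy automata, and the bound on stream length are assumptions, and the history-keeping automaton stores just the last update (rather than the last 5) to keep the example short.

```python
from itertools import product

def is_path_independent(start, step, n, max_len):
    """Brute-force check over all streams of length <= max_len on n coordinates."""
    moves = [(i, d) for i in range(n) for d in (+1, -1)]
    state_of = {}  # net vector x -> the state first seen for it
    for length in range(max_len + 1):
        for stream in product(moves, repeat=length):
            state, x = start, [0] * n
            for i, d in stream:
                state = step(state, i, d)
                x[i] += d
            if state_of.setdefault(tuple(x), state) != state:
                return False  # the same x was reached in two different states
    return True

def step_exact(state, i, d):
    """The state is x itself, so it depends only on the net update."""
    x = list(state)
    x[i] += d
    return tuple(x)

def step_with_history(state, i, d):
    """The state is (x, last update): it remembers how it arrived."""
    x, _ = state
    return (step_exact(x, i, d), (i, d))

print(is_path_independent((0, 0), step_exact, n=2, max_len=4))                 # True
print(is_path_independent(((0, 0), None), step_with_history, n=2, max_len=4))  # False
```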
Path-Reversible Automaton
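• Path-reversible: for all states s, if σ is a stream $(+e_{i_1}, -e_{i_2}, -e_{i_3}, \ldots, +e_{i_r})$ of updates, resulting in a state t, then from t the stream $\sigma^{-1} = (-e_{i_r}, \ldots, +e_{i_3}, +e_{i_2}, -e_{i_1})$ returns us to s
[Figure: example automaton on states s1, s2, s3, s4 with transitions labeled +e2, -e1, -e2, +e5, +e1, -e5]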
• Path-reversible does not imply path-independent
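A small hypothetical example of this separation (not the one drawn on the slide): let the states be the six permutations of three elements, let +e_1 and +e_2 act as two fixed non-commuting permutations, and let -e_i act as the inverse. Every stream can be undone by its reverse, yet two streams with the same net update can land in different states. The generator choices and function names are assumptions made for the illustration.

```python
IDENTITY = (0, 1, 2)
GENERATORS = {0: (1, 0, 2),   # +e_1 acts as a transposition (assumed choice)
              1: (1, 2, 0)}   # +e_2 acts as a 3-cycle (assumed choice)

def inverse(p):
    q = [0] * len(p)
    for k, image in enumerate(p):
        q[image] = k
    return tuple(q)

def step(state, i, d):
    g = GENERATORS[i] if d == +1 else inverse(GENERATORS[i])
    return tuple(g[k] for k in state)  # compose g after the current permutation

def run(state, stream):
    for i, d in stream:
        state = step(state, i, d)
    return state

def reverse(stream):
    """sigma^{-1}: the updates in reverse order with signs flipped."""
    return [(i, -d) for i, d in reversed(stream)]

sigma = [(0, +1), (1, +1), (0, -1), (1, -1)]  # freq(sigma) = 0
t = run(IDENTITY, sigma)
print(run(t, reverse(sigma)) == IDENTITY)    # True: sigma^{-1} undoes sigma
print(t == IDENTITY)                         # False: a zero-frequency stream moved the state
print(run(IDENTITY, [(0, +1), (1, +1)]) ==
      run(IDENTITY, [(1, +1), (0, +1)]))     # False: same net update, different states
```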
Strategy
Arbitrary Automaton → Path-Reversible Automaton → Path-Independent Automaton
For stream σ, $\mathrm{freq}(\sigma) \in \mathbb{Z}^n$ is “net update” to each coordinate
Idea:
1. if in a state s, and update by a stream σ with freq(σ) = 0, answers ought to be similar
2. collapse all states s, s’ for which s+σ = s’ and freq(σ) = 0 for some stream σ
Issue: how to define new output and transition function?
Zero-Frequency Graph
• Directed graph G = (V,E)
• V = states of old automaton $A_{\mathrm{old}}$ (for fixed randomness)
• $(s,t) \in E$ if there is a stream σ with s+σ = t and freq(σ) = 0
– Finite number of streams to consider
• Terminal equivalence class: strongly connected component with no outgoing edge
– Path in G lands in a terminal equivalence class
– States of new automaton $A_{\mathrm{new}}$ = terminal equivalence classes
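Given the edge set of a (small) zero-frequency graph, the terminal equivalence classes can be found by plain reachability. This is a minimal sketch under assumptions: the graph is supplied as an adjacency dictionary rather than derived from an automaton, the node names are made up, and the quadratic reachability computation is chosen for clarity over efficiency.

```python
def reachable(graph, s):
    """Set of nodes reachable from s (including s) by following edges."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def terminal_classes(graph):
    """Strongly connected components with no outgoing edge."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    reach = {v: reachable(graph, v) for v in nodes}
    classes = set()
    for v in nodes:
        scc = frozenset(u for u in reach[v] if v in reach[u])
        if reach[v] == scc:  # nothing outside the component is reachable
            classes.add(scc)
    return classes

# Hypothetical zero-frequency graph: 'a' leads into the class {b, c}; 'd' is on its own.
graph = {'a': ['b'], 'b': ['c'], 'c': ['b'], 'd': ['d']}
print(sorted(sorted(c) for c in terminal_classes(graph)))  # [['b', 'c'], ['d']]
```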
New Transition Function
• Suppose in terminal equivalence class C
• Given an update $e_i$
• Let $v \in C$ be an arbitrary node
• Compute $v + e_i$ using transition function of $A_{\mathrm{old}}$
• Walk from $v + e_i$ until reach terminal equivalence class C’
– C’ is unique
• Does not depend on choice of v
• Only one terminal equivalence class reachable
– $A_{\mathrm{new}}$ is path-reversible
[Figure: two terminal equivalence classes; nodes u and v related by a stream σ with freq(σ) = 0, each followed by the update $+e_i$, and a stream σ’ with freq(σ’) = 0]
Output Function of Anew
• In each terminal equivalence class C, sample node u from the stationary distribution of a random walk in C (add self-loops)
– Output of $A_{\mathrm{new}}$ on C = output of $A_{\mathrm{old}}$ on u
• If v is starting vertex of $A_{\mathrm{old}}$,
– take a random walk in G from v
– let starting vertex of $A_{\mathrm{new}}$ be terminal equivalence class C reached
• Why is it correct?
Correctness
• Let Π be an arbitrary distribution on streams σ
• Choose fixed randomness so $A_{\mathrm{old}}$ is correct on Π’:
– Long sequence of zero-frequency streams,
– Followed by σ sampled from Π,
– Followed by long sequence of zero-frequency streams
• Output of $A_{\mathrm{new}}$ on Π is statistically close to output of $A_{\mathrm{old}}$ on Π’
• ⇒ for every Π there is a path-reversible $A_{\mathrm{new}}$ correct on Π
Arbitrary Automaton → Path-Reversible Automaton → Path-Independent Automaton
• Undirected zero-frequency graph G
• New automaton states = connected components of G
• For all $x \in \mathbb{Z}^n$, only one connected component of G contains states containing x
• Uses path-reversibility
• ⇒ Well-defined transition function
• Random walk in components to choose outputs
Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions
Path Independent Automata and Submodules
• Let o be the initial state
• $M = \{x \in \mathbb{Z}^n \text{ such that } x \text{ is in } o\}$
• $0^n \in M$
• If $x \in M$, then $-x \in M$
• If $x, y \in M$, then $x + y \in M$
• M is a free submodule of $\mathbb{Z}^n$ (a lattice)
• M has a basis
• Any two bases have the same cardinality
Path Independent Automata and Sketches
• States of automaton are elements (cosets) of the quotient module $\mathbb{Z}^n/M$
• Space of automaton is log of the number of cosets containing an $x \in \{-m, \ldots, m\}^n$
• Goal: build a sketching algorithm $A \cdot x$
– A is fixed for this automaton
– Space of $A \cdot x$ ≈ space of automaton
– Injection from states of automaton to states of $A \cdot x$
– Will replace $\{-m, \ldots, m\}^n$ with $\{-m/n, \ldots, m/n\}^n$
Smith Normal Form
• $\mathbb{Z}^n/M$ examples:
– $\mathbb{Z}^n/(e_1)$ is free. It remembers all but first coordinate
– $\mathbb{Z}^n/(2e_1, 2e_2, \ldots, 2e_n)$ not free. It remembers coordinate parities
• Smith Normal Form: there exists a basis $y_1, \ldots, y_n$ of $\mathbb{Z}^n$ for which the generators of M are $q_i \cdot y_i$ for $i = 1, \ldots, r$, where $q_i \mid q_{i+1}$ are positive integers, and r = rank(M)
• If $q_s = 1$ but $q_{s+1} > 1$, the generators of $\mathbb{Z}^n/M$ are $y_{s+1} + M, \ldots, y_n + M$, and $\mathbb{Z}^n/M$ is isomorphic to $\mathbb{Z}/q_{s+1} \oplus \cdots \oplus \mathbb{Z}/q_r \oplus \mathbb{Z}^{n-r}$
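A small worked instance of the statement above may help. The submodule M below is chosen purely for illustration (it does not come from the talk); it shows an adapted basis, the divisors q_1 | q_2, and the resulting quotient for n = 2.

```latex
% Worked example (n = 2); M is an assumed submodule chosen for illustration.
M = \mathrm{span}_{\mathbb{Z}}\{(1,1),\ (1,-1)\} \subseteq \mathbb{Z}^2

% A basis of Z^2 adapted to M (determinant 1, so it is a basis):
y_1 = (1,1), \qquad y_2 = (0,1), \qquad
\det\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = 1

% Generators of M in this basis, with q_1 = 1 dividing q_2 = 2:
q_1 y_1 = (1,1), \qquad q_2 y_2 = (0,2) = (1,1) - (1,-1)

% Here s = 1 and r = n = 2, so only the torsion part survives:
\mathbb{Z}^2 / M \;\cong\; \mathbb{Z}/2

% Writing x = x_1 y_1 + (x_2 - x_1) y_2, the state of x is
(x_2 - x_1) \bmod 2
```

In words, this automaton remembers only whether x_1 and x_2 have the same parity.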
Counting States
• Define $n \times n$ matrix B, where i-th column $B_i$ is coefficients of $e_i$ in basis $y_1, \ldots, y_n$
• State = Bx mod q, after removing first s rows
• For each i, there are $x \ne x' \in \{-m/n, \ldots, m/n\}^n$ with
1. $(Bx)_i \ne (Bx')_i \bmod q_i$
2. $(Bx)_j = (Bx')_j \bmod q_j$ for all $j \ne i$
• Proof: otherwise delete row i
• Corollary: # states $\ge 2^{n-s}$
Removing Torsion
• Let sketch be Bx without mod q
– After reducing entries in B mod q
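• For each old state Bx mod q, at most $(mn)^{n-s}$ new states
• # new states $\le$ (# old states) $\cdot (mn)^{n-s}$
• Since # old states $\ge 2^{n-s}$, we have $n - s \le \log(\#\text{ old states})$, so $\log(\#\text{ new states}) \le \log(\#\text{ old states}) + (n-s)\log(mn) = O(\log(\#\text{ old states}) \cdot \log(mn))$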
Handling Large Entries in B
• Want $B \in \mathbb{Z}^{(n-s) \times s}$ to have integer entries of value at most poly(n)
• After removing states from M outside of $\{-m, \ldots, m\}^n$, can assume $q_i \le \exp(\mathrm{poly}(n))$
• Take random linear combinations of rows of B, reduce each row mod a random prime
• Whp if $Bx \ne By$, then after this transformation to B, still $Bx \ne By$
Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions
Applications and Open Questions
• Simpler proof of $\tilde{\Omega}(n^{1-2/p})$ bit lower bound for estimating $F_p$, p > 2
– No communication complexity
• Many dimension lower bounds known for sketching norms over the reals
– $F_p$, matrix norms, adaptive sketching
– Do these give turnstile streaming lower bounds with finite precision?