Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR / Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)

Streaming Model
• A stream of m updates to the coordinates of a vector x ∈ ℤ^n, e.g. x = (9, 2, 0, 0, 5, 0, 1, …, 0)
• Goal: compute statistics of x, e.g. ||x||_1, ||x||_2, …
• Trivial solution: store x (or store all updates), using O(n·log m) space
• Goal: compute using O(polylog(nm)) space

Streaming Algorithms (a very brief introduction)
• Fact [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Indyk-Woodruff ’05], [Bhuvanagiri et al. ’06], [Indyk ’06], [Li ’08], [Li ’09]:
  can compute (1±ε)·||x||_p^p = (1±ε)·F_p using
  – O(ε^-2 · log^c n) bits of space (if 0 ≤ p ≤ 2)
  – O(ε^-O(1) · n^(1-2/p) · log^O(1) n) bits (if p > 2)
• Another Fact: these bounds are mostly optimal [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Saks-Sun ’02], [Chakrabarti-Khot-Sun ’03], [Indyk-Woodruff ’03], [Woodruff ’04]
  – Proofs use communication complexity and information theory

Practical Motivation
• General goal: dealing with massive data sets
  – Internet traffic, large databases, …
• Network monitoring & anomaly detection
  – Stream consists of internet packets
  – x_i = # packets sent to port i
  – Under typical conditions, x is very concentrated
  – Under a "port scan attack", x is less concentrated
  – Can detect by estimating empirical entropy [Lakhina et al. ’05], [Xu et al. ’05], [Zhao et al. ’07]

Entropy
• Probability distribution a = (a_1, a_2, …, a_n)
• Entropy H(a) = -Σ_i a_i·lg(a_i)
• Examples:
  – a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
  – a = (0, …, 0, 1, 0, …, 0): H(a) = 0
• Small when concentrated, LARGE when not

Streaming Algorithms for Entropy
• How much space to estimate H(x)?
  – [Guha-McGregor-Venkatasubramanian ’06], [Chakrabarti-Do Ba-Muthu ’06], [Bhuvanagiri-Ganguly ’06]
  – [Chakrabarti-Cormode-McGregor ’07]: multiplicative (1±ε) approximation: O(ε^-2 · log^2 m) bits; additive ε approximation: O(ε^-2 · log^4 m) bits; Ω̃(ε^-2) lower bound for both
• Our contributions:
  – Additive ε or multiplicative (1±ε) approximation
  – Õ(ε^-2 · log^3 m) bits, and can handle deletions
  – Can sketch entropy in the same space

First Idea
• If you can estimate F_p for p ≈ 1, then you can estimate H(x)
• Why? Rényi entropy

Review of Rényi
• Definition: H_p(x) = log(||x||_p^p / ||x||_1^p) / (1 - p)
• [Figure: H_p(x) as a function of p (0 ≤ p ≤ 2), with portraits of Claude Shannon and Alfréd Rényi]
• Convergence to Shannon: lim_{p→1} H_p(x) = H(x)

Analysis of Algorithm (overview)
• Set p = 1.01 and let x̃ = x / ||x||_1
• Compute y = (1±ε)·||x̃||_p^p (using Li's "compressed counting")
• Set H̃ = log(y) / (1 - p) = log(||x̃||_p^p)/(1 - p) + log(1±ε)/(1 - p)
• So H̃ = H_{1.01}(x̃) ± 100·ε ≈ H(x̃) ± 100·ε
• As p → 1, the approximation H_p(x̃) ≈ H(x̃) gets better, but the error term ε/|1 - p| gets worse! (A toy illustration of this tradeoff appears after the Improvements slide below.)

Making the Tradeoff
• How quickly does H_p(x) converge to H(x)?
• Theorem: Let x̃ be a distribution with min_i x̃_i ≥ 1/m.
  – Multiplicative approximation: if |1 - p| ≤ O(ε / log m), then H_p(x̃) approximates H(x̃) to within a (1+ε) factor.
  – Additive approximation: if |1 - p| ≤ O(ε / log^2 m), then 0 ≤ H(x̃) - H_p(x̃) ≤ ε.
• Plugging in: O(ε^-3 · log^4 m) bits of space suffice for an additive ε approximation

Proof: A Trick Worth Remembering
• Let f : ℝ → ℝ and g : ℝ → ℝ be such that
  lim_{p→1} f(p) = 0, lim_{p→1} g(p) = 0, and lim_{p→1} f'(p)/g'(p) = L
• l'Hôpital's rule says that lim_{p→1} f(p)/g(p) = L
• It actually says more! It says f(p)/g(p) converges to L at least as fast as f'(p)/g'(p) does.

Improvements
• Status: additive ε approximation using O(ε^-3 · log^4 m) bits
• How to reduce the space further?
  – Interpolate with multiple points H_{p1}(x), H_{p2}(x), …
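A minimal numerical sketch of the single-point tradeoff from the "Analysis of Algorithm" and "Making the Tradeoff" slides, assuming Python with NumPy. This is not the streaming algorithm: the Pareto-distributed toy vector and variable names are illustrative, and the exact computation of F_p stands in for Li's compressed counting, with the (1±ε) sketch error injected analytically.

# Toy, non-streaming illustration (assumes Python 3 + NumPy; the Pareto data is
# made up for demonstration).  F_p is computed exactly, and the (1 ± eps) error
# of a real F_p sketch is accounted for analytically, to show the two
# competing error terms of the single-point estimator.
import numpy as np

def shannon(a):
    a = a[a > 0]
    return -np.sum(a * np.log2(a))               # H(a) in bits

def renyi(a, p):
    return np.log2(np.sum(a ** p)) / (1.0 - p)   # H_p(a) = lg(sum a_i^p) / (1 - p)

rng = np.random.default_rng(0)
a = rng.pareto(1.5, size=1000)
a /= a.sum()                                     # a plays the role of x / ||x||_1

eps = 0.01
for p in [1.5, 1.1, 1.01, 1.001]:
    bias = abs(renyi(a, p) - shannon(a))         # |H_p - H|: shrinks as p -> 1
    blowup = np.log2(1 + eps) / abs(1 - p)       # amplified sketch error: grows as p -> 1
    print(f"p = {p:<6}  |H_p - H| = {bias:.4f}   eps-error blowup ~ {blowup:.4f}")

The printed bias |H_p - H| shrinks as p → 1 while the amplified sketch error ε/|1 - p| grows; interpolating over multiple values of p, as described next, is what resolves this tension.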
[Figure: H_p(x) as a function of p (0 ≤ p ≤ 2). Legend: Shannon value at p = 1, a single Rényi evaluation, multiple Rényi evaluations used for interpolation.]

Analyzing Interpolation
• Let f(z) be a C^(k+1) function
• Interpolate f with the polynomial q satisfying q(z_i) = f(z_i), 0 ≤ i ≤ k
• Fact: |f(y) - q(y)| ≤ (b - a)^(k+1) · sup_{z∈[a,b]} |f^(k+1)(z)|, where y, z_0, …, z_k ∈ [a,b]
• Our case: set f(z) = H_{1+z}(x)
• Goal: analyze f^(k+1)(z)

Bounding Derivatives
• Rényi derivatives are messy to analyze
• Switch to Tsallis entropy: f(z) = S_{1+z}(x̃), where S_p(x̃) = (1 - ||x̃||_p^p) / (p - 1)
• Can prove Tsallis also converges to Shannon
• Define G_k(z) = Σ_{i=1}^n x̃_i^(1+z) · log^k(x̃_i). Then
  f^(k)(z) = (-1)^k · k! · (1 - G_0(z)) / z^(k+1) + Σ_{j=1}^k (-1)^(k-j+1) · (k!/j!) · G_j(z) / z^(k+1-j)
• Fact: sup_{z∈[a,b]} |f^(k+1)(z)| ≤ (O(H(x) · log m))^(k+1), when a = -O(1/(k·log m)) and b = 0
• Can set k = log(1/ε) + loglog m

Key Ingredient: Noisy Interpolation
• We don't have f(z_i); we have f(z_i) ± ε
• How to interpolate in the presence of noise?
• Idea: pick the z_i very carefully

Chebyshev Polynomials
• T_k(x) = cos(k·arccos(x))
• Rogosinski's Theorem: if q(x) has degree k and |q(β_j)| ≤ 1 at the Chebyshev extrema β_j (0 ≤ j ≤ k), then |q(x)| ≤ |T_k(x)| for |x| > 1
• Map [-1,1] onto the interpolation interval [z_0, z_k]
• Choose z_j to be the image of β_j, j = 0, …, k
• Let q̃(z) interpolate f(z_j) ± ε and q(z) interpolate f(z_j)
• r(z) = (q̃(z) - q(z)) / ε satisfies Rogosinski's conditions!

Tradeoff in Choosing z_k
• T_k grows quickly once it leaves [z_0, z_k]
• z_k close to 0 ⇒ |T_k(preimage(0))| is still small
• …but z_k close to 0 ⇒ high space complexity
• Just how close do we need 0 and z_k to be?
[Figure: the interpolation interval [z_0, z_k] just to the left of 0 on the real line]

The Magic of Chebyshev
• [Paturi ’92]: T_k(1 + 1/k^c) ≤ e^(4·k^(1-c/2)). Set c = 2.
• Suffices to set z_k = -O(1/(k^3·log m))
• Translates to Õ(ε^-2 · log^3 m) space

The Final Algorithm (additive ε approximation)
• Set k = lg(1/ε) + lglg(m) and z_j = (k^2·cos(jπ/k) - (k^2 + 1)) / (9·k^3·lg(m)) for 0 ≤ j ≤ k
• Estimate S̃_{1+z_j} = (1 - F̃_{1+z_j} / (F̃_1)^(1+z_j)) / z_j for 0 ≤ j ≤ k
• Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_{1+z_j}
• Output q̃(0) (a toy offline version of this interpolation step is sketched after the last slide)

Multiplicative Approximation
• How to get a multiplicative approximation?
  – An additive ε approximation is also multiplicative, unless H(x) is small
  – H(x) small ⇒ some coordinate x_{i*} is large [CCM ’07]
• Suppose x_{i*} is the large coordinate and define the residual moment RF_p = Σ_{i≠i*} x_i^p
• We combine (1±ε)·RF_1 and (1±ε)·RF_{1+z_j} to get (1±ε)·f(z_j)
• Question: how do we get (1±ε)·RF_p?
• Two different approaches:
  – A general approach (for any p, and negative frequencies)
  – An approach exploiting p ≈ 1, only for nonnegative frequencies (better by log(m))

Questions / Thoughts
• For what other problems can we use this "generalize-then-interpolate" strategy?
  – Some non-streaming problems too?
• The power of moments? The power of residual moments?
  – CountMin (CM ’05) + CountSketch (CCF ’02), HSS (Ganguly et al.)
• WANTED: faster moment estimation (some progress in [Cormode-Ganguly ’07])
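To make the interpolation step of "The Final Algorithm" slide concrete, here is a minimal offline sketch in Python/NumPy of the generalize-then-interpolate idea. It is a toy under stated assumptions: the moments F_{1+z_j} are computed exactly and a (1±ε) multiplicative error is simulated in place of a real streaming sketch, the interpolating polynomial is evaluated at 0 via the Lagrange formula (any exact degree-k interpolation would do), and the data and parameter values are made up.

# Toy offline version of the interpolation step (assumes Python 3 + NumPy).
# Streaming sketches of F_{1+z_j} are replaced by exact moments carrying a
# simulated (1 ± eps) error; the node formula follows the slide above.
import numpy as np

def tsallis(a, z):
    # S_{1+z}(a) = (1 - sum_i a_i^(1+z)) / z for a probability vector a
    return (1.0 - np.sum(a ** (1.0 + z))) / z

def interpolate_at_zero(z, y):
    # Evaluate the unique degree-k polynomial through (z_j, y_j) at 0 with the
    # Lagrange formula (well behaved on these tiny, tightly clustered nodes).
    val = 0.0
    for j in range(len(z)):
        w = 1.0
        for l in range(len(z)):
            if l != j:
                w *= (0.0 - z[l]) / (z[j] - z[l])
        val += y[j] * w
    return val

def estimate_entropy(a, eps, m, rng):
    k = int(np.ceil(np.log2(1.0 / eps) + np.log2(np.log2(m))))
    j = np.arange(k + 1)
    # Chebyshev-extrema nodes mapped into a tiny interval just left of 0
    z = (k**2 * np.cos(j * np.pi / k) - (k**2 + 1)) / (9 * k**3 * np.log2(m))
    # simulated noisy evaluations S_{1+z_j} * (1 ± eps)
    noisy = np.array([tsallis(a, zj) * (1 + rng.uniform(-eps, eps)) for zj in z])
    return interpolate_at_zero(z, noisy)          # output q~(0)

rng = np.random.default_rng(1)
a = rng.pareto(1.5, size=1000)
a /= a.sum()
print("Shannon entropy (nats):", -np.sum(a * np.log(a)))   # Tsallis limit is in natural log
print("interpolated estimate :", estimate_entropy(a, eps=0.01, m=10**6, rng=rng))

The two printed numbers should agree up to roughly ε times a modest constant; that constant is controlled by the growth of T_k just outside [z_0, z_k], which is exactly why the nodes are squeezed so close to 0.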