Streaming Algorithms
Piotr Indyk, MIT

Data Streams
• A data stream is a sequence of data that is too large to be stored in available memory
• Examples:
– Network traffic
– Sensor networks
– Approximate query optimization and answering in large databases
– Scientific data streams
– … and this talk

Outline
• Data stream model, motivations, connections
• Streaming problems and techniques
– Estimating the number of distinct elements
– Other quantities: norms, moments, heavy hitters…
– Sparse approximations and compressed sensing

Basic Data Stream Model
• Single pass over the data: i_1, i_2, …, i_n
– Typically, we assume n is known
• Bounded storage (typically n^α or log^c n)
– Units of storage: bits, words, coordinates or “elements” (e.g., points, nodes/edges)
• Fast processing time per element
– Randomness OK (in fact, almost always necessary)
• Example stream: 8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 …

Connections
• External memory algorithms
– Linear scan works best
– Streaming algorithms yield efficient external memory algorithms
• Communication complexity/sketching
– Low-space algorithms yield low-communication protocols
[Figure: two parties A and B communicating]
• Metric embeddings, sub-linear time algorithms, pseudo-randomness, compressive sensing

Counting Distinct Elements
• Stream elements: numbers from {1…m}
• Goal: estimate the number of distinct elements DE in the stream
– Up to a factor of 1±ε
– With probability 1−P
• Simpler goal: for a given T > 0, provide an algorithm which, with probability 1−P:
– Answers YES, if DE > (1+ε)T
– Answers NO, if DE < (1−ε)T
• Run, in parallel, the algorithm with T = 1, 1+ε, (1+ε)^2, …, n
– Total space multiplied by log_{1+ε} n ≈ log(n)/ε

Vector Interpretation
• Stream: 8 2 1 9 1 9 2 4 4 9 4 2 5 4 2 5 8 5 2 5
[Figure: the corresponding count vector x over coordinates 1…9]
• Initially, x = 0
• Insertion of i is interpreted as x_i = x_i + 1
• Want to estimate DE(x) = the number of non-zero elements of x

Estimating DE(x) (based on [Flajolet-Martin’85])
[Figure: vector x over coordinates 1…9, with a random set S of coordinates marked; T = 4]
• Choose a random set S of coordinates
– For each i, we have Pr[i∈S] = 1/T
• Maintain Sum_S(x) = ∑_{i∈S} x_i
• Estimation algorithm I:
– YES, if Sum_S(x) > 0
– NO, if Sum_S(x) = 0
• Analysis:
– Pr = Pr[Sum_S(x) = 0] = (1 − 1/T)^DE
– For T large enough: (1 − 1/T)^DE ≈ e^{−DE/T}
– Using calculus, for ε small enough:
• If DE > (1+ε)T, then Pr ≈ e^{−(1+ε)} < 1/e − ε/3
• If DE < (1−ε)T, then Pr ≈ e^{−(1−ε)} > 1/e + ε/3
[Plot: Pr = (1 − 1/T)^DE as a decreasing function of DE]

Estimating DE(x), ctd.
• We have an algorithm:
– If DE > (1+ε)T, then Pr < 1/e − ε/3
– If DE < (1−ε)T, then Pr > 1/e + ε/3
• Need to amplify the probability of correctness:
– Select sets S_1 … S_k, k = O(log(1/P)/ε^2)
– Let Z = number of the Sum_{S_j}(x) that are equal to 0
– By the Chernoff bound, with probability > 1−P:
• If DE > (1+ε)T, then Z < k/e
• If DE < (1−ε)T, then Z > k/e
• Total space:
– Decision version: O(log(1/P)/ε^2) numbers in range 0…n
– Estimation: O(log(n)/ε · log(1/P)/ε^2) numbers in range 0…n (the probability of error is O(P log(n)/ε))
• Better algorithms known:
– Theory: O(1/ε^2 + log n) bits [Kane-Nelson-Woodruff’10]
– Practice: need 128 bytes for all works of Shakespeare, ε ≈ 10% [Durand-Flajolet’03]
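As a concrete illustration, here is a minimal Python sketch of the decision procedure above. The name fm_decision and the use of a seeded PRNG in place of the uniform hash function h are illustrative assumptions, not part of the slides (a real implementation would hash i once per set rather than reseed a generator):

```python
import math
import random

def fm_decision(stream, T, k):
    """Decision version of the [Flajolet-Martin'85]-style algorithm above:
    with k = O(log(1/P)/eps^2) random sets, answer YES if DE > (1+eps)T
    and NO if DE < (1-eps)T, with probability > 1-P."""
    def in_S(j, i):
        # Coordinate i belongs to S_j with probability 1/T; a PRNG seeded
        # by (j, i) stands in for the uniform hash function h.
        return random.Random((j << 32) | i).random() < 1.0 / T

    sums = [0] * k                        # Sum_{S_j}(x), maintained in one pass
    for i in stream:
        for j in range(k):
            if in_S(j, i):
                sums[j] += 1

    Z = sum(1 for s in sums if s == 0)    # number of zero sums
    # Pr[Sum_S(x) = 0] = (1 - 1/T)^DE ~ e^{-DE/T}, so compare Z against k/e
    return "YES" if Z < k / math.e else "NO"

stream = [8, 2, 1, 9, 1, 9, 2, 4, 6, 3, 9, 4, 2, 3, 4, 2, 3, 8, 5, 2, 5, 6]
print(fm_decision(stream, T=4, k=300))    # DE = 8 > 4 here, so YES w.h.p.
```

For the estimation version, run this in parallel for T = 1, 1+ε, (1+ε)^2, …, n, as described above, and report the largest T that still answers YES.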
Chernoff bound
Let X_1 … X_n be a sequence of i.i.d. 0-1 random variables such that Pr[X_i = 1] = p. Let X = ∑_i X_i. Then for any ε < 1 we have

Pr[|X − pn| > ε·pn] < 2e^{−ε^2·pn/3}

Comments
• Implementing S:
– Choose a uniform hash function h: {1…m} → {1…T}
– Define S = {i: h(i) = 1}
– We have i∈S iff h(i) = 1
• Implementing h:
– In practice, to compute h(i) do:
• Seed = i
• Return random()

More Comments
• The algorithm uses “linear sketches” Sum_{S_j}(x) = ∑_{i∈S_j} x_i
• Can implement decrements x_i = x_i − 1
– I.e., the stream can contain deletions of elements (as long as x ≥ 0)
– Other names: dynamic model, turnstile model

More General Problem
• What other functions of a vector x can we maintain in small space?
• L_p norms: ||x||_p = (∑_i |x_i|^p)^{1/p}
– We also have ||x||_∞ = max_i |x_i|
– … and we can define ||x||_0 = DE(x), since ||x||_p^p = ∑_i |x_i|^p → DE(x) as p → 0
• Alternatively: frequency moments F_p = p-th power of the L_p norm (exception: F_0 = L_0)
• How much space do you need to estimate ||x||_p (for const. ε)?
• Theorem:
– For p ∈ [0,2]: polylog n space suffices
– For p > 2: n^{1−2/p} polylog n space suffices and is necessary
[Alon-Matias-Szegedy’96, Feigenbaum-Kannan-Strauss-Viswanathan’99, Indyk’00, Coppersmith-Kumar’04, Ganguly’04, Bar-Yossef-Jayram-Kumar-Sivakumar’02,’03, Saks-Sun’03, Indyk-Woodruff’05]

Estimating the L2 norm: AMS [Alon-Matias-Szegedy’96]
• Choose r_1 … r_m to be i.i.d. random variables with Pr[r_i = 1] = Pr[r_i = −1] = 1/2
• Maintain Z = ∑_i r_i·x_i under increments/decrements to x_i
• Algorithm I: Y = Z^2
• “Claim”: Y “approximates” ||x||_2^2 with “good” probability

Analysis
• We will use the Chebyshev inequality
– Need expectation, variance
• The expectation of Z^2 = (∑_i r_i x_i)^2 is
E[Z^2] = E[∑_{i,j} r_i x_i r_j x_j] = ∑_{i,j} x_i x_j E[r_i r_j]
• We have
– For i ≠ j: E[r_i r_j] = E[r_i]·E[r_j] = 0, so the term disappears
– For i = j: E[r_i r_j] = 1
• Therefore E[Z^2] = ∑_i x_i^2 = ||x||_2^2 (unbiased estimator)

Analysis, ctd.
• The second moment of Z^2 = (∑_i r_i x_i)^2 is the expectation of
Z^4 = (∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)
• This can be decomposed into a sum of
– ∑_i (r_i x_i)^4 → expectation = ∑_i x_i^4
– 6 ∑_{i<j} (r_i r_j x_i x_j)^2 → expectation = 6 ∑_{i<j} x_i^2 x_j^2
– Terms where some multiplier r_i x_i appears an odd number of times (e.g., r_1x_1·r_2x_2·r_3x_3·r_4x_4) → expectation = 0, since E[r_i] = 0
• Total: E[Z^4] = ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2
• The variance of Z^2 is
E[Z^4] − E^2[Z^2] = ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 − (∑_i x_i^2)^2
= ∑_i x_i^4 + 6 ∑_{i<j} x_i^2 x_j^2 − ∑_i x_i^4 − 2 ∑_{i<j} x_i^2 x_j^2
= 4 ∑_{i<j} x_i^2 x_j^2 ≤ 2 (∑_i x_i^2)^2

Analysis, ctd.
• We have an estimator Y = Z^2
– E[Y] = ∑_i x_i^2
– σ^2 = Var[Y] ≤ 2 (∑_i x_i^2)^2
• Chebyshev inequality: Pr[|E[Y] − Y| ≥ cσ] ≤ 1/c^2
• Algorithm II:
– Maintain Z_1 … Z_k (and thus Y_1 … Y_k); define Y’ = ∑_j Y_j / k
– E[Y’] = k·∑_i x_i^2 / k = ∑_i x_i^2
– σ’^2 = Var[Y’] ≤ 2k(∑_i x_i^2)^2 / k^2 = 2(∑_i x_i^2)^2 / k
• Guarantee: Pr[|Y’ − ∑_i x_i^2| ≥ c·(2/k)^{1/2}·∑_i x_i^2] ≤ 1/c^2
• Setting c to a constant and k = O(1/ε^2) gives a (1±ε)-approximation with constant probability

Comments
• Only needed 4-wise independence of r_1 … r_m
– I.e., for any subset S of {1…m} with |S| = 4 and any b ∈ {−1,1}^4, we have Pr[r_S = b] = 1/2^4
– Can generate such variables from O(log m) random bits
• What we did:
– Maintain a “linear sketch” vector Z = [Z_1 … Z_k] = Rx
– Estimator for ||x||_2^2: (Z_1^2 + … + Z_k^2)/k = ||Rx||_2^2 / k
– “Dimensionality reduction”: x → Rx … but the tail is somewhat “heavy”
– Reason: we only used the second moment of the estimator
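To make Algorithm II concrete, here is a minimal Python sketch of the AMS estimator. The name ams_l2_squared and the (i, delta) update representation are illustrative assumptions; fully i.i.d. signs are used for simplicity, though 4-wise independence suffices, as noted above:

```python
import random

def ams_l2_squared(updates, m, k):
    """AMS sketch (Algorithm II above): maintain Z_j = sum_i r_j[i]*x_i for k
    independent sign vectors r_j; return Y' = (Z_1^2 + ... + Z_k^2)/k, which
    equals ||x||_2^2 in expectation and has Var[Y'] <= 2(||x||_2^2)^2 / k."""
    def sign(j, i):
        # r_j[i] in {-1,+1} with probability 1/2 each, recomputed on demand
        # from a seed so that the same (j, i) always yields the same sign
        return 1 if random.Random(j * (m + 1) + i).random() < 0.5 else -1

    Z = [0] * k
    for i, delta in updates:             # turnstile stream: delta in {+1, -1}
        for j in range(k):
            Z[j] += sign(j, i) * delta   # linear sketch Z = Rx, one update per element
    return sum(z * z for z in Z) / k

# x_3 is incremented twice; x_7 is inserted and then deleted: ||x||_2^2 = 4
updates = [(3, +1), (7, +1), (3, +1), (7, -1)]
print(ams_l2_squared(updates, m=9, k=500))   # ~4, within c*(2/k)^{1/2} deviations
```

Because the sketch is linear, sketches of two streams can be added coordinate-wise to get the sketch of the combined stream, which is exactly what the turnstile model exploits.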
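The last comment, that 4-wise independent signs can be generated from O(log m) random bits, is usually realized with a random degree-3 polynomial over a prime field. The sketch below shows this standard construction; it is not spelled out on the slides, and the specific prime and the parity-bit extraction (which carries a negligible O(1/p) bias) are illustrative choices:

```python
import random

def fourwise_signs(m, seed=0):
    """Generate r_1 ... r_m in {-1,+1} that are (nearly) 4-wise independent
    from just 4 random field elements, i.e. O(log m) random bits: the values
    of a random degree-3 polynomial over GF(p) are 4-wise independent."""
    p = (1 << 31) - 1                  # a prime larger than m (here 2^31 - 1)
    rng = random.Random(seed)
    a0, a1, a2, a3 = (rng.randrange(p) for _ in range(4))
    def r(i):
        v = (((a3 * i + a2) * i + a1) * i + a0) % p   # Horner's rule
        return 1 if v & 1 == 0 else -1                # one output bit
    return [r(i) for i in range(1, m + 1)]

signs = fourwise_signs(m=9)   # could replace the i.i.d. signs in the AMS sketch
```

The 4-wise independence follows because, for any 4 distinct evaluation points, the Vandermonde system maps the 4 uniformly random coefficients bijectively onto the 4 polynomial values.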