Introduction to Streaming Algorithms
Jeff M. Phillips
September 21, 2013

Network Router
An Internet router sees enormous traffic:
- data per day: at least 1 Terabyte
- a packet takes 8 nanoseconds to pass through the router
- a few million packets per second
What statistics can we keep on this data? We want to detect anomalies, for security.

Telephone Switch
Cell phones connect through switches:
- each message is about 1000 Bytes
- 500 million calls / day
- about 1 Terabyte per month
Can we search for characteristics of dropped calls?

Serving Ads on the Web
Google, Yahoo!, and Microsoft auction ads on page views, driven by keyword searches and ad clicks:
- Yahoo.com has been viewed 100 trillion times
- 2 million views / hour
- each page serves ads; which ones?
How do we update the ad delivery model as views stream in to the server?

Flight Logs on Tape
All airplane logs over Washington, DC:
- about 500-1000 flights per day
- over 50 years, about 9 million flights in total
- each flight has a trajectory, a passenger count, and the control dialog
Stored on tape, so we can make only 1 pass! What statistics can be gathered?

Streaming Model
A CPU with a small memory makes "one pass" over a data stream of length m, reading one word at a time:
- ordered set A = ⟨a_1, a_2, ..., a_m⟩
- each word a_i ∈ [n], of size log n
- compute f(A), or maintain f(A_i) for each prefix A_i = ⟨a_1, a_2, ..., a_i⟩
- space restricted to S = O(poly(log m, log n))
- updates take O(poly(S)) time per item a_i

Space:
- ideally S = O(log m + log n)
- log n = the size of 1 word
- log m = the bits needed to store the number of words

Updates:
- O(S^2) or O(S^3) per item can be too much!
- ideally each update takes O(S)

Easy Example: Average
- Each a_i is a number in [n].
- f(A_i) = average({a_1, ..., a_i}).
- Maintain: the count i and the sum s = Σ_{j=1}^{i} a_j.
- Then f(A_i) = s/i.
- Problem? s is bigger than a word!
- But s is no bigger than (log s / log n) words (a big-int data structure); usually 2 or 3 words is fine.
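As a concrete companion to the Easy Example, here is a minimal Python sketch of the exact streaming average (the class name and interface are mine, not from the slides); Python's arbitrary-precision integers play the role of the big-int structure:

```python
class StreamingAverage:
    """Exact running average: store only the count i and the sum s."""

    def __init__(self):
        self.i = 0  # number of items seen so far
        self.s = 0  # running sum; Python ints grow as needed (the "big-int" structure)

    def update(self, a):
        self.i += 1
        self.s += a

    def query(self):
        return self.s / self.i  # f(A_i) = s / i

# Usage: feed the stream one word at a time.
avg = StreamingAverage()
for a in [3, 7, 2, 9]:
    avg.update(a)
print(avg.query())  # 5.25
```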
Trick 1: Approximation
Return f̂(A) instead of f(A), where
    |f(A) − f̂(A)| ≤ ε · f(A).
Then f̂(A) is a (1 + ε)-approximation of f(A).

Example: Average
Let k = log(1/ε) + 1, and maintain:
- (a) the count: i
- (b) the top k bits of s, with the lower-order bits zeroed: ŝ
- (c) the number of bits in s
Let f̂(A) = ŝ/i.

Why is this accurate?
- The first (highest-order) bit of s accounts for ≥ (1/2)f(A).
- The second bit accounts for ≤ (1/4)f(A).
- The j-th bit accounts for ≤ (1/2^j)f(A).
So the dropped bits together contribute at most
    Σ_{j=k+1}^{∞} (1/2^j) f(A) ≤ (1/2^k) f(A) < ε · f(A).
Where did I cheat? (The first sketch below makes the cheat concrete.)

Trick 2: Randomization
Return f̂(A) instead of f(A), where
    Pr[|f(A) − f̂(A)| > ε · f(A)] ≤ δ.
Then f̂(A) is a (1 + ε, δ)-approximation of f(A).

The previous cheat can be fixed using randomization and the Morris Counter (Morris 78, Flajolet 85); a sketch of it follows as well.
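A minimal sketch of Trick 1's truncated sum, under my reading of the slide (the function name is hypothetical). Writing it out also exposes the cheat the slide asks about: keeping the top bits of s correct as items arrive seems to require the low-order bits too.

```python
import math

def truncated_average(stream, eps):
    """(1 + eps)-approximate average via Trick 1: report s/i using only
    the top k = log2(1/eps) + 1 bits of the sum s (low-order bits zeroed).

    Note where the cheat shows up: to know the top bits of s at the end,
    this sketch maintains the full sum internally anyway.
    """
    k = int(math.log2(1 / eps)) + 1
    i, s = 0, 0
    for a in stream:
        i += 1
        s += a
    drop = max(s.bit_length() - k, 0)  # item (c): uses the number of bits in s
    s_hat = (s >> drop) << drop        # item (b): keep the top k bits, zero the rest
    return s_hat / i                   # f̂(A) = ŝ / i

# Usage: the error is at most eps times the true average.
print(truncated_average([3, 7, 2, 9], eps=0.25))  # 5.0; the exact average is 5.25
```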
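And a sketch of the Morris counter just mentioned, which replaces exact bookkeeping with a randomized counter. This is the standard textbook variant, which may differ in details from the one the course develops:

```python
import random

class MorrisCounter:
    """Morris approximate counter: store only x (about log log m bits),
    incrementing it with probability 2^(-x). Since E[2^x] = (count) + 1,
    the value 2^x - 1 is an unbiased estimate of the number of updates."""

    def __init__(self):
        self.x = 0

    def update(self, a=None):  # the item's value is ignored; we only count
        if random.random() < 2.0 ** -self.x:
            self.x += 1

    def query(self):
        return 2 ** self.x - 1

# Usage: a single counter is unbiased but noisy; averaging independent
# copies (or the median trick at the end of these notes) tightens it.
c = MorrisCounter()
for _ in range(10000):
    c.update()
print(c.query())  # an estimate of 10000, possibly off by a few thousand
```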
Decreasing Probability of Failure
An Investment Company (IC) sends out 100 × 2^k emails:
- half say Stock A will go up in the next week
- half say Stock A will go down in the next week
After 1 week, half of the recipients have received good advice.

The next week, IC sends emails only to those who got good advice:
- half say Stock B will go up in the next week
- half say Stock B will go down in the next week
After 2 weeks, 1/4 of all recipients have gotten good advice twice.

After k weeks, 100 recipients have gotten good advice every week.
- IC now asks each of them for money to receive future stock tips.
- Don't actually do this!!!

If you are on IC's original email list, with what probability will you not receive k good stock tips?
    1 − (1/2)^k

Sliding Window
Let A_{i−s,i} = ⟨a_{i−s}, a_{i−s+1}, ..., a_i⟩ (the last s items).
Goal: maintain f(A_{i−s,i}).

Another model: each a_i = (v, t), where t is a time stamp.
Let A_i^[w] = {a = (v, t) ∈ A_i | t ≥ t_now − w}.
Goal: maintain f(A_i^[w]).

Simpler solution: pick a decay rate γ. Maintain a summary S_i = f(A_i), and at each time step update
    S_{i+1} = f((1 − γ) S_i ∪ a_{i+1}).
(A sketch of this decayed summary, instantiated for the average, appears at the end of these notes.)

Semi-Streaming Model
Streaming on graphs: each a_i = (v_i, v_j) is an edge.
- Is the graph connected?
- What is the size of the best matching? (each vertex in at most one pair)
Too hard in the strict streaming model! Instead assume that all vertices fit in memory, say O(n log n) space. For 1 million vertices this may be OK, but not for 1 billion vertices (e.g. Facebook).

Markov Inequality
Let X be a random variable (RV) and let a > 0 be a parameter. Then
    Pr[|X| ≥ a] ≤ E[|X|] / a.

Chebyshev's Inequality
Let Y be a random variable and let b > 0 be a parameter. Then
    Pr[|Y − E[Y]| ≥ b] ≤ Var[Y] / b².

Chernoff Inequality
Let {X_1, X_2, ..., X_r} be independent random variables.
Let Δ_i = max{X_i} − min{X_i}, let M = Σ_{i=1}^{r} X_i, and let α > 0 be a parameter. Then
    Pr[ |M − Σ_{i=1}^{r} E[X_i]| ≥ α ] ≤ 2 exp( −2α² / Σ_i Δ_i² ).

Often Δ = max_i Δ_i and E[X_i] = 0 for all i; then
    Pr[|M| ≥ α] ≤ 2 exp( −2α² / (r Δ²) ).
(The median trick sketched at the end of these notes uses this bound to decrease failure probability.)

Attribution
These slides borrow from material by Muthu Muthukrishnan:
http://www.cs.mcgill.ca/~denis/notes09.pdf
and Amit Chakrabarti:
http://www.cs.dartmouth.edu/~ac/Teach/CS85-Fall09/
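A sketch of the decayed summary from the sliding-window slide. The slide's update S_{i+1} = f((1 − γ)S_i ∪ a_{i+1}) is generic; this instantiates it for the running average (an exponentially weighted moving average), which is my choice of f, not the slides':

```python
class DecayedAverage:
    """Decayed summary from the sliding-window slide, instantiated for
    the running average: every step multiplies the old summary by
    (1 - gamma), so old items fade out without storing any window."""

    def __init__(self, gamma):
        self.gamma = gamma  # decay rate in (0, 1)
        self.s = 0.0        # exponentially decayed sum
        self.w = 0.0        # decayed count, used to normalize

    def update(self, a):
        self.s = (1 - self.gamma) * self.s + a
        self.w = (1 - self.gamma) * self.w + 1.0

    def query(self):
        return self.s / self.w

# Usage: the estimate tracks the recent past.
d = DecayedAverage(gamma=0.5)
for a in [1, 1, 1, 9, 9, 9]:
    d.update(a)
print(d.query())  # about 8.1, pulled toward the recent 9s; the plain average is 5
```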
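Finally, the flip side of the investment-company story: independent repetition can be used honestly, to drive down a randomized estimator's failure probability δ. The median trick below is a standard application of the Chernoff bound; it is not spelled out on these slides, and the helper name is hypothetical:

```python
import math
import statistics

def amplify(make_estimator, stream, delta):
    """Median trick: run t = O(log(1/delta)) independent copies of a
    randomized estimator that is within its error tolerance with
    probability >= 2/3, and return the median answer. By the Chernoff
    bound above, the median is outside the tolerance only if about half
    the copies fail, which happens with probability <= delta."""
    t = math.ceil(18 * math.log(1 / delta))  # loose constant from the Chernoff bound
    copies = [make_estimator() for _ in range(t)]
    for a in stream:
        for e in copies:
            e.update(a)
    return statistics.median(e.query() for e in copies)

# e.g. amplify(MorrisCounter, range(10000), delta=0.05), with the
# MorrisCounter sketched earlier, is far more reliable than one counter.
```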