Introduction to Streaming Algorithms
Jeff M. Phillips
September 21, 2013

Network Router
An Internet router sees enormous traffic:
- data per day: at least 1 Terabyte
- a packet takes 8 nanoseconds to pass through the router
- a few million packets per second
What statistics can we keep on this data? We want to detect anomalies, for security.

Telephone Switch
Cell phones connect through switches:
- each message is about 1000 Bytes
- 500 million calls / day
- about 1 Terabyte per month
Can we search for characteristics of dropped calls?

Serving Ads on the Web
Google, Yahoo!, and Microsoft auction ads on page views, driven by keyword searches and ad clicks:
- Yahoo.com has been viewed 100 trillion times
- 2 million views / hour
- each page serves ads; which ones?
How do we update the ad delivery model as views stream in to the server?

Flight Logs on Tape
All airplane logs over Washington, DC:
- about 500-1000 flights per day
- over 50 years, about 9 million flights in total
- each flight has a trajectory, a passenger count, and the control dialog
Stored on tape, so we can make only 1 pass! What statistics can be gathered?

Streaming Model
A CPU with a small memory makes "one pass" over a data stream of length m, reading one word at a time:
- ordered set A = ⟨a_1, a_2, ..., a_m⟩
- each word a_i ∈ [n], of size log n
- compute f(A), or maintain f(A_i) for each prefix A_i = ⟨a_1, a_2, ..., a_i⟩
- space restricted to S = O(poly(log m, log n))
- updates take O(poly(S)) time per item a_i

Space:
- ideally S = O(log m + log n)
- log n = the size of 1 word
- log m = the bits needed to store the number of words

Updates:
- O(S^2) or O(S^3) per item can be too much!
- ideally each update takes O(S)

Easy Example: Average
- Each a_i is a number in [n].
- f(A_i) = average({a_1, ..., a_i}).
- Maintain: the count i and the sum s = Σ_{j=1}^{i} a_j.
- Then f(A_i) = s/i.
- Problem? s is bigger than a word!
- But s is no bigger than (log s / log n) words (a big-int data structure); usually 2 or 3 words is fine.
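As a concrete companion to the Easy Example, here is a minimal Python sketch of the exact streaming average (the class name and interface are mine, not from the slides); Python's arbitrary-precision integers play the role of the big-int structure:

```python
class StreamingAverage:
    """Exact running average: store only the count i and the sum s."""

    def __init__(self):
        self.i = 0  # number of items seen so far
        self.s = 0  # running sum; Python ints grow as needed (the "big-int" structure)

    def update(self, a):
        self.i += 1
        self.s += a

    def query(self):
        return self.s / self.i  # f(A_i) = s / i

# Usage: feed the stream one word at a time.
avg = StreamingAverage()
for a in [3, 7, 2, 9]:
    avg.update(a)
print(avg.query())  # 5.25
```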
Trick 1: Approximation
Return f̂(A) instead of f(A), where
    |f(A) − f̂(A)| ≤ ε · f(A).
Then f̂(A) is a (1 + ε)-approximation of f(A).

Example: Average
Let k = log(1/ε) + 1, and maintain:
- (a) the count: i
- (b) the top k bits of s, with the lower-order bits zeroed: ŝ
- (c) the number of bits in s
Let f̂(A) = ŝ/i.

Why is this accurate?
- The first (highest-order) bit of s accounts for ≥ (1/2)f(A).
- The second bit accounts for ≤ (1/4)f(A).
- The j-th bit accounts for ≤ (1/2^j)f(A).
So the dropped bits together contribute at most
    Σ_{j=k+1}^{∞} (1/2^j) f(A) ≤ (1/2^k) f(A) < ε · f(A).
Where did I cheat? (The first sketch below makes the cheat concrete.)

Trick 2: Randomization
Return f̂(A) instead of f(A), where
    Pr[|f(A) − f̂(A)| > ε · f(A)] ≤ δ.
Then f̂(A) is a (1 + ε, δ)-approximation of f(A).

The previous cheat can be fixed using randomization and the Morris Counter (Morris 78, Flajolet 85); a sketch of it follows as well.
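A minimal sketch of Trick 1's truncated sum, under my reading of the slide (the function name is hypothetical). Writing it out also exposes the cheat the slide asks about: keeping the top bits of s correct as items arrive seems to require the low-order bits too.

```python
import math

def truncated_average(stream, eps):
    """(1 + eps)-approximate average via Trick 1: report s/i using only
    the top k = log2(1/eps) + 1 bits of the sum s (low-order bits zeroed).

    Note where the cheat shows up: to know the top bits of s at the end,
    this sketch maintains the full sum internally anyway.
    """
    k = int(math.log2(1 / eps)) + 1
    i, s = 0, 0
    for a in stream:
        i += 1
        s += a
    drop = max(s.bit_length() - k, 0)  # item (c): uses the number of bits in s
    s_hat = (s >> drop) << drop        # item (b): keep the top k bits, zero the rest
    return s_hat / i                   # f̂(A) = ŝ / i

# Usage: the error is at most eps times the true average.
print(truncated_average([3, 7, 2, 9], eps=0.25))  # 5.0; the exact average is 5.25
```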
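And a sketch of the Morris counter just mentioned, which replaces exact bookkeeping with a randomized counter. This is the standard textbook variant, which may differ in details from the one the course develops:

```python
import random

class MorrisCounter:
    """Morris approximate counter: store only x (about log log m bits),
    incrementing it with probability 2^(-x). Since E[2^x] = (count) + 1,
    the value 2^x - 1 is an unbiased estimate of the number of updates."""

    def __init__(self):
        self.x = 0

    def update(self, a=None):  # the item's value is ignored; we only count
        if random.random() < 2.0 ** -self.x:
            self.x += 1

    def query(self):
        return 2 ** self.x - 1

# Usage: a single counter is unbiased but noisy; averaging independent
# copies (or the median trick at the end of these notes) tightens it.
c = MorrisCounter()
for _ in range(10000):
    c.update()
print(c.query())  # an estimate of 10000, possibly off by a few thousand
```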
Decreasing Probability of Failure
An Investment Company (IC) sends out 100 × 2^k emails:
- half say Stock A will go up in the next week
- half say Stock A will go down in the next week
After 1 week, half of the recipients have received good advice.

The next week, IC sends emails only to those who got good advice:
- half say Stock B will go up in the next week
- half say Stock B will go down in the next week
After 2 weeks, 1/4 of all recipients have gotten good advice twice.

After k weeks, 100 recipients have gotten good advice every week.
- IC now asks each of them for money to receive future stock tips.
- Don't actually do this!!!

If you are on IC's original email list, with what probability will you not receive k good stock tips?
    1 − (1/2)^k

Sliding Window
Let A_{i−s,i} = ⟨a_{i−s}, a_{i−s+1}, ..., a_i⟩ (the last s items).
Goal: maintain f(A_{i−s,i}).

Another model: each a_i = (v, t), where t is a time stamp.
Let A_i^[w] = {a = (v, t) ∈ A_i | t ≥ t_now − w}.
Goal: maintain f(A_i^[w]).

Simpler solution: pick a decay rate γ. Maintain a summary S_i = f(A_i), and at each time step update
    S_{i+1} = f((1 − γ) S_i ∪ a_{i+1}).
(A sketch of this decayed summary, instantiated for the average, appears at the end of these notes.)

Semi-Streaming Model
Streaming on graphs: each a_i = (v_i, v_j) is an edge.
- Is the graph connected?
- What is the size of the best matching? (each vertex in at most one pair)
Too hard in the strict streaming model! Instead assume that all vertices fit in memory, say O(n log n) space. For 1 million vertices this may be OK, but not for 1 billion vertices (e.g. Facebook).

Markov Inequality
Let X be a random variable (RV) and let a > 0 be a parameter. Then
    Pr[|X| ≥ a] ≤ E[|X|] / a.

Chebyshev's Inequality
Let Y be a random variable and let b > 0 be a parameter. Then
    Pr[|Y − E[Y]| ≥ b] ≤ Var[Y] / b².

Chernoff Inequality
Let {X_1, X_2, ..., X_r} be independent random variables.
Let Δ_i = max{X_i} − min{X_i}, let M = Σ_{i=1}^{r} X_i, and let α > 0 be a parameter. Then
    Pr[ |M − Σ_{i=1}^{r} E[X_i]| ≥ α ] ≤ 2 exp( −2α² / Σ_i Δ_i² ).

Often Δ = max_i Δ_i and E[X_i] = 0 for all i; then
    Pr[|M| ≥ α] ≤ 2 exp( −2α² / (r Δ²) ).
(The median trick sketched at the end of these notes uses this bound to decrease failure probability.)

Attribution
These slides borrow from material by Muthu Muthukrishnan:
http://www.cs.mcgill.ca/~denis/notes09.pdf
and Amit Chakrabarti:
http://www.cs.dartmouth.edu/~ac/Teach/CS85-Fall09/
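A sketch of the decayed summary from the sliding-window slide. The slide's update S_{i+1} = f((1 − γ)S_i ∪ a_{i+1}) is generic; this instantiates it for the running average (an exponentially weighted moving average), which is my choice of f, not the slides':

```python
class DecayedAverage:
    """Decayed summary from the sliding-window slide, instantiated for
    the running average: every step multiplies the old summary by
    (1 - gamma), so old items fade out without storing any window."""

    def __init__(self, gamma):
        self.gamma = gamma  # decay rate in (0, 1)
        self.s = 0.0        # exponentially decayed sum
        self.w = 0.0        # decayed count, used to normalize

    def update(self, a):
        self.s = (1 - self.gamma) * self.s + a
        self.w = (1 - self.gamma) * self.w + 1.0

    def query(self):
        return self.s / self.w

# Usage: the estimate tracks the recent past.
d = DecayedAverage(gamma=0.5)
for a in [1, 1, 1, 9, 9, 9]:
    d.update(a)
print(d.query())  # about 8.1, pulled toward the recent 9s; the plain average is 5
```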
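Finally, the flip side of the investment-company story: independent repetition can be used honestly, to drive down a randomized estimator's failure probability δ. The median trick below is a standard application of the Chernoff bound; it is not spelled out on these slides, and the helper name is hypothetical:

```python
import math
import statistics

def amplify(make_estimator, stream, delta):
    """Median trick: run t = O(log(1/delta)) independent copies of a
    randomized estimator that is within its error tolerance with
    probability >= 2/3, and return the median answer. By the Chernoff
    bound above, the median is outside the tolerance only if about half
    the copies fail, which happens with probability <= delta."""
    t = math.ceil(18 * math.log(1 / delta))  # loose constant from the Chernoff bound
    copies = [make_estimator() for _ in range(t)]
    for a in stream:
        for e in copies:
            e.update(a)
    return statistics.median(e.query() for e in copies)

# e.g. amplify(MorrisCounter, range(10000), delta=0.05), with the
# MorrisCounter sketched earlier, is far more reliable than one counter.
```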