Introduction to Streaming Algorithms
Jeff M. Phillips
September 21, 2013
Network Router

Internet Router
▶ data per day: at least 1 Terabyte
▶ packet takes 8 nanoseconds to pass through router
▶ few million packets per second

What statistics can we keep on data?
Want to detect anomalies for security.

[Figure: packets streaming through an internet router]
Telephone Switch

Cell phones connect through switches
▶ each message 1000 Bytes
▶ 500 Million calls / day
▶ 1 Terabyte per month

Search for characteristics for dropped calls?

[Figure: text messages streaming through a switch]
Serving Ads on web

Google, Yahoo!, Microsoft
▶ Yahoo.com viewed 100 trillion times
▶ 2 million / hour
▶ Each page serves ads; which ones?

How to update ad delivery model?

[Figure: ad auction: a keyword search leads to a page view, the server serves an ad, and clicks feed back into the ad delivery model]
Flight Logs on Tape

All airplane logs over Washington, DC
▶ About 500 - 1000 flights per day.
▶ 50 years, total about 9 million flights
▶ Each flight has trajectory, passenger count, control dialog

Stored on Tape. Can make 1 pass!
What statistics can be gathered?

[Figure: CPU computing statistics in one pass over the tape]
Streaming Model

[Figure: CPU with small memory makes one pass over a stream of length m; each word ∈ [n]]

CPU makes "one pass" on data
▶ Ordered set A = ⟨a1, a2, . . . , am⟩
▶ Each ai ∈ [n], size log n
▶ Compute f(A) or maintain f(Ai) for Ai = ⟨a1, a2, . . . , ai⟩.
▶ Space restricted to S = O(poly(log m, log n)).
▶ Updates O(poly(S)) for each ai.

Space:
▶ Ideally S = O(log m + log n)
▶ log n = size of 1 word
▶ log m = to store number of words

Updates:
▶ O(S^2) or O(S^3) can be too much!
▶ Ideally updates in O(S)
Easy Example: Average

▶ Each ai a number in [n]
▶ f(Ai) = average({a1, . . . , ai})
▶ Maintain: i and s = Σ_{j=1}^{i} aj.
▶ f(Ai) = s/i
▶ Problem? s is bigger than a word!
▶ s is not bigger than (log s / log n) words (big int data structure)
▶ usually 2 or 3 words is fine
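A minimal Python sketch of this slide (my own illustration, not from the slides; the name stream_average is made up): maintain only the count i and the running sum s, and report s/i after every item.

```python
def stream_average(stream):
    """Yield f(A_i) = average(a_1, ..., a_i) after each item."""
    i, s = 0, 0
    for a in stream:
        i += 1
        s += a        # Python ints grow past one machine word automatically,
                      # matching the "big int" remark: about log(s)/log(n) words.
        yield s / i   # f(A_i) = s / i

# Example: the running averages of 1, 2, 3 are 1.0, 1.5, 2.0.
print(list(stream_average([1, 2, 3])))
```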
Trick 1: Approximation

Return f̂(A) instead of f(A) where

|f(A) − f̂(A)| ≤ ε · f(A).

f̂(A) is a (1 + ε)-approximation of f(A).

Example: Average
▶ (a) the count: i
▶ (b) top k = log(1/ε) + 1 bits of s: ŝ
▶ (c) number of bits in s
Let f̂(A) = ŝ/i

[Figure: binary representation of s; the top k = log(1/ε) bits are kept and the remaining low-order bits are set to 0000...]

▶ First bit has ≥ (1/2)f(A)
▶ Second bit has ≤ (1/4)f(A)
▶ jth bit has ≤ (1/2^j)f(A)

Σ_{j=k+1}^{∞} (1/2^j) f(A) < (1/2^k) f(A) < ε · f(A)

Where did I cheat?
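A hedged Python sketch of this idea (my own illustration, not from the slides; the name approx_average is invented): round the sum down to its top k = log2(1/ε) + 1 bits and divide by the count. Note that this sketch still carries the full sum s and only truncates when reporting, which sidesteps the issue the question above points at.

```python
import math

def approx_average(stream, eps=0.01):
    k = int(math.log2(1 / eps)) + 1    # number of high-order bits of s to keep
    i, s = 0, 0
    for a in stream:
        i += 1
        s += a
        drop = max(s.bit_length() - k, 0)   # low-order bits to zero out
        s_hat = (s >> drop) << drop         # s rounded down to its top k bits
        yield s_hat / i                     # approximate average from s_hat

# Example with eps = 0.25; true averages are 5.0, 6.0, 7.0.
print(list(approx_average([5, 7, 9], eps=0.25)))
```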
Trick 2: Randomization

Return f̂(A) instead of f(A) where

Pr[|f(A) − f̂(A)| > ε · f(A)] ≤ δ.

f̂(A) is a (1 + ε, δ)-approximation of f(A).

Can fix previous cheat using randomization and Morris Counter
(Morris 78, Flajolet 85)
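A minimal sketch of a Morris counter in Python (my own illustration, not from the slides): store only x ≈ log2 of the count, so even a huge count fits in very few bits, and increment x with probability 2^(−x).

```python
import random

class MorrisCounter:
    def __init__(self):
        self.x = 0                      # x approximates log2(count)

    def update(self):
        # Increment x with probability 2^(-x).
        if random.random() < 2 ** (-self.x):
            self.x += 1

    def estimate(self):
        return 2 ** self.x - 1          # E[2^x] = count + 1, so this is unbiased

c = MorrisCounter()
for _ in range(10000):
    c.update()
print(c.estimate())   # roughly 10000, but a single counter has large variance
```

Averaging several independent counters reduces the variance, which is where the (1 + ε, δ) guarantee comes from.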
Decreasing Probability of Failure

Investment Company (IC) sends out 100 × 2^k emails:
▶ 100 × 2^{k−1} say Stock A will go up in next week
▶ 100 × 2^{k−1} say Stock A will go down in next week
After 1 week, 1/2 of email receivers got good advice.

Next week, IC sends 100 × 2^{k−1} letters, only to those who got good advice.
▶ 100 × 2^{k−2} say Stock B will go up in next week.
▶ 100 × 2^{k−2} say Stock B will go down in next week.
After 2 weeks, 1/4 of all receivers have gotten good advice twice.

After k weeks, 100 receivers have gotten good advice every week.
▶ IC now asks each for money to receive future stock tips.
▶ Don't actually do this!!!

If you are on IC's original email list, with what probability will you
not receive k good stock tips?

1 − (1/2)^k
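A quick check of this answer (my own step, not on the slide): each week exactly half of the people still being mailed get the correct prediction, so

```latex
\Pr[\text{all } k \text{ tips are correct}] = \left(\tfrac{1}{2}\right)^{k},
\qquad
\Pr[\text{you do not get } k \text{ good tips}] = 1 - \left(\tfrac{1}{2}\right)^{k}.
```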
Sliding Window

Let Ai−s,i = {ai−s, ai−s+1, . . . , ai} (the last s items).
Goal: maintain f(Ai−s,i).

Another model: Each ai = (v, t) where t is a time stamp.
Let Ai^[w] = {a = (v, t) ∈ Ai | t ≥ tnow − w}.
Goal: maintain f(Ai^[w]).

Simpler solution: Decay rate γ.
Maintain a summary Si = f(Ai);
at each time step update Si+1 = f((1 − γ)Si ∪ ai+1).
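A minimal Python sketch of the decay-rate idea (my own illustration, not from the slides) for one concrete choice of f, an exponentially decayed sum: each step multiplies the old summary by (1 − γ) and adds the new item, so old items fade out like a soft sliding window.

```python
def decayed_sum(stream, gamma=0.1):
    """Maintain S_{i+1} = (1 - gamma) * S_i + a_{i+1}, yielding S after each item."""
    S = 0.0
    for a in stream:
        S = (1 - gamma) * S + a
        yield S

# Example: an early burst of large values fades out as new items arrive.
print(list(decayed_sum([10, 10, 0, 0, 0], gamma=0.5)))
```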
Semi-Streaming Model

Streaming on Graphs.
Each ai = (vi, vj) is an edge.
▶ Is graph connected?
▶ Size of best matching? (each vertex in at most one pair)

Too hard!
Assume that all vertices can fit in memory, say O(n log n) space.
For 1 million vertices, may be ok,
but not for 1 billion vertices (e.g. Facebook).
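For the connectivity question, a hedged Python sketch (my own illustration, not from the slides): keep one union-find entry per vertex, which is O(n) words, and process each edge of the stream once.

```python
def is_connected(n, edge_stream):
    """One pass over edges (u, v) with u, v in [0, n); O(n) words of memory."""
    parent = list(range(n))                 # union-find forest over the vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # merge the two components
            components -= 1
    return components == 1

print(is_connected(4, [(0, 1), (1, 2), (2, 3)]))   # True
print(is_connected(4, [(0, 1), (2, 3)]))           # False
```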
Markov Inequality

Let X be a random variable (RV).
Let a > 0 be a parameter.

Pr[|X| ≥ a] ≤ E[|X|] / a.
Chebyshev's Inequality

Let Y be a random variable.
Let b > 0 be a parameter.

Pr[|Y − E[Y]| ≥ b] ≤ Var[Y] / b².
Chernoff Inequality

Let {X1, X2, . . . , Xr} be independent random variables.
Let Δi = max{Xi} − min{Xi}.
Let M = Σ_{i=1}^{r} Xi.
Let α > 0 be a parameter.

Pr[|M − Σ_{i=1}^{r} E[Xi]| ≥ α] ≤ 2 exp(−2α² / Σ_i Δi²)

Often Δ = max_i Δi and E[Xi] = 0; then:

Pr[|M| ≥ α] ≤ 2 exp(−2α² / (r Δ²))
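A small Python check of the simplified bound (my own illustration, not from the slides): sum r independent ±1 variables, so E[Xi] = 0 and Δ = 2, and compare the empirical tail probability with 2 exp(−2α² / (rΔ²)).

```python
import math
import random

r, alpha, trials = 100, 30, 20000
delta = 2    # each Xi in {-1, +1}: max - min = 2 and E[Xi] = 0

bound = 2 * math.exp(-2 * alpha**2 / (r * delta**2))

exceed = 0
for _ in range(trials):
    M = sum(random.choice((-1, 1)) for _ in range(r))   # M = sum of the Xi
    if abs(M) >= alpha:
        exceed += 1

print(f"empirical Pr[|M| >= {alpha}] = {exceed / trials:.4f}")
print(f"Chernoff bound              = {bound:.4f}")
```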
Attribution
These slides borrow from material by Muthu Muthukrishnan:
http://www.cs.mcgill.ca/~denis/notes09.pdf
and Amit Chakrabarti:
http://www.cs.dartmouth.edu/~ac/Teach/CS85-Fall09/