Part A

Seminar on Communication Complexity
Computing On Data Streams
Student
Tom Roshko
Instructor
Ronitt Rubinfeld
Based on “Computing on Data Streams” by Monika Rauch Henzinger, Prabhakar
Raghavan, and Sridhar Rajagopalan, Digital Systems Research Center
Today’s special case of communication
•Reminder - communication complexity model:
•Alice and Bob have their inputs, they communicate
using a protocol and solve a problem.
•Today’s special case:
Alice can only send
Bob can only receive
Both Alice and Bob might have input
Bob has to decide the answer
Bit Vector Probe problem
•Under the limited communication model
Alice gets a vector of bits x1…xm
Bob gets an index 1 ≤ i ≤m
Bob has to output xi as an answer
•The answer: Alice has to send Bob the whole vector
Proof? Quite intuitive: Alice does not know i, so any bit might be the one Bob needs
•Sending the whole vector takes m bits, so the
communication complexity is Θ(m)
k-Layered Graphs
•Vertices are a disjoint union of k sets
•Edges only between following indexed sets
V = i =1Vi
k
E ⊆ {(vi , vi +1 ) | vi ∈ Vi , vi +1 ∈ Vi +1}
The MAX Problem
– Recursive Definition:
» Let u1 be the node of largest out-degree in
V1.
» Let ui∈Vi be a node of largest in-degree
among those incident to ui-1.
– Our Goal : Find uk.
[Figure: k-layered graph; labels u1, v2, Vk-1, vk]
What can we do with MAX?
• Assume we suspect a relation between the
number of links to/from a web site and its
relevance
• We have a list of web sites, all linked from our
“super web site” u1.
• Duplicate the web pages k times (k our depth)
[Figure: layered graph from u1 to vk; nodes are web sites, edges are links]
Theorem no.1 - The MAX problem
Our stream: the graph (V,E) given as an adjacency list,
i.e. pairs of the form (v[i,k], v[i+1,j]): an edge from the k-th node of Vi to the j-th node of Vi+1
Theorem no.1 - The MAX problem
Lower bound under the streaming model (one
pass allowed):
claim: bit-vector-probe ≤ MAX (bit-vector-probe reduces to MAX) →
MAX ∈ Ω(|E|) = Ω(|V|²)
Can we reduce a problem in
streaming model to a
communication complexity
problem???
[Figure: the reduction function turns the inputs into a stream (adjacency items and other items) that is fed to the MAX solver]
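The usual link between the two models (not spelled out on the slide, so treat it as an assumption about what the figure shows): Alice runs the one-pass algorithm on her part of the stream, sends its memory contents to Bob, and Bob continues the run on his part. A streaming algorithm with s bits of memory therefore gives a one-way protocol with an s-bit message. A minimal Python sketch, where `RunningMax` is a hypothetical toy algorithm used only to exercise the simulation:

```python
class RunningMax:
    """Toy one-pass algorithm: remembers only the largest item seen so far."""

    def __init__(self):
        self.best = None

    def process(self, item):
        if self.best is None or item > self.best:
            self.best = item

    def state(self):
        # The full memory contents of the algorithm.
        return self.best

    def load(self, state):
        self.best = state

    def answer(self):
        return self.best


def one_way_protocol(make_algo, alice_items, bob_items):
    """Simulate a one-pass streaming algorithm as a one-way protocol.

    Returns (Bob's answer, the single message Alice sent).
    """
    alice = make_algo()
    for item in alice_items:      # Alice streams her part of the input
        alice.process(item)
    message = alice.state()       # the only communication: Alice -> Bob

    bob = make_algo()
    bob.load(message)             # Bob resumes from Alice's memory state
    for item in bob_items:        # Bob streams his part of the input
        bob.process(item)
    return bob.answer(), message
```

The message is exactly the algorithm's memory, so a lower bound on one-way communication carries over to the space of any one-pass algorithm.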
Reduction function
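claim: bit-vector-probe ≤ MAX (bit-vector-probe reduces to MAX) → MAX ∈ Ω(m)
•Given a vector of bits x0…xm-1 we construct a 2-layer graph
•|V| = 3√m, |U| = 3√m
•Split each of V, U into a 1/3 group (the first √m nodes) and a 2/3 group (the remaining 2√m nodes)
•Edge (vi,uj) ∈ E ⇔ bit number i·√m + j is set (vi, uj in the 1/3 groups)
•Upon getting a probe for the i-th bit...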
[Figure: 2-layer graph with nodes v0…v8 and u0…u8]
Example (m = 9, √m = 3):
bit index: 0 1 2 3 4 5 6 7 8
bit value: 0 0 0 1 0 0 0 1 0
1 = 0 × √9 + 1, x1 = 0 → (v0, u1) ∉ E
3 = 1 × √9 + 0, x3 = 1 → (v1, u0) ∈ E
7 = 2 × √9 + 1, x7 = 1 → (v2, u1) ∈ E
Theorem no.1 - The MAX problem
•Probed for the i-th bit (e.g. the 7th bit)
•Calculate v, u such that i = v × √m + u (e.g. 7 = 2 × √9 + 1)
•Add edges (v,a) for each a in the 2/3 group of U (in the example: edges (v2,a))
•Add edges (a,u) for each a in the 2/3 group of V (in the example: edges (a,u1))
[Figure: the 2-layer graph v0…v8, u0…u8 with the added edges at v2 and u1]
Claim: u is MAX ⇔ (v,u) ∈ E ⇔ probed bit xi = 1
•Surely v = v2 has the maximal out-degree in V
•Surely u = u1 has the maximal in-degree in U
⇒ If (v,u) ∈ E then u is MAX
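The reduction can be checked on the slide's 9-bit example with a small Python sketch. The names `build_graph`, `solve_max_2_layers` and `probe_bit_via_max` are hypothetical, indices are 0-based, and ties are broken by node order (an assumption; the slides leave ties open). The solver here stores the whole graph, so it only checks correctness of the reduction, not space usage.

```python
import math


def build_graph(bits, probe):
    """Build the 2-layer graph for a bit vector and a probed index."""
    r = math.isqrt(len(bits))
    assert r * r == len(bits), "sketch assumes m is a perfect square"
    V = [f"v{i}" for i in range(3 * r)]
    U = [f"u{j}" for j in range(3 * r)]
    # Alice's part: bit number i*r + j set <=> edge (vi, uj), 1/3 groups only.
    edges = [(V[i], U[j]) for i in range(r) for j in range(r) if bits[i * r + j]]
    # Bob's part: probe = v*r + u; boost v and u using the 2/3 groups.
    v, u = divmod(probe, r)
    edges += [(V[v], U[a]) for a in range(r, 3 * r)]
    edges += [(V[a], U[u]) for a in range(r, 3 * r)]
    return V, U, edges, U[u]


def solve_max_2_layers(V, U, edges):
    """MAX for k = 2: best in-degree neighbor of the best out-degree node."""
    out_deg = {v: 0 for v in V}
    in_deg = {u: 0 for u in U}
    neighbors = {v: set() for v in V}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
        neighbors[a].add(b)
    u1 = max(V, key=lambda v: out_deg[v])
    candidates = [u for u in U if u in neighbors[u1]]
    return max(candidates, key=lambda u: in_deg[u])


def probe_bit_via_max(bits, probe):
    """Answer the probe using only the MAX solver's output."""
    V, U, edges, target = build_graph(bits, probe)
    return 1 if solve_max_2_layers(V, U, edges) == target else 0
```

For the example vector 0 0 0 1 0 0 0 1 0, probing every index through `probe_bit_via_max` recovers the vector bit by bit.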
Theorem no.1 - The MAX problem
If we could just get another chance on the
stream (allow 2 passes) we could get a
much lower space complexity result…
Theorem 1: if we relax the streaming model
to allow 2 passes over the stream, then
MAX ∈ O(kn log n), where n = max over all layers i of |Vi| ≅ √m
Theorem no.1 - The MAX problem
The First Pass – compute the in-degree of each
vertex (for u∊V1 take the out-degree)
• As the stream is an adjacency list, just count
edges for each vertex; each degree takes
log(n) space.
• How many vertices are there?
Let n = max over all layers i of |Vi|; then |V| ≤ nk
Space complexity for the first pass:
nk (maximal number of vertices) × log n (space for a degree) = nk log n
Theorem no.1 - The MAX problem
The Second Pass – for each v ∈ Vi find its
maximal-degree neighbor in Vi+1
• Need log(n) space to represent a vertex’s
serial number.
• As the adjacency list is streamed, retrieve
the degree from our storage (created on the
first pass).
• Again, space complexity for the second pass:
nk (maximal number of vertices) × log n (space to represent the maximal-degree neighbor) = nk log n
Running total: nk log n + nk log n
Theorem no.1 - The MAX problem
The third stage : compute u1 and repeatedly
find the highest degree neighbor of the
current node until uk is computed.
– Requires log n space.
Total space: nk log n + nk log n + log n = O(nk log n)
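The two passes and the final walk can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the stream is modeled as triples (i, a, b) meaning an edge from a ∈ Vi to b ∈ Vi+1, node names are unique across layers, there are no duplicate edges, and ties go to the node seen first in the stream. `max_two_pass` and `read_stream` are hypothetical names.

```python
from collections import defaultdict


def max_two_pass(read_stream, k):
    """Two-pass MAX. read_stream() must return a fresh iterator over the stream."""
    # Pass 1: one degree counter per vertex (in-degree; out-degree for V1).
    in_deg = defaultdict(int)
    out_deg_first = defaultdict(int)
    for i, a, b in read_stream():
        in_deg[b] += 1
        if i == 1:
            out_deg_first[a] += 1

    # Pass 2: for each vertex, keep only its best neighbor in the next layer.
    best = {}
    for _, a, b in read_stream():
        if a not in best or in_deg[b] > in_deg[best[a]]:
            best[a] = b

    # Third stage: start at u1 and follow the stored best neighbors.
    if not out_deg_first:
        return None
    u = max(out_deg_first, key=lambda v: out_deg_first[v])
    for _ in range(k - 1):
        u = best.get(u)
        if u is None:
            return None  # the walk cannot be continued
    return u
```

Usage: with the edges held in a list `edges`, call `max_two_pass(lambda: iter(edges), k)`. The stored state is one counter and one neighbor name per vertex, matching the nk log n terms above.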
Theorem no.1 - The MAX problem
Out of Our Scope:
• MAX is quite hard: one can prove that
approximating it for any (1+ε)/2 in one
pass takes Ω(n²) space
• A generalization of the shown algorithm to
P > 1 passes gives a result of O(kn log n / P) space
Theorem no.2 - The MAXTOTAL problem
Find a node v1 ∈ V1 that has a path to the
largest number of nodes in Vk
Example : want to know from which airport in
Africa can one reach the biggest number of
airports in America via Europe?
[Figure: 3-layered graph with layers African airports, European airports, American airports; nodes are airports, edges are flights]
Answer: model with a 3-layered graph (African airports,
European airports, American airports) with an edge for each flight, then find
MAXTOTAL
Theorem no.2 - The MAXTOTAL problem
claim: bit-vector-probe ≤ MAXTOTAL (bit-vector-probe reduces to MAXTOTAL) → MAXTOTAL ∈ Ω(m)
•Given a vector of bits x1…xm we construct a 4-layer graph
•|Z| = √m + 1, |V| = √m + 1, |U| = √m + 1, |W| = √m + 1
•Add edges between the last nodes of Z, V, U, W
•Edge (vi,uj) ∈ E ⇔ bit number i·√m + j is set (note this mapping is
injective)
•Upon getting a probe for the i-th bit...
[Figure: four layers Z, V, U, W with nodes numbered 1 … √m+1 in each layer]
Theorem no.2 - The MAXTOTAL problem
claim: bit-vector-probe ≤ MAXTOTAL (bit-vector-probe reduces to MAXTOTAL) → MAXTOTAL ∈ Ω(m)
•Probed for the i-th bit
•Calculate v, u such that i = v·√m + u
•Add edges (u,wi) for every wi ∈ W
•Add edge (z,v)
•Now z is MAXTOTAL ⇔ (v,u) ∈ E; otherwise z√m+1 is MAXTOTAL
[Figure: the four layers Z, V, U, W with the added edge (z,v) and the edges from u to every node of W]
If z is MAXTOTAL return 1 otherwise return 0
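A small Python sketch of this reduction, for checking it on an example. Assumptions not fixed by the slides: indices are 0-based, the "last" node of each layer has index √m, the z that receives the edge (z,v) is the first node of Z, and ties in MAXTOTAL go to the earlier node. `maxtotal` and `probe_bit_via_maxtotal` are hypothetical names, and the solver stores the whole graph.

```python
import math


def maxtotal(layers, edges):
    """Return the node of the first layer that reaches the most last-layer nodes."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    best, best_count = None, -1
    for start in layers[0]:
        frontier = {start}
        for _ in range(len(layers) - 1):   # advance one layer at a time
            frontier = {b for a in frontier for b in succ.get(a, ())}
        if len(frontier) > best_count:
            best, best_count = start, len(frontier)
    return best


def probe_bit_via_maxtotal(bits, probe):
    """Answer the probe using only the MAXTOTAL solver's output."""
    r = math.isqrt(len(bits))
    assert r * r == len(bits), "sketch assumes m is a perfect square"
    Z, V, U, W = ([f"{name}{i}" for i in range(r + 1)] for name in "zvuw")
    # Path through the last nodes: the last z reaches exactly one node of W.
    edges = [(Z[r], V[r]), (V[r], U[r]), (U[r], W[r])]
    # Alice's part: bit number i*r + j set <=> edge (vi, uj).
    edges += [(V[i], U[j]) for i in range(r) for j in range(r) if bits[i * r + j]]
    # Bob's part: probe = v*r + u.
    v, u = divmod(probe, r)
    edges += [(U[u], w) for w in W]   # u reaches every node of W
    edges.append((Z[0], V[v]))        # the chosen z reaches v
    return 1 if maxtotal([Z, V, U, W], edges) == Z[0] else 0
```

If the probed bit is set, the chosen z reaches all √m + 1 nodes of W; otherwise it reaches none and the last z wins with one reachable node.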
Las-Vegas Vs. Monte-Carlo
•Las-Vegas – A randomized algorithm is called Las-Vegas if it
gives the correct answer on all input sequences; its running time
or workspace may be a random variable depending on the coin tosses.
•Monte-Carlo – A randomized algorithm is called Monte-Carlo
with error probability ε if it gives the correct answer with
probability at least 1-ε; if no ε is specified, it is assumed to be
1/3.
Theorem 3 -
• Symmetry problem: given a relation R as pairs
(u,v), determine whether R is symmetric
(for every (u,v) ∈ R, also (v,u) ∈ R)
• Claim: the symmetry problem has a Monte-Carlo algorithm with error probability
2log2n/n and space complexity O(log n).
• Claim: any Las-Vegas algorithm that solves
the symmetry problem with 1 pass uses
Ω(n²) space.