Seminar on Communication Complexity
Computing on Data Streams

Student: Tom Roshko
Instructor: Ronitt Rubinfeld
Based on “Computing on Data Streams” by Monika Rauch Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan, Digital Systems Research Center

Today’s special case of communication
• Reminder - the communication complexity model: Alice and Bob each have an input; they communicate using a protocol and solve a problem.
• Today’s special case:
  – Alice can only send
  – Bob can only receive
  – Both Alice and Bob might have input
  – Bob has to decide the answer

Bit Vector Probe problem
• Under the limited communication model:
  – Alice gets a vector of bits $x_1 \dots x_m$
  – Bob gets an index $1 \le i \le m$
  – Bob has to output $x_i$ as the answer
• The answer: Alice has to send Bob the whole vector. Proof? Quite intuitive: Alice does not know i, so if two different vectors produced the same message, Bob would answer wrongly on an index where they differ.
• Sending the whole vector costs m bits, so the one-way communication complexity is $\Theta(m)$: $O(m)$ by sending everything, $\Omega(m)$ by the argument above.

k-Layered Graphs
• Vertices are a disjoint union of k sets: $V = \bigcup_{i=1}^{k} V_i$
• Edges only go between consecutively indexed sets: $E \subseteq \{(v_i, v_{i+1}) \mid v_i \in V_i,\ v_{i+1} \in V_{i+1}\}$

The MAX Problem
– Recursive definition:
  » Let $u_1$ be the node of largest out-degree in $V_1$.
  » Let $u_i \in V_i$ be a node of largest in-degree among those incident to $u_{i-1}$.
– Our goal: find $u_k$.
[Figure: a k-layered graph; node labels $u_1$, $v_2$, $V_{k-1}$, $v_k$.]

What can we do with MAX?
• Assume we suspect a relation between the number of links to/from a web site and its relevancy.
• We have a list of web sites, all linked from our “super web site” $u_1$.
• Duplicate the web pages k times (k is our depth).
[Figure: layers from $u_1$ to $v_k$; legend: edge = link, node = web site.]

Theorem no.1 - The MAX problem
Our stream: the graph (V, E) given as an adjacency list, i.e. pairs of the form $(v_{i,k}, v_{i+1,j})$.

Theorem no.1 - The MAX problem
Lower bound under the streaming model (one pass allowed):
claim: $\text{bit-vector-probe} \le \text{MAX} \;\Rightarrow\; \text{MAX} \in \Omega(|E|) = \Omega(|V|^2)$ (that is, bit-vector-probe reduces to MAX)
Can we relate a problem in the streaming model to a communication complexity problem?
[Figure: a reduction function produces a stream that is fed to a MAX solver; diagram labels: "Adjacency items", "Other".]

Reduction function
claim: $\text{bit-vector-probe} \le \text{MAX} \;\Rightarrow\; \text{MAX} \in \Omega(m)$
• Given a vector of bits $x_0 \dots x_{m-1}$ we construct a 2-layer graph
• $|V| = 3\sqrt{m}$, $|U| = 3\sqrt{m}$
• Split each of V, U into a 1/3 group and a 2/3 group
• Edge $(v_i, u_j) \in E \iff$ bit number $i\sqrt{m} + j$ is set
• Upon getting a probe for the i-th bit...

Example with m = 9 (nodes $v_0 \dots v_8$ and $u_0 \dots u_8$):
bit index: 0 1 2 3 4 5 6 7 8
bit value: 0 0 0 1 0 0 0 1 0
$1 = 0 \times \sqrt{9} + 1 \;\Rightarrow\; (v_0, u_1) \notin E$
$3 = 1 \times \sqrt{9} + 0 \;\Rightarrow\; (v_1, u_0) \in E$
$7 = 2 \times \sqrt{9} + 1 \;\Rightarrow\; (v_2, u_1) \in E$

Theorem no.1 - The MAX problem
• Probed for the i-th bit (e.g. the 7th bit)
• Calculate $i = v \times \sqrt{m} + u$ (in the example, $7 = 2 \times \sqrt{9} + 1$)
• Add edges $(v, a)$ for each $a \in U_2$, the 2/3 group of U (in the example, edges $(v_2, a)$)
• Add edges $(a, u)$ for each $a \in V_2$, the 2/3 group of V (in the example, edges $(a, u_1)$)
[Figure: the two layers $v_0 \dots v_8$ and $u_0 \dots u_8$.]
Claim: u is MAX $\iff (v, u) \in E \iff$ probed bit $x_i = 1$
• Surely $v = v_2$ has the maximal out-degree in V
• Surely $u = u_1$ has the maximal in-degree in U
• So u is MAX exactly when it is incident to v: if $(v, u) \in E$ then u is MAX

Theorem no.1 - The MAX problem
If we could just get another chance on the stream (allow 2 passes), we could get a much lower space complexity result…
Theorem 1: if we ease the streaming model to allow 2 passes over the stream, then $\text{MAX} \in O(kn \log n)$, where $n = \max_{1 \le i \le k} |V_i| \cong \sqrt{m}$

Theorem no.1 - The MAX problem
The First Pass – compute the in-degree of each vertex (for $u \in V_1$ take the out-degree)
• As the stream is an adjacency list, just count edges for each vertex; each degree takes log(n) space.
• How many vertices are there? Let $n = \max_{1 \le i \le k} |V_i|$; then $|V| \le nk$.
• Space complexity for the first pass: (maximal number of vertices $nk$) × (space per degree $\log n$) = $nk \log n$

Theorem no.1 - The MAX problem
The Second Pass – for each $v \in V_i$ find its maximal-degree neighbor in $V_{i+1}$
• Need log(n) space to represent a vertex’s serial number.
• As the adjacency list is streamed, retrieve the degree from our storage (created on the first pass).
• Again: space complexity
for the second pass: $nk \log n$, i.e. (maximal number of vertices $nk$) × (space $\log n$ to represent the maximal-degree neighbor)

Theorem no.1 - The MAX problem
The third stage: compute $u_1$ and repeatedly find the highest-degree neighbor of the current node until $u_k$ is computed.
– Requires log n space.
Total: $nk \log n + nk \log n + \log n = O(nk \log n)$

Theorem no.1 - The MAX problem
Out of Our Scope:
• MAX is quite hard: one can prove that approximating it for any (1+ε)/2 in one pass takes $\Omega(n^2)$ space
• A generalization of the shown $kn \log n$ algorithm to P > 1 passes gives a result of O( P ) space

Theorem no.2 - The MAXTOTAL problem
Find a node $v_1 \in V_1$ that is connected to (has a path to) the largest number of nodes in $V_k$.
Example: from which airport in Africa can one reach the largest number of airports in America via Europe?
[Figure: three layers labeled African Airports, European Airports, American Airports; legend: node = airport, edge = flight.]
Answer: model the flights as a 3-layered graph (African airports, European airports, American airports) with an edge for each flight, then find MAXTOTAL.

Theorem no.2 - The MAXTOTAL problem
claim: $\text{bit-vector-probe} \le \text{MAXTOTAL} \;\Rightarrow\; \text{MAXTOTAL} \in \Omega(m)$
• Given a vector of bits $x_1 \dots x_m$ we construct a 4-layer graph
• $|Z| = \sqrt{m} + 1$, $|V| = \sqrt{m} + 1$, $|U| = \sqrt{m} + 1$, $|W| = \sqrt{m} + 1$
• Add edges between the last nodes of Z, V, U, W
• Edge $(v_i, u_j) \in E \iff$ bit number $i\sqrt{m} + j$ is set (note that this is an injective function)
• Upon getting a probe for the i-th bit...
[Figure: the four layers Z, V, U, W; node labels run from index 1 to index $\sqrt{m}+1$ in each layer, e.g. $v_1 \dots v_{\sqrt{m}+1}$.]

Theorem no.2 - The MAXTOTAL problem
• Probed for the i-th bit
• Calculate $i = v\sqrt{m} + u$
• Add edges $(u, w)$ for every $w \in W$
• Add edge $(z, v)$
• Now z is MAXTOTAL $\iff (v, u) \in E$; otherwise $z_{\sqrt{m}+1}$ is MAXTOTAL
[Figure: the same four layers with the distinguished node z marked.]
If z is MAXTOTAL return 1, otherwise return 0

Las-Vegas vs. Monte-Carlo
• Las-Vegas – A randomized algorithm is called Las-Vegas if it gives the correct answer on all input sequences; its running time or workspace may be a random variable depending on the coin tosses.
• Monte-Carlo – A randomized algorithm is called Monte-Carlo with error probability ε if it gives the correct answer with probability at least 1-ε; if no ε is specified, it is assumed to be 1/3.

Theorem 3
• Symmetry problem: given a relation R as a stream of pairs (u,v), determine whether R is symmetric (for every $(u,v) \in R$, also $(v,u) \in R$).
• Claim: the symmetry problem has a Monte-Carlo algorithm with error probability 2log2n/n and space complexity O(log n).
• Claim: any Las-Vegas algorithm that solves the symmetry problem in 1 pass uses $\Omega(n^2)$ space.
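
To make the Monte-Carlo claim for the symmetry problem more concrete, here is a minimal sketch of a one-pass, small-space check based on a standard polynomial fingerprint. It is an illustration only and not necessarily the algorithm behind the claim above. Assumptions: vertices are integers 0..n-1, the stream lists each pair of R at most once, and the names `PRIME` and `is_probably_symmetric` are made up for this example. A symmetric relation is always accepted; an asymmetric one is wrongly accepted with probability below n^2 / PRIME, which is a different error bound from the one in the claim.

```python
import random

# Assumption: a fixed Mersenne prime as modulus; any prime far above n**2 would do.
PRIME = (1 << 61) - 1


def is_probably_symmetric(pairs, n, rng=None):
    """One pass over distinct pairs (u, v) with 0 <= u, v < n.

    Evaluates, at a random point r, the polynomial
        sum over (u, v) in R of  x^(u*n + v) - x^(v*n + u)
    which is identically zero exactly when R is symmetric.
    Only r and one running value modulo PRIME are stored.
    """
    rng = rng or random.Random()
    r = rng.randrange(1, PRIME)
    fingerprint = 0
    for u, v in pairs:
        forward = pow(r, u * n + v, PRIME)   # term for the pair as given
        backward = pow(r, v * n + u, PRIME)  # term for the reversed pair
        fingerprint = (fingerprint + forward - backward) % PRIME
    return fingerprint == 0


if __name__ == "__main__":
    symmetric = [(0, 1), (1, 0), (2, 2), (3, 1), (1, 3)]
    asymmetric = [(0, 1), (1, 0), (2, 3)]
    print(is_probably_symmetric(symmetric, 4))
    print(is_probably_symmetric(asymmetric, 4))
```

The fingerprint keeps a constant number of machine words here; choosing a prime of size polynomial in n instead of the fixed one would bring the storage to O(log n) bits, at the price of a larger error probability.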