>> Yuval Peres: Good morning everyone and we are happy to have Michael from
Stanford tell us about algorithms for bipartite matching problems and some applications.
>> Michael Kapralov: Thank you Yuval. So I will tell you about algorithms for bipartite
matching problems with connections to streaming and sparsification. Let me start with
the motivation. The need to process modern massive datasets imposes constraints on the
type of algorithms that we can use and very often we have constraints on the space usage
for the algorithm and also very often on the type of access that the algorithm can have to
the data. So for example, we can no longer assume that we can just load the whole input
into memory and have random access to it.
Well, this raises the need to design succinct representations of the input that
preserve, perhaps approximately, the properties of the input that we care about. So for
graph algorithms, which are the main topic of the talk, cut preserving graph sparsification
has become an important way to get a succinct representation of the input and has
become a fundamental part of the algorithmic toolkit. So since its invention in 1996 by
Benczur-Karger it has found numerous applications to undirected flow and cut problems.
However, sparsification for directed graphs is still a challenging open problem. So this
talk is centered around the following topics. First I will talk about some algorithms for
bipartite matching problems that use sparsification and random walks in novel ways.
And here we should note that matchings are in a sense midway between directed and
undirected flow. Then we will talk about the question of how we can actually implement
cut preserving graph sparsification in modern data models. Then we will also talk about a
new notion of sparsification that we have for bipartite matching problems and if time
permits I will say some words on some new connections between different notions of
sparsification, in particular between spectral sparsification and spanners.
So more precisely this talk will have three or four parts depending on how much time I
have. I will first present some sublinear time algorithms for finding perfect matchings in
bipartite regular graphs. Then I will talk about a new notion of sparsification related to
matchings in the streaming model and show some applications. Then I will mention some
work that we did on getting a distributed streaming implementation of cut sparsification
and finally some connections between spectral sparsification and spanners leading to
effective algorithms for spectral sparsification.
So I'm going to start with the first part, where I want to talk about sublinear time algorithms for
finding perfect matchings in regular bipartite graphs. We will get an algorithm that runs
in time order n log n. So I'm going to start with the background. So here we have the
bipartite graph G. The sides of the bipartition will be denoted by P and Q, and as a
consequence of regularity the sizes of P and Q are the same, which we denote by n. The
number of edges is denoted by m. Recall that the graph G is d-regular if the degree of
every vertex is equal to d, so in particular the number of all edges is just n times d. A
subset of edges M is a matching if no two edges in M share an endpoint, and M is
a perfect matching if M is a matching and the size of M is exactly n, that is, M
matches all of the vertices in the graph. It is easy to see using Hall's theorem that every
d-regular bipartite graph has a perfect matching: every set S on the P side has at least |S| neighbors, since the d|S| edges leaving S land on vertices of degree d. And finding one such matching in a d-regular bipartite graph is the object of our talk.
These graphs have been studied extensively in the context of expander constructions,
routing, scheduling and task assignment, and have several applications in combinatorial
optimization. So in particular I will also show applications to two problems. The first is
edge coloring of bipartite multigraphs and the second is Birkhoff-von-Neumann
decompositions of doubly stochastic matrices. This problem has actually seen quite a bit
of algorithmic history, just close to 100 years. The first algorithm can be dated all the
way back to Konig in 1916 when Konig gave an algorithmic proof of existence. At that
time of course people were not thinking about algorithms, but one can see that Konig’s
proof runs in order mn time. 1974 Hopcroft and Karp gave their famous algorithm for
finding maximum matchings in general bipartite graphs that runs in time order m route n.
In 1982 Gabow and Kariv considered the question restricted to regular graphs and
obtained a very beautiful linear time algorithm for covering imperfect matching when the
degree of the graft is a power of two. This is a really nice algorithm; in fact it doesn't use
augmenting paths. Instead it does Euler tours to decompose the graph into regular graphs
of smaller degrees.
After that there were three improvements over about 20 years. The first by Cole and
Hopcroft, then by Schrijver and finally by Cole, Ost and Schirra who in 2000 obtained a
linear time algorithm that works for general degrees, so they removed the assumption that
d is a power of two. This algorithm is extremely efficient. Linear time is just the time that
we need to read the input. What else can we hope for here? And the question that we ask is
do we actually need to read the whole input? So can we go sublinear here? For
sublinear algorithms it is of course important to fix the format in which the data is given,
so for the purposes of this talk we are assuming that the graph is given in adjacency
array representation, which means that each vertex has an array of incident edges.
So a natural conjecture would be the following. What if we take a random sample of the
edges of the graph where each edge will be present independently with certain probability
and maybe we can prove that for certain sampling rates the matching will be preserved, a
perfect matching will be preserved in the sample with high probability. If we could do
that we could then run some standard algorithm like Hopcroft and Karp for general
graphs and maybe get an improvement. Well, this is a reasonable conjecture and
furthermore it turns out to be true. And this is something that we proved in 2009. We
show that it is sufficient to sample a uniform subgraph of a certain size; the size is given
by the following expression that depends on n and the degree, but the main point is that
this is never bigger than m root n. And if we take such a sample of this regular
graph then we show that a perfect matching will be preserved in the sample with high
probability.
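For concreteness, here is a rough sketch (not from the talk) of the uniform sampling approach, using networkx's Hopcroft-Karp implementation; the sampling probability is a placeholder to be set according to the bound above, and the P and Q vertex labels are assumed distinct.

```python
import random
import networkx as nx

def matching_via_uniform_sampling(edges, p_side, q_side, sample_prob, seed=0):
    """Keep each edge independently with probability sample_prob, then run
    Hopcroft-Karp on the sampled subgraph (a sketch of the uniform-sampling
    approach; sample_prob would be set according to the bound on the slide)."""
    rng = random.Random(seed)
    H = nx.Graph()
    H.add_nodes_from(p_side)
    H.add_nodes_from(q_side)
    H.add_edges_from(e for e in edges if rng.random() < sample_prob)
    # Returns a dict mapping each matched vertex to its partner.
    return nx.algorithms.bipartite.hopcroft_karp_matching(H, top_nodes=p_side)
```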
Using Hopcroft-Karp in the right regime for the sampling gives us an algorithm with
runtime n to the 1.75, and this is sublinear for dense enough graphs, so this is a result. So we
do have a sublinear algorithm, but n to the 1.75 doesn't look like a natural stopping point. It
seems that this should be improvable, and also it seems that if uniform sampling works
then most probably non-uniform sampling can help improve the runtime. That is also
correct. We show that a two-stage sampling scheme, that is, uniform sampling
followed by a non-uniform sampling process, together with a specialized analysis of the
running time of Hopcroft-Karp on these subsampled graphs, gives us a runtime which is worst-case n to the 1.5 and in fact is linear in the size of the uniform sample.
So at this point n to 1.5 is a fairly natural runtime for bipartite matching algorithms
especially given the Hopcroft-Karp algorithm, and furthermore one can see that this
runtime is optimal if we commit to the scheme that we're using, that is uniform sampling
first and then running the Hopcroft-Karp algorithm. However, the structure of worst-case
examples suggested to us that perhaps we can get an improvement if we somehow
managed to combine the sampling process and the process of augmentation. So this in
fact can be done and this is the main result of this part. We show that there exists a
randomized algorithm for finding a perfect matching in a d-regular bipartite graph as long
as the graph is given in adjacency array representation, that takes order n log n time
both in expectation and with high probability. So first…
>>: [inaudible].
>> Michael Kapralov: Okay.
>>: Presumably the proof has two stages?
>> Michael Kapralov: Yes. The proof will show you the expectation part and the high
probability bound will follow easily. So let me note the following. First, the runtime of this
algorithm is independent of the degree of the graph, so basically we are independent of
the size of the input. Furthermore, the runtime is within a logarithmic factor of the output
complexity, because we need order n time just to output the matching. So now I will show
you the algorithm which is in fact quite simple and give the analysis. So the algorithm
will use augmenting paths, to repeatedly increase the size of the currently constructed
matching. At this point let me remind you that an augmenting path with respect to a
partial matching is a path that starts on the P side of the graph at an unmatched vertex
and then alternates between taking unmatched and matched edges until it reaches an
unmatched vertex on the Q side of the graph. We need a randomization of this
process, and a very natural randomization is the following. Instead of taking an arbitrary
unmatched edge at odd steps, let's take a uniformly random outgoing unmatched edge at odd
steps of this path, and still take matched edges at even steps. So this is something that we
refer to as the alternating random walk.
Let me give an example. Here we have a 4-regular graph and a matching of size 3. So the
green nodes are unmatched. The blue nodes are matched so the alternating random walk
starts at a uniformly random unmatched node on the P side, a green node, takes a
uniformly random outgoing edge and takes the matched edge back and proceeds in this
way. So note, for example, that it can easily visit certain vertices more than once.
Eventually it arrives at an unmatched node on the Q side of the graph. Great. So it
should be noted that if we have a sequence of steps taken by the alternating random walk
from the P to the Q side, then we can get an augmenting path from this sequence of steps
simply by removing loops. Here we have a loop. If we remove it we get a length three
augmenting path.
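As a rough illustration (not from the talk), here is a minimal sketch of one alternating random walk followed by loop removal, assuming the graph is given as adjacency lists from P to Q; efficient sampling of an unmatched outgoing edge is glossed over, and the P and Q labels are assumed distinct.

```python
import random

def alternating_random_walk(adj, match_p, match_q, start, rng=random):
    """One alternating random walk from the unmatched P-vertex `start`:
    at odd steps take a uniformly random unmatched outgoing edge, at even
    steps follow the matched edge back.  Returns the visited vertices,
    alternating P, Q, P, Q, ..., ending at an unmatched Q-vertex."""
    walk = [start]
    p = start
    while True:
        q = rng.choice([v for v in adj[p] if match_p.get(p) != v])  # naive sampling
        walk.append(q)
        if q not in match_q:            # reached an unmatched Q-vertex: done
            return walk
        p = match_q[q]                  # follow the matched edge back to P
        walk.append(p)

def remove_loops(walk):
    """Turn the walk into an augmenting path by cutting out loops: whenever a
    vertex repeats, truncate back to its first visit."""
    path, pos = [], {}
    for v in walk:
        if v in pos:
            path = path[:pos[v] + 1]                     # cut the loop
            pos = {u: i for i, u in enumerate(path)}
        else:
            pos[v] = len(path)
            path.append(v)
    return path
```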
And now our algorithm works as follows. We start with the empty matching and then,
repeatedly for k from one to n, we run the alternating random walk with respect to the
matching that we have constructed so far and wait until it hits an unmatched vertex on
the Q side of the graph. We augment using the augmenting path that we get from
this walk and proceed. So I will now show that the algorithm above finds a perfect
matching in order n log n time. To do that, it will be convenient to introduce the
following concept: we define the matching graph H, which depends on the graph G and
a partial matching M, in the following way; so let me illustrate this. So here again we
have a bipartite graph and a matching M of size 3, so let me first orient all edges from P to
Q. Then I will add a source and a sink. The source is connected to the unmatched vertices
on the left, and these edges are drawn thick because in fact they are d parallel
edges, d edges for each thick edge. The sink is connected to the unmatched nodes on the right, and now let's
look at the matched edges, and we will just contract all matched edges into supernodes.
So this is our matching graph H.
Well, our algorithm can be formulated in a very simple way in terms of this matching
graph, so what we are doing is the following, we are starting with the empty matching
and then repeatedly we run this simple random walk, from the source in this matching
graph and wait until it hits the sink. Once we have the sequence of steps, we augment
using the path that we obtained from it. So what we need to show is the following: the main lemma
in our analysis says that if we have our d-regular bipartite graph and a
matching M that leaves 2k nodes unmatched, so k nodes on each side, then the
expected time until the simple random walk in the matching graph, started from the
source, hits the sink is at most one plus n over k. So when we start with a very small
matching, which leaves a lot of nodes unmatched, k is large and it will be extremely easy to find
an augmenting path with respect to this matching. It will get progressively harder but the
cumulative effort will be small anyway.
So now let me prove this statement. The proof will be very simple and it will be
convenient to modify the matching graph a little bit. Let's look at the nodes at the source
and the sink and let's merge them into one supernode. Okay. So the process we were
running on the matching graph was the simple random walk from s to t. Now in this case
this directly corresponds to starting your random walk at this new supernode s and
waiting until it gets back to s. And what we need to analyze then is the expected return
time to this vertex s. So what really helps is the fact that the graph that we are getting is a
balanced directed graph. As is probably obvious to most of you, this will be
easy to analyze, and in fact we know that for a balanced directed graph the stationary distribution of
the simple random walk can be described in a simple way. So first let's check that it is
actually balanced. We have several types of nodes here. There is the supernode, then
there are these blue nodes that correspond to matched edges that we contracted. Well,
they have in this case in-degree 3 and out-degree 3, and in general d minus one, and the green
nodes have out-degree d and in-degree d because these edges are thick. So they are
balanced too. Good. So we have a balanced directed graph. Now we know that the
stationary distribution of the simple random walk on such a graph is uniform over
edges, and so the mass at a vertex is proportional to the vertex's out-degree.
At the same time, what we are interested in is the return time to the special node s. But
the expected return time is just the inverse of the stationary probability of s, and now we
can prove the result that we want. The out-degree of the node s, of the supernode, is
d times k, where k is the number of nodes that were left unmatched on each side,
so intuitively when the matching is small and a lot of nodes are unmatched, this random walk
will spend a lot of time at s, and that is good for us. So now we can do the calculation,
and the calculation shows that the quantity is at most 1 + n over k. So this proves the
main lemma. Now it is easy to get the runtime analysis, because we simply have n
augmentations and the augmentation with k unmatched nodes left takes expected time at
most 1 + n over k. So the runtime is bounded by the summation of these quantities for k
from 1 to n, and that is exactly n times 1 plus the harmonic number of n, which is order
n log n again. Now this was the expected time analysis.
To get the high probability result we can just apply standard techniques, truncate the random
walks appropriately and use concentration. Okay. Great.
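Written out, the two calculations being used are the following (the inequality is the bound claimed in the talk; I have not re-derived the edge count of H here):

```latex
\mathbb{E}[\text{return time to } s]
  \;=\; \frac{1}{\pi(s)}
  \;=\; \frac{|E(H)|}{\deg_{\mathrm{out}}(s)}
  \;=\; \frac{|E(H)|}{dk}
  \;\le\; 1 + \frac{n}{k},
\qquad
\sum_{k=1}^{n}\left(1 + \frac{n}{k}\right) \;=\; n + n H_n \;=\; O(n \log n).
```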
So this was an order n log n algorithm for recovering one perfect matching. Now let me
show some applications. So the first application will be to edge coloring bipartite multigraphs. And here we get an extremely simple order m log n algorithm. Now this is
slightly slower than the best known. The best known is order m log d where d is the
degree, but our algorithm is so simple that I do want to state it. So the algorithm works
in two steps. The first step is standard. We take the bipartite multigraph and transform
it into a bipartite regular graph. Now in the next step we can simply take out matchings
from this d-regular graph one by one. Each matching will take n log n time to
find, and we will be done in order m log n time in total. Now here I am skipping the
point that when we run the alternating random walk, it is important to be able to sample a
uniformly random unmatched outgoing edge efficiently. This has to be done carefully;
it can indeed be shown that the sampling can be implemented in constant amortized time.
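As a hedged sketch (not the talk's code), the second step might be implemented as below, assuming a find_perfect_matching routine like the one sketched earlier; the regularization step and the constant-amortized-time sampling trick are omitted, and parallel edges are ignored.

```python
def edge_color_regular(adj, p_vertices, d, rng=random):
    """Color the edges of a simple d-regular bipartite graph with d colors by
    peeling off perfect matchings one at a time."""
    colors = {}                                    # (p, q) edge -> color index
    remaining = {p: list(nbrs) for p, nbrs in adj.items()}
    for color in range(d):
        matching = find_perfect_matching(remaining, p_vertices, rng)
        for p, q in matching.items():
            colors[(p, q)] = color
            remaining[p].remove(q)                 # peel the matched edge off
    return colors
```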
What seems very nice about this is the fact that our algorithm for recovering one
perfect matching is extremely efficient. It takes n log n time irrespective of the size of
the input, so we can find such edge colorings in a very simple manner just by taking
out matchings one by one. So another application is to finding matchings in doubly
stochastic matrices, and here if we are given an n by n doubly stochastic matrix with m
nonzero entries, then the Birkhoff-von-Neumann decomposition theorem says that every
such matrix can be represented as a convex combination of at most m permutation
matrices. The question is can we recover such a decomposition efficiently? Let me
just sketch how this works and…
>>: [inaudible].
>> Michael Kapralov: So b is the number of bits that we use to represent the numbers in
the matrix. So since it is a doubly stochastic matrix we need to specify what kind of
representation we use. So there are some known algorithms that find such decompositions
and for example they take order m times b time to find a single matching in the support of
the doubly stochastic matrix and they take order mb log n time to compute the whole
decomposition.
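For intuition, here is a hedged sketch of the standard Birkhoff-von-Neumann peeling loop (a generic sketch, not the talk's optimized algorithm); find_support_matching is a hypothetical callback that returns a perfect matching in the support.

```python
import numpy as np

def birkhoff_von_neumann(A, find_support_matching, tol=1e-12):
    """Decompose a doubly stochastic matrix A into a convex combination of
    permutation matrices by repeatedly finding a perfect matching in the
    support and peeling off the minimum weight along it.
    `find_support_matching(support)` should return perm with
    perm[i] = column matched to row i."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    decomposition = []                       # list of (coefficient, permutation)
    while A.max() > tol:
        support = A > tol
        perm = find_support_matching(support)
        coeff = min(A[i, perm[i]] for i in range(n))
        decomposition.append((coeff, list(perm)))
        for i in range(n):                   # subtract coeff times this permutation
            A[i, perm[i]] -= coeff
    return decomposition
```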
I want to just say that we have a very simple algorithm with a very efficient runtime here
because we can view this matrix M as a multi-graph and essentially the same analysis
will go through. So we can run our algorithm as long as we can implement the sampling
stage which is sampling the uniformly random outgoing edge. This is a little harder in
this case than in the edge coloring case, but we can in fact implement this in order log n
time and we get some efficient algorithms. Let me skip this. Great. So this was
the main algorithm and two applications. Now I want to mention some lower bounds.
We proved two statements here. First, we proved that randomization is crucial to
obtaining sublinear time algorithms; in particular, any deterministic algorithm has to
take at least linear time. So the algorithm of Cole, Ost and Schirra, which finds a matching
in time linear in the size of the input, was essentially optimal.
Furthermore, we show that we cannot improve upon the n log n runtime if we want an
algorithm that works with high probability. Essentially, what this shows is that while we
cannot rule out the existence of an algorithm that finds a matching in order n expected
time and terminates with probability one half, let's say, if we want an algorithm that
terminates with high probability, then it has to take at least n log n time in the worst case.
Great. So this completes the first part of the talk. I talked about sublinear time
algorithms for finding perfect matchings in regular bipartite graphs, and some of them use
sparsification, at least the first ones.
So now I want to spend a few minutes mentioning a different project that we worked on.
And this is about graph sparsification and how we can implement graph sparsification in
modern data architectures, in particular in ActiveDHTs. I will define what that is in just a
few minutes. Even though we used graph sparsification in the first part to obtain the
sublinear time algorithms, I didn't define it, so let me define it now. If we have a
weighted undirected graph G, then a graph H is a cut sparsifier, an
epsilon cut sparsifier, of G if all cuts in H are within a one plus or minus epsilon factor of
the corresponding cuts in G. So this is a great concept, because if H is sparse then we can
use H instead of G in cut-based optimization problems, getting better runtimes.
The famous theorem of Benczur-Karger proved in ‘96 shows how to construct such
sparsifiers. In particular, they show that one can calculate a probability p_e for each edge e
such that if the edges of the graph are sampled with
these probabilities and then we weight the sampled edges appropriately, the resulting
weighted graph H will be a sparsifier of G with high probability. Furthermore, Benczur-Karger also gave a nearly linear time algorithm for finding these probabilities and hence
constructing this sparsifier.
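As a rough sketch (not the Benczur-Karger algorithm itself), the sampling-and-reweighting step, once the probabilities p_e are known, might look like this:

```python
import random

def sample_sparsifier(weighted_edges, prob, seed=0):
    """Given weighted edges [(u, v, w)] and a sampling probability prob[(u, v)]
    per edge, keep each edge independently and scale its weight by 1/p so that
    every cut keeps its value in expectation (the reweighting step only; how
    the probabilities are computed is the heart of Benczur-Karger)."""
    rng = random.Random(seed)
    sparsifier = []
    for u, v, w in weighted_edges:
        p = prob[(u, v)]
        if rng.random() < p:
            sparsifier.append((u, v, w / p))   # inverse-probability reweighting
    return sparsifier
```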
Since 1996 this has found numerous applications to cut and flow problems, and in fact
has become an integral part of the algorithmic toolkit, arguably alongside such
fundamental primitives as BFS and DFS. So this motivates the need to obtain
efficient implementations of sparsification in modern data models. So the question that
we ask here is can one get an efficient implementation of cut sparsification in a
distributed streaming setting. Ideally we want an algorithm that works in a single pass in
a distributed streaming setting. To put this in perspective, one might think of the
situation where the nodes of the graph do not fit into the memory of one compute node.
And our architecture for this will be ActiveDHT which I will define in a few moments. It
should be noted that efficient implementations are known for the random access model and
the one pass streaming model, but we also want to be efficient in the distributed setting. So
let me say a few words about what ActiveDHT is, and before I do that I need to remind
you of MapReduce. MapReduce is an immensely successful paradigm that has
transformed off-line analytics and bulk data processing. In MapReduce, data is
stored as key-value pairs in a distributed file system, and computations are sequences of
iterations of map and reduce steps. Mappers and reducers are essentially processes
running on compute nodes, and there is a programming paradigm that specifies how
they interact.
The main point here is MapReduce is great for off-line data processing. ActiveDHT, on
the other hand, may potentially become as important for online data processing as
MapReduce is for off-line problems. ActiveDHT here in fact stands for active distributed
hash table and the hash table is active in the sense that besides supporting lookups,
deletions and insertions, it also supports running arbitrary functions on key value pairs.
There are some examples of these systems implemented, for example Twitter’s Storm
system and Yahoo's S4 and the main applications are distributed stream processing and
continuous MapReduce. So one might think of this as MapReduce where the mappers
and reducers do not interact according to this rigid paradigm of iterations; rather, they
have the ability to talk to each other continuously. These are fairly new systems, and in
fact there are challenges in implementing them which have not yet been fully solved,
such as the inefficiency of small network requests and robustness, but people are working on
that. So this is all that I will say about ActiveDHTs.
Now what we are interested in here is constructing a sparsifier on ActiveDHT. So let me
sketch how this will work. To do that at first I would like to look at how standard
efficient algorithms for constructing sparsifiers work. In general, there are two steps.
First one needs to find these probabilities p_e, the sampling probabilities, and once we
have the probabilities we can sample independently using these rates and weight the edges
appropriately. The most important step here is, of course, how we find the probabilities
and at a high level the observation that we use is that one can estimate these sampling
probabilities using a hierarchy of union find data structures. The benefit of this will be
that union find will be fairly easy to distribute. Of course there are some challenges that
we need to work on to make this work. One of those is the fact that we need to estimate
connectivities and sample at the same time. We have to control the size of the sample
while this happens, but this can be done. Another interesting point is that when
we distribute the union-find data structure, we have to ensure two things: first, that our
distributed implementation does not lead to excessive communication, and furthermore,
we need to ensure some load-balancing properties. That is, not only should we have small
communication, but also the communication should be somewhat evenly spread across compute
nodes.
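For reference, here is a minimal union-find sketch of the kind the hierarchy described above would be built from; the hierarchical and distributed aspects are not shown and are my simplification.

```python
class UnionFind:
    """Union-find with path halving and union by size.  Connectivity estimates
    for setting the sampling probabilities can be built from a hierarchy of
    such structures, one per level; only the single-machine core is shown."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                                    # already connected
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        return True
```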
These are some challenges that we can overcome and let me just state what we are
getting, so we get an efficient distributed stream processing algorithm which computes a
sparsifier on ActiveDHTs in one pass. And it has some favorable space usage properties
and good communication and load balance. So I have to skip most of the details here but
I am happy to chat off-line if somebody is interested. Great.
So far I have been talking about sublinear algorithms for matchings and cut
sparsification. Now in the remaining time I would like to talk a little bit about a new
notion of sparsification related to bipartite matching problems that we recently
introduced, and I will show you some applications to approximating matchings in one
pass in the streaming model. So let me now introduce the definition. Suppose that we
have a bipartite graph G. The sides of the bipartition are P and Q, and for simplicity we
will assume that the sides have equal size, so the size of P equals the size of Q equals n. Now we
call a subgraph H an epsilon cover of G if H preserves the sizes of matchings between any
pair of subsets A of P and B of Q, up to an epsilon n additive error. So here is an
example. Suppose we have a graph G here, so this is the P side and this is the Q side. The
condition that H is an epsilon cover says that whichever two sets A and B we
look at, if we compute the maximum matching between the two in G and then
compare it to the maximum matching in H, the maximum matching in H should be
at most an epsilon n additive term smaller.
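In symbols (my transcription of the definition just stated), writing MM_G(A, B) for the size of a maximum matching between A and B using the edges of G:

```latex
H \subseteq G \text{ is an } \varepsilon\text{-cover of } G
\iff
\mathrm{MM}_H(A, B) \;\ge\; \mathrm{MM}_G(A, B) - \varepsilon n
\quad \text{for all } A \subseteq P,\ B \subseteq Q.
```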
Of course, the main question that we are interested in here is: what is the optimal size of
an epsilon cover for a graph on 2n nodes, n nodes on each side? So this question
asks for the general trade-off. We are given n and we are given epsilon, so what is the
optimal size of the cover? Now we will also be interested in the following twist of this
question. Suppose that I want an efficient cover, that is, I want to represent the matchings
in the graph using few edges. So I constrain my cover to have Õ(n) edges, that is,
n polylog n edges, and this is a standard notion of small, let's say, for streaming
algorithms. Now the question becomes: what is the smallest epsilon for which an epsilon
cover with Õ(n) edges always exists? So these are the two questions that we are interested
in.
To the best of our knowledge there is no prior work on this, so I will just go to our results.
Here we prove the following. On the positive side, we give an efficient
construction of a one half cover of any graph G that has a linear number of edges.
Furthermore, we show that this is in fact tight, in the sense that if we constrain the
cover to have Õ(n) edges, n polylog n for any polylog, then we cannot have
a cover for epsilon smaller than one half. If you want an epsilon cover for epsilon
smaller than one half, then for some graphs it will need to have at least n to the power one plus
Omega of one over log log n edges, which is significantly bigger than any n polylog n.
So this essentially completely characterizes the second question that we asked, about
the best approximation that we can get with few edges. And for the general case, the
general trade-off in epsilon, we show that the optimal size of an epsilon cover is
essentially equal to the largest possible number of edges in a so-called epsilon Ruzsa-Szemeredi graph. So this is a very interesting family of graphs that comes up
in PCP constructions, property testing and additive combinatorics, and I will say a few words
about them at the very end.
So these are the results. The natural question is: how good is this? What does it
mean that we have a one half cover? To put this into perspective, let me
remind you what our main motivation is. The main motivation is finding approximate
matchings in the streaming model. So here the edges of the graph are given to us in
arbitrary order in a stream and we can only use Õ(n) memory. The question here is: what
is the best approximation factor to the maximum matching of the graph
that can be obtained in a single pass over the data?
So in this context one might think that a one half cover may not be useful, because
it seems like the one half approximation that we can always get by just
keeping a maximal matching. That is in fact not correct, and we show that the one half
cover roughly corresponds to an approximation factor of two thirds for
matching. So here are our results related to streaming. The techniques that we use
to construct our half cover yield the following. First, we get a two thirds
approximation to the natural communication problem associated with matchings in one
pass, which I will define in a few slides. Furthermore, we get a lower bound of two thirds
for one pass streaming algorithms. That is, we show that no one pass streaming algorithm
using Õ(n) space can get a better than two thirds approximation to the maximum matching.
And finally, so this was for the communication problem
and a lower bound, but we will also show that our techniques are useful in the
general streaming case, as long as we make the additional assumption
that we don't have edge arrivals, but rather vertices arriving in the stream. We will talk about
this a little later. Great. In the remaining time I will do the following. First I will show the
construction of what we call the matching skeleton. So this is the matching sparsifier that
is our main tool for these results. We show that it is a half cover. I will have to skip the
proof, however, and then I will talk a bit more about applications to streaming and also
the connection between epsilon covers and Ruzsa-Szemeredi graphs.
So the matching skeleton will be a sparse subgraph of G that in a sense preserves some
useful information about matchings. And now I will give the construction. First I will
make this one technical assumption that in our graph G there is a perfect matching of the
P side. The general construction will be very similar. This will be easier to describe. So
what this says in particular is that the vertex expansion of all sets on the P side is at least
one. So one other thing that I will need is the definition of an alpha matching. So this is
a fractional matching that matches each vertex on the P side exactly alpha times and
each vertex in Q at most once. So alpha will be at least one.
So the construction of this matching skeleton will proceed in two steps. First I will take
the graph and come up with a decomposition of the vertex set of the graph into what we
call expanding pairs, so these will be pairs (S_j, T_j), and these will be vertex-induced
subgraphs that have increasing vertex expansion. So the j-th subgraph will have expansion
denoted by alpha_j, and this expansion will be the ratio of the sizes of its two sides. Once I
have this decomposition, I will choose a fractional matching inside each subgraph, and the
edges that the fractional matching is supported on will be the edges of the skeleton.
Okay so how does the decomposition work? The decomposition works as follows. We
start with the graph and we repeatedly find and remove sets S from the P side of the
graph that have the smallest vertex expansion. For example, here we find a set S_0 that
has the minimal possible vertex expansion and we remove it from the graph. It might
seem ill-defined, and in fact it is as I just stated it, because there could be a lot of such sets
that have the smallest possible vertex expansion, but one can show that there will always
be a maximal such set to remove, and that's what we do. So this gets removed from the
graph and we recurse on the rest. Now again, we find the smallest expanding set on the P
side and we remove it. So this goes on until the remaining part of the graph has
essentially the best possible expansion for such a graph, that is, expansion equal to the
ratio of the sizes of the bipartition. So this is the decomposition. Now it can be shown in
fact that the vertex expansion goes up as we do this process, and each piece in the
decomposition has vertex expansion which is the ratio of the sizes of the sets in its bipartition. So in particular, there exists a fractional alpha_j matching, where alpha_j is this
expansion, in each such subgraph. This matching can always be chosen to be supported on a forest, just
by canceling cycles, and the edges of this forest are exactly the edges of our matching
skeleton.
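A very rough sketch of the decomposition loop (my paraphrase, not the talk's code; maximal_min_expansion_set is a hypothetical subroutine, and finding it is where the real algorithmic work lies; the fractional alpha_j-matching forest chosen inside each pair is not shown).

```python
def expanding_pair_decomposition(P, Q, neighbors, maximal_min_expansion_set):
    """Peel off expanding pairs (S_j, T_j) of increasing vertex expansion
    |T_j| / |S_j|; the edges of a fractional alpha_j-matching forest inside
    each pair then form the matching skeleton."""
    P, Q = set(P), set(Q)
    pairs = []
    while P:
        S = maximal_min_expansion_set(P, Q, neighbors)      # S is a subset of P
        T = set().union(*(neighbors[p] for p in S)) & Q     # its neighborhood in the rest
        pairs.append((S, T))
        P -= S
        Q -= T
    return pairs
```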
So this is the construction. Now I have to skip the proof of the main property, but the
main property is the following. Suppose we have two graphs, bipartite G1 and G2 and
we are interested in the maximum matching of the union of these two graphs. Now if we
instead replace the first graph with its sparsifier, with its matching skeleton, then what
we get is a two thirds approximation. So this is the main property and one can in fact
derive from this property that the matching skeleton is a half cover. So what this means
is that we have a graph with n vertices on each side. Then the matching skeleton of the
graph will preserve the sizes of matchings between any pair of subsets up to an additive n
over 2 term. Again, let me stress that it might seem that something simple like a
maximum matching would have these properties, but in fact that is not true. A maximum
matching does not give a better than two thirds cover.
So now let me sketch these connections to streaming. So far I defined this matching
skeleton and showed that this is a half cover. Now let me show the connections to
streaming, and here I need to define the following communication problem.
>>: [inaudible] cover?
>> Michael Kapralov: Oh, sure. So if we have a graph G, and it's balanced in the sense that
there are n vertices on each side, then H is an
epsilon cover if the following is true. We look at any pair of subsets A on one side
and B on the other side, calculate the maximum matching between these sets in G
and in H, and compare them. Now the maximum matching in H should be at most
an epsilon n additive term smaller. So in this case here we get this property with one half,
that is, we preserve these matchings up to an n over 2 additive term.
So the communication problem is the following. We have two communicating parties,
Alice and Bob. Now Alice has a graph G1 and Bob has a graph G2 on the same set of
vertices but with a different set of edges. Now Alice sends a message to Bob after which
Bob is supposed to output a one minus epsilon approximation to the maximum matching
of the union of two graphs. So maybe this matching. The questions that we are
interested in here are a lot like the ones that we asked for matching covers. First, what is
the minimum size of the message that Alice needs to send Bob that will always let Bob output
a one minus epsilon approximate matching of the union? So this is again asking for
a general trade-off between message size and approximation, and a restricted version of the question is:
suppose we restrict the communication between Alice and Bob to be Õ(n), so n polylog n,
quasi-linear in the number of vertices in Alice's graph. What is the best approximation
that they can achieve? Well, a natural approach to this problem is to just ask Alice to
send a maximal matching of her graph to Bob. This will be very little communication. It
will take Õ n communication and give a one half approximation. So now, well this is a
great problem, but why do we care about this problem?
Now the motivation for this problem comes from the problem of approximating
maximum matchings in one pass in the streaming setting. And in fact a lower bound for
the communication problem will immediately translate into a lower bound for streaming
algorithms and an upper bound will not really translate directly into anything, but
nevertheless the techniques that will work for the communication problem will also let us
get a result for streaming with vertex arrivals. So there is some prior work on this
problem. There has been significant progress on approximating matchings in the
streaming model in k passes, for k greater than one. For a single pass the best known
approximation is still one half, achieved by the trivial algorithm that just keeps a
maximal matching. This was very recently improved to 1/2 + epsilon for a small positive
constant epsilon, but this is under an additional assumption that the edges arrive in a
random order in the stream, and we are interested in the case when the arrival order is
adversarial.
>>: Would you say [inaudible]?
>> Michael Kapralov: This is Konrad, Magniez and Mathieu, a very recent thing, maybe a
month or two ago. On the lower bound side the only lower bound known is Omega of n
squared for one pass, but this is for computing exact matchings. So let me state our
results. It follows immediately, using the results that we proved for the matching
skeleton, that the communication complexity of obtaining a two thirds approximation to
maximum matching is Õ(n). In particular, instead of sending a maximal matching of her
graph, Alice can compute the matching skeleton, which is sparse, and send it to Bob. And
so this is for the communication problem, but for the general streaming case, we show
that in fact, if we use this sparsification procedure given by the matching skeleton, we
can just use it repeatedly in the streaming model, and as long as we have the assumption
that not edges but vertices arrive in the stream, we will get a 1 - 1/e approximation to
the maximum matching. This will take linear space and we will only use a single pass
over the data. So it should be noted here that one minus one over e can also be obtained
in this setting using the KVV algorithm for the online version of the problem, but
that algorithm is randomized whereas our algorithm is deterministic.
So far I showed that one half covers exist and that the communication complexity is quasi-linear when we want a two thirds approximation. A natural question is: what about better
covers and better approximations? Well, here we show connections to a family of graphs
known as epsilon Ruzsa-Szemeredi graphs. Unfortunately, I do not have enough time to
define them properly. But essentially, these graphs are defined by the property that their
edge set can be partitioned into a union of induced matchings, and each such induced
matching has size at least epsilon n. In fact these graphs come up in applications in
property testing and PCP constructions and additive combinatorics, and it is a major problem
to determine the optimal size of these graphs as a function of epsilon and n. The gaps
between the best-known bounds are immense, so for example, the best-known upper
bound for these graphs is n squared over log star. And the best-known constructions for
constant epsilon achieve the number of edges which is n to 1+ Omega of one over log log
n. And we show that for the general question of bounding the optimal size of epsilon
covers, this question is essentially equivalent to bounding the optimal size of epsilon
Ruzsa-Szemeredi graphs.
Furthermore, let me say how we obtained the lower bounds for streaming algorithms and
lower bounds for the communication complexity problem. This is done via an
extension of a beautiful result by Fischer and others, where they construct epsilon Ruzsa-Szemeredi graphs for constant epsilon. They achieve a number of edges n
to the one plus Omega of one over log log n, but their construction works for constant
epsilon, and here we extend this construction to work for epsilon arbitrarily close to one
half. Now this immediately gives us lower bounds that say that our bounds on the half
covers and quasi-linear communication complexity are best possible. That is, if we insist on
quasi-linear communication, two thirds is the best we can do, and if we insist on a quasi-linear number of edges, one half is the best that we can do for covers. So this also
implies a streaming bound and so this one half here is actually the largest epsilon that we
can possibly hope for because our construction of a one half cover precludes the
existence of these graphs with a large number of edges for larger epsilon.
So this concludes the discussion of our notions for sparsification for matching problems
and applications to streaming. Now in the remaining minute let me say two words about
some other work that we have been doing with Rina Panigrahy, and so this is…
>>: [inaudible] [laughter].
>> Michael Kapralov: This shows some connections between spectral sparsification and
spanners. And in fact we show that one can obtain efficient algorithms for obtaining
spectral sparsifiers using spanners of random subgraphs. And yeah. I have done some
other work on online matching and prediction strategy problems and differentially private
low rank approximation and I thank you for your attention.
[applause].
>> Yuval Peres: Are there any questions?
>>: [inaudible] matching lower bound for that [inaudible] are there cases or further
assumptions where it could be faster or [inaudible]?
>> Michael Kapralov: This algorithm definitely takes time n log n on the complete
graph, so yeah, I am not really sure. I don't know of any assumptions that would make it
actually order n. But that is a good question. And the lower bound that we proved in fact
precludes order n with high probability only for dense graphs, [inaudible] squared
[inaudible]. So if the graphs are sparser, it is not clear. And in fact, it cannot be true for
very sparse graphs because there is a linear time algorithm if the degree is constant.
>>: [inaudible].
>> Michael Kapralov: There is nothing that I am aware of. That's interesting.
>> Yuval Peres: Any more questions? Thank you.
[applause].