>> Philip Chou: It's my great pleasure to introduce Alex Dimakis. He is an
assistant professor at the Viterbi School of Engineering at USC. And before
that he was a post-doc at Caltech, and before that a graduate student at U.C.
Berkeley, and during that time he was an MSR graduate fellow here. So it's a
great pleasure for me to welcome him back to MSR. Alex.
>> Alex Dimakis: Thanks, Phil. So it's great to be back in the new building. So I
will talk about network coding for distributed storage.
And before I talk about network coding, I want to talk about coding. So how to
store information using erasure codes. So this is a very basic slide. But very
useful for anything else that follows. So let's say you have a file or a data object
which is this big yellow box here and you cut it into two pieces.
So K is going to be 2 in these examples. And you can use a code, a 3, 2 code to
store information distributedly. So the way you store is you store the first bucket,
the first part of the file in one server. The second part in another server, and you
store the bit-wise XOR of the packets in a third server. And this is just a
single parity disk and this is used all over the place, of course.
And the key point here is that you have generated three packets so that any two
out of the three allow you to recover the original two. So that's why this is a 3, 2
maximum distance separable code and it's also known as single parity.
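To make this concrete, here is a minimal Python sketch of this (3, 2)
single-parity scheme; the toy file and the byte-level split are illustrative,
not from the talk:

```python
# (3, 2) single-parity code: split a file into two halves and store each
# half plus their bitwise XOR on three different servers.
data = b"hello world!"                          # toy 12-byte "file"
a, b = data[:6], data[6:]                       # K = 2 pieces
parity = bytes(x ^ y for x, y in zip(a, b))     # third server: A XOR B

# Any two of the three packets recover the file, e.g. if server 1 dies:
recovered_a = bytes(x ^ y for x, y in zip(b, parity))
assert recovered_a + b == data
```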
This is a little more interesting case where I generate four packets. This is a
single parity. And this is A plus 2B. A plus 2B what does that mean? That
means I group my bits into groups of two, let's say. I think of them as numbers in
a larger finite field.
And then I do my operations in a larger finite field. If you don't know about finite
fields you can just think of these as linear equations over the reals and
everything will work out fine.
The key point here is that these are four linear equations. And any two out of
these four allow me to reconstruct back A and B. For example, if I get B and A
plus 2B, these are two linear equations I can solve to get back A and B.
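And as a sketch of that recovery with numbers, using integers mod a prime to
stand in for the larger finite field (the specific values are made up):

```python
# (4, 2) MDS code: store A, B, A+B, A+2B as elements of a finite field.
p = 257                                   # toy field GF(257)
A, B = 42, 99
packets = [A, B, (A + B) % p, (A + 2 * B) % p]

# Recover from packets 2 and 4, i.e. B and A+2B: two equations, two unknowns.
got_B, got_A_2B = packets[1], packets[3]
got_A = (got_A_2B - 2 * got_B) % p        # subtract 2B from A+2B
assert (got_A, got_B) == (A, B)
```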
The point of this slide is that erasure codes are better than replication. What is
replication? Replication says just store the packet twice and store the other
packet twice, and this is in four disks, and let's compare this with a 4, 2 erasure
code, which is the one I just showed.
And what I want to say here is that this scheme here, the code, is much better
than replication, because on the right any two allow us to recover the file,
whereas for this 2X replication scheme, if I lose the first and the third disk I'm
fine, but if I lose the first two disks I have lost A and therefore I have lost the file.
So this scheme is more reliable than replication, of course. So this is well known.
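One way to see the difference is to enumerate the two-disk failure patterns; a
small sketch (the disk labels are illustrative):

```python
# Which pairs of failed disks lose data? 2x replication stores A, A, B, B;
# the (4, 2) MDS code survives ANY two failures, since any two of its four
# packets decode.
from itertools import combinations

disks = ["A", "A", "B", "B"]
fatal = [fail for fail in combinations(range(4), 2)
         if {"A", "B"} - {disks[i] for i in range(4) if i not in fail}]
print(fatal)   # [(0, 1), (2, 3)]: losing both copies of A, or both of B
```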
And in fact erasure codes are introducing redundancy in an optimal way. It's
optimal because any K packets allow you to recover your file and you could not
hope to get your file back from any K minus 1 because you don't get enough bits.
And that's why people use erasure codes like Reed-Solomon, fountain codes and
error-correcting codes more generally and they use them all over the place.
Still, however, most peer-to-peer storage systems and most information stored in
data centers today uses replication. So they make three, four copies of the
objects and they don't use parity. They don't mix information together. Now, why
is that? So this guy, this is Claude Shannon, and he was in the phone company.
And he basically, one of the main results that he showed is that when people
were sending bits over noisy channels, they were observing that, okay, we can
repeat the bits. Therefore, our rate goes down. But our probability of error
reduces. And people thought that you can, to reduce the probability of error you
have to reduce the rate. And so you have to repeat your symbol more and more
times.
And if you want your probability of error to go to 0, the ratio of useful bits
over the total number of bits you would be sending would have to
vanish. That's what people believed.
But the breakthrough result and the change of thinking was that you don't
actually have to do that. You can have arbitrarily small probability of error with a
fixed rate, as long as you're below the capacity.
So practically what I'm saying is that replication, which is a repetition code, is a
terribly bad code and every coding theorist knows that, but still this is what we
use in many distributed storage systems.
And the question is can we improve the efficiency? Of course we use replication
for other things, too, for load balancing, for efficiency. But at least for archival
storage, when the main bottleneck is you want reliability, then for those cases we
should be using codes. Why don't we? Well, people have worked on using
codes. There are many problems when you try to use codes over networks.
And this is great, because there are new open problems for people to look at.
So let me show you a few of them. One is when you have two servers, let's say,
that have two packets A and B and they want to create this very basic code here
that is A, B and A XOR B. Every time you send a packet over a network now you
have to pay communication costs.
Typically these costs were not considered important because everything was
happening centrally. But now when you have a network of computers and the
network is the bottleneck, you want to minimize the communication that's
happening in the network. So this is one problem, how to create codes that have
the smallest number of edges or the smallest amount of information when they
are created.
The other issue, for example, is update complexity: if I'm storing A on this
disk and A plus B, the XOR, on this disk, then when one packet changes I
have to go to all the disks that are storing parities combining that packet and I have
to change those.
So graph-theoretically that is related to the degree of each parity node, and I
would like to make these degrees as small as possible. The problem is these
degrees are in direct conflict with reliability. So what is the minimum update
complexity that you can have if you want to tolerate any N minus K failures, for
example, is one question that has been addressed a little bit, but there's many
open problems there, too.
The main open problem that I will talk about is this one here where, okay, you
have your code. So this was 3, 2 code. But now you have a failure. So you
have to get a new disk and you have to communicate some information to the
new disk so that these two guys with the new guy form a good code again. So
this is going to be called the code repair problem, and the repair communication
is a new problem that has not been looked at before because there was no
network in the picture.
Okay. So the story is this guy. This is Sean Rhea, a Ph.D. student at Berkeley,
and he was building this system. Has anyone heard of this OpenDHT system?
It's a distributed peer-to-peer storage system. And later, as I read, the Dynamo
system Amazon is building draws on that, and many other systems are building on these
ideas, it's a system where you store distributed information over the Internet and
they use a lot of replication or coding. So I was the coding theory guy around.
And I went, we went for coffee and we were talking and said, okay, you know
there's all these amazing developments in coding theory, we can do very fast
decoding, sparse graph codes, fountain codes, network coding, all that, perhaps
we could use any of that in the distributed storage system. And basically after
talking with him, I realized that the main problem, or one of the most
important problems, is this repair issue. It's not coding or decoding complexity or
sparsity or any of these things.
So here is an example. Let's say you have, your file is seven packets total. And
you use, you encode it into 14. That means any seven out of these 14 allow you
to get your file back. So this is your original data object here and these are
parities. Seven parities. So any seven packets out of these 14 will allow you to
recover your file back. And each packet is one megabyte in size, let's say. The
total file is seven megabytes. And in OpenDHT they were using a
Reed-Solomon code, or the information dispersal algorithm idea, which is very
similar.
When they had one failure, they had a new peer here that had to create a new
parity. Either a parity or X4, the systematic symbol that was lost. The problem is,
to create a Reed-Solomon code symbol, for example, to create a new parity, you
need to have all the data. So to create one packet here that had size one
megabyte they had to send all these things to this guy so that was seven
megabytes in communication over a network.
And then this guy would use these, solve for X1, X2 up to X7, and create a
new packet or new linear combination here and store that. So this is the punch
line that the amount of network traffic required to reconstruct one lost data block
was the main argument against using erasure codes in peer-to-peer storage
systems.
And several studies have pointed out that this is a big problem. And when you
use codes as a black box, repairing one failure basically requires all the data to
be present. Yes?
>>: Do you have the same problem if some of your original data changes? Or
your Xi changes?
>> Alex Dimakis: If some of your Xi changes then you have to go to your parities
and update them. That's another problem. It's called the update complexity
problem. One way around this is say I'm never going to change things, I'm only
going to use codes for stuff that is archival. So that's -- typically this is the case
of most interest. But there has been work on the update complexity.
So it's a separate related problem. The B-codes, for example, have optimal
update complexity; they are codes designed exactly for that issue. But for the repair
problem there was nothing --
>>: If you consider archival, it's also true that usually archives are updated. You
append. You only append, you never change things.
>> Alex Dimakis: Sure.
>>: So if you add things you still require a lot of communication, right, so you will
have to update.
>> Alex Dimakis: Yes, update is a separate problem that was known. And there
is work -- as far as I know, the optimal construction theoretically is a B-code, which
has minimum update complexity for given reliability. So they use these in RAID
6. So this is a problem that's relevant even when the storage is centralized.
The novelty in these problems is that when the storage is distributed, the network
is the bottleneck and now you also have these repair issues. Any other
questions so far?
Okay. So I will talk about this. My main, the main problem is how do we repair a
code? Well, one way to repair a code is bring all the data in one point. But you
can do better than that. So this is setting up the problem more formally. Assume
you have your code. Let's say it's a 4, 2 MDS code, and one node here, this
guy, leaves the system. The question is how much data do we have to
communicate to this guy, the newcomer, so that these three guys combined with
the newcomer form a good N, K MDS code. So is the problem clear to
everyone?
Okay. First idea. Definitely I can communicate two megabytes, because any two
megabytes out of this give me back all the data, as we said. So I can send two
megabytes here. This guy can solve for A and B. And store here A. Or it could store
a different linear combination. So it doesn't have to be A here. You could store A
plus 50B; as long as this is linearly independent of these, this is still a good code.
But the newcomer is going to download two megabytes and store only one. Is it
possible to download less? This is a quote from a peer-to-peer storage paper that
says if you use any of the codes we know, to make one packet you need all the
data. And the main point is that this is not always true. You don't always need
all the data to make a new encoded packet. If you use network coding you can
do it with much less. So this is the main message of this talk.
Okay. So it is possible to download 1.5 megabytes. This is for this example.
This is the information-theoretic minimum. Okay. How do you do it? The first
thing you have to do is what's called sub-packetization. So you have to take
every packet and cut it into two. So now this is a code that has four
variables. Any two boxes now contain four linear equations in four variables.
You can check that you can actually solve these equations. This is a 4, 2 MDS
code. When you have a failure each guy is going to make a small linear
combination of the two packets they have. Each packet here has size half a
meg. So this is half a meg. So these are three packets of total size 1.5. You
send these three packets to the newcomer. The newcomer makes a linear
combination and stores it. And makes another linear combination and stores it
here. And this is what is being stored. So you see what the scheme I'm
suggesting is?
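Here is a sketch of that repair scheme with random coefficients, over the reals
for simplicity; a real system would use a finite field, and the seed and sizes
here are arbitrary:

```python
# Functional repair of a (4, 2) code over four half-packets, by random
# linear combinations. Each node stores 2 of 4 "half-packet" equations;
# a repair downloads one combination (half a packet) from each of the
# three survivors: 1.5 packets of traffic instead of 2.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
nodes = [rng.standard_normal((2, 4)) for _ in range(4)]   # coefficient rows

def is_mds(ns):
    # any two nodes together must hold 4 independent equations
    return all(np.linalg.matrix_rank(np.vstack([ns[i], ns[j]])) == 4
               for i, j in combinations(range(len(ns)), 2))

assert is_mds(nodes)
sent = [rng.standard_normal(2) @ nodes[i] for i in (1, 2, 3)]  # node 0 died
newcomer = rng.standard_normal((2, 3)) @ np.vstack(sent)
assert is_mds([newcomer] + nodes[1:])                     # still a (4,2) MDS
```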
One question is instead of cutting it in two packets, why don't we cut it into 100
packets and send around 33 from each. And the total communication would not
be 1.5. The total communication would be approaching one megabyte as I cut
smaller and smaller. Well, if you do that, however, you can prove that this
packet, these linear equations here will not be good. They will not be in general
position compared to this. So they will not form a good MDS code. So you can
only reduce your communication to 1.5 in this example. So let me try to show
you why this is the case. Okay. So this is called information flow graph.
I take every storage node and I make two copies and I connect them with one
edge that is the capacity of the storage node. And here I put a source, a virtual
source where all the data began. This is the newcomer, and the communication
is going to be beta. This is to be minimized to repair this failure. Now, I want any
two disks to contain enough information to recover the data. So I claim that if the
minimum cut separating this source from this data collector is smaller than the file
size, which is two megabytes in my example, it is information theoretically
impossible to get the data here.
So if you compute the minimum cut in this example, it is one, from cutting this edge,
plus 2 beta, and you find that 1 plus 2 beta has to be at least two
megabytes. So beta has to be at least one-half. So this is how you find that
you could not communicate less than 1.5.
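Spelling out the arithmetic of that cut: it crosses one stored node of capacity
1 megabyte plus two repair edges of capacity beta each, so recovering the
2-megabyte file requires

    1 + 2 * beta >= 2,  that is,  beta >= 1/2,

and the total repair traffic is 3 * beta >= 1.5 megabytes, which is exactly
what the sub-packetized scheme above achieves.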
So now but how do you actually achieve that? Is it possible to achieve -- this is
just -- this is called a cut set lower bound. Okay. The problem is why is it not
achievable, because it's not only this pair of nodes that could fail. You want any
pair of two nodes that fail to allow you to construct the data. So you have to
somehow simultaneously serve all the possible 4-choose-2 pairs that could appear.
And here, after a lot of pain in PowerPoint, I added all the possible 4-choose-2 pairs
and you want a code that simultaneously sends the data to all of these guys.
And how can we hope to do that? This is called multi-casting in information
theory. And the main message is that repairing a code is equivalent to
multi-casting on this graph that you can construct.
And the breakthrough results of network coding, originally by Ahlswede et al.,
Koetter and Medard, and Tracey Ho and coauthors, showed basically that if
the minimum of the min cuts, so the minimum from the source to each one of
these data collectors is sufficient, then there exists a code that sends the
information to all of them simultaneously. This is highly nontrivial, right?
Because I'm serving many people at the same time. And I'm serving every one
at the rate of the poorest.
And further, you can achieve that with a linear code. And further you don't even
have to think too much about it. You can just make linear combinations of
everything arriving. And with high probability all these guys will get linear
equations that will be solvable as long as you are not trying to communicate
more than the min cut.
So is it clear what I'm saying here? So the only thing we have to worry about, if
we're trying to repair our code, is basically what is the right amount of
information, and just random linear combinations will suffice.
And you can -- the thing you have to evaluate is the minimum cuts on these
graphs that are formed by nodes failing. So if you have an N, K MDS code,
and when a node fails a newcomer communicates beta from each existing node,
you can do the graph theory and find this is the minimum storage and this is the
communication required to repair a failure.
And it's a reduction to a flow problem, but the graph is infinite. I will show
you why the graph is infinite in one second. But before that, if you just plug into
this 14, 7 example that we had for peer-to-peer systems, where repairing a single
failure naively costs you seven megabytes of network traffic, if you evaluate this
bound, you find you can repair with only 1.85. Very large reduction in the
communication required.
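The 1.85 is consistent with the minimum-storage point of the cut-set bound,
assuming the newcomer contacts all d = n - 1 = 13 surviving nodes; a quick
check:

```python
# Minimum-storage regenerating (MSR) point of the cut-set bound:
# store alpha = M/k per node, repair with gamma = (M/k) * d / (d - k + 1).
def msr_point(M, k, d):
    return M / k, (M / k) * d / (d - k + 1)

alpha, gamma = msr_point(M=7.0, k=7, d=13)   # the (14, 7), 7 MB example
print(alpha, gamma)       # 1.0 MB stored, ~1.857 MB moved (the 1.85 quoted)
```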
Of course, there is one key problem that I don't know if you have seen already. I
have not promised you that this box is going to be exactly what is lost. I'm going
to form here something that is a new linear equation that I only promise that any
seven out of these gives you back the data. Right? So this is much different
than just having exactly X4 here. X4 was part of the data. But now I formed a
new parity. So I'm changing my code as I go. So that's one weakness of these
results.
Okay. Now, why do you want -- why is this graph infinite? So in general you
have your code here, any four out of these -- sorry any K. This is general. So
your file has size M. You cut it into K pieces. Each one is M over K in size and
this is alpha, the stored information. So when you have a failure here, you repair
it by having a new node connecting to D existing nodes and communicating beta
bits from each. Now you have repaired this failure but now maybe there's
another failure.
And you repair it again and then maybe there's another failure and you repair it
again. So this graph is, you don't know which failures are going to happen, right?
And these failures could be going on forever. So the graph here is unbounded in
size. And you want to make sure that no matter what, when you connect to any K
nodes throughout this infinite graph you have enough flow.
So you have to compute these parameters so at any given time during the
evolution process there's enough flow on this infinite graph. So you have to find
what is the trade-off between this beta D and alpha. The storage and the repair
communication.
So this is what we did. The punch line is if you give me a little bit more storage,
the repair bandwidth can be greatly reduced. So I'll just give you some numbers
here. If you have a file that's let's say 20 megabytes you cut it into 20 pieces and
you make 25 out of those. So that any 20 out of the 25 give you back your file.
So you can tolerate five disks failing. If you use a Reed-Solomon code, then
each disk will store one megabyte and each failure will cost you 20, because for
repairing one failure you need all the data. If you use what we call minimum
storage regenerating codes, then you will be storing the same, but a repair will
only require 4.8 megabytes. And the trade-off now was the following. Okay, if
you allow me to store a little more, so I'm going to inflate my storage. So each
storage node stores 1.65, then you can compute the cut set bounds and you will
find that the repair bandwidth is reduced to 1.65 also. So I increased my
storage in the system by 60 percent but my repair bandwidth goes down four
times.
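Those numbers match the two extreme points of the same bound (a check, again
assuming the newcomer contacts all d = n - 1 = 24 remaining nodes):

```python
# The (25, 20) example: file M = 20 MB, k = 20, d = 24 helper nodes.
M, k, d = 20.0, 20, 24
msr_alpha, msr_gamma = M / k, (M / k) * d / (d - k + 1)
mbr_alpha = mbr_gamma = (M / k) * 2 * d / (2 * d - k + 1)
print(msr_alpha, msr_gamma)     # 1.0 MB stored, 4.8 MB repair (MSR point)
print(mbr_alpha, mbr_gamma)     # ~1.655 MB stored = repair (MBR point)
# versus Reed-Solomon: 1.0 MB stored, 20 MB downloaded per repair.
```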
So is there --
>>: Do you have any example maybe later of this reconstruction code, is it just a
linear code?
>> Alex Dimakis: These are all linear network codes. The reason they are
network codes is because to construct them, to maintain them, you have to mix
packets. So I will show some examples in a second.
>>: So just random coefficients and elements in a finite field?
>> Alex Dimakis: Yes.
>>: But still a --
>> Alex Dimakis: Yes. The thing is if you do that, you are not going to have
exact repair. You're going to be changing your code as you go. The parities will
be changing. They will be linear equations but they will be different linear
equations. And that is a big problem in practice and the more exciting thing is
how do you actually keep the code fixed.
So I will talk about that in one second.
>>: The goal is to reconstruct the file, right, so why is it a big problem if you can
always reconstruct the file?
>> Alex Dimakis: You can always reconstruct the file from any K. But most of
the time you don't want to -- most of the time you just want to read one thing. So
if you want to read a subset of the file, like you want to read one specific
bit of the file, you don't want to have to get all K, solve these equations, and get all your
data. You could ask for a partial read. Because most of the time you have zero
failures and you just want to read something.
So that was -- that's why it's always good to keep half of the code uncoded.
That's called systematic. So half of the code is the data itself -- yeah.
>>: Can you reduce by increasing the [inaudible] storage [inaudible] can you
reduce the amount of bandwidth to the exact size of the --
>> Alex Dimakis: Well, yes. Yes.
>>: So the minimum --
>> Alex Dimakis: Yes.
>>: So if storage is not an issue then bandwidth can be as [inaudible].
>> Alex Dimakis: Yes. Yes. So there's an interesting point here: at this
operation point you communicate exactly what you store, which is the minimum
possible. You see, because you only -- there's no way you could go below that,
right, because you're storing 1.65. You can always achieve that. At the minimum
bandwidth point these two are equal.
Okay. So there's a trade-off between storage and communication here. And the
question is what are the achievable points? So, okay, you can pose this as a
graph problem. You have this infinite graph. Everybody connects to D nodes and
communicates beta. You want any data collector, so any K nodes, to give you your
data back. So choose these parameters so that everybody here gets
enough flow.
And this is the general theorem; I will spare you the details. The general
idea is, if you have an N, K code, you store alpha bits, you connect to D nodes
and download D times beta. So beta from each. D times beta is the gamma.
That's the total communication.
Then there is the crazy formula that describes the region, the trade-off region
between communication and storage. And if I just plot it, it's much easier. So
this is the region. This is how much you store per node. This is how much you
repair totally, total communication. This point here is called the minimum storage
point. So minimum storage regenerating codes stands for MSR. This is no
coincidence, because we did half of this while I was here at Microsoft Research.
This is Microsoft -- no, minimum storage point. This is the minimum bandwidth
point. So there's a trade-off between the two. Everything above this blue
line is achievable with random linear network coding and everything below is
information theoretically impossible by a cut set bound, like the one I showed.
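The bound behind the blue line is that the file size M must satisfy
M <= sum over i = 0..k-1 of min(alpha, (d - i) * beta); here is a sketch of
tracing the curve numerically from that inequality (the sampled beta values
are arbitrary):

```python
# Trace the storage/repair trade-off: for each repair parameter beta,
# find the smallest per-node storage alpha satisfying the cut-set bound.
def min_alpha(M, k, d, beta):
    lo, hi = 0.0, M                       # bisection on alpha
    for _ in range(60):
        mid = (lo + hi) / 2
        flow = sum(min(mid, (d - i) * beta) for i in range(k))
        lo, hi = (mid, hi) if flow < M else (lo, mid)
    return hi

M, n, k = 20.0, 25, 20
d = n - 1
for beta in (0.2, 0.12, 0.08, 0.07):      # repair traffic gamma = d * beta
    print(round(d * beta, 2), round(min_alpha(M, k, d, beta), 3))
# beta = 0.2 is the minimum-storage point (alpha = 1, gamma = 4.8); pushing
# beta down toward 2/29 approaches the minimum-bandwidth point (~1.655 both).
```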
Okay. So this is all good. And this is going to appear. But there's one very
important problem. So we characterize this region but only if you're changing
your code. So if you talk to a practitioner, they will say, okay, that's all nice but I
actually want to repair exactly what I lost.
So this looks like a trivial extension initially, right? So you say, okay, now I lost
this guy and here I want to create a packet that is not just any linear combination
that's in general position. But I want to repair exactly what I lost. And I have
these bounds that were the cut set bounds, and now I want to ask, okay, this is a
strictly harder problem. Can I achieve the same cut set bounds? This was the
open problem.
And this is a very, very hard problem. Because as I said before, when we reduce
this problem to a network coding problem, you only have these data collector
guys who want all the data. Now, when all your clients want all the data, this is
called a multi-casting problem, and all multi-casting problems have been
characterized, and they're easy to characterize. You only have to serve the
poorest and everybody else will get the information.
However, now when we have exact repair, we have these intermediate guys who
want data themselves. They want specific linear combinations. So that
problem -- so this is the picture, repair is multi-casting, but exact repair is
multi-casting with intermediate nodes having requests, and the requests are
overlapping. So different people might want stuff that overlaps.
And therefore the cut set region, the region I showed you before, the blue line,
might not be achievable, linear codes might not suffice, and we don't know,
basically -- it's a very, very difficult problem. If you have network coding with
multiple sources, we have crazy examples of linear codes not
sufficing, and we don't know how to characterize that region. So this was a very
difficult problem and it was open for a few years. Let me tell you the story. So in
general the question is, this is a blue region. What can you achieve with exact
repair? So the two points that we will mainly focus on are this point and this point.
There are no results as far as I know for any intermediate points. The
intermediate region is open. But let's only look at these two points. These two
points have received some work. The minimum storage point here and the
minimum bandwidth point. So starting with Leong Ho: when I was an intern
here, we had this paper that said, if K is equal to 2, so you
separate the original file in two pieces, then systematic repair is
exact repair, it's the same thing. You can match the cut set bound if K equals 2.
We had constructions, codes, linear codes that achieved the cut set bound for
this case.
And then when I was a post-doc at Caltech, we were trying to find 5, 3; this was
the smallest case that was open. And Dan Cullina, an undergraduate at the time,
ran a huge computer search over all possible codes, and there were some
optimizations to make this feasible, and we could find some codes that were
5, 3, exact. So they were matching the cut set bound for this case.
And then there were some results by these two groups that showed that if the
rate of the code is less than one-half, then exact repair can match the cut set
bound. So this generalizes this result but not necessarily this one.
But they have a very specific code. So your error-correcting code has to be
constructed in a very special way from these Cauchy matrices, and
then they show you can actually repair these codes exactly, as long as your
rate is below one-half, and you can match the point on the blue curve, the cut set
bound. Is it clear what I'm saying? Okay.
One obvious question is, for high rates, what can you do? So Cadambe and
coauthors independently showed that you can actually approach -- approach, not
match -- the cut set bound for any N, K, for the minimum storage point, using this
technique that's called the symbol extension technique. That's quite
remarkable because this technique was developed for an entirely different
problem, the interference channel in wireless. Nothing to do with network coding,
nothing to do with wired networks. It's a problem over the reals where wireless
channels interfere.
The exact same technique can be applied, and they show you can approach the
cut set bound for all N and K. This is a quite remarkable result. The problem is
that it requires an enormous field size and enormous packetization. Remember I
was cutting every packet into two. Now you have to cut the packet into 10,000 or
billions of pieces; it's exponential in N and K.
So but it shows that it's possible. It can be done. So linear codes suffice to
approach the cut set region for exact repair, for the whole range of parameters,
for the minimum storage point. So this is one point where we know we can
approach it.
Okay. So now I want to give you -- how much time do I have? I should have
quite a lot of time?
>>: Half an hour.
>> Alex Dimakis: Half an hour. Good. So since most people are still awake, I'm
going to give you my 5-minute -- 10-minute crash course on interference
alignment and how it's possible to achieve these results.
Okay. So what's happening here? Imagine I give you three linear equations in
four variables. Okay. In general, if I give you three equations in four variables, in
four unknowns, you cannot solve for any of the variables, right? You would hope
to solve for three of them, and if the equations were trivial, if the equations were
A1 is 5 and A2 is 11 and A3 is 12, then three equations and three unknowns,
you get them. But now I have three equations, four unknowns. In general, you
cannot recover anything from them; the only thing you learn is that they lie on
a subspace. Well, let's look at these three equations, three equations and four
unknowns. However, as you can probably see, I can use these three equations.
So this equation says B1 plus B2 is Y3. I can subtract this equation from
these two and get two linear equations in A1 and A2, and I can solve for A1 and
A2. So this is three equations in four unknowns, but I can solve for two of them.
Why? Because these coefficients here and these coefficients here are aligned.
That means basically that the rank of this matrix of interference coefficients,
which is all ones, is only 1. And therefore I can take this equation, cancel the
interference, and get two linear equations in the two things I actually want and
recover them. Do you see what I'm trying to say here?
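A numerical version of that toy example; the coefficients multiplying A1 and A2
are made up, and what matters is that the interference enters every equation
only through the aligned combination B1 + B2:

```python
# Toy alignment example: 3 equations, 4 unknowns (a1, a2, b1, b2), but the
# interference (b1, b2) appears only through the aligned combination b1+b2,
# so a1 and a2 are solvable.
import numpy as np

a1, a2, b1, b2 = 5.0, -2.0, 7.0, 3.0
y1 = 1*a1 + 2*a2 + (b1 + b2)      # interference coefficients (1, 1)
y2 = 3*a1 + 1*a2 + (b1 + b2)      # same (1, 1) direction: aligned
y3 = b1 + b2                      # interference observed directly

# Cancel the interference, then solve 2 equations in 2 unknowns.
A = np.array([[1.0, 2.0], [3.0, 1.0]])
sol = np.linalg.solve(A, np.array([y1 - y3, y2 - y3]))
assert np.allclose(sol, [a1, a2])
```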
This is, of course, something you can do in high school. The difficult thing is how
do you do many of these alignments at the same time. So how do we form good
codes that have these crazy alignments, a lot of them at the same time? This is
really the question.
So here is my example of a 4, 2 code that is exactly repairable. This is what you
asked about before.
So, first of all, observe this is a systematic code. The first two nodes are the
data themselves. These are linear equations. A box now, a node, stores two
linear equations. Observe that any two nodes contain four equations that you
can solve for your four variables. So this is indeed the 4, 2 code. Any two boxes
give you back your data. Now, you lose one.
What am I allowed to do? I'm allowed to send one linear equation from each of
these guys to this -- this is the newcomer. The newcomer wants to solve for X1,
X2. What I'm allowed to choose is my coefficients here. These are called
repair coefficients. So, for example, I can do 1 times X3 plus 1 times X4 and I
form this linear equation. The size of this is half a meg. I can choose these
coefficients here again and form another equation that has half a meg and
another equation here. I can choose any of these the way I like. This is my
choice. What is this? These are three linear equations in four variables. There
is no way I can make these equations contain only three variables, because this
has to be a good code. So there's no way I can choose my coefficients to only
have the stuff that this guy wants. This guy wants X1 and X2 here. He really
doesn't want X3 and X4. That's why this is red. So X3 and X4 are interference
to our friend here.
But the point is, I can choose 1, 1 here, 1, 1 here, and 2 inverse and 3 inverse
here, so that the interference part is the same here and here. And then, using this
equation -- this is exactly the same situation as the one I had before -- I can
cancel this stuff out, and this guy now has two equations in the two things he
wants.
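Here is a sketch of this example end to end, with the 1, 1 and 1, 1 and
2-inverse, 3-inverse repair coefficients mentioned above (real arithmetic for
readability; an actual construction lives in a finite field, and this
particular pair of parities is one illustrative choice):

```python
# An exactly repairable (4, 2) code on half-packets x1..x4. Nodes 1 and 2
# are systematic; nodes 3 and 4 hold parities chosen so any two nodes
# give four independent equations.
import numpy as np

x = np.array([5.0, -2.0, 7.0, 3.0])                 # x1, x2, x3, x4
node2 = np.array([[0, 0, 1, 0], [0, 0, 0, 1.0]])    # stores x3, x4
node3 = np.array([[1, 0, 1, 0], [0, 1, 0, 1.0]])    # x1+x3, x2+x4
node4 = np.array([[1, 0, 2, 0], [0, 1, 0, 3.0]])    # x1+2x3, x2+3x4

# Node 1 (x1, x2) fails. Each survivor sends ONE half-packet combination;
# the repair coefficients make the x3, x4 interference align to x3 + x4:
s2 = np.array([1, 1]) @ node2 @ x          # x3 + x4          (pure interference)
s3 = np.array([1, 1]) @ node3 @ x          # x1 + x2 + (x3 + x4)
s4 = np.array([1/2, 1/3]) @ node4 @ x      # x1/2 + x2/3 + (x3 + x4)

# Cancel the aligned interference and solve two equations in x1, x2.
A = np.array([[1.0, 1.0], [1/2, 1/3]])
x1, x2 = np.linalg.solve(A, np.array([s3 - s2, s4 - s2]))
assert np.allclose([x1, x2], x[:2])        # exact repair, 1.5 packets moved
```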
So is that example clear? Okay. So the interesting thing is that this code, if you
lose the first node you can do this. If you lose the second node you can choose
again different coefficients to repair exactly. Now if you lose this parity, now you
have to recover this specific parity, right? Well, again, you can choose the
coefficients so you can solve for these linear equations and for these two. So this
code is a 4, 2 exactly repairable code and it matches the 1.5 bound. So this was
the construction that we had before. But this is not a generalizable construction.
Okay. So then there was the symbol extension idea showing how you
could actually generalize this. So first of all, I want to go from this
example into matrices. How am I doing that? The first row is X1, X2.
Imagine multiplying by a vector here that's X1, X2, X3, X4. This is the
first equation, this is the second equation, and the third equation here is X1
plus X3. X1 plus X3 is the third row here. This is my code. You can
represent the code by the coefficients in this form. The repair coefficients are
these things that are sitting here.
What is happening in the previous example is that this is the interference part.
This is the X3 plus X4, and the key is that this matrix has low rank. This
sub-matrix here has rank 1, and I get this extra equation on X3 and X4, and I can
cancel these two and get a full rank matrix on the part I want.
This is really what happened before. Let's look at it more abstractly now. This is
a systematic part. These are the diagonal ones. Now, this is the general code. I
can choose any code I want. And I can choose any repair coefficients I want. In
fact, I chose different repair coefficients here and here. But the symbol extension
scheme uses the same coefficients. So I'm going to restrict my freedom and
choose the same coefficients here and choose the same coefficients for all the
systematic blocks so that's one assumption.
I'm going to form all these matrices. And what is my goal here? So choose the
same V prime for all systematic, the same V for all nonsystematic. And they also
chose the matrices to be IID diagonal. This is their choice. They can do
whatever they want with the matrices. So they chose the matrices to be IID
diagonal. What's the requirement? The requirement is that all these things here
are contained in this small matrix so I can cancel. And at the same time all this
stuff here is full rank. So I can get the stuff I actually want.
All right. Now, how am I going to do that? That's actually a very difficult problem
when there are many matrices. All right. So I say we want this full rank and we
want these vectors to be in the span of this. So we have to choose V
and V prime, because we already chose these matrices A. In general, I could
choose the matrices A and the Vs jointly, but those are quadratic equations that I cannot
solve. So they fix A, and now I have to choose the Vs. Okay. So this is my
one-slide crash course on this symbol extension thing.
Let's start by choosing V prime here. Let's start by choosing V to be one vector.
Sorry not V prime, V. Let's say this was one vector, only one vector. If this was
one vector what do I need? I need V prime to contain V times this matrix and V
times this matrix. Okay. That's two extra vectors. If I chose V to be this, then V
prime would have to be these three vectors, right? The blue here is the extra
stuff. The suboptimal stuff. Ideally if I was operating at the cut set bound there
would be no blue stuff. But now when I start from one vector after mapping it I
get three. That's a lot of overhead. It's a huge overhead. But now I'm going to
do the following thing. I'm going to say now I'm going to take this and pretend
this was V. So if I -- so I call this folding V prime back into V. So if
V was these three vectors now, well, now I would have to make sure
that every vector multiplied by A32 and A42 would stay invariant. Now, I
multiply again by the As and I get six vectors. So observe now I have some
overlap because W times A is already in here, and W times A squared is not, but
in the next step I'm going to fold it back in. I'm going to have more overlap. So
this is how this construction is working. And if you keep -- now you pretend this
is V and you see what would V prime have to be. And you keep on doing that.
You will see that the blue stuff, the extra stuff I have to cancel, is actually
a vanishing fraction of the whole. So almost everything is aligned, and the part
that is not aligned is vanishing as I keep on doing this process. Now, why is that a
problem? What is the problem with that? The problem is that until this becomes
very small I have to use a very large number of equations.
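The counting behind that statement, sketched for the simplest case of two
diagonal, hence commuting, repair matrices:

```python
# Symbol extension bookkeeping: the vectors w * A1^i * A2^j with
# 0 <= i, j <= m span the "aligned" set V_m of size (m+1)^2; applying
# A1 or A2 maps V_m inside V_{m+1}, so the misaligned overhead fraction
# ((m+2)^2 - (m+1)^2) / (m+2)^2 vanishes -- but only as the number of
# sub-symbols (m+1)^2 blows up.
for m in (1, 3, 10, 100, 1000):
    inner, outer = (m + 1) ** 2, (m + 2) ** 2
    print(m, inner, round((outer - inner) / outer, 4))
```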
So this is the general statement that says if you use this idea of folding your
equations back again and again, you get perfect alignment. But to get close to
perfect -- you actually never get perfect alignment. You get super close to
perfect alignment. To get super close alignment you need an exponential
number of foldings. Can you do it better? I don't know. I think nobody knows.
It's the million-dollar question how can you do this without extending your field so
much, without cutting it into so many small vectors. Okay. But this shows that
linear codes suffice to approach the cut set region for exact repair -- for the
minimum storage point, of course. For the other points in the trade-off we don't
know. It's only for one more point that we do know. But for the region we don't.
And the key question is, can you do it with a small field and small sub-packetization? Okay.
So some other new results -- so this technique was done for the wireless
interference channel, right? It's a completely different problem. So a
surprising result we had is the following: if you give me a code, I choose the
repair coefficients that reduce the repair communication over a field. This is a
computational problem. If you give me a fixed code, this is a computational
problem. It's NP hard. It's a rank minimization problem over a finite field. This is
another problem. You give me channel matrices in the wireless interference
channel, and I have to choose the beamforming matrices that maximize the
degrees of freedom of that wireless interference channel. So one result that we
established recently with my student is that both of these problems are basically
the same problem. Basically if you had the box that could solve one, you could
use the box to solve the other. But, of course, one is over a finite field and the other
over the reals. But the problem is essentially the same.
It's minimizing the rank of some matrices, subject to the rank of some other
matrices being full. Like I think you can see that for the repair case. For the
interference channel, it's another story, but it's very simple. So these problems
are connected.
And there are other problems that can be put into that framework. So, for
example, you can think of security and secrecy problems where I want to
communicate to some guys, and I want other guys to not get anything. Again,
you can pose that as a problem: I want full rank at this receiver and I want 0 or
rank 1 at the bad guys. One example is the multi-access
channel with eavesdroppers, and using the interference technique we can
find the degrees of freedom for that.
So recently also this was applied for the problem of multiple unicast in network
coding. Probably the most important problem when you have multiple single
source single destination pairs. Again you can apply interference alignment
techniques and you can show in some cases very good performance. And
another family of problems is, of course: I assumed my topology was fixed
and everybody was distance one from everybody else and I was counting bits. But
instead maybe there are cheap bits and there are expensive bits.
There's some people who are close by and some people who are far away. And
maybe you want to get more bits from some people and fewer from
others. So what's the right repair if you have a given topology with costs? That
is one open problem. There was one paper by [inaudible] Li and his group at
Infocom recently on that, on repairing over trees, and I'm going to talk about
allocations now.
How much time do I have? Maybe 15 or 10?
>>: 10 or 15.
>> Alex Dimakis: So before I move to that, are there any questions on repair
problems before I move to a slightly different -- yes.
>>: Is there not a lot of difference between the reals and finite fields? Like, suppose -- as
far as the repair problem over the reals, there are some, like, finite number of bits.
>> Alex Dimakis: Yeah.
>>: Packetization, do you think it automatically -- because you have the
constraint.
>> Alex Dimakis: If I had a box that could solve both. If I had a box that could
minimize the rank over both, something like that.
>>: Still the question, the box is the same.
>> Alex Dimakis: So we don't know of any scheme that works in one and does
not work in the other. Of course, I would expect if you limit my field size, if you
limit my field size to be binary, then of course I have much more restricted -- so I
mean we don't even know the repair bandwidth for binary. Even for functional
repair. Bounding the field size is also a very difficult problem. But all the
techniques that work for the reals so far work for -- so I don't know. Any other
questions? Okay. Let me move on to this allocation and there will be more.
All right. So this is just a motivating slide that says that everybody is watching
videos on their iPhones and that's a huge problem because 3G cannot tolerate
that. And what are we going to do, right? And you can put more antennas, but
this is not going to scale in the right way.
And the key approach, I think, is to do delivery of content with opportunistic
contacts. So use some idea like femtocells or even device-to-device
communication to cache the content and deliver the content in a device-to-device
way rather than getting it from the server, using 3G.
Okay. What is the point of this slide? Basically the video you want to watch is
very likely to be downloaded by someone nearby in the near future or past. I
claim this. This is not always true. Depends where you are. But in many cases
it is true. This is one of those plots that shows, you know, 10 percent of the content is
responsible for 90 percent of the traffic on YouTube and everywhere else. The
other interesting thing is that storage is increasing more than anything else.
Storage in phones is increasing more than anything else and storage in boxes is
increasing. So you can have a lot of storage. So the idea is can you do
distributed storage of the popular content and deliver it in a device-to-device,
localized way. A lot of problems here.
So let me tell you a few of them. So again, of course, you might want to use
coding instead of replication. You might want to code across the content rather
than store the content in different storage nodes. You have again the problem of
maintaining the code, and all this regenerating code stuff is relevant here. But
there's also many other problems. So you would like nodes to cache different
content in a distributed way. You don't want everybody to cache the latest Lady
Gaga video while other content is nowhere to be found. So we have
to coordinate about what we cache and we have to find a way to cache
the popular content but in a somewhat balanced way. So which content to cache
is one question. How much to store, this is the most basic question, how much
to store on each of the storage nodes. How to find who has the stuff I want is
another question. And how do you give incentives maybe to people to donate
their storage and their resources, maybe you know in a [inaudible] way you will
get faster.
>>: Going to be over WIFI?
>> Alex Dimakis: You could do it over WIFI. You could actually do it over 3G.
>>: Same bottleneck.
>> Alex Dimakis: No, device-to-device over 3G. That would require -- it's
technologically very feasible. It's not done in current technology. But so you
could talk -- yeah, over different models. The key point is you want to limit your
power. So you want to go to the model where I talk very quietly in a very small
radius. So any technology that will allow that.
So I want to -- so these are very relevant problems for this. I'm going to just
mention the most trivial one. I'm going to mention a trivial problem. How much
to store. So, for example, I have two files and I want to store them in five storage
nodes. One thing you could do is say I'm going to store the first file in
the first two nodes and the second file in the next two. And then somebody
is going to drive by and with some probability access each one of these. That's
one storage scheme. But you could take the first file, cut it into pieces and code.
So any two out of the orange give you the yellow file. Any two out of these give
you the blue file. And then store this.
Now, any two storage nodes contain both files. So this is again strictly better
storage than using replication. So you might want to code across your storage
devices and get better access to your content. This is what I want to say here.
Okay. But you could also change the allocation. So maybe you store both files
at the first guy, both files at the second guy and you leave these guys empty. By
empty I mean you store other stuff there. This is a different allocation. Was
there a question? Okay. So this is a different allocation. So which allocation is
better? You say, okay, this one is better, obviously. But that's not clear. So I'm
going to make an even more trivial problem. My most trivial problem is I have
one file to store and each one of my storage devices is going to be like a bucket.
And each of my buckets is going to be killed with some probability independently.
And the same probability. So every device will be crashed with probability .1. I
have a fixed redundancy of two liters of water, say, and I have the code, so even
one liter out of the two gives me back my file. What's the best way to allocate my
redundancy in these five symmetric storage devices? It's like the most basic thing
in the world. I have five minutes, maybe? Basically. Give or take.
All right. So this is the most basic thing in the world, right? Okay. So let me give
you an example. So you have your fixed storage budget. So you're going to
allow two units of storage. So one thing you could do is this. 1, 1 and leave the
others empty for another file. We call this minimum spreading. The other
extreme is maximal spreading: spread your budget equally over all of them. Or maybe
you could do something in between: one-half, one-half, one-half, one-half and empty.
So when somebody told me this problem a while ago, I said, okay, everybody
knows that maximal spreading is the best thing to do. That's the most reliable. Turns out it is not.
And then, okay, maybe that's only if the probabilities are different. No: even
if the probabilities are the same and everybody fails independently, this is
not always the most reliable thing.
Okay. Why? You just look at -- you just play with some examples. What's the
problem? Maximize the probability that the sum of the Xi's that survive -- so with
survival indicators -- is at least 1. So if it's at least 1, my code suffices to
recover the data. If it's smaller than 1, even if it's .99, I get nothing, because from
a code, unless you do something clever, you will get nothing. Subject to the total
storage being less than your budget.
And of course you can generalize to different failure models. This is nonconvex
and harder than it looks. This is what I want to tell you. Even this very basic
allocation problem we don't know how to solve.
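Stated as code, with a brute-force evaluator over all survival patterns (a
sketch; the survival probability and the example allocations are illustrative):

```python
# Allocation problem: choose x_1..x_n with sum(x_i) <= T to maximize the
# probability that the surviving x_i sum to at least 1 (one file's worth),
# when each node independently survives with probability p.
from itertools import product

def success_prob(x, p):
    return sum(p ** sum(alive) * (1 - p) ** (len(x) - sum(alive))
               for alive in product([0, 1], repeat=len(x))
               if sum(a * xi for a, xi in zip(alive, x)) >= 1 - 1e-9)

p = 0.9                                    # budget T = 2 in all three cases
print(success_prob([1, 1, 0, 0, 0], p))    # minimum spreading: 0.99
print(success_prob([0.5] * 4 + [0], p))    # four halves:       0.9963
print(success_prob([0.4] * 5, p))          # maximal spreading: 0.99144
```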
Why? Okay. So first claim: symmetric allocations can be suboptimal. What do
I mean? Let's say you give me five storage nodes and you give me this budget,
12 over 5. That's my total.
You can prove that this crazy allocation that is not symmetric, so it's 3 over 5, 3
over 5, 2 over 5, 2 over 5, 2 over 5, is better than anything else. This is the best
way to allocate information.
If you restrict yourself to symmetric allocations, which means they are all the
same or 0, the best symmetric allocation is not to store evenly on all the nodes
but to store evenly on four and leave the fifth guy empty. Why? Because two
nodes now contain enough information to get one unit whereas if you were
spreading evenly over all of them you would need three nodes to get one. So
finding the optimal allocation is very difficult. We don't know how to do it. Finding
even the optimal symmetric allocation is nontrivial. Of course you can check, but
we don't have a closed form for even the best symmetric allocation. This
problem has been discussed, it was discussed at Berkeley by several people for
a while, and in general it is open.
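You can check the claim by enumeration; a sketch (the comparison is at one
particular survival probability p, here 0.7):

```python
# The budget-12/5 example on five nodes: the asymmetric allocation beats
# the symmetric candidates, checked by enumerating all 2^5 survival patterns.
from itertools import product

def success_prob(x, p):
    return sum(p ** sum(alive) * (1 - p) ** (len(x) - sum(alive))
               for alive in product([0, 1], repeat=len(x))
               if sum(a * xi for a, xi in zip(alive, x)) >= 1 - 1e-9)

p = 0.7
print(success_prob([3/5, 3/5, 2/5, 2/5, 2/5], p))   # ~0.9295 (asymmetric)
print(success_prob([3/5, 3/5, 3/5, 3/5, 0], p))     # ~0.9163 (even on four)
print(success_prob([12/25] * 5, p))                 # ~0.8369 (even on five)
```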
Do you understand the problem? Is the problem clear at all? So I can tell you
what few intermediate results we had on this. Okay. So for the IID model we
proved the following thing. Maximal spreading, which is the intuitive
thing to do -- spread the eggs as much as you can -- is not
optimal, but it is asymptotically at 0 gap from optimal. So the gap from an upper
bound vanishes, if you're in the regime where TP is greater than one. T is the
budget. P is the probability of success. TP is the expected amount of bits that
will survive in the system. Under this condition, this is an allocation
that approaches optimality. This is the only result we have right now. We
have other results for symmetric allocations, but for the best allocation, it's quite
challenging. So this will appear in Globecom. So even for storing one thing,
what's the best way to allocate is highly nontrivial. That's what I wanted to say.
Other problems: repair problems are, of course, very difficult under errors. So if
somebody introduces incorrect linear
equations into your code, then these equations will poison everything else
afterwards. That's a big problem with combining stuff.
If some of your equations are wrong, even after the guy leaves and you repair, this
other guy gets poisoned equations. Your whole system will get poisoned,
even if one linear equation is incorrect. How do you deal with that? You need
codes that can tolerate errors. How do we repair codes so that they tolerate errors,
or how do we use some hashes, some signature schemes? All these are very
interesting questions.
So I will conclude. Before I conclude, I actually maintain a wiki of the
bibliography on storage stuff. So if anyone is interested you can go there and
see there's like a lot of literature for the repair problem and the allocation problem
and a few other things, so if anyone is interested, you can find it on my page.
Okay. A few open problems. Are the cut set bounds tight? We don't know; we only
know for a few points. And what practical codes can achieve them? This is of
course a relevant problem. What's the limit of interference alignment techniques
is a very fascinating question for network coding. I think actually interference
alignment is more useful for network coding than wireless, because for wireless
you have to assume you know the channels perfectly, whereas for network
coding you actually can do what you want: you design the problem.
Repairing codes in small fields, as we discussed, is very tricky and interesting.
Repairing existing codes that people have already deployed is a very relevant
problem. So, for example, the B-code and EVENODD are codes used in
RAID systems; how do you repair those given codes? And we have some prior,
some preliminary work on that.
How do you deal with bit errors and with security? There's a paper on security that
appeared at ISIT. And finally, what's the role of nontrivial network topologies,
as I mentioned? And the last one, you know, allocations: the case with multiple
objects is the real problem, and it's highly nontrivial, because even for one object we
don't know what the optimal thing is.
So I think I'll stop here and any questions would be welcome. [applause].
>>: The interference alignment seemed to, I mean, there must be some special
structure of the problem of the exact repair that makes it possible to find, to
achieve the same capacity region. Because in light of the other results of
Dougherty and Zeger and so forth, in general it seems that you can't do as well. So
do you have any sense of what that special structure is, and what kinds of
networks these tricks will work on?
>> Alex Dimakis: It's a very good question. So far, all the networks that I
know of reduce to these rank minimizations subject to full rank constraints. So I
can write repair as an optimization problem of choosing these repair
coefficients to minimize the rank of some matrices -- say, the sum of the ranks --
subject to some other matrices being full rank. And interference,
for the wireless interference channel, I can write again in the same fashion.
Now, I cannot write every network coding problem in that fashion. So this is --
but it is a fairly general framework. I do not know if -- so there are many techniques
for interference alignment. But the symbol extension is the technique that
actually achieves the cut set bound asymptotically. For example, we don't know
if this technique achieves the cut set bound for the intermediate points because
we have shown it achieves it for the minimum storage point. So for the intermediate
points, for example, it is unclear -- so I mean the minimum storage point is a very
interesting point because it corresponds to MDS codes. So it corresponds to
people using Reed-Solomon. So it's exactly the same point. I don't know what is
the magical structure of the property that allows it. But the fact that it was used
for multi unicasts also shows me that it's very -- it's not just interference and the
repair. It seems to be more general.
>>: On the topic -- very good work and an interesting talk. But the
spreading problem which you discussed later, have you considered that with
interference alignment, and any intuitions on that?
>> Alex Dimakis: No, because I have not -- you mean the allocation problems?
Yeah, no, I have not tried to apply any. Well, one interesting question there is if
someone gets a smaller subset of the equations, so, for example, there are some
users that get K equations and some users get, let's say, K over 2. In general, if
you have an MDS code, any K will give you all the data. But K over 2 will give
you nothing. The reason the allocation problem is difficult is that if you get .99
of the water you get 0 useful information.
>>: I think here is this -- I mean basically this allocation problem generally
targeted data survival. So basically let's say I have data I want to store in the
systems, I want to make sure that the whole copy of the data is safe in this, right?
But the systematic property of the code is important for data retrieval. So many
times where you want data, you may not want the whole data. Just may want
basically a piece. And there's a systematic [inaudible] that would make the
retrieval much more efficient. So I think these two properties together make a very
desirable system. And ideally you want both.
>> Alex Dimakis: Yes. I see. Basically you're saying that -- that's a very good
point. So the reason the allocation problem is hard is exactly this hard all-or-nothing.
It's this all-or-nothing that makes it hard. Because, the probability
that I get one: if I get one, I get all the data. If I get .99, I get nothing. So that's a
step function that you're trying to optimize. That's why it's so difficult. If I had
a softer function here, so if I said, okay, if you get .99 you still get some utility,
then this problem would become easy. But now the question becomes how do
you design codes that have good graceful degradation. Of course, as you said, a
systematic code, a systematic MDS code, has some graceful degradation. Is that
the best you can do? I don't know, for example. And you could use interference
alignment for that.
>>: Basically my point is for practical reasons. I would rather have a systematic
code.
>> Alex Dimakis: Of course.
>>: Even if suboptimal, rather than an optimal code, but --
>> Alex Dimakis: Actually, I believe that the systematic code is also optimal for
graceful -- in terms of graceful degradation, I don't think you can do anything
better than keeping some bits in the clear and some parities. Practically, of
course, I agree with you. But I think even in theory it's the best. Perhaps you
can use alignment for that, because it's an alignment type of goal. Very good point.
>>: Well, this resource allocation problem you presented today, it is [inaudible],
but have you -- what do you think about using coding that is not matched to that model
but rather [inaudible]? Would that help?
>> Alex Dimakis: Coding makes it like water because -- so I'm thinking of taking the
data and multiplying by a matrix that is random IID, an N by K matrix. Any K
will give me back all my data. So that's why any one liter of water gives me back
my original liter of water. And then of course I'm saying I make the packets
super small so I don't have to worry about the discreteness of the packets.
But as we were saying, if you had some graceful degradation then the problem
would be different, because I don't just want this, I want to maximize some utility
which depends on the surviving amount of water, and then if I had the very
simple objective function here -- maximize the amount of water, just the amount of
water -- then it's trivial. But that's not going to be -- if you had the magical code
where any K gave you the original data, and any K minus 1 gave you a K minus 1 over K
fraction of the data, and any K over 2 gave you half the data, then this would be trivial. But of
course I don't think this is possible. What's the best thing you can achieve with a
code that performs well at any one point or any two points? It's very good -- I
conjecture that the best code is the systematic one. A systematic MDS code:
I don't think you can beat that actually with any other scheme.
>> Philip Chou: All right. Let's thank the speaker again. [applause]