
>> Ratul Mahajan: Thanks for coming today. It's my great pleasure to introduce Aditya Akella
to tell us about some recent work he's been doing on OpenNF. For those of you who do not
know, he is on the faculty at the University of Wisconsin and has done a lot of good work since he's
gotten there in redundancy elimination, complexity, understanding quality of experience and
whatnot and a lot of his work has been really influential. Personally, I think I've known Aditya
now for 10 or 15 years. I think we started grad school at roughly the same time doing
overlapping things to both [indiscernible]. But, you know, we grew up. Anyway, without
further ado I will hand it to him.
>> Aditya Akella: Thank you. Thanks everyone for coming to this talk. I'm going to be
describing the system called OpenNF that we have been working on for nearly a year now. It's
basically something that combines the power of SDN with software virtual appliances. It
enables the idea of distributed middle box style applications. A lot of this work was led by my
student, Aaron Gember but he got a lot of help from Raajay who is in the audience today, and
from Junaid and from Robert as well. How many of you here sort of know what middle boxes
are and what SDN is and so on? Who doesn't know, I guess? Is there anyone who needs… You
don't? Really? Okay. [laughter]. All right.
>>: That's all right. You have some slides for it [laughter].
>> Aditya Akella: I do, but… Okay. Let me just quickly walk through them. I will just start by
telling you what these software virtual appliances are and SDN and why they are being used
and then I'll present the motivation for the distributed processing work that we are doing.
Traditionally, you know your network is made of routers and switches and they provide very
basic functions like connectivity between different points, simple access controls and so on, but
often operators want more packet processing functions from their networks for security, for
performance reasons and so on. Network functions or middle boxes (both of these are fancy
terms; I don't know exactly what they describe) are devices that help fill
this gap. They allow an operator to introduce custom packet processing functions into their
networks, and there are a lot of examples out there: things like in-network load balancers for
balancing load across servers, firewalls, SSL gateways,
intrusion prevention and protection systems, and traffic scrubbers for security, and caching proxies
and WAN optimizers for performance, and so on. In contrast with their routing and switching
counterparts, these devices are stateful in nature. They maintain a lot of state for the
flows that they process, so that's the big difference between these and routers and switches.
You may be wondering why I am working on these things. These are not arcane. They're
actually very popular. A recent study from Justin Sherry and others found that these middle
boxes are quite widely deployed across a bunch of different networks out there. This is an
average across 55 different enterprises, and the key takeaway from this graph is that there are
at least as many of these appliances deployed in enterprise networks today as there are routers
and switches. And this is not just true for enterprises but also for other networks such as ISP
networks; cellular networks have a similar widespread deployment of these devices. It's not
surprising that the market for these devices is a rapidly growing multibillion-dollar market.
As newer applications arise, newer devices arise, newer threats emerge, newer appliances are
designed and they get deployed. They are popular, but they are also extremely painful to
manage. In general large networks are hard to manage, and once you throw in these custom
packet processing things, they require custom configuration, custom wiring, and they are generally
difficult to deal with, so they are a pain to manage. This is sort of where software appliances
and SDN come in. These are two trends that are actually making networks
that have these network functions simpler to manage. The first thing is that
traditionally these were hardwired devices. They are moving to software. This is Riverbed's
Steelhead appliance and its virtual counterpart. This is an F5 load balancer and an in-cloud
counterpart of that. These software alternatives are cheaper. They are easier to deploy,
upgrade, and customize for a particular network, and so on. They kind of bring down the
management burden significantly.
>>: The movement is actually [indiscernible] hardware? The movement is from customized
hardware to commodity hardware, like…
>> Aditya Akella: These are, it's a mix of both. I think there are models where you deploy on
commodity hardware, sort of an extensible middle box along the lines of [indiscernible], where
you can deploy a bunch of different middle box applications and they will be running in
software, but these are also sort of machine images, sort of packaged as VMs that you
could deploy inside EC2, for example. Both of those are fairly popular trends.
>>: [indiscernible] really less?
>> Aditya Akella: These things become easier. Actually, it's hard to quantify exactly whether
things become simpler. Hold on a little bit. I think once we talk about SDN, some of those
things will become clearer.
>>: [indiscernible] replacing one big iron box [indiscernible] smaller lightweight [indiscernible]
one the one. You have slight [indiscernible] showing…
>> Aditya Akella: Oh yeah. It's not. You need to use multiple of these. You need to use
multiple of these. I agree.
>>: [indiscernible] management cost increase substantially?
>> Aditya Akella: Yes and no. Let's wait until the next slide and then we can deal with this
question. The second trend is the use of SDN. Again, most people here hopefully know what
SDN is. SDN is basically a framework that provides you logically centralized control over the
forwarding state of a network. To see how this is improving the management of networks with
these NFs or middle boxes, note that traditionally the way to use these devices was to
deploy them at chokepoints within the physical topology and then wrestle with distributed
protocols to funnel traffic through those chokepoints to get some traffic subsets processed by
them. With SDN, because you have this control over state, you can deploy these boxes off path
and then punch specific traffic subsets in and out of them, and that
immediately takes away the central points of failure and attack here and the central points of
network congestion. That aspect of the management story kind of becomes better. The other
thing you can do, which is sort of an interesting line of research in middle boxes, is that you
could have multiple different kinds of these middle boxes
hanging off of the network and you can chain specific traffic subsets through
different sub-chains of those middle boxes, and that allows you to realize a
variety of interesting, rich high-level policies. That kind of stuff is difficult to realize
with hardware appliances deployed at chokepoints. What this talk is about is building on this
notion of decoupling of middle boxes from network topology that SDN enables. What
decoupling means is that you don't need to use one instance of a middle box. You can have
multiple instances deployed at different locations within your network. You could use SDN as a
way of steering traffic subsets towards those different instances. SDN actually gives you more
teeth than that. It allows you to dynamically reallocate traffic across instances in order to
achieve dynamic control over the distributed processing happening at those instances. You can
use that to realize various high-level properties. One of the things you can do, for example, is
immediately make load balancing more effective. In this case,
whenever some middle box [indiscernible], you can immediately move load off of that middle
box by dynamically reallocating the processing for those flows to a different middle box.
This allows you to extract maximum performance across these different versions of the middle
box at a given cost. This is sort of what I mean by software appliances with SDN kind of making
the management story simpler. There is some limit of [indiscernible] in not having to worry
about doing many of these things. I don't have a concrete argument that this makes
management simpler, but…
>>: [indiscernible] example [indiscernible] so that you can [indiscernible] maintaining the
software state across multiple instances and showing that you keep on rolling [indiscernible]
ensuring you have [indiscernible] so think of what you would do on just one [indiscernible] that
will essentially help you do on [indiscernible] 1, 100 to 1, to one thousand. [indiscernible]
software [indiscernible] I mean the [indiscernible] cost [indiscernible] becomes much more
painful because they are ticking time bombs that [indiscernible] goes out [indiscernible]
customers [indiscernible] even with this kind of a figure [indiscernible] state full, so whenever
you are making any kind of a change you have [indiscernible] for the state, so what
[indiscernible] connections that you have to…
>> Aditya Akella: Okay. I think you are making the case for OpenNF, so let me, again, maybe
this is not the place I should have stopped. I think we are in agreement. I mean there are some
aspects that come into play. You don't have to put these in the middle of the network. That's
also the reason, for example SLB is used as opposed to a big honking F5 load balancer. It gives
you much greater capacity without having to babysit custom hardware. That's kind of the
trend. Things are kind of moving away from hardware. What I wanted to get into is not just
simple load-balancing. It's that the ability to do dynamic reallocation of distributed processing
allows you to build novel abstractions. One example is the notion of the infinite-capacity
middle box: when the middle box instance runs hot, you run additional copies of the instance
and you distribute subsets of traffic across those copies. Another abstraction that you can
provide is that of an always-updated, always-available middle box. Suppose you want to update
this middle box with some security patch. You deploy a hot standby that has this patch. Once
this is ready, and that will be clear in a minute, in a few slides, you take the middle box and
move the [indiscernible] traffic to this updated middle box, and then you have an always-updated
middle box abstraction. You can do something even more powerful. You can dynamically
enhance the functionality of a middle box by leveraging an in-cloud, brand-new version of this
middle box. For example, if this middle box is seeing something anomalous for a certain subset
of traffic, you can take the ongoing processing for that specific traffic subset and hand it off to a
brand-new version in the cloud for additional processing. If you have the ability to do this kind
of dynamic control over distributed processing, then you can realize all of these interesting
abstractions. A lot of cool fun things can happen if you get middle boxes and SDN to play with
each other. This may not be funny now, but hopefully, it will become funny in a couple of other
slides when I bring back these two characters. What OpenNF is, is a control plane that can
support key semantics in these distributed processing applications, specifically for all the
dynamic reallocation actions. Going back to the comment that you made, the kind of
guarantees that it can make is to ensure that the reallocation actions are safe and that the
reallocation can be done at any particular point in time. By safe, I'm talking about something
like a safety property, which is output equivalence: when you reallocate traffic, you know, to
respond to load or whatever else, you can reason about the fact that the outcome of the
actions after the reallocation is similar to that of one single middle box with equivalent infinite
capacity. The second is sort of a liveness property, and what this essentially means is that an
operator should be able to trigger these reallocation operations at any given point in time, and
he should be able to argue that it will finish sufficiently soon, within some bounded amount of
time. There is no control plane out there that does this today. We are the only system that can
provide these guarantees. Some of you know the SIGCOMM reviews are out for rebuttal, so
this is just my way of making myself feel good. Every day I look at myself in the mirror and say
you are my hero, no matter what other people say. [laughter] You are doing good. [laughter].
More seriously, this is actually the first system that can provide these kinds of guarantees.
Having these kinds of guarantees is useful for two things. The first thing is that it is actually a
necessary basis for some of the new abstractions I described on the previous slide. Without
having these kinds of guarantees, it's hard to build a dynamic remote enhancement kind of
middle box. It also forms the basis for strong SLAs: you can actually reallocate load without
taking down the middle box or impacting reliability, and you can respond to load in some
small amount of time, like within a few milliseconds. Further, you can offer strong SLAs for
things like the load-balancing, elastic scaling and always-updated middle box applications I
discussed on the previous slide. In the case of load-balancing, for example, you can ensure that
the load reallocation mechanisms are sufficiently responsive and that they don't impact the
quality of the decisions made by the middle boxes in question. In terms of these kinds of
guarantees being necessary as the basis of new abstractions, without being able to reason
about safety and liveness, you won't be able to build the dynamic enhancement application
that I was showing. You fundamentally need the ability to move flows and be able to argue
that the operation has these safety and liveness properties. So OpenNF is a system that can provide
these guarantees.
>>: What is the real need for those guarantees? Like why are you taking such a strong stance
on them? In the end I think the basic abstraction of the network itself is packets can be
corrupted by this kind of drop and [indiscernible] so why go for such strong guarantees?
>> Aditya Akella: Typically, when packets are reordered by the network, the middle box has
some logic to deal with that. The algorithms are basically a function of those possible inputs.
What we are saying is we want to be able to support the allocation actions and we want to
make sure that the reallocation doesn't introduce other unexpected inputs that the middle box
did not take care of. So we want output equivalence with respect to whatever the middle box’s
internal logic would do for just networking losses and corruptions. Does that make sense?
>>: Yeah, so I was thinking like I guess if you are willing to update let's say your application and
drops and the network, then it seems like the allocation stuff, reallocation becomes trivial
maybe. You basically bring up another middle box without actually worrying about any kind of
inconsistency, and this middle box has logic to deal with duplication and drops, and why is that not
enough?
>> Aditya Akella: I'll come to an example where that is not possible. Basically, take for example an
off-path IDS: if the network doesn't introduce any drops, then anytime there is something missing,
that would trigger some sort of alarm for the off-path IDS, or if some kind of reordering
happens it would trigger an alarm for the off-path IDS. But in the case of an off-path IDS it
doesn't have the ability to request those dropped packets, because the application may not read
[indiscernible]; it would have to deal with that missing hole in some specific way. What we
want to make sure is that the reallocation does not change the nature of the decision that the
IDS makes given what is happening with the application and given what is happening with the
network. Does that make sense? If you can engineer a network that does not corrupt packets,
that does not introduce reordering or drops, then the middle box's output should be the same
with or without reallocation.
>>: If you have a network that is not [indiscernible] the fact is we do have networks that
[indiscernible] what that means is like maybe it's going back to and went. The middle box logic
should be robust enough that these things in which case you don't need like this control plane
with such strong guarantees.
>> Aditya Akella: I think -- let me get to the loss-freeness guarantee, and with that example we
can deal with this question. In that specific case we will see how we are showing that the
decisions are [indiscernible] the IDS. Maybe hold onto that thought until we come to that
example. Just a quick thing: is SDN control enough? This is a simple elastic scaling kind of
situation where you have two traffic subsets, the red traffic and the blue traffic. Each of
these devices creates state for the individual traffic subsets, because these are stateful devices.
Suppose the volume of traffic on these individual subsets grows, causing the intrusion prevention
system to run hot. You decide to deploy another instance, and essentially what you want to do
is you want to shed half the load now. There are two choices with SDN here. The first thing is
you can wait for new flows to arrive and funnel all those new flows to the new instance. In this
case there are no new flows, in this extreme example; therefore, this bottleneck will persist.
This impacts the responsiveness, the liveness property, that we were after. Another thing you can
do is you can say screw it. I'm just going to move the blue flows. The problem is that the state
that was needed for the blue flows is missing from its new location, so that impacts the
decisions that the intrusion prevention system makes in this case. It may cause it to either raise
false alarms or it may cause it to miss detecting attacks. Essentially, what this means is that this
simple SDN control is not enough. In this particular case, if you just relied on SDN, then that will
make some Milhouse cry somewhere, and that Milhouse could be your network operator or
middle box or whatever else. Going back to this picture, what we really need is that in addition to
being able to control the routing of traffic, we also need to move the state to the traffic's new
location. Essentially, to satisfy those kinds of properties, what we need is something that
exercises joint control over the forwarding state and the network function internal state. So
OpenNF is basically a system that addresses this. Yes?
>>: To justify that, you said you used an extreme example of heavy flows. Is that valid?
>> Aditya Akella: Not really. New flows will arrive, but existing flows can last long enough that
the load never goes down below the point that you care about. I think it will be clear if you
think of this as a scaled-down situation where you want to decommission the middle box and
you want to take load off of it. You just have to wait for traffic to drain off. It can take several
minutes or sometimes even hours for traffic to drain out. I'll actually show you results; in the
simple case where we analyzed this, it took up to half an hour for the traffic to drain out of a
middle box.
>>: I'm just trying to make sure that your primary climb is worth the benefit, because you see
get strong [indiscernible] technique waiting for [indiscernible]?
>> Aditya Akella: Yeah. And hopefully at the end of the talk you'll be convinced that it was not
a terrible hike up there. Yeah?
>>: [indiscernible] more from the [indiscernible] middle box [indiscernible] what they do today
[indiscernible]
>> Aditya Akella: What people do today is the first alternative: wait for flows to drain out.
You want to decommission a middle box? You wait for it to drain out and then you take it out
and start using another one. So all new flows go to the new middle box and all existing flows
need to drain out from the old middle box. There are some research papers out there that say
just do the reallocation, let's just reallocate. But what they ignore is what happens to the false
positives and false negatives in the decisions of the middle boxes.
>>: [indiscernible] wait for time A and then wait for time B and then what happens after that…
>> Aditya Akella: The problem is that you don't know. You don't know. You are monitoring
some state and you don't know whether, you don't know how long you will have to wait.
>>: Yeah. This would be [indiscernible] solution.
>> Aditya Akella: Yes. And the flow length can be arbitrarily long and you have to wait for that
flow to finish.
>>: What is it you are trying to provide [indiscernible] atomic operation essentially the which
say that considers this set of [indiscernible] as a single logical unit, and from a [indiscernible]
perspective do you care if the [indiscernible] as long as we can [indiscernible] the [indiscernible]
state from one to the other, I guess. [indiscernible] example would be beyond any
[indiscernible]
>> Aditya Akella: From an intern perspective or the tenant or whoever is using it, he just sees it
as one middle box and there is a control application that wants to preserve that abstraction of
that one available middle box that offers a certain kind of performance. To do that, then the
control application may need to do some reallocation of processing, and we want to ensure
certain semantics for that reallocation of that processing. Does that make sense?
>>: One of the points of [indiscernible] notorious for falling down under high load, so when
your control applications try to do this reallocation it could just be that you waited -- presuming
you would have some notion of its signal something function and you don't want to sort of
become aggressive [indiscernible] slight impression of order and you want to start doing this
rebalancing of [indiscernible] because you [indiscernible]. At the same time, it you also
[indiscernible] like a ticking time bomb now, you know. By the time you do any sort of
reallocation, you essentially have lost 80 percent of your drop just because your box went
down.
>> Aditya Akella: Yeah. So I think that is sort of -- so in this picture, right, this is sort of an
OpenNF controller. What you are describing is this application on top, the elastic scaling
application. All of that would be in the logic for the elastic scaling application. At what point
does the application trigger a scale out? That is sort of an imperfect science. The application may
decide, once the middle box sees more than 60 percent load over a ten-minute interval, I will start
to spin up a new instance and start sending load [indiscernible]. That's a policy that is up to the
application. We are not saying, we are not advocating one way or the other what that policy should
be, but that is up to the application. Once the application makes those choices, it would essentially
translate that down to some set of reallocation operations, and that becomes some sort of state
import and export across these middle boxes. The OpenNF controller does those things in
coordination with the network to support some semantics for those operations, so those
operations will complete safely in a certain amount of time.
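As a rough illustration of the elastic-scaling logic just described, a control application might look something like the sketch below. Every name here (the Controller and Instance interfaces, the 60 percent threshold, the guarantee names) is illustrative only, not the actual OpenNF API; it is just meant to show where the policy sits relative to the controller.

```java
import java.util.List;

public class ElasticScalingSketch {

    enum Scope { PER_FLOW, MULTI_FLOW, ALL_FLOWS }
    enum Guarantee { NONE, LOSS_FREE, LOSS_FREE_ORDER_PRESERVING }

    /** The handful of controller calls this sketch assumes exist. */
    interface Controller {
        Instance launchInstance(String image);
        List<String> pickFlowFilters(Instance src, double fraction); // e.g. "tcp and dst port 80"
        void move(Instance src, Instance dst, String filter, Scope scope, Guarantee g);
    }

    interface Instance {
        String image();
        double loadOverLastInterval();   // e.g. averaged over a ten-minute window
    }

    private final Controller controller;

    ElasticScalingSketch(Controller controller) { this.controller = controller; }

    /** The trigger policy is entirely up to the application; 60 percent is just an example. */
    void maybeScaleOut(Instance hot) {
        if (hot.loadOverLastInterval() < 0.60) {
            return;
        }
        Instance fresh = controller.launchInstance(hot.image());
        // Shed roughly half the traffic. Each move call is translated by the controller
        // into state export/import on the NFs plus a forwarding update, and returns once
        // the reallocation has completed with the requested guarantee.
        for (String filter : controller.pickFlowFilters(hot, 0.5)) {
            controller.move(hot, fresh, filter, Scope.PER_FLOW,
                            Guarantee.LOSS_FREE_ORDER_PRESERVING);
        }
    }
}
```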
>>: I agree with all of that. If you go back to the slide a few slides back where you are talking
more and we'll see if you will enlighten us.
>> Aditya Akella: Yeah.
>>: You have [indiscernible] just assume [indiscernible] and all of that [indiscernible] seems
explicitly now tied to how sensitive or aggressive you are in triggering this break-up
function. So then you can't just have any arbitrary guide and essentially loss [indiscernible]
you've got to give…
>> Aditya Akella: Yeah. Essentially, what we want to do, what can the controller provide? So if
the controller can say look. You give me some kind of reallocation operation; I can give you a
done signal within 500 milliseconds. That's the kind of soon enough kind of guarantees that we
are after.
>>: After you trigger [indiscernible] there's a boundary amount of time…
>> Aditya Akella: Time that it will take to be done. But that is a bounded amount of time. We
can reason about how long that is going to take and it is going to be small and I will show you
how we get there.
>>: Perhaps I am jumping in [indiscernible] depending on the amount of state you have to
transport?
>> Aditya Akella: Yes. It's all known, and I will tell you why that is the case.
>>: Do you pause the traffic during the transfer?
>> Aditya Akella: No, no, no. I will get to that. No pausing of traffic happens. Traffic is
getting processed. That's the thing that we have to deal with. That's a big problem. It creates
problems for us. We do not want to pause the traffic.
>>: But then how can you do that if, you just say at this point if you want to transfer the states,
new packets are coming in and they start updating the old state…
>> Aditya Akella: Okay. What you are talking about is, what is the race condition?
>>: Yes.
>> Aditya Akella: Okay. I will get there. That is the biggest challenge in designing OpenNF.
These are the challenges. The first thing is we want to be able to bring arbitrary NFs that are
out there into the fold. We don't want to change anything about what they are doing in terms
of internal state allocation because that may [indiscernible]. Then there is this big
challenge that we have here: as we are moving state, packets arrive and update state, and
those updates may be lost at the state's new location or they may happen out of order.
Also, state can become inconsistent, and so we want to make sure that can't happen. Those are
the guarantees that we, these are the safety guarantees that we want to provide. Any other
questions before I jump into the technical part?
>>: I think the first [indiscernible] that you can have an NF [indiscernible] not a partition
[indiscernible] state [indiscernible] because it has [indiscernible], but I am saying in practice
that [indiscernible]
>> Aditya Akella: Yeah. In theory, you can have an NF that doesn't fit in this. But what I'm
going to talk about is that in practice, for the NFs that we looked at, essentially they create state
that either applies to a single flow or some group of flows or all flows. This is [indiscernible]
internal state, so there is a [indiscernible] connection object that consists of two objects, a TCP
analyzer object and an HTTP analyzer object. There is a bunch of state that is shared across
connections. These are things like per-host connection counters, which are often used for scan
detection. And then there is a bunch of statistics that are maintained for all traffic traversing
the [indiscernible] instance. We can basically think of state as being defined by three different
scopes: either it is per-flow state, multi-flow state or all-flows state, and we can use the
traditional notion of flow; the connection 5-tuple will do. I'll describe what state we are
interested in and what we want to do with that state. The API that speaks directly to the NFs is
actually fairly simple. It has three calls: get, put and delete. The calls take a filter f that
describes the state that we are interested in, which is typically some subset of the connection
5-tuple. It also defines the scope: whether we are interested in the per-flow state matching
that filter, or the multi-flow state, or whatever. But the key thing is that once that get, put or
delete call is issued, it is up to the NF to identify and provide all the state matching that
particular filter. That is when it is exporting state; when it is importing state, it is up to the
NF to combine the state provided by the controller with its internal data structures. What this
means is that the controller doesn't need to know what the internal state organization is; it
leaves all the work to the NF, and we also don't need to change the NF to conform to a
specific allocation strategy. This doesn't obviously make it easier to change an NF, to bring it
into our [indiscernible], but at least NFs don't have to be redesigned to work with OpenNF. Let
me actually get into these race conditions, what I mean by these, and how they interfere with
the semantics we want to offer for these high-level operations. The first thing I need to tell you
is what these high-level operations entail. Suppose here we have two instances of a
[indiscernible] intrusion detection system, and the high-level operation is to reallocate port 80
flows to this instance, and that's the blue flows. The first thing you would want to do is to move
all the flow-specific, per-flow state corresponding to the blue flows from NF1 to NF2; here you
just want to do it for port 80 flows, and this includes both the TCP analyzer objects and the
HTTP analyzer objects that we saw on the previous slides. The other thing you might want to
do is you might want to copy the state that is shared across flows. Recall those connection
counters that were maintained per host. You could have some host's connections go here and
some host's connections go here, so you would need to maintain copies of that state.
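To make that interface concrete, here is a minimal sketch of the three calls the NFs expose, as described above: get, put and delete, each parameterized by a filter over the connection 5-tuple and a scope. The type and method names are illustrative, not the actual OpenNF library.

```java
import java.util.List;
import java.util.Map;

public interface NfStateApi {

    enum Scope { PER_FLOW, MULTI_FLOW, ALL_FLOWS }

    /** An opaque chunk of serialized NF state, tagged with the (partial) 5-tuple it belongs to. */
    final class StateChunk {
        public final Map<String, String> flowId;   // e.g. {"proto":"tcp", "dstPort":"80"}
        public final byte[] serializedState;       // the wire format is entirely up to the NF
        public StateChunk(Map<String, String> flowId, byte[] serializedState) {
            this.flowId = flowId;
            this.serializedState = serializedState;
        }
    }

    /** The NF finds and serializes every piece of state matching the filter at this scope. */
    List<StateChunk> get(Map<String, String> filter, Scope scope);

    /** The NF merges controller-supplied chunks into its own internal data structures. */
    void put(List<StateChunk> chunks, Scope scope);

    /** The NF discards matching state, e.g. after it has been exported as part of a move. */
    void delete(Map<String, String> filter, Scope scope);
}
```

Because the controller only sees opaque chunks keyed by flow identifiers, it never needs to understand how the NF organizes state internally, which is the point being made here.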
Essentially the high-level semantics we want to offer for the reallocation operation over there
boil down to semantics for move, copy and share. In particular, for move what we can provide
is that the move is loss-free and order-preserving. For copy and share, it's various notions of
consistency, like eventual, strong, strict, whatever. Let me describe very quickly the move
operation and the race conditions. The move operation starts with the control
application issuing the move. The controller issues a get call. The NF returns a bunch of chunks
of state with flow IDs. Each ID is basically a subset of the connection 5-tuple that
describes the flow the state corresponds to. At some point the controller issues a
delete, and then puts these chunks one at a time into the destination instance for this move,
and at some point updates the forwarding. The race conditions arise because of this
import of state and export of state and the interactions with the forwarding. Let's start with a
simple race condition where updates can be lost, and this arises because packets arrive as state
is being moved, and that can lead to some bad consequences. Here is an example. Packets
arrive. They establish this state. You issue the move for the blue flows. The blue flows'
state has moved. At some point another packet comes in. It updates the state at the old
location. Routing has been updated, and the problem here is that the update due to B2 is
missing from the state's new location. In the case of an IDS instance, for example, if it's looking
for signatures or MD5 checks to check against some kind of an attack, depending on how it is
implemented it may have missed an attack because it computes an MD5 and it doesn't see that
signature, or it may throw up an alarm saying I saw something bizarre. Both of those things are
sort of things we would like to avoid. One solution for this is to stop all traffic, stop all
processing, buffer all the packets, let the state move and then reallocate flows. This can last
hundreds of seconds. You may have to buffer a lot, but the more important problem is that
when you stop traffic there will still be some packets that were traversing, and updates due to
those packets to the state may be lost. You have no way of knowing whether those
updates are lost or not. The high-level semantics that we want to provide, something we
called loss-freeness, is that we want to make sure that all state updates due to packet processing
are reflected in the transferred state, and every packet the switch receives should be processed
by an NF instance. Those of you who are familiar with the consistent updates paper: the
one-shot update essentially provides the latter property, but it cannot provide this former
property. We are interested in a stronger consistent update property than what that paper offered. Yeah.
>>: Are you making an assumption here that your NF [indiscernible] has sufficient capacity to -- because it may just [indiscernible] worried about the [indiscernible] condition but opposite…
>> Aditya Akella: All of that is up to the control application. We are assuming the control
application is monitoring that and when we tell, when it does this to reallocate, there is a
reason why it chose to reallocate.
>>: [indiscernible] in terms of the updates being applied to it the new [indiscernible] sufficient
[indiscernible] if [indiscernible] but then what [indiscernible] during the migration the new flow
that was supposed to be [indiscernible] condition. The very fact that you [indiscernible]
allocation means that something [indiscernible] but the very fact that that thing is running hard
you also are not able to say new connection or [indiscernible] so what, I mean now, what sort
of consistency properties [indiscernible] when those connections are [indiscernible] there is no
[indiscernible]. I mean I'm trying to get my head around best effort versus the idea, because to
me that sounds like a best effort thing essentially for things if we were to keep up we would try
to [indiscernible] but there is no guarantee that all of the schedules will still be maintained in a
consistent [indiscernible]
>> Aditya Akella: I am trying to wrap my head around that question.
>>: I guess one possible answer would be does not mean 90 percent and you get out of here
before you burn up or does he mean 110 percent and now you are dropping [indiscernible]
>> Aditya Akella: I would think that the elastic scaling application would probably have
conservative thresholds. If I'm starting to see over some period of time it's hitting 80 percent,
then it would…
>>: [indiscernible] load that's the whole point here that's an internal thing in your system.
Even if you start [indiscernible] at 10 percent or 1 percent, you don't control that. Unless you
can start [indiscernible] from that position you will still drop things. What I am essentially
trying to understand is where is, what, you are guiding the statement. You are essentially
saying [indiscernible] some state of the network flows which were passing through me, some
kind of that would get transferred to the new location.
>> Aditya Akella: So what you are saying is that the state we are transferring will not see losses
due to the transfer. It could see losses due to whatever the middle box is experiencing. That is
up to the implementation of the middle box. Our transfer doesn't impose any other losses in
addition to that. That is the guarantee.
>>: How does it drop them without doing anything at all?
>>: I think the common case is, I think you are looking at corner cases where you cannot
predict the load, or the load is so high that some things get dropped. But I think maybe the
way to think about it is the common case, where the load is not wildly fluctuating
[indiscernible]. I would say that is the common case. Why not?
>>: Because things fail. That's the point [indiscernible] things failing. That's the common…
>>: Things failing due to overload, that's the point in the common case.
>>: Load moves smoothly in the common case, right? Like because load can be modeled so
[indiscernible] application wants to be, so the way to look at it I think there is an opportunity
[indiscernible] and yes, bad things can happen. In which case like nothing [indiscernible]
>>: Right. So that's a proportion…
>> Aditya Akella: Let's take this, we should probably take this discussion a little off-line because
I have like half the talk to go.
>>: [indiscernible] there's also some [indiscernible]
>> Aditya Akella: I'm going to come to that, okay? It doesn't work for scale down. Think about
it. How would you use [indiscernible] migration to scale down?
>>: I was thinking in terms of [indiscernible] each incoming packet [indiscernible]
>> Aditya Akella: I'm talking about scale down. When you want to collapse two instances into
one instance, how would you use VM replication for that?
>>: Not [indiscernible] migration.
>> Aditya Akella: Okay. We will come to that.
>>: [indiscernible]
>> Aditya Akella: Okay. What do you do? You had two VM instances. How do you merge that
state?
>>: You said you would organize [indiscernible] some specific logic to send to the
[indiscernible]
>> Aditya Akella: Yeah. I mean, that's complex. We'll come to that. We'll come to that. I think
we are going off on a very bizarre tangent here. The key idea we have is what I call an event
abstraction that I'm going to use to address all the race conditions. This is basically a way of
capturing state updates as events and then sequencing them in a particular way. This
also helps us reason about the time it would take for these operations to finish. The simplest way
we do this is: we enable this event. It applies to all the blue packets, and what this does is, when a
packet is received, it raises an event to the controller but drops the packet locally. Don't process
it. Then we issue get and delete. State is moving to the second instance. A packet comes in. An
event gets raised. The packet gets buffered at the controller along with the event. Eventually state
gets put. Put returns, and then the controller flushes all of the buffered packets to the destination
instance. Any packet that arrives in the interim is handled in a similar fashion, and at some later
point in time the forwarding is updated, and then all the packets that arrive along the new path
are handled similarly. If there are any remnant packets, there are events; they get processed,
so all the packets get processed. No packet is left behind, but they may get processed out of
order. Make sense? So we can prove that this actually
ensures the move is loss-free. Going back to this [indiscernible] stuff, things can be processed
in the middle box in an out-of-order manner. Again, this can happen because of the
interactions between steps four and five on the previous slide, where we are flushing the
buffer and updating the routing. If the buffer gets flushed with the events in it, the packets get
sent to the destination instance; in the meantime, because the routing has still not been
updated, an older packet comes in and goes to the old instance on the blue flow. An event gets
generated and gets buffered at the controller. Then we kick in a routing update. Suddenly, a new
packet shows up and gets sent to the new instance, and then the controller eventually releases
the buffered packet, and so the packets are processed out of order. This matters for
something like Bro. Bro runs this thing called a weird activity script that looks for things like
requests happening after a response was seen or a SYN happening after data was seen. Those
are indications of [indiscernible]. These set off alarms inside Bro. The key point is you want to
make sure that all the packets are processed in the order that they were forwarded to the NF
instance. Is there a question? Okay. Again, the basic idea we have is we want to keep track of
every packet, and we want to know what the last packet seen by the old instance was and then
sequence all of the updates around that last packet. The way we do this is we, again, flush all the
packets like in the previous case, but we mark them with a do-not-buffer flag, which means that
the destination has to process them. We enable events on this instance, but instead of dropping
the packets, we tell the instance to buffer and hold on to any packets that it receives. But
instead of just blindly updating routing to send to the other instance, we first create an update
that sends the packets destined for the original instance to the controller. And what this does
is any such packet gets sent to the controller, an event gets sent to the controller, and it is eventually
flushed to the destination and processed, but it helps us keep track of what remnant packets
the first instance has seen, so we will know which one is the last packet that the first instance NF
ran on. Suppose B3 was the last packet that was seen by NF1. We have
updated the routing state. Some other packet comes in. It hits this event and gets buffered locally,
but at some point that packet B3 gets processed. It will raise an event to the controller, so now we
know that the last packet, B3, has been processed by NF2. Once that happens we can release all
the buffered packets and let them be processed. We can kind of sequence the processing of
packets by using events at the two instances.
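The sequencing just described can be summarized in a simplified controller-side sketch. It compresses the asynchronous event handling into straight-line code and uses made-up interface names, so treat it as an outline of the steps rather than the actual OpenNF implementation.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class MoveSketch {

    interface NfInstance {
        void enableEvents(String filter, boolean dropLocally); // raise an event per matching packet
        List<byte[]> get(String filter);                       // export matching per-flow state
        void delete(String filter);
        void put(List<byte[]> chunks);
        void process(byte[] packet, boolean doNotBuffer);      // hand a packet to the NF directly
        void releaseLocalBuffer(String filter);                // let locally held packets through
    }

    interface Network {
        void detourToController(String filter);        // old-path packets go via the controller
        void updateRoute(String filter, NfInstance dst);
    }

    private final Queue<byte[]> bufferedAtController = new ArrayDeque<>();

    /** Invoked by the controller's event loop for every event an NF raises during the move. */
    void onPacketEvent(byte[] packet) { bufferedAtController.add(packet); }

    void move(NfInstance src, NfInstance dst, String filter, Network net) {
        // Loss-freeness: src stops processing matching packets and reports them as events,
        // so no state update can be lost once the state has been exported.
        src.enableEvents(filter, /* dropLocally = */ true);
        List<byte[]> chunks = src.get(filter);
        src.delete(filter);
        dst.put(chunks);

        // Order preservation: dst holds fresh packets in a local buffer until we know the
        // last packet the old path delivered has been handed over and processed.
        dst.enableEvents(filter, /* dropLocally = */ false);

        // Flush packets buffered at the controller, marked so dst processes them immediately.
        while (!bufferedAtController.isEmpty()) {
            dst.process(bufferedAtController.poll(), /* doNotBuffer = */ true);
        }

        // Send any stragglers still on the old path via the controller (so we learn the last
        // packet the old instance saw), then switch the route over to the new instance.
        net.detourToController(filter);
        net.updateRoute(filter, dst);
        while (!bufferedAtController.isEmpty()) {
            dst.process(bufferedAtController.poll(), true);
        }

        // In the full protocol the controller waits for an event from dst confirming that the
        // last old-path packet has been processed; this sketch simply releases at the end.
        dst.releaseLocalBuffer(filter);
    }
}
```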
>>: You didn't mention this, but why bother making sure they are in order when they could
have been out of order anyway? Are you trying to not [indiscernible]
>> Aditya Akella: We are talking about things going out of order in different directions of a
connection. If they are within a connection, typically, the middle box will do all right. If it
happens across directions, the middle box has no way to know whether that is weird or not. Does
that answer the original question that you had?
>>: Yeah, I think the answer is the middle boxes are sensitive to the reordering. Despite the
[indiscernible]
>> Aditya Akella: And we want to…
>>: Despite the [indiscernible] contract or whatnot. It seems an expectation normally that
doesn't happen and if it happens a lot it's okay to be two ways to force it or something like that.
>> Aditya Akella: Yeah, we don't want that. We don't want, yeah exactly. That's well worded.
Yeah, finish.
>>: How [indiscernible] controller in terms of maintaining [indiscernible]
>> Aditya Akella: Very beefy. The controller, right now we have an unoptimized controller. It can
handle, I don't know, a few hundreds to thousands of events per second with buffering and all
of that. We haven't optimized that further. It also has one socket that it listens to for
everything. It's kind of very [indiscernible] right now, but that's an active [indiscernible]: how
can we make this controller robust? But we don't see any fundamental bottleneck in scaling
that. Quickly, on to copy and share. Copy is an operation that provides eventual consistency
of state across middle boxes. Share is something we can use to provide stronger versions of
consistency. What we have is kind of like a poor man's version of share right now. Essentially
what we do is we stop packet processing at all instances that are sharing state. We have them all
generate events. We queue the events in FIFO order and then we release them one at a
time to the NF instances, and once we know an instance has changed state, and that is known
through an event, then we trigger copies of that state across all instances. It's sort of a very
heavyweight implementation of strong consistency. What do applications do? They decide on
the granularity. They decide on the scope, which state they want to operate on, whether
multi-flow or per-flow state, and the filter, you know, it should be a [indiscernible]
prefix or just port 80 flows, and they also decide whether to move or copy or share that state. They
also can describe the guarantees that they desire. We provide three for move: no guarantee, or
just loss-freeness, or loss-freeness and order preserving. For copy, it is either none or eventual
consistency. Share can be various versions of strong or strict consistency. Let me describe this
quick application and then I will go into the evaluation stuff. The point of this application is to
show some of the choices that can be made with respect to the API that we have designed.
This is Bro. I don't know why they chose the logo. It's like a zit in your eye, but that's what it is.
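Roughly, the application walked through next boils down to three calls against the controller: copy the shared multi-flow state, move the per-flow state with strong guarantees, and periodically re-copy the shared counters. The sketch below is written against a hypothetical controller interface; the guarantee choices and the 60-second period come from the example in the talk, everything else is illustrative.

```java
public class BroReallocationSketch {

    enum Scope { PER_FLOW, MULTI_FLOW }
    enum Guarantee { NONE, LOSS_FREE, LOSS_FREE_ORDER_PRESERVING }

    /** The two controller operations this application needs. */
    interface Controller {
        void copy(Object src, Object dst, String filter, Scope scope);
        void move(Object src, Object dst, String filter, Scope scope, Guarantee g);
    }

    void reallocate(Controller ctrl, Object oldBro, Object newBro, String prefixFilter)
            throws InterruptedException {
        // Scan detection relies on per-host connection counters shared across flows,
        // so the new instance needs a copy of that multi-flow state up front.
        ctrl.copy(oldBro, newBro, prefixFilter, Scope.MULTI_FLOW);

        // Per-flow state (the TCP and HTTP analyzer objects) moves with the flows.
        // The weird-activity script looks across both directions of a connection, so it
        // needs order preservation as well; vulnerable-browser detection alone would
        // only need loss-freeness.
        ctrl.move(oldBro, newBro, prefixFilter, Scope.PER_FLOW,
                  Guarantee.LOSS_FREE_ORDER_PRESERVING);

        // Eventual consistency for the shared counters: re-copy them in both
        // directions on a fixed period (every 60 seconds in the example).
        while (true) {
            Thread.sleep(60_000);
            ctrl.copy(oldBro, newBro, prefixFilter, Scope.MULTI_FLOW);
            ctrl.copy(newBro, oldBro, prefixFilter, Scope.MULTI_FLOW);
        }
    }
}
```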
Here Bro typically runs these very popular scripts. This is a scanning script that looks at
connection counters. This is a vulnerable browser detection script that essentially looks at all
HTTP requests and analyzes the signatures. This is the weird activities script that I was
describing that looks in both directions for bizarre stuff to happen. Here are two sets of flows
and a third set going to a third middle box, and what we want to do is we want to move some
prefix between two middle boxes. This is the prefix we want to move. This is the old instance
and this is the new instance we want to move to. The first thing that would translate to is
basically copying all the multi-flow state from the old instance to the new instance,
so these are the multi-flow connection counters that we wanted to copy. That's important for
the scan detection stuff. We want to move the per-flow state from the first middle box to the
second middle box, and this describes it as per-flow, and these are the guarantees we want. We
want loss-free and order preserving, and I'll get to that in a minute, why we want those
strong semantics. The way we do sort of eventual consistency is we have a loop that every 60
seconds copies state across those two instances, copies the connection counters from A to B
and B to A. This is sort of what this application would look like the way it is written. The key
thing to notice is that this multi-flow copy is needed for the scan detection Bro script. The
vulnerability detection stuff that's mainly looking for poorly designed browsers needs just
loss-freeness. It doesn't need loss-free and order preserving because it's only examining one
direction. But the weird activities script needs both guarantees. Again, this re-copying is
necessary for this kind of Bro stuff, so you probably would have to look at your application, and
based on that application you can decide on what kind of guarantees you want and what specific
chunks of flow state you want those guarantees to apply to. Does that make sense? I had a more
complex example of an application, but I won't go into that. I'll quickly go over results and
talk about whether we can deliver on the promise that we made earlier, but are there any
questions at this point? Is this kind of clear how the application would go? Okay. This is the
real thing. We have this controller that we wrote on top of Floodlight. It's a Java-based
controller, a module in Floodlight. We provide a shared NF library, roughly 3000 lines of code,
and we modified a bunch of different middle boxes out there: Bro, an intrusion detection
system; this one is an asset detection system, which looks for various signatures of different kinds
of operating systems running in the network; a firewall and NAT; and a caching proxy. Roughly up
to an 8 percent increase in lines of code for Bro, which is the most complex thing we changed. This
translated to something like 600 to 700 lines of code we had to add to support OpenNF.
Micro-benchmarks: what does this mean for the kind of actions we need the NF to support, and
what does it mean for the higher-level applications? This basically shows the performance of
get and put per flow across the three middle boxes that we modified. Here what we are doing
is we have some number of flows whose state we want to get and then put and we vary that
number, and this is the total time it took to do that operation. The big point is that get
per flow takes a lot of time, a lot more time than put per flow; if you look at the
thousand-flow scale in the case of Bro, for example, get can take up to 850 milliseconds. The
big contribution to this cost is the serialization and deserialization of state in both cases,
unpacking state, changing it to wire format and so on. Another thing to notice is that as the
complexity of the middle box grows, the complexity of the state also grows. But the key point is
that once you have a middle box you can benchmark these things, and you can kind of know
how long it will take for a certain amount of state to be read or written. What does this mean for the
operations? What kind of guarantees can we provide that some operation will be complete in a
certain amount of time? Here the setup is we have 500 flows in some middle box and we want
to move all the flows with the per flow state. The middle box is seeing about 1000 packets per
second and it is 50 percent utilized. If we have no guarantees requested, the move operation
finishes in about 250 milliseconds. This is actually almost exactly the serialization plus deserialization
cost. We can parallelize it. In this case what's happening is all the per-flow state is being read
and then it is being written, but we can parallelize read and write and that will improve the overall
latency further. The point is that in both of these cases there are packets that are dropped as
state is being moved. That is up to 200 packets. If you kind of see that's the number of packets
that get transmitted at 1000 packets per second for the 250 milliseconds that it takes to move
the state. Suppose you do loss-free with the same parallelization optimization. We see a
slight increase in latency due to the handling of events. That translates to, on a per-packet basis, an
average latency increase of around 100 milliseconds and a maximum latency increase of 250
milliseconds. This is because buffered packets will have to wait for the state to be transferred,
and only then do the buffered packets get released. This latency is exactly the total time that it took to put
the state. There are about 230 packets raised as events. Again, this you can calculate because you
know, again, the load that the middle box has seen and you know how long it would take for
you to move the state. With a different optimization, we can bring down the per packet latency
even lower and essentially what this optimization says is in this case we wait for the entire put
for the whole state to return and then we start releasing events, but we can wait for individual
chunks of state to return and start releasing events corresponding with those chunks so we can
reduce the amount of time packets are buffered in the controller. If you want a stronger
guarantee you end up seeing higher latencies, and a lot of the higher latency is because of
packets buffered at the destination instance waiting to be processed. But a key claim, though, is
that these latencies, the average latencies seen by a packet and the total completion time, can be
calculated making some assumptions about the network. It's a function of the load, the
thousand packets per second, the amount of state that you move, which is a function of how
many chunks of flows you have seen so far, and the processing speed of the middle box. Once
you know these things, you can kind of predict that I will finish this safe move operation within
a certain amount of time. That, we argue, is something that forms the basis for
SLAs. Yeah.
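As a back-of-the-envelope version of that argument: with per-flow serialization and deserialization costs measured offline, the number of flows being moved, and the packet rate, you can bound both the completion time and the number of packets that will be buffered or dropped. The constants below are illustrative, loosely matching the numbers on the slide (500 flows, 1000 packets per second, roughly 250 ms).

```java
public class MoveCostEstimate {
    public static void main(String[] args) {
        int flowsToMove = 500;              // chunks of per-flow state to transfer
        double serializeMsPerFlow = 0.25;   // from an offline get() micro-benchmark
        double deserializeMsPerFlow = 0.25; // from an offline put() micro-benchmark
        double packetsPerSecond = 1000;     // load on the flows being moved

        // Serial get-then-put; overlapping the two would roughly halve this figure.
        double moveMs = flowsToMove * (serializeMsPerFlow + deserializeMsPerFlow);

        // With the loss-free move, packets arriving during the transfer are raised as
        // events and buffered at the controller; with no guarantee they are simply dropped.
        double packetsInFlight = packetsPerSecond * moveMs / 1000.0;

        System.out.printf("estimated move completion: %.0f ms%n", moveMs);
        System.out.printf("packets buffered (loss-free) or dropped (no guarantee): ~%.0f%n",
                packetsInFlight);
    }
}
```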
>>: Give us some [indiscernible] function realistic workloads [indiscernible] say at least two or
three zeros in your [indiscernible]. What would your latency numbers look like?
>>: [indiscernible]
>> Aditya Akella: A single software instance? I doubt it.
>>: [indiscernible]
>> Aditya Akella: A single instance of a software VM?
>>: Yes.
>> Aditya Akella: Million packets per second?
>>: Yes. I wouldn't say like [indiscernible] so large [indiscernible] this is on the low end of
states.
>> Aditya Akella: Sure. I didn't mean to poke fun at you but that's probably possible.
>>: [indiscernible] accurate [indiscernible]
>> Aditya Akella: No. It's not a definite [indiscernible] I was just curious where that was coming
from.
>>: [indiscernible]
>> Aditya Akella: What I'm wondering is why would anything change here?
>>: My question is what would the latency numbers be like? What does that function look
like?
>> Aditya Akella: Okay. Good question. So if you had a million flows, so let's go back…
>>: Not a million flows, a million [indiscernible]
>> Aditya Akella: Sure. The latency is determined by how much you're transferring. That is a
function of the number of flows that you're moving and the complexity kind of grows with the
number of flows that you are moving. You can kind of imagine that when you have a lot of -- it
depends on how many flows there are. If it's a million packets per second, if it's across, I don't
know, 1000 flows, the amount of time you take to do the move operation is purely a function of
that. What would happen is when you are moving that state, the million packets per second
determines how many events are raised, how many packets get sent to the controller. That is
what the million packets per second would determine and, you know, we need a robust
controller to handle that. We just, you know, it's up to the controller to be able to handle that
kind of load. It doesn't change the time it takes to finish the move operation. If the middle box
can really handle that many packets, really all we are talking about is all those packets coming
to the controller and then the controller sends them to the middle box and they can process
them there. If the controller can handle that kind of a load, fine, but the total latency of the
move operation itself is a function of the amount of state moved. It has nothing to do with the
load.
>>: It seems to me that there would be some correlation…
>> Aditya Akella: But it depends on the number of flows and how the middle boxes are
handling them.
>>: [indiscernible] if the number of flows becomes [indiscernible] is it [indiscernible]
>> Aditya Akella: Okay. That's kind of the point of the graph. I think it's kind of linear; maybe
if you double the number of flows, the size of these bars kind of doubles.
>>: The thought [indiscernible]
>> Aditya Akella: Sure. But we haven't done fine-grained benchmarking of how the size of the state
will grow. But I don't see why it would be anything but linear. The amount of per-flow
state should grow with the number of flows. Okay. Let me talk about the last thing, which kind
of got brought up, which is why not use VM replication. Here we did this elastic scale-up,
scale-down stuff again. I'm running a little over. This is probably my last slide. Somewhere midway
through we scale up to a new middle box and then at some point we scale back down. With
OpenNF it takes a quarter of a second to scale up and a quarter of a second to finish the scale
back down. With VM replication, when we did the scale-up we actually found a bunch of bizarre
log entries, and our best hypothesis is that that is because of superfluous flow
state that particular VM instance did not need. And there is no clear way for it to support scale
down. If we just did forwarding control alone, scale down is delayed by more than 1500
seconds. This is going back to the argument that, the question that you had, John: how long
would we have to wait? There was a particular flow that we saw in our case that lasted for up
to half an hour, and we had to wait for that flow to clear. That's basically it. You know what
OpenNF is. It enables fairly rich control over state and distributed processing, and we can
provide clean semantics and reasonable performance, and you can go to the website,
download the code and play with it. I don't know if I have time, but I wanted to reflect on the
relationship with SDN if I have time. But if people don't, feel free to leave. Do I have a minute?
>> Ratul Mahajan: Sure you have a minute.
>> Aditya Akella: Okay. So Ratul in the e-mail that he sent out said this was a talk that was sort
of like SDN for middle boxes. It is and it is not. SDN provides control over forwarding state. We
provide control over NF internal state, so in that sense it is similar. But in many senses it is
different from SDN. In SDN a controller can create state and it does sort of compute some
routing protocol and it creates a forwarding state and pushes a forwarding state. In OpenNF
we make a conscious choice not to do that. We are just handling state. We are moving state or
we are copying state, but all of the state is created by the middle boxes. We haven't ripped
that logic out and re-implemented that at the controller. That was a conscious choice because
we couldn't figure out a common way to do that across all of the middle boxes. The purist view
of SDN is that network elements are dumb. Control plane is ripped out and implemented
somewhere. OpenNF is not so pure. There is still a lot of stuff going on inside the middle
boxes. We don't really know what ripping out a control plane would mean for a middle box. In
that sense it's sort of similar to a pragmatic view of SDN. There is still control plane
outside the controller, in the middle boxes, and you have a logical controller that allows you to
manage state. The other thing with SDN research is that people were like, there was a lot of hype and
buzz over SDN and then they realized shit, we have to deal with consistency of updates of
states, so it came as an afterthought. There has been phenomenal work reasoning about
various consistency semantics. Kind of think about OpenNF as sort of SDN with okay. Let's deal
with the consistency problem from the ground up. Okay. So that's the relationship with SDN
research. I'm done. If you have any more questions I'm happy to talk about them.
>> Ratul Mahajan: Let's thank the speaker. [applause].
>>: I wanted to quibble with this one. I think your distinction is actually less than, in the end
even in a forwarding case routers if nothing else are generating and updating state. They are
[indiscernible] open flow counters, or some controllers make decisions based on control
counters. In that sense I think the only difference is like yes. You are not moving state or
whatnot, but these switches are in the end generating some state based on which decisions get
made and things like that happen.
>> Aditya Akella: I think you are probably quibbling more with this argument than with this.
Yeah. It's not that the…
>>: The second one.
>> Aditya Akella: The point that I was trying to make is that the forwarding state that gets
installed at a switch could be completely determined by the controller. The switch starts with
something and the controller can say throw this out and use this instead. We are not quite
doing that. We are letting the logic pertaining to NF determine what that state should be. All
we are deciding is where to locate that state. We are not actually computing that state for the
NF.
>>: If you look at state [indiscernible] state [indiscernible] SDN is right. They are creating state
for the switch. And so are you in a sense. Like when you create [indiscernible] you are actively
telling the switch here is your state and do things based on this state and in that sense you are
doing the same. I think the key difference is, I would have put it differently, which is like in a
traditional sense the SDN controller exactly knows what the switch is going to do based… What
you're saying is that you don't know what this box does and that's basically it. A lot of the
reasoning comes from that: because you don't know what that state is, you don't know
what processing is happening, so you have to be extra careful. With a forwarding
switch, you know all it does: if I give it state s, this is how its behavior will change. You don't know that.
>> Aditya Akella: True. I would say the arguments are kind of related. You know what the
switch is doing and you can use that as an input for future computations. Because we don't
know what the NF is doing we cannot do anything about what the NF’s processing is. We can
only deal with management of that state.
>> Ratul Mahajan: Nothing else?
>> Aditya Akella: Okay. Cool. Thanks.
>> Ratul Mahajan: Thank you.