>> Ratul Mahajan: Thanks for coming today. It's my great pleasure to introduce Aditya Akella to tell us about some recent work he's been doing on OpenNF. For those of you who do not know, he is on the faculty at the University of Wisconsin and has done a lot of good work since he's gotten there, in redundancy elimination, complexity, understanding quality of experience and whatnot, and a lot of his work has been really influential. Personally, I think I've known Aditya now for 10 or 15 years. I think we started grad school at roughly the same time doing overlapping things to both [indiscernible]. But, you know, we grew up. Anyway, without further ado I will hand it to him.

>> Aditya Akella: Thank you. Thanks everyone for coming to this talk. I'm going to be describing a system called OpenNF that we have been working on for nearly a year now. It's basically something that combines the power of SDN with software virtual appliances. It enables the idea of distributed middle box style applications. A lot of this work was led by my student, Aaron Gember, but he got a lot of help from Raajay, who is in the audience today, and from Junaid and from Robert as well. How many of you here sort of know what middle boxes are and what SDN is and so on? Who doesn't know, I guess? Is there anyone who needs… You don't? Really? Okay. [laughter]. All right.

>>: That's all right. You have some slides for it [laughter].

>> Aditya Akella: I do, but… Okay. Let me just quickly walk through them. I will start by telling you what these software virtual appliances are and what SDN is and why they are being used, and then I'll present the motivation for the distributed processing work that we are doing. Traditionally, you know, your network is made of routers and switches and they provide very basic functions like connectivity between different points, simple access controls and so on, but often operators want more packet processing functions from their networks, for security, for performance reasons and so on. Network functions, or middle boxes (both of these are fancy terms; I don't know what exactly they describe, but those are the terms), are devices that help fill this gap. They allow an operator to introduce custom packet processing functions into their networks, and there are a lot of examples out there: things like in-network load balancers for balancing load across servers, firewalls, SSL gateways, intrusion detection and prevention systems and traffic scrubbers for security, caching proxies and WAN optimizers for performance, and so on. In contrast with their routing and switching counterparts, these devices are stateful in nature. They maintain a lot of state for the flows that they process, so that's the big difference between these and routers and switches.

You may be wondering why I am working on these things. These are not arcane. They're actually very popular. A recent study from Justin Sherry and others found that these middle boxes are quite widely deployed across a bunch of different networks out there. This is an average across 55 different enterprises, and the key takeaway from this graph is that there are at least as many of these appliances deployed in enterprise networks today as there are routers and switches. And this is not just true for enterprises but also for other networks such as ISP networks; cellular networks have a similarly widespread deployment of these devices. It's not surprising that the market for these devices is a rapidly growing, multibillion dollar market.
As newer applications arise, newer devices arise; as newer threats emerge, newer appliances are designed and get deployed. They are popular, but they are also extremely painful to manage. In general, large networks are hard to manage, and once you throw in these custom packet processing boxes, which require custom configuration and custom wiring and are generally difficult to deal with, they are a pain to manage. This is sort of where software appliances and SDN come in. These are two trends that are actually making networks that have these network functions simpler to manage. The first is that traditionally these were hardware devices; they are moving to software. This is Riverbed's Steelhead appliance and its virtual counterpart. This is an F5 load balancer and an in-cloud counterpart of that. These software alternatives are cheaper. They are easier to deploy, upgrade and customize for a particular network and so on. They kind of bring down the management burden significantly.

>>: The movement is actually [indiscernible] hardware? The movement is from customized hardware to commodity hardware, like…

>> Aditya Akella: It's a mix of both. I think there are models where you deploy on commodity hardware, sort of an extensible middle box along the lines of [indiscernible], where you can deploy a bunch of different middle box applications and they will be running in software, but there are also machine images, packaged as VMs, that you could deploy inside EC2, for example. Both of those are fairly popular trends.

>>: [indiscernible] really less?

>> Aditya Akella: These things become easier. Actually, it's hard to quantify exactly whether things become simpler. Hold on a little bit. I think once we talk about SDN, some of those things will become clearer.

>>: [indiscernible] replacing one big iron box [indiscernible] smaller lightweight [indiscernible] one to one. You have slides [indiscernible] showing…

>> Aditya Akella: Oh yeah. It's not one to one. You need to use multiple of these. I agree.

>>: [indiscernible] management cost increase substantially?

>> Aditya Akella: Yes and no. Let's wait until the next slide and then we can deal with this question. The second trend is the use of SDN. Again, most people here hopefully know what SDN is. SDN is basically a framework that provides logically centralized control over the forwarding state of a network. To see how this is improving management of networks with these NFs or middle boxes, note that traditionally the way to use these devices was to deploy them at chokepoints within the physical topology and then wrestle with distributed protocols to funnel traffic through those chokepoints to get some traffic subsets processed by them. With SDN, because you have this control over forwarding state, you can deploy these boxes off path and then punch specific traffic subsets in and out of them, and that immediately takes away the central points of failure and attack and the central points of congestion. That aspect of the management story kind of becomes better.
The other thing you can do, which is sort of an interesting line of research in middle boxes, is that you can have multiple different kinds of these middle boxes hanging off of the network and you can chain traffic through those different sets of middle boxes, so specific traffic subsets go through different sub-chains, and that allows you to realize a variety of interesting, rich high-level policies. That kind of stuff is difficult to realize with hardware appliances deployed at chokepoints. What this talk is about is building on this notion of decoupling of middle boxes from network topology that SDN enables. What decoupling means is that you don't need to use one instance of a middle box. You can have multiple instances deployed at different locations within your network. You can use SDN as a way of steering traffic subsets towards those different instances. SDN actually gives you more teeth than that. It allows you to dynamically reallocate traffic across instances in order to achieve dynamic control over the distributed processing happening at those instances. You can use that to realize various high-level properties. One of the things you can do, for example, is immediately make load balancing more effective. In this case, whenever some middle box [indiscernible], you can immediately move load off of that middle box by dynamically reallocating processing for those flows to a different middle box. This allows you to extract maximum performance across these different instances of the middle box at a given cost. This is sort of what I mean by software appliances with SDN kind of making the management story simpler. There is some limit of [indiscernible] in not having to worry about doing many of these things. I don't have a concrete argument that this makes management simpler, but…

>>: [indiscernible] example [indiscernible] so that you can [indiscernible] maintaining the software state across multiple instances and showing that you keep on rolling [indiscernible] ensuring you have [indiscernible] so think of what you would do on just one [indiscernible] that will essentially help you do on [indiscernible] 1, 100 to 1, to one thousand. [indiscernible] software [indiscernible] I mean the [indiscernible] cost [indiscernible] becomes much more painful because they are ticking time bombs that [indiscernible] goes out [indiscernible] customers [indiscernible] even with this kind of a figure [indiscernible] stateful, so whenever you are making any kind of a change you have [indiscernible] for the state, so what [indiscernible] connections that you have to…

>> Aditya Akella: Okay. I think you are making the case for OpenNF, so let me, again, maybe this is not the place I should have stopped. I think we are in agreement. I mean there are some aspects that come into play. You don't have to put these in the middle of the network. That's also the reason, for example, SLB is used as opposed to a big honking F5 load balancer. It gives you much greater capacity without having to babysit custom hardware. That's kind of the trend. Things are kind of moving away from hardware. What I wanted to get into is not just simple load balancing. It's that the ability to do dynamic reallocation of distributed processing actually allows you to build novel abstractions.
One example is the abstraction of the infinite capacity middle box: when the middle box instance runs hot, you run additional copies of the instance and you distribute subsets of traffic across those copies. Another abstraction that you can provide is that of an always updated, always available middle box. Suppose you want to update this middle box with some security patch. You deploy a hot standby that has this patch. Once this is ready, and that will be clear in a few slides, you move the [indiscernible] traffic to this updated middle box, and then you have an always updated middle box abstraction. You can do something even more powerful. You can dynamically enhance the functionality of a middle box by leveraging an in-cloud, brand-new version of this middle box. For example, if this middle box is seeing something anomalous for a certain subset of traffic, you can take the ongoing processing for that specific traffic subset and hand it off to a brand-new version in the cloud for additional processing. If you had the ability to do this kind of dynamic control over distributed processing, then you can realize all of these interesting abstractions. A lot of cool, fun things can happen if you get middle boxes and SDN to play with each other. This may not be funny now, but hopefully it will become funny in a couple of other slides when I bring back these two characters.

What OpenNF is, is a control plane that can support key semantics in these distributed processing applications, specifically for these dynamic reallocation actions. Going back to the comment that you made, the kinds of guarantees that it can make are to ensure that the reallocation actions are safe and that reallocation can be done at any particular point in time. By safe, I'm talking about something like a safety property, which is output equivalence: when you reallocate traffic, in response to load or whatever else, you can reason about the fact that the outcome of the actions after the reallocation is similar to that of one single middle box with equivalent infinite capacity. The second is sort of a liveness property, and what it essentially means is that an operator should be able to trigger these reallocation operations at any given point in time and he should be able to argue that they will finish sufficiently soon, in some bounded amount of time.

There is no control plane out there that does this today. We are the only system that can provide these guarantees. Some of you know the SIGCOMM reviews are out for rebuttal, so this is just my way of making myself feel good. Every day I look at myself in the mirror and say, you are my hero, no matter what other people say. [laughter] You are doing good. [laughter]. More seriously, this is actually the first system that can provide these kinds of guarantees. Having these kinds of guarantees is useful for two things. The first thing is that it is actually a necessary basis for some of the new abstractions I described on the previous slide. Without having these kinds of guarantees, it's hard to build a dynamic remote enhancement kind of middle box. It also forms the basis for strong SLAs: you can actually reallocate load without taking down the middle box or impacting reliability, and you can respond to load in some small amount of time, like within a few milliseconds.
Further, you can offer strong SLAs in things like the load balancing, elastic scaling and always updated middle box applications I discussed on the previous slide. In the case of load balancing, for example, you can ensure that the load reallocation mechanisms are sufficiently responsive and that they don't impact the quality of the decisions made by the middle boxes in question. In terms of these kinds of guarantees being necessary as the basis for new abstractions: without being able to reason about safety and liveness, you won't be able to build the dynamic enhancement application that I was showing. You fundamentally need the ability to move flows and be able to argue that the operation has these safety and liveness properties. So OpenNF is a system that can provide these guarantees.

>>: What is the real need for those guarantees? Like why are you taking such a strong stance on them? In the end I think the basic abstraction of the network itself is that packets can be corrupted or dropped and [indiscernible], so why go for such strong guarantees?

>> Aditya Akella: Typically, when packets are reordered by the network, the middle box has some logic to deal with that. The algorithms are basically a function of those possible inputs. What we are saying is we want to be able to support these reallocation actions and we want to make sure that the reallocation doesn't introduce other unexpected inputs that the middle box did not take care of. So we want output equivalence with respect to whatever the middle box's internal logic would do for just network losses and corruptions. Does that make sense?

>>: Yeah, so I was thinking, I guess if you are willing to have, let's say, your application deal with drops in the network, then it seems like the reallocation stuff becomes trivial maybe. You basically bring up another middle box without actually worrying about any kind of inconsistency, and this middle box has logic to deal with duplication and drops, and why is that not enough?

>> Aditya Akella: I'll come to an example where that is not possible. Basically, for example, with an off-path IDS, if the network doesn't introduce any drops, then anytime there is a missing event that would trigger some sort of alarm for the off-path IDS, or if some kind of reordering happens it would trigger an alarm for the off-path IDS. But in the case of an off-path IDS it doesn't have the ability to request those dropped packets, because the application may not read [indiscernible]; it would have to deal with that missing hole in some specific way. What we want to make sure is that the reallocation does not change the nature of the decision that the IDS makes given what is happening with the application and given what is happening with the network. Does that make sense? If you can engineer a network that does not corrupt packets, that does not introduce reordering or drops, then the middle box's output should be the same with or without reallocation.

>>: If you have a network that is not [indiscernible] the fact is we do have networks that [indiscernible] what that means is like maybe it's going back to and went. The middle box logic should be robust enough to these things, in which case you don't need like this control plane with such strong guarantees.

>> Aditya Akella: I think -- let me get to the loss-freeness guarantee and with that example we can deal with this question. In that specific case we will see how we show that the decisions are [indiscernible] IDS.
Maybe hold onto that thought until we come to that example. Just quickly, is SDN control enough? This is a simple elastic scaling kind of situation where you have two traffic subsets, the red traffic and the blue traffic. Each of these devices creates state for the individual traffic subsets because these are stateful devices. Suppose the volume of traffic on these individual subsets grows, causing the intrusion prevention system to run hot. You decide to deploy another instance, and essentially what you want to do is shed half the load now. There are two choices with SDN here. The first is you can wait for new flows to arrive and funnel all those new flows to the new instance. In this case there are no new flows, in this extreme example; therefore, this bottleneck will persist. This impacts the responsiveness, the liveness property that we were after. Another thing you can do is say, screw it, I'm just going to move the blue flows. The problem is that the state that was needed for the blue flows is missing from its new location, so that impacts the decisions that the intrusion prevention system makes in this case. It may cause it to either raise false alarms or it may cause it to miss detecting attacks. Essentially, what this means is that simple SDN control is not enough. In this particular case, if you just relied on SDN, then that will make some Milhouse cry somewhere, and that Milhouse could be your network operator or middle box vendor or whoever else. Going back to this picture, what we really need is that in addition to being able to control the routing of traffic, we also need to move the state to the traffic's new location. Essentially, to satisfy those kinds of properties, what we need is something that exercises joint control over the network forwarding state and the NF internal state. So OpenNF is basically a system that addresses this. Yes?

>>: To justify that, you said, you use an extreme example of heavy flows. Is that valid?

>> Aditya Akella: Not really. New flows will arrive, but existing flows can last long enough that the load never goes down below the point that you care about. I think it will be clearer if you think of this as a scale-down situation where you want to decommission the middle box and you want to take load off of it. You just have to wait for traffic to drain off. It can take several minutes or sometimes even hours for traffic to drain out. I'll actually show you results; in a simple trace where we analyzed this, it took up to half an hour for the traffic to drain out of a middle box.

>>: I'm just trying to make sure that your primary claim is worth the benefit, because you get strong [indiscernible] technique waiting for [indiscernible]?

>> Aditya Akella: Yeah. And hopefully at the end of the talk you'll be convinced that it was not a terrible hike up there. Yeah?

>>: [indiscernible] more from the [indiscernible] middle box [indiscernible] what they do today [indiscernible]

>> Aditya Akella: What people do today is the first alternative: wait for flows to drain out. You want to decommission a middle box? You wait for it to drain out, then you take it out and start using another one. So all new flows go to the new middle box and all existing flows need to drain out from the old middle box. There are some research papers out there that say just do the reallocation, let's just reallocate. But what they ignore is what happens to the false positives and false negatives in the decisions of the middle boxes.
>>: [indiscernible] wait for time A and then wait for time B and then what happens after that…

>> Aditya Akella: The problem is that you don't know. You don't know. You are monitoring some state and you don't know how long you will have to wait.

>>: Yeah. This would be [indiscernible] solution.

>> Aditya Akella: Yes. And the flow length can be arbitrarily long and you have to wait for that flow to finish.

>>: What is it you are trying to provide, [indiscernible] atomic operation essentially, which says consider this set of [indiscernible] as a single logical unit, and from a [indiscernible] perspective do you care if the [indiscernible] as long as we can [indiscernible] the [indiscernible] state from one to the other, I guess. [indiscernible] example would be beyond any [indiscernible]

>> Aditya Akella: From an external perspective, the tenant or whoever is using it just sees one middle box, and there is a control application that wants to preserve that abstraction of one available middle box that offers a certain kind of performance. To do that, the control application may need to do some reallocation of processing, and we want to ensure certain semantics for that reallocation of processing. Does that make sense?

>>: One of the points of [indiscernible] notorious for falling down under high load, so when your control application tries to do this reallocation it could just be that you waited -- presuming you would have some notion of its signal something function and you don't want to sort of become aggressive [indiscernible] slight impression of order and you want to start doing this rebalancing of [indiscernible] because you [indiscernible]. At the same time, you also [indiscernible] like a ticking time bomb now, you know. By the time you do any sort of reallocation, you essentially have lost 80 percent of your traffic just because your box went down.

>> Aditya Akella: Yeah. So I think that is sort of -- so in this picture, right, this is sort of an OpenNF controller. What you are describing is this application up there, the elastic scaling application. All of that would be in the logic for the elastic scaling application. At what point does the application trigger a scale out? That is sort of an imperfect science. The application may decide, once the middle box sees more than 60 percent load over a ten-minute interval, I will start to spin up a new instance and start sending load [indiscernible]. That's a policy that is up to the application. We are not advocating one way or the other what that policy should be; that is up to the application. Once the application makes those choices, it would essentially translate that down to some set of reallocation operations, and that becomes some sort of state import and export across these middle boxes. The OpenNF controller does those things in coordination with the network to support some semantics for those operations, so those operations will complete safely in a certain amount of time.

>>: I agree with all of that. If you go back to the slide a few slides back where you are talking more and we'll see if you will enlighten us.

>> Aditya Akella: Yeah.

>>: You have [indiscernible] just assume [indiscernible] and all of that [indiscernible] seems explicitly now tied to how sensitive or aggressive you are in triggering this break-up function. So then you can't just have any arbitrary guide and essentially loss [indiscernible] you've got to give…

>> Aditya Akella: Yeah.
Essentially, what can the controller provide? If the controller can say, look, you give me some kind of reallocation operation and I can give you a done signal within 500 milliseconds, that's the kind of soon-enough guarantee that we are after.

>>: After you trigger [indiscernible] there's a bounded amount of time…

>> Aditya Akella: Time that it will take to be done. But that is a bounded amount of time. We can reason about how long that is going to take, and it is going to be small, and I will show you how we get there.

>>: Perhaps I am jumping in [indiscernible] depending on the amount of state you have to transport?

>> Aditya Akella: Yes. It's all known and I will tell you why that is the case.

>>: Do you pause the traffic during the transfer?

>> Aditya Akella: No, no, no. I will get to that. No pausing of traffic happens. Traffic is getting processed. That's the thing that we have to deal with. That's a big problem. It creates problems for us. We do not want to pause the traffic.

>>: But then how can you do that? If you just say at this point I want to transfer the state, new packets are coming in and they start updating the old state…

>> Aditya Akella: Okay. What you are talking about is, what is the race condition?

>>: Yes.

>> Aditya Akella: Okay. I will get there. That is the biggest challenge in designing OpenNF. These are the challenges. The first thing is we want to be able to bring arbitrary NFs that are out there into the fold. We don't want to change anything about what they are doing in terms of internal state allocation because that may [indiscernible]. Then there is this big challenge that we have here: as we are moving state, packets arrive and update the state, and those updates may be lost at the state's new location or they may happen out of order. Also, state can become inconsistent, and we want to make sure that can't happen. Those are the safety guarantees that we want to provide. Any other questions before I jump into the technical part?

>>: I think the first [indiscernible] that you can have an NF [indiscernible] not a partition [indiscernible] state [indiscernible] because it has [indiscernible], but I am saying in practice that [indiscernible]

>> Aditya Akella: Yeah. In theory, you can have an NF that doesn't fit this. But what I'm going to talk about is that in practice the NFs that we looked at essentially create state that either applies to a single flow, or some group of flows, or all flows. This is [indiscernible] internal state, so there is a [indiscernible] connection object that consists of two objects, a TCP analyzer object and an HTTP analyzer object. There is a bunch of state that is shared across connections; this is like per-host connection counters, which are often used for scan detection. And then there is a bunch of statistics that are maintained for all traffic traversing the [indiscernible] instance. We can basically think of state as being defined by three different scopes: either it is per-flow state, multi-flow state or all-flows state, and we can use the traditional notion of a flow, which is the connection [indiscernible]. I'll describe what state we are interested in and what we want to do with that state. The API that speaks directly to the NFs is actually fairly simple. It has three calls: get, put and delete.
The calls take a filter f that describes the state that we are interested in, which is typically some subset of the connection [indiscernible]. They also define the scope, whether we are interested in the per-flow state matching that filter, or the multi-flow state, or whatever. But the key thing is that once that get, put or delete call is issued, it is up to the NF to identify and provide all the state matching that particular filter. That is when it is exporting state; when it is importing state, it is up to the NF to combine the state provided by the controller with its internal data structures. What this means is that the controller doesn't need to know what the internal state organization is; [indiscernible] all the work to the NF, and we also don't need to change the NF to conform to a specific state allocation strategy. This obviously doesn't make it easy to change an NF to bring it into our [indiscernible], but at least NFs don't have to be redesigned to work with OpenNF.

Let me actually get into these race conditions, what I mean by them and how they interfere with the semantics we want to offer for these high-level operations. The first thing I need to tell you is what these high-level operations entail. Suppose here we have two instances of a [indiscernible] intrusion detection system and the high-level operation is to reallocate port 80 flows to this instance, and that's the blue flows. The first thing you would want to do is to move all the flow-specific per-flow state corresponding to the blue flows from NF1 to NF2, here just for the port 80 flows, and this includes both the TCP analyzer objects and the HTTP analyzer objects that we saw on the previous slide. The other thing you might want to do is copy the state that is shared across flows. Recall those connection counters that were maintained per host. You could have some of a host's connections go here and some of that host's connections go here, so you would need to maintain copies of that state. Essentially, the high-level semantics we want to offer for the reallocation operation over there boil down to semantics for move, copy and share. In particular, for move what we can provide is that the move is loss free and order preserving. For copy and share, it's various notions of consistency, like eventual, strong, strict, whatever.

Let me describe very quickly the move operation and the [indiscernible] race conditions. The move operation starts with the control application issuing the move. The controller issues a get call. The NF returns a bunch of chunks of state with flow IDs; each ID is basically a subset of connection [indiscernible] things that describe the [indiscernible] the state corresponds to. At some point the controller issues a delete, and then it puts these chunks one at a time into the destination instance for this move, and at some point it updates the forwarding. The race conditions arise because of this import and export of state and the interactions with forwarding.
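To make the southbound interface and this basic move sequence concrete, here is a rough sketch of what they could look like. The class and method names, the Scope and StateChunk types, and the one-call-per-chunk structure are illustrative assumptions for this transcript, not OpenNF's actual API.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and signatures are assumptions, not OpenNF's real interface.
public class SouthboundSketch {

    // The three scopes of NF state described above.
    enum Scope { PER_FLOW, MULTI_FLOW, ALL_FLOWS }

    // A filter is some subset of the connection 5-tuple, e.g. dstPort=80 or a source prefix.
    record FlowFilter(Map<String, String> fields) {}

    // An opaque, serialized piece of NF state plus the flow ID it belongs to.
    record StateChunk(String flowId, byte[] serializedState) {}

    // What a participating NF instance exposes to the controller.
    interface NFStateApi {
        // The NF identifies and serializes all of its state matching the filter and scope.
        List<StateChunk> getState(FlowFilter filter, Scope scope);

        // The NF merges the provided chunks into its own internal data structures.
        void putState(List<StateChunk> chunks, Scope scope);

        // The NF discards all state matching the filter and scope.
        void deleteState(FlowFilter filter, Scope scope);
    }

    // Forwarding control (e.g. an OpenFlow rule change) is abstracted away here.
    interface Network {
        void reroute(FlowFilter filter, String fromNf, String toNf);
    }

    // The basic move just described: get, delete, put chunk by chunk, then update forwarding.
    // As discussed next, this alone is not safe: packets keep arriving while state is in flight.
    static void basicMove(NFStateApi src, NFStateApi dst, Network net,
                          FlowFilter filter, String srcId, String dstId) {
        List<StateChunk> chunks = src.getState(filter, Scope.PER_FLOW);
        src.deleteState(filter, Scope.PER_FLOW);
        for (StateChunk chunk : chunks) {
            dst.putState(List.of(chunk), Scope.PER_FLOW);
        }
        net.reroute(filter, srcId, dstId);
    }
}
```

The race conditions discussed next come precisely from running a sequence like basicMove while traffic continues to flow.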
>> Aditya Akella: Let's start with a simple race condition where updates can be lost. This arises because packets arrive as state is being moved, and that can lead to some bad consequences. Here is an example. Packets arrive and they establish this state. You issue the move for the blue flows. The blue flows' state has moved. At some point another packet comes in. It updates the state at the old location. Then routing gets updated, and the problem here is that the update due to B2 is missing from the state's new location. In the case of an IDS instance, for example, if it's looking for signatures or doing MD5 checks to check against some kind of attack, depending on how it is implemented it may have missed an attack, because it computes an MD5 and it doesn't see that signature, or it may throw up an alarm saying, I saw something bizarre. Both of those are things we would like to avoid. One solution for this is to stop all traffic, stop all processing, buffer all the packets, let the state move and then reallocate flows. This can last hundreds of seconds and you may have to buffer a lot, but the more important problem is that when you stop traffic there will still be some packets that were in flight, and updates to the state due to those packets may be lost. You have no way of knowing whether those updates are lost or not. The high-level semantics that we want to provide, something we call loss-freeness, is that we want to make sure that all state updates due to packet processing are reflected in the transferred state, and that every packet the switch receives should be processed by some NF instance. Those of you who are familiar with the consistent updates paper: the one-shot update essentially provides the latter property, but it cannot provide the former property. We are interested in a stronger consistent update property than what that paper offered. Yeah?

>>: Are you making an assumption here that your NF [indiscernible] has sufficient capacity to -- because it may just [indiscernible] worried about the [indiscernible] condition but opposite…

>> Aditya Akella: All of that is up to the control application. We are assuming the control application is monitoring that, and when it does this reallocation, there is a reason why it chose to reallocate.

>>: [indiscernible] in terms of the updates being applied to it the new [indiscernible] sufficient [indiscernible] if [indiscernible] but then what [indiscernible] during the migration the new flow that was supposed to be [indiscernible] condition. The very fact that you [indiscernible] allocation means that something [indiscernible] but the very fact that that thing is running hot, you also are not able to say new connection or [indiscernible] so what, I mean now, what sort of consistency properties [indiscernible] when those connections are [indiscernible] there is no [indiscernible]. I mean I'm trying to get my head around best effort versus the idea, because to me that sounds like a best effort thing essentially for things if we were to keep up we would try to [indiscernible] but there is no guarantee that all of the schedules will still be maintained in a consistent [indiscernible]

>> Aditya Akella: I am trying to wrap my head around that question.

>>: I guess one possible answer would be, does he mean 90 percent and you get out of there before you burn up, or does he mean 110 percent and now you are dropping [indiscernible]

>> Aditya Akella: I would think that the elastic scaling application would probably have conservative thresholds. If I'm starting to see over some period of time it's hitting 80 percent, then it would…

>>: [indiscernible] load, that's the whole point here, that's an internal thing in your system. Even if you start [indiscernible] at 10 percent or 1 percent, you don't control that. Unless you can start [indiscernible] from that position you will still drop things.
What I am essentially trying to understand is what you are guaranteeing in that statement. You are essentially saying [indiscernible] some state of the network flows which were passing through me, some of that would get transferred to the new location.

>> Aditya Akella: So what we are saying is that the state we are transferring will not see losses due to the transfer. It could see losses due to whatever the middle box is experiencing; that is up to the implementation of the middle box. Our transfer doesn't impose any other losses in addition to that. That is the guarantee.

>>: How does it drop them without doing anything at all?

>>: I think the common case is, I think yours is looking at corner cases where you cannot predict the load, or the load is so high that some things get dropped. But I think maybe the way to think about it is the common case where the load is not wildly fluctuating [indiscernible]. I would say that is the common case. Why not?

>>: Because things fail. That's the point [indiscernible] things failing. That's the common…

>>: Things failing due to overload, that's the point in the common case.

>>: Load moves smoothly in the common case, right? Like because load can be modeled so [indiscernible] application wants to be, so the way to look at it I think there is an opportunity [indiscernible] and yes, bad things can happen. In which case like nothing [indiscernible]

>>: Right. So that's a proportion…

>> Aditya Akella: We should probably take this discussion a little offline because I have like half the talk to go.

>>: [indiscernible] there's also some [indiscernible]

>> Aditya Akella: I'm going to come to that, okay? It doesn't work for scale down. Think about it. How would you use [indiscernible] migration to scale down?

>>: I was thinking in terms of [indiscernible] each incoming packet [indiscernible]

>> Aditya Akella: I'm talking about scale down. When you want to collapse two instances into one instance, how would you use VM replication for that?

>>: Not [indiscernible] migration.

>> Aditya Akella: Okay. We will come to that.

>>: [indiscernible]

>> Aditya Akella: Okay. What do you do? You had two VM instances. How do you merge that state?

>>: You said you would organize [indiscernible] some specific logic to send to the [indiscernible]

>> Aditya Akella: Yeah. I mean, that's complex. We'll come to that. We'll come to that. I think we are going off on a very bizarre tangent here. The key idea we have is what I call the event abstraction, which I'm going to use to address all the race conditions. This is basically a way of preventing state updates from happening at the wrong place and then sequencing them in a particular way. This also helps us reason about the time it would take for these operations to finish. The simplest way we do this is as follows. We enable this event; it applies to all the blue packets, and what it does is, when a packet is received, it raises an event to the controller but drops the packet locally. Don't process it. Then we issue get and delete. State is moving to the second instance. A packet comes in. An event gets raised. The packet gets buffered at the controller along with the event. Eventually state gets put. Put returns, and then the controller flushes all of the buffered packets to the destination instance. Any packet that arrives in the interim is handled in a similar fashion, and at some later point in time the forwarding is updated, and then all the packets that arrive along the new path are handled similarly. If there are any remnant packets raising events, they get processed too; all the packets get processed. No packet is left behind, but they may get processed out of order. Make sense? So we can prove that this actually ensures the move is loss free.
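Here is a rough sketch of that loss-free move, reusing the kind of interfaces sketched earlier. The enableEvents call, the way packets are buffered at the controller, and the combined get-and-delete are assumptions about how such a controller could be structured, not OpenNF's actual code.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Illustrative sketch of a loss-free move; types and calls are assumptions, not OpenNF's code.
public class LossFreeMoveSketch {

    interface NFInstance {
        // Stop processing packets matching the filter; instead raise each one to the
        // controller as an event and drop it locally.
        void enableEvents(String filter, EventHandler handler);
        void disableEvents(String filter);
        List<byte[]> getAndDeleteState(String filter);   // get plus delete
        void putState(List<byte[]> chunks);
        void processPacket(byte[] packet);               // controller re-injects a packet
    }

    interface EventHandler { void onPacket(byte[] packet); }
    interface Network { void reroute(String filter, String fromNf, String toNf); }

    static void lossFreeMove(NFInstance src, NFInstance dst, Network net,
                             String filter, String srcId, String dstId) {
        Queue<byte[]> buffered = new ArrayDeque<>();

        // 1. Any packet for these flows now raises an event and is buffered at the
        //    controller, so nothing updates the state that is about to be transferred.
        src.enableEvents(filter, buffered::add);

        // 2. Move the state. No update can be lost, because no packet is being processed
        //    against the old copy while the transfer is in flight.
        List<byte[]> chunks = src.getAndDeleteState(filter);
        dst.putState(chunks);

        // 3. Flush the buffered packets to the destination, which now holds the state,
        //    then update forwarding so new packets go there directly.
        while (!buffered.isEmpty()) {
            dst.processPacket(buffered.poll());
        }
        net.reroute(filter, srcId, dstId);

        // 4. In a fuller version, remnant packets still in flight toward the old instance
        //    keep raising events and are forwarded on the same way; every packet gets
        //    processed somewhere, though possibly out of order.
        src.disableEvents(filter);
    }
}
```

This is exactly where the order-preserving variant, described next, has to do extra work: step 3 and the routing update can interleave badly.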
>> Aditya Akella: Going back to this [indiscernible] stuff, things can be processed by the middle box in an out-of-order manner. Again, this can happen because of the interactions between steps four and five on the previous slide, where we are flushing the buffer and updating the routing. If the buffer gets flushed while events are still being raised, packets can reach the destination instance in the wrong order, because the routing has still not been updated. An older packet comes in and goes to the old instance on the blue flow. An event gets generated and gets buffered at the controller. Then we kick in a routing update. Suddenly, a new packet shows up and gets sent to the new instance, and then the controller eventually releases the buffered packet, and so the packets are processed out of order. This matters for something like Bro. Bro runs this thing called a weird activity script that looks for things like a request happening after a response was seen, or a SYN happening after data was seen. Those are indications of [indiscernible]. These set off alarms inside Bro. The key property is that you want to make sure that all the packets are processed in the order that they were forwarded to the NF instances. Is there a question? Okay.

Again, the basic idea we have is we want to keep track of every packet; we want to know what the last packet seen by the old instance was, and then sequence all of the updates around that last packet. The way we do this is we, again, flush all the packets like in the previous case, but we mark them with a do-not-buffer flag, which means that the destination has to process them. We enable events on this instance, but instead of dropping the packets, we tell the instance to buffer and hold on to any packets that are received. But instead of just blindly updating the routing to send to the other instance, we first create an update so that the packets headed to the original instance go through the controller. What this does is any such packet gets sent to the controller as an event and eventually flushed to the destination instance, but it helps us keep track of which remnant packets the first instance has seen, so we will know which one is the last packet that the first instance saw. Suppose B3 was the last packet that was seen by NF1. We have updated the routing state. Some other packet comes in; it hits this event and gets buffered locally, but at some point that packet B3 gets processed. It will raise an event to the controller, so now we know that the last packet, B3, has been processed by NF2. Once that happens we can release all the buffered packets and let them be processed. So we can sequence the processing of packets by using events from the two instances.
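A much-simplified sketch of that sequencing step is below. The notion of a packet ID, the notifyOnProcessed callback and the buffering calls are all assumptions made for illustration; they are not OpenNF's actual mechanism, which works with events raised by both instances as just described.

```java
// Simplified, illustrative sketch of the order-preserving step; all names are assumptions.
public class OrderPreservingSketch {

    interface NFInstance {
        // Destination side: hold packets arriving on the new path instead of processing them.
        void bufferArrivingPackets(String filter);
        // Release the held packets for processing, in arrival order.
        void releaseBufferedPackets(String filter);
        // Raise an event to the controller whenever a packet matching the filter is processed.
        void notifyOnProcessed(String filter, ProcessedHandler handler);
    }

    interface ProcessedHandler { void onProcessed(long packetId); }

    // Called once the loss-free state transfer has finished and routing is about to flip.
    // lastPacketAtOldInstance is the ID of the last remnant packet the old instance saw;
    // the controller has already forwarded it to the new instance with a do-not-buffer flag.
    static void sequenceAfterMove(NFInstance dst, String filter, long lastPacketAtOldInstance) {
        // Packets racing ahead on the new path are held at the destination...
        dst.bufferArrivingPackets(filter);

        // ...until the destination reports that it has processed the old instance's last
        // packet; only then is the original forwarding order known to be preserved, and
        // the held packets can be released.
        dst.notifyOnProcessed(filter, packetId -> {
            if (packetId == lastPacketAtOldInstance) {
                dst.releaseBufferedPackets(filter);
            }
        });
    }
}
```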
>>: You didn't mention this, but why bother making sure they are in order when they could have been out of order anyway? Are you trying to not [indiscernible]

>> Aditya Akella: We are talking about things going out of order in different directions of a connection. If they are within one direction of a connection, typically, the middle box will do all right. If it happens across directions, the middle box has no way to know whether that is actually weird or not. Does that answer the original question that you had?

>>: Yeah, I think the answer is the middle boxes are sensitive to the reordering. Despite the [indiscernible]

>> Aditya Akella: And we want to…

>>: Despite the [indiscernible] contract or whatnot. It seems an expectation that normally that doesn't happen, and if it happens a lot it's okay to be two ways to force it or something like that.

>> Aditya Akella: Yeah, we don't want that. We don't want, yeah exactly. That's well worded. Yeah, finish.

>>: How [indiscernible] controller in terms of maintaining [indiscernible]

>> Aditya Akella: Very beefy. The controller, right now, is an unoptimized controller. It can handle, I don't know, a few hundred to a thousand events per second with buffering and all of that. We haven't optimized that further. It also has one socket that it listens to for everything. It's kind of very [indiscernible] right now, but that's an active [indiscernible]: how can we make this controller robust? But we don't see any fundamental bottleneck in scaling that.

Quickly, on copy and share. Copy is an operation that provides eventual consistency of state across middle boxes. Share is something we can use to provide stronger versions of consistency. What we have is kind of like a poor man's version of share right now. Essentially, what we do is we stop packet processing at all instances that are sharing state. We have them all generate events. We queue the events in FIFO order and then we release them one at a time to the NF instances, and once we know an instance has changed state in response to an event, we trigger copies of that state across all instances. It's sort of a very heavyweight implementation of strong consistency.

What do applications do? They decide on the granularity. They decide on the scope, which state they want to operate on, whether multi-flow or [indiscernible] state, and the filter, you know, it should be a [indiscernible] prefix or just port 80 flows, and they also decide whether to move or copy or share that state. They can also describe the guarantees that they desire. We provide three for move: no guarantee, or just loss-freeness, or loss-freeness and order preserving. For copy, it is either no consistency or eventual consistency. Share can be various versions of strong or strict consistency.

Let me describe this quick application and then I will go into the evaluation stuff. The point of this application is to show some of the choices that can be made with respect to the API that we have designed. This is Bro. I don't know why they chose that logo. It's like a zit in your eye, but that's what it is. Here, typically, Bro runs these very popular scripts. This is a scan detection script that looks at connection counters. This is a vulnerable browser detection script that essentially looks at all HTTP requests and analyzes the signatures. This is the weird activity script that I was describing, which looks in both directions for bizarre stuff to happen. Here are two sets of flows and a third set going to a third middle box, and what we want to do is move some prefix between two middle boxes. This is the prefix we want to move. This is the old instance and this is the new instance we want to move to. The first thing that would translate to is basically copying all the multi-flow state from the old instance to the new instance, so these are the multi-flow connection counters that we want to copy. That's important for the scan detection stuff. We want to move the per-flow state from the first middle box to the second middle box, and this describes it as per-flow, and these are the guarantees we want. We want loss free and order preserving, and I'll get to why we want those strong semantics in a minute. The way we do sort of eventual consistency is we have a loop that every 60 seconds copies state across those two instances, copies the connection counters from A to B and from B to A. This is sort of what this application would look like the way it is written. The key thing to notice is that the multi-flow part is needed for the scan detection Bro script. The vulnerability detection stuff, which is mainly looking for poorly designed browsers, needs just loss-freeness; it doesn't need loss free and order preserving because it's only examining one direction. But the weird activity script needs both guarantees. Again, this re-copying is necessary for this kind of Bro script. So you would have to look at your application, and based on that application you can decide on what kinds of guarantees you want and what specific chunks of flow state you want those guarantees to apply to. Does that make sense?
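As a concrete illustration, the load balancing application just described might look roughly like this against a hypothetical northbound API. The OpenNFController interface, the Guarantee names and the filter strings are assumptions for this sketch, not the real OpenNF programming interface.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the Bro load-balancing application described above; all names are illustrative.
public class BroLoadBalancerSketch {

    enum Scope { PER_FLOW, MULTI_FLOW }
    enum Guarantee { NONE, LOSS_FREE, LOSS_FREE_ORDER_PRESERVING }

    interface OpenNFController {
        void copy(String srcNf, String dstNf, String filter, Scope scope);
        void move(String srcNf, String dstNf, String filter, Scope scope, Guarantee guarantee);
    }

    // Reallocate one prefix worth of flows from oldBro to newBro.
    static void reallocatePrefix(OpenNFController ctl, String oldBro, String newBro, String prefix) {
        // Scan detection needs the shared per-host connection counters at the new instance.
        ctl.copy(oldBro, newBro, prefix, Scope.MULTI_FLOW);

        // The weird-activity script looks across both directions of a connection, so the
        // per-flow state must move loss-free AND order-preserving; vulnerable-browser
        // detection alone would only need loss-freeness.
        ctl.move(oldBro, newBro, prefix, Scope.PER_FLOW, Guarantee.LOSS_FREE_ORDER_PRESERVING);
    }

    // Eventual consistency for the shared counters: re-copy them in both directions
    // every 60 seconds, as described in the talk.
    static ScheduledExecutorService keepCountersEventuallyConsistent(
            OpenNFController ctl, String broA, String broB) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            ctl.copy(broA, broB, "*", Scope.MULTI_FLOW);
            ctl.copy(broB, broA, "*", Scope.MULTI_FLOW);
        }, 60, 60, TimeUnit.SECONDS);
        return timer;   // caller can shut this down when the instances are merged
    }
}
```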
>> Aditya Akella: I had a more complex example of an application, but I won't go into that. I'll quickly go over results, and there we can talk about whether we can deliver on the promise that I made earlier. But are there any questions at this point? Is it kind of clear how the application would go? Okay.

This is the real thing. We have this controller that we wrote on top of Floodlight. It's a Java-based controller, a module in Floodlight. We provide a shared NF library, roughly 3000 lines of code, and we modified a bunch of different middle boxes out there: Bro, an intrusion detection system; an asset detection system, which looks for signatures of the various kinds of operating systems running in the network; a firewall; a NAT; and a caching proxy. There was roughly up to an 8 percent increase in lines of code for Bro, which is the most complex thing we changed; that translated to something like 600 to 700 lines of code we had to add to support OpenNF.

Microbenchmarks: what does this mean for the kinds of actions we need the NFs to support, and what does it mean for the higher-level applications? This basically shows the performance of get and put per flow across the three middle boxes that we modified. Here what we are doing is we have some number of flows whose state we want to get and then put, we vary that number, and this is the total time it took to do that operation. The big point is that get per flow takes a lot more time than put per flow; if you look at the thousand-flow scale in the case of Bro, for example, get can take up to 850 milliseconds. The big contribution to this cost is the serialization and deserialization of state in both cases, unpacking state, changing it to wire format and so on. Another thing to notice is that as the complexity of the middle box grows, the complexity of the state also grows. But the key point is that once you have a middle box you can benchmark these things, and you can kind of know how long it will take for a certain amount of state to be read or written.

What does this mean for the operations? What kind of guarantees can we provide that some operation will be complete in a certain amount of time? Here the setup is we have 500 flows in some middle box and we want to move all the flows with their per-flow state. The middle box is seeing about 1000 packets per second and it is 50 percent utilized. If we have no guarantees requested, the move operation finishes in about 250 milliseconds.
This is actually almost exactly the serialization plus deserialization cost. We can parallelize it. In this case what's happening is all the flow state is being read and then it is being written, but we can parallelize the reads and writes and that will improve the overall latency further. The point is that in both of these cases there are packets that are dropped as state is being moved, up to 200 packets. That's roughly the number of packets that get transmitted at 1000 packets per second over the 250 milliseconds that it takes to move the state. Suppose you do loss free with the same parallelization optimization. We see a slight increase in latency due to the handling of events. On a per-packet basis, that translates to an average latency increase of around 100 milliseconds and a maximum latency increase of 250 milliseconds. This is because buffered packets have to wait for the state to be transferred before they get released; this latency is exactly the total time that it took to put the state. There are about 230 packets raised as events. Again, you can calculate this because you know the load that the middle box is seeing and you know how long it will take to move the state. With a different optimization, we can bring the per-packet latency down even lower. Essentially, in this case we wait for the put of the entire state to return and then we start releasing events, but instead we can wait for individual chunks of state to return and start releasing the events corresponding to those chunks, so we reduce the amount of time packets are buffered at the controller. If you want a stronger guarantee you end up seeing higher latencies, and a lot of the higher latency is because of packets buffered at the destination instance waiting to be processed. But the key claim is that these latencies, the average latency seen by a packet and the total completion time, can be calculated making some assumptions about the network. It's a function of the load, the thousand packets per second; the amount of state that you move, which is a function of how many flows you have seen so far; and the processing speed of the middle box. Once you know these things, you can kind of predict that you will finish the safe move operation within a certain amount of time. That, we argue, is something that can be done quickly and forms the basis for SLAs.
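As a back-of-the-envelope illustration of that claim, here is how the numbers quoted above roughly hang together; the formula is a deliberate simplification for this transcript, not a measured model from the evaluation.

```java
// Rough estimate: given a benchmarked per-flow get/put cost and the offered load,
// predict how long a loss-free move takes and how many packets get buffered.
// The inputs below echo the setup in the talk (500 flows, ~1000 packets/sec,
// roughly 250 ms of total serialization/deserialization); the model is illustrative.
public class MoveCostEstimate {

    static void estimate(int flowsMoved, double getPutMillisPerFlow, double packetsPerSecond) {
        // Time to read, ship and write the per-flow state for all moved flows.
        double moveMillis = flowsMoved * getPutMillisPerFlow;

        // Packets arriving during the transfer are raised as events and buffered,
        // so roughly load x transfer-time packets wait for the put to finish.
        double bufferedPackets = packetsPerSecond * (moveMillis / 1000.0);

        System.out.printf("move completes in ~%.0f ms; ~%.0f packets buffered, "
                + "each delayed by at most ~%.0f ms%n", moveMillis, bufferedPackets, moveMillis);
    }

    public static void main(String[] args) {
        estimate(500, 0.5, 1000.0);   // prints roughly 250 ms and ~250 buffered packets
    }
}
```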
>> Aditya Akella: Yeah?

>>: Give us some [indiscernible] function realistic workloads [indiscernible] say at least two or three zeros in your [indiscernible]. What would your latency numbers look like?

>>: [indiscernible]

>> Aditya Akella: A single software instance? I doubt it.

>>: [indiscernible]

>> Aditya Akella: A single instance of a software VM?

>>: Yes.

>> Aditya Akella: A million packets per second?

>>: Yes. I wouldn't say like [indiscernible] so large [indiscernible] this is on the low end of states.

>> Aditya Akella: Sure. I didn't mean to poke fun at you, but that's probably possible.

>>: [indiscernible] accurate [indiscernible]

>> Aditya Akella: No. It's not a definite [indiscernible] I was just curious where that was coming from.

>>: [indiscernible]

>> Aditya Akella: What I'm wondering is why would anything change here?

>>: My question is what would the latency numbers be like? What does that function look like?

>> Aditya Akella: Okay. Good question. So if you had a million flows, so let's go back…

>>: Not a million flows, a million [indiscernible]

>> Aditya Akella: Sure. The latency is determined by how much you're transferring. That is a function of the number of flows that you're moving, and the complexity kind of grows with the number of flows that you are moving. You can kind of imagine that when you have a lot of -- it depends on how many flows there are. If it's a million packets per second across, I don't know, 1000 flows, the amount of time you take to do the move operation is purely a function of that. What would happen is when you are moving that state, the million packets per second determines how many events are raised, how many packets get sent to the controller. That is what the million packets per second would determine and, you know, we need a robust controller to handle that. It's up to the controller to be able to handle that kind of load. It doesn't change the time it takes to finish the move operation. If the middle box can really handle that many packets, really all we are talking about is all those packets coming to the controller and then the controller sends them to the middle box and they get processed there. If the controller can handle that kind of load, fine, but the total latency of the move operation itself is a function of the amount of state moved. It has nothing to do with the load.

>>: It seems to me that there would be some correlation…

>> Aditya Akella: But it depends on the number of flows and how the middle boxes are handling them.

>>: [indiscernible] if the number of flows becomes [indiscernible] is it [indiscernible]

>> Aditya Akella: Okay. That's kind of the point of the graph. I think it's kind of linear; if you double the number of flows, the size of these bars kind of doubles.

>>: The thought [indiscernible]

>> Aditya Akella: Sure. But we haven't done fine-grained benchmarking of how the size of the state grows. But I don't see why it would be anything but linear. The amount of per-flow state should grow with the number of flows. Okay.

Let me talk about the last thing, which kind of got brought up earlier, which is why not use VM replication. Here we did this elastic scale-up, scale-down stuff again. I'm running a little over; this is probably my last slide. Somewhere midway through we scale up to a new middle box and then at some point we scale back down. With OpenNF it takes a quarter of a second to scale up and a quarter of a second to finish the scale back down. With VM replication, when we did the scale-up we actually found a bunch of bizarre log entries, and our best hypothesis is that this is because of superfluous flow state that that particular VM instance did not need. And there is no clear way for it to support scale down. If we just did forwarding control alone, scale down is delayed by more than 1500 seconds. This is going back to the question that you had, John: how long would we have to wait? There was a particular flow that we saw in our case that lasted for up to half an hour and we had to wait for that flow to clear.

That's basically it. You know what OpenNF is. It enables fairly rich control over state and distributed processing, we can provide clean semantics and reasonable performance, and you can go to the project site, download the code and play with it. I don't know if I have time, but I wanted to reflect on the relationship with SDN if I have time. But if people don't, feel free to leave. Do I have a minute?

>> Ratul Mahajan: Sure you have a minute.

>> Aditya Akella: Okay.
So Ratul, in the e-mail that he sent out, said this was a talk that was sort of like SDN for middle boxes. It is and it is not. SDN provides control over forwarding state. We provide control over NF internal state, so in that sense it is similar. But in many senses it is different from SDN. In SDN, a controller can create state: it sort of computes some routing protocol, creates forwarding state and pushes that forwarding state down. In OpenNF we make a conscious choice not to do that. We are just handling state. We are moving state or we are copying state, but all of the state is created by the middle boxes. We haven't ripped that logic out and re-implemented it at the controller. That was a conscious choice, because we couldn't figure out a common way to do that across all of the middle boxes. The purist view of SDN is that network elements are dumb; the control plane is ripped out and implemented somewhere else. OpenNF is not so pure. There is still a lot of stuff going on inside the middle boxes. We don't really know what ripping out the control plane would mean for a middle box. In that sense it's sort of similar to a pragmatic view of SDN: there is still a control plane inside the boxes, and you have a logical controller outside that lets you deal with the state. The other thing with SDN research is that there was a lot of hype and buzz over SDN, and then people realized, shit, we have to deal with consistency of state updates, so that came as an afterthought. There has been phenomenal work reasoning about various consistency semantics. Kind of think about OpenNF as SDN with, okay, let's deal with the consistency problem from the ground up. So that's the relationship with SDN research. I'm done. If you have any more questions I'm happy to talk about them.

>> Ratul Mahajan: Let's thank the speaker. [applause].

>>: I wanted to quibble with this one. I think your distinction is actually less than that; in the end, even in the forwarding case, routers, if nothing else, are generating and updating state. They are [indiscernible] OpenFlow counters, or some controllers make decisions based on those counters. In that sense I think the only difference is, like, yes, you are not moving state or whatnot, but these switches are in the end generating some state based on which decisions get made, and things like that happen.

>> Aditya Akella: I think you are probably quibbling more with this argument than with this one. Yeah. It's not that the…

>>: The second one.

>> Aditya Akella: The point that I was trying to make is that the forwarding state that gets installed at a switch could be completely determined by the controller. The switch starts with something and the controller can say, throw this out and use this instead. We are not quite doing that. We are letting the logic inside the NF determine what that state should be. All we are deciding is where to locate that state. We are not actually computing that state for the NF.

>>: If you look at state [indiscernible] state [indiscernible] SDN is right. They are creating state for the switch. And so are you, in a sense. Like when you create [indiscernible] you are actively telling the switch, here is your state, and do things based on this state, and in that sense you are doing the same.
I think the key difference, I would have put it differently, is that in the traditional sense the SDN controller knows exactly what the switch is going to do based… What you're saying is that you don't know what this box does, and that's basically it. A lot of the reasoning comes from that: because you don't know what that state is, you don't know what processing is happening, so you have to be extra careful. With a forwarding switch you know that all it does is, if I give it state s, this is how its behavior will change. You don't know that here.

>> Aditya Akella: True. I would say the arguments are kind of related. You know what the switch is doing and you can use that as an input for future computations. Because we don't know what the NF is doing, we cannot reason about the NF's processing. We can only deal with management of its state.

>> Ratul Mahajan: Nothing else?

>> Aditya Akella: Okay. Cool. Thanks.

>> Ratul Mahajan: Thank you.