>> Nikhil Swamy: All right. So let’s get started. Thanks for coming. It’s my pleasure to introduce Mike Hicks, who is my advisor at the University of Maryland, and he’s visiting here for this week and then for a bit of next week as well for the faculty [inaudible]. So I’m sure you can corner him at some point and chat with him more about DSU and other things too. So he’s going to tell us about Kitsune, which I understand is the latest in a long string of dynamic software updating systems, and this one seems like it’s really easy to use and actually works on large [inaudible] programs.
>> Mike Hicks: Okay, thanks. Very glad to be here. So this is joint work with colleagues at the University of Maryland. So Jeff Foster is a professor there. Everybody else is a student. And Chris Hayden is underlined because this work is part of his thesis. He just graduated. So software updates are critical. Programs change. We add new features. We fix bugs. Some bugs are really important, like security bugs. And so we want to make and apply those patches as quickly as we can. On the other hand, software upgrades are disruptive because now you have to stop what you’re doing and upgrade your software and restart it again. And that would make the service unavailable, which is perhaps not what you want. I’m not sure very many people like these sorts of messages. So dynamic software updating is an approach that attempts to solve these problems by updating the programs while they run. And of course the advantages are then that you avoid interruptions and you get to preserve critical program state that you had gathered up while you were running. So this is going to be great for non-stop services that provide financial transaction processing or traffic control or something like that. But in general it’s going to be useful for programs with long-lived connections.
So for example login sessions for SSH or media streaming, or any sort of long-running program that gathers up in-memory state that you’d rather not lose, so an operating system, a caching server like memcached, an in-memory database, or something like that. So how does it work? So a dynamic software updating process is running along at the original version, v0, and some patch becomes available. We’ve gathered up some state at that point that we’d like to preserve and we have the new code for the new version of the program. And what we’re going to do is we’re going to dynamically update that running program and we’re going to transform that state as necessary to be compatible with the new program. And then we’ll carry on running with the updated process. And the goal here is to preserve all of those existing connections and that critical state that you had gathered up while taking advantage of the new program features and the bug fixes. So dynamic software updating has been around for a while. This is not a new idea by any means. I’ve been working on it for quite a while but it’s been in the field long before that. This describes the DMERS [phonetic] system that was a telephone switch system, an operating system that ran Bell telephone switches in the 70s and 80s. And it describes their dynamic updating process. And in the last 20 years or so there’s been a flurry of research on many different systems. Kind of since the mid-2000s a dozen or so systems have come out for a variety of programming languages and frameworks that provide some form of dynamic software updating. So for example, languages that you might have heard of have dynamic software updating features. For .NET and Java, both of their VMs give you a way to change methods on the fly, the code of the method bodies, without shutting the system down. Erlang and Smalltalk provide more pervasive support for dynamic changes. There are tools that aim to support applications.
They’re not within the VM but there are ways of compiling or modifying a running application to do updating. And most recently the company Ksplice provides a dynamic updating service for the Linux kernel for very small, sort of code-only, changes targeted at security patches. And this company was bought by Oracle in 2011. So there’s increasing interest in this. So what are the research questions with software updating? I think the first question that people immediately think of when I talk to them about this topic is how does it work? How do you actually take a program that’s running and change it on the fly? And there are lots of ways to do that. You could compile the program specially in advance to be prepared to be updated. You could modify it on the fly by rewriting the binary, inserting code into the running program. You can build a VM specially to do it. You could use process migration and checkpoint restart and other techniques to do it. So there are lots of ways to do it and there are lots of trade-offs amongst those different techniques. Another important question is why do you think your program is going to be correct after you update it? You have the old version. You have the new version. You have sessions that started at the old version. Now you’ve changed the code. The program does something different. Why does that session make sense when it continues between the two versions? So figuring out a way to say what you mean, having a way to verify that indeed what you mean is what you get, and testing that your update works properly are all important. And then finally all of these things are trade-offs. It’s a big design space where you’re trying to balance performance, flexibility, the pause time that it takes when you do your update, the applicability to many programs. So it’s a big research area and we have worked on all of these questions over the last ten years in my research group. We’ve built implementations for C and Java.
As Nik mentioned, this is the second C implementation that we’ve built that I’m going to talk about today. We’ve implemented dynamic updates for fifteen programs, dozens of updates per program for years’ worth of their release history. We’ve developed formal models, verification tools, testing tools for updates. And we’ve empirically validated a bunch of assumptions that various groups are making about those updates. So today I’m going to tell you about the synthesis of all of that experience. All of this ten years of becoming an expert, as they say, your ten thousand hours on a particular topic. That boils down to a better way to do dynamic updating. Basically the system I’m going to tell you about today is very easily arguably better than any system that has preceded it. And I’ll give you details in a minute. Okay, so what’s the main idea of this new system? Well, first of all, current DSU systems aim to provide their service transparently. The idea is that the programmer just writes the program as they would have normally written it, and then magically someone’s going to come along, some clever researcher, and tell them how to modify the program on the fly and things will be just fine. And the goal here is to reduce the lines of code you have to change and hopefully reduce effort. And basically I’ve come to believe that that’s a silly idea. That’s not going to work. And there are a couple of reasons for that. If we think about C, first of all, programmers are really aiming for low-level control when they’re writing a program in C. If you really didn’t care about your data representations or the time it takes you to perform a certain operation, or the way that you manage resources, memory, things like that, then you wouldn’t be programming in C. You’d be programming in some higher-level language where you could defer all of that stuff to a runtime or something else. And yet, many updating systems that aim to be transparent for C take that control away.
They compile those data representations, the structures, and the things like that in your program differently. They use different memory management strategies. They forbid certain programming idioms like casts between buffers and structures and things like that in order to implement transparently this dynamic updating approach. And all of these things cause programs to break. So you’d have to change your program. Or they reduce performance. And you don’t want to do either of those two things in C. The other thing is that even if you did do those things, what we have found through an empirical study where we used a systematic testing strategy is that it’s very difficult to know when during a program’s execution to perform an update so that the program behaves properly. And of the approaches that people have considered before, one of them, called activeness safety, which says you’re allowed to update your program as long as any code that you have changed is not actively running, works most of the time but doesn’t work all of the time. And if it doesn’t work all of the time and your program crashes, then you’ve sort of defeated the point of doing dynamic updating in the first place. If you’re willing for your program to crash you might as well be willing to shut it down and restart it so it works properly. A prior system that I worked on called Ginseng tried to find situations where that might happen by using a whole-program static analysis, which didn’t scale, was conservative, and so complained about programs that had no problem, and would be basically impossible for non-experts to use. So we felt like all of these things were lousy and they were all stemming from this idea that you could just magically do dynamic updates. So we thought, well, what if we instead decided to consider updating as a program feature, where we get the programmer more involved.
We say you’re going to have to write more code to support dynamic updating as a feature for your program, but you’re not going to have to give up those other things. You’re going to be able to represent your data however you want. You’re going to be able to more easily reason about what’s going to happen because we’re not going to use any automatic safety checks or anything like that anymore. You’d be able to read the code of your program to see when the updates and so on are going to take place. And by doing it this way we’re going to get better flexibility. We’ll be able to update more programs. We’re going to get better performance because the way you wrote your program before is the way it’s actually going to be compiled and implemented. So you have the performance that you want. And you get to maintain that low-level control. So the principle is that you pay for what you use. The name of this system is Kitsune. That’s the Japanese word for fox, and the fox in Japanese folklore is a shape shifter. Foxes are also very clever so it seemed like a nice word to use. Okay. So what are the results? We applied Kitsune to modify five open-source programs: memcached, redis, and icecast. Memcached is a caching server and often sits between a client server and a back-end database to speed up queries. Redis is an in-memory database that’s used by things like Craigslist and the Guardian. And icecast is a streaming media server. Tor is an anonymous router and vsftpd is a secure FTP daemon. In all these cases we looked at at least five updates to those programs through their release history. In the case of Tor and vsftpd we actually looked at two full years and four full years, respectively, of their release history, and we could perform all of those updates. We could create patches from each one of those versions. We could get the program running and then dynamically update it on the fly while it was going. No performance overhead effectively in all of these things.
It’s basically in the noise. The time it took to perform an update, during which your program was paused, was forty milliseconds, which is also roughly in the noise. And the amount of work that the programmer had to perform, which was largely one time, was less than 200 lines of code for each program. Then there’s the part where you transform the state, if you remember when I did the little update and the little circle went across. You might need to modify your state to work with the new code. We developed a new tool called Xfgen, for transformer generator, that takes programs written in a little DSL that you use to define transformations between state, and in these cases that was between 27 and 200 lines for the entire release history that we considered. So while I think you should not be too swayed by lines of code, sometimes one line of code can be very hard to come by, so that doesn’t directly correlate with effort, I think the fact that it’s reasonably small for larger programs is encouraging and all of the other benefits that we get are encouraging as well.
>>: How big are the programs themselves?
>> Mike Hicks: Let’s see. So Tor was the biggest one. That’s 75 thousand lines. Icecast is -- well there’s going to be a chart later on so I might get this wrong. Vsftpd is on the order of 20 thousand lines. I think icecast is maybe 20 or 30 thousand lines. And memcached is smaller, more like five or six thousand lines. All of these things though, I should say, are in production use. Memcached is widely used. Tor is widely used. Vsftpd is the only FTP server that Linux people use these days.
>>: So these people in your group [inaudible] or were you working with the open source people?
>> Mike Hicks: We modified these programs. Okay. So here’s how it works.
Kitsune takes your original program, and instead of compiling it to an a.out, an executable that you would just run from scratch, it compiles your program to a shared object file and loads it up with a little driver program that’s about 100 lines of code. So the driver program starts up and it loads your program into memory and starts to run it, and it will be running for a while and eventually it will be signaled that a new version of the program is available. At that point it’s gathered up a whole bunch of state in memory because it’s been processing connections and storing its database and so on. And so at that point it will call back into the driver, remembering sort of where it was when it got the signal. And it will load the new version of the program. It then starts the new version of the program and migrates and transforms the existing state to run with the new code. So this is your database and the representation has changed so it’s going to run this transformation. It will then free up the old resources and it will continue on with the new version, making its way back to the equivalent program point where it was when it received the update, and then carry on with new processing. So in terms of the build process, things are mostly as they were before. Instead of compiling with just your regular compiler flags you add two additional flags here. You have to compile to position-independent code because now you’re loading your program as a shared object file. You also have to run it through our little source-to-source translator, which basically just inserts a few boilerplate calls into your program but otherwise does not affect the way your program works. It doesn’t change fundamentally the way your data structures are compiled, for example.
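To make the driver idea concrete, here is a minimal sketch of the control flow just described. It is not Kitsune's driver: the versions are plain function pointers in one file, standing in for entry points that a real driver would obtain from shared objects (for example with `dlopen` and `dlsym`), and every name in it (`server_state`, `entry_fn`, `run_driver`, `v0_main`, `v1_main`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* In-memory state that should survive an update (hypothetical). */
typedef struct {
    int requests_served;
} server_state;

typedef enum { EXIT_NORMALLY, UPDATE_REQUESTED } run_result;

/* Entry point of one program version. A real driver would look this
 * up in a freshly loaded shared object; here it is a function pointer
 * so the sketch stays self-contained. */
typedef run_result (*entry_fn)(server_state *state, int updating);

/* Version 0: serves three requests, then notices an update signal. */
static run_result v0_main(server_state *s, int updating) {
    if (!updating) s->requests_served = 0;   /* fresh start only */
    for (int i = 0; i < 3; i++) s->requests_served++;
    return UPDATE_REQUESTED;
}

/* Version 1: continues from the preserved state, then exits. */
static run_result v1_main(server_state *s, int updating) {
    if (!updating) s->requests_served = 0;
    for (int i = 0; i < 2; i++) s->requests_served++;
    return EXIT_NORMALLY;
}

/* Driver loop: run a version until it asks for an update, then start
 * the next version with the old state still in memory.
 * Returns the number of updates that were applied. */
static int run_driver(const entry_fn *versions, size_t n, server_state *s) {
    int updating = 0;
    size_t i = 0;
    assert(versions != NULL && s != NULL);
    while (i < n) {
        run_result r = versions[i](s, updating);
        if (r == EXIT_NORMALLY) break;
        i++;            /* "load" the next version */
        updating = 1;   /* it starts because of an update */
    }
    return (int)i;
}
```

Calling `run_driver` with the array `{ v0_main, v1_main }` runs version 0, switches to version 1 when the update is requested, and leaves the request counter carried across the switch rather than reset.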
Once you compile all that stuff you link it with our runtime system that’s going to receive the signals and provide boilerplate and stuff for these transformations that you’ll ultimately write, and then you’ll link it together into a shared object file. So oftentimes when I was a junior researcher I didn’t think these things mattered, but now that I am a little bit more senior I think differently: the old systems for dynamic software updating would require these very pervasive changes to your build process. You would have to change the way you did makefiles or whatever to use different compilers and so on. And I think it’s really important that all we have to do is change a couple of flags and change the way the program is linked, but otherwise leave the program’s host build process alone.
>>: Does this require a single-threaded program?
>> Mike Hicks: No.
>>: Okay.
>> Mike Hicks: I’ll spend some time talking about that later. Okay. So what does a programmer have to do? This is a program feature. They have to do three things. First they have to identify where dynamic updates are allowed to take place inside the program. They have to identify so-called update points. I’ll show you what those look like. They have to ensure execution reaches the right event loop when the new version restarts. They have to work their way back to that equivalent program point. And they have to identify the state to be transformed. And of course they have to identify how to do it. So to illustrate these obligations we’re going to use a simple example. This is a single-threaded server that’s implementing an in-memory database. So we have some data here and we have a mapping that’s from [inaudible] basically. I’m going to index into this data mapping. The clients will present get and put requests or something like that.
So in the main code the server will start up, it will initialize the mapping, it will create a listen socket to accept connections, and then it will repeatedly accept connections. Each time it gets a connection it will go into the client loop and then it will receive client requests one at a time and then it will return back to the main program. And this isn’t a particularly realistic server, but you get the idea of the shape of the programs we’re going to update. It’s going to be similar to this. Okay. So now I’m going to show you the parts that you have to add to use Kitsune. So the first thing you have to do is you have to identify program points at which updates are permitted to take place. So the first one is going to be within the while loop of receiving client connections. And why is that? Well, because it’s going to be running around this loop repeatedly and we want to make sure that we have an opportunity between client connections to dynamically update the program if a new update is available. Another place that we’re going to insert an update point is within the loop of receiving the actual client requests. And again this is going to allow us between each request to look and have an opportunity, if an update is available, to go ahead and apply it. In general the way you would pick these points is you would find these long-running loops in your program, these event processing loops, and you would stick an update point in the middle of them. The next thing that you do is you decide what data needs to be migrated. By default all global variables are assumed to be in play for migration. So you don’t have to do anything in particular; that is, by putting no annotation, doing nothing, we will assume that the data is going to be migrated from the old version to the new version, and then you’re going to insert a call in the beginning of your main program so that when this program is restarted as a result of an update, right? I loaded in the new version and called into it.
This do auto-migrate call is going to begin the process of moving that data across. Once the auto-migration has happened, the data transformation has occurred, the program is going to continue on and now it’s the programmer’s job to get the program back to the place where it was before. So you’re going to insert calls like this that say if I was starting my program from scratch I’m going to do one thing: if I’m not updating I’ll go ahead and allocate the mapping. But if I am updating, I’ve started it because of an update, I’m going to skip over this because I don’t want to reallocate the mapping that I just spent all this time gathering and migrating across. It may be that you want to also map across local variables, in which case you need to indicate to our compiler that you care to register the local variables for this function, which you do with this little annotation. And then you’re allowed to use this function migrate local that says, well, if I’m not updating, then go ahead and execute the body of this code. But if I am, go ahead and migrate across, that is, initialize this l_fd with whatever the value was in the old system. Okay. So finally we now need to work our way back to where the update points were. And we remember we had two choices where an update could happen. The update could have happened in the old version at this point, or it could have happened at this point. When we restart the new version we want to go back to the equivalent point in the new program and so we’re going to insert this call. If I am updating, and I updated from the client point, then I’m going to migrate the client file descriptor as well, because that’s now a live file descriptor. And then I’m going to go ahead and call in to the client loop so I can go back to where I was before, taking requests inside the client.
If I was not updating from the client loop I’m going to go around this right to the regular while loop and I’m going to hit this Kitsune update point, which is the point where I would have updated before. Okay? So this little bit of code is going to serve to migrate one extra local variable if you had previously updated from the client point. And otherwise it’s going to skip over that and go to the original point there. Okay so that’s it. That’s what the programmer has to write. Now there’s a lot of red on this slide because there’s not a lot of black on it. But basically this red code is one-time fixed code that you write sort of per server and it largely stays unchanged. And it’s not really dependent on the size of the whole program. That is, you mostly add this stuff in the main function and in functions that are sort of shallow on the stack. And you don’t make any changes to any other code inside of your program. So as we’ll see later on, for the 75 thousand-line Tor or the 5 thousand-line memcached it’s essentially the same number of lines of code that are changed. Okay. So you asked a question about multithreading. Here’s the way that it works in that case. Instead of just having one thread of control, where you’re in the main loop and then you go back and then you migrate your way back again, you have to orchestrate the restarting of all of your threads. And the way that that works is that each thread that’s running in your program is going to have to synchronize at its own update point. Once all threads have reached these Kitsune update points the main thread will do this jump back to the beginning of main again and restart the orchestration process. Once it gets to its equivalent update point it sort of unleashes, it unbarriers, the other threads, who are then allowed to proceed forward. And inside of each of their loops and so on there’s this similar sort of logic that gets them back to where they were before.
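The "logic that gets them back to where they were before" follows the same shape as in the single-threaded example server, which can be sketched as below. This is an illustration under stated assumptions, not Kitsune's API: the `sketch_*` functions and the globals are hypothetical stand-ins for the runtime, the loops are bounded so the sketch terminates, and file descriptors are faked with integers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical runtime state: did this version start because of an
 * update, and at which update point did the old version stop? */
static int g_updating = 0;
static const char *g_update_from = "";
static const char *g_resumed_at = "";

static int g_mapping_allocated = 0;   /* counts fresh allocations */
static int g_clients_resumed = 0;     /* counts resumed client loops */

static void start_version(int updating, const char *from) {
    g_updating = updating;
    g_update_from = from;
    g_resumed_at = "";
}

static int sketch_is_updating(void) { return g_updating; }

static int sketch_is_updating_from(const char *point) {
    return g_updating && strcmp(g_update_from, point) == 0;
}

/* Update point. A real runtime would also check here whether a new
 * version is waiting; this sketch only records that the restarted
 * program has reached the equivalent point, which ends the update. */
static void sketch_update_point(const char *name) {
    if (g_updating) {
        g_resumed_at = name;
        g_updating = 0;
    }
}

static void client_loop(int client_fd, int max_requests) {
    (void)client_fd;   /* requests would be read from this descriptor */
    for (int i = 0; i < max_requests; i++)
        sketch_update_point("client");
}

static void server_main(int max_connections) {
    int client_fd;
    if (!sketch_is_updating())
        g_mapping_allocated++;   /* fresh start: allocate the mapping */
    if (sketch_is_updating_from("client")) {
        client_fd = 7;           /* pretend this was migrated across */
        g_clients_resumed++;
        client_loop(client_fd, 1);
    }
    for (int i = 0; i < max_connections; i++) {
        sketch_update_point("main");
        client_fd = i;           /* stands in for accepting a client */
        client_loop(client_fd, 1);
    }
}
```

A fresh start allocates the mapping and falls straight into the main loop. A start caused by an update from the client point skips the allocation and re-enters the client loop first, while an update from the main point goes directly to the main loop's update point.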
In order to do this we use LD_PRELOAD to hijack the threads library so we can keep track of what functions these threads were calling, so that we can restart, for example, the new version of those functions if the functions have changed. And we also have to do things like hijack blocking system calls like accept, if it’s waiting on a connection, or pthread_cond_wait or something like that, so that we can interrupt those things so the threads can more rapidly get to their update points. Okay. And once all threads reach their update points the update is considered finished and we free up all the resources and carry on. Okay. So the last thing is we have to write the code that’s going to migrate and transform the state. How do we do that? Well, let’s look at an example first. Here’s our example program where we have our data mapping. And let’s suppose in the new version of the program we change the definition of the data type so that it’s now a string rather than an int. Well, the existing data is a mapping from ints to ints, but now it needs to be a mapping from ints to strings, and so we’re going to have to change that existing state so that it corresponds to the new types before the new code is going to be able to use it properly, or it’s going to core dump or do something untoward by treating what are ints but should be char stars. So how would we do that? Well, conceptually what’s going to happen is when I call that data auto-migrate function I’m going to need to run through all of the elements of type data in my program. In this particular program they all live inside of this mapping, but in general when I change a type I have to find every occurrence within my entire program, in the stack, in the heap, in the static data segment and so on, and I need to change them, for example, to do this. I’ll malloc a pointer to the actual data to be some size and I will store there the string representation of whatever integer was there before.
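For the int-to-string change just described, the hand-written version of "find every element and convert it" might look like the following sketch. It assumes the simplest possible layout, a flat array indexed by key; the type names `old_data` and `new_data` and both function names are made up for illustration and are not taken from the talk's slides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int   old_data;   /* representation in the old version */
typedef char *new_data;   /* representation in the new version */

/* Free a mapping produced by transform_mapping. */
static void free_mapping(new_data *map, size_t n) {
    if (!map) return;
    for (size_t i = 0; i < n; i++) free(map[i]);
    free(map);
}

/* Walk every element of the old mapping and build the new one:
 * each int becomes a freshly allocated decimal string.
 * Returns NULL if an allocation fails. */
static new_data *transform_mapping(const old_data *old_map, size_t n) {
    new_data *new_map = calloc(n, sizeof *new_map);
    if (!new_map) return NULL;
    for (size_t i = 0; i < n; i++) {
        /* First ask snprintf how many characters are needed. */
        int len = snprintf(NULL, 0, "%d", old_map[i]);
        assert(len > 0);
        new_map[i] = malloc((size_t)len + 1);
        if (!new_map[i]) {
            free_mapping(new_map, n);
            return NULL;
        }
        snprintf(new_map[i], (size_t)len + 1, "%d", old_map[i]);
    }
    return new_map;
}
```

The loop is the boring "find all instances" part, and the two `snprintf` lines are the interesting part that says how one value is converted.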
So conceptually, writing a bunch of code like this is the find-all-instances-of-data part. And this is the interesting part, where you are telling the system here’s how I want you to transform those things. So in order that the programmer doesn’t have to write the find-all-elements-of-this-type part, we wrote a tool called Xfgen where you write the interesting part and we will generate for you code that will go and find all of this stuff that needs to be changed. So this is an example of the code that you would write. You would say I’m transforming the old representation of data to the new representation and this is how I want to do it. And I’m going to show you some examples in a minute.
>>: To do that discovery automatically, isn’t that kind of a self-version of the requirement that should be a little bit well behaved that [inaudible]?
>> Mike Hicks: Yes. Let me tell you how I do it and let’s come back to that. Okay so here’s how Xfgen works inside of this process. So here’s the build process that we saw before and here’s the new bit that I haven’t shown you yet. So when I have a new version of my program I’m going to write one of these .xf files that defines the data transformations for this dynamic update and I’m going to run it through my tool Xfgen, which is going to generate a C file that does all of that find-all-of-the-elements work and performs your user-defined transformation on them. And then that’s going to get linked in with the new version of the program so that when it calls migrate local and do auto-migrate it will invoke the right code to perform that. Xfgen attempts to be helpful and identifies all types and values that have changed between the two versions of the program. These are stored in these .ts files that Kitsune produces, and if you fail to write an xf definition for some code that’s changed it’ll flag you and say hey a new field was added to this [inaudible] but you haven’t told me what to do with it.
Please write a specification to do that. It also uses this stuff to type check the code that you write and make sure that you haven’t introduced silly type errors inside the program. Okay. So what do these specs look like? So the first one is an initialization specification. So this says I have a new global variable or a new local variable and I need to initialize it not as if the program was starting from scratch but as if it’s been running for a while. How do I do that? Well, I name it here and I put an action in here. What’s the action? That’s just C-like code. It’s basically C code with a few meta-variables in it that will substitute various things. And I’ll show you what those look like in a minute. This is the one that we quickly saw an example of. This is the target-to-target transformation. So in this case the target can be a type. So a type definition changed and it uniformly tells you here’s how you change all things of that type. It could also be that something was renamed. For example, you might say hey this type that was data in the old program is now named data two and I’m just telling you that renaming has taken place. Or it might be that a global variable has changed and I would say here’s what the old global variable was, here’s the new one, and here’s how you initialize the new given the old. Okay. And then Xfgen is going to generate C code from these specs. So let’s look at an example. Let’s suppose that in your old program you had a counter of the number of operations that you had performed, and let’s say that the operations for our sample server were get or put operations to stick them in our in-memory database. And instead of just counting the operations all together, I now want to count them individually in the new version. So I’ll count the number of sets and the number of gets. Okay. So that means that I’m in the middle of running my program.
The programmer is going to have to decide how do I initialize these two things given the old value? The old value does not actually provide enough information to accurately make that decision because who knows how many gets and sets have been performed up to this point? But you have to initialize them in some way. So the programmer is going to have to define what that is and they might write some code like this. They’re going to say well these are two new variables, so I’ll initialize them. And I’ll refer to this meta-variable out for storing the value of those things. And I’ll compute the floor or maybe the ceiling of those two things. I’ll just assume maybe half were gets and half were sets. From that we’re going to generate this code, which is going to be called to initialize get counts. So the do auto-migrate thing will call this and this function. And this will look up in a symbol table a pointer to where the old counter was stored, the pointer to this. And it will then initialize the new one and then will look up the pointer to the new one and then it will initialize the new one based on the old one. And it will do the same thing down here. So the nice thing is you write this and then we write this icky stuff for you. So what are these little meta-variables that appear in the C program? Well, out and in are as you would expect. Out is a variable that represents the target, the new version. And in represents the old version. You’re dealing with two different names for types. If you have the data arrow data transformation that I mentioned before, well, one data is the old data representation. One is the new. So you can distinguish between those when you need to by wrapping with this new type old type. Same thing goes if you were initializing a variable where you want to refer to the new one versus the old one but they happen to have the same name. And then finally you might want to recursively look up the type of some other transformation function. 
For example, if you were transforming an array where that array had things in it that themselves were changed, you could recursively look up what the transformation function is for the elements of the array and then [inaudible] apply it to each thing in the array. And in fact that’s an example that I’m going to show you now. Let’s suppose, for example, we had an array in our old program and we want to turn it into a linked list instead. So instead of the key being the index into the array and the value being whatever the contents are at that location in the array, I now put the key and value explicitly in linked list elements and then you’d transform this to this. We are not going to generate code to automatically do that for you. You are going to have to write code that will iterate through the array and create these new linked list elements. But what you’ll be able to do is you’ll be able to refer to the old configuration size to know how big this was. You’ll be able to refer to the new type for the list versus the old type and so on to fill in these values. By the way, in all of those programs that I showed you updates for, which were 35 updates to 5 programs, we have never had to write something this complicated. So this is another interesting thing about research. There’s the worst case. There’s what about this? What about this? Can you handle this and this and this? And then there’s what actually happens in practice. And, if you think about it, people don’t make gigantic, pervasive, crazy changes to their program very often. They make small, reliable, incremental changes. And so we see over and over again that it’s very rare that someone massively restructures their program in a way that would necessitate writing a complicated transformation like this.
>>: But if they did make a massive change then [inaudible] shutting down service for that?
>> Mike Hicks: Yes, that’s right. Exactly. I mean eventually, for example, the hardware is going to have to be swapped out.
Maybe we could coordinate our system with VM migration or something like that so you could even get around the hardware-swapping problem. But yes, the goal is not necessarily to run indefinitely but to create much greater spaces between shut down and restart. Okay. So here’s the last example. We update data this time from int to long. And let’s say we have the linked list that I showed you a moment ago and we add a new field to it, and then we happen to rename the next pointer for some odd reason. In this case we can write this transformation. What this says is, well, I’m changing data to data, but interestingly because the representation of those things has changed, I’m going to have to assign the in to the out and I’ll cast it to long. So of course the cast is going to extend it from 32 to 64 bits. In this case I’ve added a new field. So it shows here that the target does not have to be just a variable or a complete type but you can actually indicate a field, which says hey, keep every other field the same and just initialize the new field this way. And then finally this is illustrating a renaming. Do nothing but take note of the fact that this p next is a renaming of the old thing. And this is to keep the tool happy when the tool notices syntactically that there’s some sort of change, and says hey, you need to tell me what to do with this p next field. You say you don’t have to do anything with it, you just have to realize that it’s been assigned from the list.next field. Okay. So that’s the specification the programmer writes. Now let’s talk about the code that our system generates. We generate this code based on data types. So we look at the types, the structs and so on, and the global variables of your program. And we generate a traversal that’s sort of like a garbage collection traversal that starts at the roots and follows pointers based on the type information that it has.
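The three changes in that last example (widening data, adding a field, renaming the next pointer) can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code Kitsune generates: the struct names, the added field `hits`, and the function `migrate_list` are hypothetical, and the 32-to-64-bit widening only holds on platforms where int is 32 bits and long is 64 bits. It assumes every old node came from malloc, matching the free-then-malloc behaviour described later in the talk.

```c
#include <assert.h>
#include <stdlib.h>

/* v0 layout: int payload, link named next. */
struct old_list { int data; struct old_list *next; };

/* v1 layout: payload widened to long, one field added (hypothetical
 * name: hits), and the link renamed to p_next. */
struct new_list { long data; int hits; struct new_list *p_next; };

/* Walk the old list, build the new one, and free each old node. */
struct new_list *migrate_list(struct old_list *in)
{
    struct new_list *head = NULL;
    struct new_list **tail = &head;

    while (in != NULL) {
        struct new_list *out = malloc(sizeof *out);
        if (out == NULL)
            abort();
        out->data = (long)in->data;  /* data -> data: widening cast */
        out->hits = 0;               /* new field: explicit initial value */
        out->p_next = NULL;          /* rename: filled in via *tail below */
        *tail = out;
        tail = &out->p_next;

        struct old_list *dead = in;
        in = in->next;               /* old next becomes new p_next order */
        free(dead);
    }
    return head;
}
```

The rename needs no data conversion at all; the only work is making sure the new link field ends up pointing at the migrated successor rather than the stale old node.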
Now, the problem is C does not provide you enough type information to do the right thing in all cases. For example, here you see a pointer to a type T. You don’t know whether that’s a pointer to one T, or to an arbitrary number of T’s, like it’s an array, etc., etc. You don’t know if it’s a zero-terminated string and so on. There’s type information that’s missing. So we require the programmer to add annotations to the type definitions if the programmer wants us to generate this traversal code automatically. So we borrow annotations from a project called Deputy that came out of Berkeley for proving type soundness of C programs, and we find that we basically need the same sort of annotations. So here are some examples of those annotations. If you have a pointer to an array, the annotation has to say what the size of the array is. It could be a constant, it could be another global variable, it could be a field inside of a struct. Opaque tells Kitsune that this is a pointer that does not need to be traversed. The data that was there before has not been changed at all by the update, so you’re just going to copy the pointer from the old version to the new version and no traversal is needed. And then KS gen is a way of specifying generics. If you have a container, an array for example, you might want to indicate what the elements of the array are so that Xfgen can automatically do this recursive call for you to transform each of the elements of the array. So as an example the annotations look like this. It’s like the C++-style template annotation, where you indicate that this is the quantified variable and then this is the instantiation. And we basically have annotations that correspond to both of those things. Okay. So given such a definition, if you had a list that you said has data elements in it, and you then defined a data-to-data transformation, it would automatically traverse the list for you and transform each of the data objects.
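To make the role of those annotations concrete, here is a hedged sketch of the kind of traversal they make possible, written by hand in plain C. None of the names are Kitsune's: `old_table`, `new_table`, `elem_xform`, `widen`, and `migrate_table` are hypothetical. The comments mark where each piece of information would, in the real system, come from an annotation rather than from the C types: the array length, the element transformer, and the opaque pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Transformer for a single element (old representation to new). */
typedef long (*elem_xform)(int old_elem);

/* C alone does not say that items points to len elements, or that
 * cookie needs no traversal; annotations would carry both facts. */
struct old_table { size_t len; int *items; void *cookie; };
struct new_table { size_t len; long *items; void *cookie; };

long widen(int old_elem) { return (long)old_elem; }

struct new_table migrate_table(const struct old_table *in, elem_xform f)
{
    struct new_table out;
    out.len = in->len;
    out.items = in->len ? malloc(in->len * sizeof *out.items) : NULL;
    if (in->len && out.items == NULL)
        abort();
    for (size_t i = 0; i < in->len; i++)
        out.items[i] = f(in->items[i]);  /* per-element recursive call */
    out.cookie = in->cookie;             /* opaque: copy pointer as-is */
    return out;
}
```

Passing the element transformer as a parameter plays the part of the generic instantiation: the container traversal is written once and reused for whatever element transformation applies.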
Okay, so now let’s go to your question about a stealth requirement that the programmer has to change their program. So, yes and no. Yes, it’s a stealth requirement in that if you want Xfgen to do the work for you, you’re going to have to make sure that Xfgen is going to do the right thing. You have to understand enough about how it works so that you don’t think that it’s just going to magically get everything right. In the worst case what that will cause you to do is to write some of the traversal code yourself that you might have wished that it could write for you. In our experience that almost never happens. Although our experience does not involve really, really big programs, we’re in the process actually of getting Kitsune to work with Snort, which is about 250 thousand lines. It’s an intrusion detection, packet inspection system. And Snort does some wacky things that are causing us to have to do some stuff ourselves. And I’ll talk a little bit about that. Okay. So there are the benchmark programs. Let me just show you a quick demo. Okay. So I’m going to do a dynamic update to icecast, which is a streaming media server. Okay. So I just started icecast. This is running on a machine at Maryland; I’m remotely connected to it. And the STIC script starts the Kitsune driver, which then loaded in the SO file that implements icecast. And that’s what icecast says when it starts up. The next thing I’m going to do is start a stream. Actually, let me do this first. So this is the icecast status page. So you can see that icecast is running version 2.3.0 rc2, and there’s the connection that I’m making to that guy. And it doesn’t have any streams yet. There are no live feeds that you can connect to. Now, I’m going to start this stream so that when I reload this page we can see there’s now a streaming subscription that you can get. You can connect to that stream and start listening to music.
So I have that on this tab. There it is. Okay. And now finally I’m going to, um, let’s see here -- So now I’m going to do the update. This is a little script that sends a signal to the running program. The program is the PID of this driver. And it’s going to say I want you to update it to this new SO file, which is version rc3 instead of rc2, which is the one that was there before. So I do the dynamic update. The music is still playing. And when I reload the admin page you can see that the version has changed to rc3. >>: As exciting as a programmer [inaudible]. [laughter] >> Mike Hicks: I knew you guys would appreciate it. [laughter] It’s free. It’s one of those you can download and it’s a complete free open common license and so on. Okay. So I’ve already told you about these programs. There they all are. What do they look like? So here’s the number of versions that we considered for each one. These are both multithreaded programs whereas these are single-threaded programs. These are the releases and then that’s a rough size of the program. So they’re even actually smaller than I thought, other than Tor. This is a description of some of the changes in the programs. So Vsftpd aims to be very secure. That’s what the name is: the very secure FTP daemon. And as an example of the sorts of changes that you would want to apply right away: security patches. Hey, there’s a problem with TLS. There’s a way that we could have denial-of-service attacks. There’s a bunch of SSL improvements that eliminate timing channels or whatever it is. I might want to apply these updates right away, rather than just waiting till tomorrow or the next day or the next day to shut my system down and restart it. Tor is an anonymous routing server. Again, very concerned about security. It’s got encryption and so on in it. And so these are a bunch of changes for the releases that we considered that, if you used our system, you would be able to get on the fly. >>: So can you unload the old code?
>> Mike Hicks: The old code does get unloaded at the end of the update process. Kitsune does that. Okay. So now let’s look at the changes that are required. Here are the programs again. And this is a summary of all the changes that we have to make. This first column says how many update points we had to write. So these are the places where you write Kitsune update inside the program. And, as I mentioned, they tend to go in these long-running loops. For most of them there’s a handful of these changes. For memcached there’s basically one update point per thread type that it uses. Icecast has two update points per thread type. There are six different threads. And actually a new thread is added in the very last release. This plus here indicates the number of changes that were made in versions other than the first version, whereas the thing that’s to the left of the plus indicates the changes that we made to the very first version just to get updating to work in the first place. So you can see that most changes that you end up making are to the very first version of the program, and then occasionally you make changes to subsequent versions, for example in this case because icecast added a new thread. Okay, so those are the number of update points. These are the number of lines of code we had to add for control migration. So that’s the if this is updating then do this. If this, do that. And so on. So that’s on the order of a couple of dozen lines of code in the worst case. For data migration, these are things like the annotation note locals, various things like that where you make small changes to your program to indicate the data that’s being migrated. And then these are the E_* annotations that we had to make to structs and so on to provide type information to Xfgen so that it can generate the traversal code. So that’s what that looks like there. And then the rest of this is just other changes that we had to make in the program.
Right, so just as an example, take the 66 lines of changes that we had to make to memcached. The reason for that is there was state that was stored inside of a library that was not accessible from the program itself. It was only accessible from the library, and yet that state needed to be transformed. So we had to change the program a little bit to keep track of the state that it then registered with the library so that it could then come along in the subsequent version and dynamically update it. >>: So you guys didn’t write memcached, right? So it must have been a bit painful to figure out that that was an extra bit of state stored away somewhere. >> Mike Hicks: Yeah, that was painful. So usually the way that -- this is a good point. So the way that we figure out whether we got this right is we use testing. And we usually test updating a version of the program to itself. So you create a version, you do a dynamic update to it that just loads the same SO file as before, and you have sort of the null Xfgen update. And that tends to work out these problems that you’re pointing out. So in that case, when we did the v0-to-v0 update for memcached, what was being stored away were function pointers inside of this libevent library. And so the moment that we did that and then unloaded the old SO, the program crashed immediately because it tried to call a function that doesn’t actually exist anymore. Or in the case that it was pointing to data in the heap or something like that, again, eventually it’s going to crash because that’s data that’s stale that you didn’t have access to. So that does a very good job of working out most of the kinks of just getting the v0-to-v0 update. And then when the new version of your program comes out, of course you test it locally before you would do it in a live system and you’d do the same thing. Okay. So then this part is describing the Xfgen changes.
So these are the total number of specifications we had to write. So this is the v-to-v specification. These are ones where global variable X got changed in the new program, so we have to write some transformation for it. And these are ones where some type definition T changed to some new definition in the new program. So in some sense these are just characterizations of how much the programs’ data representations change over time. And it’s not surprising, I suppose, well, maybe it is a little surprising. Icecast had the most. It changed quite a lot over the small streak of releases that we considered. And because it changed a lot we ended up having to write correspondingly more total code to update that. So these are the total number of specifications that we wrote, and then this is the size of the total code, the action code, for each of the specifications we had to write. But nevertheless, right, a couple hundred lines in the worst case for a streak of a year’s worth of releases is not so bad. Yeah? >>: So the function pointer thing is interesting to me. So it seems that because you’re in C and you don’t have closures, you can write these [inaudible] from function pointers in C itself. So it seems to be a case in which C actually helps you, because if you were in ML, let’s say, where you actually had closures, you wouldn’t be able to just unload code and get rid of the old stuff because you couldn’t break the closure to transform it. >> Mike Hicks: Yeah, that’s a good point. >>: So you’ve got the v0-to-v0 testing and I’m having a hard time getting intuition for -- it seems the worry is that you had to do something in v0 to be prepared for some future transition that you might not have done. Like have you thought about randomized semantics-preserving transformation testing, or -- >> Mike Hicks: So I guess the key is to think about what are the things that could possibly limit you about what you could do in the future?
So one limitation is that the control migration code that you stuck in will somehow be insufficient. But that’s actually not a limitation because the control migration code is in the new version of the program that you’re going to load in. So whatever you need to do once the new version becomes available you can do inside the new version and you can test it. So that’s not so much of a problem. One thing that’s in the old code that you can’t change is the location and placement of update points. If you didn’t put enough update points in, so that it took forever to reach an update point, that would be a problem. And probably you could do an analysis, for example, to make sure these things are reached often enough. You could do it by testing. I think that’s something that you could do v0-to-v0. You should do v0-to-v0. In terms of data, the thing that could happen is you could fail to recall that you need to annotate which local variables, right? Remember I had to write this note locals thing. If you forgot to do that there might be some local variable that you can’t carry over because you failed to note that you needed to remember it. But that’s also an easy fix. You can very easily do a control flow graph analysis that figures out what are all of the stack frames that could lead up to any update point in the program, and just annotate all of those functions with note locals. And then I’m not limited. And then the same thing goes with global variables. By default all the global variables are available. So the only drawback to doing all of that stuff is you might do more work and hurt performance just a tiny bit by remembering stuff you didn’t need to. But it’s usually in the noise so we just do it. Okay. So here’s the performance overhead slide. Basically for each of these programs we came up with workloads. We built the system the way that the designers wanted you to build it.
Just run the make file, put in all the optimizations and everything else and run it on this workload. And then we did exactly the same thing with Kitsune, and we see that the performance overhead is in the noise. It’s at worst around 2% slower, but sometimes it’s faster. And actually I really enjoy citing Todd Mytkowicz’s paper, which says if your performance difference is less than 8% it doesn’t count. On modern architectures it could be link order that causes a difference that’s that much. So you know this means nothing. It’s basically in the noise. And why is that? Well, this makes sense. All of the performance overhead that we introduce, the registration of these variables, the is it updating or is it not updating checks, all of that stuff is on paths that do not intersect with paths of the regular program’s execution, right? That just goes along unfettered the way that it was before, other than every once in a while reaching one of these update points. And checking a flag, is an update available, is an update available, on every iteration of this loop is in the noise for a program that’s doing IO. Now, Ginseng, that’s a prior system that we developed for C that attempts to be transparent. And it does crazy compilation and adds extra sort of slop space to structs that allows you to modify them in place. It sticks in levels of indirection. It does this analysis and all of these other things. Its performance overhead is more substantial. For example, in memcached it’s up to 20%. Another system, UpStare, that also uses a special compiler, introduces 16% overhead for Vsftpd. So I like to quote what Úlfar Erlingsson told me about when they built a tool for CFI: if you have more than 15% overhead no one will ever use it. So our goal was 0% overhead, and we just viewed, you know, this as unacceptable. And of course that leaves out all the other problems, that certain programming idioms are not supported and so on.
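The reason a flag check per loop iteration costs so little can be seen in a small sketch. This is plain C, not the Kitsune API: `update_requested`, `request_update`, `at_update_point`, and `serve` are hypothetical names, and the handler is written so it could be installed for whatever signal the update script sends, which is an assumption about the setup rather than something the talk spells out.

```c
#include <assert.h>
#include <signal.h>

/* Set asynchronously (for example from a signal handler) when an
 * update has been requested. */
static volatile sig_atomic_t update_requested = 0;

/* Signal-handler-shaped function: only records the request. */
void request_update(int signo)
{
    (void)signo;
    update_requested = 1;
}

/* Hypothetical update point: a single flag test. */
int at_update_point(void)
{
    return update_requested != 0;
}

/* Long-running loop. Returns how many events were handled before
 * either the work ran out or an update became pending. */
int serve(int max_events)
{
    int handled = 0;
    while (handled < max_events) {
        if (at_update_point())
            break;      /* a real system would start the update here */
        handled++;      /* stand-in for one unit of real work */
    }
    return handled;
}
```

On the normal path the only added work is one load and one branch per iteration, which is negligible next to the IO a server does in the same loop; everything expensive sits behind the branch that is taken only when an update is pending.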
This is the time to actually perform the updates. So when I sent that do update I sent a signal, I hit the update point. And then it takes a while to carry the state over and to migrate control flow back. And during all that time your program is paused, not doing anything. So how long does that take? So usually it takes on the order of 40 milliseconds or less. Here’s the one exception. The reason that icecast is slow is that it has a bunch of sleep(1) calls all over the place in the program. And so surprise, surprise, it takes about an extra second to get past that. If you drop the sleep(1) calls then it drops down to 180 milliseconds. And then if you also make a small change to the way you do blocking IO in icecast, on the order of six lines of code, you can knock it down to about 130 milliseconds. I’m going to skip that. This is update time as a function of state size. So the X-axis is the number of key-value pairs in both Redis and memcached, and this is the time it takes to perform the update, right? So if I’m going to traverse all of these things, the more data I have the longer it’s going to take. Now, it turns out for memcached it doesn’t make any difference how much data you have because the data representation does not change and it just copies one pointer, which is to the whole big array, from the old version to the new version. So that doesn’t hurt you. Whereas it turns out that Redis has a linked list structure where each element in its linked list points to a statically allocated address. And the problem is we have to run through the whole list and redirect all the pointers from the old address space’s address to the new address space’s address, which takes all kinds of time, upwards of 150 milliseconds. So the way we solve that problem is we fix the design.
If you don’t point to a statically allocated address but you use an enum, which is what a sane person would have done, it’s really unclear why they did it this way, then you’re back to it costing you no extra time. Okay. So I said at the beginning there’s been lots of stuff that’s led up to this work. Our own system Ginseng I already mentioned, but there are a bunch of other systems too. Ksplice is the one for the Linux kernel that I talked about. I’ve already talked about UpStare. DynAMOS is a system for updating the Linux kernel too. OPUS is a binary rewriting strategy for updating running C programs. POLUS is the same thing. So many of these systems have been proposed in the last ten years. All of them have the problems that I’ve already described. They all impose too much performance overhead or they limit what you can do in the attempt to make things transparent. And it’s simply just not necessary. >>: Is there any requirement that you could [inaudible] that you could only update usable programs in [inaudible] operating system [inaudible]? >> Mike Hicks: That would be [inaudible]. I mean nothing in principle, that’s an easy thing to say. But it’s definitely not supported right now. I mean the tools rely on using shared objects and using the GCC compiler to do shared objects, and it would just be a whole bunch of different mechanisms in the OS. Okay. So we don’t have support for custom allocators yet. This is a big problem with big programs. The reason is that when Xfgen generates a bunch of code and it finds something in the heap that’s old, it frees it and it calls malloc to make the new version. But if you didn’t use malloc to make the old version, then calling free on it is going to crash your program, and calling malloc on the new version when you should have called some custom allocator is going to cause your program to crash later. So you need a way of doing custom allocation. We’re working on that. Another problem is this pause at update time.
You have to run through all of your state, and as we saw, the more state you have the more time that might take. We have an idea for doing this lazily by using page-mapping tricks. Basically we can make the new version not copy the stuff over right away but page protect the global variables that would have new stuff in them and then incrementally bring stuff over as we page fault. So that’s in the works. And then we’re also working on getting this to go for Java. In Java everything will be much easier but the same strategy will apply. Why? Because Java is type safe and doesn’t have these same custom allocation problems and so on. A lot of the problems we have to deal with now, or that the programmer has to think about, they would not have to think about in that setting. >>: [inaudible] Java transformations are you turning safe Java code into more safe Java code? You’re only inserting things that are safe? Or do you have to do things that are potentially unsafe? >> Mike Hicks: I don’t -- Are you thinking about like if you had C code that’s linked with your Java code or something like that? >>: If you have pure Java code. >> Mike Hicks: Well, so it’s not implemented yet. So you might be identifying something that’s going to be harder than I think. But if it’s just pure Java code you should never have to do anything unsafe. I’ve actually already implemented a Java system that’s a part of a VM that implements safe Java transformations. You have a new version of the class that has extra fields and so on and all of that is fine. This would be sort of a version where you would not have to change the VM and it would have the similar sort of benefits that Kitsune provides. >>: Would you have to violate access from [inaudible] and stuff, right? To reach that state -- >> Mike Hicks: That is true. That is true. So what we would end up doing is, yeah. That’s a good point. So you could compile your program to sort of hide those for the sake of being able to support this.
Or you can use reflection, which allows -- >>: Violates? >> Mike Hicks: Yes, you can get around access modifiers that way. Okay? So there you go. Kitsune DSU, dynamic software updating as a program feature. It makes the semantics apparent to the programmer so you have to write some code to get it to work. But it’s not that much code, and it’s exactly the code that’s important for understanding the semantics of your dynamic update, so that you can read the code and see what it does. And it does not impinge on your ability to use the programming idioms, at least in our experience so far, that you would like to use when programming in C. This is the most substantial demonstration of effectiveness of any DSU system to date: 35 updates to five applications, 2 of which are multithreaded, with better performance and better flexibility than any prior system. So we’re actually considering a start-up venture for this stuff where, if we can get Snort to work, we’re going to start releasing our own version of Snort that people can use so that they can have a DSU-enabled Snort. And then hopefully if we manage to generate a user community we’ll see where things go from there. And that’s it. [applause] >>: You mentioned that that’s where you enforce a fake update. >> Mike Hicks: Ah. So you’re worried that some bad guy can come and update your program. So I would say that it’s a similar problem to the one that the offline patching people already have to deal with. When you get a Windows update how do you know it was from Microsoft? How do you make sure that some other person can’t inject some patch into your system? So whatever techniques people use there, I would argue you could use the same techniques to make sure that bad code does not end up getting injected in your running program. >>: So you had the one example where you kind of had to, there wasn’t enough information in the preexisting state to properly populate the new state and you had to make a guess. >> Mike Hicks: Yes.
>>: And it was a particularly simple example of the sort of guess you had to make. I gather in 35 updates you’ve seen almost nothing like that. >> Mike Hicks: So not in those updates. No, actually that’s not true. So there’s one Vsftpd update that’s like this. And here’s the example. At some point Vsftpd decided to implement load balancing so that you can only have n connections from a particular IP address. And the problem is when you perform the dynamic update you have to initialize a couple of hash tables that implement the load balancing. So they have a map from IP address to PID of the child process. And they also have a map of IP address to the number of connections from that address. Or maybe it’s a map from the PID to the number of connections or something like that, so that you don’t have more than n connections. And the problem is when the child process dies, it informs the parent process hey, I’m dead. And the parent then goes and looks up in the hash table to see what PID was that and what address did it have so I can knock the count down? Now, the problem is when I perform the dynamic update and I’m in the parent, I already have a pile of child processes implementing connections, but I don’t know what their IP addresses are. So I can’t populate the table the right way. And if I just left the Vsftpd code the way it was, when the child dies and a signal gets sent, the parent goes to look up the entry in the table, which it’s sure will be there, and it isn’t. And then it core dumps. So we actually had to change the source program in that case to stick in a test: if this doesn’t equal null, then go ahead and make this change. Otherwise don’t. So in that case we could not implement the transformation the way that we wanted. We didn’t have the information. So we had to change the program to accommodate that lack of information so that it wouldn’t crash based on how we did actually initialize the state.
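The shape of that Vsftpd fix can be sketched in a few lines of C. This is not Vsftpd's code: the fixed-size table, the names `track_child`, `lookup_pid`, `count_for_ip`, and `on_child_exit`, and the single-table layout are all simplifying assumptions. The point is only the null test in the exit handler, which tolerates children that were started before the update and so were never recorded.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 8

/* One tracked connection: which child serves which client address. */
struct conn_entry { int pid; unsigned ip; int used; };

/* Starts empty after an update: pre-update children are unknown. */
static struct conn_entry by_pid[MAX_TRACKED];

struct conn_entry *lookup_pid(int pid)
{
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (by_pid[i].used && by_pid[i].pid == pid)
            return &by_pid[i];
    return NULL;
}

int track_child(int pid, unsigned ip)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (!by_pid[i].used) {
            by_pid[i].pid = pid;
            by_pid[i].ip = ip;
            by_pid[i].used = 1;
            return 0;
        }
    }
    return -1;  /* table full */
}

int count_for_ip(unsigned ip)
{
    int n = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (by_pid[i].used && by_pid[i].ip == ip)
            n++;
    return n;
}

/* Called when a child exits. The null test is the added guard: without
 * it, an untracked pre-update child would be dereferenced and crash. */
void on_child_exit(int pid)
{
    struct conn_entry *e = lookup_pid(pid);
    if (e != NULL)
        e->used = 0;
}
```

The cost of this approach is that connections opened before the update are not counted against the per-address limit; they simply age out as those children exit.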
But that’s the only time in probably sixty total updates I’ve ever implemented that I ran into that problem. >>: So this is in the context, you’re explaining this in the context of server stuff. You mentioned Java and .Net. Things that are somewhat similar to that I imagine, at least for .Net you’re talking about Edit and Continue? >> Mike Hicks: So you could do Edit and Continue with this, although it seems like the method that’s already used for Edit and Continue in the VMs, which is to just replace a method as long as that method is not running, is more appropriate because you just want a quick and dirty response. And you don’t care if your program crashes, because if it crashes you just restart it, which is what you would have to do anyway. Having to write state transformation code and all of this stuff, you may be less interested in doing that. On the other hand, if you really are testing something and you care about the current state of the program and you had a scaffold that got it so that you have a bunch of state and you want to test a fix in that setting, maybe you would be inclined to use our stuff to do it. So I wasn’t thinking about Edit and Continue. I was more thinking of Java server programs, you know, application servers, JSP, things like that. Or I was thinking you could even imagine phone applications like Android, for example, where you’re in the middle of playing Angry Birds and you get a new version of Angry Birds, and you know, hey, why not perform the update and not have to stop playing this particular screen and go back to the main menu or whatever. I mean that’s a bit of a frivolous application. But once you can support it in a language, why not? >>: So these five applications that you talked about. So they were already designed to be, let’s say, running [inaudible] all the time? There was a lot of care? >> Mike Hicks: Yeah. >>: But then they have to have some kind of full [inaudible] or something to be [inaudible] copies or something.
So I mean how do you basically compare? It’s a very, very vague question, but the type of techniques that you use with the work on fault tolerance, how do you merge both? Because if you were to apply these type of techniques to a real phone system, as you mentioned, there are a lot of things in place. I mean it was their bread and butter business, they have never [inaudible]. And yet software updates are a problem. But then you would have to merge the type of techniques that you talk about with these kind of existing features that are a part of the system [inaudible]. So I don’t even know if it’s a, it’s not a crisp question. But what do you think about that stuff? >> Mike Hicks: So I have thought about this. I’ve thought that, so there’s an easy answer and a more nuanced answer. The easy answer is, so a lot of people will try to debunk what we’re doing by saying hey, why don’t you just build -- doesn’t the fault tolerance stuff just get you this? So for example if you’ve got a hot standby like they did in the old ESS ISS phone switches. Load the new version on the standby and then fail over to the new version and then now the old version is a standby and then upgrade that one. And the reason they didn’t do it that way is that you needed to then wait for there to be no active connections, right? You had to have no state. And that’s what we want to avoid. We want to keep the calls. If that phone switch is serving calls, you want to preserve those calls. So that’s why the Dmers [phonetic] systems allowed you to do that. You could change the inside of that and then you’d still have on the side the fault tolerance part. So one answer is you certainly want to have fault tolerance for catastrophic failures. You know, if you’re using memcached, you’re using it as a front end for a back-end database where you really care about the data. And so yes, you’re going to use fault tolerance mechanisms to make sure that you never lose your data and availability and so on.
What we’re doing, in some sense, with memcached is saving you from performance problems, right? You could just whack your memcached and restart it but then you’d lose your ten-megabyte cache that made your users’ performance much, much better, which you’d rather not do if you can get away with it. Or you’d rather not, if you’re icecast, kill particular users’ connections. I mean icecast does not have any fault tolerance mechanism for failing over if it crashes while people are listening to music. If your SSH server crashes while you’re logged in, you lose. Your connection goes away. So it’s orthogonal in that sense. But on the other hand, if you had a fault tolerance mechanism you could imagine trying to use it, and the main problem then would be how would you migrate the active state, the calls in the example of the Bell Labs switch, so that it worked in the new code that might represent those calls in some different way. You’d have to have some different way to package up the state and bring it over and then initialize your new system in some way to deal with that state. And in some sense that’s what Kitsune is doing, right? It’s got the old version, it’s just loading it into the same address space, but all the state is sitting over here and then it’s bringing it across. It would actually not be very hard to start up a new process, actually send it across on a socket, and then restart the new process and initialize it using the transformation stuff that we did before. In fact, we have an earlier system called Ekiden that is just like Kitsune but works that way. So that’s basically all I’ve thought about it. >>: So I have another maybe not so well formed question, which is let’s say I have a system running. I find that it has some security vulnerability in it and I want to update it. It may be that version 0, my system that was running, has been compromised or maybe the attacker has come by and he’s sprayed my heap and my heap is full of [inaudible] code.
And you’re going to [inaudible] you’re going to copy all that [inaudible] code to the new version. Then what you might really like to do is just clean up and start fresh with the new patch that you had in place. >> Mike Hicks: So you could do that. So with Xfgen, you can write your own transformation. For example, if you wanted to do nothing. Let’s say you’re doing a patch where you’re just changing a little bit of code and none of the state changes. You could make all those XF files just be opaque pointers and all of it would just go straight across. Or what you could do is you could say, I don’t want to do any of that. I want it to just be initialized the way that it was before. And you can write that in the control migration code, right? So remember, when I say "is updating" or whatever, I’ll either initialize this or I won’t. If I want to reuse it, fine. Otherwise I won’t put that if test there and I’ll just go ahead and initialize it when I restart. And we do do that on many occasions, actually, where we don’t care about this particular state and there’s no point in going through the pain of migrating it over. There’s a separate question, which is that you might want to keep the old state but you might want to fix it. So a good example is, suppose you have a memory leak and you fix the memory leak, but now you’d like to fix the heap too, right? Like the heap’s got all this extra stuff in it that you’d like to throw away. So you can do that, again, by writing Xfgen code that will find the leaked objects and throw them away. And then the challenge is how do you know what the heap looks like, to know the stuff that’s leaked? But the mechanism is there for you to use. So this work is published at [inaudible] this year. We have another paper that’s appearing at [inaudible] that I did with Kathryn McKinley that attempts to automatically infer fixes for memory leaks.
It can look at the heaps of the old and the new program, figure out the relationship between objects that are leaked and objects of the same type that are not leaked, and generate transformation code that goes through and nulls out all of the leaked objects. So if we had a similar sort of synthesis system for Kitsune, you could use it with Xfgen to fix your leak or fix your heap spray or whatever sort of corruption you had. >>: Any more questions? Let’s thank Mike again. [applause]
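The control-migration choice described in the last answer (reuse the old state when updating, or skip the test and initialize from scratch) can be illustrated with a minimal C sketch. Every name here (`struct cache`, `migrated_cache`, `updating`, `is_updating`, `startup_cache`) is hypothetical and chosen for illustration only; this is not Kitsune's actual API, and the real system also runs state transformation code that this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for state a server gathers up while running. */
struct cache {
    size_t entries; /* number of cached items */
};

/* State left behind by the old version; NULL on a fresh start (assumption). */
static struct cache *migrated_cache = NULL;

/* Hypothetical flag: true while the new version starts up as part of an update. */
static bool updating = false;

bool is_updating(void) { return updating; }

/* Allocate an empty cache, as a normal (non-update) startup would. */
struct cache *cache_new(void) {
    struct cache *c = malloc(sizeof *c);
    assert(c != NULL);
    c->entries = 0;
    return c;
}

/*
 * Startup code for the new version.
 * keep_state == true:  reuse the old state if we are in an update.
 * keep_state == false: drop the old state (for example, because it may be
 *                      corrupted) and initialize from scratch.
 */
struct cache *startup_cache(bool keep_state) {
    if (keep_state && is_updating() && migrated_cache != NULL) {
        struct cache *c = migrated_cache;
        migrated_cache = NULL; /* ownership moves to the new version */
        return c;
    }
    /* Fresh start, or we chose not to carry the old state across. */
    free(migrated_cache);
    migrated_cache = NULL;
    return cache_new();
}
```

Calling `startup_cache(true)` during an update hands back the cache the old version built up, while `startup_cache(false)` discards it and returns an empty one, which corresponds to the "clean up and start fresh" option raised in the question.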