>> Jay Lorch: Thank you, everyone. Thank you for coming. We're excited to be joined today by Ari Rabkin, who is going to be interviewing for no less than three positions here: researcher in distributed systems, cloud computing and storage, and RiSE. Ari got his Ph.D. in 2012 from UC Berkeley and has since then been flourishing as a postdoc at Princeton, where he's gotten an NSDI paper, a HotOS paper, and a best paper award at OOPSLA. To give you some background, he's a systems researcher, but with a lot of background you don't often see: background in static analysis and other programming languages techniques, and sociological methods. He's driven by the desire for real-world impact, as evidenced by the widespread adoption of Chukwa, his tool for gathering and analyzing logs from large distributed systems. So please join me in welcoming Ari.

>> Ariel Rabkin: Hi, folks.

>>: What does Chukwa mean?

>> Ariel Rabkin: The Yahoos picked the name. They assured me that it was, in Indian myth, the turtle that supported the elephant that supported the world, which seemed like a good name for a Hadoop spinoff. I cannot vouch for the anthropology of this; I just repeat what I'm told. Let me give you a sort of high-level view of what I do and why I do it. My goal as a researcher is really about making distributed systems easier to use. We have very complicated software, and every day we are finding newer and cleverer ways to build complicated systems. We still have users. I would like to bridge this gap. Let me give you a sort of high-level view of, therefore, what I think systems research in particular is. We have problems. We have many problems. We solve those problems by building software systems. We have users. Our users interact with our software systems. This gives them new problems. I have worked throughout this space. I've done some work on analyzing the things that make systems break, that is, failure analysis. I've built various software systems to meet various needs. Jay mentioned Chukwa. I will tell you about JetStream, which is appearing here in Seattle in a month. I've done a lot of work on configuration debugging, which is about fixing the gap between the system and the user, and I've looked a bit at the language adoption problem, which is really about how our users then adopt technologies that change the set of problems they have. And I've done a little work on systems education, let us say: how to teach people about new systems tools. Here today, I'm really talking about JetStream and about the configuration debugging work. It will be about two thirds, one third. Let me now give some technical context. Once upon a time, when people talked about big data, the thing that they had in mind was web services. And web services can have a lot of data. It can be petabytes of data, typically at some centralized site or small group of sites. These days, when people talk about big data, really this means automatic data collection. We have smart sensors in our houses. We have very smart sensors in our pockets. All of our systems produce log data. Every cyber-physical system, be it a highway or a power grid, produces log data. I was alarmed to discover that the latest generation of soda fountains from Coca-Cola are transmitting a continuous stream of data back to Atlanta about what people are drinking and how much. So everywhere you look, and some places you didn't think to look, there are sensors that are producing quite substantial volumes of data.
It could be exabytes. And a thing to note about this data is that it is dispersed: when it's created, it's at every telephone, at every smart meter, at every soda fountain throughout the world. And the software with which we process it is, therefore, also changing. There was a time when, if people thought about big data, at least the ones who weren't at Microsoft thought about Hadoop. The mascot is an elephant. It's large, it's slow, it's powerful. It will stomp on your data, but it won't be really very nimble and it won't be very flexible. These days, we have a very much richer software stack. If you are in the open source world, you have a very wide variety of choices of storage layer. This is going to be really about the open source view of the world. I understand that large companies often have a smaller and more cohesive stack, but even in a large company, you will interact with outsiders. Those outsiders have many choices of storage system. I could have drawn this going off to the wall. Once you have your data stored, you have your choice of how to process it. There's a wide variety of execution tools. Again, I could have drawn this out a long ways. These tools can be low level. People build higher-level languages and tool sets on top of them for processing. You could, again, imagine there's more. And even above this, there's a management layer where you have sort of coordination tools, things like ZooKeeper, right, which is a distributed lock manager and coordination service. So we have this immense set of software tools that we now have available to us to process our data. And these things must somehow be stitched together, right. For every pair of these, you have an interface problem where you have to make sure that these systems talk. And there's one other thing I want to draw your attention to, which is that we have a different user population than we are used to. There was a time when people's notion of who was using our software systems and who was processing big data was technical experts who were qualified to write high-performance programs. That is not the case any more. These days, the hot new thing is data science. People say it's, you know, the sexiest job of the 21st century. These people are not programmers, really. These people are somewhat of a programmer, somewhat of a statistician, somewhat of a domain expert. They're not people who want to spend their time thinking about caching and about data locality. They're people who have a query and they want to run their query and move on with their lives. So this is a different population than we are used to building systems for. The original MapReduce, Azure, what have you, was targeted at expert system developers. That's not the audience of the future. Let me now put these trends I've outlined together. The scale of data is going up. The technical sophistication is going down. Resource management is going to be a bigger problem as a result: as the humans are less technically focused, the system has to do more of that. And likewise, as the systems get more complex, configuring them is going to be a bigger challenge, right; users are going to need help stitching all of this software together. It gets more complicated as they get less interested in it. I'm going to start by talking about resource management.
As I mentioned, there was a time when people's data lived in a centralized datacenter, and it would feed in a little bit as users interacted with it, but mostly the data lived in one place. That is not the world of today. That approach does not scale. Therefore, we're going to be in a world where there's sort of dispersed data throughout the world. I drew four. You could imagine 400 or 4,000. The data will be, I think, progressively dispersed, since our ability to generate it is growing far faster than our ability to move it. And this brings me, therefore, to JetStream, which is a system that I and my colleagues at Princeton have been building for analytics in this space. Our goal is analytics; that is to say, not transaction processing. The assumption is you have data coming in. You have queries, some of which you've had for a while, some of which are new. You'd like to query this data. And the thesis I put to you is that we're going to need new abstractions in this space. We need not only a system but a set of abstractions that let us, and let users, reason about what to do. And our goal was to design a system that would be flexible enough to cope with a bunch of domains. You should be able to use it for your logs. You should be able to use it for your digital video streams if you have cameras pointing at highways. Your sensor data, what have you. One system that has the right abstractions for all of these. And just as a motivating example, let's talk about content distribution networks. You have many sites. Users are making requests. For each request, you might save, perhaps, a kilobyte of data about what they got and how quickly they got it and statistics about the transaction and the request. And you might have a simple question like: how popular are my websites? In a naive world, what you would do is just backhaul all that data; you would just copy it and then analyze it in one place with your favorite analysis tool. There's a problem, and the problem is that you actually don't have enough bandwidth for that. You should imagine that, in the real world, your needs are going to be sort of diurnal as the request load ramps up and ramps down. The amount of bandwidth you have will be, let's pretend, constant over time. In fact, it might be worse. It might be that the amount of bandwidth you have is inversely related to how much you need to backhaul your analysis data. And this invites a question of what happens here. You have a sort of buyer's remorse problem where you bought more bandwidth than you are really using. Bandwidth is expensive. You pay for it typically not per byte, but in terms of the high percentiles of your need, or else in terms of a flat monthly fee, and the consequence is that if you aren't using it, you wasted your money paying for it. There's another problem, which is that sometimes your system might produce more analysis data than you can copy right then and there, and you have a sort of, let's call it, analyst's remorse problem where there's data you wished you had and you don't have it. And now, this is a problem. And I want to talk a little bit more about that problem. What actually happens in that case? You have some bandwidth. You have some need. What's going to happen here? And in particular, I want you to think about latency. And I want you to all spend a moment and think about what this graph will look like. It's going to look like that. While there's enough bandwidth, the system will be okay.
As there stops being enough bandwidth, the queue will build up. As you have a large gap between the bandwidth you have and the bandwidth you need, your queue size will grow without bound and then your system will fall over, or else it will have to have some ad hoc mechanism in place for coping with that. Our goal is to fix that in JetStream so that the system will use the bandwidth that it has, no more, no less. We will use it efficiently and, therefore, you will be better off in terms of both your analysis and your cost. We will adapt to shortages: if there isn't enough bandwidth, the system will send less. And then it can sort of go back and fill in the gaps later. And we need new abstractions to do this. Let me say a little bit about the system architecture. It's a dataflow system, sort of similar to other streaming query processors. The model is that the user, in their program, specifies a query graph as a network of operators. This is then handed to a planning library, which will optimize it and figure out where the things go. It's handed to a coordinator. The coordinator then hands it across to a data plane. At the data plane, you have potentially multiple sites. You could have this point of presence, that point of presence, the home office with the big datacenter, and in each of those locations you may have worker nodes. You may have stream sources. The coordinator figures out where everything goes. Let me now pop up and give you a sort of high-level view of what these query graphs really look like. You might imagine in this example of the content distribution network that you have logs that are written by your legacy system. There's some operator that reads that file. That hands it to some other operator that parses out the lines. This then goes into local storage, where it will sit. Every ten seconds, you query your local storage at both this site and at another site, and the results go forward to some central datacenter where they are queried and the results go forward. And because we have this local storage, the system is able to adapt, and in the event that you didn't have enough bandwidth, data is still there locally and in this distributed way, so you can go back and pick it up. But how is it really going to be stored? We have choices. There are a bunch of things that we want our data storage to have. We would like it to be updatable, so that you have streaming data coming in and then you can have one cohesive representation of it, which is hopefully smaller than the full stream; it needs to be the case that you can update it in place. It needs to be the case that you can merge it: if you have data stored here and data stored there, there should be some way to represent the merged data naturally. And it ought to be reducible. It ought to be the case that if I have too much data, I can produce a sort of compact representation of it. These are the sort of things we want. And it turns out that these are not actually such standard properties, right. If you have raw byte strings, like a key-value pair, you don't have any semantics. The system doesn't know how to merge it and doesn't know how to update it. It turns out database tables also have the same problem. A table doesn't actually tell you what to do with it. If you hand me a table and a tuple, I am confused, since there isn't a unique right answer for how to merge those. And there certainly isn't a right answer, if you have a database table, for how you produce a smaller database table.
Happily, there is a representation that does the thing we want. It's called the data cube. It's the thing that the OLAP analysts came up with in, I believe, the '90s, and it turns out that it has all the properties we want. Let me tell you a little bit more. It has a high-level API in the same way that SQL does; in fact, it was developed by database people. Unlike database tables, it doesn't give you arbitrary joins, and therefore it does give you sort of predictable performance: by sacrificing the ability to do arbitrary ad hoc joins in this representation, we can talk in a much more useful way about its performance. The data is still there. You can still write these sort of arbitrarily complicated queries. They don't go through this abstraction. So there's sort of potentially a side way in if you have to do something more complicated. But for our purposes, you should just think about this cube interface.

>>: Can you repeat your argument why it is, if I have a tuple and a table, I can't merge or aggregate that?

>> Ariel Rabkin: It's not that you can't do it. It's that the table alone doesn't tell you how. What you wind up doing is you write a SQL statement that says take this tuple and on update add, or on update max, or on update this or that or the other. The way SQL is set up, the thing that you do when you have new data that you want to add must be specified with respect to that data addition. It doesn't come with the table. It's a sort of separate part of your

>>: It's not generic? I mean, you can certainly write SQL to do it.

>> Ariel Rabkin: You can, for a table, write SQL for how to do this, but it's not part of the schema. And there's nothing in the spec or the sort of understanding people have of databases that enforces any sort of consistency here. There was another question I wanted to

>>: I'm not sure if this is the same question or it's a different, related question. Which is: is there a generic way to, in databases, specify a query as a standard layer that can work between your high-level query and this level of manipulation of the indices? Can you get back to something like arbitrary joins in the data cube representation, where you write a query and it transforms in some generic way that query down to the [indiscernible] that have to happen the [indiscernible]?

>> Ariel Rabkin: Yes.

>>: So, I mean, arbitrary joins aren't completely ruled out?

>> Ariel Rabkin: They're not completely ruled out, but the point is that they're not part of the abstraction.

>>: Okay.

>> Ariel Rabkin: And in general, as with most abstractions in systems, there's some underlying abstraction which is more flexible than the thing you exposed. And so, just for our purposes, we have ruled them out. We can give them back to you later.

>>: Okay.

>> Ariel Rabkin: Take it that way. I want to just tell you what it is that I gave you, for those of you who are not database people. The model of a data cube is that it's a multi-dimensional array with some set of aggregates indexed by some dimensions. To make that more concrete, here's where you are in our example: you have one dimension, which is the URL. You have another dimension, which is the time. And the thing that is different from this being a database table is that we also have an aggregation function that tells us how to merge two cells. This is not something that you have in vanilla SQL.
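To make that concrete, here is a minimal Python sketch of the cube abstraction being described; it is illustrative only, not JetStream's implementation. The (url, time) dimensions and the request-count aggregate follow the CDN example, and the single aggregation function, here addition, is what gives you in-place update, merging across sites, and roll-up, all from one declared operation.

```python
# Minimal, hypothetical sketch of a data cube with one aggregation function.
# Dimensions: (url, time_bucket). Aggregate: request count. Not JetStream code.
from collections import defaultdict

def make_cube():
    return defaultdict(int)                      # (url, time_bucket) -> count

def update(cube, url, time_bucket, count=1):
    cube[(url, time_bucket)] += count            # in-place streaming update

def merge(a, b):
    out = make_cube()
    for cell, count in list(a.items()) + list(b.items()):
        out[cell] += count                       # the same function merges two cubes
    return out

def rollup_time(cube, coarser):
    """Coarsen the time dimension, e.g. 5-second buckets into 60-second buckets."""
    out = make_cube()
    for (url, t), count in cube.items():
        out[(url, t - t % coarser)] += count
    return out

site_a, site_b = make_cube(), make_cube()
update(site_a, "example.com/x", 5)
update(site_a, "example.com/x", 10)
update(site_b, "example.com/y", 10)
combined = rollup_time(merge(site_a, site_b), 60)   # one merged, coarsened cube
print(dict(combined))
```

The point of the sketch is that one merge function is enough to support update, merge, roll-up, and degradation; raw key-value pairs and plain tables carry no such semantics.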
There's nothing in the nature of a database or the nature of relational algebra that tells you what to do if I have two different relations and I want to get one relation out. A data cube has that. It has this aggregation function. And once you have that, you can do a lot. You can roll up your data. You can take some set of cells and squish them down and get, for instance, all of the accesses to that URL at all times, or you could ask about the sort of total number of requests at a particular time. And we use this one function for updates, for rollups, for merging, for degrading. And this is the key thing that we need: some semantics about how to manipulate the data. And it's just one function, so it's sort of the minimalist possible operational semantics, let's say. And once we have that, we can do a lot. In particular, we're now going to modify the data dynamically based on feedback control. We can look at the network and we can look at the data, and because we now have a way of updating our data, we can produce a smaller version of it to copy. And the feedback control will tell us when to degrade, and there will be a user-defined policy about how to degrade. And then, because the data is there, you could do later queries to pull back the bits that you didn't get the first time through. There's more than one way to degrade your data. And that is why we need a policy. You could imagine, for instance, that you had data every minute and you wanted data every five minutes. We will call this dimension coarsening. That is, you have many samples and now you're going to have fewer samples. And it doesn't have to be time. It could be that you had data at every URL, and you'd like to have data at every domain or at every sort of prefix of the URL. A different thing you could do is drop low-ranked values. You had some curve of these are the very popular URLs, these are the less popular URLs, and you just drop the tail. That's a different transformation. You could do either, you could do both. And, in fact, there's a lot of things you could do. Here are five we came up with. This is not an all-encompassing list. These are sort of five available data degradations. There's coarsening, there's dropping values. It turns out there are sort of global protocols that let you do this in a consistent way where you drop things not based yeah?

>>: Question: go back to the first box. For dimensions, in your abstraction, are the dimensions fixed like a SQL table, or can a dimension be added?

>> Ariel Rabkin: They are fixed for the cube. The cube schema is very much like a database schema in that changing the schema is a heavyweight operation.

>>: Just for clarification, going back to the first question, I didn't quite understand. When you showed the tables, the tables have a schema. You can write a SQL query to do, say, aggregation. So what is the exact difference?

>> Ariel Rabkin: The difference is that with a cube, the dimensions and the aggregates are different, whereas in SQL, there is no such distinction. You can define data cubes in terms of SQL and, in fact, that is both how we implement it and how they have been historically defined. But we are going to take that abstraction and put it in the middle of the system, and we're going to use it only through this more restrictive interface.

>>: Can you give me an example?

>> Ariel Rabkin: Of?

>>: For, you know, if you did something with SQL, you know, here's how you would do it.
And if you did something

>> Ariel Rabkin: Yeah, so the way that we implement the aggregation function is we write a complicated SQL statement, right, and based on the cube schema, we are able to produce these. But you needed to ask the user, really, what did you mean and what kind of aggregate is this? Is this a maximum? Is this an average? Is this a median? Right. You have choices there which are not actually visible at the SQL layer. And so we are sort of bolting a thing on top that specifies these semantics. And if you need a longer explanation, I want to do it later. I want to first just tell you that once you have your cube, you have many choices for what to do, right. If you had, for instance, a histogram in your cube, you could have chosen to downsample this histogram or you could have chosen to keep fewer histograms. And there are trade-offs. In particular, the trade-off I want you to notice is that most of the time, you have to choose between a fixed bandwidth savings, where there's a transformation that predictably will give you half as much data, and transformations that have a fixed accuracy cost. And in general, you have to choose. It's not the case that there's an all-purpose transformation that has predictable consequences both for the size of your data and for the accuracy of your data. And that fact will be important in a minute. So jumping back to the system, the first thing you might say is, well, I have a feedback controller and I will just pick an operator and put it in my dataflow graph, and that will specify how to transform the data. And the operator is then attached to some controller. You specify the policy by fixing an operator, and then you have a sensor that says you're sending four times too much data, back off that much, and then the operator reads the sensor. Let me give you an example of where you don't always have a predictable bandwidth savings. Let us suppose that you have decided to aggregate your data over time, that you had data every five seconds, and you want to coarsen it to every minute. The amount of data that comes out of this is not predictable in advance. If you did this for domains, you'd get a large savings. This is, by the way, data from the CoralCDN. And if you do this for URLs, you get no savings. And the reason is that approximately every Coral URL is unique. People have these queries. The queries have a large query string. Those don't repeat. The domains do repeat. And so over time, you get savings as you start to coarsen this and have data for this domain only every minute or every hour. But it's a totally new set of URLs. So there's no savings. And you might plausibly not know in advance which case you are in, right. This is, of course, a continuum. You don't know which case you're in in advance. This depends on your users, and your users change over time. And so a natural thought is, I will have a composite policy. I will have two different operators that will read the sensor, and then I can do a little of each. What happens here? I will pause for a moment and let you contemplate what happens if you try and do this in the naive way. What you get, in a precise sense, is chaos. If you have two different actuators driven off the same sensor, you don't have a stable feedback loop. This can't work. There's a more subtle problem, which is that actually operator placement is not free. There's constraints on placement due to the sort of underlying data flow.
That, for instance, the way we do coarsening is that there's a thing that's querying a cube, and it does a query rolling up to the minute level or the second level or what have you. And that query has to be next to the cube because of the sort of nature of the data flow. So it's not the case that you could put your operator where you like. So you can't use that as a cue to what to do. So the natural thought is, let's have a controller. And then the controller will be able to tell the operators what to do and will specify the priority. And this is nice because, by the way, the thing I should say is this is on every network connection. This is not global to the system. It is not even global to the machine. Every time there's a network connection, there's a little controller with a policy that says first apply this, then apply that. And the controller reads the sensor and tunes the operators to do the right thing. And so there's a potential to put a policy on every one of these, and that policy can say, first apply this, then apply that. Apply this only up to a certain point. You can specify what the degradation behavior should be. And this is good because now we're no longer bound by the topology. Which operator you apply first is unrelated to the structure of your data flow, and that's good. There's a problem, and the problem is that we do need, at the end of the day, to take the data that's coming from here and the data that's coming from there and merge it together. And that's not, in general, trivial. Let us suppose I have data every five seconds and then at some other site I have data every six seconds. What do you do? The thing that represents the data, quote, exactly, the sort of natural representation, is to do it every 30 seconds. And that's really bad, because you just dropped a large chunk of accuracy that you didn't need to drop. We think that that fix is bad. We think that the fix is, instead, to jump from every five to every ten. If you couldn't afford to send data every five seconds, you should start sending it every ten seconds. And now it's the case that this can be merged after the fact without an additional loss of accuracy. This gives you sort of better degradation behavior and better semantics. The thing I want you to notice is you can't cleanly unify the data at arbitrary degradations. This is not unique to time series data. This is the case, for instance, if you have a sketch of a distribution, right. There's these sophisticated data structures called sketches that let you represent an approximation of a whole statistical distribution, for things like finding quantiles. These come in fixed sizes. It's not the case that you could make them ten percent smaller. You often have to go down a factor of two. And so the system needs to know that. And the degradation operators need to have fixed levels. So now, this is therefore going to really shape our final interface here. The model is that you have an operator, the operator goes to the controller and says, I have a bunch of choices for how much data to send. I'm currently dropping half the incoming data. It could be zero. It could be 75 percent. What do you want it to be? And the controller, which has access to the sensor, is able to pick the level that will use as much bandwidth as is available and no more. And so that's the interface.
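Here is a minimal sketch of that operator-and-controller interface, again illustrative rather than JetStream's actual API; the operator classes, the level values, and the simple greedy policy below are invented for the example. Each operator advertises a discrete set of levels, expressed as the estimated fraction of data it would keep, and the per-connection controller walks the operators in the user's policy order, picking the least aggressive level that fits the measured bandwidth.

```python
# Sketch of the per-connection operator/controller interface just described.
# All names and values here are hypothetical, for illustration only.

class CoarsenOperator:
    """Coarsen the time dimension; levels might correspond to 5s, 10s, 60s buckets."""
    def levels(self):
        return [1.0, 0.5, 0.1]                   # estimated fraction of bytes kept
    def set_level(self, keep):
        print(f"coarsen: keep about {keep:.0%} of the data")

class DropTailOperator:
    """Drop low-ranked values, e.g. the unpopular URLs."""
    def levels(self):
        return [1.0, 0.75, 0.25]
    def set_level(self, keep):
        print(f"drop-tail: keep about {keep:.0%} of the data")

def control_step(ops_in_policy_order, offered_bps, available_bps):
    """One feedback step: choose levels so the combined output fits the
    measured bandwidth, degrading earlier operators in the policy first."""
    budget = min(1.0, available_bps / max(offered_bps, 1.0))
    for op in ops_in_policy_order:
        choices = sorted(op.levels(), reverse=True)     # least degraded first
        fitting = [l for l in choices if l <= budget]
        keep = fitting[0] if fitting else choices[-1]   # degrade only as far as needed
        op.set_level(keep)
        budget = min(1.0, budget / keep)                # remaining shortfall, if any

# E.g., the sensor says we are offering 4 Mbps onto a link that can take 1 Mbps.
control_step([CoarsenOperator(), DropTailOperator()],
             offered_bps=4_000_000, available_bps=1_000_000)
```

A real controller would rerun a step like this periodically as the sensor readings change; the key point is that it only ever chooses among the discrete, semantically meaningful levels the operators offer.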
And the thing to notice is this set of levels can be determined dynamically; the operator is in a position to look at the statistics of the data and estimate what it will do. And this is quite a valuable point of flexibility. And another thing that you can do as a result of the semantics is that the operator can put the level changes at semantically meaningful points, right. You might want it to be the case that for every minute or every hour or what have you, you have some consistent accuracy, or, you know, if you're doing something audiovisual, you might want to only change frame rate at some point that makes sense with respect to the underlying codec. And our interface is flexible enough that we can do that, and that's important. This really works, it turns out. I will now give you some experimental evidence. We used 80 nodes on the VICCI testbed. That's the sort of descendant of PlanetLab. It's the same model: you get slices on a bunch of machines. Happily for us, there are hardly any users, and so we had relatively clean experimental conditions. The data is backhauled to Princeton, and we will drop data if there isn't enough bandwidth. We ran this twice, once with and once without our mechanism, so you can see the difference. Without degradation, we ran the system for about 40 minutes, and then we turned on bandwidth shaping. We sort of told the Linux kernel, only send this much data. And so bandwidth usage drops due to the bandwidth shaping. When you remove it, it, of course, goes back up as it drains its queue. This is what happens to latency. And this is not good. I'm showing you the median, the 95th percentile, and the maximum. And the thing to notice is all of them start growing rapidly as the queues build up, and then, when we turn off the bandwidth shaping, the queues start to drain. The median node drains quickly and only takes about ten minutes to recover. The 95th percentile takes a good long while; it takes something like 45 minutes before the 95th percentile latency recovers. And the last node takes an hour to recover, right. That node is starving for bandwidth. That's bad. This is the sort of precise experimental version of that figure I wanted you to imagine earlier. This is really what it looks like. And this is no good if you're trying to do streaming analytics, because that says if there's a bandwidth glitch, your streaming is going to fall behind. The adaptation fixes this problem. Here's the same experiment. Here I'm turning on bandwidth shaping. I'm turning it on twice to show you that there isn't a transient there. This is what the median latency looks like. It went up to 15 minutes before. Now it goes up to eight seconds. That's a big win. This is the 95th percentile. You'll notice that it's not that much worse. You'll notice that the 95th percentile again recovers quickly. Just to show you I don't have anything up my sleeve, here is the maximum latency. The graph is a little ugly because you get sort of aliasing effects, because nodes don't always report, but even the maximum latency only goes up to about 30 seconds. So life is okay, right? That's the worst node, the worst of the time. Mostly, the nodes are quite good. It's only a few seconds. There's no really great way to evaluate programmability, so let me tell you what we did. We picked eight queries drawn from operational experience with Coral. We coded them up.
We discovered that we could code all of them in somewhere between five and 20 lines of code; one of them was about a hundred lines. That one did some complicated two-round thing where you would look at what was the yeah?

>>: Just to clarify, on the previous slide, you are getting two different results, in that one of these gets you degraded data and one of them doesn't; is that correct?

>> Ariel Rabkin: Yes, that is the case.

>>: Okay. And does this one ever get you back to the original data?

>> Ariel Rabkin: It doesn't do it automatically in our current implementation. You can go back and query that data. That data's all stored. We haven't thrown it away, but we aren't doing it in a streaming way, right. The logic of this was that if you're a user, often late is not actually an improvement on never. If you're making some decision now, the fact that you can ultimately get back that data is not extremely exciting. So it made sense to say that we don't backhaul by default. If you didn't look at your logs immediately, we assume you might never look at them, and so we will leave them where they're created.

>>: So in a way, we've shifted the problem: people don't know how to program systems that will use the bandwidth appropriately, and now we're hoping that they know how to trust the statistical validity of degraded data?

>> Ariel Rabkin: We can tell them how statistically accurate their data is. Often, you can get a quite good error bound. In particular, if you have a sketch or a histogram, those come with really well-defined error bounds. There's no difficulty in telling people your data is accurate to within two percent. Likewise, if you're dropping low-ranked values, it's quite easy to bound how big the effect could have been. So we can tell people this is the statistical validity of your data. We have that.

>>: How sensitive is this experiment to the topology of the [indiscernible]?

>> Ariel Rabkin: We couldn't arbitrarily vary the topology, and so we don't have a great way to tell you about the effect of the topology of the underlying network.

>>: But the topology does affect these results. In other words, if you had a different topology [indiscernible], these results could be a little different.

>> Ariel Rabkin: Presumably, yeah.

>>: What is the topology of VICCI?

>> Ariel Rabkin: The topology of VICCI is that the different sites are attached to Internet2, and the question now is what is the topology of Internet2, and the answer is I don't really know. I don't believe we're saturating that network. The bottleneck here, I believe, is the gateway to Princeton.

>>: The gateway to Princeton, I see.

>> Ariel Rabkin: This is about 500 megabits, and Princeton only has a gigabit link. So we're using a large chunk of what's available there.

>>: So that node that was starving is a Princeton node that is trying to get out through that

>> Ariel Rabkin: I assume it's a node not at Princeton that's trying to connect to us and discovers that that gateway is saturated.

>>: Okay.

>> Ariel Rabkin: All right. And with the degradation mechanism, the other nodes can be backed off, which you can't easily do otherwise. All right. Speaking of the difficulty of configuration and error analysis, that is my next topic. Software systems have many knobs and switches. Often there's a thread pool or a number of threads or a memory bound; there's some number that you have to put in by hand.
And often, then, there's a switch, which is: do you want this feature on or off? Do you want this always, sometimes, or never? What have you. You have choices. And then there's external identifiers. There's places where you need to refer to something out in the world. You need an IP address, you need a file name. You need to specify the network interface to bind to. You have these options. And I will tell you how we will debug them, particularly these last two kinds, where you have some option that's just totally wrong. It's not merely inefficient, it's wrong. And as with microphones, often you don't really know what's wrong. If you start Hadoop out of the box without configuring it, you get this. And this is not very helpful if you are a novice user. You have a stack trace with a null pointer exception; what do you do? If you're an expert, it's great, because you can read the code and understand what it means. But if you're a novice who doesn't want to read the code, you're very confused.

>>: Can I ask you a scoping question? The thesis here is that these configuration challenges are coming from the fact that the user is having to tie together dozens of pieces of software that fit together in different ways, is that

>> Ariel Rabkin: That is one of the things that makes this especially hard. But even if you have one, quote, system, that system has pieces underneath that don't always align perfectly.

>>: I guess what I'm asking is, historically, you start out with systems that only engineers could love because they're made out of [indiscernible] that you can rearrange. And at some point, we understand what the killer apps for them are.

>> Ariel Rabkin: Yes.

>>: And then some company comes and builds a monolithic tower or something called Office, with Excel and everything, and says here's a nice package. And no, you're not going to want to see the null pointer exception, but that's okay because we tested it. So, I mean, is there a reason to believe that managing a stack of bricks is actually going to be a long-term phenomenon, or are we going to learn what the sort of named motivating apps are and then build single coherent packages that are well tested?

>> Ariel Rabkin: That's a really good question.

>>: Feel free to defer it. I'm just trying to understand how it fits into the

>> Ariel Rabkin: So one thing to say is your Office runs on one machine and your distributed execution engine does not. And so just intrinsically, you have more bricks if you are in a distributed world, because you need to specify what the hardware is and what the network is. And I don't think that really is going to be monolithed away. I suppose that there are companies that want to sell you an appliance, and that's the way that we sort of hide that. But unless you believe that the processing appliances will take over the world, yeah, we're going to have a lot of software pieces. Also, the sort of economics of development are such that the size of a coherent, fully engineered solution would be really infeasibly big when you start to get to multi-million-line software systems. There's very few companies that can deliver a product the size of Office, and I would be surprised if anytime soon the whole ecosystem of distributed processing gets that consolidated. So the answer is it's not intrinsically impossible to deliver shrink-wrapped software in this environment, but it isn't happening soon. And if you want a longer answer, we can discuss more later.
So what do you do if you have this stack trace? The natural thought you might have is there must be some knob I can change. Hadoop has knobs. It has many knobs. You're not out of knobs yet. I stopped at M. So you have hundreds of options. And this, by the way, is not a fanciful problem. I spent a summer at Cloudera. I looked at their trouble ticket database, and the thing you learn is misconfiguration is the biggest problem. When you measure by support time or by number of support cases, mostly it's misconfiguration, or at least that's a plurality of the problems. That is a bigger problem than bugs. In this case, a misconfiguration is a thing that you fixed by changing the configuration. A bug is a thing where we went to the developers and said you need to patch this. So mostly the problem is in the configuration, or at least, sorry, the plurality of the time, the problem is in the configuration. So that's a sign that this is really where we should be devoting our effort. And because the software is sort of bolted together from bricks, and because different users have different use cases, and because no two MapReduce sites are quite alike, often the error messages are rather unhelpful, and often it's sort of hard to figure out really what your problem is. People routinely ask for help and don't get it, which is why there's a support business. What do we do about this? Well, one approach to automated debugging is to collect a lot of data. You can take some program and you can instrument it in six different ways. You could do dynamic instrumentation, you could watch the system calls it does, you could sort of profile its execution in various ways. You could just collect a lot of data and use that to match it against some library of known problems, or to figure out where in the program it went awry. And this is great if the program is running on a machine you control. But often, the program is running on a machine you don't have full access to. In that case, wouldn't it be nice if we could just search: if you could take the error message and put it in a search box and get back a result. And I will tell you how to do this, and this is great because it requires no access to the site where the program is running. It requires no modification of anything. There's nothing to install. You just search.

>>: You still have to instrument?

>> Ariel Rabkin: No, there's no instrumentation happening, and I will tell you what we do instead. So the goal here is to resolve the misconfiguration with only the error message, whether or not we've seen it before. All we get is an error message, right. So this is sort of the minimalist thing you could have. If the user says "help, it's broken" with no details, no. But otherwise, this is probably the least we could possibly assume, which is good. And because all that we are going to rely on is error messages, we could do it all in advance. There's only a finite number of points in the program, and so for each point, we could build a table of which options could have caused an error there. We're going to build a table, and we're going to do it with static analysis, all in advance. And at the end of the day, when the user has an error, they can go to some diagnosis service, maybe with the app, maybe on the web, and they can do a query and they can get back a result. The assumption here is that the developers are minimally friendly. They at least will give us access to the compiled binary.
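As a sketch of what that precomputed table and the lookup might look like, consider the following Python fragment. It is illustrative only: the table contents, class names, and line numbers below are made up, and in the real system the table is generated offline by the static analysis described next, not written by hand.

```python
# Hypothetical stand-in for the table the static analysis would emit offline:
# program point -> configuration options possibly responsible for a failure there.
# All entries are invented for illustration.
POSSIBLY_RESPONSIBLE = {
    ("org.apache.hadoop.fs.FileSystem", 200): {"fs.default.name"},
    ("org.apache.hadoop.ipc.Client", 315): {"fs.default.name",
                                            "ipc.client.connect.max.retries"},
}

def diagnose(stack_frames):
    """stack_frames: (class, line) pairs parsed out of the user's error message."""
    candidates = set()
    for frame in stack_frames:
        candidates |= POSSIBLY_RESPONSIBLE.get(frame, set())
    return sorted(candidates)   # a handful of guesses, hopefully including the cause

# The user pastes the out-of-the-box crash; the service just does a lookup.
print(diagnose([("org.apache.hadoop.fs.FileSystem", 200),
                ("org.apache.hadoop.ipc.Client", 315)]))
```

The expensive part is building the table; the query itself is just a lookup, which is why it can be offered as a simple search-style service.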
We don't actually need source, but you do need debug symbols to know line numbers. The assumption is we have that and use some insight into the structure of the program. We don't need to run it. We don't need anything else. Yeah?

>>: Why not change the lines of code into helpful error messages instead?

>> Ariel Rabkin: I'll give you two answers, the first of which is you don't always know in advance what the helpful error message would be. If the problem is that it threw an exception, it's sort of hard to figure out all the possible exceptions that would be caused by misconfiguration, and hard to think about all the possible configuration causes for errors. So we don't actually, in general, know what those error messages should be. The second thing to say is that you don't always have the ability to make those changes. Often, it takes until fairly late in the release cycle before you understand what can go wrong, and people do not want to put in a lot of new code to log things at that point. So from an engineering point of view, that actually is quite painful. It took, I think, eight versions of Hadoop before they patched that silly crash on startup, right, which is like the very most obvious thing, because it's fresh out of the box. If you can't patch that, I think that just says that you're not going to get very far telling engineers to write better error messages. So we're going to try and clean up after them. And the [indiscernible] is that if you have some exception, you can just look it up in this table. The table then matches, you say, aha, line 200, I know what that is, and back comes your response and life is good. The semantic here, by the way, is possibly responsible. I make no claims that it's always necessarily one of these or that it's always included. That will be, I'll show you empirically, good enough. This is the part that's hard, the static analysis; the rest is a regex or a web search. And this is work that appeared in software engineering venues a couple of years ago. So let me now say a little bit about how this happens. So there's this stack trace. What really is going on under the hood, to give you the flavor of why configuration bugs creep in, is there's some configuration option, which is what is the file system I should be binding to. And this is null by default. And then when you concatenate null with a port number, that's going to fail. You can imagine how this happened, right? One developer writes this, one developer writes that. They're testing only in a sort of well-managed cluster where that option is always set. They didn't bother to look at what happens if the user didn't put anything in the configuration file. Oops. How will we fix this? Well, we're going to do data flow on the configuration options. There's a point in the code where it reads an option out of the config. This has a key-value interface, and then we can sort of propagate those labels through statically. We're going to do data flow, and we're going to say, aha, if there's a use of a labeled value, then the output is also labeled. There's a points-to analysis in the background, and so we can sort of trace values through the heap. If you stuff something into an object and you can read it out on the other side, that will get picked up by the analysis. It turns out that you have choices at this point. There isn't one standard way to do points-to. There are many choices.
And you have to choose between a sort of liberal notion of data flow, where you include all the possible ways values could flow, and a sort of strict notion of flow, in which you might miss some flows. And if you take the liberal approach, your analysis will produce more false positives, and if you take the strict notion, it will produce more false negatives. And so when you are designing this analysis, you must sort of steer between the monster and the whirlpool. And just to put some concrete techniques down: if you want to be liberal, you need to do interprocedural control flow. You say, well, whether or not this method was called depends upon this option. And you want to do a sound points-to analysis that picks up all the possible points-to dependencies. Or you could be strict and say, I will only do a limited control flow analysis, and I will ignore some possible flows.

>>: Seems like a crash on startup is sort of the best possible case for this, because at that point, that code's only been tainted by one or two options. But in the middle of the program, this will

>> Ariel Rabkin: This is a static analysis.

>>: So even so, in the middle of the program, say deeper in the program, as opposed to the start, there are hundreds of options.

>> Ariel Rabkin: Yes, this is an unusually friendly case, which is why I wanted to pick it to talk about, to sort of make the analysis a little clearer to describe. I will show you empirically that this works, don't worry. That was not going to be my only example. Let me now talk a little bit more about why system software is hard and why applying static analysis to this kind of problem is not a straightforward exercise. Let us suppose you have some program where main calls method A and there's no visible caller of method C. If this were undergraduate program analysis, you would say, aha, I will write an inductive rule for reachability. Main is reachable, that's my base case; there's an inductive rule that says if A calls B, then, you know, B is reachable if A was reachable. And so if there's no caller of C, C is not reachable. This is not how system software works. In the world of complicated software systems, there are RPCs, right. Stuff is invoked remotely, and there's some reflective glue that makes that happen. And you could try and do a sound, precise reflection analysis. This turns out to be, in general, intractable, and so the fix that I did instead was to just label the classes that are exposed remotely. And this is actually not hard. There's two of them for a program the size of Hadoop, four of them or something like that. It's a handful. So you take the network interface that the system exposes and you say that class is the network interface. And this requires a minimal amount of insight about the system, but only a minimal amount. This did not require deep surgery. Once you do this, you have to adjust the points-to analysis to compensate, right. If stuff is invoked remotely, you need to know upon what object it was invoked, and this then requires that you adjust the underlying points-to analysis to cope with the fact that there are call chains that didn't come in through main. And once you do this, it turns out you can actually get static analysis to run on these complicated distributed systems that it has not historically been run upon. And that's an achievement. There's another problem, which is that it doesn't scale. The static analysis is exponential in the worst case.
And in the concrete cases I care about, it's really bad. Cassandra, which at the time was 10,000 lines of code, so quite small, took three and a half hours to run. It took a month on Hadoop. And that's really bad, because there's a new version of Hadoop every six weeks. So if your analysis takes a month, you're in trouble. And, of course, there are larger systems than Hadoop. So this is a problem. Happily, there is a fix, and the fix is that we can, by adroit choice of heuristics, get this down to something manageable. And, in fact, for Hadoop, we can get it down to about 20 minutes. That's a 2000x speedup. That's enough to be very happy as a systems researcher. What is the problem, really? It turns out that the problem really is the libraries. When you have a program, you think you have a program, but actually you have this sort of giant iceberg of underlying code. There's a standard library, which is usually much larger than your program, and there's all these third-party libraries that you have linked against. And the problem is that the analysis is spending its time there, and it's spending its time there because you actually have a genuinely different points-to graph in different programs that use different subsets of the library. So you can't just analyze it once, because what points to what depends on what you did. Fortunately, for our purposes there is a way that we can ignore all this code: we can sort of cut on this dotted line and just model the library instead of analyzing it. Let me now tell you how. It turns out libraries are special in a deep way. And the deep way is this. In a static analysis, if you're analyzing method A and it calls method B, the analysis has to go and analyze B at this concrete call site and look at it with these arguments, right. You have to trace this data flow through method B. Library code is special. It can't modify the user heap. There's a sort of dual contract to protect you: the type system says the library can't really access the fields of application structures, because it doesn't see their types, and there's a social convention, which is that it would be terrible manners for the library to try and evade this with reflection, right. In general, the library won't do that to you, for good reasons. And likewise, libraries don't have global state. Again, this is enforced to some extent technically and to a large extent socially; it would be a very bad library design if data flows in here and flows out there, and there's no visible connection between those sites. And if we assume that these rules are actually followed, then there's suddenly no need to analyze the library. We're guaranteed that any data flow through the library will be local: if a configuration option goes in, it will come out right here. Either the return value or the receiver object will be tainted, but not some arbitrary other structure.

>>: I wanted to question the second one, about global state. Plenty of libraries have [indiscernible] where you create a foo and you get something back that's kind of like a handle for a foo and then you do various

>> Ariel Rabkin: That's okay. That will work. That handle will come back tainted.

>>: Even if it's [indiscernible].

>> Ariel Rabkin: Yes. The analysis treats them as opaque objects.

>>: So the thing that you are asserting libraries don't do is tuck that tainted state away in a static global that only the library can see and a different [indiscernible].

>> Ariel Rabkin: Right. I am asserting that that normally doesn't happen.
>>: Okay. So it's okay to recognize the static global if it indexes into it

>> Ariel Rabkin: Yes. That's fine. The analysis will do the right thing there. You might, at this point, get nervous. I will reassure you in a moment. Let me first reassure you that this runs quickly. Whatever it gives us, it gives us in a hurry. For something the size of Ant or FreePastry, which I think are half a million lines of code, this takes half an hour. For the components of Hadoop, which are on the order of 70,000 lines, this is a couple of minutes. That's okay. This would have been totally off the charts otherwise. So we really needed these heuristics, or something like them. We needed something to make this tractable. Now let me try and directly address those questions. Yes?

>>: FreePastry, you said, is half a million lines of code without libraries?

>> Ariel Rabkin: It's up there, yeah. Maybe it was 200,000 lines. I don't remember; I'd have to look that up. But it was large. And because it was written by researchers, it has sort of quite convoluted code. It has this sort of continuation-passing style with a lot of callbacks, and that turns out to create a very tangled points-to graph. So it was slow to analyze. Let me now talk about accuracy. And for accuracy, I'm going to look at two programs. I'm going to look at Hadoop, and I'm going to look at the JChord analysis tool. These are sort of large, complicated, configuration-heavy systems programs that I didn't write but have used and, therefore, am qualified to evaluate on, because I have some idea if the results make sense. To evaluate, it turns out there's a tool called ConfErr, which does fault injection for configurations. You give it a program in a working configuration and it will permute that working configuration until it crashes. In running this, I came up with 18 failure instances across the two programs. So this is a set of configuration mistakes that were not created by me. I have nothing up my sleeve. I didn't invent them. We're now going to use these to check if the analysis is correct, right. We know the root cause, because we injected it. We can find out what the analysis tool finds. And what you get, and I'm going to show this as false positives versus false negatives, is that the analysis gives you 80 to 100 percent of the correct guesses, and it gives you only a couple of diagnoses each. You give it an error message, and it gives you back three guesses, usually including the right one. To help you interpret the plot, the theoretical ideal is there. That's really the theoretical ideal; that's the case where every error message has exactly one cause, and every problem leads to exactly one error. That doesn't happen for real software, but that's sort of the very best you could do. And note that these programs have hundreds of options, so if you did the naive thing and just returned everything, you'd be, again, off the charts. So three guesses, that's not so bad. We can do better. I've been talking about errors as though they emerge at a point. Actually, they emerge from a path. There's the point where an exception is raised, and then there's the stack trace that got you there. And we can use that whole stack trace. In particular, when you have an exception, this stack trace is telling you the calling chain that got you there. Method A could have been reached from any of various paths. This is the one that caused the exception.
And you can do a static analysis where you ignore all the other call chains and where you ask only about the data flow that can reach a given point via that chain. So this is now a static analysis that's parameterized by the stack trace. And that's actually quite quick. Here, you really can reuse the points-to results and everything else, because they're the same across different analyses of the same program. And the consequence is that this is now very quick to do. And so you can imagine doing some caching so that the user doesn't even see that, but seconds, that's not so bad. There's another thing we can do, and this maybe bears on what you were hinting at earlier about early in the program. As a program runs, it will look at an option and then it will look at another option. And you might imagine that it would normally read options C and D, but instead it crashes. This told us something. This told us, dynamically, for this concrete failure, that C and D could not possibly be responsible, because the program never looked at them. And so if you record dynamically, concretely, at run time what the program read, you now know which options it didn't look at, and that really helps you, all right; you can sort of trivially then filter away the things that never came in. And now, using these two techniques, we can get a substantial precision improvement: using the stack traces, we get a noticeable improvement, and using this logging, we get another improvement. And now we're really quite close to that theoretical ideal, and life is good. A thing to notice is some programs have been, quote, helpful and produce a, quote, friendly error message and not a stack trace. This is good for humans and bad for automated analysis, because you have less information about the state that got you to that error. And JChord, which was programmed cautiously and defensively to produce a lot of error messages, doesn't tell you by what path it got to that error, and that's why you didn't get an improvement by looking at stack traces. So stack traces really are useful if you can get them.

>>: So I'm still a little worried that you sort of motivated the problem in the beginning by saying that Cloudera had a plurality of problems with configuration. Are those all problems that would be addressed by this? Or actually, an even better question might be, how many of those would be addressable by Google, which is in some sense the trivial way of solving the problem?

>> Ariel Rabkin: None. They wouldn't have filed a support ticket if they could have looked it up on their own. By the time they file a support ticket, smart people have been stumped for a while.

>>: So does this is there any [indiscernible] looking at some of those cases and determining

>> Ariel Rabkin: This would have helped on some. It turns out that there's a piece of this analysis that is less interesting from a research point of view and more useful practically, which is we can produce a list of what all the options are. And that's really useful as a supporter, since it often turns out that people have, like, a typo in their configuration, and being able to just statically extract a list of options and what their types are, that's huge. That really made it into production use at Cloudera. The answer is half of this. The half of this that is less research-interesting was more practical, as so often happens in life. But a large chunk of this static analysis machinery was actually necessary. Did that partially answer the question?

>>: Yes, partially.
>> Ariel Rabkin: One way to say it, in other words... let me give you a better way to think about it, which is that there are different classes of users who have different classes of bugs. The class of bugs this work is directed at is: I am a non-expert, I am trying to set something up, and it isn't working. In those cases, typically, the errors are going to be kind of blatant; they'll come up on startup. The kind of errors where you pay a support company a lot of money to fix them for you, those are the hard errors. No, this technique is not the one you would use for the hardest possible errors. We've essentially filtered out everything they could solve by Googling, everything they could fix by fiddling for half an hour; all of the easy cases have already been stripped out. It turns out that when you automate the easy cases, you don't see that benefit at a support company. Users will see it. These bugs do happen. I have a lot of anecdotal evidence that people do hit silly mistakes; it's just that they don't file support tickets at an enterprise company for them. There's a longer discussion about which pieces of heavy machinery are useful at an enterprise support company, which we can have offline. The answer is: a lot, although admittedly not this one. I suspect that a large fraction of the silly mistakes you hit as a novice user are misconfigurations; you don't really hit bugs, typically, in those cases. So you should expect that the set of problems you care about as a non-expert is actually very heavily skewed toward misconfiguration. So I think the motivation is still valid. Was there another question there? Yes.

>>: So it sounded from your methodology that you just generated 18 random errors. Why wouldn't Google help with that?

>> Ariel Rabkin: There are two things to say. The first is that people are really bad at documenting these things publicly. The state of public discussion of these sorts of bugs is not amazing; often, you just get nothing. The second is that JChord has, I think, a dozen users. It's a tool you use if you're a static analysis researcher, so it does not have highly visible public support. There's a lot of software that has a dozen users, and I think being able to help those users is meaningful. This approach is great on the long tail, where you don't have enterprise support companies and high-traffic forums. And beyond that, for something like Hadoop in particular, which is really a moving target with a lot of versions, the answer that was right a year ago is not the right answer today. You actually get worse-than-useless results from search engines, because you get a lot of answers that would have been right three years ago, and there's no mechanism to deprecate them and say: actually, we fixed that problem, and if you see this error today, it has a different cause than it did before. Users get confused by this.

Next, okay. Let me say a little bit then about my next steps. As I mentioned, I want to make modern software systems easier for non-experts. One of the ways you get in trouble is, as was mentioned, that we have many bricks: we have a big software stack, and often something breaks at the top level when the underlying cause is at a low level, and you'd like to tie them together. Sometimes it's the reverse.
There's an underlying cause at a low level, say some Ethernet cable fell out, and what you see at a high level is that your overnight script failed to complete. I would like to be able to tie together the symptoms and the causes through the software stack. I want that sort of cross-layer visibility; I want to be able to drill down through the layers and tie it all together. Let me be more concrete and talk about permissions. If you have a permissions problem, if something isn't readable, what happens is that some file read fails; some input or output operation produces an exception. If you are debugging this, what you really want to know is: what is being read, what credentials do I have, what credentials do I need? You'd like to understand the flow of authority through the system, and this is currently quite painful to do, since it isn't really logged anywhere and you don't have good visibility. Note, by the way, that this is a security problem, since users will fix it by making everything world-readable, so you should really worry about it. Let me give you another example. The way that MapReduce and its cousins work is that your data is partitioned and you run a task on each data partition. One of those tasks might fail, and a very natural question is: why is this task different from all other tasks? Why did this one fail? There's an advantage here, which is that the situation is symmetric: it's really the same code running on similar data. And there's a technique called statistical debugging that's been kicking around for a while, which sprinkles predicates through your program and does a statistical analysis to figure out where things went awry. This is an unusually fertile domain for it, since it's really the same code and it all runs in a fully managed environment that you can instrument. There's a wrinkle, which is that people don't really write MapReduce programs that often; they use higher-level tools like Hive that compile down to MapReduce. To make this really usable, we'd need to invert that mapping and figure out, from the analysis of where the underlying program went wrong, what went wrong at the level of the user's abstraction; you have to translate it back up through the stack. This is not an insuperable problem; debuggers do manage to do it. But how do you do it for something complicated and ad hoc like a MapReduce scripting language? That's a hard research problem.

Let me sum up. I believe I am now out of time. I've told you about JetStream, where we support configurable degradation policies to manage bandwidth efficiently, and I've told you about my work on configuration debugging, where we match errors to causes with static analysis. More questions? And thank you for having me.

>>: I'd like to return to JetStream. The second half gave me some time to think about some questions. In order to do these control functions that you're describing, you have to have knowledge both of the limitation on network bandwidth and of the amount of data that's flowing through it, and then secondly, you've got to know something about these operators that are going to compress and/or degrade the quality of the data. What did you do in both of those areas? Who supplies the information?

>> Ariel Rabkin: Measuring the data flow is quite straightforward. We control the flow; we can measure how many bytes we put out on the network.
Likewise, we have mechanisms in place to measure the latency, and that lets us watch the queue grow and measure the relationship between the bandwidth we are using and the bandwidth the network is supplying. If the latency doubles when you add ten kilobytes per second of data, that tells you that you need to back off by a factor of two.

>>: You don't really need to know much about the topology of the network? You just...

>> Ariel Rabkin: We are not measuring the topology. We are measuring the...

>>: Just latency versus volume, and if you see a problem, then you just presume there's a bandwidth problem?

>> Ariel Rabkin: We are presuming that we should back off, yes.

>>: Okay.

>> Ariel Rabkin: You could imagine a more sophisticated version of this that has actual insight into the network, knows where the bottleneck is, and can say: this latency is irrelevant to me, I will keep sending. That is not really feasible in the wide area, which is the context we're targeting, because we don't own those networks, and AT&T or Internet2 won't tell us where the bottleneck is. All we know is that the packet took this long to arrive.

>>: And the second half is that there's a cost estimation issue of knowing what these operators do. Who supplies that?

>> Ariel Rabkin: Yes. The operator itself is coded with, as I was...

>>: Part of the operator interface?

>> Ariel Rabkin: The interface between the operator and the controller encodes this. When an operator is able to change its degradation level, it specifies what its levels are in terms of bandwidth. For some operators, this is very simple: if it's a histogram that you're downsampling, you know perfectly well that if you drop half the buckets, you'll have half the data. That's the easy case. The hard case is something like coarsening, where you don't really know; in those cases, the operator will estimate.

>>: What happens if there are multiple ways for the controller to get the desired effect of reduced bandwidth?

>> Ariel Rabkin: The controller has the policy, and the user explicitly specifies the priority: what to try first.

>>: Both in terms of the data sources and in the choice of operators?

>> Ariel Rabkin: The model is that you fix the, quote, operators in advance, and then the policy specifies which of them to apply first. All of these operators, you should think of them as a sort of variable resistor or variable faucet that can be all on, all off, or somewhere in between. By default, they're all on, and so the question is which knob to turn.

>>: What I meant by that is the controller could choose to either greatly reduce one data source's...

>> Ariel Rabkin: Yes.

>>: ...data supply in order to get latency down, or it could peanut-butter the thing and just spread it evenly across all of them.

>> Ariel Rabkin: Yes.

>>: Moreover, for any given data source, it may have a choice of several operators.

>> Ariel Rabkin: Yes.

>>: In different combinations with the desired effect?

>> Ariel Rabkin: Yes.

>>: So all of that is wrapped up in this?

>> Ariel Rabkin: That is all wrapped up in the policy. The policy language we currently have is not infinitely flexible; it lets you specify the priority of your operators. You could imagine extending it to cover every other such case. We think it's a great advance to have a centralized policy and a point of mechanism where you can apply it, and that's the contribution here.
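To make the knob-turning model concrete, here is a minimal illustrative sketch, not JetStream's actual interface: each degradable operator advertises a few discrete levels as rough fractions of its full sending rate, and the controller holds the operators in the user's priority order, degrading the first one when the measured rate must drop and restoring them in reverse order when bandwidth returns. All names and numbers below are made up.

    import java.util.*;

    // Hypothetical sketch of a priority-ordered degradation policy.
    public class DegradationPolicySketch {

        static class Operator {
            final String name;
            final double[] fractionOfFullRate;  // e.g. {1.0, 0.5, 0.25}
            int level = 0;                      // 0 = full fidelity
            Operator(String name, double... fractions) {
                this.name = name; this.fractionOfFullRate = fractions;
            }
            boolean canDegrade() { return level < fractionOfFullRate.length - 1; }
            boolean canRestore() { return level > 0; }
            double fraction()    { return fractionOfFullRate[level]; }
        }

        // Crude estimate of how much of the full rate we are sending.
        static double estimate(List<Operator> ops) {
            return ops.stream().mapToDouble(Operator::fraction).average().orElse(1.0);
        }

        // Degrade in priority order until the estimate fits the target,
        // then restore, in reverse priority order, wherever there is headroom.
        static void adjust(List<Operator> policy, double targetFraction) {
            for (Operator op : policy) {
                while (estimate(policy) > targetFraction && op.canDegrade()) op.level++;
            }
            for (int i = policy.size() - 1; i >= 0; i--) {
                Operator op = policy.get(i);
                while (op.canRestore()) {
                    op.level--;
                    if (estimate(policy) > targetFraction) { op.level++; break; }
                }
            }
        }

        public static void main(String[] args) {
            List<Operator> policy = List.of(
                new Operator("coarsen-histogram", 1.0, 0.5, 0.25),
                new Operator("drop-low-count-rows", 1.0, 0.5));
            adjust(policy, 0.4);   // rising latency said: send roughly 40% as much
            policy.forEach(op -> System.out.println(op.name + " at level " + op.level));
            adjust(policy, 1.0);   // bandwidth recovered: tune back up
            policy.forEach(op -> System.out.println(op.name + " at level " + op.level));
        }
    }

In a real deployment the target fraction would come from the latency measurements described above, and the adjustment would be rerun every control period.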
We aren't claiming that we have the all-purpose policy language to express every possible data transformation. That would be nice, but it seems out of reach of current techniques.

>>: I just have this feeling that without the topology, it's kind of hard for this to work. And feel free to push back; maybe I don't understand. But one of the things I think you mentioned was that the system was able to degrade the data but was not able to go back. If the bandwidth becomes more plentiful, the going-back part wasn't implemented yet, or at least you guys haven't done that.

>> Ariel Rabkin: What's implemented today is that if you want to go back and get the data, you issue a query and the data will be copied.

>>: [indiscernible] but the data that is going to come in the future, that I want now to go back to full resolution because I have [indiscernible].

>> Ariel Rabkin: Sorry. The system is auto-tuning. It will tune...

>>: It is auto-tuning?

>> Ariel Rabkin: Yeah, it tunes back up as well as down.

>>: So then I don't understand why you don't encounter flapping issues where, if you have multiple nodes that share their bottleneck [indiscernible], they start to degrade and those start to sort of...

>> Ariel Rabkin: Yeah, so [indiscernible], you know. The answer is that you do see a little of this, but by tuning the control algorithm you're able to minimize it, and you get pretty consistent and pretty even performance. The other thing to say is that often the topologies don't really look like that. If what you're worried about, for instance, is sensors, typically the first link is where the bottleneck is, and that's not shared. If you have data coming from cell phones, there isn't really shared bandwidth there; the limit is my link to the tower.

>>: Is it fair to say that, while the system doesn't have control over the network, the system could infer where the bottlenecks are, or maybe not the bottlenecks themselves but which operators share bottlenecks?

>> Ariel Rabkin: Being able to either acquire that knowledge automatically or incorporate that knowledge into the system are both really interesting research problems that we have not yet tackled. Yes, it would be nice to handle those cases. We think it's still useful without them. Next? No? Good, all right, thank you. This was fun.