>> Tom Ball: Hello. Good morning. And thank you for coming. I'm really pleased to have Mayur
Naik back. He's been here quite a few times and is well known to us. He was an intern on the
SLAM project back in the summer of 2002, if I remember right. And we had a great time with
Mayur. In fact, in that summer I think he started in May, and we got some results and sent off a
paper to POPL in July and it got accepted.
So it was probably the shortest time from paper idea, to doing the work, to deciding we had enough, sending it off, and getting it accepted.
Yeah, it was really great. And Mayur graduated from Stanford with Alex Aiken, did a lot of work
on program analysis, and we'll hear a little bit about program analysis today.
Since then he's been at Intel Labs. He's done really seminal work on data race detection and concurrency bug detection in programs using static analysis, but he's broadened out from those
beginnings on defect detection and verification, and he's going to tell us today about some other
applications.
So Mayur, welcome back.
>> Mayur Naik: Thank you. So let me begin with the context for my research. So we have three
dominant computing platforms today. Parallel computing, cloud computing and mobile
computing.
And these platforms have certain characteristics that distinguish them from the traditional desktop
computing model.
For example, the hardware devices in these platforms are multitudinous, they're diverse and [inaudible], and they're geographically distributed. And these characteristics pose unprecedented
challenges in software engineering.
For example, in software reliability, programmer productivity and energy efficiency. So I'm going to talk about one challenge from each of these three kinds of computing platforms.
So first comes mobile computing. So Smartphones are ubiquitous today. And so are the apps
that run on Smartphones. For example, Apple recently announced their ten billionth app
download from its app store. But the problem with these Smartphones is that they cannot render rich, compute-intensive apps because of very limited battery, CPU, and memory.
There are many challenges, of course, in mobile computing. One is how can we seamlessly
partition these rich mobile apps and offload their compute intensive parts to the cloud. What I
mean by the cloud is even a more powerful computing device in the vicinity such as a desktop or
a laptop with which the Smartphone has a strong connectivity.
Speaking of cloud computing, it has taken off finally thanks to the growth of the Internet. And it
has its own software engineering challenges.
So, for example, cloud providers are expected to meet service level agreements. There are various
resource management challenges such as how can the cloud be energy efficient, how can we do
a better job scheduling so as to improve the throughput. How can we exploit data locality of jobs
in the cloud.
The challenge is: how can we automatically predict various performance metrics of programs, such as the running time, the energy consumption, or the data locality? It would help solve some of these resource management problems if we had a solution to this prediction problem.
And finally comes a challenge in parallel computing. Roughly around 2004, CPU speeds stopped
increasing and what we see is an increasing number of cores now. None of which is getting any
faster.
What that means is software will have to be rewritten in a concurrent fashion in order to take advantage of the performance benefits of these multiple cores.
But it's well known that writing concurrent software reliably is significantly harder than writing
sequential software, as this quote from an expert on Java concurrency shows.
So the challenge is how can we automatically make concurrent programs more reliable. So I'll
begin with some terminology here. So program analysis is a body of techniques for discovering
facts about programs. And we have two kinds of program analysis. We have dynamic analysis, which is a program analysis that runs the program to discover these facts, and we have static analysis, which doesn't involve running the program. So this talk then is about
synergistically combining diverse techniques, such as static program analysis, dynamic program
analysis and machine learning.
In order to solve some of these modern software challenges that I talked about such as program
scalability, program reliability and program performance estimation.
So I'll give you a preview of our results before proceeding to the key insights that helped us
achieve these results.
So the challenge in mobile computing is how can we scale mobile, rich mobile apps on resource
constrained mobile devices? And I'll show you how we have combined a static and dynamic
program analysis to seamlessly partition these mobile apps, and off load the compute intensive
parts to the cloud and thereby achieve up to a 20 X decrease in the energy consumed on the
phone.
In cloud computing, the challenge -- one of the challenges is how can we estimate various
performance metrics of programs, and I'll show you how we have combined static analysis,
dynamic analysis and machine learning to automatically predict the running time of general
purpose programs.
And we do this both accurately and efficiently. What I mean by accurately is that the prediction
error for benchmarks is on average less than 7 percent, and the cost, the prediction cost is on
average less than 6 percent of the total execution cost of these real world programs.
And finally for parallel computing, the challenge is how can we make these parallel programs
more reliable. And I'll show you how we have combined static and dynamic program analysis to
scalably verify concurrent programs.
In the process, it has exposed around 400 concurrency bugs in up to 1.5 million lines of code of widely used Java programs. And many of these bugs were fixed within a week of reporting by the
developers of these programs.
So what I'm going to do next is present the key insights behind each of these three problems,
which enable us to achieve those results.
And as I go from one problem to the other, the depth of the program analysis is going to
progressively increase.
So I'm going to start with seamless program partitioning, in the context of mobile computing.
So suppose we have a Smartphone and we want to run a rich app on it. What I mean by that is
say an app which has a compute-intensive function such as a face detection routine which
detects the faces in all images stored on the phone.
But if there's a more powerful device in the vicinity such as a desktop which this phone has good
connectivity then we might want to offload this computation to that device so that we can
conserve the energy that is going to be consumed if we run that compute intensive part on the
phone itself.
So we've built a system called Clone Cloud that allows you to do such offloading. And while
Clone Cloud is built for the Dalvik VM on the Android platform, this could be done for
Microsoft .NET or Apple's iOS as well. Any application layer would work.
Clone Cloud recognizes two kinds of instructions, offload and resume instructions. And the way
these instructions work is as follows. So when the app starts executing and reaches an offload
instruction, the state of the thread, which is executing the offload instruction, is migrated to the
cloud.
Execution of the compute-intensive part then continues there, and finally, when it hits the resume statement, the migrated state is shipped back to the mobile device and execution resumes on the phone.
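To make the mechanism concrete, here is a minimal, hypothetical Java sketch (the class and method names are made up; this is not Clone Cloud's actual API, and the real partitioner works on Dalvik bytecode rather than annotated source). The comments mark where a partitioner could place the offload and resume points around a compute-intensive function.

```java
// Hypothetical sketch of offload/resume placement around a compute-intensive method.
import java.util.List;

public class FaceSearchApp {
    // Runs on the phone: walks the image list and calls the heavy routine per image.
    static int countFaces(List<byte[]> images) {
        int total = 0;
        for (byte[] img : images) {
            // OFFLOAD point (entry of detectFaces): the thread's state may migrate to the cloud.
            total += detectFaces(img);
            // RESUME point (exit of detectFaces): migrated state is shipped back to the phone.
        }
        return total;
    }

    // Compute-intensive part; a pure function of its argument, so it is legal to offload.
    static int detectFaces(byte[] image) {
        int faces = 0;
        for (byte b : image) {
            if ((b & 0xFF) > 200) faces++;   // stand-in for a real face detector
        }
        return faces;
    }

    public static void main(String[] args) {
        List<byte[]> imgs = List.of(new byte[] {10, (byte) 220, 30});
        System.out.println("faces: " + countFaces(imgs));
    }
}
```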
There are many system level aspects we worked on but I'm going to focus on the program
analysis challenge here, which is how do we automatically find which functions in this app to
migrate?
And notice that this is not obvious, because even though you gain the power of a more powerful device to execute the compute-intensive functions, there's also the cost of shipping data back and forth, which itself consumes energy on the phone.
>>: I have a question about -- are you assuming that the app is single-threaded so there's no
interference from other kinds of things.
>> Mayur Naik: It could be multi-threaded, and execution continues on the phone until it touches any state which has been migrated -- any state reachable from the thread which we migrate. We do thread-level migration.
>>: At the point where it touches that state, does it get blocked until the resume happens?
>> Mayur Naik: The thread running on the phone itself will get blocked until the resume happens.
So we formulate this as a mathematical optimization problem using integer linear programming
and solve it offline using an off-the-shelf LP solver. So the constraints are obtained from static
analysis, and the objective function is obtained from dynamic analysis. So static analysis dictates which solutions are correct: where is it legal to put offloads and resumes? We allow them to be placed at the entry and exit of any function of the app.
An example of a constraint might be that certain functions have to run on the mobile device, such as functions that access sensors like the GPS or camera.
More interesting constraints are that we shouldn't have nested offloads and resumes. Offloads
and resumes should alternate on every execution path. We use a simple static call graph
analysis for that purpose.
Many solutions in practice are correct, but only a few of them will be optimal in the sense that they minimize the total execution time or the energy consumed on the phone. These are the two metrics we have implemented. So dynamic analysis determines the cost of each of these solutions, and then the ILP solver will choose an optimal one. For this purpose we use program profiles of this app from both the mobile device and the cloud.
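As a rough sketch of what such an integer linear program could look like (this is an illustration under assumptions, not necessarily Clone Cloud's exact formulation): let x_f be 1 if function f runs in the cloud and 0 if it runs on the phone, let E_f be the profiled energy of running f on the phone, and let C_{f,g} be the profiled cost of shipping data when the call edge f -> g crosses the partition.

```latex
% Illustrative ILP (assumed form, ignoring e.g. the phone's idle energy while waiting)
\begin{align*}
\min_{x,\,d} \quad & \sum_{f} (1 - x_f)\, E_f \;+\; \sum_{(f,g)\,\in\,\mathrm{calls}} d_{f,g}\, C_{f,g} \\
\text{s.t.}\quad & d_{f,g} \ge x_f - x_g, \qquad d_{f,g} \ge x_g - x_f
  && \text{(pay transfer cost when an edge crosses the partition)} \\
& x_f = 0 \quad \text{for every } f \text{ pinned to the phone (GPS, camera, UI)} \\
& x_f \le x_g \quad \text{for every call edge } f \to g \text{ in the static call graph} \\
& x_f \in \{0,1\}, \quad d_{f,g} \in \{0,1\}
\end{align*}
```

The last constraint is just one simple way to encode the call-graph legality rules: an offloaded function may only call functions that can also run in the cloud, so a callee pinned to the phone forces its callers to stay on the phone. The real constraints about alternating offloads and resumes are richer than this.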
So the program analysis here is very simple. But I'm going to show you some of the results we
have.
>>: Are the program profiles just control-flow profiles?
>> Mayur Naik: They measure the -- so we have a representation of the execution time of the program in terms of each function. So how much time did each function take, and then you can --
>>: So you're not breaking down data, the amount of data transmitted?
>> Mayur Naik: We also have the amount of data. So for each function we give both the execution time and the data, the input and output data that is needed. Because if you shift that function to the cloud you have to --
>>: So the program profile is time? And then also the amount of network access?
>> Mayur Naik: Yeah, we know in advance how much energy will be consumed on a given
network for each byte of data that's transferred. So just measuring the number of bytes that is
transferred is enough.
>>: So the methodology then, you just find a whole bunch of correct solutions and you run them through the dynamic analysis --
>> Mayur Naik: We are generating an integer linear program. I'm just telling you how the
constraints come, for the integer linear program. And how the objective function comes.
And once you have an integer linear program you just solve it using an LP solver and the output
tells where to put the offloads and resumes.
>>: Can you compare and contrast with the work on J-Orchestra?
>> Mayur Naik: So the novel aspects here are one is we do thread level migration. So native
state, for example, is not migrated for efficiency purposes.
Another aspect, I would say, is that we are doing this automatically. I don't know how J-Orchestra does it, but here the placement of offloads and resumes is completely transparent to the programmer. Yet another aspect is that, as I said, we allow other threads to continue on the phone. If it's a UI thread it can continue, as opposed to blocking.
So we have implemented compute-intensive apps ourselves on this Android platform. And I'll show you results for one of these apps, which is the face detection app I talked about. We have implemented others such as a virus scanning app and a behavior profiling app. So here the setting is a Google phone running the Dalvik VM on Android, and the cloud is a standard desktop running an Android VM on Linux.
So on the Y axis is the energy consumed on the phone which includes the energy to transfer data
back and forth.
We have done two sets of -- we've done many sets of experiments, and I'm just showing you two
extremes.
>>: [inaudible].
>> Mayur Naik: It's very similar to Java bytecode, slightly modified. So all the analysis I'm going to talk about is on Java bytecode.
>> Mayur Naik: So here are two sets of experiments, one is where one image is on the phone and another
where there are 100 images, so two extremes. And within each experiment, we have three
results. One is what happens if you run the app entirely on the phone. And another is what if you
use our system, Clone Cloud, and then there are two choices there, if you use a Wi-Fi network or
a 3G network.
And as you can see, for one image Clone Cloud decides not to partition the app because the cost
of data transfer is just not worth shipping the computation to the cloud.
Whereas the moment we go to 100 images the cost is amortized over the 100 images.
So we get a 20X, up to a 20X speedup if we ship the computation, if we partition the face detection app and offload it.
>>: [inaudible].
>> Mayur Naik: I'm sorry. It's a decrease in energy used on the phone. But we've also computed the total running time, and as most people would expect, these are usually correlated. So the running time speedups are very similar -- we have up to close to a 20X speedup as well for the total app.
>>: Are you shipping the images over?
>> Mayur Naik: Yes. But not from the device. So the way the app is written is that it loads up the images into memory. Because if you ever make a function call that might involve reading from the disk, a native function call, that cannot be migrated.
>>: Are the images already on the cloud?
>> Mayur Naik: No. So the images -- so the cost here includes the cost of shipping the images. I would say they aren't that big; I think it's up to 200 KB each. So we haven't really -- there were limitations of the platform itself in what we could --
>>: But the idea in this app is that the images would be captured on the phone?
>> Mayur Naik: Yes.
>>: And then the question is whether to put it down --
>> Mayur Naik: Yes.
>>: So did you look at the overall energy consumption on both sides?
>> Mayur Naik: No. Okay. No, we didn't.
>>: Is that important?
>>: If you're only concerned with batteries it isn't important. If you're interested in systems
[inaudible].
>> Mayur Naik: In this case the focus was just on the battery on the mobile device. Yeah.
>>: So you have to determine -- the static analysis has to determine that the functions you're shipping do not do I/O on the phone?
>> Mayur Naik: Yeah, so because we know, we understand the Android platform, we know which libraries access functionality specific to the device. We do virtualize a lot of functionality. The computation which is migrated to the cloud can use a lot of the hardware devices, such as the disk and so on, on the cloud itself. But there are certain functionalities which will have to remain on the mobile device.
>>: So I don't have a Smartphone. So my question is -- I'm still waiting. Suppose I wrote an app for image detection, couldn't I run it as a service and then have you upload those images to the service, and the entire app would conserve this memory, this energy? And more to the point -- here you present one app that can benefit from an offloading scenario, where I question for that particular app whether that's the right way to write the app, and the other question is, is this typical for apps on the phone?
>> Mayur Naik: So the reason we wrote these apps is because they don't exist. I mean, the reason people probably don't do compute-intensive things on mobile devices is because they're compute intensive.
So we want to enable new kinds of apps.
>>: But the app that you would take a picture of a -- Google has something where you take a
picture of some tourist attraction and it tells you where you are. Wouldn't you want to post that.
>> Mayur Naik: Yeah, I mean, of course. You can have other kinds of models. I'm not sure if
this is -- here the focus is on computing.
>>: Is this the right question to ask for phone apps, the energy offloading?
>> Mayur Naik: Well, I mean, I showed you, right, that Apple has ten billion app downloads.
>>: For those 300,000 apps that Apple has, how many of those benefit from [inaudible].
>> Mayur Naik: We haven't done that study. My guess is most apps are not even written that way, just because this capability is something which is very new. Even cloud computing is so recent that this is just trying to be ahead of the curve to enable these apps.
Let me move on to the next part: predicting the performance of programs automatically. I'll use Clone Cloud as an example. As I showed you, we do offline partitioning in Clone Cloud, static partitioning. What that means is it will use the same partitioning regardless of the input. But we noticed that for different inputs, different partitionings were optimal.
For example, if it's one image which is the input to the face detection app, it's optimal to run it entirely on the phone, whereas for 400 images it's optimal to partition it. And the challenge is: how can we automatically predict performance metrics, such as the running time of a function like the face detection routine, or the energy usage of the function, on a given input? If we could do this, we could make this choice in Clone Cloud online, whether or not to partition.
But performance prediction is such a fundamental problem that it has many other applications in computer science. Wherever you need dynamic or adaptive approaches for scheduling, load balancing, resource management, or optimization, you need performance prediction. And this comes up in various fields: databases, networking, virtual machines, compilers, cloud computing, and mobile computing.
So I'm going to define the problem abstractly without any application. So the input to this problem
is a program P and an input I to that program. And the output should be the estimated running
time of the program on that input.
Since I'm going to use this as a running example in the rest of the talk, let me briefly describe what it does. It's a multi-threaded discrete event simulation program; some like to call it the elevator program. And you can see the input to this program is a file on disk which has the number of elevators, the number of floors, and then a bunch of events, one per line. So, for example, this event says that at time two a person wants to go from floor one to floor five.
We would like to estimate the running time of this program P on this input with the following goals.
So of course we want to be accurate in terms of estimation. Secondly, we want to be efficient.
So, for example, you're not allowed to run the program P on that input to completion and tell me
what the running time is.
You're at least allowed to look at the input, of course, and do computation linear in size of the
input. So you are, for example, allowed to scan this file.
Two other features that are unique to our work are that we want this to work for general purpose programs, and we want it to be fully automatic. People have solved this problem of performance prediction and modeling in the past, but only in domain-specific contexts, for example just for database query programs or network applications, where you can use domain knowledge to build performance models. Or they've done it manually, where expert knowledge is used: someone who really understands how the program is written provides a performance model.
So our solution is a system called Mantis that has two parts, an offline part and an online part. The offline part takes the program P whose performance you want to predict and a bunch of training inputs, and it builds a performance model, which is then fed to the online part. When there's a new input I on which you want to predict the running time of P, it gives you the estimated running time.
>>: The assumption, the compute [inaudible] mix of all of them.
>> Mayur Naik: One thing we're not doing here is we're not at all modeling the environment. This
is assuming that the running time of the program just depends on the input.
So other than that, we just require the program to terminate, so that we can estimate its running
time.
>>: I guess my question is, [inaudible] inside the program but [inaudible] the IO password over
the wire and then typically [inaudible].
>> Mayur Naik: Yeah, so that's what I meant by modeling the environment. I think this is -- that I
think we view that as an orthogonal problem and we haven't even tried those kind of problems.
>>: You also mean by that the caches.
>> Mayur Naik: Everything. Exactly. So all those properties.
>>: This is abstract this is more like big O.
>> Mayur Naik: Exactly.
>>: You want to give me big O.
>> Mayur Naik: It's not exactly big O because we're going to say one minute or 42 seconds for
this input.
>>: Oh. So this is different.
>> Mayur Naik: So much of the work in programming, in program analysis, right, has been on worst-case inputs. So you're not even given an input I; you're given the program P and you want to know the worst-case input. Whereas --
>>: If you do not model the cache and network and all this stuff, then when you report this, how do I know what you're going to report?
>> Mayur Naik: So that is true. But to a rough approximation, right? I mean, most programs' running times do depend on their inputs. There are certainly classes of applications where --
>>: [inaudible] coming back --
>> Mayur Naik: I agree this is not the complete solution. There's the environment and so on. But any solution will have to take this into account.
>>: So really the estimate then here is an actual number.
>> Mayur Naik: Is an actual number.
>>: Okay. But we're not allowed to run the program?
>> Mayur Naik: Yeah.
>>: But during the training you are running a program.
>> Mayur Naik: Yes, of course, during training we are going to run it.
>>: I'm not quite understanding the answer you had about SPEED. It seems like the answer was about worst-case performance, but SPEED actually doesn't give you worst-case performance; it gives you performance as a function of the inputs, which is very similar to what you're talking about, I think. So how do they differ, then, the two approaches?
>> Mayur Naik: I don't understand. So the output of this whole exercise is going to be a number such as --
>>: Right. It will give you numbers. For worst case, right?
>>: No, it's not worst case. If you call function with input N, it will tell you a number like two
minutes based on N.
>> Mayur Naik: Okay. So eventually we'll -- that's an example where it's not fully automatic.
Someone tells you there's an input size N on which the running time depends.
And I'm going to automatically get a way to compute N for you. So it will be the same as N, but it
will be done in a way which is automatic.
Let me get to the end of this, and you can ask me again if you still have the question.
I'm going to describe the offline stage over here. Since we cannot use either domain or expert knowledge, we are going to instrument this program with broad classes of features.
A distinguishing characteristic of any performance prediction technique is what features it uses to model performance. In our case there are three classes of features we instrument: all loop counts, all branch counts, and various statistics on the values of variables of primitive data types, such as the frequency, the sum, the average and so on.
Because all of these are potentially correlated with the program's running time. So let me give you an example of a loop counter. Here's a counter f1 that we'll instrument. This one, as you notice, is counting the number of floors which are going to be added.
You have another counter here, f2, which is going to count the number of elevators in the input. Here you have a counter f3 which is counting the number of events, how many people want to go from which floor to which floor. And here's an interesting counter coming from a different scheme, where what we're going to do is compute the sum of all the time fields. So a person wants to go from one floor to another at a certain time; we are going to take the sum of all of those times, because again that's potentially correlated with running time.
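As an illustration, here is a small, self-contained Java sketch of this kind of instrumentation (the counter names f1, f3, f4 and the parsing code are hypothetical; Mantis actually instruments Java bytecode rather than source):

```java
// Hypothetical sketch of feature counters for the elevator program's input parsing.
import java.util.Scanner;

public class ElevatorFeatures {
    public static void main(String[] args) {
        String input = "2 5\n2 1 5\n3 2 4\n";   // elevators, floors, then one event per line
        long f1 = 0;   // loop count: number of floors built
        long f3 = 0;   // loop count: number of events read
        long f4 = 0;   // value statistic: sum of all event time fields
        Scanner in = new Scanner(input);
        int elevators = in.nextInt();
        int floors = in.nextInt();
        for (int i = 0; i < floors; i++) {
            f1++;               // counts iterations of the floor-building loop
            // ... build floor i ...
        }
        while (in.hasNextInt()) {
            int time = in.nextInt(), from = in.nextInt(), to = in.nextInt();
            f3++;               // counts iterations of the event-reading loop
            f4 += time;         // sums the time field of every event
            // ... enqueue event (time, from, to) ...
        }
        System.out.println("f1=" + f1 + " f3=" + f3 + " f4=" + f4);
    }
}
```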
What we're going to do next is run this instrumented program off line on all the N inputs and we're
going to get for each input the exact value of each of these counters and the exact running time
on that input.
Now, this is a classic machine learning problem where we want to approximate the running time
R as a function of these features. And we use up to cubic polynomial expansion, and this is, for
example, a running time function that we might get for this program.
While this is mostly standard machine learning, there are two distinctive aspects of our work: one is nonlinearity and the other is sparsity. By nonlinearity, what I mean is we want to allow terms such as N times M, where N might be the number of times an outer loop iterates and M is the number of times a nested loop iterates for each iteration of the outer loop, because that's what models --
>>: F1.
>> Mayur Naik: Yes, cross terms are allowed, even though they aren't appearing here.
What I meant by sparsity is we want to choose just a handful of features in this running-time function, even though in practice we have thousands or tens of thousands of features here. We need sparsity for two reasons. One is we don't want to overfit to the offline inputs, and the second reason is that, remember, our ultimate goal is performance prediction on a given new input: we're going to have to evaluate the values of the features that appear in this performance model, like f4, on that new input I. So the fewer features we have, the better.
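In symbols, and hedging on the exact details (the talk only says "up to cubic polynomial expansion" plus a sparsity-inducing regression, with Lasso named later in the Q&A; the notation r, f, m_j, beta, lambda, N below is mine), the offline stage fits something like:

```latex
% Sketch of the sparse nonlinear model (assumed form)
r(\mathbf{f}) \;\approx\; \beta_0 + \sum_{j} \beta_j\, m_j(\mathbf{f}),
\qquad m_j \in \{\, f_a,\; f_a f_b,\; f_a f_b f_c \,\}

% Lasso-style training objective over the N profiled runs; the \ell_1 penalty drives
% most coefficients to zero, which is what yields the sparsity discussed above.
\min_{\beta}\;\; \sum_{i=1}^{N} \Big( r_i \;-\; \beta_0 \;-\; \sum_j \beta_j\, m_j\big(\mathbf{f}^{(i)}\big) \Big)^{2}
\;+\; \lambda \sum_j |\beta_j|
```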
>>: How --
>>: [inaudible].
>> Mayur Naik: Okay. So exactly. This comes to the next point, which is how to evaluate these features. And we use a classic program analysis technique called static slicing to automatically obtain code snippets whose only goal is to compute the value of these features.
Now, let me --
>>: [inaudible].
>> Mayur Naik: I'm going to run the slice on an input.
>>: Can't you use machine learning to predict those?
>> Mayur Naik: On a new input. I think this is what Ben also mentioned: somehow you look at the input. The main difference is that would be black-box performance prediction, which just says show me the inputs, like the file size and the command-line arguments, and I'll do machine learning on some features over there. And maybe N is one of the features you get out of it, right? But what we're saying is we're subsuming that: let's look deeper into the program, and ultimately things like the file size and so on will be captured by some feature over here. So in some sense it's more general than that.
>>: So if I understand correctly, the slices that you're going to compute, they're going to run some
of these loops to compute these for you?
>> Mayur Naik: Exactly.
>>: But in my program, basically everything that happens that's going to take time is in the loops, right? So if you're basically going to run all the loops, without --
>> Mayur Naik: Excellent question.
>>: It seems like cheating, right?
>> Mayur Naik: What you guys are doing is nicely giving me the segue to the next point. So let me come to that. This is exactly why loop counters are not the only features we have; most previous work only uses loop counters for performance modeling. If you had only loop counters, that would be enough to model the performance of any program, because that's where programs spend most of their time.
But we also have other statistics such as variable values. So imagine a loop runs from 0 to N
minus 1, then we will have features that both count the number of iterations as well as the value
N itself, which might be computed in constant time.
So anyway, what is slicing? The static slice of a program variable, such as the feature f4, is the set of all actions that might affect the value of that variable. And the goal is to have as small a slice as possible. The standard way to compute it is using data and control dependencies. I'll show you the slice for feature f4, which is in the performance model.
Clearly the statement that writes to f4 has to be in the slice, and then -- I'm going to go a bit faster with this -- this is how we compute data dependencies. So, for example, here you refer to variable t, and it was written there, so you have to include that. There are nontrivial dependencies, such as here: not only is b written here and read there, so you need this dependency, but you also need a pointer analysis which tells you other places where you could have written data on which this statement is dependent.
There are control dependencies as well, which I won't go into; that's why these for loops are included. This is the slice for this feature. Notice that what it has sliced out is the part of the code which builds the floors and elevators. In a real program, this will in practice slice out large parts of the computation for any given feature.
Now, what Manuel asked: what if a slice is expensive? There can be two reasons why a slice is expensive: either we are doing imprecise slicing, or the feature is inherently difficult to compute -- maybe it's a loop counter which is dominating the running time.
So what do we do then? First of all, how do we even measure the cost of a feature? We have this offline data, so we can simply run the slice on each of these inputs. We then know the exact running time of the slice, and we can ask: is it more than ten percent of the program's running time? We can set any threshold the user is willing to have; we set it at 10 percent. If it is more than 10 percent on any input, we simply throw this feature out and repeat the process, starting with regression. This time regression will not be allowed to use that feature, say f4, but it will have access to many other features, presumably almost equally good ones. With each iteration the accuracy of prediction drops, because regression is now denied the use of the features most correlated with running time. But, on the other hand, the cost of prediction itself is dropping, because it's picking features whose slices are cheaper.
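The iterative loop just described could look roughly like the following sketch (the helper interfaces and names are hypothetical; Mantis' actual implementation is not shown in the talk):

```java
// Sketch of the iterative feature-rejection loop: redo sparse regression until every
// chosen feature's slice costs at most a threshold fraction of the program's run time.
import java.util.*;

public class FeatureRejectionLoop {
    interface SparseRegressor { List<String> chooseFeatures(Set<String> banned); }
    interface SliceCost { double maxCostFraction(String feature); } // max over training inputs

    static List<String> selectCheapModel(SparseRegressor reg, SliceCost cost, double threshold) {
        Set<String> banned = new HashSet<>();
        while (true) {
            List<String> chosen = reg.chooseFeatures(banned);          // e.g., Lasso minus banned features
            Optional<String> tooExpensive = chosen.stream()
                    .filter(f -> cost.maxCostFraction(f) > threshold)  // e.g., threshold = 0.10
                    .findFirst();
            if (tooExpensive.isEmpty()) {
                return chosen;              // every chosen feature has a cheap slice
            }
            banned.add(tooExpensive.get()); // reject it and rerun the regression
        }
    }
}
```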
Let me step back a bit and show you where program analysis and machine learning interact here. We have dynamic analysis here, which is instrumenting the program and so on. We have machine learning here doing regression, and we have static program analysis here doing slicing. These are not just three pieces which are loosely connected; they're actually tightly coupled. For example, the dynamic analysis provides training data for the machine learning. The machine learning builds a performance model and gives certain features to the static slicer. The static slicer produces slices whose cost is determined from the profile data, and features whose slices are too expensive get rejected. This is where you see the iterative process.
We've run Mantis on some real-world programs; I'll show you one of them, which is Lucene, an open-source text search engine. The dataset we used was the works of Shakespeare and the King James Bible. We used a thousand different inputs. Each input is a list of words to search for in these datasets, returning statistics such as the frequency and so on. We used a hundred of these for training and the remaining 900 for evaluation or test. As you can see, Mantis instrumented 6,900 features, but regression only considered 410 as serious contenders.
What that means is the rest either stay constant, have no variability across runs, or they're weakly
correlated with running time and so on. But, of course, not all 410 of these features are equally
cheap to compute. The iterative process finally chose two out of these 410 as having a good
trade-off between prediction accuracy and prediction costs.
>>: What were they?
>> Mayur Naik: The two features: one was like the N you mentioned. So the input is actually a set of files, each of which has a bunch of queries, one per line. What this feature does is count the total number of queries in these files. The interesting part is the slicing. The way Lucene is written, it picks one query from each file at a time and immediately searches for it in a database; it has already built the database and indexed it up front. What the slice will do is slice out the indexing part as well as the search in each iteration; if you were not to slice those out, you'd pretty much be running the entire Lucene engine. The goal of slicing is to remove the computation which gets in the way of evaluating a feature.
The other feature was just the number of threads spawned; Lucene is multi-threaded.
So --
>>: What was the other feature again?
>> Mayur Naik: It is the number of threads. It's actually not exactly the number of threads, but it's correlated with that, because you can spawn a bunch of query processors to --
>>: That's related to the number of queries in the input.
>> Mayur Naik: It is, but beyond some point more threads won't give you speedups, and how many threads you spawn still affects the running time. It's another input besides the query files.
Here are two graphs, one showing prediction accuracy and the other showing prediction cost. Here we have 900 points, one for each of the 900 test inputs. Each point's intercept on the X axis is the time predicted by Mantis on that input, and its intercept on the Y axis is the actual running time on that input. As you can see, these points more or less lie on the 45-degree line, which means we're doing close to perfect prediction. In fact, the error is just 4.8 percent on average over the 900 inputs.
Looking at the cost of prediction, what we have is a CDF of the execution time, both of the total program, which is shown in red, and of running the two slices for these two features, which is shown in blue. As you can see, even though the running time has a lot of variability, the slices remain more or less constant. It's actually not constant; it's linear in the size of the input, and you can see a slight curve here. The point is that the slices give a 28X speedup: the slices execute in just under 4 percent of the total program's time in order to estimate its running time.
So that was all I'm going to talk about performance prediction, yeah.
>>: Then you can answer questions. I was looking at the Mantis paper. Is this based on least-squares optimization?
>> Mayur Naik: Yes, we use a technique called Lasso. Do you want me to go into that?
>>: No, I want to check the assumptions, which is, if you have two -- how well do these methods capture feature interaction? So if you -- well, I will elaborate on that question later on; it's not a precise question. But what I didn't get in the transition from the first to the second part was: here you predict runtime and performance, but in the first part you would predict energy consumption.
>> Mayur Naik: Yes. The two are often correlated. It's one example of a performance metric.
>>: Cell phone energy consumption can come from the [inaudible] units. You have various parts
of your device which contribute to the energy consumption, if you're looking at the program.
>> Mayur Naik: We can always model more features. So for the external things which affect energy consumption, we don't have to stop at 6,900; you can add a few more features and still model that.
>>: I actually found it very interesting that for programs there are very few features that are good features. Did you find that for this problem?
>> Mayur Naik: I would take that with a pinch of salt, because these runs are also generated by us. What I notice is that more often it's trying to fit the training data that we have. Ideally we would get data from the wild, and even if that had a certain profile, that's fine; then that's what the data was.
So it is always a small number of features. We have run this on six different programs. And it
turns out that even if you go beyond cubic models, if you allow many more features, many more
terms, the accuracy doesn't really improve.
>>: So it's actually -- so for performance debugging, then, could I use this technique to sort of focus on -- if I have to improve the performance of this program, could I use these features to say these --
>> Mayur Naik: I don't know. I think this information would -- I think one of the goals is performance debugging, but we really haven't applied it to that. Because this is capturing the dominant running time, you would probably already know that. So if you didn't know it, then knowing it probably wouldn't tell you how to speed it up.
>>: In your experiments, did you find other mixed terms at times?
>> Mayur Naik: Yes.
>>: In the profile.
>> Mayur Naik: Yeah.
>>: Can you compare and contrast with the work that Chan [inaudible] are doing on prediction?
>> Mayur Naik: I know vaguely about this work. I think the key difference, as I mentioned up front, is that ours is not domain-specific. If I remember correctly, what he's trying to do is model compiler optimizations, various flags to JVMs and so on. So he picks features such as those, and depending on what flags are used to compile a program, he wants to estimate how long the compiler might take. I think that's what he's trying to do.
>>: And the analysis -- I think it's closer to your work than anything.
>> Mayur Naik: Yeah, he's probably the closest -- we do cite his work, but it's not at the top of my head right now. I guess one thing is certain: he doesn't use program slicing. Maybe he runs the program and hopes that all the features correlated with running time are there up front, which is the case for most programs -- they read [inaudible]. Ours is more for programs that are lazy and might read inputs, environment variables.
>>: I think some of the -- I think one of the things he looks at are [inaudible], so he has a notion of features kind of like what we're talking about. But the flip side is he's doing things where he can predict specifically where in the program the time is going to be spent. And the result is they can improve performance by focusing the optimizer on the part of the program with the most time.
>> Mayur Naik: I see.
>>: Follow up on this, the programs or inputs [inaudible] accuracy.
>> Mayur Naik: It is. Actually what we have noticed here is that the bias is -- I just want to be careful as well. We have run this on programs that actually take exponential time, for example a [inaudible] program or SAT solvers. What it typically does is sacrifice accuracy by a huge amount for these outliers and then just go with the more dominant trend in the training inputs. So I should really provide error bars for that, but I don't have them.
>>: [inaudible] which is picking up a data structure for sort of keywords. Enabling search to back
end [inaudible] presumably in memory table, right. But I guess I'm going back to your previous
point, so for inputs that essentially you say wow this is within five percent that's great. But would
there be cases where you would really be off but we're distinguishing where this is, the case
getting back [inaudible].
>> Mayur Naik: So all this is future work. I agree it's something we don't do right now.
So let me come to the final part, which is scalable program verification. I'll use Mantis itself to motivate this problem. Notice that I talked about slicing earlier; using just data and control dependencies was fine as long as the program was single-threaded, but in fact this program is multi-threaded. What this means is you have to be careful to ensure that the data and control flow of one thread -- in this case the main thread, which contains the slice -- is not affected by the actions of other threads, in this case the elevator threads which are spawned.
So what you have to do now, to do sound slicing in the presence of concurrency, is either show that all of these actions only touch thread-local data, that is, data visible only from the main thread, or include all the other threads' actions in the slice, because you don't know whether the elevator threads might actually affect the value of f4. This problem of proving noninterference between actions of different threads is well known in concurrency; it has many applications, race detection being one. And one way to prove that a pair of actions is noninterfering, or race free, is to prove that the actions involved touch only thread-local data.
That's what I'm going to focus on in this talk. In the literature this is called the thread escape analysis problem. We are going to phrase it in terms of queries. We'll take a pair (v, p), and this query will be true if, for all inputs of this program, whenever a thread reaches program point p, the variable v is pointing to an object that's not reachable from any thread other than the current thread.
To give you the elevator example, let's say this is the program point p and this is the variable v, which is a button-press event.
I'm going to show you just one program state that arrives at p on one input, but the reasoning will be similar for other states.
So here's how the data structures built by this program look at program point p -- one snapshot. I'm going to use red for shared locations and blue for local locations. As you can notice, the building and the floors are shared between the two elevator threads and the main thread, but this part of the data structure is local to the main thread. So in particular v points to a button-press location, which is local to the main thread, and we want to be able to prove this. It's easy to see here that this query is true, because no matter how many button-press events you create in the input or how many floors you create, v is always going to point to a local location whenever the main thread reaches p.
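To make the query shape concrete, here is a tiny, hypothetical Java example (not the actual elevator benchmark): the pair (event, P) below should be provable, because the array that event points to at P never becomes reachable from the spawned thread.

```java
// Hypothetical illustration of a thread-escape query (v, p).
import java.util.*;

public class EscapeDemo {
    static final List<int[]> sharedFloors =
            Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) {
        sharedFloors.add(new int[] {1});                      // this object escapes:
        new Thread(() -> sharedFloors.forEach(f -> f[0]++))   // the spawned thread can reach it
                .start();

        int[] event = new int[] {2, 1, 5};  // program point P: `event` is thread-local here,
        process(event);                     // so the query (event, P) should be provable
    }

    static void process(int[] e) {
        // only the main thread ever sees e
    }
}
```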
So in order to prove this, we need to use a static analysis, and I'm going to go a bit fast here because people here know what static analysis is. All static analyses need abstraction, and there are certain reasons why we need that. Most static analyses, including ours, abstract two things: one is pointer locations and the other is control flow. And both of these are statically unbounded in real-world programs because of, say, dynamic memory allocation, recursion, and loops.
So pointer abstraction is -- there's a whole field called pointer analysis which focuses on this. I'm
going to give you a flavor of some of the abstractions in this field.
I'm going to start with a trivial abstraction, which is going to use one abstract location to model all concrete locations. Clearly this cannot prove the query. Why is that? Because the static analysis cannot distinguish this local location from, say, this shared location, and so it has to assume that v might point to a shared location.
Let's look at another abstraction which is slightly better, what is known as the allocation-site pointer abstraction. What it means is that it abstracts all locations created at the same allocation site using a separate abstract location.
In this case again we are unable to prove the query, but for a more subtle reason. The confusion now is in this abstract location: these two locations are being confused. And because thread-sharedness is a transitive property, you have to assume that everything reachable from a shared location is also shared, and so we cannot prove that v points to a local location.
And this goes on. There are other kinds of abstractions -- for example, k-object-sensitivity, which finally does prove this query -- but let's see where all this is headed.
So we saw three different kinds of abstractions. As you go to more and more sophisticated ones, you are able to prove more and more queries -- you're being more and more precise -- but you're also being less and less scalable. For example, this one is constant, this one takes a linear number of abstract locations, and this one is exponential in k. And in practice you can rarely go beyond k equal to 1 for real-world programs.
And that is just the pointer abstraction part of the story. There's also control flow, which I mentioned. We have notions called flow and context sensitivity, and again you see the same trend here, except it's much worse this time. What you see here is that if you want a fully flow- and context-sensitive analysis, then the overall analysis becomes exponential in the number of abstract values. Okay. So it turns out that for thread escape analysis we will have to be flow and context sensitive, and what that means is we now can't even use allocation sites, because that's a linear number of abstract values and the total analysis would become exponential.
So what that means is that we are limited to a constant number of abstract values, and our static analysis, as I'll show you, is going to have just two partitions. So we are saved from that exponential blowup. There's still some exponential blowup -- it's in the number of fields -- but I'm going to show you that in practice it's a small constant number of fields that matters.
One interesting thing you'll notice here, which was not there before, is that you have a dependence on the number of queries; it's linear. What we're going to do is run the static analysis separately for each query. I've shown you one query, but in fact we have thousands of them. This is the reason why we are able to use two partitions and eliminate the exponential dependence on N. It's also the reason why S, the number of fields we are going to track, is very small: because we are focusing on one query at a time.
>>: What is L.
>> Mayur Naik: L is the number of program points. Because it's flow sensitive you have to keep
a separate state at each point. And F is the number of instance fields.
So one drawback existing approaches suffer from is that different queries, coming from different parts of the program, clearly need different data structures to be abstracted precisely. But existing static analyses mostly use a single abstraction A to prove all queries simultaneously. What that means is you either use a very precise abstraction that proves many queries but is not scalable, or you use something that's highly scalable but doesn't prove most queries.
We have two insights. The first one is client-driven static analysis. This is a known concept in program analysis, but I'm going to show you how we apply it to thread escape analysis. First, we're going to be query-driven. What that means is we're going to run a separate analysis, conceptually, for each query. Secondly, we are going to be highly parametric. The analysis itself will be dumb; it's going to say: you give me a hint about what abstraction I should build. And we are going to choose highly flexible parameters. Here you can imagine there are five program parts that need to be modeled with varying amounts of precision, but in practice we'll have thousands or tens of thousands of different program parts, so that each query can be highly specialized to abstract only the small number of program parts
that really matter. Okay. Coming to our thread escape analysis, the parameter here has one bit per allocation site in the program. This one has seven sites; in practice there are thousands. And we can tell the static analysis to treat sites one, five, six and seven precisely -- that means bit one, and I'm going to denote it by the tan color -- and the rest imprecisely, which means I'm going to use the white color.
What this induces is the following abstraction. Okay. So now this is not an object allocation-site abstraction; it is something that can be understood in separation logic, three-valued logic and so on. I'm not going to go into those details, but this is what the static analysis ends up computing if it gets this parameter. As you can see, this actually proves the query. The reason is that you don't see any shared locations here, nor do you see an edge from any of the shared locations into this partition. This is the relevant partition and this is the irrelevant one.
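One way to picture the parameter is as one bit per allocation site that maps every object into one of just two abstract partitions; a toy sketch (illustrative names only, not the analysis' real data structures):

```java
// Toy sketch of the per-query, two-partition parameterization: sites with bit 1 are
// tracked precisely for this query; everything else is lumped into one partition.
public class TwoPartitionParam {
    enum Abs { TRACKED, LUMPED }

    static Abs abstractLocation(int allocSite, boolean[] param) {
        return param[allocSite] ? Abs.TRACKED : Abs.LUMPED;
    }

    public static void main(String[] args) {
        boolean[] param = new boolean[8];                 // sites 1..7 for this query
        param[1] = param[5] = param[6] = param[7] = true; // the sites deemed relevant
        System.out.println(abstractLocation(6, param));   // TRACKED
        System.out.println(abstractLocation(3, param));   // LUMPED
    }
}
```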
So the second insight -- this is all static analysis so far; I'm going to show you how we use dynamic analysis now. The challenge here is these parameters: first of all, we have thousands of queries, and for each query we have a parameter which has thousands or tens of thousands of bits. To make matters worse, most choices are not going to work: either they're going to fail to prove the query -- that is, they're going to be imprecise -- or they're not going to scale. There's still an exponential dependence on the number of fields, and that really comes from how many sites you deem as relevant to proving the query.
So our solution to finding these sites faces three challenges: how do you efficiently find these bits; furthermore, there should be as few bits set as possible, so that the abstraction is cheap and scalable; and furthermore they should suffice to prove the query. What we're going to do is use dynamic analysis: we're going to run this program, take all the queries, observe how they behave, and come up with values for these parameters for each query.
I'm going to give you the highlights first: what are our main results. Clearly the procedure is efficient: we are linear in H as opposed to exponential. The catch now is we might fail to prove queries. But it has nice empirical and theoretical properties. The empirical property is that this whole procedure is precise in practice; I'm going to show that the vast majority of queries are proven using the parameter values that the dynamic analysis gives. The theoretical result is that any bit that is set to one by this dynamic analysis must be set to one by any parameter value that is able to prove the query. So the dynamic analysis sets a bit to one only if it's absolutely certain that the only way to prove this query is to set that bit to one.
>>: Your dynamic analysis [inaudible].
>> Mayur Naik: Yeah.
>>: Or do you do collecting on top of that?
>> Mayur Naik: You could do those kinds of things as well, for efficiency or to mimic the static analysis.
>>: How far are you going in the static.
>> Mayur Naik: We aren't, actually. So we don't mimic it -- it does have some knowledge of the static analysis, but it doesn't do any abstraction. It's purely dynamic.
>>: But it does know the query.
>> Mayur Naik: It does know the query, of course. I'll show you in a moment what it does.
So what this means is that if we end up proving a query, we'll have a minimal abstraction: the smallest number of sites that need to be set to one to prove that query. So what does this dynamic analysis look like?
It starts with this vector. It assumes everything is irrelevant for each query, and I'm going to show it just for this one query. Every time it reaches the program point p, which is the query point, it looks at the heap. And this is already available in the JVM, or any managed language, so we're not building this out ourselves. It asks what v, the query variable, points to. If it points to a shared location, then we are actually done.
>>: Reachable.
>> Mayur Naik: We know what is reachable from multiple threads. We track that; there's an instrumentation that tells us that a location has escaped. So if it is shared, then we declare that there's no way to prove this query thread-local, because we observed it sharing. No abstraction is going on here.
But if it points to a local location, we go a step further, and what we ask is: what is the site at which that location was allocated? In this case it's six. And so we say that site has to be treated as relevant, because if you don't, then we know -- there's a theorem behind this -- that the static analysis would not be able to prove this query using the abstraction it gets. Just that site is not enough; we have to take the backward transitive closure, the way that thread escape analysis works. So we end up setting the bits for all the sites at which any location was allocated from which we can reach this location.
Even though it looks like four out of seven sites here, in practice you can think of this vector as having thousands of entries, and we still set only these four bits. And this is actually the way you would reason about this query as a human. So the reason we want minimal abstractions is not only that it's scalable but also that it's the way we would reason about these queries if we were asked to prove them.
I'm going to show you some benchmarks, some experimental results here. We have a Web crawler, Lucene search, a database program, a microcontroller simulator, and a [inaudible] rendering system. The benchmarks are up to half a million bytecodes. As I said, we have up to 6,100 allocation sites and up to 14,400 queries. These queries are essentially heap reads and writes in the program -- all instance field reads and writes.
So here we are going to compare our approach, which is this two-partition pointer abstraction with flow- and context-sensitive control-flow abstraction, against an analysis which uses the allocation-site pointer abstraction but is flow and context insensitive. We've tried the other point too, but our analysis doesn't even terminate using allocation sites with flow and context sensitivity for any of these benchmarks. It's really the two-partition abstraction which enables flow and context sensitivity.
The previous approach, as you see by the bars here, proves only 27 percent of these queries on average. But our approach ends up resolving 82 percent of these queries -- the red plus blue parts. The red parts are the simple part: because we do dynamic analysis, we can observe queries which are escaping; those 27 percent, the red parts here, are ones that no sound static analysis can prove. The blue parts are more interesting: they are the cases where the dynamic analysis made a guess for the parameter value and the static analysis was actually able to prove the query. That's 55 percent of the queries, leaving the remaining 18 percent on average unresolved.
>>: [inaudible].
>> Mayur Naik: Also, I forgot to mention, you could run it multiple times and trivially combine the vectors, but we run it on one input because that's the only input we have. If we were to run it on other inputs, we'd really need very different inputs to exercise other parts. So just assume we run it once.
>>: [inaudible] the analysis, the one goes where in some part of the program, right?
>> Mayur Naik: So I have a reason for that. I think it's partly related to the properties we're after, which are very simple. We see this over and over again in program analysis: proofs for things like thread escape are really simple, and just one path is needed to observe them. If you ever reach the query point, the assertion point, you see the result, and you also see the reason why it didn't fail on that one path.
>>: [inaudible].
>> Mayur Naik: In that case, of course, this is not a proof for everything; this is why there are 18 percent false positives.
>>: [inaudible] what could be even.
>> Mayur Naik: I'm not showing you those query points; I'm glossing over that a bit. But if you don't hit an assertion we're not trying to prove it.
>>: So here we have the [inaudible] run first and then the static analysis takes over; there's no going back?
>> Mayur Naik: No. It's a passive use of dynamic analysis; going back and forth would be a kind of refinement and so on. We don't even have the infrastructure -- that would require driving the program across --
>>: I'm just asking, you have a step --
>> Mayur Naik: Okay. So the 18 percent false positives are mostly because of coverage
problems -- actually a very simple coverage problem. What it means is that the assertion was reached, but some site whose bit should have been set to 1 and deemed relevant was not even reached. What this means is that if you could simply at least reach all the sites -- you don't need to reach them in some fancy state, just hit them -- then there's a very high chance that we'll actually end up finding the proof.
By the way, one more thing I should mention: we're working right now on using machine learning for this. We now have all this data for 82 percent of the queries, and these are minimal abstractions, so we know the correlation between which sites are relevant to proving which queries.
So in the end the hope is to completely throw out the dynamic analysis and just use machine learning to predict which sites are relevant to proving which queries.
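One way to picture that idea, purely as a hedged sketch and not something that has been built: score each allocation site for a given query with a model learned offline from the minimal abstractions, and set the bits of the sites whose predicted relevance clears a threshold. The feature set, weights and threshold below are placeholders.

```java
import java.util.BitSet;

class RelevanceModel {
    private final double[] weights;   // learned offline from the minimal abstractions
    private final double threshold;   // relevance cut-off

    RelevanceModel(double[] weights, double threshold) {
        this.weights = weights;
        this.threshold = threshold;
    }

    /** featureVector[j] is the j-th static feature of a (query, site) pair,
     *  e.g. call-graph distance or whether the site's type matches the access. */
    double score(double[] featureVector) {
        double s = 0;
        for (int j = 0; j < weights.length; j++) s += weights[j] * featureVector[j];
        return s;
    }

    /** Predict the abstraction parameter directly, with no dynamic run at all. */
    BitSet predictAbstraction(double[][] featuresPerSite) {
        BitSet relevant = new BitSet(featuresPerSite.length);
        for (int i = 0; i < featuresPerSite.length; i++) {
            if (score(featuresPerSite[i]) > threshold) relevant.set(i);
        }
        return relevant;
    }
}
```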
>>: Doesn't machine learning require --
>> Mayur Naik: There will be features, intermediate features -- not loop counts, it will be more sophisticated. They could come from things like program analysis, points-to information and so on, so we will have to run some amount of analysis. But I don't have results yet. It's just a thought that --
>>: Why wouldn't you want to throw that in -- cheapest bang for your buck, right?
>> Mayur Naik: For programs which you can't run and so on. So let me quickly finish.
>>: You're fine.
>>: If there are programs you can run --
>> Mayur Naik: Okay. The queries you can't even reach -- okay. So for the running time breakdown, as you can see, the analysis we compare against is really cheap. It's a whole-program, flow- and context-insensitive analysis; it takes just one minute on the largest benchmark.
The dynamic analysis you can make arbitrarily cheap -- you can sample and so on -- so it doesn't take much time.
For the static analysis, the total time is quite significant, sometimes over an hour. But the key thing to notice here is that we are doing a static analysis separately for each query, and what I'm showing you here is the time to run the static analysis sequentially for all queries.
In fact we do something slightly smarter: after the dynamic stage we group queries that have the same parameter value, that is, the same abstraction. That gives a big reduction. What I'm also showing you here is the mean time the static analysis took for any query group, and the max. You can see the max is just 21 seconds, which means that if you were to spawn all of these static analyses in parallel you'd be done in 21 seconds.
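A sketch of that grouping and scheduling, reusing the same hypothetical Query and StaticChecker types as before. The real system shares the static analysis work within a group; this sketch only illustrates grouping queries by their guessed bit vector and running the groups in parallel on a thread pool.

```java
import java.util.*;
import java.util.concurrent.*;

class QueryGroupScheduler {
    /** Queries whose dynamic run guessed the same bit vector share one group. */
    static Map<BitSet, List<Query>> groupByAbstraction(Map<Query, BitSet> guesses) {
        Map<BitSet, List<Query>> groups = new HashMap<>();
        for (Map.Entry<Query, BitSet> e : guesses.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** One task per group; groups are independent, so they run in parallel. */
    static Map<Query, Boolean> proveAll(Map<Query, BitSet> guesses, StaticChecker checker)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Map<Query, Boolean>>> tasks = new ArrayList<>();
        for (Map.Entry<BitSet, List<Query>> g : groupByAbstraction(guesses).entrySet()) {
            BitSet abstraction = g.getKey();
            List<Query> members = g.getValue();
            tasks.add(pool.submit(() -> {
                // Check every member of the group under the shared abstraction.
                Map<Query, Boolean> res = new HashMap<>();
                for (Query q : members) res.put(q, checker.provesThreadLocal(q, abstraction));
                return res;
            }));
        }
        Map<Query, Boolean> proven = new HashMap<>();
        for (Future<Map<Query, Boolean>> f : tasks) proven.putAll(f.get());
        pool.shutdown();
        return proven;
    }
}

// Hypothetical types, as in the earlier sketch.
interface Query {}
interface StaticChecker { boolean provesThreadLocal(Query q, BitSet relevantSites); }
```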
>>: The total time is -- grouping already?
>> Mayur Naik: The total -- well, this is running it for each query group serially, so it includes the grouping already. As an example, if you start with 14,000 queries we end up with, say, 400 groups or so. I have the numbers, but not here.
>>: What engine is the static analysis?
>> Mayur Naik: It's [inaudible]'s algorithm, which does a fully context-sensitive and fully flow-sensitive static analysis. I think it's the same one that's used in SLAM as well.
>>: Reachable?
>> Mayur Naik: Yeah.
>>: It's field sensitive?
>> Mayur Naik: Yes.
>>: A specialized program? Because my impression from the slides and from [inaudible] is that if you use a Datalog engine -- you can essentially express it in Datalog -- and you do the magic sets transformation, you get the same slice based on the queries.
>> Mayur Naik: I don't think so. We actually do all this; I can talk about all those things offline. But this is not -- we don't really care. I don't think I have a good --
>>: You're not going to learn all the features.
>>: Right. The transformation at least clearly provides the -- [inaudible] it's not clear how much [inaudible] it's set to. Allocations.
>>: Maybe we should take this offline --
>> Mayur Naik: So just to show the sparsity of the abstraction: how many sites were deemed relevant? Here you can see the total number of sites, again up to 6,000, for all queries for which the dynamic analysis guessed these parameter values. You can see the mean is quite low, which means few sites, few things, are relevant to proving any query on average. The max can be pretty high.
But if you look at the queries which were actually proven, as opposed to all those for which the dynamic analysis made a guess, the numbers drop. The more things that are deemed relevant, the higher the chance that the dynamic analysis loses on coverage -- it won't see some of the things which are relevant. So for the queries which end up being proven, these numbers are slightly lower.
It's still impressive, though, that, for example, in a case where there were 31 sites which needed to be set to 1, the dynamic analysis actually saw all of them. And notice that if you were to flip the bit for any of these 31 sites, the static analysis is guaranteed to not be able to prove this query.
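That flip-any-bit property is what minimality means operationally. A small sketch, again over the hypothetical Query and StaticChecker types, that checks it by clearing one bit at a time:

```java
import java.util.BitSet;

class MinimalityCheck {
    /** True if every relevant site is necessary: clearing any one bit breaks the proof. */
    static boolean isMinimal(Query q, BitSet relevant, StaticChecker checker) {
        if (!checker.provesThreadLocal(q, relevant)) return false;    // not even a proof
        for (int i = relevant.nextSetBit(0); i >= 0; i = relevant.nextSetBit(i + 1)) {
            BitSet smaller = (BitSet) relevant.clone();
            smaller.clear(i);                                         // flip one bit to 0
            if (checker.provesThreadLocal(q, smaller)) return false;  // proof survived: not minimal
        }
        return true;
    }
}

// Hypothetical types, as in the earlier sketches.
interface Query {}
interface StaticChecker { boolean provesThreadLocal(Query q, BitSet relevantSites); }
```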
>>: Have you tried running the experiment where, since it's a matter of the time it takes to analyze the whole set of queries, you cut off at 31 and say, okay, if it needs more than 31 I'm not going to run it?
>> Mayur Naik: That's an interesting point. I haven't tried that.
>>: So, some stuff that we used in the optimization using --
>> Mayur Naik: But --
>>: It's not going to work out, and so you're going to cut off the query time.
>> Mayur Naik: The question is -- you're not asking how to choose it, because choosing which 31 would be --
>>: Not which 31, just have a cut-off in terms of the number of features that were picked, and say if it's more than 30, let's not even run the query, because the likelihood of proving it is low.
>> Mayur Naik: If it needs more than 30, then it actually needs more than 30, of course. We could filter those out.
>>: Analysis question. So for your benchmarks, what percentage of [inaudible].
>> Mayur Naik: So this is not just what is really shared between threads; anything reachable from a global is also considered thread-escaping.
I'll show you the numbers: 55 percent is what we could prove, and my guess is that those 18 percent are all false alarms. So 55 plus 18.
>>: These are guaranteed not to have any threads from --
>> Mayur Naik: Around 70 percent is what I see.
>>: It's not even local. It's not even locking.
>> Mayur Naik: These are not really shared, just reachable from a global. Different programs have different styles for what they store and make reachable from a global, even when it's a completely single-threaded program; even for the single-threaded programs here you can see what is reachable.
>>: It's not current data streams. It's program points across --
>> Mayur Naik: It's static.
>>: [inaudible].
>> Mayur Naik: Okay. So we have implemented this thread-escape analysis and applied it not just to static slicing, but also to concurrency checkers: data race detection, reduction proofs, and recent deadlock and consistency checkers. All of them require thread-escape analysis. This is actually early work from my Ph.D., where we ran these on real-world programs. And these are the kinds of reactions we get, which brings me back to the comment I showed earlier from that Java concurrency expert: most Java programs have concurrency bugs.
Some projects actually shut down after we found so many bugs that they believed their synchronization was either [inaudible], but these are the ones that actually survived.
>>: How do you use this to detect bugs?
>> Mayur Naik: These conditions -- the strategy we have in all these tools is to prove as many accesses race-free as possible and then report everything else to the users, and the hope is to make that remainder really small.
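A hedged sketch of that prove-and-report-the-rest strategy; the Access type and the filtering are illustrative only, and a real checker would also apply lockset and may-happen-in-parallel reasoning on top of the escape proofs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RaceReporter {
    /**
     * Report every pair of accesses to the same field with at least one write,
     * unless either access was proven thread-local by the escape analysis.
     */
    static List<String> report(List<Access> accesses, Set<Access> provenThreadLocal) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < accesses.size(); i++) {
            for (int j = i + 1; j < accesses.size(); j++) {
                Access a = accesses.get(i), b = accesses.get(j);
                if (provenThreadLocal.contains(a) || provenThreadLocal.contains(b)) continue;
                if (!a.field().equals(b.field())) continue;   // different memory locations
                if (!a.isWrite() && !b.isWrite()) continue;   // two reads cannot race
                warnings.add("possible race: " + a + " and " + b);
            }
        }
        return warnings;
    }
}

// Hypothetical representation of a static heap access (a field read or write).
interface Access { String field(); boolean isWrite(); }
```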
>>: Just to push back -- if you find so many bugs that they shut it down, who was using that stuff?
>> Mayur Naik: No, no. These are projects that people actually use. So jTDS, for example, is a Java JDBC driver which they claim is the fastest one for Microsoft SQL Server. You can see the sources for these projects; they're widely used.
Many of these bug fixes weren't made by the project developers; they were made by others who were using these libraries.
>>: I understand. I've just never heard of a project shut down because somebody's program analysis finds so many bugs that it has to be shut down. Usually if it has so many bugs while people are using it, the users experience the bugs. Otherwise these bugs aren't [inaudible].
>> Mayur Naik: So it's true.
>>: That's the kind of thing. That's why I'm surprised.
>> Mayur Naik: There are two that were shut down. There were probably other reasons to shut down [laughter], but this is one of the reasons where --
>>: Violation of --
>> Mayur Naik: They didn't close the bugs --
>>: They used this? This is the first I've ever heard of -- let me see your programs: completely shut down, there are so many bugs in it, my program analysis --
>> Mayur Naik: Okay. So I started with partitioning, which was about partitioning programs, and I told you how it can use call graph analysis to decide what to migrate. I said that if you want to do dynamic partitioning you can do performance prediction, which in turn requires slicing; and because slicing needs data and control dependencies, you need a call graph and you need a pointer analysis. If you're going to slice concurrent programs, you need a race detector. And that of course needs various analyses, different ways to prove things race-free. I went into detail for the thread-escape analysis. And as I think someone mentioned -- Modan -- these have many other uses beyond the ones they were intended for. I've given some examples here.
And all of this is actually publicly available, and the pieces are actually integrated this way, using the program analysis platform I've been building over the last several years, which was recently presented at PLDI this year. Let me tell you a few details about this platform.
So of course each analysis is written in isolation, so once it's written you can reuse it for various purposes other than the one for which it was originally intended.
And each analysis builds upon others that other people have potentially built. The dependencies you see here are data and control dependencies. Even though this looks like a DAG, there are actually cycles. And even within a single analysis there are different queries, so there's concurrency within each block as well.
These dependencies are given semantics in a declarative parallel language called CnC, or Concurrent Collections, which Intel and Rice are jointly developing. The runtime we use is Rice's Habanero Java, built on top of X10. Why do we need a parallel runtime? Because there's a lot of parallelism in program analysis, whether on a multicore machine or, even better, on a cluster.
So what are the cool things we can do once we expose these dependencies between program analyses?
One, we can do demand-driven computation. We can say we want to run the race detector in Chord, and it will automatically figure out that it needs to run these four analyses.
The second is reuse of results: if you ask for the call graph multiple times, it will be computed just once.
Another is running independent things in parallel, not just at a coarse-grained level, such as these two analyses, but even within the thread-escape analysis, where different query groups can be scheduled in parallel. All of this happens despite having loops and so on; we have iterative refinement analyses which induce loops.
And despite all the parallelism, we also have a guarantee of determinism, because of CnC's dynamic single assignment form. So no matter how many times you run Chord, it might choose different strategies to schedule things each time, but at the end you're still guaranteed the same result.
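A much-simplified sketch of those scheduling behaviors -- demand-driven runs, memoized results, and independent dependencies running in parallel -- and not the actual Chord/CnC runtime; it assumes an acyclic dependence graph (the real system also handles the loops just mentioned), and all the class names are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

// Each "analysis" declares what it depends on; results of dependencies flow in as inputs.
abstract class Analysis {
    final List<Analysis> deps;
    Analysis(Analysis... deps) { this.deps = Arrays.asList(deps); }
    abstract Object run(List<Object> depResults);
}

class DemandDrivenRunner {
    // Unbounded pool so a task blocked on its dependencies cannot starve them.
    private final ExecutorService pool = Executors.newCachedThreadPool();
    // Memo table: each analysis gets exactly one Future, so results are reused.
    private final ConcurrentMap<Analysis, Future<Object>> memo = new ConcurrentHashMap<>();

    /** Ask for one analysis: its transitive dependencies run on demand, each exactly once. */
    Future<Object> demand(Analysis a) {
        return memo.computeIfAbsent(a, x -> pool.submit(() -> {
            List<Future<Object>> pending = new ArrayList<>();
            for (Analysis d : x.deps) pending.add(demand(d));  // independent deps run in parallel
            List<Object> inputs = new ArrayList<>();
            for (Future<Object> f : pending) inputs.add(f.get());
            return x.run(inputs);  // the result is written once and never recomputed
        }));
    }
}
```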
>>: Java [inaudible].
>> Mayur Naik: So all of this is Java. Habanero Java is an extension of Java.
>>: Is there a top --
>> Mayur Naik: CnC is just a programming model. It has Java, C++ and Python implementations; you can implement it for any language you want. It's about connecting boxes and arrows and giving them dependencies.
>>: The boxes, they're in your extension?
>> Mayur Naik: So I may have confused you by suggesting that something happens inside this process box. All of it is exposed outside. Think of each box as something written in Java or Datalog or C++; CnC doesn't even see what's happening in there, as long as it doesn't have side effects -- otherwise the dynamic single assignment property doesn't hold.
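To make the single-assignment point concrete, here is a toy put-once item collection in the spirit of CnC (this is not the CnC API): each tag can be written exactly once and reads block until the value arrives, which is why side-effect-free steps give the same result under any schedule.

```java
import java.util.concurrent.*;

class ItemCollection<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> items = new ConcurrentHashMap<>();

    private CompletableFuture<V> cell(K tag) {
        return items.computeIfAbsent(tag, t -> new CompletableFuture<>());
    }

    /** Dynamic single assignment: a second put on the same tag is an error. */
    void put(K tag, V value) {
        if (!cell(tag).complete(value)) {
            throw new IllegalStateException("tag written twice: " + tag);
        }
    }

    /** Blocks until some step has put the item for this tag. */
    V get(K tag) throws InterruptedException, ExecutionException {
        return cell(tag).get();
    }
}
```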
So we have built several systems, tools and frameworks using Chord that other program analysis researchers can use. I just described three of them here: CloneCloud, Mantis and the race detector we have.
There are people outside program analysis, such as in systems, who are using Chord. For instance, here's a student at U.C. Berkeley doing his dissertation on how to automatically mine configuration options and their types from complex systems software.
Hadoop, for example, is one of his experimental benchmarks; he's trying to figure out what the configuration options are for these pieces of code.
Why is he doing this? Because there's a company called Cloudera in the Bay Area which offers Hadoop, this MapReduce framework, as a service. When they get bug reports they don't know what configuration people used, and there are several versions of Hadoop lying around. If someone can mine these options and automatically determine their values, then reproducing and debugging would be much easier for Cloudera.
I don't think I'll go -- should I go into this? So what I talked about today was how we can use automatic techniques to solve some of these software challenges. I've just scratched the surface of what can be done by combining program analysis and machine learning.
Some interesting reasons why machine learning should be used: one is that we have exponential search spaces in many of these problems, like static analysis. Another is sparsity -- you saw it both in Mantis and in thread escape: very few things are relevant, whether to proving queries or to performance modeling and so on.
And often you have incomplete or noisy data, for example in performance modeling, and in earlier work I did on cooperative bug isolation.
So for these three reasons I believe machine learning holds promise for solving program analysis problems.
But I don't believe computers can solve all these problems automatically, so we need better languages and models. CnC was one model where you saw we could really leverage the concurrency in Chord. There are various flavors of this: we can extend languages, restrict them, or create new languages. CnC actually falls into the last category; it's a new language.
And finally I'd like to exploit domain knowledge. Even though most of the things I've shown you are general purpose, I believe one can use a lot of domain knowledge to make these problems more tractable.
So with that, I'll conclude. Modern computing platforms have these exciting software engineering challenges, and we can combine these various technologies to solve them effectively. And one thing we noticed is that program analyses can be used to solve problems they weren't intended for; for example, slicing can be used for performance prediction. You can get Chord from this website. Thank you for your attention.
>> Tom Ball: Thanks, Mayur.