>> Tom Ball: Hello. Good morning. And thank you for coming. I'm really pleased to have Mayur
Naik back. He's been here quite a few times and is well known to us. He was an intern on the
SLAM project back in the summer of 2002, if I remember right. And we had a great time with
Mayur. In fact, in that summer I think he started in May, and we got some results and sent off a
paper to POPL in July and it got accepted.
So it was probably the shortest time from paper idea, to doing the work, to deciding we had enough, sending it off, and getting it accepted.
Yeah, it was really great. And Mayur graduated from Stanford with Alex Aiken, did a lot of work
on program analysis, and we'll hear a little bit about program analysis today.
Since then he's been at Intel Labs. He's done really seminal work on data race detection and concurrency bug detection in programs using static analysis, but he's broadened out from those
beginnings on defect detection and verification, and he's going to tell us today about some other
applications.
So Mayur, welcome back.
>> Mayur Naik: Thank you. So let me begin with the context for my research. So we have three
dominant computing platforms today. Parallel computing, cloud computing and mobile
computing.
And these platforms have certain characteristics that distinguish them from the traditional desktop
computing model.
For example, the hardware devices in these platforms are multitudinous, they're diverse and [inaudible], and they're geographically distributed. And these characteristics pose unprecedented
challenges in software engineering.
For example, in software reliability, programmer productivity and energy efficiency. So I'm going to talk about one challenge from each of these three kinds of computing platforms.
So first comes mobile computing. So Smartphones are ubiquitous today. And so are the apps
that run on Smartphones. For example, Apple recently announced their ten billionth app
download from its app store. But the problem with these Smartphones is that they cannot render rich, compute-intensive apps because of very limited battery, CPU, and memory.
There are many challenges, of course, in mobile computing. One is how can we seamlessly
partition these rich mobile apps and offload their compute intensive parts to the cloud. What I
mean by the cloud is even a more powerful computing device in the vicinity such as a desktop or
a laptop with which the Smartphone has a strong connectivity.
Speaking of cloud computing, it has taken off finally thanks to the growth of the Internet. And it
has its own software engineering challenges.
So, for example, cloud providers are expected to meet service level agreements. There are various
resource management challenges such as how can the cloud be energy efficient, how can we do
a better job scheduling so as to improve the throughput. How can we exploit data locality of jobs
in the cloud.
The challenge is: how can we automatically predict various performance metrics of programs, such as the running time, the energy consumption, or the data locality? It would help solve some of these resource management problems if we had a solution to this prediction problem.
And finally comes a challenge in parallel computing. Roughly around 2004, CPU speeds stopped
increasing and what we see is an increasing number of cores now. None of which is getting any
faster.
What that means is software will have to be rewritten in a concurrent fashion in order to take advantage of the performance benefits of these multiple cores.
But it's well known that writing concurrent software reliably is significantly harder than writing
sequential software, as this quote from an expert on Java concurrency shows.
So the challenge is how can we automatically make concurrent programs more reliable. So I'll
begin with some terminology here. So program analysis is a body of techniques for discovering
facts about programs. And we have two kinds of program analysis. We have dynamic analysis, which is a program analysis that runs the program to discover these facts, and we have static analysis, which doesn't involve running the program. So this talk then is about
synergistically combining diverse techniques, such as static program analysis, dynamic program
analysis and machine learning.
In order to solve some of these modern software challenges that I talked about such as program
scalability, program reliability and program performance estimation.
So I'll give you a preview of our results before proceeding to the key insights that helped us
achieve these results.
So the challenge in mobile computing is how can we scale mobile, rich mobile apps on resource
constrained mobile devices? And I'll show you how we have combined a static and dynamic
program analysis to seamlessly partition these mobile apps, and off load the compute intensive
parts to the cloud and thereby achieve up to a 20 X decrease in the energy consumed on the
phone.
In cloud computing, the challenge -- one of the challenges is how can we estimate various
performance metrics of programs, and I'll show you how we have combined static analysis,
dynamic analysis and machine learning to automatically predict the running time of general
purpose programs.
And we do this both accurately and efficiently. What I mean by accurately is that the prediction
error for benchmarks is on average less than 7 percent, and the cost, the prediction cost is on
average less than 6 percent of the total execution cost of these real world programs.
And finally for parallel computing, the challenge is how can we make these parallel programs
more reliable. And I'll show you how we have combined static and dynamic program analysis to
scalably verify concurrent programs.
In the process, it has exposed around 400 concurrency bugs in up to 1.5 million lines of code of widely used Java programs. And many of these bugs were fixed within a week of reporting by the
developers of these programs.
So what I'm going to do next is present the key insights behind each of these three problems,
which enable us to achieve those results.
And as I go from one problem to the other, the depth of the program analysis is going to
progressively increase.
So I'm going to start with seamless program partitioning, in the context of mobile computing.
So suppose we have a Smartphone and we want to run a rich app on it. What I mean by that is
say an app which has a compute-intensive function such as a face detection routine which
detects the faces in all images stored on the phone.
But if there's a more powerful device in the vicinity such as a desktop which this phone has good
connectivity then we might want to offload this computation to that device so that we can
conserve the energy that is going to be consumed if we run that compute intensive part on the
phone itself.
So we've built a system called Clone Cloud that allows you to do such offloading. And while
Clone Cloud is built for the Dalvik VM on the Android platform, this could be done for
Microsoft .NET or Apple's iOS as well. Any application layer would work.
Clone Cloud recognizes two kinds of instructions, offload and resume instructions. And the way
these instructions work is as follows. So when the app starts executing and reaches an offload
instruction, the state of the thread, which is executing the offload instruction, is migrated to the
cloud.
Execution of the compute-intensive part then continues there, and finally, when it hits the resume statement, the migrated state is shipped back to the mobile device and execution resumes on the phone.
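To make the mechanism concrete, here is a minimal, hypothetical Java sketch (the class and method names are made up; this is not Clone Cloud's actual API, and the real partitioner works on Dalvik bytecode rather than annotated source). The comments mark where a partitioner could place the offload and resume points around a compute-intensive function.

```java
// Hypothetical sketch of offload/resume placement around a compute-intensive method.
import java.util.List;

public class FaceSearchApp {
    // Runs on the phone: walks the image list and calls the heavy routine per image.
    static int countFaces(List<byte[]> images) {
        int total = 0;
        for (byte[] img : images) {
            // OFFLOAD point (entry of detectFaces): the thread's state may migrate to the cloud.
            total += detectFaces(img);
            // RESUME point (exit of detectFaces): migrated state is shipped back to the phone.
        }
        return total;
    }

    // Compute-intensive part; a pure function of its argument, so it is legal to offload.
    static int detectFaces(byte[] image) {
        int faces = 0;
        for (byte b : image) {
            if ((b & 0xFF) > 200) faces++;   // stand-in for a real face detector
        }
        return faces;
    }

    public static void main(String[] args) {
        List<byte[]> imgs = List.of(new byte[] {10, (byte) 220, 30});
        System.out.println("faces: " + countFaces(imgs));
    }
}
```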
There are many system level aspects we worked on but I'm going to focus on the program
analysis challenge here, which is how do we automatically find which functions in this app to
migrate?
And notice that this is not obvious, because even though you gain the power of a more powerful device to execute the compute-intensive functions, there's also the cost of shipping data back and forth, which itself consumes energy on the phone.
>>: I have a question about -- are you assuming that the app is single-threaded so there's no
interference from other kinds of things.
>> Mayur Naik: It could be multi-threaded, and execution continues on the phone until it touches any state which has been migrated -- any state reachable from the thread which we migrate. We do thread-level migration.
>>: At the point where it touches that state, does it get blocked until the resume happens?
>> Mayur Naik: The thread running on the phone itself will get blocked until the resume happens.
So we formulate this as a mathematical optimization problem using integer linear programming
and solve it offline using an off-the-shelf LP solver. So the constraints are obtained from static
analysis, and the objective function is obtained from dynamic analysis. So static analysis dictates which solutions are correct: where is it legal to put offloads and resumes? We allow them to be placed at the entry and exit of any function of the app.
An example of a constraint might be that certain functions have to run on the mobile device, such as functions that access sensors like the GPS or camera.
More interesting constraints are that we shouldn't have nested offloads and resumes. Offloads
and resumes should alternate on every execution path. We use a simple static call graph
analysis for that purpose.
Many solutions in practice are correct, but only a few of them will be optimal in the sense that they minimize the total execution time or the energy consumed on the phone. These are the two metrics we have implemented. So dynamic analysis determines the cost of each of these solutions, and then the ILP solver will choose an optimal one. For this purpose we use program profiles of this app from both the mobile device and the cloud.
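As a rough sketch of what such an integer linear program could look like (this is an illustration under assumptions, not necessarily Clone Cloud's exact formulation): let x_f be 1 if function f runs in the cloud and 0 if it runs on the phone, let E_f be the profiled energy of running f on the phone, and let C_{f,g} be the profiled cost of shipping data when the call edge f -> g crosses the partition.

```latex
% Illustrative ILP (assumed form, ignoring e.g. the phone's idle energy while waiting)
\begin{align*}
\min_{x,\,d} \quad & \sum_{f} (1 - x_f)\, E_f \;+\; \sum_{(f,g)\,\in\,\mathrm{calls}} d_{f,g}\, C_{f,g} \\
\text{s.t.}\quad & d_{f,g} \ge x_f - x_g, \qquad d_{f,g} \ge x_g - x_f
  && \text{(pay transfer cost when an edge crosses the partition)} \\
& x_f = 0 \quad \text{for every } f \text{ pinned to the phone (GPS, camera, UI)} \\
& x_f \le x_g \quad \text{for every call edge } f \to g \text{ in the static call graph} \\
& x_f \in \{0,1\}, \quad d_{f,g} \in \{0,1\}
\end{align*}
```

The last constraint is just one simple way to encode the call-graph legality rules: an offloaded function may only call functions that can also run in the cloud, so a callee pinned to the phone forces its callers to stay on the phone. The real constraints about alternating offloads and resumes are richer than this.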
So the program analysis here is very simple. But I'm going to show you some of the results we
have.
>>: Are the program profiles just control-flow profiles?
>> Mayur Naik: They measure the -- so we have a representation of the execution time of the program in terms of each function. So how much time did each function take, and then you can --
>>: So you're not breaking down data, the amount of data transmitted?
>> Mayur Naik: We also have the amount of data. So for each function we give both the execution time and the data, the input and output data that is needed. Because if you shift that function to the cloud you have to --
>>: So the program profile is time? And then also the amount of network access?
>> Mayur Naik: Yeah, we know in advance how much energy will be consumed on a given
network for each byte of data that's transferred. So just measuring the number of bytes that is
transferred is enough.
>>: So the methodology then, you just find a whole bunch of correct solutions and you run them through the dynamic analysis --
>> Mayur Naik: We are generating an integer linear program. I'm just telling you how the
constraints come, for the integer linear program. And how the objective function comes.
And once you have an integer linear program you just solve it using an LP solver and the output
tells where to put the offloads and resumes.
>>: Can you compare and contrast with the work on J-Orchestra?
>> Mayur Naik: So the novel aspects here are one is we do thread level migration. So native
state, for example, is not migrated for efficiency purposes.
Another aspect, I would say, is that we are doing this automatically. I don't know how J-Orchestra does it, but here the placement of offloads and resumes is completely transparent to the programmer. Yet another aspect is that, as I said, we allow other threads to continue on the phone. If it's a UI thread it can continue, as opposed to blocking.
So we have implemented compute-intensive apps ourselves on this Android platform. And I'll show you results for one of these apps, which is the face detection app I talked about. We have implemented others such as a virus scanning app and a behavior profiling app. So here the setting is a Google phone running the Dalvik VM on Android, and the cloud is a standard desktop running an Android VM on Linux.
So on the Y axis is the energy consumed on the phone which includes the energy to transfer data
back and forth.
We have done two sets of -- we've done many sets of experiments, and I'm just showing you two
extremes.
>>: [inaudible].
>> Mayur Naik: It's very similar to Java bytecode, slightly modified. So all the analysis I'm going to talk about is on Java bytecode.
>> Mayur Naik: So here are two sets of experiments, one is where one image is on the phone and another
where there are 100 images, so two extremes. And within each experiment, we have three
results. One is what happens if you run the app entirely on the phone. And another is what if you
use our system, Clone Cloud, and then there are two choices there, if you use a Wi-Fi network or
a 3G network.
And as you can see, for one image Clone Cloud decides not to partition the app because the cost
of data transfer is just not worth shipping the computation to the cloud.
Whereas the moment we go to 100 images the cost is amortized over the 100 images.
So we get a 20X, up to a 20X speedup if we ship the computation, if we partition the face detection app and offload it.
>>: [inaudible].
>> Mayur Naik: I'm sorry. It's a decrease in energy used on the phone. But we've also computed the total running time, and as most people would expect, these are usually correlated. So the running time speedups are very similar -- we have up to close to a 20X speedup as well for the total app.
>>: Are you shipping the images over?
>> Mayur Naik: Yes. But not from the device. So the way the app is written is that it loads up the images into memory. Because if you ever make a function call that might involve reading from the disk, a native function call, that cannot be migrated.
>>: Are the images already on the cloud?
>> Mayur Naik: No. So the images -- so the cost here includes the cost of shipping the images. I would say they aren't that big; I think it's up to 200 KB each. So we haven't really -- there were limitations of the platform itself in what we could --
>>: But the idea in this app is that the images would be captured on the phone?
>> Mayur Naik: Yes.
>>: And then the question is whether to put it down --
>> Mayur Naik: Yes.
>>: So did you look at the overall energy consumption on both sides?
>> Mayur Naik: No. Okay. No, we didn't.
>>: Is that important?
>>: If you're only concerned with batteries it isn't important. If you're interested in systems
[inaudible].
>> Mayur Naik: In this case the focus was just on the battery on the mobile device. Yeah.
>>: So you have to determine -- the static analysis has to determine that the functions you're shipping do not do I/O on the phone?
>> Mayur Naik: Yeah, so because we know, we understand the Android platform, we know which libraries access functionality specific to the device. We do virtualize a lot of functionality. The computation which is migrated to the cloud can use a lot of the hardware devices, such as the disk and so on, on the cloud itself. But there are certain functionalities which will have to remain on the mobile device.
>>: So I don't have a Smartphone. So my question is -- I'm still waiting. Suppose I wrote an app for image detection, couldn't I run it as a service and then have you upload those images to the service, and the entire app would conserve this memory, this energy? And more to the point -- here you present one app that can benefit from an offloading scenario, where I question for that particular app whether that's the right way to write the app, and the other question is, is this typical for apps on the phone?
>> Mayur Naik: So the reason we wrote these apps is because they don't exist. I mean, the reason people probably don't do compute-intensive things on mobile devices is because they're compute intensive.
So we want to enable new kinds of apps.
>>: But the app that you would take a picture of a -- Google has something where you take a
picture of some tourist attraction and it tells you where you are. Wouldn't you want to post that.
>> Mayur Naik: Yeah, I mean, of course. You can have other kinds of models. I'm not sure if
this is -- here the focus is on computing.
>>: Is this the right question to ask for phone apps, the energy offloading?
>> Mayur Naik: Well, I mean, I showed you, right, that Apple has ten billion app downloads.
>>: For those 300,000 apps that Apple has, how many of those benefit from [inaudible].
>> Mayur Naik: We haven't done that study. My guess is most apps are not even written that way, just because this capability is something which is very new. Even cloud computing is so recent that this is just trying to be ahead of the curve to enable these apps.
Let me move on to the next part: predicting the performance of programs automatically. I'll use Clone Cloud as an example. As I showed you, we do offline partitioning in Clone Cloud, static partitioning. What that means is it will use the same partitioning regardless of the input. But we noticed that for different inputs, different partitionings were optimal.
For example, if it's one image which is the input to the face detection app, it's optimal to run it entirely on the phone, whereas for 400 images it's optimal to partition it. And the challenge is: how can we automatically predict performance metrics, such as the running time of a function like the face detection routine, or the energy usage of the function, on a given input? If we could do this, we could make this choice in Clone Cloud online, whether or not to partition.
But performance prediction is such a fundamental problem that it has many other applications in computer science. Wherever you need dynamic or adaptive approaches for scheduling, load balancing, resource management, or optimization, you need performance prediction. And this comes up in various fields: databases, networking, virtual machines, compilers, cloud computing, and mobile computing.
So I'm going to define the problem abstractly without any application. So the input to this problem
is a program P and an input I to that program. And the output should be the estimated running
time of the program on that input.
Since I'm going to use this as a running example in the rest of the talk, let me briefly describe what it does. It's a multi-threaded discrete event simulation program; some like to call it the elevator program. And you can see the input to this program is a file on disk which has the number of elevators, the number of floors, and then a bunch of events, one per line. So, for example, this event says that at time two a person wants to go from floor one to floor five.
We would like to estimate the running time of this program P on this input with the following goals.
So of course we want to be accurate in terms of estimation. Secondly, we want to be efficient.
So, for example, you're not allowed to run the program P on that input to completion and tell me
what the running time is.
You're at least allowed to look at the input, of course, and do computation linear in size of the
input. So you are, for example, allowed to scan this file.
Two other features that are unique to our work are that we want this to work for general purpose programs, and we want it to be fully automatic. People have solved this problem of performance prediction and modeling in the past, but only in domain-specific contexts, for example just for database query programs or network applications, where you can use domain knowledge to build performance models. Or they've done it manually, where expert knowledge is used: someone who really understands how the program is written provides a performance model.
So our solution is a system called Mantis that has two parts, an offline part and an online part. The offline part takes the program P whose performance you want to predict and a bunch of training inputs, and it builds a performance model, which is then fed to the online part. When there's a new input I on which you want to predict the running time of P, it gives you the estimated running time.
>>: The assumption, the compute [inaudible] mix of all of them.
>> Mayur Naik: One thing we're not doing here is we're not at all modeling the environment. This
is assuming that the running time of the program just depends on the input.
So other than that, we just require the program to terminate, so that we can estimate its running
time.
>>: I guess my question is, [inaudible] inside the program but [inaudible] the IO password over
the wire and then typically [inaudible].
>> Mayur Naik: Yeah, so that's what I meant by modeling the environment. I think this is -- that I
think we view that as an orthogonal problem and we haven't even tried those kind of problems.
>>: You also mean by that the caches.
>> Mayur Naik: Everything. Exactly. So all those properties.
>>: This is abstract this is more like big O.
>> Mayur Naik: Exactly.
>>: You want to give me big O.
>> Mayur Naik: It's not exactly big O because we're going to say one minute or 42 seconds for
this input.
>>: Oh. So this is different.
>> Mayur Naik: So much of the work in programming, in program analysis, right, has been on worst-case inputs. So you're not even given an input I; you're given the program P and you want to know the worst-case input. Whereas --
>>: If you do not model the cache and network and all this stuff, then when you report this, how do I know what you're going to report?
>> Mayur Naik: So that is true. But to a rough approximation, right? I mean, most programs' running times do depend on their inputs. There are certainly classes of applications where --
>>: [inaudible] coming back --
>> Mayur Naik: I agree this is not the complete solution. There's the environment and so on. But any solution will have to take this into account.
>>: So really the estimate then here is an actual number.
>> Mayur Naik: Is an actual number.
>>: Okay. But we're not allowed to run the program?
>> Mayur Naik: Yeah.
>>: But during the training you are running a program.
>> Mayur Naik: Yes, of course, during training we are going to run it.
>>: I'm not quite understanding the answer you had about SPEED. It seems like the answer was about worst-case performance, but SPEED actually doesn't give you worst-case performance; it gives you performance as a function of the inputs, which is very similar to what you're talking about, I think. So how do they differ, then, the two approaches?
>> Mayur Naik: I don't understand. So the output of this whole exercise is going to be a number such as --
>>: Right. It will give you numbers. For worst case, right?
>>: No, it's not worst case. If you call function with input N, it will tell you a number like two
minutes based on N.
>> Mayur Naik: Okay. So eventually we'll -- that's an example where it's not fully automatic.
Someone tells you there's an input size N on which the running time depends.
And I'm going to automatically get a way to compute N for you. So it will be the same as N, but it
will be done in a way which is automatic.
Let me get to the end of this, and you can ask me again if you still have the question.
I'm going to describe the offline stage over here. Since we cannot use either domain or expert knowledge, we are going to instrument this program with broad classes of features.
A distinguishing characteristic of any performance prediction technique is what features it uses to model performance. In our case there are three classes of features we instrument: all loop counts, all branch counts, and various statistics on the values of variables of primitive data types, such as the frequency, the sum, the average and so on.
Because all of these are potentially correlated with the program's running time. So let me give you an example of a loop counter. Here's a counter f1 that we'll instrument. This one, as you notice, is counting the number of floors which are going to be added.
You have another counter here, f2, which is going to count the number of elevators in the input. Here you have a counter f3 which is counting the number of events, how many people want to go from which floor to which floor. And here's an interesting counter coming from a different scheme, where what we're going to do is compute the sum of all the time fields. So a person wants to go from one floor to another at a certain time; we are going to take the sum of all of those times, because again that's potentially correlated with running time.
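As an illustration, here is a small, self-contained Java sketch of this kind of instrumentation (the counter names f1, f3, f4 and the parsing code are hypothetical; Mantis actually instruments Java bytecode rather than source):

```java
// Hypothetical sketch of feature counters for the elevator program's input parsing.
import java.util.Scanner;

public class ElevatorFeatures {
    public static void main(String[] args) {
        String input = "2 5\n2 1 5\n3 2 4\n";   // elevators, floors, then one event per line
        long f1 = 0;   // loop count: number of floors built
        long f3 = 0;   // loop count: number of events read
        long f4 = 0;   // value statistic: sum of all event time fields
        Scanner in = new Scanner(input);
        int elevators = in.nextInt();
        int floors = in.nextInt();
        for (int i = 0; i < floors; i++) {
            f1++;               // counts iterations of the floor-building loop
            // ... build floor i ...
        }
        while (in.hasNextInt()) {
            int time = in.nextInt(), from = in.nextInt(), to = in.nextInt();
            f3++;               // counts iterations of the event-reading loop
            f4 += time;         // sums the time field of every event
            // ... enqueue event (time, from, to) ...
        }
        System.out.println("f1=" + f1 + " f3=" + f3 + " f4=" + f4);
    }
}
```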
What we're going to do next is run this instrumented program off line on all the N inputs and we're
going to get for each input the exact value of each of these counters and the exact running time
on that input.
Now, this is a classic machine learning problem where we want to approximate the running time
R as a function of these features. And we use up to cubic polynomial expansion, and this is, for
example, a running time function that we might get for this program.
While this is mostly standard machine learning, there are two distinctive aspects of our work: one is nonlinearity and the other is sparsity. By nonlinearity, what I mean is we want to allow terms such as N times M, where N might be the number of times an outer loop iterates and M is the number of times a nested loop iterates for each iteration of the outer loop, because that's what models --
>>: F1.
>> Mayur Naik: Yes, cross terms are allowed, even though they aren't appearing here.
What I meant by sparsity is we want to choose just a handful of features in this running-time function, even though in practice we have thousands or tens of thousands of features here. We need sparsity for two reasons. One is we don't want to overfit to the offline inputs, and the second reason is that, remember, our ultimate goal is performance prediction on a given new input: we're going to have to evaluate the values of the features that appear in this performance model, like f4, on that new input I. So the fewer features we have, the better.
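In symbols, and hedging on the exact details (the talk only says "up to cubic polynomial expansion" plus a sparsity-inducing regression, with Lasso named later in the Q&A; the notation r, f, m_j, beta, lambda, N below is mine), the offline stage fits something like:

```latex
% Sketch of the sparse nonlinear model (assumed form)
r(\mathbf{f}) \;\approx\; \beta_0 + \sum_{j} \beta_j\, m_j(\mathbf{f}),
\qquad m_j \in \{\, f_a,\; f_a f_b,\; f_a f_b f_c \,\}

% Lasso-style training objective over the N profiled runs; the \ell_1 penalty drives
% most coefficients to zero, which is what yields the sparsity discussed above.
\min_{\beta}\;\; \sum_{i=1}^{N} \Big( r_i \;-\; \beta_0 \;-\; \sum_j \beta_j\, m_j\big(\mathbf{f}^{(i)}\big) \Big)^{2}
\;+\; \lambda \sum_j |\beta_j|
```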
>>: How --
>>: [inaudible].
>> Mayur Naik: Okay. So exactly. This comes to the next point, which is how to evaluate these features. And we use a classic program analysis technique called static slicing to automatically obtain code snippets whose only goal is to compute the value of these features.
Now, let me --
>>: [inaudible].
>> Mayur Naik: I'm going to run the slice on an input.
>>: Can't you use machine learning to predict those?
>> Mayur Naik: On a new input. I think this is what Ben also mentioned: somehow you look at the input. The main difference is that would be black-box performance prediction, which just says show me the inputs, like the file size and the command-line arguments, and I'll do machine learning on some features over there. And maybe N is one of the features you get out of it, right? But what we're saying is we're subsuming that: let's look deeper into the program, and ultimately things like the file size and so on will be captured by some feature over here. So in some sense it's more general than that.
>>: So if I understand correctly, the slices that you're going to compute, they're going to run some
of these loops to compute these for you?
>> Mayur Naik: Exactly.
>>: But in my program, basically everything that happens that's going to take time is in the loops, right? So if you're basically going to run all the loops, without --
>> Mayur Naik: Excellent question.
>>: It seems like cheating, right?
>> Mayur Naik: What you guys are doing is nicely giving me the segue to the next point. So let me come to that. This is exactly why loop counters are not the only features we have; most previous work only uses loop counters for performance modeling. If you had only loop counters, that would be enough to model the performance of any program, because that's where programs spend most of their time.
But we also have other statistics such as variable values. So imagine a loop runs from 0 to N
minus 1, then we will have features that both count the number of iterations as well as the value
N itself, which might be computed in constant time.
So anyway, what is slicing? The static slice of a program variable, such as the feature f4, is the set of all actions that might affect the value of that variable. And the goal is to have as small a slice as possible. The standard way to compute it is using data and control dependencies. I'll show you the slice for feature f4, which is in the performance model.
Clearly the statement that writes to f4 has to be in the slice, and then -- I'm going to go a bit faster with this -- this is how we compute data dependencies. So, for example, here you refer to variable t, and it was written there, so you have to include that. There are nontrivial dependencies, such as here: not only is b written here and read there, so you need this dependency, but you also need a pointer analysis which tells you other places where you could have written data on which this statement is dependent.
There are control dependencies as well, which I won't go into; that's why these for loops are included. This is the slice for this feature. Notice that what it has sliced out is the part of the code which builds the floors and elevators. In a real program, this will in practice slice out large parts of the computation for any given feature.
Now, what Manuel asked: what if a slice is expensive? There can be two reasons why a slice is expensive: either we are doing imprecise slicing, or the feature is inherently difficult to compute -- maybe it's a loop counter which is dominating the running time.
So what do we do then? First of all, how do we even measure the cost of a feature? We have this offline data, so we can simply run the slice on each of these inputs. We then know the exact running time of the slice, and we can ask: is it more than ten percent of the program's running time? We can set any threshold the user is willing to have; we set it at 10 percent. If it is more than 10 percent on any input, we simply throw this feature out and repeat the process, starting with regression. This time regression will not be allowed to use that feature, say f4, but it will have access to many other features, presumably almost equally good ones. With each iteration the accuracy of prediction drops, because regression is now denied the use of the features most correlated with running time. But, on the other hand, the cost of prediction itself is dropping, because it's picking features whose slices are cheaper.
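The iterative loop just described could look roughly like the following sketch (the helper interfaces and names are hypothetical; Mantis' actual implementation is not shown in the talk):

```java
// Sketch of the iterative feature-rejection loop: redo sparse regression until every
// chosen feature's slice costs at most a threshold fraction of the program's run time.
import java.util.*;

public class FeatureRejectionLoop {
    interface SparseRegressor { List<String> chooseFeatures(Set<String> banned); }
    interface SliceCost { double maxCostFraction(String feature); } // max over training inputs

    static List<String> selectCheapModel(SparseRegressor reg, SliceCost cost, double threshold) {
        Set<String> banned = new HashSet<>();
        while (true) {
            List<String> chosen = reg.chooseFeatures(banned);          // e.g., Lasso minus banned features
            Optional<String> tooExpensive = chosen.stream()
                    .filter(f -> cost.maxCostFraction(f) > threshold)  // e.g., threshold = 0.10
                    .findFirst();
            if (tooExpensive.isEmpty()) {
                return chosen;              // every chosen feature has a cheap slice
            }
            banned.add(tooExpensive.get()); // reject it and rerun the regression
        }
    }
}
```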
Let me step back a bit and show you where program analysis and machine learning interact here. We have dynamic analysis here, which is instrumenting the program and so on. We have machine learning here doing regression, and we have static program analysis here doing slicing. These are not just three pieces which are loosely connected; they're actually tightly coupled. For example, the dynamic analysis provides training data for the machine learning. The machine learning builds a performance model and gives certain features to the static slicer. The static slicer produces slices whose cost is determined from the profile data, and features whose slices are too expensive get rejected. This is where you see the iterative process.
We've run Mantis on some real-world programs; I'll show you one of them, which is Lucene, an open-source text search engine. The dataset we used was the works of Shakespeare and the King James Bible. We used a thousand different inputs. Each input is a list of words to search for in these datasets, returning statistics such as the frequency and so on. We used a hundred of these for training and the remaining 900 for evaluation or test. As you can see, Mantis instrumented 6,900 features, but regression only considered 410 as serious contenders.
What that means is the rest either stay constant, have no variability across runs, or they're weakly
correlated with running time and so on. But, of course, not all 410 of these features are equally
cheap to compute. The iterative process finally chose two out of these 410 as having a good
trade-off between prediction accuracy and prediction costs.
>>: What were they?
>> Mayur Naik: The two features: one was like the N you mentioned. So the input is actually a set of files, each of which has a bunch of queries, one per line. What this feature does is count the total number of queries in these files. The interesting part is the slicing. The way Lucene is written, it picks one query from each file at a time and immediately searches for it in a database; it has already built the database and indexed it up front. What the slice will do is slice out the indexing part as well as the search in each iteration; if you were not to slice those out, you'd pretty much be running the entire Lucene engine. The goal of slicing is to remove the computation which gets in the way of evaluating a feature.
The other feature was just the number of threads spawned; Lucene is multi-threaded.
So --
>>: What was the other feature again?
>> Mayur Naik: It is the number of threads. It's actually not exactly the number of threads, but it's correlated with that, because you can spawn a bunch of query processors to --
>>: That's related to the number of queries in the input.
>> Mayur Naik: It is, but beyond some point more threads won't give you speedups, and how many threads you spawn still affects the running time. It's another input besides the query files.
Here are two graphs, one showing prediction accuracy and the other showing prediction cost. Here we have 900 points, one for each of the 900 test inputs. Each point's intercept on the X axis is the time predicted by Mantis on that input, and its intercept on the Y axis is the actual running time on that input. As you can see, these points more or less lie on the 45-degree line, which means we're doing close to perfect prediction. In fact, the error is just 4.8 percent on average over the 900 inputs.
Looking at the cost of prediction, what we have is a CDF of the execution time, both of the total program, which is shown in red, and of running the two slices for these two features, which is shown in blue. As you can see, even though the running time has a lot of variability, the slices remain more or less constant. It's actually not constant; it's linear in the size of the input, and you can see a slight curve here. The point is that the slices give a 28X speedup: the slices execute in just under 4 percent of the total program's time in order to estimate its running time.
So that was all I'm going to talk about performance prediction, yeah.
>>: Then you can answer questions. I was looking at the Mantis paper. Is this based on least-squares optimization?
>> Mayur Naik: Yes, we use a technique called Lasso. Do you want me to go into that?
>>: No, I want to check the assumptions, which is, if you have two -- how well do these methods capture feature interaction? So if you -- well, I will elaborate on that question later on; it's not a precise question. But what I didn't get in the transition from the first to the second part was: here you predict runtime and performance, but in the first part you would predict energy consumption.
>> Mayur Naik: Yes. The two are often correlated. It's one example of a performance metric.
>>: Cell phone energy consumption can come from the [inaudible] units. You have various parts
of your device which contribute to the energy consumption, if you're looking at the program.
>> Mayur Naik: We can always model more features. So for the external things which affect energy consumption, we don't have to stop at 6,900; you can add a few more features and still model that.
>>: I actually found it very interesting that for programs there are very few features that are good features. Did you find that for this problem?
>> Mayur Naik: I would take that with a pinch of salt, because these runs are also generated by us. What I notice is that more often it's trying to fit the training data that we have. Ideally we would get data from the wild, and even if that had a certain profile, that's fine; then that's what the data was.
So it is always a small number of features. We have run this on six different programs. And it
turns out that even if you go beyond cubic models, if you allow many more features, many more
terms, the accuracy doesn't really improve.
>>: So it's actually -- so for performance debugging, then, could I use this technique to sort of focus on -- if I have to improve the performance of this program, could I use these features to say these --
>> Mayur Naik: I don't know. I think this information would -- I think one of the goals is performance debugging, but we really haven't applied it to that. Because this is capturing the dominant running time, you would probably already know that. So if you didn't know it, then knowing it probably wouldn't tell you how to speed it up.
>>: In your experiments, did you find other mixed terms at times?
>> Mayur Naik: Yes.
>>: In the profile.
>> Mayur Naik: Yeah.
>>: Can you compare and contrast with the work that Chan [inaudible] are doing on prediction?
>> Mayur Naik: I know vaguely about this work. I think the key difference, as I mentioned up front, is that ours is not domain-specific. If I remember correctly, what he's trying to do is model compiler optimizations, various flags to JVMs and so on. So he picks features such as those, and depending on what flags are used to compile a program, he wants to estimate how long the compiler might take. I think that's what he's trying to do.
>>: And the analysis -- I think it's closer to your work than anything.
>> Mayur Naik: Yeah, he's probably the closest -- we do cite his work, but it's not at the top of my head right now. I guess one thing is certain: he doesn't use program slicing. Maybe he runs the program and hopes that all the features correlated with running time are there up front, which is the case for most programs -- they read [inaudible]. Ours is more for programs that are lazy and might read inputs, environment variables.
>>: I think some of the -- I think one of the things he looks at are [inaudible], so he has a notion of features kind of like what we're talking about. But the flip side is he's doing things where he can predict specifically where in the program the time is going to be spent. And the result is they can improve performance by focusing the optimizer on the part of the program with the most time.
>> Mayur Naik: I see.
>>: Follow up on this, the programs or inputs [inaudible] accuracy.
>> Mayur Naik: It is. Actually what we have noticed here is that the bias is -- I just want to be careful as well. We have run this on programs that actually take exponential time, for example a [inaudible] program or SAT solvers. What it typically does is sacrifice accuracy by a huge amount for these outliers and then just go with the more dominant trend in the training inputs. So I should really provide error bars for that, but I don't have them.
>>: [inaudible] which is picking up a data structure for sort of keywords. Enabling search to back
end [inaudible] presumably in memory table, right. But I guess I'm going back to your previous
point, so for inputs that essentially you say wow this is within five percent that's great. But would
there be cases where you would really be off but we're distinguishing where this is, the case
getting back [inaudible].
>> Mayur Naik: So all this is future work. I agree it's something we don't do right now.
So let me come to the final part, which is scalable program verification. I'll use Mantis itself to motivate this problem. Notice that I talked about slicing earlier; using just data and control dependencies was fine as long as the program was single-threaded, but in fact this program is multi-threaded. What this means is you have to be careful to ensure that the data and control flow of one thread -- in this case the main thread, which contains the slice -- is not affected by the actions of other threads, in this case the elevator threads which are spawned.
So what you have to do now, to do sound slicing in the presence of concurrency, is either show that all of these actions only touch thread-local data, that is, data visible only from the main thread, or include all the other threads' actions in the slice, because you don't know whether the elevator threads might actually affect the value of f4. This problem of proving noninterference between actions of different threads is well known in concurrency; it has many applications, race detection being one. And one way to prove that a pair of actions is noninterfering, or race free, is to prove that the actions involved touch only thread-local data.
That's what I'm going to focus on in this talk. In the literature this is called the thread escape analysis problem. We are going to phrase it in terms of queries. We'll take a pair (v, p), and this query will be true if, for all inputs of this program, whenever a thread reaches program point p, the variable v is pointing to an object that's not reachable from any thread other than the current thread.
To give you the elevator example, let's say this is the program point p and this is the variable v, which is a button-press event.
I'm going to show you just one program state that arrives at p on one input, but the reasoning will be similar for other states.
So here's how the data structures built by this program look at program point p -- one snapshot. I'm going to use red for shared locations and blue for local locations. As you can notice, the building and the floors are shared between the two elevator threads and the main thread, but this part of the data structure is local to the main thread. So in particular v points to a button-press location, which is local to the main thread, and we want to be able to prove this. It's easy to see here that this query is true, because no matter how many button-press events you create in the input or how many floors you create, v is always going to point to a local location whenever the main thread reaches p.
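To make the query shape concrete, here is a tiny, hypothetical Java example (not the actual elevator benchmark): the pair (event, P) below should be provable, because the array that event points to at P never becomes reachable from the spawned thread.

```java
// Hypothetical illustration of a thread-escape query (v, p).
import java.util.*;

public class EscapeDemo {
    static final List<int[]> sharedFloors =
            Collections.synchronizedList(new ArrayList<>());

    public static void main(String[] args) {
        sharedFloors.add(new int[] {1});                      // this object escapes:
        new Thread(() -> sharedFloors.forEach(f -> f[0]++))   // the spawned thread can reach it
                .start();

        int[] event = new int[] {2, 1, 5};  // program point P: `event` is thread-local here,
        process(event);                     // so the query (event, P) should be provable
    }

    static void process(int[] e) {
        // only the main thread ever sees e
    }
}
```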
So in order to prove this, we need to use a static analysis, and I'm going to go a bit fast here because people here know what static analysis is. All static analyses need abstraction, and there are certain reasons why we need that. Most static analyses, including ours, abstract two things: one is pointer locations and the other is control flow. And both of these are statically unbounded in real-world programs because of, say, dynamic memory allocation, recursion, and loops.
So pointer abstraction is -- there's a whole field called pointer analysis which focuses on this. I'm
going to give you a flavor of some of the abstractions in this field.
I'm going to start with a trivial abstraction, which is going to use one abstract location to model all concrete locations. Clearly this cannot prove the query. Why is that? Because the static analysis cannot distinguish this local location from, say, this shared location, and so it has to assume that v might point to a shared location.
Let's look at another abstraction which is slightly better, what is known as the allocation-site pointer abstraction. What it means is that it abstracts all locations created at the same allocation site using a separate abstract location.
In this case again we are unable to prove the query, but for a more subtle reason. The confusion now is in this abstract location: these two locations are being confused. And because thread-sharedness is a transitive property, you have to assume that everything reachable from a shared location is also shared, and so we cannot prove that v points to a local location.
And this goes on. There are other kinds of abstractions -- for example, k-object-sensitivity, which finally does prove this query -- but let's see where all this is headed.
So we saw three different kinds of abstractions. As you go to more and more sophisticated ones, you are able to prove more and more queries -- you're being more and more precise -- but you're also being less and less scalable. For example, this one is constant, this one takes a linear number of abstract locations, and this one is exponential in k. And in practice you can rarely go beyond k equal to 1 for real-world programs.
And that is just the pointer abstraction part of the story. There's also control flow, which I mentioned. We have notions called flow and context sensitivity, and again you see the same trend here, except it's much worse this time. What you see here is that if you want a fully flow- and context-sensitive analysis, then the overall analysis becomes exponential in the number of abstract values. Okay. So it turns out that for thread escape analysis we will have to be flow and context sensitive, and what that means is we now can't even use allocation sites, because that's a linear number of abstract values and the total analysis would become exponential.
So what that means is that we are limited to a constant number of abstract values, and our static analysis, as I'll show you, is going to have just two partitions. So we are saved from that exponential blowup. There's still some exponential blowup -- it's in the number of fields -- but I'm going to show you that in practice it's a small constant number of fields that matters.
One interesting thing you'll notice here, which was not there before, is that you have a dependence on the number of queries; it's linear. What we're going to do is run the static analysis separately for each query. I've shown you one query, but in fact we have thousands of them. This is the reason why we are able to use two partitions and eliminate the exponential dependence on N. It's also the reason why S, the number of fields we are going to track, is very small: because we are focusing on one query at a time.
>>: What is L.
>> Mayur Naik: L is the number of program points. Because it's flow sensitive you have to keep
a separate state at each point. And F is the number of instance fields.
So one drawback existing approaches suffer from is that different queries, coming from different parts of the program, clearly need different data structures to be abstracted precisely. But existing static analyses mostly use a single abstraction A to prove all queries simultaneously. What that means is you either use a very precise abstraction that proves many queries but is not scalable, or you use something that's highly scalable but doesn't prove most queries.
We have two insights. The first one is client-driven static analysis. This is a known concept in program analysis, but I'm going to show you how we apply it to thread escape analysis. First, we're going to be query-driven. What that means is we're going to run a separate analysis, conceptually, for each query. Secondly, we are going to be highly parametric. The analysis itself will be dumb; it's going to say: you give me a hint about what abstraction I should build. And we are going to choose highly flexible parameters. Here you can imagine there are five program parts that need to be modeled with varying amounts of precision, but in practice we'll have thousands or tens of thousands of different program parts, so that each query can be highly specialized to abstract only the small number of program parts
that really matter. Okay. Coming to our thread escape analysis, the parameter here has one bit per allocation site in the program. This one has seven sites; in practice there are thousands. And we can tell the static analysis to treat sites one, five, six and seven precisely -- that means bit one, and I'm going to denote it by the tan color -- and the rest imprecisely, which means I'm going to use the white color.
What this induces is the following abstraction. Okay. So now this is not an object allocation-site abstraction; it is something that can be understood in separation logic, three-valued logic and so on. I'm not going to go into those details, but this is what the static analysis ends up computing if it gets this parameter. As you can see, this actually proves the query. The reason is that you don't see any shared locations here, nor do you see an edge from any of the shared locations into this partition. This is the relevant partition and this is the irrelevant one.
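One way to picture the parameter is as one bit per allocation site that maps every object into one of just two abstract partitions; a toy sketch (illustrative names only, not the analysis' real data structures):

```java
// Toy sketch of the per-query, two-partition parameterization: sites with bit 1 are
// tracked precisely for this query; everything else is lumped into one partition.
public class TwoPartitionParam {
    enum Abs { TRACKED, LUMPED }

    static Abs abstractLocation(int allocSite, boolean[] param) {
        return param[allocSite] ? Abs.TRACKED : Abs.LUMPED;
    }

    public static void main(String[] args) {
        boolean[] param = new boolean[8];                 // sites 1..7 for this query
        param[1] = param[5] = param[6] = param[7] = true; // the sites deemed relevant
        System.out.println(abstractLocation(6, param));   // TRACKED
        System.out.println(abstractLocation(3, param));   // LUMPED
    }
}
```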
So the second insight -- this is all static analysis so far; I'm going to show you how we use dynamic analysis now. The challenge here is these parameters: first of all, we have thousands of queries, and for each query we have a parameter which has thousands or tens of thousands of bits. To make matters worse, most choices are not going to work: either they're going to fail to prove the query -- that is, they're going to be imprecise -- or they're not going to scale. There's still an exponential dependence on the number of fields, and that really comes from how many sites you deem as relevant to proving the query.
So our solution to finding these sites faces three challenges: how do you efficiently find these bits; furthermore, there should be as few bits set as possible, so that the abstraction is cheap and scalable; and furthermore they should suffice to prove the query. What we're going to do is use dynamic analysis: we're going to run this program, take all the queries, observe how they behave, and come up with values for these parameters for each query.
I'm going to give you the highlights first: what are our main results. Clearly the procedure is efficient: we are linear in H as opposed to exponential. The catch now is we might fail to prove queries. But it has nice empirical and theoretical properties. The empirical property is that this whole procedure is precise in practice; I'm going to show that the vast majority of queries are proven using the parameter values that the dynamic analysis gives. The theoretical result is that any bit that is set to one by this dynamic analysis must be set to one by any parameter value that is able to prove the query. So the dynamic analysis sets a bit to one only if it's absolutely certain that the only way to prove this query is to set that bit to one.
>>: Your dynamic analysis [inaudible].
>> Mayur Naik: Yeah.
>>: Or do you do collecting on top of that?
>> Mayur Naik: You could do those kinds of things as well, for efficiency or to mimic the static analysis.
>>: How far are you going in the static.
>> Mayur Naik: We aren't, actually. So we don't mimic it -- it does have some knowledge of the static analysis, but it doesn't do any abstraction. It's purely dynamic.
>>: But it does know the query.
>> Mayur Naik: It does know the query, of course. I'll show you in a moment what it does.
So what this means is that if we end up proving a query, we'll have a minimal abstraction: the smallest number of sites that need to be set to one to prove that query. So what does this dynamic analysis look like?
It starts with this vector. It assumes everything is irrelevant for each query, and I'm going to show it just for this one query. Every time it reaches the program point p, which is the query point, it looks at the heap. And this is already available in the JVM, or any managed language, so we're not building this out ourselves. It asks what v, the query variable, points to. If it points to a shared location, then we are actually done.
>>: Reachable.
>> Mayur Naik: We know what is reachable from multiple threads. We track that; there's an instrumentation that tells us that a location has escaped. So if it is shared, then we declare that there's no way to prove this query thread-local, because we observed it sharing. No abstraction is going on here.
But if it points to a local location, we go a step further, and what we ask is: what is the site at which that location was allocated? In this case it's six. And so we say that site has to be treated as relevant, because if you don't, then we know -- there's a theorem behind this -- that the static analysis would not be able to prove this query using the abstraction it gets. Just that site is not enough; we have to take the backward transitive closure, the way that thread escape analysis works. So we end up setting the bits for all the sites at which any location was allocated from which we can reach this location.
Even though it looks like four out of seven sites here, in practice you can think of this vector as having thousands of entries, and we still set only these four bits. And this is actually the way you would reason about this query as a human. So the reason we want minimal abstractions is not only that it's scalable but also that it's the way we would reason about these queries if we were asked to prove them.
I'm going to show you some benchmarks, some experimental results here. We have a Web crawler, Lucene search, a database program, a microcontroller simulator, and a [inaudible] rendering system. The benchmarks are up to half a million bytecodes. As I said, we have up to 6,100 allocation sites and up to 14,400 queries. These queries are essentially heap reads and writes in the program -- all instance field reads and writes.
So here we are going to compare our approach, which is this two-partition pointer abstraction with flow- and context-sensitive control-flow abstraction, against an analysis which uses the allocation-site pointer abstraction but is flow and context insensitive. We've tried the other point too, but our analysis doesn't even terminate using allocation sites with flow and context sensitivity for any of these benchmarks. It's really the two-partition abstraction which enables flow and context sensitivity.
The previous approach, as you see by the bars here, proves only 27 percent of these queries on average. But our approach ends up resolving 82 percent of these queries -- the red plus blue parts. The red parts are the simple part: because we do dynamic analysis, we can observe queries which are escaping; those 27 percent, the red parts here, are ones that no sound static analysis can prove. The blue parts are more interesting: they are the cases where the dynamic analysis made a guess for the parameter value and the static analysis was actually able to prove the query. That's 55 percent of the queries, leaving the remaining 18 percent on average unresolved.
>>: [inaudible].
>> Mayur Naik: Also, I forgot to mention, you could run it multiple times and trivially combine the vectors, but we run it on one input because that's the only input we have. If we were to run it on other inputs, we'd really need very different inputs to exercise other parts. So just assume we run it once.
>>: [inaudible] the analysis, the one goes where in some part of the program, right?
>> Mayur Naik: So I have a reason for that. I think it's partly related to the properties we're after, which are very simple. We see this over and over again in program analysis: proofs for things like thread escape are really simple, and just one path is needed to observe them. If you ever reach the query point, the assertion point, you see the result, and you also see the reason why it didn't fail on that one path.
>>: [inaudible].
>> Mayur Naik: In that case, of course, this is not a proof for everything; this is why there are 18 percent false positives.
>>: [inaudible] what could be even.
>> Mayur Naik: I'm not showing you those query points; I'm glossing over that a bit. But if you don't hit an assertion we're not trying to prove it.
>>: So here we have the [inaudible] run first and then the static analysis takes over; there's no going back?
>> Mayur Naik: No. It's a passive use of dynamic analysis; going back and forth would be a kind of refinement and so on. We don't even have the infrastructure -- that would require driving the program across --
>>: I'm just asking, you have a step --
>> Mayur Naik: Okay. So the 18 percent false positives are mostly because of coverage
problems -- actually a very simple coverage problem. What it means is that the assertion was reached, but some site whose bit should have been set to 1 and deemed relevant was not even reached. What this means is that if you could simply at least reach all the sites -- you don't need to reach them in some fancy state, just hit them -- then there's a very high chance that we'll actually end up finding the proof.
By the way, one more thing I should mention: we're working right now on using machine learning for this. We now have all this data for 82 percent of the queries, and these are minimal abstractions, so we know the correlation between which sites are relevant to proving which queries.
So in the end the hope is to completely throw out the dynamic analysis and just use machine learning to predict which sites are relevant to proving which queries.
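One way to picture that idea, purely as a hedged sketch and not something that has been built: score each allocation site for a given query with a model learned offline from the minimal abstractions, and set the bits of the sites whose predicted relevance clears a threshold. The feature set, weights and threshold below are placeholders.

```java
import java.util.BitSet;

class RelevanceModel {
    private final double[] weights;   // learned offline from the minimal abstractions
    private final double threshold;   // relevance cut-off

    RelevanceModel(double[] weights, double threshold) {
        this.weights = weights;
        this.threshold = threshold;
    }

    /** featureVector[j] is the j-th static feature of a (query, site) pair,
     *  e.g. call-graph distance or whether the site's type matches the access. */
    double score(double[] featureVector) {
        double s = 0;
        for (int j = 0; j < weights.length; j++) s += weights[j] * featureVector[j];
        return s;
    }

    /** Predict the abstraction parameter directly, with no dynamic run at all. */
    BitSet predictAbstraction(double[][] featuresPerSite) {
        BitSet relevant = new BitSet(featuresPerSite.length);
        for (int i = 0; i < featuresPerSite.length; i++) {
            if (score(featuresPerSite[i]) > threshold) relevant.set(i);
        }
        return relevant;
    }
}
```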
>>: Doesn't machine learning require --
>> Mayur Naik: There will be features, intermediate features -- not loop counts, it will be more sophisticated. They could come from things like program analysis, points-to information and so on, so we will have to run some amount of analysis. But I don't have results yet. It's just a thought that --
>>: Why wouldn't you want to throw that in -- cheapest bang for your buck, right?
>> Mayur Naik: For programs which you can't run and so on. So let me quickly finish.
>>: You're fine.
>>: If there are programs you can run --
>> Mayur Naik: Okay. The queries you can't even reach -- okay. So for the running time breakdown, as you can see, the analysis we compare against is really cheap. It's a whole-program, flow- and context-insensitive analysis; it takes just one minute on the largest benchmark.
The dynamic analysis you can make arbitrarily cheap -- you can sample and so on -- so it doesn't take much time.
For the static analysis, the total time is quite significant, sometimes over an hour. But the key thing to notice here is that we are doing a static analysis separately for each query, and what I'm showing you here is the time to run the static analysis sequentially for all queries.
In fact we do something slightly smarter: after the dynamic stage we group queries that have the same parameter value, that is, the same abstraction. That gives a big reduction. What I'm also showing you here is the mean time the static analysis took for any query group, and the max. You can see the max is just 21 seconds, which means that if you were to spawn all of these static analyses in parallel you'd be done in 21 seconds.
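A sketch of that grouping and scheduling, reusing the same hypothetical Query and StaticChecker types as before. The real system shares the static analysis work within a group; this sketch only illustrates grouping queries by their guessed bit vector and running the groups in parallel on a thread pool.

```java
import java.util.*;
import java.util.concurrent.*;

class QueryGroupScheduler {
    /** Queries whose dynamic run guessed the same bit vector share one group. */
    static Map<BitSet, List<Query>> groupByAbstraction(Map<Query, BitSet> guesses) {
        Map<BitSet, List<Query>> groups = new HashMap<>();
        for (Map.Entry<Query, BitSet> e : guesses.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** One task per group; groups are independent, so they run in parallel. */
    static Map<Query, Boolean> proveAll(Map<Query, BitSet> guesses, StaticChecker checker)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Future<Map<Query, Boolean>>> tasks = new ArrayList<>();
        for (Map.Entry<BitSet, List<Query>> g : groupByAbstraction(guesses).entrySet()) {
            BitSet abstraction = g.getKey();
            List<Query> members = g.getValue();
            tasks.add(pool.submit(() -> {
                // Check every member of the group under the shared abstraction.
                Map<Query, Boolean> res = new HashMap<>();
                for (Query q : members) res.put(q, checker.provesThreadLocal(q, abstraction));
                return res;
            }));
        }
        Map<Query, Boolean> proven = new HashMap<>();
        for (Future<Map<Query, Boolean>> f : tasks) proven.putAll(f.get());
        pool.shutdown();
        return proven;
    }
}

// Hypothetical types, as in the earlier sketch.
interface Query {}
interface StaticChecker { boolean provesThreadLocal(Query q, BitSet relevantSites); }
```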
>>: The total time is -- grouping already?
>> Mayur Naik: The total -- well, this is running it for each query group serially, so it includes the grouping already. As an example, if you start with 14,000 queries we end up with, say, 400 groups or so. I have the numbers, but not here.
>>: What engine is the static analysis?
>> Mayur Naik: It's [inaudible]'s algorithm, which does a fully context-sensitive and fully flow-sensitive static analysis. I think it's the same one that's used in SLAM as well.
>>: Reachable?
>> Mayur Naik: Yeah.
>>: It's field sensitive?
>> Mayur Naik: Yes.
>>: A specialized program? Because my impression from the slides and from [inaudible] is that if you use a Datalog engine -- you can essentially express it in Datalog -- and you do the magic sets transformation, you get the same slice based on the queries.
>> Mayur Naik: I don't think so. We actually do all this; I can talk about all those things offline. But this is not -- we don't really care. I don't think I have a good --
>>: You're not going to learn all the features.
>>: Right. The transformation at least clearly provides the -- [inaudible] it's not clear how much [inaudible] it's set to. Allocations.
>>: Maybe we should take this offline --
>> Mayur Naik: So just to show the sparsity of the abstraction: how many sites were deemed relevant? Here you can see the total number of sites, again up to 6,000, for all queries for which the dynamic analysis guessed these parameter values. You can see the mean is quite low, which means few sites, few things, are relevant to proving any query on average. The max can be pretty high.
But if you look at the queries which were actually proven, as opposed to all those for which the dynamic analysis made a guess, the numbers drop. The more things that are deemed relevant, the higher the chance that the dynamic analysis loses on coverage -- it won't see some of the things which are relevant. So for the queries which end up being proven, these numbers are slightly lower.
It's still impressive, though, that, for example, in a case where there were 31 sites which needed to be set to 1, the dynamic analysis actually saw all of them. And notice that if you were to flip the bit for any of these 31 sites, the static analysis is guaranteed to not be able to prove this query.
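That flip-any-bit property is what minimality means operationally. A small sketch, again over the hypothetical Query and StaticChecker types, that checks it by clearing one bit at a time:

```java
import java.util.BitSet;

class MinimalityCheck {
    /** True if every relevant site is necessary: clearing any one bit breaks the proof. */
    static boolean isMinimal(Query q, BitSet relevant, StaticChecker checker) {
        if (!checker.provesThreadLocal(q, relevant)) return false;    // not even a proof
        for (int i = relevant.nextSetBit(0); i >= 0; i = relevant.nextSetBit(i + 1)) {
            BitSet smaller = (BitSet) relevant.clone();
            smaller.clear(i);                                         // flip one bit to 0
            if (checker.provesThreadLocal(q, smaller)) return false;  // proof survived: not minimal
        }
        return true;
    }
}

// Hypothetical types, as in the earlier sketches.
interface Query {}
interface StaticChecker { boolean provesThreadLocal(Query q, BitSet relevantSites); }
```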
>>: Have you tried running the experiment where, since it's a matter of the time it takes to analyze the whole set of queries, you cut off at 31 and say, okay, if it needs more than 31 I'm not going to run it?
>> Mayur Naik: That's an interesting point. I haven't tried that.
>>: So, some stuff that we used in the optimization using --
>> Mayur Naik: But --
>>: It's not going to work out, and so you're going to cut off the query time.
>> Mayur Naik: The question is -- you're not asking how to choose it, because choosing which 31 would be --
>>: Not which 31, just have a cut-off in terms of the number of features that were picked, and say if it's more than 30, let's not even run the query, because the likelihood of proving it is low.
>> Mayur Naik: If it needs more than 30, then it actually needs more than 30, of course. We could filter those out.
>>: Analysis question. So for your benchmarks, what percentage of [inaudible].
>> Mayur Naik: So this is not just what is really shared between threads; anything reachable from a global is also considered thread-escaping.
I'll show you the numbers: 55 percent is what we could prove, and my guess is that those 18 percent are all false alarms. So 55 plus 18.
>>: These are guaranteed not to have any threads from --
>> Mayur Naik: Around 70 percent is what I see.
>>: It's not even local. It's not even locking.
>> Mayur Naik: These are not really shared, just reachable from a global. Different programs have different styles for what they store and make reachable from a global, even when it's a completely single-threaded program; even for the single-threaded programs here you can see what is reachable.
>>: It's not current data streams. It's program points across --
>> Mayur Naik: It's static.
>>: [inaudible].
>> Mayur Naik: Okay. So we have implemented this thread-escape analysis and applied it not just to static slicing, but also to concurrency checkers: data race detection, reduction proofs, and recent deadlock and consistency checkers. All of them require thread-escape analysis. This is actually early work from my Ph.D., where we ran these on real-world programs. And these are the kinds of reactions we get, which brings me back to the comment I showed earlier from that Java concurrency expert: most Java programs have concurrency bugs.
Some projects actually shut down after we found so many bugs that they believed their synchronization was either [inaudible], but these are the ones that actually survived.
>>: How do you use this to detect bugs?
>> Mayur Naik: These conditions -- the strategy we have in all these tools is to prove as many accesses race-free as possible and then report everything else to the users, and the hope is to make that remainder really small.
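A hedged sketch of that prove-and-report-the-rest strategy; the Access type and the filtering are illustrative only, and a real checker would also apply lockset and may-happen-in-parallel reasoning on top of the escape proofs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class RaceReporter {
    /**
     * Report every pair of accesses to the same field with at least one write,
     * unless either access was proven thread-local by the escape analysis.
     */
    static List<String> report(List<Access> accesses, Set<Access> provenThreadLocal) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < accesses.size(); i++) {
            for (int j = i + 1; j < accesses.size(); j++) {
                Access a = accesses.get(i), b = accesses.get(j);
                if (provenThreadLocal.contains(a) || provenThreadLocal.contains(b)) continue;
                if (!a.field().equals(b.field())) continue;   // different memory locations
                if (!a.isWrite() && !b.isWrite()) continue;   // two reads cannot race
                warnings.add("possible race: " + a + " and " + b);
            }
        }
        return warnings;
    }
}

// Hypothetical representation of a static heap access (a field read or write).
interface Access { String field(); boolean isWrite(); }
```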
>>: Just to push back -- if you find so many bugs that they shut it down, who was using that stuff?
>> Mayur Naik: No, no. These are projects that people actually use. So jTDS, for example, is a Java JDBC driver which they claim is the fastest one for Microsoft SQL Server. You can see the sources for these projects; they're widely used.
Many of these bug fixes weren't made by the project developers; they were made by others who were using these libraries.
>>: I understand. I've just never heard of a project shut down because somebody's program analysis finds so many bugs that it has to be shut down. Usually if it has so many bugs while people are using it, the users experience the bugs. Otherwise these bugs aren't [inaudible].
>> Mayur Naik: So it's true.
>>: That's the kind of thing. That's why I'm surprised.
>> Mayur Naik: There are two that were shut down. There were probably other reasons to shut down [laughter], but this is one of the reasons where --
>>: Violation of --
>> Mayur Naik: They didn't close the bugs --
>>: They used this? This is the first I've ever heard of -- let me see your programs: completely shut down, there are so many bugs in it, my program analysis --
>> Mayur Naik: Okay. So I started with partitioning, which was about partitioning programs, and I told you how it can use call graph analysis to decide what to migrate. I said that if you want to do dynamic partitioning you can do performance prediction, which in turn requires slicing; and because slicing needs data and control dependencies, you need a call graph and you need a pointer analysis. If you're going to slice concurrent programs, you need a race detector. And that of course needs various analyses, different ways to prove things race-free. I went into detail for the thread-escape analysis. And as I think someone mentioned -- Modan -- these have many other uses beyond the ones they were intended for. I've given some examples here.
And all of this is actually publicly available, and the pieces are actually integrated this way, using the program analysis platform I've been building over the last several years, which was recently presented at PLDI this year. Let me tell you a few details about this platform.
So of course each analysis is written in isolation, so once it's written you can reuse it for various purposes other than the one for which it was originally intended.
And each analysis builds upon others that other people have potentially built. The dependencies you see here are data and control dependencies. Even though this looks like a DAG, there are actually cycles. And even within a single analysis there are different queries, so there's concurrency within each block as well.
These dependencies are given semantics in a declarative parallel language called CnC, or Concurrent Collections, which Intel and Rice are jointly developing. The runtime we use is Rice's Habanero Java, built on top of X10. Why do we need a parallel runtime? Because there's a lot of parallelism in program analysis, whether on a multicore machine or, even better, on a cluster.
So what are the cool things we can do once we expose these dependencies between program analyses?
One, we can do demand-driven computation. We can say we want to run the race detector in Chord, and it will automatically figure out that it needs to run these four analyses.
The second is reuse of results: if you ask for the call graph multiple times, it will be computed just once.
Another is running independent things in parallel, not just at a coarse-grained level, such as these two analyses, but even within the thread-escape analysis, where different query groups can be scheduled in parallel. All of this happens despite having loops and so on; we have iterative refinement analyses which induce loops.
And despite all the parallelism, we also have a guarantee of determinism, because of CnC's dynamic single assignment form. So no matter how many times you run Chord, it might choose different strategies to schedule things each time, but at the end you're still guaranteed the same result.
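A much-simplified sketch of those scheduling behaviors -- demand-driven runs, memoized results, and independent dependencies running in parallel -- and not the actual Chord/CnC runtime; it assumes an acyclic dependence graph (the real system also handles the loops just mentioned), and all the class names are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.*;

// Each "analysis" declares what it depends on; results of dependencies flow in as inputs.
abstract class Analysis {
    final List<Analysis> deps;
    Analysis(Analysis... deps) { this.deps = Arrays.asList(deps); }
    abstract Object run(List<Object> depResults);
}

class DemandDrivenRunner {
    // Unbounded pool so a task blocked on its dependencies cannot starve them.
    private final ExecutorService pool = Executors.newCachedThreadPool();
    // Memo table: each analysis gets exactly one Future, so results are reused.
    private final ConcurrentMap<Analysis, Future<Object>> memo = new ConcurrentHashMap<>();

    /** Ask for one analysis: its transitive dependencies run on demand, each exactly once. */
    Future<Object> demand(Analysis a) {
        return memo.computeIfAbsent(a, x -> pool.submit(() -> {
            List<Future<Object>> pending = new ArrayList<>();
            for (Analysis d : x.deps) pending.add(demand(d));  // independent deps run in parallel
            List<Object> inputs = new ArrayList<>();
            for (Future<Object> f : pending) inputs.add(f.get());
            return x.run(inputs);  // the result is written once and never recomputed
        }));
    }
}
```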
>>: Java [inaudible].
>> Mayur Naik: So all of this is Java. Habanero Java is an extension of Java.
>>: Is there a top --
>> Mayur Naik: CnC is just a programming model. It has Java, C++ and Python implementations; you can implement it for any language you want. It's about connecting boxes and arrows and giving them dependencies.
>>: The boxes, they're in your extension?
>> Mayur Naik: So I may have confused you by suggesting that something happens inside this process box. All of it is exposed outside. Think of each box as something written in Java or Datalog or C++; CnC doesn't even see what's happening in there, as long as it doesn't have side effects -- otherwise the dynamic single assignment property doesn't hold.
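To make the single-assignment point concrete, here is a toy put-once item collection in the spirit of CnC (this is not the CnC API): each tag can be written exactly once and reads block until the value arrives, which is why side-effect-free steps give the same result under any schedule.

```java
import java.util.concurrent.*;

class ItemCollection<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> items = new ConcurrentHashMap<>();

    private CompletableFuture<V> cell(K tag) {
        return items.computeIfAbsent(tag, t -> new CompletableFuture<>());
    }

    /** Dynamic single assignment: a second put on the same tag is an error. */
    void put(K tag, V value) {
        if (!cell(tag).complete(value)) {
            throw new IllegalStateException("tag written twice: " + tag);
        }
    }

    /** Blocks until some step has put the item for this tag. */
    V get(K tag) throws InterruptedException, ExecutionException {
        return cell(tag).get();
    }
}
```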
So we have built several systems, tools and frameworks using Chord that other program analysis researchers can use. I just described three of them here: CloneCloud, Mantis and the race detector we have.
There are people outside program analysis, such as in systems, who are using Chord. For instance, here's a student at U.C. Berkeley doing his dissertation on how to automatically mine configuration options and their types from complex systems software.
Hadoop, for example, is one of his experimental benchmarks; he's trying to figure out what the configuration options are for these pieces of code.
Why is he doing this? Because there's a company called Cloudera in the Bay Area which offers Hadoop, this MapReduce framework, as a service. When they get bug reports they don't know what configuration people used, and there are several versions of Hadoop lying around. If someone can mine these options and automatically determine their values, then reproducing and debugging would be much easier for Cloudera.
I don't think I'll go -- should I go into this? So what I talked about today was how we can use automatic techniques to solve some of these software challenges. I've just scratched the surface of what can be done by combining program analysis and machine learning.
Some interesting reasons why machine learning should be used: one is that we have exponential search spaces in many of these problems, like static analysis. Another is sparsity -- you saw it both in Mantis and in thread escape: very few things are relevant, whether to proving queries or to performance modeling and so on.
And often you have incomplete or noisy data, for example in performance modeling, and in earlier work I did on cooperative bug isolation.
So for these three reasons I believe machine learning holds promise for solving program analysis problems.
But I don't believe computers can solve all these problems automatically, so we need better languages and models. CnC was one model where you saw we could really leverage the concurrency in Chord. There are various flavors of this: we can extend languages, restrict them, or create new languages. CnC actually falls into the last category; it's a new language.
And finally I'd like to exploit domain knowledge. Even though most of the things I've shown you are general purpose, I believe one can use a lot of domain knowledge to make these problems more tractable.
So with that, I'll conclude. Modern computing platforms have these exciting software engineering challenges, and we can combine these various technologies to solve them effectively. And one thing we noticed is that program analyses can be used to solve problems they weren't intended for; for example, slicing can be used for performance prediction. You can get Chord from this website. Thank you for your attention.
>> Tom Ball: Thanks, Mayur.