>> Juan Vargas: We are ready to start with the second session. This one
is "Tools for Parallel Testing and Debugging." And we are going to have
two presentations from the University of Illinois. Darko Marinov is
going to talk about the testing tools at University of Illinois. Then
Danny who just came from the other building, he's going to be talking
about "Refactoring." Then we'll have Koushik Sen -- Koushik Sen, hey -who just got tenure.
[ Audience clapping ]
>> Juan Vargas: from UC Berkeley talking about "Active Testing and
Concurrit." And finally Sunny Chatterjee, right here from Microsoft, is
going to be talking about "Fighting Concurrency Bugs with Advanced
Static Analysis Technology." This is going to be a very compressed
session. We may or may not finish by 12:00 because we will have lunch.
So if we can make it probably 12:15 because lunch is going to be in the
back wall. [Inaudible] going to be okay.
>> : That means all the good stuff will be taken and we'll just eat the
scraps [inaudible].
>> Juan Vargas: Yes, both. True for both. Okay. So please let's welcome
Darko Marinov from Illinois, and he is going to be talking about UIUC
testing tools. Okay. Thank you.
>> Darko Marinov: Can I just close this?
>> : Yeah.
>> Darko Marinov: Okay. Thank you. So, yeah, I'll be talking about some
work we've been doing on testing for [inaudible] code at Illinois.
Obviously it's a short presentation. I'll focus it on my work but there
has been other by other colleagues as well. So we've seen in the
previous session that people say it's difficult to develop
multithreaded code. I'd like to make that a bit more precise to say
it's difficult to develop correct multithreaded code. I think it's very
easy to develop an incorrect one, just not the one that we want. And it's
very easy to introduce all kinds of these bugs: races, deadlocks,
atomicity violations, which mostly come about due to the non-deterministic
scheduling.
So not only is it difficult to develop this code, but it's also difficult
to test multithreaded code. So this is what's usually done. You have
the code, you write some test and then you need to explore all these
different [inaudible] that you have that could potentially lead to
different results. So the issue here is that these failures can be
triggered only by some specific schedules. So it's difficult to explore
these schedules. The [inaudible] space is usually very large, so it
needs very sophisticated knowledge to do that. Indeed, most research is
focused on that, on exploring the schedules for this. A lot of good
work here from Microsoft, you know, folks have worked on [inaudible].
You know, people from Berkeley have worked on active testing. I know
that in the next talk that Koushik will be talking only about active
testing. Actually I just learned from him that he'll be also talking
about other things.
Well, let me just say what most of this existing work focused on:
basically, it assumed that someone somehow wrote the test for the
code. And you have one code version, and now what you need to do is to
explore these [inaudible] to find whether there is a bug. And there are
a lot of techniques that are proposed for doing that. There are many
other problems that we have in testing multithreaded code, especially
if you want to unit test this code. So the issue is how to write these
tests. Most previous research just assumes that the test somehow
exists; the developers wrote it manually. But it's not clear how people
can write that, especially in the cases where you do want to encode
a certain expected result that does depend on the schedule -- how to
express the specific schedule for which you want to encode the result.
Then the next thing is how to explore these tests especially when the
code evolves. Again, as I said most of the previous research focuses on
analyzing only one given code version but as we know code evolves over
time, people make changes, correct bugs [inaudible] functionality. So
in the context of sequential code there has been a lot of work on this
regression testing, how do we make testing better. As the code changes,
how do we make it more efficient? This was not addressed widely in the
context of parallel code. And then there are also the issues of how to
automatically generate some of these multithreaded unit tests, how to
generate the test code itself and how to generate schedules.
These are some of the challenges; there are obviously others, but I
picked those ones that we worked on. So we do work at Illinois on all
of these three topics. And what I'll spend the most time on today will
be on this, the first topic of writing these multithreaded unit tests.
Basically, how do you manually write tests especially in the cases
where you want to say that the result does depend on a certain schedule
that is taken. So we've developed a tool that we call IMUnit --
pronounced "immunity" -- for improving this testing. We've also done work on
these other two topics. So one is this regression testing. And the idea
is I test my code once, now I go and make the change in the code.
Typically the change that I'm making on the code is quite small, but
it's running this testing that takes a lot of time. The question is can
I make my testing faster if I just focus on the change? So we've
developed some techniques there for doing this test prioritization,
test selection.
[Inaudible] building on some very successful results from testing
sequential code. And then one of the issues also that I mentioned was
automatic generation of tests. So we had a recent result on that,
something we call Ballerina, which can automatically generate the code and the
checks for some parallel bugs. And there is also a lot of other work by
the colleagues in the department, some of them are through UPCRC and
I2PC. Madhusudan Parthasarathy has done a lot of work with Penelope and
Exceptional and a few other tools and approaches for testing [inaudible]
violations and looking at other problems. Grigore Rosu has done a lot
of work with jPredictor and Java MOP extensions for concurrent code and
so on, but I won't be talking about that because I don't have time.
So the focus now will be on IMUnit. This was a project not only funded
through UPCRC but we had some other, you know, NSF and NASA, NSA,
Samsung and so on. It's part of a project on trying, basically, to
make this testing for parallel code easier to be adopted into
the whole software development process. Some people who worked on that:
Vilas was the senior student who was leading this, a few other students
and then Grigore and myself.
But let me get to the technical things. So here's the example of this
thing. Let's say we have a certain class that we want to test. This
comes from some open source codes from Google. Let's say we have this
class, the ConcurrentHashMultiset. It's basically a collection, but it
can store objects multiple times. It has operations like add, remove, count,
and what you would like to do is to write the test for this particular
class.
So let's say we want to test a certain scenario where we have two threads
that execute, and they operate on the same shared object. One of them
does two "adds" and the other thread does the remove, and what you
would like to do is to test whether these add and remove and count do
behave as we would like them to behave.
Now the issue here is that the value of this count is schedule-dependent:
based on the order in which we execute these adds and removes, based on
the order in which the instructions from them interleave, we can get
different results.
For example if you're executing this scenario, what should be the count
of forty-two? Uh, one. Oh, okay. That was the answer, one. I thought it
was a hand up. Okay. So the answer here should be one because we added,
then we remove it, and we add it again. The question becomes how do we
encode in our scenario this particular schedule? Now notice here that,
you know, we are not interleaving inside the methods here; I just
want to say they need to be executed atomically. So what we've
seen a lot in the open source is something that's bad. Don't do this at
home. This is what people do, so they use a lot of these "Sleeps." They
basically say, "I'm going to start two threads." This is the actual
Java code. This is starting one thread. This thread does two "adds".
There is another thread going on here; this one does remove. And then
in order to ensure that they are getting this order or [inaudible] in
order to attempt to get the specific ordering of the events, they add
these "sleeps." This basically says, "Wait for forty milliseconds,
meanwhile I hope something else will finish. And I'm going to, here,
wait for eighty milliseconds and do this."
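A minimal Java sketch of such a sleep-based test -- the names and timings
here are hypothetical, not the actual open-source test code:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;
    import com.google.common.collect.ConcurrentHashMultiset;

    public class SleepBasedTest {
        @Test
        public void testAddRemoveCount() throws InterruptedException {
            ConcurrentHashMultiset<Integer> ms = ConcurrentHashMultiset.create();
            Thread adder = new Thread(() -> {
                ms.add(42);
                sleepQuietly(80); // hope the remove happens during these 80 ms
                ms.add(42);
            });
            Thread remover = new Thread(() -> {
                sleepQuietly(40); // hope the first add has already happened
                ms.remove(42);
            });
            adder.start(); remover.start();
            adder.join(); remover.join();
            assertEquals(1, ms.count(42)); // fails if the JVM ignores our hopes
        }

        private static void sleepQuietly(long millis) {
            try { Thread.sleep(millis); } catch (InterruptedException ignored) { }
        }
    }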
So as you all probably know there are a lot of problems with these
"sleeps." They are not the best way to do that, but we've seen that
people do this a lot. So some of these problems include that these
tests are fragile. And what I mean by that is the following: that even
when you put these "sleeps," you're not sure that you're going to get
the result that you want. It may happen that your Java Virtual Machine
does not execute the schedule that you intended.
So we get into this situation where the test seemingly fails even
though there is no [inaudible]. So the code did not fail, and we get
here, say, two or zero not because there is something wrong with your
remove or add but because they did not execute in the order in which
you wanted them to execute. So in order to prevent these problems,
people usually put these bounds on "sleeps" that are very long. Let's
say, you know, "Wait here forty milliseconds," when, you know, ten
would be enough. But just to make sure that you get the schedule that
you want, you wait longer than necessary and then you get inefficient
tests.
You also get tests that are non-modular. Namely, if I have two very
similar tests, I cannot combine them. I cannot reuse that code because
I am sprinkling these "sleeps" all over and I cannot reuse the
"sleeps." What I want to reuse is this thing that I have these "adds"
and "removes".
And then the schedule is very implicit. You know, if you try to
understand what this test is doing, what it's encoding, it's
going to be very hard to figure out what's going on. So others
[inaudible] recognized this issue, so there have been a few research
proposals on how to address these things: ConAn, ConcJUnit, thread
control. The latest solution before ours was something called
MultithreadedTC from Bill Pugh and his group from Maryland. And it
addressed some of the problems from these sleep-based tests but not all
the problems. So we've then proposed our solution which we call this
IMUnit which we hope makes these things easier to solve.
So here's what IMUnit looks like. Instead of writing those "sleeps" that
I had there, what you do is write these events. You say there are certain
events that are happening in the execution. And then we write this
schedule that says in what order I want these events.
Basically what I am saying here is that I want this add to finish
before I start this remove. And then once I finish the remove, then I
want to start this second add. So I insert these events and then I add
that schedule there. Now this makes this test robust.
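As a sketch of what that looks like -- this follows the style of the
published IMUnit paper, so the @Schedule annotation, the fireEvent call,
and the import paths here are assumptions rather than the tool's verified
API:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;
    import com.google.common.collect.ConcurrentHashMultiset;
    // assumed imports for the IMUnit library:
    import edu.illinois.imunit.Schedule;
    import static edu.illinois.imunit.IMUnit.fireEvent;

    public class IMUnitStyleTest {
        @Test
        @Schedule("finishedAdd1 -> startingRemove, finishedRemove -> startingAdd2")
        public void testAddRemoveCount() throws InterruptedException {
            ConcurrentHashMultiset<Integer> ms = ConcurrentHashMultiset.create();
            Thread adder = new Thread(() -> {
                ms.add(42);
                fireEvent("finishedAdd1");
                fireEvent("startingAdd2"); // engine blocks here until finishedRemove
                ms.add(42);
            });
            Thread remover = new Thread(() -> {
                fireEvent("startingRemove"); // engine blocks here until finishedAdd1
                ms.remove(42);
                fireEvent("finishedRemove");
            });
            adder.start(); remover.start();
            adder.join(); remover.join();
            assertEquals(1, ms.count(42)); // deterministic under the declared schedule
        }
    }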
What I mean by that is once I write this schedule there is a certain
execution engine underneath this that's going to ensure that the code
actually does execute in the order in which we specified. If the Java
Virtual Machine wants to execute in a different order then it will
start kind of stopping certain threads to ensure that you get the
actual order that you want.
As we'll see in the experiments, they turn out to be even more
efficient than the sleep-based tests. They're modular. This is
something I did not show, but basically you can reuse the schedules
from different tests. If you want to have different scenarios, there
are actually two schedules up there. And the schedule is very explicit,
so if you need to understand what this test does, if something fails,
if you need to debug, if you need to change these things, it's much
easier to understand.
Yes?
>> : [Inaudible] could you introduce
a deadlock or something bad in your schedule?
>> Darko Marinov: Yes. It's very, very possible. We have both static
and dynamic analysis to try to help with that. [Inaudible] the code is a
partial order of events; you can just introduce a cycle. So you just
do something wrong. Let's say there, instead of two, you put one by
mistake. So we can statically find some cycles. But, you know, people
usually don't have too many of those. What does happen often is
dynamically as you run this code the schedule may be unrealizable and
then we can detect whether this is happening because of a deadlock in
the code or the test, or because this execution engine got stuck here
waiting on certain events to happen. Then we can give an appropriate
warning or error message telling you that you are having this problem.
Okay. So to see how expressive this language is: so we got about two
hundred tests, these sleep-based unit tests from open source Java code
from various projects, then we translated them into this IMUnit by
adding these events and orderings. And we found that we were able to
express almost all of them. One issue that we did not support, though,
is events in loops. So we cannot actually, you know, use this IMUnit
thing as a kind of general-purpose programming language where we would enforce
ordering between events because we do not allow events to be repeated.
And the reason was that simply we did not need that for tests. If we
wanted a more expressive language, that's something we consider as
future work, but it's just going to make the language much more complex
and, you know, potentially less likely to be adopted because of that.
We also measured the speed of the execution. So what we found was that
we were about three times faster than the sleep-based tests. Across all
of these two hundred tests, when you run with the sleeps with the bounds
that the developers put [inaudible] versus our engine, the issue here is
that simply these "sleeps" are inefficient besides all the other
problems that they have.
So basically to just summarize on IMUnit: so it's [inaudible] to write
these multithreaded unit tests. The current dominant solution kind of
in practice is using these sleep-based tests despite all the problems
that they have. So IMUnit addresses those problems and the schedule
language is expressive and our execution is efficient. We have also a
tool to help you migrate from these old, traditional sleep-based tests
to the new things. You know, more details are online in the paper on
the tool. So the tool is publicly available. We had some people who
downloaded it; they sent us bug reports. On one hand we can view that as
bad like, "Ah, our code has bugs." On the other we can view that as
positive: someone is using our tool, you know, and cares enough to
submit a bug report.
Okay. So that was about IMUnit. I'm just going to skim in two minutes
through two other projects, so one of them is this Change-Aware
Preemption Prioritization. So the goal of IMUnit was just to write tests;
the goal here is, as my code changes, how can I make my testing faster?
So basically here is the idea, you know, I have the code, parallel
code. I have tests. This testing here takes a lot of time. As the code
changes, the change is typically small. Can I somehow exploit the
knowledge of this change? Can I statically analyze this change and
exploit that to optimize this process here?
And we found that this is indeed possible. Here is a comparison for
using this change-aware prioritization. Basically if you have something
that understands what's changing in the code versus just doing the
exploration that does not understand that. And so what this shows is
the speed-up [inaudible] over this change-unaware prioritization, kind
of as the best case; in some of the studies we do it is [inaudible]. In
some others we use some other exploration approaches. In some of these
we sometimes obtain a speedup of five times in this testing.
These are for various stateful and stateless explorations. Sometimes
we obtain 2.7. There was only one case where we were slower, where our
approach was slower than the default. So the overall take-away message here
is that there are ways to make regression testing faster for parallel
code.
And then the last thing that we did most recently was this about
automatic generation and clustering of these unit tests for
multithreaded code. So rather than manually writing some of these, we
try to automatically generate them. And basically we call this technique
Ballerina. So what it does, it generates these tests using random
generation to decide kind of randomly here what methods to put in. So it
generates some complex prefix; it generates potentially complex
objects. And the example that I had with that multiset [inaudible]. It
was simple, but it can generate more complex things and then run
some methods in parallel to try to find bugs. So the good thing here
was that we could find bugs. The main potentially bad thing was that we
found too many failures. So we had some clustering methods that help us
to identify which of the problems are likely due to the same root
cause.
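A rough Java sketch of the idea -- this is an assumed shape, not
Ballerina's actual implementation, and invokeRandomMethod is a
hypothetical helper:

    import com.google.common.collect.ConcurrentHashMultiset;
    import java.util.Random;

    public class GeneratedTestSketch {
        public static void main(String[] args) throws InterruptedException {
            Random rnd = new Random(42);
            ConcurrentHashMultiset<Integer> obj = ConcurrentHashMultiset.create();
            // sequential random prefix: build up a potentially complex state
            for (int i = 0; i < 5; i++) invokeRandomMethod(obj, rnd);
            // parallel suffix: race two randomly chosen method calls
            Thread t1 = new Thread(() -> invokeRandomMethod(obj, rnd));
            Thread t2 = new Thread(() -> invokeRandomMethod(obj, rnd));
            t1.start(); t2.start();
            t1.join(); t2.join();
            // any crash, exception, or violated invariant is a candidate bug
        }

        // hypothetical helper: picks a method and arguments at random
        static void invokeRandomMethod(ConcurrentHashMultiset<Integer> obj, Random rnd) {
            int v = rnd.nextInt(3);
            switch (rnd.nextInt(3)) {
                case 0: obj.add(v); break;
                case 1: obj.remove(v); break;
                default: obj.count(v); break;
            }
        }
    }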
So evaluated on some known bugs, we found that this approach worked
better than some other baselines. And we also found some previously
unknown bugs. So basically to summarize we have, you know, a lot of
work going on, on testing for parallel code at Illinois. I'm just
presenting some of the work from my group on this IMUnit and then this
regression testing test generation. There are other colleagues who work
on other things. Okay?
>> : Any questions?
>> : What about the issue of coverage? [Inaudible] is good enough, then
you need to get some measurements on coverage.
>> Darko Marinov: Okay. That's a very good question. So the question
was, can we measure coverage and we did not look into that. But there
is work on proposing various coverage criteria, how to measure, maybe,
interactions on the shared variables -- did you cover certain interactions
there, do you cover certain, you know, locks, [inaudible] on the
locks. First, even defining the coverage criteria for parallel code is
[inaudible] how to do. But, yeah, this work we did not do that. Yes?
>> : Back on the IMUnit work, writing a test that way actually seems pretty
hard to read, right? Why can't you just write the thread [inaudible]?
>> Darko Marinov: Okay.
>> : Putting the markers and then having the, you know, [inaudible]. It
seems like it's difficult to read.
>> Darko Marinov: Sure. Sure. Sure. So you mean you would rather want to
write it the way it looked on the slide? You know, kind of the way
this way? Yeah, so that goes back to the, you know, how can I have, you
know, an integrated development environment that would let me visualize
my code in this way? Right, if I write one thread here, one thread
there, is there something to visually spread these things and do that?
>> : [Inaudible].
>> : It's [inaudible] this is the thread-one, this is the thread-two.
>> Darko Marinov: Sure, sure, sure. Yeah, one can think of those ways
how to do that. Yes?
>> : This is a question for anybody who builds a debugger. It seems
like if we're going to teach parallel computing online and do
auto-grading, having a debugger that automatically looks for [inaudible]
in the submitted code would be a very useful tool. But I'm wondering,
is any one ripe enough to use for that purpose?
>> Darko Marinov: So here we did not really focus on, you know, trying
to find bugs. It's more like you encode your test. Once you run this,
you still need to use some other tools to look for these bugs. But,
yes, if you have [inaudible] code you likely need to run the tools to
find the bugs. Actually you can just skim through the [inaudible] code
most of the time and just find the bug. You know what I mean? If...
>> : [Inaudible].
>> Darko Marinov: Okay. Well, it gets harder. Sure. Sure. If you have,
you know, messy code then you need to run some tool to look for these
bugs. Yes?
>> : So in your unit testing, I think a better assertion at the very
end would be something like the count is either one or two. And
whenever it is one -- whenever it is two, [inaudible]. So what you want
to say is some assertion that captures behaviors of all [inaudible] as
opposed to [inaudible].
>> Darko Marinov: Okay, so that's a very good comment. So in the
test that we are writing, I want to get exactly the
value one in this particular schedule. The suggestion was, why don't
you check here that the value is one or two. So if I just say the value is one or
two without saying under which schedule then I can be missing bugs,
right? So if my code always returns one, that would pass.
>> : [Inaudible] if it returns two, it so happened that the [inaudible]
executed before [inaudible]. Right? So essentially I want to write
assertions for all possible schedules.
>> Darko Marinov: That's fine. You can actually do some of that in
IMUnit. Unfortunately that's not shown here. But you can write assertions
that actually say for which of the schedules you're getting which of
the results. You do not need to write only one schedule. You can have a
larger number of these schedules, and then you can encode, you know, a
set of the results here. I mean, in the limit you can try all these, you
know, [inaudible] trials, possible orderings, and just encode that you are
going to get one of those under certain conditions. And these events
need not be only in the test code -- sorry, they need not be only in the
test code; you can put them in the
bodies of these methods such that you get there [inaudible] so that
it's not executed atomically. Is there one more question here or...?
>> Juan Vargas: [Inaudible].
>> Darko Marinov: Okay.
>> Juan Vargas: Thank you very much.
>> Darko Marinov: Thank you.
>> Juan Vargas: So now we have Danny, Danny Dig and -- Okay.
>> Darko Marinov: He's going to talk about annotations, I believe.
Danny?
>> Danny Dig: Refactoring.
>> Darko Marinov: Refactoring. Thank you.
>> Danny Dig: All right. So I'm Danny Dig. I'm a research professor at
the University of Illinois where I work in the area of software evolution,
and I especially want to give more of a high-level talk on my past work
on software evolution, my current work on software evolution, and I
will conclude with my future work on software evolution. Of course a
lot of -- I will put emphasis, more emphasis, on the work that has to
do with parallel programming, but I've done much other work that
has nothing to do with parallel programming. Regardless, it's still
very useful.
So this guy here is a famous Greek philosopher. His name is
Heraclitus. He is famous for many things, among others this quote,
"Change is the only guaranteed constant." You know, this is [inaudible]
for us who are using software every day because we know that the only
software that remains successful is the software that constantly
changes. And here I have examples of changes: people add more
features. Microsoft constantly pushes more features, new versions. You
know, Windows 8 comes now. You know, people fix bugs, improve
performance, improve security. In fact the only software that doesn't
change is software that is dead that nobody uses.
So here are some visual reminders that our software changes. This is
from -- I just looked for updates on my Microsoft Office, and apparently
Office 2011 has a bunch of updates and some of them are critical, so
probably I should go and apply them. Now here's another visual
reminder. So Windows 7 also has a bunch of updates for me. And I see
there are several people here also using Macs, so if you're using
a Mac this is another reminder that Apple also has lots of updates for
you as well. Again, these are reminding us that our software constantly
changes.
So programming is all about change. It's about how we can manage and how we can
express changes in large, complex code-bases. So we need the
way we program to better support this changing ecosystem. So I view
programming as program transformations, [inaudible] version N and
producing version N plus 1. Now how do they do that -- This is one of
the research questions that my group addresses, what are the kinds of
changes that occur most often in practice? And second, how can we
automate them to improve programmer productivity and software quality?
So answering these questions is not only very valuable for the practice
of software development, it's also extremely rewarding and challenging
intellectually.
Here is a very high-level overview of my work on automating these three
kinds -- successfully automating three kinds of software
transformations. So for my [inaudible] I've been working on how can we
upgrade clients of a library API to move from version 5.2 to version
6.9 or whatever. I've also been looking at software testing which I
mostly collaborated with Darko on. Well, it's not only that the
software under test or the production system changes. But when a
production system changes, you also have to change all these regression
test [inaudible]. You have to change the assertions. So how can you
automatically also update the test [inaudible]? Of course the most
recent -- My work has been in the area of how can we change sequential
software for doing [inaudible] parallelism and [inaudible] parallelism.
And of course this is a very hard problem, and I do not believe in full
automation. I'd rather believe in an interactive approach where we use
the smarts, brain of the programmer to guide the tool. And then the
programmer [inaudible] to the tool, "This is the kind of
transformations I want to do on my code." The tool is going to check,
"Is this safe?" And if it's safe then it applies the transformation. So
this is not fully automatic. It's interactive; it's driven by the human
brain.
More than the number of publications, you know, I am very, very happy
to see a real-world impact of my work. So I've been developing the
world's first open-source refactoring engine
for Java. This was developed as a plug-in for JEdit. JEdit used to be
the number one IDE ten years ago. Some of my other work, in the area
of software upgrading, is shipping with the official release of Eclipse.
Eclipse is the number one developer environment for the Java
developers. It is used daily by millions of Java developers. Some of
the other work that I have done here doesn't ship with the official release
of Eclipse but as stand-alone plug-ins. But it's still widely used at
several companies and several research institutions, so it's used
widely at Google and other big large companies. Here in the area of
software testing, one of our tools, ASTGen, is used in the testing
infrastructure at Sun NetBeans where they test the NetBeans IDE. Actually,
right now it is no longer Sun; this should be Oracle. And of course the
most recent work on refactoring, interactive refactoring for
retrofitting parallelism is going right now into the official release
of Eclipse. So if you are using the Eclipse Juno release, which is 4.2, some
of this work is going into 4.2.1 which is coming late August.
So I do not have time to talk about all this work. This is just one
slide, very high-level overview of our interactive refactoring for
parallelism. So we are supporting all kinds of changes that people find
practical and they need in the practice of converting their sequential
code for parallelism. We have one [inaudible] called enabling
transformations; these are transformations for thread-safety. They do
not introduce multithreading, they just take your code and prepare it
for multithreading to make it more thread safe. One example of this is
making a whole class immutable. So I'm changing the class so that
for all the instances of this class, you know, I can never update and
never change the state. Once I [inaudible] this class, now this class
is what we call [inaudible] thread-safe. I can share it with all my
friends in the world, and there is no need for synchronization.
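A minimal Java sketch of that enabling transformation, on a hypothetical
class rather than one from the tool's evaluation:

    // Before: mutable, so instances cannot be shared across threads
    // without external synchronization.
    class MutablePoint {
        private double x, y;
        void moveBy(double dx, double dy) { x += dx; y += dy; }
        double x() { return x; }
        double y() { return y; }
    }

    // After: immutable -- fields final, "mutators" return new instances --
    // so instances can be freely shared with no synchronization at all.
    final class ImmutablePoint {
        private final double x, y;
        ImmutablePoint(double x, double y) { this.x = x; this.y = y; }
        ImmutablePoint moveBy(double dx, double dy) {
            return new ImmutablePoint(x + dx, y + dy);
        }
        double x() { return x; }
        double y() { return y; }
    }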
Here we have refactorings from the second category. These are actually
refactorings to introduce multithreading. So here we have one that
introduces the parallel recursive divide-and-conquer fork-join task
parallel pattern that Tim also has in his book in OPL. So we have one
refactoring, and this is one of [inaudible] that goes in Eclipse 4.2.1.
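To illustrate the shape such a refactoring produces, here is a small
fork/join divide-and-conquer sum in plain Java -- a hypothetical example,
not the tool's actual output:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;
        private final long[] a;
        private final int lo, hi;

        SumTask(long[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override protected Long compute() {
            if (hi - lo <= THRESHOLD) {            // base case: solve sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += a[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;             // divide
            SumTask left = new SumTask(a, lo, mid);
            left.fork();                           // conquer left half in parallel
            long right = new SumTask(a, mid, hi).compute();
            return right + left.join();            // combine
        }
    }
    // usage: long total = new ForkJoinPool().invoke(new SumTask(a, 0, a.length));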
And we also have another one for loop parallelism. This converts
sequential loops to parallel loops, again via an interactive refactoring
approach.
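And a sketch of what the loop-parallelism transformation does, shown here
with java.util.stream purely for illustration (a is a hypothetical int
array; this is not necessarily the library the tool itself targets):

    // Before: a sequential loop.
    static long countPositivesSequential(int[] a) {
        long count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0) count++;
        }
        return count;
    }

    // After: the same computation as a parallel loop.
    static long countPositivesParallel(int[] a) {
        return java.util.stream.IntStream.range(0, a.length)
                .parallel()
                .filter(i -> a[i] > 0)
                .count();
    }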
We have a third category of refactorings. These are the ones
that you want to apply if you have some locks or some other heavyweight
synchronization mechanism in your code. You want to make your code more
scalable, so you want to get rid of these and replace them with more
scalable constructs like these Atomic packages that use underlying,
under-the-hood compare-and-swap hardware instructions. So Java has them
via the Atomic package. The [inaudible] has them via the Interlocked
construct. I forgot how it's called in -- TBB also has something called
atomic. So it's the same construct.
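For example, a lock-based counter and its more scalable replacement using
Java's Atomic package -- a generic illustration, not one of the talk's
benchmark classes:

    import java.util.concurrent.atomic.AtomicLong;

    // Before: every increment takes the object's lock.
    class LockedCounter {
        private long value;
        synchronized long increment() { return ++value; }
    }

    // After: java.util.concurrent.atomic uses a compare-and-swap
    // hardware instruction under the hood -- no lock, more scalable.
    class AtomicCounter {
        private final AtomicLong value = new AtomicLong();
        long increment() { return value.incrementAndGet(); }
    }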
As an approach, as the way I'm developing and conducting
research, I always validate empirically. So we validate our
refactoring tools by running them against hundreds of files from
open-source projects and also doing controlled experiments with programmers.
And we found out that these kind of tools dramatically improve
programmer productivity. It's much -- You know, they are fast enough even
though they do very complex and very intelligent program analysis which
requires interprocedural pointer analysis to figure out what [inaudible].
So these are still very fast. It's fast enough that actually
programmers can use it in the interactive environment, in interactive
mode. We found out that unlike the open-source programmers, our tool
applies the transformations correctly; whereas open-source developers
apply them incompletely, so they probably carry out nine-tenths of
the refactoring and leave one-tenth. And exactly that leftover one-tenth
is a bug. Of course there is a good motivation: once we refactor
this code, it exhibits good speedup.
Doug Lea, who is the number one Java lead architect of the Concurrent
package in the Java standard libraries, writes on the mailing list that
he was very impressed with one of our tools, which is ReLooper. He says, "I
expect it'll be useful to just about anyone interested in exploring
these forms of parallelization."
Well, it's not only that we develop these kinds of tools and put
them into the hands of the programmers but, you know, I believe that if
you ship them as a plug-in for a widely-used developer
environment they are much, much more likely to be used. If you want to
really impact the lives of millions of developers that's the way to
ship and to package your research.
So the tools that we've presented so far are all developed as plug-ins
for Eclipse. Now we're starting to work on plug-ins for Visual Studio,
but that's the next slide coming. You can download these tools for
free, open-source, at refactoring.info/tools or you can wait for
another five weeks and you can download them from the official release
of Eclipse. So if you do the Eclipse Juno release, the minor release
number one, some of these tools will be integrated in the official
release of Eclipse. This is very exciting.
On the educational side, we have these successful summer schools. So I
have educated more than 800 participants who come to these summer schools on
this topic of refactoring for multicore parallelism and on the topic
of multicore parallelism itself. So we just finished last week our
fourth summer school on multicore programming at Illinois, and this was
very successful. We had some high-profile keynote speakers of the
caliber of Doug Lea and Cliff [Inaudible]. I also do one-week training
courses, so I've done three of them in this area. And when I come to
Seattle I either go to Boeing or I go to Microsoft. Boeing is
very interested in this topic, so apparently they are interested enough
to hire me to teach these one-week intensive programming exercises and
these training courses. I also do tutorials at big conferences like
OOPSLA or ICSM. And more than this, actually, we have another metric:
there are actually people out there who care enough about tools that they not
only download and use our tools but they also bother to send us a
bug report. So we are humans; we make mistakes. And our software is no
different, so all the tools that I've presented so far I know they have
some bugs. We are fixing those bugs. We are making them more
production-quality and apparently some of them are good enough that
Eclipse Foundation has integrated them into official release.
So here I want to talk about some of the current and ongoing work. So I
looked at the industry trend, and I saw that the industry has this
trend of converting the very hard problem of using parallelism to a
slightly easier problem of using a parallel API, using a library. And
Microsoft has these libraries; they're TPL and PLINQ. Intel has Threading
Building Blocks. And there are many other libraries out there. Yet, we
know very little about how programmers actually adopt these
libraries in practice. We know very little of how they can find
examples, how they can educate themselves on how to use these library
APIs. You know, some of these APIs are overly complex. Some of them
are very rich, so they expose several overloaded methods with several
arguments. Yet, programmers have very few examples from
the real world. So also, library designers, if they don't
know about how people use these in practice, do not find examples
of misuse. They do not know what's tedious, what's error-prone.
Researchers can make wrong assumptions or you can build a data
[inaudible] tool thinking that people only use locks.
It turns out that from our study we found out that actually locks are
just one construct. There's a long tail of many, many other
synchronization constructs. So we analyzed all the open-source projects
that we could find in the CodePlex repository -- this is Microsoft's CodePlex
repository -- and in GitHub, which is a very widely popular repository. So we
analyzed all the projects that use Microsoft TPL and PLINQ libraries.
This is about 17 million lines of code contributed by 1600 different
developers, and this is the first in-depth large study, a study on this
scale, of how do people in the wild, how do they adopt these parallel
libraries.
So some of the findings that we discovered through this study are quite
interesting. For example, we found out that indeed open-source
developers embrace and adopt parallelisms. We saw that 37% of all the
open-source projects in the GitHub and CodePlex ecosystem use some sort
of multithreading and out of them 74% use [inaudible] concurrency
[inaudible] and about 39% use multithreading for improving throughput
and actually squeezing parallelism out of their code. We also found
very surprisingly that in 10% of the cases, developers' code -- you
know, they think their code runs in parallel but in fact their code runs
sequentially. There was just this very small, minor syntactic mistake
that they made in their code. We also found out that developers make their
parallel code unnecessarily complex. So this is what we know from
software engineering; we call this accidental complexity. Parallel
programming is hard but they make it unnecessarily hard. This is the
first large-scale research project. We could only do this because
Microsoft has this new infrastructure called Roslyn; it's an analysis
infrastructure for Visual Studio. And we were, for a while -- and I think
we still are -- the one [inaudible] pushing it to its
limits. So when you crank this on 17 million lines of code, you
push it. You know, you push this to its limit. And of course we found
lots of bugs, and we reported more than 20 bugs to the developers. They
were very keen on acknowledging and on fixing those bugs. So in the
recent release that just came out last week or so, Roslyn is much more
robust now.
And, you know, one of the things is how do you report bugs without
actually burning bridges and without losing friends? And apparently we
managed to report more than 20 bugs in the Roslyn environment and we're
still very good friends with them. So some of the implications: so this is
good news for developers. So we have this website learnparallelism.net.
The only reason why it's called "dot net" is because it's for .NET.
Here you find thousands and tens of thousands of examples. You know, if you're a
new developer who has just heard, "Okay, I can do a Parallel.For
in C# with TPL," you can look at this website and find tens of
thousands of examples of how other developers in these more than 600
open-source projects use Parallel.For or any other construct you
care to learn about.
This is good news for researchers because now I can make a more
informed decision on what are the kinds of research tools that I want to
develop. Of course it's good news for library designers because Stephen
Toub who is the lead architect of TPL found this study very useful and
this is going to influence future API development of the TPL library.
So since I'm in the Microsoft ecosystem I thought, well, Microsoft was
nice to me so it gave me another grant to keep working on Microsoft
technology. So I started another project, this is very recent, on
refactoring for spreadsheets. So the surprising figures are the number
of spreadsheet end-users -- These are people who are called end-user
programmers -- this is estimated to be at least a hundred million. You
know, this is very, very conservative; the number could be way, way,
way larger than a hundred million. But anyway this gives you a sense:
there are at least [inaudible] more end-user programmers than professional
programmers. Well, what does it mean practically? It means that the
number of bugs that end-user programmers create is at least [inaudible]
larger than the number of bugs in professional software. And when we
looked at thousands of spreadsheets from the real world, we found out
that indeed they are riddled with the same, they are plagued with the
same smells, with the same errors, with the same mistakes that
professional programmers make: lots of hard-coded expressions,
duplicated expressions, duplicated constants, accidental complexity. So
this has effects on both the performance and also the future
maintainability of the spreadsheet and the workbook as well. So we
developed REFBOOK. This is the world's first refactoring tool for
Microsoft Excel formulas. And right now we are supporting several
refactorings. Here is an example of one of them. So you can look at a
table. You can extract a sub-expression from a complex formula and
extract it in its own column, and then the tool itself will go and find
some other instances of the same sub-expression in other columns and
replace all those instances with, you know, the new column that you
just extracted. So now if you have to change your table in the future --
if you have to change that sub-expression -- you just go and change it in one
single column. You don't have to go and hunt through all these other columns
that previously were duplicating the same sub-expression.
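For a hypothetical illustration (not a sheet from the study): if column D
holds =(B2*C2)*0.9 and column E holds =(B2*C2)*1.1, extracting the
sub-expression B2*C2 into a new column F -- so F2 is =B2*C2 -- rewrites
the others to =F2*0.9 and =F2*1.1, and any later change to the
sub-expression happens in column F alone.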
So like with any other tools, we always evaluate them empirically.
So here we've done an evaluation of [inaudible] from three different
angles: a survey, a controlled experiment with 28 Excel users,
and also a case study where we looked at 4000 spreadsheets from
the real world. And we found that users prefer the [inaudible]
ability and the nicer maintainability of refactored formulas. We also
found that our tool, of course as you expect with tools, you know, they
are faster and more accurate than doing these changes by hand. And we
also found that these refactorings are widely applicable because
[inaudible] thousands of spreadsheets there, as I said before, they
were riddled with all kinds of smells and all kinds of problems that
could be fixed through refactoring.
So I want to conclude with -- Uh-oh. Apparently I'm presenting the
wrong version of the slides. What happened here? Oh, so this is a
recover file. This is -- This is the recover file. This is not my --
Whoa. Okay. This is a surprise ending. I noticed that one of my bullets was
empty before. Okay. Let's see. Okay, that's because I [inaudible] --
It's because I closed the lid so that the lid wouldn't bother you. So
probably that crashed Microsoft Office and it recovered this. Anyway,
what I wanted to say here -- Okay. Currently I moved it somewhere.
Okay, so I'll just end up then with this slide. I had one more slide
which in my deck of slides is nice and polished, but you know
apparently the version that Office recovered just doesn't contain my
text.
So we are starting and we are developing and inventing the next
generation of development environments. This is a large project funded
by an SHF: Large program grant from the National Science Foundation. We are
inventing a new generation of programming environments that treat
software changes intelligently. So we are enabling programmers to
actually author and prescribe their own transformations and be able
to use them in other contexts. We are enabling a version control system
to show the history at the high-level, so it makes it easier to
understand the changes. Of course we are inferring these high-level
changes from low-level changes. So it means it's going to significantly
and drastically change the ecosystem of programming environments. And
of course it would be good to see some of these things going into
Visual Studio probably in a few years. So that's all that I had to say.
[ Audience clapping ]
>> Juan Vargas: We are running a little late. Probably if you have
questions, please see Danny during lunch. We are now going to have the
presentation from Koushik Sen, "Active Testing and Concurrit." Josep
Torrellas reminded me that Koushik came from the University of Illinois,
and he just got tenure from UC Berkeley. So this is another example of
great collaboration between two schools.
>> Koushik Sen: Thank you.
>> Juan Vargas: Big transfer.
>> Koushik Sen: So today I'm going to talk about Concurrit: a domain
specific language for testing concurrent programs. And this is joint
work with my student Jacob Burnim, Tayfun Elmas, my [inaudible], and my
colleague, George Necula.
So in the last few years there has been a significant amount of
progress in automated test generation for both sequential programs and
concurrent programs. And I was in fact involved in a couple of those
projects, and what I noticed in the last few years is that there is a very
slow adoption rate for these kinds of techniques in the industry. And
people are not -- programmers are not [inaudible] to use these
automated test generation tools. And on the other hand if you look at
tools like JUnit, xUnit and so on, programmers use them regularly and
they're very popular.
So we started thinking about developing a similar kind of xUnit tool
for concurrent programs, and we came up with this idea called
Concurrit. And there are some other tools that you have seen like
IMUnit, and there was a tool developed by Bill Pugh [inaudible] in the
same spirit.
So suppose this is -- So I use this SpiderMonkey JavaScript engine as a
running example. Suppose you want to test this code. It has 121,000
lines of code, and it is used by the Firefox browser. And you want to test
several functions in this code. Okay? Now it's easy to write a
sequential test for this code. You just write the harness
function, and you fix the inputs and you call the methods that you want
to test and you check the output.
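As a generic JUnit-style analog of such a harness -- using a JDK class
purely for illustration, since SpiderMonkey itself is C code:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class SequentialHarness {
        @Test
        public void fixedInputGivesFixedOutput() {
            StringBuilder sb = new StringBuilder();  // fixed setup
            sb.append("foo").append("bar");          // fixed inputs
            assertEquals("foobar", sb.toString());   // deterministic check
        }
    }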
And one of the nice properties of this kind of unit test is that if you
run it multiple times you get the same outcome, either it will fail or
it will say that the test passes. So this is a determinism property for
tests and it's very desirable. And we get this kind of deterministic
behavior for tests written for sequential programs. Unfortunately if you
want to do multithreaded testing or concurrent testing for this program
then you create several threads and you run this code in parallel.
But if you write such a test for a concurrent program, you lose this
property of determinism because the threads can interact with each
other and you may not get the same output if you run it multiple times.
Okay? So that is the key problem why there has not been any successful
tool for unit testing of concurrent programs. Now, in
the absence of such unit testing tools for concurrent programs, what
people do is stress testing. The idea is you create numerous threads, thousands of
threads, and you run the program for several minutes and see if
something bad can happen or not. Okay? And this is what people call stress
testing, and it's really used in practice.
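A minimal Java sketch of the stress-testing idea -- the racy counter here
is a deliberately buggy example, not code from the talk:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class StressTest {
        static int counter = 0; // unsynchronized on purpose: the bug we hope to expose

        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(100);
            for (int t = 0; t < 100; t++) {
                pool.submit(() -> {
                    for (int i = 0; i < 100_000; i++) counter++; // racy increment
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            // Usually less than 10,000,000 because of lost updates -- but
            // nothing guarantees the race ever manifests on a given run.
            System.out.println(counter);
        }
    }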
Another set of approaches that people have developed is called model
checking. It's mostly in the research area, where you try to explore
explicitly all the schedules of the threads and see if there is any bug.
The problem with model checking is if you try to apply it to real-world
code, like say Apache or Firefox, it's very hard to scale because there
are too many -- the [inaudible] is too high. Moreover, you have to have
total control over all possible sources of non-determinism if you are
trying to do model checking, and this is not realistic for real software.
Okay?
And also we have seen that many times programmers want direct control
over the scheduling in the test that they are trying to write. And if
we can incorporate the programmer's insight in the test then our testing
could be more effective. Okay?
Another alternative that people have explored, and which is, surprisingly,
the most used approach to test concurrent programs, is to insert sleep
statements as Darko showed. And if you look at the [inaudible] for
various kinds of concurrent software, you'll see that most of the bug
reports include sleep statements. And the idea is you create a number
of threads, but also you put the sleep statements at the right locations
so that you can replay the schedule and you can reproduce the bug. And
the reason for the success of the sleep statements is that it's very
lightweight and convenient to write these tests. On the other hand it's
not formal and it's not robust; you might see the bug sometimes. You
may not see the bug other times.
So what we wanted to do, we wanted to come up with a testing framework
that is as lightweight and convenient as sleep statements but at the
same time which is formal and robust. Okay? And Concurrit is the result
of that. So I'll give you a tutorial introduction to this Concurrit
using this benchmark, the SpiderMonkey JavaScript engine, which crashes
on some executions with an assertion failure. And here is the bug
report that was filed by some user to the bug database. And as you can
see, he tells us there is some kind of schedule involving three
threads and some unknown schedule between two other threads, and if you
take that particular schedule you will hit the assertion violation.
Now our question is how can you take this kind of programmer insight
and translate it into a very small test script so that we can make this
bug reproducible. And here is what we do in Concurrit. We take this
user insight and ideas about the thread schedule from the programmer,
and we write a test in our DSL. And then we run the DSL along with the
software and the test and systematically explore all-and-only thread
schedules that are specified by this DSL and see if we can reproduce
the bug. And I'm going to describe what the DSL looks like.
And here is how it works. The software under test is instrumented, and
it generates events in a similar way as IMUnit does. But whenever it
sends events, it gets blocked by the Concurrit DSL, and the Concurrit
DSL, based on its logic, unblocks the thread from time to time.
Okay? So let me show you what a Concurrit test would look like. So the
bug report you have seen mentions that you will see that
assertion violation if you schedule three particular threads in a
particular way. Okay, so we definitely know that the bug is happening
due to three threads. Okay? So let's try to first write a test that
will see if the bug is due to concurrency or not. So here is our first
test script which says, "Pick any three threads in the program and
run them sequentially until they terminate." Okay? And this is what we do
in this test. We pick TA, TB and TC, three distinct threads, and while
they're running, choose a thread and run it until the end. Now this
is a test that we write in our DSL, so this is very easy to
understand. It's an embedded DSL in C and C++ so you don't have to
learn a new language to write this kind of test. You have to understand
a few constructs that we have provided, and once you have written the
test you will run it along with the software under test. And we'll explore
six schedules in this case, six possible ways of running these threads.
And we see that there is no assertion violation.
And at this point we know that the bug is definitely due to
concurrency. Okay? Now we need to refine this test so that we can
actually create that particular schedule and hit the assertion
violation. So we write our next test which will try to explore all the
schedules of the program. And that's what we exactly do in case of
model checking. Okay? So here is the second test that we write. We just
change one line in our test which says that, "Run thread until ReadMem
or WritesMem," or some other event instead of saying that, "Run thread
until it ends." Okay? So this means that [inaudible] these threads in
all possible ways and explore the [inaudible].
Now this is a good test. This actually finds the bug, but it's like
model checking. We end up exploring more than one million schedules and
it runs for days and eventually finds that bug. Okay? And also in this
process we learn something: if we write this kind of test, you have to
take control of all the possible sources of non-determinism in your
code; otherwise, you cannot guarantee systematic termination of your
search. Okay? And moreover, since you are instrumenting all possible
instructions in your code, it's too heavyweight and you spend a lot of
time searching schedules which are not important for the bug. Okay?
So we came up with this notion called Tolerant Model Checking where we
say that it's unrealistic to instrument all possible sources of
non-determinism in your source code. So, why don't we allow the programmers
to specify the non-determinism that matters for the purpose of the bug
and control that? Okay? And also why don't we provide mechanisms so
that we can constrain the search, so that we can say, "Only
interleave the threads within these two functions. Do not interleave
the threads all over the place." Okay? So these are the two things we
provide in this Concurrit DSL, this Concurrit framework. And also in
this framework, if you wish to do [inaudible] you can write your own
heuristics, which can be full model checking or it can be context-bounded
model checking and other heuristics.
So let me show you how we can localize the search and only specify the
non-determinism that we are interested in. So if you look at the bug
report again, it says that the bug happens only when the three threads
are executing js_DestroyContext. Two of them are executing
js_DestroyContext and one of them is calling js_NewContext. Now let's
try to encode it in our test. And here is our test. We modify again two
more lines. Here, instead of waiting until a thread has entered
js_NewContext, we also make threads TA and TB wait until they have
entered the function js_DestroyContext. And then we try out all possible
interleavings between these threads. Okay? So this is the most
restricted and localized search. We do not start the search as soon as
we create the threads. But once we have entered those particular
functions, we start the search. Okay? Now if we -- So this is what the
tree looks like, the
model checking tree looks like. It first takes a particular specific
schedule and then it tries to search the interleaving space. Now if we
write this test then we can actually hit the bug after exploring fifty
thousand schedules, and we see the assertion failure in a few hours.
So this shows the power of Concurrit. In order to write a Concurrit
test, you don't have to control all possible sources of non-determinism
in your code. You can just specify the non-determinism that you want to
control and keep the other sources of non-determinism uncontrolled. And
the Concurrit test can still do systematic search, and if at any point
the systematic search fails because of the fact that you are not
controlling all the uncontrolled non-determinism, it will raise a flag. And
at that time the programmer, what they can do, they can either make the
test more robust by controlling more of the non-determinism or they can
continue the search and not expect any soundness guarantee.
Okay? So this is the idea. And finally, fifty thousand schedules is too
many. So we looked at the exact thread interleaving and we tried to
further localize it. And it turns out the bug report tells us
where exactly to interleave the threads so we can create the bug. And we
incorporate that knowledge further into the test by adding three of
these lines. And if we now run the test along with our software
[inaudible] test, we actually hit the bug schedule within ten
iterations.
And so after refining the test, we had a better understanding about the
bug and we knew exactly why the bug is happening. So we finally came up
with an exact schedule of the threads that actually leads to the bug.
And this specifies a single schedule. You don't have to do a search,
and you can run it and it will hit the bug on the first schedule. Okay?
And note one thing, this is not like specifying the entire schedule of
all the threads. We are only specifying the key scheduling
decisions that are important for producing this bug. Now once you
have this test, you can put it in your regression suite. And
it's kind of robust to code change. If the code changes, it will still
run and it will be able to find the bug if it's still present.
So this is a brief tutorial of the Concurrit framework that we have
developed for testing concurrent programs where the programmers get
better control of what they want to write and how they want to control
the schedules and play with the various kinds of model checking
heuristics and also search techniques.
So we have implemented this tool as an embedded DSL for C++ which is
available. We can write both unit tests and system tests. And for
unit testing we run it in the same process, and we do both manual and
automated instrumentation. And for system testing, we run it as a
separate process. So I guess we are running out of time. So just to
give you an idea, as I mentioned, that most of the model checking
techniques do not scale, but we managed to run it on real software
including the Mozilla JavaScript engine, the Threading Library,
Memcached, the Apache HTTP Server and MySQL, and we managed to actually
reproduce a number of bugs that have been reported in the bug database
in a robust way. And the tests are like five or six lines of code. And
that was a big success for us.
Thank you.
>> Juan Vargas: Thank you.
[ Audience clapping ]
>> Juan Vargas: So if you have more questions, Koushik is going to be
around for lunch. And now we are going to have the last presentation by
Sunny. Sunny Chatterjee from Microsoft, and he's going to be talking
about fighting concurrency bugs with advanced static analysis technology. And
after his talk we will have lunch. Food is going to be in the back
behind this wall, and then we are continuing with a session on
applications from one to three. So at one, please come back here and we
will continue with the session.
>> Sunny Chatterjee: Hello, everyone. My name is Sunny Chatterjee. I am
a developer in the Analysis Technologies team in Windows. And today I'm
going to talk about a set of concurrency tools we have developed based
on static analysis that we use for finding and fixing concurrency bugs
in major software at Microsoft like Windows, Office and other divisions.
I know we are running short on time so I'll be as brief as possible so
that you can go for lunch. So first I would like to talk about our
team. We are part of the Engineering Desktop team in Windows. We
develop and support some of the most critical program analysis tools
and services that are used across Microsoft. We have various tools at
the source level and binary level. At the source level we have a set of
global analyzers, local analyzers, and we have a source code annotation
language called SAL. At the binary level, we have binary
instrumentation tools and we have code coverage tools based on the
binary instrumentation technology.
Today specifically I'll be talking about concurrency SAL, which is a
source annotation language we have developed for specifying locking
behavior in a program. We'll also talk about a toolset called EspC --
EspC stands for extended search over programming -- that finds
concurrency defects and can also understand SAL. We'll also talk about
how we are using these tools internally at Microsoft to find and fix
thousands of concurrency bugs at compile time on the developer's
desktop, and how we are planning to ship these tools externally so that
we can help the ecosystem.
So we are all aware of the common locking problems that we see every
day. There are insufficient lock protections, which result in race
conditions. There are lock order violations, which result in deadlocks.
We forget to release locks, resulting in orphaned locks. There are
no-suspend guarantee violations. There are APIs which have an implicit
locking nature, like SendMessage, and we inadvertently call them
without realizing that they might have the potential to block our
application. There are many similar classes of locking problems, and
our tools try to find and fix these problems at the developer desktop.
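To make one of these classes concrete, here is a minimal C++ sketch of an orphaned lock; the function and data names are made up. An early return on an error path leaves the critical section held, which is exactly the kind of simple path-sensitive mistake these tools flag.

    #include <windows.h>

    CRITICAL_SECTION g_cs;   // protects g_itemCount (initialized elsewhere)
    int g_itemCount = 0;

    bool AddItem(int item) {
        EnterCriticalSection(&g_cs);
        if (item < 0) {
            return false;                 // BUG: early return orphans g_cs
        }
        g_itemCount += item;
        LeaveCriticalSection(&g_cs);      // released only on the success path
        return true;
    }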
So the key challenge for us is how do we enforce a locking discipline.
We all know that locking discipline is essential for avoiding
multithreaded errors; however, it's surprisingly difficult to enforce
in practice, and there is no support in high-level languages like C and
C++. So our solution is a set of concurrency tools that is based on
annotations for locking rules. Annotations are a formal way to make
implicit assumptions about the locking behavior explicit. We then have
a tool called EspC that uses local static analysis to catch locking
violations in the program. And we have a tool called CSALInfer, which
is an annotation inference and patching tool that can help jumpstart
the effort on a legacy code base where you don't have any annotations.
So what happens is, if you run CSALInfer, it automatically infers and
patches your code with concurrency SAL, and then EspC becomes a lot
more effective and accurate in catching locking violations in your
program. So the question might be asked: why did we choose SAL as the
solution? There are a few other approaches we explored. For example,
manual review of code. We all know there can be a large number of paths
in a given program, and in a multithreaded environment it's very
difficult to find concurrency bugs just by manual inspection.
Testing, we found, is ineffective for two reasons. One is that often we
find simple programming errors where data needs to be protected by a
lock: we acquire the lock, we write to the data, and we forget to
release the lock on a certain path. These are the kinds of simple
programming errors that we do not want to defer until testing. Testing
is also ineffective because sometimes concurrency bugs are very hard to
detect and debug. They might result in nondeterministic failures. They
might result in [inaudible] stress tests, so mapping that back to the
source code can be time consuming and expensive. We also have global
analysis tools like global EspC, but it's heavyweight and time
consuming. It can take a week to provide results, which renders it a
bit ineffective when we want to actually provide results right on the
developer desktop at compile time. So what we use is a local analysis
tool called EspC, which can find all these bugs at compile time. But to
make it more effective we use SAL, because SAL provides calling context
to the EspC toolset; without that context, its accuracy would be
limited.
SAL is nothing but a lightweight specification language that makes
implicit assumptions about a function explicit. Many times we find that
people do document the locking side effects of a function in comments.
Wouldn't it be nice to have a formal language where we can specify the
same things in a way that, at the same time, the local analysis tools
can take advantage of? SAL does exactly that.
We have a few annotations that we came up with; I just wanted to show a
few of them. For concurrency annotations we have, for example,
"acquires lock" and "releases lock," which provide function
postconditions: the annotated functions acquire and release the lock as
a postcondition, respectively.
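Here is a minimal sketch of how those postcondition annotations look in source code, assuming the annotation spellings from the published concurrency SAL headers; the wrapper functions and the lock name are made up.

    #include <windows.h>   // pulls in the SAL annotation macros

    CRITICAL_SECTION g_cs;

    // Postcondition: this function returns with g_cs held.
    _Acquires_lock_(g_cs)
    void LockState() { EnterCriticalSection(&g_cs); }

    // Postcondition: this function returns with g_cs released.
    _Releases_lock_(g_cs)
    void UnlockState() { LeaveCriticalSection(&g_cs); }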
Similarly, we have function preconditions like "requires lock held" and
"requires lock not held," where in certain cases a function needs a
certain lock to be held, or not held, before it is called as a
precondition. These two annotations let us specify those preconditions.
There is also a group of invariant annotations: sometimes specific data
needs to be protected by a certain lock, and we can specify this
behavior using the "guarded by" annotation. There is also a "lock level
order" annotation that can specify an ordering on the locks, so
whenever locks are acquired in the reverse order, we can detect
deadlock situations.

There is also an annotation called "no competing thread," because we
see that in a multithreaded environment there are certain
initialization functions or constructors that execute in a
single-threaded context, and "no competing thread" tells the analysis
not to worry about the multithreaded context when analyzing that
function.
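A combined sketch of these precondition and invariant annotations, again assuming the spellings from the published concurrency SAL headers; the data, functions, locks, and lock-level names are invented for illustration.

    #include <windows.h>

    CRITICAL_SECTION gCS;
    _Guarded_by_(gCS) int gState;            // invariant: gState only touched under gCS

    // Precondition: the caller must already hold gCS.
    _Requires_lock_held_(gCS)
    void UpdateStateLocked(int v) { gState = v; }

    // Precondition: the caller must NOT hold gCS; we acquire it ourselves.
    _Requires_lock_not_held_(gCS)
    void UpdateState(int v) {
        EnterCriticalSection(&gCS);
        UpdateStateLocked(v);
        LeaveCriticalSection(&gCS);
    }

    // Lock ordering: LevelA locks must always be acquired before LevelB
    // locks, so acquisitions in the reverse order flag a potential deadlock.
    _Create_lock_level_(LevelA);
    _Create_lock_level_(LevelB);
    _Lock_level_order_(LevelA, LevelB);
    _Has_lock_level_(LevelA) CRITICAL_SECTION gLockA;
    _Has_lock_level_(LevelB) CRITICAL_SECTION gLockB;

    // Single-threaded context: the analysis need not consider competing threads.
    _No_competing_thread_
    void InitState() { gState = 0; }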
So our tool is based on a very rich static analysis platform that's
used for a wide variety of static analysis tools in Microsoft. The
beauty of this platform is that it abstracts out the high-level source
language at the front end. So basically it can parse C, C++ and SAL. It
can parse managed code using an MSIL driver. It can also parse
JavaScript. It reconstructs an intermediate representation using
control-flow graphs, and we have an analysis layer on top of the
intermediate control-flow graphs which provides the tools for analysis.
For example, we have an alias analysis engine which can help us
determine if two variables point to the same memory location. We have a
pretty accurate symbolic path simulator called the Symbolic Simulation
Manager, which can accurately point out which are the feasible paths in
a given program. And on top of that we have a group of checkers, like
EspC, that check for specific properties. EspC, for example, checks for
concurrency properties. There are other checkers, like a null-pointer
checker, and other specific checkers on top of that.

So this is a very, very robust platform that's used extensively for
writing static tools. So...
>> : [Inaudible]?
>> Sunny Chatterjee: So the false negative rate -- The criterion is
that if we want to enable the tool on the developer desktop, or if we
run it as a daily build, then the tool has to be 80% or more accurate.
So the warnings that we enable on the desktop have an accuracy of 80%
or higher. As for false negatives, for a developer desktop they're not
a big concern because they don't cause a negative experience. What we
want to make sure is that we enable developers and make them realize
that we are actually helping them find the right bugs. So we are more
tuned and optimized towards reducing false positives; we haven't done
any analysis on false negatives in that way.
But if we have the right annotations, then it is very, very accurate.
The only scenario where you will have false negatives is when you don't
have good annotations in your code base. Then sometimes you might not
have all the locking behavior in the calling context, and at that point
you might have false negatives. So the approach we take is: we run EspC
out of the box without any annotations, and it provides value. Then you
run CSALInfer to infer annotations, and you run EspC again in an
iterative way, and it provides much more accurate warnings.
So here we show how the lock sequence is computed in the control-flow
graph. At every point in the program, EspC keeps track of the set of
locks that are acquired and released, and then it does some checking
for finding concurrency defects that I'll talk about in the next slide.
I wanted to talk about a few optimizations that we had to make so that
this can scale on large code bases like Windows, because otherwise we
did have scaling issues. One issue is that we tried to explore every
path in a given function, and we found that that does not always scale.
So one of the algorithms we use is path merging: on a given path, if
the property we are checking for doesn't change, then we merge the
paths into one. And this really helped the scalability of this tool.
There are also heuristics we use. We try to determine whether a given
function is actually concurrent or not, and if it doesn't seem like an
interesting function to analyze, we skip it. The path simulation is
also time-consuming. So what we do is we first turn it off, then we
try to analyze to find concurrency defects, and then we turn it back on
to rule out the false paths, the infeasible paths. That way we can
provide very accurate warnings in a very efficient way. We check for
various classes of defects. For example, we check for cyclic locking to
point out deadlock warnings. We check for insufficient locking to point
out race conditions. We also find out if functions exit while holding a
lock, so that we can point out orphaned lock warnings.
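To illustrate the bookkeeping being described, here is a minimal sketch of lockset tracking with path merging. This is not the EspC implementation; it's a hypothetical reconstruction of the idea, assuming the analysis walks a control-flow graph and keeps a set of held locks per program point.

    #include <set>
    #include <string>

    // Hypothetical lockset analysis state: the locks held at a program point.
    struct State {
        std::set<std::string> held;
    };

    // Transfer function for one lock operation; the defect classes mentioned
    // above are reported as side conditions.
    State Step(State in, const std::string& op, const std::string& lock) {
        if (op == "acquire") {
            if (!in.held.insert(lock).second) {
                // lock already held: double acquire, possible self-deadlock
            }
        } else if (op == "release") {
            if (in.held.erase(lock) == 0) {
                // releasing a lock that isn't held: likely a locking bug
            }
        }
        return in;
    }

    // Path merging: if two incoming paths reach a join point with identical
    // locksets, analyze the suffix once instead of once per path. This is
    // the optimization that makes the analysis scale to Windows-sized code.
    bool CanMerge(const State& a, const State& b) {
        return a.held == b.held;
    }

    // At function exit, any lock still in 'held' becomes an orphaned-lock
    // warning (unless an annotation says the function releases it later).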
I would like to talk briefly about the annotation inference engine that
we have today. We initially developed an annotation inference based on
a [inaudible] constraint solver, and when we tried to deploy it in
Windows, it did not scale. So what we ended up developing is a hybrid
tool which uses certain classes of heuristics to infer those
annotations. In this particular example, you can see that when we are
writing to the data in the process buffer, we are protecting it with
the lock PCS. So EspC is smart enough to figure out that the data must
be protected by that lock, and it infers a "guarded by" annotation on
the structure.
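A minimal sketch of that pattern, with invented names: every write to the buffer happens under one particular critical section, so the inference patches the structure with the corresponding annotation.

    #include <windows.h>
    #include <string.h>

    // Before inference: the locking invariant is only implicit in the code.
    struct Channel {
        CRITICAL_SECTION pcs;
        char buffer[256];
    };

    void WriteChannel(Channel* c, const char* data) {
        EnterCriticalSection(&c->pcs);
        strcpy_s(c->buffer, data);    // buffer is only ever written under pcs
        LeaveCriticalSection(&c->pcs);
    }

    // After inference: the invariant is explicit, so EspC can enforce it
    // at every access site.
    struct ChannelAnnotated {
        CRITICAL_SECTION pcs;
        _Guarded_by_(pcs) char buffer[256];
    };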
Next, about internal adoption in Windows. Today we run these tools on
the engineer's desktop. Every developer that writes code and compiles
will have EspC running as a background process, and all these
concurrency warnings show up on the desktop. This way we are fixing
thousands of warnings even before they get into the code base.
We also run these tools daily as a build verification service. So even
if a developer checks in a concurrency warning to the code base, we
have a daily build verification service that will flag these errors,
and the developer would need to fix them before the code can move out
of the branch. This ensures we ship a high-quality product that does
not have these concurrency warnings. It is used very extensively in
Windows. In the beginning of Windows 8, for example, we added thousands
of concurrency annotations. It was a joint effort across all the
[inaudible] that signed up for doing this work. Other divisions like
Office, Windows Mobile, SQL and the ConCRT team have also used these
tools. We are helping the ecosystem by shipping EspC as part of the
Visual Studio code analysis feature.
So this is available in the Pro and Ultimate versions of Visual Studio
2012. In this example, we see a program where an account balance is
protected by the lock CS, a critical section, but in the unsafe
withdraw method we are actually accessing that data without acquiring
the lock. So EspC is quick to warn about the possible race condition.
So this way, we do believe that by exposing this to external
developers, we can help the ecosystem write better multithreaded code.
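A minimal reconstruction of the example being described, using the "guarded by" annotation from earlier; the type and function names are invented, and the exact warning EspC emits is omitted.

    #include <windows.h>

    struct Account {
        CRITICAL_SECTION cs;
        _Guarded_by_(cs) int balance;   // balance must only be accessed under cs
    };

    void SafeWithdraw(Account* a, int amount) {
        EnterCriticalSection(&a->cs);
        a->balance -= amount;           // OK: cs is held
        LeaveCriticalSection(&a->cs);
    }

    void UnsafeWithdraw(Account* a, int amount) {
        a->balance -= amount;           // EspC warns: guarded data accessed without cs
    }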
We have a bunch of resources externally. We have our MSDN documentation
that talks about concurrency SAL. This documentation is still a work in
progress that we are finalizing, but you can access it today and go
through these annotations in detail. We also have our team blog on
Visual Studio code analysis. And we have a couple of talks from the
Build conference last year; there's a one-hour talk that goes into a
lot more detail about these tools. So you can also go and take a look
at those talks.
So what we covered today is a brief primer on concurrency SAL, and we
learned that we shouldn't treat locking discipline as an afterthought;
it should be very much part of interface design. We learned that we
should be using the EspC toolset because, for a Microsoft product, no
corner case is rare, so we do want to avoid those hangs and
nondeterministic failures on the customer's desktop. We also talked
about how we internally adopt these tools at Microsoft, because the
cost of fixing a bug increases with time; it's cheapest to fix it at
the developer's desktop, and this way we can push quality upstream. We
also talked about how we are shipping these with the VS11 code analysis
feature to help the ecosystem as a whole. So that's pretty much what I
had.
[ Audience clapping ]