>> Martin Monperrus: Code Hunt makes programming fun. It is made for fun, so it is fun twice over, and I applaud, because fun is the way to go in life. It's what I want, it's what I want to give to my children, so fun is the word. On the other hand, I completely disagree that Code Hunt makes programming fun, because programming is already fun. I don't know about you, but for me it can be completely thrilling. I am on my computer for hours in this mindset, this pure programming mindset where you are deeply concentrated and you are indeed having a whole lot of fun. So why do we say on the one hand that Code Hunt makes programming fun, when on the other hand we know intuitively that programming is fun? In programming there is one special kind of fun, which is bug fixing. You know there is something wrong. Your program does something unexpected. You expect one thing and the output is another thing. You start to read some reports. You start to add some logging statements to understand exactly what happened, and it can take hours to find an explanation and then to find a fix for this bug. Bug fixing is indeed fun, and a couple of years ago I started the meta bug fixing activity, which is automatic repair: designing programs that fix bugs directly. This is what I will be talking about today. So programming is fun, bug fixing is fun, and meta bug fixing, which is automatic repair, is even more fun. I come from France, from Lille. Where is Lille?
This is all of Europe: France, the UK, Belgium, Germany here, and Lille is in the northern part of France, about here. It is very close to Belgium, so we drink a lot of Belgian beer, and yesterday I was asked about the food specialties. We have a very good carbonnade, and on the last slide I will tell you about the best dessert that you can have in Lille, but that is only for the last slide. I am an associate professor there at the University of Lille, which has one of the biggest computer science departments in France, and a researcher at Inria, which is a research institute for computer science co-located with the university; I spend most of my time there. I worked in Germany for three years before coming to Lille. My main research is automatic software repair, with three kinds of activities. First, patch generation, that is, automating the bug fixing process as it is practiced by developers. Second, what I call runtime state repair: repair at runtime, where instead of having a crash you try to change the program state so that the program doesn't crash, or something else happens; it's a different kind of repair. And third, I think we will be able to automatically repair bugs only if we really understand the deep nature of bugs and the deep nature of bug fixing, which is slightly different, so I'm doing a lot of empirical studies on what bugs are and how developers fix them. So I have those three activities. I do most of my experiments on Java, because of the missing link here: in my group we have an expertise in Java source code analysis and transformation, so we have the perfect tools to manipulate ASTs, to manipulate test [indiscernible], exactly what's needed for automatic repair. What is the repair game?
We have already seen yesterday that [indiscernible] there are two different games. The first one is the classical one: we have a specification and we try to code against this specification. The second one is that we have nothing and we try to reverse engineer the program. The typical repair game is slightly different. You have a program and you have a test case, like this one. It's a real test case and the test case is failing. A user reports a bug, and maybe the same user or a developer wrote this test case to highlight the bug, so this test case is failing and the goal is to make it pass. It often happens like this: you first run the test case, and then you say, I want to make it pass. The user tells me that it does not work when you have zero entries in the simplex solver; this is a real bug from a real math library from Apache. The test case is this one, and we can see what the test case is: we have typical setup code and we have the assertions. The assertion is what highlights the bug; the assertion is failing, it's the red one. Now, we are in real life, so the developer who wrote this code has left; I am the developer who has to fix this bug and I have to understand it. Okay, zero entries in the simplex solver, so probably this bug has something to do with zero values. But there is one zero here, one here, one here, one here, okay, there are many zeros. It's really hard to know which one is the problematic one. The bug shows up in the output of the simplex solver, so maybe the fix is in the simplex solver class, but maybe not, because maybe the bug is actually caused by something in the constraint class, or in the linear constraint class, or maybe in the linear objective function, so I don't really know. The game here consists of understanding what happens (the simplex solver is not very familiar to all of us), where it happens, and what to do.
The final solution for this real bug, the real patch, is this one: a one-character change in one method. But as we saw yesterday with William, a single-character change can have a huge semantic impact. And even if the final solution takes us where we want to go, what is the path to get there? The path to get there in this case, and this is different from what you have seen so far, is that we are dealing with large things. In this example, at the point in time when this bug was reported, the library was more than 38 k lines of code and more than 200 classes, and the specification, the test suite, is even larger than the application: 14 k lines of specification and a large number of test cases. This somehow sets up the size of the search space; I have to understand where the bug is and what I should change. What is repair, more formally? According to the repair game we have just seen, we have a specification S and a program P. The very standard question in programming is the correctness question: does P comply with S? Correctness is usually binary, if it is not the [indiscernible] case. We can add the synthesis problem: we only have S and we want to find a program P that complies with S. Repair, which for some reason is rather new in research (I'm still looking for those reasons), is slightly different; it sits between the two. You are given a program P and a specification S, but P doesn't comply with S, and you are looking for a change C such that C applied to P complies with S. So we have something like this. It definitely involves the notion of specification and satisfaction, and also the notion of change, that is, edits on the existing program.
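Written out as formulas (my paraphrase of these three problem statements, not notation taken from any particular paper), this gives:

    % Correctness: given P and S, decide whether P satisfies S
    \text{correctness:}\quad P \models S\ ?
    % Synthesis: given only S, construct a program satisfying S
    \text{synthesis:}\quad \text{given } S,\ \text{find } P \text{ such that } P \models S
    % Repair: given P and S with P violating S, find a change C
    \text{repair:}\quad \text{given } P, S \text{ with } P \not\models S,\ \text{find } C \text{ such that } C(P) \models S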
This is the repair game that could be played: if we wanted repair games in Code Hunt, we would give the players P and S and ask them for a change C that fixes the bug. This is the basic repair game, which is fun in itself, but in real life we can have much more fun on the specification side. First, the specification can be incomplete. Usually we have a test suite, which covers only some input points, some data points, and having only a partial input domain, a partial error domain, makes the repair problem statement even harder. The specification can also be implicit, for instance "the program shall not crash". It's not necessarily written anywhere, and sometimes you try to repair against an implicit oracle, which makes things more difficult. And the specification itself may be partially incorrect. When we use a test suite as the specification, what we quite often see in real test suites is that some commits make failing test cases pass, but some commits slightly change both the test case and the program, which means that the test case before was incorrect, so you cannot really take the specification as an untouchable ground truth. That is another aspect of the repair problem.
As we said, with this P plus C formulation, today I see repair as a local synthesis problem where 99 percent of the synthesis is already done: there is an existing program, and you try to synthesize the remaining percent of the program, which is the fix you are looking for. Let's assume that you can perfectly identify where the problem is. Then repair consists of synthesizing the part where the problem is. For instance, if you can identify exactly which if statement is buggy, you can remove the existing condition and synthesize only that if condition to repair the program. In this case we have local synthesis: we have the complete program, we have 99 percent of the solution, we have some kind of fault localization technique that tells us that this if is faulty, and then we synthesize it. Yesterday, Daniel's system gave us hints like "change this line to capture the code". It's exactly the same as what we have here: change this line to repair the code. If we can find the exact location, we have a synthesis problem. We have explored this direction in a system called Nopol, which I will present now. We are focusing on repairing buggy if conditions.
In general, something like 10 percent of commits are one-liner fixes, and among those one-liners, one third are changes to if conditions, so this is a common kind of bug, and we want to repair those kinds of bugs. Assume that you have a piece of code and a test case which is failing, and we want to find the exact place where the bug is. What is different in Nopol is what we call, using a term from an ICSE 2010 paper by [indiscernible] and colleagues, the concept of angelic value. Let's assume that an angel can come during execution, change the program state, and let the execution continue afterwards. Say we have a failing test case and some passing test cases. The failing test case starts, executes, and so on, and arrives here. Before the condition is evaluated, an angel comes down to earth and sets this if condition to true. You resume the program execution, you come to the assertion at the end, and you see whether it fails or passes. It fails. You start again. You come again to this if condition. The same angel comes, but this time sets the condition value to false. By setting it to false, the test case passes. So what we have observed here is an angelic value: a value that is arbitrarily set at runtime to enable a failing test case to pass. If a failing test case executes 100 if conditions, you can go through each of them one after the other, asking the angel to manipulate the execution, and see whether the test case passes afterwards. If you have found an angelic value, you have done 50 percent of the repair process: you know that if, at this point of the execution, the new code produces the angelic value, the test case passes.
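As an illustration, a minimal sketch of this angelic exploration could look like the following; the InstrumentedProgram interface and the runFailingTest hook are hypothetical placeholders standing in for the instrumentation, not Nopol's actual API.

    import java.util.*;

    class AngelicFixLocalization {

        // Result of the search: which condition, and which forced value
        // made the failing test pass.
        record AngelicValue(int conditionId, boolean forcedValue) {}

        // Hypothetical handle on an instrumented program under repair.
        interface InstrumentedProgram {
            int numberOfConditions();
            // Run the failing test while forcing condition `conditionId` to
            // `forcedValue` each time it is evaluated; true if the test passes.
            boolean runFailingTest(int conditionId, boolean forcedValue);
        }

        // Try each condition with both truth values.
        static Optional<AngelicValue> locate(InstrumentedProgram program) {
            for (int id = 0; id < program.numberOfConditions(); id++) {
                for (boolean value : new boolean[] {true, false}) {
                    if (program.runFailingTest(id, value)) {
                        // The "angel" found a value that makes the failing test
                        // pass: this condition is a candidate repair location.
                        return Optional.of(new AngelicValue(id, value));
                    }
                }
            }
            // No angelic value: the bug cannot be fixed by changing one if.
            return Optional.empty();
        }
    }

Under the assumptions discussed just below (a single point of repair, and a condition forced to the same value throughout one test execution), this is at most 2 times n test runs, and an empty result means the bug cannot be fixed by changing a single if condition.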
>>: If there are 100 conditions, wouldn't I have 200 different possibilities to try out to find the
angelic values for each one of them?
>> Martin Monperrus: No, for two different reasons. The first is because we assume a single point of repair, so we eliminate the conditions one by one; and second, because we assume, and this is very close to reality, that during one test case execution an if condition always evaluates to the same value. For a given test case, a given if is usually always true or always false, to some extent; we have unpublished data supporting this claim. So we have what we call angelic fix localization, which gives us the "where": we find where we can synthesize code to fix the bug. It has some properties: the search space is two times n, where n is the number of if conditions executed. And we have an interesting property, which is that if we cannot find an angelic value, it means that the bug cannot be repaired by changing an if condition; it's another kind of fix. If you can find an angelic value for a given if for all failing test cases, you have an input-output specification of the repair problem: you are looking for a Boolean expression, because we are in an if condition, such that for each failing test the Boolean expression, evaluated in the context of that execution, returns the angelic value, and for each passing test, because you don't want to break the existing functionality, the synthesized expression returns the value that was actually executed. The expression you synthesize may be different from the previous one, but for the passing tests it should give the same value after evaluation. So now we have an input-output specification of the repair problem, and we know that we may be able to find a piece of code that returns the right value for the if condition.
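Spelled out in my own notation (a hedged paraphrase of the constraint just described), the expression e we are looking for over the collected context must satisfy:

    % failing tests: reproduce the angelic value observed for that test
    \forall t \in \text{FailingTests}:\quad e(\text{context}_t) = \text{angelic}_t
    % passing tests: reproduce the value actually evaluated, so that the
    % existing behavior is preserved
    \forall t \in \text{PassingTests}:\quad e(\text{context}_t) = \text{observed}_t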
What is the context of an if? In Nopol we collect the values of the variables around the repair location: of course the primitive variables (I told you I work with Java, so we aim at repairing Java programs); the nullness of all the object variables, that is, whether an object is null or not, in order to be able to synthesize things like "a is not null"; the values of side-effect-free methods with no parameters, like list.size(); and of course all constants. And we have a piece of secret sauce here. One of the famous repair techniques is called GenProg, by Weimer, Forrest, Le Goues and colleagues in 2009, and the key assumption in GenProg is that the repair comes from elsewhere in the code. Last year we did our own study, published at ICSE, where we verified this assumption based on [indiscernible] mining: we looked for commits that are composed only of code already existing somewhere else in the code base. Depending on the way you count, between 10 and 50 percent of commits indeed never invent new code; they only rearrange existing code. Here we can do the same thing, and it also has some validity: a lot of if condition repairs just rearrange the if conditions, they reuse a condition composed from elsewhere. So what we can do is collect and evaluate complex expressions, and by complex I mean method calls with parameters. We evaluate them before synthesis, because we cannot encode the semantics of those complex methods during the synthesis process. We collect all of this, we have a bunch of values, and now we really have an input-output specification where the context is the input and the output is the angelic value, as I presented.
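A rough sketch of what such a collected context might look like follows; the class and member names are illustrative only, not Nopol's real implementation.

    import java.util.*;

    // Illustrative snapshot of the data collected at a suspect if location.
    class ContextSnapshot {
        // Primitive local variables and fields visible at the if, e.g. index = 3
        final Map<String, Object> primitives = new HashMap<>();
        // Nullness of object variables, so that "x != null" can be synthesized
        final Map<String, Boolean> nullness = new HashMap<>();
        // Values of side-effect-free, parameterless methods, e.g. list.size()
        final Map<String, Object> getterValues = new HashMap<>();
        // Boolean conditions reused from elsewhere in the code base, evaluated
        // here because their semantics are not encoded during synthesis
        final Map<String, Boolean> reusedConditions = new HashMap<>();

        // What might be recorded just before "if (index <= list.size())"
        static ContextSnapshot example(List<Integer> list, int index) {
            ContextSnapshot s = new ContextSnapshot();
            s.primitives.put("index", index);
            s.nullness.put("list", list == null);
            if (list != null) {
                s.getterValues.put("list.size()", list.size());
            }
            return s;
        }
    }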
>>: Maybe you said it and I missed it. You said that you looked at actual repairs, actual fixes; how many of those were only if conditions? What's the evidence that there are a lot of bugs that are just [indiscernible]?
>> Martin Monperrus: About 10 percent of [indiscernible] are one-liners, and approximately one third of the one-liners are if condition fixes.
>>: So 3 percent of the fixes are, okay.
>> Martin Monperrus: In Nopol we then start the pure synthesis part, and we use component-based synthesis. It's a wonderful technique, published at ICSE 2010 by a group from Berkeley, which encodes synthesis as an SMT problem. I won't go into the details, but basically we have the input-output specification, and we have the inputs, which are here; assume the input is only two things, one variable index and a constant zero. We give components to the synthesis, which determine what we can have in the synthesized expression. Here we have only one negation operator and one inequality operator, but we can have as many as we want. And we have the output. The synthesis assigns a line number to each of those inputs, components and outputs, and the SMT solver is asked to find some wiring between those components; here the solver searches for four integer values. How are those integer values interpreted? Let's assume the solver finds these values (I'm missing the bottom of the slide, but anyway, let's read it this way for now). The output, the synthesized code, is connected to the component on line 3; line 3 is the negation operator. The negation operator is connected to the output of line 4; the output of line 4 is the inequality operator. And the inequality operator is wired to line 2 and line 1, which correspond to the two input variables. At the end, the SMT solver gives us those four integer values and they correspond to a synthesized expression. We can use exactly the same thing for object-oriented fixes: here we encode as inputs the expressions we want to allow in the repaired expression, for instance list.size(), "list is not null" and so on. We again ask the solver for values, and at the end, we almost see it here, we have the output connected to line 5, line 5 is the "and" expression connected to lines 3 and 4, and so on, so we have synthesized this object-oriented fix: list is not null and list.size() is lower than index. Of course, we provide many components, and some components may not be used; in that case they simply do not appear in the synthesized expression.
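For a flavor of that encoding, here is a simplified sketch in the spirit of the Jha et al. ICSE 2010 component-based synthesis, not the exact constraints used by Nopol: each input, each component output and the final output gets an integer location variable l, and the solver searches for an assignment satisfying roughly:

    % a component's output is placed after the inputs it consumes (acyclicity)
    \forall c \in \text{Components}:\ \forall i \in \text{inputs}(c):\quad l_i < l_{\text{out}(c)}
    % two ends of a wire carry the same value (connectivity)
    l_x = l_y \;\Rightarrow\; \text{val}(x) = \text{val}(y)
    % each component computes its own semantics, for example
    \text{val}(\text{out}(\neg)) = \neg\,\text{val}(\text{in}(\neg)), \qquad
    \text{val}(\text{out}(\le)) = \big(\text{val}(a) \le \text{val}(b)\big)
    % and, for every recorded context, the final output must equal the angelic
    % value (failing tests) or the observed value (passing tests)

Solving these constraints yields integer location values like the four mentioned above, which are then decoded back into an expression.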
So what is Nopol? We have the angelic fix localization, which gives us the answer to where the bug is and where it should be fixed. We have the runtime value collection, where we collect a large number of values. We encode this into SMT as the synthesis technique (we could use another one), which gives us what the new code is, and then we have the patch synthesis. That is what Nopol is. For the evaluation we take real bugs from two libraries, Apache Commons Math and Apache Commons Lang, which are up to 64 k lines of code. We run the system and we observe different kinds of things; in particular we look at the kind of patches that are synthesized by Nopol, which are standard if conditions. The result of this evaluation is that Nopol fixed 18 conditional bugs in large-scale object-oriented Java source code. The repair takes less than two minutes. The patches often differ from the original one, which is always surprising, but this gets to the next point, which is that there often exist multiple different patches. It seems that humans select one patch among a set of different patches, all of them being valid, let's say. As soon as you play with this, and this is one component of the fun part of repair, you observe that software is much more plastic than you might imagine in the first place. Yes?
>>: Your repair makes the test cases pass, right?
>> Martin Monperrus: Yes.
>>: And how many test cases actually go through the [indiscernible]? My issue with these kinds of repairs is always that making the test case pass doesn't mean that you fixed the code. So when you say that the patches are different, did you manually check that they are semantically equivalent?
>> Martin Monperrus: Yeah.
>>: And they are all?
>> Martin Monperrus: Some of them are semantically equivalent. Some of them are not semantically equivalent but seem correct according to our understanding of the domain, because we are not the domain experts. And for some of them the [indiscernible] made after the test suite is not enough.
>>: Not semantically equivalent but you think they are correct? I mean, it's tough to say, right, with any reasonable confidence. So for how many of them is the patch semantically equivalent?
>> Martin Monperrus: Semantically equivalent or semantically correct?
>>: Semantically equivalent. For semantic correctness you don't have a specification, so you can't really…
>> Martin Monperrus: Exactly. I will tell you afterwards. In the paper we have another column, which is whether it is the original patch, but here I only have this one. I think it's one third.
>>: One third. And do you remember how many test cases were going through there?
>> Martin Monperrus: Between four and 60 or 70.
>>: So they were going through the condition that you fixed?
>> Martin Monperrus: Yes, exactly. But the number of test cases is not really a reliable metric, because you can have one test case which is very large, with many, many assertions, so it's actually a very, very strong test case for those conditions.
>>: Yeah, but that's already good, because in some of the work that I have seen you had like one or two test cases going through, which means that you can fix it very easily, right; you just have to patch it so the test case behaves as expected. Sometimes you don't even have an assertion, so as long as it doesn't crash, it's fixed. Anyway, okay.
>> Martin Monperrus: I agree. Now some discussion: what are the limitations of the system? First is the synthesis limitation: the expressiveness of the synthesized code limits the repairs that can be achieved, and we were not able to synthesize code containing method calls with parameters. Large test cases are also a problem, because in large test cases the key assumption that a given if condition evaluates only to true or only to false during one test case doesn't hold anymore, and this was an issue for us. Weak test cases, of course, are bad: when there are no assertions, or bad assertions, or too weak assertions. And the angelic value technique does not do well when you have something like "if (condition) break" inside a loop, because the angel is an angel, never guilty, just beautiful and perfect: the angel tries true, and true is okay, but false is an issue in loops, because you get into an infinite loop, and so on.
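A tiny illustration of that loop pitfall (made-up code, not taken from the benchmark):

    // An if guarding a break: the worst case for angelic forcing.
    class LoopPitfall {
        // Counts how many halvings are needed before n drops below limit.
        static int steps(int n, int limit) {
            int count = 0;
            while (true) {
                // Suspect condition. Forcing it to true exits immediately, which
                // is harmless: the test simply passes or fails fast. Forcing it
                // to false on every evaluation means the break is never taken,
                // so the exploration hangs unless each trial runs under a timeout.
                if (n < limit) {
                    break;
                }
                n = n / 2;
                count++;
            }
            return count;
        }
    }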
Perspectives for the system: using other synthesis techniques to overcome the first limitation; we are experimenting with the automatic repair of infinite loops, which is [indiscernible]; and with the automatic repair of method preconditions, because in a language that is poor with respect to preconditions, such as [indiscernible], preconditions are written as ifs that throw an exception, and we can repair them as well; we are exploring this. One point I made in a previous paper is that we have to be very careful about what we are talking about exactly when we define repair problems, whether it's capture the code, so reverse engineering a program, or a specification problem; they are not the same. It's not the same thing to repair buggy conditions, unhandled exceptions, memory leaks and so on, and we have to be very careful: we have to qualify the problem and the [indiscernible] data set. And now we come to the last slide. No, this is not the last one, because the last one is about the dessert from Lille, which is very good if you like chocolate. Just before that: repair and Code Hunt.
Yesterday and today I have seen many, many relations between repair and what we have seen in Code Hunt. First, of course, the hint generation system presented by Daniel yesterday: we could use exactly the same kind of technique as angelic values to speed up finding the hints, and we could also use component-based synthesis instead of dynamic synthesis; it may hasten the hint generation. Second, I would be very interested in observing the players' repairs. It's similar to what we have done with commits: what is the very last step in the current game, is it adding the [indiscernible], changing the return value and so on? We could really understand again the nature of getting to a solution and the path from the problem to the solution. The third thing is this notion of repair duels. As I claimed at the beginning, bug fixing is very fun and very addictive, so maybe a repair duel, where you are given an almost correct implementation and the goal is to make it correct, might also be very fun and very addictive for players. Or it might just be a break in the game experience: sometimes you have a specification problem, sometimes you have a reverse engineering problem, and sometimes you have a repair problem where you have to find the fix, so it can be a break in the game experience.
If we go for repair duels, there is a key question: what makes a good repair duel? It's a hard question. Yesterday I started to play with the system, and basically there are some small changes that make all the test cases given by [indiscernible] fail, while some changes result in only one failing test case. I tend to think that a good repair duel is a good balance between passing and failing test cases. In other words, if you have only one failing test case it's kind of depressing; you don't want to repair this, it's not much of anything, so there is something in the game experience between the failing and the passing. There is also definitely something with respect to what William told us yesterday about [indiscernible]: maybe it's more fun in a repair duel to repair a program which is correct on, say, 90 percent of the input space. I don't know, but something has to be explored here, and the missing piece is that if we are able to qualify those two points, we may be able to generate repair duels from the existing solutions: modify the secret code so that we get a buggy program which is fun with respect to the balance between failing and passing tests and with respect to the model counting. If we can generate repair duels, it's good with respect to having a lot of data, because as far as I understood, one issue is the number of programs we have. If we can generate new fun programs with a good game experience, that would be a cool thing. Conclusion: automatic software repair is fun.
That is the takeaway of my talk. It's fun because we have the fun of the search space, which is huge; we have the fun of dealing with the specifications, which can be incomplete and can be incorrect, so it's very, very fun to understand the nature of specifications; and the synthesis is really fun as well. When you have a working system, a demo of automatic repair is always a bit boring, because the patch just [indiscernible] out of the system, but the [indiscernible] of the output, where you can read and touch [indiscernible], is a very good experience. This is the end of my talk, and this is a wonderful picture of the [indiscernible]. What do we have in the [indiscernible]? Like automatic repair, with its different components of specification, search space and so on, the [indiscernible] basically has three components. When you first look at it, it looks fat, because there is a lot of chocolate and cream and so on, but as usual with French cuisine it's actually very, very fine. In the middle there is meringue; meringue is something made with eggs, but it's very, very light, so that as soon as it's in the mouth it disappears. [laughter]. Around the meringue we have some cream, which is excellent, and at the top there is dark chocolate, but what's incredible with the [indiscernible] is that even if it looks very fat, it's very fine and very light in the end. This is a specialty of Lille, and since Lille is only 45 minutes away from Charles de Gaulle, one hour away from London and one hour away from Brussels, you are all welcome to visit us in Lille and try the [indiscernible]. Thank you very much. This is the end of my talk.
[applause]
>>: If you go back to the previous slides, to the proposed work on automatically generating duels: I'm not sure which dimension you would [indiscernible] or what you would repair for repair duels. The coding duel naturally supports that: basically, instead of giving "return zero" as the starting point, you just give the players the faulty versions. I mean, with the existing [indiscernible] you could already prepare these kinds of duels for training students in their bug repairing skills, right? I just would like that clarified, because we have seen that almost all of them start with "return zero" or "return x", but the [indiscernible] is very flexible, allowing whatever initial code you want to start with, whatever code serves as a hint telling students what kind of requirements they need to [indiscernible], rather than pure guessing. And I like the last point: could we leverage the data, the change histories of all these players, to evolve or produce new games, new coding duels? Maybe the initial duels would be pure guessing, but from the mistakes made by the players in the past you might produce more specific, more repair-like duels.
>> Martin Monperrus: What strikes me today is that automatic repair is fun, but actually there is a hidden part that is not fun at all, which is finding actual bugs, real bugs, for your next ICSE paper, because ICSE wants you to repair real bugs. Finding bugs is easy, but repairing them is hard and it takes a whole lot of time. For a long time we said, okay, we can generate bugs, it's very easy, we just take mutants, and that's fine, but then it is not realistic anymore, so for your next ICSE paper it's an issue. But here it's different: we can generate them, because the goal is game experience, and maybe skill building, so in this case it's perfectly fine to mutate the code according to a game experience metric, and it makes perfect sense. It's pretty cool. Thank you again. [applause]