>> Martin Monperrus: Code Hunt makes programming fun. It makes it fun, so it's for fun twice, and I applaud, because fun is the way to go in life; it's what I want and it's what I want to give to my children, so fun is the word. On the other hand, I completely disagree that Code Hunt makes programming fun, because programming is already fun. I don't know about you, but for me it can be completely thrilling. I am on my computer for hours in this mindset, this pure programming mindset where you are deeply concentrated and you are indeed having a whole lot of fun. So why do we say on the one hand that Code Hunt makes programming fun, when on the other hand we know intuitively that programming is fun? In programming there is one special kind of fun, which is bug fixing. You know there is something wrong: your program does something unexpected. You expect something and the output is another thing. You start to write bug reports. You start to add logging statements to understand exactly what happened, and it can take hours to find an explanation and then to find a fix for this bug. Bug fixing is fun indeed, and a couple of years ago I started the meta bug fixing activity, which is automatic repair: designing programs that fix bugs directly. This is what I will be talking about today. With this, programming is fun, bug fixing is fun, and meta bug fixing, which is automatic repair, is even more fun. I come from France, from Lille. Where is Lille? This is all of Europe: France, the UK, Belgium, Germany here, and Lille is in the northern part of France, about here. It is very close to Belgium, so we drink a lot of Belgian beer, and yesterday I was asked about the specialties of food. We have a very good carbonnade, and on the last slide I will tell you about the best dessert that you can have in Lille, but that is only for the last slide.
I am an associate professor there at the University of Lille, which has one of the biggest computer science departments in France, and a researcher at Inria, a research institute for computer science co-located with the university. I spend most of my time there. I worked in Germany for three years before coming to Lille, and my main research is automatic software repair, with three kinds of activities. First, patch generation, that is, automating the bug-fixing process as practiced by developers. Second, what I call runtime state repair: repair at runtime, so instead of having a crash you try to change the program state so that the program doesn't crash, or something else; it's a different kind of repair. And third, I think that we will be able to automatically repair bugs only if we really understand the deep nature of bugs and the deep nature of bug fixing, which is slightly different, so I do a lot of empirical studies on what bugs are and how people fix them. I do most of my experiments on Java because of the missing link here: in my group we have expertise in Java source code analysis and transformation, so we have the perfect tools to manipulate ASTs and to manipulate test [indiscernible], exactly what's needed for automatic repair. What is the repair game? We have already seen yesterday that [indiscernible] we have two different games. The first one is the classical one: we have a specification and we try to code to this specification. The second one is: we have nothing and we try to reverse engineer the program. The typical repair game is slightly different. You have a program and you have a test case, like this one. It's a real test case, and the test case is failing. The user reports a bug.
Maybe the same user or a developer wrote this test case to highlight the bug. So this test case is failing, and the goal is to make it pass. It often happens this way: you first run the test case, and then you say, I want to make it pass. The user tells me that something goes wrong when you have zero entries in the simplex solver; this is a real bug from a real math library from Apache. The test case is this one, and we can see what the test case is: we have typical setup code and we have assertions. A failing assertion is what highlights the bug; the assertion is failing, it's the red one. Now, we are in real life: the developer who wrote this code left, so I am the developer who has to fix this bug, and I have to understand it. Okay, zero entries in the simplex solver, so probably this bug has something to do with zero values. But there is one here, one here, one here, one here, okay; there are many zeros. It's really hard to know which one is the problematic one. The bug is in the output of the simplex solver, so maybe the fix is in the simplex solver class. But maybe not, because maybe the bug is actually caused by something in the constraint class, or in the linear constraint class, or maybe in the linear objective function, so I don't really know. The game here consists of understanding what happens, because the simplex solver is not second nature for all of us: where it happens and what to do. The final solution for this real bug, the real patch, is this one. We have a one-character change in the method, but as we saw yesterday with William, a single-character change can have a huge semantic impact. And even if the final solution takes us where we want to go, what's the path to get there? The path to get there in this case, and this is different from what you have seen so far, is that we are dealing with large things.
In this example, at the point in time when this bug was reported, there were more than 38k lines of code and more than 200 classes, and the specification given by the test cases is even larger than the application: 14k lines of specification and many test cases. This sets up the size of the search space: I have to understand where the bug is and what I should change. What is repair, more formally? According to the repair game we have just seen, we have a specification S and a program P. The very standard question in programming is the correctness question: does P comply with S? Correctness is usually binary, if it is not the [indiscernible] case. We also have the synthesis problem: we only have S, and we synthesize a program P that complies with S. Repair, which for some reason is rather new in research (I'm still looking for those reasons), is slightly different; it sits between the two. You are given a program P and a specification S, but P doesn't comply with S, and you are looking for a change C such that C applied to P complies with S. So repair definitely involves this notion of specification and satisfaction, and also this notion of change, of edits on the existing program. This is the repair game that could be played: if we want repair games in Code Hunt, we give the players P and S, and we ask them for a change C that fixes the bug. This is the basic repair game, which is fun in itself, but we can have much more fun on the specification side in real life. First, the specification can be incomplete. Usually we have a test suite: it covers only some input points, some data points, and having only a partial input domain makes the repair problem statement even harder. The specification can also be implicit, for instance "the program shall not crash."
It's not necessarily written anywhere, and so sometimes you try to repair against an implicit oracle, and in this case it makes things more difficult. And the specification itself may be partially incorrect. When we use test suites as specifications, what we quite often see in real test suites is that some commits slightly change both the test case and the program, which means that the test case before was incorrect, so you cannot really take the specification as untouchable truth. As we said, with this P plus C, today I see repair as a local synthesis problem where 99 percent of the synthesis is already done: this is the existing program, and you try to synthesize the remaining percent of the program, and this remaining percent is the fix you are looking for. Let's assume that you can perfectly identify where the problem is. Then repair consists of synthesizing the part where the problem is. For instance, if you can identify exactly which if statement is buggy, you can remove the existing code and synthesize only that if statement to repair the program. We have in this case a local synthesis problem where we have the complete program, 99 percent of the solution, and some kind of [indiscernible] technique that tells us that this if is faulty, and then we synthesize it. Yesterday, Daniel's system told us "change this line to capture the code." It's exactly what we have here: change this line to repair the code. If we can find the exact location, we have a synthesis problem. We have explored this direction in a system called Nopol that I will present now. We are focusing on repairing buggy conditions. In general, roughly 10 percent of fix commits are one-liner fixes.
Among those 10 percent of one-liners, one third are changes in if conditions, so this is a common kind of bug, and we want to repair those kinds of bugs. Assume that you have a piece of code and a test case which is failing, and we want to find the exact place where the bug is. What's different with Nopol is what we call, using a term from an ICSE 2010 paper by [indiscernible] and colleagues, the concept of angelic value. Let's assume that an angel can come during execution, change the program state, and then the execution continues. Say we have a failing test case and passing test cases. The failing test case starts, executes, and so on, and arrives here. Before the condition is executed, an angel comes down to earth and says: this if condition evaluates to true. So it is true; you resume the program execution, you come to the assertion at the end, and you see whether it fails or passes. It fails. You start again. You come again to this if condition. The same angel comes, but this time sets the condition value to false. By setting it to false, the test case passes. So what we have here is that we observe an angelic value, which is a value that is arbitrarily set at runtime to enable a failing test case to pass. If you have a failing test case executing 100 if conditions, you can go through each of them one after the other, asking the angel to manipulate the execution, and see whether the test case passes afterwards. If you have found an angelic value, you have done 50 percent of the repair process: you know that if, at this point of the execution, the new code produces the angelic value, the test case passes. >>: If there are 100 conditions, wouldn't I have 200 different possibilities to try out, to find the angelic values for each one of them? >> Martin Monperrus: No, for two different reasons.
The first is that we assume a single point of repair, so we try them one by one. The second is that we assume, and it is very close to reality, that during a test case execution an if condition always evaluates to the same value. For a given test case, a given if is usually always true or always false, to some extent; we have preliminary data supporting this claim. So we have what we call angelic fix localization, which answers the "where": we find where we can synthesize code to fix the bug. Some properties: the search space is two times n, where n is the number of if conditions. And we have an interesting property: if we cannot find an angelic value, it means that the bug cannot be repaired by changing an if condition; it's another kind of fix. If you can find an angelic value for a given if for all failing test cases, you have an input-output specification of the repair problem. You are looking for a Boolean expression here, because we are in an if condition, such that for each failing test, the Boolean expression, evaluated in the expression's context, returns the angelic value. For each passing test you don't want to break the existing functionality, so for each passing test you want the synthesized expression to return the actually executed value. The new expression that you synthesize may be different from the previous one, but for the passing tests it should give the same value after evaluation. And so now we have an input-output specification of the repair problem. We know that we may be able to find a piece of code that returns the right values for the if condition. Here we have the context.
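The angelic-value search just described can be sketched in a few lines of Python (a toy stand-in for Nopol, which instruments real Java programs; `buggy_max`, its failing test, and the `angel` override are illustrative assumptions):

```python
# Toy stand-in for a program under repair: the if condition is buggy,
# and an "angel" may override the evaluated condition at runtime.
def buggy_max(a, b, angel=None):
    cond = a < b                 # buggy: should be a > b
    if angel is not None:
        cond = angel             # angelic value replaces the real condition
    return a if cond else b

def failing_test(angel=None):
    # The user expects max(5, 3) == 5; with the buggy condition this fails.
    return buggy_max(5, 3, angel) == 5

def find_angelic_value(test):
    """Force the condition to True, then False; return the value (if any)
    that makes the failing test pass."""
    for forced in (True, False):
        if test(angel=forced):
            return forced
    return None  # no angelic value: this condition cannot be the fix location

print(find_angelic_value(failing_test))  # prints True
```

With many if conditions, the same two-value loop simply runs once per condition, which is the two-times-n search space mentioned above.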
What is the context of an if? In Nopol we collect a number of variables in the program. Of course, the primitive variables. As I told you, I work with Java, so we aim at repairing Java programs, and we also collect the nullness of all object variables, whether an object is null or not, to be able to synthesize "if a is not null". We collect the values of side-effect-free methods with no parameters, like list.size() for instance, and of course all constants. And we have a piece of secret sauce here. One of the famous repair techniques is called GenProg, by Weimer, Forrest, Le Goues and colleagues in 2009, and the key assumption in GenProg is that the repair comes from elsewhere in the code. Last year we did our own study, published at ICSE, where we verified this assumption based on [indiscernible] mining. We look for commits that are only composed of code existing somewhere else in the code base. Depending on the way you count, between 10 and 50 percent of commits indeed never invent new code; they are rearrangements of [indiscernible]. Here we can do the same thing, and this also has some validity: a lot of if-condition repairs just rearrange the conditions, reusing a condition composed from elsewhere. So what we can do is evaluate complex expressions, and by complex I mean a method call with parameters. We can evaluate them before synthesis, because we cannot encode the semantics of those complex methods during the synthesis process. We collect all this, we have a bunch of values, and now we have a real input-output specification, where the context is the input and the output is the angelic value, as I presented. >>: Maybe you said it and I missed it. You said that you look at the actual repairs, the actual fixes; how many of those were only if conditions? What's the evidence that there's a lot of bugs that are just [indiscernible]?
>> Martin Monperrus: About 10 percent of [indiscernible] are one-liners, and approximately one third of one-liners are if-condition fixes. >>: So 3 percent of the fixes are, okay. >> Martin Monperrus: Now we come to the pure synthesis part of Nopol, where we use component-based synthesis. It's a wonderful technique, published at ICSE 2010 from Berkeley, which encodes the synthesis as an SMT problem. I won't go into the details, but basically we have the input-output specification, and we have the inputs, which are here; here we assume that the inputs are only two: one variable, index, and a constant, zero. We give components to the synthesis: what can we have in the synthesized expression? Here we have only one negation operator and one inequality operator, but we can have as many as we want. And we have the output. The encoding assigns a line number to each of those inputs, components, and outputs, and the SMT solver is asked to find some wiring between those components. Here the solver searches for four integer values. How are those integer values interpreted? Let's assume the solver finds those values. I'm missing the bottom of the slide, but anyway, let's go with it this way for now. The condition, the synthesized code, is connected to the component on line 3 here. Line 3 is the negation operator. The negation operator is connected to the output of line 4. The output of line 4 is the inequality operator here. And the inequality operator is wired to line 2 and line 1, which correspond to the two input variables. At the end, the SMT solver gives us those four integer values, and they correspond to this synthesized expression. We can use exactly the same thing for object-oriented fixes: here we encode as inputs the expressions we want to allow in the repaired expression, for instance list.size(), list is not null, and so on.
We also ask the solver for values, and at the end, we almost see it here: we have the output connected to line 5, line 5 is the "and" expression connected to lines 3 and 4, and so on, so we have synthesized this object-oriented fix: list is not null and list.size() is lower than index. Of course, we give many components, and some components may not be used; in that case the solver simply does not wire them. So what is Nopol? We have the angelic fix localization, which answers where the bug is and where it should be fixed. We have the runtime value collection, where we collect a large number of things. We encode this into SMT as the synthesis technique (we could use another one), and it gives us the new code; and then we have the patch synthesis. This is what Nopol is. To start the evaluation we take real bugs from two libraries, Apache Commons Math and Apache Commons Lang, which go up to 64k lines of code. We ran the system and we observed different kinds of things, for instance the kinds of patches synthesized by Nopol: we see standard if conditions. The result of this evaluation is that Nopol fixed 18 conditional bugs in large-scale object-oriented Java source code. The repair takes less than 2 minutes. The patches often differ from the original ones, which is at first surprising, but it leads to the next point: there often exist multiple different patches. It seems that humans select one patch among a set of different patches, all of them being valid, let's say. This is one component of the fun part of repair: you observe that software is much more plastic than you might imagine in the first place. Yes? >>: Your repair makes the test cases pass, right? >> Martin Monperrus: Yes.
>>: And how many of these patches actually go through the [indiscernible]? My issue with these kinds of repairs is always that making the test case pass doesn't mean that you fixed the code. So when you say that the patches are different, did you manually check that they are semantically equivalent? >> Martin Monperrus: Yeah. >>: And they all are? >> Martin Monperrus: Some of them are semantically equivalent. Some of them are not semantically equivalent but seem correct according to our understanding of the domain, because we are not the domain experts. And for some of them, the test suite is not enough. >>: Not semantically equivalent but you think they are correct? I mean, it's tough to say, right, with any reasonable confidence. So for how many of them can you say they are semantically equivalent? >> Martin Monperrus: Semantically equivalent or semantically correct? >>: Semantically equivalent. Semantically correct you don't have a specification for, so you can't really… >> Martin Monperrus: Exactly. I will tell you after. In the paper we have another column, which is whether it's the original patch, but here I only have this one. I think it's one third. >>: One third. And do you remember how many test cases were going through there? >> Martin Monperrus: Between four and 60 or 70. >>: Going through the condition that you fixed? >> Martin Monperrus: Yes, exactly. But the number of test cases is not really a reliable metric, because you can have one test case which is very large with many, many assertions, so it's actually a very, very strong specification of the condition. >>: Yeah, but that's already good, because in some of the work that I have seen you had like one or two test cases going through, which means that you can fix it very easily, right? You just have to patch it so the test case behaves as expected. Sometimes you don't even have an assertion, so as long as you don't crash, it's fixed. So anyways, okay. >> Martin Monperrus: I agree.
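To make the synthesis step concrete, here is a minimal sketch in Python where brute-force enumeration over candidate comparison expressions stands in for Nopol's SMT encoding (the variable names, the candidate grammar, and the use of `eval` are illustrative assumptions, not how Nopol works internally):

```python
import itertools

# Input-output specification collected at runtime: for each test, the values
# in scope (the "context") and the boolean the new condition must return
# (the angelic value for failing tests, the originally executed value for
# passing tests).
examples = [
    ({"index": 2, "size": 3}, True),    # passing test: keep original value
    ({"index": 3, "size": 3}, False),   # failing test: angelic value
    ({"index": 0, "size": 1}, True),    # passing test
]

def candidates(ctx):
    """Candidate boolean expressions built from the collected variables and
    constants, in the spirit of Nopol's components."""
    names = sorted(ctx)
    for a, b in itertools.permutations(names, 2):
        yield f"{a} < {b}"
        yield f"{a} <= {b}"
    for a in names:
        for c in (0, 1):
            yield f"{a} == {c}"

def synthesize(examples):
    """Return the first candidate expression satisfying every example."""
    for expr in sorted(set(candidates(examples[0][0]))):
        if all(eval(expr, {}, ctx) == out for ctx, out in examples):
            return expr
    return None

print(synthesize(examples))  # prints index < size
```

The SMT encoding scales far better than this enumeration: the solver searches the wiring of components directly instead of trying expressions one by one.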
Some discussion: what are the limitations of the system? First is the synthesis limitation: we were not able to synthesize code containing method calls with parameters. Large test cases are always a problem, because in large test cases the key assumption, that an if evaluates to only true or only false during the test case, doesn't hold anymore, and this was an issue for us. Weak test cases, of course: when there are no assertions, or bad assertions, or too-weak assertions, that is bad. And the angelic value localization technique does not do well when you have "if something, break" in loops, because the angel is an angel, never guilty, just beautiful and perfect, so the angel sets the condition to true or false; if true is okay, but if false is an issue in loops, because you can get into an infinite loop, and so on. Perspectives for the system: using other synthesis techniques to overcome the first limitation. We are also experimenting with the automatic repair of infinite loops, which is [indiscernible], and with the automatic repair of preconditions, method preconditions in the classical sense: if you are in a language that is poor with respect to preconditions, such as [indiscernible], the preconditions are written as ifs that throw new exceptions, and we can repair them as well; we are exploring this. One thing I claimed in a previous paper is that we have to be very careful about what we are talking about exactly when we compare to related problems, whether it's capturing code, so reverse engineering a program, or specification; the parameters are not the same. It's not the same thing to repair buggy conditions, unhandled exceptions, memory leaks and so on, and we have to be very careful: we have to qualify the problem and the [indiscernible] data set. And we come to the last slide. No, this is not the last one, because the last one is about the dessert from Lille, which is very good if you like chocolate.
Just before that: repair and Code Hunt. Yesterday and today I have seen many, many relations between repair and what we have seen in Code Hunt. First, of course, the hint generation system presented by Daniel yesterday. We could use exactly the same kind of technique as angelic values to speed up finding the hints. We could also use component-based synthesis instead of dynamic synthesis; it may hasten the hint generation. Second, I would be very interested in observing the players' repairs. It's similar to what we have done on commits. What is the very last step in the game: is it adding something, [indiscernible] changing the return value, and so on? We could really understand again the nature of getting to a solution, the path from the problem to the solution. And the third thing is this notion of repair duels. As I claimed at the beginning, bug fixing is very fun and very addictive, so maybe a repair duel, where you are given an almost correct implementation and the goal is to make it correct, might also be very fun and very addictive for players. Or this might just be a break in the game experience: sometimes you have a specification problem, sometimes you have a reverse engineering problem, and sometimes you have a repair problem, so it can be a break in the game experience. If we go for repair duels, there is a key question, which is: what makes a good repair duel? It's a hard question. Yesterday I started to play with the system, and basically there are some small changes that make all test cases given by [indiscernible] fail, and some that result in only one failing test case. I tend to think that a good repair duel has a good balance between passing and failing test cases. In other words, it's kind of depressing when you have only one failing test case; you don't want to repair this, it's not much of anything. So there is something in the game experience between the failing and the passing.
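The balance idea above can be sketched as a filter over mutants of a correct solution (a hypothetical sketch: the mutants, the 0.3 to 0.9 band, and the input points are all assumptions for illustration):

```python
def reference(x):
    """Secret correct solution of the duel: absolute value."""
    return abs(x)

# Hand-written mutants standing in for automatically mutated secret code.
mutants = {
    "identity": lambda x: x,   # correct on non-negative inputs only
    "negate":   lambda x: -x,  # correct on non-positive inputs only
    "zero":     lambda x: 0,   # correct on a single input
}

tests = list(range(-10, 11))   # input points standing in for the test suite

def pass_ratio(candidate):
    """Fraction of inputs on which the mutant agrees with the reference."""
    return sum(candidate(x) == reference(x) for x in tests) / len(tests)

# Keep mutants that are neither almost correct nor almost entirely wrong:
# a balanced duel sits between the two extremes.
good_duels = [name for name, m in mutants.items()
              if 0.3 <= pass_ratio(m) <= 0.9]
print(good_duels)  # prints ['identity', 'negate']
```

A real generator would mutate the secret code's AST instead of picking from a hand-written table, but the filtering criterion would be the same.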
There is definitely something with respect to what William told us about yesterday about the [indiscernible]. Maybe it's also more fun, in a repair duel, to repair a program which is correct on 90 percent of the input space. I don't know, but something has to be explored here, and the missing point is: if we are able to qualify those two dimensions, we may be able to generate repair duels from the existing solutions. So modify the secret code so that we have a buggy program which is fun with respect to the balance between failing and passing, and with respect to the model counting. If we can generate repair duels, it's good with respect to having a lot of data, because as far as I understood, one issue is the number of programs we have. If we can generate new fun programs with a good game experience, that would be a cool thing. Conclusion: automatic software repair is fun. That's the takeaway of my talk. It's fun because we have the fun of the search space, which is huge. We have the fun of dealing with the specifications, which can be incomplete and can be incorrect, so it's very, very fun to understand the nature of specifications, and the synthesis is really fun as well. When you have a system, a demo of automatic repair is always very boring, because a patch just [indiscernible] out of the system, but there is the [indiscernible] of the output, where you can read and touch [indiscernible], so that's a very good experience. This is the end of my talk, and this is a wonderful picture of a merveilleux. What do we have in a merveilleux? Like automatic repair, with its specification, search space and synthesis, we basically have three components. The merveilleux, when you first look at it, looks fat, because there's a lot of chocolate and cream and so on, but as usual with French cuisine, it's actually very, very fine.
In the middle there is meringue; meringue is something made with eggs, but it's very, very light, so that as soon as it's in the mouth it disappears. [laughter]. Around the meringue we have some cream, which is excellent, and on top there is dark chocolate. What's incredible with the merveilleux is that even if it looks very fat, it's very fine and very light in the end. This is a specialty of Lille, and since Lille is only 45 minutes away, one hour away from London and one hour away from Brussels, you are all welcome to visit us in Lille and try the merveilleux. Thank you very much. This is the end of my talk. [applause] >>: If you go back to the previous slides, to the proposed work on automatically generating duels: I do not know where you [indiscernible] dimension, or what you repair for repair duels. The coding duel naturally supports that: basically, instead of giving "return zero" as a starting point, you just give them the faulty versions. I mean, with the existing [indiscernible] you could already prepare these kinds of duels for training students in terms of their bug-repairing skills, right? I just would like that clarified, because we have seen almost all of them just start with "return zero" or "return x", but the [indiscernible] is very flexible, allowing whatever initial code you want to start with, and whatever code serves as a hint telling students what kind of requirements they need to [indiscernible], other than pure guesses. And I like the last point: could we leverage the data, the change histories of all these players, to evolve or produce new games, new coding duels? Maybe the initial duels would be pure guessing, but from the mistakes being made by the players in the past you may produce more specific repair duels.
>> Martin Monperrus: The thing that strikes me today is that automatic repair is fun, but actually there is a hidden part that is not fun at all, which is finding actual bugs, real bugs, for your next ICSE paper, because reviewers want you to repair real bugs. Finding bugs is easy, but repairing them is hard, and it takes a whole lot of time. For a long time we said, okay, we can generate bugs, it's very easy, we just take mutants and that's fine; but then it is not realistic anymore, so for your next ICSE paper it's an issue. But here it's different: we can generate them, because the goal is game experience, and maybe skill building, and in this case it's perfectly fine to mutate the code according to a game-experience metric, and it makes perfect sense. It's pretty cool. Thank you again. [applause]