>> Jim Larus: All right. It’s my pleasure today to welcome two visitors from
the University of Illinois. Swarup and Vikram are both here today and
tomorrow so if there are other people who are not on this schedule who want
to talk there is a little bit of time, but it’s my pleasure to welcome Swarup
who is going to be giving the talk today about some work he has been doing on
an automated debugging framework.
>> Swarup Kumar Sahoo: Thank you. Good afternoon everyone. The title of my
talk is Towards a General Automated Debugging Framework using Automated
Software Fault Localization by Filtering Likely Invariants. This work was
done along with my colleagues Vikram Adve, John Criswell and Chase Geigle at
the University of Illinois.
The goal of our work is to develop some kind of semi-automated system to
help programmers fix bugs. And let me give some motivation for our work.
According to a NIST report software failures cost nearly 60 billion dollars
every year. And widely used applications contain the largest number of bugs.
For example Mozilla gets nearly 300 bug reports every day, so they need some
way to prioritize and diagnose these bugs.
It’s also known that the cost of fixing bugs increases as the software
development life cycle progresses, so bug-fixing costs are very high during
the operational/maintenance phase. So it’s very important to fix as many
bugs as possible before the application ships and gets deployed. The process
of debugging involves 3 key steps. First is reproducing the failure, then
trying to locate and understand the root cause that is responsible for the
failure and then finally trying to fix the root cause.
And debugging is a complex job for many reasons. Reproducing the failures
may be very difficult in many cases. And the point of failure may be very
far off from the root cause. That kind of complicates the process of
debugging. And debugging is mostly a manual and time-consuming process now.
Automatic fault localization mainly focuses on the second step: it can
automatically identify the root causes, the program statements that
are responsible for a failure. It can also extract other valuable
information which may help the programmer in debugging or fixing the bug.
Automatic fault localization can reduce the debugging cost and time
significantly.
So our overall goal is to develop some kind of semi-automatic system to help
programmers fix bugs. One great example where such a system can be
applicable is during software testing. Any kind of automatic software
testing takes input programs, test inputs and expected outputs, and
produces failing tests. So they have all the ingredients that are required
for an automatic debugging framework; specifically, all the failing tests have
some kind of oracle which can detect whether the program fails, along with
inputs that fail and some inputs that do not.
So using all this information an automatic debugging tool can try to point
out possible locations of root causes and some other information like faulty
program values, faulty execution paths and the cause-and-effect chains which
produce the symptom. And all this information can be greatly valuable for
debugging and fixing bugs.
In particular currently we have worked on actually trying to localize the
faults. In one of our recent [indiscernible] papers we developed the
automatic system to identify the root causes of the failures. It was very
scalable and it reports very few false positives. And our tool takes the
program and a faulty input as input and tries to produce the faulty locations
in the program. It also provides some other valuable information
and we output it in a presentable way.
So our technique is based on a few basic techniques. One of them is the well
known popular Delta debugging strategy, which tries to compare the memory
states of two different runs to isolate the root causes. However, it’s
pretty expensive to do that. And likely program invariants can be a very
efficient way to summarize and compare memory states of different runs. And
what are likely program invariants? They are program properties which are
observed to hold in some set of successful runs, but unlike sound invariants
they might not hold true for all possible future runs.
For example we can say return value of some function is always positive, or
some stored value is between 0 and 100, or some load value is always 10.
These are some examples of likely invariants. And all the likely invariants
that fail during the failing run can give us a set of candidate root causes.
But, even after this step we still need lots of improvements for effective
localization.
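The idea sketched so far can be illustrated with a toy implementation. This is my own sketch, not the speaker's actual tool: it trains likely range invariants at observation points (loads, stores, function return values) from successful runs, then flags the points whose value in a failing run falls outside the trained range as candidate root causes. The run/point representation is an assumption for illustration.

```python
# Sketch (not the actual tool): train likely range invariants from good runs,
# then flag the invariants violated by a failing run.

def train_invariants(good_runs):
    """good_runs: list of dicts mapping observation point -> observed value."""
    inv = {}  # point -> (min, max) observed across all successful runs
    for run in good_runs:
        for point, value in run.items():
            lo, hi = inv.get(point, (value, value))
            inv[point] = (min(lo, value), max(hi, value))
    return inv

def failed_invariants(invariants, failing_run):
    """Points whose failing-run value falls outside the trained range
    form the initial set of candidate root causes."""
    candidates = []
    for point, value in failing_run.items():
        if point in invariants:
            lo, hi = invariants[point]
            if not (lo <= value <= hi):
                candidates.append(point)
    return candidates

good = [{"ret:weekday": 3, "load:daynr": 100},
        {"ret:weekday": 6, "load:daynr": 250}]
bad = {"ret:weekday": -5, "load:daynr": 120}

inv = train_invariants(good)          # ret:weekday trained to [3, 6]
print(failed_invariants(inv, bad))    # ['ret:weekday']
```

Training on fewer, closer inputs yields tighter ranges, which is exactly why the talk's auto-generated similar inputs matter: a broad range trained on dissimilar inputs would not flag the faulty value.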
Some important contributions of our work were that we used a novel mechanism
to train invariants. In particular we used auto-generated similar inputs,
which are close to the failing input to generate these invariants. And we
combined our approach with dynamic slicing, and we used two
novel heuristics for reducing the false positives further. And we used many
bugs in [indiscernible] applications like Squid, Apache, MySQL and Clang for
evaluation. And we got 5 to 28 locations as root causes even for programs of
100K-1M lines of code. After that we applied some trivial manual filtering
steps which give us only 2-14 program locations. And in many cases we had
only 2-4 locations. So the results were excellent.
>> So I have a question. What does root cause mean? Does it mean that you
can change these instructions?
>> Swarup Kumar Sahoo: I guess the root cause is any program statements which
are responsible for the failure. But in evaluation [indiscernible] we saw
the [indiscernible] which statements were changed to fix the failure. So for
evaluation we use only those statements. We call those statements root
causes.
>> [inaudible].
>> Swarup Kumar Sahoo: Yeah, some [indiscernible], although we don’t have
[indiscernible].
>> So how large are your final reports? Is it a program slice or just several
statements?
>> Swarup Kumar Sahoo: It’s several statements; maybe I will show the
statements.
Okay. So I have the motivation and contribution of our work and now I will
be talking about some problems with the existing fault localization work which
we tried to address. And after that I will give details of our bug diagnosis
framework. Then I will give some key experimental results. Then I will talk
about some of the future work we plan to do towards usable automatic
debugging tools.
So before going further I will give some definitions that I just talked
about. So we define all the faulty program statements that are responsible
for the failure as root cause of the software. And for experimental
evaluation purposes all the modified statements in the patches we call them
the locations of root cause of the software. And all the candidate root
causes which are not the true locations of the root cause, they are called
false positives.
>> So question: so you modify [indiscernible] locations and you find one of
them? Is that a success?
>> Swarup Kumar Sahoo: No.
>> You need to find [inaudible]?
>> Swarup Kumar Sahoo: Only [indiscernible], which are really actually.
Sometimes they actually try to fix other things and some other irrelevant
statements. So, if it finds all the relevant statements that need to be
changed then we call it a success, but we need to find all of them.
So there has been a lot of other work on automatic fault localization. We
have classified them into 6 categories here. I will talk about only the
first two which are the most relevant to our work. If any of you are
interested I can talk about anything else. So Delta debugging is very
popular work for automatic fault localization. It’s a smart approach which
compares memory states of different runs, but it doesn’t scale well. And
there have been some improvements to delta debugging so that it can handle
larger applications. And this aims to find cause-effect chains, but in many
cases, 55 percent of the cases, it can still miss the root cause.
And, as I said, invariants are a compact, though not precise, way to
compare different runs, but most of the previous work, I think all of the
previous work has many issues. First the test inputs they use to train the
invariants may not always be applicable. And [indiscernible] of test inputs
is often low for training. And they don’t have any solution to make the
likely invariants narrow or tighter. So when the invariants become very
broad it may miss the root cause.
So some of the key insights of our work, which tries to improve on the
previous work, were that we used likely invariants [indiscernible] to
summarize and compare different runs. And in this way we can quickly isolate
summarize and compare different runs. And in this way we can quickly isolate
the difference in behavior and give the programmer an initial set of
candidates of root causes. And instead of using existing test inputs we
automatically generated similar, close good inputs to train the invariants.
And because of this we can now use very few close good inputs to train the
invariants. And because of this we get much tighter and relevant invariants.
So we have very few false positives.
That means we don’t miss --.
>> False negatives.
>> Swarup Kumar Sahoo: Oh, false negatives. That means we don’t miss many
root causes. But this may result in many false positives. And hence we
develop a sequence of novel filtering techniques to reduce these false
positives to a much smaller set.
Okay, now let me give some more details about bug diagnosis framework. So
this is the overall architecture of our tool. Our tool takes the program,
the original bad input, and an optional input specification. It then uses
them to try to generate many similar inputs, many similar good inputs. And
these good inputs are used to generate the invariants. These invariants are
then instrumented back into the program. After that, all the failed
invariants give us the initial set of candidate root causes. Then
we apply a set of false-positive filters to reduce this set of initial
candidates to a much smaller set.
In particular we apply three filtering steps. The first one is dynamic
backward slicing, the second one is dependence filtering and the third one is
multiple faulty input filtering. And I will talk about them in more detail
later. So let me give a concrete example to explain some of the concepts
later on.
So this is a bug from MySQL, and this bug happens when MySQL uses a specific
date field with [indiscernible] zero. And this causes a segmentation fault in
MySQL. The segmentation fault happens at line 7, when the weekday value
becomes negative and this results in a buffer overflow. But the actual root
cause starts at line 3, where an unsigned year value is used. And because of
this, when the year is 0, year minus 1 becomes a very large value instead of
minus 1. And this value [indiscernible] through various [indiscernible]
values and then the daynr value. Then daynr is used and this value
[indiscernible] to weekday. Then finally weekday becomes negative and it
results in a segmentation fault.
Okay. What I showed is a kind of simplified version of the code. Actually
this code is split between three different functions. And this is where the
buffer overflow occurs, at line 16. And there is a function which computes the
weekday value I showed in the previous slide. And the other function which
computes the daynr value I showed earlier. And the faulty values flow
through the green arrows here. And I will use the example to illustrate some
concepts later on.
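The unsigned-arithmetic wraparound at the heart of this bug can be distilled into a tiny sketch. This is my own simulation of the described behavior, not MySQL's actual code; the 32-bit width is an assumption.

```python
# Hypothetical distillation of the MySQL bug described above: the year is
# held in an unsigned type, so `year - 1` wraps around to a huge value
# instead of -1 when year == 0. Simulated here with 32-bit masking,
# which is how a C unsigned int would behave.

U32 = 0xFFFFFFFF

def sub_unsigned(a, b):
    """32-bit unsigned subtraction, as C would compute it."""
    return (a - b) & U32

year = 0
print(sub_unsigned(year, 1))   # 4294967295, not -1
```

That huge value then flows through the daynr computation into weekday, which ends up negative and indexes past the start of a buffer, producing the segmentation fault at line 16.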
Now a diagnosis with invariants, so in this work we use likely range
invariants to find potential root causes. Likely range invariants are the
ranges of values computed by individual instructions in the program in the
correct runs. And when these invariants get [indiscernible] during the
faulty run they give us the set of candidate
locations. Currently we have invariants only on load values, stored values
and the function return values since they are the most critical locations.
These are some of the examples of the invariants. So here, the return value
of weekday is between 0 and 6. And here some load value is always positive.
And here the stored value is always 100. So these are some of the examples
of invariants we are going to use. In the source code example we have
invariants on the return value of these two, weekday and daynr functions in
lines 9 and 12. And here the invariants return value is always positive.
And these actually fail during the faulty run. And they give us kind of
initial set of candidate locations.
One important point to note here is we are not actually trying to observe
invariants on [indiscernible] values, like for example let’s say
[indiscernible] or a temporary value here. So the bug may actually be anywhere
in the [indiscernible] values which feed values to the function return values
of the invariant instructions. So when we present the results, we output all
these statements also and give them to the programmer, since the bug may be
anywhere in the expression. And we call this the Expression Tree of the
invariant’s return value.
As I said earlier we train invariants using very similar inputs which are
close to the failing input. And in this way we can capture the key relevant
differences between different runs. And because of this we use very few
inputs to train the invariants. We get much tighter and relevant invariants
and we are less likely to miss the root causes, though it might result in
many false positives. We have many false positive filters for them.
I will briefly talk about how we construct inputs, and one important point is
this might not be the best way to actually generate inputs, although these
techniques work. And one of the key reasons we [indiscernible] here is that
we want to collaborate with testing teams like [indiscernible] and
[indiscernible] who are doing dynamic symbolic execution and other tools. By
using them properly we can generate the inputs in a much more systematic
manner.
Currently we have three approaches to generate inputs. One is deletion-based
specification-independent approach, which is kind of a variation of the well
known ddmin algorithm and apply character-level deletion. And the second
approach is a replacement-based specification-dependent approach. And for
this we actually need some kind of input specification, like what are the
tokens that can appear in the input, and for each token, what is the
alternative set of tokens that we can replace it with. So depending upon
the token type we try and create many variations of each token. And then we
change one token at a time to create the inputs.
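The deletion-based, specification-independent approach can be sketched as a ddmin-style pass that deletes character chunks from the failing input and keeps the variants that still run successfully as "close good inputs". This is my own illustration, not the tool's implementation; `runs_ok` is a stand-in oracle I assume exists.

```python
# Sketch of deletion-based input generation: delete one character chunk at
# a time from the failing input; variants that pass the oracle become
# close good inputs for invariant training.

def deletion_variants(failing_input, chunk=1):
    """Yield inputs obtained by removing one chunk of characters."""
    for i in range(0, len(failing_input), chunk):
        yield failing_input[:i] + failing_input[i + chunk:]

def close_good_inputs(failing_input, runs_ok, chunk=1):
    """Keep only the variants on which the program runs successfully."""
    return [v for v in deletion_variants(failing_input, chunk) if runs_ok(v)]

# Toy oracle: the program "fails" whenever the input contains "00".
runs_ok = lambda s: "00" not in s
print(close_good_inputs("a00b", runs_ok))  # ['a0b', 'a0b']
```

Because each variant differs from the failing input by a single chunk, the good inputs stay close to the failure, which is what keeps the trained invariants tight.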
And the third one was, for compiler bugs, we used a C-Reduce-based approach.
C-Reduce is a tool which tries to automatically create minimal test cases
for compiler bugs. And while it does so it actually produces many similar
inputs along the way. So we actually modified the test scripts in the
C-Reduce tool to keep track of the good and the faulty inputs and classify
them accordingly. And then after that we can actually select a small set of
inputs which are close to the original failing input. I have some slides; if
anyone wants to know more details about them I can actually explain later.
So right now what we have is we have a set of similar inputs. We then select
a set of close good inputs from them. Then we generate invariants using
those good inputs. And then what we will do is we will insert those
invariants back into the code and run it with the bad input. Now the failed
invariants will give us the set of initial candidates. But we still have
hundreds of candidates after this stage. It’s a significant reduction, but
it’s still too much for the programmer [indiscernible].
Hence we applied three different filtering techniques. The first is Dynamic
Backward Slicing. We strived to remove any kind of candidate invariants
which may not be influencing the symptom. Then we applied something called
Dependence Filtering where we tried to discard the dependent failed invariant
if there is no intervening passing invariants between two failing invariants.
And the third is Multiple Faulty Input Filtering, where we run the technique
for many different similar faulty inputs and then take an intersection of the
candidate root causes from all such inputs. And I will talk a little bit
about them now.
The first is Dynamic Backwards Slicing. Here we try to build the Dynamic
Backwards Slicing starting from the failure symptom. And any [indiscernible]
instruction or initial candidate root cause which does not fall on the
backward slice, we remove it. We implemented the NpwC algorithm in 2
phases and we handled both the data flow and control flow dependences. At
run time we record all the memory locations accessed, all the basic
blocks that are traversed, and the function calls and returns.
We then build a dynamic program dependence graph using this trace and the SSA
form. We call this tool [indiscernible]; it is available as open source and
other people have started using it. And there is also a Google Summer of Code
project this year where we are trying to actually make it more general and
widely available. And we computed this Dynamic Backwards Slicing on the
original failing run, since the root cause is likely exercised during the
failing run. And in our example the two invariants, which were on the returns
of the daynr and weekday functions, lie on the dynamic backward slice, so
they are not filtered out.
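The slicing filter described above can be sketched as a backward reachability walk over the dynamic dependence graph, keeping only candidates reachable from the symptom. This is my own simplified sketch; the graph representation is an assumption, and the real tool works over dynamic LLVM instructions rather than named nodes.

```python
# Sketch of the dynamic backward slicing filter: walk the dynamic
# dependence graph backward from the failure symptom; candidates off
# the slice cannot have influenced the symptom and are dropped.
from collections import deque

def backward_slice(deps, symptom):
    """deps maps each node to the nodes it depends on (data + control)."""
    seen = {symptom}
    work = deque([symptom])
    while work:
        node = work.popleft()
        for pred in deps.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen

def slice_filter(candidates, deps, symptom):
    """Keep only the candidate root causes that lie on the backward slice."""
    on_slice = backward_slice(deps, symptom)
    return [c for c in candidates if c in on_slice]

deps = {"crash": ["weekday"], "weekday": ["daynr"], "daynr": ["year"]}
print(slice_filter(["daynr", "weekday", "lex_token"], deps, "crash"))
# ['daynr', 'weekday'] -- lex_token is off the slice, so it is dropped
```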
Okay. Now let me talk about the next filtering step which is Dependence
Filtering. So the main idea here is the return value of the daynr function
is actually used by the return value of weekday function. In this case we
say that this one actually failed not because it’s faulty, but since it used
a faulty value from the previous dependent instruction. So, most likely the
root cause is here, not here. Hence we say that this is a possible root
cause, because the invariant’s return value is greater than or equal to 0, or
[indiscernible] here by a negative value. But this one is probably not a root
cause, so we can filter it out.
So in general the idea is that we go through the dynamic program dependence
graph and we check for invariant failures. If a failed invariant uses value
from another failed invariant we say that the dependent invariant is actually
probably not a root cause. It only failed because it used a faulty value
from the previous invariant so we can filter this out. In other cases where
there are passing invariants between two failing invariants, in those cases
we don’t filter this dependent invariant, because it uses the value from a
passing invariant. So our assumption is that this value is correct. So this
used the correct value and failed. So this is also a likely root cause. So
in this case we [indiscernible] the root causes.
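The dependence filtering rule just described can be sketched as follows. This is my own toy rendering of the idea, not the tool's code: a failed invariant whose direct producer also failed probably just propagated the faulty value, so it is dropped, while one fed only by passing invariants is kept.

```python
# Sketch of dependence filtering: a failed invariant that directly consumes
# the value of another failed invariant (with no intervening passing
# invariant) likely failed by propagation and is filtered out.

def dependence_filter(uses, status):
    """uses: invariant site -> sites whose values it consumed.
    status: invariant site -> 'fail' or 'pass'."""
    kept = []
    for site, stat in status.items():
        if stat != "fail":
            continue
        producers = uses.get(site, [])
        # Drop the site if some direct producer also failed: the faulty
        # value flowed straight in. Keep it if all producers passed.
        if any(status.get(p) == "fail" for p in producers):
            continue
        kept.append(site)
    return kept

status = {"ret:daynr": "fail", "ret:weekday": "fail", "load:x": "pass"}
uses = {"ret:weekday": ["ret:daynr"], "ret:daynr": ["load:x"]}
print(dependence_filter(uses, status))  # ['ret:daynr']
```

Here ret:weekday failed only because it consumed the bad daynr value, so it is filtered; ret:daynr used a passing value and failed anyway, so it stays a candidate. As the talk notes, this heuristic is unsound and can occasionally discard a true root cause.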
>> [inaudible].
>> Swarup Kumar Sahoo: Oh, eliminate the top one?
>> Yeah, because it looks like the code seems to have [inaudible].
>> Swarup Kumar Sahoo: Yes, it may have recovered, and one more thing is we
are actually only seeing one path; it might be going through another path to
the symptom. That’s one important reason. This is actually not a sound
technique; our filtering techniques are not sound. So it’s possible that
this may not be a root cause, and it’s possible that that is a root cause.
So our technique is currently not sound, so it’s possible that sometimes it
can filter out the true root cause. But in this case we don’t know; it may
be going through other values and may be affecting the symptom.
And now Multiple Faulty Inputs Filtering step. This is a very simple idea.
So we assume that root causes are the same for all the similar faulty inputs
which cause the same failure. So we assume that the root cause must be
present in the candidate root causes of all such inputs. So what we do is we
use the similar input methodology to create many similar inputs. And we
repeat the previous three steps to construct the candidate root causes for
each different input. Then we take an intersection of all those candidate
root causes which gives us the final set of candidate root cause locations.
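This last step is simple enough to write out directly. A minimal sketch (mine, not the tool's), assuming the per-input candidate sets have already been computed by the earlier steps:

```python
# Sketch of multiple-faulty-input filtering: intersect the candidate sets
# produced for several similar faulty inputs, on the assumption that the
# true root cause appears in every one of them.

def intersect_candidates(candidate_sets):
    """candidate_sets: list of sets of candidate locations, one per input."""
    if not candidate_sets:
        return set()
    result = set(candidate_sets[0])
    for s in candidate_sets[1:]:
        result &= s
    return result

per_input = [{"daynr", "weekday", "lex"},
             {"daynr", "weekday"},
             {"daynr", "io"}]
print(sorted(intersect_candidates(per_input)))  # ['daynr']
```

The assumption can fail: as reported later in the talk, one generated faulty input did not exercise the root cause, so the intersection dropped it.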
Any questions?
Okay. Now I have described some of the key details of our Bug Diagnosis
Framework.
So I will talk about some key experimental results now. So here I would like
to address two key important questions. The first is how effective is our
overall bug localization framework? The second is how effective are our
filtering techniques? So for experimental evaluation purposes we used
[indiscernible] bugs from four applications. And there were 5 bugs which were
missing code bugs. That means there were some parts of the code which were
missing. And we didn’t consider them because our framework currently can’t
handle them. So we need some additional kinds of invariants, like
[indiscernible], to
handle them. So we didn’t consider them for this evaluation. And we used
LLVM for compiling programs and running our passes.
This table gives some key characteristics of the 8 server bugs. We used 3
server applications: Squid, Apache and MySQL. The third column here gives
total number of static lines of code that are executed by the faulty run. So
we have thousands of lines of code that get executed. And
the fourth column gives the distance from the root cause to the symptom in
terms of dynamic number of LLVM instructions. And this column gives the
distance in terms of number of static number of lines of code. And this
gives the distance in terms of static number of functions from the root cause
to the symptom.
The important observation is that thousands of lines of code get executed in
the failing runs, and the distance along the slice from the root cause to the
symptom spans several functions. And this distance is especially high for the
incorrect output bugs. So for the incorrect output bugs there are tens of
functions between the root cause and the symptom. And for such bugs the
diagnosis process is more difficult.
>> So if you just did a dynamic slicing how close do those distances become?
>> Swarup Kumar Sahoo: This distance is showing along the dynamic slice I
think. Oh, sorry, this distance right, yeah this is along the dynamic slice
actually. Okay. I have not included the other instructions here. Let me
give the other relevant instructions. So if we take the slice from the
symptom to the root cause it will span through this many functions.
>> [inaudible].
>> Swarup Kumar Sahoo: Oh the bug input?
>> [inaudible].
>> Swarup Kumar Sahoo: Oh for this application, like MySQL is kind of some
kind of query, like I said example MySQL query. And for Squid and Apache
it’s [indiscernible]. So we take the inputs for the application. Okay.
So now let me talk about how effective was our overall bug localization
framework. Each of these bugs executed thousands of static invariants. And
when we ran our instrumented programs we had around hundreds of failed
invariants in each of those bugs. So we can see it’s a significant reduction
from thousands of invariants to hundreds of invariants, but I still think
it’s a lot for the programmers to analyze each of them and figure out the
root cause. And when we applied all three of the previous filtering
techniques we got around 5 to 28 program locations as candidate root causes.
So the filtering steps were quite effective.
And then what we did was we actually manually went through those root causes.
And we applied a [indiscernible] filtering step which I will talk about a
little bit later. We could then reduce it to only 2 to 14 program locations.
So the approach was pretty effective for these bugs. And we missed root
cause in one of the cases. And here the root cause was inside the Visit
function the skipProcessUses to false. And we called it the VisitExpr
function here. And here the condition in this branch was wrong, hence
[indiscernible] skipProcessUses to true. So it remained false and it comes
back and incorrectly caused the ProcessUses function and it results in all
sorts of violations.
So to handle these kinds of bugs there are several ways to tackle this.
First is we can do a better input generation for that [indiscernible] of the
failing runs from the good runs. Another kind of invariants may help here,
like [indiscernible] and also invariants on the intermediate values. Right
now, as I said, we have only [indiscernible], not on the temporary
intermediate values. Invariants on intermediate values can also help in such
cases. And that’s the kind of future work for us.
Yes?
>> [inaudible].
>> Swarup Kumar Sahoo: [indiscernible]. I mean any kind of general
invariance which takes into account which branches the program takes. One
example I can think of is kind of [indiscernible] invariants. So basically
if I have some use, which definitions it is using? That depends on the
[indiscernible] of the application, this kind of invariants. So for some
classes of bugs this may be pretty useful, for example missing code bugs.
Now okay, how effective were our individual filters? So if we see the
Slicing Filter is pretty effective. It was able to reduce nearly 80 percent
of the false positives. And the second was Dependence Filtering and it was
pretty effective reducing nearly 53 percent of the remaining false positives.
And the third one, Multiple Faulty Input Filtering, was somewhat less
effective; it reduced 14 percent. But I think we can still say it is an
effective filtering step because it’s still a significant improvement.
Now one important thing was for one of the bugs the last filtering step
actually missed the root cause, since for some of the faulty inputs we
generated the candidate set didn’t contain the root cause. And so after the
last step we actually had the root cause for 11 out of 13 bugs.
>> So did any of these [inaudible]?
>> Swarup Kumar Sahoo: Sorry, can you repeat?
>> Did it change the control flow of the [inaudible] of all these bugs?
>> Swarup Kumar Sahoo: After fixing? Um, the control flow? Yes, some of
them, not all, but some of them will change the control flow for sure. But,
I don’t understand, sorry.
>> I understand Slice and Dependence filter are dependent on each other.
>> Swarup Kumar Sahoo: Yes, Dependence Filtering actually uses the Slicing
scale actually. The Dynamic Program Dependence Graph is built --.
>> [inaudible].
>> Swarup Kumar Sahoo: [indiscernible].
>> You do it by itself?
>> Swarup Kumar Sahoo: No, okay, we applied these previous steps on all those
faulty inputs into [indiscernible]. So it can --.
>> [inaudible].
>> Swarup Kumar Sahoo: Oh, just by itself you mean; that we have not tried?
It may be much more effective then if we apply it at the end.
So I will talk about the manual filtering step we did. The programmer can
actually manually look into those root causes and quickly try and filter
false positives. For example, when we looked through the candidate root
causes we found out that advancing many --. We didn’t do any kind of
sophisticated processing; we just looked at the function names where those
failed candidate invariants were. By just looking at the function names we
could figure out they are very unlikely to affect the symptom.
For example, Lex and Parsing functions: many such candidates will fail if
there is a slight difference between the inputs, but they are very unlikely
to affect the symptom. Same is the case with the input/output functions.
Also, random number generators can get evaluated randomly without really
affecting the symptom. And we also observed that many time-related functions
fail; they can fail if you run them at different times, but they are less
likely to affect the root cause. This filtering actually did not eliminate
the true root causes.
So after applying this from 5 to 28 we could reduce them to 2 to 14
locations. This is actually one of the bugs, the candidate locations in one
of the bugs. And here I have simply [indiscernible] them to just include the
function names here. So first we can remove all the candidate root causes
due to the time function. Then I remove the candidate root cause for the
random --; my_ is some kind of random number [indiscernible]
in MySQL. Then there are two functions which were input/output functions.
Then finally there were functions which were Lex and Parsing functions.
After that we had only 3 candidate root causes here. And the root cause was
in this function.
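The manual step just walked through can be sketched as a name-based filter. This is purely illustrative and mine, not the tool's: the list of "noisy" name patterns and the example function names are my guesses at the kind of lex/parse, I/O, random-number and time functions the talk describes.

```python
# Sketch of the manual filtering heuristic, automated for illustration:
# discard candidates sitting in functions whose names suggest lexing/
# parsing, I/O, random-number, or time code, since those rarely affect
# the symptom. The patterns below are assumed, not from the tool.

NOISY = ("lex", "parse", "read", "write", "rnd", "rand", "time")

def manual_filter(candidates):
    """candidates: list of (function_name, location) pairs."""
    return [(fn, loc) for fn, loc in candidates
            if not any(pat in fn.lower() for pat in NOISY)]

cands = [("my_rnd", 10), ("yyLex", 22), ("calc_daynr", 31), ("vio_read", 40)]
print(manual_filter(cands))  # [('calc_daynr', 31)]
```

In the bug above, this is exactly how the 5-28 candidates shrank to the final few: the time, random, I/O and parsing candidates drop out, leaving the function containing the root cause.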
>> How come you are generating invariants on random number [inaudible] in the
first place? [inaudible].
>> Swarup Kumar Sahoo: Oh, okay you mean that can be any kind of invariant
there. Yeah, but since we are not using sound static analysis we are doing
it at runtime. So whatever [indiscernible] observes, it will
try and form some kind of invariants there.
>> Oh, so if you run long enough on good inputs [inaudible]?
>> Swarup Kumar Sahoo: Yeah, [inaudible].
>> [inaudible]. It’s more like an observed range of values in some small
number of runs.
>> Swarup Kumar Sahoo: Of the properties.
>> I get that, but I was surprised that you saw these dynamically generated
properties.
>> [inaudible].
>> Swarup Kumar Sahoo: Okay. I will talk about some of the --. So, one of
the bugs [inaudible] is the Squid len bug. It worked in our case, but if you
use kind of general test inputs it may not work. Why? It’s because of the
input field; the faulty input actually uses many special characters to
reproduce the failure. And if you have a larger user name in the training
set than in the faulty input, you will actually miss the root cause. For
example, in this case the failing input was something like this, where it had
many specific special characters in the input.
And if in the training set you include many different inputs and you have a
very large [indiscernible], it will fail, because there are some invariants
which are based on the lengths of parts of the input. And these will become
very broad if you use many different large inputs. But our approach is more
likely to find the root cause in this case.
So, some of the caveats of our work: the first thing is I talked about the
expression trees and how we output to the programmer. And the second is
input sensitivity. So we output to the programmer all the failing candidates
at each filtering step. And for each candidate we also include the
maximal local sub-expression tree rooted at that candidate, which includes
all the intermediate values that feed values to the invariant instruction.
And we do this since we only track the load values, stored values and
function return values, not the intermediate values, where the bug may
actually be.
As I had shown earlier, for the invariant on this return value the
expression tree will include all the statements that are marked in red. And
for the candidate root causes that we had after our last filtering step, the
total number of lines of code that the source expression trees map to is
shown here. These numbers are somewhat high for
some of the bugs; however I think that we can actually reduce the size of the
expression tree to a much smaller set by putting invariants on the
intermediate values and using some other invariants like address-based
invariants and control-flow invariants. When you form the expression tree
the values can actually escape through the addresses and make the expression
tree large for some of the candidate root causes.
And also we observed the bug behavior was somewhat sensitive to the inputs we
used. For example as you saw in one of the bugs we missed the root cause in
the last filtering step because of some similar faulty inputs we used. And
the [indiscernible] bug is a SQL [indiscernible] bug. In one of our
experiments, when we used manual general inputs to train the invariants it
missed the root cause, but in our automatic setup it didn’t miss the root
cause.
So why does our approach work so well? We initially had a few thousand static lines of code which were exercised by the faulty run, and we were able to reduce that to 2 to 14 locations. Some key reasons, I think: the likely range invariants were effective for comparing successful and failing runs of many bugs, and we used a few similar inputs to train the invariants. So we had a very tight and relevant set of invariants, which averted the false positives.
>> False negatives.
>> Swarup Kumar Sahoo: Sorry, I have been talking about false positives more. So, "which prevented false negatives". And we also had some very effective filtering steps to reduce the false positives. Now let me talk about some of the future work we are planning towards developing a really usable debugging tool.
First of all, our analysis already extracts a lot of useful information. For example, the failed invariant and its value can be very helpful in debugging. The bad inputs and the good inputs, and their differences, can also provide [indiscernible] clues about what the root cause might be. For example, if we observe that the year field is always 0 in all the bad inputs, that gives us a strong clue. In one of the bugs the parameter to the aggregate function was always negative to reproduce the failure. I think such [indiscernible] can be pretty helpful.
We also have the dynamic execution path from the invariant failure to the symptom, and all this information is pretty useful. We can additionally use some custom but simple static analyses to extract more information from the symptom, the invariant and the execution path. For example, we can do some kind of symptom-specific analysis: memory bugs have particular types of root causes, so if we do symptom-specific analysis we could probably pinpoint the root causes better.
Second, whitebox fuzz testing uncovers a large number of bugs, and today it is difficult to find out which bugs to fix, which bugs not to fix, and how. Diagnosis can significantly help decide which bugs to fix. In such a testing environment, for example a nightly testing environment, our tool can be much more practical, because one of the problems with our tool is that we need some kind of detector to find out whether an input causes the failure or whether it's a good input.
Especially for incorrect-output bugs we don't have such a detector, so we currently use some other [indiscernible] applications, some other similar application, to compare the output and detect whether it's a failing run or not. However, nightly testing automatically provides an easy detector for all the bugs, and it also gives us a buggy input. So this can make our tool really practical.
We also plan to pursue several future research directions to make this tool really usable and to really pinpoint the root cause. First, our current input generation is not really general, so we plan to do more robust input generation. Our filtering techniques are also not really sound, so we are planning to do more robust filtering steps. And we want to explore a broader category of invariants.
Currently there are many automatic testing tools, like [indiscernible], etc., which use these types of approaches, and they use constraint solvers to construct new good and bad inputs. Such techniques can be leveraged to build more robust, general and automatic input generation for our framework. However, there are some crucial differences. For example, we need a search strategy which can quickly find inputs with similar execution paths; in contrast, the existing testing tools try to explore different paths so that they can increase code coverage, and our goal is the opposite. We also need a selection strategy to increase the likelihood of finding the root causes. I have an example of how we can do this: in this bug the root cause was here, in the last step. There are two conditions here in this function, and C1 to Cn are the path constraints on the conditions taken to reach this function.
And we can combine these with the other two conditional statements here; now we have a complete path constraint up to this candidate root cause. If we negate a certain constraint and solve up to that constraint, we can get a new set of inputs. For example, if we negate the last constraint here, where each of these conditions is a branch condition through which we reach this particular point, and we solve these constraints, the solver may be able to give us a different input which is not a failing input, but is similar to the failing input. Next we can negate the previous constraint and try to solve, and the solver can give us another input which is similar to the previous one.
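The negate-last-constraint idea just described can be sketched as follows. A real system would hand the constraints to an SMT solver; here, purely for illustration, a brute-force search over a small integer domain stands in for the solver, and the three constraints are invented:

```python
# Toy sketch: given path constraints C1..Cn leading to the failure,
# keep C1..Cn-1, negate Cn, and search for a new input -- one that
# follows almost the same path but does not fail.

def solve(constraints, domain):
    """Stand-in for a constraint solver: first value satisfying all."""
    for x in domain:
        if all(c(x) for c in constraints):
            return x
    return None

# Hypothetical path to the failure: x > 0, x < 100, x % 7 == 0.
c1, c2, c3 = (lambda x: x > 0), (lambda x: x < 100), (lambda x: x % 7 == 0)
similar = solve([c1, c2, lambda x: not c3(x)], range(200))
print(similar)  # 1 -- satisfies C1, C2 but not C3: a similar, non-failing input
```

Negating the next-to-last constraint instead would give another input that shares an even shorter path prefix, which is how the sequence of similar inputs is generated.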
But, as I said, our goal is to explore similar paths, not to increase block coverage, so the search strategy will need to minimize the differences, not the other way around. The next direction is building a more robust filtering step. We are investigating dynamic symbolic execution to reduce the false positives. In particular, given a failure-inducing input and a particular statement, is there a different value of that statement which can avert the failure? We would like to ask this of the constraint solver.
Depending on the answer from the solver we can categorize the statements into three sets: candidate statements for which a different value can avert the failure, candidate statements for which no different value can avert the failure, and third, statements for which the solver cannot find any answer within a certain time bound. Such an approach may not be feasible if we apply it to the whole program, but if we already have a small set of candidate locations then it may be a feasible approach.
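A rough sketch of this filtering step, with the solver again emulated by bounded search over a small domain (the failure predicate and statement names are invented; the timeout category is omitted for brevity):

```python
# Toy sketch: for each candidate statement, ask whether some different
# value at that point averts the failure. Statements with an averting
# value stay candidates; statements where no value helps are filtered.

def classify(candidates, fails_with, domain):
    """candidates: {statement: value observed in the failing run}."""
    avertible, unavoidable = [], []
    for stmt, observed in candidates.items():
        if any(v != observed and not fails_with(stmt, v) for v in domain):
            avertible.append(stmt)      # keep: likely root cause
        else:
            unavoidable.append(stmt)    # filter out
    return avertible, unavoidable

# Hypothetical: the failure happens whenever 'idx' is negative,
# regardless of what 'tmp' holds.
def fails_with(stmt, v):
    return v < 0 if stmt == "idx" else True

print(classify({"idx": -3, "tmp": 7}, fails_with, range(-5, 6)))
# (['idx'], ['tmp'])
```

A real implementation would pose each query to an SMT solver with a time bound, which is where the third "no answer in time" category comes from.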
One issue here is scalability, because the execution can diverge from the point of [indiscernible] at the invariant location, and we can possibly use the successful and failing runs to control the scalability. For this example we can give some sort of constraint like 10 [indiscernible] daynr + 5.
Okay, sorry.
Yeah, so here it was computing daynr + 5, so this constraint models that statement. Then this statement models the cost from [indiscernible], and now we have a condition which says the [indiscernible] lies within the array bounds. Depending on whether the solver finds a solution, which it does in this case, we can say this is a possible root cause; if the solver can't find a solution we can filter it out.
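A toy model of this slide's query (the array size and value range are assumed, since the slide itself is not in the transcript):

```python
# Rough model of the daynr example: the candidate statement computes
# pos = daynr + 5, and pos then indexes an array of size N. The solver
# question: is there any daynr for which pos stays within bounds?

N = 12                                   # assumed array size

def in_bounds(daynr):
    pos = daynr + 5                      # models the candidate statement
    return 0 <= pos < N                  # models the array-bound condition

feasible = any(in_bounds(d) for d in range(-100, 100))
print(feasible)  # True -> keep this statement as a possible root cause
```

If the bounded search (standing in for the solver) found no such daynr, the statement would be filtered out, exactly as described above.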
>> [inaudible]?
>> Swarup Kumar Sahoo: I am sorry, pardon me.
>> This sounds like [indiscernible].
>> Swarup Kumar Sahoo: Okay.
>> They have a similar approach.
>> Swarup Kumar Sahoo: Similar approach, okay.
>> [inaudible].
>> Swarup Kumar Sahoo: Yeah, I don’t remember.
>> Yeah, we should take a look at it.
>> Swarup Kumar Sahoo: Yeah, yeah.
Okay. To summarize: the tool can automatically identify 5 - 28 candidate root causes, and the likely range invariants were effective for comparing runs. We used a few similar good inputs to get tighter invariants, and we had novel filtering techniques to effectively reduce the false positives. After we applied the manual filtering step we had only 2 - 14 candidate locations. One important question we would like to ask is: "Can concolic execution be used to make this approach more robust and pinpoint the root cause(s) of failure?"
Questions?
>> So I think in general it's interesting to think about how what's good for your technique is good for search. In things like CHESS and SAGE there is a search strategy by which you search paths or you search schedules, and in the space you are searching, essentially, as long as things are good you are doing some pruning and you're [indiscernible]. The next path or the next trace is often very similar to the previous one. So once you have crossed the threshold from good to bad, once you find the bad trace, it's very often the case that, since the traces are so similar, you already get a very good candidate cause because of the search strategy, because you are already close to a good trace when you find the bad trace. So I think it's interesting to think about search and what the notion of closeness is in similarity, because in some cases you are randomly generating tests, or you just leave it up to the users to generate good and bad tests, and then you don't necessarily have the notion of closeness. But in search you do; you get it sort of for free and it really helps. In our experience, I think, you get the root causes almost for free if your search strategy has nice properties.
>> Swarup Kumar Sahoo: Yeah, one way might be to see how the execution path differs between different runs to find out the closeness between the inputs.
>> Jim Larus: Any other questions?
Let’s thank our speaker.
>> Swarup Kumar Sahoo: Okay, thank you.
[clapping]