50.530: Software Engineering
Sun Jun
SUTD

Course Outline

Date     Topic
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection
Nov 10   Hoare Logic and Proving
Nov 17   Symbolic Execution
Nov 24   Invariant Generation
Dec 1    Software Model Checking
Dec 12   Rely Guarantee Reasoning
Dec 19   Final Exam

Remarks: Debugging; Verification; Dec 15, 10 - 12

Week 5: Specification Mining for Debugging

Where is the bug?
• Where the bug is depends on what the programmer wants at each step.
• How do we know what the programmer wants?
• We “find out” what the programmer wants, borrowing ideas and techniques from machine learning.

Sahoo et al., ASPLOS 2013
USING LIKELY INVARIANTS FOR AUTOMATED SOFTWARE FAULT LOCALIZATION

The Idea
• Delta Debugging is perhaps inefficient and unscalable because it compares a pair of concrete program states: the differences are too many and too detailed.
• [Figure: a good program state next to a bad program state]
• In fact, the details don’t matter. The fact that the graph is cyclic matters.

The Idea
1. Generate more passed test cases
   [Figure: three good program states]
2. Generate likely invariants
   At L, x = 1 and y = -2
   At L, x = 2 and y = 0
   At L, x = 3 and y = 1
   Likely invariant: 1<=x<=3 and -2<=y<=1
   What forms of invariants do I use?
3. Test the likely invariant with the failed test
   Likely invariant: 1<=x<=3 and -2<=y<=1
   Bad: at L, x = 50 and y = 0
   L is a candidate root cause of the bug!
4. Reduce the candidate root causes
   • Dynamic program slicing: find out which statements affect the candidate root cause.
   • Dynamic dependence filtering: given two root causes A and B, if B is affected by A and A comes earlier, A is more likely the real cause.

Overall Picture
[Figure: overall picture of the approach]
• How to generate inputs?
• What invariants to generate?
• How to conclude that one candidate root cause is more likely than another?

Where is the bug?
[Figure: code excerpt from the MySQL database server]
It fails when the date is 0000-Jan-01.

1. Generate Inputs
• The inputs should be “close” to the failure input, in the same spirit as “nearest neighbor”.
• Systematically generate inputs based on the DD algorithm.

Algorithm 1
[Figure: Algorithm 1, with the following annotations]
• The initial good inputs + good inputs generated from DD
• A queue of good inputs to generate more good inputs from
• A list of good inputs

Algorithm 1
• Consider the input SELECT DATE_FORMAT("0000-01-01", '%W %d %M %Y') for the MySQL example. Does it work?
• If a specification of the input format is given, we can generate better and more meaningful inputs.

Algorithm 2
[Figure: Algorithm 2]
• Generate new inputs based on type

Research Discussion
• How do we guarantee that we generate inputs which are close to the failure input?
• Can we generate inputs at program points closer to the failure?

2. Generate Invariants
• The invariant should rightly “guess” what the programmer wants somewhere in the program.
  – Where do we generate invariants?
  – What form should the invariants take?
• Invariant: The returned value must be positive. How should we know this?

2. Generate Invariants
• Where do we generate invariants?
  – (in the paper) load, store and function return instructions.
    • Load: array[i] * 5 + 2
    • Store: array[i] = array[k] + 100;
    • Return: return x + y;
• How would you justify this? What is the consequence?

2. Generate Invariants
• What form should the invariants take?
  – (in the paper) a range invariant, e.g., x in [1..5]
• How would you justify this?

Overall Picture
• How to generate inputs?
• What invariants to generate?
• How to conclude that one candidate root cause is more likely than another?

4. Reduce Candidate Causes
• Using dynamic program slicing: given a statement S, the backward slice of S contains all statements which S depends on.
  – A data dependency is a situation in which S refers to the data of a preceding statement.
  – S is control dependent on a preceding statement if the outcome of the latter determines whether S is executed or not.
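The two dependence definitions above can be sketched as a backward reachability computation. This is only an illustration under stated assumptions: the class BackwardSliceSketch, its methods, and the six-statement trace S1 to S6 are hypothetical and do not come from the paper or its tool, and the dependence edges are assumed to have been recorded already during one execution.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: backward slice over a recorded dynamic dependence graph. */
final class BackwardSliceSketch {

    /**
     * deps maps each executed statement to the earlier statements it
     * data-depends or control-depends on. The result holds the criterion
     * and every statement reachable from it by following those edges.
     */
    static Set<String> backwardSlice(Map<String, List<String>> deps, String criterion) {
        Set<String> slice = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.push(criterion);
        while (!worklist.isEmpty()) {
            String stmt = worklist.pop();
            if (slice.add(stmt)) { // first visit: follow its dependences
                for (String dep : deps.getOrDefault(stmt, List.of())) {
                    worklist.push(dep);
                }
            }
        }
        return slice;
    }

    /**
     * Made-up trace (an assumption for illustration only):
     * S1 x = read(); S2 y = x * 2; S3 log = x + 1;
     * S4 if (y > 10); S5 z = y - 1 (inside the if); S6 assert z >= 0.
     */
    static Map<String, List<String>> exampleDeps() {
        return Map.of(
            "S2", List.of("S1"),        // data: uses x
            "S3", List.of("S1"),        // data: uses x
            "S4", List.of("S2"),        // data: uses y
            "S5", List.of("S2", "S4"),  // data: uses y; control: guarded by S4
            "S6", List.of("S5"));       // data: uses z
    }

    public static void main(String[] args) {
        System.out.println(backwardSlice(exampleDeps(), "S6"));
    }
}
```

In this made-up trace the slice of the failing assertion S6 contains S1, S2, S4, S5 and S6. S3 only writes a value that S6 never reads, so it falls outside the slice.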
• Remove all candidate causes that the initial failure statement does not depend on.

Dynamic Program Slicing

    int[] previous = new int[5];

    public int max (int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        previous[0] = max;
        return max;
    }

• So if the returned value of max caused a failure, “previous[0] = max” should not be a candidate cause.
• [Figure: dependence graph over the statements: public int max (int[] list); int max = list[0]; int i = 1; i < list.length-1; if (max < list[i]); max = list[i]; i++; i < list.length-1; previous[0] = max; return max]

Exercise 1

    int sum = 0;
    int i = 0;
    while (i < 1100) {
        sum += i;
        i++;
    }
    assert(sum >= 0);

Use program slicing on the assertion.

4. Reduce Candidate Causes
• Using dependency filtering: if a faulty statement that is the bug’s root cause triggers an invariant failure, then any statement using the faulty value computed by that statement might also trigger an invariant failure.
• If statement T (control- or data-) depends on S, remove T. Is this justified?
• [Figure: two statements, each marked “Invariant failure here”, connected by a dependency edge]

4. Reduce Candidate Causes
• If there are multiple failed test cases with the same cause of failure, intersect the candidate cause sets of the failed test cases. Is this justified?

Case Study
• Objects of analysis
  – The Squid HTTP proxy server
  – The MySQL database server
  – The Apache HTTP web server
• Selected 8 real software bugs
  – They have to be in software versions supported by the tool developed by the authors.
  – No concurrency bugs. Why?
  – No missing code bugs. Why?

Case Study: Effectiveness
• Q1: Can the approach find the true root causes of bugs?
  – For each bug, the correction patch in the bug report is used to identify the minimal set of statements which should be changed or deleted to remove the failure symptom.
• Q2: How many false positives does it generate?
• Is this justified?
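The steps whose effectiveness the case study measures (learning range invariants from passing runs, checking them against the failing run, and dependence filtering) can be sketched as below. This is a minimal illustration, not the authors' tool: the class LikelyInvariantSketch, its method names, and the encoding of a run as a map from variable name to value are all assumptions made here. The sample values are the x and y observations at location L used earlier in the slides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of steps 2 to 4; all names here are illustrative. */
final class LikelyInvariantSketch {

    /** Likely range invariant lo <= v <= hi, widened by each passing observation. */
    static final class Range {
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        void observe(long v) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
        boolean holds(long v) { return lo <= v && v <= hi; }
    }

    /** Step 2: learn one range per variable from the values seen in passing runs. */
    static Map<String, Range> learn(List<Map<String, Long>> passingRuns) {
        Map<String, Range> invariants = new LinkedHashMap<>();
        for (Map<String, Long> run : passingRuns) {
            run.forEach((name, v) ->
                invariants.computeIfAbsent(name, k -> new Range()).observe(v));
        }
        return invariants;
    }

    /** Step 3: variables whose value in the failing run falls outside its range. */
    static Set<String> violations(Map<String, Range> invariants, Map<String, Long> failingRun) {
        Set<String> violated = new TreeSet<>();
        failingRun.forEach((name, v) -> {
            Range r = invariants.get(name);
            if (r != null && !r.holds(v)) violated.add(name);
        });
        return violated;
    }

    /** Step 4 (dependence filtering): drop T if it depends on another candidate S. */
    static Set<String> filterDependents(Set<String> candidates, Map<String, Set<String>> dependsOn) {
        Set<String> kept = new TreeSet<>();
        for (String t : candidates) {
            Set<String> sources = new TreeSet<>(dependsOn.getOrDefault(t, Set.of()));
            sources.retainAll(candidates); // only dependences on other candidates count
            sources.remove(t);
            if (sources.isEmpty()) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> passing = List.of(
            Map.of("x", 1L, "y", -2L), Map.of("x", 2L, "y", 0L), Map.of("x", 3L, "y", 1L));
        Map<String, Range> inv = learn(passing);
        System.out.println(violations(inv, Map.of("x", 50L, "y", 0L)));
        System.out.println(filterDependents(Set.of("A", "B"), Map.of("B", Set.of("A"))));
    }
}
```

Running main prints [x] and then [A]: the failing value x = 50 violates the learned range 1<=x<=3 while y = 0 stays within -2<=y<=1, and candidate B is dropped because it depends on the earlier candidate A. A range is only as good as the passing runs behind it, so too few passing runs would make correct values look like violations.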
Case Study: Effectiveness Results
[Figure: results table]
• Given a set of remaining causes, find out the statements the causes depend on.
• Compared with Tarantula
• The range of source code that has to be checked.
• What are the limitations?

LEARN TO DEBUG

Research Discussion
[Figure: scatter plot with axes feature 1 and feature 2; points are labeled O and X]
What if the vectors are located as shown above?

Research Discussion

    int[] previous = new int[5];

    public int max (int[] list) {
        int max = list[0];
        for (int i = 1; i < list.length-1; i++) {
            if (max < list[i]) {
                max = list[i];
            }
        }
        previous[0] = max;
        return max;
    }

• As an expert programmer, how do you learn what the programmer wants?
• What does this program do and how do you know?

Research Discussion

    if (card == null) {
        printk (KERN_ERR, "capidrv-%d: … %d!\n", card->contrnr, id);
    }

How do you know there is a bug in the program?

Research Discussion

    int mxser_write (struct tty_struct *tty, …) {
        struct mxser_struct *info = tty->driver_data;
        unsigned long flags;
        if (!tty || !info->xmit_buf) {
            return (0);
        }
    }

There is a potential problem. Why?

Exercise 3
Take this program and this input as an example. Apply both methods and argue whether they work to find the bug. If there is a challenge, how do you overcome it, or what assumptions would you make to overcome it?

Research Discussion
From what else can we learn what the programmer really wants?

The Overall View
[Figure: the behaviors we wanted versus the behaviors we have]