>> Peli de Halleux: Hello, everyone, and welcome. Today we have Xusheng from North Carolina State. He's going to talk about work he's done with Pex, work on helping users understand what the dynamic symbolic execution engine tells them. It's very confusing, and he's done a lot of work trying to help them in two areas. So we're just going to let Xusheng talk about that, and maybe introduce himself a little more.
>> Xusheng Xiao: Okay. So hello, everyone. My name is Xusheng Xiao. I'm a second-year Ph.D. student at North Carolina State University. My advisor is Dr. Tao Xie. This is collaborative work with the Pex group. To ensure software reliability and quality, software testing is by far the most widely used technique. One of the testing techniques used extensively is structural testing. Structural testing tests the internal structure or workings of a program. It's also called white-box testing. Structural testing usually executes the program under test using different test inputs and measures the achieved structural coverage. For example, the most commonly used is statement coverage, which measures what percentage of the statements of the [inaudible] is covered by the test cases, and the other one is branch coverage, meaning what percentage of the branches is covered. An important goal of structural testing is to achieve high structural coverage, because structural coverage can help you identify the insufficiency of the test cases. For example, it can tell you which part of the program is still not executed by your tests, so you may consider providing more test inputs to test that part of the program. To produce test inputs that achieve high structural coverage, developers or testers can write such test inputs manually. But as we know, that is labor intensive and tedious. So usually we employ automated test generation tools to automatically generate test inputs.
To automatically generate test inputs, we can employ a random approach. The random approach randomly picks test inputs across the input space. Test generation with the random approach is cheap and fast because random [inaudible] usually performs very well. However, the generated test inputs usually achieve low structural coverage, especially when the program has complex control structures, or when only a small subset of the test inputs in a large input space would achieve high coverage. The random approach has a low probability of producing such inputs. The other test generation approach is dynamic symbolic execution. Dynamic symbolic execution is not random; it's based on [inaudible]. It executes the program symbolically, collecting the [inaudible] from the conditional statements, which it then uses to produce test inputs and explore further paths of the program. I'm sorry, I missed one important slide: a state-of-the-art dynamic symbolic execution tool is Pex. Although dynamic symbolic execution can achieve high structural coverage for simple programs, it usually faces different kinds of challenges in dealing with complex programs in practice. Here I give an example. I applied Pex to an open source project, xUnit, which is used by many developers in real projects in practice. When I apply Pex, it faces different kinds of problems. The first problem is called the external method call problem, where the tool cannot deal with method calls to external libraries. By external libraries, I mean libraries that make native system calls, like the file system or network sockets, and also precompiled third-party libraries, which normally your tool will not automatically deal with. The second problem is the object creation problem. In object-oriented programs, to cover certain branches, you may be required to create some objects.
To create the objects, you need to produce a sequence of method calls that first creates the object, and then you need some state-modifying method calls to modify the state of the object so that you can cover certain branches. Later, I will give more examples of these two problems. Okay. To study what problems prevent automated test generation tools from achieving high structural coverage, we performed a preliminary study on four open source projects. We applied Pex to explore the public methods of these four projects, collected the achieved coverage, and studied the problems. As you can see, the total block coverage achieved is about 50 percent, with the lowest only 15 percent. This means that we actually face many problems when we apply Pex to complex programs in practice. The top major problem is the object creation problem, where Pex cannot generate desirable objects for certain branches. The second major problem is the external method call problem. This problem happens when the return value of some external method call is used by subsequent branches; Pex cannot generate test inputs that force the external method call to return a desired value, so the branches cannot be covered. Or the external method call may keep throwing exceptions when Pex generates invalid inputs, which aborts the test execution, so Pex cannot explore the remaining part of the program. The third type is the boundary problem. It usually happens when Pex exceeds its resource bounds before it can reach a certain amount of coverage. A typical example is that Pex may keep increasing the iteration count of a loop and never explore the remaining part of the program. The fourth problem is the limitation of the constraint solver. As you can see, for the third subject, Math.NET, we have this because the library involves a lot of floating-point computation, which is a limitation of the constraint solver we use. The constraint solver can give some approximate integers.
There are still some branches that it cannot cover. In my current work, I focus on the first two major problems because they account for most of the problems. Next I will give you an example of how the external method call problem prevents automated test generation tools from achieving high coverage. In code example one, you can see there is an external method call, File.Exists. This external method call receives as its argument the variable for the config file name, and this variable is derived from the program input. So we know that there is a data dependency on the program inputs. To cover both branches of the conditional statement at line one, the test inputs must cause File.Exists to return true in one case and false in the other. But, unfortunately, because the tool cannot analyze this external method call, it usually cannot achieve full coverage of these branches. The second example is a little bit similar. It's the full-path method call. This method call also has a data dependency on the program input. But this time the method actually checks the validity of its argument. So if you generate invalid inputs, this method keeps throwing exceptions, which aborts all the test executions. If Pex cannot figure out how to generate a meaningful input for this method, Pex cannot explore the following part of the code. The third example is a kind of external method that has a data dependency on the program input and also has a return value, but the return value is just used to print out some messages. For example, String.Format. This kind of external method call actually does not cause any problem for the test generation tool. So we need to identify these three cases and filter out the irrelevant candidates. And here is an example of the object creation problem. We have a class called a fixed-size stack. It uses the classical implementation of a stack.
With this stack, you cannot just assign a list or a size to the stack. You need to call push to increase the size of the stack. We have a parameterized unit test called test push, which simply tests the push method of the fixed-size stack. To cover the two branches at line five, we need a specific method call sequence that makes the stack size 10. The sequence goes like this. You first create a stack, and then you call push ten times. Once you have that stack object, you invoke the constructor of the fixed-size stack, passing in the stack. However, this method sequence is very difficult for many tools to generate, because the tool cannot figure out that calling push increases the size. It may call push first and then call pop, and the stack goes back to the empty state. So the state space for this is huge and very difficult for the tool to explore. One observation is that the branch that is not covered actually has a data dependency on one of the fields of these objects, which is stack.items. Later I will give more detail on this. To address these problems, we propose a new approach called cooperative developer testing. When the tool is not perfect enough to deal with all the problems, the human can step in to help the tool. In our approach, we first apply the tool to generate tests. After generating the tests, the tool reports the achieved coverage and the problems encountered during test generation. When the developer looks at these problems, he can provide some kind of assistance or help to the tool. Then he can reapply the tool to see the coverage increase. For example, when you see an external method call problem reported, you may think: maybe I can ask Pex to instrument and explore this external library. But that's not guaranteed to work. If I am doing unit testing, I can mock this object.
You provide mock objects only for the external method calls that are identified as problems. For the object creation problem, we can provide factory methods, so we can encode how to drive the tool to generate method sequences that increase the size of the stack in our example. There are some existing solutions for problem identification. A very easy way is to report all external method calls encountered during test generation and execution, or to report all the non-primitive object types of the program inputs and their fields. However, these approaches have limitations. The number could be high. People don't want to look at all the problems, as I will show in examples later. Also, some identified problems are actually not relevant. Even if you provide guidance for them, it may not help the tool achieve higher coverage. So we need a way to precisely identify such problems. Let's go back to the previous example. We can see that here Pex reports 44 external method call problems and 18 object creation problems. However, when we manually investigated all these problems, we found that none of these external method calls actually causes a problem. You can just skip them. And actually only five of the object types require human help. So we propose an approach called Covana. Covana is intended to precisely identify the problems faced by the tool when the tool fails to achieve high structural coverage. Based on our previous study, our insight is that some partially covered statements have data dependencies on the problem candidates. Our approach contains three main steps. The first step is to identify the problem candidates. For example, for external method calls, we first need to identify which external method calls are candidates for our analysis. For the object creation problem, we need to identify the non-primitive object types for our analysis.
After we identify the candidates, we perform symbolic execution on these problem candidates. We basically assign symbolic values to the return values of the external method calls, or we assign symbolic values to the non-primitive program inputs. Then we perform forward symbolic execution. We collect runtime information and the achieved coverage, which are used in our later data dependence analysis. Then we identify which candidates are real problems and which are not. We prune the irrelevant candidates. This is the overview of our approach. The input of our approach is the program or the parameterized unit tests, and the test inputs generated by the tool. The output of our approach is the identified problems. So we just tell you what problems the tool faces. We first perform forward symbolic execution using the test inputs. The reason we do that is that we want to collect the runtime events when running the generated test inputs, so we can identify the problem candidates. I will give an example. For external method call candidates, we don't consider all external method calls as candidates. If an external method call does not have a data dependency on the program input, for example if it receives a constant string, it's very likely that the call is just pre-configuration [phonetic] or [inaudible]. We don't need to analyze such external method calls. They usually will not cause problems for achieving higher coverage. For example, here we have the external method call File.Exists. When we look at it, we can see that its argument is the config file name variable, which is derived from one of the program inputs, so we know this external method call has a data dependency on the program input. So we identify it as a candidate. Similarly, for this method, we can also see that it uses the program input as its argument, so we know that this is one of the candidates.
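The candidate filter for external method calls can be sketched in a few lines of Python. This is only an illustration under assumed data structures: `CallEvent`, its fields, the input name `configFileName`, and the `Console.WriteLine` entry are hypothetical and do not come from the tool itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEvent:
    """One external method call observed while running the generated tests."""
    method: str             # name of the external method, e.g. "File.Exists"
    arg_sources: frozenset  # program inputs the arguments were derived from


def external_call_candidates(events):
    """Keep only calls whose arguments depend on at least one program input."""
    return sorted({e.method for e in events if e.arg_sources})


# Hypothetical runtime events collected from one test run.
events = [
    CallEvent("File.Exists", frozenset({"configFileName"})),
    CallEvent("Console.WriteLine", frozenset()),  # constant string argument
]

print(external_call_candidates(events))  # ['File.Exists']
```

Calls that only receive constants drop out at this stage, which keeps the later symbolic analysis focused on calls that the program input can actually influence.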
To identify the candidates for the object creation problem, we simply identify the non-primitive object types of the program inputs, because for primitive types such as integers you don't have this kind of problem. For example, for this PUT we can see the fixed-size stack object. So here we identify this program input as a candidate, and all its fields are collected and also identified as candidates. The next step is to perform symbolic execution on these problem candidates. First, we turn the elements of the problem candidates symbolic. For an external method call that is identified as a candidate, we treat its return value as a symbolic value. For the object creation problem, we assign symbolic values to the program input and all its fields. Then we can leverage a symbolic execution engine to perform symbolic execution. For example, in our [inaudible] we use the symbolic execution engine in Pex, and then we collect the runtime information. The runtime information includes the symbolic expressions in branches and the uncaught exceptions, which are used later in our problem analysis. Okay. I will walk you through some examples. For example, for this external method call candidate, we treat its return value as symbolic. So we can collect the constraints from the branches at line one. Then, by extracting the elements from the constraints, we know that the branch statement at line one has a data dependency on the external method call File.Exists. Using this data dependence analysis, we identify a real problem as follows: the candidate should have a data dependency on the program input, and its return value should also be used by some not-covered branch. For example, here we already know that it has a data dependency on the program input. Then we find that the false branch at line one is not covered, which means line one is a partially covered statement, and so we report File.Exists as a real problem.
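The data dependence check described above can be sketched as follows. This is a simplified illustration, assuming that symbolic execution has already produced, for each branch, the set of symbolic values appearing in its condition plus coverage flags for both sides. The `Branch` type and the `real_problems` function are hypothetical names, not part of Pex or Covana.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Branch condition observed during forward symbolic execution."""
    location: str       # e.g. "line 1"
    symbols: frozenset  # symbolic values used in the branch condition
    covered_true: bool
    covered_false: bool

    def partially_covered(self):
        # Exactly one side of the branch was taken by the generated tests.
        return self.covered_true != self.covered_false


def real_problems(candidates, branches):
    """Map each candidate to the partially covered branches that use its value."""
    reported = {}
    for branch in branches:
        if not branch.partially_covered():
            continue
        for candidate in candidates:
            if candidate in branch.symbols:
                reported.setdefault(candidate, []).append(branch.location)
    return reported


branches = [
    Branch("line 1", frozenset({"File.Exists"}), covered_true=True, covered_false=False),
]
print(real_problems(["File.Exists", "String.Format"], branches))
# {'File.Exists': ['line 1']}
```

`String.Format` is pruned here because no partially covered branch uses its return value, which mirrors the "irrelevant candidate" case from the earlier examples.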
The second analysis for identifying external method call problems is to analyze the exceptions. We collect the uncaught exceptions during test execution, and then we extract the external method calls from the stack traces of the exceptions. Then we check whether any part of the code after the call site is not covered. For example, this method throws exceptions for all the executions, and we also check that the code up to line six is not covered. So we know the remaining part of the program is not covered, and we confirm that the full-path method call is a real problem. Similarly, to identify object creation problems, we use a similar technique to analyze the data dependencies. Suppose an uncovered branch has a data dependency on a non-primitive program input. If it depends directly on the program input, for example the branch requires that the stack object is not null, then you can just report the stack's type, because you first need to provide a stack. But what if it depends on a field? Can we just report the object type of the field directly? Let me give you an example. We can see that this method returns the size of the field stack.items. So we know this statement has a data dependency on this field. Then we further find that the true branch is not covered. If we directly report the object type of this field, we will report a list of objects. However, this is not helpful. Even if you provide a method call sequence that constructs a list, the fixed-size stack cannot use it, because this list cannot be assigned to the field stack.items directly. You cannot find any public constructor or public setter to assign an external object to this field. Therefore, to address this problem, we provide a field declaration hierarchy analysis. First we construct a field hierarchy, from the field up to the program input. In this case it looks like this. To construct this field hierarchy, you can use recursion.
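A minimal sketch of the field declaration hierarchy analysis, assuming the hierarchy and the assignability facts have already been extracted from the program. The tuple layout, the `assignable` set, and the function name `type_to_report` are assumptions made for illustration; the actual tool works on .NET metadata.

```python
def type_to_report(hierarchy, assignable):
    """Pick the object type for which a developer should supply a factory method.

    hierarchy:  non-empty list of (declaring_class, field_name, field_type)
                tuples, ordered from the program input down to the target field.
    assignable: set of (declaring_class, field_name) pairs that can be assigned
                through a public constructor or public setter.
    """
    for declaring_class, field_name, _field_type in hierarchy:
        if (declaring_class, field_name) not in assignable:
            # An external object cannot be injected here, so the declaring
            # class itself has to be built through method calls.
            return declaring_class
    # Every level is assignable: the target field's own type can be provided.
    return hierarchy[-1][2]


# Fixed-size stack example: FixedSizeStack.stack -> Stack.items
hierarchy = [
    ("FixedSizeStack", "stack", "Stack"),
    ("Stack", "items", "List"),
]
assignable = {("FixedSizeStack", "stack")}  # constructor accepts a Stack

print(type_to_report(hierarchy, assignable))  # Stack
```

Walking from the program input downward means the first non-assignable level decides the answer, so the reported type is always one that a factory method can construct and hand to the level above it.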
So we start from the program input and then look through the fields level by level until we find the field we want. Then we need to analyze the fields level by level. If the branch depends directly on the program input, as I explained earlier, we just report it. Otherwise we check whether the field is assignable through its declaring class. Assignable means the field can be assigned an external object using a public constructor or a public setter method of its declaring class. If the field is not assignable, we just report the declaring class. If the field is assignable through its declaring class, we keep checking the next level until we reach the target field. Let me give you an example to illustrate the analysis. We have this field hierarchy. We first check whether the stack field of the fixed-size stack can be assigned using a public constructor or setter of the fixed-size stack. We find that we can pass a stack to the constructor, and this stack is assigned to the field. So we keep going and check the next level. There we find that stack.items is not assignable, so we report the stack as the object type for the object creation problem. This is correct: when we provide a factory method that creates the stack object, Pex actually generates test cases that cover these branches. We implemented our approach; part of it is implemented as an extension to Pex. The extension identifies the problem candidates, assigns symbolic values to the elements of the problem candidates, and collects the runtime information from Pex. The data dependence analysis and the GUI component are implemented as a C# application, which analyzes the collected information and shows the analysis results. Our tool is released and publicly available on the Pex extensions website. To evaluate our approach, we chose two subjects. The first subject is xUnit, a unit testing framework for .NET.
It's a popular testing framework used by many real projects in practice. The other one is QuickGraph, a C# graph library that is also used by many others. Our approach intends to analyze the problems for complex programs in practice that are used by many people. That's why we chose these two subjects. We used Pex with the implemented extension as our test generation tool. We first apply Pex to generate the test inputs. Then we apply our approach to collect the information and produce the identified external method [inaudible] creation problems. In our evaluation we answer two research questions. The first research question is how effective our approach is in identifying external method call problems and object creation problems. The second question is, when we identify so many problem candidates, how effectively our approach can prune the irrelevant ones. Here is a table with the evaluation results. To give you the highlights, we identify 43 external method call problems with only one false positive and two false negatives. To identify the false negatives, we manually went through all these programs and identified the problems manually first, and then we compared them with the ones identified by our approach. Next I will give you some examples of how our approach successfully identified the problems and why our approach failed in some cases. The first example is actually from xUnit. Pex cannot achieve high coverage of this method because of the external method call here. After we executed the tests, we can see that we created a carrier [phonetic], and we see that the branch statement at line 1 is not covered and has a data dependency on the program input. That's why we report it as an external method call problem. Similarly for this one, the method keeps throwing exceptions for all executions. We checked the coverage and we reported it correctly.
This is an example of why our approach produced a false negative. After we check it, we find that this branch is not covered. It actually uses the return value of the preceding external method call, OpenSubKey [phonetic]. However, this is a static method [phonetic]. It does not have a data dependency on the program input; it uses a constant input. After I checked further, I confirmed that it's actually used to read a value from the Windows registry entries. So when I run the test cases, if I want to cover the other side, I need to modify the registry entries. That's one problem missed by our approach. In our future work, we may treat every external method call as a candidate and check this, but we need more experiments to show how that will affect the precision and recall of our approach. Next I will give you examples of how our approach identified object creation problems in our subjects. Pex also achieves low coverage for these methods. The reason is that to enter this for loop, you need the field for the type under test [phonetic] to be not null, and it also has to implement at least one interface, because this involves reflection. By checking the class structure and analyzing the field declaration hierarchy, we found that this field can be assigned using the public constructor, and we correctly reported the object type of this field. When we provide a mock object for the interface type [phonetic], these branches can be covered. The other one, where we produced a false negative, is shown here. This is actually similar to the successful case. However, we found that the field fixtures [phonetic] is implemented using a dictionary type, which is not handled correctly by Pex, and we lost track of the symbolic value there. At the time we were doing the experiments, we had not implemented that logic. So we produced some false negatives.
Sorry, there's also one more false negative, which comes from static fields. When some branches are related to a static field, we also lose track of the symbolic value. It seems like a later version of Pex will fix it, so we can maintain output [phonetic]. The reason why we have some false positives is that currently, when we collect symbolic expressions from Pex, we collect them as strings. So when we write code to extract elements from the strings, we have problems dealing with some corner cases. That's why we sometimes have false positives. And this is the evaluation result showing how we prune the candidates. As we saw in the previous example, only a few external method calls actually cause problems for automated test generation tools, and our tool can effectively prune the irrelevant candidates. Also we can see that to cover certain branches, you actually don't need to provide objects for every type; you only need to provide them for some of the fields of the object type. Our approach is not limited to just dynamic symbolic execution. Our approach can also assist other structural test generation approaches. For example, our approach already identifies the external method calls that cause problems, so it is possible to automatically generate mock objects for only these external method calls. The second thing is assisting the random approach. Our approach already identifies the object types that cause problems, so you can assign more probability to exploring these types, and you have more chances to cover more branches. Besides, some of the advanced method sequence generation approaches can also be used. For example, when we identify an object creation problem, you can apply some of these expensive approaches to explore only the types that are reported by our approach.
As for future work, although we don't see many false positives or false negatives with our approach, we still have some potential issues. For example, currently our approach does not consider the arguments of the external method calls. For some external method calls, when you pass program inputs to them, they may modify the program inputs, so this can actually cause some problems. Also, our approach currently considers only data dependencies. It may happen that in some cases you need control dependencies to help produce more precise results, but this requires more experiments to see how control dependencies will affect the precision and recall of our approach. Also, we plan to leverage static analysis. Currently we only analyze the code executed by the test inputs. For regions that are not covered by the test cases, we lose the chance to analyze them. So static analysis may be a good way to extend our approach. The other thing is that currently the output of our approach just tells you that you have a problem that may prevent you from covering branches one to three. That is still very difficult for a developer or tester to understand. So we may provide visualization to show the analysis results together with the problems. And we also plan to carry out a user study to evaluate the effectiveness. So, yeah. So here comes the conclusion. So any questions?
>>: So it's interesting that you don't actually need a test suite from Pex to apply this tool.
>> Xusheng Xiao: Yeah.
>>: You can actually take an existing unit test suite and run it and also collect the information.
>> Xusheng Xiao: Yes. Yeah. You can actually provide the test inputs manually, and our output can also help you identify why you cannot cover certain branches.
>>: Very interesting information. Going forward the coverage, did you know they're really independent on the machine environment?
>> Xusheng Xiao: Yeah.
>>: Can you incubate this? I found four more hours for today. This is great. It's a well-known issue. We claim we don't have false positives in dynamic symbolic execution. It's not true for object creation and external methods. It's really a big problem. That's why we built models from the beginning. But there's more work to be done there.
>> Peli de Halleux: Thank you.
>> Xusheng Xiao: Thank you. [applause]