>> Peli de Halleux: Hello, everyone, and welcome. Today we have Xusheng from North Carolina
State. He's going to talk about work he's done with Pex, about helping users understand what the
dynamic symbolic execution engine tells them.
It's very confusing, and he's done a lot of work trying to help them in two areas. So we're just
going to let Xusheng talk about that, and maybe introduce himself a little more.
>> Xusheng Xiao: Okay. Hello, everyone. My name is Xusheng Xiao. I'm a second-year
Ph.D. student at North Carolina State University. My advisor is Dr. Tao Xie. This is
collaborative work with the Pex group.
To ensure software reliability and quality, software testing is by far the most widely used
technique.
One of the testing techniques used extensively is structural testing. Structural testing tests the
internal structure or workings of a program.
It's also called white-box testing. Structural testing usually executes the program under test using
different test inputs and measures the achieved structural coverage.
For example, the most commonly used is statement coverage, which measures what percentage
of the statements of the [inaudible] is covered by the test cases, and the other one is branch
coverage, namely what percentage of the branches is covered.
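A minimal sketch of the two coverage measures just described, in Python. The statement and branch identifiers below are made up for illustration and are not from the talk.

```python
def coverage_percent(covered, total):
    """Return the share of `total` items that appear in `covered`, as a percentage."""
    if not total:
        return 100.0
    return 100.0 * len(covered & total) / len(total)


# Hypothetical program with 8 statements and 2 conditionals (4 branch outcomes).
all_statements = set(range(1, 9))
all_branches = {"if1:true", "if1:false", "if2:true", "if2:false"}

# What a hypothetical test suite actually executed.
hit_statements = {1, 2, 3, 4, 5, 6}
hit_branches = {"if1:true", "if1:false", "if2:true"}

statement_cov = coverage_percent(hit_statements, all_statements)
branch_cov = coverage_percent(hit_branches, all_branches)
```

Here the uncovered outcome `if2:false` is exactly the kind of gap that tells a tester which inputs are still missing.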
An important goal of structural testing is to achieve high structural coverage, because
structural coverage can help you identify the insufficiency of the test cases.
For example, it can tell you which part of the program is still not executed by your tests, so you
may consider providing more test inputs to test that part of the program.
To produce test inputs that achieve high structural coverage, developers or testers can
manually produce such test inputs. But as we know, that is labor intensive and tedious.
So usually we employ automated test generation tools to automatically generate test inputs. To
automatically generate test inputs, we can employ a random approach.
The random approach randomly selects test inputs from the input space.
Test generation with the random approach is cheap and fast, because random
[inaudible] usually performs very well. However, the generated test inputs usually achieve low
structural coverage, especially when the program has complex control structures or only a subset
of the test inputs among a large input space would achieve high coverage.
The random approach would have a low probability of producing such inputs.
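To illustrate why random inputs rarely reach such branches, here is a small Python sketch. The function `classify` and its magic constant are hypothetical; they only stand in for a branch that a tiny subset of a large input space can reach.

```python
import random


def classify(x):
    # Only one value out of 2**32 takes the "rare" branch.
    if x == 123456789:
        return "rare"
    return "common"


def random_testing(trials, seed=0):
    """Feed uniformly random 32-bit inputs and record which outcomes were seen."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        seen.add(classify(rng.randrange(2**32)))
    return seen
```

With a thousand trials, the chance of ever seeing `"rare"` is about one in four million, so the branch will almost always stay uncovered.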
The other test generation approach is dynamic symbolic execution. Dynamic
symbolic execution is not random; it's based on [inaudible]. It executes the program symbolically,
collecting the [inaudible] from the conditional statements, uses them to produce test inputs,
and then explores further parts of the program.
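The loop just described can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Pex works internally: the program records its own branch predicates, and a brute-force search over a small integer range stands in for a real constraint solver. All names are hypothetical.

```python
def program(x, trace):
    """Toy program under test; records each branch predicate and its outcome."""
    is_big = x > 10
    trace.append((lambda v: v > 10, is_big))
    if is_big:
        is_special = x % 7 == 3
        trace.append((lambda v: v % 7 == 3, is_special))
        if is_special:
            return "deep"
        return "mid"
    return "shallow"


def solve(constraints, domain=range(0, 100)):
    """Stand-in for a constraint solver: find any value satisfying all constraints."""
    for v in domain:
        if all(pred(v) == wanted for pred, wanted in constraints):
            return v
    return None


def explore(start=0):
    """Run, collect the path condition, flip one branch at a time, and repeat."""
    worklist = [start]
    seen_paths = set()
    results = {}
    while worklist:
        x = worklist.pop()
        trace = []
        outcome = program(x, trace)
        path = tuple(taken for _, taken in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results[outcome] = x
        for i, (pred, taken) in enumerate(trace):
            # Keep the prefix of the path and negate the i-th branch.
            flipped = trace[:i] + [(pred, not taken)]
            new_input = solve(flipped)
            if new_input is not None:
                worklist.append(new_input)
    return results
```

Starting from the input 0, `explore()` reaches all three outcomes with three distinct inputs, where random selection would need luck to hit the innermost branch.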
Although dynamic symbolic execution... I'm sorry, I missed one important slide. A state-of-the-art
dynamic symbolic execution tool is Pex. Although dynamic symbolic execution can achieve high
structural coverage for simple programs, it usually faces different kinds of challenges in dealing
with complex programs in practice.
Here I give an example. I applied Pex to an open source project, xUnit, which is used by many
developers in real projects in practice.
When I apply Pex, I face different kinds of problems. The first problem is called the
external method call problem, where the tool cannot deal with method calls to
an external library.
By external library, I mean libraries for native system calls, like the file system or network
sockets, and precompiled third-party libraries, which normally your tool will not automatically
deal with.
The second problem is the object creation problem. In object-oriented programs, covering
certain branches may require you to create some objects.
To create the objects, you need to produce a sequence of method calls that first creates the object,
and then you need some state-modifying method calls to modify the state of the object so that you
can cover certain branches.
Later, I will give more examples of these two problems. Okay. To study what problems
prevent automated test generation tools from achieving high structural coverage, we performed a
preliminary study on four open source projects.
We applied Pex to explore the public methods of these four projects, collected the achieved
coverage, and studied the problems.
As you can see, the total block coverage achieved is about 50 percent, and the lowest is
only 15 percent. This means that we actually face many problems when we apply Pex to complex
programs in practice.
The top major problem is the object creation problem, where Pex cannot generate desirable
objects for certain branches.
The second top problem is the external method call problem. The problem happens when
the return value of some external method call is used by subsequent branches; then
Pex cannot generate test inputs to force the external method call to return a desired value, so the
branches cannot be covered.
Or the external method call may keep throwing exceptions when Pex generates invalid inputs,
which aborts the test execution, so Pex cannot explore the remaining part of the program.
The third type is the boundary problem. It usually happens when Pex exhausts its resources
before it can reach a certain amount of coverage.
A typical example is that Pex may keep [phonetic] increasing the iteration number of a loop, and it
will not reach the remaining part of the program.
The fourth problem is the limitation of the [inaudible] solver. As you can see, for the third
subject, Math.NET, we have this because the library involves many floating-point computations,
which is a limitation of the constraint [phonetic] solver we use. The constraint solver can give
some approximation with integers, but there are still some branches that it cannot cover.
In my current work, I focus on the first two major problems because they account for most of
the problems.
Next I will give you an example of how the external method call problem prevents automated test
generation tools from achieving high coverage.
In the first code example, you can see there is an external method call, File.Exists. The
external method call receives as its argument the variable for the configuration file name.
Actually, this configuration file name is defined by the program input, so we know that
there exists a data dependency on the program inputs. To cover both branches of the conditional
statement at line one, the test inputs must cause File.Exists to return true in one case and
false in the other case.
But, unfortunately, because the tool cannot analyze this external method call, you usually
cannot achieve full coverage of these branches.
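The slide's code is not in the transcript, so the following is only a Python analogue of the situation, with hypothetical names: a branch whose outcome depends on a file-system call that receives the program input, plus a mock that lets a test control the call's return value.

```python
import os.path
from unittest import mock


def load_config(config_file_name):
    # The branch depends on the return value of an external call
    # whose argument comes directly from the program input.
    if os.path.exists(config_file_name):
        return "loaded"
    return "defaults"


def run_with_mocked_exists(result):
    """Replace the external call so the test decides what it returns."""
    with mock.patch("os.path.exists", return_value=result):
        return load_config("any-name.cfg")
```

Calling `run_with_mocked_exists(True)` and `run_with_mocked_exists(False)` covers both branches without touching the real file system.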
The second example is a little bit similar. It's like the full path [phonetic] method. This method
call has a data dependency on the program input. But this time the method actually checks the
validity of the argument.
So if you generate invalid inputs, this method keeps throwing exceptions and aborts all the test
executions. If Pex cannot figure out how to generate a meaningful input for this method, Pex
cannot explore the following part of the code.
The third example is a kind of external method that has a data dependency on the
program input and also has a return value, but the return value is just used to
print out some messages, for example a Format call. This kind of external method actually
does not cause any problem for the test generation tool.
So we need to distinguish these three cases and filter out the irrelevant candidates.
And here is an example of the object creation problem. We have a class called
FixedSizeStack. It uses the classical implementation of a stack. With this stack, you cannot just
assign a size to the stack. You need to call push to increase the size of the stack.
We have a parameterized unit test called TestPush, which simply
tests the push method of the fixed-size stack.
To cover both branches at line five, we need a special method call
sequence that generates a stack whose size is 10.
The sequence would go like this. You first create a stack, and then you call push ten times.
When you have the stack object, you invoke the constructor of FixedSizeStack and pass in the
stack. However, this method sequence would be very difficult for many tools to generate,
because the tool cannot figure out that if I call push, I increase the size. I may call push first and
then call pop on my stack and go back to the empty state. So the state space for this is huge
and very difficult for the tool to figure out.
One observation is that if this branch is not covered, this
branch actually has a data dependency on one of the fields of this object, which is the stack's
items field.
Later I will give more detail on this. To address these problems, we propose a new approach called
cooperative developer testing.
When the tool is not perfect enough to deal with all the problems, the human can step in to
help the tool.
In our example, we first apply the tool to generate tests. After the tool generates the tests,
the tool reports the achieved coverage and reports the problems encountered during test generation.
When the developer looks at these problems, he can provide some guidance or
help to the tool, and then he can apply the tool again to see the coverage increase.
For example, when you see an external method call problem reported, you may think, maybe
I can ask Pex to instrument and explore this external library. But that's not guaranteed to work.
If I am doing unit testing, I can mock this object,
providing mock objects only for the external method calls that are identified as problems. For
object creation problems we can provide factory methods, so we can encode how to
guide the tool to generate method sequences that increase the size of the stack in our example.
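A factory method of the kind mentioned here can be sketched as follows. This is a Python illustration with hypothetical names, not the Pex factory API: the idea is that the developer turns a hard object-state problem into a choice of one primitive value.

```python
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def count(self):
        return len(self._items)


class FixedSizeStack:
    MAX_SIZE = 10

    def __init__(self, stack):
        self._stack = stack

    def is_full(self):
        return self._stack.count() == self.MAX_SIZE


def create_fixed_size_stack(size):
    """Hypothetical factory: the tool only has to pick an integer `size`."""
    if not 0 <= size <= FixedSizeStack.MAX_SIZE:
        raise ValueError("size out of range")
    stack = Stack()
    for i in range(size):
        stack.push(i)
    return FixedSizeStack(stack)
```

With such a factory, a test generator that can solve for `size == 10` reaches the full-stack state without searching over method sequences.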
There are some existing solutions for problem identification. A very easy way is to just
report all external method calls encountered during test generation and execution, or report all
the non-primitive object types of the program inputs and their fields.
However, the limitation of this approach is that the number could be high. People don't want to
look at all the problems, as I will show in examples later.
And also some identified problems are actually not relevant. Even if you provide guidance for
them, you may not help the tool achieve higher coverage. So we need a way to precisely identify
such problems.
Let's go back to the previous example. We can see that here Pex reports 44 external
method call problems and 18 object creation problems.
However, when we manually investigated all these problems, we found that not all of these external
method calls actually cause problems. You can just skip the irrelevant ones.
And actually only five of the object types require human help. So we propose an
approach called Covana. Covana is intended to precisely identify the problems faced
by the tool when the tool fails to achieve high structural coverage.
Based on our previous study, our insight is that some partially covered
statements have data dependencies on the problem candidates.
Our approach contains three main steps. The first step is to identify the problem
candidates. For example, for external method calls, we first need to
identify which external method calls would be candidates for our analysis.
For the object creation problem, we need to identify the non-primitive object
types for our analysis.
When we have identified the candidates, we perform symbolic execution on these problem
candidates. We basically assign a symbolic value to the return value of the external method call,
or we assign symbolic values to the non-primitive program inputs.
Then we perform forward symbolic execution. We collect runtime information and the
achieved coverage, which will be used in our later data dependence analysis. Then we
identify which candidates are real problems and which are not. We
prune the irrelevant candidates.
This is the overview of our approach. The input of our approach is the program or the
parameterized unit tests under test, and the test inputs generated by the tool.
And the output of our approach is the identified problems. So when we have this, we just tell you
what the problems of the tool are.
We first perform forward symbolic execution using the test inputs. The reason we do that is that
we want to collect the runtime events for the generated test inputs, so we can identify the
problem candidates. I will give an example to show this.
For external method call candidates, we don't consider all external method calls as
candidates.
If an external method call does not have a data dependency on the program input, it
receives a constant string.
So it's very likely that these external calls are just printing messages [phonetic] or [inaudible]. We
don't need to analyze such external method calls, which usually will not cause problems for
achieving higher coverage.
For example, here we have the external method call File.Exists. When we look at it, we can see that
it uses this variable as its argument, and the variable comes from the program input, so we
know the external method call has a data dependency on the program input.
So we identify it as a candidate. Similarly, for this method, we can also see that it uses the
program input as its argument, so we know that this is one of the candidates.
To identify the candidates for the object creation problem, we simply identify the non-primitive
object types of the program inputs, because for primitive types like int or bool [phonetic] you don't
have such kinds of problems.
For example, for this PUT we can see the FixedSizeStack object. So here we
identify the program input as a candidate, and all its fields are collected and also identified as
candidates.
The next step is to perform symbolic execution on these problem candidates. First,
we turn the elements of the problem candidates symbolic. For an external method call that is
identified as a candidate, we turn the return value into a symbolic value.
For the object creation problem, we assign symbolic values to the
program input and all its fields.
Then we can leverage the symbolic execution engine and perform symbolic execution.
For example, in our [inaudible] we use the symbolic execution engine of Pex, and then we collect the
runtime information. The runtime information includes the symbolic expressions in branches
and uncaught exceptions, which will later be used for our problem analysis.
Okay. I will walk you through some examples. For example, for this
method candidate, we make the return value symbolic.
So we can collect the constraints from the branches at line one. Then by extracting the elements
from the constraints, we know that the branch statement at line one has a data dependency on the
external method call File.Exists.
Using this data dependence analysis, we identify the real problems: the candidate should have
a data dependency on the program input, and its return value should also be used by some
not-covered branches.
For example, here we already know that it has a data dependency on the program
input. And then we found that the false branch at line one is not covered, which means line one
is a partially covered statement, and so we report File.Exists as a real problem.
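The pruning rule just described can be sketched in Python. The data model below is an assumption made for illustration (it is not Covana's implementation): each branch lists which candidates' symbolic return values appear in its condition and which outcomes were observed.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    location: str
    depends_on: set        # candidates whose symbolic value appears in the condition
    covered_outcomes: set  # subset of {True, False} observed at run time

    def partially_covered(self):
        return self.covered_outcomes != {True, False}


def real_problems(candidates, branches):
    """Keep only candidates whose return value feeds a partially covered branch."""
    reported = set()
    for branch in branches:
        if branch.partially_covered():
            reported |= branch.depends_on & candidates
    return reported


# Hypothetical run: only the true branch at line 1 was ever taken.
candidates = {"File.Exists", "Format"}
branches = [
    Branch("line 1", {"File.Exists"}, {True}),
    Branch("line 9", {"Format"}, {True, False}),
]
problems = real_problems(candidates, branches)
```

In this run only `File.Exists` is reported; the formatting call is pruned because every branch that depends on it is already fully covered.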
The second analysis for identifying external method call problems is to analyze the exceptions.
We collect uncaught exceptions during test execution, and then we extract the external method
calls from the stack trace of the exception. And then we check whether any code after the
call site is not covered. For example, this method throws exceptions for all the executions.
And we also check that the code up to line six is not covered. So we know the remaining part of
the program is not covered, and we further confirm that the path method [phonetic] is a real problem.
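A sketch of this exception analysis in Python, under assumed data shapes: each test run is represented by the frames of its uncaught exception's stack trace (or `None` if it finished normally), and the coverage check is reduced to a precomputed set. The method names are hypothetical.

```python
def exception_problems(candidates, runs, uncovered_after_call):
    """
    candidates: external methods with a data dependency on program inputs.
    runs: one entry per test execution, either the list of method names in the
          uncaught exception's stack trace, or None if no exception escaped.
    uncovered_after_call: methods whose call sites are followed by uncovered code.
    """
    reported = set()
    for method in candidates:
        always_throws = bool(runs) and all(
            trace is not None and method in trace for trace in runs
        )
        if always_throws and method in uncovered_after_call:
            reported.add(method)
    return reported
```

A candidate is reported only when both conditions hold: it throws in every execution, and code after its call site remains uncovered.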
Similarly, to identify object creation problems, we use a similar technique to analyze the
data dependencies.
If an uncovered branch has a data dependency on a non-primitive program input, and it
directly depends on the program input, for example a check like the stack object cannot
be null, then you can just report the stack, because you first need to provide a stack.
But if it depends on a field, can we just report the object type of the field directly?
Let me give you an example to show this. For example, we can see that this statement calls a
method that returns the size of the field stack.items. So we know this statement has a data
dependency on this field.
And then we further found that the true branch is not covered. If we directly report
the object type of this field, we will report a list of objects. However, this is not helpful,
because even if you provide a method call sequence to construct the list, the fixed-size stack cannot
use it, since this list cannot be assigned to the field stack.items directly. You cannot
find any public constructor or public setter to assign this external object to the field.
Therefore, to address this problem, we provide a field declaration hierarchy analysis. First we
construct a field hierarchy
from the field up to the program input; in this case it looks like this. To construct this field
hierarchy, you can use reflection. We start from the program input and then we look for
the field level by level, until we find the field we want.
Then we analyze the fields level by level. For example, if the branch depends directly on
the program input, as I explained earlier, then we just report it. Otherwise we check whether each
field is assignable through its declaring class or not.
By assignable, I mean the field can be assigned an external object using a
public constructor or public setter method of its declaring class. If it cannot be assigned, then
we just report the declaring class.
If the field can be assigned through its declaring class, we continue checking until we reach the
field itself.
I'll give you an example to illustrate the analysis.
We have this field hierarchy. We first check whether the stack field of the fixed-size stack can be
assigned using a public constructor or setter of FixedSizeStack. We found that through the
constructor we can pass a stack, and this stack can be assigned to the field.
So we keep going and check the next level. We found that stack.items is not
assignable, so we report Stack as the object type for the object creation problem. This is
correct: when we provide a factory method for Stack, Pex can actually generate test
cases to cover these branches.
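The level-by-level walk can be written down compactly. The Python sketch below assumes the hierarchy and the assignability of each field have already been determined (for example by reflection); the `FieldStep` type and the field type name `List` are illustrative assumptions.

```python
from typing import List, NamedTuple


class FieldStep(NamedTuple):
    declaring_class: str
    field_name: str
    field_type: str
    assignable: bool  # settable through a public constructor or setter?


def type_to_report(input_type: str, hierarchy: List[FieldStep]) -> str:
    """Walk from the program input toward the target field and pick the type to report."""
    reported = input_type
    for step in hierarchy:
        if not step.assignable:
            # The field cannot be set from outside: report its declaring class.
            return step.declaring_class
        reported = step.field_type
    return reported


# Hierarchy from the talk's example: FixedSizeStack.stack -> Stack.items
hierarchy = [
    FieldStep("FixedSizeStack", "stack", "Stack", True),
    FieldStep("Stack", "items", "List", False),
]
reported_type = type_to_report("FixedSizeStack", hierarchy)
```

For this hierarchy the walk stops at `items` and reports `Stack`, matching the result described in the talk; with an empty hierarchy it falls back to the program input's own type.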
Part of our approach is implemented as an extension to Pex. This part
identifies the problem candidates and assigns symbolic values to the elements of the problem
candidates, and we also need to collect the runtime information from Pex.
The data dependence analysis and the GUI component are implemented as a separate [phonetic] C#
application, which just analyzes the collected information and shows the analysis results.
Our tool is released and publicly available on the Pex extensions website.
To evaluate our approach, we chose two subjects. The first subject is xUnit, a unit testing
framework for .NET. It's a popular testing framework used by many real projects in practice.
The other one is QuickGraph, a C# graph library that is also used by many others. Because
our approach intends to analyze the problems for complex programs in practice, ones often used
by many people, we chose these two subjects.
We first used Pex, with the implemented extension, as our test generation tool. We
applied Pex to generate the test inputs first. Then we applied our approach to collect the
information and produce the identified external method call and object creation problems. In our
evaluation we answer two research questions.
The first research question is how effective our approach is in identifying external
method call problems and object creation problems.
The second question is, when we identify so many problem candidates, how
effectively can our approach prune the irrelevant ones. Here is a table with the
evaluation results. To give you the highlights, we identify 43 external method call problems with
only one false positive and two false negatives.
To identify the false negatives, we manually went through all these programs and identified the
problems manually first, and then we compared them with the ones identified by our approach.
Next I will give you some examples of how our approach successfully identified problems
and why our approach failed in some cases.
The first example is actually from xUnit. Pex cannot achieve high coverage
of this program because of the external method call here.
After we executed the tests, we collected the coverage [phonetic], and we saw that the branch
statement at line 1 is not covered and has a data dependency on the program input.
That's why we report it as an external method call problem. Similarly for this one, the
method keeps throwing exceptions for all executions. We checked the coverage and we
reported it correctly.
This is an example of why our approach produced a false negative. As we can see, after we
checked it we found that this branch is not covered. It actually uses the return value of the
external method call OpenSubKey [phonetic]; however, this is a static method. It does not have a
data dependency on the program input; it uses a constant input.
After I checked further, I confirmed that it's actually used to read a value from the
Windows registry entries. So when I run the test cases, if I want to cover the other side, I need
to modify the registry entries.
So that's one problem missed by our approach. In our future work, we may plan to treat
every external method call as a candidate, and we can check this; but we need more experiments
to show how it will affect the precision and recall of our approach.
Next I will give you examples of how our approach identified object creation problems in our
subjects. Pex also achieved low coverage for these methods.
The reason is that to enter this for loop, you need a field, the type under test [phonetic], to be
not null, and it also has to implement at least one interface in the object, because this uses
reflection.
By checking the class structure and analyzing the field declaration hierarchy, we found that this
field can be assigned using a public constructor, so we correctly reported the object type of this
field. When we manually provide a mock object for the interface type [phonetic], these branches
can be covered.
The other one, where we produced a false negative, is shown here. Actually this is similar to the
successful cases.
However, we found that the field fixture is implemented using a dictionary type, which was
not handled correctly by Pex, and we lost track of the symbolic value there. At the time
we were doing the experiments, that logic was not implemented, so we produced some
false negatives. Sorry, there's also one more kind of false
negative we get, which is static fields. When some branches are related to a static field, we also
lose track of the symbolic value. It seems like a later version of Pex will fix it, so we can
improve our output [phonetic].
And the reason we have some false positives is that currently, when we are collecting
symbolic expressions from Pex, we collect them as a string. So when we write code
to extract information from the string, we have some problems dealing with corner cases;
that's why we sometimes have false positives.
And this is the evaluation result showing how we pruned the candidates. As we saw in the
previous example, only a few external method calls are causing problems for automated test
generation tools, and our tool can effectively prune the irrelevant candidates.
We can also see that to cover certain branches, you actually don't need to provide object types
for everything; you only need to provide them for some of the fields of the object type.
Our approach is actually not limited to dynamic symbolic execution. Our approach
can also assist other structural test generation approaches. For example, our approach already
identifies the external method calls that cause problems,
so it may be possible to find a way to automatically generate mock objects for only these external
method calls. The second thing is, to assist the random approach, our
approach already identifies the object types that cause problems, so you can assign a higher
probability to exploring these types,
and you have more chances to cover more branches.
Besides, some of the advanced method sequence generation approaches can also be used. For
example, when we identify an object creation problem, you can apply some of these expensive
approaches to only explore the types that are reported by our approach.
As for the future work of my approach, although we don't see many false positives or false
negatives with our approach, we still have some potential issues.
For example, currently our approach does not consider the side effects [phonetic] of external
method calls on their arguments. For some external method calls, when you pass in program input,
they may modify the program input. So that will cause some problems.
Also, our approach currently only considers data dependencies. It may happen that in some
cases considering control dependencies helps you produce more precise results, but this requires
more experiments to see how control dependencies
affect the precision and recall of our approach.
Also, we plan to leverage static analysis. Currently we only analyze the
code executed by the test inputs. For some regions, if they are not covered by the test cases, we
lose the chance to analyze them.
So static analysis may be a good way to extend our approach.
The other thing is that currently the output of our approach just tells you that you have a problem
that may prevent you from covering branches one to three. It's still very difficult for a developer or
tester to understand.
So we may provide visualization to visualize the analysis results together with the problems.
And we also plan to carry out a user study to evaluate the effectiveness.
So, yeah. Here comes the conclusion. Any questions?
>>: So it's interesting that you don't need actually a test suite from Pex to apply this tool.
>> Xusheng Xiao: Yeah.
>>: You can actually take an existing unit test suite and run it and also collect information.
>> Xusheng Xiao: Yes. Yeah. You can actually manually provide test inputs, and our
output can also help you identify why you cannot cover certain branches.
>>: Very interesting information. Going forward the coverage, did you know they're really
independent on the machine environment?
>> Xusheng Xiao: Yeah.
>>: Can you incubate this? I found four more hours for today. This is great. It's a well-known
issue. We claim we don't have false positives in dynamic symbolic execution.
That's not true for object creation and external methods. It's really a big problem. That's why we
built models from the beginning. But there's more work to be done there.
>> Peli de Halleux: Thank you.
>> Xusheng Xiao: Thank you.
[applause]