>> Mark Marron: So hello, everyone. So I'm introducing Xusheng Xiao. He's visiting
myself and Sumit Guani [phonetic] here. He did an internship with us last summer where
he worked on some natural language processing and translation to Excel. He's also done
a lot of work on testing and software engineering, so he's going to talk to us today about
some of his work. And he's just about to graduate from NC State University. So I'll let
you go.
>> Xusheng Xiao: Okay. Thanks, Mark, for the introductions. Good afternoon,
everyone. Welcome to my talk. I'm going to present my research, cooperative testing
and analysis via decision making.
You are welcome to ask me any questions during the presentation. So software quality
includes both functional quality and nonfunctional quality. Functional quality refers to
the functional correctness of the software, and nonfunctional quality supports the delivery
of the functional requirements.
Failure to ensure the high quality of software will result in serious consequences.
Software with poor functional quality has many functional defects, which will cause the
software to behave incorrectly. Recently a functional defect caused Knight Capital's
computers to execute a series of automatic orders that were supposed to be spread out
over a period of days, which cost the company over $440 million.
Software with poor nonfunctional quality may compromise users' security and privacy. For
example, posting your kid's photos using applications with [inaudible] tagging might
expose your kid's exact location to strangers. Such security issues are also reported in
To improve software quality, software testing and analysis tools can be used to automate
certain activities in software development and maintenance, which reduces the manual
effort of quality assurance.
Among software testing and analysis tools for ensuring high quality, test generation and
code coverage tools are the types of tools that are commonly used in practice.
The reason is that generating test inputs to cover a line of code is necessary for exposing
faults at that line of code.
Although tool automation is important for reducing manual effort, these tools face various
challenges when dealing with complex software. For example, even state-of-the-art test
generation tools have difficulty achieving high coverage for complex object-oriented
programs, since the tools cannot generate desired object states for certain branches and
cannot deal with method calls to external libraries. These challenges will remain
difficult for the tools to tackle in the near future.
For some of these challenges, users can provide help. For
example, users can provide mock objects to replace method calls to external libraries, helping
the test generator improve the test coverage.
However, most of the tools do not communicate information to the users. Little
research has been done to support such cooperation. To better support such
cooperation, I propose a general methodology, cooperative testing and analysis, which
allows users to make informed decisions when cooperating with software testing and analysis
tools.
My methodology advocates two types of cooperation. The first type of cooperation
focuses on the tools, and users provide help to address the difficulties faced by the
tools. Here I would like to use the Google driverless car as the example. Given the
destination, the Google driverless car will navigate to the destination automatically.
Along the way, the car may face different kinds of difficulties. For example, the car may
enter a crowded street and not know how to move forward. In these cases, the user can
jump in and manually drive the car, helping the car get through the street. After that, the
car can continue to move toward the destination automatically. If more difficulties are
encountered later, the user can join in and help again.
The second type of cooperation focuses on the users, and the tools help the users
make informed decisions to [inaudible] the task more effectively and more efficiently.
Consider the Google driverless car example again. [inaudible] the user forgets the next
destination to go to, he may ask the AI assistant in the car to suggest destinations.
The AI assistant then reports a list of suggested destinations along with
information about how these destinations were suggested. For example, some destinations may
be suggested based on the user's preferences. Some destinations may be suggested based
on the user's current location and previous history.
Looking at this kind of information about how the tool suggests destinations provides
context for users to choose the next destination, enabling users to make informed
decisions.
In cooperative testing and analysis, I have made research contributions on analyzing structured
software artifacts, such as program source code and execution traces, with software testing
and analysis techniques. I have also made research contributions on analyzing unstructured
software artifacts using analysis techniques. These techniques focus on assuring the
functional correctness, security and privacy, and performance of software.
The structured software artifacts analyzed by my techniques include various types of
software applications, such as command-line applications, GUI applications, and mobile
applications. The unstructured software artifacts analyzed by my techniques include
various types of natural language software artifacts, such as API documentation,
application descriptions, and requirement documents.
In today's talk, I will focus on three concrete projects that use the two types of
cooperation to improve test generation and mobile security. I will start with my
projects on test generation. So software testing is one of the most widely used
techniques to improve software quality. But it's typically a labor-intensive and costly
process. To address these issues, structural test generation can be used to produce
high-covering tests automatically. Recently, with the advances of [inaudible], test
generation based on dynamic symbolic execution has shown promising results in
achieving high coverage and detecting real faults in well-tested libraries.
Although test generation tools can achieve high coverage on certain programs, these
tools face various challenges when dealing with complex object-oriented programs. So
here is an example test report generated by Pex. Pex is based on dynamic symbolic
execution, which instruments the code to explore the feasible paths for generating test
inputs.
Since the number of paths grows exponentially with the size of the program, which is
called path explosion, Pex includes a set of heuristics that guide the exploration to
achieve high coverage faster and more effectively.
So after the test generation, the test generator produces reports that show how many test
cases have been generated and what coverage was achieved. So let's look at this example.
The achieved coverage of Pex is not that high. Here I refer to the not-covered parts of the
code as the symptoms.
So what are the causes of these symptoms? Is it because of path explosion or something
else? To understand the challenges faced by test generation tools, I conducted a
preliminary study on four popular open source projects. I first applied Pex to generate test
inputs to achieve coverage. As we can see from the results, the achieved coverage is not
that high. Then I manually studied the causes of the not-covered branches and I found out
that -- yeah.
>>: So what was your setup for applying Pex? I mean, how -- what was -- what was the
experimental setup? So you had a function, you had an application with, say, a hundred
methods. How did you apply Pex?
>> Xusheng Xiao: Okay. So Pex comes with a feature that can automatically generate
[inaudible] for each public method. So I try to apply Pex just to generate this program
input for this public method.
>>: So just Pex out of the box? So you push button. You did no manual effort?
>> Xusheng Xiao: Yeah, yeah. I just [inaudible] Pex 1, yes.
>>: Okay. So when you say apply Pex, there's a feature in Pex that just lets you generate
regardless of what exists already, what tests exist already? You just run Pex on each
public method and then collect the results? So this is completely automated?
>> Xusheng Xiao: Yeah, completely automated.
>>: And you haven't done any manual work to this point to achieve this coverage?
>> Xusheng Xiao: Right, right. I haven't done manual work.
>>: I've got another question. If you run Pex on a method, if it's doing [inaudible]
execution I can imagine that [inaudible] continue running forever, the presence of loops
and whatnot. So did you provide a tight bound? Or how does it decide to stop?
>> Xusheng Xiao: Okay, yeah. So Pex comes with preset bounds for different kinds
of resources, like the execution time and the number of branches that it can collect. Yeah.
So I just let Pex run with the default bounds. Yes. Yeah.
So I found out that most of the not-covered branches are caused by two major kinds
of problems. The first major kind is object-creation problems, where Pex
cannot generate desirable receiver or argument objects. Next I will show how this
problem compromises the achieved coverage.
So here is an example class, fixed size stack. This class is a [inaudible] class of stack. It
puts a limit on the maximum number of objects that can be pushed onto the stack.
To cover the true branch [inaudible], the test generation tool needs to generate a stack
object whose size is 10. And here is one of the target sequences that can be used to
produce such a desired object state. And as we can see, there are multiple methods that can
be invoked on the stack object. And the number of combinations of method calls grows
exponentially with the number of method calls in the sequence. Therefore, the search space
is very large and [inaudible] often cannot easily generate such sequences. Without such
sequences, the true branch [inaudible] cannot be covered.
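The search-space argument can be illustrated with a small Python sketch. The talk's subject programs are .NET, and the slide code is not in the transcript, so the class bodies, the name FixedSizeStack, and the helper sequences_reaching are assumptions made for illustration only.

```python
from itertools import product


class Stack:
    """Plain stack whose item list changes only through push and pop."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self._items:
            self._items.pop()

    def count(self):
        return len(self._items)


class FixedSizeStack:
    """Hypothetical wrapper that caps the number of stored items at 10."""

    MAX_SIZE = 10

    def __init__(self, stack):
        self._stack = stack

    def push(self, item):
        if self._stack.count() == self.MAX_SIZE:
            # Hard-to-cover true branch: needs a stack already holding 10 items.
            raise OverflowError("stack is full")
        self._stack.push(item)


def sequences_reaching(target_size, length):
    """Count push/pop sequences of the given length that end at target_size."""
    hits = 0
    for ops in product(("push", "pop"), repeat=length):
        stack = Stack()
        for op in ops:
            if op == "push":
                stack.push(0)
            else:
                stack.pop()
        if stack.count() == target_size:
            hits += 1
    return hits
```

With only two methods to choose from, sequences_reaching(10, 10) finds a single qualifying sequence among the 2**10 = 1024 candidates of length 10. Real classes expose more methods and arguments, so the fraction of useful sequences shrinks further.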
The second major kind is external method call problems, where the test generation tool
fails to deal with method calls to external libraries. So typically a test generation tool
instruments and explores only methods of the project under test. The reason is that external
third-party API methods may have too many paths, and the test generation tool would use
up its resources exploring these external methods.
Also, some of the external methods are not instrumentable. For example,
method calls to the file system or method calls to network I/O.
So next I will use two examples to show how external method calls may compromise the
achieved coverage. So let's look at the first external method call, File.Exists. The return
value of this method is used in an if statement. Since the test generation tool does not
know which files exist, it typically cannot generate a value to cause this method to
return true. Therefore, the true branch is not covered.
Next let's look at a second external method call, Path.GetFullPath. This external method
throws an exception for invalid inputs. So if the test generation tool cannot figure out how to
generate valid input values, this method will keep throwing exceptions, preventing the
test generation tool from covering the remaining part of the program.
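The two patterns can be sketched in Python as follows. This is not the slide code: get_full_path is a hypothetical stand-in for an external call that rejects invalid input, and describe is a hypothetical method under test.

```python
import os


def get_full_path(path):
    """Hypothetical stand-in for an external call that rejects invalid input."""
    if not path or "\0" in path:
        raise ValueError("invalid path")
    return os.path.abspath(path)


def describe(path):
    """Hypothetical method under test; returns a label for the branch taken."""
    # Pattern 2: if every generated input is invalid, this call always throws
    # and nothing below it is ever covered.
    full = get_full_path(path)
    # Pattern 1: the branch outcome depends on the real file system, which the
    # test generator cannot control through the method's inputs alone.
    if os.path.exists(full):
        return "exists"
    return "missing"
```

A generator that only tries strings such as the empty string never gets past the first call, and one that only tries names of files that do not exist never reaches the "exists" branch.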
So for object-creation problems and external method call problems, users can provide
help. To tackle object-creation problems, users can provide
factory methods that encode sequences of method calls to produce a desired object state,
and the test generation tool can use these factory methods to generate more test cases for
improving the achieved coverage.
To tackle external method call problems, users can instruct the test generation tool to
explore the external libraries and address [inaudible], but if the external libraries are too
complex and contain too many paths, the test generation tool still may not generate the
desired values.
Alternatively, users can provide mock objects to simulate the environment dependencies
and enable the test generation tool to generate the desired values for improving the
achieved coverage.
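Both kinds of user help can be sketched in Python using the standard unittest.mock module. The talk's setting is Pex on .NET, so this is only an analogy: create_stack_of_size, is_full, file_status, and run_with_mocked_file_system are hypothetical names.

```python
import os
from unittest import mock


class Stack:
    """Minimal stack used only for this illustration."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def count(self):
        return len(self._items)


def create_stack_of_size(size):
    """User-written factory method: encodes the push sequence for a target state."""
    stack = Stack()
    for i in range(size):
        stack.push(i)
    return stack


def is_full(stack, capacity=10):
    """Hypothetical method under test for the object-creation case."""
    if stack.count() == capacity:
        return True  # branch that needs the desired object state
    return False


def file_status(path):
    """Hypothetical method under test for the external-call case."""
    if os.path.exists(path):
        return "exists"  # branch that needs the external call to return True
    return "missing"


def run_with_mocked_file_system(path, exists):
    """Replace the external call with a mock that returns a chosen value."""
    with mock.patch("os.path.exists", return_value=exists):
        return file_status(path)
```

Calling is_full(create_stack_of_size(10)) reaches the branch that a random method sequence rarely hits, and run_with_mocked_file_system("any", True) reaches the "exists" branch without touching the real file system. A factory that builds a stack of size 5 would not help, which is why knowing the target state matters.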
So to obtain users' help, the tools need to report the problems for the not-covered
branches. Given the not-covered branches as the symptoms, the existing approaches will
report all the non-primitive program inputs and fields as object-creation problems and
report all the [inaudible] external method calls as external method call problems. These
become the likely causes of the not-covered branches here.
However, these existing approaches face two problems. The first problem is that most
of the reported problems are false warnings, which means that if the users provide help
to solve all these problems, a lot of effort will be wasted. The second problem is like --
>>: What is a false warning to you in this case?
>> Xusheng Xiao: Yeah, so, for example, if I ask you to provide a factory
method for object type C, and you provide that factory method but the coverage still cannot
be improved, then the user's effort is wasted.
>>: Okay, okay, okay.
>> Xusheng Xiao: Yeah. And the second problem is that the existing approaches do
not know the target state to solve the problems. That is, the existing approaches
do not know what the desired object states are for solving object-creation problems
and what the desired return values are for solving external method call problems.
Without such information, it's very difficult for users to provide help. For example, a
user may provide a factory method that produces a stack object whose size is 5, but this
factory method cannot help the test generation tool improve the coverage. Therefore,
the user may try several times until he luckily provides a factory method that produces
a desired stack object.
So to address these two problems, I propose an approach called Covana. Covana precisely
identifies problems faced by the test generation tool for the not-covered branches. The
insight used by Covana is that partially covered conditional statements usually have
either data or control dependencies on the real problems.
So here is the overview of Covana. Covana first performs forward symbolic execution
on the program using the generated test inputs. During the test executions, Covana
observes the runtime events and identifies the problem candidates. These problem candidates
are the likely causes of the not-covered branches.
After identifying these problem candidates, Covana turns the elements of the problem
candidates into symbolic values. That is, Covana turns the program inputs and fields into
symbolic values [inaudible] return values of external method calls into symbolic values.
Covana then performs forward symbolic execution on these symbolic values and collects
the coverage and the runtime information.
Using the collected information, Covana computes the data and control dependencies
to prune irrelevant problem candidates and identify the problems along with the target
states to solve these problems. So the symbolic execution here is used to compute the data
dependencies and the target states to solve the problems. Next I will use some examples
to show how Covana identifies external method call problems.
So Covana considers only external method calls whose arguments have data dependencies
on the program inputs as candidates. The reason is that other external method calls
usually print a constant string or put the thread to sleep for a constant time,
which does not compromise the achieved coverage.
In this example, both of the external method calls have data dependencies on the program
input and are therefore considered problem candidates.
Covana turns the return values of the external method calls into symbolic values and
computes the data dependencies using forward symbolic execution. In this example, based
on the data dependency computation, Covana knows that the if statement has a data
dependency on the external method call [inaudible]. Since the true branch of this if
statement is not covered, Covana correctly reports this external method as an external
method call problem.
Along with it comes the target state to solve this problem: if you generate a value to make
sure this method returns true, then the not-covered branch can be covered.
Besides data dependencies, Covana also uses control dependencies to identify external
method call problems. So recall that this external method call throws an exception if given
invalid inputs. So if Covana identifies that this external method call throws an exception for
all test executions and that the code after line 1 is not covered, then Covana reports this
external method as another external method call problem.
So so far I have shown how Covana identifies external method call problems. Here I
would like to show how Covana prunes irrelevant problem candidates. So in this
example, most of the methods have data dependencies on the program input and
are therefore considered candidates. However, none of these methods compromise the
subsequent coverage, and they are therefore safely pruned by Covana.
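The pruning idea can be sketched in Python in a much simplified form. This is not Covana's implementation: instead of symbolic execution it tags concrete values with the external calls they came from, and Tagged, Tracer, and method_under_test are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tagged:
    """A concrete value plus the external calls it was derived from."""
    value: object
    sources: frozenset = frozenset()


@dataclass
class BranchLog:
    """Outcomes seen at one conditional and the sources feeding its condition."""
    outcomes: set = field(default_factory=set)
    sources: set = field(default_factory=set)


class Tracer:
    def __init__(self):
        self.branches = {}
        self.candidates = set()

    def external(self, name, value):
        """Record an external call as a problem candidate and tag its result."""
        self.candidates.add(name)
        return Tagged(value, frozenset({name}))

    def branch(self, branch_id, cond):
        """Record a conditional's outcome and data dependencies."""
        log = self.branches.setdefault(branch_id, BranchLog())
        log.outcomes.add(bool(cond.value))
        log.sources |= cond.sources
        return bool(cond.value)

    def problems(self):
        """Keep candidates that feed a partially covered branch; prune the rest."""
        kept = set()
        for log in self.branches.values():
            if len(log.outcomes) < 2:
                kept |= log.sources
        return kept


def method_under_test(tracer, path):
    # The generator cannot make this call return True, so it is always False here.
    exists = tracer.external("File.Exists", False)
    text = tracer.external("String.Format", "p=" + path)
    if tracer.branch("b1", exists):
        return "found"
    if tracer.branch("b2", Tagged(len(text.value) > 4, text.sources)):
        return "long"
    return "short"
```

After running method_under_test on the inputs "a" and "abcdef", branch b2 has been taken both ways, so String.Format is pruned, while b1 has only ever been false, so File.Exists is reported. The missing outcome (true) plays the role of the target state.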
To identify object -- yes.
>>: Sorry.
>> Xusheng Xiao: Yeah.
>>: Can't format throw an exception?
>> Xusheng Xiao: Not in these cases. So during test generation, if one of
the test executions does not throw an exception, that means the test generation tool
has already figured out some path that doesn't lead to the exception, so then the
following part of the code can be covered.
>>: Okay. I see. So what you're saying is that you're given your existing set of tests.
>> Xusheng Xiao: Yes.
>>: Within those tests, some methods throw exceptions.
>> Xusheng Xiao: Yes.
>>: Then, quote/unquote, there's some reason that, yes, this may throw exception.
>> Xusheng Xiao: Right.
>>: But within those tests if a method does not throw an exception, you just assume that this
method does not throw an exception.
>> Xusheng Xiao: Yeah, yeah.
>>: Okay.
>> Xusheng Xiao: Yeah. So to identify object-creation problems, Covana uses a similar
data dependency analysis. So in this example, if the true branch at line 5 is not
covered, Covana knows that that's because the field stack.size is not equal to 10.
However, if Covana directly reports the object type of this field, which is integer, as the
object-creation problem, then it will result in an inaccurate report. The problem is that
this field is private and it can only be modified by invoking the methods push and pop.
So to address this problem, users should actually provide factory methods for stack instead
of [inaudible]. So how do we address this? So
given the field, we can actually create a field declaration hierarchy up to the program
input, and the question here is for which object type the test generation tool fails to
generate the sequence of method calls.
So if we look more carefully at how test generation tools generate sequences of method
calls, we can observe that it's not difficult for test generation tools to generate method
calls that invoke public constructors or public setters to assign fields.
Therefore, typically the difficulty lies in the fields that are not assignable by invoking
these public methods.
So based on these observations, starting from the field of the program input, if a field is
assignable, then the object-creation problem is either the field or the field at the next
level in the hierarchy. And if the field is not assignable, then the object-creation problem
is the declaring class of the field.
So based on these insights, I provide an algorithm to identify the object-creation
problems from the field declaration hierarchies. I will use this same example to show
how my algorithm works.
>>: Can I ask a question?
>> Xusheng Xiao: Yep.
>>: I'm trying to understand what is the problem. Is the problem that we don't have a
model for the push method on stack? Not fixed size stack, on stack.
>> Xusheng Xiao: Right.
>>: That it can modify the count field.
>> Xusheng Xiao: Right.
>>: We don't have that model.
>> Xusheng Xiao: Yeah, we don't have the models.
>>: And so you want to recover that?
>> Xusheng Xiao: Yes. Yes. So we would like the users to help us to find out the
relationships. Because you can -- besides push, you can also invoke pop, so then the state
would change [inaudible] and it's difficult for the tool to automatically figure out how to
generate. Yeah.
>>: [inaudible] I mean, if you have the DLL or the library for stack, can't you just read
the IL code?
>> Xusheng Xiao: Yeah, so basically the test generation tool will try -- for example, it
randomly tries invoking different methods, but it does not know whether it should
keep invoking a method in order to approach the target state. For example,
you can try to invoke push one time and you see the state increase by 1. Then you
try pop, and it goes back to 0. So yes.
>>: Can I rephrase the question? So the question is I have the IL.
>> Xusheng Xiao: Yeah.
>>: I can look at the IL, I can see the internal state.
>> Xusheng Xiao: Right.
>>: So can I just basically muck with that IL, rewrite it somehow so I can just build an
argument stack and set its size equal to a fixed value?
>> Xusheng Xiao: Oh, yeah. Yeah, you can do that. But then you violate the object
invariants. Yes.
>>: That's not how we set up the problem. The reason other problems that you have
injected a valid sequence of methods.
>>: And I think the other thing there is if you assign that field, then it's empty. So if the
next thing here was pop rather than push, it would report an invariant violation there,
even though that doesn't make sense from the [inaudible].
>> Xusheng Xiao: Yes. Yeah. Basically you will generate invalid objects. So
then when the people run your test cases, you may cover it, but it doesn't show the real
>>: I mean, but I still -- I would like to refine my question further. Supposing I do have
the IL available to me, okay --
>> Xusheng Xiao: Right.
>>: So I can make some heuristic judgment based on the knowledge that the push
method modifies the count field. It's not perfect because modification is a very loose
thing. I know that in one case it is incrementing and in the other case it is decrementing
>> Xusheng Xiao: Yes.
>>: So I'm wondering if the problem that you're trying to solve is because I don't have
perfect knowledge of what push is doing or is it the case that I don't have any knowledge
at all.
>> Xusheng Xiao: It's because I don't know what push is doing. You actually have
that knowledge. You can know which fields this method modifies.
>>: Okay. That's allowed. I see.
>> Xusheng Xiao: Right, right, right.
>>: But you don't -- but you don't know, you know, how does it modify it.
>> Xusheng Xiao: Yes, yes, yes. But actually my technique is orthogonal to this
technique, because you can always employ a more powerful automation technique to
alleviate the problem. But in the end there are still some objects that you cannot figure out.
And my technique is applicable in these cases to identify the problems.
Yeah. So my analysis starts with analyzing the field. The program input in these cases is
a fixed size stack.stack. So my analysis finds that this field can
be assigned by invoking the public [inaudible], so my analysis continues to analyze
the next field.
In this case, this is stack.stack. Since this field is not assignable, my analysis
stops and reports stack as the object-creation problem and asks users to provide
factory methods for this object type, and the target state for this object type is that the
field stack.stack should be 10 in order to cover the branch.
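The walk over the field declaration hierarchy can be sketched in Python. This follows the rule as stated in the talk, but FieldStep, object_creation_problem, the field name count, and the fallback used when every field is assignable are assumptions rather than details from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldStep:
    declaring_class: str  # class that declares the field
    name: str             # field name
    field_type: str       # static type of the field
    assignable: bool      # settable through a public constructor or setter?


def object_creation_problem(hierarchy):
    """Walk from the program input toward the field used in the branch."""
    for step in hierarchy:
        if not step.assignable:
            # Only the declaring class's own methods can change this field,
            # so that class is the one needing a factory method.
            return step.declaring_class
    # Assumption: if every field is assignable, report the last field's type.
    return hierarchy[-1].field_type if hierarchy else None


# Hypothetical hierarchy for the stack example.
EXAMPLE = [
    FieldStep("FixedSizeStack", "stack", "Stack", assignable=True),
    FieldStep("Stack", "count", "int", assignable=False),
]
```

For EXAMPLE the walk passes the assignable wrapper field, stops at the non-assignable counter, and returns "Stack", which matches the talk's point that the report should name stack rather than integer.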
So to evaluate the effectiveness of Covana, I use two open source projects, xUnit and
QuickGraph. So I first apply Pex to generate test inputs for these two open source
projects, and I feed the programs and the generated test cases to Covana to identify
object-creation problems and external method call problems for the not-covered branches.
I then compare the effectiveness of Covana and the existing approaches.
So I conduct two evaluations. The first evaluation is to show how effective Covana is in
identifying these two types of problems. And as we can see from the results, Covana is
able to identify both types of problems with low false positives and low false negatives.
And the second evaluation is to evaluate how effective Covana is in pruning the
irrelevant problem candidates identified by the existing approaches. The results show
that Covana is able to prune more than 60 percent of the irrelevant problem candidates,
also with low false positives and low false negatives.
So Covana enables the cooperation between the developers and the test generation
tools --
>>: Sorry, can I ask -- can you explain what qualifies as a false positive and a false
negative?
>> Xusheng Xiao: Yeah. So after the tool generates test inputs, I manually look at all
the not-covered branches, and I figure out what the real problems are that cause these
branches to be not covered. And then when Covana reports a problem, I try to
provide a factory method or mock object. After I do that, if I can cover the
branch, it's a true positive; otherwise it's a false positive. And for a branch that is
supposed to be an object-creation problem but that Covana does not correctly report, I mark
it as a false negative.
>>: So you worked with all 2,000 of those ones?
>> Xusheng Xiao: Yeah, yeah. No. Sorry. This is the [inaudible] reported by the
existing approaches. Most of them are false positives. But my approach only reports
about 200. So I looked at all 200 of them.
>>: So maybe I misunderstood this. So you said you pruned 1567 of them.
>> Xusheng Xiao: Right.
>>: And so there was one false positive and two false negatives?
>> Xusheng Xiao: Yes, yes.
>>: Didn't you have to look at all of them to make sure that they were the false positives?
>> Xusheng Xiao: Oh, because I looked at the not-covered branches and I already know
how many real problems there are. So many of them are actually false positives from
the existing approaches. Yeah.
Okay. So Covana enables the developers to cooperate with the test generation tool. So
first the test generation tool is used to generate the test cases automatically. Covana then
diagnoses the difficulties faced by the test generation tool for the not-covered branches and
reports the identified object-creation problems and external method call problems to the users.
By looking at these problems, users can provide factory methods and mock objects to
address the problems. These factory methods and mock objects are [inaudible] the test
generation tool for generating more test cases to improve the test coverage.
So I have shown how users can cooperate with the test generation tool to improve the
test coverage. Next I would like to show how tools can help users make
informed decisions on how to control their privacy. I would like to use my projects on
mobile security to illustrate this kind of cooperation.
So [inaudible] of smartphones and mobile applications, application markets such as
Apple's App Store, Google's Google Play, and Microsoft's Windows Phone market have
become a primary mechanism for distributing software [inaudible] to mobile devices.
And unfortunately this is also an easy mechanism for malicious users to distribute malware.
To protect users' privacy, these predominant mobile platforms provide different privacy
control mechanisms. For Google Android, a permission list is shown to the users
before the user installs the application. This permission list shows the permissions that
are requested by the mobile application, and the user has to approve all the permissions in
order to install the application.
For Apple iOS, a popup dialog is shown to the users the first time the application tries
to use a permission. Windows Phone uses a mix of strategies that combines both.
So although these two approaches are a bit different, both of them just
report what permissions are requested by the mobile application, but they do not explain
how and why it would like to use your permissions.
Therefore, users make uninformed decisions on controlling their privacy. Studies show
that this simply leads to situations where users just approve all the permissions without
looking at what permissions are requested by the mobile applications.
So to improve the privacy control mechanism, I propose a user-aware privacy control
approach. Instead of showing the permission list before the user installs the
application, my approach shows the information flows.
So these information flows show what data types flow into what output channels, which can
be used to explain how the application would like to use your permissions.
In this example, this information flow shows that this application will share your
location on Facebook. Here I refer to the [inaudible] of the private information as the
source, and I refer to the output channels where the private information may escape as a
sink.
Besides showing the information flows, my approach provides monitored sinks, which are
output channels where users can review their information. For example, an application
may allow users to take photos using the camera and share them on Facebook. Before a
picture is shared, a popup dialog shows the picture for the user to review.
In these cases, users are aware of the information that's going to be shared, and users
can also perform runtime inspection of the shared data instances.
However, there are two challenges in realizing this user-aware privacy control. The first
challenge is that an information flow can escape users' inspection by flowing to a
non-monitored sink. I refer to such information flows as escaping flows. The second
challenge is that the application can tamper with the data before the data is shown to
the users. That is, what you see may not be what you share. For example, an application
can encode the location information into the photo before the photo is shown to the
users.
Although the user gets to review the picture, it's difficult for the user to notice the small
changes. Even worse, the application can encode the location into the metadata of the
picture. In that case, the user does not notice the location information at all. Record
[inaudible] the picture of location information can cause serious security issues. So I
refer to such information flows as tampering flows.
To address these two challenges, I propose an approach that computes information flows
and classifies them as different kinds of flows. So to identify escaping flows,
my approach [inaudible] information flows that flow into non-monitored sinks. To identify
tampering flows, my approach provides a tampering analysis, which checks whether the
data is tampered with before flowing into a sink.
And our approach identifies information flows where tampered data flows into a monitored
sink as tampering flows. Based on this tampering analysis, our approach does not require
the user's decision at [inaudible] for information flows where untampered data flows into a
monitored sink, since users have the opportunity to perform runtime inspection for
these information flows.
So our approach identifies the tampering flows and escaping flows and asks the user to
provide decisions on what kind of information they would like to use for these
flows. However, the user may not be sure about whether these flows are [inaudible]. So
our approach allows the user to use anonymized data for these information flows. By
using the anonymized data, the user can experience the application without
compromising their privacy. And if the user confirms these information flows
[inaudible], the user can go back and change the settings to use the private data for these
information flows.
If the user does not want to provide decisions at all, my approach by default uses
anonymized data for the identified tampering and escaping flows.
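The classification and the default described above can be sketched as follows. This is only an illustrative sketch, not the implementation from the talk: the `Flow` record, its field names, and the `data_mode` helper are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FlowKind(Enum):
    ESCAPING = "escaping"    # data reaches a sink the user cannot inspect
    TAMPERING = "tampering"  # data was modified before reaching a monitored sink
    SAFE = "safe"            # untampered data reaches a monitored sink


@dataclass(frozen=True)
class Flow:
    # Hypothetical record; field names are assumptions for this sketch.
    source: str            # e.g. "location"
    sink: str              # e.g. "web_post"
    sink_monitored: bool   # can the user inspect the data at this sink?
    tampered: bool         # was the data tampered with before the sink?


def classify(flow: Flow) -> FlowKind:
    """Classify one information flow into the three kinds described in the talk."""
    if not flow.sink_monitored:
        return FlowKind.ESCAPING
    if flow.tampered:
        return FlowKind.TAMPERING
    return FlowKind.SAFE


def data_mode(flow: Flow, user_choice: Optional[str] = None) -> str:
    """Return "real" or "anonymized" for the data fed into this flow.

    Safe flows need no decision. For escaping and tampering flows the user's
    choice is used if given; otherwise the default is anonymized data.
    """
    if classify(flow) is FlowKind.SAFE:
        return "real"
    return user_choice or "anonymized"
```

With this shape, a user who never answers any prompt still gets anonymized data on every escaping or tampering flow, which matches the default stated in the talk.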
Next I will show how my approach computes the information flows. So my approach
computes information flows statically using abstract interpretation. My analysis maintains
abstract states of the application and updates the states based on the simulated execution of
each statement. This -- yeah.
>>: So you said there was a result that showed that people don't pay attention to those
lists of permissions that applications require.
>> Xusheng Xiao: Yes.
>>: It's good that you have a default on here, but if they deny something it might change
the way the application behaves. So is there a scenario where people either don't pay
attention or don't care or flip everything to real data and then they get the whole
application experience?
>> Xusheng Xiao: Yeah, yeah, they can do that as well.
>>: Okay. So I guess is that as likely as people just reading through the permission list
and saying, oh, whatever?
>> Xusheng Xiao: Yeah, yeah. Actually, some people don't care whether
you share their location or not. But some users don't like that -- like if you'd like to share
my information, at least you need to let me know. So that's why I call this
approach user-aware. So we need to make the user aware of how the
application is going to use your permissions.
>>: Okay. You're arguing that this is harder to just ignore than, like, the permission lists.
>> Xusheng Xiao: Yes, yes, yes. Yeah. So my analysis is a summary-based
interprocedural analysis. It computes a symbolic method summary for each method and
uses a fixpoint algorithm -- yep.
>>: One more. Is the application that you're like regulating either anonymous or
[inaudible], does it know what [inaudible]?
>> Xusheng Xiao: What do you mean by [inaudible]?
>>: Could it degrade what it does if it knows that it's getting anonymized data so that it's
more likely that a user will give it real data so that it can do what mischievous thing it
wants to do?
>> Xusheng Xiao: Yeah, I think they can -- at the end, the malicious user can observe
the patterns of how you do this. But then like for anonymized data, you can just like
choose some random data.
>>: No, but, I mean, could -- so the application developer wants real data so they can do
something bad.
>> Xusheng Xiao: Oh, yeah, yeah.
>>: So could they just say, if I'm getting anonymized data, I'm going to make this
application just not work correctly?
>> Xusheng Xiao: Yes, yeah.
>>: So if users want to use the application, they have to give it real data and then the
application can do its malicious thing. Is that possible? Does the application have access
to what permission level it's been given?
>> Xusheng Xiao: No, the application doesn't [inaudible]. But the malicious user
definitely can observe some pattern. Like if you keep sending them junk data, then they
can block you as well, yeah.
So one way to deal with that is to generate random data and then just send it to
the -- send it to the server. So they may have difficulty observing
whether this is some fixed data or not. Like you don't just give the same fixed data
every time the APIs are invoked. You generate slightly different data. And so
they may not be able to detect it.
So, yeah. But it's not a perfect event. Yeah. It's just one way to deal with that. Yeah.
So my approach provides annotations that can be used to annotate built-in APIs. So these
annotations are used to describe which are the sources and the sinks among the built-in APIs.
And they're also used to describe how the information will be propagated by invoking the
built-in APIs.
So the formal definition of the states and how I update the states are in the papers. Here we
just use some examples to show how my analysis computes states and identifies information flows.
So let's look at a simple example. At line 4, the local variable is classified with the
location information. Through line 5 and line 6, the local variables, S and P, are
classified as well. And finally at line 7 my analysis encounters a sink and identifies an
information flow from the location to the sharing.
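The slide code for this example is not part of the transcript, so the following is only a sketch of how explicit flows might be propagated over straight-line statements. The statement format, the API names (`senses.current_location`, `social.share`), and the annotation tables are all hypothetical.

```python
# Hypothetical annotations: which built-in APIs are sources and which are sinks.
SOURCES = {"senses.current_location": "location"}
SINKS = {"social.share": "sharing"}


def propagate(statements, sources, sinks):
    """Simulate straight-line statements and report (source, sink) flows.

    Each statement is (target_variable_or_None, api_name, argument_variables).
    The state maps each local variable to the set of source labels it carries.
    """
    state = {}
    flows = []
    for target, api, args in statements:
        labels = set()
        for arg in args:
            labels |= state.get(arg, set())   # data flows from arguments
        if api in sources:
            labels.add(sources[api])          # the API introduces a source
        if api in sinks:
            flows.extend((label, sinks[api]) for label in sorted(labels))
        if target is not None:
            state[target] = labels            # the result carries the labels
    return flows


# A stand-in for the slide's example: a location is read, converted,
# concatenated into a post, and shared.
program = [
    ("l", "senses.current_location", []),
    ("s", "to_string", ["l"]),
    ("p", "concat", ["s"]),
    (None, "social.share", ["p"]),
]
```

Running `propagate(program, SOURCES, SINKS)` on this stand-in program reports a single flow from the location source to the sharing sink, mirroring the walk through lines 4 to 7 in the talk.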
So besides these explicit information flows, based on dataflow and control flow,
there are also implicit flows that pose challenges for the information flow computation.
So let's look at an example of implicit flows. So at line 3 a message is added to a
message collection. And this message is classified at line 4 by using the secret
information stored in S. And later in the code, if another message is retrieved from the
message collection and shared via a Web service, we have a potential flow here.
Because the retrieved message can be the classified message that we classify at line 5.
But explicit information flow computation cannot detect such an information flow since the
message is classified after it's added to the message collection. So the message
collection cannot be correctly classified.
Such implicit flows propagate the information through containers like [inaudible].
There are also other kinds of implicit flows that can propagate the information. For
example, an implicit information flow can propagate the information through the file
system. It can save classified information into the file system in one method and send the
data out using another method.
So to identify the implicit flows based on containers, my approach provides an edge to
connect the data types. So when the message is added to the message collection, my
approach provides a link between the message collection and the message.
With this link, when the message is classified, the message collection is classified as
well. But with such links, my approach may potentially produce many false positives
because the message retrieved from the collection may not be the classified information.
So in future work I plan to record all the locations where my analysis made
conservative decisions and ask the user to confirm whether such decisions are desirable
or not.
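A small sketch of the container link idea, under the assumption that each abstract object remembers the containers it was added to. The `Obj` class and the function names are invented for illustration and are not from the talk.

```python
class Obj:
    """Abstract object in the analysis state (hypothetical representation)."""

    def __init__(self, name):
        self.name = name
        self.labels = set()     # source labels this object is classified with
        self.containers = []    # links to containers this object was added to


def add_to(container, item):
    """Model `container.add(item)`: record the link and copy current labels."""
    item.containers.append(container)
    container.labels |= item.labels


def classify_obj(obj, label):
    """Classify obj and, through the links, every container that holds it."""
    stack, seen = [obj], set()
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        current.labels.add(label)
        stack.extend(current.containers)


# The message is added first and classified afterwards, as in the talk.
messages = Obj("message_collection")
msg = Obj("message")
add_to(messages, msg)
classify_obj(msg, "secret")
```

After these calls `messages.labels` contains `"secret"`, so anything later retrieved from the collection is conservatively treated as classified. That conservatism is exactly where the false positives mentioned in the talk come from.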
So my final approach is that [inaudible] implicit flows through file systems. In future
work I plan to use dynamic analysis to identify such flows.
So the information flow analysis provides the basis for the tampering analysis. The tampering
analysis is used to track whether the data is tampered with before it flows into
the sink. So the tampering analysis provides a tamper annotation, which can be used to
annotate the built-in APIs.
In this example, the API in collocation should be annotated with the tamper annotation.
Based on the tamper annotation, the tampering analysis will apply the tamper operator on
the set of sources that are associated with a local variable. I will use an example to show
how this tampering analysis works.
So we take a picture using the camera. And here is what the state would look like. The
local variable picture is associated with the source camera. If the API in collocation is
invoked on the picture, a tamper operator will be applied on the camera source,
indicating that the camera source has been tampered with. So later when we encounter a
sink, my approach can identify a tampering flow.
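The camera example can be sketched roughly as follows. The state representation (a set of `(source, tampered)` pairs per variable) and the API names, including `encode_location`, are assumptions made for this sketch and are not taken from the slides.

```python
def tamper(sources):
    """Tamper operator: mark every source in the set as tampered."""
    return {(name, True) for name, _ in sources}


def find_tampering_flows(statements, source_apis, tamper_apis, sink_apis):
    """Return (source, sink) pairs where tampered data reaches a sink.

    Each statement is (target_or_None, api_name, argument_or_None).
    """
    state = {}      # variable -> set of (source_name, tampered_flag)
    findings = []
    for target, api, arg in statements:
        incoming = state.get(arg, set())
        if api in source_apis:
            state[target] = {(source_apis[api], False)}
        elif api in tamper_apis:
            state[target] = tamper(incoming)      # tamper-annotated API
        elif api in sink_apis:
            for name, tampered in sorted(incoming):
                if tampered:
                    findings.append((name, sink_apis[api]))
        else:
            state[target] = set(incoming)         # plain propagation
    return findings


# Hypothetical annotation tables for the built-in APIs.
SOURCE_APIS = {"camera.take_picture": "camera"}
TAMPER_APIS = {"encode_location"}
SINK_APIS = {"social.share": "sharing"}

tampered_program = [
    ("pic", "camera.take_picture", None),
    ("pic2", "encode_location", "pic"),
    (None, "social.share", "pic2"),
]
untampered_program = [
    ("pic", "camera.take_picture", None),
    (None, "social.share", "pic"),
]
```

In this sketch the first program yields one tampering flow from the camera to the sharing sink, while the second yields none, since the picture reaches the sink unmodified and the user can inspect it there.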
So I implemented this whole approach on TouchDevelop. I think everyone here knows
what TouchDevelop is, so I will not explain it in too much detail. So
TouchDevelop allows users to publish and share their applications, which we call scripts
in TouchDevelop, through the script bazaar. And these characteristics [inaudible]
similar to mobile platforms such as Android and iOS.
I evaluated my approach on about 600 scripts published by about 200 users. So I
compare my approach with the existing approach and the flow approach. The
existing approach requires users' decisions for a script if both sources and sinks
are found in the permission list. The flow approach requires users' decisions if any
information flow is found for the script.
Recall that my approach only identifies the tampering flows and escaping flows and
only requires users' decisions for those two kinds of flows.
So I conducted two evaluations. The first evaluation measures how many scripts
require users' decisions. As we can see from the results, there are 172 scripts that use
sources. The existing approach requires users' decisions for 89 scripts. This
number is reduced to 78 if we use the flow approach.
Since our approach does not require users' decisions for information flows where
untampered information flows to a monitored sink, our approach only requires users'
decisions for 54 scripts.
The second evaluation measures how many sources require users' decisions. As
we can see from the results, the existing approach requires users' decisions for 152
sources. This number is reduced to 119 by using the flow approach.
By using our approach, we can reduce this number to 63. These results show that
even with the false positives introduced by the static analysis, our approach is still very
effective in reducing the users' decisions for the scripts.
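To put the counts above in relative terms, the small calculation below derives percentage reductions from the numbers quoted in the talk. The percentages are simple arithmetic on those counts and were not themselves reported by the speaker.

```python
def reduction_percent(baseline, reduced):
    """Percentage of decisions saved relative to the baseline count."""
    return 100.0 * (baseline - reduced) / baseline


# Counts quoted in the talk: existing approach, flow approach, this approach.
scripts = {"existing": 89, "flow": 78, "ours": 54}
sources = {"existing": 152, "flow": 119, "ours": 63}

for name, counts in (("scripts", scripts), ("sources", sources)):
    for approach in ("flow", "ours"):
        saved = reduction_percent(counts["existing"], counts[approach])
        print(f"{name}: {approach} saves {saved:.1f}% of decisions vs. existing")
```

Relative to the existing approach, this works out to roughly 39% fewer scripts and roughly 59% fewer sources needing a decision, compared with roughly 12% and 22% for the flow approach.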
So so far I have shown that information flows can be used to explain how a mobile
application will use the permissions. My recent work with others, WHYPER, proposes a
framework that uses natural language processing techniques to establish links between
sentences in the application description and the permissions in the permission list. Such
sentences can be used to explain why an application would like to use your permissions.
This kind of information complements the results of the information flow analysis,
and they serve as the first step toward reaching user expectations and the application
So consider a sentence in an application description: Also we can share the yoga exercises
to your friends via e-mail and SMS.
The phrase share to your friends via e-mail and SMS implies that this application needs
to use your contact information. The reason is that the application needs to
know your friends' e-mail in order to send the e-mails. So I'll refer to such sentences as
permission sentences since they're sentences that indicate the use of the permissions.
So to identify such sentences, a straightforward approach is just to use a keyword-based
search on the application descriptions. So, for example, we can just search for the sentences
that contain the keyword contacts.
However, these keyword-based search approaches face two challenges. The first
challenge is that there are certain keywords, such as contact, that have confounding meanings.
For example, in a sentence like display user contacts, the contacts here means that the
application needs to read your contact information. However, in other sentences, like
contact me at an e-mail address, the contact here does not imply that the application
would like to use your contact permission.
The second challenge is that sentences often describe sensitive operations
without actually referring to the keywords. Consider the same example. The
sentence does not contain the keyword contacts, but it implies that this application needs to
use your contact permission.
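The two failure modes of keyword search can be reproduced with a few lines. This is a deliberately naive sketch using the three sentences quoted in the talk; the function name is invented here.

```python
def keyword_match(sentence, keyword="contact"):
    """Naive keyword search: flag a sentence if it contains the keyword."""
    return keyword in sentence.lower()


examples = [
    # (sentence, does it really describe use of the contacts permission?)
    ("Display user contacts", True),
    ("Contact me at an e-mail address", False),
    ("Also we can share the yoga exercises to your friends via e-mail and SMS", True),
]

for sentence, expected in examples:
    flagged = keyword_match(sentence)
    verdict = "ok" if flagged == expected else "WRONG"
    print(f"{verdict:5} flagged={flagged!s:5} {sentence}")
```

The second sentence is flagged although it should not be (a confounding meaning), and the third is missed although it should be flagged (no keyword present), which are the two challenges described above.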
So to address these two challenges, my approach -- our framework uses natural language
processing techniques. So natural language processing techniques help computers
understand natural language artifacts, as in speech recognition and translation.
Although natural language techniques have shown some successes recently, such as
IBM Watson, it is still difficult to use natural language processing techniques for general
purposes. However, natural language processing techniques are feasible for analyzing
domain-specific sentences with specific styles.
My previous work on specializing natural language processing techniques using domain
knowledge to infer formal models from API documentation and use cases has shown
promising results. So in WHYPER we also specialize the natural language processing
techniques using the domain knowledge from the API documentation. So here I will
just show the major components of WHYPER.
So WHYPER produces an intermediate representation of the input natural language
sentences --
>>: Can I ask you a quick question?
>> Xusheng Xiao: Yep.
>>: So I'm kind of confused. Like so in the previous slide you said API documentation.
>> Xusheng Xiao: Right.
>>: And then in this next slide this sounds like -- this sounds like a description from
what the app does.
>> Xusheng Xiao: Right, right, right.
>>: So where are these sentences coming from?
>> Xusheng Xiao: So these sentences are from the application descriptions. The
domain knowledge is from the API documentation. I will explain that domain
knowledge later. Yeah.
So first my approach uses a natural language parser to identify the syntactic structure
of the sentence, such as noun phrases and verb groups.
Based on the natural language parsing, we also produce a dependency tree. So this
dependency tree shows the grammatical relationships between the words. It contains
more semantic information than the simple syntax tree.
Using the dependency tree, our approach further produces an intermediate
representation of the sentence, which is essentially a first-order logic representation.
This is also a tree structure. The leaf nodes of the tree are the entities and the other
nodes are the predicates.
So the children of a predicate are the participating entities: the first child
is the governing entity and the second child is the dependent entity.
So using this intermediate representation and the semantic graph inferred from the API
documentation, our approach identifies whether a sentence is a permission sentence. So
here is a semantic graph for the read contacts permission. So the contacts are the
resources protected by the permission to read the contacts.
We also look into the API documentation to identify the attributes of the contacts, such as
the phone number, e-mail, location. These become the [inaudible] resources of the
contacts. And then we look into the API documentation for the APIs that require the read
contacts permission to identify the verbs that represent the actions that can be performed on these
resources, such as read, search.
So to identify whether a sentence is a permission sentence, we search the first-order
logic representation to find a pair of predicate and entity that matches a pair of action and
resource in the semantic graph.
So recall that in this first-order logic representation the leaf nodes are entities. So our
algorithm starts by matching the leaf nodes with the resources. In this example, we can
find a perfect match, e-mails. After we find this match, we start to search the
predicate nodes to try to match a predicate node with the actions, which is basically a
So in order to perform the match, we use the one [inaudible]. In this example, we
start searching from the end predicate and we keep going up to search the parent
predicate. We also search the sibling predicate if the sibling predicate is the first child or
is the first child, the first-order logic [inaudible].
So in this example, when we reach the sibling predicate share of the predicate owned, based
on the one [inaudible] we find out that these two verbs are very similar. So we
find a matched pair of predicate and entity and a pair of action and resource, and consider
this sentence a permission sentence. Yeah.
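The matching step can be sketched as a search over a small predicate tree. Everything concrete here is an assumption for illustration: the tree shape for the yoga sentence, the `RESOURCES` and `ACTIONS` sets, and the hand-written `SIMILAR_VERBS` table that stands in for the similarity measure named in the talk (which is inaudible in the transcript). The sketch also only walks up the ancestors and omits the sibling search described above.

```python
class Node:
    """Node of the first-order logic tree: leaves are entities, inner nodes predicates."""

    def __init__(self, word, *children):
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


# Hypothetical semantic graph for the contacts permission.
RESOURCES = {"contact", "e-mail", "phone number"}
ACTIONS = {"read", "search", "send"}
# Hypothetical stand-in for a verb similarity measure.
SIMILAR_VERBS = {"share": {"send"}}


def leaves(node):
    """Yield the entity (leaf) nodes of the tree in left-to-right order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def similar(verb, action):
    return verb == action or action in SIMILAR_VERBS.get(verb, set())


def is_permission_sentence(root):
    """True if some resource leaf has an ancestor predicate matching an action."""
    for leaf in leaves(root):
        if leaf.word not in RESOURCES:
            continue
        node = leaf.parent
        while node is not None:          # walk up through parent predicates
            if any(similar(node.word, action) for action in ACTIONS):
                return True
            node = node.parent
    return False


# Assumed tree for: share the yoga exercises to your friends via e-mail and SMS.
yoga = Node("share",
            Node("you"),
            Node("via",
                 Node("yoga exercises"),
                 Node("and", Node("e-mail"), Node("SMS"))))
unrelated = Node("play", Node("you"), Node("music"))
```

Here `is_permission_sentence(yoga)` holds because the leaf `e-mail` matches a resource and its ancestor `share` is treated as similar to the action `send`, while `unrelated` has no resource leaf and is rejected.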
>>: So what if there was the word not in there? Like let's say the English sentence was
does not allow you to share your yoga exercises with your friends via e-mail and SMS?
>> Xusheng Xiao: Okay. Good question. Yeah. So we -- yeah, our current approach
hasn't considered the negative effects. But based on this first-order logic representation,
we can encode the [inaudible] into that.
>>: Well, a different aspect of this but similar is e-mail in the left side of the screen is a
verb --
>> Xusheng Xiao: Yeah.
>>: -- e-mail on the right side of the screen is a noun.
>> Xusheng Xiao: Yes.
>>: And if you're trying to match -- or trying -- you say one of the problems in the -- just
using control F for find was you get the sense of the word wrong. And here you're
getting the sense of the word wrong even though you're matching.
>> Xusheng Xiao: So because this tree is built from the dependency tree, so we already
know the e-mail here is a noun because based on the natural language parsing I already
know the syntax structure of each word.
>>: Sure. But in a way it's a verb because it's a means of --
>> Xusheng Xiao: Yeah, if it's a verb, then it would be kind of --
>>: [inaudible] verb, essentially.
>> Xusheng Xiao: Yeah.
>>: They parse the sentence with [inaudible].
>> Xusheng Xiao: Yeah. So we already parse it --
>>: That's not English, that's jargon, e-mail and SMS. That's not English. So [inaudible]
sentence unless it's specific to jargon, computers --
>> Xusheng Xiao: Oh. Oh, yeah, yeah, yeah. Yeah.
>>: Don't know those words.
>> Xusheng Xiao: Yeah. Because I -- yeah. Because the [inaudible] I did the
>>: Structure says that has to be [inaudible].
>> Xusheng Xiao: Yes.
>>: Via this and that.
>> Xusheng Xiao: Right, right.
>>: English changes things. I don't know, natural language is statistical, not parsing.
>> Xusheng Xiao: Yeah, yeah.
>>: [inaudible] don't mess with the guy that [inaudible].
>>: [inaudible] training so when you --
>> Xusheng Xiao: Right, right. So -- yeah. So for the e-mail and SMS, there is a
technique in natural language processing called named entity recognition. So to improve
the precision of the entity recognition, we also maintain a static list of some nouns that
are commonly used in the domain of mobile applications, so that will improve the
precision and just the problem is a problem. Yeah.
So we evaluated our approach on 600 application descriptions that consist of about 1,000
sentences, and we achieve promising results in terms of average precision and recall.
So we also compare our approach to the keyword-based search approach, and the
results show our approach can significantly improve the precision, and performs slightly
worse in terms of recall. Yes.
>>: What's your ground truth?
>> Xusheng Xiao: Oh, the ground truth, we manually annotate each sentence as to
whether the sentence is a permission sentence or not.
>>: So human.
>> Xusheng Xiao: Yes, human. Yes, yes. Yeah, so the results show that we have much
better precision but we have slightly worse recall.
The reason why we have slightly worse recall is that the action and resource pairs
inferred from the [inaudible] cannot cover every case of how people will use the API
and use the resources.
So in future work we plan to use some learning techniques to learn from the user
comments and the [inaudible] to improve the semantic graph.
And also in our context, precision is much more important because if our technique
says that this application has the description for permissions but in the end it doesn't, it's
pretty bad. So in our context we think precision is much more important.
So the information flow analysis and WHYPER enable users to make informed decisions
on how to control their privacy. So a privacy [inaudible] system will give users an
application with the application description to inspect. My information flow
analysis will compute information flows and identify the escaping flows and the tampering
flows. WHYPER will also identify the sentences that indicate the use of the
permissions. These two kinds of information are then presented to the users, helping
users make informed decisions on how to control their privacy.
If users are not sure about these tampering flows and escaping flows, they can use
anonymized data to experience the application without compromising their privacy.
So this process can be repeated until all the information flows have been inspected by the
users or the user loses patience.
So I have shown two kinds of cooperation. The first part, the car, shows users
can cooperate with the test generation tool to improve the coverage. And the second part
of my talk shows that tools can help users make informed decisions on controlling their
My future work in cooperative testing and analysis consists of two directions. So the first
direction is economical analysis. So in both types of cooperation, users may have limited
time for each task and they would like to focus on the more important task items. So for
test generation, the benefit -- so we would like to have a technique that can estimate the
benefit and the cost of solving a task item.
For test generation, the benefit is like if I solve this object creation problem, how much
more coverage can I get. And the cost is how difficult it is to solve this object creation
problem. For mobile security, the benefit is like if I grant private data for permissions,
what functionality can benefit from it. And the cost is that if I use anonymized data for
permissions, how difficult it is for me to assess these permission uses when they're using the
The second direction is I would like to provide better explanations. So for test generation
I would like to investigate how I can use visualization tools to improve users'
understanding of the problems faced by the test generation tools. I also plan to conduct
a user study to measure the effectiveness of the visualizations.
For mobile security, I plan to investigate techniques to reveal the contexts of the
permission uses. So these contexts can be used to explain when and where the
permission will be used. So certain malicious behaviors will be easier for users to notice
if their contexts are revealed to the users. For example, malicious applications may send
a text message in the background when a signal is changed. So if such context
information is revealed to the users, it's much -- the user can easily find suspicious behaviors.
So my long-term future work in cooperative testing and analysis focuses on three types of
questions. The first question is what users can do. So currently I assume users have
decent programming skills to provide their help. In reality users may have different kinds
of skills. So I would like to investigate how these factors might affect the cooperation
process proposed by my methodology.
The second question is how the tools can learn from the users. So after the users
provide their help to the tools, the tool can accumulate the user interaction histories. So
later if a similar problem is faced by the tool, can we mutate this user interaction history
to provide solutions to the new problems.
And the third question is how to assist the users in providing their help. So I plan
to investigate techniques to provide better tool support for the cooperation. For
example, currently users need to provide a factory method from scratch. But in the
future, I plan to investigate using program synthesis techniques to synthesize a partial
solution and maybe also suggest method calls for the user to choose in order to construct
factory methods.
So to conclude my talk, I have introduced a methodology, cooperative testing and analysis,
where users make informed decisions when cooperating with software testing and analysis
tools. I have also shown how this methodology can be used to improve test generation
and mobile security.
So thanks for coming to my talk, and I'm ready to take questions.
>> Mark Marron: Do we have any questions? We asked a bunch during the talk.
>>: Already asked.
>> Xusheng Xiao: Yeah, yeah. Thank you.
>>: Okay. I do have one, though. So on your evaluation on the user-assisted testing,
right, you're using some existing frameworks including xUnit, a unit testing
framework, right?
>> Xusheng Xiao: Right.
>>: I assume they have unit tests written for xUnit that have very high coverage. How does
the coverage you get compare to the coverage of these handwritten unit tests?
>> Xusheng Xiao: Yeah, so --
>>: Any idea about like how much effort you had to do using the sort of cooperative
approach versus I write all the unit tests by hand?
>> Xusheng Xiao: So basically if you look at these unit tests that are written manually,
they already encode method call sequences in order to construct different kinds of objects.
Sometimes they even provide some factory methods for testing purposes.
So by using my approach, I also did a small study: if I provide factory
methods to solve two or three problems and also some external method call
problems, I can boost the coverage from 50 percent to about 80 percent.
>>: Okay. And like how does that compare with -- I mean, if this had unit tests already
written for it, how does that compare with what other coverages you need to make versus
the --
>> Xusheng Xiao: Yeah. So, I mean, if you write them manually, it's possible for you to
reach more than 90 percent, maybe not a hundred percent. Yeah. If you use test
generation tools, it requires much less effort because the first 50 percent or 60 percent
is covered by Pex automatically, and later you just need to provide a few factory
methods and you can also reach a similar level.
>>: So would it be nice to say like, okay, I can write 200 unit tests by
hand and get 90 percent coverage or I can write -- just use an automatic tool, get 50
percent coverage, write three more tests, get to 80 percent coverage, write five more tests
and get to 90 percent coverage, and then I had eight things I had to do manually versus 200,
right? So I'm curious as to what that number is, if you evaluated that yet.
>> Xusheng Xiao: I don't have the exact data for that.
>>: Okay.
>> Xusheng Xiao: But I think like it's much less help if you're using automated test
generation tools.
>>: And based on that quantification, especially on something like xUnit, right?
>> Mark Marron: I can write unit tests better than you can --
>> Xusheng Xiao: Yeah, yeah, yeah. Okay. Thank you.
>> Mark Marron: Any other questions? All right. Thanks a lot.
>> Xusheng Xiao: Thank you, yeah.