>> Mark Marron: So hello, everyone. I'm introducing Xusheng Xiao. He's visiting myself and Sumit Gulwani here. He did an internship with us last summer where he worked on some natural language processing and translation to Excel. He's also done a lot of work on testing and software engineering, so he's going to talk to us today about some of his work. And he's just about to graduate from NC State University. So I'll let you go. >> Xusheng Xiao: Okay. Thanks, Mark, for the introduction. Good afternoon, everyone. Welcome to my talk. I'm going to present my research, cooperative testing and analysis via informed decision making. You are welcome to ask me any questions during the presentation. So software quality includes both functional quality and nonfunctional quality. Functional quality refers to the functional correctness of the software and supports the delivery of the functional requirements. Failure to ensure high software quality can result in serious consequences. Software with poor functional quality has many functional defects, which cause the software to behave incorrectly. Recently a functional defect caused Knight Capital's computers to execute a series of automatic orders that were supposed to be spread out over a period of days, which cost the company over $440 million. Software with poor functional quality may also compromise user security and privacy. For example, posting your kid's photos using applications with [inaudible] tagging might expose your kid's exact location to strangers. Such security issues are also reported in the news. To improve software quality, software testing and analysis tools can be used to automate certain activities in software development and maintenance, which reduces the manual effort of quality assurance. Among software testing and analysis tools for ensuring high quality, test generation and code coverage tools are the types of tools that are commonly used in practice. 
The reason is that generating test inputs to cover a line of code is necessary for exposing faults at that line of code. Although tool automation is important to reduce manual effort, these tools face various challenges when dealing with complex software. For example, even the state-of-the-art test generation tools have difficulties in achieving high coverage for complex object-oriented programs, since the tools cannot generate desired object states for certain branches and cannot deal with method calls to external libraries. These challenges will still be difficult for the tools to tackle in the near future. For some of these challenges, users can provide their help to address the challenges. For example, users can provide mock objects to replace method calls to external libraries, helping the test generator to improve the test coverage. However, most of the tools do not communicate such information with the users, and little research has been done to support such cooperations. To better support such cooperations, I propose a general methodology, cooperative testing and analysis, which allows users to make informed decisions when cooperating with software testing and analysis tools. My methodology advocates two types of cooperation. The first type of cooperation focuses on the tools, where users provide their help to address the difficulties faced by the tools. Here I would like to use the Google driverless car as the example. Given the destination, the Google driverless car will drive us to the destination automatically. Along the way, the car may face different kinds of difficulties. For example, the car may enter a crowded street and not know how to move forward. In these cases, the user can jump in and manually drive the car, helping the car to get through the street. After that, the car can continue to move toward the destination automatically. If more difficulties are encountered later, the user can join in and help again. 
The second type of cooperation focuses on the users, and the tools help the users make informed decisions to [inaudible] the task more effectively and more efficiently. Consider the Google driverless car example again. [inaudible] the user forgets the next destination to go to, he may ask the AI assistant in the car to suggest destinations. The AI assistant then reports a list of suggested destinations along with the information of how these destinations are suggested. For example, some destinations may be suggested based on the user's preferences. Some destinations may be suggested based on the user's current location and previous history. Looking at this kind of information about how the tool suggests destinations provides context for users to choose the next destination to go to, enabling users to make informed decisions. In cooperative testing and analysis, I have made research contributions on analyzing structured software artifacts such as program source code and execution traces with software testing and analysis techniques. I have also made research contributions on analyzing unstructured software artifacts using analysis techniques. These techniques focus on assuring the functional correctness, security and privacy, and performance of software. The structured software artifacts analyzed by my techniques come from various types of software applications, such as command-line applications, GUI applications, and mobile applications. The unstructured software artifacts analyzed by my techniques include various types of natural language software artifacts, such as API documentation, application descriptions, and requirements documents. In today's talk, I will focus on three concrete projects that use the two types of cooperation to improve test generation and mobile security. I will start with my projects on test generation. So software testing is one of the most widely used techniques to improve software quality. 
But it's typically a labor-intensive and costly process. To address these issues, structural test generation can be used to produce high-covering tests automatically. Recently, with the advances of [inaudible], test generation based on dynamic symbolic execution has shown promising results in achieving high coverage and detecting real faults in well-tested libraries. Although test generation tools can achieve high coverage on certain programs, these tools face various challenges when dealing with complex object-oriented programs. So here is an example test report generated by Pex. Pex is based on dynamic symbolic execution, which instruments the code to explore the feasible paths for generating test cases. Since the number of paths grows exponentially with the size of the program, which is called path explosion, Pex includes a set of heuristics that guide the exploration to achieve high coverage faster and more effectively. So after the test generation, the test generator produces a test report that shows how many test cases have been generated and what coverage was achieved. So let's look at this example. The achieved coverage of Pex is not that high. Here I refer to the not-covered parts of the code as the symptoms. So what are the causes of these symptoms? Is it because of path explosion, or something else? To understand the challenges faced by the test generation tools, I conducted a preliminary study on four popular open source projects. I first applied Pex to generate test inputs to achieve coverage. As we can see from the results, the achieved coverage is not that high. Then I manually studied the causes of the not-covered branches and I found out that -- yeah. >>: So what was your setup for applying Pex? I mean, how -- what was the experimental setup? So you had a function, you had an application with, say, a hundred methods. How did you apply Pex? >> Xusheng Xiao: Okay. 
So Pex comes with a feature that can automatically generate [inaudible] for each public method. So I applied Pex to generate program inputs for each public method. >>: So just Pex out of the box? So you push button. You did no manual effort? >> Xusheng Xiao: Yeah, yeah. I just [inaudible] Pex, yes. >>: Okay. So when you say apply Pex, there's a feature in Pex that just lets you generate regardless of what tests exist already? You just run Pex on each public method and then collect the results? So this is completely automated? >> Xusheng Xiao: Yeah, completely automated. >>: And you haven't done any manual work to this point to achieve this coverage? >> Xusheng Xiao: Right, right. I haven't done manual work. >>: I've got another question. If you run Pex on a method, if it's doing [inaudible] execution, I can imagine that it could continue running forever in the presence of loops and whatnot. So did you provide a tight bound? Or how does it decide to stop? >> Xusheng Xiao: Okay, yeah. So Pex comes with preset bounds for different kinds of resources, like the execution time and the number of branches that it can collect. So I just let Pex run with the default bounds. Yes. Yeah. So I found out that most of the not-covered branches are caused by two major kinds of problems. The first major kind of problem is object-creation problems, where Pex cannot generate desired receiver or argument objects. Next I will show how this problem compromises the achieved coverage. So here is an example class, FixedSizeStack. This class is a [inaudible] class of stack. It puts a limitation on the maximum number of objects that can be pushed onto the stack. To cover the true branch [inaudible], the test generation tool needs to generate a stack object whose size is 10. And here is one of the target sequences that can be used to produce such a desired object state. 
And so we can see there are multiple methods that can be invoked on the stack object, and the number of combinations of method calls grows exponentially with the number of method calls in the sequence. Therefore, the search space is huge, and [inaudible] often cannot easily generate such sequences. Without such sequences, the true branch [inaudible] cannot be covered. The second major kind of problem is external-method-call problems, where the test generation tool fails to deal with method calls to external libraries. So typically a test generation tool instruments and explores only the methods of the project under test. The reason is that these third-party API methods may have too many paths, and the test generation tool would use up its resources in exploring these external methods. Also, some of the external methods are not instrumentable, for example, method calls to the file system or method calls to network I/O. So next I will use two examples to show how external method calls may compromise the achieved coverage. So let's look at the first external method call, File.Exists. The return value of this method is used in an if statement. Since the test generation tool does not know which files exist, it typically cannot generate a value to cause this method to return true. Therefore, the true branch is not covered. Next let's look at a second external method call, Path.GetFullPath. This external method throws exceptions for invalid inputs. So if the test generation tool cannot figure out how to generate valid inputs, this method will keep throwing exceptions, preventing the test generation tool from covering the remaining part of the program. So for object-creation problems and external-method-call problems, users can provide their help to address these problems. 
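To make the two problem kinds concrete, here is a small Python sketch of them. The class mirrors the FixedSizeStack example from the slides; `load_config` and `normalize` are my own illustrative stand-ins, with `os.path.exists` and `os.path.abspath` standing in for the .NET `File.Exists` and `Path.GetFullPath` calls:

```python
import os


class FixedSizeStack:
    """Bounded stack; the hard-to-cover branch needs size == 10."""
    CAPACITY = 10

    def __init__(self):
        self._items = []  # private state: only push/pop can change it

    def push(self, item):
        # Object-creation problem: covering this true branch requires
        # an object state reachable only via 10 prior push calls.
        if len(self._items) == self.CAPACITY:
            raise OverflowError("stack is full")
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    @property
    def size(self):
        return len(self._items)


def load_config(path):
    # External-method-call problem 1: this branch depends on the file
    # system, which the test generator does not control or instrument.
    if os.path.exists(path):
        return "loaded"
    return "missing"


def normalize(path):
    # External-method-call problem 2: in the C# example the external
    # call (Path.GetFullPath) throws for invalid inputs, blocking all
    # code after it; os.path.abspath is only a benign stand-in here.
    return os.path.abspath(path)
```

A random sequence of `push`/`pop` calls is unlikely to land on exactly ten net pushes, which is why the true branch in `push` tends to stay uncovered without user help.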
To tackle object-creation problems, users can provide factory methods that encode a sequence of method calls to produce a desired object state, and the test generation tools can use these factory methods to generate more test cases for improving the achieved coverage. To tackle external-method-call problems, users can instruct the test generation tool to explore the external libraries and address [inaudible], but if the external libraries are too complex and contain too many paths, the test generation tool may still not generate desired values. Alternatively, users can provide mock objects to simulate the environment dependencies and enable the test generation tool to generate the desired values for improving the achieved coverage. So to obtain users' help, the tools need to report the problems for the not-covered branches. Given the not-covered branches as the symptoms, the existing approaches report all the non-primitive program inputs and fields as object-creation problems and report all the [inaudible] external method calls as external-method-call problems. These become the likely causes of the not-covered branches here. However, these existing approaches face two problems. The first problem is that most of the reported problems are false warnings, which means that if the users provide their help to solve all these problems, a lot of effort will be wasted. The second problem is -- >>: What is a false warning to you in this case? >> Xusheng Xiao: Yeah, so, for example, if I tell you that you could provide a factory method for object type C, and after you provide that factory method, the coverage still cannot be improved, then the user's effort is wasted. >>: Okay, okay, okay. >> Xusheng Xiao: Yeah. And the second problem is that the existing approaches do not know the target states to solve the problems. 
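As an illustration of those two kinds of user help, here is a hedged Python sketch. The stack class is a cut-down version of the earlier FixedSizeStack example; the factory function and the injectable `exists` parameter are my own illustrative design, not Pex's actual factory-method or mocking mechanism:

```python
import os


class FixedSizeStack:
    CAPACITY = 10

    def __init__(self):
        self._items = []

    def push(self, item):
        if len(self._items) == self.CAPACITY:
            raise OverflowError("stack is full")
        self._items.append(item)

    @property
    def size(self):
        return len(self._items)


def full_stack_factory():
    """User-written factory method: encodes the call sequence that
    produces the desired object state (size == 10) which the test
    generator could not find on its own."""
    stack = FixedSizeStack()
    for i in range(FixedSizeStack.CAPACITY):
        stack.push(i)
    return stack


def load_config(path, exists=os.path.exists):
    """The file-system dependency is injectable, so a user-provided
    mock can force either branch without touching the real disk."""
    return "loaded" if exists(path) else "missing"
```

For example, `load_config("app.cfg", exists=lambda p: True)` reaches the "loaded" branch deterministically, and `full_stack_factory()` hands the tool an object in the otherwise hard-to-reach state.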
That is, the existing approaches do not know what the desired object states are for solving object-creation problems and what the desired return values are for solving external-method-call problems. Without such information, it's very difficult for users to provide their help. For example, a user may provide a factory method that produces a stack object whose size is 5, but this factory method cannot help the test generation tools to improve the coverage. Therefore, the user may try several times until he luckily provides a factory method that produces the desired stack object. So to address these two problems, I propose an approach called Covana. Covana precisely identifies problems faced by the test generation tool for the not-covered branches. The insight used by Covana is that partially covered conditional statements usually have either data or control dependency on the real problems. So here is the overview of Covana. Covana first performs forward symbolic execution on the program using the generated test inputs. During the test executions, Covana observes the runtime events and identifies the problem candidates. These problem candidates become the likely causes of the not-covered branches. After identifying these problem candidates, Covana turns the elements of the problem candidates into symbolic values. That is, Covana turns the program inputs and fields into symbolic values and turns the return values of external method calls into symbolic values. Covana then performs forward symbolic execution on these symbolic values and collects the coverage and the runtime information. Using this collected information, Covana computes the data and control dependencies to prune irrelevant problem candidates and identifies the problems along with the target states to solve these problems. So the symbolic execution here is used to compute the data dependencies and the target states to solve the problems. 
Next I will use some examples to show how Covana identifies external-method-call problems. So Covana considers only external method calls whose arguments have data dependency on the program inputs as candidates. The reason is that other external method calls usually take constant strings or constant values as arguments, and their return values do not compromise the achieved coverage. In this example, both of the external method calls have data dependency on the program input, and therefore Covana considers them as the problem candidates. Covana turns the return values of the external method calls into symbolic values and computes the data dependencies using forward symbolic execution. In this example, based on the data dependency computation, Covana knows that the if statement has data dependency on the external method call [inaudible]. Since the true branch of this if statement is not covered, Covana correctly reports this external method call as an external-method-call problem, along with the target state to solve this problem: if you generate a value to make this method return true, then the not-covered branch can be covered. Besides data dependencies, Covana also uses control dependencies to identify external-method-call problems. So recall that this external method call throws exceptions if given invalid inputs. So if Covana identifies that this external method call throws exceptions for all test executions and that the code after line 1 is not covered, then Covana reports this external method call as another external-method-call problem. So far I have shown how Covana identifies external-method-call problems. Here I would like to show how Covana prunes irrelevant problem candidates. So in this example, most of the methods have data dependency on the program inputs and therefore are considered as candidates. However, none of these methods compromise the subsequent coverage, and therefore they are safely pruned by Covana. To identify object -- yes. >>: Sorry. 
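The candidate pruning just described can be illustrated with a toy filter. This is my own simplification of Covana's dependency computation, with hypothetical method and branch names:

```python
def prune_candidates(candidates, uncovered_branches, branch_deps):
    """Report only external calls that some not-covered branch has a
    data or control dependency on; prune the rest as irrelevant.

    branch_deps maps a branch id to the set of external-call names
    its condition depends on, as computed by symbolic execution over
    the calls' symbolic return values."""
    relevant = set()
    for branch in uncovered_branches:
        relevant |= branch_deps.get(branch, set())
    reported = [c for c in candidates if c in relevant]
    pruned = [c for c in candidates if c not in relevant]
    return reported, pruned
```

With `candidates = ["File.Exists", "String.Format"]`, `uncovered_branches = ["if@7"]`, and `branch_deps = {"if@7": {"File.Exists"}}`, this reports only `File.Exists` and prunes `String.Format`, even though both calls have data dependencies on the program input.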
>> Xusheng Xiao: Yeah. >>: Can't Format throw an exception? >> Xusheng Xiao: Not in these cases. So in the test generation, if one of the test executions does not throw exceptions, which means that the test generation tool already figured out some of the paths that don't lead to the exceptions, then the following part of the code can be covered. >>: Okay. I see. So what you're saying is that you're given the existing tests. >> Xusheng Xiao: Yes. >>: Within those tests, some methods throw exceptions. >> Xusheng Xiao: Yes. >>: Then, quote/unquote, there's some reason that, yes, this may throw an exception. >> Xusheng Xiao: Right. >>: But within those tests, if a method does not throw an exception, you just assume that this method does not throw an exception. >> Xusheng Xiao: Yeah, yeah. >>: Okay. >> Xusheng Xiao: Yeah. So to identify object-creation problems, Covana uses a similar data dependency analysis. So in this example, if the true branch at line 5 is not covered, Covana knows that that's because the field stack.size is not equal to 10. However, if Covana directly reports the object type of this field, which is integer, as the object-creation problem, then it will result in an inaccurate report. The problem is that this field is private and can only be modified by invoking the methods push and pop. So to address this problem, users should actually provide factory methods for stack instead of [inaudible]. So how do we address this problem? Given the field, we can actually create a field declaration hierarchy up to the program input, and the question here is for which object type the test generation tool fails to generate the sequence of method calls. So if we look at how test generation tools generate a sequence of method calls more carefully, we can observe that it's not difficult for test generation tools to generate method calls that invoke public constructors or public setters to assign fields. 
Therefore, typically the difficulty lies in the fields that are not assignable by invoking these public methods. So based on these observations, starting from the fields of the program inputs, if a field is assignable, then the object-creation problem is either the field or the field at the next level in the hierarchy. And if the field is not assignable, then the object-creation problem is the declaring class of the field. So based on these insights, I provide an algorithm to identify the object-creation problems from the field declaration hierarchies. I will use this same example to show how my algorithm works. >>: Can I ask a question? >> Xusheng Xiao: Yep. >>: I'm trying to understand what is the problem. Is the problem that we don't have a model for the push method on stack? Not FixedSizeStack, on stack. >> Xusheng Xiao: Right. >>: That it can modify the count field. >> Xusheng Xiao: Right. >>: We don't have that model. >> Xusheng Xiao: Yeah, we don't have the models. >>: And so how do you recover that? >> Xusheng Xiao: Yes. Yes. So we would like the users to help us to find out the relationships. Because besides push, you can also invoke pop, so then the state would change [inaudible], and it's difficult for the tool to automatically figure out how to generate it. Yeah. >>: [inaudible] I mean, if you have the DLL or the library for stack, can't you just read the IL code? >> Xusheng Xiao: Yeah, so basically the test generation tool tries -- for example, it randomly tries invoking different methods, but it does not know whether it should keep invoking this method in order to approach the target state. For example, you can try to invoke push one time and you see the state increase by 1. Then you try pop, and it goes back to 0. So yes. >>: Can I rephrase the question? So the question is I have the IL. >> Xusheng Xiao: Yeah. >>: I can look at the IL, I can see the internal state. >> Xusheng Xiao: Right. 
>>: So can I just basically muck with that IL, rewrite it somehow so I can just build an arbitrary stack and set its size equal to a fixed value? >> Xusheng Xiao: Oh, yeah. Yeah, you can do that. But then you violate the object invariants. Yes. >>: That's not how we set up the problem. The problem is that you have injected an invalid sequence of methods. >>: And I think the other thing there is if you assign that field, then it's empty. So if the next thing there was pop rather than push, it would report an invariant violation there, even though that doesn't make sense from the [inaudible]. >> Xusheng Xiao: Yes. Yeah. Basically you would generate invalid objects. So then when people run your test cases, you may cover the branch, but it doesn't show the real fault. >>: I mean, but I still -- I would like to refine my question further. Supposing I do have the IL available to me, okay -- >> Xusheng Xiao: Right. >>: So I can make some heuristic judgment based on the knowledge that the push method modifies the count field. It's not perfect because modification is a very loose thing. I know that in one case it is incrementing it and in the other case it is decrementing it. >> Xusheng Xiao: Yes. >>: So I'm wondering if the problem that you're trying to solve is because I don't have perfect knowledge of what push is doing or is it the case that I don't have any knowledge at all. >> Xusheng Xiao: It's because we don't know what push is doing. You actually have that knowledge. You can know which fields this method modifies. >>: Okay. That's allowed. I see. >> Xusheng Xiao: Right, right, right. >>: But you don't know, you know, how it modifies them. >> Xusheng Xiao: Yes, yes, yes. But actually my technique is orthogonal to this technique, because you can always employ more powerful automation techniques to prune irrelevant problems. But in the end there are still some objects that you cannot figure out. 
And my technique is applicable in those cases to identify the problems. Yeah. So my analysis starts with analyzing the fields. The program input in this case is a FixedSizeStack. My analysis finds that its stack field can be assigned by invoking the public [inaudible], so my analysis continues to analyze the next field. In this case, this is stack.size. Since this field is not assignable, my analysis stops and reports Stack as the object-creation problem, asks users to provide factory methods for this object type, and the target state for this object is that the field stack.size should be 10 in order to cover the branch. So to evaluate the effectiveness of Covana, I used two open source projects, xUnit and QuickGraph. I first applied Pex to generate test inputs for these two open source projects, and I fed the programs and the generated test cases to Covana to identify object-creation problems and external-method-call problems for the not-covered branches. I then compared the effectiveness of Covana and the existing approaches. So I conducted two evaluations. The first evaluation is to show how effective Covana is in identifying these two types of problems. As we can see from the results, Covana is able to identify both types of problems with low false positives and low false negatives. The second evaluation is to evaluate how effective Covana is in pruning the irrelevant problem candidates identified by the existing approaches. The results show that Covana is able to prune more than 60 percent of the irrelevant problem candidates, also with low false positives and low false negatives. So Covana enables the cooperation between the developers and the test generation tools -- >>: Sorry, can I ask -- can you explain what qualifies as a false positive and a false negative? >> Xusheng Xiao: Yeah. 
So after the tool generates test inputs, I manually look at all the not-covered branches, and I figure out what the real problems are that cause these branches to not be covered. And then when Covana reports a problem, I try to provide a factory method or mock objects for it. After I do that, if I can cover the branches, it's a true positive; otherwise it's a false positive. And for the branches that are supposed to be object-creation problems but that Covana does not correctly report, I mark them as false negatives. >>: So you worked with all 2,000 of those ones? >> Xusheng Xiao: Yeah, yeah. No. Sorry. This is the [inaudible] reported by the existing approaches. Most of them are false positives. But my approach only reports about 200. So I look at all 200 of them. >>: So maybe I misunderstood this. So you said you pruned 1567 of them. >> Xusheng Xiao: Right. >>: And so there was one false positive and two false negatives? >> Xusheng Xiao: Yes, yes. >>: Didn't you have to look at all of them to make sure that they were the false positives? >> Xusheng Xiao: Oh, because I look at the not-covered branches and I already know how many real problems are there. So many of them are actually false positives by the existing approaches. Yeah. Okay. So Covana enables the developers to cooperate with the test generation tool. First, the test generation tool is used to generate the test cases automatically. Covana then diagnoses the difficulties faced by the test generation tool for the not-covered branches and reports the identified object-creation problems and external-method-call problems to the users. By looking at these problems, users can provide factory methods and mock objects to address the problems. These factory methods and mock objects are [inaudible] the test generation tools for generating more test cases for improving the test coverage. So I have shown how users can cooperate with the test generation tool to improve the test coverage. 
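The field-declaration-hierarchy walk from the object-creation analysis can be sketched as follows. This is a simplification under my own model, where "assignable" means settable through a public constructor or setter, and fields are written as hypothetical "Class.field" strings; the real analysis works on .NET programs:

```python
def find_object_creation_problem(field_chain, assignable):
    """Walk the field declaration hierarchy from the program input
    down to the field used in the not-covered branch condition.

    field_chain: e.g. ["FixedSizeStack.stack", "Stack.size"]
    assignable: maps each field to whether public methods can set it.

    Returns the declaring class to request a factory method for,
    or None if the tool should be able to build the object itself."""
    for field in field_chain:
        if not assignable[field]:
            # Not settable via public methods, so report the class
            # declaring this field as the object-creation problem.
            declaring_class, _ = field.split(".")
            return declaring_class
    return None
```

On the talk's example, `find_object_creation_problem(["FixedSizeStack.stack", "Stack.size"], {"FixedSizeStack.stack": True, "Stack.size": False})` returns `"Stack"`, matching the report that users should supply a factory method for Stack whose size field is 10.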
Next I would like to show how tools can help users make informed decisions on how to control their privacy. I would like to use my projects on mobile security to illustrate this kind of cooperation. So with the [inaudible] of smartphones and mobile applications, application markets such as Apple's App Store, Google's Google Play, and Microsoft's Windows Phone marketplace have become a primary mechanism for distributing software [inaudible] onto mobile devices. Unfortunately, this is also an easy mechanism for malicious users to distribute malware. To protect users' privacy, these predominant mobile platforms provide different privacy control mechanisms. For Google's Android, a permission list is shown to the users before the user installs the application. This permission list shows the permissions that are requested by the mobile application, and the user has to approve all the permissions in order to install the application. For Apple's iOS, a popup dialogue is shown to the user the first time the application tries to use a permission. Windows Phone uses a mix of strategies that combines the two. So although these two approaches are a bit different, both of them just report what permissions are requested by the mobile applications; they do not explain how and why the applications would like to use your permissions. Therefore, users make uninformed decisions on controlling their privacy. Studies show that this leads to situations where users just approve all the permissions without looking at what permissions are requested by the mobile applications. So to improve the privacy control mechanism, I propose a user-aware privacy control approach. Instead of showing the permission list before the user installs the application, my approach shows the information flows. An information flow shows what data type flows into what output channel, which can be used to explain how the application would like to use your permissions. 
In this example, this information flow shows that this application will share your location on Facebook. Here I refer to the [inaudible] of the private information as a source, and I refer to the output channels where the private information may escape as sinks. Besides showing the information flows, my approach provides monitored sinks, which are output channels where users can review their information. For example, an application may allow users to take photos using the camera and share them on Facebook. Before the picture is shared, a popup dialogue shows the picture for the users to review. In this case, users are aware of the information that's going to be shared, and users can perform runtime inspection of the shared data instances. However, there are two challenges in realizing this user-aware privacy control. The first challenge is that the information flow can escape users' inspections by flowing into a non-monitored sink. I refer to such information flows as escaping flows. The second challenge is that the applications can tamper with the data before the data is shown to the users. That is, what you see may not be what you share. For example, an application can encode the location information into the photo before the photo is shown to the user. Although the user gets to review the picture, it's difficult for the user to notice the small changes. Even worse, the application can encode the location into the metadata of the picture. In that case, the user does not notice the location information at all. Recall that [inaudible] the picture's location information can cause serious security issues. I refer to such information flows as tampering flows. To address these two challenges, I propose an approach that computes information flows and classifies information flows as different kinds of flows. So to identify escaping flows, my approach identifies information flows that flow into non-monitored sinks. 
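The two flow kinds just defined, plus the remaining case, can be written down as a small classification rule. This is my own condensed restatement; the label "inspectable" for the third case is my name for it, not a term from the talk:

```python
def classify_flow(sink_is_monitored, data_is_tampered):
    """Classify one source-to-sink information flow.

    - escaping: leaves through a sink the user cannot inspect
    - tampering: reaches a monitored sink, but the data was modified
      first, so what the user reviews is not what is shared
    - inspectable: untampered data into a monitored sink; the user
      can review it at runtime, so no up-front decision is needed
    """
    if not sink_is_monitored:
        return "escaping"
    if data_is_tampered:
        return "tampering"
    return "inspectable"
```

For instance, the photo example where the location is encoded into the picture's metadata before a monitored share dialogue corresponds to `classify_flow(True, True)`, a tampering flow.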
To identify tampering flows, my approach provides a tampering analysis, which checks whether the data is tampered with before flowing into a sink. My approach identifies information flows where tampered data flows into monitored sinks as tampering flows. Based on this tampering analysis, my approach does not require the user's decision at [inaudible] for information flows where untampered data flows into monitored sinks, since users have the opportunity to perform runtime inspections for these information flows. So my approach identifies the tampering flows and escaping flows and asks the user to provide their decisions on what kind of information they would like to use for these flows. However, the user may not be sure about whether these flows are [inaudible]. So my approach allows users to use anonymized data for these information flows. By using the anonymized data, the user can experience the application without compromising their privacy. And if the user confirms that these information flows are [inaudible], the user can go back and change the settings to use the private data for these information flows. If the user does not want to provide decisions at all, my approach by default uses anonymized data for the identified tampering and escaping flows. Next I will show how my approach computes the information flows. So my approach computes information flows statically using abstract interpretation. My analysis maintains abstract states of the application and updates the states based on the simulated execution of each statement. This -- yeah. >>: So you said there was a result that showed that people don't pay attention to those lists of permissions that applications require. >> Xusheng Xiao: Yes. >>: It's good that you have a default on here, but if they deny something it might change the way the application behaves. So is there a scenario where people either don't pay attention or don't care or flip everything to real data and then they get the whole application experience? 
>> Xusheng Xiao: Yeah, yeah, they can do that as well. >>: Okay. So I guess, is that as likely as people just reading through the permission list and saying, oh, whatever? >> Xusheng Xiao: Yeah, yeah. Actually, some people don't care whether you share their location or not. But some users feel that if you want to share my information, at least you need to let me know. So that's why I call this approach user-aware: we need to make users aware of how the application is going to use their permissions. >>: Okay. You're arguing that this is harder to just ignore than your permissions lists. >> Xusheng Xiao: Yes, yes, yes. Yeah. So my analysis is a summary-based interprocedural analysis. It computes a symbolic method summary for each method and uses a fix-point algorithm -- yep. >>: One more. Is the application that you're regulating, either anonymous or [inaudible], does it know what [inaudible]? >> Xusheng Xiao: What do you mean by [inaudible]? >>: Could it degrade what it does if it knows that it's getting anonymized data, so that it's more likely that a user will give it real data so that it can do what mischievous thing it wants to do? >> Xusheng Xiao: Yeah, I think they can -- in the end, the malicious user can observe the patterns of how you do this. But then for anonymized data, you can just choose some random data. >>: No, but, I mean, could -- so the application developer wants real data so they can do something bad. >> Xusheng Xiao: Oh, yeah, yeah. >>: So could they just say, if I'm getting anonymized data, I'm going to make this application just not work correctly? >> Xusheng Xiao: Yes, yeah. >>: So if users want to use the application, they have to give it real data and then the application can do its malicious thing. Is that possible? Does the application have access to what permission level it's been given? >> Xusheng Xiao: No, the application doesn't [inaudible].
But the malicious user definitely can observe some patterns. Like if you keep sending me junk data, then they can block you as well, yeah. So one way to deal with that is to generate random data and send it to the server. Then they may have difficulty observing whether this is some fixed data or not: you don't just give the same fixed data every time the APIs are involved; you generate slightly different data each time. So they may not be able to detect it. But it's not a perfect defense. Yeah. It's just one way to deal with that. Okay. So my approach provides annotations that can be used to annotate built-in APIs. These annotations describe which of the built-in APIs are sources and sinks, and also describe how the information will be propagated by invoking the built-in APIs. The formal definitions of this state and how I update it are in the paper. Here I will just use some examples to show how my analysis computes states and identifies information flows. So let's look at a simple example. At line 4, the local variable is classified with the location information. Through lines 5 and 6, the local variables S and P are classified as well. And finally at line 7 my analysis encounters a sink and identifies an information flow from the location to the sharing. So besides these explicit information flows based on data flow and control flow, there are also implicit flows that pose challenges for the information flow computation. So let's look at an example of an implicit flow. At line 3 a message is added to a message collection. This message is classified at line 4 by using the secret information stored in S. And later in the code, if another message is retrieved from the message collection and shared via a web service, we have a potential flow here.
Because the retrieved message can be the classified message that we classified at line 5. But explicit information flow computation cannot detect such an information flow, since the message is classified after it's added to the message collection. So the message collection cannot be correctly classified. Such implicit flows propagate the information through containers like [inaudible]. There are also other kinds of implicit flows that can propagate the information. For example, an implicit information flow can propagate the information through a file system: an application can save classified information into the file system in one method and send the data out in another method. So to identify the implicit flows based on containers, my approach adds an edge to connect the data. When the message is added to the message collection, my approach adds a link between the message collection and the message. With this link, when the message is classified, the message collection is classified as well. But with such links, my approach potentially may produce many false positives, because the message retrieved from the collection may not contain the classified information. So in future work I plan to record all the locations where my analysis made conservative decisions and ask the users to confirm whether such decisions are desirable or not. My current approach does not [inaudible] implicit flows through file systems; in future work I plan to use dynamic analysis to identify such flows. So the information flows provide the basis for the tampering analysis. The tampering analysis tracks whether the source data are tampered with before flowing into a sink. The tampering analysis provides a tamper annotation which can be used to annotate the built-in APIs. In this example, the encode-location API should be annotated with the tamper annotation.
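Putting the pieces together -- per-variable taint sets, container edges for the implicit flow just described, and the tamper operator just introduced -- a minimal sketch might look like the following. All names are invented for illustration; the real analysis is a summary-based interprocedural abstract interpretation, whereas this sketch only models a single straight-line execution.

```python
from collections import defaultdict

class TaintState:
    """Toy abstract state: each variable maps to a set of
    (source, tampered?) pairs; containers keep edges to their
    elements so a later classification still propagates."""

    def __init__(self):
        self.taint = defaultdict(set)   # var -> {(source, tampered_flag)}
        self.links = defaultdict(set)   # container var -> element vars

    def classify(self, var, source):    # var receives private info
        self.taint[var].add((source, False))

    def assign(self, dst, src):         # dst := f(src): propagate taint
        self.taint[dst] |= self.lookup(src)

    def add_to(self, container, elem):  # container.add(elem): add an edge
        self.links[container].add(elem)

    def tamper(self, var):              # tamper-annotated API invoked on var
        self.taint[var] = {(s, True) for (s, _) in self.lookup(var)}

    def lookup(self, var):              # taint of var plus linked elements
        seen, out, stack = set(), set(), [var]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                out |= self.taint[v]
                stack.extend(self.links[v])
        return out

st = TaintState()
st.add_to("msgs", "msg")          # message added to the collection first...
st.classify("msg", "secret")      # ...then classified: the edge still catches it
st.classify("pic", "camera")
st.tamper("pic")                  # e.g. location encoded into the photo
print(sorted(st.lookup("msgs")))  # [('secret', False)]
print(sorted(st.lookup("pic")))   # [('camera', True)] -> tampering flow at a sink
```

The `msgs` lookup shows why the container edge is needed: without it, classifying `msg` after the `add` would leave the collection untainted, which is exactly the implicit flow the explicit computation misses. It also shows the over-approximation: every message retrieved from `msgs` now looks classified, which is the source of the false positives mentioned above.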
Based on the tamper annotation, the tampering analysis will apply a tamper operator on the set of sources that are associated with a local variable. I will use an example to show how this tampering analysis works. So we take a picture using the camera, and here is what the state would look like: the local variable picture is associated with the source camera. If the encode-location API is invoked on the picture, a tamper operator will be applied on the camera source, indicating that the camera source has been tampered with. So later when we encounter a sink, my approach can identify a tampering flow. So I implemented this approach in TouchDevelop. I think everyone here knows what TouchDevelop is, so I will not try to explain it in too much detail. TouchDevelop allows users to publish and share their applications, which we call scripts in TouchDevelop, through the script bazaar. And these characteristics [inaudible] similar to mobile platforms such as Android and iOS. I evaluated my approach on about 600 scripts published by about 200 users. I compared my approach with the existing approach and a flow approach. The existing approach requires users' decisions for a script if sources and sinks are found in the permission list. The flow approach requires users' decisions if any information flow is found for a script. Recall that my approach will only identify the tampering flows and escaping flows and will only require users' decisions for those two kinds of flows. So I conducted two evaluations. The first evaluation measures how many scripts require users' decisions. As we can see from the results, there are 172 scripts that use sources. The existing approach requires users' decisions for 89 scripts. This number can be reduced to 78 using the flow approach.
Since my approach does not require users' decisions for information flows where untampered data flow into monitored sinks, it only requires users' decisions for 54 scripts. The second evaluation measures how many sources require users' decisions. As we can see from the results, the existing approach requires users' decisions for 152 sources. This number can be reduced to 119 by using the flow approach. By using my approach, we can reduce this number to 63. These results show that even with the false positives introduced by the static analysis, my approach is still very effective in reducing the users' decisions for the scripts. So far I have shown how information flows can be used to explain how a mobile application will use the permissions. My recent work with others, WHYPER, proposes a framework that uses natural language processing techniques to establish links between sentences in the application descriptions and the permissions in the permission list. Such sentences can be used to explain why an application would like to use your permissions. Both kinds of information complement the results of the information flow analysis, and they serve as a first step toward bridging user expectations and application behaviors. So consider this sentence in an application description: "Also you can share the yoga exercises to your friends via e-mail and SMS." The phrase "share to your friends via e-mail and SMS" implies that this application needs to use your contact information. The reason is that the application needs to know your friends' e-mail addresses in order to send the e-mails. So I refer to such sentences as permission sentences, since they are sentences that indicate the use of permissions. To identify such sentences, a straightforward approach is just to use keyword-based search on the application descriptions. For example, we can just search for the sentences that contain the keyword contacts.
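For illustration, the keyword baseline is nothing more than a substring search. The sentences below are made up to mirror the examples in the talk, and they show both failure modes that come next: the second sentence is a false positive (confounding meaning of "contact"), and the third is a false negative (a sensitive operation described without the keyword).

```python
# A hypothetical keyword-search baseline for finding permission sentences.
def keyword_permission_sentences(sentences, keyword="contact"):
    """Flag every sentence containing the keyword, case-insensitively."""
    return [s for s in sentences if keyword in s.lower()]

desc = [
    "Display user contacts.",                                      # true positive
    "Contact me at dev@example.com with feedback.",                # false positive
    "Share the yoga exercises to your friends via email and SMS.", # false negative
]
print(keyword_permission_sentences(desc))
```

The baseline flags the first two sentences and misses the third, even though only the first and third actually indicate a use of the contacts permission.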
However, these keyword-based search approaches face two challenges. The first challenge is that certain keywords, such as contact, have confounding meanings. For example, in a sentence like "display user contacts," the contacts here means the application needs to read your contact information. However, in other sentences, like "contact me at an e-mail address," the contact here does not imply that the application would like to use your contacts permission. The second challenge is that sentences often describe sensitive operations without actually referring to the keywords. Consider the same example: the sentence does not contain the keyword contacts, yet it implies that this application needs to use your contacts permission. So to address these two challenges, our framework uses natural language processing techniques. Natural language processing techniques help computers understand natural language artifacts, as in speech recognition and translation. Although natural language processing has shown some recent successes, such as IBM Watson, it is still difficult to use these techniques for general purposes. However, natural language processing is feasible for analyzing domain-specific sentences with specific styles. My previous work on specializing natural language processing techniques using domain knowledge -- to infer formal models from API documentation and use cases -- has shown promising results. So in WHYPER we also specialize the natural language processing techniques using the domain knowledge from the API documentation. Here I will just show the major components of WHYPER. So WHYPER produces an intermediate representation of the input natural language sentences -- >>: Can I ask you a quick question? >> Xusheng Xiao: Yep. >>: So I'm kind of confused. Like so in the previous slide you said API documentation. >> Xusheng Xiao: Right.
>>: And then in this next slide this sounds like -- this sounds like a description from what the app does. >> Xusheng Xiao: Right, right, right. >>: So where are these sentences coming from? >> Xusheng Xiao: So these sentences are from the application descriptions. The domain knowledge is from the API documentation. I will explain the domain knowledge later. Yeah. So first our approach uses a natural language parser to identify the syntactic structure of the sentences, such as noun phrases and verb groups. Based on the natural language parsing, we also produce a dependency tree. This dependency tree shows the grammatical relationships between the words; it contains more semantic information than a simple syntax tree. Using these dependency trees, our approach further produces an intermediate representation of the sentences, which is essentially a first-order logic representation. This is also a tree structure. The leaf nodes of the tree are the entities, and the other nodes are the predicates. The children of a predicate node are its participating entities: the first child is the governing entity and the second child is the dependent entity. So using this intermediate representation and a semantic graph inferred from the API documentation, our approach identifies whether a sentence is a permission sentence. Here is the semantic graph for the contacts permission. The contacts are the resource protected by the permission to read the contacts. We also look into the API documentation to identify the attributes of the contacts, such as the phone number, e-mail, and location; these become the [inaudible] resources of the contacts. And then we look into the API documentation that requires the contacts permission to identify the verbs that represent the actions that can be performed on these resources, such as read and search.
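A simplified version of the semantic graph and the matching check could be sketched as follows. The representation is invented for illustration (nested tuples standing in for the first-order logic tree), and where the real system matches verbs to actions by word similarity and walks parent and sibling predicates, this sketch only does exact lookups over all predicates.

```python
# Hypothetical semantic graph for the contacts permission: resources come
# from the attributes in the API documentation, actions from the verbs of
# the APIs guarded by the permission.
SEMANTIC_GRAPH = {
    "resources": {"contacts", "phone number", "email"},
    "actions": {"read", "search", "share", "send"},
}

# Invented first-order-logic tree for "share ... to your friends via
# e-mail and SMS": (predicate, governing_entity, dependent_entity),
# with plain strings as leaf entities.
sentence_fol = ("share", "you", ("via", ("and", "email", "SMS"), "friends"))

def leaves(node):
    """Yield the leaf entities of the tree."""
    if isinstance(node, str):
        yield node
    else:
        for child in node[1:]:
            yield from leaves(child)

def predicates(node):
    """Yield every predicate (verb) in the tree."""
    if not isinstance(node, str):
        yield node[0]
        for child in node[1:]:
            yield from predicates(child)

def is_permission_sentence(fol, graph):
    """Match an entity against a resource and a predicate against an action."""
    has_resource = any(l.lower() in graph["resources"] for l in leaves(fol))
    has_action = any(p.lower() in graph["actions"] for p in predicates(fol))
    return has_resource and has_action

print(is_permission_sentence(sentence_fol, SEMANTIC_GRAPH))  # True
```

Here the leaf "email" matches a resource and the predicate "share" matches an action, so the sentence is classified as a permission sentence even though it never mentions the word "contacts" -- the case the keyword baseline misses.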
So to identify whether a sentence is a permission sentence, we search the first-order logic representation to find a pair of predicate and entity that matches a pair of action and resource in the semantic graph. Recall that in this first-order logic representation the leaf nodes are entities. So our algorithm starts by matching the leaf nodes with the resources. In this example, we can find a perfect match, e-mails. After we find this match, we start to search the predicate nodes to try to match a predicate with the actions, which are basically verbs. To perform the match, we use the [inaudible]. In this example, we search starting from the innermost predicate and keep going up to search the parent predicates. We also search a sibling predicate if the sibling predicate is the first child [inaudible]. So in this example, when we reach the sibling predicate share of the predicate own, based on the [inaudible] we find that these two verbs are very similar. So we find a matching pair of predicate and entity for a pair of action and resource, and consider this sentence a permission sentence. Yeah. >>: So what if there was the word not in there? Like let's say the English sentence was, does not allow you to share your yoga exercises with your friends via e-mail and SMS? >> Xusheng Xiao: Okay. Good question. Yeah. So our approach hasn't considered negations yet. But based on this first-order logic representation, we can encode the [inaudible] into that. >>: Well, a different aspect of this but similar is, e-mail on the left side of the screen is a verb -- >> Xusheng Xiao: Yeah. >>: -- e-mail on the right side of the screen is a noun. >> Xusheng Xiao: Yes. >>: And if you're trying to match -- or trying -- you say one of the problems in just using control-F for find was you get the sense of the word wrong.
And here you're getting the sense of the word wrong even though you're matching. >> Xusheng Xiao: So because this tree is built from the dependency tree, we already know the e-mail here is a noun, because based on the natural language parsing I already know the syntactic role of each word. >>: Sure. But in a way it's a verb because it's a means of -- >> Xusheng Xiao: Yeah, if it's a verb, then it would be kind of -- >>: [inaudible] verb, essentially. >> Xusheng Xiao: Yeah. >>: They parse the sentence with [inaudible]. >> Xusheng Xiao: Yeah. So we already parse it -- >>: That's not English, that's jargon, e-mail and SMS. That's not English. So [inaudible] sentence unless it's specific to jargon, computers -- >> Xusheng Xiao: Oh. Oh, yeah, yeah, yeah. Yeah. >>: Don't know those words. >> Xusheng Xiao: Yeah. Because I -- yeah. Because the [inaudible] I did the experiment. >>: Structure says that has to be [inaudible]. >> Xusheng Xiao: Yes. >>: Via this and that. >> Xusheng Xiao: Right, right. >>: English changes things. I don't know, natural language is statistical, not parsing. >> Xusheng Xiao: Yeah, yeah. >>: [inaudible] don't mess with the guy that [inaudible]. >>: [inaudible] training so when you -- >> Xusheng Xiao: Right, right. So -- yeah. For the e-mail and SMS, there is a technique in natural language processing called named entity recognition. To improve the precision of the entity recognition, we also maintain a static list of nouns that are commonly used in the domain of mobile applications, and that improves the precision and addresses the problem. Yeah. So we evaluated our approach on 600 application descriptions that consist of about 1,000 sentences, and we achieved promising results in terms of average precision and recall. We also compared our approach to the keyword-based search approach; our approach significantly improves the precision and performs slightly worse in terms of recall. Yes.
>>: What's your ground truth? >> Xusheng Xiao: Oh, the ground truth: we manually annotated each sentence as to whether it is a permission sentence or not. >>: So human. >> Xusheng Xiao: Yes, human. Yes, yes. Yeah, so the results showed that we have much better precision but slightly worse recall. The reason why we have slightly worse recall is that the action-resource pairs inferred from the [inaudible] cannot cover every case of how people will use the APIs and the resources. So in future work we plan to use some learning techniques to learn from the user comments and the [inaudible] to improve the semantic graph. Also, in our context, precision is much more important, because if our technique says that this application has a description for a permission but in the end it doesn't, that's pretty bad. So in our context, precision is much more important. So the information flow analysis and WHYPER enable users to make informed decisions on how to control their privacy. A privacy [inaudible] system will give users an application with its application description to inspect. My information flow analysis will compute the information flows and identify the escaping flows and the tampering flows. WHYPER will identify the sentences that indicate the use of the permissions. These two kinds of information are then presented to the users, helping users make informed decisions on how to control their privacy. If users are not sure about the tampering flows and escaping flows, they can use anonymized data to experience the applications without compromising their privacy. This process can be repeated until all the information flows have been inspected by the users or the users lose patience. So I have shown two kinds of cooperation. The first part of my talk, the car, shows that users can cooperate with the test generation tool to improve the coverage.
And the second part of my talk shows that tools can help users make informed decisions on controlling their privacy. My future work in cooperative testing and analysis consists of two directions. The first direction is economic analysis. In both types of cooperation, users may have limited time for each task, and they would like to focus on the more important task items. So we would like to have a technique that can estimate the benefit and the cost of solving a task item. For test generation, the benefit is, if I solve this object creation problem, how much more coverage can I get; and the cost is how difficult it is to solve this object creation problem. For mobile security, the benefit is, if I grant the private data for a permission, what functionality can benefit from it; and the cost is, if I use anonymized data for a permission, how difficult it is for me to assess the permission uses when using the application. The second direction is that I would like to provide better explanations. For test generation, I would like to investigate how I can use visualization tools to improve users' understanding of the problems faced by the test generation tools. I also plan to conduct user studies to measure the effectiveness of the visualizations. For mobile security, I plan to investigate techniques to reveal the contexts of the permission uses. These contexts can be used to explain when and where a permission will be used. Certain malicious behaviors will be easier for users to notice if their contexts are revealed to the users. For example, a malicious application may send a text message in the background when the signal changes. If such context information is revealed to the users, users can easily find suspicious behaviors. So my long-term future work in cooperative testing and analysis focuses on three types of questions. The first question is, what can users do?
So currently I assume users have certain programming skills to provide their help. In reality, users may have different kinds of skills. So I would like to investigate how these factors might affect the cooperation process proposed by my methodology. The second question is, how can the tools learn from the users? After the users provide their help to the tools, the tools can accumulate the user interaction histories. So later, if a similar problem is faced by the tool, can we utilize this user interaction history to provide solutions to the new problem? And the third question is, how can we assist the users in providing their help? So I plan to investigate techniques that provide better tool support for the cooperation. For example, currently users need to provide a factory method from scratch. But in the future, I plan to investigate using program synthesis techniques to synthesize a partial solution, and maybe also suggest method calls for users to choose from in order to construct factory methods. So to conclude my talk, I have introduced a methodology, cooperative testing and analysis, where users make informed decisions when cooperating with software testing and analysis tools. I have also shown how this methodology can be used to improve test generation and mobile security. So thanks for coming to my talk, and I'm ready to take questions. [applause] >> Mark Marron: Do we have any questions? We asked a bunch during the talk. >>: Already asked. >> Xusheng Xiao: Yeah, yeah. Thank you. >>: Okay. I do have one, though. So in your evaluation on the user-assisted testing, right, you're using some existing frameworks, including xUnit, a unit testing framework, right? >> Xusheng Xiao: Right. >>: I assume they have unit tests written for xUnit with very high coverage. How does the coverage you get compare to the coverage of these handwritten unit tests?
>> Xusheng Xiao: Yeah, so -- >>: Any idea about how much effort you had to put in using this sort of cooperative approach versus, I write all the unit tests by hand? >> Xusheng Xiao: So basically if you look at these unit tests that are written manually, they already encode method call sequences in order to construct different kinds of objects. Sometimes they even provide some factory methods for testing purposes. By using my approach, I also did a small study: if I provide factory methods to solve two or three problems, and also some external method call problems, I can boost the coverage from 50 percent to about 80 percent. >>: Okay. And how does that compare with -- I mean, if this had unit tests already written for it, how does that compare with what other coverages you need to make versus the -- >> Xusheng Xiao: Yeah. So, I mean, if you write them manually, it's possible for you to reach more than 90 percent, maybe not a hundred percent. Yeah. If you use test generation tools, it requires much less effort, because the first 50 or 60 percent were covered by Pex automatically, and later you just need to provide a few factory methods and you can also reach a similar level. >>: So would it be nice to say, okay, I can write 200 unit tests by hand and get 90 percent coverage, or I can just use an automatic tool, get 50 percent coverage, write three more tests, get to 80 percent coverage, write five more tests and get to 90 percent coverage -- and then count the things I had to do manually versus 200, right? So I'm curious as to what that number is, if you evaluated that yet. >> Xusheng Xiao: I don't have the exact data for that. >>: Okay. >> Xusheng Xiao: But I think it's much less manual effort if you're using automated test generation tools. >>: And based on that quantification, especially on something like xUnit, right?
>> Mark Marron: I can write unit tests better than you can ->> Xusheng Xiao: Yeah, yeah, yeah. Okay. Thank you. >> Mark Marron: Any other questions? All right. Thanks a lot. >> Xusheng Xiao: Thank you, yeah. [applause]