Functional Parameterized Unit Testing

SBQS 2013
Tao Xie
University of Illinois at Urbana-Champaign, USA
taoxie@illinois.edu
• IBM's Deep Blue defeated chess champion Garry Kasparov in 1997
• IBM Watson defeated top human Jeopardy! players in 2011 (IBM Watson as Jeopardy! player)
• Google's driverless car
• Microsoft's instant voice translation tool
• CAPTCHA: "Completely Automated Public Turing test to tell Computers and Humans Apart"
• iPad
• Movie: Minority Report
• CNN News
• …

• Machine is better at task set A
  - mechanical, tedious, repetitive tasks, …
  - Ex. solving constraints along a long path
• Human is better at task set B
  - intelligence, human intent, abstraction, domain knowledge, …
  - Ex. local reasoning after a loop, recognizing naming semantics
• Together they cover A ∪ B
Ironies of Automation
"Even highly automated systems, such as electric power networks, need human beings... one can draw the paradoxical conclusion that automated systems still are man-machine systems, for which both technical and human factors are important."
Lisanne Bainbridge, "Ironies of Automation", Automatica 1983.

"As the plane passed 39,000 feet, the stall and overspeed warning indicators came on simultaneously—something that's supposed to be impossible, and a situation the crew is not trained to handle." IEEE Spectrum 2009, on Malaysia Airlines Flight 124 (2005)
Ironies of Automation
"The increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator."
Lisanne Bainbridge, "Ironies of Automation", Automatica 1983.
• Don't forget human factors
  - Using your tools as end-to-end solutions
  - Helping your tools
• Don't forget cooperation of human and tool; human and human
  - Humans can help your tools too
  - Humans can work together to help your tools, e.g., crowdsourcing
"During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tons of work on this topic one may wonder, 'Haven't we solved this problem yet?' With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems."
Michael Hind. Pointer analysis: haven't we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001).
Section 4.3 Designing an Analysis for a Client's Needs
"Barbara Ryder expands on this topic: '… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.'"
MSRA XIAO
Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004.
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. ACSAC 2012.
MSR 2011 keynote by YY Zhou: Connecting Technology with Real-world Problems – From Copy-paste Detection to Detecting Known Bugs
Human determines what the serious (known) bugs are
Available in Visual Studio 2012
• Finding refactoring opportunities
• Searching similar snippets for fixing a bug once
XIAO Code Clone Search service integrated into the workflow of the Microsoft Security Response Center (MSRC)
Microsoft Technet blog about XIAO:
"We wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a 'Cloned Code Detection' system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034."
Yingnong Dang, Dongmei Zhang, Song Ge, Yingjun Qiu, and Tao Xie. XIAO: Tuning code clones at hands of engineers in practice. In Proc. Annual Computer Security Applications Conference (ACSAC 2012).
XIAO enables code clone analysis with
• High scalability, high compatibility
• High tunability: what you tune is what you get
• High explorability

How to navigate through the large number of detected clones?
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder-level statistics
3. Folder-level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging

How to quickly review a pair of clones?
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging
• 50 years of automated debugging research
  - N papers → only 5 evaluated with actual programmers
Chris Parnin and Alessandro Orso. Are automated debugging techniques actually helping programmers? In Proc. ISSTA 2011.
• Academia
  - Tends to leave the human out of the loop (involving humans makes evaluations difficult to conduct or write up)
  - Tends not to spend effort on improving tool usability
    ▪ tool usability would be valued more in HCI than in SE
    ▪ too much to include both the approach/tool itself and its usability evaluation in a single paper
• Real world
  - Often has the human in the loop (familiar IDE integration, social effects, lack of expertise/willingness to write specs, …)
• Examples
  - Agitar [ISSTA 2006] vs. Daikon [TSE 2001]
  - Test generation in Pex based on constraint solving
• Goal: identify future directions in formal methods research and its transition to industrial practice
• The workshop will bring together researchers and identify the primary challenges in the field: foundational, infrastructural, and in transitioning ideas from research labs to developer tools
http://goto.ucsd.edu/~rjhala/NSFWorkshop/
• "Lack of education amongst practitioners"
• "Education of students in logic and design for verification"
• "Expertise required to create and use a verification tool. E.g., both Astrée for Airbus and SDV for Windows drivers were closely shepherded by verification experts."
• "Tools require lots of up-front effort (e.g., to write specifications)"
• "User effort required to guide verification tools, such as assertions or specifications"
• "Not integrated with standard development flows (testing)"
• "Too many false positives and no ranking of errors"
• "General usability of tools, in terms of false alarms and error messages. The Coverity CACM paper pointed out that they had developed features that they do not deploy because they baffle users. Many tools choose unsoundness over soundness to avoid false alarms."
• "The necessity of detailed specifications and complex interaction with tools, which is very costly and discouraging for industry, which lacks high-level specialists."
• "Feedback to users. It's difficult to explain to users why automated verification tools are failing. Counterexamples to properties can be very difficult for users to understand, especially when they are abstract, or based on incomplete environment models or constraints."
2010 Dagstuhl Seminar 10111: Practical Software Testing: Tool Automation and Human Factors
http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011
Human Factors
Andy Ko and Brad Myers. Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior. In Proc. ICSE 2008.
• Don't forget human factors
  - Using your tools as end-to-end solutions
  - Helping your tools
• Don't forget cooperation of human and tool intelligence; human and human intelligence
  - Humans can help your tools too
  - Humans can work together to help your tools, e.g., crowdsourcing
• Motivation
  - Architecture recovery is challenging (abstraction gap)
  - The human typically has a high-level view in mind
• Repeat
  - Human: define/update the high-level model of interest
  - Tool: extract a source model
  - Human: define/update a declarative mapping between the high-level model and the source model
  - Tool: compute a software reflexion model
  - Human: interpret the software reflexion model
  Until happy
Gail C. Murphy, David Notkin. Reengineering with Reflexion Models: A Case Study. IEEE Computer 1997.
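To make the tool step concrete, here is a minimal sketch (my illustration under assumed types, not Murphy and Notkin's implementation) of computing a reflexion model: the human's declared high-level edges are compared against source-model edges lifted through the mapping, yielding convergences, divergences, and absences.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type (C# 9 record): a dependency edge between high-level entities.
record Edge(string From, string To);

static class ReflexionModel
{
    // highLevel: edges the human expects; mappedSource: source-model edges
    // lifted to high-level entities via the declarative mapping.
    public static void Compute(ISet<Edge> highLevel, ISet<Edge> mappedSource)
    {
        foreach (var e in mappedSource.Intersect(highLevel))
            Console.WriteLine($"convergence: {e.From} -> {e.To}"); // expected and found
        foreach (var e in mappedSource.Except(highLevel))
            Console.WriteLine($"divergence:  {e.From} -> {e.To}"); // found, not expected
        foreach (var e in highLevel.Except(mappedSource))
            Console.WriteLine($"absence:     {e.From} -> {e.To}"); // expected, not found
    }
}

The human then inspects the divergences and absences, updates the model or the mapping, and the loop repeats until happy.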
Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…
• Recent advanced technique: Dynamic Symbolic Execution / Concolic Testing
  - Instrument code to explore feasible paths
• Example tool: Pex from Microsoft Research (for .NET programs)
L. A. Clarke. A system to generate test data and symbolically execute programs. TSE 1976.
J. C. King. Symbolic execution and program testing. CACM 1976.
P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. PLDI 2005.
K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. ESEC/FSE 2005.
N. Tillmann and J. de Halleux. Pex – White Box Test Generation for .NET. TAP 2008.
Code to generate inputs for:

void CoverMe(int[] a)
{
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 1234567890)
            throw new Exception("bug");
}

The loop: choose the next path, negate the last observed condition to form the constraints to solve, solve them for new data, then execute & monitor.

Data      Observed constraints
null      a==null
{}        a!=null && !(a.Length>0)
{0}       a!=null && a.Length>0 && a[0]!=1234567890
{123…}    a!=null && a.Length>0 && a[0]==1234567890

Done: there is no path left.
Released since 2008
Pex detected various bugs (including a serious bug) in a core .NET component that had already been extensively tested for 5 years by 40 testers and is used by thousands of developers and millions of end users.
Download counts (initial 20 months of release):
• Academic: 17,366
• Industrial: 13,022
• Total: 30,388
"It has saved me two major bugs (not caught by normal unit tests) that would have taken at least a week to track down and fix normally, plus a few smaller issues, so I'm a big proponent of Pex."
http://research.microsoft.com/projects/pex/
• Method sequences
  - MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]
• Environments, e.g., db, file systems, network, …
  - DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
  - CloudApp Testing [Zhang et al. IEEE Soft 12]
• Loops
  - Fitnex [Xie et al. DSN 09]
http://people.engr.ncsu.edu/txie/publications.htm
Class Under Test:
00: class Graph { …
03:   public void AddVertex (Vertex v) {
04:     vertices.Add(v);
05:   }
06:   public Edge AddEdge (Vertex v1, Vertex v2) {
…
15:   }
16: }

Manual test generation: tedious, missing special/corner cases, …

Generated Unit Tests:
void test1() {
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    ag.AddVertex(v1);
}
void test2() {
    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    ag.AddEdge(v1, v1);
}
…
Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…
Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing
  - Instrument code to explore feasible paths
  - Challenge: path explosion
Object-creation problem: desirable receiver or argument objects are not generated.
Total block coverage achieved is 50%; the lowest coverage is 16%.
• object-creation problems (OCP) – 65%
• external-method-call problems (EMCP) – 27%
[OOPSLA 11]
• A graph example from the QuickGraph library
• Includes two classes: Graph and DFSAlgorithm
• Graph: AddVertex; AddEdge (requires both vertices to be in the graph)

00: class Graph { …
03:   public void AddVertex (Vertex v) {
04:     vertices.Add(v); // B1 }
06:   public Edge AddEdge (Vertex v1, Vertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e); } }
// DFS: DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (Vertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29: } } }
[OOPSLA 11]
(Same Graph/DFSAlgorithm listing as above.)
• Test target: cover the true branch (B4) of line 24
• Desired object state: the graph should include at least one edge
• Target sequence:
Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);
Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing
  - Instrument code to explore feasible paths
  - Challenge: path explosion
Typically DSE instruments and explores only methods in the project under test; third-party external API methods (network, I/O, …) pose too many paths or cannot be instrumented.
Total block coverage achieved is 50%; the lowest coverage is 16%.
• object-creation problems (OCP) – 65%
• external-method-call problems (EMCP) – 27%
Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing
  - Instrument code to explore feasible paths
  - Challenge: path explosion
Total block coverage achieved is 50%; the lowest coverage is 16%.
Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011.
2010 Dagstuhl Seminar 10111: Practical Software Testing: Tool Automation and Human Factors
@NCSU ASE
• Tackling object-creation problems
  - Seeker [OOPSLA 11], MSeqGen [ESEC/FSE 09], Covana [ICSE 11], OCAT [ISSTA 10], Evacon [ASE 08], Symclat [ASE 06]
  - Still not good enough (at least for now)!
    ▪ Seeker (52%) > Pex/DSE (41%) > Randoop/random (26%)
• Tackling external-method-call problems
  - DBApp Testing [ESEC/FSE 11], [ASE 11]
  - CloudApp Testing [IEEE Soft 12]
  - Deal with only common environment APIs
• Test target: cover the true branch (B4) of line 24 in the Graph/DFSAlgorithm listing above
• Desired object state: the graph should include at least one edge
• Target sequence: as above (AddVertex(v1), AddVertex(v2), AddEdge(v1, v2), then new DFSAlgorithm(ag) and algo.Compute(v1))
Tackle object-creation problems with Factory Methods
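For instance, a hand-written factory method encodes the method sequence the tool could not synthesize on its own; the sketch below mirrors the target sequence above (the PexFactoryMethod attribute is shown as an assumption of how Pex-style tools discover factories, not as confirmed API usage):

using Microsoft.Pex.Framework;

public static class GraphFactory
{
    // Assumed Pex-style factory attribute; the body is the essential part:
    // it hand-encodes the object state that automated exploration missed.
    [PexFactoryMethod(typeof(Graph))]
    public static Graph CreateGraphWithEdge()
    {
        Graph ag = new Graph();
        Vertex v1 = new Vertex(0);
        Vertex v2 = new Vertex(1);
        ag.AddVertex(v1);
        ag.AddVertex(v2);
        ag.AddEdge(v1, v2); // desired state: at least one edge, so B4 is reachable
        return ag;
    }
}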
Tackle external-method-call problems with Mock Methods or Method Instrumentation
Mocking System.IO.File.ReadAllText:
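A minimal sketch of the mocking idea, assuming the code under test reads files through an injectable seam (Pex's companion Moles framework instead detoured static methods such as File.ReadAllText directly; the interface below is a simplified stand-in):

// The mock returns test-controlled contents instead of doing real I/O,
// so the external call no longer blocks path exploration.
public interface IFileSystem
{
    string ReadAllText(string path);
}

public class MockFileSystem : IFileSystem
{
    private readonly string contents;
    public MockFileSystem(string contents) { this.contents = contents; }
    public string ReadAllText(string path) { return contents; }
}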
• Human-Assisted Computing
  - Driver: tool; Helper: human
  - Ex. Covana [ICSE 2011]
• Human-Centric Computing
  - Driver: human; Helper: tool
  - Ex. Pex for Fun [ICSE 2013 SEE]
Interfaces are important. Contents are important too!
Symptoms and (likely) causes:
• object-creation problems (OCP) – candidates: all non-primitive program inputs/fields
• external-method-call problems (EMCP) – candidates: all executed external-method calls
• Causal analysis: tracing between symptoms and (likely) causes
  - Reduces the cost of human consumption
    ▪ reduction of the number of (likely) causes
    ▪ diagnosis of each cause
• Solution construction: fixing suspected causes
  - Reduces the cost of human contribution
    ▪ measurement of solution goodness
    ▪ an inner iteration of human-tool cooperation!
Given a symptom s, relate it to its likely causes:

foreach (c in LikelyCauses) {
    Fix(c);
    if (!IsObserved(s))        // if fixing c makes symptom s disappear,
        RelevantCauses.add(c); // then c is a relevant cause
}

(Likely) causes considered: object-creation problems (OCP), external-method-call problems (EMCP)
[ICSE 11]
• Goal: precisely identify the problems (causes) that prevent a tool from covering a statement (symptom)
• Insight: a partially-covered conditional has a data dependency on a real problem
(Example from xUnit.)
• Consider only EMCPs whose arguments have data dependencies on program inputs
  ▪ Fixing such problem candidates facilitates test-generation tools
(Example from xUnit.)
Symptom expression: return(File.Exists) == true
Element of EMCP candidate: return(File.Exists)
• The partially-covered conditional in Line 1 has a data dependency on File.Exists
• Partially-covered conditionals have data dependencies on EMCP candidates
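As a concrete illustration (hypothetical code in the shape of the xUnit symptom, not the actual source), the true branch of the conditional in Line 1 stays uncovered because File.Exists is an external method whose result the tool cannot make true by solving constraints:

using System.IO;

public class ConfigReader
{
    // fileName is a program input, so the EMCP candidate return(File.Exists)
    // has a data dependency on it and is reported as a relevant problem.
    public static string Load(string fileName)
    {
        if (File.Exists(fileName))             // Line 1: partially covered; the
            return File.ReadAllText(fileName); // true branch needs File.Exists == true
        return null;
    }
}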
(Covana architecture: the program and generated test inputs are processed by forward symbolic execution, which produces runtime information — runtime events [Inputs → EMCP] and coverage; problem candidate identification turns the runtime events into problem candidates; data dependence analysis [EMCP → Symptom] over the candidates and runtime information yields the identified problems.)
• Subjects:
  - xUnit: unit testing framework for .NET – 223 classes and interfaces with 11.4 KLOC
  - QuickGraph: C# graph library – 165 classes and interfaces with 8.3 KLOC
• Evaluation setup:
  - Apply Pex to generate tests for the program under test
  - Feed the program and the generated tests to Covana
  - Compare the baseline solution and Covana
• RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?
• RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?
Covana identifies
• 43 EMCPs with only 1 false positive and 2 false negatives
• 155 OCPs with 20 false positives and 30 false negatives

Covana prunes
• 97% (1567 of 1610) of EMCP candidates, with 1 false positive and 2 false negatives
• 66% (296 of 451) of OCP candidates, with 20 false positives and 30 false negatives
• Motivation
  - Tools are often not powerful enough
  - Humans are good at some aspects that tools are not
• Iterations form a feedback loop:
  - What difficulties does the tool face?
  - How to communicate that info to the user to get help?
  - How does the user help the tool based on the info?
• Human-Assisted Computing
  - Driver: tool; Helper: human
  - Ex. Covana [ICSE 2011]
• Human-Centric Computing
  - Driver: human; Helper: tool
  - Ex. Pex for Fun [ICSE 2013 SEE]
Interfaces are important. Contents are important too!
www.pexforfun.com
1,270,159 clicked 'Ask Pex!'
http://research.microsoft.com/en-us/projects/pex4fun/
Nikolai Tillmann, Jonathan de Halleux, Tao Xie, Sumit Gulwani, and Judith Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013 SEE.
Winning criterion: behavior of Player Impl == behavior of Secret Impl

Secret Implementation:
class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
    }
}

Test Driver:
class Test {
    public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
            throw new Exception("Mismatch");
    }
}

Player Implementation:
class Player {
    public static int Puzzle(int x) {
        return x;
    }
}
• Coding duels at http://www.pexforfun.com/
  - Brain exercising/learning while having fun
  - Fun: iterative, adaptive/personalized, with a win criterion
  - Abstraction/generalization, debugging, problem solving

Observed benefits:
• Automatic grading
• Real-time feedback (for both students and teachers)
• Fun learning experiences
http://pexforfun.com/gradsofteng

"I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun."
"I'm afraid I'll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I'm really stuffed with work."
"It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it's REALLY great for teaching | learning!"
• Everyone on the Internet can contribute
  - Coding duels
  - Duel solutions

class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1); } }
Puzzle games made from difficult constraints or object-creation problems, crowdsourced over the Internet
Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012.
Supported by an MSR SEIF Award
http://www.cs.washington.edu/verigames/
StackMine [Han et al. ICSE 12]
(Workflow: traces are collected over the Internet into trace storage; trace analysis and pattern matching against a problematic-pattern repository drive bug filing and bug updates in the bug database.)
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012.
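A heavily simplified sketch of the pattern-matching step (my illustration, not StackMine's algorithm, which mines costly subsequence patterns and clusters them): traces containing a known problematic pattern are grouped under one signature, so one signature covers many traces.

using System.Collections.Generic;
using System.Linq;

static class StackMatcher
{
    // Each trace is a call stack, outermost frame first; each pattern is a
    // single frame here (StackMine itself matches frame subsequences).
    public static Dictionary<string, List<string[]>> GroupByPattern(
        IEnumerable<string[]> traces, IEnumerable<string> knownPatterns)
    {
        var groups = new Dictionary<string, List<string[]>>();
        foreach (var trace in traces)
            foreach (var pattern in knownPatterns)
                if (trace.Contains(pattern)) // naive containment check
                {
                    if (!groups.TryGetValue(pattern, out var bucket))
                        groups[pattern] = bucket = new List<string[]>();
                    bucket.Add(trace);
                    break; // one signature per trace in this sketch
                }
        return groups;
    }
}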
"We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost."
– from a Development Manager in Windows

Highly effective new-issue discovery on Windows mini-hangs
Continuous impact on future Windows versions
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012.
• Don't forget human factors
  - Using your tools as end-to-end solutions
  - Helping your tools
• Don't forget cooperation of human and tool intelligence; human and human intelligence
  - Humans can help your tools too
  - Humans can work together to help your tools, e.g., crowdsourcing
• Human-Assisted Computing
• Human-Centric Computing
• Human-Human Cooperation
• Don't forget human factors
  - Using your tools as end-to-end solutions
  - Helping your tools
• Don't forget cooperation of human and tool; human and human
  - Humans can help your tools too
  - Humans can work together to help your tools, e.g., crowdsourcing
• Wonderful current/former students @ ASE
• Collaborators, especially those from Microsoft Research Redmond/Asia and Peking University
• Colleagues who gave feedback and inspired me
• NSF grants CCF-0845272, CCF-0915400, CNS-0958235; ARO grant W911NF-08-1-0443; an NSA Science of Security Lablet grant; a NIST grant; a 2011 Microsoft Research SEIF Award

Questions?
https://sites.google.com/site/asergrp/
• Human-Assisted Computing
• Human-Centric Computing
• Human-Human Cooperation