Decision Procedures for String Constraints
Pieter Hooimeijer
2
http://en.wikipedia.org/wiki/Osborne_1
3
4
<img src='untrusted input'/>
5
What could
possibly go wrong?
6
<img src='untrusted input'/>
Attacker:
im.png' onload='javascript:...
8
<img src='untrusted input'/>
Attacker:
im.png' onload='javascript:...
<img src='im.png' onload ='j
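A minimal Python sketch of the injection shown above, assuming the page is assembled by plain string substitution into the template from the slide. The names TEMPLATE, attack, naive, and escaped are illustrative and do not come from the talk; the attacker string is kept abbreviated exactly as on the slide, trailing "..." included. The escaping step is one common mitigation, shown only for contrast.

```python
import html

# Template from the slide; "{}" marks where the untrusted input is spliced in.
TEMPLATE = "<img src='{}'/>"

# Attacker-controlled value, abbreviated exactly as on the slide.
attack = "im.png' onload='javascript:..."

# Naive substitution: the quote inside the input closes the src attribute
# early, so the remainder of the input becomes a new onload attribute.
naive = TEMPLATE.format(attack)

# Assumed mitigation (not the talk's proposal): escape quotes before
# substitution so the whole input stays inside the src attribute.
escaped = TEMPLATE.format(html.escape(attack, quote=True))

print(naive)
print(escaped)
```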
10
11
www.cs.virginia.edu/~ph4u/
12
Talk Outline
Background
Building
Tuning
Conclusion
15
ASE: Bug Reports
2007
Sensys: MacroLab
2008
Sensys: MacroLab 2
2009
2010
ISSTA: Hampi
SocialNets: Proxied Content
PLDI: DPRLE
ASE: StrSolve
Sesena: MacroLab 3
2011
USENIX Sec: BEK
2012
POPL: BEK2
2013
TOSEM: Hampi 2
VMCAI: Data structures
J. ASE: StrSolve 2
This Talk
16
Decision Procedures
• Program analysis work frequently
uses one of these:
• They solve mathematical constraints
• There is a standard input format
17
Example
𝑥² = 25
𝑥 > 0
18
𝑥² = 25
𝑥 > 0
(declare-fun x () Int)
(assert (= (* x x) 25))
(assert (> x 0))
(check-sat)
(get-model)
✔ [𝑥 ↦ 5]
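To make the input/output contract concrete, here is a toy Python sketch that finds the same model by bounded enumeration. This is only an illustration: a real SMT solver does not enumerate candidates, and the function name find_model and the search bounds are assumptions made for this example.

```python
def find_model(constraints, lo=-100, hi=100):
    """Return {"x": value} for the first integer in [lo, hi] that
    satisfies every constraint, or None if no such integer exists."""
    for x in range(lo, hi + 1):
        if all(check(x) for check in constraints):
            return {"x": x}
    return None

# The two assertions from the slide: x * x = 25 and x > 0.
model = find_model([lambda x: x * x == 25, lambda x: x > 0])
print(model)
```

Within the chosen bounds, x = -5 satisfies the first assertion but is rejected by the second, so the search continues until it reaches x = 5.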
19
Motivation
Reasoning about strings is
difficult:
– for programmers
– for automated tools
20
String Constraint Solvers
Hampi
Kaluza
Rex
23
solvers: Hampi, Kaluza, Rex
constraints:
String a;
//...
R = Regex("^ab$");
assert(R.Match(a));

String a;
//...
R = Regex("^ab$");
R.IsMatch(a) = true;
solution(s): ✔ [𝑎 ↦ ′ab′]
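A toy Python sketch of the constraints-in, solution-out contract for the regex constraint on this slide. It enumerates candidate strings by increasing length, which is not how Hampi, Kaluza, or Rex work; the function name solve_regex, the alphabet, and the length bound are assumptions made for illustration.

```python
import re
from itertools import product

def solve_regex(pattern, alphabet="ab", max_len=4):
    """Return the shortest string over alphabet (up to max_len characters)
    that the pattern matches, or None if the bounded search finds none."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if re.search(pattern, candidate):
                return candidate
    return None

# Constraint from the slide: a must match ^ab$.
solution = solve_regex("^ab$")
print({"a": solution})
```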
24
What should we model?
25
Example
How hard is regex
matching in Perl?
26
A: Just as hard as 3-SAT…
$istr = '^' . ('(x?)' x $V) . ".*;\n"
$ireg = '^' . ('(x?)' x $V) . ".*;\n"
. join('',
map {'(?:'
. join('|',
map { $_ < 0
? ('\\' . -$_ . 'x')
: ('\\' . $_ )
} @$_ )
. "),\n"
} @Clauses );
http://perl.plover.com/NPC/NPC-3SAT.html
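The encoding in the Perl snippet can be sketched in Python, whose re module also supports numbered backreferences (for the first 99 groups, so this only suits small instances). Each variable becomes a group (x?) that captures "x" for true and the empty string for false; a positive literal becomes a backreference and a negative literal becomes a backreference followed by x, as in the snippet. The subject string layout ("x" per variable, ";", then "x," per clause) is an assumption inferred from what the pattern needs in order to match, and the function names are hypothetical.

```python
import re

def sat_to_regex(num_vars, clauses):
    """Build (pattern, subject) so that the pattern matches the subject
    exactly when the clauses are satisfiable. Literals are nonzero ints:
    k means variable k, -k means its negation."""
    subject = "x" * num_vars + ";" + "x," * len(clauses)
    pattern = "^" + "(x?)" * num_vars + ".*;"
    for clause in clauses:
        alternatives = [
            f"\\{lit}" if lit > 0 else f"\\{-lit}x"
            for lit in clause
        ]
        # Each clause must consume exactly one "x," from the subject.
        pattern += "(?:" + "|".join(alternatives) + "),"
    return pattern, subject

def solve(num_vars, clauses):
    """Return a list of booleans (one per variable) or None if unsatisfiable."""
    pattern, subject = sat_to_regex(num_vars, clauses)
    match = re.match(pattern, subject)
    if match is None:
        return None
    return [match.group(i) == "x" for i in range(1, num_vars + 1)]

clauses = [(1, 2, -3), (-1, 3, 2), (-2, -3, 1)]
print(solve(3, clauses))
```

A positive literal only matches the clause's "x," when its group captured "x"; a negative literal only matches when its group captured the empty string. The backtracking matcher therefore ends up searching over truth assignments.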
27
Where do
constraints come from?
28
Code
String a;
// ...
R = Regex("^ab$");
if (R.IsMatch(a)) {
// ...
}
29
Constraint
Generation
Constraint
Solving
30
Constraint
Generation
Constraint
Solving
31
Talk Outline
Background
Building
Tuning
Conclusion
32
Chapter 2: Defining String Constraints
Contributions:
1. The definition of the regular
matching assignments problem
2. An algorithm, its implementation,
and correctness proof
3. An evaluation, applying (2) to a
static analysis problem
33
34
demo (internet permitting)
Evaluation
The Task:
generate string inputs that
exercise 17 known vulnerabilities in 30,000 lines of PHP
Metric:
running time
35
Results
• Our constraint definition is
sufficiently expressive to capture the
constraints of interest
• Wall-clock running time is between
0.01 seconds and 10 minutes
36
Talk Outline
Background
Building
Tuning
Conclusion
37
Chapter 3: Evaluating Data Structures
Contribution:
4. An apples-to-apples performance
comparison of data structures
and algorithms for automata-based string constraint solving
38
Motivation
• Existing work provided tool-to-tool performance comparisons
• Confounds: Performance gains
may be due to external factors
39
The Framework
• Based on Rex
• Fixes external factors:
– front-end parser
– regex-to-automaton conversion
– implementation language
– search tree
40
Study Design
Tasks:
– automaton intersection
– automaton subtraction
Metric:
– running time
41
Character Sets
BDD: binary decision diagrams
Pred: symbolic bitvector ranges in DNF
Range: concrete set of character ranges
Hash: concrete set of individual characters
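To illustrate the difference between the last two representations, here is a small Python sketch that intersects two character sets stored as ranges and as hash sets. It is not the framework's code; the function name intersect_ranges and the example sets are assumptions. Ranges are inclusive (lo, hi) code point pairs, sorted and non-overlapping.

```python
def intersect_ranges(xs, ys):
    """Intersect two sorted lists of inclusive, non-overlapping (lo, hi) ranges."""
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        lo = max(xs[i][0], ys[j][0])
        hi = min(xs[i][1], ys[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever range ends first.
        if xs[i][1] < ys[j][1]:
            i += 1
        else:
            j += 1
    return out

# Range representation: cost depends on the number of ranges.
lowercase = [(ord("a"), ord("z"))]
hex_digits = [(ord("0"), ord("9")), (ord("A"), ord("F")), (ord("a"), ord("f"))]
range_result = intersect_ranges(lowercase, hex_digits)

# Hash representation: cost depends on the number of individual characters.
hash_result = set("abcdefghijklmnopqrstuvwxyz") & set("0123456789ABCDEFabcdef")

print(range_result, sorted(hash_result))
```

With a large alphabet such as Unicode, a set like "any character except x" is two ranges but a very large number of individual hash entries, which is one reason the choice of representation can matter.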
42
Task 1 (55x): 𝑤 ∈ 𝐿(𝑎) ∩ 𝐿(𝑏)
Task 2 (100x): 𝑤 ∈ 𝐿(𝑎) ∖ 𝐿(𝑏)
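Both tasks can be sketched with the textbook product construction over deterministic automata, exploring product states on demand and returning a shortest witness w. This is a generic illustration, not the framework's implementation; the DFA class, find_witness, and the two example automata are assumptions made for this sketch.

```python
from collections import deque

class DFA:
    """Deterministic automaton; transitions missing from delta reject."""
    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta  # maps (state, char) -> next state

def find_witness(a, b, alphabet, difference=False):
    """Shortest w in L(a) ∩ L(b), or in L(a) ∖ L(b) when difference is True.
    Returns None if the language is empty. None also marks b's dead state."""
    start = (a.start, b.start)
    seen = {start}
    queue = deque([(start, "")])
    while queue:
        (qa, qb), w = queue.popleft()
        in_a = qa in a.accepting
        in_b = qb is not None and qb in b.accepting
        wanted = (not in_b) if difference else in_b
        if in_a and wanted:
            return w
        for c in alphabet:
            na = a.delta.get((qa, c))
            if na is None:
                continue  # w + c can no longer reach L(a)
            nb = None if qb is None else b.delta.get((qb, c))
            if nb is None and not difference:
                continue  # w + c can no longer reach L(b)
            if (na, nb) not in seen:
                seen.add((na, nb))
                queue.append(((na, nb), w + c))
    return None

# Example automata over {a, b}: "contains an a" and "has even length".
has_a = DFA(0, {1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1})
even = DFA(0, {0}, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0})

print(find_witness(has_a, even, "ab"))
print(find_witness(has_a, even, "ab", difference=True))
```

The loop over the alphabet visits every character individually; a symbolic character-set representation would instead label transitions with sets and intersect those.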
45
Results
Eager, Lazy
Task 1 (55x): 𝑤 ∈ 𝐿(𝑎) ∩ 𝐿(𝑏)
Task 2 (100x): 𝑤 ∈ 𝐿(𝑎) ∖ 𝐿(𝑏)
Unicode and ASCII for each task and each of Eager and Lazy
46
[Figure: panels labeled ASCII, Unicode, Lazy, and Eager, each comparing BDD, Pred, Range, and Hash on a logarithmic axis from 0.1 to 1000]
48
Chapter 4: Solving String Constraints Lazily
Contributions:
5. A novel (lazy) algorithm for solving
multivariate string constraints
6. A comprehensive performance
evaluation
49
Motivation
• More scalable algorithms are
more likely to see real use
50
Approach
1. Eagerly construct a
high-level representation
of the search space
2. Explore the search space
lazily, adding restrictions
for one variable at a time
51
Evaluation
Difference
Hampi
Long
Strings
CFG
Intersection
53
Hampi: Background
2007
2008
2009
2010
ISSTA: Hampi
SocialNets: Proxied Content
PLDI: DPRLE
ASE: StrSolve
2011
USENIX Sec: BEK
2012
POPL: BEK2
2013
TOSEM: Hampi 2
VMCAI: Data structures
J. ASE: StrSolve 2
55
Hampi: Architecture
Hampi
STP (bv)
MiniSAT
56
Hampi
encoding
STP (bv)
MiniSAT
solving
57
Experiment
Task:
regex difference
(same dataset as before)
Metric:
proportion of wall-clock
time spent solving
58
Results
[Figure: proportion of running time (0% to 100%) spent on encoding and on solving, for length bounds 1, 5, 10, and 15]
60
Results
[Figure: proportion of running time (0% to 100%) and absolute running time (0 to 10 seconds), split into encoding and solving, for length bounds 1, 5, 10, and 15]
61
Evaluation
Difference
Hampi
Long
Strings
CFG
Intersection
62
Experiment
Task:
intersect two regexes
parameterized on n:
[a-c]*a[a-c]{n+1}
and
[a-c]*b[a-c]{n}
Metric:
running time
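A short Python sketch of the benchmark family, assuming the two patterns are meant to match whole strings. The helper names patterns and witness are hypothetical. For any n, the string "ab" followed by n copies of "a" lies in both languages, so the intersection is never empty; the difficulty is that a deterministic automaton for either language has to track roughly the last n + 2 characters, so eager determinization grows exponentially in n.

```python
import re

def patterns(n):
    """The two regexes from the slide, instantiated for a given n."""
    first = f"[a-c]*a[a-c]{{{n + 1}}}"
    second = f"[a-c]*b[a-c]{{{n}}}"
    return first, second

def witness(n):
    """A string in both languages: 'a' is followed by n + 1 characters
    and 'b' is followed by n characters."""
    return "ab" + "a" * n

for n in (0, 5, 250):
    first, second = patterns(n)
    w = witness(n)
    print(n, bool(re.fullmatch(first, w)), bool(re.fullmatch(second, w)))
```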
63
Participating Tools
Hampi
Rex
Strsolve
64
Results
[Figure: time (s) on a logarithmic axis from 0.001 to 10000 against n from 0 to 1000, for Hampi, Rex, and Strsolve]
65
Talk Outline
Background
Building
Tuning
Conclusion
66
Conclusion
• Introduced string constraint solving in
the context of program analysis
• Two algorithms:
one eager (DPRLE), one lazy (strsolve)
• Presented experiments
– data structure selection
– solving multivariate constraints
• Our lazy prototype outperforms other
approaches on indicative workloads
67
Thanks for stopping by!
www.cs.virginia.edu/~ph4u/
68
69