A Model Counter For Constraints Over Unbounded Strings Loi Luu Prateek Saxena

A Model Counter For Constraints Over Unbounded Strings
Loi Luu*, Shweta Shinde*, Prateek Saxena*, Brian Demsky+
*National University of Singapore
+University of California, Irvine
Example: Quantifying Password Strength Meters
• Strength-meter policy for a STRONG password:
  – ch1 = [a-zA-Z]*, ch2 = [0-9]*, ch3 = [@#?^&*-+]*
  – INPUT ∈ ch1.ch2.ch3
• Password-cracking dictionary: INPUT ∈ Dictionary Database
• How many passwords satisfy both constraints (∩), i.e. are rated STRONG yet appear in the dictionary?
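To make the policy concrete, here is a minimal sketch that checks one password against ch1.ch2.ch3. It assumes the letter class means ASCII letters and treats the special class as the literal characters @ # ? ^ & * - +; the function names are hypothetical. Checking candidates one at a time like this is the enumeration approach; SMC instead counts the whole solution set symbolically.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Character class under the example policy (assumed reading of the slide):
 * 0 = letter (ch1), 1 = digit (ch2), 2 = special (ch3), -1 = not allowed. */
static int char_class(char c) {
    if (isalpha((unsigned char)c)) return 0;
    if (isdigit((unsigned char)c)) return 1;
    if (c != '\0' && strchr("@#?^&*-+", c) != NULL) return 2;
    return -1;
}

/* Hypothetical helper: true iff pw is in ch1.ch2.ch3, i.e. it matches
 * [a-zA-Z]*[0-9]*[@#?^&*-+]*. The three classes are disjoint, so one
 * left-to-right pass suffices: the class index may repeat or increase,
 * but never decrease. */
bool matches_strong_policy(const char *pw) {
    int phase = 0;
    for (; *pw != '\0'; pw++) {
        int cls = char_class(*pw);
        if (cls < phase) return false;   /* also rejects cls == -1 */
        phase = cls;
    }
    return true;
}
```

For example, "abc123@#" passes, while "Passw0rd" fails because a letter follows a digit.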
The Model Counting Problem
String Constraints → String Model Counter → # of Solutions
Contributions
• SMC: String Model Counter
  – Handles constraints over structured data types
  – Fast & Scalable, Expressive, Better Precision
  – Available at https://github.com/loiluu/smc
• Handles Unbounded Strings
  – Uses generating functions
• Many Practical Applications
  – Quantifying password strength meters
  – Quantifying information leaks via side channels
Technical Problem Definition
• C: set of string constraints (e.g., the policy for a STRONG password)
• S: set of feasible string solutions for C (all possible STRONG passwords)
• |S|: model count (the number of STRONG passwords)
• [L, U]_n: model count bounds for string length n
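Soundness & Precision
• Bounds [LB, UB] on a scale from 0 to 2^L are sound when LB ≤ exact count ≤ UB; bounds that miss the exact count are unsound
• Sound bounds that lie far apart are imprecise
• ε-precise: LB and UB are at most ε apart on a log scale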
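Read literally, "at most ε apart on a log scale" can be written as follows. Using base-2 logarithms (so ε is measured in bits) is an assumption, not something the slide fixes:

$$\mathrm{LB} \le |S| \le \mathrm{UB} \quad\text{(sound)}, \qquad \log_2 \mathrm{UB} - \log_2 \mathrm{LB} \le \varepsilon \quad\text{($\varepsilon$-precise)}$$

In particular, ε = 0 means LB = UB, i.e. the exact model count.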
Insufficiency of Previous Approaches: Enumeration or Sampling
• Example constraint: strstr(S, "XXXXY") = 100
• Enumerating its solutions means going through more than 256^100 strings [NDSS ’14]
• Does not scale to large strings
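For scale, the enumeration bound converts to bits as follows (plain arithmetic, using only 256 = 2^8):

$$256^{100} = \left(2^{8}\right)^{100} = 2^{800} \approx 6.7 \times 10^{240}$$

Even at 10^9 strings per second, that is roughly 6.7 × 10^231 seconds of enumeration.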
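Insufficiency of Previous Approaches: Symbolic Execution & Integer MC
#include <stddef.h>

/* Naive substring search: returns a pointer to the first occurrence of
 * search in str, or NULL if there is none, if either argument is NULL,
 * or if search is the empty string. */
const char* mystrstr(const char* str, const char* search) {
    if (!str || !search) { return NULL; }
    while (*str != '\0') {
        int len = 0;                 /* characters of search matched so far */
        const char* sub = search;
        while (*sub != '\0') {
            if (*sub == *str) {
                sub++; str++; len++;
                if (*sub == '\0') { return str - len; }  /* full match */
            } else {
                str -= len;          /* rewind to where this attempt began */
                break;
            }
        }
        str++;                       /* try the next starting position */
    }
    return NULL;
}
• Return value of interest: 100
• Symbolic execution: 5 paths
• Symbolic execution → symbolic path constraint → integer model counter → model count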
String Model Counting: Representing Set Cardinalities with GFs
Example: set of all strings over the alphabet {a, b}
• Length 0: ε (the empty string), |S0| = 1
• Length 1: a, b, |S1| = 2
• Length 2: aa, ab, ba, bb, |S2| = 4
• Length 3: aaa, aab, aba, abb, …, |S3| = 8
• ...
String Model Counting: Representing String Sets with GFs
• The counts can be viewed as a series: 1, 2, 4, 8, ...
• Represent it as a formal power series: $1 + 2z + 4z^2 + 8z^3 + \cdots$
• The series has a closed-form algebraic expression, its generating function (GF): $G(z) = \frac{1}{1-2z}$
• GFs can represent infinite sets!
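Going from a closed form back to the series is mechanical: writing G(z) = P(z)/Q(z) and comparing coefficients of P = Q·G yields a recurrence. The sketch below assumes Q(0) = 1 (true for every GF in this deck) and uses a hypothetical function name; SMC itself delegates this step to Mathematica, so this is only a lightweight stand-in.

```c
#include <stddef.h>

/* Expand P(z)/Q(z) into its first n power-series coefficients out[0..n-1].
 * p (np entries) and q (nq entries) list coefficients lowest degree first.
 * Assumes q[0] == 1. Comparing coefficients of z^k in P(z) = Q(z) * G(z):
 *   out[k] = p[k] - sum_{j=1..k} q[j] * out[k-j]. */
void series_coeffs(const long long *p, size_t np,
                   const long long *q, size_t nq,
                   long long *out, size_t n) {
    for (size_t k = 0; k < n; k++) {
        long long acc = (k < np) ? p[k] : 0;
        for (size_t j = 1; j <= k && j < nq; j++)
            acc -= q[j] * out[k - j];
        out[k] = acc;
    }
}
```

With p = {1} and q = {1, -2}, i.e. G(z) = 1/(1 − 2z), the output is 1, 2, 4, 8, 16, …, the number of strings over {a, b} of each length.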
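Recovering Coefficients from GFs
• How many strings of length 3?
• Take the coefficient of z^3 in G(z): differentiate three times and evaluate at z = 0
$$G'''(z) = \frac{d^3}{dz^3}\left(1 + 2z + 4z^2 + 8z^3 + 16z^4 + \cdots\right) = 3!\cdot 8 + 4!\cdot 16\,z + \cdots$$
$$\frac{G'''(0)}{3!} = 8$$
• In general: $a(k) = \frac{G^{(k)}(0)}{k!}$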
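Differentiating the closed form instead of the series gives every coefficient at once. This is standard calculus, shown as a worked check rather than as SMC's own procedure:

$$G(z) = (1-2z)^{-1}, \qquad G^{(k)}(z) = k!\,2^{k}\,(1-2z)^{-(k+1)}$$
$$a(k) = \frac{G^{(k)}(0)}{k!} = 2^{k}, \qquad a(3) = 8$$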
Modeling String Ops Over GFs: Concatenation
• S1 = {a, b}*: $G_1(z) = \frac{1}{1-2z}$ (strings: ε, a, b, aa, ab, ba, bb, …)
• S2 = {+, -}: $G_2(z) = 2z$
• S1.S2: $G_3(z) = G_1(z) \times G_2(z) = \frac{2z}{1-2z}$
  (strings: +, -, +a, -a, +b, -b, +aa, -aa, +ab, -ab, +ba, -ba, …)
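Multiplying GFs is a convolution of coefficients: the number of length-n results is the sum over k of (length-k strings from one set) × (length-(n − k) strings from the other). This counts distinct strings only when each result splits in exactly one way, which holds here because every S2 string is a single symbol not in {a, b}. A minimal sketch on truncated coefficient arrays (the function name is illustrative):

```c
#include <stddef.h>

/* Cauchy product of two truncated GFs:
 * out[n] = sum_{k=0..n} a[k] * b[n-k], for n < len. */
void gf_multiply(const long long *a, const long long *b, long long *out,
                 size_t len) {
    for (size_t n = 0; n < len; n++) {
        out[n] = 0;
        for (size_t k = 0; k <= n; k++)
            out[n] += a[k] * b[n - k];
    }
}
```

With G1 = 1, 2, 4, 8, 16 and G2 = 0, 2, 0, 0, 0 the product is 0, 2, 4, 8, 16, matching the expansion of 2z/(1 − 2z).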
Modeling String Ops Over GFs: Regular Expression Match
• S ∈ {a | ab}*
• S1 = {a}: $G_1(z) = z$
• S2 = {ab}: $G_2(z) = z^2$
• S3 = {a | ab}: $G_3(z) = z + z^2$
• S = S3*: $G(z) = \frac{1}{1-(z+z^2)}$
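The closed form 1/(1 − (z + z²)) gives the recurrence a(n) = a(n−1) + a(n−2) with a(0) = a(1) = 1, a Fibonacci sequence. The sketch below cross-checks it against brute-force matching of every string over {a, b}. The two agree because {a, ab} parses each string in at most one way, which is what allows the GF of a star to count distinct strings rather than token sequences. Helper names are illustrative.

```c
#include <stdbool.h>

/* Does the length-n string encoded by mask (bit i set = 'b' at position i)
 * match (a|ab)*? Every token starts with 'a'; an 'a' followed by 'b' must be
 * the token "ab", since a stray 'b' could never start a token. */
static bool matches_a_or_ab_star(unsigned long mask, unsigned n) {
    unsigned i = 0;
    while (i < n) {
        if (mask & (1UL << i)) return false;                 /* starts with 'b' */
        if (i + 1 < n && (mask & (1UL << (i + 1)))) i += 2;  /* token "ab" */
        else i += 1;                                         /* token "a"  */
    }
    return true;
}

/* Brute force: test all 2^n strings over {a, b}. */
unsigned long count_brute_force(unsigned n) {
    unsigned long count = 0;
    for (unsigned long mask = 0; mask < (1UL << n); mask++)
        if (matches_a_or_ab_star(mask, n)) count++;
    return count;
}

/* Coefficients of 1/(1 - z - z^2): a(0) = a(1) = 1, a(n) = a(n-1) + a(n-2). */
unsigned long count_by_gf(unsigned n) {
    unsigned long prev = 1, cur = 1;   /* a(0), a(1) */
    if (n == 0) return 1;
    for (unsigned i = 2; i <= n; i++) {
        unsigned long next = cur + prev;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

For length 3 both give 3 (aaa, aab, aba).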
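Preserving Precision: contains Operation
• S.contains("aba") ⇔ S ∈ .*aba.*
• Modeling contains as a regex is not always precise: abababbb splits both as ab · aba · bbb and as aba · babbb, so the regex decomposition counts it twice
• SMC's GF for contains counts each string once:
$$G(z) = \frac{z^3}{(1-2z)\left(z^3 + (1-2z)(1+z^2)\right)}$$
Exact!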
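The closed form can be checked numerically over the binary alphabet {a, b} (the alphabet is an assumption carried over from the earlier examples). Multiplying out the denominator gives 1 − 4z + 5z² − 3z³ + 2z⁴. The sketch expands G(z) as a series, compares it with brute-force counting, and also computes the naive regex count from .*aba.*, whose GF z³/(1 − 2z)² counts every (prefix, suffix) split. That naive count over-counts strings with overlapping occurrences, such as ababa at length 5.

```c
#include <stdbool.h>

#define MAXN 16   /* largest length this sketch tabulates */

/* Does the length-n string encoded by mask (bit i set = 'b') contain "aba"? */
static bool contains_aba(unsigned long mask, unsigned n) {
    for (unsigned i = 0; i + 3 <= n; i++) {
        bool a0 = !(mask & (1UL << i));
        bool b1 = (mask & (1UL << (i + 1))) != 0;
        bool a2 = !(mask & (1UL << (i + 2)));
        if (a0 && b1 && a2) return true;
    }
    return false;
}

long long count_brute_force(unsigned n) {
    long long count = 0;
    for (unsigned long mask = 0; mask < (1UL << n); mask++)
        if (contains_aba(mask, n)) count++;
    return count;
}

/* Series of z^3 / (1 - 4z + 5z^2 - 3z^3 + 2z^4), i.e. the slide's G(z):
 *   a(k) = [k == 3] + 4a(k-1) - 5a(k-2) + 3a(k-3) - 2a(k-4). */
long long count_by_gf(unsigned n) {
    static const long long q[5] = {1, -4, 5, -3, 2};
    long long a[MAXN + 1] = {0};
    if (n > MAXN) return -1;               /* outside the sketch's table */
    for (unsigned k = 0; k <= n; k++) {
        long long acc = (k == 3) ? 1 : 0;
        for (unsigned j = 1; j <= 4 && j <= k; j++)
            acc -= q[j] * a[k - j];
        a[k] = acc;
    }
    return a[n];
}

/* Naive regex view .*aba.*: GF z^3/(1-2z)^2, coefficient (n-2) * 2^(n-3). */
long long count_naive_regex(unsigned n) {
    return n < 3 ? 0 : (long long)(n - 2) << (n - 3);
}
```

At length 5 the exact count is 11 while the naive regex count is 12; the extra one is ababa, counted once per occurrence.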
SMC: Full Language
Core Constraints (string level):
RegExp     := character | ε | RegExp RegExp | RegExp|RegExp | RegExp*
Constraint := Var = Var
            | Var IN RegExp
            | Var = Var • Var
            | Var = ConsString
            | contains(Var, ConsString)
            | strstr(Var, ConsString)
            | length(Var) ○ Num
○          := < | ≤ | > | ≥ | ≠
Full Constraints (combining multiple constraints):
Formula    := Constraint
            | Formula OR Formula
            | Formula AND Formula
            | NOT Formula
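Full constraints combine core constraints with AND, OR and NOT, while SMC reports bounds [L, U]_n rather than a single count. The slides do not spell out SMC's own combination rules, so the sketch below shows only the standard set inequalities that any sound combination must respect, taking `total` as the number of length-n strings. The `Bounds` type and function names are hypothetical.

```c
/* Interval bounds on the size of a set of length-n strings; every set is a
 * subset of a universe of `total` strings. Doubles, because counts such as
 * 256^n overflow integer types quickly. */
typedef struct { double lo, hi; } Bounds;

static double min_d(double x, double y) { return x < y ? x : y; }
static double max_d(double x, double y) { return x > y ? x : y; }

/* |A AND B| <= min(|A|, |B|)  and  |A AND B| >= |A| + |B| - total. */
Bounds bounds_and(Bounds a, Bounds b, double total) {
    Bounds r = { max_d(0.0, a.lo + b.lo - total), min_d(a.hi, b.hi) };
    return r;
}

/* max(|A|, |B|) <= |A OR B| <= min(|A| + |B|, total). */
Bounds bounds_or(Bounds a, Bounds b, double total) {
    Bounds r = { max_d(a.lo, b.lo), min_d(a.hi + b.hi, total) };
    return r;
}

/* |NOT A| = total - |A|, so the interval flips. */
Bounds bounds_not(Bounds a, double total) {
    Bounds r = { total - a.hi, total - a.lo };
    return r;
}
```

Each rule only widens or keeps the input intervals, so sound inputs yield sound outputs. Precision can still degrade under AND, which is why exact per-constraint counts matter.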
SMC Design
• SMC: constraints → formula translation → CNF constraint → generating function translation → generating function
• Mathematica: algebraic computation on the generating function, evaluate at n → model count
Case Studies & Applications
Experiments
• Real-world password strength meters
  – 3 websites: Drupal, eBay and Microsoft meters [NDSS ’14]
• Quantifying leakage in C programs
  – 4 UNIX utilities: obscure, grep, csplit, wc [CCS ’13]
  – 2 web servers: Ghttpd, Null HTTPd [ASPLOS ’08]
• String constraints from JavaScript applications
  – 18,901 path constraints from 18 real-world apps [SSP ’10]
  – 13 iGoogle gadgets and 5 AJAX applications
Application I: Password Strength Meters
Guessing Attacks w/o Dictionaries
[Table: number of passwords of length L = 5 and L = 10 that the eBay (Invalid/Weak/Medium/Strong), Microsoft and Drupal meters assign to each strength level]
Application I: Password Strength Meters
Guessing Attacks with Dictionaries
• STRONG passwords: the smaller the overlap with the dictionary, the better
• How many of these also exist in the JtR (John the Ripper) database?
  – Password database dictionary of 3,106 words
Application I: Password Strength Meter
Dictionary words per strength level (total over L = 1..10):
• eBay: Invalid 2678, Weak 413, Medium 3, Strong 0
• Microsoft: Weak 2640, Medium 461, Good 0, Best 0
• Drupal: Weak 936, Fair 1974, Good 369, Strong 3
• Drupal will be more vulnerable to a JtR dictionary attack
Application II: Quantifying Side Channel Leakage
Setting: a web server VM processes sensitive INPUT; a malicious VM on the same hypervisor & hardware observes its execution through side channels.
input = readline(file);
if (strstr(input, "\n"))
    lines++;
if (strstr(input, "\r") || strstr(input, "\f")) {
    if (linepos > linelength)
        linelength = linepos;   /* track the longest line seen so far */
    linepos = 0;
    words++;
}
if (strstr(input, "\t")) {
    linepos += 8 - (linepos % 8);   /* advance to the next tab stop */
    words++;
}
write_counts(lines, words);
• Sensitive INPUT ∈ (ch1|ch2)*, where ch1 = \n and ch2 = \r|\f
Application II: Quantifying Side Channel Leakage
• ch1 = \n, ch2 = \r|\f
• X = log2 of the number of inputs with INPUT ∈ .* (counted by SMC)
• Y = log2 of the number of inputs with INPUT ∈ (ch1|ch2)* (counted by SMC)
• Leakage = X − Y bits
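As a worked instance, assume inputs are byte strings of a fixed length n (alphabet size 256; the slide does not fix the alphabet) and treat X and Y as base-2 logarithms of the two counts. Inputs on the observed path draw from the three characters \n, \r and \f, so Y = n·log2 3. The path leakage is then X − Y = n(8 − log2 3) ≈ 6.42n bits. Function names are hypothetical.

```c
/* log2(3), precomputed so the sketch does not need libm. */
#define LOG2_3 1.5849625007211562

/* Path leakage in bits, given base-2 logs of the two counts:
 * X = log2(#all inputs), Y = log2(#inputs consistent with the path). */
double path_leakage_bits(double x_log2_all, double y_log2_path) {
    return x_log2_all - y_log2_path;
}

/* Worked instance under the stated assumptions: all length-n byte strings
 * (256^n of them) versus length-n strings over {'\n', '\r', '\f'} (3^n). */
double example_path_leakage(unsigned n) {
    double x = 8.0 * n;        /* log2(256^n) */
    double y = LOG2_3 * n;     /* log2(3^n)   */
    return path_leakage_bits(x, y);
}
```

For n = 10 this gives about 64.15 bits. Working with logarithms avoids computing 256^n directly.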
Application II: Quantifying Side Channel Leakage
[Figure: path leakage in bits for the UNIX utilities grep, wc, csplit and obscure; bar values shown: 750.1, 355.3, 179.4 and 0.133 bits]
Tool Evaluation
Result I: Speed
• SMC vs. FuzzBALL [PLAS ’09]
  Program                       | len | SMC     | FuzzBALL
  Obscure                       | 6   | 0.5 sec | 2 Hrs
  strstr(input, "abc") != NULL  | 5   | 0.4 sec | 2 Hrs
  strstr(input, "abc") != NULL  | 4   | 0.5 sec | 150 sec
  match regex(input, "(a|b)*")  | 4   | 0.4 sec | 2 Hrs
• SMC vs. QUAIL [CAV ’13]
  Program                       | len | SMC     | QUAIL
  strstr(input, "ab") = 2       | 5   | 0.2 sec | 6.1 sec
  strstr(input, "ab") = 2       | 7   | 0.2 sec | 648 sec
  input.contains("ab")          | 5   | 0.3 sec | 5.1 sec
  input.contains("ab")          | 7   | 0.3 sec | 606 sec
Result II: Expressiveness
(JavaScript Applications & UNIX utilities)
[Figure: frequency of each constraint type (regexes, contains, length, concatenation, comparison with const string) in the 18,901 JavaScript benchmarks and the UNIX case studies; bar values shown: 43,381; 38,242; 24; 202; 191; 83,382; 89,771; 95; 239]
Result III: Precision
• SMC vs. Castro et al. [ASPLOS ’08]
  Program     | len | SMC bits | SMC ε-precise | Castro et al.
  Ghttpd      | 620 | 80.2     | 0.003         | ≈ 248
  Null HTTPd  | 500 | 248.0    | 0.002         | ≈ 500
• SMC vs. FuzzBALL [PLAS ’09]
  Program                       | len | SMC bits | SMC ε-precise | FuzzBALL
  Obscure                       | 6   | ≈ 0      | 0.06          | > 2 Hrs
  strstr(input, "abc") != NULL  | 5   | 22.4     | 0             | > 2 Hrs
  strstr(input, "abc") != NULL  | 4   | 23.0     | 0             | ≈ 32
  match regex(input, "(a|b)*")  | 4   | 13.5     | 0.13          | > 2 Hrs
Conclusion
• SMC: String Model Counter
  – Fast & Scalable, Expressive, Better Precision
• Handles Unbounded Strings
• Practical Applications
  – Quantifying password strength meters
  – Quantifying information leaks via side channels
Related Work
• [Cambridge University Press ’09] P. Flajolet and R. Sedgewick. Analytic Combinatorics.
• [JAIR ’99] E. Birnbaum and E. L. Lozinskii. The Good Old Davis-Putnam Procedure Helps Counting Models.
• [FOCS ’93] A. I. Barvinok. A Polynomial Time Algorithm for Counting Integral Points in Polyhedra When the Dimension Is Fixed.
• [Algorithmica ’07] S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe. Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions.
• [SSP ’09] M. Backes, B. Köpf, and A. Rybalchenko. Automatic Discovery and Quantification of Information Leaks.
• [PLAS ’09] J. Newsome, S. McCamant, and D. Song. Measuring Channel Capacity to Distinguish Undue Influence.
• [CAV ’13] F. Biondi, A. Legay, L. M. Traonouez, and A. Wasowski. QUAIL: A Quantitative Security Analyzer for Imperative Code.
• [SEN ’12] Q. S. Phan, P. Malacaria, O. Tkachuk, and C. S. Pasareanu. Symbolic Quantitative Information Flow.
• LattE Tool. http://www.math.ucdavis.edu/~latte/
• RelSat Tool. http://code.google.com/p/relsat/
Contact
• Loi Luu, Shweta Shinde
  {loiluu, shweta24}@comp.nus.edu.sg
• Our SMC tool is available at:
  – https://github.com/loiluu/smc
Thank You!
References
• [FoSSaCS ’09] G. Smith. On the Foundations of Quantitative Information Flow.
• [CCS ’13] S. Tople, S. Shinde, Z. Chen, and P. Saxena. AutoCrypt: Enabling Homomorphic Computation on Servers to Protect Sensitive Web Content.
• [ASPLOS ’08] M. Castro, M. Costa, and J.-P. Martin. Better Bug Reporting with Better Privacy.
• [SSP ’10] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. A Symbolic Execution Framework for JavaScript.
• [NDSS ’14] X. de Carné de Carnavalet and M. Mannan. From Very Weak to Very Strong: Analyzing Password-Strength Meters.
• John the Ripper password cracker. http://www.openwall.com/john
• Wolfram Mathematica. http://www.wolfram.com/mathematica
Backup Slides
Robustness
• Real-world JavaScript benchmarks [SSP ’10]
  Evaluation parameter                                | Big test cases (variable > 4) | Small test cases (variable < 4)
  Number of tests                                     | 1342                          | 17559
  Total running time                                  | 1h 58 mins                    | 1h 9 mins
  Average no. of constraints                          | 187                           | 2.05
  Average running time                                | 5.29 seconds                  | 0.24 seconds
  Test cases for which SMC reports exact model count  | 21%                           | 94%