A Model Counter For Constraints Over Unbounded Strings
Loi Luu*, Shweta Shinde*, Prateek Saxena*, Brian Demsky+
National University of Singapore*, University of California, Irvine+

Example: Quantifying Password Strength Meters
• STRONG policy: ch1 = [a-Z]*, ch2 = [0-9]*, ch3 = [@#?^&*-+]*
  – INPUT ∈ ch1^ch2^ch3
• Password cracking dictionary
  – INPUT ∈ Password Dictionary Database
• How many passwords lie in the intersection (∩) of the two sets?

The Model Counting Problem
String Constraints → String Model Counter → # of Solutions

Contributions
• SMC: String Model Counter (fast & scalable, expressive, better precision)
  – Handles constraints over structured data types
  – Available at https://github.com/loiluu/smc
• Handles Unbounded Strings
  – Uses generating functions
• Many Practical Applications
  – Quantifying password strength meters
  – Quantifying information leaks via side channels

Technical Problem Definition
• C : Set of string constraints (policy for a STRONG password)
• S : Set of feasible string solutions for C (all possible STRONG passwords)
• |S| : Model count (number of STRONG passwords)
• [L,U]n : Model count bounds for string length n

Soundness & Precision
[Figure: number line with the marks 0, LB, Exact Count, UB and 2L; regions labeled Unsound and Imprecise; ε spans LB to UB.]
• ε-precise: distance between LB and UB on a log scale

Insufficiency of Previous Approaches: Enumeration or Sampling
strstr(S, "XXXXY") = 100
• Must enumerate more than $256^{100}$ strings [NDSS '14]
• Does not scale for large strings

Insufficiency of Previous Approaches: Symbolic Execution & Integer MC
    const char* mystrstr(const char* str, const char* search) {
        if (!str || !search) {
            return 0;
        }
        while (*str != '\0') {
            int len = 0;
            const char* sub = search;
            while (*sub != '\0') {
                if (*sub == *str) {
                    sub++; str++; len++;
                    if (*sub == '\0') {
                        return str - len;
                    }
                } else {
                    str -= len;
                    break;
                }
            }
            str++;
        }
        return 0;
    }
[Figure annotations on the code: 100; 5 Paths]
Symbolic execution → Symbolic Path Constraint → Integer Model Counter → Model Count

String Model Counting: Representing Set Cardinalities with GFs
Example: set of all strings over the alphabet {a, b}
• Length 0: Φ (the empty string), |S0| = 1
• Length 1: a, b, |S1| = 2
• Length 2: aa, ab, ba, bb, |S2| = 4
• Length 3: aaa, aab, aba, abb, …, |S3| = ...

String Model Counting: Representing String Sets with GFs
• Can be viewed as a series: 1, 2, 4, 8, ...
• Represent as a polynomial: $1 + 2z + 4z^2 + 8z^3 + \dots$
• Has a closed-form algebraic expression, the Generating Function (or GF): $G(z) = \frac{1}{1-2z}$
• Can represent infinite sets!

Recovering Co-efficient from GFs
• How many strings of length 3?
• Co-efficient of $z^3$ in $G(z)$
$G'''(z) = \frac{d}{dz}\left(\frac{d}{dz}\left(\frac{d}{dz}\left(1 + 2z + 4z^2 + 8z^3 + \dots\right)\right)\right) = 3!\,(8 + 64z + 320z^2 + \dots)$
$\frac{G'''(z)[z=0]}{3!} = 8$
• In general: $a(k) = \frac{G^{(k)}[0]}{k!}$

Modeling String Ops Over GFs: Concatenation
• $S_1 = \{a, b\}^*$, $G_1(z) = \frac{1}{1-2z}$
• $S_2 = \{+, -\}$, $G_2(z) = 2z$
• $S_1 . S_2$: $G_3(z) = G_1(z) \times G_2(z) = \frac{2z}{1-2z}$
[Figure: {a, b, aa, ab, ba, bb, …} combined with {+, -} = {+a, -a, +b, -b, +aa, -aa, +ab, -ab, +ba, -ba, …}]

Modeling String Ops Over GFs: Regular Expression Match
• $S_1 = \{a\}$, $G_1(z) = z$
• $S_2 = \{ab\}$, $G_2(z) = z^2$
• $S_3 = \{a \mid ab\}$, $G_3(z) = z + z^2$
• $S \in \{a \mid ab\}^*$, $G(z) = \frac{1}{1-(z+z^2)}$

Preserving Precision: contains Operation
• S.contains("aba") =? S ∈ .*aba.*
• Modeling contains as a regex is not always precise
  – abababbb matches .*aba.* in more than one way (ab·aba·bbb and aba·babbb), so a direct regex translation counts it more than once
• $G(z) = \frac{z^3}{(1-2z)\,(z^3 + (1-2z)(1+z^2))}$ is exact!
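
The GF slides above can be checked numerically. The following is a minimal sketch, not SMC's implementation (the slides say SMC delegates algebra to Mathematica): it expands each rational generating function from the slides into a power series and reads off the coefficient of $z^n$ as the model count for length n. The helper names `poly_mul`, `poly_add`, `series_coeffs` and `brute_force_contains` are hypothetical and introduced only for this illustration.

```python
from itertools import product


def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists (index = power of z)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add polynomials stored as coefficient lists."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]


def series_coeffs(num, den, n):
    """First n+1 power-series coefficients of num(z)/den(z); needs den[0] == 1."""
    if den[0] != 1:
        raise ValueError("expected a denominator with constant term 1")
    coeffs = []
    for k in range(n + 1):
        value = num[k] if k < len(num) else 0
        # From num = den * series: c_k = num_k - sum_{j>=1} den_j * c_{k-j}
        for j in range(1, min(k, len(den) - 1) + 1):
            value -= den[j] * coeffs[k - j]
        coeffs.append(value)
    return coeffs


def brute_force_contains(pattern, alphabet, n):
    """Count length-n strings over alphabet that contain pattern, by enumeration."""
    return sum(pattern in "".join(w) for w in product(alphabet, repeat=n))


# All strings over {a, b}: G(z) = 1 / (1 - 2z)
all_strings = series_coeffs([1], [1, -2], 5)

# Concatenation {a, b}* . {+, -}: G3(z) = 2z / (1 - 2z)
concatenation = series_coeffs([0, 2], [1, -2], 5)

# Regex {a | ab}*: G(z) = 1 / (1 - (z + z^2))
regex_star = series_coeffs([1], [1, -1, -1], 5)

# contains("aba"): G(z) = z^3 / ((1 - 2z)(z^3 + (1 - 2z)(1 + z^2)))
inner = poly_add([0, 0, 0, 1], poly_mul([1, -2], [1, 0, 1]))
contains_aba = series_coeffs([0, 0, 0, 1], poly_mul([1, -2], inner), 8)

print(all_strings)       # [1, 2, 4, 8, 16, 32]
print(concatenation)     # [0, 2, 4, 8, 16, 32]
print(regex_star)        # [1, 1, 2, 3, 5, 8]
print(contains_aba[:6])  # [0, 0, 0, 1, 4, 11]
```

Comparing `contains_aba` with `brute_force_contains("aba", "ab", n)` for small n gives the same counts, which is what the slide's "Exact!" claims for this closed form. Series expansion by recurrence is used here instead of repeated differentiation because it stays in integer arithmetic.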

SMC: Full Language
RegExp     := character | ε | RegExp RegExp | RegExp|RegExp | RegExp*
Constraint := Var = Var | Var IN RegExp | Var = Var • Var | Var = ConsString
            | contains(Var, ConsString) | strstr(Var, ConsString) | length(Var) ○ Num
○          := < | ≤ | > | ≥ | ≠
Formula    := Formula | Formula OR Formula | Formula AND Formula | NOT Formula | Constraint
Labels on the slide: Core Constraints; Full Constraints - String level; Combining multiple Constraints

SMC Design
[Figure: SMC pipeline. Labels, in extracted order: SMC Constraints, CNF, Constraint Formula Translation, Generating Function Translation, Algebraic Computation, SMC, Generating Function, Evaluate at n, Model Count, Mathematica.]

Case Studies & Applications

Experiments
• Real-world password strength meters
  – 3 websites: Drupal, eBay and Microsoft meters [NDSS '14]
• Quantify leakage in C programs
  – 4 UNIX utilities: obscure, grep, wc, csplit [CCS '13]
  – 2 web servers: Ghttpd, Null HTTPd [ASPLOS '08]
• String constraints from JavaScript applications
  – 18,901 path constraints from 18 real-world apps [SSP '10]
  – 13 iGoogle gadgets and 5 AJAX applications

Application I: Password Strength Meters
Guessing Attacks w/o Dictionaries
[Tables: number of passwords in each strength category at lengths L = 5 and L = 10 for three meters. The cell layout did not survive extraction; values are listed in extracted order.]
• eBay (Invalid, Weak, Medium, Strong): 2.29×10, 2.4×10, 5.53×10, 1.07×10, 2.82×10^14, 1.58×10, 1.45×10^18, 2.14×10; detached exponents: 8, 17, 18, 8, 7, 9
• Microsoft and Drupal (Weak, Medium, Fair, Good, Strong): 0, 1.93×10^9, 3.74×10^18, 0; 0, 0, 0, 1.93×10^9, 2.82×10^14, 1.58×10^17, 3.95×10^18, 0

Application I: Password Strength Meters
Guessing Attacks with Dictionaries
• STRONG PASSWORDS: the smaller the set, the better
• How many of these also exist in JtR Database?
  – Password database dictionary of 3,106 words

Application I: Password Strength Meter
Website     Strength   Total (L = 1..10)
eBay        Invalid    2678
            Weak       413
            Medium     3
            Strong     0
Microsoft   Weak       2640
            Medium     461
            Good       0
            Best       0
Drupal      Weak       936
            Fair       1974
            Good       369
            Strong     3
• Drupal will be more vulnerable to a JtR dictionary attack

Application II: Quantifying Side Channel Leakage
[Figure: a Malicious VM mounts Side Channel Attacks on a Web Server VM that processes a Sensitive INPUT; both run on a shared Hypervisor & Hardware. Execution of the code below depends on ch1 = \n, ch2 = \r|\f, INPUT ∈ (ch1|ch2)*.]
    input = readline(file);
    if (strstr(input, "\n"))
        lines++;
    if (strstr(input, "\r") || strstr(input, "\f")) {
        if (linepos > linelength)
            linelength = linepos;
        linepos = 0;
        words++;
    }
    if (strstr(input, "\t")) {
        linepos += 8 - (linepos % 8);
        words++;
    }
    write_counts(lines, words);

Application II: Quantifying Side Channel Leakage
• ch1 = \n, ch2 = \r|\f
• INPUT ∈ (ch1|ch2)* → SMC → How many? $2^X$
• INPUT ∈ .* → SMC → How many? $2^Y$
• X-Y

Application II: Quantifying Side Channel Leakage
[Figure: bar chart, Path Leakage. y-axis: Leakage in Bits; x-axis: UNIX Utilities (grep, wc, csplit, obscure). Bar labels as extracted: 750.1, 355.3, 179.4, 0.133.]

Tool Evaluation

Result I: Speed
• SMC vs. FuzzBALL [PLAS '09]
Program                        len   SMC       FuzzBALL
Obscure                        6     0.5 sec   2 Hrs
strstr(input, "abc")!=NULL     5     0.4 sec   2 Hrs
strstr(input, "abc")!=NULL     4     0.5 sec   150 sec
match regex(input, "(a|b)*")   4     0.4 sec   2 Hrs
• SMC vs. QUAIL [CAV '13]
Program                        len   SMC       QUAIL
strstr(input, "ab")=2          5     0.2 sec   6.1 sec
strstr(input, "ab")=2          7     0.2 sec   648 sec
input.contains("ab")           5     0.3 sec   5.1 sec
input.contains("ab")           7     0.3 sec   606 sec

Result II: Expressiveness (JavaScript Applications & UNIX utilities)
[Figure: bar chart, Frequency of Constraints, by constraint type (regexes, contains, length, concatenation, comparison with const string) for two series: 18,901 JavaScript Benchmarks and UNIX Case studies. Bar labels as extracted: 43381, 38242, 24, 202, 191, 83382, 89771, 95, 239.]

Result III: Precision
• SMC vs. Castro et al. [ASPLOS '08]
Program      len   SMC Bits   SMC ε-precise   Castro et al.
Ghttpd       620   80.2       0.003           ≈ 248
Null HTTPd   500   248.0      0.002           ≈ 500
• SMC vs. FuzzBALL [PLAS '09]
Program                        len   SMC Bits   SMC ε-precise   FuzzBALL
Obscure                        6     ≈ 0        0.06            > 2 Hrs
strstr(input, "abc")!=NULL     5     22.4       0               > 2 Hrs
strstr(input, "abc")!=NULL     4     23.0       0               ≈ 32
match regex(input, "(a|b)*")   4     13.5       0.13            > 2 Hrs

Conclusion
• SMC: String Model Counter (fast & scalable, expressive, better precision)
• Handles Unbounded Strings
• Practical Applications
  – Quantifying password strength meters
  – Quantifying information leaks via side channels

Related Work
• [Cambridge University Press '09] R. Sedgewick and P. Flajolet. Analytic Combinatorics.
• [JAIR '99] E. Birnbaum and E. L. Lozinskii. The Good Old Davis-Putnam Procedure Helps Counting Models.
• [FOCS '93] A. I. Barvinok. A Polynomial Time Algorithm for Counting Integral Points in Polyhedra When the Dimension Is Fixed.
• [Algorithmica '07] S. Verdoolaege, R. Seghir, K. Beyls, V. Loechner, and M. Bruynooghe. Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions.
• [SSP '09] M. Backes, B. Kopf, and A. Rybalchenko. Automatic Discovery and Quantification of Information Leaks.
• [PLAS '09] J. Newsome, S. McCamant, and D. Song. Measuring Channel Capacity to Distinguish Undue Influence.
• [CAV '13] F. Biondi, A. Legay, L.M. Traonouez, and A. Wasowski. QUAIL: A Quantitative Security Analyzer for Imperative Code.
• [SEN '12] Q.S. Phan, P. Malacaria, O. Tkachuk, and C. S. Pasareanu. Symbolic Quantitative Information Flow.
• LattE Tool. http://www.math.ucdavis.edu/~latte/
• RelSat Tool. http://code.google.com/p/relsat/

Contact
• Loi Luu, Shweta Shinde: {loiluu, shweta24}@comp.nus.edu.sg
• Our SMC tool is available at:
  – https://github.com/loiluu/smc
Thank You!

References
• [FoSSaCS '09] G. Smith. On the Foundations of Quantitative Information Flow.
• [CCS '13] S. Tople, S. Shinde, Z. Chen, and P. Saxena. AutoCrypt: Enabling Homomorphic Computation on Servers to Protect Sensitive Web Content.
• [ASPLOS '08] M. Castro, M. Costa, and J.-P. Martin.
  Better Bug Reporting with Better Privacy.
• [SSP '10] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. A Symbolic Execution Framework for JavaScript.
• [NDSS '14] X. de Carne de Carnavalet and M. Mannan. From Very Weak to Very Strong: Analyzing Password-Strength Meters.
• John the Ripper password cracker. http://www.openwall.com/john
• Wolfram Mathematica. http://www.wolfram.com/mathematica

Backup Slides

Robustness
• Real-world JavaScript benchmarks [SSP '10]
Evaluation Parameters                                 Big Test Cases (variable > 4)   Small Test Cases (variable < 4)
Number of tests                                       1342                            17559
Total running time                                    1h 58 mins                      1h 9 mins
Average no. of constraints                            187                             2.05
Average running time                                  5.29 seconds                    0.24 seconds
Test cases for which SMC reports exact model count    21%                             94%