Component-based Synthesis

Sumit Gulwani (MSR Redmond)

Joint work with:
Susmit Jha and Sanjit Seshia (UC-Berkeley)
Ashish Tiwari (SRI)
Ramarathnam Venkatesan (MSR Bangalore/Redmond)

Component-based Synthesis

Problem Definition
Given:
• A library of components, where each component comes with its functional specification
• Functional specification of the desired behavior
Obtain: An appropriate composition of components that achieves the desired behavior.

Applications
Bit-vector Algorithm Synthesis, Deobfuscation

Application 1: Bit-vector Algorithms
• Straight-line programs that use
  – Arithmetic Operators: +, -, *, /
  – Logical Operators: Bitwise and/or/not, Shift left/right
• Challenge: Combination of arithmetic + logical operators leads to unintuitive algorithms
• Application: Provides the most efficient way to accomplish a given task on a given architecture

Examples of Bitvector Algorithms

Turn off rightmost 1-bit: Y & (Y-1)
  10101100 -> 10101000

  10101100   Y
  10101011   Y-1
  10101000   Y & (Y-1)

Turn off rightmost contiguous sequence of 1-bits: Y & (1 + (Y | (Y-1)))
  10101100 -> 10100000

Ceil of average of two integers without overflowing: (X|Y) - ((X⊕Y) >> 1)

P24: Round up to next highest power of 2
  o1 := sub(x,1);
  o2 := shr(o1,1);
  o3 := or(o1,o2);
  o4 := shr(o3,2);
  o5 := or(o3,o4);
  o6 := shr(o5,4);
  o7 := or(o5,o6);
  o8 := shr(o7,8);
  o9 := or(o7,o8);
  o10 := shr(o9,16);
  o11 := or(o9,o10);
  res := add(o11,1);

P25: Higher order half of product of x and y
  o1 := and(x,0xFFFF);
  o2 := shr(x,16);
  o3 := and(y,0xFFFF);
  o4 := shr(y,16);
  o5 := mul(o1,o3);
  o6 := mul(o2,o3);
  o7 := mul(o1,o4);
  o8 := mul(o2,o4);
  o9 := shr(o5,16);
  o10 := add(o6,o9);
  o11 := and(o10,0xFFFF);
  o12 := shr(o10,16);
  o13 := add(o7,o11);
  o14 := shr(o13,16);
  o15 := add(o14,o12);
  res := add(o15,o8);

Application 2: Deobfuscation
• Transform given code into a simpler representation (using components from the given code).
• Important for identifying malware/viruses

Deobfuscation Example: Multiply by 45

int multiply45Obs(int y) {
  a=1; b=0; z=1; c=0;
  while (1) {
    if (a==0) {
      if (b==0) { y=z+y; a=¬a; b=¬b; c=¬c; if (¬c) break; }
      else      { z=z+y; a=¬a; b=¬b; c=¬c; if (¬c) break; }
    } else {
      if (b==0) { z=y<<2; a=¬a; }
      else      { z=y<<3; a=¬a; b=¬b; }
    }
  }
  return y;
}

int multiply45(int y) {
  z=y<<2; y=z+y;
  z=y<<3; y=z+y;
  return y;
}

Deobfuscation Example: Interchange src/dest

InterchangeObs(Ipaddr *s, *d) {
  *s = *s ⊕ *d;
  if (*s == *s ⊕ *d) {
    *s = *s ⊕ *d;
    if (*s == *s ⊕ *d) {
      *d = *s ⊕ *d;
      if (*d == *s ⊕ *d)
        *s = *d ⊕ *s;
      else {
        *s = *s ⊕ *d;
        *d = *s ⊕ *d;
        return;
      }
    } else
      *s = *s ⊕ *d;
  }
  *d = *s ⊕ *d;
  *s = *s ⊕ *d;
}

Interchange(Ipaddr *s, *d) {
  *d = *s ⊕ *d;
  *s = *s ⊕ *d;
  *d = *s ⊕ *d;
}

Dimensions in Program Synthesis
• Functional Specification
  – Pre/Post-conditions, Input-output examples, Inefficient/Related programs
  – Interaction in face of over/under specification
• Search Space
  – Imperative/Functional Programs
    • Operators (Arithmetic/Logical)
    • Control-flow (Straight-Line)
  – Restricted Models of Computation
• Search Technique
  – Constraint Generation
    • Invariant-based, Path-based, Input-based
    • Precise/Abstract/Approximate Operator Encoding
  – Constraint Solving

Functional Specification
• Choice 1: Logical relation between inputs and outputs
• Choice 2: Input-Output Examples

Functional Specification: Logical Relations

Problem: Turn off rightmost 1-bit

Functional Spec of components Subtract, Bitwise-And
  Subtract(I1,I2,J)    := J = (I1-I2)
  Bitwise-And(I1,I2,J) := J = (I1 & I2)

Functional Specification of desired behavior (I is the input, J the output, b the number of bits)
\[
\bigwedge_{p=1}^{b} \left[ \left( I[p]=1 \;\wedge \bigwedge_{j=p+1}^{b} (I[j]=0) \right) \Rightarrow \left( J[p]=0 \;\wedge \bigwedge_{j \neq p} (J[j]=I[j]) \right) \right]
\]

Experiments: Comparison with Exhaustive Search

Program          Brahma           AHA
Name    lines    iters    time    time
P1      2        2        3       0.1
P2      2        3        3       0.1
P3      2        3        1       0.1
P4      2        2        3       0.1
P5      2        3        2       0.1
P6      2        2        2       0.1
P7      3        2        1       2
P8      3        2        1       1
P9      3        2        6       7
P10     3        14       76      10
P11     3        7        57      9
P12     3        9        67      10
P13     4        4        6       X
P14     4        4        60      X
P15     4        8        119     X
P16     4        5        62      X
P17     4        6        78      109
P18     6        5        46      X
P19     6        5        35      X
P20     7        6        108     X
P21     8        5        28      X
P22     8        8        279     X
P23     10       8        1668    X
P24     12       9        224     X
P25     16       11       2779    X

Functional Specification

Problem: Turn off rightmost contiguous string of 1-bits
• Logical Relations
  – A bit complicated
• Input-Output Relations
  – Key challenge is to resolve ambiguity
  – Our solution: Interaction with user

Dialog: Interactive Synthesis

Problem: Turn off rightmost contiguous string of 1's

User: I want a design that maps 01011 -> 01000
Oracle: I can think of two designs
  Design 1: (x+1) & (x-1)
  Design 2: (x+1) & x
  which differ on 00000
  (Distinguishing Input)
  What should 00000 be mapped to?
User: 00000 -> 00000

Dialog: Interactive Synthesis

Problem: Turn off rightmost contiguous string of 1's

User: 01011 -> 01000
Oracle: 00000 ?
User: 00000
Oracle: 01111 ?
User: 00000
Oracle: 00110 ?
User: 00000
Oracle: 01100 ?
User: 00000
Oracle: 01010 ?
User: 01000
Oracle: Your design is x & (1 + ((x-1)|x))

Synthesizing Inputs for Dialog with User
• Distinguishing Input construction is a bit expensive.
• We tried two optimizations
  – Interleave with random inputs.
    • Overall end-to-end performance even worse.
  – Interleave with biased random inputs.
    • Performs best.

Biased Random Input Selection
• Theorem: If a circuit uses only add/subtract/and/or/not operators, then the ith bit of an output depends only on the ith bit of the inputs and the bits to the right of it.
• Biased Random Strategy:
  – Choose a random input whose rightmost bits are different from the ones that have already been queried for.
  – For example, if 3 inputs of the following form have been queried
      r1 0 0
      r2 0 1
      r3 1 0
    then choose the 4th input to be of the form
      r4 1 1

Experiments: Random vs Biased-Random

         Random           Biased-Random
Prog.    Time    Iters    Time    Iters
P1       1       5        1       3
P2       7       11       5       7
P3       2       8        1       4
P4       2       11       1       6
P5       4       8        2       6
P6       6       23       2       4
P7       1       5        1       5
P8       2       11       1       6
P9       5       10       5       6
P10      14      14       3       9
P11      24      16       14      9
P12      279     24       46      10
P13      33      9        7       3
P14      14      25       4       7
P15      168     7        14      4
P16      67      10       19      6
P17      217     17       21      6
P18      229     19       26      4
P19      164     13       65      5
P20      214     17       63      6
P21      1074    15       272     6
P22      X       X        186     9
P23      24      9        12      5
P24      12      4        3       2
P25      X       X        1       9

Conclusion: Component-based Synthesis

Problem Definition
Given:
• A library of components, where each component comes with its functional specification
• Functional specification of the desired behavior
Obtain: An appropriate composition of components that achieves the desired behavior.

Inspiration
• Standard process of knowledge discovery
• Modular development
• Can it help with modular synthesis?
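
The bit-vector examples from the earlier slides can be checked mechanically against naive reference implementations. The following is a minimal sketch, not part of the talk: it assumes 8-bit words (matching the 10101100 example) for the short identities and 32-bit words for P24, and all function names are made up for illustration.

```python
MASK8 = 0xFF          # assumption: 8-bit words, as in the 10101100 example
MASK32 = 0xFFFFFFFF   # assumption: 32-bit words for P24


def turn_off_rightmost_one(y):
    """Y & (Y-1)"""
    return y & (y - 1) & MASK8


def turn_off_rightmost_run(y):
    """Y & (1 + (Y | (Y-1)))"""
    return y & (1 + (y | (y - 1))) & MASK8


def ceil_avg(x, y):
    """(X|Y) - ((X xor Y) >> 1): ceiling of the average, no wide intermediate sum."""
    return ((x | y) - ((x ^ y) >> 1)) & MASK8


def round_up_pow2(x):
    """P24: smear the highest set bit of x-1 to the right, then add 1."""
    o = (x - 1) & MASK32
    for shift in (1, 2, 4, 8, 16):
        o |= o >> shift
    return (o + 1) & MASK32


def spec_rightmost_one(y):
    """Reference: scan from bit 0 and clear the first 1-bit found."""
    for p in range(8):
        if (y >> p) & 1:
            return y & ~(1 << p) & MASK8
    return y


def spec_rightmost_run(y):
    """Reference: skip trailing 0s, then clear the following run of 1s."""
    p = 0
    while p < 8 and not (y >> p) & 1:
        p += 1
    while p < 8 and (y >> p) & 1:
        y &= ~(1 << p)
        p += 1
    return y & MASK8


if __name__ == "__main__":
    # Exhaustive comparison over all 8-bit inputs.
    for y in range(256):
        assert turn_off_rightmost_one(y) == spec_rightmost_one(y)
        assert turn_off_rightmost_run(y) == spec_rightmost_run(y)
        for x in range(256):
            assert ceil_avg(x, y) == (x + y + 1) // 2
    print(format(turn_off_rightmost_run(0b10101100), "08b"))  # 10100000
```

Exhaustive checking is only feasible here because the words are small; it stands in for the logical specification and is not how the synthesizer itself verifies candidates.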
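
The interactive dialog can be illustrated with a toy loop over 5-bit words. This is a sketch under strong assumptions: the candidate pool is a hypothetical hand-written list of three designs (the real search space is all compositions of library components), and distinguishing inputs are found by brute-force enumeration rather than by constraint solving. The names `CANDIDATES`, `user`, and `synthesize` are illustrative only.

```python
WIDTH = 5
MASK = (1 << WIDTH) - 1

# Hypothetical candidate pool; a real synthesizer would search over
# compositions of components instead of a fixed list.
CANDIDATES = {
    "(x+1) & (x-1)": lambda x: (x + 1) & (x - 1) & MASK,
    "(x+1) & x": lambda x: (x + 1) & x & MASK,
    "x & (1 + ((x-1)|x))": lambda x: x & (1 + ((x - 1) | x)) & MASK,
}


def user(x):
    """Stand-in for the user: turn off the rightmost contiguous run of 1s."""
    p = 0
    while p < WIDTH and not (x >> p) & 1:
        p += 1
    while p < WIDTH and (x >> p) & 1:
        x &= ~(1 << p)
        p += 1
    return x & MASK


def consistent(f, examples):
    """True if design f agrees with every input-output example so far."""
    return all(f(i) == o for i, o in examples.items())


def distinguishing_input(f, g):
    """Smallest input on which f and g differ, or None if they agree everywhere."""
    for x in range(1 << WIDTH):
        if f(x) != g(x):
            return x
    return None


def synthesize(oracle, examples):
    """Query the oracle on distinguishing inputs until one behavior remains."""
    examples = dict(examples)
    while True:
        alive = [(n, f) for n, f in CANDIDATES.items() if consistent(f, examples)]
        if not alive:
            return None, examples
        name, f = alive[0]
        for _, g in alive[1:]:
            x = distinguishing_input(f, g)
            if x is not None:
                examples[x] = oracle(x)  # ask the user about this input
                break
        else:
            # No surviving design differs from f on any input.
            return name, examples


if __name__ == "__main__":
    design, seen = synthesize(user, {0b01011: 0b01000})
    print(design)
    print({format(i, "05b"): format(o, "05b") for i, o in seen.items()})
```

Starting from the single example 01011 -> 01000, the first query is 00000, the same distinguishing input as in the dialog; later queries depend on the candidate pool and so differ from the ones shown on the slide.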
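
The biased random strategy can be sketched as follows. The slides give only the two-bit example, so the details here are assumptions: the function looks for the smallest number k of rightmost bits for which some pattern has not been queried yet, picks one such pattern at random, and fills the remaining high bits randomly. The name `biased_random_input` and its parameters are hypothetical.

```python
import random


def biased_random_input(queried, width, rng):
    """Return an input whose k rightmost bits differ from every queried input.

    Assumption: use the smallest k for which an unqueried low-bit pattern
    exists. Enumerating all 2**k patterns is fine only for small widths.
    """
    queried = list(queried)
    for k in range(1, width + 1):
        low_mask = (1 << k) - 1
        seen = {q & low_mask for q in queried}
        unseen = [s for s in range(1 << k) if s not in seen]
        if unseen:
            low = rng.choice(unseen)
            # Random high bits (the "r4" part in the slide's example).
            high = rng.getrandbits(width - k) << k if width > k else 0
            return high | low
    return None  # every input of this width has already been queried


if __name__ == "__main__":
    rng = random.Random(0)
    # Queried inputs end in 00, 01 and 10, as in the slide's example.
    x = biased_random_input([0b10100, 0b01001, 0b11110], 5, rng)
    print(format(x & 0b11, "02b"))  # 11
```

The motivation follows from the theorem: for circuits built from add/subtract/and/or/not, low-order output bits depend only on low-order input bits, so covering new low-order patterns exercises behavior that earlier queries could not have revealed.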