Interactive deobfuscation
A thrift shop for static deobfuscation

whoami
• Security researcher
• Break stuff, reverse, make them better and break again
• Part of nullsec non profit group

How it all started
blame this person =>
• Presumably a simple crackme
  – Eventually discovered as wb aes
• I wanted to solve it statically
  – Since running things is cheating
  – Goal was to solve it in less than a month
• A race I didn’t manage to win when working statically
• Name is md5’ed
• Serial is transformed / permuted using an unknown function

Challenge archeology
• Overall the crackme was divided into 2 main parts
• Deobfuscation
  – Opaque predicates, lookup tables, value tables and “spaghetti” code
• Cryptanalysis
  – The original cipher was whitebox’ed

Deobfuscation

Deobfuscation – Layer 0
• Found some jmps, decided to map them all
  – find_lookuptables(“Mov <register>, dword ptr [addr*4]”)
  – Add xrefs, define locs
• IDA can’t map them all into graph views (due to size, more RAM == bigger graph)
• After looking a bit, there seems to be some logic and different operations inside them
• However, they all lead to the same path eventually

Deobfuscation – Layer 1
• Removal of jmps and basic block identification
  – All the obfuscation was done in a manner that affects only the bb itself; after a jmp to another table occurred, everything was restored
• Follow_jmps_by_addr(addr) to find bb boundaries
  – Follow jcc until a jmp / push + ret sequence is found
  – Compress it, remove jccs and make one BB
  – In case of xrefs, patch them together

Deobfuscation – Layer 2
• Opaque predicates
• Ops which are used to make the bb bigger
  – Simple rule: operations are per bb and do not exceed it
  – Wrote a simple emulator to emulate bbs and optimize them to simple instructions
• 1 exception: do not touch lookup table values
  – More on this later

Deobfuscation – Layer 3
• Tables, and lots of them
  – Apart from the jmptables which lead the way
• Tables are used as part of the cipher itself
• Key is dismantled inside them (more on this
later)
• Each table has a different role and some are doubled for obfuscation
• FindTables to the rescue

Deobfuscation – Layer 3
• FindTables basically taints memory and looks for reads of 16b tables
• Once it finds one, it defines an array of 0xFF at that addr
• All value tables are mapped this way; their usage, however, varies

Deobfuscation – Layer 4
• Once we have all the code cleaned, we get several consecutive lookup tables
• Loops are unrolled and become normal repetitive ops (per round and state)
• All deobfuscated code was written into a new section called “deobf” to make code reading easier
• It is now time to move on to the cryptanalysis stage

Cryptanal

Cryptanal
• Automating every process is infeasible and far too time consuming
• I decided to split the work into two main stages:
  – Operation identification
  – Key extraction
• Both are used interactively
  – Thus the name interactive deobfuscation

Cryptanal archeology
• Discovered BGE attacks from academia
  – Chow, Xiao
• sysk’s phrack article
• Eventually said FUCK YOU ALL gonna do it myself w/o cryptic math
  – Lack of algebra lessons and focus

Cryptanal – Layer 0
• Actual wb code to encrypt a text
• Loops 9 times, which made me quite frustrated
  – Before discovering it was wb’ed
  – After counting the loops by hand I thought it might be AES
  – But where’s the key?
• LOLWTF? md5(user) == wbaes.dec(serial,user_as_key)
  – No, key must be *embedded*
• LOLWUT? md5(user) == wbaes.d/enc(serial,key) ??
  – Output isn’t ascii so it could be both enc/dec

Cryptanal – Rijndael on a toe
• Several simple operations
  – AddRoundKey, SubBytes, ShiftRows, MixColumns
• Some operations are linear and can swap positions with the op before them
• The key to understanding the attack is to sniff the first round and extract the key
  – Later on I found out that Eloi had made my life harder

rijndael whitebox(rijndael) => evolves into => whitebox(rijndael)
• 1st transformation:
  – ShiftRows is linear, and thus can swap positions with AddRoundKey
  – SubBytes and ShiftRows can also swap positions, as SubBytes applies the same op to every byte
• Let “Linear” aka lin be
  – lin(x) ^ lin(y) == lin(x ^ y)
• 2nd transformation
  – It is possible to transform and “compress” several ops into one
    • By using XORtables and T/Tyi-boxes
  – T/Tyi-box
    • Combine AddRoundKey and SubBytes into one operation (lookup table) to emit 1 byte
    • SubBytes(x ^ k[i])
  – XORtable
    • Transform MixColumns into a series of lookup tables; in particular, these tables are created by XORing one input byte at a time through the MixColumns vector
• 3rd transformation
  – Append external encoding into the keys and lookup tables
  – Replace table values with random ones at each stage
  – 41 => 32, 21 => 56, 12 => 4
  – Let G & F be encoding values
  – G() o AES() o F()
    • Such that G & F cancel each other out eventually
  – The external encoding is what makes the whitebox variant “attack resistant”

Attaq

Attaq 101
• Chow stated that his implementation doesn’t leak any information
  – In reality the XORtables and T/Tyi tables still leak one nibble each time
  – Not very helpful but still something
• Since the external encodings cancel each other out, it might be worth understanding them
  – Hint hint

Attaq!
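The building blocks from the transformation slides can be tried out in miniature before attacking the encodings. What follows is a minimal sketch, not the crackme's code: every function name (`make_tbox`, `encode_table`, `strip_encoding`, `recover_key_byte`, ...) is hypothetical, the key byte 0x2B and the RNG seed are arbitrary, and a plain random byte permutation stands in for Chow-style encodings, which are built differently. It shows that ShiftRows satisfies the lin() rule while SubBytes does not, that a "naked" T-box SubBytes(x ^ k) gives up its key byte, and that knowing the input/output encodings is enough to get back to the naked table.

```python
import random

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

def rotl8(b, n):
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def build_sbox():
    """AES S-box: multiplicative inverse in GF(2^8), then the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):  # x^254 is the inverse of x; 0 stays 0
            inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = build_sbox()
INV_SBOX = [0] * 256
for i, v in enumerate(SBOX):
    INV_SBOX[v] = i

def shift_rows(state):
    """ShiftRows on a 16-byte column-major state: row r rotates left by r."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def sub_bytes(state):
    return [SBOX[b] for b in state]

def xor_bytes(a, b):
    return [x ^ y for x, y in zip(a, b)]

def make_tbox(key_byte):
    """AddRoundKey + SubBytes folded into one table: T[x] = SubBytes(x ^ k)."""
    return [SBOX[x ^ key_byte] for x in range(256)]

def recover_key_byte(tbox):
    """On an unencoded T-box, T[0] = SubBytes(k), so k = InvSubBytes(T[0])."""
    return INV_SBOX[tbox[0]]

def random_bijection(rng):
    """Stand-in encoding: a random permutation of the byte values."""
    perm = list(range(256))
    rng.shuffle(perm)
    return perm

def invert(perm):
    inv = [0] * 256
    for i, v in enumerate(perm):
        inv[v] = i
    return inv

def encode_table(table, f_in, g_out):
    """Encoded table T' with T'[f_in[x]] == g_out[T[x]]."""
    f_inv = invert(f_in)
    return [g_out[table[f_inv[x]]] for x in range(256)]

def strip_encoding(encoded, f_in, g_out):
    """Undo both encodings once they are known, leaving the naked table."""
    g_inv = invert(g_out)
    return [g_inv[encoded[f_in[x]]] for x in range(256)]

if __name__ == "__main__":
    rng = random.Random(1337)
    naked = make_tbox(0x2B)
    f_in, g_out = random_bijection(rng), random_bijection(rng)
    encoded = encode_table(naked, f_in, g_out)
    print("naked T-box leaks:", hex(recover_key_byte(naked)))
    print("encoded T-box guess:", hex(recover_key_byte(encoded)))
    stripped = strip_encoding(encoded, f_in, g_out)
    print("after stripping:", hex(recover_key_byte(stripped)))
```

The S-box is derived from its definition rather than pasted in as 256 constants, so the sketch stays short and self-checking. In the real target the hard part is of course recovering `f_in` and `g_out`, which here are simply assumed to be known.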
• If we look at the input encoding and the output encoding, we know that they cancel each other out
• Thus, if we manage to find the values of the encodings, we’d be left with a “naked” implementation of wbaes
• And then just sniff the first round key and extract the key

Cryptbox
• Let’s look at MixColumns in the T/Tyi table transformations
• In general, it transforms 32-bit values into 32-bit values
• Let P be the input encoding and Q the output encoding
• Now let’s try to approximate the encoding values
• Billet suggests zeroing out two bits out of the 4, building a new lookup table and performing the transformation
• Once we have that, we construct a new lookup table for the reverse operation

whitebox^whitebox
• We get 256 possible bijections which can be used to build up output encoding approximations
• The same operation is done for the input encoding, using the approximation we acquired for Q
• Once we have the external encoding values, we can just sniff the first round key and extract the keys

FIN
• @shiftreduce
• shiftreduce@gmail.com
• Thanks to Eloi for making this challenge
• greetz @ #ecl,#nullsec,inbarr,nirizr,skier_,emdel,over, Mikae, l_inc,