>> Francesco Logozzo: Good afternoon. For me it's a pleasure to introduce our speaker
today, Roberto Giacobazzi, a professor at the University of Verona. As you know,
Roberto is one of the most important researchers in static analysis and in abstract
interpretation. He's famous for his work with Francesco. He'll give a talk on Thursday on
domain theory and abstract interpretation, particularly on the concept of completeness.
And he's also famous because he was my very first professor at the university, the one
who taught me about program verification and weakest preconditions.
>> Roberto Giacobazzi: Yeah.
>> Francesco Logozzo: He's young because it was not so long ago.
>> Roberto Giacobazzi: I know. [Inaudible]
>> Francesco Logozzo: He's relatively young, yes. And, yeah, that was back when I was at the
University of Pisa, before he moved to the University of Verona. I had him for one
year and he never taught me about abstract interpretation. And then we found each other
again later. So, thank you very much.
>> Roberto Giacobazzi: Okay, thank you, Francesco. Thank you to all of you. So today I
will try to introduce the notions of completeness and incompleteness in abstract
interpretation, but I will do it in such a way that the interpretation of these two notions
is more in language-based security than in program analysis. But I think that
the notions are basically the same, because it is about the precision of an abstraction,
the precision of a procedure that tries to learn what the program does. And what we will
see is that changing the program in order to make this analysis imprecise is like
obfuscating, hiding information. And refining the analysis in order to get the information
from the program is like attacking the code.
So these two, this cat-and-mouse battle, is exactly the battle that happens
in security, from the language-based approach of course. The scenario, quickly: showing this
slide here is like saying something obvious. I mean, there is a line that goes from
mainframe to ubiquitous computing. And this puts things in a context where typically you cannot
always trust the environment where your program runs. So the standard crypto
assumption is that the perimeter of defense is around the software, at least around the
software. Alice and Bob try to communicate and the attacker tries to listen in the middle. So I
want to hide the information, but I cannot hide the fact that the message exists. Indeed,
crypto doesn't hide the fact that the message exists; it hides the content of the message. I
will try to interpret completeness and incompleteness, namely the precision of an analysis, in
the context of white-box attacks, or white-box cryptography, which is more related to the
ubiquitous nature of software nowadays.
The fact is that Alice produces her information, but she cannot completely trust, first,
that it is Bob who will run it, and secondly, that the environment that Bob uses can be
trusted. So basically I will be in the context of a man-at-the-end attack. When
Alice delivers the software, at the end there can be somebody that tries to do
complete reverse engineering and crack the information that the program contains. So
this is the context we try to approach. And this is basically how these things are
handled in reality: namely, there is an adversary. This is the asset that I want to protect.
There is a sensor that tries to see whether this asset has been attacked. There is a
control system that activates the defense. This is typical in tamper-proofing, which is a
kind of software that reacts to [inaudible], or in code [inaudible], marking, fingerprinting
and so on. Well, this has quite a value in the market. And the interesting thing, I think,
and this is the line of my most recent research, is trying to see whether behind these
different bubbles there is a common path or ground, which can be linked to the precision
of the analysis by viewing the analysis as the process of attacking the code.
And this is basically the picture, because typically in black-box cryptography we have an
input-output view, but we cannot see much about the inside of the running of the code. We
can weaken this, and the more the attack on the code tells about the internal
runs of the program, the more we move from the black box to the
white box. And this is something like making the analysis more and more precise about
the behavior of the program. Going along these lines corresponds exactly to
refining the abstraction. So basically, if I want to interpret gray-box crypto, white-box
crypto and black-box crypto, I can say that, "Well, this is a standard input-output
abstraction," and that's the identity [inaudible]. So in the middle there are levels of obscurity
that I can have, and for each of them there will probably be a reaction or a protection
system that my code has to deliver in order to defeat that attack. I want to link these two.
So I want to link the precision of the attacker with the fact that the program
can be transformed in order to defeat that abstraction.
Okay, what is this? It looks like a picture of the beginning of the universe, the very early
seconds of the universe. If you look, the picture looks like this. But it's not; it's a chess
board. So what's the difference between these two? This one is absolutely obscure. Here we
have information. What kind of information? Well, we know the pieces on the chess
board, so we know how many of them, what type and so on. The relation between these
two has to be understood with respect to the eyes, to the perception we have. So the
analysis, our view over this, is able to extract something: colors, shadows. And here, it is
able to extract more. I want to use this analogy to do the same on the software,
on the code, with respect to an analysis which will be an abstract interpretation.
So we need a model and, well, of course, it's the standard model that we all know. It's too
complicated, too complex, undecidable. [Inaudible] showed us that this is not recursive
in general, so it's absolutely complicated. So this is a complete mess and, well, we need
abstraction. Abstraction means that we don't have a precise
definition of each single transition of the traces, but we have an approximation of these. And this
should be computable. In this step we have a loss of precision. We all know that, for
instance, if we take the interval of the maximum and the minimum of the traces
computed, we get an interval that contains many [inaudible] traces that don't exist in the real
execution. And we can set up a logic around this and have a logic over abstract traces.
There was an interesting old paper in the nineties that links model checking and abstract
interpretation: analyzing [inaudible] is model checking of an abstract interpretation.
And then it may well happen that we deal with imprecision:
we think that this is the interval computed, but
in reality the true interval at the end is much smaller than the interval
computed by the analysis, which is bigger. So we have a loss of precision.
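In the standard notation, this loss of precision can be written as follows (a sketch; F is the concrete transformer, F-sharp its abstract counterpart, alpha the abstraction):

    \[ \alpha(\mathrm{lfp}\,F) \;\sqsubseteq\; \mathrm{lfp}\,F^\sharp \qquad \text{(soundness: the analysis over-approximates)} \]
    \[ \alpha(\mathrm{lfp}\,F) \;=\; \mathrm{lfp}\,F^\sharp \qquad \text{(completeness: no precision is lost)} \]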
And incompleteness means that the analysis loses precision, with an error. Okay, from the
early definitions of abstract interpretation in Cousot and Cousot '77 and '79, there has been a
flourishing of works that deal with precision: Steffen, Mycroft, and then myself,
Francesco and Francesca tried to solve the problem once and for all. And we proved
that indeed it is possible to refine an abstraction with respect to any [inaudible]
continuous function, namely any computable function, in the least possible way so
as to make it complete. And then we tried to apply this little result to many
aspects, and language-based security is the one that we will try here. The scenario that I've
shown you is the area where I've tried to show this application. What are the ingredients
of [inaudible]?
So the ingredients are the standard ones: abstraction. Abstraction I think most of you
know very well. I use the standard formalization from abstract interpretation: a
pair of functions that take a concrete object, abstract it into a property, and then
concretize it back to something which is above, which is the error made in the
abstraction. And this corresponds exactly to seeing an abstract domain, or to seeing a subset
of the concrete that contains only the points that represent the abstract objects perfectly
[inaudible]. And this means that basically an abstract domain is nothing else than an
operation that takes a concrete object, maps it somewhere above,
which is its approximation, and then is stuck there, because once you lose information you
cannot recover it any more. This is an [inaudible] closure operator. So the lattice of all
[inaudible] closure operators is the lattice of all possible abstractions.
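In the usual formalization (a sketch of the standard definitions), the pair of functions is a Galois connection, and its composition is the closure operator just described:

    \[ \alpha : C \to A, \quad \gamma : A \to C, \qquad \alpha(c) \sqsubseteq a \iff c \leq \gamma(a) \]
    \[ \rho = \gamma \circ \alpha \ \text{ is monotone, idempotent } (\rho \circ \rho = \rho) \text{ and extensive } (c \leq \rho(c)) \]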
And this is pretty nice because there you can play the game of transforming closures,
which means transforming domains. So when you do this standard approximation you
typically inject an error, because you compute in the abstract instead of computing in the
concrete. And the error you make corresponds basically to being sound but not complete.
And the error can be propagated in a fixpoint, and this is what happens typically. This
would be the true abstraction of the true computation; if you compute in the abstract,
you get an object which is an over-approximation.
What does soundness mean? The standard soundness that we know is the following: well, you
typically have a function that computes from x to f of x. But in the abstract domain you
don't have x; you have a property of x, so you have the approximation of x. Then you
compute the function. And then you need to go into the domain of abstract
objects, so you approximate the result. So in the abstract domain you compute this. In
the concrete domain you compute this. So in this
case you are sound, because you are above the approximation of the true result,
which is this. If these two collapse, you are complete. This is called backward
completeness, namely by approximating the input you don't lose precision in the
computation. Typical example: the rule of signs. The rule of signs is complete with
respect to multiplication but is incomplete -- it's sound but not complete --
with respect to addition, because you lose the magnitude of the numbers. So once you have
a positive and a negative and you want to multiply, you get exactly a negative.
But if you make the addition of the two -- once you've lost the magnitude of the numbers,
you don't know any more which of the two was prevailing. So you can only say, "I don't
know" [inaudible].
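A minimal executable sketch of this example (the names alpha, abs_mul, abs_add are illustrative; the domain is the plain four-value sign lattice, bottom omitted for brevity):

    # Rule-of-signs domain over sets of integers; a sketch, not the talk's slides.
    NEG, ZERO, POS, TOP = "-", "0", "+", "T"

    def alpha(s):
        """Abstract a non-empty set of integers to its sign."""
        if all(n > 0 for n in s): return POS
        if all(n < 0 for n in s): return NEG
        if all(n == 0 for n in s): return ZERO
        return TOP

    def abs_mul(a, b):
        if ZERO in (a, b): return ZERO
        if TOP in (a, b): return TOP
        return POS if a == b else NEG

    def abs_add(a, b):
        if TOP in (a, b): return TOP
        if a == ZERO: return b
        if b == ZERO: return a
        return a if a == b else TOP  # pos + neg: the magnitude is lost

    xs, ys = {1, 2}, {-5, -4}
    # Multiplication: abstracting the inputs first loses nothing (backward complete).
    assert abs_mul(alpha(xs), alpha(ys)) == alpha({x * y for x in xs for y in ys})  # both "-"
    # Addition: every concrete sum is negative, but the abstract sum is "don't know".
    print(alpha({x + y for x in xs for y in ys}))   # "-"
    print(abs_add(alpha(xs), alpha(ys)))            # "T": incomplete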
Forward completeness is perfectly the dual. In this case, instead of looking at whether you lose
precision by approximating the object in the input with respect to what is computed, you
see whether you lose precision by approximating the output. So you assume that the input
is abstract. And then what happens is that you are incomplete when you have an
error between abstracting the output and having the concrete output. It's perfectly dual.
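With rho the closure and f the concrete operation, the two notions are usually written as follows (a sketch in the standard notation):

    \[ \text{backward completeness:} \quad \rho \circ f \;=\; \rho \circ f \circ \rho \]
    \[ \text{forward completeness:} \quad f \circ \rho \;=\; \rho \circ f \circ \rho \]

Backward says that abstracting the input first changes nothing once the output is abstracted; forward says that f maps abstract elements to elements already in the domain.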
Look at this example; this is a classical example to show these two notions. Being
abstract and concrete is a relative notion, so you can be an abstraction of something which is
more concrete than another and so on. So consider that this is your concrete domain; it's
a simple lattice of intervals. And take this abstract domain, the one with the
red bullets. Take the square operation. The square operation is computed with the blue
arrows here. This domain, which says, "I don't know the number. It's positive. It's between
0 and 10," is forward complete but not backward complete.
Why? Being backward complete means that you don't lose
precision by approximating the input of the function. So if you approximate [0, 2], you get
here. Then you do the square and you get here. While if you don't approximate the
input and you do the square, you get here. This is the error made here; it is not
backward complete, but it is forward complete: all these points that are the output of
the function square are already inside the abstract domain. So basically,
being backward complete means the domain contains the inverse image of the function with
respect to which I want to be complete. This is linked with the counterexample-guided
refinement algorithm that tries to refine the partitions going backward by the precondition.
The only difference is that we proved this in the year 2000 and Clarke did it in 2002.
Sorry?
>>: And [inaudible] backward complete, what does that give you with respect to the
concrete semantics?
>> Roberto Giacobazzi: The fact that, with respect to the approximation of the whole
computation -- if you compute in the concrete and then you approximate the output, or
you compute in the abstract, you get the same. This is backward completeness.
>>: But how does that help static analysis?
>> Roberto Giacobazzi: Well, it's the best you can do; you cannot get better. You
don't have false alarms.
>>: With respect to the abstract domain?
>> Roberto Giacobazzi: Yeah. Conversely, there are domains that are backward
complete and not forward complete, and this is all dual stuff. So what we proved is that
we can modify domains. Any...
>>: Maybe it's interesting to say that the trivial abstraction is trivially complete, both
backward and forward.
>> Roberto Giacobazzi: And forward. Yeah, of course. The concrete semantics is
perfectly complete. We can modify domains, namely -- This is a case of completeness.
You see that x is approximated here, then computed and approximated there, so the two
elements collapse exactly to the same point. This is incompleteness. When this
happens, there is an error here due to the approximation. Well, in this case, if
you have an incomplete abstraction, you can make it complete by adding points
-- you refine your abstraction -- or by eliminating points, and then you simplify your abstraction.
Typically in static analysis we refine, because we look for a more precise domain that is
able to avoid false alarms. But you can also avoid false alarms by removing information,
which is simplification.
>>: But it might have more false alarms with respect to the concrete semantics.
>> Roberto Giacobazzi: You don't have more false alarms, because you are complete.
You are less precise with respect to the property: you don't have the same
property any more; you lose the property you want to look for. But you remove the presence of
false alarms.
>>: [Inaudible] abstract domain [inaudible]...
>> Roberto Giacobazzi: Yeah, the property is the abstract domain. And this was proved
-- well, actually it was from '98 -- but basically a backward problem can always be
transformed into a forward problem by considering the inverse function with respect to
which you want to become complete. Amazingly, we can also modify programs, not only domains.
So until now we have a domain and we have a program, and we want to refine the domain, or
simplify the domain, to avoid false alarms for that program. But we can also keep the domain
fixed and change the code, the program, in order to be complete for that domain.
Well, it's possible theoretically. Basically, this is a case of incompleteness, and in order to
become complete you simply have to transform the function into the closest one
from above, or the closest one from below, that is complete for that abstraction.
And this is very easy, because you can compose the function with the abstraction itself or
with the adjoint of the abstraction.
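A sketch of why composing with the abstraction works, using only the idempotence of the closure rho:

    \[ \rho \circ (f \circ \rho) \;=\; \rho \circ f \circ \rho \;=\; \rho \circ (f \circ \rho) \circ \rho, \quad\text{so } f \circ \rho \text{ is backward complete for } \rho; \]
    \[ (\rho \circ f) \circ \rho \;=\; \rho \circ f \circ \rho \;=\; \rho \circ (\rho \circ f) \circ \rho, \quad\text{so } \rho \circ f \text{ is forward complete for } \rho. \]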
Okay, so how does all this fit into security? Let's say static analysis is a way of
attacking code, and code transformation towards obfuscation is a way of protecting code.
We go back to the picture. So basically, what is an obfuscator? An obfuscator is a
compiler, or a bad student writing code. Typically you have your input-output and you
want to keep your input-output, and you want a transformation from this code, where
everybody can understand that what is inside here goes there, to code where nobody can
understand what's happening inside. This has to be a compiler. But true hackers actually
do not perform compilation; they really add junk, reorder code. They do very weird stuff
at the machine level. So the idea is that I want to see how this transformation, tau, can
be systematically derived from the precision, in terms of completeness, of the attacker,
and how this can be done.
So typically attackers use [inaudible] many tools, like GDB and so on, colluding
attacks, differential attacks. There are many ways of attacking code, of doing reverse
engineering and understanding how it works. And most of them use tools that are based
on analysis. So the objection that, well, your way of viewing the relation between
attack and defense is strictly related to analysis and doesn't consider the human
capability of understanding code in the attack, is only partially true, because in reality, for
industrial-size code, reverse engineering cannot be done without a tool based on
analysis, which can be a slicer, a debugger or whatever.
So if you are able to defeat an analysis, you automatically delay a lot the power of an
attacker in understanding the behavior of the code. So this is the idea, basically. The
malicious user has a lens, so he cannot really see everything; he can only see a portion,
an abstraction, of the execution. And the obfuscation wants to make this malicious user
blind. So basically the defense has to turn this into this, and the attacker has to do the
reverse. And I will use some ideas from many years ago by Neil Jones; indeed, this is a
paper that we did together last year. And it's interesting because we said obscuring code
is compiling. Well, you can specify a compiler, at least at the level of specification, as
the specialization of an interpreter.
Because if this is your source code that you want to make obscure, well, we all know
that the source code is equivalent to the specialization of an interpreter with the source
code. And if you want to keep the input-output of the program, it's enough to find any
interpreter for your language and a specializer, and make this combination. But in
most cases you inherit almost completely the structure. So basically, if this is clear, then
this is clear too. The challenge is to make this obscure, namely, to twist something inside
here in order to make it obscure, and to link the twisting of the object inside here to the
power of the attacker. So, look, this is a little program, and this is another program that
computes exactly the same thing. What's the difference between these two? Well, it's obvious.
This is the true code. This is the flattening of the code. If you take the [inaudible] of this
program, it's completely flat.
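As an illustrative reconstruction (not the actual slide), here is what such a pair of programs looks like: the same loop, first with its ordinary control flow and then flattened, with a dispatcher on a program-counter variable:

    def original(n):
        s, i = 0, 0
        while i < n:
            s += i
            i += 1
        return s

    def flattened(n):
        # All control flow is routed through pc; the graph of blocks is flat.
        s, i, pc = 0, 0, 1
        while pc != 0:
            if pc == 1:            # loop test
                pc = 2 if i < n else 0
            elif pc == 2:          # body: s += i
                s, pc = s + i, 3
            elif pc == 3:          # body: i += 1
                i, pc = i + 1, 1
        return s

    assert original(10) == flattened(10)   # same input-output behavior

Here the dispatcher is trivial, and since pc is assigned only constants, a good specializer can recover the original structure, which is exactly the point that follows.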
And everything is handled by the program counter, which is static here, statically
written inside the program itself. So if you have a good specializer, typically the
specializer doesn't return this to you. It's able to understand that the program counter
can be statically derived. So if you apply this equation, you get back to the original. So how can
I make this equation generate this instead of that? This is related to completeness, and I will
show you how. So the attacker -- in order to understand this, we have to understand
what the attacker is. The attacker is an abstract interpretation. So imagine that you have
the previous approximation; you have a function WhichChess that tells which chess pieces are
on the board. You have an abstraction that takes an image and returns another image; for example,
the strange image where we cannot recognize whether it's the origin of the universe or a chess
board. This is an abstraction, because this contains this and many other images of
course.
Then you have a function that counts an upper bound on the number of different types of
chess pieces on the board. Here you have a case of incompleteness, because if you
approximate the image with this -- so if you approximate the input -- WhichChess is only
able to say, "Well, there could be all the kinds of chess pieces on the
board," black and white. So I can produce 12. While instead, if I have the true image, then
I get 7. So moving from this picture to that picture is an incompleteness. And from the
perspective of our eyes, it's an obfuscation. So does it work the same on programs?
Yes.
From my point of view, obfuscating is making an abstract interpreter incomplete. So the
attacker is an abstract interpreter, whatever abstraction it considers, and losing precision
is like returning a maximal amount of false positives, namely failing in the capability of
extracting the true information. And this can be proved by simple reasoning. Well,
basically, if you want to keep the input-output, the transformed code has to have the
same input-output as the original one. Assume that an abstraction is complete. Then if
you compute the abstraction of the semantics, this is equivalent to computing the
abstract interpretation of the program, so you don't lose precision by analyzing. Well,
you obfuscate when you transform the program in order for the analysis to lose some information. This
happens if and only if the transformed code is incomplete for that abstraction; if and only
if.
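Schematically (a sketch, with tau the transformation, rho the attacker's abstraction, the double brackets the semantics, and the io subscript the input-output behavior):

    \[ \tau \text{ obfuscates } P \text{ for } \rho \iff \llbracket \tau(P) \rrbracket_{io} = \llbracket P \rrbracket_{io} \ \text{ and } \ \rho(\llbracket \tau(P) \rrbracket) \;\sqsubset\; \llbracket \tau(P) \rrbracket^{\rho} \]

that is, input-output behavior is preserved while tau(P) is incomplete for rho.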
So losing precision in transforming code is precisely the same as saying that the
transformed program is incomplete for that abstraction. Well, this happens also in static
analysis, because if you compile your code, transform your code, it may well happen that
the same analysis doesn't work any more in the same way; what's happening
there is that the transformation obfuscated the analysis. Let's go back to the example of the
rule of signs. The rule of signs is, we said, complete for multiplication; we all know it. So if
you approximate the input with the sign, you get precisely the sign of the output with no
loss of precision. But it is incomplete with respect to addition.
So if you have a little program, one line of code that makes a multiplication, how
can you obfuscate it with respect to the rule of signs? It's very simple: you transform the
multiplication into an iteration of additions. You keep the same input-output, but the static
analysis -- which is of course very poor, the abstract interpretation of the rule of signs --
fails in extracting the sign of the code. So this is a
transformation that keeps the input-output but obfuscates the analysis. What we will try
to see now is how to derive this transformation systematically from the property that I
want to make obscure, blind.
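A minimal sketch of the transformation and of what the rule of signs sees (the abstract operations are illustrative; note that the plain sign lattice has no element for "zero or positive"):

    def abs_add(a, b):          # sign addition
        if "T" in (a, b): return "T"
        if a == "0": return b
        if b == "0": return a
        return a if a == b else "T"

    def abs_mul(a, b):          # sign multiplication: exact
        if "0" in (a, b): return "0"
        if "T" in (a, b): return "T"
        return "+" if a == b else "-"

    def join(a, b):
        return a if a == b else "T"

    # Original program r := x * y, analyzed with sign(x) = sign(y) = "+":
    print(abs_mul("+", "+"))    # "+": the sign is extracted exactly

    # Obfuscated program: r := 0; repeat y times: r := r + x.
    # Abstract fixpoint of the loop, joining r over all iterations:
    r = "0"
    while True:
        nxt = join(r, abs_add(r, "+"))
        if nxt == r:
            break
        r = nxt
    print(r)                    # "T": the join of "0" and "+" climbs to top

The concrete input-output is unchanged, but the analysis can no longer say anything about the sign of the result.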
Well, we tried this with my little group, and we observed that most tools used by
attackers correspond to abstract interpretations. Profiling: abstract the memory over
particular variables. Tracing, slicing, monitoring, decompilation, disassembly can all be
formalized as abstract interpretations. So if each of these is an attack strategy against
the code, then I can derive from each of them a transformation of the code that makes
that attack blind. Okay, how?
We all know that good programs are well structured and have concise invariants.
Obfuscated programs should be very badly structured and have very ugly invariants,
incomprehensible, or at best ones where you basically say, "I don't know what's happening in that
program." So there is a conflict between being well written and being obfuscated, of course. There is
interesting stuff around the idea of deriving a compiler by specializing an interpreter. The
following two aspects hold: the first is that the program you obtain in this way inherits
the algorithm of the source code, so the algorithm remains basically the same. What
changes is the programming style, which is inherited from the interpreter. So when you
have some code and you specialize an interpreter with that code, you inherit the algorithm of
your source, but the programming style is taken from the interpreter. So if I want to
obscure my code, I have to twist the interpreter in order to change the programming
style in such a way that the analysis becomes blind.
So I have to derive a distorted interpreter. Well, from the interpreter I have to move to an
interpreter which is distorted but is still an interpreter for my language. An example:
let's see this with two examples. The first is flattening. Flattening is a pretty well
established technology -- actually, I think the very first patent around this was by
Microsoft in 1992, so we go back a while. And they were hiding in this flattening the key for the
use of the program, as a way of activating the code, because the order of the
blocks becomes relevant in order to activate the code. It's an interesting patent to learn from.
Well, actually, the technology of flattening is much more developed now, and there is a company,
Cloakware -- which is now completely [inaudible] by Irdeto in Canada, a big multinational
company doing security -- that basically made flattening their core
business.
The flattening idea, simplified, is the following: you have your control flow graph, you
flatten it, and you have a dispatcher that decides which block goes into execution. Of
course, all the complexity is moved from the control flow graph to the dispatcher. The
dispatcher can be very complicated and become flow sensitive, so if you input some data,
the sequence of blocks changes. For the same data you may have changes of the control
flow, because basically blocks are redundant and so on. But it is
flattening. So it works very well with this example, because if you take
the program that I showed you before: this is the original code, this is the flattened code.
The dispatcher here is very basic; it's basically the program counter.
These two are exactly in correspondence with what? With the source program and the
specialization of an interpreter with this code. Look at the interpreter. The interpreter is
by itself flattened code, because you have [inaudible], the code [inaudible] goes back to the
same loop. How? Well, if I take this program and I specialize a little interpreter for C, I
don't get that, because the control flow here is static. So I can predict the next program
counter perfectly, and once I predict it, the specialization does a little pass-through
evaluation and generates the true code.
This should not happen because otherwise I get back the original code, and I want an
obfuscated one. So how can I make it? Well, you take the interpreter, and if you force the
program counter to be dynamic, so that the specializer cannot understand it -- it's forbidden for
the specializer to understand and analyze the program counter -- then automatically the
specialization generates for you a flattened program. So by specializing this interpreter
with the original code, forcing the program counter to become dynamic, you get an
automatically flattened program.
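In partial-evaluation notation this reads (a sketch; int is an interpreter, spec a specializer, and the first line is the standard interpreter and mix equations):

    \[ \llbracket P \rrbracket(d) \;=\; \llbracket \mathit{int} \rrbracket(P, d), \qquad \llbracket\, \llbracket \mathit{spec} \rrbracket(\mathit{int}, P) \,\rrbracket \;=\; \llbracket P \rrbracket \]
    \[ \llbracket \mathit{spec} \rrbracket(\mathit{int}_{\text{pc dynamic}}, P) \;=\; \text{a flattened program equivalent to } P \]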
Then, if you twist the interpreter further, you add a very complicated homomorphically encrypted
function around the program counter, and you get a much more
complicated way of flattening the program, making it more and more secure. But why
is this true? Namely, why is making this dynamic related to the
attack, and how? Because this looks like a trick: I have an interpreter. I force the program counter
to become dynamic. Automatically, it returns me the flattened code. Where is the attack
there? We proved a theorem that says you are forced to be dynamic if and only if you
want to make incomplete a very simple abstract interpretation: the one that
constructs the control flow graph.

So if you take a very simple abstract interpretation that forgets completely about the
memory of your computation and simply extracts the control flow graph, you make that
incomplete if and only if the program counter is dynamic. So...
>>: And then I move to tracing the control flow.
>> Roberto Giacobazzi: Of course. And then you swap to another attack, and you try to
make the data incomplete. Why? This is the theorem. Extracting the control
flow graph from the execution is equivalent to extracting the control flow graph statically --
so your algorithm for extracting the control flow graph is complete, you don't lose
precision -- if and only if the program counter is not a program
variable; it's not a variable, so it's static. The attacker here is the algorithm that extracts
the control flow graph, which is purely static; it's an inspection of the code. It can be easily
expressed as an iteration over the code, by a simple abstract interpretation that forgets the
computation memory.
You don't lose precision if and only if that is fully static. Namely, if you want to make it
incomplete, obscure, you have to make it dynamic. This is exactly what you do in order
to generate the transformed code. So basically, flattening is nothing else than distorting
an interpreter by forcing the program counter to become dynamic, which makes the
abstract interpretation that extracts the control flow graph imprecise. Is there a theory
behind this? Yes. It's exactly the theory of transforming domains, making them complete,
incomplete and so on.
I'll go quickly over this. Typically you have a domain and you have another domain. And
if you refine to become complete, you add points and you become more complete. For
instance, [inaudible] refines a domain to become more complete. So you add points and
the domain becomes more and more precise. Here you have many domains that may
reach the same point, so there are many domains that, once refined, provide you the same
result. Among all of them, take the most abstract, if it exists. Were it to exist, it would
correspond to a kind of compression of your domain: the most abstract domain that, once
refined, gives you the target domain, which is this. Yeah?
>>: Just thinking, shouldn't it exist [inaudible] because it's a complete lattice, the
lattice of upper closure operators, so...
>> Roberto Giacobazzi: No, that...
>>: ...[inaudible].
>> Roberto Giacobazzi: There are cases where it doesn't exist. For instance, if the
operation with respect to which you refine is negation. You have a square; you
have one [inaudible] abstraction; you add one point; the other one adds the other
point; but the most abstract one doesn't contain any of them. It is complete. It is the two-point
lattice, top and bottom. It's a property of r. We studied with Francesco, many years ago,
the property of compressible domains, compressible abstractions. For instance, if you
take the disjunctive completion, the compression is the join-irreducible
elements. So those are the kinds of flat graphs, flat lattices, that contain all the
basic points from which you can generate all the disjunctions. It's a property of r.
Okay, so basically you have a function that refines, and you have an inverse function that
squeezes the domain, when it exists. It doesn't always exist, but in most cases it does. For
instance, this is the lattice of intervals. This is the square operation. Then we can build this little
function by considering this formula. So basically, if we remove [inaudible] with respect to
the function square, this is the squeeze of the original domain. Okay, so what we tried to
prove is that, with respect to the function that is inside r -- r being a way of completing with
respect to the function f -- this inverse is the one that induces the maximal amount of
incompleteness, namely it removes all the relevant points that are useful for removing
false alarms. So it's exactly the contrary of what we do in static analysis, but it's exactly
what we look for if we want to make the analysis blind. Okay?
Let's see this with another example, then I'll finish. Slicing. Program slicing obfuscation
is more tricky. So program slicing: basically you generate the
program dependency graph. You have this little program, then you slice this
program with respect to the variables x and y and so on. And all this is statically derived
from the program dependency graph. Take for instance this little word-count program.
Okay, you have number of lines, number of words, number of characters. The slicing
criterion is the variable with respect to which you want to slice; here it is the number of
lines, and you get out this slice. And if you take number of words, you have this slice.
Okay, if you want to obfuscate program slicing, what you want it to do is to return a slice
which is the whole code. The slicing algorithm is more precise -- it is able to have a sharp view of
the execution around the criterion -- if the slice is small in size.
If you want to obfuscate the program slicer, you have to blind the slicer's
capability of selecting instructions. Basically, it has to return the whole code as the only possible
slice. That means that it fails. Of course, if I try to attack a program and I use
a program slicer to reduce the size of the code I want to attack, and it returns me the code
I started with, it's a completely useless tool for my attack. Okay, so how do hackers --
and this is simple hacking -- do this? They add fake dependencies. Because
program slicing is based on the control dependencies, the program dependency
graph. You add dependencies which are fake -- for instance, in this case you see that
this is always true and this is always false, so there are instructions that relate the
variables, that make the variables depend on each other, but they will never be
executed. Because the program dependency graph is extracted statically -- it is, I would say,
an abstract interpretation of the program.
Then the program slicer is unable to return a good slice. Indeed, it gets a much bigger
slice for number of lines and for number of words: two big slices. Is this related to the
algorithm that attacks the code, which is the algorithm that extracts the
dependency graph? Yes: exactly as before, the transformation that adds the fake
dependencies is precisely induced by the algorithm that extracts the program dependency
graph. Look, the algorithm for the program dependency graph is an abstract interpretation
where you forget completely about the state, once again, and generate the graph. Okay,
so what's happening here? If I formalize this as an abstraction, it's
very easy to prove, once again, an "if and only if" that says that the program dependency
graph algorithm -- an abstract interpretation defined by an abstraction rho -- is
incomplete if and only if the code contains static -- so not dynamic --
dependencies, fake dependencies, namely dependencies that are not true in the true
traces of execution. So dependencies that are not generated [inaudible].
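A minimal sketch of such a fake dependency in a word-count-like program (the opaque predicate and the names are illustrative, not the slide's code):

    def wc(text):
        nl = nw = nc = 0           # lines, words, characters
        in_word = False
        for ch in text:
            nc += 1
            if ch == "\n":
                nl += 1
            if ch.isspace():
                in_word = False
            elif not in_word:
                in_word = True
                nw += 1
            # Opaque predicate: (nc * nc) % 4 == 3 is always false (a square is
            # 0 or 1 mod 4), but a static dependency graph cannot know that, so
            # nl and nw now appear to depend on each other and on nc.
            if (nc * nc) % 4 == 3:
                nl, nw = nw, nc
        return nl, nw, nc

    assert wc("one two\nthree\n") == (2, 3, 14)   # input-output is unchanged

A slice on the number of lines must now conservatively drag in the word and character logic, so it degenerates towards the whole program.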
Okay, so with these two examples -- the theory is more general, of course, though it
doesn't work for all examples -- what we tried to do is the following: we want to obfuscate
the program, which means we want to make an attacker blind. The attacker, for me, is an
abstract interpretation. Warning: an abstract interpretation doesn't need to be static.
Monitoring and tracing can also be formalized with an abstraction. So dynamic attacks
can also be formalized by an abstraction. Also tracing: when you have huge amounts of
traces and you do mining on them, the mining is related to some abstraction, because
you lose some information in order to extract some of the information.
Once you know this abstraction, no matter what it is -- for instance, take decompilation.
In decompilation you look for reducible graphs in the code -- you know, the structure of the
loops. So how can you make the algorithm that extracts the reducible
graph incomplete? You jump inside the code with fake jumps. In this way the code appears
completely irreducible and the decompiler is unable to reconstruct the original structure.
Once again, this is making incomplete an abstraction: the one that looks for the
graphs that are reducible. Disassembling: if you look at the standard disassemblers, they
work in exactly the same way. So once you are able to extract the abstraction, you can
always build the twisted interpreter, which is always a modification of the standard
interpreter that depends on this abstraction and makes, by this equation, the transformed
code blind for the abstraction.
The point is the following: you can always find a better abstraction than the obfuscated
one. Of course. But look, Barak and others proved in 2001 that obfuscation is
impossible. So you cannot universally obfuscate your programs. Rice in 1953 proved
that analysis is impossible. Well, we have all done program analysis for at least 40 years,
so it makes sense to do obfuscation even though it's impossible. That's it. Thank you.
[Applause]
>> Francesco Logozzo: Time for some questions.
>>: So you might also -- you can even increase the power of the abstract interpreter, not
just by changing the domain but by, for example, unrolling the loop to begin with. It
might get rid of the irreducible part. Or the multiplication example that you gave: you can
just do trace partitioning. You can actually get that one, right?
>> Roberto Giacobazzi: Yeah.
>>: So I mean even if you stayed in the domain...
>> Roberto Giacobazzi: In that case, what I would do -- I would say this is a line of
research; we don't have the ending point of this, of course. But I would try to
specify trace partitioning as a refinement of the domain. And then I would use that domain for
deriving the obfuscated code that defeats your trace partitioning. I agree with you that there
is a rigidity inside this stuff, in that we always pass through the abstraction in order to
construct the interpreter. But I believe that most refinements you can do of the
interpreter can be seen as an abstraction of the domain over a more standard
interpreter, a standard interpreter as simple as you want. Of course, if you look
at refinements like refining the widening, or waiting some iterations before a threshold,
that cannot be specified as [inaudible]. But it's nice, challenging stuff, because I think that
also, for instance, for the delay of the widening, it's very easy to find a transformation of
the code that simply delays more the change of the variable, in such a way that it breaks
your refinement.
So probably there is something even more general than what we are looking at
at the moment. But we are pretty happy that, if you take the book by Christian Collberg,
a kind of bible of all these tricky transformations, most of them we were able to
specify as an abstract interpretation. And for each of them, the twisted interpreter was
derived almost naturally.
>>: Are you able to define new obfuscation techniques using [inaudible]?
>> Roberto Giacobazzi: Well, for the moment...
>>: For the moment you...
>> Roberto Giacobazzi: For the moment...
>>: ...view the...
>> Roberto Giacobazzi: ...we tried to understand...
>>: ...existing of [inaudible]...
>> Roberto Giacobazzi: Yeah, it was a kind of understanding: instead of viewing
obfuscation as a trick where each time I think of new stuff, I generate it, and then I think I have a
billion-dollar company of mine that, of course, doesn't work, we tried to derive the principle
behind this. The idea now is the following: is it possible to compose, in a kind of crypto
way, very simple transformations in order to make more complicated ones, by composing
them in such a way that the order becomes relevant? So if you know the order of the
very tiny little transformations that you did, you are able to
reconstruct the original code. And the orders can be exponential, because you have
exponentially many different orders among them. That would be interesting stuff to do.
At the moment we are trying to understand existing ones. But I think, yes, in principle.
>> Francesco Logozzo: Questions?
>>: Quick question. So my understanding is that all this works because you have yet to
[inaudible], so you are just considering the static approximation; you are considering the
best transformer.
>> Roberto Giacobazzi: Best transformer.
>>: Okay, then. That's not reality.
>> Roberto Giacobazzi: Yeah.
>>: You don't always have the best transformer. You have widening. You sometimes
have separation, which -- yeah, you are considering...
>> Roberto Giacobazzi: If you defeat the...
>>: ...the worst case -- You are considering the worst case but...
>> Roberto Giacobazzi: If you defeat the best transformer, you will defeat any other
one.
>>: Yeah, of course. You are considering the worst case. But as I said, how far is the
worst case from the real case?
>> Roberto Giacobazzi: Yeah, but from my point of view, when I want to protect --
from my perspective, I want to protect against somebody that wants to enter my
house. So if I'm able to protect against the best...
>>: Yeah, but it can be too much.
>> Roberto Giacobazzi: ...guy that can...
>>: I'm saying, you can protect it by just putting [inaudible]...
>> Roberto Giacobazzi: Yeah, it's probably too much.
>>: ...[inaudible] or whatever. The door is fine. The lock [inaudible]...
>> Roberto Giacobazzi: I agree with you. I agree.
>>: So that's what I'm wondering. What's [inaudible]...?
>> Roberto Giacobazzi: You can probably have a lower level of obfuscation to defeat
the true tools. But from my point of view -- this is why I look for simple
transformations -- if I'm able to defeat the basic attacks and compose them with
respect to the strongest possible attacker, which is the best [inaudible], then I'm pretty
sure that other attackers will in any case have trouble getting in. Of course you pay -- there
is...
>>: [Inaudible] performance [inaudible] too complicated or [inaudible] does not kick in
and...
>> Roberto Giacobazzi: I agree. But the...
>>: So it can be too much. So I'm wondering, if you know your attacker, you know for
instance what the [inaudible] is, if you know that widening is used...
>> Roberto Giacobazzi: If you know the widening, you can probably simplify this, yes.
Consider that anyway most of these technologies are used not for protecting the
algorithm -- nobody wants to protect quicksort, because everybody knows it -- but
for protecting keys inside the program. And these are related to a very small portion of
the code. So you don't really need to obfuscate the whole code. You really need to target
a specific area of the code in order to make it, for instance, very hard to extract by
slicing, very hard to understand in the control flow and so on. So you probably pay a
runtime slowdown of ten times over that little piece of code. Computed overall -- a student
of mine made [inaudible], a dynamic obfuscator that was encrypting code in
Java bytecode, bypassing the type system, so it was very complicated. The
slowdown was ten thousand times. But he applied it to such a small area of the code that
the eventual slowdown was less than 0.7.

So it depends on where you apply it. Of course, if you apply it to the whole thing, it can be too
much.
>> Francesco Logozzo: Okay. Thank you.
[Applause]
>> Roberto Giacobazzi: Okay, thank you.