>> Ratul Mahajan: Welcome. Good morning. Welcome everybody. It's my great pleasure to introduce Cristian Estan from the University of Wisconsin-Madison. And I've actually known Cristian for quite a while now, back when I was a grad student at UW and he was a grad student at San Diego, and he graduated there four or five years ago? >> Cristian Estan: Yes. >> Ratul Mahajan: With George Varghese. And today he's going to talk about some of the work he's done over the past few years on improving deep packet inspection. Take it away. >> Cristian Estan: Thank you, Ratul. So okay. Let me try presenting from here; the cameras can see me here also. This is joint work with my colleague Somesh Jha and my two students Randy Smith and Shijin Kong. And this talk summarizes work that we had in a SIGCOMM paper this year and an Oakland paper earlier this year. So what is this talk about? Let me first tell you what this talk is about. It's about regular expression matching, which is a performance critical operation for all types of systems that do deep packet inspection. And the problem is that it takes too much time and too much memory. So our solution to this problem is a new representation for the signature sets that need to be compared against the traffic. And it's a representation that allows us to represent these signature sets compactly and supports fast matching. So for example, compared to methods currently in use in commercial intrusion prevention systems, we have a 50 times reduction in memory usage, and at the same time a 10 times increase in the speed of the matching operation. So not 50 percent, 50X. But the magnitude of the benefits depends on the complexity of the signatures, so for simple signature sets we don't see benefits that are quite this big. So this is in a nutshell what this talk is about. Let me go on and motivate why we're working on this. Why do we need deep packet inspection? So there are a couple of scenarios that are probably going to keep motivating this type of solution. For example, you have a server that has a vulnerability and is not patched, but it has to accept connections from outside clients, from anywhere in the Internet. But some of those clients may trigger that vulnerability and want to take over the server or shut down the server or whatever. So why would you ever have an unpatched server facing the Internet? Well, maybe the patch doesn't exist yet, or maybe the patch breaks some other functionality; there are many reasons why you could have such a server. So you put an intrusion prevention system before the server to protect it, and the server can keep running. Another example is when you have an enterprise running multiple applications and the enterprise wants to prioritize the traffic without having to change the applications to insert markings into the packets, so that's another driver. Or when you have floods that you need to defend against in the network, because it's too late at the end point, and you want to analyze packet contents to detect what's an attack and what's not an attack; that's another case where you would want to do deep packet inspection. Now, packet header based filtering criteria have been applied and are used, but what we see in all of these problems is that you have to look inside the content of the packet. And for the purpose of this talk, I define deep packet inspection as being done by any system that looks at the payload of the IP or TCP packets. So okay.
So how does regular expression matching fit into this whole scenario? So not all of what these systems do is deep packet inspection; they use these header based criteria, for example, to decide what signature set to apply. Or if you have something that does application identification, then you do deep packet inspection on the first few packets of a flow, and from there on you just look at packet headers because you have already classified the flows. So it's not necessarily looking at the payloads of all packets. And deep packet inspection is not all regular expression matching. You can have some parsing in there that would direct the regular expression matching just to some portions of the packet, not the entire packet. Or you may have some other things, such as decoding various encodings for protocols. And in some cases it plain doesn't apply. So if you have encrypted traffic, then you cannot do regular expression matching on it. You can decrypt it and then do regular expression matching, but you cannot apply regular expression matching [inaudible]. >>: I was wondering. Do you know how many enterprises are using IPsec in their -- how much of their traffic, say? >> Cristian Estan: I don't know. >>: When you talk about regular expression here, do you mean also like kind of packet boundaries, or do you mean [inaudible]? >> Cristian Estan: So for -- we assume that someone gives us an input, and then whether the system gives us a single packet or reassembles multiple packets and gives us a TCP level byte string, the same problem appears. So that's external to what we do. But yes, the systems that care about security need to do reassembly, because otherwise the bad guys can evade detection. So just one more historic note, that string matching used to be used for a similar purpose, but it's not expressive enough; just looking for strings in the payload is not expressive enough. There are so many ways of just changing the attacks to fool string based methods, so the world is moving towards regular expressions now. So I start by introducing the problem of regular expression matching, then talk a little bit about the core idea, then about the things we needed to do to turn it into a solution, and show some results. And if we have time, then I can talk a little bit about other ideas that have been used in this context of improving the memory usage and performance of this critical operation of regular expression matching. So, okay. What's our definition of the problem? We have a signature set. So what's a signature set? We have a set of signatures; each signature is a regular expression, and each regular expression has a rule number associated with it. And the matching problem we are solving is that we take all these regular expressions and we want to be able to tell the rest of the system when any of these rules matches. And we want to find all of the matches, and we want to know which of the rules matched, not just that one of the rules matched. And we detect matches in the middle of the string also. So that's what that "prefix of the input matches" part means. So if a signature matches in the middle of the packet, then we also have an alert. Now, this is pretty much the same problem as taking those regular expressions, OR-ing them together, and then just recognizing the combined regular expression. It has these tiny differences, but that's the fundamental problem that we are looking at.
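To pin down the matching semantics just described, here is a minimal reference sketch in Python. The two signatures and rule numbers are made up for illustration, and a real matcher would never scan the input once per rule like this; the point is only the required output: every rule whose expression matches some prefix of the input.

```python
import re

# Hypothetical signature set: rule number -> regular expression.
SIGNATURES = {1: r".*AB.*CD", 2: r".*EF.*GH"}

def match_all(data: str):
    """Naive reference semantics: report every rule that matches.

    re.match anchors at the start of the input, so a successful match
    of any length means some prefix of the input matches the rule,
    which is exactly the "prefix of the input matches" condition.
    """
    alerts = []
    for rule_id, regex in SIGNATURES.items():
        if re.match(regex, data):
            alerts.append(rule_id)
    return alerts

print(match_all("xxAByyCDzz"))   # -> [1]
```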
So while people have been looking at regular languages and regular expressions for a long time, let's not just launch into hacking up a solution; let's look at what we know from theory. So we know that finite automata are the simplest machines that can recognize regular languages, so we don't need anything more complex than a simple deterministic finite automaton to recognize whether an input matches a regular language or not. We also know that there exists a canonical minimal DFA, and no automaton can correctly recognize a given language with fewer automaton states than the number of states of the minimal DFA. So does it mean that whatever we do, we cannot use less memory than this minimal DFA? That's not a correct conclusion, and I hope it will become obvious why. And does this mean that if we use anything other than DFAs it will be slower, because we use something more complex? Actually that's not true either, and we will get into explaining why that is so in a few minutes. So let me just -- just a refresher. Let's go through what a DFA is and how it works. So this automaton with states P, Q, R, S, T is a data structure that is used during matching to recognize the signature dot star AB dot star CD, which basically is a string AB followed at any distance by a string CD. So the way this works is that we have a pointer pointing to the current state. Note that the automaton itself doesn't tell you anything about the relation of the input to the signature, how well it matches; this pointer describes the state of the computation, so how the input seen so far relates to the signature. So once we have this pointer to the current state, which is initialized to the start state, we go through characters one by one. Here I'm using the convention used by many of the intrusion prevention systems: square brackets with a caret, as in [^A], means all characters other than A. And these transition tables are just large tables; you index into them with the actual character, and you always have a transition to the next state. So we get an A, we get a B, we reach state R; we get -- well, E and F will keep us there. Okay. We see the C, the D, we reach the accepting state, the DFA accepts, and then we notify the rest of the system that we have a match. And then we continue, because we want to find all matches. Now, in many of the figures I will use simplified representations without all the transitions, but with just one exception all the automata in this talk have [inaudible] and they have all the transitions defined. >>: [inaudible] because if you get AA, BE, FCE or Q you should have an extra transition [inaudible] if it gets A. >> Cristian Estan: Yes, yes, yes. So I'm cutting corners. I'm not putting in all the transitions, and I will keep doing that throughout the talk because it gets hairy. [laughter].
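To make the matching loop concrete, here is a minimal sketch, in Python, of DFA matching for dot star AB dot star CD. The state names P through T are my labels for the slide's states, and the per-state dict with an "other" default stands in for the full 256-entry transition tables a real matcher indexes with the input byte.

```python
# Minimal sketch of the DFA matching loop for ".*AB.*CD".
TRANS = {
    # state -> {char: next_state, "other": default transition}
    "P": {"A": "Q", "other": "P"},                # nothing matched yet
    "Q": {"A": "Q", "B": "R", "other": "P"},      # trailing "A" seen
    "R": {"C": "S", "other": "R"},                # "AB" seen
    "S": {"C": "S", "D": "T", "other": "R"},      # "AB" and trailing "C"
    "T": {"C": "S", "other": "R"},                # accepting: "AB.*CD" seen
}
ACCEPTING = {"T"}

def dfa_match(data: str):
    state = "P"                         # the pointer to the current state
    for pos, ch in enumerate(data):
        row = TRANS[state]
        state = row.get(ch, row["other"])
        if state in ACCEPTING:
            print(f"match at offset {pos}")   # keep going: find all matches

dfa_match("xxAByyCDzzCD")   # matches at offsets 7 and 11
```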
So what is the problem? If you have two signatures like this and we want to combine them, then we can do that, and we have a single automaton that recognizes both signatures at the same time. So with this combined automaton we have a single state pointer and it tracks the progress in matching both signatures. So after AB we transition to a state that indicates that we have seen AB, so there's progress in matching the first signature; after EF we transition to a state that indicates that we made some progress on both the first and the second signature; and after we see the CD, we arrive in this accepting state, and then we alert for signature one, and later on we may alert for signature one or for signature two. So we have a single data structure describing both signatures. But the problem is that this is a large one. So if we have N such signatures, we need at least two to the N states for the combined automaton. >>: Are the examples carefully picked because the signatures [inaudible] overlapping, so what if the signatures [inaudible]. So for example if you have AB and [inaudible] AB and [inaudible] ABE. >> Cristian Estan: So if the signatures are dot star AB, dot star ABD, you get the same type of explosion. You don't get away from the two to the power N. Maybe some of the automata become messier, but you don't get away from the exponential state space explosion because of overlaps between the strings. >>: So in other words you're putting [inaudible] optimized just because you actually have [inaudible] signatures? Because you are already -- so think of it as a signature having a smaller sort of [inaudible] you can match for both signatures, sort of X signatures. >> Cristian Estan: Yes. >>: So you can say I already matched a subset of my bigger match; can I now use that as a building block [inaudible]. >> Cristian Estan: Please wait five or six more slides and let me know if what I'm going to present is what you had in mind. So basically, why do we have this state space explosion? Because at any point in time, if we see the second string of any of the signatures, we need to know whether to accept or not. So we need to know whether we have seen the first part of that signature or not. And we need to know that for all signatures. So there are many possible interleavings. And we need a separate automaton state to represent that we haven't seen the first string of any of the signatures, or we have seen it just for the first, or just for the second, or for the first and the second, or just for the third, and so on. So it's an exponential number of combinations that the computation needs to differentiate between. And in a DFA, the only way to distinguish between two things is to have a separate automaton state. So now, okay, this is problematic. People obviously haven't been building DFAs that are exponential in size, because they wouldn't fit in the memory of any device. So what approaches have been used? One solution is to just match the signatures in parallel. Now, the downside of that is that instead of just having one pointer that you keep updating as you go through the input, you have multiple pointers, so your throughput goes down. Or you need to spend more processors, more cores, on achieving the throughput that you desire. Another solution is to combine them but not trigger the big exponential explosion, just control it somehow: combine just subsets of the signature set, and then you can control the explosion, and depending on how much memory you have, you can use more or fewer automata.
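Here is a sketch of that first baseline, matching with one DFA per signature; it reuses the dict format of the sketch above, so TRANS and ACCEPTING refer to that earlier block. The point is that the per flow state becomes a list of pointers and every byte costs one table lookup per rule, which is the throughput loss the talk describes.

```python
# Sketch of the per-signature-DFA baseline.
def multi_dfa_match(dfas, data: str):
    # dfas: list of (transition_table, accepting_states, start_state)
    states = [start for (_, _, start) in dfas]   # one pointer per rule
    for pos, ch in enumerate(data):
        for i, (trans, accepting, _) in enumerate(dfas):
            row = trans[states[i]]
            states[i] = row.get(ch, row["other"])
            if states[i] in accepting:
                print(f"rule {i} matches at offset {pos}")

# e.g. multi_dfa_match([(TRANS, ACCEPTING, "P")] * 2, "xxAByyCD")
# does twice the lookups of the single-DFA loop on the same input.
```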
Now, does any of this contradict the theory? How can we match a signature set with fewer automaton states, like when using separate automata, than the number of states of the minimal DFA? So is there any contradiction here? There isn't, because the state of the computation still has many, many possible values. But the number of automaton states, which is the data structure we use for recognizing it, is not as large. So here the state of the computation, if we use multiple automata, is just a [inaudible] with all the pointers for all the automata, and that can still have a very large number of values; we just don't have a separate data structure with all those big transition tables for every possible state of the computation. So this is what we see with the methods currently in use. We can have separate DFAs for each signature, which is slow, or we can have a single DFA, which is fast but uses a lot of memory, and there's a curve in between that we can move along by controlling the number of DFAs that we combine. But this is not what we want. We want to get to that ideal spot there. We want the same memory as the separate automata, and we want matching to be no slower than it would take to just match a single signature. And actually for string matching, which was used before regular expression matching, we had exactly this behavior. You just add more strings; there's no state space explosion; your automaton becomes bigger, but your matching isn't any slower. So let me tell you about how we achieve this, and then we will go into other topics. So again, some transitions are missing. But this is what we do. We have these extended finite automata, XFAs, where we extend the underlying automaton with a little bit of scratch memory, in this case a variable called bit that can have two values, true and false, and we attach some programs to some of the automaton states. So how does this work? Why does this work? By adding these bits and this extra computation, we have automata whose shape is closer to the shape of automata that do string matching. So these don't have those bad properties that we're trying to run away from. But we can still recognize the same language, because we have these extra checks. So this automaton is the same as an automaton that would recognize an AB and a CD just totally independently. So at initialization we don't just initialize the state pointer, we also initialize the value of the bit, so the state of the computation is a pointer to an automaton state and the value of the bit. And then on every character we keep following transitions, and we don't touch this scratch memory that holds these variables, the bit, until we get to a state that has a program associated with it that sets the bit; and then we keep transitioning, and eventually we get to a state that's an accepting state, but it's not an unconditional accepting state: now we check the value of the bit. So we don't accept if we just see a CD that's not preceded by an AB, because the bit wouldn't have been set. So the gain is not visible if you look at a single automaton; the gain is when you look at multiple automata. Again, I'm simplifying. And when you combine them, you don't get the state space explosion. A single automaton in this case is not even smaller, but when you combine them you don't get the state space explosion. You get a nice automaton whose size is linear in the number of signatures that you combine.
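Here is a minimal sketch of XFA matching for dot star AB dot star CD, under my reading of the example: the automaton is shaped like a string matcher for AB and CD, with one bit of scratch memory and small programs attached to two states. State names and the inline programs are illustrative, not the paper's.

```python
# Sketch of XFA matching for ".*AB.*CD": string-matcher shape plus a bit.
TRANS = {
    "s0":  {"A": "sA", "C": "sC", "other": "s0"},
    "sA":  {"A": "sA", "B": "sAB", "C": "sC", "other": "s0"},
    "sAB": {"A": "sA", "C": "sC", "other": "s0"},   # program: set bit
    "sC":  {"A": "sA", "C": "sC", "D": "sCD", "other": "s0"},
    "sCD": {"A": "sA", "C": "sC", "other": "s0"},   # program: if bit, accept
}

def xfa_match(data: str):
    state, bit = "s0", False     # state of the computation: pointer + bit
    for pos, ch in enumerate(data):
        row = TRANS[state]
        state = row.get(ch, row["other"])
        if state == "sAB":
            bit = True           # remember that "AB" has been seen
        elif state == "sCD" and bit:
            print(f"match at offset {pos}")   # conditional accept

xfa_match("CDxxABtyCD")   # only the second "CD" fires: bit set by "AB"
```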
Now, let me just say a few words here about how this combine operation works. So we are combining two things: we have the underlying DFAs, and then we have these programs associated with states. Now, in our system the programs and the variables in scratch memory do not affect the transitions in any way. They only affect acceptance decisions. So we combine the underlying DFAs just as you would combine normal DFAs. It's a very simple, well known, well understood operation. Now, what do we do with the programs? Well, each state in the combined automaton is a reflection of two states in the two automata you combine, and all you have to do is just concatenate the two programs. With an extra twist: you may have to rename some variables; if both have a variable called bit, then you will call one of them bit one and the other one bit two. But there's absolutely no interaction, because they work on different variables, and all you do is concatenate the programs. And if there's an empty program, then the program you combine it with just stays the same. So any questions about why or how this works before we move on to another example? >>: The reason behind this is that you're trying to basically save -- if you have multiple regular expressions, save subsets of them [inaudible] through these variables? >> Cristian Estan: Yes. >>: So that you can come back -- is that the transition of this -- >> Cristian Estan: Yes. So basically we are dividing the work between the DFAs and these extra variables, this memory and the programs. And we are using the automata for things they are good at, like recognizing chunks of regular expressions, and we are using bits and counters for things they are good at, like tracking things that happen independently; and those can be mixed very cheaply if you just have independent bits. But if you have automata where things happen independently and can be interleaved arbitrarily, then that causes the state space explosion. So we are just separating the two and mirroring more closely the structure of the underlying computation. >>: Looks like here, if you were to say just treat those bits as -- if you were to take the bits out of the state space. >> Cristian Estan: Yes. >>: And just add them back into your dispatch tables, would that exactly be the standard combination of [inaudible], correct? Right now, if I were to say every place where there's a decision made on the bit, I was to imagine just sort of making the transition tables in memory now larger to reflect what's the state of bit one and what's the state of bit two. >> Cristian Estan: That's exponential. >>: I agree. But my question is, is it exactly isomorphic to the standard exponentially large thing? >> Cristian Estan: Not that way. So the way I would say it's isomorphic is that the state of the computation is the pointer to the current state and the collection of all the bits. And now your transitions update one part of it and then update the other part of it, but by representing the two separately we can represent things much more compactly. >>: Okay. >> Cristian Estan: And then actually later, when I get to the compiler, you can see that these updates to these bits all throughout the compiler are on transitions, just like the updates to the state variable, not on states. But it's more efficient to implement it associating things with states, not with transitions, because there are many fewer states than transitions.
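Here is a sketch of that combine operation as I read it: a product construction on the underlying DFAs plus plain concatenation of the per-state programs, with variables renamed apart. A real compiler would build only the product states reachable from the start pair; this toy builds them all, and programs are just opaque lists.

```python
from itertools import product

def combine(xfa1, xfa2):
    """Combine two XFAs given as (transition_table, programs) pairs.

    Tables map state -> {char: next_state, "other": next_state};
    programs map state -> list of instructions (opaque here).
    """
    trans1, progs1 = xfa1
    trans2, progs2 = xfa2
    trans, progs = {}, {}
    for s1, s2 in product(trans1, trans2):
        row = {}
        for ch in (set(trans1[s1]) | set(trans2[s2])) - {"other"}:
            row[ch] = (trans1[s1].get(ch, trans1[s1]["other"]),
                       trans2[s2].get(ch, trans2[s2]["other"]))
        row["other"] = (trans1[s1]["other"], trans2[s2]["other"])
        trans[(s1, s2)] = row
        # Rename variables apart, then simply concatenate: the programs
        # touch disjoint scratch memory, so they cannot interact.
        progs[(s1, s2)] = [("v1", op) for op in progs1.get(s1, [])] + \
                          [("v2", op) for op in progs2.get(s2, [])]
    return trans, progs
```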
>>: So when you talk about saving memory, are you mostly talking about saving the memory of the tables, the programs, or saving memory in the state? It seems like the memory of the states is going to be about the same, but the programs are going to be a lot [inaudible] the transition table slash programs. >> Cristian Estan: So the state of the computation, which is the per flow state you need to save between one packet and another, is usually getting reduced, but that's not the significant reduction. The size of the data structure, which includes the automaton states, the programs, all that, that gets reduced significantly. Okay. So let's look at another example that's not as bad but is still important in practice. Suppose we have a signature like this one -- and this is typical for buffer overflows -- where we are looking for a new line followed by a keyword, which in this case is just the letter A, followed by 200 non new lines. Then this is the automaton, with some transitions omitted, that would recognize that. And in itself it's inefficient, because it uses 200 states just to count how many non new line characters we have seen after the keyword; but when it gets combined with a well behaved string matching automaton, it gets even worse, because the string matching automaton gets replicated, since that string can occur at any offset after this command. So actually we have K times N squared states for N signatures, where K is the number inside the integer range constraint, so how far we count. Now, with XFAs we can instead use a counter, and it's a bit trickier than the other one, but it works. So if we see a new line followed by an A, then we initialize the counter to zero. And then we would normally come back to state K and keep staying in that state, and whenever we move back to that state we increment the counter, we check if the counter reached 200, and if so raise an alert. That's not the whole story; that wouldn't give us the correct semantics. What if we see a new line? So actually we have to add a counter invalidation program to the state that comes after seeing a new line, and now this has exactly the correct semantics. It recognizes a new line, followed by an A, followed by 200 non new lines. And it has exactly the same shape as an automaton recognizing the string new line followed by an A. And when we combine it with another string matching automaton, we again have this nice shape of automata that just recognize strings. The programs get copied in different places. And if you look at state KR there, that's the first instance of an example where the programs from the two states that are combined are non empty, and then we actually have a concatenation. So we are incrementing the counter, checking, and accepting signature one if it's 200, but we are also unconditionally accepting signature two, because we have just seen a BC. And this is linear in the number of signatures we combine. So the core idea is to just take automata and add these extra variables, put them in a scratch memory, and it works because it allows us to change the shape of the automata in any way that doesn't cause explosion; and then these extra variables don't cause explosion, because we just concatenate the programs and we just concatenate the scratch memories, basically, when we combine automata. So that's the core idea behind these XFAs. Okay.
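Here is a sketch of that counter XFA for the buffer overflow style signature, written as .*\na[^\n]{200} with a lowercase keyword for concreteness. The shape is just a string matcher for a newline followed by the keyword; one counter in scratch memory replaces the 200 replicated counting states, and the three attached programs are the initialize, increment-and-test, and invalidate programs from the talk. State names are mine.

```python
# Sketch of the counter XFA for ".*\na[^\n]{200}".
TRANS_NL = {
    "K":  {"\n": "N", "other": "K"},             # program: increment/test
    "N":  {"\n": "N", "a": "NA", "other": "K"},  # program: invalidate counter
    "NA": {"\n": "N", "other": "K"},             # program: counter := 0
}

def xfa_count(data: str):
    state, ctr = "K", None          # ctr is None when invalidated
    for pos, ch in enumerate(data):
        row = TRANS_NL[state]
        state = row.get(ch, row["other"])
        if state == "K" and ctr is not None:
            ctr += 1                # one more non-newline after "\na"
            if ctr == 200:
                print(f"match at offset {pos}")
        elif state == "N":
            ctr = None              # a newline cancels the count
        elif state == "NA":
            ctr = 0                 # "\na" seen: start counting

xfa_count("\na" + "x" * 200)        # fires at offset 201
```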
>>: So how large can those programs be, like those variables [inaudible]. >> Cristian Estan: So for now we use bits and counters, and we have instructions for setting bits, resetting bits, testing bits, incrementing counters, invalidating counters, testing counters, and that's all we needed to get where we are. But you may find some other instructions useful, or some other data structures useful, if you have different signatures than what we had. >>: So in -- okay, so in theory these XFAs can recognize more than regular expressions. >> Cristian Estan: No. If you ask a theoretician, these don't recognize anything more than regular expressions, because we have finite state. The number of variables we have is finite, they are finite counters, so the state of the computation is finite, just like with DFAs. Now, if you ask a practitioner, then, well, a 64 bit counter is very different from an automaton with 2 to the 64 states. So in practice it allows you to do things that you wouldn't normally do with regular expressions, but from a theoretical point of view it still recognizes strictly regular languages, because the state of the computation is finite; it's just better structured. And there is structure there in the signature sets that we looked at and that people use for intrusion prevention. >>: So [inaudible] is really neat, but just because the state is finite, does one follow from the other, that any program that has finite state will only recognize regular languages? Is that what you just said, or -- >> Cristian Estan: Yes. Because if it has finite state and it goes through the input bytes one by one, which we do, then you can represent any change in the state of the program as a transition. And of course theoreticians say, well, you have counters, and if you give me two infinite counters then we have a Turing machine. But these are not infinite counters, these are just finite counters. So to the theoreticians this is regular expression matching and regular languages; it cannot do anything more than that. Okay. So let me tell you about a couple of the things we had to do to demonstrate this in practice. First, we need to handle regular expressions that are very different from these simplified examples, which illustrate well what we do but are, well, a lot cleaner than what we see in practice. Then we need a compiler to take us from regular expressions to XFAs, because we don't want to be building these by hand. And actually it turns out that there's a lot of mileage we can get out of optimizations on the combined XFA, because this structure allows us to optimize away things, which helps us with performance and memory usage. >>: Just curious, so [inaudible] XFAs as DFAs plus [inaudible]. >> Cristian Estan: Yes. >>: But [inaudible] total [inaudible] is to have as compact [inaudible] as possible? >> Cristian Estan: Yes. Of the data -- yes. >>: Essentially if you replace the DFAs with say [inaudible]. >> Cristian Estan: Yes. >>: [inaudible] maybe a [inaudible] increased or [inaudible]. >> Cristian Estan: Exactly. >>: Do you actually get better encodings for them, or essentially what [inaudible]. >> Cristian Estan: So, well, the NFAs are much more compact. They are quicker to build, but matching is slow. So what we have is a single DFA for the entire signature set that matches the signature set, and you pay the cost of a few instructions for the programs.
Whereas with NFAs, if you do a [inaudible] traversal, then you are in many states at any given time and your update gets complicated. Or if you do backtracking, then you can repeat a lot of work because of the backtracking. So matching is slower and often less deterministic in terms of the time it takes to go through the input. >>: [inaudible] because I don't [inaudible] probably be a [inaudible] if the DFA becomes too large essentially [inaudible] become more complex or [inaudible]. >> Cristian Estan: So in some sense this original picture here -- I shouldn't go back this way. Okay. If you look at the corner with a DFA for each signature, that's like an NFA where it's in parallel in all these states for the different signatures, and you have epsilon transitions from a common start state to the start states, so that's an NFA. Of course for NFAs there are many types of NFAs you can build for a signature, so it's not like the DFA or [inaudible]. But the way to think about NFAs is that lower right corner, pretty much. >>: So let me ask this question. >> Cristian Estan: Okay. >>: [inaudible] I can figure out. Should we replace the Perl regular expression library at this point with XFAs? Or is this for routers only, or where [inaudible]. >> Cristian Estan: Okay. So the Perl regular expression library is actually used in intrusion prevention systems, and it is an NFA based approach that uses backtracking. Now, there are a couple of advantages we have. So one advantage is that with PCRE -- that's the library -- if you combine the regular expressions with a big OR, then it probably gets very slow, and you have to match them separately. And then you pay the cost of matching the signatures separately. So that's our advantage. Their advantage is that it's very quick to get to matching. We go through some work and determinization, building the XFAs, combining XFAs, before we can match. So in many settings where you would use Perl regular expressions, you don't want to spend a lot of time optimizing and combining things; you just want a quick match, and maybe you'll be using the regular expression just once. And then another advantage that it has -- >>: So let me just -- so it's also a function of the size of the unit of the data? >> Cristian Estan: No, it's a function of how many regular expressions you have and how complicated your regular expressions are. So basically we exploited the fact that we can spend a lot of time -- actually it's not that much, but we can spend time -- on trying to get to a representation that allows fast matching. They have to care much more about the time it takes to get to the representation where they can do matching; plus they have all types of features that we don't have, and in many practical applications you need those, so it's not ready to replace them. But in an intrusion prevention system, where you look for this high throughput processing, I would argue that this is what you want. Sorry. Okay. So let's see. With general regular expressions we have one big problem: how does the compiler know when to use bits and counters? And actually this is the question about introducing bits and counters in the front end of the compiler.
So for counters it's easy, because there is a giveaway: there's this integer range notation, which is syntactic sugar on the original syntax but is used extensively. So whenever we see something like that, which means between M and N repetitions, we insert a counter. For bits it's a bit more complicated. We introduce a parallel concatenation operator that we insert in the regular expressions, and it's the same as the normal concatenation operator in terms of semantics, but it introduces a bit in the construction. And what we are trying to do with this parallel concatenation operator is to break the regular expression into chunks that are more string like. And we have some heuristics that do this for three quarters of the signatures, and then for 15 percent of the signatures we have to adjust the insertion of this operator manually, so we cover roughly 90 percent of the signatures in the data sets that we looked at. So here are some examples -- and these are actual regular expressions from Snort, the open source intrusion prevention system that's most widely used. The first regular expression looks for ping dot ASP preceded by a slash or a backslash. It's not exactly one string, it's two possible strings, but for us it's close enough to string matching that we don't introduce a parallel concatenation operator. The second one is actually the type of signature that we have seen: a first string followed by a second string. The first string is BAT double quote, and the second string is the ampersand sign. So this we break into two before the second dot star by inserting the parallel concatenation operator, which doesn't change the semantics but tells the compiler to insert a bit. The last one is similar to the expression that caused the polynomial blowup: a new line followed by a keyword, followed by 300 repetitions of non new lines. Now, this is not obviously string like, but for our compiler, having a character class such as non new line that's very large is the same as having a dot, and having 300 repetitions gives us the same shape of automaton as the closure that looks for an arbitrary number of repetitions. So we insert that parallel concatenation operator before this large number of repetitions, because that's like the beginning of another dot star string. So that's how we -- yes? >>: What's the meaning of the [inaudible]. >> Cristian Estan: The rule number; that's just the identifier that Snort gives to these rules. >>: [inaudible]. >> Cristian Estan: Yes, yes. And then, if it's configured as an intrusion prevention system, then you log "rule number blah" alerts or something like that. So actually we are still working on a theory that would allow us to fully automate this step. We have a lot of intuition about what causes state space explosion and what doesn't, but we need to make some progress on a theory to be able to make principled and informed decisions about how to break these regular expressions.
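As an illustration only, here is a toy version of the chunking idea for the middle example: split a signature into string like chunks at interior dot star gaps by inserting a parallel concatenation marker, written here as "#". The real heuristics, the operator's actual syntax, and the handling of counted repetitions live inside the compiler and are more involved than this.

```python
import re

GAP = re.compile(r"\.\*|\.\+")   # interior ".*" / ".+" gaps

def insert_parallel_concat(sig: str) -> str:
    parts = GAP.split(sig)
    gaps = GAP.findall(sig)
    if len(parts) <= 2:
        return sig               # at most one gap: already string-like
    out = parts[0] + gaps[0] + parts[1]     # keep the leading ".*" chunk
    for gap, chunk in zip(gaps[1:], parts[2:]):
        out += " # " + gap + chunk          # break before each later gap
    return out

print(insert_parallel_concat(r'.*(\/|\\)ping\.asp'))  # unchanged: one chunk
print(insert_parallel_concat(r'.*BAT".*&'))           # -> .*BAT" # .*&
```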
Okay. The next step is compiling a regular expression to an XFA. And this is the only slide where I'll have a nondeterministic automaton on the slide. Inside the compiler, of course, we use nondeterministic XFAs. So what do they have? They have a set of states. They have a data domain, which for our compiler is an unstructured set, so for example {0, 1, 2, 3}. If we have a data domain that can have four values, the compiler most of the time doesn't know whether those should be two bits that can be set or reset independently, or a counter that can count from 0 to 3, or a counter that can be invalidated and then count from 0 to 2. So it's all [inaudible]; it's just a set of values for the data domain. We have an input alphabet. We have a nondeterministic transition relation. We have an update relation for the data domain; so inside the compiler we actually update the value of the data domain on the transitions, and it's a nondeterministic relation that later gets determinized. And we don't just have an initial state, we have an initial configuration, because now the state of the computation is the pointer to the automaton state and the value in the data domain -- so state K, and the value has to be 0. And for acceptance we don't just have an accepting state but a set of accepting configurations -- so we accept in state N if the value of the data domain is 2. So we use these nondeterministic XFAs and go through the normal steps of the construction of DFAs from regular expressions. We build the nondeterministic XFA from the parse tree of the regular expression with the Thompson construction, then we eliminate the epsilon transitions, then we have two separate determinization steps, one to determinize the transitions, another one to determinize these update relations and turn them into update functions. And we have two steps for minimizing the data domain and the state space, but the one for minimizing the state space is not implemented in the results I'm going to show. And actually I put minimize in quotes, because we don't have the concept of a canonical minimal XFA like there is for DFAs, so it's more reducing the data domain. And there's a tricky step at the end, and this is one of the things we would like to get through and make progress on; this is another reason why we're still working on this project. So we have these data domains and these update functions that are described as sets. But what we actually want is efficient programs that would update the data domain, and to give some structure to the data domain; we need to find an efficient implementation of the data domain -- going from this unstructured representation to something efficient. So if you have four values, for example, this step would recognize that for this automaton I have to use two separate bits, and for this other automaton I have to use a counter that can go from zero to three. So there is this last step of finding the right structure for the data domain. And actually we move the updates to the states, because it's more efficient to match that way. But that's an easy step at the end.
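For concreteness, here is one way to write down the pieces just enumerated as a data structure; the field names and tuple encodings are mine, not the compiler's.

```python
from dataclasses import dataclass

# Sketch of the nondeterministic XFA the compiler works with.
@dataclass
class NXFA:
    states: set          # automaton states
    domain: set          # unstructured data domain, e.g. {0, 1, 2}
    trans: set           # nondeterministic relation: (state, symbol, state)
    updates: set         # update relation: (state, symbol, old_val, new_val)
    initial: tuple       # initial configuration: (state, value)
    accepting: set       # accepting configurations: {(state, value), ...}

# e.g. the counting automaton from the talk accepts in state "N" only with
# data value 2: accepting = {("N", 2)}
```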
So I'm not going to go through all these steps, but I have here examples of nondeterministic XFAs that we get during the process. The one on the right is what we get if we combine expression one and expression two with parallel concatenation. It's similar to having two automata that recognize the two expressions separately. But because we have the bit, it doesn't accept if it just sees the second expression before the first expression. And actually, if there are overlaps between them, this still guarantees the correct semantics. So it can handle arbitrary cases: even if there are overlaps between the strings, the two regular expressions are handled correctly. And the other example is the shape of the automaton that we get if we use this integer range notation to introduce a counter. Okay. Let me talk about optimizations. And actually there are two optimizations that help us a lot, for some signatures at least. One is that, for the example we saw, we have this counter that we have to increment on pretty much every byte of input. And that's not a problem by itself, but if we combine many signatures like this, 15 signatures like this, then we have 15 counters to increment on every byte. That slows down the processing. So can we do it some other way, without incrementing the counter on every byte? What we do is, for some counters such as the one used here, we just don't increment them on every character, but instead remember when they would trigger an alert. So we have this global data structure; think of it as a timer, and we set an alarm in that timer when we would initialize the counter. And then, instead of incrementing the counter on every byte, we just check to see whether any of the alarms we set timers for is triggered or not. And of course we remove it from this list when we see a new line. And what we did here is we removed the increment operation from the common state. Now, we still have this operation from state K, which is the state in which the automaton spends most of its time: we still need to check whether there are any timers that expire on the current byte. But that's a scalable operation, because that's a single check, and we perform a single check whether we combine 15 signatures or whether we have a single signature. It's not like incrementing 15 counters, which is more work as the number of signatures increases. Yes? >>: [inaudible] other counters check the [inaudible] right? >> Cristian Estan: No. So we can have a counter that's being invalidated by a new line, we can have a counter that's being invalidated by a space, or a -- >>: [inaudible] and another one that [inaudible]. >> Cristian Estan: Yes. >>: Then -- >> Cristian Estan: No, then this won't work. So if you have that, then we wouldn't apply this optimization. This applies if you have counters that are incremented on most possible input characters. And we don't always apply this optimization. For some counters -- for example, for some e-mail signatures -- you are looking for a certain number of very specific characters, say a number of repetitions of the at sign or the percent sign, and there we don't use this. >>: [inaudible] signatures, how many repetitions do the regular expressions specify? >> Cristian Estan: So hundreds. >>: Hundreds, really? >> Cristian Estan: Hundreds. And there are some that go up to two or four thousand. But -- >>: So [inaudible] so many -- I mean, what are the signatures looking for so that they contain hundreds of repetitions or something? >> Cristian Estan: Buffer overflow, yes. >>: And the question is why 2,000, why not just [inaudible]. [brief talking over]. >> Cristian Estan: Well, if you have a buffer that's 2,000 characters, then you need at least that magnitude to cause trouble. So we haven't looked very deeply into the reasons behind these signatures, but we are assuming many of these are buffer overflows -- the ones that do this type of counting.
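Here is a sketch of that deferral, under my guess at the data structure (the talk only says to think of it as a timer): when the counter would be initialized, record the offset at which it would fire, and on every byte make one check against the earliest pending alarm. It reuses TRANS_NL from the counter sketch above and produces the same alert.

```python
import heapq

def xfa_count_deferred(data: str):
    alarms = []                     # min-heap of (fire_offset, rule_id)
    state = "K"
    for pos, ch in enumerate(data):
        row = TRANS_NL[state]
        state = row.get(ch, row["other"])
        if state == "NA":
            heapq.heappush(alarms, (pos + 200, 1))  # "\na" seen: arm rule 1
        elif state == "N":
            alarms.clear()          # a newline cancels all pending counts
        # The one scalable check per byte, replacing N counter increments:
        while alarms and alarms[0][0] == pos:
            print(f"rule {heapq.heappop(alarms)[1]} matches at offset {pos}")

xfa_count_deferred("\na" + "x" * 200)   # fires at offset 201, as before
```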
Okay. There's another thing we can do. So there is an opportunity, because often these counters are what we would call mutually exclusive. If you want to count the number of non new lines after the HELO keyword in SMTP, and you want to count the number of non new lines after another keyword in SMTP, well, you will never at the same time be on a line that had one keyword first and on a line that had the other keyword first. So those two counters are never used at the same time. So what we can do is actually use a single counter: when the signatures are combined, notice that we can use a single counter to look for both types of repetitions. So this is the combined automaton without this optimization, and if we use a single counter, then the only change really is that we have fewer alarms to cancel when we see a new line. And that's actually an important benefit, because now it's not as bad as before -- you don't have, you know, 15 counters to increment -- but after a new line, if we have these separate counters, we have 15 counters to reset. So we are reducing the length of that list by using this optimization. It's actually more general. We have a dataflow analysis, and it's similar to some of the analyses done by compilers. But this is the most important case in which this optimization of using the same counter for multiple signatures is applicable. It's in some sense similar to what compilers do when you have different variables that use the same register. They do some analysis, and if they figure out that you can use the same register, then they use a single register, and you don't need two registers for two variables that are not used at the same time. Okay. Let me talk a little bit about our experimental evaluation. We used signature sets from Cisco IPS and Snort, and we used FTP, SMTP, and HTTP signatures, which are text based protocols where regular expressions are used a lot. There are other protocols also that use regular expressions, but we just focused on these three. And we were able to construct XFAs for 90 percent of the signatures; and for DFAs, for none of these signature sets could we fit the combined DFA into gigabytes of memory. Actually, I think we also tried with 16 gigabytes and it still wouldn't fit. So they are way larger, even for, you know, the smallest signature set, which has 38 signatures. For XFAs the number of automaton states we get is on the order of hundreds to 15 thousand states, so that's still a few megabytes or tens of megabytes of data. So in all cases we could just produce a single XFA recognizing the entire signature set. Now -- >>: Question -- >> Cristian Estan: Yes. >>: So when you say you constructed this for 90 percent of the signatures, what was the problem for [inaudible]. >> Cristian Estan: The problem for the remaining 10 percent is that they were different from the others. We haven't picked, you know, the hard ones or the easy ones. And in our compiler we have that last step where we need to add the structure to the data domain, and there are just many variations on bits and counters. So for the most common shapes of signatures we went through the manual work of building descriptions of those structures for the data domains, but it's a manual step. So when we got to the signatures that are not exactly like the other signatures, it's just a lot of work for solving the last 10 percent of the problem. So we stopped there.
And, well, we decided that it's better to try to come up with an automated compiler as opposed to just pushing that through to 100 percent. It's not a fundamental limitation of the idea of using XFAs; it's just that our compiler is not automated enough to do things entirely on its own. >>: When you say [inaudible] so for me -- when you said you actually need a large memory, does that mean that the current IDS systems don't actually do the signature matching? >> Cristian Estan: They don't use a single DFA, they use multiple DFAs. They have to use multiple DFAs, or some things other than DFAs. If we have time after half past 11, then I will go into a couple of other ideas that are being used and their pros and cons. Okay. So we compare against multiple DFAs, and against multiple DFAs with compressed transition tables, which is something that was proposed in SIGCOMM 2006. And this is the type of results that we get, where the black crosses are just the multiple DFAs and the diamonds are the compressed DFAs. The horizontal axis is the processing time, so the further you are to the right, the slower the solution; and the vertical axis is memory usage, so the higher up you are, the more memory you use. The vertical line there is the execution time of a single DFA -- well, we cannot use that to represent the entire signature set, but we put it there for reference. So this is the data point we get with an XFA before we apply these optimizations, and this is the data point that we get with optimizations; so we can see that it's faster and more compact than the solutions we can have based on multiple DFAs for exactly the same signature set. And this is for Cisco SMTP, which is one of the simpler signature sets. Snort HTTP is the most complex signature set we looked at, and again the axes are on log scale, so the differences here are a factor of 15 in memory consumption and a factor of almost ten in speedup. Now, why are we faster? We are not faster than the blue line; we are not faster than a single DFA, because we do what a DFA does plus run some little programs. But if we compare against something that can recognize the same complex signature set, which in this case is 40-some DFAs that use much more memory, then we are faster than a bunch of DFAs. We are not faster than an individual DFA, but we are faster than the current solution, which is that of using multiple DFAs. >>: So these cycles, this is for a commodity [inaudible]. >> Cristian Estan: Yes, this is for a commodity Pentium, and many of the existing intrusion prevention systems don't run on a commodity Pentium, but this is something we could measure that would give us an idea. And, well, if you use something faster or something specialized, then the absolute numbers change, but we expect to see the same types of curves. >>: I'm curious, is that [inaudible] memory size of the transition tables are [inaudible]. >> Cristian Estan: It's hard to say from these numbers, but our suspicion is that a lot of it is because of memory accesses also. Because these are large data structures, and there is some locality -- there are some popular states -- but every now and then you go outside the popular states, and that's a very slow operation. So the things that we do, such as incrementing and manipulating a few bits and counters that are relatively easy to cache, we assume don't actually generate as much memory traffic as looking up many transition tables.
Even if you have a cache hit ratio of 90 percent, if you have 50 automata, then five out of 50 will still require slow memory accesses, and we speculate that that's part of the reason. It's a combination of both. But -- well, it's hard to say from the numbers we have. >>: A question. So the cycles, that's not actually operations executed, or -- >> Cristian Estan: That's the Pentium performance counter cycles, yes. >>: Okay. >> Cristian Estan: And for the alerts themselves we have a null routine, so we are not spending any time on, you know, logging or anything like that. And we are not counting the time to read in the input. So that's what our methodology was. So let me summarize. Regular expression matching is a performance critical operation for deep packet inspection, specifically for intrusion prevention systems, but for other systems that do deep packet inspection also. And the state space explosion leads to memory problems that translate into run time, throughput problems with DFAs, because you have to go to more than one DFA. And [inaudible] extend these DFAs with auxiliary variables, and the effect of that is that we [inaudible] this state space explosion by having underlying DFAs that don't interact adversely. And where we are now is that we need more work to fully automate the construction of XFAs, but we were able to build a prototype with 90 percent of the signatures, and XFAs can outperform multiple DFAs in both matching speed and memory usage. So -- yes? >>: I have a question, like how about this compilation. How hard is this? Like if you [inaudible] how long will it take [inaudible]. >> Cristian Estan: The compilation stuff? >>: Yes. So for a company, right: a company can actually throw at the Snort thing whatever regular expressions they are using right now -- it's slower, but, you know, they press a button for it to work -- or they can do this stuff, which I understand can be automatic but isn't right now. So how hard is it to do this, to actually combine all the regular expressions into these structures? >> Cristian Estan: So the hard part is not the combination. When we combine regular expressions, for five out of the six signature sets combining from individual XFAs takes less than a minute, including all optimizations; for the most complex signature set it's seven minutes. And this is just combining them one by one. So it's -- >>: [inaudible] hatch signs, all that? >> Cristian Estan: No. So combining the XFAs, that's the easy part. Building the XFAs, that's the hard part. And with our current tools -- I don't know off the top of my head the exact numbers, but very close to 85 percent of the signatures that we compile take under 10 seconds. >>: [inaudible] I would imagine they have a long lifetime and get used widely. Like for a given signature, if combination is easy, somebody needs to do it once manually and then just keep using it, right? >> Cristian Estan: That's one argument to make. But if they don't buy that, then we have to come up with a compiler that works for more signatures. >>: What I meant to say, sometimes [inaudible] can be a deal breaker. >> Cristian Estan: Yes. >>: For something -- >> Cristian Estan: For intrusion prevention it is not. >>: Right [inaudible]. >> Cristian Estan: Well, we don't think it should be, but Cisco may disagree.
And the other comment is that in some other settings, where you have automated systems coming up with signatures, or where for application identification you give more freedom to the user to insert things they want to recognize, there it's more important to have a fully automated way of going from these frequently updated signatures to an XFA representation. So okay. Since we have more time, I can go through a couple of other ideas that have been used. And actually, this idea of separating the variables from the automata -- we think in a real system it would be used in combination with a couple of other ideas that have been out there. So let's just go through a couple of extra topics here. The first one is exploiting hardware parallelism; the second one is slow path, fast path -- this is what Snort uses; then nondeterministic automata; and compressing transition tables. So if we have multiple signatures to match against the input, then we can do this internally if we have parallel hardware, and people who do FPGA based solutions or build integrated circuits can have as much parallelism as they want and have the work done in parallel. So the advantage is that the area increases linearly with the number of signatures, and there's no slowdown as the number of signatures increases. But what happens if you have too many signatures -- and people would argue that we haven't reached that "too many" point -- is that the power consumption goes up, because you have all these microcontrollers or cores working in parallel, and the per flow state gets very large. So imagine that you have 500 microcontrollers, each with their own state pointer. If you have a single automaton, then you have a single state pointer. If you have 500 automata matching in parallel, you can't even broadcast something to them; you need to load these state pointers one by one and then save them after you're done with the packet. So the throughput, if you are just looking at one stream, is okay. But if you need to context switch and save the per flow state, then that becomes a more expensive operation with these types of approaches. >>: I would assume in most [inaudible] you would be wanting to compare multiple flows rather than just a single flow? >> Cristian Estan: Multiple flows rather than a single flow, exactly. The other point is that if you have this parallelism available in hardware, then wouldn't it be better to have a solution with a single automaton: instead of using 50 cores to match one packet, use the 50 cores to match 50 separate packets and increase your throughput that way. >>: It might be more efficient to process flows in [inaudible] process each individual one in parallel. I don't know. >> Cristian Estan: So what happens is you often get these papers that look at the whole design space, and there's a cost to working on different packets, because then you need different local buffering -- they cannot use the same buffer. So it's hard to piece apart the whys. But basically there have been proposals that use multiple cores to work on the same input, and there have been proposals to work on different inputs in parallel on separate cores with hardware parallelism. It's more expensive to work on multiple inputs in parallel than it is to work on the same input and have multiple processing units work on it.
But again, this is something that has been proposed, and it is useful: for example, in an XFA based system, if our compiler is not good enough, we can still have one XFA and then a bunch of DFAs that would run in parallel. And that would be a better solution than running way too many DFAs in parallel. So another solution is a fast path, slow path. And there are many variants of this; Snort is using this extensively. So you can map the signature onto something simple that you can recognize efficiently and where you can combine the signatures, such as string matching -- and that's exactly what Snort does -- and then, if you see a string that must be there for the signature to match, you go through a slow path that actually checks everything, not just string matching. So you don't alert whenever you see that string -- you get the good semantics -- but most of the time you don't do the expensive matching. So typically the way it works is you combine these fast path representations, but then you match individually the slow path representations that give you the semantics you want. >>: [inaudible]. Don't you [inaudible]? I mean, suppose that back followed by star and [inaudible] right, so [inaudible] string you could look for on the fast path, but then you need to know what passed before, so you [inaudible] on to some of the data that went before to actually raise an alert. >> Cristian Estan: Yes, yes. So you can do this if you have all the data: you do the fast path pass first, and then you may need to come back. And how do you know if that's needed? Well, the Snort signature language tells you these strings, and if the string doesn't occur, then the signature cannot match. So for example, say the keyword was describe. You look for the string describe. If the string describe is not there in the input, then you cannot see a new line followed by describe, followed by 500 non new lines. >>: So you're essentially breaking up individual signatures; you're not saying, okay, these signatures are really slow, I'm going to put them in the slow path? >> Cristian Estan: No, no, no. You're breaking up individual signatures. And that's a very good technique. So our gripe with this approach is that it's open to algorithmic complexity attacks -- the problem is not that algorithmic complexity attacks can happen, but that people don't really look at how bad they can be. So obviously, if you have this fast path slow path, then by triggering the slow path more often than with normal traffic you can slow it down. But people proposing these types of approaches don't really try to break their own scheme and see how vulnerable it is -- well, some of them do. Okay. So I don't think this is a bad idea, but if you have such a system, then we would argue that the best way to build it is to have some quantification of how much it can be slowed down by someone who adversarially tries to trigger the slow path. And then there are things you can do to make it hard to trigger the slow path very often. But it's not always done.
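Here is a sketch of the fast path, slow path split described above; the rule content and anchor string are made up, and in Snort the fast path is one pass of set string matching over all anchors rather than the per rule scan in this toy.

```python
import re

# Hypothetical rules: rule id -> (must-contain anchor string, full regex).
RULES = {
    2001: (b"describe", re.compile(rb"\ndescribe[^\n]{500}")),
}

def inspect(payload: bytes):
    for rule_id, (anchor, regex) in RULES.items():
        # Fast path: if the required string is absent, the rule cannot
        # match, so the expensive check is skipped for most traffic.
        if anchor not in payload:
            continue
        # Slow path: full regular expression semantics.
        if regex.search(payload):
            print(f"alert: rule {rule_id}")

inspect(b"x\ndescribe" + b"A" * 500)   # anchor present and regex matches
```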
So another thing that's used is nondeterministic finite automata, and they are very compact and you can build them quickly, but you need more processing when you match them. So if you do this breadth first, it's similar to having the multiple DFAs. If you do a depth first traversal of the state space, then backtracking can get you in trouble, and again you can have algorithmic complexity attacks; and the result is that for actual IPS signatures we were able to slow them down by six orders of magnitude just by giving them inputs that cause them to backtrack. So these are actual Snort signatures that I'm talking about. >>: [inaudible] for example? >> Cristian Estan: So we had some earlier work that looked at backtracking, not inside NFAs but inside Snort's language for signatures, which is a predicate based complex language. And what we did there is we slowed it down by a million and a half times by triggering backtracking. And then, if we applied some dynamic programming techniques -- memoization, specifically -- that slowdown went away, because we could check and not redo work that wasn't necessary. So with backtracking, if you do it in a simple minded way, then you can redo work that you did previously and that already led to failure, and you just repeat it because you don't keep enough memory of what you did. Now, if you do this dynamic programming, where you remember that trying this from this position in the input will lead to failure, then you can cut down very much on the power of these algorithmic complexity attacks. Now, I don't know how easy it is to integrate that with PCRE, with a given library, which is relatively messy. But it's a solution that worked for the specific backtracking [inaudible]. Okay. And the other thing is compressing transition tables. This is what we compared against. There are all types of methods of compressing these large transition tables with 256 characters: we can notice that many characters are treated the same by all states, or that the transition tables for different automaton states are similar, and then use these similarities. And you can get pretty far with simple things -- a factor of 10 or 20 is probably easy to achieve without complicating very much the decompression you need to do when you do matching. But it's hard to go past that. And we want to point out here that this is an orthogonal solution to what we have with XFAs, so we could actually further reduce the memory usage of the DFA underlying the XFA by applying some of these transition compression techniques. Another problem where compressed transition tables are being applied is multi-byte matching, which I didn't talk about in the main part of the talk. If you want to take not one byte at a time but two bytes at a time, then you have an alphabet of 65,536 characters, because you're looking at two bytes at a time, or even larger. For these alphabets the compression techniques work better than for smaller alphabets, but still you cannot take too many bytes at a time because they get overwhelmed. So that's the last of my extra slides; if you have questions, I can answer them. If not, then we can end here. >> Ratul Mahajan: Let's thank the speaker. [applause]