>> Ratul Mahajan: Welcome. Good morning. Welcome everybody. It's my
great pleasure to introduce Cristian Estan from University of Wisconsin, Madison.
And I've actually known Cristian for quite a while now, from when I was a
grad student at UW and he was a grad student at San Diego, and he graduated
there four or five years ago?
>> Cristian Estan: Yes.
>> Ratul Mahajan: With George Varghese. And today he's going to talk about
some of the work he's done over the past few years on improving deep packet
inspection. Take it away.
>> Cristian Estan: Thank you, Ratul. So okay. Let me try presenting from here,
the cameras can see me here also.
This is joint work with my colleague Somesh Jha and my two students
Randy Smith and Shijin Kong. And this talk summarizes work that
we had in a SIGCOMM paper this year and an Oakland paper earlier this year.
So what is this talk about? So let me first tell you what this talk is about. It's
about regular expression matching, which is a performance critical operation for
all type of systems that do deep packet inspection. And the problem is that it
takes too much time and too much memory. So our solution to this problem is a
new representation for the signature sets that need to be compared against the
traffic. And it's a representation that allows us to represent these signature sets
compactly and supports fast matching.
So for example, compared to methods currently in use in commercial intrusion
prevention systems, we have a 50 times reduction in memory usage,
and at the same time a 10 times increase in the speed of the matching operation.
So not 50 percent, 50X. But the magnitude of the benefits depends on the
complexity of the signatures, so for simple signature sets we don't see benefits that
are quite this big. So this is in a nutshell what this talk is about.
Let me go on and motivate why we're working on this. Why do we need deep
packet inspection? So there are a couple of scenarios that are probably going to
keep motivating this type of solution. So for example, you have a server that is -- that has a vulnerability and is not patched, but it has to accept connections from
outside clients, from anywhere in the Internet. But some of those clients may
trigger that vulnerability and want to take over the server or shut down the server
or whatever. So why would you ever have an unpatched server facing the
Internet? Well, maybe the patch doesn't exist yet or maybe the patch breaks some
other functionality so there are many reasons why you could have such a server
so you put an intrusion prevention system before the server to protect it, and the
server can keep running.
Another example is when you have an enterprise running multiple applications
and the enterprise wants to prioritize the traffic without having to change the
applications to insert markings into the packets, so that's another driver. Or
when you have floods that you need to defend against in the network because it's
too late at the end point, and you want to analyze packet contents to detect
what's an attack, what's not an attack, and that's another case when you would
want to do deep packet inspection.
Now, packet header based filtering criteria have been applied and are used, but
what we see in all of these problems is that you have to look inside the content of
the packet. And for the purpose of this talk, I define deep packet inspection as
being done by any system that looks at the payload of the IP or TCP packets.
So okay. So how does regular expression matching fit into this whole scenario?
So not all of what these systems do is deep packet inspection, they use this
header based criteria. For example to decide what signature set to apply. Or if
you have something that does application identification then you do deep packet
inspection on the first few packets of a flow, and from there on you just look at
packet headers because you have already classified the flows. So it's not all, not looking
at -- not necessarily looking at the payloads of all packets. And deep packet
inspection is not all regular expression matching. You can have some parsing in
there that would direct the regular expression matching just to some portions of
the packet, not the entire packets. Or you may have some other things such as
decoding, various encodings for protocols.
And in some cases it plain doesn't apply. So if you have encrypted traffic then
you cannot do a regular expression matching on it. You can decrypt it and then
do regular expression matching but not apply regular expression matching
[inaudible].
>>: I was wondering. Do you know how many enterprises are using IPsec in
their -- how much of their traffic, say?
>> Cristian Estan: I don't know.
>>: When you talk about regular expression here, do you mean also like kind of
packet boundaries or do you mean [inaudible]?
>> Cristian Estan: So for -- we assume that someone gives us an input and then
whether the system gives us a single packet or reassembles multiple packets
and gives us a TCP level in a byte string, the same problem appears. So that's
external to what we do. But yes, the systems that care about the security, they
need to do reassembly, because otherwise the bad guys can evade the
detection.
So just one more historic note that string matching used to be used for a similar
purpose but it's not expressive enough; just looking for strings in the payload is not
expressive enough. There are so many ways of just changing the attacks to fool
string based methods, so the world is moving towards regular expressions
now.
So I'm -- I start by introducing the problem of regular expression matching, then
talk a little bit about the core idea, then about the things we needed to do to turn
into it a solution, show some results. And if we have time, then I can talk a little
bit about other ideas that have been used in this context of improving the
memories of your performance of this critical operation of regular expression
matching.
So, okay. What's our definition of a problem? So we have a signature sets, a
signature set. So what's a signature set? We have a list of a set of signatures,
each is -- each signature is a regular expression and each regular expression
has a rule number associated with it. And the matching problem we are solving
is that we take all these regular expressions and we want to be able to tell the
rest of the system when any of these rules matches. And we want to find all of
the matches and we want to know which of the rules match, not just one of the
rules matched. And we detect matches in the middle of the string also. So that's
what that prefix of the input matches part means. So if a signature matches in
the middle of the packet, then we also have an alert.
Now, this is pretty much the same problem as taking those regular expressions
and then OR-ing them together and then just recognizing the combined regular
expression. It has these tiny differences but that's the fundamental problem that
we are looking at. So while people have been looking at regular languages and
regular expressions for a long time, so let's not just launch into hacking the
solution to get it, let's look at what we know from theory.
So we know that finite automata are the simplest machines that can recognize
regular languages and so anything else that recognizes regular language is -- so
we don't need anything more complex than the simple deterministic finite
automata to recognize whether an input match is part of a regular language or
no.
We also know that there exists a canonical minimal DFA and no automaton can
recognize correctly the language, a given language with fewer automaton states
than the number of states of the minimal DFA. So does it mean that whatever
we do we cannot have use less memory than this minimal DFA? That's not a
correct conclusion, and I hope it will become obvious why. And does this mean
that we use -- if you use anything other than DFAs it will be slower because we
use something more complex? Actually that's not true either, and we will get into
explaining why that is so in a few minutes.
So let me just -- just a refresher. Just go through what a DFA is and how it
works. So this automaton, with its states P, Q, R, S and T, is a data structure that is
used during matching to recognize the signature dot star AB dot star CD, which
basically is a string AB followed at any distance by a string CD.
So the way this works is that we have a pointer pointing to the current state so
note that the automaton itself doesn't tell you anything about the relation of the
input to the signature, how well it matches, but this pointer describes the state of
the computation. So how the input seen so far relates to the signature.
So once we have this pointer to the current state, which is initialized to the start
state, we go through the characters one by one. So here I'm using the
convention used by many of the intrusion prevention systems: square
brackets with a caret before the A ([^A]) mean all characters other than A.
And these transition tables are just large tables and you index into them with the
actual character and you always have a transition for the next character. So we get an
A, we get a B which takes us to state R, we get a -- well, E and F will keep us there.
Okay. We see the C, the D, and we reach the accepting state; the DFA accepts
and then we notify the rest of the system that we have a match. And then we
continue because we want to find all matches.
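[Editor's note: a minimal sketch, in Python, of the matching loop just described for the signature dot star AB dot star CD; the state names and the if/else encoding are illustrative only -- a real engine indexes a flat 256-entry transition table per state.]

```python
# A minimal sketch of DFA matching for .*AB.*CD. State names are made up;
# a real engine would index a flat 256-entry transition table per state.
START, SAW_A, SAW_AB, SAW_AB_C, ACCEPT = range(5)

def step(state, ch):
    if state == START:
        return SAW_A if ch == "A" else START
    if state == SAW_A:
        if ch == "B":
            return SAW_AB
        return SAW_A if ch == "A" else START
    if state in (SAW_AB, ACCEPT):          # AB already seen; now look for C
        return SAW_AB_C if ch == "C" else SAW_AB
    if state == SAW_AB_C:
        if ch == "D":
            return ACCEPT
        return SAW_AB_C if ch == "C" else SAW_AB
    raise ValueError(state)

def match(payload):
    state, alerts = START, []
    for pos, ch in enumerate(payload):      # one table lookup per input character
        state = step(state, ch)
        if state == ACCEPT:                 # accepting state: alert and keep going
            alerts.append(pos)
    return alerts

print(match("xxABxxCDxxCD"))                # [7, 11]: matches are found mid-stream too
```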
Now, in many of the figures I will use simplified representations without all the
transitions, but all the automata, with just one exception -- one exception. All the
automata in this talk have [inaudible] and they have all the transitions defined.
>>: [inaudible] because if you get AA, BE, FCE or Q you should have an extra
transition [inaudible] if it gets A.
>> Cristian Estan: Yes, yes, yes, yes, yes, yes, yes, yes, yes, yes. Yes. So I'm
cutting corners. I'm not putting all the transitions and I would keep doing that
throughout the talk because it gets hairy. [laughter].
So what is the problem? So if you have two signatures like this, and we want to
combine them, then we can do that, and we have a single automaton that
recognizes both signatures at the same time. So with this combined automaton
we have a single state pointer and it tracks the progress in matching both
signatures. So if after GAB we transition to a state that indicates that we have
seen AB, so there's progress in matching the first signature, after EF we
transition to a state that indicates that we made some progress on both the first and
the second signature, and after we see the CD, we arrive in this final -- in this
accepting state and then we alert for signature one and later on we may alert for
signature one or for signature two.
So we have a single data structure describing both signatures. But the problem
is that this is a large one. So if we have N such signatures, we need at least two
to the N states for the combined automaton.
>>: Are the examples carefully picked because the signature [inaudible]
overlapping so what if the signature's [inaudible]. So for example if you have AB
and [inaudible] AB and [inaudible] ABE.
>> Cristian Estan: So if the signatures are dot star AB, dot star ABD, you
get the same type of explosion. You cannot get away from the two to the power N.
In some examples some of the automata may become messier, but you
don't get away from the exponential state space explosion because of
overlapping strings.
>>: So in other words you're putting [inaudible] optimized just because you
actually have [inaudible] signatures? Because you are already -- so think of it as
signature having a smaller sort of [inaudible] you can match for both signatures,
sort of X signatures.
>> Cristian Estan: Yes.
>>: So you can say already match a subset of my bigger match, can I now use
that as a building block [inaudible].
>> Cristian Estan: Please wait in five or six more slides and let me know if what
I'm going to present is what you had in mind.
So basically why do we have this state space explosion? Because at any point in
time, if we see the second string of any of the signatures, we
need to know whether to accept or not. So we need to know whether we have
seen the previous -- the first part of the signature or no. And we need to know
that for all signatures. So there are many possible interleavings. And we need a
separate automata state that would represent that we haven't seen the first string
of any of the signatures, or we have seen it just for the first or just for the second
or the first and the second or just for the third and so on. So it's an exponential
number of combinations that the computation needs to differentiate between.
And a DFA, the only way to distinguish between two things is to have a separate
automaton state.
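[Editor's note: a small sketch of the counting argument above, assuming N signatures of the simplest form dot star A_i dot star C_i with distinct first strings; it enumerates the reachable combinations of "first string seen" flags, which is exactly what a single DFA must keep apart with separate states.]

```python
# A small sketch of the counting argument, assuming N signatures of the
# simplest form .*A_i.*C_i with distinct first strings A_i. Between input
# bytes the matcher must remember, per signature, "has A_i been seen yet?",
# and every combination of those answers is reachable, so a single DFA needs
# at least 2**N automaton states to keep them apart.
def count_tracking_states(n):
    first_strings = [f"A{i}" for i in range(n)]
    start = (False,) * n                      # no first string seen yet
    seen, frontier = {start}, [start]
    while frontier:
        flags = frontier.pop()
        for sym in first_strings:             # feed each possible "first string"
            nxt = tuple(f or s == sym for f, s in zip(flags, first_strings))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

for n in range(1, 6):
    print(n, count_tracking_states(n))        # 2, 4, 8, 16, 32 -- grows as 2**n
```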
So now, okay, this is problematic. So people obviously haven't been building
DFAs that are exponential in size because they wouldn't fit in the memory of any
device. So what have -- what approaches have been used? So one solution is
to just match the signatures in parallel. Now, the downside of that is that instead
of just having one pointer that you keep updating on every input byte, you have
multiple pointers, so your throughput goes down. Or you need to spend more
processors, more cores, on achieving the throughput that you desire.
Another solution is to combine them but not trigger big exponential explosion but
just control it somehow, so combine just subsets of the signature set and then
you can control the explosion and you don't have -- and depending on how much
memory you have, you can use more or fewer automata.
Now, what -- does any of this contradict the theory? So how can we match a
signature set with fewer automaton states like when using separate automata
than the number of states of the minimal DFA? So is there any contradiction
here? There isn't because the state of the computation still has many, many
possible values. But the number of automaton states which is the data structure
we use for recognizing it is not as large. So here the state of the computation, if
we use multiple automata, is just a tuple with all the pointers for all the
automata and that can still have a very large number of values, we just don't
have a separate data structure with all those big transition tables for every
possible state of the computation.
So this is what we see with the methods currently in use. We can have
separate DFAs for each signature which is slow, or we can have a single DFA
which is fast but uses a lot of memory, and there's a curve in between that we
can move on by controlling the number of DFAs that we combine. But this is not
what we want. We want to get in that ideal spot there. We want the same
memory as the separate signatures, and we want matching to be no slower than
it would take to just match a single signature. And actually for string matching,
which was used before regular expression matching, we had exactly this
behavior. You just add more strings: no state space explosion, your automaton
becomes bigger, but your matching isn't any slower.
So let me tell you about how we achieve this and then we will go into other
topics. So again, some transitions are missing. But this is what we do. We have
these extended finite automata XFAs where we extend the underlying automaton
with a little bit of scratch memory, in this case a variable called bit that can have
two values, true and false, and we add some programs to the -- attach some
programs to some of the automaton states.
So how does this work? Why does this work? By adding these bits and this
extra computation, we have automata whose shape is closer to the shape of
automata that do string matching. So these don't have those bad properties that
we're trying to run away from. But we still can recognize the same language
because we have these extra checks.
So this automaton is the same as an automaton that would recognize an AB and
a CD just totally independently. So at the initialization we don't just initialize the
state pointer but we initialize the value of the bits so the state of the computation
is a pointer to an automaton state and the value of the bit. And then on every
character, we keep following transitions and we don't touch this scratch memory
that holds the variable, the bit, until we get to a state that has a program
associated with it that sets the bit, and then we keep transitioning, and eventually
we get to a state that's an accepting state but it's not an unconditional accepting
state: now we check the value of the bit. So we don't accept if we just see a CD
that's not followed, that's not preceded by an AB because the bit wouldn't have
been set.
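[Editor's note: a minimal sketch of the bit-based XFA idea for dot star AB dot star CD, reconstructed from the description above; the two small string matchers and the variable names are illustrative, not the authors' code.]

```python
# A minimal sketch (my own reconstruction) of the XFA idea for .*AB.*CD:
# the automaton only has the string-matching shape for AB and CD, and one bit
# of scratch memory records "AB has been seen". The accepting state for CD is
# conditional on that bit.
def xfa_match(payload):
    ab_state = 0          # how much of "AB" we have matched
    cd_state = 0          # how much of "CD" we have matched
    bit = False           # scratch memory: set once AB has been seen
    alerts = []
    for pos, ch in enumerate(payload):
        # string matcher for AB
        ab_state = 2 if (ab_state == 1 and ch == "B") else (1 if ch == "A" else 0)
        if ab_state == 2:
            bit = True                    # program attached to the "saw AB" state
        # string matcher for CD
        cd_state = 2 if (cd_state == 1 and ch == "D") else (1 if ch == "C" else 0)
        if cd_state == 2 and bit:         # conditional accept: CD only counts after AB
            alerts.append(pos)
    return alerts

print(xfa_match("CDxxABxxCD"))            # only the second CD raises an alert
```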
So the gain is not -- if you look at the single automaton but the gain is when you
look at multiple automata. Again, I'm simplifying. And when you combine them,
then you don't get the state space explosion. So single automaton in this case is
not even smaller, but when you combine them you don't get the state space
explosion. But you get the nice automaton whose size is linear in the number of
signatures that you combine.
Now, let me just say a few words here about how this combines operation works.
So we are combining two things. We have the underlying DFAs and then we
have these programs associated with states. Now, in our system the programs
and the variables in scratch memory do not affect at all transitions in any way.
They only affect acceptance decisions. So we combine the underlying DFAs just
as you would combine normal DFAs. So it's a very simple well known, well
understood operation.
Now, what do we do with the programs? Well, each state in the combined
automaton is a reflection of two states in the two automata you combine, and
all you have to do is to just concatenate the two programs. With an extra twist
that you may have to rename some variables if both have a variable called bit,
then you will call one of them bit one and the other one bit two. But there's
absolutely no interaction because they work on different variables and all you do
is concatenate the programs. Well, and if there's an empty program, then your
program that you combine with is just the same.
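[Editor's note: a minimal sketch of the combine step as described -- product construction on the underlying DFAs plus concatenation of the per-state programs with variable renaming. The data layout (delta, start, programs) is an assumption for illustration, and both transition tables are assumed total over a shared alphabet.]

```python
# A minimal sketch of the combine step: the underlying DFAs are combined with
# the ordinary product construction (scratch memory never influences
# transitions), and the per-state programs are concatenated after renaming
# their variables so they do not collide. Assumes each XFA is
# (delta, start, programs), where delta[(state, ch)] -> state is total over
# `alphabet` and programs[state] is a list of (op, variable) instructions.
from collections import deque

def combine(xfa1, xfa2, alphabet):
    delta1, start1, prog1 = xfa1
    delta2, start2, prog2 = xfa2
    rename = lambda prog, tag: [(op, f"{var}_{tag}") for op, var in prog]
    delta, programs = {}, {}
    start = (start1, start2)
    todo, seen = deque([start]), {start}
    while todo:
        s1, s2 = todo.popleft()
        # concatenate the (renamed) programs of the two component states
        programs[(s1, s2)] = rename(prog1.get(s1, []), "1") + rename(prog2.get(s2, []), "2")
        for ch in alphabet:
            nxt = (delta1[(s1, ch)], delta2[(s2, ch)])   # product construction
            delta[((s1, s2), ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, programs
```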
So any questions about why or how this works before we move on to another
example?
>>: The reason behind this is that you're trying to basically save -- if you have
multiple regular expressions save subsets of them [inaudible] through these
variables?
>> Cristian Estan: Yes.
>>: So that you can come back -- is that the transition of this --
>> Cristian Estan: Yes. So basically we are dividing the work between DFAs
and then these extra variables and this memory and the programs. And we are
using the automata for things they are good at, like recognizing chunks of regular
expressions, and we are using bits and, well, counters for things they are good at,
like tracking things that happen independently, and those
can be mixed very cheaply if you just have independent bits. But if you have
automata where things happen independently and can be interleaved arbitrarily,
then that causes the state space explosion. So we are just separating the two and
following more closely the structure of the underlying computation.
>>: Looks like here if you were to say just treat those bits as -- if you were to
take the bits out of state space.
>> Cristian Estan: Yes.
>>: And just add them back into your dispatch tables would that exactly have the
standard combination of [inaudible] correct? Right now if I were to say every
place where there's a decision made on the bit, I was to imagine just sort of
making the transition tables in memory now larger to reflect what's the state of bit
one and what's the state of bit two.
>> Cristian Estan: That's exponential.
>>: I agree. But my question is is it exactly isomorphic to the standard
exponentially large thing?
>> Cristian Estan: Not that way. So the way I would say it's isomorphic is that
the state of the computation is the pointer to the current state and the collection of
all the bits. And now your transitions update one part of it and then update the
other part of it, but by representing the two separately we can represent things
much more compactly.
>>: Okay.
>> Cristian Estan: And then actually later in the -- when I get to the compiler you
can see that these updates to these bits, all throughout the compiler, are on
transitions, just like the updates to the state variable, not on states. But it's more
efficient to implement it by associating things with states, not with transitions,
because there are many fewer states than transitions.
>>: So when you talk about saving memory, are you mostly talking about saving
the memory of the tables, the program or saving memory in the space? It seems
like the memory of the states is going to be about the same but the programs are
going to be a lot [inaudible] the transition table slash programs.
>> Cristian Estan: So the state of the computation which is the per flow state you
need to save between one packet and another is not -- is getting reduced
usually, but that's not the significant reduction. The size of the data structure,
which includes the automaton states, the programs, all that, that gets reduced
significantly.
Okay. So let's look at another example that's not as bad but still important in
practice. If we have a signature like this one where we are looking for -- and this
is typical for buffer overflows. We are looking for a new line followed by a key
word well which in this case is just the letter A followed by 200 non new lines.
Then, well, this is the automaton, with some transitions omitted, that would
recognize that. And in itself it's inefficient because it just uses 200 states to just
count how many non new line characters I have seen after the keyword, but
when it gets combined with the well behaved string matching automaton it gets
even worse, because the string matching automaton gets replicated, because that
string can occur at any offset after this command. So actually we have K times N
squared states for N such signatures, where K is this number, this number that's
inside the integer range constraint, so how far we count. Now with XFAs we
can use a counter instead, and it's a bit trickier than the other one, but it works. So
if we see a new line followed by an A, then we initialize the counter to zero. And
then we would normally come back to the state K and keep just staying in that
state and then whenever we move back to that state we increment the counter,
we check if the counter reached 200 and if so raise an alert.
That's not the whole story. That wouldn't give us the correct semantics. So what
if we see a new line? So actually we have to add a counter invalidation
program to the state that comes after seeing a new line and now this has exactly
the correct semantics. It recognizes new lines followed by As, followed by 200
non new lines.
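[Editor's note: a minimal sketch of the counter version just described, for a signature of the form new line, A, then 200 non new lines; the state names and the exact increment/invalidate placement are a reconstruction, not the authors' code.]

```python
# A minimal sketch of the counter version: the automaton has only the shape
# needed to spot "new line then A"; a counter in scratch memory does the
# "200 non new lines" part, and it is invalidated whenever a new line is seen.
def match_overflow(payload, limit=200):
    state = "start"            # start -> saw_nl -> armed (saw new line then A)
    counter = None             # None means the counter is invalid
    alerts = []
    for pos, ch in enumerate(payload):
        if ch == "\n":
            state, counter = "saw_nl", None     # invalidation program on the new-line state
            continue
        if state == "saw_nl" and ch == "A":
            state, counter = "armed", 0          # initialize the counter
            continue
        state = "armed" if state == "armed" else "start"
        if counter is not None:
            counter += 1                         # increment on every non new line
            if counter == limit:
                alerts.append(pos)               # \nA followed by `limit` non new lines
    return alerts

print(match_overflow("\nA" + "x" * 300))         # alerts once, at offset 201
```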
And it has exactly the same shape as an automaton recognizing the string new
line followed by an A. And when we combine it with another string matching
automaton we have again this nice shape of automata that just recognize strings.
The programs get copied in different places. And if you look at state KR there,
that's a first instance of an example where the programs from the two states that
are combined are non empty and then we actually have a concatenation.
So we are incrementing the counter and checking and accepting signature one if
it's 200, but we are also unconditionally accepting signature two because we
have just seen a BC. So and this is linear in the number of signatures we
combine.
So the core idea is to just take automata with -- and add these extra variables,
put them in a scratch memory and it works because it allows us to change the
shape of the automata any way that doesn't cause explosion and then these
extra variables they don't cause explosion because we just concatenate the
programs and we just concatenate the scratch memories basically when we
combine automata. So that's the core idea behind what we do behind these
XFAs. Okay.
>>: So how large those programs can be, like those variables [inaudible].
>> Cristian Estan: So for now we use bits and counters and we have instructions
for setting bits, resetting bits, testing bits, incrementing counters, invalidating
counters, testing counters, and that's all we needed to get where we are. But
you may find some other instructions useful or some other data structures useful
if you have different signatures than what we had.
>>: So in -- okay, so in theory these XFAs can recognize more than regular
expressions.
>> Cristian Estan: No. If you ask a theoretician, these don't recognize anything
more than regular expressions because we have finite state. The number of
variables we have is finite, they are finite counters, so the state of the
computation is finite, just like with DFAs. Now, if you ask a practitioner then well
a counter counting, you know, a 64 bit counter is very different from an automaton
with 2 to the 64 states. So in practice it allows you to do things that you wouldn't
normally do with regular expressions, but from a theoretic point of view, it still
recognizes only regular languages, because the state of the computation is
finite, it's just better structured. And there is structure there in the signature sets
that we looked at and that people use for intrusion prevention.
>>: So [inaudible] is really neat, but just because like the state is finite means
that does one follow from the other that any program that has finite state will only
recognize regular languages? Is that what you just said or --
>> Cristian Estan: Yes. Because if it has finite state then it can just represent it
with -- well, if it has finite state then it goes through this input bytes one by one
which we do. Then you can represent any change in the state of the program as
a transition. So -- and of course theoreticians say, well, but you have counters, so
if you give me two infinite counters then we have a Turing machine. But these
are not infinite counters, these are just finite counters. So to the theoreticians
this is regular expression matching and regular languages. It cannot do anything
more than that. Okay. So let me tell you about a couple of the things we had to
do to demonstrate this in practice. First, we need to handle regular
expressions that are very different from these simplified examples, which illustrate
well what we do but may be -- well, a lot cleaner than what we see in practice.
Then we need a compiler actually to take us from regular expressions to XFAs
because we don't want to be building these by hand. And actually it turns out
that there's a lot of mileage we can get out of optimizations on the combined XFA
because this structure allows us to optimize away things that help us with
performance and memory usage.
>>: Just curious so [inaudible] XFAs as DFAs plus [inaudible].
>> Cristian Estan: Yes.
>>: But [inaudible] total [inaudible] is to have as compact [inaudible] as possible?
>> Cristian Estan: Yes. Of the data -- yes.
>>: Essentially if you replace the DFAs with say [inaudible].
>> Cristian Estan: Yes.
>>: [inaudible] maybe a [inaudible] increased or [inaudible].
>> Cristian Estan: Exactly.
>>: Do you actually get better encodings for them or essentially what [inaudible].
>> Cristian Estan: So, well, the NFAs are much more compact. They are
quicker to build but matching is slow. So what we have is a single DFA for the
entire signature set that matches the signature set, and you pay the cost of a few
instructions for the programs. Whereas with NFAs, if you do a breadth first traversal
then you are in many states at any given time and your update gets complicated.
Or if you do backtracking, then, well, you can repeat a lot of work because of
backtracking, or you can just do a lot of extra work because of the backtracking.
So matching is slower and often less deterministic in terms of the time it
takes to go through the input.
>>: [inaudible] because I don't [inaudible] probably be a [inaudible] if the DFA
becomes too large essentially [inaudible] become more complex or [inaudible].
>> Cristian Estan: So in some sense this original picture here -- I shouldn't go
back this way. Okay. If you look at the corner for DFA, for each signature, that's
like an NFA where it -- it's in parallel in all these states for the different signatures
and you have epsilon transitions to the start states from a common start state,
so that's an NFA. So of course for NFAs there are many types of NFAs you can
build for a signature, so it's not like the DFS or random [inaudible]. But the way
to think about NFAs is that lower right corner, pretty much.
>>: So let me ask this question.
>> Cristian Estan: Okay.
>>: [inaudible] I can figure out. Should we replace the Perl regular expression library at this
point with XFAs? Or is this for routers only, or where [inaudible].
>> Cristian Estan: Okay. So the Perl regular expression library is actually
used in intrusion prevention systems, and it is an NFA based approach that uses
backtracking. Now, there are a couple of advantages we have. So one
advantage is that we can -- we don't have -- with PCRE, that's the library, if you
combine the regular expressions with a big OR, then it gets prohibitively slow.
And you have to match them separately. And then you pay the time -- the cost of
matching the signatures separately. So that's our advantage. Their advantage
is that it's very quick to get to matching. So we go through some work on determinization,
building the XFAs, combining XFAs before we can match. So in many settings
where you would use Perl regular expressions you don't want to spend a lot of
time optimizing and combining things, you just want a quick match and maybe
you'll be using the regular expression just once.
And then another advantage that it has --
>>: So let me just -- so it's also a size of the -- so it's also a function of the size of
the unit of the data?
>> Cristian Estan: No, it's a function of how many regular expressions you have
and how complicated your regular expressions are. So basically we exploit the
fact that we can spend a lot of time -- actually it's not that much, but we can spend
time on trying to get to a representation that allows fast matching. They have to care
much more about the time it gets -- it takes to get to the representation where they
can do matching, plus they have all types of features that we don't have, so in many
practical applications you need those, so it's not ready to replace them. But an
intrusion prevention system where you look for this high throughput processing, I
would argue that this is what you want.
Sorry. Okay. So let's see. With general regular expressions we have two -- well, we have one big problem. How does the compiler know when to use bits
and counters? And actually this is the question about introducing bits and counters
in the front end of the compiler. So for integers, for counters, it's easy because
there is a giveaway, there's this integer range notation, which is syntactic sugar
on the original syntax but it's used extensively, so whenever we see something
like that, which means between M and N repetitions, then we insert a counter. For
bits it's a bit more complicated. We introduce a parallel concatenation operator
that we insert in the regular expressions and it's the same as the normal
concatenation operator. It's the same in terms of semantics but it introduces
a bit in the construction.
And what we are trying to do with this parallel concatenation operator is to break
the regular expression into chunks that are more string like. And we have some
heuristics that do this for, like, three quarters of the signatures, and then for 15
percent of the signatures we have to adjust the insertion of this operator
manually and so we cover roughly 90 percent of the signatures in the data sets
that we looked at.
So here are some examples. So for example if we have -- and these are actual
regular expressions from Snort, the open source intrusion prevention system
that's most widely used. So the first regular expression looks for ping dot ASP
that can be preceded by a slash or a backslash. It's not exactly one string, it's
two possible strings but for us it's close enough to string matching that we don't
introduce a parallel concatenation operation. The second one is actually the type
of signatures that we have seen. The first string followed by a second string.
So the first string is BAT double quotes and the second string is the ampersand
sign. So this we break into two before the second dot star by inserting this
parallel concatenation operator which doesn't change the semantics but it tells
the compiler to insert a bit. The last one is similar to the expression that caused
the polynomial blowup: a new line followed by a keyword followed by 300
repetitions of non new lines.
Now, this is not obviously string like, but for our compiler having a character class
such as non new line, that's very large is the same as having a dot and having
300 repetitions gives us the same shape of automaton as the closure that looks
for an arbitrary number of repetitions. So we insert that parallel
concatenation operator before this large number of repetitions because
that's like the beginning of another dot star string. So that's how we -- yes?
>>: What's the meaning of the [inaudible].
>> Cristian Estan: The rule number, that's just the identifier that Snort gives to
these rules.
>>: [inaudible].
>> Cristian Estan: Yes, yes, yes, yes, yes. And then that's -- if it's configured as
an intrusion protection system then you log rule number blah alert or something
like that. So actually we are still working on a theory that would allow us to fully
automate this step, and we have a lot of intuition about what causes state space
explosion and what doesn't, but we need to make some progress on a theory to
be able to make principled and informed decisions about how to break this
regular expressions.
Okay. The next step is compiling a regular expression to an XFA. And this is the
only slide where I'll have a nondeterministic automaton on the slide. And inside
the compiler of course we use nondeterministic XFAs. So what do they have?
They have a set of states. They have a data domain which for our compiler is just an
unstructured set, so for example 0, 1, 2, 3; so if we have a data domain that can
have four values, you know, the compiler most of the time doesn't know whether
those should be two bits that can be set or reset independently, or a counter that
can count from 0 to 3, or a counter that can be invalidated and then it can count
from 0 to 2. So it's all [inaudible] it's just a set. It's just a set of values for the
data domain.
We have an input alphabet. We have a nondeterministic transition relation. We
have an update relation for the data domain, so inside the compiler actually we
update this data domain, the value of the data domain, on the transitions. And it's
a nondeterministic relation and later it gets determinized. And we don't just have
an initial state, we have an initial configuration, because now the state of the
computation is the pointer to the automaton state and the value in the data domain,
so state K and the value has to be 0. And for acceptance we don't just have an
accepting state but a set of accepting configurations, so we accept in state N if
the value of the data domain is 2.
So we use these NXFAs and go through the normal steps of
the Thompson construction for DFAs from regular expressions. So we build the
nondeterministic XFAs from the parse tree of the regular expression, then we
eliminate the epsilon transitions, then we have two separate determinization
steps, one to determinize the transitions, another one to determinize these
update relations and turn them into update functions. And we have two steps
for minimizing the data domain and the state space, but the one for minimizing
the state space is not implemented in the results I'm going to show. And actually
I put minimize in quotes because we don't have the concept of a canonical
minimal XFA like there is for DFA, so it's more reducing the data domain. And
there's a tricky, tricky step at the end.
And this is one of the things we would like to get through and make progress on.
So this is another reason why we're still working on this project. So we have
these data domains that -- and these update functions that are described as sets.
But what we actually want is efficient programs that would update the data
domain and give some structure to the data domain. And we need to find an
efficient implementation of the data domain. So we are going from this fully
unstructured representation to something efficient. So if you have four values,
for example, this step would recognize that for this automaton I have to use two
separate bits, for this other automaton I have to use a counter that can go from
zero to three. So there is this last step of finding the right structure for the data
domain. And actually we move the updates to the states because it's more
efficient to match that way. But that's an easy step at the end.
So I'm not going to go through all these steps. So but I have here examples of
nondeterministic XFAs that we get during the process. So the one on the right is
what we get if we combine expression one and expression two with parallel
concatenation. It's similar to having two automata that recognize the two
expressions separately. But because we have the bit, it doesn't accept if it just
sees the second expression before the first expression. And actually if there are
overlaps between them, then this still guarantees the correct semantics. So it
can handle arbitrary cases even if there are overlaps between the strings, the
two regular expressions, they are handled correctly.
And the other example is the shape of the automaton that we get if we use this
integer range notation to introduce a counter. Okay. Let me talk about
optimizations. And actually there are two optimizations that help us a lot. For
some signatures that's at least -- one is that for the example we saw we have this
counter that we have to increment on every byte of input pretty much. And that's
not a problem, but if we combine many signatures like this, 15 signatures like
this, then we have 15 counters to increment on every byte. That slows down the
processing. So can we do it some other way without incrementing the counter on
every byte? And what we do is for some counters such as the one used here,
we just don't increment them on every character, but instead remember when
they would trigger an alert. So we have this global data structure. Think of it as
a timer and we set an alert in that timer when we would initialize the counter.
And then instead of incrementing the counter on every byte, we just check to see
whether any of these alerts that we had timers for is triggered or no. And then of
course we remove it from this list when we see new line.
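[Editor's note: a small sketch of this optimization applied to the earlier counter example; in the real system there is a single global timer structure shared by all such counters, whereas this illustration tracks just one pending alert.]

```python
# A sketch of the timer optimization for the \nA[^\n]{200} example: remember
# the input offset at which the counter would reach its limit, do one check
# per byte, and cancel the pending "timer" whenever a new line invalidates it.
def match_overflow_with_timer(payload, limit=200):
    state = "start"                # start -> saw_nl -> armed
    fire_at = None                 # offset at which the counter would hit `limit`
    alerts = []
    for pos, ch in enumerate(payload):
        if ch == "\n":
            state, fire_at = "saw_nl", None          # cancel the pending timer
            continue
        if fire_at is not None and pos == fire_at:   # the single per-byte check
            alerts.append(pos)
            fire_at = None
        if state == "saw_nl" and ch == "A":
            state, fire_at = "armed", pos + limit    # arm the timer once
        else:
            state = "armed" if state == "armed" else "start"
    return alerts

print(match_overflow_with_timer("\nA" + "x" * 300))  # [201], same result as before
```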
And what we did here is we removed the increment operation from the common
state. Now, we still have this operation of -- from state K, which is the state in
which the automaton spends most of its time. Now, we still need to check
whether there are any timers that expire on the current byte. But that's a
scalable operation because that's a single check and we perform a single check
whether we combine 15 signatures or whether we have a single signature. It's
not like incrementing 15 counters which is more work as the number of
signatures increases. Yes?
>>: [inaudible] other counters check the [inaudible] right?
>> Cristian Estan: No. So we can have a counter that's being invalidated by a
new line, we can have a counter that's being invalidated by a space or a --
>>: [inaudible] and another one that [inaudible].
>> Cristian Estan: Yes.
>>: Then --
>> Cristian Estan: No. Then this won't work. So if you have -- then we wouldn't
apply this optimization. So this applies if you have counters that are incremented
on most possible input characters. And we don't always apply this
optimization. For some counters, for example, for some e-mail signatures you
are looking for a certain number of certain characters but just very specific
characters. So a number of repetitions for the at sign or for the percentage sign
and so there we don't use this.
>>: [inaudible] signatures, how many repetitions do the regular expressions
specify?
>> Cristian Estan: So hundreds.
>>: Hundreds, really?
>> Cristian Estan: Hundreds. And there are some that go up to two or 4,000.
But --
>>: So [inaudible] so many -- I mean, what are the signatures looking for so that
contain hundreds of repetitions or something?
>> Cristian Estan: Buffer overflow, yes.
>>: And the question is why 2,000 why not just [inaudible].
[brief talking over].
>> Cristian Estan: Well, if you have a buffer that's 2,000 characters then you
need at least that magnitude to cause trouble. So we haven't looked very deeply
into the reasons behind these signatures, but we are assuming many of these
are buffer overflows. The ones that do the counting. This type of counting.
Okay. There's another thing we can do. So there is an opportunity because
often these counters are what we would call mutually exclusive. So if you want
to count the number of new lines after the HELO keyword in SMTP, and if you
want to count the number of non new lines after another keyword in SMTP, well,
you will never be at the same time on a line that had one key word first and at the
same time on a line that had the other key word first. So those two counters,
they are never used at the same time. So what we can do is use actually a
single counter and when the signatures are combined notice that we can use a
single counter to look for both types of repetitions.
So this is the combined automaton for not doing this optimization and if we use a
single counter, then the only change really is that we have fewer alerts to cancel
when we see a new line. And that's actually an important benefit because now
it's not as bad as before that you have, you know, 15 counters to increment, but
after a new line if we have these separate counters, we have 15 counters to
reset. So we are reducing the length of that by using this optimization. It's
actually more general. We have a dataflow analysis, and it's similar to some of
the analysis done by compilers. But this is the most important case in which this
optimization of using the same counter for multiple signatures is applicable.
It's in some sense similar to what compilers do when you have different variables
that use the same register. They do some analysis, and if they figure out that
you can use the same register then they use the single register and you don't
need two registers for two variables that are not used at the same time.
Okay. Let me talk a little bit about our experimental evaluation. And so we use
signature sets from Cisco IPS and Snort, and then we use HTTP, SMTP and FTP
signatures, which are text based protocols where regular expressions are used a lot. There are
other protocols also that use regular expressions but we just focused on these
three. And we're able to construct XFAs and DFAs for 90 percent of the
signatures, and for DFAs, for neither of these signature sets could we fit the
combined DFA into gigabytes of memory. Actually I think we also tried with 16
gigabytes and it still wouldn't fit. So they are way larger. Even for, you know, the
smallest signature set, which has 38 signatures. For XFAs the number of
automaton states we get is on the order of, well, from hundreds to 15 thousand
states, so that's still a few megabytes or tens of megabytes of data.
So in all cases we could just produce a single XFA recognizing the entire
signature set. Now --
>>: Question --
>> Cristian Estan: Yes.
>>: So when you say you construct this for 90 percent of the signatures, what was
the problem for [inaudible].
>> Cristian Estan: The problem for the remaining 10 percent is that our compiler
-- the problem for the remaining 10 percent is that they were different from the
others. We haven't picked you know the hard or the easy ones. And for our
compiler we have that last step where we need to add the structure to the data
domain, and there would be just many variations on bits and counters and it's -- so for the most common shapes of signatures we went through the manual work
of building descriptions for those structures for the data domains, but it's a
manual step. So when we got to the signatures that are not exactly like the other
signatures, it's just a lot of work for solving the last 10 percent of the problem. So
we stop there. And well, we decided that it's better to try to come up with an
automated compiler as opposed to just pushing that through 100 percent. It's not
a fundamental limitation of the idea of using XFAs, it's just that our compiler is
not automated enough to do things on its own entirely.
>>: When you say [inaudible] so for me -- when you said you actually have a
large memory, does that mean that the current IDS systems don't actually do the
signature matching?
>> Cristian Estan: They don't use a single DFA, they use multiple DFAs. They
have to use multiple DFAs. Or some things other than DFAs. If we have time
after half past 11 then I will go into a couple of things, a couple of other ideas that
are being used and their pros and cons.
Okay. So we compare against multiple DFAs and multiple DFAs with
compressed transition tables, which is something that has been proposed at
SIGCOMM 2006, and this is the type of results that we get, where the black
crosses are just the multiple DFAs and then the diamonds are the
compressed DFAs. The horizontal axis is the processing time, so the further
you are to the right it means the slower the solution and the vertical axis is
memory usage so the higher up you are, the more memory you use. The vertical
line there is the execution time of a single DFA, which is -- well, we cannot use
that to represent the entire signature set, but we put there -- we put it there for
reference.
So this is the data point we get with an XFA before we apply these optimizations,
and this is the data point that we get with optimizations, so we can see that it's
faster and more compact than the solutions we can have based on multiple DFAs
for exactly the same signature set.
And this is for Cisco SMTP, which is one of the simpler signature sets. Snort
HTTP, this is the most complex signature set we looked at, and again the axes are
on log scale, so the differences here, these are a factor of 15 in memory
consumption and a factor of almost 10 in speedup. Now, why are we faster?
We are not faster than the blue line, so we are not faster than a single DFA
because we do what a DFA does plus run some little programs. But if we
compare against something that can recognize the same complex signature set,
which in this case is 40 some DFAs and they use much more memory, then we
are faster than a bunch of DFAs. We are not faster than an individual DFA. But
we are faster than the current solution which is that of using multiple DFAs.
>>: So this cycle, this is for the commodity [inaudible].
>> Cristian Estan: Yes, this is for a commodity Pentium, and so many of the
existing intrusion prevention systems don't run on a commodity Pentium, but this is
something we could measure that would give us an idea. And well, if you use
something faster or if you use something specialized than the absolute number is
changed. But we expect to see the same types of curves.
>>: I'm curious, is that [inaudible] memory size of the transition tables are
[inaudible].
>> Cristian Estan: It's hard to say from these numbers, but our suspicion is that
a lot of it is because of memory accesses, also. Because these are large data
structures, and there is some locality, there is some popular states, but every
now and then you go outside popular states and that's a very slow operation.
So things that we do such as incrementing, manipulating a few bits and counters
that are relatively easy to cache, we assume that that actually doesn't generate
as much memory traffic as looking up many transition tables. Even if you have a
cache hit ratio of 90 percent, if you have 50 automata, then, out of how many,
five out of 50 will still require slow memory accesses and that -- we speculate
that that's part of the reason. It's a combination of both. But -- well, it's hard to say
from the numbers we have.
>>: A question. So the cycles is not actually operations executed or --
>> Cristian Estan: That's from the Pentium performance counters, yes.
>>: Okay.
>> Cristian Estan: And we are not counting for the alerts themselves we have a
null routine, so we are not spending any time on, you know, logging or anything
like that. And we are not counting the time to read in the input. So that's what
our methodology was.
So let me summarize -- so regular expression matching is a performance critical
operation for deep packet inspection, and specifically for intrusion prevention systems,
but for other systems that do deep packet inspection also. And the state space
explosion leads to memory problems that translate to run time, throughput
problems with DFAs, because you have to go to more than one DFA. And
[inaudible] extend these DFAs with auxiliary variables, and the effect of that is that
we [inaudible] the state space explosion by having underlying DFAs that don't interact
adversely. And where we are now is that we need more work to fully automate
the construction of XFAs, but we were able to build a prototype with 90 percent
of the signatures that shows XFAs can outperform multiple DFAs in both matching and
memory usage.
So -- yes?
>>: I have a question like how, like how about these compilations. How hard is
this? Like if you [inaudible] how long will it take [inaudible].
>> Cristian Estan: The compilation stuff?
>>: Yes. So if he goes like for a company, right, so a company can actually
throw the Snort thing with whatever the regular expression they are using right
now, it's slower but it's just, you know, they press a button for it to work or they
can do this stuff that I understand it can be automatic but it's not right now. So
how hard is to do this, to actually combine all the regular expressions to these
structures?
>> Cristian Estan: So the hard part is not the combination. So when we
combine regular expressions, for five out of the six signature sets combining the
individual XFAs takes less than a minute, including all optimizations; for the most
complex signature set it's seven minutes. And this is just combining them one by
one. So it's --
>>: [inaudible] hatch signs, all that?
>> Cristian Estan: No. So combining the XFAs, that's the easy part. Building
the XFAs that's the hard part. And with our current tools for -- I don't know off the
top of my head the exact numbers, but it's very close to 85 percent of the
signatures that we compile are under 10 seconds.
>>: [inaudible] I would imagine they have a long lifetime and get used widely.
Like if somebody were to compile the signatures, if combination is easy, somebody
needs to do it once manually and then just keep using it, right?
>> Cristian Estan: That's, that's, that's one argument to make. But if they don't
buy that, then we have to come up with a compiler that works for more
signatures.
>>: What I meant to say, sometimes [inaudible] can be a deal breaker.
>> Cristian Estan: Yes.
>>: For something --
>> Cristian Estan: For intrusion prevention it is not.
>>: Right [inaudible].
>> Cristian Estan: Well, we don't think it should be, but Cisco may disagree.
And the other comment is that in some other settings where you have automated
systems coming up with signatures and you want the user to insert for application
identification you give more freedom for the user to insert things they want to
recognize. There it's more important to have a fully automated way of going from
this frequently updated signatures to an XFA representation.
So okay. Since we have more time, I can go through a couple of other ideas that
have been used. And actually this idea of separating the variables from the
automata, we think in a real system it would be used in combination with a
couple of other ideas that have been out there.
So let's just go through a couple of extra topics here. The first one is exploiting
hardware parallelism, the second one is slow path, fast path -- this is what Snort
uses -- nondeterministic automata, and compressing transition tables. So if we
have multiple signatures to match against the input then we can do this internally
if we have parallel hardware, and people who do FPGA based solutions or build
integrated circuits, they can have as much parallelism as they want and have
work be done in parallel.
Now, this -- so the advantage is that the area increases linearly with the number
of signatures, and there's no slow down as the number of signatures increases.
But what happens is if you have too many signatures and people would argue
that we haven't reached that too many point, then the power consumption goes
up because you have all these microcontrollers or cores working in parallel and
the per flow state gets very large.
So imagine that you have 500 microcontrollers each with their own state pointer.
If you have a single automaton then you have a single state pointer. If you
have 500 automata matching in parallel then you can't even broadcast
something to them, you need to load these state pointers one by one and then
remove them after you're done with the packet. So the throughput, if you are just
looking at the string, is okay. But if you need to context switch and then
save the per flow state, then that becomes a more expensive operation with
these types of approaches.
>>: I would assume in most [inaudible] you would be wanting to compare
multiple flows rather than just a single flow?
>> Cristian Estan: Multiple flows rather than a single flow. So exactly. The other
point is that if you have this parallelism available in hardware, then wouldn't it be
better to have a solution towards the single automaton instead of using 50 cores
to match one packet, use the 50 cores to match 50 separate packets and
increase your throughput that way.
>>: It might be more efficient to process flows in [inaudible] process each
individual one in parallel. I don't know.
>> Cristian Estan: So what happens is you often get these papers that look at
the whole design space and then there's a cost of working on different packets
because then you need different local buffering, they cannot use the same buffer.
So it's hard to piece apart the whys. But basically there have been proposals that
proposed using these multiple cores to work on the same input, and there have
been proposals to work, on the separate cores and hardware parallelism, on
different inputs in parallel. It's more expensive to work on multiple inputs in parallel
than it is to work on the same input and then have multiple processing units work
on it. But again, this is something that has been proposed and it is useful, so
for example in an XFA based system if our compiler is not good enough, we can
still have one XFA and then a bunch of DFAs that would run in parallel. And that
would be a better solution than running way too many DFAs in parallel.
So another solution is a fast path, slow path. And there are many variants of this.
Snort is using this extensively. So you can make the signature into something
simple that you can recognize efficiently and where you can combine the
signatures such as string matching and that's exactly what Snort does and then if
you see a string that must be there for the signature to match then you go
through a slow path that would actually check everything, not just string
matching. So you don't alert whenever you see that string, you get the good
semantics, but most of the time you don't do the expensive matching. So
typically the way it works is you combine these fast path representations but then
you match individually these slow path representations that give you the
semantics you want.
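[Editor's note: a small sketch of the fast path / slow path split; the signatures and required strings are made up, and Snort's actual fast path is a combined multi-pattern string matcher rather than a per-signature substring test.]

```python
# A sketch of the fast path / slow path split: a cheap string check acts as a
# pre-filter, and only payloads containing the required string go through the
# full (expensive) regular expression. Signatures are hypothetical examples.
import re

signatures = [
    # (required literal string, full regular expression giving the real semantics)
    ("describe", re.compile(r"\ndescribe[^\n]{500}")),
    ('.bat"',    re.compile(r'\.bat"[^\n]*&')),
]

def inspect(payload):
    matched = []
    for needle, regex in signatures:
        if needle in payload:             # fast path: plain string matching
            if regex.search(payload):     # slow path: check the full signature
                matched.append(regex.pattern)
    return matched

print(inspect("\ndescribe" + "x" * 600))  # fast and slow path both match
print(inspect("describe me"))             # fast path fires, slow path rejects
```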
>>: [inaudible]. Don't you [inaudible]? I mean, suppose that back followed by
star and [inaudible] right so [inaudible] string you could look for on the fast path
but then you need to know what passed before so you [inaudible] on to some of
the data that went before to actually raise an alert.
>> Cristian Estan: Yes, yes, yes. So you can do this if you have all the data, you
do the fast path pass first and then you may need to come back. And then how
do you know if needed? Well, in the Snort signature language it tells you these
strings and then if the string doesn't occur then the signature cannot match. So
for example the key word was describe. You look for the string describe. If the
string describe is not there in the input, then you cannot see a new line followed
by describe, followed by 500 non new lines.
>>: So you're essentially breaking individual signatures, you're not saying, okay,
these signatures are really slow, I'm going to put them in the slow path?
>> Cristian Estan: No, no, no, no, no. You're breaking individual signatures. And
that's a very good technique. So our gripe with this approach is that it's open to
algorithmic complexity attacks, which is -- the problem is not that algorithmic
complexity attacks can happen, but that people don't really look at how bad they
can be. So obviously if you have this fast path slow path then by triggering the
slow path more often than with normal traffic you can slow it down. But people
proposing these types of approaches don't really try to break their own scheme
and see how vulnerable it is.
So I don't think it's a -- well, some of them. Okay. So I don't think this is a bad
idea, but if you have such a system then we would argue that the best way to
build it is to have some quantification for how much it can be slowed down by
someone who adversarially tries to trigger the slow path. And then there are things
you can do to make it hard to trigger the slow path very often. But it's not always
done. So another thing that's used is nondeterministic finite automata, and they
are very compact, you can build them quickly, but you need more processing
when you match them. So if you do this breadth first it's similar to having the
multiple DFAs. If you do a depth first traversal of the state space then
backtracking can get you in trouble, and again you can have algorithmic
complexity attacks and the result is that for actual IPS signatures we're able to
slow them down by six orders of magnitude just by giving them inputs that cause
them to backtrack.
So this is actual Snort signatures that I'm talking about. So --
>>: [inaudible] for example?
>> Cristian Estan: So we had some earlier work that looked at backtracking but
not inside NFAs but inside Snort's language for signatures which is a predicate
based complex language. And what we did there is we slowed it down by a
million and a half by triggering backtracking. And then if we applied some
dynamic programming techniques, memoization specifically, then that slowdown
went away because we could check and not do work that wasn't necessary. So
with backtracking, if you do it in a simple minded way, then
you can redo work that you did previously and that ultimately led to failure, and
you just repeat it because you don't keep enough memory of what you did.
Now, if you do this dynamic programming where you remember that if I tried this
from this input, from this position in the input it will lead to failure, then you can
cut down very much on the power of this algorithmic complexity attacks.
Now, I don't know how easy it is to integrate that with PCRE. With a given library
which is relatively messy. But it's a solution that worked for the specific
backtracking [inaudible]. Okay. And the other thing is compressing transition
tables. This is what we compared against. There are all types of methods of
compressing these large transition tables with 256 characters and we can notice
that many characters are treated the same by all states or that transition tables
for different automata states are similar and then use these similarities. And you
can get pretty far with easy -- with simple things; a factor of 10 or 20 is probably
easy to achieve without complicating very much the decompression you need to
do when you do matching. But it's hard to go past that.
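[Editor's note: a toy illustration of the simplest of these compression ideas -- character classes -- not the specific scheme from the SIGCOMM 2006 paper.]

```python
# A toy sketch of character-class compression: if groups of input bytes are
# treated identically by every state, the 256-column transition table can be
# re-indexed through a small character-class map.
def compress(table):
    """table[state][byte] -> next state, with 256 columns per state."""
    columns = {}
    for byte in range(256):
        col = tuple(row[byte] for row in table)   # this byte's behavior in every state
        columns.setdefault(col, []).append(byte)
    char_class = [0] * 256
    classes = []
    for cls, (col, members) in enumerate(columns.items()):
        classes.append(list(col))
        for b in members:
            char_class[b] = cls
    # compressed lookup: next_state = classes[char_class[byte]][state]
    return char_class, classes

# Toy table for a ".*AB" matcher: 3 states; only 'A' and 'B' are special, so
# the 256 columns collapse to just 3 character classes.
A, B = ord("A"), ord("B")
table = [[1 if b == A else 0 for b in range(256)],
         [2 if b == B else (1 if b == A else 0) for b in range(256)],
         [2] * 256]
char_class, classes = compress(table)
print(len(classes), "classes instead of 256 columns")   # 3 classes
```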
And but we want to point out here that this is an orthogonal solution to what we
have for XFAs so we could actually further reduce the memory usage of the DFA
underlying the XFA by applying some of these techniques for transition table
compression.
Another solution where -- another problem where compressed transition tables
are being applied is this multi-byte matching which I didn't talk about in the main
part of the talk. If you want to take not one byte at a time but two bytes at a time,
then you have an alphabet of 65,536 characters because you're looking at two
bytes at a time or even larger.
So for these alphabets, these compression techniques work better than for
smaller alphabets but still you cannot take too many bytes because they get
overwhelmed. So and that's the last of my extra slides, so if you have questions
then I'm -- I can answer them. If not, then we can end here.
>> Ratul Mahajan: Let's thank the speaker.
[applause]