>> Nikolaj Bjorner: So it is my pleasure to introduce Mahesh, who is visiting from the University of Waterloo for today. He will discuss access control policies and the analysis of these policies. Let’s hear what he has to say in the talk and discuss it over the day. Thank you. >> Mahesh Tripunitara: Thanks, Nikolaj. Thank you for showing up, and thanks to whoever is watching as well. Okay, so what I’m going to do is talk about three problems. Each of them has an acronym; I’ll of course explain what they are. The first couple I worked on with my grad student Nima at Waterloo. The other one is Karthick’s work with some collaborators. So those are the three problems. Just to give some context, what is access control? We think of a setting where a user tries to access a resource. We typically qualify the access with the kind of access, in this case read. Well, that’s not exactly how it works in computer systems, of course, because there is an intermediary, like a process or something, that actually does the access. So the user instantiates an agent that does the access; that becomes somewhat important in one of the problems I’ll discuss. Now, we typically don’t think of every user being able to access everything. So conceptually we think of an intermediary called a reference monitor that mediates the access and allows or disallows it. The question is, how does the reference monitor make its decisions? The answer is that it has access to a database called the authorizations; it consults the database and says yes or no. This process is called access enforcement. Now, the authorizations can change, and that process is called administration. The simplest model for administration is that we have a super-user that goes and changes the authorizations. For example, Alice is allowed to read the file today but maybe not tomorrow, because she’s no longer an employee.
That’s the simplest model, where we have a trusted super-user that makes all the changes. But that doesn’t scale, so we usually have a notion of delegation, where a trusted super-user delegates some power over changing the policies to a partially trusted administrator; that’s called delegation. So this gives a broad context in which I can pose the three problems that I’m going to talk about. Just in the interest of having a specific syntax, a specific language, for authorizations, I’ll adopt role-based access control. What is it? Rather than a user such as Bob or Alice directly getting a permission (the permissions are at the bottom; I just give them names such as budget, hire, and things like that), we have an indirection that we call a role, in this example finance, human resources, and so on. Now, if you’re familiar with RBAC, you’ll notice that in this example I don’t show a role hierarchy, and there’s a reason for it: it turns out that in the context we discuss, the role hierarchy doesn’t add any complexity. Okay, so first let’s talk about the user authorization query problem, the first of my problems. The context in which the problem is posed is the access enforcement part. So here’s the issue: the way things are defined in RBAC, and also in some real systems, is that if a user wants to exercise permissions, she instantiates what we’ll call a session. If you download the Oracle database, for example, this is exactly how it works. What is a session? A session is a set of roles. Now, the immediate question that arises is: if I’m a user, why don’t I just establish a session that has all the permissions to which I’m authorized? The reason a user does not want to do that is to adhere to the principle of least privilege.
So the thing is, a user should only establish a session that has the permissions she really needs to get the job done. For example, if Alice wants to activate a session with the permission pay, presumably she would activate only human resources or purchasing. She shouldn’t go and grab all the permissions that she has for the session. So in this context we get a natural problem. Here I’ll pose it as a decision problem: given an RBAC policy ρ for a user, a set of permissions P, and an integer K, does there exist a set of at most K roles that yields exactly the set P? By way of notation, I might use the notation of a language rather than this input-output form. The correspondence is straightforward: a language is a set of tuples, and we can think of each component of the tuple as an input to the decision problem, in this case the policy, the set of permissions, and K. Now it turns out that this decision problem is intractable; it was shown to be NP-complete in 2006. Now, why do I care about the decision problem? There’s an optimization version with a natural correspondence. In the optimization version we don’t have the K; we just ask: this user wants to establish a session with these permissions, so what is the minimum-size set of roles that she needs? Given the NP-completeness result for the decision version, we know that the optimization version is NP-hard. Now, there’s another version of this UAQ problem, which I’ll call UAQ-P, that focuses on permissions. What people observed is: I don’t know how much we should care about a minimum set of roles that yields the set of permissions. Here’s another take on this problem.
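As a concrete illustration of the decision problem and its optimization version, here is a minimal brute-force sketch in Python. It assumes a policy stored as a dictionary from role names to permission sets; the roles and permissions are hypothetical, not the exact policy on the slide. Role subsets are tried in increasing size, so the first exact match is a minimum-size set of roles; the running time is exponential, which is what the NP-completeness result leads us to expect for an exact method.

```python
from itertools import combinations


def min_roles_exact(policy, target):
    """Return a minimum-size set of roles whose permissions union to exactly
    `target`, or None if no set of roles yields exactly `target`.

    Brute force: tries role subsets of size 0, 1, 2, ... in turn.
    """
    roles = sorted(policy)
    for k in range(len(roles) + 1):
        for combo in combinations(roles, k):
            perms = set().union(*(policy[r] for r in combo))
            if perms == target:
                return set(combo)
    return None


# Hypothetical single-user policy: role -> permissions.
policy = {
    "finance": {"budget", "pay"},
    "hr": {"pay", "hire"},
    "purchasing": {"invoice", "budget"},
}

print(min_roles_exact(policy, {"budget", "pay"}))  # {'finance'}
print(min_roles_exact(policy, {"pay"}))            # None: no exact session exists
```

The second call shows the practical issue raised next: with this policy there is simply no session that grants pay and nothing else.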
We noticed, for example in this example (I’ve eliminated the users because we focus on a single user, so in this case it refers to Alice from the earlier pictures), that if she wants a session with the permission pay, she has to make do with some extra permissions in the session, because there is no role, or set of roles, that yields just that permission. So this is a practical issue: she may have to accept some extra permissions in the session, and specifically it may be impossible to get an exact set of permissions. So in practice maybe we want to allow some slack. >>: So finance, human resources, purchasing, these are roles, right? >> Mahesh Tripunitara: Yes. >>: And this bottom layer is the permissions? >> Mahesh Tripunitara: Yes. >>: Okay. >>: So the NP-hardness [inaudible], what reduction is there [inaudible]? >> Mahesh Tripunitara: Yes... >>: And is that [inaudible] you stand and [inaudible]… >> Mahesh Tripunitara: It’s actually from set cover; it’s even easier. >>: Set cover, right. >> Mahesh Tripunitara: So basically we’re looking for a set of roles that covers some permissions, so this is exactly set cover. I used to have a line in the slide saying this is a straightforward reduction from set cover; I just removed it. So in this version what we do is allow some slack permissions. As input we don’t just specify an exact set of permissions; we say, well, these are the permissions we absolutely need. So here’s the new version. >>: So by that you mean that you [inaudible], but they themselves get permission to be [inaudible]? >> Mahesh Tripunitara: Yes, right. So if she, for example, activates purchasing, then she’s going to end up with the invoice permission, because she cannot get exactly the pay permission. Yeah, that’s exactly what I mean. So here’s a new problem, or language.
We have the RBAC policy ρ, we have a lower-bound set of permissions, and then we have an upper-bound set of permissions. Presumably the upper bound is a superset of the lower bound, and the difference between them is exactly the slack that Alice is willing to tolerate. Now we have the K because what she wants to say is, for example, I want as few extra permissions from P_ub minus P_lb as possible. There’s also a maximization version that people have considered, where she says: give me as many extra permissions as possible, just to make it easier to get my job done. Now this version is also intractable; it’s NP-complete, specifically… >>: Is anybody interested in solving this [inaudible], because this is a [inaudible]? >> Mahesh Tripunitara: Right. >>: [inaudible] >> Mahesh Tripunitara: So in practice, from what I know, specifically about Oracle, they don’t implement this, but they have exactly this setting, meaning they have the notion of a session, they have the notion of roles, and things like that. So it may be meaningful to put it in there. But I haven’t talked to anybody at all from the… >>: Because it seems to be like a [indiscernible]: the hard constraint is the lower bound, and then a set of different [indiscernible], some constraint [indiscernible] minimize the… >> Mahesh Tripunitara: Right. >>: [indiscernible] adjustments. >> Mahesh Tripunitara: [indiscernible]. >>: Yeah. >> Mahesh Tripunitara: So for the optimization version, we drop the K and then we want to minimize or maximize the extra permissions, so this is [indiscernible]. Okay, so then somebody else actually combined the two: the combination is that we combine these two optimization problems.
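The permission-focused variant with slack can be sketched the same way. This is a hypothetical brute-force illustration, not an algorithm from the talk: it discards roles that would exceed the upper bound, requires the lower bound to be covered, and minimizes the number of extra permissions.

```python
from itertools import combinations


def min_extra_session(policy, p_lb, p_ub):
    """Brute-force UAQ-P (minimization variant): find a set of roles whose
    permissions include every permission in p_lb, stay inside p_ub, and add
    as few permissions beyond p_lb as possible.

    Returns (number_of_extra_permissions, roles), or None if no session exists.
    Ties on extra permissions go to fewer roles, since smaller sets are tried first.
    """
    # A role granting anything outside p_ub can never appear in a valid session.
    usable = [r for r in sorted(policy) if policy[r] <= p_ub]
    best = None
    for k in range(len(usable) + 1):
        for combo in combinations(usable, k):
            perms = set().union(*(policy[r] for r in combo))
            if p_lb <= perms:
                extra = len(perms - p_lb)
                if best is None or extra < best[0]:
                    best = (extra, set(combo))
    return best


policy = {  # hypothetical policy, same shape as before
    "finance": {"budget", "pay"},
    "hr": {"pay", "hire"},
    "purchasing": {"invoice", "budget"},
}

# Alice needs "pay" and tolerates "budget" or "hire" as slack.
print(min_extra_session(policy, {"pay"}, {"pay", "budget", "hire"}))  # (1, {'finance'})
# With no slack at all, no session exists.
print(min_extra_session(policy, {"pay"}, {"pay"}))                     # None
```

The maximization variant mentioned above would only flip the comparison on `extra`.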
Now it turns out that in the literature, and this comes from the [indiscernible] RBAC standard, people also consider an additional input, which is a set of what I’ll call separation of duty constraints. Each of these constraints is a pair consisting of a set of roles and an integer T, and the semantics of the pair is that a user is allowed to activate at most T minus one roles from that set. So the idea here is some kind of mutual exclusion. We talked about the principle of least privilege earlier; the intent behind this kind of constraint is to force the user to adhere to some kind of least privilege, basically saying: if you activate these two roles, then you cannot also activate this role in the same session. So the general problem is like this. You have the policy ρ, you have the lower-bound set of permissions, you have the upper-bound set, then you have two integers, K_R and K_P, which capture the optimization objectives, and you have these constraints that have to be met. The question is whether there exists a session that satisfies all of these. From the standpoint of computational complexity, we have a lower bound because special cases are NP-complete, so in general we know that UAQ is NP-hard. We also have an upper bound, and it turns out the upper bound doesn’t change: the general problem is still in NP. Yeah? >>: [indiscernible] in the background of this assumption, so. Do… >> Mahesh Tripunitara: This one? >>: Do, no, the previous problems [indiscernible]… >> Mahesh Tripunitara: Okay. >>: You mean that there are some roles which are mutually exclusive? >> Mahesh Tripunitara: Oh, that is this one. >>: Yeah, this one… >> Mahesh Tripunitara: Right, so the simplest case is that you give me a set of roles R, let’s say five roles, and T is two.
The meaning of that is that at most one role from those five roles is allowed to be activated in a session. >>: Okay, so I mean [indiscernible], is there any case [indiscernible] that has these semantics? >> Mahesh Tripunitara: So this is in the [indiscernible] RBAC standard. >>: Oh. >> Mahesh Tripunitara: So if you ask me about a real system, I can’t tell you. >>: Okay, you mean in the standard we have this, for example, in terms of [indiscernible]. >> Mahesh Tripunitara: Yes, that’s right, so I can’t give you a concrete system... >>: Okay. >> Mahesh Tripunitara: ...an operating system or a database. To use the earlier example of Oracle, I don’t know if they’ve implemented this; I haven’t checked. >>: Okay. >> Mahesh Tripunitara: But it is in the standard. One of the problems that I worked on earlier, when I was still a PhD student, with my supervisor, was: does this kind of constraint make sense? >>: Oh. >> Mahesh Tripunitara: The fact that they put it in a standard made us ask whether it even makes sense to have this kind of constraint. The answer was yes: if you take a more abstract notion of separation of duty, it turns out that these constraints do make sense. But nobody had looked at it in the past. >>: [inaudible] K_R and K_P [inaudible]? >> Mahesh Tripunitara: Those capture the optimization objectives: I want a session with at most K_R roles and at most K_P extra permissions. >>: Okay. >> Mahesh Tripunitara: So they’re exactly the earlier K. This case is K_P because it counts extra permissions. >>: Okay. >> Mahesh Tripunitara: And this K is K_R because it is the number of roles. So it’s a two-dimensional optimization problem. >>: Okay, so do you get a Pareto front in these problems?
>> Mahesh Tripunitara: Yeah, so that’s what I was going to say here; I’ll talk about that in a second. >>: Okay. >> Mahesh Tripunitara: Right, so you do get a Pareto front. Okay, so in terms of computational complexity it doesn’t change from the special cases: the upper bound is the same, and of course we have a lower bound. Now the optimization version: the optimization objectives are the number of roles and the number of extra permissions. I just want to point out something that we expect, which is natural for many of these NP-complete problems: given an oracle for the decision version, we have a polynomial-time Turing reduction from the optimization version to the decision version. All we do is a two-dimensional binary search on these two integers, and then we get the optimum. But yes, you are right, we can have incomparable solutions. Here is a simple example. The lower-bound set is budget and pay, the upper-bound set, which is a superset, also includes hire and layoff, and we have no constraints. Then we have two optimal solutions, depending on our optimization objective. If we want to minimize the number of roles, we should pick human resources, but that yields two extra permissions. If we want to minimize the number of extra permissions, we should pick those two roles, but that gives us two roles. So we end up with a Pareto front. What we say in the paper is that one way is to explore the Pareto front, or we could just prioritize one of the objectives over the other, saying it doesn’t… >>: So if you select purchasing… >> Mahesh Tripunitara: Right. >>: You get invoice. >> Mahesh Tripunitara: Right. >>: Then [indiscernible]… >> Mahesh Tripunitara: Oh, you’re right, I should have put invoice there, I’m sorry. >>: [inaudible] >> Mahesh Tripunitara: It’s a bug in the slide. Thanks for pointing that out. Yeah, it’s a good point.
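The general problem combines all of these inputs. The sketch below, under the same hypothetical dictionary representation, first shows that checking a proposed session, including the separation of duty pairs (role set, T), takes only polynomial time, which is why the problem stays in NP. It then enumerates sessions by brute force to recover the Pareto front of (number of roles, number of extra permissions). The example policy is made up so that one role with two extra permissions and two roles with no extra permissions are incomparable, mirroring the trade-off just described; it is not the slide’s policy.

```python
from itertools import combinations


def is_valid_session(policy, session, p_lb, p_ub, k_r, k_p, constraints):
    """Polynomial-time check of a candidate session (a set of roles).
    `constraints` is a list of (role_set, t) pairs, each allowing at most
    t - 1 roles from role_set to be active in the same session."""
    perms = set().union(*(policy[r] for r in session))
    return (p_lb <= perms <= p_ub
            and len(session) <= k_r
            and len(perms - p_lb) <= k_p
            and all(len(session & roles) <= t - 1 for roles, t in constraints))


def pareto_front(policy, p_lb, p_ub, constraints=()):
    """Brute force: map each non-dominated (roles, extra permissions) point
    to one session that achieves it."""
    n_roles, n_perms = len(policy), len(p_ub)   # loose bounds: no K limits
    points = {}
    for k in range(n_roles + 1):
        for combo in combinations(sorted(policy), k):
            s = set(combo)
            if is_valid_session(policy, s, p_lb, p_ub, n_roles, n_perms, constraints):
                perms = set().union(*(policy[r] for r in s))
                points.setdefault((len(s), len(perms - p_lb)), s)
    return {p: s for p, s in points.items()
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)}


policy = {  # hypothetical
    "hr": {"budget", "pay", "hire", "layoff"},
    "finance": {"budget"},
    "payroll": {"pay"},
}
front = pareto_front(policy, {"budget", "pay"}, {"budget", "pay", "hire", "layoff"})
print(sorted(front))  # [(1, 2), (2, 0)]
```

Adding the constraint `({"finance", "payroll"}, 2)` removes the two-role point and leaves `(1, 2)` as the only Pareto-optimal session.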
>>: So you… >> Mahesh Tripunitara: My mistake. >>: But if you, so this says you must have, you cannot have anything that’s [indiscernible]. >> Mahesh Tripunitara: Correct, correct. When I talk about the encoding as SAT, that comes up. >>: Yeah. >> Mahesh Tripunitara: So you cannot have anything there. So that’s a bug in the slide, sorry. Okay, so how do we deal with this intractability? I take a computational complexity view: we can use a constraint solver for an NP-complete problem, such as a SAT solver, or we can look to approximation theory because we have an optimization problem, or we can look at the theory of fixed-parameter tractability. So let me look at approximation theory first. The interesting thing about approximation theory is that it shifts the focus from time efficiency to something called the approximation ratio, which is about the worst case. The point is: here’s an efficient algorithm for this problem; assuming P is not equal to NP, we know there has to be a trade-off somewhere, and the trade-off is the worst case: how bad could the solution be compared to the optimal solution? If we’re talking about a minimization problem, we expect that in the worst case the approximation algorithm returns a result that is not the minimum. But how bad could it be if I take the result of the approximation algorithm and divide it by the minimum for that instance? So we expect two kinds of results. One, which I’ll call a positive result, is: I present an algorithm to you and prove that in the worst case it achieves a certain approximation ratio. The other, which is the main thing in approximation theory, is a negative result, which shows that a certain approximation ratio is the best you could hope for. So here’s an example of a result that we have.
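In symbols (our notation, not the speaker’s), for a minimization problem where A(I) is the cost of the solution that an efficient algorithm A returns on an instance I and OPT(I) is the minimum for that instance, the approximation ratio on inputs of size n is:

```latex
\rho_{\mathcal{A}}(n) \;=\; \max_{I \,:\, |I| = n}\; \frac{\mathcal{A}(I)}{\mathrm{OPT}(I)} \;\ge\; 1
```

A positive result shows that some polynomial-time A keeps this ratio at or below a bound α(n); a negative result shows that, unless P = NP, no polynomial-time algorithm can guarantee a ratio better than α(n).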
So for the optimization version of the role problem, UAQ-R opt, assuming it has a solution, we don’t expect to be able to efficiently approximate it within that kind of approximation ratio. If you look at it closely, it’s basically a horrible approximation ratio, because it’s polynomial in the size of the input. So this is pretty much the strongest negative result you can have from the standpoint of approximation. Now, why do we have the qualification “assuming it has a solution”? It’s because of these constraints: the constraints may result in no solution existing. So first we have a pure decision problem, which is: does a solution even exist? And even if I tell you that a solution exists, you don’t expect to be able to efficiently approximate it within a polynomial factor. >>: So it’s a square root in [inaudible]. >> Mahesh Tripunitara: That would not exist really, because yeah… >>: Because of… >> Mahesh Tripunitara: Yes. So this is the strongest negative result… >>: Okay. >> Mahesh Tripunitara: …that, in my understanding, we can get from approximation theory; for comparison, for the chromatic number, the graph coloring problem, the bound is some root of N or something like that. >>: Okay. >> Mahesh Tripunitara: Right, but vertex cover is efficiently approximable within a constant factor. There’s a very simple algorithm that is within a factor of two: I’ll end up with a vertex cover that is at most double the size of the minimum. But actually there’s another algorithm that I think… >>: When you said cover, like [indiscernible] cover, you say it’s a [indiscernible] constant. It’s like, okay, okay, right. >> Mahesh Tripunitara: So set cover is in a different class, because it’s log N. You can actually show that log N is a tight bound. >>: Yeah.
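The factor-two algorithm for vertex cover mentioned here is short enough to show. This is the textbook maximal-matching version, included only as a sketch of what a positive approximation result looks like; it is not related to UAQ itself.

```python
def vertex_cover_2approx(edges):
    """Classic factor-2 approximation for minimum vertex cover.

    Greedily builds a maximal matching and takes both endpoints of every
    matched edge. Any cover must contain at least one endpoint of each
    matched edge, and matched edges share no endpoints, so the returned
    cover has at most twice as many vertices as a minimum cover.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered: match it
            cover.update((u, v))
    return cover


path = [("a", "b"), ("b", "c"), ("c", "d")]
print(sorted(vertex_cover_2approx(path)))  # ['a', 'b', 'c', 'd'] (optimum is {'b', 'c'})
```

On this path the ratio is exactly two, so the analysis of this particular algorithm cannot be improved.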
>> Mahesh Tripunitara: Right, it’s both a lower and an upper bound, yeah. So how do we prove something like this? The user-friendly way, at least for me and for those of us familiar with NP-hardness reductions, Karp reductions, is a different kind of reduction called a gap-introducing reduction. The way it works is that we reduce from SAT in a way that introduces a gap α, which is the approximation ratio. This immediately implies that it’s NP-hard to approximate within better than α. Now, for the permission version, the Karp reduction from set cover to the decision problem UAQ-P can easily be adapted to a gap-preserving reduction. Because we know that for set cover there is a gap-introducing reduction from SAT showing that you cannot approximate better than log N, we know immediately that UAQ-P opt cannot be approximated within better than log N. Now, the thing to point out is that this is only a lower bound; we don’t have an upper bound for UAQ-P opt. So this is a weak inapproximability result: the problem could actually be even harder to approximate. Okay, now fixed-parameter tractability. I’m going to be a little bit informal; the theory is much richer than what I suggest here, so if you’re familiar with it, forgive me. The only question I’m going to ask is: is the problem in P if we assume that some parameter is bounded in size? What is a parameter? Well, a natural parameter may be an input. For example, it may make sense to bound the number of roles in the policy, or maybe the permissions in the lower- and upper-bound sets, because we are dealing with least privilege. So let’s say we assume that every user, Alice for example, wants to activate sessions that have only five permissions, something like that.
If it makes sense to make those assumptions, the question is: does the complexity change? Because then maybe we have a different approach to deal with it. So here’s an example of a result. Of course we assume P is not equal to NP. Then basically we have a necessary condition for UAQ-P to be intractable, which is that these three parameters have to be unbounded. If any of them is bounded, then we know that UAQ-P is in P. This is slightly overstated, because if P_lb is unbounded then we know that P_ub is unbounded, since P_ub is a superset of P_lb. Also, this is a one-sided implication; it’s not an if-and-only-if. Specifically, for the case where only these are unbounded, we have an algorithm that is in P; of course, it takes exponential time if too many of them are unbounded. We actually have a slightly better algorithm if we can make an additional assumption. Here we’re assuming that P_ub is bounded, so from the previous theorem we already know that the problem is in P. However, we can make an additional assumption, which I can argue for in practice: the number of roles in a solution is at most the size of the upper-bound set. What is the justification for this? Well, to get a permission I need at most one role; it doesn’t make sense to have two roles to get one permission. So if we assume this, we get a more efficient algorithm. These are examples of results from assuming some boundedness; that was the second way that we approached the problem. >>: Is that the standard way, that you informally describe the [indiscernible] case where you only need one… >> Mahesh Tripunitara: Role… >>: Role [inaudible], right? >> Mahesh Tripunitara: Well, the thing is, if you think of a maximization version, I don’t know if that makes sense.
If you’re trying to maximize the number of roles in a session, I don’t know why anybody would want to do that. Then that doesn’t make sense. >>: [inaudible] >> Mahesh Tripunitara: That’s the only thing. But what we’re saying is that because the maximization version does not make sense, that assumption may be reasonable. >>: Another [indiscernible] twist to role-based [indiscernible]: two roles together [indiscernible]? >> Mahesh Tripunitara: So that’s a very good question. The short answer is no; nobody has explored semantics for RBAC that lead to that. However, I actually have a problem where that kind of semantics may make sense. We could call it conjunctive semantics, which is something like: only if you’re authorized to R1 and R2 are you authorized for permission P1. There’s work I did on using RBAC for authorization to Java methods where I think that kind of semantics for roles makes sense. Unfortunately the student graduated and we haven’t done anything since then, so anyway. >>: [inaudible]… >> Mahesh Tripunitara: Yes. >>: [inaudible] two keys that open the vault… >> Mahesh Tripunitara: Yes, yes. >>: Just slightly, yes. >> Mahesh Tripunitara: So the context there is also slightly different, which we can talk about offline. It relates to the default, the catch-all. We usually assume that the default is deny-all; that’s what we usually assume for firewalls and things like that. Now, if you want a default of allow-all as opposed to deny-all, then for one of those options, I don’t know which one, this makes sense. We can talk about that work in more detail. >>: But anyway… >> Mahesh Tripunitara: But the bottom line is… >>: [inaudible] >> Mahesh Tripunitara: The bottom line is nobody has explored it… >>: [inaudible] because of the max… >> Mahesh Tripunitara: Yes. >>: Otherwise it’s [inaudible].
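To see why bounding P_ub can help, here is a hypothetical sketch (not the algorithm from the paper). Only roles whose permissions lie inside the upper-bound set are usable, and roles with identical permission sets are interchangeable, so at most 2 to the power |P_ub| distinct candidates survive regardless of how large the policy is. Combined with the assumption from the talk that a solution needs at most |P_ub| roles, the exponential part of the search depends only on |P_ub|.

```python
from itertools import combinations


def fpt_min_roles(policy, p_lb, p_ub):
    """Hypothetical sketch for UAQ-P with a small upper-bound set p_ub.

    Returns a minimum-size set of roles whose permissions cover p_lb and
    stay inside p_ub, or None if there is none.
    """
    representative = {}  # permission set -> one role granting exactly that set
    for role in sorted(policy):
        perms = frozenset(policy[role])
        if perms <= p_ub:                 # other roles would exceed the upper bound
            representative.setdefault(perms, role)
    candidates = list(representative)     # at most 2 ** len(p_ub) entries
    # Assumption from the talk: a solution uses at most len(p_ub) roles.
    for k in range(min(len(candidates), len(p_ub)) + 1):
        for combo in combinations(candidates, k):
            if p_lb <= frozenset().union(*combo):
                return {representative[ps] for ps in combo}
    return None


policy = {  # hypothetical
    "finance": {"budget", "pay"},
    "hr": {"pay", "hire"},
    "purchasing": {"invoice", "budget"},
}
policy.update({f"clerk{i}": {"pay"} for i in range(1000)})  # many duplicate roles

print(fpt_min_roles(policy, {"pay", "budget"}, {"pay", "budget"}))  # {'finance'}
```

The thousand duplicate clerk roles collapse to a single candidate, so they add only linear preprocessing time.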
>> Mahesh Tripunitara: And with the customary semantics of roles, which is not this kind where you need two of them, but where if you have any of them you get the permission, then that makes sense with the [indiscernible]. Okay, so the point of this is, of course, that we get a more efficient algorithm. Okay, and CNF-SAT; I assume everybody knows what it is. So, the reduction to CNF-SAT; I just wanted to put UAQ back up there. When reducing to CNF-SAT, I want to point out that capturing some of these inputs is easy. For the policy: for every role I introduce a Boolean variable R, and for every permission P I introduce a Boolean variable P. For each role that’s authorized to P, I introduce a clause of this form, and the meaning of the clause, of course, is that if I activate R then I activate P. For each permission and all the roles it’s assigned to, I introduce a clause of this form, because to activate P at least one of those roles has to be activated. For each permission that’s in the lower-bound set, I introduce a clause consisting of just that permission, because it has to be activated. And, this is the thing we were talking about earlier, for any permission that is not in the upper-bound set I introduce a clause of this form. So that’s pretty easy. The only somewhat tricky part is these constraints and the integers that capture the optimization objectives, because we need to be able to count. Remember that a constraint is a pair of a set of roles and an integer T, and basically what we want to say is that at most T minus one roles from that set are activated. So we need to be able to count. It turns out that there was a previous paper on this problem that did subset enumeration, which is of course not efficient. So what did we do?
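The easy clauses can be generated mechanically. The sketch below assumes DIMACS-style integer literals (v for true, -v for false) and uses the standard clause shapes that match the meanings just described: (¬r ∨ p) for each role r granting p, (¬p ∨ r1 ∨ … ∨ rk) for each permission p and the roles granting it, the unit clause (p) for p in the lower bound, and (¬p) for p outside the upper bound. The separation of duty constraints and the K_R and K_P bounds are left out, since those need counting. The brute-force model enumerator is only there to check tiny examples.

```python
from itertools import product


def encode_uaq_cnf(policy, p_lb, p_ub):
    """Encode the counting-free part of a UAQ instance as CNF clauses."""
    roles = sorted(policy)
    perms = sorted(set().union(*policy.values()) | set(p_lb) | set(p_ub))
    var = {}
    for key in [("role", r) for r in roles] + [("perm", p) for p in perms]:
        var[key] = len(var) + 1
    clauses = []
    for r in roles:          # activating r activates each of its permissions
        for p in sorted(policy[r]):
            clauses.append([-var[("role", r)], var[("perm", p)]])
    for p in perms:          # p is active only if some role granting it is active
        holders = [var[("role", r)] for r in roles if p in policy[r]]
        clauses.append([-var[("perm", p)]] + holders)
    for p in sorted(p_lb):   # required permissions
        clauses.append([var[("perm", p)]])
    for p in perms:          # permissions outside the upper bound are forbidden
        if p not in p_ub:
            clauses.append([-var[("perm", p)]])
    return var, clauses


def brute_force_models(num_vars, clauses):
    """All satisfying assignments as tuples of booleans (tiny instances only)."""
    return [bits for bits in product([False, True], repeat=num_vars)
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)]


policy = {"finance": {"budget", "pay"}, "hr": {"pay", "hire"}}  # hypothetical
var, clauses = encode_uaq_cnf(policy, {"pay"}, {"pay", "budget"})
models = brute_force_models(len(var), clauses)
print(len(models))  # 1: the only session is {finance}
```

In practice the clauses would go to a SAT solver instead of the brute-force enumerator.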
We looked to circuit SAT. We first reduce to a Boolean circuit, and then we went to the Cormen, Leiserson, Rivest, and Stein textbook for the reduction from circuit SAT to CNF-SAT. They have an efficient reduction from circuit SAT to CNF-SAT; I think it’s linear time or something like that. So basically we have building-block circuits, which are quite simple. Here’s an example of one, bitsum: we want to add a bit X to the bit string Y_{N-1} … Y_0. We have circuits for comparing two of these bit strings (whether count one is less than or equal to count two), for the maximum, and so on. So you can imagine that these circuits are pretty simple. Here’s a simple example of bitsum, where I’m adding X to Y_1 Y_0; of course the output might be three bits, because I may have a carry bit. All said and done, the cost of our reduction from UAQ to CNF-SAT ends up being N squared log N. The log, again, we expect because of the counting. >>: But in a nutshell, in the previous slide you’re encoding a cardinality [indiscernible]. >> Mahesh Tripunitara: Yes, yes. >>: So there are many different ways of doing the encoding [inaudible]. >> Mahesh Tripunitara: Okay. >>: So there are properties [inaudible]. >> Mahesh Tripunitara: Okay, so I wonder if any is better than what we’ve done, because I can’t think of a way of doing it other than just counting. >>: Oh, so you can use a sorting circuit, so that it sorts the things that are [indiscernible] first, and then the [indiscernible] constraint [indiscernible] an indicator where you go from zero to [indiscernible]. >> Mahesh Tripunitara: Oh, I see.
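The counting gadgets can be illustrated by evaluating them on concrete Boolean values. The sketch below is an assumption about how such building blocks can look, not the paper’s circuits: bitsum is a chain of half adders (one XOR and one AND per bit), counting folds bitsum over the input bits into a register of about log₂ n bits, and a comparator checks the count against a constant such as T − 1. In a real encoding each XOR or AND would become a gate in the circuit handed to the circuit-SAT to CNF-SAT step rather than being evaluated directly; n additions of log n gates each is consistent with the log factor mentioned.

```python
def bitsum(x, ys):
    """Add one bit x to the little-endian bit string ys = [y0, ..., y_{n-1}]
    with a chain of half adders; returns n + 1 bits (the last is the carry)."""
    out, carry = [], x
    for y in ys:
        out.append(y ^ carry)   # half-adder sum bit (XOR gate)
        carry = y & carry       # half-adder carry bit (AND gate)
    out.append(carry)
    return out


def leq(a, b):
    """a <= b for equal-length little-endian bit strings. Scans from the least
    significant bit, so a more significant differing bit overrides earlier ones."""
    result = True
    for ai, bi in zip(a, b):
        result = (not ai and bi) or (result and ai == bi)
    return result


def to_bits(value, width):
    """Little-endian binary representation of a non-negative integer."""
    return [bool((value >> i) & 1) for i in range(width)]


def at_most(bits, limit):
    """True iff at most `limit` of the input bits are set (e.g. limit = T - 1)."""
    if limit >= len(bits):
        return True
    width = max(1, len(bits).bit_length())   # enough bits to hold len(bits)
    acc = [False] * width
    for b in bits:
        acc = bitsum(b, acc)[:width]          # carry-out is always 0 here
    return leq(acc, to_bits(limit, width))


print(bitsum(True, [True, True]))             # [False, False, True]: 1 + 3 = 4
print(at_most([True, False, True, True], 2))  # False: three roles are active
```

For the earlier separation of duty example with five roles and T equal to two, the check is `at_most(active_flags, 1)`.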
>>: And the size of the sorting constraint is [indiscernible]… >>: [inaudible] >> Mahesh Tripunitara: So does that actually mean that SAT solvers are more efficient… >>: Well, [inaudible] >> Mahesh Tripunitara: …in practice? >>: [inaudible] encode these [indiscernible] circuits and then encode them with different [indiscernible] bits as well... >> Mahesh Tripunitara: Oh, okay… >>: [inaudible] >> Mahesh Tripunitara: Okay, so we didn’t think of that… >>: Process of [indiscernible]. >> Mahesh Tripunitara: But it seems to me that in the worst case it’s still going to be some log N kind of thing. >>: Yeah. >> Mahesh Tripunitara: Right, because... >>: If you want to get the smallest possible [inaudible]. >> Mahesh Tripunitara: Okay, so we did something here called empirical [indiscernible] design. It’s [indiscernible]. Why that solver? Some people say it’s an old solver. Well, the reason is that one of the previous papers on this used it, so we just wanted to do a comparison. We implemented the CNF-SAT approach, and we have what we call the fixed-parameter polynomial algorithm; it’s one of the algorithms I discussed, where we assume a certain parameter is bounded. So anyway… >>: [inaudible] might take some time. >> Mahesh Tripunitara: So you’ll see. They both perform pretty well. Because we have so many inputs and we want to draw two-dimensional graphs, we vary one of the inputs, hold everything else constant, and look at the results. I have an earlier paper where I propose a benchmark for RBAC, so we tried it on those benchmarks, and they seem to do pretty well. Here’s one where we expect that kind of behavior; it is completely expected, because we know the FPP algorithm is exponential in the size of this parameter… >>: [inaudible]? >> Mahesh Tripunitara: So it’s basically, I think the one that’s shown there is this one. >>: Oh, okay, so it’s one of your algorithms… >> Mahesh Tripunitara: Yes.
>>: That’s specialized. >> Mahesh Tripunitara: Yes. >>: Okay. >> Mahesh Tripunitara: And here’s where it sort of starts taking off. So it tells you roughly where you should use one approach as opposed to the other. Okay, so that’s it for the first problem; it’s 11:08. The second one also has to do with access enforcement. The problem is that we want to come up with a data structure for enforcement. We have a bunch of current sessions. In the customary architecture for access enforcement, people talk about something called a policy enforcement point. We would like the data structure to be time- and space-efficient, in practice, maybe. So here’s a candidate data structure. It’s called a Bloom filter, and it was originally proposed for set membership checking. The way it works is that we want to encode a set A, which is a subset of a universe U. We first encode the set A in the data structure, and then at run time we have queries. A query is some u in the universe, and we want the data structure to tell us whether it’s in A or not. Of course we have plenty of data structures to do this: binary search trees, hash tables, and things like that. Now, how does a Bloom filter work? We pick some number of buckets, kind of like a hash table; I’ll call them bits, M of them. We pick some functions that I’ll call hash functions, K of them. What is a hash function here? Each hash function h_i maps elements of the universe to one of the indices zero to M minus one. So this is how we encode the set A: in a Bloom filter we start with all the bits clear, and if we want to insert a, we set the bit at position h_i(a) for every one of the K hash functions. So that’s the data structure. It’s now ready to be queried.
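Here is a minimal Python sketch of the Bloom filter just described, including the query step that the talk turns to next. Deriving the K hash functions from SHA-256 by prefixing the item with the function index is an assumption made for illustration; the talk only mentions that cryptographic hashes are good candidates. The (session ID, permission) item format is likewise hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)          # one byte per bit, for readability

    def _positions(self, item):
        # Hypothetical h_i: SHA-256 of "i:item", reduced modulo m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False is always correct; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
bf.add(("session-1", "read"))
print(bf.might_contain(("session-1", "read")))  # True: no false negatives
```

Storing one byte per bit wastes space; a packed bit array would be the natural next step, but the logic is the same.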
Now if I want to check whether some u in U is in A, I subject it to every one of these hashes and check whether any of the bits is clear. If any of them is clear, then I know that u is not in A, right. Okay, so what are the time and space efficiency properties of this? What is the query time? It’s the time to compute k hashes, and I could choose k to be constant in the size of the set A. The space is the m bits, which could also be chosen constant in the size of A. So this seems magical: I seem to have broken the time-space trade-off. But of course nothing comes for free, and here’s the trade-off: if I pick some u in U that is not in A and query for it, it’s possible that all the bits to which it maps under the hash functions are set, and then we have a false positive. But this error is one-sided: a wrong answer cannot happen when one of the bits is clear, because every element of A had all of its bits set when it was inserted. Okay, so to use this for access enforcement, we cannot live with the false positives, because in access enforcement we typically expect a one hundred percent correct answer. So we went to the Bloomier filter, which was proposed at one of those algorithms conferences in two thousand four. We have an adaptation of it. One of the ideas behind the Bloomier filter is quite simple: you take the false positives and store them in another Bloom filter. So if the first-layer Bloom filter says yes, this thing you’re looking for is in the set, it might be a false positive. How do I check whether it is? I check in another Bloom filter. So we end up with L layers of Bloom filters, where layer i encodes the set A_i and has a potential set of false positives P_i. In the first layer we encode the set A, and the potential set of false positives is everything that’s not in A.
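The layered idea can be sketched as follows, assuming A and P are disjoint, queries only come from A union P, and every layer uses the same number k of hash functions; the names `build_cascade` and `cascade_query` are hypothetical. Each layer encodes the previous layer's actual false positives, the candidate set flips back and forth between the two sides, and whatever survives the last layer is kept as the explicit list of false positives mentioned in the talk. A query stops at the first layer that says no: a no at an even-numbered layer (counting from zero) means the request is not in A, and a no at an odd-numbered layer means it is.

```python
import hashlib


class BloomFilter:
    """Classical Bloom filter; assumed hash family h_i(x) = SHA-256("i:x") mod m."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k, self.bits = m, k, [False] * m

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = True

    def query(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


def build_cascade(A: set[str], P: set[str], layer_bits: list[int], k: int):
    """Build one Bloom filter per entry of layer_bits; return (layers, explicit list)."""
    layers = []
    encode, candidates = set(A), set(P)  # layer 0: encode A, candidates are P
    for m in layer_bits:
        bf = BloomFilter(m, k)
        for x in encode:
            bf.insert(x)
        layers.append(bf)
        false_pos = {x for x in candidates if bf.query(x)}
        # Next layer encodes these false positives; its candidates are the
        # elements this layer encoded (the back-and-forth switch).
        encode, candidates = false_pos, encode
    return layers, encode  # encode now holds the last layer's false positives


def cascade_query(layers: list[BloomFilter], explicit: set[str], x: str) -> bool:
    """Exact membership in A for any x drawn from A | P."""
    for i, bf in enumerate(layers):
        if not bf.query(x):
            return i % 2 == 1  # a "no" at an odd layer means x is in A
    # Every layer said yes: x is either in the last layer's encoded set or one
    # of its false positives, which the explicit list records.
    encoded_side_is_A = (len(layers) - 1) % 2 == 0
    return (not encoded_side_is_A) if x in explicit else encoded_side_is_A


if __name__ == "__main__":
    A = {f"s{i}:budget" for i in range(20)}
    P = {f"s{i}:hire" for i in range(200)}
    layers, explicit = build_cascade(A, P, layer_bits=[64, 32, 16], k=2)
    print(len(explicit), "entries left in the explicit list")
```

Because Bloom filters have no false negatives, the answer is exact for every request in A union P regardless of how many bits each layer gets; the bit budget only affects how long the explicit list is.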
In the second layer we take the actual false positives of the first layer and encode them in a Bloom filter, but now the potential set of false positives is the set A_1, and then we switch back and forth. You may never eliminate all the false positives using only Bloom filters, so in the end we assume that we maintain an explicit list of false positives. Empirically, we showed in some other work that this data structure can be effective for access enforcement, so I won’t talk about that. The one I want to talk about is the problem of constructing an instance of this data structure. So what are the inputs? The inputs are A and P, which encode accesses, basically the sessions. We assume some syntax for that, say a pair of session ID and permission. We have a set of candidate hash functions; there are heuristics on how to choose them. Specifically, there’s a paper in Communications of the ACM many years ago where the author shows that in practice, cryptographic hash functions like SHA-1 are good candidates. We have the maximum total number of bits, which can be seen as an optimization objective because we want to minimize it. We want to minimize this one as well, the one at the bottom: the number of false positives that come out at the end. We may want to minimize this one, which is the time. The number of levels is not an optimization objective; usually we don’t care how many levels we use. Yeah? >>: So for the [indiscernible] Bloom filter, for every element you want to check, whether it’s a legal access or not, you need to pass it through all the layers… >> Mahesh Tripunitara: Unless one of the layers says no… >>: Okay, yeah. >> Mahesh Tripunitara: …it’s not there. So that would be the best case: the best case is that one of the layers says no. >>: So will this approach be better than just increasing the number of [indiscernible]? >> Mahesh Tripunitara: I will show you.
We have empirical evidence for that. The short answer is yes. >>: Okay. >> Mahesh Tripunitara: Right, so the effectiveness of using multiple layers is a question, right. Why don’t you just use the… yeah? Okay, so the decision problem is: given all these inputs, does there exist an instance of the data structure that meets them? We only know that it is decidable in exponential time, and it’s unlikely to be in NP; this is a somewhat informal comment. The reason I say that is that these inputs are integers, and depending on your encoding, in binary for example, they can be encoded in log N bits, so the data structure itself can be exponentially larger than the input. So if I change those slightly, to the uppercase versions, meaning that I pre-allocate the space for the bits and for the explicit list, then it’s in NP. Then I get an upper bound, which I like more. And given that I get this upper bound, I might as well deal with this version of the problem rather than the other one, because the claim is that this one is just as practical. So if you pre-allocate the space, uh… >>: [inaudible] also much more realistic because [indiscernible] where you realize the [indiscernible]… >> Mahesh Tripunitara: Yes. >>: [inaudible] >> Mahesh Tripunitara: Yes, you’re right. So you could argue that this was we used minus what [indiscernible]. Okay, so assuming NP is not equal to EXP, this dramatically changes the complexity. We do have a lower bound: it is NP-hard, and that’s not hard to show. One thing is that it remains NP-hard even for the classical Bloom filter. I don’t know if anyone has shown this result for the classical Bloom filter before.
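The NP membership claim for the pre-allocated variant rests on being able to check a guessed instance quickly. The sketch below is one way to write that check, under stated assumptions: a certificate lists, for each layer, its number of bits and which candidate hash functions it uses; K is taken to bound the total number of hash evaluations along the cascade; and the seeded-SHA-256 family produced by the hypothetical `make_hash` stands in for the candidate set H. With M and E pre-allocated, building the layers and counting the surviving false positives is polynomial in the sizes of A, P, H and the pre-allocated space.

```python
import hashlib
from typing import Callable, Sequence

# A candidate hash function maps (item, number_of_bits) to an index in 0..m-1.
HashFn = Callable[[str, int], int]


def make_hash(seed: int) -> HashFn:
    """Hypothetical family of candidate hash functions: seeded SHA-256."""
    def h(item: str, m: int) -> int:
        digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % m
    return h


def verify_instance(A: set[str], P: set[str], H: Sequence[HashFn],
                    M: int, E: int, K: int, L: int,
                    certificate: list[tuple[int, list[int]]]) -> bool:
    """Check a guessed cascade: one (bits m_i, indices into H) pair per layer.

    Assumption: K bounds the sum of the per-layer hash counts.
    """
    if len(certificate) > L:
        return False
    if sum(m for m, _ in certificate) > M:
        return False
    if sum(len(hs) for _, hs in certificate) > K:
        return False
    encode, candidates = set(A), set(P)
    for m, hs in certificate:
        if m < 1 or any(not 0 <= j < len(H) for j in hs):
            return False
        bits = [False] * m
        for x in encode:
            for j in hs:
                bits[H[j](x, m)] = True
        # Elements of the candidate set that this layer wrongly accepts.
        false_pos = {x for x in candidates if all(bits[H[j](x, m)] for j in hs)}
        # Alternate: next layer encodes these; its candidates are this layer's set.
        encode, candidates = false_pos, encode
    # Whatever survives the last layer has to go into the explicit list.
    return len(encode) <= E
```

The check never has to search: it only simulates the guessed layers once, which is why pre-allocating M and E is what brings the problem into NP.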
We do have a gap-preserving reduction from set cover: if you treat the size of the final set of false positives as an optimization objective, then there’s a pretty straightforward gap-preserving reduction from set cover, which means that we don’t expect to efficiently approximate it to within better than a log N approximation ratio. So I’m just going to look at the reduction to CNF-SAT. Again we went through circuit SAT, Boolean circuits. Here the building-block circuits are much more complicated, so we have a bunch of them. I had this nice picture to show some of them; I don’t know if I want to go over all of them. The filter map just says: should this bit be set or not? It’s going to be set if this element hashes to it, right, but I need to OR that with all the other elements, so that’s what that circuit is. But then we also have the question: in layer i, do I choose a certain hash function? Is the hash function even valid? How do I determine whether it is valid? It depends on the number of bits I choose for that layer, right. So we take the number of bits chosen for the layer and all the hash functions, and this circuit outputs whether a certain hash function from the master list of hash functions is even a candidate for that layer. So things like that, and then of course we have circuits for the false positives. This is a fairly complex bunch of circuits, but it’s fairly efficient: the cost of the reduction is quadratic. Okay, so let’s look at some fun empirical results. The first question: do we benefit from multiple levels? The claim is that this graph shows that the answer is yes. As the goodness criterion we use the E size, the number of false positives, on the vertical axis. We have something called problem size here, which comes from the RBAC benchmark that I talked about.
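As a small illustration of the kind of building block involved, and not the authors' actual circuits, here is a Tseitin-style CNF encoding of a single OR gate, the core of the filter-map step where a bit is set if any element hashes to it. Variables are DIMACS-style positive integers; the input variables would stand for hypothetical "element a is mapped to this position by a selected hash function" signals. The main block exhaustively checks that the clauses agree with the gate.

```python
from itertools import product


def or_gate_clauses(out: int, inputs: list[int]) -> list[list[int]]:
    """Tseitin-style CNF for out <-> (inputs[0] or inputs[1] or ...).

    Variables are positive integers; literal v means "v is true" and -v
    means "v is false" (the DIMACS convention).
    """
    clauses = [[-x, out] for x in inputs]  # any true input forces out true
    clauses.append([-out] + list(inputs))  # out true needs at least one true input
    return clauses


def satisfies(clauses: list[list[int]], assignment: dict[int, bool]) -> bool:
    """True if every clause has at least one literal made true by assignment."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


if __name__ == "__main__":
    # Bit variable 4 should equal the OR of "element t hashes here" variables 1-3.
    cnf = or_gate_clauses(4, [1, 2, 3])
    for values in product([False, True], repeat=4):
        a = dict(zip([1, 2, 3, 4], values))
        assert satisfies(cnf, a) == (a[4] == (a[1] or a[2] or a[3]))
    print(len(cnf), "clauses")
```

Each gate adds only a linear number of clauses in its fan-in, which is the usual reason a circuit-SAT detour keeps a CNF reduction polynomial.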
Now it turns out that if you use only one layer, this is the graph that we get. But as you increase the number of layers, you start getting better and better performance. >>: I mean, for whether you’re [indiscernible] same number of [indiscernible]? >> Mahesh Tripunitara: Yes, so the total number of bits is the same. >>: Ah, total number of bits, okay… >> Mahesh Tripunitara: Right, because that is the space that I want to allocate. >>: Yeah, but potentially you increase the [indiscernible] cost… >> Mahesh Tripunitara: Well, that’s… >>: [inaudible] >> Mahesh Tripunitara: That’s sort of obvious, but potentially yes… >>: [inaudible] >> Mahesh Tripunitara: But I won’t necessarily say that just because we use more layers we use more; no, that’s not necessarily true. >>: [inaudible] what [indiscernible] Bloom filters to implement a fast hash. >> Mahesh Tripunitara: Right, an access… >>: An access check, the access check. >> Mahesh Tripunitara: So you get an access request, and I won’t immediately say yes or no. >>: So if you use a pair of hashes, like two-level hash tables, you also get low [indiscernible]. >> Mahesh Tripunitara: So, okay, so… >>: [inaudible] >> Mahesh Tripunitara: Let’s say we use perfect hashing. With perfect hashing I’m guaranteed constant-time queries, but in the worst case the space is quadratic. >>: Right, but if you have two mappings, if you have two hash functions… >> Mahesh Tripunitara: Right. >>: So then you can get the hash [indiscernible] ratio [inaudible]... >> Mahesh Tripunitara: Right, so in perfect hashing… >>: [inaudible] >> Mahesh Tripunitara: We do use… >>: [inaudible] perfect. >> Mahesh Tripunitara: Two layers, oh. >>: [inaudible] hash [indiscernible] rearrange [inaudible] picture [inaudible] according to two hash functions. >> Mahesh Tripunitara: Oh I see, I see, I see. >>: And then allows you to use [indiscernible] buckets.
So when you [indiscernible] one bucket [indiscernible] to one of the hash tables. [inaudible]. >> Mahesh Tripunitara: Okay. >>: [inaudible] that gives you space-efficient hash [inaudible]. >> Mahesh Tripunitara: Bloom filter… >>: What’s the trade-off between using, say, a Bloom filter [indiscernible]? >> Mahesh Tripunitara: I don’t… >>: [inaudible] >> Mahesh Tripunitara: So if I understand you correctly, what you’re saying is that for each hash function we use a separate set of buckets, is that correct? >>: Um… >> Mahesh Tripunitara: Not the same, because… >>: Reduce the [inaudible]. >> Mahesh Tripunitara: Then that’s exactly the Bloom filter. >>: [inaudible] >> Mahesh Tripunitara: Right, because in the Bloom filter I use k hashes. There is some mathematics behind this: given the size of the set A that I want to encode, what is the optimal number of bits and hashes I should use in the expected case? Because in the worst case it doesn’t matter, right; depending on your input you may do badly no matter what. But anyway, maybe we can talk about it, because maybe I don’t fully understand. Okay, so I… >>: So I use Bloom filters as a test to filter out negatives… >> Mahesh Tripunitara: Okay. >>: …before I call the expensive [indiscernible]. >> Mahesh Tripunitara: Right. >>: Right, so that’s a typical way… >> Mahesh Tripunitara: Right, right. >>: But if you use Bloom filters for the expectation of a perfect… >> Mahesh Tripunitara: Yes. >>: …hashing… >> Mahesh Tripunitara: Yes. >>: It seems like then you get a trade-off [inaudible]. >> Mahesh Tripunitara: So the first thing is that if I use two hashes with a shared set of bits, to me it sounds exactly like a Bloom filter. Now I could use separate sets of bits, but that sounds exactly like the layering thing that we’re doing, a two-layered thing, right. The point is, well, why two, why not five? So here’s another question: does adding another level always help?
The answer is no. Again, here we’re using E size, not query time, as the goodness criterion, and we’re adding levels. So if I have a one-level Bloom filter, then my E size is over here. If I go to a two-level Bloom filter, oh, so those are problem sizes, right. I’m sorry, I need to go back. For a certain problem size, if I use a one-level Bloom filter then I have the E sizes here. But if I go to two, it dramatically improves. But you notice that for certain other problem sizes, when I increase to three I get worse performance than I had with two. This is not surprising, because of the switching we do between layers, right. So really the fair comparison should be between two and four, or one and three. In that case I can say that using four layers is no worse than using two layers, and similarly for three and one. Okay, but adding one level doesn’t necessarily help. Finally, is there a cost to using more levels? The answer is yes. There are two answers: one is sort of a complexity-theoretic view, and the other is this empirical view. In the empirical view, for our implementation, meaning our CNF-SAT reduction plus MiniSat, the SAT solver that we used, you can see here that if I use only a one-layer Bloom filter, I quickly converge to the best I can get within so much time. But if I go to five levels and I wait only, whatever this time is, ten seconds, I still get a pretty bad Bloom filter, and it takes a while before I even get to this crossing point, right. So in some ways this quantifies the cost of using more levels: it’s going to take you longer to construct instances. So… >>: [inaudible] to make sure I understand this, so you run MiniSat to synthesize the filter? >> Mahesh Tripunitara: Yes. >>: For a given number of levels, one, three, and five? >> Mahesh Tripunitara: Yes.
>>: And if you give it a longer time, it gets better quality? >> Mahesh Tripunitara: Yes, and the point is that if you use more levels you have to wait longer, right. For example, if my patience was less than ten seconds and I was using five levels, I’m going to get a pretty crappy Bloom filter. >>: Uh-huh. >> Mahesh Tripunitara: Because the resulting E size, which is my goodness criterion, is pretty high. In that case I would have been better off using one level, just a classical Bloom filter, because it would have given me a better E size if I only had ten seconds of patience. On the other hand, if I was willing to wait twenty seconds, then I can have a five-level Bloom filter, right, because then it’s going to give me a much better E size, whereas the one-level one is not going to get much better… >>: So the input to MiniSat is the domain, A and U? So you are enumerating A and U? >> Mahesh Tripunitara: So we use all these, so you have these possible, the A is the… >>: And you would say implicit, okay, so A… >> Mahesh Tripunitara: No, no, we use A union P. >>: Ah, okay. >> Mahesh Tripunitara: So A is the authorizations that are allowed. >>: Yes. >> Mahesh Tripunitara: A union P is all possible access requests that I could get. >>: Okay. >> Mahesh Tripunitara: Now you have to assume certain things about the domain for this. Hey, Marco. And H is the set of all possible hash functions. >>: Right. >> Mahesh Tripunitara: Obviously we have to make this finite, or it doesn’t make sense for us to consider the computational complexity, right. M and E are the space optimization objectives, and K is the query time, because I’m allowed to use at most K hash functions. L we don’t really care about, but we put it in as a… oh, by the way, since Marco came in: this paper is his paper. I’m not talking about it because we did some more work on it. Anyway, okay, so yeah, that’s the input.
>>: Okay. >> Mahesh Tripunitara: Okay, so I think the empirical results are somewhat interesting, right. We still have one open problem here. We did look at it from the standpoint of the fixed-parameter approach, but for the most interesting case, I would say, which is whether the number of levels leads to intractability, we have not answered that question yet. So we have an empirical answer, but we don’t have the computational complexity answer. The time is now eleven twenty-seven. >> Nikolaj Bjorner: [inaudible] >> Mahesh Tripunitara: Okay, now we come to Karthick’s work, which is policy verification. This part, if you remember, is the delegation from the super-user to the semi-trusted admin who’s going to make changes to the authorization policy. I have to introduce some new syntax, although I generally don’t propose policy languages. So the first question we have to consider is: what is a meaningful language for delegating things in this context? Remember that we can think of an RBAC policy as consisting of three pieces: the authorizations of users to roles, the relationship between roles, which I don’t show here, and the relationship of roles to permissions, right. So the question is what a delegation model for administering something like this should look like. There’s quite a famous proposal in the community from nineteen ninety-seven called ARBAC. We’re going to focus only on the user-role authorization part. There’s good motivation for that, because it’s been argued that it is the part that changes the most in enterprises. So here’s their specification of a language for doing this. There are two things that you can do: you can assign a user to a role, or you can revoke a user from a role. Accordingly, we have two sets of rules, which they call can assign and can revoke rules. Each entry in the can assign set is a three-tuple. It looks like this.
We define something called an administrative role, and we put members in it, like Alice, Bob, and so on. We say that members of this administrative role are allowed to assign users to this target role. So for example we might put Alice as a member of A, which means that she’s authorized to assign anyone to be a member of RT. The only additional thing is that we constrain her with a precondition, because we don’t want her to have unfettered authority to assign users to RT. So what does C look like? I’ll do this by example: C is a propositional expression on the existing role memberships of the user she’s trying to assign. Let’s say Alice says, okay, I want to assign Bob to be a member of engineer. She’s constrained: Bob has to already be a member of employee, and he should not already be a manager. If Bob violates that, she cannot do it, right. That is how the super-user delegates control, because presumably the super-user wants to control how these sub-administrators are [indiscernible]. Okay, one more: revocation. It looks similar but without the precondition. All we’re saying here is that a member of A, for example Alice, is allowed to revoke users from the target role. So for example, if Bob is a member of employee, she can just say he’s no longer a member of employee. Now the question is, why is there an asymmetry? Why don’t we have a precondition here? The stated rationale is the Monotonicity Assumption, which is that revocation is an inherently safe operation: you’re only going to take away privileges, not give privileges. So you can imagine the assumption here is that all the permissions are positive permissions; there are no negative permissions. Okay, so safety: the question is of course whether a user could acquire membership in some role by some sequence of actions.
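A minimal Python model of the can assign and can revoke rules just described might look like this. It assumes preconditions are conjunctions of required and forbidden roles, which matches the engineer example; the talk describes C more generally as a propositional expression, and the class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanAssign:
    admin_role: str      # members of this role may use the rule
    positive: frozenset  # roles the user must already hold
    negative: frozenset  # roles the user must not hold
    target: str          # role the user gets assigned to


@dataclass(frozen=True)
class CanRevoke:
    admin_role: str
    target: str          # no precondition: revocation is assumed safe


def may_assign(rule: CanAssign, admin_roles: set, user_roles: set, target: str) -> bool:
    """Can an administrator holding admin_roles use rule to assign target?"""
    return (rule.target == target
            and rule.admin_role in admin_roles
            and rule.positive <= user_roles          # every required role held
            and not (rule.negative & user_roles))    # no forbidden role held


def may_revoke(rule: CanRevoke, admin_roles: set, user_roles: set, target: str) -> bool:
    """Revocation only needs the administrative role and an existing membership."""
    return rule.target == target and rule.admin_role in admin_roles and target in user_roles


if __name__ == "__main__":
    # Alice is a member of administrative role "A"; engineer needs employee, not manager.
    rule = CanAssign("A", frozenset({"employee"}), frozenset({"manager"}), "engineer")
    print(may_assign(rule, {"A"}, {"employee"}, "engineer"))             # True
    print(may_assign(rule, {"A"}, {"employee", "manager"}, "engineer"))  # False
```

The asymmetry from the talk shows up directly: `CanRevoke` has no precondition fields at all.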
More precisely, if I were to write it as a formal problem, we have a start state, which is the RBAC graph, right. We have these rules by which the state can change. We have the set T of trusted users, which I’ll explain in a second, and then we have a query. What is the set T of trusted users? It doesn’t change the computational complexity or anything, but it comes from a practical consideration. Let’s say the super-user trusts Alice but does not trust Bob, and Alice and Bob are administrators, the partially trusted admins. What he’s going to do is put Alice in this trusted set, and the meaning is that she’s not allowed to do anything as part of this analysis: she’s not allowed to assign any users, and she’s not allowed to revoke any users. So it is a very coarse-grained way of trying to capture insider abuse: I trust you but not him, so I’m going to put you in the trusted set and assume, for the purpose of analysis, that you don’t do anything. You could say that this is very coarse-grained, because you should be allowed to do some things, and so on. But that is the way we have set this problem up. So the intent there is to capture some kind of insider abuse. In some earlier work of mine that was published in two thousand eight, we showed that this problem is PSPACE-complete. I just wanted to show the relationship among the complexity classes here. One of the comments here is that when people design these things, they don’t think of policy verification. The point I’m trying to make is that it’s a limiting factor for adoption, because you can’t even give a tool for doing this kind of basic policy verification. Okay, model checking is complete for PSPACE. Why don’t we try it?
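To see why the analysis is expensive, here is a naive explicit-state search for the safety question. It is a sketch under simplifying assumptions: preconditions mention only the target user's own memberships, so the state is that user's role set; an administrative role can act only if at least one of its members is outside the trusted set T; and rules are plain tuples in a hypothetical format. The search may visit up to 2^|R| states, which is the blow-up the model-checking approach has to contend with.

```python
from collections import deque


def is_unsafe(initial, can_assign, can_revoke, goal, admin_members, trusted):
    """Breadth-first search: can the user ever become a member of goal?

    can_assign: iterable of (admin_role, positive_set, negative_set, target)
    can_revoke: iterable of (admin_role, target)
    admin_members: administrative role -> set of users in it
    trusted: users assumed to do nothing during the analysis (the set T)
    """
    # An administrative role is usable only if some untrusted user is in it.
    usable = {r for r, members in admin_members.items()
              if set(members) - set(trusted)}
    start = frozenset(initial)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if goal in state:
            return True  # a sequence of actions reaches the query role
        successors = [state | {t} for a, pos, neg, t in can_assign
                      if a in usable and pos <= state and not (neg & state)]
        successors += [state - {t} for a, t in can_revoke
                       if a in usable and t in state]
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False  # every reachable role set was explored


if __name__ == "__main__":
    assign = [("A", {"employee"}, {"manager"}, "engineer"),
              ("A", set(), set(), "employee")]
    members = {"A": {"alice"}}
    print(is_unsafe(set(), assign, [], "engineer", members, trusted=set()))      # True
    print(is_unsafe(set(), assign, [], "engineer", members, trusted={"alice"}))  # False
```

Putting Alice in the trusted set removes every transition she could have made, which is exactly the coarse-grained insider model described above.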
So Karthick tried it on some benchmark policies, and you can see that even for relatively small policies he was getting poor performance on his desktop PC. So of course we have to use some tricks. What did we do? We complemented it with abstraction refinement and bound estimation, and then we used bounded model checking. These are domain-specific tweaks, right. Abstraction transforms the problem into a smaller one and ensures one-sided error: specifically, if the smaller problem is unsafe, then so is the original. But if the smaller one is safe, then we have to refine, meaning throw more things in. So of course it’s highly effective if the unsafety is shallow. Where Vijay came in is that he was coming from the bug-finding mindset: if you think of unsafety as a bug, you make the premise that software people, I guess, make in testing, that bugs are easy to find, right. We don’t know if this premise is validated in practice for these kinds of policies. Okay, so that’s how it works: generate an initial abstraction and model check it; if it’s unsafe then we’re done; if not, we refine until eventually we get to the full policy. How does bound estimation work? We have a state reachability graph, and what we want to do is hint the model checker with an upper bound on its diameter. The diameter is of course the length of the longest shortest path between any two nodes, because we know that the model checker does not have to go in cycles. Karthick implemented this in a tool that he calls Mohawk, which is downloadable. By the way, the other things that I presented are also downloadable; for the CBF one, I think we’re still working on it, and I’m not sure if I’ve put it out there yet. But all the code is available for download. Right, so we take the policy and the safety query and do some mundane transformation.
We do the abstraction, so we have the abstract policy; we estimate a bound on it, and then we invoke NuSMV. If we get unsafety, which we call an error, then we’re done, right. Otherwise we refine, estimate a new bound, and invoke NuSMV again. So here’s Karthick’s approach to abstraction refinement, which happened to work for the benchmark policies. He does something fairly straightforward: he defines a relationship between roles called related by assignment. For example, if you have (A, C, R1) in can assign, and R2 is a precondition role in C, then we say that R2 is related to R1, right. Then he builds a tree based on this. Here’s a simple example with those can assign rules. You notice, for example, that finance is a precondition role for budget committee, so he puts an [indiscernible] like that, right. But budget committee is on top because it is the target role. Similarly, if you look at finance as a target role, then you have accounting and audit over here, so he puts them here. Say we have a query like: could Bob become a member of budget committee? What we’re going to do is prune the policy and take only the part of the policy that refers to some constant number of roles, chosen by this priority. So you could set K A to be one, which means we’ll start off with only, I think we start here, right? Okay… >>: [inaudible] >> Mahesh Tripunitara: Oh, we start at the top, okay. So we’ll actually throw away all of these guys; actually, we’ll throw away everything. >>: [inaudible] >> Mahesh Tripunitara: Right, because even the finance role is not in there. >>: [inaudible] the abstraction is an under-approximation? >> Mahesh Tripunitara: Yes. >>: Yes.
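The slicing-based abstraction refinement can be sketched as follows. This is an illustrative reconstruction under assumptions, not Mohawk's code: roles are ranked by breadth-first search over the related-by-assignment relation starting from the query role; a slice keeps only rules whose target and precondition roles all lie inside it, which makes it an under-approximation; `k_a` and `k_r` mirror the K A and K R knobs; administrative roles are ignored (each is assumed to have an untrusted member); and a plain breadth-first reachability check stands in for NuSMV.

```python
from collections import deque

# Hypothetical rule format: (admin_role, positive_set, negative_set, target).


def priority_order(rules, goal):
    """Roles ordered by BFS over 'related by assignment', starting at goal."""
    order, seen, queue = [goal], {goal}, deque([goal])
    while queue:
        role = queue.popleft()
        for _, pos, neg, target in rules:
            if target != role:
                continue
            for pre in sorted(set(pos) | set(neg)):  # precondition roles
                if pre not in seen:
                    seen.add(pre)
                    order.append(pre)
                    queue.append(pre)
    return order


def slice_policy(rules, roles):
    """Under-approximation: drop every rule mentioning a role outside the slice."""
    return [r for r in rules if r[3] in roles and (set(r[1]) | set(r[2])) <= roles]


def reachable(rules, revocable, initial, goal):
    """Stand-in for the model checker: BFS over the user's role sets."""
    start = frozenset(initial)
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if goal in s:
            return True
        nexts = [s | {t} for _, pos, neg, t in rules
                 if set(pos) <= s and not (set(neg) & s)]
        nexts += [s - {t} for t in revocable if t in s]
        for n in nexts:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return False


def check_with_refinement(rules, revocable, initial, goal, k_a=1, k_r=1):
    """Return (unsafe?, size of the slice that settled the question)."""
    order = priority_order(rules, goal)
    k = k_a
    while True:
        roles = set(order[:k])
        if reachable(slice_policy(rules, roles), set(revocable) & roles,
                     set(initial) & roles, goal):
            return True, k   # unsafe in a slice => unsafe in the policy
        if k >= len(order):
            return False, k  # slice holds every related role: answer is exact
        k += k_r


if __name__ == "__main__":
    rules = [("admin", {"finance"}, set(), "budget_committee"),
             ("admin", {"accounting"}, {"audit"}, "finance"),
             ("admin", set(), set(), "accounting")]
    print(check_with_refinement(rules, set(), set(), "budget_committee"))  # (True, 4)
```

If the query role is reachable in any slice, it is reachable in the full policy; once the slice contains every role related to the query, a negative answer is exact, because no other role can influence the rules that lead to it. The preconditions in the example are hypothetical, chosen only to echo the budget committee tree.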
>> Mahesh Tripunitara: So, okay, of course the refinement is that you throw in K R more roles; these are tunable knobs that determine how aggressive we are. >>: So it’s basically slicing. So it’s some slice of the system [indiscernible]? >> Mahesh Tripunitara: Yes, yes. Okay, bound estimation. What we’re looking for is an upper bound on the diameter of the state reachability graph. Of course we would ideally like it to be tight. Now if R is the set of all roles that appear in the problem instance, then I immediately have an upper bound like this. So why is it that I have a minus one in the exponent here and the plus one here? The reason is that for the role in the safety query, we know that the only thing we have to do is assign the user to it, and then we’re done, right. So there’s only one state transition that applies to that role. That’s why it disappears from here and goes here. That already gives a hint as to how we do the bound estimation. Now of course there’s the question of how tight the bound is, and that relates to a foundational question in computing, right: we don’t expect a tight bound unless PSPACE equals EXP, and we don’t expect to answer that question as part of this work. Okay, so what is our strategy? We progressively remove terms from the exponent, which is up here. They don’t disappear; they’re likely to get added over here, right. Here’s a simple example; we call these various approaches tightenings. Let’s say we have a safety query that refers to the user U and the role R. We identify the rules A1 through AK in the can assign set whose target role is R, the role in the safety query. Then we identify the sets of roles that may cause each of those rules to fire, right. Then we pick the largest of those sets, and we can replace the set of roles that we started with, the one from the policy, with this one. Why?
Well, basically we know that one of them fires and causes the user to become a member of R, and that’s it; we need at most one. Similarly, we have various other tightenings. If some roles appear only negated in preconditions, then we know that the only thing that can be done to them is revocation, right. So basically we end up with a set of roles that we can remove from the exponent and put here, right. We don’t sacrifice any correctness; that’s what the theorem states. And we get a dramatic improvement. This is the final result on our benchmark. RBAC-PAT is prior work done at Stony Brook; I don’t know if it’s publicly available or if they email you the code, right. He asked for it and they emailed it to him. Of course, I’m only presenting the good news for our tool; in the paper we talk about what the tradeoffs are, right. The classification of the test suites by complexity comes from the fact that if you remove certain syntactic elements from the problem, you end up with a problem that’s in P, or NP-complete, and things like that. Okay, oh right, very important slide. My university hired a… [laughter] Ha, ha, it’s my final slide. >>: Oh my god. >> Mahesh Tripunitara: I think Marco might like it, especially because he’s a UW grad. University of Waterloo, I should say, because there is a UW here, right. So Waterloo had this kind of traditional, academic-looking insignia, and they hired someone who proposed this new fancy logo for the university, which some people say is more appropriate to a disco club than a serious university, but oh well. >>: [inaudible] >> Mahesh Tripunitara: Maybe it’s the other universities that are behind the times. I think they have quietly gone back to the old one. I think that person left the university, and now I notice that, anyway, whatever. >>: [inaudible] >> Mahesh Tripunitara: Oh really.
>>: [inaudible] Waterloo. [laughter] >> Mahesh Tripunitara: So… >>: Maybe it’s just going to be for fun events; this would be the word just as a show… >> Mahesh Tripunitara: Right. >>: Right. >> Mahesh Tripunitara: Well, there was a time when they said that if you wanted business cards as faculty, and I don’t print business cards anyway, they could not give you ones with the old logo; you had to take the new one. >>: [inaudible] >> Mahesh Tripunitara: Right, so people were saving their old business cards because they didn’t want the new one. [laughter] >>: [inaudible] >> Mahesh Tripunitara: But now it quietly turns out that you can; somebody told me that now you can order them with the old one. So I think they’re quietly moving back to it. >>: [inaudible] >> Mahesh Tripunitara: Anyway, thank you for your time and the opportunity. >> Nikolaj Bjorner: Thank you. [applause]