>> Stuart Schecter: Hi. I'm Stuart Schecter. I'm here to introduce Lujo Bauer from Carnegie
Mellon. He'll be speaking to us about access control.
>> Lujo Bauer: Thanks very much. Any clue how many people might be or might not be
watching remotely? I'm just curious? No. Okay. Doesn't matter.
So I'm going to be talking about this system we have at Carnegie Mellon called Grey. The basic
idea in Grey is that we're using smart phones as the vehicle by which users exercise authority,
and in our particular scenario this authority is to access office doors, to delegate to each other the
right to access each other's office doors, and also to log into each other's computers.
But just to roll back a little bit: this is work that's now almost a couple of years old; a version of it is
a couple of years old.
And I think a couple of years ago we had to do much more selling of why we were interested in
smart phones. I think now everybody's much more excited about smart phones and not so much
motivation is needed there. But one fact that perhaps bears repeating is huge numbers of mobile
phones are being sold. Last year 1.1 billion phones were sold.
In a couple of years it will definitely be the case that every mobile phone sold will, for all practical
purposes, be what today we're calling a smart phone. In a few years we're looking at a world
where practically everybody on the planet has at least one of these powerful mobile computing
devices and is walking around with it. We might as well find some cool stuff for these devices to
do.
And what we want to do with the phones is use them to intelligently control the environment; the
current version of Grey is the tip of the iceberg in that sense. I'll show you some scenarios we'd
like to enable with Grey. One is: suppose we're doing access control to a vehicle with the
phones. Something I'd like to make possible is that if I'm on vacation and a friend of mine wants
to borrow my car, which I left at home, some interaction can take place between my phone and
my friend's phone so that there's a virtual transfer of credentials from my phone to his, and he can
just use his phone to drive my car for as long as I've decided it's appropriate.
And another example: suppose we're using these devices for mobile commerce, some sort of
mobile payments. Something that would be really neat is if a friend of mine walked into a store
and noticed something that he knew I really wanted to buy, I could send him the money so he
could make this purchase on my behalf without him having to lend me any money.
Another example: suppose I'm traveling and there's this important e-mail exchange going on.
Something I'd like to enable is that my secretary gets access to that e-mail exchange, but only to
that thread or set of threads and only for a specific amount of time.
And moreover I'd like to do this in such a way that I don't have to give my secretary any user
names or passwords that she might be able to use to access my other accounts or that I might
have to change afterwards or anything of that nature.
And a final example: suppose again I'm traveling. I've made my hotel reservations, all my travel
reservations, online. A scenario I'd like to make possible is that when I walk into a hotel where
I've booked a stay, in a city I've never been in, at a hotel I've never been in, the hotel
infrastructure transmits to my phone a map of the hotel and directions to get to my room, as well
as the authority to unlock my room door and keep unlocking it for the duration of my stay.
Something that unites all these scenarios is that there's a kind of virtual transfer of authority
going on between devices, or between devices and infrastructure, and this should be completely
seamless and transparent to the user regardless of whether it's being done between two devices
in the same room or on opposite sides of the planet.
That was the more grand vision of Grey. This is concretely what's happening with it right now.
What you see here is a map of the second floor of a building at Carnegie Mellon where I work.
All the circles are doors that are Grey-enabled. The green ones are ones where we're using Grey,
and the red ones are ones where Grey could be used if we wanted to activate it.
The idea here is that each door is outfitted with an actuator such that the phone can communicate
with this actuator via Bluetooth, and if the actuator is convinced that the owner of the phone
should be able to gain access to the office or the lab or whatever, the door will unlock.
Beyond that, what's possible in this setting is that users can delegate authority at the time and
place of their choosing.
If I decide that my students should be able to get into my office I can do stuff on my phone which
will create credentials and enable them to get into my office. This can happen both proactively in
the way that I just described, or reactively. That is, somebody can attempt to access a resource.
The system will discover that the person doesn't have authority to access the resource, and then
it will go seek out people who might be able to modify their policies in order to make this access
possible.
So we've had various versions of this system deployed for a couple of years now. We usually
have about 35 users and 35 doors. And we also use this, although less extensively, for logging
into XP, Vista and Linux; that is, we have a little module, a little plug-in, on these OSs such that
the phone can be used to unlock the screen or screen saver for the user. And we actually started
another deployment of this at UNC Chapel Hill. So let me describe a scenario that is a pretty
typical one that takes place in the context of our system.
Here we have two players. We have student Scott and myself. And my office door, the resource
that somebody will want to access. So here the idea is Scott wants to get into my office. And the
only way he'll be able to get in is if he can convince my office door or the little computer
embedded in the wall next to it that I've authorized the access.
Now, in this scenario I'm traveling, and so I can't somehow directly authorize this access. My
phone here holds a bunch of credentials. These credentials represent pieces of my access
control policy.
Some of these pieces are, for example: I have stated through a credential that my students
should be able to get into my office. I've stated that people I believe are faculty should be able to
get into the office. And also that, for all things having to do with CMU, my secretary has full
authority to speak on my behalf.
But importantly for the example, there's no subset of credentials here which enables Scott to get
into my office. In fact, none of these credentials even mention Scott. So here's what happens.
Scott tries to instruct his phone to open my door. The door sends back a statement of the
theorem that Scott's phone must prove in order to demonstrate that I've authorized this access.
Now, Scott, in this example, has no credentials, but his phone knows this is about accessing my
office; in fact, there's something about that stated in the theorem. And it turns around and asks
my phone for help.
Now, my phone can help because it has a bunch of credentials. But in the particular case that
we're looking at, these credentials don't yet enable Scott to get in.
However, my phone can, using those credentials, give me a list of new credentials that I might be
willing to create such that any one of these new credentials in combination with all the ones that
are already there might be sufficient to give Scott access.
Here I pick which one I think is most appropriate, that is that Scott is a student. This means
there's now a collection of credentials sufficient to answer Scott and these get shipped back to
Scott. He adds one of his own and sends them to the door. The door opens.
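To make that exchange concrete, here is a minimal Python sketch of the flow just described. It is only an illustration: the "proof" check is a deliberately simplistic stand-in (set containment of required facts), and the class names, message format, and auto-approval step are hypothetical, not Grey's actual logic or wire format.

    from dataclasses import dataclass, field
    from typing import Set

    @dataclass
    class Door:
        owner: str
        def challenge(self, requester: str) -> Set[str]:
            # The "theorem" the requester must prove, stated here as a set of required facts.
            return {f"{self.owner} delegates door access to students",
                    f"{requester} is a student of {self.owner}"}

    @dataclass
    class Phone:
        owner: str
        facts: Set[str] = field(default_factory=set)

        def prove(self, goal: Set[str]) -> bool:
            # Real Grey builds a formal, mechanically verifiable proof; here a goal is
            # "proved" if the phone holds every required fact.
            return goal <= self.facts

        def ask_for_help(self, helper: "Phone", goal: Set[str]) -> Set[str]:
            # The helper ships credentials it already holds and is prompted (via the
            # wizard UI in Grey; auto-approved in this toy) to create the missing ones.
            missing = goal - self.facts
            shipped = missing & helper.facts
            created = {f for f in (missing - helper.facts) if owner_approves(helper, f)}
            helper.facts |= created
            return shipped | created

    def owner_approves(helper: Phone, fact: str) -> bool:
        return True   # stand-in for the interactive approval step on the owner's phone

    # Replaying the scenario: Scott asks Lujo's door, can't prove the goal, asks Lujo's phone.
    door = Door(owner="Lujo")
    lujo = Phone("Lujo", {"Lujo delegates door access to students"})
    scott = Phone("Scott")
    goal = door.challenge("Scott")
    if not scott.prove(goal):
        scott.facts |= scott.ask_for_help(lujo, goal)
    print("door opens:", scott.prove(goal))   # door opens: True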
Now, what's going on here under the covers is that in each one of these cases, when I'm sending
credentials to Scott and Scott is sending them to the door, what's being sent is a bunch of
digitally signed credentials. These are the things that represent pieces of the access control
policy.
And they're wrapped up in a formal mechanically verifiable proof, which explains how these
credentials together imply that I've authorized this particular access. So I won't go into this in
detail. I don't really want to argue the point. But we believe there are a couple of neat things
about this approach to doing things.
One of them is that after an access is allowed, this proof can be easily
put into some sort of record. When you're auditing accesses you know not only that somebody
accessed something or even all the credentials that were involved, but you have an exact
explanation as to why the access was allowed.
And this might be interesting because, for example, a piece of the explanation, which also could
be the result of a number of credentials, could be, for example, that the HR department was one
of the entities that allowed access. And it's neat that you can just pluck this piece of information
out of this proof instead of having to try to interpret these credentials to see if they imply
something like that.
Another neat feature is that in the limit, this allows the reference monitor that's checking proofs to
be relatively simple. It doesn't have the complicated task of figuring out why access should be
granted. It just has the task of figuring out whether an explanation as to why access should be
granted is a valid explanation.
And to help convince you that this actually works in reality, I'll show you a two-minute video which
basically replays the scenario I just went through. And here my developer, Dave, is playing the
part of Scott. But otherwise it's the same scenario. So here you see Dave walking up to my
office. I'm not there. I think he'll be pretending this is the first time he's accessed my office. So
he'll use his phone to take a picture of the two-dimensional bar code which encodes the Bluetooth
address of the computer that's embedded in the door.
Now he instructed his phone to open the door. The door communicated back what the policy
was. His phone realized that it couldn't compute a proof. And it offered him an option as to
whom to ask for help.
There's some automatic guidance, but there's also the ability for the user to direct the request in
the way that he thinks is most useful. So here I am traveling ostensibly but actually in the
conference room on the other side of the floor. I get the request, and this little wizard interface
helps me choose which additional pieces of access control policy I want to create in order to
make this access possible.
And here you see I'm adding Dave to a group that I've already created in the past. Interestingly,
this group already has other credentials granted to it. So that Dave will be getting not only the
authority to get into my office but also the authority to do other things that members of my group
are, members of this group are allowed to do.
So as soon as I'm done creating the credentials, my phone assembles this proof from them.
Sends it back to Dave. Dave turns around and sends a bigger proof to the door. The door
opens.
>>: How do they communicate -- Bluetooth, network, wireless?
>> Lujo Bauer: So they communicate over what's essentially a home-grown version of
multimedia messages. So one phone uploads stuff to a server and the other phone gets an SMS
telling it where to go pick up the data.
>>: I see. Not the cellular provider's server, your own server?
>> Lujo Bauer: It is -- well, the server doesn't belong to the cell provider, but the data is
uploaded via whatever data connection the cell provider offers and the other phone is [inaudible]
SMS.
So there are a bunch of research challenges and really a bunch of research directions that we're
exploring in the context of this project. One has to do with developing better logics for access
control that will let us describe a number of interesting access control scenarios that perhaps
some older logics couldn't handle.
So one such example is in the case when we want to control access to consumable resources.
So if you think about digitally signed credentials, usually they have an interval in which they're
valid. But the intention is that they can be used within that interval as many times as somebody
likes.
However, suppose you want to give out a credential saying: I'd like to give you one can of soda,
or, I'd like you to have access to one can of soda. I don't want this credential to be used many,
many times. I'd like it to be usable exactly once; otherwise it loses its meaning.
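As a toy illustration of that "use exactly once" intent, here is a Python sketch of a reference monitor that remembers which credential IDs it has already honored and refuses to honor them again. The names are made up and this is not the logic from our work, just the idea.

    class ReferenceMonitor:
        def __init__(self):
            self.consumed = set()   # IDs of consumable credentials already redeemed

        def redeem(self, credential_id: str) -> bool:
            if credential_id in self.consumed:
                return False        # already used: the soda credential has been spent
            self.consumed.add(credential_id)
            return True

    rm = ReferenceMonitor()
    print(rm.redeem("soda-credential-42"))   # True: first use succeeds
    print(rm.redeem("soda-credential-42"))   # False: second use is rejected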
So we did some work that enabled scenarios like that. We also did a little bit of work to enable
scenarios where perhaps you might have multiple organizations using a system like this. And the
organizations cooperate to the extent they're all willing to play in the system but at the same time
they're independent enough that each organization wants to use its own access control logic. So
now there's the question of if I'm trying to -- if the resource is housed in one organization but my
home organization speaks a different access control logic, is there any way I can convince this
resource it should open. So we figured out one approach to doing things like that.
Another class of issues had to do with distributed theorem proving. This task that Scott's phone
engages in, or Dave's phone engages in, in order to demonstrate that it has access is essentially
a distributed theorem-proving task. It has to work with credentials, or premises, available at
various places in the network.
It has to figure out how it can be demonstrated that it has access. And one of the things that
makes this more difficult, beyond just the distributed aspect, is that many of these credentials
don't exist yet. For example, in the scenario that you saw, the hypothetical credentials which my
phone showed me, and from which I picked the one I wanted to create, didn't exist until the
theorem prover suggested that maybe they were good ones to create in order to prove this
theorem.
Another piece of work has to do with helping users configure access control policies. And this is
one of the things that I'll talk about today. This has to do with easing this burden of configuration.
In this situation Dave had to wait for a while while I picked what policy to create in order to let him
into my office. It would be nice if somehow the system was smart enough to minimize the
number of situations where a user had to incur latency of that nature.
And finally, I think a really critical piece of the puzzle in developing new access control systems,
and one that's often overlooked, is this: once you come up with a system that you think is cool
and new and useful in some way that previous systems aren't, you really ought to be able to
demonstrate that, in fact, compared against some previous system or compared against what
people want, this system in practice provides some usefulness.
It doesn't matter if your system has really awesome features if, when you deploy it, for some
reason or another it does nothing useful; it's less useful than whatever people used before.
So we did some work in that vein. And these are the two things I'll talk about today. But if you
would like to discuss any of the other matters, I'd be very happy to talk to you about them off line.
So the first of the remaining two parts of the talk will be about trying to solve the problem of how
to make it easier for users to configure policy. And the very specific issue that we're trying to
solve here is the situation where I always knew, or maybe at some point I knew, that I would be
willing to let Dave in, but I never got around to implementing this policy that I had in mind. So as
a result Dave had to wait, perhaps a minute or two or perhaps many hours because I was on a
plane and wasn't able to create a policy, before he was able to get into my office.
So I'd like to do away with latencies like that whenever possible. And the approach we took to
solving this has two pieces. One is try to figure out what the intended policy is. That is, the policy
that somebody had in their head. Compare it to the implemented policy. The policy that's been
operationalized in the system.
And figure out where the mismatches are. And then suggest to the user that he fix these
mismatches, if we think that they're an indication of misconfiguration.
The thing that makes this possible is that access control policy exhibits patterns. And more
specifically if you have a collection of access control logs, you can mine these logs to extract
those patterns. And these patterns are indicators of what's good policy.
Now, one thing that really needs to be highlighted here is in order to be able to do something like
this, well, you actually have to have a collection of logs to analyze. And this is something that a
particular device in this system on its own wouldn't have. You actually have to go to some effort
to come up with this comprehensive collection of logs.
So once we have those logs, though, what we do is we apply association rule mining. Now, this
is a technique, an algorithm, which takes as input a set of records where each record has a fixed
number of attributes. For example, you can think of a record as being a shopping cart, and the
attributes as the contents of the shopping cart.
So the set of possible attributes is the set of items that the store sells, and the attributes that are
true for a given shopping cart are the items inside it.
And the output of the association rule mining algorithm is a set of rules that describe the
statistical patterns observable in these records. For example, a rule might be that people who
buy peanut butter and jelly also buy bread. And the way we use these rules is to identify
anomalies. So we see somebody who has bought peanut butter and jelly but not bread, and
maybe this is a mistake.
If most people buy bread when they buy peanut butter and jelly but this person didn't, maybe this
person really intended to buy bread but just plain forgot. So we can suggest to this person: do
you also want to buy bread?
And incidentally, this also plays a role in how items are organized on shelves in grocery stores
and other stores. The people who stock these places try to make sure that items that are bought
together live together on a shelf, in part so that when you're buying one you notice the other and
remember to buy it.
Going into this in a little bit more detail: here's an example of a record with four possible
attributes, only two of which are actually true in this record. So we have a set of records like this,
and then we can begin looking at combinations of attributes.
For example, if we look at attributes A and B across all records, we see that four times attribute A
holds, and in two of those records attribute B also holds. So we can make up a rule that the
existence of attribute A implies the existence of attribute B, but with a confidence of .5, because
the rule is only supported, in some sense, by half of the evidence.
Similarly, there can be a rule that attribute A implies attribute C, with a slightly higher confidence
because it has slightly more data items supporting it.
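To make the confidence computation concrete, here is a small Python sketch of the formula, with hypothetical records chosen to mirror the numbers in this example; it is only the confidence calculation, not the mining algorithm itself.

    def confidence(records, antecedent, consequent):
        """conf(antecedent -> consequent) = support(both) / support(antecedent)."""
        ante = [r for r in records if antecedent <= r]
        both = [r for r in ante if consequent <= r]
        return len(both) / len(ante) if ante else 0.0

    # Hypothetical records: A holds in all four, B in two of them, C in three of them.
    records = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"A", "C"}]
    print(confidence(records, {"A"}, {"B"}))   # 0.5
    print(confidence(records, {"A"}, {"C"}))   # 0.75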
So how does this translate to users and doors? Well, each record becomes a user. And all the
attributes of that record become the resources that that user has been observed to have
accessed.
So what this line about Alice means is that Alice has been seen accessing resource A and
resource C. Then we do rule mining on this dataset. So, for example, if we come up with a rule
that accessing resource A implies accessing resource C, then this one instance where that rule
doesn't hold, if it's a high quality rule, is likely a potential misconfiguration. So maybe this is a
mistake in the sense that maybe Bob will soon be accessing C: even though we haven't seen in
the system any policy that allows Bob to access C, maybe he will, and therefore maybe such a
policy should be created.
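Here is a small Python sketch of that step: turn the access log into a user-by-resource matrix and flag users who violate a high-confidence rule. The rule here is hand-written for illustration; in the real system the rules come out of the mining step.

    # Hypothetical access log entries: (user, resource)
    accesses = [("Alice", "A"), ("Alice", "C"), ("Bob", "A"), ("Carol", "A"), ("Carol", "C")]

    matrix = {}
    for user, resource in accesses:
        matrix.setdefault(user, set()).add(resource)

    rules = [({"A"}, "C")]   # "accessing A implies accessing C", assumed high confidence

    for user, resources in matrix.items():
        for antecedent, consequent in rules:
            if antecedent <= resources and consequent not in resources:
                print(f"possible misconfiguration: {user} may also need access to {consequent}")
    # -> possible misconfiguration: Bob may also need access to C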
So we tested this on a dataset that we collected from our deployment. This dataset was collected
over 16 months. Contains about 11,000 accesses using Grey to various doors on our floor. 25
users, 29 resources.
Right. Just a piece of terminology. I'll use this terminology of policy matrix, which is one of
those -- a matrix kind of like you saw on the previous slide. And there will be two kinds of it. One
is the implemented policy matrix, that is, it will be constructed just from the things that occurred in
the log. There will be another kind which goes beyond what occurred in the log. That is, we
actually gave out a survey to users and asked them: if such and such a person asked if they
could access your office, would you allow this access?
So through that we got a richer matrix, the intended policy matrix. And these were the two
against which we evaluated how well we could -- yes.
>>: Some computers were stolen as a result of [inaudible].
>> Lujo Bauer: Right.
>>: Laptops.
>> Lujo Bauer: What happens if we make wrong guesses?
>>: Because you can allow some accesses that you don't want to.
>> Lujo Bauer: Right. This is a very interesting point. We've had some very lively discussions
on this topic, in that, for example, I feel that if the computer was doing -- it was correct 90 percent
of the time I'd be okay with that and I would just let it do its thing and trust it. Most people
strongly disagree.
So what we do in the system is that, as a result of a guess, the computer interacts with the
person. So it doesn't create policy; it talks to the person and says: well, it seems like maybe this
policy would be useful, do you want to create it?
So that speaks to the concern that you voiced. So the way we tested this is using a simulator.
And the simulator sort of replayed all the accesses that were seen in the logs. So at every
iteration it would use the access it was currently looking at to extend the access matrix that
described all the accesses that were seen so far. So this represents all the accesses that were
seen so far in the system.
Then we would do rule mining on that access matrix. This would generate rules which we would
use to make predictions, again using this access matrix. And then the way we would evaluate
these predictions is to compare them to the entries in either the intended or the implemented
policy matrix, just two different types of comparison. Here I'll just speak about the intended one.
So in this case, because this final matrix in fact showed that this access was given or would have
been given, we say this is a good prediction.
There are a couple of metrics we use to measure our success. The most natural ones are
accuracy and coverage. Accuracy is: of the predictions we make, what percentage is correct.
Coverage is: of the accesses that will take place, how many can we predict.
And we measure these with respect to -- here I'll just talk about with respect to -- intended policy.
You'll see a particular parameter being varied in all the graphs, and that's the minimum confidence
of the rules being used to make predictions.
I mentioned the rules have a confidence depending on how much data supports them. If the
rules have very low confidence we just plain don't use them. So this minimum confidence is the
threshold at which point we start using rules.
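As a concrete reading of those two definitions, here is a minimal Python sketch; the predicted and intended sets are hypothetical examples, not our data.

    def accuracy(predicted: set, intended: set) -> float:
        # Of the predictions we make, what fraction is correct.
        return len(predicted & intended) / len(predicted) if predicted else 0.0

    def coverage(predicted: set, intended: set) -> float:
        # Of the accesses that will take place (intended policy), what fraction we predicted.
        return len(predicted & intended) / len(intended) if intended else 0.0

    predicted = {("Bob", "door2"), ("Carol", "door3")}   # hypothetical predictions
    intended  = {("Bob", "door2"), ("Dana", "door1")}    # hypothetical intended policy
    print(accuracy(predicted, intended), coverage(predicted, intended))   # 0.5 0.5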
>>: So since you're measuring these two metrics, you could have low coverage but it turns out
you've covered every case that eventually comes up. So there are lots of cases where, when you
ask people, well, would you allow access to this resource, they say sure. But the fact is it never
actually happens; it's a simpler rule in their head, and life's just better because it's never actually
going to happen. But that doesn't seem to be covered by this.
>> Lujo Bauer: Absolutely. Right. So really that goes into whether we're comparing against
intended policy or exercised policy, where exercised policy is the policy that we see at the end
of our data collection. So does that make sense?
>>: So but a survey doesn't get intended policy.
>> Lujo Bauer: Right. Absolutely. So the survey gets -- sorry? The survey gets intended policy.
The exercised policy would come --
>>: I thought the survey was getting who they would want to allow, not who they believed they
allowed? Was it -- should this person be allowed to access, or did you give this person the ability
to access?
>> Lujo Bauer: Right.
>>: The question being asked.
>> Lujo Bauer: Okay. Okay. So we distinguish between two things which are similar but not
exactly the same as what you're asking. So we distinguish between intended policy, which is the
policy people said they would create if asked, and exercised policy, which is the policy that was
observed by looking at the system over 16 months.
So the exercised policy is which accesses have actually happened in the system. And you're right --
>> Lujo Bauer: Have these people created policies?
>>: So all of the exercised policy, right, all of the accesses happened because somebody
created policy to allow them.
>>: I could have created a policy rule, granted you access to my office and maybe you never
used it.
>> Lujo Bauer: And that's the one that falls through the cracks in this particular version. We're
now doing stuff which takes that into account. I don't have results for that here. But the general
nature of the results are the same. If you're willing to trust me on it.
>>: What happened is [inaudible] is not in dispute. Whereas the intended policy, it's something
vague. But what would I do if --
>> Lujo Bauer: That's right.
>>: So you compare something well defined with something which is vague.
>> Lujo Bauer: Well, all right. So I would say it's not vague in the sense that we ask people
specific questions. But it is absolutely true that the way they answer these questions might be
different from what they might do in reality. So this is a source of imprecision, you're right.
Did you have another?
>>: I think it's linguistic confusion. I think what you're talking about is ideal policy. It's intended if
you said in retrospect did you mean to give this person access to that. But it's ideal if you're
saying would you want, right? Because there's my ideal policy and then there's what was I
intending when I created the policy.
>> Lujo Bauer: Yeah, so I think that what you're calling ideal I'm calling intended here, yes.
>>: I have a question about these configurations. Can what you learned from the log be -- how
can you use what you learned from the past to improve the future, when users are making a
choice and the system will make an error? I don't see the connection.
>> Lujo Bauer: Sure. I should have given a more concrete example. So here the idea is
suppose that Alice, Bob and Charlie all accessed door 1, right? And Alice and Bob both
accessed door 2. Now, so we have a situation where it seems like most people who access door
1 also access door 2. It could be the case that Charlie also will access door 2. So we ask
somebody: Will Charlie need to access door 2? It seems like other people who are accessing
the things Charlie accesses also access door 2; would you like to extend policy in order to make it
possible for Charlie to access door 2?
Does that answer your question?
>>: I guess I'll see the results.
>> Lujo Bauer: Okay. And so I guess to connect the example to what I was talking about. The
fact that Alice, Bob and Charlie accessed door 1 and Alice and Bob accessed door 2, that's what
we observe from the access logs. And so then we crank this rule mining crank and it tells us,
well, it seems likely that accessing A implies or accessing door 1 implies accessing door 2. And
then we apply that back to the matrix and notice, look, Charlie accessed door 1 but did not
access door 2. Maybe this is something that ought to be made possible by creating more policy.
>>: Okay. I'm confused a little bit. If Charlie never requests access, why is that a problem?
>> Lujo Bauer: Right. So it's a problem because we'd like to eliminate situations where Charlie
doesn't have access the moment he requests it. We'd like to make it the case that if Charlie is
going to request it, the policy will already be created before he requests it so he can access with
no delay.
So what we're trying to get rid of is delays like the one in the demo video where somebody, Dave,
had to wait in front of my office while I create a new policy. And while in this particular case the
delay was short because I was able to pay attention to my phone immediately, it could well be the
case that I'm asleep or on a plane or for some reason unable to pay attention to my phone,
in which case Dave wouldn't have been able to get into my office for a while.
Of course, with access to my office that should not be such a big deal. But you can imagine
needing to get to data that's critical, like, I don't know, patient records or a server room where a
server is about to overheat or something of that nature.
>>: But actually your example -- it reminds me of a counterexample, which is that doctors may
share a lot of common access rights. That's my guess. But one doctor shouldn't have access
to another doctor's patients' files. See what I'm saying? Even if they share a lot of common
access, that doesn't mean another one should get access rights --
>> Lujo Bauer: That's right. And so, well, let me go on a little bit and I think at least part of your
question will be addressed.
So, accuracy and coverage. Completely unsurprisingly, here min-conf, this confidence threshold,
is being varied on the X axis. On the Y axis is prediction accuracy, from 0 to 100 percent: as you
use rules that have more support in the data, the predictions are more accurate. Right. No
particular surprise there.
And also, if you think about it, not very surprisingly -- sorry. So here, to your point that some
rules aren't a good indicator of policy: there are situations where doctors do lots of similar things,
but then in some places they also do different things.
So what we do is we add a feedback mechanism. That is, as I said, as the result of a rule the
system interacts with the user to ask if the user wants to create more policy. Now, if the user listens to
the rule, we mark this rule good in some way, and I won't go into the details. If the user does not
listen to the rule, we mark the rule as a bad rule. And so in this way we penalize rules for not
being useful. We reward rules for being useful. And eventually we stop using rules that haven't
shown themselves to be useful or have shown themselves to frequently be incorrect.
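Here is a toy Python sketch of that feedback idea; the exact scoring scheme (simple accept/reject counters and a cutoff) is an assumption for illustration, since I'm not going into the details of what we actually use.

    class RuleFeedback:
        def __init__(self, cutoff: float = 0.3):
            self.accepted = {}   # rule -> times the user created the suggested policy
            self.rejected = {}   # rule -> times the user declined the suggestion
            self.cutoff = cutoff

        def record(self, rule, user_accepted: bool):
            bucket = self.accepted if user_accepted else self.rejected
            bucket[rule] = bucket.get(rule, 0) + 1

        def usable(self, rule) -> bool:
            a, r = self.accepted.get(rule, 0), self.rejected.get(rule, 0)
            if a + r == 0:
                return True            # no feedback yet: keep using the rule
            return a / (a + r) >= self.cutoff

    fb = RuleFeedback()
    fb.record(("A", "C"), user_accepted=False)
    fb.record(("A", "C"), user_accepted=False)
    print(fb.usable(("A", "C")))   # False: the rule has been ignored too often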
And in this way we can improve the quality of our predictions to some degree. The lower line is
the old one, without feedback, and the higher line is the new one, with feedback; the difference
here is about 30 percent better over this range.
Also not too surprisingly, the coverage, that is the number of accesses we're able to predict,
decreases with the confidence, as the confidence of the rules grows. This is because there's a
very small number of rules with very high confidence.
So even though they are very accurate rules, they only result in a very small number of
predictions. So coverage can't be very high.
>>: I'm just trying to understand this graph. You're saying if you have no confidence, you have
no evidence that a user's going to want to change policy, there's still a 50 percent chance they'll
say --
>> Lujo Bauer: Right. In particular due to the way that the system was -- yeah, so I don't know
about the zero point, but just to the right of the zero point we started off with such a small user
population and such a small resource population that randomly you could guess pretty accurately.
When you got to -- when you went through the first, I don't know, 50 accesses or so, then this
was no longer true at all. But at the very beginning it happened to be true.
>>: Are these adding -- are these rules for giving additional permission or also for removing
permission?
>> Lujo Bauer: These are just rules for giving additional permission.
>>: So in theory it could just be that 50 percent of the time, if you throw in a random name and
ask if you want to give someone access, they would say sure, why not.
>> Lujo Bauer: It could be the case. It's not, but it could be.
>>: Because people are going to feel like they -- to say no they have to have some justification
for why they don't trust that member of their lab.
>> Lujo Bauer: Yes. And actually to some degree I will address the question of what kind of
policy people created in the second part.
But what's important here is that there's some area of parameters, some space of parameters
where we can achieve decent -- for some definition of decent -- accuracy and decent coverage. That
is, we can make correct guesses most of the time while at the same time predicting most of the
policy.
So the next question is: Once we make a guess as to how policy should be changed, how can
we figure out who is the right person to change it. And so in a system with a single administrator
it's trivial because there's one administrator: just ask the administrator. In a system with many
administrators, this is a little bit more difficult. You don't want to ask all of them because, well,
you'll be asking a whole lot of people and probably at most one will be answering. So they'll get
annoyed.
So what we do is we log not only who accessed what but also who helped. And then we come up
with a bunch of strategies for contacting users based on how they have helped in the past.
So we have several strategies. One is we ask the users who helped when other users accessed
that resource; or we ask the users who helped this same user but when he was accessing other
resources; or the union of these; or the union of these plus anybody else who happened to have
accessed this resource in the past.
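A minimal Python sketch of those candidate strategies might look like the following; the log format and the ranking details are assumptions for illustration, not the system's actual implementation.

    def candidates(help_log, access_log, requester, resource, strategy):
        # help_log: (helper, requester, resource) triples; access_log: (user, resource) pairs.
        by_resource = [h for h, r, res in help_log if res == resource]
        by_user = [h for h, r, res in help_log if r == requester]
        past_accessors = [u for u, res in access_log if res == resource]
        if strategy == "same_resource":
            ranked = by_resource
        elif strategy == "same_user":
            ranked = by_user
        elif strategy == "union":
            ranked = by_resource + by_user
        else:  # "union_plus_accessors"
            ranked = by_resource + by_user + past_accessors
        seen, ordered = set(), []
        for h in ranked:                  # de-duplicate while keeping rank order
            if h not in seen and h != requester:
                seen.add(h)
                ordered.append(h)
        return ordered[:2]                # ask at most the first two candidates

    help_log = [("Lujo", "Dave", "office_door"), ("Lori", "Dave", "lab_door")]
    access_log = [("Scott", "office_door")]
    print(candidates(help_log, access_log, "Scott", "office_door", "union_plus_accessors"))
    # -> ['Lujo']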
And again we run this through a simulator in order to evaluate how well it works. The simulator
starts out as before, except now the logs also include information about how misconfigurations
were previously resolved. Notice that a lot of this information can also be extracted from the
proofs: if you actually look at the structure of the proof, you can sort of tell who could have come
up with that proof or who contributed credentials to that proof.
So the first step is the same as before: we identify misconfigurations. Then, from that and the
logs of who helped resolve misconfigurations, we construct a list of candidates who might be
asked to help change policy. And then we ask some of those candidates.
So here we have a couple of different metrics. One is the success rate; that is, the percentage of
correctly predicted misconfigurations that can be repaired -- that is, for which we can figure out
who is the right person to ask.
The second metric we have is the number of high latency accesses we save. As I said, the goal
is to lessen the number of these high latency accesses, where somebody is stuck in front of a
door waiting for policy to be created. And so we'll just count, for the data that we have over 16
months, how many of the high latency accesses that occurred would not have occurred if you
were applying these methods.
And finally, this is an interesting one. So we're changing the kind of user interaction that's going
on. Previously the only kind of user interaction between the user and the system was either
you're creating policy or you are accessing a door.
Now, there's a new kind, which is the system proactively asks you: Do you want to allow
somebody to do something. So we're wasting users' time. We want to see how much user time
we're wasting compared to how much time is saved by the person who could now enter
immediately instead of waiting.
So here, on the X axis you see different strategies. On the Y axis you see the success rate.
Let's see. The three different bars within each strategy indicate whether we asked just the first
person in this ordered list or the first and second person or everybody on the ordered list.
And what shows up here is that occasionally there's some difference between asking just the first
person and asking the first and second. Occasionally it helps to ask the second
person. But it actually very rarely helps to ask beyond the second person. If you just ask the first
and second person, it seems like that's the optimal point between getting somebody to answer
you but not bothering people too much.
>>: How large of a set --
>> Lujo Bauer: I'm sorry?
>>: What's the total set of potential people you can do this with?
>> Lujo Bauer: The potential set of admins, I guess, is the same as the number of users. So it's
either 25 or 29, I forget which it was.
>>: But like I can't set the policy on Lori's door.
>> Lujo Bauer: Actually, you can. Because if Lori gave you access, then somebody could ask
you whether you want to let that person in.
So this is the graph that shows how many total high latency accesses we saved. This is, again,
the confidence threshold of the rules that we used. Here's just the number of accesses that
occurred in the system.
So this red line is our baseline. This is how many high latency accesses actually occurred in the
data we observed. And for some particular reasonable selection of parameters we can do away
with 44 percent of these high latency accesses using the techniques I described.
Finally, with respect to the user interaction time, that is, how much total time everybody involved
spends interacting with the system. Well, first of all, this is really extremely specific to our
particular system because each system will be very different in this respect. To give you some
sense for how much time various events took, when a user was waiting to access a resource, the
time it took for access to be given ranged between, well, 25 seconds and 18 hours.
Clearly at the end of 18 hours this person wouldn't have been waiting anymore. But the median
was 98 seconds. And the average time a user spent repairing a misconfiguration, interacting with
his phone when the phone asked "allow X?", was 23 seconds. So these numbers are how the
following graph is produced.
And again here the X axis is the confidence threshold of the rule. So the Y axis is the total
amount of time all users spent interacting with the system during these 16 months in hours.
So again red is the baseline. Users in reality spent something more than 2.6 hours interacting
with the system. And for a set of points that are the result of how long people would have spent
interacting with the system using these prediction techniques, what you see is the total interaction
time would have been less than it was in the real system.
So the way we read this is, first of all, it's nice that we were able to do away with 44 percent of the
high latency accesses but what's more we were able to do away with many of those high latency
accesses without increasing the total amount of time people spent interacting with the system.
There's the danger that, yes, we saved some high latency accesses, but people end up spending
ten times more time overall interacting with the system.
So the summary of this part is that for some reasonable set of parameters we can substantially
reduce the rate of high latency accesses while also reducing the total interaction time or not
affecting the total interaction time much. And identifying a large percentage of the misconfigured
policy. So something that we've discovered since our initial results is that the approach actually
worked reasonably well even if you pull out a bunch of users.
So if you pull out 20 percent of users, the results are actually very similar. If you look at
performance over time, it's also reasonably stable. So it's not the case that -- well, there is some
initial period of 30 accesses or something like that during which there's no performance to speak
of, but after that the predictions are reasonably accurate from beginning to end.
Any questions on that before I hop to the next part?
>>: I've been thinking about this concept of intended or ideal policies. Have you looked at
asking people questions in two ways -- which of these people would you exclude from this door,
and, the other way, which of these people would you allow access to this door -- and seeing
what the baseline effect is there?
>> Lujo Bauer: I think that's an excellent question. We haven't done that. So I concede that the
way in which we collected what we called intended policy was not amazingly scientific. I'm sure
that there are many specific inaccuracies.
I don't have a feeling that it was wildly inaccurate in the sense that people would have always
given much more access or much less access. But I don't have a scientific argument which
would say this.
>>: The problem is scaling to more sophisticated policies. So, for example, when the system tells
me at least these people -- so I know who is responsible. Maybe not a friend, maybe an enemy;
now I know who is responsible for the resource, and that itself may be a secret. So, for example,
when they delegate something, the delegation may apply only to, say, the next person but not this
person.
>> Lujo Bauer: Right.
>>: It has to be transitive. And the policy may have kind of no conditions -- time, position, the
records [phonetic] -- and it just seems it works in this simple situation.
>> Lujo Bauer: Well, yes. So I agree that policies could be a lot more complex and that would
complicate matters, certainly. In terms of the dataset on which we're basing the predictions, it's
not policy. But it's accesses. So a lot of the policy complexity would get lost, would get erased
before then.
There's still the issue of privacy. And that we just plain haven't addressed yet. It's been on our
to-do list for a long time but we haven't gotten around to it. It definitely might be the case that I'm
willing to give Stuart access to my office but there are some other people I don't want to know
that. And so for right now the system doesn't implement any safeguards which would prevent
these other people from accidentally learning that.
Anything more for now? All right. This part will be quicker. So this was the part where we tried
to -- well, we tried to provide some justification for all the features that we included in the system.
And every system design begins with some intuition on somebody's part that here's some set of
features that's interesting, that people will find useful. But we actually wanted to see whether,
when we built a system and let people use it, overall they gained something from using the
system or not.
And there were two particular questions that we wanted to answer. One is: so we built a system;
we'd like to know exactly how well it matches the users' needs. The users want some stuff; we
believe maybe they want some other stuff; so we built a system to match what we think is cool.
Let's see exactly how it works for them.
The other part was somewhat to address a criticism I've often heard, if you allow delegation to
happen very easily, what you'll end up with is a world where everybody's delegated to everybody
else and, hence, everybody has access to everything and so you have no security whatsoever.
So we were curious how that would play out. My intuition was that it wouldn't, but it's a valid
scientific question.
So let's see. When I speak about a policy for the next several slides, I will talk about a policy
being a tuple of a principal, like Bob; a resource, like Alice's office; and a condition under which
this person can access that resource. So 'true' means that this person can always access that
resource.
We conducted a study here where we examined the access control policies created by eight
people who had resources for which they could control the access control policy. A lot of the
users in our system were users that didn't own any space, and therefore weren't really
necessarily empowered to give out access to it. We studied primarily the people who were able to
give out access to their space, and then all the users who actually made use of the policies that
these people created.
Here we have a slightly larger set of accesses than previously. Previously we had 11,000. Now
we have almost 20,000. And a key thing that we try to extract here is the ideal policy -- to
introduce more potentially confusing terminology -- that is, the policy that people would want to
enact if they weren't restricted by technology. This is something that was actually quite hard to
extract from users. In general it's data that's very hard to come up with, because when you ask a
user what accesses he would like to make possible, sometimes he just plain doesn't know. Other
times his thinking is guided by what he knows is possible because he's controlling
access using keys or proxy cards or whatever.
So it's really hard to get the correct answer. So we think we came up with a very good
approximation of the right answer. But the reason we were able to do this is because we
periodically interviewed users. We interviewed users in essence I think every two weeks or every
month over a long period of time. And so initially we asked them what were all the things that
they wanted to do, and we did some probing to figure out if their answers were somewhat
consistent.
But then we compared that to what we observed them doing in practice, and we always would go
back to what they originally said and we would try to again, we would try to make sure that the
answers they gave were consistent. Often they weren't so we would revise what their ideal policy
was.
And then we put this together in this humongous graph of what everybody's ideal policy was; this
is just a piece of it. It essentially says things like: Gary is always allowed access to Charlie's lab;
Frank is allowed access to Charlie's lab only if the access is logged; Joan is allowed to access
the lab only if the lab owner is notified; and Mary is never allowed access to the lab.
Now, what was interesting is this set of conditions that we identified as conditions under which
people were willing to give access. There was 'false', which is never give access, through 'true',
which is trivially give access. 'Logged' would mean that somebody would want to give access only
under the condition that they could have a record of the access after the fact.
And these were kind of interesting. This one is being willing to give access, but only if I'm notified
that the access took place. This one is that I would be consulted in real time; that is, I want to be
consulted at the time of the access about whether I want to give it or not. And these are
extensions of that: I want to be consulted at the time of access, but also somebody has to be
present to witness the access; or, I'll trust a third person to make a real-time decision and witness
the access.
So with these conditions you can immediately see that various systems might not be able to
implement them. If you're using keys, you can't implement logging or owner notification and so
on. If you're using Grey, well, you can imagine that logging is trivial. Something like real-time
approval also works, because I talked about that.
But we happened to forget owner notification, because we only discovered afterwards that it was
useful. So now that we have this list of conditions, we can look at all the rules that were
created using -- that exist in the ideal policy as well as in the policies that people enacted with
keys by giving keys to people or with Grey, by creating credentials in Grey. So now what we did
is we compared policies. Not accesses.
We actually compared the policies implemented in the various systems to each other and to the
intended policy. And we counted false accepts, false rejects and faithful implementations, where
a false accept is a situation where the condition implemented by the real system is in some way
not as strict as the ideal condition, whereas a false reject is a situation where the system in
practice implements a condition that is somehow stricter than the ideal condition the user
desired.
So the way we make a comparison is we count up these numbers and compare them.
Just a more concrete example of false accepts and false rejects. So here is a situation where the
ideal policy is for Alice to always have access to the lab, and the way it was implemented was to
give Alice a key. We took that to mean that the policy was faithfully implemented, as the
condition of having a key is approximately as powerful as the desired condition of having access
anytime.
If on the other hand the ideal policy is that the owner should be notified when the lab is accessed,
and again the policy is implemented by just giving somebody a key, well, then this is a false
accept, because the condition of having a key is not as strict as the condition of requiring owner
notification; it does not implement owner notification. But the key was given, and so this is a false
accept.
And finally, a false reject is a situation -- and we had some of these -- where the user really
wanted to grant access only if logging would take place. And this was so important to him that,
since he couldn't have logging, he denied access.
So we say that this is a false reject, because in reality it was always a rejection even though,
under some condition, the user wished to allow the access.
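A small Python sketch of this comparison: order the conditions by strictness and classify each implemented policy against the ideal one. The numeric strictness ranking here is an assumption chosen to match these examples, not the exact metric from our study.

    STRICTNESS = {"true": 0, "logged": 1, "owner_notified": 2, "owner_approves": 3, "false": 4}

    def classify(ideal: str, implemented: str) -> str:
        if STRICTNESS[implemented] < STRICTNESS[ideal]:
            return "false accept"     # implemented condition is laxer than desired
        if STRICTNESS[implemented] > STRICTNESS[ideal]:
            return "false reject"     # implemented condition is stricter than desired
        return "faithful"

    # Having a key is treated as the unconditional "true" condition.
    print(classify("true", "true"))             # faithful: Alice wanted access anytime, got a key
    print(classify("owner_notified", "true"))   # false accept: a key can't notify the owner
    print(classify("logged", "false"))          # false reject: no logging, so access was denied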
So here we have a graph by ideal conditions, the various ideal conditions are on the X axis, on
the Y axis is the number of policies in each condition. Now, this is for keys. And green indicates
faithful implementations and yellow and red false rejects and false accepts.
What are the interesting things here? Sort of what you would expect: keys don't do well with
logging or owner notification or owner approval. The errors are sort of split into false accepts and
false rejects, because if people wanted the condition but not so strongly, then they gave out a key
anyway; if they really wanted the condition, they didn't give out a key. In one case it would be a
false accept and in the other case a false reject.
This one is really interesting. You would think that when people didn't want somebody to have
access, that would be easy to implement with keys. But that turned out to be not the case. And
we looked into it a little bit more deeply, and I think in general it turns out with keys not to be the
case.
So what happened frequently is that people wanted to give access under specific circumstances.
Or they wanted to give access to a group of people; they wanted to make sure that a group of
people could sometimes get into a machine room. And since it was too complicated to create a
bunch of different keys and hand them out to everybody and then do revocation of keys when
people left, and so on and so forth, what happened was one key got created and it got hidden
somewhere on the floor where everybody could access it.
Everybody who might need to access it was told where that key was. And this happened a bunch
of times for a bunch of different resources and the result was that now this hiding place had keys
for a whole bunch of things, such that anybody who needed any of the keys in that hiding place
had access to all the other keys that maybe they didn't even know what they were for.
But certainly they had access to them. They could use them to get into other resources. So this
led to a large number of false accepts when people wanted to deny access.
And this is actually a very conservative number. We counted the false accepts in various ways
depending on how certain we were that somebody could get to the key. So this was the smallest
number of false accepts; when we were more liberal about interpreting false accepts, the number
was just ginormous. Now, comparing keys with Grey, there are a couple of interesting things.
This is the one I mentioned: owner notification is something you can imagine that with Grey we
could implement easily. We didn't think to implement it -- it wasn't there in this study -- so all of
those policies counted for us as false rejects.
Owner approval is something that Grey could implement easily with this reactive delegation
feature. It didn't work well with keys but works well in Grey. And perhaps most importantly, in all
the situations where users inadvertently had access with keys, it turns out that the policies that
were implemented in Grey were such that people did not inadvertently get access.
So that perhaps was the most interesting result here. So to summarize this part, I think one of
the interesting contributions is that we documented this collection of ideal policy data, which is
really pretty difficult to get. And we developed a metric for talking about how accurate a particular
set of policies is and for comparing different sets of policies implemented by different systems.
And finally, with respect to Grey, we were able to show that Grey was both less permissive than
keys and more accurate at implementing users' ideal policies; that is, it was more accurate at
implementing the policies that users wanted to have.
So, in fact, at least in this particular situation where you could delegate very easily, it was not the
case that everybody delegated to everybody else. In fact, overall fewer people had access to
things with Grey than they did using
keys.
My last slide. So Grey is this platform that I really like as a vehicle for experimenting both with
neat things we can do with mobile phones and in particular with a logic-based platform for
distributed access control. There's a particular set of things that we've studied so far. But I think
something to take away from this, too, is that the various things we've done in this context, they
don't apply necessarily only in this context.
So everything that I've talked about I've talked about in the context of mobile phones. But you
can really apply this technology in situations that have nothing to do with a mobile phone. You
can do access -- distributed access control in this way on the web or in IT systems or certainly
some other environment. All right. And that's all I had.
[Applause].
Any questions? All right. Thanks.