>> Jim Larus: It's my pleasure today to introduce Avik Chaudhuri, who graduated in
January from UC Santa Cruz with Martín Abadi as advisor and has been working in
the meantime as a post-doc with Jeff Foster at the University of Maryland. He'll be talking
about some work he's been doing on building secure systems.
>> Avik Chaudhuri: Thanks for the introduction, Jim. So let's start. So the systems I
want to talk about today are the systems that you rely on. These are the systems that
you are running code on. Traditionally these have been operating systems. More
recently the level of abstraction of these systems has been getting higher and
higher and more advanced. For example, you have application frameworks, you have
mobile device platforms — these are typically high level operating systems — and storage
systems, but they're still kind of the lowest level of abstraction that you care about. And
you run your code on top of these systems.
So the important thing is that these systems run code and of course we're interested in
knowing whether these systems are secure and correct and what kind of guarantees
these systems provide.
This is important because you're trusting your code to be run on these systems. So the
aim of my research is to develop foundations for these systems. This is really two tasks.
First of all, you want to analyze these systems, and here you want to do
specification and verification. And then you want to apply the insights you get from the
analysis of existing systems to constructing new systems, and here you want to do
design and implementation.
Now, the approach that I'll take in this talk, and the approach I usually take in my work, is
really to view this problem from a programming languages perspective. By exploiting
programming languages as a lens, what I mean is essentially that you focus and organize your
ideas around programming languages, and also bring principles and techniques from
programming languages to these problems of analysis and construction of systems.
So this talk is going to be more or less an overview of past work that I've been doing, in
particular in system analysis. I'll cover storage systems, operating systems,
mobile device platforms and web application frameworks. I'll mostly give you a very high
level view of projects I've been involved in, delve into some details somewhere, and for
other projects I'll just gloss over the details.
Please feel free to ask me about the details; I want to have conversations as well. If
you're not [inaudible], feel free to send me an e-mail and we can set up some time
afterward or something.
So the approach we take here is kind of a new one. We'll view these systems as
programming languages, and then we will apply program analysis to get system
guarantees. And the claim here is that similar ideas should be applicable to system
design as well.
Viewing systems as programming languages is not an entirely new idea. For
example, programming languages have been very useful for the rigorous design and analysis of
communication protocols, and there you have languages like the applied pi calculus that have been
very successful.
For example, you model cryptography in an algebraic way: you have equations that
describe your cryptographic operations, but the operations themselves are entirely abstract,
just applications of function symbols.
For example, this equation concerns a term encrypt(x, k), which says that
you're encrypting a message x with a key k. If you supply the key k and then apply the
function decrypt, you get back the message x. That's all you care about in this language.
In a lower level model of computation you would model these messages and
keys as bit strings and you would have a different model.
But then you'll be sure that there's some correspondence between these models.
Having a high level view like this allows you to have a nice tractable symbolic analysis.
So you have a programming language to describe protocols. Protocols are just distributed
programs that are run by the various principals that participate in them. In this programming
language you can create new names — atomic fresh names which can
represent keys — and you can build terms by applying function symbols.
And then you can communicate these terms over channels and compose processes in
parallel. So you get very terse and precise descriptions of protocols.
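To make this concrete, here is a minimal sketch of the symbolic view — written as Python of my own devising, not ProVerif or applied pi calculus syntax — where messages are abstract terms and the only fact known about cryptography is the equation decrypt(encrypt(x, k), k) = x:

```python
# Symbolic terms: keys are atomic fresh names, ciphertexts are opaque
# applications of the function symbol "encrypt".
from dataclasses import dataclass

@dataclass(frozen=True)
class Name:
    label: str        # an atomic fresh name, e.g. a key

@dataclass(frozen=True)
class Encrypt:
    payload: object   # the term encrypt(payload, key); payload stays opaque
    key: Name

def decrypt(term, key):
    # The single algebraic rule of the model: decryption with the right
    # key recovers the payload; anything else simply does not reduce.
    if isinstance(term, Encrypt) and term.key == key:
        return term.payload
    raise ValueError("term does not reduce")

k = Name("k")
assert decrypt(Encrypt("x", k), k) == "x"
```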
What can you do with them? Well, you can model the adversary here as an
unspecified, arbitrary context. When I say context here I'm using the programming languages
notion of a context, which is just a program with a hole. The hole is where you plug in your
protocol, and the adversary is an environment, completely unspecified and
arbitrary, that runs your protocol.
Now, the important thing here is that the language semantics defines the power of the
adversary. By define, what I mean is that whatever the language allows you to do —
whatever abstractions are in the language, whatever constructs there are in the
language — the adversary is free to use, since it's an arbitrary program in the language.
But it also limits the power of the adversary, in the sense that if your language does
not allow it to break some abstractions, then the adversary cannot do that. This is kind of a
nice separation of concerns, and if you can prove something in this model, then you can
get some guarantees. But obviously you would require more and more refinements as
you get to more concrete guarantees, and at some point you have to stop.
This is a sweet spot where you do get a lot of power. There are various techniques that
allow you to do some really useful analysis. So here you do specifications with
types and logical assertions. The types here are richer than the usual programming language types:
they also carry around security invariants. You can also use logical assertions — you
can embed a logical language in these programming languages.
There are associated proof techniques, special type systems [inaudible], and there are
successful tools in this domain. ProVerif is a tool we'll see more of later in the talk.
There's also F7, built by your colleagues at Cambridge. This is a refinement type system that works
on F#, so you can write reference implementations of your protocols in F#
and verify them. So the bottom line is that this field is pretty mature, and you can verify
protocols, and implementations as well, using these techniques.
The claim here is that this approach can be extended to other systems. Communication
systems are not the only systems we care about. Typically you have computation over
storage as well as communication, so you really want end-to-end guarantees.
And similar techniques can be extended there. So we'll see various kinds of systems
here where we apply more or less the same ideas. We look at storage systems, and
here we'll talk about, for example, a protocol for doing secure file sharing on untrusted
storage. We'll also talk about implementations of distributed access control, and about more
exotic models of proof-carrying authorization, where we provide some language support
for automated proof management.
So most of this work is — the first project is joint work with Bruno Blanchet. The
second line of work is joint work with my advisor Martín Abadi. And the third one is joint
work with Deepak Garg, who was one of my undergraduate classmates and is now finishing
up his Ph.D. at CMU.
These kinds of ideas are not only useful for storage systems but also operating
systems. Here I'll talk about a little bit of work that I did with folks at MSR India, in
Rajamani's group, where we looked at operating system security models, and in
particular we came up with a type system for enforcing security on Windows
Vista.
More recently I have also been looking at mobile device platforms, in particular Android,
which is Google's new platform for mobile phones. There we are trying to do
security verification of mobile device applications, and in particular we're trying to do
certified installation of these applications.
This is joint work with Jeff Foster and a student there. And we are also applying very
similar ideas to verify web application frameworks, in particular Ruby on Rails — this
is joint work with two other students and Jeff Foster — where the goal is really to
eliminate whole classes of attacks that are possible on web applications.
So let's start. Let's look at storage now. But even before we look at storage, you may
wonder: just now I said that we know a lot about how to do secure and correct
communication, and you may argue that, well, storage is after all communication.
Reading and writing a file is more or less like sending or receiving on a channel. Similarly,
if you imagine that you have untrusted disks — for example, you want to do storage
on an untrusted disk, say you're buying some disks from some company and you don't
trust that company —
this is more or less like doing communication over insecure networks. In particular
you would imagine that similar cryptographic techniques might be used.
This is true. So you would want to be able to leverage at least some of the
previous work on communication for doing secure and correct storage. But at the same
time we should be careful about carrying these analogies too far. For example, access
control is prominent in storage systems, and in storage systems you essentially have dynamic
access control — and because of that you should expect dynamic specifications.
Communication systems are more like short term contracts, where you can get away
with static specifications: you start a session, generate some keys, and after that
you throw away the keys and start another session.
So your analysis essentially focuses at the session level. But if you have storage, then you
have resources that you don't really throw away, and you have
longer term contracts. And because you have longer term contracts, your
assumptions about the environment can change. Typically your trust assumptions
change, and you need finer specifications, and you need techniques to actually verify
those kinds of specifications.
You'll see examples of all this in this talk. So let's look at storage on an untrusted disk,
because it's something that I just mentioned on the previous slide. Imagine that you
have a server, which is untrusted, that houses some disks, and there are various clients, some
of which are untrusted and some of which are trusted. In fact, the trust assumptions on the
clients will change over time, as you'll see.
Alice and Bob are two users who want to do secure file sharing on these untrusted
servers. So Bob writes a file that Alice must read.
So of course because the server is untrusted, you have to store files encrypted, that's
the minimum you have to do, right, because you don't trust the server to not leak your
data or not tamper with your data.
Because the server is untrusted, you have to manage all the keys on the client side. So
you can imagine Bob and Alice are essentially part of some group of writers and
readers, and these groups are essentially defined by the knowledge of some write key
and read key. The write key is a pair of an encryption key and a signing key; the read key is the
complementary pair of a verification key and a decryption key.
There may be other users in the system, some of which are part of these sets of
readers and writers and some of which are not. And now you also need someone to
actually manage these keys, because the storage server is not going to do access
control for you.
I mentioned access control, but really here access control is defined by knowledge of
keys, right? So the owner is, say, some set of principals who can create and distribute
these keys, and they define which access groups are currently in effect.
Now something interesting happens. Until now, it was more or less like
communication, but now say the owner suddenly stops trusting Alice — or maybe not so
drastic, maybe the owner just wants Alice to not be able to read the file for some time.
You can imagine various scenarios why this happens, and in storage systems you have
various scenarios like this.
In the real world, for example, you may have a PC committee discussion going on, and a
PC member has to leave the room when a paper is being discussed with which he has
some conflict of interest. Later he may be brought back in. So you have these dynamic
changes of trust assumptions.
Now, because access control is defined by knowledge of keys here, really the only
way the owner can implement this access control is to create and
distribute new keys to the new sets of readers and writers.
And this will go on. At a later point in time you may want to bring Alice back into your
set of readers, or you may want to change some other assumptions, and then the owner
again creates and distributes new keys.
If you think of implementing this in a file system, then of course you have to take care
of various things. In the previous slide we saw that the owner had to create and
distribute new keys every time access control changed, so potentially your
principals have to manage a lot of keys. Also, it's not immediately clear what you do with
files that are encrypted and signed with previous versions of the keys — what
happens when you actually issue new keys?
Plutus is one of the first file systems that looked at this kind of architecture. And
really the meat of the protocol is in its optimizations: how it handles dynamic access
control efficiently.
One of the questions it asks: are the files immediately secured with the new keys?
Whenever you change your set of keys, do you have to go and re-encrypt and re-sign
your files?
The interesting answer in Plutus is no — you actually secure the files with new
keys only on subsequent writes. This is a scheme called lazy revocation, and
it has been very popular in these file systems from then on.
It's largely seen as an optimization, but really, if you think about it, it actually makes better
sense for security, because you're attaching your security to data rather than to this
more intangible notion of access control. Right?
Your old data can already be read by the old readers, for example; you can do nothing
about that except going and erasing your data right now. But the next time you change access
control, the next time you do some write, you're using these new keys.
Now, there's a potential problem with this, right? Because you now have some data that
is stored on your file server, but it could have been written with old keys or new keys — you
don't know.
So the readers potentially have to manage a lot of keys, and they have to decide which
keys to use to read the data. Again, Plutus comes in and says: look, you can throw
away your old keys, just keep the newest version of your key, and we'll let you derive older
versions of keys as necessary. We put enough information in the file system to let you
decide which version of the keys to use, and you roll back your keys as needed.
This is something called key rotation, and it has attracted a lot of interest in the cryptographic
community, quite independently of this work.
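To illustrate the flavor of these two schemes, here is a toy sketch. It assumes a hash-chain construction, which is simpler than Plutus's actual RSA-based rotation, but it shows the same idea: readers keep only the newest key version and derive older versions on demand, and blocks are re-secured lazily, on the next write:

```python
# Toy key rotation via a backward hash chain (an assumed construction,
# not Plutus's): version n-1 is derived by hashing version n, so a
# reader holding the newest key can roll back to any older version,
# while only the owner can move forward to a new version.
import hashlib

def older_key(key: bytes) -> bytes:
    return hashlib.sha256(key).digest()

def derive(key: bytes, from_version: int, to_version: int) -> bytes:
    assert to_version <= from_version, "can only derive older versions"
    for _ in range(from_version - to_version):
        key = older_key(key)
    return key

# Lazy revocation: each block records the key version it was written
# under, and is re-encrypted with the current version only on the next
# write; until then readers roll their key back to the block's version.
newest = derive(b"owner-seed", 10, 10)   # reader holds version 10
block_version = 7                        # block last written under v7
block_key = derive(newest, 10, block_version)
```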
>>: Do you guarantee that the old blocks are still actually readable with the old keys —
can they actually be rewritten?
>> Avik Chaudhuri: Yes.
>>: What's the granularity? Because sometimes you have to write a whole block even
though you're only changing one item.
>> Avik Chaudhuri: I'm simplifying the protocol. All of this is done at the block level.
We'll look at some of these issues in a slide or two.
So key rotation is a very funky thing. The implementation is very tricky if
you are going to use existing cryptographic primitives like RSA — Plutus is implemented
over RSA — and you really need some cool number-theoretic properties to fit everything
together while at the same time not breaking all the guarantees.
So the question to ask is: does the protocol even implement these optimizations correctly?
And then you really want to know what security implications these
optimizations have. So some of the work that we did here was an automated security
analysis of this protocol, Plutus, with ProVerif. If you remember, I talked about
ProVerif a few slides back when I talked about secure communication.
You see that here we're using a tool that was originally built for analyzing communication
protocols, now to analyze a storage protocol. What ProVerif does is take applied pi calculus
programs — your protocols — plus some desired properties, translate the programs
down to clauses, translate the properties to queries, and use a clever algorithm to
answer these queries.
The results we got out of this work: we actually found that the protocol has weaker
secrecy than was claimed. In particular, writers can act for readers — in this model,
writers are as powerful as readers. Also, secrets may eventually leak: if you
have some secrets now and somebody goes corrupt later on, the guarantees are
very weak.
This is kind of disappointing, but if you think about it, it's really a consequence of
design choices in Plutus. The designers wanted some optimizations, and these weaknesses
kind of fall out from those optimizations, so you should really not view them as bugs. They are
kind of features.
But most seriously, we found an attack on integrity, and this is certainly not a feature — the
adversary can exploit, very cleverly, a very subtle bug in the protocol. Because you're
juggling key rotation and lazy revocation and still want to have some guarantees — and this
is not something new; even in communication protocols these kinds of bugs have appeared —
you think that you're doing some redundant stuff and you just take it out, and
you don't really realize that it breaks your protocol. Here you have to put extra
information in the file system, in the file headers, and you sign some stuff, and the designers
missed some terms when signing. The adversary can use this to
construct bogus keys and fool the protocol.
The fix is extremely simple, but the point is that these bugs have to be found somehow,
and here automation really helped.
>>: These are flaws in the specification rather than the implementation?
>> Avik Chaudhuri: Yeah — we analyzed the abstract specification of the protocol
as described in the paper. So these are kind of logical errors. These are more serious
than implementation errors, because no matter how good the implementation is,
you'll still get these bugs.
So here, using this, the adversary can somehow collude with readers to become a writer.
It completely breaks the protocol.
So we did use a tool that was originally meant for analyzing
communication protocols, right. So what's the new idea here? Well, as I said, you need
different kinds of specifications.
The key idea here was how we model corruption. We give stronger guarantees
because we admitted some corruption, and then we gave security guarantees despite
certain principals going corrupt at various points.
The method here is that we specified code for each role in the protocol. Owners were
always trusted, but the readers and writers could be trusted or corrupt. Trusted here
means you follow the protocol; corrupt means you leak keys. And this is version
dependent: you can be corrupt at one point and not corrupt at some other point in
time.
As usual, the adversary is unspecified and it controls any run of the protocol. So we
give a lot of power to the adversary. You have this code, but the adversary is free to
instantiate this code in any way. It can choose what your sets of readers and writers
are, and it can also choose which principals to corrupt and when.
And in spite of all this power, we now specify security despite corruption.
Essentially the guarantees that we provide are of this form: if you have
a security violation, then we give you precise lower bounds on corruption. We tell you
exactly which set of principals had to go corrupt, and when, in order for this security
violation to occur at a particular point in time.
This is a much more powerful guarantee than what is usually considered in
communication systems, where you don't really model corruption — I mean, you have kind of
a very black and white view of corruption: once you have corruption in your system, all
bets are off.
But here we have to accommodate partial failure of our systems. Now, you can imagine that
there are other models of storage where you still have distribution. For example, you
may consider that now your server is trusted, but you still have distribution, and this
may be for performance reasons.
Traditionally, every file system request passes through an access control
monitor and then goes to the disk, right? But in some architectures — in particular network-attached
storage, object storage devices and so on — you sometimes want to pull the access
control authority out of the critical path, so that every request does not have to pass
through access control before getting to your file.
In this model, the client first goes to the access control authority, says "I want to
access this file," and gets back a so-called capability. A capability is just an authorization
certificate that can't be forged and that can be verified somehow by the disk — you can
imagine the disk and the access control authority sharing some keys that allow this verification
to happen.
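A minimal sketch of this capability scheme, assuming a shared MAC key between the authority and the disk (the concrete key setup and message format here are my own, not any particular system's):

```python
# Capability = unforgeable certificate over (client, file, operation),
# verifiable by the disk without contacting the authority per request.
import hmac, hashlib

SHARED_KEY = b"authority-disk-key"   # provisioned out of band (assumed)

def issue_capability(client: str, file_id: str, op: str) -> bytes:
    # Authority side: MAC the request it is authorizing.
    msg = f"{client}|{file_id}|{op}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).digest()

def disk_check(client: str, file_id: str, op: str, cap: bytes) -> bool:
    # Disk side: recompute and compare in constant time.
    expected = issue_capability(client, file_id, op)
    return hmac.compare_digest(expected, cap)

cap = issue_capability("alice", "file42", "read")    # authority issues
assert disk_check("alice", "file42", "read", cap)    # disk verifies
```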
Now in this scenario, again, if you have changing trust assumptions, revocation is really
difficult. As you may imagine, if the access control authority has given out a capability
and it's out at large, and it doesn't trust that client anymore, it has to somehow ensure
that the capability becomes invalid. There are various implementations you can think of: you can
communicate some revocation list to the server and somehow synchronize with the disk, or
you can put timestamps in the capabilities and let them expire after some time. And then
there will be some lag between when you have specified that access has to change and
when it actually takes effect.
So again we might be concerned with how to specify and verify the correctness of these
implementations. And one way to look at this is really to compare with a much simpler
model of doing access control, where traditionally every file system request has to pass
through the access control authority, and then revocation and everything is immediate,
right? Comparing these two models, you can ask whether the distributed
implementation actually preserves the trace properties and equivalences of
the specification. Why would we want to do that? Well, if you show that indeed
there's this kind of preservation, then you can specify various safety
properties and security properties using trace properties and equivalences, but do
your analysis on the simpler system, and by preservation carry over these
guarantees to the distributed implementation for free — if you manage to prove these very
general theorems.
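In notation of my own (not the paper's), writing ⟦S⟧ for the distributed implementation of a specification S, φ for a trace property, and ≃ for observational equivalence, the preservation one hopes for says:

```latex
S \models \varphi \;\Longleftrightarrow\; \llbracket S \rrbracket \models \varphi
\qquad\text{and}\qquad
S_1 \simeq S_2 \;\Longleftrightarrow\; \llbracket S_1 \rrbracket \simeq \llbracket S_2 \rrbracket
```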
Unfortunately, there are various ways of breaking full abstraction in this implementation,
so not everything is preserved. But if you're willing to compromise a little bit, we can give
you very nice guarantees about how to do a correct implementation in this scenario. So
you can actually do these implementations and still preserve a whole lot of properties of
interest.
And while we were at it — really, we had to avoid a lot of pitfalls here, because there
is something fundamentally different about doing distribution — we got some
general design principles that can actually be used to derive correct distributed
implementations of other kinds of stateful computations. And the model here
looks surprisingly close to Dryad, where you have these computation graphs and you
want to cut them up and distribute them in various places. If
Dryad were actually concerned about secure communication of results and about
preserving these trace properties, then more or less the same design principles would have
to be used.
Now, going in the other direction, you can actually make your model more complex rather
than simplifying it — you can make it even more distributed. Proof-carrying
authorization is an architecture where you have distributed access control authorities,
and each access control authority says something about some local resources that it owns,
or whatever it's an expert on.
And then the burden is on the client to collect all of this information and somehow
construct a proof to convince the file system that it can access a file. Clearly,
because you have a lot of distribution here, the file system does not trust too many
people, so it requires authorization proofs — and that's a nightmare for users, as you can
imagine.
So the solution here was that we came up with a language called PCAL, which is an
extension of the Bash scripting language, and we have a compiler for this language.
This gives nice support for these kinds of architectures, because now the user can write
a script in this language, and the compiler automatically manages the proofs for you: it
instruments the script so that some of the proofs are
constructed statically by the compiler, and for the proofs that cannot be constructed
statically, the compiler automatically generates code that will construct the proofs for
you at runtime.
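Here is a cartoon of that compile-time/runtime split, with an invented API rather than PCAL's actual interface — obligations provable from compile-time facts are discharged statically, and the rest become runtime proof-construction steps:

```python
# Toy proof-managing compiler (assumed API, not PCAL's). Each command
# carries a proof obligation; obligations provable from facts known at
# compile time are discharged statically, the rest are deferred.

def compile_script(commands, static_facts):
    compiled = []
    for cmd, obligation in commands:
        if obligation in static_facts:
            compiled.append((cmd, None))        # proved at compile time
        else:
            compiled.append((cmd, obligation))  # prove at runtime
    return compiled

def run(compiled, runtime_facts):
    for cmd, obligation in compiled:
        if obligation is not None:
            # State may have changed since compile time (e.g. a file's
            # owner), so runtime proof construction can still fail.
            assert obligation in runtime_facts, f"cannot prove {obligation}"
        print("exec:", cmd)

prog = [("cat /secret/x", "may_read(/secret/x)")]
run(compile_script(prog, static_facts=set()), {"may_read(/secret/x)"})
```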
There are various tricky issues here. For example, proofs constructed at compile time may
become invalid at runtime, because state constraints may change: if your proof depends on
the owner of a file being somebody, and the owner of the file actually changes between
compilation and runtime, then you have to take care of these issues.
But the bottom line here is that this actually makes these kinds of very nice architectures —
which reduce the trusted computing base of your system — usable. You can add all of
this support and make these kinds of architectures practical.
So let me pause at this point and just summarize what I said about secure storage,
right? Really, the theme here was that we have dynamic access control, which
induces dynamic specifications and dynamic trust assumptions, and we have to take care
of that amid various complications in the system. The theme is that dynamic
access control complicates your life, and we need some special techniques to take care
of that.
Now, in the next part of the talk, dynamic access control is again going to play a role, but
in a different way. We'll look at operating systems, and there dynamic access control
will really be a way of resolving security versus functionality issues.
If you look at a typical operating system — for example, Windows Vista — the picture is
somewhat like this: you have various levels of trust, and there's a trust ordering.
The system is more trusted than administrators, who are more trusted than users, and then
the Web. And every process and location that is owned by these principals has a label attached
to it, which specifies the level of trust that it starts off with.
And then there are protection mechanisms. These labels define access
control in your system: a process with certain labels cannot do certain
operations on locations with certain labels. And when I talk about locations here, this is
kind of a generalization of all the kinds of files and objects that can be in your system.
The intention of these protection mechanisms is to restrict information flow across
levels of trust. But of course this often does not happen in practice, simply because it's
too restrictive to cut off all kinds of interaction between these levels of trust.
So in Windows Vista you have access control for integrity: every process and
location has an integrity label, and you have the policies no write-up and no execute-down.
No write-up means a less trusted process cannot write to a
more trusted location, and no execute-down conversely says that a more trusted process
cannot execute code written in less trusted locations. As you can imagine, if you do not have
no-execute-down, you can effectively circumvent no-write-up as well.
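As a minimal sketch — with trust levels encoded as integers of my own choosing, not Windows's actual representation — the two rules look like this:

```python
# The two Vista integrity rules, over an assumed integer encoding of
# the trust ordering Web < user < administrator < system.
WEB, USER, ADMIN, SYSTEM = 0, 1, 2, 3

def can_write(process_label: int, location_label: int) -> bool:
    # No write-up: a less trusted process cannot write to a more
    # trusted location.
    return process_label >= location_label

def can_execute(process_label: int, location_label: int) -> bool:
    # No execute-down: a more trusted process cannot execute code from
    # a less trusted location.
    return process_label <= location_label

assert not can_write(WEB, USER)      # Web data can't overwrite user files
assert not can_execute(ADMIN, WEB)   # admin can't run Web-written code
```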
So this is great for browser security, because you have a very nice consequence of
this policy, right? If you start off a browser with Web privileges, as you do in
Windows Vista, Web data by default cannot overwrite user locations, cannot be run
by administrative processes, and so on.
Ignoring the dialog boxes that come up in Windows Vista, this is the basic model. But this is
terrible for browser functionality, because these are exactly the things that
you may need to do in certain scenarios. The user may need to download data
from the Web — for example, e-mail attachments — and administrators may install code
from the Web. So what do you do here?
So the key is that you want dynamic control of labels, and this is something that
Windows Vista provides. It does allow administrators to install code from the Web
and users to save data from the Web, but then you have to do something funky. There
are two ways in Vista that an administrator can install code from the Web. One, it can
spawn a less trusted process and execute this code in that less trusted context. This is
fine, because with those low privileges the code cannot write back to high locations.
And this may be exactly what you want. But sometimes you do want that code to be
able to write high locations. So what do you do then? Maybe you trust the location that
contains that code, so you raise its label and execute it with your own privileges. Now
the code actually runs with high privileges, but then you have to verify that trust
somehow. For example, the code might be signed by Microsoft, and you trust
Microsoft, and therefore you run it with your privileges. Similarly, a user can save
data from the Web: Windows Vista actually allows a more trusted process to
read from a less trusted location, although by default a less trusted process cannot
write to a more trusted location.
So this is again kind of a strange design decision. It makes some intuitive sense, but really
at this point we can't pin down what Windows Vista is trying to do, right?
There are various implicit design intentions here, and some best practices. If you
look at all this, there's a pattern that emerges: in every one of these dangerous
operations, you actually have to have a more trusted process involved.
Okay. So a more trusted process has to participate somehow in these dangerous
operations. And this may be kind of the design goal of Windows Vista: it doesn't allow
attacks to happen under the hood, but it forces a trusted process to decide when
to actually compromise certain guarantees.
Okay. And similar ideas also appear in other operating systems — I'm not going to talk
about other operating systems here, but Asbestos is a research operating system that's built
from the ground up with these principles. Again, to resolve conflicts between security
and functionality, it has to have these kinds of ideas.
Now, as a system designer, if you are given the task of designing a system with all
these constraints, you may wonder what guarantees the system can provide.
And typically you would want to do these two tasks in a loop. You model the behaviors
that can be allowed by the system,
and then you want to restrict those behaviors — you analyze them for some nice properties. If
you don't get what you want, you add a rule here, change a rule there, delete a rule
there, and keep on doing this until you get to your correct design. Right?
Now, if you want to use something like ProVerif here, undecidable query evaluation can be a
problem: you're trying to analyze your system and suddenly the tool stops — does not
terminate, right? So you can't really do experimentation using this approach.
So our solution was that we came up with a small dynamic logic programming language
called Eon, which is expressive enough, and the nice thing is that it has decidable query
evaluation. So it allows you to do what you wanted.
Eon is an extension of Datalog with some dynamic constructs. Datalog, as you may
recall, is a small subset of Prolog. Prolog is essentially what Horn clauses are all
about. Prolog is undecidable, but Datalog query evaluation is quite nice: it's
polynomial time.
So Datalog is a very well behaved subset of Prolog that has been studied extensively
in the database community. We add to Datalog two new operators, called
new and next, which essentially allow you to introduce and transform relations in
your system —
so basically to model some dynamic stuff. And then we add some syntactic restrictions.
For example, the predicates that you can introduce and transform have to be
unary; if they're binary, then very soon you can encode some undecidable problems.
We also have some further restrictions.
New allows you to create constants and initialize them with predicates, and next allows
you to update the predicates of such constants. For example, with new you can create new
processes and objects and initialize their labels; with next, you can update the labels of these
constants.
And the Datalog fragment manages all your constraints: you can enforce
constraints on these transitions, on access control, and so on, in the Datalog fragment.
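To give a feel for these operators, here is a small executable analogy in Python (the real Eon is a Datalog extension, so this only illustrates the spirit of new and next, not Eon's syntax or semantics):

```python
# State is a set of unary facts over constants: "new" introduces a
# fresh constant holding an initial predicate, "next" transforms one
# predicate into another for a given constant.
state = set()                 # facts such as ("WebLabel", "c0")
counter = iter(range(10**6))  # source of fresh constant names

def new(pred: str) -> str:
    c = f"c{next(counter)}"
    state.add((pred, c))
    return c

def next_(old_pred: str, new_pred: str, c: str) -> None:
    # A guarded transition: it fires only if the old fact holds — the
    # kind of constraint the Datalog fragment would impose.
    if (old_pred, c) in state:
        state.remove((old_pred, c))
        state.add((new_pred, c))

p = new("WebLabel")                   # a process created at Web level
next_("WebLabel", "AdminLabel", p)    # a label change, subject to the guard
assert ("AdminLabel", p) in state
```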
Query evaluation is decidable. The way we show this is by reducing it to a decidable problem
over Datalog. When I said that Datalog query evaluation is polynomial time, that is a separate
result; here we are reducing to a different problem, where you're not given the initial
database and you're trying to find out whether a database exists that will answer your query.
This is in general not decidable, but there are fragments that are decidable, and we
reduce to a decidable fragment.
And here the idea of this translation is that you model new with an
existential quantifier and next with a state transformer. The proof is kind of difficult, but once
you have it, it's very nice. Because now you have this language, which is a programming
language — so again, imagine doing experiments with this language. In particular, you
can now model behaviors and analyze restrictions, and if you're not happy, you can change
your model around, just like you would in programming.
We got various results for various operating systems; I'll focus on Windows Vista
here. For Windows Vista we could actually pinpoint the design intention I mentioned a
few slides back.
What we could show is that attacks can always be blamed on trusted processes:
essentially, something wrong cannot happen by default unless there's a trusted process
that actually enables that attack.
And that's an interesting design principle, at least I think so. Moreover, we also
arrived at a sound monitoring technique for trusted processes that would eliminate
the attacks of interest.
When I say sound monitoring technique, I'm really talking at a very high level,
because the language itself is limited: it allows you to reason at the process level, so
the granularity of this monitoring technique is really the process level.
But in real life you would want a much more precise control of what you monitor, if
you looked at the code of those processes. Eon is not good for that, but at least it
could make all the implicit design intentions and best practices that we had
explicit.
Now we understand what the secure design actually intends to do. So going further,
let's look at some code analysis for security. We have learned from the
previous slides that Windows Vista actually guarantees that attacks can be eliminated if
you're willing to restrict trusted code.
The idea here is that we showed how to restrict this trusted code via a security type
system. I'll be honest here: we're not looking at the actual code that runs on the system;
we're looking at a very abstract model. We're trying to formalize exactly what kind of
restrictions you need, and maybe at a future date we will be able to complete the picture
from concrete code to type checking, but we haven't done that yet.
Presumably, if your code resides in a high level language, as Singularity's does, then
this would be much easier. But looking at abstract code, we're trying to come up with a
nice way of formalizing how to restrict trusted code to get back nice guarantees.
Here, types encode security invariants, and the soundness of the type system is basically
type preservation: if you have type preservation, then you have your security property of
interest.
The idea here is that we combined access control and security types. Again, we had a
programming language similar to the applied pi calculus, but now the constructs of the
language are different: they try to model the kinds of calls you can make to the
Windows API — some of the calls that are relevant for security.
For example, you can create processes and locations that are protected by labels, and you can
update the labels to reflect access control changes. Another interesting construct of the language
is that it can pass code around as data. This kind of models compilation, because you can
pass code in this language as data and then write it to a location, and that location
becomes an executable that you can execute.
So you have read, write and execute on the contents of locations, and access control is
encoded in the dynamic semantics. In particular, when you're updating labels or you are
reading, writing or executing these locations, you're subject to access control.
And recall that a while back I said that the language semantics defines the power of the
adversary. So again the adversary is free to do whatever it can within this language, but
again it's also restricted by the rules of the game: it's subject to access control like
everybody else.
And on top of that we add security types that enforce the required static semantics.
This approach is very similar to hybrid type checking in programming languages, where
you combine static and dynamic checks as and when required, and this gives you lots of
gains.
In this system, at least, soundness relies on this combination. But we also get precision,
and in some cases optimization: we can eliminate some runtime checks that
are not necessary. For example, there's an execution check in Vista that turns out
not to be necessary.
So how do we combine access control and security types? This is a very simplified view of the
type system — there's more machinery to handle all the higher order things that are in the
language — but essentially you can say that you have a security type, which is a data
type annotated with a security level. You have a data type T which is trusted
statically at level L, and that is the secure type that you give to such data.
And then you have a type like loc(T, L), which you give to locations that contain
data of that secure type. So a loc(T, L) contains data of secure type (T, L). If you
take such a location, then what we enforce with the type system is that the dynamic label
of the location is always greater than or equal to L.
The dynamic label is the label at runtime that actually protects that location. So by
access control we immediately get that levels less than L cannot write to that location, and
also cannot change the dynamic label of the location. So basically, levels less than L
cannot decide what the contents of this location are.
And conversely, by type checking, we are left with the levels greater than or equal to L:
we know levels less than L cannot directly harm the system, but the more trusted
processes at levels greater than or equal to L have to be type checked, and type checking
ensures that such levels always write data that flows from high levels to the location.
And it also maintains the crucial invariant here: it keeps the dynamic label of the location
greater than or equal to L. Below is a small sketch of how these static and dynamic
checks divide the work.
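This sketch uses my own encoding with integer levels, not the paper's formal system; the static check is simulated here at runtime just to show where it applies:

```python
# For a location of type loc(T, L), the maintained invariant is
# dynamic_label(location) >= L.
WEB, USER, ADMIN = 0, 1, 2

class Loc:
    def __init__(self, static_level: int, dynamic_label: int):
        assert dynamic_label >= static_level  # the typing invariant
        self.L = static_level
        self.label = dynamic_label
        self.contents = None

def write(proc_level: int, loc: Loc, data, data_level: int) -> None:
    # Dynamic side (access control): no write-up.
    if proc_level < loc.label:
        raise PermissionError("no write-up")
    # Static side (type checking, simulated): processes at or above L
    # may only store data trusted at level >= L.
    assert data_level >= loc.L, "rejected by the type system"
    loc.contents = data

cmd = Loc(static_level=ADMIN, dynamic_label=ADMIN)
write(ADMIN, cmd, "trusted code", ADMIN)   # fine
# write(WEB, cmd, "virus", WEB)            # stopped dynamically: no write-up
```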
So, quickly going through some examples: if you can read the code on the left-hand side,
it's not very difficult to follow what's going on. If you can't, then just follow along on the
right-hand side.
You initialize, say, command.exe, which contains your trusted code for command.exe,
and you initialize a location that represents a URL. Then i.exe, which is your browser,
contains highly simplified code: it reads the URL and executes the contents.
Say now your user fires up an instance of i.exe with Web privileges, as you do in
Windows Vista. Now, on the Web there's an attacker sitting who writes a virus:
virus.exe contains code to overwrite command.exe. Then somehow the URL gets
pointed at virus.exe — maybe when you are browsing you suddenly encounter this virus.
But this is actually safe code. So the code type checks, and we know that the type system is
sound, so we can definitely say that nothing wrong is going on. Why? Because, well,
because the user fired up i.exe with low privileges.
Eventually that virus code, which overwrites command.exe, is executed with
Web privileges, but access control guarantees that with these low privileges you cannot
write to command.exe, which is protected by a higher level. This is fine. It happens all the
time, it's perfectly safe code, and the type system recognizes this code as safe.
Now conversely, let's look at another example. Say now you initialize the user's
home directory, right? And you have the same URL. There's another location for
setup.exe. What i.exe does now is read the URL and
copy the contents to setup.exe — kind of the usual behavior for a browser, right?
Now the user again fires up an instance of i.exe at Web level, and the Web similarly writes a virus
that now tries to overwrite the user's home directory. Now
something dangerous happens: the administrator comes along, trusts setup.exe,
raises the level of setup.exe, and executes it with admin privileges.
This is typically called privilege escalation. Here something actually bad happens, and
the code does not type check. Why? Because, you know, you fired up i.exe with low
privileges, and what it did was copy this virus.exe to setup.exe, and now the
admin takes that code and executes it with its own privileges. With those high privileges
it can actually write to the user's home directory.
So the attacker has been able to execute his attack, and the code does not type check:
the type system correctly rejects it. So if you actually use this type system, you
can imagine that you have concrete code in your system and you abstract it into our
small model programming language.
If you apply this type system, then you'll actually get these guarantees. Let's move on
very quickly to some of the work I've been doing currently. Here I'm looking at
Android, which is Google's new operating system for mobile devices.
Android is essentially an operating system plus some core apps, and more interestingly
it ships with an SDK that allows you to develop new apps. The SDK is basically
Java APIs. The other interesting thing is that new apps and core apps are really on par:
every application that you write and run on Android will use these APIs, and it has to
conform to a restricted application model and so on.
Another interesting design decision here is that apps can share components. Every
app is defined by a set of components, and then apps can share them: one app can
call a component of a different app. And the sharing is controlled statically. What I
mean by that is that at install time you have to specify exactly who can call what, what
permissions you need, and so on.
As we'll see in a few slides, this actually gives us a lot — it's kind of an
ideal setting for doing static analysis. So the picture here is that Bob is a developer who
writes an application and sends it to an app store. Alice has applications installed on her
phone and now wants to install this new app, and she has to decide whether Bob's app is
safe or not.
Conversely, Bob also wants to convince Alice that his app is safe. So if somehow
this thing actually takes off, then every user potentially becomes security conscious, and no
one will buy Bob's app if he cannot convince those users that his app is safe. The idea here
is that we want to do certified installation. This is borrowed from proof-carrying code. The high
level idea is that somehow Bob will construct a proof that the app is safe, and Alice will verify the
proof before installing the app. Here the notion of proof is relaxed a little bit: we're
not going for foundational proofs here. The verification can be done elsewhere,
and the proofs may be implemented using certificates, for example.
For example, you can imagine that the app store is doing the verification for you, and all
the phone has to do is just trust the app store to have done the verification.
Also, if you're doing code analysis and you can prove somewhere else that the code analysis
is sound — somewhat like certified compilation — then maybe
the proof is left entirely implicit. You're just trusting the analysis. We want to do
something like this.
What is required here is really these two or three ingredients. First, we want to know how
these apps run on your system, so we want to give an operational semantics for
apps running on Android. And the reason you want to do this — well, you could just use the
Java semantics, right, because everything can ultimately be translated to Java.
That is fine, but for every application you really want to look at just the application code.
You don't want to go into the implementation of all these APIs, because then, to
prove every application sound, you have to actually verify that the whole system is sound,
and that is not a very tractable approach. So you want to separate out the verification of the
APIs for efficiency, have some abstract view of these APIs, give a formal semantics
that is independent of the implementation, and then use that semantics to get the
guarantees.
And on top of that, of course, we need some kind of static analysis of Android apps for
safety, and here we can formalize this as a security type system, as we did with Windows
Vista. The soundness of the analysis will provide the proofs.
So this is what Android apps look like. Essentially they're a set of components. You
have inherited classes and overridden methods: the APIs essentially provide some base
classes you can extend, and the methods are essentially lifecycle methods for the various
kinds of components.
More importantly, perhaps in the spirit of Singularity, there is a manifest: every
application has to have a manifest, and the installation process can inspect
the manifest.
The manifest declares all these components and, more crucially, declares whatever
permissions the app needs in order to execute. It also specifies the access controls on
other apps: which of its components can be called by which other apps — actually,
not which apps, but what permissions are needed to call these components, and so on.
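Schematically, and with invented field names rather than the actual Android manifest schema, the install-time information looks something like this:

```python
# A schematic manifest and install-time check (field names assumed for
# illustration): the manifest declares components, the permissions the
# app requests, and the permission a caller needs for each component.
manifest = {
    "app": "com.example.bob",
    "uses_permissions": ["READ_CONTACTS"],
    "components": [
        {"name": "ShareActivity", "required_caller_permission": "SHARE"},
    ],
}

def may_call(caller_permissions: set, component: dict) -> bool:
    # Static access control on cross-app calls: the caller must hold
    # the permission the component's manifest demands.
    return component["required_caller_permission"] in caller_permissions

# All of this is known before the app ever runs.
requested = set(manifest["uses_permissions"])
assert may_call({"SHARE"}, manifest["components"][0])
```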
So all of this is statically declared, and you can use all this information for doing static
analysis. On the dynamic side, things become much more complicated.
You generally have a stack of windows — a user can click from one screen to another and
go back — and various callbacks that are invoked in the lifecycle of a particular component,
and so on. You can also register various listeners, which listen for events in the
system and respond to them.
You have data in a database. And because apps can share components, potentially the
code running in parts of the system may belong to other apps, and it runs with the
permissions of those apps.
So this is what goes on on the dynamic side, and of course all of it is subject to
access control. Now, we're doing some static analysis here. The picture is that we have
some concrete code in Java, we translate it into abstract code, and we do some
type checking. Here we can actually complete the picture — or we're trying to complete the
picture — because Java is a higher level language than what operating systems, or
applications on operating systems, are usually written in.
So here we are extracting security specifications from the manifests of the installed apps,
and type checking guarantees that the access controls actually enforce security. An
example of this: suppose reading my contacts list requires permission P. The type
system will actually ensure that only apps installed with that permission can
eventually know those contacts. Access control alone won't give you this, because if
I have permission to read your contacts list, I can read your contacts and share them with
somebody who does not have access.
So we are trying to enforce these end-to-end security guarantees by building a certified
installation process. It's a work in progress called Roid [phonetic].
>>: That means you really have to have full information flow analysis?
>> Avik Chaudhuri: Yes — that is true. But note here that we are not dealing
with arbitrary Java programs, which is really the reason why we believe this will work.
The applications that you write with these really high level APIs are very small. You can
understand them very well, and you can do this kind of flow analysis very well on these
apps.
The application model is very restricted.
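As an illustration of the end-to-end property — this is a toy taint-tracking sketch of my own, not the actual analysis — data read under a permission may flow only to apps that also hold that permission:

```python
# Data read under permission P carries that permission as a taint; a
# flow to a recipient is allowed only if the recipient holds P too.
from dataclasses import dataclass

@dataclass
class Tainted:
    value: object
    required_permission: str      # permission guarding this data

def read_contacts(app_permissions: set) -> Tainted:
    assert "READ_CONTACTS" in app_permissions
    return Tainted("<contact list>", "READ_CONTACTS")

def send_to(recipient_permissions: set, data: Tainted) -> None:
    # The end-to-end check that access control alone cannot make: the
    # *recipient* must also hold the guarding permission.
    if data.required_permission not in recipient_permissions:
        raise PermissionError("flow would leak protected data")

contacts = read_contacts({"READ_CONTACTS"})
send_to({"READ_CONTACTS"}, contacts)     # allowed
# send_to({"INTERNET"}, contacts)        # rejected: would leak
```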
>>: These apps cannot manipulate strings?
>> Avik Chaudhuri: They can manipulate strings. But mostly you would know exactly what they're trying
to do, because there are various conventions and various APIs that it's better to use. If
you cannot verify an app, you just give up. But most applications we've ever
looked at really follow a very nice structure, which allows you to do this kind of analysis.
And very similar ideas are being applied to web application frameworks. So again, Ruby
on Rails is one such high level framework for writing web applications. It's built over Ruby,
which is a dynamically typed, object-oriented scripting language, and Rails essentially provides
very high level Ruby APIs for automating the development of web apps.
So really you just have to write very small bits of code and you can get your web app
running. Sometimes you don't even have to write those small bits of code — you can just
write a line that scaffolds all those things for you.
What makes this work is, again, a very restrictive application model, right? You
have the model-view-controller architecture, where you
essentially specify models, views and controllers: models are automatically connected to
database tables, views are described by code embedded in markup — you can
write Ruby code inside your markup — and
controllers route requests to responses. So you have that kind of picture. And the slogan
here is convention over configuration: if you follow convention, then most of the code
will be automatically generated for you.
If you break convention, then you have to do more configuration — you have to write
more code — and typically people don't do that. They follow convention.
So what about static analysis in this framework? If you look at these Rails APIs,
because they're doing a lot of automation — they're generating a lot of code for you — they
make extensive use of metaprogramming and all the dynamic features in Ruby, and that
makes the analysis of all these features very difficult.
So our idea is that we built a tool called DRails that can translate these Rails programs, or
applications, into explicit Ruby code — potentially larger Ruby programs,
but ones that are easier to analyze, because they don't use too many metaprogramming
constructs. So essentially we're instantiating the Rails code for every application
that you give us.
Okay. Again, we have a similar picture here. We have concrete code in Ruby on Rails,
and we have this translated code, and now we do type checking using an in-house tool,
which we had from before I came there, called DRuby. DRuby is a static type system for
doing actual type checking on this dynamic language, Ruby.
Now, this is not a security type system. All the type checking guarantees is that
all method calls succeed. But the cool thing is that almost everything in Ruby is a
method call, so using this type system you can already catch a lot of bugs. And
we actually had good results: we found a lot of bugs that crash existing apps, and these
bugs were confirmed by the developers.
So this idea works. But in the end, simple type checking cannot catch other logical errors —
and security attacks, for example, right? Conversely, building these advanced
security type systems is difficult for Ruby, simply because it's a dynamically typed
language, and too many dynamic features can compromise the soundness of such a
type system.
So here the approach that we're taking is to do symbolic verification of these Ruby
scripts. This is just a generalization of testing, in the sense that we are symbolically
executing these Ruby scripts. But we are actually trying to do more: we are not doing
testing, we're trying to do verification using symbolic execution.
The key idea here is that we are trying to have executable specifications. This is kind of
a neat idea, but again, this is work in progress. So let me summarize now.
So I believe that in the systems of the future we'll see more and more influence of
programming languages. We're already seeing a lot of influence, and this influence is only
going to grow.
In designing a system, I think the principle that we can borrow from
programming languages is that we can start our system with core expressive
abstractions — abstractions expressive enough that you can build further
abstractions on top of them. Having a core set of abstractions really makes life
easy, because then you can analyze your language and provide strong guarantees of
security and correctness.
And of course the implementations of these abstractions have to be taken care of, and
they should be viewed as refinements: they should preserve the properties that are
promised by the abstractions.
If we can build our systems in this way, probably we can get really robust systems. And
if we can do that, then there are lots of techniques that may help us: we can use rich
types and invariants for specification and verification, and then techniques for combining
static and dynamic mechanisms to guarantee whatever we want.
So I want to leave you with two slogans. One is that we should revisit academic
programming languages. If we have to build the systems of the future, we essentially have
to take care of various distribution issues, failure issues and concurrency, and academic
programming languages are very good at that. Typically you'll see languages and ideas that
have already been explored, and then slowly, ten years down the line, some of the ideas
actually get transferred.
But because we are at a point where we can actually think of making new systems with
totally different requirements, we can possibly revisit these academic programming
languages and try to transfer some of these ideas now.
And the point here is that if you have strong foundations, then already a lot of artificial complexity
is reduced, and moreover it can support new inventions: if you don't have to be
distracted by arbitrary complexity in other domains, you can actually try something new.
The final thing is that we should really view system design as language design. This is
again kind of a [inaudible] idea: if you're trying to design a system with very rigid
goals, then you have designed only that one system, and tomorrow, if your requirements
change a little bit, you have to rethink your design all over again. Whereas in language
design you usually get to be more creative: you essentially specify the minimum
elements that are common to all the designs you want to look at, and then you build
some nice theory and principles which you can use to actually build whole
classes of systems. So you can express systems as whole classes of programs in this
language, so to speak. And the generalizations can provide insight, and they can also
set trends. All this is very optimistic, but I believe that this is kind of the approach that
we should take, and I believe that there's a lot of opportunity to make lasting
contributions. So with that I'll conclude the talk.
>> Jim Larus: Questions? Okay. Let's thank the speaker then.