>> Helen Wang: Good morning, everyone. It's my great pleasure to welcome Dr. Roxana Geambasu, who is a professor at Columbia University. She's going to tell us about her recent work in big data management -- responsible big data management.
>> Roxana Geambasu: Thank you very much, Helen. Thank you very much, everyone, for coming. I'm very pleased to be here, back after about two and a half years now, and I'm very pleased to tell you what I've been doing for the past year and a half. This is a new topic that I'm starting on: new abstractions for what I call Responsible Big Data Management.
We live in a world of big data. Really big data, right? For example, it's estimated that just two days of current data production is equivalent to what the world had produced since its very beginning. That's gigantic, right? And this includes browsing habits, social media, GPS and [indiscernible] data from smartphones, videos from smart glasses or surveillance cameras, and so on.
So what's been driving this data surge? Well, it's a set of new technologies -- mobile and wearable computing, cloud computing, and huge-capacity disks -- which enable the acquisition, processing, and storage of unprecedented amounts of data. And this technology has essentially transformed our world from an old world of isolated desktops, which gathered and used only sporadic data from one user, into a world in which the mobile devices in our pockets, on our wrists, on our noses, and so on are pouring enormous amounts of information into giant-scale clouds, where it gets stored and processed together with millions or billions of other users' data.
Okay. And that's really great, because big data has enormous potential, as I'm sure all of you know -- potential for all walks of society: business, science, government, and so on. For example, it can be used to improve business revenue through effective product placement and targeted advertisements. And it's also been touted as the next big driver of scientific breakthroughs. Now, this immense potential of data has led to what I call a true data rush, in which everybody's eager and extremely excited to acquire data and leverage it for something new. Everybody's asking things like: what data can we gather, what can we use it for, and so on. And that's really great, because these questions foster innovation and future great uses of the data, but they also raise serious concerns.
Specifically, because of that excitement and frenzy, a lot of people are now taking dangerously permissive behaviors with respect to the data. For example, we're seeing in a lot of places aggressive accumulation, in which every click stream, search, and purchase is being monitored and analyzed and archived within giant-scale clouds as well as on mobile devices. And it's archived potentially forever, because disks are huge, so there's no need to ever delete anything.
Data acquisition is also oftentimes ad hoc and obscure. We have some applications gathering data outside their scopes. For example, the Facebook Like button, as you probably already know, tracks your visits across all websites that include the button, and not just on Facebook, right? And finally, all of these uses and acquisitions of the data happen without the user's knowledge or control. The user has no idea where his data is being accumulated -- on which device, on which cloud service -- what it is being used for, and whether these uses are good for him or bad for him, right?
For example, take my Facebook likes: using them to recommend fun movies is a good thing, but using them to drive my health insurance prices is not a good thing, right? So for most types of data today -- aside from, perhaps, health data, banking data, and a few other types of data -- it's pretty much a lawless land, this big data world. A bit like the Wild West, with hardly any principles to govern it.
And that's very dangerous, particularly in the context of today's increasingly aggressive attacks. For example, mobile devices, where a lot of this data -- most of this data -- originates and is cached, are extremely prone to [indiscernible] loss. Similarly, the clouds which accumulate and archive much of this data are magnets for increasingly sophisticated attacks: hackers, subpoenas, foreign spies, insiders, and so on.
Now, of course, application writers and cloud providers and the like already deploy traditional protection tools, some of which are particularly powerful -- especially the newly developed encrypted databases that permit computation on top of the data while keeping it encrypted. However, despite these great advances in protection technology, protection systems are not perfect either, right? Hackers do find their way around firewalls and intrusion detection systems, as these snippets show. And to date, there's no such thing as an encrypted database that supports fully fledged, arbitrary data analytics.
So what I believe is that instead of letting data accumulate forever, [indiscernible], and doing so in very obscure ways, and then trying to protect that data in all sorts of ways, it's now time for us to be setting some ground rules for this big data game. What should society's rules be for the collection and use of the data? How do we weigh the trade-offs between privacy, security, and functionality? Those are the kinds of questions that a lot of people have been asking lately from a principled perspective, and that I ponder through my research. Now, to be very clear, I cannot answer any of these questions yet. But I do believe that there is significant room for a stricter and more responsible approach to big data management, and thus far, I've identified two directions for improvement.
First, I believe that data accumulation should be much more restrained and principled than it is today. Programmers should reason about the data that they accumulate: whether it is all needed, and whether it can be trimmed for security. Sometimes the answer will be no. But in other cases it may be yes, and the first concrete example I'll show in this talk is a situation where you can minimize accumulation without affecting functionality.
Second, I believe that there is an enormous need for more
transparency for users into what data is being accumulated, where
it is stored, how it is being used, you know, with whom it is
being shared and so on.
Now, the key challenge here is to meet these goals -- fulfill these principles, or however you want to call them -- without affecting performance and functionality.
>>: And also productivity.
>> Roxana Geambasu: Hm?
>>: And also productivity.
>> Roxana Geambasu: And also productivity, that's true. That's exactly right, and that's actually a very, very good point, and it ties exactly with what I say. Because programmers and users alike have no support from the operating systems on their mobile devices or the infrastructures in their clouds to apply these principles that I've just defined, right? Just as an example, think of a modern operating system: a modern operating system itself is incredibly dirty and opaque, right? It leaves bread crumbs everywhere -- bread crumbs of deallocated application data -- and it provides no information about which objects' bread crumbs are stored where. For example, on your mobile device, do you have any idea what data you have stored there? If you lose it, do you have any idea what you've lost? Probably not.
So my research focuses precisely --
>>: [indiscernible] Snapchat, it's kind of [indiscernible].
>> Roxana Geambasu: Yes, please.
>>: I agree with you, people are [indiscernible].
>> Roxana Geambasu: Yeah, that's a great point. People are starting to look into that and creating services, noticing that, look, this has gone out of control, right? Accumulation has gone out of control, and there are a couple of services that are taking this stand that I'm arguing for here. So I don't want to claim that I've invented this problem, or that I'm the only one thinking about these problems. No way, right? But what my work does -- and I'll tell you in a minute -- is to design and build, and at times deploy, new operating system abstractions and distributed systems abstractions to stimulate and promote responsible data management by programmers.
And specifically, for the past year and a half or so, my students and I have been working on a number of projects on this topic. I'm listing here four of these projects, two in each direction that I mentioned on the previous slide: limiting accumulation and increasing transparency. They're each at very different stages of completion, and here I'll focus on two of them, these yellow ones, one for each direction, just to give you a gist of what I really mean by these directions. The first one was published last year at OSDI, and the second one is very much work in progress. So you'll see a big transition between stuff that I know and stuff that I'm speculating a lot about.
Okay. All right. So let's get started with the first example. I'll first talk about CleanOS, a system that we have built to limit mobile data accumulation with a new process called idle eviction. It showcases the first principle: how an operating system abstraction can improve data security on a mobile device by limiting accumulation.
Okay. So let's focus on mobile devices for now. As you all know, these devices are taking over desktops as the primary computing platform for personal users. And they have a lot of advantages; I'm not going to go through them here. But despite these great advantages, mobile devices also bring a number of challenges, and one such challenge, like I said before, is that whereas in the desktop world data used to be stored in a physically secured, firewalled network -- such as a home network or a corporate network -- users now take their mobile devices out into the world, where they can be easily stolen, lost, seized, or shoulder surfed. Okay.
That's a big problem. Now, despite these threats, mobile operating systems -- which, if you think about it, really are fledglings of desktop operating systems: Android kind of comes from Linux, iOS from OS X, and so on -- have not evolved to protect sensitive data from thieves and other parties that may capture the device. Just like their desktop ancestors, mobile operating systems mismanage sensitive data by allowing it to accumulate on the theft-prone device, and such mismanagement occurs at all layers of the operating system. For example, the OS doesn't securely deallocate data, applications hoard sensitive data for performance or convenience reasons, and so on. And, of course, all of this data that gets hoarded on the mobile device is placed at risk if the device is stolen or lost. For example, a thief can dump RAM or flash memory contents, or break passwords or fingerprints, and so on.
And let me give you a few examples from a small study that we ran on Android to figure out what a thief would be able to capture if they were to steal the device and break some of these very basic protection systems. We wanted to find out just how much sensitive data he would get. For that, we installed 14 applications within Android Gingerbread -- so, an older version of Android -- with default security settings. Among the applications we had email, password managers, document editing apps, and so on.
And we dumped the contents of the RAM and the SQLite databases. For each application, a check in the table means that we found clear-text passwords or contents in the dump. For example, let's take email as a specific example: we were able to grep the clear-text password from RAM, as well as email snippets, at all times, okay -- they were hoarded there all the time. And as you know, Android RAM is not encrypted, so you can very easily get that. Also, everything is stored persistently in a clear-text SQLite database. Okay. Overall, we captured sensitive data from 13 out of the 14 applications, and nine of these applications hoarded sensitive data in clear-text RAM at all times. That's what we found, and much of this data is exposed by the applications themselves.
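As a rough illustration of the kind of check this study performed -- scanning a raw RAM or database dump for a known clear-text credential -- here is a minimal sketch in Java. The file name and the password string are placeholders, not details from the actual study.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch: scan a raw memory or database dump for a known
// clear-text credential, the way one would grep a captured dump.
public class DumpScanner {
    public static void main(String[] args) throws IOException {
        byte[] dump = Files.readAllBytes(Paths.get("app_ram.dump"));
        // ISO-8859-1 maps each byte to one char, so offsets are preserved.
        String haystack = new String(dump, StandardCharsets.ISO_8859_1);
        String password = "correct-horse-battery";   // the account's known password

        int at = haystack.indexOf(password);
        if (at >= 0) {
            System.out.println("clear-text password found at offset " + at);
        } else {
            System.out.println("password not found in dump");
        }
    }
}
```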
As you can see, I don't believe it's because programmers are malicious or stupid, okay; I don't think that's a good argument. I think they just lack appropriate support from the operating system to manage this data. If you think about it, operating systems don't have a notion of sensitive data; they don't treat sensitive data in any way that's different. Yes, please?
>>: Two questions.
>> Roxana Geambasu: Yes, please.
>>: So first, the password -- you mean like my email account password?
>> Roxana Geambasu: Your email account password, yes.
>>: And the second one is: [indiscernible] clear text all my passwords [indiscernible] is a problem or not.
>> Roxana Geambasu: Yes, that is correct. So I'm talking here about a very sophisticated attacker. It's going to become clear in the next section -- in the next slide -- that I'm assuming a fairly sophisticated attack. I'm assuming essentially that the attacker can mount any physical attack against the device: they can dump the RAM, they can perhaps break through the user's authentication, and so on. Let me go through the next couple of slides, which is where I make that clear, okay?
All right. So, of course, like you're saying, the big issue with this sensitive data accumulation is that securing the data is really, really hard under particular threat models. And sure, of course, people should and can encrypt their file systems and their RAM, and use some of the existing automatic wipe-out systems to disable data after a device loss. However, I argue that these existing systems come with limitations. They're not entirely perfect.
For example, statistics show that 56 percent of corporate users don't lock their devices. That's down one percent from the same survey two years ago, it's true, but it's still very significant. And those who do lock their devices oftentimes configure extremely poor passwords -- we all know that -- which essentially renders encryption useless, okay. And, of course, you have probably heard about the recent Apple Touch ID, the highly usable fingerprint-based authentication system for the iPhone. It's very useful, and I do believe that it increases security against certain types of attackers -- potentially not the most sophisticated ones. However, you might also have heard that Touch ID was hacked by -- was it Germans? Well, some Europeans; I forget. Right. And, you know --
>>: Hacked?
>> Roxana Geambasu: Well, the way it was hacked is very easy, right? The fingerprint is almost everywhere -- it's on the device itself. And you can photograph it and generate a fake finger -- a fake [indiscernible]; there are videos that show how to do that -- a fake [indiscernible] kind of thing that you put on your finger in order to authenticate yourself to the device.
Okay. So they argued that a system which requires fingerprints -- things that are everywhere -- is not necessarily a good authentication system. Whether it is or not, the reality is that no protection system that we are talking about is perfect, right? And in particular, what I argue here is that these solutions we are talking about are imperfect stopgaps, really, for operating systems that were never designed with physical security in mind. Desktop computers were assumed to be largely physically secured, and as a result, they lack abstractions for dealing with sensitive data in an appropriate manner. Okay. Does that answer your threat model question a little bit?
>>: Yes.
>> Roxana Geambasu: Okay, thank you. All right. So in this talk, we argue precisely for that. We argue for the need for new mobile operating system abstractions for sensitive data. We believe that rather than allowing sensitive data to accumulate without limit on the device and then scrambling to protect it, mobile operating systems should instead try to manage sensitive data more rigorously and keep the device clean at any point in time, in anticipation of device theft or loss. And if the device is stolen or lost, then the operating system should ensure that a minimal amount of sensitive data can be exposed, and that users know precisely what data was exposed, okay. Yes, please?
>>: [indiscernible] password manager. I guess that in some sense can minimize the problem of setting a password on a device.
>> Roxana Geambasu: It can.
>> Roxana Geambasu: Yes, it can. Or you can use key chains -- key chains have existed for a very long time -- and so on. So there are solutions for managing certain types of data. But would you use a key chain or these other solutions to store your emails? I mentioned passwords because passwords are what people think about particularly when they think about sensitive data. I argue that it's actually not the password that I care about; it's really the data that I care about, okay. The password is just the lock that locks that data. But really, to me, if they can read most of my email, but they don't get my password -- whatever, you know -- that's still extremely bad, right? Do you see my point? So for very specific types of data, maybe there are solutions, but the problem is that they are not a generalized solution, and I do believe that an abstraction within the operating system to manage sensitive data -- sensitive data in general -- more rigorously is, to me, long overdue.
>>: So [indiscernible] this morning saying -- I mean, Obama made a comment either today or yesterday saying he cannot use the iPhone because of the security. And the [indiscernible] create a phone that Obama can use because [indiscernible] network hacks.
>> Roxana Geambasu: Well, you know --
>>: [indiscernible].
>> Roxana Geambasu: Eventually, but I'm not going to go there, because --
>>: Because some people really care about their data safety, [indiscernible] otherwise you couldn't use a mobile device. But user studies show a lot of people, they don't care, right?
>> Roxana Geambasu: So actually, user studies show --
[Multiple people speaking.]
>>: So half of the user base like to [indiscernible].
>> Roxana Geambasu: Right. So I don't have that statistic here, but there is a statistic from Mozilla that actually shows that when users feel it is incredibly important for them to do so, they do configure -- and this refers to configurations within Firefox -- they do try to configure and protect their privacy, okay. So this argument that users don't care -- I don't believe that it holds true. There are things that I really care about, that are very sensitive for me, and there are things where, whatever, it's okay if they get leaked, right? So I don't believe it's so black and white, and I don't believe that we should disable operating system mechanisms just because not everybody needs them, okay. So that's kind of my argument.
I was going to give you one more example with respect to this. So, for instance, there are these mobile apps -- I know the ones on Android; I forget their names now. One of them is Vault-Hide, and the other one is something else. What they do is take a few types of your data: they know how to hide your images, or some images that you select, hide some contacts, and things like this. That's what they do, and they have, I think, between 10 and 50 million downloads each. I don't know what that means in terms of usage, of course -- downloads is one thing, usage is another. But that may also tell you that we can't put a blanket on all users, or on any user at all times. There are situations when I care and situations when I don't care. So having that option, I think, is very important, to protect what you really care about.
And I have another project -- I'm not going to talk about it here; well, I don't know if I should say much at all, actually, because it's under submission, so let me just hold off on that. But it relates to this.
Okay. All right. So where were we? Just a second. Okay. Did I answer both of your questions?
>>: Yeah.
>> Roxana Geambasu: Okay, all right. Okay. So the point is that I believe it's time for new operating system abstractions, and that's what we did in CleanOS. CleanOS is our first step toward creating what I call a clean mobile operating system. It's an Android-based operating system that minimizes and audits the accumulation of sensitive data on a stolen or lost mobile device -- or, rather, minimizes and audits the accumulation of sensitive data in general, right.
And it does so by implementing an abstraction called a sensitive data object, or SDO, which we believe, as I said, is a long-overdue abstraction within operating systems. What SDOs do is identify the locations of sensitive data, both in RAM and on stable storage; CleanOS monitors the use of these SDOs by the applications and evicts sensitive data to a cloud -- to a trusted cloud -- whenever it is not under active use.
So this eviction process helps maintain a clean environment on the device at any point in time, so that a potential thief can't get a free lunch by capturing the device. Instead, the thief has to go to the cloud to access any unused data, and at that point, the cloud can enforce a set of useful post-loss functions. You have a question?
>>: Probably just asking you to push the slide advance button, but it seems like this is going to have pretty [indiscernible] on both power --
>> Roxana Geambasu: Yes, I will show you at the end.
>>: And disconnected operations.
>> Roxana Geambasu: Yes, I will show you at the end. I don't believe I have a slide on that, but I can talk to you about what we do about disconnected operations in particular. The next slide is going to address that, yes.
Okay. So now, cleansing the device's operating system is a very broad vision and a very complex thing, because, as I said, there's dirtiness within all layers of the operating system. Here, we're going to focus only on cleansing data hoarded by the applications themselves -- not so much the lower levels, like OS buffers and so on, for which work has existed for a very long time.
Okay. Our design of CleanOS -- to address this question indirectly a little bit -- relies on a few crucial insights about mobile operating systems. You can think of them as assumptions, but I think they are largely true. First, although sensitive data are exposed permanently, much of them are actually used very rarely. For example, the email password is constantly being exposed by the email application; however, it's only used during refreshes. A particular email's content, similarly, is only used when the user reads that email, okay, not otherwise. Second, mobile applications also oftentimes have a cloud back end -- according to our studies two years ago, about 70 percent of the applications did -- which already stores that sensitive information, so why expose it on the device as well?
Third, mobile devices are becoming increasingly connected, with pervasive wireless and 3G and 4G cellular coverage these days. And what we do is leverage these insights, these assumptions, to turn Android into a clean operating system -- I'll tell you a little bit about how we do that next. But we do include mechanisms for the cases where some of these assumptions don't hold, right.
All right. Okay. So the basic functioning of CleanOS is like
this. Applications create SDOs and place their sensitive data in
them. This way, they identify to the operating system their
sensitive data objects. Okay. And CleanOS manages them
rigorously by implementing three key functions.
First, CleanOS tracks data within SDOs using taint tracking. It
automatically adds any new data computed from the SDO to the SDO
itself.
Second, CleanOS evicts SDOs to a trusted cloud whenever an SDO becomes idle -- that is, hasn't been used for a period of time. The trusted cloud could be, for example, the application's own cloud or a third-party-maintained service. And, by the way, I said that we evict the SDOs to the cloud, but we don't actually ship the data back and forth. Rather, we just encrypt the data in place and ship the keys back and forth; the keys are actually stored in the cloud. Yes, there is a question.
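To make the eviction mechanics concrete, here is a minimal sketch in Java of what encrypt-in-place eviction could look like. The names (SensitiveDataObject, KeyServerClient) and the AES/CBC choice are illustrative assumptions, not the actual CleanOS implementation: on eviction the cleartext is encrypted under a fresh per-SDO key, only the key travels to the trusted cloud, and the local copy of the key is scrubbed.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of encrypt-in-place idle eviction (not the real CleanOS API).
class SensitiveDataObject {
    private final String id;     // identifies the SDO to the key server
    private byte[] data;         // sensitive bytes, tracked in RAM
    private byte[] iv;           // IV kept locally; useless without the key
    private boolean evicted = false;

    SensitiveDataObject(String id, byte[] data) {
        this.id = id;
        this.data = data;
    }

    // Called when the SDO has been idle: encrypt in place, ship only the key.
    void evict(KeyServerClient cloud) throws Exception {
        SecureRandom rng = new SecureRandom();
        byte[] key = new byte[16];
        iv = new byte[16];
        rng.nextBytes(key);
        rng.nextBytes(iv);

        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        data = c.doFinal(data);      // ciphertext replaces cleartext in place

        cloud.storeKey(id, key);     // the cloud holds the only usable copy
        Arrays.fill(key, (byte) 0);  // scrub the local key
        evicted = true;
    }

    // Any access to an evicted SDO must fetch the key from the cloud,
    // which is what lets the cloud audit or deny post-loss accesses.
    byte[] access(KeyServerClient cloud) throws Exception {
        if (evicted) {
            byte[] k = cloud.fetchKey(id);   // logged or refused by the cloud
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(k, "AES"), new IvParameterSpec(iv));
            data = c.doFinal(data);
            evicted = false;
        }
        return data;
    }
}

// Minimal interface to the trusted cloud assumed by the sketch above.
interface KeyServerClient {
    void storeKey(String sdoId, byte[] key) throws Exception;
    byte[] fetchKey(String sdoId) throws Exception;
}
```

Note that, as the talk says, only the small per-SDO key ever crosses the network; the (possibly large) ciphertext stays on the device.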
>>: This is a [indiscernible] I envision, it's a --
>> Roxana Geambasu: I didn't understand that.
>>: You say the trusted cloud -- are you thinking about an [indiscernible] cloud, like I'm using [indiscernible]?
>> Roxana Geambasu: Yes.
>>: And now the OS -- not just the OS, but the [indiscernible] and my Outlook app on my phone -- they both have to understand this new abstraction.
>> Roxana Geambasu: Yes. That is correct.
>>: [indiscernible] specific. [indiscernible] app has to do this.
>> Roxana Geambasu: Yes, that is correct. So the point is that the operating system on the mobile device provides this abstraction. This abstraction has a back end: if the application has a back end, the key for the SDO is stored -- can be stored -- in that back end, if the application essentially wants to use this to cleanse itself. Some applications may not need that completely, or want it at all. Other applications may leverage our services in order to do that much more easily than they would be doing otherwise. And the way they integrate with us is that, first, they implement this interface that I'm going to show in a second, and second, they host a key server on the server side, in their cloud.
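As a rough sketch of that application-side integration (the actual interface is the one on her slides; everything here, including the SdoManager service, is a hypothetical stand-in), an email app might declare its sensitive data like this:

```java
// Hypothetical example of how an email app might adopt the SDO abstraction.
// None of these names are the real CleanOS interface; they just illustrate
// the division of labor: the app declares what is sensitive, the OS manages it.
class EmailClient {
    private final SdoManager sdoManager;   // assumed OS-provided service

    EmailClient(SdoManager sdoManager) {
        this.sdoManager = sdoManager;
    }

    void onLogin(byte[] password) {
        // Declare the password as sensitive; the OS now tracks and evicts it.
        sdoManager.createSdo("email.password", password);
    }

    void onMessageDownloaded(String msgId, byte[] body) {
        // One SDO per message gives the per-object eviction granularity
        // discussed above, rather than all-or-nothing.
        sdoManager.createSdo("email.msg." + msgId, body);
    }

    byte[] readMessage(String msgId) {
        // If the message was evicted, this access transparently fetches
        // its key from the app's key server and is recorded there.
        return sdoManager.access("email.msg." + msgId);
    }
}

// Assumed OS-side interface; taint tracking would attach any data computed
// from an SDO back to that SDO, per the "three key functions" above.
interface SdoManager {
    void createSdo(String sdoId, byte[] sensitiveData);
    byte[] access(String sdoId);
}
```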
This is not something that we've done within CleanOS yet, but I believe -- so, mobile operating systems have evolved; they are very different, in fact. They come from the same operating systems, but they're very different, okay. One way in which they've evolved is by going to the cloud: most applications are cloud based. Yet the operating system is still only local. I do believe that abstractions for managing data need to transcend the mobile and the cloud portions, okay. And by transcending, what I mean is that this SDO should actually exist on that side as well, okay. And --
>>: Has to be application specific. [indiscernible] store anything on a server, right.
>> Roxana Geambasu: Um hmm.
>>: Because [indiscernible].
>> Roxana Geambasu: Yes.
>>: [indiscernible].
>> Roxana Geambasu: So CleanOS does require changes to the application.
>>: That means I have to change the application. I have to change the protocol -- my application protocol between my application and my server -- and then that has nothing to do with --
>> Roxana Geambasu: I don't know that you need to change the protocol. I think the protocol can stay the way it is, and then on your server side, you need to be hosting a key server, okay, in addition.
>>: [indiscernible].
>> Roxana Geambasu: It is an assumption, yes.
>>: This trusted cloud does not have to be [indiscernible] -- could be [indiscernible].
>> Roxana Geambasu: It can be [indiscernible]. So there are multiple deployment options here. An application either makes this conscious decision: I want to use this abstraction, I want to cleanse my data. What they will do then is host a web server -- a key server -- and the application will implement this abstraction, and that's how you do that, right. For applications that don't do that, we actually have defaults and those sorts of things to identify sensitive data ourselves. And you, the user, can take this over and say, no, I want my device to be cleansed, okay. What you do then is you host the keys -- the key server -- yourself. Yes, please.
>>: So just to make sure I understand what we're getting with this interaction: we're not sending data off to the cloud.
>> Roxana Geambasu: No.
>>: Because the whole point of this is, all the cloud's really doing is giving us an opportunity to throw away the keys remotely?
>> Roxana Geambasu: Yes.
>>: One equivalent level for that would be -- imagine if we stored it all locally, we protected it all with encryption, and then we take the last key at the very top of the tree and we hand that off to the cloud, and then we have to get it back later.
>> Roxana Geambasu: You can do this, and essentially what this will give you is an all-or-nothing kind of cleansing.
>>: So what do you get by breaking it down to a per-application basis?
>> Roxana Geambasu: Well, actually, even more than that: breaking it down to a per-object basis.
>>: Why is that --
>> Roxana Geambasu: So the reason why it's useful is the whole idea of minimizing exposure. The point is, your mobile device accumulates a lot of email -- most of your emails will probably be on your mobile device, right? You don't read all of -- just one second. You don't read all of your emails at the same time, clearly, right? There are very few operations that have to access all of those emails. So why are they on the device, okay? Let me show you -- perhaps it's the next slide that will show the use of this kind of restricted accumulation, that will reveal this. Are we asking --
>>: So the concern you have is that I'm looking at my device, I read my email, and if reading my email involves decrypting -- having the top-level key on the device -- and then I set the device down and somebody steals it, they can see everything on the device, whereas in your system, they can only see the email --
>> Roxana Geambasu: Yes, that's correct. They can only see directly the email that you just read. So it's this whole idea of taking a device that accumulates a lot of things and minimizing that accumulation to your working set, pretty much: yours as the user, the application's working set.
>>: The question is, why does this need to be an abstraction that involves the application? Why can't you do it at the page level? I mean, you can infer it. Sorry -- you're asking the application to expose its working set. But why not just infer it from --
>> Roxana Geambasu: So, a lot of reasons. Performance is one of them, right. It's good to differentiate between what's really sensitive and what's, whatever -- Java stuff. You know, there is an enormous amount of -- sorry -- there is an enormous amount of other data that's really not --
>>: That probably requires measurement, because the alternative -- I mean, the flip side argument is that you're asking for an invasive change. And if you could do this at the page level, it applies to all applications right now [indiscernible].
>> Roxana Geambasu: Um hmm. So I think you can get this at the page level. You can -- if the only thing that you cared about was minimizing the accumulation, then you could do this, okay. But there is another question. What you want to ask, after you lose the device, is what has been potentially compromised. And if the unit of what you evict is the page, you'll find that page 0xAB75 has been potentially accessed or exposed on the device, okay. What does that mean for you? There is no meaning associated with a page. There is no meaning, really, associated with a file.
>>: [indiscernible] storage backup to [indiscernible].
>> Roxana Geambasu: So what I argue is, especially for things that you need to understand -- and not just auditing; hiding as well, and protection in general -- I really think that's the case. You really need another abstraction, an object-level abstraction, because that's what you, the user, can actually understand, okay. And today, we're doing protection at much lower levels -- at the disk block level, the page level, the file level -- which are completely meaningless. What's a file, for example, on your mobile device? It's nothing. I don't ever see files on my mobile device, right? So it's a bigger spectrum of ideas that comes in and motivates this choice. Yes, please.
>>: Why do you need a trusted cloud? Because in your scheme, in some sense, [indiscernible] sending things over to the cloud -- I could just choose to encrypt those things, and now [indiscernible] I throw away the key, and the next time, only if you enter some credential, right?
>> Roxana Geambasu: You could do that.
>>: [indiscernible] that credential would be used to retrieve the key that would be used for --
>> Roxana Geambasu: You can certainly do that. Again, you're asking -- is anybody -- I shouldn't -- like, what can I say and what can't I say, you know? Like, if this is under submission? At any of the security conferences, are you --
>>: [indiscernible].
>> Roxana Geambasu: Never mind. The point is, some of these questions I'm addressing in some of the other work that I've done, and this notion of having users, for example, hide their objects at the object level -- hide their objects within applications if they so choose -- is something that I've been looking at.
>>: [indiscernible].
>> Roxana Geambasu: Yeah, that's a possibility. It's possible. There are big challenges when you try to do that, and we can -- yes, there are big challenges, actually. Because the nice thing about this is that the cloud is always available. If you're encrypting and requiring that -- well, it's assumed to be always available. If it's not always available in CleanOS, let me just --
>>: [indiscernible].
>> Roxana Geambasu: Yes, please.
>>: I have to provide something, and that something presumably is [indiscernible] my fingerprint or whatever. They can use that information.
>> Roxana Geambasu: So what you are thinking is, whenever you need a particular key, you would prompt the user for that key. Is that what you're thinking?
>>: Whenever I log in. [indiscernible] unlock my phone [indiscernible].
>>: You lose some information, which is what gets lost.
>> Roxana Geambasu: Yes, that's right. The [indiscernible] -- you are losing it altogether.
>>: [indiscernible].
>>: The data is not exposed to other people, but you may not remember [indiscernible].
>> Roxana Geambasu: Well, I don't think you can tell what --
[Multiple people speaking.]
>>: Hang on. The assumption here is that when you lose your device, there's a possibility of losing your device while some of the data is not protected. So it's the same --
>>: [indiscernible].
>>: No, no, she's making two criticisms of your alternative approach. The first one is that in your approach, if you never lose the phone when it's not locked, your [indiscernible] is fine, because there's no auditing required, because there's no data that can possibly be lost. So Roxana is assuming that you can lose the phone when it's unlocked. [indiscernible] losing your phone when it's unlocked causes the system --
[Multiple people speaking.]
>>: Because this scheme --
>> Roxana Geambasu: Doesn't --
>>: One is, it doesn't -- it has finer granularity locking. When your phone's unlocked, much of the data is still locked. Only the part you're looking at is unlocked.
>>: No, I'm not changing that. All I'm changing is [indiscernible] I could just throw away my [indiscernible], or encrypt my [indiscernible] key with internal information, my [indiscernible] password I use to unlock my phone. That's all I'm out here to change.
>> Roxana Geambasu: Yes.
[Multiple people speaking.]
>>: You can't encrypt your web password, because you're going to
have to type that password in every time you go from one message
to the next.
>>: Yeah.
>>: No, I'm not, right.
[Multiple people speaking.]
>>: When you turn off the phone, are you going to leave the key lying around?
>>: There's no key lying around.
>>: In the phone --
>>: The [indiscernible] lying around in the phone when it's on here, because [indiscernible] trusted cloud [indiscernible] retrieves whatever key, you have to authenticate yourself, and you [indiscernible] authenticate yourself again and again.
[Multiple people speaking.]
>>: No, no. Remember, when you're going to the cloud, one thing you can do to the cloud, from a different channel, is revoke that connection, whereas you can't revoke the key that's sitting on the phone. Revocation is an important --
>> Roxana Geambasu: So there are two -- well, there are two things here.
>>: Because when the device is lost, you call the IT department and say, please revoke the cloud key, which is --
>>: [indiscernible].
>>: Whereas you can't call the phone and tell it to revoke its key.
>>: [indiscernible].
>> Roxana Geambasu: So first of all, the master key -- I did not propose a master key. What we do is we have per-SDO keys, per-object keys.
>>: [indiscernible].
>> Roxana Geambasu: You have to use the network, yes, that's correct.
>>: So that's [indiscernible] you don't want to store the authentication on the phone.
>> Roxana Geambasu: Yes, I can. So here's the way this will work, and I think it's my next slide -- yes, it's my next slide. So if you can hold off just a second, I'll be able to answer your particular question. And the idea is that we don't offer just minimizing accumulation; we also offer auditing with that, okay. So it will become clear. All right. So the idea is the following: the application implements this, and -- I'm showing here just a couple of examples. Let's suppose that the email application has, for example, a content SDO, which corresponds to an email and its content, and a password SDO, right.
When you read your email, okay, that password is actually not used -- it's only used during refreshes. So it is evicted; it doesn't exist on the phone, for all intents and purposes, okay? It is still available -- its key is still available in the cloud -- but you have to go to the cloud to fetch that key in order to access it, because it's evicted, right. When you stop reading the email -- when, for example, you send the application into the background -- the content SDO is not used either, so both of them are evicted, okay?
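A minimal sketch of this idle-eviction policy, with made-up names and a made-up timeout rather than CleanOS's actual parameters: every SDO access refreshes a timestamp, a periodic sweep evicts anything idle past the timeout (the password SDO between refreshes), and backgrounding an app evicts its content SDOs immediately.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the idle-eviction policy described above.
class IdleEvictionPolicy {
    private static final long IDLE_TIMEOUT_MS = 60_000;   // assumed eviction timeout

    private final Map<String, Long> lastUsed = new ConcurrentHashMap<>();

    // The taint tracker would call this on every read or write of an SDO.
    void onAccess(String sdoId) {
        lastUsed.put(sdoId, System.currentTimeMillis());
    }

    // Run periodically: evict every SDO that has been idle too long,
    // e.g. the email password between refreshes.
    void sweep(EvictableStore store) {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastUsed.entrySet()) {
            if (now - e.getValue() > IDLE_TIMEOUT_MS) {
                store.evict(e.getKey());
            }
        }
    }

    // When the app is backgrounded, its content SDOs are idle by definition.
    void onAppBackgrounded(EvictableStore store, Iterable<String> appSdos) {
        for (String sdoId : appSdos) {
            store.evict(sdoId);
        }
    }
}

// Assumed hook into the mechanism sketched earlier: encrypt in place
// and ship the per-SDO key to the trusted cloud.
interface EvictableStore {
    void evict(String sdoId);
}
```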
So how do we use this? Did I scroll? How do I scroll? Okay. All right. So how do we use this? What's the major benefit -- because I think that's what you're really asking -- what's the major benefit of CleanOS? Well, it's the fact that it increases post-loss data control. For example, suppose the following situation: your device gets stolen or lost at some point in time, and you notice the device is lost after a while. Initially, before the thief stole the device, he had no access to it. But after he steals your device, he gains full access to the device and any data stored on it; he can tamper with the device in all sorts of ways, both in hardware and in software. For instance, he can dump the contents of RAM, et cetera. With CleanOS, however, two things happen. First, because CleanOS keeps evicting idle data at a fine granularity, it ensures that only a few sensitive data objects are exposed on the device at the time of theft.
Second, after theft, the cloud can implement a variety of useful functions on top of the SDOs that were not exposed, okay. For example, it can log all accesses to an SDO, so that after a while, after you go to the cloud, you can ask it what data was exposed after the theft. You can also disable all accesses to these other SDOs -- those that were already evicted at the time of theft.
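Here is a minimal sketch of these post-loss functions on the cloud side -- again with hypothetical names, not CleanOS's actual key-server code. Every key fetch is appended to an audit log, fetches can be disabled once the owner reports the device lost (for example, via the IT department, as raised earlier), and the log answers the question of what was exposed after the theft.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a trusted key server with post-loss functions.
class TrustedKeyServer {
    private final Map<String, byte[]> keys = new HashMap<>();
    private final List<String> auditLog = new ArrayList<>();
    private boolean deviceReportedLost = false;

    synchronized void storeKey(String sdoId, byte[] key) {
        keys.put(sdoId, key);
    }

    // Called whenever the device accesses an evicted SDO.
    synchronized byte[] fetchKey(String sdoId) {
        auditLog.add(System.currentTimeMillis() + " fetch " + sdoId);
        if (deviceReportedLost) {
            // Disable function: refuse all further accesses post-theft.
            throw new SecurityException("device reported lost; key access denied");
        }
        return keys.get(sdoId);
    }

    // The owner (or IT department) flips this from another channel.
    synchronized void reportDeviceLost() {
        deviceReportedLost = true;
    }

    // After a loss, the owner can ask exactly which SDOs were exposed.
    synchronized List<String> exposedAfter(long theftTimeMs) {
        List<String> exposed = new ArrayList<>();
        for (String entry : auditLog) {
            long ts = Long.parseLong(entry.split(" ")[0]);
            if (ts >= theftTimeMs) {
                exposed.add(entry);
            }
        }
        return exposed;
    }
}
```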
So what I'm trying to say here is that CleanOS gives you much better control over, and transparency into, the sensitive data on your mobile device, and the way it achieves that is through fine-grained minimization -- or prevention, I guess -- of data accumulation, all right. Does that address the questions, the concerns that you have?
>>: The point is when a phone is lost while it's unlocked.
>> Roxana Geambasu: Yes.
>>: Now we have a better way to control it, because the key is not stored on the phone.
>> Roxana Geambasu: Yes.
>>: The key is stored on the phone when it's unlocked [indiscernible] lose control. And when the phone's locked, it should be the same.
>> Roxana Geambasu: Yes, that's right. Well, yes -- except there's one single difference. How do you know that your lock has held? Okay? How do you know? You lose the device -- how do you know what's happened, right? Do you know for sure that that lock, that password that you set, cannot be exposed? Can you tell? Maybe you can for your passwords, but maybe the random person can't. So the point here is that this way, you can --
>>: Are you making an argument that your password might be guessable?
>> Roxana Geambasu: Well, it might be, right? It might be. My husband works in the financial industry, and I am horrified at the kinds of practices that they have with respect to passwords, particularly on their mobile devices. And as I told you --
>>: That's a pretty essential assumption here.
>> Roxana Geambasu: Yes.
>>: I think you're saying that if every user used 128-bit lock passwords every time they looked at their email -- about 128 bits of [indiscernible] -- you wouldn't need this?
>> Roxana Geambasu: Yes, correct.
>>: Okay. Well, no, that's important. That's a valuable thing to communicate, [indiscernible] we understand that that's an important --
>> Roxana Geambasu: But that's obviously not feasible, right?
>>: I think that's a fine assumption. I just think it wasn't clear until now.
>> Roxana Geambasu: Okay, okay. Right. So my point is the following. Irrespective of exactly what attack, what protections you're talking about, and so on, I think the basic concept that I'm trying to communicate to you guys is this: letting data accumulate enormously and then trying to protect it in all sorts of ways is, to me, not an indication of a responsible data management scheme. What you should do, in my mind, is what Snapchat does -- or what we think it does -- which is to reduce the accumulation, because then you have a lot more control. At least when the bad thing happens, if it happens -- because attacks are inevitable, unfortunately, in today's world -- when attacks do happen, at least you know for sure that the least amount is compromised, and second, potentially, if you have this kind of an architecture, you can audit what the exposure was. That's what I'm trying to communicate here, irrespective of the specifics of a particular attack, and I think that's a concept that is applicable to mobile devices, to clouds, and potentially to corporate networks and other environments as well, okay?
>>: [inaudible]. If there's a strong password to lock the device, then you don't need this.
>>: Well, sorry, [indiscernible] if your device is stolen while unlocked, there's a question of how much stuff is there. So there's two entry points that the attacker has. One is, if the attacker gets ahold of the device while you are using some of the state, [indiscernible] how much of that state is accessible, and also being able to tell remotely how much of it could have been accessed. The second observation is, even if you did have the device locked, what if it was a crummy password? [indiscernible] same guarantees. In other words, a crummy password looks like an unlocked device. So the question is how much -- how much trouble could have been --
>> Roxana Geambasu: So I thought that the alternative was to have one password per object [indiscernible].
>>: Suggesting --
>> Roxana Geambasu: A strong password.
>>: A strong lock password.
>> Roxana Geambasu: A strong lock password. If you have encrypted RAM, if, of course, you're encrypting your disk, and you have a -- then exactly what you said is valid, right.
>>: Except the [indiscernible].
>> Roxana Geambasu: Yes.
>>: I think [indiscernible].
>>: And if you use a strong password, unlocking it is not easy, right? Pretty much, I think you won't be able to break it, because the firewall [indiscernible] whatever it is can make the bar really high. That means the key [indiscernible].
>> Roxana Geambasu: So, by the way, that's very dangerous, the whole [indiscernible]-five-times thing, unless -- and maybe it's okay now, because all of your data is really in the cloud and you can just reset your entire device. But the five-times-trial wipe is dangerous, especially if your key is only on the local device, not in the cloud, because then you can lose your device and all of the data that's stored there, if there is data that's only stored there --
>>: [indiscernible] opposite way. Now you use the cloud to back up that key in a way that takes a greater burden to unlock. But there, the cloud is providing disjunctive access to the device rather than conjunctive access to the device. In your approach, the cloud needs to get involved if you want to get to the data.
>> Roxana Geambasu: Yes.
>>: And the way you get to the access is that you [indiscernible] your key somewhere, which is disjunctive.
>>: [indiscernible] trade-off of increasing the complexity of the password and increasing the limit of attempts, right? Practically, it won't be repeated that many times, and it's [indiscernible] unlock the device once locked.
>> Roxana Geambasu: So, you know, I forget -- there is nothing that I can argue. But the reality that we live in is this: users oftentimes don't do that.
>>: [indiscernible].
>> Roxana Geambasu: No, no, I'm talking here, actually, about 56 percent of corporate users.
>>: [indiscernible].
>> Roxana Geambasu: You're one of them, so there are --
>>: Yeah, I am one of them. You have to --
>>: [indiscernible] I choose to not have a PIN, because it's that much more convenient, versus my security. I have the PIN just because [indiscernible] requiring me to do that. But the moment I can --
>>: [indiscernible] it's because if I lose it, I want people to use that to call me.
>>: That's another reason.
>> Roxana Geambasu: That's a great use case here, because you lose it, right, and somebody else -- somebody good, nice -- finds your phone and wants to call you. You go to the little server there and you look: is this someone nice? I don't know who this guy is. Has he looked at something significant or not?
>>: [indiscernible].
>> Roxana Geambasu: It's true, right, but at least you know, right? What can you do? You're completely right, but at least you're aware. So again, lack of transparency, lack of [indiscernible], and so on are very problematic for users today, I think. And we have too much of that. So I agree with you on this, but on the other hand, I think there is great room for improving.
So what you're saying, essentially, is: well, if I had a complex enough password and unlock system, this would all be fine. But really, what we're doing -- and we've been doing this for too long, I think, we systems people -- is pushing stuff onto the end user, the responsibility onto the end user, right. So we let stuff accumulate, and then we say, ah, the user will fix this, because he will use a strong password. And he will have to type it a thousand times per day -- I don't know, I'm making this up, right -- but he'll do that if he cares, okay. And the reality is that oftentimes you don't care until you care. So until you lose that device, maybe you think, okay, well, I prefer [indiscernible] and so on -- not having to unlock things and type things and so on. But once you lose the device, you think: oh, my God, what was there? Shoot. What did I lose? And so that's the question that this is actually trying to answer. First, it's trying to ensure that the question doesn't arise. And second --
>>: I'll just speak for myself. When I lose a device, it's not a huge deal, because I don't have secrets that I worry about people -- I mean, I worry about my financial safety, but every financial application requires a password to log me in. So they wouldn't be able to use my apps, [indiscernible] Bank of America, and [indiscernible] confidentiality on my data, like my pictures and my email -- I mean, they won't be able to do anything really harmful to me.
>>: What you're arguing is that financial organizations are already doing this on a per-application basis.
>>: That's it. That's what I care about: anything that wants to do this protection could do it at the coarse granularity and as [indiscernible]. I don't care [indiscernible]. Willing to type in a password every single time? I never want to do that.
>>: The thing that you're assuming is different -- that Roxana is assuming -- is that in your example, you're invested in protecting your financial data. What Roxana cares about is, for example, the corporate user who isn't terribly invested in protecting Microsoft's [indiscernible] -- yeah, there's a policy, I've got to follow it. But she's saying: can we arrange the device in a way where we can lower the cost to you, as an end user, of providing that security, of participating in protecting [indiscernible] data, even though personally it's hard for you to care on a day-to-day basis about that [indiscernible] with the same level of conviction that you care about your stock options disappearing, right? So I think the way in which your example didn't apply to her argument is: you say, well, the applications I really care about, those are being protected. But if every single Microsoft internal site that had [indiscernible] on it made you type another stupid password, or remember a password and deal with some sort of key chain thing on your own -- I mean, that's where that --
>>: But now we are not talking about an app now? We're talking about using a [indiscernible] to access --
[Multiple people speaking.]
>>: You generalize. [indiscernible] is on your phone.
>>: We generalize: you can make an app, and you can make the SkyDrive Pro [indiscernible] just like Bank of America. And we'll sign out every single device, and now it's encrypted. You could just do that [indiscernible].
>>: This is more usable. You don't have to enter a password every time.
>>: But this makes a [indiscernible]. This is more usable?
>> Roxana Geambasu: So I don't argue either way. I don't want to argue either way. The argument that I'm really trying to make, again, at a much higher level than going into the very specifics of each case, is this: what I'm trying to do is investigate what it means to accumulate enormous amounts of data, as we are technically capable of doing today, okay. What does it mean, okay? And how can we control that accumulation in some way? Because there are -- I'm showing here one case in which -- I do believe there are scenarios, situations, in which this minimized accumulation, or limited accumulation, is very useful, okay. But the broader argument I'm trying to make is that, right? So --
>>: [indiscernible] I like what you say: users don't lock their devices and configure poor passwords. However, if they have a strong password and keep it locked all the time, [indiscernible] unlocked and [indiscernible] minimal. That's going to be very unusable.
>> Roxana Geambasu: Um hmm.
>>: You argue that there is a big usability issue with relying on strong passwords and short lock intervals.
>> Roxana Geambasu: Um hmm.
[Multiple people speaking.]
>>: In comparison, does your scheme [indiscernible] usability?
>> Roxana Geambasu: Well, from certain perspectives, yes; from other perspectives, no. In particular, your device needs to be connected, by and large, in order for you to be able to access your data.
>>: [inaudible].
>> Roxana Geambasu: It's always connected, you know, unless you're in the New York subway -- the New York City subway -- and that's when it doesn't work. Hm? I'm sorry?
[Multiple people speaking.]
>>: I'm curious about the answer. So let's say it's always connected.
>> Roxana Geambasu: Yes.
>>: Does it present some usability benefit so that my data is
protected?
>> Roxana Geambasu: So I'm not going to argue that with CleanOS it's now good and fine for you to not lock your device, because your data will indeed get -- your question, right? Well, if I know that my data [indiscernible] accessed, what good is it for me that I know, right? So I think protecting your device is still important. Perhaps the pressure to do that is not as high, potentially, as before. But I am a thorough -- a full believer in protecting your device as you normally would, right. But with this additional --
>>: There's a trade-off. So you don't have to spend so much effort -- the user doesn't need to spend so much effort, you know, on a hard password or [indiscernible] say unlock, unlock.
[Multiple people speaking.]
>>: It's a trade-off, [indiscernible], but here, if the window between when your device is lost and the time you realize that you've lost it --
[Multiple people speaking.]
>>: Doesn't protect anything.
>> Roxana Geambasu: Doesn't protect.
>>: [indiscernible] protect you, but tells you what data was accessed.
>> Roxana Geambasu: Yes, that's right.
>>: [indiscernible] lose your data.
>> Roxana Geambasu: Yes, correct.
>>: And nobody knows, right?
>> Roxana Geambasu: That is correct, but there are two --
>>: She's [indiscernible] -- are you talking about Roxana's proposal?
>>: Yeah, I'm talking about [indiscernible] that window. You're not being protected; you're only collecting [indiscernible] information -- say, oh, that sensitive email was downloaded to my device through that window. Even though I [indiscernible], during that window, nobody is protecting you.
[Multiple people speaking.]
>>:
There is [indiscernible]
>> Roxana Geambasu:
>>:
But not just
[indiscernible].
[Multiple people speaking.]
>>: The keys are stored in [indiscernible] and unlocked, and it's
just you. I don't ask you to type a password or a pin. I will
just have you click on that email and you retrieve it
>> Roxana Geambasu: So you retrieve the key, okay. So what will
happen in that case is this. The thief will try to access this
email and that email and that email. He will always hit the cloud,
because they're all evicted, let's say. And then your audit log
will show that this and that and that have been accessed. In
addition to that, you may ask, well, why doesn't the thief then
just try to access all of them and decrypt all of them and get
everything, protection that CleanOS would not offer. You can do
that, but note that on the cloud side, you could also be monitoring
the accesses and so on, and kind of predicting when your access
rate is very high. Um hmm.
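To make this eviction-and-audit flow concrete, here is a minimal
Java sketch of a cloud-side key service for evicted SDOs: every key
fetch is logged, which is what produces the audit trail, and the
same path is where access-rate monitoring could hook in. All names
and the rate threshold are hypothetical illustrations, not
CleanOS's actual protocol.

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the cloud-side behavior described above:
    // evicted objects' keys live in the cloud, every fetch is appended to
    // an audit log, and a crude rate check flags thief-like bulk access.
    final class KeyService {
        private final Map<String, byte[]> keys = new HashMap<>(); // sdoId -> key
        private final List<String> auditLog = new ArrayList<>();

        byte[] fetchKey(String sdoId, String description) {
            // The audit log is what lets the user later see exactly which
            // SDOs ("this and that and that") were accessed post-theft.
            auditLog.add(Instant.now() + " accessed: " + description);
            if (auditLog.size() > 20) { // invented threshold, not windowed
                throw new SecurityException("suspicious access rate; keys disabled");
            }
            return keys.get(sdoId);
        }

        List<String> audit() { return List.copyOf(auditLog); }
    }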
>>: [indiscernible] right, because [indiscernible] if I'm doing a
search, searching my mailbox. Do you want to allow me to do a
local search?
>> Roxana Geambasu: You know, that's a good question. Well, first
of all, a search typically goes into an index and so on.
>>: [indiscernible] on the phone to search [indiscernible] no
big deal, right?
>> Roxana Geambasu: No, it's not a big deal
[Multiple people speaking.]
>>: I need to decrypt a lot of data to search.
>> Roxana Geambasu: Right, but what I mean to say is that it's
going to go into an index so you don't necessarily access all the
contents of your entire email.
>>: The index is valuable.
>> Roxana Geambasu: The index is. So, and I believe
>>: With the index, I can reconstruct all of the data, so I
think you
[Multiple people speaking.]
>> Roxana Geambasu: You're right, you're right.
>>: I don't think you want to claim that revealing the index is
any weaker than
>> Roxana Geambasu: Yes, you're right. And, in fact, I don't know
if this is covered, but I think you would actually include the
index in the SDOs themselves.
>>: [indiscernible] saying once you get into the territory of
behavior based monitoring, it becomes very messy in some sense.
>> Roxana Geambasu: Well, I don't know. I think there are things
that can be done. You know, big data is used to derive behavior.
Why don't we use it for security as much? So anyway.
>>: Data allow [indiscernible] mobile OS as well, right. This
device was always in the Microsoft building. If it's not in the
Microsoft building during work hours, I'm going to [indiscernible]
myself, asking for the pin every five minutes. That's the type of
behavior check that can be done in the cloud, because we do that
in the mobile app. That's what I'm saying.
>> Roxana Geambasu: You can. And again, as you mentioned, if
you're outside, you leave it to the user to secure it. So in
general, I think protection systems, in the end, will touch the
user; they will have an impact on the user. The more you increase
your protection, the more the user will be affected. I think there
is a fundamental trade off here, and so there is a careful balance
that you need to think about, in terms of how much protection you
want, because then you're losing usability, like what you've been
talking about.
This trade off needs to be set somewhere, and what CleanOS gives
you depends on where you set it: CleanOS is more or less useful
depending on that setting. If you're willing to make usability a
priority, protection goes down. You don't require the user to
enter a pin every five minutes; you don't require them to enter a
very complex password and so on. So protection goes down,
usability goes up, and CleanOS becomes more important. Vice versa,
when you want to make protection much more important, usability
goes down, and CleanOS's usefulness, in a sense, goes down as
well. So that's one way to think about it. I don't know if this
makes sense.
>>: So now I'm a little bit confused. Can you remind me of the
benefit of eviction for SDOs? Exactly what's the benefit?
>> Roxana Geambasu: So two things, right. You evict at fine
granularity, at object granularity, and you get two things. You
get minimal accumulation: the property that the minimal number of
objects is exposed at any point in time. And second, you get the
auditing benefit and the remote control benefit of these, of non
evicted
>>: If an attacker gets the device when it's unlocked, he could
basically access [indiscernible].
>> Roxana Geambasu: Yes. He could, as I said, potentially do
that. And there could be multiple types of attackers. Some
attackers will try to do that. Other attackers won't; like you're
saying, maybe somebody good has actually kept your dear device.
That's not really an attacker, and they don't snoop around. As a
user, you have no idea, you have no transparency into that post
theft. You have no idea what's happened.
>>: [indiscernible] purposes, not like work. Access
[indiscernible].
>> Roxana Geambasu: So minimal accumulation, just on its own
>>: So it's minimal accumulation. Seems like mostly
[indiscernible].
>>: I think the stolen device is just one of the theft scenarios
here.
>> Roxana Geambasu: Yes.
>>: You also assume code [indiscernible] and hardware level
tampering.
[Multiple people speaking.]
>>: [indiscernible] able to do that, I would first try to look
at all the objects.
>> Roxana Geambasu: Yes, you can
[Multiple people speaking.]
>>: [indiscernible] software and look at memory. [indiscernible].
So
>> Roxana Geambasu: Right. So I think we do assume that, in fact.
And as I said, we do assume cold boot, and cold boot could happen,
right?
>>: You could [indiscernible] a little bit, but I think you need
[Multiple people speaking.]
>> Roxana Geambasu: So let me make something very clear, and I
agree with this: CleanOS will not protect your data in all
circumstances, okay? It will protect your data in some
circumstances, where you are able to disable accesses to the keys
before all of them get compromised. However, in all situations, it
will tell you what really happened. And there are many situations
that could have happened after you lose the device, and you don't
know which one it is. There could be a cold boot attacker who goes
through every single thing and grabs every single thing. In that
case, at least you know, and you say, oh, whatever, there is no use
for CleanOS, quite frankly. But at the very least you know that.
Whereas there could be another type of attacker who doesn't do
that: a hardware thief, or a nice person, or whatever, who doesn't
snoop, and you'll know that too. So that's the point: you can
differentiate, at the very least. Right now, you lose the device
and, well, you don't know what's going on with it.
I've asked a lot of people this: if you lose your device, do you
know what you've lost? Do you have any idea? I'm not going to run
a poll now because it's getting a little bit late, and I do want
to talk about my second project.
>>: [indiscernible] because now I'm still relying on the
applications to do the right thing, and I don't know if the
application is doing the right thing. If the application's not
smart enough to mark a password as sensitive, I still lose that.
>> Roxana Geambasu: Correct.
>>: I assume your transparency [indiscernible].
>> Roxana Geambasu: Yes, so is my what?
>>: Your transparency [indiscernible].
>> Roxana Geambasu: No, the one that I'm going to talk about here
is actually a little different, but we do address that, in fact; I
haven't gotten to talk about default SDOs. In addition to the SDOs
that applications create, and I do believe that is the right
abstraction to offer to application writers who want this feature,
we provide a set of default SDOs where we do our best to identify
data that's sensitive. Our recall is very good; our precision is
not very good at all. The way we do this is, essentially, we
identify, for example, passwords, for which we create SDOs; data
that comes over SSL, all of which we just consider sensitive in
bulk; and user input, which in general is also an SDO that's
sensitive. So precision is not good for these default SDOs, but
recall is pretty good, at least in our experiments. So there are
lots of things that can be done. And in this other project of
mine, you can actually identify application level objects from the
operating system up; I can talk to you one on one about that. And
that's another thing.
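To make those default heuristics concrete, here is a minimal Java
sketch of how such coarse tagging might look; the class and method
names are invented for illustration and are not CleanOS's actual
internals.

    // Hypothetical sketch of default SDO tagging (all names invented).
    // The heuristics mirror the ones above: SSL payloads, user input,
    // and password fields are all treated as sensitive, trading
    // precision for recall.
    enum DefaultSdo { PASSWORD, SSL_DATA, USER_INPUT }

    final class DefaultSdoTagger {
        // Would be called by a modified TLS stack for every decrypted buffer.
        DefaultSdo tagSslRead(byte[] plaintext) {
            // Low precision by design: every SSL payload is tagged sensitive.
            return DefaultSdo.SSL_DATA;
        }

        // Would be called by the input subsystem for every keystroke batch.
        DefaultSdo tagUserInput(CharSequence typed, boolean fromPasswordField) {
            return fromPasswordField ? DefaultSdo.PASSWORD : DefaultSdo.USER_INPUT;
        }
    }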
Now, okay. So I want to go forward. I'm going to spend probably
three more minutes on CleanOS, because I do want to wrap it up, but
I do want to share with you the SDO abstraction, this interface.
Essentially, what this interface lets you do as an application is
create an SDO and specify a description for it, which is going to
be useful for auditing. The description for an email SDO, for
example, can be the subject or something like that; for a password,
it can be the name of the email account to which the password
corresponds. You can add objects. You can remove objects. So it's
a fairly simple interface, and of course we ourselves have a
private SDO [indiscernible] that identifies the cloud and the
device. And this is how a modified version of the email
application uses SDOs: a few lines of change, I think
[indiscernible] or something like that many lines of change.
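As an illustration of what that interface and those few lines of
change might look like, here is a minimal Java sketch of the
operations just listed (create an SDO with an auditing description,
add and remove objects); the exact signatures are hypothetical, not
CleanOS's published API.

    // Hypothetical sketch of the SDO application interface described
    // above. Only the operations come from the talk; signatures are
    // invented for illustration.
    interface Sdo {
        void addObject(Object o);     // track o; its cleartext becomes evictable
        void removeObject(Object o);  // stop tracking o
    }

    final class SdoManager {
        // The description is what the audit log later shows, e.g. an
        // email's subject line or the account a password belongs to.
        static Sdo createSdo(String description) {
            java.util.Set<Object> tracked = new java.util.HashSet<>();
            return new Sdo() {
                public void addObject(Object o) { tracked.add(o); }
                public void removeObject(Object o) { tracked.remove(o); }
            };
        }
    }

    // Example use in a mail client, mirroring the email scenario above:
    //   Sdo msg = SdoManager.createSdo("Email: " + subject);
    //   msg.addObject(messageBody);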
I'm going to jump over the architecture, because we have
discussed it quite a bit here and I'm just going to jump directly
into, you know, a little bit of evaluation.
So the first question we asked is: does CleanOS limit data
exposure? This table shows essentially the fraction of time in
which sensitive data in the email application was exposed; the
numbers in this particular table are just for the email
application. Without CleanOS, the password and contents were
exposed almost all the time. With application defined SDOs, that
goes down significantly. With default SDOs, the exposure also goes
down, although not quite as much, because default SDOs are a lot
coarser than application defined ones. So that's what this table
tells you.
I'm not going to go into a whole lot more detail. Overall, the
highlight of our evaluation is that CleanOS can lower the exposure
of sensitive data by up to 97 percent for reasonable performance
overheads, and we are talking about energy here as well. I believe
I have a graph on this at the end; it's not included right now, so
I'm going to skip it, but I'm happy to show it to you later.
Okay. So in summary, very quickly, I've shown how today's mobile
OSs and applications mismanage sensitive data by allowing it to
accumulate on these theft prone devices and exposing it to thieves
and such. And I've told you about CleanOS, a new Android based OS
that is designed to minimize that.
And more importantly, CleanOS showcases a new and powerful view, I
believe, on data security in general: instead of accumulating data
on [indiscernible], as many systems do today, because they can, and
then struggling to protect it against a myriad of attacks, systems
and applications in my mind should be designed to manage data much
more vigorously and minimize its exposure to attack. And I believe
that abstractions built within operating systems and cloud
infrastructures can help with that and make it easy for programmers
to do this without losing a lot of hours.
Okay. Unless there are other questions related to this, I would
like to very, very briefly talk about the second project. Okay.
Thank you very much.
Okay. So the second project that I would like to tell you about is
specifically an example of the second direction that I told you
about: transparency. CleanOS is also about transparency, but it is
also about minimizing accumulation, and it was for mobile devices.
I'm going to show you now an example for cloud auditing: a cloud
auditing project, xRay. It's a system we're trying to create, and
its goal is to answer a very difficult question that I think most
of us have asked in recent years: what are web services like
Google and Amazon and Microsoft itself doing with our data? It's a
very ambitious project, and it's very much in the works, so I
won't be able to answer a lot of your questions, I'm sure. You'll
have more questions than I can answer.
So why do we want this? Like I said before, today's cloud services
leverage users' data for a lot of purposes, and presently users
have no visibility into that. What I want to do is add visibility
and awareness for the users. For example, what could we use this
for? Well, wouldn't you like to know if your Facebook likes
influence your insurance prices, or if your searches change your
Amazon prices?
Or why you're constantly being recommended inappropriate videos?
This actually happened to me, and it was very frustrating. I kept
wondering, why does Google think that I'm interested in this
content? What have I done in the past to make Google believe
that? I was wondering for a while.
So in xRay, what we are trying to do is increase transparency and
awareness by tracking and exposing what services do with users'
data. And we're trying to do this in a very generic way and
without controlling the web services. These are real services out
there, and we're trying to build tools for end users to get more
awareness.
Okay. Now, this sounds very much like information flow control or
information flow tracking. And we all know how to do information
flow tracking in controlled environments: in an operating system,
in a controlled distributed cloud infrastructure, and so on.
However, the big question in xRay is: how do we track information
when we have no control over these clouds? How do we track
information in the open internet? That's a very big, very daunting
question.
>>: From the cloud perspective?
>> Roxana Geambasu: From the cloud perspective, exactly, right.
So we're the client. We want to know what they're doing with our
data; we've uploaded the data to them. For a very long time, two
or three years, I've been looking at all of these storage auditing
systems that exist, [indiscernible] retrievability and so on, and
asking myself how you can do that for accesses.
And a year and a little bit ago, I thought of a couple of insights
that I think have led to pretty good early results.
Okay. So the idea, the key insight, relies on this observation:
oftentimes, the use of your information comes back to you in a
diluted form, in the form, for example, of targeted ads, of
products that are recommended to you, of prices that are modified
based on your data, of videos, and so on.
Okay. So you input your personal data into the cloud, the cloud
does its magic, and then it affects the output that you're seeing
on the cloud's website, or on another website with which the cloud
has shared data, if that happens.
Intuitively, if these are the inputs and these are the outputs,
and you look at the correlation between the inputs and the
outputs, you may be able to tell which data led to which
[indiscernible].
>>: [indiscernible]. Could you [indiscernible] somehow?
>> Roxana Geambasu: It's related to [indiscernible] marking,
except that it's real data that is not [indiscernible] in any way.
You could improve our system that way, but we don't do
[indiscernible] yet, that is, creating uniqueness in the units.
Sorry, I haven't told you yet specifically what we do, but we keep
the user's data [indiscernible], okay? It's related.
All right. So that's what we want to audit: the correlation
between these inputs and these outputs. Now, even stated this
simply, the problem is still too broad and complex and abstract.
So how do we make it more [indiscernible] so that we can make
progress on it?
Well, through [indiscernible] assumptions. First, for example, we
assume that users and auditors know what inputs and what outputs
to track. For example, a user might want to track how some of his
most sensitive emails are being used to target ads. Can you tell
me how many minutes I have?
>> Helen Wang: [indiscernible].
>> Roxana Geambasu: How many minutes?
>> Helen Wang: About five.
>> Roxana Geambasu: Five minutes, okay. I can do that, yes.
Thank you. Okay. There are a number of other assumptions, but one
important one is that, for now, we focus on very specific
scenarios, and I want to talk very briefly about them. First is
ads on Google: we want to diagnose which emails have led to which
ads in Gmail. That's one thing, and that's what we're focusing on
the most for now.
Second, we want to diagnose products and prices on Amazon. For
example, when you search for something on Amazon, you get a bunch
of recommendations as output, and their order differs depending on
what account you are: some users may be believed to be more
interested in more expensive stuff, others in cheaper stuff. So
the order matters, and we want to understand which of the previous
searches or purchases led to this order.
>>: [indiscernible] machine learning algorithm?
>> Roxana Geambasu: Not reverse engineering. We don't reverse
engineer. The [indiscernible] is a black box, treated as a
function that combines a bunch of inputs in a particular way. I
don't want to assume almost anything, and I don't really want to
know what happens inside. I just want to know correlations
between the inputs and the outputs. Reverse engineering would mean
that I understand causality, and I cannot and will not do that.
Instead, what we will understand is correlation between the inputs
and the outputs. That's very different and much more limited.
>>: So given this information [indiscernible], for example, if you
knew which personal information was used to [indiscernible], could
you reverse it?
>> Roxana Geambasu: Well, that's a great question. We don't do
anything right now, but I think there are solutions. For example,
what can I do so that Google does not think of me that way? What
should I be looking for?
>>: [indiscernible] what would happen if they provide certain
[indiscernible].
>> Roxana Geambasu: Maybe. Maybe that's a [indiscernible].
[Multiple people speaking.]
>>: [indiscernible].
>> Roxana Geambasu: Yes, for emails it's hard. Potentially for
searches, you may say something like: oh, if you're searching for
this, be aware that you're going to get, I don't know, higher
prices or something, so you may not want to search for it. Or:
you searched for this, so you're going to get higher prices now;
search for this as well and you will not get the higher prices
anymore. Or something. I don't know.
>>: [indiscernible] sue the company for discrimination.
>> Roxana Geambasu: So I'm talking
>>: [indiscernible].
>> Roxana Geambasu: There are many uses of this. I think there
are many uses of this, and one of the biggest is for people like
journalists, for example, to leverage this tool and raise these
issues. And on Monday I'm actually meeting with a journalist from
[indiscernible] who... anyway, I can talk to you about it.
>>: [inaudible].
>> Roxana Geambasu: Yes, I do. Okay. But let me tell you very
briefly how it works. What's the mechanism? What's the
architecture? Well, the way this works is that you have a primary
account, and then we create a bunch of virtual accounts for you.
We don't necessarily need to create them; we can reuse them from
others, but I won't talk about that. The way to think about it is
that there are a bunch of virtual accounts which contain subsets
of your data, not the whole data, but subsets. These lead to
certain output sets, and then what xRay does is look at the
differences between these input subsets and the commonality
between the accounts and the outputs; that's one way to think
about it.
>>: [indiscernible] it's not generating new data [indiscernible].
>> Roxana Geambasu: It's not generating new data. It's subsets of
the same data, similar data, okay?
>>: Except that I think that the cloud is very [indiscernible].
>> Roxana Geambasu: Amazon is very deterministic, as it turns out.
[indiscernible] is not that deterministic. What that means is that
we need more virtual accounts to get better coverage, whereas with
Amazon we need very few.
>>: [indiscernible] emails, and how do you get a subset of the
data [indiscernible]?
>> Roxana Geambasu: So it varies, and I'm going to talk to you
later because I have one minute for this. It varies. One way to
do it is to create the virtual accounts and forward your emails to
these virtual accounts. But we don't actually want to only do
this, because it scales very poorly. What we will do is something
like, what is it called, collaborative auditing, where I can
actually use data from your account and correlate it with mine.
So if we've sent similar emails in terms of ad signatures, then
we're going to match them together, and I'm going to say, oh, I'm
going to use Helen's email. I don't know the contents of your
email; I just know the signature, which goes not necessarily to me
but to this somewhat trusted cloud provider.
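As one hypothetical way such ad-signature matching could work,
here is a short Java sketch that derives a content-hiding
signature from an email so two users can discover they hold
similar emails without sharing text. The feature choice (a sorted
keyword set hashed with SHA-256) is invented for illustration; the
talk only says signatures are matched, not how.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Locale;
    import java.util.TreeSet;

    // Hypothetical sketch: two emails with the same ad-relevant keywords
    // produce the same signature, so accounts can be matched on
    // signatures alone, never on raw email contents.
    final class AdSignature {
        static String of(String emailBody) throws Exception {
            TreeSet<String> keywords = new TreeSet<>();
            for (String w : emailBody.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (w.length() > 4) keywords.add(w);  // crude keyword filter
            }
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(String.join(" ", keywords).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();  // equal signatures => candidate match
        }
    }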
But I'm going to tell you about that later. Very quickly, just as
an example: let's say that you have this account, and you have
this distribution of your emails across the virtual accounts. In
order to diagnose this ad, suppose that these accounts see the ad
and this one doesn't. Then the fact that email two is the common
one here, and that account three, which doesn't have email two,
doesn't seem to see the ad, means that "email two targets ad one"
explains the observation. So that's kind of how it works, and we
have a simple Bayes network to do this.
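To make that diagnosis step concrete, here is a minimal, runnable
Java sketch of the overlap-based scoring just described: each
email is placed into a random subset of virtual accounts, and an
ad is attributed to the email whose account subset best matches
the accounts showing the ad. The random placement and the simple
agreement score are illustrative assumptions, standing in for
xRay's actual Bayesian model.

    import java.util.*;

    // Illustrative sketch of differential correlation over virtual
    // accounts. Each email goes into a random subset of the accounts; an
    // ad is attributed to the email whose account subset best matches
    // the set of accounts in which the ad appears.
    final class XRaySketch {
        public static void main(String[] args) {
            int numAccounts = 8;
            Random rng = new Random(42);

            // placement maps each monitored email to the accounts holding it.
            Map<String, Set<Integer>> placement = new LinkedHashMap<>();
            for (String email : List.of("e1", "e2", "e3")) {
                Set<Integer> accts = new HashSet<>();
                for (int a = 0; a < numAccounts; a++)
                    if (rng.nextBoolean()) accts.add(a);
                placement.put(email, accts);
            }

            // Pretend the ad appeared exactly in the accounts holding e2.
            Set<Integer> adSeenIn = placement.get("e2");

            // Score each email by agreement between its accounts and the ad's.
            String best = null;
            double bestScore = -1;
            for (Map.Entry<String, Set<Integer>> e : placement.entrySet()) {
                int agree = 0;
                for (int a = 0; a < numAccounts; a++)
                    if (e.getValue().contains(a) == adSeenIn.contains(a)) agree++;
                double score = (double) agree / numAccounts;
                if (score > bestScore) { bestScore = score; best = e.getKey(); }
            }
            // Prints e2, barring an exact placement collision between emails.
            System.out.println("Ad best explained by " + best);
        }
    }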
These are very early results. To monitor 15 emails, this is the
number of virtual accounts, and this is the precision and recall.
What we see is that precision and recall get pretty high: with
zero optimization, really the first thing we tried, recall is 76
percent and precision is around 87 percent for ten accounts. And
again, you know
>>: [inaudible].
>> Roxana Geambasu: Well, we are the ground truth. And by the
way, we are wrong oftentimes. Not wrong, but there are things
that xRay diagnoses where we say, oh, interesting. So we
ourselves look at the emails and the ads and match them with our
own minds, and sometimes we believe that we are wrong. We do not
have complete ground truth, okay? So it's a little bit of a fuzzy
ground truth. On Amazon, we did have ground truth, because for
one particular feature they actually tell us the ground truth:
you're seeing this because of that. For the ads, we don't have
that.
And I'm not going to show you those results, because there are
many more of them. So in any case, in conclusion, what I'm trying
to say here is that today's practices are oftentimes loose and
overly permissive, in terms of hoarding and accumulating data and
in terms of opaqueness to the users. My research aims to create
new abstractions to support responsible data management, which
consists of two things: curbing data accumulation and increasing
transparency. And I've shown you one example of each. Thank you
very much. I'm two minutes late; I think many minutes late.
[Multiple people speaking.]
>> Roxana Geambasu: Thank you very much for your questions also.