>> Helen Wang: Good morning, everyone. It's my great pleasure to welcome Dr. Roxana Geambasu, who is a professor at Columbia University. She's going to tell us about her recent work on responsible big data management.

>> Roxana Geambasu: Thank you very much, Helen. Thank you very much, everyone, for coming. I'm very pleased to be back here after two and a half years or so, and I'm very pleased to tell you what I've been doing for the past year and a half. This is a new topic that I'm starting on: new abstractions for what I call responsible big data management. We live in a world of big data. Really big data, right? For example, it's estimated that just two days of current data production is equivalent to what the world had produced since its very beginning. That's gigantic, right? And this includes browsing habits, social media, GPS and [indiscernible] data from smartphones, videos from smart glasses or surveillance cameras, and so on. So what's been driving this data surge? Well, it's a set of new technologies: mobile and wearable computing, cloud computing, and huge-capacity disks, which enable the acquisition, processing, and storage of unprecedented amounts of data. And this technology has essentially transformed our world from an old world of isolated desktops, which gathered and used only sporadic data from one user, into a world in which mobile devices in our pockets, on our wrists, on our noses, and so on are pouring enormous amounts of information into giant-scale clouds, where they get stored and processed together with millions or billions of other users' data. Okay. And that's really great, because big data has enormous potential, as I'm sure all of you know, potential for all walks of society: business, science, government, and so on. For example, it can be used to improve business revenue through effective product placement and targeted advertisements. And it's also been touted, right, as the next big driver of scientific breakthroughs. Now, this immense potential of data has led to what I call a true rush, in which everybody's eager and extremely excited to acquire data and leverage it for something new. Everybody's asking things like, what data can we gather, what can we use it for, and so on. And that's really great, because these questions foster innovation and future great uses of the data, but they also raise serious concerns. Specifically, because of that excitement and frenzy, a lot of people are now taking dangerously permissive behaviors with respect to the data. For example, we're seeing in a lot of places aggressive accumulation: every click stream, search, and purchase is being monitored and analyzed and archived, right, within a giant-scale cloud as well as on mobile devices. And it's archived potentially forever, because disks are huge, so there's no need to ever delete anything. Data acquisition is also ad hoc and obscure oftentimes. We have some applications gathering data outside their scopes. For example, the Facebook Like button, as you probably already know, tracks your visits across all websites that include the button, and not just on Facebook, right? And finally, all of these uses and acquisitions of the data happen without the user's knowledge or control, right.
The user has no idea where his data is being accumulated, on which device or on which cloud service his data resides, what it is being used for, and whether these uses are good for him or bad for him, right. For example, take my Facebook likes: using them to recommend fun movies is a good thing. But using them to drive my health insurance prices is not a good thing, right. So for most types of data today, aside from perhaps health data, banking data, and a few other types, it's pretty much a lawless land, this big data world. A bit like the Wild West, with hardly any principles to govern it. And that's very dangerous, particularly in the context of today's increasingly aggressive attacks. For example, mobile devices, where a lot of this data originates and is cached, are extremely prone to [indiscernible] loss. Similarly, clouds, which accumulate and archive much of this data, are magnets for increasingly sophisticated attacks: hackers, subpoenas, foreign spies, insiders, and so on. Now, of course, application writers and cloud providers and the like already deploy traditional protection tools, some of which are particularly powerful, especially the newly developed encrypted databases that permit computation on top of the data while keeping it encrypted. However, despite these great advances in protection technology, protection systems are not perfect either, right? Hackers do find their way around firewalls and intrusion detection systems, as these snippets show. And to date, there's no such thing as an encrypted database that supports fully fledged, arbitrary data analytics. So what I believe is that instead of letting data accumulate forever, accumulate [indiscernible], and doing so in very obscure ways, and then trying to protect that data in all sorts of ways, it's now time for us to be setting some ground rules for this big data game. So what should society's rules be for collection and use of the data? How do we weigh the trade-offs between privacy, security, and functionality? Those are the kinds of questions that a lot of people have been asking lately from a principled perspective, and that I ponder through my research. Now, I cannot answer any of these questions, just to be very clear. But I do believe that there is significant room for a stricter and more responsible approach to big data management, and thus far, I've identified two directions for improvement. First, I believe that data accumulation should be much more restrained and principled than it is today. Programmers should reason about the data that they accumulate, whether it is all needed, and whether it can be trimmed for security. Sometimes the answer will be no. But in other cases, it may be yes, and later in this talk I'll show you a concrete example, a situation where you can minimize accumulation without affecting functionality. Second, I believe that there is an enormous need for more transparency for users into what data is being accumulated, where it is stored, how it is being used, with whom it is being shared, and so on.
Now, the key challenge here is to meet these goals, fulfill these principles, however you want to call them, without affecting performance and functionality.

>>: And also productivity.

>> Roxana Geambasu: Hm?

>>: And also productivity.

>> Roxana Geambasu: And also productivity, that's true. That's exactly right. And that's actually a very, very good point, and it ties exactly with what I say, because programmers and users alike have no support from the operating systems on their mobiles or the infrastructures in their clouds to apply these principles that I've just defined, right? Just as an example, think of a modern operating system. A modern operating system itself is incredibly dirty and opaque, right? It leaves bread crumbs everywhere, bread crumbs of deallocated application data, and it provides no information about what object or bread crumb is stored where, right? For example, on your mobile device, do you have any idea what data you have stored there? If you lose it, do you have any idea what you've lost, right? Probably not. So my research focuses precisely... Yes, please.

>>: [indiscernible] Snapchat, it's kind of [indiscernible].

>> Roxana Geambasu: Yes, please.

>>: I agree with you, people are [indiscernible].

>> Roxana Geambasu: Yeah, that's a great point. People are starting to look into that and creating services, noticing that, look, this has gone out of control, right? Accumulation has gone out of control, and there are a couple of services that are taking this stand that I'm arguing for here. So I don't want to claim that I've invented this problem, or that I'm the only one who's thinking about these problems. No way, right? But what my work does, and I'll tell you in a minute, is to design and build, and at times deploy, new operating system abstractions and distributed systems abstractions to stimulate and promote responsible data management by programmers. And specifically, for the past year and a half or so, my students and I have been working on a number of projects on this topic, and I'm listing here four of these projects, two in each direction that I mentioned in the previous slide: limiting accumulation and increasing transparency. They're each at very different stages of completion, and here I'll focus on two of these, these yellow ones, one for each direction, just to give you a gist of what I really mean by these directions. The first one was published last year at OSDI, and the second one is very much work in progress. So you'll see a big transition between stuff that I know and stuff that I'm speculating a lot about. Okay. All right. So let's get started with the first example. I'll first talk about CleanOS, a system that we have built to limit mobile data accumulation with a new process called idle eviction. And it showcases the first principle, that is, how an operating system abstraction can improve data security on a mobile device by limiting accumulation. Okay. So let's focus on mobile devices for now. As you all know, these devices are taking over desktops as the primary computing platform for personal users. And they have a lot of advantages.
I'm not going to go through them here. But despite these great advantages, mobile devices also bring a number of challenges, and one such challenge, like I said before, is that whereas in the desktop world data used to be stored in a physically secured, firewall-enabled network, such as a home network or a corporate network, users now take their mobile devices out in the world, where they can be easily stolen, or lost, or seized, or shoulder-surfed. Okay. That's a big problem. Now, despite these threats, mobile operating systems, which, if you think of it, really are fledglings of desktop operating systems, you know, [indiscernible] kind of Linux, or [indiscernible] iOS or OS X, and so on, these mobile operating systems have not evolved to protect sensitive data from thieves and other parties that may capture the device. Just like their desktop ancestors, mobile operating systems mismanage sensitive data by allowing it to accumulate on the theft-prone device, and such mismanagement occurs at all layers of the operating system. For example, the OS doesn't securely deallocate data, applications hoard sensitive data for performance or convenience reasons, and so on. And, of course, all of this data that gets hoarded on the mobile device is placed at risk if the device is stolen or lost. For example, a thief can dump RAM or flash memory contents, or break passwords or fingerprints, and so on. And let me give you a few examples from a small study that we ran on Android to figure out what a thief would be able to capture if they were to steal the device and break some of these very basic protection systems. We wanted to find out just how much sensitive data he would get. For that, we installed 14 applications on Android Gingerbread, so an older version of Android, with default security settings. Among the applications we had email, password managers, document editing apps, and so on. And we dumped the contents of the RAM and the SQLite databases. For each application, a check in the table means that we found clear-text passwords or contents in the dump. For example, let's take email as a specific example. We were able to grep the clear-text password from RAM, as well as email snippets, at all times, okay. So they were hoarded there all the time. And, you know, Android RAM is not encrypted, so you can very easily get that. Also, everything is stored persistently, in a clear-text SQLite database. Okay. Overall, we captured sensitive data from 13 out of the 14 applications, and nine of these applications hoarded sensitive data in clear-text RAM at all times. That's what we found. And much of this data is exposed by the applications themselves, as you can see. And I don't believe it's because programmers are malicious or stupid, okay. I don't think that that's a good argument. I think they just lack appropriate support from the operating system to manage this data. If you think of it, operating systems don't have a notion of sensitive data, right. They don't treat sensitive data in any way that's different.
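[Editor's note: to make the study's method concrete, here is a minimal Java sketch of the kind of check described above, scanning a raw RAM or database dump for a known clear-text credential. The file name and the credential are made-up placeholders, not artifacts of the actual study.]

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DumpScanner {
    public static void main(String[] args) throws IOException {
        // Raw dump of an application's RAM (file name is a hypothetical placeholder).
        byte[] dump = Files.readAllBytes(Paths.get("app_ram.dump"));
        // The test account's known password, as planted during such a study.
        byte[] needle = "hunter2".getBytes(StandardCharsets.UTF_8);

        // Naive byte-by-byte scan for the credential in the dump.
        for (int i = 0; i + needle.length <= dump.length; i++) {
            boolean hit = true;
            for (int j = 0; j < needle.length; j++) {
                if (dump[i + j] != needle[j]) { hit = false; break; }
            }
            if (hit) {
                System.out.printf("clear-text credential found at offset 0x%X%n", i);
                return;
            }
        }
        System.out.println("credential not found in dump");
    }
}
```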
Yes, please?

>>: Two questions.

>> Roxana Geambasu: Yes, please.

>>: So first, the password, you mean like my email account password?

>> Roxana Geambasu: Your email account password, yes.

>>: And the second one is, what's [indiscernible] clear text, all my password [indiscernible], is a problem or not.

>> Roxana Geambasu: Yes, that is correct. So I'm talking here about a fairly sophisticated attacker. It's going to become clear in the next couple of slides that I'm assuming essentially that the attacker can do any physical attack against the device. They can dump the RAM. They can perhaps break through the user's authentication, and so on. Let me go through the next couple of slides, which is where I make that clear, okay? All right. So, of course, like you're saying, the big issue with sensitive data accumulation is that securing the data is really, really hard under particular threat models. And sure, of course, people should and can encrypt their file systems and their RAM, and use some of the existing automatic wipe-out systems to disable data after a device loss. However, I argue that these existing systems come with limitations. They're not entirely perfect. For example, statistics show that 56 percent of corporate users don't lock their devices. That's down one percent, it's true, from the same survey two years ago, but it's still very significant. Those who do lock their devices oftentimes configure extremely poor passwords, right. We all know that. Which essentially renders encryption useless, okay. And, of course, you have probably heard about the recent Apple Touch ID, the highly usable fingerprint-based authentication system for the iPhone. It's very useful, and I do believe that it increases security against certain types of attackers. Potentially not the sophisticated ones. However, you might also have heard that Touch ID was hacked by, was it Germans? Well, Europeans. I forget. Right.

>>: Hacked?

>> Roxana Geambasu: Well, the way it was hacked is very easy, right. The fingerprint is almost everywhere, right? It's on the device itself. And you can photograph it and generate a fake finger. There are videos that show how to do that, a fake [indiscernible] kind of thing that you put on the finger in order to authenticate yourself to the device. Okay, so they argued that a system that requires fingerprints, things that are everywhere, is not necessarily a good authentication system. Whether it is or not, the reality is that no protection system that we are talking about here is perfect, right. And in particular, what I argue is that these solutions are imperfect stopgaps, really, for operating systems that were never designed with physical security in mind. Desktop computers were assumed to be largely physically secured, right. And as a result, they lack abstractions for dealing with sensitive data in an appropriate manner. Okay. Does that answer your threat model question a little bit?

>>: Yes.

>> Roxana Geambasu: Okay, thank you. All right. So in this talk, we argue precisely for that.
We argue for the need for new mobile operating system abstractions for sensitive data, and we believe that rather than allowing sensitive data to accumulate unlimitedly on the device and then scrambling to protect it, mobile operating systems should instead try to manage sensitive data more rigorously and keep the device clean at any point in time, in anticipation of device theft or loss. And if the device is stolen or lost, then the operating system should ensure that a minimal amount of sensitive data can be exposed, and that users know precisely what data was exposed, okay. Yes, please?

>>: [indiscernible] password manager. I guess that in some sense can minimize the problem of setting a password on a device.

>> Roxana Geambasu: It can. Yes, it can. Or you can use key chains. Key chains have existed for a very long time, and so on. So there are solutions for managing certain types of data. But when you use a key chain or other such solutions to store your... well, I mentioned passwords because passwords are what people think about particularly when they think about sensitive data. I argue that it's actually not the password that I care about. It's really the data that I care about, okay. The password is just the lock that locks that data, okay. But really, to me, if they can read most of my email but they don't get my password, whatever. That's still extremely bad, right. Do you see my point? So for very specific types of data, maybe there are solutions, but the problem is that that's not a generalized solution, and I do believe that an abstraction within the operating system to manage sensitive data in general more rigorously is, to me, long overdue.

>>: So [indiscernible] this morning, I mean, Obama made a comment either today or yesterday saying he can't use the iPhone because of the security. And the [indiscernible] create a phone that Obama can use because [indiscernible] network hacks.

>> Roxana Geambasu: Eventually, but I'm not going to go there, because...

>>: Well, you know [indiscernible]. Because some people really care about their data safety, [indiscernible] otherwise you couldn't use a mobile device. But user studies show a lot of people, they don't care, right.

>> Roxana Geambasu: So actually, user studies show...

[Multiple people speaking.]

>>: So that half of the user base like to [indiscernible].

>> Roxana Geambasu: Right. So I don't have that statistic here, but there is a statistic run by Mozilla that actually shows that users, when they feel it is incredibly important for them to do so, they do configure, and this refers to configurations within Firefox, they do try to configure and protect their privacy, okay. So this argument that users don't care, I don't believe that it holds true. There are things that I really care about, okay, that are very sensitive for me, and there are things that, whatever, it's okay if they get leaked, right. So I don't believe it's so black and white, and I don't believe that we should disable operating system mechanisms just because not everybody needs them, okay. So that's kind of my argument.
I was going to give you one more example with respect to this. So, for instance, there are these mobile apps, and I know those on Android, I forget their names now. One of them is Vault-Hide, and the other one is something else. What they do is take a few types of your data, like your images: they know how to hide your images, or some images that you select, hide some contacts, things like this. And that's what they do, and they have, I think, between 10 and 50 million downloads each. I don't know what that means in terms of usage, of course. Downloads is one thing. Usage is another thing. But that may also tell you that we can't put a blanket on all users, right, or on any user at all times. Okay. There are situations when I care, situations when I don't care. So having that option, I think, is very important, to protect what you really care about. And I have another project, I'm not going to talk about it here, well, I don't know if I should say much at all, actually, because it's under submission, so let me just hold back on that. But it relates to this. Okay. All right. So where were we? Did I answer both of your questions?

>>: Yeah.

>> Roxana Geambasu: Okay, all right. Okay. So the point is that I believe it's time for new operating system abstractions, and that's what we did in CleanOS. CleanOS is our first step toward creating what I call a clean mobile operating system. It's an Android-based operating system that minimizes and audits the accumulation of sensitive data on a stolen or lost mobile device. And it does so by implementing an abstraction called a sensitive data object, or SDO, which we believe, as I said, is a long overdue abstraction within operating systems. What SDOs do is identify locations of sensitive data, both in RAM and on stable storage. CleanOS monitors the use of these SDOs by the applications and evicts sensitive data to the cloud whenever it is not under active use. So this eviction process helps maintain a clean environment on the device at any point in time, so that a potential thief can't get a free lunch by capturing the device. Instead, the thief has to go to the cloud to access any unused data. And at that point, the cloud can enforce a set of useful post-loss functions. Have a question?

>>: Probably just asking you to push the slide advance button, but it seems like this is going to have a pretty [indiscernible] on both power...

>> Roxana Geambasu: Yes, I will show you at the end.

>>: And disconnected operations.

>> Roxana Geambasu: Yes. I will show you at the end. I don't believe I have a slide on that, but I can talk to you about what we do about disconnected operations in particular. The next slide is going to address that, yes. Okay. So now, cleansing the device's operating system is a very broad vision and a very complex thing, because, as I said, there's dirtiness within all layers of the operating system. And here, we're going to focus only on cleansing data hoarded by applications themselves, not so much the lower levels, like OS buffers and so on, for which work has existed for a very long time. Okay.
Our design of CleanOS, to address this question indirectly a little bit, relies on a few crucial insights about mobile operating systems. You can think of them as assumptions, but I think they are largely true. First, although sensitive data is exposed permanently, much of it is actually used very rarely. For example, the email password is constantly being exposed by the email application. However, it's only used during refreshes. A particular email's content, similarly, is only used when the user reads that email, okay, not otherwise. Second, mobile applications also oftentimes have a cloud back end; according to our studies two years ago, that was the case for about 70 percent of applications, and that back end already stores the sensitive information, so why expose it on the device as well? Third, mobile devices are becoming increasingly connected, with pervasive wireless and 3G, 4G cellular coverage these days. And what we do is leverage these insights, these assumptions, to turn Android into a clean operating system, and I'll tell you a little bit about how we do that next. But we do include mechanisms for when some of these assumptions don't hold. All right. Okay. So the basic functioning of CleanOS is like this. Applications create SDOs and place their sensitive data in them. This way, they identify their sensitive data objects to the operating system. Okay. And CleanOS manages them rigorously by implementing three key functions. First, CleanOS tracks data within SDOs using taint tracking. It automatically adds any new data computed from the SDO to the SDO itself. Second, CleanOS evicts SDOs to a trusted cloud whenever an SDO becomes idle, that is, hasn't been used for a period of time. And the trusted cloud could be, for example, the application's own cloud or a third-party maintained service. And, by the way, I said that we evict the SDOs to the cloud; we don't actually ship data back and forth. Rather, we just encrypt the data in place and ship the keys back and forth; the keys are actually stored in the cloud. Yes, there is a question.

>>: This is a [indiscernible] I envision, it's a...

>> Roxana Geambasu: I didn't understand that.

>>: You say the classic cloud, are you thinking about an [indiscernible] cloud like I'm using [indiscernible]?

>> Roxana Geambasu: Yes.

>>: And now the OS, not just the OS, but the [indiscernible] and my Outlook app on my phone, they both have to understand this new abstraction.

>> Roxana Geambasu: Yes. That is correct.

>>: [indiscernible] specific. [indiscernible] app has to do this.

>> Roxana Geambasu: Yes, that is correct. So the point is that the operating system on the mobile device provides this abstraction. This abstraction has a back end. If the application has a back end, the key for the SDO can be stored in that back end. And the application essentially decides whether it wants to use this to cleanse itself; some applications may not need it completely, or may not want it at all. Other applications may leverage our services in order to do that much more easily than they would be doing otherwise, right. And the way they integrate with us is that, first, they implement this interface that I'm going to show in a second, and, second, they host a key server on the server side, in their cloud.
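[Editor's note: as a rough illustration of the abstraction just described, here is a minimal Java sketch. All of the names, SensitiveDataObject, SdoManager, createSdo, and so on, are hypothetical placeholders; the actual CleanOS interface may differ.]

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the SDO abstraction as described in the talk.
interface SensitiveDataObject {
    int id();                 // stable identifier, meaningful in post-loss audit logs
    String description();     // human-meaningful label, e.g. "email password"
    void add(byte[] data);    // place data under this SDO's taint
    boolean isEvicted();      // true when only ciphertext remains on the device
}

interface SdoManager {
    SensitiveDataObject createSdo(String description);
}

// How an SDO-aware email client might use it (illustrative only).
final class EmailAppSketch {
    void onLogin(SdoManager os, char[] password) {
        SensitiveDataObject pwd = os.createSdo("email password");
        pwd.add(new String(password).getBytes(StandardCharsets.UTF_8));
        // From here on, anything computed from the password (auth tokens,
        // headers, ...) is taint-tracked into the same SDO by the OS, and
        // the object is evicted once it sits idle between refreshes: the
        // data is encrypted in place and only the key lives in the cloud.
    }
}
```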
This is not something that we've done within CleanOS yet, but I believe in it, because mobile operating systems have evolved. They are very different now, in fact. They come from the same desktop operating systems, but they're very different, okay. One way in which they've evolved is by going to the cloud: most applications are cloud-based. Yet the operating system is still only local. I do believe that abstractions for managing data need to transcend the mobile and the cloud portions, okay. And by transcending, what I mean is that this SDO should actually exist on that side as well, okay.

>>: Has to be application specific. Because [indiscernible] store anything on a server, right.

>> Roxana Geambasu: Um hmm.

>>: [indiscernible].

>> Roxana Geambasu: Yes.

>>: [indiscernible].

>> Roxana Geambasu: So CleanOS does require changes to the application.

>>: That means I have to change the application. I have to change the protocol, my application protocol between my application and my server, and then that has nothing to do with...

>> Roxana Geambasu: I don't know that you need to change the protocol, right. I think the protocol can go the way it goes, and then on your server side, you need to be hosting a key server, okay, in addition.

>>: [indiscernible].

>> Roxana Geambasu: It is an assumption, yes.

>>: This trusted cloud does not have to be [indiscernible]. It could be...

>> Roxana Geambasu: It can be [indiscernible]. So there are multiple deployment options here, right. Either an application makes this conscious decision: I want to use this abstraction, I want to cleanse my data. What they will do then is host a key server, and the application will implement this abstraction, and that's how you do it, right. For applications that don't do that, we actually have default SDOs and those sorts of things to identify sensitive data ourselves, in fact. And you, the user, can take this over and say, no, I want my device to be cleansed, okay. In that case, you host the keys, the key server, yourself. Yes, please.

>>: So just to make sure I understand what we're getting with this interaction, we're not sending data off to the cloud.

>> Roxana Geambasu: No.

>>: Because the whole point of this is, all the cloud's really doing is giving us an opportunity to throw away the keys remotely?

>> Roxana Geambasu: Yes.

>>: One equivalent for that would be: imagine if we stored it all locally, we protected all of it with encryption, and then we take the last key at the very top of the tree and we hand that off to the cloud, and then we have to get it back later.

>> Roxana Geambasu: You can do this, and essentially what this will give you is an all-or-nothing kind of cleansing.

>>: So what do you get by breaking it down to a per-application basis?

>> Roxana Geambasu: Well, actually, even more than that. Breaking it down to a per-object basis.

>>: Why is that...

>> Roxana Geambasu: So the reason why it's useful is the whole idea of minimizing exposure. The point is, your mobile device accumulates a lot of email. Most of your emails will probably be on your mobile device, right? You don't read all of... just one second. You don't read all of your emails at the same time, clearly, right?
There are very few operations that have to access all of those emails, right? So why are they on the device, okay? Perhaps it's the next slide that will show you the use of this kind of restricted accumulation.

>>: So the concern you have is that I'm looking at my device, I read my email, and if reading my email involves decrypting, having the top-level key on the device, and then I set the device down and somebody steals it, they can see everything on the device, whereas in your system, they can only see the email...

>> Roxana Geambasu: Yes, that's correct. Directly, they can only see the email that you just read. So it's this whole idea of taking a device that accumulates a lot of things and minimizing that accumulation to the working set, pretty much: yours as the user, the application's working set.

>>: The question is, why does this need to be an application-involved abstraction? Why can't you do it at the page level? Sorry, you're asking the application to expose its working set. But why not just infer it?

>> Roxana Geambasu: So, a lot of reasons. Performance is one of them, right. It's good to differentiate between what's really sensitive and what's, whatever, Java stuff. There is an enormous amount of other data that's really not...

>>: That probably requires measurement, because the flip-side argument is you're asking for an invasive change. And if you could do this at the page level, it applies to all applications right now [indiscernible].

>> Roxana Geambasu: Um hmm. So I think you can get this at the page level. You can. If the only thing that you cared about was minimizing the accumulation, then you can do this, okay. But there is another question. What you want after you lose the device is to ask what has been potentially compromised. And if the unit of what you evict is the page, you'll find that page 0xAB75 has been potentially accessed or exposed on the device, okay. What does that mean for you? There is no meaning associated with a page, and hardly any meaning really associated with a file.

>>: There noise [indiscernible] storage backup to [indiscernible].

>> Roxana Geambasu: So what I argue is, especially for things that you need to understand, and not just auditing but hiding as well, and protection in general, you really need another abstraction, an object-level abstraction, because that's what you, the user, can actually understand, okay. And today, we're doing protection at much lower levels: at the disk block level, the page level, the file level, which are completely meaningless. What's a file, for example, on your mobile device? It's nothing. I don't ever see files on my mobile device, right? So it's a bigger spectrum of ideas that comes in and motivates this choice, right. Yes, please.

>>: Why do you need a trusted cloud? Because in your scheme, in some sense, [indiscernible] sending things over to the cloud, I could just choose to encrypt those things, and now [indiscernible] I throw away the key, and the next time, only if you enter some credential, right?

>> Roxana Geambasu: You could do that.
>>: [indiscernible] that credential would be used to retrieve the key that would be used for...

>> Roxana Geambasu: You can certainly do that. Again, you're asking... I should know what I can say and what I can't say here, you know? This is under submission. Is anyone reviewing for any of the security conferences?

>>: [indiscernible].

>> Roxana Geambasu: Never mind. The point is, some of these questions I'm addressing in some of the other work that I've done. This notion of having users, for example, hide their objects at the object level, hide their objects within applications if they so choose, is something that I've been looking at.

>>: [indiscernible].

>> Roxana Geambasu: Yeah. That's a possibility. It's possible. There are big challenges when you try to do that, actually. Because the nice thing about this is that the cloud is always available. If you're encrypting and requiring that... well, it's assumed to be always available. If it's not always available in CleanOS, let me just...

>>: [indiscernible].

>> Roxana Geambasu: Yes, please.

>>: I have to provide something, and that something presumably is [indiscernible] my fingerprint or whatever. They can use that information.

>> Roxana Geambasu: So what you are thinking is, whenever you need a particular key, you would prompt the user for that key? Is that what you're thinking?

>>: Whenever I log in, [indiscernible] unlock my phone.

>>: You lose some information, which is what gets lost.

>> Roxana Geambasu: Yes, that's right. You are losing it altogether.

>>: The [indiscernible] you [indiscernible].

>>: The data is not exposed to other people, but you may not remember [indiscernible].

>> Roxana Geambasu: Well, I don't think you can tell what...

[Multiple people speaking.]

>>: Hang on. The assumption here is that when you lose your device, there's a possibility of losing it while some of the data is not protected. So it's the same...

>>: [indiscernible].

>>: No, no, she's making two criticisms of your alternative approach. The first one is that in your approach, if you never lose the phone when it is not locked, your [indiscernible] is fine, because there's no auditing required, because there's no data that can possibly be lost. So Roxana is assuming that you can lose the phone when it's unlocked. [indiscernible] losing your phone when it's unlocked causes the system...

[Multiple people speaking.]

>>: Because this scheme...

>> Roxana Geambasu: Doesn't...

>>: One is, it has finer granularity locking. When your phone's unlocked, much of the data is still locked. Only the part you're looking at is unlocked.

>>: No, I'm not changing that. All I'm changing is [indiscernible] I could just throw away my [indiscernible] or encrypt my [indiscernible] key with internal information, my [indiscernible] password I use to unlock my phone. That's all I'm proposing to change.

>> Roxana Geambasu: Yes.

[Multiple people speaking.]

>>: You can't encrypt with your web password, because you're going to have to type that password in every time you go from one message to the next.

>>: Yeah.

>>: No, I'm not, right.

[Multiple people speaking.]

>>: When you turn off the phone, are you going to leave the key lying around?

>>: There's no key lying around.
>>: In the phone...

>>: The [indiscernible] lying around in the phone when it's on here, because [indiscernible] trusted cloud [indiscernible] retrieves whatever key, you have to authenticate yourself, and you [indiscernible] authenticate yourself again and again.

[Multiple people speaking.]

>>: No, no. Remember, when you're going to the cloud, one thing you can do is, from a different channel, you can revoke that connection, whereas you can't revoke the key that's sitting on the phone. Revocation is an important...

>> Roxana Geambasu: So there are two, well, there are two things here.

>>: Because when the device is lost, you call the IT department and say, please revoke the cloud key, which is...

>>: [indiscernible].

>>: Whereas you can't call the phone and tell it to revoke its key.

>>: [indiscernible].

>> Roxana Geambasu: So first of all, I did not propose a master key. What we have are per-SDO keys, per-object keys.

>>: [indiscernible].

>> Roxana Geambasu: You have to use the network, yes, that's correct.

>>: So that's [indiscernible] you don't want to store the authentication on the phone.

>> Roxana Geambasu: Yes, I can. So here's the way this will work, and I think it's my next slide. Is it? Yes, it's my next slide. So if you can hold off just a second, I'll be able to answer your particular question. And the idea is that we don't offer just minimized accumulation; we also offer auditing with that. Okay. So it will become clear. All right. So the idea is the following: the application implements this, and here I just wanted to show a couple of examples. Let's suppose that the email application has, for example, a content SDO, which corresponds to an email's content, and a password SDO, right. When you read your email, okay, that password is actually not used. It's only used during refreshes. So it is evicted. It doesn't exist on the phone for all intents and purposes, okay? It is still available; its key is still available in the cloud, all right. However, you have to go to the cloud to fetch that key in order to access it, because it's evicted, right. When you stop reading the email, when, for example, you send the application into the background, the content SDO is not used either. So both of them are evicted, okay? Did I scroll? How do I scroll? Okay. All right.
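[Editor's note: here is a purely illustrative Java sketch of an idle-eviction loop of the kind just described: each SDO records when it was last used, and a periodic sweep evicts anything idle past a threshold. The class, the method names, and the 60-second threshold are all made up for illustration, not CleanOS's actual parameters.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class IdleEvictor {
    private static final long IDLE_MS = 60_000; // eviction threshold (hypothetical)
    private final Map<Integer, Long> lastUse = new ConcurrentHashMap<>();
    private final ScheduledExecutorService sweeper =
            Executors.newSingleThreadScheduledExecutor();

    void start() {
        sweeper.scheduleAtFixedRate(this::sweep, IDLE_MS, IDLE_MS, TimeUnit.MILLISECONDS);
    }

    // Called on every access to tainted data belonging to the given SDO.
    void touched(int sdoId) {
        lastUse.put(sdoId, System.currentTimeMillis());
    }

    // Periodic sweep: any SDO idle longer than the threshold gets evicted.
    private void sweep() {
        long now = System.currentTimeMillis();
        for (Map.Entry<Integer, Long> e : lastUse.entrySet()) {
            if (now - e.getValue() > IDLE_MS) {
                evict(e.getKey());
            }
        }
    }

    private void evict(int sdoId) {
        // Stubbed out in this sketch: encrypt the SDO's data in place, zero
        // the plaintext, and drop the local key. Re-accessing the object now
        // requires fetching the key from the trusted cloud's key server.
    }
}
```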
So how do we use this? What's the major benefit of CleanOS, because I think that's what you're really asking. Well, it's the fact that it increases post-loss data control. For example, suppose the following situation. Your device gets stolen or lost at some point in time, and you notice the device is lost after a while. Initially, before the thief stole the device, he had no access to it. But after he steals your device, he gains full access to the device and any data stored on it. He can tamper with the device in all sorts of ways, both in hardware and in software. For instance, he can dump the contents of RAM, et cetera. With CleanOS, however, two things happen. First, because CleanOS keeps evicting idle data at a fine granularity, it ensures that only a few sensitive data objects are exposed on the device at the time of theft. Second, after theft, the cloud can implement a variety of useful functions on top of those SDOs that were not exposed, okay. For example, it can log all accesses to these SDOs, so that after the loss, you can go to the cloud and ask what data was exposed after theft. You can also disable all accesses to those SDOs that were still evicted at the time of theft. So what I'm trying to say here is that CleanOS gives you much better control and transparency over the sensitive data on your mobile device, and the way it achieves that is through fine-grained minimization of data accumulation, all right. Does that address the questions, the concerns that you have?

>>: The point is, when a phone is lost when it's unlocked...

>> Roxana Geambasu: Yes.

>>: Now we have a better way to control it, because the key is not stored on the phone.

>> Roxana Geambasu: Yes.

>>: The key is stored on the phone when it's unlocked, [indiscernible] lose control. And when the phone's locked, it should be the same.

>> Roxana Geambasu: Yes, that's right. Well, yes. Except there's one single difference. How do you know that your lock has held? Okay? How do you know? You lose the device. How do you know what's happened, right? Do you know for sure that that lock, that password that you set, cannot be broken? Can you tell? Maybe you can for your passwords. But maybe the random person can't. So the point here is that this way, you can...

>>: Are you making an argument that your password might be guessable?

>> Roxana Geambasu: Well, it might be, right? It might be. You know, my husband works in the financial industry, and I am horrified at the kinds of practices that they have with respect to passwords, particularly on their mobile devices. And as I told you...

>>: That's a pretty essential assumption here.

>> Roxana Geambasu: Yes.

>>: I think you're saying that if every user used 128-bit lock passwords every time they looked at their email, about 128 bits of [indiscernible], you wouldn't need this?

>> Roxana Geambasu: Yes, correct.

>>: Okay. Well, no, that's important. That's a valuable thing to communicate, [indiscernible] we understand that that's an important...

>> Roxana Geambasu: But that's obviously not feasible, right?

>>: I think that's a fine assumption. I just think it wasn't clear until now.

>> Roxana Geambasu: Okay, okay. Right. So my point is the following. Irrespective of exactly what attack, what protections you're talking about, and so on, the basic concept that I'm trying to communicate to you is that letting data accumulate enormously and then trying to protect it in all sorts of ways, to me, is not an indication of a responsible data management scheme, right. What you should do, in my mind, is what Snapchat does, for example. Or what we think it does. Which is to reduce the accumulation, because then you have a lot more control. At least when the bad thing happens, if it happens, because attacks are inevitable, unfortunately, in today's world. When attacks do happen, at least you know for sure that the least amount is compromised, and second, potentially, if you have this kind of an architecture, you can audit what the exposure was. And that's what I'm trying to communicate here, right.
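[Editor's note: here is a small Java sketch of the cloud-side post-loss functions just described: the key server logs every key fetch (the audit trail) and can refuse further fetches once the device is reported lost (remote disable). This is an assumption-laden illustration, not CleanOS's actual key-server protocol; all names are hypothetical.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class KeyServerSketch {
    private final Map<Integer, byte[]> keys = new ConcurrentHashMap<>();
    private final List<String> auditLog = Collections.synchronizedList(new ArrayList<>());
    private volatile boolean deviceReportedLost = false;

    void storeKey(int sdoId, byte[] key) {
        keys.put(sdoId, key);
    }

    // Every fetch is logged; after a loss report, evicted SDOs become unreadable.
    byte[] fetchKey(int sdoId, String deviceId) {
        auditLog.add(String.format("%tFT%<tT device=%s fetched key for SDO %d",
                new Date(), deviceId, sdoId));
        if (deviceReportedLost) {
            return null; // remote disable: refuse to release the key
        }
        return keys.get(sdoId);
    }

    void reportLost() {
        deviceReportedLost = true;
    }

    // After the loss, the owner asks: which SDOs could the thief have read?
    List<String> exposureReport() {
        return new ArrayList<>(auditLog);
    }
}
```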
Irrespective of the specifics of a particular attack, I think that's a concept that is applicable for mobile devices, for clouds, and potentially for other environments as well, such as corporate networks, okay?

>>: [inaudible]. If there's a strong password to lock the device, then you don't need this.

>>: Well, sorry, [indiscernible] if your device is stolen while unlocked, there's a question of how much stuff is there. So there are two entry points that the attacker has. One is, if the attacker gets ahold of the device while you are using some of the state, [indiscernible] how much of that state is accessible, and also being able to tell remotely how much of it could have been accessed. The second observation is, even if you did have the device locked, what if it was a crummy password? [indiscernible] same guarantees. In other words, a crummy password looks like an unlocked device. So the question is how much trouble could have been...

>> Roxana Geambasu: So I thought that the alternative was to have one password per object [indiscernible].

>>: Suggesting a strong password. A strong lock password.

>> Roxana Geambasu: A strong lock password. If you have encrypted RAM, and of course you're encrypting your disk, then exactly what you said is valid, right.

>>: Except the [indiscernible].

>> Roxana Geambasu: Yes.

>>: I think [indiscernible].

>>: And if you use a strong password, unlocking it is not easy, right? Pretty much, I think you won't be able to break it, because the firewall [indiscernible], whatever it is, can make the bar really high. That means the key [indiscernible].

>> Roxana Geambasu: So, by the way, that's very dangerous, the whole [indiscernible] five-times thing. And maybe it's okay now, because all of your data is really in the cloud and you can just reset your entire device. But the five-trials wipe is dangerous, especially if your key is only on the local device, not in the cloud, because then you can brick your device and all of the data that's stored there, if there is data that's only stored there...

>>: [indiscernible] opposite way. Now you use the cloud to back up that key in a way that takes a greater burden to unlock. But there, the cloud is providing disjunctive access to the device rather than conjunctive access to the device. In your approach, the cloud needs to get involved if you want to get to the data.

>> Roxana Geambasu: Yes.

>>: And the way you get to the access is that you [indiscernible] your key somewhere, which is disjunctive.

>>: [indiscernible] trade-off of increasing the complexity of the password and increasing the limit of attempts, right? Practically, it won't be repeated that many times, and it's [indiscernible] unlock the device once locked.

>> Roxana Geambasu: So, you know, there is nothing there that I can argue with. But the reality that we live in is this: users oftentimes don't do that.

>>: [indiscernible].

>> Roxana Geambasu: No, no, I'm talking here, actually, about 56 percent of corporate users.

>>: [indiscernible].

>>: You're one of them, so there are...

>> Roxana Geambasu: Yeah, I am one of them.

>>: [indiscernible] I choose to not have a pin, because it's that much more convenient, over my security. I have the pin just because [indiscernible] requiring me to do that.
But the moment I can...

>>: [indiscernible] it's because if I lose it, I want people to use that to call me.

>>: That's another reason.

>> Roxana Geambasu: That's a great use case here, because you lose it, right, and somebody good, somebody nice, finds your phone and wants to call you. You go to the little server there and you look: is this someone nice? I don't know who this guy is. Has he looked at something significant or not?

>>: [indiscernible].

>> Roxana Geambasu: It's true, right, but at least you know, right? What can you do? You're completely right, but at least you're aware. So again, lack of transparency, lack of [indiscernible], and so on are very problematic for users today, I think. And we have too much of that. So I agree with you on this, but on the other hand, I think there is great room for improvement. What you're saying, essentially, is: well, if I had a complex enough password and unlock system, this would all be fine. But really, what we're doing, and we've been doing this for too long, I think, we systems people, is pushing the responsibility onto the end user, right. So we let stuff accumulate and then we say, ah, the user will fix this, because he will use a strong password. And he will have to type it a thousand times per day. I don't know, I'm making this up, right, but he'll do that if he cares, okay. And the reality is that oftentimes, you don't care until you care. So until you lose that device, maybe you prefer [indiscernible] and so on, not having to unlock things and to type things and so on. But once you lose the device, you think, oh, my God, what was there? Shoot. What did I lose? And that's the question that this is actually trying to answer. First, it's trying to ensure that it doesn't happen. And second...

>>: I just speak for myself. When I lose a device, it's not a huge deal, because I don't have secrets that... I mean, I worry about my financial safety, but every financial application requires a password to log me in. So they wouldn't be able to use my apps, [indiscernible] Bank of America, and [indiscernible] confidentiality on my data, like my pictures and my email, I mean, they won't be able to use anything really harmful to me.

>>: What you're arguing is that financial organizations are already doing this on a per-application basis.

>>: That's it. What I care about is that anything that wants to do this protection could do it at the coarse granularity and as [indiscernible]. I don't care [indiscernible] willing to type in a password every single time. I never want to do that.

>>: The thing that you're assuming, that is different from what Roxana is assuming, is that in your example, you're invested in protecting your financial data. What Roxana cares about is, for example, the corporate user who isn't terribly invested in protecting Microsoft's [indiscernible]; yeah, there's a policy, I've got to follow it. But you're saying, can we arrange the device in a way where we can lower the cost to you as an end user of providing that security, of participating in protecting [indiscernible] data, even though personally it's hard for you to care on a day-to-day basis about that [indiscernible] with the same level of conviction that you care about your stock options disappearing, right?
So that's, I think, the way in which your example didn't apply to her argument: you say, well, the applications I really care about, those are being protected. But if every single Microsoft internal site that had [indiscernible] on it made you type another stupid password, or remember a password and deal with some sort of key chain thing on your own, I mean, that's where that...

>>: But we are not talking about an app now. We're talking about using a [indiscernible] to access...

[Multiple people speaking.]

>>: You generalize. [indiscernible] is on your phone.

>>: We generalize: you can make an app, and you can make the SkyDrive Pro [indiscernible] just like Bank of America. And we'll sign out every single device, and now it's encrypted. You could just do that [indiscernible].

>>: This is more usable. You don't have to enter a password every time.

>>: But this makes a [indiscernible]. This is more usable?

>> Roxana Geambasu: So I don't want to argue either way. The argument that I'm really trying to make, again, at a much higher level than going into the very specifics of each case, is that I'm trying to investigate what it means to accumulate enormous amounts of data, as we are technically capable of doing today, okay. What does it mean, okay? And how can we control that accumulation in some way? Because there are, and I'm showing here one case in which I do believe this, scenarios, situations in which this minimized or limited accumulation is very useful, okay. That's the broader argument I'm trying to make, right? So...

>>: [indiscernible] I like what you say: users don't lock their devices and configure poor passwords. However, if they have a strong password and keep it locked all the time, [indiscernible] unlocked and [indiscernible] minimal. That's going to be very unusable.

>> Roxana Geambasu: Um hmm.

>>: You argue that there is a big usability issue with relying on strong passwords and short lock intervals.

>> Roxana Geambasu: Um hmm.

[Multiple people speaking.]

>>: In comparison, does your scheme [indiscernible] usability?

>> Roxana Geambasu: Well, from certain perspectives, yes. From other perspectives, no. In particular, your device needs to be connected, by and large, in order for you to be able to access your data.

>>: [inaudible].

>> Roxana Geambasu: It's always connected, unless you're in the New York City subway, and that's when it doesn't work. Hm? I'm sorry?

[Multiple people speaking.]

>>: I'm curious about the answer. So let's say it's always connected.

>> Roxana Geambasu: Yes.

>>: Does it present some usability benefit so that my data is protected?

>> Roxana Geambasu: So I'm not going to argue that with CleanOS it's now good and fine for you to not lock your device. That's indeed your question, right? Well, if I know that my data [indiscernible] access, what good is it for me that I know, right? So I think protecting your device is still important. Perhaps the pressure to do that is not as high as before, potentially. But I'm a full believer in protecting your device as you normally would, right. But with this additional...

>>: There's a trade off.
So the user doesn't need to spend so much effort, you know, on a hard password or [indiscernible] say unlock, unlock.

[Multiple people speaking.]

>>: It's a trade-off, [indiscernible] but here, if the window between when your device is lost and the time you realize that you've lost it...

[Multiple people speaking.]

>>: Doesn't protect anything.

>> Roxana Geambasu: Doesn't protect.

>>: [indiscernible] protect you, but tells you what data was accessed.

>> Roxana Geambasu: Yes, that's right.

>>: [indiscernible] lose your data.

>> Roxana Geambasu: Yes, correct.

>>: And nobody knows, right?

>> Roxana Geambasu: That is correct, but there are two...

>>: She's [indiscernible] are you talking about Roxana's proposal?

>>: Yeah, I'm talking about [indiscernible] that window, you're not being protected. You're only collecting [indiscernible] information, say, oh, that sensitive email was downloaded to my device through that window, even though I [indiscernible]. But during that window, nobody is protecting you.

[Multiple people speaking.]

>>: There is [indiscernible].

>> Roxana Geambasu: But not just [indiscernible].

[Multiple people speaking.]

>>: The keys are stored in [indiscernible] and unlocked and it's just you. I don't ask you to type a password, a pin. You just click on that email, you retrieve it...

>> Roxana Geambasu: So you retrieve the key, okay. Here is what will happen in that case, right. The thief will try to access this email and that email and that email. He will always hit the cloud, okay, because they're all evicted, let's say. And then you will see in your audit log that this and that and that have been accessed, okay? In addition to that, you may ask, well, why doesn't the thief then just try to access all of them, decrypt all of them, and get everything? That, CleanOS would not prevent. You can do that, but note that on the cloud side, you could also be monitoring the accesses and so on, and kind of detecting when the access rate is very high. Um hmm.

>>: [indiscernible] right, because [indiscernible] if I'm doing search. Search my mailbox. You want to allow me to do a local search?

>> Roxana Geambasu: You know, that's a good question, right. Well, first of all, a search typically goes into an index and so on.

>>: [indiscernible] on the phone to search [indiscernible] no big deal, right?

>> Roxana Geambasu: No, it's not a big deal.

[Multiple people speaking.]

>>: I need to decrypt a lot of data to search.

>> Roxana Geambasu: Right, but what I mean to say is that it's going to go into an index, so you don't necessarily access all the contents of your entire email.

>>: The index is valuable.

>> Roxana Geambasu: The index is. So, and I believe...

>>: With the index, I can reconstruct all of the data, so I think you...

[Multiple people speaking.]

>> Roxana Geambasu: You're right, you're right.

>>: I don't think you want to claim that revealing the index is any weaker than...

>> Roxana Geambasu: Yes, you're right. And, in fact, I don't know if this is covered, but I think that you would actually include the index. You would actually be including it in the SDOs themselves.

>>: [indiscernible] saying once you get into the territory of behavior-based monitoring, it becomes very messy in some sense.

>> Roxana Geambasu: Well, I don't know. I think that there are things that can be done. Big data is used to derive behavior.
So anyway, why don't we use it for security as much?
>>: That would apply to the mobile OS as well, right? This device was always in the Microsoft building; if it's not in the Microsoft building during work hours, I'm going to [indiscernible] start asking for the PIN every five minutes. That's the type of behavior-based policy that can be done in the cloud, because we do that in the mobile app. That's what I'm saying.
>> Roxana Geambasu: You can. And again, as you mentioned, if you're outside, leave it to the user to secure it. So in general, I think protection systems, in the end, will touch the user; they will have an impact on the user. The more you increase your protection, the more the user will be affected. I think there is a fundamental trade off here, and so there is a careful balance you need to think about: how much protection do you want, given that you lose usability, like what you've been talking about. This trade off needs to be set somewhere, and how useful CleanOS is depends on where you set it. If you want to make usability a priority, protection goes down: you don't require the user to enter a PIN every five minutes, you don't require a very complex password, and so on. Protection goes down, usability goes up, and CleanOS becomes more important. Vice versa, when you make protection much more important, usability goes down, and the usefulness of CleanOS, in a sense, goes down as well. So that's one way to think about it. I don't know if this makes sense.
>>: So now I'm a little bit confused. Can you remind me of the benefit of eviction for SDOs? Exactly what's the benefit?
>> Roxana Geambasu: So two things. You evict at a fine granularity, an object granularity, and you get two things. First, you get minimal accumulation: the property that the minimal number of objects is exposed at any point in time. And second, you get the auditing benefit and the remote control benefit for the evicted objects.
>>: If an attacker gets the device when it's unlocked, he could basically access [indiscernible].
>> Roxana Geambasu: Yes. As I said, the attacker could potentially do that. And there could be multiple types of attackers. Some attackers will try to do that; other attackers won't. Like you're saying, maybe somebody good has actually kept your dear device; it's not really an attacker, and they don't snoop around. As a user, you have no transparency into that post theft. You have no idea what's happened.
>>: [indiscernible] purposes, not like work, access [indiscernible].
>> Roxana Geambasu: So it's minimal accumulation, just on its own.
>>: Seems like mostly [indiscernible].
>>: I think the stolen device is just one of the theft scenarios here.
>> Roxana Geambasu: Yes.
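For concreteness, the eviction flow discussed in this exchange, where each access to an evicted object must fetch its key from the cloud and thereby leaves an audit trail, can be sketched roughly as follows. This is a minimal illustration in Java; the class and method names (SensitiveDataObject, CloudKeyService, and so on) are assumptions made for the sketch, not the actual CleanOS interfaces.

    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: an object whose decryption key can be evicted to the cloud.
    // While evicted, every access round-trips to the key service, which is
    // what produces the per-object audit log discussed above.
    class SensitiveDataObject {
        private final String id;
        private byte[] key;                  // null while evicted to the cloud

        SensitiveDataObject(String id) { this.id = id; }

        byte[] decrypt(byte[] ciphertext, CloudKeyService cloud) {
            if (key == null) {
                key = cloud.fetchKey(id);    // logged; refused once device is revoked
            }
            return xor(ciphertext, key);     // stand-in for real decryption
        }

        void evict() { key = null; }         // e.g. on an idle timeout

        private static byte[] xor(byte[] data, byte[] k) {
            byte[] out = new byte[data.length];
            for (int i = 0; i < data.length; i++) {
                out[i] = (byte) (data[i] ^ k[i % k.length]);
            }
            return out;
        }
    }

    // Hypothetical cloud side: serves keys, records one audit entry per fetch,
    // and can refuse all keys once the user reports the device lost.
    class CloudKeyService {
        private final Map<String, byte[]> keys = new HashMap<>();
        private boolean deviceRevoked = false;

        void registerKey(String sdoId, byte[] key) { keys.put(sdoId, key); }

        byte[] fetchKey(String sdoId) {
            if (deviceRevoked) throw new SecurityException("device revoked");
            System.out.println(Instant.now() + " key fetched for SDO " + sdoId);
            return keys.get(sdoId);
        }

        void revokeDevice() { deviceRevoked = true; }
    }

Under these assumptions, a thief walking through emails one by one generates one audit entry per object, and once the user revokes the device, every still-evicted object becomes unreadable; that is the differentiation between attacker behaviors the discussion above turns on.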
>>: You also assume code [indiscernible] and hardware level tampering. [Multiple people speaking.]
>>: [indiscernible] able to do that, I would first try to look at all the objects.
>> Roxana Geambasu: Yes, you can [Multiple people speaking.]
>>: [indiscernible] software and look at memory. [indiscernible]. So
>> Roxana Geambasu: Right. So I think we do assume that, in fact. And as I said, we do assume cold boot, and a cold boot attack could happen, right?
>>: You could [indiscernible] a little bit, but I think you need [Multiple people speaking.]
>> Roxana Geambasu: So let me make something very clear, and I agree with this: CleanOS will not protect your data in all circumstances. It will protect your data in some circumstances, where you are able to disable access to the keys before all of them get compromised. However, in all situations, it will tell you what really happened. There are many things that could have happened after you lose the device, and you don't know which one it is. There could be a cold boot attacker who goes through every single thing and grabs every single thing. In that case, at least you know it, and you say, oh, whatever; there is no use for CleanOS there, quite frankly. But at the very least you know that. Whereas there could be another type of attacker who doesn't do that, a hardware thief or a nice person or whatever, and you'll know that too. So that's the point: you can differentiate, at the very least. Right now, you lose the device and you have no idea what's going on with it. I've asked a lot of people this: if you lose your device, do you know what you've lost? Do you have any idea? I'm not going to run a poll now because it's getting a little bit late, and I do want to talk about my second project.
>>: [indiscernible] because now I'm still relying on the applications to do the right thing, and I don't know if the application is doing the right thing. If the application's not smart enough to mark a password as sensitive, I still lose that.
>> Roxana Geambasu: Correct.
>>: I assume your transparency [indiscernible].
>> Roxana Geambasu: Yes. So is my what?
>>: Your transparency [indiscernible].
>> Roxana Geambasu: No, the one that I'm going to talk about here is a little different, but we do address that, in fact; I haven't gotten to talk about default SDOs. In addition to the SDOs that applications create, and I do believe this abstraction is the right thing to offer to application writers that want this feature, we provide a set of default SDOs, where we do our best to identify data that's sensitive. Our recall is very good; our precision is not very good at all. The way we do this is essentially to identify, for example, passwords; to create SDOs for data that comes over SSL, all of which we consider sensitive in bulk; and, I think there's another one, yes, user input. User input in general also goes into a default sensitive SDO. So precision is not good for these default SDOs, but recall is pretty good, at least in our experiments. So there are lots of things that can be done.
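As a rough illustration of the default-SDO heuristics just described, where passwords, SSL payloads, and user input are all classified as sensitive in bulk, a sketch along these lines; the type names are hypothetical, not CleanOS code.

    // Illustrative default-SDO heuristics: high recall, low precision.
    // Everything from these coarse sources is treated as sensitive in bulk.
    enum DataSource { SSL_SOCKET, USER_INPUT, PASSWORD_FIELD, OTHER }

    class DefaultSdoClassifier {
        // Returns the id of the default SDO this data should join, or null
        // if no default rule applies (the app may still create its own SDO).
        static String defaultSdoFor(DataSource source) {
            switch (source) {
                case SSL_SOCKET:     return "sdo:ssl-bulk";    // all SSL payloads
                case USER_INPUT:     return "sdo:user-input";  // anything typed
                case PASSWORD_FIELD: return "sdo:passwords";   // password widgets
                default:             return null;              // untracked
            }
        }
    }

Coarse buckets like these are exactly why recall is good and precision is not: nearly everything sensitive lands in some SDO, but plenty of innocuous data does too.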
And in this other project of mine, you can actually identify application level objects from the operating system up; I can talk to you one on one about that [indiscernible]. That's another thing.
Now, okay. So I want to go forward. I'm going to spend maybe three more minutes on CleanOS, because I do want to wrap it up, but I do want to share with you the SDO abstraction, this interface. Essentially, what this interface lets you do, as an application, is create an SDO and specify a description for it, which is going to be useful for auditing. For an email SDO, for example, the description can be a subject or something like that; for a password, it can be the name of the email account to which the password corresponds. You can add objects, and you can move objects. So it's a fairly simple interface; a rough sketch of what it might look like follows below. And of course, ourselves, we have a private SDO [indiscernible] that identifies the cloud and the device. And this is how a modified version of the email application uses SDOs: a few lines of change, I think [indiscernible] or something like that many lines of change.
I'm going to jump over the architecture, because we have discussed it quite a bit here, and jump directly into a little bit of evaluation. The question we asked is: does CleanOS limit data exposure? This table shows the fraction of time in which sensitive data in the email application was exposed. Without CleanOS, the password and contents were exposed almost all the time. With application defined SDOs, that goes down significantly. With default SDOs, the exposure also goes down, although not quite as much, because default SDOs are a lot coarser than application defined SDOs. So that's what this table tells you. I'm not going to go into a whole lot more detail. The overall highlight of our evaluation is that CleanOS can lower the exposure of sensitive data by up to 97 percent for reasonable performance overheads, and that includes energy; I believe I have a graph on this at the end. It's not included right now, so I'm going to skip it, and I'm happy to show it to you later.
Okay. So in summary, very quickly: I've shown how today's mobile OSes and applications mismanage sensitive data by allowing it to accumulate on these theft prone devices and exposing it to thieves and such. And I've told you about CleanOS, a new Android based OS that is designed to minimize that.
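The sketch promised above: a minimal rendering in Java of the application-facing SDO interface as she describes it, create an SDO with an auditing description, add objects, move objects. The names paraphrase the abstraction and are not the verbatim CleanOS API.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the application-facing SDO abstraction: an SDO groups
    // related sensitive objects under a human-readable description that
    // later shows up in the audit log.
    class Sdo {
        final String description;            // e.g. "password for alice@example.com"
        private final List<Object> objects = new ArrayList<>();

        Sdo(String description) { this.description = description; }

        void addObject(Object o) { objects.add(o); }

        void moveObject(Object o, Sdo destination) {
            if (objects.remove(o)) {
                destination.addObject(o);
            }
        }
    }

    // The few-lines-of-change pattern in an email client: tag the password
    // at login so it is tracked, evictable, and auditable as one unit.
    class MailClientChanges {
        void onLogin(String account, char[] password) {
            Sdo pwSdo = new Sdo("password for " + account);
            pwSdo.addObject(password);
        }
    }

The description field serves exactly the auditing use she mentions: a post-theft log entry can say which account's password was touched, not just that some bytes were read.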
And more importantly, CleanOS showcases what I believe is a new and powerful view on data security in general: instead of accumulating data on [indiscernible], as many systems do today, because they can, and then struggling to protect it against a myriad of attacks, systems and applications should be designed to manage data much more rigorously and minimize its exposure to attack. And I believe that abstractions built within operating systems and cloud infrastructures can help with that and make it easy for programmers to do this without losing a lot of hours. Okay. Unless there are other questions related to this, I would like to very, very briefly talk about the second project. Okay. Thank you very much for coming.
So the second project I would like to tell you about is specifically an example of the second direction that I mentioned: transparency. CleanOS is about transparency as well, but also about minimizing accumulation, and it was for mobile devices. I'm going to show you now an example for cloud auditing: a project called xRay. It's a system we're trying to create, and its goal is to answer a very difficult question that I think most of us have asked in recent years: what are web services like Google and Amazon and Microsoft itself doing with our data? It's a very ambitious project, and it's very much in the works, so I won't be able to answer a lot of your questions, I'm sure; you'll have more questions than I can answer.
So why do we want this? Like I said before, today's cloud services leverage users' data for a lot of purposes, and presently, users have no visibility into that. What I want to do is add visibility and awareness for the users. What could we use this for? Well, wouldn't you like to know, for example, whether your Facebook likes influence your insurance prices, or whether your searches change your Amazon prices? Or why you're being constantly recommended inappropriate videos? That last one actually happened to me, and it was very frustrating. I kept wondering: why does Google think that I'm interested in this content? What have I done in the past to make Google believe that? I was wondering for a while.
So in xRay, what we are trying to do is increase transparency and awareness by tracking and exposing what services do with users' data. And we're trying to do this in a very generic way, without controlling the web services; these are real services out there, and we're trying to build tools for end users to gain more awareness. Now, this sounds very much like information flow tracking, and we all know how to do information flow tracking in controlled environments: in an operating system, in a controlled distributed cloud infrastructure, and so on. However, the big question in xRay is: how do we track information when we have no control over these clouds? How do we track information in the open internet? That's a very big, very daunting question.
>>: From the cloud perspective?
>> Roxana Geambasu: From the cloud perspective, exactly. We're the client; we want to know what they're doing with our data.
We've uploaded the data to them. For a very long time, two or three years, I've been asking myself: there are all these auditing systems that exist, you know, proofs of retrievability and so on; how can you do that for accesses and uses? And a year and a bit ago, I hit on a couple of insights that I think have led to pretty good early results.
So the key insight relies on this observation: oftentimes, the use of your information comes back to you in a diluted form. In the form, for example, of targeted ads, of products that are recommended to you, of prices that are modified based on your data, of videos, and so on. You input your personal data into the cloud, the cloud does its magic, and then that affects the output you're seeing on the cloud's website, or on another website with whom the cloud has shared data, if that happens. Intuitively, if these are the inputs and these are the outputs, and you look at the correlation between the inputs and the outputs, you may be able to tell which data led to which [indiscernible].
>>: [indiscernible]. Could you [indiscernible] somehow?
>> Roxana Geambasu: It's related to [indiscernible] marking, except that it's real data that is not [indiscernible] in any way. You could improve our system that way, but we don't do [indiscernible] yet, that is, creating uniqueness in the units. We keep the user's data [indiscernible]; sorry, I haven't told you specifically what we do yet. But it's related.
All right. So that's what we want to audit: the correlation between these inputs and these outputs. Now, even stated this simply, the problem is still too broad, complex, and abstract. So how do we make it more tractable, so that we can make progress on it? Through [indiscernible] assumptions. First, for example, we assume that users and auditors know what inputs and what outputs to track. For example, a user might want to track how some of his most sensitive emails are being used to target ads.
Can you tell me how many minutes I have?
>> Helen Wang: [indiscernible].
>> Roxana Geambasu: How many minutes?
>> Helen Wang: About five.
>> Roxana Geambasu: Five minutes, okay. I can do that, yes. Thank you. There are a number of other assumptions, but one important one is that, for now, we focus on very specific scenarios, and I want to talk very briefly about them. First is ads on Google: we want to diagnose which emails have led to which ads in Gmail. That's what we're focusing on the most for now. Second, we want to diagnose products and prices on Amazon. For example, when you search for something on Amazon, you get a bunch of recommendations as output, and the ordering differs depending on which account you are. Some users may be believed to be more interested in more expensive stuff, others in cheaper stuff. So the ordering matters, and we want to understand which of the previous searches or purchases led to this ordering.
>>: [indiscernible] machine learning algorithm.
>> Roxana Geambasu: Not reverse engineering. We treat the service as a black box, as a function F that takes a bunch of inputs in a particular way. I don't want to assume almost anything, and I don't really want to know what happens inside. I just want to know correlations between the inputs and the outputs. Reverse engineering would mean that I understand causality, and I cannot and will not do that. Instead, what we will understand is correlation between the inputs and the outputs. That's very different and much more limited.
>>: So given this information, [indiscernible]. For example, if you knew which personal information was used to [indiscernible], to reverse
>> Roxana Geambasu: Well, that's a great question. We don't do anything about it right now, but I think there are solutions. For example, what can I do so that Google does not think of me this way? What should I be looking for?
>>: [indiscernible] what would happen if they provide certain [indiscernible].
>> Roxana Geambasu: Maybe. Maybe that's a [indiscernible]. [Multiple people speaking.]
>>: [indiscernible].
>> Roxana Geambasu: Yes, for emails, it's hard. Potentially for searches, you may be able to say something like: oh, if you search for this, be aware that you're going to get higher prices, so you may not want to search for it. Or: you searched for this, so you're going to get higher prices now; search for this other thing as well and you will not get the higher prices anymore. Or something. I don't know.
>>: [indiscernible] sue the company for discrimination.
>>: So I'm talking [indiscernible].
>> Roxana Geambasu: I think there are many uses of this, and one of the biggest is for people like journalists, for example, to leverage this tool and raise these issues. On Monday I'm actually meeting with a journalist from [indiscernible] who, anyway, I can talk to you about that.
>>: [inaudible].
>> Roxana Geambasu: Yes, I do. Okay. But let me tell you very briefly how it works. What's the mechanism? What's the architecture? Well, the way this works is that you have a primary account, and then we create a bunch of virtual accounts for you. We don't necessarily need to create them; we can reuse them from others, but I won't talk about that here. The way to think about it is that there are a bunch of virtual accounts which contain subsets of your data: not the whole data, but subsets of it. These lead to certain output sets, and then what xRay does is look at the differences and the commonalities between these accounts' inputs and their outputs; [indiscernible] is another way to think about it.
>>: [indiscernible] it's not generating new data [indiscernible].
>> Roxana Geambasu: It's not generating new data. It's subsets of the same data, similar data, okay?
>>: Except that I think that the cloud is very [indiscernible].
>> Roxana Geambasu: Amazon is very deterministic, as it turns out. [indiscernible] is not that deterministic. What that means is that we need more virtual accounts to get good coverage, whereas with Amazon, we need very few.
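To make the virtual-account mechanism concrete, here is a minimal sketch of the kind of differential correlation described above: each virtual account holds a subset of the user's inputs, we observe which accounts were shown a given output, and we score each input by how well its presence and absence line up with the output's appearance. The names and the naive matching score are illustrative assumptions; the prototype itself uses a Bayesian network, as mentioned below.

    import java.util.*;

    // Sketch of xRay-style differential correlation. Each virtual account
    // holds a subset of the user's inputs (e.g. emails); given which
    // accounts were shown an ad, score each input by how consistently its
    // presence matches the ad's appearance across accounts.
    class XRayCorrelator {
        static Map<String, Double> scoreInputs(
                Map<String, Set<String>> accountInputs,  // account -> inputs it holds
                Set<String> accountsShowingAd,
                Set<String> allInputs) {
            Map<String, Double> scores = new HashMap<>();
            int total = accountInputs.size();
            for (String input : allInputs) {
                int consistent = 0;
                for (Map.Entry<String, Set<String>> e : accountInputs.entrySet()) {
                    boolean hasInput = e.getValue().contains(input);
                    boolean sawAd = accountsShowingAd.contains(e.getKey());
                    if (hasInput == sawAd) consistent++;  // presence matches ad
                }
                scores.put(input, consistent / (double) total);
            }
            return scores;
        }

        public static void main(String[] args) {
            // The example from the talk: the accounts holding email e2 see
            // the ad, the account without e2 does not, so e2 scores highest
            // as the likely trigger.
            Map<String, Set<String>> accounts = Map.of(
                "acct1", Set.of("e1", "e2"),
                "acct2", Set.of("e2", "e3"),
                "acct3", Set.of("e1", "e3"));
            Set<String> sawAd = Set.of("acct1", "acct2");
            System.out.println(scoreInputs(accounts, sawAd,
                Set.of("e1", "e2", "e3")));
        }
    }

Under this toy scoring, e2 gets a perfect score while e1 and e3 do not, which is the "this input explains the observation" logic walked through in the example that follows.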
>>: [indiscernible] emails, and how do you get a subset of the data [indiscernible]?
>> Roxana Geambasu: It varies; I'll talk to you about it later, because I have one minute to do this. One way is to create the virtual accounts and forward your emails to them. We don't actually do only this, because it scales very poorly. What we also do is something we call collaborative auditing, where I can actually use data from your account and correlate it with mine: if we've sent similar emails in terms of ad signatures, then we match them together, and I can use Helen's emails to help diagnose mine. I don't know the contents of your email; I just know the signature, which goes not necessarily to me but to a somewhat trusted cloud provider. But I'll tell you about that later.
Very quickly, just as an example: say you have this account, with this distribution of your emails across these virtual accounts, and you want to diagnose this ad. Suppose these accounts see the ad and this one doesn't. Email two is the common one in the accounts that see the ad, and the account without email two doesn't see it, so the hypothesis that email two targets ad one explains the observation. That's how it works, and we have a simple Bayesian network to do this.
These are very early results. To monitor 15 emails, this axis is the number of virtual accounts [indiscernible], and these are the precision and recall. What we see is that precision and recall get pretty high; this is non optimized, really the first thing we tried, and recall is 76 percent and precision is around 87 percent for ten accounts.
>>: [inaudible].
>> Roxana Geambasu: Well, we are the ground truth. And by the way, we are wrong oftentimes. Not wrong, exactly, but there are things that xRay diagnoses where we say, oh, interesting. We ourselves look at the emails and the ads and do the matching ourselves, with our own minds, and sometimes we believe we are the ones who are wrong. So we do not have complete ground truth; it's a little bit of a fuzzy ground truth. On Amazon, we did have ground truth, because for one particular feature they actually tell you: you're seeing this because of that. For Gmail, we don't have that ground truth. And I can show you those results later, because they are much more [indiscernible].
So in any case, in conclusion, what I'm trying to say here is that today's practices are oftentimes loose and overly permissive, in terms of hoarding and accumulating data and in terms of opaqueness to the users. My research aims to create new abstractions for responsible data management, which consists of two things: curbing data accumulation and increasing transparency. And I've shown you one example of each. Thank you very much; I'm two minutes late. I think many minutes late. [Multiple people speaking.]
>> Roxana Geambasu: Thank you very much for your questions also.