>> Seny Kamara: So it's a pleasure to have Travis Mayberry speaking today. Travis is a PhD student at Northeastern University, and he'll be speaking about encrypted hidden volumes. >> Travis Mayberry: Thank you. Hello everyone. Thank you for having me here. I want to start by just saying how many people here use full disk encryption? So a lot of companies I know mandate it. I don't know, is that a thing at Microsoft? Everybody has to use it? Okay. So you're all familiar with that, but the idea is you're going to encrypt your entire disk and you're going to tie it to a password that you know and it will protect your data if you happen to lose your device or if you happen to just walk away from your device while you're not using it nobody can get access to it without the password. So instead of the password being tied to some operating system procedure the entire hard drive is encrypted and the key is derived from your password or potentially stored in your trusted hardware, Blob, something like that. Somehow it comes from your password. And again, it's increasingly relied upon by industry people and government people everywhere. It's a big technology, really important technology. However, there's still a significant problem with it depending on who you talk to. Some of you might be familiar with XKCD. This is one of their more popular comics. The idea here is there’s kind of a difference between what people think that are crypto experts and what actually happens in reality. So you think that you encrypt your laptop with a really great cipher and it's super secure and it's going to take somebody millions of dollars and a ton of computers to crack it and they're just going to give up, right? But in reality something like that is not necessary. So if you live somewhere where maybe the law enforcement is not so hesitant they can just capture you and coerce you into telling them the password is the point. 
So the password is the weak spot, and no matter how good your encryption is if somebody can beat you until you give them your password it’s not going to help very much. So there are some technical things we can do to get around this. So there are ways to get security against an adversary that is going to try and coerce you and the main one, for the last who knows however many years since it came around, is to store some of your data in a hidden volume. So the idea with this is that you encrypt your entire disk as before and your encrypted disk contains some pieces which are data and some pieces which are just empty. That’s just a side effect of the fact that you're not going to use your entire disk at once. So you have some parts that are encrypted with your key K and some parts that are empty space and the kind of motivation here is that the empty space, if you fill it with random bits, just junk, then it gives you two kinds of advantages. One is that somebody who doesn't have key K can't tell what portion of your disk you're using. They can't tell where the files are and where the files aren’t because encrypted things look just like random bits if you’re using a good cipher. And the other bonus is that if they do have this key K then we can apply kind of like a second layer here and we can actually hide more information in this free space and we rely on the fact that that is going to be encrypted with a different key and since we have, again, this property that cipher texts are indistinguishable from random bits if you don't have the key then what we have here is two volumes kind of interleaved or layered on top of each other; and I have the ability now if someone captures my laptop and they come to coerce the password out of me. I say sure, sure, sure. Here's my password and you give up the first key. Now from their perspective they can decrypt your data, and to them it all looks good. 
They see your data and they think that's all there is to it and they have no reason to keep coercing you. So they won't get access to the second key because you’ll deny that you have anything in that space, and there’s no way they can prove that you do have anything there. So to them it just looks like random bits, you don’t have any other volumes, and I'm done. Yes. >>: When you access the data, do you know where data is stored on here and where it is not stored? >> Travis Mayberry: Certainly. So when you're accessing it you have to have both keys at once. If you were to log in, for instance, with only one key, which you have to have the ability to do, then you would be clobbering all of your data. So if I want to write something it would write data over what I think is free space but is actually my hidden volume. So yes, you need both passwords to log in, and just so that you have the ability to say that you only have one volume, you have the ability to log in with one password, and when you do you can potentially overwrite some data. But that's just one of the downsides. So this is implemented in TrueCrypt. As far as I know it's the only full disk encryption which offers this type of hidden volume security and it’s one of its big selling points. If you don't know, TrueCrypt is now defunct. Something strange happened with the developers, but it has been picked up as an open source project through several variants now that still support all this. So everything is great. Now we have this hidden volume security and we can protect ourselves against coercion, but there was a paper about six years ago, I think, where some people noted that it's not really that secure if you consider a slightly stronger adversary. 
So what we had before was somebody could capture your laptop and capture you maybe at the same time, and what you would be able to do then is give them your first password and they say oh, that's all there is, and you’re done; they can’t find your hidden volume. However, in reality somebody may have a slightly stronger adversarial ability, which is that they can see the disk on multiple occasions: they get your laptop once, they take a copy of it, and they give it back to you, and maybe you don't even know that they took it, and then you do some more work on it and then they get it again later. What they can do at that point is compare the two snapshots, and if there's a bunch of space that was supposed to be free space and it has spontaneously changed for no reason, then there must've been some data hidden there. So this is a relatively simple way to reveal hidden volumes if you have this slightly stronger adversarial capability. So is that clear? So it's not 100 percent because there is some chance that as this user, if I live in this world, I could've created a file and then deleted a file or something like that, so it’s not completely foolproof, but it becomes relatively clear if I have some structured access to the data in my hidden volume. And it just kind of adds up. It’s more circumstantial evidence. So the next thing you're going to say is that never happens. How can somebody have access to my machine and give it back to me? But it can happen all the time is what I would argue. The reason is that people really trust in this full disk encryption and they’re not so careful with their devices anymore. For instance, when we went to lunch today, I left my laptop just sitting in the office there with the door open, so anybody that has access to the building could've gone in there and done whatever they want with my laptop. 
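The multi-snapshot attack described above can be illustrated with a minimal Python sketch. It assumes the adversary holds two raw disk images of equal size plus the set of block indices that the revealed (decoy) volume reports as free; the function names and the tiny 4-byte block size are hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def changed_blocks(snapshot_a: bytes, snapshot_b: bytes, block_size: int) -> list[int]:
    """Return the indices of blocks whose contents differ between two snapshots."""
    if len(snapshot_a) != len(snapshot_b):
        raise ValueError("snapshots must be the same size")
    return [
        offset // block_size
        for offset in range(0, len(snapshot_a), block_size)
        if snapshot_a[offset:offset + block_size] != snapshot_b[offset:offset + block_size]
    ]


def suspicious_blocks(snapshot_a: bytes, snapshot_b: bytes,
                      block_size: int, free_blocks: set[int]) -> list[int]:
    """Blocks that changed even though the decoy volume lists them as free space."""
    return [b for b in changed_blocks(snapshot_a, snapshot_b, block_size)
            if b in free_blocks]


# Toy disk: block 0 holds decoy data, blocks 1-3 are "free" (random-looking).
before = b"AAAA" + b"r1r1" + b"r2r2" + b"r3r3"
after = b"BBBB" + b"r1r1" + b"zzzz" + b"r3r3"
print(suspicious_blocks(before, after, 4, {1, 2, 3}))  # [2]
```

As the talk notes, a hit like this is circumstantial rather than conclusive, since a file created and later deleted in the decoy volume would also change supposedly free blocks.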
Additionally, we have devices that we take with us all time, phones and tablet and things they're very easy to lose track of. And if you set it down and you walk away from it, you come back it’s still there, you don't think of it. So I would argue that it's relatively common for that to happen; and additionally if you live in some countries where you maybe don't trust your own government then it becomes an even bigger problem because they have a lot more access to your own things than you do. Yeah. >>: What about cloud storage? So if you have a network disk? >> Travis Mayberry: If you have a network disk, that's a really good question whether you can have some sort of hidden volume type thing. I want to say off the top of my head that it becomes more difficult, and the reason is because they can see every interaction that you do with them. In this case you get a little bit of privacy because you do some stuff and then the guy gets access to your computer and he doesn't get kind of continuous access to it, it's me and then you and then me and then you whereas the cloud guy has access to everything all the time. But that's an interesting question and worth thinking about. So having hopefully established that that's a reasonable situation to you our contributions are we are kind of the, as far as I know the first people to come up with good security definitions for hidden volume encryption. Before it was just ad hoc. The TrueCrypt developers thought it was a cool idea and they put it in there. We also have a construction which is going to be secure against these multiple snapshot adversaries according to our definition that we come up with. And then we're going to show that basically this construction requires Oblivious RAM as a building block and we're going to show that you can actually come up with a slightly more restricted Oblivious RAM which still satisfies our purposes but ends up being much more efficient than previous work. 
So we have this Write-only Oblivious RAM which I'll get into in a bit, but it's going to have much lower overhead and make this whole thing kind of practical. And again, we also have an implementation of our construction as a Linux kernel module and some performance results that I'll show at the end. So, starting with, we have this security game. The idea here is that it's a game between an adversary and a challenger and the adversary is going to choose L which is some number of volumes. So an additional contribution we have here is that we’re going to kind of generalize this to as many volumes as you want not just one or two volumes, but that's just a small thing. So the adversary chooses L which is less than max. We’re going to set some upper limit on the number of volumes you have but it could be arbitrarily high. And then he sends that value to the challenger. The challenger is going to flip a bit and he’s going to essentially instantiate this scheme with either L worlds, L volumes, or L minus 1 volumes because that's where the uncertainty comes from. He’s going to say the adversary picks a number, challenger says okay, I'm going to either have that number or one less than that number of volumes and then we’re going to play this game and you're going to have to guess how many I actually set up. So what we are hiding here is the exact number of worlds, kind of the intuition is the adversary should never know whether I have this many worlds or I actually have one more than this many worlds. That’s the, yeah? >>: Does the adversary have L minus one passwords available? >> Travis Mayberry: Yes. So it generates L passwords and we will send L minus one passwords and a snapshot of the disk at that point. And then we also have, as pretty standard in these type security definitions, we have a chosen plaintext ability of the adversary. 
He sends some pairs of accesses to the challenger and the challenger will execute one or the other and then you get a snapshot back, and we do this repeatedly over and over and over. And at the end, after seeing all these snapshots, you try to guess whether you have L volumes or L minus 1 volumes. >>: So [inaudible] L passwords he cannot figure out the next one like this? >> Travis Mayberry: Yes. So I give him four passwords, he can’t tell whether there is a fifth world or if four was the end of it, whether that was all of them. So it might be obvious right away that we can't let him choose these accesses completely unrestricted, and the reason is that if he were just to pick them to be writes to a volume that is less than L, so a volume he has the password to, he could say write the number 10 in this operation and write the number 15 in this operation, and since he has the password he can just decrypt them and see immediately what they are. So we do have a restriction so that he can't do this kind of straightforward attack, which is that if one of the operations is a write to one of the volumes he has a password to, then both of the operations have to be the same, and that just avoids this trivial distinguisher. So some problems with TrueCrypt that kind of motivated this in the first place, and which are going to lead to our solutions, are that the hidden volumes are stored in a really predictable and stable location. So if I access one particular file in my hidden volume, it’s always in the same place, and if an adversary sees that place change he'll notice something. So the problem is that they're always in the same place and that place is easily distinguished from the main volume. They look like different things: the main volume is obviously data and the other volume is obviously free space. And additionally, as I said, this only works for one hidden volume. We’re going to generalize that. 
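The restriction on the adversary's chosen accesses can be written down as a small check. This is a sketch under stated assumptions, not the paper's formal definition: the `Op` record, the 1-based volume numbering, and the `known_volumes` parameter (the number of volumes whose passwords the adversary holds, L minus 1 in the game) are all hypothetical names introduced here.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str                     # "read" or "write"
    volume: int                   # 1-based volume index
    address: int                  # block address inside the volume
    data: Optional[bytes] = None  # payload for writes, None for reads


def pair_allowed(op0: Op, op1: Op, known_volumes: int) -> bool:
    """Check the restriction from the talk on a challenge pair.

    If either operation is a write to a volume the adversary can decrypt,
    both operations must be identical; otherwise the adversary could simply
    decrypt the block and see which of the two was executed.
    """
    def visible_write(op: Op) -> bool:
        return op.kind == "write" and op.volume <= known_volumes

    if visible_write(op0) or visible_write(op1):
        return op0 == op1
    return True


# The trivial distinguisher from the talk: write 10 versus write 15
# to a volume whose password the adversary already holds.
print(pair_allowed(Op("write", 1, 0, b"10"), Op("write", 1, 0, b"15"), 4))  # False
```

Pairs such as a read versus a write to the possibly hidden volume remain allowed, which is exactly the case the scheme has to make indistinguishable.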
So the key idea is that our reads are not observable by the adversary. In our security model here your hard drive doesn't record what read operations you do. It records what writes you do in the form of data changing, but no matter what you read it doesn't keep any track of that. So this is different from, for instance, an adversary which would sit kind of between you and your RAM or even between you and your disk; he would see all of your accesses. But in this case the adversary sits kind of outside of the system and is able to occasionally get snapshots, and so he only sees the results of the write operations and not the reads, which is going to be key to getting efficiency in this case. So the main kind of technique that we'll use is that we are going to unify the read and write operations into a single access operation, so it will be the same thing basically whether you're reading or writing. And what that allows us to do is make it kind of indistinguishable what you're actually doing, whether you're doing a read, whether you're doing a write, whether you're accessing this volume or that volume, it's all going to look the same. We’re going to try and kind of flatten that out, make it all one uniform operation. So the idea is that upon access every volume is going to do something. So every time we access anything all the volumes are going to do something so that we don't know which one we are actually working with and which ones are just kind of running themselves. 
So to do this we're going to have to use Oblivious RAM, like I said before, so the idea with Oblivious RAM is that you have a client that wants to accesses some untrusted storage, if you've never seen this before, and they want to do it in such a way that they get the data that they want and they make the updates that they want, so this is the API here, the client can read and he can write and he interacts with the untrusted storage in such a way that he gets and does the changes that he wants but through kind of this intermediary algorithm here which translates these reads and writes into a whole bunch of reads and writes, some number of them, I’m not going to say a whole bunch, some number of them that’s more than one in such a way that this guy over here who just sees the output of this algorithm doesn't learn anything about what's on this side. So the ORAM algorithm is like a firewall; and it blocks essentially all knowledge of these operations from somebody who sees the output here of these operations. So this is a really cool thing, and it's been around for some substantial amount of time, but it's kind of been of limited usefulness. It's an interesting that we can apply it this way, I think. The idea here is we are going to set up a total number of volumes equal to max, which I said before is some kind of global parameter, some kind of limit on the max number of volumes you have. And this is just concretely you can imagine that this is a parameter that comes with the software distribution that you get. The designers will say we are limiting it to 10 worlds or 15 worlds or something. So it’s a parameter, but it's a global parameter. And then the user is going to initialize W of them which is his choice as to how many he's actually going to use. So in this case if we had, we are setting up max to be four, and the user can just choose anything from one to four. They can use one volume or they can use all volumes. 
So the main idea here is going to be >>: But you said it’s going to be hidden. So the W will be hidden? >> Travis Mayberry: Yes. The W will be hidden, but max will not. So the max will be obvious but W will not. >>: So does that mean this space can never be used? >> Travis Mayberry: Right now yes, but I'll show towards the end that we can get rid of that. So the way we'll do the writes to start with is, if I want to write the value X to location Y in volume number two, what I'll do is, over all of these Oblivious RAMs, I will execute the actual write operation on the volume that I want to touch, ORAM 2, and for the other ones I'll do this, which I use to denote some sort of dummy operation. You can imagine that there's some particular block on the ORAM, some address, address zero maybe. That’s what I do; I write some random junk to address zero every time I want to just do something, something that looks like an operation. We'll call that write(⊥, ⊥), "write bot, bot". So every time I want to write something I do that write operation to the one ORAM that I want to access and I do a dummy operation to all the rest of them. So somebody observing this at this point, if they don't have any passwords, first of all, can't tell what's happening at all. Yeah. >>: So this is ORAM write or just single write? >> Travis Mayberry: ORAM. Yeah. So we are using ORAMs as a black box at this point. They are just an interface to some data structure and we're going to access them at that level. So at this point if you don't have any keys, so you don't have any passwords, you don't know anything that's happened. You just see that one operation was done to all of these ORAMs and you don't know which one was which, and that comes from the hiding property of ORAM. So when we do a read what we are going to do in this case is execute kind of a dummy write on all of the volumes. 
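The write and read rules just described can be sketched in Python. This is a toy illustration, not the authors' implementation: `ToyORAM` is a hypothetical stand-in that only stores blocks and counts operations (it provides none of the hiding a real ORAM gives), and `DUMMY` stands in for the dummy symbol from the slides.

```python
DUMMY = None  # stands in for the dummy ("bottom") symbol on the slides


class ToyORAM:
    """Placeholder for a real ORAM, used strictly as a black box."""

    def __init__(self):
        self.blocks = {}
        self.op_count = 0  # number of write operations an observer would see

    def write(self, address, data):
        self.op_count += 1
        if address is not DUMMY:
            self.blocks[address] = data

    def read(self, address):
        # In the talk's model reads leave no trace on disk, so nothing is counted.
        return self.blocks.get(address)


class HiddenVolumes:
    """One ORAM per volume slot, for all `max_volumes` slots."""

    def __init__(self, max_volumes):
        self.orams = [ToyORAM() for _ in range(max_volumes)]

    def write(self, volume, address, data):
        # Real write on the target volume, dummy write on every other one.
        for index, oram in enumerate(self.orams, start=1):
            if index == volume:
                oram.write(address, data)
            else:
                oram.write(DUMMY, DUMMY)

    def read(self, volume, address):
        # Fetch the value, then perform a dummy write on every volume.
        value = self.orams[volume - 1].read(address)
        for oram in self.orams:
            oram.write(DUMMY, DUMMY)
        return value


store = HiddenVolumes(max_volumes=4)
store.write(2, 7, b"secret")
print(store.read(2, 7))                        # b'secret'
print([o.op_count for o in store.orams])       # [2, 2, 2, 2]
```

Every access, whether a read or a write and whatever its target, leaves exactly one operation on each ORAM, which is the uniformity the scheme relies on.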
So every time I read something you can think of it like a null write, because I said before we are going to unify these together, so when I do a read I do like a null write on all of the volumes; and the idea behind this is that now if I'm writing to something that was one of these higher volumes here it will be indistinguishable, if I have passwords to the lower volumes, from just doing a read to them, because if I go back one slide, say I have the password to this volume, I can see exactly what's happening here; once I have the password to something it reveals these parameters right now. So if I have the password to ORAM 1 I'll know that it is doing a write to ⊥, ⊥ (bot, bot). If I have the password to ORAM 2 I know it’s doing a write to X, Y. If we go between these two slides you'll see that the idea here is when I do a read I do this dummy write on all of them. So if I have just the password to this first ORAM these two situations are indistinguishable. First of all, if I do a read it's not clear which volume I read from; it's completely impossible to determine that even if I have all the passwords. So if I do a read to something or if I do a write to a volume that I don't necessarily have the password to yet, those two situations are equivalent. >>: You maybe [inaudible] context that one can’t read, the less [inaudible] reads makes more sense than >> Travis Mayberry: Yes. I'll talk about that in a second. But that certainly is a problem. Yeah. >>: I think that ORAM already, most ORAMs actually hide [inaudible]. >> Travis Mayberry: Yes. >>: Were there reads in your security definition? >> Travis Mayberry: Yeah, sorry. These could be either read or write operations. >>: But you said in your model you assume that [inaudible] writes. >> Travis Mayberry: Sorry. So there's two notions of read and write here. I mean, there's reads and writes to the disk itself and there's reads and writes that I issue to the system. 
So when I do a read on the system, so if I'm interacting with my layer that’s kind of my hidden volume layer, if I do a read it’s going to issue a raw write to the system but the actual reads to and from the system are not observable, yeah. >>: So explicitly, you said the number of RAMs is determined by this global parameter? And then I can use a subset? So [inaudible] parameter [inaudible] basically is like a kind of [inaudible] ORAMs are available in my system>> Travis Mayberry: You could also>>: Space is kind of [inaudible] to use? >> Travis Mayberry: Sorry? Say that again? >>: [inaudible] parameter [inaudible] but I only want to use two so I still have to have space on my system for the remaining three, right? >> Travis Mayberry: For now, when I get the end I'll talk about how we can get rid of that property. But yeah, getting back to the parameter for one second, it's not necessarily, could be user adjustable but you just have to have some reason to plausibly say, and this gets outside of the scope of the paper, more into like the game theory and the psychology of it, but you have to have some plausible reason to say that you're not using all the volumes, right? So you can say well, when I installed it I set the limit to be 15 but never got around to using them. I only used four or something like that. So it could be user changeable. It’s just something that's known. It’s a public parameter that's known to the adversary. You just have to have some reason to say that you’re kind of not using them all. So there is one remaining question here, at least one, probably more that you guys will ask. But one important question which is if I don't actually have a password for these volumes how do I make them run? How do I make them do things that look like operations if I didn't initialize them with a key or anything, right? 
So we do need the property that the Oblivious RAM is kind of simulatable, efficiently simulatable, and what we mean by that is that we can run it in kind of a keyless mode; so I can make it operate, completely without the key, in a way that’s indistinguishable from it being operated with a key. And it turns out, I don't know that this is universal, but for all the ORAMs I can think of you can do it like this: you just write random strings. You kind of execute the algorithm as you normally would, but you just write random strings to all the places that you're supposed to be accessing the data. >>: You just use a different key to encrypt? >> Travis Mayberry: What do you mean? >>: The logs. >> Travis Mayberry: Yeah. Or equivalently you could just make up a key and then throw it away. That's the same thing. But you don't even need to do that is what I'm saying. You can just rely on the fact that if your encryption is indistinguishable from random you can just fill everything with random strings, but you have to operate according to your algorithm. So, for instance, if you're using Path ORAM, which is tree based, all I'd do is I'd read one path in the tree every time and then I would fill it back in with random bits and that's fine. And it’s indistinguishable from my operating that ORAM with a key and actual data in it and things. >>: [inaudible] much cheaper [inaudible] encrypting and getting pseudo-randomness? >> Travis Mayberry: Sorry? What do you mean? >>: So if you use some arbitrary key you kind of automatically generate pseudorandomness using the [inaudible] and it’s not clear that that’s more expensive than getting real randomness from somewhere. >> Travis Mayberry: Right. I see what you're saying. So obviously if I'm using my entropy pool or whatever on Linux it’s going to run out very quickly. In that case you're right. 
It would be better to, but I mean yeah, you could just get a small seed and seed your number generator or something like that, but basically the point here though is we can feasibly operate this ORAM in a manner which is not tied to a secret that you actually know because if there was no way to do this the guy would just to say, give me the key and you’d say, I don't have the key, but you have to have a key or else how would you be running>>: Maybe it's the same thing. You’re not committed to a key. >> Travis Mayberry: Exactly. >>: And you can decide which key to give him, any key [inaudible] random. >>: Which makes a random seed actually like [inaudible] generator maybe sort of dangerous because they can say, well give me your seed. How did you get all this randomness? >> Travis Mayberry: Oh. That's a good point. Although you could throw away your seed, right? There's no reason you have to keep the seed. She was saying that, her argument was if you are producing this by a random number generator then the adversary could say produce me the seed that you used to do this so I can verify that it’s not actually a volume. But you could initiate it and then you can throw away the seed perhaps. And I think in practice the way that these attacks would work is that they would get access to your machine when you're not using it, maybe when it’s turned off or shut down or something in which case the state of whatever you used to generate that randomness will be gone. But it is interesting discussion practical implementations of how you do that. But in our situation we use a random number generator which is seeded, but then as soon as you un-mount the disk it just destroys all. >>: [inaudible] machine you can always use keystroke timing. I have no idea. I have no ideas what my keystroke timing is, but>> Travis Mayberry: Oh, that's a good point. But they might say give me the state of your generator at this time. There could be something you could learn about it. 
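The keyless mode discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the disk is modelled as a Python list of fixed-size blocks, `simulated_access` is a hypothetical name, and the positions passed in stand for whatever locations the real ORAM algorithm would touch (for a tree-based scheme, one root-to-leaf path). Randomness here comes from Python's `secrets` module, which is a choice made for this sketch rather than what the implementation in the talk uses.

```python
import secrets

BLOCK_SIZE = 16  # toy block size in bytes, chosen only for this example


def simulated_access(disk: list, positions: list) -> None:
    """Keyless dummy operation on an uninitialized volume.

    Follows the same access pattern as a real operation (the caller supplies
    the positions the algorithm would touch) but writes fresh random bytes in
    place of ciphertext, so no key ever exists for this volume.
    """
    for position in positions:
        disk[position] = secrets.token_bytes(BLOCK_SIZE)


# Toy volume of 8 blocks; pretend the algorithm touches blocks 1, 3 and 5.
disk = [bytes(BLOCK_SIZE) for _ in range(8)]
simulated_access(disk, [1, 3, 5])
print([len(block) for block in disk])
```

The argument in the talk is that if ciphertexts are indistinguishable from random strings, an observer without the key cannot tell these blocks from blocks written by a keyed ORAM performing the same operation.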
So certainly that’s a good question though. Okay. So now we have some number of Oblivious RAMs and we’re simulating some of them, and some of them are real, and the idea is that you shouldn't be able to tell which ones I'm simulating and which ones are actually real, I actually have passwords for. So now I'm able to plausibly deny that such a world exists or such a volume exists. Now what kind of security do we get from this? It's not what we would hope to get, right? What we would hope to get is some kind of ultimate security where no matter what operations I'm doing and no matter how many volumes I have you can't tell whether I have another world or not. So unfortunately we have, and if you're interested you can read in the paper, there's a lot of really straightforward definitions of security that are impossible. So it's somewhat of a difficult thing to formalize and I kind of noodled it for several weeks, and we came up with this. I'm not saying that this is the best ever definition and I certainly encourage people to research it further because what we have I think is useful, but it does have some drawbacks, and I’m going to illustrate that right now. The kind of security we get here is that we get indistinguishable for these types of access patterns. So if I have a pattern one right here, I've compromised password one and password two is given to the adversary, and he sees an access pattern that looks like this or that actually, let’s say I have actually executed this access pattern, read, read, read, write to volume 1, read, read, write to volume 2, write to volume 1. Again, it doesn't matter which volume we read to; they all look the same. That type of access pattern on a situation where I have two volumes is indistinguishable from one that looks like this where I have actually three volumes. So any of these reads in this access pattern could become writes to any of these volumes up here. 
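The indistinguishability claim for these two access patterns can be made concrete with a short sketch. It models only what the talk says the adversary learns from the passwords he holds: writes to volumes he can decrypt are visible as writes to that volume, and everything else (any read, or a write to a volume he cannot decrypt) looks like a read. The function name and the tuple encoding of operations are hypothetical.

```python
def adversary_view(pattern, known_volumes):
    """Project an access pattern onto what an adversary with the first
    `known_volumes` passwords can distinguish."""
    view = []
    for kind, volume in pattern:
        if kind == "write" and volume <= known_volumes:
            view.append(("write", volume))
        else:
            # Reads reveal no volume, and writes to unknown volumes
            # look exactly like reads.
            view.append(("read", None))
    return view


# Pattern from the talk, executed with two volumes.
two_volumes = [("read", 1), ("read", 2), ("read", 1), ("write", 1),
               ("read", 2), ("read", 1), ("write", 2), ("write", 1)]

# Same shape with a third, hidden volume: some reads are really writes to it.
three_volumes = [("write", 3), ("read", 2), ("write", 3), ("write", 1),
                 ("read", 3), ("write", 3), ("write", 2), ("write", 1)]

print(adversary_view(two_volumes, 2) == adversary_view(three_volumes, 2))  # True
```

The sketch ignores how plausible the resulting sequence of reads looks for a real workload, which is the limitation the audience raises in the discussion that follows.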
So this is a very good point that you brought up earlier that certain things are going to have natural sequences of reads and writes and that very well could give away the game in this situation. So if I see too many reads, for instance, like more reads than I would expect to see then that is some evidence that I might have a higher volume. But we call this plausible security because there's no way to actually prove that. There's no way to prove it with any high degree of certainty that you do have that volume because I’ll say whatever, I was reading that. You know what I mean? I was watching movie or something. And since we don't know which things you're reading it’s impossible to link that even if you have all the passwords you don't know which things you’re reading. I was reading a long sequence of files. I was watching a movie that was on my hidden volume or something. Do you have a question? >>:. How does the adversary see that you read something? >> Travis Mayberry: Yes. So normally reads are not distinguishable, but in order to the hide the fact that we have these volumes that are potentially hidden we have to do something every time we write. And the reason is this is just like a consequence of the fact that when we change something in one of these hidden volumes we have to actually change something. Something has to be stored on the disk in order to maintain that data if we ever want to read it later. And so we have to change something. So we have to have some plausible reason to be writing to the disk. And so we kind of force the user to do writes every time they do any access, every time they do a read they’re actually also going to write to the disk. >>: [inaudible]? >> Travis Mayberry: Okay. I'm open, if you have an idea better than that please, by all means. >>: You could use ORAM to also read, right? >> Travis Mayberry: You can use ORAM to also read. 
>>: And then you don't have to simulate the writes because most ORAMs anyway reads and writes >> Travis Mayberry: Certainly. Yeah. Reads and writes in ORAM are the same, like you said. But the point I think maybe that you were making is that originally I told you that reads are not observable so there's no reason I have to do anything for reads. In this case we are saying, yeah you do. You still have to access the ORAM to do the reads because we have to make all these operations look the same. Exactly like you said, ORAM does that. It makes reads and writes look the same, but if you were to for some reason bypass the ORAM when we were doing reads, it would become obvious that they weren't the same. So in order to maintain that indistinguishability between the types of operation we are doing, we have to actually go through the ORAM even though reads are not saved. Yeah? >>: So if you know the keys for these two ORAMs and then you do a read and then you get like nothing here, nothing here, and then you know that there is some probability of writing more than reading, then you know that there is a read happening >> Travis Mayberry: Yes. If you have a very strong notion of the A, [inaudible] probability of reads and writes then you can leak some information. And this is certainly worth further investigation, and I think I put it at the end; but what does an actual access pattern look like that people are using? My argument is that it's not highly variable, but in some senses it is because there are certainly, and what you need here is you don't need like 100 percent provability, you just need the ability to plausibly say that you don't have this volume. And for every access pattern that has three volumes there is another access pattern you could've done that has only two volumes which is completely indistinguishable. That's what we're trying to guarantee. 
And on the side hopefully this pattern doesn't look too weird because the chances that you would do some super weird access pattern are quite low. In our situation we end up with a somewhat organic looking access pattern. Every write to these higher volumes in here is replaced by a read. But as I said before, if you're looking for an excuse to say that it was actually disk you can always say you have very big files stored on there, you were watching a movie, looking at pictures, something like that. So there's really no way that they can say for sure or even with any very large degree of certainty that you are in one or other of these worlds. >>: So you don't have proof of this? >> Travis Mayberry: No, no. It’s provable that these two access patterns are indistinguishable. Yeah, yeah, yeah. But our security model may not be as great as you wanted it to be. Yeah? >>: So if we didn’t care about efficiency at all could you like dump these four ORAMs inside another ginormous ORAM and that way make your >> Travis Mayberry: Well, no because you’d have to give up the key to the large ORAM so eventually they would get access to that, because the whole point of this, from a philosophical point of view, is that the adversary knows you have some data on here. So you have to be able to give up something without giving up everything in a way that looks organic. So as I said before, if you look in the paper I think this is a pretty reasonable solution, but we do show that there are a lot of solutions that you would think are better that are not actually possible. So a lot more organic solutions are not achievable. >>: One question. If you force someone to give you one you could just hit him more. >> Travis Mayberry: Until they die or something? Yeah, yeah, yeah. That’s the point. So again, this gets into the game theory of it, but you're banking on the fact they're not going to just like keep hitting you until you're dead. 
You think that once they are satisfied that you give up the information that they will stop coercing you, right? So again, that's like a question whether they will or will not do that. Okay. So I'm just going to move onto the next part; that is really our construction. There are a lot of kind of details I'm going to gloss over here, please feel free to read the paper for the full treatment of it, but that's the main idea. Now if you were thinking this whole time how can we do this? It’s completely impossible because Oblivious RAM is really inefficient. Well, you're not wrong because, for instance, if you have a situation like a 500 gigabyte database which is reasonable for a hard drive, it's not even that big, if you have 4096 byte blocks, which is already not technically possible right now because most operating systems don't support that, hard drives are going to support these size blocks but operating systems haven't caught up yet; but let's say you do, this is the best case, right? So let's say you do, then the overhead here, the number of blocks you need to read and going back to our notion before, this is not strictly the number of blocks but it’s the number of blocks in bandwidth you need to read to download one block or to change one block is let's say, these are two kind of recent related ORAM schemes. This one is by Elaine Shi et al would cost you about 5600 times overhead to download or to change one block. To change one block on your disk you would have to change 5600 blocks or something like that. >>: Did you write the code or is it just [inaudible]? >> Travis Mayberry: No I do have the code. This is not asymptotically; this is like concrete. You don't really need to write the code; you can easily kind of concretely on paper figure out what this is. But these are concrete numbers not asymptotic numbers. >>: Does it use constants inside [inaudible]? >> Travis Mayberry: Yes it does. >>: So sometimes it’s very high? 
[inaudible] equal to the number of blocks that you are going to store? >> Travis Mayberry: No, no. Not that high. No, because a 500 gigabyte database is a lot of >>: So [inaudible] is not the maximum number of blocks? >> Travis Mayberry: No, no, no. That’s just on top of the graph. The number of blocks is like billions I think. That's not that bad. But then again, so Path ORAM is more recent and it's a little bit more efficient, but it's still 100 times overhead, right? So going into this we can take advantage of, like I said before, the fact that the disk only observes your writes and not necessarily your reads. Say we have a modified Oblivious RAM where instead of seeing everything on this side the adversary only sees the writes, not the reads. It turns out that we can get much better efficiency from this. I'm going to briefly sketch the idea and kind of the analysis of this. It starts with essentially, if you're familiar with this related work on Oblivious RAM, they're all tree based, right? You have some path in the tree which contains your data and every time you read it you look through the whole tree to find it. That's where the log factors come from. In our case though it turns out with writing all you need is a flat data store. You need an array, and you need that array to be twice as big as you actually want it to be. So to store N elements you have to have it be, actually sorry, this is not strictly true. This could be any constant greater than one but two is a good place to start. The analysis works pretty easily. You have to have some constant times N number of blocks. So in this case half of them are full, half of them are empty. And the idea is, just like in related schemes at a high level, we're going to have a map that specifies which array index each logical block is currently stored at. 
So if I want to get block number 10 I look into my map and it says block number 10 is currently stored in this position in the array and I go there and I get it and that's how I can do the reads. So I have this map, that's how I can access things in this, but the question is how can I update things? So I want to update them in such a way that it's not clear which block I'm updating; that is the whole point of this, right? So I want to change some block and I don't want you to know which block I’m actually changing. Well, the way we do that is pick K blocks uniformly, where K is some constant, over this whole array; and what we do first is we write the new value that we want, so say I want to write an updated value for block number 10. I write that to one empty block out of that set of K that I’ve just chosen. I write my new value, and for the remaining K minus one blocks I re-encrypt what was already there. So I touch them but I don't change the data. If it was an empty block I would just write random strings to it. If it was actual data then I re-encrypt it again. And the third step then is that I have to change the address of the new block in the map. So I put it somewhere else, and now I have to know, when I go to look for it in the future, where it is now. It’s at this new location that I've just updated. So in terms of security, like many of these schemes, it’s relatively easy to see that it's secure but it’s somewhat more difficult to show that it actually works. So it's pretty straightforward to show security because every time we access anything we're just writing K blocks. The K blocks are chosen uniformly at random. And since they're all encryptions or random strings which are indistinguishable from encryptions, you can't tell which one I’ve actually changed and which ones just remain the same. So for the adversary there's essentially no data to learn in this case. He just sees that I pick K blocks and in those K blocks I have written random strings. 
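The three write steps described here (pick K slots uniformly, put the new value in one empty slot and refresh the other K minus one, then update the map) can be sketched in a few lines. This is a minimal toy under stated assumptions, not the construction from the paper: the names `WriteOnlyORAM`, `owner`, and `pos` are hypothetical, the SHAKE-256 keystream is only a stand-in for a real randomized cipher, the in-RAM `owner` list is one assumed way of knowing which slots are free, and a failed write simply raises instead of being handled.

```python
import hashlib
import os
import random

BLOCK_SIZE = 32   # toy size; the talk assumes 4096-byte blocks
NONCE_SIZE = 16


def _xor_stream(key, nonce, data):
    # Toy stream cipher (SHAKE-256 keystream); stands in for a real randomized cipher.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(key, data):
    nonce = os.urandom(NONCE_SIZE)  # fresh nonce: re-encrypting the same data gives a new ciphertext
    return nonce + _xor_stream(key, nonce, data)


def decrypt(key, blob):
    return _xor_stream(key, blob[:NONCE_SIZE], blob[NONCE_SIZE:])


class WriteOnlyORAM:
    """Flat array of 2*N slots plus an in-RAM position map (hypothetical layout)."""

    def __init__(self, n_blocks, k, key, seed=None):
        self.k = k
        self.key = key
        size = 2 * n_blocks
        # Free slots hold random bytes, which look like ciphertexts.
        self.disk = [os.urandom(NONCE_SIZE + BLOCK_SIZE) for _ in range(size)]
        self.owner = [None] * size      # in RAM: slot -> logical block id, None if free
        self.pos = {}                   # in RAM: logical block id -> slot
        self.rng = random.Random(seed)  # a real system would need a cryptographic RNG

    def read(self, block_id):
        # Reads go straight through the map and change nothing on disk.
        return decrypt(self.key, self.disk[self.pos[block_id]])

    def write(self, block_id, data):
        # Step 1: choose K distinct slots uniformly, independent of block_id.
        chosen = self.rng.sample(range(len(self.disk)), self.k)
        target = next((s for s in chosen if self.owner[s] is None), None)
        if target is None:
            raise RuntimeError("no free slot among the K chosen")
        # Step 2: every chosen slot is overwritten with fresh-looking bytes.
        for s in chosen:
            if s == target:
                self.disk[s] = encrypt(self.key, data)
            elif self.owner[s] is None:
                self.disk[s] = os.urandom(NONCE_SIZE + BLOCK_SIZE)
            else:
                self.disk[s] = encrypt(self.key, decrypt(self.key, self.disk[s]))
        # Step 3: update the map; the old slot becomes free (stale data stays until overwritten).
        old = self.pos.get(block_id)
        if old is not None:
            self.owner[old] = None
        self.owner[target] = block_id
        self.pos[block_id] = target
        return chosen
```

For example, `oram = WriteOnlyORAM(n_blocks=16, k=8, key=os.urandom(32))` followed by `oram.write(10, b"x" * BLOCK_SIZE)` touches eight slots regardless of which logical block is being updated, and `oram.read(10)` returns the stored bytes.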
It’s completely independent of the data that I want. Yeah? >>: [inaudible] table? >> Travis Mayberry: That's going to be stored in RAM for now. We'll get to it in a second because it's very large and it's very difficult to store in RAM. But for now you can think of it like it's stored in your RAM, in your memory, and that the adversary has no access to your RAM is another assumption that we make. And in this type of hard drive encryption that's fairly reasonable, I think. So it's pretty easy to see that it's secure, that it doesn't leak any information. But how do we actually use it, right? So how do we actually choose how big this K has to be? And since K is a constant, as I said before, we have this constant complexity as opposed to logarithmic [inaudible]. But how big does K actually need to be? And that question we can answer pretty simply because you'll notice I chose before two to be the size of the array, two times N, and it's because the math works out pretty easily here but it can be another constant just as easily. The idea is that any block that we choose randomly is going to have at least one half probability of being empty and that makes it very neat because we could say that if I choose K blocks the chance that none of them will be empty, which is bad, if I choose K blocks and none of them are empty I have nowhere to put my new information and everything kind of falls apart. So the guarantee I have to have here is that at least one of the blocks I choose is going to be empty. So if every block has one half probability of being empty and I want to guarantee that only with very low probability none of them will be empty then we have to set K equal to our security parameter S essentially. Yeah? >>: What happens if K says 16 and it’s going to be retried? 
>> Travis Mayberry: Well, that would be bad because if you do a retry you reveal that all the blocks that you just touched have data in them, and that’s not immediately going to be a problem. >>: [inaudible]. I know for 16 K blocks all are retry [inaudible] disk. >> Travis Mayberry: Okay. But let me say two answers to this. One, heuristically it's not going to hurt you that much, I don't think. But in terms of provable security you're going to alter the distribution of your choices now because it's impossible to ever choose a full set of K from the blocks that are full and so I'm not now uniformly choosing them randomly. I’m choosing them dependent on some status of the disk. So we like to avoid that. In practice, it's probably a bad idea. I think off the top of your head it seems good because it probably doesn't leak a lot of information, but it does change into a conditional probability and we don't want that. >>: But you can actually say I'm going to read three times 16 blocks and it comes out to [inaudible]? >> Travis Mayberry: Yeah. About the same. >>: [inaudible]? >> Travis Mayberry: So this seems relatively high though, if I told you for every one block you want to read you have to do 64, that's basically as high as Path ORAM and we really haven't gained anything. So there is some optimization we can do here. And what we can do is we can notice that if we set K equal to S to be something large, 64, we've made it so that now it's very, very unlikely that we have no free blocks. But on average we are going to have a ton of free blocks. We’re going to be drowning in free blocks. Every time we access with K equal to 64 on average there's going to be 32 free blocks just to avoid the fact that every once in a while we have no free blocks. So we are wasting a lot here. Our overhead is significantly more than we need. 
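The trade-off between the failure probability and the wasted free blocks can be checked with exact arithmetic. The sketch below assumes, as a simplification, that each of the K chosen slots is independently empty with probability one half (the talk says at least one half, so the failure figure is an upper bound); the function names are illustrative only.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def p_no_empty(k, p_empty=HALF):
    """Probability that none of the K chosen slots is empty (the write has nowhere to go)."""
    return (1 - p_empty) ** k


def expected_empty(k, p_empty=HALF):
    """Average number of empty slots among the K chosen."""
    return k * p_empty


for k in (16, 64):
    print(k, p_no_empty(k), expected_empty(k))
```

With K equal to 64 the failure probability is 2 to the minus 64, but the expected number of empty slots is 32 when only one is needed, which is the waste the speaker is pointing at.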
So instead of trying to make this really unlikely failure event never happen, we can kind of raise the probability that it will happen but hedge against it and make it a nonfailing event. So what we can do is we’ll store a local stash in RAM of blocks that we haven’t fit yet. So if I pick some blocks and I don't have enough space for the one that I want I’ll just hold onto it and I will write it next time, I'll get back to it. I'll write it next time. >>: Do you have analysis of how many blocks are left in the stash? >> Travis Mayberry: So this slide, these should be hidden, they’re not going to come up til later but you're getting a sneak peek. So for example, we can set K equal to four and what this will look like, for instance, is that the probability we have no empty blocks is going to be one over 16 which is not high but it's definitely not low enough for cryptographic purposes, right? So it's still relatively high. But the probability that we have more than one empty block is like 11 over 16, greater than 50 percent of the time. So if you think of our stash here, which holds the blocks we can't fit, if you think of it as a queue, we put things on the queue when we can't fit anything and we pop things off of the queue when we have extra space and we have more than one empty block, then the stash kind of has a lot more pressure to empty than it does to fill, right? And so this is a pretty classical situation; particularly we can model this as a D/M/1 queue, with deterministic input and a Markovian service rate, with an arrival rate of one and a service rate of K over two. So we can use some existing analysis to show that basically if K is at least three then the size of our stash is not going to exceed O of S, the security parameter essentially, except with probability two to the minus S. So we have about a constant stash. 
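The K equal to four figures and the queue intuition can be reproduced with a short sketch. It is a simplified model, not the analysis from the paper: each chosen slot is assumed to be independently empty with probability one half, exactly one new block arrives per access, and every empty slot found drains one pending block. The names `p_exactly` and `simulate_stash` are hypothetical.

```python
import random
from fractions import Fraction
from math import comb


def p_exactly(k, j, p_empty=Fraction(1, 2)):
    """Probability that exactly j of the K chosen slots are empty (binomial model)."""
    return comb(k, j) * p_empty**j * (1 - p_empty) ** (k - j)


def simulate_stash(k, steps, p_empty=0.5, seed=None):
    """Return the peak stash size over `steps` accesses under the simplified model."""
    rng = random.Random(seed)
    stash = peak = 0
    for _ in range(steps):
        free = sum(rng.random() < p_empty for _ in range(k))  # empty slots found this access
        stash = max(0, stash + 1 - free)                      # one arrival, up to `free` departures
        peak = max(peak, stash)
    return peak


none_empty = p_exactly(4, 0)                          # 1/16
more_than_one = 1 - p_exactly(4, 0) - p_exactly(4, 1)  # 11/16
print(none_empty, more_than_one)
print(simulate_stash(k=4, steps=100_000, seed=0))
```

Running the simulation for a few values of K gives a feel for how small the peak stays once the average number of empty slots per access (K over two) exceeds the arrival rate of one.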
>>: Do you [inaudible] blocks or you’re just writing those empty blocks and do you delete from, because it can always write, right? >> Travis Mayberry: Yes. If you always write you’re going to have stale versions, old versions of a block. Is that what you mean? >>: No. So you don’t have your empty space. Like you’re always writing >> Travis Mayberry: Yes, yes, yes. So this is again, like one of the details in the paper which I really didn't talk about, but since I'm writing a new version to like an empty space the old position of that block now is an empty space. It still technically has data sitting in there but I do it in a lazy way. So I wait till next time; if I pick my K blocks, probably eventually I'll fill up all the free spaces, some of them will be like old versions of blocks and I'll notice that when I read them. There's a way we can determine that and I’ll just overwrite what’s there because they're not fresh anymore. >>: What happens if your laptop is grabbed while there’s good stuff in the stash? >> Travis Mayberry: If your laptop is grabbed while you have good stuff in the stash then there's not much you can do at that point. >>: So the stash empties [inaudible] as well or it waits for a write? >> Travis Mayberry: Waits for a write. Yeah. >>: So you always expose information in the database? >> Travis Mayberry: Well, okay. Let me say yes, but really any disk encryption is defeated if someone takes your laptop while you're logged in. So we can't really hope to get much at that point. And upon log out, even if you have things left in your stash, since we have the fact that the stash is bounded at a relatively low amount, whenever you log out we can just write the stash to disk in a fixed size, right? So I'll always write S blocks even if I don't have that many blocks so you don't know how big the stash is and you just have like a place on the disk to store it when you're logged off. 
So it's kind of like a naive Oblivious RAM that you use just to store the stash. Okay. Getting back to your question about the client map, it's phenomenally large, way too large. So as we described in our construction so far the map is going to be N log2 N bits, which is way too big, gigabytes probably. So fortunately we can fix this with a standard technique from the related ORAM literature which is to store the map again in a recursive ORAM. So it's really big, but it’s not as big as the original ORAM, so we store it in its own ORAM. The map for that ORAM will be smaller than the old map and we store that one in an ORAM and so on. Fortunately, the size of the map is usually only a very, very small fraction of the original disk. So in this case we have fixed block sizes which are 4096 bytes. The size of the map is going to be much, much smaller than the size of the ORAM and so this kind of recursive descent will happen very quickly. Technically the number of possible recursions that you could have is bounded at log N, but because of our 4096 block size for anything up to like X of bytes of data you only have maybe three levels of recursion. So in practice it’s nowhere near log N, it's going to be something much smaller than that, but there is this interesting trick you can do due to Stefanov, from his Path ORAM paper, where you can even asymptotically get rid of that if you consider your overhead not in terms of the raw number of accesses you do but in terms of the total amount of bandwidth that you consume doing the execution, and what they do is they say some blocks are large, some blocks are small. They have various sizes of blocks. And you can kind of do this neat little asymptotic trick where you can get rid of this log N. So asymptotically if we want to have a fair comparison with these related works we are constant. We are O of one. But in practice it’s not technically O of one. It depends on whether you believe in this trick or not. 
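A rough way to see why the recursion bottoms out quickly is to count how many map entries fit in one 4096-byte block and repeat until the remaining map fits in client memory. The sketch below uses the N log2 N bits figure from the talk; the packing rule (as many whole entries per block as fit) and the `client_bytes` threshold are assumptions for illustration, not values from the paper.

```python
def entry_bits(n):
    """Bits needed to address one of n positions, i.e. ceil(log2 n), at least 1."""
    return max(1, (n - 1).bit_length())


def map_bits(n):
    """Size of a flat position map for n blocks: n entries of log2(n) bits each."""
    return n * entry_bits(n)


def recursion_levels(n_blocks, block_bytes=4096, client_bytes=4096):
    """Count how many times the map must be pushed into another ORAM before it fits in RAM."""
    levels = 0
    n = n_blocks
    while map_bits(n) > 8 * client_bytes:
        per_block = (8 * block_bytes) // entry_bits(n)  # map entries packed into one block
        n = -(-n // per_block)                          # blocks needed for this map (ceiling division)
        levels += 1
    return levels


n = 500 * 10**9 // 4096  # roughly the 500 gigabyte example with 4096-byte blocks
print(map_bits(n) // 8, "bytes for the flat map")
print(recursion_levels(n), "levels of recursion")
```

Because each level shrinks the map by a factor of about a thousand at this block size, the level count grows far more slowly than log N, which matches the speaker's point that in practice it is a small constant.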
So it's either O of one or it’s log N, but in either case if you consider it to be log N the constant on the log N is like very small. It's like less >>: If I put it in practice, like for terabytes of data, you just need one [inaudible]. >> Travis Mayberry: Yes. That’s what I'm saying. So before they had this trick in Path ORAM it was kind of well-known that although technically there was a log N factor for the recursion it was nowhere near log N. It was one or two. So all they did was formalize that and they say if you do this trick with the different sized blocks you can formally show that it's not anything. So that's why I said it's six of one, half a dozen of the other, so either way it's very small as you can see by this graph here. We have the 5600 overhead for this one, we have like 100 overhead for Path ORAM; and for us it’s like three or six depending on, this is just for one particular setting of this database here so it could be three or six maybe. >>: So the main reason for this being three is that you just write? >> Travis Mayberry: Yeah. So again, we were able to kind of drastically reduce this, which is what makes it even close to efficient. So there are some other interesting uses for Write-only Oblivious RAM which I'm just going to take one slide to discuss. You could, for instance, use this for full disk encryption without your hidden volume. 
So you're saying I'm not paranoid enough to use hidden volumes but I do want to avoid the fact that my access pattern on my disk is revealed, so if someone is to take a couple snapshots of my hard drive, if I live in the US and I'm secure in the fact that no one’s going to beat me to give up my password but I still don't want to, if there’s a very determined adversary that could learn information about my encrypted data from just watching the access pattern on my disk, every day they take a picture of your hard drive and they see that certain things change, certain things don't change, you can simply use Write-only ORAM without any of this hidden volume stuff around it to defeat that type of an attacker. But additionally, I think the more interesting situation is for cloud backup and synchronization. So if you use something like Dropbox or Box or SkyDrive the reads that you do to the data in that are not observable to the cloud service itself because you store a local copy of it. So if I have a Dropbox the Dropbox files are stored on my laptop; the only thing Dropbox sees is when I make changes to that data and it gets pushed back to the server. Otherwise everything gets synchronized regardless of what information I'm actually interested in reading. So in that type of a situation you could use Write-only Oblivious RAM and you get the same guarantees of security that you would get with the full Oblivious RAM. So it’s just an interesting aside about other applications for this. So we did have a question about basically we have all this overhead. So we have a separate ORAM for all these volumes and it becomes expensive in terms of storage but also in terms of access because every time I do anything I have to touch all of them. Well, we can kind of do the same trick that they do in TrueCrypt which is to store the volumes inside of each other. 
So if I have something like this they're all stored in the same array, if you're going back to our representation of ORAM, all of my volume’s blocks now are stored on the same array, the same ORAM but I have different volumes. So I have this volume 1, volume 2, volume 3, volume 4. If I have only the passwords to volume 1 and 2 then it looks like this. It looks like this is empty data here. Empty space. So in this case all we have to guarantee is that the sum of all of the used blocks in all of the volumes doesn't exceed half the size of the array. So essentially now we have kind of squished all the volumes together and the only invariant we have is that all of the used blocks do not exceed the size of one volume. So in this case you might argue that it becomes suspicious if I have a large volume but only a small amount of data, right? So then you’ll say oh, there must be hidden volumes, but I don't think that's a really great argument because, for instance, I know my wife on her laptop has like a one terabyte drive but only uses like 15 gigabytes of it. So it's not a factor of what you actually have it’s just what you happen to be using at the time. >>: [inaudible] ORAM? >> Travis Mayberry: So this is just, like I said, an optimization to kind of collapse all the volumes on top of each other so you don't have to worry about accessing them all at the same time. And again, the details for this are in the paper. It's not completely straightforward. This is just a rough idea of how it works. And for implementation we have a Linux kernel module which we call HiVE, Hidden Volume Encryption. I forget what the I stands for. Oh, it's Hidden. >>: I have a quick question on that. So if the adversary tells me to run these two keys, so I say now use with these volumes, so I would overwrite the underlying volumes. >> Travis Mayberry: Yes. You would overwrite. That is true. So we implemented this using device-mapper which is just a kernel API in Linux. 
It’s how they actually do dm-crypt, which is the standard full disk encryption in Linux. And it works basically on any block device because of this device-mapper API. We can use any block device as an underlying level. And we did benchmark it using Bonnie++. I think the numbers are pretty interesting. One, they're a lot slower than what you would get with the raw disk, but I think they're still high enough to be usable. And the other thing I'd like to point out is this is kind of obviously a first attempt, there's a lot of work to do to optimize this, particularly we kind of have the exact opposite of the access pattern which things are normally optimized for in terms of both hard drives and operating systems because we are accessing completely random blocks every time all over the disk back and forth and back and forth; and the operating system has a lot of things built in to do sequential reads and sequential writes. So if you do a read of one thing it will kind of pull in all the rest of the data in the same page, and especially if you're using SSD there's a lot of optimizations that work against us. They do a lot of extra reads that we don't want because we’re never doing sequential reads. When they do writes they expect you to write in kind of the same place, in sequential areas, so that's how they optimize their wear leveling. So what looks, like I said, back here to be good asymptotically turns out to be worse in practice, but it's not really a permanent situation. I think there are optimizations we can do to get this number up. And at the end of the day one megabyte per second is still pretty reasonable. It's pretty usable. If you have a one megabyte per second Internet connection you’re still surfing the Internet fine. The final note I wanted to have here is that this becomes more practical especially if you consider that you don't have to use this for your entire disk. You don't have to use this for all of your operating system files. 
You can use it for your personal information. So you could have like your user directory stored in this type of encryption. So the things that kind of regulate your day to day operating system stuff can be completely unencrypted because nobody cares about that, right? That’s not the information that anyone is going to try to get out of you anyways, and so for that type of thing you can be in the outside layer, you can be in a different encryption. Yeah? >>: [inaudible] OS related files is a swap file, and I really don't want bad guys to get a hold of that. >> Travis Mayberry: Yeah. That's a good point, actually. A swap file. But there is some granularity. You could store your swap file in here, you could store other parts of the operating system other places. You don't have to, it's not like a fixed thing. You don't have to store your entire system on this. Yeah? >>: So I think swap files are probably not a big problem anymore because in general [inaudible] we can’t swap anymore, so >> Travis Mayberry: Yeah. I disabled the swap file on my computer just because you don't want it to swap. >>: I think it’s also probably not good for the [inaudible]. >> Travis Mayberry: True. >>: Have you considered implementing this on a RAM disk? >> Travis Mayberry: Yeah. That's actually one further, so we did do some preliminaries just to see whether this worked and how reasonable it was just completely in RAM, and it was much faster than this obviously, but for the purpose of kind of honestly telling people what the performance of it is right now we thought we’d do it on an actual disk, but in a RAM disk I would expect it to be, basically there would be no limit. It would be, if I tell you that we’re ten times slower than raw access, in RAM it would be 10 times slower. We can very precisely measure the number of blocks we need to access and it is not very many. It’s like 10 times, like I said. So in RAM you don't have the drawbacks of these. 
>>: [inaudible] if it's a three times slower why do you get [inaudible]? >> Travis Mayberry: Like I said it’s an artifact of the disk itself and the operating system because the operating system is built to expect you to do work on sequential areas of the disk. So the fact that every single access that we are doing is a random location is bad. It works against a lot of the optimizations that they have. >>: So did you try the SSDs? >> Travis Mayberry: Yeah, we did. This is on SSD. That's why this number is so big, right? So on a regular hard drive this number is way lower than that. >>: So on SSD it is supposed to be like random access is supposed to be 10 times slower than sequential access on hard disk 500 times. So on SSDs it's not so this is not just because of SSD? >> Travis Mayberry: Yeah. Well, no, no, no. I mean that works out about right. So if you said 10 times slower this will go from 216 to about 20. And I said our overhead is about 10 and so that's within a factor of one or two. That could just be our software needs an optimization. >>: So on hard disk it would be much worse? >> Travis Mayberry: Oh, yeah. Much worse. We didn't even run numbers on a hard disk. I expect probably you wouldn't even be able to run this benchmark program. It wouldn't even execute. But again, this is just a first attempt at this and I think it could be certainly optimized. >>: So does this mean that if you just stored ORAM in disk it will be even worse? >> Travis Mayberry: It would be worse than you expect for the same reason as this. It wouldn’t be worse than this because this is using ORAM, but additionally we have around the ORAM some hidden volume stuff so we have to store additional, again this is all in the paper, but you have to store like IV somewhere that you have access and they have to be in a predictable location and you have to store the client map and things like that. So there are additional overheads induced by our hidden volume stuff. 
So ORAM will not be worse, but it will be similarly bad if you do it on disk. Most ORAM benchmarks that people show are done in RAM for precisely that reason. So also in the paper, this overlaying, as I said, is more complex than I led you to believe, and all those details are in there. There's a lot of practical considerations. Like I said, where do you store the IVs? How do you do these things securely? How do you store the stash when you log out in a way that doesn't leak information? And additionally we have these security proofs. Future work, as I said before, is optimizing this and targeting maybe more appropriate storage devices like you said, a RAM disk if we could. But again, a RAM disk, I mean nobody really has those. It’s very uncommon, that's kind of a downside to that. And additionally something I really didn't talk about but we have in the paper is this idea of an on-event adversary. So the adversary I considered so far in all this stuff I've described is one who every time you do an operation gets a snapshot. You do an operation, he gets a snapshot. Back and forth. But in reality the guy’s not that powerful, right? He doesn't have continuous access to your disk. In reality he's only getting it after a series of accesses. So there's some chance that we could make this more efficient by kind of delaying some of this bookkeeping, some of this cleaning up until we log out and that would mean that in between log outs we're still secure and I think that's a slightly more realistic adversary. We haven't taken advantage of that yet. So, like I said before, exploring further adversarial models that maybe match your expectations better. 
There is one, let me mention very quickly, there's one adversarial model that I didn't talk about that’s in the paper that is maybe more realistic but comes with an additional drawback which is the idea behind it is kind of that every time I do a write to one of these volumes that's hidden I will wait until I do a write to a lower volume and I'll just execute them at the same time. And so that way you get kind of the intuitive notion of what you would expect for your hidden volumes but the downside is that your RAM is unconstrained. If I never write to the volume one then I'm just going to keep building up these pending operations in my RAM until I have no chance to flush them out. So with the restriction we have on that when we model it is that the number of writes in the volume X must always be greater than or equal to the number of writes in volume X plus one. So you have this cascading series of writes which I think makes a little bit of sense because the higher the volume you go the more secret it is and the less files you will have in that volume. So it makes a little bit of sense. >>: You can do a deliberate write to the lower secret [inaudible]? >> Travis Mayberry: Yes. But that gets, again, into all those philosophical questions of if I'm influencing you to do a deliberate write then you're not doing your natural access pattern anymore and I have, as far as my security model is concerned, it’s questionable how that impacts it. >>: Did you look at all, because of the sequential thing have you considered choosing K RAM positions you choose like K positions? Is that something you're going to [inaudible] because your writes are all at random places, right? >> Travis Mayberry: Yeah. So if you choose K, if we could get that to work that would certainly be better, but I'm thinking that it would only work really if you happen to like, well, so in the worst case that won’t work obviously, right? 
So I can always come up with an access pattern which will cause that to fail which is to just read and write the same block over and over again and eventually I'll happen to like punch a block where, so what you're relying on I think intuitively is that I'm going to be making new versions of these blocks all over the place and so I'll be making up holes in there everywhere, but if I pick a really bad worst-case access pattern it doesn't have to happen. I can always be reading and writing and changing the value of the same block and the rest of them maintain their position. And so eventually I’ll just unluckily stamp down on one that happens to be all full. So yeah, I think you're relying on, we can certainly talk about this, but I think that intuitively that relies on some randomness in the access pattern itself, but I'm not sure. It depends on how you formulate that. >>: So one question. When you introduce this scenario, if I know about encrypt, my solution would be to just continuously write into free memory to just write something garbage to the free memory during the operational[inaudible]. Could you compare this [inaudible] or this straightforward solution to your solution? So what>> Travis Mayberry: Yeah. Well, okay. So if you were willing to do that, if you're willing to just like continuously execute these dummy operations then you could operate in the first or the kind of optimal situation that I said, and he described also, where you use our system but you kind of wait for this thing to flush out your operations in time. 
So if you're willing to have that kind of situation where the system just kind of does its own thing and is constantly like writing stuff then ours would be only more efficient in that situation than theirs because for theirs they have to write, a significant fraction of the free space has to be overwritten in order to be plausible because, for instance, and the reason is because, so let's say I have one file that I access every day and it’s like my calendar or something like that. If it changes every single day then it becomes obvious. If I take some snapshots I see that piece always is changing. The random accesses are all over the place but that one is always changing. So in order to guarantee that I don’t leak information about that I have to change a very significant fraction of my free space for every, between every snapshot that the guy gets. Does that make sense? >>: [inaudible]? >> Travis Mayberry: That's what I'm saying. That’s what TrueCrypt would do. But it would have to change, I'm saying if your free space is somewhat large between the snapshots they have to change a lot of it in order to hide >>: Could you just change one file in the free space, so like mimic this access pattern that you would expect [inaudible] agency from user and just do random writes into the free space, but also access patterns that would be expected from >> Travis Mayberry: Yeah. But how do you quantify what access pattern is expected? >>: So I think you could set a set of access patterns, and then if you just go to very totally random access patterns but if you just go over location on disk and change it every time because you expect [inaudible] calendar and you will change other parts so you don't just [inaudible] but that you try to have a profile the user, >> Travis Mayberry: Yeah. I mean in terms of practical solutions, as an engineering solution that's great; but again, that's more of a heuristic. So I can't think of how you prove something like that secure. 
No, not that I'm dissing that solution at all, there’s plenty of things that are heuristically good enough to use. I'm just saying that we are considering provable solutions. >>: What are all your transfers [inaudible] going through customs and planning >> Travis Mayberry: Exactly. So the idea is that I'm using my computer, and every time I'm done I shut it down, and at that point, at the log out point, I can actually take that opportunity to run some kind of cleanup operation. So I can do completely like unencrypted, unencumbered operations while I'm using it, and then right before I shut it down it does something like sanitize the disk. So I think that's a realistic setting; and it seems intuitively that you should be able to gain efficiency from that advantage, but we haven’t taken advantage of it so that's why I said that it's a useful future direction. >>: We've had too many questions already. So we’ll thank Travis. >> Travis Mayberry: Okay. Thank you.
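
The deferred-write model mentioned at the start of the Q&A (hold each hidden-volume write in RAM until a lower-volume write occurs, then execute both together) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the implementation from the paper: the names `DeferredWriteBuffer`, `write_hidden`, `write_lower`, `ram_in_use`, and `satisfies_cascade` are hypothetical, and the "disk" is just a Python list. It shows why RAM use grows without bound when the lower volume is never written, and what the restriction "writes in volume X are at least the writes in volume X plus one" looks like as a check on write counts.

```python
from collections import deque


class DeferredWriteBuffer:
    """Hypothetical sketch of the deferred-write model; not the paper's code."""

    def __init__(self):
        self.pending = deque()  # hidden-volume writes held in RAM
        self.disk_log = []      # (lower_write, piggybacked hidden write or None)

    def write_hidden(self, block, data):
        # A hidden write never touches the disk on its own; it waits in RAM.
        self.pending.append((block, data))

    def write_lower(self, block, data):
        # Each lower-volume write carries at most one pending hidden write.
        hidden = self.pending.popleft() if self.pending else None
        self.disk_log.append(((block, data), hidden))

    def ram_in_use(self):
        # Number of hidden writes still waiting to be flushed.
        return len(self.pending)


def satisfies_cascade(write_counts):
    """True if writes to volume X >= writes to volume X+1 for every X."""
    return all(a >= b for a, b in zip(write_counts, write_counts[1:]))


# Three hidden writes pile up in RAM; one lower-volume write flushes one.
buf = DeferredWriteBuffer()
for i in range(3):
    buf.write_hidden(i, b"secret")
backlog_before = buf.ram_in_use()
buf.write_lower(7, b"public")
backlog_after = buf.ram_in_use()
```

In this sketch the backlog shrinks only when `write_lower` is called, which is the unbounded-RAM drawback described in the answer. If the cascade restriction holds over the whole workload, every hidden write eventually finds a lower-volume write to ride along with.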