>> Justin Stiles: Thank you for coming. I see a lot of friendly faces in the crowd and hopefully the online presence is good as well. I'm Justin Stiles. I'm a part of OSU security and we're sponsoring, along with Research, this talk that Josh is going to be giving us. There is a sign-up sheet somewhere around the room. Raise your hand if you've got it. If you happen to make all six in person you will get a special prize. Online doesn't count. Sorry, everybody, but that's kind of one of the fun things that we are doing to get you to show up. We hope it's an interactive conversation, so Josh is ready to take questions during the talk, or you can save them until after, but make sure you sign up on the sheet if you're here, if you're present. You won't win a vacation to Disneyland or anything, but it won't be some random tchotchke. It'll be worth it, hopefully, if you started today. Josh in my eyes really needs no introduction. If you want to know cryptography or have questions on cryptography, he's kind of the guy. You guys can read his bio. You know he is well qualified, but don't let him slip up if you've got any random crypto questions during his talk. I think he's up for the challenge, so let's give a round of applause for Josh Benaloh [applause]. >> Josh Benaloh: Okay. Thank you very much, Justin. I presume I can be heard? Good. Those of you who are online and didn't know about the prize, you can still make it here by the end of the talk and get your name on the list. Just to make it clear what's going on, we've got six sessions planned. This first one is going to be pretty much a broad overview touching on a lot of things at a very high level, and then the remaining five will dive into more detail on various things, and there will be some flexibility. I've got some things in mind that I want to be talking about, but if you have directions you want to take things, let me know and hopefully we will be able to accommodate.
Anyway, to start things off, a lot of people like to talk about internet security. That's kind of why many of us are here. It's worth remembering that the internet was not designed at all with security in mind. It was designed basically for a small number of mutually trusting and trustworthy entities to exchange whatever they wanted to. And pretty much sending anything on the internet is like sending a postcard through the mail in the case where you don't really trust the post office. It's a difficult environment to work in, but we can manage some things. What does a typical internet transaction look like? Assume that we've got you out there browsing the web and some merchant you maybe want to buy something from. Let's say you want to get a book from Amazon. You may have an exchange. Let me look at this. Here is the information. I like that. People who looked at that seem to have looked at these books also, so you look at those books. Eventually, maybe you get to a point where you say, you have convinced me. I want to buy something. Amazon is very happy and they say great. Please send us your credit card number. Well, okay. Here's my credit card number. Remember that thing about postcards and whatnot. This is not a very good state of affairs, so we want to find a way to do better than this. Ideally, that whole conversation is protected so that people can't see my browsing history, even my Amazon browsing history just in that little environment. But at the very least, we would like to encrypt or in some way protect the credit card numbers so that only the merchant sees them. Even that's a problem and I will talk about that later. Let's try to get at that as our first goal. This is going to take us a while. It's going to take us most of the next 30 minutes or so. To begin with, it's worth mentioning Kerckhoffs's Principle.
This dates back about 130 years and says that if you are building a security system, the security of the system itself shouldn't depend on everything being hidden. You shouldn't use security through obscurity. You should be able to make everything you are doing public except some very small piece, which is the key. That's the goal. It's interesting. At that time locksmiths adopted this wholeheartedly, and that's an industry that has now flipped the other way, and they are like, you shouldn't be able to understand how locks work. Knowing how locks work makes you a special person who we have to keep track of, so that's an industry now that is based very largely on security by obscurity, on people not knowing the tricks of how to pick a lock, for instance. There are other industries that have gone hard in that direction of trying to hide things, but interestingly, some industries have embraced coming out in the open. Not just cryptography but, for instance, the gaming industry. In Las Vegas, you can go to conferences and workshops where they'll talk about all of their vulnerabilities completely openly, what tricks have been used at the blackjack tables in the last 12 months and whatnot. They'll talk about things, and they have decided that sunshine is a good thing, and I think sunshine is a good thing. I think this principle is worth following, but it does mean you have to be very careful with your keys. When I talk about keys, let me talk about what I mean by a key. First of all, I'm not talking about a PIN. This is just a little tiny nothing. I'm not really talking about a password either. This is something small, something pretty guessable most of the time. What I mean when I talk about a key is a big random string of at least 128 bits. The reason that this is important is that PINs and passwords have a place, but they only have a place where off-line attacks are not possible. Think of what happens if you try to enter a password too many times.
You get locked out very quickly. If I had a way of checking through a password space without being online, without interacting, then passwords and PINs wouldn't work at all. For cryptography, though, if we're going to encrypt something, you can check off-line which possible key will decrypt it. When you have an off-line attack, then anything like a PIN or a password is terrible. Everything may be subject to an off-line attack and it's something we need to consider. An attacker, just as a rough estimate, might be able to search through as many as two to the 64 values. That's a good indication as to the size of the space that an attacker may be able to search through. Think about what that is. That could be all numerical PINs of fewer than 20 digits. There are very few people who I have seen that enter 20-digit bank PINs. Passwords of fewer than 14 lowercase letters; most of our passwords aren't quite up to that. We could add a few characters to the alphabet. How about alphanumerics? Still fewer than 12 characters. Or printable passwords? Still, anything fewer than 10 characters is in some sense searchable. These things still work for the online-only application of entering a password to get access to something, where you get locked out. But if you are trying to use this in a cryptographic manner as a key, you're going to have problems. Don't even think about using any user-chosen password as an encryption key, please, nor anything that's been derived deterministically from one. That's just as bad. Basically, given some ciphertext, an attacker can just do some sort of an exhaustive search, a guided search through the space of up to two to the 64 things, and crack your user-chosen password or anything derived from it. Basically, there's no such thing as a user-chosen key. Think of a key as something that is really just a big random value.
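The size comparison above can be checked with a few lines of arithmetic. This is only a sketch: the alphabet sizes (10, 26, 62, 95) and the helper name `longest_searchable` are assumptions made for illustration, so the exact cutoffs can differ by a character or so from the round figures quoted in the talk.

```python
import secrets

# Rough attacker budget from the talk: about 2**64 candidate values.
BUDGET = 2 ** 64

# Alphabet sizes are assumptions made for this sketch.
ALPHABETS = {
    "numeric PIN (0-9)": 10,
    "lowercase letters": 26,
    "alphanumeric (a-z, A-Z, 0-9)": 62,
    "printable ASCII": 95,
}


def longest_searchable(alphabet_size: int, budget: int = BUDGET) -> int:
    """Largest length L such that alphabet_size ** L <= budget."""
    length = 0
    while alphabet_size ** (length + 1) <= budget:
        length += 1
    return length


for name, size in ALPHABETS.items():
    print(f"{name}: all secrets of length {longest_searchable(size)} fit in the budget")

# By contrast, a key in the talk's sense is a big random value:
# 16 bytes from the OS random source is 128 bits, a space of 2**128.
key = secrets.token_bytes(16)
```

Anything a person can comfortably type falls inside the searchable range, which is the point of the "no such thing as a user-chosen key" remark.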
Something that doesn't apply that much anymore, but I decided to keep in, is dictionary attacks, because you do hear people talk about these against the space of passwords, say. The idea is, basically, that you take one common password, one, two, three, four, five, or A, B, C, D, E, whatever, and encrypt it, and then you can scan through and see where this encryption matches something, because you know the way it was encrypted. There is a little trick that is often used here: make sure everything is encrypted differently by using some sort of a salt, some sort of initial value. That value can be public, but make sure it's a different value for each encryption in some way, and we'll be using that a lot more in the next session. Let's get into the meat of things and start talking about what you might think of as encryption. Encryption starts out as symmetric encryption. Basically, in symmetric encryption there is a key; I've shared a key with somebody else. I use the key to encrypt; you use the key to decrypt. That is traditional encryption as it existed for literally millennia. Caesar ciphers, in fact, can be thought of as that. That goes back quite a way. How do we do this? When we talk about symmetric ciphers, there are usually two major classifications, two broad groups of symmetric ciphers we can talk about: stream ciphers and block ciphers. I'll spend a few minutes talking about each, and we'll spend more time looking at these in a future session. For stream ciphers there are a few examples, but really it's RC4. This one is like everything. This is like 99.9 percent of what you see. A5/1 is used in cell phones, so that has come up a little bit more lately, but it used to be that like 99 percent of all encrypted traffic anywhere in the world was RC4. What you do with any stream cipher is basically you build a pseudo random generator.
If you've ever seen a pseudo random generator, or thought about these, these things are very, very important for many applications, but all a stream cipher is, is just a pseudo random generator, and all you do is take the stream of output bits from the pseudo random generator and XOR it with plaintext to form ciphertext. Here's pretty much what it looks like. You take your plaintext. You take your seed and put it in your pseudo random number generator and you create a stream of key bits, and you just XOR every bit and you get a bit of ciphertext. How do you decrypt? You do exactly the same thing. You can even use exactly the same function. The ciphertext goes in the top and gets XORed. When you XOR with the same key bits twice you get back to where you started; you get the plaintext back. It's very simple, very easy, so much so that I will show you one common example. I talk about alleged RC4. There's an interesting history here. RC4 is a trade secret of the RSA Corporation. It has presumably been leaked. There have been leaked implementations and leaked descriptions of what this looks like, but technically, still, if you use RC4 you have to pay RSA Corporation. As it is a trade secret, this thing called alleged RC4 has been discovered. On all inputs, for anybody who has tried it, it seems to produce exactly the same outputs. So we will talk about alleged RC4 here. It's quite simple. You take a byte array, 256 bytes, and you stick in all possible values in order. You also take another array of exactly the same size and you replicate your key as many times as necessary to fill this array. It's completely flexible. A 128-bit key is commonly the case. You can use a shorter key. You can use a longer key. Just repeat the key; fill this array with that key, repeated over and over. Then, just for the setup, you go through this little tiny loop 256 times, and what you're doing is just shuffling this array around.
You will still have one of each of the 256 possible bytes in that array; they will just be in a different order. What you do is take what's in position i here and what's in position i here and add them to a j, which started at 0, to compute a new value of j, and then just swap two elements in this array. And you just go through a bunch of times and do a swap and get it all scrambled up. And this key caused the swapping to be done. Once this initialization is complete, these 256 steps of initialization, then you can start streaming out bytes, and the way you do this is i starts at 0. You just keep incrementing i. You keep incrementing j not by 1 but by the value in the i position, and you do another swap each time, so you just keep on swapping those values in this array, and you compute a new index which is the sum of the two values that you swapped, and you output whatever is in that position. This is the entire pseudo random number generator known as RC4. You can put it on a T-shirt. You can put it into slides. You can put it anywhere, but it's still alleged. It is extremely fast. It's extremely nice. It's got a lot of good properties. It was discovered, years after it started being used, that there's a little bit of a bias in the early bytes of the output that has been used to break some things, so the current recommendation is just sort of throw away the first thousand or so bytes and start a little bit later. Stream ciphers are great. They are typically very, very fast. You can see this thing just screams. You just do a few steps and then you get another byte. Then you just keep on going, 10 cycles or less per byte. They are very simple. Encryption and decryption can be identical, literally, but there are some problems. I'll hit the first one right here. What happened there? It wasn't on my machine. You can probably guess what that operator is supposed to be. I don't know why that happened, but anyway. If you ever use the same key stream, ever, ever, you have a problem. Here's plaintext one.
Here's plaintext two, ciphertext one and ciphertext two. If you XOR these two ciphertexts, what happened to that key? It's gone. The key stream XORed with itself. You just get these two plaintexts XORed together. And from two plaintexts XORed together you can usually tease those apart. Now you have the two plaintexts and you can derive the key stream that was used, and the packet, and anything else. This happens surprisingly often, even amongst people who know it shouldn't happen. A few years ago I was working with a team in Office that will not be named, who were using a stream cipher and saying yes, yes, we know about all the pitfalls. We are very careful; we don't use the same key stream. What we do to make sure of it is we derive the key stream based on the filename with the whole path. Okay. Think about this for a minute. I'll point it out. That's great for separate files, but what if you change a file? And I watched 12 jaws around the table simultaneously drop. Yeah, okay. It's a mistake, but basically stream ciphers are fragile. It's really easy to mess them up. Here's another way in which they can be messed up. Take a look at the basic stream cipher; how we designed it is bit by bit. Every single bit encrypts to a single bit. There's no interaction across the bits. I hear people joke sometimes that this is terrible because anytime there is a 0 here in the key stream, that means that bit went unencrypted. [laughter]. Okay. But there's another problem when you're doing it bit by bit. It's very easy for an adversary to alter the ciphertext in such a way that it will have a very specific known impact on the plaintext. You may have really well structured text. Maybe here is some sort of a bank transfer notice. Please transfer two dollars, and you know exactly what the form of this is. Here it comes. It's encrypted, and I know that the second byte in this value is going to have the amount. I can just take the ciphertext and flip one little bit, and suddenly the bank transfer is an encrypted version of this.
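The generator and the two failure modes just described can be sketched in a few lines. This follows the widely circulated "alleged RC4" description as given in the talk; the function names, the toy key, the sample messages, and the position of the amount field are all made up for this illustration, and none of it should be used to protect real data.

```python
def rc4_keystream(key: bytes, length: int, drop: int = 0) -> bytes:
    """Alleged RC4: return `length` keystream bytes after discarding `drop` bytes."""
    state = list(range(256))                              # every byte value, in order
    key_fill = [key[i % len(key)] for i in range(256)]    # key repeated to fill 256 slots
    j = 0
    for i in range(256):                                  # setup: 256 key-driven swaps
        j = (j + state[i] + key_fill[i]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    out = bytearray()
    for _ in range(drop + length):
        i = (i + 1) % 256                                 # i steps by one
        j = (j + state[i]) % 256                          # j steps by the value at position i
        state[i], state[j] = state[j], state[i]
        out.append(state[(state[i] + state[j]) % 256])    # index = sum of the swapped values
    return bytes(out[drop:])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"Key"  # toy key for the demonstration only

# Encryption and decryption are literally the same operation.
plaintext = b"TRANSFER $0002"
ciphertext = xor_bytes(plaintext, rc4_keystream(key, len(plaintext)))
recovered = xor_bytes(ciphertext, rc4_keystream(key, len(ciphertext)))

# Pitfall 1: the same keystream used twice. The keystream cancels out.
p1, p2 = b"first message!", b"second message"
c1 = xor_bytes(p1, rc4_keystream(key, len(p1)))
c2 = xor_bytes(p2, rc4_keystream(key, len(p2)))
leak = xor_bytes(c1, c2)          # equals p1 XOR p2, with no key involved

# Pitfall 2: no integrity check. Flipping one ciphertext bit flips
# exactly that plaintext bit ('0' is 0x30, '8' is 0x38).
tampered = bytearray(ciphertext)
tampered[10] ^= 0x08
forged = xor_bytes(bytes(tampered), rc4_keystream(key, len(tampered)))
print(recovered, forged)
```

Passing a nonzero `drop` corresponds to the recommendation of throwing away the early, biased output bytes before using the stream.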
That also leads to undesirable results, maybe undesirable for Bob, no, for Alice in this case. But you can see that it can cause some problems. These are the two biggest problems with stream ciphers. The reason that fragility exists so much with them is the lack of integrity checking to make sure that this hasn't been done, and key reuse, which happens even to people who think they are being careful to avoid it. Microsoft policy requirements, part of the SDL: don't use stream ciphers. They are so easy to mess up. Don't use them. They are nice; they have lots of good properties, and if you really, really have a good reason, you come to the crypto board and you plead your case and we'll make sure that you are doing it okay, because we do use them in places where they are reasonable to use, but as a rule avoid them. The alternative is block ciphers. A block cipher basically looks something like this: you take a key, you take some data, you churn for a while, and out comes ciphertext. Let's take a look at what this might look like. There are a bunch of ways of building these things, but I'm going to show you a really clever way of building block ciphers that works very well. Lots of block ciphers have been done this way. It was invented a little over 50 years ago by a guy named Horst Feistel. The basic idea is you want to make a really ugly function that takes bits and scrambles them all up so that it is really hard to figure out what is going on. That's what encryption is in some sense. But it has to be reversible. Reversibility is kind of hard, but if you want to encrypt, you've got to be able to decrypt. How do you make this really, really ugly and nasty and difficult but also make it reversible? Here's this really clever trick. You take your input up here and you break it into two halves, the left half and the right half. The right half goes through the ugliest function you could imagine, and it doesn't have to be reversible.
What you do with it is take that output, XOR it with the left half as it comes through, and that becomes the new right half. The old right half just goes through unchanged here and becomes the new left half. The nice thing about this is that even though you've gone through an irreversible function here, you can reverse this. You can get from the output back to the input by just going up and doing the same thing, because you still preserve the input into here. You've still got it, so you can reproduce the function's output and do the XOR and get back to where you started. That works pretty well. Except, wait a minute, we left this half unchanged. Okay. Sure, but we can do it again. Coming from here, now we will do another round. Now both halves have been messed up, and you actually can do both of these things essentially in parallel, because you can see this is coming down to here. You have to wait a little bit, but you can compress the effort, and now both halves have been done. And things go through much more easily and you have done some encryption. This is good. It's not really good enough. Usually when you do these things you keep on churning. One thing you do is put in the key here. Whatever key you had, the same key doesn't have to be used in both of these functions. You can mix it up a little bit, but also you iterate, so typically one of these Feistel ciphers is iterated for at least 10 rounds, 16 rounds, often more. You use different subkeys for each round, and with this, even if your ugly function in the middle is very weak, you can get a very strong cipher by just iterating enough. I would never recommend that you roll your own cipher. Don't do it. But if you had to, if you take some really lame little ugly function here and iterate it 100 times through a Feistel network like this, you'll probably have a pretty good cipher. It'll be slow. The trick is making it so you don't have to iterate it for 100 rounds, so you can reduce the number of rounds and make it fast and efficient and still very strong.
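The Feistel construction described above fits in a short sketch. Everything specific here is an assumption for illustration: truncated SHA-256 stands in for the "ugly function", the subkeys are derived by simply prefixing a round number to a toy key, and the names `ugly`, `feistel_encrypt` and `feistel_decrypt` are made up. It shows why the round function never needs to be inverted; it is not a cipher anyone should use.

```python
import hashlib


def ugly(half: bytes, subkey: bytes) -> bytes:
    """Stand-in round function; it does not need to be reversible."""
    return hashlib.sha256(subkey + half).digest()[:len(half)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(block: bytes, subkeys: list) -> bytes:
    """Block length must be even so it splits into two equal halves."""
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for subkey in subkeys:
        # Old right half passes through untouched and becomes the new left;
        # new right half is the old left XORed with ugly(old right).
        left, right = right, xor_bytes(left, ugly(right, subkey))
    return left + right


def feistel_decrypt(block: bytes, subkeys: list) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for subkey in reversed(subkeys):
        # The untouched half is still available, so ugly() can be recomputed
        # and XORed away; ugly() itself is never inverted.
        left, right = xor_bytes(right, ugly(left, subkey)), left
    return left + right


# Hypothetical key schedule: 16 rounds, one subkey per round.
master_key = b"toy master key"
subkeys = [bytes([round_no]) + master_key for round_no in range(16)]

block = b"8 bytes!"                      # a 64-bit block, as in DES
encrypted = feistel_encrypt(block, subkeys)
decrypted = feistel_decrypt(encrypted, subkeys)
print(encrypted.hex(), decrypted)
```

Decryption is the same network run with the subkeys in reverse order, which is why a weak or irreversible round function is acceptable as long as there are enough rounds.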
If you iterate it for 100 rounds you will probably get something that is very good. I'll just take a moment to show you an old-style cipher that uses this. It was very common in the old days, not so much anymore, but the Data Encryption Standard was the first widely used public block cipher. The main problem, besides being a little slow, is that 56-bit key. Remember we talked about being able to search through a space of size 2 to the 64. This is smaller than that. You can do exhaustive search on this. Still, it gives a sense of what can be done and how a real cipher can be built. Also, the 64-bit plaintext block, by the way, is a little bit short; we use larger block sizes now. A Feistel cipher is done here, 16 rounds. Let's take a look inside one of these rounds for DES. All that's going on is one of these Feistel rounds, and the only thing you have to describe is the ugly function in the middle, and in the case of DES it looks pretty much like this. I'm oversimplifying a bit, but you take 32 bits in, because the block was 64 bits. The right half is now 32 bits. You take a piece of the key, 32 bits there, XOR them together, take this, and break it into eight 4-bit blocks. You have a substitution table for each block that says if you see these four bits, you substitute those four bits, and then you do a permutation, and then you use 16 rounds of this. Now the actual DES is a little bit more complicated, but not a lot. What's done is these 32 bits are expanded to 48 bits by every block taking a neighboring bit off of each neighboring block and just absorbing it. Then the substitution tables go from six-bit values to four-bit values, and the subkey for each round is 48 of the 56 bits, just taking some subset of the 56 bits each time. That's all of DES. That's a full symmetric encryption function, a full block cipher. Let's get back to where we started. We know how to encrypt now. Great. We can go back and solve this problem. We get down to the point where we want to buy something.
You get the question of what's your credit card number, and you say here it is. I can encrypt my credit card number because I know how to encrypt. I've got a key. I can encrypt. Except what likely happens is I want to make a purchase and Amazon might say okay, encrypt your credit card number with our shared key, because this is how we're doing this, right? And if this is the first time you have gone there, you say what's that shared key again? And we could have an infrastructure that says okay, the first time you go to Amazon you say I want to set up an account, and via snail mail three days later comes an envelope with the key in it. You type this 128-bit key into your browser and now you can buy your book. If we had no alternatives I guess this would be the best business model we had, but hopefully we can do better. To do better, we need to fall back on asymmetric encryption. Asymmetric encryption deals with the case of a user and a merchant which have no prior relationship, and allows someone to encrypt a message without knowledge of the decryption key. Somebody in the back of the room right now who I have not met, I could give you instructions and say okay, I want to send you a secret message. I tell you what to do: do this, this and this, pick some random numbers, do the following things, and tell me what answer you got. I will use that answer to encrypt my message to you and only you will understand. That's what we can do with asymmetric encryption. I'm going to show you how. Again, we'll talk about it in more depth later, but most of asymmetric cryptography, whether it's RSA or Diffie-Hellman or elliptic curve cryptography, uses this equation in some form. When we deal in elliptic curves, we usually write it differently, but I prefer writing it this way even for elliptic curves, making the group operation multiplication, and it still looks this way. Basically, this is an equation in some form or another that we need to work with and solve.
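The slide carrying the equation is not part of the transcript. Judging from the names used in the discussion that follows ("y to the x mod n", "the logarithm of z to the base y", "the x root of z"), it appears to be the modular exponentiation relation below; the layout of the four cases is a reconstruction, not the speaker's slide.

```latex
\[
  z \equiv y^{x} \pmod{n}
\]
\begin{tabular}{ll}
  unknown $z$ & modular exponentiation (efficient to compute) \\
  unknown $x$ & discrete logarithm (believed hard) \\
  unknown $y$ & modular root finding (believed hard unless the factorization of $n$ is known) \\
  unknown $n$ & little studied \\
\end{tabular}
```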
Let's look at this equation a little bit more. When the unknown is over here, you are just computing this modular exponentiation, and it turns out it can be computed efficiently. I'll spend a few minutes on that. Let's look at some other cases. When this is the unknown, this is known as the discrete log problem. If we eliminated the mod n for a while, then you are just computing the logarithm of z to the base y. We can do that. If there's no mod here, that's easy. There are lots of ways of doing it, but you could always do it just sort of by a binary search and interpolation and get closer and closer. What the mod n does here is make things chaotic, and just to make sure everybody in the room knows, mod n means just take the remainder when divided by n. It's basically just dropping off everything but the low order bits, the low order portion of anything. Think about what's happening there. We're just keeping the low end, so think about a timer that's running really quickly, and you see the low order stuff going wild and the high order stuff slowly incrementing, but the low order stuff is kind of chaotic. And if we're just taking that and we're trying to find values that cause something to happen at the low order, a binary search is not going to work, because if we change this just a little bit there are small changes up at the high order stuff, but the low order stuff gets you something completely different. Binary search just doesn't work, and other methods just don't work. We've got this thing that is believed to be hard. We don't have a proof that it's hard, but it seems to be hard. What about some other cases? If y is the unknown, this is also believed to be hard to solve, except if you happen to know the factorization of n. This is root finding, which again is easy if it's not mod n.
It's just finding the xth root of z, but if you know how to factor n then it turns out you can compute roots mod n. I'm not going to tell you how today. I'll tell you how in a month or so if you come back to that session. It turns out it's not that hard. The final one, just for completeness, I like to look at. What happens if n is unknown? There's really not a lot of work there. I've heard of a couple of people looking at that. I think it's an interesting problem. How hard is this? I don't really know. It might be worth investigating. We might be able to do something with it. Let's look first at how we compute this, y to the x mod n. You might just compute y to the x mod n by taking y to the power x and reducing it mod n. The problem is we like to use, for integer operations, integers of at least 2048 bits. That's another SDL requirement: RSA and similar things, 2048-bit minimum. If you have 2048-bit integers, so x, y and n are 2048 bits, you raise y to the x power and you get something of about that many bits. Just for comparison, that's how many particles there are in the universe. Storing this number is going to be a problem. [laughter]. We can't come close to storing that value, so we're going to have to find another way. Fortunately, we can be a little bit more clever. We can do this mod n reduction with each multiply. We don't ever let the value get very big. We can multiply and then reduce, multiply and reduce, and keep the value from ever getting big. We can solve the storage problem that way, but to do this exponentiation by just doing 2 to the 2048 multiplies, we have to do a lot of them pretty quickly. That's about how many we would have to do every second just to finish before the sun burns out. A good processor now, four gig, does 2 to the 32 computations per second, so we're getting there, just a little slow, okay. We want to be able to do our encryption a little faster than this. Here's the trick.
Think about how you learned to do multiplication in elementary school. Multiplication is just repeated addition, nothing more to it than that, but you don't multiply 23 x 49 by adding up 49 copies of 23. There are tricks. Basically what you could do is repeated doubling. We work base 10, so we don't do it by doubling; we do it by tens, multiples of 10. It's actually a little bit harder to learn to do it that way. Think of if we were doing everything base 2. To compute x times y what we can do is just compute y, and it's not obvious why I've highlighted a few of these yet, but it will be, y, 2y, 4y, 8y, 16y, and sum up whichever of these we need to get x times y. Sorry, I should have mentioned this initially, but here's the example. Suppose you want to do 26 y. I really should have this one tilted in a way. I see people are having trouble seeing the things down low, sorry. I don't know that there is anything I can do about this at this point. I'll just try to read and make it clear. Suppose you want to compute 26 y. If we write 26 out in binary, you find that you need 16 y + 8 y + 2 y. And I don't know why I made that 2 green, sorry. Instead of doing 26 additions, or maybe 25 additions, you're doing four doublings and two additions, and the doublings are additions; they are just easy additions. We save a lot of work. Exactly the same thing can be done with exponentiation. Suppose we want y to the x. We just repeatedly square the value, and all these things are done mod n, by the way, because we know we want to reduce it mod n and keep it small. Don't use up the universe. So we do these squarings and then we just multiply together the things we need to get y to the 26th power, so we're not doing 25 multiplies. We're doing four squarings here, which are easy multiplies, and two more multiplies, much faster.
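The repeated-squaring trick can be written out directly. This is a minimal sketch with a made-up function name, `mod_exp`; it walks the bits of x from the low end, squaring at every step and doing a "side multiply" only where the bit is 1, reducing mod n after every multiply so nothing ever grows beyond the size of n squared. Python's built-in three-argument `pow` does the same job and is what one would normally call.

```python
def mod_exp(y: int, x: int, n: int) -> int:
    """Compute y**x mod n by repeated squaring (x >= 0, n > 1)."""
    result = 1
    square = y % n                      # holds y^(2^k) mod n for the current bit k
    while x > 0:
        if x & 1:                       # this bit of x is set: side multiply
            result = (result * square) % n
        square = (square * square) % n  # next repeated square, reduced mod n
        x >>= 1
    return result


# 26 is 11010 in binary, so y^26 = y^16 * y^8 * y^2.
n = 1_000_003
print(mod_exp(7, 26, n), pow(7, 26, n))
```

For a 2048-bit exponent the loop runs about 2048 times, which is the difference between finishing in a fraction of a second and not finishing before the sun burns out.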
If we count it up, if we were trying to do an exponentiation of 2000-bit values, then it takes about 3000 multiplications, 3000 ordinary multiplications, to do it. Basically, 2000 squarings to get all of the things that we need and about 1000 side multiplications, because roughly half of these will have to be multiplied together, depending on the binary representation of this x up here. Make sense? Questions? Okay, good. Great. We can do exponentiation quickly. Now let's do RSA. Let's show how we can use root finding to solve the problems that we have here of asymmetric encryption. This is going to be RSA in two slides, and actually most of it is in one slide, but two slides get everything you want to know. I tell you, you want to create an RSA key? Take two big primes and multiply them together. Publish the product. Tell everybody the product. That's public now. We typically use for the x value, the exponent, 65537. It's two to the 16th + 1. It has a very nice binary representation. A 1, a whole bunch of zeros and a 1, right? So we have very few of those side multiplies, a nice little convenience. It doesn't have to be a huge exponent for the encryption. We tell people to encrypt a message y by just taking y to the power x, mod your chosen n, and since you know the factorization of n, you can solve this discrete root problem and invert it. If you know the factorization of n, you can do this; if you don't know the factorization of n, it is a secret. It is opaque to you. That is RSA. Because if you take the xth power and then the xth root, you get back to where you started. That's all it is. In fact, RSA has another really nice property, and this is slide 2 of RSA: signatures. We know that you can encrypt and then decrypt the encryption and get back to where you started, but there's also this odd little feature that you could decrypt first and then encrypt, take the xth root and then the xth power, and you will also get back to where you started.
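A toy version of both directions can be sketched as follows. The talk defers how roots are computed to a later session; the code below swaps in the standard textbook route, computing a private exponent d as the inverse of x modulo (p-1)(q-1), so that raising to the power d acts as taking the xth root. The primes are two small Mersenne primes chosen only so the numbers are easy to write down, the message is a made-up card-like number, and there is no padding, so this is an illustration of the arithmetic and nothing more.

```python
# Toy primes: far too small and too structured for real use (assumption for the demo).
p = 2**31 - 1
q = 2**61 - 1
n = p * q                      # published modulus
x = 65537                      # public exponent, 2**16 + 1

# Knowing the factorization gives (p-1)(q-1), and from that the exponent d
# that undoes raising to the x. Requires Python 3.8+ for pow(x, -1, m).
phi = (p - 1) * (q - 1)
d = pow(x, -1, phi)

y = 4111111111111111           # made-up message, must be smaller than n

# Encrypt then decrypt: xth power, then xth root.
ciphertext = pow(y, x, n)
decrypted = pow(ciphertext, d, n)

# The other order: xth root first, then xth power.
root_first = pow(y, d, n)
back_again = pow(root_first, x, n)

print(decrypted == y, back_again == y)
```

Anyone holding only n and x can run the `pow(..., x, n)` steps, but producing `root_first` requires d, which in turn requires the factorization of n.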
Looks bizarre. Why would anybody want to do that? This is something that nobody else can do, so taking the xth root of y effectively serves as a signature on y. It's something that only I could produce, so I have signed y by producing a value which is the xth root of y, and anybody can verify that signature by raising it to the xth power mod my n, and they get y back. That's a digital signature. Great. We've got encryption and digital signatures all in one simple thing. Back to our confidential data transfer, our purchase from Amazon. Now we are in much better shape. I want to make a purchase, really, I do. I'm trying. Amazon comes along and says here's my key, now encrypt with that. I send my credit card over that channel. Looks good. Everybody happy? No. Okay. Why not? Anybody want to help? I'm sorry. Try again. Are you sure it's their key, I thought I heard. Yep. That is a big problem. You might have an intermediary hanging out there. It's not just you and Amazon, but there may be somebody in the middle. You start browsing through the Amazon collection and the intermediary intercepts everything and forwards it on and gets the responses and forwards them back and forth, and eventually you get to a point where you want to make a purchase. The intermediary happily says, sure, I want to make a purchase, and Amazon says here's my public key, and the intermediary says to you, and here is my public key, but it's different. It's a tilde, a crooked E. Uh-huh. They send back a crooked encryption function and you encrypt your credit card number very dutifully to the intermediary, and now the intermediary has your credit card number. And if the intermediary is clever, it will then take your credit card number and re-encrypt it for Amazon so that you get your book or blender or whatever it is you were trying to buy, and you are none the wiser and Amazon is none the wiser, but your credit card number is now in somebody else's hands. This isn't good. So we add a feature, a digital certificate.
This is, basically, a statement that somebody has signed saying what Amazon's public key really is. If we have a certificate authority that can do this signature and we believe that certificate authority, then we know what Amazon's real key is. The problem is now we have just bootstrapped the problem, or pushed it back down to how do we believe the certifying authority. Baked into our browsers are the root keys of various certifying authorities, so if they're good and they do their job well, then we can trust them and everything is good. Of course, there are a lot of certifying authorities in the browser, maybe some more trustworthy than others, maybe some more careful than others. If you look in your browser you can still find some old keys that have been explicitly disallowed because they were issued by the certifying authority VeriSign to Microsoft because somebody called them up and said I'm Microsoft, please sign my key. Please certify it, and they did, and nobody at Microsoft seems to know who that person was. Those keys never seem to have shown up, but you have to be careful. Okay, so we've got a cert. Can we do this now? I want to make a purchase. Amazon sends back a public key and a certificate saying this really is Amazon’s public key and now I encrypt. Okay? Are we good now? We're better, but you're wise. You know not to trust a cryptographer. >>: Replay attack. >> Josh Benaloh: Replay attack, perfect, thank you. I'll pay you later. [laughter]. Just give me your credit card and I'll be fine. This seems to work just fine. Here's a perfectly good exchange and somebody overheard this perfectly good exchange. And then later on whoever that eavesdropper was who overheard this goes back to Amazon and says I want to buy something really expensive, something really nice. And Amazon says, okay. Here's my public key and the certificate. And all I have to do is say sure. 
My credit card number is that thing, whatever that thing that went over the wire last time, yeah, it's the same one. And Amazon dutifully decrypts it and suddenly somebody else gets charged. So we're going to add one more component to this. We're going to add what's called a nonce, which is a coined word from number once. A nonce is just a random value that goes in and it changes every time, and you send the encryption of the credit card together with the nonce, and the next time Amazon sends one to someone else it will be a different nonce, and if the same nonce doesn't come back each time, then Amazon will know this is wrong and disallow it. Are we done? We're done to the extent that this is what is done today. I'll show you how. I'll claim it's not good enough, but let's take a look at how this is used today. There is a sordid history of SSL, which probably most people here have heard of, and TLS, which probably most people here have heard of, and PCT, which is a Microsoft standard which probably very few people here have heard of. The history, roughly, is SSL came out in 1994. It was produced by Netscape. Now, 1994, those of you here who are techno history buffs might recognize as the time of the big browser wars. Actually, this was a little bit before they started, but this was still the first salvo, essentially, and Netscape had this nice secure protocol and we weren't going to use it; well, we kind of had to, for compatibility, but we didn't want to be bound to Netscape, so we came out with our own protocol called Private Communication Technology. Part of the excuse for coming out with it was we, and when I say we in this case I mean especially Dan Simon, who now works in Windows Phone, I think, found a big bug in SSL2 and we said we can fix that bug. We've got something better and we've got these nice new features and people should be using PCT, and PCT was put in browsers and it was default and it was even used occasionally [laughter]. 
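The nonce check described above can be sketched roughly as follows. This is a minimal illustration, not the actual SSL/TLS mechanism: the function names `new_session_nonce`, `seal` and `merchant_accepts` are hypothetical, the card number is a dummy value, and `seal` is only a stand-in for "encrypt the card number together with the nonce under the merchant's public key" (no real encryption is performed).

```python
import hmac
import secrets


def new_session_nonce() -> bytes:
    """Merchant side: pick a fresh random value for each new session."""
    return secrets.token_bytes(16)


def seal(card_number: str, nonce: bytes) -> tuple:
    """Client side: bundle the card number with the merchant's nonce.

    Placeholder only. In the real exchange this bundle would be encrypted
    under the merchant's public key so an eavesdropper cannot alter it.
    """
    return (nonce, card_number)


def merchant_accepts(session_nonce: bytes, sealed: tuple) -> bool:
    """Merchant side: accept only if the nonce matches this session's nonce."""
    received_nonce, _card_number = sealed
    return hmac.compare_digest(received_nonce, session_nonce)


# Honest purchase: the nonce that comes back is the one that was sent out.
first_nonce = new_session_nonce()
overheard = seal("0000-0000-0000-0000", first_nonce)
honest_ok = merchant_accepts(first_nonce, overheard)

# Replay: the eavesdropper resends the old bundle in a later session,
# where the merchant has issued a different nonce.
second_nonce = new_session_nonce()
replay_ok = merchant_accepts(second_nonce, overheard)
```

Under these assumptions `honest_ok` is true and `replay_ok` is false, because the overheard bundle carries the nonce from the earlier session.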
Next year Netscape came out with an improved SSL3, which fixed the bug and was very common. Lots of people used it. It was good and it got some new features. We are still competing with them. We have to come out with our next version, but how are we going to, even though we put in lots of enhancements in the new version, beat SSL3 with our new version? The trick is we come out with version 4. [laughter]. And we put out version 4 which had some nice enhancements and 4 is bigger than 3 so it must be better. And it was enough eventually to bring Netscape to the table in an IETF meeting and we eventually all got together and agreed on TLS, whose version 1 existed for a long time. There is now a slightly newer, better version, but still it's a very, very nice protocol. It has a lot of good features. Basically, the way it works, the thing that has made this survive for so long is the handshake. The trick of the handshake is you've got a client and a server and the client starts off with okay. I want to set up a secure communication channel. Here are the protocols I understand. Here are the ciphers I understand. Here are the tools. Here are the languages I speak in some sense. The merchant can go through the list that the client provides. The server goes through and says okay. This one is good enough. I'll take that one. The server gets to choose. Let's use this protocol. Let's use RSA or let's use ECC or let's use Diffie-Hellman and let's use this hash function and this symmetric cipher. I don't like DES anymore; I'll use AES, whatever. They can choose. And here is the public key and the certificate and the nonce and then okay. The public key is used to transmit a symmetric key and then all the subsequent communication is with the symmetric key once you've done this exchange. The really nice thing about this is it's very agile. We don't have to decide that next Thursday the internet is going to change ciphers. 
Everybody has to shut their old cipher down and start using a new cipher Thursday. It's very smooth. Servers can start accepting new ciphers. Browsers, clients can start implementing new ciphers. If they're not yet understood by the servers, that's okay. Eventually, old ones the servers will stop accepting. It's a smooth, gradual process. It's very agile; it's very nice. It's a good way of doing this negotiation and this has allowed this protocol to survive as new primitives come along and old ones lose favor. It works quite well. Once the negotiation is done, you get into the SSL/PCT/TLS record layer, which basically says from then on you just use the symmetric cipher with the key that you sent over and you integrity check it with a hash, which I'll talk more about in another session, and you wind up with what is sometimes known as hybrid cryptography. We use asymmetric crypto and its nice features for things that we can't do otherwise, like sending a symmetric key to somebody that I've never talked to before. But the symmetric crypto is much, much, much more efficient, so once we've done those things which we can't do without asymmetric stuff, then we do pretty much everything else symmetrically. And we have this hybrid of asymmetric wrapping the symmetric keys or doing the digital signatures, and the bulk stuff is done with the much faster symmetric stuff. One thing I haven't talked about much and this is going to be, we're getting close to the end, which is good because I'm getting close to the end of my slides, is authentication and authorization. This is something which has been done woefully badly in the protocol that I just showed you. The reason is when a merchant is on the other end of this conversation, gets my credit card number, does the merchant really know who the merchant is talking to, that I'm the one on the other end? No. Not with basic SSL/TLS. There is something called client auth that can be used, but really never is. 
All they have is here's my credit card number. Okay. I will charge this person. I will assume that this credit card number is good and go from there. But I am my credit card number effectively. That's really not a really good way of doing things. Does the merchant know that I authorized the purchase? No. The merchant just sees my credit card number coming over the wire. If possession of a credit card number is used as a proxy for identification, then I can be impersonated. My claim is this is the reason for things like the Target breach. People will talk about well, Target had this failing of security, that failing of security. No. The problem is right here. The problem is Target has data which allows an attacker to grab and impersonate millions of people because all you need to impersonate somebody is their credit card number. If we got rid of that there would be other sensitive data, what purchases I've made in the last year, but people aren't going to go through too much trouble to steal that stuff, at least not as much trouble. The real problem is we built this insane infrastructure in which a credit card is everything and if you have my credit card number you can impersonate me, or other things. Maybe my credit card number and my Social Security number or my date of birth. I can find for probably most people in this room, find your date of birth in less than 2 minutes and your driver’s license number. This is all public data, easily discoverable. It doesn't cost anything. Ask me at the end and if you want I'll show you how to do it. It used to be that your election records in this state, you could change your address with your date of birth and driver's license number. It turns out the driver’s license number in this state is deterministically computed from your date of birth, which is public. That's been fixed a little bit. It's a little bit better now, but not a lot better. 
I had a discussion with the director of elections a couple of years ago and we agreed. They have now added driver’s license issue date, which is for most people within two months of the expiration, which is on a birthday that is a multiple of five years. There's about six bits of entropy in there, better than nothing. Also, if my giving a merchant a credit card number is a proxy for an authorization, then merchants can cheat all they want. They can say you told me you wanted to buy this expensive thing. You sent me your credit card number, right, so you must have. It's not really authorization. There are ways that we can do better. We could use private authenticators like what people usually do with a bank. My bank and I somehow agree on a password and this password is supposed to not be shared with anybody else, supposed to be. Okay. That can work reasonably well for authentication. My bank can't impersonate me to somebody else if nobody else recognizes that password as having any meaning. It doesn't have any authorization value. The fact that I gave my bank a password and gave it some instructions and it shows this transcript of all these things that happened, doesn't mean that I was really on the other end. I can say you had my password to begin with. You could have done that all yourself. How do you know I was involved in that? And, of course, we all know the usability is horrible with passwords. People can't remember passwords. People are supposed to have individual passwords with many, many different entities, so they wind up having the same password and it just doesn't work well. Biometrics is a possibility. This has some value for local authentication. It's really bad for remote authentication and yet people like to use it for that. You see my fingerprint coming down a wire and you think I must be on the other end as though it's some sort of a secret? I leave my fingerprints everywhere I go. It's not a secret. 
It's a worse secret than just about anything else you can imagine. But somehow seeing my fingerprint means that I'm there. I don't know. That's really not a very good choice and, of course, biometrics are very difficult to revoke. I'd like to keep my fingers, thank you, so treating biometrics as some sort of secret data is often done and it's a terrible way to do things. It can have value for a local authentication. What we would like to do is the digital signatures. I think we've got like two slides. Digital signatures offer a good authentication mechanism. You can do a lot with them, but you need a public-key infrastructure of some sort to do this. Amazon has gotten a digital certificate on its key. How many people in this room have a public key that has been certified by some certifying authority? Okay. A few, but not many. Until we find a way of getting there, we need the killer app that will cause people to get certified. Some governments are issuing cards to citizens and if you have this and a good way to do this kind of real authentication, then it's great, but until we get there we're stuck with biometrics and passwords and broken systems. Another thing to mention is digital signatures you can't do as a human being. You need computational systems, but they can be used as authorization if you do it right. It's more than two slides but these are quick slides. Sorry. Within SSL/PCT/TLS you can imagine you want to make a purchase. You get something that says please sign this authorization and you go and say okay. Here's my signature. Here's my certificate. There are still problems with this. It's not quite this simple. You have replay attacks. That signature could be reused if you don't make sure that it contains data that won't allow it, so it should have nonces or something of that sort. There are also malleability issues. Malleability means signatures can be transformed in various ways. 
Or remember with RSA, signing something is exactly the same as decrypting something. So somebody comes along and says please sign this authorization and what they are really saying is please decrypt this for me. We have to build into the protocols ways of making sure that you never just sign something that you're given, that the signer also has to put something into the thing being signed of the signer’s choosing so it's not deterministic. And there are the human limitations of dealing with just how do you do the computation. People can't digitally sign things without computational assistance. The final thing here is we want to get some sort of hybrid authorization where we authorize a device, our own computing device, to authorize on our behalf, but how do we do that? It changes the whole notion. Normally we talk about the protocol, but instead, we should be talking about what a former Microsoft colleague used to refer to as a ceremony. It's not just you and the merchant involved in the protocol. It's you and your device and your merchant. There's an interaction between you and your device and there's an interaction between your device and the merchant, whether your device is a smartcard or it's your home PC or it's your phone or whatever it is. The protocol is going all through here and there are all these places where there could be a break, where this could be a perfect protocol but the interaction between you and your device can be taken advantage of. It should be that when you get to the point that you tell your device that you want to make a purchase and that gets passed on, there's an authorization step here. There's an authorization step here and you can enter something that you can do locally to your device, something reasonably simple. Your device, then, does the digital authorization, but this all has to be taken into account as a whole, not broken up piecemeal. That's kind of an overview of the kinds of things that I'll be talking about. 
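The point made above, that with raw RSA "please sign this" can really mean "please decrypt this", can be illustrated with a toy sketch. Everything here is an assumption for illustration: the primes 61 and 53 are far too small for real use, the helper names are hypothetical, and `salted_sign` is only a stand-in for the idea of the signer mixing in a value of its own choosing. It is not a real padding scheme (standardized schemes such as RSA-PSS do this properly). The modular inverse via `pow(E, -1, ...)` needs Python 3.8 or later.

```python
import hashlib
import secrets

# Toy key, far too small for real use (hypothetical values for illustration).
P, Q = 61, 53
N = P * Q
E = 65537                            # public exponent mentioned in the talk
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (the "root finding")


def raw_encrypt(y: int) -> int:
    """Raw RSA encryption: y to the power x, mod n."""
    return pow(y, E, N)


def raw_sign(y: int) -> int:
    """Raw RSA signing: take the x-th root, identical to raw decryption."""
    return pow(y, D, N)


def digest_mod_n(salt: bytes, message: bytes) -> int:
    """Hash the signer's salt with the message and reduce into the range of n."""
    return int.from_bytes(hashlib.sha256(salt + message).digest(), "big") % N


def salted_sign(message: bytes) -> tuple:
    """Sign a hash that includes a random salt chosen by the signer."""
    salt = secrets.token_bytes(16)
    return salt, pow(digest_mod_n(salt, message), D, N)


def salted_verify(message: bytes, salt: bytes, signature: int) -> bool:
    """Anyone can check the signature with the public exponent."""
    return pow(signature, E, N) == digest_mod_n(salt, message)


# The attack: a ciphertext is handed to the signer as an "authorization".
secret = 1234
ciphertext = raw_encrypt(secret)
leaked = raw_sign(ciphertext)        # the raw signature is the plaintext

# The salted variant never applies the private operation to attacker-chosen input.
salt, signature = salted_sign(b"authorize purchase")
```

In this sketch `leaked` equals `secret`, which is exactly the confusion between signing and decrypting the talk warns about, while `salted_verify(b"authorize purchase", salt, signature)` still succeeds for the honest signature.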
Subsequent sessions -- just when did there, but that's okay. Subsequent sessions we'll be talking about symmetric protocols, asymmetric protocols, the sort of standard integer ones, non-integer ones, especially elliptic curves, lattice-based systems, various protocol properties, forward secrecy and then some applications, and maybe if people allow me I'll sneak in some election protocols at the very end. But if people want to hear about other things, if people are less interested in some of these things, there are other things, let me know. I want to keep this flexible and try to find a way to sort of fit whatever people want to hit on. By the way, all of the other sessions are in 1927; I apologize for the confusion. I think that must've been my mistake. My office is building 112. My admin’s office is 112. Somehow that slipped in. But we are in 99, but the future sessions will be in 1927. It's a smaller room that way, just around the corner. Okay. That's it. Any questions? [applause]. Seny? >>: You showed RSA? [indiscernible] version of RSA? Are you going to talk about semantic security at all? >> Josh Benaloh: Yeah. Maybe not in exactly that form, but yeah. The RSA that I showed, nobody in their right mind would use that in practice. You have to be very careful about exactly how you pad things in the actual use of RSA and I'll talk about that. Among other things, to make sure that encryption is distinct from signatures and there is no possibility of confusion, but there are a lot of things. The semantic security that you mentioned is one important aspect of understanding exactly the kind of security that RSA can provide. I do hope to talk about that as well. Yep? >>: That last example of the ceremony during authorization, is chip and pin credit cards an example of that? >> Josh Benaloh: Yes. Chip and pin does that. 
Now it doesn't solve all the problems because with chip and pin, typically the way it's used, credit card numbers still get transferred and maintained, so it's better. Chip and pin is done a lot in Europe, and there is now some expectation that it will come to the U.S. But people don't know; basically it's not enough to swipe a credit card, you have to have a pin also. There is a pin on the credit card and the credit card is supposed to engage in a more sophisticated protocol than the usual one of here I am. Here's my number. Anybody want my number? Which is what many of our cards do, like our [indiscernible] cards. Basically, you ping them. Here's my information. We want to do better than that. We can do better than that. Chip and pin is a little bit better than what we're doing, but it's not what we should be doing. Yeah? >>: You did a talk about some of the tricks, algorithm in AES. Were you talking about [indiscernible]? >> Josh Benaloh: Yes. Actually, I debated whether or not to put in cipher modes today and I decided we are already over time so I'm glad I didn't. We'll spend plenty of time talking about cipher modes. >>: So we can go until three? >> Josh Benaloh: The usual way this is done is an hour of talk, plenty of time for questions. We are flexible so, if people would like hour and a half talks, I can talk ad nauseam, I promise. But the tradition, anyway, is it's on the calendar until three o'clock so that there is some flexibility at the end. I'm not going to talk about cipher modes now. I'll talk about them next time; unless you really want something, talk to me later. >>: I'm curious. There's a lot of government databases that hold very sensitive information regarding individuals or medical purposes. How good are those? >> Josh Benaloh: What's the quality of medical databases and other similar databases? It's very mixed. Some are pretty good. Some are not nearly as good as they should be, but there are problems as to just what's in there. 
Usually, there's a lot more in there than needs to be, than should be. Medical databases are kind of an exception because that's very sensitive stuff that really does need to be there for the most part. That's a case where we have to try to be very careful and not expose them as much as they tend to be exposed. They need some extra security. As opposed to Target, which probably doesn't need the extra security. What they store is only what you bought in the last two years. It's still kind of sensitive, but it's a lot less sensitive than what medical tests I've had in the last two years. There isn't a good answer to that. It's not great. We can talk about some methods that are used, but you probably don't want to know all the details. [laughter]. >>: Actually I do. >> Josh Benaloh: Okay. We can talk a little bit. Honestly, I don't know all the details. I know some of it, but not all of it. Yeah? >>: [indiscernible] about the exponentiation where you multiply multiple times to go to n? >> Josh Benaloh: Okay. You know what? I'm going to try to do it this way. >>: While Josh puts up that slide, anybody here who hasn't signed that spreadsheet, on your way out the door please do. >>: It's on the back table now? >> Josh Benaloh: Oh no. I hit the wrong key. I hit F5 instead of shift F5. This one? >>: Yeah. >> Josh Benaloh: Okay. You want to go a little further. >>: Where it was simplified. >> Josh Benaloh: So let's do exponentiation. >>: Yes. If someone knows how many times you exponentiate and how many times you added, is it hard to reach 2n or is it computationally [indiscernible] >> Josh Benaloh: If it's just the number of times, then you are pretty secure, because there aren't that many possibilities. The number of times you -- this is square and multiply here. You are always doing the same number of squares; basically, you should always be doing the same number of squares. 
The multiplies, the number of multiplies that you do should be clustered tightly around half of the number of squares, and if knowing that allowed you to actually figure out somebody's key, there aren't that many possibilities; you could just guess that and try it. The problem though is that sometimes if you are close in you can actually tell the difference. The usual approach is not to do all the squares and then do a bunch of multiplies; it is, while you are doing it, you square and then multiply or square and then not multiply, and if you are not careful in your implementation, then there's a side channel attack, and probably I should put side channel attacks on the list and spend some time talking about some of those. But there's a side channel attack which is just sort of a timing attack or power analysis or listening to, if you can get close enough to a device it can even be done acoustically, listening to it. You can record it and listen and tell the difference between a multiply and a not multiply and that way you get the bits of the key one at a time. That would be very detrimental, so we are very careful in our implementations. One of the reasons you should use the Microsoft internal approved implementations and not roll your own is we're very aware of side channel attacks and try to build in as much resistance as we can. Anything else? >>: If you go back about two slides to the slide that had the 2048 on it. This one I thought when you said the crypto board recommends that all three be 2048 signatures. Is that what you said? Is that what you said? One of those everybody uses 65537 and so can you explain that? >> Josh Benaloh: These two need to be 2048-bit. Sorry, not these, these two. >>: [indiscernible] 65537. >> Josh Benaloh: It's n that's important here. Y is the message, so y will be padded separately. It's just a message. The message means that the computations will be done on 2048-bit values. 
It's just how many computations, and it turns out that with RSA, as long as you're a little bit careful, a small exponent seems to be just as good as a large exponent. You get some interesting effects out of this and it's worth mentioning. This exponent is going to be public anyway, so you're not hiding anything by, or revealing anything by using this special form. The question is, does it make it easy to attack? It turns out that if you want something that is of this special form, a 1, all zeros and a 1, there are very few possibilities that meet all of the requirements. Three actually does, but there is some reason not to use three. There are some attacks on very, very small exponents if they are used carelessly. Basically, you could get something that doesn't get large enough to wrap mod n, and if you pad very badly and you just use raw RSA, you don't wrap mod n, then it just becomes taking cube roots. So we want something that's small, has this structure, and sort of 65537 has turned out to be the perfect value and virtually everybody uses it. It leads to this interesting asymmetry in asymmetric encryption with RSA that encryption is much, much faster than decryption, a couple of orders of magnitude faster. Verifying a signature is much faster than signing something because the verification is with a small exponent. The decryption, which is root finding, is also just an exponentiation. It just turns out to be a large exponentiation. You use a full size -- in that case the exponent becomes 2048 bits here. You wind up with encryption and decryption using basically the same function, but the exponent is much larger for that secret operation and therefore, encryption is fast; decryption is slow. Often you can use that to your benefit. Sometimes you wind up on the wrong side of it. You end up with a weak client having to do the expensive operation and that's a nuisance. 
We try to formulate things when we can so that it's the other way around, but we can't always do that. >>: Will you be going into this in a lot more detail on your third talk? >> Josh Benaloh: Yeah, the third talk is when it's most likely. Yep. And I do intend on showing you how we compute these things. If you like math that's the talk to come to. Anything else? Okay. Thank you. Be sure to sign the sheet if you want a chance at the trinket. [applause]
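The square-and-multiply exponentiation and the two-slide RSA from the talk can be sketched as below. This is a toy illustration under stated assumptions, not a usable implementation: the primes 61 and 53 and all function names are hypothetical, there is no padding (which the talk says real RSA needs), and the data-dependent branch on each exponent bit is exactly the kind of thing the side channel discussion warns about. The variable names y, x and n follow the talk's notation. The modular inverse via `pow(X, -1, ...)` needs Python 3.8 or later.

```python
def square_and_multiply(base: int, exponent: int, modulus: int) -> tuple:
    """Left-to-right binary exponentiation.

    Returns (result, squarings, multiplies) so the operation counts
    discussed in the talk can be inspected. Not constant time.
    """
    result = 1
    squarings = 0
    multiplies = 0
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % modulus
        squarings += 1
        if bit == "1":                     # "side multiply" only on 1 bits
            result = (result * base) % modulus
            multiplies += 1
    return result, squarings, multiplies


# Toy key: two (tiny) primes multiplied together; n is published.
P, Q = 61, 53
N = P * Q
X = 65537                                  # public exponent, 2**16 + 1
SECRET_ROOT = pow(X, -1, (P - 1) * (Q - 1))  # computable only with P and Q


def encrypt(y: int) -> int:
    """Anyone can do this: y to the power x, mod n."""
    return square_and_multiply(y, X, N)[0]


def decrypt(c: int) -> int:
    """Taking the x-th root mod n, done as one large exponentiation."""
    return square_and_multiply(c, SECRET_ROOT, N)[0]


def sign(y: int) -> int:
    """Decrypt first: the x-th root of y serves as the signature."""
    return decrypt(y)


def verify(y: int, signature: int) -> bool:
    """Raise the signature to the x-th power and compare with y."""
    return encrypt(signature) == y


message = 65
_, squarings, multiplies = square_and_multiply(message, X, N)
roundtrip = decrypt(encrypt(message))
signature_ok = verify(message, sign(message))
```

With the exponent 65537, whose binary form is a 1, fifteen zeros and a 1, the loop performs 17 squarings (the first one is the trivial squaring of 1) and only 2 side multiplies, which is the convenience the talk points out. For a random 2000-bit exponent the same loop would do about 2000 squarings and roughly 1000 multiplies.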