CMSC 414 Computer and Network Security
Lecture 3
Jonathan Katz

Attacking the Vigenère cipher
Let $p_i$ (for $i = 0, \ldots, 25$) denote the frequency of letter $i$ in English-language text
– It is known that $\sum_{i=0}^{25} p_i^2 \approx 0.065$
For each candidate period $t$, compute the frequencies $\{q_i\}$ of the letters in the sequence $c_0, c_t, c_{2t}, \ldots$
For the correct value of $t$, we expect $\sum_{i=0}^{25} q_i^2 \approx 0.065$
– Every letter in that sequence was shifted by the same key letter, so $\{q_i\}$ is (approximately) a cyclic shift of $\{p_i\}$ and the sum of squares is unchanged
– For incorrect values of $t$, the sequence mixes several different shifts and looks closer to uniform, so we expect $\sum_{i=0}^{25} q_i^2 \approx 1/26 \approx 0.038$
Once we have the period, we can apply frequency analysis to each of the $t$ subsequences, as in the case of the shift cipher

Moral of the story?
Don’t use “simple” schemes
Don’t use schemes that you design yourself
– Use schemes that other people have already designed and analyzed…

A fundamental problem
Wouldn’t it be nice if we could somehow prove that an encryption scheme is secure?
But before that… we haven’t even defined what “secure” means!

Modern cryptography
Definitions
Assumptions
Proofs
– We won’t do proofs in this course, but we will state known results

Defining security
Why is a good definition important?
– If you don’t know what you want, how can you possibly know whether you’ve achieved it?
– Forces you to think about what you really want
• What is essential and what is extraneous
– Allows comparison of schemes
• There may be multiple valid ways to define security
– Allows others to use schemes; allows analysis of larger systems built using components
– Allows for (the possibility of) proofs…

Security definitions
Two components
– The threat model
– The “security guarantees” or, looking at it from the other side, what counts as a successful attack
It is crucial to understand these issues before crypto can be successfully deployed!
– Make sure the stated threat model matches your application environment
– Make sure the security guarantees are what you need

Security guarantee for encryption?
So how would you define security for encryption?
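The period-finding test above can be sketched in Python. This is a minimal illustration, not code from the lecture: the names `vigenere_encrypt`, `sum_sq_freq`, and `guess_period` are hypothetical, letters are assumed to be A to Z only, and the decision rule (return the smallest $t$ whose statistic reaches a threshold midway between 0.065 and 1/26) is our own choice. Taking the smallest passing $t$ matters because multiples of the true period also give $\sum_i q_i^2 \approx 0.065$.

```python
import string
from collections import Counter
from typing import Optional

ENGLISH_SUM_SQ = 0.065   # sum_i p_i^2 for English text (value from the slide)
UNIFORM_SUM_SQ = 1 / 26  # sum_i q_i^2 when all 26 letters are equally likely


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Vigenere-encrypt the A-Z letters of `plaintext` under `key` (A=0, ..., Z=25).

    Non-letters are dropped; the key is assumed to consist of letters only.
    """
    letters = [ch for ch in plaintext.upper() if ch in string.ascii_uppercase]
    shifts = [ord(ch) - ord("A") for ch in key.upper()]
    return "".join(
        chr((ord(ch) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, ch in enumerate(letters)
    )


def sum_sq_freq(letters: str) -> float:
    """Return sum_i q_i^2, where q_i is the relative frequency of letter i in `letters`."""
    n = len(letters)
    if n == 0:
        return 0.0
    return sum((count / n) ** 2 for count in Counter(letters).values())


def guess_period(ciphertext: str, max_t: int = 10,
                 threshold: float = (ENGLISH_SUM_SQ + UNIFORM_SUM_SQ) / 2) -> Optional[int]:
    """Return the smallest t <= max_t whose sequence c_0, c_t, c_2t, ... looks like English.

    "Looks like English" means sum_i q_i^2 >= threshold (midway between 0.065
    and 1/26 by default). Returns None if no candidate period passes.
    """
    for t in range(1, max_t + 1):
        if sum_sq_freq(ciphertext[::t]) >= threshold:
            return t
    return None
```

The threshold and `max_t` are tunable assumptions. Each candidate sequence has only about $N/t$ letters for a ciphertext of length $N$, and on short sequences the estimate of $\sum_i q_i^2$ is noisy and biased upward, so `max_t` should stay well below the ciphertext length.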
Adversary unable to recover the key
– Necessary, but meaningless on its own… (the trivial scheme $\mathsf{Enc}_k(m) = m$ never reveals the key, yet reveals the entire message)
Adversary unable to recover the entire plaintext
– Good, but not enough (the adversary might still learn, say, half of the plaintext)
Adversary unable to determine any information at all about the plaintext
– How to formalize?
– Can we achieve it?

Defining secrecy (take 1)
Even an adversary running for an unbounded amount of time learns nothing about the message from the ciphertext
– (Except the length)

Perfect secrecy (Shannon)
Formally, for all distributions over the message space, all $m$, and all $c$ with $\Pr[C = c] > 0$:
$\Pr[M = m \mid C = c] = \Pr[M = m]$

Leaking the message length
In general, encryption leaks the length of the message
It is possible to (partly) address this using padding
– Inefficient
– Generally not done
This does not mean that length is unimportant!
– In some cases, leaking the length can ruin security (e.g., if the message is either “yes” or “no”, its length alone reveals which)

The one-time pad
Scheme
– Key: $k \in \{0,1\}^n$ chosen uniformly at random; messages and ciphertexts are also in $\{0,1\}^n$
– $\mathsf{Enc}_k(m) = m \oplus k$ and $\mathsf{Dec}_k(c) = c \oplus k$
Proof of security (sketch)
– For any $m, c \in \{0,1\}^n$: $\Pr[C = c \mid M = m] = \Pr[K = m \oplus c] = 2^{-n}$, independent of $m$
– Hence $\Pr[C = c] = 2^{-n}$, and by Bayes’ rule $\Pr[M = m \mid C = c] = \Pr[C = c \mid M = m] \cdot \Pr[M = m] / \Pr[C = c] = \Pr[M = m]$

Properties of the one-time pad?
Achieves perfect secrecy
– No eavesdropper (no matter how powerful) can determine any information whatsoever about the plaintext
Limited use in practice…
– Long key length (the key is as long as the message)
– Can only be used once (hence the name!)
– Insecure against known-plaintext attacks (given $m$ and $c = m \oplus k$, anyone can compute $k = m \oplus c$)
These are inherent limitations of perfect secrecy

Computational secrecy
We can overcome the limitations of perfect secrecy by (slightly) relaxing the definition
Instead of requiring total secrecy against unbounded adversaries, require secrecy against bounded adversaries except with some small probability
– E.g., secrecy for 100 years, except with probability $2^{-80}$
How to define this formally?

A simpler characterization
Perfect secrecy is equivalent to the following, simpler definition:
– Given a ciphertext $C$ which is known to be an encryption of either $m_0$ or $m_1$, no adversary can guess which message was encrypted with probability better than ½
Relax this to give computational security!
– Only require that no efficient adversary can guess correctly with probability significantly better than ½
Is this definition too strong? Why not?

The take-home message
Weakening the definition slightly allows us to construct much more efficient schemes!
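A minimal Python sketch of the one-time pad over bytes (XOR on each byte is the same as XOR on the underlying bit string). The helper names are hypothetical, and `secrets.token_bytes` is assumed as the source of uniform key bytes. The last two functions illustrate claims made above: a single known (m, c) pair reveals the key, and, for a tiny 4-bit message space, every (m, c) pair is connected by exactly one key, which is the counting fact behind the proof of perfect secrecy.

```python
import secrets


def otp_keygen(n: int) -> bytes:
    """Choose a uniformly random n-byte key, as long as the message."""
    return secrets.token_bytes(n)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("the one-time pad needs len(key) == len(message)")
    return bytes(x ^ y for x, y in zip(a, b))


def otp_encrypt(key: bytes, message: bytes) -> bytes:
    return xor_bytes(message, key)  # c = m XOR k


def otp_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return xor_bytes(ciphertext, key)  # m = c XOR k


def recover_key(known_message: bytes, ciphertext: bytes) -> bytes:
    """Known-plaintext attack: a single (m, c) pair gives k = m XOR c."""
    return xor_bytes(known_message, ciphertext)


def keys_mapping(m: int, c: int, n_bits: int) -> list[int]:
    """All n_bits-bit keys k with m XOR k == c.

    For the one-time pad this list always has exactly one element (k = m XOR c),
    so Pr[C = c | M = m] = 2**-n_bits no matter which m was sent.
    """
    return [k for k in range(2 ** n_bits) if m ^ k == c]
```

For instance, encrypting `b"attack at dawn"` under `k = otp_keygen(14)` and decrypting with the same key returns the message. Anyone who also learns that plaintext obtains `recover_key(m, c) == k` and can then read every other message sent under `k`.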
However, we will need to make assumptions
Strictly speaking, schemes are no longer 100% guaranteed to be secure
– Security of encryption now depends on the security of its building blocks (which have been analyzed extensively and are believed to be secure)
– Given enough time and/or resources, the scheme can be broken

PRNGs
A pseudorandom (number) generator (PRNG) $G$ is a deterministic function that takes a seed as input and outputs a string
– To be useful, the output must be longer than the seed
If the seed is chosen at random, the output of the PRNG should “look random” (i.e., be pseudorandom) to any efficient distinguishing algorithm
– Even when the algorithm knows $G$! (Kerckhoffs’s principle)

PRGs: a picture
[Figure: In World 0, $y \in \{0,1\}^\ell$ is chosen uniformly at random and given to the (poly-time) adversary; in World 1, $x \in \{0,1\}^n$ is chosen uniformly at random and the adversary is given $G(x)$. The adversary must guess which world it is in. The two distributions are far from identical, but the adversary can’t tell them apart.]

Notes
The required notion of pseudorandomness is very strong
– Output must be indistinguishable from random for all efficient algorithms
– General-purpose PRNGs (C’s rand(), Java’s java.util.Random) are not sufficient for crypto
Pseudorandomness of the PRNG depends on the seed being chosen “at random”
– True randomness is very difficult to obtain
– In practice: randomness from physical processes and/or user behavior

A computationally secure scheme
The pseudo-one-time pad…
– Key: a short seed $k \in \{0,1\}^n$ chosen uniformly at random; messages are in $\{0,1\}^\ell$ with $\ell > n$
– $\mathsf{Enc}_k(m) = G(k) \oplus m$ and $\mathsf{Dec}_k(c) = G(k) \oplus c$
– Theorem: If $G$ is a pseudorandom generator, then this encryption scheme is secure (in the computational sense defined earlier)
Which drawback(s) of the one-time pad does this address?
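A sketch of the pseudo-one-time pad in Python, assuming SHAKE-256 (`hashlib.shake_256`) as a stand-in for the generator $G$. That choice is our assumption, not something the lecture specifies: SHAKE-256 is a standardized, heavily analyzed extendable-output function, but the theorem is stated for an abstract PRG, and nothing here proves SHAKE-256 is one. For convenience the hypothetical `G` takes the output length as a parameter instead of having a fixed $\ell$, and the 16-byte key length is likewise arbitrary.

```python
import hashlib
import secrets


def G(seed: bytes, out_len: int) -> bytes:
    """Stand-in PRG (our assumption): stretch `seed` to out_len bytes with SHAKE-256."""
    return hashlib.shake_256(seed).digest(out_len)


def keygen(n: int = 16) -> bytes:
    """Short uniformly random key, used as the PRG seed (16 bytes is an arbitrary choice)."""
    return secrets.token_bytes(n)


def encrypt(key: bytes, message: bytes) -> bytes:
    """Enc_k(m) = G(k) XOR m, with the pad stretched to the message length."""
    pad = G(key, len(message))
    return bytes(p ^ m for p, m in zip(pad, message))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Dec_k(c) = G(k) XOR c."""
    pad = G(key, len(ciphertext))
    return bytes(p ^ c for p, c in zip(pad, ciphertext))
```

Here the key stays 16 bytes however long the message is. Encrypting a second message under the same key reuses the same pad $G(k)$, though, so $c_1 \oplus c_2 = m_1 \oplus m_2$ just as with the one-time pad.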