COMMUNICATION SECURITY
LECTURE 2: INTRODUCTION TO CRYPTOGRAPHY
Dr. Shahriar Bijani
Shahed University
Spring 2016

SLIDES REFERENCES
Matt Bishop, Computer Security: Art and Science, the author homepage, 2002-2004.
Addam Schroll, Cryptography, Purdue University.
Nikita Borisov, Cryptography, University of Illinois, CS461, 2007.

DEFINITIONS
Cryptography = the science of encryption
Cryptanalysis = the science of breaking encryption
Cryptology = cryptography + cryptanalysis

DEFINITIONS
Plaintext: a message in its natural format (readable by an attacker)
Ciphertext: a message changed to be unreadable by anyone except the intended recipients
Key: a sequence that controls the operation and behavior of the cryptographic algorithm
Keyspace: the total number of possible key values in a crypto algorithm

CRYPTOSYSTEM
Quintuple (E, D, M, K, C)
  M: set of plaintexts
  K: set of keys
  C: set of ciphertexts
  E: set of encryption functions $e: M \times K \to C$
  D: set of decryption functions $d: C \times K \to M$

CRYPTOSYSTEM SERVICES
Confidentiality
Integrity
Authenticity
Nonrepudiation
Access Control

ENCRYPTION SYSTEMS
Substitution cipher: replaces one letter with another
  Monoalphabetic cipher: substitutes one letter in the ciphertext for one in the plaintext, using a fixed substitution over the entire message (e.g. Caesar)
  Polyalphabetic cipher: uses multiple substitution alphabets, selected by a string used as the key (e.g. Vigenère)
Transposition cipher: reorders the letters within a message

TYPES OF CRYPTOGRAPHY
Stream ciphers
  Encrypt 1 bit (or byte) of plaintext at a time
  Mix plaintext with a key stream
  Good for real-time services
Block ciphers
  Encrypt a fixed-size block (n bits of data) at a time
  Use substitution and transposition

CRYPTOGRAPHIC METHODS
Symmetric
  Same key for encryption and decryption
  Key distribution problem
Asymmetric
  Mathematically related key pairs for encryption and decryption
  Public and private keys

CRYPTOGRAPHIC METHODS
Hybrid
  Combines strengths of both methods
  Asymmetric distributes the symmetric key
    Also known as a session key
  Symmetric provides bulk encryption
  Example: SSL negotiates a hybrid method

SYMMETRIC ALGORITHMS (BLOCK CIPHERS)
DES / 3DES
AES
IDEA
Blowfish
RC4 (stream cipher) / RC5
CAST
SAFER
Twofish
KASUMI
A5 (stream cipher)

ASYMMETRIC ALGORITHMS
RSA
Diffie-Hellman
El Gamal
Elliptic Curve Cryptography (ECC)

HASHING
A hash function maps an input of any length to a fixed-size output
A hash function h(x) must provide:
  Compression: output length is small
  Efficiency: h(x) is easy to compute for any x
  One-way: given a value y, it is infeasible to find an x such that h(x) = y
  Weak collision resistance: given x and h(x), it is infeasible to find y ≠ x such that h(y) = h(x)
  Strong collision resistance: it is infeasible to find any x and y, with x ≠ y, such that h(x) = h(y)

HASHING ALGORITHMS
MD5
  Computes 128-bit hash value
  Widely used for file integrity checking
SHA-1
  Computes 160-bit hash value
  NIST approved message digest algorithm

CRYPTOGRAPHIC ATTACKS
Assume the adversary knows the algorithm used, but not the key
Three types of attacks:
  Ciphertext only: adversary has only ciphertext; goal is to find plaintext, possibly key
  Known plaintext: adversary has ciphertext and the corresponding plaintext; goal is to find key
    Learn (or guess) part of the plaintext, decrypt the rest
  Chosen plaintext: adversary may supply plaintexts and obtain the corresponding ciphertext; goal is to find key (or other messages)

BASIS FOR ATTACKS
Mathematical attacks
  Based on analysis of underlying mathematics
Statistical attacks
  Make assumptions about the distribution of letters, pairs of letters (digrams), triplets of letters (trigrams), etc.
    Called models of the language
  Examine ciphertext, correlate properties with the assumptions

TRANSPOSITION CIPHER
Rearrange letters in plaintext to produce ciphertext
Example: Rail-Fence Cipher
  Plaintext is HELLO WORLD
  Rearrange as
    HLOOL
    ELWRD
  Ciphertext is HLOOL ELWRD

ATTACKING THE CIPHER
Anagramming
  If 1-gram frequencies match English frequencies, but other n-gram frequencies do not, the cipher is probably a transposition
  Rearrange letters to form n-grams with highest frequencies

EXAMPLE
Ciphertext: HLOOLELWRD
Frequencies of 2-grams beginning with H
  HE 0.0305
  HO 0.0043
  HL, HW, HR, HD < 0.0010
Frequencies of 2-grams ending in H
  WH 0.0026
  EH, LH, OH, RH, DH ≤ 0.0002
Implies E follows H

EXAMPLE
Arrange so the H and E are adjacent
  HE
  LL
  OW
  OR
  LD
Read off across, then down, to get original plaintext!

SUBSTITUTION CIPHERS
Change characters in plaintext to produce ciphertext
Each letter gets mapped to another letter
  E.g. A -> E, B -> R, C -> Q, ...
Example: Caesar cipher
  Plaintext is HELLO WORLD
  Change each letter to the third letter following it (X goes to A, Y to B, Z to C)
    Key is 3, usually written as letter 'D'
  Ciphertext is KHOOR ZRUOG
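The Caesar shift described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `caesar_shift` is hypothetical, and the sketch assumes uppercase A-Z input, passing any other character (such as the space) through unchanged.

```python
def caesar_shift(text: str, key: int) -> str:
    """Shift each uppercase letter forward by `key` positions, wrapping Z to A."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # Map A..Z to 0..25, add the key, wrap around with mod 26.
            out.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)


ciphertext = caesar_shift("HELLO WORLD", 3)
recovered = caesar_shift(ciphertext, -3)  # decrypting is shifting back
print(ciphertext)
print(recovered)
```

Decryption reuses the same function with a negative key, which works because Python's `%` operator always returns a non-negative result for a positive modulus.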
CAESAR CIPHER
Historical ciphers
[Figure: K = 3. Outer: plaintext. Inner: ciphertext.]

CAESAR CIPHER
Formally
  Encrypt(Letter, Key) = (Letter + Key) (mod 26)
  Decrypt(Letter, Key) = (Letter - Key) (mod 26)
  Encrypt("NIKITA", 3) = "QLNLWD"
  Decrypt("QLNLWD", 3) = "NIKITA"
More formally
  M = { sequences of letters }
  K = { i | i is an integer and 0 ≤ i ≤ 25 }
  E = { E_k | k ∈ K and for all letters m, E_k(m) = (m + k) mod 26 }
  D = { D_k | k ∈ K and for all letters c, D_k(c) = (26 + c - k) mod 26 }
  C = M

ATTACKS
Ciphertext-only attack: recover plaintext knowing only the ciphertext
Ciphertext: HSPAA SLRUV DSLKN LPZHK HUNLY VBZAO PUN

FREQUENCY ANALYSIS
HSPAA SLRUV DSLKN LPZHK HUNLY VBZAO PUN
Find most frequent letters
  4 times: L
  3 times: A, H, N, P, S, U
Guess: Decrypt(L) = E
Key = L - E = 7
Decrypt(HSPAA SLRUV DSLKN LPZHK HUNLY VBZAO PUN, 7) = ALITT LEKNO WLEDG EISAD ANGER OUSTH ING

BRUTE FORCE
Ciphertext = IGKYGXOYOTYKIAXK
  Decrypt(IGKYGXOYOTYKIAXK, 1) = HFJXFWNXNSXJHZWJ
  Decrypt(IGKYGXOYOTYKIAXK, 2) = GEIWEVMWMRWIGYVI
  Decrypt(IGKYGXOYOTYKIAXK, 3) = FDHVDULVLQVHFXUH
  Decrypt(IGKYGXOYOTYKIAXK, 4) = ECGUCTKUKPUGEWTG
  Decrypt(IGKYGXOYOTYKIAXK, 5) = DBFTBSJTJOTFDVSF
  Decrypt(IGKYGXOYOTYKIAXK, 6) = CAESARISINSECURE

ATTACKING THE CIPHER
Exhaustive search
  If the key space is small enough, try all possible keys until you find the right one
  Caesar cipher has 26 possible keys
Statistical analysis
  Compare to 1-gram model of English

STATISTICAL ATTACK
Compute frequency of each letter in the ciphertext (KHOOR ZRUOG):
  G 0.1    H 0.1    K 0.1    O 0.3
  R 0.2    U 0.1    Z 0.1
Apply 1-gram model of English
  Frequency of characters (1-grams) in English is on next slide

CHARACTER FREQUENCIES (1-GRAMS) IN ENGLISH
  a 0.080    h 0.060    n 0.070    t 0.090
  b 0.015    i 0.065    o 0.080    u 0.030
  c 0.030    j 0.005    p 0.020    v 0.010
  d 0.040    k 0.005    q 0.002    w 0.015
  e 0.130    l 0.035    r 0.065    x 0.005
  f 0.020    m 0.030    s 0.060    y 0.020
  g 0.015                          z 0.002

STATISTICAL ANALYSIS
f(c): frequency of character c in ciphertext
φ(i): correlation of frequency of letters in ciphertext with corresponding letters in English, assuming key is i
  $\varphi(i) = \sum_{0 \le c \le 25} f(c)\, p(c - i)$
  so here, φ(i) = 0.1 p(6 - i) + 0.1 p(7 - i) + 0.1 p(10 - i) + 0.3 p(14 - i) + 0.2 p(17 - i) + 0.1 p(20 - i) + 0.1 p(25 - i)
  p(x) is the frequency of character x in English, with x taken mod 26

CORRELATION: φ(i) FOR 0 ≤ i ≤ 25
   i   φ(i)       i   φ(i)       i   φ(i)       i   φ(i)
   0   0.0482     7   0.0442    13   0.0520    19   0.0315
   1   0.0364     8   0.0202    14   0.0535    20   0.0302
   2   0.0410     9   0.0267    15   0.0226    21   0.0517
   3   0.0575    10   0.0635    16   0.0322    22   0.0380
   4   0.0252    11   0.0262    17   0.0392    23   0.0370
   5   0.0190    12   0.0325    18   0.0299    24   0.0316
   6   0.0660                                  25   0.0430

THE RESULT
Most probable keys, based on φ:
  i = 6,  φ(i) = 0.0660    plaintext EBIIL TLOIA
  i = 10, φ(i) = 0.0635    plaintext AXEEH PHKEW
  i = 3,  φ(i) = 0.0575    plaintext HELLO WORLD
  i = 14, φ(i) = 0.0535    plaintext WTAAD LDGAS
Only English phrase is for i = 3
  That's the key (3 or 'D')

CAESAR'S PROBLEM
Key is too short
  Can be found by exhaustive search
  Statistical frequencies not concealed well
    They look too much like regular English letters
So make it longer
  Multiple letters in key
  Idea is to smooth the statistical frequencies to make cryptanalysis harder

VIGENÈRE CIPHER
Like Caesar cipher, but use a phrase
Example
  Message THE BOY HAS THE BALL
  Key VIG
  Encipher using Caesar cipher for each letter:
    key    VIGVIGVIGVIGVIGV
    plain  THEBOYHASTHEBALL
    cipher OPKWWECIYOPKWIRG

VIGENÈRE CIPHER
A different Caesar cipher per letter
    MORESECURETHANCAESAR (Plaintext)
  + SECRETSECRETSECRETSE (Key)
  = FTUWXYVZUWYBTSFSJMTW (Ciphertext)
This example numbers the letters from A = 1 (the previous example uses A = 0, so that key letter A means no shift):
  M (13) + S (19) = F (6) mod 26
  O (15) + E (5) = T (20) mod 26
  ...

VIGENÈRE ANALYSIS
Key space?
  26^Length(Key)
Frequency analysis?
  Doesn't work directly, because different letters are enciphered with different keys
For many years, the Vigenère cipher was considered unbreakable!
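The per-letter Caesar shift that defines the Vigenère cipher can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the slides: the function name `vigenere` and its `decrypt` flag are hypothetical, the input is assumed to contain only uppercase A-Z with spaces already removed, and letters are numbered from A = 0, which is the convention that reproduces the THE BOY HAS THE BALL example.

```python
from itertools import cycle


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Apply a Caesar shift per letter, with the shift taken from the repeating key."""
    sign = -1 if decrypt else 1
    out = []
    # cycle(key) repeats the key letters for as long as the text lasts.
    for ch, k in zip(text, cycle(key)):
        shift = ord(k) - ord("A")  # key letter A means shift 0 (assumed convention)
        out.append(chr((ord(ch) - ord("A") + sign * shift) % 26 + ord("A")))
    return "".join(out)


cipher = vigenere("THEBOYHASTHEBALL", "VIG")
plain = vigenere(cipher, "VIG", decrypt=True)
print(cipher)
print(plain)
```

With a one-letter key the function reduces to the Caesar cipher, which is why the Caesar key space of 26 grows to 26^Length(Key) here.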
USEFUL TERMS
period: length of key
  In earlier example, period is 3
tableau: table used to encipher and decipher
  Vigenère cipher has key letters on top, plaintext letters on the left
polyalphabetic: the key has several different letters
  Caesar cipher is monoalphabetic

VIGENÈRE ANALYSIS
Guess period of the cipher = p
Construct p frequency tables
Cryptanalyze each one
http://math.ucsd.edu/~crypto/java/EARLYCIPHERS/Vigenere.html
Better yet, recover period
  Look for repeated n-grams

VIGENÈRE ANALYSIS
The index of coincidence (IC) measures the differences in the frequencies of the letters in the ciphertext:
  the probability that two randomly chosen letters from the ciphertext will be the same.
  $IC = \frac{1}{N(N-1)} \sum_{c=0}^{25} F_c (F_c - 1)$
  F_c = frequency (number of occurrences) of cipher character c, N = length of the ciphertext
Indices of coincidence for different periods (expected values for English text):
  Period   1       2       3       4       5       10      Large
  IC       0.066   0.052   0.047   0.045   0.044   0.041   0.038

VIGENÈRE TABLEAU
The key letters on top, plaintext letters on the left

RELEVANT PARTS OF TABLEAU
       G   I   V
  A    G   I   V
  B    H   J   W
  E    K   M   Z
  H    N   P   C
  L    R   T   G
  O    U   W   J
  S    Y   A   N
  T    Z   B   O
  Y    E   G   T
Tableau shown has relevant rows, columns only
Example encipherments:
  Key V, letter T: follow V column down to T row (giving "O")
  Key I, letter H: follow I column down to H row (giving "P")

VIGENÈRE ANALYSIS
Ciphertext:
  ADQYS MIUSB OXKKT MIBHK IZOOO
  EQOOG IFBAG KAUMF VVTAA CIDTW
  MOCIO EQOOG BMBFV ZGGWP CIEKQ
  HSNEW VECNE DLAAV RWKXS VNSVP
  HCEUT QOIOF MEGJS WTPCH AJMOC
  HIUIX
Could this be a Caesar cipher?
  We find that the index of coincidence is 0.043, which indicates a key of length 5 or more.
  So we assume that the key is of length greater than 1, and apply the Kasiski method

VIGENÈRE ANALYSIS
Repetitions of 2 letters or more (positions counted from 0):
  Letters   Start   End   Gap length   Factors of gap length
  MI          5      15       10       2, 5
  OO         22      27        5       5
  OEQOOG     24      54       30       2, 3, 5
  FV         39      63       24       2, 2, 2, 3
  AA         43      87       44       2, 2, 11
  MOC        50     122       72       2, 2, 2, 3, 3
  QO         56     105       49       7, 7
  PC         69     117       48       2, 2, 2, 2, 3
  NE         77      83        6       2, 3
  SV         94      97        3       3
  CH        118     124        6       2, 3
The factors that occur most often in the gaps are 2 (in eight gaps) and 3 (in seven gaps). As a first guess, let us try 6.

VIGENÈRE ANALYSIS
To verify this guess, we compute the index of coincidence for each alphabet.
We first arrange the message into 6 columns. Each column represents one alphabet.
  A D Q Y S M
  I U S B O X
  K K T M I B
  H K I Z O O
  O E Q O O G
  I F B A G K
  A U M F V V
  T A A C I D
  T W M O C I
  O E Q O O G
  B M B F V Z
  G G W P C I
  E K Q H S N
  E W V E C N
  E D L A A V
  R W K X S V
  N S V P H C
  E U T Q O I
  O F M E G J
  S W T P C H
  A J M O C H
  I U I X
The indices of coincidence are:
  Alphabet #1: IC = 0.069    Alphabet #4: IC = 0.056
  Alphabet #2: IC = 0.078    Alphabet #5: IC = 0.124
  Alphabet #3: IC = 0.078    Alphabet #6: IC = 0.043
All ICs indicate a single alphabet except for the ICs of alphabets #4 (period between 1 and 2) and #6 (period between 5 and 10).

VIGENÈRE ANALYSIS
Counting characters in each column (alphabet):
  Column     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  #1         3 1 0 0 4 0 1 1 3 0 1 0 0 1 3 0 0 1 1 2 0 0 0 0 0 0
  #2         1 0 0 2 2 2 1 0 0 1 3 0 1 0 0 0 0 0 1 0 4 0 4 0 0 0
  #3         1 2 0 0 0 0 0 0 2 0 1 1 4 0 0 0 4 0 1 3 0 2 1 0 0 0
  #4         2 1 1 0 2 2 0 1 0 0 0 0 1 0 4 3 1 0 0 0 0 0 0 2 1 1
  #5         1 0 5 0 0 0 2 1 2 0 0 0 0 0 5 0 0 0 3 0 0 2 0 0 0 0
  #6         0 1 1 1 0 0 2 2 3 1 1 0 1 2 1 0 0 0 0 0 0 3 0 1 0 1
  unshifted  H M M M H M M H H M M M M H H M L H H H M L L L L L
An unshifted alphabet has the characteristics in the last row (L = low frequency, M = moderate frequency, H = high frequency).
Now compare the frequency counts in the six alphabets with the frequency count of the unshifted alphabet.
The first alphabet matches the characteristics of the unshifted alphabet (note the values for A, E, and I in particular).

VIGENÈRE ANALYSIS
The 3rd alphabet seems to be shifted, with I mapping to A.
In the 6th alphabet, V maps to A.
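The per-alphabet indices of coincidence quoted in this analysis can be reproduced with a short sketch. It is an illustration rather than the slides' own code: the names `CIPHERTEXT`, `index_of_coincidence`, and `column_ics` are hypothetical, the IC is computed as the sum of F_c (F_c - 1) divided by N (N - 1), and the ciphertext is written in the order implied by the six-column arrangement and the repetition positions.

```python
from collections import Counter

# Worked-example ciphertext, in the order implied by the six-column layout.
CIPHERTEXT = (
    "ADQYS MIUSB OXKKT MIBHK IZOOO EQOOG IFBAG KAUMF VVTAA CIDTW "
    "MOCIO EQOOG BMBFV ZGGWP CIEKQ HSNEW VECNE DLAAV RWKXS VNSVP "
    "HCEUT QOIOF MEGJS WTPCH AJMOC HIUIX"
).replace(" ", "")


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from `text` are the same."""
    n = len(text)
    counts = Counter(text)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


period = 6  # the guess obtained from the repeated n-grams
# Column i holds every 6th letter starting at offset i: one alphabet each.
column_ics = [index_of_coincidence(CIPHERTEXT[i::period]) for i in range(period)]

print(f"Whole message: IC = {index_of_coincidence(CIPHERTEXT):.3f}")
for number, ic in enumerate(column_ics, start=1):
    print(f"Alphabet #{number}: IC = {ic:.3f}")
```

Slicing with a step of `period` is what splits the message into alphabets; a column whose IC stays near the single-alphabet value supports the guessed period.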
VIGENÈRE ANALYSIS
With proper spacing and punctuation, we have
  A LIMERICK PACKS LAUGHS ANATOMICAL
  INTO SPACE THAT IS QUITE ECONOMICAL
  BUT THE GOOD ONES I'VE SEEN
  SO SELDOM ARE CLEAN,
  AND THE CLEAN ONES SO SELDOM ARE COMICAL.
The key is ASIMOV.

VIGENÈRE ANALYSIS
Here is a ciphertext message