Solutions to problems for week 1-2, Cryptography 1. From the keyword we deduce the following substitution: plain a b c d e f g h i j k l m n o p q r s t u v w x y z cipher K R Y P T O E N I A B C D F G H J L M Q S U V W X Z Now encryption is straightforward: secre MTYLT tmess QDTMM age KET 2. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 a b c d e f g h i j k l m n o p q r s t u v w x y z +k +l +a +s s e c r e t m e 7→ s s a g e 3. CPCJ O E M W 7→ CPCJO EMWCD AYO CDAY O (a) HKPUFCMHY BHDDXZH encrypted message (b) No, the ciphertext is too short (a more precise argument based on information theory will be discussed later). Any message matching the following pattern is a possible plaintext: ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?1 ?8 ?9 ?1 ?10 ?10 ?11 ?12 ?1 Other possible decryptions: • rightward broomer • embroaden lettuce • equilobed terrace • ... Grouping the ciphertext letters in groups of five, and thus hiding word lengths, would increase the number of possible plaintexts even more. 4. We use Kasiski’s test and note that the distance between the two first occurrences is 1283 − 37 = 1246, while the distance between the latter two is 2291 − 1283 = 1008. We factor both these numbers: 1246 = 2 · 7 · 89. 1008 = 24 · 32 · 7. Thus, the likely period is 7 (or possibly 14). Remark: We could also calculate the gcd of 1008 and 1246, which is 14, and draw the same conclusion. 1 5. The extended key is EXAMWF and we just have to subtract modulo 26, using A=0, . . . Z = 25: W F M B H J - E X A M W F S I M P L E The plaintext is simple. 6. A first crude estimate is as follows. The number of “words” with n letters is 26n . Thus the total number of words with between four and ten letters is 10 X 26n = 146813779461232. n=4 Each keyword uniquely defines a substitution so the above number is also an upper bound on the total number of possible substitutions in the keyword-based practice. A more careful analysis shows the following: Any ten-letter “word” with all letters distinct gives rise to one substitution. These are all different. This contributes with 26 · 25 · 24 · · · 17 = 26!/16! substitutions. But all shorter keywords (or keywords that become shorter because of removing duplicates) gives rise to a substitution that has already been counted among those above. So the total number is 26!/16! = 19275223968000. This should be compared with the total number without the keyword restriction, which is 26! = 403291461126605635584000000. Thus the keyword-based substitutions form only a very small fraction of all substitutions; the fraction is 1/16! ≈ 4.8 · 10−14 . 7. To estimate the period we use the Kasiski test. The distance between the two occurrences given is 241 − 10 = 231 = 3 · 7 · 11 positions. Possible periods are thus 3, 7 and 11. If the guess is correct, we can immediately find the corresponding shifts: at position 10 the shift is T − c = 19 − 2 = 17 = r. Similar computations for the other positions gives the shift keys rrectcorrect. We now see that this is not periodic with periods 3 or 11, while period 7 is possible. The keyword of length 7 starts at position 15; hence the keyword is correct. 8. (a) By studying the plaintext/ciphertext pair we note that the system used is not a simple substitution cipher, since e.g. the two occurrences of ’y’ have different encryptions (’A’ and ’R’). It is also not a transposition cipher, since the ciphertext contains letters not occurring in the plaintext. To see if the cipher is a Vigenère cipher we compute the successive shifts by subtracting plaintext from ciphertext letter by letter. The result is SECRET SECRET SECRET SECR and it becomes obvious that we have a Vigenère cipher with key SECRET . (b) Again matching plaintext against ciphertext we find that nothing contradicts a substitution cipher. Collecting data, we find the following incomplete substitution: plain a b c d e f g h i j k l m n o p q r s t u v w x y z cipher E G I J K L M R S T V W X Y A C It seems highly plausible that the cipher is a shift cipher with shift 4. 2 (c) Here we can see that the ciphertext is a permutation of the plaintext. Looking more carefully, we see that the first eleven letters of plaintext are permuted to the first eleven letters of ciphertext; the same holds for the last eleven letters. In both cases the permutation is (3, 7, 0, 1, 6, 10, 2, 8, 5, 4, 9), so we have a transposition cipher with block length 11. 9. (a) It is enough to ask for encryption of any one-letter plaintext, e.g. ’a’. This will allow us to find the shift. (b) The plaintext message “aaaaaaaaaaaaa” will encrypt to repetitions of the keyword. The number of ’a’:s depends on how long keywords one considers reasonable. Probably one should choose a bit more than the estimated maximal key length, to doublecheck on the repetition. (c) Here one in general needs to supply the complete alphabet “abcd...xyz”, i.e. a plaintext of 26 letters. (d) For block length at most 26, again an alphabet string is enough. Longer periods needs details of how messages are padded; probably several messages will have to be encrypted to solve the problem in general. 10. (a) We follow the hint and consider a fixed n. The choice of n pairs can be done sequentially: 1. We choose two letters among the 26 available. This can be done in 26 × 25 1×2 ways. 2. We choose the next pair among the remaining 24 letters, which can be done in 24 × 23 1×2 ways. ··· n. We choose the nth pair among the remaining 26 − 2(n − 1) letters, which can be done in (28 − 2n) × (27 − 2n) 1×2 ways. Multiplying the number of choices in these steps we get in total 26! (26 − 2n)! × 2n different ways. However, now each n-pair substitution has been counted several times, since each order of selection of the n pairs has been counted. Thus we must divide with the number of ways we can order these n pairs, i.e. n!. The final result, the number of reciprocal substitutions that exchange n pair of letters, is then 26! (26 − 2n)! × 2n × n! Let us call this R(n). The total number of reciprocal subtitutions is then 13 X n=0 R(n) = 13 X 26! = 532985208200576 ≈ 5.3 × 1014 . n × n! (26 − 2n)! × 2 n=0 3 (b) The total number of substitutions is 26! = 403291461126605635584000000 ≈ 4.0 × 1026 , so the fraction of reciprocal substitutions is ≈ 1.3 × 10−12 . (c) R(13) = 7905853580625. (d) The plugboard contributes with R(6) = 100391791500 different settings. The choice of three rotors out of five, with order significant, gives 5 × 4 × 3 = 60 settings and 26 possible positions for each rotor gives 263 = 17576 possibilities. In total, this version of Enigma had 100391791500 × 60 × 17576 = 105869167644240000 ≈ 1.1 × 1017 different settings. This is the same order of magnitude as the age of the universe in seconds. Remark: It is not claimed that all settings give rise to different ciphers; indeed, the result in (a) shows that this is not the case. (e) One way to convince oneself is to draw the machine and the signal path through the different components. A more precise way is the following. The plugboard and the rotors together form a product substitution S. If we denote the reflector substitution with R, we get for the complete Enigma substitution SRS −1 , where juxtaposition means composition of substitution and S −1 the inverse substitution (signal back through rotors and plugboard). We need to show that SRS −1 is reciprocal, using the fact that R is reciprocal. In this notation, the fact that R is reciprocal can be expressed as RR = I, where I is the identity substitution. We get (SRS −1 )(SRS −1 ) = {regrouping using associativity} (SR)(S −1 S)(RS −1 ) = {using inverse property} (SR)(RS −1 ) = {regrouping using associativity} S(RR)S −1 = {R is reciprocal} SS −1 = {using inverse property} I so SRS −1 is reciprocal. 11. The block size is the least common multiple of m and n. 12. We recall that the IC is the probability that the letters at two distinct, randomly chosen, positions in the text are equal. The expected value of the IC for a text taken from a language P where the letter probabilities are {pi } is p2i . Thus for plaintext English we have E(IC) = ICEng ≈ 0.066, while a text where all letters have the same probabilities will have E(IC) = ICunif orm = 1/26 ≈ 0.038. Now to the problem itself. We consider a ciphertext of length N obtained by encrypting English plaintext with a Vigenère cipher of period n. We think of the ciphertext as written in n columns, where all letters in a column are shifted the same amount. We consider two cases: (a) The two positions are in the same column. In this case the two letters come from a language with the same letter distribution as English and thus we should expect an IC ≈ ICEng . 4 (b) The two positions are in different columns. In this case the letters at the two positions come from different shifts and we should expect that IC ≈ ICunif orm . Thus the expected IC is a weighted average of the two cases, with weights corresponding to the probabilities for the cases. If we assume that N is much larger than n we can approximate the probability for case (a) with 1/n (after the first position has been chosen in column C, the probability is 1/n that the second choice is also in column C). A more exact calculation would take into account that the number of possible choices in column C is one less after the first choice and that the length of the columns may differ by one, but we leave that for the interested reader. Summarizing, we get E(IC) = 1 n−1 · ICEng + · ICunif orm . n n As a check, we see that for n = 1 (i.e a simple substitution), we get E(IC) = ICEng , the result we know since before. For large n, we approach the uniform case. A table of values for some values of n is n 1 2 3 4 5 E(IC) 0.066 0.052 0.048 0.045 0.044 It seems clear that only could expect to identify the case n = 1 (i.e. not Vigenère at all) and possibly the unlikely case n = 2. Remark: This last table also shows in a precise sense what we already noted, that the Vigenère cipher gives a flatter letter frequency distribution than a simple substitution cipher. 13. To understand what kind of system we are dealing with, a suitable first test is to compute the index of coincidence. This turns out to be 0.042, which is way too low for the ciphertext to be the result of a simple substitution. To find out whether the ciphertext could be the result of a Vigenère cipher we look for repeated letter sequences in the ciphertext. With computer help one easily finds that the string GAVOCYPLLZGDD occurs in the ciphertext at positions 155 and 477 and that the string BEHTPGRIKSS occurs at position 345 and 464. We use the Kasiski test and compute 477 − 155 = 322 = 2 × 7 × 23 464 − 345 = 119 = 7 × 17 which points to 7 as a likely period. The next step is to write the ciphertext in 7 columns and compute letter distributions for each of these. We illustrate with the first column, which starts with ARVGILII. We compute the letter frequencies and get A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 3 0 2 0 2 1 7 2 15 3 0 7 6 0 0 6 4 7 7 2 1 7 1 5 3 0 We compute the correlation between the letter distribution for English and the various shifts of this distribution. The result shows that the likely shift is 4, for which the correlation is 0.063; for all other shifts the corrrelations are less than 0.047. Thus the first letter of the Vigenère keyword is e. Continuing in this way for the other six columns shows that the keyword is entropy. Decryption is now easy (but tedious by hand). 5 Frequency (%) of Letters in English Text A B C D E 8.15 1.44 2.76 N O P Q R 7.10 8.00 1.98 0.12 6.83 F G H I J K L M 1.99 5.26 6.35 0.13 0.42 3.39 2.54 S T U V W X Y Z 6.10 10.47 2.46 0.92 1.54 0.17 1.98 0.08 3.79 13.11 2.92 Text: The index of coincidence provides a measure of how likely it is to draw two matching letters by randomly selecting two letters from a given text. The chance of drawing a given letter in the text is (number of times that letter appears / length of the text). The chance of drawing that same letter again (without replacement) is (appearances ­ 1 / text length ­ 1). The product of these two values gives you the chance of drawing that letter twice in a row. One can find this product for each letter that appears in the text, then sum these products to get a chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically 26 in English. IC = 06779546681057744 Uniform: alokkbfyjubxvndtlcnaywdaduflkxcmiqyvtdtczwbulfqwjfzhbeigynslkuzumypfdkjdglzssrqbypjbw riugchswhmifcpioynvjnpdghffxqivjqppszhqgvalekhhwsiihkiazfzbycleduznzicazuoggxpdpxnyhv bjackzgyfjsezspbuoyiwehljyjwzqrabucmtkmateeuyvwsmuciajukkdgjvajxunlnxxpsdvncrludhxl hihekhhwsiihkiazfzby IC= 0.03889248006895066 The substitution cipher:(not for the text above) wmzfxtdhzfngfwxwnwxjevxdmzoxfkvxdmzowmkwmkfgzzexenfzpjotkebmneloz lfjpbzkofxwvjefxfwfjpfngfwxwnwxeszyzobdhkxewzawvmkokvwzopjoklxppz ozewvxdmzowzawvmkokvwzoxwlxppzofpojtvkzfkovxdmzoxewmkwwmzvxdmzokh dmkgzwxfejwfxtdhbwmzkhdmkgzwfmxpwzlxwxfvjtdhzwzhbrntghzl IC= 0.06667729083665339 Vigenere Example: (not for the text above) vptzmdrttzysubxaykkwcjmgjmgpwreqeoiivppalrujtlrzpchljftupucywvsyi uuwufirtaxagfpaxzxjqnhbfjvqibxzpotciiaxahmevmmagyczpjxvtndyeuknul vvpbrptygzilbkeppyetvmgpxuknulvjhzdtgrgapygzrptymevppaxygkxwlvtia wlrdmipweqbhpqgngioirnxwhfvvawpjkglxamjewbwpvvmafnlojalh IC= 0.04216733067729084 IC = (the probability to choose A for the first choice * the probability to choose A for the second choice) + (the probability to choose B for the first choice * the probability to choose B for the second choice) + (the probability to choose C for the first choice * the probability to choose C for the second choice) + … (the probability to choose Z for the first choice * the probability to choose Z for the second choice) = IC_ uniform = (1/26 * 1/26 ) + … + (1/26 * 1/26 ) = 1/26 = 0.038 IC_english = (8.15 * 8.15) + (1.44 * 1.44) + … + (0.08 * 0.08)= 0.06