Solutions to problems for week 1

advertisement
Solutions to problems for week 1-2, Cryptography
1. From the keyword we deduce the following substitution:
plain a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher K R Y P T O E N I A B C D F G H J L M Q S U V W X Z
Now encryption is straightforward:
secre
MTYLT
tmess
QDTMM
age
KET
2. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
a b c d e f g h i j k l m n o p q r s t u v w x y z
+k +l +a +s
s e c r
e t m e 7→
s s a g
e
3.
CPCJ
O E M W 7→ CPCJO EMWCD AYO
CDAY
O
(a)
HKPUFCMHY BHDDXZH
encrypted message
(b) No, the ciphertext is too short (a more precise argument based on information theory
will be discussed later).
Any message matching the following pattern is a possible plaintext:
?1 ?2 ?3 ?4 ?5 ?6 ?7 ?1 ?8 ?9 ?1 ?10 ?10 ?11 ?12 ?1
Other possible decryptions:
• rightward broomer
• embroaden lettuce
• equilobed terrace
• ...
Grouping the ciphertext letters in groups of five, and thus hiding word lengths, would
increase the number of possible plaintexts even more.
4. We use Kasiski’s test and note that the distance between the two first occurrences is 1283 −
37 = 1246, while the distance between the latter two is 2291 − 1283 = 1008. We factor
both these numbers:
1246 = 2 · 7 · 89.
1008 = 24 · 32 · 7.
Thus, the likely period is 7 (or possibly 14).
Remark: We could also calculate the gcd of 1008 and 1246, which is 14, and draw the same
conclusion.
1
5. The extended key is EXAMWF and we just have to subtract modulo 26, using A=0, . . . Z =
25:
W F M B H J
- E X A M W F
S I M P L E
The plaintext is simple.
6. A first crude estimate is as follows.
The number of “words” with n letters is 26n . Thus the total number of words with between
four and ten letters is
10
X
26n = 146813779461232.
n=4
Each keyword uniquely defines a substitution so the above number is also an upper bound
on the total number of possible substitutions in the keyword-based practice.
A more careful analysis shows the following:
Any ten-letter “word” with all letters distinct gives rise to one substitution. These are all
different. This contributes with
26 · 25 · 24 · · · 17 = 26!/16!
substitutions. But all shorter keywords (or keywords that become shorter because of removing duplicates) gives rise to a substitution that has already been counted among those above.
So the total number is
26!/16! = 19275223968000.
This should be compared with the total number without the keyword restriction, which is
26! = 403291461126605635584000000. Thus the keyword-based substitutions form only
a very small fraction of all substitutions; the fraction is 1/16! ≈ 4.8 · 10−14 .
7. To estimate the period we use the Kasiski test. The distance between the two occurrences
given is 241 − 10 = 231 = 3 · 7 · 11 positions. Possible periods are thus 3, 7 and 11.
If the guess is correct, we can immediately find the corresponding shifts: at position 10 the
shift is T − c = 19 − 2 = 17 = r. Similar computations for the other positions gives
the shift keys rrectcorrect. We now see that this is not periodic with periods 3 or 11,
while period 7 is possible. The keyword of length 7 starts at position 15; hence the keyword
is correct.
8.
(a) By studying the plaintext/ciphertext pair we note that the system used is not a simple
substitution cipher, since e.g. the two occurrences of ’y’ have different encryptions
(’A’ and ’R’). It is also not a transposition cipher, since the ciphertext contains letters
not occurring in the plaintext.
To see if the cipher is a Vigenère cipher we compute the successive shifts by subtracting plaintext from ciphertext letter by letter. The result is
SECRET SECRET SECRET SECR
and it becomes obvious that we have a Vigenère cipher with key SECRET .
(b) Again matching plaintext against ciphertext we find that nothing contradicts a substitution cipher. Collecting data, we find the following incomplete substitution:
plain a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher E
G
I J K L M
R S T
V W X Y
A
C
It seems highly plausible that the cipher is a shift cipher with shift 4.
2
(c) Here we can see that the ciphertext is a permutation of the plaintext. Looking more
carefully, we see that the first eleven letters of plaintext are permuted to the first eleven
letters of ciphertext; the same holds for the last eleven letters. In both cases the permutation is
(3, 7, 0, 1, 6, 10, 2, 8, 5, 4, 9),
so we have a transposition cipher with block length 11.
9.
(a) It is enough to ask for encryption of any one-letter plaintext, e.g. ’a’. This will allow
us to find the shift.
(b) The plaintext message “aaaaaaaaaaaaa” will encrypt to repetitions of the keyword. The
number of ’a’:s depends on how long keywords one considers reasonable. Probably
one should choose a bit more than the estimated maximal key length, to doublecheck
on the repetition.
(c) Here one in general needs to supply the complete alphabet “abcd...xyz”, i.e. a plaintext
of 26 letters.
(d) For block length at most 26, again an alphabet string is enough. Longer periods needs
details of how messages are padded; probably several messages will have to be encrypted to solve the problem in general.
10.
(a) We follow the hint and consider a fixed n. The choice of n pairs can be done sequentially:
1. We choose two letters among the 26 available. This can be done in
26 × 25
1×2
ways.
2. We choose the next pair among the remaining 24 letters, which can be done in
24 × 23
1×2
ways.
···
n. We choose the nth pair among the remaining 26 − 2(n − 1) letters, which can be
done in
(28 − 2n) × (27 − 2n)
1×2
ways.
Multiplying the number of choices in these steps we get in total
26!
(26 − 2n)! × 2n
different ways. However, now each n-pair substitution has been counted several times,
since each order of selection of the n pairs has been counted. Thus we must divide
with the number of ways we can order these n pairs, i.e. n!. The final result, the
number of reciprocal substitutions that exchange n pair of letters, is then
26!
(26 − 2n)! × 2n × n!
Let us call this R(n).
The total number of reciprocal subtitutions is then
13
X
n=0
R(n) =
13
X
26!
= 532985208200576 ≈ 5.3 × 1014 .
n × n!
(26
−
2n)!
×
2
n=0
3
(b) The total number of substitutions is
26! = 403291461126605635584000000 ≈ 4.0 × 1026 ,
so the fraction of reciprocal substitutions is ≈ 1.3 × 10−12 .
(c)
R(13) = 7905853580625.
(d) The plugboard contributes with R(6) = 100391791500 different settings. The choice
of three rotors out of five, with order significant, gives 5 × 4 × 3 = 60 settings and 26
possible positions for each rotor gives 263 = 17576 possibilities. In total, this version
of Enigma had
100391791500 × 60 × 17576 = 105869167644240000 ≈ 1.1 × 1017
different settings. This is the same order of magnitude as the age of the universe in
seconds.
Remark: It is not claimed that all settings give rise to different ciphers; indeed, the
result in (a) shows that this is not the case.
(e) One way to convince oneself is to draw the machine and the signal path through the
different components. A more precise way is the following.
The plugboard and the rotors together form a product substitution S. If we denote the
reflector substitution with R, we get for the complete Enigma substitution SRS −1 ,
where juxtaposition means composition of substitution and S −1 the inverse substitution (signal back through rotors and plugboard). We need to show that SRS −1 is
reciprocal, using the fact that R is reciprocal. In this notation, the fact that R is reciprocal can be expressed as RR = I, where I is the identity substitution. We get
(SRS −1 )(SRS −1 ) = {regrouping using associativity}
(SR)(S −1 S)(RS −1 ) = {using inverse property}
(SR)(RS −1 ) = {regrouping using associativity}
S(RR)S −1 = {R is reciprocal}
SS −1 = {using inverse property}
I
so SRS −1 is reciprocal.
11. The block size is the least common multiple of m and n.
12. We recall that the IC is the probability that the letters at two distinct, randomly chosen,
positions in the text are equal. The expected value of the IC for a text taken from a language
P
where the letter probabilities are {pi } is p2i . Thus for plaintext English we have E(IC) =
ICEng ≈ 0.066, while a text where all letters have the same probabilities will have E(IC) =
ICunif orm = 1/26 ≈ 0.038.
Now to the problem itself. We consider a ciphertext of length N obtained by encrypting
English plaintext with a Vigenère cipher of period n. We think of the ciphertext as written
in n columns, where all letters in a column are shifted the same amount.
We consider two cases:
(a) The two positions are in the same column. In this case the two letters come from a
language with the same letter distribution as English and thus we should expect an IC
≈ ICEng .
4
(b) The two positions are in different columns. In this case the letters at the two positions
come from different shifts and we should expect that IC ≈ ICunif orm .
Thus the expected IC is a weighted average of the two cases, with weights corresponding to
the probabilities for the cases.
If we assume that N is much larger than n we can approximate the probability for case (a)
with 1/n (after the first position has been chosen in column C, the probability is 1/n that
the second choice is also in column C). A more exact calculation would take into account
that the number of possible choices in column C is one less after the first choice and that
the length of the columns may differ by one, but we leave that for the interested reader.
Summarizing, we get
E(IC) =
1
n−1
· ICEng +
· ICunif orm .
n
n
As a check, we see that for n = 1 (i.e a simple substitution), we get E(IC) = ICEng , the
result we know since before.
For large n, we approach the uniform case. A table of values for some values of n is
n
1
2
3
4
5
E(IC)
0.066
0.052
0.048
0.045
0.044
It seems clear that only could expect to identify the case n = 1 (i.e. not Vigenère at all) and
possibly the unlikely case n = 2.
Remark: This last table also shows in a precise sense what we already noted, that the
Vigenère cipher gives a flatter letter frequency distribution than a simple substitution cipher.
13. To understand what kind of system we are dealing with, a suitable first test is to compute the
index of coincidence. This turns out to be 0.042, which is way too low for the ciphertext to
be the result of a simple substitution. To find out whether the ciphertext could be the result
of a Vigenère cipher we look for repeated letter sequences in the ciphertext.
With computer help one easily finds that the string GAVOCYPLLZGDD occurs in the ciphertext at positions 155 and 477 and that the string BEHTPGRIKSS occurs at position 345
and 464. We use the Kasiski test and compute
477 − 155 = 322 = 2 × 7 × 23
464 − 345 = 119 = 7 × 17
which points to 7 as a likely period.
The next step is to write the ciphertext in 7 columns and compute letter distributions for each
of these. We illustrate with the first column, which starts with ARVGILII. We compute the
letter frequencies and get
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
3 0 2 0 2 1 7 2 15 3 0 7 6 0 0 6 4 7 7 2 1 7 1 5 3 0
We compute the correlation between the letter distribution for English and the various shifts
of this distribution. The result shows that the likely shift is 4, for which the correlation is
0.063; for all other shifts the corrrelations are less than 0.047. Thus the first letter of the
Vigenère keyword is e. Continuing in this way for the other six columns shows that the
keyword is entropy. Decryption is now easy (but tedious by hand).
5
Frequency (%) of Letters in English Text A B C D E 8.15 1.44
2.76
N O P Q R 7.10 8.00
1.98
0.12
6.83 F G H I J K L M 1.99 5.26
6.35
0.13
0.42
3.39
2.54 S T U V W X Y Z 6.10
10.47
2.46
0.92
1.54
0.17
1.98
0.08 3.79 13.11 2.92
Text:
The index of coincidence provides a measure of how likely it is to draw two matching letters by randomly selecting two letters from a given text. The chance of drawing a given letter in the text is (number of times that letter appears / length of the text). The chance of drawing that same letter again (without replacement) is (appearances ­ 1 / text length ­ 1). The product of these two values gives you the chance of drawing that letter twice in a row. One can find this product for each letter that appears in the text, then sum these products to get a chance of drawing two of a kind. This probability can then be normalized by multiplying it by some coefficient, typically 26 in English.
IC = 06779546681057744
Uniform: alokkbfyjubxvndtlcnaywdaduflkxcmiqyvtdtczwbulfqwjfzhbeigynslkuzumypfdkjdglzssrqbypjbw
riugchswhmifcpioynvjnpdghffxqivjqppszhqgvalekhhwsiihkiazfzbycleduznzicazuoggxpdpxnyhv
bjackzgyfjsezspbuoyiwehljyjwzqrabucmtkmateeuyvwsmuciajukkdgjvajxunlnxxpsdvncrludhxl
hihekhhwsiihkiazfzby
IC= 0.03889248006895066
The substitution cipher:(not for the text above)
wmzfxtdhzfngfwxwnwxjevxdmzoxfkvxdmzowmkwmkfgzzexenfzpjotkebmneloz
lfjpbzkofxwvjefxfwfjpfngfwxwnwxeszyzobdhkxewzawvmkokvwzopjoklxppz
ozewvxdmzowzawvmkokvwzoxwlxppzofpojtvkzfkovxdmzoxewmkwwmzvxdmzokh
dmkgzwxfejwfxtdhbwmzkhdmkgzwfmxpwzlxwxfvjtdhzwzhbrntghzl
IC= 0.06667729083665339 Vigenere Example: (not for the text above)
vptzmdrttzysubxaykkwcjmgjmgpwreqeoiivppalrujtlrzpchljftupucywvsyi
uuwufirtaxagfpaxzxjqnhbfjvqibxzpotciiaxahmevmmagyczpjxvtndyeuknul
vvpbrptygzilbkeppyetvmgpxuknulvjhzdtgrgapygzrptymevppaxygkxwlvtia
wlrdmipweqbhpqgngioirnxwhfvvawpjkglxamjewbwpvvmafnlojalh
IC= 0.04216733067729084
IC = (the probability to choose A for the first choice * the probability to choose A for the second choice) + (the probability to choose B for the first choice * the probability to choose B for the second choice) + (the probability to choose C for the first choice * the probability to choose C for the second choice) + …
(the probability to choose Z for the first choice * the probability to choose Z for the second choice) = IC_ uniform = (1/26 * 1/26 ) + … + (1/26 * 1/26 ) = 1/26 = 0.038
IC_english = (8.15 * 8.15) + (1.44 * 1.44) + … + (0.08 * 0.08)= 0.06
Download