Week 8 notes

Cryptanalysis of Substitution Cipher
Reminder: the alphabet is permuted, and each plaintext letter is replaced with the
corresponding letter of the permutation.
There are 26! (about 4 × 10^26) possible permutations, so exhaustive search is infeasible.
To perform cryptanalysis using statistical methods, we need a pretty long piece of
ciphertext (so I’ll steal the one from the book).
General procedure followed:
1. Look for the most common letters. If one is far ahead of the rest, guess it’s E. Other
common ones are probably from T, A, O, I, N, S, H, R.
2. Look at common bigrams, especially those involving letters you have guessed from
above. Using these, guess the other letter in the bigram. Most common bigrams (in
order): TH, HE, IN, ER, AN, RE, … (longer list in book)
3. Try to fill in each of the most common letters one at a time.
This can vary a lot depending on the ciphertext you have at hand – you may have to make
lots of guesses (many incorrect).
Refer to slides of tables & partial solutions from pages 29-32 of book.
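Steps 1 and 2 of the general procedure are just counting, so they are easy to hand to a computer. A minimal sketch, assuming the ciphertext is a plain string of letters (the function names and the toy string are made up for illustration, and the toy string is NOT the ciphertext from the book):

```python
from collections import Counter

def letter_counts(ciphertext):
    """Count each letter, ignoring case and anything that is not a letter."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return Counter(letters)

def bigram_counts(ciphertext):
    """Count overlapping pairs of adjacent letters.

    Non-letters are dropped first, so pairs are also counted across
    spaces (book-style ciphertext has no word breaks anyway).
    """
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))

# Hypothetical toy input, only to show the calls.
sample = "ZRW ZRW NZ"
print(letter_counts(sample).most_common(3))
print(bigram_counts(sample).most_common(3))
```

The counts only suggest guesses; deciding which guess to try first is still done by hand.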
Procedure followed for this example:
1. Z is most common by far → guess it’s e.
Other common letters: C, D, F, J, M, R, Y → these are probably from {t, a, o, i, n, s, h, r}
Look for common digrams _Z or Z_:
DZ: not sure, since it could be he or re
ZW: W is not very common; guess it’s d rather than r or s
We also have ZRW, RZW and RW occurring → guess R=n.
2. Look at bigrams _Z and Z_: try N=h, since NZ (he) is common and ZN (eh) is not.
We have the string RZCRWNZ on the 3rd line, which would be ne_ndhe → guess C=a.
3. Think about M… we have RNM which we think is nh_ → M is probably a vowel, i or o
… we’ll guess M=i, since ai is a more common digram than ao
4. Try to guess what’s encrypted to o… probably D, F, J or Y… guess Y, since otherwise
we’d get long strings of vowels
D, F, J still most common → likely from {r, s, t}
Since we have NMD = hi_, guess D = s
HNCMF = chair? → F=r and H=c → J=t by elimination.
5. Now just fill in more stuff that makes sense.
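While guessing, it helps to see the ciphertext with the current guesses filled in and everything else blanked out. A small sketch of that, using the guesses from steps 1 and 2 above (the helper name apply_partial_key is made up for illustration):

```python
def apply_partial_key(ciphertext, guesses):
    """Show guessed letters as lowercase plaintext and unknown letters as '_'.

    `guesses` maps an uppercase ciphertext letter to a lowercase
    plaintext letter. Non-letters are passed through unchanged.
    """
    return "".join(guesses.get(c, "_") if c.isalpha() else c
                   for c in ciphertext.upper())

# Guesses so far: Z=e, W=d, R=n, N=h.
guesses = {"Z": "e", "W": "d", "R": "n", "N": "h"}
print(apply_partial_key("RZCRWNZ", guesses))  # ne_ndhe

# Add the next guess, C=a, and look again.
guesses["C"] = "a"
print(apply_partial_key("RZCRWNZ", guesses))  # neandhe
```

A wrong guess usually shows up quickly as an impossible letter combination, at which point you remove it from the dictionary and try another.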
COSC 4P03 Week 8
Cryptanalysis of Vigenere Cipher
Reminder: use a keyword of length m, and add it (letter by letter, mod 26) to each block
of length m from the plaintext.
There are 26^m choices for keywords of a fixed length m, but we don’t know m (and an
intelligent person would use a large value for m).
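As a refresher, the cipher itself can be sketched in a few lines. This assumes the text and keyword contain only the letters A-Z (no spaces or punctuation); the function name and the example message are illustrative, not taken from the book:

```python
def vigenere(text, keyword, decrypt=False):
    """Add (or, when decrypting, subtract) the keyword letters mod 26."""
    sign = -1 if decrypt else 1
    out = []
    for i, c in enumerate(text.upper()):
        # Keyword letter used at position i, as a shift 0..25.
        k = ord(keyword[i % len(keyword)].upper()) - ord("A")
        out.append(chr((ord(c) - ord("A") + sign * k) % 26 + ord("A")))
    return "".join(out)

print(vigenere("ATTACKATDAWN", "LEMON"))                # LXFOPVEFRNHR
print(vigenere("LXFOPVEFRNHR", "LEMON", decrypt=True))  # ATTACKATDAWN
print(26 ** 5)  # number of keywords of length 5: 11881376
```

Even for m = 5 the keyword space is searchable by machine, which is one more reason a long keyword is the sensible choice.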
First step: try to determine m. (There are a couple of methods we can try.)
Kasiski test:
Suppose we have 2 identical segments of plaintext x positions apart, where x ≡ 0 (mod m).
Then these are encrypted to the same ciphertext.
Procedure:
• Look for identical sections in the ciphertext, of length at least 3.
• Record the distances between the starting positions of these sections, and find the
greatest common divisor of those distances.
• Guess that m, the length of the keyword, is this gcd (or one of its divisors).
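The Kasiski procedure can be sketched as follows. This is a minimal version, assuming the ciphertext is one continuous string of uppercase letters; the function name and the toy string are made up for illustration. Note that a repeat that happens by chance can drag the gcd down to 1, so in practice you may need to discard some distances by hand.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def kasiski_guess(ciphertext, length=3):
    """Return (g, distances) for repeated segments of the given length.

    distances: for every segment that occurs more than once, the distance
    from its first occurrence to each later occurrence.
    g: the gcd of all those distances (0 if nothing repeats).
    """
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    distances = [p - pos[0]
                 for pos in positions.values() if len(pos) > 1
                 for p in pos[1:]]
    return reduce(gcd, distances, 0), distances

# Toy string: "ABC" starts at positions 0, 10 and 25.
toy = "ABC" + "DEFGHIJ" + "ABC" + "KLMNOPQRSTUV" + "ABC"
print(kasiski_guess(toy))  # (5, [10, 25])
```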
Index of coincidence:
Let x be a string of alphabetic characters of length n.
Index of coincidence I_c(x): the probability that 2 random elements of x are identical.
Suppose the observed frequencies of A, …, Z in x are f_0, f_1, …, f_25.
There are n choose 2 ways to choose 2 elements of x: n * (n-1) / 2
For each i, there are f_i choose 2 ways to choose both elements to be i: f_i * (f_i - 1) / 2
So:
I_c(x) = sum(f_i * (f_i - 1)) / (n * (n - 1)), summing over i = 0 to 25
(measures likelihood that 2 letters are encrypted with the same keyword character)
For English text we expect that I_c(x) ≈ sum(p_i^2) = 0.065, where p_i is the expected
probability of each letter.
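The formula translates directly into code. A minimal sketch, assuming x is a string of letters in a single case (the function name is made up for illustration):

```python
from collections import Counter

def index_of_coincidence(x):
    """I_c(x) = sum(f_i * (f_i - 1)) / (n * (n - 1))."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(x)  # f_i for every letter that occurs in x
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("AABB"))  # 4/12: 2 matching pairs out of 6
print(index_of_coincidence("ABCD"))  # 0.0: no two letters match
```

Letters that never occur contribute f_i = 0 to the sum, so it is enough to loop over the letters that are present.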
Write out the ciphertext by columns of m elements each, giving m rows y_1, …, y_m.
Count the number of occurrences of each letter in each row y_i (each row has length about n/m).
(Note that all letters in the same row have been shifted by the same letter of the keyword)
Compute I_c(y_i) for each y_i.
If m is the correct keyword length, each I_c(y_i) should be close to 0.065.
If incorrect, it will be closer to 0.038 ≈ 1/26 (index of coincidence for a random string).
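Writing the ciphertext in columns of m letters and reading off the rows is the same as taking every m-th letter. A sketch of the whole test for a range of candidate lengths, assuming every row ends up with at least two letters (names are made up for illustration, and the toy string is only there to show the effect):

```python
from collections import Counter

def index_of_coincidence(x):
    """I_c(x) = sum(f_i * (f_i - 1)) / (n * (n - 1))."""
    n = len(x)
    counts = Counter(x)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def rows(ciphertext, m):
    """Row i holds every m-th letter, starting at position i."""
    return [ciphertext[i::m] for i in range(m)]

def ic_table(ciphertext, max_m):
    """Map each candidate m to the list of I_c values of its m rows."""
    return {m: [index_of_coincidence(y) for y in rows(ciphertext, m)]
            for m in range(1, max_m + 1)}

print(rows("ABCDEFG", 3))  # ['ADG', 'BE', 'CF']

# Toy string with period 2: the values jump at m = 2.
for m, values in ic_table("ABABABAB", 2).items():
    print(m, [round(v, 3) for v in values])
```

On real ciphertext you would scan the table for the smallest m whose values are all near 0.065.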
Example (from book) see slides corresponding to example from p. 32 onwards.
Results:
m = 1: 0.045
m = 2: 0.046, 0.041
m = 3: 0.043, 0.050, 0.047
m = 4: 0.042, 0.039, 0.046, 0.040
m = 5: 0.063, 0.068, 0.069, 0.061, 0.072
Only m = 5 gives values close to 0.065, so guess m = 5.
Once we think we have a correct keyword length, we still have to find the actual
keyword.
Idea:
• Take a substring y_i (from the strings used in the test to find the length). All letters in
it will be encrypted with the same letter of the keyword.
• Let f_0, f_1, …, f_25 be the frequencies of A, B, …, Z in y_i and let y_i have length n’ = n/m.
• The probability distribution of the letters in y_i is f_0/n’, f_1/n’, …, f_25/n’
Recall: substring y_i is obtained by shifting plaintext by some k_i
→ hope f_{k_i}/n’, f_{1+k_i}/n’, … is “close to” the ideal probability distribution p_0, p_1, …
(which can be looked up)
→ compare to these
Let M_g = sum(p_i * f_{i+g}) / n’, summing over i = 0 to 25
If g = k_i then this should be about 0.065. (Note i+g is computed modulo 26.)
Use a computer to compute the above for each substring y_i and each g from 0 to 25. Look for
values close to 0.065.
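A sketch of that computation follows. Two things here are assumptions rather than part of the notes: the table P holds approximate English letter probabilities rounded to three decimals (look up your own table if you need more precision), and instead of scanning by eye for values near 0.065 the code simply picks the g with the largest M_g, which normally selects the same shift. All names are made up for illustration.

```python
# Approximate English letter probabilities p_0..p_25 for A..Z (assumed table).
P = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
     0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
     0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]

def m_values(y):
    """Return [M_0, ..., M_25] for a substring y of uppercase letters A-Z."""
    n = len(y)
    f = [0] * 26
    for c in y:
        f[ord(c) - ord("A")] += 1
    # M_g = sum(p_i * f_{i+g}) / n', with i+g taken modulo 26.
    return [sum(P[i] * f[(i + g) % 26] for i in range(26)) / n
            for g in range(26)]

def recover_key(ciphertext, m):
    """For each of the m substrings, pick the shift g with the largest M_g."""
    key = []
    for i in range(m):
        mg = m_values(ciphertext[i::m])
        key.append(max(range(26), key=mg.__getitem__))
    return key

def key_to_word(key):
    """Turn shifts 0..25 back into letters A..Z."""
    return "".join(chr(k + ord("A")) for k in key)

print(round(sum(p * p for p in P), 3))  # sum of p_i^2 for this table
print(key_to_word([9, 0, 13, 4, 19]))   # JANET
```

With short substrings the largest M_g can be misleading, so it is still worth printing the full list from m_values and checking that the winner stands clear of the rest.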
Example: (see slide of table 1.4)
Put table 1.4 on overhead.
Results:
The key is likely (9,0,13,4,19) = JANET.