EE 418 Network Security and Cryptography
Lecture #6
October 18, 2016
Cryptanalysis.
Lecture notes prepared by Professor Radha Poovendran.
Tamara Bonaci
Department of Electrical Engineering
University of Washington, Seattle
Outline:
1. Review: Introduction to cryptanalysis
2. Remarks on Letter Distribution of the English Language
3. Cryptanalysis of the Affine Cipher
4. Cryptanalysis of the Vigenère Cipher
5. Cryptanalysis of the Hill Cipher

1 Review: Introduction to Cryptanalysis
Last lecture, we began discussing how secure cryptosystems are and how one could go about breaking them. In doing so, we turned to cryptanalysis and started with one of the most important assumptions in modern cryptography, namely Kerckhoffs's principle, which states that in assessing the security of a cryptosystem, one should always assume that an attacker knows the details of the cryptosystem being used. Therefore, the security of the system should always rest on the secrecy of the key, and not on the obscurity of the cryptographic algorithm.
1.1 Attack models
We then considered different goals that an attacker can have when attacking a channel between communicating parties. For example, an attacker may wish to:
1. Read one specific message.
2. Find the encryption/decryption key, and thus read all of the exchanged messages.
3. Corrupt Alice’s message into another message in such a way that Bob thinks that Alice has sent the
altered message.
4. Masquerade as Alice in order to communicate with Bob such that Bob believes he is communicating with
Alice.
For each of these goals, there are four main types of attacks that an attacker can use, and those types differ
in the amount of information an attacker has available when trying to determine the key. Those four
attack types are as follows.
Type of attack             Description
Ciphertext only attack     Eve only observes the ciphertext y.
Known plaintext attack     Eve knows the ciphertext y corresponding to plaintext x.
Chosen plaintext attack    Eve has temporary access to an encryption box. The encryption box takes as input any chosen plaintext x and outputs the ciphertext y.
Chosen ciphertext attack   Eve has temporary access to a decryption box. The decryption box takes as input any chosen ciphertext y and outputs the plaintext x.
Based on these models, we can analyze the security of every cryptosystem.
2 Cryptanalysis of the Shift Cipher
– Ciphertext only: Let K = 3 and let the plaintext be shift. A right shift by 3 gives the ciphertext VKLIW. Assume Eve knows only the ciphertext VKLIW, and that a shift cipher was used for encryption. Given the small cardinality of the key space, Eve can try all 26 possible shifts, undoing the encryption one left shift at a time:
vkliw → (1st left shift) ujkhv → (2nd left shift) tijgu → (3rd left shift) shift, and so on.
Since "shift" is the only dictionary word among the 26 candidates, Eve assumes that it is indeed the plaintext that was encrypted. Eve can therefore also infer the original key K = 3.
– Known plaintext: If Eve knows a (plaintext, ciphertext) pair, then Eve can find the key by subtracting
the plaintext from the ciphertext mod 26. For instance, if Eve knows that plaintext b corresponds to
ciphertext E, then Eve can determine that K = 3.
– Chosen plaintext: Choose letter a as plaintext; the resulting ciphertext will be the key. For example,
if the ciphertext is P then K = 15.
– Chosen ciphertext: Choose A as the ciphertext. The resulting plaintext is then the negative of the key, −K mod 26. For example, if the plaintext is x (= 23), then K = −23 mod 26 = 3.
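The four attacks on the shift cipher can be sketched in a few lines of Python. This is a minimal illustration, assuming lowercase plaintext, uppercase ciphertext, and the encoding a = 0, ..., z = 25 used in these notes; the function names are made up for this example.

```python
def shift_encrypt(plaintext, key):
    """Right-shift each lowercase letter by key positions; ciphertext is uppercase."""
    return "".join(chr((ord(c) - 97 + key) % 26 + 65) for c in plaintext)

def shift_decrypt(ciphertext, key):
    """Left-shift each uppercase letter by key positions; plaintext is lowercase."""
    return "".join(chr((ord(c) - 65 - key) % 26 + 97) for c in ciphertext)

def brute_force(ciphertext):
    """Ciphertext only: list all 26 candidate (key, plaintext) pairs."""
    return [(k, shift_decrypt(ciphertext, k)) for k in range(26)]

def key_from_pair(plain_letter, cipher_letter):
    """Known or chosen plaintext: K = y - x mod 26."""
    return (ord(cipher_letter) - 65 - (ord(plain_letter) - 97)) % 26

print(brute_force("VKLIW")[3])   # (3, 'shift')
print(key_from_pair("b", "E"))   # 3
print(key_from_pair("a", "P"))   # 15
# Chosen ciphertext: decrypting A yields -K mod 26 (here 23, the letter x).
print(shift_decrypt("A", 3))     # x
```

Picking the dictionary word out of the 26 candidates is left to the reader here; an automated check would need a word list or a frequency score.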
3 Remarks on Letter Distribution of the English Language
English language text has different frequencies for different alphabetic characters. An estimate of the relative frequencies (probabilities) of the 26 letters is presented in Table 1. Note that the letter e has the maximum relative frequency of 0.127.
Table 1. Probabilities of occurrence of the 26 letters of the English language alphabet.
A     B     C     D     E     F     G     H     I     J     K     L     M
0.082 0.015 0.028 0.043 0.127 0.022 0.020 0.061 0.070 0.002 0.008 0.040 0.024
N     O     P     Q     R     S     T     U     V     W     X     Y     Z
0.067 0.075 0.019 0.001 0.060 0.063 0.091 0.028 0.010 0.023 0.001 0.020 0.001
Similarly, we can define frequencies of digrams, trigrams, initial letters, final letters, etc. More generally, we can use the statistical properties of the English language to perform cryptanalysis. A key observation here is that the vowels "a, e, i, o" and the consonants "t, s, n, h, d" have a relatively high probability of appearance compared to other characters. Table 2 gives the rank order of the vowels by frequency, and Table 3 the probabilities of the consonants "t, s, n, h, d".
Table 2. Rank order of the probabilities of occurrence of the vowels.
E     A     O     I     U
0.127 0.082 0.075 0.070 0.028
Table 3. Probabilities of most frequently occurring consonants.
T     S     N     H     D
0.091 0.063 0.067 0.061 0.043

4 Cryptanalysis of the Affine Cipher
– Ciphertext only attack: Let's assume that Eve has intercepted the following ciphertext:
FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSH
VUFEDKAPRKDLYEVLRHHRH
The most frequent letters are R with 8 occurrences, D with 7, E, K, H with 5 each, and F, V with 4 each. The first guess is that R = e and D = t. Given the encryption function
e_K(x) = ax + b (mod 26),    (1)
and the encodings e = 4, t = 19, R = 17, D = 3, we get the following linear system:
4a + b = 17 (mod 26),    (2)
19a + b = 3 (mod 26).    (3)
Solving the system, we obtain the unique solution a = 6, b = 19 (note that a solution must be in Z26). But for the affine cipher, a has to be relatively prime to 26. Since gcd(6, 26) = 2, the pair a = 6, b = 19 is not a valid key. The second guess is R = e and E = t. Solving the linear system yields a = 13, which again is not a legal key. The third guess is R = e and K = t, which yields a = 3 and b = 5. Since this is a valid key, we decrypt the entire ciphertext to see whether we get meaningful English text:
algorithms are quite general definitions of arithmetic processes
Note: Besides the statistical analysis, Eve could have tried all 312 possible pairs (a, b) that constitute a valid key for the affine cipher (12 values of a relatively prime to 26, times 26 values of b).
– Known plaintext attack: Suppose Eve knows that the plaintext uw = (20, 22) has ciphertext KQ = (10, 16). She can then set up the following system of linear equations:
10 = 20a + b (mod 26),    (4)
16 = 22a + b (mod 26).    (5)
Subtracting Equation 4 from Equation 5 gives 6 = 2a (mod 26), i.e., 2a = q × 26 + 6 ⇒ a = 3 or a = 16. But gcd(16, 26) ≠ 1 ⇒ a = 3. From Equation 4 we can now get b as follows:
10 = 20 × 3 + b (mod 26),    (6)
i.e., −50 = b (mod 26),    (7)
i.e., b = q × 26 + (−50) ⇒ q = 2 ⇒ b = 2.    (8)
Hence Eve only needs to know two (plaintext, ciphertext) letter pairs.
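– Chosen plaintext: If Eve chooses ab = (0, 1) as plaintext, the ciphertext will be:
0 × a + b ≡ b (mod 26),    (9)
1 × a + b ≡ a + b (mod 26),    (10)
so the first ciphertext letter is b, and subtracting it from the second gives a. Eve can thus easily find the key K = (a, b).
– Chosen ciphertext: Eve chooses AB as the ciphertext and proceeds as above, this time solving for the coefficients of the decryption function.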
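The affine attacks above all reduce to finding pairs (a, b) consistent with a few letter correspondences. The sketch below does this by exhaustive search over Z26 × Z26 rather than by the algebra used in the text. The helper names are illustrative, and only the first line of the intercepted ciphertext is decrypted.

```python
from math import gcd

def num(letter):
    """Encode a letter as 0..25, ignoring case."""
    return ord(letter.lower()) - 97

def consistent_keys(pairs):
    """All (a, b) in Z26 x Z26 with a*x + b = y (mod 26) for every (x, y) in pairs."""
    return [(a, b) for a in range(26) for b in range(26)
            if all((a * x + b) % 26 == y for x, y in pairs)]

def is_valid_key(a, b):
    """An affine key needs a relatively prime to 26; b is unrestricted."""
    return gcd(a, 26) == 1

def affine_encrypt(plaintext, a, b):
    return "".join(chr((a * num(c) + b) % 26 + 65) for c in plaintext)

def affine_decrypt(ciphertext, a, b):
    a_inv = pow(a, -1, 26)  # modular inverse; requires Python 3.8+
    return "".join(chr(a_inv * (num(c) - b) % 26 + 97) for c in ciphertext)

# Ciphertext only: the three (plaintext, ciphertext) letter guesses from the text.
for guess in ([("e", "R"), ("t", "D")], [("e", "R"), ("t", "E")], [("e", "R"), ("t", "K")]):
    for a, b in consistent_keys([(num(x), num(y)) for x, y in guess]):
        print(guess, (a, b), "valid" if is_valid_key(a, b) else "invalid")

# First line of the intercepted ciphertext under the valid key (3, 5).
print(affine_decrypt("FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSH", 3, 5))

# Known plaintext uw -> KQ: two candidates for a remain before the gcd check.
print(consistent_keys([(20, 10), (22, 16)]))  # [(3, 2), (16, 2)]

# Chosen plaintext "ab": the ciphertext exposes b and a + b directly.
c0, c1 = (num(c) for c in affine_encrypt("ab", 3, 5))
print(((c1 - c0) % 26, c0))  # (3, 5)
```

Searching all 676 pairs is wasteful compared with solving the congruences, but it makes the role of the gcd condition visible: the search returns every algebraic solution, and is_valid_key discards those that are not legal keys.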
5 Cryptanalysis of the Vigenère Cipher
5.1 Known Plaintext Attack
If Eve knows at least m consecutive (plaintext, ciphertext) letter pairs, then by subtracting the plaintext from the ciphertext mod 26 she obtains the key vector K = (K1, ..., Km).
5.2 Chosen plaintext attack
Choose aa...a (m letters) as plaintext, and get K as the ciphertext:
Plaintext letter    a    a    a    ...  a
Plaintext value     0    0    0    ...  0
+ Key               K1   K2   K3   ...  Km
Ciphertext          K1   K2   K3   ...  Km
Note 1: One does not need to choose x = aa...a (m letters) as plaintext, as any known plaintext will also reveal the key K.
5.3 Chosen Ciphertext Attack
Choose AA...A (m letters) as the ciphertext; the obtained plaintext is then the negative of the key K:
Ciphertext letter   A    A    A    ...  A
Ciphertext value    0    0    0    ...  0
− Key               K1   K2   K3   ...  Km
Plaintext           −K1  −K2  −K3  ...  −Km
Note 2: Again, one does not need to choose AA...A (m letters) as the ciphertext. Any chosen ciphertext will do.
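The three attacks of Sections 5.1 to 5.3 all amount to one subtraction mod 26 per position. The sketch below illustrates them with a hypothetical key LEMON = (11, 4, 12, 14, 13) and the sample plaintext attackatdawn, neither of which comes from these notes; the function names are also illustrative.

```python
def to_nums(text):
    """Encode letters as 0..25, ignoring case."""
    return [ord(c) - ord("a") for c in text.lower()]

def vigenere_encrypt(plaintext, key):
    return "".join(chr((x + key[i % len(key)]) % 26 + 65)
                   for i, x in enumerate(to_nums(plaintext)))

def vigenere_decrypt(ciphertext, key):
    return "".join(chr((y - key[i % len(key)]) % 26 + 97)
                   for i, y in enumerate(to_nums(ciphertext)))

def key_stream(plaintext, ciphertext):
    """Known plaintext: K_i = y_i - x_i mod 26, position by position."""
    return [(y - x) % 26 for x, y in zip(to_nums(plaintext), to_nums(ciphertext))]

KEY = [11, 4, 12, 14, 13]  # hypothetical secret key "LEMON", m = 5

# Known plaintext: the recovered stream repeats with period m.
ciphertext = vigenere_encrypt("attackatdawn", KEY)
print(ciphertext)                              # LXFOPVEFRNHR
print(key_stream("attackatdawn", ciphertext))

# Chosen plaintext aa...a: the ciphertext is the key itself.
print(vigenere_encrypt("aaaaa", KEY))          # LEMON

# Chosen ciphertext AA...A: the plaintext is -K mod 26, so negate it again.
print([(-x) % 26 for x in to_nums(vigenere_decrypt("AAAAA", KEY))])
```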
5.4 Ciphertext only attack
We left this attack last as it is the hardest to launch. In general, an exhaustive search is very slow due to the
large cardinality of the keyspace. We can, however, perform a statistical analysis based on the structure of
the English language. The statistical analysis is more difficult than the affine and substitution cipher cases
because:
(a) the Vigenère cipher is a polyalphabetic cryptosystem, and
(b) the length of the key m is not known to Eve.
Consider the following example, where the plaintext is x = weed:
PLAINTEXT:  w   e   e   d
            22  4   4   3
KEY:        2   4   6   7
CIPHER:     24  8   10  10
            Y   I   K   K
In this example, the letter e is mapped to I the first time and to K the second time. Moreover, the letters e and d both map to the same ciphertext letter K. For a long text, we can expect the ciphertext letter frequencies to flatten toward a uniform distribution, and hence the letter frequencies of the ciphertext as a whole may not be particularly useful.
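The weed example can be reproduced directly. The short sketch below (function name illustrative) encrypts the word with the key (2, 4, 6, 7) and pairs each plaintext letter with its ciphertext letter, which makes the polyalphabetic behavior visible.

```python
def vigenere_encrypt(plaintext, key):
    """Add the repeating key to the plaintext letters mod 26; output is uppercase."""
    return "".join(chr((ord(c) - 97 + key[i % len(key)]) % 26 + 65)
                   for i, c in enumerate(plaintext))

cipher = vigenere_encrypt("weed", [2, 4, 6, 7])
print(cipher)                     # YIKK
print(list(zip("weed", cipher)))  # e maps to both I and K; e and d both map to K
```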
Eve can still attempt to break the cryptosystem by executing the following attack in two stages:
1. Finding the key vector length m;
2. Finding the key vector K.
Finding key vector length m using Kasiski Test: The key length m can be found using the Kasiski test.
The idea behind the Kasiski test is that it is quite improbable to find pairs of identical segments of ciphertext
of length at least three, unless these segments are the result of the encryption of the same plaintext. In that
case, the distance δ of occurrence of the identical segment must be a multiple of m. That is, δ ≡ 0 (mod m).
To find the period of the Vigenère cipher using the Kasiski test, we execute the following steps:
1. Search the ciphertext for pairs of identical segments of length at least 3.
2. Record the distances between the starting positions of the segments.
3. Take the Greatest Common Divisor (gcd) of these distances as the most likely key vector length m (in general, m divides this gcd).
Let us illustrate the use of these techniques with an example. The following is a ciphertext obtained from a Vigenère cipher.
CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW
RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK
LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX
VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR
ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT
AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI
PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP
WQAIIWXNRMGWOIIFKEE
The ciphertext segment CHR appears at starting positions 1, 166, 236, 276, and 286. The distances from the first occurrence to the other four occurrences are therefore 165, 235, 275, and 285, respectively. The gcd of these distances is 5, so the most likely length of the key vector is 5 according to the Kasiski test.
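The Kasiski computation for the segment CHR can be checked with a short script. This is a sketch with illustrative function names; it takes the repeated segment as an input rather than searching for it, and it reports 1-indexed positions to match the text.

```python
from functools import reduce
from math import gcd

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

def occurrences(text, segment):
    """1-indexed starting positions of every occurrence of segment in text."""
    positions = []
    start = text.find(segment)
    while start != -1:
        positions.append(start + 1)
        start = text.find(segment, start + 1)
    return positions

def kasiski(text, segment):
    """Positions of segment, distances from the first occurrence, and their gcd."""
    positions = occurrences(text, segment)
    distances = [p - positions[0] for p in positions[1:]]
    # With fewer than two occurrences there are no distances and the gcd is 0.
    return positions, distances, reduce(gcd, distances, 0)

positions, distances, m = kasiski(CIPHERTEXT, "CHR")
print(positions)  # [1, 166, 236, 276, 286]
print(distances)  # [165, 235, 275, 285]
print(m)          # 5
```

Since the true key length only has to divide the gcd, a careful attacker would also keep the divisors of the result in mind; here 5 is prime, so no smaller candidate other than 1 exists.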
Finding the key vector K leveraging language: Let y_i be the i-th character of the ciphertext, and let x_i be the corresponding character of the plaintext. If m is the key length, then for a sufficiently long plaintext, the characters x_i, x_{i+m}, x_{i+2m}, ... will have the distribution of the English language. Furthermore, since, under the Vigenère cipher, y_i = x_i + K_i, y_{i+m} = x_{i+m} + K_i, and so on (mod 26), the characters y_i, y_{i+m}, y_{i+2m}, ... will have the distribution of the English language, shifted by the fixed amount K_i. Hence finding the correct shift is a matter of finding the K_i such that subtracting K_i from y_i, y_{i+m}, ... results in a string with the distribution of English.
A formal description of this approach is presented in Figure 1.
VIGENERE CRYPTANALYSIS ALGORITHM
Input: Ciphertext y = y_1 y_2 ... y_n encrypted using Vigenère cipher
Output: Plaintext x = x_1 x_2 ...
m ← KasiskiTest(y)    // key length m
for all i = 1, ..., m
    y_i ← y_i y_{i+m} y_{i+2m} ...    // i-th vector of ciphertext letters
    Generate rank ordering of letters in y_i, denoted y_{i1}, y_{i2}, ...
    Solve shift cipher equation x_{il} + K_{il} ≡ y_{il} mod 26 for each l
end for
z ← 0
while z == 0
    Pick l_1, l_2, ..., l_m
    K ← K_{1 l_1} K_{2 l_2} ... K_{m l_m}
    x ← d_K(y)
    if x resembles English text
        z ← 1
    end if
end while
return x
Fig. 1. Algorithm for cryptanalysis of Vigenère cipher.
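The first loop of Figure 1 can be sketched in Python as follows. The function names are illustrative. The choice of e, t, and a as likely plaintext letters and of the four top-ranked ciphertext letters are assumptions made for this sketch. The second loop, where a reader judges whether a decryption resembles English, is not automated here.

```python
from collections import Counter

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

def split_vectors(ciphertext, m):
    """Vector i collects the letters y_i, y_{i+m}, y_{i+2m}, ..."""
    return [ciphertext[i::m] for i in range(m)]

def rank_letters(vector, top=4):
    """Most frequent ciphertext letters with their counts, highest first."""
    return Counter(vector).most_common(top)

def candidate_shifts(vector, likely_plain="eta", top=4):
    """Solve x + K = y (mod 26) for each top-ranked y and each likely plaintext x."""
    shifts = []
    for y, _count in rank_letters(vector, top):
        for x in likely_plain:
            k = (ord(y) - 65 - (ord(x) - 97)) % 26
            if k not in shifts:
                shifts.append(k)
    return shifts

def decrypt(ciphertext, key):
    """d_K(y): subtract the repeating key mod 26; plaintext is lowercase."""
    return "".join(chr((ord(c) - 65 - key[i % len(key)]) % 26 + 97)
                   for i, c in enumerate(ciphertext))

vectors = split_vectors(CIPHERTEXT, 5)
for i, vector in enumerate(vectors, start=1):
    print(f"y{i}: ranking {rank_letters(vector)}, candidate K{i}: {candidate_shifts(vector)}")
```

Each candidate list starts with the shift obtained by mapping the most frequent ciphertext letter to e, which is the first guess the procedure tries. Calling decrypt with one shift per vector then produces a candidate plaintext for inspection.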
Let’s analyze the presented approach by continuing the example from above. We begin by generating the
vectors yi :
y1 = CVABWEBQBUAWWQRWWXANTBDPXXRDWBFAXCWMNJJFAIACNRNCATBWKDMCDCQQXWK
y2 = HOEITESEWOOEGMFTIFUDSTNSNVTNDPASNHESBGSEGEMRDRSHEAIEORTHNHOANOE
y3 = RARANOBQRASAJNVRAPTCXUGRJRUQTHLVGRLJHNGYNQRRGINRYQPVEEBRVRHIRIE
y4 = EHAXXPQEVKXHMKGZKSEMMIMEEVLWYXJBLZEIWMLPRJVELMRQEEEKWMHTRCPIMI
y5 = EMTXBHMRXXXBMGXXLKMGXAGLLPHTGTHFLBKKRGXHBTLMXGWHVBEAAXHKZLWWGF
For y1 , the most frequent letters are W (9 times), A (7 times), B (6 times), and C (6 times).
Based on the information about the letter frequencies, we make the first guess that ciphertext W is mapped
to plaintext e. Using the encryption rule yi = Ki +xi mod 26 (1 ≤ i ≤ m), we write K1 = 22−4 = 18 mod 26.
For y2, the most common ciphertext letters are E (10 times), S (7 times), O (6 times), and N (6 times). We
therefore guess that ciphertext E maps to plaintext e. For y3 , the most common ciphertext characters are R
(13 times), N (5 times), A (5 times), and V (4 times), and so we guess that ciphertext R maps to plaintext
e. In the ciphertext string y4 , the most common characters are E (10 times), M (8 times), X (4 times), and
L (4 times). Hence we guess that ciphertext E maps to plaintext e for y4 . Finally, in y5 , the most common
characters are X (10 times), G (7 times), L (6 times), and H (6 times), and so we guess that ciphertext X
maps to plaintext e in y5 . Our initial guess of the key is therefore given by K1 = 18, K2 = 0, K3 = 13,
K4 = 0, and K5 = 19. Decrypting with this key, we obtain plaintext:
kheeldonhtieeaajinxeetaximebpojsoqtyedeyjweveconkeiofxeeenhiegwmtymak
nzfigeetezeeinksffcsriugetvdpmnbskmejthihlntmnxseesfnwesfvevwzthlolnd
waedgynjpuxanayjoisibmfntlskhezieeyeruswirvbuwyrgamnrstlenelpoigariqe
djaimevskreetvdtlezrvmnvsardkheqoielecbadeijiceleeikhsorwhlrrmeutohok
hetrlnirgkhecsyoupdyavidfnemneovimser
So far it doesn't look like English. Suppose we were wrong about the ciphertext letter W being e. Let's try A as the ciphertext letter corresponding to plaintext e. Under this mapping, the key is K1 = 0 − 4 = −4 = 22 mod 26. The decryption then begins "gheelzon...", which also does not look like regular English. We keep trying and find that none of the ciphertext letters W, A, B, C can map to the plaintext letter e. We then check whether one of the ciphertext letters W, A, B, C can map to the second most frequent plaintext letter t. After checking each of them, we find that mapping the ciphertext letter C to t works well. The key in this case satisfies K1 + 19 = 2 mod 26, leading to K1 = 2 − 19 = −17 = 9 mod 26. Now the text reads as:
theelmonhtreeaasinxentaxivebpossoqthedeyswevelonkerofxenenhingwmthmak
nifigentezeninksofcsrrugetedpmnkskmesthihuntmngseesonwesovevwithlound
wandgynspuxawayjorsibmontlsthezineyerdswirebuwyagamnastlewelporgarize
djarmevstreetedtleirvmnesardtheqoreleckadeisicelneikhborwhurrmedtohot
hetrunirgthecshoupdhavidonemnnovimber
Now the message makes a bit more sense. But the letters decrypted from the fourth vector y4 still do not read as regular English. It seems that the most frequent ciphertext letter E in y4 is not mapped to plaintext letter e. If we try to map it to the plaintext letter t, the text does not make sense either. But if we map the ciphertext letter E to plaintext a, then the key is K4 = 4 − 0 = 4, the text begins "thealmondtree...", and it looks like regular English.
thealmondtreewasintentativeblossomthedayswerelongeroftenendingwithmag
nificenteveningsofcorrugatedpinkskiesthehuntingseasonwasoverwithhound
sandgunsputawayforsixmonthsthevineyardswerebusyagainasthewellorganize
dfarmerstreatedtheirvinesandthemorelackadaisicalneighborshurriedtodot
hepruningtheyshouldhavedoneinnovember
We have a message which seems regular English, based on the key K = (9, 0, 13, 4, 19). With correct spacing
and punctuation, the message looks like:
The almond tree was in tentative blossom. The days were longer, often ending with magnificent
evenings of corrugated pink skies. The hunting season was over with hounds and guns put away for
six months. The vineyards were busy again as the well-organized farmers treated their vines and the
more lackadaisical neighbors hurried to do the pruning they should have done in November.
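The successive key guesses from this example can be replayed on the first 42 ciphertext letters. The sketch below (function name illustrative) decrypts that prefix with the initial guess, with the guess after correcting K1, and with the final key.

```python
def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key mod 26; plaintext is lowercase."""
    return "".join(chr((ord(c) - 65 - key[i % len(key)]) % 26 + 97)
                   for i, c in enumerate(ciphertext))

PREFIX = "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"  # first 42 ciphertext letters

guesses = [
    [18, 0, 13, 0, 19],  # every most frequent ciphertext letter mapped to e
    [9, 0, 13, 0, 19],   # K1 corrected: ciphertext C mapped to t
    [9, 0, 13, 4, 19],   # K4 corrected: ciphertext E mapped to a
]
for key in guesses:
    print(key, vigenere_decrypt(PREFIX, key)[:20])
# [18, 0, 13, 0, 19] kheeldonhtieeaajinxe
# [9, 0, 13, 0, 19] theelmonhtreeaasinxe
# [9, 0, 13, 4, 19] thealmondtreewasinte

# The final key spelled out as letters.
print("".join(chr(k + 65) for k in guesses[-1]))  # JANET
```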
6 Cryptanalysis of the Hill Cipher
The Hill cipher is difficult to break with a ciphertext only attack, but a known plaintext attack can be easily
launched.
6.1 Known Plaintext Attacks
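Assume that Eve knows that m = 2 and that the plaintext friday yields ciphertext PQCFKU. Since Eve knows at least m = 2 (plaintext, ciphertext) pairs of blocks, she can stack two plaintext blocks as the rows of a matrix X and the corresponding ciphertext blocks as the rows of a matrix Y, which gives the matrix equation Y = XK. Provided X is invertible mod 26, she can solve for K by inverting X, so that K = X^{-1} Y.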
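For our example
X = \begin{pmatrix} 5 & 17 \\ 8 & 3 \end{pmatrix},    (11)
and the inverse is
X^{-1} = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix}.    (12)
We can then compute the key K to be:
K = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix} \begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 7 & 19 \\ 8 & 3 \end{pmatrix}.    (13)
If m is unknown, Eve can proceed using trial and error for different values of m.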
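The matrix computation above can be verified with a few lines of plain Python. This is a sketch for the 2 × 2 case only, with illustrative helper names; it assumes the row-vector convention Y = XK used in these notes and checks the recovered key against the third block of the known pair.

```python
def to_nums(text):
    """Encode letters as 0..25, ignoring case."""
    return [ord(c) - ord("a") for c in text.lower()]

def mat_mul(A, B, n=26):
    """Matrix product A * B with entries reduced mod n."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % n
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_inv_2x2(M, n=26):
    """Inverse of a 2 x 2 matrix mod n via the adjugate and the inverse determinant."""
    det = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % n
    det_inv = pow(det, -1, n)  # raises ValueError if det has no inverse mod n
    return [[(M[1][1] * det_inv) % n, (-M[0][1] * det_inv) % n],
            [(-M[1][0] * det_inv) % n, (M[0][0] * det_inv) % n]]

plain = to_nums("friday")   # [5, 17, 8, 3, 0, 24]
cipher = to_nums("PQCFKU")  # [15, 16, 2, 5, 10, 20]

X = [plain[0:2], plain[2:4]]    # first two plaintext blocks as rows
Y = [cipher[0:2], cipher[2:4]]  # matching ciphertext blocks as rows

X_inv = mat_inv_2x2(X)
K = mat_mul(X_inv, Y)
print(X_inv)  # [[9, 1], [2, 15]]
print(K)      # [[7, 19], [8, 3]]

# Consistency check on the unused third block: (0, 24) K should give (10, 20).
print(mat_mul([plain[4:6]], K) == [cipher[4:6]])  # True
```

If the chosen plaintext blocks give a matrix X that is not invertible mod 26, the pow call raises an error, and Eve would need a different pair of blocks.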
Sources for Today’s Lecture:
1. Douglas R. Stinson, Cryptography: Theory and Practice, 3rd edition. CRC Press, 2005, p. 1–39.
2. Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory. Prentice Hall, 2002, p. 1–26 and 59–74.
3. Neil Daswani, Christoph Kern, and Anita Kesavan, Foundations of Security: What Every Programmer Needs to Know. Apress, 2007, p. 203–221.