Chapter 3 – Linear Cipher In this chapter we will combine the previous two ciphers into one super cipher, the Linear Cipher. We first multiply a plain letter value and afterwards perform an additional Caesar shift. This allows more encryptions thus hampering an eavesdropper’s job. We will be able to use our knowledge of the two previous chapters to discover the number of possible unique encryptions for the Linear Cipher. The Linear Cipher turns out to be an insecure cipher as well. However, it offers the opportunity to study the “Euclidean Algorithm” to find the good encoding keys and the “Extended Euclidean Algorithm” to find the corresponding decoding keys. Furthermore, we will learn a very important aspect of cryptography: How to use letter frequency analysis in order to crack ciphers. After learning how to crack the Linear Cipher we will extend this technique to crack any Monoalphabetic Cipher, even the Random Substitution Cipher. 3.1 Encryption using the Linear Cipher The Caesar and the Multiplication Cipher are not secure ciphers. To improve the security of an encrypted document we combine the Caesar and the Multiplication Cipher: we first multiply each plain letter by an integer a as done in the Multiplication Cipher and consequently shift it by b positions. We therefore obtain the following Definition of the Linear Cipher: The Linear Cipher encodes each plain letter P to a cipher letter C using the following encoding function: C = a*P + b MOD M where the encoding key consists of the pair of integers (a,b). We call a the factor key and b the shift key. Let us first investigate how many possible encryptions the Linear Cipher offers. Certainly more than the two previous ciphers, but how many exactly? Therefore, we have to first answer the following question: Does the Linear Cipher yield unique encryptions for any key pair (a,b)? Take a moment and come up with your guess. Let’s start by checking the factor key a=2 and the shift key b=4. Thus, we use the encryption function C= 2*P + 4 MOD 26 to encode the virus carrier message as follows: 1 PLAIN TEXT P 2*P C=2*P+4 MOD 26 Cipher text A 0 N 13 T 19 I 8 S 18 T 19 H 7 E 4 C 2 A 0 R 17 R 17 I 8 E 4 R 17 0 4 0 4 12 16 16 20 10 14 12 16 14 18 8 12 4 8 0 4 8 12 8 12 16 20 8 12 8 12 e e q u o q s m i e m m u m m We observe that this encryption does not produce the desired unique encryption. I.e. both the A and the N encode to the cipher letter e, also both R and E encode to m. The recipient does not know for sure how to decode the cipher letters e and m resulting in ambiguous messages. What causes the ambiguity? Is it the factor key a=2? Or the shift b=4? The answer is easy. Shifting each letter never causes ambiguity. However, the factor key a=2 turns A = 0 and N = 13 into a = 0 making the cipher code not unique. The same will happen for any other factor key that was a bad key in the Multiplication Cipher. Vice versa, a good key in the Multiplication Cipher must be a good factor key here since the final shift does not change the uniqueness of the encryption. Thus, the uniqueness solely depends on the chosen factor key a and not at all on the shift. This becomes even more evident if we choose the bad factor key a=13 and the shift key b=4. The corresponding encoding function is C= 13*P + 4 MOD 26. PLAIN TEXT A 0 N 13 T 19 I 8 S 18 T 19 H 7 E 4 C 2 A 0 R 17 R 17 I 8 E 4 R 17 13*P C=13*P + 4 Cipher text 0 4 e 13 17 r 13 17 r 0 4 e 0 4 e 13 17 r 13 17 r 0 4 e 0 4 e 0 4 e 13 17 r 13 17 r 0 4 e 0 4 e 13 17 r The multiplication with the factor key a=13 only yields 0 and 13. The final shift of 4 then produces the two cipher letters 4=e and the 17=r which makes the Cipher Code impossible to decode. 2 Recall that a=3 was a good key for the Multiplication Cipher MOD 26, so that we now encode the virus message using the good factor key a=3 and the final shift b=4. Thus, using the encoding function C= 3*P + 4 MOD 26 we obtain the following: PLAIN TEXT 3*P C=3*P+4 Cipher text A 0 N 13 T 19 I 8 S 18 T 19 H 7 E 4 C 2 A 0 R 17 R 17 I 8 E 4 R 17 0 4 e 13 17 r 5 9 j 24 2 c 2 6 g 5 9 j 21 25 z 12 16 q 6 10 k 0 4 e 25 3 d 25 3 d 24 2 c 12 16 q 25 3 d Exercise1: Identify the key pairs (a,b) that produce unique encryptions. Exercise2: Can you guess a decoding function for any encoding function? Hint: it will be again a Linear Cipher. Further questions to investigate: 1. The good keys of the Multiplication Cipher serve as good factor keys for the Linear Cipher. Does this implies that there are again (M) good factor keys for a given alphabet length M? 2. How many encryptions does the Linear Cipher therefore allow? Do they make the Linear Cipher a secure cipher? 3. We have to set up the decoding function so that the recipient can decode the encrypted message. 4. How could an eavesdropper possibly crack Linear Cipher-encoded message? We will answer these questions in the following sections. 3 3.1.1 The Linear Cipher offers 12 Good Factor Keys and 26 Good Shift Keys for M=26 To validate his conjecture that there are again (M) good factor keys for a given alphabet length M recall the multiplication table of the Multiplication Cipher for an alphabet of length M=26: PLAIN LETTER 0 Factor Key a a 0 A 0 B 0 C 0 0 D 0 0 E 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 F G H I J K L M N O P Q R S T U V W X Y Z 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 2 3 4 5 2 0 2 4 6 8 10 12 14 16 18 20 22 24 3 0 3 6 9 12 15 18 21 24 4 0 4 8 12 16 20 24 5 0 5 10 15 20 25 6 0 6 12 18 24 4 10 16 22 7 0 7 14 21 2 9 16 23 8 0 8 16 24 6 14 22 9 0 6 4 7 2 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 1 4 2 3 9 18 1 10 19 2 11 20 10 0 10 20 4 14 24 8 18 11 0 11 22 7 18 12 0 12 24 10 22 13 0 13 0 13 0 13 0 13 0 13 14 0 14 2 16 4 18 6 20 8 22 10 24 12 15 0 15 4 19 8 23 12 16 0 16 6 22 12 2 18 17 0 17 8 25 16 7 24 15 18 0 18 10 2 20 12 19 0 19 12 5 24 17 10 20 0 20 14 8 21 0 21 16 11 3 14 25 10 21 8 20 6 18 1 16 8 8 4 8 0 22 18 14 10 24 0 24 22 20 18 16 14 12 10 25 0 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 8 6 4 2 8 18 7 16 25 2 12 22 1 12 23 8 17 6 16 8 19 4 15 4 16 2 14 0 13 0 13 0 13 6 20 8 22 10 24 12 3 18 7 22 11 8 24 14 4 20 10 3 20 11 2 19 10 4 22 14 4 23 16 2 22 16 10 0 22 18 14 10 4 2 10 18 6 18 3 24 19 14 7 4 12 20 0 13 2 20 12 0 23 20 17 14 11 2 25 22 19 16 13 10 8 14 20 5 12 19 2 18 6 25 18 11 0 20 14 6 11 16 21 2 6 21 10 25 14 4 21 12 6 10 14 18 22 1 8 20 4 18 6 22 12 22 5 8 2 17 2 6 15 24 5 16 0 13 2 16 8 11 14 17 20 23 3 10 17 24 6 14 22 4 14 24 9 20 0 13 8 5 4 10 16 22 8 15 22 5 14 23 0 18 10 2 23 18 13 2 24 20 16 12 1 8 16 24 0 16 2 7 12 17 22 6 12 18 24 0 14 1 20 13 6 2 23 8 8 10 12 14 16 18 20 22 24 0 12 24 10 22 5 22 13 4 24 18 12 7 6 8 12 16 20 24 0 10 20 9 24 13 6 24 16 3 22 15 0 0 13 4 20 10 6 23 14 4 22 14 1 22 17 12 6 5 20 0 2 13 24 2 14 0 13 4 4 13 22 6 16 6 17 4 16 8 24 14 2 22 16 10 6 3 12 21 0 6 13 20 2 10 18 2 12 22 4 8 13 18 23 8 14 20 4 11 18 25 4 12 20 2 7 10 13 16 19 22 25 6 10 14 18 22 9 14 19 24 0 6 9 9 1 18 9 6 24 16 8 2 21 14 7 4 24 18 12 6 4 25 20 15 10 5 2 24 20 16 12 8 4 9 6 3 8 6 4 2 4 3 2 1 1 24 21 18 15 12 0 24 22 20 18 16 14 12 10 9 8 7 6 5 The bold rows display the factor keys that produce unique encryptions as every row contains every integer from 0 to 25 exactly once. If we consider the factor key a=3 for a moment and varying the key shifts between b=0 and b=25 we obtain the following encryption table: 4 PLAIN LETTERS Key pairs a=3 , b=0 a=3 , b=1 a=3 , b=2 a=3 , b=3 a=3 , b=4 a=3 , b=5 a=3 , b=6 a=3 , b=7 a=3 , b=8 a=3 , b=10 a=3 , b=11 a=3 , b=12 a=3 , b=13 a=3 , b=14 a=3 , b=15 a=3 , b=16 a=3 , b=17 a=3 , b=18 a=3 , b=19 a=3 , b=20 a=3 , b=21 a=3 , b=22 a=3 , b=23 a=3 , b=24 a=3 , b=25 A B C D 0 0 1 3 2 6 3 4 5 6 7 8 9 12 15 18 21 24 1 4 7 10 2 5 8 11 3 6 9 12 4 7 10 13 5 8 11 14 6 9 12 15 7 10 13 16 8 11 14 17 9 12 15 18 10 13 16 19 11 14 17 20 12 15 18 21 13 16 19 22 14 17 20 23 15 18 21 24 16 19 22 25 17 20 23 0 18 21 24 1 19 22 25 2 20 23 0 3 21 24 1 4 22 25 2 5 23 0 3 6 24 1 4 7 E F G H I J K L M N O P Q R S T U V W X Y Z 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 11 14 17 20 23 0 3 6 9 12 15 18 21 24 1 4 7 10 13 16 19 22 25 2 5 8 14 17 20 23 0 3 6 9 12 15 18 21 24 15 18 21 24 1 4 7 10 13 16 19 22 25 16 19 22 25 2 5 8 11 14 17 20 23 0 17 20 23 0 3 6 9 12 15 18 21 24 1 18 21 24 1 4 7 10 13 16 19 22 25 2 19 22 25 2 5 8 11 14 17 20 23 0 3 20 23 0 3 6 9 12 15 18 21 24 1 4 21 24 1 4 7 10 13 16 19 22 25 2 5 22 25 2 5 8 11 14 17 20 23 0 3 6 23 0 3 6 9 12 15 18 21 24 1 4 7 24 1 4 7 10 13 16 19 22 25 2 5 8 25 2 5 8 11 14 17 20 23 0 3 6 9 0 3 6 9 12 15 18 21 24 1 4 7 10 1 4 7 10 13 16 19 22 25 2 5 8 11 2 5 8 11 14 17 20 23 0 3 6 9 12 3 6 9 12 15 18 21 24 1 4 7 10 13 4 7 10 13 16 19 22 25 2 5 8 11 14 5 8 11 14 17 20 23 0 3 6 9 12 15 6 9 12 15 18 21 24 1 4 7 10 13 16 7 10 13 16 19 22 25 2 5 8 11 14 17 8 11 14 17 20 23 0 3 6 9 12 15 18 9 12 15 18 21 24 1 4 7 10 13 16 19 10 13 16 19 22 25 2 5 8 11 14 17 20 11 14 17 20 23 0 3 6 9 12 15 18 21 This table shows that each row shows contains each integer between 0 and 25 exactly once thus yielding 26 unique encryptions for the factor key a=3. The 26 shifts don’t spoil the uniqueness, they rather offer 26 different ways to uniquely encode our message. Certainly, tables holding this uniqueness property can be generated for the 12 good key factors a=1,3,5,7,9,11,15,17,19,21,23,25 yielding a total of 12*26=312 unique encryptions for the Linear Cipher. There are two special cases to consider: 1. If we choose a=1 and vary b the resulting encryption is just the familiar Caesar Shift. The factor key a=1 has no effect, it leaves each plain letter unchanged. Nevertheless, for completeness, we count it as a good factor key. 2. If we choose b=0 and vary a the resulting encryption turns out to be the familiar Multiplication Cipher. Here, the shift key b=0 has no effect. Nevertheless, for completeness, we also count b=0 as a good shift key. 5 3.1.2 The Linear Cipher produces (M)*M unique Encryptions We can expect to vary the number of unique encryptions when varying the alphabet length M. How many unique encryptions do we obtain for an alphabet of length M? We obtained 12*26=312 unique encryptions when M=26. How about for M=27? Here again, multiply the 18=(27) good factor keys for M=27 by the 27 possible shift keys to obtain 18*27=486. It is your turn: An alphabet of length M=28 has (28)=__ good factor keys and therefore allows __ * __ = ___ unique encryptions. What strategy should we pursue to further increase the number of unique encryptions and to thus make life more difficult for an eavesdropper? The strategy for the Multiplication Cipher to choose a prime alphabet length M works here aswell. In that case each of the M-1 factor keys less than M is a good factor key. If we choose i.e. M=29 we multiply the (29)=28 good factor keys by the 29 possible shifts to obtain 28*29=812 unique encryptions. Let’s generalize our observations to determine the number of unique encryptions for a given alphabet length M: The number of unique encryptions for the Linear Cipher The number of possible unique encryptions for the Linear Cipher with alphabet length M is (M)*M . In particular, if we choose M to be a prime number p then (p) = p-1 and we thus obtain (p-1) * p = p2 - p unique encryptions. Example1: If M=3 we can choose among (3)*3 = (3-1)*3 = 6 unique encryptions. Namely among the key pairs (a,b): (1,0), (1,1), (1,2), (2,0), (2,1), (2,2). Example2: If M=5 we can choose among (5)*5 = (5-1)*5 = 20 unique encryptions. Namely among the key pairs (a,b): (1,0), (1,1), (1,2), (1,3), (1,4), (2,0), (2,1), (2,2), (2,3), (2,4), (3,0), (3,1), (3,2), (3,3), (3,4), (4,0), (4,1), (4,2), (4,3), (4,4). Example3: If M=6 we can choose among (6)*6 = 2*6 = 12 unique encryptions. Example4: If M=7 we can choose among (7)*7 = __*__ = ___ unique encryptions. Example5: If M=23 we can choose among (23)*23 = __ *__ = __ unique encryptions. Example6: If M=24 we can choose among (24)*24 = ____ = ___ unique encryptions. 6 Example7: If M=25 we can choose among (25)*25 = ____ = ___ unique encryptions. Exercise: Say you want to use an alphabet consisting of no more than 50 symbols. a) What length would you choose for a maximum safety? b) How many possible encryptions do you therefore obtain? Another option to increase the number of unique encryptions is to encode pairs of letters. The 262 pairs of letters AA, AB, AC, …, ZZ increase the alphabet length M to 676 allowing (676)*676 unique encryptions. Encoding three letters at a time would increase the alphabet length M to 263 allowing (263)*263 unique encryptions. 3.2 Decryption of the Linear Cipher How shall we go about decoding a Linear Cipher-encoded message? Try to answer it for yourself, then continue reading. In order to decode messages that were encrypted with the Linear Cipher we have to solve the encryption equation for the plain letter P. C= a*P + b MOD M We first subtract b on both sides. This undoes the final shift when encrypting. C-b= a*P MOD M We now have to isolate P. Remember that we can not just multiply by the reciprocal of a, namely 1/a, since we are doing MOD arithmetic and thus dealing with integers. We have to multiply by that integer that turns a into 1, namely by the inverse a-1. a-1 *(C-b )= (a-1 *a)*P MOD M Since a-1 * a =1 we now have. a-1 (C-b)= 1 * P = P MOD M Or similarly. P = a-1 *(C-b) MOD M which is the desired decoding function. How to decode the Linear Cipher Messages that were encrypted with the Linear Cipher using C = a*P + b MOD M, have to be decrypted using the decoding function P = a-1 *(C-b) MOD M . Remark: Obtaining the decoding equation requires again the 4 properties of the group G=(Z M*, *): the existence of an inverse and of an unit element, the closure and the associative property. We are guaranteed to solve the first equation for P since the set of good factor keys a forms a group with respect to multiplication MOD M as we learned in the previous chapter. 7 Our decoding function P = a-1 *(C-b) MOD M dictates to first subtract b from each cipher number to then multiply by the inverse of the encoding factor key, namely by the decoding factor key a-1 MOD M. Thus, the recipient has to possess the decoding key pair (a-1, b) which the encoding person has to transfer in some secure manner. Encoding and Decoding Keys: When encrypting using the Linear Cipher, the encoding key consists of the pair of integers (a,b), whereas the secret decoding key is the pair of integers (a-1,b). Example1: Say, a virus-carrier message was encoded using the key pair (3,9). Therefore, the decoding key (a-1, b)=(9,4) is used in the decoding function P = 9*(C-4) MOD 26 to decode the following cipher text as follows: Cipher text C C-4 P = 9*(C-4) PLAIN TEXT e 4 r 17 j 9 c 2 g 6 j 9 z 25 q 16 k 10 e 4 d 3 d 3 c 2 q 16 d 3 0 0 A 13 13 N 5 19 T 24 8 I 2 18 S 5 19 T 21 7 H 12 4 E 6 2 C 0 0 A 25 17 R 25 17 R 24 8 I 12 4 E 25 17 R Let’s verify how we obtained i.e. the plain letters T and I from the cipher letters j and c. We first have to subtract 4 from each cipher letter value to undo the final encoding shift. Subtracting 4 from the cipher letter j=9 yields 5. We then multiply by the inverse a-1 = 9 to undo the original encoding multiplication performed by the factor key a=3. Thus, 9 * 5 = 45. Since we compute MOD 26, we obtain 45 –26 = 19 MOD 26. 19=T turns out to be the plain letter of the cipher letter j. I will briefly verify how c turns into I and leave two other cipher letters for you to verify. Subtracting 4 from the letter c=2 yields –2 which equals 24 MOD 26. Consequently, 9 * 24 = 216 which again equals 8 since 216 – 8*26 = 8 and finally yields the plain letter I=8. Exercise: Verify that g turns into S and that d turns into R? The decoding function is itself a Linear Cipher I want to demonstrate to you that the decoding function is itself a Linear Cipher which simply means that we can rewrite the decoding function P = a-1*(C-b) MOD M so that it acquires the format P = a*C + b MOD M for some integers (a,b). I will first demonstrate it for the decoding function used in the previous example and afterwards show you the general decoding function as a Linear Cipher. 8 Consider the decoding function P = 9*(C-4) MOD 26 P = 9*C - 36 MOD 26 P = 9*C + 16 MOD 26 We distribute the 9 yielding You can recognize already the linear form, however, we want to use keys that are between 0 and M. Thus, we replace the –36 by its positive equivalent between 0 and 26, namely by 16. (Why 16 ?) Done we are. Hence, the decoding function P = 9*(C-4) MOD 26 with the decoding key pair (9,4) can be rewritten as the Linear Cipher P = 9*C + 16 MOD 26 with the decoding key pair (9,16). The advantage of being able to rewrite the decoding function as a Linear Cipher of the the form P = A-1 * C + B MOD M becomes more evident when programming the decoding function of the Linear Cipher. Instead of writing extra code for a new decoding function we may use the same function used to encode the plain text. The only task remaining is to determine the correct decoding key pair (A-1, B). Notice in our example that the decoding factor key A-1 = 9 is simply the inverse of the encoding factor key a=3. Thus, A-1 = a-1. That was easy. Computing B is a bit more tricky: The decoding function is P = a-1 *(C-b) MOD M. Distributing a-1 yields a-1 *C - (a-1 * b) MOD M. Therefore, we choose B to be that positive integer between 0 and M that is congruent to the negative integer – (a-1 * b). We may write this as: B - (a-1 * b) MOD M. The Decoding Function as a Linear Cipher: Messages that were encrypted with the encoding function of the Linear Cipher, C = a*P + b MOD M, have to be decrypted using the decoding function P = a-1 *(C-b) MOD M. Alternatively, we may express the decoding function as the Linear Cipher P = A *C + B MOD M with the decoding key pair (A,B) = (a-1, - (a-1 * b) MOD M) Exercise1: Find the decoding key pair (A,B) for a message that was encoded with C = 5*P +17 MOD 26. Exercise2: If we add a blank to the 26 letters in the previous exercise yielding an alphabet length of M=27. Therefore, a message is encoded with C = 5*P +17 MOD 27. Find the decoding pair (A,B). 9 How to determine the decoding key a-1 To find a decoding key a-1, we could just do it the fast way that I demonstrated to you in the previous chapter: Just look at the 26x26 multiplication table, find the only 1 in the row of the key a used and finally go up that column to obtain a-1. I.e. if a=3, then a-1=9 since the only 1 in the 3-row is in the 9-column. That works fine, no problem. However, what if we choose a large alphabet length M? It would be inefficient to use the same procedure as it gives more information than needed. The good news are that there exists a shortcut that finds the inverse a-1 for any key a and for any alphabet length M in an efficient manner. The procedure is called Extended Euclidean Algorithm. It computes a-1 in two steps: Firstly, it computes the greatest common divisor (gcd) of a and M. This part of the whole procedure used is called Euclidean Algorithm. Secondly, extending the Euclidean Algorithm finds the desired inverse a-1 of a MOD M. In the following section you will learn how the Euclidean Algorithm finds the gcd of a and M. Thereafter, I will show you how the Extended Euclidean Algorithm computes the inverse a-1 by making use of the computations used to find the gcd of a and M. 3.2.1 The Euclidean Algorithm quickly finds the Greatest Common Divisor of two Integers As you can imagine, the Euclidean Algorithm goes back to the Geek Philosopher and Mathematician Euclid. He described the algorithm in the 7th chapter of his book Elements written around 300 B.C. However, he did not invent it. This algorithm may be 300 years older. It is not known who invented it. The word algorithm is derived from the name of the Persian Mathematician Mûsâ AlChowârizmi. In 825 A.D., he was the first to publish a book that describes procedures to solve mathematical problems (such as the Euclidean Algorithm). Let’s study the Euclidean Algorithm. How does it find the greatest common divisor of two integers? I will explain the idea graphically. Picture the even divisors of one integer as divisor-lines. Example1: The integer 5 has the divisors 1 and 5. Therefore, 5 can be reached by 1 jump of length 5 or by 5 jumps each of length 1. Graphically: 1 5 10 Example2: The integer 6 has the divisors 1,2,3 and 6. Thus, the 6 can be reached by 1 jump of length 6 or by 2 jumps of length 3 or by 3 jumps of length 2 or by 6 jumps of length 1. Graphically: 1 6 To now find a common divisor of two integers all we have to do is to compare the divisor-lines of both integers to find matching lengths. Our goal is to find the greatest common divisor of two integers. Thus, just find the longest line among those that match in length. Example1: Consider the two integers 6 and 8. Their divisor lines look as follows: 1 2 3 4 5 6 1 2 3 4 5 6 7 8 We see that the lines of length 2 (in blue) are the longest among those that both have in common. Thus, 2 is the greatest common divisor of 6 and 8: gcd(6,8)=2. 11 Example2: Consider the two integers 12 and 8. 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 9 10 11 12 We observe that among the matching lines of length 1, 2 and 4, 4 is the longest and thus is the greatest common divisor of 12 and 8: gcd(12,8)=4. Graphically, the concept is easy: to find the greatest common divisor of two integers just find the divisors of both integers and find the greatest divisor among those they have in common. The concept is easy, however, it can not be executed for large integers. I mentioned to you already earlier that there are no procedures devised (yet) that are guaranteed to produce the divisors of an arbitrary integer consisting of 100 digits and more. This is the caliber of integers to deal with in the RSA encryption. Here is where the Euclidean Algorithm comes in very handy. Although devised over 2000 years ago, it has proven to be a very efficient procedure to compute the greatest common divisor of two integers without knowing the divisors of both integers. Here is how: We now picture the two integers using two dimensions: We draw the divisor line of the larger integer, here M=4, horizontally and the shorter one, a=2, vertically: 4 2 Instead of finding the longest common lines I now find the greatest common square that fills the rectangle perfectly. This reflects the fact that two integers urge us to work in 2 dimensions. In our example, two 2x2 squares fill the rectangle as the greatest square. Therefore, the greatest common divisor of 4 and 2 is the side of the greatest squares that fill the rectangle perfectly. It is 2. Certainly, any rectangle with integer-dimensions may be filled with 1x1 squares (just as any integer can be divided by 1). However, we want to find the greatest common divisor. Therefore, we want to find the greatest square that fills the rectangle. 12 Let’s look at further examples: Example1: The greatest common divisor of 6 and 2 is 2 since the 6x2 rectangle can be filled with 2x2 squares: Example2: The greatest common divisor of 6 and 3 is 3 since the 6x3 rectangle can be filled with 3x3 squares: Example3: The greatest common divisor of 5 and 2 is 1 since the 5x2 rectangle can only be filled with 1x1 squares: Although two 2x2 squares fit into the rectangle they don’t fill the whole rectangle. Two rectangles each of size 1x1 are needed to cover the remainder. Example4: The greatest common divisor of 5 and 3 is 1 since the 5x3 rectangle can only be filled with 1x1 squares: Although one 3x3 and one 2x2 square fit in the rectangle they don’t suffice to fill the whole rectangle. Two rectangles each of size 1x1 are needed to cover the remainder. Let’s use this gcd(5,3)=1 example to demonstrate the filling-the-rectangle-with-squares idea in the Euclidean Algorithm: The rectangle with longer side 5 holds one 5 = 1 * 3 + 2. square with sides 3 and leaves a remainder of 2. The remaining rectangle with longer side 3 3 = 1 * 2 + 1 holds one square with sides 2 and leaves a remainder of 1. The remaining rectangle with longer side 2 2 = 2 * 1 holds two squares with sides 1 and leaves no remainder. Visual example1 of the Euclidean Algorithm: Architecturally, the greatest common divisor can be computed by finding the greatest square tiles that yet fill a rectangular room perfectly. Say, we have a 26 by 21 ft2 room that is to be tiled. What are the largest square-shaped tiles that will do the job? 13 21 11 1 11 26=1*21+5 21=4*5+1 5=5*1 5 21 5 5 5 21 5 26 We start off with a huge 21x21 tile. Apparently it doesn’t fill the room perfectly. Now the 21 x 5 remainder of the room can be filled with four 5x5 tiles leaving a 5 x 1 remainder. That remainder can only be filled using 5 1x1 tiles. Thus, to fill the room completely we have to use the smallest square tiles possible that have integer sides: 546(=21*26) 1x1 tiles will cover the 26x21 room perfectly. This shows that gcd(26,21)=1. Notice that we did not have to compute all divisors of 26 and 21 in order find their greatest common divisor. Visual example2 of the Euclidean Algorithm: Goal: Find the greatest common divisor of 25 and 5: 25=1*20+5 20=4*5+0 5 20 5 5 5 20 5 25 14 The 20x20 tile does not fill the room, it leaves a 20 x 5 remainder that can be perfectly filled using four 5x5 tiles. Why does that show that the gcd(25,20)=5? With other words: how do we know that the sixteen 5x5 tiles fill the big 20x20 perfectly? The reason is that the four 5x5 tiles don’t leave a remainder in the 20 x 5 rectangular area as we had in the first example. In both examples we had a horizontal fit of the 5x5 tiles, however, the vertical mattered. We could only fill the 20x20 or the 21x21 area perfectly with the 5x5 tiles if they additionally produce a perfect vertical fit. This is the case for the 20x20 but not for the 21x21 area. Thus, gcd(25,20)=5. However, besides forming a perfect horizontal fit, the 1x1 tiles also produce a perfect vertical fit in the 5x1 area which in turn shows that they fill the four 5x5 area and the 21x21 area perfectly. We are now in a position to state the working principle of the Euclidean Algorithm: The search for the gcd of 26 and 21 can be reduced by finding that of 21 and 5. As an equation: gcd(26,21)=gcd(5,21). This reduction principle continues until the room can be perfectly tiled with tiles whose length yield the gcd. As an equation: gcd(5,21)=gcd(5,1)=1. Altogether, we have the chain: gcd(26,21)=gcd(5,21) =gcd(5,1)=1. Example7: The Euclidean Algorithm becomes really helpful when finding the gcd of two large integers. Say, we want to find the gcd of M=2322 and a=654. Would the answer be 2 or 3 or 4 or 6 or 7 or 8? We can’t really tell right away that the answer is 6. Here is why: 2322 = 654*3 + 360 654 = 360*1 + 294 360 = 294*1 + 66 294 = 66*4 + 30 66 = 30*2 + 6 30 = 6*5 Therefore, gcd(2322,654) = 6. gcd(2322,654) = gcd(654,360) gcd(654,360) = gcd(360,294) gcd(360,294) = gcd(294,66) gcd(294,66) = gcd(66,30) gcd(66,30) = gcd(30,6) gcd(30,6) = 6 Exercise: Find the gcd of 2546 and 1728. Before doing it out, guess what it could be? 3.2.2 Proof of the Euclidean Algorithm How can we prove that the Euclidean Algorithm yields the greatest common divisor of two integers. It worked fine in our tiling example. The foregoing made sense - intuitively. But how can we mathematically prove that the Euclidean Algorithm works correctly and 15 will always yield the proper greatest common divisor. In other words, how can we transfer our intuitive understanding into sober mathematical equations? Now, this is a wonderful opportunity for you to learn the formal process to prove the correctness of the Euclidean Algorithm. What we really have to prove is that the chain of equations eventually yields the greatest common divisor. How do we bridge two equations? In other words: Why is the equal sign correct? If we manage to bridge the first two equations we bridge all equations since the setup of the subsequent equations is identical. This would mean for the above example that we have to prove that the gcd of 2322 and 654 equals the gcd of 654 and 360: gcd(2322,654) = gcd(654,360). We would therefore also prove that gcd(654,360) = gcd(360,294) which bridges the next two equations. Continuing in this manner we eventually end up with the last equation that has a remainder of 0. Here, the smaller integer as a divisor of the larger integer turns out to be the gcd of the two integers we started with. In the example, the last equation is 30 = 5 * 6 so that the gcd(30,6) = 6 turns out to be the gcd of 2322 and 654. Therefore, to prove the Euclidean Algorithm we will consider two cases. In both cases we assume that the integer b is greater than the integer a: b>a. In this case, Mathematicians use the standard clause “without a loss of generality” since the restriction that b>a is not at the cost of any generality. We just assign the greater of the two original integers to b and the smaller to a. (Why would it not make any sense to try to find the gcd of two equal integers, that is when a=b?) Proof of the Euclidean Algorithm: 1. case: In case a is a divisor of b such that b=k*a for some integer k. Then gcd(b,a) = a because a’s greatest divisor is a which is also a divisor of b. 2. case: Starting with b = q0*a + r1 and if r1 0 we then obtain the following chain of equations: a = q1*r1 + r2 r1 = q2*r2 + r3 r2 = q3*r3 + r4 : : rn-2 = qn-1*rn-1 + rn rn-1 = qn*rn + 0 until the first equation yields a remainder of 0. We are guaranteed to end up with an equation that contains a remainder of 0 since the integers in the left column are in decreasing order: 16 b > a > r1 > r2 > …. > rn-1 . In the above example: 2322 > 654 > 360 > 294 > 66 > 30. Since these integers get smaller and smaller but never become negative one of the equations must yield a remainder of 0. Then, the gcd(b,a) can be read off as the fifth remainder r5 (in the above example gcd(2322,654) = rn = 6) from the second to last equation (i.e. 66=30*2+6) or last equation (i.e. 30=6*5). I will now prove in general that gcd(b,a)=rn if the (n+1)st equation yields a remainder of 0 by proving the two inequalities I) gcd(b,a) < rn and II) rn < gcd(b,a) which when both proved to be true yield nothing but the equality. Proof that gcd(b,a) < rn : In b = q0*a + r1, the greatest common divisor of b and a (unknown so far) also divides r1. (I.e. the gcd(36,28) = 4 also divides r1 = b – q0*a = 36 – 2*14 = 8). Therefore, the gcd(b,a) is a common divisor of a and r1 and we thus are guaranteed that the gcd(a, r1) cannot be greater than the gcd(b,a): gcd(b,a) < gcd(a,r1). For the same reason, the second equation yields gcd(a, r1) < gcd(r1, r2), the third yields gcd(r1, r2) < gcd(r2, r3), etc. Finally, the next-to-last equation yields gcd(rn-2, rn-1) < gcd(rn-1, rn) = rn. The equality is what we just proved in the 1. case: Since the last equation rn-1 = qn*rn leaves no remainder, rn-1 is a divisor of rn so that gcd(rn-1, rn) = rn. The chain of inequalities yields: gcd(b,a) < rn. Proof that rn < gcd(a,b): The converse is also true. Starting from the last equation and working our way up we obtain the following chain of inequalities: rn is a divisor of rn-1 ( from the last equation), Therefore, rn also divides rn-2 which we obtain from the next-to-last equation using the same reasoning as in part I. Continuing going from bottom to top, rn also divides rn-3, …., r2, r1, a and b. We see that rn is a common divisor of a and b and altogether we therefore obtain that rn < gcd(a,b). This completes the proof. The two inequalities now prove that rn = gcd(a,b) Exercise: Go through the two cases of the proof one more time. Check that the gcd(72,56) = 8. In that way you make the proof more concrete and thus more tangible. Remember: Any abstraction has its origin in a sufficient repertoire of concrete examples. 17 The C++ implementation of the Euclidean Algorithm Devising the Euclidean Algorithm can be accomplished quite easily as we just have to inspect the steps and then formalize them. Let’s take a closer look at the example7 when we found the gcd of L=2322 and S=654. We started off by dividing the larger integer L by S yielding the quotient Q and the remainder R as follows: We will just use the values S and R for the next step (see the L =Q*S + R arrow). However, we won’t need Q and L anymore. 2322 = 3 *654 + 360 654 = 1* 360 + 294 I first compute R in the above equation from L and S as follows: R = L MOD S, here 360. It becomes the new value of S. In terms of programming this procedure, we have to be aware of a technicality: Before assigning it to S, we have to assign the value of S to the new L. Otherwise, we would lose the value of S, here 654. Thus, the order of the three computer statements here has to be: R = L MOD S, L=S, S=R. 360 = 1* 294 + 66 Similarly, we have to perform the same computations here: R = L MOD S, L=S, S=R. 294 = 4* 66 + 30 Similarly, R = L MOD S, L=S, S=R. 66 = 2* 30 + 6 Again: R = L MOD S, L=S, S=R. 30 = 5 * 6 Finally here: R = L MOD S, L=S, S=R. Since the new remainder equals 0 for the first time, we are done now. The desired gcd is the remainder of the second last equation (in bold), here 6, which is the value of S in the last equation. Hence, gcd(2322,654)=6. We can translate the pseudo-code of the Euclidean Algorithm (to the left) into C++ code as follows: As pseudo-code: As long as R is not 0 do the following: As C++ code: While(R!=0) { R=L%S; L=S; S=R; } cout<< "gcd = "<< L; R=L MOD S; L=S; S=R; Finally, output the gcd of L and S Attention: You might wonder why the gcd of L and S equals L and not S in the bottom equation. This has technical reasons. When you execute the C++ code in your head you notice that the last while-loop is executed for the last equation (30=5*6 as you see above) where – for the first time – the remainder R equals 0 (in R=L%S). However, because the 18 same three statements are executed in this last while loop, S gets assigned the value of the remainder, 0, in S=R. Thus, both R and S end the loop with a value of 0. Fortunately, the value of L holds the correct gcd-value since it was assigned the value of S before it got assigned 0 (in L=S). Now, follow the arrow of the proper remainder in the second to last equation, namely 6. You see that the value of S in the last equation gets assigned the value of the remainder R in the previous step. This shows us how the remainders R are passed over: R becomes S in the next run to become L in the following run of the while loop. Observe this in the example by following the arrows starting with the 360 in the top equation. Now, it is easy to implement the C++ program that determines the gcd of two integers. #include <iostream.h> #include <conio.h> void main() { unsigned long L,S,R=1; clrscr(); cout << "Enter the larger integer: L = "; cin >> L; cout << "Enter the smaller integer: S = "; cin >> S; while(R!=0) { R=L%S; L=S; S=R; } cout<< "gcd = "<< L; getch(); } Notice two more technicalities in the program below: 1) I assigned R a value different from 0 (in R=1;) so that the while-loop is entered in its first run. 2) I used unsigned long instead of int as an integer data type that enables us to use integers up to 4,294,967,295 instead of up to 32,767. Programmer’s Remarks: The Euclidean Algorithm is a great example for the usage of the while-loop and is thus discussed in many introductory programming lectures. You will learn that a while-loop just as a do-while-loop are examples of indefinite loops which just means that number of runs is indefinite. (A for-loop, on the contrary, is a definite loop since the number of runs are definite.) The two while loops differ as follows: The while-loop checks the loop-condition before entering the loop, 19 the do-while-loop, however, checks after the loop. Consequently, the while-loop might never be entered whereas a do-while will at least be entered once since the condition check is at the end of the loop. Often times, algorithms can be devised using either the while-loop or the do-while-loop, sometimes even the for-loop. Exercise: Try to revise the above Euclidean Algorithm using a do-while-loop. You will not have to use the initial R=1 assignment to enter the loop at least once. The more elegant recursive implementation of the Euclidean Algorithm The implementation of the Euclidean Algorithm becomes really elegant if we translate the original idea of finding the gcd of two integers by reducing the size of the rectangles until there is no more rectangle left. Let’s investigate this idea when finding the gcd of 26 and 21: 21 11 1 11 26=1*21+5 21=4*5+1 5=5*1 5 21 5 5 5 21 5 26 Instead of considering the original 26x21 rectangle, we could reduce the problem of finding the gcd of 26 and 21 by considering the smaller 5x21 rectangle since we proved that gcd(26,21)=gcd(21,5). Continuing to reduce the rectangles to consider we have altogether: gcd(26,21)=gcd(21,5)=gcd(5,1)=1. Doesn’t that chain of equations strongly suggest to implement a gcd-function that keeps calling itself with 2 varying integers as inputs? And when doing so when do we end this loop? (Finding the base case to exit is always the main question to answer when programming a recursive function.) 20 Well, geometrically speaking, when the last rectangle can be filled perfectly with squareshaped tiles. I.e. the 5x1 rectangle can be filled with 1x1 tiles. When do these tiles do such a wonderful job? Precisely when the longer side is evenly divisible by the shorter side for the first time (i.e. 5 is divisible by 1). Thus, we have to check if 26%21 or 21%5 or 5%1 yields 0. The first time we obtain a remainder of 0 we stop our recursion and return the value of the shorter side as the gcd. In our example, we obtain the first 0 for our 5x1 rectangle and the shorter side, 1, yields the gcd of 26 and 21. The idea is simple and useful. Here is how we translate this idea into workable C++ code: In pseudo-code: Find the gcd of the L by S rectangle. If S equals 0 then return L as the gcd Otherwise find the gcd of the Smaller S by L%S rectangle. In C++ code: int gcd(int L, int S) { if(S==0) return L; else return gcd(S,L%S); } A technical remark: The reason why I checked if S equals 0 instead of checking if L%S equals 0 is a technical one. First of all it is based on the assumption that nobody really wants to find the gcd if S equals 0 (In that case the gcd function would yield the improper answer L as the gcd). Secondly, the actual reason is that when we are trying to find the gcd of the smaller rectangle of dimensions S by (L%S) then these two dimensions are read in as L and S in the next call of the gcd-function as the dimensions of the smaller rectangle to be checked. Now enjoy the brief and effective C++ code for the Euclidean Algorithm. #include <iostream.h> #include <conio.h> unsigned long gcd(unsigned long L, unsigned long S) { if(S==0) return L; else return gcd(S,L%S); } void main() { unsigned long L,S; clrscr(); cout << "Enter the larger integer: L = "; cin >> L; cout << "Enter the smaller integer: S = "; cin >> S; cout<< "gcd = "<< gcd(L,S); getch(); } 21 Programmer’s Remarks: I changed the data type from int in the C++ code (listed next to the pseudo-code) to unsigned long in order to find the gcd of integers up to 4,294,967,295 instead of up to 32,767. Exercise1: Integrate an input check in the above program such that the user has to reenter a value for L or S if the original input was 0. Exercise2: How would you modify the above code if a or b or both are negative integers? Or does this program even handle these cases aswell? Try it 3.2.3 The Extended Euclidean Algorithm quickly finds the Modular Inverse a-1 Finding the gcd of two given integers is a keystone in Cryptography as it allows us to find the good keys in a constructive manner (as done in the C++ program in the previous section). Our objective in this section is to find the corresponding decoding key a-1. Mathematically, this translates into finding the modular inverse a-1 of an integer a. To do so, we first execute the Euclidean Algorithm. Finding the gcd of the integers M and a does all the preparatory work to compute the inverse a-1 as it enables us to express the gcd as a linear combination of M and a: Theorem of Bachet The greatest common divisor of a and M can be expressed as follows: gcd(M,a) = x*a + y*M for some integers x and y. The French Mathematician and Philosopher Gaspar Bachet de Meziriac (1581-1638) posed the problem of expressing the gcd of two integers as a linear combination of those two integers in his book “Problem plaisants et detectables, qui sont fait par les nombres”, Lyon 1612. He solved the problem himself in the second edition of this book in 1624. Let’s take a look at some examples of the Theorem of Bachet. Example1: The gcd of 27 and 14 is 1: gcd(27,14)=1. We can easily see that 1=2*14 + (-1)*27 with x=2 and y=-1. Example2: The gcd of 26 and 12 is 2: gcd(26,12)=2. We can see again that 2=1*26 + (-2)*12 with x=1 and y=-2. Example3: The gcd of 26 and 3 is 1: gcd(26,3)=1. Again , we easily see that 1=9*3 + (-1)*26 with x=3 and y=-1. 22 Example4: The gcd of 26 and 5 is 1: gcd(26,5)=1 and 1=1*26 + (-5)*5 with x=1 and y=-5. Example5: The gcd of 26 and 11 is 1: gcd(26,11)=1. Trial and error yields 1=3*26 + (-7)*11 with x=3 and y=-7. It is not a surprise that x and y have to be of opposite sign. Here is why: The gcd is always a number that is smaller than a and M. If x and y were both positive, we could only form integers greater than a+M. If both were negative we could only form numbers smaller than –(a+M). However, our desired number, the gcd(M,a), is certainly between those two bounds so that x and y have to be of opposite sign. These examples give you an understanding what is meant by Bachet’s Theorem. The used trial and error method to find the integers x and y is feasible when a and M are fairly small numbers. This approach would be not efficient for larger numbers used in Cryptography. Instead, we use Bachet’s Theorem and reverse the Euclidean Algorithm to find x and y. In order to express the gcd(26,21)=1 in terms of 21 and 26 in the form stated in Bachet’s Theorem 1 = x*21+y*26 we have to find the unknown integers x and y we recall the 3 steps to find the gcd: 26=1*21 + 5 21=4* 5 + 1 5=5* 1 + 0 We do not need the 3rd equation (in gray) to determine the gcd. Since the last equation has a remainder of 0, the remainder of the second to last equation, here 1, yielded the gcd. Thus, I can simply express the gcd in terms of 26 and 21 by isolating it in that second to last equation: 1=21 – 4*5 I am half way done since 1 is already expressed in terms of 21. It is also expressed in terms of 5, however, not in terms of 26 yet. To do so, I just solve the first of our three equations for 5, yielding 5 = 26 - 1*21 and insert that for the 5 in 1=21 – 4*5 as follows: 1=21 – 4*(26 – 1*21) Resolving the parentheses yields 1=21 – 4*26 + 4*21 Combining the terms that involve 21 yields 1=5*21 + (-4)*26 23 Check: 1=105-104. Correct. We accomplished to express the gcd(26,21)=1 in terms of 26 and 21 with x=5 and y=-4. We learned how to express the gcd(a,M)=1 in terms of a and M. How does that help to find the inverse of a, namely a-1? The answer is as simple as x. In our example a-1 = x = 5 is the inverse of a=21 since a * a-1 = 21*5 = 1 MOD 26. To understand why x is the inverse of a we just have to take a look at the equation 1=x*21+y*26 and remember that we are computing MOD 26. Thus, our equation turns out to be 1=x*21 MOD 26 where x is exactly that number that multiplied by a=21 yields 1 which is exactly the definition of the inverse a-1. In conclusion: The value of x in Bachet’s Theorem is the desired decoding factor key a-1 for a given alphabet length M. Example1: Let’s find the inverse of a=3 MOD 26. First we have to execute the Euclidean Algorithm: 26=8*3+2 2= 26 - 8*3 3=1*2+1 1= 3 - 2 2=2*1 Secondly, we make use of the above steps to express the gcd(26,3)=1 in terms of 26 and 3: 1= 3 - 2 = 3 - (26 - 8*3) = 3 – 26 + 8 * 3 = 9*3 – 26 1 = 9*3+(-1)*26 so that 9 is the inverse of 3 Mod 26. Check:3*9 = 1 MOD 26. Correct. Example2: Let’s find the inverse of a=23 MOD 26. First we have to execute the Euclidean Algorithm. 26=1*23+3 3= 26 – 23. 23=7*3+2 2= 23 – 7*3. 3=2*1+1 1= 3 – 2. 2=2*1 Secondly, we make use of the above steps in the Euclidean Algorithm to express the gcd(23,26)=1 in terms of 26 and 23: 1=3-2 1 = 3 - (23 - 7*3) = (-1)*23 + 8*3 1 = (-1)*23 + 8*(26-23) 1 = (-9)*23 + 8*26 so that -9 is the inverse of 23 Mod 26. However, we don’t want any negative integers. In that case we just have to add the modulus M, here 26, to get the positive integer that is congruent to –9 and between 0 and 26: a-1 =(-9) + 26 = 15. Check: 15*23=1 MOD 26. Correct. 24 How do we deal with negative inverses? Certainly, (-9) * 23 also yields 1 MOD 26 since –9 and 15 are congruent MOD 26 which shows that –9 and therefore all integers that are congruent to 15 MOD 26 are inverse to 23. However, we desire to obtain that representative of the inverses that is between 0 and the modulus, here 26. Here is another example. Example3: Let’s find the inverse of a=11 MOD 26. First we have to execute the Euclidean Algorithm. 26 = 2*11 + 4 4 = 26 – 2*11 11 = 2*4 + 3 3 = 11 – 2*4 4 = 1*3 + 1 1= 4 –3 3 = 3*1 + 0 Secondly, we make use of the above steps in the Euclidean Algorithm to express the gcd(26,11)=1 in terms of 26 and 11: 1=4-3 1 = 4 - (11 - 2*4) = (-1)*11 + 3*4 1 = (-1)*11 + 3*(26 - 2*11) 1 = 3*26 + (-7)*11 so that -7 is the inverse of 11 Mod 26. Since we don’t want any negative integers we just have to add the modulus M, here 26, to obtain that positive integer that is congruent to –7 and less than 26: a-1 =(-7) + 26 = 19. Check: 19*11 = 209 = 1 MOD 26 which is correct. The C++ implementation of the Extended Euclidean Algorithm The above method is a little tedious to program. Rather, I will show you a graphical procedure to determine the x and y in order to express the gcd as follows: gcd(a,M)=x*a+y*M. This procedure is much more handy and will help to easily design a C++ program for the Extended Euclidean Algorithm. Let me introduce you to a graphical intuitive procedure by finding the multiplicative inverse of 21 MOD 26 in the introductory example. 25 Example1: Step 1: Set up three columns and place the greater integer, 26, on top of the smaller one, 21, in the left column: A Q X 26 21 Step 2: Fill the first column with the remainders when dividing the two above integers until you reach 0. I.e. 26 MOD 21 = 5, 21 MOD 5 = 1, 5 MOD 1 = 0. In the middle column place the quotients (Q) when dividing two consecutive integers. I.e. 26 / 21 = 1, 21 / 5 = 4, 5 / 1 = 5. A 26 21 5 1 0 Q X 1 4 5 Step 3: The bottom two entries in the X column (same height as the q values) are always 0 and 1. Place them as follows: A Q X 26 21 1 5 4 1 1 5 0 0 26 Step 4a: After this preliminary work, we now compute the integers in the third Xcolumn. As we work our way from the bottom to the top, the two top numbers yield the desired x and y coefficients in Bachet’s Theorem. Here is the computation rule: the entries in the X column are computed by multiplying the neighboring entry in the Q by the x-entry underneath and adding the x-entry that is two rows underneath. Here, 4 = 4*1 + 0, or graphically: 4= 4 * 1 + 0 A Q X 26 21 1 4 5 4 * 1 1 5 +0 0 Step 4b: The top entry in the x-column is found in the same manner: 1 * + A 26 21 5 1 0 Q 5= 4 1 X 1 4 5 5 4 +1 0 * Step 5: We are done. The greatest common divisor of 26 and 21, namely 1, can now be expressed by subtracting the crosswise product of the two top values in the A and Xcolumn: A 26 21 Q X 5 4 1 1 = 5 * 21 – 4 * 26 which shows that 5 is the inverse of 21 MOD 26. WARNING: The crosswise subtraction will always yield the gcd, however, the order of subtraction may differ. Instead of subtracting the product of the black line numbers from the product of the gray line numbers, we may obtain the opposite subtraction as shown in the next example: 27 Example2: Let’s express the greatest common divisor of 26 and 23, namely 1, in terms of 26 and 23. Moreover, find the inverse of 23 MOD 26. A Q X 26 9 23 1 8 3 7 1 2 1 1 1 0 1 = 8 * 26 – 9 * 23. Correct. So what is the inverse of 23? No, it is not 9 since 9*23 = 207 which yields –1 MOD 26 and not 1. Rather, instead of subtracting 9*23 in the above equation, we add (-9)*23 so that we correctly obtain: 1 = 8 * 26 + (- 9) * 23 This shows that 9 is not the inverse, but rather –9. Since the inverse of 23, our decoding factor key, shall be a positive integer between 0 and 26 we just have to add 26 to –9 which yields 17 as the correct inverse of 23 MOD 26. Check: 17*23 = 391 = 1 MOD 26. Correct. Example3: We just found that 17 is inverse to 23, let’s now verify that consequently 23 must be the inverse of 17 MOD 26 using again the Extended Euclidean Algorithm. Again, verify all table entries. A Q X 26 3 17 1 2 9 1 1 8 1 1 1 8 0 1 = 2 * 26 – 3 * 17. = 2 * 26 + (- 3) * 17 Since –3 is congruent to 23 MOD 26 we obtain 23 as the inverse of 17 MOD 26. 28 Example4 is your Exercise1: Show that the greatest common divisor of 26 and 19, namely 1, can be expressed in terms of 26 and 19 as follows: 1 = 11 * 19 – 8 * 26. Moreover, this shows that 11 is the inverse of 19 MOD 26. A Q X 26 19 1 0 Exercise2: a) Show that the greatest common divisor of 15 and 28 is 1 using the Euclidean Algorithm. b) Find the inverse of 15 MOD 28 using the Extended Euclidean Algorithm. I wrote the C++ program exactly the way I explained the Extended Euclidean Algorithm to you: it first computes the gcd, it then expresses it in terms of the two entered integers L and S and finally gives the desired inverse of S MOD L. Programmers Remarks: Here, the trick is how to label the integers in one column. Consider the A-column in Example4: the numbers appear listed as: 26, 19, 7,5,2,1. I want to store them as a0=26, a1=19, a2=7, etc. Storing a list of integers can be done in C++ using arrays as follows: I first declare one array per column in: unsigned long a[100],q[100],x[100]; This allows us to store 100 values in each of the three columns. I assume here that the Euclidean Algorithm finds the gcd of two integers in 98 steps or less since the entered two integers are the first entries on the list. Warning: the computer starts storing the 100 entries with a[0]=26, then a[1]=19, then a[2]=7. a[99] would hold the last entry. I will store the entries of the other two columns similarly in the q and in the x array. The entries in example4 will thus be stored as follows: A Q X a[0]=26 x[0]=11 a[1]=19 q[1]=1 x[1]=8 a[2]=7 q[2]=2 x[2]=3 a[3]=5 q[3]=1 x[3]=2 a[4]=2 q[4]=2 x[4]=1 a[5]=1 x[5]=0 29 Yes, you noticed correctly that the top and the bottom entry in the Q-column have no values. They are not needed to compute the gcd. Thus, nothing will be stored in their spots in the arrays. So, the first and the last entry in the Q-column, here q[0] and q[5], will never have a value. In the program below, I fill the columns exactly the same way I explained it to you in the 5 steps above: after assigning L to a[0] and S to a[1], I simultaneously compute the A- and the Q-columns row by row until the division of two consecutive integers in the A-column leaves a remainder of 0 in: a[0]=L; a[1]=S; i=0; while(a[i]%a[i+1]!=0) //compute the A and the Q column { a[i+2]=a[i]%a[i+1]; q[i+1]=a[i]/a[i+1]; i++; } After determining the length of the X-column, n=i+1 (Why not n=i?), I assign 0 to x[n] and 1 to x[n-1] and then compute the remaining entries of the x-column in: n=i+1; x[n]=0; x[n-1]=1; for(j=n-2;j>=0;j--) x[j]=x[j+1]*q[j+1] + x[j+2];//compute X column Notice that as I work bottom-up the index j starts for n-2 and goes down to 0. The previous while-loop, on the contrary, started at i=0 and went up to n reflecting the top-down design. To finally display the gcd in terms of x[1], a[0], x[0] and a[1] I have to check which of the 2 possible subtractions I have to use. Among the many possible ways of determining the proper subtraction, I simply identify the greater of the two products to then display the gcd as follows: if((x[1]*a[0])>(x[0]*a[1])) {cout << x[1]<< " * " <<a[0]<< " - " <<x[0]<< " * " << a[1]; x[0]=L-x[0];} else cout << x[0]<< " * " <<a[1]<< " - " <<x[1]<< " * " << a[0]; Notice that if x[1]*a[0] is greater than x[0]*a[1] then L – x[0] is the inverse of a[1]=S MOD L=a[0]. In that case I want to display that inverse as a positive integer between 0 and L and not as a negative integer. I, thus, add the modulus L=a[0] to the negative x[0] and assign it to x[0] 30 again. Therefore, at the end of the program x[0] doesn’t hold its original proper value anymore, which is fine since we don’t need it anymore. Here is the Code: //Author: Nils Hahnfeld - 3/11/00 //Program computes gcd, expresses it as gcd=y*L+x*S //and gives inverse of S MOD L #include <iostream.h> #include <conio.h> void main() { unsigned long L,S,a[100],q[100],x[100]; int n,i,j; clrscr(); cout << "Enter the larger integer: L = cin >> L; cout << "Enter the smaller integer: S = cin >> S; a[0]=L; a[1]=S; i=0; while(a[i]%a[i+1]!=0) //compute the a { a[i+2]=a[i]%a[i+1]; q[i+1]=a[i]/a[i+1]; i++; } n=i+1; x[n]=0; x[n-1]=1; for(j=n-2;j>=0;j--) x[j]=x[j+1]*q[j+1] cout << endl; " ; "; and the q column + x[j+2];//compute x column //display the three columns cout << "\t a \t q \t x "<<endl; cout << "\t==================="<<endl; cout << "\t" << a[0] <<"\t\t"<<x[0]<<endl; for(i=1;i<n;i++) { cout << "\t" <<a[i]<<"\t"<<q[i]<<"\t"<<x[i]<<endl;} cout << "\t" << a[n] <<"\t\t"<<x[n]<<endl<<endl; //express gcd in terms of L and S cout<< "gcd = "<< a[n] << " = " ; if((x[1]*a[0])>(x[0]*a[1])) {cout << x[1]<< " * " <<a[0]<< " - " <<x[0]<< " * " << a[1]; x[0]=L-x[0];} else cout << x[0]<< " * " <<a[1]<< " - " <<x[1]<< " * " << a[0]; cout<<endl<<endl; //give the inverse of S MOD L if their gcd(S,L)=1 31 if(a[n]==1) {cout << x[0]<<" is inverse to "<<S<<" MOD " <<L<<endl; cout << "Check: "<<x[0]<<"*"<<S<<" = "<<(x[0]*S)%L<<" MOD "<<L<<endl; } getch(); } There are two major criticisms about this program: 1) Choosing the array length to be 100 might be insufficient when finding the gcd of two very big integers. The proper way for using a non-static list length in C++ are either Linked Lists or Vectors whose sizes can be varied. If you are familiar with such C++ structures, please go ahead and refine the above version. 2) I am wasting a lot of memory when choosing the arrays of type unsigned long. Thus, each of the 300 array entries – if used or not – take up 32 bits (= 4 bytes) of memory. This is not memory-efficient and should also be refined using vectors or linked lists. Final remark: The above program will also work if you enter the smaller number for L and the bigger one for S. Thus, you can also find the multiplicative inverse of L MOD S. Try in example S=26 and L=15 or S=26 and L=19. 32 3.3 Cryptoanalysis: Cracking the Linear Cipher Great, we have learned how to encode and decode the Linear Cipher. Now we pretend we are Mr. X and study how to crack the Linear Cipher. Even though the Linear Cipher is harder to crack than the two previous Ciphers that we discussed, it can be cracked. In this section I am going to demonstrate to you how the Linear Cipher can be cracked in 3 steps. How to Crack a Linear Cipher in 3 Steps: Step1: Eavesdropping Mr. X has to find two corresponding plain and cipher letters since the key pair of the Linear Cipher consists of the two integers (a,b). Again, with the aid of frequency analysis on the intercepted cipher text he will be able to identify two such correspondences. Step2: Using the two letter correspondences he can set up two congruences using the encoding function. Solving this 2 by 2 system of congruences for the two integers a and b yields the encoding key pair (a,b). Step3: Finally, he harvests the fruits. Having cracked the decoding key pair (a-1, b) he is now ready to decode the remaining cipher letters of the encoded message by employing the decoding function P = a-1 *(C-b) MOD 26. Step1: In this step Mr. X has to set up 2 letter correspondences. This is accomplished by finding the most frequent cipher letters assuming that they correspond to the most frequent plain letters e and t. I will explain this step in detail further down. Say we found the following two letter correspondences: a) the cipher letter v=21 corresponds to the plain letter F=5 and b) the cipher letter a=0 corresponds to the plain letter I=8: Step2: Setting up and solving a 2x2 system of congruences yields the key pair as follows: Mr. X inserts both letter information into the encoding function C = a * P + b MOD 26: 21 = a*5+b MOD 26 (1) 0 = a*8+b MOD 26 (2) How does he solve this 2x2 system of congruences? Similar to solving two equations with two variables, he solves the two congruences with respect to a and b. In the first step he subtracts the first equation from the second equation 33 in order to get rid of b and thus reducing the 2 x 2 system to one equation with one variable (a 1x1 system). This yields (2) – (1) -21 = a*3 MOD 26 since –21 = 5 MOD 26 we write: 5 = a*3 MOD 26 (3) Instead of dividing both sides by 3, we have to multiply by the inverse of 3 since we are performing MOD-arithmetic and thus exclusively deal with whole numbers. The inverse of 3 MOD 26 can be obtained via the Extended Euclidean Algorithm as I showed you in the previous section. Clever guessing is sufficient in this case: 3-1=9 since 3*9=1 MOD 26. Multiplying both sides of equation (3) by 9 yields: 45 = a MOD 26 Since 45 = 19 MOD 26, we find that a=19. To find b, we replace 19 for a in (1) yielding: 13 = 95+b MOD 26 Subtracting 13 yields: 82 = b MOD 26 Since 82 = 4 MOD 26 we eventually find that b = 4. Mr. X successfully cracked the encoding key pair: a=19 and b=4. He is going to harvest the fruits of his work in the following step. Step3: To decode the remaining cipher letters and therefore to read the whole decoded message, he first has to use the Extended Euclidean Algorithm to compute a-1. We find that a-1 = 11 is the inverse of a = 19 MOD 26. Secondly, we insert a-1 = 11 and b = 4 into the decryption function P = a-1 *(C-b) MOD 26 yielding P = 11 *(C-4) MOD 26 Eventually, he has to convert each cipher letter back to its plain letter by inserting each cipher letter integers C into the equation. With the aid of i.e. MS Excel, he obtains the following conversion table: Cipher letter a b c d e C P=11*(C-4) Plain letter 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 8 19 4 15 0 11 22 7 18 3 14 25 10 21 6 17 2 13 24 9 20 5 16 1 12 23 I T E P A L W H S D O Z K V G R C N Y J U F Q B M X f g h i j k i m n o p q 34 r s t u v w x y z Let me help refine your Excel knowledge. I computed the next-to-last row in Excel as follows: The decoding function P=11*(C-4) MOD 26 turns into the Excel-formula =MOD(11*(B2-4),26) with B2 containing the cipher letter integer. For instance, if B2=0 (=letter a) the formula yields the correct plain letter 8 (= letter I) since 11*(0-4)= -44 = -18 = 8 MOD 26. The last row is a simple integer to letter conversion. Recall that A is stored as 65, B as 66, …, Z as 90 implying that we have to add 65 to any integer that has to be computed. Thus, to convert the first integer 8 into letter I we write: =CHAR(65+B3) with B3 containing the 8: CHAR(65+8)=CHAR(73)=I. 3.3.1 Letter Frequency Analysis as a Tool to Crack Ciphers Remember that steps 2 and 3 can only be executed if Mr. X finds how 2 cipher letters translate into their respective plain letters first. But how can he find what two cipher letters translate into? Consider the following cipher text and try to translate two cipher letters into plain letters! Remember your answer, we will unscramble this message soon. hxo rboexmop fbapu c jahhjo gpsqjoluo ap cpl hxo eopasbe hcgo pspo skh es ah ciikmkjchoe hxbskux hxo wocbe jsqojj vboealoph sr xcbncbl What is meant by letter frequency? It is the relative occurrence of each letter expressed in percent. We find these percentages by simply adding each letter’s number of occurrence divided by the total number of letters in a sentence. On average, letters in English sentences occur as follows: 35 13% 0.14 0.12 9% 0.1 7% 0.08 7% 0.06 4% 3% 0.04 4% 3% 2% 1% 0.02 8%7% 4% 3% 8% 6% 3% 0%0% 3% 2% 1%2% 1% 0% 0% 0 a b c d e f g h I j k l m n o p q r s t u v w x y How does this help finding two letter correspondences? Recall that the Linear Cipher is the combination of the Caesar Cipher and the Multiplication Cipher. Thus, in order to break the Linear Cipher using letter frequencies, we have to understand how a) the Caesar Cipher and b) the Multiplication Cipher letter frequencies of the plain letters are transferred to the cipher letters. This is the objective of the following two sections. You learn you how to crack the Caesar and the Multiplication Cipher easily with the aid of letter frequencies. This knowledge - in turn teaches us how to crack the above example of the Linear Cipher. Don’t forget your guess on the two letter correspondences, we will get back to it. 3.3.2 Cracking the Caesar Cipher with the Aid of Letter Frequency Analysis Let’s encode the following revelation with the Caesar Cipher: T A S L HE ND O I O WE F RESH T HE S T ACC L L PR M E N B R I N G A L I T T L E K N O WL E D G E E NI ORS T A K E NONE OUT U MU L A T E S OV E R T H E Y E A R S E S I DE NT OF HA RV A RD 36 I N z Shifting each letter 3 positions to the right yields: wk h i u h v k d q g wk h v v r l w df f or z hoo s u p h x h h q p v q eul qj d l r u v wd n h x o d wh v r y h l ghqw r i k o l wwo h n q r z o h g j h qr qh r x w u wk h b h d u v duy dug l q What happens to the letter frequencies when shifting the plain letters 3 positions to the right? Of course, they are shifted too! Observe this on the two following bar graphs: Letter frequencies in the message by Harvard President Lowell 18% 16% 16% 14% 12% 10% 8% 8% 7% 6% 5% 4% 4% 1% 2% 2% 6% 6% 2% 2% 2% 9% 7% 7% 6% 3% 2% 1% 0% 2% 2% 0% 0% 1% 0% 0% A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Letter frequencies of the cipher letters: 18% 16% 16% 14% 12% 10% 8% 7% 8% 6% 2% 5% 4% 4% 0% 1% 1% 0% 2% 6% 6% 2% 2% 2% 9% 7% 7% 6% 3% 2% 1% 0% 2% 2% 0% 0% a b c d e f g h I j k l m n 37 o p q r s t u v w x y z Notice that the bars of the second graph displaying the letter frequencies of the cipher letters are shifted by 3 positions to the right. To crack the Caesar Cipher, we just have to find the most frequent cipher letter, here h=7 with 16%, and equal it to the plain letter E=4 in the encryption function: 7 = 4 + b MOD 26. This yields the encoding key b=3 and we can decode the remaining message easily. We observe: In a Caesar Cipher: As the plain letters are shifted so are the letter frequencies 3.3.3 Cracking the Multiplication Cipher with the Aid of Letter Frequency Analysis Are the letter frequencies shifted in the Multiplication Cipher as well? No, because the plain letters are not shifted but multiplied by the good key a. Say we encrypt the original Harvard message using a=3 as our key. We obtain: F A C H V M P Z MC V N J F V M C Q Y F A GG QOMH H T Z K M I M M N K C N D Z Y N S A Y QZ C F A E M I H A F MC QL M Y J MN F QP V H Y F F H M E N QOH MJ S M N QN M QI F Z F V M U MA Z C A Z L A Z J Y N Letter frequencies in the message by Harvard President Lowell 18% 16% 16% 14% 12% 10% 8% 8% 7% 6% 4% 4% 2% 5% 1% 2% 6% 6% 2% 2% 2% 9% 7% 7% 6% 3% 2% 1% 0% 0% 2% 2% 0% 1% 0% 0% A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Follow the translation of the plain letters and their frequencies caused by the multiplication with the key a=3. 38 Plain Letters A - I 20% 16% 15% 10% 7% 5% 1% 2% B C 4% 2% 2% F G 5% 6% H I 0% A D E 18% 16% 16% 14% 12% 9% 10% 8% 7% 8% 6% 6% 4% 1% 2% 7% 6% 2% 3% 2% 7% 6% 5% 4% 2% 2% 2% 2% 2% 0% 1% 1% 0% 0% 0% a b c d e f g h I j k l 39 m n o p q r s t u v w x y z Similarly, observe this for the plain letters from J through R: Plain Letters J - R 20% 15% 10% 8% 6% 5% 2% 0% 7% 7% 2% 1% 0% P Q 0% J K L M N 18% O R 16% 16% 14% 12% 9% 10% 8% 7% 8% 6% 6% 4% 1% 2% 7% 6% 2% 3% 2% 7% 6% 5% 4% 2% 2% 2% 2% 2% 0% 1% 1% 0% 0% 0% a b c d e f g h I j k l m n o 40 p q r s t u v w x y z And finally for the plain letters from S through Z: Plain Letters S-Z 20% 15% 10% 9% 6% 3% 5% 2% 2% V W 0% 1% 0% X Y Z 0% S T U 18% 16% 16% 14% 12% 9% 10% 8% 7% 6% 8% 6% 4% 2% 2% 1% 7% 6% 2% 4% 3% 2%2% 7% 5% 2%2% 0% 2% 1%1% 6% 0%0% 0% a b c d e f g h I j k l m n o p q r s t u v w x y z Again the translation of the letters and its frequencies is easy to observe: the bars of the original frequency distribution are not just shifted to the right but placed to the right in steps of 3 yielding the frequency distribution of the cipher letters. Again, cracking the Multiplication Cipher is accomplished by determining the most frequent cipher letter (that one with the highest bar), here it is the cipher letter m. Setting the value of m, 12, equal to the value of the plain letter E, 4, in the encryption function 12 = a*4 MOD 26. This yields the key a=3. Finding the decoding key a-1 =9 MOD 26 enables us to decode the remaining cipher letters easily. 41 We observed for the Multiplication Cipher: Encoding with the Multiplication Cipher leaves the underlying letter frequencies unchanged as any plain letter is encoded with the same cipher letter. The letter frequencies of the same plain letters are transferred to their corresponding cipher letters in steps of length a which is the encoding key. 3.3.4 Cracking the Linear Cipher with the Aid of Letter Frequency Analysis We could easily see that the underlying letter frequencies remain unchanged in the Caesar and Multiplication Cipher. Does this therefore hold true for the Linear Cipher aswell? Let’s take a look at the encryption function: C = a*P + b MOD 26. The Linear Cipher first encodes with the Multiplication Cipher and we just observed that it leaves the underlying letter frequencies unchanged. The subsequent Caesar Shift simply shifts the frequency bars to the right, again without changing the underlying frequencies. Consequently, encrypting with a Linear Cipher leaves the underlying letter frequency unchanged. Let’s apply the learned to our introductory example: hxo rboexmop fbapu c jahhjo gpsqjoluo ap cpl hxo eopasbe hcgo pspo skh es ah ciikmkjchoe hxbskux hxo wocbe jsqojj vboealoph sr xcbncbl Computing the letter frequencies of the cipher text yields the following frequency chart: 16% 14% 14% 12% 10% 10% 8% 6% 7%7% 5% 8% 6% 7% 4%4% 4% 2% 6% 0% 2% 1% 2% 2% 1% 2%2% 6% 3% 0% 1%1% 0%0% 0% a b c d e f g h I j k l m n o p q r s t u v w x y z 42 This frequency chart is a result of the combination of the frequency bar translations caused by the Caesar Cipher and the Multiplication Cipher. The original plain letter frequencies were first translated in steps of the factor key a=2. Subsequently, the letter frequencies were shifted by b=3 positions to the right yielding the final letter frequencies. Imagine Mr. X does not know how the key par used to encrypt the message. How could he crack the encrypted message? He can generate the frequency graphs off the intercepted cipher text: He observes that the most frequent cipher letter o=14 which most likely corresponds to the plain letter E=4. Hence, the unknown key pair (a,b) has to fulfill the encoding equation 14 = a * 4 + b MOD 26. There are two options to crack the Cipher. Cracking option A: Either he further exploits letter frequencies. Testing other possible letter correspondences allows him to solve two linear equations in two variables which yields the desired key pair (a,b). or Cracking option B: He simply tests all possible key pair combinations (a,b). This is a brute force attack. I will first show you option A and afterwards option B. Cracking option A: What is the second most common letter? T, N, O, R, I, A, S are the most likely candidates as you can see in the above letter frequency chart of English letters. Any flashbacks of the TV-show “Wheel Of Fortune”? It is no coincidence that those letters are bought during the game since they are the most frequent ones. The above chart reveals the second most frequent cipher letter is h with 10%. Assuming that the cipher letter h=7 translates into plain letter T=19, Mr. X can therefore set up the following equation: 7 = a * 19 + b MOD 26 14 = a * 4 + b MOD 26 The top minus the bottom equation yields: -7 = a * 15 MOD 26 Since -7 = 19 MOD 26 we have: 43 19 = a * 15 MOD 26 In order to solve this congruence for a he has to multiply both sides by the inverse of 15 MOD 26, namely 15-1 = 7. This yields: 19 * 7 = a MOD 26 Since 19 * 7 = 133 = 3 MOD 26, we finally obtain 3 = a MOD 26 To find b he inserts a=3 in the second equation and obtain: 14 = 3 * 4 + b = 12 + b MOD 26 Thus, he obtains the key pair: a=3 and b=2. In order to decrypt the Cipher Code he proceeds as usual: he first finds the decoding factor key, namely a-1 = 9, and then determines the following plain/cipher letter correspondences by using the decoding function P = 9*(C-2) MOD 26. Cipher Letter Plain Letter a b c d e f g h i j k l m n o p q I R A J S B K T C L U D M V E N W r s F O t u v w x y z X G P Y H Q Z Replacing each cipher letter in the Cipher Code by its plain letter equivalent yields again the correct Harvard statement. He was lucky. His first attempt to crack the Cipher Code was already successful. The reason for that is that the 2nd most frequent cipher letter happened to be the expected one, namely the t. This is even more likely to happen if the encoded message is longer. Because in that case we can expect the cipher letter frequencies to converge to the letter frequencies for English sentences. This statistical property is known as the “Law for Large Numbers”. Exercise1 for option A: Crack a different Linear Cipher using letter frequency analysis. Your job is to fill in the blanks. C = a*P+b MOD 26 Say, we found that the cipher letter n=13 corresponds to F=5: and s=18 corresponds to I=8: 13 = a*5+b MOD 26 __= a*_+b MOD 26 Subtracting the first from the second yields __= a*_ MOD 26 To isolate a we have to multiply both sides by the inverse of the integer right of a, namely 9. (Why?) __* 9 = a MOD 26 Since 45 = 19 = a MOD 26, we find that a=19. Replacing the 19 for a in the first of our two original equations we obtain: 13=__+b MOD 26 44 Subtracting 95 on both sides yields: __ = b MOD 26 Since –82 = __ MOD 26 we find b to be: __ = b MOD 26 Hence, we successfully cracked the key: a=19 and b=22. To decode the remaining cipher letters and therefore to read the whole encoded message, we just have to insert a-1 = 11 and b=4 into the decryption function P = a-1 *(C-b) MOD 26. Our decryption function turns out to be: P = 11 *(C-22) MOD 26 Which can be rewritten as P = 11 *C – 11*22 MOD 26 P = 11 *C – 242 MOD 26 P = 11 *C + 18 MOD 26 Check: Inserting the cipher letter n=13 should yield the plain letter F=5: P = 11 *13 + 18 MOD 26 P = 143 + 18 MOD 26 P = 161 MOD 26 P = 5 MOD 26 Correct. Cracking option B: Let’s break the Harvard message using brute force. Which values of a and b fulfill the encoding function? Mr. X assumes that an alphabet length M=26 was used. Therefore, there are only 12 good encoding factor keys a, namely those that are relative prime to 26. Assuming that the most frequent cipher letter o corresponds to the plain letter E, what values could b attain? Well, only those that fulfill the equation 14=a*4+b MOD 26. Here is a list of the possible key pairs (a,b): a b 0 1 14 10 2 6 3 4 5 6 7 2 24 20 16 12 8 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 4 0 22 18 14 10 6 2 24 20 16 12 8 4 0 22 18 We compute the inverse a-1 of each a MOD 26 to be: a-1 0 1 2 9 4 21 6 15 8 3 10 19 12 13 14 7 16 23 18 19 20 5 22 17 24 25 We can test each of the above 12 key pairs by using a-1 and b in the decoding equation 45 P = a-1 *(C-b) MOD 26. I.e. testing a-1=9 and b=2 yields the following table of letter correspondences: Cipher Letter Plain Letter a b c d e f g h i j k l m n o p q I R A J S B K T C L U D M V E N W r s F O t u v w x y z X G P Y H Q Z Bingo. Replacing the cipher letters by the corresponding plain letters yields the correct Harvard statement. Verify this. Reflection on Cracking Option B: In the above example, eavesdropping Mr. X assumed an alphabet length of 26. What if that assumption is not correct? Mr. X would just have to test (M) key pairs for each used an alphabet length M: 12 for M=26, 18 for M=27, 12 for M=29, etc. The reason why he just has to test (M) as opposed to all the possible (M) * M key pairs is due to the fact that he can identify the cipher e by using letter frequency. This allows him to set up one equation mod M which yields a unique shift key b for a given factor key a. Finding the inverse a-1 depends on the alphabet length M. I.e. the inverse of a=3 MOD 26 is a-1=9, however, it would be a-1=10 MOD 29. In conclusion: With computer assistance, Mr. X is able to crack any Linear Cipher easily using Cracking Option B. However, Cracking Option A turns out to be even faster. The longer a cipher text, the more likely it is to find two correct letter correspondences that are sufficient to compute the key pair (a,b). 3.4 Random Substitution Ciphers The fact that the underlying letter frequencies remain unchanged is not only true when cracking Caesar Ciphers, Multiplication Ciphers or Linear Ciphers. It is actually true for those ciphers where a same plain letters are encrypted to the same cipher letter. Such ciphers are called Monoalphabetic Substitution Ciphers. Definition: In Monoalphabetic Substitution Ciphers same plain letters are encrypted to the same cipher letters. You can imagine that the underlying letter frequencies help eavesdroppers a lot to crack such ciphers. Cracking becomes particularly easy if the intercepted message consists of many letters. In that case, the cipher letter frequencies approach the letter frequencies of the English language. 46 Warning for Monoalphabetic Encryptions: The longer the intercepted cipher text the more will the cipher letter frequencies converge to the letter frequencies of the English language. 3.4.1 The Art of Cracking Random Substitution Ciphers In this last section, we are going to study the art of how to crack a “Random Substitution Cipher”: that is a cipher where each plain letter is assigned to each cipher letter in a random manner. You find such ciphers as “cryptograms” in newspapers. For example, using the following random letter assignment Plain letter Cipher letter A B C D E F G q a z w s x H I J K L M e d c r f v the virus carrier message encodes to: PLAIN TEXT A N T I S Cipher text q g j c u T j H d N O P Q R S t g b Y h n u E s C z A q T U j m R n R n V W i X k o I c Y Z l p E s R n The key to crack the Random Substitution Cipher is that the underlying letter frequencies of the original plain text don’t change. This fact makes the Random Substitution Cipher very susceptible to Cipher attacks. An eavesdropper literally just has to count the letter frequencies of the Cipher letters, create a chart displaying the relative letter frequencies and match it with the letter frequencies of the English language. Recall that the most frequent letters in the English language are ETNORIA which – except for the O - occur even as the most frequent letters in the brief virus carrier message. Surely, the longer the messages are the more do the relative frequencies of the cipher letters approach the expected frequencies. Let’s break the following random-substituted encrypted message: q q a j v v j d s i s n q e s l j d s d q g q j q e b u s q t i z o s s b s n n g u c z t s w y q c q g g j w a q n s c g l j d s u y s g j v s u u How could we go about breaking this message? Certainly, we shall take advantage of the known letter frequencies. Additionally, we can take advantage of the blanks between two 47 words, certainly a helpful hint for eavesdroppers. Since a sender does not want to give such hints to eavesdroppers he would not use blanks in the encrypted message. Blanks could be encoded as a certain letter (i.e. X) or a number combination in the cipher text. If they are left out the Cipher message reads as follows: qvvjdsjqosuyqcwaljdsqisnqesqtsnczqgqnsuysgjaljdsebisntsgjcgvsuujdq gquszbgw Certainly, this appears to be a more challenging task, however, letter frequency will eventually yield the correct plain text. It is just more challenging. Before trying to crack such a cipher text, let’s first crack the version with the blanks: Step 1: We first find the cipher E. Finding the most frequent letter yields s. So, let’s assume that s is E. Step 2: Secondly, we try to detect the most common English 3-letter word “THE”. We, therefore, have to look for repetitive 3-letter-sequences ending in s. We observe - even without the help of the blank spaces - that jds occurs three times, more than any other trigram. It is very likely, that we revealed the two correspondences j=T and d=H yielding T H E j d s q v v q I E E s n q e s a l T H E j d s T H j d q g q T E j q o s U q t e b I y q c w a l E s n c z q g E q n s E s n t c g E s g j T H E j d s E T u y s g J E v s u u E u s z b g w The knowledge of the letters T, H and E reveals words like there, this, than, thus, that, etc. So, let’s look for them. We find jdqg which could be this or thus, however, it could not be that. Why not? Checking the frequency of q shows that it is likely to be one of the most common letters ETNORIA, and since it is a vowel (why?) we may reduce the possible choices for q to O, I or A. I or A are more likely to follow TH and may form the second to last 1-letter-words q. Step 3: We now form possible words of the given letters and test if the found letters make sense in other words. Say, we choose q to be A. What word could the first word qvv be? ALL and ADD are possible, ARE is not. If q would be I then ILL is possible. A seems to be the more reasonable choice. Substituting A for q yields THA_ for jdqg. We know it can not be THAT, therefore, THAN makes more sense. It makes sense that g is N since the g is one of the most frequent letters. Thus, using altogether the correspondences A=q, N=g L=v yields 48 A L L q v v T H E j d s A q I E A E s n q e s a l T H E j d s T H A N j d q g A q T A E j q o s u A q t e b I A y q c w a l E A N s n c z q g A E q n s E s n t N c g E N T s g j T H E j d s E N T u y s g j L E v s u u E N u s z b g w Step 4: Now, words will appear. vsuu looks very much like LESS, then S_ENT in uysgj should be SPENT. A_E in qns seems to be ARE. This is very likely since the encrypted R, the n, appears frequently. We now have: A L L q v v T H E j d S A q I E R A E s n q e s a l T H E j d s T H A N j d q g A q T A E S j q o s u A q t e b I P A y q c w a l E R A N s n c z q g A R E q n s E R s n t N c g E N T s g j T H E j d s S P E N T u y s g j L E S S v s u u S E N u s z b g w Continuing to detect words, we see that A_ERA_E in qlsnqes looks like AVERAGE, and A_ER__AN in qtsnczqg looks very much like AMERICAN. What other words can you find? Try to finish it by yourself. Replacing these letters yields A L L q v v T H E j d s A V E R A G E q I s n q e s a l T H E j d s T H A N j d q g A q T A E S j q o s u P A I y q c w A M E R I C A N q t s n c z q g G V E R M E N T e b I s n t s g j a l A R E q n s I N c g T H E j d s S P E N T u y s g j L E S S v s u u S E C N u s z b g w And we finally have: ALL THE TAXES PAID BY THE AVERAGE AMERICAN ARE SPENT BY THE GOVERNMENT IN LESS THAN A SECOND 49 Resumee: Cracking Random Substitution Ciphers can be accomplished by a combination of finding most frequent letters and trigrams as well as clever guessing and testing missing letters. The more Random Substitution Ciphers you will crack the more experienced you will become. As a side effect: You will be able to solve the “cryptograms” in the newspapers easily. Exercise1: You now have the opportunity to practice the art of cracking such Random Substitution Ciphers. Crack the following cipher text: yxdy pq yjc xzpvpyw ya icqdepzc ayjceq xq yjcw qcc yjcuqcvrcq. xzexjxu vpsdavs Exercise2: Crack the following “Cryptoclip” found in the “Daily News of the Virgin Islands” on March 21, 2001. vkbj v avpq bphvu baj hfj bahjk vy yjxbjxfjq bc bdjxbr rjvpy hx baj fccujp. 50 3.5 An Inventory of the learned Monoalphabetic Substitution Ciphers In this section we will analyze and compare the ciphers that we have studied thus far. They are all Monoalphabetic Substitution Ciphers since each translates same plain letters into the same cipher letter. However, the rule to assign letters varies. For the Caesar, the Multiplication and the Linear Cipher, we are using encoding functions. This functional relationship between plain and cipher letters made these ciphers very susceptible to eavesdroppers. In order to crack the first two Ciphers, it is sufficient to determine the most frequent cipher letter and set it equal to E. The Linear Cipher uses a key pair and thus requires two plain/cipher letter correspondences. On the contrary, the Random Substitution Cipher does not use an encoding function. It simply uses a plain/cipher letter assignment table to encode the original messages. This non-functional encryption allows many more possible unique encryptions and makes it therefore harder to crack. It is not sufficient to simply find two plain/cipher lettercorrespondences. Rather, each cipher/plain letter correspondence has to figured out. Nevertheless, since it is still a Mono-substitution Cipher the underlying letter frequencies were passed over to the cipher text so that we could use frequency analysis to crack the cipher. Let’s summarize these facts in a table: Cipher MonoSubstituti on Caesar Cipher Yes Multiplication Cipher Linear Cipher Yes Yes Random Sub. Yes Cipher Encoding function, (Encoding key is bold) C=P+b MOD M C=a*P MOD M C=a*P+b MOD M Assignment Table Decoding function (Decoding key is bold) P=C-b MOD M P=a-1*C MOD M P= a-1*(C-b) MOD M Assignment Table Cracking Method Number of Unique Encryption s Requires 1 M letter corresp. Requires 1 (M) letter corresp. Requires 2 M * (M) letter corresp. Frequency ? Analysis How many unique Random Substitution Ciphers are there? Let’s figure out the only ? in the table by determining how many unique Random Substitution Ciphers there are. The way of going about it is to start with an alphabet length of M=1, then M=2, etc. In that way can gain insight how an additional letter changes the number of possible unique Ciphers. 51 M=1: A 1-letter alphabet allows 1 possible Cipher: A. M=2: The 2-letters A and B may be encrypted to A A B B B A Surely, encryption using only two symbols is rather difficult. Nevertheless, the Morse Code would work. M=3: The 3-letters A, B and C may be encrypted to: A A A B B C C B B C A C A B C C B C A B A We thus have 2*3=6, possible Random Substitutions. M=4: For a 4-letter alphabet we may place the fourth letter D in one of four positions: before the first, between the first and the second, between the second and the third or after the third letter. Since we can choose among the six above 3-letter combinations, we obtain a total of 4*6=24 possible 4-letter combinations. Since 6 =3*2 we may just write 24 = 4 * 3 * 2 *1. Or simply 4! . M=5: Now guess how many 5-letter combinations you can form? 5! = 5 * 4 * 3 * 2 * 1 = 120 are there. Explain why we had to multiply the 24 possible 4-letter-words by 5 to obtain the number of possible 5-letter words. M=26: Continuing adding letter by letter to the existing words we obtain 26! possible Ciphers for 26 letters: 26! = 26 * 25 * 24 * … * 3 * 2 * 1 which turns out to be a huge number, namely 403291461126605635584000000. In scientific notation this is approximately 4.0 * 1026 which appears on calculators as 4.0 * E+26 Let’s summarize our results: #letters 1 2 3 #words 1 2!=2 3!=6 4 4!=24 5 5!=120 52 6 6!=720 … 26 26! … M M! How can we increase the security of the Random Substitution Cipher? We can increase the security of the Random Substitution Cipher by increasing the number of possible Ciphers. We just learned how: Increasing the alphabet length M increases the number of possible ciphers. In example we could use the 52 upper and lower case plain letter. To avoid any patterns we would assign upper case plain letters not only to upper but also to lower case cipher letters and vice versa. This allows us to choose among 52! possible Ciphers. Check with your windows calculator how many these are. Is it just double of 26! ? 52! = __________________________ In order to increase the security of our cipher, would it help to use a different cipher letter alphabet? I.e. we could encode every plain letter using symbols as follows: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z What does the following Cipher therefore translate to? Exactly. It was the earlier sentence on the taxes spent by the government. Now, does this translation in MS WinWord from the Times New Roman to the Wingdings Font increase the security of the Random Substitution Cipher? Certainly not since each of the 26 plain letters is still encoded with the same Cipher symbol. Therefore, the underlying plain letter frequency is maintained and can be used again on this Cipher to first detect the cipher e, then the cipher word the to eventually crack the entire Cipher. Exercise1: The following Cipher was created with the Font “Marlett”. Decode it. Exercise2: Create your own Cipher using the Help Cyrillic Font. Exercise3: Find out if MS WinWord allows some other simple cipher codes. 53