Cryptography

Cryptography is the study of making and breaking codes for communicating secret information. In this project we will study some old-fashioned methods of making and breaking codes, and we will study a modern method called RSA cryptography, which is used, among other things, to encode your credit card number when you purchase things online.

I. The Spartan Code and Substitution Ciphers

Throughout history there has always been a need to send coded messages. Kings must send coded messages to their generals, generals to their colonels. Spies need to send coded messages to whomever they’re spying for, and so on. One of the first known codes was called the Spartan code. The Spartan government used it 2500 years ago to communicate with its generals in the field. It worked like this. Each general had some type of cylinder, perhaps a cane or a special spear, and all of these cylinders were of the same radius. The King also had a cylinder of the same radius. The King would wrap a ribbon around his cylinder and write horizontally across the wound ribbon. The King might need a few ribbons to write his message since the cylinder had such a small radius. The ribbons would then be unwound and given to a messenger, who might wear them as a belt. Anyone who saw it just thought it was a belt decorated with letters (and not very many people could read back then anyway). When the general received the ribbon, he would wrap it around his cylinder to see what the King had written. The letters would line up to spell out the secret message. For example, maybe the ribbon, unwound, said

T P O E S A H T M G F N E O E R O D W G S E R W O R F E H R R A R K I I D P O W D T C H M O D I R Y T R E N Y C H D N G

but when wound around the cylinder and each row of letters read horizontally, it would read

T H E W O R D C R Y
P T O G R A P H Y C
O M E S F R O M T H
E G R E E K W O R D
S F O R H I D D E N
A N D W R I T I N G

You can try this at home with a ribbon and a paper towel tube.
Wrap the ribbon around the tube and then write horizontally across the tube, one letter on each section of the ribbon. Unwind the ribbon and you can see that it just looks like a string of random letters. Wind it around the tube again and you can see the message again. Hand in your ribbon with your project and I will wrap it around my own paper towel tube to read your secret message.

How secure is the Spartan code? How easily could the Persians (one of Sparta’s favorite enemies) read a message if it was intercepted? First, the Persian would have to be able to read the Spartan language. Then, even if the Persian knew the basic idea behind the code, he would have to know the radius of the cylinder to be able to read it. If he intercepted a message, he could eventually decode it using trial and error with cylinders of differing radii. Once he found the correct radius, he could decode anything the King or generals sent. So the code is not that secure against people who know the method. But the method was a well kept secret and only people who could read Spartan had any chance of cracking it. We would not consider this a very good code today, but it was good enough for the Spartans of 500 BC.

Our next code comes from Julius Caesar from about 80 BC. Caesar used what’s known as a shift cipher to encode messages. We will use capital letters for unencrypted letters and words (called the “cleartext”) and lowercase letters for the letters and words of the secret code (called the “ciphertext”). A shift cipher is one where you just shift the alphabet to the left. For example, if Caesar shifted the alphabet 3 spaces to the left, the cleartext alphabet and the ciphertext alphabet would look like this:

Cleartext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: d e f g h i j k l m n o p q r s t u v w x y z a b c

So “d” would stand for “A”, “e” would stand for “B”, “f” would stand for “C”, and so on.
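A shift cipher is easy to sketch in a few lines of Python. The function names below are made up for illustration, and the default shift of 3 is an assumption read off the table above (A becomes d). Following the convention of this project, cleartext comes out in capitals and ciphertext in lowercase.

```python
import string

UPPER = string.ascii_uppercase  # cleartext alphabet, A-Z
LOWER = string.ascii_lowercase  # ciphertext alphabet, a-z


def shift_encrypt(cleartext, shift=3):
    """Replace each cleartext letter by the lowercase letter `shift` places later."""
    out = []
    for ch in cleartext.upper():
        if ch in UPPER:
            out.append(LOWER[(UPPER.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def shift_decrypt(ciphertext, shift=3):
    """Undo shift_encrypt by moving each letter `shift` places back."""
    out = []
    for ch in ciphertext.lower():
        if ch in LOWER:
            out.append(UPPER[(LOWER.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


print(shift_encrypt("ABC XYZ"))  # def abc
print(shift_decrypt("def abc"))  # ABC XYZ
```

The `% 26` is what makes the alphabet wrap around, so that X, Y, Z turn into a, b, c.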
For example, the soothsayer might communicate with Caesar by sending the following secret message:

ehzduh wkh lghv ri pdufk

A shift cipher is not particularly secure. If a message were intercepted, the interceptor would just have to try 25 different shifts. Of course, this assumes that the interceptor knows the cipher is a shift cipher. From what he can tell, it could be any sort of substitution cipher, where we substitute each cleartext letter by a randomly assigned ciphertext letter. For example

Cleartext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: o c x s f d h m n g u w i p r e b z v k q l y t j a

If you know the ciphertext alphabet, it is easy to decode a message. For example, decode the following message using the substitution cipher above:

n wrlf iokm

But if you don’t know the ciphertext alphabet, then it is a lot harder to decode messages. There are 26! = 403291461126605635584000000 substitution ciphers (including the 25 shift ciphers), so trial and error won’t work here. But if you know what language is being used (for example, English, Latin, Arabic, etc.), you can use what you know about that language to help you. For example, by far the most commonly used letter in the English language is the letter “e”, and the most common pair of letters is “th”, so you might try to decode the message by first figuring out which ciphertext letter appears most often and assuming that’s an “e”. Here is a table showing the relative frequency of each letter in the English language.

Letter:         a     b     c     d     e     f     g     h     i     j     k     l     m
Frequency (%):  8.2   1.5   2.3   4.3   12.7  2.2   2.0   6.1   7.0   0.2   0.8   4.0   2.4

Letter:         n     o     p     q     r     s     t     u     v     w     x     y     z
Frequency (%):  6.7   7.5   1.9   0.1   6.0   6.3   9.1   2.8   1.0   2.4   0.2   2.0   0.1

So, for example, in a random selection of English text, “a” occurs 8.2% of the time and “e” occurs 12.7% of the time. Substitution ciphers are what’s used in the newspaper cryptoquotes. Let’s try cracking a few cryptoquotes. We’ll use a computer program to make the guessing process involved less tedious.
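The counting step of frequency analysis is simple enough to sketch yourself. The Python below is only an illustration (the function name `letter_frequencies` is made up, and it is not the program used on the website): it counts how often each letter appears in a ciphertext and lists the letters from most to least frequent.

```python
from collections import Counter


def letter_frequencies(text):
    """Return (letter, percent) pairs, most frequent first, ignoring non-letters."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = len(letters)
    if total == 0:
        return []
    pairs = [(letter, 100.0 * n / total) for letter, n in counts.items()]
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


sample = "ehzduh wkh lghv ri pdufk"
for letter, pct in letter_frequencies(sample)[:3]:
    print(letter, round(pct, 1))
```

In the soothsayer's message the most common ciphertext letter is “h”, which makes it a natural first guess for “E”. Short messages can easily mislead, so treat the counts as hints rather than answers.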
There are several encrypted jokes and mathematical quotes, each encrypted using a different substitution cipher, at the website

http://www.nebrwesleyan.edu/people/kpfabe/AGAM

To go to the first joke, go to

http://www.nebrwesleyan.edu/people/kpfabe/AGAM/joke1.txt

Open another window in your browser and go to

http://cryptoclub.math.uic.edu

Click on Ciphers and then on Frequency Analysis (for substitution ciphers). Now copy and paste the encrypted joke from the first window into the box on the Frequency Analysis page. Click on the Frequency Analysis button and you will get a table which shows you the frequency of each letter in the encoded joke. Then you can start guessing plaintext letters. Click on the Try it! button to make the substitutions. You can make a few substitutions at a time until you’ve cracked the code. In all there are 10 jokes and 32 quotes. The quotes are definitely harder than the jokes. Play around a bit with some of them. You could also enter a cryptoquote from the newspaper into the box on the Frequency Analysis page.

II. The Math You Need for RSA Cryptography

The goal of the rest of this project is to teach you how to encode and decode messages using something called RSA cryptography. RSA cryptography uses modular arithmetic, prime numbers, greatest common divisors, other number bases, and related tools to code and decode secret messages. When you type your credit card number on Amazon (or any other website), your credit card number is encrypted using RSA cryptography. RSA cryptography is named after the three people who developed it, Ronald Rivest, Adi Shamir, and Leonard Adleman. We will first study each of the mathematical tools used in RSA cryptography.

IIa. Modular Arithmetic

You are already familiar with a special case of modular arithmetic, which we will call “clock arithmetic”.

Question: If you start working at 10:00 and work 4 hours, what time is it when you stop working?
Answer: 2:00

To answer this, you probably split up your hours into 2 hours before 12:00 and 2 hours after. So you went to 12:00, pretended that was 0:00, and added 2 more hours. We might say that 10+4 (mod 12) = 14 (mod 12) = 2.

What if we had an 8 hour clock instead of a 12 hour clock? Well, then we couldn’t start at 10:00. We’d have to start at something less than 8:00.

Question: If you start working at 5:00 and work for 7 hours, what time is it when you stop working?

Answer: 4:00

How did you do this problem? You probably figured that 5:00 to 8:00 is three hours and then 4 more hours would be 4:00. Again, you essentially treated 8:00 as 0:00 and added on 4 more hours. We say that 5+7 (mod 8) = 12 (mod 8) = 4. How about this problem:

Question: Still using the 8 hour clock, what if you started working at 5:00 and worked for 12 hours? When would you stop working?

Answer: 1:00

How did you do this last problem? You probably figured that 5:00 to 8:00 is 3 hours and then you needed 9 more hours. But your clock only goes til 8:00, so once around is 8 hours. Therefore you need to go 8 hours plus one more hour, putting you at 1:00. Again, we might say that 5+12 (mod 8) = 17 (mod 8) = 1.

Clock arithmetic is also called modular arithmetic. With a 12 hour clock we are working mod 12. With an 8 hour clock we are working mod 8. Here are some slightly different questions about mod 8 and mod 12.

Question: What is 20 (mod 8)?

If we start at the top of the clock and go around the clock twice, we get 16 hours. What’s left is 4 hours, so 20 (mod 8) = 4. Another way of thinking about it is to ask yourself how many times 8 goes into 20 and what is the remainder. 8 goes into 20 twice with remainder 4.

Question: What is 38 (mod 12)?

12 goes into 38 3 times with 2 left over, so 38 (mod 12) = 2.

Notice that, in general, b (mod n) is the remainder when you divide b by n.

Theorem 1: If b (mod n) = a then there is an integer k such that b-nk = a.

a is the remainder when you divide b by n.
k is the number of times n goes into b. We will use this characterization of b (mod n) later.

You can use negative numbers in modular arithmetic, too. For example

Question: What is -17 (mod 5)?

Answer: -2, since -17 = 5x(-3) + (-2). If you prefer an answer that is not negative, add 5 to get 3, which is the same thing mod 5.

We will have to use mods of very large numbers for RSA cryptography.

Question: What is 85,561 (mod 1998)?

To do this, you’ll need your calculator. First divide 85,561 by 1998. You will get 42.82332332, so 1998 goes into 85,561 42 times with some left over. To find the remainder, multiply 42x1998 and subtract that from 85,561. The answer you get must be less than 1998 (otherwise 1998 would be able to go into 85,561 at least one more time). In this case the answer is 1645. So 85,561 (mod 1998) = 1645.

Here are some exercises for you to try.

Exercises:
1. 8523 (mod 46)
2. 463 (mod 84)
3. 9201 (mod 27)
4. 3331 (mod 2)
5. 8901 (mod 23,456)
6. -7 (mod 23)
7. 48 (mod 12)
8. 49 (mod 7)

IIb. More Modular Arithmetic

What if we want to do addition mod 4? Consider the two modular equations 22 (mod 4) = 2 and 5 (mod 4) = 1. Notice that (22+5) (mod 4) = 27 (mod 4) = 3. And also 2+1 (mod 4) = 3. So we get the same answer whether we add 22+5 and then take the mod or just add the mods together, 2+1. What if we want to multiply 22x5 (mod 4)? We could first multiply 22 and 5 and then take mod 4 of the answer: (22x5) (mod 4) = 110 (mod 4) = 2, or we can multiply the mods, 2x1 (mod 4) = 2. We can make this into a theorem:

Theorem 2: If b (mod n) = a and d (mod n) = c then a+c = b+d (mod n) and ac = bd (mod n).

Proof: Since b (mod n) = a, Theorem 1 tells us that b-nk = a for some integer k, and since d (mod n) = c, d-nj = c for some integer j. Now finish the proof of the first part by adding a+c. To prove the second part of the theorem, consider

ac = (b-nk)(d-nj) = bd - bnj - dnk + n^2kj = n(nkj - bj - dk) + bd.

Now nkj-bj-dk is an integer, so let l = nkj-bj-dk. Then ac = bd + nl, so ac and bd differ by a multiple of n, and therefore bd (mod n) = ac (mod n).

IIc. Fast Exponentiation

Now we want to do exponentiation (mod n). When we get to RSA cryptography, we will want to exponentiate very large numbers with very large exponents and we will want to use very large mods. For example, we might want to compute 9881^39423 (mod 46,927). This is obviously impossible to do directly by hand or by calculator. There is a method, however, called Fast Exponentiation. To see how it works, let’s use an easier problem.

Suppose we want to find 5^47 (mod 21). Normally you would think of multiplying 5 by itself 47 times and then taking (mod 21) of the answer. But remember that we want a method to use for large numbers (and 5^47 is already too big for your calculator). We wouldn’t want to have to multiply 9881 by itself 39423 times.

For our new method of exponentiating, we first find 5^x (mod 21) where x is a power of 2. We will do this for the powers of 2 up to 2^n, where 2^n < 47 < 2^(n+1) (so here n is 5). We’ll see why we choose n like this soon. We get

5^2 (mod 21) = 25 (mod 21) = 4
5^4 (mod 21) = (5^2)^2 (mod 21) = 4^2 (mod 21) = 16

In this last step we used that 5^4 = (5^2)^2 and then substituted in our previous answer for 5^2 (mod 21). Continuing, we get

5^8 (mod 21) = (5^4)^2 (mod 21) = 16^2 (mod 21) = 256 (mod 21) = 4
5^16 (mod 21) = (5^8)^2 (mod 21) = 4^2 (mod 21) = 16
5^32 (mod 21) = (5^16)^2 (mod 21) = 16^2 (mod 21) = 256 (mod 21) = 4

Now we stop finding powers of 2 because 64 > 47, and so 5^64 > 5^47 and will not appear in the breakdown of 5^47. Now we break down the exponent, 47, into powers of 2. 47 = 32+8+4+2+1, so

5^47 = 5^(32+8+4+2+1) = 5^32 x 5^8 x 5^4 x 5^2 x 5^1
5^47 (mod 21) = 4 x 4 x 16 x 4 x 5 (mod 21) = 5120 (mod 21)

(now you can see why we didn’t bother computing 5^64; in the breakdown of 47 into powers of 2, 64 would not come into play since 64 > 47). We have now broken our problem down into something we can do on our calculator. Since 5120/21 = 243.8 and 243x21 = 5103, we know that 5120 (mod 21) = 5120-5103 = 17, and so

5^47 (mod 21) = 17

Let’s try another one, 7^23 (mod 17).
This time we will exponentiate using powers of 2 up to 2^4, since 2^4 < 23 < 2^5.

7^2 (mod 17) = 49 (mod 17) = 15
7^4 (mod 17) = (7^2)^2 (mod 17) = 15^2 (mod 17) = 225 (mod 17) = 4
7^8 (mod 17) = (7^4)^2 (mod 17) = 4^2 (mod 17) = 16
7^16 (mod 17) = (7^8)^2 (mod 17) = 16^2 (mod 17) = 256 (mod 17) = 1

Now we have 23 = 16+4+2+1, so

7^23 (mod 17) = 7^(16+4+2+1) (mod 17) = 7^16 x 7^4 x 7^2 x 7 (mod 17) = 1 x 4 x 15 x 7 (mod 17) = 420 (mod 17) = 12

Exercises:
1. 25x36 (mod 4) (use your theorems)
2. 1753247x235711131719 (mod 5)
3. 117 (mod 21)
4. 342 (mod 5)
5. 3753 (mod 15)
6. 742 (mod 18)

IId. Solving Modular Equations

Consider the following modular equations:

2x=1 (mod 5)
3x=1 (mod 6)
5x=1 (mod 8)
4x=1 (mod 10)
3x=1 (mod 11)
3x=7 (mod 11)
5x=1 (mod 7)
5x=4 (mod 7)
5x=1 (mod 18)
5x=7 (mod 18)

Examine these equations. Which have solutions? Since they are modular problems, we only have to check values of x between 0 and m-1 (where m is the mod). For example, we can see that the second problem has no solution since

(3)(0) = 0 (mod 6)
(3)(1) = 3 (mod 6)
(3)(2) = 6 (mod 6) = 0 (mod 6)
(3)(3) = 9 (mod 6) = 3 (mod 6)
(3)(4) = 12 (mod 6) = 0 (mod 6)
(3)(5) = 15 (mod 6) = 3 (mod 6)

So there is no x that makes 3x = 1 (mod 6). Check the others and see if they have solutions.

We see that some of these equations have solutions and some don’t. To be able to tell which have solutions and which don’t, without having to try every number between 0 and m-1, we have to understand the greatest common divisor, or gcd, of two positive integers. The gcd(j,k), where j and k are two positive integers, is the largest positive integer which divides both j and k. For example gcd(6,8)=2, gcd(15,25)=5, and gcd(15,77)=1. Two positive integers, j and k, are said to be relatively prime to one another if gcd(j,k)=1; that is, they have no common divisors except 1. For example, 15 and 77 are relatively prime and 6 and 8 are not relatively prime.
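If you would rather let a computer do the checking, the trial-and-error search described above can be sketched in Python. The function name `solve_by_trial` is made up for this illustration, and `math.gcd` is Python's built-in gcd function. The program tries every x from 0 to m-1 for each equation in the list and prints the gcd next to the solutions it finds.

```python
from math import gcd


def solve_by_trial(a, b, m):
    """Return every x in 0..m-1 with a*x = b (mod m), found by trying them all."""
    return [x for x in range(m) if (a * x - b) % m == 0]


# The ten equations ax = b (mod m) from the list above, as (a, b, m).
equations = [(2, 1, 5), (3, 1, 6), (5, 1, 8), (4, 1, 10), (3, 1, 11),
             (3, 7, 11), (5, 1, 7), (5, 4, 7), (5, 1, 18), (5, 7, 18)]

for a, b, m in equations:
    print(f"{a}x = {b} (mod {m}): gcd({a},{m}) = {gcd(a, m)}, "
          f"solutions {solve_by_trial(a, b, m)}")
```

Compare the gcd column with the solutions column in the output and see whether you notice anything. Trying every x is fine for small mods like these, but it would be hopeless for the very large mods used in RSA.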
Looking back at our list of equations above, we can spot a pattern: There is a solution x to ax=1 (mod m) if and only if gcd(a,m)=1. This is part of Theorem 3.

Theorem 3: Given a and m, there is a solution, x, to ax=1 (mod m) if and only if gcd(a,m)=1. If there is a c such that ac=1 (mod m), then the solution to the equation ax=b (mod m) is x=bc (mod m).

The second part of the theorem tells us how to solve 3x=7 (mod 11) by first solving 3x=1 (mod 11). We know that 3x=1 (mod 11) has a solution because gcd(3,11)=1. We found this solution to be x=4. So the solution to 3x=7 (mod 11) must be (4)(7) = 28 (mod 11) = 6.

IIe. Finding GCD’s

How does one find the gcd of two numbers? One way is to list all the divisors for each number and find the largest number that appears in both lists. But as mentioned before, we will be working with large numbers once we finally get to RSA cryptography. How does one find the gcd of two large numbers? Let’s start with trying to find gcd(169,55). It’s fairly easy to do this by hand as neither of these numbers has very many divisors, but we will use it as an example of a new method of finding gcd’s called the Euclidean Algorithm. Before we explain the algorithm there is a very important fact that we must be aware of.

Very Important Fact: If a+b = c, then any number that divides both a and b also divides a+b and so also divides c. Similarly, if a number divides b and c, it must also divide a, because a = c-b and anything that divides b and c also divides c-b. Similarly, anything that divides a and c must also divide b. We call this the 2 implies 3 fact.

Notice that, by dividing 169 by 55, we find that

169 = 3x55 + 4

So

4 = 169 - 3x55

This second equation shows us that anything that divides both 169 and 55 also divides 4, including the greatest common divisor of 169 and 55 (because it is, obviously, a divisor of both 169 and 55).
The set of common divisors of 169 and 55 is therefore the same as the set of common divisors of 55 and 4 (and also of 169 and 4). Therefore, gcd(169,55) = gcd(55,4). Now divide 55 by 4 to get

55 = 13x4 + 3
3 = 55 - 13x4

So if something divides both 55 and 4 it must also divide 3, so gcd(55,4) = gcd(4,3). Pretend that you don’t know gcd(4,3) and continue dividing:

4 = 1x3 + 1
1 = 4 - 1x3

So if something divides both 4 and 3, it must also divide 1, so gcd(4,3) = gcd(3,1). The greatest common divisor of anything and 1 is 1, so gcd(169,55) = gcd(55,4) = gcd(4,3) = gcd(3,1) = 1. Therefore 169 and 55 are relatively prime.

Let’s try another one. Find gcd(556,224).

556 = 2x224 + 108
108 = 556 - 2x224
So gcd(556,224) = gcd(224,108)

224 = 2x108 + 8
8 = 224 - 2x108
So gcd(224,108) = gcd(108,8)

108 = 13x8 + 4
4 = 108 - 13x8
So gcd(108,8) = gcd(8,4)

8 = 2x4 + 0

When you hit 0 or 1 you are done. If you hit 0, go back to the previous line: the remainder there is the gcd. So here gcd(8,4) = 4. So gcd(556,224) = gcd(224,108) = gcd(108,8) = gcd(8,4) = 4. If you get down to 1 then the two numbers you started with are relatively prime.

We can compress our calculations a bit by leaving out the second equation in each pair, like this:

556 = 2 x 224 + 108
224 = 2 x 108 + 8
108 = 13 x 8 + 4
8 = 2 x 4 + 0

IIf. Solving Modular Equations Part 2

If we use the Euclidean Algorithm to go backwards when gcd(a,m)=1, we can solve the equation ax=1 (mod m). Let’s see how this works. Let’s solve the equation 169x = 1 (mod 55). We will go back to our first problem, where we found that gcd(169,55)=1, and work backwards. Recall we had

1 = 4 - 1x3
Now substitute 3 = 55 - 13x4
1 = 4 - (55 - 13x4)
1 = 14x4 - 55
Now substitute 4 = 169 - 3x55 and continue in this manner until you get 1 = ax169 + bx55 (a and b can be positive or negative).
1 = 14x(169 - 3x55) - 55
1 = 14x169 - 14x3x55 - 55
1 = 14x169 - 42x55 - 55
1 = 14x169 - 43x55

So 169x14 = 1 + 43x55 = 1 (mod 55). So the solution to 169x=1 (mod 55) is x=14.
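The whole procedure can be sketched in Python. This is a minimal sketch, not the only way to do it: the names `extended_gcd` and `mod_inverse` are made up for illustration, and instead of substituting backwards at the end, the program keeps track of the two coefficients while it divides. That is a different bookkeeping of the same divisions, and it arrives at the same kind of equation, 1 = ux169 + vx55.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and a*u + b*v = g."""
    old_r, r = a, b   # remainders, as in the Euclidean Algorithm
    old_u, u = 1, 0   # coefficients of a
    old_v, v = 0, 1   # coefficients of b
    while r != 0:
        q = old_r // r  # how many times r goes into old_r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def mod_inverse(a, m):
    """Return the smallest nonnegative x with a*x = 1 (mod m)."""
    g, u, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError(f"gcd({a},{m}) = {g}, so there is no solution")
    return u % m  # add or subtract m until the answer is between 0 and m-1


print(extended_gcd(169, 55))  # (1, 14, -43)
print(mod_inverse(169, 55))   # 14
```

The first line of output says that 1 = 14x169 - 43x55, which matches the equation we reached by hand.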
Notice that we have also solved 55x = 1 (mod 169), since -55x43 = 1 - 14x169 = 1 (mod 169). The solution to 55x=1 (mod 169) is x=-43. If you don’t want a negative answer then add 169 to your answer, enough times so that your answer is no longer negative. In this case we just need to add 169 once to get a positive answer: x = -43 + 169 = 126. Notice that these two problems would be very hard to solve by trial and error. The process of finding the gcd and then going backwards is called the Extended Euclidean Algorithm and we can use it to solve equations like 55x=1 (mod 169).

Let’s do another example: Solve 61x=1 (mod 126). First we use the Euclidean Algorithm to find gcd(126,61).

126 = 2 x 61 + 4
61 = 15 x 4 + 1
4 = 4 x 1 + 0

So gcd(126,61)=1 and, working backwards, we get

1 = 61 - 15x4
Substitute 4 = 126 - 2x61
1 = 61 - 15x(126 - 2x61)
1 = 61 - 15x126 + 30x61
1 = 31x61 - 15x126

So 61x31 = 1 + 15x126 = 1 (mod 126), so the solution of 61x=1 (mod 126) is x=31. Notice that you have to be very careful when going backwards, substituting the right numbers with the goal of getting 1 = au + bv where we start with gcd(a,b).

Let’s review how to solve ax=b (mod m).

1. Check that gcd(a,m)=1; otherwise ax=1 (mod m) has no solution and this method does not apply. If the numbers are small enough, factor each one and find the greatest common divisor. Otherwise use the Euclidean Algorithm.
2. Assuming that gcd(a,m)=1, find integers u and v such that au+mv=1. You may just be able to do this by looking at the equation; for example, 2u + 7v = 1 has solutions u=-3 and v=1. Otherwise use the Extended Euclidean Algorithm to go backwards from finding gcd(a,m).
3. Now rearrange the equation so that au = 1-mv. Then we have au=1 (mod m). Make your u positive if it’s not already. Things are just easier that way.
4. Now use Theorem 3: the solution to ax=b (mod m) is x=bu (mod m).

Exercises:
1. Find the smallest positive integer x, so that 243 x = 1 (mod 40)
2. Find the smallest positive integer x, so that 40 x = 1 (mod 243)
3. Find the smallest positive integer x, so that 61 x = 7 (mod 126)
4. Find the smallest positive integer x, so that 562 x = 8 (mod 57)
5. Find three integer solutions of 53 x = 4 (mod 169)

IIg. Powers Mod m

Following are tables of powers mod m for various m. Once the pattern becomes apparent, the line in the table is ended. For example, once a zero appears, we know the rest of that line will be zeroes. Once a one appears, we know that the line will repeat itself. For example 2^1 (mod 5) = 2, 2^2 (mod 5) = 4, 2^3 (mod 5) = 3, 2^4 (mod 5) = 1, 2^5 (mod 5) = 2, and as we take successive powers of 2 (mod 5) we will get 4, 3, 1, 2, 4, 3, 1, 2, …

A question we will need to answer in order to do RSA cryptography is: under what conditions, given a and m, is there a positive integer k such that a^k=1 (mod m)? Looking at the table of powers mod m, we see, for example, that there is no such k for a=2 and m=4. This is because 2^2 = 0 (mod 4) and so 2^k = 0 (mod 4) for all k >= 2. On the other hand, there is a k when a=1, 2, 3, or 4 and m=5. For a=1 or 4, k=2 works, and when a=2 or 3, k=4 works. Looking even more closely at the table, it seems reasonable to say

Theorem 4: If gcd(a,m)=1, then there is a positive integer k such that a^k=1 (mod m).

This leads us to another question: Given m, is there one value of k such that a^k = 1 (mod m) for all a with gcd(a,m)=1? Let’s look at the table. When m=5, k=4 will work for all values of a with gcd(a,m)=1. For example, 1^4=1 (mod 5), 2^4=1 (mod 5), 3^4=1 (mod 5), 4^4=1 (mod 5). When m=6, k=2 works; for example, 1^2=1 (mod 6) and 5^2=1 (mod 6). When m=7, k=6 works (verify this). Restricting ourselves to prime values of m, we can spot a pattern: It looks like when m is prime, a^(m-1)=1 (mod m) for all a with gcd(a,m)=1. In fact, this is a theorem called Fermat’s Little Theorem.

Theorem 5 (Fermat’s Little Theorem): If p is prime, then for any integer a with gcd(a,p)=1, we have a^(p-1)=1 (mod p).

Proof: Let p be a prime and let a be any integer with gcd(a,p)=1.
Assume 1 <= a <= p-1. First we will show that a, 2a, 3a, …, (p-1)a are all distinct (mod p). That is, there is no ka and ja such that ka=ja (mod p) (assuming k and j are distinct, positive, and less than p). Let’s assume there are such values j and k. This kind of proof is called proof by contradiction, where we prove something by assuming the opposite is true and then showing there is a contradiction. So suppose that aj=ak (mod p). By definition, we know that p divides aj-ak=a(j-k). So, since p is prime, either p divides a or p divides j-k. But p can’t divide a because 1 <= a < p, so it must divide j-k, which means that j=k (mod p). But j and k are both between 1 and p-1, and so this means j=k, which contradicts our assumption that j and k are distinct. So now we know that a, 2a, 3a, …, (p-1)a are all distinct mod p. But there are p-1 of these things, so we know that, as a set,

{a (mod p), 2a (mod p), …, (p-1)a (mod p)} = {1, 2, 3, …, p-1}

(this is not to say that a (mod p) = 1, just that a (mod p) is in the set {1, 2, 3, …, p-1}, which must be true because the set {a (mod p), 2a (mod p), 3a (mod p), …, (p-1)a (mod p)} has p-1 distinct elements, all less than p and none equal to 0). If we multiply all the elements in each of the two sets together, we get

(a)(2a)(3a)…((p-1)a) = (1)(2)(3)…(p-1) (mod p).

Finish this proof to show that a^(p-1)=1 (mod p).

What if p is not prime? Is there a value of k such that a^k=1 (mod p) for all a with gcd(a,p)=1? To answer this, we need a new concept called the Euler phi-function (phi is pronounced “fee”). Here is the definition of the Euler phi-function:

Definition: Given a positive integer n, phi(n) = the number of positive integers less than n and relatively prime to n.

For example, phi(5) = 4 since 1, 2, 3, and 4 are all relatively prime to 5 and less than 5. phi(6) = 2, since the only numbers less than 6 that are relatively prime to 6 are 1 and 5.

Exercises:
1. Compute phi(15)
2. Compute phi(21)
3. Suppose p is prime, what is phi(p)?

Theorem 6 (Euler’s Theorem): If gcd(a,m)=1, then a^phi(m) = 1 (mod m).

Let’s try this. Let m=6.
Then gcd(5,6)=1 and 5^phi(6) = 5^2 = 25 = 1 (mod 6). Also gcd(1,6)=1 and 1^phi(6) = 1^2 = 1 (mod 6). Let’s try another one. Let m=8. Then gcd(5,8)=1 and 5^phi(8) = 5^4 = 625 = 1 (mod 8). Also gcd(7,8)=1 and 7^phi(8) = 7^4 = 2401 = 1 (mod 8).

Exercises:
1. Let m=9. Find all integers x such that 0 < x < 9 with gcd(x,9)=1. Compute x^phi(m) (mod m) for all these values of x.

We can compute phi(m) by simply listing all the positive integers less than m and crossing out the ones that share factors with m. This is probably what you did when you computed phi(15) and phi(21). But remember that when we get to RSA cryptography we will be working with large numbers and this process would be nearly impossible. We are in luck, however, because in RSA cryptography the m that we choose will be the product of 2 large primes, p and q, and we have a theorem that tells us what phi(pq) is.

Theorem 7: If p and q are distinct prime numbers (i.e. p is not equal to q) and m=pq, then phi(m)=phi(pq)=(p-1)(q-1).

Let’s try a couple of examples. Let m=6. We know that phi(6)=2. We also know that 6 is the product of two prime numbers, 2 and 3. Using the theorem, phi(6)=phi(3x2)=2x1=2. So the theorem holds in this case. Let’s try another one. Let m=15. We know that phi(15)=8. We also know that 15 is the product of two primes, 3 and 5. From the theorem, phi(15)=phi(5x3)=4x2=8. The theorem holds in this example, too. We won’t prove this theorem because this project is already long, but the proof is not that hard.

IIh. Numbers in Other Bases

When we first learned about place value in elementary school, we learned that we could write 364 like this:

364 = 3x100 + 6x10 + 4

and later, when we learned about exponents, we could write this as

364 = 3x10^2 + 6x10^1 + 4x10^0

In our number system, we write integers and rational numbers in place-value notation, where each place indicates a different power of 10. The Babylonians, of more than 3000 years ago, invented place-value notation.
The Babylonians shared their discovery with the Hindus of more than 2600 years ago. The Hindus transmitted it to the Arabs by 600 AD and the Arabs shared it with the Europeans around 1200 AD. The choice of 10 as the base is no accident. Look at your fingers! Ten is not the only base to have been used. The Babylonians sometimes used base 60. The Mayans used base 20 (also no accident: look at your toes, too!). Today bases 2, 8 and 16 are used by computers.

Let’s look at base 8. In base 8, we use the digits 0, 1, 2, 3, 4, 5, 6, and 7 (notice that in base 10 we use the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9). If we want to notate that a number is in base 8 we use a subscript, written here as 423_8, which is read “four two three base eight”. The place values in base 8 are 8^0, 8^1, 8^2, 8^3, 8^4, etc. So we have the 1’s place all the way to the right, then the 8’s place, then the 64’s place, then the 512’s place, then the 4096’s place and so on. If we write 423_8 out in terms of its place values we would have

423_8 = 4x8^2 + 2x8^1 + 3x8^0

This helps in converting the base 8 number to a base 10 number.

423_8 = 4x64 + 2x8 + 3 = 256 + 16 + 3 = 275

What if we want to convert a number from base 10 to base 8? Let’s convert the number 7492 to a base 8 number. It looks like the first base 8 place value we will use is the 4096’s place (the next highest place value is 32768, which we get by multiplying 4096x8, and this is bigger than 7492, so there are no 32768’s in 7492, but there is a 4096 in 7492). How many 4096’s are in 7492? Just one. So 7492 will have a 1 in the 4096’s place. How much is left of my 7492 after I have taken out 1 4096? 7492-4096 = 3396. This is bigger than 512, so there will be some 512’s. Divide 3396 by 512 to see how many 512’s there are. 3396/512 = 6.6328125, so there are 6 512’s. How much is left after you take out the 6 512’s? Multiply 512 by 6 and subtract that from 3396: 3396 - 512x6 = 324. 324 is bigger than 64 so it will have some 64’s in it. Again, divide 324 by 64: 324/64 = 5.0625.
So there will be 5 64’s in 324. What’s left is 324 - 64x5 = 4. 4 is smaller than 8, so there will be nothing in the 8’s place and 4 left in the 1’s place. So we have that 7492 has 1 4096, 6 512’s, 5 64’s, 0 8’s and 4 1’s, so

7492 = 16504_8

Here’s a fun little exercise. Convert the largest 3 digit base 8 number to base 10. First, what is the largest 3 digit base 8 number? It’s 777_8. Now

777_8 = 7x64 + 7x8 + 7x1 = 448 + 56 + 7 = 511

Notice that 511 is one less than the next place value in base 8, which is 512. Surprising? Not really. Think about the largest 3 digit number in base 10. It is 999, which is one less than 1000.

What about bases bigger than 10? They pose the question of what notation to use for digits larger than 9. For example, let’s think about the hexadecimal, or base 16, digits. We need digits for 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. But we can’t use a number with two digits to describe a base 16 digit. Instead, once we get past 9, we use letters. We use A for 10, B for 11, C for 12, and so on til we get to F for 15. Let’s convert the number C5_16 to base 10:

C5_16 = Cx16 + 5 = 12x16 + 5 = 192 + 5 = 197

If you’ve built a web page using HTML, you may have used the hexadecimal system to set your background color on the web page. Background colors are coded in 6 digit numbers, where the first two digits (numbers or letters) are the hexadecimal number for the amount of red in the color, the next two digits are the hexadecimal number for the amount of green in the color, and the last two digits are the hexadecimal number for how much blue is in the color. A color code FF119A will set the background to have as much red as possible, not much green, and a moderate amount of blue.

Exercises:
1. Convert 304_8 to base 10.
2. Convert FFD_16 to base 10.
3. Convert 1011_2 to base 10 (base 2 is a computer’s favorite base; base 2 numbers are also called binary numbers).
4. Convert 4133_6 to base 10.
5. Convert 851 to base 8.
6. Convert 1389 to hexadecimal.
7. Convert 322 to binary.
8. Convert 677 to base 9.

III. Public Key Cryptography and the RSA System

Suppose your best friend moves to California and you want to communicate with her via email. If you want to be sure that no one can intercept and read your messages, you’ll need to encode them in some way. If you decide to use a substitution cipher, you must first decide on the specific key, i.e. what substitutions you’ll make, and you need to share this key with your friend, otherwise she won’t be able to read your emails. But if someone intercepts your key, they will be able to read all your emails.

For a more realistic scenario, consider purchasing things on Amazon. When you type your credit card number into the purchase form, it gets sent through cyberspace to Amazon. This message could be intercepted, so you certainly want it encoded in a way so that no one but Amazon can decode it. You don’t want to meet with a representative of Amazon to decide on your encoding key. And Amazon has literally millions of customers; they can’t keep millions of different keys.

The main problem is that substitution ciphers, besides being easy to crack, are symmetric. That means that anyone who knows how to encode a message also knows how to decode it. We call this a private key system, and the effectiveness of the cipher depends upon the secrecy of the key.

Wouldn’t it be nice if there was a way to send a message using a system where, even if someone else finds out how to encode a message using your system, they can’t decode messages? They can send you a message using your system, but they can’t read a message that someone else sends you because they can’t decode messages sent in your system. This is the idea behind Public Key Cryptography. A public key cryptographic system actually has two keys, one public and one private. In order to encode a message, all one needs to know is the public key, but to decode the message you need the private key, too. Think of having a box with a combination lock.
Someone can put a message in your box and lock it, but only you know the combination for the lock and so only you can open the box and read the message.

To communicate with your friend in California, you would each design your own set of keys, one public and one private. You would tell each other your public keys, but keep your private keys to yourself. Then to send a message to your friend, you would encode it using her public key. There would be no reason for you to know her private key; only she needs to know it. So there’s no need to exchange private keys, nobody can ever intercept them, and no one else can read the messages you send.

Wouldn’t it be nice to have a system like this? Well there is one! More than one actually, but the one we will study is called the RSA system. Remember it was named after its inventors, Ronald Rivest, Adi Shamir, and Leonard Adleman, who invented it in 1977 and who were very clever. Amazon uses the RSA system to encode and decode credit card numbers. Everyone uses Amazon’s public key (they only have one), but no one except Amazon knows their private key, so no one else can decode encoded credit card numbers.

Throughout this project, we have learned all the math we need to know to use the RSA system. In the RSA system, the public key is two integers, n and e, and the private key is a 3rd integer d. In practice, these integers need to be very large; typically n will have at least 300 digits. Of course, n, e, and d need to be chosen in a special way to make the system work. In particular, n will be the product of two distinct large primes, and d and e will be integers less than n with de = 1 (mod phi(n)). The private key, d, is secret because, given only e and n, in order to compute d you would have to factor n, which is computationally infeasible for very large n: no known method can do it in a reasonable amount of time. How would you figure out the two primes whose product is n?

In the RSA system each user sets up his or her own public and private keys.
The public keys are very public; they are published in some location so that everyone who wants them can find them. The private keys are very private. Only the owner of the private key knows the private key, and if someone else finds it out, the system is no longer any good because that person could then decode messages sent to the owner of that private key.

Let’s imagine two friends, Alice and Bob, who use the RSA system to send messages back and forth. We’ll describe how Alice would choose her keys and then we’ll see how Bob would send her an encoded message and how she would decode it.

Pick primes: First Alice would pick two large primes, p and q, each having about 150 digits. For our purposes, we’ll use smaller primes, say p=281 and q=167.

Calculate n and phi(n): Next Alice sets n=pq. Since Alice knows the factorization of n, she can compute phi(n)=(p-1)(q-1) (recall Theorem 7). Since, in a real scenario, n would have about 300 digits, it would be very difficult to compute phi(n) without knowing the factorization of n and, since n is so large, it would be infeasible to factor. In our example, we have n=281x167=46,927 and phi(n)=280x166=46,480.

Choose e (the encoding exponent): The next step is to pick a value e, less than n, at random such that gcd(e,phi(n)) = 1. Alice does this by first selecting a value for e and then using the Euclidean Algorithm to see if gcd(e,phi(n))=1. If not, she tries another e. We’ll choose e=39,423. Let’s check that gcd(39,423, 46,480) = 1:

46480 = 1x39423 + 7057
39423 = 5x7057 + 4138
7057 = 1x4138 + 2919
4138 = 1x2919 + 1219
2919 = 2x1219 + 481
1219 = 2x481 + 257
481 = 1x257 + 224
257 = 1x224 + 33
224 = 6x33 + 26
33 = 1x26 + 7
26 = 3x7 + 5
7 = 1x5 + 2
5 = 2x2 + 1

So gcd(39,423, 46,480) = 1.

Find d (the decoding exponent): Now Alice needs to find a value of d so that de = 1 (mod phi(n)), or, in our case, dx39,423 = 1 (mod 46,480). She does this by going backwards on the Euclidean Algorithm.
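The gcd check and the search for d can also be done by machine. The following is a minimal Python sketch of the extended Euclidean Algorithm applied to Alice’s small example; the function names extended_gcd and decoding_exponent are illustrative choices, not something defined in this project.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Each pass is one division step of the Euclidean Algorithm,
        # with the coefficients s and t carried along.
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def decoding_exponent(e, phi):
    """Return the positive d with d*e = 1 (mod phi); fail if gcd(e, phi) != 1."""
    g, s, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e and phi(n) must have gcd 1")
    return s % phi  # shift a negative coefficient into the range 0..phi-1


# Alice's small example from the text (hypothetical variable names).
p, q = 281, 167
n = p * q
phi = (p - 1) * (q - 1)
e = 39423
d = decoding_exponent(e, phi)
```

The final `s % phi` matters because the coefficient that comes out of the algorithm may be negative. By hand, the same d comes from running the Euclidean Algorithm backwards.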
Here are the first few steps:

1 = 5 - 2x2
1 = 5 - 2(7-5)
1 = 5 - 2x7 + 2x5
1 = 3x5 - 2x7
1 = 3x(26-3x7) - 2x7
1 = 3x26 - 9x7 - 2x7
1 = 3x26 - 11x7
1 = 3x26 - 11(33-26)
1 = 3x26 - 11x33 + 11x26
1 = 14x26 - 11x33
1 = 14(224-6x33) - 11x33
1 = 14x224 - 84x33 - 11x33
1 = 14x224 - 95x33
1 = 14x224 - 95x(257-224)
1 = 14x224 - 95x257 + 95x224
1 = 109x224 - 95x257

Continue by substituting in for 224 and 257 and then substituting again and again until you get

1 = ax39,423 + bx46,480
ax39,423 = 1 - bx46,480 = 1 (mod 46,480)

Then a is your d, after adding 46,480 if a comes out negative so that d is positive. Here the back-substitution ends with a = -19,713, and -19,713 + 46,480 = 26,767, so d=26,767.

Now Alice will publish her public keys: n=46,927 and e=39,423, but she will keep the secret key d=26,767. She will not tell anyone the secret key, not even Bob. Incidentally, she should destroy her records of the values of p, q and phi(n) because these are no longer needed and if someone found them out, they could compute her secret decoding exponent, d.

Now we will see how to send an encoded message to Alice using her public keys. Because the keys are public, anyone can send Alice an encoded message, but only Alice can decode the messages because only she knows the decoding exponent.

Suppose Bob wants to send the message “NO WAY” to Alice’s invitation to go skydiving. Bob is going to first change his message to a string of numbers. First he needs to break up his message into blocks that are as large as possible so that frequency analysis cannot be used to decode his message. He will use blocks of size k where 27^k is the largest integer power of 27 which is less than Alice’s public number n. Let’s look at powers of 27: 27^1 = 27, 27^2 = 729, 27^3 = 19683, 27^4 = 531441. So 27^3 = 19683 is less than n = 46927, but 27^4 is greater than n, so Bob will use blocks of size k=3. We include spaces in the blocks, so he will break up his message into two blocks: “NO ” (with the space at the end) and “WAY”.

Next he will convert his alphabetic blocks into numbers, one number for each block. Let “A” be 0, “B” be 1, “C” be 2, … “Z” be 25 and “ ” be 26.
He will think of the letters in each block as being coefficients of a base 27 number, like so

“NO ” = 13x27^2 + 14x27^1 + 26x27^0 = 9,881

because 13 corresponds to “N”, 14 corresponds to “O” and 26 corresponds to “ ”. So now “NO ” corresponds to the base 10 number 9881. We do the same thing to “WAY”:

“WAY” = 22x27^2 + 0x27^1 + 24x27^0 = 16,062

Now he computes each number raised to the encoding exponent and takes mod n of that number. So to encode “NO ” he computes 9,881^39,423 (mod 46,927) and to encode “WAY” he computes 16,062^39,423 (mod 46,927). To do the exponentiation, Bob uses fast exponentiation. To make this even faster, he uses a fast exponentiation program at http://www.nebrwesleyan.edu/people/kpfabe/FastExp,html. He gets

9,881^39,423 (mod 46,927) = 9,388
16,062^39,423 (mod 46,927) = 21,358

Now Bob converts his encoded numbers into base 27 numbers. He needs to use k+1 place values where 27^k < 46926 < 27^(k+1), because he could have numbers as large as 46926 to convert. Here k=3, so he will use the four place values 27^3, 27^2, 27^1, 27^0. We go back to the powers of 27 we computed earlier to compute the place values.

Note that 27^2 < 9388 < 27^3, so we will have 0 in the 27^3 place. Next 27^2 = 729, so we see how many 729’s are in 9388. The answer is 12: 9388 = 12x729 + 640. Now we see how many 27’s are in 640. The answer is 23: 640 = 23x27 + 19. 19 is less than 27, so there will be 19 1’s. So

9388 = 0x27^3 + 12x27^2 + 23x27^1 + 19x27^0

which corresponds to the string “AMXT” because 0 corresponds to “A”, 12 corresponds to “M”, 23 corresponds to “X” and 19 corresponds to “T”.

Now we have to write 21,358 in base 27. 21,358 has 1 27^3 in it. I’ll let you do the rest of the conversion. You should get

21,358 = 1x27^3 + 2x27^2 + 8x27^1 + 1x27^0

which corresponds to the string “BCIB”. So Bob sends the encoded string AMXTBCIB. Note that he only used Alice’s public keys, n and e, to encode his message.

Now let’s see how Alice decodes Bob’s message. She has to “undo” what Bob did.
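Before following Alice’s side, Bob’s encoding steps can be sketched in Python. This is only an illustration under stated assumptions: the names ALPHABET, block_to_number, number_to_block and encode are made up for this sketch, and Python’s built-in three-argument pow stands in for the fast exponentiation program mentioned in the text.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # A=0, ..., Z=25, space=26


def block_to_number(block):
    """Read a block of letters as the digits of a base 27 number."""
    value = 0
    for ch in block:
        value = value * 27 + ALPHABET.index(ch)
    return value


def number_to_block(value, width):
    """Write value in base 27 using exactly `width` letters."""
    letters = []
    for _ in range(width):
        value, digit = divmod(value, 27)  # peel off the lowest place value
        letters.append(ALPHABET[digit])
    return "".join(reversed(letters))


def encode(message, n, e, k=3):
    """Encode in blocks of k letters; each encoded block has k+1 letters."""
    if len(message) % k:
        message += " " * (k - len(message) % k)  # pad the end with spaces
    out = []
    for i in range(0, len(message), k):
        m = block_to_number(message[i:i + k])
        c = pow(m, e, n)  # m^e (mod n) by fast modular exponentiation
        out.append(number_to_block(c, k + 1))
    return "".join(out)


# Bob's message, encoded with Alice's public keys n and e only.
ciphertext = encode("NO WAY", 46927, 39423)
```

Nothing in this sketch uses d, p, q or phi(n), which matches the point that encoding needs only the public keys. Alice’s decoding reverses these steps.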
She begins by breaking up the string into 2 blocks of 4, “AMXT” and “BCIB”. She knows to use blocks of 4 because she knows the value of her public key, n, and she can figure out that 27^3 < n < 27^4, so she needs place values of 27^3, 27^2, 27^1 and 27^0. The alphabetic blocks stand for base 27 numbers and she can convert these into base 10 numbers like so:

AMXT = 0x27^3 + 12x27^2 + 23x27^1 + 19x27^0 = 9,388
BCIB = 1x27^3 + 2x27^2 + 8x27^1 + 1x27^0 = 21,358

using that 0 corresponds to “A”, 12 corresponds to “M” and so on. To decode each number, she raises it to her decoding exponent, d, and takes the resulting number (mod 46,927). So to decode AMXT, she computes 9,388^26,767 (mod 46,927) = 9,881 and to decode BCIB, she computes 21,358^26,767 (mod 46,927) = 16,062. These are the same numbers Bob got when he first converted his alphabetic blocks to numbers.

Now Alice just needs to convert these numbers to base 27 numbers and use the coefficients to get the original letters in Bob’s message. We already know the conversion because we saw Bob do it, but in real life, Alice wouldn’t have seen Bob do it.

9881 = 13x27^2 + 14x27^1 + 26x27^0 corresponds to “NO ”
16062 = 22x27^2 + 0x27^1 + 24x27^0 corresponds to “WAY”

The message Bob sent to Alice is “NO WAY”. Too bad for Alice, she will have to find someone else to go skydiving with.

Alice decoding Bob’s message looks like magic, but it is really just clever mathematics. Remember that in real life we would use much larger numbers. We would pick p and q to be very large primes, each with about 150 digits. That way one wouldn’t be able to factor n with a computer or calculator. If one picks a number with 150 digits, how does one know whether or not it is prime? The problem of finding large primes is still an active research area in mathematics.

Exercises:
1. Encode a short message for Alice, using her public keys. Break the message into blocks of three letters. If the length of the message is not divisible by three, then pad the end with spaces. When you encode your message, notice that you only use her public keys, n and e. When I pretend to be Alice and decode your messages, I will have to use d.
2. Suppose you want to send me a coded message and suppose you pick p=701 and q=331. Then n=701x331=232,031 and phi(n)=(p-1)(q-1)=231,000. Suppose you also pick e=10163. This value of e is prime and does not divide 231,000, so gcd(e,phi(n))=1. So your public keys are n=232031 and e=10163. Compute your private decoding key (remember we want it to be positive) and then get rid of your values for p, q and phi(n), so that no one can figure out your value of d. (In real life you would pick your own values of p and q, but if you did, I would have to deal with many different public keys and encode my message differently for each one of you.) You can check with me that you got the right decoding exponent. (Again, in real life, you would each have your own public keys and your own decoding exponent, d, and I wouldn’t know what d was because I wouldn’t know your p and q; but if you each had your own decoding exponent, I would have to check all of them to make sure you computed it correctly.) These will be your keys for decoding a message from me. Here is my message to you: HBORDFPOKNMP. Note that when I encoded this message, I only used your values for n and e. Now you should decode this message using your values for n and d.
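For checking hand computations, Alice’s decoding steps from the worked example in this section can be sketched in Python. As before, this is a rough illustration rather than a prescribed method: the names ALPHABET and decode are made up here, and Python’s built-in three-argument pow is assumed in place of the fast exponentiation program from the text.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # A=0, ..., Z=25, space=26


def decode(ciphertext, n, d, k=3):
    """Decode blocks of k+1 letters back into blocks of k letters."""
    out = []
    for i in range(0, len(ciphertext), k + 1):
        # Read the (k+1)-letter block as a base 27 number.
        c = 0
        for ch in ciphertext[i:i + k + 1]:
            c = c * 27 + ALPHABET.index(ch)
        m = pow(c, d, n)  # c^d (mod n) undoes the encoding exponent
        # Write m in base 27 with k letters, lowest place value first.
        letters = []
        for _ in range(k):
            m, digit = divmod(m, 27)
            letters.append(ALPHABET[digit])
        out.append("".join(reversed(letters)))
    return "".join(out)


# Alice decodes Bob's string with her public n and her secret d.
message = decode("AMXTBCIB", 46927, 26767)
```

The sketch uses n and d but never e, which mirrors how the two keys split the work between sender and receiver.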