Cryptography
Cryptography is the study of making and breaking codes for
communicating secret information. In this project we will study
some old-fashioned methods of making and breaking codes and we
will study a modern method called RSA cryptography which is
used, among other things, to encode your credit card number when
you purchase things online.
I. The Spartan Code and Substitution Ciphers
Throughout history there has always been a need to send
coded messages. Kings must send coded messages to their
generals, generals to their colonels. Spies need to send coded
messages to whomever they’re spying for and so on. One of the
first known codes was called the Spartan code. The Spartan
government used it 2500 years ago to communicate to its generals
in the field. It worked like this. Each general had some type of
cylinder, perhaps a cane or a special spear, and all of these
cylinders were of the same radius. The King also had a cylinder
of the same radius. The King would wrap a ribbon around his
cylinder and write horizontally across the wound ribbon. The
King might need a few ribbons to write his message since the
cylinder had such a small radius. The ribbons would then be
unwound and given to a messenger who might wear them as a belt.
Anyone who saw it just thought it was a belt decorated with letters
(and not very many people could read back then anyway). When
the general received the ribbon, he would wrap it around his
cylinder to see what the king had written. The letters would line
up to spell out the secret message. For example, maybe the ribbon,
unwound, said
T P O E S A H T M G F N E O E R O D W G S E R W O R F E H R
R A R K I I D P O W D T C H M O D I R Y T R E N Y C H D N G
but when wound around the cylinder and each row of letters read
horizontally, it would read
T H E W O R D C R Y
P T O G R A P H Y C
O M E S F R O M T H
E G R E E K W O R D
S F O R H I D D E N
A N D W R I T I N G
You can try this at home with a ribbon and a paper towel
tube. Wrap the ribbon around the tube and then write horizontally
across the tube, one letter on each section of the ribbon. Unwind
the ribbon and you can see that it just looks like a string of random
letters. Wind it around the tube again and you can see the
message again. Hand in your ribbon with your project and I will
wrap it around my own paper towel tube to read your secret
message.
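If you would rather experiment on a computer than with a paper towel tube, here is a minimal Python sketch of the same idea. It assumes the message fills a rectangle with no gaps, and the function names scytale_unwind and scytale_wind are made up for this illustration.

```python
def scytale_unwind(rows):
    """Read a grid of equal-length rows column by column, the way the unwound ribbon shows it."""
    return "".join(row[c] for c in range(len(rows[0])) for row in rows)

def scytale_wind(ribbon, num_rows):
    """Rewind the ribbon: every num_rows-th letter lands on the same horizontal row."""
    return ["".join(ribbon[r::num_rows]) for r in range(num_rows)]

rows = [
    "THEWORDCRY",
    "PTOGRAPHYC",
    "OMESFROMTH",
    "EGREEKWORD",
    "SFORHIDDEN",
    "ANDWRITING",
]
ribbon = scytale_unwind(rows)
print(ribbon)                      # the string of letters on the unwound ribbon
for row in scytale_wind(ribbon, 6):
    print(row)                     # the message as read on the cylinder
```

Here the number of rows (6) plays the role of the cylinder's radius: winding with the wrong number of rows gives nonsense.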
How secure is the Spartan code? How easily could the
Persians (one of Sparta’s favorite enemies) read a message if it was
intercepted? First, the Persian would have to be able to read the
Spartan language. Then, even if the Persian knew the basic idea
behind the code, he would have to know the radius of the cylinder
to be able to read it. If he intercepted a message, he could
eventually decode it using trial and error with cylinders with
differing radii. Once he found the correct radius, he could decode
anything the King or generals sent. So the code is not that secure
to people who know the method. But the method was a well-kept
secret and only people who could read Spartan had any chance of
cracking it. We would not consider this a very good code today,
but it was good enough for the Spartans of 500 BC.
Our next code comes from Julius Caesar from about 80 BC.
Caesar used what’s known as a shift cipher to encode messages.
We will use capital letters for unencrypted letters and words
(called the “cleartext”) and lowercase letters for the letters and
words of the secret code (called the “ciphertext”). A shift cipher
is one where you just shift the alphabet to the left. For example,
if Caesar shifted the alphabet 3 spaces to the left, the cleartext
alphabet and the ciphertext alphabet would look like this:
Cleartext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: d e f g h i j k l m n o p q r s t u v w x y z a b c
So “d” would stand for “A”, “e” would stand for “B”, “f” would
stand for “C”, and so on. For example, the soothsayer might
communicate with Caesar by sending the following secret message
ehzduh wkh lghv ri pdufk
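Trying every possible shift is easy to hand to a computer. The following is a small Python sketch, not part of the original project; the function name shift_decode is made up for this illustration. It prints the soothsayer's message under all 25 shifts so you can pick out the one that reads as English.

```python
import string

def shift_decode(ciphertext, shift):
    """Undo a shift cipher: move each lowercase ciphertext letter back by `shift` places."""
    out = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            # Position in the alphabet, moved back by `shift`, wrapping around with mod 26.
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)

message = "ehzduh wkh lghv ri pdufk"
for shift in range(1, 26):
    print(shift, shift_decode(message, shift))
```

Only one of the 25 lines is readable, which is exactly why a shift cipher is so easy to break.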
A shift cipher is not particularly secure. If a message was
intercepted, the interceptor would just have to try 25 different
shifts. Of course, this assumes that the interceptor knows the
cipher is a shift cipher. From what he can tell, it is just some sort
of substitution cipher where we substitute each cleartext letter by a
randomly assigned ciphertext letter. For example
Cleartext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: o c x s f d h m n g u w i p r e b z v k q l y t j a
If you know the ciphertext alphabet it is easy to decode a message. For
example, decode the following message using the substitution
cipher above: n wrlf iokm. But if you don’t know the ciphertext alphabet
then it is a lot harder to decode messages. There are 26! =
403291461126605635584000000 substitution ciphers (including
the 25 shift ciphers), so trial and error won’t work here. But if
you know what language is being used (for example, English,
Latin, Arabic, etc), you can use what you know about that
language to help you. For example, by far the most commonly
used letter in the English language is the letter “e”, and the most
common pair of letters is “th” so you might try to decode the
message by first figuring out which ciphertext letter appears most
often and assuming that’s an “e”. Here is a table showing the
relative frequency of each letter in the English language.
Letter    Relative Frequency (%)
a         8.2
b         1.5
c         2.3
d         4.3
e         12.7
f         2.2
g         2.0
h         6.1
i         7.0
j         0.2
k         0.8
l         4.0
m         2.4
n         6.7
o         7.5
p         1.9
q         0.1
r         6.0
s         6.3
t         9.1
u         2.8
v         1.0
w         2.4
x         0.2
y         2.0
z         0.1
So, for example, in a random selection of English text, “a” occurs
8.2% of the time and “e” occurs 12.7% of the time. Substitution
ciphers are what’s used in the newspaper cryptoquotes.
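Counting letters by hand is tedious, so here is a short Python sketch of a frequency count. It is only an illustration (the function name letter_frequencies is made up), and a message as short as the sample below is too small for its frequencies to match the table closely.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, percent) pairs for the letters in text, most common first."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]

sample = "ehzduh wkh lghv ri pdufk"
for letter, pct in letter_frequencies(sample):
    print(letter, round(pct, 1))
```

The most frequent ciphertext letter is a reasonable first guess for “e”, and you can compare the rest of the list against the table in the same way.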
Let’s try cracking a few cryptoquotes. We’ll use a computer
program to make the guessing process involved less tedious.
There are several encrypted jokes and mathematical quotes each
encrypted using a different substitution cipher at the website
http://www.nebrwesleyan.edu/people/kpfabe/AGAM
To go to the first joke, go to
http://www.nebrwesleyan.edu/people/kpfabe/AGAM/joke1.txt
Open another window in your browser and go to
http://cryptoclub.math.uic.edu
Click on Ciphers and then on Frequency Analysis (for
substitution ciphers). Now copy and paste the encrypted
joke from the first window into the box on the Frequency Analysis
page. Click on the Frequency Analysis button and you will
get a table which shows you the frequency of each letter in the
encoded joke. Then you can start guessing plaintext letters.
Click on the Try it! button to make the substitutions. You can
make a few substitutions at a time until you’ve cracked the code.
In all there are 10 jokes and 32 quotes. The quotes are definitely
harder than the jokes. Play around a bit with some of them. You
could also enter a cryptoquote from the newspaper into the box on
the Frequency Analysis page.
II. The Math You Need for RSA Cryptography
The goal of the rest of this project is to teach you how to
encode and decode messages using something called RSA
cryptography. RSA cryptography uses modular arithmetic, prime
numbers, greatest common divisors, other number bases, and
related tools to code and decode secret messages. When you type
your credit card number on Amazon (or any other website), your
credit card number is encrypted using RSA cryptography. RSA
cryptography is named after the three people who developed it,
Ronald Rivest, Adi Shamir, and Leonard Adleman. We will first
study each of the mathematical tools used in RSA cryptography.
IIa. Modular Arithmetic
You are already familiar with a special case of modular
arithmetic, which we will call “clock arithmetic”.
Question: If you start working at 10:00 and work 4 hours, what
time is it when you stop working?
Answer: 2:00
To answer this, you probably split up your hours into 2 hours
before 12:00 and 2 hours after. So you went to 12:00 and
pretended that was 0:00 and added 2 more hours. We might say
that 10+4 (mod 12) = 14 (mod 12) = 2.
What if we had an 8 hour clock instead of a 12 hour clock? Well
then we couldn’t start at 10:00. We’d have to start at something
less than 8:00.
Question: If you start working at 5:00 and work for 7 hours,
what time is it when you stop working?
Answer: 4:00
How did you do this problem? You probably figured that
5:00 to 8:00 is three hours and then 4 more hours would be 4:00.
Again, you essentially treated 8:00 as 0:00 and added on 4 more
hours. We say that 5+7 (mod 8) = 12 (mod 8) = 4.
How about this problem:
Question: Still using the 8 hour clock, what if you started
working at 5:00 and worked for 12 hours? When would you stop
working?
Answer: 1:00
How did you do this last problem? You probably figured that 5:00
to 8:00 is 3 hours and then you needed 9 more hours. But your
clock only goes until 8:00, so once around is 8 hours. Therefore you
need to go 8 hours plus one more hour, putting you at 1:00.
Again, we might say that 5+12 (mod 8) = 17 (mod 8) = 1.
Clock arithmetic is also called modular arithmetic. With a
12 hour clock we are working mod 12. With an 8 hour clock we
are working mod 8. Here are some slightly different questions
about mod 8 and mod 12.
Question: What is 20 (mod 8)?
If we start at the top of the clock and go around the clock twice, we
get 16 hours. What’s left is 4 hours, so 20 (mod 8) = 4. Another
way of thinking about it is to ask yourself how many times 8 goes
into 20 and what is the remainder. 8 goes into 20 twice with
remainder 4.
Question: What is 38 (mod 12)?
12 goes into 38 3 times with 2 left over, so 38 (mod 12) = 2.
Notice that, in general b(mod n) is the remainder when you divide
b by n.
Theorem 1: If b (mod n) = a then there is an integer k such
that b-nk = a. a is the remainder when you divide b by n. k is
the number of times n goes into b.
We will use this characterization of b (mod n) later.
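You can use negative numbers in modular arithmetic, too.
For example:
Question: What is -17 (mod 5)?
Answer: -2, or equivalently 3. Since -17 = (-3)x5 - 2, we can write
-17 (mod 5) = -2. Adding 5 gives the remainder between 0 and 4:
-17 = (-4)x5 + 3, so -17 (mod 5) = 3.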
We will have to use mods of very large numbers for RSA
cryptography.
Question: What is 85,561 (mod 1998)?
To do this, you’ll need your calculator. First divide 85,561 by
1998. You will get 42.82332332, so 1998 goes into 85,561 42
times with some left over. To find the remainder, multiply
42x1998 and subtract that from 85,561. The answer you get must
be less than 1998 (otherwise 1998 would be able to go into 85,561
at least one more time). In this case the answer is 1645. So
85,561 (mod 1998) = 1645. Here are some exercises for you to
try.
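If you have Python available, you can check calculator work like this with the built-in divmod function and the % operator. This is just a sketch for checking the worked examples. Note that for a positive modulus, Python's % always returns a remainder between 0 and n-1, even when the number is negative.

```python
# divmod gives the number of times 1998 goes into 85561 and what is left over.
quotient, remainder = divmod(85561, 1998)
print(quotient, remainder)

# The % operator gives just the remainder, that is, b (mod n).
print(20 % 8)     # the 8 hour clock example
print(38 % 12)    # the 12 hour clock example

# For a negative number Python reports the nonnegative remainder.
print(-17 % 5)
```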
Exercises:
1. 8523 (mod 46)
2. 463 (mod 84)
3. 9201 (mod 27)
4. 3331 (mod 2)
5. 8901 (mod 23,456)
6. -7 (mod 23)
7. 48 (mod 12)
8. 49 (mod 7)
IIb. More Modular Arithmetic
What if we want to do addition mod 4? Consider the two
modular equations 22 (mod 4) = 2 and 5 (mod 4) = 1. Notice that
(22+5) (mod 4) = 27 (mod 4) = 3. And also 2+1 (mod 4) = 3. So
we get the same answer whether we add 22+5 and then take the
mod or just add the mods together, 2+1.
What if we want to multiply 22x5 (mod 4)? We could first
multiply 22 and 5 and then take mod 4 of the answer: (22x5)
(mod 4) = 110 (mod 4) = 2, or we can multiply the mods: 2x1
(mod 4) = 2.
We can make this into a Theorem:
Theorem 2: If b (mod n) = a and d (mod n)= c then a+c = b+d
(mod n) and ac = bd (mod n)
Proof: Since b (mod n) = a, Theorem 1 tells us that b-nk=a for
some integer k and since d (mod n) = c, d-nj=c for some integer j.
Now finish the proof of the first part by adding a+c.
To prove the second part of the theorem, consider ac = (b-nk)(d-nj) = bd - bnj - dnk + n^2kj = bd - n(bj + dk - nkj). Now bj+dk-nkj is an integer, so let l = bj+dk-nkj. Then ac = bd - nl, so
bd (mod n) = ac (mod n) (this is because subtracting multiples of n
from bd does not change its remainder when divided by n).
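A computer check is no substitute for the proof, but it can make Theorem 2 feel more believable. The sketch below (the function name check_theorem_2 is made up here) compares both sides of each equation for the example above and then for many small values of b, d and n.

```python
def check_theorem_2(b, d, n):
    """Check that adding or multiplying the mods agrees with taking the mod afterwards."""
    a, c = b % n, d % n
    sum_ok = (a + c) % n == (b + d) % n
    product_ok = (a * c) % n == (b * d) % n
    return sum_ok and product_ok

print(check_theorem_2(22, 5, 4))   # the worked example with mod 4

# Try every b and d below 100 for every mod from 2 to 19.
print(all(check_theorem_2(b, d, n)
          for n in range(2, 20)
          for b in range(100)
          for d in range(100)))
```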
IIc. Fast Exponentiation
Now we want to do exponentiation (mod n). When we get
to RSA cryptography, we will want to exponentiate very large
numbers with very large exponents and we will want to use very
large mods. For example, we might want to compute 9881^39423
(mod 46,927). This is obviously impossible to do by hand or by
calculator. There is a method, however, called Fast
Exponentiation. To see how it works, let’s use an easier
problem. Suppose we want to find 5^47 (mod 21). Normally you
would think of multiplying 5 by itself 47 times and then taking (mod 21) of
the answer. But remember that we want a method to use
for large numbers (and 5^47 is already too big for your calculator).
We wouldn’t want to have to multiply 9881 by itself 39423 times.
For our new method of exponentiating, we first find 5^x (mod 21) where x is
a power of 2. We will do this for the powers 2^1, 2^2, ..., 2^n where 2^n < 47 <
2^(n+1) (so here n is 5). We’ll see why we choose n like this soon.
We get
5^2 (mod 21) = 25 (mod 21) = 4
5^4 (mod 21) = (5^2)^2 (mod 21) = 4^2 (mod 21) = 16
In this last step we used that 5^4 = (5^2)^2 and then substituted in our
previous answer for 5^2 (mod 21). Continuing, we get
5^8 (mod 21) = (5^4)^2 (mod 21) = 16^2 (mod 21)
= 256 (mod 21)
= 4
5^16 (mod 21) = (5^8)^2 (mod 21) = 4^2 (mod 21) = 16
5^32 (mod 21) = (5^16)^2 (mod 21) = 16^2 (mod 21)
= 256 (mod 21)
= 4
Now we stop finding powers of 2 because 64 > 47 and so 5^64 > 5^47
and will not appear in the breakdown of 5^47. Now we break down
the exponent, 47, into powers of 2. 47 = 32+8+4+2+1, so
5^47 = 5^(32+8+4+2+1) = 5^32 x 5^8 x 5^4 x 5^2 x 5^1
5^47 (mod 21) = 4 x 4 x 16 x 4 x 5 (mod 21)
= 5120 (mod 21)
(now you can see why we didn’t bother computing 5^64; in the
breakdown of 47 into powers of 2, 64 would not come into play
since 64 > 47). We have now broken our problem down into
something we can do on our calculator. Since 5120/21 is about 243.8
and 243x21 = 5103, we know that
5120 (mod 21) = 5120-5103 = 17
and so
5^47 (mod 21) = 17
Let’s try another one, 7^23 (mod 17). This time we will
exponentiate using powers of 2 up to 2^4 since 2^4 < 23 < 2^5.
7^2 (mod 17) = 49 (mod 17) = 15
7^4 (mod 17) = (7^2)^2 (mod 17) = 15^2 (mod 17)
= 225 (mod 17)
= 4
7^8 (mod 17) = (7^4)^2 (mod 17) = 4^2 (mod 17)
= 16
7^16 (mod 17) = (7^8)^2 (mod 17) = 16^2 (mod 17)
= 256 (mod 17)
= 1
Now we have 23=16+4+2+1, so
7^23 (mod 17) = 7^(16+4+2+1) (mod 17)
= 7^16 x 7^4 x 7^2 x 7 (mod 17)
= 1 x 4 x 15 x 7 (mod 17)
= 420 (mod 17)
= 12
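Fast Exponentiation is also how a computer does this job. Below is one possible Python sketch of the repeated squaring idea; the function name fast_power_mod is made up for this illustration, and unlike the hand method it takes the mod after every multiplication so the numbers never get large. Python's built-in pow(base, exponent, modulus) computes the same thing and is used here only as a check.

```python
def fast_power_mod(base, exponent, modulus):
    """Compute base^exponent (mod modulus) by repeated squaring."""
    result = 1
    square = base % modulus          # holds base^1, base^2, base^4, base^8, ... (mod modulus)
    while exponent > 0:
        if exponent % 2 == 1:        # this power of 2 appears in the breakdown of the exponent
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent //= 2               # move on to the next power of 2
    return result

print(fast_power_mod(5, 47, 21))
print(fast_power_mod(7, 23, 17))
print(fast_power_mod(9881, 39423, 46927) == pow(9881, 39423, 46927))
```

Each pass through the loop handles one power of 2, so even an exponent like 39423 needs only about 16 squarings.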
Exercises:
1. 25x36 (mod 4) (use your theorems)
2. 1753247x235711131719 (mod 5)
3. 117 (mod 21)
4. 342 (mod 5)
5. 3753 (mod 15)
6. 742 (mod 18)
IId. Solving Modular Equations
Consider the following modular equations:
2x=1 (mod 5)
3x=1 (mod 6)
5x=1 (mod 8)
4x=1 (mod 10)
3x=1 (mod 11)
3x=7 (mod 11)
5x=1 (mod 7)
5x=4 (mod 7)
5x=1 (mod 18)
5x=7 (mod 18)
Examine these equations. Which have solutions? Since they are
modular problems, we only have to check values of x between 0
and m-1 (where m is the mod). For example, we can see that the
second problem has no solution since
(3)(0) = 0 (mod 6)
(3)(1) = 3 (mod 6)
(3)(2) = 6 (mod 6) = 0 (mod 6)
(3)(3) = 9 (mod 6) = 3 (mod 6)
(3)(4) = 12 (mod 6) = 0 (mod 6)
(3)(5) = 15 (mod 6) = 3 (mod 6)
So there is no x that makes 3x (mod 6) = 1.
Check the others and see if they have solutions. We see that
some of these equations have solutions and some don’t. To be able
to tell which have solutions and which don’t, without having to try
every number between 0 and m-1, we have to understand the
greatest common divisor, or gcd, of two positive integers. The
gcd(j,k), where j and k are two positive integers, is the largest
positive integer which divides both j and k. For example
gcd(6,8)=2, gcd(15,25) = 5, and gcd(15,77)=1. Two positive
integers, j and k, are said to be relatively prime to one another if
gcd(j,k)=1; that is, they have no common divisors except 1. For
example, 15 and 77 are relatively prime and 6 and 8 are not
relatively prime. Looking back at our list of equations above, we
can spot a pattern: There is a solution x to ax=1 (mod m) if and
only if gcd(a,m)=1. This is part of Theorem 3.
Theorem 3: Given a and m, there is a solution, x, to ax =1 (mod
m) if and only if gcd(a,m) = 1. If there is a c such that ac=1 (mod
m), then the solution to the equation ax=b (mod m) is x=bc (mod
m).
The second part of the Theorem tells us how to solve 3x=7 (mod
11) by first solving 3x=1 (mod 11). We know that 3x=1 (mod
11) has a solution because gcd(3,11)=1. We found this solution to
be x=4. So the solution to 3x=7 (mod 11) must be (4)(7) = 28
(mod 11) = 6.
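To see the pattern behind Theorem 3 for yourself, you can let a computer try every x between 0 and m-1 for each equation in the list at the start of this section. This is a trial and error sketch only (the function name solve_by_trial is made up here); it uses Python's math.gcd to print the gcd next to the solutions.

```python
from math import gcd

def solve_by_trial(a, b, m):
    """Return every x in 0..m-1 with a*x = b (mod m), found by trying them all."""
    return [x for x in range(m) if (a * x) % m == b % m]

# Each triple (a, b, m) stands for the equation ax = b (mod m).
equations = [(2, 1, 5), (3, 1, 6), (5, 1, 8), (4, 1, 10), (3, 1, 11),
             (3, 7, 11), (5, 1, 7), (5, 4, 7), (5, 1, 18), (5, 7, 18)]
for a, b, m in equations:
    print(f"{a}x = {b} (mod {m}): gcd = {gcd(a, m)}, solutions = {solve_by_trial(a, b, m)}")
```

Trying every x is fine for small mods like these, but it is hopeless for the large numbers used in RSA cryptography.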
IIe. Finding GCD’s
How does one find the gcd of two numbers? One way is to list all
the divisors for each number and find the largest number that
appears in both lists. But as mentioned before, we will be
working with large numbers once we finally get to RSA
cryptography. How does one find the gcd of two large numbers?
Let’s start with trying to find the gcd(169,55). It’s fairly easy to
do this by hand as neither of these numbers has very many
divisors, but we will use it as an example in a new method of
finding gcd’s called the Euclidean Algorithm. Before we explain
the algorithm there is a very important fact that we must be aware
of.
Very Important Fact: If a+b = c, then any number that divides
both a and b also divides a+b and so also divides c. Similarly, if a
number divides b and c, it must also divide a because a = c-b and
anything that divides b and c also divides c-b. Similarly, anything
that divides a and c must also divide b. We call this the 2 implies
3 fact.
Notice that, by dividing 169 by 55, we find that
169 = 3x55 + 4
So
4 = 169 - 3x55
This second equation shows us that anything that divides both 169
and 55 also divides 4, including the greatest common divisor of
169 and 55 (because it is, obviously, a divisor of both 169 and 55).
The set of common divisors of 169 and 55 is therefore the same as
the set of common divisors of 55 and 4 (and also of 169 and 4).
Therefore, gcd(169,55) = gcd(55,4). Now divide 55 by 4 to get
55 = 13x4 + 3
3 = 55 – 13x4
So if something divides both 55 and 4 it must also divide 3, so
gcd(55,4) = gcd(4,3). Pretend that you don’t know gcd(4,3) and
continue dividing:
4 = 1x3 + 1
1 = 4-1x3
So if something divides both 4 and 3, it must also divide 1, so
gcd(4,3) = gcd(3,1). The greatest common divisor of anything
and 1 is 1, so gcd(169,55) = gcd(55,4) = gcd(4,3) = gcd(3,1) = 1.
Therefore 169 and 55 are relatively prime. Let’s try another one.
Find gcd(556,224).
556= 2x224 + 108
108 = 556 – 2x224
So gcd(556,224) = gcd(224,108)
224 = 2x108 + 8
8 = 224 – 2x108
So gcd(224,108) = gcd(108,8)
108 = 13x8 + 4
4 = 108-13x8
So gcd(108,8) = gcd(8,4)
8 = 2x4 + 0
When you hit 0 or 1 you are done. If you hit 0, go back to the
previous line and find the gcd. So here the gcd(8,4) = 4. So
gcd(556,224) = gcd(224, 108) = gcd(108,8) = gcd(8,4) = 4.
If you get down to 1 then the two numbers you started with are
relatively prime.
We can compress our calculations a bit by leaving out the second
equation in each pair, like this:
556= 2 x 224 + 108
224 = 2 x 108 + 8
108 = 13 x 8 + 4
8=2x4+0
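The compressed form of the calculation translates almost line for line into a program. Here is a minimal Python sketch (the function name euclid_gcd and the show_steps option are made up for this illustration). One small difference from the hand method: the program always continues until the remainder is 0 rather than stopping early when it reaches 1.

```python
def euclid_gcd(a, b, show_steps=False):
    """Find gcd(a, b) by repeated division, as in the Euclidean Algorithm."""
    while b != 0:
        q, r = divmod(a, b)          # a = q x b + r
        if show_steps:
            print(f"{a} = {q} x {b} + {r}")
        a, b = b, r                  # gcd(a, b) = gcd(b, r)
    return a

print(euclid_gcd(556, 224, show_steps=True))
print(euclid_gcd(169, 55, show_steps=True))
```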
IIf. Solving Modular Equations Part 2
If we use the Euclidean Algorithm to go backwards when
gcd(a,m)=1, we can solve the equation ax (mod m)=1. Let’s
see how this works. Let’s solve the equation 169x = 1 (mod 55).
We will go back to our first problem where we found that
gcd(169,55)=1 and work backwards. Recall we had
1 = 4-1x3
Now substitute 3=55-13x4
1 = 4-(55-13x4)
1 = 14x4 – 55
Now substitute 4=169-3x55 and continue in this manner until you
get 1=ax169 + bx55 (a and b can be positive or negative).
1 = 14x(169-3x55) – 55
1 = 14x169 – 14x3x55 – 55
1 = 14x169 – 42x55 – 55
1 = 14x169 – 43x55
So 169x14 = 1+43x55 = 1 (mod 55). So the solution to 169x=1
(mod 55) is x=14. Notice that we have also solved 55x = 1 (mod
169) since -55x43 = 1-14x169 = 1 (mod 169). The solution to
55x=1 (mod 169) is x=-43. If you don’t want a negative answer
then add 169 to your answer, enough times so that your answer is
no longer negative. In this case we just need to add 169 once to
get a positive answer: x= -43 + 169 = 126. Notice that these two
problems would be very tedious to solve by trial and error. The process of
finding the gcd and then going backwards is called the Extended
Euclidean Algorithm and we can use it to solve equations like
55x=1 (mod 169). Let’s do another example: Solve 61x=1 (mod
126). First we use the Euclidean Algorithm to find gcd(126,61).
126 = 2 x 61 + 4
61 = 15 x 4 + 1
4 = 4x1 + 0
So gcd(126,61)=1 and, working backwards, we get
1 = 61 – 15x4
Substitute 4=126-2x61
1 = 61 – 15x(126-2x61)
1 = 61 – 15x126 + 30x61
1 = 31x61 – 15x126
So 61x31=1+15x126 = 1 (mod 126), so the solution of
61x=1 (mod 126)
is x=31. Notice that you have to be very careful when going
backwards, substituting the right numbers with the goal of getting
1 = au + bv where we start with gcd(a,b).
Let’s review how to solve ax=b (mod m).
1. Check that gcd(a,m)=1; otherwise ax=1 (mod m) has no
solution and this method does not apply. If the numbers are
small enough, factor each one and find the greatest common
divisor. Otherwise use the Euclidean Algorithm.
2. Assuming that gcd(a,m)=1, find integers u and v such that
au+mv=1. You may just be able to do this by looking at
the equation; for example, 2u + 7v = 1 has solutions u=-3
and v=1. Otherwise use the Extended Euclidean
Algorithm to go backwards from finding gcd(a,m).
3. Now rearrange the equation so that au=1-mv. Then we
have au=1 (mod m). Make your u positive if it’s not
already. Things are just easier that way.
4. Now use Theorem 3 to solve ax=b (mod m): the solution is
x=bu (mod m).
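The four steps above can be sketched in Python as follows. This is one possible way to organize the Extended Euclidean Algorithm, not the only one: instead of working backwards at the end, it keeps track of u and v while it divides. The function names extended_gcd and solve_linear are made up for this illustration.

```python
def extended_gcd(a, m):
    """Return (g, u, v) with a*u + m*v = g, where g = gcd(a, m)."""
    old_r, r = a, m      # remainders from the Euclidean Algorithm
    old_u, u = 1, 0      # coefficients of a
    old_v, v = 0, 1      # coefficients of m
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

def solve_linear(a, b, m):
    """Smallest nonnegative x with a*x = b (mod m), assuming gcd(a, m) = 1."""
    g, u, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("gcd(a, m) is not 1, so this method does not apply")
    return (b * u) % m   # Theorem 3: x = bu (mod m); % makes the answer nonnegative

print(extended_gcd(169, 55))
print(solve_linear(169, 1, 55))
print(solve_linear(55, 1, 169))
print(solve_linear(61, 1, 126))
print(solve_linear(3, 7, 11))
```

You can use it to check your answers to the exercises after you have worked them by hand.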
Exercises:
1. Find the smallest positive integer x, so that 243 x = 1 (mod 40)
2. Find the smallest positive integer x, so that 40 x = 1 (mod 243)
3. Find the smallest positive integer x, so that 61 x =7 (mod 126)
4. Find the smallest positive integer x, so that 562 x = 8 (mod 57)
5. Find three integer solutions of 53 x = 4 (mod 169)
IIg. Powers Mod m
Following are tables of powers mod m for various m. Once the
pattern becomes apparent, the line in the table is ended. For
example, once a zero appears, we know the rest of that line will be
zeroes. Once a one appears, we know that the line will repeat
itself. For example 2^1 (mod 5) = 2, 2^2 (mod 5) = 4, 2^3 (mod 5) =
3, 2^4 (mod 5) = 1, 2^5 (mod 5) = 2 and as we take successive powers
of 2 (mod 5) we will get 4, 3, 1, 2, 4, 3, 1, 2, …
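If you want to build tables like these for other values of m, a few lines of Python will do it. This is only a sketch of one way to lay out the rows (the function name power_row is made up here): each row stops once a 0 or a 1 appears, and otherwise after m powers, which is enough to see the repetition.

```python
def power_row(a, m):
    """List a^1, a^2, a^3, ... (mod m), stopping once a 0 or a 1 appears."""
    row = []
    value = 1
    for _ in range(m):               # at most m powers, so the loop always ends
        value = (value * a) % m
        row.append(value)
        if value in (0, 1):
            break
    return row

for m in (4, 5, 6, 7):
    print(f"m = {m}")
    for a in range(1, m):
        print(f"  a = {a}: {power_row(a, m)}")
```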
A question we will need to answer in order to do RSA
cryptography is under what conditions, given a and m, is there a
positive integer k such that a^k=1 (mod m)? Looking at the table
of powers mod m, we see, for example, that there is no such k for
a=2 and m=4. This is because 2^2 = 0 (mod 4) and so 2^k = 0 (mod 4)
for all k >= 2. On the other hand, there is a k when a=1, 2, 3, or 4
and m=5. For a=1 or 4, k=2 works, and when a=2 or 3, k=4
works. Looking even more closely at the table, it seems
reasonable to say
Theorem 4: If gcd(a,m)=1, then there is a positive integer k such that
a^k=1 (mod m).
This leads us to another question: Given m, is there one value of k
such that a^k = 1 (mod m) for all a with gcd(a,m)=1?
Let’s look at the table. When m=5, k=4 will work for all values of
a with gcd(a,m)=1. For example, 1^4=1 (mod 5), 2^4=1
(mod 5), 3^4=1 (mod 5), 4^4=1 (mod 5). When m=6, k=2 works,
for example 1^2=1 (mod 6) and 5^2=1 (mod 6). When m=7, k=6
works (verify this). Restricting ourselves to prime values of m,
we can spot a pattern: It looks like when m is prime a^(m-1)=1 (mod
m) for all a with gcd(a,m)=1. In fact, this is a theorem called
Fermat’s Little Theorem.
Theorem 5 (Fermat’s Little Theorem): If p is prime, then for
any integer a with gcd(a,p)=1, we have a^(p-1)=1 (mod p).
Proof: Let p be a prime and let a be any integer with gcd(a,p)=1.
Assume 1 <= a <= p-1.
First we will show that a, 2a, 3a, …, (p-1)a are all distinct (mod p).
That is, there are no distinct j and k between 1 and p-1 such that
ka=ja (mod p). We will use a proof by contradiction, where we prove
something by assuming the opposite is true and then showing that
this leads to a contradiction. So suppose that there are distinct
values j and k between 1 and p-1 with aj=ak (mod p).
By definition, we know that p divides aj-ak=a(j-k). So, since p is
prime, either p divides a or p divides j-k. But p can’t divide a
because 1 <= a < p, so it must divide j-k, which means that j=k
(mod p). But j and k are both between 1 and p-1 and so this means
j=k, which contradicts our assumption that j and k are distinct.
So now we know that a, 2a, 3a, …, (p-1)a are all distinct mod p.
None of them is 0 (mod p), since p divides neither a nor any of
1, 2, …, p-1. There are p-1 of these things, so we know that, as a
set, {a (mod p), 2a (mod p), …, (p-1)a (mod p)} = {1,2,3,…,p-1}.
(This is not to say that a (mod p) = 1, just that a (mod p) is in
the set {1,2,3,…,p-1}, which must be true because the set
{a (mod p), 2a (mod p), 3a (mod p), …, (p-1)a (mod p)} has p-1
distinct nonzero elements, all less than p.) If we multiply all the
elements in each of the two sets together, we get
(a)(2a)(3a)…((p-1)a) = (1)(2)(3)…(p-1) (mod p). Finish this proof to
show that a^(p-1)=1 (mod p).
What if p is not prime? Is there a value of k such that a^k=1 (mod
p) for all a with gcd(a,p)=1?
To answer this, we need a new concept called the Euler phi-function
(phi is pronounced “fee”). Here is the definition of the
Euler phi-function:
Definition: Given a positive integer n, phi(n) = the number of
positive integers less than n and relatively prime to n.
For example, phi(5) = 4 since 1, 2, 3, and 4 are all relatively prime
to 5 and less than 5. Phi(6) = 2, since the only numbers less than 6
that are relatively prime to 6 are 1 and 5.
Exercises:
1. Compute phi(15)
2. Compute phi(21)
3. Suppose p is prime, what is phi(p)?
Theorem 6 (Euler’s Theorem): If gcd(a,m)=1, then a^phi(m) = 1
(mod m).
Let’s try this. Let m=6. Then gcd(5,6)=1 and 5^phi(6) = 5^2 = 25 = 1
(mod 6). Also gcd(1,6)=1 and 1^phi(6) = 1^2 = 1 (mod 6). Let’s try
another one. Let m=8. Then gcd(5,8)=1 and 5^phi(8) = 5^4 = 625 = 1
(mod 8). Also gcd(7,8)=1 and 7^phi(8) = 7^4 = 2401 = 1 (mod 8).
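Here is a small Python sketch for experimenting with Euler’s Theorem. It computes phi by direct counting, following the definition given earlier (positive integers less than n that are relatively prime to n), so it is only practical for small numbers. The function names phi and euler_holds are made up for this illustration.

```python
from math import gcd

def phi(n):
    """Count the positive integers less than n that are relatively prime to n."""
    return sum(1 for x in range(1, n) if gcd(x, n) == 1)

def euler_holds(m):
    """Check a^phi(m) = 1 (mod m) for every a between 1 and m-1 with gcd(a, m) = 1."""
    return all(pow(a, phi(m), m) == 1 for a in range(1, m) if gcd(a, m) == 1)

print(phi(5), phi(6), phi(8))
print(pow(5, phi(8), 8), pow(7, phi(8), 8))
print(all(euler_holds(m) for m in range(2, 50)))
```

A check like this is evidence, not a proof, but it is a good way to convince yourself that the theorem is not just a coincidence for m=6 and m=8.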
Exercises:
1. Let m=9. Find all integers x such that 0 < x < m with
gcd(x,9)=1. Compute x^phi(m) (mod m) for all these values of x.
We can compute phi(m) by simply listing all the positive integers
less than m and crossing out the ones that share factors with m.
This is probably what you did when you computed phi(15) and
phi(21). But remember that when we get to RSA cryptography we
will be working with large numbers and this process would be
nearly impossible. We are in luck, however, because in RSA
cryptography, the m that we choose will be the product of 2 large
primes, p and q, and we have a theorem that tells us what phi(pq)
is.
Theorem 7: If p and q are distinct prime numbers (i.e. p is not
equal to q) and m=pq, then phi(m)=phi(pq)=(p-1)(q-1).
Let’s try a couple of examples. Let m=6. We know that phi(6)=2.
We also know that 6 is the product of two prime numbers, 2 and 3.
Using the theorem, phi(6)=phi(3x2)=2x1=2. So the theorem holds
in this case. Let’s try another one. Let m=15. We know that
phi(15)=8. We also know that 15 is the product of two primes, 3
and 5. From the theorem, phi(15)=phi(5x3) = 4x2 = 8. This
example also holds. We won’t prove this theorem because this
project is already long, but the proof is not that hard.
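Although we skip the proof, you can at least test Theorem 7 on a few small primes. The sketch below counts phi(pq) directly from the definition and compares it with (p-1)(q-1); the function name phi_by_counting and the particular primes chosen are just for illustration.

```python
from math import gcd

def phi_by_counting(n):
    """Count the positive integers less than n that are relatively prime to n."""
    return sum(1 for x in range(1, n) if gcd(x, n) == 1)

for p, q in [(2, 3), (3, 5), (11, 13), (17, 19)]:
    m = p * q
    print(f"p = {p}, q = {q}, m = {m}: "
          f"counting gives {phi_by_counting(m)}, (p-1)(q-1) = {(p - 1) * (q - 1)}")
```

Notice that the formula needs only one multiplication, while counting has to look at every number below m. That difference is what matters when p and q are large.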
IIh. Numbers in Other Bases
When we first learned about place value in elementary school, we
learned that we could write 364 like this:
364 = 3x100 + 6x10 + 4
and later, when we learned about exponents, we could write this as
364 = 3x10^2 + 6x10^1 + 4x10^0
In our number system, we write integers and rational numbers in
place-value notation, where each place indicates a different power
of 10. The Babylonians, of more than 3000 years ago, invented
place-value notation. The Babylonians shared their discovery
with the Hindus of more than 2600 years ago. The Hindus
transmitted it to the Arabs by 600 AD and the Arabs shared it with
the Europeans around 1200 AD. The choice of 10 as the base is
no accident. Look at your fingers! Ten is not the only base to
have been used. The Babylonians sometimes used base 60. The
Mayans used base 20 (also no accident- look at your toes, too!).
Today bases 2, 8 and 16 are used by computers.
Let’s look at base 8. In base 8, we use the digits 0, 1, 2, 3, 4, 5, 6,
and 7 (notice that in base 10 we use the digits 0, 1, 2, 3, 4, 5, 6, 7,
8, and 9). If we want to notate that a number is in base 8 we use a
subscript: 423_8, which is read “four two three base eight”. The
place values in base 8 are 8^0, 8^1, 8^2, 8^3, 8^4, etc. So we have the 1’s
place all the way to the right, then the 8’s place, then the 64’s
place, then the 512’s, then the 4096’s place and so on. If we write
423_8 out in terms of its place values we would have
423_8 = 4x8^2 + 2x8^1 + 3x8^0
This helps in converting the base 8 number to a base 10 number.
423_8 = 4x64 + 2x8 + 3 = 256 + 16 + 3 = 275
What if we want to convert a number from base 10 to base 8?
Let’s convert the number 7492 to a base 8 number. It looks like
the first base 8 place value we will use is the 4096’s place (the next
highest place value is 32768, which we get by multiplying 4096x8,
and this is bigger than 7492 so there are no 32768’s in 7492, but
there is a 4096 in 7492). How many 4096’s are in 7492?
Just one. So 7492 will have a 1 in the 4096’s place. How much is
left of my 7492 after I have taken out 1 4096? 7492-4096 = 3396.
This is bigger than 512, so there will be some 512’s. Divide 3396
by 512 to see how many 512’s there are. 3396/512=6.6328125, so
there are 6 512’s. How much is left after you take out the 6
512’s? Multiply 512 by 6 and subtract that from 3396:
3396 - 512x6 = 324. 324 is bigger than 64 so it will have some 64’s in it.
Again, divide 324 by 64: 324/64 = 5.0625. So there will be 5
64’s in 324. What’s left is 324 - 64x5 = 4. 4 is smaller than 8 so
there will be nothing in the 8’s place and 4 left in the 1’s place. So
we have that 7492 has 1 4096, 6 512’s, 5 64’s, 0 8’s and 4 1’s, so
7492 = 16504_8
Here’s a fun little exercise. Convert the largest 3 digit base 8
number to base 10. First, what is the largest 3 digit base 8
number? It’s 777_8. Now
777_8 = 7x64 + 7x8 + 7x1 = 448 + 56 + 7 = 511
Notice that 511 is one less than the next place value in base 8,
which is 512. Surprising? Not really. Think about the largest 3
digit number in base 10. It is 999 which is one less than 1000.
What about bases bigger than 10? They pose the question of what
notation to use for digits larger than 9. For example let’s think
about the hexadecimal, or base 16, digits. We need digits for 0, 1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. But we can’t use a
number with two digits to describe a base 16 digit. Instead, once
we get past 9, we use letters. We use A for 10, B for 11, C for 12,
and so on until we get to F for 15. Let’s convert the number C5_16 to
base 10:
C5_16 = Cx16 + 5 = 12x16 + 5 = 192 + 5 = 197
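Base conversions are easy to check with a short program. The Python sketch below is an illustration only, and the names DIGITS, to_base and from_base are made up. Note that to_base uses a different technique from the one described above: instead of starting with the largest place value, it divides by the base over and over and collects the remainders, which gives the digits from right to left.

```python
DIGITS = "0123456789ABCDEF"          # digit symbols for bases up to 16

def to_base(number, base):
    """Write a nonnegative base 10 integer in the given base (2 through 16)."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])     # remainders are the digits, rightmost first
    return "".join(reversed(digits))

def from_base(text, base):
    """Convert a string of digits in the given base to a base 10 integer."""
    value = 0
    for ch in text:
        value = value * base + DIGITS.index(ch)
    return value

print(to_base(7492, 8))
print(from_base("423", 8))
print(from_base("777", 8))
print(from_base("C5", 16))
```

Python’s built-in int("C5", 16) does the same job as from_base and makes a handy second check.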
If you’ve built a web page using HTML, you may have used the
hexadecimal system to set your background color on the web page.
Background colors are coded in 6 digit numbers, where the first
two numbers (or letters) are the hexadecimal number for the
amount of red in the color, the next two digits are the hexadecimal
number for the amount of green in the color and the last two
digits are the hexadecimal number for how much blue is in the
color. A color code FF119A will set the background to have as
much red as possible, not much green, and a moderate amount of
blue.
Exercises:
1. Convert 304_8 to base 10.
2. Convert FFD_16 to base 10.
3. Convert 1011_2 to base 10 (base 2 is a computer’s favorite
base; base 2 numbers are also called binary numbers).
4. Convert 4133_6 to base 10
5. Convert 851 to base 8
6. Convert 1389 to hexadecimal
7. Convert 322 to binary
8. Convert 677 to base 9
III. Public Key Cryptography and the RSA System
Suppose your best friend moves to California and you want to
communicate with her via email. If you want to be sure that no
one can intercept and read your messages, you’ll need to encode
them in some way. If you decide to use a substitution cipher, you
must first decide on the specific key, i.e. what substitutions you’ll
make and you need to share this key with your friend, otherwise
she won’t be able to read your emails. But if someone intercepts
your key, they will be able to read all your emails.
For a more realistic scenario, consider purchasing things on
Amazon. When you type your credit card number into the
purchase form, it gets sent through cyberspace to Amazon. This
message could be intercepted, so you certainly want it encoded in a
way so that no one but Amazon can decode it. You don’t want to
meet with a representative from Amazon to decide on your
encoding key. And Amazon has literally millions of customers; they
can’t keep millions of different keys.
The main problem is that substitution ciphers, besides being easy
to crack, are symmetric. That means that anyone who knows how
to encode a message also knows how to decode it. We call this a
private key system, and the effectiveness of the cipher depends
upon the secrecy of the key. Wouldn’t it be nice if there were a
system where, even if someone else finds out how to encode a
message, they still can’t decode one? They could send you a
message using your system, but they couldn’t read any other
message sent in your system.
This is the idea behind Public Key Cryptography. A public key
cryptographic system actually has two keys: one public and one
private. In order to encode a message, all one needs to know is
the public key, but to decode the message you need the private key,
too. Think of having a box with a combination lock. Someone
can put a message in your box and lock it, but only you know the
combination for the lock and so only you can open the box and
read the message. To communicate with your friend in California,
you would each design your own set of keys, one public and one
private. You would tell each other your public keys, but keep
your private keys to yourself. Then to send a message to your
friend, you would encode it using her public key. There would be
no reason for you to know her private key; only she needs to know
it. Since the private keys are never exchanged, nobody can
intercept them, and so no one else can read the messages you send.
Wouldn’t it be nice to have a system like this?
Well there is one! More than one actually, but the one we will
study is called the RSA system. Remember it was named after its
inventors, Ronald Rivest, Adi Shamir, and Leonard Adleman, who
invented it in 1977 and who were very clever. Amazon uses the
RSA system to encode and decode credit card numbers. Everyone
uses Amazon’s public key (they only have one), but no one except
Amazon knows their private key, so no one can decode encoded
credit card numbers. Throughout this project, we have learned all
the math we need to know to use the RSA system. In the RSA
system, the public key is two integers, n and e, and the private key
is a 3rd integer d. In practice, these integers need to be very large;
typically n will have at least 300 digits. Of course, n, e, and d
need to be chosen in a special way to make the system work. In
particular, n will be the product of two distinct large primes, and d
and e will be integers less than n with de = 1 (mod phi(n)). The
private key, d, is secret because given e and n, in order to compute
d, you would have to factor n, and no known method can factor a
very large n in a reasonable amount of time. How would you figure
out the two primes whose product is n?
In the RSA system each user sets up his or her own public and
private keys. The public keys are very public- they are published
in some location so that everyone who wants them can find them.
The private keys are very private. Only the owner of the private
key knows the private key and if someone else finds it out, the
system is no longer any good because that person could then
decode messages sent by the owner of that private key.
Let’s imagine two friends, Alice and Bob, who use the RSA
system to send messages back and forth. We’ll describe how
Alice would choose her keys and then we’ll see how Bob would
send her an encoded message and how she would decode it.
Pick primes: First Alice would pick two large primes, p and q,
each having about 150 digits. For our purposes, we’ll use smaller
primes, say p=281 and q=167.
Calculate n and phi(n): Next Alice sets n=pq. Since Alice
knows the factorization of n, she can compute phi(n)=(p-1)(q-1)
(recall Theorem 7). Since, in a real scenario, n would have about
300 digits, it would be very difficult to compute phi(n) without
knowing the factorization of n and, since n is so large, it would be
impossible to factor.
In our example, we have n=281x167=46,927 and phi(n)=280x166
= 46,480.
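To see why Alice’s small example primes offer no real security, here is a minimal Python sketch (not part of the original project) that factors n by trial division and then recomputes phi(n), just as an eavesdropper could. The function name smallest_factor is made up for this illustration.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n >= 2 (n itself when n is prime)."""
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:        # a composite n has a factor no larger than sqrt(n)
        if n % f == 0:
            return f
        f += 2               # only odd candidates are left to try
    return n

n = 46927
q = smallest_factor(n)       # the smaller of Alice's two primes
p = n // q                   # the larger one
phi = (p - 1) * (q - 1)
print(p, q, phi)             # 281 167 46480
```

For a 5-digit n this loop finishes instantly. For an n with about 300 digits the same loop would need far more steps than any computer could ever carry out, which is why Alice’s real primes must be so large.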
Choose e (the encoding exponent): The next step is to pick a
value e, less than n, at random such that gcd(e,phi(n)) = 1. Alice
does this by first selecting a value for e and then using the
Euclidean Algorithm to see if gcd(e,phi(n))=1. If not, she tries
another e.
We’ll choose e=39,423.
Let’s check that gcd(39,423, 46,480) = 1:
46480 = 1x39423 + 7057
39423 = 5x7057 + 4138
7057 = 1x4138 + 2919
4138 = 1x2919 + 1219
2919 = 2x1219 + 481
1219 = 2x481 + 257
481 = 1x257 + 224
257 = 1x224 + 33
224 = 6x33 + 26
33 = 1x26 + 7
26 = 3x7 + 5
7 = 1x5 + 2
5 = 2x2 + 1
So gcd(39,423, 46,480) = 1.
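The division steps above can be reproduced with a short Python sketch of the Euclidean Algorithm. This is an illustration only; the function name euclid_steps and the way the steps are recorded are choices made for this example.

```python
def euclid_steps(a, b):
    """Return gcd(a, b) and the division steps as (dividend, quotient, divisor, remainder)."""
    steps = []
    while b != 0:
        quotient, remainder = divmod(a, b)
        steps.append((a, quotient, b, remainder))
        a, b = b, remainder          # continue with the divisor and the remainder
    return a, steps

g, steps = euclid_steps(46480, 39423)
for dividend, quotient, divisor, remainder in steps:
    print(f"{dividend} = {quotient}x{divisor} + {remainder}")
print("gcd =", g)
```

The program prints one more line than the list above (2 = 2x1 + 0), because it keeps dividing until the remainder is 0; the last nonzero remainder, 1, is the gcd.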
Find d (the decoding exponent): Now Alice needs to find a
value of d so that de = 1 (mod phi(n)), or, in our case dx39,423 =
1 (mod 46480). She does this by going backwards on the
Euclidean Algorithm. Here are the first few steps:
1 = 5 - 2x2
1 = 5 - 2x(7-5)
1 = 5 - 2x7 + 2x5
1 = 3x5 - 2x7
1 = 3x(26-3x7) - 2x7
1 = 3x26 - 9x7 - 2x7
1 = 3x26 - 11x7
1 = 3x26 - 11x(33-26)
1 = 3x26 - 11x33 + 11x26
1 = 14x26 - 11x33
1 = 14x(224-6x33) - 11x33
1 = 14x224 - 84x33 - 11x33
1 = 14x224 - 95x33
1 = 14x224 - 95x(257-224)
1 = 14x224 - 95x257 + 95x224
1 = 109x224 - 95x257
Continue by substituting in for 224 and 257 and then substituting
again and again until you get
1 = ax39,423 + bx46,480
for some integers a and b. Then
ax39,423 = 1 - bx46,480 = 1 (mod 46,480)
so a is your d (if a comes out negative, add 46,480 to it to get a
positive d). It turns out that d=26,767.
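Going backwards through the Euclidean Algorithm by hand is tedious, so here is a minimal Python sketch of the same idea. It keeps track of the coefficients while the divisions are carried out instead of substituting afterwards, which is a different bookkeeping from the hand method above but produces the same kind of equation 1 = ax39,423 + bx46,480. The function name extended_gcd is made up for this example.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b = g."""
    old_r, r = a, b
    old_s, s = 1, 0      # coefficients of a
    old_t, t = 0, 1      # coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

e, phi = 39423, 46480
g, a, b = extended_gcd(e, phi)   # a*e + b*phi = 1
d = a % phi                      # reduce mod phi(n) so that d is positive
print(d)                         # 26767
```

In Python 3.8 and later, pow(e, -1, phi) computes the same decoding exponent directly.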
Now Alice will publish her public keys: n=46,927 and e=39,423,
but she will keep the secret key d=26,767. She will not tell
anyone the secret key, not even Bob. Incidentally, she should
destroy her records of the values of p, q and phi(n) because these
are no longer needed and if someone found them out, they could
compute her secret decoding exponent, d.
Now we will see how to send an encoded message to Alice using
her public keys. Because the keys are public, anyone can send
Alice an encoded message, but only Alice can decode the
messages because only she knows the decoding exponent.
Suppose Bob wants to send the message “NO WAY” in reply to Alice’s
invitation to go skydiving. Bob is going to first change his
message to a string of numbers. First he needs to break up his
message into blocks that are as large as possible so that frequency
analysis cannot be used to decode his message. He will use blocks
of size k where 27^k is the largest power of 27 which is less
than Alice’s public number n, so that the number for every block
is less than n. Let’s look at powers of 27:
27^1 = 27, 27^2 = 729, 27^3 = 19,683, 27^4 = 531,441. So 27^3 = 19,683
is less than n = 46,927, but 27^4 is greater than n, so Bob will use
blocks of size k=3. We include spaces in the blocks, so he will
break up his message into two blocks: “NO ” and “WAY”.
Next he will convert his alphabetic blocks into numbers, one
number for each block. Let “A” be 0, “B” be 1, “C” be 2, …
“Z” be 25 and “ ” (a space) be 26. He will think of the letters in
each block as being coefficients of a base 27 number, like so
“NO ” = 13x27^2 + 14x27^1 + 26x27^0 = 9,881
because 13 corresponds to “N”, 14 corresponds to “O” and 26
corresponds to “ ”. So now “NO ” corresponds to the base 10
number 9,881. We do the same thing to “WAY”.
“WAY” = 22x27^2 + 0x27^1 + 24x27^0 = 16,062
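Bob’s block-to-number step can be sketched in Python as follows. This is an illustration, not the project’s own program; the names ALPHABET, split_blocks and block_to_number are made up for this example, and padding a short final block with spaces is an assumption borrowed from Exercise 1 of this section.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # A=0, B=1, ..., Z=25, space=26

def split_blocks(message, size):
    """Cut the message into blocks of the given size, padding the end with spaces."""
    padded = message + " " * (-len(message) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]

def block_to_number(block):
    """Read the letters of a block as the digits of a base 27 number."""
    value = 0
    for letter in block:
        value = value * 27 + ALPHABET.index(letter)
    return value

blocks = split_blocks("NO WAY", 3)
print(blocks)                                   # ['NO ', 'WAY']
print([block_to_number(b) for b in blocks])     # [9881, 16062]
```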
Now he raises each number to the encoding exponent
and takes mod n of the result. So to encode “NO ” he
computes 9,881^39,423 (mod 46,927) and to encode “WAY” he
computes 16,062^39,423 (mod 46,927). To do the exponentiation,
Bob uses fast exponentiation. To make this even faster, he uses a
fast exponentiation program at
http://www.nebrwesleyan.edu/people/kpfabe/FastExp,html.
He gets 9,881^39,423 (mod 46,927) = 9,388
and 16,062^39,423 (mod 46,927) = 21,358
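If the web program is not available, fast exponentiation is easy to sketch in Python. The version below uses repeated squaring, reducing mod n after every multiplication so the numbers never get large; it may differ in its details from the way fast exponentiation was presented earlier in the project, and the function name fast_pow is made up for this example.

```python
def fast_pow(base, exponent, modulus):
    """Compute base^exponent (mod modulus) by repeated squaring."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current binary digit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next binary digit
        exponent >>= 1
    return result

print(fast_pow(9881, 39423, 46927))    # 9388
print(fast_pow(16062, 39423, 46927))   # 21358
```

Python’s built-in pow(base, exponent, modulus) does the same job, so pow(9881, 39423, 46927) also gives 9388.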
Now Bob converts his encoded numbers into base 27 numbers. The
encoded numbers could be as large as 46,926, and
27^3 < 46,926 < 27^4, so he needs four place values:
27^3, 27^2, 27^1 and 27^0. We go back to the powers of 27
we computed earlier to compute the place values. Note that 27^2 <
9,388 < 27^3, so we will have 0 in the 27^3 place. Next 27^2 = 729, so
we see how many 729’s are in 9,388. The answer is 12:
9,388 = 12x729 + 640. Now we see how many 27’s are in 640.
The answer is 23: 640 = 23x27 + 19. 19 is less than 27, so there
will be 19 1’s. So 9,388 = 0x27^3 + 12x27^2 + 23x27^1 + 19x27^0
which corresponds to the string “AMXT” because 0 corresponds to
“A”, 12 corresponds to “M”, 23 corresponds to “X” and 19
corresponds to “T”.
Now we have to write 21,358 in base 27. 21,358 has one 27^3 in it.
I’ll let you do the rest of the conversion. You should get
21,358 = 1x27^3 + 2x27^2 + 8x27^1 + 1x27^0 which corresponds to the
string “BCIB”.
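The conversion from a number back to a block of letters can be sketched in Python like this. Instead of asking how many 729’s and then how many 27’s fit, the sketch repeatedly divides by 27 and collects the remainders, which yields the same digits from right to left. The names ALPHABET and number_to_block are made up for this example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # A=0, B=1, ..., Z=25, space=26

def number_to_block(value, size):
    """Write value in base 27 using exactly `size` letters (leading A's stand for 0)."""
    letters = ""
    for _ in range(size):
        value, digit = divmod(value, 27)     # peel off the lowest base 27 digit
        letters = ALPHABET[digit] + letters
    if value != 0:
        raise ValueError("number does not fit in the requested block size")
    return letters

print(number_to_block(9388, 4))    # AMXT
print(number_to_block(21358, 4))   # BCIB
```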
So Bob sends the encoded string AMXTBCIB. Note that he only
used Alice’s public keys, n and e to encode his message.
Now let’s see how Alice decodes Bob’s message. She has to
“undo” what Bob did. She begins by breaking up the string into 2
blocks of 4, “AMXT” and “BCIB”. She knows to use blocks of 4
because she knows the value of her public key, n, and she can
figure out that 27^3 < n < 27^4 so she needs place values of 27^3,
27^2, 27^1 and 27^0. The alphabetic blocks stand for base 27
numbers and she can convert these into base 10 numbers like so:
AMXT = 0x27^3 + 12x27^2 + 23x27^1 + 19x27^0 = 9,388
BCIB = 1x27^3 + 2x27^2 + 8x27^1 + 1x27^0 = 21,358
using that 0 corresponds to “A”, 12 corresponds to “M” and so on.
To decode each number, she raises it to her decoding exponent, d,
and takes the resulting number (mod 46,927). So to decode AMXT,
she computes
9,388^26,767 (mod 46,927) = 9,881
and to decode BCIB, she computes
21,358^26,767 (mod 46,927) = 16,062.
These are the same numbers Bob got when he first converted his
alphabetic blocks to numbers. Now Alice just needs to convert
these numbers to base 27 numbers and use the coefficients to get
the original letters in Bob’s code. We already know the
conversion because we saw Bob do it, but in real life, Alice
wouldn’t have seen Bob do it.
9,881 = 13x27^2 + 14x27^1 + 26x27^0 corresponds to “NO ”
16,062 = 22x27^2 + 0x27^1 + 24x27^0 corresponds to “WAY”
The message Bob sent to Alice is “NO WAY”. Too bad for
Alice, she will have to find someone else to go skydiving with.
Alice decoding Bob’s message looks like magic, but it is really just
clever mathematics.
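Alice’s whole decoding procedure can be put together in one short Python sketch. This is an illustration under the conventions of this section (A=0, …, Z=25, space=26, base 27 blocks); the function names are made up for this example, and Python’s built-in pow(c, d, n) stands in for the fast exponentiation program.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # A=0, B=1, ..., Z=25, space=26

def block_sizes(n):
    """Return (k, k + 1), where 27^k is the largest power of 27 below n."""
    k = 0
    while 27 ** (k + 1) < n:
        k += 1
    return k, k + 1        # plaintext block size, encoded block size

def block_to_number(block):
    value = 0
    for letter in block:
        value = value * 27 + ALPHABET.index(letter)
    return value

def number_to_block(value, size):
    letters = ""
    for _ in range(size):
        value, digit = divmod(value, 27)
        letters = ALPHABET[digit] + letters
    return letters

def decode(ciphertext, n, d):
    """Undo the encoding: letters -> number -> number^d (mod n) -> letters."""
    plain_size, cipher_size = block_sizes(n)
    message = ""
    for i in range(0, len(ciphertext), cipher_size):
        encoded = block_to_number(ciphertext[i:i + cipher_size])
        original = pow(encoded, d, n)          # modular exponentiation
        message += number_to_block(original, plain_size)
    return message

print(decode("AMXTBCIB", 46927, 26767))   # NO WAY
```

Notice that decode needs d, which only Alice knows, while everything Bob did used only n and e.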
Remember that in real life we would use much larger numbers.
We would pick p and q to be very large primes, each with about
150 digits. That way one wouldn’t be able to factor n with a
computer or calculator. If one picks a number with 150 digits,
how does one know whether or not it is prime? The problem of
finding large primes is still an active research area in mathematics.
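One simple, though imperfect, way to test a large number is based on Fermat’s little theorem: if n is prime and a is not a multiple of n, then a^(n-1) = 1 (mod n). If the congruence fails for some a, then n is certainly not prime. The Python sketch below is only an illustration of that idea; the function name probably_prime and the choice of bases 2, 3, 5 and 7 are assumptions made for this example, and it is not necessarily the method used in practice. A few composite numbers pass this test, so real systems use stronger tests such as the Miller-Rabin test.

```python
def probably_prime(n, bases=(2, 3, 5, 7)):
    """Fermat test with a few small prime bases: False means certainly composite."""
    if n < 2:
        return False
    for a in bases:
        if n == a:
            return True
        if n % a == 0:                 # divisible by a small prime
            return False
        if pow(a, n - 1, n) != 1:      # Fermat's congruence fails, so n is composite
            return False
    return True                        # n passed every test, so it is probably prime

print(probably_prime(281), probably_prime(167), probably_prime(46927))
```

The test only needs a few fast exponentiations, so it works just as quickly on numbers with 150 digits, where trial division would be hopeless.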
Exercises:
1. Encode a short message for Alice, using her public keys.
Break the message into blocks of three letters. If the length of the
message is not divisible by three, then pad the end with spaces.
When you encode your message, notice that you only use her
public keys, n and e. When I pretend to be Alice and decode your
messages, I will have to use d.
2. Suppose you want to receive a coded message from me and suppose you
pick p=701 and q=331. Then n=701x331 = 232,031 and
phi(n)=(p-1)(q-1) = 231,000. Suppose you also pick e=10163. This value
of e is prime and does not divide 231,000 so gcd(e,phi(n))=1. So
your public keys are n=232031 and e=10163. Compute your
private decoding key (remember we want it to be positive) and
then get rid of your values for p, q and phi(n), so that no one can
figure out your value of d (in real life you would pick your own
values of p and q, but if you did, I would have to deal with many
different public keys and encode my message differently for each
one of you). You can check with me that you got the right
decoding exponent (again, in real life, since you would each have
your own public keys, you would each have your own decoding
exponent, d, and I wouldn’t know what d was because I wouldn’t
know what your p and q are, but if you each had your own
decoding exponent, I would have to check all of them to make sure
you computed it correctly). These will be your keys for decoding
a message from me. Here is my message to you:
HBORDFPOKNMP. Note that when I encoded this message, I
only used your values for n and e. Now you should decode this
message using your values for n and d.