Cryptography: from substitution cipher to RSA
Alex Karassev

Description of the problem
• Alice wants to send a secret message to Bob
• Problem: this message can be captured
• Possible ways to intercept a message:
  • steal a letter from a mailbox
  • wiretap or listen in on a phone call
  • steal data from a network cable
  • use a hacker attack

Possible solutions
• Conceal the fact of sending
• Conceal the text of the message

Conceal the fact of sending
• Use “invisible” ink
• “Hide” the message inside a larger text
• Use a very small font (also used as an anti-counterfeiting measure: look at dollar bills, for example)
• Example from ancient times: shaving the head of a slave and tattooing the message on the slave's bald scalp. When his hair had grown enough to conceal the tattoo, the slave was sent to another person, who shaved the slave's head to read the message
• However, in all these examples the message is sent as plaintext (i.e. non-encrypted text)

Conceal the text of the message
• Use a cipher or, more precisely, encryption
• Encryption is a process that transforms the plaintext of the message into a different text (which usually looks like a meaningless collection of symbols or numbers) according to a certain algorithm
• The resulting text is called ciphertext (or encrypted message)
• The algorithms used are called ciphers
• The study of ciphers is called cryptography

How to make all this happen?

Alice sends a message to Bob…
• Alice takes the message and replaces each letter in the message by some other letter or symbol (or by several symbols) according to some cipher
• She then sends it to Bob
• Nobody except Bob knows how to read the encrypted message
• Bob receives the message and uses the process of deciphering (or decrypting) to read it

The oldest algorithm: Substitution cipher
• Each letter of the alphabet is replaced by another letter or symbol, or by several symbols
• Example: A → 1, B → 2, C → 3, and so on
• Less trivial example: A → 26, B → 25, C → 24, …, Z → 1

Main Requirement
• We must have a procedure that allows us to restore the plaintext from the encrypted text without ambiguity
• Then we need…
• One-to-one correspondence: different letters of the alphabet are replaced by different symbols
• Substitution table:

A  B  C  D  E  F  G  H  I  J  K  L  M
26 25 24 23 22 21 20 19 18 17 16 15 14

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 12 11 10 9  8  7  6  5  4  3  2  1

• Immediately, we have a problem: what is 262524?
• Is it ABC?
• Or is it YUYVYW?
• Or maybe ABYW?
• Also, we need to encode spaces between words
• It would be better to use the following cipher:
• A → 26, …, X → 03, Y → 02, Z → 01, and space is 00

A  B  C  D  E  F  G  H  I  J  K  L  M
26 25 24 23 22 21 20 19 18 17 16 15 14

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 12 11 10 09 08 07 06 05 04 03 02 01

• We know that every TWO symbols represent a letter
• Thus
• 14260719001808000719220025220807
• is…
• MATH IS THE BEST

A historical example of substitution cipher – shift (or Caesar) cipher
ABCDEFGHIJKLMNOPQRSTUVWXYZ
• Choose k
• Shift all letters by k
• For example, if k = 5
• A becomes F, B becomes G, C becomes H, and so on…
• What will replace X?
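
The two substitution ciphers described above (the two-digit table with A → 26, …, Z → 01, space → 00, and the shift cipher) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to contain only the letters A to Z and spaces, and letters are indexed from 0 inside the code, so wrapping past Z is done by taking the remainder of division by 26.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Two-digit table: A -> 26, B -> 25, ..., Z -> 01, space -> 00.
ENCODE = {letter: f"{26 - i:02d}" for i, letter in enumerate(ALPHABET)}
ENCODE[" "] = "00"
# One-to-one correspondence, so the table can be inverted for decryption.
DECODE = {code: letter for letter, code in ENCODE.items()}


def table_encrypt(plaintext: str) -> str:
    """Replace every letter (or space) by its two-digit code."""
    return "".join(ENCODE[ch] for ch in plaintext.upper())


def table_decrypt(ciphertext: str) -> str:
    """Read the ciphertext two digits at a time and look each pair up."""
    pairs = [ciphertext[i:i + 2] for i in range(0, len(ciphertext), 2)]
    return "".join(DECODE[pair] for pair in pairs)


def shift_encrypt(plaintext: str, k: int) -> str:
    """Shift every letter by k positions, wrapping around after Z."""
    shifted = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + k) % 26])
        else:
            shifted.append(ch)  # leave spaces and other symbols unchanged
    return "".join(shifted)


def shift_decrypt(ciphertext: str, k: int) -> str:
    """Undo a shift by k by shifting in the opposite direction."""
    return shift_encrypt(ciphertext, -k)


if __name__ == "__main__":
    secret = table_encrypt("MATH IS THE BEST")
    print(secret, "->", table_decrypt(secret))
    print(shift_encrypt("ABC", 5))  # FGH
```

Because every symbol takes exactly two digits, `table_decrypt` never has to guess where one letter ends and the next begins, which is the ambiguity the one-or-two-digit table suffers from.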

Modular arithmetic
• The Caesar cipher uses the following algorithm
• If n is the number of a letter in the alphabet, this letter is replaced by the letter whose number is (n+k) modulo 26 (written (n+k) mod 26 for short)
• This is the remainder of the division of (n+k) by 26 (a remainder of 0 stands for Z, the 26th letter)
• For example, take k = 5 and the letter X
• Its number is n = 24
• (n+k) mod 26 = (24 + 5) mod 26 = 29 mod 26 = 3
• So X is replaced with C

Exercise
• Use the shift cipher with k = 7 to encrypt the text “TOP SECRET” (ignore the space)

Is substitution cipher really good?
• It seems to satisfy the main condition: if Alice and Bob agree on the substitution table, then they can exchange messages and keep them secret
• Is it really the case?
• Substitution cipher is completely ruined by FREQUENCY ANALYSIS

Frequency analysis
• Discovered by Arabs (approx. 9th century)
• Statistically, it is possible to determine how often each letter appears in an “average” text
• Frequency table
• Another useful observation: ST, NG, TH, and QU are common pairs of letters (bigrams), while NZ and QJ are rare. What is one of the most common trigrams?
THE

Letters that often appear at the beginnings of words

To analyze the text…
• Count the number of appearances of each letter and divide it by the total number of letters in the ciphertext
• Compare the results with the frequency table
• Note: this method works well only for sufficiently long texts

Weaknesses of the substitution cipher
• Every letter is ALWAYS encoded by the same symbol, which makes frequency analysis a very effective tool
• Another problem: knowing the context of the message is very useful (for instance, if you know that the text is about the types and number of airplanes in the enemy’s army, you should expect the words “airplane”, “aircraft”, “weapon”, and so on)
• Guesses also help: for instance, if you have deciphered the part of a sentence “A cat drinks …”, you would guess that the last word is “milk” (although it could be “lemonade”)

Examples from Literature and History
• The use of frequency analysis to attack simple substitution ciphers:
  • Arthur Conan Doyle, the Sherlock Holmes tale “The Adventure of the Dancing Men”
  • Edgar Allan Poe, “The Gold-Bug”
• Greeks: scytale cipher
• Morse code
• WWII: Enigma machine
• ASCII (American Standard Code for Information Interchange): each character is encoded by a number from 0 to 127 in binary format

Alternatives to substitution cipher
• Symmetric key algorithms (private key)
• Algorithms with two keys (private and public)

Symmetric key algorithm: simplified description
• Block cipher: the plaintext is divided into blocks of equal length
• Key: a sequence of symbols (or, equivalently, a number)

Encryption and Decryption with symmetric key: mathematical model
• Encryption
  • Plaintext PT
  • The cipher defines a function F that has two arguments: PT and a key k
  • Ciphertext CT = F(PT, k)
• Decryption
  • PT = F^{-1}(CT, k)

Example: Vernam’s cipher
• Represent each letter in binary form
• Transform the plaintext into a sequence of 0’s and 1’s
• Choose a length for the key (say, 7 digits)
• Generate a random 7-digit key
• Divide this sequence
into blocks (7 digits in each block)
• Use addition modulo 2 to encrypt the message

Vernam's cipher
• It was mathematically proved unbreakable by Claude Shannon (in the 1940s), assuming that the following conditions are satisfied (one-time pad):
  • The key is as long as the plaintext
  • The key is random
  • The key is used only once and kept entirely secret

Main problem with symmetric cipher
• If Alice wants to send encrypted messages to Bob, she needs to send the key to Bob
• This is not safe
• It would be nice if Bob could let Alice know the algorithm openly:
  • Bob sends Alice a key called the PUBLIC KEY (more precisely, Bob’s public key), so everybody knows it
  • Alice uses this key to send an encrypted message to Bob
  • Bob uses another key (the PRIVATE KEY) to decipher the message from Alice
  • Nobody else (who does not know Bob’s private key) can decipher this message

Is it at all possible?
• YES!
• The RSA algorithm is one such method
• The algorithm was described in 1977 by Ron Rivest, Adi Shamir and Len Adleman at MIT
• RSA is an “unbreakable” algorithm

Why is RSA so good?
• Factorization into primes
• A prime number is a positive integer greater than 1 that is divisible only by 1 and itself, so 2, 3, 5, 7, 11, … are prime numbers, while 84 = 3*4*7 is not
• Fundamental theorem of arithmetic: every integer greater than 1 is a product of prime numbers. Such a representation is unique (up to permutation of factors)
• Exercise: prove that there are infinitely many primes

Why is RSA so good?
• At present, no effective algorithm for finding the prime factorization of large numbers is known (this does not mean, however, that such an algorithm cannot exist!)
• Note that deciding whether a given number is prime is an easier problem: effective primality tests are known, but they do not reveal the factors of a composite number
• Effective means an algorithm that works “reasonably” fast
• Example: is 943 prime?
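
The question of whether 943 is prime can be explored with trial division, the most naive way to look for prime factors. The sketch below is an illustration only, with hypothetical function names; it is not how numbers of the size used in RSA are handled. It tries divisors up to the square root of n, which is quick for a three-digit number but hopeless for a number with hundreds of digits.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest prime factor of n (n itself when n is prime)."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    d = 2
    # A composite n always has a factor no larger than its square root.
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime(n: int) -> bool:
    """n is prime exactly when its smallest prime factor is n itself."""
    return n >= 2 and smallest_factor(n) == n


def factorize(n: int) -> list[int]:
    """Return the prime factors of n in non-decreasing order."""
    factors = []
    while n > 1:
        p = smallest_factor(n)
        factors.append(p)
        n //= p
    return factors


if __name__ == "__main__":
    print(is_prime(943))
    print(factorize(943))
    print(factorize(84))
```

The loop in `smallest_factor` runs roughly as many times as the square root of n in the worst case, so each additional pair of digits in n multiplies the work by about ten. That growth is the reason this approach is not "effective" in the sense used above.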