# Cryptography ```Cryptography:
from substitution cipher to RSA
Alex Karassev
Description of the problem
• Alice wants to send a secret message to Bob
• Problem: this message can be captured
• Possible ways to intercept a message:
 steal a letter from a mailbox
 wiretap or listen in on a phone call
 steal data from a network cable
 use a hacker attack
Possible solutions
• Conceal the fact of sending
• Conceal the text of the message
Conceal the fact of sending
• Use “invisible” ink
• “Hide” the message inside a larger text
• Use very small font (also used as anti-counterfeiting
measure: look at dollar bills for example)
• Example from ancient time: shaving the head of a slave
and tattooing the message on the slave's bald scalp.
When his hair had grown enough to conceal the tattoo,
the slave was sent to another person who shaved the
slave's head to read the message
However, in all these examples the message is sent as
plaintext (i.e. non-encrypted text)
Conceal the text of the message
• Use a cipher or, more precisely, encryption
• Encryption is a process that transforms the
plaintext of the message into a different text
(which usually looks like a meaningless
collection of symbols or numbers) according to
a certain algorithm
• The resulting text is called ciphertext (or
encrypted message)
• Algorithms used are called ciphers
• The study of ciphers is called cryptography
How to make all this happen?
Alice sends a message to Bob…
• Alice takes the message and replaces each
letter in the message by some other letter or
symbol (or by several symbols) according to
some cipher
• She then sends it to Bob
• Nobody except Bob knows how to read
the encrypted message
• Bob receives the message and uses the
process of deciphering (or decrypting) to read
the message
The oldest algorithm:
Substitution cipher
• Each letter of the alphabet is replaced by
another letter or symbol, or several
symbols
• Example: A → 1, B → 2, C → 3 and so
on
• Less trivial example:
• A → 26, B → 25, C → 24, …, Z → 1
Main Requirement
• We must have a procedure allowing us to
restore plaintext from encrypted text
without ambiguities
• Then we need…
• One-to-one correspondence: different
letters of the alphabet are replaced by different
symbols
• Substitution table:
A  B  C  D  E  F  G  H  I  J  K  L  M
26 25 24 23 22 21 20 19 18 17 16 15 14

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 12 11 10 9  8  7  6  5  4  3  2  1
• Immediately, we have a problem:
What is 262524?
• Is it ABC?
• Or is it YUYVYW?
• Or maybe ABYW?
• Also, we need to encode spaces between
words
A  B  C  D  E  F  G  H  I  J  K  L  M
26 25 24 23 22 21 20 19 18 17 16 15 14

N  O  P  Q  R  S  T  U  V  W  X  Y  Z
13 12 11 10 09 08 07 06 05 04 03 02 01
• It would be better to use the following cipher:
• A → 26, …, X → 03, Y → 02, Z → 01
and space is 00
• We know that every TWO symbols represent a letter
• Thus
• 14260719001808000719220025220807
• is…
• MATH IS THE BEST
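The two-digit cipher above can be sketched in a few lines of Python. This is only an illustration: the names ENCODE, DECODE, encrypt and decrypt are made up for this sketch and do not come from the slides.

```python
import string

# Table from the slides: A -> 26, B -> 25, ..., Z -> 01, and space -> 00.
ENCODE = {letter: f"{26 - i:02d}" for i, letter in enumerate(string.ascii_uppercase)}
ENCODE[" "] = "00"
# One-to-one correspondence, so the table can simply be inverted.
DECODE = {code: letter for letter, code in ENCODE.items()}


def encrypt(plaintext: str) -> str:
    """Replace every letter or space by its two-digit code."""
    return "".join(ENCODE[ch] for ch in plaintext.upper())


def decrypt(ciphertext: str) -> str:
    """Read the ciphertext two digits at a time and look each pair up."""
    pairs = (ciphertext[i:i + 2] for i in range(0, len(ciphertext), 2))
    return "".join(DECODE[pair] for pair in pairs)


print(encrypt("ABC"))                         # 262524
print(decrypt(encrypt("MATH IS THE BEST")))   # MATH IS THE BEST
```

Because every code has exactly two digits, 262524 can now only be read as 26, 25, 24, that is, ABC.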
A historical example of substitution
cipher – shift (or Caesar) cipher
ABCDEFGHIJKLMNOPQRSTUVWXYZ
• Choose k
• Shift all letters by k
• For example, if k = 5
• A becomes F, B becomes G, C becomes H,
and so on…
• What will replace X?
Modular arithmetic
• In the Caesar cipher, the following algorithm is used
• If n is the number of a letter in the alphabet, this letter
is replaced by another letter, whose number is
(n+k) modulo 26 (shortly (n+k) mod 26)
• This is the remainder of the division of (n+k) by 26
(a remainder of 0 stands for Z, the 26th letter)
• For example, take k=5 and take letter X
• Its number n = 24
• (n+k) mod 26 = (24 + 5) mod 26 = 29 mod 26 = 3
• So X is replaced with C
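A minimal Python sketch of the shift cipher, assuming letters are numbered A = 1, …, Z = 26 as in the example with X. The function names are hypothetical, and a remainder of 0 is treated as 26 (the letter Z).

```python
import string

ALPHABET = string.ascii_uppercase


def shift_letter(letter: str, k: int) -> str:
    """Shift one capital letter by k positions, wrapping around after Z."""
    n = ALPHABET.index(letter) + 1   # number of the letter, A = 1
    m = (n + k) % 26                 # (n + k) mod 26
    if m == 0:                       # remainder 0 stands for the 26th letter
        m = 26
    return ALPHABET[m - 1]


def caesar_encrypt(text: str, k: int) -> str:
    """Encrypt the letters of text; spaces and other symbols are dropped."""
    return "".join(shift_letter(ch, k) for ch in text.upper() if ch in ALPHABET)


def caesar_decrypt(text: str, k: int) -> str:
    """Decrypting is shifting back by k."""
    return caesar_encrypt(text, -k)


print(shift_letter("X", 5))  # C
```

The same key k is used for both directions, so Alice and Bob only need to agree on one number.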
Exercise
• Use shift cipher with k=7 to encrypt the
text “TOP SECRET” (ignore space)
Is substitution cipher really good?
• It seems it satisfies the main condition:
if Alice and Bob agree on the table for
substitution then they can exchange
messages and keep them secret
• Is it really the case?
• Substitution cipher is completely ruined by
FREQUENCY ANALYSIS
Frequency analysis
• Discovered by Arabs (approx. 9th century)
• Statistically, it is possible to determine how
often each letter appears in an “average” text
• Frequency table
• Other useful observations:
 ST, NG, TH, and QU are common pairs of letters
(bigrams), while NZ and QJ are rare.
 What is one of the most common trigrams?
 THE
 Letters that often appear at the beginnings of words
To analyze the text…
• Count the number of appearances of
each letter and divide it by the total
number of letters in the ciphertext
• Compare the results with the frequency
table
• Note: this method applies effectively to
sufficiently large texts
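The counting step could look as follows in Python, for a ciphertext written in letters (for instance, the output of a shift cipher). The function name and the sample string are made up for illustration; the sample is far too short for a real analysis.

```python
from collections import Counter


def letter_frequencies(ciphertext: str) -> dict[str, float]:
    """Relative frequency of each letter, most frequent first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)  # total number of letters, not words
    return {letter: count / total for letter, count in counts.most_common()}


# Hypothetical sample ciphertext, used only to show the call.
sample = "FYYFHP FY IFBS"
for letter, freq in letter_frequencies(sample).items():
    print(letter, round(freq, 3))
```

The most frequent ciphertext letters are then matched against the most frequent letters of the language (E, T, A, … in English) to guess the substitution.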
Weaknesses of the substitution cipher
• Every letter is ALWAYS encoded by the same
symbol, which makes frequency analysis a very
effective tool
• Another problem: knowing the context of the message is
very useful (for instance, if you know that the text is
about the types and number of airplanes in the
enemy’s army, you should expect the words “airplane”,
“aircraft”, “weapon”, and so on)
• Guesses also help: for instance, if you deciphered a
part of the sentence “A cat drinks …”
you would guess that the last word is “milk” (although
Examples from Literature and History
• the use of frequency analysis to attack simple
substitution ciphers:
 Arthur Conan Doyle, the Sherlock Holmes tale “The Adventure of
the Dancing Men”
 Edgar Allan Poe, “The Gold-Bug”
• Greeks: scytale cipher
• Morse code
• WWII: Enigma machine
• ASCII (American Standard Code for Information
Interchange): each character is encoded by a number
from 0 to 127 in binary format
Alternatives to substitution cipher
• Symmetric key algorithms (private key)
• Algorithms with two keys
(private and public)
Symmetric key algorithm:
simplified description
• Block cipher: a
plaintext is divided
into blocks of equal
length
• Key: a sequence of
symbols
(or, equivalently, a
number)
Encryption and Decryption with
symmetric key: mathematical model
• Encryption
 Plaintext PT
 Cipher defines a function F that has two
arguments: PT and a key k
 Ciphertext CT = F(PT,k)
• Decryption
 PT = F⁻¹(CT,k)
Example: Vernam’s cipher
• Represent each letter in binary form
• Transform the plaintext into the sequence of
0’s and 1’s
• Choose a length of the key (say, 7 digits)
• Generate a random 7-digit key
• Divide the plaintext sequence into blocks
(7 digits in each block)
• Add the key to each block digit by digit modulo 2
to encrypt the message
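The steps above can be sketched in Python as follows. The sketch assumes plain ASCII text, so that each character fits into 7 binary digits, and all function names are made up for illustration. Note that it reuses one 7-digit key for every block, as in the simplified description above.

```python
import secrets

BLOCK = 7  # key length in binary digits, as in the example above


def to_bits(text: str) -> str:
    """Write each ASCII character as a 7-digit binary number."""
    return "".join(f"{ord(ch):07b}" for ch in text)


def from_bits(bits: str) -> str:
    """Turn every 7-digit block back into a character."""
    return "".join(chr(int(bits[i:i + BLOCK], 2)) for i in range(0, len(bits), BLOCK))


def add_key(bits: str, key: str) -> str:
    """Add the key to every block, digit by digit, modulo 2."""
    return "".join(
        str((int(b) + int(key[i % len(key)])) % 2) for i, b in enumerate(bits)
    )


key = "".join(secrets.choice("01") for _ in range(BLOCK))  # random 7-digit key
ciphertext = add_key(to_bits("HELLO"), key)
print(from_bits(add_key(ciphertext, key)))  # HELLO
```

Adding the same key a second time restores the plaintext, because 1 + 1 = 0 modulo 2. Encryption and decryption are therefore the same operation.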
Vernam's cipher
• It was mathematically proved
unbreakable by Claude Shannon (in
1940s) assuming that the following
conditions are satisfied:
 Key is as long as the plaintext
 Key is random
 Key is used only once and kept entirely secret
Main problem with symmetric cipher
• If Alice wants to send encrypted messages to Bob
she needs to send the key to Bob
• This is not safe
• It would be nice if
 Bob lets Alice know the algorithm
 Bob sends Alice a key called PUBLIC KEY (more precisely,
Bob’s public key) (so everybody knows it)
 Alice uses this key to send an encrypted message to Bob
 Bob uses another key (PRIVATE KEY) to decipher the
message from Alice
 Nobody else (who does not know Bob’s private key) can
decipher this message
Is it at all possible?
YES!
• The RSA algorithm is one such
method
• The algorithm was described in
1977 by Ron Rivest, Adi Shamir, and Leonard Adleman
• RSA is an “unbreakable” algorithm
Why is RSA so good?
• Factorization into primes
• A prime number is a positive integer greater
than 1 that is divisible only by 1 and itself
 so 2, 3, 5, 7, 11, … are prime numbers
 while 84 = 3*4*7 is not
• Fundamental theorem of arithmetic: every
integer greater than 1 is a product of prime numbers.
Such a representation is unique (up to permutation of
factors)
• Exercise: show that there are infinitely many primes
Why is RSA so good?
• At the present time, no effective algorithm for
finding the prime factorization of large numbers is known
(this does not mean, however, that such an algorithm
cannot exist!)
• Deciding whether a given number is prime is an easier
problem: effective primality tests are known, but they
do not reveal the factors of a composite number
• Effective means an algorithm that works
“reasonably” fast
• Example: is 943 prime?
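For small numbers such as this one, the question can be answered by trial division, sketched below in Python with made-up function names. The loop tries every candidate divisor up to the square root of n, which is exactly the kind of search that stops being “reasonably” fast when n has hundreds of digits.

```python
def smallest_factor(n: int) -> int:
    """Smallest divisor of n greater than 1 (n itself if n is prime); assumes n >= 2."""
    d = 2
    while d * d <= n:       # a composite n has a divisor not exceeding sqrt(n)
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime(n: int) -> bool:
    return n > 1 and smallest_factor(n) == n


def prime_factors(n: int) -> list[int]:
    """Prime factorization of n, smallest factors first."""
    factors = []
    while n > 1:
        p = smallest_factor(n)
        factors.append(p)
        n //= p
    return factors


print(prime_factors(84))  # [2, 2, 3, 7]
```

Calling is_prime(943) or prime_factors(943) settles the example above.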