User Manual (MS Word 97)

CYPHER USER MANUAL
INTRODUCTION
Description
Cypher is a software toolkit designed to aid in the decryption of standard
(historical) ciphers by providing statistical data and algorithmic analysis on encrypted
messages. Most modern forms of encryption use the relatively recently discovered
method of public-key cryptography, which, for the purposes of this software, is currently
unbreakable and thus is not addressed. Cypher also allows users to encrypt
messages using these same historical encryption algorithms and techniques. The
toolkit is therefore intended to serve both as a cryptographic learning tool and as a pastime for amateur
cryptographers.
Assumptions
This software is best used by someone with a basic background in historical
cryptography, as much of the manipulation of encrypted messages is user-guided. Because not
all users may be familiar with historical cryptography, a brief background to the field is
included in this manual. The automated decryption algorithms can also be run, but
these are limited in scope and lack the ingenuity of the human brain, which historically
has prevailed against the most trying odds in cipher decryption.
Intended Users
Anyone wanting to learn to use basic historical encryption and decryption
techniques will find this software useful. Some advanced ciphers can be decrypted (or
encrypted) with this software, but these are methods that have historically been
overcome and so Cypher does not provide secure data encryption at a professional level.
Organization
This manual is divided into six sections:
1. Introduction:
A brief overview of the software.
2. Installation:
Instructions for installing and running Cypher.
3. Getting Started:
A step-by-step guide to solving a
monoalphabetic, a polyalphabetic, and a
transposition cipher with user assistance.
4. Additional Functions and Features:
A detailed guide to all the options, functions, and
features of Cypher, including cipher type
selection, type-specific options, display options,
and automated cipher decryption.
5. Background:
A basic historical background necessary to
maximize the software’s decryption potential.
6. Glossary:
A list of common cryptographic terms and their
definitions.
INSTALLATION
Unix Machines
Download cypher.tar.gz from the Cypher homepage. Uncompress the file by typing
the following commands:
gunzip cypher.tar.gz
tar -xvf cypher.tar
Move into the cypher directory by typing cd cypher and follow instructions found in the
README file.
Microsoft Windows
Download the binary executable, Cypher.exe, from the Cypher homepage. Start the
program by double-clicking on the Cypher.exe icon.
GETTING STARTED
User-assisted Cipher Decryption of a Simple Monoalphabetic Cipher
After starting up the program, you should see a window similar to the following on
your screen:
The layout is fairly simple: there are four windows and a toolbar. The two text
windows and the key palette (at the bottom of the screen) are action windows, where the
user can perform text manipulations or substitutions. The top left window is a reference
tool that allows the user to compare the statistical results for a message with those of the English
language. Finally, the bottom left window is a display window for the results of any
analysis performed on the encrypted message.
In the course of decryption, the user may modify any of the action windows and
thereby affect the decryption or partial decryption of the message in the decryption window:
• Modifying text in the cipher window changes the source text for the decryption
window, in which each cipher character is replaced by a plaintext character
according to the key palette.
• Modifying the key palette changes which plaintext character replaces a given
cipher character when it is displayed in the decryption window.
• Changing a character in the decryption window has the same effect as
modifying the corresponding entry in the key palette: the decryption is redrawn
according to the new key settings.
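As a rough illustration of how the three action windows relate, the Python sketch below derives the decryption-window text from cipher text and a key palette. The function name `apply_key` and the dictionary representation of the palette are hypothetical, not Cypher's actual implementation; the convention of leaving still-unknown cipher letters in uppercase and showing plaintext in lowercase is assumed from the examples in this manual.

```python
from typing import Dict


def apply_key(ciphertext: str, key: Dict[str, str]) -> str:
    """Build the decryption-window text from the cipher window and key palette.

    `key` maps uppercase cipher letters to lowercase plaintext letters
    (a hypothetical representation of the key palette). Cipher letters with
    no palette entry are shown in uppercase; other characters pass through.
    """
    shown = []
    for ch in ciphertext:
        upper = ch.upper()
        if upper in key:
            shown.append(key[upper].lower())
        elif ch.isalpha():
            shown.append(upper)
        else:
            shown.append(ch)
    return "".join(shown)


palette = {"M": "e", "T": "t"}
print(apply_key("TNM YMX", palette))  # tNe YeX

# Typing "h" over "N" in the decryption window amounts to a palette edit:
palette["N"] = "h"
print(apply_key("TNM YMX", palette))  # the YeX
```

Because the display is always recomputed from the cipher text and the palette, editing either action window gives the same result.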
Our first task is to input the encrypted message. This can be done in two ways:
either by typing the encrypted message in by hand, or by loading the message
from a text file by choosing “Open…” from the File menu in the cipher window. At any
point when the message is displayed in the cipher window, it can be saved to disk by
choosing “Save…” or “Save As…” from the File menu in the cipher window. (Similarly, a
message in the decryption window can be saved as plain text at any time by using the
File menu in that window.) Once the message has been entered, choose “Begin
Decryption” from the Options menu in the cipher window, and the encoded message will
be copied to the decryption window:
At this point we probably want to run a frequency analysis on the encrypted
message. To do this, click on the “%” icon on the toolbar on the right of the screen. When
the analysis is run, statistics are computed automatically for single character frequencies,
digram frequencies, and trigram frequencies, but by default only the single character
frequencies are displayed. In order to view the other results, or to view any results in
histogram format, choose the appropriate display option from the application’s View
menu. The results for the message will automatically be displayed in the message
frequency window on the left of the screen:
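The frequency analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not Cypher's code; the function name `ngram_frequencies` is hypothetical, and it assumes that non-letters are discarded before counting, so digrams and trigrams may span word boundaries.

```python
from collections import Counter
from typing import List, Tuple


def ngram_frequencies(text: str, n: int = 1) -> List[Tuple[str, float]]:
    """Return (n-gram, percentage) pairs, most frequent first.

    Assumption: non-letters are discarded before counting, so digrams and
    trigrams may span word boundaries.
    """
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    grams = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return [(gram, 100.0 * count / total) for gram, count in counts.most_common()]


message = "MJX QMJ XMJ"  # stand-in cipher text
for size, label in ((1, "single characters"), (2, "digrams"), (3, "trigrams")):
    print(label, ngram_frequencies(message, size)[:3])
```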
With this information we can start guessing at the substitutions. All of M, J, and
X have high frequencies in the encrypted message, so one of them is probably “e” in
plain text. We might try some combinations of these by inputting the corresponding plain
text letters in the key palette at the bottom of the screen. Assuming we guess correctly,
here are the results of those substitutions and the corresponding display on the screen
(we will assume correct guesses until a little later in this example):
At this point, we might assume that “tNe” represents “the” and so “N” is “h” in
plain text. Using these intuitive observations along with the frequency tables for single
characters, digrams, and trigrams, one might arrive at this partial decryption:
By now, several potential words have been completed. If the dictionary auto-lookup is
enabled, strings of plain text characters are checked against a standard dictionary file
as they lengthen: a string is underlined in green if there is a match, and in red if there
is only a partial match or the word appears to be misspelled. In the example above,
there are no misspellings.
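A simplified model of the dictionary auto-lookup might look like the following Python sketch. It assumes the dictionary file has been loaded into a set of lowercase words; the name `lookup_marks` is hypothetical, and the rule that only fully deciphered (all-lowercase) words are checked is an assumption, since the manual does not say exactly how partial matches are detected.

```python
import re
from typing import List, Set, Tuple


def lookup_marks(decryption: str, dictionary: Set[str]) -> List[Tuple[str, str]]:
    """Mark each fully deciphered word as 'green' (in the dictionary) or 'red'.

    Assumption: only words made entirely of lowercase (deciphered) letters
    are checked; words still containing uppercase cipher letters are skipped.
    """
    marks = []
    for word in re.findall(r"[A-Za-z]+", decryption):
        if word.islower():
            marks.append((word, "green" if word in dictionary else "red"))
    return marks


words = {"the", "ball", "wall", "is", "tall"}
print(lookup_marks("the Yall is tall", words))
```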
Considering the substring “Yall”, we might guess that “Y” represents “b” since
“ball” is a valid word. However, making this substitution produces the following display
(note the words underlined in red, which are misspelled as a result of this substitution):
We can see that we have made a mistake, so we delete “b” from below “Y” in the
key palette, and reassess, this time coming up with “w” as a possible match. The
dictionary auto-lookup recalculates the matches and this time there are no misspelled
words:
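When reassessing a partial word such as “Yall”, it can help to list every dictionary word that fits the pattern. The Python sketch below is a hypothetical helper, not a documented Cypher feature: lowercase letters in the pattern must match exactly, each uppercase cipher letter must stand for one plaintext letter consistently, and plaintext letters already assigned in the key palette (the assumed `used` set) are ruled out.

```python
from typing import Dict, Iterable, List, Set


def candidates(pattern: str, dictionary: Iterable[str], used: Set[str]) -> List[str]:
    """List dictionary words that fit a partially deciphered pattern.

    Uppercase letters in `pattern` are unknown cipher letters; every other
    character must match exactly. Two different cipher letters may not share
    a plaintext letter, and letters in `used` are excluded for unknowns.
    """
    matches = []
    for word in dictionary:
        if len(word) != len(pattern):
            continue
        forward: Dict[str, str] = {}
        backward: Dict[str, str] = {}
        fits = True
        for p, w in zip(pattern, word):
            if not p.isupper():
                fits = p == w
            else:
                fits = (w not in used
                        and forward.setdefault(p, w) == w
                        and backward.setdefault(w, p) == p)
            if not fits:
                break
        if fits:
            matches.append(word)
    return sorted(matches)


words = ["ball", "call", "hall", "mall", "tall", "wall", "walk"]
print(candidates("Yall", words, used={"a", "l", "t", "h", "e"}))
# ['ball', 'call', 'mall', 'wall']
```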
Entering keys for the remaining few cipher text letters produces a completely
deciphered message with no errors (note that proper names or other uncommon words
may not be recognized by the dictionary auto-lookup, so a message that is
completely deciphered may still have “error” markings; here the dictionary auto-lookup has
been disabled):
Solving a Polyalphabetic Cipher
The display for a polyalphabetic cipher is very similar to that of the
monoalphabetic cipher. However, the message frequencies do not lead to a ready
decryption as they do in the case of the monoalphabetic cipher. The additional repetition search
tool (denoted “XYZ,XYZ” on the toolbar) displays results that may give insight into the
length of the keyword. Once some preliminary guesses are established, set the keyword
length (whose default is one, for monoalphabetic ciphers) by clicking on the keyword
length icon on the toolbar, and recalculate the message frequencies for the different
character positions within the keyword. The frequencies for each position within the keyword are
viewable by selecting the relative position in the keyword from the View menu in the main
application window. Once a keyword length is selected, all statistical and substitution
functions are performed only on the characters at the selected position and at every
multiple of the keyword length beyond it.
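The per-position analysis can be sketched in Python as follows. The function names are hypothetical, and the sketch swaps in the index of coincidence, a standard statistic not mentioned in this manual, as one way to score how “English-like” each group of letters is: for English text it is roughly 0.066, while for evenly spread random letters it is about 0.038, so the keyword length that gives the highest average is a reasonable first guess.

```python
from collections import Counter
from typing import List


def keyword_columns(ciphertext: str, key_length: int) -> List[str]:
    """Split the letters into key_length groups; group i holds positions
    i, i + key_length, i + 2*key_length, ... (0-based)."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    return [letters[i::key_length] for i in range(key_length)]


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from text are equal."""
    counts = Counter(text)
    n = len(text)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def average_ic(ciphertext: str, key_length: int) -> float:
    """Mean index of coincidence over all keyword positions."""
    columns = keyword_columns(ciphertext, key_length)
    return sum(index_of_coincidence(col) for col in columns) / key_length


cipher_text = "LXFOPVEFRNHR"  # stand-in cipher text
for k in range(1, 5):
    print(k, round(average_ic(cipher_text, k), 3))
```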
Solving Transposition Ciphers
Select “Transposition Cipher” from the main application’s Options menu. The
program will prompt for matrix dimensions and the display will be reset. The toolbar icons
will change to perform inversions, rotations, transposes, and other matrix functions.
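The matrix functions named above can be modelled as operations on a list of rows. This Python sketch is illustrative only; all function names are hypothetical, and it assumes the message length is already a multiple of the chosen number of columns.

```python
from typing import List

Matrix = List[List[str]]


def to_matrix(text: str, cols: int) -> Matrix:
    """Write text row by row; assumes len(text) is a multiple of cols."""
    return [list(text[i:i + cols]) for i in range(0, len(text), cols)]


def from_matrix(m: Matrix) -> str:
    """Read a matrix back row by row."""
    return "".join("".join(row) for row in m)


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def rotate_clockwise(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m[::-1])]


def swap_columns(m: Matrix, a: int, b: int) -> Matrix:
    swapped = []
    for row in m:
        row = row[:]
        row[a], row[b] = row[b], row[a]
        swapped.append(row)
    return swapped


def shift_row(m: Matrix, r: int, k: int) -> Matrix:
    """Rotate row r left by k positions."""
    out = [row[:] for row in m]
    k %= len(out[r])
    out[r] = out[r][k:] + out[r][:k]
    return out


m = to_matrix("abcdef", 3)
print(from_matrix(transpose(m)))  # adbecf
```

Each operation returns a new matrix, so a sequence of trial moves can be undone simply by keeping the earlier matrix.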
ADDITIONAL FUNCTIONS AND FEATURES
Cipher Type Selection and Type-specific Options
Tools for decrypting different types of ciphers can be made available by choosing
the appropriate cipher type from the Options menu. Once the cipher type is selected, a
dialog box will prompt the user for the options to be used for that mode of cipher
decryption. In this way it is also possible to change the options for the current decryption
mode.
Display of Results
Using the View menu, results of statistical analysis can be displayed in either
tabular or histogram format. In addition, the data displayed can be changed from single
character analysis to digrams or trigrams, also available in tabular or histogram format.
The single character histogram is plotted alphabetically to aid in detecting any possible
shift encryption, but the digram and trigram histograms are plotted in decreasing order
of frequency. If a histogram is hard to see, click once on it
and a new window will pop up with a larger version of that image.
Changing the Dictionary
Under the Options menu, choose “Load new dictionary” and follow the prompts.
Automated Cipher Decryption
Select “Automated Cipher Decryption” from the Options menu. A pop-up box will
appear prompting for the type of cipher to be used, the potential key (if any), and any other
necessary or relevant information for decrypting the message (possibly including the
choice of algorithm or a selection of several cipher types to be attempted). The user can
also choose whether to attempt to decipher the message using a random or an iterative key
search approach. The dictionary auto-lookup function provides the means of measuring
the success of a decryption: the key sequences that produce the highest percentage of
dictionary matches are the most likely to be the actual decryption key.
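For a cipher with a small key space, an iterative key search can simply try every key and keep the one with the highest dictionary-match percentage. The Python sketch below assumes a Caesar shift (only 26 possible keys) and a dictionary loaded as a set; the function names are hypothetical and this is not Cypher's actual search code.

```python
import re
from typing import Set


def caesar_decrypt(text: str, shift: int) -> str:
    """Shift each letter back by `shift` places (a-z only, output lowercase)."""
    out = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def match_score(text: str, dictionary: Set[str]) -> float:
    """Fraction of words in text found in the dictionary."""
    words = re.findall(r"[a-z]+", text)
    if not words:
        return 0.0
    return sum(word in dictionary for word in words) / len(words)


def iterative_key_search(ciphertext: str, dictionary: Set[str]) -> int:
    """Try every shift and keep the one with the highest match score."""
    return max(range(26),
               key=lambda s: match_score(caesar_decrypt(ciphertext, s), dictionary))


words = {"attack", "at", "dawn"}
key = iterative_key_search("DWWDFN DW GDZQ", words)
print(key, caesar_decrypt("DWWDFN DW GDZQ", key))  # 3 attack at dawn
```

For larger key spaces the same scoring function could drive a random search instead of trying every key.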
BACKGROUND
Types of Ciphers
There are two broad categories of ciphers: substitution ciphers and transposition
ciphers. Substitution ciphers replace each character of the original plain text message
with the corresponding character in a cipher alphabet, often
according to some protocol derived from a predetermined key. In a substitution cipher,
the cryptographer is not concerned with changing the position of the characters being
encoded, only their values. Transposition ciphers, on the other hand, retain a character’s
true form but instead change its position. The subject of substitution ciphers is addressed
first.
Monoalphabetic Substitution Ciphers
In a monoalphabetic substitution cipher, each character in the plain text
alphabet is replaced by a single character in the cipher alphabet. If the language of the
plain text message is known (and it is assumed to be English in this software), then the
frequency with which each letter occurs in the plain text is also known from analyzing
samples of that language. In English, the most frequently occurring letter is ‘e’,
accounting for about 12.3% of all letters, followed by ‘t’ (9.6%) and then ‘a’ (8.0%); a complete
frequency table for the English language is included in the software. After
analyzing the letter frequencies of a message encrypted by a monoalphabetic substitution
cipher, a rough mapping of the most frequently occurring letters in the cipher text to
the most frequently occurring letters in the plain text language often yields
most of the correct substitutions. The incorrect assignments can be corrected by visually
(or algorithmically) finding patterns within the text, often by recognizing commonly
occurring words such as “and” or “the”. Once a few letters are deciphered, the rest will
follow as a natural consequence of the frequency statistics and new recognizable words within
the partially decrypted message. If the message does not yield readily to a single-character
frequency analysis, the frequencies of digrams (two-character combinations)
and trigrams (three-character combinations) can also be analyzed and compared to the
known statistics of the English language.
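A first guess at the key can be produced by pairing cipher letters, ranked by frequency, with English letters ranked the same way. The Python sketch below is hypothetical; `ENGLISH_ORDER` is one commonly quoted ordering and is an assumption, since published tables disagree slightly after the first few letters. The result is only a starting point to be corrected by hand in the key palette.

```python
from collections import Counter
from typing import Dict

# Assumption: one commonly quoted ordering of English letters by frequency;
# published tables differ slightly after the first few letters.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def initial_key_guess(ciphertext: str) -> Dict[str, str]:
    """Pair the cipher letters, most frequent first, with ENGLISH_ORDER."""
    counts = Counter(ch for ch in ciphertext.upper() if ch.isalpha())
    ranked = [letter for letter, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))


print(initial_key_guess("XXXQQZ"))  # {'X': 'e', 'Q': 't', 'Z': 'a'}
```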
One common form of the monoalphabetic substitution cipher is the keyword cipher, a variation
on the Caesar shift in which the encrypter chooses a keyword or keyphrase (simply referred to as the “key”)
that is easy to remember and readily yields the cipher alphabet. As an example, consider
the plain text alphabet to be “abcdefghijklmnopqrstuvwxyz” and let the key be the word
“cypher”. Then the cipher alphabet would be “cypherabdfgijklmnoqstuvwxz”:
the key (with letter repetitions removed) followed by the rest of the normal
alphabet (omitting any letters already used in the key). This type of cipher is easily solved by
examining what the software calls a “shift histogram”, that is, a histogram of the cipher
text letters plotted in alphabetical order, as if the cipher and plain text alphabets were the same. Since
the alphabetically ordered histogram of letters in the English language is known, it can be compared to
the results of the cipher text analysis, and peaks and troughs in the two graphs can be
matched to yield a reasonable guess at the key and thus the cipher alphabet.
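Building a keyword cipher alphabet is mechanical, as this Python sketch shows. The function names are hypothetical; note that letters already in the key (such as the ‘h’ of “cypher”) are skipped when the rest of the alphabet is appended, so the result always has exactly 26 letters.

```python
import string


def keyword_alphabet(key: str) -> str:
    """Key letters (repeats removed) followed by the unused letters a-z."""
    alphabet = []
    for ch in key.lower() + string.ascii_lowercase:
        if ch in string.ascii_lowercase and ch not in alphabet:
            alphabet.append(ch)
    return "".join(alphabet)


def keyword_encrypt(plaintext: str, key: str) -> str:
    """Replace each plain letter with the letter at the same position
    in the keyword cipher alphabet."""
    table = str.maketrans(string.ascii_lowercase, keyword_alphabet(key))
    return plaintext.lower().translate(table)


print(keyword_alphabet("cypher"))          # cypherabdfgijklmnoqstuvwxz
print(keyword_encrypt("hello", "cypher"))  # beiil
```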
Polyalphabetic Substitution Ciphers
A polyalphabetic substitution cipher is fairly similar to a monoalphabetic one, the
difference being that instead of using a single cipher alphabet, multiple cipher alphabets
are used. The simplest type of polyalphabetic cipher, the Vigenere cipher, can use as
many as twenty-six distinct cipher alphabets, each a simple rotation of the normally
ordered alphabet. For example, one alphabet might be “abcdefghijklmnopqrstuvwxyz”,
another “bcdefghijklmnopqrstuvwxyza”, another “cdefghijklmnopqrstuvwxyzab” and so
on. The Vigenere encryption works on the basis of a keyword, whose letters determine
the order and character of the cipher alphabets used to encode a message.
As a simple example, to encrypt a message using the keyword “cypher”, start by
translating the first character of the message according to the cipher alphabet
corresponding to the first letter of the keyword, or “cdef…”. If the plain text character is ‘b’
then it becomes ‘d’ according to this first cipher alphabet. To encrypt the second letter of
plain text, use the cipher alphabet corresponding to the second letter of the keyword, or
“yzabc…”. If the plain text character is ‘b’ again, this time it becomes ‘z’ according to the
second cipher alphabet. This process continues to the last letter of the keyword and then
the cycle is restarted, reusing the first letter of the keyword, then the second, and so on.
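The procedure above translates directly into code. In this Python sketch (function names hypothetical) each keyword letter is treated as a shift, with ‘a’ = 0, ‘b’ = 1, and so on; it assumes that non-letters are copied unchanged and do not advance the keyword, which is one common convention rather than something this manual specifies.

```python
def _vigenere(text: str, keyword: str, direction: int) -> str:
    """Shift each letter by the matching keyword letter (a=0, b=1, ...).

    Assumption: non-letters are copied unchanged and do not advance the
    keyword; output is lowercase.
    """
    shifts = [ord(k) - ord("a") for k in keyword.lower() if "a" <= k <= "z"]
    out = []
    i = 0
    for ch in text.lower():
        if "a" <= ch <= "z":
            shift = direction * shifts[i % len(shifts)]
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    return _vigenere(plaintext, keyword, +1)


def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    return _vigenere(ciphertext, keyword, -1)


print(vigenere_encrypt("bb", "cypher"))  # dz
```

Decryption is the same loop with the shifts reversed, which is why both functions share one helper.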
The solution to the Vigenere cipher takes advantage of the fact that the cipher
alphabets repeat themselves according to the length of the keyword. By guessing various
lengths for the keyword, performing analyses only on cipher text produced by the
same cipher alphabet (i.e., if the keyword is assumed to be 5 letters long, include every 5th
letter in the analysis), and scanning the results for a distribution similar to that of the English
language, individual cipher alphabets can be deciphered and then combined to find the
keyword and decode the remainder of the encrypted message. Alternatively, or in combination,
finding repetitions of letter combinations in the encrypted message (i.e., “XYZ…XYZ”) may
indicate a repeated word encrypted with the same rotation. If the message is
long enough, there is a significant chance that this will occur. Using the spacing
between the letter repetitions, the length of the keyword can be constrained to be a factor
of that spacing interval, and this consequence can then be tested. The fact that the cipher
alphabets are simple rotations of the normal alphabet further simplifies this task.
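The repetition search described above is commonly known as Kasiski examination. The Python sketch below (function names hypothetical, not Cypher's code) records the gaps between repeated letter groups and then counts, for each candidate keyword length, how many gaps it divides evenly; the assumed `max_length` of 20 is an arbitrary upper bound.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def repeated_spacings(ciphertext: str, length: int = 3) -> Dict[str, List[int]]:
    """Map each repeated letter group to the gaps between its occurrences."""
    letters = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    positions: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(letters) - length + 1):
        positions[letters[i:i + length]].append(i)
    return {gram: [b - a for a, b in zip(pos, pos[1:])]
            for gram, pos in positions.items() if len(pos) > 1}


def candidate_key_lengths(spacings: Dict[str, List[int]],
                          max_length: int = 20) -> List[Tuple[int, int]]:
    """Count how many gaps each possible keyword length divides evenly."""
    votes: Counter = Counter()
    for gaps in spacings.values():
        for gap in gaps:
            for k in range(2, max_length + 1):
                if gap % k == 0:
                    votes[k] += 1
    return votes.most_common()


print(repeated_spacings("XYZ ABC XYZ"))  # {'XYZ': [6]}
```

Lengths that divide many gaps are the ones worth testing first with the per-position frequency analysis.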
If the cipher alphabets are not simple rotations but instead are random
arrangements of the normal alphabet, the encrypted message may not succumb to this
type of analysis. Historically, decoding machines were paired with human ingenuity to
find alternate solutions to these types of codes, such as those used in the German
Enigma machine in World War II, but in today’s world of super-fast computers an
exhaustive search of keys is a realistic way of cracking these codes.
Transposition Ciphers
A transposition cipher rearranges the positions of the characters but not their true
values, so that a frequency analysis of a message encrypted by transposition appears
“normal”. There are many ways to accomplish this, including rotating columns and/or
rows of the message, rotating the entire message left or right, or along diagonals. These
types of ciphers are fairly straightforward to break by considering a block of text as
characters in a matrix and performing row or column shifts and swaps until a
recognizable message is derived. One can recognize that a message has been
encrypted by a transposition cipher if it has the same frequency statistics as a given
language (assumed to be English) but the message text still appears garbled. In this case
we rotate rows or columns or perform other basic matrix operations to recover the original
message. A special consideration for transposition ciphers is that the message must act
as a block of text, and so might require reformatting or buffering of some sort (such as by
adding extra characters at the end of the message) in order to give it a readily
transposable form.
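One simple transposition, sometimes called columnar transposition, writes the message into a block row by row and reads it out column by column, padding the last row so the block is complete. The Python sketch below is illustrative; the function names and the padding character “x” are assumptions.

```python
def columnar_encrypt(plaintext: str, cols: int, pad: str = "x") -> str:
    """Write letters row by row into `cols` columns, padding the last row,
    then read the block column by column."""
    letters = "".join(ch for ch in plaintext.lower() if ch.isalpha())
    remainder = len(letters) % cols
    if remainder:
        letters += pad * (cols - remainder)
    return "".join(letters[c::cols] for c in range(cols))


def columnar_decrypt(ciphertext: str, cols: int) -> str:
    """Undo columnar_encrypt: the block has len(ciphertext) // cols rows."""
    rows = len(ciphertext) // cols
    return "".join(ciphertext[r::rows] for r in range(rows))


secret = columnar_encrypt("attack at dawn", 4)
print(secret)                       # acdtkatawatn
print(columnar_decrypt(secret, 4))  # attackatdawn
```

The letter frequencies of `secret` are identical to those of the plain text, which is exactly the signature of a transposition cipher described above.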
Other Types of Ciphers
There are many other ways to complicate ciphers and make them resistant to
decryption by automated algorithms. As an example, a cryptographer could choose a
larger cipher alphabet than required for a one-to-one mapping with the plain text
alphabet, and assign correspondingly more cipher symbols to the more frequently
occurring letters of the plain text alphabet, so that analyzing the letter frequencies
of the end result yields no useful information (as all frequencies will be roughly
equal). Another devious way to encrypt a message is to use a key-text, such as
the Declaration of Independence or a well-known poem: number the words in that
document and replace each letter in the plain text with the number of a word in the
key-text that begins with that letter. Similarly, all pairs of letters could be
numbered randomly and replaced by their corresponding numbers to form the cipher text.
It is important to emphasize that any or all of these types of ciphers may be used to
encrypt a message, and a clever cryptographer can quickly defeat preset software
algorithms.
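The key-text method can be sketched in Python as follows. The function names are hypothetical, and the rule that repeated letters cycle through all the words beginning with them is an assumption (it gives common letters several different numbers, much like the larger cipher alphabet described above).

```python
from typing import Dict, List


def word_index(key_text: str) -> Dict[str, List[int]]:
    """Map each letter to the (1-based) numbers of key-text words starting with it."""
    index: Dict[str, List[int]] = {}
    for number, word in enumerate(key_text.split(), start=1):
        first = word[0].lower()
        if "a" <= first <= "z":
            index.setdefault(first, []).append(number)
    return index


def book_encrypt(plaintext: str, key_text: str) -> List[int]:
    """Replace each letter with the number of a key-text word that begins with it.

    Assumption: repeated letters cycle through the available words. Raises
    KeyError if no word in the key-text begins with a needed letter.
    """
    index = word_index(key_text)
    uses: Dict[str, int] = {}
    numbers = []
    for ch in plaintext.lower():
        if "a" <= ch <= "z":
            choices = index[ch]
            numbers.append(choices[uses.get(ch, 0) % len(choices)])
            uses[ch] = uses.get(ch, 0) + 1
    return numbers


def book_decrypt(numbers: List[int], key_text: str) -> str:
    """Look up each numbered word and keep its first letter."""
    words = key_text.split()
    return "".join(words[n - 1][0].lower() for n in numbers)


key_text = "When in the Course of human events, it becomes necessary"
print(book_encrypt("hit", key_text))  # [6, 2, 3]
```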
Public-Key Encryption
This type of encryption is not addressed by the Cypher software, but it is the most
commonly used and most secure form of encryption at this time. Public-key encryption assigns
the receiver of the message both a public and a private key; the public key is available
for any potential sender to encrypt a message with, and the private key is kept secret by the
receiver. Although the two keys are mathematically related, through modular arithmetic
and very large prime numbers, knowing the public key gives little information about the
private key needed to decrypt the message, and a brute-force approach is not feasible.
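To make the role of modular arithmetic concrete, here is a toy RSA calculation in Python. It is not part of Cypher and is deliberately insecure (tiny primes, no padding); it only shows how a public pair (n, e) and a private exponent d fit together. Anyone who could factor n = 3233 back into 61 and 53 could compute d, which is why real keys use primes hundreds of digits long.

```python
def toy_rsa_demo():
    p, q = 61, 53                # two (far too small) primes
    n = p * q                    # public modulus: 3233
    phi = (p - 1) * (q - 1)      # 3120
    e = 17                       # public exponent, shares no factor with phi
    d = pow(e, -1, phi)          # private exponent: 2753 (needs Python 3.8+)
    message = 65
    ciphertext = pow(message, e, n)     # anyone can do this with (n, e)
    recovered = pow(ciphertext, d, n)   # only the holder of d can do this
    return n, e, d, ciphertext, recovered


print(toy_rsa_demo())  # (3233, 17, 2753, 2790, 65)
```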
Suggestions for Further Reading
Simon Singh’s The Code Book is a very readable and up-to-date account of the
history of cryptography, and as a matter of fact is the basis for most of the algorithms
used in this software. Other references might include…
GLOSSARY
ASCII American Standard Code for Information Interchange, a standard for turning alphabetic and other
characters into numbers.
Asymmetric-key cryptography A form of cryptography in which the key required for encrypting is not
the same as the key required for decrypting. Describes public-key cryptography systems, such as
RSA.
Caesar-shift substitution cipher Originally a cipher in which each letter in the message is replaced with
the letter three places further on in the alphabet. More generally, it is a cipher in which each letter
in the message is replaced with the letter x places further on in the alphabet, where x is a number
between 1 and 25.
cipher Any general system for hiding the meaning of a message by replacing each letter in the original
message with another letter. The system should have some built-in flexibility, known as the key.
cipher alphabet The rearrangement of the ordinary (or plain) alphabet, which then determines how each
letter in the original message is enciphered. The cipher alphabet can also consist of numbers or
any other characters, but in all cases it dictates the replacements for letters in the original message.
ciphertext The message (or plaintext) after encipherment.
code A system for hiding the meaning of a message by replacing each word or phrase in the original
message with another character or set of characters. The list of replacements is contained in a
codebook. (An alternative definition of a code is any form of encryption which has no built-in
flexibility, i.e. there is only one key, namely the codebook.)
codebook A list of replacements for words or phrases in the original message.
cryptanalysis The science of deducing the plaintext from a ciphertext, without knowledge of the key.
cryptography The science of encrypting a message, or the science of concealing the meaning
of a message. Sometimes the term is used more generally to mean the science of anything
connected with ciphers, and is an alternative to the term cryptology.
cryptology The science of secret writing in all its forms, covering both cryptography and cryptanalysis.
decipher To turn an enciphered message back into the original message. Formally, the term refers only to
the intended receiver who knows the key required to obtain the plaintext, but informally it also
refers to the process of cryptanalysis, in which the decipherment is performed by an enemy
interceptor.
decode To turn an encoded message back into the original message.
decrypt To decipher or to decode.
DES Data Encryption Standard, developed by IBM and adopted in 1976.
Diffie-Hellman-Merkle key exchange A process by which a sender and receiver can establish a secret key
via public discussion. Once the key has been agreed, the sender can use a cipher such as DES to
encrypt a message.
digital signature A method for proving the authorship of an electronic document. Often this is generated
by the author encrypting the document with his or her private-key.
encipher To turn the original message into the enciphered message.
encode To turn the original message into the encoded message.
encrypt To encipher or encode.
encryption algorithm Any general encryption process which can be specified exactly by choosing a key.
homophonic substitution cipher A cipher in which there are several potential substitutions for each
plaintext letter. Crucially, if there are, say, six potential substitutions for the plaintext letter A, then
these six characters can only represent the letter A. This is a type of monoalphabetic substitution
cipher.
key The element that turns the general encryption algorithm into a specific method for encryption. In
general, the enemy may be aware of the encryption algorithm being used by the sender and
receiver, but the enemy must not be allowed to know the key.
key distribution The process of ensuring that both sender and receiver have access to the key required to
encrypt and decrypt a message, while making sure that the key does not fall into enemy hands.
Key distribution was a major problem in terms of logistics and security before the invention of
public-key cryptography.
key escrow A scheme in which users lodge copies of their secret keys with a trusted third party, the escrow
agent, who will pass on keys to law enforcers only under certain circumstances, for example if a
court order is issued.
key length Computer encryption involves keys which are numbers. The key length refers to the number of
digits or bits in the key, and thus indicates the biggest number that can be used as a key, thereby
defining the number of possible keys. The longer the key length (or the greater the number of
possible keys), the longer it will take a cryptanalyst to test all the keys.
monoalphabetic substitution cipher A substitution cipher in which the cipher alphabet is fixed throughout
encryption.
National Security Agency (NSA) A branch of the U.S. Department of Defense, responsible for ensuring
the security of American communications and for breaking into the communications of other
countries.
one-time pad The only known form of encryption that is unbreakable. It relies on a random key that is the
same length as the message. Each key can be used once and only once.
plaintext The original message before encryption.
polyalphabetic substitution cipher A substitution cipher in which the cipher alphabet changes during the
encryption, for example the Vigenère cipher. The change is defined by a key.
Pretty Good Privacy (PGP) A computer encryption algorithm developed by Phil Zimmermann, based on
RSA.
private-key The key used by the receiver to decrypt messages in a system of public-key cryptography. The
private-key must be kept secret.
public-key The key used by the sender to encrypt messages in a system of public-key cryptography. The
public-key is available to the public.
public-key cryptography A system of cryptography which overcomes the problems of key distribution.
Public-key cryptography requires an asymmetric cipher, so that each user can create a public
encryption key and a private decryption key.
quantum computer An immensely powerful computer that exploits quantum theory, in particular the
theory that an object can be in many states at once (superposition), or the theory that an object can
be in many universes at once. If scientists could build a quantum computer on any reasonable
scale, it would jeopardise the security of all current ciphers except the one-time pad cipher.
quantum cryptography An unbreakable form of cryptography that exploits quantum theory, in particular
the uncertainty principle, which states that it is impossible to measure all aspects of an object with
absolute certainty. Quantum cryptography guarantees the secure exchange of a random series of
bits, which is then used as the basis for a one-time pad cipher.
RSA The first system that fitted the requirements of public-key cryptography, invented by Ron Rivest, Adi
Shamir and Leonard Adleman in 1977.
steganography The science of hiding the existence of a message, as opposed to cryptography, which is the
science of hiding the meaning of a message.
substitution cipher A system of encryption in which each letter of a message is replaced with another
character, but retains its position within the message.
symmetric-key cryptography A form of cryptography in which the key required for encrypting is the
same as the key required for decrypting. The term describes all traditional forms of encryption, i.e.
those in use before the 1970s.
transposition cipher A system of encryption in which each letter of a message changes its position within
the message, but retains its identity.
Vigenère cipher A polyalphabetic cipher which was developed around 1500. The Vigenère square
contains 26 separate cipher alphabets, each one a Caesar-shifted alphabet, and a keyword defines
which cipher alphabet should be used to encrypt each letter of a message.