CYPHER USER MANUAL

INTRODUCTION

Description

Cypher is a software toolkit designed to aid in the decryption of standard (historical) ciphers by providing statistical data and algorithmic analysis on encrypted messages. Most modern forms of encryption use the relatively recently discovered method of public-key cryptography, which for the purposes of this software is currently unbreakable and thus is not addressed. Cypher also allows users to encrypt messages using these same historical encryption algorithms and techniques. The toolkit is therefore meant to serve as a cryptographic learning tool and as a pastime for amateur cryptographers.

Assumptions

This software is best put to use with a basic background in historical cryptography, as much of the manipulation of encrypted messages is user-guided. Since not all users may be familiar with historical cryptography, a brief background to the field is included in this manual. It is possible to run the automated decryption algorithms, but these are limited in scope and lack the ingenuity of the human brain, which historically has prevailed against the most trying odds in cipher decryption.

Intended Users

Anyone wanting to learn to use basic historical encryption and decryption techniques will find this software useful. Some advanced ciphers can be decrypted (or encrypted) with this software, but these are methods that have historically been overcome, and so Cypher does not provide secure data encryption at a professional level.

Organization

This manual is divided into six sections:

1. Introduction: A brief overview of the software.
2. Installation: Instructions on installing and running Cypher.
3. Getting Started: A step-by-step guide to solving a monoalphabetic, a polyalphabetic, and a transposition cipher with user assistance.
4.
Additional Functions and Features: A detailed guide to all the options, functions, and features of Cypher, including cipher type selection, type-specific options, display options, and automated cipher decryption.
5. Background: A basic historical background necessary to maximize the software’s decryption potential.
6. Glossary: A list of common cryptographic terms and their definitions.

INSTALLATION

Unix Machines

Download cypher.tar.gz from the Cypher homepage. Uncompress the file by typing the following commands:

    gunzip cypher.tar.gz
    tar -xvf cypher.tar

Move into the cypher directory by typing

    cd cypher

and follow the instructions found in the README file.

Microsoft Windows

Download the binary executable, Cypher.exe, from the Cypher homepage. Start the program by double-clicking on the Cypher.exe icon.

GETTING STARTED

User-assisted Cipher Decryption of a Simple Monoalphabetic Cipher

After starting up the program, you should see a window similar to the following on your screen:

The layout is fairly simple: there are four windows and a toolbar. The two text windows and the key palette (at the bottom of the screen) are action windows, where the user can perform text manipulations or substitutions. The top left window is a reference tool that lets the user compare the statistical results of a message to those of the English language. Finally, the bottom left window is a display window for the results of any analysis performed on the encrypted message.

In the course of decryption, the user may modify any of the action windows and so affect the decryption or partial decryption of the message in the decryption window:

Modifying text in the cipher window will change the source for viewing the encrypted message, which is then replaced by plaintext characters according to the key palette.
Modifying the key palette will change what a character in the cipher window is replaced by when displayed in the decryption window.
Changing a character in the decryption window has the same effect as modifying an entry in the key palette: the decryption shown is rebuilt according to the new key settings.

Our first task is to input the encrypted message. This can be done in two ways: by typing the encrypted message in by hand, or by loading it from a text file by choosing “Open…” from the File menu in the cipher window. At any point when the message is displayed in the cipher window, it can be saved to disk by choosing “Save…” or “Save As…” from the File menu of the cipher window. (Similarly, a message in the decryption window can be saved as plain text at any time by using the File menu of that window.) Once the message has been entered, choose “Begin Decryption” from the Option menu in the cipher window, and the encoded message will be copied to the decryption window:

At this point we probably want to run a frequency analysis on the encrypted message. To do this, click on the “%” icon on the toolbar on the right of the screen. When the analysis is run, statistics are computed automatically for single character frequencies, digram frequencies, and trigram frequencies, but by default only the single character frequencies are displayed. To view the other results, or to view any results in histogram format, choose the appropriate display option from the application’s View menu. The results for the message will automatically be displayed in the message frequency window on the left of the screen:

With this information we can start guessing at the substitutions. M, J, and X all have high frequencies in the encrypted message, so one of them is probably “e” in plain text. We might try some combinations of these by entering the corresponding plain text letters in the key palette at the bottom of the screen.
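The counting behind such a frequency analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cypher's actual implementation: the function name ngram_frequencies, the choice to count letter groups only within words (never across spaces), and the sample cipher text are all made up for the example.

```python
from collections import Counter


def ngram_frequencies(message, n=1):
    """Return {ngram: percentage}, most frequent first.

    n=1 gives single characters, n=2 digrams, n=3 trigrams.
    Assumption: n-grams are counted inside words only.
    """
    counts = Counter()
    for word in message.upper().split():
        # Keep letters only, so punctuation does not split or pollute n-grams.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * count / total for gram, count in counts.most_common()}


if __name__ == "__main__":
    sample = "TNE XMJ TNE"  # hypothetical cipher text, not taken from the manual
    for size in (1, 2, 3):
        print(size, ngram_frequencies(sample, size))
```

Sorting by decreasing frequency mirrors how a solver reads the table: the first few entries are the candidates for “e”, “t”, and “a”, or for common words such as “the” in the trigram case.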
Assuming we guess correctly, here are the results of those substitutions and the corresponding display on the screen (we’ll assume correct guesses until a little further into this manual):

At this point, we might assume that “TNE” represents “the” and so “N” is “h” in plain text. Using these intuitive observations along with the frequency tables for single characters, digrams, and trigrams, one might arrive at this partial decryption:

By now, several potential words have been completed. As strings of plain text characters lengthen, and if the dictionary auto-lookup is enabled, plain text substrings are checked against a standard dictionary file and underlined in green if there is a match; if there is a partial match or a seemingly misspelled word, the substring is underlined in red. In the example above, there are no misspellings.

Considering the substring “Yall”, we might guess that “Y” represents “b” since “ball” is a valid word. However, making this substitution produces the following display (note the words underlined in red, which are misspelled as a result of this substitution):

We can see that we’ve made a mistake, so we delete “b” from below “Y” in the key palette and reassess, this time coming up with “w” as a possible match. The dictionary auto-lookup recalculates the matches and this time there are no misspelled words:

Entering keys for the remaining few cipher text letters produces a completely deciphered message, with no errors (note that a proper name or other uncommon word may not be recognized by the dictionary auto-lookup, so a message that is completely deciphered may still have “error” markings; here the dictionary auto-lookup has been disabled):

Solving a Polyalphabetic Cipher

The display for a polyalphabetic cipher is very similar to that of the monoalphabetic cipher. However, the message frequencies do not lead to a ready decryption as in the case of the monoalphabetic cipher.
The additional repetition search tool (denoted “XYZ,XYZ” on the toolbar) will display results that may give insight into the length of the keyword. Once some preliminary guesses are established, set the keyword length (whose default is one, for monoalphabetic ciphers) by clicking on the keyword length icon on the toolbar, and recalculate the message frequencies for the different character positions within the keyword. The frequencies for each position can be viewed by selecting the relative position in the keyword from the View menu in the main application window. Once a keyword length is selected, all statistical and substitution functions are performed only on the characters found at multiples of the keyword length from that position.

Solving Transposition Ciphers

Select “Transposition Cipher” from the main application’s Options menu. The program will prompt for matrix dimensions and the display will be reset. The toolbar icons will change to perform inversions, rotations, transposes, and other matrix functions.

ADDITIONAL FUNCTIONS AND FEATURES

Cipher Type Selection and Type-specific Options

Tools for decrypting different types of ciphers can be made available by choosing the appropriate cipher type from the Options menu. Once the cipher type is selected, a dialog box will prompt the user for the options to be used for that mode of cipher decryption. The same dialog can be used to change the options for the current decryption mode.

Display of Results

Using the View menu, results of statistical analysis can be displayed in either tabular or histogram format. The data displayed can also be changed from single character analysis to digrams or trigrams, likewise available in tabular or histogram format. The single character histogram is plotted alphabetically to aid in spotting possible shift encryptions, but the digram and trigram histograms are plotted in decreasing order of frequency.
If the results of the histogram are hard to see, click once on the histogram and a new window will pop up with a larger version of that image.

Changing the Dictionary

Under the Options menu, choose “Load new dictionary” and follow the prompts.

Automated Cipher Decryption

Select “Automated Cipher Decryption” from the Options menu. A pop-up box will appear prompting for the type of cipher to be used, a potential key if any, and any other necessary or relevant information (perhaps choice of algorithm, or inclination to encryption type, or selection of multiple cipher types to be attempted?) for decrypting the message. The user can also select whether to attempt to decipher the message using a random or an iterative key search approach. The dictionary auto-lookup function provides the means of determining the success of decryption; key sequences leading to the highest percentage of matches are likely to be the actual decryption key.

BACKGROUND

Types of Ciphers

There are two broad categories of ciphers: substitution ciphers and transposition ciphers. Substitution ciphers replace each character in the original plain text message with the corresponding character in the cipher alphabet, often according to some protocol derived from a predetermined key. In a substitution cipher, the cryptographer is not concerned with changing the position of the characters being encoded, only their values. Transposition ciphers, on the other hand, retain a character’s true form but change its position. The subject of substitution ciphers is addressed first.

Monoalphabetic Substitution Ciphers

In a monoalphabetic substitution cipher a single character in the plain text alphabet is replaced by a single character in the cipher alphabet.
If the language of the plain text message is known (and it is assumed to be English in this software), then the expected frequency of each letter in the plain text is also known from analyzing samples of that language. In English, the most frequently occurring letter is ‘e’ at a 12.3% count, followed by ‘t’ (9.6%) and then ‘a’ (8.0%); a complete reference to the order statistics of the English language is included in the software. After analyzing the order statistics of a message encrypted by a monoalphabetic substitution cipher, mapping the most frequently occurring letters in the cipher text to the most frequently occurring letters in the plain text language often yields most of the correct substitutions. The incorrect assignments can be corrected by visually (or algorithmically) finding patterns within the text, often by recognizing commonly occurring words such as “and” or “the”. Once a few letters are deciphered, the rest will follow as a natural consequence of the order statistics and of new recognizable words within the partially decrypted message. If the message does not yield readily to a single-character frequency analysis, the frequencies of digrams (two-character combinations) and trigrams (three-character combinations) can also be analyzed and compared to the known statistics of the English language.

One common form of the monoalphabetic substitution cipher is the keyword cipher, in which the encrypter chooses a keyword or keyphrase (simply referred to as the “key”) that is easy to remember and readily yields the cipher alphabet. As an example, consider the plain text alphabet to be “abcdefghijklmnopqrstuvwxyz” and let the key be the word “cypher”. Then the cipher alphabet would be “cypherabdfgijklmnoqstuvwxz”: simply the key (with letter repetitions removed) followed by the rest of the normal alphabet in order (omitting the letters already used in the key).
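Building a cipher alphabet from a key, as in the “cypher” example, can be sketched as follows. This is a minimal illustration and not code from Cypher itself; the function names keyword_alphabet, substitute, and unsubstitute are hypothetical, and the sketch assumes a 26-letter lowercase English alphabet.

```python
import string

PLAIN = string.ascii_lowercase


def keyword_alphabet(key):
    """Key letters first (repetitions dropped), then the unused letters in order."""
    ordered = []
    for ch in key.lower() + PLAIN:
        if ch in PLAIN and ch not in ordered:
            ordered.append(ch)
    return "".join(ordered)


def substitute(text, cipher_alphabet):
    """Encrypt: replace each plain letter with the cipher letter at the same position."""
    return text.lower().translate(str.maketrans(PLAIN, cipher_alphabet))


def unsubstitute(text, cipher_alphabet):
    """Decrypt: the same lookup in the opposite direction."""
    return text.lower().translate(str.maketrans(cipher_alphabet, PLAIN))


if __name__ == "__main__":
    alphabet = keyword_alphabet("cypher")
    print(alphabet)                       # cypherabdfgijklmnoqstuvwxz
    print(substitute("hello", alphabet))  # beiil
```

Characters outside the alphabet (spaces, punctuation) pass through unchanged, which is one reason word patterns such as “the” remain visible in the cipher text.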
This type of cipher is easily solved by examining what the software calls a “shift histogram”, that is, a histogram of the cipher alphabet drawn under the assumption that the cipher and plain text alphabets are the same. Since the ordered histogram of letters in the English language is known, it can be compared to the results of the cipher text analysis, and peaks and troughs in the graphs can be matched to yield a reasonable guess at the key and thus the cipher alphabet.

Polyalphabetic Substitution Ciphers

A polyalphabetic substitution cipher is fairly similar to a monoalphabetic one, the difference being that multiple cipher alphabets are used instead of a single one. The simplest type of polyalphabetic cipher, the Vigenère cipher, can use as many as twenty-six distinct cipher alphabets, each a simple rotation of the normally ordered alphabet. For example, one alphabet might be “abcdefghijklmnopqrstuvwxyz”, another “bcdefghijklmnopqrstuvwxyza”, another “cdefghijklmnopqrstuvwxyzab”, and so on. Vigenère encryption works on the basis of a keyword, whose letters determine which cipher alphabets are used to encode a message and in what order. As a simple example, to encrypt a message using the keyword “cypher”, start by translating the first character of the message according to the cipher alphabet corresponding to the first letter of the keyword, or “cdef…”. If the plain text character is ‘b’ then it becomes ‘d’ according to this first cipher alphabet. To encrypt the second letter of plain text, use the cipher alphabet corresponding to the second letter of the keyword, or “yzabc…”. If the plain text character is ‘b’ again, this time it becomes ‘z’ according to the second cipher alphabet. This process continues to the last letter of the keyword and then the cycle is restarted, reusing the first letter of the keyword, then the second, and so on.
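The keyword-driven rotation just described can be written compactly. The sketch below is an illustration only, under a few assumptions that the manual does not specify: the function name vigenere is hypothetical, the keyword must be a non-empty string of letters, and characters that are not letters are copied through without advancing the keyword.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_lowercase


def vigenere(text, keyword, decrypt=False):
    """Rotate each letter by the alphabet position of the current keyword letter."""
    # Each keyword letter selects one rotated alphabet: 'c' -> shift 2, 'y' -> shift 24, ...
    shifts = cycle(ALPHABET.index(k) for k in keyword.lower())
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            shift = next(shifts)
            if decrypt:
                shift = -shift
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # assumption: spaces and punctuation do not consume a key letter
    return "".join(out)


if __name__ == "__main__":
    print(vigenere("bb", "cypher"))  # dz, matching the 'b' -> 'd', 'b' -> 'z' example
    secret = vigenere("attack at dawn", "cypher")
    print(secret, "->", vigenere(secret, "cypher", decrypt=True))
```

Because the same plain text letter (‘b’) maps to different cipher letters depending on its position, a single-character frequency count over the whole message no longer lines up with English.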
The solution to the Vigenère cipher takes advantage of the fact that the cipher alphabets repeat themselves according to the length of the keyword. By guessing various lengths of the keyword, performing analyses only on cipher text produced by the same cipher alphabet (i.e., if the keyword is assumed to be 5 letters long, include every 5th letter in the analysis), and scanning the results for a distribution similar to that of the English language, individual cipher alphabets can be deciphered and then combined to find the keyword and decode the remainder of the encrypted message. Alternatively, or jointly, finding repetitions of letter combinations in the encrypted message (i.e., “XYZ…XYZ”) may indicate a repeated word encrypted with the same rotation. Statistically, if the message is long enough then there is a significant chance that this will occur. Using the spacing between the letter repetitions, the length of the keyword can be constrained to be a factor of that spacing interval, and this hypothesis can then be tested. The fact that the cipher alphabets are simple rotations of the normal alphabet further simplifies this task. If the cipher alphabets are not simple rotations but instead are random arrangements of the normal alphabet, the encrypted message may not succumb to this type of analysis. Historically, decoding machines were paired with human ingenuity to find alternate solutions to these types of codes, such as those used in the German Enigma machine in World War II, but in today’s world of super-fast computers an exhaustive search of keys is a real possibility for cracking these codes.

Transposition Ciphers

A transposition cipher rearranges the positions of the characters but not their true values, so that a frequency analysis of a message encrypted by transposition appears “normal”. There are many ways to accomplish this, including rotating columns and/or rows of the message, or rotating the entire message left, right, or along diagonals.
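One simple transposition of this kind, offered here only as an illustrative sketch, writes the message row by row into a grid and reads it back column by column. The function names and the choice of ‘x’ as a padding character are assumptions of the example, not a description of what Cypher does internally.

```python
from collections import Counter


def transpose_encrypt(message, columns, pad="x"):
    """Write the letters row by row into a grid `columns` wide, then read column by column."""
    letters = [ch for ch in message.lower() if ch.isalpha()]
    while len(letters) % columns:
        letters.append(pad)  # assumption: fill the last row so the grid is rectangular
    rows = [letters[i:i + columns] for i in range(0, len(letters), columns)]
    return "".join("".join(column) for column in zip(*rows))


def transpose_decrypt(ciphertext, columns):
    """Undo transpose_encrypt: transposing the transposed grid restores the rows."""
    rows = len(ciphertext) // columns
    return transpose_encrypt(ciphertext, rows)


if __name__ == "__main__":
    plain = "attack at dawn"
    secret = transpose_encrypt(plain, 4)
    print(secret)                        # acdtkatawatn
    print(transpose_decrypt(secret, 4))  # attackatdawn
    # Letter counts are untouched, so the frequency table still looks like English.
    print(Counter(secret) == Counter(plain.replace(" ", "")))  # True
```

The last line is the practical point: only the order of the letters changes, never the counts, so the single-character statistics cannot distinguish the cipher text from ordinary text.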
These types of ciphers are fairly straightforward to break by considering a block of text as characters in a matrix and performing row or column shifts and swaps until a recognizable message is derived. One can recognize that a message has been encrypted by a transposition cipher if it has the same frequency statistics as a given language (assumed to be English) but the message text still appears garbled. In this case we rotate rows or columns or perform other basic matrix operations to recover the original message. A special consideration with transposition ciphers is that the message must act as a block of text, and so might require reformatting or buffering of some sort (such as adding extra characters at the end of the message) in order to give it a readily transposable form.

Other Types of Ciphers

There are many other ways to complicate ciphers and make them resistant to decryption by automated algorithms. As an example, a cryptographer could choose a larger cipher alphabet than required for a one-to-one mapping with the plain text alphabet, and assign the more frequently occurring letters in the plain text alphabet correspondingly more symbols in the cipher alphabet, so that analyzing letter frequencies of the end result yields no useful information (as all frequencies will be roughly equal). Another way to deviously encrypt a message is to use a key-text, such as the Declaration of Independence or a well-known poem: the words in the key-text are numbered, and each letter in the plain text is replaced by the number of a word in the key-text that begins with that letter. Similarly, all pairs of letters could be numbered randomly and replaced by their corresponding numbers to form the cipher text. It is important to emphasize that any or all of these types of ciphers may be used to encrypt a message, and a clever cryptographer can quickly confound preset software algorithms.
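The key-text method mentioned above lends itself to a small sketch. The code below is an assumption-laden illustration rather than a feature of Cypher: the function names are hypothetical, words are numbered from 1, and each letter is always replaced by the first word that starts with it (a careful encrypter would vary the choice among all matching words to flatten the number frequencies).

```python
def number_key_text(key_text):
    """Map each initial letter to the number (counting from 1) of the first word starting with it."""
    numbers = {}
    for number, word in enumerate(key_text.lower().split(), start=1):
        if word[0].isalpha():
            numbers.setdefault(word[0], number)
    return numbers


def key_text_encrypt(plaintext, key_text):
    """Replace every letter by a word number; raises KeyError if no word starts with a letter."""
    numbers = number_key_text(key_text)
    return [numbers[ch] for ch in plaintext.lower() if ch.isalpha()]


def key_text_decrypt(numbers, key_text):
    """Look each number up in the key-text and keep the first character of that word."""
    words = key_text.lower().split()
    return "".join(words[n - 1][0] for n in numbers)


if __name__ == "__main__":
    key_text = ("When in the Course of human events, it becomes necessary "
                "for one people to dissolve the political bands")
    secret = key_text_encrypt("hide", key_text)
    print(secret)                              # [6, 2, 15, 7]
    print(key_text_decrypt(secret, key_text))  # hide
```

With so short a key-text many letters have no matching word at all, which is why a long document is the practical choice for this method.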
Public-Key Encryption

This type of encryption is not addressed by the Cypher software, but it is the most commonly used and most secure form of encryption at this time. Public-key encryption assigns the receiver of the message both a public and a private key; the public one is available for any potential sender to encrypt a message with, and the private one is kept by the receiver. Because these keys are related mathematically and based on modular functions and very large prime numbers, knowing the public key gives little information about the private key needed to decrypt the message, and a brute force approach is not feasible.

Suggestions for Further Reading

Simon Singh’s The Code Book is a very readable and up-to-date account of the history of cryptography, and is in fact the basis for most of the algorithms used in this software. Other references might include…

GLOSSARY

ASCII: American Standard Code for Information Interchange, a standard for turning alphabetic and other characters into numbers.
asymmetric-key cryptography: A form of cryptography in which the key required for encrypting is not the same as the key required for decrypting. Describes public-key cryptography systems, such as RSA.
Caesar-shift substitution cipher: Originally a cipher in which each letter in the message is replaced with the letter three places further on in the alphabet. More generally, a cipher in which each letter in the message is replaced with the letter x places further on in the alphabet, where x is a number between 1 and 25.
cipher: Any general system for hiding the meaning of a message by replacing each letter in the original message with another letter. The system should have some built-in flexibility, known as the key.
cipher alphabet: The rearrangement of the ordinary (or plain) alphabet, which then determines how each letter in the original message is enciphered.
The cipher alphabet can also consist of numbers or any other characters, but in all cases it dictates the replacements for letters in the original message.
ciphertext: The message (or plaintext) after encipherment.
code: A system for hiding the meaning of a message by replacing each word or phrase in the original message with another character or set of characters. The list of replacements is contained in a codebook. (An alternative definition of a code is any form of encryption which has no built-in flexibility, i.e. there is only one key, namely the codebook.)
codebook: A list of replacements for words or phrases in the original message.
cryptanalysis: The science of deducing the plaintext from a ciphertext, without knowledge of the key.
cryptography: The science of encrypting a message, or the science of concealing the meaning of a message. Sometimes the term is used more generally to mean the science of anything connected with ciphers, and is an alternative to the term cryptology.
cryptology: The science of secret writing in all its forms, covering both cryptography and cryptanalysis.
decipher: To turn an enciphered message back into the original message. Formally, the term refers only to the intended receiver who knows the key required to obtain the plaintext, but informally it also refers to the process of cryptanalysis, in which the decipherment is performed by an enemy interceptor.
decode: To turn an encoded message back into the original message.
decrypt: To decipher or to decode.
DES: Data Encryption Standard, developed by IBM and adopted in 1976.
Diffie-Hellman-Merkle key exchange: A process by which a sender and receiver can establish a secret key via public discussion. Once the key has been agreed, the sender can use a cipher such as DES to encrypt a message.
digital signature: A method for proving the authorship of an electronic document. Often this is generated by the author encrypting the document with his or her private-key.
encipher: To turn the original message into the enciphered message.
encode: To turn the original message into the encoded message.
encrypt: To encipher or encode.
encryption algorithm: Any general encryption process which can be specified exactly by choosing a key.
homophonic substitution cipher: A cipher in which there are several potential substitutions for each plaintext letter. Crucially, if there are, say, six potential substitutions for the plaintext letter A, then these six characters can only represent the letter A. This is a type of monoalphabetic substitution cipher.
key: The element that turns the general encryption algorithm into a specific method for encryption. In general, the enemy may be aware of the encryption algorithm being used by the sender and receiver, but the enemy must not be allowed to know the key.
key distribution: The process of ensuring that both sender and receiver have access to the key required to encrypt and decrypt a message, while making sure that the key does not fall into enemy hands. Key distribution was a major problem in terms of logistics and security before the invention of public-key cryptography.
key escrow: A scheme in which users lodge copies of their secret keys with a trusted third party, the escrow agent, who will pass on keys to law enforcers only under certain circumstances, for example if a court order is issued.
key length: Computer encryption involves keys which are numbers. The key length refers to the number of digits or bits in the key, and thus indicates the biggest number that can be used as a key, thereby defining the number of possible keys. The longer the key length (or the greater the number of possible keys), the longer it will take a cryptanalyst to test all the keys.
monoalphabetic substitution cipher: A substitution cipher in which the cipher alphabet is fixed throughout encryption.
National Security Agency (NSA): A branch of the U.S.
Department of Defense, responsible for ensuring the security of American communications and for breaking into the communications of other countries.
one-time pad: The only known form of encryption that is unbreakable. It relies on a random key that is the same length as the message. Each key can be used once and only once.
plaintext: The original message before encryption.
polyalphabetic substitution cipher: A substitution cipher in which the cipher alphabet changes during the encryption, for example the Vigenère cipher. The change is defined by a key.
Pretty Good Privacy (PGP): A computer encryption algorithm developed by Phil Zimmermann, based on RSA.
private-key: The key used by the receiver to decrypt messages in a system of public-key cryptography. The private-key must be kept secret.
public-key: The key used by the sender to encrypt messages in a system of public-key cryptography. The public-key is available to the public.
public-key cryptography: A system of cryptography which overcomes the problems of key distribution. Public-key cryptography requires an asymmetric cipher, so that each user can create a public encryption key and a private decryption key.
quantum computer: An immensely powerful computer that exploits quantum theory, in particular the theory that an object can be in many states at once (superposition), or the theory that an object can be in many universes at once. If scientists could build a quantum computer on any reasonable scale, it would jeopardise the security of all current ciphers except the one-time pad cipher.
quantum cryptography: An unbreakable form of cryptography that exploits quantum theory, in particular the uncertainty principle, which states that it is impossible to measure all aspects of an object with absolute certainty. Quantum cryptography guarantees the secure exchange of a random series of bits, which is then used as the basis for a one-time pad cipher.
RSA: The first system that fitted the requirements of public-key cryptography, invented by Ron Rivest, Adi Shamir and Leonard Adleman in 1977.
steganography: The science of hiding the existence of a message, as opposed to cryptography, which is the science of hiding the meaning of a message.
substitution cipher: A system of encryption in which each letter of a message is replaced with another character, but retains its position within the message.
symmetric-key cryptography: A form of cryptography in which the key required for encrypting is the same as the key required for decrypting. The term describes all traditional forms of encryption, i.e. those in use before the 1970s.
transposition cipher: A system of encryption in which each letter of a message changes its position within the message, but retains its identity.
Vigenère cipher: A polyalphabetic cipher which was developed around 1500. The Vigenère square contains 26 separate cipher alphabets, each one a Caesar-shifted alphabet, and a keyword defines which cipher alphabet should be used to encrypt each letter of a message.