Assessing the Viability of the Rotor Cipher
in the Modern World
by
Jordan Zink
Gahanna Lincoln High School
140 S Hamilton Road
Gahanna, Ohio 43230
jordanzink@gmail.com
(614) 307-3669
Prepared for
GLHS Science Academy Symposium
February 5, 2010
ASSESSING THE VIABILITY OF THE ROTOR CIPHER IN THE
MODERN WORLD
Jordan Zink
Gahanna Lincoln High School, 140 S Hamilton Road, Gahanna, Ohio 43230
The purpose of this project is to determine the viability of the rotor cipher in the modern world of
computer cryptography. This project consists of two phases: modifying the cipher to increase its
security and running a simulation to assess the effectiveness of a brute force attack.
While many modifications were made, one modification involved shortening the plaintext before
encryption by removing unnecessary letters and replacing words with symbols. An experiment
was run to determine the level of shortening that would not distort meaning. 20 subjects
participated and it was found that a conservative level of shortening did not significantly distort
meaning, while a liberal level of shortening did distort meaning slightly. It was also found
that there was no significant difference between youth and adult subjects, or between subjects
familiar and unfamiliar with texting and online lingo.
To simulate a brute force attack, a computer program was written in Visual Basic. Initial testing
found household computers could not break a message encrypted with 3 or more rotors. A
regression equation was found to predict the key search speed based on plaintext length (R^2 =
0.9988). Also, the equation t = 256^(2n)/s was created to show the relation of the time t to run a brute
force attack to the number of rotors n and the key search speed s.
An investigation into the parts of a computer that make a brute force attack run faster was also
conducted. It was found that processors with high L2 cache, voltage, and front side bus speed
were the fastest, and RAM had little to no effect on time.
It was concluded that the rotor cipher is a viable cipher in the modern world. Also, plaintext
shortening can be applied with most ciphers, which can boost security.
Table of Contents
I. Introduction
II. Review of Literature
III. Methods
IV. Results & Analysis
V. Conclusion
VI. Acknowledgments
VII. Works Cited
VIII. Appendix
Introduction
Cryptography is a field that often goes unappreciated. Many people do not know what a
cryptologist even does. However, cryptography is one of the most important fields in computer
science. Although it often goes unseen, it is used by everyone daily. The simple concept of
keeping information secret from prying eyes is, in reality, a complex science rooted deep in logic
and mathematics. While it may be considered an evil, it is a necessary evil for the world today.
Within cryptography there lie many different ciphers, or methods of encrypting and
decrypting data. The point of this project is to research one of these ciphers called a rotor cipher.
This cipher was used by the Germans in WWII and required a large effort by the Allies to break,
including the creation of the first computer. However, in the move to computers and modern
ciphers after the war, the rotor cipher was left behind and became merely cryptographic history.
This project hopes to assess whether, through implementation on a computer, the rotor cipher
is still a viable, secure cipher. The project consists of two phases: modifications to the
cipher to increase security and a simulation to test the effectiveness of a brute force attack on the
cipher.
Review of Literature
Introduction to Cryptography
The first recorded instance of concealing information is in the Greek historian
Herodotus’s The Histories, where he chronicles the conflicts of the fifth century BC between
Persia and Greece. He mentions that a Greek in Persia hid a message revealing Persian military
actions on a piece of wood by covering it with wax. The message was not discovered as it
left Persia and ended up saving Greece from a Persian assault. This event is an example of
steganography, or protecting a message by hiding its existence. While effective
when not discovered, steganography provides no security if the message itself is discovered
(Singh, 1999).
The weaknesses of steganography led to the creation of cryptography. Cryptography hides
information not by concealing the existence of a message, but by making the information
unreadable except by those parties meant to read the message. In order to hide a message, a
sender takes the plaintext (information to be encrypted) and what is called a key and
combines them through an algorithm to produce a ciphertext (the encrypted message to be sent).
The algorithm is the basic instructions on how to encipher a message, while the key is a further
set of instructions which is specific to each message. Once the receiver gets the message, he or
she will use the algorithm along with the same key to decrypt the message, revealing the original
plaintext (Singh, 1999).
The first ciphers consisted of two categories: transposition and substitution.
Transposition revolves around rearranging the order of the letters in a word. For example, the
word “example” would become “xplamee”. The weakness of transposition is that, for the algorithm
to be secure, it must become very complex. Simple algorithms, like writing the message
backwards, provide little security. The second method, substitution, involves replacing certain
letters with other letters. The most famous of these ciphers was the Caesar cipher. The cipher
involved replacing a letter with the letter a certain number of places after it in the alphabet.
So if the shift was one, “example” would become “fybnqmf”, and if the shift was two,
“example” would become “gzcorng” (Singh, 1999).
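The shift described above can be sketched in a few lines of Python. This is only an illustration; the function name caesar_encrypt is a hypothetical choice, and the sketch assumes plain English letters, wrapping from "z" back to "a".

```python
def caesar_encrypt(text, shift):
    """Replace each letter with the letter `shift` places after it, wrapping at z."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)


print(caesar_encrypt("example", 1))  # fybnqmf
print(caesar_encrypt("example", 2))  # gzcorng
```

Decryption is the same operation with the negative of the shift.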
The weakness of the simple substitution cipher led to the creation of a critical part of
cryptography: cryptanalysis.
Cryptanalysis, or codebreaking, is the study of deriving the
plaintext from a ciphertext without knowing the key. The first known reference to cryptanalysis is in
the works of al-Kindī, an Arab philosopher. He suggested counting the number of occurrences
of a letter in a ciphertext and comparing that to the typical distribution of letters in that language.
This method, known as frequency analysis, shows the weakness of substitution ciphers, since
they have the same letter distribution in the ciphertext and the plaintext. Al-Kindī’s work also
showed that cryptographic ciphers could be broken by analyzing the ciphertext (Singh, 1999).
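A minimal sketch of frequency analysis against a simple shift cipher follows. It assumes the plaintext is English and long enough that "e" is its most common letter, which is not guaranteed for short messages; the function names are hypothetical.

```python
from collections import Counter


def shift_letters(text, shift):
    """Shift lowercase letters by `shift` places; used here to build a test ciphertext."""
    return "".join(
        chr((ord(ch) - ord("a") + shift) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )


def guess_shift(ciphertext):
    """Guess the shift by assuming the most common ciphertext letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext if "a" <= ch <= "z")
    most_common_letter = counts.most_common(1)[0][0]
    return (ord(most_common_letter) - ord("e")) % 26


plaintext = "meet me here, see the green trees"
ciphertext = shift_letters(plaintext, 3)
recovered_shift = guess_shift(ciphertext)
print(recovered_shift, shift_letters(ciphertext, -recovered_shift))
```

Because every "e" maps to the same ciphertext letter, the letter counts survive encryption, which is the weakness the paragraph above describes.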
Over time, the simple ciphers of the early days of cryptography were replaced by newer,
more complex ones. One of the most famous was Le Chiffre Indéchiffrable (French for “the
unbreakable cipher”), or the Vigenère Cipher. Substitution ciphers up to the Renaissance were
monoalphabetic ciphers, meaning a single substitution alphabet was used for the entire message.
While these ciphers allowed for easy frequency analysis, the lack of common knowledge in such
matters kept these ciphers secure. But with the Renaissance came a need for more secure
encryptions. In the mid-1500s, a French diplomat named Blaise de Vigenère developed the
cipher that now bears his name. This
cipher was a polyalphabetic cipher, meaning it used multiple substitution alphabets to encrypt a
single message. In the cipher, the plaintext would be encrypted using one of 26 different
substitution alphabets selected by the key (which was usually a keyword repeated for the length
of the message). Using multiple substitution alphabets made simple frequency analysis useless
since a given letter did not necessarily encrypt to the same letter for the length of the message. This
cipher remained secure until it was broken by Charles Babbage in the 19th century, who took
advantage of the repeating keyword to perform complex frequency analysis (Pincock, 2006).
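The keyword-driven choice of substitution alphabet can be sketched as follows. The sketch assumes a lowercase keyword and treats each keyword letter as a shift ("a" = 0, "b" = 1, and so on); the function name vigenere is a hypothetical choice.

```python
def vigenere(text, keyword, decrypt=False):
    """Shift each lowercase letter by the amount given by the repeating keyword."""
    result = []
    key_index = 0
    for ch in text:
        if "a" <= ch <= "z":
            shift = ord(keyword[key_index % len(keyword)]) - ord("a")
            if decrypt:
                shift = -shift
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
            key_index += 1  # the keyword only advances on letters
        else:
            result.append(ch)
    return "".join(result)


print(vigenere("attackatdawn", "lemon"))  # lxfopvefrnhr
```

In the output the two "t"s in "attack" become "x" and "f", so a simple letter count no longer lines up with the plaintext. The keyword still repeats every five letters, which is the pattern Babbage's attack exploited.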
During WWI, many new ciphers were created to keep military traffic secret. However,
they were all built on outdated thoughts and ideas, allowing cryptanalysts to break most ciphers.
Countries began looking for more powerful ciphers, and cipher machines presented an answer.
A cipher machine uses a mechanism to perform encryption rather than a person using pencil,
paper, and look-up tables. The first cipher machines were disks which simply made it easier to
perform the Vigenère Cipher. However, cipher machines would revolutionize cryptography in
1918 (Singh, 1999).
The Enigma Machine
Arthur Scherbius, a German inventor with a background in electrical engineering,
invented the Enigma machine in 1918. The machine consisted of a keyboard, a series of
scrambling mechanisms, and a lamp board. After pressing a key, an electrical current traveled
down a wire to the scrambling mechanism, which consisted of a series of rotors (the exact
number varies based on Enigma model; the standard is three) and a reflector. Each rotor had 26
contacts, each corresponding to a letter of the alphabet. Inside the rotor was a jumble of wires
which connected each contact on one side with a random contact on the other side. The
electrical current would pass through these rotors, changing position along the way, until it
reached the reflector. The reflector connected the contacts in random pairs (which can be
thought of as letters). So if the current entered the “A” contact, it would leave through the “G”
contact, moving in the opposite direction, and vice versa (current entering “G” would leave as
“A”). After passing through the reflector, the current traveled back through the rotors and again
changed positions. Once back at the side of the rotors where it began, it traveled to a lamp board
and illuminated a small light bulb under the letter corresponding to the contact through which
the current left the last rotor (Singh, 1999).
As described, this machine merely produces a monoalphabetic substitution cipher (a single
cipher alphabet for the entire message). However, there is one key feature to the machine which
provides impressive security. After a key is pressed, a stepper mechanism rotates the first rotor
one twenty-sixth of a rotation, so that it advances to the next contact. Every time the first rotor
completes a full rotation, the next rotor over is rotated one contact. So for every 26 steps of the
first rotor, the next rotor is stepped once, and for every 676 (26x26) steps of the first rotor, the
rotor one spot over is stepped 26 times and the rotor two spots over is stepped once, and so on.
The operator could change when this action occurred by setting a notch on the rotor to a specific
setting, but only one turn of the next rotor would ever occur for a full rotation of the original
rotor. This stepping action creates a powerful polyalphabetic cipher by producing a different cipher
alphabet for every letter encrypted, with no repeating keyword (Pincock, 2006).
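The stepping described above behaves like an odometer, which the following sketch illustrates. It is a simplification: it assumes every rotor carries when it returns to position 0, and it ignores the adjustable notch and any irregular stepping. The function names are hypothetical.

```python
def step_rotors(positions, contacts=26):
    """Advance the first rotor one contact and carry to the next rotor after a full rotation."""
    for i in range(len(positions)):
        positions[i] = (positions[i] + 1) % contacts
        if positions[i] != 0:
            break  # this rotor has not completed a rotation, so nothing carries over
    return positions


def positions_after(key_presses, num_rotors=3, contacts=26):
    """Return the rotor positions after a number of key presses, starting from all zeros."""
    positions = [0] * num_rotors
    for _ in range(key_presses):
        step_rotors(positions, contacts)
    return positions


print(positions_after(26))     # [0, 1, 0]
print(positions_after(676))    # [0, 0, 1]
print(positions_after(17576))  # [0, 0, 0]
```

Under this simplified model, three rotors return to their starting positions only after 26 x 26 x 26 = 17,576 key presses.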
Further security is provided by the plug board. The plug board was a series of 26
electrical plugs on the front of the machine with one for each letter of the alphabet. The
electrical current would pass through the plug board both before and after passing through the
rotors. The operator would insert a wire with jacks on both ends into two of the plugs. This
would switch letters connected together whenever current passed through the plug board (letters
with no connections remained the same) (Singh, 1999). The number of connections used varied
by who was using the Enigma (i.e. Army or Navy), but six was average (Kahn, 1991). In 1941,
all Enigma traffic was standardized to ten connections (Miller, 1996).
Operators of the Enigma had many settings on the machine to change depending on the
key. The order of the rotors (each rotor could be removed from the machine and placed in a
different position), the rotors’ starting positions, the notch position on each rotor, and the plug
board could all be changed based on the key (Ratcliff, 2003).
Arthur Scherbius marketed his new invention in 1923, hoping to sell models to both
civilians (such as businessmen) and military and diplomatic offices. While the machine failed to gain
popularity in the civilian sector, the German Navy, eager for a new cipher system after learning
that its system from WWI had been broken, took interest in the Enigma. It developed a system to use
the Enigma machine and increase security. Codebooks would be issued to ships which would
contain the keys to be used for a certain period of time (at first, keys were used for weeks or
months, but this later changed to daily key changes). Using these key settings, the operator
would arrange the rotors in the machine, turn the rotors to their starting positions, and plug the
plug board. The operator would make a random key and type it twice (like “ABCABC” for a
three rotor machine). The operator would then reset the machine using the random key and type
the message. This meant that the daily keys issued to every ship would only be used to encrypt a
random key rather than an entire message (Kahn, 1991). The Navy adopted the Enigma machine
in 1926. The German Army followed suit in 1928 (Kahn, 1993). These military models
contained rotors with different wiring than the commercial models, preventing enemies of
Germany from simply buying an Enigma to read their messages (Kahn, 1991).
Enigma Strengths
The Enigma machine had many strengths for the Germans. One was the ease of use. The
reflector mechanism made it so that, on the same settings, any letter’s encrypted counterpart will
encrypt back to the first letter. For example, on the setting ABC, if “A” encrypts to “B”, then
“B” will encrypt to “A”. This allowed for very easy decryption, since operators simply needed
to put in the same settings used to encrypt and type the ciphertext to get the plaintext
(“Cryptanalysis of the Enigma”, 2009).
The main strength that the Germans cited for the security of the Enigma machine was the
number of possible setups for the machine. There were 6.5x10^79 different ways to wire the three
rotors, 17,576 different possible settings for three rotors, 676 different notch settings, 7.9x10^12
different reflector wirings, and 5.3x10^14 different plug board wirings, bringing the grand total
number of combinations to 3x10^114.
This number was further increased when encryption
procedure was changed. Instead of only three rotors which could be rearranged, rotors were now
selected from a group of five, increasing the number of settings even further. This incredibly
large number allowed the Germans to feel safe, believing brute force attacks to be infeasible and
frequency analysis out of the question (Miller, 1996).
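The figures quoted above can be reproduced with a short calculation. The breakdown below is one reading of how each count arises and is an assumption, not taken from the cited source: 26! wirings per rotor, 26 positions per rotor, 26 notch settings on each of two rotors, a reflector that joins all 26 contacts into 13 pairs, and a plug board using anywhere from 0 to 13 cables.

```python
from math import comb, factorial


def pairings(num_contacts):
    """Number of ways to join an even number of contacts into unordered pairs."""
    half = num_contacts // 2
    return factorial(num_contacts) // (factorial(half) * 2 ** half)


rotor_wirings = factorial(26) ** 3   # three rotors, each an arbitrary wiring
rotor_positions = 26 ** 3            # starting position of each rotor
notch_settings = 26 ** 2             # notch position on two rotors
reflector_wirings = pairings(26)     # 13 pairs of contacts
# Choose which 2k letters are plugged, then pair them up, for k = 0..13 cables.
plug_board_wirings = sum(comb(26, 2 * k) * pairings(2 * k) for k in range(14))

total = (rotor_wirings * rotor_positions * notch_settings
         * reflector_wirings * plug_board_wirings)

for name, value in [("rotor wirings", rotor_wirings),
                    ("rotor positions", rotor_positions),
                    ("notch settings", notch_settings),
                    ("reflector wirings", reflector_wirings),
                    ("plug board wirings", plug_board_wirings),
                    ("total", total)]:
    print(f"{name}: {value:.1e}")
```

Almost all of the total comes from the rotor wirings term, which matters because that term only counts if the rotors can actually be rewired.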
Breaking Enigma
The Germans’ belief that Enigma was unbreakable due to the large number of different
settings was largely false. In order to take advantage of the vast number of ways to wire the
three rotors (which provides the bulk of the different combinations), they would have to have a
way to re-wire the rotors, or have each Enigma operator possess 6.5x10^79 different rotors.
However, the Germans never implemented a rewirable rotor, only using the same hardwired
rotors for thousands of messages. Using this, Allied cryptanalysts could reconstruct the rotor
wirings of the military rotors (Ratcliff, 2003).
For the most part, the same was true for the reflector. The reflector, even though it could
be removed and replaced with another, was kept the same by Enigma operators normally.
However, near the end of WWII, Germany developed a rewirable reflector. This development
terrified the Allied cryptanalysts, but never became an issue due to the fact that it was never fully
implemented and was later abandoned. Due to the weaknesses presented by the hardwiring of
the rotors and reflector, the number of possible settings was reduced from 3x10^114 to 1x10^23
(Ratcliff, 2003).
Even though the Allies were able to reconstruct the wirings for the rotors, there was still
the problem of the settings presented in the key: the rotor arrangement, the rotor settings, the
notch settings, and the plug board. To discover the key for a message, the Allied cryptanalysts
used a brute force attack. A brute force attack is a cryptanalytic method where the cryptanalyst
tries using every key possible to decode a message, under the assumption that every key but the
correct one will make an incomprehensible message. The one key that produces a message that
makes sense must be the correct key.
Allied cryptanalysts took advantage of many
weaknesses of the Enigma and its operation to reduce the number of keys that were required to
check, and then used a variety of tools (most of which led to the construction of the modern
computer) to check the remaining keys (“Brute force attack”, 2009).
Enigma Weaknesses Used to Reduce the Number of Keys to Check
One of Enigma’s biggest weaknesses was the mechanism that made it so easy to use: the
reflector. The reflector allowed encryption and decryption to be done easily using the same
settings. The reflector also had the effect of preventing any letter from encrypting to itself. So
an “A” in the ciphertext could never correspond to an A in the plaintext (“Cryptanalysis of the
Enigma”, 2009).
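Both effects of the reflector (encryption that is its own inverse, and no letter ever encrypting to itself) can be checked on a toy model. The sketch below uses randomly generated wirings rather than the real Enigma wirings, holds the rotors at one fixed position, and leaves out the plug board; all names are hypothetical.

```python
import random


def random_rotor(rng):
    """A random wiring: contact i on one side connects to wiring[i] on the other."""
    wiring = list(range(26))
    rng.shuffle(wiring)
    return wiring


def random_reflector(rng):
    """Join the 26 contacts into 13 random pairs."""
    contacts = list(range(26))
    rng.shuffle(contacts)
    reflector = [0] * 26
    for i in range(0, 26, 2):
        a, b = contacts[i], contacts[i + 1]
        reflector[a], reflector[b] = b, a
    return reflector


def encrypt_contact(contact, rotors, reflector):
    """Pass through the rotors, bounce off the reflector, and return through the rotors."""
    for rotor in rotors:
        contact = rotor[contact]
    contact = reflector[contact]
    for rotor in reversed(rotors):
        contact = rotor.index(contact)  # follow the same wire in the opposite direction
    return contact


rng = random.Random(1)
rotors = [random_rotor(rng) for _ in range(3)]
reflector = random_reflector(rng)
table = [encrypt_contact(c, rotors, reflector) for c in range(26)]

print(all(table[c] != c for c in range(26)))         # no letter encrypts to itself
print(all(table[table[c]] == c for c in range(26)))  # encrypting twice gives the original
```

Both checks hold for any choice of wirings, because the reflector never connects a contact to itself and the return path retraces the same wires.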
Allied cryptanalysts relied heavily on cribs for breaking Enigma. Crib is a cryptographic
term used to refer to a plaintext that is known or suspected to be in a ciphertext. Enigma
operators were notorious for including long phrases in many reports, such as Keine besonderen
Ereignisse (figuratively translates to “nothing to report”) (Milner-Barry, 1993), or even using
small words like eins (one) in their messages repeatedly (“Cryptanalysis of the Enigma”, 2009).
Operators would include callsigns (the identification of the transmitting and receiving stations) in
their broadcasts, allowing identical plaintexts to be cross referenced (Welchman, 1984). The
British would even place mines in certain areas of ocean in order to try and control what a
German message would say (a method known as “gardening”) (Morris, 1993). Knowing or
guessing plaintext provided the basis for most methods of cryptanalysis of the Enigma.
The plug board, a tool which was added to provide extra security to the Enigma machine,
also provided a weakness. If “A” was connected to “B”, then “A” would always switch with “B”
and “B” would always switch with “A”. Since this stayed the same for the entire message,
Allied cryptanalysts could exploit this feature as an advantage. Creating what was called a
diagonal board, they could reduce the number of rotor settings to be checked considerably
(Welchman, 1984).
German Enigma protocols, while made to increase security, created weaknesses in the
cipher. Operators were instructed to use the daily key to encrypt a made-up key that was typed
twice (“ABCABC” for example). Since the rotors were turning during this process, it made it
seem that this was a secure practice. But in reality, by encrypting the same plaintext in two
different places, the Germans had given Allied cryptanalysts an advantage because they could
compare two different ciphertexts knowing that the plaintexts were identical. This allowed
Allied cryptanalysts to make cycle groups (also called boxes of chains) which reduced
the number of possibilities to be tested from 10,000 trillion to 105,456, which is the number of
possible rotor settings (6 rotor orders x 17,576 starting positions) (Singh, 1999).
The human operators of the Enigma machine also introduced weaknesses into the cipher. The
“made-up” keys at the beginning of every message (which already present a weakness by
repeating) were notoriously not that random. Operators usually used nearby key combinations,
like “QWE” (the first three letters on the top row) (Rejewski, 1984). Also, operators would use
German three letter words, such as “IST” (which translates to “is”), and one operator even
used the initials of his girlfriend (Pincock, 2006). All of these nonrandom combinations made it
much easier for Allied cryptanalysts to break the keys.
More human error came from encrypting and sending messages twice. Often in the
German Navy, messages would be sent via the Enigma system and then resent verbatim using
weaker, easy to break ciphers. Known as kisses, these careless practices by human operators
allowed Allied cryptanalysts to compare full plaintexts with ciphertext, increasing their
understanding of Enigma and allowing completely different messages to be read more easily
(Mahon, 1945).
It was generally assumed that adding more rotors would increase security of the Enigma
machine. However, to make it portable, most versions of the Enigma machine used only three
rotors (“Enigma machine”, 2009). High security models used by Army HQ staff had many more
rotors than three (“Enigma machine”, 2009). But when it comes down to it, mechanics limited
the Enigma machine’s effectiveness. The need for a physical device with moving parts limited
how many rotors could be used, or even how they were used. Modern computers can generate
the effect of an Enigma machine, but, theoretically, without the limit of number of rotors.
Another physical limit was stepping action. The stepping action of the rotors was created
by ratchets and pawls (Hamer, 1997). Due to how complex these mechanisms would be if
intricate stepping was used, only single steps were used in most Enigmas (as earlier described)
(Ratcliff, 2003). Some Enigmas contained more advanced stepping action (such as early stepping
of the middle and end rotors or double steps), but the normally simple stepping action made
cryptanalysis much easier than erratic stepping.
Modern Cryptography
After WWII, science saw a revolution with the development of the computer, which was ironically
rooted in the Allied cryptanalytic effort during the war. During this time of development, ciphers
became shrouded in secrecy with the start of the Cold War. The US government had no
computer-based encryption standard (that is known publicly) until the early 1970s with the
creation of the Data Encryption Standard, or DES. This system, which was disclosed to the
public, became the government standard until it was replaced by the AES system (“History of
cryptography”, 2009).
One issue in modern cryptography is key size. The key size is the amount of information
(in bits) in each key. If there were eight possible keys for a cipher, the key size would be 3 bits
(2^3 = 8). DES uses a 56-bit key (“Data Encryption Standard”, 2009) while AES uses various key
sizes: 128, 192, and 256 bits (“Advanced encryption standard”, 2009). Because keys must be
transferred securely, a smaller key is seen as easier to keep safe than a larger key, but larger keys
present more security. A key the size of the original message would provide perfect security if it was
kept secret (this method is known as a one-time pad) (Singh, 1999).
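The relation between the number of possible keys and the key size in bits can be sketched as below. The function name is hypothetical, and the sketch rounds up when the number of keys is not an exact power of two.

```python
def key_size_bits(num_keys):
    """Smallest number of bits able to label `num_keys` distinct keys."""
    return (num_keys - 1).bit_length()


print(key_size_bits(8))        # 3
print(key_size_bits(2 ** 56))  # 56, the DES key size
```

Each added bit doubles the number of keys, so a 128-bit key space is 2^72 times larger than a 56-bit one.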
Brute force attacks can easily be implemented on modern computers. As chips become
faster, larger keys are required to keep systems secure from this attack. DES, using custom
chips, can be brute forced in a matter of days. AES has yet to be publicly brute forced (“Brute
force attack”, 2009).
An important issue for cryptographers and cryptanalysts is determining when a brute
force attack would be ineffective. If there were only 100 keys that a cryptanalyst would have to
search through, they could find the key in almost no time, but if there were 1x10^100 keys to
search through, it would take much too long to find the key. Now, 1x10^100 might be a bit
overkill, but any number of keys that would require more than a certain amount of time to search
can be considered secure from brute force attack. The certain amount of time would vary based
on the information being encrypted. If the information that is being encrypted is battle plans that
are to be carried out the next day, then the information may only need to be secure for one week,
but if the information is launch codes for nuclear missile silos, the information would need to
remain secure for decades (if not longer).
The number of keys required then would be
proportional to the length of time that the cipher must remain secure (“Key size”, 2009).
Other Issues in Modern Cryptography
Every cipher mentioned to this point is called a symmetric cipher. That means the key
used to encrypt and decrypt is identical. There is another system called an asymmetric cipher, or
public key encryption. This involves the encrypting key (which is not kept secret and can be
used by anyone) being different from a decrypting key (which is kept secret). As of now, this
method is secure, but no proof of its security exists (Pincock, 2006).
Outside of brute force attacks (and other cryptanalytic attacks that focus on the cipher
itself like frequency analysis) are side-channel attacks. These attacks involve looking at the
physical system used to implement the cipher rather than the cipher itself. By watching the
system, attackers can discover information which can allow for the breaking of an otherwise
secure system (“Side-channel attack”, 2009).
Since this attack has to do with physical
implementation of a cipher and not the theory of the cipher itself, this paper will not address
side-channel attacks on rotor ciphers.
Kerckhoffs’s Principle
Kerckhoffs’s Principle is a fundamental idea of cryptography. The principle states that an
encryption system should remain secure even if everything about the system (outside of the
message-specific key) is known. In other words, the key should be the only factor relied upon
for security (Kahn, 1996).
Various Other Topics
Best/Average/Worst Cases
When looking at an algorithm, one issue taken into consideration is the best, average, and
worst case scenario. These are the amount of resources, such as run time or memory required,
that an algorithm will require in three different situations. Best case would be an optimal run of
the algorithm using the fewest resources, while worst case would be a run of the algorithm that
uses the most resources. Average case would be the average amount of resources
required to run the algorithm. For most algorithms, it is ideal to have average case near to or
equal to best case, but for cryptography, it is ideal to have average case equal worst case (“Best,
worst and average case”, 2009).
SMS Language
During the first decade of the 21st century, short messaging service, or SMS, has become
one of the most popular forms of communication. Usually referred to as texting, it involves
sending a message of no more than 160 characters (http://www.3gpp.org). The need for short
messages that are quick to write or type has caused the evolution of a new language (“New
Language Spawned by SMS Abbreviations”, 2009). Called SMS language, it involves removing
letters from words, replacing common phrases with abbreviations, and using non-letter symbols
to replace words (Alvi, 2009). Popular with teenagers, this language has found its way into
Standard English (such as the abbreviation “LOL” meaning “laugh out loud”). Certain teenagers
are even using aspects of the language in essays for school, like shortening “you” to “u”, which
has created controversy over the language’s validity (“New Language Spawned by SMS
Abbreviations”, 2009).
Methods
This project consists of two phases: making modifications to the Enigma cipher to
strengthen it from the cryptanalysis run against it during WWII, and implementing these
modifications on a computer and testing for the viability of a brute force attack based on the
number of rotors in the cipher.
Modifications to the cipher
The first, and seemingly most obvious modification, is the elimination of the reflector.
The reflector was added so that a letter would be encrypted through the rotors twice rather than
once and so that encryption and decryption follow the exact same process (same settings and
everything). However, the weakness of not being able to have the ciphertext character be the
same as the plaintext character is far too great to justify either of the two strengths, and computers
can do things a mechanical device cannot do, allowing for both strengths to be implemented
without the need of a reflector. So the reflector is eliminated from the cipher.
When it was used during WWII, the rotor cipher consisted of rotors with 26 contacts, one
for every letter of the alphabet. For the modified cipher, the rotors had 256 contacts. This serves
two main advantages. 256 converts perfectly to binary (8 binary digits), which allows for a
medium of data transfer that is universal as opposed to simply the 26 letters of the English
alphabet. Also, 256 provides an immense number of wiring possibilities (256!). The number of
rotors is easily changeable, but the exact number is dependent on the amount of security desired
(see “Assessing the effectiveness of a brute force attack” section for more information).
Another major modification to the cipher is the addition of a “shift rotor” in addition to
the normal rotors (from now on referred to as substitution rotors). The substitution rotors
provide the actual substitution alphabet for each letter to be encrypted with, but the shift rotor
tells how to advance the substitution rotors. The Enigma machine would advance each rotor one
step at a time and would not turn the 2nd or 3rd rotor until the previous rotor had made a complete
revolution. All of these factors aided in making the cryptanalysis of the Enigma much easier. But
during those few times when the Enigma would advance the 2nd rotor early, or with the special
Enigma variations which would occasionally “double step” (advance two settings), the Allied
cryptanalysts were drastically slowed down. So by devoting entire rotors to shifting the
substitution rotors, cryptanalysis should be made extremely challenging. Each substitution rotor
has its own shift rotor which is unique to each rotor (no two rotors should use the same shift
rotor). The shift rotors themselves will also shift, but will simply advance one position every
letter encrypted.
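One possible reading of the modified cipher is sketched below. Several details are assumptions made for illustration and are not specified in the text: the wirings come from a seeded random generator, the value found at a shift rotor's current position is taken as the number of steps its substitution rotor advances, and the key lists the substitution rotor starting positions first, then the shift rotor starting positions. All names are hypothetical.

```python
import random

SIZE = 256  # contacts per rotor, one per possible byte value


def make_wiring(rng):
    """A random permutation of 0..255 standing in for a fixed rotor wiring."""
    wiring = list(range(SIZE))
    rng.shuffle(wiring)
    return wiring


def build_rotors(num_rotors, seed=2010):
    """Create one substitution rotor and one shift rotor per rotor slot."""
    rng = random.Random(seed)
    substitution = [make_wiring(rng) for _ in range(num_rotors)]
    shift = [make_wiring(rng) for _ in range(num_rotors)]
    return substitution, shift


def invert(wiring):
    """Wiring followed in the opposite direction, needed because there is no reflector."""
    inverse = [0] * SIZE
    for source, target in enumerate(wiring):
        inverse[target] = source
    return inverse


def run_cipher(data, key, substitution, shift, decrypt=False):
    """Encrypt or decrypt bytes; `key` holds one starting position per rotor."""
    n = len(substitution)
    if len(key) != 2 * n:
        raise ValueError("key needs one starting position per rotor")
    sub_pos = list(key[:n])    # assumed layout: substitution rotor positions first
    shift_pos = list(key[n:])  # then shift rotor positions
    tables = [invert(w) for w in substitution] if decrypt else substitution
    order = range(n - 1, -1, -1) if decrypt else range(n)
    out = bytearray()
    for value in data:
        for r in order:
            value = (tables[r][(value + sub_pos[r]) % SIZE] - sub_pos[r]) % SIZE
        out.append(value)
        for r in range(n):
            # The shift rotor decides how far its substitution rotor advances,
            # then the shift rotor itself advances one position.
            sub_pos[r] = (sub_pos[r] + shift[r][shift_pos[r]]) % SIZE
            shift_pos[r] = (shift_pos[r] + 1) % SIZE
    return bytes(out)


substitution, shift = build_rotors(3)
key = bytes([10, 20, 30, 40, 50, 60])
message = b"attack at dawn"
ciphertext = run_cipher(message, key, substitution, shift)
print(run_cipher(ciphertext, key, substitution, shift, decrypt=True))
```

Decryption applies the inverse wirings in reverse order while stepping the rotors exactly as encryption does, which is how the sketch keeps encryption reversible without a reflector.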
The key tells the initial positions of every rotor. When specifying the number of rotors,
one is specifying how many substitution rotors there are. So if there are three rotors, there are
actually six rotors, three substitution rotors and three shift rotors. So the length of a key is the
number of characters equal to twice the number of rotors. Each character, which gives an
individual start position for a rotor, is really eight bits. So the key size in bits is 16 times the
number of rotors.
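The key size above, together with the relation between search time, number of rotors, and key search speed given in the abstract, can be sketched as follows. The search speed in the example is a made-up figure, not a measured result, and the time shown is for trying every key (the worst case).

```python
def key_size_bits(num_rotors):
    """Two 8-bit starting positions (substitution and shift rotor) per rotor."""
    return 16 * num_rotors


def number_of_keys(num_rotors):
    """Each of the 2n rotors can start in any of 256 positions."""
    return 256 ** (2 * num_rotors)


def brute_force_seconds(num_rotors, keys_per_second):
    """Time to try every key at a given key search speed."""
    return number_of_keys(num_rotors) / keys_per_second


SECONDS_PER_YEAR = 365 * 24 * 60 * 60
hypothetical_speed = 1_000_000  # keys per second, chosen only for illustration

for rotors in range(1, 5):
    years = brute_force_seconds(rotors, hypothetical_speed) / SECONDS_PER_YEAR
    print(rotors, key_size_bits(rotors), f"{years:.3g} years")
```

Each additional rotor multiplies the number of keys, and therefore the search time, by 256^2 = 65,536.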
Modifying the plaintext prior to encryption
Even with a stronger cipher, there are still several problems with the plaintext.
Commonly repeated words provide suspected plaintexts, or cribs, for cryptanalysts to use to try
and break a cipher. Normal English text contains certain letters that are used more commonly
than others (like “E”), which allow for frequency analysis, the main method used for
cryptanalysis of polyalphabetic substitution ciphers. Finding a way to strengthen plaintext
before encryption is vital to a cipher’s security.
Text Shortening
One way to do this is to shorten the plaintext. English is an inefficient language with
long words that can be shortened. Evolution of this can be seen in SMS language, or texting
language, where the writer uses abbreviations and letter removal (usually vowels) to create a
message still readable but much shorter in length. However, SMS language is not always clear,
causing occasional miscommunication. Individuals who do not know the abbreviations may not
even be able to read the language.
To implement the concept of text shortening, rules must be developed for text shortening
that a computer can follow and that produce a shortened text still readable by any individual.
Ideally, a human could provide the best shortening since a person can use complex logic to know
whether removal of certain characters will yield understandable text, but long passages of text
that need shortening would be time consuming for a person to shorten, so a computer method
was used here. The shortened text must be understandable to any individual as well since the
process of shortening is one way (that is, the text will be shortened before encryption, but after
decryption, the text will not be “un-shortened”; it will stay shortened).
To find the best rules, an experiment was devised to test how text shortening affects
comprehension. Two types of shortening methods were developed, which are referred to as
conservative and liberal. Some rules are followed by both methods (like “to” becoming “2”), but
each method has some individual rules. Conservative removes certain unnecessary letters while
liberal removes all vowels (following some rules, however). The complete set of rules can be
found in Figure 1.1.
For testing, many different passages were created. Two different types of text were
identified: conversational and analytical. Conversational text is meant to simulate everyday
communication by average people, such as E-Mail traffic. It contains many 1st, 2nd, and 3rd
person pronouns, mostly simple words, and few or no large names, numerical figures, or specific
location names, but may contain dates and times. Analytical text is supposed to simulate
secretive traffic by organizations such as the government or scientists.
It contains many
numbers, figures, names, and specific locations. It is crucial that this information remain clear,
since miscommunication could be disastrous. There were four different samples created for each
type of text. The four samples for conversational have no theme to them and were referred to as
Sample 1, Sample 2, Sample 3, and Sample 4. For analytical, four different sub-types were
identified: Diplomatic, Military, Corporate, and Scientific. Each simulates a different field
where encryption would be necessary to preserve secrecy. All passages are approximately the
same size in words and characters (with other passages from their type), and all passages can be
found in Figure 1.2. Each passage was shortened via a computer program written by me with
both methods of shortening. The program can be found in Figure 1.3.
Testing involved a test subject reading passages out loud while observed by me (the
judge). Three measures of data were collected: stumbles, missed words, and time. Every time a
subject paused their reading for a period of time outside normal pauses, or when the subject
began to read a word incorrectly and then fixed the word, the judge marked down a “stumble”.
Every time a subject completely read a word incorrectly, the judge marked down a “missed
word”. Also, if a subject paused for approximately a second before reading a word because he or
she could not figure out the meaning of the word, the judge told the subject to skip the word and
continue and marked down a “missed word”. The time of each passage’s reading was also recorded.
Each subject read sixteen passages.
Eight of the passages were shortened (half
conservatively, half liberally), and eight of the passages were controls (not shortened at all). A
subject always read the shortened form of a passage before the control of the passage. The
control allows for a time unique to each subject for comparing the shortened text to normal text.
For the test, subjects read three shortened passages, then alternated between control and
shortened until all shortened passages were read, then completed the remaining controls.
Shortened passages were alternated between conservatively shortened and liberally shortened.
The order of the specific passages was randomized from subject to subject excluding the first
passage read. Since this is the first shortened passage the subject would read, it was decided that
it was crucial this did not repeat, so over the course of sixteen runs, each of the eight passages
should be read first once under each shortening. Also, after eight tests, each passage should be
read four times under each shortening. A random order is important so that there is no bias to the
learning curve that subjects develop while reading shortened texts.
The testing environment during each test was quiet and free of distractions. Each test
subject had no knowledge of the content of the passages or of specifics of the text shortening
process (all subjects did know the general premise of the test and those subjects under 18 and
their parents signed consent forms). It was important subjects had no knowledge of the specifics
of the text shortening (outside of experience possibly gained from the internet or texting) so that
the test could reflect an average person’s comprehension. Subjects were selected with no
distinction (such as gender, race, background, etc.), but it was noted when the subject was over
18 or 18 and under. This distinction, referred to as adult vs. youth, is to see if the increased use
of internet and texting by the generation currently in high school translates to increased abilities
in reading shortened texts. After the test, subjects were asked four questions: two concerning
their familiarity with texting, and two concerning their familiarity with online instant messaging.
Both pairs of questions look to see if there is a correlation between texting/internet use and ability to
read shortened texts.
In order to try and eliminate any bias from me as a judge, a second judge was present at a
few of the tests. During each of these tests, he would also mark down stumbles and missed words. Each
judge could not see what the other was marking. This was to make sure a single judge was not
pulling the experiment one way or the other.
Other modifications to the plaintext prior to encryption
Even with text shortening, there are still letters used more commonly than others. A way
to get around this is to trick cryptanalysts by switching out common characters. Since there are
256 characters at our disposal when encrypting text, not all 256 characters are going to be used.
These extra characters would otherwise be useless, but if common characters (like “E”) were replaced with
certain extra characters, frequency analysis would be hindered. Decryption would be
simple, since each extra character would always correspond to a specific normal character (like
“$” always equaling “A”). This method would not make frequency analysis impossible by any
means, but is merely a means of slowing the process of cryptanalysis. Also, keeping in mind
Kerckhoffs’s Principle, the system of character exchange should be changed frequently in case a
cryptanalyst discovers the system.
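The character exchange described above can be sketched as follows. This is only an illustration in Python under stated assumptions: the plaintext is assumed to use only byte values below 128, the exchange table is hypothetical, and choosing at random between a letter and its stand-ins is one possible way of spreading out the counts, not necessarily how a full implementation would do it.

```python
import random

# Hypothetical exchange table: each common letter is given a few spare
# byte values (128-255 are assumed to be unused by the plaintext).
EXCHANGE = {
    ord("e"): [128, 129, 130],
    ord("t"): [131, 132],
    ord("a"): [133, 134],
}
# Each spare value always maps back to exactly one normal character.
REVERSE = {spare: normal
           for normal, spares in EXCHANGE.items()
           for spare in spares}

def exchange(plaintext, rng):
    """Replace common characters with one of their spare stand-ins."""
    out = bytearray()
    for b in plaintext:
        spares = EXCHANGE.get(b)
        # Pick among the original byte and its spares so that no single
        # value carries the full count of a common letter.
        out.append(rng.choice([b] + spares) if spares else b)
    return bytes(out)

def restore(text):
    """Undo the exchange after decryption."""
    return bytes(REVERSE.get(b, b) for b in text)

message = b"the rate at the gate"
swapped = exchange(message, random.Random(0))
print(restore(swapped) == message)
```

Because every spare value corresponds to one normal character, restoring the text needs no key, only the table, which is why the table itself should be changed frequently.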
While this process would be included in a full implementation of the cipher, it was
omitted from the brute force simulation described below because the simulation assumes a
cryptanalyst knows everything about the cipher (e.g. the rotor wirings) except the message-specific key
(Kerckhoffs’s Principle).
Assessing brute force effectiveness
The modifications to the cipher were all done in order to strengthen the rotor cipher from
cryptanalysis. However, there is one attack that may still be effective: the brute force attack.
Unless the key size is very large (like a one time pad where the key size is the same as the size of
the plaintext), a brute force attack can always produce the key to decipher any ciphertext.
However, as key size increases, the number of searches to run for a brute force attack and the
time it would take to run the attack increases exponentially. At a certain point, it can be
determined that a brute force attack would be ineffective because a search would require years to
discover the key. Since a smaller key is preferable to a larger key, the key need only be as large
as it takes to make a brute force attack ineffective (larger is okay, but it is in a sense overkill).
To find the key size where brute force is ineffective, an experiment was run with the
modified rotor cipher. The rotor cipher’s key size is easy to modify because it is easy to add or
remove rotors from the cipher. To experiment with brute force attacks, the cipher was coded in
Visual Basic. A language like Assembler would be superior in speed to Visual Basic, but due to
limited resources and my programming knowledge, Visual Basic was used. The cipher was coded as
efficiently as possible, since it is assumed that a cryptanalyst would use only the most
efficient ways to implement the cipher for a brute force attack. A screenshot of the interface can
be found in Figure 1.4 and the program can be found in Figure 1.5.
Once the program was completed, small modifications allowed it to run a brute force
attack on a ciphertext created using the cipher. These modifications included searching through
every key along with a very simple plaintext analysis to determine which key is the correct one
(a good plaintext analysis would be very complicated; mine simply looked for the word “the”).
For experimentation, a worst case scenario was simulated. The best case scenario would be the first
guess being the correct key, so it does not require experimentation. Every possible key is
equally likely to be the correct one, so there is a 50% chance the key will be found by halfway
through the search. The average case scenario should therefore take about half the time of the worst
case scenario.
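The structure of such a worst case search can be sketched in a few lines. The sketch below is in Python rather than Visual Basic, and it does NOT implement the rotor cipher: a simple stand-in cipher (adding repeating key bytes modulo 256) is assumed purely so that the search loop has something to decrypt. The function names are illustrative.

```python
from itertools import product

def toy_encrypt(plaintext, key):
    # Stand-in cipher (not the rotor cipher): add repeating key bytes mod 256.
    return bytes((p + key[i % len(key)]) % 256 for i, p in enumerate(plaintext))

def toy_decrypt(ciphertext, key):
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(ciphertext))

def brute_force(ciphertext, key_len=2, crib=b"the"):
    """Worst case search: try every key and keep those whose output
    contains the crib (a very simple plaintext analysis)."""
    candidates = []
    tried = 0
    for key in product(range(256), repeat=key_len):
        tried += 1
        if crib in toy_decrypt(ciphertext, bytes(key)):
            candidates.append(bytes(key))
    return candidates, tried

secret_key = bytes([17, 203])  # a 16 bit key, like a single rotor
ciphertext = toy_encrypt(b"the quick brown fox", secret_key)
candidates, tried = brute_force(ciphertext)
print(tried, secret_key in candidates)
```

A two byte key gives the same 65,536 key search as one rotor. Stopping at the first candidate instead of searching every key would give the average case behavior, at the cost of possibly accepting a false match.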
The experiment consisted of many trials, with every trial being five runs of the worst case
scenario (a full key search), with the program clocking the run time of each search.
An
encrypted form of “The quick brown fox jumps over the lazy dog.” was used as the ciphertext
being attacked. The variable was the number of rotors (the key size). Starting with one rotor,
the number of rotors increased after every trial until the time it took to run the trials became
unreasonable (i.e. taking more than a few hours to run a full search). Once this was done, data
analysis produced a graph of run time versus rotors (key size). From that, a regression line was
created to predict run time for any number of rotors.
One major issue was the choice of computers. Due to accessible resources, the trials
were all run on household/workplace grade computers with the algorithm running as a Windows
application (coded in Visual Basic). This represented what key size would be necessary to keep
the cipher secure from an average person running a brute force attack. The ideal brute force
attack would be conducted on a supercomputer with the algorithm coded in Assembler or
machine code for speed, but lack of resources prevented any such test from being run.
It is important to remember that this experiment assumes that a cryptanalyst knows the
wiring and order of the rotors, making the only unknown the message-specific key (rotor
settings). In practice, it would be an immense challenge for cryptanalysts to discover the rotor
wirings. But unless the rotors are changed frequently, Kerckhoffs’s Principle must be taken into
account and it can be assumed that a cryptanalyst could discover the rotor wirings.
A second part of this phase of the experiment was also conducted to determine what
specific aspects of a computer increase the speed of the cipher. Two aspects were looked at: the
processor and the amount of RAM. To conduct this experiment, a brute force attack was run on
many different computers ten times with the search time recorded. The parameters of the brute
force attack were a single rotor with an encrypted form of the text “the quick brown fox”.
Information on each computer was recorded as well; specifically operating system, type of
processor, the speed of the processor(s) in GHz, and the amount of RAM in GB (all of this
information was accessed by right clicking on “My Computer” and selecting Properties).
An issue with the brute force simulation in general is what else the computer was running
while the test was run. Each test was conducted with all other applications closed, but since
Microsoft Windows runs many processes in the background, it is hard to know whether the
processor is fully being used to run the brute force attack. However, since no other major
applications were being run, the speed reduction can be considered minor. Further
experimentation regarding this issue could yield faster times, but this will not be discussed in this
paper.
Results & Analysis
Results for text shortening
A total of 20 subjects took part in the experiment. Of those 20, 6 were adult (over 18)
while 14 were youth (18 and under). Each subject read 16 passages (8 shortened and 8 control)
meaning that there were 320 pieces of data, with each piece containing number of stumbles,
number of missed words, and time. Three categories of comparison were used for comparing
data: number of stumbles, number of missed words, and Vs Control % (percent difference
between the time of a shortened passage vs. the time it took the same subject to read the control
for that passage). Pure time was not used for comparison since what needed to be identified was
the difference between reading shortened text and normal text.
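To make the Vs Control % measure concrete, it can be written as a one-line calculation. This is a minimal sketch in Python; it assumes the percent difference is taken relative to the control time, and the reading times in the example are made up, not measurements from the experiment.

```python
def vs_control_percent(shortened_time, control_time):
    # Percent by which the shortened passage took longer (positive)
    # or shorter (negative) to read than its control.
    return (shortened_time - control_time) / control_time * 100.0

# Hypothetical subject: 30.0 s for the control, 33.0 s for the shortened form.
print(vs_control_percent(33.0, 30.0))
```

Because each shortened time is compared with the same subject's own control time, fast and slow readers can be compared on the same scale.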
After running a t-test, it was found that there is a very significant difference in the Vs
Control % of conservatively shortened text and liberally shortened text.
Conservatively
shortened text took on average 11.2% longer to read than control while liberally shortened text
took 45.6% longer to read than control (P value between the shortening methods was 5.50 × 10^-20).
This result basically confirms the belief that a more shortened text is harder to comprehend and
takes longer to read, but does not really tell how understandable the shortened text is compared
to normal text. A complete table of comparison can be found in Figure 2.1.
To see how understandable the text is, the shortened text must be compared to the control
text. They were compared with number of stumbles and number of missed words (it is already
known that both shortening methods took longer to read, so time was not used). Also, since it is
being compared to a control, each method of shortening can now be looked at individually.
Passages using the liberal method of shortening contain a significantly higher number of
stumbles and missed words than the control, meaning that liberally shortened text is very hard to
comprehend. On average, about 2 missed words were present for every liberally shortened text,
so meaning may be distorted by the text shortening. However, this is only 2 words in a 65-70
word passage, which is not that much. Liberally shortened analytical passages also had more
stumbles and missed words than conversational passages, which makes sense since analytical
passages contain larger words that could be harder to read if shortened. A complete table of
comparison can be found in Figure 2.2.
Conservatively shortened passages had a significantly higher number of stumbles than
control, but the increase in missed words was not significant (P value was 0.0873). The
number of missed words on average for conservatively shortened passages was 0.05, which is
very low. This means that while comprehension may be slowed down, meaning is almost
completely preserved. A complete table of comparison can be found in Figure 2.3.
While a general trend existed of adults having slightly higher stumbles, missed words,
and Vs Control %, no trend exhibited any significance. The lowest P value of any youth-adult
comparison was 0.2029, which is not near significant. A complete table of comparison can be
found in Figure 2.4.
After every test, the subject was asked four questions to assess their familiarity with
texting and online lingo. When comparing those who said they were familiar with those who
said they were not familiar, no significant difference was found. Interestingly, there was a very
slight increase in stumbles and missed words for those who were familiar with texting and online
lingo compared to those who were not, but none of the data was significant (the lowest P value
was 0.2390). A complete table of comparison can be found in Figure 2.5 and Figure 2.6.
When the data I gathered was compared to the other judge’s data, it was found that
there was no significant difference in our results, which were very close to each other’s. The P
value of the comparison was 0.8660.
Results of Brute Force Attack simulation
When the brute force attack simulation was run, a major problem was discovered. The
time it took to run through a single rotor (16 bit key) was about 1.787 seconds (data can be found
in Figure 2.7). If the number of keys searched is divided by the average time (2^16 / 1.787), a
search speed of 36,674 keys per second is produced. If a test were to be run to try and find the
search time for a second rotor, a total of 4,294,967,296 (256^4) keys would need to be searched.
At the speed of 36,674 keys per second, that would take 117,112 seconds, or about 32.5 hours. Even
greater quantities of time would be required to search any more rotors. Knowing this, it can be
concluded that an average computer not designed for cryptography could not run an effective brute
force attack on a ciphertext encrypted with three or more rotors.
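The arithmetic behind these figures can be reproduced in a few lines. This is a sketch in Python using only the one-rotor time reported above; the variable names are illustrative.

```python
keys_one_rotor = 2 ** 16        # 16 bit key for a single rotor
time_one_rotor = 1.787          # average seconds for a full one-rotor search

speed = keys_one_rotor / time_one_rotor   # keys searched per second
keys_two_rotors = 256 ** 4                # 32 bit key for two rotors
seconds = keys_two_rotors / speed         # full two-rotor search time
hours = seconds / 3600

print(round(speed), int(seconds), round(hours, 1))
```

Each added rotor multiplies the number of keys by 256^2 = 65,536, so the search time grows by the same factor if the search speed stays constant.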
While this is a result in itself, further research was conducted on the brute force attack.
For the time trials, a very short message was used (“the quick brown fox”), but an investigation
into the relation between message size and key search speed seemed necessary, since most
messages would be much longer than “the quick brown fox”. So a second set of time trials was
run. A brute force attack was run on messages of various sizes (one rotor only, due to already
mentioned reasons). Then, using Excel, the data was graphed and a regression equation was
created. Two graphs were made: one of the raw times, the other of the key search speed (in keys
per second). Both graphs can be found in Figure 2.8 and Figure 2.9.
To estimate the amount of time it would take to run a brute force attack, one would need
to take the number of keys and divide it by the key search speed at that size of message. The
number of keys can be found by taking 256 to the power of two times the number of rotors. So
the equation would look like this:
t = 256^{2n} / s
where t = time, n = number of rotors, and s = key search speed. The key search speed is relative
to the size of the message. The regression equation in Figure 2.9 would be used for an average
computer not designed for cryptography, but a new regression equation would need to be made for
computers and processors specifically designed for running brute force searches. However, the
equation should still be effective in finding the number of rotors required to keep a message
secure if one were to put in the time desired for the message to remain secure and the key search
speed.
Something to remember is that all of this is working on the worst case scenario of the
brute force attack where every key must be searched. In reality, an average case scenario would
be more realistic. To find the average case scenario time, one can simply divide the worst case
scenario time by 2. So the equation for average search time would be: 2t = 256^{2n} / s.
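These equations can be turned into a small calculator for choosing the number of rotors. The following is a sketch in Python with illustrative function names; it assumes a constant key search speed, and the example uses the one-rotor speed of about 36,674 keys per second measured on a household computer.

```python
def worst_case_seconds(num_rotors, keys_per_second):
    # t = 256^(2n) / s
    return 256 ** (2 * num_rotors) / keys_per_second

def average_case_seconds(num_rotors, keys_per_second):
    # On average the key is found about halfway through the search.
    return worst_case_seconds(num_rotors, keys_per_second) / 2

def rotors_needed(target_seconds, keys_per_second):
    # Smallest number of rotors whose full search takes at least the
    # desired amount of time at the given search speed.
    n = 1
    while worst_case_seconds(n, keys_per_second) < target_seconds:
        n += 1
    return n

ten_years = 10 * 365 * 24 * 3600
print(rotors_needed(ten_years, 36674))
```

A faster attacker only changes the value of s, so the same functions apply once a search speed for that attacker's hardware is known.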
Results of brute force simulation on different computers
Nine computers were tested. All ran Microsoft Windows XP except one which ran a 64
bit version of Windows 7. Information on all computers can be found in Figure 2.10. Average
times of the ten trials on each computer were between 0.9 seconds and 1.75 seconds except for
that of the Netbook which averaged almost 4 seconds. This raises an interesting point, since the
GHz of the processors and the GB of the RAM for the Netbook were not outside the normal
range of the other computers, but the time from the Netbook was well above the other computers.
This is because factors other than the processor’s frequency are at play.
Factors like efficiency of the processor and cache size also affect processor performance.
Because these other factors are hard to keep track of and since all other computers had very
similar times, these other factors were considered negligible for data analysis. The Netbook’s
data, however, was left out of the data analysis since its processor behaved so differently than the
other processors tested.
For analysis, a regression was run using Microsoft Excel. Three independent variables
(first processor frequency, second processor frequency, and RAM) were analyzed relative to one
dependent variable (time). When no second processor was present, 0 was used. The results of
the regression can be found in Figure 2.11. Using the coefficients, the following equation was
predicted:
t = 2.08134 – 0.18688 * P1 + 0.04271 * P2 – 0.14551 * R
where t = time (sec), P1 = the first processor’s frequency (GHz), P2 = the second processor’s
frequency (GHz), and R = RAM (GB). The R^2 value of the equation was 0.65693, which indicates
only a moderate fit. An interesting part of the equation is the fact that the coefficient for the
second processor’s frequency is positive, not negative. While this could mean that the second
processor is actually detrimental to the run time for the brute force attack, it is much more likely
that the program is not even using the second processor and the positive coefficient is a result of
the regression trying to find the best fit.
Based on this, a second regression was run looking only at first processor
frequency and RAM. The results of the regression can be found in Figure 2.12. Using the
coefficients, the following equation was predicted:
t = 1.95775 – 0.11970 * P1 – 0.13280 * R
The R^2 value of the equation was 0.65385, which is almost identical to the R^2 value of the first
regression. This equation shows that RAM and processor frequency have very
similar effects on run time, with RAM having a slightly larger effect. However, the R^2 value was
still far from ideal. This means that the other factors in the processor that were considered
negligible do indeed affect the time it takes to run a brute force attack.
To find a better equation, a third regression was run. This regression used 5 more factors
related to the processor: level-2 cache (in MB), front side bus frequency (in MHz), multiplier,
voltage (in volts), and thermal design power (in watts). Data can be found in Figure 2.13. The
results of the regression can be found in Figure 2.14. The coefficients produced the following
equation:
t = 12.62375 – 0.19966 * P – 0.00371 * R – 0.73653 * C – 0.00203 * F – 0.03836 * M – 5.85850 * V + 0.00820 * T
where t = time (sec), P = first processor’s frequency (GHz), R = RAM (GB), C = level-2 cache
(MB), F = front side bus frequency (MHz), M = multiplier, V = voltage (V), and T = thermal
design power (W). With an R^2 value of 0.9853, this equation is a very good predictor of run
time. This regression equation is limited, however. Since it is linear, it predicts impossible
circumstances. For instance, the equation could predict a zero second (or even negative time)
brute force attack if certain numbers are high enough. But for most normal cases, it should
provide a good prediction.
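The equation and its limitation can be illustrated with a short sketch in Python. The coefficients are those of the third regression; the computer specifications in the example are hypothetical values chosen only for illustration, not data from Figure 2.13.

```python
# Coefficients from the third regression (time in seconds).
INTERCEPT = 12.62375
COEFFS = {
    "P": -0.19966,  # first processor's frequency (GHz)
    "R": -0.00371,  # RAM (GB)
    "C": -0.73653,  # level-2 cache (MB)
    "F": -0.00203,  # front side bus frequency (MHz)
    "M": -0.03836,  # multiplier
    "V": -5.85850,  # voltage (V)
    "T": 0.00820,   # thermal design power (W)
}

def predicted_time(**specs):
    # Linear model: intercept plus coefficient times value for each factor.
    return INTERCEPT + sum(COEFFS[name] * value for name, value in specs.items())

# Hypothetical computer, for illustration only.
typical = dict(P=2.0, R=2, C=2, F=800, M=10, V=1.3, T=65)
print(predicted_time(**typical))

# Because the model is linear, an unrealistically large cache
# drives the prediction below zero.
extreme = dict(typical, C=20)
print(predicted_time(**extreme))
```

The second prediction is negative, which is the kind of impossible result mentioned above; the equation should only be trusted for values near those of the computers tested.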
To aid in analysis of the equation, a table was created which showed each factor’s effect
on the time (this was accomplished by multiplying the technical data by the coefficients
produced by the regression). The table can be found in Figure 2.15. When sorted from lowest
time to highest time, two factors really set apart the two fastest computers (the two that had times
less than one second) from the rest: level-2 cache and front side bus speed. Ironically, these two
computers had low voltages, which had a large detrimental effect on the time. While almost
every factor had an effect on time, one factor had essentially no effect: RAM. This is most likely
because a brute force attack requires a high quantity of low-memory operations, something much more
suited for cache memory than RAM. Knowing this, any party wishing to construct the fastest
computer using only household grade computing equipment would need to acquire a processor
with high level-2 cache, front side bus speed, and voltage. RAM does not need to be large and,
unless the program used was modified to take advantage of a second processor, a second
processor would not be necessary or helpful.
Conclusion
The goal of this project was to assess the viability of a rotor cipher in the modern world
of computers. Success was found in modifying the cipher to strengthen it from its weakened state
as the Enigma machine of WWII. Taking advantage of the large scale effort to break the cipher
in WWII has allowed the cipher to be changed to counter the weaknesses found.
The cipher has many different advantages. It is easily modifiable to add or remove rotors
from the cipher to adjust the security level to what is needed or desired. It provides effective
security from brute force attack and unlike other ciphers that die out as technology advances,
more rotors can be added to the cipher to increase its security. All of the tests run on the cipher
assume that a cryptanalyst would know the wiring of the rotors, but in reality, a cryptanalyst would
have a very challenging time determining the wiring of the rotors.
With the cipher’s advantages also come several disadvantages. The cipher, while very
simple, requires the storage of 768 bytes of information per rotor. This may present problems
when trying to transfer the rotors to all parties who would use the cipher without an unwanted
party receiving a copy. This would also present problems for organizations like governments or
banks that already require large databases of keys and would be bogged down by additional rotor
data.
Finally, the cipher, while strengthened against certain cryptanalytic attacks, is not
guaranteed secure against every cryptanalytic attack. Further research could either prove or
disprove its security against other attacks, but it is safe to assume it is fairly secure.
So where does this leave the rotor cipher? The rotor cipher presents a means of providing
high security for limited traffic between parties, but large quantities of encrypted traffic may be
better suited for another cipher. However, with these modifications, it is safe to say that the rotor
cipher can be considered a viable cipher in the modern world.
Shortening text also went through extensive testing and experimentation. After analyzing
the results, it can be concluded that using the conservative form of text shortening will result in
text that, while slightly harder to comprehend, contains no meaning distortion. Liberal text
shortening will result in minor meaning distortion. This means normal text can be shortened by
approximately 10% without any meaning distortion. This is important for cryptography since
shorter messages make cryptanalysis harder. While this process was discussed as part of the
rotor cipher, it can actually be used with any cipher since it modifies the plaintext prior to
encryption.
Acknowledgements
I would like to thank my instructor, Mr. Donelson, for his help and guidance in this
project, Houston Fortney for his help as a second judge, and the subjects of the text shortening
experiment for their participation.
Works Cited
______. “Advanced Encryption Standard”. Wikipedia, from
http://en.wikipedia.org/wiki/Advanced_Encryption_Standard.
Alvi, Muzammal. “Saying it With a Smile Instead of Words”. EzineArticles.com. July 20, 2009.
<http://ezinearticles.com/?Saying-it-With-a-Smile-Instead-of-Words&id=2636903>
______. “Best, worst and average case”. Wikipedia, from
http://en.wikipedia.org/wiki/Best,_worst_and_average_case.
______. “Brute force attack”. Wikipedia, from http://en.wikipedia.org/wiki/Brute_force_attack.
Budiansky, Stephen. Battle of Wits. The Free Press, New York City. 2000.
______. “Cryptanalysis of the Enigma”. Wikipedia, from
http://en.wikipedia.org/wiki/Cryptanalysis_of_the_Enigma.
______. “Data Encryption Standard”. Wikipedia, from
http://en.wikipedia.org/wiki/Data_Encryption_Standard.
DeBrosse, Jim and Colin Burke. The Secret in Building 26. Random House, New York. 2004.
Grinter, Rebecca E. and Margery A. Eldridge. “y do tngrs luv 2 txt msg?” European
Conference on Computer Supported Cooperative Works. 2001.
Hamer, David H. “Enigma: Actions Involved in the ‘Double Stepping’ of the Middle Rotor”.
Cryptologia. January 21, 1997.
Hamer, David H., Geoff Sullivan, and Frode Weierud. “Enigma Variations: An Extended Family
of Machines”. Cryptologia. July 1998.
______. “History of cryptography”. Wikipedia, from
http://en.wikipedia.org/wiki/History_of_cryptography.
http://www.3gpp.org
Kahn, David. “An Enigma Chronology”. Cryptologia. July, 1993.
Kahn, David. Seizing the Enigma. Houghton Mifflin Company, Boston. 1991.
Kahn, David. The Codebreakers: The Comprehensive History of Secret Communication from
Ancient Times to the Internet. Simon and Schuster, ______. 1996.
______. “Key size”. Wikipedia, from http://en.wikipedia.org/wiki/Key_size.
Mahon, Patrick. "History of Hut 8 to December 1941". The Essential Turing: Seminal
Writings in Computer Logic, Philosophy, Artificial Intelligence and Artificial Life.
Oxford University Press, Oxford. 2004.
Miller, Ray. “The Cryptographic Mathematics of Enigma”. Center for Cryptologic History, Fort
Meade. 1996.
Milner-Barry, Stuart. "Navy Hut 6: Early days". Codebreakers: The Inside Story of
Bletchley Park. Oxford University Press, Oxford. 1993.
Morris, Christopher. "Navy Ultra's Poor Relations". Codebreakers: The Inside Story of Bletchley
Park. Oxford University Press, Oxford. 1993.
_________. “New Language Spawned by SMS Abbreviations”. Asian News International. June
29, 2009.
Pincock, Stephen. Codebreaker: The History of Codes and Ciphers, From the Ancient Pharaohs to
Quantum Cryptography. Walker & Company, New York. 2006.
Ratcliff, R. A. “How Statistics Led the Germans to Believe Enigma Secure and Why They Were
Wrong”. Cryptologia. April 2003.
Rejewski, Marian and Richard Woytak. “A Conversation with Marian Rejewski: Appendix B”.
Enigma: How the German machine cipher was broken, and how it was read by the Allies
in World War Two. University Publications of America, ______. 1984.
______. “Side-Channel attack”. Wikipedia, from
http://en.wikipedia.org/wiki/Side_channel_attack.
Singh, Simon. The Code Book. Doubleday, New York, 1999.
Welchman, Gordon. The Hut Six Story: Breaking the Enigma Codes. Penguin Books,
Harmondsworth. 1984.
Appendix
Figure 1.1
Rules for Shortening:
Both:
• “and” → “&”
• “to”, “too” → “2”
• “you” → “u”
• “at” → “@”
• “see” → “c”
• “for” → “4”
• “are” → “r”
• “why” → “y”
• “because” → “cuz”
• “be” → “b”
• Use all contractions (“can not” → “can’t”)
• Replace spelled out numbers with numerals (“seven” → “7”)
• Do NOT shorten proper names with any method above or below
Conservative only:
• “with” → “wit”
• “until” → “til”
• “about” → “bout”
• “-ing” → “-in”
• “-ed” → “-d”
• “-er” → “-r”
Liberal only:
• Remove all vowels (a,e,i,o,u)
  o Keep vowels in words three letters or less
  o Keep vowels if first or last letter in word
Note: All substitutions preserve capitalization (i.e. “Because…” → “Cuz…”, not “cuz…”)
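A subset of these rules can be sketched in code. The sketch below is in Python and is not the program from Figure 1.3; it is a naive illustration only. It covers the word substitutions, the conservative suffix rules, and the liberal vowel removal, but leaves out contractions and numerals. Proper names are not detected automatically; the caller is assumed to supply them through the hypothetical proper_names parameter.

```python
import re

BOTH = {"and": "&", "to": "2", "too": "2", "you": "u", "at": "@", "see": "c",
        "for": "4", "are": "r", "why": "y", "because": "cuz", "be": "b"}
CONSERVATIVE_WORDS = {"with": "wit", "until": "til", "about": "bout"}
CONSERVATIVE_SUFFIXES = (("ing", "in"), ("ed", "d"), ("er", "r"))
VOWELS = set("aeiouAEIOU")

def keep_case(original, replacement):
    # "Because" -> "Cuz", not "cuz".
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement

def shorten_word(word, method):
    lower = word.lower()
    if lower in BOTH:
        return keep_case(word, BOTH[lower])
    if method == "conservative":
        if lower in CONSERVATIVE_WORDS:
            return keep_case(word, CONSERVATIVE_WORDS[lower])
        for suffix, short in CONSERVATIVE_SUFFIXES:
            # Naive suffix test: skips very short words such as "her".
            if lower.endswith(suffix) and len(word) > len(suffix) + 1:
                return word[:-len(suffix)] + short
        return word
    # Liberal: keep words of three letters or less, and always keep the
    # first and last letters; drop vowels in between.
    if len(word) <= 3:
        return word
    middle = "".join(ch for ch in word[1:-1] if ch not in VOWELS)
    return word[0] + middle + word[-1]

def shorten(text, method, proper_names=frozenset()):
    def replace(match):
        word = match.group(0)
        return word if word in proper_names else shorten_word(word, method)
    return re.sub(r"[A-Za-z]+", replace, text)

print(shorten("Talk to you later, John.", "conservative", {"John"}))
print(shorten("How are you doing?", "liberal"))
```

A purely rule-based approach like this will shorten some words it should not (for example, any word that merely ends in “ed”), which is one reason the readability experiment was needed.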
Figure 1.2
Passages for text shortening test:
Conversational:
Sample 1:
Plaintext: 77 words, 353 characters
Hey. It is John. How are you doing? I have not talk to you in a long time. Why is that? I have
been doing pretty well. Work has been a great pain as normal. How is your job? The wife and
kids are doing great. My oldest son Jim is the captain of his soccer team. Write me back, man. I
want to hear about how it is going up north. Talk to you later, John.
Shortened Text (Conservative): 324 characters, 91.8%
Hey. It’s John. How r u doin? I have not talk 2 u in a long time. Y is that? I’ve been doin pretty
well. Work has been a great pain as normal. How’s your job? The wife & kids r doin great. My
oldest son Jim is the captain of his soccr team. Write me back, man. I want 2 hear bout how it’s
goin up north. Talk 2 u latr, John.
Shortened Text (Liberal): 280 characters, 79.3%
Hey. It’s John. How r u dng? I hve not tlk 2 u in a lng tme. Y is tht? I’ve bn dng prtty wll. Wrk
has bn a grt pn as nrml. How’s yr job? The wfe & kds r dng grt. My oldst son Jim is the cptn of
his sccr tm. Wrte me bck, man. I wnt 2 hr abt how it’s gng up nrth. Tlk 2 u ltr, John.
Sample 2:
Plaintext: 76 words, 351 characters
What is up, Jill? It is your big sister. I just got the news. Great job on that award. I am so proud
of you. Mom and Dad will be so surprised when they hear about this. Is the ceremony on the
tenth or the twelfth? Make sure you get me some tickets. I want to be in the front row when you
get the award. I truly am proud of you. Your big sister, Alice.
Shortened Text (Conservative): 327 characters, 93.2%
What’s up, Jill? It’s your big sistr. I just got the news. Great job on that award. I’m so proud of u.
Mom & Dad will b so surprisd when they hear bout this. Is the ceremony on the 10th or the
12th? Make sure u get me some tickets. I want 2 b in the front row when u get the award. I truly
am proud of u. Your big sistr, Alice.
Shortened Text (Liberal): 288 characters, 82.1%
Wht’s up, Jill? It’s yr big sstr. I jst got the nws. Grt job on tht awrd. I’m so prd of u. Mom & Dad
wll b so srprsd whn thy hr abt ths. Is the crmny on the 10th or the 12th? Mke sre u get me sme
tckts. I wnt 2 b in the frnt row whn u get the awrd. I trly am prd of u. Yr big sstr, Alice.
Sample 3:
Plaintext: 75 words, 378 characters
Hey Mary. That was a great party I went to last night at Jenny’s house. Why did you not show
up? It would have been much better if you were there. We should get together for dinner
sometime. I am free every day until Friday. I know a great Italian place downtown with the best
pasta you have ever tasted. What do you say to Thursday at six? Send me back soon. Your
friend, Beth.
Shortened Text (Conservative): 351 characters, 92.9%
Hey Mary. That was a great party I went 2 last night @ Jenny’s house. Y did u not show up? It
would have been much bettr if u were there. We should get togethr 4 dinnr sometime. I’m free
every day til Friday. I know a great Italian place downtown wit the best pasta u have evr tastd.
What do u say 2 Thursday @ 6? Send me back soon. Your friend, Beth.
Shortened Text (Liberal): 295 characters, 78.0%
Hey Mary. Tht was a grt prty I wnt 2 lst nght @ Jenny’s hse. Y did u not shw up? It wld hve bn
mch bttr if u wre thre. We shld get tgthr 4 dnnr smtme. I’m fre evry day untl Fri. I knw a grt Itln
plce dwntwn wth the bst psta u hve evr tstd. Wht do u say 2 Thurs @ 6? Snd me bck sn. Yr frnd,
Beth.
Sample 4:
Plaintext: 78 words, 366 characters
It is Michael. I have been picked to lead the youth group in our town. Do you think I should take
the job, because I am having my doubts. I do not really enjoy the job right now, and being the
leader would mean more work for me. But it is a very good experience, and I would be helping
the community. I think I am going to take it, but I want to know what you think.
Shortened Text (Conservative): 338 characters, 92.3%
It’s Michael. I have been pickd 2 lead the youth group in our town. Do u think I should take the
job, cuz I am havin my doubts. I don’t really enjoy the job right now, & bein the leadr would
mean more work 4 me. But it’s a very good experience, & I would b helpin the community. I
think I’m goin 2 take it, but I want 2 know what u think.
Shortened Text (Liberal): 287 characters, 78.4%
It’s Michael. I hve bn pckd 2 ld the yth grp in our twn. Do u thnk I shld tke the job, cz I am hvng
my dbts. I don’t rlly enjy the job rght now, & bng the ldr wld mn mre wrk 4 me. But it’s a vry gd
exprnce, & I wld b hlpng the cmmnty. I thnk I’m gng 2 tke it, but I wnt 2 knw wht u thnk.
Analytical:
State Department:
Plaintext: 65 words, 410 characters
Status Report from Moscow. Relations with the Kremlin are beginning to break down. The
Russian President is calling for immediate reductions in our nuclear stockpiles by three hundred
warheads or they threaten to sanction exports to the United States. As Ambassador to Russia, I
would strongly suggest that President Smith use his influence to try and resolve this issue, or
military answers may be necessary.
Shortened Text (Conservative): 369 characters, 90.0%
Status Report from Moscow. Relations wit the Kremlin r beginnin 2 break down. The Russian
President is callin 4 immediate reductions in our nuclear stockpiles by 300 warheads or they
threaten 2 sanction exports 2 the US. As Ambassador 2 Russia, I’d strongly suggest that
President Smith use his influence 2 try & resolve this issue, or military answers may b necessary.
Shortened Text (Liberal): 304 characters, 74.1%
Stts Rprt frm Moscow. Rltns wth the Kremlin r bgnnng 2 brk dwn. The Russian Prsdnt is cllng 4
immdte rdctns in our nclr stckpls by 300 wrhds or thy thrtn 2 snctn exprts 2 the US. As Ambssdr
2 Russia, I’d strngly sggst tht Prsdnt Smith use his inflnce 2 try & rslve ths isse, or mltry answrs
may b ncssry.
Military:
Plaintext: 67 words, 399 characters
Field Report from General Brown. The army has been drilling three miles south of Fort Victory
for the past six months. In light of the recent aggression by the enemy, we have moved five miles
north west towards the front. We are currently positioned outside the village of Johnstown. An
ambush has been planned by Generals Smith and Jones and I will recommend reinforcements
immediately for success.
Shortened Text (Conservative): 372 characters, 93.2%
Field Report from General Brown. The army has been drillin 3 miles south of Fort Victory 4 the
past 6 months. In light of the recent aggression by the enemy, we’ve movd 5 miles north west
towards the front. We r currently positiond outside the village of Johnstown. An ambush has
been plannd by Generals Smith & Jones & I’ll recommend reinforcements immediately 4
success.
Shortened Text (Liberal): 308 characters, 77.2%
Fld Rprt frm Gnrl Brown. The army has bn drllng 3 mls sth of Frt Victory 4 the pst 6 mnths. In
lght of the rcnt aggrssn by the enmy, we’ve mvd 5 mls nrth wst twrds the frnt. We r crrntly pstnd
otsde the vllge of Johnstown. An ambsh has bn plnnd by Gnrls Smith & Jones & I’ll rcmmnd
rnfrcmnts immdtly 4 sccss.
Business/Corporate:
Plaintext: 66 words, 397 characters
To CEO Williams. We have completed the prototype for the new product with excellent results.
Efficiency is improved forty three percent while cost is reduced thirteen percent. The final design
will be ready on November 9. It is crucial that our rival Anderson Technologies does not find out
about this, since using this product could boost their profit by one million dollars. From R&D,
Mr. Davis.
Shortened Text (Conservative): 342 characters, 86.1%
2 CEO Williams. We have completd the prototype 4 the new product wit excellent results.
Efficiency is improvd 43% while cost is reducd 13%. The final design will b ready on Nov 9. It
is crucial that our rival Anderson Technologies does not find out bout this, since usin this
product could boost their profit by 1 mil. $. From R&D, Mr. Davis.
Shortened Text (Liberal): 291 characters, 73.3%
2 CEO Williams. We hve cmpltd the prttype 4 the new prdct wth excllnt rslts. Effcncy is imprvd
43% whle cst is rdcd 13%. The fnl dsgn wll b rdy on Nov 9. It is crcl tht our rvl Anderson
Technologies ds not fnd out abt ths, snce usng ths prdct cld bst thr prft by 1 mil $. Frm R&D,
Mr. Davis.
Scientific:
Plaintext: 67 words, 369 characters
To Dr. Tompson. We have discovered a new form of hydrogen while experimenting in
Switzerland. It seems to act as if effected by some unknown force. We think it may be gravity,
but are not sure. Our test involved one thousand pieces of hydrogen and resulted in four hundred
being changed. The test rig was moving at fifty miles per hour during the test. From Dr. Wilson.
Shortened Text (Conservative): 322 characters, 87.3%
2 Dr. Tompson. We have discoverd a new form of hydrogen while experimentin in Switzerland.
It seems 2 act as if effectd by some unknown force. We think it may b gravity, but aren’t sure.
Our test involvd 1000 pieces of hydrogen & resultd in 400 bein changd. The test rig was movin
@ 50 mph durin the test. From Dr. Wilson.
Shortened Text (Liberal): 288 characters, 78.0%
2 Dr. Tompson. We hve dscvrd a new frm of hydrogen whle exprmntng in Switzerland. It sms 2
act as if effctd by sme unknwn frce. We thnk it may b grvty, but arn’t sre. Our tst invlvd 1000
pcs of hydrogen & rsltd in 400 bng chngd. The tst rig was mvng @ 50 mph drng the tst. Frm Dr.
Wilson.
Average percent of reduction for conservative shortening: 9.2%
Average percent of reduction for liberal shortening: 22.4%
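The two averages can be recomputed from the character counts listed with each passage. The sketch below assumes each passage's reduction is 100% minus its listed percentage and that the average is an unweighted mean over the eight passages; the names `SAMPLES` and `average_reduction` are hypothetical.

```python
# (plaintext, conservative, liberal) character counts from Figure 1.2
SAMPLES = {
    "Conversational 1": (353, 324, 280),
    "Conversational 2": (351, 327, 288),
    "Conversational 3": (378, 351, 295),
    "Conversational 4": (366, 338, 287),
    "State Department": (410, 369, 304),
    "Military": (399, 372, 308),
    "Business/Corporate": (397, 342, 291),
    "Scientific": (369, 322, 288),
}


def average_reduction(samples, column):
    """Mean percent reduction; column 1 = conservative, column 2 = liberal."""
    reductions = [100.0 * (1 - counts[column] / counts[0])
                  for counts in samples.values()]
    return sum(reductions) / len(reductions)


print(f"conservative: {average_reduction(SAMPLES, 1):.2f}%")
print(f"liberal:      {average_reduction(SAMPLES, 2):.2f}%")
```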
Figure 1.3
Automated Text Shortening Program
Written by Jordan Zink in VBA (Word)
Note: “---” denotes a continuation from the previous line
'declaration of a public variable
Public ReplaceCheck As Boolean
--------------------------------------------------------------------------------
Sub TextShortener()
'defines whether shortening method is conservative (False) or liberal (True); this is changed
'manually (could easily be modified for a friendly user interface)
IsLiberalShortening = False
'gets the length of the text
TextLength = Len(ActiveDocument.Range.Text) - 1
'A is a counter for the current spot the program is searching in the text (like a cursor)
A=1
Do While A <= TextLength
'check if current character is a letter or some other character using LetterCheck function
If LetterCheck(Mid(ActiveDocument.Range.Text, A, 1)) = False Then
'it is some other character (.,:"'?! etc.), no shortening will be applied, add to A and loop
A=A+1
Else
'it is a letter, continue
'find length of word
'B is a counter for the length of the word
B=1
Do
'searches until it finds a non-letter character
If LetterCheck(Mid(ActiveDocument.Range.Text, A + B, 1)) = False Then Exit Do
B=B+1
Loop
'check for an all-caps word (treated here as a proper noun)
If UCase(Mid(ActiveDocument.Range.Text, A, B)) = Mid(ActiveDocument.Range.Text, A,
---B) Then
'proper noun, no text shortening applied, add to A and loop
A=A+B
Else
'not proper, continue
'check substitution database
'first, check if capitalized (so substitute will also be capitalized)
IsCapitalized = False
If UCase(Mid(ActiveDocument.Range.Text, A, 1)) = Mid(ActiveDocument.Range.Text,
---A, 1) Then IsCapitalized = True
'puts word in "Txt" variable (makes it lower case for easier substitution searching)
Txt = LCase(Mid(ActiveDocument.Range.Text, A, B))
'ReplaceCheck sees if any modifications have been made (starts out false, will be set to
'true if the ReplaceDocTxt function is run)
ReplaceCheck = False
'this section contains substitutions to be applied for both conservative and liberal
'shortening
'IMPORTANT: the way the program applies modifications is by placing a "ß" character
'wherever there is a character that needs to be removed. A later function will remove all "ß" (this
'was done for ease and speed of programming) ("ß" character used because it is an odd
'character not used in English)
'substitutions are self-explanatory
If Txt = "and" Then Temp = ReplaceDocTxt(A, "&ßß")
If Txt = "to" Then Temp = ReplaceDocTxt(A, "2ß")
If Txt = "too" Then Temp = ReplaceDocTxt(A, "2ßß")
If Txt = "you" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Ußß")
If Txt = "you" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "ußß")
If Txt = "at" Then Temp = ReplaceDocTxt(A, "@ß")
If Txt = "see" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Cßß")
If Txt = "see" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "cßß")
If Txt = "for" Then Temp = ReplaceDocTxt(A, "4ßß")
If Txt = "are" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Rßß")
If Txt = "are" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "rßß")
If Txt = "why" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Yßß")
If Txt = "why" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "yßß")
If Txt = "because" And IsCapitalized = True Then Temp = ReplaceDocTxt(A,
---"Cuzßßßß")
If Txt = "because" And IsCapitalized = False Then Temp = ReplaceDocTxt(A,
---"cuzßßßß")
If Txt = "be" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Bß")
If Txt = "be" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "bß")
If Txt = "one" Then Temp = ReplaceDocTxt(A, "1ßß")
If Txt = "two" Then Temp = ReplaceDocTxt(A, "2ßß")
If Txt = "three" Then Temp = ReplaceDocTxt(A, "3ßßßß")
If Txt = "four" Then Temp = ReplaceDocTxt(A, "4ßßß")
If Txt = "five" Then Temp = ReplaceDocTxt(A, "5ßßß")
If Txt = "six" Then Temp = ReplaceDocTxt(A, "6ßß")
If Txt = "seven" Then Temp = ReplaceDocTxt(A, "7ßßßß")
If Txt = "eight" Then Temp = ReplaceDocTxt(A, "8ßßßß")
If Txt = "nine" Then Temp = ReplaceDocTxt(A, "9ßßß")
If Txt = "ten" Then Temp = ReplaceDocTxt(A, "10ß")
'apply type-specific shortening
If IsLiberalShortening = False Then
'conservative only
If B > 1 Then
'check if ending in "-er" or "-ed". If so, remove the "e"
If Mid(Txt, B - 1, 2) = "ed" Then Temp = ReplaceDocTxt(A + B - 2, "ß")
If Mid(Txt, B - 1, 2) = "er" Then Temp = ReplaceDocTxt(A + B - 2, "ß")
End If
If B > 2 Then
'check if ending in "-ing". If so, remove "g"
If Mid(Txt, B - 2, 3) = "ing" Then Temp = ReplaceDocTxt(A + B - 1, "ß")
End If
'other substitutions (conservative only)
If Txt = "with" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "Witß")
If Txt = "with" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "witß")
If Txt = "until" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "ßßTil")
If Txt = "until" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "ßßtil")
If Txt = "about" And IsCapitalized = True Then Temp = ReplaceDocTxt(A, "ßBout")
If Txt = "about" And IsCapitalized = False Then Temp = ReplaceDocTxt(A, "ßbout")
Else
'liberal only
If ReplaceCheck = False Then
'still no modifications, continue
'check word length
If B > 3 Then
'the word is long enough; remove vowels
For Search = 2 To B - 1
If VowelCheck(Mid(Txt, Search, 1)) = True Then Temp = ReplaceDocTxt(A +
---Search - 1, "ß")
Next Search
End If
End If
End If
'move cursor past word that was just shortened
A=A+B
End If
End If
Loop
'function Replacer actually removes all "ß" characters (Temp is for the returned value of the
'function (unused))
Temp = Replacer("ß", "")
End Sub
--------------------------------------------------------------------------------
Function ReplaceDocTxt(Where, What As String)
'set ReplaceCheck to True to show that a change has been made
ReplaceCheck = True
'sticks in string requested
ActiveDocument.Characters(Where + Len(What) - 1).InsertAfter (What)
'removes old string
For Tmp = Where To Where + Len(What) - 1
ActiveDocument.Characters(Where).Delete
Next Tmp
End Function
--------------------------------------------------------------------------------
Function LetterCheck(Txt As String) As Boolean
LetterCheck = True
'if upper case and lower case equal each other, then it is a non-letter character
If UCase(Txt) = LCase(Txt) Then LetterCheck = False
End Function
Function VowelCheck(Txt As String) As Boolean
VowelCheck = False
'checks if it is a vowel
If Txt = "a" Or Txt = "e" Or Txt = "i" Or Txt = "o" Or Txt = "u" Then VowelCheck = True
End Function
--------------------------------------------------------------------------------
Function Replacer(FindT As String, ReplaceT As String)
'replaces requested text with another requested text
Selection.Find.ClearFormatting
Selection.Find.Replacement.ClearFormatting
With Selection.Find
.Text = FindT
.Replacement.Text = ReplaceT
.Forward = True
.Wrap = wdFindContinue
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute Replace:=wdReplaceAll
End Function
Figure 1.4
Figure 1.5
Rotor Cipher Program
Written by Jordan Zink in Visual Basic (2008 edition)
Note: “---” denotes a continuation from the previous line
Sub BruteForceAttack()
'start timer
Dim startSec As Integer = Now.Second
Dim startMSec As Integer = Now.Millisecond
'initialize variables
Dim PlainText As String
Dim CipherText As String
Dim CurChar As String
Dim Hits As Integer
Dim TempA As Byte
Dim TempB As Byte
Hits = 0
'load plaintext
PlainText = System.IO.File.ReadAllText(TextBox2.Text + TextBox1.Text,
---System.Text.Encoding.Default)
'initialize rotors
Dim RotorsSub(0 To NumericUpDown1.Value) As String
Dim RotorsSft(0 To NumericUpDown1.Value) As String
Dim RotorsSet(0 To NumericUpDown1.Value, 0 To 1) As Integer
Dim RotorsSetStart(0 To NumericUpDown1.Value, 0 To 1) As Integer
'load substitution and shift rotors
For A As Byte = 1 To CByte(NumericUpDown1.Value)
RotorsSub(A) = Mid(System.IO.File.ReadAllText(TextBox2.Text + "Rotor" +
---CStr(NumericUpDown1.Value + 1 - A) + "Sub.txt",
---System.Text.Encoding.Default), 257, 256)
RotorsSft(A) = System.IO.File.ReadAllText(TextBox2.Text + "Rotor" +
---CStr(NumericUpDown1.Value + 1 - A) + "Sft.txt",
---System.Text.Encoding.Default)
Next
'prep key search
For A As Integer = 1 To NumericUpDown1.Value
For B = 0 To 1
RotorsSetStart(A, B) = 0
Next
Next
Do
'load key
For A As Integer = 1 To NumericUpDown1.Value
For B = 0 To 1
RotorsSet(A, B) = RotorsSetStart(A, B)
Next
Next
'run decrypt (see below for comments)
CipherText = ""
For WhichChar As Integer = 1 To Len(PlainText)
CurChar = Mid(PlainText, WhichChar, 1)
For NumRotor As Byte = 1 To CByte(NumericUpDown1.Value)
CurChar = Chr(Mod256(Asc(Mid(RotorsSub(NumRotor), Asc(CurChar) + 1, 1)) -
---RotorsSet(NumRotor, 0)))
Next
CipherText = CipherText + CurChar
'"rotate rotors" (advance rotor settings)
For NumRotor As Byte = 1 To CByte(NumericUpDown1.Value)
RotorsSet(NumRotor, 0) = Asc(Mid(RotorsSft(NumRotor),
---Mod256(RotorsSet(NumRotor, 0) + RotorsSet(NumRotor, 1)) + 1, 1))
RotorsSet(NumRotor, 1) = RotorsSet(NumRotor, 1) + 1
If RotorsSet(NumRotor, 1) = 256 Then RotorsSet(NumRotor, 1) = 0
Next
Next
'look at decrypted text for "the" (to determine if correct decrypted text)
For A = 1 To Len(CipherText) - 2
If Mid(CipherText, A, 3) = "the" Then Hits = Hits + 1
Next
'advance to next key
TempA = 1
TempB = 0
Do
RotorsSetStart(TempA, TempB) = RotorsSetStart(TempA, TempB) + 1
If RotorsSetStart(TempA, TempB) = 256 Then
RotorsSetStart(TempA, TempB) = 0
TempB = TempB + 1
If TempB = 2 Then
TempB = 0
TempA = TempA + 1
If TempA > NumericUpDown1.Value Then Exit Do
End If
Else
Exit Do
End If
Loop
If TempA > NumericUpDown1.Value Then Exit Do
Loop
'search done, stop timer
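Dim finishMSec As Integer = Now.Millisecond
Dim finishSec As Integer = Now.Second
'display time (Now.Second wraps at 60, so add a minute if the run crossed a minute boundary;
'the result is only valid for runs shorter than 60 seconds)
Dim Elapsed As Double = finishSec + finishMSec / 1000 - (startSec + startMSec / 1000)
If Elapsed < 0 Then Elapsed = Elapsed + 60
TextBox4.Text = CStr(Elapsed)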
TextBox5.Text = Hits
End Sub
--------------------------------------------------------------------------------
Sub EncryptText()
Dim PlainText As String
Dim CipherText As String
'load plaintext
PlainText = System.IO.File.ReadAllText(TextBox2.Text + TextBox1.Text,
---System.Text.Encoding.Default)
CipherText = ""
'initialize rotors
Dim RotorsSub(0 To NumericUpDown1.Value) As String
Dim RotorsSft(0 To NumericUpDown1.Value) As String
Dim RotorsSet(0 To NumericUpDown1.Value, 0 To 1) As Integer
'load substitution and shift rotors (if decrypting, in reverse order)
For A As Byte = 1 To CByte(NumericUpDown1.Value)
If EncryptSelected.Checked = True Then RotorsSub(A) =
---Mid(System.IO.File.ReadAllText(TextBox2.Text + "Rotor" + CStr(A) +
---"Sub.txt", System.Text.Encoding.Default), 1, 256)
If DecryptSelected.Checked = True Then RotorsSub(A) =
---Mid(System.IO.File.ReadAllText(TextBox2.Text + "Rotor" +
---CStr(NumericUpDown1.Value + 1 - A) + "Sub.txt",
---System.Text.Encoding.Default), 257, 256)
If EncryptSelected.Checked = True Then RotorsSft(A) =
---System.IO.File.ReadAllText(TextBox2.Text + "Rotor" + CStr(A) + "Sft.txt",
---System.Text.Encoding.Default)
If DecryptSelected.Checked = True Then RotorsSft(A) =
---System.IO.File.ReadAllText(TextBox2.Text + "Rotor" +
---CStr(NumericUpDown1.Value + 1 - A) + "Sft.txt",
---System.Text.Encoding.Default)
Next
'load key (forward if encrypting, reversed if decrypting)
Dim Counter As Integer
If EncryptSelected.Checked = True Then
Counter = 1
For A As Byte = 1 To CByte(NumericUpDown1.Value)
For B = 0 To 1
RotorsSet(A, B) = Asc(Mid(TextBox3.Text, Counter, 1))
Counter = Counter + 1
Next
Next
End If
If DecryptSelected.Checked = True Then
Counter = NumericUpDown1.Value * 2
For A As Byte = 1 To CByte(NumericUpDown1.Value)
For B = 1 To 0 Step -1
RotorsSet(A, B) = Asc(Mid(TextBox3.Text, Counter, 1))
Counter = Counter - 1
Next
Next
End If
Dim CurChar As String
'run cipher (same process for encrypt and decrypt)
For WhichChar As Integer = 1 To Len(PlainText)
'load current character
CurChar = Mid(PlainText, WhichChar, 1)
'move through every rotor
For NumRotor As Byte = 1 To CByte(NumericUpDown1.Value)
'looks up and performs the substitution
If EncryptSelected.Checked = True Then CurChar = Mid(RotorsSub(NumRotor),
---Mod256(Asc(CurChar) + RotorsSet(NumRotor, 0)) + 1, 1)
If DecryptSelected.Checked = True Then CurChar =
---Chr(Mod256(Asc(Mid(RotorsSub(NumRotor), Asc(CurChar) + 1, 1)) -
---RotorsSet(NumRotor, 0)))
Next
CipherText = CipherText + CurChar
'"rotate rotors" (advance rotor settings)
For NumRotor As Byte = 1 To CByte(NumericUpDown1.Value)
RotorsSet(NumRotor, 0) = Asc(Mid(RotorsSft(NumRotor),
---Mod256(RotorsSet(NumRotor, 0) + RotorsSet(NumRotor, 1)) + 1, 1))
RotorsSet(NumRotor, 1) = RotorsSet(NumRotor, 1) + 1
If RotorsSet(NumRotor, 1) = 256 Then RotorsSet(NumRotor, 1) = 0
Next
Next
'write text to new file
If EncryptSelected.Checked = True Then System.IO.File.WriteAllText(TextBox2.Text +
---Mid(TextBox1.Text, 1, Len(TextBox1.Text) - 4) + "(encrypted).txt", CipherText,
---System.Text.Encoding.Default)
If DecryptSelected.Checked = True Then System.IO.File.WriteAllText(TextBox2.Text +
---Mid(TextBox1.Text, 1, Len(TextBox1.Text) - 4) + "(decrypted).txt", CipherText,
---System.Text.Encoding.Default)
End Sub
--------------------------------------------------------------------------------
Function Mod256(ByVal WhatNum As Integer) As Integer
'performs modulo 256 (a single wrap is enough for the values passed in, -255 to 510)
If WhatNum > 255 Then
WhatNum = WhatNum - 256
End If
If WhatNum < 0 Then
WhatNum = WhatNum + 256
End If
Mod256 = WhatNum
End Function
--------------------------------------------------------------------------------
Sub RotorGenerator()
'initialize variables
Dim CurRotor As String
Dim Temp(0 To 256) As Byte
Dim Z As Byte
'makes number of rotors specified
For NumRotor As Byte = 1 To CByte(NumericUpDown2.Value)
'sets rotor type (1 = substitution rotor, 2 = shift rotor)
For RType As Byte = 1 To 2
'initialize Temp array
For A As Integer = 0 To 255
Temp(A) = A
Next
CurRotor = ""
'generate rotor
For A As Integer = 0 To 255
Randomize()
Z = Int((256 - A) * Rnd())
'gets random character from array that has not been used yet
CurRotor = CurRotor + Chr(Temp(Z))
'advances array so character will not be used again
For B As Integer = Z To 255 - A
Temp(B) = Temp(B + 1)
Next
Next
'checks if substitution rotor
If RType = 1 Then
'reverses rotor and adds to end of original rotor (for easy decryption)
For A As Integer = 1 To 256
Temp(Asc(Mid(CurRotor, A, 1))) = A - 1
Next
For A As Integer = 0 To 255
CurRotor = CurRotor + Chr(Temp(A))
Next
End If
'writes rotors
If RType = 1 Then System.IO.File.WriteAllText(TextBox2.Text + "Rotor" +
---CStr(NumRotor) + "Sub.txt", CurRotor, System.Text.Encoding.Default)
If RType = 2 Then System.IO.File.WriteAllText(TextBox2.Text + "Rotor" +
---CStr(NumRotor) + "Sft.txt", CurRotor, System.Text.Encoding.Default)
Next
Next
End Sub
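The encrypt and decrypt steps of the listing can also be read as a short Python sketch. It is a minimal illustration under stated assumptions, not the author's program: the names `make_rotors`, `encrypt`, `decrypt` and the `seed` parameter are hypothetical, rotors are shuffled with `random.shuffle` instead of the listing's pick-and-shift loop, and messages are handled as bytes rather than text files. The key is a list of (setting, step) pairs, one pair per rotor, each value from 0 to 255.

```python
import random


def make_rotors(n, seed=0):
    """Build n rotors; each is (substitution, inverse substitution, shift table)."""
    rng = random.Random(seed)
    rotors = []
    for _ in range(n):
        sub = list(range(256))
        rng.shuffle(sub)
        sft = list(range(256))
        rng.shuffle(sft)
        inv = [0] * 256
        for index, value in enumerate(sub):
            inv[value] = index          # inverse table, used for decryption
        rotors.append((sub, inv, sft))
    return rotors


def _advance(state, rotors):
    """'Rotate' every rotor: look up the new setting, then bump the step counter."""
    for r, (_, _, sft) in enumerate(rotors):
        setting, step = state[r]
        state[r] = [sft[(setting + step) % 256], (step + 1) % 256]


def encrypt(data, rotors, key):
    state = [list(pair) for pair in key]
    out = bytearray()
    for c in data:
        # Pass the character through every rotor in order.
        for r, (sub, _, _) in enumerate(rotors):
            c = sub[(c + state[r][0]) % 256]
        out.append(c)
        _advance(state, rotors)
    return bytes(out)


def decrypt(data, rotors, key):
    state = [list(pair) for pair in key]
    out = bytearray()
    for c in data:
        # Undo the rotors in reverse order using the inverse tables.
        for r in reversed(range(len(rotors))):
            c = (rotors[r][1][c] - state[r][0]) % 256
        out.append(c)
        _advance(state, rotors)
    return bytes(out)


rotors = make_rotors(3, seed=1)
key = [(10, 20), (30, 40), (50, 60)]
message = b"Hey. It is John. How are you doing?"
assert decrypt(encrypt(message, rotors, key), rotors, key) == message
```

Because the rotor settings advance independently of the text, the same `_advance` step serves both directions; only the substitution lookup differs.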
Figure 2.1
Conservative Shortening vs. Liberal Shortening
Type of Comparison   Specifics        Mean of 1st Set   Mean of 2nd Set   P value
Vs Control %         None             11.2%             45.6%             5.50E-20
Stumbles             None             0.61              1.54              1.53E-07
Missed Words         None             0.05              2.15              9.56E-13
Vs Control %         Conversational   10.2%             37.8%             3.23E-10
Stumbles             Conversational   0.68              1.31              0.0019
Missed Words         Conversational   0.05              1.33              4.91E-06
Vs Control %         Analytical       12.2%             54.1%             4.67E-12
Stumbles             Analytical       0.55              1.79              4.66E-05
Missed Words         Analytical       0.05              3.05              5.72E-09
Figure 2.2
Liberal Shortening vs. Control (No Shortening)
Type of Comparison   Specifics        Mean of 1st Set   Mean of 2nd Set   P value
Stumbles             None             1.54              0.21              6.48E-14
Missed Words         None             2.15              0.006             4.30E-13
Stumbles             Conversational   1.31              0.23              1.29E-09
Missed Words         Conversational   1.33              0                 2.24E-06
Stumbles             Analytical       1.79              0.20              4.65E-07
Missed Words         Analytical       3.05              0.01              4.84E-09
Figure 2.3
Conservative Shortening vs. Control (No Shortening)
Type of Comparison   Specifics        Mean of 1st Set   Mean of 2nd Set   P value
Stumbles             None             0.61              0.21              2.24E-05
Missed Words         None             0.05              0.006             0.0873
Stumbles             Conversational   0.68              0.23              0.0028
Missed Words         Conversational   0.053             0                 0.1600
Stumbles             Analytical       0.55              0.20              0.0028
Missed Words         Analytical       0.048             0.013             0.3274
Figure 2.4
Youth vs. Adult
Type of Comparison   Specifics        Mean of 1st Set   Mean of 2nd Set   P value
Vs Control %         None             27.7%             30.1%             0.5668
Stumbles             None             0.60              0.75              0.2029
Misses               None             0.51              0.65              0.4739
Vs Control %         Conversational   24.9%             24.1%             0.8732
Vs Control %         Analytical       30.4%             36.1%             0.4087
Vs Control %         Conservative     11.1%             11.4%             0.8922
Vs Control %         Liberal          44.2%             48.8%             0.3993
Figure 2.5
Familiar with Texting vs. Not Familiar
Type of Comparison   Specifics   Mean of 1st Set   Mean of 2nd Set   P value
Vs Control %         None        27.8%             29.9%             0.6109
Stumbles             None        0.68              0.55              0.2390
Missed Words         None        0.58              0.48              0.5241
Figure 2.6
Familiar with Online Lingo vs. Not Familiar
Type of Comparison   Specifics   Mean of 1st Set   Mean of 2nd Set   P value
Vs Control %         None        28.5%             28.0%             0.9048
Stumbles             None        0.64              0.66              0.9044
Missed Words         None        0.58              0.45              0.4860
Figure 2.7
Running on a Windows XP OS with an AMD Athlon 2500+ processor at 1.84 GHz
1.765
1.828
1.797
1.734
1.813
*Time in seconds
Figure 2.8
Brute Force Time (One Rotor)
[Chart: Time (seconds) versus number of characters in plaintext/ciphertext]
Fitted curve: y = 8E-05x^2 + 0.105x - 0.0608 (R^2 = 1)
Figure 2.9
Key Search Speed
[Chart: Speed (keys per second) versus number of characters in plaintext/ciphertext]
Fitted curve: y = 752257x^-1.0634 (R^2 = 0.9988)
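The fitted speed curve can be combined with the key-space size to estimate how long an exhaustive search would take. The sketch below assumes the relation t = 256^(2n)/s given in the abstract (two settings per rotor, 256 values each) and assumes the fitted curve still holds for the message length chosen; the function names are hypothetical, and the speeds reflect only the computer the curve was measured on.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def key_search_speed(n_chars):
    """Keys tested per second, from the fitted curve y = 752257x^-1.0634."""
    return 752257 * n_chars ** -1.0634


def brute_force_seconds(n_rotors, n_chars):
    """Time to try every key: t = 256^(2n) / s."""
    keys = 256 ** (2 * n_rotors)
    return keys / key_search_speed(n_chars)


for n in (1, 2, 3):
    seconds = brute_force_seconds(n, 100)
    print(f"{n} rotor(s), 100 characters: {seconds:.3g} s "
          f"({seconds / SECONDS_PER_YEAR:.3g} years)")
```

Each added rotor multiplies the search time by 256^2 = 65,536, which is why the estimate moves from seconds to years within three rotors.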
Figure 2.10
Brute force run times on different computers
Type of processor                   1st processor   2nd processor   OS                   General   RAM    Average
                                    freq. (GHz)     freq. (GHz)                          Type      (GB)   Time (s)
AMD Athlon XP 3200+                 2.19            None            Windows XP           Desktop   1.00   1.472
AMD Athlon XP 2500+                 1.84            None            Windows XP           Desktop   1.50   1.750
Intel Pentium D CPU                 2.80            2.80            Windows XP           Desktop   1.00   1.580
Intel Atom CPU N270                 1.60            None            Windows XP           Netbook   1.00   3.964
Intel Core 2 Duo CPU P8400          2.26            1.58            Windows XP           Laptop    2.95   0.952
Intel Core 2 Duo CPU P7350          2.00            2.00            Windows 7 (64 bit)   Laptop    6.00   0.981
Intel Pentium M 725                 1.60            0.60            Windows XP           Laptop    1.00   1.660
Intel Pentium 4 CPU HT 650          3.40            3.40            Windows XP           Desktop   3.00   1.268
Mobile Intel Pentium 4 CPU HT 3.2   3.20            1.85            Windows XP           Laptop    0.47   1.444
Figure 2.11
Regression Statistics
Multiple R           0.818508438
R Square             0.669956063
Adjusted R Square    0.656928012
Standard Error       0.166077627
Observations         80

ANOVA
             df   SS            MS            F             Significance F
Regression   3    4.255106336   1.418368779   51.42412365   2.93808E-18
Residual     76   2.096215152   0.027581778
Total        79   6.351321488

                          Coefficients   Standard Error   t Stat         P-value
Intercept                 2.081336029    0.125853008      16.537833      6.78802E-27
1st Processor Frequency   -0.186882119   0.060106823      -3.109166497   0.002640396
2nd Processor Frequency   0.042711387    0.032851338      1.300141466    0.197481677
RAM                       -0.145507826   0.014679091      -9.91259126    2.43087E-15
Figure 2.12
Regression Statistics
Multiple R           0.814011874
R Square             0.66261533
Adjusted R Square    0.653852092
Standard Error       0.166820477
Observations         80

ANOVA
             df   SS            MS            F            Significance F
Regression   2    4.208482985   2.104241492   75.6130687   6.80496E-19
Residual     77   2.142838503   0.027829071
Total        79   6.351321488

                          Coefficients   Standard Error   t Stat         P-value
Intercept                 1.957750737    0.082852127      23.62945671    4.99392E-37
1st Processor Frequency   -0.119696147   0.030836372      -3.881654717   0.000217534
RAM                       -0.132799481   0.011000385      -12.07225769   1.93305E-19
Figure 2.13
More processor data
Type of processor                   L2-Cache   Front Side Bus   Multiplier   Voltage   TDP
AMD Athlon XP 3200+                 512 KB     400 MHz          11x          1.65 V    76.8 W
AMD Athlon XP 2500+                 512 KB     333 MHz          11x          1.65 V    68.3 W
Intel Pentium D CPU                 2 MB       800 MHz          14x          1.3 V     95 W
Intel Atom CPU N270                 512 KB     533 MHz          12x          1.1 V     2.5 W
Intel Core 2 Duo CPU P8400          3 MB       1066 MHz         8.5x         1.15 V    25 W
Intel Core 2 Duo CPU P7350          3 MB       1066 MHz         7.5x         1.15 V    25 W
Intel Pentium M 725                 2 MB       400 MHz          16x          1.34 V    15 W
Intel Pentium 4 CPU HT 650          2 MB       800 MHz          17x          1.3 V     84 W
Mobile Intel Pentium 4 CPU HT 3.2   512 KB     533 MHz          24x          1.5 V     76 W
Figure 2.14
Regression Statistics
Multiple R           0.999083791
R Square             0.998168421
Adjusted R Square    0.985347368
Standard Error       0.109270413
Observations         9

ANOVA
             df   SS            MS            F             Significance F
Regression   7    6.507038199   0.929576886   77.85385915   0.087052268
Residual     1    0.011940023   0.011940023
Total        8    6.518978222

                      Coefficients   Standard Error   t Stat         P-value
Intercept             12.62375       0.78125764       16.15823675    0.039348901
Processor Frequency   -0.19966       0.277376856      -0.719808786   0.602814764
RAM                   -0.00371       0.042665112      -0.086984021   0.944763284
L2 - Cache            -0.73653       0.080882086      -9.10627595    0.069630999
Front Side Bus        -0.00203       0.00062386       -3.260044702   0.189478828
Multiplier            -0.03836       0.02365915       -1.621392709   0.351826277
Voltage               -5.85850       0.475595476      -12.31823343   0.051568009
TDP                   0.00820        0.003804659      2.154207112    0.276678909
Figure 2.15
Note: Time (Act.) is the actual average time. Time (Equ.) is the time predicted by the regression
equation.
Each Factor's Effect on Time
Time (Act.)   Time (Equ.)   Frq     RAM      L2-Cache   Front Side Bus   Mult    Voltage   TDP
0.95          0.93          -0.45   -0.011   -2.21      -2.17            -0.33   -6.7      0.20
0.98          1.00          -0.40   -0.022   -2.21      -2.17            -0.29   -6.7      0.20
1.27          1.25          -0.68   -0.011   -1.47      -1.63            -0.65   -7.6      0.69
1.44          1.45          -0.64   -0.002   -0.37      -1.08            -0.92   -8.8      0.62
1.47          1.54          -0.44   -0.004   -0.37      -0.81            -0.42   -9.7      0.63
1.58          1.59          -0.56   -0.004   -1.47      -1.63            -0.54   -7.6      0.78
1.66          1.67          -0.32   -0.004   -1.47      -0.81            -0.61   -7.9      0.12
1.75          1.68          -0.37   -0.006   -0.37      -0.68            -0.42   -9.7      0.56
3.96          3.96          -0.32   -0.004   -0.37      -1.08            -0.46   -6.4      0.02
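The predicted times can be reproduced by multiplying each processor specification by its coefficient from Figure 2.14 and adding the intercept. The sketch below is an illustration only: the dictionary keys and the `predict_time` function are hypothetical names, and the units (GHz, GB, MB, MHz, V, W) are inferred from Figures 2.10 and 2.13. Since the fit has a single residual degree of freedom, close agreement with the measured times is to be expected and says little about other processors.

```python
# Coefficients from Figure 2.14 (units assumed from Figures 2.10 and 2.13).
COEFFICIENTS = {
    "frequency_ghz": -0.19966,
    "ram_gb": -0.00371,
    "l2_cache_mb": -0.73653,
    "fsb_mhz": -0.00203,
    "multiplier": -0.03836,
    "voltage_v": -5.85850,
    "tdp_w": 0.00820,
}
INTERCEPT = 12.62375


def predict_time(specs):
    """Predicted brute-force time in seconds for one processor."""
    return INTERCEPT + sum(COEFFICIENTS[name] * specs[name]
                           for name in COEFFICIENTS)


core2_p8400 = {"frequency_ghz": 2.26, "ram_gb": 2.95, "l2_cache_mb": 3.0,
               "fsb_mhz": 1066, "multiplier": 8.5, "voltage_v": 1.15,
               "tdp_w": 25}
atom_n270 = {"frequency_ghz": 1.60, "ram_gb": 1.00, "l2_cache_mb": 0.5,
             "fsb_mhz": 533, "multiplier": 12, "voltage_v": 1.1,
             "tdp_w": 2.5}

print(f"Core 2 Duo P8400: {predict_time(core2_p8400):.2f} s")
print(f"Atom N270:        {predict_time(atom_n270):.2f} s")
```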