Scott Reed
MATH 714
07/21/09

Vigenere Cipher

This week, I started to look at the Vigenere cipher. A Vigenere cipher is one that uses a shifted alphabet, called a Caesar shift, in a periodic manner based on a keyword. A more difficult version of the Vigenere cipher uses a keyword that is the same length as the original message, which makes decoding a message much harder. This type is sometimes called a “one time pad”, because a key as long as the message is used for that one message only.

The cipher is written mathematically by Bauer¹ as:

$A : \{A_i\}$ with $A_i : N_n \to N_n$, $\quad A_i(x) = x + i$

showing that the cipher is a shift $A_i$ of the letters by $i$ places, taken modulo the size of $N_n$. A less complicated mathematical definition is given by Hamilton² using modular arithmetic as:

$c \equiv (p + k) \bmod 26$

where:
c = ciphertext character value
p = plaintext character value
k = keyword character value

A key tool for encoding and decoding messages with the Vigenere cipher is the Vigenere Square shown in figure 1. The letters of the original message, or plaintext, form the column headings. The keyword letters label the rows. The letters are given numeric values from 0 to 25, with A=0 and continuing sequentially to Z=25. Each plaintext letter is shifted by the number of places given by the value of the corresponding key letter. The square shows the shifts for each letter, which eases encoding or decoding when the key is known.

As an example, the phrase “applied mathematics” will be encoded using the keyword “math.” The keyword is repeated as often as needed to cover the message. Using Hamilton’s formula, the first letter is encoded with A=0 and M=12, so that:

c = (0 + 12) mod 26
c = 12 mod 26
c = 12
c = M

Plaintext:   A P P L I E D M A T H E M A T I C S
Keyword:     M A T H M A T H M A T H M A T H M A
Ciphertext:  M P I S U E W T M T A L Y A M P O S

The cipher text is then “mpisuewtmtalyampos”, which looks meaningless to anyone who does not know how it was produced.
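The encoding rule above can be sketched in Python. This is only a minimal illustration, not Hamilton's TI-83 program; the function name `vigenere_encode` and the choice to drop spaces and other non-letters before encoding are assumptions made for the example.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0, B=1, ..., Z=25


def vigenere_encode(plaintext: str, keyword: str) -> str:
    """Encode with c = (p + k) mod 26, repeating the keyword as needed.

    Assumption for this sketch: characters that are not letters
    (spaces, punctuation) are dropped before encoding.
    """
    letters = [ch for ch in plaintext.upper() if ch in ALPHABET]
    key = [ALPHABET.index(ch) for ch in keyword.upper() if ch in ALPHABET]
    if not key:
        raise ValueError("keyword must contain at least one letter")

    out = []
    for i, ch in enumerate(letters):
        p = ALPHABET.index(ch)   # plaintext character value
        k = key[i % len(key)]    # keyword character value, used periodically
        out.append(ALPHABET[(p + k) % 26])
    return "".join(out)


# First four letters of the worked example: A+M, P+A, P+T, L+H
print(vigenere_encode("appl", "math"))  # MPIS
```

Passing a keyword as long as the message gives the one time pad variant, since the index `i % len(key)` then never wraps around.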
In my first attempt at using the cipher, I encoded the alphabet using randomly ordered letters of the alphabet as the key to get the following cipher text.

Plaintext:   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Keyword:     K L X C Y Z W A N H I J B O G V S E T M D U R P F Q
Ciphertext:  K M Z F C E C H V Q S U N B U K I V L F X P N M D P

This would be an example of a one time pad, where the keyword is as long as the message and does not repeat as it normally would.

Figure 1 – Vigenere Square

Decoding messages using the cipher is the opposite of encoding. Using the table, find the cipher text letter in the row of the keyword letter; the heading of that column is the plaintext letter. Mathematically, it is found using the additive inverse in modular arithmetic so that²:

$p \equiv (c + (26 - k)) \bmod 26$

The problem of decoding a message written in the Vigenere cipher without knowing the key is simplified if the keyword length is found. One method of determining the keyword length is called the Friedman test²:

$w \approx \dfrac{0.0265\,n}{(0.065 - IC) + n\,(IC - 0.0385)}$

where:
w = keyword length
n = message length
IC = index of coincidence
The estimate applies for relatively large n.

The index of coincidence is the “probability that two randomly selected letters from the text are identical.”² For a Caesar shift of English text, the IC is around 0.065, the same as for plain English. For a Vigenere cipher with a long keyword it falls toward 0.0385, about 1/26, which is the value for letters chosen uniformly at random.

Another method for determining the keyword length is presented in Tilborg³. Kasiski’s method is based on repeated segments of the cipher text. The method is carried out by finding a string of characters that occurs more than once in the cipher text and noting the positions of its occurrences. The keyword length should divide the difference between those positions, since the same plaintext encoded with the same key letters gives the same cipher text. If more than one repeated string can be found, then the keyword length is likely a common divisor of the differences in positions of the repeated strings.

Once the keyword length is found, Hamilton² provides a method to determine the keyword.
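Before turning to the keyword itself, the Friedman estimate of the keyword length can be sketched in Python. The function names are hypothetical, and the index of coincidence is computed with the usual counting estimate, the sum of f(f-1) over the letter counts f divided by n(n-1); that is an assumption about how the quoted definition is applied, not something taken from Hamilton's program. The constants 0.0265, 0.065, and 0.0385 are those of the formula above.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn at random from text are identical.

    Assumed estimator: sum of f*(f-1) over letter counts f, divided by
    n*(n-1), i.e. drawing two letters without replacement.
    """
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def friedman_keyword_length(text: str) -> float:
    """Estimate w = 0.0265 n / ((0.065 - IC) + n (IC - 0.0385)).

    The result is a real number; only for long texts is it a useful
    guess at the keyword length. The denominator can be zero or
    negative for unusual inputs, which this sketch does not handle.
    """
    n = sum(1 for ch in text.upper() if "A" <= ch <= "Z")
    ic = index_of_coincidence(text)
    return 0.0265 * n / ((0.065 - ic) + n * (ic - 0.0385))


# Tiny check of the IC estimator: 2 A's and 2 B's give (2 + 2) / (4 * 3)
print(index_of_coincidence("AABB"))  # 4/12, about 0.333
```

For a real message, the estimate from `friedman_keyword_length` would normally be compared with the common divisors suggested by Kasiski's method rather than trusted on its own.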
Since the size of the keyword is known, the cipher text can be split into subsets, one for each character of the keyword. Each subset was encoded with a single key letter, so each is a Caesar shift, and its index of coincidence should be around 0.065. The relative frequencies of the letters in each subset are compared to the relative frequencies of the letters in standard English. Let the vector b be the frequencies in standard English, so that $b = (b_1, b_2, \ldots, b_{26})$, where $b_1$ is the frequency of the letter a and $b_{26}$ is the frequency of the letter z. The vector a holds the letter frequencies of one subset of the cipher text, where $a = (a_1, a_2, \ldots, a_{26})$. To determine the keyword letter, the vector a is rotated so that:

$a^{0} = a = (a_1, a_2, \ldots, a_{26})$
$a^{1} = (a_2, a_3, \ldots, a_{25}, a_{26}, a_1)$
$a^{k} = (a_{k+1}, a_{k+2}, \ldots, a_{26}, a_1, \ldots, a_{k-1}, a_k)$
$a^{25} = (a_{26}, a_1, \ldots, a_{24}, a_{25})$

The scalar product of each $a^{k}$ with b is calculated. The index k with the greatest scalar product is likely the value of the keyword letter for that subset. This is because every rotation of a has the same length, so the scalar product with b is greatest for the rotation that is closest to parallel with b, while orthogonal vectors have a scalar product of zero.

This is what I have found on the Vigenere cipher. I am not sure where it might head from here. Hamilton has developed a program to encode, decode, and decipher the Vigenere cipher on the TI-83. I may look to develop a program also, or look more into the one time pad problem.

1 Bauer, F.L. Decrypted Secrets: Methods and Maxims of Cryptology. Springer. New York, 1997.
2 Hamilton, Michael and Bill Yonkosky. “The Vigenere Cipher with the TI-83.” Mathematics and Computer Education. 38.1. Winter 2004.
3 Tilborg, Henk C.A. van. Fundamentals of Cryptology. Kluwer Academic Publishers. Boston, 2000.