EE113L Final Project: RC4 Encryption
Layne Hoo
Nathaniel Wong
Spring 2001

Introduction

For our project, we chose to implement RC4 encryption on the TMS320C54x DSP. Although the algorithm is very simple, it is reliable for encrypting data for certain purposes. RC4 is a symmetric stream cipher developed by RSADSI. It uses a key of up to 256 bytes (2048 bits) to initialize a 256-byte (2048-bit) state table. The state table is used to generate a pseudo-random byte stream, which is XORed with the plaintext to give the ciphertext. RC4 is interesting in that the same routine performs both encryption and decryption. To encrypt, the data and the password are passed to the routine, which produces the encrypted data. To decrypt, the encrypted data and the same password are passed back into that routine to recover the original data.

Project Development

Choosing a project gave us a lot of trouble. There were many possible projects to choose from, but once we considered what could actually be implemented in the allotted time, many were deemed impossible. We stumbled upon RC4 during 7th week and decided that it was a doable project. During 7th week, we gathered as much information as possible on RC4, mostly by searching the Internet. We found a few sites that contained implementations of the RC4 algorithm in C, as well as a few sites that explain in detail how the algorithm works. (A list of these websites can be found in the reference section of our report.)

The next step was to implement the RC4 algorithm in .asm source code. The C source code helped us verify that our results were correct. We did the implementation and the debugging during 8th and 9th week. The following are the steps that we took to implement the RC4 algorithm. Each step was tested before we integrated it into the entire implementation.
The RC4 algorithm is divided into two phases: key setup and ciphering. Here is the algorithm that we used to implement RC4 encryption:

Phase 1: Key Setup

1. Create a 256-entry state array: S[0] .. S[255].

2. Initialize each entry of the state array with the value of its own index:

       S[0] = 0; S[1] = 1; ... S[255] = 255;

3. Create a second array of the same size and fill it with the key. If the key is smaller than 256 bytes, repeat the key to fill up the array:

       for (i = 0; i < 256; i = i + 1)
           S2[i] = key[i % keylen];

   where “i” is the known (sequential) index, key[] is the stored key value entered by the user, and keylen is the length of the key.

4. Initialize the index “j” to zero and initialize the S-box like this:

       for (i = 0; i < 256; i = i + 1) {
           j = (j + S[i] + S2[i]) % 256;
           temp = S[i];
           S[i] = S[j];
           S[j] = temp;
       }

   “j” is the generated, unknown index that allows the values in the state table to be swapped pseudo-randomly.

5. Reset indexes i and j to zero, and clear array S2 and the key array to zero as well.

Phase 2: Ciphering

A pseudorandom byte K is generated as follows. First update the indexes:

       i = (i + 1) % 256;
       j = (j + S[i]) % 256;

Swap the values in the state table:

       temp = S[i];
       S[i] = S[j];
       S[j] = temp;

Then select K:

       t = (S[i] + S[j]) % 256;
       K = S[t];

The resulting byte K is the keystream byte used for encryption. To encrypt, XOR K with the next byte of plaintext. To decrypt, XOR K with the next byte of ciphertext. The S-box slowly evolves with use; index i ensures that every element changes and index j ensures that the elements change randomly.

Our final implementation encrypts only a single byte.
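The key setup and ciphering phases described above can be sketched in C as follows. This is only an illustrative sketch, not our .asm source: the names rc4_state, rc4_init, rc4_next, and rc4_crypt are hypothetical, the key length is assumed to be between 1 and 256 bytes, and the caller's key buffer is left untouched rather than cleared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the cipher state; names are illustrative. */
typedef struct {
    uint8_t S[256];   /* state table (S-box) */
    unsigned i, j;    /* ciphering indexes */
} rc4_state;

/* Phase 1: key setup. Assumes 1 <= keylen <= 256. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    uint8_t S2[256];
    unsigned i, j = 0;

    for (i = 0; i < 256; i++) {
        st->S[i] = (uint8_t)i;        /* steps 1-2: S[i] = i */
        S2[i] = key[i % keylen];      /* step 3: repeat the key */
    }
    for (i = 0; i < 256; i++) {       /* step 4: shuffle the S-box */
        j = (j + st->S[i] + S2[i]) % 256;
        uint8_t temp = st->S[i];
        st->S[i] = st->S[j];
        st->S[j] = temp;
    }
    memset(S2, 0, sizeof S2);         /* step 5: clear S2, reset i and j */
    st->i = 0;
    st->j = 0;
}

/* Phase 2: produce the next pseudorandom byte K. */
uint8_t rc4_next(rc4_state *st)
{
    st->i = (st->i + 1) % 256;
    st->j = (st->j + st->S[st->i]) % 256;

    uint8_t temp = st->S[st->i];
    st->S[st->i] = st->S[st->j];
    st->S[st->j] = temp;

    unsigned t = (st->S[st->i] + st->S[st->j]) % 256;
    return st->S[t];
}

/* XOR each input byte with K; the same call encrypts and decrypts. */
void rc4_crypt(rc4_state *st, const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t n = 0; n < len; n++)
        out[n] = in[n] ^ rc4_next(st);
}
```

To decrypt, call rc4_init again with the same key and pass the ciphertext through rc4_crypt; because the operation is an XOR with the same keystream, the original bytes come back.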
The application that uses our implementation should take each plaintext character that is entered and convert it into its corresponding numeric value, which is then ready to be encrypted with our implementation of RC4.

We ran into a few problems while implementing the RC4 algorithm. Because we were worried that we would not finish in time, we did not implement the minor part of the algorithm that fills the second state array with the repeated key. Instead, we filled that array manually with an arbitrary repeated key at the beginning of our .asm code.

During 9th week, we had a lot of trouble setting up the state table because of the continuous swapping of its values. We used what we had learned in previous experiments and watched the memory locations while stepping through the instructions. This proved to be very tedious, yet very effective.

At one point during development, we found that running the whole program produced different values than stepping through the same code. This was very odd. We discovered that certain instructions take a number of cycles to update registers and memory locations before the new values can be read. If the code does not wait for the update, the subsequent instruction operates on stale values and gives incorrect results. The solution: nops. We found that five to six nop instructions between instructions were sufficient.

Because the DSP has no divide or modulus instruction, we had to implement our own modulus. We did so by repeatedly subtracting 256 until the value became negative, using the cmps instruction to check for negativity. Once the value was negative, 256 was added back, and that sum was the modulus. However, if a subtraction produced exactly zero, the addition of 256 had to be skipped, since the modulus is then zero.
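The subtraction-based modulus described above can be sketched in C as follows. This is a sketch of the idea only, not a translation of our assembly: the function name mod256_by_subtraction is hypothetical, and the input is assumed to be non-negative.

```c
/* Hypothetical helper: reduce a non-negative value modulo 256 using only
   subtraction, addition, and zero/sign checks (no divide or % operator). */
int mod256_by_subtraction(int value)
{
    for (;;) {
        value -= 256;
        if (value == 0)
            return 0;            /* exact multiple of 256: skip the add-back */
        if (value < 0)
            return value + 256;  /* went negative: add 256 back */
    }
}
```

For example, an input of 300 becomes 44 after one subtraction and -212 after the second, so the function returns -212 + 256 = 44. Since 256 is a power of two, masking a non-negative value with 0xFF yields the same result in a single operation, which may be a simpler alternative where a bitwise AND is available.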

Discussion

To use and test our RC4 implementation, you can change a few parameters in the source code:

1) Specify a key value and repeat it in the S2 array, filling all 256 entries of the array.
2) Specify the key in the key variable.
3) Specify the key length in the keylen variable.
4) Specify a plaintext value (this int value can represent an ASCII letter).

Our implementation of the RC4 algorithm was tested with a few different keys and key sizes. When we compared our results with those obtained by running the C implementation, the results matched, and on that basis we consider our implementation correct.

We could not accurately measure the performance of our C54x RC4 implementation because of the NOP instructions that our code needs in order to function and because of the manual filling of the second state array. For the same reason, the performance of the DSP implementation could not be accurately compared with that of the C version. The performance of the algorithm depends on the key length and the length of the data.

Conclusion

The main goal of this project was to see whether our TMS320C54x processor would outperform, say, an Intel x86 processor when running RC4 encryption. We were unable to determine experimentally which was faster because of the added nops and a few manual interventions in our implementation. However, we do know that there are encryption implementations in which DSPs deliver a substantial performance increase over general-purpose CPUs.

This project has helped us learn many fundamental ideas and applications for DSPs. We were able to use our knowledge from previous experiments to develop our current project. This approach has reinforced what we have learned and helped us remember what we covered this whole quarter.

References

WWW Pages:

ISAAC and RC4
URL http://burtleburtle.net/bob/rand/isaac.html

4GuysFromRolla.com – RC4 Encryption Using ASP & VBScript
URL http://www.4guysfromrolla.com/webtech/010100-1.shtml

RC4 Encryption Algorithm
URL http://www.ncat.edu/~grogans/main.htm

Stream Cipher RC4 / ARC-4
URL http://www.achtung.com/crypto/rc4.html#Algorithm_Description

Index of /security/cryptography/algorithms/rc4
URL http://the.wiretapped.net/security/cryptography/algorithms/rc4/?N=D