Final Project Specifications (10/03/2006)
Fouad Ramia, Hunar Qadir, ECE 646

1 INTRODUCTION
With today's heavy reliance on secure communication systems, networks, and the Internet, there is a growing demand for real-time implementation of cryptographic algorithms. RC6, RSA's candidate for AES, remains a good choice for security applications and is also a candidate for the NP 18033 project. In this project, we investigate the efficiency of RC6 from the hardware implementation perspective, with Field Programmable Gate Arrays (FPGAs) as the target technology. We expect our analysis and synthesis studies of the cipher to suggest that FPGA implementations benefit from a simpler cipher design: one built from simple operations that not only possess good cryptographic properties but also make the overall design efficient in hardware.

2 DESIGN
Design entry level: VHDL
Target implementation: Xilinx Spartan 3 FPGA
CAD tools: Aldec Active-HDL, Synplicity Synplify Pro, ModelSim and Altera Quartus II

3 DESCRIPTION OF THE RC6 ALGORITHM
Like all AES candidates, RC6 operates on 128-bit blocks. It accepts variable-length keys. RC6 uses integer multiplication to compute its data-dependent rotation amounts, so that all bits of the input word influence the rotation amount, which strengthens the avalanche effect. Computation of f(X) = (X(2X + 1)) mod 2^w is the critical arithmetic operation of this block cipher.

RC6 is more accurately specified as RC6-w/r/b, where the word size is w bits, encryption consists of r rounds, and b denotes the length of the encryption key in bytes. We will implement RC6-32/20/32: 32-bit words, 20 rounds, and a 32-byte (256-bit) encryption key.

The key schedule generates 2r + 4 words (w bits each) from the b-byte key provided by the user, which is 44 words for r = 20. The round keys are stored in an array S[0, …, 2r + 3] and are used for both encryption and decryption.
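The arithmetic described in this section can be illustrated with a short software sketch. The Python below is only an illustration and is not part of the planned VHDL design; the names W, R, B_BYTES, rotl, f, and rotation_word are hypothetical, chosen for this example. The rotation by log w bits follows the encryption procedure given in Section 6.1.

```python
# Illustrative sketch only; all names here are assumptions for this example.
W = 32                       # word size w in bits
R = 20                       # number of rounds r
B_BYTES = 32                 # key length b in bytes
MASK = (1 << W) - 1          # keeps results modulo 2^w
LG_W = W.bit_length() - 1    # base-2 logarithm of w (5 for w = 32)


def rotl(x, n):
    """Rotate the w-bit word x left by n positions (n taken modulo w)."""
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def f(x):
    """Quadratic function f(X) = X(2X + 1) mod 2^w."""
    return (x * (2 * x + 1)) & MASK


def rotation_word(x):
    """f(X) rotated left by log w bits, as used for t and u in each round."""
    return rotl(f(x), LG_W)


num_round_keys = 2 * R + 4       # length of the round-key array S
key_words = B_BYTES * 8 // W     # number of w-bit words in the user key

print(num_round_keys, key_words, hex(rotation_word(1)))
```

In hardware, the multiplication inside f is the costly part of each round; the fixed rotation by log w bits amounts to a rewiring of signals.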

4 RC6 CIPHER BLOCK DIAGRAM
Fig. 1 RC6 cipher block diagram

5 PROCEDURE
In this project we will present a hardware implementation of the RC6 algorithm written in the VHDL hardware description language. For this implementation we use the XESS XSA-3S1000 board with its Xilinx Spartan-3 FPGA. We chose this board for its characteristics: more than one million equivalent gates and 512 input/output buffers.

In VHDL, two elements have to be defined in the code: the entity and the architecture. The entity describes the interface to a hierarchical block without defining its behavior. An architecture is always associated with an entity and defines the behavior of that entity. To drive the circuit, we will write a test bench that verifies its functionality.

Using Active-HDL:
- We will run the simulation and check the waveforms.
- We will synthesize the code and get the RTL netlist.
- We will run the timing simulation/analysis, enabling us to determine the critical path and the maximum clock frequency (that is, the minimum clock period).
- We will implement the code and generate the bit file to download to the FPGA, if an FPGA is physically present in the lab.
- We will also determine the number of CLB slices used and the efficiency of the algorithm when implemented in hardware.

6 INITIAL DESIGN APPROACH
All of the encryption, decryption, and key schedule circuits will be implemented in VHDL. In the procedures below, additions, subtractions, and multiplications are performed modulo 2^w, <<< and >>> denote left and right rotation of a w-bit word, and log w is the base-2 logarithm of w (5 for w = 32).

6.1 Encryption with RC6-w/r/b
Input:  Plaintext stored in four w-bit input registers A, B, C, D
        Number r of rounds
        w-bit round keys S[0, …, 2r + 3]
Output: Ciphertext stored in A, B, C, D
Procedure:
    B = B + S[0]
    D = D + S[1]
    for i = 1 to r do {
        t = (B x (2B + 1)) <<< log w
        u = (D x (2D + 1)) <<< log w
        A = ((A xor t) <<< u) + S[2i]
        C = ((C xor u) <<< t) + S[2i + 1]
        (A, B, C, D) = (B, C, D, A)
    }
    A = A + S[2r + 2]
    C = C + S[2r + 3]

6.2 Decryption with RC6-w/r/b
Input:  Ciphertext stored in four w-bit input registers A, B, C, D
        Number r of rounds
        w-bit round keys S[0, …, 2r + 3]
Output: Plaintext stored in A, B, C, D
Procedure:
    C = C - S[2r + 3]
    A = A - S[2r + 2]
    for i = r downto 1 do {
        (A, B, C, D) = (D, A, B, C)
        u = (D x (2D + 1)) <<< log w
        t = (B x (2B + 1)) <<< log w
        C = ((C - S[2i + 1]) >>> t) xor u
        A = ((A - S[2i]) >>> u) xor t
    }
    D = D - S[1]
    B = B - S[0]

6.3 Key schedule with RC6-w/r/b
The key schedule of RC6-w/r/b is practically identical to the key schedule of RC5-w/r/b. The only difference is that more words are derived from the user-supplied key for use during encryption and decryption.

The user supplies a key of b bytes (in our case b = 32). Sufficient zero bytes are appended to give a key length equal to a non-zero integral number of words; these key bytes are then loaded in little-endian fashion into an array of c w-bit words L[0], …, L[c - 1] (w = 32 bits and c = 8 in our case). Thus the first byte of the key is stored as the low-order byte of L[0], and so on, and L[c - 1] is padded with high-order zero bytes if necessary. The number of w-bit (32-bit) words generated for the additive round keys is 2r + 4, and these are stored in the array S[0, …, 2r + 3]. The constants P32 = B7E15163 and Q32 = 9E3779B9 (hexadecimal) are the same “magic constants” as used in the RC5 key schedule.

Procedure:
    S[0] = Pw
    for i = 1 to 2r + 3 do
        S[i] = S[i - 1] + Qw
    A = B = i = j = 0
    v = 3 x max{c, 2r + 4}
    for s = 1 to v do {
        A = S[i] = (S[i] + A + B) <<< 3
        B = L[j] = (L[j] + A + B) <<< (A + B)
        i = (i + 1) mod (2r + 4)
        j = (j + 1) mod c
    }

6.4 Initial block diagram

7 TIME SCHEDULE
Weeks 1-2 : Research, block diagrams, and design specifications
Weeks 3-4 : Coding and debugging
Week 5    : Implementation in hardware
Week 6    : Documentation

8 REFERENCES
[1] R.L. Rivest, M.J.B. Robshaw, R. Sidney, and Y.L. Yin, “The RC6 Block Cipher,” available at http://theory.csail.mit.edu/~rivest/rc6.pdf
[2] R.L.
Rivest, “The RC5 Encryption Algorithm,” available at http://theory.lcs.mit.edu/~rivest/Rivestrc5rev.pdf
[3] “National Bureau of Standards - Data Encryption Standard,” FIPS Publication 46, 1977.
[4] NIST Advanced Encryption Standard (AES) Development Effort, available at http://csrs.nist.gov/encryption/aes/aes-home.htm
[5] R.L. Rivest, “The RC5 Encryption Algorithm,” Proceedings of Fast Software Encryption - 2nd International Workshop, Leuven, Belgium, Springer Verlag LNCS 1008, pp. 86-96, 1995.
[6] J.-P. Kaps and C. Paar, “Fast DES Implementation for FPGAs and its Application to a Universal Key-search Machine,” presented at Workshop in Selected Areas of Cryptography (SAC’98), Kingston, Ont., Aug. 1998.
[7] H. Feistel, W.A. Notz, and J.L. Smith, “Some Cryptographic Techniques for Machine-to-Machine Communications,” Proceedings of IEEE, Vol. 63, No. 11, pp. 1545-1554, 1975.
[8] C.M. Adams and S.E. Tavares, “Designing S-boxes for Ciphers Resistant to Differential Cryptanalysis,” Proceedings of the 3rd Symposium on State and Progress of Research in Cryptography, Rome, Italy, pp. 181-190, 15-16 Feb. 1993.
[9] R.W. Doran, “Variants on an Improved Carry Lookahead Adder,” IEEE Trans. on Computers, Vol. 37, No. 9, pp. 1110-1113, 1988.
[10] C.S. Wallace, “A Suggestion for a Fast Multiplier,” IEEE Trans. on Computer, Vol. EC-13, pp. 14-17, 1964.
[11] “Active-HDL Getting Started” (manual), available at http://ece.gmu.edu/labs/Active_HDL.pdf
[12] “XSA-3S1000 FPGA Board v. 1.0 User Manual,” Xess Corporation, 2005, available at http://www.xess.com/manuals/xsa-3S-manual-v1_0.pdf
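
As a supplement to the procedures of Section 6, the sketch below shows how a software reference model of RC6-32/20/32 might look, for example to generate expected values for the VHDL test bench. It is a minimal illustration under our own assumptions, not the planned hardware design: the function names (key_schedule, encrypt_block, decrypt_block, rotl, rotr) are hypothetical, and blocks are passed as tuples of four 32-bit integers (A, B, C, D), so packing bytes into words is left out.

```python
# Reference sketch of RC6-32/20/32; names and block format are assumptions.
W, R = 32, 20
MASK = 0xFFFFFFFF                    # arithmetic is modulo 2^32
P32, Q32 = 0xB7E15163, 0x9E3779B9    # constants from Section 6.3
LG_W = 5                             # base-2 logarithm of w


def rotl(x, n):
    n %= W
    return ((x << n) | (x >> (W - n))) & MASK


def rotr(x, n):
    n %= W
    return ((x >> n) | (x << (W - n))) & MASK


def key_schedule(key):
    """Expand a key of b bytes into the 2r + 4 round keys S."""
    padded = key + bytes(-len(key) % 4)   # zero-pad to whole words
    if not padded:
        padded = bytes(4)                 # at least one word
    L = [int.from_bytes(padded[k:k + 4], "little")
         for k in range(0, len(padded), 4)]
    c = len(L)
    S = [(P32 + k * Q32) & MASK for k in range(2 * R + 4)]
    A = B = i = j = 0
    for _ in range(3 * max(c, 2 * R + 4)):
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % (2 * R + 4)
        j = (j + 1) % c
    return S


def encrypt_block(block, S):
    A, B, C, D = block
    B = (B + S[0]) & MASK
    D = (D + S[1]) & MASK
    for i in range(1, R + 1):
        t = rotl((B * (2 * B + 1)) & MASK, LG_W)
        u = rotl((D * (2 * D + 1)) & MASK, LG_W)
        A = (rotl(A ^ t, u) + S[2 * i]) & MASK
        C = (rotl(C ^ u, t) + S[2 * i + 1]) & MASK
        A, B, C, D = B, C, D, A
    A = (A + S[2 * R + 2]) & MASK
    C = (C + S[2 * R + 3]) & MASK
    return (A, B, C, D)


def decrypt_block(block, S):
    A, B, C, D = block
    C = (C - S[2 * R + 3]) & MASK
    A = (A - S[2 * R + 2]) & MASK
    for i in range(R, 0, -1):
        A, B, C, D = D, A, B, C
        u = rotl((D * (2 * D + 1)) & MASK, LG_W)
        t = rotl((B * (2 * B + 1)) & MASK, LG_W)
        C = rotr((C - S[2 * i + 1]) & MASK, t) ^ u
        A = rotr((A - S[2 * i]) & MASK, u) ^ t
    D = (D - S[1]) & MASK
    B = (B - S[0]) & MASK
    return (A, B, C, D)
```

A typical use would be to call key_schedule once with the 32-byte key, encrypt a block with encrypt_block, and check that decrypt_block returns the original block. Before relying on such a model, it should be checked against the published test vectors in [1].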