Implementing Crypto Microprocessor using Rijndael Algorithm

Abdel-Karim R. Al Tamimi
aa7@cec.wustl.edu
Computer Science and Engineering Department
Washington University in St. Louis
St. Louis, Missouri

Abstract

This work presents a microprocessor for executing programs that have been enciphered. Such a crypto microprocessor deciphers the enciphered program piecemeal as it executes it. Deciphered instructions are stored in a separate memory module so that the processor can access them on the fly. An implementation of Rijndael, the Advanced Encryption Standard (AES) of the National Institute of Standards and Technology (NIST), is used to encrypt and decrypt data; it runs the symmetric cipher with a 128-bit key, a mode called AES128. Such a microprocessor reduces the security risks related to code mobility.

1. Introduction

Wireless sensor networks are increasingly becoming viable solutions to many challenging problems and will be deployed in many more areas in the future. Wireless sensor network devices have two key characteristics: they are low-power devices, and they are small relative to the devices used in other kinds of networks. Many current applications use wireless networks as their network infrastructure. However, deploying a new technology without keeping security in mind has often proved to be unreasonably dangerous [1].

This paper proposes a way to transfer data and programs through sensor networks without sacrificing data security. The main idea is to develop a microprocessor that executes encrypted instructions, decrypting them gradually as they are referenced by the running program. The idea of encrypting program instructions down to the binary level and decrypting them gradually as the program needs them was first described by Best [2, 3]. Because of the overhead of the cryptographic functions involved in such circuits, however, it was never used on a large scale.
Advances in the IC industry motivate us to reevaluate this solution and revisit the subject. What once demanded a large number of power-hungry chips can now be implemented on a single small, power-efficient chip. The idea proposed in this paper also seeks to keep the performance penalty of decryption to a minimum.

This paper is organized as follows: Section 2 presents an overview of cryptography, Section 3 explains the reasons for choosing Rijndael over other cryptographic algorithms, Section 4 introduces the proposed architecture, Section 5 presents simulation results, and Section 6 describes future work and some conclusions.

2. Cryptography overview

Cryptography is not a recent science but an old strategy for guaranteeing that information is exchanged securely, meaning that third parties cannot read the protected information. Many devices have been used throughout history, such as the Enigma machine used by Germany in World War II. The core of security back then was to hide the algorithm in use so that intruders could not reverse-engineer it. Nowadays, cryptography is widely used in Internet banking systems and other money transfer operations. Modern cryptographic algorithms are published openly, since it is believed that the emphasis of security should move from protecting the algorithm to protecting the key used with it.

In cryptography, the original data is called plaintext. The process of hiding the information is called encryption, and its result (the hidden text) is called ciphertext. To recover the plaintext from the ciphertext, we run the decryption process, which requires knowing the secret key used in the encryption process. There are two main types of cryptography: symmetric and asymmetric.
In the first type, symmetric cryptography, the two sides communicate using a single key, called the secret key or private key, which both sides must know before they can communicate. This secret key is used to encrypt and decrypt the information exchanged between them. In asymmetric algorithms there are two kinds of keys: a public key and a private key. The public key is used to encrypt the secret key that is to be exchanged. In a simple scenario, suppose host A wants to contact host B securely. Host A sends a request to host B asking for its public key. When the public key arrives, host A uses it to encrypt the secret key it wants to use for the communication and sends the result back to host B. Only host B can decrypt the encrypted secret key; think of it as if host B holds two parts of a key and hands only one of them to host A. After this exchange completes successfully, the two hosts can start transferring their data using symmetric encryption, with the exchanged secret key used for encryption and decryption [6].

Cryptography needs a standard, since communication is only possible when the same algorithm is used on both sides. The Data Encryption Standard (DES) is a well-known private-key symmetric encryption algorithm with a 64-bit block and a 56-bit key. It was made a federal standard in 1977 and has been widely used since. A $1 million brute-force DES-cracking machine that can break a DES key in about 3.5 hours has been reported [7]. To replace DES, on September 12, 1997, the National Institute of Standards and Technology (NIST) requested proposals for what is called the Advanced Encryption Standard (AES) [8]. After the Round 1 selection process, five algorithms were chosen to advance to Round 2, in which NIST deepened its analysis of each proposal and encouraged attacks on all the candidates [9].
The five algorithms selected were MARS, RC6, RIJNDAEL, SERPENT, and TWOFISH. At the end of Round 2, the conclusion was that the five competitors showed similar characteristics. On October 2, 2000, NIST announced the Rijndael algorithm as the winner of the contest because it had the best overall scores in security, performance, efficiency, implementability, and flexibility [10].

3. Rijndael Encryption Algorithm

The Rijndael algorithm was developed by Joan Daemen and Vincent Rijmen. The name Rijndael is a portmanteau of the inventors' names. Rijndael is a private-key symmetric block encryption algorithm that supports key lengths of 128, 192, and 256 bits and operates on blocks of 128, 192, or 256 bits. All nine combinations of key length and block size are possible. This work focuses on AES128.

Rijndael has been implemented in software using C/C++, Java, C#, assembly languages, and many other languages [11, 12]. Software implementations offer limited throughput compared to specialized hardware chips. This paper uses a hardware implementation called AES86, provided by ht-lab, which works at a relatively low rate of about 37 Mb/s [13]; this is acceptable because the proposed solution aims to address privacy risks regardless of speed. Table A shows software implementation throughputs [14].

Implementation    Encryption Speed
ANSI C            27 Mb/s
Visual C++        70.5 Mb/s

Table A: Software implementation throughput.

Table B shows some of the commercial hardware AES cores currently available [15].

Core           Technology      Speed      Throughput
AES 32-bit     TSMC 0.13 µm    400 MHz    ~1.16 Gbit/s
AES 32-bit     UMC 0.18 µm     344 MHz    ~997 Mbit/s
AES 128-bit    TSMC 0.13 µm    400 MHz    ~4.64 Gbit/s
AES 128-bit    UMC 0.18 µm     344 MHz    ~3.99 Gbit/s

Table B: Hardware implementation throughput

As Table B shows, customized hardware implementations of Rijndael can offer very high throughput. Two common modes of operation for Rijndael are Electronic Code Book (ECB) and Cipher Block Chaining (CBC). ECB mode is the simplest: it encrypts each 128-bit block independently of the others. A more secure method is to XOR each input block with the ciphertext of the preceding block before encryption; this is called Cipher Block Chaining [13]. The proposed module uses ECB mode since it offers higher throughput and is easier to implement.

4. Crypto Microprocessor Architecture

The proposed architecture can be attached to any available processor design; a simple 32-bit, 5-stage pipelined processor was used to demonstrate the behavior of the model.

Figure A: Simple MIPS microprocessor

Figure B shows the extra components added to the simple MIPS microprocessor to implement the proposed architecture. Five modules are added to the architecture (only four are shown).

Figure B: Changes made to simple MIPS MP.

In the front end, these modules were added (besides the decryption module described before):

128-bit SRAM memory module: a 64-K x 16-byte memory module (64-K entries of 16 bytes each) is used to store the encrypted instructions. The 128-bit width is used because it is the most suitable trade-off between convenience and efficiency. Four 32-bit instructions are stored as one 128-bit encrypted entry, since the encryption/decryption module uses AES128 in ECB mode. If the total size of the instructions is not divisible by 128 bits, zero padding is used so that the system can still encrypt and decrypt the needed instructions.

Used bit array (64-Kb ≡ 8KB): this module records whether each entry in the 128-bit SRAM module has been decrypted before. This information helps the decision maker module (introduced next) decide whether a memory reference into the main system memory (32 bits wide) is valid. Most of the time, this check is the only overhead added to the system.

Decision maker: this module is the heart of the system. It works according to the following algorithm:

1. A new PC value is available.
2. Check whether the memory reference was decrypted before (using the used bit of each entry), that is, whether the reference is valid in the main memory. There are two possible outcomes.
3. If the memory reference is valid (the instruction has already been decrypted, USED = ‘1’), the system continues as if there were no extra modules in the way. This reduces the performance penalty to this extra check.
4. If the memory reference is not valid (USED = ‘0’), the decision maker stalls the pipelines by sending the FREEZE signal, putting the processor in a freeze state; it then decrypts the 128-bit entry and stores it into the main system memory module as four 32-bit chunks of data.

Figure C shows the flow diagram of the decision maker's behavior.

Figure C: Decision maker flow chart (New PC available; Mem. Ref. Valid? False: decrypt the required instruction and store it; True: continue working normally)

The decryption process decrypts four instructions at a time. This gives the system the opportunity to reference them without decrypting the whole 128-bit entry again. Once the program has referenced all the pages it normally uses, the behavior of the system is almost identical to that of a system without security capabilities, since all the needed instructions are ready to be fetched from the system main memory module.

On the other end (the output), one module was added to let the system output its results in encrypted form. Two approaches were possible:

1. Encrypt each 32-bit output after zero-padding it to 128 bits.
2. Wait until four 32-bit output chunks are ready and then encrypt them without adding any padding.
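The two output options listed above differ only in how the 128-bit block handed to the encryption module is formed. The following Python sketch illustrates that difference under stated assumptions: the function names are hypothetical, big-endian byte order is assumed, and in option 1 the 32-bit word is assumed to occupy the first four bytes of the block. The encryption step itself is left out.

```python
import struct

BLOCK_BYTES = 16  # one AES128 block (128 bits)
WORD_BYTES = 4    # one 32-bit output word


def pad_single_word(word):
    """Option 1: put one 32-bit word in a block and zero-fill the rest.

    Assumption: the word goes first, followed by 12 zero bytes.
    """
    return struct.pack(">I", word) + bytes(BLOCK_BYTES - WORD_BYTES)


def pack_four_words(words):
    """Option 2: concatenate exactly four 32-bit words, no padding."""
    if len(words) != 4:
        raise ValueError("option 2 needs exactly four 32-bit words")
    return struct.pack(">4I", *words)


outputs = [1, 2, 3, 4]
option1_blocks = [pad_single_word(w) for w in outputs]  # four blocks
option2_blocks = [pack_four_words(outputs)]             # one block
print(len(option1_blocks), len(option2_blocks))  # prints: 4 1
```

Under these assumptions, option 1 produces one block per output word, four times as many blocks as option 2, but it never has to wait for further outputs before a block can be encrypted.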
Option number 1 was used, since it is better suited to the constraints of sensor networks.

5. Modeling and Simulation Results

This section presents both parts of the project. The first part converts the instructions from plain 32-bit binary instructions into 128-bit encrypted units. An assembler was created to facilitate the process. The assembler was written in C#.NET and is divided into three stages:

1. Separate the data section from the instruction section. Compute label values and replace labels with their values.
2. Convert the instructions from their text format to 32-bit binary format.
3. Convert the binary instruction data to 128-bit encrypted chunks. (The RijndaelManaged class provided with .NET Framework 1.1 was used [16].)

Figures D.a, D.b, and D.c show the three steps.

Figure D.a: (Step 1) Replacing labels with their line values.
Figure D.b: (Step 2) Decode instructions
Figure D.c: (Step 3) Encrypt binary instructions

The second part of the project is the simulation of the model. The simulation was done using ModelSim 6.0a, running a small program that executes the bubble sort algorithm. The following figure shows how the simulation executed.

Figure E: Project Simulation

Figure E shows how the system reacted after an invalid memory reference. A new address is requested. The decrypted data is ready and is written as four 32-bit instruction chunks. The write bit is activated to allow the decision maker to write to the main memory module. The decrypted instructions are written to the main memory module and are ready to be fetched. The input address of the memory module changes according to the data written to the module. For instance, for a memory reference to 1010, the sequence of memory addresses is: [w]1000, [w]1001, [w]1010, [w]1011, [r]1010. The freeze signal (which stalls the system pipelines) is activated while decryption is in progress and deactivated after the four 32-bit chunks have been written back to the memory.
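The address sequence described above can be reproduced with a small software model of the decision maker. The following Python sketch is illustrative only and is not the simulated hardware design: the class and function names are hypothetical, addresses are assumed to be binary word addresses, and the XOR-based `toy_decrypt` merely stands in for the AES128 ECB core.

```python
WORDS_PER_ENTRY = 4   # four 32-bit instructions per 128-bit encrypted entry
MASK = 0xA5A5A5A5     # toy "key"; a placeholder, not a real cipher


def toy_decrypt(entry):
    """Stand-in for the AES128 ECB decryption module (assumption)."""
    return tuple(word ^ MASK for word in entry)


class DecisionMaker:
    """Toy model of the used-bit check and on-demand decryption."""

    def __init__(self, encrypted_entries, decrypt):
        self.encrypted = encrypted_entries             # 128-bit SRAM module
        self.decrypt = decrypt
        self.used = [False] * len(encrypted_entries)   # used bit array
        self.main_memory = {}                          # word address -> instruction
        self.trace = []                                # (operation, address) pairs

    def fetch(self, pc):
        entry = pc // WORDS_PER_ENTRY
        if not self.used[entry]:
            # Invalid reference: the pipeline would be frozen here while the
            # entry is decrypted and written back as four 32-bit words.
            base = entry * WORDS_PER_ENTRY
            for offset, word in enumerate(self.decrypt(self.encrypted[entry])):
                self.main_memory[base + offset] = word
                self.trace.append(("w", base + offset))
            self.used[entry] = True
        # Valid reference: a normal read from main memory.
        self.trace.append(("r", pc))
        return self.main_memory[pc]


# Twelve made-up instruction words, stored as three encrypted entries.
program = [10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33]
encrypted = [
    tuple(word ^ MASK for word in program[i:i + WORDS_PER_ENTRY])
    for i in range(0, len(program), WORDS_PER_ENTRY)
]

dm = DecisionMaker(encrypted, toy_decrypt)
dm.fetch(0b1010)
print(", ".join(f"[{op}]{addr:04b}" for op, addr in dm.trace))
# prints: [w]1000, [w]1001, [w]1010, [w]1011, [r]1010
```

A second fetch from the same entry, for example `dm.fetch(0b1011)`, adds only a single read to the trace, which corresponds to the case where the used-bit check is the only overhead.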
Figure F shows how the memory changed after the last memory reference.

Figure F: Memory Contents

As shown in the figure above, the system decrypts the instructions gradually as needed. Since most, if not all, sensor network programs execute in an infinite loop, after the first pass through the program its instructions are decrypted and ready to be fetched from the main memory. The system shows performance almost identical to that of the same system configuration without encryption capabilities.

6. Conclusions and future work

A simple and efficient solution to security vulnerabilities in sensor networks, especially when updating the programs of mobile sensors, was presented. The solution supports sending encrypted instructions through sensor networks without sacrificing security. Although the system adds overhead due to the decryption process, current chips offer throughput that exceeds the needs of mobile hosts. Moreover, this overhead is incurred once per program; that is, once the program has been decrypted into the system main memory, performance is almost identical to that of prior unsecured systems.

The next step is to synthesize the project and make it available on a ready-to-use chip. Further work might produce a top-notch decryption/encryption unit with low power consumption, allowing mobile units to live longer.

Acknowledgements

I would like to thank my professor (Prof. Young Cho) for his sincere work and important directions. I would also like to thank my classmates, who provided a challenging environment that led to this proposal.

References

[1] Stefan Schmidt, Holger Krahn, Stefan Fischer, and Dietmar Watjen, “A Security Architecture for Mobile Wireless Sensor Networks”.
[2] R.M. Best, “Preventing Software Piracy with Crypto-Microprocessors,” Proc. IEEE Spring COMPCON ’80, pp. 466-469, San Francisco, 25-28 Feb. 1980.
[3] R.M. Best, Microprocessor for Executing Enciphered Programs, U.S. patent 4,168,396, 18 Sept. 1979.
[4] R.M. Best, Crypto Microprocessor for Executing Enciphered Programs, U.S. patent 4,278,837, 14 July 1981.
[5] R.M. Best, Crypto Microprocessor that Executes Enciphered Programs, U.S. patent 4,465,901, 14 Aug. 1984.
[6] Alex Panato, Marcelo Barcelos, Ricardo Reis, A Low Device Occupation IP to implement Rijndael Algorithm.
[7] D. Runje and M. Kovac, “Universal Strong Encryption FPGA Core Implementation,” Proceedings of IEEE Design Automation and Test in Europe, pp. 923-924.
[8] NIST. Advanced Encryption Standard (AES). Official NIST homepage about AES.
[9] NIST. AES Round 2 Information. Official NIST information about the five algorithms selected for the second round of AES.
[10] NIST. Commerce Department Announces Winner of Global Information Security Competition. Official NIST site.
[11] J. Daemen and V. Rijmen, The Rijndael Block Cipher, AES proposal, ver. 2, March 1999.
[12] The Rijndael Page, available at http://www.iaik.tugraz.ac.at/research/krypto/AES/old/~rijmen/rijndael/.
[13] ht-lab website, free cores page. http://www.ht-lab.com
[14] Mrs. G. Umamaheswari, Dr. A. Shaunmugam, “Efficient VLSI implementation of the block cipher Rijndael algorithm,”
[15] CAST cores, http://www.castinc.com/cores/aes/index.shtml
[16] MSDN website, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemsecuritycryptographyrijndaelmanagedclasstopic.asp