Report - Information Sciences Institute

Implementing Crypto Microprocessor using Rijndael
Abdel-Karim R. Al Tamimi
Computer Science and Engineering Department
Washington University in St. Louis
St. Louis, Missouri
This work presents a microprocessor for executing programs which have been enciphered. The crypto microprocessor deciphers the enciphered program piecemeal as it executes it. Deciphered instructions are stored in another memory module so that the processor can access them on the fly. The Advanced Encryption Standard (AES) of the National Institute of Standards and Technology (NIST) is used to encrypt and decrypt data; its symmetric block cipher is run with a 128-bit key, a configuration called AES128. Such a microprocessor reduces the security risks related to code mobility.
1. Introduction
Wireless sensor networks are increasingly becoming viable solutions to many challenging problems and will be deployed in more and more areas in the future. Wireless sensor network devices have two key characteristics: they are low-power devices, and they are small relative to the devices used in other kinds of networks. Many applications introduced today use wireless networks as their network infrastructure. However, deploying a new technology without taking security into account has often proved to be unreasonably dangerous [1]. This paper proposes a way to transfer data and programs through sensor networks without sacrificing data security. The main idea is to develop a microprocessor that can execute encrypted instructions gradually, decrypting them as they are needed (referenced) inside the running program.
The idea of encrypting program instructions down to the binary level and decrypting them gradually as they are needed in the program was first described by Best [2, 3]. However, because of the overhead of the cryptographic functions involved in such circuits, it was never used on a large scale.
Recent advances in the IC industry motivate us to re-evaluate the solution and revisit the subject. What once demanded a large number of power-hungry chips can now be implemented in a single small, power-efficient chip.
The design proposed in this paper also seeks to reduce the performance penalty of the decryption functionality to a minimum.
This paper is organized as follows: Section 2 presents an overview of cryptography, Section 3 explains the reasons for using the Rijndael algorithm, Section 4 introduces the proposed architecture, Section 5 presents simulation results, and Section 6 describes future work and some conclusions.
2. Cryptography overview
Cryptography is not a recent science but an old strategy for guaranteeing that information is exchanged securely, meaning that other people do not have access to the protected information. Many devices have been used throughout history, such as the mechanical Enigma machines used by Germany in World War II. Back then, the core of security was hiding the algorithm, so that intruders could not use reverse engineering to find out which encryption algorithm was used.
Today, cryptography is widely used in Internet banking systems and other money-transfer operations.
Cryptographic algorithms today are published openly, since it is believed that the emphasis of security should move from keeping the algorithm secret to keeping the key used by that algorithm secret.
In cryptography, the original data is called plaintext. The process of hiding the information is called encryption, and the result (the hidden text) is called ciphertext. To recover the original data (plaintext) from the hidden text, we execute the decryption process, which requires knowing the secret key used in the encryption process.
There are two main types of cryptography: symmetric and asymmetric. In symmetric cryptography, the two sides communicate using one key, called the secret key or private key, which both sides must know before they can communicate. This secret key is used to encrypt and decrypt the information exchanged between them.
In asymmetric cryptography, each party has a pair of keys: a public key and a private key. The public key is used to encrypt the secret key you want to exchange. In a simple scenario, assume that host A wants to contact host B securely. Host A sends a request to host B asking for its public key. When the public key arrives, host A uses it to encrypt the secret key it wants to use in the communication and sends the result back to host B. Only host B can decrypt the encrypted secret key, because only host B holds the matching private key; think of it as if host B holds two parts of the key and hands only one of them to host A. After this exchange succeeds, the two hosts can start transferring their data using symmetric encryption, with the exchanged secret key serving as the encryption/decryption key [6].
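The hybrid exchange described above can be illustrated with textbook RSA. The sketch that follows is a toy, assuming tiny, insecure primes (p = 61, q = 53), no padding, and a small integer standing in for the symmetric session key; `make_keypair` and `rsa_apply` are hypothetical helper names. Real deployments use much larger keys together with a padding scheme such as OAEP.

```python
# Toy hybrid key exchange: host B's RSA key pair protects a symmetric session key.
# Textbook RSA with tiny primes and no padding; insecure, for illustration only.

def make_keypair(p: int, q: int, e: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((e, n), (d, n)): host B's public and private keys."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)              # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def rsa_apply(key: tuple[int, int], m: int) -> int:
    """Raise m to the key's exponent modulo n (used for both encrypt and decrypt)."""
    exp, n = key
    return pow(m, exp, n)

# Host B generates a key pair and sends only the public half to host A.
public_b, private_b = make_keypair(61, 53, 17)

# Host A picks a secret session key and encrypts it with B's public key.
session_key = 65                     # must be smaller than n = 3233
wrapped = rsa_apply(public_b, session_key)

# Only host B, holding the private key, can recover the session key.
recovered = rsa_apply(private_b, wrapped)
print(public_b, wrapped, recovered)  # (17, 3233) 2790 65
```

After this step both hosts hold the same session key and would switch to a symmetric cipher such as AES for the bulk data.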
Cryptography needs standards, since communication is only possible when the same algorithm is used on both sides. The Data Encryption Standard (DES) is a well-known algorithm with a 64-bit block and a 56-bit key. It was made a federal standard in 1977 and has since been widely used. A $1 million brute-force DES-cracking machine that can break a DES key in about 3.5 hours has been reported [7].
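The reported figure can be turned into a key-search rate. This worked estimate assumes that 3.5 hours is the average search time, i.e., the time needed to test half of the 2^56-key space; the source does not state which interpretation it uses:

```latex
\[
\tfrac{1}{2}\,2^{56} = 2^{55} \approx 3.60 \times 10^{16}\ \text{keys}, \qquad 3.5\ \text{h} = 12\,600\ \text{s}
\]
\[
R \approx \frac{3.60 \times 10^{16}\ \text{keys}}{1.26 \times 10^{4}\ \text{s}} \approx 2.9 \times 10^{12}\ \text{keys/s}
\]
```

Under the same assumption, a worst-case sweep of the full key space would take about 7 hours at that rate.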
To replace the old DES, on September 12, 1997, NIST requested proposals for what is called the Advanced Encryption Standard (AES) [8]. After the Round 1 selection process, five algorithms were chosen to advance to Round 2, in which NIST deepened the analysis of each proposal and encouraged public attacks on all the competitors [9]. The five algorithms selected are:
• MARS
• RC6
• Rijndael
• Serpent
• Twofish
At the end of Round 2, the conclusion was that the five competitors showed similar characteristics. On October 2, 2000, NIST announced Rijndael as the winner of the contest, because it had the best overall scores in security, performance, efficiency, implementability, and flexibility [10].
3. Rijndael Encryption Algorithm
Rijndael was developed by Joan Daemen and Vincent Rijmen; the name Rijndael is a portmanteau of the inventors' surnames. Rijndael is a private-key symmetric block cipher that supports 128-, 192-, and 256-bit keys and operates on 128-, 192-, and 256-bit blocks; all nine combinations of key length and block size are possible. In this work, the implementation focuses on AES128. Rijndael has been implemented in software using C/C++, Java, C#, assembly, and many other languages [11, 12]. Software implementations offer limited throughput compared to specialized hardware chips. In this paper, a hardware implementation called AES86, provided by ht-lab, is used. It runs at a relatively low rate of about 37 Mb/s [13], which is acceptable because the proposed solution aims to address privacy risks regardless of speed. Table A shows software implementation throughputs [14].
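To put the 37 Mb/s figure in perspective, the time the AES86 core needs per 128-bit entry can be estimated. This back-of-the-envelope calculation assumes the 37 Mb/s rate applies directly to decryption and ignores key-expansion latency and the cycles spent writing the four words back to main memory:

```latex
\[
t_{\text{block}} = \frac{128\ \text{bit}}{37 \times 10^{6}\ \text{bit/s}} \approx 3.46\ \mu\text{s},
\qquad
t_{\text{instr}} = \frac{t_{\text{block}}}{4} \approx 0.86\ \mu\text{s}
\]
```

This is roughly the stall paid on the first reference to an entry; later references to the same four instructions require no decryption.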
4. Crypto Microprocessor
The proposed architecture can be attached to any available processor design; a simple 32-bit, 5-stage pipelined processor was used to demonstrate the behavior of the model.
Visual C++
Encryption Speed
27 Mb/s
70.5 Mb/s
Table A: Software implementation throughput.
Table B shows some of the commercial hardware AES cores available today.
Core          Process        Clock     Throughput
AES 32-bit    TSMC 0.13 µm   400 MHz   ~1.16 Gbit/s
AES 32-bit    UMC 0.18 µm    344 MHz   ~997 Mbit/s
AES 128-bit   TSMC 0.13 µm   400 MHz   ~4.64 Gbit/s
AES 128-bit   UMC 0.18 µm    344 MHz   ~3.99 Gbit/s
Figure A: Simple MIPS microprocessor
Figure B shows the extra components added to the simple MIPS microprocessor to implement the proposed architecture. Five modules are added to the architecture (only four are shown).
Table B: Hardware implementation throughput
As Table B shows, customized hardware implementations of Rijndael can offer very high throughput. Rijndael, like other block ciphers, is commonly used in two main modes of operation: Electronic Code Book (ECB) and Cipher Block Chaining (CBC). ECB mode is the simplest: it encrypts each 128-bit block independently of the others. A more secure method is to XOR each input block with the ciphertext of the preceding block before encryption; this is called Cipher Block Chaining [13]. The proposed module uses ECB mode since it offers higher throughput and is easier to implement.
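To make the ECB/CBC distinction concrete, the following sketch encrypts two identical 128-bit blocks in each mode. It assumes a toy 4-round Feistel cipher built on SHA-256 as a stand-in for the AES core (it is NOT AES and is not secure); `toy_encrypt_block`, `ecb_encrypt`, and `cbc_encrypt` are hypothetical names, and the all-zero IV is used only to keep the demo deterministic.

```python
import hashlib

BLOCK = 16  # bytes: one 128-bit block, as in AES128

def _f(half: bytes, key: bytes, rnd: int) -> bytes:
    # Feistel round function: first 8 bytes of SHA-256(key | round | half).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    # 4-round Feistel network on a 16-byte block: a stand-in for AES, NOT AES.
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(right, key, rnd))
    return left + right

def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    # ECB: every block is encrypted independently with the same key.
    return b"".join(toy_encrypt_block(data[i:i + BLOCK], key)
                    for i in range(0, len(data), BLOCK))

def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> bytes:
    # CBC: XOR each plaintext block with the previous ciphertext block (IV first).
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(_xor(data[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)

key, iv = b"demo-key", bytes(BLOCK)
plain = b"A" * BLOCK * 2             # two identical 128-bit blocks
ecb = ecb_encrypt(plain, key)
cbc = cbc_encrypt(plain, key, iv)
print(ecb[:BLOCK] == ecb[BLOCK:])    # True: ECB repeats the pattern
print(cbc[:BLOCK] == cbc[BLOCK:])    # False: CBC hides it
```

ECB needs no state between blocks, which keeps the hardware simple, while CBC hides repeated plaintext blocks (such as identical groups of instructions) at the cost of the chaining logic.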
Figure B: Changes made to simple MIPS MP.
In the front end, the following modules were added (besides the decryption module described earlier):
• 128-bit SRAM memory module: a 64K × 16-byte (1 MB) memory module is used to store the encrypted instructions. A width of 128 bits is used because it is the most suitable trade-off between convenience and efficiency. Four 32-bit instructions are stored as one 128-bit encrypted entry, since the encryption/decryption module uses AES128 in ECB mode. If the total size of the instructions is not a multiple of 128 bits, zero padding is used so that the system can still encrypt/decrypt the needed instructions.
• Used bit array (64 Kb ≡ 8 KB): this module records whether each instruction entry in the 128-bit SRAM module has already been decrypted. This information helps the decision maker module (introduced next) decide whether the corresponding memory references in the main system memory (32-bit words) are valid. Most of the time, this check is the only overhead added to the system.
• Decision maker: this module is the heart of the system. It works as follows:
1. A new PC value is available.
2. Check whether the memory reference was decrypted before (using the used bit of each entry), i.e., whether the memory reference is valid in the main memory. There are two options.
3. If the memory reference is valid (the instruction has already been decrypted, USED = ‘1’), the system continues as if there were no extra modules in the way. This limits the performance penalty to this extra check.
4. If the memory reference is not valid (USED = ‘0’), the decision maker stalls the pipelines (by putting the processor in a freeze state), then decrypts the 128-bit reference and stores it back into the main system memory module as four 32-bit chunks of data.
Figure C shows the flow diagram of the decision maker behavior.
Figure C: Decision maker flow chart (New PC available → Mem. Ref. valid? → yes: continue working normally; no: decrypt the required instruction and store it).
Each decryption produces four instructions at a time, which lets the system reference them later without decrypting the whole 128-bit entry again. Once the program has referenced all the entries it normally uses, the behavior of the system is almost identical to that of a system without security capabilities, since all the needed instructions are ready to be fetched from the system main memory module.
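The decision-maker behavior and the used-bit array can be captured in a small behavioral model. This is a sketch, not the paper's hardware description: `DecisionMaker` is a hypothetical class, `pc` is assumed to be a word address (as in the 1010 example of Section 5; a byte-addressed MIPS PC would first be shifted right by 2), and an identity function stands in for the AES128 decryption core.

```python
from typing import Callable, Dict, List

ENTRIES = 64 * 1024                  # 64K 128-bit entries in the encrypted SRAM
USED_ARRAY_BYTES = ENTRIES // 8      # one used bit per entry: 64 Kb = 8 KB

class DecisionMaker:
    """Behavioral sketch of the decision maker (hypothetical, not the paper's design files)."""

    def __init__(self, encrypted_sram: List[bytes],
                 decrypt_block: Callable[[bytes], bytes]) -> None:
        self.sram = encrypted_sram                   # 16-byte encrypted entries
        self.decrypt_block = decrypt_block           # stand-in for the AES128 core
        self.used = [False] * len(encrypted_sram)    # used-bit array, all '0' at reset
        self.main_memory: Dict[int, int] = {}        # word address -> 32-bit instruction
        self.decryptions = 0                         # how many times the pipeline froze

    def fetch(self, pc: int) -> int:
        entry = pc // 4                              # four 32-bit words per 128-bit entry
        if not self.used[entry]:                     # USED = '0': invalid reference
            self.decryptions += 1                    # freeze the pipeline while decrypting
            plain = self.decrypt_block(self.sram[entry])
            for k in range(4):                       # write back as four 32-bit chunks
                word = int.from_bytes(plain[4 * k:4 * k + 4], "big")
                self.main_memory[4 * entry + k] = word
            self.used[entry] = True
        return self.main_memory[pc]                  # USED = '1': normal fetch


# Demo: an identity function stands in for AES decryption (placeholder only).
words = list(range(8))                               # 8 instructions -> 2 entries
sram = [b"".join(w.to_bytes(4, "big") for w in words[i:i + 4])
        for i in range(0, len(words), 4)]
dm = DecisionMaker(sram, decrypt_block=lambda blk: blk)
for _ in range(3):                                   # run the "program" three times
    trace = [dm.fetch(pc) for pc in range(len(words))]
print(trace, dm.decryptions)                         # [0, 1, 2, 3, 4, 5, 6, 7] 2
```

Only the first pass triggers decryptions (one per 128-bit entry); the second and third passes hit valid references, which is the amortization argued in the text.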
On the other end (output), one module was added to let the system output its results in encrypted form. Two approaches were possible:
1. Encrypt each 32-bit output after zero-padding it to 128 bits.
2. Wait until four 32-bit output chunks are ready and then encrypt them without adding any padding.
Option 1 was used, since it is better suited to the sensor network world.
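Option 1 can be sketched in a few lines. The sketch assumes the 32-bit output occupies the first four bytes (big-endian) and the remaining 96 bits are zeros; the byte order and padding position are assumptions, and `pad_output_word`/`encrypt_output` are hypothetical names with the block cipher passed in as a callable.

```python
from typing import Callable

def pad_output_word(word: int) -> bytes:
    """Option 1: put one 32-bit output in a 128-bit block, zero-filling the other 96 bits."""
    if not 0 <= word < 2 ** 32:
        raise ValueError("output must fit in 32 bits")
    return word.to_bytes(4, "big") + bytes(12)    # 4 data bytes + 12 zero bytes

def encrypt_output(word: int, encrypt_block: Callable[[bytes], bytes]) -> bytes:
    """Encrypt a single output immediately instead of waiting for three more."""
    return encrypt_block(pad_output_word(word))


block = pad_output_word(0xDEADBEEF)
print(block.hex())   # deadbeef000000000000000000000000
```

One consequence worth keeping in mind: with ECB and a fixed key, two identical 32-bit outputs produce identical ciphertext blocks.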
5. Modeling and Simulation Results
In this section I will show both sides of the project. The first part is to convert the binary instructions from plain 32-bit binary instructions to 128-bit encrypted units. An assembler, written in C#.NET, was created to facilitate the process. It is divided into three steps:
1. Separate the instructions section, compute label values, and replace labels with their line values.
2. Convert instructions from their text format to 32-bit binary format.
3. Convert the binary instruction data to 128-bit encrypted chunks (the RijndaelManaged class provided with .NET Framework 1.1 was used [16]).
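Steps 1 and 3 of the assembler can be sketched as follows. This is a simplified illustration, not the paper's C# assembler: `resolve_labels` and `pack_entries` are hypothetical names, labels are assumed to be written as `name:` prefixes and replaced by absolute line values, Step 2 (instruction encoding) is omitted, and the final AES-128 ECB encryption of each 16-byte entry is not shown.

```python
from typing import Dict, List, Tuple

def resolve_labels(lines: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Step 1 (sketch): record each label's line value, then substitute it in operands."""
    labels: Dict[str, int] = {}
    body: List[str] = []
    for line in lines:                         # first pass: collect labels
        if ":" in line:
            name, line = line.split(":", 1)
            labels[name.strip()] = len(body)   # line value of the next instruction
        if line.strip():
            body.append(line.strip())
    resolved = []
    for instr in body:                         # second pass: replace label operands
        tokens = [str(labels.get(t, t)) for t in instr.replace(",", " ").split()]
        resolved.append(f"{tokens[0]} {', '.join(tokens[1:])}".rstrip())
    return resolved, labels

def pack_entries(words: List[int]) -> List[bytes]:
    """Step 3 (sketch): group four 32-bit words per 128-bit entry, zero-padding the tail."""
    padded = words + [0] * (-len(words) % 4)
    return [b"".join(w.to_bytes(4, "big") for w in padded[i:i + 4])
            for i in range(0, len(padded), 4)]


asm, labels = resolve_labels(["loop: addi $t0, $t0, 1", "bne $t0, $t1, loop", "nop"])
print(asm)                              # ['addi $t0, $t0, 1', 'bne $t0, $t1, 0', 'nop']
entries = pack_entries([1, 2, 3])       # placeholder words, not real encodings
print(len(entries), len(entries[0]))    # 1 16
```

Each 16-byte entry produced by `pack_entries` is what would then be handed to the block cipher.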
Figures D.a, D.b, and D.c show the three steps.
Figure D.b: (Step 2) Decode instructions
Figure D.c: (Step 3) Encrypt binary
The second part of the project is the simulation of the model, which was done using ModelSim 6.0a. In this simulation, a small program executing the bubble sort algorithm was used. The following figure shows the simulation.
Figure E: Project Simulation
Figure D.a: (Step 1) Replacing
labels with their line values.
Figure E shows how the system reacts after an invalid memory reference:
• A new address is requested.
• The decrypted data is ready and is written as four 32-bit instruction chunks.
• The write bit is activated to allow the decision maker to write to the main memory module.
• The decrypted instructions are written to the main memory module and are ready to be fetched.
• The input address of the memory module changes according to the data written to it. For instance, for memory reference 1010, the sequence of memory addresses is [w]1000, [w]1001, [w]1010, [w]1011, [r]1010.
• The freeze signal (which stalls the system pipelines) is activated while decryption is in progress, and deactivated after the four 32-bit chunks have been written back to the main memory.
Figure F shows how the memory is
changed after the last memory reference.
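The address sequence in the example above (four writes starting at the aligned entry base, then the original read) follows from rounding the reference down to a multiple of four words. A minimal sketch, assuming word addresses and four words per 128-bit entry; `miss_sequence` is a hypothetical helper:

```python
def miss_sequence(addr: int, words_per_entry: int = 4) -> list[tuple[str, int]]:
    """Memory-port accesses after an invalid reference to word address `addr`."""
    base = addr - addr % words_per_entry       # first word of the 128-bit entry
    writes = [("w", base + k) for k in range(words_per_entry)]
    return writes + [("r", addr)]              # then the original read proceeds


print([(op, format(a, "04b")) for op, a in miss_sequence(0b1010)])
# [('w', '1000'), ('w', '1001'), ('w', '1010'), ('w', '1011'), ('r', '1010')]
```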
6. Conclusions and future work
This paper presented a simple and efficient solution to security vulnerabilities in sensor networks, especially when updating the programs of mobile sensors. The solution supports sending encrypted instructions through sensor networks without sacrificing security.
Although the decryption process adds extra overhead, today's chips offer throughput that exceeds the needs of mobile hosts. Moreover, this overhead is paid once per program: once the program has been decrypted into the system main memory, performance is almost identical to that of prior unsecured systems.
The next step is to synthesize the project and make it available on a ready-to-use chip. Further work could produce a top-notch, low-power decryption/encryption unit that allows mobile units to live longer.
I would like to thank my professor, Prof. Young Cho, for his sincere work and important directions. I would also like to thank my classmates, who provided a challenging environment that led to this work.
Figure F: Memory Contents
As shown in Figure F, the system decrypts the instructions gradually as needed. Since most, if not all, sensor network programs execute repeatedly in an endless loop, after the first execution the program will be decrypted and ready to be fetched from the main memory. The system then shows almost identical performance to the same system without the security modules.
[1] Stefan Schmidt, Holger Krahn, Stefan Fischer, and
Dietmar Watjen, “A Security Architecture for Mobile
Wireless Sensor Networks”.
[2] R.M. Best, “Preventing Software Piracy with Crypto-Microprocessors,” Proc. IEEE Spring COMPCON ’80, pp. 466-469, San Francisco, 25-28 Feb. 1980.
[3] R.M. Best, Microprocessor for Executing Enciphered Programs, U.S. patent 4,168,396, 18 Sept. 1979.
[4] R.M. Best, Crypto Microprocessor for Executing Enciphered Programs, U.S. patent 4,278,837, 14 July 1981.
[5] R.M. Best, Crypto Microprocessor that Executes Enciphered Programs, U.S. patent 4,465,901, 14 Aug. 1984.
[6] Alex Panato, Marcelo Barcelos, Ricardo Reis, “A Low Device Occupation IP to Implement Rijndael Algorithm.”
[7] D. Runje and M. Kovac, “Universal Strong Encryption FPGA Core Implementation,” Proceedings of IEEE Design Automation and Test in Europe, pp. 923-924.
[8] NIST. Advanced Encryption Standard (AES). Official
NIST homepage about AES.
[9] NIST. AES Round 2 Information, Official NIST
information about the five algorithms selected to the second
round of AES.
[10] NIST. Commerce Department Announces Winner of
Global Information Security Competition, Official NIST site.
[11] J. Daemen and V. Rijmen, The Rijndael Block Cipher, AES proposal, ver. 2, March 1999.
[12] The Rijndael Page, available at
[13] ht-lab website, free cores page.
[14] G. Umamaheswari, A. Shaunmugam, “Efficient VLSI implementation of the block cipher Rijndael
[15] CAST cores,
[16] MSDN website,