pdf file - Electrical Engineering

advertisement
IEEE JOUIRNAL
OF SOLID-STATE
CIRCUITS,
VOL.
23, NO.
3, JUNE
647
1988
Security and Performance Optimization
of a New DES Data Encryption Chip
INGRID
VERBAUWHEDE,
FRANK
HOORNAERT,
SENIOR MEMBER, IEEE, AND HUGO
Abstract
high
formance
result
the
— Cryptograpfdcaf
security.
This
Data Encryption
Standard
between
design
transformations
algorithms.
wonld
These
strategy.
CAD
data encryption
floor
which
are combined
Novel
tools
high
and
It
chip.
is the
and chip designers.
not
At
and equivalence
plan with
problem
do
speed
of a new high-per-
optimization
a serious
optfmizations,
nor the security,
and layout
(DES)
cryptograpfdcal
present
both
cryptographers
lead to a very efficient
otherwise
algorithm
level,
demand
the implementation
of close cooperation
system
which
applications
paper presents
minimal
for
data
compromise
with a highly
routing,
scrambling
the
structured
are nsed at different
DES
design
steps in the
design process. The resnft is a single chip of 25 mmz in 3- pm double-metal
CMOS,
which
Functionality
tests show that a clock of 16.7 MHz
means that a 32-Mbk/s
modes.
This
execution
is the fastest
of all four
DES
DES
can be applied,
data rate can be achieved for all eight byte
chip reported
modes of operation
yet, aflowing
equally
due to an originaf
fast
pipeline
architecture.
JOOS VANDEWALLE,
J. DE MAN,
FELLOW, IEEE
sender and no one else. Both sender and receiver want
be sure that the integrity
that an opponent
of the message is guaranteed,
did not change, insert, or delete parts of
the message.
As electronic communications
and networks grow, there
is not only a need for compatibility
on communication
protocols
but also for common methods for data protection. To set up a secure communication
between different
parties of different
organizations
all over the world s@zdards have to be defined [1], [2]. Although there are several
DES chips available on the market, there is still a need for
higher
performance
the existing
chips
and more secure DES chips. None
combines
high
speed with
standards.
INTRODUCTION
U
NTIL the last decade, cryptography
was the domain
of the diplomatic
and military world [3]. Due to the
microelectronics
(revolution
a need for
commercial
c~ptography
has emerged.
performance
digital
circuits
The ever cheaper
and higher
have caused a rapid expansion
of international
telephone communications,
computer
works, etc. Electronic
mail, electronic funds transfer,
recently
tion
theft.
smart
may
cards are part of daily
create
However,
security
threats
the same digital
netand
life. But this evoht-
from
eavesdropping
circuits
allow
to
for small
and elegant cryptographical
solutions to replace their rigid
and heavy mechanical counterparts.
A classical algorithm, like the Data Encryption
Standard
(DES) [1], protects data in two ways. First, priuacy is
protected.
After
message,
sent
such
encryption,
over
as electronic
receiver.
A second
that of authentication.
sure that
the sender can be sure that the
an insecure
mail,
is only
and often
After
communication
read
by
more important
decryption,
the message he received
channel
the
intended
demand
is
the receiver can be
came from
the original
modes
and
0018-9200/88/0600-0647
it implements
output
but
cial
the Data Encryption
dependency
at a certain
without
speed
allows
to include
in his ciphertext.
moment
also on previous
approach
of
Stan-
most widely used standards for enciplner[1]. Second, it implements
all operation
in the standard, “DES Modes of Operreason for this is that the strength of a
system depends heavily on its use. These
give the user the opportunity
time
feedback
In this way
the
not only depends on the input
inputs
or ciphertext
outputs.
A spe-
the
implementation
of
modes
reduction
and with
a small
all
overhead
in
silicon. Third,
one can easily use the chip for bankingrelated standards (for message authentication
and financial
key management),
as in the generation of message authentication codes (MAC’s) and triple encryption.
Key management
and key security both demand
ple key registers
on chip. As a practical
balance
multibetween
user convenience of key management and silicon area, four
key registers are implemented,
e.g., for two master keys
and two session keys. This consumes
the
data-path
algorithm,
Manuscript
received Se~tember 28, 1987; revised February
1, 1988.
This work was performed
m collaboration
with the Cryptech company.
I. Verbauwhede
and H. J. De Man are with the IMEC
Laboratory,
B-3030 Heverlee,
Belgium and with the ESAT Laboratory,
Katholieke
Universiteit
Leuven, B-3030 Heverlee, Belgium.
F. Hoomaert
is with C
J. Vandewalle
is with %t&~~!-~~j~#~lsK~~~~~~e
Uttiversiteit
Leuven, B-3030 Heverlee, Belgium.
IEEE Log Number
8820717.
First,
dard, one of the
ing computation
modes specified
ation”
[2]. The
cryptographical
of
all require-
ments for high security and user-friendliness.
The chip presented here has a unique combination
1.
to
i.e.,
like
area.
The
security
the DES, depends
about
of
30 percent
a publicly
only
of
known
on the security
of
the keys. If the keys are compromised,
the whole system is.
This requires that, once entered, keys may not leave the
chip. Hereto special key registers are implemented
which
cannot be scanned out.
More generally, from the system performance
point of
view,
a cryptographic
$01.00
01988 IEEE
device
should
be easy to use :and
IEEE JOURNAL
648
The algorithm
- CV@ZW@ti
mquirsmmts
rsstridons
– security
1,=-.]
-------
!
/gor/thm
‘“9’”’=”””
IL
Fig.
VOL.
23, NO.
e.g., for
IMPLEMENTING A MULTIMODE DES INTO A
COMPACT FLOOR PLAN
General design methodology.
1.
bulk
encryptions,
satellite
communications,
or ISDN. The insertion of an encryption
device may not
slow down the overall performance
of a system and must
also be flexible for use in a large number of environments.
If someone tries to tamper, a general reset and especially a
disabling
of all key registers must occur.
The general
applied
in
design methodology
this
particular
cryptographical
case, as shown
requirements
can also be
in Fig.
smaller subproblems.
This results in a data path consisting
of a DES part, a key management
part, an input/output
and a mode
functionality
in this
A.
The DES Algorithm
whereby
part,
Encryption
other
devices in a system.
Processor (DEP)
part.
For each module,
the
are discussed below.
[14] is very flexible
and its Hardware
As shown in Fig. 2 a single DES calculation
[1] is a
sequence of a 64-bit initial
permutation,
a consecutive
calculation
of 16 rounds, and a 64-bit inverse initial permutation.
A round can be compared with an iteration step,
difficult
with
calculation
and its implementation
1. All
that are implemented
speed of 14 Mbit/s,
can calculate only three limited modes
and has a synchronous
1/0
interface
which makes it
to communicate
implemen-
tation of the DES algorithm but to provide a more general
cryptographical
device of which the DES computation
is
only a part. In order to deal with all cryptographical
requirements,
the global problem has been partitioned
into
part,
of an ASIC
chip have been described briefly. They can be found in
more detail in [4]. There exist several DES chips [12] –[14].
Most of these can compute only the DES algorithm
[12].
The fastest DES chip reported yet [13], with a maximum
The Data
1988
design task, which consists of
The aim of this work is not a straightforward
fast,
3, JUNE
detail in [4]. The layout engineering task is the topic of this
paper. Only the important
aspects of the high level design,
necessary to understand
the context, are repeated here.
?
Wi
engineering
CIRCUITS,
the translation
of the cryptographical
and security requirements in a floor plan and architecture,
is reported in full
K
‘,
OF SOLID-STATE
the number
shown
round.
in Fig.
of iterations
is fixed
3, contains
hardware
at 16. The DES
It consists of 32- and 48-bit modulo
for
one DES
2 adders (XOR’S
to use and is realized as a programmable
microprocessor.
But this is at the cost of a much slower speed, maximum
Al and A2), eight nonlinear
4.7 Mbit/s,
sion E, a permutation
P, and two registers of 32 bits.
In the floor plan and layout the S boxes are realized as
the eight PLA’s (see the bottoms
of Figs. 15 and 16).
PLA’s are the best way of implementing
these nonlinear
functions,
which are defined in table format, since these
None
and at the cost of an overhead
of these chips implements
with
on silicon
area.
the same speed and
the same ease MAC’s or triple encryption
or can handle
multiple
keys. It is unique to have all of these cryptographical
requirements
combined in one device. This has
been achieved by applying a modular design strategy, the
result of which is explained in Section II.
A modular
tained
by
algorithm
and efficient
applying
floor
plan
some algorithmic
engineering
could
knowledge
and layout
restrictions
1). In
Section
the choice
of algorithmic
leading
to a minimum
routing
be ob-
equivalences.
can only be efficiently
layout
III
only
are known
and silicon
The
done if some
(Fig.
equivalences
area are dis-
inputs
and four
substitutions
PLA’s
that
suited
way,
these PLA’s
were possible,
to be as random
nonlinear
arbitrary
cannot
be minimized
that
would
with six
as possible.
in a compact
functions.
scattered.
some structure
and
functions
S boxes), an expan-
to implement,
and ONE’S are randomly
the tables
substitution
(also called
are supposed
are well
structured
outputs
because the ZERO’S
Indeed,
would
and
We found
if a reduction
have to be present
constitute
a weakness
in
of the
cussed.
Starting from the architecture and floor plan, the layout
engineer wants to generate in a minimal design time layout
security of the DES algorithm.
On top of the PLA’s the
32-bit hardware routing for the permutation
P is placed,
which at the same time makes a connection
between the
PLA’s and the 48 bit slices of the rest of the DES part. In
that is fast, compact, and highly efficient.
vanced CAD tools are helpful. The whole
and the related CAD tools are explained
and the end could be realized without the use of routing by
using the inherent
structure. In this way a considerable
Even for a cryptographical
Therefore
addesign process
in Section IV.
chip, the design conforms
to a
conventional
design and layout strategy, as explained in
Section V. However, testability is something special and at
first sight seems to be in conflict
with security requirements: one must be unable to scan out keys or intermediate ciphertext.
The solution to this dilemma is discussed in Section VI. Finally,
Section VII contains the
implementation
and test results of the first prototype.
contrast,
amount
the large
of silicon
in Section
64-bit
permutations
at the beginning
area has been saved, as will be explained
III.
B. The Keys
The
key
module
implements
two
tasks:
1) the
safe
storage of the keys, and 2) the key schedule calculation.
For each DES round, a subkey of 48 bits has to be
VERSAUWHEDE
el al.:
OPTIMIZATION
OF DES DATA
ENCRYPTION
CHIP
649
St3nder
Receivw
~-1
S
~1.
w
+3
16
DES
56
+....
munda
Plain
w
&y
I
Fig.
‘T
f R3rseInitial
Fig.
2.
5.
Definition
transition
l..._T__J_J
~
l..r
~–.
key
pitrt.
for
Above
this
and the
routing,
( LR) is implemented.
the implementation
the
Most
of the ‘four
key
KRI – KR41 As shown’ in Fig. 4 an incoming
key
has 64 bits. The key parity
-.J
and decryption.
the 56 bit slices of the keypart
of the DES
area is needed
&iiL:&
Phlin
plt
for left right shifting
registers,
~,–-.+
PC2 is located, which consists of two 28-bit
next to each other. This enables an easy
slices
hardware
– 1
of the CBC mode, encryption
between
32 bit
The main parts of the DES algorithm.
K’w
cipher
M
and selection
parts placed
x “64
U*
-+
MS
DES
=i
enters
is checked only once, when the
the chip,’ so 8 bits
are dropped.
A hardware
crisscross rcwting
for the initial
key permutation
F’C1
could be saved by using the inherent structure; Therefore
the initial
key permutation
PC1
is also computed
on’ly
once when the key enters the chip and the keys are stored
in a permuted
fashion. The algorithmic
equivalent to save
this routing
is explained
C. The Modes:
L*
Fig.
3.
in Section III.
Combining
Feedback and Pipelining
For a given fixed key and given fixed data input, the
DES cipher output is always the same. An opponent could
Architecture
thus use frequency analysis to retrieve the original text [3].
To avoid this, several DES modes of operation are defined
[2]. These provide
feedback and block chaining.
As an
for one DES round.
example,
key inplr
J-”
in’ Fig.
5 the cipher
block
defined.
In this mode a given fixed input
key will
produce
different
chaining
(CBC)
ii
and a“given fixed
cipher outputs,
because they do
not depend only on the input but also on previous inputi.
From the implementation
point of view,. a pipeline
is
introduced
to increase the data rate through the chip. The
inherently
slow on–off
chip communication
“is done in
parallel with the DES calculation. The previous cipher text
is written
out while the actual plain
ciphered and the next one’ is read in.
16 Subkey
generation
seems to be in
I
example,
with
the feedhac~ ~modes.
mode’ the new input
for the following
2 sum of the -previcm? cipher
The key schedule calculation
and the new data input.
This conflict
has’ been resolved
modulo
?ES
For
in the CIEIC
the” bit-wise
-t-”
4.
conflict
the DES device is configured
.A . . . . . . --------
-~
Fig.
when
text is being enAt ,first siglit this
calculation
by placing
is
cy,@ut
the mode
generated, as shown in Fig. 4. The input key is also 64 bit,
but 8 bits are used for parity checking [1]. After an initial
key permutation,
PCJ, the 16 subkeys, one for each round,
calculation
strategically
between the DES hardware and
the input/output
hardware and by combining mode calculation with the internal data transport
on chip. How the
feedback loop can be combined with the pipeline is explained in Fig, 6 for the same CBC mode (encryption
ahd
are derived
decryption).
from
the 56-bit
One subkey is obtained
after
a 56- to 48-bit
key selected
for
encryption.
after some left or right shifting
permutation
and selection,
and
PC2 [1].
The key hardware is placed in the middle of the layout
(see Figs. 15 and 16). At the bottom, the key permutaticm
enciphered
The newly
entered
the DES part or vice-versa
done
shown
on
data are xoRed
data while they are transferred
the
fly
between
with
[4], [5]. This mode-calculation
every
in Fig. 7. This technique
two
the
from the 1/0
pipeline
to
is
stages as
of mode calculation
does
IEEE JOURNAL
650
OF SOLID-STATS
CIRCUITS,
23, NO. 3, JUNE 1988
VOL.
)7
10
m
—
DES
DES
(a)
10
E!
~:
x
:Iti
.
l.(+)
... .
.
DES
(b)
Fig.
6.
Mode
Fig.
8.
Hierarchy:
one head controller,
and four data-path modules
(c)
calculation
combined
with internal
(b) CBCenc, and (c) CBCdec.
transport:
-k–l
pipnll”e
mk
communications
plfall.
S+a90k+l
(LC’S),
(a) ECB,
most
@peline
four local controllers
(cross-hatched).
placed
m
,,
with
applications
in
a microprocessor
it is important
a number
modifications
of different
to the original
ing the communication
Moreover,
or for DMA.
that
the chip
systems
with
For
can be
minimal
system and without
decreas-
speed.
if h the future
this DES hardware
module
is
used as a small encryption
corner on a large VLSI digital
device, it is only this 1/0 frame that has to be stripped off
or replaced
by a smaller one, e.g., for on-chip
interprocessor
E.
communication.
The Controller
In
addition
needed.
Unit
to the data
Extra
features
path
a controller
like MAC’s,
triple
structure
encryption,
is
etc.
can be computed with the same data path but impose strict
requirements
on this controller.
As shown in Fig. 8 the
controller
has also been designed
module has its own local controller,
only
(b)
Fig. 7.
Fast internal
with
the main
codes incoming
mode calculation in between two pipeline stages:
(a) ECB, and (b) CBCenc.
controller.
commands
hierarchically.
Each
which communicates
This central
D.
The Inp@/Output
The input/output
tion.
On the floor
limited.
pendent
which is
hardware
nous way
the keys.
from the
Therefore,
(IO) Part
module
plap
realizes off-chip
of the prototype,
communica-
this is still very
in a second implementation
an inde-
1/0
interface
(1/0
frame) is being designed,
divided into two main parts. The internal
DES
communicates
with this 1/0 frame in a synchroand performs mainly a redirection
of the data or
The second part can read or write data to and
outside world in a asynchronous or semisynchro-
nous fashion.
It can combine
8, 16, or 32 bits and it contains
or separate data for buses of
the necessary glue logic for
de-
part of the
chip has to be activated.
The
local
controllers
can execute
a limited
long sequences, e.g., the local DES controller
not decrease the speed of the device. The same can be
shown for all other modes. Hence a unique feature of the
design, not found inother
implementations,
is that allfour
modes are equally fast.
controller
and decides which
number
of
can generate
sequences for DES or inverse DES (DES-1) calculations.
The local controller
for the modes decides the configuration of the modes instruction
register for the correct internal transport
and mode calculation.
The main controller
synchronizes
the local controllers
for parallel execution as, e.g., for the pipeline execution of
Fig. 7. It monitors the local controllers until every module
has finished. The execution times can be of varying length,
e.g., depending
cation
on the speed of the external
or depending
on single or triple
1/0
communi-
encryption.
The communication
between two modules is restricted
to one common register between every pair of modules,
which is strictly monitored,
as shown in Fig. 8. Neighboring local controllers
can give instructions
to these registers
but at different
time instances. For example, the 1/0
register is shared by the 1/0 part and modes part. The
1/0
part
internal
fills
or clears it, and the modes part
transport.
The local
controllers
uses it for
use the common
VERBAUWHEDE
registers
et a[.:
but
are allowed
the 1/0
OPTIMIZATION
only
OF DES DATA
the main
controller
to use it. For example,
register,
The
main
1/0
controller
starts
decides
if the 1/0
the modes part
controller
ENCRYPTION
TABLE
when they
THE
part is filling
is not allowed
alternatively
651
CHIP
INITIAL
I
PERMUTATION
1P
to work.
the DES
IP=
and/or
[58
50
42
34
26
18
10
2
60
.52
44
36
28
20
12
4
the chip, the key and
62
54
46
38
30
22
14
6
modes sequences in between different pipeline sections (cf.
At present, the longest modes
Fig. 7)I should be minimal.
64
56
48
40
32
24
16
8
and then the key and/or
To maximize
the data rate through
modes controller.
57494133251791
sequence takes only nine clock cycles and the longest key
sequence takes three clock cycles.
the controller is very small, as the
In tlhe first prototype
objective
was to test the functionality
rate
through
the chip.
The
51
43
35
27
19
11
3
53
45
37
29
21
13
5
63
.5.5
47
39
31
23
15
7 ]
of the data path. In
the actually completed second implementation,
mance of the controller
has been optimized
data
59
61
challenge
the perforto maximize
for
the main
controller
‘is on the one hand to provide the external user
with simple powerful
high-level
commands,
but on the
other
hand
to keep
calculating
without
were not possible,
data path would
F. The Overall
For
shown
not be available
parts
of the data
path
to the outside
world.
Floor Plan
the first
illustrated
all parallel
losing time in between subtasks. If this
the internal
calculation
power of the
prototype
in
Figs.
the floor
15 and
leads to minimum
plan
16. The
and layout
relative
wire and’ bus length.
are
placement
Each ‘mod-
ule is bit-slice
designed and has the same width, though
they have different word lengths. Thus modules are abutted
and routing is avoided.
Fig.
III.
ALGORITHMIC
9.
EQUIVALENCES FOR AN
Straightforward
hardware
mented compared
routin
for 1P which
with t Ee data path.
is not
48.
84’2-
imple-
EFFICIENT FLOOR PLAN
The global
layout
permutations,
28-bit
(Fig. 16) contains
one of 56 bits (which
routing)
and
only two hardwired
is in fact two times a
one of 32 bits.
algorithm
as defined in the standard
large permutations,
a 64-bit initial
64-bit final permutation
1P-1, and
permutation
with
PC1.
dedicated
However,
These permutations
crisscross
routing
the DES
but
are not
by
using
realized
yield
this hardware
A. Byte-Oriented
For
saving will be briefly
explained.
IP and IP-1 Realization
illustration,
standard
the
IP
table
[1]
as defined
is given as Table 1. It reads as follows:
in
the
bit 58 of
the input is the first, most significant
bit of the permuted
input, bit 50 is the second and so on, and bit 7 is the last,
least
significant
64-bit
channel
bit.
routing,
The
d@l_L.
.4
9
...1
..56
124J574J
57.-.
‘
corresponding
straightforward
which is not implemented,
yg%~
a shift
technique
and an optimal arrangement
of the floor plan
[4]. The three most important
algorithmic
equivalences
which
33E~~~1
contains three more
permutation
1P, a
a 56-bit initial
key
is shown
in Fig. 9 to compare it with the actual data path. If the
width is taken the same as for all other modules, the height
is 500 pm and this only for IP.
33 . . .
Fig.
[
~
10.
49 . . .
41...48
a
Equivalent
realization
I
56
57.
of 1P: floor-plan
..s4
organization
However,
IP is byte oriented and so can be realized
with a shift technique
[4]. IP is calculated when going
on–off
chip. Instead of two 64-bit complex channel routings, one for IP and one for 1P-1, the 64-bit 1/0 register
is arranged as a concatenation
of eight shift registers of 8
bits, One can prove [5] that this shift-register
concatenacorretion together with a small floor-plan
reorganization
sponds
to
the
initial
Fig. 10, no scrambling
permutation
routing
1P. As can be seen in
remains.
back after the DES round calculations
byte per byte performs 1P-1.
Shifting
the result
and reading
it out
IEEE JOURNAL
652
-%
OF SOLID-STATE
CIRCUITS,
u
VOL.
23,
NO. 3, JUNE
1988
Architecture
Fig. 11.
Equivalent
realization
of PC1:
floor-plan
+
I
I
&
organization.
I
1
!
Conrmmmion !
FbOrplan
OrgmlmUon
5
‘-@J ‘*
16
DES
rounds
12.
1
—.—
u
Fig.
Equivalent
realization:
1P and 1P-1 outside
of the modes.
the feedback
c1
output :
–
– switch Iavel
- transistor
–
pattern
loops
layout
test
B. Equivalent
PCI
Realization
Fig.
A
similar
applied
shift
technique
to realize
the initial
with
seven registers
key permutation
into
two
28-bit
registers,
PC1.
necessary
The
for
key
schedule calculation.
Again, similar to the 1P realization,
it is the switching from serial shift to parallel
use that
performs
the permutation.
C. 1P and IP-1
IP
and
while
modes,
IP-I
Fig.
1P
are brought
however,
the 8-byte
in
Outside the Feedback Loops of the Modes
these are calculated
and
12. The
feedback
loops
on chip. For example,
mode (CFB),
simplification
outside
the feedback
when going on or off chip. The
are calculated
cipher
IP-1
outside
the
feedback
for
this is explained
one obtains
loops
by bringing
is based
on
the fact that IP and 1P-1 are each others inverse, so
lP-l(lP(Data))
= Data.
The combination
of both techniques, the equivalent shift
technique and bringing
IP and 1P-1 outside the feedback
loops, allows for very compact realization
of the modes
hardware.
First the hardware routing for three permutations is saved. This would produce
an increase of 30
percent of the data path without considering the placement
problems. Second, by shifting from the 1/0 register to the
internal
register, only eight XOR’S, eight mukiplexers,
and
eight
bit buses are needed, instead
64 XOR’S, 65 multiplexer,
[14].
IV.
llm
The design
of an area-consuming
and 64 bit buses such as used in
DESIGN PROCESSAND USE OF CAD
evolution
of the data path,
tions to layout, is presented
into five phases.
from
13.
Design
level
evolution
of the data path.
can be
corresponding
floor plan is shown in Fig. 11. There is no
routing of the incoming bits. The seven shift registers are
combined
~—– J_
! Lagic Sbnulat-til
TOOLS
specifica-
in Fig. 13. It can be broken
up
A.
The Architecture
The
DES
Synthesis
algorithm,.
its modes
of operation,
and
the
other cryptographical
specifications
have already been explained above. The feedback loop over architecture
represents the choice of the algorithmic
producing
a better floor plan.
B. The Floor-Plan
The
result
Organization
of the first
equivalences
thereby
and the Cell Library
iteration
loop
is the ji’oor-plan
organization
discussed above on which the optimal relative
position
of the different
elements is established. Based on
that
floor-plan
organization
a minimum
cell library
defined.
Besides the eight PLA’s, registers, multiplexer
was
and XOR’S
are needed to implement
the DES, key, and modes hardware. Except for the PLA’s, we have been able to build all
of the other logic out of only four gates: a static XOR and
three types of inverters
(with small, medium,
or large
W/L).
This is possible
because DES consists essentially
of
data scrambling
and bit-wise modulo 2 arithmetic.
Pass
transistors
are used to implement
the clocks and conditions of multiplexer
and registers (a maximum
of two in
series is permitted).
The inverters
can be provided
with,
or
without,
static refresh to implement
memory. As an example, a register with a two-input
multiplexer
at the input is
given in Fig. 14. The basic cells have been characterized
individually
for the transistor dimensions,
the driving capability,
etc. “Special”
cells are the eight PLA’s used to
realize the substitution
functions.
These are also characterized separately.
The cell layouts are, however, customized to the floor
plan.
For
the width
one cell different
versions
of the bit slice. Typical
exist, depending
for the DES algorithm
on
is
et al.: OPTIMIZATION
VERBAUWHEDE
cl
OF DES DATA
Encryption
CHIP
653
Phil
pert system DIALOG
Register implementation
with two-input
multiplexer,
using pass transistors and refresh inverters.
realized
results in
in
such
that
is
a way
with
except what
14.
description
base of 85 rules. The rules are formulated
consistent
Fig.
[6]. The formal
a knowledge
is that
everything
is forbidden
the rules, instead
is forbidden
otherwise
except
of accepting
what
everything
by the rules. The reason for tl&
some construction
errors can pass unde-
tected.
The DIALOG
system verifies
first the correctness
of the
the use of different
word lengths, 8 X 8, 32, 48, and 56. To
generate a floor plan on which all modules have the same
width, the cells had to be laid out on different pitches. This
library topology. This means that every cell that does not
conform to the fcmr basic cells is reported to the designer.
Refresh inverters are identified
by their weak W/L
ratio.
has been done using the pitch matching
PLA’s
symbolic
layout
editor,
To implement
CAMELEON
the “glue”
the controller,
capabilities
[11].
logic of the 1/0
some more
logic
in generating
device,
a standard
C. Construction
A novel
way
of design
a compact
including
Second,
clude:
Be-
a)
DES
in accordance
clocking
and
implementation:
threshold-voltage
~ drop
ond to avoid leakage problems
how long
As shown in Fig. 14
drive at most two refreshing inverters through a pass-transistor network.”
This is to avoid problems
with ratioed
Indeed,
the preceding
static
refresh
inverter
constitutes
which
switches
d)
capacitance
ional
a ratioed
load
the refreshing
clocks;
of
a
that every
is followed
at least
by
a
possible prolbof the capaci-
of every
maximum
and
The verification
DES
ances)
input
(gate)
and diffusion
the load
capacitances,
path
45-m@
The main advantage
CPU
level descripti~n
transistors
and
of the
capacitmachine.
is that, it is comple-
With simulation
one detects faults
for. With this verification
mainly
at the interface between
particularly
construction
exceeded capacitive
6K
time on a Symbolics
of this verification
mentary to simulation.
that one is searching
connection
faults
ules are detected,
(12K
routing
area capacitances.
of the transistor
data
took
calculates
cell and checks whether it exceeds the
value specified in the rules. This load
capacitances
whole
is mainly
used for the
implementation
of condit-
checks—DIALOG
incorporates
the Phil
conditional
clocks are followed
by full static
refresh; for the Phi2 latch a restoring PMOS transistor is
sufficient.
3) For correct logic levels: “A ‘medium’
inuerter may
logic.
timing
verification—this
verification
of the correct
clock-
and sec-
as it is unknown
a value must be stored in a register.
c)
“Conditional
after a pass transistor,
length
identified;
driving
clocks always have to be followed by a refreshing inverter.”
The first reason is for level restoration,
to restore the
the
or it is verified
tances at the input and output nodes of a pass-transistor network is calculated and problem nodes are
ing is only allowed on clock Phil .“ The reason for this is
that it would cause an unfeasible controller
implementamemory
verification—e.g.,
is checked,
rules. These in-
electrical
pmblems-e.g.,
to detect
lems with charge sharing, the ratio
a priori
rules,
separately.
the library
pass transistor
network
level-restoring
inverter;
with some constructiming
the system verifies’
pass-transistor
b)
has been the use of
and treated
pass network
cell library.
capability
rules, etc. Some examples are as follows.
1) For correct clocking and control: “Conditional
tion later on.
2) For correct
are identified
rules and the use of an expert system
design practice, The basic logic gates
and cells are connected
rules
and
Rules and Verification
defined construction
to check consistent
tion
interface
gates are necessary.
cause they are not critical
they are taken from
of the
different
modviolations
and
loads.
to
in-
D.
The Layout
Generation
verter.
4) For capacitive
a maximal
loading:
capacitive
loads are estimated
extraction
programs.
For a large digital
“A
‘small’
load of 500 fF.”
initially
inverter
and checked
chip a rigorous
may drive
These capacitive
afterwards
construction
by
strategy
must be followed. One cannot allow “fancy tricks” to save
gates or transistors.
Such local optimization
carries a
penalty
back,
when
the overall
system has to be tested,
controlled,
or verified. How the construction
rules are used
for a novel way of verification
is discussed next.
For verification
the construction
rules for building
the
data path are written
(Language
Expressing
in the LISP-based
TOpology
language LEXTOC
Constraints)
of the ex-
The floor plan, the adjusted cell layouts, and the construction
rules form one global layout. Placement
and
routing of the different modules is done by hand.
The hardwired
permutations,
P and PC2, are difficult
to route by hand. Therefore
the router of the modide
generator environment
(MGE) [10] has been used. Ternninal connections
can be specified in a textual form using
LISP. The resulting routing has a poor density compared
with
what could be obtained
however,
writing
connections
by hand routing.
in
a texttial
!n this case,
form
is less
sensitive to errors and mistakes are more easily detected
because the permutations
tables are copied from the stamdard
[1], similar
to Table I for 1P.
IEEE JOURNAL
654
E. ERC and Logic
DES-like
The latter
this will
the input
for a logic simulation.
The first
goal is to check the floor plan and in particular
the global
routing of the floor plan, which is part of the algorithm
implementation.
lated
The whole
because routing,
be checked
data path
needs to be simu-
e.g., for the permutations,
in relation
to neighboring
can only
modules.
A second
goal of the logic simulation
is to generate test patterns for
the processed prototypes.
Indeed, a functional
reference pattern for an adder or a
multiplier
is easy to check. But it is not obvious that for
the hexadecimal
01234567
data word
89abcdef,
the first
S-box
are generated
input
567890ab cdeff31f
the first
right
by an independent
input
is 7c6abae9
software
checked
and
patterns
implementation
with
Hardware
encryption
corner
encryp-
on a larger
circuit.
V.
STRUCTURED
DESIGN
AND
LAYOUT
STRATEGIES
It will be explained in this section that the design fulfills
general design and layout strategies. The application
of
these techniques
A. A Structured
In general,
eases the design effort.
the complexity
of IC design:
larity, and locality.
The implementation
the consistent
hierarchy,
has to be changed,
building
application
blocks.
visible
in layout,
as shown in
in Fig. 16. But regularity
logic
also exists in
of the construction
gates,
registers,
etc.,
The same construction
rules, when
into
functional
rules,
sometimes
termed a methodology,
can be coded in a verification
to allow automatic
verification.
B. Layout
tool
Strategies
Typical for this design” is the use of d$ferent word lengths
in different
places. The DES part has a combination
of 32
part
slices. The key part
is byte oriented
has
several
ing is applied.
for reducing
modularity,
regu-
These techniques are closely related.
of the DES algorithm
itself is only a
small part of the problem. The architecture
for one DES
round is easy to deduce and the control for this hardware
has 56 bit
slices and the
and has eight blocks
consequences
on
the
a more or less square layout,
Each module,
independent
of 8 bits.
general
layout
cell stretch-
of the number
slices, is 4 mm wide. The DES part has a combination
of
of
32 and 48 bit slices. The key part has 56 bit slices and the
1/0 part is byte oriented and has eight blocks of 8 bits.
Most
connections
between
made by abutment.
cells inside
the modules
are
This means that a bit slice in the key
part is only 70 pm wide compared to 120 pm for a bit slice
in the DES part. The layout of a register in the DES part
will thus be arranged differently
from a register in the key
part.
Second, the relative placement
for the permutations
also allows
word
length
to
another.
For
of the routing
the transitions
example,
PC2
channels
from one
is placed
the 56 key bits and the 32 data registers,
expansion
[7] are available
and if something
is most clearly
constructing
between
Design Methodoloq
four techniques
Regularity
strategy.
First, to obtain
Module
level in system design, as a stand-alone
or as a small
algorithms,
the chip photograph
1/0
switch-level
descriptions,
and test patterns are produced.
This module
can be used as a black box at a higher
device
1988
have only local influence.
and 48 bit
independently
The result of the whole design process is a hardware
module
for the data path: full layout,
transistorand
tion
23, NO. 3, JUNE
tion protocols,
but they apply everywhere
(regularity).
Modularity
allows also the reuse of the same modules for
This
F. The Result: A Ciyptographical
VLSI
VOL.
and the key
is b48132c41ef 7. Reference
of the DES algorithm
and
available reference tables.
abstraction
CIRCUITS,
two modules as explained in Section II. The use of these
registers is submitted to very strict timing and communica-
Simulation
An electrical rule check is performed on the layout. The
output
of the extraction
is a transistor-level
description
which can be converted easily to a switch-level description.
forms
OF SOLID-STATE
E is placed between
Also,
PLA’s
path,
therefore
are difficult
they
and the
the 32 and 48 data slices.
to fit nicely
are placed
on a bit-sliced
at the bottom
data
and
the
terminal
fitting is included in the permutation
P between
the 48 data slices and the eight PLA’s.
Third, the variation in word lengths has influence on our
power and ground routing strategy. A double-metal
technology
is used
in
the
design.
The
two
metal
layers
run
is a well-defined
sequence. But the realization
and interaction of all aspects, the mode calculation,
the key handling,
perpendicular
to each other on chip. Metall
runs parallel
to the data flow, in the vertical direction, because there are
and the 1/0 interface make it a complex problem. It can
be said that a more attractive solution is obtained when all
more data contacts to be made (in metall).
Meta12 runs
horizontally,
parallel
with the control
flow. Control
is
four techniques
For instance,
mostly
are applied.
the controller
is designed
in a hierarchical
way. One controller
is master and the others are slaves.
Each controller
is a separate module. Modularity
means
well-defined
interfaces. The terminals and the timing are
defined for higher levels and what is inside is transparent.
This is the definition
of locality. The same is valid for the
data
path.
restricted
The
communication
to one communication
between
register
two
modules
between
is
every
supplied
in poly
to make connections
to the gates
of (pass) transistors.
Long poly wires have RC problems.
In this case the maximum
length amounts
to 4 mm.
Therefore
meta12 is used to bypass these long poly wires.
Power and ground
lines run also horizontally,
perpendicular
to data flow. Because of the problem of different word
lengths
vertically
for
unders
would
two
power
and ground
reasons.
be unavoidable
First,
cannot
be provided
crossovers
at the interface
and
cross-
between
VERRACWHSDE
d
af.:
OPTIMIZATION
OF DES DATA
L
scan
ENCRYPTION
pads
635
CHIP
\
c<)
n
:
0
..1.
L. 1.. LL...LJ
K
I
Fig. 15.
different
small
Even
‘Es
-
1
Fig. 16.
be testable. These tests are based on signatures. A test
signature uses the same propagation
principle
as a digital
Floor plan of the data path of the first prototype.
modules.
A second problem
signature
is caused by the very
pitches in some modides such as the key registers.
by mirroring
the cells, it would be impossible
to
arrange
the wells and to separate
transistors.
horizontally
Now the n-wells
over the different
lines are used in common
the NMOS
and PMOS
two different
typical
of surges on the power
double-metal
classic
finger
power
or
fork
single-metal-layer
fork.
cells.
ring is the ground
ring. Each horizontal
is connected
connections
between
they are multiple
is typical
bottleneck,
our
rnetall
dips are unlikely
of
StUck-at-ZERO
a MAC,
is to detect
of false data.
A small
chafige the signature.
is used for ‘test sig-
clear data and key pair is applied,
and
the DES will be tested and stuck-at-O~is,
faults will be detected. The typical stuck-
a
overhead
dynamic
faults
research.
of static
While
CMOS
design
are a
in other chip designs,
an
OFIlogic and chip area is necessary to realize fault
propagation,
The
rings.
Via
and meta12 are necessary but
and the current
ionality
and
line through
the
For
in DES it is inherently
present.
a double
the data path.
sides to
or an insertion
A known
of further
ring, and the meta12 ring is the
at both
of MAC’S.
the restilt is compared with a reference cipher text after a
number
of encryptions
and decryptions.
Complete funct-
open
for
the generation
change
topic
that all cur-
power or ground
an illegal
A
the stem of the
is more equally
uted along the ring. Thus there is no current
and power
by
distribution.
this is solved by providing
power
cells
is solved
which
and meta12 all around
metall
the
lines
ground
has the problem
one critical
In this design,
of metall
structure
technology
rent passes through
ring
and
as in
is based on the CBC mode, the function
natures.
Usually, however, power is distributed
along the signal
lines to avoid surge currents on the power lines [8]. The
problem
which
change of input data will completely
The same propagation
principle
are connected by abutment
bit slices. Power and ground
between
Chip photograph of the implemented prototype.
distrib-
The
VII.
IMPLEMENTATION
AND TEST RESULTS,
first
prototype,
scan
with
registers,
has. be~~n
processed using a 3-pm N-well CMOS process with double
metal (see Fig. 16). It contains 12000 transistors and htis
an area of 25 mm2. The aim was to test the functionality
of
the different
parts, mainly the DES and the modes part.
Tests have shown full operation
up to 16.7-MHz
clock
concentration
to occur.
rate. With
this clock frequency,
a 20-Mbit/s
data rate on
all eight byte modes is achieved.
VI.
TESTABILITY
OF A CRYPTOGRAPHICAL
A second implementation
is now being developed, which
takes the data path as a hardware module arid which also
DEVICE
On the floor plan of the first prototype (Fig. 15) it can
be seen that some scan registers are implemented.
This
scan
technique
design
security
during
demands.
scan out
ciphertext.
mercial
is very
useful
development.
localizing
faults
this is in conflict
in
a
with
Scan pads provide a very easy way to
initialization
vectors
or intermediate
keys,
Therefore
applications.
for
But
they are not allowed
They are only provided
opment and are realized as independent
combining
them with existing registers.
contains the full controller
and microprocessor
The scan registers have been removed. Thanks
performance
controller,
a full DES calculation,
the mode
calculation,
same clock frequency
rate of 32 Mbit/s
takes only
28 clock
can be applied
interface.
to a highincluding
cycles. If
the
to this device, a data
can be expected.
for safe comduring
devel-
blocks instead of
This means over-
head on area, but they are very easy to remove.
Scan pads are removed for safe commercial applications.
For validation
or maintenance purposes, the chip must still
VIII.
CONCLUSIONS
A single chip has been designed and implemented
which
executes DES with a number of unique cryptographical
features. The original approach of mode calculation
allows
pipelining
in combination
with feedback modes. Due to
656
1Ei3EJOURNALOF SOLID-STATECIRCUITS,VOL. 23, NO. 3,
the structured
design strategy,
optimization,
tools
the regular
floor
the global
plan,
at all design levels, the main
building
blocks
commercial
Checking
architecture
can be reused in a flexible
applications
at different
verification
rather
or
tial and complement
for
and many
way for safe
DES-like
algorithms.
design levels, including
and electrical
than local
and the use of CAD
simulation,
each other. The outcome
authors
Rammelaere
expert
system
service
chip
division
wish
for
to
their
thank
help
I.
in
for chip verification,
processing,
for stimulating
division of the IMEC Laboratory, Heverlee, Belgium, working towmds
the Ph.D. degree in electrical engineering. She is a member of the research
group on applications and architectural design strategies, under the
direction of Dr. F. Catthoor, Prof. H. De Man, and Prof. J. Vandewalle.
Her research interest is in the design and integration of cryptography
applications ad currently of ~gebrtic data processing applications. In
particular, the formalization of the architectural methodologies is envisioned.
is a compact
ACKNOWLEDGMENT
The
1988
and design rule checks are essen-
design which can be used as a small module in larger
digital VLSI circuits, but also as a fast stand-alone device.
In short, it has a number of unique features not found in
other devices.
De
JUNE
Bolsens
using
the
and
the INVOMEC
and the members
W.
DIALOG
MPC
of the VSDM
discussions.
Frank Hoornaert was born on October 20, 1961
in Belgium. He received his education from the
Klein
Seminarie
Roeselare and from
the
Katholieke Universiteit Leuven, Heverlee, Belgium.
From October 1984 to September 1986 he was
a Chip Designer at IMEC, Heverlee, Belgium.
He worked on hardware design and development
of secufity systems at the Katholieke Universiteit
Leuven from October 1986 until Februarv 1987.
Since March 1987 he has been a Secu~ty and
Development Manager at Cryptech n.v., Brussels, Belgium. His hardware
and software knowledge includes gate array and full-custom design
experience, digital design with PP and TTL components, and design of
cryptographic hardware.
REFERENCES
[1] Data
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[Ii]
[12]
Encryption
Standard,
Federal Information Processing Standard (FIPS) 46, Nat. Bur. Stand., Jan. 1977.
DES Modes of Operation, Federaf Information Processing Standard
(FIPS) 81, Nat. Bur. Stand., Dec. 1980.
W. Diffie and M. E. Hellman, ‘~Privacy and authentication: An
introduction
to cryptography,” Proc. IEEE, vol. 67, no. 3, pp.
397-427. Mar. 1979.
I. Verbauwhede, F. Hoomaert, J. Vandewalle, H. De Man, and R.
Govaerts, “Security considerations in the design and implementation of a new DES chip,” to be published in Proc, Eurocrypt 87
Conf., Springer-Verlag.
M. Davio; Y. Desmedt, J. Goubert, F. Hoomaert, and J.-J.
Quisquater, “Efficient hardware and software implementations of
the DES,” in Aduances in C~ptologp,
Proc. Ctypto 84, Aug. 1984.
H. De I&i,,
L Bolsens, E. Vanden Meersch, and J. Van
“DIALOG:
Ari expert debugging system for
(leyenbreughel,
MOSVLSI design,” IEEE Trans. Computer-Aided
Des., vol. CAD-4,
no. 3, pp. 303–311, July 1985.
N. Weste and K. Eshraghian, Principles of CMOS VLSI besign, A
Systems Perspective.
Reading, MA: Addison-Wesley, 1984, pp.
238-24Q.
L, A. Glasser and D. W. Doppe&ihl,
The Design and Analysis of
Reading, MA: Addison-Wesley, 1985, pp. 316-317.
VLSI Circuits.
M. Bartholomeus, L, Reynders, M. Pauwels, and H. De Man,
“ PLASCO: A procedural silicon compiler for PLA based systems~’
in Proc. [EEE 1985 Custom Integrated Circuits Conf., pp. 226–229,
H. De Man. J. Rabaev. P. Six. and L. Claesen. “Cathedral-II:
A
silicon compiler for di~tal sign’al processing,” IhEE Design&Test,
pp. 13-25, Dec. 1986.
K. Croes~ H. J, De Man, and P. Six, “ CAMELEON:
A processtolerant symbolic layout system,” IEEE J. Solid-State Circuits, vol.
23. no. 3. DD. 705-713. June 1988.
R.’ H. Cu;h’man, “Data-encryption
chips provide security—or is it
fake
[13]
[14]
security?.”
EDN.
vol. 27. DD. 39-45.
Feb.
17.1982.
D. MacMil&,
“Single chip en&ypts data at 14 Mb/s,” Electronics, vol. 54, pp. 161–165, June 16, 1981,
R. C. Fairfield, A. Matusevich, and J. Plany, “An LSI digital
encryption processor (DEP),” IEEE Cow-nun. Msg., vol. 23, no, 7,
pp. 30-41, July 1985.
hrgrid Verbauwhede was born in Leuven, Belgium, on October 24, 1961. She received the
electrical engineering degree from the Katholieke
Universiteit Letiven, Heverlee, Belgium, in 1984.
Her NLS. thesis was on analog design: the study
ahd design of a micropower instrumentation
amplifier
based on switched-capacitor
techniques,
Since September 1984 she has been at the
VSDM (VLSI Systems Design Methodologies)
Hugo J. De Man (M81–SM81–F’86)
was born
in Boom, Belgium, on September 19, 1940. He
received the electncaf engineering degree and the
Ph.D. degree in applied sciences from the
Kathofieke Universiteit Leuven, Heverlee, Belgium, in 1964 and 1968, respectively.
In 1968 he became a member of the staff of
the Laboratory for Physics and Electronics of
Semiconductors at the University of Leuven,
working on device physics and integrated circuit
technology. From 1969 to 1971 h= was at the
Electronic Research Laboratory, University of California, Berkeley, as an
ESRO-NASA Postdoctoral Research Fellow, working on computer-aided
device and circuit design. In 1971 he returned to the University of Leuven
as a Research Associate of the NFWO (Belgian National Science Foundation). In 1974 he became a Professor at the University of Leuven.
During the winter quarter of 1974–1975 he was a Visiting Associate
Professor at the University of California, Berkeley. Since 1984 he has
been Vice-President of the VLSI systems design group at the IMEC
Laboratory, Leuven, Belgium.
Dr. De Man was ~an Associate Editor for the IEEE JOURNAL OF
SOLID-STATE CIRCUITS from 1975 to 1980 and was European Associate
Editor for the IEEE TRANSACTIONSON COMPUTER-AIDED DESIGN from
1982 to 1985. He received a Best Paper Award at the ISSCC of 1973 on
Bipolar Device Simulation and at the 1981 ESSCIRC conference for work
on an integrated CAD system. In 1986 he became Fellow of the IEEE.
His actuaf field of research is the design of integrated circuits and
computer-aided design.
Download