Chortle-crf: Fast Technology Mapping for Lookup Table-Based FPGAs

Robert Francis, Jonathan Rose, Zvonko Vranesic
Department of Electrical Engineering, University of Toronto, Toronto, Canada

1 This work was supported by NSERC Operating Grants #URF0043298 and #OGP0005280, a research grant from Bell-Northern Research, and a research grant from ITRC of Ontario.

28th ACM/IEEE Design Automation Conference, Paper 15.1. © 1991 ACM 0-89791-395-7/91/0006/0227 $1.50

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

Abstract

A new technology mapping algorithm for lookup table-based Field Programmable Gate Arrays (FPGAs) is presented. The major innovation is a method for choosing gate-level decompositions based on bin packing. This approach is up to 28 times faster than a previous exhaustive approach. The algorithm also exploits reconvergent paths and replication of logic at fanout nodes to reduce the number of lookup tables in the final circuit. The new algorithm is implemented in the Chortle-crf program.

In an experimental comparison on a set of benchmark networks, Chortle-crf requires 14 % fewer lookup tables than Chortle [Fran90] and 10 % fewer lookup tables than mis-pga [Murg90a] to implement a network as a circuit of 5-input lookup tables. Chortle-crf can also implement networks as circuits of Xilinx 3000 series Configurable Logic Blocks (CLBs). To implement the benchmark networks, Chortle-crf requires 12 % fewer CLBs than mis-pga and 22 % fewer CLBs than XNFOPT. In these experiments Chortle-crf was an average of 68 times faster than mis-pga and 30 times faster than XNFOPT.

1 Introduction

Field Programmable Gate Arrays (FPGAs) are an important recent innovation in Application Specific Integrated Circuits (ASICs) that provide both large scale integration and user-programmability [Ahre90]. The user-programmability of FPGAs dramatically reduces ASIC turn-around time and manufacturing costs. An FPGA consists of an array of programmable logic blocks and a programmable routing network.

An important class of FPGAs [Hsie88] [Cart86] uses logic blocks that are based on lookup tables. A K-input lookup table is a digital memory with K address inputs and a one-bit output. This memory contains 2^K bits and is capable of implementing any Boolean function of K variables. Recent studies [Rose90] have suggested that lookup tables are an area-efficient method of implementing combinational logic. Moreover, lookup tables are used in commercial FPGA architectures such as [Xili89].

This paper presents a new technology mapping algorithm for lookup table-based FPGAs, implemented in the Chortle-crf program.

2 Background

Technology mapping converts a Boolean network into a circuit that implements the network using only a restricted set of circuit elements. The goal of technology mapping in this paper is to minimize the total number of K-input lookup tables required to implement a combinational network of ANDs, ORs, and NOTs, where each lookup table has K or fewer inputs. For example, the network shown in Figure 1a can be implemented by the circuit of 5-input lookup tables shown in Figure 1b. The dotted boundaries indicate the function implemented by each lookup table. Note that one of the lookup tables uses only 4 of the 5 available inputs.

Early work in technology mapping focused on circuits implemented using standard cell libraries. Examples of library-based technology mapping include the work by Kahrs [Kahr86], DAGON [Keut87], misII [Detj87], SOCRATES [Greg86], and McMAP [Lisa87]. An important advance in library-based technology mapping was the introduction of dynamic programming by Keutzer [Keut87].

However, library-based algorithms become impractical for lookup table circuits. A K-input lookup table can implement 2^(2^K) different functions, and for values of K greater than 3 the library required to describe a K-input lookup table is impractically large.

Two technology mappers that deal specifically with lookup tables have previously been reported: Chortle [Fran90] and mis-pga [Murg90a]. The algorithm presented in [Fran90] partitions the original network at fanout nodes into fanout-free trees, and uses dynamic programming with an exhaustive search of the gate-level decompositions of every node to find the optimal circuit for each tree.
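The view of a K-input lookup table as a 2^K-bit memory can be illustrated with a short sketch. The helper names `make_lut`, `read_lut`, and `function_count` are hypothetical and are not taken from Chortle-crf. The sketch only shows why one table can realize any function of K variables, and how quickly the count 2^(2^K) of distinct functions grows.

```python
def make_lut(func, k):
    """Build the 2**k-bit memory that realizes func, a Boolean function of k inputs.

    The address is formed from the input bits, with input i as address bit i.
    """
    return [func(*(((addr >> i) & 1) for i in range(k))) & 1
            for addr in range(2 ** k)]


def read_lut(table, inputs):
    """Evaluate the lookup table by using the input bits as a memory address."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


def function_count(k):
    """Number of distinct Boolean functions of k variables, i.e. 2^(2^k)."""
    return 2 ** (2 ** k)


# Illustrative 3-input function: (a AND b) OR c.
lut = make_lut(lambda a, b, c: (a & b) | c, 3)
print(len(lut), read_lut(lut, (1, 1, 0)), read_lut(lut, (0, 1, 0)))
print([function_count(k) for k in range(1, 6)])
```

The last line shows the growth that makes a library description of a K-input lookup table impractical for larger K.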
Figure 1: a) combinational network, b) circuit of 5-input lookup tables

Figure 2: gate decomposition, a) without decomposition, b) with decomposition
The mis-pga technology mapper [Murg90a] initially performs a decomposition of the combinational network to produce an intermediate circuit of lookup tables, and then addresses a covering problem to reduce the number of lookup tables. The Chortle algorithm of [Fran90] focuses on fanout-free trees and does not exploit reconvergent paths or the replication of logic at fanout nodes, which can result in a non-optimal circuit.

3 The Chortle-crf Algorithm

The principal technique used in Chortle-crf is dynamic programming. The network is traversed beginning at the primary inputs and proceeding toward the primary outputs. At each node, a circuit of lookup tables implementing the cone of logic extending from the node to the primary inputs is constructed. This circuit is referred to as the Best Circuit. The Best Circuit at a node is constructed from the Best Circuits implementing its immediate fanin nodes. The order of the traversal ensures that the Best Circuits of the fanin nodes have been previously constructed.

There are two goals when constructing the Best Circuit. The first goal is to minimize the number of lookup tables in the circuit. The second goal is to maximize the number of unused inputs at the output (root) lookup table. These unused inputs are important because they may allow subsequent nodes to be implemented without increasing the number of lookup tables.

A major innovation in Chortle-crf is the application of bin packing to choosing gate-level decompositions. Two other important features are the exploitation of reconvergent paths and the replication of logic at fanout nodes to reduce the number of lookup tables in the final circuit.

3.1 Bin Packing Approach to Gate Decomposition

The key to constructing the Best Circuit at a node is finding the best decomposition of the node, because the number of lookup tables required to implement the node depends on the decomposition. For example, the circuit of Figure 2a is constructed without gate decomposition, and the circuit of Figure 2b, in which a node has been decomposed, requires fewer lookup tables.

The decomposition is constructed in two steps. First, a two-level decomposition is constructed, and then this decomposition is converted into a multi-level decomposition. Figures 3a, 3b and 3c illustrate these steps. Figure 3a shows an OR node and the lookup tables at the outputs of the Best Circuits implementing its five fanin nodes. These lookup tables will be referred to as the fanin lookup tables.
FirstFitDecreasing
{
    start with an empty bin list
    while there are unpacked boxes
    {
        if the largest unpacked box will not fit within any bin in the bin list
        {
            create an empty bin and add it to the end of the bin list
        }
        pack the largest unpacked box into the first bin it will fit within
    }
}
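The pseudo code above can be sketched as runnable code. This is a minimal illustration, not the Chortle-crf source: the function name `first_fit_decreasing` and the representation of a box by its size alone are assumptions made for the example. The example input uses the box sizes 3, 2, 2, 2, 2 and bin capacity K = 5 given for Figure 3.

```python
def first_fit_decreasing(box_sizes, capacity):
    """Pack boxes into bins using the First Fit Decreasing heuristic.

    box_sizes: number of used inputs of each fanin lookup table (the boxes).
    capacity:  K, the number of inputs of a lookup table (the bin capacity).
    Returns a list of bins, each bin being the list of box sizes packed in it.
    """
    bins = []
    # Consider the largest unpacked box first.
    for size in sorted(box_sizes, reverse=True):
        if size > capacity:
            raise ValueError("a box cannot be larger than a bin")
        for contents in bins:
            # Pack the box into the first bin it fits within.
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # The box fits in no existing bin: open a new bin at the end of the list.
            bins.append([size])
    return bins


bins = first_fit_decreasing([3, 2, 2, 2, 2], capacity=5)
print(bins)
print([sum(contents) for contents in bins])
```

With these inputs the packed bins hold 5, 4 and 2 inputs, which agrees with the bin contents stated for Figure 3b.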
Figure 4: Pseudo code for bin packing

Figure 3: a) fanin lookup tables, b) two-level decomposition, c) multi-level decomposition

3.1.1 Two-Level Decomposition

The two-level decomposition consists of a single first-level node and several second-level nodes. In Figure 3b the OR node is the first-level node and there are three second-level nodes. Each second-level node implements the operation of the first-level node over a subset of the fanin lookup tables, and is implemented by a lookup table that contains some, or all, of the fanin lookup tables in that subset. The inputs to the first-level node are the outputs of the second-level lookup tables. The first-level node is not yet implemented by any lookup table; it will be implemented by the multi-level decomposition.

Finding the two-level decomposition that uses the minimum number of second-level lookup tables is a bin packing problem. In general, the goal of bin packing is to find the minimum number of bins into which a set of boxes can be packed [Gare79]. In this case, the boxes are the fanin lookup tables and the bins are the second-level lookup tables. The size of each box is its number of used inputs, and the capacity of each bin is K. In Figure 3a the boxes have sizes 3, 2, 2, 2, and 2. In Figure 3b the final packed bins have contents of size 5, 4, and 2. The bin packing algorithm used is First Fit Decreasing, as outlined in Figure 4 [Gare79].

3.1.2 Multi-Level Decomposition

The two-level decomposition is converted into a multi-level decomposition by the procedure outlined in Figure 5. The outputs of the second-level lookup tables are connected to unused inputs of other second-level lookup tables, thereby implementing the first-level node and producing a tree of lookup tables. Figure 3c illustrates the multi-level decomposition constructed from the two-level decomposition of Figure 3b. The final multi-level decomposition is a fanout-free tree of lookup tables.

It can be shown that the tree of lookup tables constructed by this procedure is optimal when the value of K is less than or equal to 5 [Fran91]. In this case, the bin packing approach produces circuits with the same number of lookup tables as the exhaustive search of [Fran90], yet it is up to 28 times faster. This improvement in speed makes it practical to consider the optimizations that exploit reconvergent paths and the replication of logic at fanout nodes, as discussed in the following sections.
MultiLevel
{
    while there is more than one unconnected bin
    {
        if there are no free inputs among the remaining unconnected bins
        {
            create an empty bin and add it to the end of the bin list
        }
        connect the most filled unconnected bin to the next unconnected bin with a free input
    }
}

Figure 5: Pseudo code for multi-level conversion

Figure 6: a) fanin lookup tables with shared input, b) reconvergent paths realized within one lookup table
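The multi-level conversion of Figure 5 can be sketched in the same style. The representation is an assumption made for illustration: each bin is reduced to its fill count, and the function `multi_level` and the returned edge list are hypothetical names, not part of Chortle-crf. Connecting a bin consumes one free input of the destination bin.

```python
def multi_level(bin_fills, capacity):
    """Convert a two-level decomposition into a tree of lookup tables.

    bin_fills: used inputs of each second-level lookup table, in bin-list order.
    capacity:  K, the number of inputs of a lookup table (assumed to be >= 2).
    Returns (fills, edges): the final fill of every lookup table and the list of
    (source, destination) connections; the last unconnected bin is the root.
    """
    if capacity < 2:
        raise ValueError("a lookup table needs at least two inputs to form a tree")
    fills = list(bin_fills)
    unconnected = list(range(len(fills)))
    edges = []
    while len(unconnected) > 1:
        # No free inputs among the remaining unconnected bins: add an empty bin.
        if all(fills[i] >= capacity for i in unconnected):
            fills.append(0)
            unconnected.append(len(fills) - 1)
        # Connect the most filled unconnected bin ...
        src = max(unconnected, key=lambda i: fills[i])
        # ... to the next unconnected bin that still has a free input.
        dst = next(i for i in unconnected if i != src and fills[i] < capacity)
        fills[dst] += 1
        unconnected.remove(src)
        edges.append((src, dst))
    return fills, edges


fills, edges = multi_level([5, 4, 2], capacity=5)
print(fills, edges)
```

For the bins of Figure 3b (contents 5, 4 and 2) the sketch connects the three lookup tables in a chain, so no extra lookup table is needed and the root keeps two unused inputs.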
3.2 Exploiting Reconvergent Paths

It is possible to find a better circuit implementing a node by exploiting local reconvergent paths. The following discussion uses the terminology of the previous section, where the fanin lookup tables are referred to as boxes and the second-level lookup tables are referred to as bins.

When a pair of boxes share an input, there exists a pair of local reconvergent paths. If the total number of distinct inputs to the two boxes is less than or equal to K, then the two boxes can be packed into one bin, realizing the reconvergent paths within a single lookup table. The portion of the bin occupied by the two boxes is then the number of distinct inputs, which is less than the sum of the boxes' individual sizes. Figure 6a shows a pair of boxes with a shared input, and Figure 6b shows the reconvergent paths realized within one bin.

To exploit a pair of reconvergent paths, the two boxes are merged into a single box prior to bin packing, which forces them to be packed into the same bin. By merging the two boxes the bin packing may lead to a superior packing. However, forcing two boxes into one bin may interfere with the packing of the other boxes and result in an inferior packing. A further complication is that there may be several pairs of boxes that share inputs, and merging one pair can make it impossible to merge another.

To find the best combination of forced merges, Chortle-crf begins by finding all pairs of boxes with local reconvergent paths. An exhaustive search of all possible combinations of these pairs, including none, is then performed. For each combination the respective boxes are merged before bin packing proceeds, and the circuit with the fewest lookup tables and the greatest number of unused inputs at the output lookup table is retained as the Best Circuit. The bin packing algorithm is fast enough to make this exhaustive search practical. In our experiments with the MCNC benchmark networks, the largest number of pairs to be considered at any node was found to be six pairs.

This is a greedy, local optimization: it is considered at every node as the network is traversed, and it works only with reconvergent paths that terminate at one node.

3.3 Replication of Logic at Fanout Nodes

The previous version of Chortle [Fran90] partitions the network into a set of fanout-free trees. This forces every fanout node to be explicitly implemented as the output of a lookup table, and to be treated as a primary input by the rest of the network. Replication of logic at a fanout node can reduce the number of lookup tables required to implement the network. For example, in Figure 7a, three lookup tables are required to implement the network when the fanout node is explicitly implemented. In Figure 7b, the AND node implementing the fanout node is replicated, and only two lookup tables are required.

Replication is considered when the dynamic programming traversal of the network encounters a fanout node. At this point the Best Circuit implementing the fanout node has been constructed.
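The search over forced merges described in Section 3.2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the Chortle-crf implementation: boxes are modelled as sets of input names, each box takes part in at most one merge, and only the number of bins is compared (the secondary goal of maximizing unused inputs at the output lookup table is omitted). The names `pack` and `best_packing` are hypothetical.

```python
from itertools import combinations


def pack(boxes, k):
    """First Fit Decreasing over boxes given as sets of inputs (size = len)."""
    bins = []
    for box in sorted(boxes, key=len, reverse=True):
        for contents in bins:
            if sum(len(b) for b in contents) + len(box) <= k:
                contents.append(box)
                break
        else:
            bins.append([box])
    return bins


def best_packing(boxes, k):
    """Try every combination of forced merges (including none); keep the fewest bins."""
    boxes = [frozenset(b) for b in boxes]
    # Candidate pairs: boxes that share an input and fit together in one bin.
    pairs = [(i, j) for i, j in combinations(range(len(boxes)), 2)
             if boxes[i] & boxes[j] and len(boxes[i] | boxes[j]) <= k]
    best = None
    for r in range(len(pairs) + 1):
        for chosen in combinations(pairs, r):
            used = [i for pair in chosen for i in pair]
            if len(used) != len(set(used)):
                continue  # simplification: a box is merged at most once
            # A merged box occupies only the number of distinct inputs.
            merged = [boxes[i] | boxes[j] for i, j in chosen]
            rest = [b for i, b in enumerate(boxes) if i not in used]
            bins = pack(merged + rest, k)
            if best is None or len(bins) < len(best):
                best = bins
    return best


boxes = [{"a", "b", "c"}, {"a", "b", "d"}, {"e", "f", "g"}]
plain = pack([frozenset(b) for b in boxes], 5)
merged = best_packing(boxes, 5)
print(len(plain), len(merged))
```

In this example the first two boxes share inputs a and b. Packed separately, the three boxes need three bins. Merging the sharing pair into one 4-input box reduces this to two bins.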
Network
L---
.-
--
. .. . ..J
,... --!
-i
z4ml
Iwl
replicated
Figure
logic
7.
19
11
23
21
30
34
27
31
40
62
67
71
74
80
31
31
55
56
59
72
80
82
86
103
apex2
123
123
121
120
80
alu2
131
121
127
116
129
duke2
138
136
126
120
128
C499
166
164
158
74
66
rot
219
207
208
189
200
apex6
232
219
230
212
243
alu4
238
219
227
195
235
apex4
603
600
579
558
765
e64
II
!1
II
Ii . .... .... ... . . .—-. —.-..
19
112
rd84
II
8
20
24
34
47
63
69
72
76
110
apex7
!/
!1
6
95
9symml
logic
9sym
b) with
two
options
explicitly
are considered.
If the fanout
node
as a primary
implicitly
The
implemented,
is explicitly
input
lookup
table
replica
replaces
node
rest
of the
a replica
is made
the
fanout
starting
with
each
node
1073
1060
1050
952
1016
total
3547
3454
3417
3057
3390
for
5
4
If it
function
fanout
as the
is
of the
edge.
source
This
of
the
edge.
Every
will
path
eventually
output
and
reach
of the
primary
an edge from
another
network.
fanout
These
outputs
a fanout
node
be referred
the
replication
node
or a primary
subsequent
will
40
1: Results
K =
Results
it is treated
network.
of the
for
64
73
can be ei-
implemented.
implemented
to the
implemented,
output
fanout
or implicitly
64
des
Table
ther
lookum
9
115
5xpl
count
. .. . .. . . ---------
lookups
lookups
9
C880
vg2
a) no replicated
Iookups
9
20
24
31
45
59
65
71
76
95
misexl
I
L..- .-.__!
..-!
lookups
mis-pga
-crf
-Cf
-cr
--l
I
I
!
!
!
L----
:-crf
Cho]
-c
i
Every path starting at a fanout node eventually reaches another fanout node or a primary output of the network. These fanout nodes and primary outputs are referred to as the visible nodes of the fanout node.

To determine if replication at a fanout node is worthwhile, Chortle-crf solves a series of subproblems. Each subproblem is the construction of the Best Circuit implementing a visible node. The subproblems are solved twice; once with replication, and once without replication, where the fanout node is explicitly implemented and treated like a primary input. Solving each subproblem is itself a series of bin packing problems, and the bin packing approach is fast enough to make solving these subproblems practical. Once both sets of subproblems have been solved, the total number of lookup tables required to implement the visible nodes with and without replication is known. If the total number of lookup tables with replication is less than the total without replication, then the replication is retained. This procedure assumes that the subproblems are independent. Replication is considered at every fanout node as it is encountered by the dynamic programming traversal of the network.

4 Results

To evaluate Chortle-crf, a series of experiments were performed on networks from the MCNC logic synthesis benchmark suite. Four experiments were performed on each network:

-c    using only the constructive bin packing approach
-cr   using the reconvergent optimization
-cf   using the replication optimization
-crf  using both the reconvergent and replication optimizations

The networks were first optimized using the misII logic optimizer with the standard script [Bray86]. Chortle-crf was then used to implement the networks as circuits of K-input lookup tables, for values of K from 2 to 10.

Table 1 records the number of 5-input lookup tables required to implement the networks in each of the four experiments. In total, the reconvergent optimization reduced the number of lookup tables by 2.7 %, and the replication optimization reduced the number of lookup tables by 3.7 %. Combining both optimizations together reduced the total number of lookup tables by 14 %. Note that the reduction achieved when both optimizations are used exceeds the sum of the individual reductions. This often occurs when reconvergent paths that cross fanout nodes are found and realized within a single lookup table. A dramatic result occurs in the C499 example, where using both optimizations reduces the number of lookup tables by 55 %.

The sixth column of Table 1 records the number of 5-input lookup tables in the circuits produced by mis-pga [Murg90a] [Murg90b]. In total, Chortle-crf produces circuits with 10 % fewer lookup tables than mis-pga.
Network
Chortle-crf
Network
-c
-cr
-Cf
-crf
CLBS
CLBS
CLBS
CLBS
Chortle-crf
CLBS
3
z4ml
misexl
14
14
vg2
18
21
18
5xpl
20
20
count
27
31
23
32
27
9symml
41
42
50
41
9sym
42
44
56
42
apex7
42
45
49
42
rd84
53
52
52
53
53
e64
54
e64
48
48
54
54
C880
69
C880
75
70
94
69
apex2
93
apex2
94
90
97
93
alu2
83
alu2
94
86
98
83
duke2
89
duke2
88
87
91
89
C499
50
5
5
7
3
misexl
14
14
14
vg2
5xpl
20
19
20
count
32
9symml
50
9sym
52
apex7
48
rd84
z4ml
23
84
84
96
50
rot
131
rot
134
129
144
131
apex6
161
apex6
169
161
169
161
alu4
138
alu4
165
144
174
138
subtotal
apex4
457
451
463
448
des
714
695
797
743
2418
2317
2582
2319
C499
total
lookup
table.
where
A dramatic
using
lookup
both
tables
As
an
0.8
7
6
296.5
10
12
298.2
0.6
21
25.6
299.7
3.2
23
45.5
20
19
2.0
28
32
59.1
43
56
901.2
1
52
305.1
117.3
51
304.6
65.1
38
303.2
62.9
59
50
2.9
number
result
sixth
of 5-input
lookup
lookup
benchmark
of
the
448
des
743
15.4
32
1.9
61
65
901.5
12.6
82
101
1809.4
34.9
70
102
909.7
56.3
102
91
907.8
9.1
105
357.1
99
903.6
15.9
50
137.5
121
1847.0
14.0
153
844.8
166
1811.4
25.3
191
1376.8
198
1822.6
178.1
189
232
1849.4
z
2319
-
4.1 Results for Xilinx 3000 CLBs

Xilinx 3000 series devices [Hsie88] [Xili89] consist of an array of Configurable Logic Blocks (CLBs). Each CLB can implement one 5-input lookup table, or two 4-input lookup tables as long as the total number of distinct inputs to the two lookup tables is less than or equal to 5.

A circuit of CLBs can be derived from a circuit of lookup tables by finding pairs of lookup tables that can fit inside a single CLB. Finding the maximum number of such pairs can be restated as a Maximum Cardinality Matching problem [Gibb85].

Table 2 records the number of CLBs required to implement the benchmark networks using Chortle-crf. Note that the replication optimization, which reduces the number of lookup tables, can on its own increase the number of CLBs in the derived circuit. Replication at a fanout node may increase the number of inputs used at some lookup tables, thereby precluding some pairings.

Table 3 records the number of CLBs and the execution times for Chortle-crf, mis-pga, and XNFOPT, the logic synthesis program in the proprietary Xilinx design system [Xili89]. In the mis-pga experiments of [Murg90a], the final mapping of the circuits into CLBs is performed by the Xilinx XNFMAP program. In total, Chortle-crf requires 12 % fewer CLBs than mis-pga and 22 % fewer CLBs than XNFOPT to implement the benchmark networks.
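The pairing of lookup tables into CLBs can be illustrated with a small sketch. The pairing rule follows the text above (two lookup tables of at most 4 inputs each, with at most 5 distinct inputs in total). The exhaustive recursion below is an assumption chosen for brevity: it computes a maximum cardinality matching exactly, but only for small circuits, and a practical mapper would use a polynomial-time matching algorithm such as those described in [Gibb85]. The names `can_share_clb`, `max_pairs`, and `clb_count` are hypothetical.

```python
def can_share_clb(a, b):
    """Two lookup tables fit in one CLB: each has <= 4 inputs, <= 5 distinct in total."""
    return len(a) <= 4 and len(b) <= 4 and len(a | b) <= 5


def max_pairs(luts):
    """Return a maximum set of disjoint lookup table pairs (exhaustive search)."""
    luts = [frozenset(l) for l in luts]

    def solve(remaining):
        if len(remaining) < 2:
            return []
        first, rest = remaining[0], remaining[1:]
        best = solve(rest)  # option: leave the first lookup table unpaired
        for idx, other in enumerate(rest):
            if can_share_clb(luts[first], luts[other]):
                candidate = [(first, other)] + solve(rest[:idx] + rest[idx + 1:])
                if len(candidate) > len(best):
                    best = candidate
        return best

    return solve(tuple(range(len(luts))))


def clb_count(luts):
    """Every pair saves one CLB; unpaired lookup tables use one CLB each."""
    return len(luts) - len(max_pairs(luts))


luts = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"f", "g", "h", "i", "j"},
        {"a", "b"}, {"x", "y", "z"}]
print(max_pairs(luts), clb_count(luts))
```

In this example a greedy choice that pairs the first lookup table with the 2-input table leaves no further legal pair, while the maximum matching finds two pairs and needs three CLBs for the five lookup tables.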
3ELE!I
m
of
the
4.1
times
times
Table
C499.
lookup
of Table
tables
In
mis-pga
5-input
column
[Murg90b].
10 % fewer
the
a circuit
The
mis-pga
reduces
301.1
301.9
by 55 %.
produces
[Murg90a].
the
optimizations
sec. 1
0.7
2 execution
is the network
OPT
CLBS
m
apex4
Results
example
intermediate
mapper
by
2: CLB
2
sec.
‘execution
Table
xl
mis-pga
T
1128
tot al
I
~
The Chortle-crf and XNFOPT execution times are on a Sun 3/60, and the mis-pga execution times are on a VAX 8800. Note that limits were placed on the XNFOPT execution times recorded in the seventh column of Table 3. By conservative estimate, a VAX 8800 is twice as fast as a Sun 3/60. Taking into account the relative speed of the Sun 3/60 and the VAX 8800, Chortle-crf is an average of 68 times faster than mis-pga and 30 times faster than XNFOPT.

5 Conclusions

The bin packing approach to gate decomposition described in this paper is up to 28 times faster than a previous exhaustive search approach [Fran90]. The improved speed makes it practical to consider local optimizations that exploit reconvergent paths and the replication of logic at fanout nodes. Using both of these optimizations, Chortle-crf requires 14 % fewer lookup tables than Chortle [Fran90] and 10 % fewer lookup tables than mis-pga [Murg90a] to implement a set of benchmark networks as circuits of 5-input lookup tables.

Chortle-crf is also capable of implementing networks as circuits of Xilinx 3000 series CLBs. To implement the benchmark networks, Chortle-crf required 12 % fewer CLBs than mis-pga and 22 % fewer CLBs than XNFOPT. On average, Chortle-crf was 68 times faster than mis-pga and 30 times faster than XNFOPT.

6 Future Work

Currently, the reconvergent paths and replication optimizations are evaluated locally. There are, however, cases where a pair of reconvergent paths cannot be found locally, and the exploitation of reconvergent paths should be extended to include a global search for reconvergent paths that can be realized within a single lookup table.

As well, there may be interactions among the replications of logic at different fanout nodes. The number of lookup tables in the entire network will depend upon the set of replications that is chosen. A computationally tractable method of determining the set of replications requiring the minimum number of lookup tables is needed.

References

[Ahre90] M. Ahrens, et al., "An FPGA Family Optimized for High Densities and Reduced Routing Delay," Proc. 1990 CICC, May 1990, pp. 31.5.1-31.5.4.

[Bray86] R. Brayton, et al., "Multiple-Level Logic Optimization System," Proc. ICCAD, Nov. 1986, pp. 356-359.

[Cart86] W. Carter, et al., "A User Programmable Reconfigurable Gate Array," Proc. CICC, May 1986, pp. 233-235.

[Detj87] E. Detjens, et al., "Technology Mapping in MIS," Proc. ICCAD 87, Nov. 1987, pp. 116-119.

[Fran90] R. J. Francis, J. Rose, K. Chung, "Chortle: A Technology Mapping Program for Lookup Table-Based Field Programmable Gate Arrays," Proc. 27th DAC, June 1990, pp. 613-619.

[Fran91] R. J. Francis, "Technology Mapping for Lookup Table-Based FPGAs," Ph.D. Thesis in preparation, Department of Electrical Engineering, University of Toronto.

[Gare79] M. R. Garey, D. S. Johnson, "Computers and Intractability, A Guide to the Theory of NP-Completeness," W. H. Freeman and Co., 1979, pp. 124-129.

[Gibb85] A. Gibbons, "Algorithmic Graph Theory," Cambridge University Press, 1985, pp. 125-133.

[Greg86] D. Gregory, et al., "Socrates: a system for automatically synthesizing and optimizing combinational logic," Proc. 23rd DAC, June 1986, pp. 79-85.

[Hsie88] H. Hsieh, et al., "A 9000-Gate User-Programmable Gate Array," Proc. 1988 CICC, May 1988, pp. 15.3.1-15.3.7.

[Kahr86] M. Kahrs, "Matching a parts library in a silicon compiler," Proc. ICCAD, 1986, pp. 169-172.

[Keut87] K. Keutzer, "DAGON: Technology Binding and Local Optimization by DAG Matching," Proc. 24th DAC, June 1987, pp. 341-347.

[Lisa87] R. Lisanke, F. Brglez, G. Kedem, "McMAP: A Fast Technology Mapping Procedure for Multi-Level Logic Synthesis," Proc. ICCD, Oct. 1988, pp. 252-256.

[Murg90a] R. Murgai, et al., "Logic Synthesis for Programmable Gate Arrays," Proc. 27th DAC, June 1990, pp. 620-625.

[Murg90b] R. Murgai, private correspondence.

[Rose90] J. Rose, R. J. Francis, P. Chow, D. Lewis, "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency," IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, Oct. 1990, pp. 1217-1225.

[Xili89] XACT LCA Development System, Vol. II, Xilinx Inc., 1989.