Programmable Parallel Arithmetic in Cellular Automata Using a

advertisement
Complex Systems 8 (1994) 311-323
Programmable Parallel Arithmetic in
Cellular Automata Using a Particle Model
R ichard K . Squier
Computer Science Departm ent, Georgetown University,
Washi ngton , D C 20057, USA
Ken Steiglitz
Com p uter Science Department , Princet on University,
Prin cet on, N J 08544, USA
A bst ract. In thi s paper we show how to embed pr acti cal compu tat ion in one-dimensional cellular automata using a mod el of comput ation based on collisions of moving part icles. T he cellular automat a
have small neighbo rhoods, and state spaces t hat are bin ary occupancy
vecto rs. T hey can be fabricated in VLSI, and perh aps also in bulk
media t ha t support appropriate part icle prop agati on and collisions .
The model uses inject ed part icles to repr esent bot h dat a and processors. Consequentl y, realizations are highly progr amm abl e and do not
have applicat ion-specific to pology, in contras t to systolic arrays . We
describe several pr actical calculations t hat can be carr ied out in a
highly par allel way in a single cellular automa t on, including addit ion,
subt raction, multipli cat ion , arbit ra rily nest ed combinat ions of t hese
ope ra t ions, and finite-impulse-resp onse digit al filt ering of a signal arriving in a cont inuous st rea m. T hese are all accomplished in t ime
linear in t he num ber of inpu t bits, and wit h fixed-p oint arit hm et ic of
arbitrary pr ecision , independent of t he har dware.
1.
Introduction
Our goal in t h is pap er is t o ach ieve p ra ct ical compu tation in a uniform ,
sim ple, locally con necte d , hi gh ly p ar allel arch itect ure- in a way t ha t is also
progr ammable, and t her eby accom m od at es t he d iffer in g requir ements of a
variety of ap plica t ions . Syst olic arrays [22], of course , sa t isfy t he loca lly connected and parallel requirement s , b u t t he to pology a nd pr ocessor functionality are difficult t o m odify on ce t he m achine is bui lt . In t h is p aper we show
how to embe d pr acti cal com p u tation in on e-dim ension a l cellular a utomata
(CA s) usin g a mod el of com p utation t ha t is b as ed on collisions of m oving
p art icles [33]. T he resultin g (fixed) har dware com bin es t he parallelism of
systolic arrays with a high degr ee of progr am m ability.
312
Ri chard K. Squier and Ken St eiglitz
Fredkin , Toffoli, and Mar golus [19- 21] have explored the idea of the inelast ic billiar d-b all mod el for computat ion, which is computation-universal.
There is also a lar ge lit erature in lattice-gas aut omata (see [18], for example) , which use particle motion and collisions in CAs to simulate fluids. We
have shown [17] t ha t a general class of la tti ce-gas automa ta is comput ationun iversal.
Several recent papers have dealt wit h particle-like persist ent structures
in binary CAs and t heir relationship to p ersist ent structures like solito ns in
nonlinear different ial equa t ions [3, 8- 14] . It is fair to say t hat no one has yet
succeeded in usin g these "nat urally occur ring" particles to do useful compu t at ion . For inst an ce, t he line of work in [14] has succeeded in support ing
only t he simplest, almost trivial computation- binary ripp le-carry additio n ,
where t he dat a is pr esent ed to the CAs wit h the addend bit s int erleaved. The
reason for the difficulty is t hat t he behavior of t he par ticles is very difficult
to control and their proper ties only vaguely underst ood.
This paper describ es what we think is a promising method of using a particl e model and illust rat es it s applica t ion to several different computations.
Our basic method will be t o int roduce t he model of comp utation using part icles, and t hen realize t his mod el as a CA. Onc e the particles and their
interacti ons have been set in t he realization , t he computation is st ill highly
pr ogrammab le: t he pr ogram is det ermined by t he sequence of particles inject ed . In our examples, we build in enough particles and int eracti ons so
t ha t many different computations are possible in t he same machin e.
CAs for t he particle mod el can be realized in conventional VLSI , orand t his is much more spec ulat ive-in a bulk physical medium t hat supports
movin g persistent st ructures wit h the appropriate charact erist ics [1, 2, 47]. Withou t dist inguishing between these two sit uat ions, we call the CAs
or medium a substrate. We call the subst rate, the collect ion of particles it
supports , and their int eractions a Particle Ma chine (PM) .
The machin es in t his pap er are one-dimensional, and t hese PMs are
most clearl y applicable t o comput at ions that can be mapp ed well t o onedim ensional systolic arrays . T he computat ions given as examples in t his
pap er deal wit h basic binar y arit hmet ic and applications that use ar it hmet ic
in regul ar ways , like digit al filt ering. Our cur rent ongoing work is aimed
toward adding more parallel app lica tions, esp ecially signa l pro cessing, ot her
num erical computation , and combinatorial applications like st ring matching.
Alt hough our main goa l is par allelism , it is not hard t o show t hat there
are CAs of our typ e t hat are comput at ion-universal, by t he usual stratagem
of embedding an arbit rary Tu rin g machine in the medium and simulat ing it .
We omit det ails here.
On e imp or tant adva ntage of par allel comput ing using a PM is the homogeneity of t he implement ation . Once a par ticular set of particles and t heir
interacti ons are decided on , only the subst rat e suppo rt ing t hem needs to be
implement ed . Since t he cells of t he supporting substrate are all identical , a
great economy of scale b ecomes possible in t he implement ati on . In a sense,
Programmable Parallel A rithme tic in CAs Using a Particle Mod el
313
we have moved t he particular s of applicat ion-spec ific computer design to the
realm of software.
The results in this pap er show that P M-b ased computations inherit the
efficiency of syst olic arr ays, but t ake pl ace in a less spec ialized machin e.
Our examples include arbitrarily nest ed multiplication / addition and finiteimpulse res pons e (FIR) filt ering, all usin g the sa me cellular aut omato n realization, in time linear in the number of input bit s, and with arbit ra ry
pr ecision fixed-point op erands.
This pap er has three parts. F irst we describ e the model for parallel
computation based on t he collisions and int eractions of particles. Second , we
show how this model can be reali zed by CAs . Finally, we describe lin ear-time
implement ations for the following computat ions :
• binar y ad dit ion and su btrac t ion;
• binary multiplication;
• arbitrarily nest ed arit hme tic expressions t hat use t he op erations of additi on , subt rac t ion, and multiplication of fixed-point numbers; and
• digit al filt ering of a semi-infinite st ream of fixed-point numbers wit h a
F IR filter (with arbitrary fixed-point coefficients ) .
2.
The particle model
We begin wit h an informal desc ript ion of the mod el. Put simply, we want to
think of comp utation as b eing performed whe n particles movin g in a medium
collide and int eract in ways det ermined only by the particles' identities. As
we will see b elow, this kind of interacti on is easy to in corporat e into t he
framework of a CA .
Figur e 1 shows t he general sche me envisione d for a PM. We think of part icles as b eing inj ect ed at different speeds at the left end , colliding along the
(arbitraril y long) array wh ere they pr opagate, and finally pro ducing results.
We will not be concerne d here wit h retrieving t hose results, and for our purposes the comp utation is complete when the answers appear somew here in
the array. Of course t he distance of answers fro m t he input port is no worse
than lin ear in t he number of time ste ps that the comp utatio n takes.
In a real implem ent ation, we can eit her pro vid e taps along the array or
wait for results t o come out t he ot her end , however far away that may be. A
thi rd alte rnat ive is to send in a slowly moving "mirr or" particle defined so
as to reflect the answer particles back to the input. We can make sure t hat
particles injected
~--~=----~--=O-~---:=---D=---~--=-
to infinity -
Figure 1: The general model envisioned for a particle machine.
314
Richard K. Squier and Ken Steiglitz
reflecti on from the mirror produces particles that pass unharmed through
any particles earlier in the array.
We next show how such PMs can be embe dded in CA s in a natural way.
Not e t hat sometimes it is convenient to shift the speed of every particle
to change the fram e of reference. For example, in the following we will
someti mes assume t hat some particles are moving left , and others right ,
whereas in an actual implementati on all par ticl es might be moving right
at different spee ds . The collisions and t heir results are what matter. This
shift in t he fram e of reference is trivial in the abstrac t picture we have ju st
given , and will b e eas y t o compe nsate for in a CA implementation by shift ing
t he output window of the CA .
3.
Cellular automaton implementation
Informally, a CA is an array of cells, each in a state t hat evolves wit h time.
The state of cell i at time t + 1 is determined by the st ates of cells in a
n eighborh ood of cell i at t im e t, the neighborhood being defined as those cells
at a dist an ce r (t he radius) or less of cell i . Thus, t he neighb orh ood of a CA
wit h radius r contains k = 21' + 1 cells and includes cell i it self. When the
state space of a cell, S , is binary-valued- that is, when S = {a, l} - we call
t he CA a bina ry CA.
We will avoid t he difficul ti es of using "nat ur ally occurr ing" particles by
expanding the state space of the CA in a way t hat reflect s our picture of a
P M. T hink of eac h cell of t he CA as containing at any par ti cular t ime any
combinat ion of a given set of n particles. T hus , we can think of the st ate
as an occupan cy vect or, and the state space is therefore now S = {O, I} " .
Im plementing such a CA presents no part icular problems . We spec ify t he
nex t -state t ra nsit ions by a table t hat shows, for each combinat ion of particles
t hat can enter t he window, what the next state is for the cell in qu esti on.
Thus, for a CA wit h a neighb orhood of size k t hat supports n distinct par ticles, t here are in t heory 2 n k states of t he neighb orhood , and hence 2 n k rows
of t he t ransit ion table. Each row contains t he next state, a binary n-vect or
t hat encodes t he part icles t hat are now present at the site in qu estion. Onl y
a small fracti on of this space is usu ally needed for implement ati on , since we
are only int erest ed in row ent ries that correspond to states t hat can actually
occur. T his is an imp ortant point whe n we consider the practica l implementat ion of such CAs , either in software or har dwar e.
Figure 2 shows a simp le example of two particles traveling in oppos ite
dir ecti ons that collide and int eract. In t his case t he right- movin g particle
annihilates the left- movin g particle, and is itself transformed into a right mov ing part icle of a different type. It is easy to verify that the t ransit ion of
each cell from t he states rep res ented by t he particles present can be ens ured
by an ap pro priate state-trans it ion table for a neighb orh ood of radius l.
Next , we describe some nu merical operat ions that can be performed efficientl y in P Ms. We start with two simple ways to perform binary addit ion ,
an d then build up to more complicated examples. Our descrip tions will be
Programm able Parallel Ari thm etic in CAs Using a Particle Model
315
Figure 2: An examp le of a collision between a left- and right-moving
particle. Th e left-moving part icle is annihilated, and th e right-moving
part icle is transformed into a different kind of right-moving particle.
Th e neighborhood radius in this example is 1, as indicat ed in Row 4.
more or less informal , but t he more com plicate d ones have b een verified by
com pute r simulat ion . Furthermore, t hey can all b e implem ent ed in a single
PM wit h ab out 14 p art icles and a trans ition table representing a b out 150
rules. The examp les all use a neighborhood of rad ius 1.
4.
A dding bin a ry n umbers
F igure 3 shows on e way t o add binary numb ers . Each of the two addends are
represent ed by a sequence of particles, each p article representing a sing le bit.
T hus there are fou r par ticles used to represent dat a : left- and right-moving
as and I s. We will arrange them wit h leas t- significan t bi t leading, so th at
when t hey collide , t he bit s meet in order of incr easing significance . This
makes it easy to encode the carry bit in the st ate of t he processor p ar ti cle.
At t he cell where the da ta particles meet we place an inst an ce of a fifth
kind of particle, a stat ionary "processo r" p article, which we call p . This
processor p ar t icle is act ually one of two par ticles, say Po and PI, meant to
represent th e fact t hat the current carr y bit is eit her a or 1. The pr ocessor
left addend
processor
particle
o
o
right addend
............
....
• 0 • 0
•
Figure 3: Implementation of binary addition by having the two addends collide head-on at a single processor parti cle. T here are four
different dat a particles here, left- and right-moving Os and Is, represented by unfilled and filled circles. Th e processor particle is represented by the diamond , and may actually have several sta tes.
Richard K. Squier and Ken St eiglitz
316
par ticl e starts out as Po , to repr esent a 0 carry. The collision t abl e is defined
so that the followin g things happen at a processor particle wh en t here is a
right-moving data particle in the cell immediat ely adjacent to it on the left,
and a left-m ovin g dat a particle in t he cell immediat ely adjacent t o it on t he
right:
1. the two dat a particles are annihila te d;
2. a new, left- movin g "answer" particle is ejecte d ;
3. the identity of th e processor particle is set t o Po or Pl in the obv ious
way, to reflect the value of t he carry bit .
Not ice that we can use the sa me left-movin g par ticl es to represent the answer
bit s, as lon g as we make sure t hey pass t hrough any right- mov ing par ticl es
they encounte r .
We can t hink of the two vers ions of the pro cessor particle as two dist inct particles, alt hough t hey are really two "states" of the sa me part icle- in
some sense analogous to the ground and excite d states of at oms . T hus, we
can alte rnat ively t hink of the different pr ocessor par ticles collective ly as a
"processor at om," and t he Po and P: particles as different states of the sa me
ato m . Either way, t he processor ato m occupies two slots of the occupancy
vector t hat st ores the stat e. We call the particular state of a part icle it s
excit ation state or it s particle identity. We use similar termino logy for the
dat a atom, which has four states that represent binary value an d directi on.
It is not hard to see t hat t his is not the on ly way t o add , and we now
describ e anot her , to illustrat e some of t he tricks we have found useful in
designing PMs to do par ti cular tasks . On e addit ion met ho d may be better
than another in a given app lication , becau se of t he place ment of t he addends
or t he desired place ment of the sum. T he second addit ion method ass umes
t hat the addends are sto red on t op of one anot her, in dat a atoms that are
defined to be stat ionary. That is, t he dat a atoms of one addend are distinct
from those of t he ot her , so t hey can coexist in t he sa me cell and have speed
zero (see Fi gure 4). We can think of t his sit uation as st oring t he two nu mb ers
in par allel registe rs.
We now shoot a processor atom at the two addends , from t he leastsignificant -bit directi on. T he pr ocessor atom simply leaves the sum bit behind, and t akes t he carry bit wit h it self in the form of it s excitat ion state .
processor
particle
left addend
•
000
0 •
o •
•
0
•
•
right addend
Figure 4: A second met hod for adding binary numbers, using a processor atom t hat sweeps across two stationary addends stored in parallel
and takes t he carry bit with it in th e form of it s excitation state.
Programmable Parallel Arithmetic in CA s Using a Particle Model
left multiplicand
processor
particles
317
right multiplicand
DDDDDD
Figure 5: Two dat a streams colliding with a processor st ream to perform multiplication by bit-level convolut ion.
T his method reflect s t he hardwar e of a ripple-carry adder , and was used in
[14J.
Negation, and hence subt raction, is qui te easy to incorp orate int o the CA.
Just add a particle that complements data bits as it passes t hr ough t hem ,
and then add one . (We ass ume two 's-complement arithm et ic.)
5.
Mult iply in g binary numbers
T he preceding description should make it clear that new kinds of particles
can always be added t o a given CA implem ent at ion of a PM , and t hat the
proper ties we have used can be easily incorporated in the next-st ate t ab le.
Figur e 5 shows how a bit -level syst olic mult iplier [15, 16, 23, 24, 25, 30J
can be imp lemented by a P M. As in the adders , both data and processors
are represented by par ti cles. The two sequences of bits representing t he
multiplicands t ravel t oward each ot her and collide at a stat ionary row of
processor particles. At each three-way collision involving two data bit s and
one pr ocessor , the pr ocessor particle encodes the product bit in it s excitat ion
st ate and adds it to the product bit previously stored by it s state . Each rightmov ing data particle may be accompanied by a carry bit particle. When the
data bits have passed each ot her, the bit s of the pr oduct remain enco ded in
the st a tes of t he pr ocessor particles, lowest-order bit on t he left . Fi gure 6
shows t he out put of a simulation program written to verify correct op eration
of t he algorit hm .
6.
Nested arithmetic
It is easy enough t o "clean up " the op erands afte r addit ion or mult ipli cation
by sending in particles that annihilat e t he original data bits. This may
req uire sending in slow annihila tion particles first , so one op erand overtakes
and hits them after the multiplication . Therefore, we can arr ange addit ion
and multiplication so the results are in t he sa me form as th e inputs. This
makes it possible to imp lement ar bitrarily nest ed combinations of addit ion
an d mul ti plicati on wit h t he same degree of parallelism as sing le addition and
multiplica tion.
Fi gure 7 illust rat es t he gene ral idea . The product A x B is added to the
pro du ct C x D . The sequences of part icles representing the ope rations must
be spaced with some care, so that the collisions pro ducing the product finish
before the addition begin s.
Richard K. Squier and Ken Ste iglitz
318
R.
.R
R.
.R
R.
.R
R.
.R
R.
.R
R.
.R
R.
.R
R.
.R
.L
L.
.L
L.
.L
L.
.L
L.
.L
L.
.L
.L
L.
.L
p
P
P
P
P
P
P
P
p
p
p .L
p
pL.
p
p
p
p
p
P .L P
p
p
pL . p
pRo p . L p
p .L
pL.
p . R pL.
p
p . L i n. p . L p
pL. 1 .R pL . p
iR . 1 . L i R. p
1 . R iL . 1 . R p
pRo
1 .L eBR. 1
u , B . Rei
p .R
eBR. p
1
B
1
B
B . Rep
B
B
iR .
1
B
1 .R
1
B
1
1
B
B
.L
L.
.L
L.
.L
L.
.L
L.
R.
.R
R.
.R
R.
R.
F igure 6: CA implement a tion of PM sys t olic mult iplication , show n
with most- significant bit on th e right . Row t represents the CA states
at t ime t; t hat is, t ime goes from top to bo ttom. The symbol "R"
represents a right-movin g 1, "L" a left -movin g 1, and so for th. The
com putation is 3 x 3 = 9. T he dat a part icles keep going afte r they
inter act wit h t he pr ocessor particles, and would probab ly be clean ed
up in a pr acti cal sit uation .
[ A ~ B] .. [C • 0]
~
o
0 ••
~
00000000 0 0 0 •
v
<>
~ -- - - - - . - .._-'4.
• 0 • •
~
00000000 • 0 0 •
Fi gure 7: Illustrati on of nest ed computation. Shown is t he particle
st ream inj ect ed as input to a particle m achine . Collisions among the
gro up on t he left will produce the pro d uct A x B , wit h the resu lting product movin g right , and symmet rically for the gro up on the
right. T hen t he particles repr esenti ng the two products will collide a t
a st ation ary particle that repr esent s an ad der-processor , and the two
products will be ad ded , as in Figure 3. The outlined "+" and " x" represent the particle gro ups that pr oduce add it ion and mu lt iplication ,
resp ectively.
7.
Digit al filt ering
A mu lti plier-accumu lat or is implement ed in a PM by st oring each pro duc t
"in par allel" wit h the previous ly accumula te d sum, as in the second add it ion
met hod, which was illustra ted in Figur e 4. T he new pro du ct and t he old
sum are then added by sending t hrough a ripp le-carry act ivation par ticle,
which tr avels along wit h and immediat ely follows each coefficient (see F igur e 8) . This makes po ssible t he PM implement ation of a fixed-point FIR
filter by havin g a left -m ovin g inp ut signal st ream hit a right-moving coefficient stream . The multip lications ar e bit- level sys to lic, and t he filt ering
Programmable Parallel Arithmetic in CA s Using a Particle Model
{
+hO ~
outputs :
~ +h 1
,,
*
Yo
multiplier-accumulator
particle group
"' -
~ ~ + h2 ~
319
Xo ~ ~
,,
* -x , ~,,
~
*
x2
*
Ys
Figure 8: Particle machine implementation of an FIR filt er. The
coefficients hi collide with t he inpu t data words X t at multiplieraccumulator particle groups indi cated by "*" . Each such group consists of a stat ionary multiplier group t hat sto res its product and t he
accumulated sum. The ripp le-carry adder particles, indicated by "+",
accomp any each coefficient hi and activate th e addition of the product to the accumulate d sum after each produ ct is formed. Finally,
on t he ext reme left is a right -moving "propeller" particle t hat t ravels
t hrough the results and propels t hem leftward , where t hey leave t he
machine.
convolut ion is word-level sys tolic , so t he ent ire ca lculat ion m irror s a two level systolic arr ay mult iplier [15, 16, 23, 24, 25, 30]. T he det ails are ted ious ,
but simple examples of such a filt er have b een simulated and t he idea verified .
8.
F eedba ck
So far we have int erpret ed the PM subs t rate as strict ly on e-d im ensional, wit h
particles passing "t hro ugh" one another if t hey do not int eract . However , we
ca n eas ily change our po int of view and int erpret the subs t rate as having
limit ed exte nt in a second dim ension , and t hink of parti cles as traveling on
differ ent t racks. We ca n gro up the comp onents of t he CA state vector and
int erpret each gro up as holdi ng part icles t hat share a comm on t rack. All
t racks op erate in p ar allel, and we ca n design sp ecial particles t hat ca use
ot her p art icles on different t racks to int er act . The entire CA itself ca n t he n
b e t ho ug ht of as a fixed-wid th bus, one wire per t rack.
To illust ra t e t he u tility of thinking t his way, cons ider t he problem of
im plem entin g a digit al filt er with a feedback loop . Su ppose we h ave an input
st rea m m oving to t he left. We ca n send t he output stream of a comp utat ion
back to t he right alon g a parallel track and have it interact wit h t he input
st ream at the processor particles doing t he filter arit hmet ic. T his feedback
t rack requires t wo bits, one for a "0" and one for a "I" m oving to t he right. In
order t o copy t he out put stream to t he feed back track we n eed an addit ional
particle, which can b e t ho ught of as t raveling on a t hird track. T his "copying"
pa rticle ca n b e gro up ed wit h t he ot he r processor par t icles and requires one
addit ional bit in t he CA state vector. Altoget her we need t hree addit ional
bi t s in t he state vect or to im plem ent feedback pro cessing .
In t his mult i-track scheme, each extra t rack enlarg es t he sta te space, but
adds progr ammin g flexib ility to t he mo de l. Ult im at ely, it is t he size of t he
collision table in a ny par ticular implement ati on t hat det ermin es t he numbe r
320
Rich ard K. Squier and Ken Steiglitz
of tr acks t hat are pract ical. We ant icipate that only a few ext ra t racks will
make P Ms more flexibl e an d easier to progra m.
9.
C onclusions
We t hink t he P M fram ework described here is int eresting from both a pr act ical and a t heoret ical po int of view. For one t hing, it invit es us t o think
about comp ut at ion in a new way, on e t hat has some of t he feat ures of systo lic arrays , bu t at the sa me t ime is more flexible and not ti ed closely to
a particular hardw are configuration . Onc e a subs trate and it s par ticl es are
defined , spe cifying st reams of particles to inj ect for par ti cular comput at ions
is a programming t ask, not a hard ware design t ask, alt ho ugh it is st rongly
cons traine d by t he one-di mensional geometry and t he interacti on rul es we
have create d. T he work on map pin g algor it hms to syst olic arr ays [31, 32]
may help us find a good lan gu age and build an effect ive compiler .
From a pract ical po int of view, the approach presented here could lead
to new kinds of hardware for highl y par allel comp utat ion, using VLSI imp lement ati ons of CAs. Su ch CAs would be dr iven by convent ional compute rs
to generat e t he st reams of input particl es. T he CA s t hemselves would b e
design ed to support wide classes of computations.
T here are many int eresting ope n research quest ions concern ing t he design of particle sets and int eracti ons for PMs. Of course we are espe cially
int erest ed in find ing PMs wit h small or easily implement ed collision t ables,
which are as com p utationally rich and efficient as po ssible.
Ou r curr ent ongoing work [34, 35] is also aimed at developin g more applicati ons, includin g it erati ve numerical computat ions and combinato rial pro blems such as st ring matching [26-29].
Acknowledgments
T his work was sup porte d in par t by NSF gr an t MIP -9201484 , and by a grant
from Georgetown University.
R eferences
[1] S. A. Sm it h , R. C. Watt, and S. R. Hameroff, "Cellular Au t om at a in Cyto skelet al La ttices ," Physica D, 10 (198 4) 168- 174.
[2] A. J. Heeger , S. Kivelson , J . R. Schr ieffer , a nd W. -P. Su , "Solit ons in Conduction P olymers ," R eviews of Modern Ph ysics, 60 (3) (1988) 781-850.
[3] N. Islam, J. P. Singh, and K. St eiglit z, "Solit on P hase Shifts in a Dissipati ve
Lat t ice," Journ al of Applied Ph ysics, 6 2(2) (1987) 689-693 .
[4] F . L. Carte r , "T he Molecular Device Comp ute r : P oint of Depar ture for Larg e
Scale Cellular Auto m at a ," Ph ysica D, 10 (1984) 175- 194.
Programmable Parallel Arithmetic in CA s Using a Particle M od el
321
[5] J. R. Milch , "Compute rs Based on Molecular Im plem entations of Cellu lar
Autom at a," pres ented at Third International Sympo sium on Mol ecular El ectronic Devices, Arl ington VA, Octo b er 7, 1986.
[6] S. Lloyd , "A Pot entially Realiza ble Qua nt um Com pute r," Science, 261(17)
(1993) 1569-1 571.
[7] C . Halvorson , A. Hays, B . Kr aab el, R. Wu , F . Wudl, and A . Heeger , "A
160-Femtosecond Optical Im age P rocessor Based on a Conjugated P olym er ,"
S cience, 265(26) (1994) 1215-1216.
[8] J. K. P ark, K. St eiglit z, and W . P . T hurs ton , "Solit on-Like Beh avior in Aut om at a ," Phy sica D, 19 (1986) 423-432 .
[9] C. H. Goldb erg, "P arity F ilte r Automat a," Complex Sy st em s, 2 (1988) 91141.
[10] A. S. Fokas , E. P ap ado poulou, and Y. Saridaki s, "P articles in Soliton Cellular
Autom ata ," Com plex Syst em s, 3 (1989) 615-633.
[11] A. S. Fokas , E . P ap ad op ou lou , a nd Y. Saridaki s, "Cohere nt Structures in
Cellular Autom ata ," Phy sics Lett ers A , 147 (1990) 369- 379 .
[12] M. Bruschi , P . M. Santini, and O. Ragnisco, "Int egra ble Cellular Au tomata ,"
Phy sics Lett ers A, 169 (1992) 151-160 .
[13] M. J . Ab lowit z, J . M . K eiser , and L. A . Ta khtajan , "Class of St able Mu lti stat e
T ime- Rever sib le Cellul ar Au tom ata with Ri ch Part icle Cont ent ," Phy sical
R eview A , 44 (1991) 6909- 6912.
.r
[14] K. St eiglit z, I. K am al, and A. Watson , "E mbe dd ing Computation in On eDimen sional Autom ata by Phase Coding Solit ons," IEEE Transactions on
Com puters , 37 (1988) 138-145.
[15] P. R. Cappello , "Towards an FIR F ilt er Tissu e," pages 276-279 in Proceedings
of th e International Confere nce on A coustics, Speech, and Signal Processing,
Tampa , FL, Mar ch 1985.
[16] J. V . McCanny, J . G. McWhir ter , J. B . G. Roberts, D . J. D ay, a nd T . L.
Thorp, "Bit Level Syst olic Arrays," Proceedings of th e Fift eenth A siloma r
Conference on Circuit s, Sy st em s f3 Computers , Novemb er , 1981.
[17] R. Squier a nd K. St eiglitz, "T wo-Dimens ional F HP Latti ce-Gases Ar e Computation Uni ver sal," Compl ex Sy st em s, 7 (1993) 297- 307.
[18] U. Fri sch , D . cl'Humier es, B . Hasslacher , P. Lallem and, Y. P omeau , and J. P.
Rivet , "Lat t ice Gas Hydrodynamics in Two and Three Dimensions," Compl ex
Sy st em s, 1 (1987) 649-707.
[19] C . H . Bennett , "Note s on the Hist or y of Reversib le Com putation," IBM Journal of Research and Development , 32(1) (1988) 16-23.
322
Richard K. Squier and Ken Steiglitz
[20] N . Margolu s, "P hysics-Like Mode ls of Computations ," Phys ica D, 10 (1984)
81-95.
[21] T. Toffoli and N. Margolus, Cellular Automata Machine s: A N ew Environ m ent for Modeling (Cambridge: MIT Press, 1987) .
[22] H . T. Kung , "W hy Systolic Architectures?" IEEE Transactions on Computers, 1 5 (1982) 37-46.
[23] H. T. K un g, L. M. Ruan e, and D. W. L. Yen , "A Two-Level P ip elined Systolic
Array for Convolutions," pages 255-264 in CMU Conference on VLSI Systems
and Computations, edite d by H. T. Kung , B. Sproull, a nd G. St eele (Ro ckville,
MD: Comput er Science Press, 1981).
[24] S. Y. Ku ng, VLSI Array Processors (E nglewood Cliffs, NJ: Prent ice Ha ll,
1988).
[25] P. R. Cappello and K St eiglit z, "Digit al Signal Processing Applications of
Systolic Algorithms,." pages 245- 254 in CMU Conference on VLSI Systems
and Computation s, edite d by H . T . Kun g, B. Sproull, and G . Steele (Roc kville,
MD: Comput er Science Press, 1981).
[26] H.-H. Liu and K-S . Fu , "VLSI Arrays for Minimum-Distance Classifications ,"
in VLSI for Pattern Recognition and Im age Processing , edited by K-S. Fu
(Berlin: Sp ing er- Verlag , 1984) .
[27] R. J . Lipton and D. Lopr esti , "A Systolic Array for Rapi d String Comparison ," pages 363-376 in 1985 Chapel Hill Conference on Very Large S cale
Integration, edite d by Henry Fuchs (Ro ckville, MD: Computer Scienc e Press,
1985).
[28] R. J . Lipton and D . Loprest i, "Comparing Lon g Strings on a Short Systolic
Array," 1986 In ternational Work shop on Systo lic Arrays, Ox ford University,
July 2-4, 1986.
[29] G . M. Landau and U . Vishkin , "Int roducing Efficient P ara llelism into Approximate String Matching and a New Serial Algorithm," ACM STOC, (1986)
220-230.
[30] F. T. Leighton, Introduction to Parall el Algorithms and Archit ectures (San
Mateo, CA : Morgan Ka ufman, 1992).
[31] P . R. Cappello and K St eiglitz , "Unifying VLSI Array Design wit h Linear
Transformations of Space-Time," pages 23-65 in Advances in Computing Research: VLSI Th eory, edite d by F . P. Preparata (Greenwich, CT: JAI P ress,
1984).
[32] D. 1. Moldovon and J . A. B. Fortes, "P art it ioning and Mapping Algorithms
into Fixed Systolic Arrays," IEEE Transact ions on Computers, C -35(1)
(1986) 1- 12.
Programmable Parallel Arit hmetic in CAs Using a Particl e Mode l
323
[33] R. K. Squ ier and K. St eiglit z, "Subatomic P article Mach ines: Parallel Processing in Bulk Material ," submit te d to Signal Processing Lett ers.
[34] R . K. Squ ier , K. Ste iglit z, and M. H. Jakubowski, "Implementation of P ara llel Ar ithmet ic in a Cellu lar Automaton," 1995 Intern ation al Conference on
Application Specific Array Processors, Strasbourg, France, July 24-26, 1995.
[35] R. K. Squ ier , K. Steiglit z, and M. H . J akubowski, "General P ar allel Comput ation Wit ho ut CPUs : VLSI Realization of a Particle Machine," Technic al
Report CS-TR-484-9 5, Comput er Science Department , P rinceton University
(February 1995). Submitted to IEEE Transactions on Computers.
Download