Complex Systems 8 (1994) 311-323 Programmable Parallel Arithmetic in Cellular Automata Using a Particle Model R ichard K . Squier Computer Science Departm ent, Georgetown University, Washi ngton , D C 20057, USA Ken Steiglitz Com p uter Science Department , Princet on University, Prin cet on, N J 08544, USA A bst ract. In thi s paper we show how to embed pr acti cal compu tat ion in one-dimensional cellular automata using a mod el of comput ation based on collisions of moving part icles. T he cellular automat a have small neighbo rhoods, and state spaces t hat are bin ary occupancy vecto rs. T hey can be fabricated in VLSI, and perh aps also in bulk media t ha t support appropriate part icle prop agati on and collisions . The model uses inject ed part icles to repr esent bot h dat a and processors. Consequentl y, realizations are highly progr amm abl e and do not have applicat ion-specific to pology, in contras t to systolic arrays . We describe several pr actical calculations t hat can be carr ied out in a highly par allel way in a single cellular automa t on, including addit ion, subt raction, multipli cat ion , arbit ra rily nest ed combinat ions of t hese ope ra t ions, and finite-impulse-resp onse digit al filt ering of a signal arriving in a cont inuous st rea m. T hese are all accomplished in t ime linear in t he num ber of inpu t bits, and wit h fixed-p oint arit hm et ic of arbitrary pr ecision , independent of t he har dware. 1. Introduction Our goal in t h is pap er is t o ach ieve p ra ct ical compu tation in a uniform , sim ple, locally con necte d , hi gh ly p ar allel arch itect ure- in a way t ha t is also progr ammable, and t her eby accom m od at es t he d iffer in g requir ements of a variety of ap plica t ions . Syst olic arrays [22], of course , sa t isfy t he loca lly connected and parallel requirement s , b u t t he to pology a nd pr ocessor functionality are difficult t o m odify on ce t he m achine is bui lt . In t h is p aper we show how to embe d pr acti cal com p u tation in on e-dim ension a l cellular a utomata (CA s) usin g a mod el of com p utation t ha t is b as ed on collisions of m oving p art icles [33]. T he resultin g (fixed) har dware com bin es t he parallelism of systolic arrays with a high degr ee of progr am m ability. 312 Ri chard K. Squier and Ken St eiglitz Fredkin , Toffoli, and Mar golus [19- 21] have explored the idea of the inelast ic billiar d-b all mod el for computat ion, which is computation-universal. There is also a lar ge lit erature in lattice-gas aut omata (see [18], for example) , which use particle motion and collisions in CAs to simulate fluids. We have shown [17] t ha t a general class of la tti ce-gas automa ta is comput ationun iversal. Several recent papers have dealt wit h particle-like persist ent structures in binary CAs and t heir relationship to p ersist ent structures like solito ns in nonlinear different ial equa t ions [3, 8- 14] . It is fair to say t hat no one has yet succeeded in usin g these "nat urally occur ring" particles to do useful compu t at ion . For inst an ce, t he line of work in [14] has succeeded in support ing only t he simplest, almost trivial computation- binary ripp le-carry additio n , where t he dat a is pr esent ed to the CAs wit h the addend bit s int erleaved. The reason for the difficulty is t hat t he behavior of t he par ticles is very difficult to control and their proper ties only vaguely underst ood. This paper describ es what we think is a promising method of using a particl e model and illust rat es it s applica t ion to several different computations. Our basic method will be t o int roduce t he model of comp utation using part icles, and t hen realize t his mod el as a CA. Onc e the particles and their interacti ons have been set in t he realization , t he computation is st ill highly pr ogrammab le: t he pr ogram is det ermined by t he sequence of particles inject ed . In our examples, we build in enough particles and int eracti ons so t ha t many different computations are possible in t he same machin e. CAs for t he particle mod el can be realized in conventional VLSI , orand t his is much more spec ulat ive-in a bulk physical medium t hat supports movin g persistent st ructures wit h the appropriate charact erist ics [1, 2, 47]. Withou t dist inguishing between these two sit uat ions, we call the CAs or medium a substrate. We call the subst rate, the collect ion of particles it supports , and their int eractions a Particle Ma chine (PM) . The machin es in t his pap er are one-dimensional, and t hese PMs are most clearl y applicable t o comput at ions that can be mapp ed well t o onedim ensional systolic arrays . T he computat ions given as examples in t his pap er deal wit h basic binar y arit hmet ic and applications that use ar it hmet ic in regul ar ways , like digit al filt ering. Our cur rent ongoing work is aimed toward adding more parallel app lica tions, esp ecially signa l pro cessing, ot her num erical computation , and combinatorial applications like st ring matching. Alt hough our main goa l is par allelism , it is not hard t o show t hat there are CAs of our typ e t hat are comput at ion-universal, by t he usual stratagem of embedding an arbit rary Tu rin g machine in the medium and simulat ing it . We omit det ails here. On e imp or tant adva ntage of par allel comput ing using a PM is the homogeneity of t he implement ation . Once a par ticular set of particles and t heir interacti ons are decided on , only the subst rat e suppo rt ing t hem needs to be implement ed . Since t he cells of t he supporting substrate are all identical , a great economy of scale b ecomes possible in t he implement ati on . In a sense, Programmable Parallel A rithme tic in CAs Using a Particle Mod el 313 we have moved t he particular s of applicat ion-spec ific computer design to the realm of software. The results in this pap er show that P M-b ased computations inherit the efficiency of syst olic arr ays, but t ake pl ace in a less spec ialized machin e. Our examples include arbitrarily nest ed multiplication / addition and finiteimpulse res pons e (FIR) filt ering, all usin g the sa me cellular aut omato n realization, in time linear in the number of input bit s, and with arbit ra ry pr ecision fixed-point op erands. This pap er has three parts. F irst we describ e the model for parallel computation based on t he collisions and int eractions of particles. Second , we show how this model can be reali zed by CAs . Finally, we describe lin ear-time implement ations for the following computat ions : • binar y ad dit ion and su btrac t ion; • binary multiplication; • arbitrarily nest ed arit hme tic expressions t hat use t he op erations of additi on , subt rac t ion, and multiplication of fixed-point numbers; and • digit al filt ering of a semi-infinite st ream of fixed-point numbers wit h a F IR filter (with arbitrary fixed-point coefficients ) . 2. The particle model We begin wit h an informal desc ript ion of the mod el. Put simply, we want to think of comp utation as b eing performed whe n particles movin g in a medium collide and int eract in ways det ermined only by the particles' identities. As we will see b elow, this kind of interacti on is easy to in corporat e into t he framework of a CA . Figur e 1 shows t he general sche me envisione d for a PM. We think of part icles as b eing inj ect ed at different speeds at the left end , colliding along the (arbitraril y long) array wh ere they pr opagate, and finally pro ducing results. We will not be concerne d here wit h retrieving t hose results, and for our purposes the comp utation is complete when the answers appear somew here in the array. Of course t he distance of answers fro m t he input port is no worse than lin ear in t he number of time ste ps that the comp utatio n takes. In a real implem ent ation, we can eit her pro vid e taps along the array or wait for results t o come out t he ot her end , however far away that may be. A thi rd alte rnat ive is to send in a slowly moving "mirr or" particle defined so as to reflect the answer particles back to the input. We can make sure t hat particles injected ~--~=----~--=O-~---:=---D=---~--=- to infinity - Figure 1: The general model envisioned for a particle machine. 314 Richard K. Squier and Ken Steiglitz reflecti on from the mirror produces particles that pass unharmed through any particles earlier in the array. We next show how such PMs can be embe dded in CA s in a natural way. Not e t hat sometimes it is convenient to shift the speed of every particle to change the fram e of reference. For example, in the following we will someti mes assume t hat some particles are moving left , and others right , whereas in an actual implementati on all par ticl es might be moving right at different spee ds . The collisions and t heir results are what matter. This shift in t he fram e of reference is trivial in the abstrac t picture we have ju st given , and will b e eas y t o compe nsate for in a CA implementation by shift ing t he output window of the CA . 3. Cellular automaton implementation Informally, a CA is an array of cells, each in a state t hat evolves wit h time. The state of cell i at time t + 1 is determined by the st ates of cells in a n eighborh ood of cell i at t im e t, the neighborhood being defined as those cells at a dist an ce r (t he radius) or less of cell i . Thus, t he neighb orh ood of a CA wit h radius r contains k = 21' + 1 cells and includes cell i it self. When the state space of a cell, S , is binary-valued- that is, when S = {a, l} - we call t he CA a bina ry CA. We will avoid t he difficul ti es of using "nat ur ally occurr ing" particles by expanding the state space of the CA in a way t hat reflect s our picture of a P M. T hink of eac h cell of t he CA as containing at any par ti cular t ime any combinat ion of a given set of n particles. T hus , we can think of the st ate as an occupan cy vect or, and the state space is therefore now S = {O, I} " . Im plementing such a CA presents no part icular problems . We spec ify t he nex t -state t ra nsit ions by a table t hat shows, for each combinat ion of particles t hat can enter t he window, what the next state is for the cell in qu esti on. Thus, for a CA wit h a neighb orhood of size k t hat supports n distinct par ticles, t here are in t heory 2 n k states of t he neighb orhood , and hence 2 n k rows of t he t ransit ion table. Each row contains t he next state, a binary n-vect or t hat encodes t he part icles t hat are now present at the site in qu estion. Onl y a small fracti on of this space is usu ally needed for implement ati on , since we are only int erest ed in row ent ries that correspond to states t hat can actually occur. T his is an imp ortant point whe n we consider the practica l implementat ion of such CAs , either in software or har dwar e. Figure 2 shows a simp le example of two particles traveling in oppos ite dir ecti ons that collide and int eract. In t his case t he right- movin g particle annihilates the left- movin g particle, and is itself transformed into a right mov ing part icle of a different type. It is easy to verify that the t ransit ion of each cell from t he states rep res ented by t he particles present can be ens ured by an ap pro priate state-trans it ion table for a neighb orh ood of radius l. Next , we describe some nu merical operat ions that can be performed efficientl y in P Ms. We start with two simple ways to perform binary addit ion , an d then build up to more complicated examples. Our descrip tions will be Programm able Parallel Ari thm etic in CAs Using a Particle Model 315 Figure 2: An examp le of a collision between a left- and right-moving particle. Th e left-moving part icle is annihilated, and th e right-moving part icle is transformed into a different kind of right-moving particle. Th e neighborhood radius in this example is 1, as indicat ed in Row 4. more or less informal , but t he more com plicate d ones have b een verified by com pute r simulat ion . Furthermore, t hey can all b e implem ent ed in a single PM wit h ab out 14 p art icles and a trans ition table representing a b out 150 rules. The examp les all use a neighborhood of rad ius 1. 4. A dding bin a ry n umbers F igure 3 shows on e way t o add binary numb ers . Each of the two addends are represent ed by a sequence of particles, each p article representing a sing le bit. T hus there are fou r par ticles used to represent dat a : left- and right-moving as and I s. We will arrange them wit h leas t- significan t bi t leading, so th at when t hey collide , t he bit s meet in order of incr easing significance . This makes it easy to encode the carry bit in the st ate of t he processor p ar ti cle. At t he cell where the da ta particles meet we place an inst an ce of a fifth kind of particle, a stat ionary "processo r" p article, which we call p . This processor p ar t icle is act ually one of two par ticles, say Po and PI, meant to represent th e fact t hat the current carr y bit is eit her a or 1. The pr ocessor left addend processor particle o o right addend ............ .... • 0 • 0 • Figure 3: Implementation of binary addition by having the two addends collide head-on at a single processor parti cle. T here are four different dat a particles here, left- and right-moving Os and Is, represented by unfilled and filled circles. Th e processor particle is represented by the diamond , and may actually have several sta tes. Richard K. Squier and Ken St eiglitz 316 par ticl e starts out as Po , to repr esent a 0 carry. The collision t abl e is defined so that the followin g things happen at a processor particle wh en t here is a right-moving data particle in the cell immediat ely adjacent to it on the left, and a left-m ovin g dat a particle in t he cell immediat ely adjacent t o it on t he right: 1. the two dat a particles are annihila te d; 2. a new, left- movin g "answer" particle is ejecte d ; 3. the identity of th e processor particle is set t o Po or Pl in the obv ious way, to reflect the value of t he carry bit . Not ice that we can use the sa me left-movin g par ticl es to represent the answer bit s, as lon g as we make sure t hey pass t hrough any right- mov ing par ticl es they encounte r . We can t hink of the two vers ions of the pro cessor particle as two dist inct particles, alt hough t hey are really two "states" of the sa me part icle- in some sense analogous to the ground and excite d states of at oms . T hus, we can alte rnat ively t hink of the different pr ocessor par ticles collective ly as a "processor at om," and t he Po and P: particles as different states of the sa me ato m . Either way, t he processor ato m occupies two slots of the occupancy vector t hat st ores the stat e. We call the particular state of a part icle it s excit ation state or it s particle identity. We use similar termino logy for the dat a atom, which has four states that represent binary value an d directi on. It is not hard to see t hat t his is not the on ly way t o add , and we now describ e anot her , to illustrat e some of t he tricks we have found useful in designing PMs to do par ti cular tasks . On e addit ion met ho d may be better than another in a given app lication , becau se of t he place ment of t he addends or t he desired place ment of the sum. T he second addit ion method ass umes t hat the addends are sto red on t op of one anot her, in dat a atoms that are defined to be stat ionary. That is, t he dat a atoms of one addend are distinct from those of t he ot her , so t hey can coexist in t he sa me cell and have speed zero (see Fi gure 4). We can think of t his sit uation as st oring t he two nu mb ers in par allel registe rs. We now shoot a processor atom at the two addends , from t he leastsignificant -bit directi on. T he pr ocessor atom simply leaves the sum bit behind, and t akes t he carry bit wit h it self in the form of it s excitat ion state . processor particle left addend • 000 0 • o • • 0 • • right addend Figure 4: A second met hod for adding binary numbers, using a processor atom t hat sweeps across two stationary addends stored in parallel and takes t he carry bit with it in th e form of it s excitation state. Programmable Parallel Arithmetic in CA s Using a Particle Model left multiplicand processor particles 317 right multiplicand DDDDDD Figure 5: Two dat a streams colliding with a processor st ream to perform multiplication by bit-level convolut ion. T his method reflect s t he hardwar e of a ripple-carry adder , and was used in [14J. Negation, and hence subt raction, is qui te easy to incorp orate int o the CA. Just add a particle that complements data bits as it passes t hr ough t hem , and then add one . (We ass ume two 's-complement arithm et ic.) 5. Mult iply in g binary numbers T he preceding description should make it clear that new kinds of particles can always be added t o a given CA implem ent at ion of a PM , and t hat the proper ties we have used can be easily incorporated in the next-st ate t ab le. Figur e 5 shows how a bit -level syst olic mult iplier [15, 16, 23, 24, 25, 30J can be imp lemented by a P M. As in the adders , both data and processors are represented by par ti cles. The two sequences of bits representing t he multiplicands t ravel t oward each ot her and collide at a stat ionary row of processor particles. At each three-way collision involving two data bit s and one pr ocessor , the pr ocessor particle encodes the product bit in it s excitat ion st ate and adds it to the product bit previously stored by it s state . Each rightmov ing data particle may be accompanied by a carry bit particle. When the data bits have passed each ot her, the bit s of the pr oduct remain enco ded in the st a tes of t he pr ocessor particles, lowest-order bit on t he left . Fi gure 6 shows t he out put of a simulation program written to verify correct op eration of t he algorit hm . 6. Nested arithmetic It is easy enough t o "clean up " the op erands afte r addit ion or mult ipli cation by sending in particles that annihilat e t he original data bits. This may req uire sending in slow annihila tion particles first , so one op erand overtakes and hits them after the multiplication . Therefore, we can arr ange addit ion and multiplication so the results are in t he sa me form as th e inputs. This makes it possible to imp lement ar bitrarily nest ed combinations of addit ion an d mul ti plicati on wit h t he same degree of parallelism as sing le addition and multiplica tion. Fi gure 7 illust rat es t he gene ral idea . The product A x B is added to the pro du ct C x D . The sequences of part icles representing the ope rations must be spaced with some care, so that the collisions pro ducing the product finish before the addition begin s. Richard K. Squier and Ken Ste iglitz 318 R. .R R. .R R. .R R. .R R. .R R. .R R. .R R. .R .L L. .L L. .L L. .L L. .L L. .L .L L. .L p P P P P P P P p p p .L p pL. p p p p p P .L P p p pL . p pRo p . L p p .L pL. p . R pL. p p . L i n. p . L p pL. 1 .R pL . p iR . 1 . L i R. p 1 . R iL . 1 . R p pRo 1 .L eBR. 1 u , B . Rei p .R eBR. p 1 B 1 B B . Rep B B iR . 1 B 1 .R 1 B 1 1 B B .L L. .L L. .L L. .L L. R. .R R. .R R. R. F igure 6: CA implement a tion of PM sys t olic mult iplication , show n with most- significant bit on th e right . Row t represents the CA states at t ime t; t hat is, t ime goes from top to bo ttom. The symbol "R" represents a right-movin g 1, "L" a left -movin g 1, and so for th. The com putation is 3 x 3 = 9. T he dat a part icles keep going afte r they inter act wit h t he pr ocessor particles, and would probab ly be clean ed up in a pr acti cal sit uation . [ A ~ B] .. [C • 0] ~ o 0 •• ~ 00000000 0 0 0 • v <> ~ -- - - - - . - .._-'4. • 0 • • ~ 00000000 • 0 0 • Fi gure 7: Illustrati on of nest ed computation. Shown is t he particle st ream inj ect ed as input to a particle m achine . Collisions among the gro up on t he left will produce the pro d uct A x B , wit h the resu lting product movin g right , and symmet rically for the gro up on the right. T hen t he particles repr esenti ng the two products will collide a t a st ation ary particle that repr esent s an ad der-processor , and the two products will be ad ded , as in Figure 3. The outlined "+" and " x" represent the particle gro ups that pr oduce add it ion and mu lt iplication , resp ectively. 7. Digit al filt ering A mu lti plier-accumu lat or is implement ed in a PM by st oring each pro duc t "in par allel" wit h the previous ly accumula te d sum, as in the second add it ion met hod, which was illustra ted in Figur e 4. T he new pro du ct and t he old sum are then added by sending t hrough a ripp le-carry act ivation par ticle, which tr avels along wit h and immediat ely follows each coefficient (see F igur e 8) . This makes po ssible t he PM implement ation of a fixed-point FIR filter by havin g a left -m ovin g inp ut signal st ream hit a right-moving coefficient stream . The multip lications ar e bit- level sys to lic, and t he filt ering Programmable Parallel Arithmetic in CA s Using a Particle Model { +hO ~ outputs : ~ +h 1 ,, * Yo multiplier-accumulator particle group "' - ~ ~ + h2 ~ 319 Xo ~ ~ ,, * -x , ~,, ~ * x2 * Ys Figure 8: Particle machine implementation of an FIR filt er. The coefficients hi collide with t he inpu t data words X t at multiplieraccumulator particle groups indi cated by "*" . Each such group consists of a stat ionary multiplier group t hat sto res its product and t he accumulated sum. The ripp le-carry adder particles, indicated by "+", accomp any each coefficient hi and activate th e addition of the product to the accumulate d sum after each produ ct is formed. Finally, on t he ext reme left is a right -moving "propeller" particle t hat t ravels t hrough the results and propels t hem leftward , where t hey leave t he machine. convolut ion is word-level sys tolic , so t he ent ire ca lculat ion m irror s a two level systolic arr ay mult iplier [15, 16, 23, 24, 25, 30]. T he det ails are ted ious , but simple examples of such a filt er have b een simulated and t he idea verified . 8. F eedba ck So far we have int erpret ed the PM subs t rate as strict ly on e-d im ensional, wit h particles passing "t hro ugh" one another if t hey do not int eract . However , we ca n eas ily change our po int of view and int erpret the subs t rate as having limit ed exte nt in a second dim ension , and t hink of parti cles as traveling on differ ent t racks. We ca n gro up the comp onents of t he CA state vector and int erpret each gro up as holdi ng part icles t hat share a comm on t rack. All t racks op erate in p ar allel, and we ca n design sp ecial particles t hat ca use ot her p art icles on different t racks to int er act . The entire CA itself ca n t he n b e t ho ug ht of as a fixed-wid th bus, one wire per t rack. To illust ra t e t he u tility of thinking t his way, cons ider t he problem of im plem entin g a digit al filt er with a feedback loop . Su ppose we h ave an input st rea m m oving to t he left. We ca n send t he output stream of a comp utat ion back to t he right alon g a parallel track and have it interact wit h t he input st ream at the processor particles doing t he filter arit hmet ic. T his feedback t rack requires t wo bits, one for a "0" and one for a "I" m oving to t he right. In order t o copy t he out put stream to t he feed back track we n eed an addit ional particle, which can b e t ho ught of as t raveling on a t hird track. T his "copying" pa rticle ca n b e gro up ed wit h t he ot he r processor par t icles and requires one addit ional bit in t he CA state vector. Altoget her we need t hree addit ional bi t s in t he state vect or to im plem ent feedback pro cessing . In t his mult i-track scheme, each extra t rack enlarg es t he sta te space, but adds progr ammin g flexib ility to t he mo de l. Ult im at ely, it is t he size of t he collision table in a ny par ticular implement ati on t hat det ermin es t he numbe r 320 Rich ard K. Squier and Ken Steiglitz of tr acks t hat are pract ical. We ant icipate that only a few ext ra t racks will make P Ms more flexibl e an d easier to progra m. 9. C onclusions We t hink t he P M fram ework described here is int eresting from both a pr act ical and a t heoret ical po int of view. For one t hing, it invit es us t o think about comp ut at ion in a new way, on e t hat has some of t he feat ures of systo lic arrays , bu t at the sa me t ime is more flexible and not ti ed closely to a particular hardw are configuration . Onc e a subs trate and it s par ticl es are defined , spe cifying st reams of particles to inj ect for par ti cular comput at ions is a programming t ask, not a hard ware design t ask, alt ho ugh it is st rongly cons traine d by t he one-di mensional geometry and t he interacti on rul es we have create d. T he work on map pin g algor it hms to syst olic arr ays [31, 32] may help us find a good lan gu age and build an effect ive compiler . From a pract ical po int of view, the approach presented here could lead to new kinds of hardware for highl y par allel comp utat ion, using VLSI imp lement ati ons of CAs. Su ch CAs would be dr iven by convent ional compute rs to generat e t he st reams of input particl es. T he CA s t hemselves would b e design ed to support wide classes of computations. T here are many int eresting ope n research quest ions concern ing t he design of particle sets and int eracti ons for PMs. Of course we are espe cially int erest ed in find ing PMs wit h small or easily implement ed collision t ables, which are as com p utationally rich and efficient as po ssible. Ou r curr ent ongoing work [34, 35] is also aimed at developin g more applicati ons, includin g it erati ve numerical computat ions and combinato rial pro blems such as st ring matching [26-29]. Acknowledgments T his work was sup porte d in par t by NSF gr an t MIP -9201484 , and by a grant from Georgetown University. R eferences [1] S. A. Sm it h , R. C. Watt, and S. R. Hameroff, "Cellular Au t om at a in Cyto skelet al La ttices ," Physica D, 10 (198 4) 168- 174. [2] A. J. Heeger , S. Kivelson , J . R. Schr ieffer , a nd W. -P. Su , "Solit ons in Conduction P olymers ," R eviews of Modern Ph ysics, 60 (3) (1988) 781-850. [3] N. Islam, J. P. Singh, and K. St eiglit z, "Solit on P hase Shifts in a Dissipati ve Lat t ice," Journ al of Applied Ph ysics, 6 2(2) (1987) 689-693 . [4] F . L. Carte r , "T he Molecular Device Comp ute r : P oint of Depar ture for Larg e Scale Cellular Auto m at a ," Ph ysica D, 10 (1984) 175- 194. Programmable Parallel Arithmetic in CA s Using a Particle M od el 321 [5] J. R. Milch , "Compute rs Based on Molecular Im plem entations of Cellu lar Autom at a," pres ented at Third International Sympo sium on Mol ecular El ectronic Devices, Arl ington VA, Octo b er 7, 1986. [6] S. Lloyd , "A Pot entially Realiza ble Qua nt um Com pute r," Science, 261(17) (1993) 1569-1 571. [7] C . Halvorson , A. Hays, B . Kr aab el, R. Wu , F . Wudl, and A . Heeger , "A 160-Femtosecond Optical Im age P rocessor Based on a Conjugated P olym er ," S cience, 265(26) (1994) 1215-1216. [8] J. K. P ark, K. St eiglit z, and W . P . T hurs ton , "Solit on-Like Beh avior in Aut om at a ," Phy sica D, 19 (1986) 423-432 . [9] C. H. Goldb erg, "P arity F ilte r Automat a," Complex Sy st em s, 2 (1988) 91141. [10] A. S. Fokas , E. P ap ado poulou, and Y. Saridaki s, "P articles in Soliton Cellular Autom ata ," Com plex Syst em s, 3 (1989) 615-633. [11] A. S. Fokas , E . P ap ad op ou lou , a nd Y. Saridaki s, "Cohere nt Structures in Cellular Autom ata ," Phy sics Lett ers A , 147 (1990) 369- 379 . [12] M. Bruschi , P . M. Santini, and O. Ragnisco, "Int egra ble Cellular Au tomata ," Phy sics Lett ers A, 169 (1992) 151-160 . [13] M. J . Ab lowit z, J . M . K eiser , and L. A . Ta khtajan , "Class of St able Mu lti stat e T ime- Rever sib le Cellul ar Au tom ata with Ri ch Part icle Cont ent ," Phy sical R eview A , 44 (1991) 6909- 6912. .r [14] K. St eiglit z, I. K am al, and A. Watson , "E mbe dd ing Computation in On eDimen sional Autom ata by Phase Coding Solit ons," IEEE Transactions on Com puters , 37 (1988) 138-145. [15] P. R. Cappello , "Towards an FIR F ilt er Tissu e," pages 276-279 in Proceedings of th e International Confere nce on A coustics, Speech, and Signal Processing, Tampa , FL, Mar ch 1985. [16] J. V . McCanny, J . G. McWhir ter , J. B . G. Roberts, D . J. D ay, a nd T . L. Thorp, "Bit Level Syst olic Arrays," Proceedings of th e Fift eenth A siloma r Conference on Circuit s, Sy st em s f3 Computers , Novemb er , 1981. [17] R. Squier a nd K. St eiglitz, "T wo-Dimens ional F HP Latti ce-Gases Ar e Computation Uni ver sal," Compl ex Sy st em s, 7 (1993) 297- 307. [18] U. Fri sch , D . cl'Humier es, B . Hasslacher , P. Lallem and, Y. P omeau , and J. P. Rivet , "Lat t ice Gas Hydrodynamics in Two and Three Dimensions," Compl ex Sy st em s, 1 (1987) 649-707. [19] C . H . Bennett , "Note s on the Hist or y of Reversib le Com putation," IBM Journal of Research and Development , 32(1) (1988) 16-23. 322 Richard K. Squier and Ken Steiglitz [20] N . Margolu s, "P hysics-Like Mode ls of Computations ," Phys ica D, 10 (1984) 81-95. [21] T. Toffoli and N. Margolus, Cellular Automata Machine s: A N ew Environ m ent for Modeling (Cambridge: MIT Press, 1987) . [22] H . T. Kung , "W hy Systolic Architectures?" IEEE Transactions on Computers, 1 5 (1982) 37-46. [23] H. T. K un g, L. M. Ruan e, and D. W. L. Yen , "A Two-Level P ip elined Systolic Array for Convolutions," pages 255-264 in CMU Conference on VLSI Systems and Computations, edite d by H. T. Kung , B. Sproull, a nd G. St eele (Ro ckville, MD: Comput er Science Press, 1981). [24] S. Y. Ku ng, VLSI Array Processors (E nglewood Cliffs, NJ: Prent ice Ha ll, 1988). [25] P. R. Cappello and K St eiglit z, "Digit al Signal Processing Applications of Systolic Algorithms,." pages 245- 254 in CMU Conference on VLSI Systems and Computation s, edite d by H . T . Kun g, B. Sproull, and G . Steele (Roc kville, MD: Comput er Science Press, 1981). [26] H.-H. Liu and K-S . Fu , "VLSI Arrays for Minimum-Distance Classifications ," in VLSI for Pattern Recognition and Im age Processing , edited by K-S. Fu (Berlin: Sp ing er- Verlag , 1984) . [27] R. J . Lipton and D. Lopr esti , "A Systolic Array for Rapi d String Comparison ," pages 363-376 in 1985 Chapel Hill Conference on Very Large S cale Integration, edite d by Henry Fuchs (Ro ckville, MD: Computer Scienc e Press, 1985). [28] R. J . Lipton and D . Loprest i, "Comparing Lon g Strings on a Short Systolic Array," 1986 In ternational Work shop on Systo lic Arrays, Ox ford University, July 2-4, 1986. [29] G . M. Landau and U . Vishkin , "Int roducing Efficient P ara llelism into Approximate String Matching and a New Serial Algorithm," ACM STOC, (1986) 220-230. [30] F. T. Leighton, Introduction to Parall el Algorithms and Archit ectures (San Mateo, CA : Morgan Ka ufman, 1992). [31] P . R. Cappello and K St eiglitz , "Unifying VLSI Array Design wit h Linear Transformations of Space-Time," pages 23-65 in Advances in Computing Research: VLSI Th eory, edite d by F . P. Preparata (Greenwich, CT: JAI P ress, 1984). [32] D. 1. Moldovon and J . A. B. Fortes, "P art it ioning and Mapping Algorithms into Fixed Systolic Arrays," IEEE Transact ions on Computers, C -35(1) (1986) 1- 12. Programmable Parallel Arit hmetic in CAs Using a Particl e Mode l 323 [33] R. K. Squ ier and K. St eiglit z, "Subatomic P article Mach ines: Parallel Processing in Bulk Material ," submit te d to Signal Processing Lett ers. [34] R . K. Squ ier , K. Ste iglit z, and M. H. Jakubowski, "Implementation of P ara llel Ar ithmet ic in a Cellu lar Automaton," 1995 Intern ation al Conference on Application Specific Array Processors, Strasbourg, France, July 24-26, 1995. [35] R. K. Squ ier , K. Steiglit z, and M. H . J akubowski, "General P ar allel Comput ation Wit ho ut CPUs : VLSI Realization of a Particle Machine," Technic al Report CS-TR-484-9 5, Comput er Science Department , P rinceton University (February 1995). Submitted to IEEE Transactions on Computers.