LIDS- P 2027 March, 1991 Research Supported By: AFOSR grant 88-0032 ARO contract DAAL03-86-K-0171 ONR grant N00014-9 1-J-1004 Parallel Smoothing Algorithms for Causal and Acausal Systems* Taylor, D. Willsky, A.S. LIDS-P-2027 Parallel Smoothing Algorithms for Causal and Acausal Systemsi Darrin Taylor and Alan S. Willsky Abstract In this paper we describe parallel processing algorithms for optimal smoothing for discrete time linear systems described by two point boundary value difference equations. These algorithms involve the partitioning of the data interval with one processor for each subinterval. The processing structures considered consists of independent parallel processing on each subinterval, followed by an information exchange between processors and then a final sweep of independent subinterval processing. The local processing procedures that we describe produce maximum likelihood (ML) estimates in which dynamics and a priori conditions play the same role as measurements, i.e. they are all noisy constraints. Consideration of such ML procedures for descriptor systems requires that we develop a general procedure for recursive estimation in situations in which neither the error covariance nor its inverse is well defined. This leads among other things to a generalization of the well known Mayne-Fraser two filter algorithm in which the two directions of processing are treated symmetrically,. Furthermore using an ML procedure for the local processing step leads to considerable simplification of the subsequent interprocessor information exchange step. We present both a two filter implementation of this step as well as a highly parallel implementation exactly matched to the hypercube computer architecture. This algorithm by itself yields a newr parallel smoothing algorithm and also, significantly, is extendible to higher dimension- offering the promise of even more significant computational savings for applications involving the estimation of random fields. !,. uail Ih ,rs : I,, w iti,' Inflorimatlioll all I),-ciSio S l),1 ,, , 1) l', rl',ri':l -; , irt, '., ri,: -r,I.g,( 'II ,I)ItrS ,i' .,,,.1 1, 1 ,,.,III yMtiTs. Il! ( 'alibridg'. NI.A 21'9. 'I Iis work was silpport,,ld iln lo(rce ()Olicci(o( ScicIIti li lccsarchIt tider ( r:al, I\ l1()SH1-Ss-o0(32. aIid in pr;ll I., lite I S Army nii(ler ('onltraet I)!\A I,(:{-;-I%- - 0ll.7, auid by Ilie ()Olice ot NavalI tiesearchl mitner (4,rati N0 I b,i .. r: l I;,r parl Iy Ihei \il {ls .arl h( )Ilic(,, )!1.I-!I-.1--l- I).I. 2 1 Introduction In this paper we describe algorithms for parallel optimal smoothing for systems described by two point boundary value descriptor systems (TPBVDS's), a class that includes btoth the standard causal model as well as a rich class of acausal models. There are several reasons for interest in parallel smoothing algorithms. First, the processing environment has changed substantially over the years to allow for multiprocessor computational enviromnents. The popular recursive estimation algorithms which were developed based on the Kalman filter, were designed to function in a single processor environment. In addition estimation of multidimensional processes nearly necessitates a multiprocessor environment. Specifically, while the boundary of a one dimensional process does not increase when the interval of consideration increases, a two dimensional system has a boundary that grows at a rate no smaller than the square root of the size of the region being considered. This is significant since the size of the boundary gives an indication of the complexity of the system. In particular, if we think of the dimension of the 'state' of a system as both a measure of the complexity of the system (e.g. in terms of required storage), and as a set of required boundary conditions needed for further computation, we see that there is a dramatic difference between the l-D and the 2-D cases. Thus, for large regions partitioning the data and processing it separately makes sense in order to reduce the complexity of the entire algorithm. This not only yields timne savings, in 1-D as well as in 2D, but also may be essential ill 2-D in order to keep the computational burden on individual processors within reason. Althcug!, ,.his paper considers cnlr thbe one dimensional problem, multidimensional estimation considerations are a guide to judge which approaches to consider for parallel estimation. In particular we wish to develop algorithms motivated by and (hopefully) generalizable to higher dimensions and which demonstrate the promise of efficient multidimensional estimation algorithms. Also, another criteria by which to guide us in consideration of parallel estimation algorithms is the notion of fault tolerance. In the event that a processor fails, do the remaining processors produce useful information? It is not enough to take an algorithm and parallelize it. It is important that the local processing involves computing meaningful information so that alternate strategies can be employed to recover useful information in the event of processor or communication failure. In particular we seek algorithms in which each processing step has a precise interpretation as producing an optimal estimate in some sense. Furthermore, we also wish to obtain algorithms that are capable of providing estimation error covariances. Such information not only allows performance assessment but also is essential for fault tolerant operation in which the absence of one or more information source must be accounted for in a statistically optimal fashion. The two parallel estimation algorithms described here have common characteristics. First, the data is partitioned among the processors. Local calculations are performed by processors on their own sets of data. Local information is then exchanged between processors and this is followed by a parallel post processing step in which each processor updates the estimates on its subinterval to produce the final globally optimal estimate over thle entire data interval. While a variety of approaches have been developed for various optimnal estimation problems [1-7], only two of these [6,7] employ a similar data partitioning structure for parallel filtering and smoothing for causal systems. In [6] a square root algorithm is used for parallel filtering on the subintervals assuming perfect knowledge of the state at 3 one endpoint. This is followed by an interprocessor information exchange and computation step. This step which is based on a change of initial condition formula in order to correct for imperfect endpoint knowledge, is similar in structure to the Mayne-Fraser two filter smoother in orde:' to obtain optimal smoothed estimates at the boundaries of the data intervals and to allow subsequent parallel computation of smoothed estimates within each subinterval. A somewhat more efficient algorithm, with a similar structure, is described in [7]. This procedure deals symmetrically with the two endpoints of each subinterval by initially processing data outward toward and in the final step inward from the boundary points (essentially using in each interval a joint model for x(t) and x(-t), with t = 0 corresponding to the center of the interval). The interprocessor exchange steps makes use of the so called partition theorem [10], resulting again in a two-filter sweep from processor to processor in both directions to produce optimal estimates at all boundary points. The algorithms we describe here bear some similarity to these approaches (in particular, and as we have indicated, they also use the same data partitioning and three step structure), but they also have some significant differences. First of all we deal here with the more general class of TPBVDS's. Also at each stage of our processing we compute maximum likelihood (ML) estimates based on the available information. Here as in [12], we essentially adopt the perspective that a priori statistics, dynamical relationships, and actual observations all play the same role, namely as noisy constraints. The use of this formalism has several important implications, perhaps most notably in the simplification and greatly enhanced flexibility it provides us in the interprocessor exchange step. However, let us first conmment on some of the implications for the local procetaineg step. Specifically as discussed in [11] we can without any loss of generality restrict our attention to so called separable TPBVDS's (STPBVDS's) in which independent boundary conditions are specified at two ends of the interval, say , t = +T. In particular, if we do not have such separable conditions on x(t), we can obtain them by considering the evolution of z(t) = [xT(t), T(--t)lT, so that the original boundary condition is now a condition on z(t), and we acquire a separate condition on z(0), namely that its two components must be exactly equal. Obviously this construction points to a connection with the model in [7]. Note also that as is clear from this construction and as must be true for any well posed TPBVDS, only partial boundary conditions are available at each end. Viewing these as initial measurements for recursive ML estimation procedures starting at either end of the data interval, we see that at least initially only incomplete information is available, which would seem to imply that we should use the information form in any recursive procedure, i.e to propagate P-', and P-'l. This could be carried out at least until P- 1 is invertible so that P is well defined. However as discussed in [13], the consideration of descriptor dynamics with possibly singular dynamic matrices, also means that we may have some noiseless constraints or as in the boundary constraints on the two components of z(O), implying that p-l is not well defined either! The preceding discussion makes clear that in considering recursive ML estimation for TPBVDS's we must directly confront the problem of estimation in the face of degenieracy, where the linear equations yielding the ML estimate need not have a unique solution (so t hat at least some part of x(t) is unconstrained by available information) but may yield perfect estimates of other parts of x(t). The generalized framework of such generalized estimation in the static case is developed in [9] (see also [8]). In [12] the results of [9] are used to develop 4 recursive filtering procedures for TPBVDS in the case when all variables are estimable, (so that P is well defined). What we describe in the next section are algorithmns for optimal STPBVDS smoothing in the general case. In particular we describe generalizations of the well known Mayne-Fraser and Rauch-Tung-Striebel algorithmts and in fact provide a completely symmetric version of the first of these in which each of the two filters is initialized with the independent boundary information available to it. The algorithms described in Section 2, in addition to being of interest in their own right, also provide us with the first and third local processing steps for our data partitioned parallel processing procedure. Two new algorithms for the second, interprocessor data step are described in Section 3. As in [6,7], we can view the output of the first step as producing 'measurements' of x(t) at the boundaries. However in the Bayesian approaches of [6,7], the errors in these 'measurements' are correlated since each local processor makes use of common prior information. This leads to the comparatively involved two filter procedure in [6,7] for exchanging and fusing endpoint information among processors. In contrast, by adopting the ML formalism we guarantee that the result of out first local processing step produces independent 'measurements' of boundary points. This leads to an algorithm, similar in structure but far simpler than the approach in [7],or[6]. However, it is the second algorithmic structure described in Section 3 that we feel is most novel and noteworthy. First of all, unlike the data exchange steps in [6,7] and our first algorithm, the data exchange structure of our second algorithm is itself highly parallel in nature and is in fact perfectly matched to the hypercube architecture. Secondly this ran bc applied to the original discrete data without 3c*t' processing steps befort and after it, yielding by itself a new highly parallel smoothing algorithm matched to a very different computer architecture than that in [6,7] Finally, the basic structure of this algorithm can be extended to multiple dimensions, offering the promise of achieving the needed efficiencies mentioned previously. We comment on this a bit more as we conclude the paper in Section 4. ^.l~r:hhtc. 2 2.1 Maximum Likelihood Recursive Estimation for TPBVDS's Maximum Likelihood Estimation In this section we describe algorithms for recursive filtering and smoothing for TPBVDS's As we indicated in the introduction, we adopt an ML perspective in part with an eye toward the parallel processing procedures of Section 3 and in part because such a formalism is particularly natural for descriptor systems in which the dynamics are more appropriately thought of as constraints rather than the basis of recursion. Also, as we have indicated, the problems of interest to us require that we examine, situations in which neither the estimation error covariance nor its inverse are well defined. To this end, let us begin with a brief look at a static ML estimation problem. Specifically consider the problem of estimlatiing an unknown vector x based on the observations y = He + v (I) where v is a zero mean Gaussian random variable, with possibly singular covariance R, and where H need not have full columnm rank, so that some part of x may be perfectly 5 reconstructed while another part remains completely unknown. optimal estimate ;ML What we mean by an in this case is a linear function of y so that if cTx is estimable ( i.e if a finite variance estimate of it can be constructed), then cT;iML is a mlinimum variance estimate of cTx. Note that in general iML is not unique, as no constraint is placed on the non-estimable portion of x. The solution we choose is the minimum norm solution given by ;rML = 0 Y R H ] [HT I | (2) [ ° where # denotes the Moore-Penrose pseudoinverse. As developed thoroughly in [9], (see also [12]) other generalized inverses can be used to obtain other valid choices for the ML estimate. However, this is the one we require for purposes we now outline, without proof (see [13] for details) Note that the range space of the symmetric projection matrix P, = (HTH)#(HTH) determines the estimable subspace of x. Then P:x~ML = COV(PX - XML)= [ '0T R HT H (3) XML and furthermore ]0 1 0 (4) Note also that P,Cov(Px - XML)P, = Cov(Pzx - XML) (5) Furthermore the following properties of ;ML are critical in deriving the recursive structure described later in this section: a) Consider the problem of ML estimation of x and z based on the data in (3) together with the measurements w = Gz + Jz + u (6) where u is a zero mean random vector uncorrelated with v . Then the optimal ML estimate for this problem is the same as that based on (6) and the observation iML = PZx + ±ML (7) where :ML is a zero mean Gaussian random vector independent of u with covariance given by (4) b) Suppose that = Ax (s) The ML estimate of z based on (3) is given by ZML = PzA.IML (9) Cov(PWz - iML) = PZA Cov(Pz - iML)ATP, (10) 6 where Pz is the largest rank symmetric projection matrix such that PzA(I - P,) = 0 (11) Pz = I - (A(I - P?)AT)#(A(I - P.)AT) (12) and is given by the following Note that one implication of (a) is that we can use our formalism for the recursive incorporation of information (so if in particular J = 0 in (6), and x is estimable based on (3), and (6), this procedure yields the unique optimal estimate of all of z). Also if A is invertible in (8), then P, has the same rank as P,. However, if A is singular, it is possible that P, will have larger rank because A may kill some of the non-estimable portions of x. 2.2 Two Point Boundary Value Descriptor Systems A general TPBVDS has the following form: E(t + l)x(t + 1) = A(t)x(t) + Btu(t) -T < t < T - 1 y(t) = C(t)x(t) + v(t) -T < t < T E(-T)x(-T) = A(T)x(T) + BTu(T) (13) (14) (15) where [uT(t), vT(t)]T] is a whii,; noise process with Co c[ u(t) v(t) I 0 (16) [ 0 R(t + 1) Note that written in this form, we have made little distinction between the boundary condition (15) on the process and the dynamics (13). We assume for simplicity, that the system (13), (15) is well-posed, i.e. that (13), (15) admits a unique solution for any choice of u(t), although, as in [12], the results here can be extend to a more general setting in which (13), (15) simply provide either an over- or under-constrained set of noisy constraints (and where, in fact x(t) may vary in dimension). In general the boundary conditions (15) couple the values of x(t) at the two endpoints t = ±T. An important subclass of TPBVDS's are those that are separable in which (15) specifies independent constraints of x(-T) and x(T). (In the well-posed case (15) is a set of constraints of dimension equal to that of x(t), so that in the separable case (15) provides incomplete constraints of at least one of the two boundary points). As shown in [11] it is always possible to transform a TPBVDS to one that is separable by considering the joint evolution of x(t) and x(-t): 0 E(-tt) , [ E(t + 1) 0 10 Af-to 1) (-t) 0 1 [ -B-t-1 O (t + 1) X- t- 1) 11 (17) - 1) 7 -][ [I (-0) =o0 (19) (20) [ -A(T)ET)] [ (-TT(T) ]=BTu(T) y(t) C(t) 0 y(-t) (t) C(-t) t)(21) [ x(-t) [ r(-t) 21 where (17)is defined for 0 < t < T-1, (21)for 0 < t < T, and the boundary conditions (19) and (20) are indeed separable. Note that (17), (19),and (20) represent the original system starting from the center, then moving outward. Thus any smoothing algorithm based on this model will involve processing outward toward and inward from the boundaries. Note also that the boundary conditions (19),and (20) provide only partial information about the states [xT(t), zT(-t)] at t = 0, and t = T and furthermore the boundary condition at t = 0 provides perfect information about part of the state [XT(0), xT(_)]. To continue, let us revert to a simpler notation for a general STPBVDS: E(t + 1)x(t + 1) = A(t)x(t) + Btu(t) 0 < t < T - 1 y(t) = C(t)x(t) + v(t) 0 <t <T [ v(t + Cov ) = (22) (23) 0 Rt+l (24) Here, consistent with our ML perspective, we have incorporated the independent boundary conditions on x(0) and x(T) into the measurements (23) at t = 0, and t = T. Viewing (22) through (24) as providing a set of noisy constraints, we can apply the ML estimation results of Subsection 2.1 to obtain recursive estimation algorithms. In presenting these algorithms it is useful to define two auxiliary (forward and backward prediction) variables. zf(t) = E(t)x(t) zb(t) = A(t)ax(t) (25) (26) Let XML[sO, t] denote the ML estimate of x(s) based on (22) for 0 < r < t - 1,and (23) for 0 < r < t, and iMfL[S10, t] denote the ML estimate of z(s) based on (22) for 0 < r < t,and (23) for 0 < T < t. We then obtain the following forward ML filter (FMLF) equations: T tlO ] Ef[tl, t- 0 ] 0 I Pzf (t) ET(t) 0 - I ZML[tIOt y(t) 0 I-Pzf(t) E(t) R(t) 0 C(t) 0 0 0 0 CT(t) 0 0 0 -1 (27) where [ ]- T - [tl0o. t-1] ] IL II 0 R(t) 0 (I - PZ(t)) ET(t) o (I-Pf(t)) 0 E(t) C(t) 0 0 0 IJ 0 o o CT(t) O 8 Pf(t) = (E T (t)Pf (t)E(t))#(ET(t)P, (t)E(t)) (29) dM [t + 10, t] = Pf(t + 1)A(t)0ML[tlO , t] Ef[t + 110, t] = Pf(t + 1)(A(t)UEf[tJO, t]AT(t) + B(t)BT(t))Pzf(t + 1) T (31) T Pz,(t + 1) = I - (A(t)(I - Pf(t))A (t))#(A(t)(I - PI(t))A (t)) (32) where Pf (t) indicates the symmetric projection matrix which defines the estimable part of x(t) based on data through time t, and Pf (t) is the symmetric projection matrix which defines the estimable part of zf(t). Equations (27) through (31) represent the generalization of [12] to allow for the possibility that z(t) and / or zf(t) are not completely estimable. Also 4[tJ0, t] can be thought of as the error covariance in the estimate of iML[t1O, t] in the sense of (4). U [tl0, t] represents the corresponding error covariance for the estimable part of x(t) The matrix Ef [t + 110, t] has a similar interpretation for 4M[t f + 110, t]. Similarly we can define the backward ML filter (BMLF) where XML[Slt, TJ denotes the ML estimate of x(s) based on (22), for t < r < T and (23) for t < r < T, and ZbML[SIt, T] denotes the ML estimate of zb(s) based on (22), for t-1 < r < T and (23) for t < r < T.The BMLF is then given by T 0 R(t) I-Pb(t) 0 E iL[tt, ML[tl, = vb[tlt + 1, T] I0 0 0 A T(t) ~I '-'ft ' 0o - 0 I ]= 0 CT(t) ML[tt + 1,T] # y(t) 0 o 0 O O (33) Oi0b[tlt + f [tt, T] = I- Pb(t) A(t) 0 C(t) 1,T] 0 (I - Pzb(t)) AT(t) 0 (I - Pb(t)) A(t) R(t) 0 0 CT(t) 0 0 C(t) 0 0 0 I P?(t) = (A T (t)Pf(t)A(t))#(AT (t)Pf(t)A(t)) MbL[t- lt, T] (34) (35) = Pzb(t)E(t)iML[tIt, T] ,zb[tlt + 1, T] = Pb(t)(E(t + 1) [[t + lt + 1, T]E(t + 1)T + B(t)BT(t))Pb(t)(36) Pzb(t + 1) = I - (E(t)(I - P.(t))E(t)T )#(E(t)(I - P,(t))E(t)T ) (37) Also the FMLF and the BMLF can be combined to produce the optimal smoothed estimate using one of two forms. The first combines forward filtered with backward predicted estimates, and the second combines forward predicted with backward filtered estimates. iML[tIO, E- [tJl, t] 0O 0 0 0 (38) T] = 0 I -P(t) 0 I iML[tIO, T] I 0 zb[tt + 1, T] 0 I - Pb(t AT(t) I- Pf (t) 0 0 0 0 0 I I - Pb(t) A(t) A(t) 0 0 0 0 0 0 # AML[tO. t] 51L[tt --- i, T]) 0 0 0 (39) 9 oT - 0 o ,Zf[to0,t-1] O I- PIf(t) I - I P,(t) E(t) O O I-Pb(t) 0 IO 0 0 I/ - Pb(t) 0 / O I CSb[tlt,T] ET(t) MAL[tlt,T] 0 I-ML[t10, t- 1]) O 0 0 0 0 0 0 O O The smoothed error covariances are given by the following 0 - - T 0 uE[t0, T] = - [tl, t] 0 0 O P.(t) o o Pzb(t) I I AT(t) o o0 E,[tl0,T ] = 0 I T P,(t) 0 I 0 0 0 0 0 Pb(t) 0 0 0 A(t) 0 0 0 0 0 0 I 0 I E(t) ~zb[tlt + 1, T] 0 vb[tit, T] 0 0 PT(t) szf[tlo, t-1] 0 I P,f(t) ET(t) 0 P0(t) _ 0 0 o0 0 P, (t) 0 0 0 (40) # 0 0 O 0 (41) 0 I The FMLF and the BMLF together with either (38), or (39) form a generalization of the Mayne-Fraser two filter formulas for optimal smoothing on STPBVDS's in the case where x(t) may not be estimnabl:', while portions of it are specified perfectly. Specifically if E=I and only initial conditioI.s are specified (making the system well posed), the FMLF and the BMLF and (39) reduce to the usual Mayne-Fraser equations. As a result, the generalization to STPBVDS deals in a symmIetric way with information available at the two ends of the interval. It is also possible to generalize the Rauch-Tung-Striebel algorithml to STPBVDS's. This algorithm involves a forward sweep to compute xML[tO0, t] for each t producing the smoothed estimate i,(T) = iML[TI0, T] at one endpoint, which initiates a reverse sweep to compute x.(t) = xML[tlO, T] over the entire interval. The key to this backward sweep is again to interpret it as the computation of ML estimates based on an appropriate set of observations. In particular suppose that we have computed x,(t+1) and its corresponding error covariance v [t + 110, T], where EV[t + 110, T] is interpreted as in (4) if x(t) is not estimable. Then as shown in [13] the computation of x,(t) and E,[t + 110, T] can be obtained by solving the following ML estimation problem which captures all relevant information relating x(t) to x(t + 1) and the available estimates of of each of these: iML[tl0, tj] wL[t + 110, t] 0 E(t + 1)x,(t + 1) Pf (t) 0 (I - P,f(t + 1))A(t) O 0 P, (t + 1) (I - P(t(t I x(t) + 1)) + + (t)(2) By choosing appropriate change of variables this can be partitioned into two indlepelentlt observations. [ M L' [ t O t ] , - { v '[ t O t ]A T ( [t) [ lO 1 ]ttM ; f L t + 110, t ] {(I- Pf (t + 1))BT(t)B(t), Zf [t + 10,tfif t]} + 110, l0 t] t] ]}t110L[tt + 10 p~f~t) -vrttlO t]AT(t)E# [t + 110, t] (Pzf (t + 1) - I)A (I - P, (t + 1))(A(t) - BT(t)B(t)Ef [t + 110, t]) | MLIt ±+1 0,t] ir P~f(t + 1) E(t + 1)f[t + 110, T] I (t) [ z(t zf(t + 1) + V2 (t) 1 + 1) (44) where v 1 (t) has a covariance given by C(OV[vl(t)]ll =- f[to0, t] CoV[Vl (t)]l 2 = -f Cov[vl(t)]2 1 = -(I - ,[tlO,t]A(t)TE# [t + 110, t]A(t)f [tlJ, t] [tlO, t]A(t)TE# [t + (45) 10, t]BT(t)B(t)(I - Pf (t + 1)) (46) Pf (t + 1))BT(t)B(t),# [t + 110, t]A(t)Ef[tO, t]AT(t) (47) Cov[vi(t)]2 2 = (I - Pzf (t + 1))BT(t)B(t) - BT(t)B(t)3*, [t+110, t]BT(t)B(t)(I - P,,) (48) In equation (44) two observations are provided of the estimate of zf(t + 1) since one of the measurements is the smoothed estimate no additional information is contained in ftL[t + 110, t]. The estimate is therefore equal to Ei[t + 110, T] with error covariance given by E(t + 1)ESf[t + 110, T]ET(t + 1). The resulting ML estimate of x(t) is precisely f,(t). In the causal case in which E=I and all covariances are well defined and invertible, this rep'ices to the usual Rauch-Tung-Striebel algorithm. We refer the reader to [13] for explicit computations in the more general case. 3 Parallel Smoothing Algorithms In this section we describe two highly parallel algorithms for optimal smoothing for TPBVI)S as in (13) - (15). Amplifying on our discussion in Section 1, our algorithms have the following structure. First the overall data interval of definition is partitioned into disjoint subintervals. In each such subinterval we define the STPBVDS model (17), (19),(21) with the time origin taken at the center of the subinterval and perform outward filtering using the FMLF described in the preceding section. At the end of this stage information must be exchanged among the subinterval processors. From the perspective of any individual subinterval, the relevant information from all other subintervals can be interpreted as providing additional measurements of x(t) at the boundaries of this interval. Once this information is incorporated, each subinterval processor can proceed independently with either the BMLF / Mayne-Fraser procedure or the Rauch-Tung-Striebel algorithm described in the preceding section in order to produce optimal smoothed estimates across the entire subinterval. At this stage the advantage of the Rauch-Tung-Striebel algorithm is that the original data is not necessary to recover smoothed estimates. The preceding description requires several additional commlents. First since the itmost general boundary conditions for TPBVDS (15) couples x(-T) with x(T). our partitioning into subintervals must in essence view the points -T and T as neighbors. Thus for exanmple if we partition our data into two subintervals, the natural choice of partition is [-T/2, T/2] and [-T, (-T/2)-1] U[(T/2)+1, T]. In this case the outward processing over the first subinterval ( t=T/2 t=T t = -T/2 t = -T t=O, Figure 1: Combining data at ±T/2 incorporates the boundary measurement (19) while that for the other interval has as its 'center' the pair of points [( -T), x(T)] and incorporates the boundary 'measurement' (20), In general this boundary condition couples the two components of the state [xT(t), xT(-t]T over the interval t E [T/2 + 1, T]. However, if the original system is separable, these two components are completely decoupled implying as illustrated in Figure 1 that we in fact have a three interval decomposition with outward processing in the central interval and completely decoupled processing at the two ends. For simplicity in the subsequent discussion we will assume that the processing in the end intervals is also outward front their centerr. 1f fi'rt;hr :,khbinterval decomposition is performed the additiona l !r.-te;vals also employ outward processing. Because of the discrete nature of our time index, a general comment is required concerning the precise nature of the data exchange step. To illustrate this consider the case of an STPBVDS and the three interval decomposition described in the preceding paragraph. In this case one might expect the outward processing in the central interval to culminate with the filtered estimates ML[T/21 - T/2, T/2], and ML[--T/21 - T/2, T/2] while the two outer interval culminate in iML[(T/2) + 11(T/2) + 1, T] and .ML[-(T/2) - 11 - T, -(T/2) - 1]. However if we wish to view neighboring intervals as providing 'measurements' at subinterval endpoints then we need to produce predicted estimates as well. Note in particular that these prediction steps in essence incorporate the final dynamic constraints not used in the first local processing step, namely those relating boundary points of the neighboring subintervals. E((T/2) + 1)x((T/2) + 1) = A(T/2)x(T/2) + B(T/2)u(T/2) (49) E(-T/2)x(-T/2)= A(-T/2) - 1)x((-T/2) - 1) + B(-T/2) - 1)u((-T/2) - 1) (50) To simplify the discussion in the remainder of this section, we focus on the case of STPBVDS's (17). Furthermore we assume that the values of x(t) at subinterval boundlary endpoints are estimable based on the data in the subinterval. As a result it is no longer necessary to propagate pseudo-inverses in our subsequent discussion since all covariainces are well defined. The results of Section 2 can of course be used in the more general case. By making the assumption of strong observability introduced in [12] we are able to guarantee the estimability of x(t) at the end points. In the case of constant coefficients systems the assumption of strong observability implies that x(t) and x(-t) are jointly estimable based 12 on data over the interval [-t, t]( so that the joint error covariance is well defined) as long as t > n where n is the dimension of x(t). 2 . In addition it is useful to adopt simplified notation describing only those variables of interest in the exchange step. Specifically, again using an ML perspective, we have the following unknowns which we wish to estimate. [ [ X1 J x 1[ X3 'j x4 X5 [ . ' 2m-4 X2m-3 ] X2m- 2 [ 1 (51) 2 L2m-1 X2k-2 represents the left most internal boundary point of the (k)th subinterval, and X2k-1 represent the right most boundary point of the (k)th interval. The time indices indi- where cate that the indicated quantities are appropriate samples of X(t) at the various endpoints. In our three interval example, [x T , is given by [xT(O), XT((-T/2) - 1)], [ T , x T ] is given by [xT(-T/2),xT(T/2)] and finally [xTm_2, T_l] is given by [xT((T/2) + 1),T]. Our estimates of these variables are based on the following 'measurements' 4T] [X 2i-2 Y2i-2 Y2i-1 J [ 2i-1 E2i-2 (52) E2i-1 (53) as well as the following additional noisy constraints. E2iX2i = A2i- 1 x2i- 1 + B 2 i-lU2i- 1 where Ei, (54) and u, are independent Gaussian random variables with the following covariances. Cov(u) = I Co [E-2 ]= E2i-1 R2i-2 R 2 i-1,2i-2 (55) R2i-2,2i-1 R2i-1 (56) Here the 'measurements' (52) correspond to the independent endpoint estimates produced by each subinterval processor during the first stage, while the constraints (54) correspond to the dynamics (22) across subinterval boundaries. Note that because of our adoption of an ML procedure for the first stage, the zero mean gaussian variables ei, and uj are mutually independent. In contrast to the approaches in [6,7] this leads to dramatic simplifications in terms of interpretations of the result, computations, and preprocessing. Note also that (52), and (54) looks very much like our original STPBVDS. The only difference being the fact that the system (52), and (54) is trivially estimable since (54) provide complete measurements of each xi and the special form of the dynamics linking the bottom half of one state to the top half of the next. This form allows us to describe two procedures for the data exchange step, one of which is a natural application of the mllethods in Section 2 and the other of which offers some new possibilities for parallel processing. Algorithm # 1: We use a Mayne-Fraser or a Rauch-Tung-Striebel procedure to exchange information between subintervals. In particular let us describe a version of the FMLF tailored to this 2 Nole Ithat this implies thaIIt th(e irojf(clio Iliatrices Pr(t) aId Pi(t) 1teed oily .,e propar;lgate;l ,e'r a limit ed intterval of Icllgt h of at most 2n before they are equal to tile idlentity 13 model. In particular thanks to the form of this of the dynamics (54) our FMLF propagates estimnates of the odd numbered z(t)'s , i.e. the bottom halves of each state vector. 3 Also let zilj denote the ML estimate of zi based on Yk, k < j. Then 1il = Y1 (57) ELI1 = R (58) We can then compute i2k-112k-1 the estimates of Z2k-1 based on on Yl through Y2k-1, and equations (54) for i < k - 1. recursively as solutions to the ML estimation problems r2k-312k-3 Z[ I Y 2k- 0 Y Y2k-2 Y2k-1 0 -A2k-3 L 0 0 i- 0 E 2k- 2 T XI ° 0 0 X2k.312k3 o0 X2k-312k-3 L2k-312k-3 I J L2k-1 B2k-3U2k-3 2k-112k-1 12k-3 (59) E k-2 J where x2i-312i-3, the error in i2k-112k-3, is uncorrelated with u 2k- 3 , E2k-2, and e2k-1 and has covariance E.2i3_I2k3,.Equation (59) is of the form (3), and the solution i2k-112k-1 is directly obtained in the form given by (2). Finally, at the last stage we compute the full smoothed estimate of X2ml1 will have been obtained. This FMLF can then be combined either with an analogous BMLF to yield a Mayne-Fraser procedure or with a backward Rauch-Tung-Striebel step. Note that the extension to the case of non-estimable systems can be readily accomplished using the formalism in Section 2. However, even in the case of estimable variables it is still necessary in ueneral to use pseudo-inverses to solve the ML problems since the noise covariances needed to solve (59) are in general singular. 4 The computational structure of Algorithm # 1 involves essentially serial processing from subinterval to subinterval and thus takes time proportional to m the number of subintervals and in fact does not make use of the parallel computing power of the array processors, although it does only require nearest neighbor connectivity for the subinterval processors. Also the use of a purely serial formalism indicates that this approach is naturally associated with 1D processes. In contrast our second approach is highly parallelizable, although it uses more dense interprocessor communication, corresponding most naturally to a hypercube architecture. Also this approach, which involves propagation from fine to coarse partitions( of the data), is naturally extended to higher dimensions. Furthermore since the subinterval interchange step is itself a TPBVDS smoothing problem, this second approach also provides an alternate parallel processing algorithm for our original TPBVDS smoothing problem. The statistical interpretation of our second algorithm is best understood by contrasting it with that of Algorithm #1. Specifically in the FMLF step of Algorithml #1 we essentially use the Markovian nature of the TPBVDS to obtain a recursion for the best estimate i2i-_l2i- 1 of the boundary point of a data interval of increasing size based on all of the data within the interval. However this same philosophy leads to the idea of simultalleolsly obtaining recursions for estimates over several disjoint data intervals of increasing size. 3 'fIe I .fo1thie. 1,:,.ldl calculatI, e estii - es. o[ Of Ithle i to hfileop Imleal(jui.± Ito a stly %Ilmlu or , irvllc,,l {'lrl for the step forrcomliinuig the FIILJ" and B3ILF estimaites. Tlhe I !{.ih-'liifl-g-Stri l ,le :lg itIt i i .iil:lyki mI. modified. See [13] for (letails. 41For examplle vevtn in causal sySttlilS the tdyn. ic constraints [] does not necessaril' process w. haIve' a f1ll ralk niois' 14 where the estimates are merged as data intervals are joined. Taking this to it full limit, we obtain the following. Algorithm# 2: We suppose, for simplicity that m = 2 K so that the number of vectors to be estimated in (51) is a power of 2. To initialize the algorithm we use the measurements yi, in pairs as independent ML estimates of the following 2 K quantities. [ 2 - 21 ] 0 [ Y2i-2 i 2i-l1o J i = 2, 4,..., m (60) Y2i-1 with the corresponding estimation error given by the appropriate Ej's with the estimation error covariances given in (56). Then the first step of the algorithm merges the estimates in non-overlapping pairs together with the appropriate intervening dynamic constraint (54). Specifically, we can solve in parallel the following 2 K ML estimation problems: [2i-210 00I I0 00 i2i-110 0 0 I0 O i2i0O 2i+110 0 O -A2i E ] [ X2i-2 X0 I 0 [ 1 E2i2 (61) Ei-1 2i e2i 2i+1 0 2i1 2i+lB2i-1U2i-1 where i is an odd integer. Because of the block diagonal nature of the noise covariances in clese ML v estimation problems, and the special structure of the measurement matrices (i.e. each consists of an identity block together with one dynamic coupling constraint), these ML problems can be solved efficiently (see [13]). Furthermore the resulting estimates, which we denote by z2i-2(1, x2ij-ll, Z2ill, and x 2i+1 11, etc., have independent errors from ML problem to ML problem ( e.g. the error in carrying out the estimation indicated in equation (61) for i=1 is uncorrelated with the error in from the same equation when i=3). Note also that we have used half of the dynamic constraints in this first step. To continue the process it is important to realize that we essentially have the same problem as we did at the first stage! To make this more explicit, consider the estimate resulting from (61), for i = 3, i.e.i 411, i511, x6 11, and i 7 11 which the best estimates of Z4 , x$, z 6, and z7 based on the corresponding data from (52) and the intervening dynamic constraint from (54). However, thanks to the Markovian nature our system- or equivalently the local nature of the dynamic constraints- it is only the boundary elements of this set of estimates ii4I1, and 2i711 that are relevant to the estimation of variables outside this data interval when the remaining dynamic constraints from (54) are taken into account. Thus for the next step of the problem we wish to estimate the variables, [O [3 X4 [I,X X8 ][ ... X2mn- 11 (62) X2?n-4 X2m-5 2n-1 ] (2) based on the measurements [ X311 X31 I L3 11 1 (63) 15 lI I . __ II 11* II I I 'I Processor Processor Processor Processor Processor Processor Processor Processor ·#000 #001 #010 #011 #100 #101 #110 #111 Figure 2: Combining Independent Boundary Data in a Tree Structure [411 ;L71l I= ]1+ [ 4 X711 .. 4[ 1] (64) 711j (65) and the remaining dynamic constraints (i.e (54) for i=2,4, m) Thus we have half as many variables to estimate based an half as many independent dynamic constraints, and half as many 'measurements' representing the accumnulated information over intervals of twice the length as before. The complete processing structure is as depicted in Figure 2. Specifically, we have a tree of computations producing estimates at boundary points of merged intervals that double in size as we move up the tree indicating a coarsening of the data partitioning and a concomitant thinning of the required estimates. All of the computations in going fromt one level to the next can be calculated in parallel. Since the number of such complpltations is halved at each level, we have a natural pyramidal structure for the computations. Such a structure is perfectly well-suited to a hypercube architecture in which processors are placed at the vertices of a unit cube in an L-dimensional space and are directly connected to processors at nodes connected by edges. For our problem ideally we would like to use 16 a K-dimensional hypercube so that no processing steps required communication with any latency (i.e., communication between non-adjacent nodes). For example as illustrated in Figure 2, for the case of m = 3, the initialization step (corresponding to the initial local processing within each data subinterval) is carried out in parallel on all 8=23 processors. The next step involves pairings of processors that differ in only one bit (i.e. (000, 001), (010, 011), (100, 101), (100,111)) with processing accomplished in the first element of each pair incorporating the data for each of the two processors as well as the intervening dynamic constraint (which is then removed at the next step). At the next level, the remaining active processors are again paired so that there is only one bit difference ((000, 010) and (100, 110)), etc. Note that when we have reached the top of the tree, we have computed the full optimal estimate at only a pair of the xi boundary points. However, the procedure we have described is exactly the same in structure as the recursive method outlined in Section 2, except that here the recursion is indexed by the resolution of the data partitioning - i.e., we have described a fine-to-coarse recursion. It is not difficult to see then that what remains is the Rauch-Tung-Striebel back-substitution step, proceeding back down the tree,in parallel at each level, until at the end we have distributed appropriately the optimal smoothed estimates based on all data of the end points of each subinterval, which is exactly the same as the result of Algorithm # 1, although in this case the time required to do this is proportional to log(m) since we have been able to use parallel rather than serial operations. 4 Conclusion In this paper we have described new parallel algorithms for optimal smoothing for the class of two-point boundary value descriptor systems, which includes not only standard causal linear state models but also a rich class of noncausal models. Our approach is based on a partitioning of the data into subintervals with parallel local processing, followed by an interprocessor exchange step, and a subsequent parallel local processing step. The desire to simplify the problem of merging estimates and the nature of TPBVDS's led us to adopt an ML philosophy throughout our development, necessitating a generalization of recursive ML estimation procedures to allow for the possibility that neither the estimation error covariance nor its inverse may be well defined. This led us to a generalization of the MayneFraser two-filter smoothing algorithm, in which the two ends of the data interval are treated synnmmetrically, and of the Rauch-Tung-Striebel algorithm. As we have shown, the data interchange step of our parallel processing algorithms can itself be viewed as a smoothing problem for a TPBVDS whose "state" represents the boundaries of the subintervals used in the first, local processing stage. This led naturally to one class of algorithmic structures using the ML version of Mayne-Fraser or Rauch-Tung-Striebel that we have derived. An important point to note here that in this structure the estimates produced by the first local processing stage play the role of "mleasurements", with the sdynamic relationships between subinterval boundaries playing the role of dynamics. Note tlhat the interval dynamic relationships have in a sense been absorbed into the "measurements". This emphasizes not only the fact that measurements and dynamics play essentially identical roles as noisy constraints in the ML formalism but also that the key to essentially all efficient (and in our case parallel) estimation algorithms is the judicious choice of the order 7 in which these constraints are applied and in which variables are eliminated. This is perhaps even more apparent in our second algorithm which can serve either as the interprocessor interface step or as a stand-alone parallel smoothing algorithm. In this algorithm, the individual estimates from the first local processing stage (or the new data in the stand-alone mode) serve as initial condition for our dynamic model which evolves finer to coarser subinterval partitions by merging subintervals and keeping track only of the resulting exterior boundaries - i.e., the dynamics in this case are essentially nothing more than decimation! Moreover, the dynamic relationships between subinterval boundaries in this case essentially play the role of measurements! This leads us naturally to the consideration of dynamic models on dyadic trees, a topic that has also arisen quite independently, from the development of statistical filtering methods related to the wavelet transform. We refer the reader to [14] for complete expositions of this topic. Finally, as we indicated in the introduction, the structure of Algorithm # 2 can be easily extended to multiple dimensions. For example consider a Markov random field [15] on a 2-D rectangular grid, and suppose we partition the data array into many smaller rectangles. A natural parallel processing structure in this case is a first, parallel, outward processing step within each sub-rectangle, followed by an exchange of boundary information and a subsequent parallel inward processing step. Looking more carefully at the boundary exchange step, we can imagine performing it in exactly the same way as in Algorithm # 2: merging smaller rectangles into larger ones (in parallel) and propagating information about the resulting outward boundaries. Note that t his also can be organized to have a dyadic -ese structure and thus is naturally matched to the hypcaube architecture. Obvi .:!y; while what we have just described is superficially identical to what we have discussed in this paper, there are substantial differences since outward and inward processing on rectangles is quite different from that on intervals and the same is obviously true about the relationship between rectangle and interval boundaries! Thus, the development of methods for 2-D smoothing that realize the structure we have described is far from. trivial. An investigation of this problem is currently underway and will be reported on in [13]. References [1] Bello, Martin G., Alan S. Willsky, Bernard C. Levy, and David A. Castanon, Smoothing Error Dynamics and Their Use in the Solution of Smoothing and Mapping Problems, IEEE Transactions on Information Theory, Vol IT-32, No.4, July 1986 [2] Levy, Bernard C., David A. Castanon, George C. Verghese, and Alan S. Willsky, A Scattering Framework for Decentralized Estimation Problems, Automatica Vol 19, No 4, pp373-384, 1983 [3] Speyer J.L., Computation and Transmission Requirements for a Decentralized LinearQuadratic-GaussianControl Problem, IEEE Transactions on Automatic Control, Vol AC-24, pp266, 1979 [4] Willsky, Alan S., Martin G. Bello, David A. Castanon, Bernard C Levy and George C Verghese, Combining and Updating of Local Estimates and Regional Maps (along 18 one Dimensional Tracks, IEEE Transactions on Automatic Control, Vol AC-27, No.4, pp799-813, August 1982 [5] Hashemipour, Hamid R., Stunit Roy, and Alan J. Laub, Decentralized Structures for ParallelKalman Filtering,IEEE Transactions on Automatic Control, Vol AC-33, No.1, January 1988 [6] Morf, M., J.R. Dobbins, B. Freidlander, and T. Kailath, Square Root Algorithms for ParallelProcessingin Optimal Estimation, Automatica, Vol-15, pp. 299-306, 1979 [7] Tewfik, A.H., B.C. Levy, and A.S. Willsky, A New Distributed Smoothing Algorithm, MIT Laboratory for Information and Decision Systems, (LIDS-P- 1501), Aug 1988 [8] Catlin, Donald E.,Estimation of random States in General Linear Models, IEEE Transactions on automatic control, Vol. 36, No.2, February 1991 [9] Campbell, S.L., and C.D. Meyer, Generalized Inverses of Linear Transformations. London: Pitman, 1979 [10] Ljung Lennart, and Thomas Kailath , A Unified Approach to Smoothing Formulas, Automatica, Vol. 12, ppl 4 7-157, 1976 [11] Nikoukah, Ramine A Deterministic and Stochastic Theory for Two-point Boundary Value Descriptor Systems, MIT-Laboratory for information and Decisi(?n Systemq LIDS-TH-1820 [12] Nikoukhah, R., A.S. Willsky, and B.C. Levy, Kalman Filtering and Riccati Equations for Descriptor Systems Proceedings of the 2 9 th IEEE Conference on Decision and Control, Dec 1990 [13] Taylor, Darrin ParallelEstimation on Two Dimensional Systems Ph.D. Thesis, MIT, Aug 1991 [14] Chou, K.C., A Stochastic Modeling Approach MultiScale Signal ProcessingPh.D Thesis, MIT, Jun 1991 [15] Levy, B.C., M.B. Adams, A.S. Willsky, Solution and Linear Estimation of 2-D Nearest neighbor Models Proceedings of the IEEE Vol.78, No. 4, April 1990